DanteGPU is a virtual machine management system for AI workload distribution and GPU resource sharing. Built with Rust, it manages VMs with GPU passthrough capabilities.
DanteGPU serves as the core component of the GPU Share Platform, offering:
- VM lifecycle management with GPU passthrough
- Real-time resource monitoring
- Automated GPU management
- RESTful API interface
- CLI tools for system management
VM Management
- Full lifecycle control (create, start, stop, delete)
- GPU passthrough support
- Resource allocation optimization
- Template-based VM creation
- Automated recovery mechanisms
GPU Management
- Automated device discovery
- Dynamic GPU allocation
- Multi-vendor support (NVIDIA, AMD)
- Performance metrics tracking
- Resource isolation
Resource Monitoring
- Real-time resource tracking
- Performance metrics collection
- GPU utilization monitoring
- Memory usage tracking
- Temperature and power monitoring
API and CLI
- RESTful API endpoints
- Git-style CLI commands
- Colored terminal output
- Async command processing
- Comprehensive error handling
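Dynamic GPU allocation, listed above, can be pictured as a pool that hands a free device to a VM and takes it back on detach. The sketch below is a hypothetical illustration, not DanteGPU's actual allocator: GpuPool, allocate, release, and free_count are invented names, and GPUs are identified by small integer IDs as in the gpu attach --gpu-id 0 example.

```rust
use std::collections::BTreeMap;

/// Hypothetical pool tracking which VM (if any) holds each GPU.
pub struct GpuPool {
    // GPU ID -> name of the VM it is attached to (None = free).
    assignments: BTreeMap<u32, Option<String>>,
}

impl GpuPool {
    /// Create a pool from the GPU IDs found during device discovery.
    pub fn new(gpu_ids: &[u32]) -> Self {
        let assignments: BTreeMap<u32, Option<String>> =
            gpu_ids.iter().map(|&id| (id, None)).collect();
        GpuPool { assignments }
    }

    /// Attach the lowest-numbered free GPU to `vm` and return its ID.
    /// Returns None when every GPU is already in use.
    pub fn allocate(&mut self, vm: &str) -> Option<u32> {
        // BTreeMap iterates in key order, so the first free slot has the lowest ID.
        let (&id, slot) = self
            .assignments
            .iter_mut()
            .find(|(_, owner)| owner.is_none())?;
        *slot = Some(vm.to_string());
        Some(id)
    }

    /// Detach every GPU held by `vm` and return how many were freed.
    pub fn release(&mut self, vm: &str) -> usize {
        let mut freed = 0;
        for owner in self.assignments.values_mut() {
            if owner.as_deref() == Some(vm) {
                *owner = None;
                freed += 1;
            }
        }
        freed
    }

    /// Number of GPUs not attached to any VM.
    pub fn free_count(&self) -> usize {
        self.assignments.values().filter(|o| o.is_none()).count()
    }
}
```

Picking the lowest free ID keeps allocation deterministic; a real scheduler might instead weigh utilization, memory, or vendor.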
Configuration Management
- Hierarchical config system
- Multiple override layers
- Environment variable support
- TOML-based configuration
- Secure secrets handling
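The hierarchical, multi-layer configuration described above can be modeled as a stack of key/value layers in which later layers override earlier ones. This is a minimal sketch, not DanteGPU's actual loader: the flat key names, the GPU_SHARE_ environment-variable prefix, and the defaults < file < environment ordering are assumptions, and TOML parsing is omitted.

```rust
use std::collections::HashMap;

/// Hypothetical prefix for environment overrides, e.g. GPU_SHARE_SERVER_PORT.
const ENV_PREFIX: &str = "GPU_SHARE_";

pub type Layer = HashMap<String, String>;

/// Merge configuration layers in order of increasing priority:
/// built-in defaults, then the config file, then environment variables.
/// `env` is passed in explicitly (e.g. from std::env::vars()) so the
/// function is easy to test.
pub fn resolve_config<I>(defaults: &Layer, file: &Layer, env: I) -> Layer
where
    I: IntoIterator<Item = (String, String)>,
{
    let mut merged = defaults.clone();
    // Values from the config file override the defaults.
    for (k, v) in file {
        merged.insert(k.clone(), v.clone());
    }
    // Environment variables override everything else;
    // GPU_SHARE_SERVER_PORT becomes the key "server_port".
    // Variables without the prefix are ignored.
    for (name, value) in env {
        if let Some(rest) = name.strip_prefix(ENV_PREFIX) {
            merged.insert(rest.to_lowercase(), value);
        }
    }
    merged
}
```

For example, with a default server_port of 3000, a file that sets default_memory_mib to 8192, and GPU_SHARE_SERVER_PORT=8080 in the environment, the resolved configuration has server_port = 8080 and default_memory_mib = 8192.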
CLI System
    gpu-share
    ├── serve [--port]    # API server management
    ├── vm                # VM operations
    │   ├── list          # List all VMs
    │   ├── create        # Create new VM
    │   ├── start         # Start VM
    │   ├── stop          # Stop VM
    │   └── delete        # Remove VM
    ├── gpu               # GPU management
    │   ├── list          # List GPUs
    │   ├── attach        # Attach GPU to VM
    │   └── detach        # Detach GPU from VM
    └── init              # Generate config
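The command tree above maps naturally onto nested enums. The sketch below parses an argument list into that structure using only the standard library. It is a hypothetical illustration rather than DanteGPU's actual parser (in Rust, a crate such as clap is the common choice for this). The default port of 3000 is an assumption borrowed from the Start Service example, and per-command flags such as --name or --gpu-id are not handled.

```rust
/// Subcommands under `gpu-share vm`.
#[derive(Debug, PartialEq)]
pub enum VmCmd { List, Create, Start, Stop, Delete }

/// Subcommands under `gpu-share gpu`.
#[derive(Debug, PartialEq)]
pub enum GpuCmd { List, Attach, Detach }

/// Top-level commands from the tree above.
#[derive(Debug, PartialEq)]
pub enum Command {
    Serve { port: u16 },
    Vm(VmCmd),
    Gpu(GpuCmd),
    Init,
}

/// Assumed default when `serve` is run without `--port`.
const DEFAULT_PORT: u16 = 3000;

/// Parse arguments (without the program name) into a Command.
pub fn parse_args(args: &[&str]) -> Result<Command, String> {
    match args {
        ["serve"] => Ok(Command::Serve { port: DEFAULT_PORT }),
        ["serve", "--port", p] => p
            .parse::<u16>()
            .map(|port| Command::Serve { port })
            .map_err(|_| format!("invalid port: {p}")),
        ["vm", sub, ..] => {
            let cmd = match *sub {
                "list" => VmCmd::List,
                "create" => VmCmd::Create,
                "start" => VmCmd::Start,
                "stop" => VmCmd::Stop,
                "delete" => VmCmd::Delete,
                other => return Err(format!("unknown vm command: {other}")),
            };
            Ok(Command::Vm(cmd))
        }
        ["gpu", sub, ..] => match *sub {
            "list" => Ok(Command::Gpu(GpuCmd::List)),
            "attach" => Ok(Command::Gpu(GpuCmd::Attach)),
            "detach" => Ok(Command::Gpu(GpuCmd::Detach)),
            other => Err(format!("unknown gpu command: {other}")),
        },
        ["init"] => Ok(Command::Init),
        _ => Err(format!("unrecognized arguments: {args:?}")),
    }
}
```

Slice patterns keep the mapping from the tree to code one-to-one: each branch of the tree is one match arm.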
API Endpoints
- /api/v1/vms: VM management
- /api/v1/gpus: GPU operations
- /api/v1/metrics: Performance metrics
- RESTful design principles
- JSON payload support
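One way to picture the endpoint layout above is a small routing table that maps an HTTP method and path to a handler. Only the three base paths come from this README. The HTTP methods, the per-VM /api/v1/vms/{name} sub-path, and the Route variants are hypothetical, and a real server would typically use an HTTP framework rather than matching strings by hand.

```rust
/// Hypothetical handlers behind the /api/v1 endpoints.
#[derive(Debug, PartialEq)]
pub enum Route<'a> {
    ListVms,
    CreateVm,
    GetVm(&'a str),
    DeleteVm(&'a str),
    ListGpus,
    Metrics,
}

/// Map a method and path to a Route, or None (HTTP 404/405).
pub fn route<'a>(method: &str, path: &'a str) -> Option<Route<'a>> {
    // "/api/v1/vms/ai-worker-01" -> ["vms", "ai-worker-01"];
    // empty segments (e.g. from a trailing slash) are dropped.
    let rest = path.strip_prefix("/api/v1/")?;
    let segments: Vec<&'a str> = rest.split('/').filter(|s| !s.is_empty()).collect();
    match (method, segments.as_slice()) {
        ("GET", ["vms"]) => Some(Route::ListVms),
        ("POST", ["vms"]) => Some(Route::CreateVm),
        ("GET", ["vms", name]) => Some(Route::GetVm(*name)),
        ("DELETE", ["vms", name]) => Some(Route::DeleteVm(*name)),
        ("GET", ["gpus"]) => Some(Route::ListGpus),
        ("GET", ["metrics"]) => Some(Route::Metrics),
        _ => None,
    }
}
```

Versioning the prefix (/api/v1/) lets a later /api/v2/ coexist without changing these routes.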
Monitoring System
- Resource metrics collection
- Performance tracking
- Health monitoring
- Metrics retention management
- Real-time alerts
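Metrics retention and real-time alerts can be combined in a bounded buffer that drops the oldest sample once full and flags readings above a threshold. This is a minimal sketch: MetricsBuffer, the Sample fields, and the temperature threshold are hypothetical choices, not DanteGPU's actual configuration.

```rust
use std::collections::VecDeque;

/// One hypothetical GPU reading.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Sample {
    pub timestamp_s: u64,
    pub utilization_pct: f32,
    pub temperature_c: f32,
}

/// Keeps at most `capacity` samples and reports threshold breaches.
pub struct MetricsBuffer {
    samples: VecDeque<Sample>,
    capacity: usize,
    max_temperature_c: f32,
}

impl MetricsBuffer {
    pub fn new(capacity: usize, max_temperature_c: f32) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        MetricsBuffer {
            samples: VecDeque::with_capacity(capacity),
            capacity,
            max_temperature_c,
        }
    }

    /// Store a sample, evicting the oldest one when the buffer is full.
    /// Returns an alert message if the sample exceeds the temperature limit.
    pub fn record(&mut self, sample: Sample) -> Option<String> {
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(sample);
        (sample.temperature_c > self.max_temperature_c).then(|| {
            format!(
                "temperature {:.1} C exceeds limit {:.1} C at t={}s",
                sample.temperature_c, self.max_temperature_c, sample.timestamp_s
            )
        })
    }

    /// Number of retained samples.
    pub fn len(&self) -> usize {
        self.samples.len()
    }

    /// Oldest retained sample, if any.
    pub fn oldest(&self) -> Option<&Sample> {
        self.samples.front()
    }
}
```

A fixed capacity bounds memory use per GPU regardless of uptime; time-based retention would instead evict samples older than a cutoff timestamp.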
System Requirements
- Linux kernel with IOMMU support
- QEMU/KVM virtualization
- Libvirt daemon
- Compatible GPU (NVIDIA/AMD)
- Rust toolchain (latest stable)
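A quick way to check the IOMMU requirement on Linux is to look at /sys/kernel/iommu_groups: with the IOMMU enabled, the kernel creates one numbered directory per IOMMU group there, and the directory is typically empty when it is not. The helper below is a hypothetical sketch, not part of the gpu-share CLI; it takes the directory as a parameter so it can be pointed at a test fixture.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Count IOMMU groups under `dir` (normally /sys/kernel/iommu_groups).
/// A missing directory is treated as zero groups.
pub fn iommu_group_count(dir: &Path) -> io::Result<usize> {
    match fs::read_dir(dir) {
        Ok(entries) => {
            let mut count = 0;
            for entry in entries {
                // Each group is a subdirectory named by its number.
                if entry?.file_type()?.is_dir() {
                    count += 1;
                }
            }
            Ok(count)
        }
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(0),
        Err(e) => Err(e),
    }
}

/// True when at least one IOMMU group exists. This is a prerequisite for
/// passthrough, not a guarantee that a given GPU can be isolated.
pub fn iommu_enabled(dir: &Path) -> bool {
    iommu_group_count(dir).map(|n| n > 0).unwrap_or(false)
}
```

On a host, iommu_enabled(Path::new("/sys/kernel/iommu_groups")) returning false usually means the IOMMU is disabled in firmware or on the kernel command line.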
Optional Components
- NVIDIA driver (for NVIDIA GPUs)
- AMD driver (for AMD GPUs)
- Docker (for containerized deployment)
System Setup
    # Install dependencies
    sudo apt install qemu-kvm libvirt-daemon-system
    # Clone repository
    git clone https://github.com/yourusername/gpu-share-vm-manager
    cd gpu-share-vm-manager
    # Build project
    cargo build --release
Configuration
    # Generate default config
    ./target/release/gpu-share init
    # Edit configuration (optional)
    vim config/default.toml
Start Service
    # Run API server
    ./target/release/gpu-share serve --port 3000
Security
- Input validation on all endpoints
- Resource limits enforcement
- Secure configuration management
- Environment variable protection
- API authentication (coming soon)
- Resource isolation
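Input validation and resource-limit enforcement might look like the following for the --name, --memory, and --vcpus parameters of gpu-share vm create. The VmRequest type, the naming rule, the MiB unit for memory, and the numeric limits are hypothetical values chosen for illustration; real limits would come from DanteGPU's configuration.

```rust
use std::ops::RangeInclusive;

/// Hypothetical request for creating a VM.
pub struct VmRequest<'a> {
    pub name: &'a str,
    pub memory_mib: u32,
    pub vcpus: u32,
}

/// Hypothetical limits; real values would come from configuration.
pub const MAX_NAME_LEN: usize = 64;
pub const MEMORY_RANGE_MIB: RangeInclusive<u32> = 512..=65536;
pub const VCPU_RANGE: RangeInclusive<u32> = 1..=32;

#[derive(Debug, PartialEq)]
pub enum ValidationError {
    BadName,
    MemoryOutOfRange(u32),
    VcpusOutOfRange(u32),
}

/// Reject invalid requests before any VM is defined.
pub fn validate(req: &VmRequest) -> Result<(), ValidationError> {
    // Names: 1..=64 chars of ASCII letters, digits and '-', not starting with '-'.
    let name_ok = !req.name.is_empty()
        && req.name.len() <= MAX_NAME_LEN
        && !req.name.starts_with('-')
        && req.name.chars().all(|c| c.is_ascii_alphanumeric() || c == '-');
    if !name_ok {
        return Err(ValidationError::BadName);
    }
    if !MEMORY_RANGE_MIB.contains(&req.memory_mib) {
        return Err(ValidationError::MemoryOutOfRange(req.memory_mib));
    }
    if !VCPU_RANGE.contains(&req.vcpus) {
        return Err(ValidationError::VcpusOutOfRange(req.vcpus));
    }
    Ok(())
}
```

Restricting names to a small character set also keeps them safe to reuse in file paths and hypervisor commands.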
Usage Examples
    # Create new VM with GPU
    gpu-share vm create --name ai-worker-01 --memory 8192 --vcpus 4 --gpu
    # List available GPUs
    gpu-share gpu list
    # Attach GPU to VM
    gpu-share gpu attach --vm-name ai-worker-01 --gpu-id 0
Collected Metrics
- CPU usage tracking
- Memory utilization
- GPU metrics
  - Utilization percentage
  - Memory usage
  - Temperature
  - Power consumption
- Performance analytics
- Resource optimization
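The GPU metrics listed above can be summarized over a window of samples. The sketch below uses a hypothetical GpuReading type and a fixed sampling interval to compute average utilization, peak memory use, peak temperature, and energy. Energy is approximated by assuming each reading holds for one interval: power (W) times interval (s) gives joules, and dividing by 3600 converts to watt-hours.

```rust
/// Hypothetical per-interval GPU reading.
pub struct GpuReading {
    pub utilization_pct: f64,
    pub memory_used_mib: u64,
    pub temperature_c: f64,
    pub power_w: f64,
}

#[derive(Debug, PartialEq)]
pub struct GpuSummary {
    pub avg_utilization_pct: f64,
    pub peak_memory_mib: u64,
    pub peak_temperature_c: f64,
    pub energy_wh: f64,
}

/// Summarize readings taken every `interval_s` seconds.
/// Returns None for an empty window.
pub fn summarize(readings: &[GpuReading], interval_s: f64) -> Option<GpuSummary> {
    if readings.is_empty() {
        return None;
    }
    let n = readings.len() as f64;
    let avg_utilization_pct = readings.iter().map(|r| r.utilization_pct).sum::<f64>() / n;
    let peak_memory_mib = readings.iter().map(|r| r.memory_used_mib).max()?;
    let peak_temperature_c = readings
        .iter()
        .map(|r| r.temperature_c)
        .fold(f64::NEG_INFINITY, f64::max);
    // Joules = W * s for each interval; 1 Wh = 3600 J.
    let energy_wh = readings.iter().map(|r| r.power_w * interval_s).sum::<f64>() / 3600.0;
    Some(GpuSummary {
        avg_utilization_pct,
        peak_memory_mib,
        peak_temperature_c,
        energy_wh,
    })
}
```

For example, two half-hour readings (interval_s = 1800) at 200 W and 300 W give (200 + 300) × 1800 / 3600 = 250 Wh.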
We welcome contributions! Please see our CONTRIBUTING.md for guidelines.
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
DanteGPU is currently in active development. Features being worked on:
- Enhanced GPU scheduling
- Multi-node support
- Advanced monitoring
- Security enhancements
- Performance optimizations
Full documentation is available in /docs:
- Installation Guide
- Configuration Reference
- API Documentation
- Development Guide
- Security Guidelines
Remember: With great GPU power comes great electricity bills! 🔋