# VPS Sizing for Production Deployment

## Executive Summary

This document provides detailed resource requirements for deploying the Bakery IA platform to a production VPS environment at **clouding.io** for a **10-tenant pilot program** during the first 6 months.

### Recommended VPS Configuration

```
RAM: 20 GB
Processor: 8 vCPU cores
SSD NVMe (Triple Replica): 200 GB
```

**Estimated Monthly Cost**: Contact clouding.io for current pricing

---

## Resource Analysis

### 1. Application Services (18 Microservices)

#### Standard Services (14 services)
Each service configured with:
- **Request**: 256Mi RAM, 100m CPU
- **Limit**: 512Mi RAM, 500m CPU
- **Production replicas**: 2-3 per service (from prod overlay)

Services:
- auth-service (3 replicas)
- tenant-service (2 replicas)
- inventory-service (2 replicas)
- recipes-service (2 replicas)
- suppliers-service (2 replicas)
- orders-service (3 replicas) *with HPA 1-3*
- sales-service (2 replicas)
- pos-service (2 replicas)
- production-service (2 replicas)
- procurement-service (2 replicas)
- orchestrator-service (2 replicas)
- external-service (2 replicas)
- ai-insights-service (2 replicas)
- alert-processor (3 replicas)

**Total for standard services**: ~39 pods
- RAM requests: ~10 GB
- RAM limits: ~20 GB
- CPU requests: ~3.9 cores
- CPU limits: ~19.5 cores
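
The totals above can be re-derived from the per-pod figures. The sketch below is only an illustrative check, not part of the deployment tooling; the names `REPLICAS` and `totals` are invented for this example. It evaluates both the replica counts exactly as listed (which sum to 31 pods) and the ~39-pod figure used for the stated totals.

```python
# Per-pod resources for a standard service, taken from the list above.
REQUEST_MI, LIMIT_MI = 256, 512          # memory in MiB
REQUEST_CPU_M, LIMIT_CPU_M = 100, 500    # CPU in millicores

# Replica counts exactly as listed above.
REPLICAS = {
    "auth-service": 3, "tenant-service": 2, "inventory-service": 2,
    "recipes-service": 2, "suppliers-service": 2, "orders-service": 3,
    "sales-service": 2, "pos-service": 2, "production-service": 2,
    "procurement-service": 2, "orchestrator-service": 2,
    "external-service": 2, "ai-insights-service": 2, "alert-processor": 3,
}


def totals(pods: int) -> dict:
    """Aggregate requests and limits for `pods` identical standard pods."""
    return {
        "pods": pods,
        "ram_request_gib": pods * REQUEST_MI / 1024,
        "ram_limit_gib": pods * LIMIT_MI / 1024,
        "cpu_request_cores": pods * REQUEST_CPU_M / 1000,
        "cpu_limit_cores": pods * LIMIT_CPU_M / 1000,
    }


listed = totals(sum(REPLICAS.values()))  # replicas as listed
stated = totals(39)                      # the ~39-pod figure quoted above

if __name__ == "__main__":
    print("listed:", listed)
    print("stated:", stated)
```

With 39 pods the result reproduces the stated totals (9.75 GiB requests, 19.5 GiB limits, 3.9 and 19.5 cores). With the 31 listed replicas every figure is roughly 20% lower, so the stated totals are the more conservative of the two.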

#### ML/Heavy Services (2 services)

**Training Service** (2 replicas):
- Request: 512Mi RAM, 200m CPU
- Limit: 4Gi RAM, 2000m CPU
- Special storage: 10Gi PVC for models, 4Gi temp storage

**Forecasting Service** (3 replicas) *with HPA 1-3*:
- Request: 512Mi RAM, 200m CPU
- Limit: 1Gi RAM, 1000m CPU

**Notification Service** (3 replicas) *with HPA 1-3*:
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 500m CPU

**ML services total**:
- RAM requests: ~2.3 GB
- RAM limits: ~11 GB
- CPU requests: ~1 core
- CPU limits: ~7 cores

### 2. Databases (18 PostgreSQL instances)

Each database:
- **Request**: 256Mi RAM, 100m CPU
- **Limit**: 512Mi RAM, 500m CPU
- **Storage**: 2Gi PVC each
- **Production replicas**: 1 per database

**Total for databases**: 18 instances
- RAM requests: ~4.6 GB
- RAM limits: ~9.2 GB
- CPU requests: ~1.8 cores
- CPU limits: ~9 cores
- Storage: 36 GB

### 3. Infrastructure Services

**Redis** (1 instance):
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 500m CPU
- Storage: 1Gi PVC
- TLS enabled

**RabbitMQ** (1 instance):
- Request: 512Mi RAM, 200m CPU
- Limit: 1Gi RAM, 1000m CPU
- Storage: 2Gi PVC

**Infrastructure total**:
- RAM requests: ~0.8 GB
- RAM limits: ~1.5 GB
- CPU requests: ~0.3 cores
- CPU limits: ~1.5 cores
- Storage: 3 GB

### 4. Gateway & Frontend

**Gateway** (3 replicas):
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 500m CPU

**Frontend** (2 replicas):
- Request: 512Mi RAM, 250m CPU
- Limit: 1Gi RAM, 500m CPU

**Total**:
- RAM requests: ~1.8 GB
- RAM limits: ~3.5 GB
- CPU requests: ~0.8 cores
- CPU limits: ~2.5 cores

### 5. Monitoring Stack (Optional but Recommended)

**Prometheus**:
- Request: 1Gi RAM, 500m CPU
- Limit: 2Gi RAM, 1000m CPU
- Storage: 20Gi PVC
- Retention: 200h

**Grafana**:
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 200m CPU
- Storage: 5Gi PVC

**Jaeger**:
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 200m CPU

**Monitoring total**:
- RAM requests: ~1.5 GB
- RAM limits: ~3 GB
- CPU requests: ~0.7 cores
- CPU limits: ~1.4 cores
- Storage: 25 GB

### 6. External Services (Optional in Production)

**Nominatim** (disabled by default; an external geocoding API can be used instead):
- If enabled: request 2Gi RAM / 1 CPU, limit 4Gi RAM / 2 CPU
- Storage: 70Gi (50Gi data + 20Gi flatnode)
- **Recommendation**: Use an external geocoding service (Google Maps API, Mapbox) for the pilot to save resources

---

## Total Resource Summary

### With Monitoring, Without Nominatim (Recommended)

| Resource | Requests | Limits | Recommended VPS |
|----------|----------|--------|-----------------|
| **RAM** | ~21 GB | ~48 GB | **20 GB** |
| **CPU** | ~8.5 cores | ~41 cores | **8 vCPU** |
| **Storage** | ~79 GB | - | **200 GB NVMe** |

### Memory Calculation Details
- Application services (standard + ML): 12.3 GB requests / 31 GB limits
- Databases: 4.6 GB requests / 9.2 GB limits
- Infrastructure: 0.8 GB requests / 1.5 GB limits
- Gateway/Frontend: 1.8 GB requests / 3.5 GB limits
- Monitoring: 1.5 GB requests / 3 GB limits
- **Total requests**: ~21 GB
- **Total limits**: ~48 GB

### Why 20 GB RAM is Sufficient

1. **Requests vs Limits**: Kubernetes schedules pods against their requests, not their limits. The total requests (~21 GB) count the HPA-enabled services at their maximum replica count; the pilot fits in 20 GB because:
   - HPA-enabled services (orders, forecasting, notification) start at 1 replica
   - Not all services are expected to reach their request levels at the same time during the pilot
   - The calculations already include some overhead

2. **Actual Usage**: Production limits are safety margins, and real usage for 10 tenants is expected to stay well below them:
   - Most services use 40-60% of their limits under normal load
   - Pilot traffic is significantly lower than peak design capacity

3. **Cost-Effective Pilot**: Starting with 20 GB allows:
   - Room for monitoring and logging
   - Comfortable headroom (15-25%)
   - Easy vertical scaling if needed
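
The effect of the HPA-enabled services starting at 1 replica can be estimated with a few lines of arithmetic. This is a rough sketch under two assumptions that the document does not state outright: that the ~21 GB figure in the summary table counts each of the three services at 3 replicas, and that 1000 Mi is treated as about 1 GB, as the rounded group totals appear to do. `HPA_SERVICES` and `hpa_savings_mi` are illustrative names.

```python
# Memory request per pod (MiB) and replica range for the HPA-enabled services.
# Assumption: the summary total counts each service at its maximum of 3 replicas.
HPA_SERVICES = {
    "orders-service": {"request_mi": 256, "min": 1, "max": 3},
    "forecasting-service": {"request_mi": 512, "min": 1, "max": 3},
    "notification-service": {"request_mi": 256, "min": 1, "max": 3},
}

TOTAL_REQUESTS_GB = 21.0  # "Requests" column of the summary table


def hpa_savings_mi(services: dict) -> int:
    """MiB of requests not reserved while every service sits at its minimum replicas."""
    return sum(s["request_mi"] * (s["max"] - s["min"]) for s in services.values())


saved_mi = hpa_savings_mi(HPA_SERVICES)
# Assumed rough conversion: 1000 Mi is treated as about 1 GB.
steady_state_gb = TOTAL_REQUESTS_GB - saved_mi / 1000

if __name__ == "__main__":
    print(f"saved: {saved_mi} Mi, steady-state requests: ~{steady_state_gb:.1f} GB")
```

Under these assumptions the steady-state requests come to roughly 19 GB, about 2 GB below the all-replicas total.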

### CPU Calculation Details
- Application services (standard + ML): 4.9 cores requests / 26.5 cores limits
- Databases: 1.8 cores requests / 9 cores limits
- Infrastructure: 0.3 cores requests / 1.5 cores limits
- Gateway/Frontend: 0.8 cores requests / 2.5 cores limits
- Monitoring: 0.7 cores requests / 1.4 cores limits
- **Total requests**: ~8.5 cores
- **Total limits**: ~41 cores
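
As a cross-check, the per-group figures from the Resource Analysis section can be summed directly, counting Gateway/Frontend once. The snippet is a minimal sketch; `GROUPS` and `column_totals` are names made up for this example, and the values are copied from the group totals in sections 1 to 5.

```python
# Each tuple: (RAM requests GB, RAM limits GB, CPU requests cores, CPU limits cores),
# copied from the group totals in the Resource Analysis section.
GROUPS = {
    "standard services": (10.0, 20.0, 3.9, 19.5),
    "ML services": (2.3, 11.0, 1.0, 7.0),
    "databases": (4.6, 9.2, 1.8, 9.0),
    "infrastructure": (0.8, 1.5, 0.3, 1.5),
    "gateway/frontend": (1.8, 3.5, 0.8, 2.5),
    "monitoring": (1.5, 3.0, 0.7, 1.4),
}


def column_totals(groups: dict) -> tuple:
    """Sum each resource column across all groups, rounded to one decimal."""
    return tuple(round(sum(column), 1) for column in zip(*groups.values()))


ram_req, ram_lim, cpu_req, cpu_lim = column_totals(GROUPS)

if __name__ == "__main__":
    print(f"RAM: {ram_req} GB requests / {ram_lim} GB limits")
    print(f"CPU: {cpu_req} cores requests / {cpu_lim} cores limits")
```

The sums (21.0 GB / 48.2 GB of RAM, 8.5 / 40.9 cores of CPU) line up with the ~21 GB, ~48 GB, ~8.5 cores, and ~41 cores shown in the summary table.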

### Storage Calculation
- Databases: 36 GB (18 × 2Gi)
- Model storage: 10 GB
- Infrastructure (Redis, RabbitMQ): 3 GB
- Monitoring: 25 GB
- OS and container images: ~30 GB
- Growth buffer: ~95 GB
- **Total**: ~199 GB → **200 GB NVMe recommended**

---

## Scaling Considerations

### Horizontal Pod Autoscaling (HPA)

Already configured for:
1. **orders-service**: 1-3 replicas based on CPU (70%) and memory (80%)
2. **forecasting-service**: 1-3 replicas based on CPU (70%) and memory (75%)
3. **notification-service**: 1-3 replicas based on CPU (70%) and memory (80%)

These services will automatically scale up under load without manual intervention.
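
To make the scaling behaviour concrete, the sketch below applies the replica formula from the Kubernetes HPA documentation, `desired = ceil(current * currentMetric / targetMetric)`, taking the largest proposal across metrics and clamping it to the 1-3 range. It is a simplified model, not the controller itself: the function name and argument layout are invented here, utilization is expressed as a percentage of the pod's request, and the scale-down stabilization window is ignored.

```python
import math


def desired_replicas(current: int, utilization: dict, targets: dict,
                     min_replicas: int = 1, max_replicas: int = 3,
                     tolerance: float = 0.1) -> int:
    """Replica count the HPA formula would aim for (simplified model).

    `utilization` and `targets` map a metric name to average utilization as a
    percentage of the pod's request, e.g. {"cpu": 140, "memory": 60}.
    """
    proposals = []
    for metric, target in targets.items():
        ratio = utilization[metric] / target
        if abs(ratio - 1.0) <= tolerance:
            # Close enough to the target: this metric proposes no change.
            proposals.append(current)
        else:
            proposals.append(math.ceil(current * ratio))
    # The largest proposal wins, then the result is clamped to the HPA bounds.
    return max(min_replicas, min(max_replicas, max(proposals)))


# Targets for orders-service as listed above.
ORDERS_TARGETS = {"cpu": 70, "memory": 80}

if __name__ == "__main__":
    print(desired_replicas(1, {"cpu": 140, "memory": 60}, ORDERS_TARGETS))
    print(desired_replicas(1, {"cpu": 350, "memory": 60}, ORDERS_TARGETS))
```

For example, one orders-service replica averaging 140% CPU against the 70% target yields 2 replicas, while 350% CPU would propose 5 and is capped at the configured maximum of 3.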

### Growth Path for 6-12 Months

If tenant count grows beyond 10:

| Tenants | RAM | CPU | Storage |
|---------|-----|-----|---------|
| 10 | 20 GB | 8 cores | 200 GB |
| 25 | 32 GB | 12 cores | 300 GB |
| 50 | 48 GB | 16 cores | 500 GB |

For 100+ tenants, consider a Kubernetes cluster with multiple nodes.

### Vertical Scaling

If you hit resource limits before adding more tenants:
1. Upgrade RAM first (most common bottleneck)
2. Then CPU if services show high utilization
3. Storage can be expanded independently

---

## Cost Optimization Strategies

### For Pilot Phase (Months 1-6)

1. **Disable Nominatim**: Use external geocoding API
   - Saves: 70 GB storage, 2 GB RAM, 1 CPU core
   - Cost: ~$5-10/month for external API (Google Maps, Mapbox)
   - **Recommendation**: Enable Nominatim only if >50 tenants

2. **Start Without Monitoring**: Add later if needed
   - Saves: 25 GB storage, 1.5 GB RAM, 0.7 CPU cores
   - **Not recommended** - monitoring is crucial for production

3. **Reduce Database Replicas**: Keep at 1 per service
   - Already configured in base
   - **Acceptable risk** for pilot phase

### After Pilot Success (Months 6+)

1. **Enable full HA**: Increase database replicas to 2
2. **Add Nominatim**: If external API costs exceed $20/month
3. **Upgrade VPS**: To 32 GB RAM / 12 cores for 25+ tenants

---

## Network and Additional Requirements

### Bandwidth
- Estimated: 2-5 TB/month for 10 tenants
- Includes: API traffic, frontend assets, image uploads, reports

### Backup Strategy
- Database backups: ~10 GB/day (compressed)
- Retention: 30 days
- Additional storage: 300 GB for backups (separate volume recommended)
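
The 300 GB figure follows from the daily backup size multiplied by the retention window. The sketch below shows that arithmetic together with a hypothetical pruning helper; `required_backup_gb` and `expired_backups` are illustrative names, and the document does not specify how backups are actually rotated.

```python
from datetime import date, timedelta

DAILY_BACKUP_GB = 10   # compressed size per day, from the list above
RETENTION_DAYS = 30


def required_backup_gb(daily_gb: float = DAILY_BACKUP_GB,
                       retention_days: int = RETENTION_DAYS) -> float:
    """Space held once the retention window is full."""
    return daily_gb * retention_days


def expired_backups(backup_dates: list, today: date,
                    retention_days: int = RETENTION_DAYS) -> list:
    """Backup dates older than the retention window, oldest first."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in backup_dates if d < cutoff)


if __name__ == "__main__":
    print(required_backup_gb())
    sample = [date(2024, 1, 1), date(2024, 1, 20), date(2024, 2, 10)]
    print(expired_backups(sample, today=date(2024, 2, 15)))
```

If the daily backup size grows with tenant data, the same function gives the new volume size to provision.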

### Domain & SSL
- 1 domain: `yourdomain.com`
- SSL: Let's Encrypt (free) or wildcard certificate
- Ingress controller: nginx (included in stack)

---

## Deployment Checklist

### Pre-Deployment
- [ ] VPS provisioned with 20 GB RAM, 8 cores, 200 GB NVMe
- [ ] Docker and Kubernetes (k3s or similar) installed
- [ ] Domain DNS configured
- [ ] SSL certificates ready

### Initial Deployment
- [ ] Deploy with `skaffold run -p prod`
- [ ] Verify all pods running: `kubectl get pods -n bakery-ia`
- [ ] Check PVC status: `kubectl get pvc -n bakery-ia`
- [ ] Access frontend and test login
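
The pod verification step can be scripted instead of read by eye. The following is a minimal sketch that parses the JSON form of the `kubectl get pods` command above; the helper names are invented for this example, and it assumes `kubectl` is on the PATH and already pointed at the cluster.

```python
import json
import subprocess


def fetch_pods(namespace: str = "bakery-ia") -> dict:
    """Return the parsed output of `kubectl get pods -n <namespace> -o json`."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def unhealthy_pods(pod_list: dict) -> list:
    """Names of pods that are neither completed nor running with all containers ready."""
    bad = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        phase = status.get("phase")
        if phase == "Succeeded":  # finished Job pods are fine
            continue
        containers = status.get("containerStatuses", [])
        ready = bool(containers) and all(c.get("ready") for c in containers)
        if phase != "Running" or not ready:
            bad.append(pod["metadata"]["name"])
    return bad


if __name__ == "__main__":
    problems = unhealthy_pods(fetch_pods())
    print("all pods ready" if not problems else f"not ready: {problems}")
```

An empty result means every pod is either running with all containers ready or has completed; anything else is worth inspecting with `kubectl describe pod`.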

### Post-Deployment Monitoring
- [ ] Set up external monitoring (UptimeRobot, Pingdom)
- [ ] Configure backup schedule
- [ ] Test database backups and restore
- [ ] Load test with simulated tenant traffic

---

## Support and Scaling

### When to Scale Up

Monitor these metrics:
1. **RAM usage consistently >80%** → Upgrade RAM
2. **CPU usage consistently >70%** → Upgrade CPU
3. **Storage >150 GB used** → Upgrade storage
4. **Response times >2 seconds** → Add replicas or upgrade VPS
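
These thresholds translate directly into a small check that could run against whatever metrics source is in use. The sketch below is illustrative only: `NodeMetrics` and `scaling_actions` are hypothetical names, and the inputs are assumed to be sustained averages rather than momentary spikes, since the thresholds above refer to consistent usage.

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    ram_pct: float           # sustained RAM usage, percent of total
    cpu_pct: float           # sustained CPU usage, percent of total
    storage_used_gb: float   # disk space in use
    response_time_s: float   # typical response time in seconds


def scaling_actions(m: NodeMetrics) -> list:
    """Map the thresholds listed above to their recommended actions."""
    actions = []
    if m.ram_pct > 80:
        actions.append("Upgrade RAM")
    if m.cpu_pct > 70:
        actions.append("Upgrade CPU")
    if m.storage_used_gb > 150:
        actions.append("Upgrade storage")
    if m.response_time_s > 2:
        actions.append("Add replicas or upgrade VPS")
    return actions


if __name__ == "__main__":
    print(scaling_actions(NodeMetrics(85, 50, 100, 0.4)))
```

An empty list means none of the thresholds has been crossed.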

### Emergency Scaling

If you hit limits suddenly:
1. Scale down non-critical services temporarily
2. Disable monitoring temporarily (not recommended for >1 hour)
3. Increase VPS resources (clouding.io allows live upgrades)
4. Review and optimize resource-heavy queries

---

## Conclusion

The recommended **20 GB RAM / 8 vCPU / 200 GB NVMe** configuration provides:

- ✅ Comfortable headroom for 10-tenant pilot
- ✅ Full monitoring and observability
- ✅ High availability for critical services
- ✅ Room for traffic spikes (2-3x baseline)
- ✅ Cost-effective starting point
- ✅ Easy scaling path as you grow

**Total estimated compute cost**: €40-80/month (check clouding.io current pricing)

**Additional costs**: Domain (~€15/year), external APIs (~€10/month), backups (~€10/month)

**Next steps**:
1. Provision VPS at clouding.io
2. Follow deployment guide in `/docs/DEPLOYMENT.md`
3. Monitor resource usage for first 2 weeks
4. Adjust based on actual metrics