bakery-ia/docs/vps-sizing-production.md
2025-12-05 20:07:01 +01:00

# VPS Sizing for Production Deployment
## Executive Summary
This document provides detailed resource requirements for deploying the Bakery IA platform to a production VPS environment at **clouding.io** for a **10-tenant pilot program** during the first 6 months.
### Recommended VPS Configuration
```
RAM: 20 GB
Processor: 8 vCPU cores
SSD NVMe (Triple Replica): 200 GB
```
**Estimated Monthly Cost**: Contact clouding.io for current pricing
---
## Resource Analysis
### 1. Application Services (18 Microservices)
#### Standard Services (14 services)
Each service configured with:
- **Request**: 256Mi RAM, 100m CPU
- **Limit**: 512Mi RAM, 500m CPU
- **Production replicas**: 2-3 per service (from prod overlay)
Services:
- auth-service (3 replicas)
- tenant-service (2 replicas)
- inventory-service (2 replicas)
- recipes-service (2 replicas)
- suppliers-service (2 replicas)
- orders-service (3 replicas) *with HPA 1-3*
- sales-service (2 replicas)
- pos-service (2 replicas)
- production-service (2 replicas)
- procurement-service (2 replicas)
- orchestrator-service (2 replicas)
- external-service (2 replicas)
- ai-insights-service (2 replicas)
- alert-processor (3 replicas)
**Total for standard services**: 31 pods (sum of the replica counts listed above)
- RAM requests: ~7.9 GB
- RAM limits: ~15.9 GB
- CPU requests: ~3.1 cores
- CPU limits: ~15.5 cores
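
The totals for the standard services can be re-derived from the per-pod figures and the replica counts in the list above. The following is a minimal sketch, not part of the deployment tooling: the function name `standard_totals` is hypothetical, and it assumes the rounding convention used in this document (1 GB ≈ 1000 Mi).

```python
# Replica counts copied from the standard-service list above.
STANDARD_REPLICAS = {
    "auth-service": 3,
    "tenant-service": 2,
    "inventory-service": 2,
    "recipes-service": 2,
    "suppliers-service": 2,
    "orders-service": 3,
    "sales-service": 2,
    "pos-service": 2,
    "production-service": 2,
    "procurement-service": 2,
    "orchestrator-service": 2,
    "external-service": 2,
    "ai-insights-service": 2,
    "alert-processor": 3,
}

# Per-pod figures for a standard service: (request, limit).
RAM_MI = (256, 512)
CPU_MILLI = (100, 500)


def standard_totals(replicas):
    """Sum requests and limits over all pods (assumption: 1 GB ~ 1000 Mi)."""
    pods = sum(replicas.values())
    return {
        "pods": pods,
        "ram_request_gb": pods * RAM_MI[0] / 1000,
        "ram_limit_gb": pods * RAM_MI[1] / 1000,
        "cpu_request_cores": pods * CPU_MILLI[0] / 1000,
        "cpu_limit_cores": pods * CPU_MILLI[1] / 1000,
    }


totals = standard_totals(STANDARD_REPLICAS)
for key, value in totals.items():
    print(f"{key}: {value}")
```

Changing a replica count in the dictionary and re-running the script shows how each service contributes to the scheduled total.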
#### ML/Heavy Services (2 services) and Notification Service
**Training Service** (2 replicas):
- Request: 512Mi RAM, 200m CPU
- Limit: 4Gi RAM, 2000m CPU
- Special storage: 10Gi PVC for models, 4Gi temp storage
**Forecasting Service** (3 replicas) *with HPA 1-3*:
- Request: 512Mi RAM, 200m CPU
- Limit: 1Gi RAM, 1000m CPU
**Notification Service** (3 replicas) *with HPA 1-3*:
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 500m CPU
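**ML services total** (Training + Forecasting):
- RAM requests: ~2.6 GB
- RAM limits: ~11.3 GB
- CPU requests: ~1 core
- CPU limits: ~7 cores
**Notification Service total**:
- RAM requests: ~0.8 GB
- RAM limits: ~1.5 GB
- CPU requests: ~0.3 cores
- CPU limits: ~1.5 cores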
### 2. Databases (18 PostgreSQL instances)
Each database:
- **Request**: 256Mi RAM, 100m CPU
- **Limit**: 512Mi RAM, 500m CPU
- **Storage**: 2Gi PVC each
- **Production replicas**: 1 per database
**Total for databases**: 18 instances
- RAM requests: ~4.6 GB
- RAM limits: ~9.2 GB
- CPU requests: ~1.8 cores
- CPU limits: ~9 cores
- Storage: 36 GB
### 3. Infrastructure Services
**Redis** (1 instance):
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 500m CPU
- Storage: 1Gi PVC
- TLS enabled
**RabbitMQ** (1 instance):
- Request: 512Mi RAM, 200m CPU
- Limit: 1Gi RAM, 1000m CPU
- Storage: 2Gi PVC
**Infrastructure total**:
- RAM requests: ~0.8 GB
- RAM limits: ~1.5 GB
- CPU requests: ~0.3 cores
- CPU limits: ~1.5 cores
- Storage: 3 GB
### 4. Gateway & Frontend
**Gateway** (3 replicas):
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 500m CPU
**Frontend** (2 replicas):
- Request: 512Mi RAM, 250m CPU
- Limit: 1Gi RAM, 500m CPU
**Total**:
- RAM requests: ~1.8 GB
- RAM limits: ~3.5 GB
- CPU requests: ~0.8 cores
- CPU limits: ~2.5 cores
### 5. Monitoring Stack (Optional but Recommended)
**Prometheus**:
- Request: 1Gi RAM, 500m CPU
- Limit: 2Gi RAM, 1000m CPU
- Storage: 20Gi PVC
- Retention: 200h
**Grafana**:
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 200m CPU
- Storage: 5Gi PVC
**Jaeger**:
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 200m CPU
**Monitoring total**:
- RAM requests: ~1.5 GB
- RAM limits: ~3 GB
- CPU requests: ~0.7 cores
- CPU limits: ~1.4 cores
- Storage: 25 GB
### 6. External Services (Optional in Production)
**Nominatim** (Disabled by default - can use external geocoding API):
- If enabled: 2Gi/1 CPU request, 4Gi/2 CPU limit
- Storage: 70Gi (50Gi data + 20Gi flatnode)
- **Recommendation**: Use external geocoding service (Google Maps API, Mapbox) for pilot to save resources
---
## Total Resource Summary
### With Monitoring, Without Nominatim (Recommended)
| Resource | Requests | Limits | Recommended VPS |
|----------|----------|--------|-----------------|
| **RAM** | ~20 GB | ~46 GB | **20 GB** |
| **CPU** | ~8 cores | ~38 cores | **8 vCPU** |
| **Storage** | ~74 GB | - | **200 GB NVMe** |
### Memory Calculation Details
- Application services: 11.3 GB requests / 28.7 GB limits (31 standard pods plus Training, Forecasting, and Notification)
- Databases: 4.6 GB requests / 9.2 GB limits
- Infrastructure: 0.8 GB requests / 1.5 GB limits
- Gateway/Frontend: 1.8 GB requests / 3.5 GB limits
- Monitoring: 1.5 GB requests / 3 GB limits
- **Total requests**: ~20.0 GB
- **Total limits**: ~45.9 GB
### Why 20 GB RAM is Sufficient
1. **Requests vs Limits**: Kubernetes schedules pods by their requests, not their limits, so only the requests have to fit on the node. Our total requests (~20.0 GB with every service at its maximum replica count) fit in 20 GB because:
- HPA-enabled services (orders, forecasting, notification) start at 1 replica, which lowers total requests by about 2 GB
- Not all services will use their full requests simultaneously during the pilot
- Some overhead is included in our calculations
2. **Actual Usage**: Production limits are safety margins, not expected consumption. For 10 tenants:
- Most services use 40-60% of their limits under normal load
- Pilot traffic is significantly lower than peak design capacity
3. **Cost-Effective Pilot**: Starting with 20 GB allows:
- Room for monitoring and logging
- Comfortable headroom (15-25%)
- Easy vertical scaling if needed
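
Since the scheduler places pods by their requests, the argument above can be checked numerically. The sketch below is illustrative only: the names `hpa_savings_gb`, `fits_on_node`, and the `reserved_gb` parameter (memory set aside for the OS and Kubernetes components) are assumptions, and the per-pod requests are taken from the service definitions earlier in this document.

```python
# HPA-enabled services: listed replica count, HPA minimum, and per-pod
# memory request in Mi (figures from the service definitions above).
HPA_SERVICES = {
    "orders-service": {"max_replicas": 3, "min_replicas": 1, "request_mi": 256},
    "forecasting-service": {"max_replicas": 3, "min_replicas": 1, "request_mi": 512},
    "notification-service": {"max_replicas": 3, "min_replicas": 1, "request_mi": 256},
}


def hpa_savings_gb(services):
    """Memory requests avoided while HPA services sit at their minimum.

    Assumption: 1 GB ~ 1000 Mi, matching the rounding used in this document.
    """
    saved_mi = sum(
        (s["max_replicas"] - s["min_replicas"]) * s["request_mi"]
        for s in services.values()
    )
    return saved_mi / 1000


def fits_on_node(total_requests_gb, node_gb, reserved_gb=0.0):
    """True if the requests fit in node memory minus the reserved share.

    reserved_gb is a hypothetical allowance for the OS and kubelet.
    """
    return total_requests_gb <= node_gb - reserved_gb


print(f"HPA savings at minimum replicas: {hpa_savings_gb(HPA_SERVICES)} GB")
```

To test a scenario, subtract `hpa_savings_gb(HPA_SERVICES)` from the total requests in Memory Calculation Details and pass the result to `fits_on_node` together with the VPS size and a reserve you consider realistic.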
### CPU Calculation Details
- Application services: 4.4 cores requests / 24 cores limits (31 standard pods plus Training, Forecasting, and Notification)
- Databases: 1.8 cores requests / 9 cores limits
- Infrastructure: 0.3 cores requests / 1.5 cores limits
- Gateway/Frontend: 0.8 cores requests / 2.5 cores limits
- Monitoring: 0.7 cores requests / 1.4 cores limits
- **Total requests**: ~8.0 cores
- **Total limits**: ~38.4 cores
### Storage Calculation
- Databases: 36 GB (18 × 2Gi)
- Model storage: 10 GB
- Infrastructure (Redis, RabbitMQ): 3 GB
- Monitoring: 25 GB
- OS and container images: ~30 GB
- Growth buffer: ~95 GB
- **Total**: ~199 GB → **200 GB NVMe recommended**
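
The storage budget is a plain sum of the items above. As a quick check, here is a short sketch using only the figures from this list; the dictionary keys and variable names are illustrative.

```python
# Storage items in GB, copied from the list above.
STORAGE_GB = {
    "databases (18 x 2Gi)": 36,
    "model storage": 10,
    "infrastructure (Redis, RabbitMQ)": 3,
    "monitoring": 25,
    "OS and container images": 30,
    "growth buffer": 95,
}

# Items that are not backed by persistent volume claims.
NON_PVC_ITEMS = {"OS and container images", "growth buffer"}

total_gb = sum(STORAGE_GB.values())
persistent_gb = sum(
    size for name, size in STORAGE_GB.items() if name not in NON_PVC_ITEMS
)

print(f"Persistent volumes: {persistent_gb} GB")
print(f"Total budget: {total_gb} GB")
```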
---
## Scaling Considerations
### Horizontal Pod Autoscaling (HPA)
Already configured for:
1. **orders-service**: 1-3 replicas based on CPU (70%) and memory (80%)
2. **forecasting-service**: 1-3 replicas based on CPU (70%) and memory (75%)
3. **notification-service**: 1-3 replicas based on CPU (70%) and memory (80%)
These services will automatically scale up under load without manual intervention.
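
For intuition, the Kubernetes HPA computes the desired replica count as `ceil(currentReplicas × currentMetric / targetMetric)`, takes the largest result when several metrics are configured, and clamps it to the configured minimum and maximum. The sketch below applies that rule to the targets listed above. It is a simplification: the function names are hypothetical, and it ignores the controller's tolerance band and stabilization windows.

```python
import math


def desired_replicas(current, usage_pct, target_pct, min_replicas=1, max_replicas=3):
    """Apply the HPA scaling rule for one metric, clamped to [min, max]."""
    raw = math.ceil(current * usage_pct / target_pct)
    return max(min_replicas, min(max_replicas, raw))


def hpa_decision(current, cpu_pct, mem_pct, cpu_target=70, mem_target=80):
    """With several metrics, the HPA uses the largest proposed replica count."""
    return max(
        desired_replicas(current, cpu_pct, cpu_target),
        desired_replicas(current, mem_pct, mem_target),
    )


# orders-service at 1 replica, CPU at 140% of its request, memory at 40%.
print(hpa_decision(current=1, cpu_pct=140, mem_pct=40))
```

For forecasting-service, pass `mem_target=75` to match its configured memory threshold.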
### Growth Path for 6-12 Months
If tenant count grows beyond 10:
| Tenants | RAM | CPU | Storage |
|---------|-----|-----|---------|
| 10 | 20 GB | 8 cores | 200 GB |
| 25 | 32 GB | 12 cores | 300 GB |
| 50 | 48 GB | 16 cores | 500 GB |
| 100+ | Consider a multi-node Kubernetes cluster | - | - |
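
The growth table can be expressed as a small lookup for capacity-planning scripts. This is a sketch under one stated assumption: the table does not define a tier between 50 and 100 tenants, so the hypothetical `recommended_tier` function returns `None` for anything above 50 to signal that the single-VPS tiers no longer apply.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    max_tenants: int
    ram_gb: int
    cpu_cores: int
    storage_gb: int


# Single-VPS tiers from the growth table above.
TIERS = [
    Tier(10, 20, 8, 200),
    Tier(25, 32, 12, 300),
    Tier(50, 48, 16, 500),
]


def recommended_tier(tenants: int) -> Optional[Tier]:
    """Return the smallest tier covering the tenant count, or None beyond 50."""
    if tenants < 1:
        raise ValueError("tenants must be at least 1")
    for tier in TIERS:
        if tenants <= tier.max_tenants:
            return tier
    return None  # outside the table: evaluate a multi-node cluster


print(recommended_tier(18))
```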
### Vertical Scaling
If you hit resource limits before adding more tenants:
1. Upgrade RAM first (most common bottleneck)
2. Then CPU if services show high utilization
3. Storage can be expanded independently
---
## Cost Optimization Strategies
### For Pilot Phase (Months 1-6)
1. **Disable Nominatim**: Use external geocoding API
- Saves: 70 GB storage, 2 GB RAM, 1 CPU core
- Cost: ~$5-10/month for external API (Google Maps, Mapbox)
- **Recommendation**: Enable Nominatim only if >50 tenants
2. **Start Without Monitoring**: Add later if needed
- Saves: 25 GB storage, 1.5 GB RAM, 0.7 CPU cores
- **Not recommended** - monitoring is crucial for production
3. **Reduce Database Replicas**: Keep at 1 per service
- Already configured in base
- **Acceptable risk** for pilot phase
### After Pilot Success (Months 6+)
1. **Enable full HA**: Increase database replicas to 2
2. **Add Nominatim**: If external API costs exceed $20/month
3. **Upgrade VPS**: To 32 GB RAM / 12 cores for 25+ tenants
---
## Network and Additional Requirements
### Bandwidth
- Estimated: 2-5 TB/month for 10 tenants
- Includes: API traffic, frontend assets, image uploads, reports
### Backup Strategy
- Database backups: ~10 GB/day (compressed)
- Retention: 30 days
- Additional storage: 300 GB for backups (separate volume recommended)
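
The backup volume size follows from the daily backup size multiplied by the retention period. A minimal sketch, assuming full daily backups with no deduplication (the function name `backup_storage_gb` is hypothetical):

```python
def backup_storage_gb(daily_gb: float, retention_days: int) -> float:
    """Storage needed to keep one backup per day for the retention period."""
    if daily_gb < 0 or retention_days < 0:
        raise ValueError("inputs must be non-negative")
    return daily_gb * retention_days


# Figures from this section: ~10 GB/day kept for 30 days.
print(backup_storage_gb(daily_gb=10, retention_days=30))
```

Re-run it with measured backup sizes once the pilot is live, since the 10 GB/day figure is an estimate.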
### Domain & SSL
- 1 domain: `yourdomain.com`
- SSL: Let's Encrypt (free) or wildcard certificate
- Ingress controller: nginx (included in stack)
---
## Deployment Checklist
### Pre-Deployment
- [ ] VPS provisioned with 20 GB RAM, 8 cores, 200 GB NVMe
- [ ] Docker and Kubernetes (k3s or similar) installed
- [ ] Domain DNS configured
- [ ] SSL certificates ready
### Initial Deployment
- [ ] Deploy with `skaffold run -p prod`
- [ ] Verify all pods running: `kubectl get pods -n bakery-ia`
- [ ] Check PVC status: `kubectl get pvc -n bakery-ia`
- [ ] Access frontend and test login
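
The pod check can be automated by reading the JSON output of `kubectl get pods`. The sketch below is one possible approach, not part of the project's tooling: `unhealthy_pods` and `fetch_pods` are hypothetical helper names, and a pod in the `Running` phase is not necessarily ready, so treat the result as a first-pass filter.

```python
import json
import subprocess

# Phases treated as healthy; "Succeeded" covers completed Jobs.
OK_PHASES = {"Running", "Succeeded"}


def unhealthy_pods(pod_list: dict) -> list:
    """Return (name, phase) for pods whose phase is not in OK_PHASES."""
    result = []
    for item in pod_list.get("items", []):
        name = item["metadata"]["name"]
        phase = item.get("status", {}).get("phase", "Unknown")
        if phase not in OK_PHASES:
            result.append((name, phase))
    return result


def fetch_pods(namespace: str = "bakery-ia") -> dict:
    """Run kubectl and parse its JSON output (requires cluster access)."""
    completed = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)
```

On a machine with cluster access, `unhealthy_pods(fetch_pods())` returns an empty list when every pod is running or has completed.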
### Post-Deployment Monitoring
- [ ] Set up external monitoring (UptimeRobot, Pingdom)
- [ ] Configure backup schedule
- [ ] Test database backups and restore
- [ ] Load test with simulated tenant traffic
---
## Support and Scaling
### When to Scale Up
Monitor these metrics:
1. **RAM usage consistently >80%** → Upgrade RAM
2. **CPU usage consistently >70%** → Upgrade CPU
3. **Storage >150 GB used** → Upgrade storage
4. **Response times >2 seconds** → Add replicas or upgrade VPS
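
These thresholds translate directly into a check that could run against collected metrics. The sketch below is illustrative: the function name `scaling_actions` and its parameters are assumptions, and it evaluates a single sample, whereas the thresholds above refer to sustained values, so feed it averages over a suitable window.

```python
def scaling_actions(ram_pct, cpu_pct, storage_used_gb, response_time_s):
    """Map current metrics to the scale-up actions listed above."""
    actions = []
    if ram_pct > 80:
        actions.append("Upgrade RAM")
    if cpu_pct > 70:
        actions.append("Upgrade CPU")
    if storage_used_gb > 150:
        actions.append("Upgrade storage")
    if response_time_s > 2:
        actions.append("Add replicas or upgrade VPS")
    return actions


# Example: memory pressure and slow responses, CPU and disk within bounds.
print(scaling_actions(ram_pct=85, cpu_pct=55, storage_used_gb=120, response_time_s=2.4))
```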
### Emergency Scaling
If you hit limits suddenly:
1. Scale down non-critical services temporarily
2. Disable monitoring temporarily (not recommended for >1 hour)
3. Increase VPS resources (clouding.io allows live upgrades)
4. Review and optimize resource-heavy queries
---
## Conclusion
The recommended **20 GB RAM / 8 vCPU / 200 GB NVMe** configuration provides:
✅ Comfortable headroom for 10-tenant pilot
✅ Full monitoring and observability
✅ Multiple replicas for critical application services
✅ Room for traffic spikes (2-3x baseline)
✅ Cost-effective starting point
✅ Easy scaling path as you grow
**Total estimated compute cost**: €40-80/month (check clouding.io current pricing)
**Additional costs**: Domain (~€15/year), external APIs (~€10/month), backups (~€10/month)
**Next steps**:
1. Provision VPS at clouding.io
2. Follow deployment guide in `/docs/DEPLOYMENT.md`
3. Monitor resource usage for first 2 weeks
4. Adjust based on actual metrics