# VPS Sizing for Production Deployment

## Executive Summary

This document provides detailed resource requirements for deploying the Bakery IA platform to a production VPS environment at **clouding.io** for a **10-tenant pilot program** during the first 6 months.

### Recommended VPS Configuration

```
RAM: 20 GB
Processor: 8 vCPU cores
SSD NVMe (Triple Replica): 200 GB
```

**Estimated Monthly Cost**: Contact clouding.io for current pricing

---

## Resource Analysis

### 1. Application Services (18 Microservices)

#### Standard Services (14 services)

Each service is configured with:
- **Request**: 256Mi RAM, 100m CPU
- **Limit**: 512Mi RAM, 500m CPU
- **Production replicas**: 2-3 per service (from prod overlay)

Services:
- auth-service (3 replicas)
- tenant-service (2 replicas)
- inventory-service (2 replicas)
- recipes-service (2 replicas)
- suppliers-service (2 replicas)
- orders-service (3 replicas) *with HPA 1-3*
- sales-service (2 replicas)
- pos-service (2 replicas)
- production-service (2 replicas)
- procurement-service (2 replicas)
- orchestrator-service (2 replicas)
- external-service (2 replicas)
- ai-insights-service (2 replicas)
- alert-processor (3 replicas)

**Total for standard services**: ~39 pods
- RAM requests: ~10 GB
- RAM limits: ~20 GB
- CPU requests: ~3.9 cores
- CPU limits: ~19.5 cores

#### ML/Heavy Services (2 services)

**Training Service** (2 replicas):
- Request: 512Mi RAM, 200m CPU
- Limit: 4Gi RAM, 2000m CPU
- Special storage: 10Gi PVC for models, 4Gi temp storage

**Forecasting Service** (3 replicas) *with HPA 1-3*:
- Request: 512Mi RAM, 200m CPU
- Limit: 1Gi RAM, 1000m CPU

**Notification Service** (3 replicas) *with HPA 1-3*:
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 500m CPU

**ML services total** (Training and Forecasting):
- RAM requests: ~2.5 GB
- RAM limits: ~11 GB
- CPU requests: ~1 core
- CPU limits: ~7 cores

### 2. Databases (18 PostgreSQL instances)

Each database:
- **Request**: 256Mi RAM, 100m CPU
- **Limit**: 512Mi RAM, 500m CPU
- **Storage**: 2Gi PVC each
- **Production replicas**: 1 per database

**Total for databases**: 18 instances
- RAM requests: ~4.6 GB
- RAM limits: ~9.2 GB
- CPU requests: ~1.8 cores
- CPU limits: ~9 cores
- Storage: 36 GB

### 3. Infrastructure Services

**Redis** (1 instance):
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 500m CPU
- Storage: 1Gi PVC
- TLS enabled

**RabbitMQ** (1 instance):
- Request: 512Mi RAM, 200m CPU
- Limit: 1Gi RAM, 1000m CPU
- Storage: 2Gi PVC

**Infrastructure total**:
- RAM requests: ~0.8 GB
- RAM limits: ~1.5 GB
- CPU requests: ~0.3 cores
- CPU limits: ~1.5 cores
- Storage: 3 GB

### 4. Gateway & Frontend

**Gateway** (3 replicas):
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 500m CPU

**Frontend** (2 replicas):
- Request: 512Mi RAM, 250m CPU
- Limit: 1Gi RAM, 500m CPU

**Total**:
- RAM requests: ~1.8 GB
- RAM limits: ~3.5 GB
- CPU requests: ~0.8 cores
- CPU limits: ~2.5 cores

### 5. Monitoring Stack (Optional but Recommended)

**Prometheus**:
- Request: 1Gi RAM, 500m CPU
- Limit: 2Gi RAM, 1000m CPU
- Storage: 20Gi PVC
- Retention: 200h

**Grafana**:
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 200m CPU
- Storage: 5Gi PVC

**Jaeger**:
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 200m CPU

**Monitoring total**:
- RAM requests: ~1.5 GB
- RAM limits: ~3 GB
- CPU requests: ~0.7 cores
- CPU limits: ~1.4 cores
- Storage: 25 GB

### 6. External Services (Optional in Production)

**Nominatim** (Disabled by default - can use external geocoding API):
- If enabled: 2Gi/1 CPU request, 4Gi/2 CPU limit
- Storage: 70Gi (50Gi data + 20Gi flatnode)
- **Recommendation**: Use external geocoding service (Google Maps API, Mapbox) for pilot to save resources

---

## Total Resource Summary

### With Monitoring, Without Nominatim (Recommended)

| Resource | Requests | Limits | Recommended VPS |
|----------|----------|--------|-----------------|
| **RAM** | ~21 GB | ~48 GB | **20 GB** |
| **CPU** | ~8.5 cores | ~41 cores | **8 vCPU** |
| **Storage** | ~79 GB | - | **200 GB NVMe** |

### Memory Calculation Details

- Application services (standard + ML): 12.5 GB requests / 31 GB limits
- Databases: 4.6 GB requests / 9.2 GB limits
- Infrastructure: 0.8 GB requests / 1.5 GB limits
- Gateway/Frontend: 1.8 GB requests / 3.5 GB limits
- Monitoring: 1.5 GB requests / 3 GB limits
- **Total requests**: ~21 GB
- **Total limits**: ~48 GB

### Why 20 GB RAM is Sufficient

1. **Requests vs Limits**: Kubernetes schedules pods by their requests, not their limits. Our total requests (~21 GB) fit in 20 GB because:
   - HPA-enabled services (orders, forecasting, notification) start at 1 replica instead of the 3 counted above, which removes about 2 GB of requests (2 × 256Mi + 2 × 512Mi + 2 × 256Mi)
   - Not all services will run at their request levels simultaneously during pilot
   - Some overhead is already included in our calculations

2. **Actual Usage**: Production limits are safety margins. Real usage for 10 tenants is expected to stay well below them:
   - Most services use 40-60% of their limits under normal load
   - Pilot traffic is significantly lower than peak design capacity

3. **Cost-Effective Pilot**: Starting with 20 GB allows:
   - Room for monitoring and logging
   - Comfortable headroom (15-25%)
   - Easy vertical scaling if needed

### CPU Calculation Details

- Application services (standard + ML): 4.9 cores requests / 26.5 cores limits
- Databases: 1.8 cores requests / 9 cores limits
- Infrastructure: 0.3 cores requests / 1.5 cores limits
- Gateway/Frontend: 0.8 cores requests / 2.5 cores limits
- Monitoring: 0.7 cores requests / 1.4 cores limits
- **Total requests**: ~8.5 cores
- **Total limits**: ~41 cores

### Storage Calculation

- Databases: 36 GB (18 × 2Gi)
- Model storage: 10 GB
- Infrastructure (Redis, RabbitMQ): 3 GB
- Monitoring: 25 GB
- OS and container images: ~30 GB
- Growth buffer: ~95 GB
- **Total**: ~199 GB → **200 GB NVMe recommended**

---

## Scaling Considerations

### Horizontal Pod Autoscaling (HPA)

Already configured for:
1. **orders-service**: 1-3 replicas based on CPU (70%) and memory (80%)
2. **forecasting-service**: 1-3 replicas based on CPU (70%) and memory (75%)
3. **notification-service**: 1-3 replicas based on CPU (70%) and memory (80%)

These services will automatically scale up under load without manual intervention.

### Growth Path for 6-12 Months

If tenant count grows beyond 10:

| Tenants | RAM | CPU | Storage |
|---------|-----|-----|---------|
| 10 | 20 GB | 8 cores | 200 GB |
| 25 | 32 GB | 12 cores | 300 GB |
| 50 | 48 GB | 16 cores | 500 GB |
| 100+ | Consider Kubernetes cluster with multiple nodes |

### Vertical Scaling

If you hit resource limits before adding more tenants:
1. Upgrade RAM first (most common bottleneck)
2. Then CPU if services show high utilization
3. Storage can be expanded independently

---

## Cost Optimization Strategies

### For Pilot Phase (Months 1-6)

1. **Disable Nominatim**: Use external geocoding API
   - Saves: 70 GB storage, 2 GB RAM, 1 CPU core
   - Cost: ~$5-10/month for external API (Google Maps, Mapbox)
   - **Recommendation**: Enable Nominatim only if >50 tenants

2. **Start Without Monitoring**: Add later if needed
   - Saves: 25 GB storage, 1.5 GB RAM, 0.7 CPU cores
   - **Not recommended** - monitoring is crucial for production

3. **Reduce Database Replicas**: Keep at 1 per service
   - Already configured in base
   - **Acceptable risk** for pilot phase

### After Pilot Success (Months 6+)

1. **Enable full HA**: Increase database replicas to 2
2. **Add Nominatim**: If external API costs exceed $20/month
3. **Upgrade VPS**: To 32 GB RAM / 12 cores for 25+ tenants

---

## Network and Additional Requirements

### Bandwidth

- Estimated: 2-5 TB/month for 10 tenants
- Includes: API traffic, frontend assets, image uploads, reports

### Backup Strategy

- Database backups: ~10 GB/day (compressed)
- Retention: 30 days
- Additional storage: 300 GB for backups (separate volume recommended)

### Domain & SSL

- 1 domain: `yourdomain.com`
- SSL: Let's Encrypt (free) or wildcard certificate
- Ingress controller: nginx (included in stack)

---

## Deployment Checklist

### Pre-Deployment

- [ ] VPS provisioned with 20 GB RAM, 8 cores, 200 GB NVMe
- [ ] Docker and Kubernetes (k3s or similar) installed
- [ ] Domain DNS configured
- [ ] SSL certificates ready

### Initial Deployment

- [ ] Deploy with `skaffold run -p prod`
- [ ] Verify all pods running: `kubectl get pods -n bakery-ia`
- [ ] Check PVC status: `kubectl get pvc -n bakery-ia`
- [ ] Access frontend and test login

### Post-Deployment Monitoring

- [ ] Set up external monitoring (UptimeRobot, Pingdom)
- [ ] Configure backup schedule
- [ ] Test database backups and restore
- [ ] Load test with simulated tenant traffic

---

## Support and Scaling

### When to Scale Up

Monitor these metrics:
1. **RAM usage consistently >80%** → Upgrade RAM
2. **CPU usage consistently >70%** → Upgrade CPU
3. **Storage >150 GB used** → Upgrade storage
4. **Response times >2 seconds** → Add replicas or upgrade VPS

### Emergency Scaling

If you hit limits suddenly:
1. Scale down non-critical services temporarily
2. Disable monitoring temporarily (not recommended for >1 hour)
3. Increase VPS resources (clouding.io allows live upgrades)
4. Review and optimize resource-heavy queries

---

## Conclusion

The recommended **20 GB RAM / 8 vCPU / 200 GB NVMe** configuration provides:

- ✅ Comfortable headroom for 10-tenant pilot
- ✅ Full monitoring and observability
- ✅ High availability for critical services
- ✅ Room for traffic spikes (2-3x baseline)
- ✅ Cost-effective starting point
- ✅ Easy scaling path as you grow

**Total estimated compute cost**: €40-80/month (check clouding.io current pricing)

**Additional costs**: Domain (~€15/year), external APIs (~€10/month), backups (~€10/month)

**Next steps**:
1. Provision VPS at clouding.io
2. Follow deployment guide in `/docs/DEPLOYMENT.md`
3. Monitor resource usage for first 2 weeks
4. Adjust based on actual metrics
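
To make "adjust based on actual metrics" concrete, the thresholds from "When to Scale Up" can be expressed as a small check. The following is a minimal sketch, not part of the Bakery IA stack: the `Sample` shape and the `scale_up_actions` function are hypothetical names chosen for illustration, and "consistently" is assumed to mean "true for every sample in the observation window".

```python
from dataclasses import dataclass
from typing import List

# Thresholds copied from the "When to Scale Up" list in this document.
RAM_PCT_LIMIT = 80.0
CPU_PCT_LIMIT = 70.0
STORAGE_GB_LIMIT = 150.0
RESPONSE_S_LIMIT = 2.0


@dataclass
class Sample:
    """One observation of the VPS (hypothetical shape, for illustration only)."""
    ram_pct: float      # RAM in use, as a percentage of the total
    cpu_pct: float      # CPU in use, as a percentage of the total
    storage_gb: float   # disk space used, in GB
    response_s: float   # typical response time, in seconds


def scale_up_actions(samples: List[Sample]) -> List[str]:
    """Return the scale-up actions suggested by a window of samples."""
    if not samples:
        return []
    actions = []
    # Assumption: "consistently" means every sample in the window exceeds the limit.
    if all(s.ram_pct > RAM_PCT_LIMIT for s in samples):
        actions.append("Upgrade RAM")
    if all(s.cpu_pct > CPU_PCT_LIMIT for s in samples):
        actions.append("Upgrade CPU")
    # Storage and response time are checked against the most recent sample only.
    latest = samples[-1]
    if latest.storage_gb > STORAGE_GB_LIMIT:
        actions.append("Upgrade storage")
    if latest.response_s > RESPONSE_S_LIMIT:
        actions.append("Add replicas or upgrade VPS")
    return actions


if __name__ == "__main__":
    window = [
        Sample(ram_pct=85, cpu_pct=40, storage_gb=90, response_s=0.4),
        Sample(ram_pct=88, cpu_pct=75, storage_gb=92, response_s=0.5),
    ]
    print(scale_up_actions(window))  # ['Upgrade RAM']
```

In practice the samples would come from whatever the monitoring stack exports (for example, values read off Grafana dashboards). How long the window should be is a judgment call that this sketch leaves open.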