bakery-ia/docs/05-deployment/vps-sizing-production.md

# VPS Sizing for Production Deployment

## Executive Summary

This document provides detailed resource requirements for deploying the Bakery IA platform to a production VPS environment at **clouding.io** for a **10-tenant pilot program** during the first 6 months.

### Recommended VPS Configuration

```
RAM: 20 GB
Processor: 8 vCPU cores
SSD NVMe (Triple Replica): 200 GB
```

**Estimated Monthly Cost**: €40-80/month for compute (check clouding.io for current pricing)

---

## Resource Analysis

### 1. Application Services (18 Microservices)

#### Standard Services (14 services)
Each service configured with:
- **Request**: 256Mi RAM, 100m CPU
- **Limit**: 512Mi RAM, 500m CPU
- **Production replicas**: 2-3 per service (from prod overlay)

Services:
- auth-service (3 replicas)
- tenant-service (2 replicas)
- inventory-service (2 replicas)
- recipes-service (2 replicas)
- suppliers-service (2 replicas)
- orders-service (3 replicas) *with HPA 1-3*
- sales-service (2 replicas)
- pos-service (2 replicas)
- production-service (2 replicas)
- procurement-service (2 replicas)
- orchestrator-service (2 replicas)
- external-service (2 replicas)
- ai-insights-service (2 replicas)
- alert-processor (3 replicas)

**Total for standard services**: 31 pods at the replica counts listed above
- RAM requests: ~7.75 GB
- RAM limits: ~15.5 GB
- CPU requests: ~3.1 cores
- CPU limits: ~15.5 cores
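
The totals for this group can be recomputed from the per-pod figures. The following is a minimal sketch that sums the replica counts listed above and multiplies by the per-pod request and limit. The dictionary and function names are illustrative, the replica counts are copied from this section rather than read from the manifests, and memory is converted from Mi to GiB.

```python
# Replica counts as listed in this section (not read from the manifests).
STANDARD_REPLICAS = {
    "auth-service": 3,
    "tenant-service": 2,
    "inventory-service": 2,
    "recipes-service": 2,
    "suppliers-service": 2,
    "orders-service": 3,
    "sales-service": 2,
    "pos-service": 2,
    "production-service": 2,
    "procurement-service": 2,
    "orchestrator-service": 2,
    "external-service": 2,
    "ai-insights-service": 2,
    "alert-processor": 3,
}

REQUEST_MI, LIMIT_MI = 256, 512        # memory per pod, in Mi
REQUEST_MCPU, LIMIT_MCPU = 100, 500    # CPU per pod, in millicores


def totals(replicas):
    """Aggregate requests and limits for pods that share one resource profile."""
    pods = sum(replicas.values())
    return {
        "pods": pods,
        "ram_request_gib": pods * REQUEST_MI / 1024,
        "ram_limit_gib": pods * LIMIT_MI / 1024,
        "cpu_request_cores": pods * REQUEST_MCPU / 1000,
        "cpu_limit_cores": pods * LIMIT_MCPU / 1000,
    }


standard_totals = totals(STANDARD_REPLICAS)
print(standard_totals)
```

Re-running this after any change to the prod overlay keeps the section totals in step with the replica list.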

#### ML/Heavy Services (2 services)

**Training Service** (2 replicas):
- Request: 512Mi RAM, 200m CPU
- Limit: 4Gi RAM, 2000m CPU
- Special storage: 10Gi PVC for models, 4Gi temp storage

**Forecasting Service** (3 replicas) *with HPA 1-3*:
- Request: 512Mi RAM, 200m CPU
- Limit: 1Gi RAM, 1000m CPU

**Notification Service** (3 replicas) *with HPA 1-3*:
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 500m CPU

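**ML services total** (training and forecasting; notification-service is not included):
- RAM requests: ~2.5 GB
- RAM limits: ~11 GB
- CPU requests: ~1 core
- CPU limits: ~7 cores

At 3 replicas, notification-service adds ~0.75 GB RAM requests, ~1.5 GB RAM limits, 0.3 cores of CPU requests, and 1.5 cores of CPU limits.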

### 2. Databases (18 PostgreSQL instances)

Each database:
- **Request**: 256Mi RAM, 100m CPU
- **Limit**: 512Mi RAM, 500m CPU
- **Storage**: 2Gi PVC each
- **Production replicas**: 1 per database

**Total for databases**: 18 instances
- RAM requests: ~4.6 GB
- RAM limits: ~9.2 GB
- CPU requests: ~1.8 cores
- CPU limits: ~9 cores
- Storage: 36 GB

### 3. Infrastructure Services

**Redis** (1 instance):
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 500m CPU
- Storage: 1Gi PVC
- TLS enabled

**RabbitMQ** (1 instance):
- Request: 512Mi RAM, 200m CPU
- Limit: 1Gi RAM, 1000m CPU
- Storage: 2Gi PVC

**Infrastructure total**:
- RAM requests: ~0.8 GB
- RAM limits: ~1.5 GB
- CPU requests: ~0.3 cores
- CPU limits: ~1.5 cores
- Storage: 3 GB

### 4. Gateway & Frontend

**Gateway** (3 replicas):
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 500m CPU

**Frontend** (2 replicas):
- Request: 512Mi RAM, 250m CPU
- Limit: 1Gi RAM, 500m CPU

**Total**:
- RAM requests: ~1.8 GB
- RAM limits: ~3.5 GB
- CPU requests: ~0.8 cores
- CPU limits: ~2.5 cores

### 5. Monitoring Stack (Optional but Recommended)

**Prometheus**:
- Request: 1Gi RAM, 500m CPU
- Limit: 2Gi RAM, 1000m CPU
- Storage: 20Gi PVC
- Retention: 200h

**Grafana**:
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 200m CPU
- Storage: 5Gi PVC

**Jaeger**:
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 200m CPU

**Monitoring total**:
- RAM requests: ~1.5 GB
- RAM limits: ~3 GB
- CPU requests: ~0.7 cores
- CPU limits: ~1.4 cores
- Storage: 25 GB

### 6. External Services (Optional in Production)

**Nominatim** (disabled by default; an external geocoding API can be used instead):
- Request (if enabled): 2Gi RAM, 1 CPU
- Limit (if enabled): 4Gi RAM, 2 CPU
- Storage: 70Gi (50Gi data + 20Gi flatnode)
- **Recommendation**: Use external geocoding service (Google Maps API, Mapbox) for pilot to save resources

---

## Total Resource Summary

### With Monitoring, Without Nominatim (Recommended)

| Resource | Requests | Limits | Recommended VPS |
|----------|----------|--------|-----------------|
| **RAM** | ~22.8 GB | ~51.7 GB | **20 GB** |
| **CPU** | ~9.3 cores | ~42.9 cores | **8 vCPU** |
| **Storage** | ~79 GB | - | **200 GB NVMe** |

### Memory Calculation Details
- Application services: 14.1 GB requests / 34.5 GB limits
- Databases: 4.6 GB requests / 9.2 GB limits
- Infrastructure: 0.8 GB requests / 1.5 GB limits
- Gateway/Frontend: 1.8 GB requests / 3.5 GB limits
- Monitoring: 1.5 GB requests / 3 GB limits
- **Total requests**: ~22.8 GB
- **Total limits**: ~51.7 GB
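
The category figures above can be summed mechanically. This short sketch adds the request and limit columns and reports how far the limits overcommit the recommended 20 GB. The variable names are illustrative, and the per-category values are taken from the list above as given.

```python
# (requests, limits) in GB per category, copied from the list above.
MEMORY_GB = {
    "application": (14.1, 34.5),
    "databases": (4.6, 9.2),
    "infrastructure": (0.8, 1.5),
    "gateway_frontend": (1.8, 3.5),
    "monitoring": (1.5, 3.0),
}
VPS_RAM_GB = 20

total_requests = round(sum(req for req, _ in MEMORY_GB.values()), 1)
total_limits = round(sum(lim for _, lim in MEMORY_GB.values()), 1)

# Ratio of summed limits to physical RAM: how much memory is overcommitted
# if every container were to grow to its limit at the same time.
limit_overcommit = round(total_limits / VPS_RAM_GB, 1)

print(total_requests, total_limits, limit_overcommit)
```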

### Why 20 GB RAM is Sufficient

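1. **Requests vs Limits**: Kubernetes schedules pods on requests, not limits, and a pod whose request does not fit in the node's allocatable memory stays `Pending`. Total requests at full production replica counts (~22.8 GB) exceed 20 GB, so this sizing relies on the following:
   - HPA-enabled services (orders, forecasting, notification) start at 1 replica instead of 3, which removes about 2 GB of requests
   - Most services are expected to use less memory than they request during the pilot (this lowers actual usage, not the amount the scheduler reserves)
   - Some overhead is already included in our calculations

2. **Actual Usage**: Limits are safety margins, not expected consumption. For 10 tenants we expect:
   - Most services to use 40-60% of their limits under normal load
   - Pilot traffic to be significantly lower than peak design capacity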

3. **Cost-Effective Pilot**: Starting with 20 GB allows:
   - Room for monitoring and logging
   - Headroom of 15-25% over expected actual usage
   - Easy vertical scaling if needed
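
Whether the requests fit on the node can be checked with simple arithmetic before provisioning. The sketch below estimates how much memory the HPA-enabled services free up at their minimum replica count and compares the remaining requests with the node's RAM. It assumes the ~22.8 GB request total from this document, treats GB and GiB as interchangeable, and uses a hypothetical 1 GB system reserve for the OS and kubelet; the real allocatable figure depends on the Kubernetes distribution and its configuration.

```python
MI_PER_GIB = 1024

# service: (memory request per pod in Mi, production replicas, HPA minimum)
HPA_SERVICES = {
    "orders-service": (256, 3, 1),
    "forecasting-service": (512, 3, 1),
    "notification-service": (256, 3, 1),
}


def hpa_savings_gib(services):
    """Memory requests freed when each service runs at its HPA minimum."""
    freed_mi = sum(req * (prod - minimum) for req, prod, minimum in services.values())
    return freed_mi / MI_PER_GIB


def fits(total_requests_gib, node_ram_gib, system_reserved_gib):
    """True if summed requests fit in what remains after the system reserve."""
    return total_requests_gib <= node_ram_gib - system_reserved_gib


TOTAL_REQUESTS_GIB = 22.8      # from "Memory Calculation Details"
SYSTEM_RESERVED_GIB = 1.0      # assumption, not a measured value

requests_at_min = TOTAL_REQUESTS_GIB - hpa_savings_gib(HPA_SERVICES)
print(round(requests_at_min, 1), fits(requests_at_min, 20, SYSTEM_RESERVED_GIB))
```

If the check fails for the chosen VPS size, the options are to lower replica counts or requests in the prod overlay, or to provision more RAM.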

### CPU Calculation Details
- Application services: 5.7 cores requests / 28.5 cores limits
- Databases: 1.8 cores requests / 9 cores limits
- Infrastructure: 0.3 cores requests / 1.5 cores limits
- Gateway/Frontend: 0.8 cores requests / 2.5 cores limits
- Monitoring: 0.7 cores requests / 1.4 cores limits
- **Total requests**: ~9.3 cores
- **Total limits**: ~42.9 cores

### Storage Calculation
- Databases: 36 GB (18 × 2Gi)
- Model storage: 10 GB
- Infrastructure (Redis, RabbitMQ): 3 GB
- Monitoring: 25 GB
- OS and container images: ~30 GB
- Growth buffer: ~95 GB
- **Total**: ~199 GB → **200 GB NVMe recommended**
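
The storage total is a plain sum of the components above. This sketch reproduces it and separates the space that is allocated from day one from the growth buffer; the dictionary keys are illustrative names for the list entries.

```python
# Storage components in GB, copied from the list above.
STORAGE_GB = {
    "databases": 18 * 2,        # 18 PostgreSQL instances with a 2Gi PVC each
    "model_storage": 10,
    "infrastructure": 3,        # Redis and RabbitMQ
    "monitoring": 25,
    "os_and_images": 30,
    "growth_buffer": 95,
}

total_gb = sum(STORAGE_GB.values())
allocated_gb = total_gb - STORAGE_GB["growth_buffer"]

print(total_gb, allocated_gb)
```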

---

## Scaling Considerations

### Horizontal Pod Autoscaling (HPA)

Already configured for:
1. **orders-service**: 1-3 replicas based on CPU (70%) and memory (80%)
2. **forecasting-service**: 1-3 replicas based on CPU (70%) and memory (75%)
3. **notification-service**: 1-3 replicas based on CPU (70%) and memory (80%)

These services will automatically scale up under load without manual intervention.
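
To see how these targets translate into replica counts, the sketch below applies the scaling rule documented for the Kubernetes HPA: for each metric, `ceil(currentReplicas × currentValue / targetValue)`, taking the largest result and clamping it to the configured range. It is a simplified model that ignores the HPA's tolerance band and stabilization windows. Utilization here is a percentage of each pod's request, not its limit, and the function and variable names are illustrative.

```python
import math


def desired_replicas(current, utilization, targets, min_replicas=1, max_replicas=3):
    """Replica count the HPA rule would propose for the given utilization.

    utilization and targets map metric name -> percent of the pod's request.
    """
    proposals = [
        math.ceil(current * utilization[metric] / target)
        for metric, target in targets.items()
    ]
    # With several metrics, the largest proposal wins, then the range applies.
    return max(min_replicas, min(max_replicas, max(proposals)))


# Targets for orders-service as listed above.
ORDERS_TARGETS = {"cpu": 70, "memory": 80}

print(desired_replicas(1, {"cpu": 105, "memory": 60}, ORDERS_TARGETS))
```

Because utilization is measured against requests, raising a service's request without changing its load makes the HPA scale later.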

### Growth Path for 6-12 Months

If tenant count grows beyond 10:

| Tenants | RAM | CPU | Storage |
|---------|-----|-----|---------|
| 10 | 20 GB | 8 cores | 200 GB |
| 25 | 32 GB | 12 cores | 300 GB |
| 50 | 48 GB | 16 cores | 500 GB |
| 100+ | Consider a Kubernetes cluster with multiple nodes | - | - |

### Vertical Scaling

If you hit resource limits before adding more tenants:
1. Upgrade RAM first (most common bottleneck)
2. Then CPU if services show high utilization
3. Storage can be expanded independently

---

## Cost Optimization Strategies

### For Pilot Phase (Months 1-6)

1. **Disable Nominatim**: Use external geocoding API
   - Saves: 70 GB storage, 2 GB RAM, 1 CPU core
   - Cost: ~$5-10/month for external API (Google Maps, Mapbox)
   - **Recommendation**: Enable Nominatim only if >50 tenants

2. **Start Without Monitoring**: Add later if needed
   - Saves: 25 GB storage, 1.5 GB RAM, 0.7 CPU cores
   - **Not recommended** - monitoring is crucial for production

3. **Keep Database Replicas at 1**: No additional replicas per service during the pilot
   - Already configured in base
   - **Acceptable risk** for pilot phase

### After Pilot Success (Months 6+)

1. **Enable full HA**: Increase database replicas to 2
2. **Add Nominatim**: If external API costs exceed $20/month
3. **Upgrade VPS**: To 32 GB RAM / 12 cores for 25+ tenants

---

## Network and Additional Requirements

### Bandwidth
- Estimated: 2-5 TB/month for 10 tenants
- Includes: API traffic, frontend assets, image uploads, reports

### Backup Strategy
- Database backups: ~10 GB/day (compressed)
- Retention: 30 days
- Additional storage: 300 GB for backups (separate volume recommended)
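
The backup volume size follows from the daily backup size and the retention period. A minimal sketch of that calculation, assuming one full compressed backup per day with no deduplication or incremental backups (the function name is illustrative):

```python
def backup_volume_gb(daily_backup_gb, retention_days):
    """Storage needed to keep one backup per day for the whole retention period."""
    return daily_backup_gb * retention_days


required_gb = backup_volume_gb(daily_backup_gb=10, retention_days=30)
print(required_gb)
```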

### Domain & SSL
- 1 domain: `yourdomain.com`
- SSL: Let's Encrypt (free) or wildcard certificate
- Ingress controller: nginx (included in stack)

---

## Deployment Checklist

### Pre-Deployment
- [ ] VPS provisioned with 20 GB RAM, 8 cores, 200 GB NVMe
- [ ] Docker and Kubernetes (k3s or similar) installed
- [ ] Domain DNS configured
- [ ] SSL certificates ready

### Initial Deployment
- [ ] Deploy with `skaffold run -p prod`
- [ ] Verify all pods running: `kubectl get pods -n bakery-ia`
- [ ] Check PVC status: `kubectl get pvc -n bakery-ia`
- [ ] Access frontend and test login
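
The pod check can be automated by asking `kubectl` for JSON output (`kubectl get pods -n bakery-ia -o json`) and filtering on the pod phase. The sketch below parses such output and lists pods that are not running. The function name and the sample pod names are made up for illustration, and a `Running` phase does not by itself guarantee that readiness probes pass.

```python
import json


def pods_not_running(kubectl_json):
    """Return names of pods whose phase is neither Running nor Succeeded."""
    pods = json.loads(kubectl_json)["items"]
    return [
        pod["metadata"]["name"]
        for pod in pods
        if pod["status"].get("phase") not in ("Running", "Succeeded")
    ]


# Sample shaped like the output of `kubectl get pods -n bakery-ia -o json`.
# The pod names are hypothetical.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "auth-service-abc"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "training-service-xyz"}, "status": {"phase": "Pending"}},
    ]
})

print(pods_not_running(sample))
```

A `Pending` pod in this list is often a sign that its resource requests do not fit on the node.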

### Post-Deployment Monitoring
- [ ] Set up external monitoring (UptimeRobot, Pingdom)
- [ ] Configure backup schedule
- [ ] Test database backups and restore
- [ ] Load test with simulated tenant traffic

---

## Support and Scaling

### When to Scale Up

Monitor these metrics:
1. **RAM usage consistently >80%** → Upgrade RAM
2. **CPU usage consistently >70%** → Upgrade CPU
3. **Storage >150 GB used** → Upgrade storage
4. **Response times >2 seconds** → Add replicas or upgrade VPS
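
These thresholds can be encoded as a small check to run against monitoring data. The sketch below maps metric values to the actions listed above. It assumes the inputs are already averaged over a sustained window (the document says "consistently"), and the function and parameter names are illustrative.

```python
def scaling_actions(ram_pct, cpu_pct, storage_used_gb, response_time_s):
    """Map sustained metric values to the upgrade actions listed above."""
    actions = []
    if ram_pct > 80:
        actions.append("upgrade RAM")
    if cpu_pct > 70:
        actions.append("upgrade CPU")
    if storage_used_gb > 150:
        actions.append("upgrade storage")
    if response_time_s > 2:
        actions.append("add replicas or upgrade VPS")
    return actions


print(scaling_actions(ram_pct=85, cpu_pct=55, storage_used_gb=120, response_time_s=2.4))
```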

### Emergency Scaling

If you hit limits suddenly:
1. Scale down non-critical services temporarily
2. Disable monitoring temporarily (not recommended for >1 hour)
3. Increase VPS resources (clouding.io allows live upgrades)
4. Review and optimize resource-heavy queries

---

## Conclusion

The recommended **20 GB RAM / 8 vCPU / 200 GB NVMe** configuration provides:

- ✅ Comfortable headroom for 10-tenant pilot
- ✅ Full monitoring and observability
- ✅ Multiple replicas for critical services (databases remain single-instance during the pilot)
- ✅ Room for traffic spikes (2-3x baseline)
- ✅ Cost-effective starting point
- ✅ Easy scaling path as you grow

**Total estimated compute cost**: €40-80/month (check clouding.io current pricing)
**Additional costs**: Domain (~€15/year), external APIs (~€10/month), backups (~€10/month)

**Next steps**:
1. Provision VPS at clouding.io
2. Follow deployment guide in `/docs/DEPLOYMENT.md`
3. Monitor resource usage for first 2 weeks
4. Adjust based on actual metrics