bakery-ia/docs/05-deployment/vps-sizing-production.md

# VPS Sizing for Production Deployment

## Executive Summary

This document provides detailed resource requirements for deploying the Bakery IA platform to a production VPS environment at **clouding.io** for a **10-tenant pilot program** during the first 6 months.

### Recommended VPS Configuration

```
RAM: 20 GB
Processor: 8 vCPU cores
SSD NVMe (Triple Replica): 200 GB
```

**Estimated Monthly Cost**: Contact clouding.io for current pricing

---

## Resource Analysis

### 1. Application Services (18 Microservices)

#### Standard Services (14 services)
Each service configured with:
- **Request**: 256Mi RAM, 100m CPU
- **Limit**: 512Mi RAM, 500m CPU
- **Production replicas**: 2-3 per service (from prod overlay)

Services:
- auth-service (3 replicas)
- tenant-service (2 replicas)
- inventory-service (2 replicas)
- recipes-service (2 replicas)
- suppliers-service (2 replicas)
- orders-service (3 replicas) *with HPA 1-3*
- sales-service (2 replicas)
- pos-service (2 replicas)
- production-service (2 replicas)
- procurement-service (2 replicas)
- orchestrator-service (2 replicas)
- external-service (2 replicas)
- ai-insights-service (2 replicas)
- alert-processor (3 replicas)

**Total for standard services**: 31 pods (sum of the replica counts above)
- RAM requests: ~7.9 GB
- RAM limits: ~15.9 GB
- CPU requests: ~3.1 cores
- CPU limits: ~15.5 cores
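
The totals for the standard services follow directly from the replica counts and the per-pod values listed above. A minimal Python sketch of that arithmetic is shown here; the names `STANDARD_REPLICAS` and `aggregate` are illustrative, and Mi is converted to GB by dividing by 1000, which is the rounding the rest of this document appears to use.

```python
# Replica counts copied from the service list above.
STANDARD_REPLICAS = {
    "auth-service": 3,
    "tenant-service": 2,
    "inventory-service": 2,
    "recipes-service": 2,
    "suppliers-service": 2,
    "orders-service": 3,  # HPA 1-3, counted at its maximum
    "sales-service": 2,
    "pos-service": 2,
    "production-service": 2,
    "procurement-service": 2,
    "orchestrator-service": 2,
    "external-service": 2,
    "ai-insights-service": 2,
    "alert-processor": 3,
}

# Per-pod settings for a standard service (memory in Mi, CPU in millicores).
REQUEST_MI, LIMIT_MI = 256, 512
REQUEST_MCPU, LIMIT_MCPU = 100, 500


def aggregate(replicas):
    """Return the pod count plus aggregate memory (GB) and CPU (cores)."""
    pods = sum(replicas.values())
    return {
        "pods": pods,
        "ram_requests_gb": round(pods * REQUEST_MI / 1000, 1),
        "ram_limits_gb": round(pods * LIMIT_MI / 1000, 1),
        "cpu_requests_cores": round(pods * REQUEST_MCPU / 1000, 1),
        "cpu_limits_cores": round(pods * LIMIT_MCPU / 1000, 1),
    }


standard_totals = aggregate(STANDARD_REPLICAS)
for key, value in standard_totals.items():
    print(f"{key}: {value}")
```

Re-running this after any change to the prod overlay keeps the section totals in step with the replica counts.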

#### ML/Heavy and Notification Services (3 services)

**Training Service** (2 replicas):
- Request: 512Mi RAM, 200m CPU
- Limit: 4Gi RAM, 2000m CPU
- Special storage: 10Gi PVC for models, 4Gi temp storage

**Forecasting Service** (3 replicas) *with HPA 1-3*:
- Request: 512Mi RAM, 200m CPU
- Limit: 1Gi RAM, 1000m CPU

**Notification Service** (3 replicas) *with HPA 1-3*:
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 500m CPU

**Total for these three services** (at the listed replica counts):
- RAM requests: ~3.3 GB
- RAM limits: ~12.8 GB
- CPU requests: ~1.3 cores
- CPU limits: ~8.5 cores

### 2. Databases (18 PostgreSQL instances)

Each database:
- **Request**: 256Mi RAM, 100m CPU
- **Limit**: 512Mi RAM, 500m CPU
- **Storage**: 2Gi PVC each
- **Production replicas**: 1 per database

**Total for databases**: 18 instances
- RAM requests: ~4.6 GB
- RAM limits: ~9.2 GB
- CPU requests: ~1.8 cores
- CPU limits: ~9 cores
- Storage: 36 GB

### 3. Infrastructure Services

**Redis** (1 instance):
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 500m CPU
- Storage: 1Gi PVC
- TLS enabled

**RabbitMQ** (1 instance):
- Request: 512Mi RAM, 200m CPU
- Limit: 1Gi RAM, 1000m CPU
- Storage: 2Gi PVC

**Infrastructure total**:
- RAM requests: ~0.8 GB
- RAM limits: ~1.5 GB
- CPU requests: ~0.3 cores
- CPU limits: ~1.5 cores
- Storage: 3 GB

### 4. Gateway & Frontend

**Gateway** (3 replicas):
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 500m CPU

**Frontend** (2 replicas):
- Request: 512Mi RAM, 250m CPU
- Limit: 1Gi RAM, 500m CPU

**Total**:
- RAM requests: ~1.8 GB
- RAM limits: ~3.5 GB
- CPU requests: ~0.8 cores
- CPU limits: ~2.5 cores

### 5. Monitoring Stack (Optional but Recommended)

**Prometheus**:
- Request: 1Gi RAM, 500m CPU
- Limit: 2Gi RAM, 1000m CPU
- Storage: 20Gi PVC
- Retention: 200h

**Grafana**:
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 200m CPU
- Storage: 5Gi PVC

**Jaeger**:
- Request: 256Mi RAM, 100m CPU
- Limit: 512Mi RAM, 200m CPU

**Monitoring total**:
- RAM requests: ~1.5 GB
- RAM limits: ~3 GB
- CPU requests: ~0.7 cores
- CPU limits: ~1.4 cores
- Storage: 25 GB

### 6. External Services (Optional in Production)

**Nominatim** (Disabled by default - can use external geocoding API):
- If enabled: 2Gi/1 CPU request, 4Gi/2 CPU limit
- Storage: 70Gi (50Gi data + 20Gi flatnode)
- **Recommendation**: Use external geocoding service (Google Maps API, Mapbox) for pilot to save resources

---

## Total Resource Summary

### With Monitoring, Without Nominatim (Recommended)

| Resource | Requests | Limits | Recommended VPS |
|----------|----------|--------|-----------------|
| **RAM** | ~20 GB | ~46 GB | **20 GB** |
| **CPU** | ~8.0 cores | ~38 cores | **8 vCPU** |
| **Storage** | ~74 GB (PVCs) | - | **200 GB NVMe** |

Requests and limits are summed at the replica counts listed in the Resource Analysis, with the three HPA-enabled services counted at their maximum of 3 replicas.

### Memory Calculation Details
- Application services: 11.3 GB requests / 28.7 GB limits (31 standard pods plus training, forecasting, and notification)
- Databases: 4.6 GB requests / 9.2 GB limits
- Infrastructure: 0.8 GB requests / 1.5 GB limits
- Gateway/Frontend: 1.8 GB requests / 3.5 GB limits
- Monitoring: 1.5 GB requests / 3 GB limits
- **Total requests**: ~20.0 GB (~17.9 GB with HPA services at 1 replica)
- **Total limits**: ~45.9 GB

### Why 20 GB RAM is Sufficient

1. **Requests vs Limits**: Kubernetes schedules on requests, not on actual usage. The requests of all pods on the node must fit within its allocatable memory (the VPS total minus what the OS and Kubernetes components reserve), and pods that do not fit stay `Pending`. The pilot fits in 20 GB because:
   - HPA-enabled services (orders, forecasting, notification) start at 1 replica, which lowers total requests by ~2 GB to ~17.9 GB
   - Limits (~45.9 GB) are deliberately overcommitted; only requests are reserved on the node
   - Scaling all three HPA services to 3 replicas at once may exceed allocatable memory, so compare requests with the `Allocatable` values from `kubectl describe node` after deployment

2. **Actual Usage**: Production limits are safety margins. Expected usage for 10 tenants:
   - Most services use 40-60% of their limits under normal load (an estimate to validate with monitoring)
   - Pilot traffic is significantly lower than peak design capacity

3. **Cost-Effective Pilot**: Starting with 20 GB allows:
   - Room for monitoring and logging
   - An estimated 15-25% headroom over actual usage
   - Easy vertical scaling if needed

### CPU Calculation Details
- Application services: 4.4 cores requests / 24.0 cores limits (31 standard pods plus training, forecasting, and notification)
- Databases: 1.8 cores requests / 9 cores limits
- Infrastructure: 0.3 cores requests / 1.5 cores limits
- Gateway/Frontend: 0.8 cores requests / 2.5 cores limits
- Monitoring: 0.7 cores requests / 1.4 cores limits
- **Total requests**: ~8.0 cores (~7.2 cores with HPA services at 1 replica)
- **Total limits**: ~38.4 cores
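
Because the scheduler places pods by their requests, one way to sanity-check the 20 GB / 8 vCPU recommendation is to sum the requests at the HPA minimum and maximum replica counts and compare them with what the node can allocate. The sketch below does that with the per-pod values from the Resource Analysis. The reserved capacity (`RESERVED_MEM_MI`, `RESERVED_CPU_M`) is an assumption made for illustration, not a measured value, and Mi is treated as MB as elsewhere in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    min_replicas: int
    max_replicas: int
    mem_request_mi: int  # memory request per pod, in Mi
    cpu_request_m: int   # CPU request per pod, in millicores


# Replica counts and per-pod requests as listed in this document.
WORKLOADS = [
    Workload("standard services without HPA (13 services)", 28, 28, 256, 100),
    Workload("orders-service", 1, 3, 256, 100),
    Workload("training-service", 2, 2, 512, 200),
    Workload("forecasting-service", 1, 3, 512, 200),
    Workload("notification-service", 1, 3, 256, 100),
    Workload("postgresql databases", 18, 18, 256, 100),
    Workload("redis", 1, 1, 256, 100),
    Workload("rabbitmq", 1, 1, 512, 200),
    Workload("gateway", 3, 3, 256, 100),
    Workload("frontend", 2, 2, 512, 250),
    Workload("prometheus", 1, 1, 1024, 500),
    Workload("grafana", 1, 1, 256, 100),
    Workload("jaeger", 1, 1, 256, 100),
]

# ASSUMPTION: capacity held back for the OS, kubelet, and system pods.
# Replace with the real Allocatable values from `kubectl describe node`.
RESERVED_MEM_MI = 1500
RESERVED_CPU_M = 500

NODE_MEM_MI = 20_000  # 20 GB VPS
NODE_CPU_M = 8_000    # 8 vCPU


def total_requests(workloads, use_max):
    """Sum memory (Mi) and CPU (millicores) requests at min or max replicas."""
    mem_mi = cpu_m = 0
    for w in workloads:
        replicas = w.max_replicas if use_max else w.min_replicas
        mem_mi += replicas * w.mem_request_mi
        cpu_m += replicas * w.cpu_request_m
    return mem_mi, cpu_m


def fits(workloads, node_mem_mi, node_cpu_m, use_max):
    """True if the summed requests fit into the node minus the reserve."""
    mem_mi, cpu_m = total_requests(workloads, use_max)
    return (mem_mi <= node_mem_mi - RESERVED_MEM_MI
            and cpu_m <= node_cpu_m - RESERVED_CPU_M)


for label, use_max in (("HPA minimum", False), ("HPA maximum", True)):
    mem_mi, cpu_m = total_requests(WORKLOADS, use_max)
    ok = fits(WORKLOADS, NODE_MEM_MI, NODE_CPU_M, use_max)
    print(f"{label}: {mem_mi} Mi, {cpu_m}m -> {'fits' if ok else 'does not fit'}")
```

Under the assumed reserve, the plan fits at the HPA minimum but not with every HPA service at 3 replicas, which is why the real allocatable figures are worth checking early.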

### Storage Calculation
- Databases: 36 GB (18 × 2Gi)
- Model storage: 10 GB
- Infrastructure (Redis, RabbitMQ): 3 GB
- Monitoring: 25 GB
- OS and container images: ~30 GB
- Growth buffer: ~95 GB
- **Total**: ~199 GB → **200 GB NVMe recommended**
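
The storage budget is a plain sum, which makes it easy to re-run when a PVC size changes. A minimal sketch with the figures copied from the list above follows; the dictionary keys are only labels.

```python
# Storage budget in GB, copied from the list above.
STORAGE_GB = {
    "databases (18 x 2Gi)": 18 * 2,
    "model storage": 10,
    "infrastructure (Redis, RabbitMQ)": 3,
    "monitoring": 25,
    "OS and container images": 30,
    "growth buffer": 95,
}

PROVISIONED_GB = 200

total_gb = sum(STORAGE_GB.values())
# Everything except the OS/images allowance and the buffer is PVC-backed.
pvc_gb = (total_gb
          - STORAGE_GB["OS and container images"]
          - STORAGE_GB["growth buffer"])

print(f"planned: {total_gb} GB of {PROVISIONED_GB} GB provisioned")
print(f"PVC-backed: {pvc_gb} GB")
```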

---

## Scaling Considerations

### Horizontal Pod Autoscaling (HPA)

Already configured for:
1. **orders-service**: 1-3 replicas based on CPU (70%) and memory (80%)
2. **forecasting-service**: 1-3 replicas based on CPU (70%) and memory (75%)
3. **notification-service**: 1-3 replicas based on CPU (70%) and memory (80%)

These services scale up automatically under load without manual intervention, provided the node still has unrequested CPU and memory for the additional pods.
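
For reference, the Horizontal Pod Autoscaler derives its target from the ratio of observed to target utilization, `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`, where utilization is measured relative to the pod's request. With several metrics it uses the largest result. The sketch below applies that rule to the thresholds listed above. It is a simplification that omits the controller's tolerance band and stabilization windows, and the function names are illustrative.

```python
import math


def desired_replicas(current_replicas, current_pct, target_pct,
                     min_replicas=1, max_replicas=3):
    """Apply the HPA ratio rule for one metric, clamped to the HPA bounds."""
    raw = math.ceil(current_replicas * current_pct / target_pct)
    return max(min_replicas, min(max_replicas, raw))


def desired_for_metrics(current_replicas, observed, targets,
                        min_replicas=1, max_replicas=3):
    """With several metrics, the largest per-metric proposal is used."""
    return max(
        desired_replicas(current_replicas, observed[name], targets[name],
                         min_replicas, max_replicas)
        for name in targets
    )


# orders-service targets from the list above: CPU 70%, memory 80%.
ORDERS_TARGETS = {"cpu": 70, "memory": 80}

# One replica running at 105% of its CPU request and 60% of its memory request.
print(desired_for_metrics(1, {"cpu": 105, "memory": 60}, ORDERS_TARGETS))

# Two replicas at 140% CPU would ask for 4 pods, but the HPA maximum is 3.
print(desired_for_metrics(2, {"cpu": 140, "memory": 60}, ORDERS_TARGETS))
```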

### Growth Path for 6-12 Months

If tenant count grows beyond 10:

| Tenants | RAM | CPU | Storage |
|---------|-----|-----|---------|
| 10 | 20 GB | 8 cores | 200 GB |
| 25 | 32 GB | 12 cores | 300 GB |
| 50 | 48 GB | 16 cores | 500 GB |
| 100+ | Consider a Kubernetes cluster with multiple nodes | - | - |

### Vertical Scaling

If you hit resource limits before adding more tenants:
1. Upgrade RAM first (most common bottleneck)
2. Then CPU if services show high utilization
3. Storage can be expanded independently

---

## Cost Optimization Strategies

### For Pilot Phase (Months 1-6)

1. **Disable Nominatim**: Use external geocoding API
   - Saves: 70 GB storage, 2 GB RAM, 1 CPU core
   - Cost: ~$5-10/month for external API (Google Maps, Mapbox)
   - **Recommendation**: Enable Nominatim only if >50 tenants

2. **Start Without Monitoring**: Add later if needed
   - Saves: 25 GB storage, 1.5 GB RAM, 0.7 CPU cores
   - **Not recommended** - monitoring is crucial for production

3. **Reduce Database Replicas**: Keep at 1 per service
   - Already configured in base
   - **Acceptable risk** for pilot phase

### After Pilot Success (Months 6+)

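1. **Enable full HA**: Increase database replicas to 2 (PostgreSQL needs replication configured, for example through an operator; raising the replica count alone does not provide failover)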
2. **Add Nominatim**: If external API costs exceed $20/month
3. **Upgrade VPS**: To 32 GB RAM / 12 cores for 25+ tenants

---

## Network and Additional Requirements

### Bandwidth
- Estimated: 2-5 TB/month for 10 tenants
- Includes: API traffic, frontend assets, image uploads, reports

### Backup Strategy
- Database backups: ~10 GB/day (compressed)
- Retention: 30 days
- Additional storage: 300 GB for backups (separate volume recommended)
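
The 300 GB figure is the daily backup size multiplied by the retention window. A small sketch of that calculation follows; `backup_volume_gb` is a hypothetical helper, and the optional safety factor is an assumption rather than something this plan specifies.

```python
import math


def backup_volume_gb(daily_backup_gb, retention_days, safety_factor=1.0):
    """Storage needed to keep `retention_days` of daily backups."""
    return math.ceil(daily_backup_gb * retention_days * safety_factor)


# Figures from the list above: ~10 GB/day kept for 30 days.
required_gb = backup_volume_gb(10, 30)
print(f"backup volume: {required_gb} GB")
```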

### Domain & SSL
- 1 domain: `yourdomain.com`
- SSL: Let's Encrypt (free) or wildcard certificate
- Ingress controller: nginx (included in stack)

---

## Deployment Checklist

### Pre-Deployment
- [ ] VPS provisioned with 20 GB RAM, 8 cores, 200 GB NVMe
- [ ] Docker and Kubernetes (k3s or similar) installed
- [ ] Domain DNS configured
- [ ] SSL certificates ready

### Initial Deployment
- [ ] Deploy with `skaffold run -p prod`
- [ ] Verify all pods running: `kubectl get pods -n bakery-ia`
- [ ] Check PVC status: `kubectl get pvc -n bakery-ia`
- [ ] Access frontend and test login

### Post-Deployment Monitoring
- [ ] Set up external monitoring (UptimeRobot, Pingdom)
- [ ] Configure backup schedule
- [ ] Test database backups and restore
- [ ] Load test with simulated tenant traffic

---

## Support and Scaling

### When to Scale Up

Monitor these metrics:
1. **RAM usage consistently >80%** → Upgrade RAM
2. **CPU usage consistently >70%** → Upgrade CPU
3. **Storage >150 GB used** → Upgrade storage
4. **Response times >2 seconds** → Add replicas or upgrade VPS
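
These thresholds are simple enough to encode in a small check that a cron job or dashboard script could run. The sketch below is one way to do it. It assumes the inputs are already averaged over a window of your choosing, since "consistently" is not defined more precisely here, and the function name is illustrative.

```python
def scale_up_actions(ram_pct, cpu_pct, storage_used_gb, response_time_s):
    """Map the thresholds listed above to the recommended actions."""
    actions = []
    if ram_pct > 80:
        actions.append("Upgrade RAM")
    if cpu_pct > 70:
        actions.append("Upgrade CPU")
    if storage_used_gb > 150:
        actions.append("Upgrade storage")
    if response_time_s > 2:
        actions.append("Add replicas or upgrade VPS")
    return actions


# Example: memory pressure and slow responses, CPU and disk still fine.
print(scale_up_actions(ram_pct=86, cpu_pct=55, storage_used_gb=90,
                       response_time_s=2.4))
```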

### Emergency Scaling

If you hit limits suddenly:
1. Scale down non-critical services temporarily
2. Disable monitoring temporarily (not recommended for >1 hour)
3. Increase VPS resources (clouding.io allows live upgrades)
4. Review and optimize resource-heavy queries

---

## Conclusion

The recommended **20 GB RAM / 8 vCPU / 200 GB NVMe** configuration provides:

✅ Comfortable headroom for 10-tenant pilot
✅ Full monitoring and observability
✅ Multiple replicas for critical stateless services (the single VPS and single-instance databases remain single points of failure)
✅ Room for traffic spikes (2-3x baseline)
✅ Cost-effective starting point
✅ Easy scaling path as you grow

**Total estimated compute cost**: €40-80/month (check clouding.io current pricing)
**Additional costs**: Domain (~€15/year), external APIs (~€10/month), backups (~€10/month)

**Next steps**:
1. Provision VPS at clouding.io
2. Follow deployment guide in `/docs/DEPLOYMENT.md`
3. Monitor resource usage for first 2 weeks
4. Adjust based on actual metrics

Commit: Improve kubernetes for prod (2025-11-06 11:04:50 +01:00)