VPS Sizing for Production Deployment

Executive Summary

This document provides detailed resource requirements for deploying the Bakery IA platform to a production VPS environment at clouding.io for a 10-tenant pilot program during the first 6 months.

RAM: 20 GB
Processor: 8 vCPU cores
SSD NVMe (Triple Replica): 200 GB

Estimated Monthly Cost: Contact clouding.io for current pricing


Resource Analysis

1. Application Services (18 Microservices)

Standard Services (14 services)

Each service configured with:

  • Request: 256Mi RAM, 100m CPU
  • Limit: 512Mi RAM, 500m CPU
  • Production replicas: 2-3 per service (from prod overlay)
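
As a reference, a minimal sketch of how this profile translates into a Deployment's container spec; the service name and image below are placeholders, not taken from the project's actual manifests:

```yaml
# Illustrative only: resource profile for a standard service in the prod overlay.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-service        # placeholder name
spec:
  replicas: 2                  # 2-3 in production, per service
  selector:
    matchLabels:
      app: example-service
  template:
    metadata:
      labels:
        app: example-service
    spec:
      containers:
        - name: example-service
          image: registry.example.com/example-service:latest   # placeholder image
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
```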

Services:

  • auth-service (3 replicas)
  • tenant-service (2 replicas)
  • inventory-service (2 replicas)
  • recipes-service (2 replicas)
  • suppliers-service (2 replicas)
  • orders-service (3 replicas) with HPA 1-3
  • sales-service (2 replicas)
  • pos-service (2 replicas)
  • production-service (2 replicas)
  • procurement-service (2 replicas)
  • orchestrator-service (2 replicas)
  • external-service (2 replicas)
  • ai-insights-service (2 replicas)
  • alert-processor (3 replicas)

Total for standard services: ~39 pods

  • RAM requests: ~10 GB
  • RAM limits: ~20 GB
  • CPU requests: ~3.9 cores
  • CPU limits: ~19.5 cores

ML/Heavy and HPA Services

Training Service (2 replicas):

  • Request: 512Mi RAM, 200m CPU
  • Limit: 4Gi RAM, 2000m CPU
  • Special storage: 10Gi PVC for models, 4Gi temp storage

Forecasting Service (3 replicas) with HPA 1-3:

  • Request: 512Mi RAM, 200m CPU
  • Limit: 1Gi RAM, 1000m CPU

Notification Service (3 replicas) with HPA 1-3:

  • Request: 256Mi RAM, 100m CPU
  • Limit: 512Mi RAM, 500m CPU

ML services total (training and forecasting):

  • RAM requests: ~2.3 GB
  • RAM limits: ~11 GB
  • CPU requests: ~1 core
  • CPU limits: ~7 cores

2. Databases (18 PostgreSQL instances)

Each database:

  • Request: 256Mi RAM, 100m CPU
  • Limit: 512Mi RAM, 500m CPU
  • Storage: 2Gi PVC each
  • Production replicas: 1 per database
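
Each instance would claim its 2Gi volume with a small PersistentVolumeClaim along these lines (the claim name and the omitted storage class are illustrative, not the project's actual manifests):

```yaml
# Illustrative only: one 2Gi claim per database instance.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: auth-db-pvc            # placeholder name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
```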

Total for databases: 18 instances

  • RAM requests: ~4.6 GB
  • RAM limits: ~9.2 GB
  • CPU requests: ~1.8 cores
  • CPU limits: ~9 cores
  • Storage: 36 GB

3. Infrastructure Services

Redis (1 instance):

  • Request: 256Mi RAM, 100m CPU
  • Limit: 512Mi RAM, 500m CPU
  • Storage: 1Gi PVC
  • TLS enabled

RabbitMQ (1 instance):

  • Request: 512Mi RAM, 200m CPU
  • Limit: 1Gi RAM, 1000m CPU
  • Storage: 2Gi PVC

Infrastructure total:

  • RAM requests: ~0.8 GB
  • RAM limits: ~1.5 GB
  • CPU requests: ~0.3 cores
  • CPU limits: ~1.5 cores
  • Storage: 3 GB

4. Gateway & Frontend

Gateway (3 replicas):

  • Request: 256Mi RAM, 100m CPU
  • Limit: 512Mi RAM, 500m CPU

Frontend (2 replicas):

  • Request: 512Mi RAM, 250m CPU
  • Limit: 1Gi RAM, 500m CPU

Gateway and frontend total:

  • RAM requests: ~1.8 GB
  • RAM limits: ~3.5 GB
  • CPU requests: ~0.8 cores
  • CPU limits: ~2.5 cores

5. Monitoring Stack

Prometheus:

  • Request: 1Gi RAM, 500m CPU
  • Limit: 2Gi RAM, 1000m CPU
  • Storage: 20Gi PVC
  • Retention: 200h
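
The 200h retention window is normally passed to Prometheus as a startup flag; a sketch of the relevant container spec, assuming a plain Deployment rather than the Prometheus Operator:

```yaml
# Excerpt of a Prometheus container spec; image tag and paths are assumptions.
containers:
  - name: prometheus
    image: prom/prometheus:latest
    args:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=200h"
    resources:
      requests:
        memory: "1Gi"
        cpu: "500m"
      limits:
        memory: "2Gi"
        cpu: "1000m"
```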

Grafana:

  • Request: 256Mi RAM, 100m CPU
  • Limit: 512Mi RAM, 200m CPU
  • Storage: 5Gi PVC

Jaeger:

  • Request: 256Mi RAM, 100m CPU
  • Limit: 512Mi RAM, 200m CPU

Monitoring total:

  • RAM requests: ~1.5 GB
  • RAM limits: ~3 GB
  • CPU requests: ~0.7 cores
  • CPU limits: ~1.4 cores
  • Storage: 25 GB

6. External Services (Optional in Production)

Nominatim (Disabled by default - can use external geocoding API):

  • If enabled: 2Gi/1 CPU request, 4Gi/2 CPU limit
  • Storage: 70Gi (50Gi data + 20Gi flatnode)
  • Recommendation: Use external geocoding service (Google Maps API, Mapbox) for pilot to save resources

Total Resource Summary

| Resource | Requests | Limits | Recommended VPS |
|----------|----------|--------|-----------------|
| RAM | ~23 GB | ~52 GB | 20 GB |
| CPU | ~9.3 cores | ~43 cores | 8 vCPU |
| Storage | ~79 GB | - | 200 GB NVMe |

Memory Calculation Details

  • Application services: 14.1 GB requests / 34.5 GB limits
  • Databases: 4.6 GB requests / 9.2 GB limits
  • Infrastructure: 0.8 GB requests / 1.5 GB limits
  • Gateway/Frontend: 1.8 GB requests / 3.5 GB limits
  • Monitoring: 1.5 GB requests / 3 GB limits
  • Total requests: ~22.8 GB
  • Total limits: ~51.7 GB

Why 20 GB RAM is Sufficient

  1. Requests vs Limits: Kubernetes schedules pods based on requests, not limits. Although the nominal request total (~22.8 GB) slightly exceeds 20 GB, it fits in practice because:

    • Not all services will run at their request levels simultaneously during the pilot
    • HPA-enabled services (orders, forecasting, notification) start at 1 replica
    • The calculations above already include some overhead
  2. Actual Usage: The configured limits are safety margins; real usage for 10 tenants will be much lower:

    • Most services use 40-60% of their limits under normal load
    • Pilot traffic is significantly lower than peak design capacity
  3. Cost-Effective Pilot: Starting with 20 GB allows:

    • Room for monitoring and logging
    • Comfortable headroom (15-25%)
    • Easy vertical scaling if needed

CPU Calculation Details

  • Application services: 5.7 cores requests / 28.5 cores limits
  • Databases: 1.8 cores requests / 9 cores limits
  • Infrastructure: 0.3 cores requests / 1.5 cores limits
  • Gateway/Frontend: 0.8 cores requests / 2.5 cores limits
  • Monitoring: 0.7 cores requests / 1.4 cores limits
  • Total requests: ~9.3 cores
  • Total limits: ~42.9 cores

Storage Calculation

  • Databases: 36 GB (18 × 2Gi)
  • Model storage: 10 GB
  • Infrastructure (Redis, RabbitMQ): 3 GB
  • Monitoring: 25 GB
  • OS and container images: ~30 GB
  • Growth buffer: ~95 GB
  • Total: ~199 GB → 200 GB NVMe recommended

Scaling Considerations

Horizontal Pod Autoscaling (HPA)

Already configured for:

  1. orders-service: 1-3 replicas based on CPU (70%) and memory (80%)
  2. forecasting-service: 1-3 replicas based on CPU (70%) and memory (75%)
  3. notification-service: 1-3 replicas based on CPU (70%) and memory (80%)

These services will automatically scale up under load without manual intervention.
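
As a sketch, the orders-service policy described above maps onto an autoscaling/v2 manifest roughly like this (the target Deployment name is assumed to match the service name):

```yaml
# Illustrative HPA: 1-3 replicas at 70% CPU / 80% memory utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orders-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orders-service
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

The forecasting and notification HPAs would follow the same pattern, with memory targets of 75% and 80% respectively.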

Growth Path for 6-12 Months

If tenant count grows beyond 10:

| Tenants | RAM | CPU | Storage |
|---------|-----|-----|---------|
| 10 | 20 GB | 8 cores | 200 GB |
| 25 | 32 GB | 12 cores | 300 GB |
| 50 | 48 GB | 16 cores | 500 GB |
| 100+ | Consider a Kubernetes cluster with multiple nodes | | |

Vertical Scaling

If you hit resource limits before adding more tenants:

  1. Upgrade RAM first (most common bottleneck)
  2. Then CPU if services show high utilization
  3. Storage can be expanded independently

Cost Optimization Strategies

For Pilot Phase (Months 1-6)

  1. Disable Nominatim: Use external geocoding API

    • Saves: 70 GB storage, 2 GB RAM, 1 CPU core
    • Cost: ~$5-10/month for external API (Google Maps, Mapbox)
    • Recommendation: Enable Nominatim only if >50 tenants
  2. Start Without Monitoring: Add later if needed

    • Saves: 25 GB storage, 1.5 GB RAM, 0.7 CPU cores
    • Not recommended - monitoring is crucial for production
  3. Reduce Database Replicas: Keep at 1 per service

    • Already configured in base
    • Acceptable risk for pilot phase

After Pilot Success (Months 6+)

  1. Enable full HA: Increase database replicas to 2
  2. Add Nominatim: If external API costs exceed $20/month
  3. Upgrade VPS: To 32 GB RAM / 12 cores for 25+ tenants

Network and Additional Requirements

Bandwidth

  • Estimated: 2-5 TB/month for 10 tenants
  • Includes: API traffic, frontend assets, image uploads, reports

Backup Strategy

  • Database backups: ~10 GB/day (compressed)
  • Retention: 30 days
  • Additional storage: 300 GB for backups (separate volume recommended)
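
One way to implement this in-cluster is a nightly CronJob running pg_dump; the sketch below is an assumption about how it could look, with the database host, credentials Secret, and backup PVC as placeholders to adapt per service:

```yaml
# Illustrative nightly backup job for one database; names are placeholders.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: auth-db-backup
spec:
  schedule: "0 3 * * *"              # daily at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:16
              command: ["/bin/sh", "-c"]
              args:
                - 'pg_dump -h auth-db -U "$PGUSER" "$PGDATABASE" | gzip > /backups/auth-db-$(date +%F).sql.gz'
              env:
                - name: PGUSER
                  valueFrom:
                    secretKeyRef:
                      name: auth-db-credentials   # placeholder Secret
                      key: username
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: auth-db-credentials
                      key: password
                - name: PGDATABASE
                  value: auth                     # placeholder database name
              volumeMounts:
                - name: backups
                  mountPath: /backups
          volumes:
            - name: backups
              persistentVolumeClaim:
                claimName: db-backups-pvc         # placeholder backup volume
```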

Domain & SSL

  • 1 domain: yourdomain.com
  • SSL: Let's Encrypt (free) or wildcard certificate
  • Ingress controller: nginx (included in stack)
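
Assuming cert-manager is used to obtain the Let's Encrypt certificate, the ingress could look roughly like this; the hostname, ClusterIssuer, and backend service name are illustrative:

```yaml
# Illustrative ingress with a Let's Encrypt certificate issued by cert-manager.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: bakery-ia-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod   # assumed ClusterIssuer name
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - yourdomain.com
      secretName: bakery-ia-tls
  rules:
    - host: yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: gateway                           # assumed gateway Service name
                port:
                  number: 80
```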

Deployment Checklist

Pre-Deployment

  • VPS provisioned with 20 GB RAM, 8 cores, 200 GB NVMe
  • Docker and Kubernetes (k3s or similar) installed
  • Domain DNS configured
  • SSL certificates ready

Initial Deployment

  • Deploy with skaffold run -p prod
  • Verify all pods running: kubectl get pods -n bakery-ia
  • Check PVC status: kubectl get pvc -n bakery-ia
  • Access frontend and test login

Post-Deployment Monitoring

  • Set up external monitoring (UptimeRobot, Pingdom)
  • Configure backup schedule
  • Test database backups and restore
  • Load test with simulated tenant traffic

Support and Scaling

When to Scale Up

Monitor these metrics:

  1. RAM usage consistently >80% → Upgrade RAM
  2. CPU usage consistently >70% → Upgrade CPU
  3. Storage >150 GB used → Upgrade storage
  4. Response times >2 seconds → Add replicas or upgrade VPS
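
These thresholds can be encoded as Prometheus alerting rules; the sketch below uses standard node_exporter metrics and is a starting point, not the project's actual rule set:

```yaml
# Illustrative alerting rules for the RAM and CPU thresholds above.
groups:
  - name: vps-capacity
    rules:
      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.80
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Node memory usage above 80% for 15 minutes"
      - alert: HighCPUUsage
        expr: 1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.70
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Node CPU usage above 70% for 15 minutes"
```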

Emergency Scaling

If you hit limits suddenly:

  1. Scale down non-critical services temporarily
  2. Disable monitoring temporarily (not recommended for >1 hour)
  3. Increase VPS resources (clouding.io allows live upgrades)
  4. Review and optimize resource-heavy queries

Conclusion

The recommended 20 GB RAM / 8 vCPU / 200 GB NVMe configuration provides:

  • Comfortable headroom for the 10-tenant pilot
  • Full monitoring and observability
  • High availability for critical services
  • Room for traffic spikes (2-3x baseline)
  • Cost-effective starting point
  • Easy scaling path as you grow

Total estimated compute cost: €40-80/month (check clouding.io for current pricing)

Additional costs: domain (€15/year), external APIs (€10/month), backups (~€10/month)

Next steps:

  1. Provision VPS at clouding.io
  2. Follow deployment guide in /docs/DEPLOYMENT.md
  3. Monitor resource usage for first 2 weeks
  4. Adjust based on actual metrics