bakery-ia/docs/DEV-PROD-PARITY-ANALYSIS.md
Claude 50c1eb3469 Add dev-prod parity analysis and recommendations
Analyze current differences between development and production environments
and provide three options for improving parity:

1. Conservative: Minimal changes, maximum benefit
   - 2 replicas for critical services
   - Resource limits at 50% of prod
   - Specific CORS origins
   - Resource impact: +30% RAM

2. High Parity: Maximum similarity
   - Match all prod replica counts
   - Production resource limits
   - Enable SSL and monitoring
   - Resource impact: +200% RAM

3. Hybrid: Balanced approach
   - 2 replicas for stateful services
   - Resources at 60% of prod
   - Production configs with dev features
   - Resource impact: +100% RAM

Recommendation: Start with Option 1 for best cost/benefit ratio.
2026-01-02 19:04:49 +00:00

Dev-Prod Parity Analysis

Current Differences Between Dev and Prod

1. Replicas

  • Dev: 1 replica per service
  • Prod: 2-3 replicas per service
  • Impact: Multi-replica issues (race conditions, session handling, etc.) won't be caught in dev
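
To make the session-handling risk concrete, here is a minimal Python sketch. It assumes a hypothetical service that keeps sessions in local memory behind a naive round-robin balancer; the `Replica` class and `simulate` function are illustrative names and are not taken from the actual services.

```python
from itertools import cycle


class Replica:
    """One instance of a service that keeps sessions in local memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session_id -> user, NOT shared with other replicas

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def whoami(self, session_id):
        # Returns None when this replica has never seen the session.
        return self.sessions.get(session_id)


def simulate(replica_count):
    """Log in once, then send three follow-up requests round-robin."""
    replicas = [Replica(f"auth-{i}") for i in range(replica_count)]
    balancer = cycle(replicas)  # naive round-robin load balancer (assumption)
    next(balancer).login("abc123", "alice")
    return [next(balancer).whoami("abc123") for _ in range(3)]


if __name__ == "__main__":
    print("1 replica: ", simulate(1))
    print("2 replicas:", simulate(2))
```

With one replica every follow-up request finds the session. With two, the requests routed to the second replica do not, which is the class of bug a single-replica dev environment hides.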

2. Resource Limits

  • Dev: Minimal (64Mi-256Mi RAM, 25m-200m CPU)
  • Prod: Not explicitly set (uses defaults from base manifests)
  • Impact: Resource exhaustion issues may appear only in prod

3. Environment Variables

  • Dev: DEBUG=true, LOG_LEVEL=DEBUG, PROFILING_ENABLED=true
  • Prod: DEBUG=false, LOG_LEVEL=INFO, PROFILING_ENABLED=false
  • Impact: Dev and prod exercise different code paths and have different performance characteristics

4. CORS Configuration

  • Dev: * (wildcard, accepts all origins)
  • Prod: Specific domains only
  • Impact: CORS issues won't be caught in dev
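
The difference between the two policies can be sketched in a few lines of Python. This is a simplified origin check, not the gateway's actual middleware; the origin lists and port numbers are assumptions chosen for illustration.

```python
def is_origin_allowed(origin, allowed_origins):
    """Return True if the request origin passes the configured CORS policy."""
    if "*" in allowed_origins:
        return True  # wildcard: every origin is accepted
    return origin in allowed_origins


# Current dev behavior as described above.
DEV_WILDCARD = ["*"]
# Hypothetical specific dev origins; the ports are assumptions.
DEV_SPECIFIC = ["http://localhost:3000", "http://127.0.0.1:3000"]

if __name__ == "__main__":
    for origin in ("http://localhost:3000", "https://unlisted.example"):
        print(origin, is_origin_allowed(origin, DEV_WILDCARD),
              is_origin_allowed(origin, DEV_SPECIFIC))
```

Under the wildcard policy a frontend served from an unlisted origin works in dev and then fails in prod; with a specific list the failure shows up in dev first.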

5. SSL/TLS

  • Dev: HTTP only (ssl-redirect: false)
  • Prod: HTTPS required (Let's Encrypt)
  • Impact: SSL-related issues not tested in dev

6. Image Pull Policy

  • Dev: Never (uses local images)
  • Prod: Default (pulls from registry)
  • Impact: Image versioning issues not caught in dev

7. Storage Class

  • Dev: Uses default Kind storage
  • Prod: Uses microk8s-hostpath
  • Impact: Storage-related issues (e.g. volume provisioning) may behave differently in prod than in dev

8. Rate Limiting

  • Dev: RATE_LIMIT_ENABLED=false
  • Prod: RATE_LIMIT_ENABLED=true
  • Impact: Rate limit logic not tested in dev

Recommendations for Dev-Prod Parity

What SHOULD Be Aligned

  1. Resource Limits Structure

    • Keep dev limits lower, but use same structure
    • Use 50% of prod limits in dev
    • This catches resource issues early
  2. Critical Environment Variables

    • Same security settings (password requirements, JWT config)
    • Same timeout values
    • Same business rules
    • Allowed to differ: DEBUG, LOG_LEVEL (dev needs verbosity)
  3. Multiple Replicas for Critical Services

    • Run 2 replicas of gateway, auth in dev
    • Catches load balancing and state management issues
    • Still saves resources vs prod
  4. CORS Configuration

    • Use specific origins in dev (localhost, 127.0.0.1)
    • Catches CORS issues early
  5. Rate Limiting

    • Enable in dev with higher limits
    • Tests the code path without being restrictive
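
As an illustration of "enabled, but with higher limits", the sketch below uses a token bucket, one common rate-limiting technique. Whether the services actually use a token bucket is not stated in this analysis, and the per-environment numbers in `LIMITS` are made up for the example.

```python
import time


class TokenBucket:
    """Token-bucket limiter: refills at `rate_per_sec`, bursts up to `capacity`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill according to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical (rate per second, burst capacity) pairs; not real project values.
LIMITS = {"prod": (10, 20), "dev": (1000, 2000)}


def make_limiter(env, clock=time.monotonic):
    rate, capacity = LIMITS[env]
    return TokenBucket(rate, capacity, clock)


if __name__ == "__main__":
    limiter = make_limiter("dev")
    print(all(limiter.allow() for _ in range(100)))
```

Both environments run the same `allow()` logic; only the numbers differ, so the code path is tested in dev without getting in a developer's way.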

⚠️ What SHOULD Stay Different

  1. Debug Settings

    • Keep DEBUG=true in dev (needed for development)
    • Keep verbose logging (LOG_LEVEL=DEBUG)
    • Keep profiling enabled
  2. SSL/TLS

    • Optional: Can enable self-signed certs in dev
    • But HTTP is simpler for local development
  3. Image Pull Policy

    • Keep Never in dev (faster iteration)
    • Local builds are essential for dev workflow
  4. Replica Counts

    • 1-2 in dev vs 2-3 in prod (balance between parity and resources)
  5. Monitoring

    • Optional in dev to save resources
    • Essential in prod
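
The split between "should be aligned" and "should stay different" can be checked mechanically. The following Python sketch compares two environment-variable maps and reports only differences outside an allow-list. The allow-list reflects the debug settings named above; `JWT_EXPIRY_MINUTES` and the sample values are hypothetical.

```python
# Variables this analysis says may legitimately differ between dev and prod.
ALLOWED_TO_DIFFER = {"DEBUG", "LOG_LEVEL", "PROFILING_ENABLED"}


def unexpected_drift(dev, prod, allowed=ALLOWED_TO_DIFFER):
    """Return {key: (dev_value, prod_value)} for differences not in `allowed`."""
    keys = (dev.keys() | prod.keys()) - allowed
    return {
        key: (dev.get(key), prod.get(key))
        for key in sorted(keys)
        if dev.get(key) != prod.get(key)
    }


if __name__ == "__main__":
    # Sample values; JWT_EXPIRY_MINUTES is a made-up variable name.
    dev_env = {"DEBUG": "true", "LOG_LEVEL": "DEBUG",
               "RATE_LIMIT_ENABLED": "false", "JWT_EXPIRY_MINUTES": "30"}
    prod_env = {"DEBUG": "false", "LOG_LEVEL": "INFO",
                "RATE_LIMIT_ENABLED": "true", "JWT_EXPIRY_MINUTES": "30"}
    print(unexpected_drift(dev_env, prod_env))
```

A check like this could run in CI so that new drift is a deliberate decision rather than an accident.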

Proposed Changes for Better Dev-Prod Parity

Option 1: Conservative (Recommended)

Minimal changes, maximum benefit:

  1. Increase critical service replicas to 2

    • gateway: 1 → 2
    • auth-service: 1 → 2
    • Tests load balancing, keeps other services at 1
  2. Align resource limits structure

    • Use same resource structure as prod
    • Set to 50% of prod values
  3. Fix CORS in dev

    • Use specific origins instead of wildcard
    • Better matches prod behavior
  4. Enable rate limiting with high limits

    • Tests the code path
    • Won't interfere with development
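
One way to derive "same structure, 50% of prod values" is to scale each Kubernetes quantity. The Python sketch below handles only the `Mi`, `Gi`, and `m` suffixes. Because prod limits are not explicitly set today, the `PROD_EXAMPLE` values are assumptions for illustration.

```python
MEMORY_TO_MI = {"Mi": 1, "Gi": 1024}


def scale_quantity(quantity, factor):
    """Scale a quantity such as "512Mi", "1Gi" or "500m" by `factor`.

    Memory is normalized to Mi and CPU stays in millicores, so that
    fractions of a Gi are not rounded away.
    """
    for suffix, to_mi in MEMORY_TO_MI.items():
        if quantity.endswith(suffix):
            mebibytes = int(quantity[:-2]) * to_mi
            return f"{max(1, round(mebibytes * factor))}Mi"
    if quantity.endswith("m"):
        return f"{max(1, round(int(quantity[:-1]) * factor))}m"
    raise ValueError(f"unsupported quantity: {quantity!r}")


def scale_resources(resources, factor):
    """Apply `factor` to every value while keeping the requests/limits shape."""
    return {
        section: {name: scale_quantity(q, factor) for name, q in values.items()}
        for section, values in resources.items()
    }


# Hypothetical prod values; the real prod manifests do not set them explicitly.
PROD_EXAMPLE = {
    "requests": {"cpu": "200m", "memory": "256Mi"},
    "limits": {"cpu": "1000m", "memory": "1Gi"},
}

if __name__ == "__main__":
    print(scale_resources(PROD_EXAMPLE, 0.5))
```

The same helper covers the 60% variant by passing `0.6` as the factor.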

Option 2: High Parity (More Resources Needed)

Maximum similarity, higher resource usage:

  1. Match prod replica counts

    • Run 2 replicas of all services
    • Requires a development machine with more RAM (12-16GB)
  2. Use production resource limits

    • Helps catch OOM issues early
    • Requires powerful development machine
  3. Enable SSL in dev

    • Use self-signed certs
    • Matches prod HTTPS behavior
  4. Enable all production features

    • Monitoring, tracing, etc.

Option 3: Hybrid (Best Balance)

Balance between parity and development speed:

  1. 2 replicas for stateful/critical services

    • gateway, auth, tenant, orders: 2 replicas
    • Others: 1 replica
  2. Resource limits at 60% of prod

    • Catches issues without being restrictive
  3. Production-like configuration

    • Same CORS policy (with dev domains)
    • Rate limiting enabled (higher limits)
    • Same security settings
  4. Keep dev-friendly features

    • DEBUG=true
    • Verbose logging
    • Hot reload
    • HTTP (no SSL)
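
The replica split for this option can be expressed as data. The Python sketch below builds a list shaped like the `replicas` field of a kustomization file (entries with `name` and `count`). The Deployment names are assumptions: the real names may differ, and the two "other" services are invented for the example.

```python
# Services that get 2 replicas under Option 3, as listed above.
TWO_REPLICA_SERVICES = {"gateway", "auth", "tenant", "orders"}


def replicas_field(deployments, doubled=TWO_REPLICA_SERVICES):
    """Build a list shaped like the `replicas` field of a kustomization file."""
    return [
        {"name": name, "count": 2 if name in doubled else 1}
        for name in deployments
    ]


def total_pods(entries):
    """Sum the replica counts to estimate how many pods the plan creates."""
    return sum(entry["count"] for entry in entries)


if __name__ == "__main__":
    # "inventory" and "notifications" are made-up names for the remaining services.
    names = ["gateway", "auth", "tenant", "orders", "inventory", "notifications"]
    plan = replicas_field(names)
    for entry in plan:
        print(entry)
    print("pods:", total_pods(plan))
```

Keeping the list of doubled services in one place makes it easy to move a service between the 1-replica and 2-replica groups later.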

Impact Analysis

Resource Usage Comparison

Current Dev Setup:

  • ~20 pods running
  • ~2-3GB RAM
  • ~1-2 CPU cores

Option 1 (Conservative):

  • ~22 pods (2 extra replicas)
  • ~3-4GB RAM (+30%)
  • ~1.5-2.5 CPU cores

Option 2 (High Parity):

  • ~40 pods (double)
  • ~8-10GB RAM (+200%)
  • ~4-5 CPU cores

Option 3 (Hybrid):

  • ~28 pods
  • ~5-6GB RAM (+100%)
  • ~2-3 CPU cores
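
As a sanity check on the RAM percentages, the Python sketch below computes the increase implied by the quoted ranges, comparing low end to low end and high end to high end. It uses only the figures listed above; the helper name is illustrative.

```python
def increase_range(baseline, option):
    """Return (min, max) percent increase between two (low, high) GB ranges."""
    low = round((option[0] - baseline[0]) / baseline[0] * 100)
    high = round((option[1] - baseline[1]) / baseline[1] * 100)
    return tuple(sorted((low, high)))


CURRENT_GB = (2, 3)
OPTIONS_GB = {
    "Option 1 (Conservative)": (3, 4),
    "Option 2 (High Parity)": (8, 10),
    "Option 3 (Hybrid)": (5, 6),
}

if __name__ == "__main__":
    for name, ram in OPTIONS_GB.items():
        lo, hi = increase_range(CURRENT_GB, ram)
        print(f"{name}: +{lo}% to +{hi}% RAM")
```

The ranges imply roughly +33% to +50%, +233% to +300%, and +100% to +150%, so the quoted +30%, +200%, and +100% are best read as rounded lower-end estimates.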

Benefits of Increased Parity

  1. Catch Multi-Instance Issues

    • Race conditions
    • Distributed locks
    • Session management
    • Load balancing problems
  2. Resource Issues Found Early

    • Memory leaks
    • OOM errors
    • CPU bottlenecks
  3. Configuration Validation

    • CORS issues
    • Rate limiting bugs
    • Security misconfigurations
  4. Deployment Confidence

    • Fewer surprises in production
    • More realistic pre-production testing
    • Reduced rollbacks

Tradeoffs

Pros:

  • Catches more issues before production
  • More realistic testing environment
  • Better confidence in deployments
  • Team learns production behavior

Cons:

  • Higher resource requirements
  • Slower startup times
  • More complex troubleshooting
  • Longer rebuild cycles

Implementation Guide

If you want to proceed with Option 1 (Conservative), I can:

  1. Update dev kustomization to run 2 replicas of critical services
  2. Add resource limits that mirror prod structure (at 50%)
  3. Fix CORS to use specific origins
  4. Enable rate limiting with dev-friendly limits
  5. Create a "dev-high-parity" profile for those who want closer matching

Would you like me to implement these changes?