# 🚀 Quick Start: Deploy Monitoring to Production

**Time to deploy: ~15 minutes**

---

## Step 1: Update Secrets (5 min)

```bash
cd infrastructure/kubernetes/base/components/monitoring

# 1. Generate strong passwords
GRAFANA_PASS=$(openssl rand -base64 32)
echo "Grafana Password: $GRAFANA_PASS" > ~/SAVE_THIS_PASSWORD.txt

# 2. Edit secrets.yaml and replace:
#    - CHANGE_ME_IN_PRODUCTION (Grafana password)
#    - SMTP settings (your email server)
#    - PostgreSQL connection string (your DB)
nano secrets.yaml
```

**Required Changes in secrets.yaml:**

```yaml
# Line 13: Change Grafana password
admin-password: "YOUR_STRONG_PASSWORD_HERE"

# Lines 30-33: Update SMTP settings
smtp-host: "smtp.gmail.com:587"
smtp-username: "your-alerts@yourdomain.com"
smtp-password: "YOUR_SMTP_PASSWORD"
smtp-from: "alerts@yourdomain.com"

# Line 49: Update PostgreSQL connection
data-source-name: "postgresql://USER:PASSWORD@postgres.bakery-ia:5432/bakery?sslmode=require"
```

---

## Step 2: Update Alert Email Addresses (2 min)

```bash
# Edit alertmanager.yaml to set your team's email addresses
nano alertmanager.yaml

# Update these lines (search for @yourdomain.com):
# - Line 93:  to: 'alerts@yourdomain.com'
# - Line 101: to: 'critical-alerts@yourdomain.com,oncall@yourdomain.com'
# - Line 116: to: 'alerts@yourdomain.com'
# - Line 125: to: 'alert-system-team@yourdomain.com'
# - Line 134: to: 'database-team@yourdomain.com'
# - Line 143: to: 'infra-team@yourdomain.com'
```

---

## Step 3: Deploy to Production (3 min)

```bash
# Return to project root
cd /Users/urtzialfaro/Documents/bakery-ia

# Deploy the entire stack
kubectl apply -k infrastructure/kubernetes/overlays/prod

# Watch the pods come up
kubectl get pods -n monitoring -w
```

**Expected Output:**

```
NAME                      READY   STATUS    RESTARTS   AGE
prometheus-0              1/1     Running   0          2m
prometheus-1              1/1     Running   0          1m
alertmanager-0            2/2     Running   0          2m
alertmanager-1            2/2     Running   0          1m
alertmanager-2            2/2     Running   0          1m
grafana-xxxxx             1/1     Running   0          2m
postgres-exporter-xxxxx   1/1     Running   0          2m
node-exporter-xxxxx       1/1     Running   0          2m
jaeger-xxxxx              1/1     Running   0          2m
```

---

## Step 4: Verify Deployment (3 min)

```bash
# Check all pods are running
kubectl get pods -n monitoring

# Check storage is provisioned
kubectl get pvc -n monitoring

# Check services are created
kubectl get svc -n monitoring
```

---

## Step 5: Access Dashboards (2 min)

### **Option A: Via Ingress (if configured)**

```
https://monitoring.yourdomain.com/grafana
https://monitoring.yourdomain.com/prometheus
https://monitoring.yourdomain.com/alertmanager
https://monitoring.yourdomain.com/jaeger
```

### **Option B: Via Port Forwarding**

```bash
# Grafana
kubectl port-forward -n monitoring svc/grafana 3000:3000 &

# Prometheus
kubectl port-forward -n monitoring svc/prometheus-external 9090:9090 &

# AlertManager
kubectl port-forward -n monitoring svc/alertmanager-external 9093:9093 &

# Jaeger
kubectl port-forward -n monitoring svc/jaeger-query 16686:16686 &

# Now access:
# - Grafana:      http://localhost:3000 (admin / YOUR_PASSWORD)
# - Prometheus:   http://localhost:9090
# - AlertManager: http://localhost:9093
# - Jaeger:       http://localhost:16686
```

---

## Step 6: Verify Everything Works (5 min)

### **Check Prometheus Targets**

1. Open Prometheus: http://localhost:9090
2. Go to Status → Targets
3. Verify all targets are **UP**:
   - prometheus (1/1 up)
   - bakery-services (multiple pods up)
   - alertmanager (3/3 up)
   - postgres-exporter (1/1 up)
   - node-exporter (N/N up, where N = number of nodes)
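The same target check can also be run from the terminal against the Prometheus HTTP API. This is a minimal sketch, assuming the Prometheus port-forward from Step 5 is still active on localhost:9090 and `jq` is installed:

```bash
# Summarize target health per scrape job -- every job listed above should report "up"
curl -s "http://localhost:9090/api/v1/targets" \
  | jq -r '.data.activeTargets[] | "\(.labels.job) \(.health)"' \
  | sort | uniq -c
```

Any job reporting `down` here corresponds to a DOWN target in the UI; see the troubleshooting section below.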
### **Check Grafana Dashboards**

1. Open Grafana: http://localhost:3000
2. Log in with admin / YOUR_PASSWORD
3. Go to Dashboards → Browse
4. You should see 11 dashboards:
   - Bakery IA folder: Gateway Metrics, Services Overview, Circuit Breakers
   - Bakery IA - Extended folder: PostgreSQL, Node Exporter, AlertManager, Business Metrics
5. Open any dashboard and verify data is loading

### **Test Alert Flow**

```bash
# Fire a test alert by creating a high-memory pod
kubectl run memory-test --image=polinux/stress --restart=Never \
  --namespace=bakery-ia -- stress --vm 1 --vm-bytes 600M --timeout 300s

# Wait 5 minutes, then check:
# 1. Prometheus Alerts: http://localhost:9090/alerts
#    - Should see "HighMemoryUsage" firing
# 2. AlertManager: http://localhost:9093
#    - Should see the alert
# 3. Email inbox
#    - Should receive a notification

# Clean up
kubectl delete pod memory-test -n bakery-ia
```

### **Verify Jaeger Tracing**

1. Make a request to your API:
   ```bash
   curl -H "Authorization: Bearer YOUR_TOKEN" \
     https://api.yourdomain.com/api/v1/health
   ```
2. Open Jaeger: http://localhost:16686
3. Select a service from the dropdown
4. Click "Find Traces"
5. You should see traces appearing

---

## ✅ Success Criteria

Your monitoring is working correctly if:

- [ ] All Prometheus targets show "UP" status
- [ ] Grafana dashboards display metrics
- [ ] AlertManager cluster shows 3/3 members
- [ ] Test alert fired and email received
- [ ] Jaeger shows traces from services
- [ ] No pods in CrashLoopBackOff state
- [ ] All PVCs are Bound

---

## 🔧 Troubleshooting

### **Problem: Pods not starting**

```bash
# Check pod status
kubectl describe pod POD_NAME -n monitoring

# Check logs
kubectl logs POD_NAME -n monitoring

# Common issues:
# - Insufficient resources: Check node capacity
# - PVC not binding: Check storage class exists
# - Image pull errors: Check network/registry access
```

### **Problem: Prometheus targets DOWN**

```bash
# Check if services exist
kubectl get svc -n bakery-ia

# Check if pods have correct labels
kubectl get pods -n bakery-ia --show-labels

# Check if pods expose the metrics port (8080)
kubectl get pod POD_NAME -n bakery-ia -o yaml | grep -A 5 ports
```

### **Problem: Grafana shows "No Data"**

```bash
# Test the Prometheus datasource
kubectl port-forward -n monitoring svc/prometheus-external 9090:9090

# Run a test query in Prometheus
curl "http://localhost:9090/api/v1/query?query=up" | jq

# If Prometheus has data but Grafana doesn't, check the Grafana datasource config
```

### **Problem: Alerts not firing**

```bash
# Check alert rules are loaded
kubectl logs -n monitoring prometheus-0 | grep "Loading configuration"

# Check AlertManager config
kubectl exec -n monitoring alertmanager-0 -- cat /etc/alertmanager/alertmanager.yml

# Test SMTP connection
kubectl exec -n monitoring alertmanager-0 -- \
  nc -zv smtp.gmail.com 587
```
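If the test alert from Step 6 is firing in Prometheus but no email arrives, it helps to confirm that the alert actually reached AlertManager and that the cluster is healthy. A minimal sketch using the AlertManager v2 API, assuming the AlertManager port-forward from Step 5 is still active on localhost:9093 and `jq` is installed:

```bash
# List alerts currently held by AlertManager (the test "HighMemoryUsage" alert
# should appear here while the memory-test pod is running)
curl -s "http://localhost:9093/api/v2/alerts" \
  | jq -r '.[].labels.alertname'

# Check cluster health: status should be "ready" with 3 peers listed
curl -s "http://localhost:9093/api/v2/status" \
  | jq '{cluster_status: .cluster.status, peers: (.cluster.peers | length)}'
```

If the alert is listed here but no email is delivered, look at the SMTP/receiver configuration; if it is missing entirely, check the Prometheus alerting rules and configuration instead.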
---

## 📞 Need Help?

1. Check full documentation: [infrastructure/kubernetes/base/components/monitoring/README.md](infrastructure/kubernetes/base/components/monitoring/README.md)
2. Review deployment summary: [MONITORING_DEPLOYMENT_SUMMARY.md](MONITORING_DEPLOYMENT_SUMMARY.md)
3. Check Prometheus logs: `kubectl logs -n monitoring prometheus-0`
4. Check AlertManager logs: `kubectl logs -n monitoring alertmanager-0`
5. Check Grafana logs: `kubectl logs -n monitoring deployment/grafana`

---

## 🎉 You're Done!

Your monitoring stack is now running in production!

**Next steps:**

1. Save your Grafana password securely
2. Set up on-call rotation
3. Review alert thresholds and adjust as needed
4. Create team-specific dashboards
5. Train team on using monitoring tools

**Access your monitoring:**

- Grafana: https://monitoring.yourdomain.com/grafana
- Prometheus: https://monitoring.yourdomain.com/prometheus
- AlertManager: https://monitoring.yourdomain.com/alertmanager
- Jaeger: https://monitoring.yourdomain.com/jaeger

---

*Deployment time: ~15 minutes*

*Last updated: 2026-01-07*