bakery-ia/docs/PILOT_LAUNCH_GUIDE.md
Urtzi Alfaro dfb7e4b237 Add signoz
2026-01-08 12:58:00 +01:00

# Bakery-IA Pilot Launch Guide

**Complete guide for deploying to production for a 10-tenant pilot program**

**Last Updated:** 2026-01-07
**Target Environment:** clouding.io VPS with MicroK8s
**Estimated Cost:** €41-81/month
**Time to Deploy:** 2-4 hours (first time)

---
## Table of Contents
1. [Executive Summary](#executive-summary)
2. [Pre-Launch Checklist](#pre-launch-checklist)
3. [VPS Provisioning](#vps-provisioning)
4. [Infrastructure Setup](#infrastructure-setup)
5. [Domain & DNS Configuration](#domain--dns-configuration)
6. [TLS/SSL Certificates](#tlsssl-certificates)
7. [Email & Communication Setup](#email--communication-setup)
8. [Kubernetes Deployment](#kubernetes-deployment)
9. [Configuration & Secrets](#configuration--secrets)
10. [Database Migrations](#database-migrations)
11. [Verification & Testing](#verification--testing)
12. [Post-Deployment](#post-deployment)
---
## Executive Summary
### What You're Deploying
A complete multi-tenant SaaS platform with:
- **18 microservices** (auth, tenant, ML forecasting, inventory, sales, orders, etc.)
- **14 PostgreSQL databases** with TLS encryption
- **Redis cache** with TLS
- **RabbitMQ** message broker
- **Monitoring stack** (Prometheus, Grafana, AlertManager)
- **Full security** (TLS, RBAC, audit logging)
### Total Cost Breakdown
| Service | Provider | Monthly Cost |
|---------|----------|-------------|
| VPS Server (20GB RAM, 8 vCPU, 200GB SSD) | clouding.io | €40-80 |
| Domain | Namecheap/Cloudflare | €1.25 (€15/year) |
| Email | Zoho Free / Gmail | €0 |
| WhatsApp API | Meta Business | €0 (1k free conversations) |
| DNS | Cloudflare | €0 |
| SSL | Let's Encrypt | €0 |
| **TOTAL** | | **€41-81/month** |
### Timeline
| Phase | Duration | Description |
|-------|----------|-------------|
| Pre-Launch Setup | 1-2 hours | Domain, VPS provisioning, accounts setup |
| Infrastructure Setup | 1 hour | MicroK8s installation, firewall config |
| Deployment | 30-60 min | Deploy all services and databases |
| Verification | 30-60 min | Test everything works |
| **Total** | **2-4 hours** | First-time deployment |
---
## Pre-Launch Checklist
### Required Accounts & Services
- [ ] **Domain Name**
  - Register at Namecheap or Cloudflare (€10-15/year)
  - Suggested: `bakeryforecast.es` or `bakery-ia.com`
- [ ] **VPS Account**
  - Sign up at [clouding.io](https://www.clouding.io)
  - Payment method configured
- [ ] **Email Service** (Choose ONE)
  - Option A: Zoho Mail FREE (recommended for full send/receive)
  - Option B: Gmail SMTP + domain forwarding
  - Option C: Google Workspace (14-day free trial, then €5.75/month)
- [ ] **WhatsApp Business API**
  - Create Meta Business Account (free)
  - Verify business identity
  - Phone number ready (non-VoIP)
- [ ] **DNS Access**
  - Cloudflare account (free, recommended)
  - Or domain registrar DNS panel access
- [ ] **Container Registry** (Choose ONE)
  - Option A: Docker Hub account (recommended)
  - Option B: GitHub Container Registry
  - Option C: MicroK8s built-in registry
### Required Tools on Local Machine
```bash
# Verify you have these installed:
kubectl version --client
docker --version
git --version
ssh -V
openssl version
# Install if missing (macOS):
brew install kubectl docker git openssh openssl
```
### Repository Setup
```bash
# Clone the repository
git clone https://github.com/yourusername/bakery-ia.git
cd bakery-ia
# Verify structure
ls infrastructure/kubernetes/overlays/prod/
```
---
## VPS Provisioning
### Recommended Configuration
**For 10-tenant pilot program:**
- **RAM:** 20 GB
- **CPU:** 8 vCPU cores
- **Storage:** 200 GB NVMe SSD (triple replica)
- **Network:** 1 Gbps connection
- **OS:** Ubuntu 22.04 LTS
- **Monthly Cost:** €40-80 (check current pricing)
### Why These Specs?
**Memory Breakdown:**
- Application services: 14.1 GB
- Databases (18 instances): 4.6 GB
- Infrastructure (Redis, RabbitMQ): 0.8 GB
- Gateway/Frontend: 1.8 GB
- Monitoring: 1.5 GB
- System overhead: ~3 GB
- **Total:** ~26 GB if every component hits its limit at once; 20 GB is sufficient for the pilot because HPA keeps services at minimum replicas under light load
**Storage Breakdown:**
- Databases: 36 GB (18 × 2GB)
- ML Models: 10 GB
- Redis: 1 GB
- RabbitMQ: 2 GB
- Prometheus metrics: 20 GB
- Container images: ~30 GB
- Growth buffer: 100 GB
- **Total:** 199 GB
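The two totals above can be re-derived from the per-component figures. The following is a minimal sketch; `sum_gb` is a hypothetical helper introduced only for this check and is not part of the repository.

```bash
#!/usr/bin/env bash
# Re-derive the memory and storage totals from the per-component estimates (GB).
# sum_gb is a hypothetical helper, not part of the repository.
sum_gb() {
  # Print the sum of all arguments with one decimal place.
  printf '%s\n' "$@" | LC_ALL=C awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# Services, databases, Redis/RabbitMQ, gateway/frontend, monitoring, system overhead
memory_total=$(sum_gb 14.1 4.6 0.8 1.8 1.5 3)
# Databases, ML models, Redis, RabbitMQ, Prometheus, images, growth buffer
storage_total=$(sum_gb 36 10 1 2 20 30 100)

echo "Memory at full limits: ${memory_total} GB"
echo "Storage:               ${storage_total} GB"
```

The memory figure comes out at 25.8 GB, which the guide rounds to ~26 GB; the storage figure matches the 199 GB total.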
### Provisioning Steps
1. **Create VPS at clouding.io:**
```
1. Log in to clouding.io dashboard
2. Click "Create New Server"
3. Select:
   - OS: Ubuntu 22.04 LTS
   - RAM: 20 GB
   - CPU: 8 vCPU
   - Storage: 200 GB NVMe SSD
   - Location: Barcelona (best for Spain)
4. Set hostname: bakery-ia-prod-01
5. Add SSH key (or use password)
6. Create server
```
2. **Note your server details:**
```bash
# Save these for later:
VPS_IP="YOUR_VPS_IP_ADDRESS"
VPS_ROOT_PASSWORD="YOUR_ROOT_PASSWORD" # If not using SSH key
```
3. **Initial SSH connection:**
```bash
# Test connection
ssh root@$VPS_IP
# Update system
apt update && apt upgrade -y
```
---
## Infrastructure Setup
### Step 1: Install MicroK8s
```bash
# SSH into your VPS
ssh root@$VPS_IP
# Install MicroK8s
snap install microk8s --classic --channel=1.28/stable
# Add your user to microk8s group
usermod -a -G microk8s $USER
chown -f -R $USER ~/.kube
newgrp microk8s
# Verify installation
microk8s status --wait-ready
```
### Step 2: Enable Required Add-ons
```bash
# Enable core add-ons
microk8s enable dns
microk8s enable hostpath-storage
microk8s enable ingress
microk8s enable cert-manager
microk8s enable metrics-server
microk8s enable rbac
# Optional but recommended
microk8s enable prometheus # For monitoring
microk8s enable registry # If using local registry
# Setup kubectl alias
echo "alias kubectl='microk8s kubectl'" >> ~/.bashrc
source ~/.bashrc
# Verify
kubectl get nodes
kubectl get pods -A
```
### Step 3: Configure Firewall
```bash
# Allow necessary ports
ufw allow 22/tcp # SSH
ufw allow 80/tcp # HTTP
ufw allow 443/tcp # HTTPS
ufw allow 16443/tcp # Kubernetes API (optional)
# Enable firewall
ufw enable
# Check status
ufw status verbose
```
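To confirm that the rules from this step are actually in place, the `ufw status` output can be checked mechanically. This is a sketch only; `missing_ufw_rules` is a hypothetical helper name, and the matching assumes the usual `PORT/PROTO  ALLOW  ...` layout of `ufw status`.

```bash
#!/usr/bin/env bash
# missing_ufw_rules is a hypothetical helper: it reads `ufw status` output on
# stdin and prints every required rule that has no matching ALLOW line.
missing_ufw_rules() {
  local status rule
  status=$(cat)
  for rule in "$@"; do
    # Match lines such as "22/tcp   ALLOW   Anywhere" ("ALLOW IN" in verbose mode).
    if ! printf '%s\n' "$status" | grep -qE "^${rule}[[:space:]]+ALLOW"; then
      echo "$rule"
    fi
  done
}

# Usage on the VPS (prints nothing when all three rules exist):
#   ufw status | missing_ufw_rules 22/tcp 80/tcp 443/tcp
```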
### Step 4: Create Namespace
```bash
# Create bakery-ia namespace
kubectl create namespace bakery-ia
# Verify
kubectl get namespaces
```
---
## Domain & DNS Configuration
### Step 1: Register Domain
1. Go to Namecheap or Cloudflare Registrar
2. Search for your desired domain
3. Complete purchase (~€10-15/year)
4. Save domain credentials
### Step 2: Configure Cloudflare DNS (Recommended)
1. **Add site to Cloudflare:**
```
1. Log in to Cloudflare
2. Click "Add a Site"
3. Enter your domain name
4. Choose Free plan
5. Cloudflare will scan existing DNS records
```
2. **Update nameservers at registrar:**
```
Point your domain's nameservers to Cloudflare:
- NS1: assigned.cloudflare.com
- NS2: assigned.cloudflare.com
(Cloudflare will provide the exact values)
```
3. **Add DNS records:**
```
Type   Name        Content         TTL   Proxy
A      @           YOUR_VPS_IP     Auto  Yes
A      www         YOUR_VPS_IP     Auto  Yes
A      api         YOUR_VPS_IP     Auto  Yes
A      monitoring  YOUR_VPS_IP     Auto  Yes
CNAME  *           yourdomain.com  Auto  No
```
4. **Configure SSL/TLS mode:**
```
SSL/TLS tab → Overview → Set to "Full (strict)"
```
5. **Test DNS propagation:**
```bash
# Wait 5-10 minutes, then test
nslookup yourdomain.com
nslookup api.yourdomain.com
```
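Instead of re-running `nslookup` by hand, the wait can be scripted. The sketch below assumes `dig` is installed (it ships in the same package as `nslookup` on Ubuntu); `wait_for_dns` is a hypothetical helper name. Because proxied Cloudflare records resolve to Cloudflare addresses rather than to `YOUR_VPS_IP`, it only checks that some answer comes back.

```bash
#!/usr/bin/env bash
# wait_for_dns is a hypothetical helper: polls until a hostname resolves to at
# least one IPv4 address, or gives up after a number of attempts.
wait_for_dns() {
  local name="$1" attempts="${2:-20}" delay="${3:-30}" answer i
  for (( i = 1; i <= attempts; i++ )); do
    answer=$(dig +short A "$name")
    if [ -n "$answer" ]; then
      # $answer is deliberately unquoted so multiple records print on one line.
      echo "$name resolves to:" $answer
      return 0
    fi
    sleep "$delay"
  done
  echo "$name did not resolve after $attempts attempts" >&2
  return 1
}

# Usage:
#   wait_for_dns yourdomain.com && wait_for_dns api.yourdomain.com
```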
---
## TLS/SSL Certificates
### Understanding Certificate Setup
The platform uses **two layers** of SSL/TLS:
1. **External (Ingress) SSL:** Let's Encrypt for public HTTPS
2. **Internal (Database) SSL:** Self-signed certificates for database connections
### Step 1: Generate Internal Certificates
```bash
# On your local machine
cd infrastructure/tls
# Generate certificates
./generate-certificates.sh
# This creates:
# - ca/ (Certificate Authority)
# - postgres/ (PostgreSQL server certs)
# - redis/ (Redis server certs)
```
**Certificate Details:**
- Root CA: 10-year validity (expires 2035)
- Server certs: 3-year validity (expires October 2028)
- Algorithm: RSA 4096-bit
- Signature: SHA-256
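Since the server certificates expire well before the root CA, it can help to check expiry dates directly. The following is a sketch using `openssl x509`; the helper names are hypothetical, and the paths in the usage comment are the ones referenced in Step 2.

```bash
#!/usr/bin/env bash
# cert_valid_for_days is a hypothetical helper: succeeds if the PEM certificate
# in $1 is still valid $2 days from now. `openssl x509 -checkend` takes seconds.
cert_valid_for_days() {
  local cert_file="$1" days="$2"
  openssl x509 -checkend "$(( days * 86400 ))" -noout -in "$cert_file" >/dev/null
}

# Print the expiry date (notAfter) of each certificate passed as an argument.
print_cert_expiry() {
  local cert_file
  for cert_file in "$@"; do
    printf '%s: ' "$cert_file"
    openssl x509 -enddate -noout -in "$cert_file"
  done
}

# Usage from the repository root:
#   print_cert_expiry infrastructure/tls/postgres/server-cert.pem \
#                     infrastructure/tls/redis/redis-cert.pem
#   cert_valid_for_days infrastructure/tls/postgres/server-cert.pem 90 \
#     || echo "PostgreSQL certificate expires within 90 days"
```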
### Step 2: Create Kubernetes Secrets
```bash
# Create PostgreSQL TLS secret
kubectl create secret generic postgres-tls \
--from-file=server-cert.pem=infrastructure/tls/postgres/server-cert.pem \
--from-file=server-key.pem=infrastructure/tls/postgres/server-key.pem \
--from-file=ca-cert.pem=infrastructure/tls/postgres/ca-cert.pem \
-n bakery-ia
# Create Redis TLS secret
kubectl create secret generic redis-tls \
--from-file=redis-cert.pem=infrastructure/tls/redis/redis-cert.pem \
--from-file=redis-key.pem=infrastructure/tls/redis/redis-key.pem \
--from-file=ca-cert.pem=infrastructure/tls/redis/ca-cert.pem \
-n bakery-ia
# Verify secrets created
kubectl get secrets -n bakery-ia | grep tls
```
### Step 3: Configure Let's Encrypt (External SSL)
cert-manager is already enabled. Configure the ClusterIssuer:
```bash
# On VPS, create ClusterIssuer
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@yourdomain.com # CHANGE THIS
    privateKeySecretRef:
      name: letsencrypt-production
    solvers:
      - http01:
          ingress:
            class: public
EOF
# Verify ClusterIssuer is ready
kubectl get clusterissuer
kubectl describe clusterissuer letsencrypt-production
```
---
## Email & Communication Setup
### Option A: Zoho Mail (FREE, Recommended)
**Features:**
- ✅ Free forever for 1 domain, 5 users
- ✅ 5GB storage per user
- ✅ Full send/receive capability
- ✅ Web interface + SMTP/IMAP
- ✅ Professional email addresses
**Setup Steps:**
1. **Sign up for Zoho Mail:**
```
1. Go to https://www.zoho.com/mail/
2. Click "Sign Up for Free"
3. Choose "Forever Free" plan
4. Enter your domain name
5. Complete verification
```
2. **Verify domain ownership:**
```
Add TXT record to your DNS:
Type: TXT
Name: @
Value: zoho-verification=XXXXX.zoho.com
```
3. **Configure MX records:**
```
Priority Type Name Value
10 MX @ mx.zoho.com
20 MX @ mx2.zoho.com
50 MX @ mx3.zoho.com
```
4. **Get SMTP credentials:**
```
SMTP Host: smtp.zoho.com
SMTP Port: 587
SMTP Username: noreply@yourdomain.com
SMTP Password: (generate app password in Zoho settings)
```
### Option B: Gmail SMTP + Forwarding
**Features:**
- ✅ Completely free
- ✅ 500 emails/day (sufficient for pilot)
- ✅ Receive via domain forwarding
**Setup Steps:**
1. **Enable 2FA on your Gmail:**
```
1. Go to myaccount.google.com
2. Security → 2-Step Verification
3. Enable and complete setup
```
2. **Generate app password:**
```
1. Security → 2-Step Verification → App passwords
2. Select "Mail" and "Other (Custom name)"
3. Name it "Bakery-IA SMTP"
4. Copy the 16-character password
```
3. **Configure domain email forwarding:**
```
At your domain registrar or Cloudflare:
- Forward noreply@yourdomain.com → your.gmail@gmail.com
- Forward alerts@yourdomain.com → your.gmail@gmail.com
```
4. **SMTP Settings:**
```
SMTP Host: smtp.gmail.com
SMTP Port: 587
SMTP Username: your.gmail@gmail.com
SMTP Password: (16-char app password from step 2)
From Email: noreply@yourdomain.com
```
### WhatsApp Business API Setup
**Features:**
- ✅ First 1,000 conversations/month FREE
- ✅ Perfect for 10 tenants (~500 messages/month)
**Setup Steps:**
1. **Create Meta Business Account:**
```
1. Go to business.facebook.com
2. Create Business Account
3. Complete business verification
```
2. **Add WhatsApp Product:**
```
1. Go to developers.facebook.com
2. Create New App → Business
3. Add WhatsApp product
4. Complete setup wizard
```
3. **Configure Phone Number:**
```
1. Test with your personal number initially
2. Later: Get dedicated business number
3. Verify phone number with SMS code
```
4. **Create Message Templates:**
```
1. Go to WhatsApp Manager
2. Create templates for:
- Low inventory alert
- Expired product alert
- Forecast summary
- Order notification
3. Submit for approval (15 min - 24 hours)
```
5. **Get API Credentials:**
```
Save these values:
- Phone Number ID: (from WhatsApp Manager)
- Access Token: (from App Dashboard)
- Business Account ID: (from WhatsApp Manager)
- Webhook Verify Token: (create your own secure string)
```
---
## Kubernetes Deployment
### Step 1: Prepare Container Images
#### Option A: Using Docker Hub (Recommended)
```bash
# On your local machine
docker login
# Build all images
docker-compose build
# Tag images for Docker Hub
# Replace YOUR_USERNAME with your Docker Hub username
export DOCKER_USERNAME="YOUR_USERNAME"
./scripts/tag-images.sh $DOCKER_USERNAME
# Push to Docker Hub
./scripts/push-images.sh $DOCKER_USERNAME
# Update prod kustomization with your username
# Edit: infrastructure/kubernetes/overlays/prod/kustomization.yaml
# Replace all "bakery/" with "$DOCKER_USERNAME/"
```
#### Option B: Using MicroK8s Registry
```bash
# On VPS
microk8s enable registry
# Get registry address (usually localhost:32000)
kubectl get service -n container-registry
# On local machine, allow the insecure (HTTP) registry.
# Add this to /etc/docker/daemon.json, merging with any existing settings:
#   {
#     "insecure-registries": ["YOUR_VPS_IP:32000"]
#   }
# Restart Docker
sudo systemctl restart docker
# Tag and push images
docker tag bakery/auth-service YOUR_VPS_IP:32000/bakery/auth-service
docker push YOUR_VPS_IP:32000/bakery/auth-service
# Repeat for all services...
```
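The "repeat for all services" step can be wrapped in a loop. This is only a sketch: `registry_ref` and `push_services` are hypothetical helpers, and since `auth-service` is the only image name given in this guide, the rest of the service list has to come from your own build.

```bash
#!/usr/bin/env bash
# registry_ref is a hypothetical helper that builds the target image reference.
registry_ref() {
  local registry="$1" service="$2"
  echo "${registry}/bakery/${service}"
}

# push_services (also hypothetical) tags and pushes every service passed in,
# stopping at the first failure.
push_services() {
  local registry="$1" service target
  shift
  for service in "$@"; do
    target=$(registry_ref "$registry" "$service")
    docker tag "bakery/${service}" "$target" || return 1
    docker push "$target" || return 1
  done
}

# Usage (replace the placeholder with your real service names):
#   push_services "YOUR_VPS_IP:32000" auth-service OTHER_SERVICE
```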
### Step 2: Update Production Configuration
The production configuration is already set up for **bakewise.ai** domain:
**Production URLs:**
- **Main Application:** https://bakewise.ai
- **API Endpoints:** https://bakewise.ai/api/v1/...
- **Monitoring Dashboard:** https://monitoring.bakewise.ai/grafana
- **Prometheus:** https://monitoring.bakewise.ai/prometheus
- **SigNoz (Traces/Metrics/Logs):** https://monitoring.bakewise.ai/signoz
- **AlertManager:** https://monitoring.bakewise.ai/alertmanager
```bash
# Verify the configuration is correct:
cat infrastructure/kubernetes/overlays/prod/prod-ingress.yaml | grep -A 3 "host:"
# Expected output should show:
# - host: bakewise.ai
# - host: monitoring.bakewise.ai
# Verify CORS configuration
cat infrastructure/kubernetes/overlays/prod/prod-configmap.yaml | grep CORS
# Expected: CORS_ORIGINS: "https://bakewise.ai"
```
**If using a different domain**, update these files:
```bash
# 1. Update domain names
nano infrastructure/kubernetes/overlays/prod/prod-ingress.yaml
# Replace bakewise.ai with your domain
# 2. Update ConfigMap
nano infrastructure/kubernetes/overlays/prod/prod-configmap.yaml
# Update CORS_ORIGINS
# 3. Verify image names (if using custom registry)
nano infrastructure/kubernetes/overlays/prod/kustomization.yaml
```
---
## Configuration & Secrets
### Step 1: Generate Strong Passwords
```bash
# Generate passwords for all services
openssl rand -base64 32 # For each database
openssl rand -hex 32 # For JWT secrets and API keys
# Save all passwords securely!
# Recommended: Use a password manager (1Password, LastPass, Bitwarden)
```
### Step 2: Update Application Secrets
```bash
# Edit the secrets file
nano infrastructure/kubernetes/base/secrets.yaml
# Update ALL of these values:
# Database passwords (14 databases):
AUTH_DB_PASSWORD: <base64-encoded-password>
TENANT_DB_PASSWORD: <base64-encoded-password>
# ... (all 14 databases)
# Redis password:
REDIS_PASSWORD: <base64-encoded-password>
# JWT secrets:
JWT_SECRET_KEY: <base64-encoded-secret>
JWT_REFRESH_SECRET_KEY: <base64-encoded-secret>
# SMTP settings (from email setup):
SMTP_HOST: <base64-encoded-host> # smtp.zoho.com or smtp.gmail.com
SMTP_PORT: <base64-encoded-port> # 587
SMTP_USERNAME: <base64-encoded-username> # your email
SMTP_PASSWORD: <base64-encoded-password> # app password
DEFAULT_FROM_EMAIL: <base64-encoded-email> # noreply@yourdomain.com
# WhatsApp credentials (from WhatsApp setup):
WHATSAPP_ACCESS_TOKEN: <base64-encoded-token>
WHATSAPP_PHONE_NUMBER_ID: <base64-encoded-id>
WHATSAPP_BUSINESS_ACCOUNT_ID: <base64-encoded-id>
WHATSAPP_WEBHOOK_VERIFY_TOKEN: <base64-encoded-token>
# Database connection strings (update with actual passwords):
AUTH_DATABASE_URL: postgresql+asyncpg://auth_user:PASSWORD@auth-db:5432/auth_db?ssl=require
# ... (all 14 databases)
```
**To base64 encode:**
```bash
echo -n "your-password-here" | base64
```
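A common mistake at this step is encoding a trailing newline (plain `echo` without `-n`) or letting `base64` wrap long values across lines. The sketch below avoids both; `encode_secret` and `new_encoded_password` are hypothetical helper names, not scripts from the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for preparing values for secrets.yaml.

# Encode a value with no trailing newline in the input and no line wrapping
# in the output.
encode_secret() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Generate a random 32-byte password and print it next to its encoded form.
new_encoded_password() {
  local password
  password=$(openssl rand -base64 32)
  printf 'plain:   %s\nencoded: %s\n' "$password" "$(encode_secret "$password")"
}

encode_secret "your-password-here"; echo
# prints: eW91ci1wYXNzd29yZC1oZXJl
```

Store the plain value in your password manager and paste only the encoded value into the secrets file.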
**CRITICAL:** Never commit real secrets to git! Use `.gitignore` for secrets files.
### Step 3: Apply Secrets
```bash
# Copy manifests to VPS
scp -r infrastructure/kubernetes user@YOUR_VPS_IP:~/
# SSH to VPS
ssh user@YOUR_VPS_IP
# Apply secrets
kubectl apply -f ~/kubernetes/base/secrets.yaml
# Verify secrets created
kubectl get secrets -n bakery-ia
```
---
## Database Migrations
### Step 1: Deploy Databases
```bash
# On VPS
kubectl apply -k ~/kubernetes/overlays/prod
# Wait for databases to be ready (5-10 minutes)
kubectl wait --for=condition=ready pod -l app.kubernetes.io/component=database -n bakery-ia --timeout=600s
# Check status
kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database
```
### Step 2: Run Migrations
Migrations are automatically handled by init containers in each service. Verify they completed:
```bash
# Check migration job status
kubectl get jobs -n bakery-ia | grep migration
# All should show "COMPLETIONS = 1/1"
# Check logs if any failed
kubectl logs -n bakery-ia job/auth-migration
```
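Rather than polling `kubectl get jobs` by hand, `kubectl wait` can block until the jobs finish. The sketch below assumes, as the `grep migration` check above implies, that migrations run as Kubernetes Jobs with "migration" in their names; `wait_for_migrations` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# wait_for_migrations is a hypothetical helper. It assumes migrations run as
# Jobs whose names contain "migration".
wait_for_migrations() {
  local namespace="${1:-bakery-ia}" timeout="${2:-600s}" job_names
  # List job names in TYPE/NAME form, e.g. "job.batch/auth-migration".
  job_names=$(kubectl get jobs -n "$namespace" -o name | grep migration) || {
    echo "No migration jobs found in namespace $namespace" >&2
    return 1
  }
  # Block until every listed job reports the Complete condition.
  # $job_names is deliberately unquoted so each name becomes its own argument.
  kubectl wait --for=condition=complete $job_names -n "$namespace" --timeout="$timeout"
}

# Usage on the VPS:
#   wait_for_migrations bakery-ia 600s
```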
### Step 3: Verify Database Schemas
```bash
# Connect to a database to verify
kubectl exec -n bakery-ia deployment/auth-db -it -- psql -U auth_user -d auth_db
# Inside psql:
\dt # List tables
\d users # Describe users table
\q # Quit
```
---
## Verification & Testing
### Step 1: Check All Pods Running
```bash
# View all pods
kubectl get pods -n bakery-ia
# Expected: All pods in "Running" state, none in CrashLoopBackOff
# Check for issues
kubectl get pods -n bakery-ia | grep -vE "Running|Completed"
# View logs for any problematic pods
kubectl logs -n bakery-ia POD_NAME
```
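A pod can be in the `Running` state while its containers are still not ready, which the `grep` above does not catch. The following sketch also compares the READY column; `unhealthy_pods` is a hypothetical helper that assumes the default `kubectl get pods` column order (NAME, READY, STATUS, ...).

```bash
#!/usr/bin/env bash
# unhealthy_pods is a hypothetical helper: it reads `kubectl get pods --no-headers`
# output on stdin and prints "NAME STATUS READY" for pods that need attention.
unhealthy_pods() {
  awk '
    {
      split($2, ready, "/")                 # READY column, e.g. "1/1"
      if ($3 == "Completed") next           # finished jobs are fine
      if ($3 != "Running" || ready[1] != ready[2]) print $1, $3, $2
    }'
}

# Usage:
#   kubectl get pods -n bakery-ia --no-headers | unhealthy_pods
```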
### Step 2: Check Services and Ingress
```bash
# View services
kubectl get svc -n bakery-ia
# View ingress
kubectl get ingress -n bakery-ia
# View certificates (should auto-issue from Let's Encrypt)
kubectl get certificate -n bakery-ia
# Describe certificate to check status
kubectl describe certificate bakery-ia-prod-tls-cert -n bakery-ia
```
### Step 3: Test Database Connections
```bash
# Test PostgreSQL TLS
kubectl exec -n bakery-ia deployment/auth-db -- sh -c \
'psql -U auth_user -d auth_db -c "SHOW ssl;"'
# Expected output: on
# Test Redis TLS (REDIS_PASSWORD must be set in your current shell)
kubectl exec -n bakery-ia deployment/redis -- redis-cli \
  --tls \
  --cert /tls/redis-cert.pem \
  --key /tls/redis-key.pem \
  --cacert /tls/ca-cert.pem \
  -a "$REDIS_PASSWORD" \
  ping
# Expected output: PONG
```
### Step 4: Test Frontend Access
```bash
# Test frontend (replace with your domain)
curl -I https://bakery.yourdomain.com
# Expected: HTTP/2 200
# Test API health
curl https://api.yourdomain.com/health
# Expected: {"status": "healthy"}
```
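Right after deployment the endpoints may take a few minutes to come up while certificates are issued and pods become ready. A retry loop along these lines can replace repeated manual `curl` calls; `wait_for_http_ok` is a hypothetical helper, and the default attempt count and delay are arbitrary.

```bash
#!/usr/bin/env bash
# wait_for_http_ok is a hypothetical helper: polls a URL until it returns
# HTTP 200 or the attempts run out.
wait_for_http_ok() {
  local url="$1" attempts="${2:-30}" delay="${3:-10}" code i
  for (( i = 1; i <= attempts; i++ )); do
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    if [ "$code" = "200" ]; then
      echo "OK after $i attempt(s): $url"
      return 0
    fi
    sleep "$delay"
  done
  echo "Gave up after $attempts attempts (last status: $code): $url" >&2
  return 1
}

# Usage (replace with your domain):
#   wait_for_http_ok https://api.yourdomain.com/health
```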
### Step 5: Test Authentication
```bash
# Create a test user (using your frontend or API)
curl -X POST https://api.yourdomain.com/api/v1/auth/register \
-H "Content-Type: application/json" \
-d '{
"email": "test@yourdomain.com",
"password": "TestPassword123!",
"name": "Test User"
}'
# Login
curl -X POST https://api.yourdomain.com/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{
"email": "test@yourdomain.com",
"password": "TestPassword123!"
}'
# Expected: JWT token in response
```
### Step 6: Test Email Delivery
```bash
# Trigger a password reset to test email
curl -X POST https://api.yourdomain.com/api/v1/auth/forgot-password \
-H "Content-Type: application/json" \
-d '{"email": "test@yourdomain.com"}'
# Check your email inbox for the reset link
# Check service logs if email not received:
kubectl logs -n bakery-ia deployment/auth-service | grep -i "email\|smtp"
```
### Step 7: Test WhatsApp (Optional)
```bash
# Send a test WhatsApp message
# This requires creating a tenant and configuring WhatsApp in the UI
# Or test via API once authenticated
```
---
## Post-Deployment
### Step 1: Access Monitoring Stack
Your production monitoring stack provides complete observability with multiple tools:
#### Production Monitoring URLs
Access via domain (recommended):
```
https://monitoring.bakewise.ai/grafana # Dashboards & visualization
https://monitoring.bakewise.ai/prometheus # Metrics & queries
https://monitoring.bakewise.ai/signoz # Unified observability platform (traces, metrics, logs)
https://monitoring.bakewise.ai/alertmanager # Alert management
```
Or via port forwarding (if needed):
```bash
# Grafana
kubectl port-forward -n monitoring svc/grafana 3000:3000 &
# Prometheus
kubectl port-forward -n monitoring svc/prometheus-external 9090:9090 &
# SigNoz
kubectl port-forward -n monitoring svc/signoz-frontend 3301:3301 &
# AlertManager
kubectl port-forward -n monitoring svc/alertmanager-external 9093:9093 &
```
#### Available Dashboards
Log in to Grafana (admin / your-password) and explore:
**Main Dashboards:**
1. **Gateway Metrics** - HTTP request rates, latencies, error rates
2. **Services Overview** - Multi-service health and performance
3. **Circuit Breakers** - Reliability metrics
**Extended Dashboards:**
4. **Service Performance Monitoring (SPM)** - RED metrics from distributed traces
5. **PostgreSQL Database** - Database health, connections, query performance
6. **Node Exporter Infrastructure** - CPU, memory, disk, network per node
7. **AlertManager Monitoring** - Alert tracking and notification status
8. **Business Metrics & KPIs** - Tenant activity, ML jobs, forecasts
#### Quick Health Check
```bash
# Verify all monitoring pods are running
kubectl get pods -n monitoring
# Forward Prometheus once (in the background) and use it for both checks
kubectl port-forward -n monitoring svc/prometheus-external 9090:9090 &
# Check Prometheus targets (all should be UP)
# Open: http://localhost:9090/targets
# View active alerts
# Open: http://localhost:9090/alerts
```
### Step 2: Configure Alerting
Update AlertManager with your notification email addresses:
```bash
# Edit alertmanager configuration
kubectl edit configmap -n monitoring alertmanager-config
# Update recipient emails in the routes section:
# - alerts@bakewise.ai (general alerts)
# - critical-alerts@bakewise.ai (critical issues)
# - oncall@bakewise.ai (on-call rotation)
```
Test alert delivery:
```bash
# Fire a test alert
kubectl run memory-test --image=polinux/stress --restart=Never \
--namespace=bakery-ia -- stress --vm 1 --vm-bytes 600M --timeout 300s
# Check alert appears in AlertManager
# https://monitoring.bakewise.ai/alertmanager
# Verify email notification received
# Clean up test
kubectl delete pod memory-test -n bakery-ia
```
### Step 3: Configure Backups
```bash
# Create backup script on VPS
cat > ~/backup-databases.sh <<'EOF'
#!/bin/bash
set -euo pipefail
# Shell aliases from ~/.bashrc are not available in scripts or cron,
# so call MicroK8s by its full path instead of the kubectl alias
KUBECTL="/snap/bin/microk8s kubectl"
BACKUP_DIR="/backups/$(date +%Y-%m-%d)"
mkdir -p "$BACKUP_DIR"
# Get all database pods
DBS=$($KUBECTL get pods -n bakery-ia -l app.kubernetes.io/component=database -o name)
for db in $DBS; do
  DB_NAME=$(echo "$db" | cut -d'/' -f2)
  echo "Backing up $DB_NAME..."
  # Assumes a "postgres" role exists in every pod; adjust -U (and add -d) per database if not
  $KUBECTL exec -n bakery-ia "$db" -- pg_dump -U postgres > "$BACKUP_DIR/${DB_NAME}.sql"
done
# Compress backups (relative paths inside the archive)
tar -czf "$BACKUP_DIR.tar.gz" -C /backups "$(basename "$BACKUP_DIR")"
rm -rf "$BACKUP_DIR"
# Keep only last 7 days
find /backups -name "*.tar.gz" -mtime +7 -delete
echo "Backup completed: $BACKUP_DIR.tar.gz"
EOF
chmod +x ~/backup-databases.sh
# Test backup
~/backup-databases.sh
# Set up daily cron job (2 AM)
(crontab -l 2>/dev/null; echo "0 2 * * * ~/backup-databases.sh") | crontab -
```
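A backup that was never checked is easy to over-trust: because the dump is written through a shell redirection, a failed `pg_dump` can still leave an empty `.sql` file behind. The sketch below checks that an archive is readable and holds at least one non-empty dump; `verify_backup_archive` is a hypothetical helper, not part of the repository.

```bash
#!/usr/bin/env bash
# verify_backup_archive is a hypothetical helper: it succeeds only if the
# archive can be extracted and contains at least one non-empty .sql file.
verify_backup_archive() {
  local archive="$1" tmp_dir status=1
  tmp_dir=$(mktemp -d) || return 1
  if tar -xzf "$archive" -C "$tmp_dir" 2>/dev/null; then
    # find prints matching files; any output means a usable dump exists.
    if [ -n "$(find "$tmp_dir" -type f -name '*.sql' ! -empty)" ]; then
      status=0
    fi
  fi
  rm -rf "$tmp_dir"
  return "$status"
}

# Usage on the VPS:
#   verify_backup_archive "/backups/$(date +%Y-%m-%d).tar.gz" || echo "Backup check failed"
```

This only confirms that dumps exist; restoring one into a scratch database remains the real test.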
### Step 4: Verify Monitoring is Working
Before proceeding, ensure all monitoring components are operational:
```bash
# 1. Check Prometheus targets
# Open: https://monitoring.bakewise.ai/prometheus/targets
# All targets should show "UP" status
# 2. Verify Grafana dashboards load data
# Open: https://monitoring.bakewise.ai/grafana
# Navigate to any dashboard and verify metrics are displaying
# 3. Check SigNoz is receiving traces
# Open: https://monitoring.bakewise.ai/signoz
# Search for traces from "gateway" service
# 4. Verify AlertManager cluster
# Open: https://monitoring.bakewise.ai/alertmanager
# Check that all 3 AlertManager instances are connected
```
### Step 5: Document Everything
Create a secure runbook with all credentials and procedures:
**Essential Information to Document:**
- [ ] VPS login credentials (stored securely in password manager)
- [ ] Database passwords (in password manager)
- [ ] Grafana admin password
- [ ] Domain registrar access (for bakewise.ai)
- [ ] Cloudflare access
- [ ] Email service credentials (SMTP)
- [ ] WhatsApp API credentials
- [ ] Docker Hub / Registry credentials
- [ ] Emergency contact information
- [ ] Rollback procedures
- [ ] Monitoring URLs and access procedures
### Step 6: Train Your Team
Conduct a training session covering:
- [ ] **Access monitoring dashboards**
  - Show how to log in to https://monitoring.bakewise.ai/grafana
  - Walk through key dashboards (Services Overview, Database, Infrastructure)
  - Explain how to interpret metrics and identify issues
- [ ] **Check application logs**
  ```bash
  # View logs for a service
  kubectl logs -n bakery-ia deployment/orders-service --tail=100 -f
  # Search for errors
  kubectl logs -n bakery-ia deployment/gateway | grep ERROR
  ```
- [ ] **Restart services when needed**
  ```bash
  # Restart a service (rolling update, no downtime)
  kubectl rollout restart deployment/orders-service -n bakery-ia
  ```
- [ ] **Respond to alerts**
  - Show how to access AlertManager at https://monitoring.bakewise.ai/alertmanager
  - Review common alerts and their resolution steps
  - Reference the [Production Operations Guide](./PRODUCTION_OPERATIONS_GUIDE.md)
- [ ] **Share documentation**
  - [PILOT_LAUNCH_GUIDE.md](./PILOT_LAUNCH_GUIDE.md) - This guide
  - [PRODUCTION_OPERATIONS_GUIDE.md](./PRODUCTION_OPERATIONS_GUIDE.md) - Daily operations
  - [security-checklist.md](./security-checklist.md) - Security procedures
- [ ] **Set up on-call rotation** (if applicable)
  - Configure in AlertManager
  - Document escalation procedures
---
## Troubleshooting
### Issue: Pods Not Starting
```bash
# Check pod status
kubectl describe pod POD_NAME -n bakery-ia
# Common causes:
# 1. Image pull errors
kubectl get events -n bakery-ia | grep -i "pull"
# 2. Resource limits
kubectl describe node
# 3. Volume mount issues
kubectl get pvc -n bakery-ia
```
### Issue: Certificate Not Issuing
```bash
# Check certificate status
kubectl describe certificate bakery-ia-prod-tls-cert -n bakery-ia
# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager
# Check challenges
kubectl get challenges -n bakery-ia
# Verify DNS is correct
nslookup bakery.yourdomain.com
```
### Issue: Database Connection Errors
```bash
# Check database pod
kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database
# Check database logs
kubectl logs -n bakery-ia deployment/auth-db
# Test connection from service pod
kubectl exec -n bakery-ia deployment/auth-service -- nc -zv auth-db 5432
```
### Issue: Services Can't Connect to Databases
```bash
# Check if SSL is enabled
kubectl exec -n bakery-ia deployment/auth-db -- sh -c \
'psql -U auth_user -d auth_db -c "SHOW ssl;"'
# Check service logs for SSL errors
kubectl logs -n bakery-ia deployment/auth-service | grep -i "ssl\|tls"
# Restart service to pick up new SSL config
kubectl rollout restart deployment/auth-service -n bakery-ia
```
### Issue: Out of Resources
```bash
# Check node resources
kubectl top nodes
# Check pod resource usage
kubectl top pods -n bakery-ia
# Identify resource hogs
kubectl top pods -n bakery-ia --sort-by=memory
# Scale down non-critical services temporarily (replace DEPLOYMENT_NAME)
kubectl scale deployment DEPLOYMENT_NAME -n bakery-ia --replicas=0
```
---
## Next Steps After Successful Launch
1. **Monitor for 48 Hours**
   - Check dashboards daily
   - Review error logs
   - Monitor resource usage
   - Test all functionality
2. **Optimize Based on Metrics**
   - Adjust resource limits if needed
   - Fine-tune autoscaling thresholds
   - Optimize database queries if slow
3. **Onboard First Tenant**
   - Create test tenant
   - Upload sample data
   - Test all features
   - Gather feedback
4. **Scale Gradually**
   - Add 1-2 tenants at a time
   - Monitor resource usage
   - Upgrade VPS if needed (see scaling guide)
5. **Plan for Growth**
   - Review [PRODUCTION_OPERATIONS_GUIDE.md](./PRODUCTION_OPERATIONS_GUIDE.md)
   - Implement additional monitoring
   - Plan capacity upgrades
   - Consider managed services for scale
---
## Cost Scaling Path
| Tenants | RAM | CPU | Storage | Monthly Cost |
|---------|-----|-----|---------|--------------|
| 10 | 20 GB | 8 cores | 200 GB | €40-80 |
| 25 | 32 GB | 12 cores | 300 GB | €80-120 |
| 50 | 48 GB | 16 cores | 500 GB | €150-200 |
| 100+ | Consider multi-node cluster or managed K8s | - | - | €300+ |
---
## Support Resources
**Documentation:**
- **Operations Guide:** [PRODUCTION_OPERATIONS_GUIDE.md](./PRODUCTION_OPERATIONS_GUIDE.md) - Daily operations, monitoring, incident response
- **Security Guide:** [security-checklist.md](./security-checklist.md) - Security procedures and compliance
- **Database Security:** [database-security.md](./database-security.md) - Database operations and TLS configuration
- **TLS Configuration:** [tls-configuration.md](./tls-configuration.md) - Certificate management
- **RBAC Implementation:** [rbac-implementation.md](./rbac-implementation.md) - Access control
**Monitoring Access:**
- **Grafana:** https://monitoring.bakewise.ai/grafana (admin / your-password)
- **Prometheus:** https://monitoring.bakewise.ai/prometheus
- **SigNoz:** https://monitoring.bakewise.ai/signoz
- **AlertManager:** https://monitoring.bakewise.ai/alertmanager
**External Resources:**
- **MicroK8s Docs:** https://microk8s.io/docs
- **Kubernetes Docs:** https://kubernetes.io/docs
- **Let's Encrypt:** https://letsencrypt.org/docs
- **Cloudflare DNS:** https://developers.cloudflare.com/dns
- **Monitoring Stack README:** infrastructure/kubernetes/base/components/monitoring/README.md
---
## Summary Checklist
Before going live, ensure:
- [ ] VPS provisioned and accessible
- [ ] MicroK8s installed and configured
- [ ] Domain registered and DNS configured
- [ ] Cloudflare protection enabled
- [ ] TLS certificates generated
- [ ] Email service configured and tested
- [ ] WhatsApp API setup (optional for launch)
- [ ] Container images built and pushed
- [ ] Production configs updated (domains, CORS, etc.)
- [ ] Secrets generated (strong passwords!)
- [ ] All pods running successfully
- [ ] Databases accepting TLS connections
- [ ] Let's Encrypt certificates issued
- [ ] Frontend accessible via HTTPS
- [ ] API health check passing
- [ ] Test user can log in
- [ ] Email delivery working
- [ ] Monitoring dashboards loading
- [ ] Backups configured and tested
- [ ] Team trained on operations
- [ ] Documentation complete
- [ ] Emergency procedures documented
---
**🎉 Congratulations! Your Bakery-IA platform is now live in production!**

*Estimated total time: 2-4 hours for first deployment*
*Subsequent updates: 15-30 minutes*

---

**Document Version:** 1.0
**Last Updated:** 2026-01-07
**Maintained By:** DevOps Team