# Bakery-IA Pilot Launch Guide
**Complete guide for deploying to production for a 10-tenant pilot program**
**Last Updated:** 2026-01-07
**Target Environment:** clouding.io VPS with MicroK8s
**Estimated Cost:** €41-81/month
**Time to Deploy:** 2-4 hours (first time)
---
## Table of Contents
1. [Executive Summary](#executive-summary)
2. [Pre-Launch Checklist](#pre-launch-checklist)
3. [VPS Provisioning](#vps-provisioning)
4. [Infrastructure Setup](#infrastructure-setup)
5. [Domain & DNS Configuration](#domain--dns-configuration)
6. [TLS/SSL Certificates](#tlsssl-certificates)
7. [Email & Communication Setup](#email--communication-setup)
8. [Kubernetes Deployment](#kubernetes-deployment)
9. [Configuration & Secrets](#configuration--secrets)
10. [Database Migrations](#database-migrations)
11. [Verification & Testing](#verification--testing)
12. [Post-Deployment](#post-deployment)
---
## Executive Summary
### What You're Deploying
A complete multi-tenant SaaS platform with:
- **18 microservices** (auth, tenant, ML forecasting, inventory, sales, orders, etc.)
- **14 PostgreSQL databases** with TLS encryption
- **Redis cache** with TLS
- **RabbitMQ** message broker
- **Monitoring stack** (Prometheus, Grafana, AlertManager)
- **Full security** (TLS, RBAC, audit logging)
### Total Cost Breakdown
| Service | Provider | Monthly Cost |
|---------|----------|-------------|
| VPS Server (20GB RAM, 8 vCPU, 200GB SSD) | clouding.io | €40-80 |
| Domain | Namecheap/Cloudflare | €1.25 (€15/year) |
| Email | Zoho Free / Gmail | €0 |
| WhatsApp API | Meta Business | €0 (1k free conversations) |
| DNS | Cloudflare | €0 |
| SSL | Let's Encrypt | €0 |
| **TOTAL** | | **€41-81/month** |
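The total can be reproduced with a quick calculation. This is a minimal sketch, assuming the domain renews at €15/year and using the VPS price range from the table; the variable names are illustrative only.

```bash
# Monthly cost range: VPS (EUR 40-80) plus the domain at EUR 15/year.
# All other line items in the table are free-tier and contribute 0.
vps_min=40
vps_max=80
domain_per_year=15
monthly_range="$(LC_ALL=C awk -v lo="$vps_min" -v hi="$vps_max" -v dom="$domain_per_year" \
  'BEGIN { printf "%.2f %.2f", lo + dom / 12, hi + dom / 12 }')"
echo "Estimated monthly cost (EUR, min max): $monthly_range"
```

Rounded to whole euros, this matches the €41-81/month figure above.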
### Timeline
| Phase | Duration | Description |
|-------|----------|-------------|
| Pre-Launch Setup | 1-2 hours | Domain, VPS provisioning, accounts setup |
| Infrastructure Setup | 1 hour | MicroK8s installation, firewall config |
| Deployment | 30-60 min | Deploy all services and databases |
| Verification | 30-60 min | Test everything works |
| **Total** | **2-4 hours** | First-time deployment |
---
## Pre-Launch Checklist
### Required Accounts & Services
- [ ] **Domain Name**
  - Register at Namecheap or Cloudflare (€10-15/year)
  - Suggested: `bakeryforecast.es` or `bakery-ia.com`
- [ ] **VPS Account**
  - Sign up at [clouding.io](https://www.clouding.io)
  - Payment method configured
- [ ] **Email Service** (Choose ONE)
  - Option A: Zoho Mail FREE (recommended for full send/receive)
  - Option B: Gmail SMTP + domain forwarding
  - Option C: Google Workspace (14-day free trial, then €5.75/month)
- [ ] **WhatsApp Business API**
  - Create Meta Business Account (free)
  - Verify business identity
  - Phone number ready (non-VoIP)
- [ ] **DNS Access**
  - Cloudflare account (free, recommended)
  - Or domain registrar DNS panel access
- [ ] **Container Registry** (Choose ONE)
  - Option A: Docker Hub account (recommended)
  - Option B: GitHub Container Registry
  - Option C: MicroK8s built-in registry
### Required Tools on Local Machine
```bash
# Verify you have these installed:
kubectl version --client
docker --version
git --version
ssh -V
openssl version
# Install if missing (macOS):
brew install kubectl docker git openssh openssl
```
### Repository Setup
```bash
# Clone the repository
git clone https://github.com/yourusername/bakery-ia.git
cd bakery-ia
# Verify structure
ls infrastructure/kubernetes/overlays/prod/
```
---
## VPS Provisioning
### Recommended Configuration
**For 10-tenant pilot program:**
- **RAM:** 20 GB
- **CPU:** 8 vCPU cores
- **Storage:** 200 GB NVMe SSD (triple replica)
- **Network:** 1 Gbps connection
- **OS:** Ubuntu 22.04 LTS
- **Monthly Cost:** €40-80 (check current pricing)
### Why These Specs?
**Memory Breakdown:**
- Application services: 14.1 GB
- Databases (18 instances): 4.6 GB
- Infrastructure (Redis, RabbitMQ): 0.8 GB
- Gateway/Frontend: 1.8 GB
- Monitoring: 1.5 GB
- System overhead: ~3 GB
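- **Total:** ~26 GB if every component runs at its peak at the same time. The 20 GB plan is expected to be sufficient because HPA keeps replica counts low outside peak load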
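The memory total can be checked by summing the figures in the list. This is a rough sketch that treats the "~3 GB" system overhead as exactly 3; the variable names are illustrative.

```bash
# Sum the per-category memory figures (GB) from the breakdown above.
memory_total="$(LC_ALL=C awk 'BEGIN { printf "%.1f", 14.1 + 4.6 + 0.8 + 1.8 + 1.5 + 3 }')"
# Gap between the summed figure and the 20 GB plan.
memory_gap="$(LC_ALL=C awk -v total="$memory_total" 'BEGIN { printf "%.1f", total - 20 }')"
echo "Summed memory: ${memory_total} GB (plan: 20 GB, gap: ${memory_gap} GB)"
```

The gap is the amount the plan relies on not being in use at the same time, so it is worth watching `kubectl top nodes` during the first days of the pilot.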
**Storage Breakdown:**
- Databases: 36 GB (18 × 2GB)
- ML Models: 10 GB
- Redis: 1 GB
- RabbitMQ: 2 GB
- Prometheus metrics: 20 GB
- Container images: ~30 GB
- Growth buffer: 100 GB
- **Total:** 199 GB
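The same check for storage, as a small sketch using the figures from the list (the "~30 GB" for container images is treated as exactly 30):

```bash
# Sum the storage figures (GB): databases, ML models, Redis, RabbitMQ,
# Prometheus metrics, container images, growth buffer.
storage_total=$((36 + 10 + 1 + 2 + 20 + 30 + 100))
storage_headroom=$((200 - storage_total))
echo "Planned storage: ${storage_total} GB (headroom on a 200 GB disk: ${storage_headroom} GB)"
```

Because the growth buffer is already part of the sum, the small headroom is expected rather than a warning sign.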
### Provisioning Steps
1. **Create VPS at clouding.io:**
```
1. Log in to clouding.io dashboard
2. Click "Create New Server"
3. Select:
- OS: Ubuntu 22.04 LTS
- RAM: 20 GB
- CPU: 8 vCPU
- Storage: 200 GB NVMe SSD
- Location: Barcelona (best for Spain)
4. Set hostname: bakery-ia-prod-01
5. Add SSH key (or use password)
6. Create server
```
2. **Note your server details:**
```bash
# Save these for later:
VPS_IP="YOUR_VPS_IP_ADDRESS"
VPS_ROOT_PASSWORD="YOUR_ROOT_PASSWORD" # If not using SSH key
```
3. **Initial SSH connection:**
```bash
# Test connection
ssh root@$VPS_IP
# Update system
apt update && apt upgrade -y
```
---
## Infrastructure Setup
### Step 1: Install MicroK8s
```bash
# SSH into your VPS
ssh root@$VPS_IP
# Install MicroK8s
snap install microk8s --classic --channel=1.28/stable
# Add your user to microk8s group
usermod -a -G microk8s $USER
chown -f -R $USER ~/.kube
newgrp microk8s
# Verify installation
microk8s status --wait-ready
```
### Step 2: Enable Required Add-ons
```bash
# Enable core add-ons
microk8s enable dns
microk8s enable hostpath-storage
microk8s enable ingress
microk8s enable cert-manager
microk8s enable metrics-server
microk8s enable rbac
# Optional but recommended
microk8s enable prometheus # For monitoring
microk8s enable registry # If using local registry
# Setup kubectl alias
echo "alias kubectl='microk8s kubectl'" >> ~/.bashrc
source ~/.bashrc
# Verify
kubectl get nodes
kubectl get pods -A
```
### Step 3: Configure Firewall
```bash
# Allow necessary ports
ufw allow 22/tcp # SSH
ufw allow 80/tcp # HTTP
ufw allow 443/tcp # HTTPS
ufw allow 16443/tcp # Kubernetes API (optional)
# Enable firewall
ufw enable
# Check status
ufw status verbose
```
### Step 4: Create Namespace
```bash
# Create bakery-ia namespace
kubectl create namespace bakery-ia
# Verify
kubectl get namespaces
```
---
## Domain & DNS Configuration
### Step 1: Register Domain
1. Go to Namecheap or Cloudflare Registrar
2. Search for your desired domain
3. Complete purchase (~€10-15/year)
4. Save domain credentials
### Step 2: Configure Cloudflare DNS (Recommended)
1. **Add site to Cloudflare:**
```
1. Log in to Cloudflare
2. Click "Add a Site"
3. Enter your domain name
4. Choose Free plan
5. Cloudflare will scan existing DNS records
```
2. **Update nameservers at registrar:**
```
Point your domain's nameservers to Cloudflare:
- NS1: assigned.cloudflare.com
- NS2: assigned.cloudflare.com
(Cloudflare will provide the exact values)
```
3. **Add DNS records:**
```
Type    Name         Content          TTL    Proxy
A       @            YOUR_VPS_IP      Auto   Yes
A       www          YOUR_VPS_IP      Auto   Yes
A       api          YOUR_VPS_IP      Auto   Yes
A       monitoring   YOUR_VPS_IP      Auto   Yes
CNAME   *            yourdomain.com   Auto   No
4. **Configure SSL/TLS mode:**
```
SSL/TLS tab → Overview → Set to "Full (strict)"
```
5. **Test DNS propagation:**
```bash
# Wait 5-10 minutes, then test
nslookup yourdomain.com
nslookup api.yourdomain.com
```
---
## TLS/SSL Certificates
### Understanding Certificate Setup
The platform uses **two layers** of SSL/TLS:
1. **External (Ingress) SSL:** Let's Encrypt for public HTTPS
2. **Internal (Database) SSL:** Self-signed certificates for database connections
### Step 1: Generate Internal Certificates
```bash
# On your local machine
cd infrastructure/tls
# Generate certificates
./generate-certificates.sh
# This creates:
# - ca/ (Certificate Authority)
# - postgres/ (PostgreSQL server certs)
# - redis/ (Redis server certs)
```
**Certificate Details:**
- Root CA: 10-year validity (expires 2035)
- Server certs: 3-year validity (expires October 2028)
- Algorithm: RSA 4096-bit
- Signature: SHA-256
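To confirm these properties on a generated certificate, `openssl x509` can print the validity window and check for upcoming expiry. The sketch below creates a throwaway self-signed certificate in a temporary directory as a stand-in (2048-bit to keep it fast; it does not reproduce what `generate-certificates.sh` does) and then inspects it. The subject name and paths are illustrative.

```bash
#!/usr/bin/env bash
set -euo pipefail
workdir="$(mktemp -d)"
cert="$workdir/server-cert.pem"
# Stand-in certificate valid for 3 years (1095 days); NOT the project's real cert.
openssl req -x509 -newkey rsa:2048 -nodes -sha256 -days 1095 \
  -subj "/CN=demo-db" -keyout "$workdir/server-key.pem" -out "$cert" 2>/dev/null
# Print the subject and the notBefore/notAfter dates.
openssl x509 -in "$cert" -noout -subject -dates
# -checkend exits 0 if the certificate is still valid N seconds from now.
if openssl x509 -in "$cert" -noout -checkend $((30 * 24 * 3600)) >/dev/null; then
  cert_status="ok"
else
  cert_status="expires-soon"
fi
echo "status: $cert_status"
```

To check a real certificate, set `cert` to a file such as `infrastructure/tls/postgres/server-cert.pem` and skip the `openssl req` step.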
### Step 2: Create Kubernetes Secrets
```bash
# Create PostgreSQL TLS secret
kubectl create secret generic postgres-tls \
--from-file=server-cert.pem=infrastructure/tls/postgres/server-cert.pem \
--from-file=server-key.pem=infrastructure/tls/postgres/server-key.pem \
--from-file=ca-cert.pem=infrastructure/tls/postgres/ca-cert.pem \
-n bakery-ia
# Create Redis TLS secret
kubectl create secret generic redis-tls \
--from-file=redis-cert.pem=infrastructure/tls/redis/redis-cert.pem \
--from-file=redis-key.pem=infrastructure/tls/redis/redis-key.pem \
--from-file=ca-cert.pem=infrastructure/tls/redis/ca-cert.pem \
-n bakery-ia
# Verify secrets created
kubectl get secrets -n bakery-ia | grep tls
```
### Step 3: Configure Let's Encrypt (External SSL)
cert-manager is already enabled. Configure the ClusterIssuer:
```bash
# On VPS, create ClusterIssuer
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@yourdomain.com  # CHANGE THIS
    privateKeySecretRef:
      name: letsencrypt-production
    solvers:
    - http01:
        ingress:
          class: public
EOF
# Verify ClusterIssuer is ready
kubectl get clusterissuer
kubectl describe clusterissuer letsencrypt-production
```
---
## Email & Communication Setup
### Option A: Zoho Mail (FREE, Recommended)
**Features:**
- ✅ Free forever for 1 domain, 5 users
- ✅ 5GB storage per user
- ✅ Full send/receive capability
- ✅ Web interface + SMTP/IMAP
- ✅ Professional email addresses
**Setup Steps:**
1. **Sign up for Zoho Mail:**
```
1. Go to https://www.zoho.com/mail/
2. Click "Sign Up for Free"
3. Choose "Forever Free" plan
4. Enter your domain name
5. Complete verification
```
2. **Verify domain ownership:**
```
Add TXT record to your DNS:
Type: TXT
Name: @
Value: zoho-verification=XXXXX.zoho.com
```
3. **Configure MX records:**
```
Priority   Type   Name   Value
10         MX     @      mx.zoho.com
20         MX     @      mx2.zoho.com
50         MX     @      mx3.zoho.com
```
4. **Get SMTP credentials:**
```
SMTP Host: smtp.zoho.com
SMTP Port: 587
SMTP Username: noreply@yourdomain.com
SMTP Password: (generate app password in Zoho settings)
```
### Option B: Gmail SMTP + Forwarding
**Features:**
- ✅ Completely free
- ✅ 500 emails/day (sufficient for pilot)
- ✅ Receive via domain forwarding
**Setup Steps:**
1. **Enable 2FA on your Gmail:**
```
1. Go to myaccount.google.com
2. Security → 2-Step Verification
3. Enable and complete setup
```
2. **Generate app password:**
```
1. Security → 2-Step Verification → App passwords
2. Select "Mail" and "Other (Custom name)"
3. Name it "Bakery-IA SMTP"
4. Copy the 16-character password
```
3. **Configure domain email forwarding:**
```
At your domain registrar or Cloudflare:
- Forward noreply@yourdomain.com → your.gmail@gmail.com
- Forward alerts@yourdomain.com → your.gmail@gmail.com
```
4. **SMTP Settings:**
```
SMTP Host: smtp.gmail.com
SMTP Port: 587
SMTP Username: your.gmail@gmail.com
SMTP Password: (16-char app password from step 2)
From Email: noreply@yourdomain.com
```
### WhatsApp Business API Setup
**Features:**
- ✅ First 1,000 conversations/month FREE
- ✅ Perfect for 10 tenants (~500 messages/month)
**Setup Steps:**
1. **Create Meta Business Account:**
```
1. Go to business.facebook.com
2. Create Business Account
3. Complete business verification
```
2. **Add WhatsApp Product:**
```
1. Go to developers.facebook.com
2. Create New App → Business
3. Add WhatsApp product
4. Complete setup wizard
```
3. **Configure Phone Number:**
```
1. Test with your personal number initially
2. Later: Get dedicated business number
3. Verify phone number with SMS code
```
4. **Create Message Templates:**
```
1. Go to WhatsApp Manager
2. Create templates for:
- Low inventory alert
- Expired product alert
- Forecast summary
- Order notification
3. Submit for approval (15 min - 24 hours)
```
5. **Get API Credentials:**
```
Save these values:
- Phone Number ID: (from WhatsApp Manager)
- Access Token: (from App Dashboard)
- Business Account ID: (from WhatsApp Manager)
- Webhook Verify Token: (create your own secure string)
```
---
## Kubernetes Deployment
### Step 1: Prepare Container Images
#### Option A: Using Docker Hub (Recommended)
```bash
# On your local machine
docker login
# Build all images
docker-compose build
# Tag images for Docker Hub
# Replace YOUR_USERNAME with your Docker Hub username
export DOCKER_USERNAME="YOUR_USERNAME"
./scripts/tag-images.sh $DOCKER_USERNAME
# Push to Docker Hub
./scripts/push-images.sh $DOCKER_USERNAME
# Update prod kustomization with your username
# Edit: infrastructure/kubernetes/overlays/prod/kustomization.yaml
# Replace all "bakery/" with "$DOCKER_USERNAME/"
```
#### Option B: Using MicroK8s Registry
```bash
# On VPS
microk8s enable registry
# Get registry address (usually localhost:32000)
kubectl get service -n container-registry
# On local machine, configure insecure registry
# Edit /etc/docker/daemon.json:
{
"insecure-registries": ["YOUR_VPS_IP:32000"]
}
# Restart Docker
sudo systemctl restart docker
# Tag and push images
docker tag bakery/auth-service YOUR_VPS_IP:32000/bakery/auth-service
docker push YOUR_VPS_IP:32000/bakery/auth-service
# Repeat for all services...
```
### Step 2: Update Production Configuration
The production configuration is already set up for the **bakewise.ai** domain:
**Production URLs:**
- **Main Application:** https://bakewise.ai
- **API Endpoints:** https://bakewise.ai/api/v1/...
- **Monitoring Dashboard:** https://monitoring.bakewise.ai/grafana
- **Prometheus:** https://monitoring.bakewise.ai/prometheus
- **SigNoz (Traces/Metrics/Logs):** https://monitoring.bakewise.ai/signoz
- **AlertManager:** https://monitoring.bakewise.ai/alertmanager
```bash
# Verify the configuration is correct:
grep -A 3 "host:" infrastructure/kubernetes/overlays/prod/prod-ingress.yaml
# Expected output should show:
# - host: bakewise.ai
# - host: monitoring.bakewise.ai
# Verify CORS configuration
grep CORS infrastructure/kubernetes/overlays/prod/prod-configmap.yaml
# Expected: CORS_ORIGINS: "https://bakewise.ai"
```
**If using a different domain**, update these files:
```bash
# 1. Update domain names
nano infrastructure/kubernetes/overlays/prod/prod-ingress.yaml
# Replace bakewise.ai with your domain
# 2. Update ConfigMap
nano infrastructure/kubernetes/overlays/prod/prod-configmap.yaml
# Update CORS_ORIGINS
# 3. Verify image names (if using custom registry)
nano infrastructure/kubernetes/overlays/prod/kustomization.yaml
```
---
## Configuration & Secrets
### Step 1: Generate Strong Passwords
```bash
# Generate passwords for all services
openssl rand -base64 32 # For each database
openssl rand -hex 32 # For JWT secrets and API keys
# Save all passwords securely!
# Recommended: Use a password manager (1Password, LastPass, Bitwarden)
```
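Generating each value by hand is error-prone, so a small loop can help. This is a sketch only: it covers the three secret names shown for illustration (the full list of 14 databases is not reproduced here), and the temporary output file is a hypothetical staging location that should be moved into a password manager and then deleted.

```bash
#!/usr/bin/env bash
set -euo pipefail
umask 077                      # new files readable by the current user only
out_file="$(mktemp)"           # hypothetical staging file; do not keep it around
# Database passwords: 32 random bytes, base64-encoded.
for name in AUTH_DB_PASSWORD TENANT_DB_PASSWORD; do
  printf '%s=%s\n' "$name" "$(openssl rand -base64 32)" >> "$out_file"
done
# JWT secret: 32 random bytes as 64 hex characters.
printf '%s=%s\n' "JWT_SECRET_KEY" "$(openssl rand -hex 32)" >> "$out_file"
secret_count="$(wc -l < "$out_file" | tr -d ' ')"
echo "Wrote $secret_count secrets to $out_file"
```

Extend the `for` list with the remaining secret names before using this for a real deployment.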
### Step 2: Update Application Secrets
```bash
# Edit the secrets file
nano infrastructure/kubernetes/base/secrets.yaml
# Update ALL of these values:
# Database passwords (14 databases):
AUTH_DB_PASSWORD: <base64-encoded-password>
TENANT_DB_PASSWORD: <base64-encoded-password>
# ... (all 14 databases)
# Redis password:
REDIS_PASSWORD: <base64-encoded-password>
# JWT secrets:
JWT_SECRET_KEY: <base64-encoded-secret>
JWT_REFRESH_SECRET_KEY: <base64-encoded-secret>
# SMTP settings (from email setup):
SMTP_HOST: <base64-encoded-host> # smtp.zoho.com or smtp.gmail.com
SMTP_PORT: <base64-encoded-port> # 587
SMTP_USERNAME: <base64-encoded-username> # your email
SMTP_PASSWORD: <base64-encoded-password> # app password
DEFAULT_FROM_EMAIL: <base64-encoded-email> # noreply@yourdomain.com
# WhatsApp credentials (from WhatsApp setup):
WHATSAPP_ACCESS_TOKEN: <base64-encoded-token>
WHATSAPP_PHONE_NUMBER_ID: <base64-encoded-id>
WHATSAPP_BUSINESS_ACCOUNT_ID: <base64-encoded-id>
WHATSAPP_WEBHOOK_VERIFY_TOKEN: <base64-encoded-token>
# Database connection strings (update with actual passwords):
AUTH_DATABASE_URL: postgresql+asyncpg://auth_user:PASSWORD@auth-db:5432/auth_db?ssl=require
# ... (all 14 databases)
```
**To base64 encode:**
```bash
echo -n "your-password-here" | base64
```
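The `-n` flag matters: without it, `echo` appends a newline that becomes part of the encoded secret and typically causes authentication failures that are hard to spot. The sketch below uses the illustrative value `secret` to show the difference and a round trip through `base64 --decode`.

```bash
secret='secret'   # illustrative value only
# printf '%s' emits the value with no trailing newline (same effect as echo -n).
encoded="$(printf '%s' "$secret" | base64)"
# Plain echo adds a newline, which changes the encoded result.
with_newline="$(echo "$secret" | base64)"
# Decode to confirm the stored value is exactly what was intended.
decoded="$(printf '%s' "$encoded" | base64 --decode)"
echo "encoded:      $encoded"
echo "with newline: $with_newline"
echo "round trip:   $decoded"
```

If a value in `secrets.yaml` ends in `Cg==` or `Cg`, it was probably encoded with a trailing newline.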
**CRITICAL:** Never commit real secrets to git! Use `.gitignore` for secrets files.
### Step 3: Apply Secrets
```bash
# Copy manifests to VPS
scp -r infrastructure/kubernetes user@YOUR_VPS_IP:~/
# SSH to VPS
ssh user@YOUR_VPS_IP
# Apply secrets
kubectl apply -f ~/kubernetes/base/secrets.yaml
# Verify secrets created
kubectl get secrets -n bakery-ia
```
---
## Database Migrations
### Step 1: Deploy Databases
```bash
# On VPS
kubectl apply -k ~/kubernetes/overlays/prod
# Wait for databases to be ready (5-10 minutes)
kubectl wait --for=condition=ready pod -l app.kubernetes.io/component=database -n bakery-ia --timeout=600s
# Check status
kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database
```
### Step 2: Run Migrations
Migrations are automatically handled by init containers in each service. Verify they completed:
```bash
# Check migration job status
kubectl get jobs -n bakery-ia | grep migration
# All should show "COMPLETIONS = 1/1"
# Check logs if any failed
kubectl logs -n bakery-ia job/auth-migration
```
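With many migration jobs, scanning the table by eye is easy to get wrong. The helper below is a sketch that filters `kubectl get jobs` output for migration jobs whose COMPLETIONS column is not `1/1`. The sample output is made up for illustration, and `tenant-migration` is a hypothetical job name.

```bash
# Print the names of migration jobs that have not completed.
# Reads `kubectl get jobs` table output on stdin (column 2 is COMPLETIONS).
incomplete_jobs() {
  awk 'NR > 1 && $1 ~ /migration/ && $2 != "1/1" { print $1 }'
}

# Illustrative sample; on the VPS use:
#   kubectl get jobs -n bakery-ia | incomplete_jobs
sample_jobs='NAME               COMPLETIONS   DURATION   AGE
auth-migration     1/1           12s        5m
tenant-migration   0/1           5m         5m'
printf '%s\n' "$sample_jobs" | incomplete_jobs
```

An empty result means every migration job finished.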
### Step 3: Verify Database Schemas
```bash
# Connect to a database to verify
kubectl exec -n bakery-ia deployment/auth-db -it -- psql -U auth_user -d auth_db
# Inside psql:
\dt # List tables
\d users # Describe users table
\q # Quit
```
---
## Verification & Testing
### Step 1: Check All Pods Running
```bash
# View all pods
kubectl get pods -n bakery-ia
# Expected: All pods in "Running" state, none in CrashLoopBackOff
# Check for issues
kubectl get pods -n bakery-ia | grep -vE "Running|Completed"
# View logs for any problematic pods
kubectl logs -n bakery-ia POD_NAME
```
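A pod can show `Running` while its containers are still not ready, which the `grep` above does not catch. The sketch below also compares the READY column (for example `0/1`). The pod names and output are illustrative, not taken from a real cluster.

```bash
# List pods that are neither Completed nor fully ready.
# Reads `kubectl get pods` table output on stdin (columns: NAME READY STATUS ...).
unhealthy_pods() {
  awk 'NR > 1 {
    if ($3 == "Completed") next
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2])
      print $1 " (" $3 ", ready " $2 ")"
  }'
}

# Illustrative sample; on the VPS use:
#   kubectl get pods -n bakery-ia | unhealthy_pods
sample_pods='NAME                  READY   STATUS             RESTARTS   AGE
auth-service-abc      1/1     Running            0          10m
auth-migration-xyz    0/1     Completed          0          10m
orders-service-def    0/1     CrashLoopBackOff   5          10m
gateway-ghi           0/1     Running            0          10m'
printf '%s\n' "$sample_pods" | unhealthy_pods
```

An empty result means every pod is either completed or running with all containers ready.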
### Step 2: Check Services and Ingress
```bash
# View services
kubectl get svc -n bakery-ia
# View ingress
kubectl get ingress -n bakery-ia
# View certificates (should auto-issue from Let's Encrypt)
kubectl get certificate -n bakery-ia
# Describe certificate to check status
kubectl describe certificate bakery-ia-prod-tls-cert -n bakery-ia
```
### Step 3: Test Database Connections
```bash
# Test PostgreSQL TLS
kubectl exec -n bakery-ia deployment/auth-db -- sh -c \
'psql -U auth_user -d auth_db -c "SHOW ssl;"'
# Expected output: on
# Test Redis TLS
# Note: $REDIS_PASSWORD is expanded by your local shell, so export it before running this
kubectl exec -n bakery-ia deployment/redis -- redis-cli \
--tls \
--cert /tls/redis-cert.pem \
--key /tls/redis-key.pem \
--cacert /tls/ca-cert.pem \
-a "$REDIS_PASSWORD" \
ping
# Expected output: PONG
```
### Step 4: Test Frontend Access
```bash
# Test frontend (replace with your domain)
curl -I https://bakery.yourdomain.com
# Expected: HTTP/2 200
# Test API health
curl https://api.yourdomain.com/health
# Expected: {"status": "healthy"}
```
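For a scripted check, the health response can be tested rather than read by eye. This is a sketch that assumes the response body has the `{"status": "healthy"}` shape shown above; the helper name `is_healthy` is made up for this example.

```bash
# Succeed when a JSON body contains "status": "healthy" (whitespace-tolerant).
is_healthy() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Against a real deployment, fetch the body first, for example:
#   body="$(curl -fsS --max-time 10 https://api.yourdomain.com/health)"
body='{"status": "healthy"}'   # illustrative response
if is_healthy "$body"; then
  echo "API healthy"
else
  echo "API NOT healthy"
fi
```

Using `curl -f` makes the fetch itself fail on HTTP errors, so a 502 from the ingress is not mistaken for an unhealthy body.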
### Step 5: Test Authentication
```bash
# Create a test user (using your frontend or API)
curl -X POST https://api.yourdomain.com/api/v1/auth/register \
-H "Content-Type: application/json" \
-d '{
"email": "test@yourdomain.com",
"password": "TestPassword123!",
"name": "Test User"
}'
# Login
curl -X POST https://api.yourdomain.com/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{
"email": "test@yourdomain.com",
"password": "TestPassword123!"
}'
# Expected: JWT token in response
```
### Step 6: Test Email Delivery
```bash
# Trigger a password reset to test email
curl -X POST https://api.yourdomain.com/api/v1/auth/forgot-password \
-H "Content-Type: application/json" \
-d '{"email": "test@yourdomain.com"}'
# Check your email inbox for the reset link
# Check service logs if email not received:
kubectl logs -n bakery-ia deployment/auth-service | grep -i "email\|smtp"
```
### Step 7: Test WhatsApp (Optional)
```bash
# Send a test WhatsApp message
# This requires creating a tenant and configuring WhatsApp in the UI
# Or test via API once authenticated
```
---
## Post-Deployment
### Step 1: Access Monitoring Stack
Your production monitoring stack provides complete observability with multiple tools:
#### Production Monitoring URLs
Access via domain (recommended):
```
https://monitoring.bakewise.ai/grafana # Dashboards & visualization
https://monitoring.bakewise.ai/prometheus # Metrics & queries
https://monitoring.bakewise.ai/signoz # Unified observability platform (traces, metrics, logs)
https://monitoring.bakewise.ai/alertmanager # Alert management
```
Or via port forwarding (if needed):
```bash
# Grafana
kubectl port-forward -n monitoring svc/grafana 3000:3000 &
# Prometheus
kubectl port-forward -n monitoring svc/prometheus-external 9090:9090 &
# SigNoz
kubectl port-forward -n monitoring svc/signoz-frontend 3301:3301 &
# AlertManager
kubectl port-forward -n monitoring svc/alertmanager-external 9093:9093 &
```
#### Available Dashboards
Log in to Grafana (admin / your-password) and explore:
**Main Dashboards:**
1. **Gateway Metrics** - HTTP request rates, latencies, error rates
2. **Services Overview** - Multi-service health and performance
3. **Circuit Breakers** - Reliability metrics
**Extended Dashboards:**
4. **Service Performance Monitoring (SPM)** - RED metrics from distributed traces
5. **PostgreSQL Database** - Database health, connections, query performance
6. **Node Exporter Infrastructure** - CPU, memory, disk, network per node
7. **AlertManager Monitoring** - Alert tracking and notification status
8. **Business Metrics & KPIs** - Tenant activity, ML jobs, forecasts
#### Quick Health Check
```bash
# Verify all monitoring pods are running
kubectl get pods -n monitoring
# Forward Prometheus once; both pages below use the same port-forward
kubectl port-forward -n monitoring svc/prometheus-external 9090:9090
# Check Prometheus targets (all should be UP)
# Open: http://localhost:9090/targets
# View active alerts
# Open: http://localhost:9090/alerts
```
### Step 2: Configure Alerting
Update AlertManager with your notification email addresses:
```bash
# Edit alertmanager configuration
kubectl edit configmap -n monitoring alertmanager-config
# Update recipient emails in the routes section:
# - alerts@bakewise.ai (general alerts)
# - critical-alerts@bakewise.ai (critical issues)
# - oncall@bakewise.ai (on-call rotation)
```
Test alert delivery:
```bash
# Fire a test alert
kubectl run memory-test --image=polinux/stress --restart=Never \
--namespace=bakery-ia -- stress --vm 1 --vm-bytes 600M --timeout 300s
# Check alert appears in AlertManager
# https://monitoring.bakewise.ai/alertmanager
# Verify email notification received
# Clean up test
kubectl delete pod memory-test -n bakery-ia
```
### Step 3: Configure Backups
```bash
# Create backup script on VPS
cat > ~/backup-databases.sh <<'EOF'
#!/bin/bash
# Cron and scripts do not load the kubectl alias or /snap/bin,
# so call MicroK8s by its full path.
KUBECTL="/snap/bin/microk8s kubectl"
BACKUP_DIR="/backups/$(date +%Y-%m-%d)"
mkdir -p "$BACKUP_DIR"
# Get all database pods
DBS=$($KUBECTL get pods -n bakery-ia -l app.kubernetes.io/component=database -o name)
for db in $DBS; do
  DB_NAME=$(echo "$db" | cut -d'/' -f2)
  echo "Backing up $DB_NAME..."
  # pg_dump without a database name dumps the database named after the user;
  # adjust -U (and add -d) to match each service, e.g. auth_user / auth_db
  $KUBECTL exec -n bakery-ia "$db" -- pg_dump -U postgres > "$BACKUP_DIR/${DB_NAME}.sql"
done
# Compress backups
tar -czf "$BACKUP_DIR.tar.gz" -C /backups "$(basename "$BACKUP_DIR")"
rm -rf "$BACKUP_DIR"
# Keep only last 7 days
find /backups -name "*.tar.gz" -mtime +7 -delete
echo "Backup completed: $BACKUP_DIR.tar.gz"
EOF
chmod +x ~/backup-databases.sh
# Test backup
~/backup-databases.sh
# Setup daily cron job (2 AM)
(crontab -l 2>/dev/null; echo "0 2 * * * $HOME/backup-databases.sh") | crontab -
```
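The retention rule (`find ... -mtime +7 -delete`) is worth trying in a scratch directory before trusting it with real backups. The sketch below assumes GNU `touch` (as on Ubuntu) for the `-d '10 days ago'` syntax; the file names are illustrative.

```bash
#!/usr/bin/env bash
set -euo pipefail
demo_dir="$(mktemp -d)"
# One archive backdated beyond the retention window, one created just now.
touch -d '10 days ago' "$demo_dir/old-backup.tar.gz"
touch "$demo_dir/new-backup.tar.gz"
# Same expression as the backup script: delete archives last modified more than 7 days ago.
find "$demo_dir" -name "*.tar.gz" -mtime +7 -delete
remaining="$(ls "$demo_dir")"
echo "Remaining after cleanup: $remaining"
```

Only the recent archive should remain. Note that `-mtime +7` counts whole 24-hour periods, so a file is removed once it is at least 8 days old.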
### Step 4: Verify Monitoring is Working
Before proceeding, ensure all monitoring components are operational:
```bash
# 1. Check Prometheus targets
# Open: https://monitoring.bakewise.ai/prometheus/targets
# All targets should show "UP" status
# 2. Verify Grafana dashboards load data
# Open: https://monitoring.bakewise.ai/grafana
# Navigate to any dashboard and verify metrics are displaying
# 3. Check SigNoz is receiving traces
# Open: https://monitoring.bakewise.ai/signoz
# Search for traces from "gateway" service
# 4. Verify AlertManager cluster
# Open: https://monitoring.bakewise.ai/alertmanager
# Check that all 3 AlertManager instances are connected
```
### Step 5: Document Everything
Create a secure runbook with all credentials and procedures:
**Essential Information to Document:**
- [ ] VPS login credentials (stored securely in password manager)
- [ ] Database passwords (in password manager)
- [ ] Grafana admin password
- [ ] Domain registrar access (for bakewise.ai)
- [ ] Cloudflare access
- [ ] Email service credentials (SMTP)
- [ ] WhatsApp API credentials
- [ ] Docker Hub / Registry credentials
- [ ] Emergency contact information
- [ ] Rollback procedures
- [ ] Monitoring URLs and access procedures
### Step 6: Train Your Team
Conduct a training session covering:
- [ ] **Access monitoring dashboards**
  - Show how to log in to https://monitoring.bakewise.ai/grafana
  - Walk through key dashboards (Services Overview, Database, Infrastructure)
  - Explain how to interpret metrics and identify issues
- [ ] **Check application logs**
  ```bash
  # View logs for a service
  kubectl logs -n bakery-ia deployment/orders-service --tail=100 -f
  # Search for errors
  kubectl logs -n bakery-ia deployment/gateway | grep ERROR
  ```
- [ ] **Restart services when needed**
  ```bash
  # Restart a service (rolling update, no downtime)
  kubectl rollout restart deployment/orders-service -n bakery-ia
  ```
- [ ] **Respond to alerts**
  - Show how to access AlertManager at https://monitoring.bakewise.ai/alertmanager
  - Review common alerts and their resolution steps
  - Reference the [Production Operations Guide](./PRODUCTION_OPERATIONS_GUIDE.md)
- [ ] **Share documentation**
  - [PILOT_LAUNCH_GUIDE.md](./PILOT_LAUNCH_GUIDE.md) - This guide
  - [PRODUCTION_OPERATIONS_GUIDE.md](./PRODUCTION_OPERATIONS_GUIDE.md) - Daily operations
  - [security-checklist.md](./security-checklist.md) - Security procedures
- [ ] **Set up on-call rotation** (if applicable)
  - Configure in AlertManager
  - Document escalation procedures
---
## Troubleshooting
### Issue: Pods Not Starting
```bash
# Check pod status
kubectl describe pod POD_NAME -n bakery-ia
# Common causes:
# 1. Image pull errors
kubectl get events -n bakery-ia | grep -i "pull"
# 2. Resource limits
kubectl describe node
# 3. Volume mount issues
kubectl get pvc -n bakery-ia
```
### Issue: Certificate Not Issuing
```bash
# Check certificate status
kubectl describe certificate bakery-ia-prod-tls-cert -n bakery-ia
# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager
# Check challenges
kubectl get challenges -n bakery-ia
# Verify DNS is correct
nslookup bakery.yourdomain.com
```
### Issue: Database Connection Errors
```bash
# Check database pod
kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database
# Check database logs
kubectl logs -n bakery-ia deployment/auth-db
# Test connection from service pod
kubectl exec -n bakery-ia deployment/auth-service -- nc -zv auth-db 5432
```
### Issue: Services Can't Connect to Databases
```bash
# Check if SSL is enabled
kubectl exec -n bakery-ia deployment/auth-db -- sh -c \
'psql -U auth_user -d auth_db -c "SHOW ssl;"'
# Check service logs for SSL errors
kubectl logs -n bakery-ia deployment/auth-service | grep -i "ssl\|tls"
# Restart service to pick up new SSL config
kubectl rollout restart deployment/auth-service -n bakery-ia
```
### Issue: Out of Resources
```bash
# Check node resources
kubectl top nodes
# Check pod resource usage
kubectl top pods -n bakery-ia
# Identify resource hogs
kubectl top pods -n bakery-ia --sort-by=memory
# Scale down non-critical workloads temporarily
# (replace DEPLOYMENT_NAME; the monitoring stack runs in the "monitoring" namespace)
kubectl scale deployment DEPLOYMENT_NAME -n monitoring --replicas=0
```
---
## Next Steps After Successful Launch
1. **Monitor for 48 Hours**
- Check dashboards daily
- Review error logs
- Monitor resource usage
- Test all functionality
2. **Optimize Based on Metrics**
- Adjust resource limits if needed
- Fine-tune autoscaling thresholds
- Optimize database queries if slow
3. **Onboard First Tenant**
- Create test tenant
- Upload sample data
- Test all features
- Gather feedback
4. **Scale Gradually**
- Add 1-2 tenants at a time
- Monitor resource usage
- Upgrade VPS if needed (see scaling guide)
5. **Plan for Growth**
- Review [PRODUCTION_OPERATIONS_GUIDE.md](./PRODUCTION_OPERATIONS_GUIDE.md)
- Implement additional monitoring
- Plan capacity upgrades
- Consider managed services for scale
---
## Cost Scaling Path
| Tenants | RAM | CPU | Storage | Monthly Cost |
|---------|-----|-----|---------|--------------|
| 10 | 20 GB | 8 cores | 200 GB | €40-80 |
| 25 | 32 GB | 12 cores | 300 GB | €80-120 |
| 50 | 48 GB | 16 cores | 500 GB | €150-200 |
| 100+ | Consider multi-node cluster or managed K8s | | | €300+ |
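The table implies a falling per-tenant cost as the pilot grows. This is a small sketch of that arithmetic, using only the price ranges from the rows above; the function name is made up for this example.

```bash
# Per-tenant monthly cost range for a row of the scaling table.
# Arguments: tenant count, minimum monthly cost, maximum monthly cost (EUR).
per_tenant_cost() {
  LC_ALL=C awk -v t="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { printf "%d tenants: EUR %.2f-%.2f per tenant\n", t, lo / t, hi / t }'
}

per_tenant_cost 10 40 80
per_tenant_cost 25 80 120
per_tenant_cost 50 150 200
```

These figures cover infrastructure only and exclude the domain and any paid email or messaging tiers.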
---
## Support Resources
**Documentation:**
- **Operations Guide:** [PRODUCTION_OPERATIONS_GUIDE.md](./PRODUCTION_OPERATIONS_GUIDE.md) - Daily operations, monitoring, incident response
- **Security Guide:** [security-checklist.md](./security-checklist.md) - Security procedures and compliance
- **Database Security:** [database-security.md](./database-security.md) - Database operations and TLS configuration
- **TLS Configuration:** [tls-configuration.md](./tls-configuration.md) - Certificate management
- **RBAC Implementation:** [rbac-implementation.md](./rbac-implementation.md) - Access control
**Monitoring Access:**
- **Grafana:** https://monitoring.bakewise.ai/grafana (admin / your-password)
- **Prometheus:** https://monitoring.bakewise.ai/prometheus
- **SigNoz:** https://monitoring.bakewise.ai/signoz
- **AlertManager:** https://monitoring.bakewise.ai/alertmanager
**External Resources:**
- **MicroK8s Docs:** https://microk8s.io/docs
- **Kubernetes Docs:** https://kubernetes.io/docs
- **Let's Encrypt:** https://letsencrypt.org/docs
- **Cloudflare DNS:** https://developers.cloudflare.com/dns
- **Monitoring Stack README:** infrastructure/kubernetes/base/components/monitoring/README.md
---
## Summary Checklist
Before going live, ensure:
- [ ] VPS provisioned and accessible
- [ ] MicroK8s installed and configured
- [ ] Domain registered and DNS configured
- [ ] Cloudflare protection enabled
- [ ] TLS certificates generated
- [ ] Email service configured and tested
- [ ] WhatsApp API setup (optional for launch)
- [ ] Container images built and pushed
- [ ] Production configs updated (domains, CORS, etc.)
- [ ] Secrets generated (strong passwords!)
- [ ] All pods running successfully
- [ ] Databases accepting TLS connections
- [ ] Let's Encrypt certificates issued
- [ ] Frontend accessible via HTTPS
- [ ] API health check passing
- [ ] Test user can login
- [ ] Email delivery working
- [ ] Monitoring dashboards loading
- [ ] Backups configured and tested
- [ ] Team trained on operations
- [ ] Documentation complete
- [ ] Emergency procedures documented
---
**🎉 Congratulations! Your Bakery-IA platform is now live in production!**
*Estimated total time: 2-4 hours for first deployment*
*Subsequent updates: 15-30 minutes*
---
**Document Version:** 1.0
**Last Updated:** 2026-01-07
**Maintained By:** DevOps Team