Bakery-IA Pilot Launch Guide

Complete guide for deploying to production for a 10-tenant pilot program

Last Updated: 2026-01-20
Target Environment: clouding.io VPS with MicroK8s
Estimated Cost: €41-81/month
Time to Deploy: 3-5 hours (first time, including fixes)
Status: ⚠️ REQUIRES PRE-DEPLOYMENT FIXES - See Production VPS Deployment Fixes
Version: 3.0


Table of Contents

  1. Executive Summary
  2. Infrastructure Architecture Overview
  3. ⚠️ CRITICAL: Pre-Deployment Fixes
  4. Pre-Launch Checklist
  5. VPS Provisioning
  6. Infrastructure Setup
  7. Domain & DNS Configuration
  8. TLS/SSL Certificates
  9. Email & Communication Setup
  10. Kubernetes Deployment
  11. Configuration & Secrets
  12. Database Migrations
  13. CI/CD Infrastructure Deployment
  14. Mailu Email Server Deployment
  15. Nominatim Geocoding Service
  16. SigNoz Monitoring Deployment
  17. Verification & Testing
  18. Post-Deployment

Executive Summary

What You're Deploying

A complete multi-tenant SaaS platform with:

  • 18 microservices (auth, tenant, ML forecasting, inventory, sales, orders, etc.)
  • 18 PostgreSQL databases with TLS encryption
  • Redis cache with TLS
  • RabbitMQ message broker
  • Monitoring stack (SigNoz unified observability, AlertManager, OTel Collector)
  • Full security (TLS, RBAC, audit logging)

Total Cost Breakdown

| Service | Provider | Monthly Cost |
|---------|----------|--------------|
| VPS Server (20GB RAM, 8 vCPU, 200GB SSD) | clouding.io | €40-80 |
| Domain | Namecheap/Cloudflare | €1.25 (€15/year) |
| Email | Zoho Free / Gmail | €0 |
| WhatsApp API | Meta Business | €0 (1k free conversations) |
| DNS | Cloudflare | €0 |
| SSL | Let's Encrypt | €0 |
| TOTAL | | €41-81/month |

Timeline

| Phase | Duration | Description |
|-------|----------|-------------|
| Pre-Launch Setup | 1-2 hours | Domain, VPS provisioning, accounts setup |
| Infrastructure Setup | 1 hour | MicroK8s installation, firewall config |
| Deployment | 30-60 min | Deploy all services and databases |
| Verification | 30-60 min | Test everything works |
| Total | 2-4 hours | First-time deployment |

Infrastructure Architecture Overview

Component Layers

The Bakery-IA platform is organized into distinct infrastructure layers, each with specific deployment dependencies.

┌─────────────────────────────────────────────────────────────────────────────┐
│                         LAYER 6: APPLICATION                                 │
│  Frontend │ Gateway │ 18 Microservices │ CronJobs & Workers                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                         LAYER 5: MONITORING                                  │
│  SigNoz (Unified Observability) │ AlertManager │ OTel Collector             │
├─────────────────────────────────────────────────────────────────────────────┤
│                         LAYER 4: PLATFORM SERVICES (Optional)               │
│  Mailu (Email) │ Nominatim (Geocoding) │ CI/CD (Tekton, Flux, Gitea)       │
├─────────────────────────────────────────────────────────────────────────────┤
│                         LAYER 3: DATA & STORAGE                              │
│  PostgreSQL (18 DBs) │ Redis │ RabbitMQ │ MinIO                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                         LAYER 2: NETWORK & SECURITY                          │
│  Unbound DNS │ CoreDNS │ Ingress Controller │ Cert-Manager │ TLS           │
├─────────────────────────────────────────────────────────────────────────────┤
│                         LAYER 1: FOUNDATION                                  │
│  Namespaces │ Storage Classes │ RBAC │ ConfigMaps │ Secrets                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                         LAYER 0: KUBERNETES CLUSTER                          │
│  MicroK8s (Production) │ Kind (Local Dev) │ EKS (AWS Alternative)           │
└─────────────────────────────────────────────────────────────────────────────┘

Deployment Order & Dependencies

Components must be deployed in a specific order due to dependencies:

1. Namespaces (bakery-ia, tekton-pipelines, flux-system)
   ↓
2. Cert-Manager & ClusterIssuers
   ↓
3. TLS Certificates (internal + ingress)
   ↓
4. Unbound DNS Resolver (required for Mailu DNSSEC)
   ↓
5. CoreDNS Configuration (forward to Unbound)
   ↓
6. Ingress Controller & Resources
   ↓
7. Data Layer: PostgreSQL, Redis, RabbitMQ, MinIO
   ↓
8. Database Migrations
   ↓
9. Application Services (18 microservices)
   ↓
10. Gateway & Frontend
    ↓
11. (Optional) CI/CD: Gitea → Tekton → Flux
    ↓
12. (Optional) Mailu Email Server
    ↓
13. (Optional) Nominatim Geocoding
    ↓
14. (Optional) SigNoz Monitoring

Infrastructure Components Summary

| Component | Purpose | Required | Namespace |
|-----------|---------|----------|-----------|
| MicroK8s | Kubernetes cluster | Yes | - |
| Cert-Manager | TLS certificate management | Yes | cert-manager |
| Ingress-Nginx | External traffic routing | Yes | ingress |
| PostgreSQL | 18 service databases | Yes | bakery-ia |
| Redis | Caching & sessions | Yes | bakery-ia |
| RabbitMQ | Message broker | Yes | bakery-ia |
| MinIO | Object storage (ML models) | Yes | bakery-ia |
| Unbound DNS | DNSSEC resolver | For Mailu | bakery-ia |
| Mailu | Self-hosted email server | Optional | bakery-ia |
| Nominatim | Geocoding service | Optional | bakery-ia |
| Gitea | Git server + container registry | Optional | gitea |
| Tekton | CI/CD pipelines | Optional | tekton-pipelines |
| Flux CD | GitOps deployment | Optional | flux-system |
| SigNoz | Unified observability | Recommended | bakery-ia |

Quick Reference: What to Deploy

Minimal Production Setup:

  • Kubernetes cluster + addons
  • Core infrastructure (databases, cache, broker)
  • Application services
  • External email (Zoho/Gmail)

Full Production Setup (Recommended):

  • Everything above, plus:
  • Mailu (self-hosted email)
  • SigNoz (monitoring)
  • CI/CD (Gitea + Tekton + Flux)
  • Nominatim (if geocoding needed)

⚠️ CRITICAL: Pre-Deployment Configuration

READ THIS FIRST: The Kubernetes configuration requires updates for secure production deployment.

🔴 Configuration Status

Your manifests need the following updates before deploying to production:

Required Configuration Changes

1. Remove imagePullSecrets (BLOCKING)

Why: Images are public and don't require authentication.
Impact if skipped: All pods fail with ImagePullBackOff.

2. Update Image Tags to Semantic Versions (BLOCKING)

Why: Using 'latest' causes non-deterministic deployments.
Impact if skipped: Unpredictable behavior, impossible rollbacks.

3. Fix SigNoz Namespace References (BLOCKING) - ALREADY FIXED

Why: SigNoz must be in the bakery-ia namespace.
Impact if skipped: Kustomize apply fails.
Status: Fixed in latest commit.

4. Production Secrets (ALREADY CONFIGURED)

Status: Strong production secrets have been generated and configured.
Impact if skipped: N/A - this step is already completed.

5. Update Cert-Manager Email (HIGH PRIORITY) - ALREADY FIXED

Why: Receive Let's Encrypt renewal notifications.
Impact if skipped: Won't receive SSL expiry warnings.
Status: Fixed - email is now admin@bakewise.ai.

6. Update Stripe Publishable Key (HIGH PRIORITY)

Why: Payment processing requires a production Stripe key.
Impact if skipped: Payments will use test mode (no real charges).
File: infrastructure/kubernetes/base/configmap.yaml, line 378
Current value: pk_test_your_stripe_publishable_key_here
Required: Your Stripe production publishable key from https://dashboard.stripe.com/apikeys

7. Pilot Coupon Configuration (OPTIONAL)

Why: Control pilot program settings.
Files: infrastructure/kubernetes/base/configmap.yaml, lines 375-377
Current values (defaults are correct for pilot):

  • VITE_PILOT_MODE_ENABLED: "true" - Enables pilot UI features
  • VITE_PILOT_COUPON_CODE: "PILOT2025" - Coupon code for 3 months free
  • VITE_PILOT_TRIAL_MONTHS: "3" - Trial extension duration

Note: The PILOT2025 coupon is automatically created when tenant-service starts. No manual seeding required - it's handled by app/jobs/startup_seeder.py.
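
To double-check the seeding after deployment, you can query the tenant database directly. This is only a sketch: the pod name (tenant-db), database user/name, and the coupons table and column names are assumptions modeled on the auth-db examples elsewhere in this guide.

# Optional: confirm the PILOT2025 coupon was seeded (table/column names are illustrative)
kubectl exec -n bakery-ia deployment/tenant-db -- \
  psql -U tenant_user -d tenant_db -c "SELECT * FROM coupons WHERE code = 'PILOT2025';"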

Already Correct (No Changes Needed)

  • Storage Class - microk8s-hostpath is correct for MicroK8s
  • Domain Names - bakewise.ai is your production domain
  • Service Types - ClusterIP + Ingress is correct architecture
  • Network Policies - Not required for single-namespace deployment
  • SigNoz Namespace - Fixed to use bakery-ia namespace

Step-by-Step Configuration Script

Run these commands on your local machine before deployment:

# Navigate to repository root
cd /path/to/bakery-ia

# ========================================
# STEP 1: Remove imagePullSecrets
# ========================================
echo "Step 1: Removing imagePullSecrets..."
chmod +x infrastructure/kubernetes/remove-imagepullsecrets.sh
./infrastructure/kubernetes/remove-imagepullsecrets.sh

# Verify removal
grep -r "imagePullSecrets" infrastructure/kubernetes/base/ && \
  echo "⚠️  WARNING: Some files still have imagePullSecrets" || \
  echo "✅ imagePullSecrets removed"

# ========================================
# STEP 2: Update Image Tags
# ========================================
echo -e "\nStep 2: Updating image tags..."
export VERSION="1.0.0"  # Change this to your version
sed -i.bak "s/newTag: latest/newTag: v${VERSION}/g" infrastructure/kubernetes/overlays/prod/kustomization.yaml

# Verify no 'latest' tags remain
grep "newTag:" infrastructure/kubernetes/overlays/prod/kustomization.yaml | grep "latest" && \
  echo "⚠️  WARNING: Some images still use 'latest'" || \
  echo "✅ All images now use version v${VERSION}"

# ========================================
# STEP 3: Production Secrets (ALREADY DONE) ✅
# ========================================
echo -e "\nStep 3: Verifying production secrets..."
echo "✅ Production secrets have been pre-configured with strong passwords"
echo "   - JWT secrets: 256-bit cryptographically secure"
echo "   - Database passwords: 24-character random strings"
echo "   - Redis password: 24-character random string"
echo "   - RabbitMQ password: 24-character random string"
echo "   - Service API key: 64-character hex string"
echo ""
echo "All secrets are already set in infrastructure/kubernetes/base/secrets.yaml"
echo "No manual action required for this step."

# ========================================
# STEP 4: Cert-Manager Email (ALREADY FIXED)
# ========================================
echo -e "\nStep 4: Verifying cert-manager email..."
grep "admin@bakewise.ai" infrastructure/kubernetes/base/components/cert-manager/cluster-issuer-production.yaml && \
  echo "✅ Cert-manager email already set to admin@bakewise.ai" || \
  echo "⚠️  WARNING: Cert-manager email needs updating"

# ========================================
# STEP 5: Update Stripe Publishable Key
# ========================================
echo -e "\nStep 5: Stripe Publishable Key Configuration..."
echo "================================================================"
echo "⚠️  MANUAL STEP REQUIRED"
echo ""
echo "Edit: infrastructure/kubernetes/base/configmap.yaml"
echo "Find: VITE_STRIPE_PUBLISHABLE_KEY: \"pk_test_your_stripe_publishable_key_here\""
echo "Replace with your production Stripe publishable key from:"
echo "  https://dashboard.stripe.com/apikeys"
echo ""
echo "Example:"
echo "  VITE_STRIPE_PUBLISHABLE_KEY: \"pk_live_XXXXXXXXXXXXXXXXXXXX\""
echo ""
echo "Press Enter when you've updated the Stripe key..."
read

# ========================================
# FINAL VALIDATION
# ========================================
echo -e "\n========================================"
echo "Pre-Deployment Configuration Complete!"
echo "========================================"
echo ""
echo "Validation Checklist:"
echo "  ✅ imagePullSecrets removed"
echo "  ✅ Image tags updated to v${VERSION}"
echo "  ✅ SigNoz namespace fixed (bakery-ia)"
echo "  ✅ Production secrets configured with strong passwords"
echo "  ✅ Cert-manager email set to admin@bakewise.ai"
echo "  ⚠️  Stripe publishable key updated (manual verification required)"
echo "  ✅ Pilot coupon auto-seeded on tenant-service startup"
echo ""
echo "Next: Copy manifests to VPS and begin deployment"

Manual Verification

After running the script above:

  1. Verify production secrets are configured:

    # Verify secrets.yaml has strong passwords (not placeholders)
    grep "JWT_SECRET_KEY" infrastructure/kubernetes/base/secrets.yaml
    # Should show a long base64-encoded value (the production JWT secret), not a placeholder
    
  2. Check image tags:

    grep "newTag:" infrastructure/kubernetes/overlays/prod/kustomization.yaml
    # All should show v1.0.0 (or your version), NOT 'latest'
    
  3. Verify SigNoz namespace:

    grep -A 3 "name: signoz" infrastructure/kubernetes/overlays/prod/kustomization.yaml
    # All should show: namespace: bakery-ia
    

⏱️ Estimated Time: 30-45 minutes


Pre-Launch Checklist

Required Accounts & Services

  • Domain Name

    • Register at Namecheap or Cloudflare (€10-15/year)
    • Suggested: bakeryforecast.es or bakery-ia.com
  • VPS Account

  • Email Service (Choose ONE)

    • Option A: Zoho Mail FREE (recommended for full send/receive)
    • Option B: Gmail SMTP + domain forwarding
    • Option C: Google Workspace (14-day free trial, then €5.75/month)
  • WhatsApp Business API

    • Create Meta Business Account (free)
    • Verify business identity
    • Phone number ready (non-VoIP)
  • DNS Access

    • Cloudflare account (free, recommended)
    • Or domain registrar DNS panel access
  • Container Registry (Choose ONE)

    • Option A: Docker Hub account (recommended)
    • Option B: GitHub Container Registry
    • Option C: MicroK8s built-in registry

Required Tools on Local Machine

# Verify you have these installed:
kubectl version --client
docker --version
git --version
ssh -V
openssl version

# Install if missing (macOS):
brew install kubectl docker git openssh openssl

Repository Setup

# Clone the repository
git clone https://github.com/yourusername/bakery-ia.git
cd bakery-ia

# Verify structure
ls infrastructure/kubernetes/overlays/prod/

VPS Provisioning

For 10-tenant pilot program:

  • RAM: 20 GB
  • CPU: 8 vCPU cores
  • Storage: 200 GB NVMe SSD (triple replica)
  • Network: 1 Gbps connection
  • OS: Ubuntu 22.04 LTS
  • Monthly Cost: €40-80 (check current pricing)

Why These Specs?

Memory Breakdown:

  • Application services: 14.1 GB
  • Databases (18 instances): 4.6 GB
  • Infrastructure (Redis, RabbitMQ): 0.8 GB
  • Gateway/Frontend: 1.8 GB
  • Monitoring: 1.5 GB
  • System overhead: ~3 GB
  • Total: ~26 GB of peak requested capacity; 20 GB is sufficient in practice with HPA

Storage Breakdown:

  • Databases: 36 GB (18 × 2GB)
  • ML Models: 10 GB
  • Redis: 1 GB
  • RabbitMQ: 2 GB
  • Prometheus metrics: 20 GB
  • Container images: ~30 GB
  • Growth buffer: 100 GB
  • Total: 199 GB

Provisioning Steps

  1. Create VPS at clouding.io:

    1. Log in to clouding.io dashboard
    2. Click "Create New Server"
    3. Select:
       - OS: Ubuntu 22.04 LTS
       - RAM: 20 GB
       - CPU: 8 vCPU
       - Storage: 200 GB NVMe SSD
       - Location: Barcelona (best for Spain)
    4. Set hostname: bakery-ia-prod-01
    5. Add SSH key (or use password)
    6. Create server
    
  2. Note your server details:

    # Save these for later:
    VPS_IP="YOUR_VPS_IP_ADDRESS"
    VPS_ROOT_PASSWORD="YOUR_ROOT_PASSWORD"  # If not using SSH key
    
  3. Initial SSH connection:

    # Test connection
    ssh root@$VPS_IP
    
    # Update system
    apt update && apt upgrade -y
    

Infrastructure Setup

Step 1: Install MicroK8s

Using MicroK8s for production VPS deployment on clouding.io

# SSH into your VPS
ssh root@$VPS_IP

# Update system
apt update && apt upgrade -y

# Install MicroK8s
snap install microk8s --classic --channel=1.28/stable

# Add your user to microk8s group
usermod -a -G microk8s $USER
chown -f -R $USER ~/.kube
newgrp microk8s

# Verify installation
microk8s status --wait-ready

Step 2: Enable Required MicroK8s Addons

All required components are available as MicroK8s addons:

# Enable core addons
microk8s enable dns                  # DNS resolution within cluster
microk8s enable hostpath-storage     # Provides microk8s-hostpath storage class
microk8s enable ingress              # Nginx ingress controller (uses class "public")
microk8s enable cert-manager         # Let's Encrypt SSL certificates
microk8s enable metrics-server       # For HPA autoscaling
microk8s enable rbac                 # Role-based access control

# Setup kubectl alias
echo "alias kubectl='microk8s kubectl'" >> ~/.bashrc
source ~/.bashrc

# Verify all components are running
kubectl get nodes
# Should show: Ready

kubectl get storageclass
# Should show: microk8s-hostpath (default)

kubectl get pods -A
# Should show pods in: kube-system, ingress, cert-manager namespaces

# Verify ingress controller is running
kubectl get pods -n ingress
# Should show: nginx-ingress-microk8s-controller-xxx  Running

# Verify cert-manager is running
kubectl get pods -n cert-manager
# Should show: cert-manager-xxx, cert-manager-webhook-xxx, cert-manager-cainjector-xxx

# Verify metrics-server is working
kubectl top nodes
# Should return CPU/Memory metrics

Important - MicroK8s Ingress Class:

  • MicroK8s ingress addon uses class name public (NOT nginx)
  • The ClusterIssuers in this repo are already configured with class: public
  • If you see cert-manager challenges failing, verify the ingress class matches
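
A quick way to confirm the class name on your cluster:

# List the ingress classes registered by the MicroK8s addon
kubectl get ingressclass
# The list should include: public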

Optional but Recommended:

# Enable Prometheus for additional monitoring (optional)
microk8s enable prometheus

# Enable registry if you want local image storage (optional)
microk8s enable registry

Step 3: Enhanced Infrastructure Components

The platform includes additional infrastructure components that enhance security, monitoring, and operations:

# The platform includes Mailu for email services
# Deploy Mailu via Helm (optional but recommended for production):
kubectl create namespace bakery-ia --dry-run=client -o yaml | kubectl apply -f -
helm repo add mailu https://mailu.github.io/helm-charts
helm repo update
helm install mailu mailu/mailu \
  -n bakery-ia \
  -f infrastructure/platform/mail/mailu-helm/values.yaml \
  --timeout 10m \
  --wait

# Verify Mailu deployment
kubectl get pods -n bakery-ia | grep mailu

For development environments, ensure the prepull-base-images script is run:

# On your local machine, run the prepull script to cache base images
cd bakery-ia
chmod +x scripts/prepull-base-images.sh
./scripts/prepull-base-images.sh

For production environments, ensure CI/CD infrastructure is properly configured:

# Tekton Pipelines for CI/CD (optional - can be deployed separately)
kubectl create namespace tekton-pipelines
kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/latest/release.yaml
kubectl apply -f https://storage.googleapis.com/tekton-releases/triggers/latest/release.yaml

# Flux CD for GitOps (already enabled in MicroK8s if needed)
# flux install --namespace=flux-system --network-policy=false

Step 4: Configure Firewall

CRITICAL: Ports 80 and 443 must be open for Let's Encrypt HTTP-01 challenges to work.

# Allow necessary ports
ufw allow 22/tcp      # SSH
ufw allow 80/tcp      # HTTP - REQUIRED for Let's Encrypt HTTP-01 challenge
ufw allow 443/tcp     # HTTPS - For your application traffic
ufw allow 16443/tcp   # Kubernetes API (optional, for remote kubectl access)

# Enable firewall
ufw enable

# Check status
ufw status verbose

# Expected output should include:
# 80/tcp    ALLOW   Anywhere
# 443/tcp   ALLOW   Anywhere

Also check clouding.io firewall:

  • Log in to clouding.io dashboard
  • Go to your VPS → Firewall settings
  • Ensure ports 80 and 443 are allowed from anywhere (0.0.0.0/0)
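
Before requesting certificates, confirm the ports are actually reachable from outside the VPS (run these from your local machine; nc/netcat is assumed to be installed):

# Check that HTTP and HTTPS ports are reachable from the internet
nc -zv YOUR_VPS_IP 80
nc -zv YOUR_VPS_IP 443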

Step 5: Create Namespace

# Create bakery-ia namespace
kubectl create namespace bakery-ia

# Verify
kubectl get namespaces

Domain & DNS Configuration

Step 1: Register Domain at Namecheap

  1. Go to Namecheap
  2. Search for your desired domain (e.g., bakewise.ai)
  3. Complete purchase (~€10-15/year)
  4. Save domain credentials

Step 2: Configure DNS at Namecheap

  1. Access DNS settings:

    1. Log in to Namecheap
    2. Go to Domain List → Manage → Advanced DNS
    
  2. Add DNS records pointing to your VPS:

    Type    Host    Value               TTL
    A       @       YOUR_VPS_IP         Automatic
    A       *       YOUR_VPS_IP         Automatic
    

    This points both bakewise.ai and all subdomains (*.bakewise.ai) to your VPS.

  3. Test DNS propagation:

    # Wait 5-10 minutes, then test
    nslookup bakewise.ai
    nslookup api.bakewise.ai
    nslookup mail.bakewise.ai
    

Step 3 (Optional): Configure Cloudflare DNS

  1. Add site to Cloudflare:

    1. Log in to Cloudflare
    2. Click "Add a Site"
    3. Enter your domain name
    4. Choose Free plan
    5. Cloudflare will scan existing DNS records
    
  2. Update nameservers at registrar:

    Point your domain's nameservers to Cloudflare:
    - NS1: assigned.cloudflare.com
    - NS2: assigned.cloudflare.com
    (Cloudflare will provide the exact values)
    
  3. Add DNS records:

    Type    Name        Content         TTL     Proxy
    A       @           YOUR_VPS_IP     Auto    Yes
    A       www         YOUR_VPS_IP     Auto    Yes
    A       api         YOUR_VPS_IP     Auto    Yes
    A       monitoring  YOUR_VPS_IP     Auto    Yes
    CNAME   *           yourdomain.com  Auto    No
    
  4. Configure SSL/TLS mode:

    SSL/TLS tab → Overview → Set to "Full (strict)"
    
  5. Test DNS propagation:

    # Wait 5-10 minutes, then test
    nslookup yourdomain.com
    nslookup api.yourdomain.com
    

TLS/SSL Certificates

Understanding Certificate Setup

The platform uses two layers of SSL/TLS:

  1. External (Ingress) SSL: Let's Encrypt for public HTTPS
  2. Internal (Database) SSL: Self-signed certificates for database connections

Step 1: Generate Internal Certificates

# On your local machine
cd infrastructure/tls

# Generate certificates
./generate-certificates.sh

# This creates:
# - ca/ (Certificate Authority)
# - postgres/ (PostgreSQL server certs)
# - redis/ (Redis server certs)

Certificate Details:

  • Root CA: 10-year validity (expires 2035)
  • Server certs: 3-year validity (expires October 2028)
  • Algorithm: RSA 4096-bit
  • Signature: SHA-256
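
To sanity-check the generated files before creating the Kubernetes secrets (paths follow the layout above; adjust if your script places them elsewhere):

# Inspect validity dates and verify the server certificate against the CA
openssl x509 -in infrastructure/tls/postgres/server-cert.pem -noout -subject -dates
openssl verify -CAfile infrastructure/tls/postgres/ca-cert.pem infrastructure/tls/postgres/server-cert.pem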

Step 2: Create Kubernetes Secrets

# Create PostgreSQL TLS secret
kubectl create secret generic postgres-tls \
  --from-file=server-cert.pem=infrastructure/tls/postgres/server-cert.pem \
  --from-file=server-key.pem=infrastructure/tls/postgres/server-key.pem \
  --from-file=ca-cert.pem=infrastructure/tls/postgres/ca-cert.pem \
  -n bakery-ia

# Create Redis TLS secret
kubectl create secret generic redis-tls \
  --from-file=redis-cert.pem=infrastructure/tls/redis/redis-cert.pem \
  --from-file=redis-key.pem=infrastructure/tls/redis/redis-key.pem \
  --from-file=ca-cert.pem=infrastructure/tls/redis/ca-cert.pem \
  -n bakery-ia

# Verify secrets created
kubectl get secrets -n bakery-ia | grep tls

Step 3: Configure Let's Encrypt (External SSL)

cert-manager is already enabled via microk8s enable cert-manager. The ClusterIssuer is pre-configured in the repository.

Important: MicroK8s ingress addon uses ingress class public (not nginx). This is already configured in:

  • infrastructure/platform/cert-manager/cluster-issuer-production.yaml
  • infrastructure/platform/cert-manager/cluster-issuer-staging.yaml
# On VPS, apply the pre-configured ClusterIssuers
kubectl apply -k infrastructure/platform/cert-manager/

# Verify ClusterIssuers are ready
kubectl get clusterissuer
kubectl describe clusterissuer letsencrypt-production

# Expected output:
# NAME                     READY   AGE
# letsencrypt-production   True    1m
# letsencrypt-staging      True    1m

Configuration details (already set):

  • Email: admin@bakewise.ai (receives Let's Encrypt expiry notifications)
  • Ingress class: public (MicroK8s default)
  • Challenge type: HTTP-01 (requires port 80 open)
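
For reference, the relevant part of the production ClusterIssuer looks roughly like this (a sketch of the repo's pre-configured issuer; the privateKeySecretRef name is an assumption):

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    email: admin@bakewise.ai
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-production
    solvers:
      - http01:
          ingress:
            class: public   # MicroK8s ingress class, not "nginx"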

If you need to customize the email, edit before applying:

# Edit the production issuer
nano infrastructure/platform/cert-manager/cluster-issuer-production.yaml
# Change: email: admin@bakewise.ai → email: your-email@yourdomain.com

Email & Communication Setup

Option A: Zoho Mail (Free Plan)

Features:

  • Free forever for 1 domain, 5 users
  • 5GB storage per user
  • Full send/receive capability
  • Web interface + SMTP/IMAP
  • Professional email addresses

Setup Steps:

  1. Sign up for Zoho Mail:

    1. Go to https://www.zoho.com/mail/
    2. Click "Sign Up for Free"
    3. Choose "Forever Free" plan
    4. Enter your domain name
    5. Complete verification
    
  2. Verify domain ownership:

    Add TXT record to your DNS:
    Type: TXT
    Name: @
    Value: zoho-verification=XXXXX.zoho.com
    
  3. Configure MX records:

    Priority  Type  Name  Value
    10        MX    @     mx.zoho.com
    20        MX    @     mx2.zoho.com
    50        MX    @     mx3.zoho.com
    
  4. Get SMTP credentials:

    SMTP Host: smtp.zoho.com
    SMTP Port: 587
    SMTP Username: noreply@yourdomain.com
    SMTP Password: (generate app password in Zoho settings)
    

Option B: Gmail SMTP + Forwarding

Features:

  • Completely free
  • 500 emails/day (sufficient for pilot)
  • Receive via domain forwarding

Setup Steps:

  1. Enable 2FA on your Gmail:

    1. Go to myaccount.google.com
    2. Security → 2-Step Verification
    3. Enable and complete setup
    
  2. Generate app password:

    1. Security → 2-Step Verification → App passwords
    2. Select "Mail" and "Other (Custom name)"
    3. Name it "Bakery-IA SMTP"
    4. Copy the 16-character password
    
  3. Configure domain email forwarding:

    At your domain registrar or Cloudflare:
    - Forward noreply@yourdomain.com → your.gmail@gmail.com
    - Forward alerts@yourdomain.com → your.gmail@gmail.com
    
  4. SMTP Settings:

    SMTP Host: smtp.gmail.com
    SMTP Port: 587
    SMTP Username: your.gmail@gmail.com
    SMTP Password: (16-char app password from step 2)
    From Email: noreply@yourdomain.com
    

Option C: Mailu Self-Hosted Email Server

Features:

  • Full control over email infrastructure
  • No external dependencies or rate limits
  • Built-in antispam (rspamd) with DNSSEC validation
  • Webmail interface (Roundcube)
  • IMAP/SMTP with TLS
  • Admin panel for user management
  • Integrated with Kubernetes

Why Mailu for Production:

  • Complete email stack (Postfix, Dovecot, Rspamd, ClamAV)
  • DNSSEC validation for email authentication (DKIM/SPF/DMARC)
  • No monthly email limits or third-party dependencies
  • Professional email addresses: admin@bakewise.ai, noreply@bakewise.ai

Prerequisites

Before deploying Mailu, ensure:

  1. Unbound DNS is deployed (for DNSSEC validation)
  2. CoreDNS is configured to forward to Unbound
  3. DNS records are configured for your domain

Step 1: Configure DNS Records

Add these DNS records for your domain (e.g., bakewise.ai):

| Type | Name   | Value                          | TTL  |
|------|--------|--------------------------------|------|
| A    | mail   | YOUR_VPS_IP                    | Auto |
| MX   | @      | mail.bakewise.ai (priority 10) | Auto |
| TXT  | @      | v=spf1 mx a -all               | Auto |
| TXT  | _dmarc | v=DMARC1; p=reject; rua=...    | Auto |

DKIM record will be generated after Mailu is running - you'll add it later.

Step 2: Deploy Unbound DNS Resolver

Unbound provides DNSSEC validation required by Mailu for email authentication.

# On VPS - Deploy Unbound via Helm
helm upgrade --install unbound infrastructure/platform/networking/dns/unbound-helm \
    -n bakery-ia \
    --create-namespace \
    -f infrastructure/platform/networking/dns/unbound-helm/values.yaml \
    -f infrastructure/platform/networking/dns/unbound-helm/prod/values.yaml \
    --timeout 5m \
    --wait

# Verify Unbound is running
kubectl get pods -n bakery-ia | grep unbound
# Should show: unbound-xxx  1/1  Running

# Get Unbound service IP (needed for CoreDNS configuration)
UNBOUND_IP=$(kubectl get svc unbound-dns -n bakery-ia -o jsonpath='{.spec.clusterIP}')
echo "Unbound DNS IP: $UNBOUND_IP"

Step 3: Configure CoreDNS for DNSSEC

Mailu requires DNSSEC validation. Configure CoreDNS to forward external queries to Unbound:

# Get the Unbound service IP
UNBOUND_IP=$(kubectl get svc unbound-dns -n bakery-ia -o jsonpath='{.spec.clusterIP}')

# Patch CoreDNS to forward to Unbound
kubectl patch configmap coredns -n kube-system --type merge -p "{
  \"data\": {
    \"Corefile\": \".:53 {\\n    errors\\n    health {\\n       lameduck 5s\\n    }\\n    ready\\n    kubernetes cluster.local in-addr.arpa ip6.arpa {\\n       pods insecure\\n       fallthrough in-addr.arpa ip6.arpa\\n       ttl 30\\n    }\\n    prometheus :9153\\n    forward . $UNBOUND_IP {\\n       max_concurrent 1000\\n    }\\n    cache 30 {\\n       disable success cluster.local\\n       disable denial cluster.local\\n    }\\n    loop\\n    reload\\n    loadbalance\\n}\\n\"
  }
}"

# Restart CoreDNS to apply changes
kubectl rollout restart deployment coredns -n kube-system
kubectl rollout status deployment coredns -n kube-system --timeout=60s

# Verify DNSSEC is working
kubectl run -it --rm debug --image=alpine --restart=Never -- \
    sh -c "apk add drill && drill -D google.com"
# Should show: ;; flags: ... ad ...  (ad = authenticated data = DNSSEC valid)

Step 4: Create TLS Certificate Secret

Mailu Front pod requires a TLS certificate:

# Generate self-signed certificate for internal use
# (Let's Encrypt handles external TLS via Ingress)
TEMP_DIR=$(mktemp -d)
cd "$TEMP_DIR"

openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
    -keyout tls.key -out tls.crt \
    -subj "/CN=mail.bakewise.ai/O=bakewise"

kubectl create secret tls mailu-certificates \
    --cert=tls.crt \
    --key=tls.key \
    -n bakery-ia

rm -rf "$TEMP_DIR"

# Verify secret created
kubectl get secret mailu-certificates -n bakery-ia

Step 5: Deploy Mailu via Helm

# Add Mailu Helm repository
helm repo add mailu https://mailu.github.io/helm-charts
helm repo update mailu

# Deploy Mailu with production values
helm upgrade --install mailu mailu/mailu \
    -n bakery-ia \
    --create-namespace \
    -f infrastructure/platform/mail/mailu-helm/values.yaml \
    -f infrastructure/platform/mail/mailu-helm/prod/values.yaml \
    --timeout 10m

# Wait for pods to be ready (may take 5-10 minutes for ClamAV)
kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=mailu -w

Step 6: Create Admin User

# Create initial admin user
kubectl exec -it -n bakery-ia deployment/mailu-admin -- \
    flask mailu admin admin bakewise.ai 'YourSecurePassword123!'

# Credentials:
#   Email: admin@bakewise.ai
#   Password: YourSecurePassword123!

Step 7: Configure DKIM

After Mailu is running, get the DKIM key and add it to DNS:

# Get DKIM public key
kubectl exec -n bakery-ia deployment/mailu-admin -- \
    cat /dkim/bakewise.ai.dkim.pub

# Add this as a TXT record in your DNS:
# Name: dkim._domainkey
# Value: (the key from above)
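
The resulting record will look roughly like this (the p= value is your generated public key, truncated here):

# Example DKIM TXT record (illustrative; paste your actual key)
# dkim._domainkey.bakewise.ai.  TXT  "v=DKIM1; k=rsa; p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A...<rest of key>"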

Step 8: Verify Email Setup

# Check all Mailu pods are running
kubectl get pods -n bakery-ia | grep mailu
# Expected: All 10 pods in Running state

# Test SMTP connectivity
kubectl run -it --rm smtp-test --image=alpine --restart=Never -- \
    sh -c "apk add swaks && swaks --to test@example.com --from admin@bakewise.ai --server mailu-front.bakery-ia.svc.cluster.local:25"

# Access webmail (via port-forward for testing)
kubectl port-forward -n bakery-ia svc/mailu-front 8080:80
# Open: http://localhost:8080/webmail

Production Email Endpoints

| Service         | URL/Address                      |
|-----------------|----------------------------------|
| Admin Panel     | https://mail.bakewise.ai/admin   |
| Webmail         | https://mail.bakewise.ai/webmail |
| SMTP (STARTTLS) | mail.bakewise.ai:587             |
| SMTP (SSL)      | mail.bakewise.ai:465             |
| IMAP (SSL)      | mail.bakewise.ai:993             |
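
Once Mailu is running, application services can point at it with settings along these lines (variable names are illustrative; use whichever keys your configmap and secrets actually define):

# Illustrative SMTP settings for services sending mail through Mailu
SMTP_HOST=mail.bakewise.ai
SMTP_PORT=587                      # STARTTLS
SMTP_USER=noreply@bakewise.ai
SMTP_PASSWORD=<app password created in the Mailu admin panel>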

Troubleshooting Mailu

Issue: Admin pod CrashLoopBackOff with "DNSSEC validation" error

# Verify CoreDNS is forwarding to Unbound
kubectl get configmap coredns -n kube-system -o yaml | grep forward
# Should show: forward . <unbound-ip>

# If not, re-run Step 3 above

Issue: Front pod stuck in ContainerCreating

# Check for missing certificate secret
kubectl describe pod -n bakery-ia -l app.kubernetes.io/component=front | grep -A5 Events

# If missing mailu-certificates, re-run Step 4 above

Issue: Admin pod can't connect to Redis

# Verify externalRedis is disabled in values
helm get values mailu -n bakery-ia | grep -A5 externalRedis
# Should show: enabled: false

# If enabled: true, upgrade with correct values
helm upgrade mailu mailu/mailu -n bakery-ia \
    -f infrastructure/platform/mail/mailu-helm/values.yaml \
    -f infrastructure/platform/mail/mailu-helm/prod/values.yaml

WhatsApp Business API Setup

Features:

  • First 1,000 conversations/month FREE
  • Perfect for 10 tenants (~500 messages/month)

Setup Steps:

  1. Create Meta Business Account:

    1. Go to business.facebook.com
    2. Create Business Account
    3. Complete business verification
    
  2. Add WhatsApp Product:

    1. Go to developers.facebook.com
    2. Create New App → Business
    3. Add WhatsApp product
    4. Complete setup wizard
    
  3. Configure Phone Number:

    1. Test with your personal number initially
    2. Later: Get dedicated business number
    3. Verify phone number with SMS code
    
  4. Create Message Templates:

    1. Go to WhatsApp Manager
    2. Create templates for:
       - Low inventory alert
       - Expired product alert
       - Forecast summary
       - Order notification
    3. Submit for approval (15 min - 24 hours)
    
  5. Get API Credentials:

    Save these values:
    - Phone Number ID: (from WhatsApp Manager)
    - Access Token: (from App Dashboard)
    - Business Account ID: (from WhatsApp Manager)
    - Webhook Verify Token: (create your own secure string)
    
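
Once you have these values, you can verify them by sending an approved template message through the WhatsApp Cloud API (a sketch; substitute your Phone Number ID, access token, recipient, and an approved template name):

# Send a test template message via the WhatsApp Cloud API (all values are placeholders)
curl -X POST "https://graph.facebook.com/v18.0/PHONE_NUMBER_ID/messages" \
  -H "Authorization: Bearer ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messaging_product": "whatsapp",
    "to": "34600000000",
    "type": "template",
    "template": { "name": "hello_world", "language": { "code": "en_US" } }
  }'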

Kubernetes Deployment

Step 1: Prepare Container Images

Option A: Using Docker Hub (Recommended)

# On your local machine
docker login

# Build all images
docker-compose build

# Tag images for Docker Hub
# Replace YOUR_USERNAME with your Docker Hub username
export DOCKER_USERNAME="YOUR_USERNAME"

./scripts/tag-images.sh $DOCKER_USERNAME

# Push to Docker Hub
./scripts/push-images.sh $DOCKER_USERNAME

# Update prod kustomization with your username
# Edit: infrastructure/kubernetes/overlays/prod/kustomization.yaml
# Replace all "bakery/" with "$DOCKER_USERNAME/"

Option B: Using MicroK8s Registry

# On VPS
microk8s enable registry

# Get registry address (usually localhost:32000)
kubectl get service -n container-registry

# On local machine, configure insecure registry
# Edit /etc/docker/daemon.json:
{
  "insecure-registries": ["YOUR_VPS_IP:32000"]
}

# Restart Docker
sudo systemctl restart docker

# Tag and push images
docker tag bakery/auth-service YOUR_VPS_IP:32000/bakery/auth-service
docker push YOUR_VPS_IP:32000/bakery/auth-service
# Repeat for all services...

Step 2: Update Production Configuration

⚠️ CRITICAL: The default configuration uses bakewise.ai domain. You MUST update this before deployment if using a different domain.

Required Configuration Updates

Step 2.1: Remove imagePullSecrets

# On your local machine
cd bakery-ia

# Remove imagePullSecrets from all deployment files
find infrastructure/kubernetes/base -name "*.yaml" -type f -exec sed -i.bak '/imagePullSecrets:/,+1d' {} \;

# Verify removal
grep -r "imagePullSecrets" infrastructure/kubernetes/base/
# Should return NO results

Step 2.2: Update Image Tags (Use Semantic Versions)

# Edit kustomization.yaml to replace 'latest' with actual version
nano infrastructure/kubernetes/overlays/prod/kustomization.yaml

# Find the images section (lines 163-196) and update:
# BEFORE:
#   - name: bakery/auth-service
#     newTag: latest
# AFTER:
#   - name: bakery/auth-service
#     newTag: v1.0.0

# Do this for ALL 22 services, or use this helper:
export VERSION="1.0.0"  # Your version

# Create a script to update all image tags
cat > /tmp/update-tags.sh <<'EOF'
#!/bin/bash
VERSION="${1:-1.0.0}"
sed -i "s/newTag: latest/newTag: v${VERSION}/g" infrastructure/kubernetes/overlays/prod/kustomization.yaml
EOF

chmod +x /tmp/update-tags.sh
/tmp/update-tags.sh ${VERSION}

# Verify no 'latest' tags remain
grep "newTag:" infrastructure/kubernetes/overlays/prod/kustomization.yaml | grep -c "latest"
# Should return: 0

Step 2.3: Fix SigNoz Namespace References

# Update SigNoz patches to use bakery-ia namespace instead of signoz
sed -i 's/namespace: signoz/namespace: bakery-ia/g' infrastructure/kubernetes/overlays/prod/kustomization.yaml

# Verify changes (should show bakery-ia in all 3 patches)
grep -A 3 "name: signoz" infrastructure/kubernetes/overlays/prod/kustomization.yaml

Step 2.4: Update Cert-Manager Email

# Update Let's Encrypt notification email to your production email
sed -i "s/admin@bakery-ia.local/admin@bakewise.ai/g" \
  infrastructure/kubernetes/base/components/cert-manager/cluster-issuer-production.yaml

Step 2.5: Verify Production Secrets (Already Configured)

# Production secrets have been pre-configured with strong cryptographic passwords
# No manual action required - secrets are already set in secrets.yaml

# Verify the secrets are configured (optional)
echo "Verifying production secrets configuration..."
grep "JWT_SECRET_KEY" infrastructure/kubernetes/base/secrets.yaml | head -1
grep "AUTH_DB_PASSWORD" infrastructure/kubernetes/base/secrets.yaml | head -1
grep "REDIS_PASSWORD" infrastructure/kubernetes/base/secrets.yaml | head -1

echo "✅ All production secrets are configured and ready for deployment"

Production URLs:

  • Frontend: https://bakewise.ai
  • API: https://api.bakewise.ai
  • Monitoring: https://monitoring.bakewise.ai/signoz
  • Webmail/Admin: https://mail.bakewise.ai


Configuration & Secrets

Production Secrets Status

All core secrets have been pre-configured with strong cryptographic passwords:

  • Database passwords (19 databases) - 24-character random strings
  • JWT secrets - 256-bit cryptographically secure tokens
  • Service API key - 64-character hexadecimal string
  • Redis password - 24-character random string
  • RabbitMQ password - 24-character random string
  • RabbitMQ Erlang cookie - 64-character hexadecimal string

Step 1: Configure External Service Credentials (Email & WhatsApp)

You still need to update these external service credentials:

# Edit the secrets file
nano infrastructure/kubernetes/base/secrets.yaml

# Update ONLY these external service credentials:

# SMTP settings (from email setup):
SMTP_USER: <base64-encoded-username>      # your email
SMTP_PASSWORD: <base64-encoded-password>  # app password

# WhatsApp credentials (from WhatsApp setup - optional):
WHATSAPP_API_KEY: <base64-encoded-key>

# Payment processing (from Stripe setup):
STRIPE_SECRET_KEY: <base64-encoded-key>
STRIPE_WEBHOOK_SECRET: <base64-encoded-secret>

To base64 encode:

echo -n "your-value-here" | base64

CRITICAL: Never commit real secrets to git! The secrets.yaml file should be in .gitignore.
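
If you prefer to update a single value in the running cluster instead of editing the file, a merge patch works too (a sketch; the secret name app-secrets and the key names are assumptions, match them to your secrets.yaml):

# Patch one key of an existing secret without base64-encoding by hand
kubectl patch secret app-secrets -n bakery-ia --type merge \
  -p "{\"data\":{\"SMTP_PASSWORD\":\"$(echo -n 'your-app-password' | base64)\"}}"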

Step 2: CI/CD Secrets Configuration

For production CI/CD setup, additional secrets are required:

# Create Docker Hub credentials secret (for image pulls)
kubectl create secret docker-registry dockerhub-creds \
  --docker-server=docker.io \
  --docker-username=YOUR_DOCKERHUB_USERNAME \
  --docker-password=YOUR_DOCKERHUB_TOKEN \
  --docker-email=your-email@example.com \
  -n bakery-ia

# Create Gitea registry credentials (if using Gitea for CI/CD)
kubectl create secret docker-registry gitea-registry-credentials \
  -n tekton-pipelines \
  --docker-server=gitea.bakery-ia.local:5000 \
  --docker-username=your-username \
  --docker-password=your-password

# Create Git credentials for Flux (if using GitOps)
kubectl create secret generic gitea-credentials \
  -n flux-system \
  --from-literal=username=your-username \
  --from-literal=password=your-password

Step 3: Apply Application Secrets

# Copy manifests to VPS (from local machine)
scp -r infrastructure/kubernetes root@YOUR_VPS_IP:~/

# SSH to VPS
ssh root@YOUR_VPS_IP

# Apply application secrets
kubectl apply -f ~/infrastructure/kubernetes/base/secrets.yaml -n bakery-ia

# Verify secrets created
kubectl get secrets -n bakery-ia
# Should show multiple secrets including postgres-tls, redis-tls, app-secrets, etc.

Database Migrations

For production environments, deploy CI/CD infrastructure components:

# Deploy Tekton Pipelines for CI/CD (optional but recommended for production)
kubectl create namespace tekton-pipelines

# Install Tekton Pipelines
kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/latest/release.yaml

# Install Tekton Triggers
kubectl apply -f https://storage.googleapis.com/tekton-releases/triggers/latest/release.yaml

# Apply Tekton configurations
kubectl apply -f ~/infrastructure/cicd/tekton/tasks/
kubectl apply -f ~/infrastructure/cicd/tekton/pipelines/
kubectl apply -f ~/infrastructure/cicd/tekton/triggers/

# Verify Tekton deployment
kubectl get pods -n tekton-pipelines

Step 1: Deploy SigNoz Monitoring (BEFORE Application)

⚠️ CRITICAL: SigNoz must be deployed into the bakery-ia namespace BEFORE the application, because the production kustomization patches SigNoz resources.

# On VPS
# 1. Ensure bakery-ia namespace exists
kubectl get namespace bakery-ia || kubectl create namespace bakery-ia

# 2. Add Helm repo
helm repo add signoz https://charts.signoz.io
helm repo update

# 3. Install SigNoz into bakery-ia namespace (NOT separate signoz namespace)
helm install signoz signoz/signoz \
  -n bakery-ia \
  --set frontend.service.type=ClusterIP \
  --set clickhouse.persistence.size=20Gi \
  --set clickhouse.persistence.storageClass=microk8s-hostpath

# 4. Wait for SigNoz to be ready (this may take 10-15 minutes)
kubectl wait --for=condition=ready pod \
  -l app.kubernetes.io/instance=signoz \
  -n bakery-ia \
  --timeout=900s

# 5. Verify SigNoz components running in bakery-ia namespace
kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=signoz
# Should show: signoz-0, signoz-otel-collector, signoz-clickhouse, signoz-zookeeper, signoz-alertmanager

# 6. Verify StatefulSets exist (kustomization will patch these)
kubectl get statefulset -n bakery-ia | grep signoz
# Should show: signoz, signoz-clickhouse

⚠️ Important: Do NOT create a separate signoz namespace. SigNoz must be in bakery-ia namespace for the overlays to work correctly.

Step 2: Deploy Application and Databases

# On VPS
kubectl apply -k ~/infrastructure/kubernetes/overlays/prod

# Wait for databases to be ready (5-10 minutes)
kubectl wait --for=condition=ready pod \
  -l app.kubernetes.io/component=database \
  -n bakery-ia \
  --timeout=600s

# Check status
kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database

Step 3: Run Migrations

Migrations are automatically handled by init containers in each service. Verify they completed:

# Check migration job status
kubectl get jobs -n bakery-ia | grep migration

# All should show "COMPLETIONS = 1/1"

# Check logs if any failed
kubectl logs -n bakery-ia job/auth-migration
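
For reference, a migration job has roughly this shape (illustrative only; the real manifests live in the repo, and the image, command, and secret names differ per service):

# Sketch of a migration Job (do not apply as-is; the prod overlay already contains the real ones)
apiVersion: batch/v1
kind: Job
metadata:
  name: auth-migration
  namespace: bakery-ia
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: bakery/auth-service:v1.0.0
          command: ["alembic", "upgrade", "head"]   # assumed migration entrypoint
          envFrom:
            - secretRef:
                name: app-secrets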

Step 4: Verify Database Schemas

# Connect to a database to verify
kubectl exec -n bakery-ia deployment/auth-db -it -- psql -U auth_user -d auth_db

# Inside psql:
\dt          # List tables
\d users     # Describe users table
\q           # Quit

Verification & Testing

Step 1: Check All Pods Running

# View all pods
kubectl get pods -n bakery-ia

# Expected: All pods in "Running" state, none in CrashLoopBackOff

# Check for issues
kubectl get pods -n bakery-ia | grep -vE "Running|Completed"

# View logs for any problematic pods
kubectl logs -n bakery-ia POD_NAME

Step 2: Check Services and Ingress

# View services
kubectl get svc -n bakery-ia

# View ingress
kubectl get ingress -n bakery-ia

# View certificates (should auto-issue from Let's Encrypt)
kubectl get certificate -n bakery-ia

# Describe certificate to check status
kubectl describe certificate bakery-ia-prod-tls-cert -n bakery-ia

Step 3: Test Database Connections

# Test PostgreSQL TLS
kubectl exec -n bakery-ia deployment/auth-db -- sh -c \
  'psql -U auth_user -d auth_db -c "SHOW ssl;"'
# Expected output: on

# Test Redis TLS
kubectl exec -n bakery-ia deployment/redis -- redis-cli \
  --tls \
  --cert /tls/redis-cert.pem \
  --key /tls/redis-key.pem \
  --cacert /tls/ca-cert.pem \
  -a $REDIS_PASSWORD \
  ping
# Expected output: PONG

Step 4: Test Frontend Access

# Test frontend (replace with your domain)
curl -I https://bakery.yourdomain.com

# Expected: HTTP/2 200 OK

# Test API health
curl https://api.yourdomain.com/health

# Expected: {"status": "healthy"}

Step 5: Test Authentication

# Create a test user (using your frontend or API)
curl -X POST https://api.yourdomain.com/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{
    "email": "test@yourdomain.com",
    "password": "TestPassword123!",
    "name": "Test User"
  }'

# Login
curl -X POST https://api.yourdomain.com/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{
    "email": "test@yourdomain.com",
    "password": "TestPassword123!"
  }'

# Expected: JWT token in response
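
To exercise an authenticated endpoint, capture the token from the login response and pass it as a Bearer header (a sketch; requires jq, and both the access_token field name and the /auth/me endpoint are assumptions - adjust to your API):

# Capture the token and call an authenticated endpoint
TOKEN=$(curl -s -X POST https://api.yourdomain.com/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "test@yourdomain.com", "password": "TestPassword123!"}' | jq -r '.access_token')

curl -H "Authorization: Bearer $TOKEN" https://api.yourdomain.com/api/v1/auth/me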

Step 6: Test Email Delivery

# Trigger a password reset to test email
curl -X POST https://api.yourdomain.com/api/v1/auth/forgot-password \
  -H "Content-Type: application/json" \
  -d '{"email": "test@yourdomain.com"}'

# Check your email inbox for the reset link
# Check service logs if email not received:
kubectl logs -n bakery-ia deployment/auth-service | grep -i "email\|smtp"

Step 7: Test WhatsApp (Optional)

# Send a test WhatsApp message
# This requires creating a tenant and configuring WhatsApp in the UI
# Or test via API once authenticated

Post-Deployment

Step 1: Access SigNoz Monitoring Stack

Your production deployment includes SigNoz, a unified observability platform that provides complete visibility into your application:

What is SigNoz?

SigNoz is an open-source, all-in-one observability platform that provides:

  • 📊 Distributed Tracing - See end-to-end request flows across all 18 microservices
  • 📈 Metrics Monitoring - Application performance and infrastructure metrics
  • 📝 Log Management - Centralized logs from all services with trace correlation
  • 🔍 Service Performance Monitoring (SPM) - Automatic RED metrics (Rate, Error, Duration)
  • 🗄️ Database Monitoring - All 18 PostgreSQL databases + Redis + RabbitMQ
  • ☸️ Kubernetes Monitoring - Cluster, node, pod, and container metrics

Why SigNoz instead of Prometheus/Grafana?

  • Single unified UI for traces, metrics, and logs (no context switching)
  • Automatic service dependency mapping
  • Built-in APM (Application Performance Monitoring)
  • Log-trace correlation with one click
  • Better query performance with ClickHouse backend
  • Modern UI designed for microservices

Production Monitoring URLs

Access via domain:

https://monitoring.bakewise.ai/signoz        # SigNoz - Main observability UI
https://monitoring.bakewise.ai/alertmanager  # AlertManager - Alert management

Or via port forwarding (if needed):

# SigNoz Frontend (Main UI)
kubectl port-forward -n bakery-ia svc/signoz 8080:8080 &
# Open: http://localhost:8080

# SigNoz AlertManager
kubectl port-forward -n bakery-ia svc/signoz-alertmanager 9093:9093 &
# Open: http://localhost:9093

# OTel Collector (for debugging)
kubectl port-forward -n bakery-ia svc/signoz-otel-collector 4317:4317 &  # gRPC
kubectl port-forward -n bakery-ia svc/signoz-otel-collector 4318:4318 &  # HTTP

Key SigNoz Features to Explore

Once you open SigNoz (https://monitoring.bakewise.ai/signoz), explore these tabs:

1. Services Tab - Application Performance

  • View all 18 microservices with live metrics
  • See request rate, error rate, and latency (P50/P90/P99)
  • Click on any service to drill down into operations
  • Identify slow endpoints and error-prone operations

2. Traces Tab - Request Flow Visualization

  • See complete request journeys across services
  • Identify bottlenecks (slow database queries, API calls)
  • Debug errors with full stack traces
  • Correlate with logs for complete context

3. Dashboards Tab - Infrastructure & Database Metrics

  • PostgreSQL - Monitor all 18 databases (connections, queries, cache hit ratio)
  • Redis - Cache performance (memory, hit rate, commands/sec)
  • RabbitMQ - Message queue health (depth, rates, consumers)
  • Kubernetes - Cluster metrics (nodes, pods, containers)

4. Logs Tab - Centralized Log Management

  • Search and filter logs from all services
  • Click on trace ID in logs to see related request trace
  • Auto-enriched with Kubernetes metadata (pod, namespace, container)
  • Identify patterns and anomalies

5. Alerts Tab - Proactive Monitoring

  • Configure alerts on metrics, traces, or logs
  • Email/Slack/Webhook notifications
  • View firing alerts and alert history

Quick Health Check

# Verify SigNoz components are running
kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=signoz

# Expected output:
# signoz-0                              READY 1/1
# signoz-otel-collector-xxx             READY 1/1
# signoz-alertmanager-xxx               READY 1/1
# signoz-clickhouse-xxx                 READY 1/1
# signoz-zookeeper-xxx                  READY 1/1

# Check OTel Collector health
kubectl exec -n bakery-ia deployment/signoz-otel-collector -- wget -qO- http://localhost:13133

# View recent telemetry in OTel Collector logs
kubectl logs -n bakery-ia deployment/signoz-otel-collector --tail=50 | grep -i "traces\|metrics\|logs"

Verify Telemetry is Working

  1. Check Services are Reporting:

    # Open SigNoz and navigate to Services tab
    # You should see all 18 microservices listed
    
    # If services are missing, check if they're sending telemetry:
    kubectl logs -n bakery-ia deployment/auth-service | grep -i "telemetry\|otel"
    
  2. Check Database Metrics:

    # Navigate to Dashboards → PostgreSQL in SigNoz
    # You should see metrics from all 18 databases
    
    # Verify OTel Collector is scraping databases:
    kubectl logs -n bakery-ia deployment/signoz-otel-collector | grep postgresql
    
  3. Check Traces are Being Collected:

    # Make a test API request
    curl https://bakewise.ai/api/v1/health
    
    # Navigate to Traces tab in SigNoz
    # Search for "gateway" service
    # You should see the trace for your request
    
  4. Check Logs are Being Collected:

    # Navigate to Logs tab in SigNoz
    # Filter by namespace: bakery-ia
    # You should see logs from all pods
    
    # Verify filelog receiver is working:
    kubectl logs -n bakery-ia deployment/signoz-otel-collector | grep filelog
    

If you deployed the CI/CD infrastructure, configure it for your workflow:

Gitea Setup (Git Server + Registry)

# Access Gitea at: http://gitea.bakery-ia.local (for dev) or http://gitea.bakewise.ai (for prod)
# Make sure to add the appropriate hostname to /etc/hosts or configure DNS

# Create your repositories for each service
# Configure webhook to trigger Tekton pipelines

Tekton Pipeline Configuration

# Verify Tekton pipelines are running
kubectl get pods -n tekton-pipelines

# Create a PipelineRun manually to test:
kubectl create -f - <<EOF
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: manual-ci-run
  namespace: tekton-pipelines
spec:
  pipelineRef:
    name: bakery-ia-ci
  workspaces:
    - name: shared-workspace
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi
    - name: docker-credentials
      secret:
        secretName: gitea-registry-credentials
  params:
    - name: git-url
      value: "http://gitea.bakery-ia.local/bakery-admin/bakery-ia.git"
    - name: git-revision
      value: "main"
EOF

Flux CD Configuration (GitOps)

# Verify Flux is running
kubectl get pods -n flux-system

# Set up GitRepository and Kustomization resources for GitOps deployment
# Example:
cat <<EOF | kubectl apply -f -
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: bakery-ia
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/your-org/bakery-ia.git
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: bakery-ia
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: bakery-ia
  path: ./infrastructure/environments/prod/k8s-manifests
  prune: true
EOF

Step 2: Configure Alerting

SigNoz includes integrated alerting with AlertManager. Configure it for your team:

Update Email Notification Settings

The alerting configuration is in the SigNoz Helm values. To update:

# For production, edit the values file:
nano infrastructure/helm/signoz-values-prod.yaml

# Update the alertmanager.config section:
# 1. Update SMTP settings:
#    - smtp_from: 'your-alerts@bakewise.ai'
#    - smtp_auth_username: 'your-alerts@bakewise.ai'
#    - smtp_auth_password: (use Kubernetes secret)
#
# 2. Update receivers:
#    - critical-alerts email: critical-alerts@bakewise.ai
#    - warning-alerts email: oncall@bakewise.ai
#
# 3. (Optional) Add Slack webhook for critical alerts

# Apply the updated configuration:
helm upgrade signoz signoz/signoz \
  -n bakery-ia \
  -f infrastructure/helm/signoz-values-prod.yaml

Create Alerts in SigNoz UI

  1. Open SigNoz Alerts Tab:

    https://monitoring.bakewise.ai/signoz → Alerts
    
  2. Create Common Alerts:

    Alert 1: High Error Rate

    • Name: HighErrorRate
    • Query: error_rate > 5 for 5 minutes
    • Severity: critical
    • Description: "Service {{service_name}} has error rate >5%"

    Alert 2: High Latency

    • Name: HighLatency
    • Query: P99_latency > 3000ms for 5 minutes
    • Severity: warning
    • Description: "Service {{service_name}} P99 latency >3s"

    Alert 3: Service Down

    • Name: ServiceDown
    • Query: request_rate == 0 for 2 minutes
    • Severity: critical
    • Description: "Service {{service_name}} not receiving requests"

    Alert 4: Database Connection Issues

    • Name: DatabaseConnectionsHigh
    • Query: pg_active_connections > 80 for 5 minutes
    • Severity: warning
    • Description: "Database {{database}} connection count >80%"

    Alert 5: High Memory Usage

    • Name: HighMemoryUsage
    • Query: container_memory_percent > 85 for 5 minutes
    • Severity: warning
    • Description: "Pod {{pod_name}} using >85% memory"

Test Alert Delivery

# Method 1: Create a test alert in SigNoz UI
# Go to Alerts → New Alert → Set a test condition that will fire

# Method 2: Fire a test alert via stress test
kubectl run memory-test --image=polinux/stress --restart=Never \
  --namespace=bakery-ia -- stress --vm 1 --vm-bytes 600M --timeout 300s

# Check alert appears in SigNoz Alerts tab
# https://monitoring.bakewise.ai/signoz → Alerts

# Also check AlertManager
# https://monitoring.bakewise.ai/alertmanager

# Verify email notification received

# Clean up test
kubectl delete pod memory-test -n bakery-ia

Configure Notification Channels

In SigNoz Alerts tab, configure channels:

  1. Email Channel:

    • Already configured via AlertManager
    • Emails sent to addresses in signoz-values-prod.yaml
  2. Slack Channel (Optional):

    # Add Slack webhook URL to signoz-values-prod.yaml
    # Under alertmanager.config.receivers.critical-alerts.slack_configs:
    #   - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
    #     channel: '#alerts-critical'
    
  3. Webhook Channel (Optional):

    • Configure custom webhook for integration with PagerDuty, OpsGenie, etc.
    • Add to alertmanager.config.receivers

Step 3: Configure Backups

# Create backup script on VPS
cat > ~/backup-databases.sh <<'EOF'
#!/bin/bash
BACKUP_DIR="/backups/$(date +%Y-%m-%d)"
mkdir -p $BACKUP_DIR

# Get all database pods
DBS=$(kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database -o name)

for db in $DBS; do
  DB_NAME=$(echo $db | cut -d'/' -f2)
  echo "Backing up $DB_NAME..."

  # pg_dumpall captures every database in the instance; adjust -U if the pod's superuser differs
  kubectl exec -n bakery-ia $db -- pg_dumpall -U postgres > "$BACKUP_DIR/${DB_NAME}.sql"
done

# Compress backups
tar -czf "$BACKUP_DIR.tar.gz" "$BACKUP_DIR"
rm -rf "$BACKUP_DIR"

# Keep only last 7 days
find /backups -name "*.tar.gz" -mtime +7 -delete

echo "Backup completed: $BACKUP_DIR.tar.gz"
EOF

chmod +x ~/backup-databases.sh

# Test backup
~/backup-databases.sh

# Setup daily cron job (2 AM)
(crontab -l 2>/dev/null; echo "0 2 * * * ~/backup-databases.sh") | crontab -
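
Backups are only useful if restores work, so test one. A restore looks roughly like this (a sketch; replace the date, dump filename, and target pod, and adjust -U to match the user the dump was taken with):

# Restore one database dump from a dated backup archive
tar -xzf /backups/2026-01-20.tar.gz -C /tmp
kubectl exec -i -n bakery-ia deployment/auth-db -- \
  psql -U postgres < /tmp/backups/2026-01-20/auth-db-xxxxxxxx.sql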

If you are also running a separate AlertManager in the monitoring namespace (e.g., from the optional prometheus addon), update its recipient emails as well:

# Update AlertManager configuration with your email
kubectl edit configmap -n monitoring alertmanager-config

# Update recipient emails in the routes section

Step 4: Verify SigNoz Monitoring is Working

Before proceeding, ensure all monitoring components are operational:

# 1. Verify SigNoz pods are running
kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=signoz

# Expected pods (all should be Running/Ready):
# - signoz-0 (or signoz-1, signoz-2 for HA)
# - signoz-otel-collector-xxx
# - signoz-alertmanager-xxx
# - signoz-clickhouse-xxx
# - signoz-zookeeper-xxx

# 2. Check SigNoz UI is accessible
curl -I https://monitoring.bakewise.ai/signoz
# Should return: HTTP/2 200 OK

# 3. Verify OTel Collector is receiving data
kubectl logs -n bakery-ia deployment/signoz-otel-collector --tail=100 | grep -i "received"
# Should show: "Traces received: X" "Metrics received: Y" "Logs received: Z"

# 4. Check ClickHouse database is healthy
# (depending on the chart version ClickHouse may run as a StatefulSet rather than
#  a Deployment -- if this errors, use the exact pod name from `kubectl get pods`)
kubectl exec -n bakery-ia deployment/signoz-clickhouse -- clickhouse-client --query="SELECT count() FROM system.tables WHERE database LIKE 'signoz_%'"
# Should return a number > 0 (tables exist)

Complete Verification Checklist:

  • SigNoz UI loads at https://monitoring.bakewise.ai/signoz
  • Services tab shows all 18 microservices with metrics
  • Traces tab has sample traces from gateway and other services
  • Dashboards tab shows PostgreSQL metrics from all 18 databases
  • Dashboards tab shows Redis metrics (memory, commands, etc.)
  • Dashboards tab shows RabbitMQ metrics (queues, messages)
  • Dashboards tab shows Kubernetes metrics (nodes, pods)
  • Logs tab displays logs from all services in bakery-ia namespace
  • Alerts tab is accessible and can create new alerts
  • AlertManager is reachable at https://monitoring.bakewise.ai/alertmanager

If any checks fail, troubleshoot:

# Check OTel Collector configuration
kubectl describe configmap -n bakery-ia signoz-otel-collector

# Check for errors in OTel Collector
kubectl logs -n bakery-ia deployment/signoz-otel-collector | grep -i error

# Check ClickHouse is accepting writes
kubectl logs -n bakery-ia deployment/signoz-clickhouse | grep -i error

# Restart OTel Collector if needed
kubectl rollout restart deployment/signoz-otel-collector -n bakery-ia

Step 6: Document Everything

Create a secure runbook with all credentials and procedures:

Essential Information to Document:

  • VPS login credentials (stored securely in password manager)
  • Database passwords (in password manager)
  • Monitoring admin credentials (SigNoz, and Grafana if deployed)
  • Domain registrar access (for bakewise.ai)
  • Cloudflare access
  • Email service credentials (SMTP)
  • WhatsApp API credentials
  • Docker Hub / Registry credentials
  • Emergency contact information
  • Rollback procedures
  • Monitoring URLs and access procedures

Step 7: Train Your Team

Conduct a training session covering SigNoz and operational procedures:

Part 1: SigNoz Navigation (30 minutes)

  • Login and Overview

    • Show how to access https://monitoring.bakewise.ai/signoz
    • Navigate through main tabs: Services, Traces, Dashboards, Logs, Alerts
    • Explain the unified nature of SigNoz (all-in-one platform)
  • Services Tab - Application Performance Monitoring

    • Show all 18 microservices
    • Explain RED metrics (Request rate, Error rate, Duration/latency)
    • Demo: Click on a service → Operations → See endpoint breakdown
    • Demo: Identify slow endpoints and high error rates
  • Traces Tab - Request Flow Debugging

    • Show how to search for traces by service, operation, or time
    • Demo: Click on a trace → See full waterfall (service → database → cache)
    • Demo: Find slow database queries in trace spans
    • Demo: Click "View Logs" to correlate trace with logs
  • Dashboards Tab - Infrastructure Monitoring

    • Navigate to PostgreSQL dashboard → Show all 18 databases
    • Navigate to Redis dashboard → Show cache metrics
    • Navigate to Kubernetes dashboard → Show node/pod metrics
    • Explain what metrics indicate issues (connection %, memory %, etc.)
  • Logs Tab - Log Search and Analysis

    • Show how to filter by service, severity, time range
    • Demo: Search for "error" in last hour
    • Demo: Click on trace_id in log → Jump to related trace
    • Show Kubernetes metadata (pod, namespace, container)
  • Alerts Tab - Proactive Monitoring

    • Show how to create alerts on metrics
    • Review pre-configured alerts
    • Show alert history and firing alerts
    • Explain how to acknowledge/silence alerts

Part 2: Operational Tasks (30 minutes)

  • Check application logs (multiple ways)

    # Method 1: Via kubectl (for immediate debugging)
    kubectl logs -n bakery-ia deployment/orders-service --tail=100 -f
    
    # Method 2: Via SigNoz Logs tab (for analysis and correlation)
    # 1. Open https://monitoring.bakewise.ai/signoz → Logs
    # 2. Filter by k8s_deployment_name: orders-service
    # 3. Click on trace_id to see related request flow
    
  • Restart services when needed

    # Restart a service (rolling update, no downtime)
    kubectl rollout restart deployment/orders-service -n bakery-ia
    
    # Verify restart in SigNoz:
    # 1. Check Services tab → orders-service → Should show brief dip then recovery
    # 2. Check Logs tab → Filter by orders-service → See restart logs
    
  • Investigate performance issues

    # Scenario: "Orders API is slow"
    # 1. SigNoz → Services → orders-service → Check P99 latency
    # 2. SigNoz → Traces → Filter service:orders-service, duration:>1s
    # 3. Click on slow trace → Identify bottleneck (DB query? External API?)
    # 4. SigNoz → Dashboards → PostgreSQL → Check orders_db connections/queries
    # 5. Fix identified issue (add index, optimize query, scale service)
    
  • Respond to alerts
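
    # Suggested first-response flow when an alert fires (illustrative; adapt to the alert):
    # 1. Open https://monitoring.bakewise.ai/signoz -> Alerts to see what is firing
    # 2. Look for unhealthy pods and recent cluster events
    kubectl get pods -n bakery-ia | grep -v Running
    kubectl get events -n bakery-ia --sort-by=.lastTimestamp | tail -20
    # 3. Pull recent logs from the affected service (replace the name)
    kubectl logs -n bakery-ia deployment/<affected-service> --tail=200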

Part 3: Documentation and Resources (10 minutes)

Part 4: Hands-On Exercise (15 minutes)

Exercise: Investigate a Simulated Issue

  1. Create a load test to generate traffic
  2. Use SigNoz to find the slowest endpoint
  3. Identify the root cause using traces
  4. Correlate with logs to confirm
  5. Check infrastructure metrics (DB, memory, CPU)
  6. Propose a fix based on findings

This trains the team to use SigNoz effectively for real incidents.
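
A minimal way to generate traffic for step 1, assuming the API is reachable at https://bakery.yourdomain.com and exposes a health endpoint (replace the URL with your real domain and a valid read-only endpoint); swap in a dedicated tool such as hey or k6 for heavier load tests.

# Fire 300 sequential requests and summarise the response codes,
# then watch the effect in SigNoz -> Services
for i in $(seq 1 300); do
  curl -s -o /dev/null -w "%{http_code}\n" https://bakery.yourdomain.com/api/health
done | sort | uniq -c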


Troubleshooting

Issue: Pods Not Starting

# Check pod status
kubectl describe pod POD_NAME -n bakery-ia

# Common causes:
# 1. Image pull errors
kubectl get events -n bakery-ia | grep -i "pull"

# 2. Resource limits
kubectl describe node

# 3. Volume mount issues
kubectl get pvc -n bakery-ia

Issue: Certificate Not Issuing

# Check certificate status
kubectl describe certificate bakery-ia-prod-tls-cert -n bakery-ia

# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager

# Check challenges
kubectl get challenges -n bakery-ia

# Verify DNS is correct
nslookup bakery.yourdomain.com
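
Two further checks catch the most common failure modes, a ClusterIssuer that is not ready and port 80 not reachable from the internet (the hostname below is a placeholder):

# The ClusterIssuer should report READY=True
kubectl get clusterissuer

# The HTTP-01 path must reach the ingress from outside the VPS.
# A 404 here is fine -- a timeout means port 80 is blocked or DNS is wrong.
curl -I http://bakery.yourdomain.com/.well-known/acme-challenge/test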

Issue: Database Connection Errors

# Check database pod
kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database

# Check database logs
kubectl logs -n bakery-ia deployment/auth-db

# Test connection from service pod
kubectl exec -n bakery-ia deployment/auth-service -- nc -zv auth-db 5432
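
If nc connects but the service still fails, a throwaway psql client pod separates application problems from database problems. This is a sketch: the database name, user, and password placeholder below are assumptions, so substitute the values from your secrets.

# One-off psql client (the postgres:16 image ships psql); requires the auth_db password
kubectl run psql-test --rm -it --restart=Never -n bakery-ia \
  --image=postgres:16 --env=PGPASSWORD='<auth_db_password>' -- \
  psql "host=auth-db port=5432 dbname=auth_db user=auth_user sslmode=require" -c "SELECT 1;"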

Issue: Services Can't Connect to Databases

# Check if SSL is enabled
kubectl exec -n bakery-ia deployment/auth-db -- sh -c \
  'psql -U auth_user -d auth_db -c "SHOW ssl;"'

# Check service logs for SSL errors
kubectl logs -n bakery-ia deployment/auth-service | grep -i "ssl\|tls"

# Restart service to pick up new SSL config
kubectl rollout restart deployment/auth-service -n bakery-ia

Issue: Out of Resources

# Check node resources
kubectl top nodes

# Check pod resource usage
kubectl top pods -n bakery-ia

# Identify resource hogs
kubectl top pods -n bakery-ia --sort-by=memory

# Scale down a non-critical deployment temporarily (pick one that is safe to pause)
kubectl scale deployment/<non-critical-deployment> -n bakery-ia --replicas=0
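
If a single deployment is the main consumer, a temporary cap or bump on its resources is often enough while you investigate; the sizes below are illustrative, so base them on what kubectl top pods shows.

# Adjust requests/limits in place (this triggers a rolling restart)
kubectl set resources deployment/orders-service -n bakery-ia \
  --requests=memory=256Mi --limits=memory=1Gi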

Next Steps After Successful Launch

  1. Monitor for 48 Hours

    • Check dashboards daily
    • Review error logs
    • Monitor resource usage
    • Test all functionality
  2. Optimize Based on Metrics

    • Adjust resource limits if needed
    • Fine-tune autoscaling thresholds
    • Optimize database queries if slow
  3. Onboard First Tenant

    • Create test tenant
    • Upload sample data
    • Test all features
    • Gather feedback
  4. Scale Gradually

    • Add 1-2 tenants at a time
    • Monitor resource usage
    • Upgrade VPS if needed (see scaling guide)
  5. Plan for Growth


Cost Scaling Path

| Tenants | RAM   | CPU      | Storage | Monthly Cost |
|---------|-------|----------|---------|--------------|
| 10      | 20 GB | 8 cores  | 200 GB  | €40-80       |
| 25      | 32 GB | 12 cores | 300 GB  | €80-120      |
| 50      | 48 GB | 16 cores | 500 GB  | €150-200     |
| 100+    | Consider a multi-node cluster or managed K8s | | | €300+ |

Support Resources

Documentation:

Monitoring Access:

  • SigNoz (Primary): https://monitoring.bakewise.ai/signoz - All-in-one observability
    • Services: Application performance monitoring (APM)
    • Traces: Distributed tracing across all services
    • Dashboards: PostgreSQL, Redis, RabbitMQ, Kubernetes metrics
    • Logs: Centralized log management with trace correlation
    • Alerts: Alert configuration and management
  • AlertManager: https://monitoring.bakewise.ai/alertmanager - Alert routing and notifications

External Resources:

Monitoring Architecture:

  • OpenTelemetry: Industry-standard instrumentation framework
    • Auto-instruments FastAPI, HTTPX, SQLAlchemy, Redis
    • Collects traces, metrics, and logs from all services
    • Exports to SigNoz via OTLP protocol (gRPC port 4317, HTTP port 4318)
  • SigNoz Components:
    • Frontend: Web UI for visualization and analysis
    • OTel Collector: Receives and processes telemetry data
    • ClickHouse: Time-series database for fast queries
    • AlertManager: Alert routing and notification delivery
    • Zookeeper: Coordination service for ClickHouse cluster
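
For reference, these are the standard OpenTelemetry environment variables a service uses to reach the collector. The variable names are part of the OTel spec; the Service DNS name signoz-otel-collector is assumed from this deployment, and the services in this repo may already set these values in their manifests.

# Point a service's OTel SDK at the collector (gRPC endpoint on port 4317)
kubectl set env deployment/orders-service -n bakery-ia \
  OTEL_SERVICE_NAME=orders-service \
  OTEL_EXPORTER_OTLP_ENDPOINT=http://signoz-otel-collector:4317 \
  OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production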

Summary Checklist

Pre-Deployment Configuration (LOCAL MACHINE)

  • Production secrets configured - JWT, database passwords, API keys (ALREADY DONE)
  • External service credentials - Update SMTP, WhatsApp, Stripe in secrets.yaml
  • imagePullSecrets removed - Delete from all 67 manifests
  • Image tags updated - Change all 'latest' to v1.0.0 (semantic version)
  • SigNoz namespace fixed - Already done (bakery-ia namespace)
  • Cert-manager email updated - Already set to admin@bakewise.ai
  • Stripe publishable key updated - Replace pk_test_... with production key in configmap.yaml
  • Pilot mode verified - VITE_PILOT_MODE_ENABLED=true (default is correct)
  • Manifests validated - No 'latest' tags, no imagePullSecrets remaining
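
A quick sweep that confirms the last two items before anything is applied; the k8s/ path is an assumption, so point the commands at wherever your manifests actually live.

# Fail loudly if any manifest still references a 'latest' tag or imagePullSecrets
grep -rn "image:.*:latest" k8s/ && echo "FIX: latest tags remain" || echo "OK: no latest tags"
grep -rn "imagePullSecrets" k8s/ && echo "FIX: imagePullSecrets remain" || echo "OK: no imagePullSecrets"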

Infrastructure Setup

  • VPS provisioned and accessible
  • MicroK8s (or equivalent Kubernetes) installed and configured
  • nginx-ingress-controller installed
  • metrics-server installed and working
  • cert-manager installed
  • Storage provisioner installed (MicroK8s hostpath-storage or local-path-provisioner)
  • Domain registered and DNS configured
  • Cloudflare protection enabled (optional but recommended)

Secrets and Configuration

  • TLS certificates generated (postgres, redis)
  • Email service configured and tested
  • WhatsApp API setup (optional for launch)
  • Container images built and pushed with version tags
  • Production configs verified (domains, CORS, storage class)
  • Strong passwords generated for all services
  • Docker registry secret created (dockerhub-creds)
  • Application secrets applied

Monitoring

  • SigNoz deployed via Helm
  • SigNoz pods running and healthy
  • SigNoz running in the bakery-ia namespace (no separate signoz namespace required)

Application Deployment

  • All pods running successfully
  • Databases accepting TLS connections
  • Let's Encrypt certificates issued
  • Frontend accessible via HTTPS
  • API health check passing
  • Test user can login
  • Email delivery working
  • SigNoz monitoring accessible
  • Metrics flowing to SigNoz
  • Pilot coupon verified - Check tenant-service logs for "Pilot coupon created successfully"
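
The last item can be checked with a one-liner; the deployment name tenant-service is assumed, so adjust it to the actual name in your manifests.

kubectl logs -n bakery-ia deployment/tenant-service | grep -i "pilot coupon created successfully"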

Post-Deployment

  • Backups configured and tested
  • Team trained on operations
  • Documentation complete
  • Emergency procedures documented
  • Monitoring alerts configured

🎉 Congratulations! Your Bakery-IA platform is now live in production!

Estimated total time: 2-4 hours for the first deployment. Subsequent updates: 15-30 minutes.


Document Version: 2.1
Last Updated: 2026-01-20
Maintained By: DevOps Team

Changes in v2.1:

  • Updated DNS configuration for Namecheap (primary) with Cloudflare as optional
  • Clarified MicroK8s ingress class is public (not nginx)
  • Updated Let's Encrypt ClusterIssuer documentation to reference pre-configured files
  • Added firewall requirements for clouding.io VPS
  • Emphasized port 80/443 requirements for HTTP-01 challenges

Changes in v2.0:

  • Added critical pre-deployment fixes section
  • Updated infrastructure setup for MicroK8s
  • Added required component installation (nginx-ingress, metrics-server, etc.)
  • Updated configuration steps with domain replacement
  • Added Docker registry secret creation
  • Added SigNoz Helm deployment before application
  • Updated storage class configuration
  • Added image tag version requirements
  • Expanded verification checklist