Bakery-IA Pilot Launch Guide
Complete guide for deploying to production for a 10-tenant pilot program
Last Updated: 2026-01-20
Target Environment: clouding.io VPS with MicroK8s
Estimated Cost: €41-81/month
Time to Deploy: 3-5 hours (first time, including fixes)
Status: ⚠️ REQUIRES PRE-DEPLOYMENT FIXES - See Production VPS Deployment Fixes
Version: 3.0
Table of Contents
- Executive Summary
- Infrastructure Architecture Overview
- ⚠️ CRITICAL: Pre-Deployment Fixes
- Pre-Launch Checklist
- VPS Provisioning
- Infrastructure Setup
- Domain & DNS Configuration
- TLS/SSL Certificates
- Email & Communication Setup
- Kubernetes Deployment
- Configuration & Secrets
- Database Migrations
- CI/CD Infrastructure Deployment
- Mailu Email Server Deployment
- Nominatim Geocoding Service
- SigNoz Monitoring Deployment
- Verification & Testing
- Post-Deployment
Executive Summary
What You're Deploying
A complete multi-tenant SaaS platform with:
- 18 microservices (auth, tenant, ML forecasting, inventory, sales, orders, etc.)
- 18 PostgreSQL databases (one per microservice) with TLS encryption
- Redis cache with TLS
- RabbitMQ message broker
- Monitoring stack (Prometheus, Grafana, AlertManager)
- Full security (TLS, RBAC, audit logging)
Total Cost Breakdown
| Service | Provider | Monthly Cost |
|---|---|---|
| VPS Server (20GB RAM, 8 vCPU, 200GB SSD) | clouding.io | €40-80 |
| Domain | Namecheap/Cloudflare | €1.25 (€15/year) |
| Email | Zoho Free / Gmail | €0 |
| WhatsApp API | Meta Business | €0 (1k free conversations) |
| DNS | Cloudflare | €0 |
| SSL | Let's Encrypt | €0 |
| TOTAL | | €41-81/month |
Timeline
| Phase | Duration | Description |
|---|---|---|
| Pre-Launch Setup | 1-2 hours | Domain, VPS provisioning, accounts setup |
| Infrastructure Setup | 1 hour | MicroK8s installation, firewall config |
| Deployment | 30-60 min | Deploy all services and databases |
| Verification | 30-60 min | Test everything works |
| Total | 3-5 hours | First-time deployment |
Infrastructure Architecture Overview
Component Layers
The Bakery-IA platform is organized into distinct infrastructure layers, each with specific deployment dependencies.
┌─────────────────────────────────────────────────────────────────────────────┐
│ LAYER 6: APPLICATION │
│ Frontend │ Gateway │ 18 Microservices │ CronJobs & Workers │
├─────────────────────────────────────────────────────────────────────────────┤
│ LAYER 5: MONITORING │
│ SigNoz (Unified Observability) │ AlertManager │ OTel Collector │
├─────────────────────────────────────────────────────────────────────────────┤
│ LAYER 4: PLATFORM SERVICES (Optional) │
│ Mailu (Email) │ Nominatim (Geocoding) │ CI/CD (Tekton, Flux, Gitea) │
├─────────────────────────────────────────────────────────────────────────────┤
│ LAYER 3: DATA & STORAGE │
│ PostgreSQL (18 DBs) │ Redis │ RabbitMQ │ MinIO │
├─────────────────────────────────────────────────────────────────────────────┤
│ LAYER 2: NETWORK & SECURITY │
│ Unbound DNS │ CoreDNS │ Ingress Controller │ Cert-Manager │ TLS │
├─────────────────────────────────────────────────────────────────────────────┤
│ LAYER 1: FOUNDATION │
│ Namespaces │ Storage Classes │ RBAC │ ConfigMaps │ Secrets │
├─────────────────────────────────────────────────────────────────────────────┤
│ LAYER 0: KUBERNETES CLUSTER │
│ MicroK8s (Production) │ Kind (Local Dev) │ EKS (AWS Alternative) │
└─────────────────────────────────────────────────────────────────────────────┘
Deployment Order & Dependencies
Components must be deployed in a specific order due to dependencies:
1. Namespaces (bakery-ia, tekton-pipelines, flux-system)
↓
2. Cert-Manager & ClusterIssuers
↓
3. TLS Certificates (internal + ingress)
↓
4. Unbound DNS Resolver (required for Mailu DNSSEC)
↓
5. CoreDNS Configuration (forward to Unbound)
↓
6. Ingress Controller & Resources
↓
7. Data Layer: PostgreSQL, Redis, RabbitMQ, MinIO
↓
8. Database Migrations
↓
9. Application Services (18 microservices)
↓
10. Gateway & Frontend
↓
11. (Optional) CI/CD: Gitea → Tekton → Flux
↓
12. (Optional) Mailu Email Server
↓
13. (Optional) Nominatim Geocoding
↓
14. (Optional) SigNoz Monitoring
Infrastructure Components Summary
| Component | Purpose | Required | Namespace |
|---|---|---|---|
| MicroK8s | Kubernetes cluster | Yes | - |
| Cert-Manager | TLS certificate management | Yes | cert-manager |
| Ingress-Nginx | External traffic routing | Yes | ingress |
| PostgreSQL | 18 service databases | Yes | bakery-ia |
| Redis | Caching & sessions | Yes | bakery-ia |
| RabbitMQ | Message broker | Yes | bakery-ia |
| MinIO | Object storage (ML models) | Yes | bakery-ia |
| Unbound DNS | DNSSEC resolver | For Mailu | bakery-ia |
| Mailu | Self-hosted email server | Optional | bakery-ia |
| Nominatim | Geocoding service | Optional | bakery-ia |
| Gitea | Git server + container registry | Optional | gitea |
| Tekton | CI/CD pipelines | Optional | tekton-pipelines |
| Flux CD | GitOps deployment | Optional | flux-system |
| SigNoz | Unified observability | Recommended | bakery-ia |
Quick Reference: What to Deploy
Minimal Production Setup:
- Kubernetes cluster + addons
- Core infrastructure (databases, cache, broker)
- Application services
- External email (Zoho/Gmail)
Full Production Setup (Recommended):
- Everything above, plus:
- Mailu (self-hosted email)
- SigNoz (monitoring)
- CI/CD (Gitea + Tekton + Flux)
- Nominatim (if geocoding needed)
⚠️ CRITICAL: Pre-Deployment Configuration
READ THIS FIRST: The Kubernetes configuration requires updates for secure production deployment.
🔴 Configuration Status
Your manifests need the following updates before deploying to production:
Required Configuration Changes
1. Remove imagePullSecrets (BLOCKING)
Why: Images are public and don't require authentication
Impact if skipped: All pods fail with ImagePullBackOff
2. Update Image Tags to Semantic Versions (BLOCKING)
Why: Using 'latest' causes non-deterministic deployments
Impact if skipped: Unpredictable behavior, impossible rollbacks
3. Fix SigNoz Namespace References (BLOCKING) - ✅ ALREADY FIXED
Why: SigNoz must be in bakery-ia namespace
Impact if skipped: Kustomize apply fails
Status: ✅ Fixed in latest commit
4. Production Secrets (ALREADY CONFIGURED) ✅
Status: Strong production secrets have been generated and configured
Impact if skipped: N/A - This step is already completed
5. Update Cert-Manager Email (HIGH PRIORITY) - ✅ ALREADY FIXED
Why: Receive Let's Encrypt renewal notifications
Impact if skipped: Won't receive SSL expiry warnings
Status: ✅ Fixed - email is now admin@bakewise.ai
6. Update Stripe Publishable Key (HIGH PRIORITY)
Why: Payment processing requires production Stripe key
Impact if skipped: Payments will use test mode (no real charges)
File: infrastructure/kubernetes/base/configmap.yaml line 378
Current value: pk_test_your_stripe_publishable_key_here
Required: Your Stripe production publishable key from https://dashboard.stripe.com/apikeys
7. Pilot Coupon Configuration (OPTIONAL)
Why: Control pilot program settings
Files: infrastructure/kubernetes/base/configmap.yaml lines 375-377
Current values (defaults are correct for pilot):
- VITE_PILOT_MODE_ENABLED: "true" - Enables pilot UI features
- VITE_PILOT_COUPON_CODE: "PILOT2025" - Coupon code for 3 months free
- VITE_PILOT_TRIAL_MONTHS: "3" - Trial extension duration
Note: The PILOT2025 coupon is automatically created when tenant-service starts.
No manual seeding required - it's handled by app/jobs/startup_seeder.py.
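The Stripe key change in item 6 is a manual edit, so it is easy to forget. A minimal sketch of a check follows, assuming the key sits on a single `VITE_STRIPE_PUBLISHABLE_KEY:` line as described above. The function name `check_stripe_key` is hypothetical and not part of the repository.

```shell
# Sketch: report whether the ConfigMap still carries a Stripe test key.
# check_stripe_key is a hypothetical helper, not a script shipped in the repo.
check_stripe_key() {
  local file="$1"
  local line
  line=$(grep -E '^[[:space:]]*VITE_STRIPE_PUBLISHABLE_KEY:' "$file") || {
    echo "MISSING: VITE_STRIPE_PUBLISHABLE_KEY not found in $file"
    return 2
  }
  case "$line" in
    *pk_live_*) echo "OK: live publishable key configured" ;;
    *pk_test_*) echo "TEST: test-mode key still configured"; return 1 ;;
    *)          echo "UNKNOWN: unexpected key format"; return 1 ;;
  esac
}

# Usage (from the repository root):
# check_stripe_key infrastructure/kubernetes/base/configmap.yaml
```

A non-zero exit status makes the check easy to wire into a pre-deployment script.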
✅ Already Correct (No Changes Needed)
- Storage Class - microk8s-hostpath is correct for MicroK8s
- Domain Names - bakewise.ai is your production domain
- Service Types - ClusterIP + Ingress is correct architecture
- Network Policies - Not required for single-namespace deployment
- SigNoz Namespace - ✅ Fixed to use bakery-ia namespace
Step-by-Step Configuration Script
Run these commands on your local machine before deployment:
# Navigate to repository root
cd /path/to/bakery-ia
# ========================================
# STEP 1: Remove imagePullSecrets
# ========================================
echo "Step 1: Removing imagePullSecrets..."
chmod +x infrastructure/kubernetes/remove-imagepullsecrets.sh
./infrastructure/kubernetes/remove-imagepullsecrets.sh
# Verify removal
grep -r "imagePullSecrets" infrastructure/kubernetes/base/ && \
echo "⚠️ WARNING: Some files still have imagePullSecrets" || \
echo "✅ imagePullSecrets removed"
# ========================================
# STEP 2: Update Image Tags
# ========================================
echo -e "\nStep 2: Updating image tags..."
export VERSION="1.0.0" # Change this to your version
sed -i.bak "s/newTag: latest/newTag: v${VERSION}/g" infrastructure/kubernetes/overlays/prod/kustomization.yaml
# Verify no 'latest' tags remain
grep "newTag:" infrastructure/kubernetes/overlays/prod/kustomization.yaml | grep "latest" && \
echo "⚠️ WARNING: Some images still use 'latest'" || \
echo "✅ All images now use version v${VERSION}"
# ========================================
# STEP 3: Production Secrets (ALREADY DONE) ✅
# ========================================
echo -e "\nStep 3: Verifying production secrets..."
echo "✅ Production secrets have been pre-configured with strong passwords"
echo " - JWT secrets: 256-bit cryptographically secure"
echo " - Database passwords: 24-character random strings"
echo " - Redis password: 24-character random string"
echo " - RabbitMQ password: 24-character random string"
echo " - Service API key: 64-character hex string"
echo ""
echo "All secrets are already set in infrastructure/kubernetes/base/secrets.yaml"
echo "No manual action required for this step."
# ========================================
# STEP 4: Cert-Manager Email (ALREADY FIXED)
# ========================================
echo -e "\nStep 4: Verifying cert-manager email..."
grep "admin@bakewise.ai" infrastructure/kubernetes/base/components/cert-manager/cluster-issuer-production.yaml && \
echo "✅ Cert-manager email already set to admin@bakewise.ai" || \
echo "⚠️ WARNING: Cert-manager email needs updating"
# ========================================
# STEP 5: Update Stripe Publishable Key
# ========================================
echo -e "\nStep 5: Stripe Publishable Key Configuration..."
echo "================================================================"
echo "⚠️ MANUAL STEP REQUIRED"
echo ""
echo "Edit: infrastructure/kubernetes/base/configmap.yaml"
echo "Find: VITE_STRIPE_PUBLISHABLE_KEY: \"pk_test_your_stripe_publishable_key_here\""
echo "Replace with your production Stripe publishable key from:"
echo " https://dashboard.stripe.com/apikeys"
echo ""
echo "Example:"
echo " VITE_STRIPE_PUBLISHABLE_KEY: \"pk_live_XXXXXXXXXXXXXXXXXXXX\""
echo ""
echo "Press Enter when you've updated the Stripe key..."
read
# ========================================
# FINAL VALIDATION
# ========================================
echo -e "\n========================================"
echo "Pre-Deployment Configuration Complete!"
echo "========================================"
echo ""
echo "Validation Checklist:"
echo " ✅ imagePullSecrets removed"
echo " ✅ Image tags updated to v${VERSION}"
echo " ✅ SigNoz namespace fixed (bakery-ia)"
echo " ✅ Production secrets configured with strong passwords"
echo " ✅ Cert-manager email set to admin@bakewise.ai"
echo " ⚠️ Stripe publishable key updated (manual verification required)"
echo " ✅ Pilot coupon auto-seeded on tenant-service startup"
echo ""
echo "Next: Copy manifests to VPS and begin deployment"
Manual Verification
After running the script above:
- Verify production secrets are configured:
# Verify secrets.yaml has strong passwords (not placeholders)
grep "JWT_SECRET_KEY" infrastructure/kubernetes/base/secrets.yaml
# Should show: dXNNSHc5a1FDUW95cmM3d1BtTWkzYkNscjBsVFk5d3Z6Wm1jVGJBRHZMMD0=
# (This is the base64-encoded production JWT secret)
- Check image tags:
grep "newTag:" infrastructure/kubernetes/overlays/prod/kustomization.yaml
# All should show v1.0.0 (or your version), NOT 'latest'
- Verify SigNoz namespace:
grep -A 3 "name: signoz" infrastructure/kubernetes/overlays/prod/kustomization.yaml
# All should show: namespace: bakery-ia
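The image tag check can be made stricter than a search for 'latest'. The sketch below lists every tag that is not a v-prefixed semantic version. It assumes the `newTag: <value>` layout shown above, and `non_semver_tags` is a hypothetical helper name.

```shell
# Sketch: print every newTag value that is not of the form vMAJOR.MINOR.PATCH.
# Prints nothing when all tags are valid.
non_semver_tags() {
  grep -E '^[[:space:]]*newTag:' "$1" \
    | sed -E 's/^[[:space:]]*newTag:[[:space:]]*//' \
    | tr -d "\"' \t" \
    | grep -Ev '^v[0-9]+\.[0-9]+\.[0-9]+$' || true
}

# Usage (from the repository root):
# non_semver_tags infrastructure/kubernetes/overlays/prod/kustomization.yaml
```

Empty output means every image is pinned. Any printed value (for example `latest`) still needs fixing.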
⏱️ Estimated Time: 30-45 minutes
Pre-Launch Checklist
Required Accounts & Services
- Domain Name
  - Register at Namecheap or Cloudflare (€10-15/year)
  - Suggested: bakeryforecast.es or bakery-ia.com
- VPS Account
  - Sign up at clouding.io
  - Payment method configured
- Email Service (Choose ONE)
  - Option A: Zoho Mail FREE (recommended for full send/receive)
  - Option B: Gmail SMTP + domain forwarding
  - Option C: Google Workspace (14-day free trial, then €5.75/month)
- WhatsApp Business API
  - Create Meta Business Account (free)
  - Verify business identity
  - Phone number ready (non-VoIP)
- DNS Access
  - Cloudflare account (free, recommended)
  - Or domain registrar DNS panel access
- Container Registry (Choose ONE)
  - Option A: Docker Hub account (recommended)
  - Option B: GitHub Container Registry
  - Option C: MicroK8s built-in registry
Required Tools on Local Machine
# Verify you have these installed:
kubectl version --client
docker --version
git --version
ssh -V
openssl version
# Install if missing (macOS):
brew install kubectl docker git openssh openssl
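Rather than reading five version banners by eye, the same check can be reduced to a list of missing commands. This is a small sketch; `missing_tools` is a hypothetical helper name.

```shell
# Sketch: print the name of each required CLI tool that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing=$(missing_tools kubectl docker git ssh openssl)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found"
fi
```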
Repository Setup
# Clone the repository
git clone https://github.com/yourusername/bakery-ia.git
cd bakery-ia
# Verify structure
ls infrastructure/kubernetes/overlays/prod/
VPS Provisioning
Recommended Configuration
For 10-tenant pilot program:
- RAM: 20 GB
- CPU: 8 vCPU cores
- Storage: 200 GB NVMe SSD (triple replica)
- Network: 1 Gbps connection
- OS: Ubuntu 22.04 LTS
- Monthly Cost: €40-80 (check current pricing)
Why These Specs?
Memory Breakdown:
- Application services: 14.1 GB
- Databases (18 instances): 4.6 GB
- Infrastructure (Redis, RabbitMQ): 0.8 GB
- Gateway/Frontend: 1.8 GB
- Monitoring: 1.5 GB
- System overhead: ~3 GB
- Total: ~26 GB capacity needed, 20 GB is sufficient with HPA
Storage Breakdown:
- Databases: 36 GB (18 × 2GB)
- ML Models: 10 GB
- Redis: 1 GB
- RabbitMQ: 2 GB
- Prometheus metrics: 20 GB
- Container images: ~30 GB
- Growth buffer: 100 GB
- Total: 199 GB
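The two totals above can be recomputed from the listed figures. The snippet below is only an arithmetic check of those numbers and makes no claim about actual usage on a running cluster.

```shell
# Memory (GB): services, databases, Redis+RabbitMQ, gateway/frontend, monitoring, system
mem_total=$(printf '%s\n' 14.1 4.6 0.8 1.8 1.5 3 \
  | LC_ALL=C awk '{ s += $1 } END { printf "%.1f", s }')

# Storage (GB): databases, ML models, Redis, RabbitMQ, Prometheus, images, growth buffer
disk_total=$(printf '%s\n' 36 10 1 2 20 30 100 \
  | LC_ALL=C awk '{ s += $1 } END { printf "%d", s }')

echo "Memory budget:  ${mem_total} GB (VPS provides 20 GB)"
echo "Storage budget: ${disk_total} GB (VPS provides 200 GB)"
```

The memory figure comes out at 25.8 GB, which is the "~26 GB" quoted above. The guide relies on HPA scaling services down to stay within the 20 GB actually provisioned.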
Provisioning Steps
- Create VPS at clouding.io:
  1. Log in to clouding.io dashboard
  2. Click "Create New Server"
  3. Select:
     - OS: Ubuntu 22.04 LTS
     - RAM: 20 GB
     - CPU: 8 vCPU
     - Storage: 200 GB NVMe SSD
     - Location: Barcelona (best for Spain)
  4. Set hostname: bakery-ia-prod-01
  5. Add SSH key (or use password)
  6. Create server
- Note your server details:
# Save these for later:
VPS_IP="YOUR_VPS_IP_ADDRESS"
VPS_ROOT_PASSWORD="YOUR_ROOT_PASSWORD" # If not using SSH key
- Initial SSH connection:
# Test connection
ssh root@$VPS_IP
# Update system
apt update && apt upgrade -y
Infrastructure Setup
Step 1: Install MicroK8s
Using MicroK8s for production VPS deployment on clouding.io
# SSH into your VPS
ssh root@$VPS_IP
# Update system
apt update && apt upgrade -y
# Install MicroK8s
snap install microk8s --classic --channel=1.28/stable
# Add your user to microk8s group
usermod -a -G microk8s $USER
chown -f -R $USER ~/.kube
newgrp microk8s
# Verify installation
microk8s status --wait-ready
Step 2: Enable Required MicroK8s Addons
All required components are available as MicroK8s addons:
# Enable core addons
microk8s enable dns # DNS resolution within cluster
microk8s enable hostpath-storage # Provides microk8s-hostpath storage class
microk8s enable ingress # Nginx ingress controller (uses class "public")
microk8s enable cert-manager # Let's Encrypt SSL certificates
microk8s enable metrics-server # For HPA autoscaling
microk8s enable rbac # Role-based access control
# Setup kubectl alias
echo "alias kubectl='microk8s kubectl'" >> ~/.bashrc
source ~/.bashrc
# Verify all components are running
kubectl get nodes
# Should show: Ready
kubectl get storageclass
# Should show: microk8s-hostpath (default)
kubectl get pods -A
# Should show pods in: kube-system, ingress, cert-manager namespaces
# Verify ingress controller is running
kubectl get pods -n ingress
# Should show: nginx-ingress-microk8s-controller-xxx Running
# Verify cert-manager is running
kubectl get pods -n cert-manager
# Should show: cert-manager-xxx, cert-manager-webhook-xxx, cert-manager-cainjector-xxx
# Verify metrics-server is working
kubectl top nodes
# Should return CPU/Memory metrics
Important - MicroK8s Ingress Class:
- MicroK8s ingress addon uses class name public (NOT nginx)
- The ClusterIssuers in this repo are already configured with class: public
- If you see cert-manager challenges failing, verify the ingress class matches
Optional but Recommended:
# Enable Prometheus for additional monitoring (optional)
microk8s enable prometheus
# Enable registry if you want local image storage (optional)
microk8s enable registry
Step 3: Enhanced Infrastructure Components
The platform includes additional infrastructure components that enhance security, monitoring, and operations:
# The platform includes Mailu for email services
# Deploy Mailu via Helm (optional but recommended for production):
kubectl create namespace bakery-ia --dry-run=client -o yaml | kubectl apply -f -
helm repo add mailu https://mailu.github.io/helm-charts
helm repo update
helm install mailu mailu/mailu \
-n bakery-ia \
-f infrastructure/platform/mail/mailu-helm/values.yaml \
--timeout 10m \
--wait
# Verify Mailu deployment
kubectl get pods -n bakery-ia | grep mailu
For development environments, ensure the prepull-base-images script is run:
# On your local machine, run the prepull script to cache base images
cd bakery-ia
chmod +x scripts/prepull-base-images.sh
./scripts/prepull-base-images.sh
For production environments, ensure CI/CD infrastructure is properly configured:
# Tekton Pipelines for CI/CD (optional - can be deployed separately)
kubectl create namespace tekton-pipelines
kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/latest/release.yaml
kubectl apply -f https://storage.googleapis.com/tekton-releases/triggers/latest/release.yaml
# Flux CD for GitOps (already enabled in MicroK8s if needed)
# flux install --namespace=flux-system --network-policy=false
Step 4: Configure Firewall
CRITICAL: Ports 80 and 443 must be open for Let's Encrypt HTTP-01 challenges to work.
# Allow necessary ports
ufw allow 22/tcp # SSH
ufw allow 80/tcp # HTTP - REQUIRED for Let's Encrypt HTTP-01 challenge
ufw allow 443/tcp # HTTPS - For your application traffic
ufw allow 16443/tcp # Kubernetes API (optional, for remote kubectl access)
# Enable firewall
ufw enable
# Check status
ufw status verbose
# Expected output should include:
# 80/tcp ALLOW Anywhere
# 443/tcp ALLOW Anywhere
Also check clouding.io firewall:
- Log in to clouding.io dashboard
- Go to your VPS → Firewall settings
- Ensure ports 80 and 443 are allowed from anywhere (0.0.0.0/0)
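The expected `ufw status` output can also be checked mechanically. The sketch below reads the status text on stdin and prints any required port that has no ALLOW rule. `closed_ports` is a hypothetical helper name, and it only inspects ufw, not the clouding.io firewall.

```shell
# Sketch: read `ufw status` output on stdin and print each required TCP port
# that has no ALLOW rule. Prints nothing when every port is open.
closed_ports() {
  local status port
  status=$(cat)
  for port in "$@"; do
    printf '%s\n' "$status" \
      | grep -Eq "^${port}/tcp[[:space:]]+ALLOW" || echo "$port"
  done
}

# Usage on the VPS:
# ufw status | closed_ports 22 80 443
```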
Step 5: Create Namespace
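# Create bakery-ia namespace (idempotent: safe to re-run if Step 3 already created it)
kubectl create namespace bakery-ia --dry-run=client -o yaml | kubectl apply -f -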
# Verify
kubectl get namespaces
Domain & DNS Configuration
Step 1: Register Domain at Namecheap
- Go to Namecheap
- Search for your desired domain (e.g., bakewise.ai)
- Complete purchase (~€10-15/year)
- Save domain credentials
Step 2: Configure DNS at Namecheap
- Access DNS settings:
  1. Log in to Namecheap
  2. Go to Domain List → Manage → Advanced DNS
- Add DNS records pointing to your VPS:

| Type | Host | Value | TTL |
|---|---|---|---|
| A | @ | YOUR_VPS_IP | Automatic |
| A | * | YOUR_VPS_IP | Automatic |

This points both bakewise.ai and all subdomains (*.bakewise.ai) to your VPS.
- Test DNS propagation:
# Wait 5-10 minutes, then test
nslookup bakewise.ai
nslookup api.bakewise.ai
nslookup mail.bakewise.ai
Step 3 (Optional): Configure Cloudflare DNS
- Add site to Cloudflare:
  1. Log in to Cloudflare
  2. Click "Add a Site"
  3. Enter your domain name
  4. Choose Free plan
  5. Cloudflare will scan existing DNS records
- Update nameservers at registrar:
  Point your domain's nameservers to Cloudflare:
  - NS1: assigned.cloudflare.com
  - NS2: assigned.cloudflare.com
  (Cloudflare will provide the exact values)
- Add DNS records:

| Type | Name | Content | TTL | Proxy |
|---|---|---|---|---|
| A | @ | YOUR_VPS_IP | Auto | Yes |
| A | www | YOUR_VPS_IP | Auto | Yes |
| A | api | YOUR_VPS_IP | Auto | Yes |
| A | monitoring | YOUR_VPS_IP | Auto | Yes |
| CNAME | * | yourdomain.com | Auto | No |

- Configure SSL/TLS mode:
  SSL/TLS tab → Overview → Set to "Full (strict)"
- Test DNS propagation:
# Wait 5-10 minutes, then test
nslookup yourdomain.com
nslookup api.yourdomain.com
TLS/SSL Certificates
Understanding Certificate Setup
The platform uses two layers of SSL/TLS:
- External (Ingress) SSL: Let's Encrypt for public HTTPS
- Internal (Database) SSL: Self-signed certificates for database connections
Step 1: Generate Internal Certificates
# On your local machine
cd infrastructure/tls
# Generate certificates
./generate-certificates.sh
# This creates:
# - ca/ (Certificate Authority)
# - postgres/ (PostgreSQL server certs)
# - redis/ (Redis server certs)
Certificate Details:
- Root CA: 10-year validity (expires 2035)
- Server certs: 3-year validity (expires October 2028)
- Algorithm: RSA 4096-bit
- Signature: SHA-256
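Because the internal certificates are self-signed, nothing renews them automatically, so it helps to check their remaining lifetime from time to time. The sketch below uses `openssl x509 -checkend`. The helper name `cert_expires_within` and the 30-day default are assumptions, not part of the repository.

```shell
# Sketch: warn when a PEM certificate expires within N days (default 30).
cert_expires_within() {
  local cert="$1"
  local days="${2:-30}"
  if openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null; then
    echo "OK: $cert is valid for more than $days days"
  else
    echo "WARN: $cert expires within $days days"
    return 1
  fi
}

# Usage (from the repository root, after generating the certificates):
# cert_expires_within infrastructure/tls/postgres/server-cert.pem 30
```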
Step 2: Create Kubernetes Secrets
# Create PostgreSQL TLS secret
kubectl create secret generic postgres-tls \
--from-file=server-cert.pem=infrastructure/tls/postgres/server-cert.pem \
--from-file=server-key.pem=infrastructure/tls/postgres/server-key.pem \
--from-file=ca-cert.pem=infrastructure/tls/postgres/ca-cert.pem \
-n bakery-ia
# Create Redis TLS secret
kubectl create secret generic redis-tls \
--from-file=redis-cert.pem=infrastructure/tls/redis/redis-cert.pem \
--from-file=redis-key.pem=infrastructure/tls/redis/redis-key.pem \
--from-file=ca-cert.pem=infrastructure/tls/redis/ca-cert.pem \
-n bakery-ia
# Verify secrets created
kubectl get secrets -n bakery-ia | grep tls
Step 3: Configure Let's Encrypt (External SSL)
cert-manager is already enabled via microk8s enable cert-manager. The ClusterIssuer is pre-configured in the repository.
Important: MicroK8s ingress addon uses ingress class public (not nginx). This is already configured in:
- infrastructure/platform/cert-manager/cluster-issuer-production.yaml
- infrastructure/platform/cert-manager/cluster-issuer-staging.yaml
# On VPS, apply the pre-configured ClusterIssuers
kubectl apply -k infrastructure/platform/cert-manager/
# Verify ClusterIssuers are ready
kubectl get clusterissuer
kubectl describe clusterissuer letsencrypt-production
# Expected output:
# NAME READY AGE
# letsencrypt-production True 1m
# letsencrypt-staging True 1m
Configuration details (already set):
- Email: admin@bakewise.ai (receives Let's Encrypt expiry notifications)
- Ingress class: public (MicroK8s default)
- Challenge type: HTTP-01 (requires port 80 open)
If you need to customize the email, edit before applying:
# Edit the production issuer
nano infrastructure/platform/cert-manager/cluster-issuer-production.yaml
# Change: email: admin@bakewise.ai → email: your-email@yourdomain.com
Email & Communication Setup
Option A: Zoho Mail (FREE, Recommended)
Features:
- ✅ Free forever for 1 domain, 5 users
- ✅ 5GB storage per user
- ✅ Full send/receive capability
- ✅ Web interface + SMTP/IMAP
- ✅ Professional email addresses
Setup Steps:
- Sign up for Zoho Mail:
  1. Go to https://www.zoho.com/mail/
  2. Click "Sign Up for Free"
  3. Choose "Forever Free" plan
  4. Enter your domain name
  5. Complete verification
- Verify domain ownership:
  Add TXT record to your DNS:
  Type: TXT
  Name: @
  Value: zoho-verification=XXXXX.zoho.com
- Configure MX records:

| Priority | Type | Name | Value |
|---|---|---|---|
| 10 | MX | @ | mx.zoho.com |
| 20 | MX | @ | mx2.zoho.com |
| 50 | MX | @ | mx3.zoho.com |

- Get SMTP credentials:
  SMTP Host: smtp.zoho.com
  SMTP Port: 587
  SMTP Username: noreply@yourdomain.com
  SMTP Password: (generate app password in Zoho settings)
Option B: Gmail SMTP + Forwarding
Features:
- ✅ Completely free
- ✅ 500 emails/day (sufficient for pilot)
- ✅ Receive via domain forwarding
Setup Steps:
- Enable 2FA on your Gmail:
  1. Go to myaccount.google.com
  2. Security → 2-Step Verification
  3. Enable and complete setup
- Generate app password:
  1. Security → 2-Step Verification → App passwords
  2. Select "Mail" and "Other (Custom name)"
  3. Name it "Bakery-IA SMTP"
  4. Copy the 16-character password
- Configure domain email forwarding:
  At your domain registrar or Cloudflare:
  - Forward noreply@yourdomain.com → your.gmail@gmail.com
  - Forward alerts@yourdomain.com → your.gmail@gmail.com
- SMTP Settings:
  SMTP Host: smtp.gmail.com
  SMTP Port: 587
  SMTP Username: your.gmail@gmail.com
  SMTP Password: (16-char app password from step 2)
  From Email: noreply@yourdomain.com
Option C: Self-Hosted Mailu (RECOMMENDED for Production)
Features:
- ✅ Full control over email infrastructure
- ✅ No external dependencies or rate limits
- ✅ Built-in antispam (rspamd) with DNSSEC validation
- ✅ Webmail interface (Roundcube)
- ✅ IMAP/SMTP with TLS
- ✅ Admin panel for user management
- ✅ Integrated with Kubernetes
Why Mailu for Production:
- Complete email stack (Postfix, Dovecot, Rspamd, ClamAV)
- DNSSEC validation for email authentication (DKIM/SPF/DMARC)
- No monthly email limits or third-party dependencies
- Professional email addresses: admin@bakewise.ai, noreply@bakewise.ai
Prerequisites
Before deploying Mailu, ensure:
- Unbound DNS is deployed (for DNSSEC validation)
- CoreDNS is configured to forward to Unbound
- DNS records are configured for your domain
Step 1: Configure DNS Records
Add these DNS records for your domain (e.g., bakewise.ai):
| Type | Name | Value | TTL |
|---|---|---|---|
| A | mail | YOUR_VPS_IP | Auto |
| MX | @ | mail.bakewise.ai (priority 10) | Auto |
| TXT | @ | v=spf1 mx a -all | Auto |
| TXT | _dmarc | v=DMARC1; p=reject; rua=... | Auto |
DKIM record will be generated after Mailu is running - you'll add it later.
Step 2: Deploy Unbound DNS Resolver
Unbound provides DNSSEC validation required by Mailu for email authentication.
# On VPS - Deploy Unbound via Helm
helm upgrade --install unbound infrastructure/platform/networking/dns/unbound-helm \
-n bakery-ia \
--create-namespace \
-f infrastructure/platform/networking/dns/unbound-helm/values.yaml \
-f infrastructure/platform/networking/dns/unbound-helm/prod/values.yaml \
--timeout 5m \
--wait
# Verify Unbound is running
kubectl get pods -n bakery-ia | grep unbound
# Should show: unbound-xxx 1/1 Running
# Get Unbound service IP (needed for CoreDNS configuration)
UNBOUND_IP=$(kubectl get svc unbound-dns -n bakery-ia -o jsonpath='{.spec.clusterIP}')
echo "Unbound DNS IP: $UNBOUND_IP"
Step 3: Configure CoreDNS for DNSSEC
Mailu requires DNSSEC validation. Configure CoreDNS to forward external queries to Unbound:
# Get the Unbound service IP
UNBOUND_IP=$(kubectl get svc unbound-dns -n bakery-ia -o jsonpath='{.spec.clusterIP}')
# Patch CoreDNS to forward to Unbound
kubectl patch configmap coredns -n kube-system --type merge -p "{
\"data\": {
\"Corefile\": \".:53 {\\n errors\\n health {\\n lameduck 5s\\n }\\n ready\\n kubernetes cluster.local in-addr.arpa ip6.arpa {\\n pods insecure\\n fallthrough in-addr.arpa ip6.arpa\\n ttl 30\\n }\\n prometheus :9153\\n forward . $UNBOUND_IP {\\n max_concurrent 1000\\n }\\n cache 30 {\\n disable success cluster.local\\n disable denial cluster.local\\n }\\n loop\\n reload\\n loadbalance\\n}\\n\"
}
}"
# Restart CoreDNS to apply changes
kubectl rollout restart deployment coredns -n kube-system
kubectl rollout status deployment coredns -n kube-system --timeout=60s
# Verify DNSSEC is working
# Query a DNSSEC-signed zone: google.com is not signed, so it never returns the ad flag
kubectl run -it --rm debug --image=alpine --restart=Never -- \
sh -c "apk add drill && drill -D cloudflare.com"
# Should show: ;; flags: ... ad ... (ad = authenticated data = DNSSEC valid)
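The escaped JSON string in the patch above is easy to break with a single misplaced backslash. As an alternative sketch, the same Corefile can be rendered from a heredoc and applied as a ConfigMap. `render_corefile` is a hypothetical helper. The sketch also assumes the coredns ConfigMap holds only the `Corefile` key, so confirm that with `kubectl get configmap coredns -n kube-system -o yaml` before replacing it.

```shell
# Sketch: render the Corefile from the patch above, forwarding to the given IP.
render_corefile() {
  cat <<EOF
.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . $1 {
        max_concurrent 1000
    }
    cache 30 {
        disable success cluster.local
        disable denial cluster.local
    }
    loop
    reload
    loadbalance
}
EOF
}

# On the VPS (assumes UNBOUND_IP is set as in the step above):
# render_corefile "$UNBOUND_IP" > Corefile
# kubectl create configmap coredns -n kube-system --from-file=Corefile \
#   --dry-run=client -o yaml | kubectl apply -f -
```

Rendering to a file first also makes it possible to review the result before it reaches the cluster.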
Step 4: Create TLS Certificate Secret
Mailu Front pod requires a TLS certificate:
# Generate self-signed certificate for internal use
# (Let's Encrypt handles external TLS via Ingress)
TEMP_DIR=$(mktemp -d)
cd "$TEMP_DIR"
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout tls.key -out tls.crt \
-subj "/CN=mail.bakewise.ai/O=bakewise"
kubectl create secret tls mailu-certificates \
--cert=tls.crt \
--key=tls.key \
-n bakery-ia
rm -rf "$TEMP_DIR"
# Verify secret created
kubectl get secret mailu-certificates -n bakery-ia
Step 5: Deploy Mailu via Helm
# Add Mailu Helm repository
helm repo add mailu https://mailu.github.io/helm-charts
helm repo update mailu
# Deploy Mailu with production values
helm upgrade --install mailu mailu/mailu \
-n bakery-ia \
--create-namespace \
-f infrastructure/platform/mail/mailu-helm/values.yaml \
-f infrastructure/platform/mail/mailu-helm/prod/values.yaml \
--timeout 10m
# Wait for pods to be ready (may take 5-10 minutes for ClamAV)
kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=mailu -w
Step 6: Create Admin User
# Create initial admin user
kubectl exec -it -n bakery-ia deployment/mailu-admin -- \
flask mailu admin admin bakewise.ai 'YourSecurePassword123!'
# Credentials:
# Email: admin@bakewise.ai
# Password: YourSecurePassword123!
Step 7: Configure DKIM
After Mailu is running, get the DKIM key and add it to DNS:
# Get DKIM public key
kubectl exec -n bakery-ia deployment/mailu-admin -- \
cat /dkim/bakewise.ai.dkim.pub
# Add this as a TXT record in your DNS:
# Name: dkim._domainkey
# Value: (the key from above)
Step 8: Verify Email Setup
# Check all Mailu pods are running
kubectl get pods -n bakery-ia | grep mailu
# Expected: All 10 pods in Running state
# Test SMTP connectivity
kubectl run -it --rm smtp-test --image=alpine --restart=Never -- \
sh -c "apk add swaks && swaks --to test@example.com --from admin@bakewise.ai --server mailu-front.bakery-ia.svc.cluster.local:25"
# Access webmail (via port-forward for testing)
kubectl port-forward -n bakery-ia svc/mailu-front 8080:80
# Open: http://localhost:8080/webmail
Production Email Endpoints
| Service | URL/Address |
|---|---|
| Admin Panel | https://mail.bakewise.ai/admin |
| Webmail | https://mail.bakewise.ai/webmail |
| SMTP (STARTTLS) | mail.bakewise.ai:587 |
| SMTP (SSL) | mail.bakewise.ai:465 |
| IMAP (SSL) | mail.bakewise.ai:993 |
Troubleshooting Mailu
Issue: Admin pod CrashLoopBackOff with "DNSSEC validation" error
# Verify CoreDNS is forwarding to Unbound
kubectl get configmap coredns -n kube-system -o yaml | grep forward
# Should show: forward . <unbound-ip>
# If not, re-run Step 3 above
Issue: Front pod stuck in ContainerCreating
# Check for missing certificate secret
kubectl describe pod -n bakery-ia -l app.kubernetes.io/component=front | grep -A5 Events
# If missing mailu-certificates, re-run Step 4 above
Issue: Admin pod can't connect to Redis
# Verify externalRedis is disabled in values
helm get values mailu -n bakery-ia | grep -A5 externalRedis
# Should show: enabled: false
# If enabled: true, upgrade with correct values
helm upgrade mailu mailu/mailu -n bakery-ia \
-f infrastructure/platform/mail/mailu-helm/values.yaml \
-f infrastructure/platform/mail/mailu-helm/prod/values.yaml
WhatsApp Business API Setup
Features:
- ✅ First 1,000 conversations/month FREE
- ✅ Perfect for 10 tenants (~500 messages/month)
Setup Steps:
- Create Meta Business Account:
  1. Go to business.facebook.com
  2. Create Business Account
  3. Complete business verification
- Add WhatsApp Product:
  1. Go to developers.facebook.com
  2. Create New App → Business
  3. Add WhatsApp product
  4. Complete setup wizard
- Configure Phone Number:
  1. Test with your personal number initially
  2. Later: Get dedicated business number
  3. Verify phone number with SMS code
- Create Message Templates:
  1. Go to WhatsApp Manager
  2. Create templates for:
     - Low inventory alert
     - Expired product alert
     - Forecast summary
     - Order notification
  3. Submit for approval (15 min - 24 hours)
- Get API Credentials:
  Save these values:
  - Phone Number ID: (from WhatsApp Manager)
  - Access Token: (from App Dashboard)
  - Business Account ID: (from WhatsApp Manager)
  - Webhook Verify Token: (create your own secure string)
Kubernetes Deployment
Step 1: Prepare Container Images
Option A: Using Docker Hub (Recommended)
# On your local machine
docker login
# Build all images
docker-compose build
# Tag images for Docker Hub
# Replace YOUR_USERNAME with your Docker Hub username
export DOCKER_USERNAME="YOUR_USERNAME"
./scripts/tag-images.sh $DOCKER_USERNAME
# Push to Docker Hub
./scripts/push-images.sh $DOCKER_USERNAME
# Update prod kustomization with your username
# Edit: infrastructure/kubernetes/overlays/prod/kustomization.yaml
# Replace all "bakery/" with "$DOCKER_USERNAME/"
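The manual edit described in the last step can also be scripted. The sketch below is a literal version of "replace all `bakery/`": the function name `retarget_images` is hypothetical, and it assumes `bakery/` appears in the file only as an image prefix, so review the diff before committing.

```shell
# Hypothetical helper: replace every "bakery/" prefix in a file with "<user>/".
# Usage: retarget_images DOCKER_USERNAME path/to/kustomization.yaml
# A copy of the original file is kept next to it with a .bak suffix.
retarget_images() {
  user="$1"
  file="$2"
  sed -i.bak "s#bakery/#${user}/#g" "$file"
}
```

For example: `retarget_images "$DOCKER_USERNAME" infrastructure/kubernetes/overlays/prod/kustomization.yaml`.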
Option B: Using MicroK8s Registry
# On VPS
microk8s enable registry
# Get registry address (usually localhost:32000)
kubectl get service -n container-registry
# On local machine, configure insecure registry
# Edit /etc/docker/daemon.json:
{
"insecure-registries": ["YOUR_VPS_IP:32000"]
}
# Restart Docker
sudo systemctl restart docker
# Tag and push images
docker tag bakery/auth-service YOUR_VPS_IP:32000/bakery/auth-service
docker push YOUR_VPS_IP:32000/bakery/auth-service
# Repeat for all services...
Step 2: Update Production Configuration
⚠️ CRITICAL: The default configuration uses bakewise.ai domain. You MUST update this before deployment if using a different domain.
Required Configuration Updates
Step 2.1: Remove imagePullSecrets
# On your local machine
cd bakery-ia
# Remove imagePullSecrets from all deployment files
find infrastructure/kubernetes/base -name "*.yaml" -type f -exec sed -i.bak '/imagePullSecrets:/,+1d' {} \;
# Verify removal
grep -r "imagePullSecrets" infrastructure/kubernetes/base/
# Should return NO results
Step 2.2: Update Image Tags (Use Semantic Versions)
# Edit kustomization.yaml to replace 'latest' with actual version
nano infrastructure/kubernetes/overlays/prod/kustomization.yaml
# Find the images section (lines 163-196) and update:
# BEFORE:
# - name: bakery/auth-service
# newTag: latest
# AFTER:
# - name: bakery/auth-service
# newTag: v1.0.0
# Do this for ALL 22 services, or use this helper:
export VERSION="1.0.0" # Your version
# Create a script to update all image tags
cat > /tmp/update-tags.sh <<'EOF'
#!/bin/bash
VERSION="${1:-1.0.0}"
sed -i "s/newTag: latest/newTag: v${VERSION}/g" infrastructure/kubernetes/overlays/prod/kustomization.yaml
EOF
chmod +x /tmp/update-tags.sh
/tmp/update-tags.sh ${VERSION}
# Verify no 'latest' tags remain
grep "newTag:" infrastructure/kubernetes/overlays/prod/kustomization.yaml | grep -c "latest"
# Should return: 0
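Checking only for `latest` will not catch other non-versioned tags such as `dev` or an empty value. A slightly stricter check is sketched here; `check_image_tags` is a hypothetical helper, and it assumes tags follow the `vMAJOR.MINOR.PATCH` form that the script above writes.

```shell
# Hypothetical helper: succeed only if every newTag in the file looks like v1.2.3.
# Prints the offending lines to stderr and returns 1 otherwise.
check_image_tags() {
  file="$1"
  bad=$(grep -E '^[[:space:]]*newTag:' "$file" \
    | grep -vE 'newTag:[[:space:]]*"?v[0-9]+\.[0-9]+\.[0-9]+"?[[:space:]]*$')
  if [ -n "$bad" ]; then
    echo "Non-semantic image tags found:" >&2
    echo "$bad" >&2
    return 1
  fi
}
```

For example: `check_image_tags infrastructure/kubernetes/overlays/prod/kustomization.yaml && echo "tags OK"`.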
Step 2.3: Fix SigNoz Namespace References
# Update SigNoz patches to use bakery-ia namespace instead of signoz
sed -i 's/namespace: signoz/namespace: bakery-ia/g' infrastructure/kubernetes/overlays/prod/kustomization.yaml
# Verify changes (should show bakery-ia in all 3 patches)
grep -A 3 "name: signoz" infrastructure/kubernetes/overlays/prod/kustomization.yaml
Step 2.4: Update Cert-Manager Email
# Update Let's Encrypt notification email to your production email
sed -i "s/admin@bakery-ia.local/admin@bakewise.ai/g" \
infrastructure/kubernetes/base/components/cert-manager/cluster-issuer-production.yaml
Step 2.5: Verify Production Secrets (Already Configured) ✅
# Production secrets have been pre-configured with strong cryptographic passwords
# No manual action required - secrets are already set in secrets.yaml
# Verify the secrets are configured (optional)
echo "Verifying production secrets configuration..."
grep "JWT_SECRET_KEY" infrastructure/kubernetes/base/secrets.yaml | head -1
grep "AUTH_DB_PASSWORD" infrastructure/kubernetes/base/secrets.yaml | head -1
grep "REDIS_PASSWORD" infrastructure/kubernetes/base/secrets.yaml | head -1
echo "✅ All production secrets are configured and ready for deployment"
Production URLs:
- Main Application: https://bakewise.ai
- API Endpoints: https://bakewise.ai/api/v1/...
- SigNoz (Monitoring): https://monitoring.bakewise.ai/signoz
- AlertManager: https://monitoring.bakewise.ai/alertmanager
Configuration & Secrets
Production Secrets Status ✅
All core secrets have been pre-configured with strong cryptographic passwords:
- ✅ Database passwords (19 databases) - 24-character random strings
- ✅ JWT secrets - 256-bit cryptographically secure tokens
- ✅ Service API key - 64-character hexadecimal string
- ✅ Redis password - 24-character random string
- ✅ RabbitMQ password - 24-character random string
- ✅ RabbitMQ Erlang cookie - 64-character hexadecimal string
Step 1: Configure External Service Credentials (Email & WhatsApp)
You still need to update these external service credentials:
# Edit the secrets file
nano infrastructure/kubernetes/base/secrets.yaml
# Update ONLY these external service credentials:
# SMTP settings (from email setup):
SMTP_USER: <base64-encoded-username> # your email
SMTP_PASSWORD: <base64-encoded-password> # app password
# WhatsApp credentials (from WhatsApp setup - optional):
WHATSAPP_API_KEY: <base64-encoded-key>
# Payment processing (from Stripe setup):
STRIPE_SECRET_KEY: <base64-encoded-key>
STRIPE_WEBHOOK_SECRET: <base64-encoded-secret>
To base64 encode:
echo -n "your-value-here" | base64
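A stray trailing newline or a line-wrapped value is a common cause of credentials that look right but fail at runtime. The two helpers below are a minimal sketch for encoding a value and checking it by decoding it again; the names `b64` and `unb64` are hypothetical.

```shell
# Hypothetical helpers for preparing values for secrets.yaml.
# printf avoids adding a trailing newline; tr removes the line wrapping that
# some base64 implementations apply to long values.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Decode a value to confirm it round-trips before pasting it into the file.
unb64() {
  printf '%s' "$1" | base64 -d
}
```

For example, `b64 'your-value-here'` produces the string to paste, and `unb64` on that string should print the original value unchanged.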
CRITICAL: Never commit real secrets to git! The secrets.yaml file should be in .gitignore.
Step 2: CI/CD Secrets Configuration
For production CI/CD setup, additional secrets are required:
# Create Docker Hub credentials secret (for image pulls)
kubectl create secret docker-registry dockerhub-creds \
--docker-server=docker.io \
--docker-username=YOUR_DOCKERHUB_USERNAME \
--docker-password=YOUR_DOCKERHUB_TOKEN \
--docker-email=your-email@example.com \
-n bakery-ia
# Create Gitea registry credentials (if using Gitea for CI/CD)
kubectl create secret docker-registry gitea-registry-credentials \
-n tekton-pipelines \
--docker-server=gitea.bakery-ia.local:5000 \
--docker-username=your-username \
--docker-password=your-password
# Create Git credentials for Flux (if using GitOps)
kubectl create secret generic gitea-credentials \
-n flux-system \
--from-literal=username=your-username \
--from-literal=password=your-password
Step 3: Apply Application Secrets
# Copy manifests to VPS (from local machine)
scp -r infrastructure/kubernetes root@YOUR_VPS_IP:~/
# SSH to VPS
ssh root@YOUR_VPS_IP
# Apply application secrets
kubectl apply -f ~/infrastructure/kubernetes/base/secrets.yaml -n bakery-ia
# Verify secrets created
kubectl get secrets -n bakery-ia
# Should show multiple secrets including postgres-tls, redis-tls, app-secrets, etc.
Database Migrations
Step 0: Deploy CI/CD Infrastructure (Optional but Recommended)
For production environments, deploy CI/CD infrastructure components:
# Deploy Tekton Pipelines for CI/CD (optional but recommended for production)
kubectl create namespace tekton-pipelines
# Install Tekton Pipelines
kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/latest/release.yaml
# Install Tekton Triggers
kubectl apply -f https://storage.googleapis.com/tekton-releases/triggers/latest/release.yaml
# Apply Tekton configurations
kubectl apply -f ~/infrastructure/cicd/tekton/tasks/
kubectl apply -f ~/infrastructure/cicd/tekton/pipelines/
kubectl apply -f ~/infrastructure/cicd/tekton/triggers/
# Verify Tekton deployment
kubectl get pods -n tekton-pipelines
Step 1: Deploy SigNoz Monitoring (BEFORE Application)
⚠️ CRITICAL: SigNoz must be deployed BEFORE the application into the bakery-ia namespace because the production kustomization patches SigNoz resources.
# On VPS
# 1. Ensure bakery-ia namespace exists
kubectl get namespace bakery-ia || kubectl create namespace bakery-ia
# 2. Add Helm repo
helm repo add signoz https://charts.signoz.io
helm repo update
# 3. Install SigNoz into bakery-ia namespace (NOT separate signoz namespace)
helm install signoz signoz/signoz \
-n bakery-ia \
--set frontend.service.type=ClusterIP \
--set clickhouse.persistence.size=20Gi \
--set clickhouse.persistence.storageClass=microk8s-hostpath
# 4. Wait for SigNoz to be ready (this may take 10-15 minutes)
kubectl wait --for=condition=ready pod \
-l app.kubernetes.io/instance=signoz \
-n bakery-ia \
--timeout=900s
# 5. Verify SigNoz components running in bakery-ia namespace
kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=signoz
# Should show: signoz-0, signoz-otel-collector, signoz-clickhouse, signoz-zookeeper, signoz-alertmanager
# 6. Verify StatefulSets exist (kustomization will patch these)
kubectl get statefulset -n bakery-ia | grep signoz
# Should show: signoz, signoz-clickhouse
⚠️ Important: Do NOT create a separate signoz namespace. SigNoz must be in bakery-ia namespace for the overlays to work correctly.
Step 2: Deploy Application and Databases
# On VPS
kubectl apply -k ~/infrastructure/kubernetes/overlays/prod
# Wait for databases to be ready (5-10 minutes)
kubectl wait --for=condition=ready pod \
-l app.kubernetes.io/component=database \
-n bakery-ia \
--timeout=600s
# Check status
kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database
Step 3: Run Migrations
Migrations are automatically handled by init containers in each service. Verify they completed:
# Check migration job status
kubectl get jobs -n bakery-ia | grep migration
# All should show "COMPLETIONS = 1/1"
# Check logs if any failed
kubectl logs -n bakery-ia job/auth-migration
Step 4: Verify Database Schemas
# Connect to a database to verify
kubectl exec -n bakery-ia deployment/auth-db -it -- psql -U auth_user -d auth_db
# Inside psql:
\dt # List tables
\d users # Describe users table
\q # Quit
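Before moving on to verification, the migration check can be made less dependent on eyeballing table output. The sketch below assumes migrations are visible as Kubernetes Jobs with `migration` in their names, as the `kubectl get jobs` command above suggests; `incomplete_jobs` is a hypothetical helper that works on tab-separated name and succeeded-count pairs.

```shell
# Hypothetical helper: reads "JOB_NAME<TAB>SUCCEEDED_COUNT" lines on stdin and
# prints the names of jobs that have no recorded successful completion.
incomplete_jobs() {
  awk -F '\t' '$2 + 0 < 1 { print $1 }'
}

# Example (not run here): feed it from kubectl using the Job status field.
# kubectl get jobs -n bakery-ia \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.succeeded}{"\n"}{end}' \
#   | grep migration | incomplete_jobs
```

Empty output means every migration job reported success; any name printed is a job whose logs are worth checking.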
Verification & Testing
Step 1: Check All Pods Running
# View all pods
kubectl get pods -n bakery-ia
# Expected: All pods in "Running" state, none in CrashLoopBackOff
# Check for issues
kubectl get pods -n bakery-ia | grep -vE "Running|Completed"
# View logs for any problematic pods
kubectl logs -n bakery-ia POD_NAME
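The `grep -vE "Running|Completed"` filter above misses pods that are Running but not yet Ready (for example `1/2`). A minimal sketch that also compares the READY column follows; `unhealthy_pods` is a hypothetical helper that reads `kubectl get pods --no-headers` output on stdin.

```shell
# Hypothetical helper: prints NAME, READY and STATUS for pods that are neither
# Completed nor fully Ready and Running.
unhealthy_pods() {
  awk '{
    split($2, r, "/")
    if ($3 == "Completed") next
    if ($3 != "Running" || r[1] != r[2]) print $1, $2, $3
  }'
}
```

For example: `kubectl get pods -n bakery-ia --no-headers | unhealthy_pods`. Empty output means every pod is healthy.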
Step 2: Check Services and Ingress
# View services
kubectl get svc -n bakery-ia
# View ingress
kubectl get ingress -n bakery-ia
# View certificates (should auto-issue from Let's Encrypt)
kubectl get certificate -n bakery-ia
# Describe certificate to check status
kubectl describe certificate bakery-ia-prod-tls-cert -n bakery-ia
Step 3: Test Database Connections
# Test PostgreSQL TLS
kubectl exec -n bakery-ia deployment/auth-db -- sh -c \
'psql -U auth_user -d auth_db -c "SHOW ssl;"'
# Expected output: on
# Test Redis TLS ($REDIS_PASSWORD is expanded by the shell where you run kubectl, so export it there first)
kubectl exec -n bakery-ia deployment/redis -- redis-cli \
--tls \
--cert /tls/redis-cert.pem \
--key /tls/redis-key.pem \
--cacert /tls/ca-cert.pem \
-a $REDIS_PASSWORD \
ping
# Expected output: PONG
Step 4: Test Frontend Access
# Test frontend (replace with your domain)
curl -I https://bakery.yourdomain.com
# Expected: HTTP/2 200 OK
# Test API health
curl https://api.yourdomain.com/health
# Expected: {"status": "healthy"}
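Right after deployment the first request may fail while certificates are still being issued or pods are warming up. A small retry wrapper, sketched below, avoids re-running the checks by hand; the function name `retry` and the attempt and delay values in the example are assumptions, not part of the project.

```shell
# Hypothetical helper: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs the command until it succeeds or the attempt limit is reached.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "Giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example (not run here): curl -f turns HTTP error codes into a non-zero exit.
# retry 10 30 curl -fsS -o /dev/null https://bakery.yourdomain.com
```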
Step 5: Test Authentication
# Create a test user (using your frontend or API)
curl -X POST https://api.yourdomain.com/api/v1/auth/register \
-H "Content-Type: application/json" \
-d '{
"email": "test@yourdomain.com",
"password": "TestPassword123!",
"name": "Test User"
}'
# Login
curl -X POST https://api.yourdomain.com/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{
"email": "test@yourdomain.com",
"password": "TestPassword123!"
}'
# Expected: JWT token in response
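To confirm that the login response contains a well-formed token, you can decode its payload locally. The sketch below assumes a standard three-segment JWT (header.payload.signature); `jwt_payload` is a hypothetical helper, and it only decodes the payload without verifying the signature.

```shell
# Hypothetical helper: print the decoded JSON payload of a JWT.
jwt_payload() {
  # The second dot-separated segment is the base64url-encoded payload
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url encoding strips
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}
```

Pass it the token string copied from the login response; the output should be JSON with the claims your auth service issues.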
Step 6: Test Email Delivery
# Trigger a password reset to test email
curl -X POST https://api.yourdomain.com/api/v1/auth/forgot-password \
-H "Content-Type: application/json" \
-d '{"email": "test@yourdomain.com"}'
# Check your email inbox for the reset link
# Check service logs if email not received:
kubectl logs -n bakery-ia deployment/auth-service | grep -i "email\|smtp"
Step 7: Test WhatsApp (Optional)
# Send a test WhatsApp message
# This requires creating a tenant and configuring WhatsApp in the UI
# Or test via API once authenticated
Post-Deployment
Step 1: Access SigNoz Monitoring Stack
Your production deployment includes SigNoz, a unified observability platform that provides complete visibility into your application:
What is SigNoz?
SigNoz is an open-source, all-in-one observability platform that provides:
- 📊 Distributed Tracing - See end-to-end request flows across all 18 microservices
- 📈 Metrics Monitoring - Application performance and infrastructure metrics
- 📝 Log Management - Centralized logs from all services with trace correlation
- 🔍 Service Performance Monitoring (SPM) - Automatic RED metrics (Rate, Error, Duration)
- 🗄️ Database Monitoring - All 18 PostgreSQL databases + Redis + RabbitMQ
- ☸️ Kubernetes Monitoring - Cluster, node, pod, and container metrics
Why SigNoz instead of Prometheus/Grafana?
- Single unified UI for traces, metrics, and logs (no context switching)
- Automatic service dependency mapping
- Built-in APM (Application Performance Monitoring)
- Log-trace correlation with one click
- Better query performance with ClickHouse backend
- Modern UI designed for microservices
Production Monitoring URLs
Access via domain:
https://monitoring.bakewise.ai/signoz # SigNoz - Main observability UI
https://monitoring.bakewise.ai/alertmanager # AlertManager - Alert management
Or via port forwarding (if needed):
# SigNoz Frontend (Main UI)
kubectl port-forward -n bakery-ia svc/signoz 8080:8080 &
# Open: http://localhost:8080
# SigNoz AlertManager
kubectl port-forward -n bakery-ia svc/signoz-alertmanager 9093:9093 &
# Open: http://localhost:9093
# OTel Collector (for debugging)
kubectl port-forward -n bakery-ia svc/signoz-otel-collector 4317:4317 & # gRPC
kubectl port-forward -n bakery-ia svc/signoz-otel-collector 4318:4318 & # HTTP
Key SigNoz Features to Explore
Once you open SigNoz (https://monitoring.bakewise.ai/signoz), explore these tabs:
1. Services Tab - Application Performance
- View all 18 microservices with live metrics
- See request rate, error rate, and latency (P50/P90/P99)
- Click on any service to drill down into operations
- Identify slow endpoints and error-prone operations
2. Traces Tab - Request Flow Visualization
- See complete request journeys across services
- Identify bottlenecks (slow database queries, API calls)
- Debug errors with full stack traces
- Correlate with logs for complete context
3. Dashboards Tab - Infrastructure & Database Metrics
- PostgreSQL - Monitor all 18 databases (connections, queries, cache hit ratio)
- Redis - Cache performance (memory, hit rate, commands/sec)
- RabbitMQ - Message queue health (depth, rates, consumers)
- Kubernetes - Cluster metrics (nodes, pods, containers)
4. Logs Tab - Centralized Log Management
- Search and filter logs from all services
- Click on trace ID in logs to see related request trace
- Auto-enriched with Kubernetes metadata (pod, namespace, container)
- Identify patterns and anomalies
5. Alerts Tab - Proactive Monitoring
- Configure alerts on metrics, traces, or logs
- Email/Slack/Webhook notifications
- View firing alerts and alert history
Quick Health Check
# Verify SigNoz components are running
kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=signoz
# Expected output:
# signoz-0 READY 1/1
# signoz-otel-collector-xxx READY 1/1
# signoz-alertmanager-xxx READY 1/1
# signoz-clickhouse-xxx READY 1/1
# signoz-zookeeper-xxx READY 1/1
# Check OTel Collector health
kubectl exec -n bakery-ia deployment/signoz-otel-collector -- wget -qO- http://localhost:13133
# View recent telemetry in OTel Collector logs
kubectl logs -n bakery-ia deployment/signoz-otel-collector --tail=50 | grep -i "traces\|metrics\|logs"
Verify Telemetry is Working
1. Check Services are Reporting:
   # Open SigNoz and navigate to Services tab
   # You should see all 18 microservices listed
   # If services are missing, check if they're sending telemetry:
   kubectl logs -n bakery-ia deployment/auth-service | grep -i "telemetry\|otel"
2. Check Database Metrics:
   # Navigate to Dashboards → PostgreSQL in SigNoz
   # You should see metrics from all 18 databases
   # Verify OTel Collector is scraping databases:
   kubectl logs -n bakery-ia deployment/signoz-otel-collector | grep postgresql
3. Check Traces are Being Collected:
   # Make a test API request
   curl https://bakewise.ai/api/v1/health
   # Navigate to Traces tab in SigNoz
   # Search for "gateway" service
   # You should see the trace for your request
4. Check Logs are Being Collected:
   # Navigate to Logs tab in SigNoz
   # Filter by namespace: bakery-ia
   # You should see logs from all pods
   # Verify filelog receiver is working:
   kubectl logs -n bakery-ia deployment/signoz-otel-collector | grep filelog
Step 2: Configure CI/CD Infrastructure (Optional but Recommended)
If you deployed the CI/CD infrastructure, configure it for your workflow:
Gitea Setup (Git Server + Registry)
# Access Gitea at: http://gitea.bakery-ia.local (for dev) or http://gitea.bakewise.ai (for prod)
# Make sure to add the appropriate hostname to /etc/hosts or configure DNS
# Create your repositories for each service
# Configure webhook to trigger Tekton pipelines
Tekton Pipeline Configuration
# Verify Tekton pipelines are running
kubectl get pods -n tekton-pipelines
# Create a PipelineRun manually to test:
kubectl create -f - <<EOF
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: manual-ci-run
  namespace: tekton-pipelines
spec:
  pipelineRef:
    name: bakery-ia-ci
  workspaces:
    - name: shared-workspace
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi
    - name: docker-credentials
      secret:
        secretName: gitea-registry-credentials
  params:
    - name: git-url
      value: "http://gitea.bakery-ia.local/bakery-admin/bakery-ia.git"
    - name: git-revision
      value: "main"
EOF
Flux CD Configuration (GitOps)
# Verify Flux is running
kubectl get pods -n flux-system
# Set up GitRepository and Kustomization resources for GitOps deployment
# Example:
cat <<EOF | kubectl apply -f -
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: bakery-ia
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/your-org/bakery-ia.git
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: bakery-ia
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: bakery-ia
  path: ./infrastructure/environments/prod/k8s-manifests
  prune: true
  # Note: the old "validation" field is not part of the v1 Kustomization API, so it is omitted here
EOF
Step 2: Configure Alerting
SigNoz includes integrated alerting with AlertManager. Configure it for your team:
Update Email Notification Settings
The alerting configuration is in the SigNoz Helm values. To update:
# For production, edit the values file:
nano infrastructure/helm/signoz-values-prod.yaml
# Update the alertmanager.config section:
# 1. Update SMTP settings:
# - smtp_from: 'your-alerts@bakewise.ai'
# - smtp_auth_username: 'your-alerts@bakewise.ai'
# - smtp_auth_password: (use Kubernetes secret)
#
# 2. Update receivers:
# - critical-alerts email: critical-alerts@bakewise.ai
# - warning-alerts email: oncall@bakewise.ai
#
# 3. (Optional) Add Slack webhook for critical alerts
# Apply the updated configuration:
helm upgrade signoz signoz/signoz \
-n bakery-ia \
-f infrastructure/helm/signoz-values-prod.yaml
Create Alerts in SigNoz UI
1. Open SigNoz Alerts Tab: https://monitoring.bakewise.ai/signoz → Alerts
2. Create Common Alerts:
Alert 1: High Error Rate
- Name: HighErrorRate
- Query: error_rate > 5 for 5 minutes
- Severity: critical
- Description: "Service {{service_name}} has error rate >5%"
Alert 2: High Latency
- Name: HighLatency
- Query: P99_latency > 3000ms for 5 minutes
- Severity: warning
- Description: "Service {{service_name}} P99 latency >3s"
Alert 3: Service Down
- Name: ServiceDown
- Query: request_rate == 0 for 2 minutes
- Severity: critical
- Description: "Service {{service_name}} not receiving requests"
Alert 4: Database Connection Issues
- Name: DatabaseConnectionsHigh
- Query: pg_active_connections > 80 for 5 minutes
- Severity: warning
- Description: "Database {{database}} connection count >80%"
Alert 5: High Memory Usage
- Name: HighMemoryUsage
- Query: container_memory_percent > 85 for 5 minutes
- Severity: warning
- Description: "Pod {{pod_name}} using >85% memory"
Test Alert Delivery
# Method 1: Create a test alert in SigNoz UI
# Go to Alerts → New Alert → Set a test condition that will fire
# Method 2: Fire a test alert via stress test
kubectl run memory-test --image=polinux/stress --restart=Never \
--namespace=bakery-ia -- stress --vm 1 --vm-bytes 600M --timeout 300s
# Check alert appears in SigNoz Alerts tab
# https://monitoring.bakewise.ai/signoz → Alerts
# Also check AlertManager
# https://monitoring.bakewise.ai/alertmanager
# Verify email notification received
# Clean up test
kubectl delete pod memory-test -n bakery-ia
Configure Notification Channels
In SigNoz Alerts tab, configure channels:
1. Email Channel:
   - Already configured via AlertManager
   - Emails sent to addresses in signoz-values-prod.yaml
2. Slack Channel (Optional):
   # Add Slack webhook URL to signoz-values-prod.yaml
   # Under alertmanager.config.receivers.critical-alerts.slack_configs:
   # - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
   #   channel: '#alerts-critical'
3. Webhook Channel (Optional):
   - Configure custom webhook for integration with PagerDuty, OpsGenie, etc.
   - Add to alertmanager.config.receivers
Step 3: Configure Backups
# Create backup script on VPS
cat > ~/backup-databases.sh <<'EOF'
#!/bin/bash
BACKUP_DIR="/backups/$(date +%Y-%m-%d)"
mkdir -p "$BACKUP_DIR"
# Get all database pods (returned in the form pod/NAME)
DBS=$(kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database -o name)
for db in $DBS; do
  DB_NAME=$(echo "$db" | cut -d'/' -f2)
  echo "Backing up $DB_NAME..."
  # NOTE: assumes every database pod has a "postgres" role; without -d this dumps that
  # role's default database. Adjust -U/-d (e.g. -U auth_user -d auth_db) to your setup.
  kubectl exec -n bakery-ia "$db" -- pg_dump -U postgres > "$BACKUP_DIR/${DB_NAME}.sql"
done
# Compress backups
tar -czf "$BACKUP_DIR.tar.gz" "$BACKUP_DIR"
rm -rf "$BACKUP_DIR"
# Keep only last 7 days
find /backups -name "*.tar.gz" -mtime +7 -delete
echo "Backup completed: $BACKUP_DIR.tar.gz"
EOF
chmod +x ~/backup-databases.sh
# Test backup
~/backup-databases.sh
# Setup daily cron job (2 AM)
(crontab -l 2>/dev/null; echo "0 2 * * * ~/backup-databases.sh") | crontab -
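Because the script redirects `pg_dump` output into a file, a failed dump (wrong role, pod not ready) still leaves an empty `.sql` file and an archive that looks fine. The sketch below checks an archive after the fact; `verify_backup` is a hypothetical helper, and it assumes the archive layout produced by the script above (plain `.sql` files inside a dated directory).

```shell
# Hypothetical helper: verify_backup ARCHIVE.tar.gz
# Fails if the archive cannot be extracted, holds no .sql files,
# or holds any .sql file that is empty.
verify_backup() {
  archive="$1"
  tmp=$(mktemp -d) || return 1
  if ! tar -xzf "$archive" -C "$tmp" 2>/dev/null; then
    echo "Cannot extract $archive" >&2
    rm -rf "$tmp"
    return 1
  fi
  total=$(find "$tmp" -type f -name '*.sql' | wc -l)
  empty=$(find "$tmp" -type f -name '*.sql' -size 0 | wc -l)
  rm -rf "$tmp"
  [ "$total" -gt 0 ] && [ "$empty" -eq 0 ]
}
```

For example: `verify_backup /backups/2026-01-20.tar.gz || echo "backup needs attention"`. This only detects empty dumps; a periodic test restore is still the real proof that backups work.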
Step 3: Setup Alerting
# Update AlertManager configuration with your email
kubectl edit configmap -n monitoring alertmanager-config
# Update recipient emails in the routes section
Step 4: Verify SigNoz Monitoring is Working
Before proceeding, ensure all monitoring components are operational:
# 1. Verify SigNoz pods are running
kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=signoz
# Expected pods (all should be Running/Ready):
# - signoz-0 (or signoz-1, signoz-2 for HA)
# - signoz-otel-collector-xxx
# - signoz-alertmanager-xxx
# - signoz-clickhouse-xxx
# - signoz-zookeeper-xxx
# 2. Check SigNoz UI is accessible
curl -I https://monitoring.bakewise.ai/signoz
# Should return: HTTP/2 200 OK
# 3. Verify OTel Collector is receiving data
kubectl logs -n bakery-ia deployment/signoz-otel-collector --tail=100 | grep -i "received"
# Should show: "Traces received: X" "Metrics received: Y" "Logs received: Z"
# 4. Check ClickHouse database is healthy
kubectl exec -n bakery-ia deployment/signoz-clickhouse -- clickhouse-client --query="SELECT count() FROM system.tables WHERE database LIKE 'signoz_%'"
# Should return a number > 0 (tables exist)
Complete Verification Checklist:
- SigNoz UI loads at https://monitoring.bakewise.ai/signoz
- Services tab shows all 18 microservices with metrics
- Traces tab has sample traces from gateway and other services
- Dashboards tab shows PostgreSQL metrics from all 18 databases
- Dashboards tab shows Redis metrics (memory, commands, etc.)
- Dashboards tab shows RabbitMQ metrics (queues, messages)
- Dashboards tab shows Kubernetes metrics (nodes, pods)
- Logs tab displays logs from all services in bakery-ia namespace
- Alerts tab is accessible and can create new alerts
- AlertManager is reachable at https://monitoring.bakewise.ai/alertmanager
If any checks fail, troubleshoot:
# Check OTel Collector configuration
kubectl describe configmap -n bakery-ia signoz-otel-collector
# Check for errors in OTel Collector
kubectl logs -n bakery-ia deployment/signoz-otel-collector | grep -i error
# Check ClickHouse is accepting writes
kubectl logs -n bakery-ia deployment/signoz-clickhouse | grep -i error
# Restart OTel Collector if needed
kubectl rollout restart deployment/signoz-otel-collector -n bakery-ia
Step 5: Document Everything
Create a secure runbook with all credentials and procedures:
Essential Information to Document:
- VPS login credentials (stored securely in password manager)
- Database passwords (in password manager)
- SigNoz admin password
- Domain registrar access (for bakewise.ai)
- Cloudflare access
- Email service credentials (SMTP)
- WhatsApp API credentials
- Docker Hub / Registry credentials
- Emergency contact information
- Rollback procedures
- Monitoring URLs and access procedures
Step 6: Train Your Team
Conduct a training session covering SigNoz and operational procedures:
Part 1: SigNoz Navigation (30 minutes)
1. Login and Overview
   - Show how to access https://monitoring.bakewise.ai/signoz
   - Navigate through main tabs: Services, Traces, Dashboards, Logs, Alerts
   - Explain the unified nature of SigNoz (all-in-one platform)
2. Services Tab - Application Performance Monitoring
   - Show all 18 microservices
   - Explain RED metrics (Request rate, Error rate, Duration/latency)
   - Demo: Click on a service → Operations → See endpoint breakdown
   - Demo: Identify slow endpoints and high error rates
3. Traces Tab - Request Flow Debugging
   - Show how to search for traces by service, operation, or time
   - Demo: Click on a trace → See full waterfall (service → database → cache)
   - Demo: Find slow database queries in trace spans
   - Demo: Click "View Logs" to correlate trace with logs
4. Dashboards Tab - Infrastructure Monitoring
   - Navigate to PostgreSQL dashboard → Show all 18 databases
   - Navigate to Redis dashboard → Show cache metrics
   - Navigate to Kubernetes dashboard → Show node/pod metrics
   - Explain what metrics indicate issues (connection %, memory %, etc.)
5. Logs Tab - Log Search and Analysis
   - Show how to filter by service, severity, time range
   - Demo: Search for "error" in last hour
   - Demo: Click on trace_id in log → Jump to related trace
   - Show Kubernetes metadata (pod, namespace, container)
6. Alerts Tab - Proactive Monitoring
   - Show how to create alerts on metrics
   - Review pre-configured alerts
   - Show alert history and firing alerts
   - Explain how to acknowledge/silence alerts
Part 2: Operational Tasks (30 minutes)
1. Check application logs (multiple ways)
   # Method 1: Via kubectl (for immediate debugging)
   kubectl logs -n bakery-ia deployment/orders-service --tail=100 -f
   # Method 2: Via SigNoz Logs tab (for analysis and correlation)
   # 1. Open https://monitoring.bakewise.ai/signoz → Logs
   # 2. Filter by k8s_deployment_name: orders-service
   # 3. Click on trace_id to see related request flow
2. Restart services when needed
   # Restart a service (rolling update, no downtime)
   kubectl rollout restart deployment/orders-service -n bakery-ia
   # Verify restart in SigNoz:
   # 1. Check Services tab → orders-service → Should show brief dip then recovery
   # 2. Check Logs tab → Filter by orders-service → See restart logs
3. Investigate performance issues
   # Scenario: "Orders API is slow"
   # 1. SigNoz → Services → orders-service → Check P99 latency
   # 2. SigNoz → Traces → Filter service:orders-service, duration:>1s
   # 3. Click on slow trace → Identify bottleneck (DB query? External API?)
   # 4. SigNoz → Dashboards → PostgreSQL → Check orders_db connections/queries
   # 5. Fix identified issue (add index, optimize query, scale service)
4. Respond to alerts
- Show how to access alerts in SigNoz → Alerts tab
- Show AlertManager UI at https://monitoring.bakewise.ai/alertmanager
- Review common alerts and their resolution steps
- Reference the Production Operations Guide
Part 3: Documentation and Resources (10 minutes)
1. Share documentation
   - PILOT_LAUNCH_GUIDE.md - This guide (deployment)
   - PRODUCTION_OPERATIONS_GUIDE.md - Daily operations with SigNoz
   - security-checklist.md - Security procedures
2. Bookmark key URLs
   - SigNoz: https://monitoring.bakewise.ai/signoz
   - AlertManager: https://monitoring.bakewise.ai/alertmanager
   - Production app: https://bakewise.ai
3. Setup on-call rotation (if applicable)
   - Configure rotation schedule in AlertManager
   - Document escalation procedures
   - Test alert delivery to on-call phone/email
Part 4: Hands-On Exercise (15 minutes)
Exercise: Investigate a Simulated Issue
- Create a load test to generate traffic
- Use SigNoz to find the slowest endpoint
- Identify the root cause using traces
- Correlate with logs to confirm
- Check infrastructure metrics (DB, memory, CPU)
- Propose a fix based on findings
This trains the team to use SigNoz effectively for real incidents.
Troubleshooting
Issue: Pods Not Starting
# Check pod status
kubectl describe pod POD_NAME -n bakery-ia
# Common causes:
# 1. Image pull errors
kubectl get events -n bakery-ia | grep -i "pull"
# 2. Resource limits
kubectl describe node
# 3. Volume mount issues
kubectl get pvc -n bakery-ia
Issue: Certificate Not Issuing
# Check certificate status
kubectl describe certificate bakery-ia-prod-tls-cert -n bakery-ia
# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager
# Check challenges
kubectl get challenges -n bakery-ia
# Verify DNS is correct
nslookup bakery.yourdomain.com
Issue: Database Connection Errors
# Check database pod
kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database
# Check database logs
kubectl logs -n bakery-ia deployment/auth-db
# Test connection from service pod
kubectl exec -n bakery-ia deployment/auth-service -- nc -zv auth-db 5432
Issue: Services Can't Connect to Databases
# Check if SSL is enabled
kubectl exec -n bakery-ia deployment/auth-db -- sh -c \
'psql -U auth_user -d auth_db -c "SHOW ssl;"'
# Check service logs for SSL errors
kubectl logs -n bakery-ia deployment/auth-service | grep -i "ssl\|tls"
# Restart service to pick up new SSL config
kubectl rollout restart deployment/auth-service -n bakery-ia
Issue: Out of Resources
# Check node resources
kubectl top nodes
# Check pod resource usage
kubectl top pods -n bakery-ia
# Identify resource hogs
kubectl top pods -n bakery-ia --sort-by=memory
# Scale down non-critical services temporarily
kubectl scale deployment monitoring -n bakery-ia --replicas=0
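When deciding what to scale down, it helps to list only the pods above a memory threshold instead of reading the full `kubectl top` table. The sketch below is one way to do that; `pods_over_memory` is a hypothetical helper that reads `kubectl top pods --no-headers` output on stdin and takes the threshold in Mi.

```shell
# Hypothetical helper: pods_over_memory THRESHOLD_MI
# Prints NAME and MEMORY for pods using at least THRESHOLD_MI mebibytes.
pods_over_memory() {
  awk -v limit="$1" '{
    mem = $3
    if (mem ~ /Gi$/)      { sub(/Gi$/, "", mem); mem = mem * 1024 }
    else if (mem ~ /Mi$/) { sub(/Mi$/, "", mem) }
    else if (mem ~ /Ki$/) { sub(/Ki$/, "", mem); mem = mem / 1024 }
    else next
    if (mem + 0 >= limit + 0) print $1, $3
  }'
}
```

For example: `kubectl top pods -n bakery-ia --no-headers | pods_over_memory 1024` lists pods using 1 GiB or more.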
Next Steps After Successful Launch
1. Monitor for 48 Hours
   - Check dashboards daily
   - Review error logs
   - Monitor resource usage
   - Test all functionality
2. Optimize Based on Metrics
   - Adjust resource limits if needed
   - Fine-tune autoscaling thresholds
   - Optimize database queries if slow
3. Onboard First Tenant
   - Create test tenant
   - Upload sample data
   - Test all features
   - Gather feedback
4. Scale Gradually
   - Add 1-2 tenants at a time
   - Monitor resource usage
   - Upgrade VPS if needed (see scaling guide)
5. Plan for Growth
   - Review PRODUCTION_OPERATIONS_GUIDE.md
   - Implement additional monitoring
   - Plan capacity upgrades
   - Consider managed services for scale
Cost Scaling Path
| Tenants | RAM | CPU | Storage | Monthly Cost |
|---|---|---|---|---|
| 10 | 20 GB | 8 cores | 200 GB | €40-80 |
| 25 | 32 GB | 12 cores | 300 GB | €80-120 |
| 50 | 48 GB | 16 cores | 500 GB | €150-200 |
| 100+ | Consider multi-node cluster or managed K8s | | | €300+ |
Support Resources
Documentation:
- Operations Guide: PRODUCTION_OPERATIONS_GUIDE.md - Daily operations, monitoring, incident response
- Security Guide: security-checklist.md - Security procedures and compliance
- Database Security: database-security.md - Database operations and TLS configuration
- TLS Configuration: tls-configuration.md - Certificate management
- RBAC Implementation: rbac-implementation.md - Access control
Monitoring Access:
- SigNoz (Primary): https://monitoring.bakewise.ai/signoz - All-in-one observability
- Services: Application performance monitoring (APM)
- Traces: Distributed tracing across all services
- Dashboards: PostgreSQL, Redis, RabbitMQ, Kubernetes metrics
- Logs: Centralized log management with trace correlation
- Alerts: Alert configuration and management
- AlertManager: https://monitoring.bakewise.ai/alertmanager - Alert routing and notifications
External Resources:
- MicroK8s Docs: https://microk8s.io/docs
- Kubernetes Docs: https://kubernetes.io/docs
- Let's Encrypt: https://letsencrypt.org/docs
- Cloudflare DNS: https://developers.cloudflare.com/dns
- SigNoz Documentation: https://signoz.io/docs/
- OpenTelemetry Documentation: https://opentelemetry.io/docs/
Monitoring Architecture:
- OpenTelemetry: Industry-standard instrumentation framework
  - Auto-instruments FastAPI, HTTPX, SQLAlchemy, Redis
  - Collects traces, metrics, and logs from all services
  - Exports to SigNoz via OTLP protocol (gRPC port 4317, HTTP port 4318)
- SigNoz Components:
  - Frontend: Web UI for visualization and analysis
  - OTel Collector: Receives and processes telemetry data
  - ClickHouse: Time-series database for fast queries
  - AlertManager: Alert routing and notification delivery
  - Zookeeper: Coordination service for ClickHouse cluster
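As a rough illustration of the OTLP export path described above, the sketch below derives a collector endpoint from the chosen protocol and exports the standard OpenTelemetry environment variables. The collector hostname is a hypothetical placeholder, not taken from this guide; substitute the Service name from your own SigNoz installation.

```shell
#!/usr/bin/env bash
# Sketch only: the default hostname below is an ASSUMPTION for illustration.
# Replace it with the OTel Collector Service name from your SigNoz install.
OTEL_COLLECTOR_HOST="${OTEL_COLLECTOR_HOST:-signoz-otel-collector.bakery-ia.svc.cluster.local}"

# Print the OTLP endpoint for a protocol: gRPC uses port 4317, HTTP uses 4318.
otlp_endpoint() {
  case "$1" in
    grpc)          echo "http://${OTEL_COLLECTOR_HOST}:4317" ;;
    http/protobuf) echo "http://${OTEL_COLLECTOR_HOST}:4318" ;;
    *)             echo "unknown OTLP protocol: $1" >&2; return 1 ;;
  esac
}

# Standard OpenTelemetry SDK environment variables.
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
OTEL_EXPORTER_OTLP_ENDPOINT="$(otlp_endpoint "$OTEL_EXPORTER_OTLP_PROTOCOL")"
export OTEL_EXPORTER_OTLP_ENDPOINT
```

In a Kubernetes deployment these variables would normally be set in the pod spec or a ConfigMap rather than in a shell; the function only makes the protocol-to-port mapping explicit.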
Summary Checklist
Pre-Deployment Configuration (LOCAL MACHINE)
- Production secrets configured - ✅ JWT, database passwords, API keys (ALREADY DONE)
- External service credentials - Update SMTP, WhatsApp, Stripe in secrets.yaml
- imagePullSecrets removed - Delete from all 67 manifests
- Image tags updated - Change all 'latest' to v1.0.0 (semantic version)
- SigNoz namespace fixed - ✅ Already done (bakery-ia namespace)
- Cert-manager email updated - ✅ Already set to admin@bakewise.ai
- Stripe publishable key updated - Replace `pk_test_...` with production key in configmap.yaml
- Pilot mode verified - ✅ VITE_PILOT_MODE_ENABLED=true (default is correct)
- Manifests validated - No 'latest' tags, no imagePullSecrets remaining
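The "Manifests validated" item can be checked mechanically. The following is a minimal sketch, assuming the manifests are plain `.yaml`/`.yml` files under a single directory; the function name and the directory you pass to it are illustrative, not part of the project.

```shell
#!/usr/bin/env bash
# Sketch: scan a manifest tree for leftovers the checklist says must be gone.
# Prints offending lines; returns 1 if any manifest still uses a 'latest'
# image tag or declares imagePullSecrets, and 0 when the tree is clean.
check_manifests() {
  local dir="$1"
  local status=0

  # Match "image: ...:latest" where 'latest' is the whole tag
  # (so a tag such as ":latest-1" is not flagged).
  if grep -rnE --include='*.yaml' --include='*.yml' \
       'image:.*:latest([^[:alnum:]._-]|$)' "$dir"; then
    echo "ERROR: 'latest' image tags found" >&2
    status=1
  fi

  if grep -rn --include='*.yaml' --include='*.yml' 'imagePullSecrets' "$dir"; then
    echo "ERROR: imagePullSecrets still present" >&2
    status=1
  fi

  return "$status"
}
```

Run it as `check_manifests <path-to-manifests>` before deploying; a non-zero exit status means at least one of the two checks failed. Images that omit a tag entirely (which also resolve to `latest`) are not caught by this sketch.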
Infrastructure Setup
- VPS provisioned and accessible
- MicroK8s (or another Kubernetes distribution) installed and configured
- nginx-ingress-controller installed
- metrics-server installed and working
- cert-manager installed
- local-path-provisioner installed
- Domain registered and DNS configured
- Cloudflare protection enabled (optional but recommended)
Secrets and Configuration
- TLS certificates generated (postgres, redis)
- Email service configured and tested
- WhatsApp API setup (optional for launch)
- Container images built and pushed with version tags
- Production configs verified (domains, CORS, storage class)
- Strong passwords generated for all services
- Docker registry secret created (dockerhub-creds)
- Application secrets applied
Monitoring
- SigNoz deployed via Helm
- SigNoz pods running and healthy
- signoz namespace created
Application Deployment
- All pods running successfully
- Databases accepting TLS connections
- Let's Encrypt certificates issued
- Frontend accessible via HTTPS
- API health check passing
- Test user can login
- Email delivery working
- SigNoz monitoring accessible
- Metrics flowing to SigNoz
- Pilot coupon verified - Check tenant-service logs for "Pilot coupon created successfully"
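Two of the checks above lend themselves to small helpers. This is a sketch under assumptions: the application runs in the `bakery-ia` namespace, and the tenant service is exposed as a Deployment named `tenant-service` (a hypothetical name; adjust to your manifests). Both functions read from stdin, so they can be tried without a cluster.

```shell
#!/usr/bin/env bash
# Reads `kubectl get pods --no-headers` output on stdin and prints every pod
# whose STATUS column (3rd field) is neither Running nor Completed.
# Returns 1 if any such pod exists. Note: a pod that is Running but not yet
# Ready (e.g. 0/1) is NOT flagged by this sketch.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 " (" $3 ")"; bad = 1 }
       END { exit bad + 0 }'
}

# Reads service logs on stdin; succeeds if the pilot coupon message appears.
pilot_coupon_created() {
  grep -q 'Pilot coupon created successfully'
}

# Example usage against a live cluster (names are assumptions):
#   microk8s kubectl get pods -n bakery-ia --no-headers | unhealthy_pods
#   microk8s kubectl logs -n bakery-ia deployment/tenant-service | pilot_coupon_created
```

Both helpers communicate through their exit status, so they can be chained with `&&` in a larger verification script.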
Post-Deployment
- Backups configured and tested
- Team trained on operations
- Documentation complete
- Emergency procedures documented
- Monitoring alerts configured
🎉 Congratulations! Your Bakery-IA platform is now live in production!
Estimated total time: 2-4 hours for first deployment
Subsequent updates: 15-30 minutes
Document Version: 2.1
Last Updated: 2026-01-20
Maintained By: DevOps Team
Changes in v2.1:
- Updated DNS configuration for Namecheap (primary) with Cloudflare as optional
- Clarified MicroK8s ingress class is `public` (not `nginx`)
- Updated Let's Encrypt ClusterIssuer documentation to reference pre-configured files
- Added firewall requirements for clouding.io VPS
- Emphasized port 80/443 requirements for HTTP-01 challenges
Changes in v2.0:
- Added critical pre-deployment fixes section
- Updated infrastructure setup for MicroK8s
- Added required component installation (nginx-ingress, metrics-server, etc.)
- Updated configuration steps with domain replacement
- Added Docker registry secret creation
- Added SigNoz Helm deployment before application
- Updated storage class configuration
- Added image tag version requirements
- Expanded verification checklist