Bakery-IA Pilot Launch Guide
Complete guide for deploying to production for a 10-tenant pilot program
Last Updated: 2026-01-20
Target Environment: clouding.io VPS with MicroK8s
Estimated Cost: €41-81/month
Time to Deploy: 3-5 hours (first time, including fixes)
Status: ⚠️ REQUIRES PRE-DEPLOYMENT FIXES - See Production VPS Deployment Fixes
Version: 3.0
Table of Contents
- Executive Summary
- Infrastructure Architecture Overview
- ⚠️ CRITICAL: Pre-Deployment Configuration
- Pre-Launch Checklist
- VPS Provisioning
- Infrastructure Setup
- Domain & DNS Configuration
- TLS/SSL Certificates
- Email & Communication Setup
- Kubernetes Deployment
- Configuration & Secrets
- Database Migrations
- CI/CD Infrastructure Deployment
- Mailu Email Server Deployment
- Nominatim Geocoding Service
- SigNoz Monitoring Deployment
- Verification & Testing
- Post-Deployment
Executive Summary
What You're Deploying
A complete multi-tenant SaaS platform with:
- 18 microservices (auth, tenant, ML forecasting, inventory, sales, orders, etc.)
- 14 PostgreSQL databases with TLS encryption
- Redis cache with TLS
- RabbitMQ message broker
- Monitoring stack (Prometheus, Grafana, AlertManager)
- Full security (TLS, RBAC, audit logging)
Total Cost Breakdown
| Service | Provider | Monthly Cost |
|---|---|---|
| VPS Server (20GB RAM, 8 vCPU, 200GB SSD) | clouding.io | €40-80 |
| Domain | Namecheap/Cloudflare | €1.25 (€15/year) |
| Email | Zoho Free / Gmail | €0 |
| WhatsApp API | Meta Business | €0 (1k free conversations) |
| DNS | Cloudflare | €0 |
| SSL | Let's Encrypt | €0 |
| TOTAL | | €41-81/month |
Timeline
| Phase | Duration | Description |
|---|---|---|
| Pre-Launch Setup | 1-2 hours | Domain, VPS provisioning, accounts setup |
| Infrastructure Setup | 1 hour | MicroK8s installation, firewall config |
| Deployment | 30-60 min | Deploy all services and databases |
| Verification | 30-60 min | Test everything works |
| Total | 3-5 hours | First-time deployment |
Infrastructure Architecture Overview
Component Layers
The Bakery-IA platform is organized into distinct infrastructure layers, each with specific deployment dependencies.
┌─────────────────────────────────────────────────────────────────────────────┐
│ LAYER 6: APPLICATION │
│ Frontend │ Gateway │ 18 Microservices │ CronJobs & Workers │
├─────────────────────────────────────────────────────────────────────────────┤
│ LAYER 5: MONITORING │
│ SigNoz (Unified Observability) │ AlertManager │ OTel Collector │
├─────────────────────────────────────────────────────────────────────────────┤
│ LAYER 4: PLATFORM SERVICES (Optional) │
│ Mailu (Email) │ Nominatim (Geocoding) │ CI/CD (Tekton, Flux, Gitea) │
├─────────────────────────────────────────────────────────────────────────────┤
│ LAYER 3: DATA & STORAGE │
│ PostgreSQL (18 DBs) │ Redis │ RabbitMQ │ MinIO │
├─────────────────────────────────────────────────────────────────────────────┤
│ LAYER 2: NETWORK & SECURITY │
│ CoreDNS (DNS-over-TLS) │ Ingress Controller │ Cert-Manager │ TLS │
├─────────────────────────────────────────────────────────────────────────────┤
│ LAYER 1: FOUNDATION │
│ Namespaces │ Storage Classes │ RBAC │ ConfigMaps │ Secrets │
├─────────────────────────────────────────────────────────────────────────────┤
│ LAYER 0: KUBERNETES CLUSTER │
│ MicroK8s (Production) │ Kind (Local Dev) │ EKS (AWS Alternative) │
└─────────────────────────────────────────────────────────────────────────────┘
Deployment Order & Dependencies
Components must be deployed in a specific order due to dependencies:
1. Namespaces (bakery-ia, tekton-pipelines, flux-system)
↓
2. Cert-Manager & ClusterIssuers
↓
3. TLS Certificates (internal + ingress)
↓
4. CoreDNS Configuration (DNS-over-TLS for DNSSEC)
↓
5. Ingress Controller & Resources
↓
6. Data Layer: PostgreSQL, Redis, RabbitMQ, MinIO
↓
7. Database Migrations
↓
8. Application Services (18 microservices)
↓
9. Gateway & Frontend
↓
10. (Optional) CI/CD: Gitea → Tekton → Flux
↓
11. (Optional) Mailu Email Server
↓
12. (Optional) Nominatim Geocoding
↓
13. (Optional) SigNoz Monitoring
Infrastructure Components Summary
| Component | Purpose | Required | Namespace |
|---|---|---|---|
| MicroK8s | Kubernetes cluster | Yes | - |
| Cert-Manager | TLS certificate management | Yes | cert-manager |
| Ingress-Nginx | External traffic routing | Yes | ingress |
| PostgreSQL | 18 service databases | Yes | bakery-ia |
| Redis | Caching & sessions | Yes | bakery-ia |
| RabbitMQ | Message broker | Yes | bakery-ia |
| MinIO | Object storage (ML models) | Yes | bakery-ia |
| Mailu | Self-hosted email server | Optional | bakery-ia |
| Nominatim | Geocoding service | Optional | bakery-ia |
| Gitea | Git server + container registry | Optional | gitea |
| Tekton | CI/CD pipelines | Optional | tekton-pipelines |
| Flux CD | GitOps deployment | Optional | flux-system |
| SigNoz | Unified observability | Recommended | bakery-ia |
Quick Reference: What to Deploy
Minimal Production Setup:
- Kubernetes cluster + addons
- Core infrastructure (databases, cache, broker)
- Application services
- External email (Zoho/Gmail)
Full Production Setup (Recommended):
- Everything above, plus:
- Mailu (self-hosted email)
- SigNoz (monitoring)
- CI/CD (Gitea + Tekton + Flux)
- Nominatim (if geocoding needed)
⚠️ CRITICAL: Pre-Deployment Configuration
READ THIS FIRST: Review and complete these configuration steps before deploying to production.
Infrastructure Architecture (Updated)
The infrastructure has been reorganized with the following structure:
infrastructure/
├── environments/ # Environment-specific configs
│ ├── common/configs/ # Shared ConfigMaps and Secrets
│ │ ├── configmap.yaml # Application configuration
│ │ ├── secrets.yaml # All secrets (database, JWT, Redis, etc.)
│ │ └── kustomization.yaml
│ ├── dev/k8s-manifests/ # Development Kustomization
│ └── prod/k8s-manifests/ # Production Kustomization & patches
├── platform/ # Platform-level infrastructure
│ ├── cert-manager/ # TLS certificate issuers (Let's Encrypt)
│ ├── networking/ingress/ # NGINX ingress (base + overlays)
│ ├── storage/ # PostgreSQL, Redis, MinIO
│ ├── gateway/ # API Gateway service
│ └── mail/mailu-helm/ # Email server (Helm chart)
├── services/ # Application services
│ ├── databases/ # 19 PostgreSQL database instances
│ └── microservices/ # 19 microservices
├── cicd/ # CI/CD (deployed via Helm, NOT kustomize)
│ ├── gitea/ # Git server + container registry
│ ├── tekton-helm/ # CI pipelines
│ └── flux/ # GitOps deployment
└── monitoring/signoz/ # SigNoz observability (via Helm)
🔴 Configuration Status
| Item | Status | File Location |
|---|---|---|
| Production Secrets | ✅ Configured | infrastructure/environments/common/configs/secrets.yaml |
| Cert-Manager Email | ✅ Configured | infrastructure/platform/cert-manager/cluster-issuer-production.yaml |
| SigNoz Namespace | ✅ Uses bakery-ia | infrastructure/environments/prod/k8s-manifests/kustomization.yaml |
| imagePullSecrets | ✅ Auto-patched | Production kustomization adds gitea-registry-secret automatically |
| Image Tags | ⚠️ Update for releases | infrastructure/environments/prod/k8s-manifests/kustomization.yaml |
| Stripe Keys | ⚠️ Configure before launch | ConfigMap + Secrets |
| Pilot Coupon | ✅ Auto-seeded | app/jobs/startup_seeder.py |
Required Configuration Changes
1. imagePullSecrets - ✅ AUTOMATICALLY HANDLED
Status: ✅ The production kustomization automatically patches all workloads
File: infrastructure/environments/prod/k8s-manifests/kustomization.yaml
The production overlay adds gitea-registry-secret to all Deployments, StatefulSets, Jobs, and CronJobs via Kustomize patches:
patches:
- target:
kind: Deployment
patch: |-
- op: add
path: /spec/template/spec/imagePullSecrets
value:
- name: gitea-registry-secret
Note: The gitea-registry-secret is created by infrastructure/cicd/gitea/sync-registry-secret.sh after Gitea deployment.
2. Update Image Tags to Semantic Versions (FOR RELEASES)
Why: Using 'latest' causes non-deterministic deployments
Impact if skipped: Unpredictable behavior, impossible rollbacks
File: infrastructure/environments/prod/k8s-manifests/kustomization.yaml
For production releases, update the images: section from latest to semantic versions.
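For example, the `images:` block in the production kustomization might change from floating tags to a pinned release like this (a sketch using the standard Kustomize `images:` transformer syntax; the image names shown are illustrative, use the ones actually listed in your kustomization):

```yaml
images:
  - name: bakery/tenant-service
    newTag: v1.0.0   # was: latest
  - name: bakery/auth-service
    newTag: v1.0.0   # was: latest
```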
3. Production Secrets - ✅ ALREADY CONFIGURED
Status: ✅ Strong production secrets have been pre-generated
File: infrastructure/environments/common/configs/secrets.yaml
Pre-configured secrets include:
- 19 database passwords (24-character URL-safe random strings)
- JWT secrets (256-bit cryptographically secure)
- Redis password (24-character random string)
- RabbitMQ credentials
- PostgreSQL monitoring user for SigNoz metrics collection
4. Cert-Manager Email - ✅ ALREADY CONFIGURED
Status: ✅ Email set to admin@bakewise.ai
File: infrastructure/platform/cert-manager/cluster-issuer-production.yaml
5. Update Stripe Keys (HIGH PRIORITY)
Why: Payment processing requires production Stripe keys
Impact if skipped: Payments will use test mode (no real charges)
ConfigMap (infrastructure/environments/common/configs/configmap.yaml):
VITE_STRIPE_PUBLISHABLE_KEY: "pk_live_XXXXXXXXXXXXXXXXXXXX"
Secrets (infrastructure/environments/common/configs/secrets.yaml):
# Add to payment-secrets section (base64 encoded)
STRIPE_SECRET_KEY: <base64-encoded-secret-key>
STRIPE_WEBHOOK_SECRET: <base64-encoded-webhook-secret>
Get your keys from: https://dashboard.stripe.com/apikeys
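The secret values must be base64-encoded before they go into secrets.yaml. A quick sketch (the key values below are placeholders, not real Stripe keys):

```shell
# Placeholders only -- substitute your real sk_live/whsec values.
STRIPE_SECRET_KEY_B64=$(printf '%s' "sk_live_EXAMPLE" | base64 | tr -d '\n')
STRIPE_WEBHOOK_SECRET_B64=$(printf '%s' "whsec_EXAMPLE" | base64 | tr -d '\n')

# Paste these into the payment-secrets section of secrets.yaml:
echo "STRIPE_SECRET_KEY: ${STRIPE_SECRET_KEY_B64}"
echo "STRIPE_WEBHOOK_SECRET: ${STRIPE_WEBHOOK_SECRET_B64}"
```

Note the use of `printf '%s'` rather than `echo`: `echo` appends a trailing newline that would end up inside the decoded secret.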
6. Pilot Coupon Configuration - ✅ AUTO-SEEDED
Status: ✅ Automatically created when tenant-service starts
How it works: app/jobs/startup_seeder.py creates the PILOT2025 coupon
Default pilot settings (in configmap, can be customized):
- VITE_PILOT_MODE_ENABLED: "true" - Enables pilot UI features
- VITE_PILOT_COUPON_CODE: "PILOT2025" - Coupon code for 3 months free
- VITE_PILOT_TRIAL_MONTHS: "3" - Trial extension duration
✅ Already Correct (No Changes Needed)
- Storage Class - Uses MicroK8s default storage provisioner
- Domain Names - bakewise.ai configured in production overlay
- Service Types - ClusterIP + Ingress is correct architecture
- Network Policies - Defined in infrastructure/platform/security/network-policies/
- SigNoz Namespace - ✅ Uses bakery-ia namespace (unified with application)
- OTEL Configuration - ✅ Pre-configured for SigNoz in production patches
- Replica Counts - ✅ Production replicas defined in kustomization (2-3 per service)
Step-by-Step Configuration Script
Run these commands on your local machine before deployment:
# Navigate to repository root
cd /path/to/bakery-ia
# ========================================
# STEP 1: Verify Infrastructure Structure
# ========================================
echo "Step 1: Verifying new infrastructure structure..."
echo "Checking directories..."
ls -d infrastructure/environments/common/configs/ && echo "✅ Common configs"
ls -d infrastructure/environments/prod/k8s-manifests/ && echo "✅ Prod kustomization"
ls -d infrastructure/platform/cert-manager/ && echo "✅ Cert-manager"
ls -d infrastructure/cicd/gitea/ && echo "✅ Gitea CI/CD"
# ========================================
# STEP 2: Update Image Tags (for releases)
# ========================================
echo -e "\nStep 2: Updating image tags for release..."
export VERSION="1.0.0" # Change this to your version
# Update application image tags in production kustomization
sed -i.bak "s/newTag: latest/newTag: v${VERSION}/g" \
infrastructure/environments/prod/k8s-manifests/kustomization.yaml
# Verify (show first 10 image entries)
echo "Current image tags:"
grep -A 1 "name: bakery/" infrastructure/environments/prod/k8s-manifests/kustomization.yaml | head -20
# ========================================
# STEP 3: Verify Production Secrets
# ========================================
echo -e "\nStep 3: Verifying production secrets..."
echo "✅ Production secrets are pre-configured with strong passwords:"
echo " - 19 database passwords (24-char URL-safe random)"
echo " - JWT secrets (256-bit cryptographically secure)"
echo " - Redis password (24-char random)"
echo " - RabbitMQ credentials"
echo " - PostgreSQL monitoring user for SigNoz"
echo ""
echo "Location: infrastructure/environments/common/configs/secrets.yaml"
# Quick verification
grep -c "_DB_PASSWORD:" infrastructure/environments/common/configs/secrets.yaml
echo "database password entries found"
# ========================================
# STEP 4: Verify Cert-Manager Email
# ========================================
echo -e "\nStep 4: Verifying cert-manager email..."
grep "email:" infrastructure/platform/cert-manager/cluster-issuer-production.yaml
# Should show: email: admin@bakewise.ai
# ========================================
# STEP 5: Verify imagePullSecrets Patch
# ========================================
echo -e "\nStep 5: Verifying imagePullSecrets configuration..."
grep -A 5 "gitea-registry-secret" infrastructure/environments/prod/k8s-manifests/kustomization.yaml && \
echo "✅ imagePullSecrets patch configured" || \
echo "⚠️ WARNING: imagePullSecrets patch missing"
# ========================================
# STEP 6: Configure Stripe Keys (MANUAL)
# ========================================
echo -e "\nStep 6: Stripe Configuration..."
echo "================================================================"
echo "⚠️ MANUAL STEP REQUIRED"
echo ""
echo "1. Edit ConfigMap:"
echo " File: infrastructure/environments/common/configs/configmap.yaml"
echo " Add: VITE_STRIPE_PUBLISHABLE_KEY: \"pk_live_XXXX\""
echo ""
echo "2. Edit Secrets:"
echo " File: infrastructure/environments/common/configs/secrets.yaml"
echo " Add to payment-secrets (base64 encoded):"
echo " STRIPE_SECRET_KEY: <base64-encoded>"
echo " STRIPE_WEBHOOK_SECRET: <base64-encoded>"
echo ""
echo "Get keys from: https://dashboard.stripe.com/apikeys"
echo "================================================================"
read -p "Press Enter when Stripe keys are configured..."
# ========================================
# STEP 7: Validate Kustomization Build
# ========================================
echo -e "\nStep 7: Validating Kustomization..."
cd infrastructure/environments/prod/k8s-manifests
kustomize build . > /dev/null 2>&1 && \
echo "✅ Kustomization builds successfully" || \
echo "⚠️ WARNING: Kustomization build failed"
cd - > /dev/null
# ========================================
# FINAL VALIDATION
# ========================================
echo -e "\n========================================"
echo "Pre-Deployment Configuration Complete!"
echo "========================================"
echo ""
echo "Validation Checklist:"
echo " ✅ Infrastructure structure verified"
echo " ✅ Image tags updated to v${VERSION}"
echo " ✅ Production secrets pre-configured"
echo " ✅ Cert-manager email: admin@bakewise.ai"
echo " ✅ imagePullSecrets auto-patched via Kustomize"
echo " ⚠️ Stripe keys configured (manual verification)"
echo " ✅ Pilot coupon auto-seeded on startup"
echo ""
echo "Next Steps:"
echo " 1. Deploy CI/CD: Gitea, Tekton, Flux (via Helm)"
echo " 2. Push images to Gitea registry"
echo " 3. Apply Kustomization to cluster"
Manual Verification
After running the script above:
- Verify production secrets are configured:
  # Check secrets file has strong passwords
  head -80 infrastructure/environments/common/configs/secrets.yaml
  # Should show base64-encoded passwords for all 19 databases
- Check image tags in production overlay:
  grep "newTag:" infrastructure/environments/prod/k8s-manifests/kustomization.yaml | head -10
  # For releases: should show v1.0.0 (your version)
  # For development: 'latest' is acceptable
- Verify imagePullSecrets patch:
  grep -B 2 -A 6 "imagePullSecrets" infrastructure/environments/prod/k8s-manifests/kustomization.yaml
  # Should show patches for Deployment, StatefulSet, Job, CronJob
- Verify OTEL/SigNoz configuration:
  grep "OTEL_EXPORTER" infrastructure/environments/prod/k8s-manifests/kustomization.yaml
  # Should show: http://signoz-otel-collector.bakery-ia.svc.cluster.local:4317
- Test Kustomize build:
  cd infrastructure/environments/prod/k8s-manifests
  kustomize build . | kubectl apply --dry-run=client -f -
  # Should complete without errors
Key File Locations Reference
| Configuration | File Path |
|---|---|
| ConfigMap | infrastructure/environments/common/configs/configmap.yaml |
| Secrets | infrastructure/environments/common/configs/secrets.yaml |
| Prod Kustomization | infrastructure/environments/prod/k8s-manifests/kustomization.yaml |
| Cert-Manager Issuer | infrastructure/platform/cert-manager/cluster-issuer-production.yaml |
| Ingress | infrastructure/platform/networking/ingress/base/ingress.yaml |
| Gitea Values | infrastructure/cicd/gitea/values.yaml |
| Mailu Values | infrastructure/platform/mail/mailu-helm/values.yaml |
Pre-Launch Checklist
Required Accounts & Services
- Domain Name
  - Register at Namecheap or Cloudflare (€10-15/year)
  - Suggested: bakeryforecast.es or bakery-ia.com
- VPS Account
  - Sign up at clouding.io
  - Payment method configured
- Email Service - Self-hosted Mailu with Mailgun relay
  - Mailu deployed via Helm chart (see Mailu Email Server Deployment)
  - Mailgun account for outbound relay (improves deliverability)
  - DNS records configured (MX, SPF, DKIM, DMARC)
- WhatsApp Business API
  - Create Meta Business Account (free)
  - Verify business identity
  - Phone number ready (non-VoIP)
- DNS Access
  - Cloudflare account (free, recommended)
  - Or domain registrar DNS panel access
- Container Registry (Choose ONE)
  - Option A: Docker Hub account (recommended)
  - Option B: GitHub Container Registry
  - Option C: MicroK8s built-in registry
Required Tools on Local Machine
# Verify you have these installed:
kubectl version --client
docker --version
git --version
ssh -V
openssl version
# Install if missing (macOS):
brew install kubectl docker git openssh openssl
Repository Setup
# Clone the repository
git clone https://github.com/yourusername/bakery-ia.git
cd bakery-ia
# Verify structure
ls infrastructure/environments/prod/k8s-manifests/
VPS Provisioning
Recommended Configuration
For 10-tenant pilot program:
- RAM: 20 GB
- CPU: 8 vCPU cores
- Storage: 200 GB NVMe SSD (triple replica)
- Network: 1 Gbps connection
- OS: Ubuntu 22.04 LTS
- Monthly Cost: €40-80 (check current pricing)
Why These Specs?
Memory Breakdown:
- Application services: 14.1 GB
- Databases (18 instances): 4.6 GB
- Infrastructure (Redis, RabbitMQ): 0.8 GB
- Gateway/Frontend: 1.8 GB
- Monitoring: 1.5 GB
- System overhead: ~3 GB
- Total: ~26 GB at simultaneous peak; 20 GB is sufficient in practice because HPA scales replicas with load and services rarely peak at the same time
Storage Breakdown:
- Databases: 36 GB (18 × 2GB)
- ML Models: 10 GB
- Redis: 1 GB
- RabbitMQ: 2 GB
- Prometheus metrics: 20 GB
- Container images: ~30 GB
- Growth buffer: 100 GB
- Total: 199 GB
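The two budgets above can be sanity-checked with quick arithmetic:

```shell
# Memory budget (GB): services + databases + infra + gateway + monitoring + system
mem_total=$(awk 'BEGIN { printf "%.1f", 14.1 + 4.6 + 0.8 + 1.8 + 1.5 + 3 }')
echo "Memory at peak: ${mem_total} GB"    # ~26 GB

# Storage budget (GB): databases + models + redis + rabbitmq + metrics + images + buffer
storage_total=$((36 + 10 + 1 + 2 + 20 + 30 + 100))
echo "Storage: ${storage_total} GB"       # 199 GB
```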
Provisioning Steps
- Create VPS at clouding.io:
  1. Log in to clouding.io dashboard
  2. Click "Create New Server"
  3. Select:
     - OS: Ubuntu 22.04 LTS
     - RAM: 20 GB
     - CPU: 8 vCPU
     - Storage: 200 GB NVMe SSD
     - Location: Barcelona (best for Spain)
  4. Set hostname: bakery-ia-prod-01
  5. Add SSH key (or use password)
  6. Create server
- Note your server details:
  # Save these for later:
  VPS_IP="YOUR_VPS_IP_ADDRESS"
  VPS_ROOT_PASSWORD="YOUR_ROOT_PASSWORD"  # If not using SSH key
- Initial SSH connection:
  # Test connection
  ssh root@$VPS_IP
  # Update system
  apt update && apt upgrade -y
Infrastructure Setup
Step 1: Install MicroK8s
Using MicroK8s for production VPS deployment on clouding.io
# SSH into your VPS
ssh root@$VPS_IP
# Update system
apt update && apt upgrade -y
# Install MicroK8s
snap install microk8s --classic --channel=1.28/stable
# Add your user to microk8s group
usermod -a -G microk8s $USER
chown -f -R $USER ~/.kube
newgrp microk8s
# Verify installation
microk8s status --wait-ready
Step 2: Enable Required MicroK8s Addons
All required components are available as MicroK8s addons:
# Enable core addons
microk8s enable dns # DNS resolution within cluster
microk8s enable hostpath-storage # Provides microk8s-hostpath storage class
microk8s enable ingress # Nginx ingress controller (uses class "public")
microk8s enable cert-manager # Let's Encrypt SSL certificates
microk8s enable metrics-server # For HPA autoscaling
microk8s enable rbac # Role-based access control
# Setup kubectl alias
echo "alias kubectl='microk8s kubectl'" >> ~/.bashrc
source ~/.bashrc
# Verify all components are running
kubectl get nodes
# Should show: Ready
kubectl get storageclass
# Should show: microk8s-hostpath (default)
kubectl get pods -A
# Should show pods in: kube-system, ingress, cert-manager namespaces
# Verify ingress controller is running
kubectl get pods -n ingress
# Should show: nginx-ingress-microk8s-controller-xxx Running
# Verify cert-manager is running
kubectl get pods -n cert-manager
# Should show: cert-manager-xxx, cert-manager-webhook-xxx, cert-manager-cainjector-xxx
# Verify metrics-server is working
kubectl top nodes
# Should return CPU/Memory metrics
Important - MicroK8s Ingress Class:
- MicroK8s ingress addon uses class name public (NOT nginx)
- The ClusterIssuers in this repo are already configured with class: public
- If you see cert-manager challenges failing, verify the ingress class matches
Optional but Recommended:
# Enable Prometheus for additional monitoring (optional)
microk8s enable prometheus
# Enable registry if you want local image storage (optional)
microk8s enable registry
Step 3: Enhanced Infrastructure Components
The platform includes additional infrastructure components that enhance security, monitoring, and operations:
# The platform includes Mailu for email services
# Deploy Mailu via Helm (optional but recommended for production):
kubectl create namespace bakery-ia --dry-run=client -o yaml | kubectl apply -f -
helm repo add mailu https://mailu.github.io/helm-charts
helm repo update
helm install mailu mailu/mailu \
-n bakery-ia \
-f infrastructure/platform/mail/mailu-helm/values.yaml \
--timeout 10m \
--wait
# Verify Mailu deployment
kubectl get pods -n bakery-ia | grep mailu
For development environments, ensure the prepull-base-images script is run:
# On your local machine, run the prepull script to cache base images
cd bakery-ia
chmod +x scripts/prepull-base-images.sh
./scripts/prepull-base-images.sh
For production environments, ensure CI/CD infrastructure is properly configured:
# Tekton Pipelines for CI/CD (optional - can be deployed separately)
kubectl create namespace tekton-pipelines
kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/latest/release.yaml
kubectl apply -f https://storage.googleapis.com/tekton-releases/triggers/latest/release.yaml
# Flux CD for GitOps (already enabled in MicroK8s if needed)
# flux install --namespace=flux-system --network-policy=false
Step 4: Configure Firewall
CRITICAL: Ports 80 and 443 must be open for Let's Encrypt HTTP-01 challenges to work.
# Allow necessary ports
ufw allow 22/tcp # SSH
ufw allow 80/tcp # HTTP - REQUIRED for Let's Encrypt HTTP-01 challenge
ufw allow 443/tcp # HTTPS - For your application traffic
ufw allow 16443/tcp # Kubernetes API (optional, for remote kubectl access)
# Enable firewall
ufw enable
# Check status
ufw status verbose
# Expected output should include:
# 80/tcp ALLOW Anywhere
# 443/tcp ALLOW Anywhere
Also check clouding.io firewall:
- Log in to clouding.io dashboard
- Go to your VPS → Firewall settings
- Ensure ports 80 and 443 are allowed from anywhere (0.0.0.0/0)
Step 5: Create Namespace
# Create bakery-ia namespace
kubectl create namespace bakery-ia
# Verify
kubectl get namespaces
Domain & DNS Configuration
Step 1: Register Domain at Namecheap
- Go to Namecheap
- Search for your desired domain (e.g., bakewise.ai)
- Complete purchase (~€10-15/year)
- Save domain credentials
Step 2: Configure DNS at Namecheap
- Access DNS settings:
  1. Log in to Namecheap
  2. Go to Domain List → Manage → Advanced DNS
- Add DNS records pointing to your VPS:
  | Type | Host | Value | TTL |
  |---|---|---|---|
  | A | @ | YOUR_VPS_IP | Automatic |
  | A | * | YOUR_VPS_IP | Automatic |
  This points both bakewise.ai and all subdomains (*.bakewise.ai) to your VPS.
- Test DNS propagation:
  # Wait 5-10 minutes, then test
  nslookup bakewise.ai
  nslookup api.bakewise.ai
  nslookup mail.bakewise.ai
Step 3 (Optional): Configure Cloudflare DNS
- Add site to Cloudflare:
  1. Log in to Cloudflare
  2. Click "Add a Site"
  3. Enter your domain name
  4. Choose Free plan
  5. Cloudflare will scan existing DNS records
- Update nameservers at registrar:
  Point your domain's nameservers to Cloudflare:
  - NS1: assigned.cloudflare.com
  - NS2: assigned.cloudflare.com
  (Cloudflare will provide the exact values)
- Add DNS records:
  | Type | Name | Content | TTL | Proxy |
  |---|---|---|---|---|
  | A | @ | YOUR_VPS_IP | Auto | Yes |
  | A | www | YOUR_VPS_IP | Auto | Yes |
  | A | api | YOUR_VPS_IP | Auto | Yes |
  | A | monitoring | YOUR_VPS_IP | Auto | Yes |
  | CNAME | * | yourdomain.com | Auto | No |
- Configure SSL/TLS mode:
  SSL/TLS tab → Overview → Set to "Full (strict)"
- Test DNS propagation:
  # Wait 5-10 minutes, then test
  nslookup yourdomain.com
  nslookup api.yourdomain.com
TLS/SSL Certificates
Understanding Certificate Setup
The platform uses two layers of SSL/TLS:
- External (Ingress) SSL: Let's Encrypt for public HTTPS
- Internal (Database) SSL: Self-signed certificates for database connections
Step 1: Generate Internal Certificates
# On your local machine
cd infrastructure/tls
# Generate certificates
./generate-certificates.sh
# This creates:
# - ca/ (Certificate Authority)
# - postgres/ (PostgreSQL server certs)
# - redis/ (Redis server certs)
Certificate Details:
- Root CA: 10-year validity (expires 2035)
- Server certs: 3-year validity (expires October 2028)
- Algorithm: RSA 4096-bit
- Signature: SHA-256
Step 2: Create Kubernetes Secrets
# Create PostgreSQL TLS secret
kubectl create secret generic postgres-tls \
--from-file=server-cert.pem=infrastructure/tls/postgres/server-cert.pem \
--from-file=server-key.pem=infrastructure/tls/postgres/server-key.pem \
--from-file=ca-cert.pem=infrastructure/tls/postgres/ca-cert.pem \
-n bakery-ia
# Create Redis TLS secret
kubectl create secret generic redis-tls \
--from-file=redis-cert.pem=infrastructure/tls/redis/redis-cert.pem \
--from-file=redis-key.pem=infrastructure/tls/redis/redis-key.pem \
--from-file=ca-cert.pem=infrastructure/tls/redis/ca-cert.pem \
-n bakery-ia
# Verify secrets created
kubectl get secrets -n bakery-ia | grep tls
Step 3: Configure Let's Encrypt (External SSL)
cert-manager is already enabled via microk8s enable cert-manager. The ClusterIssuer is pre-configured in the repository.
Important: MicroK8s ingress addon uses ingress class public (not nginx). This is already configured in:
- infrastructure/platform/cert-manager/cluster-issuer-production.yaml
- infrastructure/platform/cert-manager/cluster-issuer-staging.yaml
# On VPS, apply the pre-configured ClusterIssuers
kubectl apply -k infrastructure/platform/cert-manager/
# Verify ClusterIssuers are ready
kubectl get clusterissuer
kubectl describe clusterissuer letsencrypt-production
# Expected output:
# NAME READY AGE
# letsencrypt-production True 1m
# letsencrypt-staging True 1m
Configuration details (already set):
- Email: admin@bakewise.ai (receives Let's Encrypt expiry notifications)
- Ingress class: public (MicroK8s default)
- Challenge type: HTTP-01 (requires port 80 open)
If you need to customize the email, edit before applying:
# Edit the production issuer
nano infrastructure/platform/cert-manager/cluster-issuer-production.yaml
# Change: email: admin@bakewise.ai → email: your-email@yourdomain.com
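For reference, a ClusterIssuer with these settings has the following shape (a sketch following the standard cert-manager v1 ACME schema; the `privateKeySecretRef` name is illustrative and the repository file remains the source of truth):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@bakewise.ai
    privateKeySecretRef:
      name: letsencrypt-production-account-key   # illustrative name
    solvers:
      - http01:
          ingress:
            class: public   # MicroK8s ingress class (not "nginx")
```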
Email & Communication Setup
Self-Hosted Mailu with Mailgun Relay
Architecture:
- Mailu - Self-hosted email server (Postfix, Dovecot, Rspamd, Roundcube webmail)
- Mailgun - External SMTP relay for improved outbound deliverability
- Helm deployment - infrastructure/platform/mail/mailu-helm/
Features:
- ✅ Full control over email infrastructure
- ✅ Mailgun relay improves deliverability (avoids VPS IP reputation issues)
- ✅ Built-in antispam (rspamd) with DNSSEC validation
- ✅ Webmail interface (Roundcube) at /webmail
- ✅ Admin panel at /admin
- ✅ IMAP/SMTP with TLS
- ✅ Professional addresses: admin@bakewise.ai, noreply@bakewise.ai
Configuration Files:
| File | Purpose |
|---|---|
| infrastructure/platform/mail/mailu-helm/values.yaml | Base Mailu configuration |
| infrastructure/platform/mail/mailu-helm/prod/values.yaml | Production overrides |
| infrastructure/platform/mail/mailu-helm/configs/mailgun-credentials-secret.yaml | Mailgun SMTP credentials |
Internal SMTP for Application Services:
# Services use Mailu's internal postfix for sending
SMTP_HOST: mailu-postfix.bakery-ia.svc.cluster.local
SMTP_PORT: 587
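As a sketch of how a service-side job could exercise this relay (the curl step assumes in-cluster DNS, so run it from a pod; the addresses are the ones configured above):

```shell
# Build a minimal RFC 5322 message.
cat > /tmp/pilot-smoke.txt <<'EOF'
From: noreply@bakewise.ai
To: admin@bakewise.ai
Subject: Bakery-IA internal SMTP smoke test

If this arrives, the internal Mailu relay is reachable.
EOF

# From inside the cluster (e.g. a debug pod with curl installed):
# curl --url "smtp://mailu-postfix.bakery-ia.svc.cluster.local:587" \
#      --mail-from "noreply@bakewise.ai" \
#      --mail-to "admin@bakewise.ai" \
#      --upload-file /tmp/pilot-smoke.txt
```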
Prerequisites
Before deploying Mailu, ensure:
- CoreDNS is configured with DNS-over-TLS for DNSSEC validation
- DNS records are configured for your domain
Step 1: Configure DNS Records
Add these DNS records for your domain (e.g., bakewise.ai):
Type Name Value TTL
A mail YOUR_VPS_IP Auto
MX @ mail.bakewise.ai (priority 10) Auto
TXT @ v=spf1 mx a -all Auto
TXT _dmarc v=DMARC1; p=reject; rua=... Auto
DKIM record will be generated after Mailu is running - you'll add it later.
Step 2: Configure CoreDNS for DNSSEC (DNS-over-TLS)
Mailu requires DNSSEC validation. Configure CoreDNS to use DNS-over-TLS with Cloudflare:
# Check if CoreDNS is already configured with DNS-over-TLS
kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}' | grep -o 'tls://1.1.1.1' || echo "Not configured"
# If not configured, update CoreDNS
cat > /tmp/coredns-corefile.yaml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns
namespace: kube-system
data:
Corefile: |
.:53 {
errors
health {
lameduck 5s
}
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
prometheus :9153
forward . tls://1.1.1.1 tls://1.0.0.1 {
tls_servername cloudflare-dns.com
health_check 5s
}
cache 30 {
disable success cluster.local
disable denial cluster.local
}
loop
reload
loadbalance
}
EOF
kubectl apply -f /tmp/coredns-corefile.yaml
# Restart CoreDNS to apply changes
kubectl rollout restart deployment coredns -n kube-system
kubectl rollout status deployment coredns -n kube-system --timeout=60s
# Get CoreDNS service IP (needed for Mailu configuration)
COREDNS_IP=$(kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}')
echo "CoreDNS IP: $COREDNS_IP"
# Verify DNS resolution is working
kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup google.com
Step 3: Create TLS Certificate Secret
Mailu Front pod requires a TLS certificate:
# Generate self-signed certificate for internal use
# (Let's Encrypt handles external TLS via Ingress)
TEMP_DIR=$(mktemp -d)
cd "$TEMP_DIR"
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout tls.key -out tls.crt \
-subj "/CN=mail.bakewise.ai/O=bakewise"
kubectl create secret tls mailu-certificates \
--cert=tls.crt \
--key=tls.key \
-n bakery-ia
rm -rf "$TEMP_DIR"
# Verify secret created
kubectl get secret mailu-certificates -n bakery-ia
Step 4: Create Admin Credentials Secret
# Generate a secure password (or use your own)
ADMIN_PASSWORD=$(openssl rand -base64 16 | tr -d '/+=' | head -c 16)
echo "Admin password: $ADMIN_PASSWORD"
echo "SAVE THIS PASSWORD SECURELY!"
# Create the admin credentials secret
kubectl create secret generic mailu-admin-credentials \
--from-literal=password="$ADMIN_PASSWORD" \
-n bakery-ia
Step 5: Deploy Mailu via Helm
# Add Mailu Helm repository
helm repo add mailu https://mailu.github.io/helm-charts
helm repo update mailu
# Get CoreDNS service IP
COREDNS_IP=$(kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}')
# Deploy Mailu with production values
# Admin user is created automatically via initialAccount feature
# CoreDNS provides DNSSEC validation via DNS-over-TLS to Cloudflare
helm upgrade --install mailu mailu/mailu \
-n bakery-ia \
--create-namespace \
-f infrastructure/platform/mail/mailu-helm/values.yaml \
-f infrastructure/platform/mail/mailu-helm/prod/values.yaml \
--set global.custom_dns_servers="$COREDNS_IP" \
--timeout 10m
# Wait for pods to be ready (may take 5-10 minutes for ClamAV)
kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=mailu -w
# Admin user (admin@bakewise.ai) is created automatically!
# Password is the one you set in Step 4
Step 6: Configure DKIM
After Mailu is running, get the DKIM key and add it to DNS:
# Get DKIM public key
kubectl exec -n bakery-ia deployment/mailu-admin -- \
cat /dkim/bakewise.ai.dkim.pub
# Add this as a TXT record in your DNS:
# Name: dkim._domainkey
# Value: (the key from above)
Step 7: Verify Email Setup
# Check all Mailu pods are running
kubectl get pods -n bakery-ia | grep mailu
# Expected: All 10 pods in Running state
# Test SMTP connectivity
kubectl run -it --rm smtp-test --image=alpine --restart=Never -- \
sh -c "apk add swaks && swaks --to test@example.com --from admin@bakewise.ai --server mailu-front.bakery-ia.svc.cluster.local:25"
# Access webmail (via port-forward for testing)
kubectl port-forward -n bakery-ia svc/mailu-front 8080:80
# Open: http://localhost:8080/webmail
Production Email Endpoints
| Service | URL/Address |
|---|---|
| Admin Panel | https://mail.bakewise.ai/admin |
| Webmail | https://mail.bakewise.ai/webmail |
| SMTP (STARTTLS) | mail.bakewise.ai:587 |
| SMTP (SSL) | mail.bakewise.ai:465 |
| IMAP (SSL) | mail.bakewise.ai:993 |
Troubleshooting Mailu
Issue: Admin pod CrashLoopBackOff with "DNSSEC validation" error
# Verify CoreDNS is configured with DNS-over-TLS
kubectl get configmap coredns -n kube-system -o yaml | grep 'tls://'
# Should show: tls://1.1.1.1 tls://1.0.0.1
# If not, re-run Step 2 above
Issue: Front pod stuck in ContainerCreating
# Check for missing certificate secret
kubectl describe pod -n bakery-ia -l app.kubernetes.io/component=front | grep -A5 Events
# If missing mailu-certificates, re-run Step 3 above
Issue: Admin pod can't connect to Redis
# Verify externalRedis is disabled in values
helm get values mailu -n bakery-ia | grep -A5 externalRedis
# Should show: enabled: false
# If enabled: true, upgrade with correct values
helm upgrade mailu mailu/mailu -n bakery-ia \
-f infrastructure/platform/mail/mailu-helm/values.yaml \
-f infrastructure/platform/mail/mailu-helm/prod/values.yaml
WhatsApp Business API Setup
Features:
- ✅ First 1,000 conversations/month FREE
- ✅ Perfect for 10 tenants (~500 messages/month)
Setup Steps:
1. Create Meta Business Account:
   1. Go to business.facebook.com
   2. Create Business Account
   3. Complete business verification
2. Add WhatsApp Product:
   1. Go to developers.facebook.com
   2. Create New App → Business
   3. Add WhatsApp product
   4. Complete setup wizard
3. Configure Phone Number:
   1. Test with your personal number initially
   2. Later: get a dedicated business number
   3. Verify the phone number with an SMS code
4. Create Message Templates:
   1. Go to WhatsApp Manager
   2. Create templates for:
      - Low inventory alert
      - Expired product alert
      - Forecast summary
      - Order notification
   3. Submit for approval (15 min - 24 hours)
5. Get API Credentials - save these values:
   - Phone Number ID: (from WhatsApp Manager)
   - Access Token: (from App Dashboard)
   - Business Account ID: (from WhatsApp Manager)
   - Webhook Verify Token: (create your own secure string)
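The Webhook Verify Token is the string Meta echoes back when you register your webhook endpoint: Meta sends `hub.mode=subscribe`, `hub.verify_token`, and `hub.challenge`, and your endpoint must return the challenge only when the token matches. A minimal sketch of that handshake logic (function and variable names here are illustrative, not from any SDK):

```shell
# Illustrative sketch of Meta's webhook verification handshake
WEBHOOK_VERIFY_TOKEN="replace-with-your-secure-string"

verify_webhook() {
  # $1 = hub.mode, $2 = hub.verify_token, $3 = hub.challenge
  if [ "$1" = "subscribe" ] && [ "$2" = "$WEBHOOK_VERIFY_TOKEN" ]; then
    printf '%s\n' "$3"   # respond 200 with the challenge body
  else
    return 1             # respond 403 Forbidden
  fi
}

verify_webhook subscribe "replace-with-your-secure-string" "1158201444"
```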
Kubernetes Deployment
Step 1: Prepare Container Images
Option A: Using Docker Hub (Recommended)
# On your local machine
docker login
# Build all images
docker-compose build
# Tag images for Docker Hub
# Replace YOUR_USERNAME with your Docker Hub username
export DOCKER_USERNAME="YOUR_USERNAME"
./scripts/tag-images.sh $DOCKER_USERNAME
# Push to Docker Hub
./scripts/push-images.sh $DOCKER_USERNAME
# Update prod kustomization with your username
# Edit: infrastructure/kubernetes/overlays/prod/kustomization.yaml
# Replace all "bakery/" with "$DOCKER_USERNAME/"
Option B: Using MicroK8s Registry
# On VPS
microk8s enable registry
# Get registry address (usually localhost:32000)
kubectl get service -n container-registry
# On local machine, configure insecure registry
# Edit /etc/docker/daemon.json:
{
"insecure-registries": ["YOUR_VPS_IP:32000"]
}
# Restart Docker
sudo systemctl restart docker
# Tag and push images
docker tag bakery/auth-service YOUR_VPS_IP:32000/bakery/auth-service
docker push YOUR_VPS_IP:32000/bakery/auth-service
# Repeat for all services...
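The per-service commands above can be generated in a loop. A dry-run sketch that only prints the commands (the service list is illustrative; pipe the output to `sh` once it looks right):

```shell
# Dry run: print tag/push commands for each service (illustrative list)
REGISTRY="YOUR_VPS_IP:32000"
for svc in auth-service tenant-service sales-service inventory-service; do
  echo "docker tag bakery/$svc $REGISTRY/bakery/$svc"
  echo "docker push $REGISTRY/bakery/$svc"
done
```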
Step 2: Update Production Configuration
⚠️ CRITICAL: The default configuration uses the bakewise.ai domain. You MUST update it before deployment if you are using a different domain.
Required Configuration Updates
Step 2.1: Remove imagePullSecrets
# On your local machine
cd bakery-ia
# Remove imagePullSecrets from all deployment files (GNU sed)
find infrastructure/kubernetes/base -name "*.yaml" -type f -exec sed -i.bak '/imagePullSecrets:/,+1d' {} \;
# Delete the .bak backups so they don't pollute the kustomize build (or the grep below)
find infrastructure/kubernetes/base -name "*.yaml.bak" -type f -delete
# Verify removal
grep -r "imagePullSecrets" infrastructure/kubernetes/base/
# Should return NO results
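The `/imagePullSecrets:/,+1d` address is a GNU sed range: the matched line plus the one line after it (BSD/macOS sed does not support `+1`). A scratch-file demo of exactly what it removes:

```shell
# Demo: GNU sed deletes the matched line plus the next line
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
spec:
  imagePullSecrets:
    - name: dockerhub-creds
  containers: []
EOF
sed -i '/imagePullSecrets:/,+1d' "$tmp"
cat "$tmp"   # only "spec:" and "  containers: []" remain
rm -f "$tmp"
```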
Step 2.2: Update Image Tags (Use Semantic Versions)
# Edit kustomization.yaml to replace 'latest' with actual version
nano infrastructure/kubernetes/overlays/prod/kustomization.yaml
# Find the images section (lines 163-196) and update:
# BEFORE:
# - name: bakery/auth-service
# newTag: latest
# AFTER:
# - name: bakery/auth-service
# newTag: v1.0.0
# Do this for ALL 22 services, or use this helper:
export VERSION="1.0.0" # Your version
# Create a script to update all image tags
cat > /tmp/update-tags.sh <<'EOF'
#!/bin/bash
VERSION="${1:-1.0.0}"
sed -i "s/newTag: latest/newTag: v${VERSION}/g" infrastructure/kubernetes/overlays/prod/kustomization.yaml
EOF
chmod +x /tmp/update-tags.sh
/tmp/update-tags.sh ${VERSION}
# Verify no 'latest' tags remain
grep "newTag:" infrastructure/kubernetes/overlays/prod/kustomization.yaml | grep -c "latest"
# Should return: 0
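The same substitution can be sanity-checked on a scratch file before touching the real kustomization:

```shell
# Scratch-file check of the newTag rewrite used by the helper script
tmp=$(mktemp)
printf -- '- name: bakery/auth-service\n  newTag: latest\n' > "$tmp"
sed -i 's/newTag: latest/newTag: v1.0.0/g' "$tmp"
grep "newTag:" "$tmp"
rm -f "$tmp"
```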
Step 2.3: Fix SigNoz Namespace References
# Update SigNoz patches to use bakery-ia namespace instead of signoz
sed -i 's/namespace: signoz/namespace: bakery-ia/g' infrastructure/kubernetes/overlays/prod/kustomization.yaml
# Verify changes (should show bakery-ia in all 3 patches)
grep -A 3 "name: signoz" infrastructure/kubernetes/overlays/prod/kustomization.yaml
Step 2.4: Update Cert-Manager Email
# Update Let's Encrypt notification email to your production email
sed -i "s/admin@bakery-ia.local/admin@bakewise.ai/g" \
infrastructure/kubernetes/base/components/cert-manager/cluster-issuer-production.yaml
Step 2.5: Verify Production Secrets (Already Configured) ✅
# Production secrets have been pre-configured with strong cryptographic passwords
# No manual action required - secrets are already set in secrets.yaml
# Verify the secrets are configured (optional)
echo "Verifying production secrets configuration..."
grep "JWT_SECRET_KEY" infrastructure/kubernetes/base/secrets.yaml | head -1
grep "AUTH_DB_PASSWORD" infrastructure/kubernetes/base/secrets.yaml | head -1
grep "REDIS_PASSWORD" infrastructure/kubernetes/base/secrets.yaml | head -1
echo "✅ All production secrets are configured and ready for deployment"
Production URLs:
- Main Application: https://bakewise.ai
- API Endpoints: https://bakewise.ai/api/v1/...
- SigNoz (Monitoring): https://monitoring.bakewise.ai/signoz
- AlertManager: https://monitoring.bakewise.ai/alertmanager
Configuration & Secrets
Production Secrets Status ✅
All core secrets have been pre-configured with strong cryptographic passwords:
- ✅ Database passwords (19 databases) - 24-character random strings
- ✅ JWT secrets - 256-bit cryptographically secure tokens
- ✅ Service API key - 64-character hexadecimal string
- ✅ Redis password - 24-character random string
- ✅ RabbitMQ password - 24-character random string
- ✅ RabbitMQ Erlang cookie - 64-character hexadecimal string
Step 1: Configure External Service Credentials (Email & WhatsApp)
You still need to update these external service credentials:
# Edit the secrets file
nano infrastructure/kubernetes/base/secrets.yaml
# Update ONLY these external service credentials:
# SMTP settings (from email setup):
SMTP_USER: <base64-encoded-username> # your email
SMTP_PASSWORD: <base64-encoded-password> # app password
# WhatsApp credentials (from WhatsApp setup - optional):
WHATSAPP_API_KEY: <base64-encoded-key>
# Payment processing (from Stripe setup):
STRIPE_SECRET_KEY: <base64-encoded-key>
STRIPE_WEBHOOK_SECRET: <base64-encoded-secret>
To base64 encode:
echo -n "your-value-here" | base64
CRITICAL: Never commit real secrets to git! The secrets.yaml file should be in .gitignore.
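A quick round-trip check catches the most common mistake here: forgetting `-n` and baking a trailing newline into the secret. The value below is illustrative:

```shell
# Round-trip sanity check for a base64-encoded secret value
value="your-smtp-app-password"          # illustrative value
encoded=$(echo -n "$value" | base64)
decoded=$(echo "$encoded" | base64 -d)
[ "$decoded" = "$value" ] && echo "round-trip OK: $encoded"
```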
Step 2: CI/CD Secrets Configuration
For production CI/CD setup, additional secrets are required:
# Create Docker Hub credentials secret (for image pulls)
kubectl create secret docker-registry dockerhub-creds \
--docker-server=docker.io \
--docker-username=YOUR_DOCKERHUB_USERNAME \
--docker-password=YOUR_DOCKERHUB_TOKEN \
--docker-email=your-email@example.com \
-n bakery-ia
# Create Gitea registry credentials (if using Gitea for CI/CD)
kubectl create secret docker-registry gitea-registry-credentials \
-n tekton-pipelines \
--docker-server=gitea.bakery-ia.local:5000 \
--docker-username=your-username \
--docker-password=your-password
# Create Git credentials for Flux (if using GitOps)
kubectl create secret generic gitea-credentials \
-n flux-system \
--from-literal=username=your-username \
--from-literal=password=your-password
Step 3: Apply Application Secrets
# Copy manifests to VPS (from local machine)
# Copy the whole infrastructure/ directory so later steps find both
# ~/infrastructure/kubernetes and ~/infrastructure/cicd on the VPS
scp -r infrastructure root@YOUR_VPS_IP:~/
# SSH to VPS
ssh root@YOUR_VPS_IP
# Apply application secrets
kubectl apply -f ~/infrastructure/kubernetes/base/secrets.yaml -n bakery-ia
# Verify secrets created
kubectl get secrets -n bakery-ia
# Should show multiple secrets including postgres-tls, redis-tls, app-secrets, etc.
Database Migrations
Step 0: Deploy CI/CD Infrastructure (Optional but Recommended)
For production environments, deploy CI/CD infrastructure components:
# Deploy Tekton Pipelines for CI/CD (optional but recommended for production)
kubectl create namespace tekton-pipelines
# Install Tekton Pipelines
kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/latest/release.yaml
# Install Tekton Triggers
kubectl apply -f https://storage.googleapis.com/tekton-releases/triggers/latest/release.yaml
# Apply Tekton configurations
kubectl apply -f ~/infrastructure/cicd/tekton/tasks/
kubectl apply -f ~/infrastructure/cicd/tekton/pipelines/
kubectl apply -f ~/infrastructure/cicd/tekton/triggers/
# Verify Tekton deployment
kubectl get pods -n tekton-pipelines
Step 1: Deploy SigNoz Monitoring (BEFORE Application)
⚠️ CRITICAL: SigNoz must be deployed BEFORE the application into the bakery-ia namespace because the production kustomization patches SigNoz resources.
# On VPS
# 1. Ensure bakery-ia namespace exists
kubectl get namespace bakery-ia || kubectl create namespace bakery-ia
# 2. Add Helm repo
helm repo add signoz https://charts.signoz.io
helm repo update
# 3. Install SigNoz into bakery-ia namespace (NOT separate signoz namespace)
helm install signoz signoz/signoz \
-n bakery-ia \
--set frontend.service.type=ClusterIP \
--set clickhouse.persistence.size=20Gi \
--set clickhouse.persistence.storageClass=microk8s-hostpath
# 4. Wait for SigNoz to be ready (this may take 10-15 minutes)
kubectl wait --for=condition=ready pod \
-l app.kubernetes.io/instance=signoz \
-n bakery-ia \
--timeout=900s
# 5. Verify SigNoz components running in bakery-ia namespace
kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=signoz
# Should show: signoz-0, signoz-otel-collector, signoz-clickhouse, signoz-zookeeper, signoz-alertmanager
# 6. Verify StatefulSets exist (kustomization will patch these)
kubectl get statefulset -n bakery-ia | grep signoz
# Should show: signoz, signoz-clickhouse
⚠️ Important: Do NOT create a separate signoz namespace. SigNoz must be in bakery-ia namespace for the overlays to work correctly.
Step 2: Deploy Application and Databases
# On VPS
kubectl apply -k ~/infrastructure/kubernetes/overlays/prod
# Wait for databases to be ready (5-10 minutes)
kubectl wait --for=condition=ready pod \
-l app.kubernetes.io/component=database \
-n bakery-ia \
--timeout=600s
# Check status
kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database
Step 3: Run Migrations
Migrations are automatically handled by init containers in each service. Verify they completed:
# Check migration job status
kubectl get jobs -n bakery-ia | grep migration
# All should show "COMPLETIONS = 1/1"
# Check logs if any failed
kubectl logs -n bakery-ia job/auth-migration
Step 4: Verify Database Schemas
# Connect to a database to verify
kubectl exec -n bakery-ia deployment/auth-db -it -- psql -U auth_user -d auth_db
# Inside psql:
\dt # List tables
\d users # Describe users table
\q # Quit
CI/CD Infrastructure Deployment
This section covers deploying the complete CI/CD stack: Gitea (Git server + container registry), Tekton (CI pipelines), and Flux CD (GitOps deployments).
Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│ CI/CD ARCHITECTURE │
│ │
│ Developer Push │
│ │ │
│ ▼ │
│ ┌─────────┐ Webhook ┌─────────────┐ Build/Test ┌─────────┐ │
│ │ Gitea │ ───────────────► │ Tekton │ ─────────────────►│ Images │ │
│ │ (Git) │ │ (Pipelines)│ │(Registry)│ │
│ └─────────┘ └─────────────┘ └─────────┘ │
│ │ │ │ │
│ │ │ Update manifests │ │
│ │ ▼ │ │
│ │ ┌─────────────┐ │ │
│ └──────────────────────►│ Flux CD │◄───────────────────────┘ │
│ Monitor changes │ (GitOps) │ Pull images │
│ └─────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ Kubernetes │ │
│ │ Cluster │ │
│ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
Prerequisites
Before deploying CI/CD infrastructure:
- Kubernetes cluster is running
- Ingress controller is configured
- TLS certificates are available
- DNS records configured for `gitea.bakewise.ai`
Step 1: Deploy Gitea (Git Server + Container Registry)
Gitea provides a self-hosted Git server with built-in container registry support. The setup is fully automated - admin user and initial repository are created automatically.
1.1 Create Secrets and Init Job (One Command)
The setup script creates all necessary secrets and applies the initialization job.
For Production Deployment:
# Generate a secure password (minimum 16 characters required for production)
export GITEA_ADMIN_PASSWORD=$(openssl rand -base64 32)
echo "Gitea Admin Password: $GITEA_ADMIN_PASSWORD"
echo "⚠️ Save this password securely - you'll need it for Tekton setup!"
# Run the setup script with --production flag
# This enforces password requirements and uses production registry URL
./infrastructure/cicd/gitea/setup-admin-secret.sh --production
What the `--production` flag does:
- Requires the `GITEA_ADMIN_PASSWORD` environment variable (won't use defaults)
- Validates the password is at least 16 characters
- Uses the production registry URL (`registry.bakewise.ai`)
- Hides the password in output for security
- Shows production-specific next steps

This creates:
- `gitea-admin-secret` in the `gitea` namespace - admin credentials for Gitea
- `gitea-registry-secret` in the `bakery-ia` namespace - for imagePullSecrets
- `gitea-init-job` - a Kubernetes Job that creates the `bakery-ia` repository automatically
For dev environments only: Run without flags to use the default static password:
./infrastructure/cicd/gitea/setup-admin-secret.sh
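The checks the `--production` flag performs can be sketched as a minimal function (an illustration of the policy, not the script's actual code):

```shell
# Illustrative sketch of the production password policy checks
check_admin_password() {
  if [ -z "${GITEA_ADMIN_PASSWORD:-}" ]; then
    echo "ERROR: GITEA_ADMIN_PASSWORD must be set" >&2
    return 1
  fi
  if [ "${#GITEA_ADMIN_PASSWORD}" -lt 16 ]; then
    echo "ERROR: password must be at least 16 characters" >&2
    return 1
  fi
  echo "password policy OK"
}

GITEA_ADMIN_PASSWORD="example-password-0123456789"   # illustrative value
check_admin_password
```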
1.2 Install Gitea via Helm
# Add Gitea Helm repository
helm repo add gitea https://dl.gitea.io/charts
helm repo update gitea
# Install Gitea with PRODUCTION values (includes TLS, proper domains, resources)
helm upgrade --install gitea gitea/gitea \
-n gitea \
-f infrastructure/cicd/gitea/values.yaml \
-f infrastructure/cicd/gitea/values-prod.yaml \
--timeout 10m \
--wait
# Wait for Gitea to be ready
kubectl wait --for=condition=ready pod -n gitea -l app.kubernetes.io/name=gitea --timeout=300s
# Verify Gitea is running
kubectl get pods -n gitea
kubectl get svc -n gitea
Production values (values-prod.yaml) include:
- Domains: `gitea.bakewise.ai` and `registry.bakewise.ai`
- TLS via cert-manager with Let's Encrypt production issuer
- 50Gi storage (vs 10Gi in dev)
- Increased resource limits
1.3 Verify Repository Initialization
The init job automatically creates the bakery-ia repository once Gitea is ready:
# Check init job completed successfully
kubectl logs -n gitea job/gitea-init-repo
# Expected output:
# === Gitea Repository Initialization ===
# Gitea is ready!
# Repository 'bakery-ia' created successfully!
If the job needs to be re-run:
kubectl delete job gitea-init-repo -n gitea
kubectl apply -f infrastructure/cicd/gitea/gitea-init-job.yaml
1.4 Configure DNS for Gitea
Add DNS record pointing to your VPS:
| Type | Name | Value | TTL |
|---|---|---|---|
| A | gitea | YOUR_VPS_IP | Auto |
1.5 Verify Gitea Access
# Check ingress is configured
kubectl get ingress -n gitea
# Test access (after DNS propagation)
curl -I https://gitea.bakewise.ai
# Access web interface
# URL: https://gitea.bakewise.ai
# Username: bakery-admin
# Password: (from step 1.1)
# Verify repository was created via API
curl -u bakery-admin:$GITEA_ADMIN_PASSWORD \
https://gitea.bakewise.ai/api/v1/repos/bakery-admin/bakery-ia
1.6 Push Code to Repository
The bakery-ia repository is already created with a README. Push your code:
# Add Gitea as remote and push code
cd /path/to/bakery-ia
git remote add gitea https://gitea.bakewise.ai/bakery-admin/bakery-ia.git
git push gitea main
Step 2: Deploy Tekton Pipelines
Tekton provides cloud-native CI/CD pipelines.
2.1 Install Tekton Core Components
# Create Tekton namespace
kubectl create namespace tekton-pipelines
# Install Tekton Pipelines
kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/latest/release.yaml
# Wait for Tekton Pipelines to be ready
kubectl wait --for=condition=available --timeout=300s \
deployment/tekton-pipelines-controller -n tekton-pipelines
kubectl wait --for=condition=available --timeout=300s \
deployment/tekton-pipelines-webhook -n tekton-pipelines
# Install Tekton Triggers (for webhook-based automation)
kubectl apply -f https://storage.googleapis.com/tekton-releases/triggers/latest/release.yaml
kubectl apply -f https://storage.googleapis.com/tekton-releases/triggers/latest/interceptors.yaml
# Wait for Tekton Triggers to be ready
kubectl wait --for=condition=available --timeout=300s \
deployment/tekton-triggers-controller -n tekton-pipelines
kubectl wait --for=condition=available --timeout=300s \
deployment/tekton-triggers-webhook -n tekton-pipelines
# Verify installation
kubectl get pods -n tekton-pipelines
2.2 Deploy Tekton CI/CD Configuration via Helm
For Production Deployment:
# Generate secure webhook token (save this for Gitea webhook configuration)
export TEKTON_WEBHOOK_TOKEN=$(openssl rand -hex 32)
echo "Webhook Token: $TEKTON_WEBHOOK_TOKEN"
echo "⚠️ Save this token - you'll need it for Gitea webhook setup!"
# Ensure GITEA_ADMIN_PASSWORD is still set from Step 1
echo "Using Gitea password from: GITEA_ADMIN_PASSWORD"
# Install Tekton CI/CD with PRODUCTION values
helm upgrade --install tekton-cicd infrastructure/cicd/tekton-helm \
-n tekton-pipelines \
-f infrastructure/cicd/tekton-helm/values.yaml \
-f infrastructure/cicd/tekton-helm/values-prod.yaml \
--set secrets.webhook.token=$TEKTON_WEBHOOK_TOKEN \
--set secrets.registry.password=$GITEA_ADMIN_PASSWORD \
--set secrets.git.password=$GITEA_ADMIN_PASSWORD \
--timeout 10m \
--wait
# Verify resources created
kubectl get pipelines -n tekton-pipelines
kubectl get tasks -n tekton-pipelines
kubectl get eventlisteners -n tekton-pipelines
kubectl get triggerbindings -n tekton-pipelines
kubectl get triggertemplates -n tekton-pipelines
What the production values (values-prod.yaml) provide:
- Empty default secrets (must be provided via `--set` flags)
- Increased controller/webhook replicas (2 each)
- Higher resource limits for production workloads
- 10Gi workspace storage (vs 5Gi in dev)

⚠️ Security Note: Never commit actual secrets to values files. Always pass them via `--set` flags or use external secret management.
2.3 Configure Gitea Webhook
1. Go to Gitea repository settings → Webhooks
2. Add webhook:
   - Target URL: http://el-bakery-ia-listener.tekton-pipelines.svc.cluster.local:8080
   - HTTP Method: POST
   - Content Type: application/json
   - Secret: (same as `secrets.webhook.token` from Helm)
   - Trigger on: Push events
3. Save webhook
2.4 Test Pipeline Manually
# Create a manual PipelineRun to test the CI pipeline
cat <<EOF | kubectl create -f -
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
generateName: manual-ci-run-
namespace: tekton-pipelines
spec:
pipelineRef:
name: bakery-ia-ci
workspaces:
- name: shared-workspace
volumeClaimTemplate:
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 5Gi
- name: docker-credentials
secret:
secretName: gitea-registry-credentials
params:
- name: git-url
value: "http://gitea-http.gitea.svc.cluster.local:3000/bakery-admin/bakery-ia.git"
- name: git-revision
value: "main"
EOF
# Watch pipeline progress
kubectl get pipelineruns -n tekton-pipelines -w
# View logs of the most recent run
tkn pipelinerun logs --last -f -n tekton-pipelines
Step 3: Deploy Flux CD (GitOps)
Flux CD provides GitOps-based continuous deployment.
3.1 Install Flux CLI
# Install Flux CLI (if not already installed)
curl -s https://fluxcd.io/install.sh | sudo bash
# Verify installation
flux --version
3.2 Install Flux Components
# Install Flux CRDs and controllers
flux install --namespace=flux-system --network-policy=false
# Verify Flux is running
kubectl get pods -n flux-system
flux check
3.3 Deploy Flux Configuration via Helm
# Create Flux namespace if not exists
kubectl create namespace flux-system --dry-run=client -o yaml | kubectl apply -f -
# Create Git credentials secret for Flux
kubectl create secret generic gitea-credentials \
-n flux-system \
--from-literal=username=bakery-admin \
--from-literal=password="$GITEA_ADMIN_PASSWORD"
# Install Flux configuration
helm upgrade --install flux-cd infrastructure/cicd/flux \
-n flux-system \
--timeout 10m \
--wait
# Verify Flux resources
kubectl get gitrepositories -n flux-system
kubectl get kustomizations -n flux-system
3.4 Verify GitOps Sync
# Check GitRepository status
flux get sources git -n flux-system
# Check Kustomization status
flux get kustomizations -n flux-system
# Force reconciliation
flux reconcile source git bakery-ia -n flux-system
flux reconcile kustomization bakery-ia-prod -n flux-system
Step 4: Complete CI/CD Workflow Test
Test the entire CI/CD pipeline end-to-end:
# 1. Make a small change in your local repo
echo "# CI/CD Test $(date)" >> README.md
git add README.md
git commit -m "Test CI/CD pipeline"
# 2. Push to Gitea
git push gitea main
# 3. Watch Tekton pipeline triggered by webhook
kubectl get pipelineruns -n tekton-pipelines -w
# 4. After pipeline completes, watch Flux sync
flux get kustomizations -n flux-system -w
# 5. Verify deployment updated
kubectl get deployments -n bakery-ia -o wide
CI/CD Troubleshooting
Tekton Pipeline Fails
# View pipeline run status
kubectl get pipelineruns -n tekton-pipelines
# Get detailed logs
tkn pipelinerun describe <pipelinerun-name> -n tekton-pipelines
tkn pipelinerun logs <pipelinerun-name> -n tekton-pipelines
# Check EventListener logs (for webhook issues)
kubectl logs -n tekton-pipelines -l app.kubernetes.io/component=eventlistener
Flux Not Syncing
# Check GitRepository status
kubectl describe gitrepository bakery-ia -n flux-system
# Check Kustomization status
kubectl describe kustomization bakery-ia-prod -n flux-system
# View Flux controller logs
kubectl logs -n flux-system deployment/source-controller
kubectl logs -n flux-system deployment/kustomize-controller
# Force reconciliation
flux reconcile source git bakery-ia -n flux-system --with-source
Gitea Webhook Not Triggering
# Check webhook delivery in Gitea UI
# Settings → Webhooks → Recent Deliveries
# Verify EventListener is running
kubectl get eventlisteners -n tekton-pipelines
kubectl get svc -n tekton-pipelines | grep listener
# Check EventListener logs
kubectl logs -n tekton-pipelines -l eventlistener=bakery-ia-listener
CI/CD URLs Summary
| Service | URL | Purpose |
|---|---|---|
| Gitea | https://gitea.bakewise.ai | Git repository & container registry |
| Gitea Registry | https://gitea.bakewise.ai/v2/ | Docker registry API |
| Tekton Dashboard | (install separately if needed) | Pipeline visualization |
| Flux | CLI only | GitOps status via flux commands |
CI/CD Security Considerations
The CI/CD infrastructure has been configured with production security in mind:
Secrets Management
| Secret | Purpose | How to Generate |
|---|---|---|
| GITEA_ADMIN_PASSWORD | Gitea admin & registry auth | openssl rand -base64 32 |
| TEKTON_WEBHOOK_TOKEN | Webhook signature validation | openssl rand -hex 32 |
Security Features
1. Production Mode Enforcement
   - The `--production` flag on `setup-admin-secret.sh` enforces:
     - Mandatory `GITEA_ADMIN_PASSWORD` environment variable
     - Minimum 16-character password requirement
     - Password hidden from terminal output
2. Registry Communication
   - Git operations (clone, push) use internal cluster DNS: `gitea-http.gitea.svc.cluster.local:3000`
   - Image references use the external HTTPS URL: `registry.bakewise.ai` (containerd requires HTTPS for auth)
   - This ensures image pulls work correctly while git operations stay internal
3. Credential Isolation
   - Secrets are passed via `--set` flags, never committed to git
   - Registry credentials are scoped per-namespace
   - Webhook tokens are unique per installation
Post-Deployment Security Checklist
# Verify no default passwords in use
kubectl get secret gitea-admin-secret -n gitea -o jsonpath='{.data.password}' | base64 -d | wc -c
# Should be 32+ characters for production
# Verify webhook secret is set
kubectl get secret gitea-webhook-secret -n tekton-pipelines -o jsonpath='{.data.secretToken}' | base64 -d | wc -c
# Should be 64 characters (hex-encoded 32 bytes)
# Verify no hardcoded URLs in tasks
kubectl get task update-gitops -n tekton-pipelines -o yaml | grep -c "bakery-ia.local"
# Should be 0
Mailu Email Server Deployment
Mailu is a full-featured, self-hosted email server with built-in antispam, webmail, and admin panel. Outbound emails are relayed through Mailgun for improved deliverability and to avoid IP reputation issues.
Prerequisites
Before deploying Mailu:
- CoreDNS configured with DNS-over-TLS for DNSSEC validation
- DNS records configured for mail domain
- TLS certificates available
- Mailgun account created and domain verified (for outbound email relay)
Step 1: Configure CoreDNS for DNSSEC (DNS-over-TLS)
Mailu requires DNSSEC validation for email authentication (DKIM/SPF/DMARC). CoreDNS is configured to use DNS-over-TLS with Cloudflare for DNSSEC validation.
# Check if CoreDNS is already configured with DNS-over-TLS
kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}' | grep -o 'tls://1.1.1.1' || echo "Not configured"
# If not configured, update CoreDNS ConfigMap
cat > /tmp/coredns-config.yaml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns
namespace: kube-system
data:
Corefile: |
.:53 {
errors
health {
lameduck 5s
}
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
prometheus :9153
forward . tls://1.1.1.1 tls://1.0.0.1 {
tls_servername cloudflare-dns.com
health_check 5s
}
cache 30 {
disable success cluster.local
disable denial cluster.local
}
loop
reload
loadbalance
}
EOF
# Apply configuration
kubectl apply -f /tmp/coredns-config.yaml
# Restart CoreDNS
kubectl rollout restart deployment coredns -n kube-system
kubectl rollout status deployment coredns -n kube-system --timeout=60s
# Get CoreDNS service IP
COREDNS_IP=$(kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}')
echo "CoreDNS IP: $COREDNS_IP"
# Verify DNS resolution is working
kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup google.com
Step 3: Configure Mailgun (External SMTP Relay)
Mailu uses Mailgun as an external SMTP relay for all outbound emails. This improves deliverability and avoids IP reputation issues common with self-hosted mail servers.
3.1: Create Mailgun Account
- Go to https://www.mailgun.com and create an account
- Add your domain (bakewise.ai) in the Mailgun dashboard
- Verify domain ownership by adding the DNS records Mailgun provides
3.2: Get SMTP Credentials
- In the Mailgun dashboard, go to Domain Settings > SMTP credentials
- Note your credentials:
  - SMTP hostname: smtp.mailgun.org
  - Port: 587 (TLS/STARTTLS)
  - Username: typically postmaster@bakewise.ai
  - Password: your Mailgun SMTP password (NOT the API key)
3.3: Create Kubernetes Secret for Mailgun
# Edit the secret template with your Mailgun credentials
nano infrastructure/platform/mail/mailu-helm/configs/mailgun-credentials-secret.yaml
# Replace the placeholder values:
# RELAY_USERNAME: "postmaster@bakewise.ai"
# RELAY_PASSWORD: "your-mailgun-smtp-password"
# Apply the secret
kubectl apply -f infrastructure/platform/mail/mailu-helm/configs/mailgun-credentials-secret.yaml -n bakery-ia
# Verify secret created
kubectl get secret mailu-mailgun-credentials -n bakery-ia
Step 4: Configure DNS Records for Mail
Add these DNS records for your domain (e.g., bakewise.ai):
| Type | Name | Value | TTL | Priority |
|---|---|---|---|---|
| A | mail | YOUR_VPS_IP | Auto | - |
| MX | @ | mail.bakewise.ai | Auto | 10 |
| TXT | @ | v=spf1 include:mailgun.org mx a ~all | Auto | - |
| TXT | _dmarc | v=DMARC1; p=quarantine; rua=... | Auto | - |
Mailgun-specific DNS records (Mailgun will provide exact values):
| Type | Name | Value | TTL |
|---|---|---|---|
| TXT | (provided by Mailgun) | (DKIM key from Mailgun) | Auto |
| TXT | (provided by Mailgun) | (DKIM key from Mailgun) | Auto |
Note:
- The SPF record includes `mailgun.org` to authorize Mailgun to send on your behalf
- Add the DKIM records exactly as Mailgun provides them
- Mailu's own DKIM record will be added after deployment (Step 9)
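Before publishing the SPF record, a quick local string check can confirm it authorizes Mailgun (after propagation, `dig TXT bakewise.ai` verifies the live record):

```shell
# Local sanity check: does the SPF value include mailgun.org?
spf="v=spf1 include:mailgun.org mx a ~all"
case "$spf" in
  "v=spf1 "*"include:mailgun.org"*) echo "OK: SPF authorizes Mailgun" ;;
  *) echo "MISSING: add include:mailgun.org to the SPF record" >&2 ;;
esac
```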
Step 5: Create TLS Certificate Secret
# Generate self-signed certificate for internal Mailu use
# (Ingress handles external TLS termination)
TEMP_DIR=$(mktemp -d)
cd "$TEMP_DIR"
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout tls.key -out tls.crt \
-subj "/CN=mail.bakewise.ai/O=bakewise"
kubectl create secret tls mailu-certificates \
--cert=tls.crt \
--key=tls.key \
-n bakery-ia
rm -rf "$TEMP_DIR"
# Verify secret created
kubectl get secret mailu-certificates -n bakery-ia
Step 6: Create Admin Credentials Secret
The admin account is created automatically during Helm deployment using the initialAccount feature. Create a secret with the admin password before deploying.
# Generate a secure password (or use your own)
ADMIN_PASSWORD=$(openssl rand -base64 16 | tr -d '/+=' | head -c 16)
echo "Admin password: $ADMIN_PASSWORD"
echo "SAVE THIS PASSWORD SECURELY!"
# Create the admin credentials secret
kubectl create secret generic mailu-admin-credentials \
--from-literal=password="$ADMIN_PASSWORD" \
-n bakery-ia
# Verify secret created
kubectl get secret mailu-admin-credentials -n bakery-ia
Alternative: Use the provided template file:
# Edit the secret template with your password (base64 encoded)
nano infrastructure/platform/mail/mailu-helm/configs/mailu-admin-credentials-secret.yaml
# Apply the secret
kubectl apply -f infrastructure/platform/mail/mailu-helm/configs/mailu-admin-credentials-secret.yaml
Step 7: Deploy Mailu via Helm
# Add Mailu Helm repository
helm repo add mailu https://mailu.github.io/helm-charts
helm repo update mailu
# Get CoreDNS service IP for Mailu DNS configuration
COREDNS_IP=$(kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}')
# Deploy Mailu with production values
# Note:
# - externalRelay uses Mailgun via the secret created in Step 3
# - initialAccount creates admin user automatically using the secret from Step 6
# - CoreDNS provides DNSSEC validation via DNS-over-TLS (Cloudflare)
helm upgrade --install mailu mailu/mailu \
-n bakery-ia \
--create-namespace \
-f infrastructure/platform/mail/mailu-helm/values.yaml \
-f infrastructure/platform/mail/mailu-helm/prod/values.yaml \
--set global.custom_dns_servers="$COREDNS_IP" \
--timeout 10m
# Wait for pods to be ready (ClamAV may take 5-10 minutes)
kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=mailu -w
# The admin user (admin@bakewise.ai) is created automatically!
Step 8: Apply Mailu Ingress
# Apply Mailu-specific ingress configuration
kubectl apply -f infrastructure/platform/mail/mailu-helm/mailu-ingress.yaml
# Verify ingress
kubectl get ingress -n bakery-ia | grep mailu
Admin Credentials (created automatically in Step 7):
- Email: admin@bakewise.ai
- Password: the password you set in Step 6 (stored in the mailu-admin-credentials secret)
To retrieve the password later:
kubectl get secret mailu-admin-credentials -n bakery-ia -o jsonpath='{.data.password}' | base64 -d
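Kubernetes stores secret values base64-encoded, which is why the retrieval command pipes through `base64 -d`. A minimal local sketch of that round-trip, no cluster required (the password value here is illustrative; in Step 6 it comes from `openssl rand`):

```shell
# Illustrative password (Step 6 generates a real one with openssl rand)
ADMIN_PASSWORD='S3curePilotPass'

# Kubernetes stores the value base64-encoded inside the Secret object...
ENCODED=$(printf '%s' "$ADMIN_PASSWORD" | base64)

# ...and `-o jsonpath='{.data.password}' | base64 -d` reverses it
DECODED=$(printf '%s' "$ENCODED" | base64 -d)

[ "$DECODED" = "$ADMIN_PASSWORD" ] && echo "round-trip OK"
```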
Step 9: Configure DKIM
# Get DKIM public key from Mailu
kubectl exec -n bakery-ia deployment/mailu-admin -- \
cat /dkim/bakewise.ai.dkim.pub
# Add DKIM record to DNS:
# Type: TXT
# Name: dkim._domainkey
# Value: (output from above command)
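The TXT value published at `dkim._domainkey` is the flattened key material. If the file printed above uses PEM armor (the exact layout varies by Mailu version), this sketch shows how it collapses into a record value; the key bytes below are placeholder material, not a real key:

```shell
# Placeholder key in PEM layout -- the real key comes from the
# `cat /dkim/bakewise.ai.dkim.pub` command above
KEYFILE=$(mktemp)
cat > "$KEYFILE" <<'PEM'
-----BEGIN PUBLIC KEY-----
MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKB
gQC7aWxsdXN0cmF0aXZlLWtleS1tYXRlcmlhbA==
-----END PUBLIC KEY-----
PEM

# Strip the PEM armor and join the base64 lines into one string
KEY=$(grep -v -- '-----' "$KEYFILE" | tr -d '\n')

# DNS TXT record value for dkim._domainkey
TXT="v=DKIM1; k=rsa; p=${KEY}"
echo "$TXT"
```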
Step 10: Verify Email Setup
# Check all Mailu pods are running
kubectl get pods -n bakery-ia | grep mailu
# Expected: All pods in Running state
# Verify Mailgun secret is configured
kubectl get secret mailu-mailgun-credentials -n bakery-ia
kubectl get secret mailu-mailgun-credentials -n bakery-ia -o jsonpath='{.data.RELAY_USERNAME}' | base64 -d
# Should show: postmaster@bakewise.ai
# Test internal SMTP connectivity
kubectl run -it --rm smtp-test --image=alpine --restart=Never -- \
sh -c "apk add swaks && swaks --to test@example.com --from admin@bakewise.ai --server mailu-front.bakery-ia.svc.cluster.local:25"
# Test outbound email via Mailgun relay (send test email)
kubectl exec -it -n bakery-ia deployment/mailu-admin -- \
flask mailu alias_create test bakewise.ai 'your-personal-email@gmail.com'
# Then send a test email from webmail to your personal email
# Access webmail (via port-forward for testing)
kubectl port-forward -n bakery-ia svc/mailu-front 8080:80
# Open: http://localhost:8080/webmail
Mailu Endpoints
| Service | URL/Address |
|---|---|
| Admin Panel | https://mail.bakewise.ai/admin |
| Webmail | https://mail.bakewise.ai/webmail |
| SMTP (STARTTLS) | mail.bakewise.ai:587 |
| SMTP (SSL) | mail.bakewise.ai:465 |
| IMAP (SSL) | mail.bakewise.ai:993 |
Mailu Troubleshooting
Admin Pod CrashLoopBackOff with DNSSEC Error
# Verify CoreDNS is configured with DNS-over-TLS
kubectl get configmap coredns -n kube-system -o yaml | grep 'tls://'
# Should show: tls://1.1.1.1 tls://1.0.0.1
# If not configured, re-run Step 1
Front Pod Stuck in ContainerCreating
# Check for missing certificate secret
kubectl describe pod -n bakery-ia -l app.kubernetes.io/component=front | grep -A5 Events
# If missing mailu-certificates, re-run Step 4
Cannot Connect to Redis
# Verify internal Redis is enabled (not external)
helm get values mailu -n bakery-ia | grep -A5 externalRedis
# Should show: enabled: false
# If enabled: true, upgrade with correct values
helm upgrade mailu mailu/mailu -n bakery-ia \
-f infrastructure/platform/mail/mailu-helm/values.yaml \
-f infrastructure/platform/mail/mailu-helm/prod/values.yaml
Outbound Emails Not Delivered (Mailgun Relay Issues)
# Check if Mailgun credentials secret exists
kubectl get secret mailu-mailgun-credentials -n bakery-ia
# If missing, create it (see Step 3)
# Verify credentials are set correctly
kubectl get secret mailu-mailgun-credentials -n bakery-ia -o jsonpath='{.data.RELAY_USERNAME}' | base64 -d
# Should show your Mailgun username (e.g., postmaster@bakewise.ai)
# Check Postfix logs for relay errors
kubectl logs -n bakery-ia deployment/mailu-postfix | grep -i "relay\|mailgun\|sasl"
# Look for authentication errors or connection failures
# Verify Mailgun domain is verified
# Go to Mailgun dashboard > Domain Settings > DNS Records
# All records should show "Verified" status
# Test Mailgun SMTP connectivity directly
kubectl run -it --rm mailgun-test --image=alpine --restart=Never -- \
sh -c "apk add swaks && swaks --to test@example.com --from postmaster@bakewise.ai \
--server smtp.mailgun.org:587 --tls \
--auth-user 'postmaster@bakewise.ai' \
--auth-password 'YOUR_MAILGUN_PASSWORD'"
Emails Going to Spam
- Verify the SPF record includes Mailgun: v=spf1 include:mailgun.org mx a ~all
- Check DKIM records are properly configured in both Mailgun and Mailu
- Verify the DMARC record is set
- Check your domain reputation at mail-tester.com
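The SPF check above can be scripted. In production you would fetch the live record with `dig +short TXT bakewise.ai`; this sketch just asserts that the expected policy string authorizes Mailgun and soft-fails everything else:

```shell
# Expected SPF policy for the domain (from the checklist above)
SPF='v=spf1 include:mailgun.org mx a ~all'

# In production, compare against:  dig +short TXT bakewise.ai
echo "$SPF" | grep -q 'include:mailgun.org' && echo "Mailgun authorized"
echo "$SPF" | grep -q '~all$' && echo "softfail policy set"
```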
Nominatim Geocoding Service
Nominatim provides geocoding (address to coordinates) and reverse geocoding for delivery and distribution features.
When to Deploy
Deploy Nominatim if you need:
- Address autocomplete in the frontend
- Delivery route optimization
- Location-based analytics
Step 1: Deploy Nominatim via Helm
# Deploy Nominatim with production values
helm upgrade --install nominatim infrastructure/platform/nominatim/nominatim-helm \
-n bakery-ia \
--create-namespace \
-f infrastructure/platform/nominatim/nominatim-helm/values.yaml \
-f infrastructure/platform/nominatim/nominatim-helm/prod/values.yaml \
--timeout 15m \
--wait
# Verify deployment
kubectl get pods -n bakery-ia | grep nominatim
Note: Initial deployment may take 10-15 minutes as Nominatim downloads and processes geographic data.
Step 2: Verify Nominatim Service
# Check pod status
kubectl get pods -n bakery-ia -l app=nominatim
# Check service
kubectl get svc -n bakery-ia | grep nominatim
# Test geocoding endpoint
kubectl run -it --rm curl-test --image=curlimages/curl --restart=Never -- \
curl "http://nominatim-service.bakery-ia.svc.cluster.local:8080/search?q=Madrid&format=json"
Step 3: Configure Application to Use Nominatim
Update the application ConfigMap to use the internal Nominatim service:
# Edit configmap
kubectl edit configmap bakery-ia-config -n bakery-ia
# Set:
# NOMINATIM_URL: "http://nominatim-service.bakery-ia.svc.cluster.local:8080"
Nominatim Service Information
| Property | Value |
|---|---|
| Service Name | nominatim-service.bakery-ia.svc.cluster.local |
| Port | 8080 |
| Health Check | http://nominatim-service:8080/status |
| Search Endpoint | /search?q={query}&format=json |
| Reverse Endpoint | /reverse?lat={lat}&lon={lon}&format=json |
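The search endpoint returns a JSON array whose elements carry `lat` and `lon` as strings. A jq-free sketch of pulling coordinates out of a response; the sample body below is abbreviated and illustrative, not a live Nominatim reply:

```shell
# Abbreviated sample of a /search?q=Madrid&format=json response
RESPONSE='[{"place_id":12345,"lat":"40.4167047","lon":"-3.7035825","display_name":"Madrid, Spain"}]'

# Extract the first lat/lon pair with sed
LAT=$(printf '%s' "$RESPONSE" | sed -n 's/.*"lat":"\([^"]*\)".*/\1/p')
LON=$(printf '%s' "$RESPONSE" | sed -n 's/.*"lon":"\([^"]*\)".*/\1/p')
echo "Madrid -> $LAT, $LON"
```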
SigNoz Monitoring Deployment
SigNoz provides unified observability (traces, metrics, logs) for the entire platform.
Step 1: Deploy SigNoz via Helm
# Add SigNoz Helm repository
helm repo add signoz https://charts.signoz.io
helm repo update signoz
# Deploy SigNoz into bakery-ia namespace
helm upgrade --install signoz signoz/signoz \
-n bakery-ia \
-f infrastructure/monitoring/signoz/signoz-values-prod.yaml \
--set frontend.service.type=ClusterIP \
--set clickhouse.persistence.size=20Gi \
--set clickhouse.persistence.storageClass=microk8s-hostpath \
--timeout 15m \
--wait
# Wait for all components (may take 10-15 minutes)
kubectl wait --for=condition=ready pod \
-l app.kubernetes.io/instance=signoz \
-n bakery-ia \
--timeout=900s
# Verify deployment
kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=signoz
Step 2: Configure Ingress for SigNoz
# Apply SigNoz ingress (if not already included in overlays)
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: signoz-ingress
namespace: bakery-ia
annotations:
cert-manager.io/cluster-issuer: letsencrypt-production
nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
ingressClassName: public
tls:
- hosts:
- monitoring.bakewise.ai
secretName: signoz-tls-cert
rules:
- host: monitoring.bakewise.ai
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: signoz-frontend
port:
number: 3301
EOF
Step 3: Verify SigNoz Access
# Check ingress
kubectl get ingress -n bakery-ia | grep signoz
# Test access
curl -I https://monitoring.bakewise.ai
# Access SigNoz UI
# URL: https://monitoring.bakewise.ai
# Default credentials: admin / admin (change after first login)
SigNoz Endpoints
| Service | URL |
|---|---|
| SigNoz UI | https://monitoring.bakewise.ai |
| AlertManager | https://monitoring.bakewise.ai/alertmanager |
| OTel Collector (gRPC) | signoz-otel-collector:4317 (internal) |
| OTel Collector (HTTP) | signoz-otel-collector:4318 (internal) |
Verification & Testing
Step 1: Check All Pods Running
# View all pods
kubectl get pods -n bakery-ia
# Expected: All pods in "Running" state, none in CrashLoopBackOff
# Check for issues
kubectl get pods -n bakery-ia | grep -vE "Running|Completed"
# View logs for any problematic pods
kubectl logs -n bakery-ia POD_NAME
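The `grep -vE` filter above is easy to sanity-check offline. Given a sample `kubectl get pods` listing (illustrative names and states), it surfaces only the problem pods:

```shell
# Sample `kubectl get pods` output (illustrative)
PODS='NAME            READY   STATUS             RESTARTS
auth-service    1/1     Running            0
orders-service  0/1     CrashLoopBackOff   7
migrate-job     0/1     Completed          0'

# Same filter as above: anything not Running/Completed needs attention
PROBLEMS=$(printf '%s\n' "$PODS" | tail -n +2 | grep -vE "Running|Completed")
echo "$PROBLEMS"
```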
Step 2: Check Services and Ingress
# View services
kubectl get svc -n bakery-ia
# View ingress
kubectl get ingress -n bakery-ia
# View certificates (should auto-issue from Let's Encrypt)
kubectl get certificate -n bakery-ia
# Describe certificate to check status
kubectl describe certificate bakery-ia-prod-tls-cert -n bakery-ia
Step 3: Test Database Connections
# Test PostgreSQL TLS
kubectl exec -n bakery-ia deployment/auth-db -- sh -c \
'psql -U auth_user -d auth_db -c "SHOW ssl;"'
# Expected output: on
# Test Redis TLS
kubectl exec -n bakery-ia deployment/redis -- redis-cli \
--tls \
--cert /tls/redis-cert.pem \
--key /tls/redis-key.pem \
--cacert /tls/ca-cert.pem \
-a $REDIS_PASSWORD \
ping
# Expected output: PONG
Step 4: Test Frontend Access
# Test frontend (replace with your domain)
curl -I https://bakery.yourdomain.com
# Expected: HTTP/2 200 OK
# Test API health
curl https://api.yourdomain.com/health
# Expected: {"status": "healthy"}
Step 5: Test Authentication
# Create a test user (using your frontend or API)
curl -X POST https://api.yourdomain.com/api/v1/auth/register \
-H "Content-Type: application/json" \
-d '{
"email": "test@yourdomain.com",
"password": "TestPassword123!",
"name": "Test User"
}'
# Login
curl -X POST https://api.yourdomain.com/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{
"email": "test@yourdomain.com",
"password": "TestPassword123!"
}'
# Expected: JWT token in response
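Once login succeeds, the token must be extracted from the JSON body for subsequent authenticated requests. A sed-based sketch; the `access_token` field name is an assumption about the auth service's response shape:

```shell
# Sample login response (field names assumed; your auth service may differ)
RESPONSE='{"access_token":"eyJhbGciOiJIUzI1NiJ9.sample.sig","token_type":"bearer"}'

TOKEN=$(printf '%s' "$RESPONSE" | sed -n 's/.*"access_token":"\([^"]*\)".*/\1/p')

# Then call authenticated endpoints with:
# curl -H "Authorization: Bearer $TOKEN" https://api.yourdomain.com/api/v1/...
echo "$TOKEN"
```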
Step 6: Test Email Delivery
# Trigger a password reset to test email
curl -X POST https://api.yourdomain.com/api/v1/auth/forgot-password \
-H "Content-Type: application/json" \
-d '{"email": "test@yourdomain.com"}'
# Check your email inbox for the reset link
# Check service logs if email not received:
kubectl logs -n bakery-ia deployment/auth-service | grep -i "email\|smtp"
Step 7: Test WhatsApp (Optional)
# Send a test WhatsApp message
# This requires creating a tenant and configuring WhatsApp in the UI
# Or test via API once authenticated
Post-Deployment
Step 1: Access SigNoz Monitoring Stack
Your production deployment includes SigNoz, a unified observability platform that provides complete visibility into your application:
What is SigNoz?
SigNoz is an open-source, all-in-one observability platform that provides:
- 📊 Distributed Tracing - See end-to-end request flows across all 18 microservices
- 📈 Metrics Monitoring - Application performance and infrastructure metrics
- 📝 Log Management - Centralized logs from all services with trace correlation
- 🔍 Service Performance Monitoring (SPM) - Automatic RED metrics (Rate, Error, Duration)
- 🗄️ Database Monitoring - All 14 PostgreSQL databases + Redis + RabbitMQ
- ☸️ Kubernetes Monitoring - Cluster, node, pod, and container metrics
Why SigNoz instead of Prometheus/Grafana?
- Single unified UI for traces, metrics, and logs (no context switching)
- Automatic service dependency mapping
- Built-in APM (Application Performance Monitoring)
- Log-trace correlation with one click
- Better query performance with ClickHouse backend
- Modern UI designed for microservices
Production Monitoring URLs
Access via domain:
https://monitoring.bakewise.ai/signoz # SigNoz - Main observability UI
https://monitoring.bakewise.ai/alertmanager # AlertManager - Alert management
Or via port forwarding (if needed):
# SigNoz Frontend (Main UI)
kubectl port-forward -n bakery-ia svc/signoz 8080:8080 &
# Open: http://localhost:8080
# SigNoz AlertManager
kubectl port-forward -n bakery-ia svc/signoz-alertmanager 9093:9093 &
# Open: http://localhost:9093
# OTel Collector (for debugging)
kubectl port-forward -n bakery-ia svc/signoz-otel-collector 4317:4317 & # gRPC
kubectl port-forward -n bakery-ia svc/signoz-otel-collector 4318:4318 & # HTTP
Key SigNoz Features to Explore
Once you open SigNoz (https://monitoring.bakewise.ai/signoz), explore these tabs:
1. Services Tab - Application Performance
- View all 18 microservices with live metrics
- See request rate, error rate, and latency (P50/P90/P99)
- Click on any service to drill down into operations
- Identify slow endpoints and error-prone operations
2. Traces Tab - Request Flow Visualization
- See complete request journeys across services
- Identify bottlenecks (slow database queries, API calls)
- Debug errors with full stack traces
- Correlate with logs for complete context
3. Dashboards Tab - Infrastructure & Database Metrics
- PostgreSQL - Monitor all 14 databases (connections, queries, cache hit ratio)
- Redis - Cache performance (memory, hit rate, commands/sec)
- RabbitMQ - Message queue health (depth, rates, consumers)
- Kubernetes - Cluster metrics (nodes, pods, containers)
4. Logs Tab - Centralized Log Management
- Search and filter logs from all services
- Click on trace ID in logs to see related request trace
- Auto-enriched with Kubernetes metadata (pod, namespace, container)
- Identify patterns and anomalies
5. Alerts Tab - Proactive Monitoring
- Configure alerts on metrics, traces, or logs
- Email/Slack/Webhook notifications
- View firing alerts and alert history
Quick Health Check
# Verify SigNoz components are running
kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=signoz
# Expected output:
# signoz-0 READY 1/1
# signoz-otel-collector-xxx READY 1/1
# signoz-alertmanager-xxx READY 1/1
# signoz-clickhouse-xxx READY 1/1
# signoz-zookeeper-xxx READY 1/1
# Check OTel Collector health
kubectl exec -n bakery-ia deployment/signoz-otel-collector -- wget -qO- http://localhost:13133
# View recent telemetry in OTel Collector logs
kubectl logs -n bakery-ia deployment/signoz-otel-collector --tail=50 | grep -i "traces\|metrics\|logs"
Verify Telemetry is Working
1. Check Services are Reporting:
# Open SigNoz and navigate to the Services tab
# You should see all 18 microservices listed
# If services are missing, check whether they are sending telemetry:
kubectl logs -n bakery-ia deployment/auth-service | grep -i "telemetry\|otel"
2. Check Database Metrics:
# Navigate to Dashboards → PostgreSQL in SigNoz
# You should see metrics from all 14 databases
# Verify the OTel Collector is scraping the databases:
kubectl logs -n bakery-ia deployment/signoz-otel-collector | grep postgresql
3. Check Traces are Being Collected:
# Make a test API request
curl https://bakewise.ai/api/v1/health
# Navigate to the Traces tab in SigNoz
# Search for the "gateway" service
# You should see the trace for your request
4. Check Logs are Being Collected:
# Navigate to the Logs tab in SigNoz
# Filter by namespace: bakery-ia
# You should see logs from all pods
# Verify the filelog receiver is working:
kubectl logs -n bakery-ia deployment/signoz-otel-collector | grep filelog
Step 2: Configure CI/CD Infrastructure (Optional but Recommended)
If you deployed the CI/CD infrastructure, configure it for your workflow:
Gitea Setup (Git Server + Registry)
# Access Gitea at: http://gitea.bakery-ia.local (for dev) or http://gitea.bakewise.ai (for prod)
# Make sure to add the appropriate hostname to /etc/hosts or configure DNS
# Create your repositories for each service
# Configure webhook to trigger Tekton pipelines
Tekton Pipeline Configuration
# Verify Tekton pipelines are running
kubectl get pods -n tekton-pipelines
# Create a PipelineRun manually to test:
kubectl create -f - <<EOF
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
name: manual-ci-run
namespace: tekton-pipelines
spec:
pipelineRef:
name: bakery-ia-ci
workspaces:
- name: shared-workspace
volumeClaimTemplate:
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 5Gi
- name: docker-credentials
secret:
secretName: gitea-registry-credentials
params:
- name: git-url
value: "http://gitea.bakery-ia.local/bakery-admin/bakery-ia.git"
- name: git-revision
value: "main"
EOF
Flux CD Configuration (GitOps)
# Verify Flux is running
kubectl get pods -n flux-system
# Set up GitRepository and Kustomization resources for GitOps deployment
# Example:
cat <<EOF | kubectl apply -f -
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
name: bakery-ia
namespace: flux-system
spec:
interval: 1m
url: https://github.com/your-org/bakery-ia.git
ref:
branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
name: bakery-ia
namespace: flux-system
spec:
interval: 5m
sourceRef:
kind: GitRepository
name: bakery-ia
path: ./infrastructure/environments/prod/k8s-manifests
prune: true
EOF
Step 3: Configure Alerting
SigNoz includes integrated alerting with AlertManager. Configure it for your team:
Update Email Notification Settings
The alerting configuration is in the SigNoz Helm values. To update:
# For production, edit the values file:
nano infrastructure/monitoring/signoz/signoz-values-prod.yaml
# Update the alertmanager.config section:
# 1. Update SMTP settings:
# - smtp_from: 'your-alerts@bakewise.ai'
# - smtp_auth_username: 'your-alerts@bakewise.ai'
# - smtp_auth_password: (use Kubernetes secret)
#
# 2. Update receivers:
# - critical-alerts email: critical-alerts@bakewise.ai
# - warning-alerts email: oncall@bakewise.ai
#
# 3. (Optional) Add Slack webhook for critical alerts
# Apply the updated configuration:
helm upgrade signoz signoz/signoz \
-n bakery-ia \
-f infrastructure/monitoring/signoz/signoz-values-prod.yaml
Create Alerts in SigNoz UI
1. Open the SigNoz Alerts Tab: https://monitoring.bakewise.ai/signoz → Alerts
2. Create the common alerts below.

Alert 1: High Error Rate
- Name: HighErrorRate
- Query: error_rate > 5 for 5 minutes
- Severity: critical
- Description: "Service {{service_name}} has error rate >5%"

Alert 2: High Latency
- Name: HighLatency
- Query: P99_latency > 3000ms for 5 minutes
- Severity: warning
- Description: "Service {{service_name}} P99 latency >3s"

Alert 3: Service Down
- Name: ServiceDown
- Query: request_rate == 0 for 2 minutes
- Severity: critical
- Description: "Service {{service_name}} not receiving requests"

Alert 4: Database Connection Issues
- Name: DatabaseConnectionsHigh
- Query: pg_active_connections > 80 for 5 minutes
- Severity: warning
- Description: "Database {{database}} connection count >80%"

Alert 5: High Memory Usage
- Name: HighMemoryUsage
- Query: container_memory_percent > 85 for 5 minutes
- Severity: warning
- Description: "Pod {{pod_name}} using >85% memory"
Test Alert Delivery
# Method 1: Create a test alert in SigNoz UI
# Go to Alerts → New Alert → Set a test condition that will fire
# Method 2: Fire a test alert via stress test
kubectl run memory-test --image=polinux/stress --restart=Never \
--namespace=bakery-ia -- stress --vm 1 --vm-bytes 600M --timeout 300s
# Check alert appears in SigNoz Alerts tab
# https://monitoring.bakewise.ai/signoz → Alerts
# Also check AlertManager
# https://monitoring.bakewise.ai/alertmanager
# Verify email notification received
# Clean up test
kubectl delete pod memory-test -n bakery-ia
Configure Notification Channels
In the SigNoz Alerts tab, configure channels:
1. Email Channel:
- Already configured via AlertManager
- Emails are sent to the addresses in signoz-values-prod.yaml
2. Slack Channel (Optional):
- Add the Slack webhook URL to signoz-values-prod.yaml under alertmanager.config.receivers.critical-alerts.slack_configs:
- api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
- channel: '#alerts-critical'
3. Webhook Channel (Optional):
- Configure a custom webhook for integration with PagerDuty, OpsGenie, etc.
- Add it to alertmanager.config.receivers
Step 4: Configure Backups
# Create backup script on VPS
cat > ~/backup-databases.sh <<'EOF'
#!/bin/bash
BACKUP_DIR="/backups/$(date +%Y-%m-%d)"
mkdir -p "$BACKUP_DIR"
# Get all database pods
DBS=$(kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database -o name)
for db in $DBS; do
  DB_NAME=$(echo "$db" | cut -d'/' -f2)
  echo "Backing up $DB_NAME..."
  kubectl exec -n bakery-ia "$db" -- pg_dump -U postgres > "$BACKUP_DIR/${DB_NAME}.sql"
done
# Compress backups
tar -czf "$BACKUP_DIR.tar.gz" "$BACKUP_DIR"
rm -rf "$BACKUP_DIR"
# Keep only last 7 days
find /backups -name "*.tar.gz" -mtime +7 -delete
echo "Backup completed: $BACKUP_DIR.tar.gz"
EOF
chmod +x ~/backup-databases.sh
# Test backup
~/backup-databases.sh
# Setup daily cron job (2 AM)
(crontab -l 2>/dev/null; echo "0 2 * * * ~/backup-databases.sh") | crontab -
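The retention rule in the script (`find ... -mtime +7 -delete`) can be rehearsed safely in a temp directory before trusting it with real backups. This assumes GNU `touch` for the `-d` flag:

```shell
# Demonstrate the 7-day retention rule in an isolated temp directory
DEMO=$(mktemp -d)
touch -d '10 days ago' "$DEMO/old.tar.gz"   # stale archive
touch "$DEMO/new.tar.gz"                    # fresh archive

# Same expression as the backup script: delete archives older than 7 days
find "$DEMO" -name '*.tar.gz' -mtime +7 -delete

ls "$DEMO"   # only new.tar.gz should remain
```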
Step 5: Verify SigNoz Monitoring is Working
Before proceeding, ensure all monitoring components are operational:
# 1. Verify SigNoz pods are running
kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=signoz
# Expected pods (all should be Running/Ready):
# - signoz-0 (or signoz-1, signoz-2 for HA)
# - signoz-otel-collector-xxx
# - signoz-alertmanager-xxx
# - signoz-clickhouse-xxx
# - signoz-zookeeper-xxx
# 2. Check SigNoz UI is accessible
curl -I https://monitoring.bakewise.ai/signoz
# Should return: HTTP/2 200 OK
# 3. Verify OTel Collector is receiving data
kubectl logs -n bakery-ia deployment/signoz-otel-collector --tail=100 | grep -i "received"
# Should show: "Traces received: X" "Metrics received: Y" "Logs received: Z"
# 4. Check ClickHouse database is healthy
kubectl exec -n bakery-ia deployment/signoz-clickhouse -- clickhouse-client --query="SELECT count() FROM system.tables WHERE database LIKE 'signoz_%'"
# Should return a number > 0 (tables exist)
Complete Verification Checklist:
- SigNoz UI loads at https://monitoring.bakewise.ai/signoz
- Services tab shows all 18 microservices with metrics
- Traces tab has sample traces from gateway and other services
- Dashboards tab shows PostgreSQL metrics from all 14 databases
- Dashboards tab shows Redis metrics (memory, commands, etc.)
- Dashboards tab shows RabbitMQ metrics (queues, messages)
- Dashboards tab shows Kubernetes metrics (nodes, pods)
- Logs tab displays logs from all services in bakery-ia namespace
- Alerts tab is accessible and can create new alerts
- AlertManager is reachable at https://monitoring.bakewise.ai/alertmanager
If any checks fail, troubleshoot:
# Check OTel Collector configuration
kubectl describe configmap -n bakery-ia signoz-otel-collector
# Check for errors in OTel Collector
kubectl logs -n bakery-ia deployment/signoz-otel-collector | grep -i error
# Check ClickHouse is accepting writes
kubectl logs -n bakery-ia deployment/signoz-clickhouse | grep -i error
# Restart OTel Collector if needed
kubectl rollout restart deployment/signoz-otel-collector -n bakery-ia
Step 6: Document Everything
Create a secure runbook with all credentials and procedures:
Essential Information to Document:
- VPS login credentials (stored securely in password manager)
- Database passwords (in password manager)
- SigNoz admin password
- Domain registrar access (for bakewise.ai)
- Cloudflare access
- Email service credentials (SMTP)
- WhatsApp API credentials
- Docker Hub / Registry credentials
- Emergency contact information
- Rollback procedures
- Monitoring URLs and access procedures
Step 7: Train Your Team
Conduct a training session covering SigNoz and operational procedures:
Part 1: SigNoz Navigation (30 minutes)
1. Login and Overview
- Show how to access https://monitoring.bakewise.ai/signoz
- Navigate through the main tabs: Services, Traces, Dashboards, Logs, Alerts
- Explain the unified nature of SigNoz (all-in-one platform)
2. Services Tab - Application Performance Monitoring
- Show all 18 microservices
- Explain RED metrics (Request rate, Error rate, Duration/latency)
- Demo: Click on a service → Operations → See the endpoint breakdown
- Demo: Identify slow endpoints and high error rates
3. Traces Tab - Request Flow Debugging
- Show how to search for traces by service, operation, or time
- Demo: Click on a trace → See the full waterfall (service → database → cache)
- Demo: Find slow database queries in trace spans
- Demo: Click "View Logs" to correlate a trace with logs
4. Dashboards Tab - Infrastructure Monitoring
- Navigate to the PostgreSQL dashboard → Show all 14 databases
- Navigate to the Redis dashboard → Show cache metrics
- Navigate to the Kubernetes dashboard → Show node/pod metrics
- Explain which metrics indicate issues (connection %, memory %, etc.)
5. Logs Tab - Log Search and Analysis
- Show how to filter by service, severity, and time range
- Demo: Search for "error" in the last hour
- Demo: Click on a trace_id in a log → Jump to the related trace
- Show Kubernetes metadata (pod, namespace, container)
6. Alerts Tab - Proactive Monitoring
- Show how to create alerts on metrics
- Review pre-configured alerts
- Show alert history and firing alerts
- Explain how to acknowledge/silence alerts
Part 2: Operational Tasks (30 minutes)
1. Check application logs (multiple ways)
# Method 1: Via kubectl (for immediate debugging)
kubectl logs -n bakery-ia deployment/orders-service --tail=100 -f
# Method 2: Via the SigNoz Logs tab (for analysis and correlation)
# 1. Open https://monitoring.bakewise.ai/signoz → Logs
# 2. Filter by k8s_deployment_name: orders-service
# 3. Click on a trace_id to see the related request flow
2. Restart services when needed
# Restart a service (rolling update, no downtime)
kubectl rollout restart deployment/orders-service -n bakery-ia
# Verify the restart in SigNoz:
# 1. Services tab → orders-service → should show a brief dip, then recovery
# 2. Logs tab → filter by orders-service → see the restart logs
3. Investigate performance issues
# Scenario: "Orders API is slow"
# 1. SigNoz → Services → orders-service → check P99 latency
# 2. SigNoz → Traces → filter service:orders-service, duration:>1s
# 3. Click on a slow trace → identify the bottleneck (DB query? External API?)
# 4. SigNoz → Dashboards → PostgreSQL → check orders_db connections/queries
# 5. Fix the identified issue (add an index, optimize the query, scale the service)
4. Respond to alerts
- Show how to access alerts in SigNoz → Alerts tab
- Show the AlertManager UI at https://monitoring.bakewise.ai/alertmanager
- Review common alerts and their resolution steps
- Reference the Production Operations Guide
Part 3: Documentation and Resources (10 minutes)
1. Share documentation
- PILOT_LAUNCH_GUIDE.md - This guide (deployment)
- PRODUCTION_OPERATIONS_GUIDE.md - Daily operations with SigNoz
- security-checklist.md - Security procedures
2. Bookmark key URLs
- SigNoz: https://monitoring.bakewise.ai/signoz
- AlertManager: https://monitoring.bakewise.ai/alertmanager
- Production app: https://bakewise.ai
3. Set up an on-call rotation (if applicable)
- Configure the rotation schedule in AlertManager
- Document escalation procedures
- Test alert delivery to the on-call phone/email
Part 4: Hands-On Exercise (15 minutes)
Exercise: Investigate a Simulated Issue
- Create a load test to generate traffic
- Use SigNoz to find the slowest endpoint
- Identify the root cause using traces
- Correlate with logs to confirm
- Check infrastructure metrics (DB, memory, CPU)
- Propose a fix based on findings
This trains the team to use SigNoz effectively for real incidents.
Troubleshooting
Issue: Pods Not Starting
# Check pod status
kubectl describe pod POD_NAME -n bakery-ia
# Common causes:
# 1. Image pull errors
kubectl get events -n bakery-ia | grep -i "pull"
# 2. Resource limits
kubectl describe node
# 3. Volume mount issues
kubectl get pvc -n bakery-ia
Issue: Certificate Not Issuing
# Check certificate status
kubectl describe certificate bakery-ia-prod-tls-cert -n bakery-ia
# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager
# Check challenges
kubectl get challenges -n bakery-ia
# Verify DNS is correct
nslookup bakery.yourdomain.com
Issue: Database Connection Errors
# Check database pod
kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database
# Check database logs
kubectl logs -n bakery-ia deployment/auth-db
# Test connection from service pod
kubectl exec -n bakery-ia deployment/auth-service -- nc -zv auth-db 5432
Issue: Services Can't Connect to Databases
# Check if SSL is enabled
kubectl exec -n bakery-ia deployment/auth-db -- sh -c \
'psql -U auth_user -d auth_db -c "SHOW ssl;"'
# Check service logs for SSL errors
kubectl logs -n bakery-ia deployment/auth-service | grep -i "ssl\|tls"
# Restart service to pick up new SSL config
kubectl rollout restart deployment/auth-service -n bakery-ia
Issue: Out of Resources
# Check node resources
kubectl top nodes
# Check pod resource usage
kubectl top pods -n bakery-ia
# Identify resource hogs
kubectl top pods -n bakery-ia --sort-by=memory
# Scale down non-critical services temporarily
kubectl scale deployment monitoring -n bakery-ia --replicas=0
Next Steps After Successful Launch
1. Monitor for 48 Hours
- Check dashboards daily
- Review error logs
- Monitor resource usage
- Test all functionality
2. Optimize Based on Metrics
- Adjust resource limits if needed
- Fine-tune autoscaling thresholds
- Optimize database queries if slow
3. Onboard the First Tenant
- Create a test tenant
- Upload sample data
- Test all features
- Gather feedback
4. Scale Gradually
- Add 1-2 tenants at a time
- Monitor resource usage
- Upgrade the VPS if needed (see the scaling guide)
5. Plan for Growth
- Review PRODUCTION_OPERATIONS_GUIDE.md
- Implement additional monitoring
- Plan capacity upgrades
- Consider managed services at scale
Cost Scaling Path
| Tenants | RAM | CPU | Storage | Monthly Cost |
|---|---|---|---|---|
| 10 | 20 GB | 8 cores | 200 GB | €40-80 |
| 25 | 32 GB | 12 cores | 300 GB | €80-120 |
| 50 | 48 GB | 16 cores | 500 GB | €150-200 |
| 100+ | — | — | — | €300+ (consider a multi-node cluster or managed K8s) |
Support Resources
Documentation:
- Operations Guide: PRODUCTION_OPERATIONS_GUIDE.md - Daily operations, monitoring, incident response
- Security Guide: security-checklist.md - Security procedures and compliance
- Database Security: database-security.md - Database operations and TLS configuration
- TLS Configuration: tls-configuration.md - Certificate management
- RBAC Implementation: rbac-implementation.md - Access control
Monitoring Access:
- SigNoz (Primary): https://monitoring.bakewise.ai/signoz - All-in-one observability
- Services: Application performance monitoring (APM)
- Traces: Distributed tracing across all services
- Dashboards: PostgreSQL, Redis, RabbitMQ, Kubernetes metrics
- Logs: Centralized log management with trace correlation
- Alerts: Alert configuration and management
- AlertManager: https://monitoring.bakewise.ai/alertmanager - Alert routing and notifications
External Resources:
- MicroK8s Docs: https://microk8s.io/docs
- Kubernetes Docs: https://kubernetes.io/docs
- Let's Encrypt: https://letsencrypt.org/docs
- Cloudflare DNS: https://developers.cloudflare.com/dns
- SigNoz Documentation: https://signoz.io/docs/
- OpenTelemetry Documentation: https://opentelemetry.io/docs/
Monitoring Architecture:
- OpenTelemetry: Industry-standard instrumentation framework
- Auto-instruments FastAPI, HTTPX, SQLAlchemy, Redis
- Collects traces, metrics, and logs from all services
- Exports to SigNoz via OTLP protocol (gRPC port 4317, HTTP port 4318)
- SigNoz Components:
- Frontend: Web UI for visualization and analysis
- OTel Collector: Receives and processes telemetry data
- ClickHouse: Time-series database for fast queries
- AlertManager: Alert routing and notification delivery
- Zookeeper: Coordination service for ClickHouse cluster
Summary Checklist
Pre-Deployment Configuration (LOCAL MACHINE)
- Production secrets configured - ✅ JWT, database passwords, API keys (ALREADY DONE)
- External service credentials - Update SMTP, WhatsApp, Stripe in secrets.yaml
- imagePullSecrets removed - Delete from all 67 manifests
- Image tags updated - Change all 'latest' to v1.0.0 (semantic version)
- SigNoz namespace fixed - ✅ Already done (bakery-ia namespace)
- Cert-manager email updated - ✅ Already set to admin@bakewise.ai
- Stripe publishable key updated - Replace pk_test_... with the production key in configmap.yaml
- Pilot mode verified - ✅ VITE_PILOT_MODE_ENABLED=true (default is correct)
- Manifests validated - No 'latest' tags, no imagePullSecrets remaining
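The "Manifests validated" item can be spot-checked mechanically. A sketch that greps a manifest tree for leftover ':latest' tags or imagePullSecrets entries; here a temp dir with one clean sample manifest stands in for the repo, and in practice you would point it at your manifests directory:

```shell
# Temp dir with a sample manifest stands in for the real manifest tree
MANIFESTS=$(mktemp -d)
cat > "$MANIFESTS/deploy.yaml" <<'EOF'
image: registry.example.com/auth-service:v1.0.0
EOF

# List files containing forbidden patterns; a clean tree yields zero
BAD=$(grep -rl ':latest\|imagePullSecrets' "$MANIFESTS" | wc -l)
echo "files with forbidden patterns: $BAD"
```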
Infrastructure Setup
- VPS provisioned and accessible
- k3s (or Kubernetes) installed and configured
- nginx-ingress-controller installed
- metrics-server installed and working
- cert-manager installed
- local-path-provisioner installed
- Domain registered and DNS configured
- Cloudflare protection enabled (optional but recommended)
Secrets and Configuration
- TLS certificates generated (postgres, redis)
- Email service configured and tested
- WhatsApp API setup (optional for launch)
- Container images built and pushed with version tags
- Production configs verified (domains, CORS, storage class)
- Strong passwords generated for all services
- Docker registry secret created (dockerhub-creds)
- Application secrets applied
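The `dockerhub-creds` secret is typically created with `kubectl create secret docker-registry`; under the hood that command just stores a base64-encoded `.dockerconfigjson`. The sketch below reproduces that payload so you can audit exactly what ends up in the cluster — the username and token values are placeholders.

```python
import base64
import json

def dockerconfigjson(registry: str, username: str, password: str) -> str:
    """Build the .dockerconfigjson payload stored by kubernetes.io/dockerconfigjson secrets."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {"auths": {registry: {"username": username, "password": password, "auth": auth}}}
    return json.dumps(config)

# Roughly equivalent to:
#   kubectl create secret docker-registry dockerhub-creds \
#     --docker-server=https://index.docker.io/v1/ \
#     --docker-username=<user> --docker-password=<access-token>
payload = dockerconfigjson("https://index.docker.io/v1/", "bakery-bot", "example-token")
```

Prefer a Docker Hub access token over your account password for this secret.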
Monitoring
- SigNoz deployed via Helm
- SigNoz pods running and healthy
- SigNoz in bakery-ia namespace
CI/CD Infrastructure (Optional)
- Gitea deployed and accessible
- Gitea admin user created
- Repository created and code pushed
- Tekton Pipelines installed
- Tekton Triggers configured
- Tekton Helm chart deployed
- Webhook configured in Gitea
- Flux CD installed
- GitRepository and Kustomization configured
- End-to-end pipeline test successful
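When the end-to-end pipeline test fails at the webhook step, it helps to verify signatures by hand. Gitea signs the raw JSON payload with HMAC-SHA256 using the webhook secret and sends the hex digest in the `X-Gitea-Signature` header; a minimal check (a debugging sketch, not production code) looks like:

```python
import hashlib
import hmac

def verify_gitea_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Recompute Gitea's HMAC-SHA256 webhook signature and compare in constant time."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```

If this returns False for a captured request, the webhook secret in Gitea and the one in your Tekton Triggers interceptor have drifted apart.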
Email Infrastructure (Optional - Mailu)
- CoreDNS configured with DNS-over-TLS for DNSSEC
- Mailu TLS certificate created
- Mailu deployed via Helm
- Admin user created
- DKIM record added to DNS
- Email sending/receiving tested
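For reference, the DNS records behind the DKIM/SPF/DMARC items look roughly like the zone-file fragment below. The `dkim` selector name and the public key are placeholders — copy the actual values Mailu generates in its admin UI.

```
; Hypothetical zone-file fragment for bakewise.ai (placeholder values)
bakewise.ai.                  IN TXT  "v=spf1 mx -all"
dkim._domainkey.bakewise.ai.  IN TXT  "v=DKIM1; k=rsa; p=<public-key-from-mailu-admin>"
_dmarc.bakewise.ai.           IN TXT  "v=DMARC1; p=quarantine; rua=mailto:admin@bakewise.ai"
```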
Geocoding (Optional - Nominatim)
- Nominatim deployed
- Health check passing
- Application configured to use Nominatim
Application Deployment
- All pods running successfully
- Databases accepting TLS connections
- Let's Encrypt certificates issued
- Frontend accessible via HTTPS
- API health check passing
- Test user can login
- Email delivery working
- SigNoz monitoring accessible
- Metrics flowing to SigNoz
- Pilot coupon verified - Check tenant-service logs for "Pilot coupon created successfully"
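A quick post-deploy smoke test can loop over the public endpoints and fail fast on any non-200 response. This is a hedged sketch: the endpoint list assumes the hostnames used in this guide, and the `opener` parameter exists only to make the function testable offline.

```python
import urllib.request

ENDPOINTS = [
    "https://bakewise.ai",             # frontend
    "https://api.bakewise.ai/health",  # API health check (assumed path)
    "https://monitoring.bakewise.ai",  # SigNoz UI
]

def smoke_test(urls, opener=urllib.request.urlopen) -> dict:
    """Return {url: ok} where ok means an HTTP 200 within 10 seconds."""
    results = {}
    for url in urls:
        try:
            with opener(url, timeout=10) as resp:
                results[url] = resp.status == 200
        except Exception:
            results[url] = False
    return results

if __name__ == "__main__":
    for url, ok in smoke_test(ENDPOINTS).items():
        print(("OK  " if ok else "FAIL"), url)
```

Any FAIL line points you at the ingress, certificate, or pod to inspect first.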
Post-Deployment
- Backups configured and tested
- Team trained on operations
- Documentation complete
- Emergency procedures documented
- Monitoring alerts configured
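As a starting point for the backup item, a Kubernetes CronJob can run `pg_dump` nightly against one of the databases. Every name below (service, secret, PVC) is an assumption — align them with your actual manifests, and extend the pattern to all 14 databases.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup-auth
  namespace: bakery-ia
spec:
  schedule: "0 3 * * *"            # nightly at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:16   # match your server version
              command: ["/bin/sh", "-c"]
              args:
                - pg_dump -h auth-db -U postgres auth > /backups/auth-$(date +%F).sql
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: auth-db-credentials   # hypothetical secret name
                      key: password
              volumeMounts:
                - name: backups
                  mountPath: /backups
          volumes:
            - name: backups
              persistentVolumeClaim:
                claimName: backups-pvc            # hypothetical PVC
```

Test a restore from one of these dumps at least once before the pilot starts; an unverified backup is not a backup.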
🎉 Congratulations! Your Bakery-IA platform is now live in production!
Estimated total time: 2-4 hours for first deployment. Subsequent updates: 15-30 minutes.
Document Version: 3.0 Last Updated: 2026-01-21 Maintained By: DevOps Team
Changes in v3.0:
- NEW: Infrastructure Architecture Overview - Added component layers diagram and deployment dependencies
- NEW: CI/CD Infrastructure Deployment - Complete guide for Gitea, Tekton, and Flux CD
- Step-by-step Gitea installation with container registry
- Tekton Pipelines and Triggers setup via Helm
- Flux CD GitOps configuration
- Webhook integration and end-to-end testing
- Troubleshooting guide for CI/CD issues
- NEW: Mailu Email Server Deployment - Comprehensive self-hosted email setup
- CoreDNS configuration with DNS-over-TLS for DNSSEC validation
- Mailu Helm deployment with all components
- DKIM/SPF/DMARC configuration
- Troubleshooting common Mailu issues
- NEW: Nominatim Geocoding Service - Address lookup service deployment
- NEW: SigNoz Monitoring Deployment - Dedicated section (previously embedded)
- UPDATED: Table of Contents - Reorganized with new sections (18 sections total)
- UPDATED: Summary Checklist - Added CI/CD, Email, and Geocoding verification items
- UPDATED: Infrastructure Components Summary - Added all optional components with namespaces
Changes in v2.1:
- Updated DNS configuration for Namecheap (primary) with Cloudflare as optional
- Clarified MicroK8s ingress class is `public` (not `nginx`)
- Updated Let's Encrypt ClusterIssuer documentation to reference pre-configured files
- Added firewall requirements for clouding.io VPS
- Emphasized port 80/443 requirements for HTTP-01 challenges
Changes in v2.0:
- Added critical pre-deployment fixes section
- Updated infrastructure setup for MicroK8s
- Added required component installation (nginx-ingress, metrics-server, etc.)
- Updated configuration steps with domain replacement
- Added Docker registry secret creation
- Added SigNoz Helm deployment before application
- Updated storage class configuration
- Added image tag version requirements
- Expanded verification checklist