Bakery-IA Production Deployment Guide

Complete guide for deploying Bakery-IA to production on a MicroK8s cluster

Version: 4.0
Last Updated: 2026-01-21
Target Environment: VPS with MicroK8s (Ubuntu 22.04 LTS)
Estimated Deployment Time: 3-5 hours (first-time deployment)
Monthly Cost: ~€41-81 (10-tenant pilot)

Table of Contents

  1. Quick Start Overview
  2. Prerequisites
  3. Phase 0: Transfer Infrastructure Code to Server
  4. Phase 1: VPS Setup & MicroK8s Installation
  5. Phase 2: Domain & DNS Configuration
  6. Phase 3: Deploy Foundation Layer
  7. Phase 4: Deploy CI/CD Infrastructure
  8. Phase 5: Pre-Pull and Push Base Images to Gitea Registry
  9. Phase 6: Deploy Application Services
  10. Phase 7: Deploy Optional Services
  11. Phase 8: Verification & Validation
  12. Post-Deployment Operations
  13. Troubleshooting Guide
  14. Reference & Resources

Quick Start Overview

What You're Deploying

A complete multi-tenant SaaS platform consisting of:

| Component | Details |
|-----------|---------|
| Microservices | 18 Python/FastAPI services |
| Databases | 18 PostgreSQL instances with TLS |
| Cache | Redis with TLS |
| Message Broker | RabbitMQ |
| Object Storage | MinIO (S3-compatible) |
| Email | Mailu (self-hosted) with Mailgun relay |
| Monitoring | SigNoz (unified observability) |
| CI/CD | Gitea + Tekton + Flux CD |
| Security | TLS everywhere, RBAC, Network Policies |

Infrastructure Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                         LAYER 6: APPLICATION                                │
│  Frontend │ Gateway │ 18 Microservices │ CronJobs & Workers                │
├─────────────────────────────────────────────────────────────────────────────┤
│                         LAYER 5: MONITORING                                 │
│  SigNoz (Unified Observability) │ AlertManager │ OTel Collector            │
├─────────────────────────────────────────────────────────────────────────────┤
│                         LAYER 4: PLATFORM SERVICES (Optional)              │
│  Mailu (Email) │ Nominatim (Geocoding) │ CI/CD (Tekton, Flux, Gitea)      │
├─────────────────────────────────────────────────────────────────────────────┤
│                         LAYER 3: DATA & STORAGE                             │
│  PostgreSQL (18 DBs) │ Redis │ RabbitMQ │ MinIO                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                         LAYER 2: NETWORK & SECURITY                         │
│  Unbound DNS │ CoreDNS │ Ingress Controller │ Cert-Manager │ TLS          │
├─────────────────────────────────────────────────────────────────────────────┤
│                         LAYER 1: FOUNDATION                                 │
│  Namespaces │ Storage Classes │ RBAC │ ConfigMaps │ Secrets               │
├─────────────────────────────────────────────────────────────────────────────┤
│                         LAYER 0: KUBERNETES CLUSTER                         │
│  MicroK8s (Production) │ Kind (Local Dev) │ EKS (AWS Alternative)          │
└─────────────────────────────────────────────────────────────────────────────┘

Deployment Order (Critical)

Components must be deployed in this order due to dependencies:

Phase 0: Transfer code to server (bootstrap)
    ↓
Phase 1: MicroK8s + Addons
    ↓
Phase 2: DNS + Domain configuration
    ↓
Phase 3: Foundation (Namespaces, Cert-Manager, TLS)
    ↓
Phase 4: CI/CD (Gitea → Tekton → Flux)
    ↓
Phase 5: Base & Service Images (Pre-pull base images, build service images)
    ↓
Phase 6: Application Services (21 service images, including Gateway and Frontend, + Data Layer)
    ↓
Phase 7: Optional (Mailu, SigNoz, Nominatim)
    ↓
Phase 8: Verification & Validation

Note on First Deployment: For the first deployment, you must manually build and push service images (Phase 5, Step 5.5) before applying the production kustomization. After the first deployment, the CI/CD pipeline will automatically build and push images on subsequent commits.

Cost Breakdown

| Service | Provider | Monthly Cost |
|---------|----------|--------------|
| VPS (20GB RAM, 8 vCPU, 200GB SSD) | clouding.io | €40-80 |
| Domain | Namecheap/Cloudflare | ~€1.25 (€15/year) |
| Email Relay | Mailgun (free tier) | €0 |
| SSL Certificates | Let's Encrypt | €0 |
| DNS | Cloudflare | €0 |
| Total | | €41-81/month |

Prerequisites

System Requirements

| Requirement | Specification |
|-------------|---------------|
| OS | Ubuntu 22.04 LTS |
| RAM | Minimum 16GB (20GB recommended) |
| CPU | 8 vCPU cores |
| Storage | 200GB NVMe SSD |
| Network | Static public IP, 1 Gbps |

Required Accounts

  • VPS Provider (clouding.io, Hetzner, DigitalOcean, etc.)
  • Domain Registrar (Namecheap, Cloudflare, etc.)
  • Cloudflare Account (recommended for DNS)
  • Mailgun Account (for email relay, optional)
  • Stripe Account (for payments)

Local Machine Requirements

# Verify these tools are installed:
kubectl version --client    # Kubernetes CLI
docker --version            # Container runtime
git --version               # Version control
ssh -V                      # SSH client
helm version                # Helm package manager
openssl version             # TLS utilities

# Install if missing (macOS):
brew install kubectl docker git helm openssl

# Install if missing (Ubuntu):
sudo apt install -y docker.io git openssl
sudo snap install kubectl --classic
sudo snap install helm --classic
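
The tool checks above can be wrapped in a small preflight helper so that a missing binary is reported before the deployment starts. This is a minimal sketch; the function name `check_tools` is hypothetical and not part of the Bakery-IA repository.

```shell
# Report which required CLI tools are missing from PATH.
# Returns 0 when every tool is found, non-zero otherwise.
check_tools() {
  missing_count=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "MISSING: $tool"
      missing_count=$((missing_count + 1))
    fi
  done
  [ "$missing_count" -eq 0 ]
}

# Example with the tools listed in this section:
# check_tools kubectl docker git ssh helm openssl
```

The helper only checks that each command exists on PATH; it does not validate versions.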

Set up SSH config for easier access:

# Create/edit ~/.ssh/config
cat >> ~/.ssh/config << 'EOF'
Host bakery-vps
    HostName 200.234.233.87
    User root
    IdentityFile ~/.ssh/bakewise.pem
    IdentitiesOnly yes
EOF

# Set proper permissions on key
chmod 600 ~/.ssh/bakewise.pem

# Test connection
ssh bakery-vps

Phase 0: Transfer Infrastructure Code to Server

Problem: You need the infrastructure code on the server to deploy Gitea, but Gitea is your target repository.

This is the bootstrap approach: transfer the code directly, then push it to Gitea once Gitea is running.

Option 1: rsync Transfer

# From your LOCAL machine - transfer entire repository
rsync -avz --progress \
  --exclude='.git' \
  --exclude='node_modules' \
  --exclude='__pycache__' \
  --exclude='.venv' \
  --exclude='*.pyc' \
  /Users/urtzialfaro/Documents/bakery-ia/ \
  bakery-vps:/root/bakery-ia/

# Verify transfer
ssh bakery-vps "ls -la /root/bakery-ia/infrastructure/"

Option 2: SCP Tarball Transfer

# Create a tarball locally (excludes unnecessary files)
cd /Users/urtzialfaro/Documents/bakery-ia
tar -czvf /tmp/bakery-ia-infra.tar.gz \
  --exclude='.git' \
  --exclude='node_modules' \
  --exclude='__pycache__' \
  --exclude='.venv' \
  infrastructure/ \
  PRODUCTION_DEPLOYMENT_GUIDE.md \
  docs/

# Transfer to server
scp /tmp/bakery-ia-infra.tar.gz bakery-vps:/root/

# On server - extract
ssh bakery-vps "cd /root && tar -xzvf bakery-ia-infra.tar.gz"

Option 3: Temporary GitHub/GitLab (If Needed)

Use if rsync/scp are not available:

  1. Push to a temporary private GitHub/GitLab repo
  2. Clone on the server
  3. After Gitea is running, migrate the repo to Gitea
  4. Delete the temporary remote repo

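After Transfer - Push to Gitea (Post Phase 4)

Once Gitea is deployed (Phase 4, Step 4.1), push the full repo. Step 4.2 covers this in detail, including the git identity setup and the force push that overwrites the init job's content.

# On the SERVER after Gitea is running
cd /root/bakery-ia
git init
git branch -m main   # git init may create 'master' by default
git add .
git commit -m "Initial commit - production deployment"
git remote add origin https://gitea.bakewise.ai/bakery-admin/bakery-ia.git
git push -u origin main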

Phase 1: VPS Setup & MicroK8s Installation

Step 1.1: Initial Server Setup

# SSH into your VPS
ssh bakery-vps

# Update system
apt update && apt upgrade -y

# Set hostname
hostnamectl set-hostname bakery-ia-prod

# Install essential tools
apt install -y curl wget git jq openssl

Step 1.2: Install MicroK8s

# Install MicroK8s (stable channel)
snap install microk8s --classic --channel=1.28/stable

# Add user to microk8s group
usermod -a -G microk8s $USER
chown -f -R $USER ~/.kube
newgrp microk8s

# Wait for MicroK8s to be ready
microk8s status --wait-ready

Step 1.3: Enable Required Addons

# Enable core addons (in order)
microk8s enable dns                  # DNS resolution
microk8s enable hostpath-storage     # Storage provisioner
microk8s enable ingress              # NGINX ingress (class: "public")
microk8s enable cert-manager         # Let's Encrypt certificates
microk8s enable metrics-server       # HPA autoscaling
microk8s enable rbac                 # Role-based access control

# Optional but recommended
microk8s enable prometheus           # Metrics collection (newer MicroK8s releases ship this as the "observability" addon)

# Setup kubectl alias
echo "alias kubectl='microk8s kubectl'" >> ~/.bashrc
source ~/.bashrc

# Verify installation
kubectl get nodes                    # Should show: Ready
kubectl get storageclass             # Should show: microk8s-hostpath (default)
kubectl get pods -A                  # All pods should be Running

Step 1.4: Configure kubectl Access

# Create kubectl config
mkdir -p ~/.kube
microk8s config > ~/.kube/config
chmod 600 ~/.kube/config

# Test cluster connectivity
kubectl cluster-info
kubectl top nodes

Step 1.5: Install Helm

# Install Helm 3
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Verify installation
helm version

Step 1.6: Configure Firewall (Optional)

Skip this step if: Your VPS provider already has firewall rules configured in their dashboard with ports 22, 80, 443 open. Most providers (clouding.io, Hetzner, etc.) manage this at the infrastructure level.

Required ports for Bakery-IA:

| Port | Protocol | Purpose |
|------|----------|---------|
| 22 | TCP | SSH access |
| 80 | TCP | HTTP (Let's Encrypt ACME challenges) |
| 443 | TCP | HTTPS (application access) |
| 25, 465, 587 | TCP | SMTP/SMTPS (if using Mailu) |
| 143, 993 | TCP | IMAP/IMAPS (if using Mailu) |

Only if using UFW on the server:

# Allow necessary ports
ufw allow 22/tcp      # SSH
ufw allow 80/tcp      # HTTP (required for Let's Encrypt)
ufw allow 443/tcp     # HTTPS

# Enable firewall (if not already enabled)
ufw enable

# Verify
ufw status verbose
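
The UFW commands above open only ports 22, 80, and 443, while the port table also lists mail ports for Mailu. A possible way to open those as well is sketched below. The function name `allow_tcp_ports` and the `UFW_CMD` override are hypothetical helpers for illustration, not part of the repository.

```shell
# Open a list of TCP ports with UFW.
# UFW_CMD is a hypothetical override: set UFW_CMD="echo ufw" to preview
# the commands without changing any firewall rules.
allow_tcp_ports() {
  for port in "$@"; do
    ${UFW_CMD:-ufw} allow "${port}/tcp"
  done
}

# Mailu ports from the table in this step (SMTP/SMTPS and IMAP/IMAPS):
# allow_tcp_ports 25 465 587 143 993
```

Running it first with `UFW_CMD="echo ufw"` shows exactly which rules would be added.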

Phase 2: Domain & DNS Configuration

Step 2.1: DNS Records Configuration

Add these DNS records pointing to your VPS IP (200.234.233.87):

| Type | Name | Value | TTL |
|------|------|-------|-----|
| A | @ | 200.234.233.87 | Auto |
| A | www | 200.234.233.87 | Auto |
| A | mail | 200.234.233.87 | Auto |
| A | monitoring | 200.234.233.87 | Auto |
| A | gitea | 200.234.233.87 | Auto |
| A | registry | 200.234.233.87 | Auto |
| A | api | 200.234.233.87 | Auto |
| MX | @ | mail.bakewise.ai | 10 |
| TXT | @ | v=spf1 mx a -all | Auto |
| TXT | _dmarc | v=DMARC1; p=reject; rua=mailto:admin@bakewise.ai | Auto |

Step 2.2: Verify DNS Propagation

# Test DNS resolution (wait 5-10 minutes after changes)
dig bakewise.ai +short
dig www.bakewise.ai +short
dig mail.bakewise.ai +short
dig gitea.bakewise.ai +short

# Check MX records
dig bakewise.ai MX +short

# Use online tools for comprehensive check:
# https://dnschecker.org/
# https://mxtoolbox.com/
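
Checking each `dig` answer by eye is error-prone when several hostnames must point at the same IP. The sketch below compares every A record against the expected address. The function names `resolve_a` and `check_dns` are hypothetical, and the helper assumes `dig` is installed.

```shell
# Resolve the A record for a hostname (the last line skips any CNAME hops).
resolve_a() {
  dig +short A "$1" | tail -n 1
}

# Compare each hostname's A record with the expected IP.
# Returns 0 only if every hostname matches.
check_dns() {
  expected_ip="$1"
  shift
  dns_failures=0
  for host in "$@"; do
    actual_ip="$(resolve_a "$host")"
    if [ "$actual_ip" = "$expected_ip" ]; then
      echo "OK       $host -> $actual_ip"
    else
      echo "MISMATCH $host -> ${actual_ip:-no answer}"
      dns_failures=$((dns_failures + 1))
    fi
  done
  [ "$dns_failures" -eq 0 ]
}

# Example (IP and hostnames taken from Step 2.1):
# check_dns 200.234.233.87 bakewise.ai www.bakewise.ai mail.bakewise.ai \
#   gitea.bakewise.ai registry.bakewise.ai api.bakewise.ai monitoring.bakewise.ai
```

A non-zero exit status usually means propagation is still in progress or a record was entered incorrectly.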

Step 2.3: Cloudflare Configuration (If Using)

If using Cloudflare for DNS:

  1. SSL/TLS Mode: Set to "Full (strict)"
  2. Proxy Status: Set to "DNS only" (orange cloud OFF) for direct IP access
  3. Edge Certificates: Let cert-manager handle certificates (not Cloudflare)

Phase 3: Deploy Foundation Layer

Step 3.1: Create Namespaces

# Apply namespace definitions using kustomize (-k flag)
kubectl apply -k infrastructure/namespaces/

# Verify
kubectl get namespaces
# Expected: bakery-ia, flux-system, tekton-pipelines

# Alternative: Apply individual namespace files directly
# kubectl apply -f infrastructure/namespaces/bakery-ia.yaml
# kubectl apply -f infrastructure/namespaces/flux-system.yaml
# kubectl apply -f infrastructure/namespaces/tekton-pipelines.yaml

Step 3.2: Install Cert-Manager and Deploy ClusterIssuers

Note: The MicroK8s cert-manager addon may only create the namespace without installing the actual components. Install cert-manager manually to ensure it works correctly.

# Check if cert-manager pods exist
kubectl get pods -n cert-manager

# If no pods are running, install cert-manager manually:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.yaml

# Wait for all cert-manager pods to be ready (this may take 1-2 minutes)
kubectl wait --for=condition=ready pod --all -n cert-manager --timeout=300s

# Verify all 3 components are running
kubectl get pods -n cert-manager
# Expected output:
# NAME                                       READY   STATUS    RESTARTS   AGE
# cert-manager-xxxxxxxxxx-xxxxx              1/1     Running   0          1m
# cert-manager-cainjector-xxxxxxxxxx-xxxxx   1/1     Running   0          1m
# cert-manager-webhook-xxxxxxxxxx-xxxxx      1/1     Running   0          1m

Deploy ClusterIssuers:

# Wait for webhook to be fully initialized
sleep 10

# Apply ClusterIssuers for Let's Encrypt
kubectl apply -f infrastructure/platform/cert-manager/cluster-issuer-staging.yaml
kubectl apply -f infrastructure/platform/cert-manager/cluster-issuer-production.yaml

# Verify ClusterIssuers are ready
kubectl get clusterissuer

# Expected output:
# NAME                     READY   AGE
# letsencrypt-production   True    1m
# letsencrypt-staging      True    1m

If you get webhook errors:

# The webhook may need more time to initialize
# Wait and retry:
sleep 30
kubectl apply -f infrastructure/platform/cert-manager/cluster-issuer-staging.yaml
kubectl apply -f infrastructure/platform/cert-manager/cluster-issuer-production.yaml
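
Instead of a fixed `sleep` followed by a manual retry, the apply can be repeated until the webhook accepts it. This is a minimal sketch of a generic retry helper; the name `retry` and its argument order are assumptions made for illustration.

```shell
# Run a command until it succeeds, up to a maximum number of attempts.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  retry_max="$1"
  retry_delay="$2"
  shift 2
  retry_n=1
  until "$@"; do
    if [ "$retry_n" -ge "$retry_max" ]; then
      echo "giving up after $retry_n attempts: $*" >&2
      return 1
    fi
    echo "attempt $retry_n failed; retrying in ${retry_delay}s" >&2
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}

# Example: try the staging ClusterIssuer up to 6 times, 10 seconds apart
# retry 6 10 kubectl apply -f infrastructure/platform/cert-manager/cluster-issuer-staging.yaml
```

The same helper could wrap other steps in this guide that depend on a component finishing its startup.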

Note: Common configs (secrets, configmaps) and TLS secrets are automatically included when you apply the prod kustomization in Phase 6. No manual application needed.


Phase 4: Deploy CI/CD Infrastructure

Step 4.1: Deploy Gitea (Git Server + Container Registry)

# Add Gitea Helm repository
helm repo add gitea https://dl.gitea.io/charts
helm repo update

# Generate and export admin password (REQUIRED for --production flag)
export GITEA_ADMIN_PASSWORD=$(openssl rand -base64 32)
echo "Gitea Admin Password: $GITEA_ADMIN_PASSWORD"
echo "⚠️  SAVE THIS PASSWORD SECURELY!"

# Run setup script - creates secrets and init job automatically
# The script will:
#   1. Create gitea namespace (if not exists)
#   2. Create gitea-admin-secret in gitea namespace
#   3. Create gitea-registry-secret in bakery-ia namespace
#   4. Apply gitea-init-job.yaml (creates bakery-ia repo)
cd /root/bakery-ia/infrastructure/cicd/gitea
chmod +x setup-admin-secret.sh
./setup-admin-secret.sh --production
cd /root/bakery-ia

# Install Gitea with production values
helm upgrade --install gitea gitea/gitea -n gitea \
  -f infrastructure/cicd/gitea/values.yaml \
  -f infrastructure/cicd/gitea/values-prod.yaml \
  --timeout 10m \
  --wait

# Wait for Gitea to be ready
kubectl wait --for=condition=ready pod -n gitea -l app.kubernetes.io/name=gitea --timeout=300s

# Verify
kubectl get pods -n gitea

# Check init job status (creates bakery-ia repository)
kubectl logs -n gitea -l app.kubernetes.io/component=init --tail=50

Step 4.2: Push Repository to Gitea

cd /root/bakery-ia

# Fix Git ownership warning (common when using rsync as different user)
git config --global --add safe.directory /root/bakery-ia

# Configure git user (required for commits)
git config --global user.email "admin@bakewise.ai"
git config --global user.name "Bakery Admin"

# Initialize repository
git init

# Rename branch to main (git init may create 'master' by default)
git branch -m main

# Add all files and commit
git add .
git commit -m "Initial commit - production deployment"

# Add remote and push (you'll need the admin password from Step 4.1)
git remote add origin https://gitea.bakewise.ai/bakery-admin/bakery-ia.git

# Force push to overwrite init job's auto-generated content
# This is safe for initial deployment - your local code is the source of truth
git push -u origin main --force

Step 4.3: Verify Registry Secret (Already Created)

Note: The registry secret gitea-registry-secret was already created by setup-admin-secret.sh in Step 4.1.

# Verify the registry secret exists
kubectl get secret gitea-registry-secret -n bakery-ia

# Expected output:
# NAME                    TYPE                             DATA   AGE
# gitea-registry-secret   kubernetes.io/dockerconfigjson   1      Xm

Step 4.4: Deploy Tekton (CI Pipelines)

# Step 1: Install Tekton Pipelines (the controller)
kubectl apply --filename https://storage.googleapis.com/tekton-releases/pipeline/latest/release.yaml

# Wait for Tekton Pipelines to be ready
kubectl wait --for=condition=ready pod -l app.kubernetes.io/part-of=tekton-pipelines -n tekton-pipelines --timeout=300s

# Step 2: Install Tekton Triggers (for webhooks)
kubectl apply --filename https://storage.googleapis.com/tekton-releases/triggers/latest/release.yaml
kubectl apply --filename https://storage.googleapis.com/tekton-releases/triggers/latest/interceptors.yaml

# Wait for Tekton Triggers to be ready
kubectl wait --for=condition=ready pod -l app.kubernetes.io/part-of=tekton-triggers -n tekton-pipelines --timeout=300s

# Verify Tekton is installed
kubectl get pods -n tekton-pipelines

# Step 3: Create flux-system namespace (required by Tekton helm chart)
# The Tekton chart creates a secret for Flux in this namespace
kubectl create namespace flux-system --dry-run=client -o yaml | kubectl apply -f -

# Step 4: Get Gitea password and generate webhook token
export GITEA_ADMIN_PASSWORD=$(kubectl get secret gitea-admin-secret -n gitea -o jsonpath='{.data.password}' | base64 -d)
export TEKTON_WEBHOOK_TOKEN=$(openssl rand -hex 32)
echo "Tekton Webhook Token: $TEKTON_WEBHOOK_TOKEN"
echo "⚠️  SAVE THIS TOKEN - needed to configure Gitea webhook!"

# Step 5: Deploy Bakery-IA CI/CD pipelines and tasks
helm upgrade --install tekton-cicd infrastructure/cicd/tekton-helm \
  -n tekton-pipelines \
  -f infrastructure/cicd/tekton-helm/values.yaml \
  -f infrastructure/cicd/tekton-helm/values-prod.yaml \
  --set secrets.webhook.token="$TEKTON_WEBHOOK_TOKEN" \
  --set secrets.registry.password="$GITEA_ADMIN_PASSWORD" \
  --set secrets.git.password="$GITEA_ADMIN_PASSWORD" \
  --timeout 5m

# Verify all components
kubectl get pods -n tekton-pipelines
kubectl get tasks -n tekton-pipelines
kubectl get pipelines -n tekton-pipelines
kubectl get eventlisteners -n tekton-pipelines

Step 4.5: Deploy Flux CD (GitOps)

# Step 1: Install Flux CLI (required for bootstrap)
curl -s https://fluxcd.io/install.sh | sudo bash

# Verify Flux CLI installation
flux --version

# Step 2: Install Flux components (controllers and CRDs)
flux install --namespace=flux-system

# Wait for Flux controllers to be ready
kubectl wait --for=condition=ready pod -l app.kubernetes.io/part-of=flux -n flux-system --timeout=300s

# Verify Flux controllers are running
kubectl get pods -n flux-system

# Step 3: Create Git credentials secret for Flux to access Gitea
export GITEA_ADMIN_PASSWORD=$(kubectl get secret gitea-admin-secret -n gitea -o jsonpath='{.data.password}' | base64 -d)

kubectl create secret generic gitea-credentials \
  --namespace=flux-system \
  --from-literal=username=bakery-admin \
  --from-literal=password="$GITEA_ADMIN_PASSWORD"

# Step 4: Deploy Bakery-IA Flux configuration (GitRepository + Kustomization)
helm upgrade --install flux-cd infrastructure/cicd/flux \
  -n flux-system \
  --timeout 5m

# Verify Flux resources
kubectl get gitrepository -n flux-system
kubectl get kustomization -n flux-system

# Check Flux sync status
flux get sources git -n flux-system
flux get kustomizations -n flux-system

Phase 5: Pre-Pull and Push Base Images to Gitea Registry

Critical Step: This phase must be completed after Gitea is configured (Phase 4) and before deploying application services (Phase 6). It ensures all required base images are available in the Gitea registry.

Overview

This phase involves two main steps:

  1. Step 5.1-5.4: Pre-pull base images from Docker Hub and push them to Gitea registry
  2. Step 5.5: Build and push all service images (first-time deployment only)

Important: MicroK8s uses containerd, not Docker. You need to install Docker separately for building and pushing images. Also, scripts need kubectl to be available in PATH.

# Step 1: Install Docker
apt-get update
apt-get install -y docker.io

# Start and enable Docker service
systemctl enable docker
systemctl start docker

# Verify Docker installation
docker --version
# Expected: Docker version 28.x.x or similar

# Step 2: Create kubectl symlink (required for scripts)
# MicroK8s bundles its own kubectl, but scripts need it in PATH
sudo ln -sf /snap/microk8s/current/microk8s-kubectl.wrapper /usr/local/bin/kubectl

# Verify kubectl works
kubectl version --client

Base Images Required

The following base images must be available in the Gitea registry:

| Category | Image | Used By |
|----------|-------|---------|
| Python Runtime | python:3.11-slim | All microservices, gateway |
| Frontend Build | node:18-alpine | Frontend build stage |
| Frontend Runtime | nginx:1.25-alpine | Frontend production server |
| Database | postgres:17-alpine | All PostgreSQL instances |
| Cache | redis:7.4-alpine | Redis cache |
| Message Broker | rabbitmq:4.1-management-alpine | RabbitMQ |
| Storage | minio/minio:RELEASE.2024-11-07T00-52-20Z | MinIO object storage |
| CI/CD | gcr.io/kaniko-project/executor:v1.23.0 | Tekton image builds |

Step 5.1: Pre-Pull Base Images and Push to Registry

# Navigate to the scripts directory
cd /root/bakery-ia/scripts

# Make the script executable
chmod +x prepull-base-images-for-prod.sh

# Run the prepull script in production mode WITH push enabled
# IMPORTANT: Use -r flag to specify the external registry URL
./prepull-base-images-for-prod.sh -e prod --push-images -r registry.bakewise.ai

# The script will:
#   1. Authenticate with Docker Hub (uses embedded credentials or env vars)
#   2. Pull all required base images from Docker Hub/GHCR
#   3. Tag them for Gitea registry (bakery-admin namespace)
#   4. Push them to the Gitea container registry
#   5. Report success/failure for each image

Alternative: Specify Custom Registry URL

# If auto-detection fails or you need a specific registry URL:
./prepull-base-images-for-prod.sh -e prod --push-images -r registry.bakewise.ai

Handle Docker Hub Rate Limits

# If you hit Docker Hub rate limits, use your own credentials:
export DOCKER_HUB_USERNAME=your_username
export DOCKER_HUB_PASSWORD=your_password_or_token
./prepull-base-images-for-prod.sh -e prod --push-images -r registry.bakewise.ai

Step 5.2: Verify Images in Gitea Registry

# Get Gitea admin password
export GITEA_ADMIN_PASSWORD=$(kubectl get secret gitea-admin-secret -n gitea -o jsonpath='{.data.password}' | base64 -d)

# Login to Gitea registry
# Note: Use registry.bakewise.ai for external access
# --password-stdin keeps the password out of the process list
echo "$GITEA_ADMIN_PASSWORD" | docker login registry.bakewise.ai -u bakery-admin --password-stdin

# List all images in the registry
curl -s -u bakery-admin:$GITEA_ADMIN_PASSWORD https://registry.bakewise.ai/v2/_catalog | jq

# Verify specific critical images exist
echo "Checking Python base image..."
curl -s -u bakery-admin:$GITEA_ADMIN_PASSWORD https://registry.bakewise.ai/v2/bakery-admin/python/tags/list | jq

echo "Checking Node.js base image..."
curl -s -u bakery-admin:$GITEA_ADMIN_PASSWORD https://registry.bakewise.ai/v2/bakery-admin/node/tags/list | jq

echo "Checking Nginx base image..."
curl -s -u bakery-admin:$GITEA_ADMIN_PASSWORD https://registry.bakewise.ai/v2/bakery-admin/nginx/tags/list | jq

Alternative: Verify via Gitea Web Interface

  1. Visit https://gitea.bakewise.ai
  2. Login with username: bakery-admin, password: (from secret)
  3. Navigate to Packages > Container
  4. Verify images are listed under the bakery-admin namespace
  5. Confirm tags match expected versions (3.11-slim, 18-alpine, 1.25-alpine, etc.)

Step 5.3: Troubleshooting Image Issues

Registry Not Accessible

# Check Gitea pods are running
kubectl get pods -n gitea

# Check Gitea service
kubectl get svc -n gitea

# Check ingress for registry
kubectl get ingress -n gitea

# View Gitea logs for registry errors
kubectl logs -n gitea -l app.kubernetes.io/name=gitea --tail=100

Images Failed to Push

# Verify Docker can reach the registry
docker info | grep -i registry

# Test registry connectivity
curl -v https://registry.bakewise.ai/v2/

# Check for TLS certificate issues
openssl s_client -connect registry.bakewise.ai:443 -servername registry.bakewise.ai

Re-run Failed Images Only

# Manually pull and push a specific image
docker pull python:3.11-slim
docker tag python:3.11-slim registry.bakewise.ai/bakery-admin/python:3.11-slim
docker push registry.bakewise.ai/bakery-admin/python:3.11-slim
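
When several images failed, the pull, tag, and push sequence above can be looped over a list. The sketch below is hedged: `mirror_target`, `mirror_images`, and the `DOCKER_CMD` override are hypothetical names, and the rule of keeping only the final path component of the image name is an assumption based on the `python:3.11-slim` example. The actual prepull script may name images differently.

```shell
REGISTRY="${REGISTRY:-registry.bakewise.ai/bakery-admin}"

# Map an upstream image reference to its name in the Gitea registry.
# ASSUMPTION: only the final path component is kept, matching the
# python:3.11-slim example; the real script may map names differently.
mirror_target() {
  echo "${REGISTRY}/${1##*/}"
}

# Pull, retag, and push each image. DOCKER_CMD is a hypothetical override:
# set DOCKER_CMD="echo docker" to print the commands instead of running them.
mirror_images() {
  for image in "$@"; do
    target="$(mirror_target "$image")"
    ${DOCKER_CMD:-docker} pull "$image" &&
      ${DOCKER_CMD:-docker} tag "$image" "$target" &&
      ${DOCKER_CMD:-docker} push "$target" ||
      echo "FAILED: $image" >&2
  done
}

# Example: mirror_images node:18-alpine nginx:1.25-alpine redis:7.4-alpine
```

Before pushing, compare the computed target names with what is already in the registry (Step 5.2) to confirm the naming rule matches.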

Step 5.4: Verify CI/CD Pipeline Can Access Images

# Verify gitea-registry-secret exists in bakery-ia namespace
kubectl get secret gitea-registry-secret -n bakery-ia

# Check the secret contains correct registry URL
kubectl get secret gitea-registry-secret -n bakery-ia \
  -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq '.auths | keys[]'

# Test that Kubernetes can pull images using the secret
# Create a test pod that uses the base image
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test-registry-pull
  namespace: bakery-ia
spec:
  containers:
  - name: test
    image: registry.bakewise.ai/bakery-admin/python:3.11-slim
    command: ["python", "--version"]
  imagePullSecrets:
  - name: gitea-registry-secret
  restartPolicy: Never
EOF

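# Wait for the pod to run to completion. The container exits right after
# printing the version, so wait for phase Succeeded rather than condition Ready.
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/test-registry-pull \
  -n bakery-ia --timeout=120s || \
  kubectl describe pod/test-registry-pull -n bakery-ia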

# Clean up test pod
kubectl delete pod test-registry-pull -n bakery-ia --ignore-not-found

If this test fails: The CI/CD pipeline and application deployments will not be able to pull images. Check:

  1. Registry URL in the secret matches your setup
  2. Credentials are correct
  3. Images were successfully pushed in Step 5.1

Step 5.5: Build and Push Service Images (First-Time Deployment Only)

Critical: For the first deployment, service images don't exist in the registry yet. You must build and push them before applying the production kustomization in Phase 6.

Option A: Run the Build Script

# Navigate to the repository root
cd /root/bakery-ia

# Make the script executable
chmod +x scripts/build-all-services.sh

# Run the build script
# This will build and push all 21 services to the Gitea registry
./scripts/build-all-services.sh

The script builds the following services:

| Service | Image Name | Dockerfile |
|---------|------------|------------|
| Gateway | gateway | gateway/Dockerfile |
| Frontend | dashboard | frontend/Dockerfile.kubernetes |
| Auth | auth-service | services/auth/Dockerfile |
| Tenant | tenant-service | services/tenant/Dockerfile |
| Training | training-service | services/training/Dockerfile |
| Forecasting | forecasting-service | services/forecasting/Dockerfile |
| Sales | sales-service | services/sales/Dockerfile |
| Inventory | inventory-service | services/inventory/Dockerfile |
| Recipes | recipes-service | services/recipes/Dockerfile |
| Suppliers | suppliers-service | services/suppliers/Dockerfile |
| POS | pos-service | services/pos/Dockerfile |
| Orders | orders-service | services/orders/Dockerfile |
| Production | production-service | services/production/Dockerfile |
| Procurement | procurement-service | services/procurement/Dockerfile |
| Distribution | distribution-service | services/distribution/Dockerfile |
| External | external-service | services/external/Dockerfile |
| Notification | notification-service | services/notification/Dockerfile |
| Orchestrator | orchestrator-service | services/orchestrator/Dockerfile |
| Alert Processor | alert-processor | services/alert_processor/Dockerfile |
| AI Insights | ai-insights-service | services/ai_insights/Dockerfile |
| Demo Session | demo-session-service | services/demo_session/Dockerfile |

Option B: Trigger CI/CD Pipeline

If Tekton is properly configured, you can trigger the CI/CD pipeline instead:

cd /root/bakery-ia

# Create an empty commit to trigger the pipeline
git commit --allow-empty -m "Trigger initial CI/CD build"
git push origin main

# Monitor pipeline execution
kubectl get pipelineruns -n tekton-pipelines --watch

# Wait for all builds to complete (may take 20-30 minutes)
kubectl wait --for=condition=Succeeded pipelinerun --all -n tekton-pipelines --timeout=1800s

Option C: Build Individual Services Manually

# Get credentials
export GITEA_ADMIN_PASSWORD=$(kubectl get secret gitea-admin-secret -n gitea -o jsonpath='{.data.password}' | base64 -d)
export REGISTRY="registry.bakewise.ai/bakery-admin"

# Login to registry (--password-stdin keeps the password out of the process list)
echo "$GITEA_ADMIN_PASSWORD" | docker login registry.bakewise.ai -u bakery-admin --password-stdin

# Build and push a single service (example: auth-service)
docker build -t $REGISTRY/auth-service:latest \
  --build-arg BASE_REGISTRY=$REGISTRY \
  --build-arg PYTHON_IMAGE=python:3.11-slim \
  -f services/auth/Dockerfile .
docker push $REGISTRY/auth-service:latest

# Build and push frontend
docker build -t $REGISTRY/dashboard:latest \
  -f frontend/Dockerfile.kubernetes frontend/
docker push $REGISTRY/dashboard:latest

Step 5.6: Verify All Service Images Are Available

# Get Gitea admin password
export GITEA_ADMIN_PASSWORD=$(kubectl get secret gitea-admin-secret -n gitea -o jsonpath='{.data.password}' | base64 -d)

# List all images in the registry
echo "=== Images in Gitea Registry ==="
curl -s -u bakery-admin:$GITEA_ADMIN_PASSWORD https://registry.bakewise.ai/v2/_catalog | jq -r '.repositories[]' | sort

# Verify critical service images exist
for service in gateway dashboard auth-service tenant-service forecasting-service; do
  echo -n "Checking $service... "
  if curl -s -u bakery-admin:$GITEA_ADMIN_PASSWORD \
    "https://registry.bakewise.ai/v2/bakery-admin/$service/tags/list" | jq -e '.tags' > /dev/null 2>&1; then
    echo "✅ OK"
  else
    echo "❌ MISSING"
  fi
done
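
The loop above checks only five critical images. To check all 21 image names from Step 5.5 against the catalog in a single pass, something like the following sketch could be used. The function name `missing_images` is hypothetical, and it assumes the catalog lists repositories as `bakery-admin/<image>`, in line with the tag URLs used above.

```shell
# Print every expected repository that is absent from the catalog.
# stdin: one repository name per line (e.g. from jq -r '.repositories[]')
# $1:    owner prefix; remaining args: expected image names
missing_images() {
  owner="$1"
  shift
  catalog="$(cat)"
  for name in "$@"; do
    if ! printf '%s\n' "$catalog" | grep -Fxq "${owner}/${name}"; then
      echo "$name"
    fi
  done
}

# Example (image names from the Step 5.5 table):
# curl -s -u "bakery-admin:$GITEA_ADMIN_PASSWORD" https://registry.bakewise.ai/v2/_catalog |
#   jq -r '.repositories[]' |
#   missing_images bakery-admin gateway dashboard auth-service tenant-service \
#     training-service forecasting-service sales-service inventory-service \
#     recipes-service suppliers-service pos-service orders-service \
#     production-service procurement-service distribution-service \
#     external-service notification-service orchestrator-service \
#     alert-processor ai-insights-service demo-session-service
```

Empty output means every expected image is present; any name printed still needs to be built and pushed.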

Ready for Phase 6: Once all service images are verified in the registry, you can proceed to Phase 6: Deploy Application Services.

Phase 6: Deploy Application Services

Prerequisite: This phase assumes that all service images have been built and pushed to the Gitea registry (completed in Phase 5, Step 5.5). The production kustomization references these pre-built images.

Step 6.1: Apply Production Certificate

# Apply the production TLS certificate
kubectl apply -f infrastructure/environments/prod/k8s-manifests/prod-certificate.yaml

# Verify certificate is issued
kubectl get certificate -n bakery-ia
kubectl describe certificate bakery-ia-prod-tls-cert -n bakery-ia

Step 6.2: Deploy Application with Kustomize

# Apply the complete production configuration
kubectl apply -k infrastructure/environments/prod/k8s-manifests

# Wait for all deployments to be ready (10-15 minutes)
kubectl wait --for=condition=available --timeout=900s deployment --all -n bakery-ia

# Monitor deployment progress
kubectl get pods -n bakery-ia --watch

# If the deployment fails, re-sync the code and redeploy.

# 1. On your LOCAL machine: re-sync the repository to the VPS
rsync -avz --progress --delete \
  --exclude='.git' \
  --exclude='node_modules' \
  --exclude='__pycache__' \
  --exclude='.venv' \
  /Users/urtzialfaro/Documents/bakery-ia/ \
  bakery-vps:/root/bakery-ia/

# 2. On the VPS: delete the workloads and re-apply the kustomization
#    (this removes every Deployment, Job, and StatefulSet in the namespace)
kubectl delete deployments --all -n bakery-ia
kubectl delete jobs --all -n bakery-ia
kubectl delete statefulsets --all -n bakery-ia
sleep 30
kubectl apply -k infrastructure/environments/prod/k8s-manifests
kubectl get pods -n bakery-ia -w

# 3. Inspect pod status and node resource allocation
kubectl get pods -n bakery-ia
kubectl describe node | grep -A 10 "Allocated resources"

Step 6.3: Verify Application Health

# Check all pods are running
kubectl get pods -n bakery-ia

# Check services
kubectl get svc -n bakery-ia

# Check ingress
kubectl get ingress -n bakery-ia

# Test gateway health
kubectl exec -n bakery-ia deployment/gateway -- curl -s http://localhost:8000/health
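
With many pods in the namespace, scanning `kubectl get pods` output by eye is slow. The sketch below lists only the pods that need attention. The function name `unhealthy_pods` is hypothetical, and the helper relies on the default column order of `kubectl get pods --no-headers`.

```shell
# List pods in a namespace whose STATUS is neither Running nor Completed.
# Relies on the default column order: NAME READY STATUS RESTARTS AGE
unhealthy_pods() {
  kubectl get pods -n "$1" --no-headers |
    awk '$3 != "Running" && $3 != "Completed" { print $1 " (" $3 ")" }'
}

# Example: unhealthy_pods bakery-ia
```

The helper looks only at the STATUS column, so a pod that is Running but not yet ready (for example `0/1`) is not reported.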

Phase 7: Deploy Optional Services

Step 7.1: Deploy Unbound DNS (Required for Mailu)

# Deploy Unbound DNS resolver
helm upgrade --install unbound infrastructure/platform/networking/dns/unbound-helm \
  -n bakery-ia \
  -f infrastructure/platform/networking/dns/unbound-helm/values.yaml \
  -f infrastructure/platform/networking/dns/unbound-helm/prod/values.yaml \
  --timeout 5m \
  --wait

# Get Unbound service IP
UNBOUND_IP=$(kubectl get svc unbound-dns -n bakery-ia -o jsonpath='{.spec.clusterIP}')
echo "Unbound DNS IP: $UNBOUND_IP"

Step 7.2: Configure CoreDNS for DNSSEC

# Patch CoreDNS to forward to Unbound
kubectl patch configmap coredns -n kube-system --type merge -p "{
  \"data\": {
    \"Corefile\": \".:53 {\\n    errors\\n    health {\\n       lameduck 5s\\n    }\\n    ready\\n    kubernetes cluster.local in-addr.arpa ip6.arpa {\\n       pods insecure\\n       fallthrough in-addr.arpa ip6.arpa\\n       ttl 30\\n    }\\n    prometheus :9153\\n    forward . $UNBOUND_IP {\\n       max_concurrent 1000\\n    }\\n    cache 30\\n    loop\\n    reload\\n    loadbalance\\n}\\n\"
  }
}"

# Restart CoreDNS
kubectl rollout restart deployment coredns -n kube-system
kubectl rollout status deployment coredns -n kube-system --timeout=60s
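
The escaped JSON in the patch above is hard to review by eye. As a sketch, the same Corefile can be rendered from a heredoc and inspected before it is applied. The function name `render_corefile` is hypothetical; the Corefile content is copied from the patch above, with only the forward address parameterised.

```bash
# Prints the Corefile used in Step 7.2, forwarding to the given upstream IP.
# Hypothetical helper for illustration only.
render_corefile() {
  local upstream="$1"
  cat <<EOF
.:53 {
    errors
    health {
       lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
       ttl 30
    }
    prometheus :9153
    forward . ${upstream} {
       max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}
EOF
}

# Possible usage (review the output first, then apply it):
#   render_corefile "$UNBOUND_IP"
#   kubectl create configmap coredns -n kube-system \
#     --from-literal=Corefile="$(render_corefile "$UNBOUND_IP")" \
#     --dry-run=client -o yaml | kubectl apply -f -
```

Whether `kubectl apply` or the original `kubectl patch` suits your cluster better depends on how the CoreDNS ConfigMap is managed there, so the usage comment is a suggestion rather than a tested procedure.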

Step 7.3: Deploy Mailu Email Server

# Add Mailu Helm repository
helm repo add mailu https://mailu.github.io/helm-charts
helm repo update

# Apply Mailu configuration secrets
# These are pre-configured with secure defaults
kubectl apply -f infrastructure/platform/mail/mailu-helm/configs/mailu-admin-credentials-secret.yaml -n bakery-ia
kubectl apply -f infrastructure/platform/mail/mailu-helm/configs/mailu-certificates-secret.yaml -n bakery-ia

# Install Mailu with production configuration
# The Helm chart uses the pre-configured secrets for admin credentials and TLS certificates
helm upgrade --install mailu mailu/mailu \
  -n bakery-ia \
  -f infrastructure/platform/mail/mailu-helm/values.yaml \
  -f infrastructure/platform/mail/mailu-helm/prod/values.yaml \
  --timeout 10m

# Wait for Mailu to be ready
kubectl wait --for=condition=available --timeout=600s deployment/mailu-front -n bakery-ia

# Verify Mailu pods are running
kubectl get pods -n bakery-ia | grep mailu

# Get the admin password from the pre-configured secret
MAILU_ADMIN_PASSWORD=$(kubectl get secret mailu-admin-credentials -n bakery-ia -o jsonpath='{.data.password}' | base64 -d)
echo "Mailu Admin Password: $MAILU_ADMIN_PASSWORD"
echo "⚠️  SAVE THIS PASSWORD SECURELY!"

# Check Mailu initialization status
kubectl logs -n bakery-ia deployment/mailu-front --tail=10

Important Notes about Mailu Deployment:

  1. Pre-Configured Secrets: Mailu uses pre-configured secrets for admin credentials and TLS certificates. These are defined in the configuration files.

  2. Password Management: The admin password is stored in mailu-admin-credentials-secret.yaml. For production, you should update this with a secure password before deployment.

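  3. TLS Certificates: The self-signed certificates in mailu-certificates-secret.yaml secure internal communication between Mailu services. External clients connect through the NGINX Ingress, which serves Let's Encrypt certificates, so the self-signed certificates do not need to be replaced (see Step 7.3.1).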

  4. Initialization Time: Mailu may take 5-10 minutes to fully initialize. During this time, some pods may restart as the system configures itself.

  5. Accessing Mailu:

     • Webmail: https://mail.bakewise.ai/webmail
     • Admin Interface: https://mail.bakewise.ai/admin
     • Username: admin@bakewise.ai
     • Password: (from mailu-admin-credentials-secret.yaml)

  6. Mailgun Relay: The production configuration includes Mailgun SMTP relay. Configure your Mailgun credentials in mailu-mailgun-credentials-secret.yaml before deployment.

Step 7.3.1: Mailu Configuration Notes

Important Information about Mailu Certificates:

  1. Dual Certificate Architecture:
     • Internal Communication: Uses self-signed certificates (mailu-certificates-secret.yaml)
     • External Communication: Uses Let's Encrypt certificates via NGINX Ingress (bakery-ia-prod-tls-cert)

  2. No Certificate Replacement Needed: The self-signed certificates are only used for internal communication between Mailu services. External clients connect through the NGINX Ingress Controller, which uses the publicly trusted Let's Encrypt certificates.

  3. Certificate Flow:

External Client → NGINX Ingress (Let's Encrypt) → Internal Network → Mailu Services (Self-signed)

  4. Security: This architecture is secure because:
     • External connections use publicly trusted certificates
     • Internal connections are still encrypted (even if self-signed)
     • Ingress terminates TLS, reducing load on Mailu services

  5. Mailgun Relay Configuration: For outbound email delivery, configure your Mailgun credentials:
# Edit the Mailgun credentials secret
nano infrastructure/platform/mail/mailu-helm/configs/mailu-mailgun-credentials-secret.yaml

# Apply the secret
kubectl apply -f infrastructure/platform/mail/mailu-helm/configs/mailu-mailgun-credentials-secret.yaml -n bakery-ia

# Restart Mailu to pick up the new relay configuration
kubectl rollout restart deployment -n bakery-ia -l app.kubernetes.io/instance=mailu

Step 7.4: Deploy SigNoz Monitoring

# Add SigNoz Helm repository
helm repo add signoz https://charts.signoz.io
helm repo update

# Install SigNoz (upgrade --install makes the command safe to re-run)
helm upgrade --install signoz signoz/signoz \
  -n bakery-ia \
  -f infrastructure/monitoring/signoz/signoz-values-prod.yaml \
  --set global.storageClass="microk8s-hostpath" \
  --set clickhouse.persistence.enabled=true \
  --set clickhouse.persistence.size=50Gi \
  --timeout 15m

# Wait for SigNoz to be ready
kubectl wait --for=condition=available --timeout=600s deployment/signoz-frontend -n bakery-ia

# Verify
kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=signoz

Phase 8: Verification & Validation

Step 8.1: Complete Verification Checklist

# 1. Check all pods are running
kubectl get pods -n bakery-ia --no-headers | grep -vE "Running|Completed"
# Should return no results (--no-headers keeps the column header out of the output)

# 2. Check services
kubectl get svc -n bakery-ia

# 3. Check ingress
kubectl get ingress -n bakery-ia

# 4. Check certificates
kubectl get certificate -n bakery-ia
kubectl describe certificate bakery-ia-prod-tls-cert -n bakery-ia

# 5. Check PVCs
kubectl get pvc -n bakery-ia
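
The first check above depends on reading `grep` output by eye. A sketch of a scriptable version follows: it reads `kubectl get pods --no-headers` output and sets an exit status that a CI job or cron task can act on. The function name `unhealthy_pods` is hypothetical. It assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE) and, like the original check, does not look at the READY column.

```bash
# Reads `kubectl get pods --no-headers` output on stdin and prints the name
# and status of every pod that is neither Running nor Completed.
# Exit status: 0 when every pod is healthy, 1 otherwise.
# Hypothetical helper for illustration only.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3; bad = 1 }
       END { exit bad + 0 }'
}
```

Possible usage: `kubectl get pods -n bakery-ia --no-headers | unhealthy_pods || echo "Some pods need attention"`.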

Step 8.2: Test Application Endpoints

# Test frontend (from external machine)
curl -I https://bakewise.ai
# Expected: HTTP/2 200

# Test API health
curl https://bakewise.ai/api/v1/health
# Expected: {"status": "healthy"}

# Test monitoring
curl -I https://monitoring.bakewise.ai/signoz
# Expected: HTTP/2 200
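
The API health check above can be turned into a pass/fail test. The sketch below matches the expected `{"status": "healthy"}` body with a regular expression, which keeps it dependency-free; a JSON parser such as `jq` would be more robust if it is installed. The function name `health_ok` is hypothetical.

```bash
# Succeeds when the JSON on stdin contains "status": "healthy"
# (whitespace around the colon is tolerated).
# Hypothetical helper for illustration only.
health_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}
```

Possible usage: `curl -fsS https://bakewise.ai/api/v1/health | health_ok && echo "API healthy"`.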

Step 8.3: Test Database Connections

# Test PostgreSQL SSL
kubectl exec -n bakery-ia deployment/auth-db -- sh -c \
  'psql -U auth_user -d auth_db -c "SHOW ssl;"'
# Expected: on

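# Test Redis
# If Redis only accepts TLS connections or requires a password,
# add redis-cli's --tls/--cacert and -a options to this command
kubectl exec -n bakery-ia deployment/redis -- redis-cli ping
# Expected: PONG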

Step 8.4: Production Validation Checklist

  • Application accessible at https://bakewise.ai
  • Monitoring accessible at https://monitoring.bakewise.ai
  • SSL certificates valid (check with browser)
  • All services running and healthy
  • Database connections working with TLS
  • CI/CD pipeline operational
  • Email service working (if deployed)
  • Pilot coupon verified (check tenant-service logs)

Post-Deployment Operations

Configure Stripe Keys (Required Before Going Live)

Before accepting payments, configure your Stripe credentials:

# Edit ConfigMap for publishable key
nano infrastructure/environments/common/configs/configmap.yaml
# Add: VITE_STRIPE_PUBLISHABLE_KEY: "pk_live_XXXXXXXXXXXX"

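# Encode your secret keys (tr strips the line breaks some base64 builds add to long output)
echo -n "sk_live_XXXXXXXXXX" | base64 | tr -d '\n'; echo  # Your secret key
echo -n "whsec_XXXXXXXXXX" | base64 | tr -d '\n'; echo     # Your webhook secret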

# Edit Secrets
nano infrastructure/environments/common/configs/secrets.yaml
# Add to payment-secrets section:
#   STRIPE_SECRET_KEY: <base64-encoded>
#   STRIPE_WEBHOOK_SECRET: <base64-encoded>

# Apply the updated configuration
kubectl apply -k infrastructure/environments/prod/k8s-manifests

# Restart services that use Stripe
kubectl rollout restart deployment/payment-service -n bakery-ia
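
The encode-and-paste step above can be scripted so that each value arrives in the shape a Kubernetes Secret `data:` section expects. This is a minimal sketch: the function name `secret_entry` and the two-space indentation are assumptions, so adjust them to match the layout of your secrets.yaml.

```bash
# Prints one line suitable for the data: section of a Kubernetes Secret:
# two spaces, the key, a colon, and the base64-encoded value on a single line.
# Hypothetical helper for illustration only.
secret_entry() {
  local key="$1" value="$2"
  # printf '%s' avoids a trailing newline in the encoded value;
  # tr removes any line wrapping that base64 adds to long output.
  printf '  %s: %s\n' "$key" "$(printf '%s' "$value" | base64 | tr -d '\n')"
}
```

Possible usage: `secret_entry STRIPE_SECRET_KEY "sk_live_XXXXXXXXXX"` and `secret_entry STRIPE_WEBHOOK_SECRET "whsec_XXXXXXXXXX"`, then paste the two lines into the payment-secrets section.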

Backup Strategy

# Create backup script
cat > ~/backup-databases.sh << 'EOF'
#!/bin/bash
BACKUP_DIR="/backups/$(date +%Y-%m-%d)"
mkdir -p "$BACKUP_DIR"

# Backup all databases
for db in auth tenant training forecasting ai-insights sales inventory production procurement distribution recipes suppliers pos orders external notification alert-processor orchestrator demo-session; do
  echo "Backing up ${db}-db..."
  kubectl exec -n bakery-ia deployment/${db}-db -- \
    pg_dump -U ${db}_user -d ${db}_db > "$BACKUP_DIR/${db}.sql"
done

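# Compress (use -C so the archive stores a relative path, not /backups/...)
tar -czf "$BACKUP_DIR.tar.gz" -C "$(dirname "$BACKUP_DIR")" "$(basename "$BACKUP_DIR")"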
rm -rf "$BACKUP_DIR"

# Keep only last 7 days
find /backups -name "*.tar.gz" -mtime +7 -delete

echo "Backup completed: $BACKUP_DIR.tar.gz"
EOF

chmod +x ~/backup-databases.sh

# Setup daily cron job (2 AM)
(crontab -l 2>/dev/null; echo "0 2 * * * ~/backup-databases.sh") | crontab -
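
One gap in the backup script above: the shell creates each `.sql` file before `pg_dump` runs, so a failed `kubectl exec` leaves an empty file that is archived without any warning. The sketch below checks for missing or empty dumps before compression. The function name `check_dumps` is hypothetical.

```bash
# check_dumps DIR DB [DB...]
# Reports every DIR/DB.sql that is missing or empty on stderr.
# Returns 1 if any dump is missing or empty, 0 otherwise.
# Hypothetical helper for illustration only.
check_dumps() {
  local dir="$1" db status=0
  shift
  for db in "$@"; do
    if [ ! -s "$dir/$db.sql" ]; then
      echo "missing or empty: $dir/$db.sql" >&2
      status=1
    fi
  done
  return "$status"
}
```

Inside the backup script, this could be called as `check_dumps "$BACKUP_DIR" auth tenant sales` (with the full database list) just before the `tar` step, so the job exits with an error instead of rotating in an unusable backup.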

Scaling Guidelines

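| Tenants | RAM | CPU | Storage | Monthly Cost |
|---------|-----|-----|---------|--------------|
| 10 | 20 GB | 8 cores | 200 GB | €40-80 |
| 25 | 32 GB | 12 cores | 300 GB | €80-120 |
| 50 | 48 GB | 16 cores | 500 GB | €150-200 |
| 100+ | Consider multi-node cluster | | | €300+ |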
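
The sizing table can be expressed as a small lookup for capacity-planning scripts. This sketch rounds tenant counts up to the next row and treats anything above 50 tenants as the multi-node case; both rules are assumptions, since the table only lists 10, 25, 50, and 100+. The function name `recommend_tier` is hypothetical.

```bash
# Prints the single-node sizing from the table for a given tenant count.
# Counts between rows round up to the next row (an assumption, see above).
# Hypothetical helper for illustration only.
recommend_tier() {
  local tenants="$1"
  if   (( tenants <= 10 )); then echo "20 GB RAM, 8 cores, 200 GB storage"
  elif (( tenants <= 25 )); then echo "32 GB RAM, 12 cores, 300 GB storage"
  elif (( tenants <= 50 )); then echo "48 GB RAM, 16 cores, 500 GB storage"
  else echo "multi-node cluster"
  fi
}
```

For example, `recommend_tier 18` prints the 25-tenant row, because 18 tenants no longer fit the 10-tenant sizing.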

Regular Maintenance Tasks

| Frequency | Task |
|-----------|------|
| Daily | Check logs and alerts |
| Weekly | Review resource utilization |
| Monthly | Update dependencies, security patches |
| Quarterly | Review backup procedures, disaster recovery |

Troubleshooting Guide

Common Issues

Pods Stuck in Pending State

# Check node resources
kubectl describe nodes

# Check PVC status
kubectl get pvc -n bakery-ia

# Check events
kubectl get events -n bakery-ia --sort-by='.lastTimestamp'

Certificate Not Issuing

# Check cluster issuer
kubectl get clusterissuer

# Check certificate status
kubectl describe certificate -n bakery-ia

# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager

# Verify ports 80/443 are open
curl -I http://bakewise.ai

Services Not Accessible

# Check ingress
kubectl describe ingress -n bakery-ia

# Check ingress controller logs
# (the MicroK8s ingress addon runs the controller as a DaemonSet)
kubectl logs -n ingress daemonset/nginx-ingress-microk8s-controller

# Check endpoints
kubectl get endpoints -n bakery-ia

Database Connection Errors

# Check database pod
kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database

# Check database logs
kubectl logs -n bakery-ia deployment/auth-db

# Test connection from service
kubectl exec -n bakery-ia deployment/auth-service -- nc -zv auth-db 5432

Out of Resources

# Check node resources
kubectl top nodes

# Check pod resource usage
kubectl top pods -n bakery-ia --sort-by=memory

# Scale down non-critical services temporarily
kubectl scale deployment monitoring -n bakery-ia --replicas=0

Reference & Resources

Key File Locations

| Configuration | File Path |
|---------------|-----------|
| ConfigMap | infrastructure/environments/common/configs/configmap.yaml |
| Secrets | infrastructure/environments/common/configs/secrets.yaml |
| Prod Kustomization | infrastructure/environments/prod/k8s-manifests/kustomization.yaml |
| Cert-Manager Issuer | infrastructure/platform/cert-manager/cluster-issuer-production.yaml |
| Ingress | infrastructure/platform/networking/ingress/base/ingress.yaml |
| Gitea Values | infrastructure/cicd/gitea/values.yaml |
| Mailu Values | infrastructure/platform/mail/mailu-helm/values.yaml |

Production URLs

| Service | URL |
|---------|-----|
| Main Application | https://bakewise.ai |
| API | https://bakewise.ai/api/v1/... |
| Monitoring | https://monitoring.bakewise.ai |
| Gitea | https://gitea.bakewise.ai |
| Registry | https://registry.bakewise.ai |
| Webmail | https://mail.bakewise.ai/webmail |
| Mail Admin | https://mail.bakewise.ai/admin |

External Documentation

Support Resources


Conclusion

This guide provides a complete, step-by-step process for deploying Bakery-IA to production. Key highlights:

  1. Bootstrap Approach: Transfer code to server first, then push to Gitea
  2. Layered Deployment: Components deployed in dependency order
  3. Production Ready: TLS everywhere, monitoring, CI/CD, backups
  4. Scalable: Designed for 10-100+ tenants with a clear scaling path

For questions or issues, refer to the troubleshooting guide or consult the support resources listed above.