# Bakery-IA Pilot Launch Guide

**Complete guide for deploying to production for a 10-tenant pilot program**

**Last Updated:** 2026-01-07
**Target Environment:** clouding.io VPS with MicroK8s
**Estimated Cost:** €41-81/month
**Time to Deploy:** 2-4 hours (first time)

---

## Table of Contents

1. [Executive Summary](#executive-summary)
2. [Pre-Launch Checklist](#pre-launch-checklist)
3. [VPS Provisioning](#vps-provisioning)
4. [Infrastructure Setup](#infrastructure-setup)
5. [Domain & DNS Configuration](#domain--dns-configuration)
6. [TLS/SSL Certificates](#tlsssl-certificates)
7. [Email & Communication Setup](#email--communication-setup)
8. [Kubernetes Deployment](#kubernetes-deployment)
9. [Configuration & Secrets](#configuration--secrets)
10. [Database Migrations](#database-migrations)
11. [Verification & Testing](#verification--testing)
12. [Post-Deployment](#post-deployment)

---

## Executive Summary

### What You're Deploying

A complete multi-tenant SaaS platform with:
- **18 microservices** (auth, tenant, ML forecasting, inventory, sales, orders, etc.)
- **14 PostgreSQL databases** with TLS encryption
- **Redis cache** with TLS
- **RabbitMQ** message broker
- **Monitoring stack** (Prometheus, Grafana, AlertManager)
- **Full security** (TLS, RBAC, audit logging)

### Total Cost Breakdown

| Service | Provider | Monthly Cost |
|---------|----------|--------------|
| VPS Server (20GB RAM, 8 vCPU, 200GB SSD) | clouding.io | €40-80 |
| Domain | Namecheap/Cloudflare | €1.25 (€15/year) |
| Email | Zoho Free / Gmail | €0 |
| WhatsApp API | Meta Business | €0 (1k free conversations) |
| DNS | Cloudflare | €0 |
| SSL | Let's Encrypt | €0 |
| **TOTAL** | | **€41-81/month** |

### Timeline

| Phase | Duration | Description |
|-------|----------|-------------|
| Pre-Launch Setup | 1-2 hours | Domain, VPS provisioning, accounts setup |
| Infrastructure Setup | 1 hour | MicroK8s installation, firewall config |
| Deployment | 30-60 min | Deploy all services and databases |
| Verification | 30-60 min | Test everything works |
| **Total** | **2-4 hours** | First-time deployment |

---

## Pre-Launch Checklist

### Required Accounts & Services

- [ ] **Domain Name**
  - Register at Namecheap or Cloudflare (€10-15/year)
  - Suggested: `bakeryforecast.es` or `bakery-ia.com`

- [ ] **VPS Account**
  - Sign up at [clouding.io](https://www.clouding.io)
  - Payment method configured

- [ ] **Email Service** (Choose ONE)
  - Option A: Zoho Mail FREE (recommended for full send/receive)
  - Option B: Gmail SMTP + domain forwarding
  - Option C: Google Workspace (14-day free trial, then €5.75/month)

- [ ] **WhatsApp Business API**
  - Create Meta Business Account (free)
  - Verify business identity
  - Phone number ready (non-VoIP)

- [ ] **DNS Access**
  - Cloudflare account (free, recommended)
  - Or domain registrar DNS panel access

- [ ] **Container Registry** (Choose ONE)
  - Option A: Docker Hub account (recommended)
  - Option B: GitHub Container Registry
  - Option C: MicroK8s built-in registry

### Required Tools on Local Machine

```bash
# Verify you have these installed:
kubectl version --client
docker --version
git --version
ssh -V
openssl version

# Install if missing (macOS):
brew install kubectl docker git openssh openssl
```

### Repository Setup

```bash
# Clone the repository
git clone https://github.com/yourusername/bakery-ia.git
cd bakery-ia

# Verify structure
ls infrastructure/kubernetes/overlays/prod/
```

---

## VPS Provisioning

### Recommended Configuration

**For 10-tenant pilot program:**
- **RAM:** 20 GB
- **CPU:** 8 vCPU cores
- **Storage:** 200 GB NVMe SSD (triple replica)
- **Network:** 1 Gbps connection
- **OS:** Ubuntu 22.04 LTS
- **Monthly Cost:** €40-80 (check current pricing)

### Why These Specs?

**Memory Breakdown:**
- Application services: 14.1 GB
- Databases (18 instances): 4.6 GB
- Infrastructure (Redis, RabbitMQ): 0.8 GB
- Gateway/Frontend: 1.8 GB
- Monitoring: 1.5 GB
- System overhead: ~3 GB
- **Total:** ~26 GB peak capacity; 20 GB is sufficient with HPA

**Storage Breakdown:**
- Databases: 36 GB (18 × 2 GB)
- ML Models: 10 GB
- Redis: 1 GB
- RabbitMQ: 2 GB
- Prometheus metrics: 20 GB
- Container images: ~30 GB
- Growth buffer: 100 GB
- **Total:** 199 GB

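The two totals can be re-derived with a quick calculation. The sketch below only sums the figures quoted in the breakdowns; `sum_values` and the variable names are illustrative and not part of the repository.

```bash
# Sum the per-component estimates quoted in the breakdowns.
sum_values() {
  # Print each argument on its own line and let awk add them up
  # (awk is used because the shell itself has no decimal arithmetic).
  printf '%s\n' "$@" | awk '{ total += $1 } END { print total }'
}

# Memory (GB): services, databases, Redis/RabbitMQ, gateway/frontend, monitoring, system
memory_total=$(sum_values 14.1 4.6 0.8 1.8 1.5 3)

# Storage (GB): databases, ML models, Redis, RabbitMQ, Prometheus, images, growth buffer
storage_total=$(sum_values 36 10 1 2 20 30 100)

echo "Memory estimate:  ${memory_total} GB"
echo "Storage estimate: ${storage_total} GB"
```

The memory figure comes to 25.8 GB, which is the "~26 GB" quoted in the memory breakdown, and the storage figure matches the 199 GB total.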
### Provisioning Steps

1. **Create VPS at clouding.io:**
   ```
   1. Log in to clouding.io dashboard
   2. Click "Create New Server"
   3. Select:
      - OS: Ubuntu 22.04 LTS
      - RAM: 20 GB
      - CPU: 8 vCPU
      - Storage: 200 GB NVMe SSD
      - Location: Barcelona (best for Spain)
   4. Set hostname: bakery-ia-prod-01
   5. Add SSH key (or use password)
   6. Create server
   ```

2. **Note your server details:**
   ```bash
   # Save these for later:
   VPS_IP="YOUR_VPS_IP_ADDRESS"
   VPS_ROOT_PASSWORD="YOUR_ROOT_PASSWORD"  # If not using SSH key
   ```

3. **Initial SSH connection:**
   ```bash
   # Test connection
   ssh root@$VPS_IP

   # Update system
   apt update && apt upgrade -y
   ```

---

## Infrastructure Setup

### Step 1: Install MicroK8s

```bash
# SSH into your VPS
ssh root@$VPS_IP

# Install MicroK8s
snap install microk8s --classic --channel=1.28/stable

# Add your user to microk8s group
usermod -a -G microk8s $USER
chown -f -R $USER ~/.kube
newgrp microk8s

# Verify installation
microk8s status --wait-ready
```

### Step 2: Enable Required Add-ons

```bash
# Enable core add-ons
microk8s enable dns
microk8s enable hostpath-storage
microk8s enable ingress
microk8s enable cert-manager
microk8s enable metrics-server
microk8s enable rbac

# Optional but recommended
microk8s enable prometheus  # For monitoring (may be named "observability" on recent MicroK8s releases)
microk8s enable registry    # If using local registry

# Setup kubectl alias
echo "alias kubectl='microk8s kubectl'" >> ~/.bashrc
source ~/.bashrc

# Verify
kubectl get nodes
kubectl get pods -A
```

### Step 3: Configure Firewall

```bash
# Allow necessary ports
ufw allow 22/tcp     # SSH
ufw allow 80/tcp     # HTTP
ufw allow 443/tcp    # HTTPS
ufw allow 16443/tcp  # Kubernetes API (optional)

# Enable firewall
ufw enable

# Check status
ufw status verbose
```

### Step 4: Create Namespace

```bash
# Create bakery-ia namespace
kubectl create namespace bakery-ia

# Verify
kubectl get namespaces
```

---

## Domain & DNS Configuration

### Step 1: Register Domain

1. Go to Namecheap or Cloudflare Registrar
2. Search for your desired domain
3. Complete purchase (~€10-15/year)
4. Save domain credentials

### Step 2: Configure Cloudflare DNS (Recommended)

1. **Add site to Cloudflare:**
   ```
   1. Log in to Cloudflare
   2. Click "Add a Site"
   3. Enter your domain name
   4. Choose Free plan
   5. Cloudflare will scan existing DNS records
   ```

2. **Update nameservers at registrar:**
   ```
   Point your domain's nameservers to Cloudflare:
   - NS1: assigned.cloudflare.com
   - NS2: assigned.cloudflare.com
   (Cloudflare will provide the exact values)
   ```

3. **Add DNS records:**
   ```
   Type    Name        Content         TTL   Proxy
   A       @           YOUR_VPS_IP     Auto  Yes
   A       www         YOUR_VPS_IP     Auto  Yes
   A       api         YOUR_VPS_IP     Auto  Yes
   A       monitoring  YOUR_VPS_IP     Auto  Yes
   CNAME   *           yourdomain.com  Auto  No
   ```

4. **Configure SSL/TLS mode:**
   ```
   SSL/TLS tab → Overview → Set to "Full (strict)"
   ```

5. **Test DNS propagation:**
   ```bash
   # Wait 5-10 minutes, then test
   nslookup yourdomain.com
   nslookup api.yourdomain.com
   ```
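Rather than waiting a fixed 5-10 minutes, the propagation check can be polled. This is a minimal sketch under stated assumptions: `wait_until` and `has_dns_answer` are hypothetical helpers (not part of the repository), and `dig` is assumed to be installed. With the Cloudflare proxy enabled the answer is a Cloudflare address rather than your VPS IP, so the check only confirms that a record resolves.

```bash
# Run a command repeatedly until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only pause when another attempt is still to come.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Succeeds once the name returns at least one DNS answer.
has_dns_answer() {
  [ -n "$(dig +short "$1")" ]
}

# Example invocation (commented out so that pasting the helpers has no side effects):
# wait_until 20 30 has_dns_answer yourdomain.com && echo "DNS is live"
```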

---

## TLS/SSL Certificates

### Understanding Certificate Setup

The platform uses **two layers** of SSL/TLS:

1. **External (Ingress) SSL:** Let's Encrypt for public HTTPS
2. **Internal (Database) SSL:** Self-signed certificates for database connections

### Step 1: Generate Internal Certificates

```bash
# On your local machine
cd infrastructure/tls

# Generate certificates
./generate-certificates.sh

# This creates:
# - ca/        (Certificate Authority)
# - postgres/  (PostgreSQL server certs)
# - redis/     (Redis server certs)
```

**Certificate Details:**
- Root CA: 10-year validity (expires 2035)
- Server certs: 3-year validity (expires October 2028)
- Algorithm: RSA 4096-bit
- Signature: SHA-256

### Step 2: Create Kubernetes Secrets

```bash
# Run from the repository root (the paths below are relative to it)

# Create PostgreSQL TLS secret
kubectl create secret generic postgres-tls \
  --from-file=server-cert.pem=infrastructure/tls/postgres/server-cert.pem \
  --from-file=server-key.pem=infrastructure/tls/postgres/server-key.pem \
  --from-file=ca-cert.pem=infrastructure/tls/postgres/ca-cert.pem \
  -n bakery-ia

# Create Redis TLS secret
kubectl create secret generic redis-tls \
  --from-file=redis-cert.pem=infrastructure/tls/redis/redis-cert.pem \
  --from-file=redis-key.pem=infrastructure/tls/redis/redis-key.pem \
  --from-file=ca-cert.pem=infrastructure/tls/redis/ca-cert.pem \
  -n bakery-ia

# Verify secrets created
kubectl get secrets -n bakery-ia | grep tls
```

### Step 3: Configure Let's Encrypt (External SSL)

cert-manager is already enabled. Configure the ClusterIssuer:

```bash
# On VPS, create ClusterIssuer
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@yourdomain.com  # CHANGE THIS
    privateKeySecretRef:
      name: letsencrypt-production
    solvers:
    - http01:
        ingress:
          class: public
EOF

# Verify ClusterIssuer is ready
kubectl get clusterissuer
kubectl describe clusterissuer letsencrypt-production
```

---

## Email & Communication Setup

### Option A: Zoho Mail (FREE, Recommended)

**Features:**
- ✅ Free forever for 1 domain, 5 users
- ✅ 5 GB storage per user
- ✅ Full send/receive capability
- ✅ Web interface + SMTP/IMAP
- ✅ Professional email addresses

**Setup Steps:**

1. **Sign up for Zoho Mail:**
   ```
   1. Go to https://www.zoho.com/mail/
   2. Click "Sign Up for Free"
   3. Choose "Forever Free" plan
   4. Enter your domain name
   5. Complete verification
   ```

2. **Verify domain ownership:**
   ```
   Add TXT record to your DNS:
   Type: TXT
   Name: @
   Value: zoho-verification=XXXXX.zoho.com
   ```

3. **Configure MX records:**
   ```
   Priority  Type  Name  Value
   10        MX    @     mx.zoho.com
   20        MX    @     mx2.zoho.com
   50        MX    @     mx3.zoho.com
   ```

4. **Get SMTP credentials:**
   ```
   SMTP Host: smtp.zoho.com
   SMTP Port: 587
   SMTP Username: noreply@yourdomain.com
   SMTP Password: (generate app password in Zoho settings)
   ```

### Option B: Gmail SMTP + Forwarding

**Features:**
- ✅ Completely free
- ✅ 500 emails/day (sufficient for pilot)
- ✅ Receive via domain forwarding

**Setup Steps:**

1. **Enable 2FA on your Gmail:**
   ```
   1. Go to myaccount.google.com
   2. Security → 2-Step Verification
   3. Enable and complete setup
   ```

2. **Generate app password:**
   ```
   1. Security → 2-Step Verification → App passwords
   2. Select "Mail" and "Other (Custom name)"
   3. Name it "Bakery-IA SMTP"
   4. Copy the 16-character password
   ```

3. **Configure domain email forwarding:**
   ```
   At your domain registrar or Cloudflare:
   - Forward noreply@yourdomain.com → your.gmail@gmail.com
   - Forward alerts@yourdomain.com → your.gmail@gmail.com
   ```

4. **SMTP Settings:**
   ```
   SMTP Host: smtp.gmail.com
   SMTP Port: 587
   SMTP Username: your.gmail@gmail.com
   SMTP Password: (16-char app password from step 2)
   From Email: noreply@yourdomain.com
   ```

### WhatsApp Business API Setup

**Features:**
- ✅ First 1,000 conversations/month FREE
- ✅ Perfect for 10 tenants (~500 messages/month)

**Setup Steps:**

1. **Create Meta Business Account:**
   ```
   1. Go to business.facebook.com
   2. Create Business Account
   3. Complete business verification
   ```

2. **Add WhatsApp Product:**
   ```
   1. Go to developers.facebook.com
   2. Create New App → Business
   3. Add WhatsApp product
   4. Complete setup wizard
   ```

3. **Configure Phone Number:**
   ```
   1. Test with your personal number initially
   2. Later: Get dedicated business number
   3. Verify phone number with SMS code
   ```

4. **Create Message Templates:**
   ```
   1. Go to WhatsApp Manager
   2. Create templates for:
      - Low inventory alert
      - Expired product alert
      - Forecast summary
      - Order notification
   3. Submit for approval (15 min - 24 hours)
   ```

5. **Get API Credentials:**
   ```
   Save these values:
   - Phone Number ID: (from WhatsApp Manager)
   - Access Token: (from App Dashboard)
   - Business Account ID: (from WhatsApp Manager)
   - Webhook Verify Token: (create your own secure string)
   ```

---

## Kubernetes Deployment

### Step 1: Prepare Container Images

#### Option A: Using Docker Hub (Recommended)

```bash
# On your local machine
docker login

# Build all images
docker-compose build

# Tag images for Docker Hub
# Replace YOUR_USERNAME with your Docker Hub username
export DOCKER_USERNAME="YOUR_USERNAME"

./scripts/tag-images.sh "$DOCKER_USERNAME"

# Push to Docker Hub
./scripts/push-images.sh "$DOCKER_USERNAME"

# Update prod kustomization with your username
# Edit: infrastructure/kubernetes/overlays/prod/kustomization.yaml
# Replace all "bakery/" with "$DOCKER_USERNAME/"
```
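The manual find-and-replace described in the last comments can be scripted. A minimal sketch, assuming image references in `kustomization.yaml` carry the `bakery/` prefix as the comment says; `rewrite_image_prefix` is a hypothetical helper, and the sample file content is invented for the demonstration, so your real file will look different.

```bash
# Replace every "bakery/" prefix in a file with "<username>/".
# Writes through a temporary copy so it behaves the same with GNU and BSD sed
# and keeps the original file's permissions.
rewrite_image_prefix() {
  file=$1
  username=$2
  tmp=$(mktemp)
  sed "s|bakery/|${username}/|g" "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demonstration on a throwaway file (contents are made up for the example).
demo=$(mktemp)
cat > "$demo" <<'EOF'
images:
  - name: bakery/auth-service
    newTag: latest
  - name: bakery/tenant-service
    newTag: latest
EOF

rewrite_image_prefix "$demo" "exampleuser"
cat "$demo"
```

Against the real file this would be `rewrite_image_prefix infrastructure/kubernetes/overlays/prod/kustomization.yaml "$DOCKER_USERNAME"`. Review the result with `git diff` before applying it, since the substitution is purely textual.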
#### Option B: Using MicroK8s Registry

```bash
# On VPS
microk8s enable registry

# Get registry address (usually localhost:32000)
kubectl get service -n container-registry

# On local machine, configure insecure registry
# Edit /etc/docker/daemon.json so that it contains:
#   {
#     "insecure-registries": ["YOUR_VPS_IP:32000"]
#   }

# Restart Docker
sudo systemctl restart docker

# Tag and push images
docker tag bakery/auth-service YOUR_VPS_IP:32000/bakery/auth-service
docker push YOUR_VPS_IP:32000/bakery/auth-service
# Repeat for all services...
```
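The "repeat for all services" step lends itself to a loop. The sketch below prints the commands instead of running them, so the list can be checked first. `push_commands` is a hypothetical helper, and apart from `auth-service` the service names are placeholders: substitute the image names your build actually produced (`docker images` lists them).

```bash
# Print the docker tag/push commands for a list of services.
REGISTRY="YOUR_VPS_IP:32000"
SERVICES="auth-service tenant-service"   # placeholder list, extend as needed

push_commands() {
  registry=$1
  shift
  for service in "$@"; do
    echo "docker tag bakery/${service} ${registry}/bakery/${service}"
    echo "docker push ${registry}/bakery/${service}"
  done
}

# $SERVICES is intentionally unquoted so it splits into one argument per name.
push_commands "$REGISTRY" $SERVICES
```

Once the printed commands look right, piping the output to `sh` executes them.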
### Step 2: Update Production Configuration

The production configuration is already set up for the **bakewise.ai** domain:

**Production URLs:**
- **Main Application:** https://bakewise.ai
- **API Endpoints:** https://bakewise.ai/api/v1/...
- **Monitoring Dashboard:** https://monitoring.bakewise.ai/grafana
- **Prometheus:** https://monitoring.bakewise.ai/prometheus
- **SigNoz (Traces/Metrics/Logs):** https://monitoring.bakewise.ai/signoz
- **AlertManager:** https://monitoring.bakewise.ai/alertmanager

```bash
# Verify the configuration is correct:
grep -A 3 "host:" infrastructure/kubernetes/overlays/prod/prod-ingress.yaml

# Expected output should show:
# - host: bakewise.ai
# - host: monitoring.bakewise.ai

# Verify CORS configuration
grep CORS infrastructure/kubernetes/overlays/prod/prod-configmap.yaml
# Expected: CORS_ORIGINS: "https://bakewise.ai"
```

**If using a different domain**, update these files:
```bash
# 1. Update domain names
nano infrastructure/kubernetes/overlays/prod/prod-ingress.yaml
# Replace bakewise.ai with your domain

# 2. Update ConfigMap
nano infrastructure/kubernetes/overlays/prod/prod-configmap.yaml
# Update CORS_ORIGINS

# 3. Verify image names (if using custom registry)
nano infrastructure/kubernetes/overlays/prod/kustomization.yaml
```

---

## Configuration & Secrets

### Step 1: Generate Strong Passwords

```bash
# Generate passwords for all services
openssl rand -base64 32  # For each database
openssl rand -hex 32     # For JWT secrets and API keys

# Save all passwords securely!
# Recommended: Use a password manager (1Password, LastPass, Bitwarden)
```

### Step 2: Update Application Secrets

```bash
# Edit the secrets file
nano infrastructure/kubernetes/base/secrets.yaml

# Update ALL of these values:
# Database passwords (14 databases):
AUTH_DB_PASSWORD: <base64-encoded-password>
TENANT_DB_PASSWORD: <base64-encoded-password>
# ... (all 14 databases)

# Redis password:
REDIS_PASSWORD: <base64-encoded-password>

# JWT secrets:
JWT_SECRET_KEY: <base64-encoded-secret>
JWT_REFRESH_SECRET_KEY: <base64-encoded-secret>

# SMTP settings (from email setup):
SMTP_HOST: <base64-encoded-host>            # smtp.zoho.com or smtp.gmail.com
SMTP_PORT: <base64-encoded-port>            # 587
SMTP_USERNAME: <base64-encoded-username>    # your email
SMTP_PASSWORD: <base64-encoded-password>    # app password
DEFAULT_FROM_EMAIL: <base64-encoded-email>  # noreply@yourdomain.com

# WhatsApp credentials (from WhatsApp setup):
WHATSAPP_ACCESS_TOKEN: <base64-encoded-token>
WHATSAPP_PHONE_NUMBER_ID: <base64-encoded-id>
WHATSAPP_BUSINESS_ACCOUNT_ID: <base64-encoded-id>
WHATSAPP_WEBHOOK_VERIFY_TOKEN: <base64-encoded-token>

# Database connection strings (update with actual passwords):
AUTH_DATABASE_URL: postgresql+asyncpg://auth_user:PASSWORD@auth-db:5432/auth_db?ssl=require
# ... (all 14 databases)
```
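One detail worth checking when filling in the connection strings: passwords produced by `openssl rand -base64 32` can contain `+`, `/` and `=`, which are reserved characters in a URL and will likely need percent-encoding before they go into `AUTH_DATABASE_URL`. The sketch below shows one way to do that; `urlencode` is a hypothetical helper and the password is a made-up example. Generating database passwords with `openssl rand -hex 32` instead would avoid the issue altogether.

```bash
# Percent-encode every byte except RFC 3986 "unreserved" characters.
urlencode() {
  # od emits one hex pair per input byte; awk maps each pair back to its
  # character and decides whether it can be kept as-is.
  printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' | awk '
    BEGIN {
      for (i = 32; i < 127; i++) ascii[sprintf("%02X", i)] = sprintf("%c", i)
    }
    NF {
      hex = toupper($1)
      ch = ascii[hex]
      if (ch ~ /^[A-Za-z0-9._~-]$/) printf "%s", ch
      else printf "%%%s", hex
    }
    END { printf "\n" }'
}

DB_PASSWORD='p@ss/w+rd='   # made-up example value, not a real secret
echo "postgresql+asyncpg://auth_user:$(urlencode "$DB_PASSWORD")@auth-db:5432/auth_db?ssl=require"
```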
**To base64 encode:**
```bash
echo -n "your-password-here" | base64
```
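Two pitfalls are easy to hit here: a plain `echo` (without `-n`) appends a newline that ends up inside the secret, and GNU `base64` wraps long output across lines. The sketch below sidesteps both and decodes the value again as a sanity check; `encode_secret` and `decode_secret` are illustrative helper names.

```bash
# Encode a value for a Kubernetes Secret and decode it again to verify it.
encode_secret() {
  # printf '%s' never adds a trailing newline; tr removes any line wrapping.
  printf '%s' "$1" | base64 | tr -d '\n'
}

decode_secret() {
  printf '%s' "$1" | base64 --decode
}

encoded=$(encode_secret "your-password-here")
echo "encoded: $encoded"
echo "decoded: $(decode_secret "$encoded")"
```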
**CRITICAL:** Never commit real secrets to git! Use `.gitignore` for secrets files.

### Step 3: Apply Secrets

```bash
# Copy manifests to VPS (this creates ~/kubernetes on the server)
scp -r infrastructure/kubernetes user@YOUR_VPS_IP:~/

# SSH to VPS
ssh user@YOUR_VPS_IP

# Apply secrets
kubectl apply -f ~/kubernetes/base/secrets.yaml

# Verify secrets created
kubectl get secrets -n bakery-ia
```

---

## Database Migrations

### Step 1: Deploy Databases

```bash
# On VPS
kubectl apply -k ~/kubernetes/overlays/prod

# Wait for databases to be ready (5-10 minutes)
kubectl wait --for=condition=ready pod -l app.kubernetes.io/component=database -n bakery-ia --timeout=600s

# Check status
kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database
```

### Step 2: Run Migrations

Migrations are automatically handled by init containers in each service. Verify they completed:

```bash
# Check migration job status
kubectl get jobs -n bakery-ia | grep migration

# All should show "COMPLETIONS = 1/1"

# Check logs if any failed
kubectl logs -n bakery-ia job/auth-migration
```
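With many migration jobs, scanning the COMPLETIONS column by eye gets tedious. A small filter can list only the jobs that have not finished. This is a sketch: `incomplete_jobs` is a hypothetical helper, and the sample output (including the `tenant-migration` name) is invented to show the column layout of `kubectl get jobs`.

```bash
# Read `kubectl get jobs` output on stdin and print the names of jobs whose
# COMPLETIONS column (field 2, e.g. "0/1") has not reached its target.
incomplete_jobs() {
  awk '
    $1 == "NAME" { next }   # skip the header row if present
    {
      split($2, parts, "/")
      if (parts[1] != parts[2]) print $1
    }'
}

# Demonstration with made-up sample output.
sample='NAME               COMPLETIONS   DURATION   AGE
auth-migration     1/1           12s        5m
tenant-migration   0/1           5m         5m'

printf '%s\n' "$sample" | incomplete_jobs
```

On the cluster the same filter would be fed from `kubectl get jobs -n bakery-ia | grep migration | incomplete_jobs`; empty output means every migration job completed.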
### Step 3: Verify Database Schemas

```bash
# Connect to a database to verify
kubectl exec -n bakery-ia deployment/auth-db -it -- psql -U auth_user -d auth_db

# Inside psql, run these meta-commands (one per line, without the descriptions):
#   \dt        list tables
#   \d users   describe the users table
#   \q         quit
```

---

## Verification & Testing

### Step 1: Check All Pods Running

```bash
# View all pods
kubectl get pods -n bakery-ia

# Expected: All pods in "Running" state, none in CrashLoopBackOff

# Check for issues
kubectl get pods -n bakery-ia | grep -vE "Running|Completed"

# View logs for any problematic pods
kubectl logs -n bakery-ia POD_NAME
```
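The `grep -vE "Running|Completed"` check does not catch a pod that is `Running` but not yet ready (for example `0/1` in the READY column). The filter below also compares the READY counts. It is a sketch: `unhealthy_pods` is a hypothetical helper and the pod names in the sample are invented.

```bash
# Read `kubectl get pods` output on stdin and print pods that are neither
# Completed nor Running with all containers ready.
unhealthy_pods() {
  awk '
    $1 == "NAME"      { next }   # header row
    $3 == "Completed" { next }   # finished jobs are fine
    {
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) print $1, $3, $2
    }'
}

# Demonstration with made-up sample output.
sample_pods='NAME                           READY   STATUS             RESTARTS   AGE
auth-service-7d9f8b6c5-abcde   1/1     Running            0          10m
auth-migration-xk2lp           0/1     Completed          0          12m
gateway-5f6d7c8b9-fghij        0/1     CrashLoopBackOff   5          10m
sales-service-6c7d8e9f0-klmno  0/1     Running            0          10m'

printf '%s\n' "$sample_pods" | unhealthy_pods
```

On the cluster, `kubectl get pods -n bakery-ia | unhealthy_pods` should print nothing once the deployment has settled.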
### Step 2: Check Services and Ingress

```bash
# View services
kubectl get svc -n bakery-ia

# View ingress
kubectl get ingress -n bakery-ia

# View certificates (should auto-issue from Let's Encrypt)
kubectl get certificate -n bakery-ia

# Describe certificate to check status
kubectl describe certificate bakery-ia-prod-tls-cert -n bakery-ia
```

### Step 3: Test Database Connections

```bash
# Test PostgreSQL TLS
kubectl exec -n bakery-ia deployment/auth-db -- sh -c \
  'psql -U auth_user -d auth_db -c "SHOW ssl;"'
# Expected output: on

# Test Redis TLS
# Note: $REDIS_PASSWORD is expanded by your local shell, so set it there first
kubectl exec -n bakery-ia deployment/redis -- redis-cli \
  --tls \
  --cert /tls/redis-cert.pem \
  --key /tls/redis-key.pem \
  --cacert /tls/ca-cert.pem \
  -a "$REDIS_PASSWORD" \
  ping
# Expected output: PONG
```

### Step 4: Test Frontend Access

```bash
# Test frontend (replace with your domain)
curl -I https://bakery.yourdomain.com

# Expected: HTTP/2 200

# Test API health
curl https://api.yourdomain.com/health

# Expected: {"status": "healthy"}
```
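For scripted checks (for example in a post-deploy hook), the health response can be tested rather than read by eye. A minimal sketch, assuming the response body has the shape shown in the expected output; `is_healthy` is a hypothetical helper and uses a pattern match rather than a full JSON parser.

```bash
# Succeeds if the JSON body on stdin contains "status": "healthy"
# (whitespace around the colon is tolerated).
is_healthy() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Demonstration with the response shape shown in the expected output.
if printf '%s' '{"status": "healthy"}' | is_healthy; then
  echo "API is healthy"
fi
```

Against the live endpoint this becomes `curl -fsS https://api.yourdomain.com/health | is_healthy`, where `-f` makes curl fail on HTTP error codes.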
### Step 5: Test Authentication

```bash
# Create a test user (using your frontend or API)
curl -X POST https://api.yourdomain.com/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{
    "email": "test@yourdomain.com",
    "password": "TestPassword123!",
    "name": "Test User"
  }'

# Login
curl -X POST https://api.yourdomain.com/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{
    "email": "test@yourdomain.com",
    "password": "TestPassword123!"
  }'

# Expected: JWT token in response
```

### Step 6: Test Email Delivery

```bash
# Trigger a password reset to test email
curl -X POST https://api.yourdomain.com/api/v1/auth/forgot-password \
  -H "Content-Type: application/json" \
  -d '{"email": "test@yourdomain.com"}'

# Check your email inbox for the reset link
# Check service logs if email not received:
kubectl logs -n bakery-ia deployment/auth-service | grep -i "email\|smtp"
```

### Step 7: Test WhatsApp (Optional)

```bash
# Send a test WhatsApp message
# This requires creating a tenant and configuring WhatsApp in the UI
# Or test via API once authenticated
```

---

## Post-Deployment

|
|
|
|
|
|
|
2026-01-10 13:43:38 +01:00
|
|
|
|
### Step 1: Access SigNoz Monitoring Stack
|
2026-01-08 12:58:00 +01:00
|
|
|
|
|
2026-01-10 13:43:38 +01:00
|
|
|
|
Your production deployment includes **SigNoz**, a unified observability platform that provides complete visibility into your application:
|
|
|
|
|
|
|
|
|
|
|
|
#### What is SigNoz?
|
|
|
|
|
|
|
|
|
|
|
|
SigNoz is an **open-source, all-in-one observability platform** that provides:
|
|
|
|
|
|
- **📊 Distributed Tracing** - See end-to-end request flows across all 18 microservices
|
|
|
|
|
|
- **📈 Metrics Monitoring** - Application performance and infrastructure metrics
|
|
|
|
|
|
- **📝 Log Management** - Centralized logs from all services with trace correlation
|
|
|
|
|
|
- **🔍 Service Performance Monitoring (SPM)** - Automatic RED metrics (Rate, Error, Duration)
|
|
|
|
|
|
- **🗄️ Database Monitoring** - All 18 PostgreSQL databases + Redis + RabbitMQ
|
|
|
|
|
|
- **☸️ Kubernetes Monitoring** - Cluster, node, pod, and container metrics
|
|
|
|
|
|
|
|
|
|
|
|
**Why SigNoz instead of Prometheus/Grafana?**
|
|
|
|
|
|
- Single unified UI for traces, metrics, and logs (no context switching)
|
|
|
|
|
|
- Automatic service dependency mapping
|
|
|
|
|
|
- Built-in APM (Application Performance Monitoring)
|
|
|
|
|
|
- Log-trace correlation with one click
|
|
|
|
|
|
- Better query performance with ClickHouse backend
|
|
|
|
|
|
- Modern UI designed for microservices
|
2026-01-08 12:58:00 +01:00
|
|
|
|
|
|
|
|
|
|
#### Production Monitoring URLs

Access via domain:
```
https://monitoring.bakewise.ai/signoz          # SigNoz - Main observability UI
https://monitoring.bakewise.ai/alertmanager    # AlertManager - Alert management
```

Or via port forwarding (if needed):
```bash
# SigNoz Frontend (Main UI)
kubectl port-forward -n bakery-ia svc/signoz 8080:8080 &
# Open: http://localhost:8080

# SigNoz AlertManager
kubectl port-forward -n bakery-ia svc/signoz-alertmanager 9093:9093 &
# Open: http://localhost:9093

# OTel Collector (for debugging)
kubectl port-forward -n bakery-ia svc/signoz-otel-collector 4317:4317 &  # gRPC
kubectl port-forward -n bakery-ia svc/signoz-otel-collector 4318:4318 &  # HTTP
```
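
With the collector ports forwarded, a process running on your workstation can send telemetry to it through the standard OpenTelemetry SDK environment variables. The following is a minimal sketch, assuming the port-forwards above are active; the helper `otlp_endpoint` and the service name `local-debug` are illustrative and not part of the deployment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the local OTLP endpoint for a given protocol.
# Ports 4317 (gRPC) and 4318 (HTTP) match the port-forwards shown above.
otlp_endpoint() {
  case "$1" in
    grpc) echo "http://localhost:4317" ;;
    http) echo "http://localhost:4318" ;;
    *)    echo "usage: otlp_endpoint grpc|http" >&2; return 1 ;;
  esac
}

# Standard OpenTelemetry SDK environment variables.
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
export OTEL_EXPORTER_OTLP_ENDPOINT="$(otlp_endpoint grpc)"
export OTEL_SERVICE_NAME="local-debug"   # illustrative name, pick your own
```

A service started from the same shell should then appear in the SigNoz Services tab under the chosen name, provided it is instrumented with an OpenTelemetry SDK.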

#### Key SigNoz Features to Explore

Once you open SigNoz (https://monitoring.bakewise.ai/signoz), explore these tabs:

**1. Services Tab - Application Performance**
- View all 18 microservices with live metrics
- See request rate, error rate, and latency (P50/P90/P99)
- Click on any service to drill down into operations
- Identify slow endpoints and error-prone operations

**2. Traces Tab - Request Flow Visualization**
- See complete request journeys across services
- Identify bottlenecks (slow database queries, API calls)
- Debug errors with full stack traces
- Correlate with logs for complete context

**3. Dashboards Tab - Infrastructure & Database Metrics**
- **PostgreSQL** - Monitor all databases (connections, queries, cache hit ratio)
- **Redis** - Cache performance (memory, hit rate, commands/sec)
- **RabbitMQ** - Message queue health (depth, rates, consumers)
- **Kubernetes** - Cluster metrics (nodes, pods, containers)

**4. Logs Tab - Centralized Log Management**
- Search and filter logs from all services
- Click on a trace ID in a log line to see the related request trace
- Auto-enriched with Kubernetes metadata (pod, namespace, container)
- Identify patterns and anomalies

**5. Alerts Tab - Proactive Monitoring**
- Configure alerts on metrics, traces, or logs
- Email/Slack/Webhook notifications
- View firing alerts and alert history

#### Quick Health Check

```bash
# Verify SigNoz components are running
kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=signoz

# Expected output:
# signoz-0                     READY 1/1
# signoz-otel-collector-xxx    READY 1/1
# signoz-alertmanager-xxx      READY 1/1
# signoz-clickhouse-xxx        READY 1/1
# signoz-zookeeper-xxx         READY 1/1

# Check OTel Collector health
kubectl exec -n bakery-ia deployment/signoz-otel-collector -- wget -qO- http://localhost:13133

# View recent telemetry in OTel Collector logs
kubectl logs -n bakery-ia deployment/signoz-otel-collector --tail=50 | grep -i "traces\|metrics\|logs"
```
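
The expected-output comparison above can be automated. This is a small sketch that assumes the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...); the function name `not_ready_pods` is hypothetical.

```bash
#!/usr/bin/env bash
# Reads `kubectl get pods --no-headers` output on stdin and prints the name of
# every pod that is not fully ready (READY x/y with x != y) or not Running.
not_ready_pods() {
  awk '{
    split($2, ready, "/")
    if (ready[1] != ready[2] || $3 != "Running") print $1
  }'
}

# Usage (prints nothing when every SigNoz pod is healthy):
#   kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=signoz --no-headers | not_ready_pods
```

Pods that belong to finished Jobs report `Completed` and will be listed too, so filter by label as in the usage line when that matters.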

#### Verify Telemetry is Working

1. **Check Services are Reporting:**
   ```bash
   # Open SigNoz and navigate to Services tab
   # You should see all 18 microservices listed

   # If services are missing, check if they're sending telemetry:
   kubectl logs -n bakery-ia deployment/auth-service | grep -i "telemetry\|otel"
   ```

2. **Check Database Metrics:**
   ```bash
   # Navigate to Dashboards → PostgreSQL in SigNoz
   # You should see metrics from all PostgreSQL databases

   # Verify OTel Collector is scraping databases:
   kubectl logs -n bakery-ia deployment/signoz-otel-collector | grep postgresql
   ```

3. **Check Traces are Being Collected:**
   ```bash
   # Make a test API request
   curl https://bakewise.ai/api/v1/health

   # Navigate to Traces tab in SigNoz
   # Search for "gateway" service
   # You should see the trace for your request
   ```

4. **Check Logs are Being Collected:**
   ```bash
   # Navigate to Logs tab in SigNoz
   # Filter by namespace: bakery-ia
   # You should see logs from all pods

   # Verify filelog receiver is working:
   kubectl logs -n bakery-ia deployment/signoz-otel-collector | grep filelog
   ```

### Step 2: Configure Alerting

SigNoz includes integrated alerting with AlertManager. Configure it for your team:

#### Update Email Notification Settings

The alerting configuration is in the SigNoz Helm values. To update:

```bash
# For production, edit the values file:
nano infrastructure/helm/signoz-values-prod.yaml

# Update the alertmanager.config section:
# 1. Update SMTP settings:
#    - smtp_from: 'your-alerts@bakewise.ai'
#    - smtp_auth_username: 'your-alerts@bakewise.ai'
#    - smtp_auth_password: (use Kubernetes secret)
#
# 2. Update receivers:
#    - critical-alerts email: critical-alerts@bakewise.ai
#    - warning-alerts email: oncall@bakewise.ai
#
# 3. (Optional) Add Slack webhook for critical alerts

# Apply the updated configuration:
helm upgrade signoz signoz/signoz \
  -n bakery-ia \
  -f infrastructure/helm/signoz-values-prod.yaml
```
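
The SMTP and receiver settings listed in the comments map onto standard Alertmanager configuration keys. The sketch below prints such a fragment so it can be compared against the `alertmanager.config` section of the values file. It assumes the chart passes that section to AlertManager unchanged; the helper name `render_alert_receivers` is hypothetical, and the SMTP host and password are deliberately left out.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an Alertmanager config fragment with two email receivers.
# Usage: render_alert_receivers <from-address> <critical-to> <warning-to>
render_alert_receivers() {
  local from="$1" critical="$2" warning="$3"
  cat <<EOF
global:
  smtp_from: '${from}'
  smtp_auth_username: '${from}'
  # smtp_smarthost and smtp_auth_password are omitted on purpose:
  # take them from your SMTP provider and a Kubernetes secret.
receivers:
  - name: critical-alerts
    email_configs:
      - to: '${critical}'
  - name: warning-alerts
    email_configs:
      - to: '${warning}'
EOF
}

# Example:
#   render_alert_receivers your-alerts@bakewise.ai critical-alerts@bakewise.ai oncall@bakewise.ai
```

Printing the fragment first makes it easier to review indentation before pasting it into the Helm values, where a YAML mistake would only surface during `helm upgrade`.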

#### Create Alerts in SigNoz UI

1. **Open SigNoz Alerts Tab:**
   ```
   https://monitoring.bakewise.ai/signoz → Alerts
   ```

2. **Create Common Alerts:** The `Query` values below describe the condition to set in the alert builder; map them to the metric names available in your SigNoz instance.

**Alert 1: High Error Rate**
- Name: `HighErrorRate`
- Query: `error_rate > 5` for `5 minutes`
- Severity: `critical`
- Description: "Service {{service_name}} has error rate >5%"

**Alert 2: High Latency**
- Name: `HighLatency`
- Query: `P99_latency > 3000ms` for `5 minutes`
- Severity: `warning`
- Description: "Service {{service_name}} P99 latency >3s"

**Alert 3: Service Down**
- Name: `ServiceDown`
- Query: `request_rate == 0` for `2 minutes`
- Severity: `critical`
- Description: "Service {{service_name}} not receiving requests"

**Alert 4: Database Connection Issues**
- Name: `DatabaseConnectionsHigh`
- Query: `pg_active_connections > 80` for `5 minutes`
- Severity: `warning`
- Description: "Database {{database}} has more than 80 active connections"

**Alert 5: High Memory Usage**
- Name: `HighMemoryUsage`
- Query: `container_memory_percent > 85` for `5 minutes`
- Severity: `warning`
- Description: "Pod {{pod_name}} using >85% memory"
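
The error-rate and P99 thresholds above reduce to simple arithmetic, which helps when sanity-checking whether an alert should have fired. A minimal sketch with hypothetical helper names, using the nearest-rank definition of a percentile (SigNoz may compute percentiles differently):

```bash
#!/usr/bin/env bash
# error_rate_pct <errors> <total>: percentage of failed requests, two decimals.
error_rate_pct() {
  awk -v e="$1" -v t="$2" 'BEGIN {
    if (t == 0) print "0.00"; else printf "%.2f\n", e / t * 100
  }'
}

# p99_ms: nearest-rank 99th percentile of latencies.
# Reads one latency in milliseconds per line on stdin.
p99_ms() {
  sort -n | awk '{ v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((NR * 99 + 99) / 100)   # ceil(0.99 * N) using integer math
      print v[rank]
    }'
}

# Example: 12 failures out of 200 requests is 6.00%, above the 5% threshold.
#   error_rate_pct 12 200
```

With only a handful of samples the P99 is simply the slowest request, which is why latency alerts on low-traffic services tend to be noisy.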

#### Test Alert Delivery

```bash
# Method 1: Create a test alert in SigNoz UI
# Go to Alerts → New Alert → Set a test condition that will fire

# Method 2: Fire a test alert via stress test
kubectl run memory-test --image=polinux/stress --restart=Never \
  --namespace=bakery-ia -- stress --vm 1 --vm-bytes 600M --timeout 300s

# Check alert appears in SigNoz Alerts tab
# https://monitoring.bakewise.ai/signoz → Alerts

# Also check AlertManager
# https://monitoring.bakewise.ai/alertmanager

# Verify email notification received

# Clean up test
kubectl delete pod memory-test -n bakery-ia
```
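
Whether the stress test trips `HighMemoryUsage` depends on what the percentage is measured against. Assuming it is relative to the container's memory limit (an assumption about how the alert is defined, not something this guide states), a pod without a limit may never cross 85%. The sketch below prints a manifest for the same test with an explicit limit; the helper name and the 700Mi default are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a pod manifest for the memory stress test with an
# explicit memory limit, so that 600M of allocation is roughly 86% of 700Mi.
stress_pod_manifest() {
  local limit_mi="${1:-700}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: memory-test
  namespace: bakery-ia
spec:
  restartPolicy: Never
  containers:
    - name: stress
      image: polinux/stress
      command: ["stress", "--vm", "1", "--vm-bytes", "600M", "--timeout", "300s"]
      resources:
        limits:
          memory: "${limit_mi}Mi"
EOF
}

# Usage:
#   stress_pod_manifest | kubectl apply -f -
```

The clean-up command from the block above (`kubectl delete pod memory-test -n bakery-ia`) applies unchanged.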

#### Configure Notification Channels

In the SigNoz Alerts tab, configure channels:

1. **Email Channel:**
   - Already configured via AlertManager
   - Emails sent to addresses in signoz-values-prod.yaml

2. **Slack Channel (Optional):**
   ```bash
   # Add Slack webhook URL to signoz-values-prod.yaml
   # Under alertmanager.config.receivers.critical-alerts.slack_configs:
   #   - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
   #     channel: '#alerts-critical'
   ```

3. **Webhook Channel (Optional):**
   - Configure custom webhook for integration with PagerDuty, OpsGenie, etc.
   - Add to alertmanager.config.receivers

### Step 3: Configure Backups

```bash
# Create backup script on VPS
cat > ~/backup-databases.sh <<'EOF'
#!/bin/bash
# Dump every database pod into /backups/<date>.tar.gz and keep 7 days of archives.
set -uo pipefail

BACKUP_ROOT="/backups"
BACKUP_DATE="$(date +%Y-%m-%d)"
BACKUP_DIR="$BACKUP_ROOT/$BACKUP_DATE"
mkdir -p "$BACKUP_DIR"

# Get all database pods
DBS=$(kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database -o name)

for db in $DBS; do
  DB_NAME=$(echo "$db" | cut -d'/' -f2)
  echo "Backing up $DB_NAME..."

  # NOTE: pg_dump without -d dumps only the default database of the given user.
  # Adjust -U/-d if your pods use dedicated users (e.g. auth_user / auth_db).
  if ! kubectl exec -n bakery-ia "$db" -- pg_dump -U postgres > "$BACKUP_DIR/${DB_NAME}.sql"; then
    echo "WARNING: backup of $DB_NAME failed" >&2
  fi
done

# Compress backups (relative paths inside the archive)
tar -czf "$BACKUP_DIR.tar.gz" -C "$BACKUP_ROOT" "$BACKUP_DATE"
rm -rf "$BACKUP_DIR"

# Keep only last 7 days
find "$BACKUP_ROOT" -name "*.tar.gz" -mtime +7 -delete

echo "Backup completed: $BACKUP_DIR.tar.gz"
EOF

chmod +x ~/backup-databases.sh

# Test backup
~/backup-databases.sh

# Set up daily cron job (2 AM)
(crontab -l 2>/dev/null; echo "0 2 * * * $HOME/backup-databases.sh") | crontab -
```
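
A backup is only useful if the dumps inside it have content: a failed `pg_dump` typically leaves an empty `.sql` file behind. The following sketch checks an archive for that case. The function name `empty_dumps_in_backup` is hypothetical, and it extracts the archive to a temporary directory, so it needs as much free disk space as the uncompressed backup.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list .sql dumps inside a backup archive that are empty.
# Returns 0 and prints nothing when every dump has content,
# returns 1 and prints the offending paths otherwise, 2 on extraction errors.
empty_dumps_in_backup() {
  local archive="$1" tmp empty
  tmp="$(mktemp -d)" || return 2
  tar -xzf "$archive" -C "$tmp" || { rm -rf "$tmp"; return 2; }
  empty="$(cd "$tmp" && find . -type f -name '*.sql' -empty | sort)"
  rm -rf "$tmp"
  [ -z "$empty" ] && return 0
  printf '%s\n' "$empty"
  return 1
}

# Usage:
#   empty_dumps_in_backup /backups/2026-01-07.tar.gz || echo "Backup has empty dumps!"
```

A non-empty file is still not proof of a restorable dump; periodically restoring one database into a scratch instance remains the stronger test.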

#### Updating AlertManager Recipients via ConfigMap

```bash
# Alerting for the SigNoz stack is configured through Helm values (see Step 2).
# Use this only if a standalone AlertManager ConfigMap exists in the `monitoring` namespace.
kubectl edit configmap -n monitoring alertmanager-config

# Update recipient emails in the receivers section
```

### Step 4: Verify SigNoz Monitoring is Working

Before proceeding, ensure all monitoring components are operational:

```bash
# 1. Verify SigNoz pods are running
kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=signoz

# Expected pods (all should be Running/Ready):
# - signoz-0 (plus signoz-1, signoz-2 for HA)
# - signoz-otel-collector-xxx
# - signoz-alertmanager-xxx
# - signoz-clickhouse-xxx
# - signoz-zookeeper-xxx

# 2. Check SigNoz UI is accessible
curl -I https://monitoring.bakewise.ai/signoz
# Should return: HTTP/2 200

# 3. Verify OTel Collector is receiving data
kubectl logs -n bakery-ia deployment/signoz-otel-collector --tail=100 | grep -i "received"
# Look for lines reporting received traces, metrics, and logs
# (the exact wording depends on the collector's logging configuration)

# 4. Check ClickHouse database is healthy
kubectl exec -n bakery-ia deployment/signoz-clickhouse -- clickhouse-client --query="SELECT count() FROM system.tables WHERE database LIKE 'signoz_%'"
# Should return a number > 0 (tables exist)
```
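
The `curl -I` check in item 2 can be turned into a scriptable pass/fail test. A minimal sketch; the function name `expect_http_status` is hypothetical, and it relies only on curl's `-w '%{http_code}'` output format.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only when a URL answers with the expected status.
# Usage: expect_http_status <url> [expected-code]   (default: 200)
expect_http_status() {
  local url="$1" expected="${2:-200}" code
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")"
  if [ "$code" = "$expected" ]; then
    echo "OK   $url ($code)"
  else
    echo "FAIL $url (got ${code:-no response}, expected $expected)" >&2
    return 1
  fi
}

# Usage:
#   expect_http_status https://monitoring.bakewise.ai/signoz
#   expect_http_status https://monitoring.bakewise.ai/alertmanager
```

If the UI redirects to a login page, the first response may be a 3xx code; pass the code you expect as the second argument in that case.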

**Complete Verification Checklist:**

- [ ] **SigNoz UI loads** at https://monitoring.bakewise.ai/signoz
- [ ] **Services tab shows all 18 microservices** with metrics
- [ ] **Traces tab has sample traces** from gateway and other services
- [ ] **Dashboards tab shows PostgreSQL metrics** from all databases
- [ ] **Dashboards tab shows Redis metrics** (memory, commands, etc.)
- [ ] **Dashboards tab shows RabbitMQ metrics** (queues, messages)
- [ ] **Dashboards tab shows Kubernetes metrics** (nodes, pods)
- [ ] **Logs tab displays logs** from all services in bakery-ia namespace
- [ ] **Alerts tab is accessible** and can create new alerts
- [ ] **AlertManager** is reachable at https://monitoring.bakewise.ai/alertmanager

**If any checks fail, troubleshoot:**

```bash
# Check OTel Collector configuration
kubectl describe configmap -n bakery-ia signoz-otel-collector

# Check for errors in OTel Collector
kubectl logs -n bakery-ia deployment/signoz-otel-collector | grep -i error

# Check ClickHouse logs for errors (e.g. rejected writes)
kubectl logs -n bakery-ia deployment/signoz-clickhouse | grep -i error

# Restart OTel Collector if needed
kubectl rollout restart deployment/signoz-otel-collector -n bakery-ia
```

### Step 5: Document Everything

Create a secure runbook with all credentials and procedures:

**Essential Information to Document:**
- [ ] VPS login credentials (stored securely in password manager)
- [ ] Database passwords (in password manager)
- [ ] SigNoz admin password
- [ ] Domain registrar access (for bakewise.ai)
- [ ] Cloudflare access
- [ ] Email service credentials (SMTP)
- [ ] WhatsApp API credentials
- [ ] Docker Hub / Registry credentials
- [ ] Emergency contact information
- [ ] Rollback procedures
- [ ] Monitoring URLs and access procedures

### Step 6: Train Your Team

Conduct a training session covering SigNoz and operational procedures:

#### Part 1: SigNoz Navigation (30 minutes)

- [ ] **Login and Overview**
  - Show how to access https://monitoring.bakewise.ai/signoz
  - Navigate through main tabs: Services, Traces, Dashboards, Logs, Alerts
  - Explain the unified nature of SigNoz (all-in-one platform)

- [ ] **Services Tab - Application Performance Monitoring**
  - Show all 18 microservices
  - Explain RED metrics (Request rate, Error rate, Duration/latency)
  - Demo: Click on a service → Operations → See endpoint breakdown
  - Demo: Identify slow endpoints and high error rates

- [ ] **Traces Tab - Request Flow Debugging**
  - Show how to search for traces by service, operation, or time
  - Demo: Click on a trace → See full waterfall (service → database → cache)
  - Demo: Find slow database queries in trace spans
  - Demo: Click "View Logs" to correlate trace with logs

- [ ] **Dashboards Tab - Infrastructure Monitoring**
  - Navigate to PostgreSQL dashboard → Show all databases
  - Navigate to Redis dashboard → Show cache metrics
  - Navigate to Kubernetes dashboard → Show node/pod metrics
  - Explain what metrics indicate issues (connection %, memory %, etc.)

- [ ] **Logs Tab - Log Search and Analysis**
  - Show how to filter by service, severity, time range
  - Demo: Search for "error" in last hour
  - Demo: Click on trace_id in log → Jump to related trace
  - Show Kubernetes metadata (pod, namespace, container)

- [ ] **Alerts Tab - Proactive Monitoring**
  - Show how to create alerts on metrics
  - Review pre-configured alerts
  - Show alert history and firing alerts
  - Explain how to acknowledge/silence alerts

#### Part 2: Operational Tasks (30 minutes)

- [ ] **Check application logs** (multiple ways)
  ```bash
  # Method 1: Via kubectl (for immediate debugging)
  kubectl logs -n bakery-ia deployment/orders-service --tail=100 -f

  # Method 2: Via SigNoz Logs tab (for analysis and correlation)
  # 1. Open https://monitoring.bakewise.ai/signoz → Logs
  # 2. Filter by k8s_deployment_name: orders-service
  # 3. Click on trace_id to see related request flow
  ```

- [ ] **Restart services when needed**
  ```bash
  # Restart a service (rolling update, no downtime)
  kubectl rollout restart deployment/orders-service -n bakery-ia

  # Verify restart in SigNoz:
  # 1. Check Services tab → orders-service → Should show brief dip then recovery
  # 2. Check Logs tab → Filter by orders-service → See restart logs
  ```

- [ ] **Investigate performance issues**
  ```bash
  # Scenario: "Orders API is slow"
  # 1. SigNoz → Services → orders-service → Check P99 latency
  # 2. SigNoz → Traces → Filter service:orders-service, duration:>1s
  # 3. Click on slow trace → Identify bottleneck (DB query? External API?)
  # 4. SigNoz → Dashboards → PostgreSQL → Check orders_db connections/queries
  # 5. Fix identified issue (add index, optimize query, scale service)
  ```

- [ ] **Respond to alerts**
  - Show how to access alerts in SigNoz → Alerts tab
  - Show AlertManager UI at https://monitoring.bakewise.ai/alertmanager
  - Review common alerts and their resolution steps
  - Reference the [Production Operations Guide](./PRODUCTION_OPERATIONS_GUIDE.md)

#### Part 3: Documentation and Resources (10 minutes)

- [ ] **Share documentation**
  - [PILOT_LAUNCH_GUIDE.md](./PILOT_LAUNCH_GUIDE.md) - This guide (deployment)
  - [PRODUCTION_OPERATIONS_GUIDE.md](./PRODUCTION_OPERATIONS_GUIDE.md) - Daily operations with SigNoz
  - [security-checklist.md](./security-checklist.md) - Security procedures

- [ ] **Bookmark key URLs**
  - SigNoz: https://monitoring.bakewise.ai/signoz
  - AlertManager: https://monitoring.bakewise.ai/alertmanager
  - Production app: https://bakewise.ai

- [ ] **Set up on-call rotation** (if applicable)
  - Route alerts to the on-call receiver in AlertManager (the rotation schedule itself lives in a tool such as PagerDuty or OpsGenie)
  - Document escalation procedures
  - Test alert delivery to on-call phone/email

#### Part 4: Hands-On Exercise (15 minutes)

**Exercise: Investigate a Simulated Issue**

1. Create a load test to generate traffic
2. Use SigNoz to find the slowest endpoint
3. Identify the root cause using traces
4. Correlate with logs to confirm
5. Check infrastructure metrics (DB, memory, CPU)
6. Propose a fix based on findings

This trains the team to use SigNoz effectively for real incidents.

---

## Troubleshooting

### Issue: Pods Not Starting

```bash
# Check pod status
kubectl describe pod POD_NAME -n bakery-ia

# Common causes:
# 1. Image pull errors
kubectl get events -n bakery-ia | grep -i "pull"

# 2. Resource limits
kubectl describe node

# 3. Volume mount issues
kubectl get pvc -n bakery-ia
```
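
For the volume case, a pod stays `Pending` as long as one of its claims is not bound. This sketch filters the PVC listing down to the problematic claims; it assumes the default column layout of `kubectl get pvc --no-headers` (NAME, STATUS, ...), and the function name `unbound_pvcs` is hypothetical.

```bash
#!/usr/bin/env bash
# Reads `kubectl get pvc --no-headers` output on stdin and prints every claim
# whose STATUS column is not "Bound", together with that status.
unbound_pvcs() {
  awk '$2 != "Bound" { print $1 " (" $2 ")" }'
}

# Usage:
#   kubectl get pvc -n bakery-ia --no-headers | unbound_pvcs
```

For any claim it lists, `kubectl describe pvc NAME -n bakery-ia` usually shows the provisioning error in its events.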

### Issue: Certificate Not Issuing

```bash
# Check certificate status
kubectl describe certificate bakery-ia-prod-tls-cert -n bakery-ia

# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager

# Check challenges
kubectl get challenges -n bakery-ia

# Verify DNS is correct
nslookup bakewise.ai
```
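
The DNS check can be made explicit by comparing the resolved address with the address you expect. A minimal sketch, assuming `dig` is installed; the function name `dns_points_to` is hypothetical and the IP in the usage line is a documentation placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare a hostname's A records with an expected IP.
# Usage: dns_points_to <hostname> <expected-ip>
dns_points_to() {
  local host="$1" expected="$2" resolved
  # Keep only IPv4 addresses (dig +short may also print CNAME targets).
  resolved="$(dig +short A "$host" | grep -E '^[0-9]+(\.[0-9]+){3}$')"
  if printf '%s\n' "$resolved" | grep -qxF "$expected"; then
    echo "$host resolves to $expected"
  else
    echo "$host resolves to '${resolved:-nothing}', expected $expected" >&2
    return 1
  fi
}

# Usage (replace 203.0.113.10 with your VPS IP):
#   dns_points_to bakewise.ai 203.0.113.10
```

If Cloudflare proxying is enabled for the record, it resolves to Cloudflare's edge addresses rather than the VPS, so a mismatch here is expected in that setup.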

### Issue: Database Connection Errors

```bash
# Check database pod
kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database

# Check database logs
kubectl logs -n bakery-ia deployment/auth-db

# Test connection from service pod
kubectl exec -n bakery-ia deployment/auth-service -- nc -zv auth-db 5432
```

### Issue: Services Can't Connect to Databases

```bash
# Check if SSL is enabled
kubectl exec -n bakery-ia deployment/auth-db -- sh -c \
  'psql -U auth_user -d auth_db -c "SHOW ssl;"'

# Check service logs for SSL errors
kubectl logs -n bakery-ia deployment/auth-service | grep -i "ssl\|tls"

# Restart service to pick up new SSL config
kubectl rollout restart deployment/auth-service -n bakery-ia
```

### Issue: Out of Resources

```bash
# Check node resources (kubectl top requires the metrics-server addon)
kubectl top nodes

# Check pod resource usage
kubectl top pods -n bakery-ia

# Identify resource hogs
kubectl top pods -n bakery-ia --sort-by=memory

# Scale down non-critical services temporarily
# (substitute the name of a deployment you can afford to stop)
kubectl scale deployment monitoring -n bakery-ia --replicas=0
```
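
To narrow the `kubectl top` listing down to the pods that matter, the output can be filtered by a memory threshold. This sketch assumes the default column layout of `kubectl top pods --no-headers` (NAME, CPU, MEMORY) with memory reported in `Ki`, `Mi`, or `Gi`; the function name `pods_over_memory` is hypothetical.

```bash
#!/usr/bin/env bash
# Reads `kubectl top pods --no-headers` output on stdin and prints pods whose
# memory usage (third column, e.g. "512Mi") is at or above the threshold in Mi.
# Usage: pods_over_memory <threshold-in-Mi>
pods_over_memory() {
  local threshold_mi="$1"
  awk -v limit="$threshold_mi" '
    { mem = $3 }
    mem ~ /Gi$/ { sub(/Gi$/, "", mem); mem = mem * 1024 }
    mem ~ /Mi$/ { sub(/Mi$/, "", mem) }
    mem ~ /Ki$/ { sub(/Ki$/, "", mem); mem = mem / 1024 }
    mem + 0 >= limit + 0 { print $1, $3 }
  '
}

# Usage: list pods using 1 GiB or more
#   kubectl top pods -n bakery-ia --no-headers | pods_over_memory 1024
```

Comparing the result with each pod's configured limit (`kubectl describe pod`) shows whether the pod is misbehaving or the limit is simply too generous for this VPS.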

---

## Next Steps After Successful Launch

1. **Monitor for 48 Hours**
   - Check dashboards daily
   - Review error logs
   - Monitor resource usage
   - Test all functionality

2. **Optimize Based on Metrics**
   - Adjust resource limits if needed
   - Fine-tune autoscaling thresholds
   - Optimize database queries if slow

3. **Onboard First Tenant**
   - Create test tenant
   - Upload sample data
   - Test all features
   - Gather feedback

4. **Scale Gradually**
   - Add 1-2 tenants at a time
   - Monitor resource usage
   - Upgrade VPS if needed (see scaling guide)

5. **Plan for Growth**
   - Review [PRODUCTION_OPERATIONS_GUIDE.md](./PRODUCTION_OPERATIONS_GUIDE.md)
   - Implement additional monitoring
   - Plan capacity upgrades
   - Consider managed services for scale

---

## Cost Scaling Path

| Tenants | RAM | CPU | Storage | Monthly Cost |
|---------|-----|-----|---------|--------------|
| 10 | 20 GB | 8 cores | 200 GB | €40-80 |
| 25 | 32 GB | 12 cores | 300 GB | €80-120 |
| 50 | 48 GB | 16 cores | 500 GB | €150-200 |
| 100+ | Consider multi-node cluster or managed K8s | - | - | €300+ |

---

## Support Resources

**Documentation:**
- **Operations Guide:** [PRODUCTION_OPERATIONS_GUIDE.md](./PRODUCTION_OPERATIONS_GUIDE.md) - Daily operations, monitoring, incident response
- **Security Guide:** [security-checklist.md](./security-checklist.md) - Security procedures and compliance
- **Database Security:** [database-security.md](./database-security.md) - Database operations and TLS configuration
- **TLS Configuration:** [tls-configuration.md](./tls-configuration.md) - Certificate management
- **RBAC Implementation:** [rbac-implementation.md](./rbac-implementation.md) - Access control

**Monitoring Access:**
- **SigNoz (Primary):** https://monitoring.bakewise.ai/signoz - All-in-one observability
  - Services: Application performance monitoring (APM)
  - Traces: Distributed tracing across all services
  - Dashboards: PostgreSQL, Redis, RabbitMQ, Kubernetes metrics
  - Logs: Centralized log management with trace correlation
  - Alerts: Alert configuration and management
- **AlertManager:** https://monitoring.bakewise.ai/alertmanager - Alert routing and notifications

**External Resources:**
- **MicroK8s Docs:** https://microk8s.io/docs
- **Kubernetes Docs:** https://kubernetes.io/docs
- **Let's Encrypt:** https://letsencrypt.org/docs
- **Cloudflare DNS:** https://developers.cloudflare.com/dns
- **SigNoz Documentation:** https://signoz.io/docs/
- **OpenTelemetry Documentation:** https://opentelemetry.io/docs/

**Monitoring Architecture:**
- **OpenTelemetry:** Industry-standard instrumentation framework
  - Auto-instruments FastAPI, HTTPX, SQLAlchemy, Redis
  - Collects traces, metrics, and logs from all services
  - Exports to SigNoz via OTLP (gRPC port 4317, HTTP port 4318)
- **SigNoz Components:**
  - **Frontend:** Web UI for visualization and analysis
  - **OTel Collector:** Receives and processes telemetry data
  - **ClickHouse:** Columnar database that stores telemetry for fast queries
  - **AlertManager:** Alert routing and notification delivery
  - **Zookeeper:** Coordination service for ClickHouse cluster

---

## Summary Checklist

Before going live, ensure:

- [ ] VPS provisioned and accessible
- [ ] MicroK8s installed and configured
- [ ] Domain registered and DNS configured
- [ ] Cloudflare protection enabled
- [ ] TLS certificates generated
- [ ] Email service configured and tested
- [ ] WhatsApp API setup (optional for launch)
- [ ] Container images built and pushed
- [ ] Production configs updated (domains, CORS, etc.)
- [ ] Secrets generated (strong passwords!)
- [ ] All pods running successfully
- [ ] Databases accepting TLS connections
- [ ] Let's Encrypt certificates issued
- [ ] Frontend accessible via HTTPS
- [ ] API health check passing
- [ ] Test user can login
- [ ] Email delivery working
- [ ] Monitoring dashboards loading
- [ ] Backups configured and tested
- [ ] Team trained on operations
- [ ] Documentation complete
- [ ] Emergency procedures documented

---

**🎉 Congratulations! Your Bakery-IA platform is now live in production!**

*Estimated total time: 2-4 hours for first deployment*
*Subsequent updates: 15-30 minutes*

---

**Document Version:** 1.0
**Last Updated:** 2026-01-07
**Maintained By:** DevOps Team