bakery-ia/infrastructure/kubernetes/README.md
Urtzi Alfaro dfb7e4b237 Add signoz
2026-01-08 12:58:00 +01:00


# Bakery IA Kubernetes Configuration
This directory contains Kubernetes manifests for deploying the Bakery IA platform in local development and production environments with HTTPS support using cert-manager and NGINX ingress.
## Quick Start
Deploy the entire platform in four steps:
```bash
# 1. Start Colima with adequate resources
colima start --cpu 6 --memory 12 --disk 120 --runtime docker --profile k8s-local
# 2. Create Kind cluster with permanent localhost access
kind create cluster --config kind-config.yaml
# 3. Install NGINX Ingress Controller
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
kubectl wait --namespace ingress-nginx --for=condition=ready pod --selector=app.kubernetes.io/component=controller --timeout=300s
# 4. Deploy with Tilt
tilt up
# 🎉 Access at: http://localhost (or see Tilt for individual service ports)
```
> **Note**: The kind-config.yaml already configures port mappings (30080→80, 30443→443) for localhost access, so no additional service patching is needed. The NGINX Ingress for Kind uses NodePort by default on those exact ports.
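Once `tilt up` shows the services as running, you may want to confirm that the ingress actually answers on localhost before opening a browser. The following is a minimal sketch, not a script from this repository. `wait_for_url` is a hypothetical helper, and it passes `curl -k` because the local CA may not be trusted yet (see Trust Local Certificates).

```bash
# Hypothetical helper (not part of this repo): poll a URL until it returns an
# HTTP status in the 200-399 range, giving up after N seconds (default 120).
# -k skips TLS verification, since the local CA may not be trusted yet.
wait_for_url() {
  local url=$1 timeout=${2:-120} elapsed=0 code
  while [ "$elapsed" -lt "$timeout" ]; do
    # curl prints "000" as the status when the connection fails
    code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" || true)
    if [ "${code:-0}" -ge 200 ] 2>/dev/null && [ "$code" -lt 400 ]; then
      return 0
    fi
    sleep 2
    elapsed=$((elapsed + 2))
  done
  return 1
}
```

For example, `wait_for_url https://localhost 180 && echo "ingress is up"` polls for up to three minutes.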
## Prerequisites
Install the following tools on macOS:
```bash
# Install via Homebrew
brew install colima kind kubectl skaffold tilt
# Verify installations
colima version && kind version && kubectl version --client && skaffold version && tilt version
```
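To check in one go that every tool this README calls is on your `PATH`, a small shell function is enough. This is a sketch under assumptions: `check_prereqs` is a hypothetical name, and its default list (`colima kind kubectl skaffold tilt`) is inferred from the commands used elsewhere in this document.

```bash
# Hypothetical helper: report any required CLI tools missing from PATH.
# The default tool list is an assumption based on the commands in this README.
check_prereqs() {
  local tools=("$@")
  if [ "${#tools[@]}" -eq 0 ]; then
    tools=(colima kind kubectl skaffold tilt)
  fi
  local tool missing=0
  for tool in "${tools[@]}"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_prereqs` with no arguments for the default list, or pass tool names explicitly, e.g. `check_prereqs kind kubectl`. It prints one `missing:` line per absent tool and returns non-zero if any are missing.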
## Directory Structure
```
infrastructure/kubernetes/
├── base/                         # Base Kubernetes resources
│   ├── namespace.yaml            # Namespace definition
│   ├── configmap.yaml            # Shared configuration
│   ├── secrets.yaml              # Base64 encoded secrets
│   ├── ingress-https.yaml        # HTTPS ingress rules
│   ├── kustomization.yaml        # Base kustomization
│   └── components/               # Individual component manifests
│       ├── cert-manager/         # Certificate management
│       ├── auth/                 # Authentication service
│       ├── tenant/               # Tenant management
│       ├── training/             # ML training service
│       ├── forecasting/          # Demand forecasting
│       ├── sales/                # Sales management
│       ├── external/             # External API service
│       ├── notification/         # Notification service
│       ├── inventory/            # Inventory management
│       ├── recipes/              # Recipe management
│       ├── suppliers/            # Supplier management
│       ├── pos/                  # Point of sale
│       ├── orders/               # Order management
│       ├── production/           # Production planning
│       ├── alert-processor/      # Alert processing
│       ├── frontend/             # React frontend
│       ├── databases/            # Database deployments
│       └── infrastructure/       # Gateway & monitoring
└── overlays/
    ├── dev/                      # Development environment
    │   ├── kustomization.yaml    # Dev-specific configuration
    │   └── dev-patches.yaml      # Development patches
    └── prod/                     # Production environment
```
## Access URLs
### Primary Access (Standard Web Ports)
- **Frontend**: https://localhost
- **API Gateway**: https://localhost/api
### Named Host Access (Optional)
Add to `/etc/hosts` for named access:
```bash
echo "127.0.0.1 bakery-ia.local" | sudo tee -a /etc/hosts
echo "127.0.0.1 api.bakery-ia.local" | sudo tee -a /etc/hosts
echo "127.0.0.1 monitoring.bakery-ia.local" | sudo tee -a /etc/hosts
```
Then access via:
- **Frontend**: https://bakery-ia.local
- **API**: https://api.bakery-ia.local
- **Monitoring**: https://monitoring.bakery-ia.local
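Running the `/etc/hosts` commands above more than once appends duplicate lines. A sketch of an idempotent alternative follows. `missing_host_entries` is a hypothetical helper that prints only the `127.0.0.1` mappings not already present, so its output can be piped to `sudo tee -a`.

```bash
# Hypothetical helper: print a "127.0.0.1 <host>" line for every host that the
# given hosts file does not already map to 127.0.0.1.
missing_host_entries() {
  local hosts_file=$1 host
  shift
  for host in "$@"; do
    # awk exits 0 only if some "127.0.0.1 ..." line lists this exact hostname
    if ! awk -v h="$host" '
        $1 == "127.0.0.1" { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit !found }' "$hosts_file" 2>/dev/null; then
      echo "127.0.0.1 $host"
    fi
  done
}
```

`missing_host_entries /etc/hosts bakery-ia.local api.bakery-ia.local monitoring.bakery-ia.local | sudo tee -a /etc/hosts` appends nothing once all three hosts are mapped.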
### Direct Service Access (Development)
- **Frontend**: http://localhost:3000
- **Gateway**: http://localhost:8000
## Development Workflow
### Start Development Environment
```bash
# Start development mode with hot-reload using Tilt
tilt up
# Or stream logs directly to the terminal instead of using the interactive UI
tilt up --stream
```
### Key Features
- **Hot-reload development** - Automatic rebuilds on code changes
- **Permanent localhost access** - No port forwarding needed
- **HTTPS by default** - Local CA certificates for secure development
- **Microservices architecture** - All services deployed together
- **Database management** - PostgreSQL, Redis, and RabbitMQ included
### Monitor and Debug
```bash
# Check all resources
kubectl get all -n bakery-ia
# View logs
kubectl logs -n bakery-ia deployment/auth-service -f
# Check ingress status
kubectl get ingress -n bakery-ia
# Debug certificate issues
kubectl describe certificate bakery-ia-tls-cert -n bakery-ia
```
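When `kubectl get all` returns a long list, it can help to filter for the pods that need attention. A minimal sketch, assuming the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...). `not_ready_pods` is a hypothetical helper, not a kubectl feature.

```bash
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin and
# print the name of every pod whose containers are not all ready, or whose
# STATUS is something other than Running/Completed.
not_ready_pods() {
  awk '{
    split($2, ready, "/")        # READY column, e.g. "1/1" or "0/2"
    if ($3 == "Completed") next   # finished Job pods are expected to be 0/1
    if (ready[1] != ready[2] || $3 != "Running") print $1
  }'
}
```

Usage: `kubectl get pods -n bakery-ia --no-headers | not_ready_pods`, then inspect each name with `kubectl describe pod` or `kubectl logs`.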
## Certificate Management
The platform uses cert-manager for automatic HTTPS certificate generation:
- **Local CA**: For development (default)
- **Let's Encrypt Staging**: For testing
- **Let's Encrypt Production**: For production deployments
### Trust Local Certificates
```bash
# Export CA certificate
kubectl get secret local-ca-key-pair -n cert-manager -o jsonpath='{.data.tls\.crt}' | base64 -d > bakery-ia-ca.crt
# Trust in macOS
open bakery-ia-ca.crt
# In Keychain Access, set "bakery-ia-local-ca" to "Always Trust"
```
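If the secret name or namespace is wrong, the export above silently writes an empty `bakery-ia-ca.crt`. A quick sanity check before importing it, sketched with a hypothetical `is_pem_certificate` helper:

```bash
# Hypothetical helper: succeed only if the file is non-empty and contains
# PEM certificate BEGIN/END markers.
is_pem_certificate() {
  local file=$1
  [ -s "$file" ] &&
    grep -q -e '-----BEGIN CERTIFICATE-----' "$file" &&
    grep -q -e '-----END CERTIFICATE-----' "$file"
}
```

For example, `is_pem_certificate bakery-ia-ca.crt && open bakery-ia-ca.crt`. As a command-line alternative to Keychain Access, the macOS `security` tool should also work: `sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain bakery-ia-ca.crt`.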
## Configuration Management
### Secrets
Secrets are stored base64-encoded in `base/secrets.yaml`. Base64 is an encoding, not encryption, so treat this file as plaintext. For production:
- Use external secret management (HashiCorp Vault, AWS Secrets Manager)
- Never commit real secrets to version control
```bash
# Encode secrets
echo -n "your-secret-value" | base64
# Decode secrets
echo "eW91ci1zZWNyZXQtdmFsdWU=" | base64 -d
```
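When adding several values to `base/secrets.yaml`, encoding them one at a time is error-prone, and a stray trailing newline (from `echo` without `-n`) ends up inside the secret. A sketch of a hypothetical `secret_data_lines` helper that emits ready-to-paste `data:` entries; the key name in the usage below is illustrative, not one of the platform's actual keys.

```bash
# Hypothetical helper: turn KEY=value pairs into indented `data:` entries for a
# Kubernetes Secret manifest. printf '%s' avoids a trailing newline, and
# `tr -d '\n'` removes the line wrapping GNU base64 adds to long values.
secret_data_lines() {
  local pair key value
  for pair in "$@"; do
    key=${pair%%=*}     # text before the first "="
    value=${pair#*=}    # everything after the first "="
    printf '  %s: %s\n' "$key" "$(printf '%s' "$value" | base64 | tr -d '\n')"
  done
}
```

`secret_data_lines EXAMPLE_PASSWORD=changeme` prints `  EXAMPLE_PASSWORD: Y2hhbmdlbWU=`. Alternatively, `kubectl create secret generic NAME --from-literal=KEY=value --dry-run=client -o yaml` generates a complete Secret manifest without touching the cluster.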
### Environment Configuration
Development-specific settings are in `overlays/dev/`:
- **Resource limits**: Reduced for local development
- **Image pull policy**: Never (for local images)
- **Debug settings**: Enabled
- **CORS**: Configured for localhost
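To confirm that the dev overlay really renders with `imagePullPolicy: Never`, you can render it with `kubectl kustomize` and count the values. A minimal sketch; `pull_policy_counts` is a hypothetical helper that scans the text for `imagePullPolicy:` keys rather than parsing YAML properly.

```bash
# Hypothetical helper: count imagePullPolicy values in rendered manifests on
# stdin and print "<policy> <count>" lines, sorted by policy name.
pull_policy_counts() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "imagePullPolicy:") count[$(i + 1)]++
  }
  END { for (p in count) print p, count[p] }' | sort
}
```

From `infrastructure/kubernetes/`: `kubectl kustomize overlays/dev | pull_policy_counts`.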
## Scaling and Resource Management
### Scale Services
```bash
# Scale individual service
kubectl scale -n bakery-ia deployment/auth-service --replicas=3
# Or update kustomization.yaml replicas section
```
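Kustomize's `replicas` field takes a list of `name`/`count` entries, and generating it avoids indentation mistakes. A sketch with a hypothetical `replicas_block` helper; the names you pass must match each Deployment's `metadata.name` (for example `auth-service`, as used in the commands above).

```bash
# Hypothetical helper: emit a Kustomize `replicas:` section from name=count pairs.
replicas_block() {
  local pair
  echo "replicas:"
  for pair in "$@"; do
    printf '  - name: %s\n    count: %s\n' "${pair%%=*}" "${pair#*=}"
  done
}
```

`replicas_block auth-service=3` prints a section you can merge into the overlay's `kustomization.yaml`. Because it lives in the manifests, this setting should persist when Tilt or Skaffold re-applies them, whereas a manual `kubectl scale` may be reset.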
### Resource Configuration
Development environment uses minimal resources:
- **Databases**: 64Mi-256Mi memory, 25m-200m CPU
- **Services**: 64Mi-256Mi memory, 25m-200m CPU
- **Training Service**: 256Mi-1Gi memory (ML workloads)
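With many services running, these per-pod limits add up against the 12 GiB given to Colima in the Quick Start. A rough way to total the memory limits currently set, sketched with a hypothetical `sum_mib` helper that only understands `Mi` and `Gi` quantities (other suffixes are reported and skipped):

```bash
# Hypothetical helper: read memory quantities (one per line) on stdin and
# print their total in MiB. Empty lines (containers without limits) are ignored.
sum_mib() {
  awk '
    /^[0-9]+Mi$/ { total += substr($0, 1, length($0) - 2); next }
    /^[0-9]+Gi$/ { total += substr($0, 1, length($0) - 2) * 1024; next }
    NF == 0      { next }
    { printf "skipping unsupported quantity: %s\n", $0 > "/dev/stderr" }
    END { print total + 0 }'
}
```

`kubectl get pods -n bakery-ia -o jsonpath='{range .items[*].spec.containers[*]}{.resources.limits.memory}{"\n"}{end}' | sum_mib` prints the total in MiB; compare it with 12288 MiB (12 GiB).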
## Troubleshooting
### Common Issues
1. **Images not found**
```bash
# Build images with Skaffold
skaffold build --profile=dev
```
2. **Database corruption after restart**
```bash
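# WARNING: this permanently erases the database's data.
# Mark the PVC for deletion first; Kubernetes removes it once no pod uses it.
kubectl delete pvc -n bakery-ia inventory-db-pvc --wait=false
kubectl delete pod -n bakery-ia -l app.kubernetes.io/name=inventory-db
# Then re-apply the manifests (for example, via Tilt) so a fresh PVC is created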
```
3. **HTTPS certificate not issued**
```bash
# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager
kubectl describe certificate bakery-ia-tls-cert -n bakery-ia
```
4. **Port conflicts**
```bash
# Check what's using ports 80/443
sudo lsof -i :80 -i :443
```
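For the port-conflict case, a narrower `lsof` query that lists only listening TCP sockets, plus a hypothetical `port_listeners` filter, makes the offending processes easier to spot. A sketch, assuming lsof's default column order (COMMAND, PID, ...):

```bash
# Hypothetical helper: read lsof output on stdin, skip the header row, and
# print each distinct "COMMAND PID" pair.
port_listeners() {
  awk 'NR > 1 { print $1, $2 }' | sort -u
}
```

`sudo lsof -nP -iTCP:80 -iTCP:443 -sTCP:LISTEN | port_listeners` prints one `COMMAND PID` line per process holding those ports.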
### Debug Commands
```bash
# Get cluster events
kubectl get events -n bakery-ia --sort-by='.firstTimestamp'
# Resource usage
kubectl top pods -n bakery-ia
kubectl top nodes
# Open a shell in a pod (use sh instead of bash if the image has no bash)
kubectl exec -n bakery-ia -it <pod-name> -- bash
```
## Cleanup
### Quick Cleanup
```bash
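# Stop Tilt or Skaffold with Ctrl+C, then remove the resources they deployed:
tilt down                      # if you started with Tilt
skaffold delete --profile=dev  # if you started with Skaffold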
```
### Complete Cleanup
```bash
# Delete everything
kubectl delete namespace bakery-ia
kind delete cluster --name bakery-ia-local
colima stop --profile k8s-local
```
### Restart Sequence
```bash
# Post-restart startup (or use the kubernetes_restart.sh script)
colima start --cpu 6 --memory 12 --disk 120 --runtime docker --profile k8s-local
kind create cluster --config kind-config.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
kubectl wait --namespace ingress-nginx --for=condition=ready pod --selector=app.kubernetes.io/component=controller --timeout=300s
tilt up
```
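After a reboot, the kind node containers may still exist, in which case `kind create cluster` fails because a cluster with that name already exists. A sketch of a guard, assuming the cluster is named `bakery-ia-local` (the name used in Complete Cleanup); `cluster_in_list` is a hypothetical helper:

```bash
# Hypothetical helper: succeed if the given cluster name appears as an exact
# line in `kind get clusters` output read from stdin.
cluster_in_list() {
  local name=$1
  grep -Fqx -- "$name"
}
```

Usage: `kind get clusters | cluster_in_list bakery-ia-local || kind create cluster --config kind-config.yaml`.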
## Production Deployment
### Production URLs
The production environment uses the following domains:
- **Main Application**: https://bakewise.ai
  - Frontend application and all public pages
  - API endpoints: https://bakewise.ai/api/v1/...
- **Monitoring Stack**: https://monitoring.bakewise.ai
  - Grafana: https://monitoring.bakewise.ai/grafana
  - Prometheus: https://monitoring.bakewise.ai/prometheus
  - Jaeger: https://monitoring.bakewise.ai/jaeger
  - AlertManager: https://monitoring.bakewise.ai/alertmanager
### Production Configuration
The production overlay (`overlays/prod/`) includes:
- **Domain Configuration**: bakewise.ai with Let's Encrypt certificates
- **High Availability**: Multi-replica deployments (2-3 replicas per service)
- **Enhanced Security**: Rate limiting, CORS restrictions, security headers
- **Monitoring**: Full observability stack with Prometheus, Grafana, Jaeger
### Production Considerations
For production deployment:
- **Security**: Implement RBAC, network policies, pod security standards
- **Monitoring**: Deploy Prometheus, Grafana, and alerting
- **Backup**: Database backup strategies
- **High Availability**: Multi-replica deployments with anti-affinity
- **External Secrets**: Use managed secret services
- **TLS**: Production Let's Encrypt certificates
- **CI/CD**: Automated deployment pipelines
- **DNS**: Configure DNS A/CNAME records pointing to your cluster's load balancer
## Next Steps
1. Add comprehensive monitoring and logging
2. Implement automated testing
3. Set up CI/CD pipelines
4. Add health checks and metrics endpoints
5. Implement proper backup strategies