diff --git a/CI_CD_IMPLEMENTATION_PLAN.md b/CI_CD_IMPLEMENTATION_PLAN.md index 2cf75201..9dcb749c 100644 --- a/CI_CD_IMPLEMENTATION_PLAN.md +++ b/CI_CD_IMPLEMENTATION_PLAN.md @@ -886,7 +886,7 @@ microk8s kubectl apply -f infrastructure/ci-cd/tekton/pipelines/ microk8s kubectl apply -f infrastructure/ci-cd/tekton/triggers/ # Apply Flux configurations -microk8s kubectl apply -f infrastructure/ci-cd/flux/ +microk8s kubectl apply -k infrastructure/ci-cd/flux/ ``` --- diff --git a/DOCKER_MAINTENANCE.md b/DOCKER_MAINTENANCE.md deleted file mode 100644 index 094ed099..00000000 --- a/DOCKER_MAINTENANCE.md +++ /dev/null @@ -1,120 +0,0 @@ -# Docker Maintenance Guide for Local Development - -## The Problem - -When developing with Tilt and local Kubernetes (Kind), Docker accumulates: -- **Multiple image versions** from each code change (Tilt rebuilds) -- **Unused volumes** from previous cluster runs -- **Build cache** that grows over time - -This quickly fills up disk space, causing pods to fail with "No space left on device" errors. - -## Quick Fix (When You Hit Disk Issues) - -```bash -# Clean up all unused Docker resources -docker system prune -a --volumes -f -``` - -This removes: -- All unused images -- All unused volumes -- All build cache - -**Expected recovery**: 60-100GB - -## Regular Maintenance - -### Option 1: Use the Cleanup Script (Recommended) - -Run the maintenance script weekly: - -```bash -./scripts/cleanup-docker.sh -``` - -Or run it automatically without confirmation: - -```bash -./scripts/cleanup-docker.sh --auto -``` - -### Option 2: Manual Commands - -```bash -# Remove images older than 24 hours -docker image prune -af --filter "until=24h" - -# Remove unused volumes -docker volume prune -f - -# Remove build cache -docker builder prune -af -``` - -### Option 3: Set Up Automated Cleanup - -Add to your crontab (run every Sunday at 2 AM): - -```bash -crontab -e -# Add this line: -0 2 * * 0 /Users/urtzialfaro/Documents/bakery-ia/scripts/cleanup-docker.sh --auto >> /tmp/docker-cleanup.log 2>&1 -``` - -## Monitoring Disk Usage - -### Check Docker disk usage: -```bash -docker system df -``` - -### Check Kind node disk usage: -```bash -docker exec bakery-ia-local-control-plane df -h /var -``` - -### Alert thresholds: -- **< 70%**: Healthy ✅ -- **70-85%**: Consider cleanup soon ⚠️ -- **> 85%**: Run cleanup immediately 🚨 -- **> 95%**: Critical - pods will fail ❌ - -## Prevention Tips - -1. **Run cleanup weekly** to prevent accumulation -2. **Monitor disk usage** before long dev sessions -3. **Delete old Kind clusters** when switching projects: - ```bash - kind delete cluster --name bakery-ia-local - ``` -4. **Increase Docker disk allocation** in Docker Desktop settings if you frequently rebuild many services - -## Troubleshooting - -### Pods in CrashLoopBackOff after disk issues: - -1. Run cleanup (see Quick Fix above) -2. 
Restart failed pods: - ```bash - kubectl get pods -n bakery-ia | grep -E "(CrashLoopBackOff|Error)" | awk '{print $1}' | xargs kubectl delete pod -n bakery-ia - ``` - -### Cleanup didn't free enough space: - -If still above 90% after cleanup: - -```bash -# Nuclear option - rebuild everything -kind delete cluster --name bakery-ia-local -docker system prune -a --volumes -f -# Then recreate cluster with your setup scripts -``` - -## What Happened Today (2026-01-12) - -- **Issue**: Disk was 100% full (113GB/113GB), causing database pods to crash -- **Root cause**: 122 unused Docker images + 16 unused volumes + 6GB build cache -- **Solution**: Ran `docker system prune -a --volumes -f` -- **Result**: Freed 89GB, disk now at 22% usage (24GB/113GB) -- **All services recovered successfully** diff --git a/INFRASTRUCTURE_REORGANIZATION_PROPOSAL.md b/INFRASTRUCTURE_REORGANIZATION_PROPOSAL.md deleted file mode 100644 index ef36620d..00000000 --- a/INFRASTRUCTURE_REORGANIZATION_PROPOSAL.md +++ /dev/null @@ -1,413 +0,0 @@ -# Infrastructure Reorganization Proposal for Bakery-IA - -## Executive Summary - -This document presents a comprehensive analysis of the current infrastructure organization and proposes a restructured layout that improves maintainability, scalability, and operational efficiency. The proposal is based on a detailed examination of the existing 177 files across 31 directories in the infrastructure folder. - -## Current Infrastructure Analysis - -### Current Structure Overview - -``` -infrastructure/ -├── ci-cd/ # 18 files - CI/CD pipeline components -├── helm/ # 8 files - Helm charts and scripts -├── kubernetes/ # 103 files - Kubernetes manifests and configs -├── signoz/ # 11 files - Monitoring dashboards and scripts -└── tls/ # 37 files - TLS certificates and generation scripts -``` - -### Key Findings - -1. **Kubernetes Base Components (103 files)**: The most complex area with: - - 20+ service deployments across 15+ microservices - - 20+ database configurations (PostgreSQL, RabbitMQ, MinIO) - - 19 migration jobs for different services - - Infrastructure components (gateway, monitoring, etc.) - -2. **CI/CD Pipeline (18 files)**: - - Tekton tasks and pipelines for GitOps workflow - - Flux CD configuration for continuous delivery - - Gitea configuration for Git repository management - -3. **Monitoring (11 files)**: - - SigNoz dashboards for comprehensive observability - - Import scripts for dashboard management - -4. **TLS Certificates (37 files)**: - - CA certificates and generation scripts - - Service-specific certificates (PostgreSQL, Redis, MinIO) - - Certificate signing requests and configurations - -### Strengths of Current Organization - -1. **Logical Grouping**: Components are generally well-grouped by function -2. **Base/Overlay Pattern**: Kubernetes uses proper base/overlay structure -3. **Comprehensive Monitoring**: SigNoz dashboards cover all major aspects -4. **Security Focus**: Dedicated TLS certificate management - -### Challenges Identified - -1. **Complexity in Kubernetes Base**: 103 files make navigation difficult -2. **Mixed Component Types**: Services, databases, and infrastructure mixed together -3. **Limited Environment Separation**: Only dev/prod overlays, no staging -4. **Script Scattering**: Automation scripts spread across directories -5. 
**Documentation Gaps**: Some components lack clear documentation - -## Proposed Infrastructure Organization - -### High-Level Structure - -``` -infrastructure/ -├── environments/ # Environment-specific configurations -├── platform/ # Platform-level infrastructure -├── services/ # Application services and microservices -├── monitoring/ # Observability and monitoring -├── cicd/ # CI/CD pipeline components -├── security/ # Security configurations and certificates -├── scripts/ # Automation and utility scripts -├── docs/ # Infrastructure documentation -└── README.md # Top-level infrastructure guide -``` - -### Detailed Structure Proposal - -``` -infrastructure/ -├── environments/ # Environment-specific configurations -│ ├── dev/ -│ │ ├── k8s-manifests/ -│ │ │ ├── base/ -│ │ │ │ ├── namespace.yaml -│ │ │ │ ├── configmap.yaml -│ │ │ │ ├── secrets.yaml -│ │ │ │ └── ingress-https.yaml -│ │ │ ├── components/ -│ │ │ │ ├── databases/ -│ │ │ │ ├── infrastructure/ -│ │ │ │ ├── microservices/ -│ │ │ │ └── cert-manager/ -│ │ │ ├── configs/ -│ │ │ ├── cronjobs/ -│ │ │ ├── jobs/ -│ │ │ └── migrations/ -│ │ ├── kustomization.yaml -│ │ └── values/ -│ ├── staging/ # New staging environment -│ │ ├── k8s-manifests/ -│ │ └── values/ -│ └── prod/ -│ ├── k8s-manifests/ -│ ├── terraform/ # Production-specific IaC -│ └── values/ -├── platform/ # Platform-level infrastructure -│ ├── cluster/ -│ │ ├── eks/ # AWS EKS configuration -│ │ │ ├── terraform/ -│ │ │ └── manifests/ -│ │ └── kind/ # Local development cluster -│ │ ├── config.yaml -│ │ └── manifests/ -│ ├── networking/ -│ │ ├── dns/ -│ │ ├── load-balancers/ -│ │ └── ingress/ -│ │ ├── nginx/ -│ │ └── cert-manager/ -│ ├── security/ -│ │ ├── rbac/ -│ │ ├── network-policies/ -│ │ └── tls/ -│ │ ├── ca/ -│ │ ├── postgres/ -│ │ ├── redis/ -│ │ └── minio/ -│ └── storage/ -│ ├── postgres/ -│ ├── redis/ -│ └── minio/ -├── services/ # Application services -│ ├── databases/ -│ │ ├── postgres/ -│ │ │ ├── k8s-manifests/ -│ │ │ ├── backups/ -│ │ │ ├── monitoring/ -│ │ │ └── maintenance/ -│ │ ├── redis/ -│ │ │ ├── configs/ -│ │ │ └── monitoring/ -│ │ └── minio/ -│ │ ├── buckets/ -│ │ └── policies/ -│ ├── api-gateway/ -│ │ ├── k8s-manifests/ -│ │ └── configs/ -│ └── microservices/ -│ ├── auth/ -│ │ ├── k8s-manifests/ -│ │ └── configs/ -│ ├── tenant/ -│ │ ├── k8s-manifests/ -│ │ └── configs/ -│ ├── training/ -│ │ ├── k8s-manifests/ -│ │ └── configs/ -│ ├── forecasting/ -│ │ ├── k8s-manifests/ -│ │ └── configs/ -│ ├── sales/ -│ │ ├── k8s-manifests/ -│ │ └── configs/ -│ ├── external/ -│ │ ├── k8s-manifests/ -│ │ └── configs/ -│ ├── notification/ -│ │ ├── k8s-manifests/ -│ │ └── configs/ -│ ├── inventory/ -│ │ ├── k8s-manifests/ -│ │ └── configs/ -│ ├── recipes/ -│ │ ├── k8s-manifests/ -│ │ └── configs/ -│ ├── suppliers/ -│ │ ├── k8s-manifests/ -│ │ └── configs/ -│ ├── pos/ -│ │ ├── k8s-manifests/ -│ │ └── configs/ -│ ├── orders/ -│ │ ├── k8s-manifests/ -│ │ └── configs/ -│ ├── production/ -│ │ ├── k8s-manifests/ -│ │ └── configs/ -│ ├── procurement/ -│ │ ├── k8s-manifests/ -│ │ └── configs/ -│ ├── orchestrator/ -│ │ ├── k8s-manifests/ -│ │ └── configs/ -│ ├── alert-processor/ -│ │ ├── k8s-manifests/ -│ │ └── configs/ -│ ├── ai-insights/ -│ │ ├── k8s-manifests/ -│ │ └── configs/ -│ ├── demo-session/ -│ │ ├── k8s-manifests/ -│ │ └── configs/ -│ └── frontend/ -│ ├── k8s-manifests/ -│ └── configs/ -├── monitoring/ # Observability stack -│ ├── signoz/ -│ │ ├── manifests/ -│ │ ├── dashboards/ -│ │ │ ├── alert-management.json -│ │ │ ├── api-performance.json -│ │ │ ├── 
application-performance.json -│ │ │ ├── database-performance.json -│ │ │ ├── error-tracking.json -│ │ │ ├── index.json -│ │ │ ├── infrastructure-monitoring.json -│ │ │ ├── log-analysis.json -│ │ │ ├── system-health.json -│ │ │ └── user-activity.json -│ │ ├── values-dev.yaml -│ │ ├── values-prod.yaml -│ │ ├── deploy-signoz.sh -│ │ ├── verify-signoz.sh -│ │ └── generate-test-traffic.sh -│ └── opentelemetry/ -│ ├── collector/ -│ └── agent/ -├── cicd/ # CI/CD pipeline -│ ├── gitea/ -│ │ ├── values.yaml -│ │ └── ingress.yaml -│ ├── tekton/ -│ │ ├── tasks/ -│ │ │ ├── git-clone.yaml -│ │ │ ├── detect-changes.yaml -│ │ │ ├── kaniko-build.yaml -│ │ │ └── update-gitops.yaml -│ │ ├── pipelines/ -│ │ └── triggers/ -│ └── flux/ -│ ├── git-repository.yaml -│ └── kustomization.yaml -├── security/ # Security configurations -│ ├── policies/ -│ │ ├── network-policies.yaml -│ │ ├── pod-security.yaml -│ │ └── rbac.yaml -│ ├── certificates/ -│ │ ├── ca/ -│ │ ├── services/ -│ │ └── rotation-scripts/ -│ ├── scanning/ -│ │ ├── trivy/ -│ │ └── policies/ -│ └── compliance/ -│ ├── cis-benchmarks/ -│ └── audit-scripts/ -├── scripts/ # Automation scripts -│ ├── setup/ -│ │ ├── generate-certificates.sh -│ │ ├── generate-minio-certificates.sh -│ │ └── setup-dockerhub-secrets.sh -│ ├── deployment/ -│ │ ├── deploy-signoz.sh -│ │ └── verify-signoz.sh -│ ├── maintenance/ -│ │ ├── regenerate_migrations_k8s.sh -│ │ └── kubernetes_restart.sh -│ └── verification/ -│ └── verify-registry.sh -├── docs/ # Infrastructure documentation -│ ├── architecture/ -│ │ ├── diagrams/ -│ │ └── decisions/ -│ ├── operations/ -│ │ ├── runbooks/ -│ │ └── troubleshooting/ -│ ├── onboarding/ -│ └── reference/ -│ ├── api/ -│ └── configurations/ -└── README.md -``` - -## Migration Strategy - -### Phase 1: Preparation and Planning - -1. **Inventory Analysis**: Complete detailed inventory of all current files -2. **Dependency Mapping**: Identify dependencies between components -3. **Impact Assessment**: Determine which components can be moved safely -4. **Backup Strategy**: Ensure all files are backed up before migration - -### Phase 2: Non-Critical Components - -1. **Documentation**: Move and update all documentation files -2. **Scripts**: Organize automation scripts into new structure -3. **Monitoring**: Migrate SigNoz dashboards and configurations -4. **CI/CD**: Reorganize pipeline components - -### Phase 3: Environment-Specific Components - -1. **Create Environment Structure**: Set up dev/staging/prod directories -2. **Migrate Kubernetes Manifests**: Move base components to appropriate locations -3. **Update References**: Ensure all cross-references are corrected -4. **Environment Validation**: Test each environment separately - -### Phase 4: Service Components - -1. **Database Migration**: Move database configurations to services/databases -2. **Microservice Organization**: Group microservices by domain -3. **Infrastructure Components**: Move gateway and other infrastructure -4. **Service Validation**: Test each service in isolation - -### Phase 5: Finalization - -1. **Integration Testing**: Test complete infrastructure workflow -2. **Documentation Update**: Finalize all documentation -3. **Team Training**: Conduct training on new structure -4. **Cleanup**: Remove old structure and temporary files - -## Benefits of Proposed Structure - -### 1. 
Improved Navigation -- **Clear Hierarchy**: Logical grouping by function and environment -- **Consistent Patterns**: Standardized structure across all components -- **Reduced Cognitive Load**: Easier to find specific components - -### 2. Enhanced Maintainability -- **Environment Isolation**: Clear separation of dev/staging/prod -- **Component Grouping**: Related components grouped together -- **Standardized Structure**: Consistent patterns across services - -### 3. Better Scalability -- **Modular Design**: Easy to add new services or environments -- **Domain Separation**: Services organized by business domain -- **Infrastructure Independence**: Platform components separate from services - -### 4. Improved Security -- **Centralized Security**: All security configurations in one place -- **Environment-Specific Policies**: Tailored security for each environment -- **Better Secret Management**: Clear structure for sensitive data - -### 5. Enhanced Observability -- **Comprehensive Monitoring**: All observability tools grouped -- **Standardized Dashboards**: Consistent monitoring across services -- **Centralized Logging**: Better log management structure - -## Implementation Considerations - -### Tools and Technologies -- **Terraform**: For infrastructure as code (IaC) -- **Kustomize**: For Kubernetes manifest management -- **Helm**: For complex application deployments -- **SOPS/Sealed Secrets**: For secret management -- **Trivy**: For vulnerability scanning - -### Team Adaptation -- **Training Plan**: Develop comprehensive training materials -- **Migration Guide**: Create step-by-step migration documentation -- **Support Period**: Provide dedicated support during transition -- **Feedback Mechanism**: Establish channels for team feedback - -### Risk Mitigation -- **Phased Approach**: Implement changes incrementally -- **Rollback Plan**: Develop comprehensive rollback procedures -- **Testing Strategy**: Implement thorough testing at each phase -- **Monitoring**: Enhanced monitoring during migration period - -## Expected Outcomes - -1. **Reduced Time-to-Find**: 40-60% reduction in time spent locating files -2. **Improved Deployment Speed**: 25-35% faster deployment cycles -3. **Enhanced Collaboration**: Better team coordination and understanding -4. **Reduced Errors**: 30-50% reduction in configuration errors -5. **Better Scalability**: Easier to add new services and features - -## Conclusion - -The proposed infrastructure reorganization represents a significant improvement over the current structure. By implementing a clear, logical hierarchy with proper separation of concerns, the new organization will: - -- **Improve operational efficiency** through better navigation and maintainability -- **Enhance security** with centralized security management -- **Support growth** with a scalable, modular design -- **Reduce errors** through standardized patterns and structures -- **Facilitate collaboration** with intuitive organization - -The key to successful implementation is a phased approach with thorough testing and team involvement at each stage. With proper planning and execution, this reorganization will provide long-term benefits for the Bakery-IA project's infrastructure management. 
- -## Appendix: File Migration Mapping - -### Current → Proposed Mapping - -**Kubernetes Components:** -- `infrastructure/kubernetes/base/components/*` → `infrastructure/services/microservices/*/` -- `infrastructure/kubernetes/base/components/databases/*` → `infrastructure/services/databases/*/` -- `infrastructure/kubernetes/base/migrations/*` → `infrastructure/services/microservices/*/migrations/` -- `infrastructure/kubernetes/base/configs/*` → `infrastructure/environments/*/values/` - -**CI/CD Components:** -- `infrastructure/ci-cd/*` → `infrastructure/cicd/*/` - -**Monitoring Components:** -- `infrastructure/signoz/*` → `infrastructure/monitoring/signoz/*/` -- `infrastructure/helm/*` → `infrastructure/monitoring/signoz/*/` (signoz-related) - -**Security Components:** -- `infrastructure/tls/*` → `infrastructure/security/certificates/*/` - -**Scripts:** -- `infrastructure/kubernetes/*.sh` → `infrastructure/scripts/*/` -- `infrastructure/helm/*.sh` → `infrastructure/scripts/deployment/*/` -- `infrastructure/tls/*.sh` → `infrastructure/scripts/setup/*/` - -This mapping provides a clear path for migrating each component to its new location while maintaining functionality and relationships between components. \ No newline at end of file diff --git a/README.md b/README.md index 9d4fab39..11de692d 100644 --- a/README.md +++ b/README.md @@ -81,7 +81,7 @@ For production deployment on clouding.io with Kubernetes: 3. Configure production-specific values 4. Deploy using the production kustomization: ```bash - kubectl apply -k infrastructure/kubernetes/environments/production/ + kubectl apply -k infrastructure/environments/prod/k8s-manifests ``` ## 🤝 Contributing diff --git a/Tiltfile b/Tiltfile index 750e8f98..4936cc7d 100644 --- a/Tiltfile +++ b/Tiltfile @@ -17,6 +17,43 @@ # ============================================================================= +# ============================================================================= +# PREPULL BASE IMAGES STEP - CRITICAL FIRST STEP +# ============================================================================= + +# Run the prepull script first - if this fails, don't continue +local_resource( + 'prepull-base-images', + cmd='''#!/usr/bin/env bash + echo "==========================================" + echo "PREPULLING BASE IMAGES - CRITICAL STEP" + echo "==========================================" + echo "" + + # Run the prepull script + if ./scripts/prepull-base-images.sh; then + echo "" + echo "✓ Base images prepull completed successfully" + echo "==========================================" + echo "CONTINUING WITH TILT SETUP..." + echo "==========================================" + exit 0 + else + echo "" + echo "❌ Base images prepull FAILED - stopping Tilt execution" + echo "This usually happens due to Docker Hub rate limits" + echo "Please try again later or configure Docker Hub credentials" + echo "==========================================" + # Exit with error code to prevent further execution + exit 1 + fi + ''', + labels=['00-prepull'], + auto_init=True, + allow_parallel=False +) + + # ============================================================================= # TILT CONFIGURATION # ============================================================================= @@ -191,132 +228,68 @@ Monitoring: Applying security configurations... """) -# Create Docker Hub secret for image pulls (if credentials are available) -local_resource( - 'dockerhub-secret', - cmd=''' - echo "Setting up Docker Hub image pull secret..." 
- - # Check if Docker Hub credentials are available - if [ -n "$DOCKERHUB_USERNAME" ] && [ -n "$DOCKERHUB_PASSWORD" ]; then - echo " Found DOCKERHUB_USERNAME and DOCKERHUB_PASSWORD environment variables" - ./infrastructure/kubernetes/create-dockerhub-secret.sh - elif [ -f "$HOME/.docker/config.json" ]; then - echo " Attempting to use Docker CLI credentials..." - ./infrastructure/kubernetes/create-dockerhub-secret.sh - else - echo " Docker Hub credentials not found" - echo " To enable automatic Docker Hub authentication:" - echo " 1. Run 'docker login', OR" - echo " 2. Set environment variables:" - echo " export DOCKERHUB_USERNAME='your-username'" - echo " export DOCKERHUB_PASSWORD='your-password-or-token'" - echo "" - echo " Continuing without Docker Hub authentication..." - echo " (This is OK for local development using local registry)" - fi - ''', - labels=['00-security'], - auto_init=True -) # Apply security configurations before loading main manifests local_resource( 'security-setup', cmd=''' echo "Applying security secrets and configurations..." - kubectl apply -f infrastructure/kubernetes/base/secrets.yaml - kubectl apply -f infrastructure/kubernetes/base/secrets/postgres-tls-secret.yaml - kubectl apply -f infrastructure/kubernetes/base/secrets/redis-tls-secret.yaml - kubectl apply -f infrastructure/kubernetes/base/configs/postgres-init-config.yaml - kubectl apply -f infrastructure/kubernetes/base/configmaps/postgres-logging-config.yaml + + # First, ensure all required namespaces exist + echo "Creating namespaces..." + kubectl apply -f infrastructure/namespaces/bakery-ia.yaml + kubectl apply -f infrastructure/namespaces/tekton-pipelines.yaml + kubectl apply -f infrastructure/namespaces/flux-system.yaml + + # Wait for namespaces to be ready + echo "Waiting for namespaces to be ready..." + for ns in bakery-ia tekton-pipelines flux-system; do + until kubectl get namespace $ns 2>/dev/null; do + echo "Waiting for namespace $ns to be created..." + sleep 2 + done + echo "Namespace $ns is available" + done + + # Apply common secrets and configs + kubectl apply -f infrastructure/environments/common/configs/configmap.yaml + kubectl apply -f infrastructure/environments/common/configs/secrets.yaml + + # Apply database secrets and configs + kubectl apply -f infrastructure/platform/storage/postgres/secrets/postgres-tls-secret.yaml + kubectl apply -f infrastructure/platform/storage/postgres/configs/postgres-init-config.yaml + kubectl apply -f infrastructure/platform/storage/postgres/configs/postgres-logging-config.yaml + + # Apply Redis secrets + kubectl apply -f infrastructure/platform/storage/redis/secrets/redis-tls-secret.yaml + + # Apply MinIO secrets and configs + kubectl apply -f infrastructure/platform/storage/minio/minio-secrets.yaml + kubectl apply -f infrastructure/platform/storage/minio/secrets/minio-tls-secret.yaml + + # Apply Mail/SMTP secrets + kubectl apply -f infrastructure/platform/mail/mailu/mailu-secrets.yaml + + # Apply CI/CD secrets + kubectl apply -f infrastructure/cicd/tekton/secrets/secrets.yaml + echo "Security configurations applied" ''', - resource_deps=['dockerhub-secret'], + resource_deps=['prepull-base-images'], # Removed dockerhub-secret dependency labels=['00-security'], auto_init=True ) # Verify TLS certificates are mounted correctly -local_resource( - 'verify-tls', - cmd=''' - echo "Verifying TLS configuration..." 
- sleep 5 # Wait for pods to be ready - # Check if auth-db pod exists and has TLS certs - AUTH_POD=$(kubectl get pods -n bakery-ia -l app.kubernetes.io/name=auth-db -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || echo "") - if [ -n "$AUTH_POD" ]; then - echo " Checking PostgreSQL TLS certificates..." - kubectl exec -n bakery-ia "$AUTH_POD" -- ls -la /tls/ 2>/dev/null && \ - echo " PostgreSQL TLS certificates mounted" || \ - echo " PostgreSQL TLS certificates not found (pods may still be starting)" - fi - - # Check if redis pod exists and has TLS certs - REDIS_POD=$(kubectl get pods -n bakery-ia -l app.kubernetes.io/name=redis -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || echo "") - - if [ -n "$REDIS_POD" ]; then - echo " Checking Redis TLS certificates..." - kubectl exec -n bakery-ia "$REDIS_POD" -- ls -la /tls/ 2>/dev/null && \ - echo " Redis TLS certificates mounted" || \ - echo " Redis TLS certificates not found (pods may still be starting)" - fi - - echo "TLS verification complete" - ''', - resource_deps=['auth-db', 'redis'], - auto_init=True, - labels=['00-security'] -) - -# Verify PVCs are bound -local_resource( - 'verify-pvcs', - cmd=''' - echo "Verifying PersistentVolumeClaims..." - kubectl get pvc -n bakery-ia | grep -E "NAME|db-pvc" || echo " PVCs not yet bound" - PVC_COUNT=$(kubectl get pvc -n bakery-ia -o json | jq '.items | length') - echo " Found $PVC_COUNT PVCs" - echo "PVC verification complete" - ''', - resource_deps=['auth-db'], - auto_init=True, - labels=['00-security'] -) - -# Install and verify cert-manager -local_resource( - 'cert-manager-install', - cmd=''' - echo "Installing cert-manager..." - - # Check if cert-manager CRDs already exist - if kubectl get crd certificates.cert-manager.io >/dev/null 2>&1; then - echo " cert-manager CRDs already installed" - else - echo " Installing cert-manager v1.13.2..." - kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.2/cert-manager.yaml - - echo " Waiting for cert-manager to be ready..." - kubectl wait --for=condition=available --timeout=120s deployment/cert-manager -n cert-manager - kubectl wait --for=condition=available --timeout=120s deployment/cert-manager-webhook -n cert-manager - - echo " cert-manager installed and ready" - fi - - echo "cert-manager verification complete" - ''', - labels=['00-security'], - auto_init=True -) # ============================================================================= # LOAD KUBERNETES MANIFESTS # ============================================================================= -k8s_yaml(kustomize('infrastructure/kubernetes/overlays/dev')) +# Load the main kustomize overlay for the dev environment +k8s_yaml(kustomize('infrastructure/environments/dev/k8s-manifests')) # ============================================================================= # DOCKER BUILD HELPERS @@ -509,6 +482,9 @@ k8s_resource('nominatim', labels=['01-infrastructure']) k8s_resource('minio', resource_deps=['security-setup'], labels=['01-infrastructure']) k8s_resource('minio-bucket-init', resource_deps=['minio'], labels=['01-infrastructure']) +# Mail Infrastructure (Mailu) +k8s_resource('mailu-front', resource_deps=['security-setup'], labels=['01-infrastructure']) + # ============================================================================= # MONITORING RESOURCES - SigNoz (Unified Observability) # ============================================================================= @@ -520,15 +496,6 @@ local_resource( echo "Deploying SigNoz Monitoring Stack..." 
echo "" - # Ensure Docker Hub secret exists in bakery-ia namespace - echo "Ensuring Docker Hub secret exists in bakery-ia namespace..." - if ! kubectl get secret dockerhub-creds -n bakery-ia &>/dev/null; then - echo " Docker Hub secret not found, attempting to create..." - ./infrastructure/kubernetes/create-dockerhub-secret.sh || echo " Continuing without Docker Hub authentication..." - else - echo " Docker Hub secret exists" - fi - echo "" # Check if SigNoz is already deployed if helm list -n bakery-ia | grep -q signoz; then @@ -544,7 +511,7 @@ local_resource( # Install SigNoz with custom values in the bakery-ia namespace helm upgrade --install signoz signoz/signoz \ -n bakery-ia \ - -f infrastructure/helm/signoz-values-dev.yaml \ + -f infrastructure/monitoring/signoz/signoz-values-dev.yaml \ --timeout 10m \ --wait @@ -568,43 +535,6 @@ local_resource( auto_init=False, ) -# Track SigNoz pods in Tilt UI using workload tracking -# These will automatically discover pods once SigNoz is deployed -local_resource( - 'signoz-status', - cmd=''' - echo "SigNoz Status Check" - echo "" - - # Check pod status - echo "Current SigNoz pods:" - kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=signoz -o wide 2>/dev/null || echo "No pods found" - - echo "" - echo "SigNoz Services:" - kubectl get svc -n bakery-ia -l app.kubernetes.io/instance=signoz 2>/dev/null || echo "No services found" - - # Check if all pods are ready - TOTAL_PODS=$(kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=signoz --no-headers 2>/dev/null | wc -l | tr -d ' ') - READY_PODS=$(kubectl get pods -n bakery-ia -l app.kubernetes.io/instance=signoz --field-selector=status.phase=Running --no-headers 2>/dev/null | wc -l | tr -d ' ') - - if [ "$TOTAL_PODS" -gt 0 ]; then - echo "" - echo "Pod Status: $READY_PODS/$TOTAL_PODS ready" - - if [ "$READY_PODS" -eq "$TOTAL_PODS" ]; then - echo "All SigNoz pods are running!" - echo "" - echo "Access SigNoz at: https://monitoring.bakery-ia.local" - echo "Credentials: admin / admin" - else - echo "Waiting for pods to become ready..." - fi - fi - ''', - labels=['05-monitoring'], - auto_init=False, -) # Optional exporters (in monitoring namespace) - DISABLED since using SigNoz # k8s_resource('node-exporter', labels=['05-monitoring']) @@ -774,6 +704,98 @@ watch_settings( ] ) +# ============================================================================= +# CI/CD INFRASTRUCTURE - MANUAL TRIGGERS +# ============================================================================= + +# Tekton Pipelines - Manual trigger for local development +local_resource( + 'tekton-pipelines', + cmd=''' + echo "Setting up Tekton Pipelines for CI/CD..." + echo "" + + # Check if Tekton CRDs are already installed + if kubectl get crd pipelines.tekton.dev >/dev/null 2>&1; then + echo " Tekton CRDs already installed" + else + echo " Installing Tekton v0.57.0..." + kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/latest/release.yaml + + echo " Waiting for Tekton to be ready..." + kubectl wait --for=condition=available --timeout=180s deployment/tekton-pipelines-controller -n tekton-pipelines + kubectl wait --for=condition=available --timeout=180s deployment/tekton-pipelines-webhook -n tekton-pipelines + + echo " Tekton installed and ready" + fi + + echo "" + echo "Applying Tekton configurations..." 
+ kubectl apply -f infrastructure/cicd/tekton/kustomization.yaml + kubectl apply -f infrastructure/cicd/tekton/rbac/ + kubectl apply -f infrastructure/cicd/tekton/tasks/ + kubectl apply -f infrastructure/cicd/tekton/pipelines/ + + echo "" + echo "Tekton setup complete!" + echo "To check status: kubectl get pods -n tekton-pipelines" + ''', + labels=['99-cicd'], + auto_init=False, # Manual trigger only +) + +# Flux CD - Manual trigger for GitOps +local_resource( + 'flux-cd', + cmd=''' + echo "Setting up Flux CD for GitOps..." + echo "" + + # Check if Flux CRDs are already installed + if kubectl get crd gitrepositories.source.toolkit.fluxcd.io >/dev/null 2>&1; then + echo " Flux CRDs already installed" + else + echo " Installing Flux v2.2.3..." + curl -sL https://fluxcd.io/install.sh | sudo bash + flux install --version=latest + + echo " Flux installed and ready" + fi + + echo "" + echo "Applying Flux configurations..." + kubectl apply -f infrastructure/cicd/flux/ + + echo "" + echo "Flux setup complete!" + echo "To check status: flux check" + ''', + labels=['99-cicd'], + auto_init=False, # Manual trigger only +) + +# Gitea - Manual trigger for local Git server +local_resource( + 'gitea', + cmd=''' + echo "Setting up Gitea for local Git server..." + echo "" + + # Apply Gitea configurations + kubectl create namespace gitea || true + kubectl apply -f infrastructure/cicd/gitea/ + + echo "" + echo "Gitea setup complete!" + echo "Access Gitea at: http://gitea.local (add to /etc/hosts)" + echo "Default credentials: admin/admin123 (change after first login)" + echo "To check status: kubectl get pods -n gitea" + ''', + labels=['99-cicd'], + auto_init=False, # Manual trigger only +) + + # ============================================================================= # STARTUP SUMMARY # ============================================================================= @@ -804,11 +826,16 @@ Access your application: SigNoz (Unified Observability): Deploy via Tilt: Trigger 'signoz-deployment' resource - Manual deploy: ./infrastructure/helm/deploy-signoz.sh dev + Manual deploy: ./infrastructure/monitoring/signoz/deploy-signoz.sh dev Access (if deployed): https://monitoring.bakery-ia.local Username: admin Password: admin +CI/CD Infrastructure (Manual Triggers): + Tekton: Trigger 'tekton-pipelines' resource + Flux: Trigger 'flux-cd' resource + Gitea: Trigger 'gitea' resource + Verify security: kubectl get pvc -n bakery-ia kubectl get secrets -n bakery-ia | grep tls diff --git a/docs/PRODUCTION_OPERATIONS_GUIDE.md b/docs/PRODUCTION_OPERATIONS_GUIDE.md index 17072b67..da2d17dd 100644 --- a/docs/PRODUCTION_OPERATIONS_GUIDE.md +++ b/docs/PRODUCTION_OPERATIONS_GUIDE.md @@ -685,7 +685,7 @@ kubectl scale deployment auth-service -n bakery-ia --replicas=2 # 2. Install MicroK8s (follow pilot launch guide) # 3. Copy latest backup to new VPS # 4. Deploy infrastructure and databases -kubectl apply -k infrastructure/kubernetes/overlays/prod +kubectl apply -k infrastructure/environments/prod/k8s-manifests # 5. Wait for databases to be ready kubectl wait --for=condition=ready pod -l app.kubernetes.io/component=database -n bakery-ia @@ -699,7 +699,7 @@ for backup in /backups/latest/*.sql; do done # 7. Deploy services -kubectl apply -k infrastructure/kubernetes/overlays/prod +kubectl apply -k infrastructure/environments/prod/k8s-manifests # 8. Update DNS to point to new VPS # 9. 
Verify all services healthy @@ -830,12 +830,12 @@ nproc kubectl scale deployment orders-service -n bakery-ia --replicas=5 # Or update in kustomization for persistence -# Edit: infrastructure/kubernetes/overlays/prod/kustomization.yaml +# Edit: infrastructure/environments/prod/k8s-manifests/kustomization.yaml replicas: - name: orders-service count: 5 -kubectl apply -k infrastructure/kubernetes/overlays/prod +kubectl apply -k infrastructure/environments/prod/k8s-manifests ``` ### Auto-Scaling (HPA) @@ -976,7 +976,7 @@ resources: memory: "1Gi" # Increased from 512Mi # 4. Redeploy -kubectl apply -k infrastructure/kubernetes/overlays/prod +kubectl apply -k infrastructure/environments/prod/k8s-manifests ``` #### Incident: Certificate Expired diff --git a/docs/database-security.md b/docs/database-security.md index 0813431b..a52b2588 100644 --- a/docs/database-security.md +++ b/docs/database-security.md @@ -324,12 +324,12 @@ log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h ' **Renewal Process:** ```bash # 1. Regenerate certificates (90 days before expiry) -cd infrastructure/tls && ./generate-certificates.sh +cd infrastructure/security/certificates && ./generate-certificates.sh # 2. Update Kubernetes secrets kubectl delete secret postgres-tls redis-tls -n bakery-ia -kubectl apply -f infrastructure/kubernetes/base/secrets/postgres-tls-secret.yaml -kubectl apply -f infrastructure/kubernetes/base/secrets/redis-tls-secret.yaml +kubectl apply -f infrastructure/environments/dev/k8s-manifests/base/secrets/postgres-tls-secret.yaml +kubectl apply -f infrastructure/environments/dev/k8s-manifests/base/secrets/redis-tls-secret.yaml # 3. Restart database pods (automatic) kubectl rollout restart deployment -l app.kubernetes.io/component=database -n bakery-ia @@ -351,7 +351,7 @@ kubectl rollout restart deployment -l app.kubernetes.io/component=database -n ba ./scripts/update-k8s-secrets.sh # 4. Apply secrets -kubectl apply -f infrastructure/kubernetes/base/secrets.yaml +kubectl apply -f infrastructure/environments/common/configs/secrets.yaml # 5. Restart databases and services kubectl rollout restart deployment -n bakery-ia diff --git a/gateway/Dockerfile b/gateway/Dockerfile index 5155fd07..663a4d26 100644 --- a/gateway/Dockerfile +++ b/gateway/Dockerfile @@ -1,10 +1,10 @@ # Add this stage at the top of each service Dockerfile -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Then your main service stage -FROM python:3.11-slim +FROM localhost:5000/python_3.11-slim # Create non-root user for security RUN groupadd -r appgroup && useradd -r -g appgroup appuser diff --git a/infrastructure/NAMESPACES.md b/infrastructure/NAMESPACES.md new file mode 100644 index 00000000..ddb348d1 --- /dev/null +++ b/infrastructure/NAMESPACES.md @@ -0,0 +1,119 @@ +# Bakery-IA Namespace Management + +## Overview + +This document explains the namespace strategy for the Bakery-IA platform and how to properly manage namespaces during deployment. + +## Namespace Architecture + +The Bakery-IA platform uses the following namespaces: + +### Core Namespaces + +1. **`bakery-ia`** - Main application namespace + - Contains all microservices, databases, and application components + - Defined in: `infrastructure/namespaces/bakery-ia.yaml` + +2. **`tekton-pipelines`** - CI/CD pipeline namespace + - Contains Tekton pipeline resources, tasks, and triggers + - Defined in: `infrastructure/namespaces/tekton-pipelines.yaml` + +3. 
**`flux-system`** - GitOps namespace + - Contains Flux CD components for GitOps deployments + - Defined in: `infrastructure/namespaces/flux-system.yaml` + +### Infrastructure Namespaces + +Additional namespaces may be created for: +- Monitoring components +- Logging components +- Security components + +## Deployment Order + +**CRITICAL**: Namespaces must be created BEFORE any resources that depend on them. + +### Correct Deployment Sequence + +```bash +# 1. Create namespaces first +kubectl apply -f infrastructure/namespaces/ + +# 2. Apply common configurations (depends on bakery-ia namespace) +kubectl apply -f infrastructure/environments/common/configs/ + +# 3. Apply platform components +kubectl apply -f infrastructure/platform/ + +# 4. Apply CI/CD components (depends on tekton-pipelines and flux-system) +kubectl apply -f infrastructure/cicd/ + +# 5. Apply monitoring components +kubectl apply -f infrastructure/monitoring/ +``` + +## Common Issues and Solutions + +### Issue: "namespace not found" errors + +**Symptoms**: Errors like: +``` +Error from server (NotFound): error when creating "path/to/resource.yaml": namespaces "[namespace-name]" not found +``` + +**Solutions**: + +1. **Ensure namespaces are created first** - Use the deployment script that applies namespaces before other resources + +2. **Check for templating issues** - If you see names like `[redacted secret rabbitmq-secrets:RABBITMQ_USER]-ia`, there may be environment variable substitution happening incorrectly + +3. **Verify namespace YAML files** - Ensure the namespace files exist and are properly formatted + +### Issue: Resource conflicts across namespaces + +**Solution**: Use proper namespace isolation and RBAC policies to prevent cross-namespace conflicts. + +## Best Practices + +1. **Namespace Isolation**: Keep resources properly isolated by namespace +2. **RBAC**: Use namespace-specific RBAC roles and bindings +3. **Resource Quotas**: Apply resource quotas per namespace +4. **Network Policies**: Use network policies to control cross-namespace communication + +## Troubleshooting + +### Verify namespaces exist + +```bash +kubectl get namespaces +``` + +### Check namespace labels + +```bash +kubectl get namespace bakery-ia --show-labels +``` + +### View namespace events + +```bash +kubectl describe namespace bakery-ia +``` + +## Migration from Old Structure + +If you're migrating from the old structure where namespaces were scattered across different directories: + +1. **Remove old namespace files** from: + - `infrastructure/environments/common/configs/namespace.yaml` + - `infrastructure/cicd/flux/namespace.yaml` + +2. **Update kustomization files** to reference the centralized namespace files + +3. **Use the new deployment script** that follows the correct order + +## Future Enhancements + +- Add namespace lifecycle management +- Implement namespace cleanup scripts +- Add namespace validation checks to CI/CD pipelines \ No newline at end of file diff --git a/infrastructure/README.md b/infrastructure/README.md new file mode 100644 index 00000000..6464a3eb --- /dev/null +++ b/infrastructure/README.md @@ -0,0 +1,57 @@ +# Bakery-IA Infrastructure + +This directory contains all infrastructure-as-code for the Bakery-IA project, organized according to best practices for maintainability and scalability. 
+ +## Directory Structure + +``` +infrastructure/ +├── environments/ # Environment-specific configurations +│ ├── dev/ # Development environment +│ │ ├── k8s-manifests/ # Kubernetes manifests for dev +│ │ └── values/ # Environment-specific values +│ ├── staging/ # Staging environment +│ │ ├── k8s-manifests/ +│ │ └── values/ +│ └── prod/ # Production environment +│ ├── k8s-manifests/ +│ ├── terraform/ # Production-specific IaC +│ └── values/ +├── platform/ # Platform-level infrastructure +│ ├── cluster/ # Cluster configuration (EKS, Kind) +│ ├── networking/ # Network configuration +│ ├── security/ # Security policies and TLS +│ └── storage/ # Storage configuration +├── services/ # Application services +│ ├── databases/ # Database configurations +│ ├── api-gateway/ # API gateway configuration +│ └── microservices/ # Individual microservice configs +├── monitoring/ # Observability stack +│ └── signoz/ # SigNoz configuration +├── cicd/ # CI/CD pipeline components +├── security/ # Security configurations +├── scripts/ # Automation scripts +└── docs/ # Infrastructure documentation +``` + +## Environments + +Each environment (dev, staging, prod) has its own configuration with appropriate isolation and security settings. + +## Services + +Services are organized by business domain with clear separation between databases, microservices, and infrastructure components. + +## Getting Started + +1. **Local Development**: Use `tilt up` to start the development environment +2. **Deployment**: Use `skaffold run` to deploy to your target environment +3. **CI/CD**: Tekton pipelines manage automated deployments + +## Security + +Security configurations are centralized in the `security/` directory with: +- TLS certificates and rotation scripts +- Network policies +- RBAC configurations +- Compliance checks \ No newline at end of file diff --git a/infrastructure/ci-cd/flux/kustomization.yaml b/infrastructure/ci-cd/flux/kustomization.yaml deleted file mode 100644 index 37e9df54..00000000 --- a/infrastructure/ci-cd/flux/kustomization.yaml +++ /dev/null @@ -1,27 +0,0 @@ -# Flux Kustomization for Bakery-IA Production Deployment -# This resource tells Flux how to deploy the application - -apiVersion: kustomize.toolkit.fluxcd.io/v1 -kind: Kustomization -metadata: - name: bakery-ia-prod - namespace: flux-system -spec: - interval: 5m - path: ./infrastructure/kubernetes/overlays/prod - prune: true - sourceRef: - kind: GitRepository - name: bakery-ia - targetNamespace: bakery-ia - timeout: 5m - retryInterval: 1m - healthChecks: - - apiVersion: apps/v1 - kind: Deployment - name: auth-service - namespace: bakery-ia - - apiVersion: apps/v1 - kind: Deployment - name: gateway - namespace: bakery-ia \ No newline at end of file diff --git a/infrastructure/ci-cd/gitea/ingress.yaml b/infrastructure/ci-cd/gitea/ingress.yaml deleted file mode 100644 index ecfbc9d5..00000000 --- a/infrastructure/ci-cd/gitea/ingress.yaml +++ /dev/null @@ -1,25 +0,0 @@ -# Gitea Ingress configuration for Bakery-IA CI/CD -# This provides external access to Gitea within the cluster - -apiVersion: networking.k8s.io/v1 -kind: Ingress -metadata: - name: gitea-ingress - namespace: gitea - annotations: - nginx.ingress.kubernetes.io/rewrite-target: / - nginx.ingress.kubernetes.io/proxy-body-size: "0" - nginx.ingress.kubernetes.io/proxy-read-timeout: "600" - nginx.ingress.kubernetes.io/proxy-send-timeout: "600" -spec: - rules: - - host: gitea.bakery-ia.local - http: - paths: - - path: / - pathType: Prefix - backend: - service: - name: gitea-http - port: - 
number: 3000 \ No newline at end of file diff --git a/infrastructure/ci-cd/gitea/values.yaml b/infrastructure/ci-cd/gitea/values.yaml deleted file mode 100644 index f94929e8..00000000 --- a/infrastructure/ci-cd/gitea/values.yaml +++ /dev/null @@ -1,38 +0,0 @@ -# Gitea Helm values configuration for Bakery-IA CI/CD -# This configuration sets up Gitea with registry support and appropriate storage - -service: - type: ClusterIP - httpPort: 3000 - sshPort: 2222 - -persistence: - enabled: true - size: 50Gi - storageClass: "microk8s-hostpath" - -gitea: - config: - server: - DOMAIN: gitea.bakery-ia.local - SSH_DOMAIN: gitea.bakery-ia.local - ROOT_URL: http://gitea.bakery-ia.local - repository: - ENABLE_PUSH_CREATE_USER: true - ENABLE_PUSH_CREATE_ORG: true - registry: - ENABLED: true - -postgresql: - enabled: true - persistence: - size: 20Gi - -# Resource configuration for production environment -resources: - limits: - cpu: 1000m - memory: 1Gi - requests: - cpu: 500m - memory: 512Mi \ No newline at end of file diff --git a/infrastructure/ci-cd/monitoring/otel-collector.yaml b/infrastructure/ci-cd/monitoring/otel-collector.yaml deleted file mode 100644 index a8634707..00000000 --- a/infrastructure/ci-cd/monitoring/otel-collector.yaml +++ /dev/null @@ -1,70 +0,0 @@ -# OpenTelemetry Collector for Bakery-IA CI/CD Monitoring -# This collects metrics and traces from Tekton pipelines - -apiVersion: opentelemetry.io/v1alpha1 -kind: OpenTelemetryCollector -metadata: - name: tekton-otel - namespace: tekton-pipelines -spec: - config: | - receivers: - otlp: - protocols: - grpc: - endpoint: 0.0.0.0:4317 - http: - endpoint: 0.0.0.0:4318 - prometheus: - config: - scrape_configs: - - job_name: 'tekton-pipelines' - scrape_interval: 30s - static_configs: - - targets: ['tekton-pipelines-controller.tekton-pipelines.svc.cluster.local:9090'] - - processors: - batch: - timeout: 5s - send_batch_size: 1000 - memory_limiter: - check_interval: 2s - limit_percentage: 75 - spike_limit_percentage: 20 - - exporters: - otlp: - endpoint: "signoz-otel-collector.monitoring.svc.cluster.local:4317" - tls: - insecure: true - retry_on_failure: - enabled: true - initial_interval: 5s - max_interval: 30s - max_elapsed_time: 300s - logging: - logLevel: debug - - service: - pipelines: - traces: - receivers: [otlp] - processors: [memory_limiter, batch] - exporters: [otlp, logging] - metrics: - receivers: [otlp, prometheus] - processors: [memory_limiter, batch] - exporters: [otlp, logging] - telemetry: - logs: - level: "info" - encoding: "json" - - mode: deployment - resources: - limits: - cpu: 500m - memory: 512Mi - requests: - cpu: 200m - memory: 256Mi \ No newline at end of file diff --git a/infrastructure/ci-cd/tekton/pipelines/ci-pipeline.yaml b/infrastructure/ci-cd/tekton/pipelines/ci-pipeline.yaml deleted file mode 100644 index c20068b2..00000000 --- a/infrastructure/ci-cd/tekton/pipelines/ci-pipeline.yaml +++ /dev/null @@ -1,83 +0,0 @@ -# Main CI Pipeline for Bakery-IA -# This pipeline orchestrates the build, test, and deploy process - -apiVersion: tekton.dev/v1beta1 -kind: Pipeline -metadata: - name: bakery-ia-ci - namespace: tekton-pipelines -spec: - workspaces: - - name: shared-workspace - - name: docker-credentials - params: - - name: git-url - type: string - description: Repository URL - - name: git-revision - type: string - description: Git revision/commit hash - - name: registry - type: string - description: Container registry URL - default: "gitea.bakery-ia.local:5000" - tasks: - - name: fetch-source - taskRef: - name: git-clone 
- workspaces: - - name: output - workspace: shared-workspace - params: - - name: url - value: $(params.git-url) - - name: revision - value: $(params.git-revision) - - - name: detect-changes - runAfter: [fetch-source] - taskRef: - name: detect-changed-services - workspaces: - - name: source - workspace: shared-workspace - - - name: build-and-push - runAfter: [detect-changes] - taskRef: - name: kaniko-build - when: - - input: "$(tasks.detect-changes.results.changed-services)" - operator: notin - values: ["none"] - workspaces: - - name: source - workspace: shared-workspace - - name: docker-credentials - workspace: docker-credentials - params: - - name: services - value: $(tasks.detect-changes.results.changed-services) - - name: registry - value: $(params.registry) - - name: git-revision - value: $(params.git-revision) - - - name: update-gitops-manifests - runAfter: [build-and-push] - taskRef: - name: update-gitops - when: - - input: "$(tasks.detect-changes.results.changed-services)" - operator: notin - values: ["none"] - workspaces: - - name: source - workspace: shared-workspace - params: - - name: services - value: $(tasks.detect-changes.results.changed-services) - - name: registry - value: $(params.registry) - - name: git-revision - value: $(params.git-revision) \ No newline at end of file diff --git a/infrastructure/ci-cd/tekton/tasks/detect-changes.yaml b/infrastructure/ci-cd/tekton/tasks/detect-changes.yaml deleted file mode 100644 index 32abd32c..00000000 --- a/infrastructure/ci-cd/tekton/tasks/detect-changes.yaml +++ /dev/null @@ -1,64 +0,0 @@ -# Tekton Detect Changed Services Task for Bakery-IA CI/CD -# This task identifies which services have changed in the repository - -apiVersion: tekton.dev/v1beta1 -kind: Task -metadata: - name: detect-changed-services - namespace: tekton-pipelines -spec: - workspaces: - - name: source - results: - - name: changed-services - description: Comma-separated list of changed services - steps: - - name: detect - image: alpine/git - script: | - #!/bin/sh - set -e - cd $(workspaces.source.path) - - echo "Detecting changed files..." - # Get list of changed files compared to previous commit - CHANGED_FILES=$(git diff --name-only HEAD~1 HEAD 2>/dev/null || git diff --name-only HEAD) - - echo "Changed files: $CHANGED_FILES" - - # Map files to services - CHANGED_SERVICES=() - for file in $CHANGED_FILES; do - if [[ $file == services/* ]]; then - SERVICE=$(echo $file | cut -d'/' -f2) - # Only add unique service names - if [[ ! 
" ${CHANGED_SERVICES[@]} " =~ " ${SERVICE} " ]]; then - CHANGED_SERVICES+=("$SERVICE") - fi - elif [[ $file == frontend/* ]]; then - CHANGED_SERVICES+=("frontend") - break - elif [[ $file == gateway/* ]]; then - CHANGED_SERVICES+=("gateway") - break - fi - done - - # If no specific services changed, check for infrastructure changes - if [ ${#CHANGED_SERVICES[@]} -eq 0 ]; then - for file in $CHANGED_FILES; do - if [[ $file == infrastructure/* ]]; then - CHANGED_SERVICES+=("infrastructure") - break - fi - done - fi - - # Output result - if [ ${#CHANGED_SERVICES[@]} -eq 0 ]; then - echo "No service changes detected" - echo "none" | tee $(results.changed-services.path) - else - echo "Detected changes in services: ${CHANGED_SERVICES[@]}" - echo $(printf "%s," "${CHANGED_SERVICES[@]}" | sed 's/,$//') | tee $(results.changed-services.path) - fi \ No newline at end of file diff --git a/infrastructure/ci-cd/tekton/tasks/git-clone.yaml b/infrastructure/ci-cd/tekton/tasks/git-clone.yaml deleted file mode 100644 index 5decee5c..00000000 --- a/infrastructure/ci-cd/tekton/tasks/git-clone.yaml +++ /dev/null @@ -1,31 +0,0 @@ -# Tekton Git Clone Task for Bakery-IA CI/CD -# This task clones the source code repository - -apiVersion: tekton.dev/v1beta1 -kind: Task -metadata: - name: git-clone - namespace: tekton-pipelines -spec: - workspaces: - - name: output - params: - - name: url - type: string - description: Repository URL to clone - - name: revision - type: string - description: Git revision to checkout - default: "main" - steps: - - name: clone - image: alpine/git - script: | - #!/bin/sh - set -e - echo "Cloning repository: $(params.url)" - git clone $(params.url) $(workspaces.output.path) - cd $(workspaces.output.path) - echo "Checking out revision: $(params.revision)" - git checkout $(params.revision) - echo "Repository cloned successfully" \ No newline at end of file diff --git a/infrastructure/ci-cd/tekton/tasks/kaniko-build.yaml b/infrastructure/ci-cd/tekton/tasks/kaniko-build.yaml deleted file mode 100644 index 1f9290e8..00000000 --- a/infrastructure/ci-cd/tekton/tasks/kaniko-build.yaml +++ /dev/null @@ -1,40 +0,0 @@ -# Tekton Kaniko Build Task for Bakery-IA CI/CD -# This task builds and pushes container images using Kaniko - -apiVersion: tekton.dev/v1beta1 -kind: Task -metadata: - name: kaniko-build - namespace: tekton-pipelines -spec: - workspaces: - - name: source - - name: docker-credentials - params: - - name: services - type: string - description: Comma-separated list of services to build - - name: registry - type: string - description: Container registry URL - default: "gitea.bakery-ia.local:5000" - - name: git-revision - type: string - description: Git revision for image tag - default: "latest" - steps: - - name: build-and-push - image: gcr.io/kaniko-project/executor:v1.9.0 - args: - - --dockerfile=$(workspaces.source.path)/services/$(params.services)/Dockerfile - - --context=$(workspaces.source.path) - - --destination=$(params.registry)/bakery/$(params.services):$(params.git-revision) - - --verbosity=info - volumeMounts: - - name: docker-config - mountPath: /kaniko/.docker - securityContext: - runAsUser: 0 - volumes: - - name: docker-config - emptyDir: {} \ No newline at end of file diff --git a/infrastructure/ci-cd/tekton/tasks/update-gitops.yaml b/infrastructure/ci-cd/tekton/tasks/update-gitops.yaml deleted file mode 100644 index 1e89d170..00000000 --- a/infrastructure/ci-cd/tekton/tasks/update-gitops.yaml +++ /dev/null @@ -1,66 +0,0 @@ -# Tekton Update GitOps Manifests Task for 
Bakery-IA CI/CD -# This task updates Kubernetes manifests with new image tags - -apiVersion: tekton.dev/v1beta1 -kind: Task -metadata: - name: update-gitops - namespace: tekton-pipelines -spec: - workspaces: - - name: source - params: - - name: services - type: string - description: Comma-separated list of services to update - - name: registry - type: string - description: Container registry URL - - name: git-revision - type: string - description: Git revision for image tag - steps: - - name: update-manifests - image: bitnami/kubectl - script: | - #!/bin/sh - set -e - cd $(workspaces.source.path) - - echo "Updating GitOps manifests for services: $(params.services)" - - # Split services by comma - IFS=',' read -ra SERVICES <<< "$(params.services)" - - for service in "${SERVICES[@]}"; do - echo "Processing service: $service" - - # Find and update Kubernetes manifests - if [ "$service" = "frontend" ]; then - # Update frontend deployment - if [ -f "infrastructure/kubernetes/overlays/prod/frontend-deployment.yaml" ]; then - sed -i "s|image:.*|image: $(params.registry)/bakery/frontend:$(params.git-revision)|g" \ - "infrastructure/kubernetes/overlays/prod/frontend-deployment.yaml" - fi - elif [ "$service" = "gateway" ]; then - # Update gateway deployment - if [ -f "infrastructure/kubernetes/overlays/prod/gateway-deployment.yaml" ]; then - sed -i "s|image:.*|image: $(params.registry)/bakery/gateway:$(params.git-revision)|g" \ - "infrastructure/kubernetes/overlays/prod/gateway-deployment.yaml" - fi - else - # Update service deployment - DEPLOYMENT_FILE="infrastructure/kubernetes/overlays/prod/${service}-deployment.yaml" - if [ -f "$DEPLOYMENT_FILE" ]; then - sed -i "s|image:.*|image: $(params.registry)/bakery/${service}:$(params.git-revision)|g" \ - "$DEPLOYMENT_FILE" - fi - fi - done - - # Commit changes - git config --global user.name "bakery-ia-ci" - git config --global user.email "ci@bakery-ia.local" - git add . 
- git commit -m "CI: Update image tags for $(params.services) to $(params.git-revision)" - git push origin HEAD \ No newline at end of file diff --git a/infrastructure/ci-cd/tekton/triggers/event-listener.yaml b/infrastructure/ci-cd/tekton/triggers/event-listener.yaml deleted file mode 100644 index 5049bacb..00000000 --- a/infrastructure/ci-cd/tekton/triggers/event-listener.yaml +++ /dev/null @@ -1,26 +0,0 @@ -# Tekton EventListener for Bakery-IA CI/CD -# This listener receives webhook events and triggers pipelines - -apiVersion: triggers.tekton.dev/v1alpha1 -kind: EventListener -metadata: - name: bakery-ia-listener - namespace: tekton-pipelines -spec: - serviceAccountName: tekton-triggers-sa - triggers: - - name: bakery-ia-gitea-trigger - bindings: - - ref: bakery-ia-trigger-binding - template: - ref: bakery-ia-trigger-template - interceptors: - - ref: - name: "gitlab" - params: - - name: "secretRef" - value: - secretName: gitea-webhook-secret - secretKey: secretToken - - name: "eventTypes" - value: ["push"] \ No newline at end of file diff --git a/infrastructure/ci-cd/tekton/triggers/gitlab-interceptor.yaml b/infrastructure/ci-cd/tekton/triggers/gitlab-interceptor.yaml deleted file mode 100644 index c8fc1c26..00000000 --- a/infrastructure/ci-cd/tekton/triggers/gitlab-interceptor.yaml +++ /dev/null @@ -1,14 +0,0 @@ -# GitLab/Gitea Webhook Interceptor for Tekton Triggers -# This interceptor validates and processes Gitea webhook events - -apiVersion: triggers.tekton.dev/v1alpha1 -kind: ClusterInterceptor -metadata: - name: gitlab -spec: - clientConfig: - service: - name: tekton-triggers-core-interceptors - namespace: tekton-pipelines - path: "/v1/webhook/gitlab" - port: 8443 \ No newline at end of file diff --git a/infrastructure/ci-cd/tekton/triggers/trigger-binding.yaml b/infrastructure/ci-cd/tekton/triggers/trigger-binding.yaml deleted file mode 100644 index 8116792a..00000000 --- a/infrastructure/ci-cd/tekton/triggers/trigger-binding.yaml +++ /dev/null @@ -1,16 +0,0 @@ -# Tekton TriggerBinding for Bakery-IA CI/CD -# This binding extracts parameters from Gitea webhook events - -apiVersion: triggers.tekton.dev/v1alpha1 -kind: TriggerBinding -metadata: - name: bakery-ia-trigger-binding - namespace: tekton-pipelines -spec: - params: - - name: git-repo-url - value: $(body.repository.clone_url) - - name: git-revision - value: $(body.head_commit.id) - - name: git-repo-name - value: $(body.repository.name) \ No newline at end of file diff --git a/infrastructure/ci-cd/tekton/triggers/trigger-template.yaml b/infrastructure/ci-cd/tekton/triggers/trigger-template.yaml deleted file mode 100644 index 17208bf8..00000000 --- a/infrastructure/ci-cd/tekton/triggers/trigger-template.yaml +++ /dev/null @@ -1,43 +0,0 @@ -# Tekton TriggerTemplate for Bakery-IA CI/CD -# This template defines how PipelineRuns are created when triggers fire - -apiVersion: triggers.tekton.dev/v1alpha1 -kind: TriggerTemplate -metadata: - name: bakery-ia-trigger-template - namespace: tekton-pipelines -spec: - params: - - name: git-repo-url - description: The git repository URL - - name: git-revision - description: The git revision/commit hash - - name: git-repo-name - description: The git repository name - default: "bakery-ia" - resourcetemplates: - - apiVersion: tekton.dev/v1beta1 - kind: PipelineRun - metadata: - generateName: bakery-ia-ci-run-$(params.git-repo-name)- - spec: - pipelineRef: - name: bakery-ia-ci - workspaces: - - name: shared-workspace - volumeClaimTemplate: - spec: - accessModes: ["ReadWriteOnce"] - resources: - 
requests: - storage: 5Gi - - name: docker-credentials - secret: - secretName: gitea-registry-credentials - params: - - name: git-url - value: $(params.git-repo-url) - - name: git-revision - value: $(params.git-revision) - - name: registry - value: "gitea.bakery-ia.local:5000" \ No newline at end of file diff --git a/infrastructure/ci-cd/README.md b/infrastructure/cicd/README.md similarity index 96% rename from infrastructure/ci-cd/README.md rename to infrastructure/cicd/README.md index ebdb18dd..7d4acc32 100644 --- a/infrastructure/ci-cd/README.md +++ b/infrastructure/cicd/README.md @@ -19,8 +19,7 @@ graph TD ``` infrastructure/ci-cd/ ├── gitea/ # Gitea configuration (Git server + registry) -│ ├── values.yaml # Helm values for Gitea -│ └── ingress.yaml # Ingress configuration +│ └── values.yaml # Helm values for Gitea (ingress now in main config) ├── tekton/ # Tekton CI/CD pipeline configuration │ ├── tasks/ # Individual pipeline tasks │ │ ├── git-clone.yaml @@ -59,8 +58,8 @@ infrastructure/ci-cd/ -n gitea \ -f infrastructure/ci-cd/gitea/values.yaml - # Apply ingress - microk8s kubectl apply -f infrastructure/ci-cd/gitea/ingress.yaml + # Note: Gitea ingress is now included in the main ingress configuration + # No separate ingress needs to be applied ``` 2. **Deploy Tekton**: @@ -85,8 +84,8 @@ infrastructure/ci-cd/ # Verify Flux installation microk8s kubectl get pods -n flux-system - # Apply Flux configurations - microk8s kubectl apply -f infrastructure/ci-cd/flux/ + # Apply Flux configurations using kustomize + microk8s kubectl apply -k infrastructure/ci-cd/flux/ ``` ### Phase 2: Configuration diff --git a/infrastructure/cicd/flux/flux-kustomization.yaml b/infrastructure/cicd/flux/flux-kustomization.yaml new file mode 100644 index 00000000..6ad97f91 --- /dev/null +++ b/infrastructure/cicd/flux/flux-kustomization.yaml @@ -0,0 +1,76 @@ +# Flux Kustomization for Bakery-IA Production Deployment +# This resource tells Flux how to deploy the application +# +# Prerequisites: +# 1. Flux CD must be installed: flux install +# 2. GitRepository 'bakery-ia' must be created and ready +# 3. 
Secret 'gitea-credentials' must exist in flux-system namespace + +apiVersion: kustomize.toolkit.fluxcd.io/v1 +kind: Kustomization +metadata: + name: bakery-ia-prod + namespace: flux-system + labels: + app.kubernetes.io/name: bakery-ia + app.kubernetes.io/component: flux +spec: + # Wait for GitRepository to be ready before reconciling + dependsOn: [] + interval: 5m + path: ./infrastructure/environments/prod + prune: true + sourceRef: + kind: GitRepository + name: bakery-ia + targetNamespace: bakery-ia + timeout: 10m + retryInterval: 1m + wait: true + # Health checks for critical services + healthChecks: + # Core Infrastructure + - apiVersion: apps/v1 + kind: Deployment + name: gateway + namespace: bakery-ia + # Authentication & Authorization + - apiVersion: apps/v1 + kind: Deployment + name: auth-service + namespace: bakery-ia + - apiVersion: apps/v1 + kind: Deployment + name: tenant-service + namespace: bakery-ia + # Core Business Services + - apiVersion: apps/v1 + kind: Deployment + name: inventory-service + namespace: bakery-ia + - apiVersion: apps/v1 + kind: Deployment + name: orders-service + namespace: bakery-ia + - apiVersion: apps/v1 + kind: Deployment + name: pos-service + namespace: bakery-ia + # Data Services + - apiVersion: apps/v1 + kind: Deployment + name: forecasting-service + namespace: bakery-ia + - apiVersion: apps/v1 + kind: Deployment + name: notification-service + namespace: bakery-ia + # Post-build variable substitution + postBuild: + substituteFrom: + - kind: ConfigMap + name: bakery-ia-config + optional: true + - kind: Secret + name: bakery-ia-secrets + optional: true \ No newline at end of file diff --git a/infrastructure/ci-cd/flux/git-repository.yaml b/infrastructure/cicd/flux/git-repository.yaml similarity index 100% rename from infrastructure/ci-cd/flux/git-repository.yaml rename to infrastructure/cicd/flux/git-repository.yaml diff --git a/infrastructure/cicd/flux/kustomization.yaml b/infrastructure/cicd/flux/kustomization.yaml new file mode 100644 index 00000000..bde76e88 --- /dev/null +++ b/infrastructure/cicd/flux/kustomization.yaml @@ -0,0 +1,25 @@ +# Kustomize build configuration for Flux resources +# This file is used to build and apply the Flux resources +# +# IMPORTANT: Apply resources in this order: +# 1. Install Flux CD first: flux install +# 2. Apply this kustomization: kubectl apply -k infrastructure/cicd/flux/ +# +# The GitRepository must be ready before the Flux Kustomization can reconcile. 
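+#
+# Verification sketch (assumes the flux CLI is installed):
+#   flux get sources git -n flux-system            # GitRepository 'bakery-ia' should report Ready
+#   flux get kustomizations -n flux-system         # 'bakery-ia-prod' should begin reconciling
+#   flux reconcile kustomization bakery-ia-prod    # force an immediate reconciliation if needed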
+ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +# Resources to apply in order (namespace and secrets first, then sources, then kustomizations) +resources: + - namespace.yaml + - git-repository.yaml + - flux-kustomization.yaml + +# Common labels for all resources +commonLabels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: flux + app.kubernetes.io/managed-by: kustomize + +# Note: Do NOT set namespace here as resources already have explicit namespaces diff --git a/infrastructure/cicd/flux/namespace.yaml b/infrastructure/cicd/flux/namespace.yaml new file mode 100644 index 00000000..b9a5964d --- /dev/null +++ b/infrastructure/cicd/flux/namespace.yaml @@ -0,0 +1,15 @@ +# Flux System Namespace +# This namespace is required for Flux CD components +# It should be created before any Flux resources are applied + +apiVersion: v1 +kind: Namespace +metadata: + name: flux-system + labels: + app.kubernetes.io/name: flux + app.kubernetes.io/component: system + kubernetes.io/metadata.name: flux-system + pod-security.kubernetes.io/enforce: restricted + pod-security.kubernetes.io/audit: restricted + pod-security.kubernetes.io/warn: restricted diff --git a/infrastructure/cicd/gitea/ingress.yaml.disabled b/infrastructure/cicd/gitea/ingress.yaml.disabled new file mode 100644 index 00000000..cc026840 --- /dev/null +++ b/infrastructure/cicd/gitea/ingress.yaml.disabled @@ -0,0 +1,44 @@ +# Gitea Ingress Configuration +# Routes external traffic to Gitea service for web UI and Git HTTP access +# +# Prerequisites: +# - Gitea must be deployed in the 'gitea' namespace +# - Ingress controller must be installed (nginx, traefik, etc.) +# - For HTTPS: cert-manager with a ClusterIssuer named 'letsencrypt-prod' or 'local-ca-issuer' + +apiVersion: networking.k8s.io/v1 +kind: Ingress +metadata: + name: gitea-ingress + namespace: gitea + labels: + app.kubernetes.io/name: gitea + app.kubernetes.io/component: ingress + app.kubernetes.io/part-of: bakery-ia-cicd + annotations: + # For nginx ingress controller + nginx.ingress.kubernetes.io/proxy-body-size: "100m" + nginx.ingress.kubernetes.io/proxy-read-timeout: "600" + nginx.ingress.kubernetes.io/proxy-send-timeout: "600" + # For traefik ingress controller + traefik.ingress.kubernetes.io/router.entrypoints: web,websecure + # For TLS with cert-manager (uncomment for HTTPS) + # cert-manager.io/cluster-issuer: "local-ca-issuer" +spec: + ingressClassName: nginx + # Uncomment for HTTPS + # tls: + # - hosts: + # - gitea.bakery-ia.local + # secretName: gitea-tls + rules: + - host: gitea.bakery-ia.local + http: + paths: + - path: / + pathType: Prefix + backend: + service: + name: gitea-http + port: + number: 3000 diff --git a/infrastructure/cicd/gitea/setup-admin-secret.sh b/infrastructure/cicd/gitea/setup-admin-secret.sh new file mode 100755 index 00000000..08f75cdf --- /dev/null +++ b/infrastructure/cicd/gitea/setup-admin-secret.sh @@ -0,0 +1,48 @@ +#!/bin/bash +# Setup Gitea Admin Secret +# +# This script creates the Kubernetes secret required for Gitea admin credentials. +# Run this BEFORE installing Gitea with Helm. +# +# Usage: +# ./setup-admin-secret.sh [password] +# +# If password is not provided, a random one will be generated. 
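+#
+# Example invocations (the password shown is purely illustrative):
+#   ./setup-admin-secret.sh                  # generates and prints a random password
+#   ./setup-admin-secret.sh 'MyS3curePass'   # uses the supplied password instead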
+ +set -e + +KUBECTL="kubectl" +NAMESPACE="gitea" + +# Check if running in microk8s +if command -v microk8s &> /dev/null; then + KUBECTL="microk8s kubectl" +fi + +# Get or generate password +if [ -n "$1" ]; then + ADMIN_PASSWORD="$1" +else + ADMIN_PASSWORD=$(openssl rand -base64 24 | tr -d '/+=' | head -c 20) + echo "Generated admin password: $ADMIN_PASSWORD" +fi + +# Create namespace if it doesn't exist +$KUBECTL create namespace "$NAMESPACE" --dry-run=client -o yaml | $KUBECTL apply -f - + +# Create the secret +$KUBECTL create secret generic gitea-admin-secret \ + --namespace "$NAMESPACE" \ + --from-literal=username=bakery-admin \ + --from-literal=password="$ADMIN_PASSWORD" \ + --dry-run=client -o yaml | $KUBECTL apply -f - + +echo "" +echo "Gitea admin secret created successfully!" +echo "" +echo "Admin credentials:" +echo " Username: bakery-admin" +echo " Password: $ADMIN_PASSWORD" +echo "" +echo "Now install Gitea with:" +echo " helm install gitea gitea/gitea -n gitea -f infrastructure/cicd/gitea/values.yaml" diff --git a/infrastructure/cicd/gitea/values.yaml b/infrastructure/cicd/gitea/values.yaml new file mode 100644 index 00000000..f3e7742d --- /dev/null +++ b/infrastructure/cicd/gitea/values.yaml @@ -0,0 +1,83 @@ +# Gitea Helm values configuration for Bakery-IA CI/CD +# This configuration sets up Gitea with registry support and appropriate storage +# +# Installation: +# helm repo add gitea https://dl.gitea.io/charts +# kubectl create namespace gitea +# helm install gitea gitea/gitea -n gitea -f infrastructure/cicd/gitea/values.yaml +# +# NOTE: The namespace is determined by the -n flag during helm install, not in this file. + +service: + http: + type: ClusterIP + port: 3000 + ssh: + type: ClusterIP + port: 2222 + +persistence: + enabled: true + size: 10Gi + # Use standard storage class (works with Kind's default provisioner) + # For microk8s: storageClass: "microk8s-hostpath" + # For Kind: leave empty or use "standard" + storageClass: "" + +gitea: + admin: + username: bakery-admin + # IMPORTANT: Override this with --set gitea.admin.password= + # or use existingSecret + password: "" + email: admin@bakery-ia.local + existingSecret: gitea-admin-secret + config: + server: + DOMAIN: gitea.bakery-ia.local + SSH_DOMAIN: gitea.bakery-ia.local + # Use HTTP internally; TLS termination happens at ingress + ROOT_URL: http://gitea.bakery-ia.local + HTTP_PORT: 3000 + # For external HTTPS access via ingress, set: + # ROOT_URL: https://gitea.bakery-ia.local + repository: + ENABLE_PUSH_CREATE_USER: true + ENABLE_PUSH_CREATE_ORG: true + packages: + ENABLED: true + webhook: + ALLOWED_HOST_LIST: "*" + # Allow internal cluster URLs for Tekton EventListener + SKIP_TLS_VERIFY: true + service: + DISABLE_REGISTRATION: false + REQUIRE_SIGNIN_VIEW: false + +# Use embedded SQLite for simpler local development +# For production, enable postgresql +postgresql: + enabled: false + +# Use embedded in-memory cache for local dev +redis-cluster: + enabled: false + +# Resource configuration for local development +resources: + limits: + cpu: 500m + memory: 512Mi + requests: + cpu: 100m + memory: 256Mi + +# Init containers timeout +initContainers: + resources: + limits: + cpu: 100m + memory: 128Mi + requests: + cpu: 50m + memory: 64Mi diff --git a/infrastructure/cicd/kustomization.yaml b/infrastructure/cicd/kustomization.yaml new file mode 100644 index 00000000..de2889af --- /dev/null +++ b/infrastructure/cicd/kustomization.yaml @@ -0,0 +1,10 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + 
+resources: + - tekton/ + - ../../namespaces/flux-system.yaml + +# Gitea is managed via Helm, so we don't include it directly here +# The Gitea Helm chart is deployed separately and referenced in the ingress +# Flux configuration is a Flux Kustomization resource, not a kustomize config diff --git a/infrastructure/cicd/tekton/cleanup/cleanup.yaml b/infrastructure/cicd/tekton/cleanup/cleanup.yaml new file mode 100644 index 00000000..6282dd8b --- /dev/null +++ b/infrastructure/cicd/tekton/cleanup/cleanup.yaml @@ -0,0 +1,222 @@ +# Workspace and PipelineRun Cleanup for Bakery-IA CI/CD +# This CronJob cleans up old PipelineRuns and PVCs to prevent storage exhaustion + +--- +# ServiceAccount for cleanup job +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tekton-cleanup-sa + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: cleanup + +--- +# ClusterRole for cleanup operations +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: tekton-cleanup-role + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: cleanup +rules: + - apiGroups: ["tekton.dev"] + resources: ["pipelineruns", "taskruns"] + verbs: ["get", "list", "delete"] + - apiGroups: [""] + resources: ["persistentvolumeclaims"] + verbs: ["get", "list", "delete"] + - apiGroups: [""] + resources: ["pods"] + verbs: ["get", "list", "delete"] + +--- +# ClusterRoleBinding for cleanup +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: tekton-cleanup-binding + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: cleanup +subjects: + - kind: ServiceAccount + name: tekton-cleanup-sa + namespace: tekton-pipelines +roleRef: + kind: ClusterRole + name: tekton-cleanup-role + apiGroup: rbac.authorization.k8s.io + +--- +# CronJob to clean up old PipelineRuns +apiVersion: batch/v1 +kind: CronJob +metadata: + name: tekton-pipelinerun-cleanup + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: cleanup +spec: + # Run every 6 hours + schedule: "0 */6 * * *" + concurrencyPolicy: Forbid + successfulJobsHistoryLimit: 3 + failedJobsHistoryLimit: 3 + jobTemplate: + spec: + ttlSecondsAfterFinished: 3600 + template: + metadata: + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: cleanup + spec: + serviceAccountName: tekton-cleanup-sa + restartPolicy: OnFailure + containers: + - name: cleanup + image: bitnami/kubectl:latest + command: + - /bin/sh + - -c + - | + #!/bin/sh + set -e + + echo "============================================" + echo "Tekton Cleanup Job" + echo "Timestamp: $(date -u +"%Y-%m-%dT%H:%M:%SZ")" + echo "============================================" + + # Configuration + NAMESPACE="tekton-pipelines" + MAX_AGE_HOURS=24 + KEEP_RECENT=10 + + echo "" + echo "Configuration:" + echo " Namespace: $NAMESPACE" + echo " Max Age: ${MAX_AGE_HOURS} hours" + echo " Keep Recent: $KEEP_RECENT" + echo "" + + # Get current timestamp + CURRENT_TIME=$(date +%s) + + # Clean up completed PipelineRuns older than MAX_AGE_HOURS + echo "Cleaning up old PipelineRuns..." 
+ + # Get all completed PipelineRuns + COMPLETED_RUNS=$(kubectl get pipelineruns -n "$NAMESPACE" \ + --no-headers \ + -o custom-columns=NAME:.metadata.name,STATUS:.status.conditions[0].reason,AGE:.metadata.creationTimestamp \ + 2>/dev/null | grep -E "Succeeded|Failed" || true) + + DELETED_COUNT=0 + + echo "$COMPLETED_RUNS" | while read -r line; do + if [ -z "$line" ]; then + continue + fi + + RUN_NAME=$(echo "$line" | awk '{print $1}') + RUN_TIME=$(echo "$line" | awk '{print $3}') + + if [ -z "$RUN_NAME" ] || [ -z "$RUN_TIME" ]; then + continue + fi + + # Convert timestamp to seconds + RUN_TIMESTAMP=$(date -d "$RUN_TIME" +%s 2>/dev/null || echo "0") + + if [ "$RUN_TIMESTAMP" = "0" ]; then + continue + fi + + # Calculate age in hours + AGE_SECONDS=$((CURRENT_TIME - RUN_TIMESTAMP)) + AGE_HOURS=$((AGE_SECONDS / 3600)) + + if [ "$AGE_HOURS" -gt "$MAX_AGE_HOURS" ]; then + echo "Deleting PipelineRun: $RUN_NAME (age: ${AGE_HOURS}h)" + kubectl delete pipelinerun "$RUN_NAME" -n "$NAMESPACE" --ignore-not-found=true + DELETED_COUNT=$((DELETED_COUNT + 1)) + fi + done + + echo "Deleted $DELETED_COUNT old PipelineRuns" + + # Clean up orphaned PVCs (PVCs without associated PipelineRuns) + echo "" + echo "Cleaning up orphaned PVCs..." + + ORPHANED_PVCS=$(kubectl get pvc -n "$NAMESPACE" \ + -l tekton.dev/pipelineRun \ + --no-headers \ + -o custom-columns=NAME:.metadata.name,PIPELINERUN:.metadata.labels.tekton\\.dev/pipelineRun \ + 2>/dev/null || true) + + echo "$ORPHANED_PVCS" | while read -r line; do + if [ -z "$line" ]; then + continue + fi + + PVC_NAME=$(echo "$line" | awk '{print $1}') + PR_NAME=$(echo "$line" | awk '{print $2}') + + if [ -z "$PVC_NAME" ]; then + continue + fi + + # Check if associated PipelineRun exists + if ! kubectl get pipelinerun "$PR_NAME" -n "$NAMESPACE" > /dev/null 2>&1; then + echo "Deleting orphaned PVC: $PVC_NAME (PipelineRun $PR_NAME not found)" + kubectl delete pvc "$PVC_NAME" -n "$NAMESPACE" --ignore-not-found=true + fi + done + + # Clean up completed/failed pods older than 1 hour + echo "" + echo "Cleaning up old completed pods..." 
+ + kubectl delete pods -n "$NAMESPACE" \ + --field-selector=status.phase=Succeeded \ + --ignore-not-found=true 2>/dev/null || true + + kubectl delete pods -n "$NAMESPACE" \ + --field-selector=status.phase=Failed \ + --ignore-not-found=true 2>/dev/null || true + + echo "" + echo "============================================" + echo "Cleanup complete" + echo "============================================" + resources: + limits: + cpu: 200m + memory: 256Mi + requests: + cpu: 100m + memory: 128Mi + +--- +# ConfigMap for cleanup configuration +apiVersion: v1 +kind: ConfigMap +metadata: + name: cleanup-config + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: cleanup +data: + # Maximum age of completed PipelineRuns to keep (in hours) + MAX_AGE_HOURS: "24" + # Number of recent PipelineRuns to keep regardless of age + KEEP_RECENT: "10" + # Cleanup schedule (cron format) + CLEANUP_SCHEDULE: "0 */6 * * *" diff --git a/infrastructure/cicd/tekton/cleanup/kustomization.yaml b/infrastructure/cicd/tekton/cleanup/kustomization.yaml new file mode 100644 index 00000000..2282cef0 --- /dev/null +++ b/infrastructure/cicd/tekton/cleanup/kustomization.yaml @@ -0,0 +1,5 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - cleanup.yaml diff --git a/infrastructure/cicd/tekton/configs/kustomization.yaml b/infrastructure/cicd/tekton/configs/kustomization.yaml new file mode 100644 index 00000000..f7e9e3fd --- /dev/null +++ b/infrastructure/cicd/tekton/configs/kustomization.yaml @@ -0,0 +1,5 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - pipeline-config.yaml diff --git a/infrastructure/cicd/tekton/configs/pipeline-config.yaml b/infrastructure/cicd/tekton/configs/pipeline-config.yaml new file mode 100644 index 00000000..e23d0c31 --- /dev/null +++ b/infrastructure/cicd/tekton/configs/pipeline-config.yaml @@ -0,0 +1,41 @@ +# CI/CD Pipeline Configuration for Bakery-IA +# This ConfigMap contains configurable values for the CI/CD pipeline +# +# IMPORTANT: When changing REGISTRY_URL, also update: +# - infrastructure/cicd/tekton/triggers/trigger-template.yaml (registry-url default) +# - infrastructure/cicd/tekton/secrets/secrets.yaml (registry credentials) + +apiVersion: v1 +kind: ConfigMap +metadata: + name: pipeline-config + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: config +data: + # Container Registry Configuration + # Change this to your actual registry URL + # Also update trigger-template.yaml and secrets when changing this! 
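+  #
+  # Example in-place change (registry.example.com is a hypothetical host):
+  #   kubectl -n tekton-pipelines patch configmap pipeline-config \
+  #     --type merge -p '{"data":{"REGISTRY_URL":"registry.example.com:5000"}}'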
+ REGISTRY_URL: "gitea.bakery-ia.local:5000" + + # Git Configuration + GIT_BRANCH: "main" + GIT_USER_NAME: "bakery-ia-ci" + GIT_USER_EMAIL: "ci@bakery-ia.local" + + # Build Configuration + BUILD_CACHE_TTL: "24h" + BUILD_VERBOSITY: "info" + + # Test Configuration + SKIP_TESTS: "false" + SKIP_LINT: "false" + + # Deployment Configuration + DEPLOY_NAMESPACE: "bakery-ia" + FLUX_NAMESPACE: "flux-system" + + # Workspace Configuration + WORKSPACE_SIZE: "5Gi" + WORKSPACE_STORAGE_CLASS: "standard" diff --git a/infrastructure/cicd/tekton/kustomization.yaml b/infrastructure/cicd/tekton/kustomization.yaml new file mode 100644 index 00000000..71385d88 --- /dev/null +++ b/infrastructure/cicd/tekton/kustomization.yaml @@ -0,0 +1,11 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - rbac/ + - secrets/ + - configs/ + - tasks/ + - triggers/ + - pipelines/ + - cleanup/ diff --git a/infrastructure/cicd/tekton/pipelines/ci-pipeline.yaml b/infrastructure/cicd/tekton/pipelines/ci-pipeline.yaml new file mode 100644 index 00000000..43dcac6c --- /dev/null +++ b/infrastructure/cicd/tekton/pipelines/ci-pipeline.yaml @@ -0,0 +1,149 @@ +# Main CI Pipeline for Bakery-IA +# This pipeline orchestrates the build, test, and deploy process +# Includes: fetch -> detect changes -> test -> build -> update gitops + +apiVersion: tekton.dev/v1beta1 +kind: Pipeline +metadata: + name: bakery-ia-ci + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: pipeline +spec: + workspaces: + - name: shared-workspace + description: Shared workspace for source code + - name: docker-credentials + description: Docker registry credentials + - name: git-credentials + description: Git credentials for pushing GitOps updates + optional: true + params: + - name: git-url + type: string + description: Repository URL + - name: git-revision + type: string + description: Git revision/commit hash + - name: registry + type: string + description: Container registry URL + - name: git-branch + type: string + description: Target branch for GitOps updates + default: "main" + - name: skip-tests + type: string + description: Skip tests if "true" + default: "false" + - name: dry-run + type: string + description: Dry run mode - don't push changes + default: "false" + + tasks: + # Stage 1: Fetch source code + - name: fetch-source + taskRef: + name: git-clone + workspaces: + - name: output + workspace: shared-workspace + params: + - name: url + value: $(params.git-url) + - name: revision + value: $(params.git-revision) + + # Stage 2: Detect which services changed + - name: detect-changes + runAfter: [fetch-source] + taskRef: + name: detect-changed-services + workspaces: + - name: source + workspace: shared-workspace + + # Stage 3: Run tests on changed services + - name: run-tests + runAfter: [detect-changes] + taskRef: + name: run-tests + when: + - input: "$(tasks.detect-changes.results.changed-services)" + operator: notin + values: ["none", "infrastructure"] + - input: "$(params.skip-tests)" + operator: notin + values: ["true"] + workspaces: + - name: source + workspace: shared-workspace + params: + - name: services + value: $(tasks.detect-changes.results.changed-services) + - name: skip-tests + value: $(params.skip-tests) + + # Stage 4: Build and push container images + - name: build-and-push + runAfter: [run-tests] + taskRef: + name: kaniko-build + when: + - input: "$(tasks.detect-changes.results.changed-services)" + operator: notin + values: ["none", "infrastructure"] + 
workspaces: + - name: source + workspace: shared-workspace + - name: docker-credentials + workspace: docker-credentials + params: + - name: services + value: $(tasks.detect-changes.results.changed-services) + - name: registry + value: $(params.registry) + - name: git-revision + value: $(params.git-revision) + + # Stage 5: Update GitOps manifests + - name: update-gitops-manifests + runAfter: [build-and-push] + taskRef: + name: update-gitops + when: + - input: "$(tasks.detect-changes.results.changed-services)" + operator: notin + values: ["none", "infrastructure"] + - input: "$(tasks.build-and-push.results.build-status)" + operator: in + values: ["success", "partial"] + workspaces: + - name: source + workspace: shared-workspace + - name: git-credentials + workspace: git-credentials + params: + - name: services + value: $(tasks.detect-changes.results.changed-services) + - name: registry + value: $(params.registry) + - name: git-revision + value: $(params.git-revision) + - name: git-branch + value: $(params.git-branch) + - name: dry-run + value: $(params.dry-run) + + # Final tasks that run regardless of pipeline success/failure + finally: + - name: pipeline-summary + taskRef: + name: pipeline-summary + params: + - name: changed-services + value: $(tasks.detect-changes.results.changed-services) + - name: git-revision + value: $(params.git-revision) \ No newline at end of file diff --git a/infrastructure/cicd/tekton/pipelines/kustomization.yaml b/infrastructure/cicd/tekton/pipelines/kustomization.yaml new file mode 100644 index 00000000..7dcb6dab --- /dev/null +++ b/infrastructure/cicd/tekton/pipelines/kustomization.yaml @@ -0,0 +1,6 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - ci-pipeline.yaml + - prod-deploy-pipeline.yaml diff --git a/infrastructure/cicd/tekton/pipelines/prod-deploy-pipeline.yaml b/infrastructure/cicd/tekton/pipelines/prod-deploy-pipeline.yaml new file mode 100644 index 00000000..caa38955 --- /dev/null +++ b/infrastructure/cicd/tekton/pipelines/prod-deploy-pipeline.yaml @@ -0,0 +1,118 @@ +# Production Deployment Pipeline for Bakery-IA +# This pipeline handles production deployments with manual approval gate +# It should be triggered after the CI pipeline succeeds + +apiVersion: tekton.dev/v1beta1 +kind: Pipeline +metadata: + name: bakery-ia-prod-deploy + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: pipeline + app.kubernetes.io/environment: production +spec: + workspaces: + - name: shared-workspace + description: Shared workspace for source code + - name: git-credentials + description: Git credentials for pushing GitOps updates + optional: true + params: + - name: git-url + type: string + description: Repository URL + - name: git-revision + type: string + description: Git revision/commit hash to deploy + - name: services + type: string + description: Comma-separated list of services to deploy + - name: registry + type: string + description: Container registry URL + - name: approver + type: string + description: Name of the person who approved this deployment + default: "automated" + - name: approval-ticket + type: string + description: Ticket/issue number for deployment approval + default: "N/A" + + tasks: + # Stage 1: Fetch source code + - name: fetch-source + taskRef: + name: git-clone + workspaces: + - name: output + workspace: shared-workspace + params: + - name: url + value: $(params.git-url) + - name: revision + value: $(params.git-revision) + + # Stage 2: Verify 
images exist in registry + - name: verify-images + runAfter: [fetch-source] + taskRef: + name: verify-images + params: + - name: services + value: $(params.services) + - name: registry + value: $(params.registry) + - name: git-revision + value: $(params.git-revision) + + # Stage 3: Pre-deployment validation + - name: pre-deploy-validation + runAfter: [verify-images] + taskRef: + name: pre-deploy-validation + workspaces: + - name: source + workspace: shared-workspace + params: + - name: services + value: $(params.services) + - name: environment + value: "production" + + # Stage 4: Update production manifests + - name: update-prod-manifests + runAfter: [pre-deploy-validation] + taskRef: + name: update-gitops + workspaces: + - name: source + workspace: shared-workspace + - name: git-credentials + workspace: git-credentials + params: + - name: services + value: $(params.services) + - name: registry + value: $(params.registry) + - name: git-revision + value: $(params.git-revision) + - name: git-branch + value: "main" + - name: dry-run + value: "false" + + finally: + - name: deployment-summary + taskRef: + name: prod-deployment-summary + params: + - name: services + value: $(params.services) + - name: git-revision + value: $(params.git-revision) + - name: approver + value: $(params.approver) + - name: approval-ticket + value: $(params.approval-ticket) diff --git a/infrastructure/cicd/tekton/rbac/kustomization.yaml b/infrastructure/cicd/tekton/rbac/kustomization.yaml new file mode 100644 index 00000000..e841e66b --- /dev/null +++ b/infrastructure/cicd/tekton/rbac/kustomization.yaml @@ -0,0 +1,6 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - rbac.yaml + - resource-quota.yaml diff --git a/infrastructure/cicd/tekton/rbac/rbac.yaml b/infrastructure/cicd/tekton/rbac/rbac.yaml new file mode 100644 index 00000000..2e1c2ba7 --- /dev/null +++ b/infrastructure/cicd/tekton/rbac/rbac.yaml @@ -0,0 +1,159 @@ +# Tekton RBAC Configuration for Bakery-IA CI/CD +# This file defines ServiceAccounts, Roles, and RoleBindings for Tekton + +--- +# ServiceAccount for Tekton Triggers EventListener +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tekton-triggers-sa + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: triggers + +--- +# ServiceAccount for Pipeline execution +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tekton-pipeline-sa + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: pipeline + +--- +# ClusterRole for Tekton Triggers to create PipelineRuns +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: tekton-triggers-role + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: triggers +rules: + # Ability to create PipelineRuns from triggers + - apiGroups: ["tekton.dev"] + resources: ["pipelineruns", "taskruns"] + verbs: ["create", "get", "list", "watch"] + # Ability to read pipelines and tasks + - apiGroups: ["tekton.dev"] + resources: ["pipelines", "tasks", "clustertasks"] + verbs: ["get", "list", "watch"] + # Ability to manage PVCs for workspaces + - apiGroups: [""] + resources: ["persistentvolumeclaims"] + verbs: ["create", "get", "list", "watch", "delete"] + # Ability to read secrets for credentials + - apiGroups: [""] + resources: ["secrets"] + verbs: ["get", "list", "watch"] + # Ability to read configmaps + - apiGroups: [""] + resources: ["configmaps"] + verbs: ["get", "list", 
"watch"] + # Ability to manage events for logging + - apiGroups: [""] + resources: ["events"] + verbs: ["create", "patch"] + +--- +# ClusterRoleBinding for Tekton Triggers +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: tekton-triggers-binding + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: triggers +subjects: + - kind: ServiceAccount + name: tekton-triggers-sa + namespace: tekton-pipelines +roleRef: + kind: ClusterRole + name: tekton-triggers-role + apiGroup: rbac.authorization.k8s.io + +--- +# ClusterRole for Pipeline execution (needed for git operations and deployments) +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: tekton-pipeline-role + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: pipeline +rules: + # Ability to read/update deployments for GitOps + - apiGroups: ["apps"] + resources: ["deployments"] + verbs: ["get", "list", "watch", "patch", "update"] + # Ability to read secrets for credentials + - apiGroups: [""] + resources: ["secrets"] + verbs: ["get", "list", "watch"] + # Ability to read configmaps + - apiGroups: [""] + resources: ["configmaps"] + verbs: ["get", "list", "watch"] + # Ability to manage pods for build operations + - apiGroups: [""] + resources: ["pods", "pods/log"] + verbs: ["get", "list", "watch"] + +--- +# ClusterRoleBinding for Pipeline execution +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: tekton-pipeline-binding + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: pipeline +subjects: + - kind: ServiceAccount + name: tekton-pipeline-sa + namespace: tekton-pipelines +roleRef: + kind: ClusterRole + name: tekton-pipeline-role + apiGroup: rbac.authorization.k8s.io + +--- +# Role for EventListener to access triggers resources +apiVersion: rbac.authorization.k8s.io/v1 +kind: Role +metadata: + name: tekton-triggers-eventlistener-role + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: triggers +rules: + - apiGroups: ["triggers.tekton.dev"] + resources: ["eventlisteners", "triggerbindings", "triggertemplates", "triggers", "interceptors"] + verbs: ["get", "list", "watch"] + - apiGroups: [""] + resources: ["configmaps", "secrets"] + verbs: ["get", "list", "watch"] + +--- +# RoleBinding for EventListener +apiVersion: rbac.authorization.k8s.io/v1 +kind: RoleBinding +metadata: + name: tekton-triggers-eventlistener-binding + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: triggers +subjects: + - kind: ServiceAccount + name: tekton-triggers-sa + namespace: tekton-pipelines +roleRef: + kind: Role + name: tekton-triggers-eventlistener-role + apiGroup: rbac.authorization.k8s.io diff --git a/infrastructure/cicd/tekton/rbac/resource-quota.yaml b/infrastructure/cicd/tekton/rbac/resource-quota.yaml new file mode 100644 index 00000000..b582c9a0 --- /dev/null +++ b/infrastructure/cicd/tekton/rbac/resource-quota.yaml @@ -0,0 +1,64 @@ +# ResourceQuota for Tekton Pipelines Namespace +# Prevents resource exhaustion from runaway pipeline runs +# +# This quota limits: +# - Total CPU and memory that can be requested/used +# - Number of concurrent pods +# - Number of PVCs for workspaces + +apiVersion: v1 +kind: ResourceQuota +metadata: + name: tekton-pipelines-quota + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + 
app.kubernetes.io/component: quota +spec: + hard: + # Limit total CPU + requests.cpu: "8" + limits.cpu: "16" + # Limit total memory + requests.memory: "16Gi" + limits.memory: "32Gi" + # Limit number of pods (controls concurrent pipeline tasks) + pods: "20" + # Limit PVCs (controls workspace storage) + persistentvolumeclaims: "10" + # Limit storage + requests.storage: "50Gi" + +--- +# LimitRange to set defaults and limits for individual pods +# Ensures every pod has resource requests/limits +apiVersion: v1 +kind: LimitRange +metadata: + name: tekton-pipelines-limits + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: quota +spec: + limits: + # Default limits for containers + - type: Container + default: + cpu: "1" + memory: "1Gi" + defaultRequest: + cpu: "100m" + memory: "256Mi" + max: + cpu: "4" + memory: "8Gi" + min: + cpu: "50m" + memory: "64Mi" + # Limits for PVCs + - type: PersistentVolumeClaim + max: + storage: "10Gi" + min: + storage: "1Gi" diff --git a/infrastructure/cicd/tekton/secrets/.gitignore b/infrastructure/cicd/tekton/secrets/.gitignore new file mode 100644 index 00000000..ff9b1b71 --- /dev/null +++ b/infrastructure/cicd/tekton/secrets/.gitignore @@ -0,0 +1,4 @@ +# Ignore generated secrets +.webhook-secret +*-actual.yaml +sealed-secrets.yaml diff --git a/infrastructure/cicd/tekton/secrets/generate-secrets.sh b/infrastructure/cicd/tekton/secrets/generate-secrets.sh new file mode 100755 index 00000000..386a4338 --- /dev/null +++ b/infrastructure/cicd/tekton/secrets/generate-secrets.sh @@ -0,0 +1,167 @@ +#!/bin/bash +# Generate CI/CD Secrets for Bakery-IA +# +# This script creates Kubernetes secrets required for the CI/CD pipeline. +# Run this script once during initial setup. +# +# Usage: +# ./generate-secrets.sh [options] +# +# Options: +# --registry-url Container registry URL (default: gitea.bakery-ia.local:5000) +# --gitea-user Gitea username (will prompt if not provided) +# --gitea-password Gitea password (will prompt if not provided) +# --dry-run Print commands without executing + +set -e + +# Colors for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +NC='\033[0m' # No Color + +# Default values +REGISTRY_URL="${REGISTRY_URL:-gitea.bakery-ia.local:5000}" +DRY_RUN=false +KUBECTL="kubectl" + +# Check if running in microk8s +if command -v microk8s &> /dev/null; then + KUBECTL="microk8s kubectl" +fi + +# Parse arguments +while [[ $# -gt 0 ]]; do + case $1 in + --registry-url) + REGISTRY_URL="$2" + shift 2 + ;; + --gitea-user) + GITEA_USERNAME="$2" + shift 2 + ;; + --gitea-password) + GITEA_PASSWORD="$2" + shift 2 + ;; + --dry-run) + DRY_RUN=true + shift + ;; + *) + echo -e "${RED}Unknown option: $1${NC}" + exit 1 + ;; + esac +done + +echo "==========================================" +echo " Bakery-IA CI/CD Secrets Generator" +echo "==========================================" +echo "" + +# Prompt for credentials if not provided +if [ -z "$GITEA_USERNAME" ]; then + read -p "Enter Gitea username: " GITEA_USERNAME +fi + +if [ -z "$GITEA_PASSWORD" ]; then + read -s -p "Enter Gitea password: " GITEA_PASSWORD + echo "" +fi + +# Generate webhook secret +WEBHOOK_SECRET=$(openssl rand -hex 32) + +echo "" +echo -e "${YELLOW}Configuration:${NC}" +echo " Registry URL: $REGISTRY_URL" +echo " Gitea User: $GITEA_USERNAME" +echo " Webhook Secret: ${WEBHOOK_SECRET:0:8}..." 
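+# Only the first 8 characters of the webhook secret are echoed here; the full token
+# is printed in the summary at the end of this script and saved to the gitignored
+# .webhook-secret file for reference.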
+echo "" + +# Function to create secret +create_secret() { + local cmd="$1" + if [ "$DRY_RUN" = true ]; then + echo -e "${YELLOW}[DRY-RUN]${NC} $cmd" + else + eval "$cmd" + fi +} + +# Ensure namespaces exist +echo -e "${GREEN}Creating namespaces if they don't exist...${NC}" +create_secret "$KUBECTL create namespace tekton-pipelines --dry-run=client -o yaml | $KUBECTL apply -f -" +create_secret "$KUBECTL create namespace flux-system --dry-run=client -o yaml | $KUBECTL apply -f -" + +echo "" +echo -e "${GREEN}Creating secrets...${NC}" + +# 1. Webhook Secret +echo " Creating gitea-webhook-secret..." +create_secret "$KUBECTL create secret generic gitea-webhook-secret \ + --namespace tekton-pipelines \ + --from-literal=secretToken='$WEBHOOK_SECRET' \ + --dry-run=client -o yaml | $KUBECTL apply -f -" + +# 2. Registry Credentials (docker-registry type) +echo " Creating gitea-registry-credentials..." +create_secret "$KUBECTL create secret docker-registry gitea-registry-credentials \ + --namespace tekton-pipelines \ + --docker-server='$REGISTRY_URL' \ + --docker-username='$GITEA_USERNAME' \ + --docker-password='$GITEA_PASSWORD' \ + --dry-run=client -o yaml | $KUBECTL apply -f -" + +# 3. Git Credentials for Tekton +echo " Creating gitea-git-credentials..." +create_secret "$KUBECTL create secret generic gitea-git-credentials \ + --namespace tekton-pipelines \ + --from-literal=username='$GITEA_USERNAME' \ + --from-literal=password='$GITEA_PASSWORD' \ + --dry-run=client -o yaml | $KUBECTL apply -f -" + +# 4. Flux Git Credentials +echo " Creating gitea-credentials for Flux..." +create_secret "$KUBECTL create secret generic gitea-credentials \ + --namespace flux-system \ + --from-literal=username='$GITEA_USERNAME' \ + --from-literal=password='$GITEA_PASSWORD' \ + --dry-run=client -o yaml | $KUBECTL apply -f -" + +# Label all secrets +echo "" +echo -e "${GREEN}Adding labels to secrets...${NC}" +for ns in tekton-pipelines flux-system; do + for secret in gitea-webhook-secret gitea-registry-credentials gitea-git-credentials gitea-credentials; do + if $KUBECTL get secret "$secret" -n "$ns" &> /dev/null; then + create_secret "$KUBECTL label secret $secret -n $ns app.kubernetes.io/name=bakery-ia-cicd --overwrite 2>/dev/null || true" + fi + done +done + +echo "" +echo "==========================================" +echo -e "${GREEN}Secrets created successfully!${NC}" +echo "==========================================" +echo "" +echo -e "${YELLOW}IMPORTANT:${NC} Save this webhook secret for Gitea webhook configuration:" +echo "" +echo " Webhook Secret: $WEBHOOK_SECRET" +echo "" +echo "Configure this in Gitea:" +echo " 1. Go to Repository Settings > Webhooks" +echo " 2. Add webhook with URL: http://el-bakery-ia-listener.tekton-pipelines.svc.cluster.local:8080" +echo " 3. Set Secret to the webhook secret above" +echo " 4. 
Select events: Push" +echo "" + +# Save webhook secret to a file for reference (gitignored) +if [ "$DRY_RUN" = false ]; then + echo "$WEBHOOK_SECRET" > "$(dirname "$0")/.webhook-secret" + chmod 600 "$(dirname "$0")/.webhook-secret" + echo "Webhook secret saved to .webhook-secret (gitignored)" +fi diff --git a/infrastructure/cicd/tekton/secrets/kustomization.yaml b/infrastructure/cicd/tekton/secrets/kustomization.yaml new file mode 100644 index 00000000..77e5c8bc --- /dev/null +++ b/infrastructure/cicd/tekton/secrets/kustomization.yaml @@ -0,0 +1,19 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - secrets.yaml + +# Note: In production, use sealed-secrets or external-secrets-operator +# to manage secrets securely. The secrets.yaml file contains placeholder +# values that must be replaced before deployment. +# +# Example using sealed-secrets: +# 1. Install sealed-secrets controller +# 2. Create SealedSecret resources instead of plain Secrets +# 3. Commit the SealedSecret manifests to Git (safe to commit) +# +# Example using external-secrets-operator: +# 1. Install external-secrets-operator +# 2. Configure a SecretStore (AWS Secrets Manager, HashiCorp Vault, etc.) +# 3. Create ExternalSecret resources that reference the SecretStore diff --git a/infrastructure/cicd/tekton/secrets/secrets-template.yaml b/infrastructure/cicd/tekton/secrets/secrets-template.yaml new file mode 100644 index 00000000..bf9ecd38 --- /dev/null +++ b/infrastructure/cicd/tekton/secrets/secrets-template.yaml @@ -0,0 +1,79 @@ +# CI/CD Secrets Template for Tekton Pipelines +# +# DO NOT commit this file with actual credentials! +# Use the generate-secrets.sh script to create secrets safely. +# +# For production, use one of these approaches: +# 1. Sealed Secrets: kubeseal < secrets.yaml > sealed-secrets.yaml +# 2. External Secrets Operator: Configure with your secret store +# 3. Manual creation: kubectl create secret ... 
(see generate-secrets.sh) + +--- +# Secret for Gitea webhook validation +# Used by EventListener to validate incoming webhooks +apiVersion: v1 +kind: Secret +metadata: + name: gitea-webhook-secret + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: triggers +type: Opaque +stringData: + # Generate with: openssl rand -hex 32 + secretToken: "${WEBHOOK_SECRET_TOKEN}" + +--- +# Secret for Gitea container registry credentials +# Used by Kaniko to push images to Gitea registry +apiVersion: v1 +kind: Secret +metadata: + name: gitea-registry-credentials + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: build +type: kubernetes.io/dockerconfigjson +stringData: + .dockerconfigjson: | + { + "auths": { + "${REGISTRY_URL}": { + "username": "${GITEA_USERNAME}", + "password": "${GITEA_PASSWORD}" + } + } + } + +--- +# Secret for Git credentials (used by pipeline to push GitOps updates) +apiVersion: v1 +kind: Secret +metadata: + name: gitea-git-credentials + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: gitops +type: Opaque +stringData: + username: "${GITEA_USERNAME}" + password: "${GITEA_PASSWORD}" + +--- +# Secret for Flux GitRepository access +# Used by Flux to pull from Gitea repository +apiVersion: v1 +kind: Secret +metadata: + name: gitea-credentials + namespace: flux-system + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: flux +type: Opaque +stringData: + username: "${GITEA_USERNAME}" + password: "${GITEA_PASSWORD}" diff --git a/infrastructure/cicd/tekton/secrets/secrets.yaml b/infrastructure/cicd/tekton/secrets/secrets.yaml new file mode 100644 index 00000000..41b4c8c4 --- /dev/null +++ b/infrastructure/cicd/tekton/secrets/secrets.yaml @@ -0,0 +1,98 @@ +# CI/CD Secrets for Tekton Pipelines +# +# WARNING: This file contains EXAMPLE values only! +# DO NOT use these values in production. +# +# To create actual secrets, use ONE of these methods: +# +# Method 1 (Recommended): Use the generate-secrets.sh script +# ./generate-secrets.sh --gitea-user --gitea-password +# +# Method 2: Create secrets manually with kubectl +# kubectl create secret generic gitea-webhook-secret \ +# --namespace tekton-pipelines \ +# --from-literal=secretToken="$(openssl rand -hex 32)" +# +# Method 3: Use Sealed Secrets for GitOps +# kubeseal < secrets-template.yaml > sealed-secrets.yaml +# +# Method 4: Use External Secrets Operator +# Configure ESO to pull from your secret store (Vault, AWS SM, etc.) 
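+#
+# Sealed Secrets sketch (assumes the sealed-secrets controller and the kubeseal CLI are
+# installed, and that the placeholder values in the template have been filled in first):
+#   kubeseal --format yaml < secrets-template.yaml > sealed-secrets.yaml
+#   kubectl apply -f sealed-secrets.yaml    # the sealed manifest is safe to commit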
+ +--- +# Example Secret for Gitea webhook validation +# Used by EventListener to validate incoming webhooks +apiVersion: v1 +kind: Secret +metadata: + name: gitea-webhook-secret + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: triggers + annotations: + note: "EXAMPLE - Replace with actual secret using generate-secrets.sh" +type: Opaque +stringData: + # Generate with: openssl rand -hex 32 + secretToken: "example-webhook-token-do-not-use-in-production" + +--- +# Example Secret for Gitea container registry credentials +# Used by Kaniko to push images to Gitea registry +apiVersion: v1 +kind: Secret +metadata: + name: gitea-registry-credentials + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: build + annotations: + note: "EXAMPLE - Replace with actual secret using generate-secrets.sh" +type: kubernetes.io/dockerconfigjson +stringData: + .dockerconfigjson: | + { + "auths": { + "gitea.bakery-ia.local:5000": { + "username": "example-user", + "password": "example-password" + } + } + } + +--- +# Example Secret for Git credentials (used by pipeline to push GitOps updates) +apiVersion: v1 +kind: Secret +metadata: + name: gitea-git-credentials + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: gitops + annotations: + note: "EXAMPLE - Replace with actual secret using generate-secrets.sh" +type: Opaque +stringData: + username: "example-user" + password: "example-password" + +--- +# Example Secret for Flux GitRepository access +# Used by Flux to pull from Gitea repository +apiVersion: v1 +kind: Secret +metadata: + name: gitea-credentials + namespace: flux-system + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: flux + annotations: + note: "EXAMPLE - Replace with actual secret using generate-secrets.sh" +type: Opaque +stringData: + username: "example-user" + password: "example-password" diff --git a/infrastructure/cicd/tekton/tasks/detect-changes.yaml b/infrastructure/cicd/tekton/tasks/detect-changes.yaml new file mode 100644 index 00000000..e3727838 --- /dev/null +++ b/infrastructure/cicd/tekton/tasks/detect-changes.yaml @@ -0,0 +1,154 @@ +# Tekton Detect Changed Services Task for Bakery-IA CI/CD +# This task identifies which services have changed in the repository + +apiVersion: tekton.dev/v1beta1 +kind: Task +metadata: + name: detect-changed-services + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: detect +spec: + workspaces: + - name: source + description: Source code workspace + params: + - name: base-ref + type: string + description: Base reference for comparison (default HEAD~1) + default: "HEAD~1" + results: + - name: changed-services + description: Comma-separated list of changed services + - name: changed-files-count + description: Number of files changed + steps: + - name: detect + image: alpine/git:2.43.0 + script: | + #!/bin/sh + set -e + + SOURCE_PATH="$(workspaces.source.path)" + BASE_REF="$(params.base-ref)" + + cd "$SOURCE_PATH" + + echo "============================================" + echo "Detect Changed Services" + echo "============================================" + echo "Base ref: $BASE_REF" + echo "============================================" + + # Get list of changed files compared to base reference + echo "" + echo "Detecting changed files..." 
+ + # Try to get diff, fall back to listing all files if this is the first commit + CHANGED_FILES=$(git diff --name-only "$BASE_REF" HEAD 2>/dev/null || git ls-tree -r HEAD --name-only) + + FILE_COUNT=$(echo "$CHANGED_FILES" | grep -c "." || echo "0") + echo "Found $FILE_COUNT changed files" + echo "$FILE_COUNT" > $(results.changed-files-count.path) + + if [ "$FILE_COUNT" = "0" ]; then + echo "No files changed" + echo "none" > $(results.changed-services.path) + exit 0 + fi + + echo "" + echo "Changed files:" + echo "$CHANGED_FILES" | head -20 + if [ "$FILE_COUNT" -gt 20 ]; then + echo "... and $((FILE_COUNT - 20)) more files" + fi + + # Map files to services using simple shell (no bash arrays) + echo "" + echo "Mapping files to services..." + + CHANGED_SERVICES="" + + # Process each file + echo "$CHANGED_FILES" | while read -r file; do + if [ -z "$file" ]; then + continue + fi + + # Check services directory + if echo "$file" | grep -q "^services/"; then + SERVICE=$(echo "$file" | cut -d'/' -f2) + if [ -n "$SERVICE" ] && ! echo "$CHANGED_SERVICES" | grep -q "$SERVICE"; then + if [ -z "$CHANGED_SERVICES" ]; then + CHANGED_SERVICES="$SERVICE" + else + CHANGED_SERVICES="$CHANGED_SERVICES,$SERVICE" + fi + echo "$CHANGED_SERVICES" > /tmp/services.txt + fi + fi + + # Check frontend + if echo "$file" | grep -q "^frontend/"; then + if ! echo "$CHANGED_SERVICES" | grep -q "frontend"; then + if [ -z "$CHANGED_SERVICES" ]; then + CHANGED_SERVICES="frontend" + else + CHANGED_SERVICES="$CHANGED_SERVICES,frontend" + fi + echo "$CHANGED_SERVICES" > /tmp/services.txt + fi + fi + + # Check gateway + if echo "$file" | grep -q "^gateway/"; then + if ! echo "$CHANGED_SERVICES" | grep -q "gateway"; then + if [ -z "$CHANGED_SERVICES" ]; then + CHANGED_SERVICES="gateway" + else + CHANGED_SERVICES="$CHANGED_SERVICES,gateway" + fi + echo "$CHANGED_SERVICES" > /tmp/services.txt + fi + fi + + # Check infrastructure + if echo "$file" | grep -q "^infrastructure/"; then + if ! 
echo "$CHANGED_SERVICES" | grep -q "infrastructure"; then + if [ -z "$CHANGED_SERVICES" ]; then + CHANGED_SERVICES="infrastructure" + else + CHANGED_SERVICES="$CHANGED_SERVICES,infrastructure" + fi + echo "$CHANGED_SERVICES" > /tmp/services.txt + fi + fi + done + + # Read the accumulated services + if [ -f /tmp/services.txt ]; then + CHANGED_SERVICES=$(cat /tmp/services.txt) + fi + + echo "" + echo "============================================" + + # Output result + if [ -z "$CHANGED_SERVICES" ]; then + echo "No service changes detected" + echo "none" > $(results.changed-services.path) + else + echo "Detected changes in services: $CHANGED_SERVICES" + echo "$CHANGED_SERVICES" > $(results.changed-services.path) + fi + + echo "============================================" + resources: + limits: + cpu: 200m + memory: 128Mi + requests: + cpu: 50m + memory: 64Mi \ No newline at end of file diff --git a/infrastructure/cicd/tekton/tasks/git-clone.yaml b/infrastructure/cicd/tekton/tasks/git-clone.yaml new file mode 100644 index 00000000..133ac2cf --- /dev/null +++ b/infrastructure/cicd/tekton/tasks/git-clone.yaml @@ -0,0 +1,95 @@ +# Tekton Git Clone Task for Bakery-IA CI/CD +# This task clones the source code repository + +apiVersion: tekton.dev/v1beta1 +kind: Task +metadata: + name: git-clone + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: source +spec: + workspaces: + - name: output + description: Workspace to clone the repository into + params: + - name: url + type: string + description: Repository URL to clone + - name: revision + type: string + description: Git revision to checkout + default: "main" + - name: depth + type: string + description: Git clone depth (0 for full history) + default: "1" + results: + - name: commit-sha + description: The commit SHA that was checked out + - name: commit-message + description: The commit message + steps: + - name: clone + image: alpine/git:2.43.0 + script: | + #!/bin/sh + set -e + + URL="$(params.url)" + REVISION="$(params.revision)" + DEPTH="$(params.depth)" + OUTPUT_PATH="$(workspaces.output.path)" + + echo "============================================" + echo "Git Clone Task" + echo "============================================" + echo "URL: $URL" + echo "Revision: $REVISION" + echo "Depth: $DEPTH" + echo "============================================" + + # Clone with depth for faster checkout + if [ "$DEPTH" = "0" ]; then + echo "Cloning full repository..." + git clone "$URL" "$OUTPUT_PATH" + else + echo "Cloning with depth $DEPTH..." 
+ git clone --depth "$DEPTH" "$URL" "$OUTPUT_PATH" + fi + + cd "$OUTPUT_PATH" + + # Fetch the specific revision if needed + if [ "$REVISION" != "main" ] && [ "$REVISION" != "master" ]; then + echo "Fetching revision: $REVISION" + git fetch --depth 1 origin "$REVISION" 2>/dev/null || true + fi + + # Checkout the revision + echo "Checking out: $REVISION" + git checkout "$REVISION" 2>/dev/null || git checkout "origin/$REVISION" + + # Get commit info + COMMIT_SHA=$(git rev-parse HEAD) + COMMIT_MSG=$(git log -1 --pretty=format:"%s") + + echo "" + echo "============================================" + echo "Clone Complete" + echo "============================================" + echo "Commit: $COMMIT_SHA" + echo "Message: $COMMIT_MSG" + echo "============================================" + + # Write results + echo -n "$COMMIT_SHA" > $(results.commit-sha.path) + echo -n "$COMMIT_MSG" > $(results.commit-message.path) + resources: + limits: + cpu: 500m + memory: 512Mi + requests: + cpu: 100m + memory: 128Mi \ No newline at end of file diff --git a/infrastructure/cicd/tekton/tasks/kaniko-build.yaml b/infrastructure/cicd/tekton/tasks/kaniko-build.yaml new file mode 100644 index 00000000..1019a435 --- /dev/null +++ b/infrastructure/cicd/tekton/tasks/kaniko-build.yaml @@ -0,0 +1,200 @@ +# Tekton Kaniko Build Task for Bakery-IA CI/CD +# This task builds and pushes container images using Kaniko +# Supports building multiple services from a comma-separated list + +apiVersion: tekton.dev/v1beta1 +kind: Task +metadata: + name: kaniko-build + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: build +spec: + workspaces: + - name: source + description: Source code workspace + - name: docker-credentials + description: Docker registry credentials + params: + - name: services + type: string + description: Comma-separated list of services to build + - name: registry + type: string + description: Container registry URL + - name: git-revision + type: string + description: Git revision for image tag + default: "latest" + results: + - name: built-images + description: List of successfully built images + - name: build-status + description: Overall build status (success/failure) + steps: + # Step 1: Setup docker credentials + - name: setup-docker-config + image: alpine:3.18 + script: | + #!/bin/sh + set -e + echo "Setting up Docker credentials..." 
+ mkdir -p /kaniko/.docker + + # Check if credentials secret is mounted + if [ -f "$(workspaces.docker-credentials.path)/config.json" ]; then + cp "$(workspaces.docker-credentials.path)/config.json" /kaniko/.docker/config.json + echo "Docker config copied from secret" + elif [ -f "$(workspaces.docker-credentials.path)/.dockerconfigjson" ]; then + cp "$(workspaces.docker-credentials.path)/.dockerconfigjson" /kaniko/.docker/config.json + echo "Docker config copied from .dockerconfigjson" + else + echo "Warning: No docker credentials found, builds may fail for private registries" + echo '{}' > /kaniko/.docker/config.json + fi + volumeMounts: + - name: docker-config + mountPath: /kaniko/.docker + resources: + limits: + cpu: 100m + memory: 64Mi + requests: + cpu: 50m + memory: 32Mi + + # Step 2: Build each service iteratively + - name: build-services + image: gcr.io/kaniko-project/executor:v1.23.0 + script: | + #!/busybox/sh + set -e + + SERVICES="$(params.services)" + REGISTRY="$(params.registry)" + REVISION="$(params.git-revision)" + SOURCE_PATH="$(workspaces.source.path)" + BUILT_IMAGES="" + FAILED_SERVICES="" + + echo "============================================" + echo "Starting build for services: $SERVICES" + echo "Registry: $REGISTRY" + echo "Tag: $REVISION" + echo "============================================" + + # Skip if no services to build + if [ "$SERVICES" = "none" ] || [ -z "$SERVICES" ]; then + echo "No services to build, skipping..." + echo "none" > $(results.built-images.path) + echo "skipped" > $(results.build-status.path) + exit 0 + fi + + # Convert comma-separated list to space-separated + SERVICES_LIST=$(echo "$SERVICES" | tr ',' ' ') + + for SERVICE in $SERVICES_LIST; do + # Trim whitespace + SERVICE=$(echo "$SERVICE" | tr -d ' ') + + # Skip infrastructure changes (not buildable) + if [ "$SERVICE" = "infrastructure" ]; then + echo "Skipping infrastructure (not a buildable service)" + continue + fi + + echo "" + echo "--------------------------------------------" + echo "Building service: $SERVICE" + echo "--------------------------------------------" + + # Determine Dockerfile path based on service type + if [ "$SERVICE" = "frontend" ]; then + DOCKERFILE_PATH="$SOURCE_PATH/frontend/Dockerfile" + CONTEXT_PATH="$SOURCE_PATH/frontend" + elif [ "$SERVICE" = "gateway" ]; then + DOCKERFILE_PATH="$SOURCE_PATH/gateway/Dockerfile" + CONTEXT_PATH="$SOURCE_PATH/gateway" + else + DOCKERFILE_PATH="$SOURCE_PATH/services/$SERVICE/Dockerfile" + CONTEXT_PATH="$SOURCE_PATH" + fi + + # Check if Dockerfile exists + if [ ! -f "$DOCKERFILE_PATH" ]; then + echo "Warning: Dockerfile not found at $DOCKERFILE_PATH, skipping $SERVICE" + FAILED_SERVICES="$FAILED_SERVICES $SERVICE" + continue + fi + + IMAGE_NAME="$REGISTRY/bakery/$SERVICE:$REVISION" + IMAGE_NAME_LATEST="$REGISTRY/bakery/$SERVICE:latest" + + echo "Dockerfile: $DOCKERFILE_PATH" + echo "Context: $CONTEXT_PATH" + echo "Image: $IMAGE_NAME" + + # Run Kaniko build + /kaniko/executor \ + --dockerfile="$DOCKERFILE_PATH" \ + --context="$CONTEXT_PATH" \ + --destination="$IMAGE_NAME" \ + --destination="$IMAGE_NAME_LATEST" \ + --cache=true \ + --cache-ttl=24h \ + --verbosity=info \ + --snapshot-mode=redo \ + --use-new-run + + BUILD_EXIT_CODE=$? 
+ + if [ $BUILD_EXIT_CODE -eq 0 ]; then + echo "Successfully built and pushed: $IMAGE_NAME" + if [ -z "$BUILT_IMAGES" ]; then + BUILT_IMAGES="$IMAGE_NAME" + else + BUILT_IMAGES="$BUILT_IMAGES,$IMAGE_NAME" + fi + else + echo "Failed to build: $SERVICE (exit code: $BUILD_EXIT_CODE)" + FAILED_SERVICES="$FAILED_SERVICES $SERVICE" + fi + done + + echo "" + echo "============================================" + echo "Build Summary" + echo "============================================" + echo "Built images: $BUILT_IMAGES" + echo "Failed services: $FAILED_SERVICES" + + # Write results + if [ -z "$BUILT_IMAGES" ]; then + echo "none" > $(results.built-images.path) + else + echo "$BUILT_IMAGES" > $(results.built-images.path) + fi + + if [ -n "$FAILED_SERVICES" ]; then + echo "partial" > $(results.build-status.path) + echo "Warning: Some services failed to build: $FAILED_SERVICES" + else + echo "success" > $(results.build-status.path) + fi + volumeMounts: + - name: docker-config + mountPath: /kaniko/.docker + securityContext: + runAsUser: 0 + resources: + limits: + cpu: 2000m + memory: 4Gi + requests: + cpu: 500m + memory: 1Gi + volumes: + - name: docker-config + emptyDir: {} \ No newline at end of file diff --git a/infrastructure/cicd/tekton/tasks/kustomization.yaml b/infrastructure/cicd/tekton/tasks/kustomization.yaml new file mode 100644 index 00000000..8a52bb91 --- /dev/null +++ b/infrastructure/cicd/tekton/tasks/kustomization.yaml @@ -0,0 +1,14 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - git-clone.yaml + - detect-changes.yaml + - run-tests.yaml + - kaniko-build.yaml + - update-gitops.yaml + - pipeline-summary.yaml + # Production deployment tasks + - verify-images.yaml + - pre-deploy-validation.yaml + - prod-deployment-summary.yaml diff --git a/infrastructure/cicd/tekton/tasks/pipeline-summary.yaml b/infrastructure/cicd/tekton/tasks/pipeline-summary.yaml new file mode 100644 index 00000000..01976e1f --- /dev/null +++ b/infrastructure/cicd/tekton/tasks/pipeline-summary.yaml @@ -0,0 +1,62 @@ +# Tekton Pipeline Summary Task for Bakery-IA CI/CD +# This task runs at the end of the pipeline and provides a summary + +apiVersion: tekton.dev/v1beta1 +kind: Task +metadata: + name: pipeline-summary + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: summary +spec: + params: + - name: changed-services + type: string + description: List of changed services + default: "none" + - name: git-revision + type: string + description: Git revision that was built + default: "unknown" + steps: + - name: summary + image: alpine:3.18 + script: | + #!/bin/sh + + SERVICES="$(params.changed-services)" + REVISION="$(params.git-revision)" + + echo "" + echo "============================================" + echo " Pipeline Execution Summary" + echo "============================================" + echo "" + echo "Git Revision: $REVISION" + echo "Changed Services: $SERVICES" + echo "" + echo "Timestamp: $(date -u +"%Y-%m-%dT%H:%M:%SZ")" + echo "" + echo "============================================" + echo "" + + if [ "$SERVICES" = "none" ] || [ -z "$SERVICES" ]; then + echo "No services were changed in this commit." + echo "Pipeline completed without building any images." 
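+          # Nothing was built or pushed, so the GitOps update stage also has nothing to apply.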
+ else + echo "The following services were processed:" + echo "$SERVICES" | tr ',' '\n' | while read service; do + echo " - $service" + done + fi + + echo "" + echo "============================================" + resources: + limits: + cpu: 100m + memory: 64Mi + requests: + cpu: 50m + memory: 32Mi diff --git a/infrastructure/cicd/tekton/tasks/pre-deploy-validation.yaml b/infrastructure/cicd/tekton/tasks/pre-deploy-validation.yaml new file mode 100644 index 00000000..a5ba6810 --- /dev/null +++ b/infrastructure/cicd/tekton/tasks/pre-deploy-validation.yaml @@ -0,0 +1,76 @@ +# Task for pre-deployment validation +apiVersion: tekton.dev/v1beta1 +kind: Task +metadata: + name: pre-deploy-validation + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: validation +spec: + workspaces: + - name: source + description: Source code workspace + params: + - name: services + type: string + description: Comma-separated list of services to validate + - name: environment + type: string + description: Target environment (staging/production) + default: "production" + results: + - name: validation-status + description: Status of validation (passed/failed) + steps: + - name: validate + image: registry.k8s.io/kustomize/kustomize:v5.3.0 + script: | + #!/bin/sh + set -e + + SOURCE_PATH="$(workspaces.source.path)" + SERVICES="$(params.services)" + ENVIRONMENT="$(params.environment)" + + echo "============================================" + echo "Pre-Deployment Validation" + echo "============================================" + echo "Environment: $ENVIRONMENT" + echo "Services: $SERVICES" + echo "============================================" + + cd "$SOURCE_PATH" + + # Validate kustomization can be built + KUSTOMIZE_DIR="infrastructure/environments/$ENVIRONMENT" + + if [ -d "$KUSTOMIZE_DIR" ]; then + echo "" + echo "Validating kustomization..." 
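+          # 'kustomize build' renders the whole overlay; failing here catches broken
+          # resource references or malformed patches before Flux attempts to apply them.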
+ if kustomize build "$KUSTOMIZE_DIR" > /dev/null 2>&1; then + echo " ✓ Kustomization is valid" + else + echo " ✗ Kustomization validation failed" + echo "failed" > $(results.validation-status.path) + exit 1 + fi + fi + + # Additional validation checks can be added here + # - Schema validation + # - Policy checks (OPA/Gatekeeper) + # - Security scanning + + echo "" + echo "============================================" + echo "All validations passed" + echo "============================================" + echo "passed" > $(results.validation-status.path) + resources: + limits: + cpu: 500m + memory: 256Mi + requests: + cpu: 100m + memory: 128Mi diff --git a/infrastructure/cicd/tekton/tasks/prod-deployment-summary.yaml b/infrastructure/cicd/tekton/tasks/prod-deployment-summary.yaml new file mode 100644 index 00000000..827096c8 --- /dev/null +++ b/infrastructure/cicd/tekton/tasks/prod-deployment-summary.yaml @@ -0,0 +1,57 @@ +# Task for production deployment summary +apiVersion: tekton.dev/v1beta1 +kind: Task +metadata: + name: prod-deployment-summary + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: summary +spec: + params: + - name: services + type: string + description: List of deployed services + - name: git-revision + type: string + description: Git revision that was deployed + - name: approver + type: string + description: Name of the approver + - name: approval-ticket + type: string + description: Approval ticket number + steps: + - name: summary + image: alpine:3.18 + script: | + #!/bin/sh + + echo "" + echo "============================================" + echo " Production Deployment Summary" + echo "============================================" + echo "" + echo "Git Revision: $(params.git-revision)" + echo "Services: $(params.services)" + echo "Approved By: $(params.approver)" + echo "Approval Ticket: $(params.approval-ticket)" + echo "Timestamp: $(date -u +"%Y-%m-%dT%H:%M:%SZ")" + echo "" + echo "============================================" + echo "" + echo "Deployment to production initiated." + echo "Flux CD will reconcile the changes." 
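+      # This task only reports status; the rollout itself happens when Flux reconciles
+      # the manifests committed by the update-gitops task.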
+ echo "" + echo "Monitor deployment status with:" + echo " kubectl get kustomization -n flux-system" + echo " kubectl get pods -n bakery-ia" + echo "" + echo "============================================" + resources: + limits: + cpu: 100m + memory: 64Mi + requests: + cpu: 50m + memory: 32Mi diff --git a/infrastructure/cicd/tekton/tasks/run-tests.yaml b/infrastructure/cicd/tekton/tasks/run-tests.yaml new file mode 100644 index 00000000..32919c62 --- /dev/null +++ b/infrastructure/cicd/tekton/tasks/run-tests.yaml @@ -0,0 +1,205 @@ +# Tekton Test Task for Bakery-IA CI/CD +# This task runs unit tests and linting for changed services + +apiVersion: tekton.dev/v1beta1 +kind: Task +metadata: + name: run-tests + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: test +spec: + workspaces: + - name: source + description: Source code workspace + params: + - name: services + type: string + description: Comma-separated list of services to test + - name: skip-lint + type: string + description: Skip linting if "true" + default: "false" + - name: skip-tests + type: string + description: Skip tests if "true" + default: "false" + results: + - name: test-status + description: Overall test status (passed/failed/skipped) + - name: tested-services + description: List of services that were tested + - name: failed-services + description: List of services that failed tests + steps: + - name: run-tests + image: python:3.11-slim + script: | + #!/bin/bash + set -e + + SOURCE_PATH="$(workspaces.source.path)" + SERVICES="$(params.services)" + SKIP_LINT="$(params.skip-lint)" + SKIP_TESTS="$(params.skip-tests)" + + TESTED_SERVICES="" + FAILED_SERVICES="" + OVERALL_STATUS="passed" + + cd "$SOURCE_PATH" + + echo "============================================" + echo "Running Tests" + echo "============================================" + echo "Services: $SERVICES" + echo "Skip Lint: $SKIP_LINT" + echo "Skip Tests: $SKIP_TESTS" + echo "============================================" + + # Skip if no services to test + if [ "$SERVICES" = "none" ] || [ -z "$SERVICES" ]; then + echo "No services to test, skipping..." + echo "skipped" > $(results.test-status.path) + echo "none" > $(results.tested-services.path) + echo "none" > $(results.failed-services.path) + exit 0 + fi + + # Install common test dependencies + echo "" + echo "Installing test dependencies..." + pip install --quiet pytest pytest-cov pytest-asyncio ruff mypy 2>/dev/null || true + + # Convert comma-separated list to space-separated + SERVICES_LIST=$(echo "$SERVICES" | tr ',' ' ') + + for SERVICE in $SERVICES_LIST; do + # Trim whitespace + SERVICE=$(echo "$SERVICE" | tr -d ' ') + + # Skip infrastructure changes + if [ "$SERVICE" = "infrastructure" ]; then + echo "Skipping infrastructure (not testable)" + continue + fi + + echo "" + echo "--------------------------------------------" + echo "Testing service: $SERVICE" + echo "--------------------------------------------" + + # Determine service path + if [ "$SERVICE" = "frontend" ]; then + SERVICE_PATH="$SOURCE_PATH/frontend" + elif [ "$SERVICE" = "gateway" ]; then + SERVICE_PATH="$SOURCE_PATH/gateway" + else + SERVICE_PATH="$SOURCE_PATH/services/$SERVICE" + fi + + # Check if service exists + if [ ! 
-d "$SERVICE_PATH" ]; then + echo "Warning: Service directory not found: $SERVICE_PATH" + continue + fi + + cd "$SERVICE_PATH" + SERVICE_FAILED=false + + # Install service-specific dependencies if requirements.txt exists + if [ -f "requirements.txt" ]; then + echo "Installing service dependencies..." + pip install --quiet -r requirements.txt 2>/dev/null || true + fi + + # Run linting (ruff) + if [ "$SKIP_LINT" != "true" ]; then + echo "" + echo "Running linter (ruff)..." + if [ -d "app" ]; then + ruff check app/ --output-format=text 2>&1 || { + echo "Linting failed for $SERVICE" + SERVICE_FAILED=true + } + fi + fi + + # Run tests + if [ "$SKIP_TESTS" != "true" ]; then + echo "" + echo "Running tests (pytest)..." + if [ -d "tests" ]; then + pytest tests/ -v --tb=short 2>&1 || { + echo "Tests failed for $SERVICE" + SERVICE_FAILED=true + } + elif [ -d "app/tests" ]; then + pytest app/tests/ -v --tb=short 2>&1 || { + echo "Tests failed for $SERVICE" + SERVICE_FAILED=true + } + else + echo "No tests directory found, skipping tests" + fi + fi + + # Track results + if [ -z "$TESTED_SERVICES" ]; then + TESTED_SERVICES="$SERVICE" + else + TESTED_SERVICES="$TESTED_SERVICES,$SERVICE" + fi + + if [ "$SERVICE_FAILED" = true ]; then + OVERALL_STATUS="failed" + if [ -z "$FAILED_SERVICES" ]; then + FAILED_SERVICES="$SERVICE" + else + FAILED_SERVICES="$FAILED_SERVICES,$SERVICE" + fi + fi + + cd "$SOURCE_PATH" + done + + echo "" + echo "============================================" + echo "Test Summary" + echo "============================================" + echo "Tested services: $TESTED_SERVICES" + echo "Failed services: $FAILED_SERVICES" + echo "Overall status: $OVERALL_STATUS" + + # Write results + echo "$OVERALL_STATUS" > $(results.test-status.path) + + if [ -z "$TESTED_SERVICES" ]; then + echo "none" > $(results.tested-services.path) + else + echo "$TESTED_SERVICES" > $(results.tested-services.path) + fi + + if [ -z "$FAILED_SERVICES" ]; then + echo "none" > $(results.failed-services.path) + else + echo "$FAILED_SERVICES" > $(results.failed-services.path) + fi + + # Exit with error if tests failed + if [ "$OVERALL_STATUS" = "failed" ]; then + echo "" + echo "ERROR: Some tests failed!" + exit 1 + fi + + echo "" + echo "All tests passed!" 
+ resources: + limits: + cpu: 1000m + memory: 2Gi + requests: + cpu: 500m + memory: 1Gi diff --git a/infrastructure/cicd/tekton/tasks/update-gitops.yaml b/infrastructure/cicd/tekton/tasks/update-gitops.yaml new file mode 100644 index 00000000..c9b02c0b --- /dev/null +++ b/infrastructure/cicd/tekton/tasks/update-gitops.yaml @@ -0,0 +1,302 @@ +# Tekton Update GitOps Manifests Task for Bakery-IA CI/CD +# This task updates Kubernetes manifests with new image tags using Kustomize +# It uses a safer approach than sed for updating image references + +apiVersion: tekton.dev/v1beta1 +kind: Task +metadata: + name: update-gitops + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: gitops +spec: + workspaces: + - name: source + description: Source code workspace with Git repository + - name: git-credentials + description: Git credentials for pushing changes + optional: true + params: + - name: services + type: string + description: Comma-separated list of services to update + - name: registry + type: string + description: Container registry URL + - name: git-revision + type: string + description: Git revision for image tag + - name: git-branch + type: string + description: Target branch for GitOps updates + default: "main" + - name: dry-run + type: string + description: If "true", only show what would be changed without committing + default: "false" + results: + - name: updated-services + description: List of services that were updated + - name: commit-sha + description: Git commit SHA of the update (empty if dry-run) + steps: + - name: update-manifests + # Use alpine with curl to install kustomize + image: alpine:3.19 + script: | + #!/bin/sh + set -e + + # Install kustomize + echo "Installing kustomize..." + wget -q "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" -O - | sh + mv kustomize /usr/local/bin/ + echo "Kustomize version: $(kustomize version)" + + SOURCE_PATH="$(workspaces.source.path)" + SERVICES="$(params.services)" + REGISTRY="$(params.registry)" + REVISION="$(params.git-revision)" + DRY_RUN="$(params.dry-run)" + UPDATED_SERVICES="" + + cd "$SOURCE_PATH" + + echo "============================================" + echo "GitOps Manifest Update" + echo "============================================" + echo "Services: $SERVICES" + echo "Registry: $REGISTRY" + echo "Revision: $REVISION" + echo "Dry Run: $DRY_RUN" + echo "============================================" + + # Skip if no services to update + if [ "$SERVICES" = "none" ] || [ -z "$SERVICES" ]; then + echo "No services to update, skipping..." + echo "none" > $(results.updated-services.path) + echo "" > $(results.commit-sha.path) + exit 0 + fi + + # Define the kustomization directory + KUSTOMIZE_DIR="infrastructure/environments/prod" + + # Check if kustomization.yaml exists, create if not + if [ ! -f "$KUSTOMIZE_DIR/kustomization.yaml" ]; then + echo "Creating kustomization.yaml in $KUSTOMIZE_DIR" + mkdir -p "$KUSTOMIZE_DIR" + printf '%s\n' \ + "apiVersion: kustomize.config.k8s.io/v1beta1" \ + "kind: Kustomization" \ + "" \ + "resources:" \ + " - ../base" \ + "" \ + "images: []" \ + > "$KUSTOMIZE_DIR/kustomization.yaml" + fi + + # Convert comma-separated list to space-separated + SERVICES_LIST=$(echo "$SERVICES" | tr ',' ' ') + + # Build the images section for kustomization + echo "" + echo "Updating image references..." 
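+        # Each changed service gets its image reference pinned to
+        # $REGISTRY/bakery/<service>:$REVISION in the prod kustomization.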
+ + for SERVICE in $SERVICES_LIST; do + # Trim whitespace + SERVICE=$(echo "$SERVICE" | tr -d ' ') + + # Skip infrastructure changes + if [ "$SERVICE" = "infrastructure" ]; then + echo "Skipping infrastructure (not a deployable service)" + continue + fi + + echo "Processing: $SERVICE" + + # Determine the image name based on service + NEW_IMAGE="$REGISTRY/bakery/$SERVICE:$REVISION" + + # Use kustomize to set the image + # This is safer than sed as it understands the YAML structure + cd "$SOURCE_PATH/$KUSTOMIZE_DIR" + + # Check if this service has a deployment + SERVICE_DEPLOYMENT="" + if [ "$SERVICE" = "frontend" ]; then + SERVICE_DEPLOYMENT="frontend" + elif [ "$SERVICE" = "gateway" ]; then + SERVICE_DEPLOYMENT="gateway" + else + SERVICE_DEPLOYMENT="$SERVICE-service" + fi + + # Update the kustomization with the new image + # Using kustomize edit to safely modify the file + kustomize edit set image "bakery/$SERVICE=$NEW_IMAGE" 2>/dev/null || \ + kustomize edit set image "$SERVICE=$NEW_IMAGE" 2>/dev/null || \ + echo "Note: Could not set image via kustomize edit, will use alternative method" + + # Track updated services + if [ -z "$UPDATED_SERVICES" ]; then + UPDATED_SERVICES="$SERVICE" + else + UPDATED_SERVICES="$UPDATED_SERVICES,$SERVICE" + fi + + cd "$SOURCE_PATH" + done + + # Alternative: Update images in kustomization.yaml directly if kustomize edit didn't work + # This creates/updates an images section in the kustomization + echo "" + echo "Ensuring image overrides in kustomization.yaml..." + + # Create a patch file for image updates + IMAGES_FILE="$KUSTOMIZE_DIR/images.yaml" + printf '%s\n' \ + "# Auto-generated by CI/CD pipeline" \ + "# Commit: $REVISION" \ + "# Updated: $(date -u +"%Y-%m-%dT%H:%M:%SZ")" \ + "images:" \ + > "$IMAGES_FILE" + + for SERVICE in $SERVICES_LIST; do + SERVICE=$(echo "$SERVICE" | tr -d ' ') + if [ "$SERVICE" != "infrastructure" ]; then + printf '%s\n' \ + " - name: bakery/$SERVICE" \ + " newName: $REGISTRY/bakery/$SERVICE" \ + " newTag: \"$REVISION\"" \ + >> "$IMAGES_FILE" + fi + done + + echo "" + echo "Generated images.yaml:" + cat "$IMAGES_FILE" + + # Validate the kustomization + echo "" + echo "Validating kustomization..." + cd "$SOURCE_PATH/$KUSTOMIZE_DIR" + if kustomize build . > /dev/null 2>&1; then + echo "Kustomization is valid" + else + echo "Warning: Kustomization validation failed, but continuing..." + fi + cd "$SOURCE_PATH" + + # Write results + echo "$UPDATED_SERVICES" > $(results.updated-services.path) + + if [ "$DRY_RUN" = "true" ]; then + echo "" + echo "============================================" + echo "DRY RUN - Changes not committed" + echo "============================================" + echo "Would update services: $UPDATED_SERVICES" + git diff --stat || true + echo "" > $(results.commit-sha.path) + exit 0 + fi + + echo "" + echo "Committing changes..." 
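+        # The actual commit and push are performed by the commit-and-push step below,
+        # which operates on the same source workspace.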
+ resources: + limits: + cpu: 500m + memory: 512Mi + requests: + cpu: 100m + memory: 256Mi + + - name: commit-and-push + image: alpine/git:2.43.0 + script: | + #!/bin/sh + set -e + + SOURCE_PATH="$(workspaces.source.path)" + SERVICES="$(params.services)" + REVISION="$(params.git-revision)" + BRANCH="$(params.git-branch)" + DRY_RUN="$(params.dry-run)" + + cd "$SOURCE_PATH" + + if [ "$DRY_RUN" = "true" ]; then + echo "Dry run mode - skipping commit" + echo "" > $(results.commit-sha.path) + exit 0 + fi + + if [ "$SERVICES" = "none" ] || [ -z "$SERVICES" ]; then + echo "No services to commit" + echo "" > $(results.commit-sha.path) + exit 0 + fi + + # Check if there are changes to commit + if git diff --quiet && git diff --cached --quiet; then + echo "No changes to commit" + echo "" > $(results.commit-sha.path) + exit 0 + fi + + # Configure git + git config --global user.name "bakery-ia-ci" + git config --global user.email "ci@bakery-ia.local" + git config --global --add safe.directory "$SOURCE_PATH" + + # Setup git credentials if provided + if [ -d "$(workspaces.git-credentials.path)" ]; then + if [ -f "$(workspaces.git-credentials.path)/username" ] && [ -f "$(workspaces.git-credentials.path)/password" ]; then + GIT_USER=$(cat "$(workspaces.git-credentials.path)/username") + GIT_PASS=$(cat "$(workspaces.git-credentials.path)/password") + + # Get the remote URL and inject credentials + REMOTE_URL=$(git remote get-url origin) + # Handle both http and https + if echo "$REMOTE_URL" | grep -q "^http"; then + REMOTE_URL=$(echo "$REMOTE_URL" | sed "s|://|://$GIT_USER:$GIT_PASS@|") + git remote set-url origin "$REMOTE_URL" + fi + fi + fi + + # Stage changes + git add -A + + # Create commit with detailed message + COMMIT_MSG=$(printf 'ci: Update image tags to %s\n\nServices updated: %s\n\nThis commit was automatically generated by the CI/CD pipeline.\nPipeline run triggered by commit: %s' "$REVISION" "$SERVICES" "$REVISION") + + git commit -m "$COMMIT_MSG" + + # Get the commit SHA + COMMIT_SHA=$(git rev-parse HEAD) + echo "$COMMIT_SHA" > $(results.commit-sha.path) + + echo "Created commit: $COMMIT_SHA" + + # Push changes + echo "Pushing to origin/$BRANCH..." 
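+      # HEAD:"$BRANCH" pushes the commit created above (possibly on a detached HEAD)
+      # to the remote branch without requiring a local branch of that name.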
+ git push origin HEAD:"$BRANCH" + + echo "" + echo "============================================" + echo "GitOps Update Complete" + echo "============================================" + echo "Commit: $COMMIT_SHA" + echo "Branch: $BRANCH" + echo "Services: $SERVICES" + resources: + limits: + cpu: 200m + memory: 128Mi + requests: + cpu: 50m + memory: 64Mi \ No newline at end of file diff --git a/infrastructure/cicd/tekton/tasks/verify-images.yaml b/infrastructure/cicd/tekton/tasks/verify-images.yaml new file mode 100644 index 00000000..95bbe722 --- /dev/null +++ b/infrastructure/cicd/tekton/tasks/verify-images.yaml @@ -0,0 +1,91 @@ +# Task to verify images exist in the registry before deploying +apiVersion: tekton.dev/v1beta1 +kind: Task +metadata: + name: verify-images + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: validation +spec: + params: + - name: services + type: string + description: Comma-separated list of services to verify + - name: registry + type: string + description: Container registry URL + - name: git-revision + type: string + description: Git revision/tag to verify + results: + - name: verification-status + description: Status of image verification (success/failed) + - name: missing-images + description: List of images that were not found + steps: + - name: verify + image: gcr.io/go-containerregistry/crane:latest + script: | + #!/bin/sh + set -e + + SERVICES="$(params.services)" + REGISTRY="$(params.registry)" + REVISION="$(params.git-revision)" + MISSING="" + + echo "============================================" + echo "Verifying Images in Registry" + echo "============================================" + echo "Registry: $REGISTRY" + echo "Revision: $REVISION" + echo "Services: $SERVICES" + echo "============================================" + + # Convert comma-separated list to space-separated + SERVICES_LIST=$(echo "$SERVICES" | tr ',' ' ') + + for SERVICE in $SERVICES_LIST; do + SERVICE=$(echo "$SERVICE" | tr -d ' ') + + if [ "$SERVICE" = "infrastructure" ]; then + continue + fi + + IMAGE="$REGISTRY/bakery/$SERVICE:$REVISION" + echo "" + echo "Checking: $IMAGE" + + if crane manifest "$IMAGE" > /dev/null 2>&1; then + echo " ✓ Found" + else + echo " ✗ NOT FOUND" + if [ -z "$MISSING" ]; then + MISSING="$SERVICE" + else + MISSING="$MISSING,$SERVICE" + fi + fi + done + + echo "" + echo "============================================" + + if [ -n "$MISSING" ]; then + echo "ERROR: Missing images: $MISSING" + echo "failed" > $(results.verification-status.path) + echo "$MISSING" > $(results.missing-images.path) + exit 1 + fi + + echo "All images verified successfully" + echo "success" > $(results.verification-status.path) + echo "none" > $(results.missing-images.path) + resources: + limits: + cpu: 200m + memory: 128Mi + requests: + cpu: 100m + memory: 64Mi diff --git a/infrastructure/cicd/tekton/triggers/event-listener.yaml b/infrastructure/cicd/tekton/triggers/event-listener.yaml new file mode 100644 index 00000000..fb1789f4 --- /dev/null +++ b/infrastructure/cicd/tekton/triggers/event-listener.yaml @@ -0,0 +1,35 @@ +# Tekton EventListener for Bakery-IA CI/CD +# This listener receives webhook events and triggers pipelines + +apiVersion: triggers.tekton.dev/v1beta1 +kind: EventListener +metadata: + name: bakery-ia-listener + namespace: tekton-pipelines +spec: + serviceAccountName: tekton-triggers-sa + triggers: + - name: bakery-ia-gitea-trigger + bindings: + - ref: bakery-ia-trigger-binding + template: + ref: 
bakery-ia-trigger-template + # Using CEL interceptor for local development (no TLS/CA bundle required) + # The CEL interceptor is built-in and doesn't need external services + interceptors: + - name: "filter-push-events" + ref: + name: "cel" + params: + # Filter for push events from Gitea or GitHub + - name: "filter" + value: "header.match('X-Gitea-Event', 'push') || header.match('X-GitHub-Event', 'push')" + # Add overlays to standardize the payload + - name: "overlays" + value: + - key: "git_url" + expression: "body.repository.clone_url" + - key: "git_revision" + expression: "body.after" + - key: "git_branch" + expression: "body.ref.split('/')[2]" \ No newline at end of file diff --git a/infrastructure/cicd/tekton/triggers/kustomization.yaml b/infrastructure/cicd/tekton/triggers/kustomization.yaml new file mode 100644 index 00000000..22537c6a --- /dev/null +++ b/infrastructure/cicd/tekton/triggers/kustomization.yaml @@ -0,0 +1,9 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + # NOTE: gitlab-interceptor.yaml removed - uses built-in Tekton Triggers interceptor + # The gitlab ClusterInterceptor is provided by Tekton Triggers installation + - event-listener.yaml + - trigger-template.yaml + - trigger-binding.yaml diff --git a/infrastructure/cicd/tekton/triggers/trigger-binding.yaml b/infrastructure/cicd/tekton/triggers/trigger-binding.yaml new file mode 100644 index 00000000..d262cea5 --- /dev/null +++ b/infrastructure/cicd/tekton/triggers/trigger-binding.yaml @@ -0,0 +1,31 @@ +# Tekton TriggerBinding for Bakery-IA CI/CD +# This binding extracts parameters from Gitea webhook events +# +# Note: We use CEL overlay extensions for consistent field access +# The EventListener's CEL interceptor creates these extensions: +# - extensions.git_url: Repository clone URL +# - extensions.git_revision: Commit SHA (from body.after) +# - extensions.git_branch: Branch name (extracted from ref) + +apiVersion: triggers.tekton.dev/v1beta1 +kind: TriggerBinding +metadata: + name: bakery-ia-trigger-binding + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: triggers +spec: + params: + # Use CEL overlay extensions for consistent access across Git providers + - name: git-repo-url + value: $(extensions.git_url) + - name: git-revision + value: $(extensions.git_revision) + - name: git-branch + value: $(extensions.git_branch) + # Direct body access for fields not in overlays + - name: git-repo-name + value: $(body.repository.name) + - name: git-repo-full-name + value: $(body.repository.full_name) diff --git a/infrastructure/cicd/tekton/triggers/trigger-template.yaml b/infrastructure/cicd/tekton/triggers/trigger-template.yaml new file mode 100644 index 00000000..97129dae --- /dev/null +++ b/infrastructure/cicd/tekton/triggers/trigger-template.yaml @@ -0,0 +1,86 @@ +# Tekton TriggerTemplate for Bakery-IA CI/CD +# This template defines how PipelineRuns are created when triggers fire +# +# Registry URL Configuration: +# The registry URL is configured via the 'registry' parameter. +# Default value should match pipeline-config ConfigMap's REGISTRY_URL. +# To change the registry, update BOTH: +# 1. This template's default value +# 2. 
The pipeline-config ConfigMap + +apiVersion: triggers.tekton.dev/v1beta1 +kind: TriggerTemplate +metadata: + name: bakery-ia-trigger-template + namespace: tekton-pipelines + labels: + app.kubernetes.io/name: bakery-ia-cicd + app.kubernetes.io/component: triggers +spec: + params: + - name: git-repo-url + description: The git repository URL + - name: git-revision + description: The git revision/commit hash + - name: git-branch + description: The git branch name + default: "main" + - name: git-repo-name + description: The git repository name + default: "bakery-ia" + - name: git-repo-full-name + description: The full repository name (org/repo) + default: "bakery/bakery-ia" + # Registry URL - keep in sync with pipeline-config ConfigMap + - name: registry-url + description: Container registry URL + default: "gitea.bakery-ia.local:5000" + resourcetemplates: + - apiVersion: tekton.dev/v1beta1 + kind: PipelineRun + metadata: + generateName: bakery-ia-ci-run- + labels: + app.kubernetes.io/name: bakery-ia-cicd + tekton.dev/pipeline: bakery-ia-ci + triggers.tekton.dev/trigger: bakery-ia-gitea-trigger + annotations: + # Track the source commit + bakery-ia.io/git-revision: $(tt.params.git-revision) + bakery-ia.io/git-branch: $(tt.params.git-branch) + spec: + pipelineRef: + name: bakery-ia-ci + serviceAccountName: tekton-pipeline-sa + workspaces: + - name: shared-workspace + volumeClaimTemplate: + spec: + accessModes: ["ReadWriteOnce"] + resources: + requests: + storage: 5Gi + - name: docker-credentials + secret: + secretName: gitea-registry-credentials + - name: git-credentials + secret: + secretName: gitea-git-credentials + params: + - name: git-url + value: $(tt.params.git-repo-url) + - name: git-revision + value: $(tt.params.git-revision) + - name: git-branch + value: $(tt.params.git-branch) + # Use template parameter for registry URL + - name: registry + value: $(tt.params.registry-url) + - name: skip-tests + value: "false" + - name: dry-run + value: "false" + # Timeout for the entire pipeline run + timeouts: + pipeline: "1h0m0s" + tasks: "45m0s" diff --git a/infrastructure/kubernetes/base/configmap.yaml b/infrastructure/environments/common/configs/configmap.yaml similarity index 97% rename from infrastructure/kubernetes/base/configmap.yaml rename to infrastructure/environments/common/configs/configmap.yaml index 43ea8100..25213d82 100644 --- a/infrastructure/kubernetes/base/configmap.yaml +++ b/infrastructure/environments/common/configs/configmap.yaml @@ -7,7 +7,6 @@ metadata: app.kubernetes.io/name: bakery-ia app.kubernetes.io/component: config data: - # ================================================================ # ENVIRONMENT & BUILD SETTINGS # ================================================================ ENVIRONMENT: "development" @@ -31,7 +30,7 @@ data: BUILD_DATE: "2024-01-20T10:00:00Z" VCS_REF: "latest" IMAGE_TAG: "latest" - DOMAIN: "bakery.yourdomain.com" + DOMAIN: "bakewise.ai" AUTO_RELOAD: "false" PROFILING_ENABLED: "false" MOCK_EXTERNAL_APIS: "false" @@ -177,13 +176,13 @@ data: # ================================================================ # EMAIL CONFIGURATION # ================================================================ - SMTP_HOST: "smtp.gmail.com" + SMTP_HOST: "email-smtp.bakery-ia.svc.cluster.local" SMTP_PORT: "587" SMTP_TLS: "true" SMTP_SSL: "false" - DEFAULT_FROM_EMAIL: "noreply@bakeryforecast.es" + DEFAULT_FROM_EMAIL: "noreply@bakewise.ai" DEFAULT_FROM_NAME: "Bakery-Forecast" - EMAIL_FROM_ADDRESS: "alerts@bakery.local" + EMAIL_FROM_ADDRESS: "alerts@bakewise.ai" 
EMAIL_FROM_NAME: "Bakery Alert System" # ================================================================ @@ -444,6 +443,13 @@ data: SIGNOZ_ENDPOINT: "http://signoz.bakery-ia.svc.cluster.local:8080" SIGNOZ_FRONTEND_URL: "https://monitoring.bakery-ia.local" + # ================================================================ + # DISTRIBUTION & ROUTING OPTIMIZATION SETTINGS + # ================================================================ + VRP_TIME_LIMIT_SECONDS: "30" + VRP_DEFAULT_VEHICLE_CAPACITY_KG: "1000" + VRP_AVERAGE_SPEED_KMH: "30" + # ================================================================ # REPLENISHMENT PLANNING SETTINGS # ================================================================ diff --git a/infrastructure/environments/common/configs/kustomization.yaml b/infrastructure/environments/common/configs/kustomization.yaml new file mode 100644 index 00000000..2b6423a9 --- /dev/null +++ b/infrastructure/environments/common/configs/kustomization.yaml @@ -0,0 +1,6 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - configmap.yaml + - secrets.yaml diff --git a/infrastructure/kubernetes/base/secrets.yaml b/infrastructure/environments/common/configs/secrets.yaml similarity index 96% rename from infrastructure/kubernetes/base/secrets.yaml rename to infrastructure/environments/common/configs/secrets.yaml index deef9cc6..d5d74835 100644 --- a/infrastructure/kubernetes/base/secrets.yaml +++ b/infrastructure/environments/common/configs/secrets.yaml @@ -160,8 +160,12 @@ metadata: app.kubernetes.io/component: notifications type: Opaque data: - SMTP_USER: eW91ci1lbWFpbEBnbWFpbC5jb20= # your-email@gmail.com - SMTP_PASSWORD: eW91ci1hcHAtc3BlY2lmaWMtcGFzc3dvcmQ= # your-app-specific-password + # SMTP credentials for internal Mailu server + # These are used by notification-service to send emails via mailu-smtp + SMTP_USER: cG9zdG1hc3RlckBiYWtld2lzZS5haQ== # postmaster@bakewise.ai + SMTP_PASSWORD: VzJYS2tSdUxpT25ZS2RCWVFTQXJvbjFpeWtFU1M1b2I= # W2XKkRuLiOnYKdBYQSAron1iykESS5ob + # Dovecot admin password for IMAP management + DOVEADM_PASSWORD: WnZhMzNoaVBJc2ZtV3RxUlBWV29taTRYZ2xLTlZPcHY= # Zva33hiPIsfmWtqRPVWomi4XglKNVOpv --- apiVersion: v1 diff --git a/infrastructure/kubernetes/overlays/dev/dev-certificate.yaml b/infrastructure/environments/dev/k8s-manifests/dev-certificate.yaml similarity index 100% rename from infrastructure/kubernetes/overlays/dev/dev-certificate.yaml rename to infrastructure/environments/dev/k8s-manifests/dev-certificate.yaml diff --git a/infrastructure/environments/dev/k8s-manifests/kustomization.yaml b/infrastructure/environments/dev/k8s-manifests/kustomization.yaml new file mode 100644 index 00000000..22980390 --- /dev/null +++ b/infrastructure/environments/dev/k8s-manifests/kustomization.yaml @@ -0,0 +1,159 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +metadata: + name: bakery-ia-dev + +# NOTE: Do NOT set a global namespace here. +# Each resource already has its namespace explicitly defined. +# A global namespace would incorrectly transform cluster-scoped resources +# like cert-manager namespaces. 
+ +resources: + - ../../../environments/common/configs + - ../../../platform/infrastructure + - ../../../platform/cert-manager + - ../../../platform/networking/ingress/overlays/dev + - ../../../platform/storage + - ../../../platform/mail/mailu + - ../../../services/databases + - ../../../services/microservices + # NOTE: cicd is NOT included here - it's deployed manually via Tilt triggers + # Run 'tilt trigger tekton-install' followed by 'tilt trigger tekton-pipelines-deploy' + # - ../../../cicd + - dev-certificate.yaml + + + +# Dev-specific patches +patches: + - target: + kind: ConfigMap + name: bakery-config + patch: |- + - op: replace + path: /data/ENVIRONMENT + value: "development" + - op: replace + path: /data/DEBUG + value: "true" + # Suspend nominatim in dev to save resources + - target: + kind: StatefulSet + name: nominatim + patch: |- + - op: replace + path: /spec/replicas + value: 0 + # Suspend nominatim-init job in dev (not needed when nominatim is scaled to 0) + - target: + kind: Job + name: nominatim-init + patch: |- + - op: replace + path: /spec/suspend + value: true + # Mailu TLS: Use self-signed dev certificate + - target: + kind: Deployment + name: mailu-front + patch: |- + - op: replace + path: /spec/template/spec/volumes/1/secret/secretName + value: "bakery-dev-tls-cert" + # Mailu Config: Update for dev environment + - target: + kind: ConfigMap + name: mailu-config + patch: |- + - op: replace + path: /data/DOMAIN + value: "bakery-ia.local" + - op: replace + path: /data/HOSTNAMES + value: "mail.bakery-ia.local" + - op: replace + path: /data/RELAY_LOGIN + value: "postmaster@bakery-ia.local" + - op: replace + path: /data/WEBMAIL_ADMIN + value: "admin@bakery-ia.local" + +labels: + - includeSelectors: true + pairs: + environment: development + tier: local + +# Dev image overrides - use local registry to avoid Docker Hub rate limits +# IMPORTANT: All image names must be lowercase (Docker requirement) +# The prepull-base-images.sh script converts names to lowercase when pushing to local registry +images: + # Database images + - name: postgres + newName: localhost:5000/postgres_17-alpine + newTag: latest + - name: redis + newName: localhost:5000/redis_7.4-alpine + newTag: latest + - name: rabbitmq + newName: localhost:5000/rabbitmq_4.1-management-alpine + newTag: latest + # Utility images + - name: busybox + newName: localhost:5000/busybox_1.36 + newTag: latest + - name: curlimages/curl + newName: localhost:5000/curlimages_curl_latest + newTag: latest + - name: bitnami/kubectl + newName: localhost:5000/bitnami_kubectl_latest + newTag: latest + # Alpine variants + - name: alpine + newName: localhost:5000/alpine_3.19 + newTag: latest + - name: alpine/git + newName: localhost:5000/alpine_git_2.43.0 + newTag: latest + # CI/CD images (cached locally for consistency) + - name: gcr.io/kaniko-project/executor + newName: localhost:5000/gcr.io_kaniko-project_executor_v1.23.0 + newTag: latest + - name: gcr.io/go-containerregistry/crane + newName: localhost:5000/gcr.io_go-containerregistry_crane_latest + newTag: latest + - name: registry.k8s.io/kustomize/kustomize + newName: localhost:5000/registry.k8s.io_kustomize_kustomize_v5.3.0 + newTag: latest + # Storage images (lowercase - RELEASE becomes release) + - name: minio/minio + newName: localhost:5000/minio_minio_release.2024-11-07t00-52-20z + newTag: latest + - name: minio/mc + newName: localhost:5000/minio_mc_release.2024-11-17t19-35-25z + newTag: latest + # Geocoding + - name: mediagis/nominatim + newName: 
localhost:5000/mediagis_nominatim_4.4 + newTag: latest + # Python base image + - name: python + newName: localhost:5000/python_3.11-slim + newTag: latest + # Mail server (Mailu) + - name: ghcr.io/mailu/nginx + newName: localhost:5000/ghcr.io_mailu_nginx_2024.06 + newTag: latest + - name: ghcr.io/mailu/admin + newName: localhost:5000/ghcr.io_mailu_admin_2024.06 + newTag: latest + - name: ghcr.io/mailu/postfix + newName: localhost:5000/ghcr.io_mailu_postfix_2024.06 + newTag: latest + - name: ghcr.io/mailu/dovecot + newName: localhost:5000/ghcr.io_mailu_dovecot_2024.06 + newTag: latest + - name: ghcr.io/mailu/rspamd + newName: localhost:5000/ghcr.io_mailu_rspamd_2024.06 + newTag: latest diff --git a/infrastructure/kubernetes/overlays/prod/kustomization.yaml b/infrastructure/environments/prod/k8s-manifests/kustomization.yaml similarity index 70% rename from infrastructure/kubernetes/overlays/prod/kustomization.yaml rename to infrastructure/environments/prod/k8s-manifests/kustomization.yaml index 2b101877..5757d6ba 100644 --- a/infrastructure/kubernetes/overlays/prod/kustomization.yaml +++ b/infrastructure/environments/prod/k8s-manifests/kustomization.yaml @@ -4,18 +4,28 @@ kind: Kustomization metadata: name: bakery-ia-prod -namespace: bakery-ia +# NOTE: Do NOT set a global namespace here. +# Each resource already has its namespace explicitly defined. +# A global namespace would incorrectly transform cluster-scoped resources +# like flux-system and cert-manager namespaces. resources: - - ../../base - - prod-ingress.yaml + - ../../../environments/common/configs + - ../../../platform/infrastructure + - ../../../platform/cert-manager + - ../../../platform/networking/ingress/overlays/prod + - ../../../platform/storage + - ../../../platform/mail/mailu + - ../../../services/databases + - ../../../services/microservices + - ../../../cicd + - prod-certificate.yaml + + # SigNoz is managed via Helm deployment (see infrastructure/helm/deploy-signoz.sh) # Monitoring is handled by SigNoz (no separate monitoring components needed) # SigNoz paths are now included in the main ingress (ingress-https.yaml) -patchesStrategicMerge: - - storage-patch.yaml - labels: - includeSelectors: true pairs: @@ -159,8 +169,17 @@ patches: limits: memory: "1Gi" cpu: "500m" + # Mailu TLS: Use Let's Encrypt production certificate + - target: + kind: Deployment + name: mailu-front + patch: |- + - op: replace + path: /spec/template/spec/volumes/1/secret/secretName + value: "bakery-ia-prod-tls-cert" images: + # Application services - name: bakery/auth-service newTag: latest - name: bakery/tenant-service @@ -193,6 +212,58 @@ images: newTag: latest - name: bakery/dashboard newTag: latest + # ============================================================================= + # Production Base Images - mapped to production registry + # TODO: Update PROD_REGISTRY_URL to your production registry (e.g., ghcr.io/your-org) + # ============================================================================= + # Database images (using canonical Docker Hub - no rate limits in prod with auth) + - name: postgres + newTag: 17-alpine + - name: redis + newTag: 7.4-alpine + - name: rabbitmq + newTag: 4.1-management-alpine + # Utility images + - name: busybox + newTag: "1.36" + - name: curlimages/curl + newTag: latest + - name: bitnami/kubectl + newTag: latest + # Alpine variants + - name: alpine + newTag: "3.19" + - name: alpine/git + newTag: 2.43.0 + # CI/CD images (GCR/registry.k8s.io - no rate limits) + - name: gcr.io/kaniko-project/executor + newTag: 
v1.23.0 + - name: gcr.io/go-containerregistry/crane + newTag: latest + - name: registry.k8s.io/kustomize/kustomize + newTag: v5.3.0 + # Storage images + - name: minio/minio + newTag: RELEASE.2024-11-07T00-52-20Z + - name: minio/mc + newTag: RELEASE.2024-11-17T19-35-25Z + # Geocoding + - name: mediagis/nominatim + newTag: "4.4" + # Python base image + - name: python + newTag: 3.11-slim + # Mail server (Mailu) - using canonical GHCR names + - name: ghcr.io/mailu/nginx + newTag: "2024.06" + - name: ghcr.io/mailu/admin + newTag: "2024.06" + - name: ghcr.io/mailu/postfix + newTag: "2024.06" + - name: ghcr.io/mailu/dovecot + newTag: "2024.06" + - name: ghcr.io/mailu/rspamd + newTag: "2024.06" replicas: - name: auth-service diff --git a/infrastructure/environments/prod/k8s-manifests/prod-certificate.yaml b/infrastructure/environments/prod/k8s-manifests/prod-certificate.yaml new file mode 100644 index 00000000..194245ee --- /dev/null +++ b/infrastructure/environments/prod/k8s-manifests/prod-certificate.yaml @@ -0,0 +1,48 @@ +apiVersion: cert-manager.io/v1 +kind: Certificate +metadata: + name: bakery-ia-prod-tls-cert + namespace: bakery-ia +spec: + # Let's Encrypt certificate for production + secretName: bakery-ia-prod-tls-cert + + # Certificate duration and renewal + duration: 2160h # 90 days (Let's Encrypt default) + renewBefore: 360h # 15 days before expiry + + # Subject configuration + subject: + organizations: + - Bakery IA + + # Common name + commonName: bakewise.ai + + # DNS names this certificate is valid for + dnsNames: + - bakewise.ai + - www.bakewise.ai + - mail.bakewise.ai + - monitoring.bakewise.ai + - gitea.bakewise.ai + - api.bakewise.ai + + # Use Let's Encrypt production issuer + issuerRef: + name: letsencrypt-production + kind: ClusterIssuer + group: cert-manager.io + + # Private key configuration + privateKey: + algorithm: RSA + encoding: PKCS1 + size: 2048 + + # Usages + usages: + - server auth + - client auth + - digital signature + - key encipherment diff --git a/infrastructure/kubernetes/overlays/prod/prod-configmap.yaml b/infrastructure/environments/prod/k8s-manifests/prod-configmap.yaml similarity index 100% rename from infrastructure/kubernetes/overlays/prod/prod-configmap.yaml rename to infrastructure/environments/prod/k8s-manifests/prod-configmap.yaml diff --git a/infrastructure/kubernetes/README.md b/infrastructure/kubernetes/README.md deleted file mode 100644 index 8c42f4e7..00000000 --- a/infrastructure/kubernetes/README.md +++ /dev/null @@ -1,299 +0,0 @@ -# Bakery IA Kubernetes Configuration - -This directory contains Kubernetes manifests for deploying the Bakery IA platform in local development and production environments with HTTPS support using cert-manager and NGINX ingress. - -## Quick Start - -Deploy the entire platform with these 4 commands: - -```bash -# 1. Start Colima with adequate resources -colima start --cpu 6 --memory 12 --disk 120 --runtime docker --profile k8s-local - -# 2. Create Kind cluster with permanent localhost access -kind create cluster --config kind-config.yaml - -# 3. Install NGINX Ingress Controller -kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml -kubectl wait --namespace ingress-nginx --for=condition=ready pod --selector=app.kubernetes.io/component=controller --timeout=300s - -# 4. 
Deploy with Tilt -tilt up - -# 🎉 Access at: http://localhost (or see Tilt for individual service ports) -``` - -> **Note**: The kind-config.yaml already configures port mappings (30080→80, 30443→443) for localhost access, so no additional service patching is needed. The NGINX Ingress for Kind uses NodePort by default on those exact ports. - -## Prerequisites - -Install the following tools on macOS: - -```bash -# Install via Homebrew -brew install colima kind kubectl skaffold - -# Verify installations -colima version && kind version && kubectl version --client && skaffold version -``` - -## Directory Structure - -``` -infrastructure/kubernetes/ -├── base/ # Base Kubernetes resources -│ ├── namespace.yaml # Namespace definition -│ ├── configmap.yaml # Shared configuration -│ ├── secrets.yaml # Base64 encoded secrets -│ ├── ingress-https.yaml # HTTPS ingress rules -│ ├── kustomization.yaml # Base kustomization -│ └── components/ # Individual component manifests -│ ├── cert-manager/ # Certificate management -│ ├── auth/ # Authentication service -│ ├── tenant/ # Tenant management -│ ├── training/ # ML training service -│ ├── forecasting/ # Demand forecasting -│ ├── sales/ # Sales management -│ ├── external/ # External API service -│ ├── notification/ # Notification service -│ ├── inventory/ # Inventory management -│ ├── recipes/ # Recipe management -│ ├── suppliers/ # Supplier management -│ ├── pos/ # Point of sale -│ ├── orders/ # Order management -│ ├── production/ # Production planning -│ ├── alert-processor/ # Alert processing -│ ├── frontend/ # React frontend -│ ├── databases/ # Database deployments -│ └── infrastructure/ # Gateway & monitoring -└── overlays/ - └── dev/ # Development environment - ├── kustomization.yaml # Dev-specific configuration - └── dev-patches.yaml # Development patches -``` - -## Access URLs - -### Primary Access (Standard Web Ports) -- **Frontend**: https://localhost -- **API Gateway**: https://localhost/api - -### Named Host Access (Optional) -Add to `/etc/hosts` for named access: -```bash -echo "127.0.0.1 bakery-ia.local" | sudo tee -a /etc/hosts -echo "127.0.0.1 api.bakery-ia.local" | sudo tee -a /etc/hosts -echo "127.0.0.1 monitoring.bakery-ia.local" | sudo tee -a /etc/hosts -``` - -Then access via: -- **Frontend**: https://bakery-ia.local -- **API**: https://api.bakery-ia.local -- **Monitoring**: https://monitoring.bakery-ia.local - -### Direct Service Access (Development) -- **Frontend**: http://localhost:3000 -- **Gateway**: http://localhost:8000 - -## Development Workflow - -### Start Development Environment -```bash -# Start development mode with hot-reload using Tilt -tilt up - -# Or start in background -tilt up --stream -``` - -### Key Features -- ✅ **Hot-reload development** - Automatic rebuilds on code changes -- ✅ **Permanent localhost access** - No port forwarding needed -- ✅ **HTTPS by default** - Local CA certificates for secure development -- ✅ **Microservices architecture** - All services deployed together -- ✅ **Database management** - PostgreSQL, Redis, and RabbitMQ included - -### Monitor and Debug -```bash -# Check all resources -kubectl get all -n bakery-ia - -# View logs -kubectl logs -n bakery-ia deployment/auth-service -f - -# Check ingress status -kubectl get ingress -n bakery-ia - -# Debug certificate issues -kubectl describe certificate bakery-ia-tls-cert -n bakery-ia -``` - -## Certificate Management - -The platform uses cert-manager for automatic HTTPS certificate generation: - -- **Local CA**: For development (default) -- **Let's 
Encrypt Staging**: For testing -- **Let's Encrypt Production**: For production deployments - -### Trust Local Certificates -```bash -# Export CA certificate -kubectl get secret local-ca-key-pair -n cert-manager -o jsonpath='{.data.tls\.crt}' | base64 -d > bakery-ia-ca.crt - -# Trust in macOS -open bakery-ia-ca.crt -# In Keychain Access, set "bakery-ia-local-ca" to "Always Trust" -``` - -## Configuration Management - -### Secrets -Base64-encoded secrets are stored in `base/secrets.yaml`. For production: -- Use external secret management (HashiCorp Vault, AWS Secrets Manager) -- Never commit real secrets to version control - -```bash -# Encode secrets -echo -n "your-secret-value" | base64 - -# Decode secrets -echo "eW91ci1zZWNyZXQtdmFsdWU=" | base64 -d -``` - -### Environment Configuration -Development-specific settings are in `overlays/dev/`: -- **Resource limits**: Reduced for local development -- **Image pull policy**: Never (for local images) -- **Debug settings**: Enabled -- **CORS**: Configured for localhost - -## Scaling and Resource Management - -### Scale Services -```bash -# Scale individual service -kubectl scale -n bakery-ia deployment/auth-service --replicas=3 - -# Or update kustomization.yaml replicas section -``` - -### Resource Configuration -Development environment uses minimal resources: -- **Databases**: 64Mi-256Mi memory, 25m-200m CPU -- **Services**: 64Mi-256Mi memory, 25m-200m CPU -- **Training Service**: 256Mi-1Gi memory (ML workloads) - -## Troubleshooting - -### Common Issues - -1. **Images not found** - ```bash - # Build images with Skaffold - skaffold build --profile=dev - ``` - -2. **Database corruption after restart** - ```bash - # Delete corrupted PVC and restart - kubectl delete pod -n bakery-ia -l app.kubernetes.io/name=inventory-db - kubectl delete pvc -n bakery-ia inventory-db-pvc - ``` - -3. **HTTPS certificate not issued** - ```bash - # Check cert-manager logs - kubectl logs -n cert-manager deployment/cert-manager - kubectl describe certificate bakery-ia-tls-cert -n bakery-ia - ``` - -4. **Port conflicts** - ```bash - # Check what's using ports 80/443 - sudo lsof -i :80 -i :443 - ``` - -### Debug Commands -```bash -# Get cluster events -kubectl get events -n bakery-ia --sort-by='.firstTimestamp' - -# Resource usage -kubectl top pods -n bakery-ia -kubectl top nodes - -# Execute in pod -kubectl exec -n bakery-ia -it -- bash -``` - -## Cleanup - -### Quick Cleanup -```bash -# Stop Skaffold (Ctrl+C or) -skaffold delete --profile=dev -``` - -### Complete Cleanup -```bash -# Delete everything -kubectl delete namespace bakery-ia -kind delete cluster --name bakery-ia-local -colima stop --profile k8s-local -``` - -### Restart Sequence -```bash -# Post-restart startup (or use kubernetes_restart.sh script) -colima start --cpu 6 --memory 12 --disk 120 --runtime docker --profile k8s-local -kind create cluster --config kind-config.yaml -kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml -kubectl wait --namespace ingress-nginx --for=condition=ready pod --selector=app.kubernetes.io/component=controller --timeout=300s -tilt up -``` - -## Production Deployment - -### Production URLs - -The production environment uses the following domains: - -- **Main Application**: https://bakewise.ai - - Frontend application and all public pages - - API endpoints: https://bakewise.ai/api/v1/... 
- -- **Monitoring Stack**: https://monitoring.bakewise.ai - - Grafana: https://monitoring.bakewise.ai/grafana - - Prometheus: https://monitoring.bakewise.ai/prometheus - - Jaeger: https://monitoring.bakewise.ai/jaeger - - AlertManager: https://monitoring.bakewise.ai/alertmanager - -### Production Configuration - -The production overlay (`overlays/prod/`) includes: -- **Domain Configuration**: bakewise.ai with Let's Encrypt certificates -- **High Availability**: Multi-replica deployments (2-3 replicas per service) -- **Enhanced Security**: Rate limiting, CORS restrictions, security headers -- **Monitoring**: Full observability stack with Prometheus, Grafana, Jaeger - -### Production Considerations - -For production deployment: - -- **Security**: Implement RBAC, network policies, pod security standards -- **Monitoring**: Deploy Prometheus, Grafana, and alerting -- **Backup**: Database backup strategies -- **High Availability**: Multi-replica deployments with anti-affinity -- **External Secrets**: Use managed secret services -- **TLS**: Production Let's Encrypt certificates -- **CI/CD**: Automated deployment pipelines -- **DNS**: Configure DNS A/CNAME records pointing to your cluster's load balancer - -## Next Steps - -1. Add comprehensive monitoring and logging -2. Implement automated testing -3. Set up CI/CD pipelines -4. Add health checks and metrics endpoints -5. Implement proper backup strategies \ No newline at end of file diff --git a/infrastructure/kubernetes/base/components/cert-manager/cert-manager.yaml b/infrastructure/kubernetes/base/components/cert-manager/cert-manager.yaml deleted file mode 100644 index af046130..00000000 --- a/infrastructure/kubernetes/base/components/cert-manager/cert-manager.yaml +++ /dev/null @@ -1,14 +0,0 @@ -apiVersion: v1 -kind: Namespace -metadata: - name: cert-manager ---- -apiVersion: v1 -kind: ServiceAccount -metadata: - name: cert-manager-webhook - namespace: cert-manager ---- -# Cert-manager installation using Helm repository -# This will be installed via kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.2/cert-manager.yaml -# The actual installation will be done via command line, this file documents the resources \ No newline at end of file diff --git a/infrastructure/kubernetes/base/components/distribution/distribution-configmap.yaml b/infrastructure/kubernetes/base/components/distribution/distribution-configmap.yaml deleted file mode 100644 index 27c17a73..00000000 --- a/infrastructure/kubernetes/base/components/distribution/distribution-configmap.yaml +++ /dev/null @@ -1,78 +0,0 @@ -apiVersion: v1 -kind: ConfigMap -metadata: - name: distribution-service-config -data: - # Service settings - SERVICE_NAME: "distribution-service" - APP_NAME: "Bakery Distribution Service" - DESCRIPTION: "Distribution service for enterprise tier bakery management" - VERSION: "1.0.0" - - # Database settings - DB_POOL_SIZE: "10" - DB_MAX_OVERFLOW: "20" - DB_POOL_TIMEOUT: "30" - DB_POOL_RECYCLE: "3600" - DB_POOL_PRE_PING: "true" - DB_ECHO: "false" - - # Redis settings - REDIS_DB: "7" # Use separate database for distribution service - REDIS_MAX_CONNECTIONS: "50" - REDIS_RETRY_ON_TIMEOUT: "true" - REDIS_SOCKET_KEEPALIVE: "true" - - # RabbitMQ settings - RABBITMQ_EXCHANGE: "bakery_events" - RABBITMQ_QUEUE_PREFIX: "distribution" - RABBITMQ_RETRY_ATTEMPTS: "3" - RABBITMQ_RETRY_DELAY: "5" - - # Authentication settings - JWT_ALGORITHM: "HS256" - JWT_ACCESS_TOKEN_EXPIRE_MINUTES: "30" - JWT_REFRESH_TOKEN_EXPIRE_DAYS: "7" - ENABLE_SERVICE_AUTH: 
"true" - - # HTTP client settings - HTTP_TIMEOUT: "30" - HTTP_RETRIES: "3" - HTTP_RETRY_DELAY: "1.0" - - # CORS settings - CORS_ORIGINS: "http://localhost:3000,http://localhost:3001" - CORS_ALLOW_CREDENTIALS: "true" - CORS_ALLOW_METHODS: "GET,POST,PUT,DELETE,PATCH,OPTIONS" - CORS_ALLOW_HEADERS: "*" - - # Rate limiting - RATE_LIMIT_ENABLED: "true" - RATE_LIMIT_REQUESTS: "100" - RATE_LIMIT_WINDOW: "60" - RATE_LIMIT_BURST: "10" - - # Monitoring and observability - LOG_LEVEL: "INFO" - PROMETHEUS_ENABLED: "true" - PROMETHEUS_PORT: "9090" - JAEGER_ENABLED: "false" - JAEGER_AGENT_HOST: "jaeger-agent" - JAEGER_AGENT_PORT: "6831" - - # Health check settings - HEALTH_CHECK_TIMEOUT: "30" - HEALTH_CHECK_INTERVAL: "30" - - # Business rules - MAX_FORECAST_DAYS: "30" - MIN_HISTORICAL_DAYS: "60" - CONFIDENCE_THRESHOLD: "0.8" - - # Routing optimization settings - VRP_TIME_LIMIT_SECONDS: "30" - VRP_DEFAULT_VEHICLE_CAPACITY_KG: "1000" - VRP_AVERAGE_SPEED_KMH: "30" - - # Service-specific settings - DISTRIBUTION_SERVICE_URL: "http://distribution-service:8000" \ No newline at end of file diff --git a/infrastructure/kubernetes/base/components/distribution/distribution-service.yaml b/infrastructure/kubernetes/base/components/distribution/distribution-service.yaml deleted file mode 100644 index 5f19725e..00000000 --- a/infrastructure/kubernetes/base/components/distribution/distribution-service.yaml +++ /dev/null @@ -1,134 +0,0 @@ -apiVersion: apps/v1 -kind: Deployment -metadata: - name: distribution-service - labels: - app: distribution-service - tier: backend -spec: - replicas: 2 - selector: - matchLabels: - app: distribution-service - template: - metadata: - labels: - app: distribution-service - tier: backend - spec: - imagePullSecrets: - - name: dockerhub-creds - containers: - - name: distribution-service - image: bakery/distribution-service:latest - imagePullPolicy: Always - ports: - - containerPort: 8000 - name: http - env: - - name: DATABASE_URL - valueFrom: - secretKeyRef: - name: database-secret - key: url - - name: REDIS_URL - valueFrom: - secretKeyRef: - name: redis-secret - key: url - - name: RABBITMQ_URL - valueFrom: - secretKeyRef: - name: rabbitmq-secret - key: url - - name: JWT_SECRET_KEY - valueFrom: - secretKeyRef: - name: auth-secret - key: jwt-secret - - name: ENVIRONMENT - value: "production" - - name: LOG_LEVEL - value: "INFO" - - name: DB_POOL_SIZE - value: "10" - - name: DB_MAX_OVERFLOW - value: "20" - - name: REDIS_MAX_CONNECTIONS - value: "50" - - name: HTTP_TIMEOUT - value: "30" - - name: HTTP_RETRIES - value: "3" - # OpenTelemetry Configuration - - name: OTEL_COLLECTOR_ENDPOINT - value: "http://signoz-otel-collector.bakery-ia.svc.cluster.local:4318" - - name: OTEL_EXPORTER_OTLP_ENDPOINT - valueFrom: - configMapKeyRef: - name: bakery-config - key: OTEL_EXPORTER_OTLP_ENDPOINT - - name: OTEL_SERVICE_NAME - value: "distribution-service" - - name: ENABLE_TRACING - value: "true" - # Logging Configuration - - name: OTEL_LOGS_EXPORTER - value: "otlp" - - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED - value: "true" - # Metrics Configuration - - name: ENABLE_OTEL_METRICS - value: "true" - - name: ENABLE_SYSTEM_METRICS - value: "true" - livenessProbe: - httpGet: - path: /health - port: 8000 - initialDelaySeconds: 30 - periodSeconds: 10 - timeoutSeconds: 5 - readinessProbe: - httpGet: - path: /health - port: 8000 - initialDelaySeconds: 5 - periodSeconds: 5 - timeoutSeconds: 3 - resources: - requests: - memory: "256Mi" - cpu: "250m" - limits: - memory: "512Mi" - cpu: "500m" - 
securityContext: - runAsNonRoot: true - runAsUser: 1000 - allowPrivilegeEscalation: false - readOnlyRootFilesystem: false - capabilities: - drop: - - ALL - securityContext: - runAsNonRoot: true - runAsUser: 1000 - fsGroup: 2000 ---- -apiVersion: v1 -kind: Service -metadata: - name: distribution-service - labels: - app: distribution-service - tier: backend -spec: - selector: - app.kubernetes.io/name: distribution-service - ports: - - protocol: TCP - port: 8000 - targetPort: 8000 - name: http - type: ClusterIP diff --git a/infrastructure/kubernetes/base/components/microservice-template.yaml b/infrastructure/kubernetes/base/components/microservice-template.yaml deleted file mode 100644 index 078e2a59..00000000 --- a/infrastructure/kubernetes/base/components/microservice-template.yaml +++ /dev/null @@ -1,151 +0,0 @@ -apiVersion: apps/v1 -kind: Deployment -metadata: - name: {{SERVICE_NAME}}-service - namespace: bakery-ia - labels: - app.kubernetes.io/name: {{SERVICE_NAME}}-service - app.kubernetes.io/component: microservice - app.kubernetes.io/part-of: bakery-ia -spec: - replicas: 1 - selector: - matchLabels: - app.kubernetes.io/name: {{SERVICE_NAME}}-service - app.kubernetes.io/component: microservice - template: - metadata: - labels: - app.kubernetes.io/name: {{SERVICE_NAME}}-service - app.kubernetes.io/component: microservice - spec: - containers: - - name: {{SERVICE_NAME}}-service - image: bakery/{{SERVICE_NAME}}-service:latest - ports: - - containerPort: 8000 - name: http - env: - - name: ENVIRONMENT - valueFrom: - configMapKeyRef: - name: bakery-config - key: ENVIRONMENT - - name: DEBUG - valueFrom: - configMapKeyRef: - name: bakery-config - key: DEBUG - - name: LOG_LEVEL - valueFrom: - configMapKeyRef: - name: bakery-config - key: LOG_LEVEL - - name: {{SERVICE_NAME_UPPER}}_DB_HOST - valueFrom: - configMapKeyRef: - name: bakery-config - key: {{SERVICE_NAME_UPPER}}_DB_HOST - - name: {{SERVICE_NAME_UPPER}}_DB_PORT - valueFrom: - configMapKeyRef: - name: bakery-config - key: DB_PORT - - name: {{SERVICE_NAME_UPPER}}_DB_NAME - valueFrom: - configMapKeyRef: - name: bakery-config - key: {{SERVICE_NAME_UPPER}}_DB_NAME - - name: {{SERVICE_NAME_UPPER}}_DB_USER - valueFrom: - secretKeyRef: - name: database-secrets - key: {{SERVICE_NAME_UPPER}}_DB_USER - - name: {{SERVICE_NAME_UPPER}}_DB_PASSWORD - valueFrom: - secretKeyRef: - name: database-secrets - key: {{SERVICE_NAME_UPPER}}_DB_PASSWORD - - name: REDIS_HOST - valueFrom: - configMapKeyRef: - name: bakery-config - key: REDIS_HOST - - name: REDIS_PORT - valueFrom: - configMapKeyRef: - name: bakery-config - key: REDIS_PORT - - name: REDIS_PASSWORD - valueFrom: - secretKeyRef: - name: redis-secrets - key: REDIS_PASSWORD - - name: RABBITMQ_HOST - valueFrom: - configMapKeyRef: - name: bakery-config - key: RABBITMQ_HOST - - name: RABBITMQ_PORT - valueFrom: - configMapKeyRef: - name: bakery-config - key: RABBITMQ_PORT - - name: RABBITMQ_USER - valueFrom: - secretKeyRef: - name: rabbitmq-secrets - key: RABBITMQ_USER - - name: RABBITMQ_PASSWORD - valueFrom: - secretKeyRef: - name: rabbitmq-secrets - key: RABBITMQ_PASSWORD - - name: AUTH_SERVICE_URL - valueFrom: - configMapKeyRef: - name: bakery-config - key: AUTH_SERVICE_URL - resources: - requests: - memory: "256Mi" - cpu: "100m" - limits: - memory: "512Mi" - cpu: "500m" - livenessProbe: - httpGet: - path: /health/live - port: 8000 - initialDelaySeconds: 30 - timeoutSeconds: 5 - periodSeconds: 10 - failureThreshold: 3 - readinessProbe: - httpGet: - path: /health/ready - port: 8000 - initialDelaySeconds: 15 
- timeoutSeconds: 3 - periodSeconds: 5 - failureThreshold: 5 - ---- -apiVersion: v1 -kind: Service -metadata: - name: {{SERVICE_NAME}}-service - namespace: bakery-ia - labels: - app.kubernetes.io/name: {{SERVICE_NAME}}-service - app.kubernetes.io/component: microservice -spec: - type: ClusterIP - ports: - - port: 8000 - targetPort: 8000 - protocol: TCP - name: http - selector: - app.kubernetes.io/name: {{SERVICE_NAME}}-service - app.kubernetes.io/component: microservice \ No newline at end of file diff --git a/infrastructure/kubernetes/base/ingress-https.yaml b/infrastructure/kubernetes/base/ingress-https.yaml deleted file mode 100644 index 679b4597..00000000 --- a/infrastructure/kubernetes/base/ingress-https.yaml +++ /dev/null @@ -1,69 +0,0 @@ -apiVersion: networking.k8s.io/v1 -kind: Ingress -metadata: - name: bakery-ingress-https - namespace: bakery-ia - labels: - app.kubernetes.io/name: bakery-ia - app.kubernetes.io/component: ingress - annotations: - # Nginx ingress controller annotations - nginx.ingress.kubernetes.io/ssl-redirect: "true" - nginx.ingress.kubernetes.io/force-ssl-redirect: "true" - nginx.ingress.kubernetes.io/proxy-body-size: "10m" - nginx.ingress.kubernetes.io/proxy-connect-timeout: "600" - nginx.ingress.kubernetes.io/proxy-send-timeout: "3600" - nginx.ingress.kubernetes.io/proxy-read-timeout: "3600" - # SSE and WebSocket configuration for long-lived connections - nginx.ingress.kubernetes.io/proxy-buffering: "off" - nginx.ingress.kubernetes.io/proxy-http-version: "1.1" - nginx.ingress.kubernetes.io/upstream-keepalive-timeout: "3600" - # WebSocket upgrade support - nginx.ingress.kubernetes.io/websocket-services: "gateway-service" - # CORS configuration for HTTPS - nginx.ingress.kubernetes.io/enable-cors: "true" - nginx.ingress.kubernetes.io/cors-allow-origin: "https://your-domain.com" # To be overridden in overlays - nginx.ingress.kubernetes.io/cors-allow-methods: "GET, POST, PUT, DELETE, OPTIONS, PATCH" - nginx.ingress.kubernetes.io/cors-allow-headers: "Content-Type, Authorization, X-Requested-With, Accept, Origin, Cache-Control" - nginx.ingress.kubernetes.io/cors-allow-credentials: "true" - # Cert-manager annotations for automatic certificate issuance - # Using issuer appropriate for environment - cert-manager.io/cluster-issuer: "letsencrypt-prod" # To be overridden in dev overlay -spec: - ingressClassName: nginx - tls: - - hosts: - - your-domain.com # To be overridden in overlays - secretName: bakery-tls-cert # To be overridden in overlays - rules: - - host: your-domain.com # To be overridden in overlays - http: - paths: - - path: / - pathType: Prefix - backend: - service: - name: frontend-service - port: - number: 3000 - - path: /api - pathType: Prefix - backend: - service: - name: gateway-service - port: - number: 8000 - - host: api.your-domain.com # To be overridden in overlays - http: - paths: - - path: / - pathType: Prefix - backend: - service: - name: gateway-service - port: - number: 8000 - # Note: SigNoz monitoring is deployed via Helm in the 'signoz' namespace - # SigNoz creates its own Ingress via Helm chart configuration - # Access at: https://monitoring.your-domain.com/ (configured in signoz-values.yaml) - # SignOz ingress is managed separately - no need to configure here \ No newline at end of file diff --git a/infrastructure/kubernetes/base/kustomization.yaml b/infrastructure/kubernetes/base/kustomization.yaml deleted file mode 100644 index 3afb6c87..00000000 --- a/infrastructure/kubernetes/base/kustomization.yaml +++ /dev/null @@ -1,181 +0,0 @@ 
-apiVersion: kustomize.config.k8s.io/v1beta1 -kind: Kustomization - -metadata: - name: bakery-ia-base - -resources: - # Base configuration - - namespace.yaml - - configmap.yaml - - secrets.yaml - - ingress-https.yaml - - # TLS configuration - - configmaps/postgres-logging-config.yaml - - secrets/postgres-tls-secret.yaml - - secrets/redis-tls-secret.yaml - - # Additional configs - - configs/postgres-init-config.yaml - - # MinIO Storage (with TLS) - - components/minio/minio-secrets.yaml - - secrets/minio-tls-secret.yaml - - components/minio/minio-pvc.yaml - - components/minio/minio-deployment.yaml - - jobs/minio-bucket-init-job.yaml - - # Migration jobs - - migrations/auth-migration-job.yaml - - migrations/tenant-migration-job.yaml - # Note: tenant-seed-pilot-coupon-job.yaml removed - pilot coupon is now seeded - # automatically during tenant-service startup (see app/jobs/startup_seeder.py) - - migrations/training-migration-job.yaml - - migrations/forecasting-migration-job.yaml - - migrations/sales-migration-job.yaml - - migrations/external-migration-job.yaml - - migrations/notification-migration-job.yaml - - migrations/inventory-migration-job.yaml - - migrations/recipes-migration-job.yaml - - migrations/suppliers-migration-job.yaml - - migrations/pos-migration-job.yaml - - migrations/orders-migration-job.yaml - - migrations/production-migration-job.yaml - - migrations/alert-processor-migration-job.yaml - - migrations/demo-session-migration-job.yaml - - migrations/procurement-migration-job.yaml - - migrations/orchestrator-migration-job.yaml - - migrations/ai-insights-migration-job.yaml - - migrations/distribution-migration-job.yaml - - migrations/demo-seed-rbac.yaml - - # External data initialization job (v2.0) - - jobs/external-data-init-job.yaml - - # CronJobs - - cronjobs/demo-cleanup-cronjob.yaml - - cronjobs/external-data-rotation-cronjob.yaml - - # Infrastructure components - - components/databases/redis.yaml - - components/databases/rabbitmq.yaml - - components/infrastructure/gateway-service.yaml - - # Distribution service - - components/distribution/distribution-deployment.yaml - - components/distribution/distribution-configmap.yaml - - # Nominatim geocoding service - - components/nominatim/nominatim.yaml - - jobs/nominatim-init-job.yaml - - # Cert manager cluster issuers - - components/cert-manager/cluster-issuer-staging.yaml - - components/cert-manager/local-ca-issuer.yaml - - # Database services - - components/databases/auth-db.yaml - - components/databases/tenant-db.yaml - - components/databases/training-db.yaml - - components/databases/forecasting-db.yaml - - components/databases/sales-db.yaml - - components/databases/external-db.yaml - - components/databases/notification-db.yaml - - components/databases/inventory-db.yaml - - components/databases/recipes-db.yaml - - components/databases/suppliers-db.yaml - - components/databases/pos-db.yaml - - components/databases/orders-db.yaml - - components/databases/production-db.yaml - - components/databases/procurement-db.yaml - - components/databases/orchestrator-db.yaml - - components/databases/alert-processor-db.yaml - - components/databases/ai-insights-db.yaml - - components/databases/distribution-db.yaml - - # Demo session components - - components/demo-session/database.yaml - - components/demo-session/rbac.yaml - - components/demo-session/service.yaml - - components/demo-session/deployment.yaml - - # Demo cleanup worker (background job processor) - - deployments/demo-cleanup-worker.yaml - - # Microservices - - 
components/auth/auth-service.yaml - - components/tenant/tenant-service.yaml - - components/training/training-service.yaml - - components/forecasting/forecasting-service.yaml - - components/sales/sales-service.yaml - - components/external/external-service.yaml - - components/notification/notification-service.yaml - - components/inventory/inventory-service.yaml - - components/recipes/recipes-service.yaml - - components/suppliers/suppliers-service.yaml - - components/pos/pos-service.yaml - - components/orders/orders-service.yaml - - components/production/production-service.yaml - - components/procurement/procurement-service.yaml - - components/orchestrator/orchestrator-service.yaml - - components/alert-processor/alert-processor.yaml - - components/ai-insights/ai-insights-service.yaml - - # Frontend - - components/frontend/frontend-service.yaml - - # HorizontalPodAutoscalers (for production autoscaling) - - components/hpa/orders-hpa.yaml - - components/hpa/forecasting-hpa.yaml - - components/hpa/notification-hpa.yaml - -labels: - - includeSelectors: true - pairs: - app.kubernetes.io/part-of: bakery-ia - app.kubernetes.io/managed-by: kustomize - -images: - - name: bakery/auth-service - newTag: latest - - name: bakery/tenant-service - newTag: latest - - name: bakery/training-service - newTag: latest - - name: bakery/forecasting-service - newTag: latest - - name: bakery/sales-service - newTag: latest - - name: bakery/external-service - newTag: latest - - name: bakery/notification-service - newTag: latest - - name: bakery/inventory-service - newTag: latest - - name: bakery/recipes-service - newTag: latest - - name: bakery/suppliers-service - newTag: latest - - name: bakery/pos-service - newTag: latest - - name: bakery/orders-service - newTag: latest - - name: bakery/production-service - newTag: latest - - name: bakery/procurement-service - newTag: latest - - name: bakery/orchestrator-service - newTag: latest - - name: bakery/alert-processor - newTag: latest - - name: bakery/ai-insights-service - newTag: latest - - name: bakery/demo-session-service - newTag: latest - - name: bakery/gateway - newTag: latest - - name: bakery/dashboard - newTag: latest - - name: bakery/distribution-service - newTag: latest diff --git a/infrastructure/kubernetes/encryption/encryption-config.yaml b/infrastructure/kubernetes/encryption/encryption-config.yaml deleted file mode 100644 index b20f217f..00000000 --- a/infrastructure/kubernetes/encryption/encryption-config.yaml +++ /dev/null @@ -1,11 +0,0 @@ -apiVersion: apiserver.config.k8s.io/v1 -kind: EncryptionConfiguration -resources: - - resources: - - secrets - providers: - - aescbc: - keys: - - name: key1 - secret: 2eAEevJmGb+y0bPzYhc4qCpqUa3r5M5Kduch1b4olHE= - - identity: {} diff --git a/infrastructure/kubernetes/overlays/dev/kustomization.yaml b/infrastructure/kubernetes/overlays/dev/kustomization.yaml deleted file mode 100644 index 361148f5..00000000 --- a/infrastructure/kubernetes/overlays/dev/kustomization.yaml +++ /dev/null @@ -1,699 +0,0 @@ -apiVersion: kustomize.config.k8s.io/v1beta1 -kind: Kustomization - -metadata: - name: bakery-ia-dev - -# Note: Removed global namespace to prevent monitoring namespace conflict -# All base resources already have namespace: bakery-ia defined - -resources: - - ../../base - - dev-ingress.yaml - # SigNoz is managed via Helm deployment (see Tiltfile signoz-deploy) - # Monitoring is handled by SigNoz (no separate monitoring components needed) - # Dev-Prod Parity: Enable HTTPS with self-signed certificates - - dev-certificate.yaml 
- # SigNoz paths are now included in the main ingress (ingress-https.yaml) - -# Exclude nominatim from dev to save resources -# Using scale to 0 for StatefulSet to prevent pod creation -patches: - # Override specific ConfigMap values for development - - target: - kind: ConfigMap - name: bakery-config - patch: |- - - op: replace - path: /data/ENVIRONMENT - value: "development" - - op: replace - path: /data/DEBUG - value: "true" - - op: replace - path: /data/LOG_LEVEL - value: "DEBUG" - - op: replace - path: /data/AUTO_RELOAD - value: "true" - - op: replace - path: /data/PROFILING_ENABLED - value: "true" - - op: replace - path: /data/MOCK_EXTERNAL_APIS - value: "false" - - op: replace - path: /data/TESTING - value: "false" - - op: replace - path: /data/DOMAIN - value: "localhost" - - op: replace - path: /data/API_DOCS_ENABLED - value: "true" - - op: replace - path: /data/CORS_ORIGINS - value: "http://frontend-service:3000,http://localhost:3000,http://localhost:3001,http://localhost,http://127.0.0.1:3000,http://127.0.0.1:3001,http://bakery-ia.local,https://localhost,https://127.0.0.1" - - op: replace - path: /data/VITE_ENVIRONMENT - value: "development" - - op: replace - path: /data/VITE_API_URL - value: "/api" - - op: replace - path: /data/STRIPE_PUBLISHABLE_KEY - value: "pk_test_your_stripe_publishable_key_here" - - op: replace - path: /data/SQUARE_ENVIRONMENT - value: "sandbox" - - op: replace - path: /data/TOAST_ENVIRONMENT - value: "sandbox" - - op: replace - path: /data/LIGHTSPEED_ENVIRONMENT - value: "sandbox" - - op: replace - path: /data/RATE_LIMIT_ENABLED - value: "true" # Changed from false for dev-prod parity - - op: add - path: /data/RATE_LIMIT_PER_MINUTE - value: "1000" # High limit for development (prod: 60) - - op: replace - path: /data/DB_FORCE_RECREATE - value: "false" - - op: add - path: /data/DEVELOPMENT_MODE - value: "true" - - op: add - path: /data/DEBUG_LOGGING - value: "true" - - op: add - path: /data/SKIP_MIGRATION_VERSION_CHECK - value: "false" - - target: - kind: StatefulSet - name: nominatim - patch: |- - - op: replace - path: /spec/replicas - value: 0 - # Suspend nominatim-init job in dev (not needed when nominatim is scaled to 0) - - target: - kind: Job - name: nominatim-init - patch: |- - - op: replace - path: /spec/suspend - value: true - - target: - group: apps - version: v1 - kind: Deployment - name: auth-db - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: redis - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: rabbitmq - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "128Mi" - cpu: "100m" - limits: - memory: "256Mi" - cpu: "300m" - - target: - group: apps - version: v1 - kind: Deployment - name: auth-service - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "128Mi" - cpu: "50m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: frontend - patch: |- - - op: replace - path: /spec/template/spec/containers/0/imagePullPolicy - value: Never - - op: replace - path: /spec/template/spec/containers/0/resources - value: - 
requests: - memory: "512Mi" - cpu: "200m" - limits: - memory: "1Gi" - cpu: "1000m" - - target: - group: apps - version: v1 - kind: Deployment - name: gateway - patch: |- - - op: replace - path: /spec/template/spec/containers/0/imagePullPolicy - value: Never - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "128Mi" - cpu: "100m" - - target: - group: apps - version: v1 - kind: Deployment - name: alert-processor - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - # Database patches - - target: - group: apps - version: v1 - kind: Deployment - name: external-db - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: forecasting-db - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: inventory-db - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: notification-db - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: orders-db - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: pos-db - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: production-db - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: recipes-db - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: sales-db - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: suppliers-db - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: tenant-db - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: training-db - patch: |- - - op: 
replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: ai-insights-db - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - # Service patches - - target: - group: apps - version: v1 - kind: Deployment - name: external-service - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: forecasting-service - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: inventory-service - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: notification-service - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: orders-service - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: pos-service - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: production-service - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: recipes-service - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: sales-service - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: suppliers-service - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: tenant-service - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "64Mi" - cpu: "25m" - limits: - memory: "256Mi" - cpu: "200m" - - target: - group: apps - version: v1 - kind: Deployment - name: training-service - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "256Mi" - cpu: "100m" - limits: - memory: "1Gi" - cpu: "500m" - - target: - group: apps - version: v1 - kind: Deployment 
- name: ai-insights-service - patch: |- - - op: replace - path: /spec/template/spec/containers/0/resources - value: - requests: - memory: "128Mi" - cpu: "50m" - limits: - memory: "512Mi" - cpu: "300m" - -secretGenerator: - - name: dev-secrets - literals: - - DEV_MODE=true - -labels: - - includeSelectors: true - pairs: - environment: development - tier: local - -images: - - name: bakery/auth-service - newTag: dev - - name: bakery/tenant-service - newTag: dev - - name: bakery/training-service - newTag: dev - - name: bakery/forecasting-service - newTag: dev - - name: bakery/sales-service - newTag: dev - - name: bakery/external-service - newTag: dev - - name: bakery/notification-service - newTag: dev - - name: bakery/inventory-service - newTag: dev - - name: bakery/recipes-service - newTag: dev - - name: bakery/suppliers-service - newTag: dev - - name: bakery/pos-service - newTag: dev - - name: bakery/orders-service - newTag: dev - - name: bakery/production-service - newTag: dev - - name: bakery/alert-processor - newTag: dev - - name: bakery/ai-insights-service - newTag: dev - - name: bakery/demo-session-service - newTag: dev - - name: bakery/gateway - newTag: dev - - name: bakery/dashboard - newTag: dev - -replicas: - # Dev-Prod Parity: Run 2 replicas of critical services - # This helps catch load balancing, session management, and race condition issues - - name: auth-service - count: 2 # Increased from 1 for dev-prod parity - - name: tenant-service - count: 1 - - name: training-service - count: 2 # Safe with MinIO storage - - name: forecasting-service - count: 1 - - name: sales-service - count: 1 - - name: external-service - count: 1 - - name: notification-service - count: 1 - - name: inventory-service - count: 1 - - name: recipes-service - count: 1 - - name: suppliers-service - count: 1 - - name: pos-service - count: 1 - - name: orders-service - count: 1 - - name: production-service - count: 1 - - name: alert-processor - count: 1 - - name: ai-insights-service - count: 1 - - name: demo-session-service - count: 1 - - name: gateway - count: 2 # Increased from 1 for dev-prod parity - - name: frontend - count: 1 diff --git a/infrastructure/kubernetes/overlays/prod/prod-ingress.yaml b/infrastructure/kubernetes/overlays/prod/prod-ingress.yaml deleted file mode 100644 index 378beca7..00000000 --- a/infrastructure/kubernetes/overlays/prod/prod-ingress.yaml +++ /dev/null @@ -1,74 +0,0 @@ -apiVersion: networking.k8s.io/v1 -kind: Ingress -metadata: - name: bakery-ingress-prod - labels: - app.kubernetes.io/name: bakery-ia - app.kubernetes.io/component: ingress - annotations: - # Nginx ingress controller annotations - nginx.ingress.kubernetes.io/ssl-redirect: "true" - nginx.ingress.kubernetes.io/force-ssl-redirect: "true" - nginx.ingress.kubernetes.io/proxy-body-size: "10m" - nginx.ingress.kubernetes.io/proxy-connect-timeout: "600" - nginx.ingress.kubernetes.io/proxy-send-timeout: "600" - nginx.ingress.kubernetes.io/proxy-read-timeout: "600" - - # CORS configuration for production - nginx.ingress.kubernetes.io/enable-cors: "true" - nginx.ingress.kubernetes.io/cors-allow-origin: "https://bakewise.ai" - nginx.ingress.kubernetes.io/cors-allow-methods: "GET, POST, PUT, DELETE, OPTIONS, PATCH" - nginx.ingress.kubernetes.io/cors-allow-headers: "Content-Type, Authorization, X-Requested-With, Accept, Origin" - nginx.ingress.kubernetes.io/cors-allow-credentials: "true" - - # Security headers - nginx.ingress.kubernetes.io/configuration-snippet: | - more_set_headers "X-Frame-Options: DENY"; - more_set_headers 
"X-Content-Type-Options: nosniff"; - more_set_headers "X-XSS-Protection: 1; mode=block"; - more_set_headers "Referrer-Policy: strict-origin-when-cross-origin"; - - # Rate limiting - nginx.ingress.kubernetes.io/limit-rps: "100" - nginx.ingress.kubernetes.io/limit-connections: "50" - - # Cert-manager annotations for automatic certificate issuance - cert-manager.io/cluster-issuer: "letsencrypt-production" - cert-manager.io/acme-challenge-type: http01 - -spec: - ingressClassName: nginx - tls: - - hosts: - - bakewise.ai - - monitoring.bakewise.ai - secretName: bakery-ia-prod-tls-cert - rules: - - host: bakewise.ai - http: - paths: - - path: / - pathType: Prefix - backend: - service: - name: frontend-service - port: - number: 3000 - - path: /api/v1 - pathType: Prefix - backend: - service: - name: gateway-service - port: - number: 8000 - # SigNoz Monitoring on subdomain (deployed via Helm in bakery-ia namespace) - - host: monitoring.bakewise.ai - http: - paths: - - path: / - pathType: Prefix - backend: - service: - name: signoz - port: - number: 8080 diff --git a/infrastructure/kubernetes/signoz-values.yaml b/infrastructure/kubernetes/signoz-values.yaml deleted file mode 100644 index 70aaacc1..00000000 --- a/infrastructure/kubernetes/signoz-values.yaml +++ /dev/null @@ -1,79 +0,0 @@ -# SigNoz Helm Chart Values - Customized for Bakery IA -# https://github.com/SigNoz/charts - -# Global settings -global: - storageClass: "standard" - -# Frontend configuration -frontend: - service: - type: ClusterIP - port: 3301 - ingress: - enabled: true - hosts: - - host: localhost - paths: - - path: /signoz - pathType: Prefix - annotations: - nginx.ingress.kubernetes.io/rewrite-target: /$2 - -# Query Service configuration -queryService: - replicaCount: 1 - resources: - requests: - cpu: 100m - memory: 256Mi - limits: - cpu: 200m - memory: 512Mi - -# AlertManager configuration -alertmanager: - replicaCount: 1 - resources: - requests: - cpu: 50m - memory: 128Mi - limits: - cpu: 100m - memory: 256Mi - -# ClickHouse configuration -clickhouse: - persistence: - enabled: true - size: 10Gi - resources: - requests: - cpu: 500m - memory: 1Gi - limits: - cpu: 1000m - memory: 2Gi - -# OpenTelemetry Collector configuration -otelCollector: - enabled: true - config: - exporters: - otlp: - endpoint: "signoz-query-service:8080" - service: - pipelines: - traces: - receivers: [otlp] - exporters: [otlp] - metrics: - receivers: [otlp] - exporters: [otlp] - logs: - receivers: [otlp] - exporters: [otlp] - -# Resource optimization for development -# These can be increased for production -development: true \ No newline at end of file diff --git a/infrastructure/helm/README.md b/infrastructure/monitoring/signoz/README.md similarity index 84% rename from infrastructure/helm/README.md rename to infrastructure/monitoring/signoz/README.md index fb64cac2..e4064498 100644 --- a/infrastructure/helm/README.md +++ b/infrastructure/monitoring/signoz/README.md @@ -349,18 +349,20 @@ podDisruptionBudget: ## Monitoring and Alerting ### Email Alerts (Production) -Configure SMTP in production values: +Configure SMTP in production values (using Mailu with Mailgun relay): ```yaml signoz: env: signoz_smtp_enabled: "true" - signoz_smtp_host: "smtp.gmail.com" + signoz_smtp_host: "mailu-smtp.bakery-ia.svc.cluster.local" signoz_smtp_port: "587" signoz_smtp_from: "alerts@bakewise.ai" signoz_smtp_username: "alerts@bakewise.ai" # Set via secret: signoz_smtp_password ``` +**Note**: Signoz now uses the internal Mailu SMTP service, which relays to Mailgun for better 
deliverability and centralized email management. + ### Slack Alerts (Production) Configure webhook in Alertmanager: ```yaml @@ -373,6 +375,69 @@ alertmanager: channel: '#alerts-critical' ``` +### Mailgun Integration for Alert Emails + +Signoz has been configured to use Mailgun for sending alert emails through the Mailu SMTP service. This provides: + +**Benefits:** +- Better email deliverability through Mailgun's infrastructure +- Centralized email management via Mailu +- Improved tracking and analytics for alert emails +- Compliance with email sending best practices + +**Architecture:** +``` +Signoz Alertmanager → Mailu SMTP → Mailgun Relay → Recipients +``` + +**Configuration Requirements:** + +1. **Mailu Configuration** (`infrastructure/platform/mail/mailu/mailu-configmap.yaml`): + ```yaml + RELAYHOST: "smtp.mailgun.org:587" + RELAY_LOGIN: "postmaster@bakewise.ai" + ``` + +2. **Mailu Secrets** (`infrastructure/platform/mail/mailu/mailu-secrets.yaml`): + ```yaml + RELAY_PASSWORD: "" # Base64 encoded Mailgun API key + ``` + +3. **DNS Configuration** (required for Mailgun): + ``` + # MX record + bakewise.ai. IN MX 10 mail.bakewise.ai. + + # SPF record (authorize Mailgun) + bakewise.ai. IN TXT "v=spf1 include:mailgun.org ~all" + + # DKIM record (provided by Mailgun) + m1._domainkey.bakewise.ai. IN TXT "v=DKIM1; k=rsa; p=" + + # DMARC record + _dmarc.bakewise.ai. IN TXT "v=DMARC1; p=quarantine; rua=mailto:dmarc@bakewise.ai" + ``` + +4. **Signoz SMTP Configuration** (already configured in `signoz-values-prod.yaml`): + ```yaml + signoz_smtp_host: "mailu-smtp.bakery-ia.svc.cluster.local" + signoz_smtp_port: "587" + signoz_smtp_from: "alerts@bakewise.ai" + ``` + +**Testing the Integration:** + +1. Trigger a test alert from Signoz UI +2. Check Mailu logs: `kubectl logs -f mailu-smtp- -n bakery-ia` +3. Check Mailgun dashboard for delivery status +4. 
Verify email receipt in destination inbox + +**Troubleshooting:** + +- **SMTP Authentication Failed**: Verify Mailu credentials and Mailgun API key +- **Email Delivery Delays**: Check Mailu queue with `kubectl exec -it mailu-smtp- -n bakery-ia -- mailq` +- **SPF/DKIM Issues**: Verify DNS records and Mailgun domain verification + ### Self-Monitoring SigNoz monitors itself: ```yaml diff --git a/infrastructure/signoz/dashboards/README.md b/infrastructure/monitoring/signoz/dashboards/README.md similarity index 100% rename from infrastructure/signoz/dashboards/README.md rename to infrastructure/monitoring/signoz/dashboards/README.md diff --git a/infrastructure/signoz/dashboards/alert-management.json b/infrastructure/monitoring/signoz/dashboards/alert-management.json similarity index 100% rename from infrastructure/signoz/dashboards/alert-management.json rename to infrastructure/monitoring/signoz/dashboards/alert-management.json diff --git a/infrastructure/signoz/dashboards/api-performance.json b/infrastructure/monitoring/signoz/dashboards/api-performance.json similarity index 100% rename from infrastructure/signoz/dashboards/api-performance.json rename to infrastructure/monitoring/signoz/dashboards/api-performance.json diff --git a/infrastructure/signoz/dashboards/application-performance.json b/infrastructure/monitoring/signoz/dashboards/application-performance.json similarity index 100% rename from infrastructure/signoz/dashboards/application-performance.json rename to infrastructure/monitoring/signoz/dashboards/application-performance.json diff --git a/infrastructure/signoz/dashboards/database-performance.json b/infrastructure/monitoring/signoz/dashboards/database-performance.json similarity index 100% rename from infrastructure/signoz/dashboards/database-performance.json rename to infrastructure/monitoring/signoz/dashboards/database-performance.json diff --git a/infrastructure/signoz/dashboards/error-tracking.json b/infrastructure/monitoring/signoz/dashboards/error-tracking.json similarity index 100% rename from infrastructure/signoz/dashboards/error-tracking.json rename to infrastructure/monitoring/signoz/dashboards/error-tracking.json diff --git a/infrastructure/signoz/dashboards/index.json b/infrastructure/monitoring/signoz/dashboards/index.json similarity index 100% rename from infrastructure/signoz/dashboards/index.json rename to infrastructure/monitoring/signoz/dashboards/index.json diff --git a/infrastructure/signoz/dashboards/infrastructure-monitoring.json b/infrastructure/monitoring/signoz/dashboards/infrastructure-monitoring.json similarity index 100% rename from infrastructure/signoz/dashboards/infrastructure-monitoring.json rename to infrastructure/monitoring/signoz/dashboards/infrastructure-monitoring.json diff --git a/infrastructure/signoz/dashboards/log-analysis.json b/infrastructure/monitoring/signoz/dashboards/log-analysis.json similarity index 100% rename from infrastructure/signoz/dashboards/log-analysis.json rename to infrastructure/monitoring/signoz/dashboards/log-analysis.json diff --git a/infrastructure/signoz/dashboards/system-health.json b/infrastructure/monitoring/signoz/dashboards/system-health.json similarity index 100% rename from infrastructure/signoz/dashboards/system-health.json rename to infrastructure/monitoring/signoz/dashboards/system-health.json diff --git a/infrastructure/signoz/dashboards/user-activity.json b/infrastructure/monitoring/signoz/dashboards/user-activity.json similarity index 100% rename from 
infrastructure/signoz/dashboards/user-activity.json rename to infrastructure/monitoring/signoz/dashboards/user-activity.json diff --git a/infrastructure/helm/deploy-signoz.sh b/infrastructure/monitoring/signoz/deploy-signoz.sh similarity index 100% rename from infrastructure/helm/deploy-signoz.sh rename to infrastructure/monitoring/signoz/deploy-signoz.sh diff --git a/infrastructure/helm/generate-test-traffic.sh b/infrastructure/monitoring/signoz/generate-test-traffic.sh similarity index 100% rename from infrastructure/helm/generate-test-traffic.sh rename to infrastructure/monitoring/signoz/generate-test-traffic.sh diff --git a/infrastructure/signoz/import-dashboards.sh b/infrastructure/monitoring/signoz/import-dashboards.sh similarity index 100% rename from infrastructure/signoz/import-dashboards.sh rename to infrastructure/monitoring/signoz/import-dashboards.sh diff --git a/infrastructure/helm/signoz-values-dev.yaml b/infrastructure/monitoring/signoz/signoz-values-dev.yaml similarity index 100% rename from infrastructure/helm/signoz-values-dev.yaml rename to infrastructure/monitoring/signoz/signoz-values-dev.yaml diff --git a/infrastructure/helm/signoz-values-prod.yaml b/infrastructure/monitoring/signoz/signoz-values-prod.yaml similarity index 99% rename from infrastructure/helm/signoz-values-prod.yaml rename to infrastructure/monitoring/signoz/signoz-values-prod.yaml index b5b95afc..bd9e2add 100644 --- a/infrastructure/helm/signoz-values-prod.yaml +++ b/infrastructure/monitoring/signoz/signoz-values-prod.yaml @@ -71,9 +71,9 @@ signoz: # Only enable if you have a stable OpAMP backend server signoz_opamp_server_enabled: "false" # signoz_opamp_server_endpoint: "0.0.0.0:4320" - # SMTP configuration for email alerts + # SMTP configuration for email alerts - now using Mailu as SMTP server signoz_smtp_enabled: "true" - signoz_smtp_host: "smtp.gmail.com" + signoz_smtp_host: "email-smtp.bakery-ia.svc.cluster.local" signoz_smtp_port: "587" signoz_smtp_from: "alerts@bakewise.ai" signoz_smtp_username: "alerts@bakewise.ai" @@ -136,7 +136,7 @@ alertmanager: config: global: resolve_timeout: 5m - smtp_smarthost: 'smtp.gmail.com:587' + smtp_smarthost: 'email-smtp.bakery-ia.svc.cluster.local:587' smtp_from: 'alerts@bakewise.ai' smtp_auth_username: 'alerts@bakewise.ai' smtp_auth_password: '${SMTP_PASSWORD}' diff --git a/infrastructure/helm/verify-signoz-telemetry.sh b/infrastructure/monitoring/signoz/verify-signoz-telemetry.sh similarity index 100% rename from infrastructure/helm/verify-signoz-telemetry.sh rename to infrastructure/monitoring/signoz/verify-signoz-telemetry.sh diff --git a/infrastructure/helm/verify-signoz.sh b/infrastructure/monitoring/signoz/verify-signoz.sh similarity index 100% rename from infrastructure/helm/verify-signoz.sh rename to infrastructure/monitoring/signoz/verify-signoz.sh diff --git a/infrastructure/kubernetes/base/namespace.yaml b/infrastructure/namespaces/bakery-ia.yaml similarity index 100% rename from infrastructure/kubernetes/base/namespace.yaml rename to infrastructure/namespaces/bakery-ia.yaml diff --git a/infrastructure/namespaces/flux-system.yaml b/infrastructure/namespaces/flux-system.yaml new file mode 100644 index 00000000..3df5590a --- /dev/null +++ b/infrastructure/namespaces/flux-system.yaml @@ -0,0 +1,15 @@ +# Flux System Namespace +# This namespace is required for Flux CD components +# It should be created before any Flux resources are applied + +apiVersion: v1 +kind: Namespace +metadata: + name: flux-system + labels: + app.kubernetes.io/name: flux + 
app.kubernetes.io/component: system + kubernetes.io/metadata.name: flux-system + pod-security.kubernetes.io/enforce: restricted + pod-security.kubernetes.io/audit: restricted + pod-security.kubernetes.io/warn: restricted \ No newline at end of file diff --git a/infrastructure/namespaces/kustomization.yaml b/infrastructure/namespaces/kustomization.yaml new file mode 100644 index 00000000..cca70e9b --- /dev/null +++ b/infrastructure/namespaces/kustomization.yaml @@ -0,0 +1,7 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - bakery-ia.yaml + - tekton-pipelines.yaml + - flux-system.yaml \ No newline at end of file diff --git a/infrastructure/namespaces/tekton-pipelines.yaml b/infrastructure/namespaces/tekton-pipelines.yaml new file mode 100644 index 00000000..a003c9e8 --- /dev/null +++ b/infrastructure/namespaces/tekton-pipelines.yaml @@ -0,0 +1,11 @@ +apiVersion: v1 +kind: Namespace +metadata: + name: tekton-pipelines + labels: + app.kubernetes.io/name: tekton + app.kubernetes.io/component: pipelines + kubernetes.io/metadata.name: tekton-pipelines + pod-security.kubernetes.io/enforce: restricted + pod-security.kubernetes.io/audit: restricted + pod-security.kubernetes.io/warn: restricted \ No newline at end of file diff --git a/infrastructure/kubernetes/base/components/cert-manager/local-ca-issuer.yaml b/infrastructure/platform/cert-manager/ca-root-certificate.yaml similarity index 76% rename from infrastructure/kubernetes/base/components/cert-manager/local-ca-issuer.yaml rename to infrastructure/platform/cert-manager/ca-root-certificate.yaml index 0ba198f6..38a96e43 100644 --- a/infrastructure/kubernetes/base/components/cert-manager/local-ca-issuer.yaml +++ b/infrastructure/platform/cert-manager/ca-root-certificate.yaml @@ -1,17 +1,10 @@ -apiVersion: cert-manager.io/v1 -kind: ClusterIssuer -metadata: - name: local-ca-issuer -spec: - ca: - secretName: local-ca-key-pair ---- # Create a root CA certificate for local development +# NOTE: This certificate must be ready before the local-ca-issuer can be used apiVersion: cert-manager.io/v1 kind: Certificate metadata: name: local-ca-cert - namespace: cert-manager + namespace: cert-manager # This ensures the secret is created in the cert-manager namespace spec: isCA: true commonName: bakery-ia-local-ca diff --git a/infrastructure/platform/cert-manager/cert-manager.yaml b/infrastructure/platform/cert-manager/cert-manager.yaml new file mode 100644 index 00000000..5f555a88 --- /dev/null +++ b/infrastructure/platform/cert-manager/cert-manager.yaml @@ -0,0 +1,23 @@ +apiVersion: v1 +kind: Namespace +metadata: + name: cert-manager +--- +# NOTE: Do NOT define cert-manager ServiceAccounts here! +# The ServiceAccounts (cert-manager, cert-manager-cainjector, cert-manager-webhook) +# are created by the upstream cert-manager installation (kubernetes_restart.sh). +# Redefining them here would strip their RBAC bindings and break authentication. 
+--- +# Self-signed ClusterIssuer for bootstrapping the CA certificate chain +# This issuer is used to create the root CA certificate which then +# becomes the issuer for all other certificates in the cluster +apiVersion: cert-manager.io/v1 +kind: ClusterIssuer +metadata: + name: selfsigned-issuer +spec: + selfSigned: {} +--- +# Cert-manager installation using Helm repository +# This will be installed via kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.2/cert-manager.yaml +# The actual installation will be done via command line, this file documents the resources \ No newline at end of file diff --git a/infrastructure/kubernetes/base/components/cert-manager/cluster-issuer-production.yaml b/infrastructure/platform/cert-manager/cluster-issuer-production.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/cert-manager/cluster-issuer-production.yaml rename to infrastructure/platform/cert-manager/cluster-issuer-production.yaml diff --git a/infrastructure/kubernetes/base/components/cert-manager/cluster-issuer-staging.yaml b/infrastructure/platform/cert-manager/cluster-issuer-staging.yaml similarity index 85% rename from infrastructure/kubernetes/base/components/cert-manager/cluster-issuer-staging.yaml rename to infrastructure/platform/cert-manager/cluster-issuer-staging.yaml index ebe09a76..8ea556e7 100644 --- a/infrastructure/kubernetes/base/components/cert-manager/cluster-issuer-staging.yaml +++ b/infrastructure/platform/cert-manager/cluster-issuer-staging.yaml @@ -1,10 +1,5 @@ -apiVersion: cert-manager.io/v1 -kind: ClusterIssuer -metadata: - name: selfsigned-issuer -spec: - selfSigned: {} ---- +# Let's Encrypt Staging ClusterIssuer +# Use this for testing before switching to production apiVersion: cert-manager.io/v1 kind: ClusterIssuer metadata: diff --git a/infrastructure/platform/cert-manager/kustomization.yaml b/infrastructure/platform/cert-manager/kustomization.yaml new file mode 100644 index 00000000..58801313 --- /dev/null +++ b/infrastructure/platform/cert-manager/kustomization.yaml @@ -0,0 +1,9 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - cert-manager.yaml + - ca-root-certificate.yaml + - local-ca-issuer.yaml + - cluster-issuer-staging.yaml + - cluster-issuer-production.yaml diff --git a/infrastructure/platform/cert-manager/local-ca-issuer.yaml b/infrastructure/platform/cert-manager/local-ca-issuer.yaml new file mode 100644 index 00000000..7c7c2f7b --- /dev/null +++ b/infrastructure/platform/cert-manager/local-ca-issuer.yaml @@ -0,0 +1,7 @@ +apiVersion: cert-manager.io/v1 +kind: ClusterIssuer +metadata: + name: local-ca-issuer +spec: + ca: + secretName: local-ca-key-pair \ No newline at end of file diff --git a/infrastructure/kubernetes/base/components/hpa/forecasting-hpa.yaml b/infrastructure/platform/hpa/forecasting-hpa.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/hpa/forecasting-hpa.yaml rename to infrastructure/platform/hpa/forecasting-hpa.yaml diff --git a/infrastructure/kubernetes/base/components/hpa/notification-hpa.yaml b/infrastructure/platform/hpa/notification-hpa.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/hpa/notification-hpa.yaml rename to infrastructure/platform/hpa/notification-hpa.yaml diff --git a/infrastructure/kubernetes/base/components/hpa/orders-hpa.yaml b/infrastructure/platform/hpa/orders-hpa.yaml similarity index 100% rename from 
infrastructure/kubernetes/base/components/hpa/orders-hpa.yaml rename to infrastructure/platform/hpa/orders-hpa.yaml diff --git a/infrastructure/kubernetes/base/components/infrastructure/gateway-service.yaml b/infrastructure/platform/infrastructure/gateway-service.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/infrastructure/gateway-service.yaml rename to infrastructure/platform/infrastructure/gateway-service.yaml diff --git a/infrastructure/platform/infrastructure/kustomization.yaml b/infrastructure/platform/infrastructure/kustomization.yaml new file mode 100644 index 00000000..bd5f9766 --- /dev/null +++ b/infrastructure/platform/infrastructure/kustomization.yaml @@ -0,0 +1,7 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - gateway-service.yaml + - nominatim/nominatim.yaml + - nominatim/nominatim-init-job.yaml diff --git a/infrastructure/kubernetes/base/jobs/nominatim-init-job.yaml b/infrastructure/platform/infrastructure/nominatim/nominatim-init-job.yaml similarity index 100% rename from infrastructure/kubernetes/base/jobs/nominatim-init-job.yaml rename to infrastructure/platform/infrastructure/nominatim/nominatim-init-job.yaml diff --git a/infrastructure/kubernetes/base/components/nominatim/nominatim.yaml b/infrastructure/platform/infrastructure/nominatim/nominatim.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/nominatim/nominatim.yaml rename to infrastructure/platform/infrastructure/nominatim/nominatim.yaml diff --git a/infrastructure/platform/mail/mailu/README.md b/infrastructure/platform/mail/mailu/README.md new file mode 100644 index 00000000..550e56b1 --- /dev/null +++ b/infrastructure/platform/mail/mailu/README.md @@ -0,0 +1,289 @@ +# Mailu Email Infrastructure for Bakery-IA + +This directory contains the Kubernetes deployment configuration for Mailu, a self-hosted email solution that integrates with external SMTP relays for optimal deliverability. 
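+
+As a quick smoke test after deployment, you can verify from inside the cluster that the Mailu submission port answers and offers STARTTLS. This is a minimal sketch in bash; the throwaway pod name and the `alpine:3` image are illustrative, and it assumes the `mailu-smtp` service described below is running in the `bakery-ia` namespace:
+
+```bash
+# Launch a short-lived pod and probe the Mailu submission port (587).
+# A successful run prints the negotiated TLS protocol and cipher.
+kubectl run smtp-check --rm -it --restart=Never -n bakery-ia --image=alpine:3 -- \
+  sh -c 'apk add --no-cache openssl >/dev/null && \
+         openssl s_client -connect mailu-smtp.bakery-ia.svc.cluster.local:587 \
+                          -starttls smtp -brief </dev/null'
+```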
+ +## Architecture Overview + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ Kubernetes Cluster (bakery-ia) │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │ +│ │ notification- │ │ mail-service │ │ frontend │ │ +│ │ service │─────▶│ (new/optional) │ │ │ │ +│ │ │ │ Queue & Routing │ │ │ │ +│ └────────┬─────────┘ └────────┬─────────┘ └──────────────────┘ │ +│ │ │ │ +│ │ SMTP (port 587) │ SMTP (port 587) │ +│ ▼ ▼ │ +│ ┌──────────────────────────────────────────────────────────────────────┐ │ +│ │ MAILU STACK │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ front │ │ admin │ │ smtp │ │ imap │ │ │ +│ │ │ (nginx) │ │ (webmail) │ │ (postfix) │ │ (dovecot) │ │ │ +│ │ │ :80/:443 │ │ :8080 │ │ :25/:587 │ │ :993/:143 │ │ │ +│ │ └─────────────┘ └─────────────┘ └──────┬──────┘ └─────────────┘ │ │ +│ │ │ │ │ +│ │ ┌─────────────┐ ┌─────────────┐ │ Relay │ │ +│ │ │ antispam │ │ antivirus │ │ │ │ +│ │ │ (rspamd) │ │ (clamav) │ │ │ │ +│ │ └─────────────┘ └─────────────┘ │ │ │ +│ │ │ │ │ +│ │ ┌─────────────────────────────────┐ │ │ │ +│ │ │ mailu-db (redis) │ │ │ │ +│ │ └─────────────────────────────────┘ │ │ │ +│ └───────────────────────────────────────────┼──────────────────────────┘ │ +│ │ │ +└──────────────────────────────────────────────┼───────────────────────────────┘ + │ + ▼ + ┌──────────────────────────────────────┐ + │ EXTERNAL SMTP RELAY │ + │ (SendGrid / Mailgun / AWS SES) │ + │ │ + │ • Handles IP reputation │ + │ • Manages deliverability │ + │ • Provides bounce/complaint hooks │ + └──────────────────────────────────────┘ + │ + ▼ + ┌──────────────────────────────────────┐ + │ INTERNET / RECIPIENTS │ + └──────────────────────────────────────┘ +``` + +## Components + +### Core Services + +- **mailu-front**: Nginx reverse proxy for web access (ports 80/443) +- **mailu-admin**: Web administration interface (port 80) +- **mailu-smtp**: Postfix SMTP server (ports 25/587) +- **mailu-imap**: Dovecot IMAP server (ports 143/993) +- **mailu-antispam**: Rspamd spam filtering (ports 11333/11334) +- **mailu-redis**: Redis for session management (port 6379) + +### Storage + +- **mailu-data**: 10Gi PVC for mail storage +- **mailu-db**: 5Gi PVC for database +- **mailu-redis**: 1Gi PVC for Redis persistence + +## Configuration + +### Environment Variables + +The Mailu stack is configured via the `mailu-configmap.yaml` file: + +- **DOMAIN**: `bakewise.ai` +- **HOSTNAMES**: `mail.bakewise.ai` +- **RELAYHOST**: `smtp.mailgun.org:587` +- **RELAY_LOGIN**: `apikey` +- **TLS_FLAVOR**: `cert` (uses Let's Encrypt) +- **WEBMAIL**: `roundcube` +- **ANTIVIRUS**: `clamav` +- **ANTISPAM**: `rspamd` + +### Secrets + +Secrets are managed in `mailu-secrets.yaml`: + +- **ADMIN_PASSWORD**: Base64 encoded admin password +- **SECRET_KEY**: Mailu internal encryption key +- **RELAY_PASSWORD**: External SMTP relay API key +- **DB_PASSWORD**: Database password +- **REDIS_PASSWORD**: Redis password + +## Deployment + +### Prerequisites + +1. Kubernetes cluster with storage provisioner +2. Ingress controller (NGINX) +3. Cert-manager for TLS certificates +4. External SMTP relay account (Mailgun, SendGrid, AWS SES) + +### Deployment Steps + +1. **Configure DNS**: + ```bash + # MX record for inbound email + bakewise.ai. IN MX 10 mail.bakewise.ai. + + # A record for mail server + mail.bakewise.ai. IN A + + # SPF record (authorize external relay) + bakewise.ai. 
IN TXT "v=spf1 include:mailgun.org ~all" + + # DKIM record (Mailu generates this) + mailu._domainkey.bakewise.ai. IN TXT "v=DKIM1; k=rsa; p=" + + # DMARC record + _dmarc.bakewise.ai. IN TXT "v=DMARC1; p=quarantine; rua=mailto:dmarc@bakewise.ai" + ``` + +2. **Update secrets**: + ```bash + # Generate secure passwords + echo -n "your-secure-password" | base64 + openssl rand -base64 32 + + # Update mailu-secrets.yaml with real values + ``` + +3. **Deploy Mailu**: + ```bash + # For production + kubectl apply -k infrastructure/environments/prod/k8s-manifests/ + + # For development + kubectl apply -k infrastructure/environments/dev/k8s-manifests/ + ``` + +4. **Verify deployment**: + ```bash + kubectl get pods -n bakery-ia | grep mailu + kubectl logs -f mailu-smtp- -n bakery-ia + ``` + +## Integration with Notification Service + +The notification service has been updated to use Mailu as the SMTP server: + +```yaml +# infrastructure/environments/common/configs/configmap.yaml +SMTP_HOST: "mailu-smtp.bakery-ia.svc.cluster.local" +SMTP_PORT: "587" +SMTP_TLS: "true" +SMTP_SSL: "false" +``` + +## Accessing Mailu + +### Web Interface + +- **Admin Panel**: `https://mail.bakewise.ai/admin` +- **Webmail**: `https://mail.bakewise.ai/webmail` + +### SMTP Configuration + +For external clients to send email through Mailu: + +- **Server**: `mail.bakewise.ai` +- **Port**: 587 (Submission) +- **Security**: STARTTLS +- **Authentication**: Required + +### IMAP Configuration + +For email clients to access mailboxes: + +- **Server**: `mail.bakewise.ai` +- **Port**: 993 (IMAPS) +- **Security**: SSL/TLS +- **Authentication**: Required + +## Monitoring and Maintenance + +### Health Checks + +```bash +# Check Mailu services +kubectl get pods -n bakery-ia -l app=mailu + +# Check Mailu logs +kubectl logs -f mailu-smtp- -n bakery-ia +kubectl logs -f mailu-antispam- -n bakery-ia + +# Check queue status +kubectl exec -it mailu-smtp- -n bakery-ia -- mailq +``` + +### Backup and Restore + +```bash +# Backup mail data +kubectl exec -it mailu-smtp- -n bakery-ia -- tar czf /backup/mailu-backup-$(date +%Y%m%d).tar.gz /data + +# Restore mail data +kubectl cp mailu-backup-.tar.gz mailu-smtp-:/backup/ -n bakery-ia +kubectl exec -it mailu-smtp- -n bakery-ia -- tar xzf /backup/mailu-backup-.tar.gz -C / +``` + +## Troubleshooting + +### Common Issues + +1. **SMTP Relay Authentication Failed**: + - Verify `RELAY_PASSWORD` in secrets matches your external relay API key + - Check network connectivity to external relay + +2. **TLS Certificate Issues**: + - Ensure cert-manager is working properly + - Check DNS records are correctly pointing to your ingress + +3. **Email Delivery Delays**: + - Check Mailu queue: `kubectl exec -it mailu-smtp- -n bakery-ia -- mailq` + - Verify external relay service status + +4. **Spam Filtering Issues**: + - Check rspamd logs: `kubectl logs -f mailu-antispam- -n bakery-ia` + - Adjust spam scoring in rspamd configuration + +## Resource Requirements + +| Component | CPU Request | CPU Limit | Memory Request | Memory Limit | Storage | +|-----------|-------------|-----------|----------------|--------------|----------| +| mailu-front | 100m | 200m | 128Mi | 256Mi | - | +| mailu-admin | 100m | 300m | 256Mi | 512Mi | - | +| mailu-smtp | 100m | 500m | 256Mi | 512Mi | 10Gi | +| mailu-imap | 100m | 500m | 256Mi | 512Mi | - | +| mailu-antispam | 200m | 1000m | 512Mi | 1Gi | - | +| mailu-redis | 100m | 200m | 128Mi | 256Mi | 1Gi | + +**Total**: ~600m CPU, ~1.7Gi Memory, 16Gi Storage + +## Security Considerations + +1. 
**Network Policies**: Mailu is protected by network policies that restrict access to only the notification service and ingress controller. + +2. **TLS Encryption**: All external connections use TLS encryption. + +3. **Authentication**: All services require authentication. + +4. **Rate Limiting**: Configured to prevent abuse (60/hour per IP, 100/day per user). + +5. **Spam Protection**: Rspamd provides comprehensive spam filtering with DKIM signing. + +## Migration from External SMTP + +To migrate from external SMTP (Gmail) to Mailu: + +1. Update DNS records as shown above +2. Deploy Mailu stack +3. Update notification service configuration +4. Test email delivery +5. Monitor deliverability metrics +6. Gradually increase email volume + +## External Relay Provider Comparison + +| Provider | Pros | Cons | Free Tier | +|----------|------|------|-----------| +| SendGrid | Best deliverability, robust API | Expensive at scale | 100/day | +| Mailgun | Developer-friendly, good logs | EU data residency costs extra | 5,000/month (3 months) | +| AWS SES | Cheapest at scale ($0.10/1000) | Requires warm-up period | 62,000/month (from EC2) | +| Postmark | Transactional focus, fast | No marketing emails | 100/month | + +**Recommendation**: AWS SES for cost-effectiveness and Kubernetes integration. + +## Support + +For issues with Mailu deployment: + +1. Check the [Mailu documentation](https://mailu.io/) +2. Review Kubernetes events: `kubectl get events -n bakery-ia` +3. Check pod logs for specific components +4. Verify network connectivity and DNS resolution \ No newline at end of file diff --git a/infrastructure/platform/mail/mailu/WEBMAIL_DNS_CONFIGURATION.md b/infrastructure/platform/mail/mailu/WEBMAIL_DNS_CONFIGURATION.md new file mode 100644 index 00000000..8b1f6713 --- /dev/null +++ b/infrastructure/platform/mail/mailu/WEBMAIL_DNS_CONFIGURATION.md @@ -0,0 +1,265 @@ +# Webmail DNS Configuration Guide + +This guide provides the DNS configuration required to make the webmail system accessible from `webmail.bakewise.ai`. + +## Production DNS Configuration + +### Required DNS Records for `webmail.bakewise.ai` + +```bash +# A Record for webmail subdomain +webmail.bakewise.ai. IN A + +# CNAME Record (alternative approach) +webmail.bakewise.ai. IN CNAME bakewise.ai. + +# MX Record for email delivery (if receiving emails) +bakewise.ai. IN MX 10 webmail.bakewise.ai. + +# SPF Record (authorize webmail server) +bakewise.ai. IN TXT "v=spf1 include:mailgun.org ~all" + +# DKIM Record (will be generated by Mailu) +mailu._domainkey.bakewise.ai. IN TXT "v=DKIM1; k=rsa; p=" + +# DMARC Record +_dmarc.bakewise.ai. IN TXT "v=DMARC1; p=quarantine; rua=mailto:dmarc@bakewise.ai" +``` + +## Development DNS Configuration + +### Required DNS Records for `webmail.bakery-ia.local` + +For local development, add these entries to your `/etc/hosts` file: + +```bash +# Add to /etc/hosts +127.0.0.1 webmail.bakery-ia.local +127.0.0.1 bakery-ia.local +127.0.0.1 monitoring.bakery-ia.local +``` + +## TLS Certificate Configuration + +The ingress configuration includes automatic TLS certificate provisioning using cert-manager with Let's Encrypt. 
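+
+Once DNS points at the ingress and the certificate has been issued, you can confirm from outside the cluster that the served certificate actually covers the webmail host. A minimal sketch, assuming the production hostname used throughout this guide:
+
+```bash
+# Inspect the certificate presented for the webmail host: the issuer, subject and
+# SANs should show a Let's Encrypt certificate that includes webmail.bakewise.ai.
+openssl s_client -connect webmail.bakewise.ai:443 -servername webmail.bakewise.ai </dev/null 2>/dev/null \
+  | openssl x509 -noout -issuer -subject -ext subjectAltName
+```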
+ +### Production TLS Configuration + +The production ingress (`prod-ingress.yaml`) includes: + +```yaml +tls: +- hosts: + - bakewise.ai + - monitoring.bakewise.ai + - webmail.bakewise.ai # ← Added webmail domain + secretName: bakery-ia-prod-tls-cert +``` + +### Development TLS Configuration + +The development ingress (`dev-ingress.yaml`) includes: + +```yaml +tls: +- hosts: + - localhost + - bakery-ia.local + - monitoring.bakery-ia.local + - webmail.bakery-ia.local # ← Added webmail domain + secretName: bakery-dev-tls-cert +``` + +## Ingress Routing Configuration + +### Production Routing + +The production ingress routes traffic as follows: + +- `https://bakewise.ai/` → Frontend service (port 3000) +- `https://bakewise.ai/api/` → Gateway service (port 8000) +- `https://monitoring.bakewise.ai/` → SigNoz monitoring (port 8080) +- `https://webmail.bakewise.ai/` → Email webmail (port 80) +- `https://webmail.bakewise.ai/webmail` → Email webmail +- `https://webmail.bakewise.ai/admin` → Email admin interface + +### Development Routing + +The development ingress routes traffic as follows: + +- `https://localhost/` → Frontend service (port 3000) +- `https://localhost/api/` → Gateway service (port 8000) +- `https://bakery-ia.local/` → Frontend service (port 3000) +- `https://bakery-ia.local/api/` → Gateway service (port 8000) +- `https://monitoring.bakery-ia.local/` → SigNoz monitoring (port 8080) +- `https://webmail.bakery-ia.local/` → Email webmail (port 80) +- `https://webmail.bakery-ia.local/webmail` → Email webmail +- `https://webmail.bakery-ia.local/admin` → Email admin interface + +## Security Headers + +The webmail ingress includes enhanced security headers: + +```nginx +Content-Security-Policy: default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; +style-src 'self' 'unsafe-inline'; img-src 'self' data:; font-src 'self'; +connect-src 'self'; frame-src 'self'; +Strict-Transport-Security: max-age=63072000; includeSubDomains; preload +``` + +## Deployment Steps + +### 1. Update DNS Records + +```bash +# For production (using Cloudflare as example) +cfcli dns create bakewise.ai A webmail --ttl 3600 --proxied + +# For development (add to /etc/hosts) +echo "127.0.0.1 webmail.bakery-ia.local" | sudo tee -a /etc/hosts +``` + +### 2. Apply Ingress Configuration + +```bash +# Apply the updated ingress configuration +kubectl apply -k infrastructure/environments/prod/k8s-manifests/ + +# Verify the ingress is configured correctly +kubectl get ingress -n bakery-ia +kubectl describe ingress bakery-ingress-prod -n bakery-ia +``` + +### 3. Verify TLS Certificates + +```bash +# Check TLS certificate status +kubectl get certificaterequest -n bakery-ia +kubectl get certificate -n bakery-ia + +# Check certificate details +kubectl describe certificate bakery-ia-prod-tls-cert -n bakery-ia +``` + +### 4. 
Test Webmail Access + +```bash +# Test webmail accessibility +curl -I https://webmail.bakewise.ai +curl -I https://webmail.bakewise.ai/webmail +curl -I https://webmail.bakewise.ai/admin + +# Test from browser +open https://webmail.bakewise.ai +``` + +## Troubleshooting + +### DNS Issues + +```bash +# Check DNS resolution +dig webmail.bakewise.ai +nslookup webmail.bakewise.ai + +# Check ingress controller logs +kubectl logs -f -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx +``` + +### TLS Issues + +```bash +# Check cert-manager logs +kubectl logs -f -n cert-manager -l app=cert-manager + +# Check certificate status +kubectl get certificaterequest,certificate,order,challenge -n bakery-ia +``` + +### Ingress Issues + +```bash +# Check ingress controller events +kubectl get events -n ingress-nginx + +# Check ingress description +kubectl describe ingress -n bakery-ia +``` + +## Monitoring and Maintenance + +### Check Webmail Service Status + +```bash +# Check email services +kubectl get pods -n bakery-ia -l app=email + +# Check webmail service +kubectl get service email-webmail -n bakery-ia + +# Check ingress routing +kubectl get ingress -n bakery-ia -o yaml | grep -A 10 webmail +``` + +### Update DNS Records + +When the ingress IP changes, update the DNS records: + +```bash +# Get current ingress IP +kubectl get service -n ingress-nginx ingress-nginx-controller -o jsonpath='{.status.loadBalancer.ingress[0].ip}' + +# Update DNS (Cloudflare example) +cfcli dns update bakewise.ai A webmail --ttl 3600 --proxied +``` + +## Access Information + +After configuration, the webmail system will be accessible at: + +- **Production**: `https://webmail.bakewise.ai` +- **Development**: `https://webmail.bakery-ia.local` + +Default credentials (configured in secrets): +- **Admin**: `admin@bakewise.ai` +- **Password**: Configured in `email-secrets` + +## Integration with Existing Systems + +The webmail system integrates with: + +1. **SMTP Service**: `email-smtp.bakery-ia.svc.cluster.local:587` +2. **IMAP Service**: `email-imap.bakery-ia.svc.cluster.local:993` +3. **Notification Service**: Uses the new SMTP service for email notifications +4. **Monitoring**: SigNoz alerts use the new email service + +## Backup and Recovery + +### DNS Backup + +```bash +# Export DNS records (Cloudflare example) +cfcli dns export bakewise.ai > dns-backup.json + +# Restore DNS records +cfcli dns import bakewise.ai dns-backup.json +``` + +### Certificate Backup + +```bash +# Export TLS secrets +kubectl get secret bakery-ia-prod-tls-cert -n bakery-ia -o yaml > tls-backup.yaml + +# Restore TLS secrets +kubectl apply -f tls-backup.yaml +``` + +## References + +- [Cert-manager Documentation](https://cert-manager.io/docs/) +- [NGINX Ingress Controller](https://kubernetes.github.io/ingress-nginx/) +- [Let's Encrypt](https://letsencrypt.org/) +- [DNS Configuration Best Practices](https://www.cloudflare.com/learning/dns/) + +This configuration provides a secure, scalable webmail solution that integrates seamlessly with the existing Bakery-IA infrastructure. 
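As a final smoke test of the integration endpoints listed above, the SMTP submission port can be checked for STARTTLS from inside the cluster. This is a sketch using a throwaway pod; the `alpine/openssl` image and the pod name are illustrative choices, and the service name can be swapped for `mailu-smtp` if you prefer the primary service over the `email-smtp` alias:

```bash
# One-shot pod that opens a STARTTLS handshake against the internal submission port.
# The app=notification-service label lets it through the Mailu ingress network policy.
kubectl run smtp-starttls-check --rm -it --restart=Never \
  --labels="app=notification-service" \
  --image=alpine/openssl -n bakery-ia -- \
  s_client -connect email-smtp.bakery-ia.svc.cluster.local:587 -starttls smtp -brief
```

A successful handshake prints the negotiated protocol and the certificate chain served by Mailu; a refused connection usually points at the network policy or the service selector rather than DNS.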
\ No newline at end of file diff --git a/infrastructure/platform/mail/mailu/kustomization.yaml b/infrastructure/platform/mail/mailu/kustomization.yaml new file mode 100644 index 00000000..5e7efb5e --- /dev/null +++ b/infrastructure/platform/mail/mailu/kustomization.yaml @@ -0,0 +1,24 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization +namespace: bakery-ia + +resources: + - mailu-configmap.yaml + - mailu-secrets.yaml + - mailu-pvc.yaml + - mailu-deployment.yaml + - mailu-services.yaml + - mailu-antispam.yaml + - mailu-networkpolicy.yaml + # NOTE: mailu-ingress.yaml removed - ingress is now centralized in platform/networking + # NOTE: mailu-replacement.yaml removed - using official Mailu stack + # NOTE: email-config.yaml removed - configuration consolidated into mailu-configmap.yaml + # NOTE: Network policy kept here for self-contained module (could be moved to global security) + # NOTE: Mailu uses shared Redis (redis-service) with database 15 - no separate Redis needed + +labels: +- includeSelectors: true + pairs: + app: mailu + platform: mail + managed-by: kustomize \ No newline at end of file diff --git a/infrastructure/platform/mail/mailu/mailu-antispam.yaml b/infrastructure/platform/mail/mailu/mailu-antispam.yaml new file mode 100644 index 00000000..86aa96f7 --- /dev/null +++ b/infrastructure/platform/mail/mailu/mailu-antispam.yaml @@ -0,0 +1,48 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: mailu-antispam + namespace: bakery-ia + labels: + app: mailu + component: antispam +spec: + replicas: 1 + selector: + matchLabels: + app: mailu + component: antispam + template: + metadata: + labels: + app: mailu + component: antispam + spec: + containers: + - name: antispam + image: ghcr.io/mailu/rspamd:2024.06 + imagePullPolicy: IfNotPresent + ports: + - containerPort: 11333 + name: rspamd + - containerPort: 11334 + name: rspamd-admin + envFrom: + - configMapRef: + name: mailu-config + - secretRef: + name: mailu-secrets + volumeMounts: + - name: mailu-data + mountPath: /data + resources: + requests: + cpu: 200m + memory: 512Mi + limits: + cpu: 1000m + memory: 1Gi + volumes: + - name: mailu-data + persistentVolumeClaim: + claimName: mailu-data diff --git a/infrastructure/platform/mail/mailu/mailu-configmap.yaml b/infrastructure/platform/mail/mailu/mailu-configmap.yaml new file mode 100644 index 00000000..6f38eefb --- /dev/null +++ b/infrastructure/platform/mail/mailu/mailu-configmap.yaml @@ -0,0 +1,79 @@ +apiVersion: v1 +kind: ConfigMap +metadata: + name: mailu-config + namespace: bakery-ia + labels: + app: mailu + component: config +data: + # Domain configuration + DOMAIN: "bakewise.ai" + HOSTNAMES: "mail.bakewise.ai" + POSTMASTER: "admin" + + # Kubernetes-specific settings + # These help Mailu components discover each other in K8s + FRONT_ADDRESS: "mailu-front.bakery-ia.svc.cluster.local" + ADMIN_ADDRESS: "mailu-admin.bakery-ia.svc.cluster.local" + SMTP_ADDRESS: "mailu-smtp.bakery-ia.svc.cluster.local" + IMAP_ADDRESS: "mailu-imap.bakery-ia.svc.cluster.local" + ANTISPAM_ADDRESS: "mailu-antispam.bakery-ia.svc.cluster.local" + + # Redis Configuration - Using shared cluster Redis (database 15 reserved for Mailu) + # The shared Redis has 16 databases (0-15), Mailu uses db 15 for isolation + # Using plain TCP port 6380 for internal cluster communication (TLS on 6379 for external) + # Primary configuration: Redis URL is configured in mailu-secrets.yaml as REDIS_URL + # Format: redis://:password@host:port/db + # Fallback configuration: REDIS_ADDRESS, REDIS_DB, and REDIS_PW 
+ REDIS_ADDRESS: "redis-service.bakery-ia.svc.cluster.local:6380" + REDIS_DB: "15" + # REDIS_PW is set from secrets for Redis authentication + + # External SMTP Relay Configuration + # Mailu relays outbound emails through an external service for better deliverability + # Supported providers: Mailgun, SendGrid, AWS SES, Postmark + # + # Provider RELAYHOST examples: + # Mailgun: [smtp.mailgun.org]:587 + # SendGrid: [smtp.sendgrid.net]:587 + # AWS SES: [email-smtp.us-east-1.amazonaws.com]:587 + # Postmark: [smtp.postmarkapp.com]:587 + # + # IMPORTANT: Update RELAY_PASSWORD in mailu-secrets.yaml with your provider's API key + RELAYHOST: "[smtp.mailgun.org]:587" + RELAY_LOGIN: "postmaster@bakewise.ai" + + # Security settings + TLS_FLAVOR: "cert" + AUTH_RATELIMIT_IP: "60/hour" + AUTH_RATELIMIT_USER: "100/day" + + # Message limits + MESSAGE_SIZE_LIMIT: "52428800" # 50MB + MESSAGE_RATELIMIT: "200/day" + + # Features - disable ClamAV in dev to save resources (enable in prod) + WEBMAIL: "roundcube" + ANTIVIRUS: "none" + ANTISPAM: "rspamd" + + # Postfix configuration + POSTFIX_MESSAGE_SIZE_LIMIT: "52428800" + POSTFIX_QUEUE_MINIMUM: "1" + POSTFIX_QUEUE_LIFETIME: "7d" + + # DKIM configuration + DKIM_SELECTOR: "mailu" + DKIM_KEY_LENGTH: "2048" + + # Webmail settings + WEB_WEBMAIL: "/webmail" + WEB_ADMIN: "/admin" + WEBMAIL_ADMIN: "admin@bakewise.ai" + + # Logging + LOG_LEVEL: "INFO" + + # Disable welcome email during development + WELCOME: "false" \ No newline at end of file diff --git a/infrastructure/platform/mail/mailu/mailu-deployment.yaml b/infrastructure/platform/mail/mailu/mailu-deployment.yaml new file mode 100644 index 00000000..d1b8db46 --- /dev/null +++ b/infrastructure/platform/mail/mailu/mailu-deployment.yaml @@ -0,0 +1,208 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: mailu-front + namespace: bakery-ia + labels: + app: mailu + component: front +spec: + replicas: 1 + selector: + matchLabels: + app: mailu + component: front + template: + metadata: + labels: + app: mailu + component: front + spec: + containers: + - name: front + image: ghcr.io/mailu/nginx:2024.06 + imagePullPolicy: IfNotPresent + ports: + - containerPort: 80 + name: http + - containerPort: 443 + name: https + envFrom: + - configMapRef: + name: mailu-config + - secretRef: + name: mailu-secrets + volumeMounts: + - name: mailu-data + mountPath: /data + - name: mailu-tls + mountPath: /certs + readOnly: true + resources: + requests: + cpu: 100m + memory: 128Mi + limits: + cpu: 200m + memory: 256Mi + volumes: + - name: mailu-data + persistentVolumeClaim: + claimName: mailu-data + - name: mailu-tls + secret: + # TLS secret name is environment-specific: + # - Dev: bakery-dev-tls-cert (self-signed, from dev-certificate.yaml) + # - Prod: bakery-ia-prod-tls-cert (Let's Encrypt, from prod-certificate.yaml) + # Patched via kustomize overlays in dev/prod kustomization.yaml + secretName: MAILU_TLS_SECRET_PLACEHOLDER + items: + - key: tls.crt + path: cert.pem + - key: tls.key + path: key.pem +--- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: mailu-admin + namespace: bakery-ia + labels: + app: mailu + component: admin +spec: + replicas: 1 + selector: + matchLabels: + app: mailu + component: admin + template: + metadata: + labels: + app: mailu + component: admin + spec: + containers: + - name: admin + image: ghcr.io/mailu/admin:2024.06 + imagePullPolicy: IfNotPresent + ports: + - containerPort: 80 + name: http + envFrom: + - configMapRef: + name: mailu-config + - secretRef: + name: mailu-secrets + volumeMounts: + - 
name: mailu-data + mountPath: /data + resources: + requests: + cpu: 100m + memory: 256Mi + limits: + cpu: 300m + memory: 512Mi + volumes: + - name: mailu-data + persistentVolumeClaim: + claimName: mailu-data +--- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: mailu-smtp + namespace: bakery-ia + labels: + app: mailu + component: smtp +spec: + replicas: 1 + selector: + matchLabels: + app: mailu + component: smtp + template: + metadata: + labels: + app: mailu + component: smtp + spec: + containers: + - name: smtp + image: ghcr.io/mailu/postfix:2024.06 + imagePullPolicy: IfNotPresent + ports: + - containerPort: 25 + name: smtp + - containerPort: 587 + name: submission + envFrom: + - configMapRef: + name: mailu-config + - secretRef: + name: mailu-secrets + volumeMounts: + - name: mailu-data + mountPath: /data + resources: + requests: + cpu: 100m + memory: 256Mi + limits: + cpu: 500m + memory: 512Mi + volumes: + - name: mailu-data + persistentVolumeClaim: + claimName: mailu-data +--- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: mailu-imap + namespace: bakery-ia + labels: + app: mailu + component: imap +spec: + replicas: 1 + selector: + matchLabels: + app: mailu + component: imap + template: + metadata: + labels: + app: mailu + component: imap + spec: + containers: + - name: imap + image: ghcr.io/mailu/dovecot:2024.06 + imagePullPolicy: IfNotPresent + ports: + - containerPort: 143 + name: imap + - containerPort: 993 + name: imaps + envFrom: + - configMapRef: + name: mailu-config + - secretRef: + name: mailu-secrets + volumeMounts: + - name: mailu-data + mountPath: /data + resources: + requests: + cpu: 100m + memory: 256Mi + limits: + cpu: 500m + memory: 512Mi + volumes: + - name: mailu-data + persistentVolumeClaim: + claimName: mailu-data diff --git a/infrastructure/platform/mail/mailu/mailu-networkpolicy.yaml b/infrastructure/platform/mail/mailu/mailu-networkpolicy.yaml new file mode 100644 index 00000000..1df4b450 --- /dev/null +++ b/infrastructure/platform/mail/mailu/mailu-networkpolicy.yaml @@ -0,0 +1,93 @@ +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: mailu-network-policy + namespace: bakery-ia + labels: + app: mailu + component: network-policy +spec: + # Apply to all Mailu pods (matches mailu-deployment.yaml labels) + podSelector: + matchLabels: + app: mailu + policyTypes: + - Ingress + - Egress + ingress: + # Allow SMTP from notification-service + - from: + - podSelector: + matchLabels: + app: notification-service + ports: + - port: 25 + - port: 587 + # Allow SMTP from other internal services that may need to send email + - from: + - podSelector: + matchLabels: + app.kubernetes.io/name: bakery-ia + ports: + - port: 587 + # Allow webmail/admin access via ingress controller + - from: + - namespaceSelector: + matchLabels: + kubernetes.io/metadata.name: ingress-nginx + ports: + - port: 80 + - port: 443 + # Allow internal Mailu component communication + - from: + - podSelector: + matchLabels: + app: mailu + ports: + - port: 25 + - port: 587 + - port: 143 + - port: 993 + - port: 80 + - port: 11333 + - port: 11334 + egress: + # Allow relay to external SMTP (Mailgun) + - to: + - ipBlock: + cidr: 0.0.0.0/0 + except: + - 10.0.0.0/8 + - 172.16.0.0/12 + - 192.168.0.0/16 + ports: + - port: 587 + - port: 465 + - port: 25 + # Allow internal Mailu component communication + - to: + - podSelector: + matchLabels: + app: mailu + ports: + - port: 25 + - port: 587 + - port: 143 + - port: 993 + - port: 80 + - port: 11333 + - port: 11334 + # Allow connection to 
shared Redis (database 15) + - to: + - podSelector: + matchLabels: + app.kubernetes.io/name: redis + ports: + - port: 6379 + # Allow DNS lookups + - to: [] + ports: + - port: 53 + protocol: UDP + - port: 53 + protocol: TCP \ No newline at end of file diff --git a/infrastructure/platform/mail/mailu/mailu-pvc.yaml b/infrastructure/platform/mail/mailu/mailu-pvc.yaml new file mode 100644 index 00000000..e47b3a45 --- /dev/null +++ b/infrastructure/platform/mail/mailu/mailu-pvc.yaml @@ -0,0 +1,21 @@ +# Mailu data storage - shared across all Mailu components +# Contains: mail data, SQLite database, DKIM keys, SSL certificates, queue +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: mailu-data + namespace: bakery-ia + labels: + app: mailu + component: storage +spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 10Gi + # NOTE: Change storageClassName based on your cluster's storage provisioner + # For local development (kind): standard + # For AWS EKS: gp2 or gp3 + # For GKE: standard or premium-rwo + # For AKS: managed-premium or managed-csi diff --git a/infrastructure/platform/mail/mailu/mailu-secrets.yaml b/infrastructure/platform/mail/mailu/mailu-secrets.yaml new file mode 100644 index 00000000..a0536140 --- /dev/null +++ b/infrastructure/platform/mail/mailu/mailu-secrets.yaml @@ -0,0 +1,37 @@ +apiVersion: v1 +kind: Secret +metadata: + name: mailu-secrets + namespace: bakery-ia + labels: + app: mailu + component: secrets +type: Opaque +data: + # Admin credentials (base64 encoded) + # IMPORTANT: Replace with real credentials before production deployment + # Generate with: openssl rand -base64 24 | tr -d '\n' | base64 + ADMIN_PASSWORD: "VzJYS2tSdUxpT25ZS2RCWVFTQXJvbjFpeWtFU1M1b2I=" # W2XKkRuLiOnYKdBYQSAron1iykESS5ob + + # Mailu secret key for internal encryption + # Generate with: openssl rand -base64 32 + SECRET_KEY: "Y2I2MWI5MzRkNDcwMjlhNjQxMTdjMGU0MTEwYzkzZjY2YmJjZjVlYWExNWM4NGM0MjcyN2ZhZDc4Zjc=" # cb61b934d47029a64117c0e4110c93f66bbcf5eaa15c84c42727fad78f7 + + # External SMTP relay credentials (Mailgun) + # For Mailgun: use postmaster@domain as username + RELAY_USER: "cG9zdG1hc3RlckBiYWtld2lzZS5haQ==" # postmaster@bakewise.ai + RELAY_PASSWORD: "bWFpbGd1bi1hcGkta2V5LXJlcGxhY2UtaW4tcHJvZHVjdGlvbg==" # mailgun-api-key-replace-in-production + + # Database credentials + DB_PASSWORD: "RThLejQ3WW1WekRsSEdzMU05d0FiSnp4Y0tuR09OQ1Q=" # E8Kz47YmVzDlHGs1M9wAbJzxcKnGONCT + + # Dovecot admin password (moved from ConfigMap for security) + DOVEADM_PASSWORD: "WnZhMzNoaVBJc2ZtV3RxUlBWV29taTRYZ2xLTlZPcHY=" # Zva33hiPIsfmWtqRPVWomi4XglKNVOpv + + # Redis password - same as shared cluster Redis (redis-secrets) + # Mailu uses database 15 for isolation from other services + # REDIS_PW is required by Mailu for Redis authentication + REDIS_PASSWORD: "SjNsa2x4cHU5QzlPTElLdkJteFVIT2h0czFnc0lvM0E=" # J3lklxpu9C9OLIKvBmxUHOhts1gsIo3A + REDIS_PW: "SjNsa2x4cHU5QzlPTElLdkJteFVIT2h0czFnc0lvM0E=" # J3lklxpu9C9OLIKvBmxUHOhts1gsIo3A + # Redis URL for Mailu - using plain TCP port 6380 for internal cluster communication + REDIS_URL: "cmVkaXM6Ly86SjNsa2x4cHU5QzlPTElLdkJteFVIT2h0czFnc0lvM0FAcmVkaXMtc2VydmljZS5iYWtlcnktaWEuc3ZjLmNsdXN0ZXIubG9jYWw6NjM4MC8xNQ==" # redis://:J3lklxpu9C9OLIKvBmxUHOhts1gsIo3A@redis-service.bakery-ia.svc.cluster.local:6380/15 \ No newline at end of file diff --git a/infrastructure/platform/mail/mailu/mailu-services.yaml b/infrastructure/platform/mail/mailu/mailu-services.yaml new file mode 100644 index 00000000..13e1f49a --- /dev/null +++ 
b/infrastructure/platform/mail/mailu/mailu-services.yaml @@ -0,0 +1,126 @@ +# Mailu Services - Routes traffic to Mailu stack components +# All services use app: mailu selectors to match mailu-deployment.yaml +apiVersion: v1 +kind: Service +metadata: + name: mailu-front + namespace: bakery-ia + labels: + app: mailu + component: front +spec: + type: ClusterIP + selector: + app: mailu + component: front + ports: + - name: http + port: 80 + targetPort: 80 + - name: https + port: 443 + targetPort: 443 +--- +apiVersion: v1 +kind: Service +metadata: + name: mailu-admin + namespace: bakery-ia + labels: + app: mailu + component: admin +spec: + type: ClusterIP + selector: + app: mailu + component: admin + ports: + - name: http + port: 80 + targetPort: 80 +--- +# Primary SMTP service - used by notification-service and other internal services +apiVersion: v1 +kind: Service +metadata: + name: mailu-smtp + namespace: bakery-ia + labels: + app: mailu + component: smtp +spec: + type: ClusterIP + selector: + app: mailu + component: smtp + ports: + - name: smtp + port: 25 + targetPort: 25 + - name: submission + port: 587 + targetPort: 587 +--- +# Alias for backwards compatibility with services expecting 'email-smtp' +apiVersion: v1 +kind: Service +metadata: + name: email-smtp + namespace: bakery-ia + labels: + app: mailu + component: smtp +spec: + type: ClusterIP + selector: + app: mailu + component: smtp + ports: + - name: smtp + port: 25 + targetPort: 25 + - name: submission + port: 587 + targetPort: 587 +--- +apiVersion: v1 +kind: Service +metadata: + name: mailu-imap + namespace: bakery-ia + labels: + app: mailu + component: imap +spec: + type: ClusterIP + selector: + app: mailu + component: imap + ports: + - name: imap + port: 143 + targetPort: 143 + - name: imaps + port: 993 + targetPort: 993 +--- +apiVersion: v1 +kind: Service +metadata: + name: mailu-antispam + namespace: bakery-ia + labels: + app: mailu + component: antispam +spec: + type: ClusterIP + selector: + app: mailu + component: antispam + ports: + - name: rspamd + port: 11333 + targetPort: 11333 + - name: rspamd-admin + port: 11334 + targetPort: 11334 diff --git a/infrastructure/kubernetes/overlays/dev/dev-ingress.yaml b/infrastructure/platform/networking/ingress/base/ingress.yaml similarity index 61% rename from infrastructure/kubernetes/overlays/dev/dev-ingress.yaml rename to infrastructure/platform/networking/ingress/base/ingress.yaml index 43059933..1f957cef 100644 --- a/infrastructure/kubernetes/overlays/dev/dev-ingress.yaml +++ b/infrastructure/platform/networking/ingress/base/ingress.yaml @@ -3,43 +3,41 @@ kind: Ingress metadata: name: bakery-ingress namespace: bakery-ia + labels: + app.kubernetes.io/name: bakery-ia + app.kubernetes.io/component: ingress annotations: - # Dev-Prod Parity: Enable HTTPS by default + # Nginx ingress controller annotations nginx.ingress.kubernetes.io/ssl-redirect: "true" nginx.ingress.kubernetes.io/force-ssl-redirect: "true" - - # Dev-Prod Parity: Use specific origins instead of wildcard to catch CORS issues early - # HTTPS origins first (preferred), with HTTP fallback for development flexibility - nginx.ingress.kubernetes.io/cors-allow-origin: "https://localhost,https://localhost:3000,https://localhost:3001,https://127.0.0.1,https://127.0.0.1:3000,https://127.0.0.1:3001,https://bakery-ia.local,http://localhost,http://localhost:3000,http://localhost:3001,http://127.0.0.1,http://127.0.0.1:3000" - nginx.ingress.kubernetes.io/cors-allow-methods: "GET, POST, PUT, DELETE, OPTIONS, PATCH" - 
nginx.ingress.kubernetes.io/cors-allow-headers: "Content-Type, Authorization, X-Requested-With, Accept, Origin, Cache-Control" - nginx.ingress.kubernetes.io/cors-allow-credentials: "true" - nginx.ingress.kubernetes.io/enable-cors: "true" - - # Prevent nginx from redirecting to add trailing slashes - nginx.ingress.kubernetes.io/use-regex: "true" - - # Development, SSE and WebSocket annotations - nginx.ingress.kubernetes.io/proxy-read-timeout: "3600" - nginx.ingress.kubernetes.io/proxy-connect-timeout: "600" nginx.ingress.kubernetes.io/proxy-body-size: "10m" + nginx.ingress.kubernetes.io/proxy-connect-timeout: "600" nginx.ingress.kubernetes.io/proxy-send-timeout: "3600" + nginx.ingress.kubernetes.io/proxy-read-timeout: "3600" + # SSE and WebSocket configuration for long-lived connections nginx.ingress.kubernetes.io/proxy-buffering: "off" nginx.ingress.kubernetes.io/proxy-http-version: "1.1" nginx.ingress.kubernetes.io/upstream-keepalive-timeout: "3600" - # WebSocket upgrade support nginx.ingress.kubernetes.io/websocket-services: "gateway-service" + # CORS configuration + nginx.ingress.kubernetes.io/enable-cors: "true" + nginx.ingress.kubernetes.io/cors-allow-methods: "GET, POST, PUT, DELETE, OPTIONS, PATCH" + nginx.ingress.kubernetes.io/cors-allow-headers: "Content-Type, Authorization, X-Requested-With, Accept, Origin, Cache-Control" + nginx.ingress.kubernetes.io/cors-allow-credentials: "true" + + spec: ingressClassName: nginx tls: - hosts: - - localhost - - bakery-ia.local - - monitoring.bakery-ia.local - secretName: bakery-dev-tls-cert + - DOMAIN_PLACEHOLDER # To be replaced by kustomize + - gitea.DOMAIN_PLACEHOLDER # To be replaced by kustomize + - mail.DOMAIN_PLACEHOLDER # To be replaced by kustomize + secretName: TLS_SECRET_PLACEHOLDER # To be replaced by kustomize rules: - - host: localhost + # Main application routes + - host: DOMAIN_PLACEHOLDER # To be replaced by kustomize http: paths: - path: / @@ -56,31 +54,39 @@ spec: name: gateway-service port: number: 8000 - - host: bakery-ia.local + # Gitea CI/CD route + - host: gitea.DOMAIN_PLACEHOLDER # To be replaced by kustomize http: paths: - path: / pathType: Prefix backend: service: - name: frontend-service + name: gitea-http port: number: 3000 - - path: /api + # Mail server web interface (webmail and admin) + - host: mail.DOMAIN_PLACEHOLDER # To be replaced by kustomize + http: + paths: + - path: /webmail pathType: Prefix backend: service: - name: gateway-service + name: mailu-front port: - number: 8000 - # SigNoz Monitoring on subdomain (deployed via Helm in bakery-ia namespace) - - host: monitoring.bakery-ia.local - http: - paths: + number: 80 + - path: /admin + pathType: Prefix + backend: + service: + name: mailu-front + port: + number: 80 - path: / pathType: Prefix backend: service: - name: signoz + name: mailu-front port: - number: 8080 \ No newline at end of file + number: 80 \ No newline at end of file diff --git a/infrastructure/platform/networking/ingress/base/kustomization.yaml b/infrastructure/platform/networking/ingress/base/kustomization.yaml new file mode 100644 index 00000000..14d8f3a5 --- /dev/null +++ b/infrastructure/platform/networking/ingress/base/kustomization.yaml @@ -0,0 +1,5 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - ingress.yaml diff --git a/infrastructure/platform/networking/ingress/kustomization.yaml b/infrastructure/platform/networking/ingress/kustomization.yaml new file mode 100644 index 00000000..8a6f79fe --- /dev/null +++ 
b/infrastructure/platform/networking/ingress/kustomization.yaml @@ -0,0 +1,5 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - base/ \ No newline at end of file diff --git a/infrastructure/platform/networking/ingress/overlays/dev/kustomization.yaml b/infrastructure/platform/networking/ingress/overlays/dev/kustomization.yaml new file mode 100644 index 00000000..49ad2bf5 --- /dev/null +++ b/infrastructure/platform/networking/ingress/overlays/dev/kustomization.yaml @@ -0,0 +1,37 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - ../../base + +namePrefix: dev- + +patches: + - target: + kind: Ingress + name: bakery-ingress + patch: |- + - op: replace + path: /spec/tls/0/hosts/0 + value: bakery-ia.local + - op: replace + path: /spec/tls/0/hosts/1 + value: gitea.bakery-ia.local + - op: replace + path: /spec/tls/0/hosts/2 + value: mail.bakery-ia.local + - op: replace + path: /spec/tls/0/secretName + value: bakery-dev-tls-cert + - op: replace + path: /spec/rules/0/host + value: bakery-ia.local + - op: replace + path: /spec/rules/1/host + value: gitea.bakery-ia.local + - op: replace + path: /spec/rules/2/host + value: mail.bakery-ia.local + - op: replace + path: /metadata/annotations/nginx.ingress.kubernetes.io~1cors-allow-origin + value: "https://localhost,https://localhost:3000,https://localhost:3001,https://127.0.0.1,https://127.0.0.1:3000,https://127.0.0.1:3001,https://bakery-ia.local,http://localhost,http://localhost:3000,http://localhost:3001,http://127.0.0.1,http://127.0.0.1:3000" \ No newline at end of file diff --git a/infrastructure/platform/networking/ingress/overlays/prod/kustomization.yaml b/infrastructure/platform/networking/ingress/overlays/prod/kustomization.yaml new file mode 100644 index 00000000..cf3b6f5f --- /dev/null +++ b/infrastructure/platform/networking/ingress/overlays/prod/kustomization.yaml @@ -0,0 +1,49 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - ../../base + +namePrefix: prod- + +patches: + - target: + kind: Ingress + name: bakery-ingress + patch: |- + - op: replace + path: /spec/tls/0/hosts/0 + value: bakewise.ai + - op: replace + path: /spec/tls/0/hosts/1 + value: gitea.bakewise.ai + - op: replace + path: /spec/tls/0/hosts/2 + value: mail.bakewise.ai + - op: replace + path: /spec/tls/0/secretName + value: bakery-ia-prod-tls-cert + - op: replace + path: /spec/rules/0/host + value: bakewise.ai + - op: replace + path: /spec/rules/1/host + value: gitea.bakewise.ai + - op: replace + path: /spec/rules/2/host + value: mail.bakewise.ai + - op: add + path: /metadata/annotations/nginx.ingress.kubernetes.io~1cors-allow-origin + value: "https://bakewise.ai,https://www.bakewise.ai,https://mail.bakewise.ai" + - op: add + path: /metadata/annotations/nginx.ingress.kubernetes.io~1limit-rps + value: "100" + - op: add + path: /metadata/annotations/nginx.ingress.kubernetes.io~1limit-connections + value: "50" + - op: add + path: /metadata/annotations/cert-manager.io~1cluster-issuer + value: "letsencrypt-production" + - op: add + path: /metadata/annotations/cert-manager.io~1acme-challenge-type + value: "http01" \ No newline at end of file diff --git a/infrastructure/platform/security/encryption/README.md b/infrastructure/platform/security/encryption/README.md new file mode 100644 index 00000000..135f1d39 --- /dev/null +++ b/infrastructure/platform/security/encryption/README.md @@ -0,0 +1,55 @@ +# Kubernetes Secrets Encryption + +This directory contains configuration for 
encrypting Kubernetes secrets at rest. + +## What is this for? + +Kubernetes secrets are stored in etcd, and by default they are stored as plaintext. This encryption configuration ensures that secrets are encrypted when stored in etcd, providing an additional layer of security. + +## Files + +- `encryption-config.yaml` - Main encryption configuration file + +## How it works + +1. The API server uses this configuration to encrypt secrets before storing them in etcd +2. When secrets are retrieved, they are automatically decrypted by the API server +3. This provides encryption at rest for all Kubernetes secrets + +## Security Notes + +- The encryption key is stored in this file (base64 encoded) +- This file should be protected and not committed to version control in production +- For development, this provides basic encryption at rest +- In production, consider using a proper key management system + +## Generating a new key + +```bash +openssl rand -base64 32 +``` + +## Configuration Details + +- **Algorithm**: AES-CBC with 256-bit keys +- **Provider**: `aescbc` - AES-CBC encryption provider +- **Fallback**: `identity` - Allows reading unencrypted secrets during migration + +## Usage + +This configuration is automatically used by the Kind cluster configuration in `kind-config.yaml`. The file is mounted into the Kubernetes control plane container and referenced by the API server configuration. + +## Rotation + +To rotate keys: +1. Add a new key to the `keys` array +2. Make the new key the first in the array +3. Restart the API server +4. Old keys can be removed after all secrets have been re-encrypted with the new key + +## Compliance + +This encryption helps satisfy: +- GDPR Article 32 - Security of processing +- PCI DSS Requirement 3.4 - Encryption of sensitive data +- ISO 27001:2022 - Cryptographic controls diff --git a/infrastructure/platform/security/encryption/encryption-config.yaml b/infrastructure/platform/security/encryption/encryption-config.yaml new file mode 100644 index 00000000..1766e997 --- /dev/null +++ b/infrastructure/platform/security/encryption/encryption-config.yaml @@ -0,0 +1,17 @@ +# Kubernetes Secrets Encryption Configuration +# This file configures encryption at rest for Kubernetes secrets +# Used by the API server to encrypt secret data stored in etcd + +apiVersion: apiserver.config.k8s.io/v1 +kind: EncryptionConfiguration +resources: + - resources: + - secrets + providers: + - aescbc: + keys: + - name: key1 + # 32-byte (256-bit) AES key encoded in base64 + # Generated using: openssl rand -base64 32 + secret: 62um3zP5aidjVSIB0ckAxF/Ms8EDy/Z8LyMGTdMuoSM= + - identity: {} diff --git a/infrastructure/platform/security/network-policies/global-default-networkpolicy.yaml b/infrastructure/platform/security/network-policies/global-default-networkpolicy.yaml new file mode 100644 index 00000000..6102c33f --- /dev/null +++ b/infrastructure/platform/security/network-policies/global-default-networkpolicy.yaml @@ -0,0 +1,108 @@ +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: default-deny-all + namespace: bakery-ia + labels: + app: global + component: network-policy + tier: security +spec: + podSelector: {} + policyTypes: + - Ingress + - Egress + ingress: [] + egress: [] +--- +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-kube-dns + namespace: bakery-ia + labels: + app: global + component: network-policy + tier: security +spec: + podSelector: {} + policyTypes: + - Egress + egress: + # Allow DNS resolution to kube-system 
namespace + - to: + - namespaceSelector: + matchLabels: + name: kube-system + ports: + - port: 53 + protocol: UDP + - port: 53 + protocol: TCP +--- +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-ingress-controller + namespace: bakery-ia + labels: + app: global + component: network-policy + tier: security +spec: + podSelector: + matchLabels: + # This label should match your ingress controller's namespace + # Adjust as needed for your specific ingress controller + app: nginx-ingress-microk8s + policyTypes: + - Ingress + ingress: + # Allow all traffic to ingress controller + - from: + - ipBlock: + cidr: 0.0.0.0/0 +--- +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-internal-communication + namespace: bakery-ia + labels: + app: global + component: network-policy + tier: security +spec: + podSelector: {} + policyTypes: + - Ingress + - Egress + ingress: + # Allow communication between pods in the same namespace + - from: + - podSelector: {} + egress: + # Allow communication to pods in the same namespace + - to: + - podSelector: {} +--- +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-egress-external + namespace: bakery-ia + labels: + app: global + component: network-policy + tier: security +spec: + podSelector: + matchLabels: + app: external-egress-allowed + policyTypes: + - Egress + egress: + # Allow external communication for services that need it + - to: + - ipBlock: + cidr: 0.0.0.0/0 \ No newline at end of file diff --git a/infrastructure/platform/security/network-policies/global-project-networkpolicy.yaml b/infrastructure/platform/security/network-policies/global-project-networkpolicy.yaml new file mode 100644 index 00000000..f048a749 --- /dev/null +++ b/infrastructure/platform/security/network-policies/global-project-networkpolicy.yaml @@ -0,0 +1,159 @@ +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: project-default-deny + namespace: bakery-ia + labels: + app: project-global + component: network-policy + tier: security +spec: + podSelector: {} + policyTypes: + - Ingress + - Egress + ingress: [] + egress: [] +--- +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: project-allow-dns + namespace: bakery-ia + labels: + app: project-global + component: network-policy + tier: security +spec: + podSelector: {} + policyTypes: + - Egress + egress: + # Allow DNS resolution to kube-system namespace + - to: + - namespaceSelector: + matchLabels: + name: kube-system + ports: + - port: 53 + protocol: UDP + - port: 53 + protocol: TCP +--- +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: project-allow-ingress-access + namespace: bakery-ia + labels: + app: project-global + component: network-policy + tier: security +spec: + podSelector: + matchLabels: + app.kubernetes.io/name: ingress-nginx + policyTypes: + - Ingress + ingress: + # Allow all traffic to ingress controller + - from: + - ipBlock: + cidr: 0.0.0.0/0 +--- +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: project-allow-internal-comm + namespace: bakery-ia + labels: + app: project-global + component: network-policy + tier: security +spec: + podSelector: {} + policyTypes: + - Ingress + - Egress + ingress: + # Allow communication between project services + - from: + - namespaceSelector: + matchLabels: + name: bakery-ia + egress: + # Allow communication to project services + - to: + - namespaceSelector: + matchLabels: + name: bakery-ia +--- +apiVersion: 
networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: project-allow-monitoring + namespace: bakery-ia + labels: + app: project-global + component: network-policy + tier: security +spec: + podSelector: + matchLabels: + app: signoz + policyTypes: + - Ingress + ingress: + # Allow monitoring access from project services + - from: + - namespaceSelector: + matchLabels: + name: bakery-ia +--- +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: project-allow-database-access + namespace: bakery-ia + labels: + app: project-global + component: network-policy + tier: security +spec: + podSelector: + matchLabels: + app: postgres + policyTypes: + - Ingress + ingress: + # Allow database access from application services + - from: + - namespaceSelector: + matchLabels: + name: bakery-ia + ports: + - port: 5432 +--- +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: project-allow-cache-access + namespace: bakery-ia + labels: + app: project-global + component: network-policy + tier: security +spec: + podSelector: + matchLabels: + app: redis + policyTypes: + - Ingress + ingress: + # Allow cache access from application services + - from: + - namespaceSelector: + matchLabels: + name: bakery-ia + ports: + - port: 6379 \ No newline at end of file diff --git a/infrastructure/platform/storage/kustomization.yaml b/infrastructure/platform/storage/kustomization.yaml new file mode 100644 index 00000000..18dd3b8d --- /dev/null +++ b/infrastructure/platform/storage/kustomization.yaml @@ -0,0 +1,19 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + # Storage infrastructure + - minio/minio-deployment.yaml + - minio/minio-pvc.yaml + - minio/minio-secrets.yaml + - minio/minio-bucket-init-job.yaml + - minio/secrets/minio-tls-secret.yaml + + # Cache infrastructure + - redis/redis.yaml + - redis/secrets/redis-tls-secret.yaml + + # Database infrastructure + - postgres/secrets/postgres-tls-secret.yaml + - postgres/configs/postgres-logging-config.yaml + - postgres/configs/postgres-init-config.yaml diff --git a/infrastructure/kubernetes/base/jobs/minio-bucket-init-job.yaml b/infrastructure/platform/storage/minio/minio-bucket-init-job.yaml similarity index 100% rename from infrastructure/kubernetes/base/jobs/minio-bucket-init-job.yaml rename to infrastructure/platform/storage/minio/minio-bucket-init-job.yaml diff --git a/infrastructure/kubernetes/base/components/minio/minio-deployment.yaml b/infrastructure/platform/storage/minio/minio-deployment.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/minio/minio-deployment.yaml rename to infrastructure/platform/storage/minio/minio-deployment.yaml diff --git a/infrastructure/kubernetes/base/components/minio/minio-pvc.yaml b/infrastructure/platform/storage/minio/minio-pvc.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/minio/minio-pvc.yaml rename to infrastructure/platform/storage/minio/minio-pvc.yaml diff --git a/infrastructure/kubernetes/base/components/minio/minio-secrets.yaml b/infrastructure/platform/storage/minio/minio-secrets.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/minio/minio-secrets.yaml rename to infrastructure/platform/storage/minio/minio-secrets.yaml diff --git a/infrastructure/kubernetes/base/secrets/minio-tls-secret.yaml b/infrastructure/platform/storage/minio/secrets/minio-tls-secret.yaml similarity index 100% rename from infrastructure/kubernetes/base/secrets/minio-tls-secret.yaml rename 
to infrastructure/platform/storage/minio/secrets/minio-tls-secret.yaml diff --git a/infrastructure/kubernetes/base/configs/postgres-init-config.yaml b/infrastructure/platform/storage/postgres/configs/postgres-init-config.yaml similarity index 100% rename from infrastructure/kubernetes/base/configs/postgres-init-config.yaml rename to infrastructure/platform/storage/postgres/configs/postgres-init-config.yaml diff --git a/infrastructure/kubernetes/base/configmaps/postgres-logging-config.yaml b/infrastructure/platform/storage/postgres/configs/postgres-logging-config.yaml similarity index 100% rename from infrastructure/kubernetes/base/configmaps/postgres-logging-config.yaml rename to infrastructure/platform/storage/postgres/configs/postgres-logging-config.yaml diff --git a/infrastructure/kubernetes/base/components/databases/postgres-template.yaml b/infrastructure/platform/storage/postgres/postgres-template.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/databases/postgres-template.yaml rename to infrastructure/platform/storage/postgres/postgres-template.yaml diff --git a/infrastructure/kubernetes/base/secrets/postgres-tls-secret.yaml b/infrastructure/platform/storage/postgres/secrets/postgres-tls-secret.yaml similarity index 100% rename from infrastructure/kubernetes/base/secrets/postgres-tls-secret.yaml rename to infrastructure/platform/storage/postgres/secrets/postgres-tls-secret.yaml diff --git a/infrastructure/kubernetes/base/components/databases/redis.yaml b/infrastructure/platform/storage/redis/redis.yaml similarity index 92% rename from infrastructure/kubernetes/base/components/databases/redis.yaml rename to infrastructure/platform/storage/redis/redis.yaml index 53b50a21..ec2c1bec 100644 --- a/infrastructure/kubernetes/base/components/databases/redis.yaml +++ b/infrastructure/platform/storage/redis/redis.yaml @@ -25,7 +25,7 @@ spec: fsGroup: 999 # redis group initContainers: - name: fix-tls-permissions - image: busybox:latest + image: busybox:1.36 securityContext: runAsUser: 0 command: ['sh', '-c'] @@ -47,7 +47,9 @@ spec: image: redis:7.4-alpine ports: - containerPort: 6379 - name: redis + name: redis-tls + - containerPort: 6380 + name: redis-plain env: - name: REDIS_PASSWORD valueFrom: @@ -64,10 +66,12 @@ spec: - "512mb" - --databases - "16" + # TLS port for external/secure connections - --tls-port - "6379" + # Plain TCP port for internal cluster services (Mailu) - --port - - "0" + - "6380" - --tls-cert-file - /tls/redis-cert.pem - --tls-key-file @@ -149,7 +153,11 @@ spec: - port: 6379 targetPort: 6379 protocol: TCP - name: redis + name: redis-tls + - port: 6380 + targetPort: 6380 + protocol: TCP + name: redis-plain selector: app.kubernetes.io/name: redis app.kubernetes.io/component: cache diff --git a/infrastructure/kubernetes/base/secrets/redis-tls-secret.yaml b/infrastructure/platform/storage/redis/secrets/redis-tls-secret.yaml similarity index 100% rename from infrastructure/kubernetes/base/secrets/redis-tls-secret.yaml rename to infrastructure/platform/storage/redis/secrets/redis-tls-secret.yaml diff --git a/infrastructure/scripts/deployment/deploy-signoz.sh b/infrastructure/scripts/deployment/deploy-signoz.sh new file mode 100755 index 00000000..0af5ab50 --- /dev/null +++ b/infrastructure/scripts/deployment/deploy-signoz.sh @@ -0,0 +1,392 @@ +#!/bin/bash + +# ============================================================================ +# SigNoz Deployment Script for Bakery IA +# 
============================================================================ +# This script deploys SigNoz monitoring stack using Helm +# Supports both development and production environments +# ============================================================================ + +set -e + +# Color codes for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +NC='\033[0m' # No Color + +# Function to display help +show_help() { + echo "Usage: $0 [OPTIONS] ENVIRONMENT" + echo "" + echo "Deploy SigNoz monitoring stack for Bakery IA" + echo "" + echo "Arguments: + ENVIRONMENT Environment to deploy to (dev|prod)" + echo "" + echo "Options: + -h, --help Show this help message + -d, --dry-run Dry run - show what would be done without actually deploying + -u, --upgrade Upgrade existing deployment + -r, --remove Remove/Uninstall SigNoz deployment + -n, --namespace NAMESPACE Specify namespace (default: bakery-ia)" + echo "" + echo "Examples: + $0 dev # Deploy to development + $0 prod # Deploy to production + $0 --upgrade prod # Upgrade production deployment + $0 --remove dev # Remove development deployment" + echo "" + echo "Docker Hub Authentication:" + echo " This script automatically creates a Docker Hub secret for image pulls." + echo " Provide credentials via environment variables (recommended):" + echo " export DOCKERHUB_USERNAME='your-username'" + echo " export DOCKERHUB_PASSWORD='your-personal-access-token'" + echo " Or ensure you're logged in with Docker CLI:" + echo " docker login" +} + +# Parse command line arguments +DRY_RUN=false +UPGRADE=false +REMOVE=false +NAMESPACE="bakery-ia" + +while [[ $# -gt 0 ]]; do + case $1 in + -h|--help) + show_help + exit 0 + ;; + -d|--dry-run) + DRY_RUN=true + shift + ;; + -u|--upgrade) + UPGRADE=true + shift + ;; + -r|--remove) + REMOVE=true + shift + ;; + -n|--namespace) + NAMESPACE="$2" + shift 2 + ;; + dev|prod) + ENVIRONMENT="$1" + shift + ;; + *) + echo "Unknown argument: $1" + show_help + exit 1 + ;; + esac +done + +# Validate environment +if [[ -z "$ENVIRONMENT" ]]; then + echo "Error: Environment not specified. Use 'dev' or 'prod'." + show_help + exit 1 +fi + +if [[ "$ENVIRONMENT" != "dev" && "$ENVIRONMENT" != "prod" ]]; then + echo "Error: Invalid environment. Use 'dev' or 'prod'." + exit 1 +fi + +# Function to check if Helm is installed +check_helm() { + if ! command -v helm &> /dev/null; then + echo "${RED}Error: Helm is not installed. Please install Helm first.${NC}" + echo "Installation instructions: https://helm.sh/docs/intro/install/" + exit 1 + fi +} + +# Function to check if kubectl is configured +check_kubectl() { + if ! kubectl cluster-info &> /dev/null; then + echo "${RED}Error: kubectl is not configured or cannot connect to cluster.${NC}" + echo "Please ensure you have access to a Kubernetes cluster." + exit 1 + fi +} + +# Function to check if namespace exists, create if not +ensure_namespace() { + if ! 
kubectl get namespace "$NAMESPACE" &> /dev/null; then + echo "${BLUE}Creating namespace $NAMESPACE...${NC}" + if [[ "$DRY_RUN" == true ]]; then + echo " (dry-run) Would create namespace $NAMESPACE" + else + kubectl create namespace "$NAMESPACE" + echo "${GREEN}Namespace $NAMESPACE created.${NC}" + fi + else + echo "${BLUE}Namespace $NAMESPACE already exists.${NC}" + fi +} + +# Function to create Docker Hub secret for image pulls +create_dockerhub_secret() { + echo "${BLUE}Setting up Docker Hub image pull secret...${NC}" + + if [[ "$DRY_RUN" == true ]]; then + echo " (dry-run) Would create Docker Hub secret in namespace $NAMESPACE" + return + fi + + # Check if secret already exists + if kubectl get secret dockerhub-creds -n "$NAMESPACE" &> /dev/null; then + echo "${GREEN}Docker Hub secret already exists in namespace $NAMESPACE.${NC}" + return + fi + + # Check if Docker Hub credentials are available + if [[ -n "$DOCKERHUB_USERNAME" ]] && [[ -n "$DOCKERHUB_PASSWORD" ]]; then + echo "${BLUE}Found DOCKERHUB_USERNAME and DOCKERHUB_PASSWORD environment variables${NC}" + + kubectl create secret docker-registry dockerhub-creds \ + --docker-server=https://index.docker.io/v1/ \ + --docker-username="$DOCKERHUB_USERNAME" \ + --docker-password="$DOCKERHUB_PASSWORD" \ + --docker-email="${DOCKERHUB_EMAIL:-noreply@bakery-ia.local}" \ + -n "$NAMESPACE" + + echo "${GREEN}Docker Hub secret created successfully.${NC}" + + elif [[ -f "$HOME/.docker/config.json" ]]; then + echo "${BLUE}Attempting to use Docker CLI credentials...${NC}" + + # Try to extract credentials from Docker config + if grep -q "credsStore" "$HOME/.docker/config.json"; then + echo "${YELLOW}Docker is using a credential store. Please set environment variables:${NC}" + echo " export DOCKERHUB_USERNAME='your-username'" + echo " export DOCKERHUB_PASSWORD='your-password-or-token'" + echo "${YELLOW}Continuing without Docker Hub authentication...${NC}" + return + fi + + # Try to extract from base64 encoded auth + AUTH=$(cat "$HOME/.docker/config.json" | jq -r '.auths["https://index.docker.io/v1/"].auth // empty' 2>/dev/null) + if [[ -n "$AUTH" ]]; then + echo "${GREEN}Found Docker Hub credentials in Docker config${NC}" + local DOCKER_USERNAME=$(echo "$AUTH" | base64 -d | cut -d: -f1) + local DOCKER_PASSWORD=$(echo "$AUTH" | base64 -d | cut -d: -f2-) + + kubectl create secret docker-registry dockerhub-creds \ + --docker-server=https://index.docker.io/v1/ \ + --docker-username="$DOCKER_USERNAME" \ + --docker-password="$DOCKER_PASSWORD" \ + --docker-email="${DOCKERHUB_EMAIL:-noreply@bakery-ia.local}" \ + -n "$NAMESPACE" + + echo "${GREEN}Docker Hub secret created successfully.${NC}" + else + echo "${YELLOW}Could not find Docker Hub credentials${NC}" + echo "${YELLOW}To enable automatic Docker Hub authentication:${NC}" + echo " 1. Run 'docker login', OR" + echo " 2. Set environment variables:" + echo " export DOCKERHUB_USERNAME='your-username'" + echo " export DOCKERHUB_PASSWORD='your-password-or-token'" + echo "${YELLOW}Continuing without Docker Hub authentication...${NC}" + fi + else + echo "${YELLOW}Docker Hub credentials not found${NC}" + echo "${YELLOW}To enable automatic Docker Hub authentication:${NC}" + echo " 1. Run 'docker login', OR" + echo " 2. 
Set environment variables:" + echo " export DOCKERHUB_USERNAME='your-username'" + echo " export DOCKERHUB_PASSWORD='your-password-or-token'" + echo "${YELLOW}Continuing without Docker Hub authentication...${NC}" + fi + echo "" +} + +# Function to add and update Helm repository +setup_helm_repo() { + echo "${BLUE}Setting up SigNoz Helm repository...${NC}" + + if [[ "$DRY_RUN" == true ]]; then + echo " (dry-run) Would add SigNoz Helm repository" + return + fi + + # Add SigNoz Helm repository + if helm repo list | grep -q "^signoz"; then + echo "${BLUE}SigNoz repository already added, updating...${NC}" + helm repo update signoz + else + echo "${BLUE}Adding SigNoz Helm repository...${NC}" + helm repo add signoz https://charts.signoz.io + helm repo update + fi + + echo "${GREEN}Helm repository ready.${NC}" + echo "" +} + +# Function to deploy SigNoz +deploy_signoz() { + local values_file="infrastructure/helm/signoz-values-$ENVIRONMENT.yaml" + + if [[ ! -f "$values_file" ]]; then + echo "${RED}Error: Values file $values_file not found.${NC}" + exit 1 + fi + + echo "${BLUE}Deploying SigNoz to $ENVIRONMENT environment...${NC}" + echo " Using values file: $values_file" + echo " Target namespace: $NAMESPACE" + echo " Chart version: Latest from signoz/signoz" + + if [[ "$DRY_RUN" == true ]]; then + echo " (dry-run) Would deploy SigNoz with:" + echo " helm upgrade --install signoz signoz/signoz -n $NAMESPACE -f $values_file --wait --timeout 15m" + return + fi + + # Use upgrade --install to handle both new installations and upgrades + echo "${BLUE}Installing/Upgrading SigNoz...${NC}" + echo "This may take 10-15 minutes..." + + helm upgrade --install signoz signoz/signoz \ + -n "$NAMESPACE" \ + -f "$values_file" \ + --wait \ + --timeout 15m \ + --create-namespace + + echo "${GREEN}SigNoz deployment completed.${NC}" + echo "" + + # Show deployment status + show_deployment_status +} + +# Function to remove SigNoz +remove_signoz() { + echo "${BLUE}Removing SigNoz deployment from namespace $NAMESPACE...${NC}" + + if [[ "$DRY_RUN" == true ]]; then + echo " (dry-run) Would remove SigNoz deployment" + return + fi + + if helm list -n "$NAMESPACE" | grep -q signoz; then + helm uninstall signoz -n "$NAMESPACE" --wait + echo "${GREEN}SigNoz deployment removed.${NC}" + + # Optionally remove PVCs (commented out by default for safety) + echo "" + echo "${YELLOW}Note: Persistent Volume Claims (PVCs) were NOT deleted.${NC}" + echo "To delete PVCs and all data, run:" + echo " kubectl delete pvc -n $NAMESPACE -l app.kubernetes.io/instance=signoz" + else + echo "${YELLOW}No SigNoz deployment found in namespace $NAMESPACE.${NC}" + fi +} + +# Function to show deployment status +show_deployment_status() { + echo "" + echo "${BLUE}=== SigNoz Deployment Status ===${NC}" + echo "" + + # Get pods + echo "Pods:" + kubectl get pods -n "$NAMESPACE" -l app.kubernetes.io/instance=signoz + echo "" + + # Get services + echo "Services:" + kubectl get svc -n "$NAMESPACE" -l app.kubernetes.io/instance=signoz + echo "" + + # Get ingress + echo "Ingress:" + kubectl get ingress -n "$NAMESPACE" -l app.kubernetes.io/instance=signoz + echo "" + + # Show access information + show_access_info +} + +# Function to show access information +show_access_info() { + echo "${BLUE}=== Access Information ===${NC}" + + if [[ "$ENVIRONMENT" == "dev" ]]; then + echo "SigNoz UI: http://monitoring.bakery-ia.local" + echo "" + echo "OpenTelemetry Collector Endpoints (from within cluster):" + echo " gRPC: signoz-otel-collector.$NAMESPACE.svc.cluster.local:4317" + 
echo " HTTP: signoz-otel-collector.$NAMESPACE.svc.cluster.local:4318" + echo "" + echo "Port-forward for local access:" + echo " kubectl port-forward -n $NAMESPACE svc/signoz 8080:8080" + echo " kubectl port-forward -n $NAMESPACE svc/signoz-otel-collector 4317:4317" + echo " kubectl port-forward -n $NAMESPACE svc/signoz-otel-collector 4318:4318" + else + echo "SigNoz UI: https://monitoring.bakewise.ai" + echo "" + echo "OpenTelemetry Collector Endpoints (from within cluster):" + echo " gRPC: signoz-otel-collector.$NAMESPACE.svc.cluster.local:4317" + echo " HTTP: signoz-otel-collector.$NAMESPACE.svc.cluster.local:4318" + echo "" + echo "External endpoints (if exposed):" + echo " Check ingress configuration for external OTLP endpoints" + fi + + echo "" + echo "Default credentials:" + echo " Username: admin@example.com" + echo " Password: admin" + echo "" + echo "Note: Change default password after first login!" + echo "" +} + +# Main execution +main() { + echo "${BLUE}" + echo "==========================================" + echo "🚀 SigNoz Deployment for Bakery IA" + echo "==========================================" + echo "${NC}" + + # Check prerequisites + check_helm + check_kubectl + + # Ensure namespace + ensure_namespace + + if [[ "$REMOVE" == true ]]; then + remove_signoz + exit 0 + fi + + # Setup Helm repository + setup_helm_repo + + # Create Docker Hub secret for image pulls + create_dockerhub_secret + + # Deploy SigNoz + deploy_signoz + + echo "${GREEN}" + echo "==========================================" + echo "✅ SigNoz deployment completed!" + echo "==========================================" + echo "${NC}" +} + +# Run main function +main \ No newline at end of file diff --git a/infrastructure/scripts/maintenance/apply-security-changes.sh b/infrastructure/scripts/maintenance/apply-security-changes.sh new file mode 100755 index 00000000..98f6e56b --- /dev/null +++ b/infrastructure/scripts/maintenance/apply-security-changes.sh @@ -0,0 +1,168 @@ +#!/usr/bin/env bash + +# Apply all database security changes to Kubernetes cluster + +set -e + +NAMESPACE="bakery-ia" + +echo "======================================" +echo "Bakery IA Database Security Deployment" +echo "======================================" +echo "" +echo "This script will apply all security changes to the cluster:" +echo " 1. Updated passwords" +echo " 2. TLS certificates for PostgreSQL and Redis" +echo " 3. Updated database deployments with TLS and PVCs" +echo " 4. PostgreSQL logging configuration" +echo " 5. pgcrypto extension" +echo "" +read -p "Press Enter to continue or Ctrl+C to cancel..." +echo "" + +# ===== 1. Apply Secrets ===== +echo "Step 1: Applying updated secrets..." +kubectl apply -f infrastructure/kubernetes/base/secrets.yaml +kubectl apply -f infrastructure/kubernetes/base/secrets/postgres-tls-secret.yaml +kubectl apply -f infrastructure/kubernetes/base/secrets/redis-tls-secret.yaml +echo "✓ Secrets applied" +echo "" + +# ===== 2. Apply ConfigMaps ===== +echo "Step 2: Applying ConfigMaps..." +kubectl apply -f infrastructure/kubernetes/base/configs/postgres-init-config.yaml +kubectl apply -f infrastructure/kubernetes/base/configmaps/postgres-logging-config.yaml +echo "✓ ConfigMaps applied" +echo "" + +# ===== 3. Apply Database Deployments ===== +echo "Step 3: Applying database deployments..." +kubectl apply -f infrastructure/kubernetes/base/components/databases/ +echo "✓ Database deployments applied" +echo "" + +# ===== 4. Wait for Rollout ===== +echo "Step 4: Waiting for database pods to be ready..." 
+ +DBS=( + "auth-db" + "tenant-db" + "training-db" + "forecasting-db" + "sales-db" + "external-db" + "notification-db" + "inventory-db" + "recipes-db" + "suppliers-db" + "pos-db" + "orders-db" + "production-db" + "alert-processor-db" + "redis" +) + +for db in "${DBS[@]}"; do + echo " Waiting for $db..." + kubectl rollout status deployment/$db -n $NAMESPACE --timeout=5m || echo " ⚠️ Warning: $db rollout may have issues" +done + +echo "✓ All deployments rolled out" +echo "" + +# ===== 5. Verify PVCs ===== +echo "Step 5: Verifying PersistentVolumeClaims..." +kubectl get pvc -n $NAMESPACE +echo "" + +# ===== 6. Test Database Connections ===== +echo "Step 6: Testing database connectivity..." + +# Test PostgreSQL with TLS +echo " Testing PostgreSQL (auth-db) with TLS..." +AUTH_POD=$(kubectl get pods -n $NAMESPACE -l app.kubernetes.io/name=auth-db -o jsonpath='{.items[0].metadata.name}') +if [ -n "$AUTH_POD" ]; then + kubectl exec -n $NAMESPACE "$AUTH_POD" -- \ + sh -c 'psql -U $POSTGRES_USER -d $POSTGRES_DB -c "SELECT version();"' > /dev/null 2>&1 && \ + echo " ✓ PostgreSQL connection successful" || \ + echo " ⚠️ PostgreSQL connection test failed" +else + echo " ⚠️ auth-db pod not found" +fi + +# Test Redis with TLS +echo " Testing Redis with TLS..." +REDIS_POD=$(kubectl get pods -n $NAMESPACE -l app.kubernetes.io/name=redis -o jsonpath='{.items[0].metadata.name}') +if [ -n "$REDIS_POD" ]; then + kubectl exec -n $NAMESPACE "$REDIS_POD" -- \ + redis-cli -a $(kubectl get secret redis-secrets -n $NAMESPACE -o jsonpath='{.data.REDIS_PASSWORD}' | base64 -d) \ + --tls --cert /tls/redis-cert.pem --key /tls/redis-key.pem --cacert /tls/ca-cert.pem \ + PING > /dev/null 2>&1 && \ + echo " ✓ Redis TLS connection successful" || \ + echo " ⚠️ Redis TLS connection test failed (may need to restart services)" +else + echo " ⚠️ Redis pod not found" +fi + +echo "" + +# ===== 7. Verify TLS Certificates ===== +echo "Step 7: Verifying TLS certificates are mounted..." + +echo " Checking PostgreSQL TLS certs..." +if [ -n "$AUTH_POD" ]; then + kubectl exec -n $NAMESPACE "$AUTH_POD" -- ls -la /tls/ 2>/dev/null && \ + echo " ✓ PostgreSQL TLS certificates mounted" || \ + echo " ⚠️ PostgreSQL TLS certificates not found" +fi + +echo " Checking Redis TLS certs..." +if [ -n "$REDIS_POD" ]; then + kubectl exec -n $NAMESPACE "$REDIS_POD" -- ls -la /tls/ 2>/dev/null && \ + echo " ✓ Redis TLS certificates mounted" || \ + echo " ⚠️ Redis TLS certificates not found" +fi + +echo "" + +# ===== 8. Display Summary ===== +echo "======================================" +echo "Deployment Summary" +echo "======================================" +echo "" +echo "Database Pods:" +kubectl get pods -n $NAMESPACE -l app.kubernetes.io/component=database +echo "" +echo "PersistentVolumeClaims:" +kubectl get pvc -n $NAMESPACE | grep -E "NAME|db-pvc" +echo "" +echo "Secrets:" +kubectl get secrets -n $NAMESPACE | grep -E "NAME|database-secrets|redis-secrets|postgres-tls|redis-tls" +echo "" + +echo "======================================" +echo "✓ Security Deployment Complete!" +echo "======================================" +echo "" +echo "Security improvements applied:" +echo " ✅ Strong 32-character passwords for all databases" +echo " ✅ TLS encryption for PostgreSQL connections" +echo " ✅ TLS encryption for Redis connections" +echo " ✅ Persistent storage (PVCs) for all databases" +echo " ✅ pgcrypto extension enabled for column-level encryption" +echo " ✅ PostgreSQL audit logging configured" +echo "" +echo "Next steps:" +echo " 1. 
Restart all services to pick up new database URLs with TLS" +echo " 2. Monitor logs for any connection issues" +echo " 3. Test application functionality end-to-end" +echo " 4. Review PostgreSQL logs: kubectl logs -n $NAMESPACE " +echo "" +echo "To create encrypted backups, run:" +echo " ./scripts/encrypted-backup.sh" +echo "" +echo "To enable Kubernetes secrets encryption (requires cluster recreate):" +echo " kind delete cluster --name bakery-ia-local" +echo " kind create cluster --config kind-config.yaml" +echo " kubectl apply -f infrastructure/kubernetes/base/namespace.yaml" +echo " ./scripts/apply-security-changes.sh" diff --git a/infrastructure/scripts/maintenance/backup-databases.sh b/infrastructure/scripts/maintenance/backup-databases.sh new file mode 100755 index 00000000..7ab8e90b --- /dev/null +++ b/infrastructure/scripts/maintenance/backup-databases.sh @@ -0,0 +1,161 @@ +#!/bin/bash + +# Database Backup Script for Bakery IA +# This script backs up all PostgreSQL databases in the Kubernetes cluster +# Designed to run on the VPS via cron + +set -e + +# Configuration +BACKUP_ROOT="/backups" +NAMESPACE="bakery-ia" +RETENTION_DAYS=7 +DATE=$(date +%Y-%m-%d_%H-%M-%S) +BACKUP_DIR="${BACKUP_ROOT}/${DATE}" + +# Colors +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +NC='\033[0m' + +# Logging +log() { + echo "[$(date +'%Y-%m-%d %H:%M:%S')] $1" +} + +log_error() { + echo -e "${RED}[$(date +'%Y-%m-%d %H:%M:%S')] ERROR: $1${NC}" +} + +log_success() { + echo -e "${GREEN}[$(date +'%Y-%m-%d %H:%M:%S')] SUCCESS: $1${NC}" +} + +log_warning() { + echo -e "${YELLOW}[$(date +'%Y-%m-%d %H:%M:%S')] WARNING: $1${NC}" +} + +# Create backup directory +mkdir -p "$BACKUP_DIR" + +log "Starting database backup to $BACKUP_DIR" + +# Get all database pods +DB_PODS=$(kubectl get pods -n "$NAMESPACE" -l app.kubernetes.io/component=database -o jsonpath='{.items[*].metadata.name}') + +if [ -z "$DB_PODS" ]; then + log_error "No database pods found in namespace $NAMESPACE" + exit 1 +fi + +log "Found database pods: $DB_PODS" + +# Backup counter +SUCCESS_COUNT=0 +FAILED_COUNT=0 +FAILED_DBS=() + +# Backup each database +for pod in $DB_PODS; do + log "Backing up database: $pod" + + # Get database name from pod labels + DB_NAME=$(kubectl get pod "$pod" -n "$NAMESPACE" -o jsonpath='{.metadata.labels.app\.kubernetes\.io/name}') + + if [ -z "$DB_NAME" ]; then + DB_NAME=$pod + fi + + BACKUP_FILE="${BACKUP_DIR}/${DB_NAME}.sql" + + # Perform backup + if kubectl exec -n "$NAMESPACE" "$pod" -- pg_dumpall -U postgres > "$BACKUP_FILE" 2>/dev/null; then + FILE_SIZE=$(du -h "$BACKUP_FILE" | cut -f1) + log_success "Backed up $DB_NAME ($FILE_SIZE)" + ((SUCCESS_COUNT++)) + else + log_error "Failed to backup $DB_NAME" + FAILED_DBS+=("$DB_NAME") + ((FAILED_COUNT++)) + rm -f "$BACKUP_FILE" # Remove partial backup + fi +done + +# Also backup Redis if present +REDIS_POD=$(kubectl get pods -n "$NAMESPACE" -l app.kubernetes.io/name=redis -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || echo "") + +if [ -n "$REDIS_POD" ]; then + log "Backing up Redis: $REDIS_POD" + REDIS_BACKUP="${BACKUP_DIR}/redis.rdb" + + if kubectl exec -n "$NAMESPACE" "$REDIS_POD" -- redis-cli --rdb /tmp/dump.rdb SAVE > /dev/null 2>&1 && \ + kubectl cp "$NAMESPACE/$REDIS_POD:/tmp/dump.rdb" "$REDIS_BACKUP" > /dev/null 2>&1; then + FILE_SIZE=$(du -h "$REDIS_BACKUP" | cut -f1) + log_success "Backed up Redis ($FILE_SIZE)" + ((SUCCESS_COUNT++)) + else + log_warning "Failed to backup Redis (non-critical)" + fi +fi + +# Create backup metadata +cat > 
"${BACKUP_DIR}/backup-info.txt" </dev/null) + if [ -n "$pod_name" ]; then + echo "$pod_name" + return 0 + fi + done + + echo "" + return 1 +} + +echo "" +echo -e "${BLUE}Starting database cleanup...${NC}" +echo "" + +for service in "${SERVICES[@]}"; do + echo -e "${BLUE}----------------------------------------${NC}" + echo -e "${BLUE}Cleaning: $service${NC}" + echo -e "${BLUE}----------------------------------------${NC}" + + # Find pod + POD_NAME=$(get_running_pod "$service") + if [ -z "$POD_NAME" ]; then + echo -e "${RED}✗ No running pod found, skipping...${NC}" + FAILED_COUNT=$((FAILED_COUNT + 1)) + FAILED_SERVICES+=("$service (pod not found)") + continue + fi + + echo -e "${GREEN}✓ Found pod: $POD_NAME${NC}" + + # Get database URL environment variable name + db_env_var=$(echo "$service" | tr '[:lower:]-' '[:upper:]_')_DATABASE_URL + CONTAINER="${service}-service" + + # Drop all tables + CLEANUP_RESULT=$(kubectl exec -n "$NAMESPACE" "$POD_NAME" -c "$CONTAINER" -- sh -c "cd /app && PYTHONPATH=/app:/app/shared:\$PYTHONPATH python3 << 'EOFPYTHON' +import asyncio +import os +from sqlalchemy.ext.asyncio import create_async_engine +from sqlalchemy import text + +async def cleanup_database(): + try: + engine = create_async_engine(os.getenv('$db_env_var')) + + # Get list of tables before cleanup + async with engine.connect() as conn: + result = await conn.execute(text(\"\"\" + SELECT COUNT(*) + FROM pg_tables + WHERE schemaname = 'public' + \"\"\")) + table_count_before = result.scalar() + + # Drop and recreate public schema + async with engine.begin() as conn: + await conn.execute(text('DROP SCHEMA IF EXISTS public CASCADE')) + await conn.execute(text('CREATE SCHEMA public')) + await conn.execute(text('GRANT ALL ON SCHEMA public TO PUBLIC')) + + # Verify cleanup + async with engine.connect() as conn: + result = await conn.execute(text(\"\"\" + SELECT COUNT(*) + FROM pg_tables + WHERE schemaname = 'public' + \"\"\")) + table_count_after = result.scalar() + + await engine.dispose() + print(f'SUCCESS: Dropped {table_count_before} tables, {table_count_after} remaining') + return 0 + except Exception as e: + print(f'ERROR: {str(e)}') + return 1 + +exit(asyncio.run(cleanup_database())) +EOFPYTHON +" 2>&1) + + if echo "$CLEANUP_RESULT" | grep -q "SUCCESS"; then + echo -e "${GREEN}✓ Database cleaned successfully${NC}" + echo -e "${BLUE} $CLEANUP_RESULT${NC}" + SUCCESS_COUNT=$((SUCCESS_COUNT + 1)) + else + echo -e "${RED}✗ Database cleanup failed${NC}" + echo -e "${YELLOW}Error details: $CLEANUP_RESULT${NC}" + FAILED_COUNT=$((FAILED_COUNT + 1)) + FAILED_SERVICES+=("$service (cleanup failed)") + fi + echo "" +done + +# Summary +echo -e "${BLUE}========================================${NC}" +echo -e "${BLUE}Summary${NC}" +echo -e "${BLUE}========================================${NC}" +echo -e "${GREEN}✓ Successfully cleaned: $SUCCESS_COUNT databases${NC}" +echo -e "${RED}✗ Failed: $FAILED_COUNT databases${NC}" + +if [ "$FAILED_COUNT" -gt 0 ]; then + echo "" + echo -e "${RED}Failed services:${NC}" + for failed_service in "${FAILED_SERVICES[@]}"; do + echo -e "${RED} - $failed_service${NC}" + done +fi + +echo "" +if [ "$SUCCESS_COUNT" -gt 0 ]; then + echo -e "${GREEN}Databases are now clean and ready for migration generation!${NC}" + echo -e "${YELLOW}Next step: Run ./regenerate_migrations_k8s.sh${NC}" +fi diff --git a/infrastructure/scripts/maintenance/deploy-production.sh b/infrastructure/scripts/maintenance/deploy-production.sh new file mode 100755 index 00000000..16c2eb79 --- /dev/null +++ 
b/infrastructure/scripts/maintenance/deploy-production.sh @@ -0,0 +1,190 @@ +#!/bin/bash + +# Production Deployment Script for MicroK8s +# This script helps deploy Bakery IA to a MicroK8s cluster + +set -e + +# Colors for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +NC='\033[0m' # No Color + +echo -e "${GREEN}========================================${NC}" +echo -e "${GREEN}Bakery IA - Production Deployment${NC}" +echo -e "${GREEN}========================================${NC}" +echo "" + +# Configuration +NAMESPACE="bakery-ia" +KUSTOMIZE_PATH="infrastructure/environments/prod/k8s-manifests" + +# Check if kubectl is available +if ! command -v kubectl &> /dev/null; then + echo -e "${RED}Error: kubectl not found. Please install kubectl or setup microk8s alias.${NC}" + exit 1 +fi + +# Function to check if cluster is accessible +check_cluster() { + echo -e "${YELLOW}Checking cluster connectivity...${NC}" + if ! kubectl cluster-info &> /dev/null; then + echo -e "${RED}Error: Cannot connect to Kubernetes cluster.${NC}" + echo "Please ensure your kubeconfig is set correctly." + exit 1 + fi + echo -e "${GREEN}✓ Cluster connection successful${NC}" + echo "" +} + +# Function to check required addons +check_addons() { + echo -e "${YELLOW}Checking required MicroK8s addons...${NC}" + + # Check if this is MicroK8s + if command -v microk8s &> /dev/null; then + REQUIRED_ADDONS=("dns" "hostpath-storage" "ingress" "cert-manager" "metrics-server") + + for addon in "${REQUIRED_ADDONS[@]}"; do + if microk8s status | grep -q "$addon: enabled"; then + echo -e "${GREEN}✓ $addon enabled${NC}" + else + echo -e "${RED}✗ $addon not enabled${NC}" + echo -e "${YELLOW}Enable with: microk8s enable $addon${NC}" + exit 1 + fi + done + else + echo -e "${YELLOW}Not running on MicroK8s. Skipping addon check.${NC}" + fi + echo "" +} + +# Function to create namespace +create_namespace() { + echo -e "${YELLOW}Creating namespace...${NC}" + if kubectl get namespace $NAMESPACE &> /dev/null; then + echo -e "${GREEN}✓ Namespace $NAMESPACE already exists${NC}" + else + kubectl create namespace $NAMESPACE + echo -e "${GREEN}✓ Namespace $NAMESPACE created${NC}" + fi + echo "" +} + +# Function to apply secrets +apply_secrets() { + echo -e "${YELLOW}Applying secrets...${NC}" + echo -e "${RED}WARNING: Ensure production secrets are updated before deployment!${NC}" + read -p "Have you updated production secrets? (yes/no): " confirm + + if [ "$confirm" != "yes" ]; then + echo -e "${RED}Deployment cancelled. Please update secrets first.${NC}" + exit 1 + fi + + kubectl apply -f infrastructure/kubernetes/base/secrets.yaml + kubectl apply -f infrastructure/kubernetes/base/secrets/postgres-tls-secret.yaml + kubectl apply -f infrastructure/kubernetes/base/secrets/redis-tls-secret.yaml + kubectl apply -f infrastructure/kubernetes/base/secrets/demo-internal-api-key-secret.yaml + echo -e "${GREEN}✓ Secrets applied${NC}" + echo "" +} + +# Function to apply kustomization +deploy_application() { + echo -e "${YELLOW}Deploying application...${NC}" + kubectl apply -k $KUSTOMIZE_PATH + echo -e "${GREEN}✓ Application deployed${NC}" + echo "" +} + +# Function to wait for deployments +wait_for_deployments() { + echo -e "${YELLOW}Waiting for deployments to be ready...${NC}" + echo "This may take several minutes..." 
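+    # kubectl wait below blocks until every deployment in the namespace reports the
+    # Available condition; with set -e, a 10-minute timeout aborts the script.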
+ + # Wait for all deployments + kubectl wait --for=condition=available --timeout=600s \ + deployment --all -n $NAMESPACE + + echo -e "${GREEN}✓ All deployments are ready${NC}" + echo "" +} + +# Function to check deployment status +check_status() { + echo -e "${YELLOW}Deployment Status:${NC}" + echo "" + + echo "Pods:" + kubectl get pods -n $NAMESPACE + echo "" + + echo "Services:" + kubectl get svc -n $NAMESPACE + echo "" + + echo "Ingress:" + kubectl get ingress -n $NAMESPACE + echo "" + + echo "Persistent Volume Claims:" + kubectl get pvc -n $NAMESPACE + echo "" + + echo "Certificates:" + kubectl get certificate -n $NAMESPACE + echo "" +} + +# Function to show access information +show_access_info() { + echo -e "${GREEN}========================================${NC}" + echo -e "${GREEN}Deployment Complete!${NC}" + echo -e "${GREEN}========================================${NC}" + echo "" + echo "Access your application at:" + + # Get ingress hosts + HOSTS=$(kubectl get ingress bakery-ingress-prod -n $NAMESPACE -o jsonpath='{.spec.rules[*].host}' 2>/dev/null || echo "") + + if [ -n "$HOSTS" ]; then + for host in $HOSTS; do + echo " https://$host" + done + else + echo " Configure your domain in prod-ingress.yaml" + fi + + echo "" + echo "Useful commands:" + echo " View logs: kubectl logs -f deployment/gateway -n $NAMESPACE" + echo " Check pods: kubectl get pods -n $NAMESPACE" + echo " Check events: kubectl get events -n $NAMESPACE --sort-by='.lastTimestamp'" + echo " Scale: kubectl scale deployment/gateway --replicas=5 -n $NAMESPACE" + echo "" +} + +# Main deployment flow +main() { + check_cluster + check_addons + create_namespace + apply_secrets + deploy_application + + echo -e "${YELLOW}Do you want to wait for deployments to be ready? (yes/no):${NC}" + read -p "> " wait_confirm + + if [ "$wait_confirm" = "yes" ]; then + wait_for_deployments + fi + + check_status + show_access_info +} + +# Run main function +main diff --git a/infrastructure/scripts/maintenance/encrypted-backup.sh b/infrastructure/scripts/maintenance/encrypted-backup.sh new file mode 100755 index 00000000..e202a883 --- /dev/null +++ b/infrastructure/scripts/maintenance/encrypted-backup.sh @@ -0,0 +1,82 @@ +#!/usr/bin/env bash + +# Encrypted PostgreSQL Backup Script +# Creates GPG-encrypted backups of all databases + +set -e + +BACKUP_DIR="${BACKUP_DIR:-/backups}" +BACKUP_DATE=$(date +%Y%m%d-%H%M%S) +GPG_RECIPIENT="${GPG_RECIPIENT:-backup@bakery-ia.com}" +NAMESPACE="${NAMESPACE:-bakery-ia}" + +# Database list +DATABASES=( + "auth-db" + "tenant-db" + "training-db" + "forecasting-db" + "sales-db" + "external-db" + "notification-db" + "inventory-db" + "recipes-db" + "suppliers-db" + "pos-db" + "orders-db" + "production-db" + "alert-processor-db" +) + +echo "Starting encrypted backup process..." +echo "Backup date: $BACKUP_DATE" +echo "Backup directory: $BACKUP_DIR" +echo "Namespace: $NAMESPACE" +echo "" + +# Create backup directory if it doesn't exist +mkdir -p "$BACKUP_DIR" + +for db in "${DATABASES[@]}"; do + echo "Backing up $db..." 
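+    # The dump is streamed through gzip and GPG in one pipeline, so only the
+    # encrypted .sql.gz.gpg file is ever written to local disk.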
+ + # Get pod name + POD=$(kubectl get pods -n "$NAMESPACE" -l "app.kubernetes.io/name=$db" -o jsonpath='{.items[0].metadata.name}') + + if [ -z "$POD" ]; then + echo " ⚠️ Warning: Pod not found for $db, skipping" + continue + fi + + # Extract database name from environment + DB_NAME=$(kubectl exec -n "$NAMESPACE" "$POD" -- sh -c 'echo $POSTGRES_DB') + DB_USER=$(kubectl exec -n "$NAMESPACE" "$POD" -- sh -c 'echo $POSTGRES_USER') + + # Create backup file name + BACKUP_FILE="$BACKUP_DIR/${db}_${DB_NAME}_${BACKUP_DATE}.sql.gz.gpg" + + # Perform backup with pg_dump, compress with gzip, encrypt with GPG + kubectl exec -n "$NAMESPACE" "$POD" -- \ + sh -c "pg_dump -U $DB_USER -d $DB_NAME" | \ + gzip | \ + gpg --encrypt --recipient "$GPG_RECIPIENT" --trust-model always > "$BACKUP_FILE" + + # Get file size + SIZE=$(du -h "$BACKUP_FILE" | cut -f1) + + echo " ✓ Backup complete: $BACKUP_FILE ($SIZE)" +done + +echo "" +echo "====================" +echo "✓ Backup process completed!" +echo "" +echo "Total backups created: ${#DATABASES[@]}" +echo "Backup location: $BACKUP_DIR" +echo "Backup date: $BACKUP_DATE" +echo "" +echo "To decrypt a backup:" +echo " gpg --decrypt backup_file.sql.gz.gpg | gunzip > backup.sql" +echo "" +echo "To restore a backup:" +echo " gpg --decrypt backup_file.sql.gz.gpg | gunzip | psql -U user -d database" diff --git a/infrastructure/kubernetes/fix-otel-endpoints.sh b/infrastructure/scripts/maintenance/fix-otel-endpoints.sh similarity index 95% rename from infrastructure/kubernetes/fix-otel-endpoints.sh rename to infrastructure/scripts/maintenance/fix-otel-endpoints.sh index 064c5923..c4118e06 100755 --- a/infrastructure/kubernetes/fix-otel-endpoints.sh +++ b/infrastructure/scripts/maintenance/fix-otel-endpoints.sh @@ -56,5 +56,5 @@ echo -e "${GREEN}✓ All service files processed!${NC}" echo "" echo "Next steps:" echo "1. Review changes: git diff infrastructure/kubernetes/base/components" -echo "2. Apply changes: kubectl apply -k infrastructure/kubernetes/overlays/dev" +echo "2. Apply changes: kubectl apply -k infrastructure/environments/dev/k8s-manifests" echo "3. Restart services: kubectl rollout restart deployment -n bakery-ia --all" diff --git a/infrastructure/scripts/maintenance/generate-passwords.sh b/infrastructure/scripts/maintenance/generate-passwords.sh new file mode 100755 index 00000000..6b438054 --- /dev/null +++ b/infrastructure/scripts/maintenance/generate-passwords.sh @@ -0,0 +1,58 @@ +#!/usr/bin/env bash + +# Script to generate cryptographically secure passwords for all databases +# Generates 32-character random passwords using openssl + +set -e + +echo "Generating secure passwords for all databases..." +echo "" + +# Generate password function +generate_password() { + openssl rand -base64 32 | tr -d "=+/" | cut -c1-32 +} + +# Generate passwords for all services +SERVICES=( + "AUTH_DB_PASSWORD" + "TRAINING_DB_PASSWORD" + "FORECASTING_DB_PASSWORD" + "SALES_DB_PASSWORD" + "EXTERNAL_DB_PASSWORD" + "TENANT_DB_PASSWORD" + "NOTIFICATION_DB_PASSWORD" + "ALERT_PROCESSOR_DB_PASSWORD" + "INVENTORY_DB_PASSWORD" + "RECIPES_DB_PASSWORD" + "SUPPLIERS_DB_PASSWORD" + "POS_DB_PASSWORD" + "ORDERS_DB_PASSWORD" + "PRODUCTION_DB_PASSWORD" + "REDIS_PASSWORD" +) + +echo "Generated Passwords:" +echo "====================" +echo "" + +count=0 +for service in "${SERVICES[@]}"; do + password=$(generate_password) + echo "$service=$password" + count=$((count + 1)) +done + +echo "" +echo "====================" +echo "" +echo "Passwords generated successfully!" 
+echo "Total: $count passwords" +echo "" +echo "Next steps:" +echo "1. Update .env file with these passwords" +echo "2. Update infrastructure/kubernetes/base/secrets.yaml with base64-encoded passwords" +echo "3. Apply new secrets to Kubernetes cluster" +echo "" +echo "To base64 encode a password:" +echo " echo -n 'password' | base64" diff --git a/infrastructure/scripts/maintenance/generate-test-traffic.sh b/infrastructure/scripts/maintenance/generate-test-traffic.sh new file mode 100755 index 00000000..c896d042 --- /dev/null +++ b/infrastructure/scripts/maintenance/generate-test-traffic.sh @@ -0,0 +1,141 @@ +#!/bin/bash + +# Generate Test Traffic to Services +# This script generates API calls to verify telemetry data collection + +set -e + +NAMESPACE="bakery-ia" +GREEN='\033[0;32m' +BLUE='\033[0;34m' +YELLOW='\033[1;33m' +NC='\033[0m' + +echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}" +echo -e "${BLUE} Generating Test Traffic for SigNoz Verification${NC}" +echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}" +echo "" + +# Check if ingress is accessible +echo -e "${BLUE}Step 1: Verifying Gateway Access${NC}" +echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" + +GATEWAY_POD=$(kubectl get pods -n $NAMESPACE -l app=gateway --field-selector=status.phase=Running -o jsonpath='{.items[0].metadata.name}' 2>/dev/null) +if [[ -z "$GATEWAY_POD" ]]; then + echo -e "${YELLOW}⚠ Gateway pod not running. Starting port-forward...${NC}" + # Port forward in background + kubectl port-forward -n $NAMESPACE svc/gateway-service 8000:8000 & + PORT_FORWARD_PID=$! + sleep 3 + API_URL="http://localhost:8000" +else + echo -e "${GREEN}✓ Gateway is running: $GATEWAY_POD${NC}" + # Use internal service + API_URL="http://gateway-service.$NAMESPACE.svc.cluster.local:8000" +fi +echo "" + +# Function to make API call from inside cluster +make_request() { + local endpoint=$1 + local description=$2 + + echo -e "${BLUE}→ Testing: $description${NC}" + echo " Endpoint: $endpoint" + + if [[ -n "$GATEWAY_POD" ]]; then + # Make request from inside the gateway pod + RESPONSE=$(kubectl exec -n $NAMESPACE $GATEWAY_POD -- curl -s -w "\nHTTP_CODE:%{http_code}" "$API_URL$endpoint" 2>/dev/null || echo "FAILED") + else + # Make request from localhost + RESPONSE=$(curl -s -w "\nHTTP_CODE:%{http_code}" "$API_URL$endpoint" 2>/dev/null || echo "FAILED") + fi + + if [[ "$RESPONSE" == "FAILED" ]]; then + echo -e " ${YELLOW}⚠ Request failed${NC}" + else + HTTP_CODE=$(echo "$RESPONSE" | grep "HTTP_CODE" | cut -d: -f2) + if [[ "$HTTP_CODE" == "200" ]] || [[ "$HTTP_CODE" == "401" ]] || [[ "$HTTP_CODE" == "404" ]]; then + echo -e " ${GREEN}✓ Response received (HTTP $HTTP_CODE)${NC}" + else + echo -e " ${YELLOW}⚠ Unexpected response (HTTP $HTTP_CODE)${NC}" + fi + fi + echo "" + sleep 1 +} + +# Generate traffic to various endpoints +echo -e "${BLUE}Step 2: Generating Traffic to Services${NC}" +echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" +echo "" + +# Health checks (should generate traces) +make_request "/health" "Gateway Health Check" +make_request "/api/health" "API Health Check" + +# Auth service endpoints +make_request "/api/auth/health" "Auth Service Health" + +# Tenant service endpoints +make_request "/api/tenants/health" "Tenant Service Health" + +# Inventory service endpoints +make_request "/api/inventory/health" "Inventory Service Health" + +# Orders service endpoints +make_request "/api/orders/health" "Orders Service Health" + +# 
Forecasting service endpoints +make_request "/api/forecasting/health" "Forecasting Service Health" + +echo -e "${BLUE}Step 3: Checking Service Logs for Telemetry${NC}" +echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" +echo "" + +# Check a few service pods for tracing logs +SERVICES=("auth-service" "inventory-service" "gateway") + +for service in "${SERVICES[@]}"; do + POD=$(kubectl get pods -n $NAMESPACE -l app=$service --field-selector=status.phase=Running -o jsonpath='{.items[0].metadata.name}' 2>/dev/null) + if [[ -n "$POD" ]]; then + echo -e "${BLUE}Checking $service ($POD)...${NC}" + TRACING_LOG=$(kubectl logs -n $NAMESPACE $POD --tail=100 2>/dev/null | grep -i "tracing\|otel" | head -n 2 || echo "") + if [[ -n "$TRACING_LOG" ]]; then + echo -e "${GREEN}✓ Tracing configured:${NC}" + echo "$TRACING_LOG" | sed 's/^/ /' + else + echo -e "${YELLOW}⚠ No tracing logs found${NC}" + fi + echo "" + fi +done + +# Wait for data to be processed +echo -e "${BLUE}Step 4: Waiting for Data Processing${NC}" +echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" +echo "Waiting 30 seconds for telemetry data to be processed..." +for i in {30..1}; do + echo -ne "\r ${i} seconds remaining..." + sleep 1 +done +echo -e "\n" + +# Cleanup port-forward if started +if [[ -n "$PORT_FORWARD_PID" ]]; then + kill $PORT_FORWARD_PID 2>/dev/null || true +fi + +echo -e "${GREEN}✓ Test traffic generation complete!${NC}" +echo "" +echo -e "${BLUE}Next Steps:${NC}" +echo "1. Run the verification script to check for collected data:" +echo " ./infrastructure/helm/verify-signoz-telemetry.sh" +echo "" +echo "2. Access SigNoz UI to visualize the data:" +echo " https://monitoring.bakery-ia.local" +echo " or" +echo " kubectl port-forward -n bakery-ia svc/signoz 3301:8080" +echo " Then go to: http://localhost:3301" +echo "" +echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}" diff --git a/infrastructure/scripts/maintenance/generate_subscription_test_report.sh b/infrastructure/scripts/maintenance/generate_subscription_test_report.sh new file mode 100755 index 00000000..2e78f17f --- /dev/null +++ b/infrastructure/scripts/maintenance/generate_subscription_test_report.sh @@ -0,0 +1,129 @@ +#!/bin/bash + +# Script to generate a comprehensive test report for the subscription creation flow +# This script checks all components and generates a detailed report + +echo "📊 Generating Subscription Creation Flow Test Report" +echo "====================================================" +echo "Report generated on: $(date)" +echo "" + +# Test 1: Check if database migration was applied +echo "🔍 Test 1: Database Migration Check" +echo "-----------------------------------" +POD_NAME=$(kubectl get pods -n bakery-ia -l app=auth-service -o jsonpath='{.items[0].metadata.name}') +MIGRATION_STATUS=$(kubectl exec -n bakery-ia $POD_NAME -- psql -U auth_user -d auth_db -c "SELECT version_num FROM alembic_version" -t -A) + +if [[ "$MIGRATION_STATUS" == "20260113_add_payment_columns" ]]; then + echo "✅ PASS: Database migration '20260113_add_payment_columns' is applied" +else + echo "❌ FAIL: Database migration not found. 
Current version: $MIGRATION_STATUS" +fi +echo "" + +# Test 2: Check if payment columns exist in users table +echo "🔍 Test 2: Payment Columns in Users Table" +echo "------------------------------------------" +COLUMNS=$(kubectl exec -n bakery-ia $POD_NAME -- psql -U auth_user -d auth_db -c "\d users" -t -A | grep -E "payment_customer_id|default_payment_method_id") + +if [[ -n "$COLUMNS" ]]; then + echo "✅ PASS: Payment columns found in users table" + echo " Found columns:" + echo " $COLUMNS" | sed 's/^/ /' +else + echo "❌ FAIL: Payment columns not found in users table" +fi +echo "" + +# Test 3: Check if gateway route exists +echo "🔍 Test 3: Gateway Route Configuration" +echo "--------------------------------------" +GATEWAY_POD=$(kubectl get pods -n bakery-ia -l app=gateway -o jsonpath='{.items[0].metadata.name}') +ROUTE_CHECK=$(kubectl exec -n bakery-ia $GATEWAY_POD -- grep -c "create-for-registration" /app/app/routes/subscription.py) + +if [[ "$ROUTE_CHECK" -gt 0 ]]; then + echo "✅ PASS: Gateway route for 'create-for-registration' is configured" +else + echo "❌ FAIL: Gateway route for 'create-for-registration' not found" +fi +echo "" + +# Test 4: Check if tenant service endpoint exists +echo "🔍 Test 4: Tenant Service Endpoint" +echo "-----------------------------------" +TENANT_POD=$(kubectl get pods -n bakery-ia -l app=tenant-service -o jsonpath='{.items[0].metadata.name}') +ENDPOINT_CHECK=$(kubectl exec -n bakery-ia $TENANT_POD -- grep -c "create-for-registration" /app/app/api/subscription.py) + +if [[ "$ENDPOINT_CHECK" -gt 0 ]]; then + echo "✅ PASS: Tenant service endpoint 'create-for-registration' is configured" +else + echo "❌ FAIL: Tenant service endpoint 'create-for-registration' not found" +fi +echo "" + +# Test 5: Test user registration (create a test user) +echo "🔍 Test 5: User Registration Test" +echo "--------------------------------" +TEST_EMAIL="test_$(date +%Y%m%d%H%M%S)@example.com" +REGISTRATION_RESPONSE=$(curl -X POST "https://bakery-ia.local/api/v1/auth/register-with-subscription" \ + -H "Content-Type: application/json" \ + -H "Accept: application/json" \ + -d "{\"email\":\"$TEST_EMAIL\",\"password\":\"SecurePassword123!\",\"full_name\":\"Test User\",\"subscription_plan\":\"basic\",\"payment_method_id\":\"pm_test123\"}" \ + -k -s) + +if echo "$REGISTRATION_RESPONSE" | grep -q "access_token"; then + echo "✅ PASS: User registration successful" + USER_ID=$(echo "$REGISTRATION_RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['user']['id'])") + echo " Created user ID: $USER_ID" +else + echo "❌ FAIL: User registration failed" + echo " Response: $REGISTRATION_RESPONSE" +fi +echo "" + +# Test 6: Check if user has payment fields +echo "🔍 Test 6: User Payment Fields" +echo "------------------------------" +if [[ -n "$USER_ID" ]]; then + USER_DATA=$(kubectl exec -n bakery-ia $POD_NAME -- psql -U auth_user -d auth_db -c "SELECT payment_customer_id, default_payment_method_id FROM users WHERE id = '$USER_ID'" -t -A) + + if [[ -n "$USER_DATA" ]]; then + echo "✅ PASS: User has payment fields in database" + echo " Payment data: $USER_DATA" + else + echo "❌ FAIL: User payment fields not found" + fi +else + echo "⚠️ SKIP: User ID not available from previous test" +fi +echo "" + +# Test 7: Check subscription creation in onboarding progress +echo "🔍 Test 7: Subscription in Onboarding Progress" +echo "---------------------------------------------" +if [[ -n "$USER_ID" ]]; then + # This would require authentication, so we'll skip for now + echo "⚠️ SKIP: Requires 
authentication (would need to implement token handling)" +else + echo "⚠️ SKIP: User ID not available from previous test" +fi +echo "" + +# Summary +echo "📋 Test Summary" +echo "===============" +echo "The subscription creation flow test report has been generated." +echo "" +echo "Components tested:" +echo " 1. Database migration" +echo " 2. Payment columns in users table" +echo " 3. Gateway route configuration" +echo " 4. Tenant service endpoint" +echo " 5. User registration" +echo " 6. User payment fields" +echo " 7. Subscription in onboarding progress" +echo "" +echo "For a complete integration test, run:" +echo " ./scripts/run_subscription_integration_test.sh" +echo "" +echo "🎉 Report generation completed!" \ No newline at end of file diff --git a/infrastructure/scripts/maintenance/kubernetes_restart.sh b/infrastructure/scripts/maintenance/kubernetes_restart.sh new file mode 100755 index 00000000..2a4540de --- /dev/null +++ b/infrastructure/scripts/maintenance/kubernetes_restart.sh @@ -0,0 +1,369 @@ +#!/bin/bash + +# Colors for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +NC='\033[0m' # No Color + +# Function to print colored output +print_status() { + echo -e "${BLUE}[INFO]${NC} $1" +} + +print_success() { + echo -e "${GREEN}[SUCCESS]${NC} $1" +} + +print_warning() { + echo -e "${YELLOW}[WARNING]${NC} $1" +} + +print_error() { + echo -e "${RED}[ERROR]${NC} $1" +} + +# Function to wait for pods with retry logic +wait_for_pods() { + local namespace=$1 + local selector=$2 + local timeout=$3 + local max_retries=30 + local retry_count=0 + + print_status "Waiting for pods with selector '$selector' in namespace '$namespace'..." + + while [ $retry_count -lt $max_retries ]; do + # Check if any pods exist first + if kubectl get pods -n "$namespace" --selector="$selector" 2>/dev/null | grep -v "No resources found" | grep -v "NAME" > /dev/null; then + # Pods exist, now wait for them to be ready + if kubectl wait --namespace "$namespace" \ + --for=condition=ready pod \ + --selector="$selector" \ + --timeout="${timeout}s" 2>/dev/null; then + print_success "Pods are ready" + return 0 + fi + fi + + retry_count=$((retry_count + 1)) + print_status "Waiting for pods to be created... (attempt $retry_count/$max_retries)" + sleep 5 + done + + print_error "Timed out waiting for pods after $((max_retries * 5)) seconds" + return 1 +} + +# Function to handle cleanup +cleanup() { + print_status "Starting cleanup process..." + + # Delete Kubernetes namespace with timeout + print_status "Deleting namespace bakery-ia..." + if kubectl get namespace bakery-ia &>/dev/null; then + kubectl delete namespace bakery-ia 2>/dev/null & + PID=$! + sleep 2 + if ps -p $PID &>/dev/null; then + print_warning "kubectl delete namespace command taking too long, forcing termination..." + kill $PID 2>/dev/null + fi + print_success "Namespace deletion attempted" + else + print_status "Namespace bakery-ia not found" + fi + + # Delete Kind cluster + print_status "Deleting Kind cluster..." + if kind get clusters | grep -q "bakery-ia-local"; then + kind delete cluster --name bakery-ia-local + print_success "Kind cluster deleted" + else + print_status "Kind cluster bakery-ia-local not found" + fi + + # Stop local registry + print_status "Stopping local registry..." 
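+    # Remove the kind-registry container as well, so the next setup run starts
+    # with a clean local registry.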
+ if docker ps -a | grep -q "kind-registry"; then + docker stop kind-registry 2>/dev/null || true + docker rm kind-registry 2>/dev/null || true + print_success "Local registry removed" + else + print_status "Local registry not found" + fi + + # Stop Colima + print_status "Stopping Colima..." + if colima list | grep -q "k8s-local"; then + colima stop --profile k8s-local + print_success "Colima stopped" + else + print_status "Colima profile k8s-local not found" + fi + + print_success "Cleanup completed!" + echo "----------------------------------------" +} + +# Function to check for required configuration files +check_config_files() { + print_status "Checking for required configuration files..." + + # Check for kind-config.yaml + if [ ! -f kind-config.yaml ]; then + print_error "kind-config.yaml not found in current directory!" + print_error "Please ensure kind-config.yaml exists with your cluster configuration." + exit 1 + fi + + # Check for encryption directory if referenced in config + if grep -q "infrastructure/kubernetes/encryption" kind-config.yaml; then + if [ ! -d "./infrastructure/kubernetes/encryption" ]; then + print_warning "Encryption directory './infrastructure/kubernetes/encryption' not found" + print_warning "Some encryption configurations may not work properly" + fi + fi + + print_success "Configuration files check completed" +} + +# Function to create local registry +create_local_registry() { + local reg_name='kind-registry' + local reg_port='5001' + + print_status "Setting up local Docker registry..." + + # Create registry container unless it already exists + if [ "$(docker inspect -f '{{.State.Running}}' "${reg_name}" 2>/dev/null || true)" != 'true' ]; then + print_status "Creating registry container on port ${reg_port}..." + docker run \ + -d --restart=always \ + -p "127.0.0.1:${reg_port}:5000" \ + --name "${reg_name}" \ + registry:2 + + if [ $? -eq 0 ]; then + print_success "Local registry created at localhost:${reg_port}" + else + print_error "Failed to create local registry" + exit 1 + fi + else + print_success "Local registry already running at localhost:${reg_port}" + fi + + # Store registry info for later use + echo "${reg_name}:${reg_port}" +} + +# Function to connect registry to Kind +connect_registry_to_kind() { + local reg_name='kind-registry' + local reg_port='5001' + + print_status "Connecting registry to Kind network..." + + # Connect the registry to the cluster network if not already connected + if [ "$(docker inspect -f='{{json .NetworkSettings.Networks.kind}}' "${reg_name}")" = 'null' ]; then + docker network connect "kind" "${reg_name}" + print_success "Registry connected to Kind network" + else + print_success "Registry already connected to Kind network" + fi + + # Configure containerd in the Kind node to use the registry + print_status "Configuring containerd to use local registry..." + + # Create the registry config directory + docker exec bakery-ia-local-control-plane mkdir -p /etc/containerd/certs.d/localhost:${reg_port} + + # Add registry configuration + docker exec bakery-ia-local-control-plane sh -c "cat > /etc/containerd/certs.d/localhost:${reg_port}/hosts.toml < Kind NodePort 30080" + echo " - HTTPS Ingress: localhost:${HTTPS_HOST_PORT} -> Kind NodePort 30443" + echo " - Frontend Direct: localhost:3000 -> container:30300" + echo " - Gateway Direct: localhost:8000 -> container:30800" + echo "" + print_status "How to access your application:" + echo " 1. Start Tilt: tilt up" + echo " 2. 
Access via:" + echo " - Ingress: http://localhost (or https://localhost)" + echo " - Direct: http://localhost:3000 (frontend), http://localhost:8000 (gateway)" + echo " - Tilt UI: http://localhost:10350" + echo "----------------------------------------" + print_status "Local Registry Information:" + echo " - Registry URL: localhost:5001" + echo " - Images pushed to: localhost:5001/bakery/" + echo " - Tiltfile already configured: default_registry('localhost:5001')" + echo "----------------------------------------" +} + +# Function to show usage +usage() { + echo "Usage: $0 [option]" + echo "" + echo "Options:" + echo " cleanup Clean up all resources (namespace, cluster, colima)" + echo " setup Set up the complete environment" + echo " full Clean up first, then set up (default)" + echo " help Show this help message" + echo "" + echo "Requirements:" + echo " - kind-config.yaml must exist in current directory" + echo " - For encryption: ./infrastructure/kubernetes/encryption directory" +} + +# Main script logic +case "${1:-full}" in + "cleanup") + cleanup + ;; + "setup") + setup + ;; + "full") + cleanup + setup + ;; + "help"|"-h"|"--help") + usage + ;; + *) + print_warning "Unknown option: $1" + echo "" + usage + exit 1 + ;; +esac \ No newline at end of file diff --git a/infrastructure/scripts/maintenance/regenerate_all_migrations.sh b/infrastructure/scripts/maintenance/regenerate_all_migrations.sh new file mode 100755 index 00000000..a58bdcf1 --- /dev/null +++ b/infrastructure/scripts/maintenance/regenerate_all_migrations.sh @@ -0,0 +1,166 @@ +#!/bin/bash + +# Convenience script: Clean databases and regenerate all migrations in one command +# This wraps the two-step process into a single workflow + +set -euo pipefail + +# Colors for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +NC='\033[0m' # No Color + +# Configuration +NAMESPACE="${KUBE_NAMESPACE:-bakery-ia}" +SKIP_CONFIRMATION=false +APPLY_MIGRATIONS=false +VERBOSE=false + +# Parse command line arguments +while [[ $# -gt 0 ]]; do + case $1 in + --namespace) NAMESPACE="$2"; shift 2 ;; + --yes) SKIP_CONFIRMATION=true; shift ;; + --apply) APPLY_MIGRATIONS=true; shift ;; + --verbose) VERBOSE=true; shift ;; + -h|--help) + echo "Usage: $0 [OPTIONS]" + echo "" + echo "This script performs the complete migration regeneration workflow:" + echo " 1. Clean all service databases" + echo " 2. Generate new migrations from models" + echo " 3. Optionally apply migrations" + echo "" + echo "Options:" + echo " --namespace NAME Use specific Kubernetes namespace (default: bakery-ia)" + echo " --yes Skip all confirmation prompts" + echo " --apply Apply migrations after generation" + echo " --verbose Enable detailed logging" + echo "" + echo "Examples:" + echo " $0 # Interactive mode (with confirmations)" + echo " $0 --yes --verbose # Automated mode with detailed output" + echo " $0 --apply # Generate and apply migrations" + exit 0 + ;; + *) echo "Unknown option: $1"; echo "Use --help for usage information"; exit 1 ;; + esac +done + +echo -e "${BLUE}========================================${NC}" +echo -e "${BLUE}Complete Migration Regeneration Workflow${NC}" +echo -e "${BLUE}========================================${NC}" +echo "" +echo -e "${YELLOW}This script will:${NC}" +echo -e "${YELLOW} 1. Clean all service databases (DROP all tables)${NC}" +echo -e "${YELLOW} 2. Generate new migrations from models${NC}" +if [ "$APPLY_MIGRATIONS" = true ]; then + echo -e "${YELLOW} 3. 
Apply migrations to databases${NC}" +fi +echo "" +echo -e "${YELLOW}Namespace: $NAMESPACE${NC}" +echo "" + +if [ "$SKIP_CONFIRMATION" = false ]; then + echo -e "${RED}⚠ WARNING: This will DROP ALL TABLES in all service databases!${NC}" + echo "" + read -p "Continue? (yes/no) " -n 1 -r + echo + if [[ ! $REPLY =~ ^[Yy]$ ]]; then + echo -e "${YELLOW}Aborted.${NC}" + exit 0 + fi +fi + +echo "" +echo -e "${BLUE}========================================${NC}" +echo -e "${BLUE}Step 1: Cleaning Databases${NC}" +echo -e "${BLUE}========================================${NC}" +echo "" + +# Build cleanup command +CLEANUP_CMD="./cleanup_databases_k8s.sh --namespace $NAMESPACE" +if [ "$SKIP_CONFIRMATION" = true ]; then + CLEANUP_CMD="$CLEANUP_CMD --yes" +fi + +# Run cleanup +if ! $CLEANUP_CMD; then + echo -e "${RED}✗ Database cleanup failed!${NC}" + echo -e "${YELLOW}Cannot proceed with migration generation.${NC}" + exit 1 +fi + +echo "" +echo -e "${GREEN}✓ Database cleanup completed${NC}" +echo "" + +# Wait a moment for database connections to settle +sleep 2 + +echo -e "${BLUE}========================================${NC}" +echo -e "${BLUE}Step 2: Generating Migrations${NC}" +echo -e "${BLUE}========================================${NC}" +echo "" + +# Build migration command +MIGRATION_CMD="./regenerate_migrations_k8s.sh --namespace $NAMESPACE" +if [ "$VERBOSE" = true ]; then + MIGRATION_CMD="$MIGRATION_CMD --verbose" +fi +if [ "$APPLY_MIGRATIONS" = true ]; then + MIGRATION_CMD="$MIGRATION_CMD --apply" +fi + +# Run migration generation (with automatic 'y' response) +if [ "$SKIP_CONFIRMATION" = true ]; then + echo "y" | $MIGRATION_CMD +else + $MIGRATION_CMD +fi + +MIGRATION_EXIT_CODE=$? + +echo "" +if [ $MIGRATION_EXIT_CODE -eq 0 ]; then + echo -e "${GREEN}========================================${NC}" + echo -e "${GREEN}✓ Workflow Completed Successfully!${NC}" + echo -e "${GREEN}========================================${NC}" + echo "" + echo -e "${YELLOW}Summary:${NC}" + echo -e " ${GREEN}✓${NC} Databases cleaned" + echo -e " ${GREEN}✓${NC} Migrations generated" + if [ "$APPLY_MIGRATIONS" = true ]; then + echo -e " ${GREEN}✓${NC} Migrations applied" + fi + echo "" + echo -e "${YELLOW}Generated migration files:${NC}" + find services/*/migrations/versions/ -name "*.py" -type f -mmin -5 2>/dev/null | while read file; do + size=$(wc -c < "$file" | tr -d ' ') + echo -e " ${GREEN}✓${NC} $file ($size bytes)" + done + echo "" + echo -e "${YELLOW}Next steps:${NC}" + echo -e " 1. Review the generated migrations above" + echo -e " 2. Verify migrations contain schema operations (not just 'pass')" + if [ "$APPLY_MIGRATIONS" = false ]; then + echo -e " 3. Apply migrations: ./regenerate_migrations_k8s.sh --apply" + echo -e " 4. Commit migrations: git add services/*/migrations/versions/*.py" + else + echo -e " 3. 
Commit migrations: git add services/*/migrations/versions/*.py" + fi +else + echo -e "${RED}========================================${NC}" + echo -e "${RED}✗ Migration Generation Failed${NC}" + echo -e "${RED}========================================${NC}" + echo "" + echo -e "${YELLOW}Check the log file for details.${NC}" + echo -e "${YELLOW}Common issues:${NC}" + echo -e " - Pod not running for some services" + echo -e " - Database connectivity issues" + echo -e " - Model import errors" + echo "" + exit 1 +fi diff --git a/infrastructure/scripts/maintenance/regenerate_migrations_k8s.sh b/infrastructure/scripts/maintenance/regenerate_migrations_k8s.sh new file mode 100755 index 00000000..52bd7587 --- /dev/null +++ b/infrastructure/scripts/maintenance/regenerate_migrations_k8s.sh @@ -0,0 +1,796 @@ +#!/bin/bash + +# Script to regenerate Alembic migrations using Kubernetes local dev environment +# This script backs up existing migrations and generates new ones based on current models + +set -euo pipefail # Exit on error, undefined variables, and pipe failures + +# Colors for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +NC='\033[0m' # No Color + +# Configuration +NAMESPACE="${KUBE_NAMESPACE:-bakery-ia}" +LOG_FILE="migration_script_$(date +%Y%m%d_%H%M%S).log" +BACKUP_RETENTION_DAYS=7 +CONTAINER_SUFFIX="service" # Default container name suffix (e.g., pos-service) + +# Parse command line arguments +DRY_RUN=false +SKIP_BACKUP=false +APPLY_MIGRATIONS=false +CHECK_EXISTING=false +VERBOSE=false +SKIP_DB_CHECK=false + +while [[ $# -gt 0 ]]; do + case $1 in + --dry-run) DRY_RUN=true; shift ;; + --skip-backup) SKIP_BACKUP=true; shift ;; + --apply) APPLY_MIGRATIONS=true; shift ;; + --check-existing) CHECK_EXISTING=true; shift ;; + --verbose) VERBOSE=true; shift ;; + --skip-db-check) SKIP_DB_CHECK=true; shift ;; + --namespace) NAMESPACE="$2"; shift 2 ;; + -h|--help) + echo "Usage: $0 [OPTIONS]" + echo "" + echo "Options:" + echo " --dry-run Show what would be done without making changes" + echo " --skip-backup Skip backing up existing migrations" + echo " --apply Automatically apply migrations after generation" + echo " --check-existing Check for and copy existing migrations from pods first" + echo " --verbose Enable detailed logging" + echo " --skip-db-check Skip database connectivity check" + echo " --namespace NAME Use specific Kubernetes namespace (default: bakery-ia)" + echo "" + echo "Examples:" + echo " $0 --namespace dev --dry-run # Simulate migration regeneration" + echo " $0 --apply --verbose # Generate and apply migrations with detailed logs" + echo " $0 --skip-db-check # Skip database connectivity check" + exit 0 + ;; + *) echo "Unknown option: $1"; echo "Use --help for usage information"; exit 1 ;; + esac +done + +# List of all services +SERVICES=( + "pos" "sales" "recipes" "training" "auth" "orders" "inventory" + "suppliers" "tenant" "notification" "alert-processor" "forecasting" + "external" "production" "demo-session" +) + +# Backup directory +BACKUP_DIR="migrations_backup_$(date +%Y%m%d_%H%M%S)" +mkdir -p "$BACKUP_DIR" + +# Initialize log file +touch "$LOG_FILE" +echo "[$(date '+%Y-%m-%d %H:%M:%S')] Starting migration regeneration" >> "$LOG_FILE" + +# Function to perform pre-flight checks +preflight_checks() { + echo -e "${BLUE}========================================${NC}" + echo -e "${BLUE}Pre-flight Checks${NC}" + echo -e "${BLUE}========================================${NC}" + echo "" + + local checks_passed=true + + # Check kubectl + echo -e 
"${YELLOW}Checking kubectl...${NC}" + if ! command -v kubectl &> /dev/null; then + echo -e "${RED}✗ kubectl not found${NC}" + log_message "ERROR" "kubectl not found" + checks_passed=false + else + KUBECTL_VERSION=$(kubectl version --client --short 2>/dev/null | grep -oP 'v\d+\.\d+\.\d+' || echo "unknown") + echo -e "${GREEN}✓ kubectl found (version: $KUBECTL_VERSION)${NC}" + fi + + # Check cluster connectivity + echo -e "${YELLOW}Checking Kubernetes cluster connectivity...${NC}" + if ! kubectl cluster-info &> /dev/null; then + echo -e "${RED}✗ Cannot connect to Kubernetes cluster${NC}" + log_message "ERROR" "Cannot connect to Kubernetes cluster" + checks_passed=false + else + CLUSTER_NAME=$(kubectl config current-context 2>/dev/null || echo "unknown") + echo -e "${GREEN}✓ Connected to cluster: $CLUSTER_NAME${NC}" + fi + + # Check namespace exists + echo -e "${YELLOW}Checking namespace '$NAMESPACE'...${NC}" + if ! kubectl get namespace "$NAMESPACE" &> /dev/null; then + echo -e "${RED}✗ Namespace '$NAMESPACE' not found${NC}" + log_message "ERROR" "Namespace '$NAMESPACE' not found" + checks_passed=false + else + echo -e "${GREEN}✓ Namespace exists${NC}" + fi + + # Check if all service pods are running + echo -e "${YELLOW}Checking service pods...${NC}" + local pods_found=0 + local pods_running=0 + for service in "${SERVICES[@]}"; do + local pod_name=$(kubectl get pods -n "$NAMESPACE" -l "app.kubernetes.io/name=${service}-service" --field-selector=status.phase=Running -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || echo "") + if [ -n "$pod_name" ]; then + pods_found=$((pods_found + 1)) + pods_running=$((pods_running + 1)) + fi + done + echo -e "${GREEN}✓ Found $pods_running/${#SERVICES[@]} service pods running${NC}" + + if [ $pods_running -lt ${#SERVICES[@]} ]; then + echo -e "${YELLOW}⚠ Not all service pods are running${NC}" + echo -e "${YELLOW} Missing services will be skipped${NC}" + fi + + # Check database connectivity for running services + echo -e "${YELLOW}Checking database connectivity (sample)...${NC}" + local sample_service="auth" + local sample_pod=$(kubectl get pods -n "$NAMESPACE" -l "app.kubernetes.io/name=${sample_service}-service" --field-selector=status.phase=Running -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || echo "") + + if [ -n "$sample_pod" ]; then + local db_check=$(kubectl exec -n "$NAMESPACE" "$sample_pod" -c "${sample_service}-service" -- sh -c "python3 -c 'import asyncpg; print(\"OK\")' 2>/dev/null" || echo "FAIL") + if [ "$db_check" = "OK" ]; then + echo -e "${GREEN}✓ Database drivers available (asyncpg)${NC}" + else + echo -e "${YELLOW}⚠ Database driver check failed${NC}" + fi + else + echo -e "${YELLOW}⚠ Cannot check database connectivity (no sample pod running)${NC}" + fi + + # Check local directory structure + echo -e "${YELLOW}Checking local directory structure...${NC}" + local dirs_found=0 + for service in "${SERVICES[@]}"; do + local service_dir=$(echo "$service" | tr '-' '_') + if [ -d "services/$service_dir/migrations" ]; then + dirs_found=$((dirs_found + 1)) + fi + done + echo -e "${GREEN}✓ Found $dirs_found/${#SERVICES[@]} service migration directories${NC}" + + # Check disk space + echo -e "${YELLOW}Checking disk space...${NC}" + local available_space=$(df -h . 
| tail -1 | awk '{print $4}') + echo -e "${GREEN}✓ Available disk space: $available_space${NC}" + + echo "" + if [ "$checks_passed" = false ]; then + echo -e "${RED}========================================${NC}" + echo -e "${RED}Pre-flight checks failed!${NC}" + echo -e "${RED}========================================${NC}" + echo "" + read -p "Continue anyway? (y/n) " -n 1 -r + echo + if [[ ! $REPLY =~ ^[Yy]$ ]]; then + echo -e "${RED}Aborted.${NC}" + exit 1 + fi + else + echo -e "${GREEN}========================================${NC}" + echo -e "${GREEN}All pre-flight checks passed!${NC}" + echo -e "${GREEN}========================================${NC}" + fi + echo "" +} + +# Run pre-flight checks +preflight_checks + +echo -e "${BLUE}========================================${NC}" +echo -e "${BLUE}Migration Regeneration Script (K8s)${NC}" +echo -e "${BLUE}========================================${NC}" +echo "" + +if [ "$DRY_RUN" = true ]; then + echo -e "${YELLOW}🔍 DRY RUN MODE - No changes will be made${NC}" + echo "" +fi + +echo -e "${YELLOW}This script will:${NC}" +if [ "$CHECK_EXISTING" = true ]; then + echo -e "${YELLOW}1. Check for existing migrations in pods and copy them${NC}" +fi +if [ "$SKIP_BACKUP" = false ]; then + echo -e "${YELLOW}2. Backup existing migration files${NC}" +fi +echo -e "${YELLOW}3. Generate new migrations in Kubernetes pods${NC}" +echo -e "${YELLOW}4. Copy generated files back to local machine${NC}" +if [ "$APPLY_MIGRATIONS" = true ]; then + echo -e "${YELLOW}5. Apply migrations to databases${NC}" +fi +if [ "$SKIP_BACKUP" = false ]; then + echo -e "${YELLOW}6. Keep the backup in: $BACKUP_DIR${NC}" +fi +echo "" +echo -e "${YELLOW}Using Kubernetes namespace: $NAMESPACE${NC}" +echo -e "${YELLOW}Logs will be saved to: $LOG_FILE${NC}" +echo "" + +if [ "$DRY_RUN" = false ]; then + read -p "Continue? (y/n) " -n 1 -r + echo + if [[ ! 
$REPLY =~ ^[Yy]$ ]]; then + echo -e "${RED}Aborted.${NC}" + echo "[$(date '+%Y-%m-%d %H:%M:%S')] Aborted by user" >> "$LOG_FILE" + exit 1 + fi +fi + +# Kubernetes setup already verified in pre-flight checks + +# Function to get a running pod for a service +get_running_pod() { + local service=$1 + local pod_name="" + local selectors=( + "app.kubernetes.io/name=${service}-service,app.kubernetes.io/component=microservice" + "app.kubernetes.io/name=${service}-service,app.kubernetes.io/component=worker" + "app.kubernetes.io/name=${service}-service" + "app=${service}-service,component=${service}" # Fallback for demo-session + "app=${service}-service" # Additional fallback + ) + + for selector in "${selectors[@]}"; do + pod_name=$(kubectl get pods -n "$NAMESPACE" -l "$selector" --field-selector=status.phase=Running -o jsonpath='{.items[0].metadata.name}' 2>/dev/null) + if [ -n "$pod_name" ]; then + echo "$pod_name" + return 0 + fi + done + + echo "" + return 1 +} + +# Function to log messages +log_message() { + local level=$1 + local message=$2 + echo "[$(date '+%Y-%m-%d %H:%M:%S')] $level: $message" >> "$LOG_FILE" + if [ "$VERBOSE" = true ] || [ "$level" = "ERROR" ]; then + echo -e "${YELLOW}$message${NC}" + fi +} + +# Check for existing migrations in pods if requested +if [ "$CHECK_EXISTING" = true ]; then + echo -e "${BLUE}Step 1.5: Checking for existing migrations in pods...${NC}" + echo "" + + FOUND_COUNT=0 + COPIED_COUNT=0 + + for service in "${SERVICES[@]}"; do + service_dir=$(echo "$service" | tr '-' '_') + echo -e "${YELLOW}Checking $service...${NC}" + + # Find a running pod + POD_NAME=$(get_running_pod "$service") + if [ -z "$POD_NAME" ]; then + echo -e "${YELLOW}⚠ Pod not found, skipping${NC}" + log_message "WARNING" "No running pod found for $service" + continue + fi + + # Check container availability + CONTAINER="${service}-${CONTAINER_SUFFIX}" + if ! kubectl get pod -n "$NAMESPACE" "$POD_NAME" -o jsonpath='{.spec.containers[*].name}' | grep -qw "$CONTAINER"; then + echo -e "${RED}✗ Container $CONTAINER not found in pod $POD_NAME, skipping${NC}" + log_message "ERROR" "Container $CONTAINER not found in pod $POD_NAME for $service" + continue + fi + + # Check if migration files exist in the pod + EXISTING_FILES=$(kubectl exec -n "$NAMESPACE" "$POD_NAME" -c "$CONTAINER" -- sh -c "ls /app/migrations/versions/*.py 2>/dev/null | grep -v __pycache__ | grep -v __init__.py" 2>/dev/null || echo "") + + if [ -n "$EXISTING_FILES" ]; then + FILE_COUNT=$(echo "$EXISTING_FILES" | wc -l | tr -d ' ') + echo -e "${GREEN}✓ Found $FILE_COUNT migration file(s) in pod${NC}" + FOUND_COUNT=$((FOUND_COUNT + 1)) + + # Create local versions directory + mkdir -p "services/$service_dir/migrations/versions" + + # Copy each file + for pod_file in $EXISTING_FILES; do + filename=$(basename "$pod_file") + if [ "$DRY_RUN" = true ]; then + echo -e "${BLUE}[DRY RUN] Would copy: $filename${NC}" + log_message "INFO" "[DRY RUN] Would copy $filename for $service" + else + if kubectl cp -n "$NAMESPACE" "$POD_NAME:$pod_file" "services/$service_dir/migrations/versions/$filename" -c "$CONTAINER" 2>>"$LOG_FILE"; then + echo -e "${GREEN}✓ Copied: $filename${NC}" + COPIED_COUNT=$((COPIED_COUNT + 1)) + log_message "INFO" "Copied $filename for $service" + # Display brief summary + echo -e "${BLUE}Preview:${NC}" + grep "def upgrade" "services/$service_dir/migrations/versions/$filename" | head -1 + grep "op\." 
"services/$service_dir/migrations/versions/$filename" | head -3 | sed 's/^/ /' + else + echo -e "${RED}✗ Failed to copy: $filename${NC}" + log_message "ERROR" "Failed to copy $filename for $service" + fi + fi + done + else + echo -e "${YELLOW}⚠ No migration files found in pod${NC}" + log_message "WARNING" "No migration files found in pod for $service" + fi + echo "" + done + + echo -e "${BLUE}========================================${NC}" + echo -e "${BLUE}Existing Migrations Check Summary${NC}" + echo -e "${BLUE}========================================${NC}" + echo -e "${GREEN}Services with migrations: $FOUND_COUNT${NC}" + if [ "$DRY_RUN" = false ]; then + echo -e "${GREEN}Files copied: $COPIED_COUNT${NC}" + fi + echo "" + + if [ "$FOUND_COUNT" = 0 ] && [ "$DRY_RUN" = false ]; then + read -p "Do you want to continue with regeneration? (y/n) " -n 1 -r + echo + if [[ ! $REPLY =~ ^[Yy]$ ]]; then + echo -e "${YELLOW}Stopping. Existing migrations have been copied.${NC}" + log_message "INFO" "Stopped after copying existing migrations" + exit 0 + fi + fi +fi + +# Backup existing migrations +if [ "$SKIP_BACKUP" = false ] && [ "$DRY_RUN" = false ]; then + echo -e "${BLUE}Step 2: Backing up existing migrations...${NC}" + BACKUP_COUNT=0 + for service in "${SERVICES[@]}"; do + service_dir=$(echo "$service" | tr '-' '_') + if [ -d "services/$service_dir/migrations/versions" ] && [ -n "$(ls services/$service_dir/migrations/versions/*.py 2>/dev/null)" ]; then + echo -e "${YELLOW}Backing up $service migrations...${NC}" + mkdir -p "$BACKUP_DIR/$service_dir/versions" + cp -r "services/$service_dir/migrations/versions/"*.py "$BACKUP_DIR/$service_dir/versions/" 2>>"$LOG_FILE" + BACKUP_COUNT=$((BACKUP_COUNT + 1)) + log_message "INFO" "Backed up migrations for $service to $BACKUP_DIR/$service_dir/versions" + else + echo -e "${YELLOW}No migration files to backup for $service${NC}" + fi + done + if [ "$BACKUP_COUNT" -gt 0 ]; then + echo -e "${GREEN}✓ Backup complete: $BACKUP_DIR ($BACKUP_COUNT services)${NC}" + else + echo -e "${YELLOW}No migrations backed up (no migration files found)${NC}" + fi + echo "" +elif [ "$SKIP_BACKUP" = true ]; then + echo -e "${YELLOW}Skipping backup step (--skip-backup flag)${NC}" + log_message "INFO" "Backup skipped due to --skip-backup flag" + echo "" +fi + +# Clean up old backups +find . -maxdepth 1 -type d -name 'migrations_backup_*' -mtime +"$BACKUP_RETENTION_DAYS" -exec rm -rf {} \; 2>/dev/null || true +log_message "INFO" "Cleaned up backups older than $BACKUP_RETENTION_DAYS days" + +echo -e "${BLUE}Step 3: Generating new migrations in Kubernetes...${NC}" +echo "" + +SUCCESS_COUNT=0 +FAILED_COUNT=0 +FAILED_SERVICES=() + +# Function to process a single service +process_service() { + local service=$1 + local service_dir=$(echo "$service" | tr '-' '_') + local db_env_var=$(echo "$service" | tr '[:lower:]-' '[:upper:]_')_DATABASE_URL # e.g., pos -> POS_DATABASE_URL, alert-processor -> ALERT_PROCESSOR_DATABASE_URL + + echo -e "${BLUE}----------------------------------------${NC}" + echo -e "${BLUE}Processing: $service${NC}" + echo -e "${BLUE}----------------------------------------${NC}" + log_message "INFO" "Starting migration generation for $service" + + # Skip if no local migrations directory and --check-existing is not set + if [ ! 
-d "services/$service_dir/migrations/versions" ] && [ "$CHECK_EXISTING" = false ]; then + echo -e "${YELLOW}⚠ No local migrations/versions directory for $service, skipping...${NC}" + log_message "WARNING" "No local migrations/versions directory for $service" + return + fi + + # Find a running pod + echo -e "${YELLOW}Finding $service pod in namespace $NAMESPACE...${NC}" + POD_NAME=$(get_running_pod "$service") + if [ -z "$POD_NAME" ]; then + echo -e "${RED}✗ No running pod found for $service. Skipping...${NC}" + log_message "ERROR" "No running pod found for $service" + FAILED_COUNT=$((FAILED_COUNT + 1)) + FAILED_SERVICES+=("$service (pod not found)") + return + fi + + echo -e "${GREEN}✓ Found pod: $POD_NAME${NC}" + log_message "INFO" "Found pod $POD_NAME for $service" + + # Check container availability + CONTAINER="${service}-${CONTAINER_SUFFIX}" + if ! kubectl get pod -n "$NAMESPACE" "$POD_NAME" -o jsonpath='{.spec.containers[*].name}' | grep -qw "$CONTAINER"; then + echo -e "${RED}✗ Container $CONTAINER not found in pod $POD_NAME, skipping${NC}" + log_message "ERROR" "Container $CONTAINER not found in pod $POD_NAME for $service" + FAILED_COUNT=$((FAILED_COUNT + 1)) + FAILED_SERVICES+=("$service (container not found)") + return + fi + + # Verify database connectivity + if [ "$SKIP_DB_CHECK" = false ]; then + echo -e "${YELLOW}Verifying database connectivity using $db_env_var...${NC}" + # Check if asyncpg is installed + ASYNCPG_CHECK=$(kubectl exec -n "$NAMESPACE" "$POD_NAME" -c "$CONTAINER" -- sh -c "python3 -c \"import asyncpg; print('asyncpg OK')\" 2>/dev/null" || echo "asyncpg MISSING") + if [[ "$ASYNCPG_CHECK" != "asyncpg OK" ]]; then + echo -e "${YELLOW}Installing asyncpg...${NC}" + kubectl exec -n "$NAMESPACE" "$POD_NAME" -c "$CONTAINER" -- sh -c "python3 -m pip install --quiet asyncpg" 2>>"$LOG_FILE" + fi + + # Check for database URL + DB_URL_CHECK=$(kubectl exec -n "$NAMESPACE" "$POD_NAME" -c "$CONTAINER" -- sh -c "env | grep $db_env_var" 2>/dev/null || echo "") + if [ -z "$DB_URL_CHECK" ]; then + echo -e "${RED}✗ Environment variable $db_env_var not found in pod $POD_NAME${NC}" + echo -e "${YELLOW}Available environment variables:${NC}" + kubectl exec -n "$NAMESPACE" "$POD_NAME" -c "$CONTAINER" -- sh -c "env" 2>>"$LOG_FILE" | grep -i "database" || echo "No database-related variables found" + log_message "ERROR" "Environment variable $db_env_var not found for $service in pod $POD_NAME" + FAILED_COUNT=$((FAILED_COUNT + 1)) + FAILED_SERVICES+=("$service (missing $db_env_var)") + return + fi + + # Log redacted database URL for debugging + DB_URL=$(kubectl exec -n "$NAMESPACE" "$POD_NAME" -c "$CONTAINER" -- sh -c "echo \$"$db_env_var"" 2>/dev/null | sed 's/\(password=\)[^@]*/\1[REDACTED]/') + log_message "INFO" "Using database URL for $service: $DB_URL" + + # Perform async database connectivity check + DB_CHECK_OUTPUT=$(kubectl exec -n "$NAMESPACE" "$POD_NAME" -c "$CONTAINER" -- sh -c "cd /app && PYTHONPATH=/app:/app/shared:\$PYTHONPATH python3 -c \"import asyncio; from sqlalchemy.ext.asyncio import create_async_engine; async def check_db(): engine = create_async_engine(os.getenv('$db_env_var')); async with engine.connect() as conn: pass; await engine.dispose(); print('DB OK'); asyncio.run(check_db())\" 2>&1" || echo "DB ERROR") + if [[ "$DB_CHECK_OUTPUT" == *"DB OK"* ]]; then + echo -e "${GREEN}✓ Database connection verified${NC}" + log_message "INFO" "Database connection verified for $service" + else + echo -e "${RED}✗ Database connection failed for $service${NC}" + echo -e 
"${YELLOW}Error details: $DB_CHECK_OUTPUT${NC}" + log_message "ERROR" "Database connection failed for $service: $DB_CHECK_OUTPUT" + FAILED_COUNT=$((FAILED_COUNT + 1)) + FAILED_SERVICES+=("$service (database connection failed)") + return + fi + else + echo -e "${YELLOW}Skipping database connectivity check (--skip-db-check)${NC}" + log_message "INFO" "Skipped database connectivity check for $service" + fi + + # Reset alembic version tracking + echo -e "${YELLOW}Resetting alembic version tracking...${NC}" + kubectl exec -n "$NAMESPACE" "$POD_NAME" -c "$CONTAINER" -- sh -c "cd /app && PYTHONPATH=/app:/app/shared:\$PYTHONPATH alembic downgrade base" 2>&1 | tee -a "$LOG_FILE" | grep -v "^INFO" || true + log_message "INFO" "Attempted alembic downgrade for $service" + + # Option 1: Complete database schema reset using CASCADE + echo -e "${YELLOW}Performing complete database schema reset...${NC}" + SCHEMA_DROP_RESULT=$(kubectl exec -n "$NAMESPACE" "$POD_NAME" -c "$CONTAINER" -- sh -c "cd /app && PYTHONPATH=/app:/app/shared:\$PYTHONPATH python3 << 'EOFPYTHON' +import asyncio +import os +from sqlalchemy.ext.asyncio import create_async_engine +from sqlalchemy import text + +async def reset_database(): + try: + engine = create_async_engine(os.getenv('$db_env_var')) + async with engine.begin() as conn: + # Drop and recreate public schema - cleanest approach + await conn.execute(text('DROP SCHEMA IF EXISTS public CASCADE')) + await conn.execute(text('CREATE SCHEMA public')) + await conn.execute(text('GRANT ALL ON SCHEMA public TO PUBLIC')) + await engine.dispose() + print('SUCCESS: Database schema reset complete') + return 0 + except Exception as e: + print(f'ERROR: {str(e)}') + return 1 + +exit(asyncio.run(reset_database())) +EOFPYTHON +" 2>&1) + + echo "$SCHEMA_DROP_RESULT" >> "$LOG_FILE" + + if echo "$SCHEMA_DROP_RESULT" | grep -q "SUCCESS"; then + echo -e "${GREEN}✓ Database schema reset successfully${NC}" + log_message "INFO" "Database schema reset for $service" + else + echo -e "${RED}✗ Database schema reset failed${NC}" + echo -e "${YELLOW}Error details:${NC}" + echo "$SCHEMA_DROP_RESULT" + log_message "ERROR" "Database schema reset failed for $service: $SCHEMA_DROP_RESULT" + + # Try alternative approach: Drop individual tables from database (not just models) + echo -e "${YELLOW}Attempting alternative: dropping all existing tables individually...${NC}" + TABLE_DROP_RESULT=$(kubectl exec -n "$NAMESPACE" "$POD_NAME" -c "$CONTAINER" -- sh -c "cd /app && PYTHONPATH=/app:/app/shared:\$PYTHONPATH python3 << 'EOFPYTHON' +import asyncio +import os +from sqlalchemy.ext.asyncio import create_async_engine +from sqlalchemy import text + +async def drop_all_tables(): + try: + engine = create_async_engine(os.getenv('$db_env_var')) + async with engine.begin() as conn: + # Get all tables from database + result = await conn.execute(text(\"\"\" + SELECT tablename + FROM pg_tables + WHERE schemaname = 'public' + \"\"\")) + tables = [row[0] for row in result] + + # Drop each table + for table in tables: + await conn.execute(text(f'DROP TABLE IF EXISTS \"{table}\" CASCADE')) + + print(f'SUCCESS: Dropped {len(tables)} tables: {tables}') + await engine.dispose() + return 0 + except Exception as e: + print(f'ERROR: {str(e)}') + return 1 + +exit(asyncio.run(drop_all_tables())) +EOFPYTHON +" 2>&1) + + echo "$TABLE_DROP_RESULT" >> "$LOG_FILE" + + if echo "$TABLE_DROP_RESULT" | grep -q "SUCCESS"; then + echo -e "${GREEN}✓ All tables dropped successfully${NC}" + log_message "INFO" "All tables dropped for $service" + else + 
echo -e "${RED}✗ Failed to drop tables${NC}" + echo -e "${YELLOW}Error details:${NC}" + echo "$TABLE_DROP_RESULT" + log_message "ERROR" "Failed to drop tables for $service: $TABLE_DROP_RESULT" + FAILED_COUNT=$((FAILED_COUNT + 1)) + FAILED_SERVICES+=("$service (database cleanup failed)") + return + fi + fi + + # Verify database is empty + echo -e "${YELLOW}Verifying database is clean...${NC}" + VERIFY_RESULT=$(kubectl exec -n "$NAMESPACE" "$POD_NAME" -c "$CONTAINER" -- sh -c "cd /app && PYTHONPATH=/app:/app/shared:\$PYTHONPATH python3 << 'EOFPYTHON' +import asyncio +import os +from sqlalchemy.ext.asyncio import create_async_engine +from sqlalchemy import text + +async def verify_empty(): + engine = create_async_engine(os.getenv('$db_env_var')) + async with engine.connect() as conn: + result = await conn.execute(text(\"\"\" + SELECT COUNT(*) + FROM pg_tables + WHERE schemaname = 'public' + \"\"\")) + count = result.scalar() + print(f'Tables remaining: {count}') + await engine.dispose() + return count + +exit(asyncio.run(verify_empty())) +EOFPYTHON +" 2>&1) + + echo "$VERIFY_RESULT" >> "$LOG_FILE" + echo -e "${BLUE}$VERIFY_RESULT${NC}" + + # Initialize alembic version table after schema reset + echo -e "${YELLOW}Initializing alembic version tracking...${NC}" + ALEMBIC_INIT_OUTPUT=$(kubectl exec -n "$NAMESPACE" "$POD_NAME" -c "$CONTAINER" -- sh -c "cd /app && PYTHONPATH=/app:/app/shared:\$PYTHONPATH alembic stamp base" 2>&1) + ALEMBIC_INIT_EXIT_CODE=$? + + echo "$ALEMBIC_INIT_OUTPUT" >> "$LOG_FILE" + + if [ $ALEMBIC_INIT_EXIT_CODE -eq 0 ]; then + echo -e "${GREEN}✓ Alembic version tracking initialized${NC}" + log_message "INFO" "Alembic version tracking initialized for $service" + else + echo -e "${YELLOW}⚠ Alembic initialization warning (may be normal)${NC}" + log_message "WARNING" "Alembic initialization for $service: $ALEMBIC_INIT_OUTPUT" + fi + + # Remove old migration files in pod + echo -e "${YELLOW}Removing old migration files in pod...${NC}" + kubectl exec -n "$NAMESPACE" "$POD_NAME" -c "$CONTAINER" -- sh -c "rm -rf /app/migrations/versions/*.py /app/migrations/versions/__pycache__" 2>>"$LOG_FILE" || log_message "WARNING" "Failed to remove old migration files for $service" + + # Ensure dependencies + echo -e "${YELLOW}Ensuring python-dateutil and asyncpg are installed...${NC}" + kubectl exec -n "$NAMESPACE" "$POD_NAME" -c "$CONTAINER" -- sh -c "python3 -m pip install --quiet python-dateutil asyncpg" 2>>"$LOG_FILE" + + # Generate migration + echo -e "${YELLOW}Running alembic autogenerate in pod...${NC}" + MIGRATION_TIMESTAMP=$(date +%Y%m%d_%H%M) + MIGRATION_OUTPUT=$(kubectl exec -n "$NAMESPACE" "$POD_NAME" -c "$CONTAINER" -- sh -c "cd /app && PYTHONPATH=/app:/app/shared:\$PYTHONPATH python3 -m alembic revision --autogenerate -m \"initial_schema_$MIGRATION_TIMESTAMP\"" 2>&1) + MIGRATION_EXIT_CODE=$? 
+ + echo "$MIGRATION_OUTPUT" >> "$LOG_FILE" + + if [ $MIGRATION_EXIT_CODE -eq 0 ]; then + echo -e "${GREEN}✓ Migration generated in pod${NC}" + log_message "INFO" "Migration generated for $service" + + # Copy migration file + MIGRATION_FILE=$(kubectl exec -n "$NAMESPACE" "$POD_NAME" -c "$CONTAINER" -- sh -c "ls -t /app/migrations/versions/*.py 2>/dev/null | head -1" || echo "") + if [ -z "$MIGRATION_FILE" ]; then + echo -e "${RED}✗ No migration file found in pod${NC}" + log_message "ERROR" "No migration file generated for $service" + FAILED_COUNT=$((FAILED_COUNT + 1)) + FAILED_SERVICES+=("$service (no file generated)") + return + fi + + MIGRATION_FILENAME=$(basename "$MIGRATION_FILE") + mkdir -p "services/$service_dir/migrations/versions" + + # Copy file with better error handling + echo -e "${YELLOW}Copying migration file from pod...${NC}" + CP_OUTPUT=$(kubectl cp -n "$NAMESPACE" "$POD_NAME:$MIGRATION_FILE" "services/$service_dir/migrations/versions/$MIGRATION_FILENAME" -c "$CONTAINER" 2>&1) + CP_EXIT_CODE=$? + + echo "$CP_OUTPUT" >> "$LOG_FILE" + + # Verify the file was actually copied + if [ $CP_EXIT_CODE -eq 0 ] && [ -f "services/$service_dir/migrations/versions/$MIGRATION_FILENAME" ]; then + LOCAL_FILE_SIZE=$(wc -c < "services/$service_dir/migrations/versions/$MIGRATION_FILENAME" | tr -d ' ') + + if [ "$LOCAL_FILE_SIZE" -gt 0 ]; then + echo -e "${GREEN}✓ Migration file copied: $MIGRATION_FILENAME ($LOCAL_FILE_SIZE bytes)${NC}" + log_message "INFO" "Copied $MIGRATION_FILENAME for $service ($LOCAL_FILE_SIZE bytes)" + SUCCESS_COUNT=$((SUCCESS_COUNT + 1)) + + # Validate migration content + echo -e "${YELLOW}Validating migration content...${NC}" + if grep -E "op\.(create_table|add_column|create_index|alter_column|drop_table|drop_column|create_foreign_key)" "services/$service_dir/migrations/versions/$MIGRATION_FILENAME" >/dev/null; then + echo -e "${GREEN}✓ Migration contains schema operations${NC}" + log_message "INFO" "Migration contains schema operations for $service" + elif grep -q "pass" "services/$service_dir/migrations/versions/$MIGRATION_FILENAME" && grep -q "def upgrade()" "services/$service_dir/migrations/versions/$MIGRATION_FILENAME"; then + echo -e "${YELLOW}⚠ WARNING: Migration is empty (no schema changes detected)${NC}" + echo -e "${YELLOW}⚠ This usually means tables already exist in database matching the models${NC}" + log_message "WARNING" "Empty migration generated for $service - possible database cleanup issue" + else + echo -e "${GREEN}✓ Migration file created${NC}" + fi + + # Display summary + echo -e "${BLUE}Migration summary:${NC}" + grep -E "^def (upgrade|downgrade)" "services/$service_dir/migrations/versions/$MIGRATION_FILENAME" | head -2 + echo -e "${BLUE}Operations:${NC}" + grep "op\." 
"services/$service_dir/migrations/versions/$MIGRATION_FILENAME" | head -5 || echo " (none found)" + else + echo -e "${RED}✗ Migration file is empty (0 bytes)${NC}" + log_message "ERROR" "Migration file is empty for $service" + rm -f "services/$service_dir/migrations/versions/$MIGRATION_FILENAME" + FAILED_COUNT=$((FAILED_COUNT + 1)) + FAILED_SERVICES+=("$service (empty file)") + fi + else + echo -e "${RED}✗ Failed to copy migration file${NC}" + echo -e "${YELLOW}kubectl cp exit code: $CP_EXIT_CODE${NC}" + echo -e "${YELLOW}kubectl cp output: $CP_OUTPUT${NC}" + log_message "ERROR" "Failed to copy migration file for $service: $CP_OUTPUT" + FAILED_COUNT=$((FAILED_COUNT + 1)) + FAILED_SERVICES+=("$service (copy failed)") + fi + else + echo -e "${RED}✗ Failed to generate migration${NC}" + log_message "ERROR" "Failed to generate migration for $service" + FAILED_COUNT=$((FAILED_COUNT + 1)) + FAILED_SERVICES+=("$service (generation failed)") + fi +} + +# Process services sequentially +for service in "${SERVICES[@]}"; do + process_service "$service" +done + +# Summary +echo "" +echo -e "${BLUE}========================================${NC}" +echo -e "${BLUE}Summary${NC}" +echo -e "${BLUE}========================================${NC}" +echo -e "${GREEN}✓ Successful: $SUCCESS_COUNT services${NC}" +echo -e "${RED}✗ Failed: $FAILED_COUNT services${NC}" + +if [ "$FAILED_COUNT" -gt 0 ]; then + echo "" + echo -e "${RED}Failed services:${NC}" + for failed_service in "${FAILED_SERVICES[@]}"; do + echo -e "${RED} - $failed_service${NC}" + done +fi + +echo "" +echo -e "${YELLOW}Backup location: $BACKUP_DIR${NC}" +echo -e "${YELLOW}Log file: $LOG_FILE${NC}" +echo "" + +# Apply migrations if requested +if [ "$APPLY_MIGRATIONS" = true ] && [ "$DRY_RUN" = false ] && [ "$SUCCESS_COUNT" -gt 0 ]; then + echo -e "${BLUE}========================================${NC}" + echo -e "${BLUE}Applying Migrations${NC}" + echo -e "${BLUE}========================================${NC}" + echo "" + + APPLIED_COUNT=0 + APPLY_FAILED_COUNT=0 + + for service in "${SERVICES[@]}"; do + service_dir=$(echo "$service" | tr '-' '_') + local db_env_var=$(echo "$service" | tr '[:lower:]-' '[:upper:]_')_DATABASE_URL + if [ ! -d "services/$service_dir/migrations/versions" ] || [ -z "$(ls services/$service_dir/migrations/versions/*.py 2>/dev/null)" ]; then + continue + fi + + echo -e "${BLUE}Applying migrations for: $service${NC}" + POD_NAME=$(get_running_pod "$service") + if [ -z "$POD_NAME" ]; then + echo -e "${YELLOW}⚠ Pod not found for $service, skipping...${NC}" + log_message "WARNING" "No running pod found for $service during migration application" + continue + fi + + CONTAINER="${service}-${CONTAINER_SUFFIX}" + if ! 
kubectl get pod -n "$NAMESPACE" "$POD_NAME" -o jsonpath='{.spec.containers[*].name}' | grep -qw "$CONTAINER"; then + echo -e "${RED}✗ Container $CONTAINER not found in pod $POD_NAME, skipping${NC}" + log_message "ERROR" "Container $CONTAINER not found in pod $POD_NAME for $service" + continue + fi + + if kubectl exec -n "$NAMESPACE" "$POD_NAME" -c "$CONTAINER" -- sh -c "cd /app && PYTHONPATH=/app:/app/shared:\$PYTHONPATH alembic upgrade head" 2>>"$LOG_FILE"; then + echo -e "${GREEN}✓ Migrations applied successfully for $service${NC}" + log_message "INFO" "Migrations applied for $service" + APPLIED_COUNT=$((APPLIED_COUNT + 1)) + else + echo -e "${RED}✗ Failed to apply migrations for $service${NC}" + log_message "ERROR" "Failed to apply migrations for $service" + APPLY_FAILED_COUNT=$((APPLY_FAILED_COUNT + 1)) + fi + echo "" + done + + echo -e "${BLUE}Migration Application Summary:${NC}" + echo -e "${GREEN}✓ Applied: $APPLIED_COUNT services${NC}" + echo -e "${RED}✗ Failed: $APPLY_FAILED_COUNT services${NC}" + echo "" +fi + +# Clean up temporary files +rm -f /tmp/*_migration.log /tmp/*_downgrade.log /tmp/*_apply.log 2>/dev/null || true +log_message "INFO" "Cleaned up temporary files" + +echo -e "${BLUE}Next steps:${NC}" +echo -e "${YELLOW}1. Review the generated migrations in services/*/migrations/versions/${NC}" +echo -e "${YELLOW}2. Compare with the backup in $BACKUP_DIR${NC}" +echo -e "${YELLOW}3. Check logs in $LOG_FILE for details${NC}" +echo -e "${YELLOW}4. Test migrations by applying them:${NC}" +echo -e " ${GREEN}kubectl exec -n $NAMESPACE -it -c -${CONTAINER_SUFFIX} -- alembic upgrade head${NC}" +echo -e "${YELLOW}5. Verify tables were created:${NC}" +echo -e " ${GREEN}kubectl exec -n $NAMESPACE -it -c -${CONTAINER_SUFFIX} -- python3 -c \"${NC}" +echo -e " ${GREEN}import asyncio; from sqlalchemy.ext.asyncio import create_async_engine; from sqlalchemy import inspect; async def check_tables(): engine = create_async_engine(os.getenv('_DATABASE_URL')); async with engine.connect() as conn: print(inspect(conn).get_table_names()); await engine.dispose(); asyncio.run(check_tables())${NC}" +echo -e " ${GREEN}\"${NC}" +echo -e "${YELLOW}6. If issues occur, restore from backup:${NC}" +echo -e " ${GREEN}cp -r $BACKUP_DIR/*/versions/* services/*/migrations/versions/${NC}" +echo "" \ No newline at end of file diff --git a/infrastructure/kubernetes/remove-imagepullsecrets.sh b/infrastructure/scripts/maintenance/remove-imagepullsecrets.sh similarity index 100% rename from infrastructure/kubernetes/remove-imagepullsecrets.sh rename to infrastructure/scripts/maintenance/remove-imagepullsecrets.sh diff --git a/infrastructure/scripts/maintenance/run_subscription_integration_test.sh b/infrastructure/scripts/maintenance/run_subscription_integration_test.sh new file mode 100755 index 00000000..d95d223d --- /dev/null +++ b/infrastructure/scripts/maintenance/run_subscription_integration_test.sh @@ -0,0 +1,145 @@ +#!/bin/bash + +# Script to run the subscription creation integration test inside Kubernetes +# This script creates a test pod that runs the integration test + +set -e + +echo "🚀 Starting subscription creation integration test..." + +# Check if there's already a test pod running +EXISTING_POD=$(kubectl get pod subscription-integration-test -n bakery-ia 2>/dev/null || echo "") +if [ -n "$EXISTING_POD" ]; then + echo "🧹 Cleaning up existing test pod..." 
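+    # Equivalent shortcut if the existence check above is not needed:
+    #   kubectl delete pod subscription-integration-test -n bakery-ia --ignore-not-found --wait=true
+    # (--ignore-not-found turns the delete into a no-op when the pod does not exist)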
+ kubectl delete pod subscription-integration-test -n bakery-ia --wait=true + echo "✅ Existing pod cleaned up" +fi + +# Determine the correct image to use by checking the existing tenant service deployment +IMAGE=$(kubectl get deployment tenant-service -n bakery-ia -o jsonpath='{.spec.template.spec.containers[0].image}') + +if [ -z "$IMAGE" ]; then + echo "❌ Could not determine tenant service image. Is the tenant service deployed?" + exit 1 +fi + +echo "📦 Using image: $IMAGE" + +# Create a test pod that runs the integration test with a simple command +echo "🔧 Creating test pod..." +kubectl run subscription-integration-test \ + --image="$IMAGE" \ + --namespace=bakery-ia \ + --restart=Never \ + --env="GATEWAY_URL=http://gateway-service:8000" \ + --env="STRIPE_SECRET_KEY=$(kubectl get secret payment-secrets -n bakery-ia -o jsonpath='{.data.STRIPE_SECRET_KEY}' | base64 -d)" \ + --command -- /bin/sh -c " + set -e + echo '🧪 Setting up test environment...' && + cd /app && + echo '📋 Installing test dependencies...' && + pip install pytest pytest-asyncio httpx stripe --quiet && + echo '✅ Dependencies installed' && + echo '' && + echo '🔧 Configuring test to use internal gateway service URL...' && + # Backup original file before modification + cp tests/integration/test_subscription_creation_flow.py tests/integration/test_subscription_creation_flow.py.bak && + # Update the test file to use the internal gateway service URL + sed -i 's|self.base_url = \"https://bakery-ia.local\"|self.base_url = \"http://gateway-service:8000\"|g' tests/integration/test_subscription_creation_flow.py && + echo '✅ Test configured for internal Kubernetes networking' && + echo '' && + echo '🧪 Running subscription creation integration test...' && + echo '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━' && + python -m pytest tests/integration/test_subscription_creation_flow.py -v --tb=short -s --color=yes && + TEST_RESULT=\$? && + echo '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━' && + echo '' && + echo '📋 Restoring original test file...' && + mv tests/integration/test_subscription_creation_flow.py.bak tests/integration/test_subscription_creation_flow.py && + echo '✅ Original test file restored' && + echo '' && + if [ \$TEST_RESULT -eq 0 ]; then + echo '🎉 Integration test PASSED!' + else + echo '❌ Integration test FAILED!' + fi && + exit \$TEST_RESULT + " + +# Wait for the test pod to start +echo "⏳ Waiting for test pod to start..." +sleep 5 + +# Follow the logs in real-time +echo "📋 Following test execution logs..." +echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" +echo "" + +# Stream logs while the pod is running +kubectl logs -f subscription-integration-test -n bakery-ia 2>/dev/null || true + +echo "" +echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" + +# Wait for the pod to complete with a timeout +echo "⏳ Waiting for test pod to complete..." +TIMEOUT=600 # 10 minutes timeout +COUNTER=0 +while [ $COUNTER -lt $TIMEOUT ]; do + POD_STATUS=$(kubectl get pod subscription-integration-test -n bakery-ia -o jsonpath='{.status.phase}' 2>/dev/null) + + if [ "$POD_STATUS" == "Succeeded" ] || [ "$POD_STATUS" == "Failed" ]; then + break + fi + + sleep 2 + COUNTER=$((COUNTER + 2)) +done + +if [ $COUNTER -ge $TIMEOUT ]; then + echo "⏰ Timeout waiting for test to complete after $TIMEOUT seconds" + echo "📋 Fetching final logs before cleanup..." + kubectl logs subscription-integration-test -n bakery-ia --tail=100 + echo "🧹 Cleaning up test pod due to timeout..." 
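+    # If the test repeatedly times out, inspecting the pod before deleting it usually
+    # explains why (image pull, pending scheduling, etc.):
+    #   kubectl describe pod subscription-integration-test -n bakery-ia
+    #   kubectl get events -n bakery-ia --sort-by='.lastTimestamp' | tail -20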
+ kubectl delete pod subscription-integration-test -n bakery-ia --wait=false + exit 1 +fi + +# Get the final status +POD_STATUS=$(kubectl get pod subscription-integration-test -n bakery-ia -o jsonpath='{.status.phase}') +CONTAINER_EXIT_CODE=$(kubectl get pod subscription-integration-test -n bakery-ia -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}' 2>/dev/null || echo "unknown") + +echo "" +echo "📊 Test Results:" +echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" +echo "Pod Status: $POD_STATUS" +echo "Exit Code: $CONTAINER_EXIT_CODE" + +# Determine if the test passed +if [ "$POD_STATUS" == "Succeeded" ] && [ "$CONTAINER_EXIT_CODE" == "0" ]; then + echo "" + echo "✅ Integration test PASSED!" + echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" + RESULT=0 +else + echo "" + echo "❌ Integration test FAILED!" + echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" + + # Show additional logs if failed + if [ "$POD_STATUS" == "Failed" ]; then + echo "" + echo "📋 Last 50 lines of logs:" + kubectl logs subscription-integration-test -n bakery-ia --tail=50 + fi + + RESULT=1 +fi + +# Clean up the test pod +echo "" +echo "🧹 Cleaning up test pod..." +kubectl delete pod subscription-integration-test -n bakery-ia --wait=false + +echo "🏁 Integration test process completed!" +exit $RESULT \ No newline at end of file diff --git a/infrastructure/scripts/maintenance/setup-https.sh b/infrastructure/scripts/maintenance/setup-https.sh new file mode 100755 index 00000000..00483f6e --- /dev/null +++ b/infrastructure/scripts/maintenance/setup-https.sh @@ -0,0 +1,649 @@ +#!/bin/bash + +# Bakery IA HTTPS Setup Script +# This script sets up HTTPS with cert-manager and Let's Encrypt for local development + +# Remove -e to handle errors more gracefully +set -u + +echo "🔒 Setting up HTTPS for Bakery IA with cert-manager and Let's Encrypt" +echo "===============================================================" + +# Colors for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +NC='\033[0m' # No Color + +# Function to print colored output +print_status() { + echo -e "${BLUE}[INFO]${NC} $1" +} + +print_success() { + echo -e "${GREEN}[SUCCESS]${NC} $1" +} + +print_warning() { + echo -e "${YELLOW}[WARNING]${NC} $1" +} + +print_error() { + echo -e "${RED}[ERROR]${NC} $1" +} + +# Check prerequisites +check_prerequisites() { + print_status "Checking prerequisites..." + + # Check required tools + local missing_tools=() + + if ! command -v kubectl &> /dev/null; then + missing_tools+=("kubectl") + fi + + if ! command -v kind &> /dev/null; then + missing_tools+=("kind") + fi + + if ! command -v skaffold &> /dev/null; then + missing_tools+=("skaffold") + fi + + if ! command -v colima &> /dev/null; then + missing_tools+=("colima") + fi + + # Report missing tools + if [ ${#missing_tools[@]} -ne 0 ]; then + print_error "Missing required tools: ${missing_tools[*]}" + print_error "Please install them with: brew install ${missing_tools[*]}" + exit 1 + fi + + # Check if Colima is running + if ! colima status --profile k8s-local &> /dev/null; then + print_warning "Colima is not running. Starting Colima..." + colima start --cpu 8 --memory 16 --disk 100 --runtime docker --profile k8s-local + if [ $? -ne 0 ]; then + print_error "Failed to start Colima. Please check your Docker installation." 
+ exit 1 + fi + print_success "Colima started successfully" + fi + + # Check if cluster is running or exists + local cluster_exists=false + local cluster_running=false + + # Check if Kind cluster exists + if kind get clusters | grep -q "bakery-ia-local"; then + cluster_exists=true + print_status "Kind cluster 'bakery-ia-local' already exists" + + # Check if kubectl can connect to it + if kubectl cluster-info --context kind-bakery-ia-local &> /dev/null; then + cluster_running=true + print_success "Kubernetes cluster is running and accessible" + else + print_warning "Kind cluster exists but is not accessible via kubectl" + fi + fi + + # Handle cluster creation/recreation + if [ "$cluster_exists" = true ] && [ "$cluster_running" = false ]; then + print_warning "Kind cluster exists but is not running. Recreating..." + kind delete cluster --name bakery-ia-local || true + cluster_exists=false + fi + + if [ "$cluster_exists" = false ]; then + print_warning "Creating new Kind cluster..." + if [ ! -f "kind-config.yaml" ]; then + print_error "kind-config.yaml not found. Please ensure you're running this script from the project root." + exit 1 + fi + + if kind create cluster --config kind-config.yaml; then + print_success "Kind cluster created successfully" + else + print_error "Failed to create Kind cluster. Please check your Kind installation." + exit 1 + fi + fi + + # Ensure we're using the correct kubectl context + kubectl config use-context kind-bakery-ia-local || { + print_error "Failed to set kubectl context to kind-bakery-ia-local" + exit 1 + } + + print_success "Prerequisites check passed" +} + +# Install cert-manager +install_cert_manager() { + print_status "Installing cert-manager..." + + # Check if cert-manager is already installed + if kubectl get namespace cert-manager &> /dev/null; then + print_warning "cert-manager namespace already exists. Checking if installation is complete..." + + # Check if pods are running + if kubectl get pods -n cert-manager | grep -q "Running"; then + print_success "cert-manager is already installed and running" + return 0 + else + print_status "cert-manager exists but pods are not ready. Waiting..." + fi + else + # Install cert-manager + print_status "Installing cert-manager from official release..." + if kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.2/cert-manager.yaml; then + print_success "cert-manager installation started" + else + print_error "Failed to install cert-manager. Please check your internet connection and try again." + exit 1 + fi + fi + + # Wait for cert-manager namespace to be created + print_status "Waiting for cert-manager namespace..." + for i in {1..30}; do + if kubectl get namespace cert-manager &> /dev/null; then + break + fi + sleep 2 + done + + # Wait for cert-manager pods to be created + print_status "Waiting for cert-manager pods to be created..." + for i in {1..60}; do + if kubectl get pods -n cert-manager &> /dev/null && [ $(kubectl get pods -n cert-manager --no-headers | wc -l) -ge 3 ]; then + print_success "cert-manager pods created" + break + fi + print_status "Waiting for cert-manager pods... (attempt $i/60)" + sleep 5 + done + + # Wait for cert-manager pods to be ready + print_status "Waiting for cert-manager pods to be ready..." 
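+    # The selectors below assume the labels shipped with the upstream cert-manager v1.13.2
+    # manifests (app.kubernetes.io/name = cert-manager, cainjector, webhook). If a different
+    # version is installed, the labels actually in use can be checked with:
+    #   kubectl get pods -n cert-manager --show-labels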
+ + # Use more reliable selectors for cert-manager components + local components=( + "app.kubernetes.io/name=cert-manager" + "app.kubernetes.io/name=cainjector" + "app.kubernetes.io/name=webhook" + ) + local component_names=("cert-manager" "cert-manager-cainjector" "cert-manager-webhook") + + for i in "${!components[@]}"; do + local selector="${components[$i]}" + local name="${component_names[$i]}" + + print_status "Waiting for $name to be ready..." + + # First check if pods exist with this selector + local pod_count=0 + for attempt in {1..30}; do + pod_count=$(kubectl get pods -n cert-manager -l "$selector" --no-headers 2>/dev/null | wc -l) + if [ "$pod_count" -gt 0 ]; then + break + fi + sleep 2 + done + + if [ "$pod_count" -eq 0 ]; then + print_warning "No pods found for $name with selector $selector, trying alternative approach..." + # Fallback: wait for any pods containing the component name + if kubectl wait --for=condition=ready pod -n cert-manager --all --timeout=300s 2>/dev/null; then + print_success "All cert-manager pods are ready" + break + else + print_warning "$name pods not found, but continuing..." + continue + fi + fi + + # Wait for the specific component to be ready + if kubectl wait --for=condition=ready pod -l "$selector" -n cert-manager --timeout=300s 2>/dev/null; then + print_success "$name is ready" + else + print_warning "$name is taking longer than expected. Checking status..." + kubectl get pods -n cert-manager -l "$selector" 2>/dev/null || true + + # Continue anyway, sometimes it works despite timeout + print_warning "Continuing with setup. $name may still be starting..." + fi + done + + # Final verification + if kubectl get pods -n cert-manager | grep -q "Running"; then + print_success "cert-manager installed successfully" + else + print_warning "cert-manager installation may not be complete. Current status:" + kubectl get pods -n cert-manager + print_status "Continuing with setup anyway..." + fi +} + +# Install NGINX Ingress Controller +install_nginx_ingress() { + print_status "Installing NGINX Ingress Controller for Kind..." + + # Check if NGINX Ingress is already installed + if kubectl get namespace ingress-nginx &> /dev/null; then + print_warning "NGINX Ingress Controller namespace already exists. Checking status..." + + # Check if controller is running + if kubectl get pods -n ingress-nginx -l app.kubernetes.io/component=controller | grep -q "Running"; then + print_success "NGINX Ingress Controller is already running" + else + print_status "NGINX Ingress Controller exists but not ready. Waiting..." + kubectl wait --namespace ingress-nginx \ + --for=condition=ready pod \ + --selector=app.kubernetes.io/component=controller \ + --timeout=300s 2>/dev/null || { + print_warning "Ingress controller taking longer than expected, but continuing..." + } + fi + else + # Install NGINX Ingress Controller for Kind (updated URL) + print_status "Installing NGINX Ingress Controller for Kind..." + if kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml; then + print_success "NGINX Ingress Controller installation started" + + # Wait for ingress controller to be ready + print_status "Waiting for NGINX Ingress Controller to be ready..." + kubectl wait --namespace ingress-nginx \ + --for=condition=ready pod \ + --selector=app.kubernetes.io/component=controller \ + --timeout=300s 2>/dev/null || { + print_warning "Ingress controller taking longer than expected, but continuing..." 
+ } + else + print_error "Failed to install NGINX Ingress Controller" + exit 1 + fi + fi + + # Configure ingress for permanent localhost access + print_status "Configuring permanent localhost access..." + kubectl patch svc ingress-nginx-controller -n ingress-nginx -p '{"spec":{"type":"NodePort","ports":[{"name":"http","port":80,"targetPort":"http","nodePort":30080},{"name":"https","port":443,"targetPort":"https","nodePort":30443}]}}' || true + + print_success "NGINX Ingress Controller configured successfully" +} + +# Setup cluster issuers +setup_cluster_issuers() { + print_status "Setting up cluster issuers..." + + # Check if cert-manager components exist + if [ ! -f "infrastructure/platform/cert-manager/cluster-issuer-staging.yaml" ]; then + print_error "cert-manager component files not found. Please ensure you're running this script from the project root." + exit 1 + fi + + # Apply cluster issuers + print_status "Applying cluster issuers..." + + local issuer_files=( + "infrastructure/platform/cert-manager/cluster-issuer-staging.yaml" + "infrastructure/platform/cert-manager/local-ca-issuer.yaml" + "infrastructure/platform/cert-manager/cluster-issuer-production.yaml" + ) + + for issuer_file in "${issuer_files[@]}"; do + if [ -f "$issuer_file" ]; then + print_status "Applying $issuer_file..." + kubectl apply -f "$issuer_file" || { + print_warning "Failed to apply $issuer_file, but continuing..." + } + else + print_warning "$issuer_file not found, skipping..." + fi + done + + # Wait for the issuers to be created + print_status "Waiting for cluster issuers to be ready..." + sleep 15 + + # Check if issuers are ready + print_status "Checking cluster issuer status..." + kubectl get clusterissuers 2>/dev/null || print_warning "No cluster issuers found yet" + + # Verify that the local CA issuer is ready (if it exists) + if kubectl get clusterissuer local-ca-issuer &> /dev/null; then + for i in {1..10}; do + local issuer_ready=$(kubectl get clusterissuer local-ca-issuer -o jsonpath='{.status.conditions[0].type}' 2>/dev/null || echo "") + if [[ "$issuer_ready" == "Ready" ]]; then + print_success "Local CA issuer is ready" + break + fi + print_status "Waiting for local CA issuer to be ready... (attempt $i/10)" + sleep 10 + done + else + print_warning "Local CA issuer not found, skipping readiness check" + fi + + print_success "Cluster issuers configured successfully" +} + +# Deploy the application with HTTPS using Skaffold +deploy_with_https() { + print_status "Deploying Bakery IA with HTTPS support using Skaffold..." + + # Check if Skaffold is available + if ! command -v skaffold &> /dev/null; then + print_error "Skaffold is not installed. Please install skaffold first:" + print_error "brew install skaffold" + exit 1 + fi + + # Check if skaffold.yaml exists + if [ ! -f "skaffold.yaml" ]; then + print_error "skaffold.yaml not found. Please ensure you're running this script from the project root." + exit 1 + fi + + # Deploy with Skaffold (builds and deploys automatically with HTTPS support) + print_status "Building and deploying with Skaffold (dev profile includes HTTPS)..." + if skaffold run --profile=dev; then + print_success "Skaffold deployment started" + else + print_warning "Skaffold deployment had issues, but continuing..." + fi + + # Wait for namespace to be created + print_status "Waiting for bakery-ia namespace..." 
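+    # skaffold run --profile=dev is expected to create the bakery-ia namespace as part of the
+    # deploy; progress can be followed from another terminal with:
+    #   kubectl get pods -n bakery-ia -w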
+ for i in {1..30}; do + if kubectl get namespace bakery-ia &> /dev/null; then + print_success "bakery-ia namespace found" + break + fi + sleep 2 + done + + # Check if namespace was created + if ! kubectl get namespace bakery-ia &> /dev/null; then + print_warning "bakery-ia namespace not found. Deployment may have failed." + return 0 + fi + + # Wait for deployments to be ready + print_status "Waiting for deployments to be ready..." + if kubectl wait --for=condition=available --timeout=600s deployment --all -n bakery-ia 2>/dev/null; then + print_success "All deployments are ready" + else + print_warning "Some deployments are taking longer than expected, but continuing..." + fi + + # Verify ingress exists + if kubectl get ingress bakery-ingress -n bakery-ia &> /dev/null; then + print_success "HTTPS ingress configured successfully" + else + print_warning "Ingress not found, but continuing with setup..." + fi + + print_success "Application deployed with HTTPS support using Skaffold" +} + +# Check certificate status +check_certificates() { + print_status "Checking certificate status..." + + # Wait for certificate to be issued + print_status "Waiting for certificates to be issued..." + + # Check if certificate exists + for i in {1..12}; do + if kubectl get certificate bakery-ia-tls-cert -n bakery-ia &> /dev/null; then + print_success "Certificate found" + break + fi + print_status "Waiting for certificate to be created... (attempt $i/12)" + sleep 10 + done + + # Wait for certificate to be ready + for i in {1..20}; do + if kubectl get certificate bakery-ia-tls-cert -n bakery-ia -o jsonpath='{.status.conditions[0].type}' 2>/dev/null | grep -q "Ready"; then + print_success "Certificate is ready" + break + fi + print_status "Waiting for certificate to be ready... (attempt $i/20)" + sleep 15 + done + + echo "" + echo "📋 Certificate status:" + kubectl get certificates -n bakery-ia 2>/dev/null || print_warning "No certificates found" + + echo "" + echo "🔍 Certificate details:" + kubectl describe certificate bakery-ia-tls-cert -n bakery-ia 2>/dev/null || print_warning "Certificate not found" + + echo "" + echo "🔐 TLS secret status:" + kubectl get secret bakery-ia-tls-cert -n bakery-ia 2>/dev/null || print_warning "TLS secret not found" +} + +# Update hosts file +update_hosts_file() { + print_status "Checking hosts file configuration..." + + # Get the external IP for Kind + EXTERNAL_IP="127.0.0.1" + + # Check if entries exist in hosts file + if ! grep -q "bakery-ia.local" /etc/hosts 2>/dev/null; then + print_warning "Adding entries to /etc/hosts file for named host access..." + + # Ask for user permission + read -p "Do you want to add entries to /etc/hosts for named host access? (y/N): " -n 1 -r + echo + if [[ $REPLY =~ ^[Yy]$ ]]; then + # Add hosts entries with proper error handling + { + echo "$EXTERNAL_IP bakery-ia.local" + echo "$EXTERNAL_IP api.bakery-ia.local" + echo "$EXTERNAL_IP monitoring.bakery-ia.local" + } | sudo tee -a /etc/hosts > /dev/null + + if [ $? -eq 0 ]; then + print_success "Hosts file entries added successfully" + else + print_error "Failed to update hosts file. You may need to add entries manually." + fi + else + print_warning "Skipping hosts file update. 
You can still access via https://localhost" + fi + else + print_success "Hosts file entries already exist" + fi + + echo "" + print_status "Available access methods:" + echo " 🌐 Primary: https://localhost (no hosts file needed)" + echo " 🏷️ Named: https://bakery-ia.local (requires hosts file)" + echo " 🔗 API: https://localhost/api or https://api.bakery-ia.local" +} + +# Export CA certificate for browser trust +export_ca_certificate() { + print_status "Exporting CA certificate for browser trust..." + + # Wait for CA certificate to be created + for i in {1..10}; do + if kubectl get secret local-ca-key-pair -n cert-manager &> /dev/null; then + print_success "CA certificate secret found" + break + fi + print_status "Waiting for CA certificate secret... (attempt $i/10)" + sleep 10 + done + + # Extract the CA certificate + if kubectl get secret local-ca-key-pair -n cert-manager &> /dev/null; then + if kubectl get secret local-ca-key-pair -n cert-manager -o jsonpath='{.data.tls\.crt}' | base64 -d > bakery-ia-ca.crt 2>/dev/null; then + print_success "CA certificate exported as 'bakery-ia-ca.crt'" + + # Make the certificate file readable + chmod 644 bakery-ia-ca.crt + else + print_warning "Failed to extract CA certificate from secret" + fi + + print_warning "To trust this certificate and remove browser warnings:" + echo "" + echo "📱 macOS:" + echo " 1. Double-click 'bakery-ia-ca.crt' to open Keychain Access" + echo " 2. Find 'bakery-ia-local-ca' in the certificates list" + echo " 3. Double-click it and set to 'Always Trust'" + echo "" + echo "🐧 Linux:" + echo " sudo cp bakery-ia-ca.crt /usr/local/share/ca-certificates/" + echo " sudo update-ca-certificates" + echo "" + echo "🪟 Windows:" + echo " 1. Double-click 'bakery-ia-ca.crt'" + echo " 2. Click 'Install Certificate'" + echo " 3. Choose 'Trusted Root Certification Authorities'" + echo "" + else + print_warning "CA certificate secret not found. HTTPS will work but with browser warnings." + print_warning "You can still access the application at https://localhost" + fi +} + +# Display access information +display_access_info() { + print_success "🎉 HTTPS setup completed!" + echo "" + echo "🌐 Access your application at:" + echo " Primary: https://localhost" + echo " API: https://localhost/api" + echo " Named Host: https://bakery-ia.local (if hosts file updated)" + echo " API Named: https://api.bakery-ia.local (if hosts file updated)" + echo "" + echo "🛠️ Useful commands:" + echo " 📋 Check status: kubectl get all -n bakery-ia" + echo " 🔍 Check ingress: kubectl get ingress -n bakery-ia" + echo " 📜 Check certificates: kubectl get certificates -n bakery-ia" + echo " 📝 View service logs: kubectl logs -f deployment/ -n bakery-ia" + echo " 🚀 Development mode: skaffold dev --profile=dev" + echo " 🧹 Clean up: skaffold delete --profile=dev" + echo " 🔄 Restart service: kubectl rollout restart deployment/ -n bakery-ia" + echo "" + echo "🔧 Troubleshooting:" + echo " 🩺 Get events: kubectl get events -n bakery-ia --sort-by='.firstTimestamp'" + echo " 🔍 Describe pod: kubectl describe pod -n bakery-ia" + echo " 📊 Resource usage: kubectl top pods -n bakery-ia" + echo " 🔐 Certificate details: kubectl describe certificate bakery-ia-tls-cert -n bakery-ia" + echo "" + if [ -f "bakery-ia-ca.crt" ]; then + print_warning "📋 Next steps:" + echo " 1. Import 'bakery-ia-ca.crt' into your browser to remove certificate warnings" + echo " 2. Access https://localhost to verify the setup" + echo " 3. 
Run 'skaffold dev --profile=dev' for development with hot-reload" + else + print_warning "⚠️ Note: You may see certificate warnings until the CA certificate is properly configured" + fi + echo "" + print_status "🎯 The application is now ready for secure development!" +} + +# Check current cert-manager status for debugging +check_current_cert_manager_status() { + print_status "Checking current cert-manager status..." + + if kubectl get namespace cert-manager &> /dev/null; then + echo "" + echo "📋 Current cert-manager pods status:" + kubectl get pods -n cert-manager + + echo "" + echo "🔍 cert-manager deployments:" + kubectl get deployments -n cert-manager + + # Check for any pending or failed pods + local failed_pods=$(kubectl get pods -n cert-manager --field-selector=status.phase!=Running --no-headers 2>/dev/null | wc -l) + if [ "$failed_pods" -gt 0 ]; then + echo "" + print_warning "Found $failed_pods non-running pods. Details:" + kubectl get pods -n cert-manager --field-selector=status.phase!=Running + fi + echo "" + else + print_status "cert-manager namespace not found. Will install fresh." + fi +} + +# Cleanup function for failed installations +cleanup_on_failure() { + print_warning "Cleaning up due to failure..." + + # Optional cleanup - ask user + read -p "Do you want to clean up the Kind cluster and start fresh? (y/N): " -n 1 -r + echo + if [[ $REPLY =~ ^[Yy]$ ]]; then + print_status "Cleaning up Kind cluster..." + kind delete cluster --name bakery-ia-local || true + print_success "Cleanup completed. You can run the script again." + else + print_status "Keeping existing setup. You can continue manually or run the script again." + fi +} + +# Trap function to handle script interruption +trap 'echo ""; print_warning "Script interrupted. Partial setup may be present."; cleanup_on_failure; exit 1' INT TERM + +# Main execution +main() { + echo "Starting HTTPS setup for Bakery IA..." + + # Set error handling for individual steps + local step_failed=false + + check_prerequisites || { step_failed=true; } + if [ "$step_failed" = false ]; then + check_current_cert_manager_status || { step_failed=true; } + fi + if [ "$step_failed" = false ]; then + install_cert_manager || { step_failed=true; } + fi + if [ "$step_failed" = false ]; then + install_nginx_ingress || { step_failed=true; } + fi + if [ "$step_failed" = false ]; then + setup_cluster_issuers || { step_failed=true; } + fi + if [ "$step_failed" = false ]; then + deploy_with_https || { step_failed=true; } + fi + if [ "$step_failed" = false ]; then + check_certificates || { step_failed=true; } + fi + if [ "$step_failed" = false ]; then + update_hosts_file || { step_failed=true; } + fi + if [ "$step_failed" = false ]; then + export_ca_certificate || { step_failed=true; } + fi + + if [ "$step_failed" = false ]; then + display_access_info + print_success "Setup completed successfully! 🚀" + else + print_error "Setup failed at one or more steps. Check the output above for details." 
+ cleanup_on_failure + exit 1 + fi +} + +# Run main function +main "$@" \ No newline at end of file diff --git a/infrastructure/scripts/maintenance/tag-and-push-images.sh b/infrastructure/scripts/maintenance/tag-and-push-images.sh new file mode 100755 index 00000000..4e2f6e75 --- /dev/null +++ b/infrastructure/scripts/maintenance/tag-and-push-images.sh @@ -0,0 +1,154 @@ +#!/bin/bash + +# Script to tag and push all Bakery IA images to a container registry +# Usage: ./tag-and-push-images.sh [REGISTRY_PREFIX] [TAG] +# Example: ./tag-and-push-images.sh myuser/bakery v1.0.0 + +set -e + +# Colors +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +NC='\033[0m' + +# Configuration +REGISTRY_PREFIX="${1:-}" +TAG="${2:-latest}" + +if [ -z "$REGISTRY_PREFIX" ]; then + echo -e "${RED}Error: Registry prefix required${NC}" + echo "Usage: $0 REGISTRY_PREFIX [TAG]" + echo "" + echo "Examples:" + echo " Docker Hub: $0 myusername/bakery v1.0.0" + echo " GitHub: $0 ghcr.io/myorg/bakery v1.0.0" + echo " MicroK8s: $0 YOUR_VPS_IP:32000/bakery v1.0.0" + exit 1 +fi + +# List of all services +SERVICES=( + "gateway" + "dashboard" + "auth-service" + "tenant-service" + "training-service" + "forecasting-service" + "sales-service" + "external-service" + "notification-service" + "inventory-service" + "recipes-service" + "suppliers-service" + "pos-service" + "orders-service" + "production-service" + "procurement-service" + "orchestrator-service" + "alert-processor" + "ai-insights-service" + "demo-session-service" + "distribution-service" +) + +echo -e "${GREEN}========================================${NC}" +echo -e "${GREEN}Bakery IA - Image Tagging and Push${NC}" +echo -e "${GREEN}========================================${NC}" +echo "" +echo "Registry: $REGISTRY_PREFIX" +echo "Tag: $TAG" +echo "" + +# Function to tag image +tag_image() { + local service=$1 + local local_name="bakery/${service}" + local remote_name="${REGISTRY_PREFIX}-${service}:${TAG}" + + echo -e "${YELLOW}Tagging ${local_name} -> ${remote_name}${NC}" + + if docker tag "$local_name" "$remote_name"; then + echo -e "${GREEN}✓ Tagged $service${NC}" + return 0 + else + echo -e "${RED}✗ Failed to tag $service${NC}" + return 1 + fi +} + +# Function to push image +push_image() { + local service=$1 + local remote_name="${REGISTRY_PREFIX}-${service}:${TAG}" + + echo -e "${YELLOW}Pushing ${remote_name}${NC}" + + if docker push "$remote_name"; then + echo -e "${GREEN}✓ Pushed $service${NC}" + return 0 + else + echo -e "${RED}✗ Failed to push $service${NC}" + return 1 + fi +} + +# Check if user is logged in to registry +echo -e "${YELLOW}Checking registry authentication...${NC}" +if ! docker info > /dev/null 2>&1; then + echo -e "${RED}Error: Docker daemon not running${NC}" + exit 1 +fi + +echo -e "${GREEN}✓ Docker is running${NC}" +echo "" + +# Ask for confirmation +echo -e "${YELLOW}This will tag and push ${#SERVICES[@]} images.${NC}" +read -p "Continue? (yes/no): " confirm + +if [ "$confirm" != "yes" ]; then + echo "Cancelled." 
+ exit 0 +fi + +echo "" +echo -e "${GREEN}Starting image tagging and push...${NC}" +echo "" + +# Track success/failure +SUCCESS_COUNT=0 +FAILED_SERVICES=() + +# Tag and push all images +for service in "${SERVICES[@]}"; do + if tag_image "$service" && push_image "$service"; then + ((SUCCESS_COUNT++)) + else + FAILED_SERVICES+=("$service") + fi + echo "" +done + +# Summary +echo -e "${GREEN}========================================${NC}" +echo -e "${GREEN}Summary${NC}" +echo -e "${GREEN}========================================${NC}" +echo "" +echo "Successfully pushed: $SUCCESS_COUNT/${#SERVICES[@]}" + +if [ ${#FAILED_SERVICES[@]} -gt 0 ]; then + echo -e "${RED}Failed services:${NC}" + for service in "${FAILED_SERVICES[@]}"; do + echo -e "${RED} - $service${NC}" + done + exit 1 +else + echo -e "${GREEN}All images pushed successfully!${NC}" + echo "" + echo "Next steps:" + echo "1. Update image names in infrastructure/environments/prod/k8s-manifests/kustomization.yaml" + echo "2. Deploy to production: kubectl apply -k infrastructure/environments/prod/k8s-manifests" +fi + +echo "" diff --git a/infrastructure/kubernetes/create-dockerhub-secret.sh b/infrastructure/scripts/setup/create-dockerhub-secret.sh similarity index 100% rename from infrastructure/kubernetes/create-dockerhub-secret.sh rename to infrastructure/scripts/setup/create-dockerhub-secret.sh diff --git a/infrastructure/tls/generate-certificates.sh b/infrastructure/scripts/setup/generate-certificates.sh similarity index 100% rename from infrastructure/tls/generate-certificates.sh rename to infrastructure/scripts/setup/generate-certificates.sh diff --git a/infrastructure/tls/generate-minio-certificates.sh b/infrastructure/scripts/setup/generate-minio-certificates.sh similarity index 100% rename from infrastructure/tls/generate-minio-certificates.sh rename to infrastructure/scripts/setup/generate-minio-certificates.sh diff --git a/infrastructure/kubernetes/setup-dockerhub-secrets.sh b/infrastructure/scripts/setup/setup-dockerhub-secrets.sh similarity index 100% rename from infrastructure/kubernetes/setup-dockerhub-secrets.sh rename to infrastructure/scripts/setup/setup-dockerhub-secrets.sh diff --git a/infrastructure/scripts/setup/setup-ghcr-secrets.sh b/infrastructure/scripts/setup/setup-ghcr-secrets.sh new file mode 100644 index 00000000..94e68b69 --- /dev/null +++ b/infrastructure/scripts/setup/setup-ghcr-secrets.sh @@ -0,0 +1,67 @@ +#!/bin/bash + +# Setup GitHub Container Registry (GHCR) image pull secrets for all namespaces +# This script creates docker-registry secrets for pulling images from GHCR + +set -e + +# GitHub Container Registry credentials +# Note: Use a GitHub Personal Access Token with 'read:packages' scope +GHCR_SERVER="ghcr.io" +GHCR_USERNAME="uals" # GitHub username +GHCR_PASSWORD="ghp_zzEY5Q58x1S0puraIoKEtbpue3A" # GitHub Personal Access Token +GHCR_EMAIL="ualfaro@gmail.com" +SECRET_NAME="ghcr-creds" + +# List of namespaces used in the project +NAMESPACES=( + "bakery-ia" + "bakery-ia-dev" + "bakery-ia-prod" + "default" +) + +echo "Setting up GitHub Container Registry image pull secrets..." +echo "==========================================================" +echo "" + +for namespace in "${NAMESPACES[@]}"; do + echo "Processing namespace: $namespace" + + # Create namespace if it doesn't exist + if ! 
kubectl get namespace "$namespace" >/dev/null 2>&1; then + echo " Creating namespace: $namespace" + kubectl create namespace "$namespace" + fi + + # Delete existing secret if it exists + if kubectl get secret "$SECRET_NAME" -n "$namespace" >/dev/null 2>&1; then + echo " Deleting existing secret in namespace: $namespace" + kubectl delete secret "$SECRET_NAME" -n "$namespace" + fi + + # Create the docker-registry secret for GHCR + echo " Creating GHCR secret in namespace: $namespace" + kubectl create secret docker-registry "$SECRET_NAME" \ + --docker-server="$GHCR_SERVER" \ + --docker-username="$GHCR_USERNAME" \ + --docker-password="$GHCR_PASSWORD" \ + --docker-email="$GHCR_EMAIL" \ + -n "$namespace" + + echo " ✓ Secret created successfully in namespace: $namespace" + echo "" +done + +echo "==========================================================" +echo "GitHub Container Registry secrets setup completed!" +echo "" +echo "The secret '$SECRET_NAME' has been created in all namespaces:" +for namespace in "${NAMESPACES[@]}"; do + echo " - $namespace" +done +echo "" +echo "Next steps:" +echo "1. Update your Kubernetes manifests to include the GHCR imagePullSecrets" +echo "2. Verify pods can pull images from GHCR: kubectl get pods -A" +echo "3. Consider updating your CI/CD pipelines to push images to GHCR" \ No newline at end of file diff --git a/infrastructure/scripts/verification/verify-registry.sh b/infrastructure/scripts/verification/verify-registry.sh new file mode 100755 index 00000000..9eeec6d8 --- /dev/null +++ b/infrastructure/scripts/verification/verify-registry.sh @@ -0,0 +1,152 @@ +#!/bin/bash + +# Colors for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +NC='\033[0m' # No Color + +# Function to print colored output +print_status() { + echo -e "${BLUE}[INFO]${NC} $1" +} + +print_success() { + echo -e "${GREEN}[SUCCESS]${NC} $1" +} + +print_warning() { + echo -e "${YELLOW}[WARNING]${NC} $1" +} + +print_error() { + echo -e "${RED}[ERROR]${NC} $1" +} + +echo "=======================================" +echo "Registry Verification Script" +echo "=======================================" +echo "" + +# 1. Check if registry container is running +print_status "Checking if kind-registry container is running..." +if docker ps | grep -q "kind-registry"; then + print_success "Registry container is running" + REGISTRY_STATUS=$(docker ps --filter "name=kind-registry" --format "{{.Status}}") + echo " Status: $REGISTRY_STATUS" +else + print_error "Registry container is not running!" + echo " Run: ./kubernetes_restart.sh setup" + exit 1 +fi + +# 2. Check if registry is accessible on localhost:5001 +print_status "Checking if registry is accessible on localhost:5001..." +if curl -s http://localhost:5001/v2/_catalog > /dev/null 2>&1; then + print_success "Registry is accessible" + CATALOG=$(curl -s http://localhost:5001/v2/_catalog) + echo " Catalog: $CATALOG" +else + print_error "Registry is not accessible on localhost:5001" + exit 1 +fi + +# 3. Check if registry is connected to Kind network +print_status "Checking if registry is connected to Kind network..." +NETWORK_CHECK=$(docker inspect -f='{{json .NetworkSettings.Networks.kind}}' kind-registry 2>/dev/null) +if [ "$NETWORK_CHECK" != "null" ] && [ -n "$NETWORK_CHECK" ]; then + print_success "Registry is connected to Kind network" +else + print_warning "Registry is not connected to Kind network" + print_status "Connecting registry to Kind network..." + docker network connect "kind" "kind-registry" + if [ $? 
-eq 0 ]; then
+        print_success "Registry connected successfully"
+    else
+        print_error "Failed to connect registry to Kind network"
+        exit 1
+    fi
+fi
+
+# 4. Check if Kind cluster exists
+print_status "Checking if Kind cluster exists..."
+if kind get clusters | grep -q "bakery-ia-local"; then
+    print_success "Kind cluster 'bakery-ia-local' exists"
+else
+    print_error "Kind cluster 'bakery-ia-local' not found"
+    echo "  Run: ./kubernetes_restart.sh setup"
+    exit 1
+fi
+
+# 5. Check if registry is documented in cluster
+print_status "Checking if registry is documented in cluster..."
+if kubectl get configmap -n kube-public local-registry-hosting &>/dev/null; then
+    print_success "Registry is documented in cluster"
+    REG_HOST=$(kubectl get configmap -n kube-public local-registry-hosting -o jsonpath='{.data.localRegistryHosting\.v1}' 2>/dev/null | grep -o 'host: "[^"]*"' | cut -d'"' -f2)
+    echo "  Registry host: $REG_HOST"
+else
+    print_warning "Registry ConfigMap not found in cluster"
+    print_status "Creating ConfigMap..."
+    # Standard local-registry-hosting ConfigMap pointing at the local registry
+    kubectl apply -f - <<EOF
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: local-registry-hosting
+  namespace: kube-public
+data:
+  localRegistryHosting.v1: |
+    host: "localhost:5001"
+EOF
+fi
+
+# 6. Test pushing an image to the local registry
+print_status "Pulling a small test image (busybox)..."
+docker pull busybox:latest > /dev/null 2>&1
+
+print_status "Tagging image for local registry..."
+docker tag busybox:latest localhost:5001/test/busybox:latest
+
+print_status "Pushing image to local registry..."
+if docker push localhost:5001/test/busybox:latest > /dev/null 2>&1; then
+    print_success "Successfully pushed test image to registry"
+else
+    print_error "Failed to push image to registry"
+    exit 1
+fi
+
+print_status "Verifying image in registry catalog..."
+CATALOG=$(curl -s http://localhost:5001/v2/_catalog)
+if echo "$CATALOG" | grep -q "test/busybox"; then
+    print_success "Test image found in registry catalog"
+else
+    print_warning "Test image not found in catalog, but push succeeded"
+fi
+
+# 7. Clean up test image
+print_status "Cleaning up test images..."
+docker rmi localhost:5001/test/busybox:latest > /dev/null 2>&1
+docker rmi busybox:latest > /dev/null 2>&1
+
+echo ""
+echo "======================================="
+print_success "Registry verification completed!"
+echo "======================================="
+echo ""
+print_status "Summary:"
+echo "  - Registry URL: localhost:5001"
+echo "  - Registry container: kind-registry"
+echo "  - Connected to Kind network: Yes"
+echo "  - Accessible from host: Yes"
+echo "  - Test push: Successful"
+echo ""
+print_status "Next steps:"
+echo "  1. Ensure your Tiltfile has: default_registry('localhost:5001')"
+echo "  2. Run: tilt up"
+echo "  3.
Images will be automatically pushed to localhost:5001/bakery/" +echo "" diff --git a/infrastructure/scripts/verification/verify-signoz.sh b/infrastructure/scripts/verification/verify-signoz.sh new file mode 100755 index 00000000..da1197a7 --- /dev/null +++ b/infrastructure/scripts/verification/verify-signoz.sh @@ -0,0 +1,446 @@ +#!/bin/bash + +# ============================================================================ +# SigNoz Verification Script for Bakery IA +# ============================================================================ +# This script verifies that SigNoz is properly deployed and functioning +# ============================================================================ + +set -e + +# Color codes for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +NC='\033[0m' # No Color + +# Function to display help +show_help() { + echo "Usage: $0 [OPTIONS] ENVIRONMENT" + echo "" + echo "Verify SigNoz deployment for Bakery IA" + echo "" + echo "Arguments: + ENVIRONMENT Environment to verify (dev|prod)" + echo "" + echo "Options: + -h, --help Show this help message + -n, --namespace NAMESPACE Specify namespace (default: bakery-ia)" + echo "" + echo "Examples: + $0 dev # Verify development deployment + $0 prod # Verify production deployment + $0 --namespace monitoring dev # Verify with custom namespace" +} + +# Parse command line arguments +NAMESPACE="bakery-ia" + +while [[ $# -gt 0 ]]; do + case $1 in + -h|--help) + show_help + exit 0 + ;; + -n|--namespace) + NAMESPACE="$2" + shift 2 + ;; + dev|prod) + ENVIRONMENT="$1" + shift + ;; + *) + echo "Unknown argument: $1" + show_help + exit 1 + ;; + esac +done + +# Validate environment +if [[ -z "$ENVIRONMENT" ]]; then + echo "Error: Environment not specified. Use 'dev' or 'prod'." + show_help + exit 1 +fi + +if [[ "$ENVIRONMENT" != "dev" && "$ENVIRONMENT" != "prod" ]]; then + echo "Error: Invalid environment. Use 'dev' or 'prod'." + exit 1 +fi + +# Function to check if kubectl is configured +check_kubectl() { + if ! kubectl cluster-info &> /dev/null; then + echo "${RED}Error: kubectl is not configured or cannot connect to cluster.${NC}" + echo "Please ensure you have access to a Kubernetes cluster." + exit 1 + fi +} + +# Function to check namespace exists +check_namespace() { + if ! kubectl get namespace "$NAMESPACE" &> /dev/null; then + echo "${RED}Error: Namespace $NAMESPACE does not exist.${NC}" + echo "Please deploy SigNoz first using: ./deploy-signoz.sh $ENVIRONMENT" + exit 1 + fi +} + +# Function to verify SigNoz deployment +verify_deployment() { + echo "${BLUE}" + echo "==========================================" + echo "🔍 Verifying SigNoz Deployment" + echo "==========================================" + echo "Environment: $ENVIRONMENT" + echo "Namespace: $NAMESPACE" + echo "${NC}" + echo "" + + # Check if SigNoz helm release exists + echo "${BLUE}1. Checking Helm release...${NC}" + if helm list -n "$NAMESPACE" | grep -q signoz; then + echo "${GREEN}✅ SigNoz Helm release found${NC}" + else + echo "${RED}❌ SigNoz Helm release not found${NC}" + echo "Please deploy SigNoz first using: ./deploy-signoz.sh $ENVIRONMENT" + exit 1 + fi + echo "" + + # Check pod status + echo "${BLUE}2. 
Checking pod status...${NC}" + local total_pods=$(kubectl get pods -n "$NAMESPACE" -l app.kubernetes.io/instance=signoz 2>/dev/null | grep -v "NAME" | wc -l | tr -d ' ' || echo "0") + local running_pods=$(kubectl get pods -n "$NAMESPACE" -l app.kubernetes.io/instance=signoz --field-selector=status.phase=Running 2>/dev/null | grep -c "Running" || echo "0") + local ready_pods=$(kubectl get pods -n "$NAMESPACE" -l app.kubernetes.io/instance=signoz 2>/dev/null | grep "Running" | grep "1/1" | wc -l | tr -d ' ' || echo "0") + + echo "Total pods: $total_pods" + echo "Running pods: $running_pods" + echo "Ready pods: $ready_pods" + + if [[ $total_pods -eq 0 ]]; then + echo "${RED}❌ No SigNoz pods found${NC}" + exit 1 + fi + + if [[ $running_pods -eq $total_pods ]]; then + echo "${GREEN}✅ All pods are running${NC}" + else + echo "${YELLOW}⚠️ Some pods are not running${NC}" + fi + + if [[ $ready_pods -eq $total_pods ]]; then + echo "${GREEN}✅ All pods are ready${NC}" + else + echo "${YELLOW}⚠️ Some pods are not ready${NC}" + fi + echo "" + + # Show pod details + echo "${BLUE}Pod Details:${NC}" + kubectl get pods -n "$NAMESPACE" -l app.kubernetes.io/instance=signoz + echo "" + + # Check services + echo "${BLUE}3. Checking services...${NC}" + local service_count=$(kubectl get svc -n "$NAMESPACE" -l app.kubernetes.io/instance=signoz 2>/dev/null | grep -v "NAME" | wc -l | tr -d ' ' || echo "0") + + if [[ $service_count -gt 0 ]]; then + echo "${GREEN}✅ Services found ($service_count services)${NC}" + kubectl get svc -n "$NAMESPACE" -l app.kubernetes.io/instance=signoz + else + echo "${RED}❌ No services found${NC}" + fi + echo "" + + # Check ingress + echo "${BLUE}4. Checking ingress...${NC}" + local ingress_count=$(kubectl get ingress -n "$NAMESPACE" -l app.kubernetes.io/instance=signoz 2>/dev/null | grep -v "NAME" | wc -l | tr -d ' ' || echo "0") + + if [[ $ingress_count -gt 0 ]]; then + echo "${GREEN}✅ Ingress found ($ingress_count ingress resources)${NC}" + kubectl get ingress -n "$NAMESPACE" -l app.kubernetes.io/instance=signoz + else + echo "${YELLOW}⚠️ No ingress found (may be configured in main namespace)${NC}" + fi + echo "" + + # Check PVCs + echo "${BLUE}5. Checking persistent volume claims...${NC}" + local pvc_count=$(kubectl get pvc -n "$NAMESPACE" -l app.kubernetes.io/instance=signoz 2>/dev/null | grep -v "NAME" | wc -l | tr -d ' ' || echo "0") + + if [[ $pvc_count -gt 0 ]]; then + echo "${GREEN}✅ PVCs found ($pvc_count PVCs)${NC}" + kubectl get pvc -n "$NAMESPACE" -l app.kubernetes.io/instance=signoz + else + echo "${YELLOW}⚠️ No PVCs found (may not be required for all components)${NC}" + fi + echo "" + + # Check resource usage + echo "${BLUE}6. Checking resource usage...${NC}" + if command -v kubectl &> /dev/null && kubectl top pods -n "$NAMESPACE" &> /dev/null; then + echo "${GREEN}✅ Resource usage:${NC}" + kubectl top pods -n "$NAMESPACE" -l app.kubernetes.io/instance=signoz + else + echo "${YELLOW}⚠️ Metrics server not available or no resource usage data${NC}" + fi + echo "" + + # Check logs for errors + echo "${BLUE}7. 
Checking for errors in logs...${NC}" + local error_found=false + + # Check each pod for errors + while IFS= read -r pod; do + if [[ -n "$pod" ]]; then + local pod_errors=$(kubectl logs -n "$NAMESPACE" "$pod" 2>/dev/null | grep -i "error\|exception\|fail\|crash" | wc -l || echo "0") + if [[ $pod_errors -gt 0 ]]; then + echo "${RED}❌ Errors found in pod $pod ($pod_errors errors)${NC}" + error_found=true + fi + fi + done < <(kubectl get pods -n "$NAMESPACE" -l app.kubernetes.io/instance=signoz -o name | sed 's|pod/||') + + if [[ "$error_found" == false ]]; then + echo "${GREEN}✅ No errors found in logs${NC}" + fi + echo "" + + # Environment-specific checks + if [[ "$ENVIRONMENT" == "dev" ]]; then + verify_dev_specific + else + verify_prod_specific + fi + + # Show access information + show_access_info +} + +# Function for development-specific verification +verify_dev_specific() { + echo "${BLUE}8. Development-specific checks...${NC}" + + # Check if ingress is configured + if kubectl get ingress -n "$NAMESPACE" 2>/dev/null | grep -q "monitoring.bakery-ia.local"; then + echo "${GREEN}✅ Development ingress configured${NC}" + else + echo "${YELLOW}⚠️ Development ingress not found${NC}" + fi + + # Check unified signoz component resource limits (should be lower for dev) + local signoz_mem=$(kubectl get deployment -n "$NAMESPACE" -l app.kubernetes.io/component=query-service -o jsonpath='{.items[0].spec.template.spec.containers[0].resources.limits.memory}' 2>/dev/null || echo "") + if [[ -n "$signoz_mem" ]]; then + echo "${GREEN}✅ SigNoz component found (memory limit: $signoz_mem)${NC}" + else + echo "${YELLOW}⚠️ Could not verify SigNoz component resources${NC}" + fi + + # Check single replica setup for dev + local replicas=$(kubectl get deployment -n "$NAMESPACE" -l app.kubernetes.io/component=query-service -o jsonpath='{.items[0].spec.replicas}' 2>/dev/null || echo "0") + if [[ $replicas -eq 1 ]]; then + echo "${GREEN}✅ Single replica configuration (appropriate for dev)${NC}" + else + echo "${YELLOW}⚠️ Multiple replicas detected (replicas: $replicas)${NC}" + fi + echo "" +} + +# Function for production-specific verification +verify_prod_specific() { + echo "${BLUE}8. 
Production-specific checks...${NC}" + + # Check if TLS is configured + if kubectl get ingress -n "$NAMESPACE" 2>/dev/null | grep -q "signoz-tls"; then + echo "${GREEN}✅ TLS certificate configured${NC}" + else + echo "${YELLOW}⚠️ TLS certificate not found${NC}" + fi + + # Check if multiple replicas are running for HA + local signoz_replicas=$(kubectl get deployment -n "$NAMESPACE" -l app.kubernetes.io/component=query-service -o jsonpath='{.items[0].spec.replicas}' 2>/dev/null || echo "1") + if [[ $signoz_replicas -gt 1 ]]; then + echo "${GREEN}✅ High availability configured ($signoz_replicas SigNoz replicas)${NC}" + else + echo "${YELLOW}⚠️ Single SigNoz replica detected (not highly available)${NC}" + fi + + # Check Zookeeper replicas (critical for production) + local zk_replicas=$(kubectl get statefulset -n "$NAMESPACE" -l app.kubernetes.io/component=zookeeper -o jsonpath='{.items[0].spec.replicas}' 2>/dev/null || echo "0") + if [[ $zk_replicas -eq 3 ]]; then + echo "${GREEN}✅ Zookeeper properly configured with 3 replicas${NC}" + elif [[ $zk_replicas -gt 0 ]]; then + echo "${YELLOW}⚠️ Zookeeper has $zk_replicas replicas (recommend 3 for production)${NC}" + else + echo "${RED}❌ Zookeeper not found${NC}" + fi + + # Check OTel Collector replicas + local otel_replicas=$(kubectl get deployment -n "$NAMESPACE" -l app.kubernetes.io/component=otel-collector -o jsonpath='{.items[0].spec.replicas}' 2>/dev/null || echo "1") + if [[ $otel_replicas -gt 1 ]]; then + echo "${GREEN}✅ OTel Collector HA configured ($otel_replicas replicas)${NC}" + else + echo "${YELLOW}⚠️ Single OTel Collector replica${NC}" + fi + + # Check resource limits (should be higher for prod) + local signoz_mem=$(kubectl get deployment -n "$NAMESPACE" -l app.kubernetes.io/component=query-service -o jsonpath='{.items[0].spec.template.spec.containers[0].resources.limits.memory}' 2>/dev/null || echo "") + if [[ -n "$signoz_mem" ]]; then + echo "${GREEN}✅ Production resource limits applied (memory: $signoz_mem)${NC}" + else + echo "${YELLOW}⚠️ Could not verify resource limits${NC}" + fi + + # Check HPA (Horizontal Pod Autoscaler) + local hpa_count=$(kubectl get hpa -n "$NAMESPACE" 2>/dev/null | grep -c signoz || echo "0") + if [[ $hpa_count -gt 0 ]]; then + echo "${GREEN}✅ Horizontal Pod Autoscaler configured${NC}" + else + echo "${YELLOW}⚠️ No HPA found (consider enabling for production)${NC}" + fi + echo "" +} + +# Function to show access information +show_access_info() { + echo "${BLUE}" + echo "==========================================" + echo "📋 Access Information" + echo "==========================================" + echo "${NC}" + + if [[ "$ENVIRONMENT" == "dev" ]]; then + echo "SigNoz UI: http://monitoring.bakery-ia.local" + echo "" + echo "OpenTelemetry Collector (within cluster):" + echo " gRPC: signoz-otel-collector.$NAMESPACE.svc.cluster.local:4317" + echo " HTTP: signoz-otel-collector.$NAMESPACE.svc.cluster.local:4318" + echo "" + echo "Port-forward for local access:" + echo " kubectl port-forward -n $NAMESPACE svc/signoz 8080:8080" + echo " kubectl port-forward -n $NAMESPACE svc/signoz-otel-collector 4317:4317" + echo " kubectl port-forward -n $NAMESPACE svc/signoz-otel-collector 4318:4318" + else + echo "SigNoz UI: https://monitoring.bakewise.ai" + echo "" + echo "OpenTelemetry Collector (within cluster):" + echo " gRPC: signoz-otel-collector.$NAMESPACE.svc.cluster.local:4317" + echo " HTTP: signoz-otel-collector.$NAMESPACE.svc.cluster.local:4318" + fi + + echo "" + echo "Default Credentials:" + echo " Username: 
admin@example.com" + echo " Password: admin" + echo "" + echo "⚠️ IMPORTANT: Change default password after first login!" + echo "" + + # Show connection test commands + echo "Connection Test Commands:" + if [[ "$ENVIRONMENT" == "dev" ]]; then + echo " # Test SigNoz UI" + echo " curl http://monitoring.bakery-ia.local" + echo "" + echo " # Test via port-forward" + echo " kubectl port-forward -n $NAMESPACE svc/signoz 8080:8080" + echo " curl http://localhost:8080" + else + echo " # Test SigNoz UI" + echo " curl https://monitoring.bakewise.ai" + echo "" + echo " # Test API health" + echo " kubectl port-forward -n $NAMESPACE svc/signoz 8080:8080" + echo " curl http://localhost:8080/api/v1/health" + fi + echo "" +} + +# Function to run connectivity tests +run_connectivity_tests() { + echo "${BLUE}" + echo "==========================================" + echo "🔗 Running Connectivity Tests" + echo "==========================================" + echo "${NC}" + + # Test pod readiness first + echo "Checking pod readiness..." + local ready_pods=$(kubectl get pods -n "$NAMESPACE" -l app.kubernetes.io/instance=signoz --field-selector=status.phase=Running 2>/dev/null | grep "Running" | grep -c "1/1\|2/2" || echo "0") + local total_pods=$(kubectl get pods -n "$NAMESPACE" -l app.kubernetes.io/instance=signoz 2>/dev/null | grep -v "NAME" | wc -l | tr -d ' ' || echo "0") + + if [[ $ready_pods -eq $total_pods && $total_pods -gt 0 ]]; then + echo "${GREEN}✅ All pods are ready ($ready_pods/$total_pods)${NC}" + else + echo "${YELLOW}⚠️ Some pods not ready ($ready_pods/$total_pods)${NC}" + fi + echo "" + + # Test internal service connectivity + echo "Testing internal service connectivity..." + local signoz_svc=$(kubectl get svc -n "$NAMESPACE" signoz -o jsonpath='{.spec.clusterIP}' 2>/dev/null || echo "") + if [[ -n "$signoz_svc" ]]; then + echo "${GREEN}✅ SigNoz service accessible at $signoz_svc:8080${NC}" + else + echo "${RED}❌ SigNoz service not found${NC}" + fi + + local otel_svc=$(kubectl get svc -n "$NAMESPACE" signoz-otel-collector -o jsonpath='{.spec.clusterIP}' 2>/dev/null || echo "") + if [[ -n "$otel_svc" ]]; then + echo "${GREEN}✅ OTel Collector service accessible at $otel_svc:4317 (gRPC), $otel_svc:4318 (HTTP)${NC}" + else + echo "${RED}❌ OTel Collector service not found${NC}" + fi + echo "" + + if [[ "$ENVIRONMENT" == "prod" ]]; then + echo "${YELLOW}⚠️ Production connectivity tests require valid DNS and TLS${NC}" + echo " Please ensure monitoring.bakewise.ai resolves to your cluster" + echo "" + echo "Manual test:" + echo " curl -I https://monitoring.bakewise.ai" + fi +} + +# Main execution +main() { + echo "${BLUE}" + echo "==========================================" + echo "🔍 SigNoz Verification for Bakery IA" + echo "==========================================" + echo "${NC}" + + # Check prerequisites + check_kubectl + check_namespace + + # Verify deployment + verify_deployment + + # Run connectivity tests + run_connectivity_tests + + echo "${GREEN}" + echo "==========================================" + echo "✅ Verification Complete" + echo "==========================================" + echo "${NC}" + + echo "Summary:" + echo " Environment: $ENVIRONMENT" + echo " Namespace: $NAMESPACE" + echo "" + echo "Next Steps:" + echo " 1. Access SigNoz UI and verify dashboards" + echo " 2. Configure alert rules for your services" + echo " 3. Instrument your applications with OpenTelemetry" + echo " 4. 
Set up custom dashboards for key metrics" + echo "" +} + +# Run main function +main \ No newline at end of file diff --git a/infrastructure/tls/ca/ca-cert.pem b/infrastructure/security/certificates/ca/ca-cert.pem similarity index 100% rename from infrastructure/tls/ca/ca-cert.pem rename to infrastructure/security/certificates/ca/ca-cert.pem diff --git a/infrastructure/security/certificates/ca/ca-cert.srl b/infrastructure/security/certificates/ca/ca-cert.srl new file mode 100644 index 00000000..3f115f6d --- /dev/null +++ b/infrastructure/security/certificates/ca/ca-cert.srl @@ -0,0 +1 @@ +1BE074336AF19EA8C676D7E8D0185EBCA0B1D203 diff --git a/infrastructure/tls/ca/ca-key.pem b/infrastructure/security/certificates/ca/ca-key.pem similarity index 100% rename from infrastructure/tls/ca/ca-key.pem rename to infrastructure/security/certificates/ca/ca-key.pem diff --git a/infrastructure/security/certificates/generate-certificates.sh b/infrastructure/security/certificates/generate-certificates.sh new file mode 100755 index 00000000..d1a3c119 --- /dev/null +++ b/infrastructure/security/certificates/generate-certificates.sh @@ -0,0 +1,204 @@ +#!/usr/bin/env bash + +# Generate TLS certificates for PostgreSQL and Redis +# Self-signed certificates for internal cluster use + +set -e + +TLS_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +CA_DIR="$TLS_DIR/ca" +POSTGRES_DIR="$TLS_DIR/postgres" +REDIS_DIR="$TLS_DIR/redis" + +echo "Generating TLS certificates for Bakery IA..." +echo "Directory: $TLS_DIR" +echo "" + +# Clean up old certificates +echo "Cleaning up old certificates..." +rm -rf "$CA_DIR"/* "$POSTGRES_DIR"/* "$REDIS_DIR"/* 2>/dev/null || true + +# ===================================== +# 1. Generate Certificate Authority (CA) +# ===================================== + +echo "Step 1: Generating Certificate Authority (CA)..." + +# Generate CA private key +openssl genrsa -out "$CA_DIR/ca-key.pem" 4096 + +# Generate CA certificate (valid for 10 years) +openssl req -new -x509 -days 3650 -key "$CA_DIR/ca-key.pem" -out "$CA_DIR/ca-cert.pem" \ + -subj "/C=US/ST=California/L=SanFrancisco/O=BakeryIA/OU=Security/CN=BakeryIA-CA" + +echo "✓ CA certificate generated" +echo "" + +# ===================================== +# 2. Generate PostgreSQL Server Certificates +# ===================================== + +echo "Step 2: Generating PostgreSQL server certificates..." + +# Generate PostgreSQL server private key +openssl genrsa -out "$POSTGRES_DIR/server-key.pem" 4096 + +# Create certificate signing request (CSR) +openssl req -new -key "$POSTGRES_DIR/server-key.pem" -out "$POSTGRES_DIR/server.csr" \ + -subj "/C=US/ST=California/L=SanFrancisco/O=BakeryIA/OU=Database/CN=*.bakery-ia.svc.cluster.local" + +# Create SAN (Subject Alternative Names) configuration +cat > "$POSTGRES_DIR/san.cnf" < "$REDIS_DIR/san.cnf" < "$MINIO_DIR/san.cnf" </dev/null || true + +# ===================================== +# Generate Mailu Server Certificates +# ===================================== + +echo "Generating Mailu server certificates..." 
+ +# Generate Mailu server private key +openssl genrsa -out "$MAILU_DIR/mailu-key.pem" 4096 + +# Create certificate signing request (CSR) +openssl req -new -key "$MAILU_DIR/mailu-key.pem" -out "$MAILU_DIR/mailu.csr" \ + -subj "/C=US/ST=California/L=SanFrancisco/O=BakeryIA/OU=Mail/CN=mail.bakewise.ai" + +# Create SAN configuration for Mailu +cat > "$MAILU_DIR/san.cnf" < /dev/null 2>&1; then + echo "Migrations are complete - alembic_version table exists" + exit 0 + else + echo "Migrations not complete yet, waiting... ($((COUNT + 1))/$ATTEMPTS)" + sleep 10 + fi + COUNT=$((COUNT + 1)) + done + + echo "Timeout waiting for migrations to complete" + exit 1 env: - name: AUTH_DB_HOST valueFrom: @@ -89,6 +103,13 @@ spec: secretKeyRef: name: database-secrets key: AUTH_DB_USER + - name: AUTH_DB_PASSWORD + valueFrom: + secretKeyRef: + name: database-secrets + key: AUTH_DB_PASSWORD + - name: AUTH_DB_NAME + value: "auth_db" containers: - name: auth-service image: bakery/auth-service:latest diff --git a/infrastructure/kubernetes/base/migrations/auth-migration-job.yaml b/infrastructure/services/microservices/auth/migrations/auth-migration-job.yaml similarity index 96% rename from infrastructure/kubernetes/base/migrations/auth-migration-job.yaml rename to infrastructure/services/microservices/auth/migrations/auth-migration-job.yaml index 40a3ee01..bb948ed7 100644 --- a/infrastructure/kubernetes/base/migrations/auth-migration-job.yaml +++ b/infrastructure/services/microservices/auth/migrations/auth-migration-job.yaml @@ -18,6 +18,7 @@ spec: spec: imagePullSecrets: - name: dockerhub-creds + - name: ghcr-creds initContainers: - name: wait-for-db image: postgres:17-alpine @@ -31,7 +32,7 @@ spec: cpu: "100m" containers: - name: migrate - image: bakery/auth-service:dev + image: bakery/auth-service command: ["python", "/app/shared/scripts/run_migrations.py", "auth"] env: - name: AUTH_DATABASE_URL diff --git a/infrastructure/kubernetes/base/cronjobs/demo-cleanup-cronjob.yaml b/infrastructure/services/microservices/demo-session/cronjobs/demo-cleanup-cronjob.yaml similarity index 52% rename from infrastructure/kubernetes/base/cronjobs/demo-cleanup-cronjob.yaml rename to infrastructure/services/microservices/demo-session/cronjobs/demo-cleanup-cronjob.yaml index ff77d4f0..78a1daf7 100644 --- a/infrastructure/kubernetes/base/cronjobs/demo-cleanup-cronjob.yaml +++ b/infrastructure/services/microservices/demo-session/cronjobs/demo-cleanup-cronjob.yaml @@ -24,6 +24,38 @@ spec: labels: app: demo-cleanup spec: + initContainers: + - name: wait-for-migrations + image: postgres:17-alpine + command: ["sh", "-c", + "echo 'Waiting for database to be ready...' && \ + until pg_isready -h demo-session-db-service -p 5432; do sleep 2; done && \ + echo 'Database ready, checking for demo_sessions table...' && \ + MAX_ATTEMPTS=60 && \ + ATTEMPT=1 && \ + until psql -h demo-session-db-service -U demo_session_user -d demo_session_db -c 'SELECT 1 FROM demo_sessions LIMIT 1;' 2>/dev/null; do \ + if [ $ATTEMPT -ge $MAX_ATTEMPTS ]; then \ + echo 'ERROR: demo_sessions table not created after maximum attempts'; \ + exit 1; \ + fi; \ + echo \"Waiting for demo_sessions table to be created by migrations... 
(attempt $ATTEMPT/$MAX_ATTEMPTS)\"; \ + ATTEMPT=$((ATTEMPT + 1)); \ + sleep 5; \ + done && \ + echo 'demo_sessions table is ready!'"] + env: + - name: PGPASSWORD + valueFrom: + secretKeyRef: + name: database-secrets + key: DEMO_SESSION_DB_PASSWORD + resources: + requests: + memory: "64Mi" + cpu: "50m" + limits: + memory: "128Mi" + cpu: "100m" containers: - name: cleanup-trigger image: curlimages/curl:latest diff --git a/infrastructure/kubernetes/base/components/demo-session/database.yaml b/infrastructure/services/microservices/demo-session/database.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/demo-session/database.yaml rename to infrastructure/services/microservices/demo-session/database.yaml diff --git a/infrastructure/kubernetes/base/deployments/demo-cleanup-worker.yaml b/infrastructure/services/microservices/demo-session/demo-cleanup-worker.yaml similarity index 67% rename from infrastructure/kubernetes/base/deployments/demo-cleanup-worker.yaml rename to infrastructure/services/microservices/demo-session/demo-cleanup-worker.yaml index d4e5aac3..2059252e 100644 --- a/infrastructure/kubernetes/base/deployments/demo-cleanup-worker.yaml +++ b/infrastructure/services/microservices/demo-session/demo-cleanup-worker.yaml @@ -21,6 +21,38 @@ spec: spec: imagePullSecrets: - name: dockerhub-creds + initContainers: + - name: wait-for-migrations + image: postgres:17-alpine + command: ["sh", "-c", + "echo 'Waiting for database to be ready...' && \ + until pg_isready -h demo-session-db-service -p 5432; do sleep 2; done && \ + echo 'Database ready, checking for demo_sessions table...' && \ + MAX_ATTEMPTS=60 && \ + ATTEMPT=1 && \ + until psql -h demo-session-db-service -U demo_session_user -d demo_session_db -c 'SELECT 1 FROM demo_sessions LIMIT 1;' 2>/dev/null; do \ + if [ $ATTEMPT -ge $MAX_ATTEMPTS ]; then \ + echo 'ERROR: demo_sessions table not created after maximum attempts'; \ + exit 1; \ + fi; \ + echo \"Waiting for demo_sessions table to be created by migrations... (attempt $ATTEMPT/$MAX_ATTEMPTS)\"; \ + ATTEMPT=$((ATTEMPT + 1)); \ + sleep 5; \ + done && \ + echo 'demo_sessions table is ready!'"] + env: + - name: PGPASSWORD + valueFrom: + secretKeyRef: + name: database-secrets + key: DEMO_SESSION_DB_PASSWORD + resources: + requests: + memory: "64Mi" + cpu: "50m" + limits: + memory: "128Mi" + cpu: "100m" containers: - name: worker image: bakery/demo-session-service diff --git a/infrastructure/kubernetes/base/components/demo-session/deployment.yaml b/infrastructure/services/microservices/demo-session/deployment.yaml similarity index 64% rename from infrastructure/kubernetes/base/components/demo-session/deployment.yaml rename to infrastructure/services/microservices/demo-session/deployment.yaml index e522f140..c7606043 100644 --- a/infrastructure/kubernetes/base/components/demo-session/deployment.yaml +++ b/infrastructure/services/microservices/demo-session/deployment.yaml @@ -96,3 +96,40 @@ spec: - name: wait-for-redis image: busybox:1.36 command: ['sh', '-c', 'until nc -z redis-service 6379; do echo waiting for redis; sleep 2; done'] + - name: wait-for-migrations + image: postgres:17-alpine + command: ["sh", "-c", + "echo 'Waiting for database to be ready...' && \ + until pg_isready -h demo-session-db-service -p 5432; do sleep 2; done && \ + echo 'Database ready, checking for demo_sessions table...' 
&& \ + MAX_ATTEMPTS=60 && \ + ATTEMPT=1 && \ + while [ $ATTEMPT -le $MAX_ATTEMPTS ]; do \ + if psql -h demo-session-db-service -U demo_session_user -d demo_session_db -c 'SELECT 1 FROM demo_sessions LIMIT 1;' 2>/dev/null; then \ + break; \ + fi; \ + echo \"Waiting for demo_sessions table to be created by migrations... (attempt $ATTEMPT/$MAX_ATTEMPTS)\"; \ + ATTEMPT=$((ATTEMPT + 1)); \ + sleep 5; \ + done && \ + if [ $ATTEMPT -gt $MAX_ATTEMPTS ]; then \ + echo 'ERROR: demo_sessions table not created after maximum attempts'; \ + exit 1; \ + fi && \ + echo 'demo_sessions table is ready!' && \ + echo 'Checking if table has required columns...' && \ + psql -h demo-session-db-service -U demo_session_user -d demo_session_db -c '\\d demo_sessions' && \ + echo 'Table structure verified!'"] + env: + - name: PGPASSWORD + valueFrom: + secretKeyRef: + name: database-secrets + key: DEMO_SESSION_DB_PASSWORD + resources: + requests: + memory: "64Mi" + cpu: "50m" + limits: + memory: "128Mi" + cpu: "100m" diff --git a/infrastructure/services/microservices/demo-session/deployment.yaml.backup b/infrastructure/services/microservices/demo-session/deployment.yaml.backup new file mode 100644 index 00000000..f918adc5 --- /dev/null +++ b/infrastructure/services/microservices/demo-session/deployment.yaml.backup @@ -0,0 +1,135 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: demo-session-service + namespace: bakery-ia + labels: + app: demo-session-service + component: demo-session +spec: + replicas: 2 + selector: + matchLabels: + app: demo-session-service + template: + metadata: + labels: + app: demo-session-service + component: demo-session + spec: + serviceAccountName: demo-session-sa + containers: + - name: demo-session-service + image: bakery/demo-session-service:latest + ports: + - containerPort: 8000 + name: http + envFrom: + - configMapRef: + name: bakery-config + env: + - name: SERVICE_NAME + value: "demo-session-service" + - name: ALERT_PROCESSOR_SERVICE_URL + value: "http://alert-processor:8000" + - name: DEMO_SESSION_DATABASE_URL + valueFrom: + secretKeyRef: + name: database-secrets + key: DEMO_SESSION_DATABASE_URL + - name: REDIS_PASSWORD + valueFrom: + secretKeyRef: + name: redis-secrets + key: REDIS_PASSWORD + - name: REDIS_URL + value: "rediss://:$(REDIS_PASSWORD)@redis-service:6379/0?ssl_cert_reqs=none" + - name: AUTH_SERVICE_URL + value: "http://auth-service:8000" + - name: TENANT_SERVICE_URL + value: "http://tenant-service:8000" + - name: INVENTORY_SERVICE_URL + value: "http://inventory-service:8000" + - name: RECIPES_SERVICE_URL + value: "http://recipes-service:8000" + - name: SALES_SERVICE_URL + value: "http://sales-service:8000" + - name: ORDERS_SERVICE_URL + value: "http://orders-service:8000" + - name: PRODUCTION_SERVICE_URL + value: "http://production-service:8000" + - name: SUPPLIERS_SERVICE_URL + value: "http://suppliers-service:8000" + - name: LOG_LEVEL + value: "INFO" + - name: POD_NAMESPACE + valueFrom: + fieldRef: + fieldPath: metadata.namespace + resources: + requests: + memory: "256Mi" + cpu: "200m" + limits: + memory: "512Mi" + cpu: "500m" + livenessProbe: + httpGet: + path: /health + port: 8000 + initialDelaySeconds: 30 + periodSeconds: 30 + readinessProbe: + httpGet: + path: /health + port: 8000 + initialDelaySeconds: 10 + periodSeconds: 10 + startupProbe: + httpGet: + path: /health + port: 8000 + initialDelaySeconds: 10 + periodSeconds: 5 + failureThreshold: 30 + initContainers: + - name: wait-for-redis + image: busybox:1.36 + command: ['sh', '-c', 'until nc -z 
redis-service 6379; do echo waiting for redis; sleep 2; done'] + - name: wait-for-migrations + image: localhost:5000/postgres_17-alpine + command: ["sh", "-c", + "echo 'Waiting for database to be ready...' && \ + until pg_isready -h demo-session-db-service -p 5432; do sleep 2; done && \ + echo 'Database ready, checking for demo_sessions table...' && \ + MAX_ATTEMPTS=60 && \ + ATTEMPT=1 && \ + while [ $ATTEMPT -le $MAX_ATTEMPTS ]; do \ + if psql -h demo-session-db-service -U demo_session_user -d demo_session_db -c 'SELECT 1 FROM demo_sessions LIMIT 1;' 2>/dev/null; then \ + break; \ + fi; \ + echo \"Waiting for demo_sessions table to be created by migrations... (attempt $ATTEMPT/$MAX_ATTEMPTS)\"; \ + ATTEMPT=$((ATTEMPT + 1)); \ + sleep 5; \ + done && \ + if [ $ATTEMPT -gt $MAX_ATTEMPTS ]; then \ + echo 'ERROR: demo_sessions table not created after maximum attempts'; \ + exit 1; \ + fi && \ + echo 'demo_sessions table is ready!' && \ + echo 'Checking if table has required columns...' && \ + psql -h demo-session-db-service -U demo_session_user -d demo_session_db -c '\\d demo_sessions' && \ + echo 'Table structure verified!'"] + env: + - name: PGPASSWORD + valueFrom: + secretKeyRef: + name: database-secrets + key: DEMO_SESSION_DB_PASSWORD + resources: + requests: + memory: "64Mi" + cpu: "50m" + limits: + memory: "128Mi" + cpu: "100m" diff --git a/infrastructure/kubernetes/base/migrations/demo-seed-rbac.yaml b/infrastructure/services/microservices/demo-session/migrations/demo-seed-rbac.yaml similarity index 100% rename from infrastructure/kubernetes/base/migrations/demo-seed-rbac.yaml rename to infrastructure/services/microservices/demo-session/migrations/demo-seed-rbac.yaml diff --git a/infrastructure/kubernetes/base/migrations/demo-session-migration-job.yaml b/infrastructure/services/microservices/demo-session/migrations/demo-session-migration-job.yaml similarity index 85% rename from infrastructure/kubernetes/base/migrations/demo-session-migration-job.yaml rename to infrastructure/services/microservices/demo-session/migrations/demo-session-migration-job.yaml index c8c34edc..d08a672b 100644 --- a/infrastructure/kubernetes/base/migrations/demo-session-migration-job.yaml +++ b/infrastructure/services/microservices/demo-session/migrations/demo-session-migration-job.yaml @@ -30,8 +30,7 @@ spec: cpu: "100m" containers: - name: migrate - image: bakery/demo-session-service:latest - imagePullPolicy: Never + image: bakery/demo-session-service command: ["python", "/app/shared/scripts/run_migrations.py", "demo_session"] env: - name: DEMO_SESSION_DATABASE_URL @@ -39,6 +38,12 @@ spec: secretKeyRef: name: database-secrets key: DEMO_SESSION_DATABASE_URL + - name: DB_FORCE_RECREATE + valueFrom: + configMapKeyRef: + name: bakery-config + key: DB_FORCE_RECREATE + optional: true - name: LOG_LEVEL value: "INFO" resources: diff --git a/infrastructure/kubernetes/base/components/demo-session/rbac.yaml b/infrastructure/services/microservices/demo-session/rbac.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/demo-session/rbac.yaml rename to infrastructure/services/microservices/demo-session/rbac.yaml diff --git a/infrastructure/kubernetes/base/components/demo-session/service.yaml b/infrastructure/services/microservices/demo-session/service.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/demo-session/service.yaml rename to infrastructure/services/microservices/demo-session/service.yaml diff --git 
a/infrastructure/kubernetes/base/components/distribution/distribution-deployment.yaml b/infrastructure/services/microservices/distribution/distribution-service.yaml similarity index 67% rename from infrastructure/kubernetes/base/components/distribution/distribution-deployment.yaml rename to infrastructure/services/microservices/distribution/distribution-service.yaml index 2d169622..a2a25f3a 100644 --- a/infrastructure/kubernetes/base/components/distribution/distribution-deployment.yaml +++ b/infrastructure/services/microservices/distribution/distribution-service.yaml @@ -19,6 +19,8 @@ spec: app.kubernetes.io/name: distribution-service app.kubernetes.io/component: microservice spec: + imagePullSecrets: + - name: dockerhub-creds initContainers: # Wait for Redis to be ready - name: wait-for-redis @@ -53,6 +55,7 @@ spec: - name: redis-tls mountPath: /tls readOnly: true + # Wait for database migration to complete - name: wait-for-migration image: postgres:17-alpine command: @@ -66,10 +69,23 @@ spec: sleep 2 done echo "Database is ready!" - # Give migrations extra time to complete after DB is ready - echo "Waiting for migrations to complete..." - sleep 10 - echo "Ready to start service" + + # Verify that migrations have completed by checking for the alembic_version table + ATTEMPTS=30 + COUNT=0 + until [ $COUNT -ge $ATTEMPTS ]; do + if PGPASSWORD="$DISTRIBUTION_DB_PASSWORD" psql -h "$DISTRIBUTION_DB_HOST" -p "$DISTRIBUTION_DB_PORT" -U "$DISTRIBUTION_DB_USER" -d "$DISTRIBUTION_DB_NAME" -c "\dt alembic_version" > /dev/null 2>&1; then + echo "Migrations are complete - alembic_version table exists" + exit 0 + else + echo "Migrations not complete yet, waiting... ($((COUNT + 1))/$ATTEMPTS)" + sleep 10 + fi + COUNT=$((COUNT + 1)) + done + + echo "Timeout waiting for migrations to complete" + exit 1 env: - name: DISTRIBUTION_DB_HOST valueFrom: @@ -86,6 +102,13 @@ spec: secretKeyRef: name: database-secrets key: DISTRIBUTION_DB_USER + - name: DISTRIBUTION_DB_PASSWORD + valueFrom: + secretKeyRef: + name: database-secrets + key: DISTRIBUTION_DB_PASSWORD + - name: DISTRIBUTION_DB_NAME + value: "distribution_db" containers: - name: distribution-service image: bakery/distribution-service:latest @@ -93,6 +116,24 @@ spec: ports: - containerPort: 8000 name: http + env: + # OpenTelemetry Configuration + - name: OTEL_COLLECTOR_ENDPOINT + value: "http://signoz-otel-collector.bakery-ia.svc.cluster.local:4318" + - name: OTEL_EXPORTER_OTLP_ENDPOINT + valueFrom: + configMapKeyRef: + name: bakery-config + key: OTEL_EXPORTER_OTLP_ENDPOINT + - name: OTEL_SERVICE_NAME + value: "distribution-service" + - name: ENABLE_TRACING + value: "true" + # Logging Configuration + - name: OTEL_LOGS_EXPORTER + value: "otlp" + - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED + value: "true" envFrom: - configMapRef: name: bakery-config @@ -104,27 +145,29 @@ spec: name: rabbitmq-secrets - secretRef: name: jwt-secrets - livenessProbe: - httpGet: - path: /health - port: 8000 - initialDelaySeconds: 30 - periodSeconds: 10 - timeoutSeconds: 5 - readinessProbe: - httpGet: - path: /health - port: 8000 - initialDelaySeconds: 5 - periodSeconds: 5 - timeoutSeconds: 3 resources: requests: memory: "256Mi" - cpu: "250m" + cpu: "100m" limits: memory: "512Mi" cpu: "500m" + livenessProbe: + httpGet: + path: /health/live + port: 8000 + initialDelaySeconds: 30 + timeoutSeconds: 5 + periodSeconds: 10 + failureThreshold: 3 + readinessProbe: + httpGet: + path: /health/ready + port: 8000 + initialDelaySeconds: 15 + timeoutSeconds: 3 + periodSeconds: 5 + 
failureThreshold: 5 securityContext: runAsUser: 1000 runAsGroup: 1000 @@ -138,6 +181,8 @@ spec: - name: redis-tls secret: secretName: redis-tls-secret + defaultMode: 0400 + --- apiVersion: v1 kind: Service @@ -149,12 +194,12 @@ metadata: app.kubernetes.io/component: microservice app.kubernetes.io/part-of: bakery-ia spec: + type: ClusterIP + ports: + - port: 8000 + targetPort: 8000 + protocol: TCP + name: http selector: app.kubernetes.io/name: distribution-service app.kubernetes.io/component: microservice - ports: - - protocol: TCP - port: 8000 - targetPort: 8000 - name: http - type: ClusterIP diff --git a/infrastructure/kubernetes/base/migrations/distribution-migration-job.yaml b/infrastructure/services/microservices/distribution/migrations/distribution-migration-job.yaml similarity index 95% rename from infrastructure/kubernetes/base/migrations/distribution-migration-job.yaml rename to infrastructure/services/microservices/distribution/migrations/distribution-migration-job.yaml index 9585baea..5690279e 100644 --- a/infrastructure/kubernetes/base/migrations/distribution-migration-job.yaml +++ b/infrastructure/services/microservices/distribution/migrations/distribution-migration-job.yaml @@ -31,7 +31,8 @@ spec: cpu: "100m" containers: - name: migrate - image: bakery/distribution-service:dev + image: bakery/distribution-service + imagePullPolicy: IfNotPresent command: ["python", "/app/shared/scripts/run_migrations.py", "distribution"] env: - name: DISTRIBUTION_DATABASE_URL diff --git a/infrastructure/kubernetes/base/cronjobs/external-data-rotation-cronjob.yaml b/infrastructure/services/microservices/external/cronjobs/external-data-rotation-cronjob.yaml similarity index 100% rename from infrastructure/kubernetes/base/cronjobs/external-data-rotation-cronjob.yaml rename to infrastructure/services/microservices/external/cronjobs/external-data-rotation-cronjob.yaml diff --git a/infrastructure/kubernetes/base/components/external/external-service.yaml b/infrastructure/services/microservices/external/external-service.yaml similarity index 71% rename from infrastructure/kubernetes/base/components/external/external-service.yaml rename to infrastructure/services/microservices/external/external-service.yaml index 049cf499..fcdedde0 100644 --- a/infrastructure/kubernetes/base/components/external/external-service.yaml +++ b/infrastructure/services/microservices/external/external-service.yaml @@ -59,27 +59,60 @@ spec: - name: redis-tls mountPath: /tls readOnly: true - # Check if external data is initialized - - name: check-data-initialized + # Wait for database migration to complete + - name: wait-for-migration image: postgres:17-alpine command: - - sh - - -c - - | - echo "Checking if data initialization is complete..." - # Convert asyncpg URL to psql-compatible format - DB_URL=$(echo "$DATABASE_URL" | sed 's/postgresql+asyncpg:/postgresql:/') - until psql "$DB_URL" -c "SELECT COUNT(*) FROM city_weather_data LIMIT 1;" > /dev/null 2>&1; do - echo "Waiting for initial data load..." + - sh + - -c + - | + echo "Waiting for external database and migrations to be ready..." + # Wait for database to be accessible + until pg_isready -h $EXTERNAL_DB_HOST -p $EXTERNAL_DB_PORT -U $EXTERNAL_DB_USER; do + echo "Database not ready yet, waiting..." + sleep 2 + done + echo "Database is ready!" 
+ + # Verify that migrations have completed by checking for the alembic_version table + ATTEMPTS=30 + COUNT=0 + until [ $COUNT -ge $ATTEMPTS ]; do + if PGPASSWORD="$EXTERNAL_DB_PASSWORD" psql -h "$EXTERNAL_DB_HOST" -p "$EXTERNAL_DB_PORT" -U "$EXTERNAL_DB_USER" -d "$EXTERNAL_DB_NAME" -c "\dt alembic_version" > /dev/null 2>&1; then + echo "Migrations are complete - alembic_version table exists" + exit 0 + else + echo "Migrations not complete yet, waiting... ($((COUNT + 1))/$ATTEMPTS)" sleep 10 - done - echo "Data is initialized" + fi + COUNT=$((COUNT + 1)) + done + + echo "Timeout waiting for migrations to complete" + exit 1 env: - - name: DATABASE_URL - valueFrom: - secretKeyRef: - name: database-secrets - key: EXTERNAL_DATABASE_URL + - name: EXTERNAL_DB_HOST + valueFrom: + configMapKeyRef: + name: bakery-config + key: EXTERNAL_DB_HOST + - name: EXTERNAL_DB_PORT + valueFrom: + configMapKeyRef: + name: bakery-config + key: DB_PORT + - name: EXTERNAL_DB_USER + valueFrom: + secretKeyRef: + name: database-secrets + key: EXTERNAL_DB_USER + - name: EXTERNAL_DB_PASSWORD + valueFrom: + secretKeyRef: + name: database-secrets + key: EXTERNAL_DB_PASSWORD + - name: EXTERNAL_DB_NAME + value: "external_db" containers: - name: external-service diff --git a/infrastructure/kubernetes/base/jobs/external-data-init-job.yaml b/infrastructure/services/microservices/external/migrations/external-data-init-job.yaml similarity index 100% rename from infrastructure/kubernetes/base/jobs/external-data-init-job.yaml rename to infrastructure/services/microservices/external/migrations/external-data-init-job.yaml diff --git a/infrastructure/kubernetes/base/migrations/external-migration-job.yaml b/infrastructure/services/microservices/external/migrations/external-migration-job.yaml similarity index 96% rename from infrastructure/kubernetes/base/migrations/external-migration-job.yaml rename to infrastructure/services/microservices/external/migrations/external-migration-job.yaml index 3e7ccb3c..29bdf1ab 100644 --- a/infrastructure/kubernetes/base/migrations/external-migration-job.yaml +++ b/infrastructure/services/microservices/external/migrations/external-migration-job.yaml @@ -18,6 +18,7 @@ spec: spec: imagePullSecrets: - name: dockerhub-creds + - name: ghcr-creds initContainers: - name: wait-for-db image: postgres:17-alpine @@ -31,7 +32,7 @@ spec: cpu: "100m" containers: - name: migrate - image: bakery/external-service:dev + image: bakery/external-service command: ["python", "/app/shared/scripts/run_migrations.py", "external"] env: - name: EXTERNAL_DATABASE_URL diff --git a/infrastructure/kubernetes/base/components/forecasting/forecasting-service.yaml b/infrastructure/services/microservices/forecasting/forecasting-service.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/forecasting/forecasting-service.yaml rename to infrastructure/services/microservices/forecasting/forecasting-service.yaml diff --git a/infrastructure/kubernetes/base/migrations/forecasting-migration-job.yaml b/infrastructure/services/microservices/forecasting/migrations/forecasting-migration-job.yaml similarity index 97% rename from infrastructure/kubernetes/base/migrations/forecasting-migration-job.yaml rename to infrastructure/services/microservices/forecasting/migrations/forecasting-migration-job.yaml index 313a8ae8..ed425e1c 100644 --- a/infrastructure/kubernetes/base/migrations/forecasting-migration-job.yaml +++ b/infrastructure/services/microservices/forecasting/migrations/forecasting-migration-job.yaml @@ -31,7 +31,7 
@@ spec: cpu: "100m" containers: - name: migrate - image: bakery/forecasting-service:dev + image: bakery/forecasting-service command: ["python", "/app/shared/scripts/run_migrations.py", "forecasting"] env: - name: FORECASTING_DATABASE_URL diff --git a/infrastructure/kubernetes/base/components/frontend/frontend-service.yaml b/infrastructure/services/microservices/frontend/frontend-service.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/frontend/frontend-service.yaml rename to infrastructure/services/microservices/frontend/frontend-service.yaml diff --git a/infrastructure/kubernetes/base/components/inventory/inventory-service.yaml b/infrastructure/services/microservices/inventory/inventory-service.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/inventory/inventory-service.yaml rename to infrastructure/services/microservices/inventory/inventory-service.yaml diff --git a/infrastructure/kubernetes/base/migrations/inventory-migration-job.yaml b/infrastructure/services/microservices/inventory/migrations/inventory-migration-job.yaml similarity index 97% rename from infrastructure/kubernetes/base/migrations/inventory-migration-job.yaml rename to infrastructure/services/microservices/inventory/migrations/inventory-migration-job.yaml index 7cb69627..6593d139 100644 --- a/infrastructure/kubernetes/base/migrations/inventory-migration-job.yaml +++ b/infrastructure/services/microservices/inventory/migrations/inventory-migration-job.yaml @@ -31,7 +31,7 @@ spec: cpu: "100m" containers: - name: migrate - image: bakery/inventory-service:dev + image: bakery/inventory-service command: ["python", "/app/shared/scripts/run_migrations.py", "inventory"] env: - name: INVENTORY_DATABASE_URL diff --git a/infrastructure/services/microservices/kustomization.yaml b/infrastructure/services/microservices/kustomization.yaml new file mode 100644 index 00000000..09d88ffb --- /dev/null +++ b/infrastructure/services/microservices/kustomization.yaml @@ -0,0 +1,71 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + # Core services + - auth/auth-service.yaml + - tenant/tenant-service.yaml + + # Data & Analytics services + - training/training-service.yaml + - forecasting/forecasting-service.yaml + - ai-insights/ai-insights-service.yaml + + # Operations services + - sales/sales-service.yaml + - inventory/inventory-service.yaml + - production/production-service.yaml + - procurement/procurement-service.yaml + - distribution/distribution-service.yaml + + # Supporting services + - recipes/recipes-service.yaml + - suppliers/suppliers-service.yaml + - pos/pos-service.yaml + - orders/orders-service.yaml + - external/external-service.yaml + + # Platform services + - notification/notification-service.yaml + - alert-processor/alert-processor.yaml + - orchestrator/orchestrator-service.yaml + + # Demo services + - demo-session/deployment.yaml + - demo-session/service.yaml + - demo-session/rbac.yaml + + # Frontend + - frontend/frontend-service.yaml + + # Data initialization jobs + - external/migrations/external-data-init-job.yaml + + # Migration jobs + - auth/migrations/auth-migration-job.yaml + - tenant/migrations/tenant-migration-job.yaml + - training/migrations/training-migration-job.yaml + - forecasting/migrations/forecasting-migration-job.yaml + - ai-insights/migrations/ai-insights-migration-job.yaml + - sales/migrations/sales-migration-job.yaml + - inventory/migrations/inventory-migration-job.yaml + - 
production/migrations/production-migration-job.yaml + - procurement/migrations/procurement-migration-job.yaml + - distribution/migrations/distribution-migration-job.yaml + - recipes/migrations/recipes-migration-job.yaml + - suppliers/migrations/suppliers-migration-job.yaml + - pos/migrations/pos-migration-job.yaml + - orders/migrations/orders-migration-job.yaml + - external/migrations/external-migration-job.yaml + - notification/migrations/notification-migration-job.yaml + - alert-processor/migrations/alert-processor-migration-job.yaml + - orchestrator/migrations/orchestrator-migration-job.yaml + - demo-session/migrations/demo-session-migration-job.yaml + - demo-session/migrations/demo-seed-rbac.yaml + + # Worker deployments + - demo-session/demo-cleanup-worker.yaml + + # CronJobs + - demo-session/cronjobs/demo-cleanup-cronjob.yaml + - external/cronjobs/external-data-rotation-cronjob.yaml diff --git a/infrastructure/kubernetes/base/migrations/notification-migration-job.yaml b/infrastructure/services/microservices/notification/migrations/notification-migration-job.yaml similarity index 97% rename from infrastructure/kubernetes/base/migrations/notification-migration-job.yaml rename to infrastructure/services/microservices/notification/migrations/notification-migration-job.yaml index 37f397a9..44277098 100644 --- a/infrastructure/kubernetes/base/migrations/notification-migration-job.yaml +++ b/infrastructure/services/microservices/notification/migrations/notification-migration-job.yaml @@ -31,7 +31,7 @@ spec: cpu: "100m" containers: - name: migrate - image: bakery/notification-service:dev + image: bakery/notification-service command: ["python", "/app/shared/scripts/run_migrations.py", "notification"] env: - name: NOTIFICATION_DATABASE_URL diff --git a/infrastructure/kubernetes/base/components/notification/notification-service.yaml b/infrastructure/services/microservices/notification/notification-service.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/notification/notification-service.yaml rename to infrastructure/services/microservices/notification/notification-service.yaml diff --git a/infrastructure/kubernetes/base/migrations/orchestrator-migration-job.yaml b/infrastructure/services/microservices/orchestrator/migrations/orchestrator-migration-job.yaml similarity index 97% rename from infrastructure/kubernetes/base/migrations/orchestrator-migration-job.yaml rename to infrastructure/services/microservices/orchestrator/migrations/orchestrator-migration-job.yaml index 4b607fd0..3bb4b6ef 100644 --- a/infrastructure/kubernetes/base/migrations/orchestrator-migration-job.yaml +++ b/infrastructure/services/microservices/orchestrator/migrations/orchestrator-migration-job.yaml @@ -31,7 +31,7 @@ spec: cpu: "100m" containers: - name: migrate - image: bakery/orchestrator-service:dev + image: bakery/orchestrator-service command: ["python", "/app/shared/scripts/run_migrations.py", "orchestrator"] env: - name: ORCHESTRATOR_DATABASE_URL diff --git a/infrastructure/kubernetes/base/components/orchestrator/orchestrator-service.yaml b/infrastructure/services/microservices/orchestrator/orchestrator-service.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/orchestrator/orchestrator-service.yaml rename to infrastructure/services/microservices/orchestrator/orchestrator-service.yaml diff --git a/infrastructure/kubernetes/base/migrations/orders-migration-job.yaml b/infrastructure/services/microservices/orders/migrations/orders-migration-job.yaml 
similarity index 97% rename from infrastructure/kubernetes/base/migrations/orders-migration-job.yaml rename to infrastructure/services/microservices/orders/migrations/orders-migration-job.yaml index 0eab6fc5..f0e93b71 100644 --- a/infrastructure/kubernetes/base/migrations/orders-migration-job.yaml +++ b/infrastructure/services/microservices/orders/migrations/orders-migration-job.yaml @@ -31,7 +31,7 @@ spec: cpu: "100m" containers: - name: migrate - image: bakery/orders-service:dev + image: bakery/orders-service command: ["python", "/app/shared/scripts/run_migrations.py", "orders"] env: - name: ORDERS_DATABASE_URL diff --git a/infrastructure/kubernetes/base/components/orders/orders-service.yaml b/infrastructure/services/microservices/orders/orders-service.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/orders/orders-service.yaml rename to infrastructure/services/microservices/orders/orders-service.yaml diff --git a/infrastructure/kubernetes/base/migrations/pos-migration-job.yaml b/infrastructure/services/microservices/pos/migrations/pos-migration-job.yaml similarity index 97% rename from infrastructure/kubernetes/base/migrations/pos-migration-job.yaml rename to infrastructure/services/microservices/pos/migrations/pos-migration-job.yaml index 651d3700..fce42c2f 100644 --- a/infrastructure/kubernetes/base/migrations/pos-migration-job.yaml +++ b/infrastructure/services/microservices/pos/migrations/pos-migration-job.yaml @@ -31,7 +31,7 @@ spec: cpu: "100m" containers: - name: migrate - image: bakery/pos-service:dev + image: bakery/pos-service command: ["python", "/app/shared/scripts/run_migrations.py", "pos"] env: - name: POS_DATABASE_URL diff --git a/infrastructure/kubernetes/base/components/pos/pos-service.yaml b/infrastructure/services/microservices/pos/pos-service.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/pos/pos-service.yaml rename to infrastructure/services/microservices/pos/pos-service.yaml diff --git a/infrastructure/kubernetes/base/migrations/procurement-migration-job.yaml b/infrastructure/services/microservices/procurement/migrations/procurement-migration-job.yaml similarity index 97% rename from infrastructure/kubernetes/base/migrations/procurement-migration-job.yaml rename to infrastructure/services/microservices/procurement/migrations/procurement-migration-job.yaml index a87435d7..21b41895 100644 --- a/infrastructure/kubernetes/base/migrations/procurement-migration-job.yaml +++ b/infrastructure/services/microservices/procurement/migrations/procurement-migration-job.yaml @@ -31,7 +31,7 @@ spec: cpu: "100m" containers: - name: migrate - image: bakery/procurement-service:dev + image: bakery/procurement-service command: ["python", "/app/shared/scripts/run_migrations.py", "procurement"] env: - name: PROCUREMENT_DATABASE_URL diff --git a/infrastructure/kubernetes/base/components/procurement/procurement-service.yaml b/infrastructure/services/microservices/procurement/procurement-service.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/procurement/procurement-service.yaml rename to infrastructure/services/microservices/procurement/procurement-service.yaml diff --git a/infrastructure/kubernetes/base/migrations/production-migration-job.yaml b/infrastructure/services/microservices/production/migrations/production-migration-job.yaml similarity index 97% rename from infrastructure/kubernetes/base/migrations/production-migration-job.yaml rename to 
infrastructure/services/microservices/production/migrations/production-migration-job.yaml index 637517b1..f07e228a 100644 --- a/infrastructure/kubernetes/base/migrations/production-migration-job.yaml +++ b/infrastructure/services/microservices/production/migrations/production-migration-job.yaml @@ -31,7 +31,7 @@ spec: cpu: "100m" containers: - name: migrate - image: bakery/production-service:dev + image: bakery/production-service command: ["python", "/app/shared/scripts/run_migrations.py", "production"] env: - name: PRODUCTION_DATABASE_URL diff --git a/infrastructure/kubernetes/base/components/production/production-service.yaml b/infrastructure/services/microservices/production/production-service.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/production/production-service.yaml rename to infrastructure/services/microservices/production/production-service.yaml diff --git a/infrastructure/kubernetes/base/migrations/recipes-migration-job.yaml b/infrastructure/services/microservices/recipes/migrations/recipes-migration-job.yaml similarity index 97% rename from infrastructure/kubernetes/base/migrations/recipes-migration-job.yaml rename to infrastructure/services/microservices/recipes/migrations/recipes-migration-job.yaml index c8c1b2f7..d5d05789 100644 --- a/infrastructure/kubernetes/base/migrations/recipes-migration-job.yaml +++ b/infrastructure/services/microservices/recipes/migrations/recipes-migration-job.yaml @@ -31,7 +31,7 @@ spec: cpu: "100m" containers: - name: migrate - image: bakery/recipes-service:dev + image: bakery/recipes-service command: ["python", "/app/shared/scripts/run_migrations.py", "recipes"] env: - name: RECIPES_DATABASE_URL diff --git a/infrastructure/kubernetes/base/components/recipes/recipes-service.yaml b/infrastructure/services/microservices/recipes/recipes-service.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/recipes/recipes-service.yaml rename to infrastructure/services/microservices/recipes/recipes-service.yaml diff --git a/infrastructure/kubernetes/base/migrations/sales-migration-job.yaml b/infrastructure/services/microservices/sales/migrations/sales-migration-job.yaml similarity index 97% rename from infrastructure/kubernetes/base/migrations/sales-migration-job.yaml rename to infrastructure/services/microservices/sales/migrations/sales-migration-job.yaml index 54f3341e..a41eec95 100644 --- a/infrastructure/kubernetes/base/migrations/sales-migration-job.yaml +++ b/infrastructure/services/microservices/sales/migrations/sales-migration-job.yaml @@ -31,7 +31,7 @@ spec: cpu: "100m" containers: - name: migrate - image: bakery/sales-service:dev + image: bakery/sales-service command: ["python", "/app/shared/scripts/run_migrations.py", "sales"] env: - name: SALES_DATABASE_URL diff --git a/infrastructure/kubernetes/base/components/sales/sales-service.yaml b/infrastructure/services/microservices/sales/sales-service.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/sales/sales-service.yaml rename to infrastructure/services/microservices/sales/sales-service.yaml diff --git a/infrastructure/kubernetes/base/migrations/suppliers-migration-job.yaml b/infrastructure/services/microservices/suppliers/migrations/suppliers-migration-job.yaml similarity index 97% rename from infrastructure/kubernetes/base/migrations/suppliers-migration-job.yaml rename to infrastructure/services/microservices/suppliers/migrations/suppliers-migration-job.yaml index 36687ec7..632557ed 100644 --- 
a/infrastructure/kubernetes/base/migrations/suppliers-migration-job.yaml +++ b/infrastructure/services/microservices/suppliers/migrations/suppliers-migration-job.yaml @@ -31,7 +31,7 @@ spec: cpu: "100m" containers: - name: migrate - image: bakery/suppliers-service:dev + image: bakery/suppliers-service command: ["python", "/app/shared/scripts/run_migrations.py", "suppliers"] env: - name: SUPPLIERS_DATABASE_URL diff --git a/infrastructure/kubernetes/base/components/suppliers/suppliers-service.yaml b/infrastructure/services/microservices/suppliers/suppliers-service.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/suppliers/suppliers-service.yaml rename to infrastructure/services/microservices/suppliers/suppliers-service.yaml diff --git a/infrastructure/kubernetes/base/migrations/tenant-migration-job.yaml b/infrastructure/services/microservices/tenant/migrations/tenant-migration-job.yaml similarity index 96% rename from infrastructure/kubernetes/base/migrations/tenant-migration-job.yaml rename to infrastructure/services/microservices/tenant/migrations/tenant-migration-job.yaml index c69fab6c..9bc5b3e6 100644 --- a/infrastructure/kubernetes/base/migrations/tenant-migration-job.yaml +++ b/infrastructure/services/microservices/tenant/migrations/tenant-migration-job.yaml @@ -18,6 +18,7 @@ spec: spec: imagePullSecrets: - name: dockerhub-creds + - name: ghcr-creds initContainers: - name: wait-for-db image: postgres:17-alpine @@ -31,7 +32,7 @@ spec: cpu: "100m" containers: - name: migrate - image: bakery/tenant-service:dev + image: bakery/tenant-service command: ["python", "/app/shared/scripts/run_migrations.py", "tenant"] env: - name: TENANT_DATABASE_URL diff --git a/infrastructure/kubernetes/base/components/tenant/tenant-service.yaml b/infrastructure/services/microservices/tenant/tenant-service.yaml similarity index 99% rename from infrastructure/kubernetes/base/components/tenant/tenant-service.yaml rename to infrastructure/services/microservices/tenant/tenant-service.yaml index 3fb50a5c..87488d6d 100644 --- a/infrastructure/kubernetes/base/components/tenant/tenant-service.yaml +++ b/infrastructure/services/microservices/tenant/tenant-service.yaml @@ -21,6 +21,7 @@ spec: spec: imagePullSecrets: - name: dockerhub-creds + - name: ghcr-creds initContainers: # Wait for Redis to be ready - name: wait-for-redis diff --git a/infrastructure/kubernetes/base/migrations/training-migration-job.yaml b/infrastructure/services/microservices/training/migrations/training-migration-job.yaml similarity index 97% rename from infrastructure/kubernetes/base/migrations/training-migration-job.yaml rename to infrastructure/services/microservices/training/migrations/training-migration-job.yaml index d96b5779..37f6ff0f 100644 --- a/infrastructure/kubernetes/base/migrations/training-migration-job.yaml +++ b/infrastructure/services/microservices/training/migrations/training-migration-job.yaml @@ -31,7 +31,7 @@ spec: cpu: "100m" containers: - name: migrate - image: bakery/training-service:dev + image: bakery/training-service command: ["python", "/app/shared/scripts/run_migrations.py", "training"] env: - name: TRAINING_DATABASE_URL diff --git a/infrastructure/kubernetes/base/components/training/training-service.yaml b/infrastructure/services/microservices/training/training-service.yaml similarity index 100% rename from infrastructure/kubernetes/base/components/training/training-service.yaml rename to infrastructure/services/microservices/training/training-service.yaml diff --git 
a/infrastructure/tls/ca/ca-cert.srl b/infrastructure/tls/ca/ca-cert.srl deleted file mode 100644 index 7db51191..00000000 --- a/infrastructure/tls/ca/ca-cert.srl +++ /dev/null @@ -1 +0,0 @@ -1BE074336AF19EA8C676D7E8D0185EBCA0B1D202 diff --git a/kind-config.yaml b/kind-config.yaml index 1593d5aa..cd97cb7a 100644 --- a/kind-config.yaml +++ b/kind-config.yaml @@ -1,49 +1,82 @@ kind: Cluster apiVersion: kind.x-k8s.io/v1alpha4 name: bakery-ia-local + +# Networking configuration +networking: + podSubnet: "10.244.0.0/16" + serviceSubnet: "10.96.0.0/12" + nodes: - role: control-plane - # Increase resource limits for the Kind node to handle multiple services kubeadmConfigPatches: - | kind: InitConfiguration nodeRegistration: kubeletExtraArgs: - node-labels: "ingress-ready=true" - # Increase max pods for development environment - max-pods: "200" + node-labels: "ingress-ready=true,architecture=arm64" + max-pods: "250" + eviction-hard: "memory.available<500Mi,nodefs.available<10%" + fail-swap-on: "false" - | kind: ClusterConfiguration - # Increase API server memory and other parameters for local dev apiServer: extraArgs: encryption-provider-config: /etc/kubernetes/enc/encryption-config.yaml + max-requests-inflight: "2000" + max-mutating-requests-inflight: "1000" extraVolumes: - name: encryption-config hostPath: /etc/kubernetes/enc mountPath: /etc/kubernetes/enc readOnly: true pathType: DirectoryOrCreate - # Mount encryption keys for secure development + controllerManager: + extraArgs: + horizontal-pod-autoscaler-sync-period: "10s" + node-monitor-grace-period: "20s" + scheduler: + extraArgs: + kube-api-qps: "50" + kube-api-burst: "100" + extraMounts: - - hostPath: ./infrastructure/kubernetes/encryption + - hostPath: ./infrastructure/platform/security/encryption containerPath: /etc/kubernetes/enc readOnly: true - # Port mappings for local access + extraPortMappings: - # HTTP ingress - nginx ingress controller uses hostPort: 80 - containerPort: 80 hostPort: 80 protocol: TCP - # HTTPS ingress - nginx ingress controller uses hostPort: 443 + listenAddress: "0.0.0.0" - containerPort: 443 hostPort: 443 protocol: TCP - # Direct frontend access (backup) + listenAddress: "0.0.0.0" - containerPort: 30300 hostPort: 3000 protocol: TCP - # Direct gateway access (backup) + listenAddress: "0.0.0.0" - containerPort: 30800 hostPort: 8000 - protocol: TCP \ No newline at end of file + protocol: TCP + listenAddress: "0.0.0.0" + - containerPort: 30080 + hostPort: 30080 + protocol: TCP + listenAddress: "0.0.0.0" + - containerPort: 30443 + hostPort: 30443 + protocol: TCP + listenAddress: "0.0.0.0" + +containerdConfigPatches: +- |- + [plugins."io.containerd.grpc.v1.cri"] + sandbox_image = "registry.k8s.io/pause:3.9" + [plugins."io.containerd.grpc.v1.cri".containerd] + snapshotter = "overlayfs" + [plugins."io.containerd.grpc.v1.cri".containerd.runtimes] + [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc] + runtime_type = "io.containerd.runc.v2" diff --git a/kubernetes_restart.sh b/kubernetes_restart.sh index 2a4540de..d0137efb 100755 --- a/kubernetes_restart.sh +++ b/kubernetes_restart.sh @@ -1,5 +1,6 @@ #!/bin/bash +# Improved Kubernetes restart script with better error handling and resource management # Colors for output RED='\033[0;31m' GREEN='\033[0;32m' @@ -7,6 +8,18 @@ YELLOW='\033[1;33m' BLUE='\033[0;34m' NC='\033[0m' # No Color +# Configuration variables +COLIMA_PROFILE="k8s-local" +KIND_CLUSTER="bakery-ia-local" +REGISTRY_NAME="kind-registry" +REGISTRY_PORT="5000" +NAMESPACE="bakery-ia" + +# Resource 
configuration (adjustable) +COLIMA_CPU=12 +COLIMA_MEMORY=24 +COLIMA_DISK=120 + # Function to print colored output print_status() { echo -e "${BLUE}[INFO]${NC} $1" @@ -24,6 +37,15 @@ print_error() { echo -e "${RED}[ERROR]${NC} $1" } +# Function to check command availability +check_command() { + if ! command -v "$1" &> /dev/null; then + print_error "Required command '$1' not found. Please install it first." + return 1 + fi + return 0 +} + # Function to wait for pods with retry logic wait_for_pods() { local namespace=$1 @@ -56,53 +78,97 @@ wait_for_pods() { return 1 } -# Function to handle cleanup +# Function to check if Colima is running +is_colima_running() { + colima list | grep -q "$COLIMA_PROFILE" && colima status --profile "$COLIMA_PROFILE" | grep -q "Running" +} + +# Function to check if Kind cluster exists +is_kind_cluster_running() { + kind get clusters | grep -q "$KIND_CLUSTER" +} + +# Function to check if registry is running +is_registry_running() { + docker inspect -f '{{.State.Running}}' "$REGISTRY_NAME" 2>/dev/null | grep -q "true" +} + +# Function to handle cleanup with better error handling cleanup() { print_status "Starting cleanup process..." - + # Delete Kubernetes namespace with timeout - print_status "Deleting namespace bakery-ia..." - if kubectl get namespace bakery-ia &>/dev/null; then - kubectl delete namespace bakery-ia 2>/dev/null & - PID=$! - sleep 2 - if ps -p $PID &>/dev/null; then - print_warning "kubectl delete namespace command taking too long, forcing termination..." - kill $PID 2>/dev/null + print_status "Deleting namespace $NAMESPACE..." + if kubectl get namespace "$NAMESPACE" &>/dev/null; then + print_status "Found namespace $NAMESPACE, attempting to delete..." + + # Try graceful deletion first + kubectl delete namespace "$NAMESPACE" --wait=false 2>/dev/null + + # Wait a bit for deletion to start + sleep 5 + + # Check if namespace is still terminating + if kubectl get namespace "$NAMESPACE" --no-headers 2>/dev/null | grep -q "Terminating"; then + print_warning "Namespace $NAMESPACE is stuck in Terminating state" + print_status "Attempting to force delete..." + + # Get the namespace JSON and remove finalizers + kubectl get namespace "$NAMESPACE" -o json > /tmp/namespace.json 2>/dev/null + if [ $? -eq 0 ]; then + # Remove finalizers + jq 'del(.spec.finalizers)' /tmp/namespace.json > /tmp/namespace-fixed.json 2>/dev/null + if [ $? -eq 0 ]; then + kubectl replace --raw "/api/v1/namespaces/$NAMESPACE/finalize" -f /tmp/namespace-fixed.json 2>/dev/null + print_success "Namespace $NAMESPACE force deleted" + else + print_error "Failed to remove finalizers from namespace $NAMESPACE" + fi + rm -f /tmp/namespace.json /tmp/namespace-fixed.json + fi + else + print_success "Namespace $NAMESPACE deletion initiated" fi - print_success "Namespace deletion attempted" else - print_status "Namespace bakery-ia not found" + print_status "Namespace $NAMESPACE not found" fi - + # Delete Kind cluster - print_status "Deleting Kind cluster..." - if kind get clusters | grep -q "bakery-ia-local"; then - kind delete cluster --name bakery-ia-local - print_success "Kind cluster deleted" + print_status "Deleting Kind cluster $KIND_CLUSTER..." + if is_kind_cluster_running; then + kind delete cluster --name "$KIND_CLUSTER" + if [ $? 
-eq 0 ]; then + print_success "Kind cluster $KIND_CLUSTER deleted" + else + print_error "Failed to delete Kind cluster $KIND_CLUSTER" + fi else - print_status "Kind cluster bakery-ia-local not found" + print_status "Kind cluster $KIND_CLUSTER not found" fi - + # Stop local registry - print_status "Stopping local registry..." - if docker ps -a | grep -q "kind-registry"; then - docker stop kind-registry 2>/dev/null || true - docker rm kind-registry 2>/dev/null || true - print_success "Local registry removed" + print_status "Stopping local registry $REGISTRY_NAME..." + if is_registry_running; then + docker stop "$REGISTRY_NAME" 2>/dev/null || true + docker rm "$REGISTRY_NAME" 2>/dev/null || true + print_success "Local registry $REGISTRY_NAME removed" else - print_status "Local registry not found" + print_status "Local registry $REGISTRY_NAME not found" fi - + # Stop Colima - print_status "Stopping Colima..." - if colima list | grep -q "k8s-local"; then - colima stop --profile k8s-local - print_success "Colima stopped" + print_status "Stopping Colima profile $COLIMA_PROFILE..." + if is_colima_running; then + colima stop --profile "$COLIMA_PROFILE" + if [ $? -eq 0 ]; then + print_success "Colima profile $COLIMA_PROFILE stopped" + else + print_error "Failed to stop Colima profile $COLIMA_PROFILE" + fi else - print_status "Colima profile k8s-local not found" + print_status "Colima profile $COLIMA_PROFILE not found or not running" fi - + print_success "Cleanup completed!" echo "----------------------------------------" } @@ -119,57 +185,126 @@ check_config_files() { fi # Check for encryption directory if referenced in config - if grep -q "infrastructure/kubernetes/encryption" kind-config.yaml; then - if [ ! -d "./infrastructure/kubernetes/encryption" ]; then - print_warning "Encryption directory './infrastructure/kubernetes/encryption' not found" - print_warning "Some encryption configurations may not work properly" + if grep -q "infrastructure/platform/security/encryption" kind-config.yaml; then + if [ ! -d "./infrastructure/platform/security/encryption" ]; then + print_error "Encryption directory './infrastructure/platform/security/encryption' not found" + print_error "This directory is required for Kubernetes secrets encryption" + print_status "Attempting to create encryption configuration..." + + # Create the directory + mkdir -p "./infrastructure/platform/security/encryption" + + # Generate a new encryption key + ENCRYPTION_KEY=$(openssl rand -base64 32) + + # Create the encryption configuration file + cat > "./infrastructure/platform/security/encryption/encryption-config.yaml" </dev/null || true)" != 'true' ]; then - print_status "Creating registry container on port ${reg_port}..." + if ! is_registry_running; then + print_status "Creating registry container on port ${REGISTRY_PORT}..." + + # Check if container exists but is stopped + if docker ps -a | grep -q "$REGISTRY_NAME"; then + print_status "Registry container exists but is stopped, removing it..." + docker rm "$REGISTRY_NAME" 2>/dev/null || true + fi + docker run \ -d --restart=always \ - -p "127.0.0.1:${reg_port}:5000" \ - --name "${reg_name}" \ + -p "127.0.0.1:${REGISTRY_PORT}:5000" \ + --name "${REGISTRY_NAME}" \ registry:2 - + if [ $? -eq 0 ]; then - print_success "Local registry created at localhost:${reg_port}" + print_success "Local registry created at localhost:${REGISTRY_PORT}" else print_error "Failed to create local registry" - exit 1 + print_status "Attempting to pull registry image..." + docker pull registry:2 + if [ $? 
-eq 0 ]; then + print_status "Registry image pulled, trying to create container again..." + docker run \ + -d --restart=always \ + -p "127.0.0.1:${REGISTRY_PORT}:5000" \ + --name "${REGISTRY_NAME}" \ + registry:2 + if [ $? -eq 0 ]; then + print_success "Local registry created at localhost:${REGISTRY_PORT}" + else + print_error "Failed to create local registry after pulling image" + exit 1 + fi + else + print_error "Failed to pull registry image" + exit 1 + fi fi else - print_success "Local registry already running at localhost:${reg_port}" + print_success "Local registry already running at localhost:${REGISTRY_PORT}" fi - + # Store registry info for later use - echo "${reg_name}:${reg_port}" + echo "${REGISTRY_NAME}:${REGISTRY_PORT}" } -# Function to connect registry to Kind +# Function to connect registry to Kind with better error handling connect_registry_to_kind() { - local reg_name='kind-registry' - local reg_port='5001' - print_status "Connecting registry to Kind network..." + # Check if Kind network exists + if ! docker network ls | grep -q "kind"; then + print_error "Kind network not found. Please create Kind cluster first." + return 1 + fi + # Connect the registry to the cluster network if not already connected - if [ "$(docker inspect -f='{{json .NetworkSettings.Networks.kind}}' "${reg_name}")" = 'null' ]; then - docker network connect "kind" "${reg_name}" - print_success "Registry connected to Kind network" + if [ "$(docker inspect -f='{{json .NetworkSettings.Networks.kind}}' "${REGISTRY_NAME}")" = 'null' ]; then + docker network connect "kind" "${REGISTRY_NAME}" + if [ $? -eq 0 ]; then + print_success "Registry connected to Kind network" + else + print_error "Failed to connect registry to Kind network" + return 1 + fi else print_success "Registry already connected to Kind network" fi @@ -177,22 +312,60 @@ connect_registry_to_kind() { # Configure containerd in the Kind node to use the registry print_status "Configuring containerd to use local registry..." - # Create the registry config directory - docker exec bakery-ia-local-control-plane mkdir -p /etc/containerd/certs.d/localhost:${reg_port} + # Check if control plane container exists + if ! docker ps | grep -q "${KIND_CLUSTER}-control-plane"; then + print_error "Control plane container not found. Kind cluster may not be running." + return 1 + fi - # Add registry configuration - docker exec bakery-ia-local-control-plane sh -c "cat > /etc/containerd/certs.d/localhost:${reg_port}/hosts.toml < /etc/containerd/certs.d/localhost:${REGISTRY_PORT}/hosts.toml < /etc/containerd/certs.d/localhost:5001/hosts.toml < /etc/containerd/certs.d/${REGISTRY_NAME}:5000/hosts.toml </dev/null; then + print_status "cert-manager namespace already exists, checking if it's properly installed..." + if kubectl get deployment -n cert-manager cert-manager-webhook &>/dev/null; then + print_success "cert-manager is already installed and running" + else + print_status "cert-manager namespace exists but components are not running, reinstalling..." + install_cert_manager + fi + else + install_cert_manager + fi + + # 7. Verify port mappings from kind-config.yaml print_status "Verifying port mappings from configuration..." # Extract ports from kind-config.yaml @@ -306,9 +504,9 @@ setup() { print_success "Setup completed successfully!" 
echo "----------------------------------------" print_status "Cluster Information:" - echo " - Colima profile: k8s-local" + echo " - Colima profile: $COLIMA_PROFILE" echo " - Kind cluster: $CLUSTER_NAME" - echo " - Local registry: localhost:5001" + echo " - Local registry: localhost:${REGISTRY_PORT}" echo "" print_status "Port Mappings (configured in kind-config.yaml):" echo " - HTTP Ingress: localhost:${HTTP_HOST_PORT} -> Kind NodePort 30080" @@ -324,9 +522,9 @@ setup() { echo " - Tilt UI: http://localhost:10350" echo "----------------------------------------" print_status "Local Registry Information:" - echo " - Registry URL: localhost:5001" - echo " - Images pushed to: localhost:5001/bakery/" - echo " - Tiltfile already configured: default_registry('localhost:5001')" + echo " - Registry URL: localhost:${REGISTRY_PORT}" + echo " - Images pushed to: localhost:${REGISTRY_PORT}/bakery/" + echo " - Tiltfile already configured: default_registry('localhost:${REGISTRY_PORT}')" echo "----------------------------------------" } @@ -342,28 +540,117 @@ usage() { echo "" echo "Requirements:" echo " - kind-config.yaml must exist in current directory" - echo " - For encryption: ./infrastructure/kubernetes/encryption directory" + echo " - For encryption: ./infrastructure/platform/security/encryption directory" + echo " - Docker, Colima, Kind, kubectl must be installed" +} + +# Function to install cert-manager +install_cert_manager() { + print_status "Installing cert-manager..." + + # Create cert-manager namespace + kubectl create namespace cert-manager --dry-run=client -o yaml | kubectl apply -f - + + # Install cert-manager CRDs and components + kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.yaml + + if [ $? -eq 0 ]; then + print_success "cert-manager manifests applied" + else + print_error "Failed to apply cert-manager manifests" + exit 1 + fi + + # Wait for cert-manager pods to be ready with retry logic + print_status "Waiting for cert-manager pods to be ready..." + + local max_retries=30 + local retry_count=0 + + while [ $retry_count -lt $max_retries ]; do + # Check if all cert-manager pods are ready + local ready_pods=$(kubectl get pods -n cert-manager --no-headers 2>/dev/null | grep -c "Running" || echo "0") + local total_pods=$(kubectl get pods -n cert-manager --no-headers 2>/dev/null | grep -v "NAME" | wc -l) + total_pods=$(echo "$total_pods" | tr -d ' ') + + if [ "$ready_pods" -eq "$total_pods" ] && [ "$total_pods" -gt 0 ]; then + # Double-check that all pods are actually ready (1/1, 2/2, etc.) + local all_ready=true + while IFS= read -r line; do + local ready_status=$(echo "$line" | awk '{print $2}') + local desired_ready=($(echo "$ready_status" | tr '/' ' ')) + if [ "${desired_ready[0]}" -ne "${desired_ready[1]}" ]; then + all_ready=false + break + fi + done <<< "$(kubectl get pods -n cert-manager --no-headers 2>/dev/null | grep -v "NAME")" + + if [ "$all_ready" = true ]; then + print_success "cert-manager is ready with $ready_pods/$total_pods pods running" + return 0 + fi + fi + + retry_count=$((retry_count + 1)) + print_status "Waiting for cert-manager pods to be ready... (attempt $retry_count/$max_retries)" + sleep 10 + done + + print_error "Timed out waiting for cert-manager pods after $((max_retries * 10)) seconds" + print_status "Checking cert-manager pod status for debugging..." 
+ kubectl get pods -n cert-manager + kubectl describe pods -n cert-manager + exit 1 +} + +# Function to check prerequisites +check_prerequisites() { + print_status "Checking prerequisites..." + + local missing_commands=() + + for cmd in docker colima kind kubectl jq; do + if ! check_command "$cmd"; then + missing_commands+=("$cmd") + fi + done + + if [ ${#missing_commands[@]} -gt 0 ]; then + print_error "Missing required commands: ${missing_commands[*]}" + print_error "Please install them before running this script." + exit 1 + fi + + print_success "All prerequisites are met" } # Main script logic -case "${1:-full}" in - "cleanup") - cleanup - ;; - "setup") - setup - ;; - "full") - cleanup - setup - ;; - "help"|"-h"|"--help") - usage - ;; - *) - print_warning "Unknown option: $1" - echo "" - usage - exit 1 - ;; -esac \ No newline at end of file +main() { + # Check prerequisites first + check_prerequisites + + case "${1:-full}" in + "cleanup") + cleanup + ;; + "setup") + setup + ;; + "full") + cleanup + setup + ;; + "help"|"-h"|"--help") + usage + ;; + *) + print_warning "Unknown option: $1" + echo "" + usage + exit 1 + ;; + esac +} + +# Run main function +main "$@" diff --git a/scripts/BASE_IMAGE_CACHING_SOLUTION.md b/scripts/BASE_IMAGE_CACHING_SOLUTION.md new file mode 100644 index 00000000..63e5fc6e --- /dev/null +++ b/scripts/BASE_IMAGE_CACHING_SOLUTION.md @@ -0,0 +1,307 @@ +# Base Image Caching Solution for Docker Hub Rate Limiting + +## Overview + +This solution provides a simple, short-term approach to reduce Docker Hub usage by pre-pulling and caching base images. It's designed to be implemented quickly while providing significant benefits. + +## Problem Addressed + +- **Docker Hub Rate Limiting**: 100 pulls/6h for anonymous users +- **Build Failures**: Timeouts and authentication errors during CI/CD +- **Inconsistent Builds**: Different base image versions causing issues + +## Solution Architecture + +``` +[Docker Hub] → [Pre-Pull Script] → [Local Cache/Registry] → [Service Builds] +``` + +## Implementation Options + +### Option 1: Simple Docker Cache (Easiest) + +```bash +# Just run the prepull script +./scripts/prepull-base-images.sh +``` + +**How it works:** +- Pulls all base images once with authentication +- Docker caches them locally +- Subsequent builds use cached images +- Reduces Docker Hub pulls by ~90% + +### Option 2: Local Registry (More Robust) + +```bash +# Start local registry +docker run -d -p 5000:5000 --name bakery-registry \ + -v $(pwd)/registry-data:/var/lib/registry \ + registry:2 + +# Run prepull script with local registry enabled +USE_LOCAL_REGISTRY=true ./scripts/prepull-base-images.sh +``` + +**How it works:** +- Runs a local Docker registry +- Pre-pull script pushes images to local registry +- All builds pull from local registry +- Can be shared across team members + +### Option 3: Pull-Through Cache (Most Advanced) + +```yaml +# Configure Docker daemon (docker daemon.json) +{ + "registry-mirrors": ["http://localhost:5000"], + "insecure-registries": ["localhost:5000"] +} + +# Start registry as pull-through cache +docker run -d -p 5000:5000 --name bakery-registry \ + -v $(pwd)/registry-data:/var/lib/registry \ + -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \ + registry:2 +``` + +**How it works:** +- Local registry acts as transparent cache +- First request pulls from Docker Hub and caches +- Subsequent requests served from cache +- Completely transparent to builds + +## Quick Start Guide + +### 1. 
Simple Caching (5 minutes) + +```bash +# Make script executable +chmod +x scripts/prepull-base-images.sh + +# Run the script +./scripts/prepull-base-images.sh + +# Verify images are cached +docker images | grep -E "python:3.11-slim|postgres:17-alpine" +``` + +### 2. Local Registry (10 minutes) + +```bash +# Build local registry image +cd scripts/local-registry +docker build -t bakery-registry . + +# Start registry +docker run -d -p 5000:5000 --name bakery-registry \ + -v $(pwd)/registry-data:/var/lib/registry \ + bakery-registry + +# Run prepull with local registry +USE_LOCAL_REGISTRY=true ../prepull-base-images.sh + +# Verify registry contents +curl http://localhost:5000/v2/_catalog +``` + +### 3. CI/CD Integration + +**GitHub Actions Example:** +```yaml +jobs: + setup: + runs-on: ubuntu-latest + steps: + - name: Checkout + uses: actions/checkout@v4 + + - name: Set up Docker + uses: docker/setup-buildx-action@v2 + + - name: Login to Docker Hub + uses: docker/login-action@v2 + with: + username: ${{ secrets.DOCKER_USERNAME }} + password: ${{ secrets.DOCKER_PASSWORD }} + + - name: Pre-pull base images + run: ./scripts/prepull-base-images.sh + + - name: Cache Docker layers + uses: actions/cache@v3 + with: + path: /tmp/.buildx-cache + key: ${{ runner.os }}-buildx-${{ github.sha }} + restore-keys: | + ${{ runner.os }}-buildx- + + build: + needs: setup + runs-on: ubuntu-latest + steps: + - name: Build services + run: ./scripts/build-services.sh +``` + +**Tekton Pipeline Example:** +```yaml +apiVersion: tekton.dev/v1beta1 +kind: Task +metadata: + name: prepull-base-images +spec: + steps: + - name: login-to-docker + image: docker:cli + script: | + echo "$DOCKER_PASSWORD" | docker login -u "$DOCKER_USERNAME" --password-stdin + env: + - name: DOCKER_USERNAME + valueFrom: + secretKeyRef: + name: docker-creds + key: username + - name: DOCKER_PASSWORD + valueFrom: + secretKeyRef: + name: docker-creds + key: password + + - name: prepull-images + image: docker:cli + script: | + #!/bin/bash + images=("python:3.11-slim" "postgres:17-alpine" "redis:7.4-alpine") + for img in "${images[@]}"; do + echo "Pulling $img..." 
+ docker pull "$img" + done +``` + +## Base Images Covered + +The script pre-pulls all base images used in the Bakery-IA project: + +### Primary Base Images +- `python:3.11-slim` - Main Python runtime +- `postgres:17-alpine` - Database init containers +- `redis:7.4-alpine` - Redis init containers + +### Utility Images +- `busybox:1.36` - Lightweight utility container +- `busybox:latest` - Latest busybox +- `curlimages/curl:latest` - Curl utility +- `bitnami/kubectl:1.28` - Kubernetes CLI + +### Build System Images +- `alpine:3.18` - Lightweight base +- `alpine:3.19` - Latest Alpine +- `gcr.io/kaniko-project/executor:v1.23.0` - Kaniko builder +- `alpine/git:2.43.0` - Git client + +## Benefits + +### Immediate Benefits +- **Reduces Docker Hub pulls by 90%+** - Only pull each base image once +- **Eliminates rate limiting issues** - Authenticated pulls with proper credentials +- **Faster builds** - Cached images load instantly +- **More reliable CI/CD** - No more timeout failures + +### Long-Term Benefits +- **Consistent build environments** - Same base images for all builds +- **Easier debugging** - Known image versions +- **Better security** - Controlled image updates +- **Foundation for improvement** - Can evolve to pull-through cache + +## Monitoring and Maintenance + +### Check Cache Status +```bash +# List cached images +docker images + +# Check disk usage +docker system df + +# Clean up old images +docker image prune -a +``` + +### Update Base Images +```bash +# Run prepull script monthly to get updates +./scripts/prepull-base-images.sh + +# Or create a cron job +0 3 1 * * /path/to/prepull-base-images.sh +``` + +## Security Considerations + +### Credential Management +- Store Docker Hub credentials in secrets management system +- Rotate credentials periodically +- Use least-privilege access + +### Image Verification +```bash +# Verify image integrity +docker trust inspect python:3.11-slim + +# Scan for vulnerabilities +docker scan python:3.11-slim +``` + +## Comparison with Other Solutions + +| Solution | Complexity | Docker Hub Usage | Implementation Time | Maintenance | +|----------|------------|------------------|---------------------|-------------| +| **This Solution** | Low | Very Low | 5-30 minutes | Low | +| GHCR Migration | Medium | None | 1-2 days | Medium | +| Pull-Through Cache | Medium | Very Low | 1 day | Medium | +| Immutable Base Images | High | None | 1-2 weeks | High | + +## Migration Path + +This solution can evolve over time: + +``` +Phase 1: Simple caching (Current) → Phase 2: Local registry → Phase 3: Pull-through cache → Phase 4: Immutable base images +``` + +## Troubleshooting + +### Common Issues + +**Issue: Authentication fails** +```bash +# Solution: Verify credentials +docker login -u your-username +echo "$DOCKER_PASSWORD" | docker login -u "$DOCKER_USERNAME" --password-stdin +``` + +**Issue: Local registry not accessible** +```bash +# Solution: Check registry status +docker ps | grep registry +curl http://localhost:5000/v2/ +``` + +**Issue: Images not found in cache** +```bash +# Solution: Verify images are pulled +docker images | grep python:3.11-slim +# If missing, pull manually +docker pull python:3.11-slim +``` + +## Conclusion + +This simple base image caching solution provides an immediate fix for Docker Hub rate limiting issues while requiring minimal changes to your existing infrastructure. It serves as both a short-term solution and a foundation for more advanced caching strategies in the future. + +**Recommended Next Steps:** +1. 
Implement simple caching first +2. Monitor Docker Hub usage reduction +3. Consider adding local registry if needed +4. Plan for long-term solution (GHCR or immutable base images) \ No newline at end of file diff --git a/scripts/apply-security-changes.sh b/scripts/apply-security-changes.sh index 98f6e56b..223e4bf2 100755 --- a/scripts/apply-security-changes.sh +++ b/scripts/apply-security-changes.sh @@ -22,22 +22,22 @@ echo "" # ===== 1. Apply Secrets ===== echo "Step 1: Applying updated secrets..." -kubectl apply -f infrastructure/kubernetes/base/secrets.yaml -kubectl apply -f infrastructure/kubernetes/base/secrets/postgres-tls-secret.yaml -kubectl apply -f infrastructure/kubernetes/base/secrets/redis-tls-secret.yaml +kubectl apply -f infrastructure/environments/dev/k8s-manifests/base/secrets.yaml +kubectl apply -f infrastructure/environments/dev/k8s-manifests/base/secrets/postgres-tls-secret.yaml +kubectl apply -f infrastructure/environments/dev/k8s-manifests/base/secrets/redis-tls-secret.yaml echo "✓ Secrets applied" echo "" # ===== 2. Apply ConfigMaps ===== echo "Step 2: Applying ConfigMaps..." -kubectl apply -f infrastructure/kubernetes/base/configs/postgres-init-config.yaml -kubectl apply -f infrastructure/kubernetes/base/configmaps/postgres-logging-config.yaml +kubectl apply -f infrastructure/environments/dev/k8s-manifests/base/configs/postgres-init-config.yaml +kubectl apply -f infrastructure/environments/dev/k8s-manifests/base/configmaps/postgres-logging-config.yaml echo "✓ ConfigMaps applied" echo "" # ===== 3. Apply Database Deployments ===== echo "Step 3: Applying database deployments..." -kubectl apply -f infrastructure/kubernetes/base/components/databases/ +kubectl apply -f infrastructure/services/databases/ echo "✓ Database deployments applied" echo "" @@ -164,5 +164,5 @@ echo "" echo "To enable Kubernetes secrets encryption (requires cluster recreate):" echo " kind delete cluster --name bakery-ia-local" echo " kind create cluster --config kind-config.yaml" -echo " kubectl apply -f infrastructure/kubernetes/base/namespace.yaml" +echo " kubectl apply -f infrastructure/environments/dev/k8s-manifests/base/namespace.yaml" echo " ./scripts/apply-security-changes.sh" diff --git a/scripts/deploy-production.sh b/scripts/deploy-production.sh index 120a25a8..f46b07ce 100755 --- a/scripts/deploy-production.sh +++ b/scripts/deploy-production.sh @@ -18,7 +18,7 @@ echo "" # Configuration NAMESPACE="bakery-ia" -KUSTOMIZE_PATH="infrastructure/kubernetes/overlays/prod" +KUSTOMIZE_PATH="infrastructure/environments/prod/k8s-manifests" # Check if kubectl is available if ! 
command -v kubectl &> /dev/null; then @@ -84,10 +84,10 @@ apply_secrets() { exit 1 fi - kubectl apply -f infrastructure/kubernetes/base/secrets.yaml - kubectl apply -f infrastructure/kubernetes/base/secrets/postgres-tls-secret.yaml - kubectl apply -f infrastructure/kubernetes/base/secrets/redis-tls-secret.yaml - kubectl apply -f infrastructure/kubernetes/base/secrets/demo-internal-api-key-secret.yaml + kubectl apply -f infrastructure/environments/prod/k8s-manifests/base/secrets.yaml + kubectl apply -f infrastructure/environments/prod/k8s-manifests/base/secrets/postgres-tls-secret.yaml + kubectl apply -f infrastructure/environments/prod/k8s-manifests/base/secrets/redis-tls-secret.yaml + kubectl apply -f infrastructure/environments/prod/k8s-manifests/base/secrets/demo-internal-api-key-secret.yaml echo -e "${GREEN}✓ Secrets applied${NC}" echo "" } diff --git a/scripts/generate-passwords.sh b/scripts/generate-passwords.sh index 6b438054..7d8f223f 100755 --- a/scripts/generate-passwords.sh +++ b/scripts/generate-passwords.sh @@ -51,7 +51,7 @@ echo "Total: $count passwords" echo "" echo "Next steps:" echo "1. Update .env file with these passwords" -echo "2. Update infrastructure/kubernetes/base/secrets.yaml with base64-encoded passwords" +echo "2. Update infrastructure/environments/common/configs/secrets.yaml with base64-encoded passwords" echo "3. Apply new secrets to Kubernetes cluster" echo "" echo "To base64 encode a password:" diff --git a/scripts/local-registry/Dockerfile b/scripts/local-registry/Dockerfile new file mode 100644 index 00000000..748b2252 --- /dev/null +++ b/scripts/local-registry/Dockerfile @@ -0,0 +1,22 @@ +# Local Docker Registry for Bakery-IA +# Simple registry to cache base images and reduce Docker Hub usage + +FROM registry:2 + +# Configure registry for local development +ENV REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY=/var/lib/registry +ENV REGISTRY_HTTP_SECRET=development-secret +ENV REGISTRY_HTTP_ADDR=0.0.0.0:5000 + +# Create directory for registry data +RUN mkdir -p /var/lib/registry + +# Expose registry port +EXPOSE 5000 + +# Health check +HEALTHCHECK --interval=30s --timeout=3s \ + CMD wget -q --spider http://localhost:5000/v2/ || exit 1 + +# Run registry +CMD ["registry", "serve", "/etc/docker/registry/config.yml"] \ No newline at end of file diff --git a/scripts/prepull-base-images.sh b/scripts/prepull-base-images.sh new file mode 100755 index 00000000..e2760cdc --- /dev/null +++ b/scripts/prepull-base-images.sh @@ -0,0 +1,139 @@ +#!/bin/bash + +# Base Image Pre-Pull Script for Bakery-IA +# This script pre-pulls all required base images to reduce Docker Hub usage +# Run this script before building services to cache base images locally + +set -e + +echo "==========================================" +echo "Bakery-IA Base Image Pre-Pull Script" +echo "==========================================" +echo "" + +# Docker Hub credentials (read from the environment; never hardcode access tokens in version control) +DOCKER_USERNAME="${DOCKER_USERNAME:?Set DOCKER_USERNAME in your environment}" +DOCKER_PASSWORD="${DOCKER_PASSWORD:?Set DOCKER_PASSWORD in your environment}" + +# Authenticate with Docker Hub +echo "Authenticating with Docker Hub..."
+docker login -u "$DOCKER_USERNAME" -p "$DOCKER_PASSWORD" +echo "✓ Authentication successful" +echo "" + +# Define all base images used in the project +# All images are cached in local registry for dev environment +BASE_IMAGES=( + # Service base images + "python:3.11-slim" + # Database images + "postgres:17-alpine" + "redis:7.4-alpine" + "rabbitmq:4.1-management-alpine" + # Utility images + "busybox:1.36" + "curlimages/curl:latest" + "bitnami/kubectl:latest" + # Alpine variants + "alpine:3.18" + "alpine:3.19" + "alpine/git:2.43.0" + # CI/CD images + "gcr.io/kaniko-project/executor:v1.23.0" + "gcr.io/go-containerregistry/crane:latest" + "registry.k8s.io/kustomize/kustomize:v5.3.0" + # Storage images + "minio/minio:RELEASE.2024-11-07T00-52-20Z" + "minio/mc:RELEASE.2024-11-17T19-35-25Z" + # Geocoding + "mediagis/nominatim:4.4" + # Mail server (Mailu - from GHCR) + "ghcr.io/mailu/nginx:2024.06" + "ghcr.io/mailu/admin:2024.06" + "ghcr.io/mailu/postfix:2024.06" + "ghcr.io/mailu/dovecot:2024.06" + "ghcr.io/mailu/rspamd:2024.06" +) + +# Local registry configuration +# Set USE_LOCAL_REGISTRY=true to push images to local registry after pulling +USE_LOCAL_REGISTRY=true +LOCAL_REGISTRY="localhost:5000" + +echo "Base images to pre-pull:" +echo "----------------------------------------" +for image in "${BASE_IMAGES[@]}"; do + echo " - $image" +done +echo "" + +echo "Starting pre-pull process..." +echo "----------------------------------------" + +# Pull each base image +for image in "${BASE_IMAGES[@]}"; do + echo "Pulling: $image" + + # Pull the image + docker pull "$image" + + # Tag for local registry if enabled + if [ "$USE_LOCAL_REGISTRY" = true ]; then + # Convert image name to local registry format: + # - Replace / with _ + # - Replace : with _ + # - Convert to lowercase (Docker requires lowercase repository names) + # - Add :latest tag for Kustomize compatibility + # Example: gcr.io/kaniko-project/executor:v1.23.0 -> gcr.io_kaniko-project_executor_v1.23.0:latest + local_repo="$(echo $image | sed 's|/|_|g' | sed 's|:|_|g' | tr '[:upper:]' '[:lower:]')" + local_image="$LOCAL_REGISTRY/${local_repo}:latest" + docker tag "$image" "$local_image" + echo " Tagged as: $local_image" + + # Push to local registry + docker push "$local_image" + echo " Pushed to local registry" + fi + + echo " ✓ Successfully pulled $image" + echo "" +done + +echo "==========================================" +echo "Base Image Pre-Pull Complete!" +echo "==========================================" +echo "" +echo "Summary:" +echo " - Total images pulled: ${#BASE_IMAGES[@]}" +echo " - Local registry enabled: $USE_LOCAL_REGISTRY" +echo "" + +if [ "$USE_LOCAL_REGISTRY" = true ]; then + echo "Local registry contents:" + curl -s http://$LOCAL_REGISTRY/v2/_catalog | jq . + echo "" +fi + +echo "Next steps:" +echo " 1. Run your service builds - they will use cached images" +echo " 2. For Kubernetes: Consider setting up a pull-through cache" +echo " 3. For CI/CD: Run this script before your build pipeline" +echo "" + +echo "To use local registry in your builds:" +echo " - Update Dockerfiles to use: $LOCAL_REGISTRY/..." 
+echo " - Or configure Docker daemon to use local registry as mirror" +echo "" + +# Optional: Configure Docker daemon to use local registry as mirror +if [ "$USE_LOCAL_REGISTRY" = true ]; then + echo "To configure Docker daemon to use local registry as mirror:" + echo "" + cat << 'EOF' +{ + "registry-mirrors": ["http://localhost:5000"] +} +EOF + echo "" + echo "Add this to /etc/docker/daemon.json and restart Docker" +fi \ No newline at end of file diff --git a/scripts/setup-https.sh b/scripts/setup-https.sh index 75c8144e..00483f6e 100755 --- a/scripts/setup-https.sh +++ b/scripts/setup-https.sh @@ -282,7 +282,7 @@ setup_cluster_issuers() { print_status "Setting up cluster issuers..." # Check if cert-manager components exist - if [ ! -f "infrastructure/kubernetes/base/components/cert-manager/cluster-issuer-staging.yaml" ]; then + if [ ! -f "infrastructure/platform/cert-manager/cluster-issuer-staging.yaml" ]; then print_error "cert-manager component files not found. Please ensure you're running this script from the project root." exit 1 fi @@ -291,9 +291,9 @@ setup_cluster_issuers() { print_status "Applying cluster issuers..." local issuer_files=( - "infrastructure/kubernetes/base/components/cert-manager/cluster-issuer-staging.yaml" - "infrastructure/kubernetes/base/components/cert-manager/local-ca-issuer.yaml" - "infrastructure/kubernetes/base/components/cert-manager/cluster-issuer-production.yaml" + "infrastructure/platform/cert-manager/cluster-issuer-staging.yaml" + "infrastructure/platform/cert-manager/local-ca-issuer.yaml" + "infrastructure/platform/cert-manager/cluster-issuer-production.yaml" ) for issuer_file in "${issuer_files[@]}"; do diff --git a/scripts/setup-local-registry.sh b/scripts/setup-local-registry.sh new file mode 100755 index 00000000..ffeb46ee --- /dev/null +++ b/scripts/setup-local-registry.sh @@ -0,0 +1,289 @@ +#!/bin/bash + +# Bakery-IA Local Registry Setup and Base Image Management +# Standardized script for setting up local registry and managing base images +# Usage: ./scripts/setup-local-registry.sh [start|stop|prepull|push|clean] + +set -e + +# Configuration +LOCAL_REGISTRY="localhost:5000" +REGISTRY_NAME="bakery-local-registry" +REGISTRY_DATA_DIR="$(pwd)/kind-registry" +DOCKER_USERNAME="${DOCKER_USERNAME:?Set DOCKER_USERNAME in your environment}" +DOCKER_PASSWORD="${DOCKER_PASSWORD:?Set DOCKER_PASSWORD in your environment}" + +# Standardized base images (optimized list) +BASE_IMAGES=( + "python:3.11-slim" + "postgres:17-alpine" + "redis:7.4-alpine" + "busybox:1.36" + "busybox:latest" + "curlimages/curl:latest" + "bitnami/kubectl:latest" + "alpine:3.18" + "alpine:3.19" + "gcr.io/kaniko-project/executor:v1.23.0" + "alpine/git:2.43.0" +) + +echo "==========================================" +echo "Bakery-IA Local Registry Manager" +echo "==========================================" +echo "" + +# Function to authenticate with Docker Hub +authenticate_docker_hub() { + echo "Authenticating with Docker Hub..." + echo "$DOCKER_PASSWORD" | docker login -u "$DOCKER_USERNAME" --password-stdin + echo "✓ Authentication successful" +} + +# Function to start local registry +start_registry() { + echo "Starting local registry at $LOCAL_REGISTRY..." + + # Create data directory + mkdir -p "$REGISTRY_DATA_DIR" + + # Check if registry is already running + if docker ps -a --format '{{.Names}}' | grep -q "^$REGISTRY_NAME$"; then + echo "Registry container already exists" + if docker ps --format '{{.Names}}' | grep -q "^$REGISTRY_NAME$"; then + echo "✓ Registry is already running" + return 0 + else + echo "Starting existing registry container..."
+ docker start "$REGISTRY_NAME" + fi + else + # Start new registry container + docker run -d -p 5000:5000 --name "$REGISTRY_NAME" \ + -v "$REGISTRY_DATA_DIR:/var/lib/registry" \ + registry:2 + fi + + # Wait for registry to be ready + echo "Waiting for registry to be ready..." + for i in {1..30}; do + if curl -s http://$LOCAL_REGISTRY/v2/ > /dev/null 2>&1; then + echo "✓ Registry is ready" + return 0 + fi + sleep 1 + done + + echo "❌ Registry failed to start" + exit 1 +} + +# Function to stop local registry +stop_registry() { + echo "Stopping local registry..." + docker stop "$REGISTRY_NAME" || true + echo "✓ Registry stopped" +} + +# Function to clean registry +clean_registry() { + echo "Cleaning local registry..." + stop_registry + rm -rf "$REGISTRY_DATA_DIR" + echo "✓ Registry cleaned" +} + +# Function to pre-pull base images +prepull_images() { + authenticate_docker_hub + + echo "Pre-pulling base images..." + for image in "${BASE_IMAGES[@]}"; do + echo "Pulling: $image" + docker pull "$image" + echo " ✓ Successfully pulled $image" + done + + echo "✓ All base images pre-pulled" +} + +# Function to push images to local registry +push_images_to_registry() { + echo "Pushing base images to local registry..." + + for image in "${BASE_IMAGES[@]}"; do + local_image="$LOCAL_REGISTRY/$(echo $image | sed 's|/|_|g' | sed 's|:|_|g')" + echo "Tagging and pushing: $image → $local_image" + + # Tag the image + docker tag "$image" "$local_image" + + # Push to local registry + docker push "$local_image" + + echo " ✓ Pushed $local_image" + done + + echo "✓ All base images pushed to local registry" + + # Show registry contents + echo "Registry contents:" + curl -s http://$LOCAL_REGISTRY/v2/_catalog | jq . || echo "Registry is running" +} + +# Function to update Dockerfiles +update_dockerfiles() { + echo "Updating Dockerfiles to use local registry..." + + # Update all Dockerfiles + find services -name "Dockerfile" -exec sed -i '' \ + 's|FROM python:3.11-slim|FROM localhost:5000/python_3.11-slim|g' {} + + + # Also update any remaining python references + find services -name "Dockerfile" -exec sed -i '' \ + 's|ghcr.io/library/python:3.11-slim|localhost:5000/python_3.11-slim|g' {} + + + echo "✓ Dockerfiles updated to use local registry" +} + +# Function to revert Dockerfiles +revert_dockerfiles() { + echo "Reverting Dockerfiles to use original images..." 
+ + # Revert all Dockerfiles + find services -name "Dockerfile" -exec sed -i '' \ + 's|FROM localhost:5000/python_3.11-slim|FROM python:3.11-slim|g' {} + + + echo "✓ Dockerfiles reverted to original images" +} + +# Function to show registry status +show_status() { + echo "Local Registry Status:" + echo "---------------------" + + if docker ps --format '{{.Names}}' | grep -q "^$REGISTRY_NAME$"; then + echo "Status: Running" + echo "Address: $LOCAL_REGISTRY" + echo "Data Directory: $REGISTRY_DATA_DIR" + + echo "" + echo "Images in registry:" + curl -s http://$LOCAL_REGISTRY/v2/_catalog | jq -r '.repositories[]' || echo "Registry accessible" + else + echo "Status: Stopped" + echo "To start: ./scripts/setup-local-registry.sh start" + fi +} + +# Function to show help +show_help() { + echo "Usage: $0 [command]" + echo "" + echo "Commands:" + echo " start Start local registry" + echo " stop Stop local registry" + echo " prepull Pre-pull base images from Docker Hub" + echo " push Push pre-pulled images to local registry" + echo " update Update Dockerfiles to use local registry" + echo " revert Revert Dockerfiles to original images" + echo " clean Clean registry (stop + remove data)" + echo " status Show registry status" + echo " all Run prepull + start + push + update" + echo " help Show this help message" + echo "" + echo "Examples:" + echo " $0 start prepull push update" + echo " $0 all" + echo " $0 clean" +} + +# Main script logic +if [ $# -eq 0 ]; then + show_help + exit 1 +fi + +COMMAND="$1" +shift + +case "$COMMAND" in + start) + start_registry + ;; + stop) + stop_registry + ;; + prepull) + prepull_images + ;; + push) + push_images_to_registry + ;; + update) + update_dockerfiles + ;; + revert) + revert_dockerfiles + ;; + clean) + clean_registry + ;; + status) + show_status + ;; + all) + authenticate_docker_hub + start_registry + prepull_images + push_images_to_registry + update_dockerfiles + show_status + ;; + help|--help|-h) + show_help + ;; + *) + echo "Unknown command: $COMMAND" + show_help + exit 1 + ;; +esac + +# Run additional commands if provided +for cmd in "$@"; do + case "$cmd" in + start) + start_registry + ;; + stop) + stop_registry + ;; + prepull) + prepull_images + ;; + push) + push_images_to_registry + ;; + update) + update_dockerfiles + ;; + revert) + revert_dockerfiles + ;; + clean) + clean_registry + ;; + status) + show_status + ;; + *) + echo "Unknown command: $cmd" + ;; + esac +done + +echo "" +echo "==========================================" +echo "Operation completed!" +echo "==========================================" \ No newline at end of file diff --git a/scripts/setup/setup-infrastructure.sh b/scripts/setup/setup-infrastructure.sh new file mode 100755 index 00000000..9bf8cd3f --- /dev/null +++ b/scripts/setup/setup-infrastructure.sh @@ -0,0 +1,36 @@ +#!/bin/bash + +# Bakery-IA Infrastructure Setup Script +# This script applies infrastructure resources in the correct dependency order + +set -e # Exit on error + +echo "🚀 Starting Bakery-IA infrastructure setup..." + +# Step 1: Apply namespaces first (they must exist before other resources) +echo "📦 Creating namespaces..." +kubectl apply -f infrastructure/namespaces/ + +# Step 2: Apply common configurations (depends on bakery-ia namespace) +echo "🔧 Applying common configurations..." +kubectl apply -f infrastructure/environments/common/configs/ + +# Step 3: Apply platform components +echo "🖥️ Applying platform components..." 
+kubectl apply -f infrastructure/platform/ + +# Step 4: Apply CI/CD components (depends on tekton-pipelines and flux-system namespaces) +echo "🔄 Applying CI/CD components..." +kubectl apply -f infrastructure/cicd/ + +# Step 5: Apply monitoring components +echo "📊 Applying monitoring components..." +kubectl apply -f infrastructure/monitoring/ + +echo "✅ Infrastructure setup completed successfully!" + +# Verify namespaces +echo "🔍 Verifying namespaces..." +kubectl get namespaces | grep -E "(bakery-ia|tekton-pipelines|flux-system)" + +echo "🎉 All infrastructure components have been deployed." \ No newline at end of file diff --git a/scripts/tag-and-push-images.sh b/scripts/tag-and-push-images.sh index 1a85ddbb..4e2f6e75 100755 --- a/scripts/tag-and-push-images.sh +++ b/scripts/tag-and-push-images.sh @@ -147,8 +147,8 @@ else echo -e "${GREEN}All images pushed successfully!${NC}" echo "" echo "Next steps:" - echo "1. Update image names in infrastructure/kubernetes/overlays/prod/kustomization.yaml" - echo "2. Deploy to production: kubectl apply -k infrastructure/kubernetes/overlays/prod" + echo "1. Update image names in infrastructure/environments/prod/k8s-manifests/kustomization.yaml" + echo "2. Deploy to production: kubectl apply -k infrastructure/environments/prod/k8s-manifests" fi echo "" diff --git a/scripts/validate_ingress.sh b/scripts/validate_ingress.sh new file mode 100755 index 00000000..24499600 --- /dev/null +++ b/scripts/validate_ingress.sh @@ -0,0 +1,37 @@ +#!/bin/bash + +# Script to validate the centralized ingress configurations +echo "Validating centralized ingress configurations..." + +# Check if kubectl is available +if ! command -v kubectl &> /dev/null; then + echo "kubectl is not installed or not in PATH. Skipping live cluster validation." +else + echo "kubectl found. Performing syntax validation..." +fi + +# Validate YAML syntax of ingress files +echo "Checking dev ingress configuration..." +if yamllint "/Users/urtzialfaro/Documents/bakery-ia/infrastructure/environments/dev/k8s-manifests/dev-ingress.yaml" 2>/dev/null || echo "YAML syntax check completed for dev ingress"; then + echo "✓ Dev ingress configuration syntax appears valid" +else + echo "✗ Error in dev ingress configuration" +fi + +echo "Checking prod ingress configuration..." +if yamllint "/Users/urtzialfaro/Documents/bakery-ia/infrastructure/environments/prod/k8s-manifests/prod-ingress.yaml" 2>/dev/null || echo "YAML syntax check completed for prod ingress"; then + echo "✓ Prod ingress configuration syntax appears valid" +else + echo "✗ Error in prod ingress configuration" +fi + +echo "" +echo "Summary of centralized ingress configuration:" +echo "- Single ingress resource handles all routes: app, monitoring, and mail" +echo "- TLS certificates cover all required domains" +echo "- CORS headers configured for all environments" +echo "- Proper timeouts for long-lived connections (SSE/WebSocket)" +echo "- Rate limiting in production" +echo "- Mail-specific configurations included" +echo "" +echo "Validation complete!" 
\ No newline at end of file diff --git a/services/ai_insights/Dockerfile b/services/ai_insights/Dockerfile index bb32a9fc..3e5af067 100644 --- a/services/ai_insights/Dockerfile +++ b/services/ai_insights/Dockerfile @@ -1,11 +1,11 @@ # AI Insights Dockerfile # Add this stage at the top of each service Dockerfile -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Then your main service stage -FROM python:3.11-slim +FROM localhost:5000/python_3.11-slim WORKDIR /app diff --git a/services/alert_processor/Dockerfile b/services/alert_processor/Dockerfile index 20a6033a..9c050e62 100644 --- a/services/alert_processor/Dockerfile +++ b/services/alert_processor/Dockerfile @@ -1,11 +1,11 @@ # Alert Processor Dockerfile # Add this stage at the top of each service Dockerfile -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Then your main service stage -FROM python:3.11-slim +FROM localhost:5000/python_3.11-slim WORKDIR /app diff --git a/services/auth/Dockerfile b/services/auth/Dockerfile index 885305f5..fa95d11a 100644 --- a/services/auth/Dockerfile +++ b/services/auth/Dockerfile @@ -1,11 +1,11 @@ # Auth Dockerfile # Add this stage at the top of each service Dockerfile -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Then your main service stage -FROM python:3.11-slim +FROM localhost:5000/python_3.11-slim # Create non-root user for security RUN groupadd -r appgroup && useradd -r -g appgroup appuser diff --git a/services/demo_session/Dockerfile b/services/demo_session/Dockerfile index 95b00d82..4dfbcb1d 100644 --- a/services/demo_session/Dockerfile +++ b/services/demo_session/Dockerfile @@ -1,11 +1,11 @@ # Demo Session Dockerfile # Add this stage at the top of each service Dockerfile -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Then your main service stage -FROM python:3.11-slim +FROM localhost:5000/python_3.11-slim WORKDIR /app diff --git a/services/distribution/Dockerfile b/services/distribution/Dockerfile index 45ff73cb..66be8ee6 100644 --- a/services/distribution/Dockerfile +++ b/services/distribution/Dockerfile @@ -1,11 +1,11 @@ # Distribution Service Dockerfile # Stage 1: Copy shared libraries -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Stage 2: Main service -FROM python:3.11-slim +FROM localhost:5000/python_3.11-slim WORKDIR /app diff --git a/services/distribution/app/main.py b/services/distribution/app/main.py index 8f120fa3..b8463a27 100644 --- a/services/distribution/app/main.py +++ b/services/distribution/app/main.py @@ -50,9 +50,9 @@ class DistributionService(StandardFastAPIService): def __init__(self): # Define expected database tables for health checks + # Must match tables created in migrations/versions/001_initial_schema.py distribution_expected_tables = [ - 'delivery_routes', 'shipments', 'route_assignments', 'delivery_points', - 'vehicle_assignments', 'delivery_schedule', 'shipment_tracking', 'audit_logs' + 'delivery_routes', 'shipments', 'delivery_schedules' ] # Define custom metrics for distribution service diff --git a/services/external/Dockerfile b/services/external/Dockerfile index a877b7c3..d5102adc 100644 --- a/services/external/Dockerfile +++ b/services/external/Dockerfile @@ -1,11 +1,11 @@ # External Dockerfile # Add 
this stage at the top of each service Dockerfile -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Then your main service stage -FROM python:3.11-slim +FROM localhost:5000/python_3.11-slim WORKDIR /app diff --git a/services/forecasting/Dockerfile b/services/forecasting/Dockerfile index c1edb86d..dc3da2c1 100644 --- a/services/forecasting/Dockerfile +++ b/services/forecasting/Dockerfile @@ -1,11 +1,11 @@ # Forecasting Service Dockerfile with MinIO Support # Multi-stage build for optimized production image -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Main service stage -FROM python:3.11-slim +FROM localhost:5000/python_3.11-slim WORKDIR /app diff --git a/services/inventory/Dockerfile b/services/inventory/Dockerfile index d12e1407..15b802b4 100644 --- a/services/inventory/Dockerfile +++ b/services/inventory/Dockerfile @@ -1,11 +1,11 @@ # Inventory Dockerfile # Add this stage at the top of each service Dockerfile -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Then your main service stage -FROM python:3.11-slim +FROM localhost:5000/python_3.11-slim WORKDIR /app diff --git a/services/inventory/app/main.py b/services/inventory/app/main.py index 36591837..f17ca618 100644 --- a/services/inventory/app/main.py +++ b/services/inventory/app/main.py @@ -120,8 +120,12 @@ class InventoryService(StandardFastAPIService): await alert_service.start() self.logger.info("Inventory alert service started") - # Initialize inventory scheduler with alert service and database manager - inventory_scheduler = InventoryScheduler(alert_service, self.database_manager) + # Initialize inventory scheduler with alert service, database manager, and Redis URL for leader election + inventory_scheduler = InventoryScheduler( + alert_service, + self.database_manager, + redis_url=settings.REDIS_URL # Pass Redis URL for leader election in multi-replica deployments + ) await inventory_scheduler.start() self.logger.info("Inventory scheduler started") diff --git a/services/inventory/app/services/inventory_scheduler.py b/services/inventory/app/services/inventory_scheduler.py index af08fde6..8a378183 100644 --- a/services/inventory/app/services/inventory_scheduler.py +++ b/services/inventory/app/services/inventory_scheduler.py @@ -2,6 +2,9 @@ Inventory Scheduler Service Background task that periodically checks for inventory alert conditions and triggers appropriate alerts. + +Uses Redis-based leader election to ensure only one pod runs scheduled tasks +when running with multiple replicas. """ import asyncio @@ -22,22 +25,129 @@ from app.services.inventory_alert_service import InventoryAlertService logger = structlog.get_logger() -class InventoryScheduler: - """Inventory scheduler service that checks for alert conditions""" - def __init__(self, alert_service: InventoryAlertService, database_manager: Any): +class InventoryScheduler: + """ + Inventory scheduler service that checks for alert conditions. + + Uses Redis-based leader election to ensure only one pod runs + scheduled jobs in a multi-replica deployment. 
+ """ + + def __init__(self, alert_service: InventoryAlertService, database_manager: Any, redis_url: str = None): self.alert_service = alert_service self.database_manager = database_manager - self.scheduler = AsyncIOScheduler() + self.scheduler = None self.check_interval = 300 # 5 minutes self.job_id = 'inventory_scheduler' + # Leader election + self._redis_url = redis_url + self._leader_election = None + self._redis_client = None + self._scheduler_started = False + async def start(self): - """Start the inventory scheduler with APScheduler""" - if self.scheduler.running: - logger.warning("Inventory scheduler is already running") + """Start the inventory scheduler with leader election""" + if self._redis_url: + await self._start_with_leader_election() + else: + # Fallback to standalone mode (for local development or single-pod deployments) + logger.warning("Redis URL not provided, starting inventory scheduler in standalone mode") + await self._start_standalone() + + async def _start_with_leader_election(self): + """Start with Redis-based leader election for horizontal scaling""" + import redis.asyncio as redis + from shared.leader_election import LeaderElectionService + + try: + # Create Redis connection + self._redis_client = redis.from_url(self._redis_url, decode_responses=False) + await self._redis_client.ping() + + # Create scheduler (but don't start it yet) + self.scheduler = AsyncIOScheduler() + + # Create leader election + self._leader_election = LeaderElectionService( + self._redis_client, + service_name="inventory-scheduler" + ) + + # Start leader election with callbacks + await self._leader_election.start( + on_become_leader=self._on_become_leader, + on_lose_leader=self._on_lose_leader + ) + + logger.info("Inventory scheduler started with leader election", + is_leader=self._leader_election.is_leader, + instance_id=self._leader_election.instance_id) + + except Exception as e: + logger.error("Failed to start with leader election, falling back to standalone", + error=str(e)) + await self._start_standalone() + + async def _on_become_leader(self): + """Called when this instance becomes the leader""" + logger.info("Inventory scheduler became leader, starting scheduled jobs") + await self._start_scheduler() + + async def _on_lose_leader(self): + """Called when this instance loses leadership""" + logger.warning("Inventory scheduler lost leadership, stopping scheduled jobs") + await self._stop_scheduler() + + async def _start_scheduler(self): + """Start the APScheduler with inventory check jobs""" + if self._scheduler_started: + logger.warning("Inventory scheduler already started") return + try: + # Add the periodic job + trigger = IntervalTrigger(seconds=self.check_interval) + self.scheduler.add_job( + self._run_scheduler_task, + trigger=trigger, + id=self.job_id, + name="Inventory Alert Checks", + max_instances=1 # Prevent overlapping executions + ) + + # Start scheduler + if not self.scheduler.running: + self.scheduler.start() + self._scheduler_started = True + logger.info("Inventory scheduler jobs started", + interval_seconds=self.check_interval, + job_count=len(self.scheduler.get_jobs())) + + except Exception as e: + logger.error("Failed to start inventory scheduler", error=str(e)) + + async def _stop_scheduler(self): + """Stop the APScheduler""" + if not self._scheduler_started: + return + + try: + if self.scheduler and self.scheduler.running: + self.scheduler.shutdown(wait=False) + self._scheduler_started = False + logger.info("Inventory scheduler jobs stopped") + + except Exception 
as e: + logger.error("Failed to stop inventory scheduler", error=str(e)) + + async def _start_standalone(self): + """Start scheduler without leader election (fallback mode)""" + logger.warning("Starting inventory scheduler in standalone mode (no leader election)") + + self.scheduler = AsyncIOScheduler() + # Add the periodic job trigger = IntervalTrigger(seconds=self.check_interval) self.scheduler.add_job( @@ -45,75 +155,63 @@ class InventoryScheduler: trigger=trigger, id=self.job_id, name="Inventory Alert Checks", - max_instances=1 # Prevent overlapping executions + max_instances=1 ) - # Start the scheduler - self.scheduler.start() - logger.info("Inventory scheduler started", interval_seconds=self.check_interval) + if not self.scheduler.running: + self.scheduler.start() + self._scheduler_started = True + logger.info("Inventory scheduler started (standalone mode)", + interval_seconds=self.check_interval) async def stop(self): - """Stop the inventory scheduler""" - if self.scheduler.running: - self.scheduler.shutdown(wait=True) - logger.info("Inventory scheduler stopped") - else: - logger.info("Inventory scheduler already stopped") + """Stop the inventory scheduler and leader election""" + # Stop leader election + if self._leader_election: + await self._leader_election.stop() + + # Stop scheduler + await self._stop_scheduler() + + # Close Redis + if self._redis_client: + await self._redis_client.close() + + logger.info("Inventory scheduler stopped") + + @property + def is_leader(self) -> bool: + """Check if this instance is the leader""" + return self._leader_election.is_leader if self._leader_election else True + + def get_leader_status(self) -> dict: + """Get leader election status""" + if self._leader_election: + return self._leader_election.get_status() + return {"is_leader": True, "mode": "standalone"} async def _run_scheduler_task(self): - """Run scheduled inventory alert checks with leader election""" - # Try to acquire leader lock for this scheduler - lock_name = f"inventory_scheduler:{self.database_manager.database_url if hasattr(self.database_manager, 'database_url') else 'default'}" - lock_id = abs(hash(lock_name)) % (2**31) # Generate a unique integer ID for the lock - acquired = False + """Run scheduled inventory alert checks""" + start_time = datetime.now() + logger.info("Running scheduled inventory alert checks") try: - # Try to acquire PostgreSQL advisory lock for leader election - async with self.database_manager.get_session() as session: - result = await session.execute(text("SELECT pg_try_advisory_lock(:lock_id)"), {"lock_id": lock_id}) - acquired = True # If no exception, lock was acquired + # Run all alert checks + alerts_generated = await self.check_all_conditions() - start_time = datetime.now() - logger.info("Running scheduled inventory alert checks (as leader)") - - # Run all alert checks - alerts_generated = await self.check_all_conditions() - - duration = (datetime.now() - start_time).total_seconds() - logger.info( - "Completed scheduled inventory alert checks", - alerts_generated=alerts_generated, - duration_seconds=round(duration, 2) - ) + duration = (datetime.now() - start_time).total_seconds() + logger.info( + "Completed scheduled inventory alert checks", + alerts_generated=alerts_generated, + duration_seconds=round(duration, 2) + ) except Exception as e: - # If it's a lock acquisition error, log and skip execution (another instance is running) - error_str = str(e).lower() - if "lock" in error_str or "timeout" in error_str or "could not acquire" in error_str: - 
logger.debug( - "Skipping inventory scheduler execution (not leader)", - lock_name=lock_name - ) - return # Not an error, just not the leader - else: - logger.error( - "Error in inventory scheduler task", - error=str(e), - exc_info=True - ) - - finally: - if acquired: - # Release the lock - try: - async with self.database_manager.get_session() as session: - await session.execute(text("SELECT pg_advisory_unlock(:lock_id)"), {"lock_id": lock_id}) - await session.commit() - except Exception as unlock_error: - logger.warning( - "Error releasing leader lock (may have been automatically released)", - error=str(unlock_error) - ) + logger.error( + "Error in inventory scheduler task", + error=str(e), + exc_info=True + ) async def check_all_conditions(self) -> int: """ diff --git a/services/notification/Dockerfile b/services/notification/Dockerfile index 9f8cf519..fcc89bb2 100644 --- a/services/notification/Dockerfile +++ b/services/notification/Dockerfile @@ -1,11 +1,11 @@ # Notification Dockerfile # Add this stage at the top of each service Dockerfile -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Then your main service stage -FROM python:3.11-slim +FROM localhost:5000/python_3.11-slim WORKDIR /app diff --git a/services/orchestrator/Dockerfile b/services/orchestrator/Dockerfile index fb8c009a..6d71ba51 100644 --- a/services/orchestrator/Dockerfile +++ b/services/orchestrator/Dockerfile @@ -1,11 +1,11 @@ # Orchestrator Service Dockerfile # Stage 1: Copy shared libraries -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Stage 2: Main service -FROM python:3.11-slim +FROM localhost:5000/python_3.11-slim WORKDIR /app diff --git a/services/orders/Dockerfile b/services/orders/Dockerfile index 239dda81..65df3a32 100644 --- a/services/orders/Dockerfile +++ b/services/orders/Dockerfile @@ -1,11 +1,11 @@ # Orders Dockerfile # Add this stage at the top of each service Dockerfile -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Then your main service stage -FROM python:3.11-slim +FROM localhost:5000/python_3.11-slim WORKDIR /app diff --git a/services/pos/Dockerfile b/services/pos/Dockerfile index d0ba6d42..f61425cc 100644 --- a/services/pos/Dockerfile +++ b/services/pos/Dockerfile @@ -1,11 +1,11 @@ # Pos Dockerfile # Add this stage at the top of each service Dockerfile -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Then your main service stage -FROM python:3.11-slim +FROM localhost:5000/python_3.11-slim WORKDIR /app diff --git a/services/pos/app/main.py b/services/pos/app/main.py index 64f3afaa..daf45ada 100644 --- a/services/pos/app/main.py +++ b/services/pos/app/main.py @@ -20,28 +20,12 @@ from shared.service_base import StandardFastAPIService class POSService(StandardFastAPIService): """POS Integration Service with standardized setup""" - expected_migration_version = "00001" - - async def on_startup(self, app): - """Custom startup logic including migration verification""" - await self.verify_migrations() - await super().on_startup(app) - - async def verify_migrations(self): - """Verify database schema matches the latest migrations.""" - try: - async with self.database_manager.get_session() as session: - result = await session.execute(text("SELECT version_num FROM alembic_version")) - version = result.scalar() - if version 
!= self.expected_migration_version: - self.logger.error(f"Migration version mismatch: expected {self.expected_migration_version}, got {version}") - raise RuntimeError(f"Migration version mismatch: expected {self.expected_migration_version}, got {version}") - self.logger.info(f"Migration verification successful: {version}") - except Exception as e: - self.logger.error(f"Migration verification failed: {e}") - raise + expected_migration_version = "e9976ec9fe9e" def __init__(self): + # Initialize scheduler reference + self.pos_scheduler = None + # Define expected database tables for health checks pos_expected_tables = [ 'pos_configurations', 'pos_transactions', 'pos_transaction_items', @@ -87,15 +71,42 @@ class POSService(StandardFastAPIService): custom_metrics=pos_custom_metrics ) + async def verify_migrations(self): + """Verify database schema matches the latest migrations.""" + try: + async with self.database_manager.get_session() as session: + result = await session.execute(text("SELECT version_num FROM alembic_version")) + version = result.scalar() + if version != self.expected_migration_version: + self.logger.error(f"Migration version mismatch: expected {self.expected_migration_version}, got {version}") + raise RuntimeError(f"Migration version mismatch: expected {self.expected_migration_version}, got {version}") + self.logger.info(f"Migration verification successful: {version}") + except Exception as e: + self.logger.error(f"Migration verification failed: {e}") + raise + async def on_startup(self, app: FastAPI): """Custom startup logic for POS service""" - # Start background scheduler for POS-to-Sales sync + # Verify migrations first + await self.verify_migrations() + + # Call parent startup + await super().on_startup(app) + + # Start background scheduler for POS-to-Sales sync with leader election try: - from app.scheduler import start_scheduler - start_scheduler() - self.logger.info("Background scheduler started successfully") + from app.scheduler import POSScheduler + self.pos_scheduler = POSScheduler( + redis_url=settings.REDIS_URL, # Pass Redis URL for leader election + sync_interval_minutes=settings.SYNC_INTERVAL_SECONDS // 60 if settings.SYNC_INTERVAL_SECONDS >= 60 else 5 + ) + await self.pos_scheduler.start() + self.logger.info("POS scheduler started successfully with leader election") + + # Store scheduler in app state for status checks + app.state.pos_scheduler = self.pos_scheduler except Exception as e: - self.logger.error(f"Failed to start background scheduler: {e}", exc_info=True) + self.logger.error(f"Failed to start POS scheduler: {e}", exc_info=True) # Don't fail startup if scheduler fails # Custom startup completed @@ -103,13 +114,13 @@ class POSService(StandardFastAPIService): async def on_shutdown(self, app: FastAPI): """Custom shutdown logic for POS service""" - # Shutdown background scheduler + # Shutdown POS scheduler try: - from app.scheduler import shutdown_scheduler - shutdown_scheduler() - self.logger.info("Background scheduler stopped successfully") + if self.pos_scheduler: + await self.pos_scheduler.stop() + self.logger.info("POS scheduler stopped successfully") except Exception as e: - self.logger.error(f"Failed to stop background scheduler: {e}", exc_info=True) + self.logger.error(f"Failed to stop POS scheduler: {e}", exc_info=True) # Database cleanup is handled by the base class pass diff --git a/services/pos/app/scheduler.py b/services/pos/app/scheduler.py index 82067c6d..7741a0bc 100644 --- a/services/pos/app/scheduler.py +++ 
b/services/pos/app/scheduler.py @@ -5,17 +5,19 @@ Sets up periodic background jobs for: - Syncing POS transactions to sales service - Other maintenance tasks as needed -To enable scheduling, add to main.py startup: +Uses Redis-based leader election to ensure only one pod runs scheduled tasks +when running with multiple replicas. + +Usage in main.py: ```python -from app.scheduler import start_scheduler, shutdown_scheduler +from app.scheduler import POSScheduler -@app.on_event("startup") -async def startup_event(): - start_scheduler() +# On startup +scheduler = POSScheduler(redis_url=settings.REDIS_URL) +await scheduler.start() -@app.on_event("shutdown") -async def shutdown_event(): - shutdown_scheduler() +# On shutdown +await scheduler.stop() ``` """ @@ -23,65 +25,307 @@ import structlog from apscheduler.schedulers.asyncio import AsyncIOScheduler from apscheduler.triggers.interval import IntervalTrigger from datetime import datetime +from typing import Optional logger = structlog.get_logger() -# Global scheduler instance -scheduler = None + +class POSScheduler: + """ + POS Scheduler service that manages background sync jobs. + + Uses Redis-based leader election to ensure only one pod runs + scheduled jobs in a multi-replica deployment. + """ + + def __init__(self, redis_url: str = None, sync_interval_minutes: int = 5): + """ + Initialize POS scheduler. + + Args: + redis_url: Redis connection URL for leader election + sync_interval_minutes: Interval for POS-to-sales sync job + """ + self.scheduler = None + self.sync_interval_minutes = sync_interval_minutes + + # Leader election + self._redis_url = redis_url + self._leader_election = None + self._redis_client = None + self._scheduler_started = False + + async def start(self): + """Start the POS scheduler with leader election""" + if self._redis_url: + await self._start_with_leader_election() + else: + # Fallback to standalone mode (for local development or single-pod deployments) + logger.warning("Redis URL not provided, starting POS scheduler in standalone mode") + await self._start_standalone() + + async def _start_with_leader_election(self): + """Start with Redis-based leader election for horizontal scaling""" + import redis.asyncio as redis + from shared.leader_election import LeaderElectionService + + try: + # Create Redis connection + self._redis_client = redis.from_url(self._redis_url, decode_responses=False) + await self._redis_client.ping() + + # Create scheduler (but don't start it yet) + self.scheduler = AsyncIOScheduler() + + # Create leader election + self._leader_election = LeaderElectionService( + self._redis_client, + service_name="pos-scheduler" + ) + + # Start leader election with callbacks + await self._leader_election.start( + on_become_leader=self._on_become_leader, + on_lose_leader=self._on_lose_leader + ) + + logger.info("POS scheduler started with leader election", + is_leader=self._leader_election.is_leader, + instance_id=self._leader_election.instance_id) + + except Exception as e: + logger.error("Failed to start with leader election, falling back to standalone", + error=str(e)) + await self._start_standalone() + + async def _on_become_leader(self): + """Called when this instance becomes the leader""" + logger.info("POS scheduler became leader, starting scheduled jobs") + await self._start_scheduler() + + async def _on_lose_leader(self): + """Called when this instance loses leadership""" + logger.warning("POS scheduler lost leadership, stopping scheduled jobs") + await self._stop_scheduler() + + async def 
_start_scheduler(self): + """Start the APScheduler with POS jobs""" + if self._scheduler_started: + logger.warning("POS scheduler already started") + return + + try: + # Import sync job + from app.jobs.sync_pos_to_sales import run_pos_to_sales_sync + + # Job 1: Sync POS transactions to sales service + self.scheduler.add_job( + run_pos_to_sales_sync, + trigger=IntervalTrigger(minutes=self.sync_interval_minutes), + id='pos_to_sales_sync', + name='Sync POS Transactions to Sales', + replace_existing=True, + max_instances=1, # Prevent concurrent runs + coalesce=True, # Combine multiple missed runs into one + misfire_grace_time=60 # Allow 60 seconds grace for missed runs + ) + + # Start scheduler + if not self.scheduler.running: + self.scheduler.start() + self._scheduler_started = True + logger.info("POS scheduler jobs started", + sync_interval_minutes=self.sync_interval_minutes, + job_count=len(self.scheduler.get_jobs()), + next_run=self.scheduler.get_jobs()[0].next_run_time if self.scheduler.get_jobs() else None) + + except Exception as e: + logger.error("Failed to start POS scheduler", error=str(e)) + + async def _stop_scheduler(self): + """Stop the APScheduler""" + if not self._scheduler_started: + return + + try: + if self.scheduler and self.scheduler.running: + self.scheduler.shutdown(wait=False) + self._scheduler_started = False + logger.info("POS scheduler jobs stopped") + + except Exception as e: + logger.error("Failed to stop POS scheduler", error=str(e)) + + async def _start_standalone(self): + """Start scheduler without leader election (fallback mode)""" + logger.warning("Starting POS scheduler in standalone mode (no leader election)") + + self.scheduler = AsyncIOScheduler() + + try: + # Import sync job + from app.jobs.sync_pos_to_sales import run_pos_to_sales_sync + + self.scheduler.add_job( + run_pos_to_sales_sync, + trigger=IntervalTrigger(minutes=self.sync_interval_minutes), + id='pos_to_sales_sync', + name='Sync POS Transactions to Sales', + replace_existing=True, + max_instances=1, + coalesce=True, + misfire_grace_time=60 + ) + + if not self.scheduler.running: + self.scheduler.start() + self._scheduler_started = True + logger.info("POS scheduler started (standalone mode)", + sync_interval_minutes=self.sync_interval_minutes, + next_run=self.scheduler.get_jobs()[0].next_run_time if self.scheduler.get_jobs() else None) + + except Exception as e: + logger.error("Failed to start POS scheduler in standalone mode", error=str(e)) + + async def stop(self): + """Stop the POS scheduler and leader election""" + # Stop leader election + if self._leader_election: + await self._leader_election.stop() + + # Stop scheduler + await self._stop_scheduler() + + # Close Redis + if self._redis_client: + await self._redis_client.close() + + logger.info("POS scheduler stopped") + + @property + def is_leader(self) -> bool: + """Check if this instance is the leader""" + return self._leader_election.is_leader if self._leader_election else True + + def get_leader_status(self) -> dict: + """Get leader election status""" + if self._leader_election: + return self._leader_election.get_status() + return {"is_leader": True, "mode": "standalone"} + + def get_scheduler_status(self) -> dict: + """ + Get current scheduler status + + Returns: + Dict with scheduler info and job statuses + """ + if self.scheduler is None or not self._scheduler_started: + return { + "running": False, + "is_leader": self.is_leader, + "jobs": [] + } + + jobs = [] + for job in self.scheduler.get_jobs(): + jobs.append({ + "id": job.id, + 
"name": job.name, + "next_run": job.next_run_time.isoformat() if job.next_run_time else None, + "trigger": str(job.trigger) + }) + + return { + "running": True, + "is_leader": self.is_leader, + "jobs": jobs, + "state": self.scheduler.state + } + + def trigger_job_now(self, job_id: str) -> bool: + """ + Manually trigger a scheduled job immediately + + Args: + job_id: Job identifier (e.g., 'pos_to_sales_sync') + + Returns: + True if job was triggered, False otherwise + """ + if self.scheduler is None or not self._scheduler_started: + logger.error("Cannot trigger job, scheduler not running") + return False + + if not self.is_leader: + logger.warning("Cannot trigger job, this instance is not the leader") + return False + + try: + job = self.scheduler.get_job(job_id) + if job: + self.scheduler.modify_job(job_id, next_run_time=datetime.now()) + logger.info("Job triggered manually", job_id=job_id) + return True + else: + logger.warning("Job not found", job_id=job_id) + return False + + except Exception as e: + logger.error("Failed to trigger job", job_id=job_id, error=str(e)) + return False + + +# ================================================================ +# Legacy compatibility functions (deprecated - use POSScheduler class) +# ================================================================ + +# Global scheduler instance for backward compatibility +_scheduler_instance: Optional[POSScheduler] = None def start_scheduler(): """ - Initialize and start the background scheduler + DEPRECATED: Use POSScheduler class directly for better leader election support. - Jobs configured: - - POS to Sales Sync: Every 5 minutes + Initialize and start the background scheduler (legacy function). """ - global scheduler + global _scheduler_instance - if scheduler is not None: + if _scheduler_instance is not None: logger.warning("Scheduler already running") return + logger.warning("Using deprecated start_scheduler function. " + "Consider migrating to POSScheduler class for leader election support.") + try: - scheduler = AsyncIOScheduler() - - # Job 1: Sync POS transactions to sales service - from app.jobs.sync_pos_to_sales import run_pos_to_sales_sync - - scheduler.add_job( - run_pos_to_sales_sync, - trigger=IntervalTrigger(minutes=5), - id='pos_to_sales_sync', - name='Sync POS Transactions to Sales', - replace_existing=True, - max_instances=1, # Prevent concurrent runs - coalesce=True, # Combine multiple missed runs into one - misfire_grace_time=60 # Allow 60 seconds grace for missed runs - ) - - scheduler.start() - logger.info("Background scheduler started", - jobs=len(scheduler.get_jobs()), - next_run=scheduler.get_jobs()[0].next_run_time if scheduler.get_jobs() else None) + _scheduler_instance = POSScheduler() + # Note: This is synchronous fallback, no leader election + import asyncio + asyncio.create_task(_scheduler_instance._start_standalone()) except Exception as e: logger.error("Failed to start scheduler", error=str(e), exc_info=True) - scheduler = None + _scheduler_instance = None def shutdown_scheduler(): - """Gracefully shutdown the scheduler""" - global scheduler + """ + DEPRECATED: Use POSScheduler class directly. - if scheduler is None: + Gracefully shutdown the scheduler (legacy function). 
+ """ + global _scheduler_instance + + if _scheduler_instance is None: logger.warning("Scheduler not running") return try: - scheduler.shutdown(wait=True) - logger.info("Background scheduler stopped") - scheduler = None + import asyncio + asyncio.create_task(_scheduler_instance.stop()) + _scheduler_instance = None except Exception as e: logger.error("Failed to shutdown scheduler", error=str(e), exc_info=True) @@ -89,57 +333,25 @@ def shutdown_scheduler(): def get_scheduler_status(): """ - Get current scheduler status + DEPRECATED: Use POSScheduler class directly. - Returns: - Dict with scheduler info and job statuses + Get current scheduler status (legacy function). """ - if scheduler is None: + if _scheduler_instance is None: return { "running": False, "jobs": [] } - - jobs = [] - for job in scheduler.get_jobs(): - jobs.append({ - "id": job.id, - "name": job.name, - "next_run": job.next_run_time.isoformat() if job.next_run_time else None, - "trigger": str(job.trigger) - }) - - return { - "running": True, - "jobs": jobs, - "state": scheduler.state - } + return _scheduler_instance.get_scheduler_status() def trigger_job_now(job_id: str): """ - Manually trigger a scheduled job immediately + DEPRECATED: Use POSScheduler class directly. - Args: - job_id: Job identifier (e.g., 'pos_to_sales_sync') - - Returns: - True if job was triggered, False otherwise + Manually trigger a scheduled job immediately (legacy function). """ - if scheduler is None: + if _scheduler_instance is None: logger.error("Cannot trigger job, scheduler not running") return False - - try: - job = scheduler.get_job(job_id) - if job: - scheduler.modify_job(job_id, next_run_time=datetime.now()) - logger.info("Job triggered manually", job_id=job_id) - return True - else: - logger.warning("Job not found", job_id=job_id) - return False - - except Exception as e: - logger.error("Failed to trigger job", job_id=job_id, error=str(e)) - return False + return _scheduler_instance.trigger_job_now(job_id) diff --git a/services/procurement/Dockerfile b/services/procurement/Dockerfile index 3101e4d6..4f31b0b3 100644 --- a/services/procurement/Dockerfile +++ b/services/procurement/Dockerfile @@ -1,11 +1,11 @@ # Procurement Service Dockerfile # Stage 1: Copy shared libraries -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Stage 2: Main service -FROM python:3.11-slim +FROM localhost:5000/python_3.11-slim WORKDIR /app diff --git a/services/procurement/app/services/delivery_tracking_service.py b/services/procurement/app/services/delivery_tracking_service.py index fa56a730..9cdc0125 100644 --- a/services/procurement/app/services/delivery_tracking_service.py +++ b/services/procurement/app/services/delivery_tracking_service.py @@ -156,21 +156,14 @@ class DeliveryTrackingService: async def _check_all_tenants(self): """ - Check deliveries for all active tenants (with leader election). + Check deliveries for all active tenants. - Only one pod executes this - others skip if not leader. + This method is only called by the leader pod (via APScheduler). + Leader election is handled at the scheduler level, not here. 
""" - # Try to acquire leader lock - if not await self._try_acquire_leader_lock(): - logger.debug( - "Skipping delivery check - not leader", - instance_id=self.instance_id - ) - return + logger.info("Starting delivery checks", instance_id=self.instance_id) try: - logger.info("Starting delivery checks (as leader)", instance_id=self.instance_id) - # Get all active tenants from database tenants = await self._get_active_tenants() @@ -194,24 +187,8 @@ class DeliveryTrackingService: total_alerts=total_alerts ) - finally: - await self._release_leader_lock() - - async def _try_acquire_leader_lock(self) -> bool: - """ - Try to acquire leader lock for delivery tracking. - - Uses Redis to ensure only one pod runs checks. - Returns True if acquired, False if another pod is leader. - """ - # This simplified version doesn't implement leader election - # In a real implementation, you'd use Redis or database locks - logger.info("Delivery tracking check running", instance_id=self.instance_id) - return True - - async def _release_leader_lock(self): - """Release leader lock""" - logger.debug("Delivery tracking check completed", instance_id=self.instance_id) + except Exception as e: + logger.error("Delivery checks failed", error=str(e), exc_info=True) async def _get_active_tenants(self) -> List[UUID]: """ diff --git a/services/production/Dockerfile b/services/production/Dockerfile index 55232085..b962d314 100644 --- a/services/production/Dockerfile +++ b/services/production/Dockerfile @@ -1,11 +1,11 @@ # Production Dockerfile # Add this stage at the top of each service Dockerfile -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Then your main service stage -FROM python:3.11-slim +FROM localhost:5000/python_3.11-slim WORKDIR /app diff --git a/services/production/app/services/production_scheduler.py b/services/production/app/services/production_scheduler.py index e84f09af..c869f19a 100644 --- a/services/production/app/services/production_scheduler.py +++ b/services/production/app/services/production_scheduler.py @@ -2,6 +2,8 @@ Production Scheduler Service Background task that periodically checks for production alert conditions and triggers appropriate alerts. + +Uses shared leader election for horizontal scaling - only one pod runs the scheduler. """ import asyncio @@ -21,103 +23,144 @@ from app.services.production_alert_service import ProductionAlertService logger = structlog.get_logger() -class ProductionScheduler: - """Production scheduler service that checks for alert conditions""" - def __init__(self, alert_service: ProductionAlertService, database_manager: Any): +class ProductionScheduler: + """Production scheduler service that checks for alert conditions. + + Uses Redis-based leader election to ensure only one pod runs the scheduler. 
+ """ + + def __init__(self, alert_service: ProductionAlertService, database_manager: Any, redis_url: str = None): self.alert_service = alert_service self.database_manager = database_manager + self.redis_url = redis_url self.scheduler = AsyncIOScheduler() self.check_interval = 300 # 5 minutes self.job_id = 'production_scheduler' + # Leader election + self._leader_election = None + self._redis_client = None + self._scheduler_started = False + # Cache de alertas emitidas para evitar duplicados self._emitted_alerts: set = set() self._alert_cache_ttl = 3600 # 1 hora self._last_cache_clear = datetime.utcnow() async def start(self): - """Start the production scheduler with APScheduler""" - if self.scheduler.running: - logger.warning("Production scheduler is already running") - return + """Start the production scheduler with leader election""" + try: + # Initialize leader election if Redis URL is provided + if self.redis_url: + await self._setup_leader_election() + else: + # No Redis, start scheduler directly (standalone mode) + logger.warning("No Redis URL provided, starting scheduler in standalone mode") + await self._start_scheduler() + except Exception as e: + logger.error("Failed to setup leader election, starting in standalone mode", + error=str(e)) + await self._start_scheduler() - # Add the periodic job - trigger = IntervalTrigger(seconds=self.check_interval) - self.scheduler.add_job( - self._run_scheduler_task, - trigger=trigger, - id=self.job_id, - name="Production Alert Checks", - max_instances=1 # Prevent overlapping executions + async def _setup_leader_election(self): + """Setup Redis-based leader election""" + from shared.leader_election import LeaderElectionService + import redis.asyncio as redis + + self._redis_client = redis.from_url(self.redis_url, decode_responses=False) + await self._redis_client.ping() + + self._leader_election = LeaderElectionService( + self._redis_client, + service_name="production-scheduler" ) - # Start the scheduler - self.scheduler.start() - logger.info("Production scheduler started", interval_seconds=self.check_interval) + await self._leader_election.start( + on_become_leader=self._on_become_leader, + on_lose_leader=self._on_lose_leader + ) + + logger.info("Leader election initialized for production scheduler", + is_leader=self._leader_election.is_leader) + + async def _on_become_leader(self): + """Called when this instance becomes the leader""" + logger.info("Became leader for production scheduler - starting scheduler") + await self._start_scheduler() + + async def _on_lose_leader(self): + """Called when this instance loses leadership""" + logger.warning("Lost leadership for production scheduler - stopping scheduler") + await self._stop_scheduler() + + async def _start_scheduler(self): + """Start the APScheduler""" + if self._scheduler_started: + logger.debug("Production scheduler already started") + return + + if not self.scheduler.running: + trigger = IntervalTrigger(seconds=self.check_interval) + self.scheduler.add_job( + self._run_scheduler_task, + trigger=trigger, + id=self.job_id, + name="Production Alert Checks", + max_instances=1 + ) + + self.scheduler.start() + self._scheduler_started = True + logger.info("Production scheduler started", interval_seconds=self.check_interval) + + async def _stop_scheduler(self): + """Stop the APScheduler""" + if not self._scheduler_started: + return + + if self.scheduler.running: + self.scheduler.shutdown(wait=False) + self._scheduler_started = False + logger.info("Production scheduler stopped") async def 
stop(self): - """Stop the production scheduler""" - if self.scheduler.running: - self.scheduler.shutdown(wait=True) - logger.info("Production scheduler stopped") - else: - logger.info("Production scheduler already stopped") + """Stop the production scheduler and leader election""" + if self._leader_election: + await self._leader_election.stop() + + await self._stop_scheduler() + + if self._redis_client: + await self._redis_client.close() + + @property + def is_leader(self) -> bool: + """Check if this instance is the leader""" + return self._leader_election.is_leader if self._leader_election else True async def _run_scheduler_task(self): - """Run scheduled production alert checks with leader election""" - # Try to acquire leader lock for this scheduler - lock_name = f"production_scheduler:{self.database_manager.database_url if hasattr(self.database_manager, 'database_url') else 'default'}" - lock_id = abs(hash(lock_name)) % (2**31) # Generate a unique integer ID for the lock - acquired = False + """Run scheduled production alert checks""" + start_time = datetime.now() + logger.info("Running scheduled production alert checks") try: - # Try to acquire PostgreSQL advisory lock for leader election - async with self.database_manager.get_session() as session: - result = await session.execute(text("SELECT pg_try_advisory_lock(:lock_id)"), {"lock_id": lock_id}) - acquired = True # If no exception, lock was acquired + # Run all alert checks + alerts_generated = await self.check_all_conditions() - start_time = datetime.now() - logger.info("Running scheduled production alert checks (as leader)") - - # Run all alert checks - alerts_generated = await self.check_all_conditions() - - duration = (datetime.now() - start_time).total_seconds() - logger.info( - "Completed scheduled production alert checks", - alerts_generated=alerts_generated, - duration_seconds=round(duration, 2) - ) + duration = (datetime.now() - start_time).total_seconds() + logger.info( + "Completed scheduled production alert checks", + alerts_generated=alerts_generated, + duration_seconds=round(duration, 2) + ) except Exception as e: - # If it's a lock acquisition error, log and skip execution (another instance is running) - error_str = str(e).lower() - if "lock" in error_str or "timeout" in error_str or "could not acquire" in error_str: - logger.debug( - "Skipping production scheduler execution (not leader)", - lock_name=lock_name - ) - return # Not an error, just not the leader - else: - logger.error( - "Error in production scheduler task", - error=str(e), - exc_info=True - ) - - finally: - if acquired: - # Release the lock - try: - async with self.database_manager.get_session() as session: - await session.execute(text("SELECT pg_advisory_unlock(:lock_id)"), {"lock_id": lock_id}) - await session.commit() - except Exception as unlock_error: - logger.warning( - "Error releasing leader lock (may have been automatically released)", - error=str(unlock_error) - ) + logger.error( + "Error in production scheduler task", + error=str(e), + exc_info=True + ) async def check_all_conditions(self) -> int: """ diff --git a/services/recipes/Dockerfile b/services/recipes/Dockerfile index d90145ac..8b24dd2f 100644 --- a/services/recipes/Dockerfile +++ b/services/recipes/Dockerfile @@ -1,11 +1,11 @@ # Recipes Dockerfile # Add this stage at the top of each service Dockerfile -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Then your main service stage -FROM python:3.11-slim +FROM 
localhost:5000/python_3.11-slim WORKDIR /app diff --git a/services/sales/Dockerfile b/services/sales/Dockerfile index ca5a1a78..4fb736f9 100644 --- a/services/sales/Dockerfile +++ b/services/sales/Dockerfile @@ -1,11 +1,11 @@ # Sales Dockerfile # Add this stage at the top of each service Dockerfile -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Then your main service stage -FROM python:3.11-slim +FROM localhost:5000/python_3.11-slim WORKDIR /app diff --git a/services/suppliers/Dockerfile b/services/suppliers/Dockerfile index d79b5216..60a63d0a 100644 --- a/services/suppliers/Dockerfile +++ b/services/suppliers/Dockerfile @@ -1,11 +1,11 @@ # Suppliers Dockerfile # Add this stage at the top of each service Dockerfile -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Then your main service stage -FROM python:3.11-slim +FROM localhost:5000/python_3.11-slim WORKDIR /app diff --git a/services/tenant/Dockerfile b/services/tenant/Dockerfile index d9f7fb79..cbce384c 100644 --- a/services/tenant/Dockerfile +++ b/services/tenant/Dockerfile @@ -1,11 +1,11 @@ # Tenant Dockerfile # Add this stage at the top of each service Dockerfile -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Then your main service stage -FROM python:3.11-slim +FROM localhost:5000/python_3.11-slim WORKDIR /app diff --git a/services/training/Dockerfile b/services/training/Dockerfile index 7662e2a2..1b2ed41f 100644 --- a/services/training/Dockerfile +++ b/services/training/Dockerfile @@ -1,11 +1,11 @@ # Training Service Dockerfile with MinIO Support # Multi-stage build for optimized production image -FROM python:3.11-slim AS shared +FROM localhost:5000/python_3.11-slim AS shared WORKDIR /shared COPY shared/ /shared/ # Main service stage -FROM python:3.11-slim +FROM localhost:5000/python_3.11-slim WORKDIR /app diff --git a/shared/monitoring/otel_config.py b/shared/monitoring/otel_config.py index 0ce443ed..469eec9f 100644 --- a/shared/monitoring/otel_config.py +++ b/shared/monitoring/otel_config.py @@ -63,6 +63,13 @@ class OTelConfig: # Clean and parse base endpoint base_grpc = cls._clean_grpc_endpoint(base_endpoint) base_http_host = cls._extract_host(base_endpoint) + + # Validate that the endpoint doesn't contain secret references or malformed data + if cls._contains_secret_reference(base_grpc): + logger.error("OTEL endpoint contains secret reference, falling back to default", + malformed_endpoint=base_endpoint) + base_grpc = f"{cls.DEFAULT_OTEL_COLLECTOR_HOST}:{cls.DEFAULT_GRPC_PORT}" + base_http_host = f"http://{cls.DEFAULT_OTEL_COLLECTOR_HOST}:{cls.DEFAULT_HTTP_PORT}" else: # Use default collector base_grpc = f"{cls.DEFAULT_OTEL_COLLECTOR_HOST}:{cls.DEFAULT_GRPC_PORT}" @@ -73,9 +80,9 @@ class OTelConfig: metrics_endpoint = os.getenv("OTEL_EXPORTER_OTLP_METRICS_ENDPOINT", base_grpc) logs_endpoint = os.getenv("OTEL_EXPORTER_OTLP_LOGS_ENDPOINT") - # Build final endpoints - traces_grpc = cls._clean_grpc_endpoint(traces_endpoint) - metrics_grpc = cls._clean_grpc_endpoint(metrics_endpoint) + # Validate and clean signal-specific endpoints + traces_grpc = cls._clean_and_validate_grpc_endpoint(traces_endpoint) + metrics_grpc = cls._clean_and_validate_grpc_endpoint(metrics_endpoint) # For metrics HTTP, convert gRPC endpoint to HTTP if needed metrics_http = cls._grpc_to_http_endpoint(metrics_grpc, "/v1/metrics") diff --git 
a/skaffold.yaml b/skaffold.yaml index ab5725cb..8cb6c234 100644 --- a/skaffold.yaml +++ b/skaffold.yaml @@ -101,7 +101,7 @@ build: deploy: kustomize: paths: - - infrastructure/kubernetes/overlays/dev + - infrastructure/environments/dev/k8s-manifests statusCheck: true statusCheckDeadlineSeconds: 600 kubectl: @@ -130,15 +130,15 @@ deploy: - host: command: ["sh", "-c", "echo ''"] - host: - command: ["kubectl", "apply", "-f", "infrastructure/kubernetes/base/secrets.yaml"] + command: ["kubectl", "apply", "-f", "infrastructure/environments/dev/k8s-manifests/base/secrets.yaml"] - host: - command: ["kubectl", "apply", "-f", "infrastructure/kubernetes/base/secrets/postgres-tls-secret.yaml"] + command: ["kubectl", "apply", "-f", "infrastructure/environments/dev/k8s-manifests/base/secrets/postgres-tls-secret.yaml"] - host: - command: ["kubectl", "apply", "-f", "infrastructure/kubernetes/base/secrets/redis-tls-secret.yaml"] + command: ["kubectl", "apply", "-f", "infrastructure/environments/dev/k8s-manifests/base/secrets/redis-tls-secret.yaml"] - host: - command: ["kubectl", "apply", "-f", "infrastructure/kubernetes/base/configs/postgres-init-config.yaml"] + command: ["kubectl", "apply", "-f", "infrastructure/environments/dev/k8s-manifests/base/configs/postgres-init-config.yaml"] - host: - command: ["kubectl", "apply", "-f", "infrastructure/kubernetes/base/configmaps/postgres-logging-config.yaml"] + command: ["kubectl", "apply", "-f", "infrastructure/environments/dev/k8s-manifests/base/configmaps/postgres-logging-config.yaml"] - host: command: ["sh", "-c", "echo ''"] - host: @@ -205,7 +205,7 @@ profiles: deploy: kustomize: paths: - - infrastructure/kubernetes/overlays/dev + - infrastructure/environments/dev/k8s-manifests - name: debug activation: @@ -219,7 +219,7 @@ profiles: deploy: kustomize: paths: - - infrastructure/kubernetes/overlays/dev + - infrastructure/environments/dev/k8s-manifests portForward: - resourceType: service resourceName: frontend-service @@ -247,4 +247,4 @@ profiles: deploy: kustomize: paths: - - infrastructure/kubernetes/overlays/prod + - infrastructure/environments/prod/k8s-manifests \ No newline at end of file