```bash
cat << 'EOFCMD' | colima --profile k8s-local ssh
sudo tee /etc/docker/daemon.json << 'EOF'
{
  "exec-opts": [
    "native.cgroupdriver=cgroupfs"
  ],
  "features": {
    "buildkit": true,
    "containerd-snapshotter": true
  },
  "insecure-registries": ["registry.bakery-ia.local"]
}
EOF
EOFCMD
```

---

Kind cluster configuration:

- Added `registry.bakery-ia.local` to `/etc/hosts` inside the Kind container
- Configured containerd to trust the self-signed certificate via `/etc/containerd/certs.d/registry.bakery-ia.local/hosts.toml`

```bash
docker exec bakery-ia-local-control-plane sh -c 'echo "127.0.0.1 registry.bakery-ia.local" >> /etc/hosts' 2>&1

kubectl get secret bakery-dev-tls-cert -n bakery-ia -o jsonpath='{.data.tls\.crt}' | base64 -d | docker exec -i bakery-ia-local-control-plane sh -c 'mkdir -p /etc/containerd/certs.d/registry.bakery-ia.local && cat > /etc/containerd/certs.d/registry.bakery-ia.local/ca.crt' 2>&1

docker exec bakery-ia-local-control-plane sh -c 'cat > /etc/containerd/certs.d/registry.bakery-ia.local/hosts.toml << EOF
server = "https://registry.bakery-ia.local"

[host."https://registry.bakery-ia.local"]
  capabilities = ["pull", "resolve"]
  ca = "/etc/containerd/certs.d/registry.bakery-ia.local/ca.crt"
EOF' 2>&1
```

# Bakery-IA Production CI/CD Implementation Plan

## Document Overview

**Status**: Draft
**Version**: 1.0
**Date**: 2024-07-15
**Author**: Mistral Vibe

This document outlines the production-grade CI/CD architecture for Bakery-IA and provides a step-by-step implementation plan that requires no immediate code changes.

## Table of Contents

1. [Current State Analysis](#current-state-analysis)
2. [Target Architecture](#target-architecture)
3. [Implementation Strategy](#implementation-strategy)
4. [Phase 1: Infrastructure Setup](#phase-1-infrastructure-setup)
5. [Phase 2: CI/CD Pipeline Configuration](#phase-2-cicd-pipeline-configuration)
6. [Phase 3: Monitoring and Observability](#phase-3-monitoring-and-observability)
7. [Phase 4: Testing and Validation](#phase-4-testing-and-validation)
8. [Phase 5: Rollout and Migration](#phase-5-rollout-and-migration)
9. [Risk Assessment](#risk-assessment)
10. [Success Metrics](#success-metrics)
11. [Appendices](#appendices)

---

## Current State Analysis

### Existing Infrastructure

- **Microservices**: 19 services in `services/` directory
- **Frontend**: React application in `frontend/`
- **Gateway**: API gateway in `gateway/`
- **Databases**: 22 PostgreSQL instances + Redis + RabbitMQ
- **Storage**: MinIO for object storage
- **Monitoring**: SigNoz already deployed
- **Target Platform**: MicroK8s on Clouding.io VPS

### Current Deployment Process

- Manual builds using Tiltfile/Skaffold (local only)
- Manual image pushes to local registry or Docker Hub
- Manual `kubectl apply` commands
- No automated testing gates
- No rollback mechanism

### Pain Points

- "Works on my machine" issues
- No audit trail of deployments
- Time-consuming manual processes
- Risk of human error
- No automated testing in pipeline

---

## Target Architecture

### High-Level Architecture Diagram

```mermaid
graph TD
    A[Developer Workstation] -->|Push Code| B[Gitea Git Server]
    B -->|Webhook| C[Tekton Pipelines]
    C -->|Build/Test| D[Gitea Container Registry]
    D -->|New Image| E[Flux CD]
    E -->|Git Commit| B
    E -->|kubectl apply| F[MicroK8s Cluster]
    F -->|Metrics/Logs| G[SigNoz Monitoring]
```

### How CI/CD Tools Run in Kubernetes

Each CI/CD tool is an ordinary container image running as one or more pods in your MicroK8s cluster, just like your application services.

```mermaid
graph TB
    subgraph "MicroK8s Cluster (Your VPS)"
        subgraph "Namespace: gitea"
            A1["Pod: gitea<br/>Image: gitea/gitea:latest"]
            A2["Pod: gitea-postgresql<br/>Image: postgres:15"]
            A3["PVC: gitea-data"]
        end
        subgraph "Namespace: tekton-pipelines"
            B1["Pod: tekton-pipelines-controller<br/>Image: gcr.io/tekton-releases/..."]
            B2["Pod: tekton-pipelines-webhook<br/>Image: gcr.io/tekton-releases/..."]
            B3["Pod: tekton-triggers-controller<br/>Image: gcr.io/tekton-releases/..."]
        end
        subgraph "Namespace: flux-system"
            C1["Pod: source-controller<br/>Image: ghcr.io/fluxcd/..."]
            C2["Pod: kustomize-controller<br/>Image: ghcr.io/fluxcd/..."]
            C3["Pod: helm-controller<br/>Image: ghcr.io/fluxcd/..."]
        end
        subgraph "Namespace: bakery-ia (YOUR APP)"
            D1["19 services + 22 databases + Redis + RabbitMQ + MinIO"]
        end
    end
```

### Component Breakdown

#### 1. Gitea (Git Server + Registry)

- **Purpose**: Replace GitHub dependency
- **Namespace**: `gitea`
- **Resources**: ~768MB RAM (512MB Gitea + 256MB PostgreSQL)
- **Storage**: PVC for repositories and registry
- **Access**: Internal DNS `gitea.bakery-ia.local`
- **Leader election**: Not involved in this plan; Gitea runs as a single replica (`gitea-0`)

#### 2. Tekton (CI Pipelines)

- **Purpose**: Build, test, and push container images
- **Namespace**: `tekton-pipelines`
- **Resources**: ~650MB baseline + 512MB per build
- **Key Features**:
  - Path-based change detection
  - Parallel builds for independent services
  - Kaniko for in-cluster image building
  - Integration with Gitea registry
- **Leader election**: Tekton controllers use leader election to ensure high availability

#### 3. Flux CD (GitOps Deployment)

- **Purpose**: Automated deployments from Git
- **Namespace**: `flux-system`
- **Resources**: ~230MB baseline
- **Key Features**:
  - Pull-based deployments (no webhooks needed)
  - Kustomize support for your existing overlays
  - Image automation for rolling updates
  - Drift detection and correction
- **Leader election**: Flux controllers use leader election to ensure only one active controller

#### 4. SigNoz (Monitoring)

- **Purpose**: Observability for CI/CD and applications
- **Integration Points**:
  - Tekton pipeline metrics
  - Flux reconciliation events
  - Kubernetes resource metrics
  - Application performance monitoring

### Deployment Methods for Each Tool

#### 1. Flux (Easiest - MicroK8s Addon)

```bash
# fluxcd is shipped as a MicroK8s community addon on current releases,
# so the community repository is enabled first
microk8s enable community
microk8s enable fluxcd

# This creates:
# - Namespace: flux-system
# - Deployments: source-controller, kustomize-controller, helm-controller, notification-controller
# - CRDs: GitRepository, Kustomization, HelmRelease, etc.
```

**Images pulled:**

- `ghcr.io/fluxcd/source-controller:v1.x.x`
- `ghcr.io/fluxcd/kustomize-controller:v1.x.x`
- `ghcr.io/fluxcd/helm-controller:v0.x.x`
- `ghcr.io/fluxcd/notification-controller:v1.x.x`

#### 2. Tekton (kubectl apply or Helm)

```bash
# Option A: Direct apply (official releases)
kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/latest/release.yaml
kubectl apply -f https://storage.googleapis.com/tekton-releases/triggers/latest/release.yaml
kubectl apply -f https://storage.googleapis.com/tekton-releases/dashboard/latest/release.yaml

# Option B: Helm chart (unverified chart repository; confirm the URL before relying on it)
helm repo add tekton https://tekton.dev/charts
helm install tekton-pipelines tekton/tekton-pipelines -n tekton-pipelines --create-namespace
```

**Images pulled:**

- `gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/controller:v0.x.x`
- `gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/webhook:v0.x.x`
- `gcr.io/tekton-releases/github.com/tektoncd/triggers/cmd/controller:v0.x.x`
- `gcr.io/tekton-releases/github.com/tektoncd/dashboard/cmd/dashboard:v0.x.x`

#### 3. Gitea (Helm chart)

```bash
# Add Helm repo
helm repo add gitea https://dl.gitea.io/charts

# Install with custom values
helm install gitea gitea/gitea \
  -n gitea --create-namespace \
  -f gitea-values.yaml
```

**Images pulled:**

- `gitea/gitea:1.x.x`
- `postgres:15-alpine` (or bundled)

---

## Complete Deployment Architecture

```mermaid
graph TB
    subgraph "Your Git Repository<br/>(Initially in GitHub, then Gitea)"
        A["bakery-ia/<br/>├── services/<br/>├── frontend/<br/>├── gateway/<br/>├── infrastructure/<br/>│ ├── kubernetes/<br/>│ │ ├── base/<br/>│ │ └── overlays/<br/>│ │ ├── dev/<br/>│ │ └── prod/<br/>│ └── ci-cd/<br/>│ ├── gitea/<br/>│ ├── tekton/<br/>│ └── flux/<br/>└── tekton/<br/>└── pipeline.yaml"]
    end
    A --> B["Gitea<br/>Self-hosted Git<br/>Stores code<br/>Triggers webhook"]
    B --> C["Tekton<br/>EventListener<br/>TriggerTemplate<br/>PipelineRun"]
    C --> D["Pipeline Steps<br/>├── clone<br/>├── detect changes<br/>├── test<br/>├── build<br/>└── push"]
    D --> E["Gitea Registry<br/>gitea:5000/bakery/<br/>auth-service:abc123"]
    E --> F["Flux<br/>source-controller<br/>kustomize-controller<br/>kubectl apply"]
    F --> G["Your Application<br/>bakery-ia namespace<br/>Updated services"]
```

## Implementation Strategy

### Guiding Principles

1. **No Code Changes Required**: Use existing codebase as-is
2. **Incremental Rollout**: Phase-based implementation
3. **Zero Downtime**: Parallel run with existing manual process
4. **Observability First**: Monitor before automating
5. **Security by Design**: Secrets management from day one

### Implementation Phases

```mermaid
gantt
    title CI/CD Implementation Timeline
    dateFormat YYYY-MM-DD
    section Phase 1: Infrastructure
    Infrastructure Setup :a1, 2024-07-15, 7d
    section Phase 2: CI/CD Config
    Pipeline Configuration :a2, 2024-07-22, 10d
    section Phase 3: Monitoring
    SigNoz Integration :a3, 2024-08-01, 5d
    section Phase 4: Testing
    Validation Testing :a4, 2024-08-06, 7d
    section Phase 5: Rollout
    Production Migration :a5, 2024-08-13, 5d
```

## Step-by-Step: How to Deploy CI/CD to Production

### Phase 1: Bootstrap (One-time setup on VPS)

```bash
# SSH to your VPS
ssh user@your-clouding-vps

# 1. Enable Flux (MicroK8s community addon)
microk8s enable community
microk8s enable fluxcd

# 2. Install Tekton
microk8s kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/latest/release.yaml
microk8s kubectl apply -f https://storage.googleapis.com/tekton-releases/triggers/latest/release.yaml

# 3. Install Gitea via Helm
microk8s helm repo add gitea https://dl.gitea.io/charts
microk8s helm install gitea gitea/gitea -n gitea --create-namespace -f gitea-values.yaml

# 4. Verify all running
microk8s kubectl get pods -A | grep -E "gitea|tekton|flux"
```

After this, you have:

```
NAMESPACE          NAME                                READY   STATUS
gitea              gitea-0                             1/1     Running
gitea              gitea-postgresql-0                  1/1     Running
tekton-pipelines   tekton-pipelines-controller-xxx     1/1     Running
tekton-pipelines   tekton-pipelines-webhook-xxx        1/1     Running
tekton-pipelines   tekton-triggers-controller-xxx      1/1     Running
flux-system        source-controller-xxx               1/1     Running
flux-system        kustomize-controller-xxx            1/1     Running
flux-system        helm-controller-xxx                 1/1     Running
```

### Phase 2: Configure Flux to Watch Your Repo

```yaml
# infrastructure/ci-cd/flux/gitrepository.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: bakery-ia
  namespace: flux-system
spec:
  interval: 1m
  url: https://gitea.bakery-ia.local/bakery/bakery-ia.git
  ref:
    branch: main
  secretRef:
    name: gitea-credentials  # Git credentials
---
# infrastructure/ci-cd/flux/kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: bakery-ia-prod
  namespace: flux-system
spec:
  interval: 5m
  path: ./infrastructure/kubernetes/overlays/prod
  prune: true
  sourceRef:
    kind: GitRepository
    name: bakery-ia
  targetNamespace: bakery-ia
```

### Phase 3: Configure Tekton Pipeline

```yaml
# tekton/pipeline.yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: bakery-ia-ci
  namespace: tekton-pipelines
spec:
  params:
    - name: git-url
    - name: git-revision
    - name: changed-services
      type: array
  workspaces:
    - name: source
    - name: docker-credentials
  tasks:
    - name: clone
      taskRef:
        name: git-clone
      workspaces:
        - name: output
          workspace: source
      params:
        - name: url
          value: $(params.git-url)
        - name: revision
          value: $(params.git-revision)
    - name: detect-changes
      runAfter: [clone]
      taskRef:
        name: detect-changed-services
      workspaces:
        - name: source
          workspace: source
    - name: build-and-push
      runAfter: [detect-changes]
      taskRef:
        name: kaniko-build
      params:
        - name: services
          value: $(tasks.detect-changes.results.changed-services)
      workspaces:
        - name: source
          workspace: source
        - name: docker-credentials
          workspace: docker-credentials
```

## Visual: Complete Production Flow

```mermaid
graph LR
    A[Developer pushes code] --> B["Gitea<br/>Self-hosted Git<br/>• Receives push<br/>• Stores code<br/>• Triggers webhook"]
    B -->|webhook POST to tekton-triggers| C["Tekton<br/>EventListener<br/>TriggerTemplate<br/>PipelineRun"]
    C --> D["Pipeline Steps<br/>Each step = container in pod:<br/>├── clone<br/>├── detect changes<br/>├── test (pytest)<br/>├── build (kaniko)<br/>└── push (registry)"]
    D --> E[Only changed services]
    D --> F["Gitea Registry<br/>gitea:5000/bakery/<br/>auth-service:abc123"]
    F -->|"Final step: Update image tag in Git<br/>commits new tag to infrastructure/kubernetes/overlays/prod"| G[Git commit triggers Flux]
    G --> H["Flux<br/>source-controller<br/>kustomize-controller<br/>kubectl apply<br/>• Detects new image tag in Git<br/>• Renders Kustomize overlay<br/>• Applies to bakery-ia namespace<br/>• Rolling update of changed services"]
    H --> I["Your Application<br/>Namespace: bakery-ia<br/>├── auth-service:abc123 ←NEW<br/>├── tenant-svc:def456<br/>└── training-svc:ghi789<br/>Only auth-service was updated (others unchanged)"]
```

## Where Images Come From

| Component | Image Source | Notes |
|-----------|--------------|-------|
| Flux | ghcr.io/fluxcd/* | Pulled once, cached locally |
| Tekton | gcr.io/tekton-releases/* | Pulled once, cached locally |
| Gitea | gitea/gitea (Docker Hub) | Pulled once, cached locally |
| Your Services | gitea.bakery-ia.local:5000/bakery/* | Built by Tekton, stored in Gitea registry |
| Build Tools | gcr.io/kaniko-project/executor | Used during builds only |

## Summary: What Lives Where

```mermaid
graph TB
    subgraph "MicroK8s Cluster"
        subgraph "Namespace: gitea (CI/CD Infrastructure)<br/>~768MB total"
            A1[gitea pod ~512MB RAM]
            A2[postgresql pod ~256MB RAM]
        end
        subgraph "Namespace: tekton-pipelines (CI/CD Infrastructure)<br/>~650MB baseline"
            B1[pipelines-controller ~200MB RAM]
            B2[pipelines-webhook ~100MB RAM]
            B3[triggers-controller ~150MB RAM]
            B4[triggers-webhook ~100MB RAM]
        end
        subgraph "Namespace: flux-system (CI/CD Infrastructure)<br/>~230MB baseline"
            C1[source-controller ~50MB RAM]
            C2[kustomize-controller ~50MB RAM]
            C3[helm-controller ~50MB RAM]
            C4[notification-controller ~30MB RAM]
        end
        subgraph "Namespace: bakery-ia (YOUR APPLICATION)"
            D1[19 microservices]
            D2[22 PostgreSQL databases]
            D3[Redis]
            D4[RabbitMQ]
            D5[MinIO]
        end
    end
    note1["CI/CD Total: ~1.6GB baseline (768MB + 650MB + 230MB)"]
    note2["During builds: +512MB per concurrent build (Tekton spawns pods)"]
```

### Key Points

- Everything runs as pods - Gitea, Tekton, Flux are all containerized
- Pulled from public registries once - then cached on your VPS
- Your app images stay local - built by Tekton, stored in Gitea registry
- No external dependencies after setup - fully self-contained
- Flux pulls from Git - no incoming webhooks needed for deployments

---

## Phase 1: Infrastructure Setup

### Objective

Deploy CI/CD infrastructure components without affecting existing applications.

### Step-by-Step Implementation

#### Step 1: Prepare MicroK8s Cluster

```bash
# SSH to VPS
ssh admin@bakery-ia-vps

# Verify MicroK8s status
microk8s status

# Enable required addons (fluxcd comes from the community addon repository)
microk8s enable dns storage ingress
microk8s enable community
microk8s enable fluxcd

# Verify storage class
microk8s kubectl get storageclass
```

#### Step 2: Deploy Gitea

**Create Gitea values file** (`infrastructure/ci-cd/gitea/values.yaml`):

```yaml
service:
  type: ClusterIP
  httpPort: 3000
  sshPort: 2222

persistence:
  enabled: true
  size: 50Gi
  storageClass: "microk8s-hostpath"

gitea:
  config:
    server:
      DOMAIN: gitea.bakery-ia.local
      SSH_DOMAIN: gitea.bakery-ia.local
      ROOT_URL: http://gitea.bakery-ia.local
    repository:
      ENABLE_PUSH_CREATE_USER: true
      ENABLE_PUSH_CREATE_ORG: true
    # The container registry is part of Gitea's package registry,
    # which is configured in the [packages] section of app.ini
    packages:
      ENABLED: true

postgresql:
  enabled: true
  persistence:
    size: 20Gi
```

**Deploy Gitea**:

```bash
# Add Helm repo
microk8s helm repo add gitea https://dl.gitea.io/charts

# Create namespace
microk8s kubectl create namespace gitea

# Install Gitea
microk8s helm install gitea gitea/gitea \
  -n gitea \
  -f infrastructure/ci-cd/gitea/values.yaml
```

**Verify Deployment**:

```bash
# Check pods
microk8s kubectl get pods -n gitea

# Get admin password
microk8s kubectl get secret -n gitea gitea-admin-secret -o jsonpath='{.data.password}' | base64 -d
```

#### Step 3: Configure Ingress for Gitea

**Create Ingress Resource** (`infrastructure/ci-cd/gitea/ingress.yaml`):

```yaml
# No rewrite-target annotation: with path "/" it would rewrite every
# request to "/" and break all Gitea sub-paths
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: gitea-ingress
  namespace: gitea
spec:
  rules:
    - host: gitea.bakery-ia.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: gitea-http
                port:
                  number: 3000
```

**Apply Ingress**:

```bash
microk8s kubectl apply -f infrastructure/ci-cd/gitea/ingress.yaml
```

#### Step 4: Migrate Repository from GitHub

**Manual Migration Steps**:

1. Create new repository in Gitea UI
2. Use git mirror to push existing repo:

```bash
# Clone bare repo from GitHub
git clone --bare git@github.com:your-org/bakery-ia.git

# Push to Gitea
cd bakery-ia.git
git push --mirror http://admin:PASSWORD@gitea.bakery-ia.local/your-org/bakery-ia.git
```

#### Step 5: Deploy Tekton

**Install Tekton Pipelines**:

```bash
# Create namespace
microk8s kubectl create namespace tekton-pipelines

# Install Tekton Pipelines
microk8s kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/latest/release.yaml

# Install Tekton Triggers
microk8s kubectl apply -f https://storage.googleapis.com/tekton-releases/triggers/latest/release.yaml

# Install Tekton Dashboard (optional)
microk8s kubectl apply -f https://storage.googleapis.com/tekton-releases/dashboard/latest/release.yaml
```

**Verify Installation**:

```bash
microk8s kubectl get pods -n tekton-pipelines
```

#### Step 6: Configure Tekton for Gitea Integration

**Create Gitea Webhook Secret**:

```bash
# Generate webhook secret
WEBHOOK_SECRET=$(openssl rand -hex 20)

# Create secret
microk8s kubectl create secret generic gitea-webhook-secret \
  -n tekton-pipelines \
  --from-literal=secretToken="$WEBHOOK_SECRET"
```

**Configure Gitea Webhook**:

1. Go to Gitea repository settings
2. Add webhook:
   - URL: `http://el-bakery-ia-listener.tekton-pipelines.svc.cluster.local:8080` (Tekton Triggers exposes each EventListener through a Service named `el-<listener-name>`)
   - Secret: Use the generated `WEBHOOK_SECRET`
   - Trigger: Push events

#### Step 7: Verify Flux Installation

**Check Flux Components**:

```bash
microk8s kubectl get pods -n flux-system

# Verify CRDs
microk8s kubectl get crd | grep flux
```

---

## Phase 2: CI/CD Pipeline Configuration

### Objective

Configure pipelines to build, test, and deploy services automatically.

### Step-by-Step Implementation

#### Step 1: Create Tekton Tasks

**Git Clone Task** (`infrastructure/ci-cd/tekton/tasks/git-clone.yaml`):

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: git-clone
  namespace: tekton-pipelines
spec:
  workspaces:
    - name: output
  params:
    - name: url
      type: string
    - name: revision
      type: string
      default: "main"
  steps:
    - name: clone
      image: alpine/git
      script: |
        git clone "$(params.url)" "$(workspaces.output.path)"
        cd "$(workspaces.output.path)"
        git checkout "$(params.revision)"
```

**Detect Changed Services Task** (`infrastructure/ci-cd/tekton/tasks/detect-changes.yaml`):

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: detect-changed-services
  namespace: tekton-pipelines
spec:
  workspaces:
    - name: source
  results:
    - name: changed-services
      description: Comma-separated list of changed services
  steps:
    - name: detect
      image: alpine/git
      script: |
        #!/bin/sh
        # POSIX sh only: alpine/git ships BusyBox sh, which has no arrays or [[ ]]
        cd "$(workspaces.source.path)"
        # Files changed by the latest commit -> service directory names,
        # de-duplicated and joined with commas
        CHANGED_SERVICES=$(git diff --name-only HEAD~1 HEAD \
          | grep '^services/' \
          | cut -d'/' -f2 \
          | sort -u \
          | tr '\n' ',' \
          | sed 's/,$//')
        printf '%s' "$CHANGED_SERVICES" | tee "$(results.changed-services.path)"
```

**Kaniko Build Task** (`infrastructure/ci-cd/tekton/tasks/kaniko-build.yaml`):

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: kaniko-build
  namespace: tekton-pipelines
spec:
  workspaces:
    - name: source
    # Mounted where Kaniko looks for registry credentials; the backing
    # secret is assumed to expose a config.json key
    - name: docker-credentials
      mountPath: /kaniko/.docker
  params:
    # A single service name; a comma-separated list from detect-changes
    # has to be split into one build per service first
    - name: services
      type: string
    - name: registry
      type: string
      default: "gitea.bakery-ia.local:5000"
    # Used as the image tag; declared here because the step references it
    - name: git-revision
      type: string
      default: "latest"
  steps:
    - name: build-and-push
      image: gcr.io/kaniko-project/executor:v1.9.0
      args:
        - --dockerfile=$(workspaces.source.path)/services/$(params.services)/Dockerfile
        - --context=$(workspaces.source.path)
        - --destination=$(params.registry)/bakery/$(params.services):$(params.git-revision)
```

#### Step 2: Create Tekton Pipeline

**Main CI Pipeline** (`infrastructure/ci-cd/tekton/pipelines/ci-pipeline.yaml`):

```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: bakery-ia-ci
  namespace: tekton-pipelines
spec:
  workspaces:
    - name: shared-workspace
    - name: docker-credentials
  params:
    - name: git-url
      type: string
    - name: git-revision
      type: string
  tasks:
    - name: fetch-source
      taskRef:
        name: git-clone
      workspaces:
        - name: output
          workspace: shared-workspace
      params:
        - name: url
          value: $(params.git-url)
        - name: revision
          value: $(params.git-revision)
    - name: detect-changes
      runAfter: [fetch-source]
      taskRef:
        name: detect-changed-services
      workspaces:
        - name: source
          workspace: shared-workspace
    - name: build-and-push
      runAfter: [detect-changes]
      taskRef:
        name: kaniko-build
      workspaces:
        - name: source
          workspace: shared-workspace
        - name: docker-credentials
          workspace: docker-credentials
      params:
        - name: services
          value: $(tasks.detect-changes.results.changed-services)
        - name: registry
          value: "gitea.bakery-ia.local:5000"
        - name: git-revision
          value: $(params.git-revision)
```

#### Step 3: Create Tekton Trigger

**Trigger Template** (`infrastructure/ci-cd/tekton/triggers/trigger-template.yaml`):

```yaml
apiVersion: triggers.tekton.dev/v1alpha1
kind: TriggerTemplate
metadata:
  name: bakery-ia-trigger-template
  namespace: tekton-pipelines
spec:
  params:
    - name: git-repo-url
    - name: git-revision
  resourcetemplates:
    - apiVersion: tekton.dev/v1beta1
      kind: PipelineRun
      metadata:
        generateName: bakery-ia-ci-run-
      spec:
        pipelineRef:
          name: bakery-ia-ci
        workspaces:
          - name: shared-workspace
            volumeClaimTemplate:
              spec:
                accessModes: ["ReadWriteOnce"]
                resources:
                  requests:
                    storage: 1Gi
          - name: docker-credentials
            secret:
              secretName: gitea-registry-credentials
        params:
          # TriggerTemplate parameters are referenced with the tt.params prefix
          - name: git-url
            value: $(tt.params.git-repo-url)
          - name: git-revision
            value: $(tt.params.git-revision)
```

**Trigger Binding** (`infrastructure/ci-cd/tekton/triggers/trigger-binding.yaml`):

```yaml
apiVersion: triggers.tekton.dev/v1alpha1
kind: TriggerBinding
metadata:
  name: bakery-ia-trigger-binding
  namespace: tekton-pipelines
spec:
  params:
    - name: git-repo-url
      value: $(body.repository.clone_url)
    - name: git-revision
      value: $(body.head_commit.id)
```

**Event Listener** (`infrastructure/ci-cd/tekton/triggers/event-listener.yaml`):

```yaml
apiVersion: triggers.tekton.dev/v1alpha1
kind: EventListener
metadata:
  name: bakery-ia-listener
  namespace: tekton-pipelines
spec:
  serviceAccountName: tekton-triggers-sa
  triggers:
    - name: bakery-ia-trigger
      bindings:
        - ref: bakery-ia-trigger-binding
      template:
        ref: bakery-ia-trigger-template
```

#### Step 4: Configure Flux for GitOps

**Git Repository Source** (`infrastructure/ci-cd/flux/git-repository.yaml`):

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: bakery-ia
  namespace: flux-system
spec:
  interval: 1m
  url: http://gitea.bakery-ia.local/your-org/bakery-ia.git
  ref:
    branch: main
  secretRef:
    name: gitea-credentials
```

**Kustomization for Production** (`infrastructure/ci-cd/flux/kustomization.yaml`):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: bakery-ia-prod
  namespace: flux-system
spec:
  interval: 5m
  path: ./infrastructure/kubernetes/overlays/prod
  prune: true
  sourceRef:
    kind: GitRepository
    name: bakery-ia
  targetNamespace: bakery-ia
```

#### Step 5: Apply All Configurations

```bash
# Apply Tekton tasks
microk8s kubectl apply -f infrastructure/ci-cd/tekton/tasks/

# Apply Tekton pipeline
microk8s kubectl apply -f infrastructure/ci-cd/tekton/pipelines/

# Apply Tekton triggers
microk8s kubectl apply -f infrastructure/ci-cd/tekton/triggers/

# Apply Flux configurations
# (-f, not -k: the kustomization.yaml in this directory is a Flux
# Kustomization resource, not a kustomize build file)
microk8s kubectl apply -f infrastructure/ci-cd/flux/
```

---

## Phase 3: Monitoring and Observability

### Objective

Integrate SigNoz with CI/CD pipelines for comprehensive monitoring.

### Step-by-Step Implementation

#### Step 1: Configure OpenTelemetry for Tekton

**Install OpenTelemetry Collector** (`infrastructure/ci-cd/monitoring/otel-collector.yaml`; the `OpenTelemetryCollector` kind requires the OpenTelemetry Operator CRDs in the cluster):

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: tekton-otel
  namespace: tekton-pipelines
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch:
    exporters:
      otlp:
        endpoint: "signoz-otel-collector.monitoring.svc.cluster.local:4317"
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp]
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp]
```

**Apply Configuration**:

```bash
microk8s kubectl apply -f infrastructure/ci-cd/monitoring/otel-collector.yaml
```

#### Step 2: Instrument Tekton Pipelines

**Update Tasks with Tracing** (add to the Task definitions used by `ci-pipeline.yaml`, for example `git-clone.yaml`; sidecars and volumes are Task-level fields and are not accepted on a Pipeline's task entry):

```yaml
spec:
  # Add OpenTelemetry sidecar
  sidecars:
    - name: otel-collector
      image: otel/opentelemetry-collector-contrib:0.70.0
      args: ["--config=/etc/otel-collector-config.yaml"]
      volumeMounts:
        - name: otel-config
          mountPath: /etc/otel-collector-config.yaml
          subPath: otel-collector-config.yaml
  volumes:
    - name: otel-config
      configMap:
        name: otel-collector-config
```

#### Step 3: Configure SigNoz Dashboards

**Create CI/CD Dashboard**:

1. Log in to SigNoz UI
2. Create new dashboard: "CI/CD Pipeline Metrics"
3. Add panels:
   - Pipeline execution time
   - Success/failure rates
   - Build duration by service
   - Resource usage during builds

**Create Deployment Dashboard**:

1. Create dashboard: "GitOps Deployment Metrics"
2. Add panels:
   - Flux reconciliation events
   - Deployment frequency
   - Rollback events
   - Resource changes

---

## Phase 4: Testing and Validation

### Objective

Validate CI/CD pipeline functionality without affecting production.

### Test Plan

#### Test 1: Gitea Functionality

- **Test**: Push code to Gitea repository
- **Expected**: Code appears in Gitea UI, webhook triggers
- **Validation**:

```bash
# Push test commit
cd bakery-ia
echo "test" > test-file.txt
git add test-file.txt
git commit -m "Test CI/CD"
git push origin main
```

#### Test 2: Tekton Pipeline Trigger

- **Test**: Verify pipeline triggers on push
- **Expected**: PipelineRun created in tekton-pipelines namespace
- **Validation**:

```bash
# Check PipelineRuns
microk8s kubectl get pipelineruns -n tekton-pipelines
```

#### Test 3: Change Detection

- **Test**: Modify single service and verify only that service builds
- **Expected**: Only changed service is built and pushed
- **Validation**:

```bash
# Check build logs (substitute the pod created for the build-and-push task;
# Tekton names step containers step-<step-name>)
microk8s kubectl logs -n tekton-pipelines <build-pod-name> -c step-build-and-push
```

#### Test 4: Image Registry

- **Test**: Verify images pushed to Gitea registry
- **Expected**: New image appears in registry
- **Validation**:

```bash
# List image tags through the registry API (may require credentials)
curl -X GET http://gitea.bakery-ia.local/v2/bakery/auth-service/tags/list
```

#### Test 5: Flux Deployment

- **Test**: Verify Flux detects and applies changes
- **Expected**: New deployment in bakery-ia namespace
- **Validation**:

```bash
# Check Flux reconciliation
microk8s kubectl get kustomizations -n flux-system

# Check deployments
microk8s kubectl get deployments -n bakery-ia
```

#### Test 6: Rollback

- **Test**: Verify rollback capability
- **Expected**: Previous version redeployed successfully
- **Validation**:

```bash
# Rollback via Git (substitute the commit that introduced the change)
git revert <commit-sha>
git push origin main

# Verify rollback
microk8s kubectl get pods -n bakery-ia -w
```

---

## Phase 5: Rollout and Migration

### Objective

Gradually migrate from manual to automated CI/CD.
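
One way to keep the migration gradual is to check, service by service, that the automated pipeline produces the same image tags as the manual process before switching over. The following is a minimal sketch under stated assumptions: `compare_tags` is a hypothetical helper that is not part of the existing tooling, and the two input files (one `service:tag` entry per line) are assumed to have been exported beforehand.

```shell
#!/bin/sh
# Hypothetical helper (not part of the existing tooling).
# $1: file listing "service:tag" entries deployed by the manual process
# $2: file listing "service:tag" entries produced by the automated pipeline
# Prints entries that appear in only one of the two files; prints nothing
# when both lists agree.
compare_tags() {
  manual_sorted=$(mktemp)
  auto_sorted=$(mktemp)
  # comm needs sorted input; -u also drops duplicate entries
  sort -u "$1" > "$manual_sorted"
  sort -u "$2" > "$auto_sorted"
  # -3 suppresses the column of lines common to both files
  comm -3 "$manual_sorted" "$auto_sorted"
  rm -f "$manual_sorted" "$auto_sorted"
}
```

The input lists could be produced, for example, with `microk8s kubectl get deployments -n bakery-ia -o jsonpath='{range .items[*]}{.spec.template.spec.containers[0].image}{"\n"}{end}'`, with the registry prefix stripped if the two sources differ. Empty output from `compare_tags manual.txt automated.txt` suggests the two processes are in step for every listed service.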

### Migration Strategy

#### Step 1: Parallel Run

- Run automated CI/CD alongside manual process
- Compare results for 1 week
- Monitor with SigNoz

#### Step 2: Canary Deployment

- Start with non-critical services:
  - auth-service
  - tenant-service
  - training-service
- Monitor stability and performance

#### Step 3: Full Migration

- Migrate all services to automated pipeline
- Disable manual deployment scripts
- Update documentation

#### Step 4: Cleanup

- Remove old Tiltfile/Skaffold configurations
- Archive manual deployment scripts
- Update team documentation

---

## Risk Assessment

### Identified Risks

| Risk | Likelihood | Impact | Mitigation Strategy |
|------|------------|--------|---------------------|
| Pipeline fails to detect changes | Medium | High | Manual override procedure, detailed logging |
| Resource exhaustion during builds | High | Medium | Resource quotas, build queue limits |
| Registry storage fills up | Medium | Medium | Automated cleanup policy, monitoring alerts |
| Flux applies incorrect configuration | Low | High | Manual approval for first run, rollback testing |
| Network issues between components | Medium | High | Health checks, retry logic |

### Mitigation Plan

1. **Resource Management**:
   - Set resource quotas for CI/CD namespaces
   - Limit concurrent builds to 2
   - Monitor with SigNoz alerts
2. **Backup Strategy**:
   - Regular backups of Gitea (repos + registry)
   - Backup Flux configurations
   - Database backups for all services
3. **Rollback Plan**:
   - Document manual rollback procedures
   - Test rollback for each service
   - Maintain backup of manual deployment scripts
4. **Monitoring Alerts**:
   - Pipeline failure alerts
   - Resource threshold alerts
   - Deployment failure alerts

---

## Success Metrics

### Quantitative Metrics

1. **Deployment Frequency**: Increase from manual to automated deployments
2. **Lead Time for Changes**: Reduce from hours to minutes
3. **Change Failure Rate**: Maintain or reduce current rate
4. **Mean Time to Recovery**: Improve with automated rollbacks
5. **Resource Utilization**: Monitor CI/CD overhead (< 2GB baseline)

### Qualitative Metrics

1. **Developer Satisfaction**: Survey team on CI/CD experience
2. **Deployment Confidence**: Reduced "works on my machine" issues
3. **Auditability**: Full traceability of all deployments
4. **Reliability**: Consistent deployment outcomes

---

## Appendices

### Appendix A: Required Tools and Versions

- MicroK8s: v1.27+
- Gitea: v1.19+
- Tekton Pipelines: v0.47+
- Flux CD: v2.0+
- SigNoz: v0.20+
- Kaniko: v1.9+

### Appendix B: Network Requirements

- Internal DNS: `gitea.bakery-ia.local`
- Ingress: Configured for Gitea and SigNoz
- Network Policies: Allow communication between namespaces

### Appendix C: Backup Procedures

```bash
# Backup Gitea
microk8s kubectl exec -n gitea gitea-0 -- gitea dump -c /data/gitea/conf/app.ini

# Backup Flux configurations
# ("get all" does not include custom resources, so they are listed explicitly)
microk8s kubectl get gitrepositories,kustomizations -n flux-system -o yaml > flux-backup.yaml

# Backup Tekton configurations
microk8s kubectl get tasks,pipelines,triggertemplates,triggerbindings,eventlisteners -n tekton-pipelines -o yaml > tekton-backup.yaml
```

### Appendix D: Troubleshooting Guide

**Issue: Pipeline not triggering**

- Check Gitea webhook logs
- Verify EventListener pods
- Check TriggerBinding configuration

**Issue: Build fails**

- Check Kaniko logs
- Verify Dockerfile paths
- Ensure registry credentials are correct

**Issue: Flux not applying changes**

- Check GitRepository status
- Verify Kustomization reconciliation
- Check Flux logs

---

## Conclusion

This implementation plan provides a clear path from manual deployments to a fully automated, self-hosted CI/CD system. The phased approach limits risk while delivering the benefits of automation, observability, and reliability.

### Next Steps

1. Review and approve this plan
2. Schedule Phase 1 implementation
3. Assign team members to specific tasks
4. Begin infrastructure setup

**Approval**:

- [ ] Team Lead
- [ ] DevOps Engineer
- [ ] Security Review

**Implementation Start Date**: _______________

**Target Completion Date**: _______________