Add new infra architecture 5

This commit is contained in:
Urtzi Alfaro
2026-01-19 15:15:04 +01:00
parent e96405b828
commit b78399da2c
84 changed files with 1027 additions and 2125 deletions

View File

@@ -0,0 +1,338 @@
# Mailu Deployment Architecture for Bakery-IA Project
## Executive Summary
This document outlines the recommended architecture for deploying Mailu email services across development and production environments for the Bakery-IA project. The solution addresses DNSSEC validation requirements while maintaining consistency across different Kubernetes platforms.
## Environment Overview
### Development Environment
- **Platform**: Kind (Kubernetes in Docker) or Colima
- **Purpose**: Local development and testing
- **Characteristics**: Ephemeral, single-node, resource-constrained
### Production Environment
- **Platform**: MicroK8s on Ubuntu VPS
- **Purpose**: Production email services
- **Characteristics**: Single-node or small cluster, persistent storage, production-grade reliability
## Core Requirements
1. **DNSSEC Validation**: Mailu v1.9+ requires a DNSSEC-validating resolver
2. **Cross-Environment Consistency**: Unified approach for dev and prod
3. **Resource Efficiency**: Optimized for constrained environments
4. **Reliability**: Production-grade availability and monitoring
## Architectural Solution
### Unified DNS Resolution Strategy
**Recommended Approach**: Deploy Unbound as a dedicated DNSSEC-validating resolver pod in both environments
#### Benefits:
- ✅ Consistent behavior across dev and prod
- ✅ Meets Mailu's DNSSEC requirements
- ✅ Privacy-preserving (no external DNS queries)
- ✅ Avoids rate-limiting from public DNS providers
- ✅ Full control over DNS resolution
### Implementation Components
#### 1. Unbound Deployment Manifest
```yaml
# unbound.yaml - Cross-environment compatible
apiVersion: apps/v1
kind: Deployment
metadata:
  name: unbound-resolver
  namespace: mailu
  labels:
    app: unbound
    component: dns
spec:
  replicas: 1  # Scale to 2+ in production with anti-affinity
  selector:
    matchLabels:
      app: unbound
  template:
    metadata:
      labels:
        app: unbound
        component: dns
    spec:
      containers:
        - name: unbound
          image: mvance/unbound:latest
          ports:
            - containerPort: 53
              name: dns-udp
              protocol: UDP
            - containerPort: 53
              name: dns-tcp
              protocol: TCP
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "300m"
              memory: "384Mi"
          readinessProbe:
            exec:
              # drill uses -D for DNSSEC (the dig-style +dnssec is not a drill option)
              command: ["drill", "-D", "-p", "53", "@127.0.0.1", "example.org"]
            initialDelaySeconds: 10
            periodSeconds: 30
          securityContext:
            capabilities:
              add: ["NET_BIND_SERVICE"]
---
apiVersion: v1
kind: Service
metadata:
  name: unbound-dns
  namespace: mailu
spec:
  selector:
    app: unbound
  ports:
    - name: dns-udp
      port: 53
      targetPort: 53
      protocol: UDP
    - name: dns-tcp
      port: 53
      targetPort: 53
      protocol: TCP
```
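The manifest above relies on the image's default configuration, which is documented to validate DNSSEC out of the box. To pin the behavior explicitly (for example, to fail closed on bogus signatures), a ConfigMap can be mounted over the default config. This is a sketch assuming the mvance/unbound image layout; the access-control CIDRs are placeholders to adjust to your cluster networks:

```yaml
# Hypothetical ConfigMap - mount over /opt/unbound/etc/unbound/unbound.conf
apiVersion: v1
kind: ConfigMap
metadata:
  name: unbound-config
  namespace: mailu
data:
  unbound.conf: |
    server:
      interface: 0.0.0.0
      # Placeholder CIDRs - restrict to your pod/service networks
      access-control: 10.0.0.0/8 allow
      access-control: 192.168.0.0/16 allow
      # DNSSEC validation using the root trust anchor shipped with the image
      auto-trust-anchor-file: "/opt/unbound/etc/unbound/var/root.key"
      val-permissive-mode: no  # fail closed on bogus signatures
```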
#### 2. Mailu Configuration (values.yaml)
```yaml
# Production-tuned Mailu configuration
dnsPolicy: "None"
dnsConfig:
  nameservers:
    - "10.152.183.x"  # Replace with the actual unbound service IP

# Component-specific DNS configuration
admin:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - "10.152.183.x"
rspamd:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - "10.152.183.x"

# Environment-specific configurations
persistence:
  enabled: true
  # Development: use default storage class
  # Production: use microk8s-hostpath or longhorn
  storageClass: "standard"
replicas: 1  # Increase in production as needed

# Security settings
secretKey: "generate-strong-key-here"

# Ingress configuration
# Use existing Bakery-IA ingress controller
```
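Hardcoding a `10.152.183.x` address is brittle because the ClusterIP is only assigned when the Service is created. One option, sketched below under the assumption that the chosen address is free inside your service CIDR (10.152.183.0/24 is the MicroK8s default), is to pin the resolver Service to a fixed ClusterIP so values files can reference a stable nameserver:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: unbound-dns
  namespace: mailu
spec:
  clusterIP: 10.152.183.53  # example address - must be unused in the service CIDR
  selector:
    app: unbound
  ports:
    - name: dns-udp
      port: 53
      protocol: UDP
    - name: dns-tcp
      port: 53
      protocol: TCP
```

Alternatively, keep injecting the dynamically assigned IP at deploy time with `--set`, as the deployment commands below do.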
### Environment-Specific Adaptations
#### Development (Kind/Colima)
**Optimizations:**
- Use hostPath volumes for persistence
- Reduce resource requests/limits
- Disable or simplify monitoring
- Use NodePort for external access
**Deployment:**
```bash
# Apply unbound
kubectl apply -f unbound.yaml
# Get unbound service IP
UNBOUND_IP=$(kubectl get svc unbound-dns -n mailu -o jsonpath='{.spec.clusterIP}')
# Deploy Mailu with dev-specific values
helm upgrade --install mailu mailu/mailu \
--namespace mailu \
-f values-dev.yaml \
--set dnsConfig.nameservers[0]=$UNBOUND_IP
```
#### Production (MicroK8s/Ubuntu)
**Enhancements:**
- Use Longhorn or OpenEBS for storage
- Enable monitoring and logging
- Configure proper ingress with TLS
- Set up backup solutions
**Deployment:**
```bash
# Enable required MicroK8s addons
# Note: metallb prompts for an address pool; it can be passed inline,
# e.g. metallb:10.64.140.43-10.64.140.49 (adjust to your network)
microk8s enable dns storage ingress metallb
# Apply unbound
kubectl apply -f unbound.yaml
# Get unbound service IP
UNBOUND_IP=$(kubectl get svc unbound-dns -n mailu -o jsonpath='{.spec.clusterIP}')
# Deploy Mailu with production values
helm upgrade --install mailu mailu/mailu \
--namespace mailu \
-f values-prod.yaml \
--set dnsConfig.nameservers[0]=$UNBOUND_IP
```
## Verification Procedures
### DNSSEC Validation Test
```bash
# From within a Mailu pod
kubectl exec -it -n mailu deploy/mailu-admin -- bash
# Test DNSSEC validation
dig @unbound-dns +dnssec example.org A
# A validating resolver sets the "ad" flag in the response header
# (note: +short suppresses the header, so use the full output)
```
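In the full `dig` output, successful validation shows up in the header as `;; flags: qr rd ra ad;`. A quick negative test is also useful: `dnssec-failed.org` is a deliberately mis-signed test domain, so a validating resolver should refuse to answer it:

```bash
# A validating resolver should return SERVFAIL for a deliberately broken domain
dig @unbound-dns dnssec-failed.org A
```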
### Service Health Checks
```bash
# Check unbound service
kubectl get pods -n mailu -l app=unbound
kubectl logs -n mailu -l app=unbound
# Check Mailu components
kubectl get pods -n mailu
kubectl logs -n mailu -l app.kubernetes.io/name=mailu
```
## Monitoring and Maintenance
### Production Monitoring Setup
```yaml
# Example monitoring configuration for production
# NOTE: Unbound does not serve Prometheus metrics on its DNS port; this assumes
# a "metrics" port exposed by an exporter sidecar (see the sketch below)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: unbound-monitor
  namespace: mailu
spec:
  selector:
    matchLabels:
      app: unbound
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
```
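Because stock Unbound exposes statistics only via its remote-control interface, the ServiceMonitor above assumes an exporter sidecar in the unbound-resolver Deployment and a matching `metrics` port on the `unbound-dns` Service. A sketch (the image reference is a placeholder; the letsencrypt/unbound_exporter project is one option to build from):

```yaml
# Hypothetical sidecar container for the unbound-resolver Deployment:
# translates Unbound statistics into Prometheus metrics on :9167
- name: unbound-exporter
  image: unbound-exporter:latest  # placeholder - build or pin your own image
  ports:
    - containerPort: 9167
      name: metrics
```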
### Backup Strategy
**Production:**
- Daily Velero backups of the Mailu namespace (example schedule below)
- Weekly database dumps
- Monthly full cluster snapshots
**Development:**
- On-demand backups before major changes
- Volume snapshots for critical data
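A sketch of the daily Velero backup mentioned above, assuming Velero is installed with a configured backup storage location:

```bash
# One-off backup of the Mailu namespace
velero backup create mailu-manual --include-namespaces mailu

# Daily backup at 02:00, retained for 30 days
velero schedule create mailu-daily \
  --schedule="0 2 * * *" \
  --include-namespaces mailu \
  --ttl 720h
```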
## Troubleshooting Guide
### Common Issues and Solutions
**Issue: DNSSEC validation failures**
- Verify unbound pod logs
- Check network policies
- Test DNS resolution from within pods (example command below)
**Issue: Mailu pods failing to start**
- Confirm DNS configuration in values.yaml
- Verify unbound service is reachable
- Check resource availability
**Issue: Performance problems**
- Monitor CPU/memory usage
- Adjust resource limits
- Consider scaling replicas
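For the in-pod DNS resolution test referenced above, a throwaway pod avoids touching the Mailu containers (busybox `nslookup` covers basic reachability; use an image with `dig` or `drill` for DNSSEC-specific checks):

```bash
UNBOUND_IP=$(kubectl get svc unbound-dns -n mailu -o jsonpath='{.spec.clusterIP}')
kubectl run dns-test -n mailu --rm -it --restart=Never \
  --image=busybox:1.36 -- nslookup example.org "$UNBOUND_IP"
```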
## Migration Path
### From Development to Production
1. **Configuration Migration**
- Update storage class from hostPath to production storage
- Adjust resource requests/limits
- Enable monitoring and logging
2. **Data Migration**
- Export development data
- Import into production environment
- Verify data integrity
3. **DNS Configuration**
- Update DNS records to point to production
- Verify TLS certificates
- Test email delivery
## Security Considerations
### Production Security Hardening
1. **Network Security**
- Implement network policies (sketch below)
- Restrict ingress/egress traffic
- Use TLS for all external communications
2. **Access Control**
- Implement RBAC for Mailu namespace
- Restrict admin access
- Use strong authentication
3. **Monitoring and Alerting**
- Set up anomaly detection
- Configure alert thresholds
- Implement log retention policies
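As a concrete starting point for the network-policy item above, the sketch below restricts the resolver so only pods in the mailu namespace can query it. The label selectors are assumptions; match them to your actual pod labels:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: unbound-allow-mailu-only
  namespace: mailu
spec:
  podSelector:
    matchLabels:
      app: unbound
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}  # any pod in the mailu namespace
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```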
## Cost Optimization
### Resource Management
**Development:**
- Use minimal resource allocations
- Scale down when not in use
- Clean up unused resources
**Production:**
- Right-size resource requests
- Implement auto-scaling where possible
- Monitor and optimize usage patterns
## Conclusion
This architecture provides a robust, consistent solution for deploying Mailu across development and production environments. By using Unbound as a dedicated DNSSEC-validating resolver, we ensure compliance with Mailu's requirements while maintaining flexibility and reliability across different Kubernetes platforms.
The solution is designed to be:
- **Consistent**: Same core architecture across environments
- **Reliable**: Production-grade availability and monitoring
- **Efficient**: Optimized resource usage
- **Maintainable**: Clear documentation and troubleshooting guides
This approach aligns with the Bakery-IA project's requirements for a secure, reliable email infrastructure that can be consistently deployed across different environments.

View File

@@ -473,13 +473,16 @@ k8s_image_json_path(
# Redis & RabbitMQ
k8s_resource('redis', resource_deps=['security-setup'], labels=['01-infrastructure'])
k8s_resource('rabbitmq', labels=['01-infrastructure'])
k8s_resource('rabbitmq', resource_deps=['security-setup'], labels=['01-infrastructure'])
k8s_resource('nominatim', labels=['01-infrastructure'])
# MinIO Storage
k8s_resource('minio', resource_deps=['security-setup'], labels=['01-infrastructure'])
k8s_resource('minio-bucket-init', resource_deps=['minio'], labels=['01-infrastructure'])
# Unbound DNSSEC Resolver - Infrastructure component for Mailu DNS validation
k8s_resource('unbound-resolver', resource_deps=['security-setup'], labels=['01-infrastructure'])
# Mail Infrastructure (Mailu) - Manual trigger for Helm deployment
local_resource(
    'mailu-helm',
@@ -542,6 +545,7 @@ local_resource(
    auto_init=False,  # Manual trigger only
)
# =============================================================================
# MONITORING RESOURCES - SigNoz (Unified Observability)
# =============================================================================

View File

@@ -433,6 +433,45 @@ microk8s enable prometheus
microk8s enable registry
```
### Step 3: Enhanced Infrastructure Components
**The platform includes additional infrastructure components that enhance security, monitoring, and operations:**
```bash
# The platform includes Mailu for email services
# Deploy Mailu via Helm (optional but recommended for production):
kubectl create namespace bakery-ia --dry-run=client -o yaml | kubectl apply -f -
helm repo add mailu https://mailu.github.io/helm-charts
helm repo update
helm install mailu mailu/mailu \
-n bakery-ia \
-f infrastructure/platform/mail/mailu-helm/values.yaml \
--timeout 10m \
--wait
# Verify Mailu deployment
kubectl get pods -n bakery-ia | grep mailu
```
**For development environments, ensure the prepull-base-images script is run:**
```bash
# On your local machine, run the prepull script to cache base images
cd bakery-ia
chmod +x scripts/prepull-base-images.sh
./scripts/prepull-base-images.sh
```
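If the dev overlay's local-registry image overrides are in use, base images such as `mvance/unbound` must be mirrored under the same naming convention. The prepull script is expected to cover this; if an image is missing, it can be mirrored manually (a sketch; the `localhost:5000/mvance_unbound_latest` name comes from the kustomize image overrides):

```bash
docker pull mvance/unbound:latest
docker tag mvance/unbound:latest localhost:5000/mvance_unbound_latest:latest
docker push localhost:5000/mvance_unbound_latest:latest
```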
**For production environments, ensure CI/CD infrastructure is properly configured:**
```bash
# Tekton Pipelines for CI/CD (optional - can be deployed separately)
kubectl create namespace tekton-pipelines
kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/latest/release.yaml
kubectl apply -f https://storage.googleapis.com/tekton-releases/triggers/latest/release.yaml
# Flux CD for GitOps (already enabled in MicroK8s if needed)
# flux install --namespace=flux-system --network-policy=false
```
### Step 4: Configure Firewall
```bash
@@ -917,7 +956,34 @@ echo -n "your-value-here" | base64
**CRITICAL:** Never commit real secrets to git! The secrets.yaml file should be in `.gitignore`.
### Step 2: Apply Application Secrets
### Step 2: CI/CD Secrets Configuration
**For production CI/CD setup, additional secrets are required:**
```bash
# Create Docker Hub credentials secret (for image pulls)
kubectl create secret docker-registry dockerhub-creds \
--docker-server=docker.io \
--docker-username=YOUR_DOCKERHUB_USERNAME \
--docker-password=YOUR_DOCKERHUB_TOKEN \
--docker-email=your-email@example.com \
-n bakery-ia
# Create Gitea registry credentials (if using Gitea for CI/CD)
kubectl create secret docker-registry gitea-registry-credentials \
-n tekton-pipelines \
--docker-server=gitea.bakery-ia.local:5000 \
--docker-username=your-username \
--docker-password=your-password
# Create Git credentials for Flux (if using GitOps)
kubectl create secret generic gitea-credentials \
-n flux-system \
--from-literal=username=your-username \
--from-literal=password=your-password
```
### Step 3: Apply Application Secrets
```bash
# Copy manifests to VPS (from local machine)
@@ -938,7 +1004,30 @@ kubectl get secrets -n bakery-ia
## Database Migrations
### Step 0: Deploy SigNoz Monitoring (BEFORE Application)
### Step 0: Deploy CI/CD Infrastructure (Optional but Recommended)
**For production environments, deploy CI/CD infrastructure components:**
```bash
# Deploy Tekton Pipelines for CI/CD (optional but recommended for production)
kubectl create namespace tekton-pipelines
# Install Tekton Pipelines
kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/latest/release.yaml
# Install Tekton Triggers
kubectl apply -f https://storage.googleapis.com/tekton-releases/triggers/latest/release.yaml
# Apply Tekton configurations
kubectl apply -f ~/infrastructure/cicd/tekton/tasks/
kubectl apply -f ~/infrastructure/cicd/tekton/pipelines/
kubectl apply -f ~/infrastructure/cicd/tekton/triggers/
# Verify Tekton deployment
kubectl get pods -n tekton-pipelines
```
### Step 1: Deploy SigNoz Monitoring (BEFORE Application)
**⚠️ CRITICAL:** SigNoz must be deployed into the **bakery-ia namespace** BEFORE the application, because the production kustomization patches SigNoz resources.
@@ -975,7 +1064,7 @@ kubectl get statefulset -n bakery-ia | grep signoz
**⚠️ Important:** Do NOT create a separate `signoz` namespace. SigNoz must be in `bakery-ia` namespace for the overlays to work correctly.
### Step 1: Deploy Application and Databases
### Step 2: Deploy Application and Databases
```bash
# On VPS
@@ -1271,6 +1360,88 @@ kubectl logs -n bakery-ia deployment/signoz-otel-collector --tail=50 | grep -i "
kubectl logs -n bakery-ia deployment/signoz-otel-collector | grep filelog
```
### Step 2: Configure CI/CD Infrastructure (Optional but Recommended)
If you deployed the CI/CD infrastructure, configure it for your workflow:
#### Gitea Setup (Git Server + Registry)
```bash
# Access Gitea at: http://gitea.bakery-ia.local (for dev) or http://gitea.bakewise.ai (for prod)
# Make sure to add the appropriate hostname to /etc/hosts or configure DNS
# Create your repositories for each service
# Configure webhook to trigger Tekton pipelines
```
#### Tekton Pipeline Configuration
```bash
# Verify Tekton pipelines are running
kubectl get pods -n tekton-pipelines
# Create a PipelineRun manually to test:
kubectl create -f - <<EOF
apiVersion: tekton.dev/v1  # v1beta1 is deprecated in current Tekton releases
kind: PipelineRun
metadata:
  name: manual-ci-run
  namespace: tekton-pipelines
spec:
  pipelineRef:
    name: bakery-ia-ci
  workspaces:
    - name: shared-workspace
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi
    - name: docker-credentials
      secret:
        secretName: gitea-registry-credentials
  params:
    - name: git-url
      value: "http://gitea.bakery-ia.local/bakery/bakery-ia.git"
    - name: git-revision
      value: "main"
EOF
```
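To follow the run created above (assuming the `tkn` CLI is installed):

```bash
# Stream logs of the PipelineRun
tkn pipelinerun logs manual-ci-run -n tekton-pipelines -f

# Or watch its status with kubectl
kubectl get pipelinerun manual-ci-run -n tekton-pipelines -w
```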
#### Flux CD Configuration (GitOps)
```bash
# Verify Flux is running
kubectl get pods -n flux-system
# Set up GitRepository and Kustomization resources for GitOps deployment
# Example:
cat <<EOF | kubectl apply -f -
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: bakery-ia
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/your-org/bakery-ia.git
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: bakery-ia
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: bakery-ia
  path: ./infrastructure/environments/prod/k8s-manifests
  prune: true
  # note: spec.validation was removed in the v1 API
EOF
```
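Rather than waiting for the poll interval, a sync can be forced with the Flux CLI:

```bash
# Trigger an immediate reconciliation
flux reconcile source git bakery-ia
flux reconcile kustomization bakery-ia --with-source
```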
### Step 3: Configure Alerting
SigNoz includes integrated alerting with AlertManager. Configure it for your team:

View File

@@ -12,14 +12,15 @@
1. [Overview](#overview)
2. [Monitoring & Observability](#monitoring--observability)
3. [Security Operations](#security-operations)
4. [Database Management](#database-management)
5. [Backup & Recovery](#backup--recovery)
6. [Performance Optimization](#performance-optimization)
7. [Scaling Operations](#scaling-operations)
8. [Incident Response](#incident-response)
9. [Maintenance Tasks](#maintenance-tasks)
10. [Compliance & Audit](#compliance--audit)
3. [CI/CD Operations](#cicd-operations)
4. [Security Operations](#security-operations)
5. [Database Management](#database-management)
6. [Backup & Recovery](#backup--recovery)
7. [Performance Optimization](#performance-optimization)
8. [Scaling Operations](#scaling-operations)
9. [Incident Response](#incident-response)
10. [Maintenance Tasks](#maintenance-tasks)
11. [Compliance & Audit](#compliance--audit)
---
@@ -33,6 +34,8 @@
- **Capacity:** 10-tenant pilot (scalable to 100+)
- **Security:** TLS encryption, RBAC, audit logging
- **Monitoring:** Prometheus, Grafana, AlertManager, SigNoz
- **CI/CD:** Tekton Pipelines, Gitea, Flux CD (GitOps)
- **Email:** Mailu (integrated email server)
**Key Metrics (10-tenant baseline):**
- **Uptime Target:** 99.5% (3.65 hours downtime/month)
@@ -46,11 +49,12 @@
| Role | Responsibilities |
|------|------------------|
| **DevOps Engineer** | Deployment, infrastructure, scaling |
| **DevOps Engineer** | Deployment, infrastructure, scaling, CI/CD |
| **SRE** | Monitoring, incident response, performance |
| **Security Admin** | Access control, security patches, compliance |
| **Database Admin** | Backups, optimization, migrations |
| **On-Call Engineer** | 24/7 incident response (if applicable) |
| **CI/CD Admin** | Pipeline management, GitOps workflows |
---
@@ -73,18 +77,6 @@ SigNoz is a comprehensive, open-source observability platform that provides:
- **Database Monitoring** - All 18 PostgreSQL databases + Redis + RabbitMQ
- **Kubernetes Monitoring** - Cluster, node, pod, and container metrics
**Port Forwarding (if ingress not available):**
```bash
# SigNoz Frontend (Main UI)
kubectl port-forward -n bakery-ia svc/signoz 8080:8080
# SigNoz AlertManager
kubectl port-forward -n bakery-ia svc/signoz-alertmanager 9093:9093
# OTel Collector (for debugging)
kubectl port-forward -n bakery-ia svc/signoz-otel-collector 4317:4317 # gRPC
kubectl port-forward -n bakery-ia svc/signoz-otel-collector 4318:4318 # HTTP
```
### Key SigNoz Dashboards and Features
@@ -340,6 +332,116 @@ kubectl logs -n bakery-ia deployment/signoz-otel-collector | grep k8sattributes
---
## CI/CD Operations
### CI/CD Infrastructure Overview
The platform includes a complete CI/CD pipeline using:
- **Gitea** - Git server and container registry
- **Tekton** - Pipeline automation
- **Flux CD** - GitOps deployment
### Access CI/CD Systems
**Gitea (Git Server):**
- URL: http://gitea.bakery-ia.local (development) or http://gitea.bakewise.ai (production)
- Admin panel: http://gitea.bakery-ia.local/admin
**Tekton Dashboard:**
```bash
# Port forward to access Tekton dashboard
kubectl port-forward -n tekton-pipelines svc/tekton-dashboard 9097:9097
# Access at: http://localhost:9097
```
**Flux Status:**
```bash
# Check Flux status
flux check
kubectl get gitrepository -n flux-system
kubectl get kustomization -n flux-system
```
### CI/CD Monitoring
**Check pipeline status:**
```bash
# List all PipelineRuns
kubectl get pipelineruns -n tekton-pipelines
# Check Tekton controller logs
kubectl logs -n tekton-pipelines -l app=tekton-pipelines-controller
# Check Tekton dashboard logs
kubectl logs -n tekton-pipelines -l app=tekton-dashboard
```
**Monitor GitOps synchronization:**
```bash
# Check GitRepository status
kubectl get gitrepository -n flux-system -o wide
# Check Kustomization status
kubectl get kustomization -n flux-system -o wide
# Get reconciliation history
kubectl get events -n flux-system --sort-by='.lastTimestamp'
```
### CI/CD Troubleshooting
**Pipeline not triggering:**
```bash
# Check Gitea webhook logs
kubectl logs -n tekton-pipelines -l app=tekton-triggers-controller
# Verify EventListener pods are running
kubectl get pods -n tekton-pipelines -l app=tekton-triggers-eventlistener
# Check TriggerBinding configuration
kubectl get triggerbinding -n tekton-pipelines
```
**Build failures:**
```bash
# Check Kaniko logs for build errors
kubectl logs -n tekton-pipelines -l tekton.dev/task=kaniko-build
# Verify Dockerfile paths are correct
kubectl describe taskrun -n tekton-pipelines
```
**Flux not applying changes:**
```bash
# Check GitRepository status
kubectl describe gitrepository -n flux-system
# Check Kustomization reconciliation
kubectl describe kustomization -n flux-system
# Check Flux controller logs (kustomize-controller applies Kustomizations,
# source-controller fetches Git)
kubectl logs -n flux-system deploy/kustomize-controller
kubectl logs -n flux-system deploy/source-controller
```
### CI/CD Maintenance Tasks
**Daily Tasks:**
- [ ] Check for failed pipeline runs
- [ ] Verify GitOps synchronization status
- [ ] Clean up old PipelineRun resources (see the command below)
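For the PipelineRun cleanup above, the `tkn` CLI supports retention-based deletion (a sketch, assuming `tkn` is installed):

```bash
# Keep the 10 most recent PipelineRuns, delete the rest without prompting
tkn pipelinerun delete --keep 10 -n tekton-pipelines -f
```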
**Weekly Tasks:**
- [ ] Review pipeline performance metrics
- [ ] Update pipeline definitions if needed
- [ ] Rotate CI/CD secrets
**Monthly Tasks:**
- [ ] Update Tekton and Flux versions
- [ ] Review and optimize pipeline performance
- [ ] Audit CI/CD access permissions
---
## Security Operations
### Security Posture Overview
@@ -1210,6 +1312,8 @@ kubectl exec -n bakery-ia deployment/auth-db -- \
- [TLS Configuration](./tls-configuration.md) - Certificate management
- [RBAC Implementation](./rbac-implementation.md) - Access control configuration
- [Monitoring Stack README](../infrastructure/kubernetes/base/components/monitoring/README.md) - Detailed monitoring documentation
- [CI/CD Infrastructure README](../infrastructure/cicd/README.md) - Gitea, Tekton, and Flux CD setup and operations
- [SigNoz Monitoring README](../infrastructure/monitoring/signoz/README.md) - SigNoz deployment and configuration
**External Resources:**
- Kubernetes: https://kubernetes.io/docs

View File

@@ -28,6 +28,7 @@ metadata:
note: "Registry credentials for pushing images"
type: kubernetes.io/dockerconfigjson
stringData:
  {{- if and .Values.secrets.registry.registryUrl .Values.secrets.registry.username .Values.secrets.registry.password }}
  .dockerconfigjson: |
    {
      "auths": {
@@ -37,6 +38,9 @@ stringData:
        }
      }
    }
  {{- else }}
  .dockerconfigjson: '{"auths":{}}'
  {{- end }}
---
# Secret for Git credentials (used by pipeline to push GitOps updates)
apiVersion: v1

View File

@@ -83,6 +83,10 @@ images:
  - name: bitnami/kubectl
    newName: localhost:5000/bitnami_kubectl_latest
    newTag: latest
  # DNS resolver
  - name: mvance/unbound
    newName: localhost:5000/mvance_unbound_latest
    newTag: latest
  # Alpine variants
  - name: alpine
    newName: localhost:5000/alpine_3.19

View File

@@ -221,6 +221,9 @@ images:
    newTag: latest
  - name: bitnami/kubectl
    newTag: latest
  # DNS resolver
  - name: mvance/unbound
    newTag: latest
  # Alpine variants
  - name: alpine
    newTag: "3.19"

View File

@@ -10,844 +10,3 @@ global:
  clusterName: "bakery-ia-dev"
  domain: "monitoring.bakery-ia.local"
  # Docker Hub credentials - applied to all sub-charts (including Zookeeper, ClickHouse, etc)
  imagePullSecrets:
    - dockerhub-creds
# Docker Hub credentials for pulling images (root level for SigNoz components)
imagePullSecrets:
  - dockerhub-creds
# SigNoz Main Component (includes frontend and query service)
signoz:
  replicaCount: 1
  service:
    type: ClusterIP
    port: 8080
  # DISABLE built-in ingress - using unified bakery-ingress instead
  # Route configured in infrastructure/kubernetes/overlays/dev/dev-ingress.yaml
  ingress:
    enabled: false
  resources:
    requests:
      cpu: 100m # Combined frontend + query service
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 1Gi
  # Environment variables (new format - replaces configVars)
  env:
    signoz_telemetrystore_provider: "clickhouse"
    dot_metrics_enabled: "true"
    signoz_emailing_enabled: "false"
    signoz_alertmanager_provider: "signoz"
    # Retention for dev (7 days)
    signoz_traces_ttl_duration_hrs: "168"
    signoz_metrics_ttl_duration_hrs: "168"
    signoz_logs_ttl_duration_hrs: "168"
    # OpAMP Server Configuration - DISABLED for dev (causes gRPC instability)
    signoz_opamp_server_enabled: "false"
    # signoz_opamp_server_endpoint: "0.0.0.0:4320"
  persistence:
    enabled: true
    size: 5Gi
    storageClass: "standard"
# AlertManager Configuration
alertmanager:
  replicaCount: 1
  image:
    repository: signoz/alertmanager
    tag: 0.23.5
    pullPolicy: IfNotPresent
  service:
    type: ClusterIP
    port: 9093
  resources:
    requests:
      cpu: 25m # Reduced for local dev
      memory: 64Mi # Reduced for local dev
    limits:
      cpu: 200m
      memory: 256Mi
  persistence:
    enabled: true
    size: 2Gi
    storageClass: "standard"
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 12h
      receiver: 'default'
    receivers:
      - name: 'default'
        # Add email, slack, webhook configs here
# ClickHouse Configuration - Time Series Database
# Minimal resources for local development on constrained Kind cluster
clickhouse:
  enabled: true
  installCustomStorageClass: false
  image:
    registry: docker.io
    repository: clickhouse/clickhouse-server
    tag: 25.5.6 # Official recommended version
  # Reduce ClickHouse resource requests for local dev
  clickhouse:
    resources:
      requests:
        cpu: 200m # Reduced from default 500m
        memory: 512Mi
      limits:
        cpu: 1000m
        memory: 1Gi
  persistence:
    enabled: true
    size: 20Gi
# Zookeeper Configuration (required by ClickHouse)
zookeeper:
  enabled: true
  replicaCount: 1 # Single replica for dev
  image:
    tag: 3.7.1 # Official recommended version
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi
  persistence:
    enabled: true
    size: 5Gi
# OpenTelemetry Collector - Data ingestion endpoint for all telemetry
otelCollector:
  enabled: true
  replicaCount: 1
  image:
    repository: signoz/signoz-otel-collector
    tag: v0.129.12 # Latest recommended version
  # OpAMP Configuration - DISABLED for development
  # OpAMP is designed for production with remote config management
  # In dev, it causes gRPC instability and collector reloads
  # We use static configuration instead
  # Init containers for the Otel Collector pod
  initContainers:
    fix-postgres-tls:
      enabled: true
      image:
        registry: docker.io
        repository: busybox
        tag: 1.35
        pullPolicy: IfNotPresent
      command:
        - sh
        - -c
        - |
          echo "Fixing PostgreSQL TLS file permissions..."
          cp /etc/postgres-tls-source/* /etc/postgres-tls/
          chmod 600 /etc/postgres-tls/server-key.pem
          chmod 644 /etc/postgres-tls/server-cert.pem
          chmod 644 /etc/postgres-tls/ca-cert.pem
          echo "PostgreSQL TLS permissions fixed"
      volumeMounts:
        - name: postgres-tls-source
          mountPath: /etc/postgres-tls-source
          readOnly: true
        - name: postgres-tls-fixed
          mountPath: /etc/postgres-tls
          readOnly: false
  # Service configuration - expose both gRPC and HTTP endpoints
  service:
    type: ClusterIP
    ports:
      # gRPC receivers
      - name: otlp-grpc
        port: 4317
        targetPort: 4317
        protocol: TCP
      # HTTP receivers
      - name: otlp-http
        port: 4318
        targetPort: 4318
        protocol: TCP
      # Prometheus remote write
      - name: prometheus
        port: 8889
        targetPort: 8889
        protocol: TCP
      # Metrics
      - name: metrics
        port: 8888
        targetPort: 8888
        protocol: TCP
  resources:
    requests:
      cpu: 50m # Reduced from 100m
      memory: 128Mi # Reduced from 256Mi
    limits:
      cpu: 500m
      memory: 512Mi
  # Additional environment variables for receivers
  additionalEnvs:
    POSTGRES_MONITOR_USER: "monitoring"
    POSTGRES_MONITOR_PASSWORD: "monitoring_369f9c001f242b07ef9e2826e17169ca"
    REDIS_PASSWORD: "OxdmdJjdVNXp37MNC2IFoMnTpfGGFv1k"
    RABBITMQ_USER: "bakery"
    RABBITMQ_PASSWORD: "forecast123"
  # Mount TLS certificates for secure connections
  extraVolumes:
    - name: redis-tls
      secret:
        secretName: redis-tls-secret
    - name: postgres-tls
      secret:
        secretName: postgres-tls
    - name: postgres-tls-fixed
      emptyDir: {}
    - name: varlogpods
      hostPath:
        path: /var/log/pods
  extraVolumeMounts:
    - name: redis-tls
      mountPath: /etc/redis-tls
      readOnly: true
    - name: postgres-tls
      mountPath: /etc/postgres-tls-source
      readOnly: true
    - name: postgres-tls-fixed
      mountPath: /etc/postgres-tls
      readOnly: false
    - name: varlogpods
      mountPath: /var/log/pods
      readOnly: true
  # Disable OpAMP - use static configuration only
  # Use 'args' instead of 'extraArgs' to completely override the command
  command:
    name: /signoz-otel-collector
    args:
      - --config=/conf/otel-collector-config.yaml
      - --feature-gates=-pkg.translator.prometheus.NormalizeName
  # OpenTelemetry Collector configuration
  config:
    # Connectors - bridge between pipelines
    connectors:
      signozmeter:
        dimensions:
          - name: service.name
          - name: deployment.environment
          - name: host.name
        metrics_flush_interval: 1h
    receivers:
      # OTLP receivers for traces, metrics, and logs from applications
      # All application telemetry is pushed via OTLP protocol
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
            cors:
              allowed_origins:
                - "*"
      # Filelog receiver for Kubernetes pod logs
      # Collects container stdout/stderr from /var/log/pods
      filelog:
        include:
          - /var/log/pods/*/*/*.log
        exclude:
          # Exclude SigNoz's own logs to avoid recursive collection
          - /var/log/pods/bakery-ia_signoz-*/*/*.log
        include_file_path: true
        include_file_name: false
        operators:
          # Parse CRI-O / containerd log format
          - type: regex_parser
            regex: '^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) (?P<log>.*)$'
            timestamp:
              parse_from: attributes.time
              layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          # Fix timestamp parsing - extract from the parsed time field
          - type: move
            from: attributes.time
            to: attributes.timestamp
          # Extract Kubernetes metadata from file path
          - type: regex_parser
            id: extract_metadata_from_filepath
            regex: '^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[^\/]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$'
            parse_from: attributes["log.file.path"]
          # Move metadata to resource attributes
          - type: move
            from: attributes.namespace
            to: resource["k8s.namespace.name"]
          - type: move
            from: attributes.pod_name
            to: resource["k8s.pod.name"]
          - type: move
            from: attributes.container_name
            to: resource["k8s.container.name"]
          - type: move
            from: attributes.log
            to: body
      # Kubernetes Cluster Receiver - Collects cluster-level metrics
      # Provides information about nodes, namespaces, pods, and other cluster resources
      k8s_cluster:
        collection_interval: 30s
        node_conditions_to_report:
          - Ready
          - MemoryPressure
          - DiskPressure
          - PIDPressure
          - NetworkUnavailable
        allocatable_types_to_report:
          - cpu
          - memory
          - pods
      # PostgreSQL receivers for database metrics
      # ENABLED: Monitor users configured and credentials stored in secrets
      # Collects metrics directly from PostgreSQL databases with proper TLS
      postgresql/auth:
        endpoint: auth-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - auth_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/inventory:
        endpoint: inventory-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - inventory_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/orders:
        endpoint: orders-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - orders_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/ai-insights:
        endpoint: ai-insights-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - ai_insights_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/alert-processor:
        endpoint: alert-processor-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - alert_processor_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/distribution:
        endpoint: distribution-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - distribution_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/external:
        endpoint: external-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - external_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/forecasting:
        endpoint: forecasting-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - forecasting_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/notification:
        endpoint: notification-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - notification_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/orchestrator:
        endpoint: orchestrator-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - orchestrator_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/pos:
        endpoint: pos-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - pos_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/procurement:
        endpoint: procurement-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - procurement_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/production:
        endpoint: production-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - production_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/recipes:
        endpoint: recipes-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - recipes_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/sales:
        endpoint: sales-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - sales_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/suppliers:
        endpoint: suppliers-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - suppliers_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/tenant:
        endpoint: tenant-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - tenant_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/training:
        endpoint: training-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - training_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      # Redis receiver for cache metrics
      # ENABLED: Using existing credentials from redis-secrets with TLS
      redis:
        endpoint: redis-service.bakery-ia:6379
        password: ${env:REDIS_PASSWORD}
        collection_interval: 60s
        transport: tcp
        tls:
          insecure_skip_verify: false
          cert_file: /etc/redis-tls/redis-cert.pem
          key_file: /etc/redis-tls/redis-key.pem
          ca_file: /etc/redis-tls/ca-cert.pem
        metrics:
          redis.maxmemory:
            enabled: true
          redis.cmd.latency:
            enabled: true
      # RabbitMQ receiver via management API
      # ENABLED: Using existing credentials from rabbitmq-secrets
      rabbitmq:
        endpoint: http://rabbitmq-service.bakery-ia:15672
        username: ${env:RABBITMQ_USER}
        password: ${env:RABBITMQ_PASSWORD}
        collection_interval: 30s
      # Prometheus Receiver - Scrapes metrics from Kubernetes API
      # Simplified configuration using only Kubernetes API metrics
      prometheus:
        config:
          scrape_configs:
            - job_name: 'kubernetes-nodes-cadvisor'
              scrape_interval: 30s
              scrape_timeout: 10s
              scheme: https
              tls_config:
                insecure_skip_verify: true
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
              kubernetes_sd_configs:
                - role: node
              relabel_configs:
                - action: labelmap
                  regex: __meta_kubernetes_node_label_(.+)
                - target_label: __address__
                  replacement: kubernetes.default.svc:443
                - source_labels: [__meta_kubernetes_node_name]
                  regex: (.+)
                  target_label: __metrics_path__
                  replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
            - job_name: 'kubernetes-apiserver'
              scrape_interval: 30s
              scrape_timeout: 10s
              scheme: https
              tls_config:
                insecure_skip_verify: true
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
              kubernetes_sd_configs:
                - role: endpoints
              relabel_configs:
                - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
                  action: keep
                  regex: default;kubernetes;https
    processors:
      # Batch processor for better performance (optimized for high throughput)
      batch:
        timeout: 1s
        send_batch_size: 10000 # Increased from 1024 for better performance
        send_batch_max_size: 10000
      # Batch processor for meter data
      batch/meter:
        timeout: 1s
        send_batch_size: 20000
        send_batch_max_size: 25000
      # Memory limiter to prevent OOM
      memory_limiter:
        check_interval: 1s
        limit_mib: 400
        spike_limit_mib: 100
      # Resource detection
      resourcedetection:
        detectors: [env, system, docker]
        timeout: 5s
      # Kubernetes attributes processor - CRITICAL for logs
      # Extracts pod, namespace, container metadata from log attributes
      k8sattributes:
        auth_type: "serviceAccount"
        passthrough: false
        extract:
          metadata:
            - k8s.pod.name
            - k8s.pod.uid
            - k8s.deployment.name
            - k8s.namespace.name
            - k8s.node.name
            - k8s.container.name
          labels:
            - tag_name: "app"
            - tag_name: "pod-template-hash"
          annotations:
            - tag_name: "description"
      # SigNoz span metrics processor with delta aggregation (recommended)
      # Generates RED metrics (Rate, Error, Duration) from trace spans
      signozspanmetrics/delta:
        aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
        metrics_exporter: signozclickhousemetrics
        latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s]
        dimensions_cache_size: 100000
        dimensions:
          - name: service.namespace
            default: default
          - name: deployment.environment
            default: default
          - name: signoz.collector.id
    exporters:
      # ClickHouse exporter for traces
      clickhousetraces:
        datasource: tcp://admin:27ff0399-0d3a-4bd8-919d-17c2181e6fb9@signoz-clickhouse:9000/?database=signoz_traces
        timeout: 10s
        retry_on_failure:
          enabled: true
          initial_interval: 5s
          max_interval: 30s
          max_elapsed_time: 300s
      # ClickHouse exporter for metrics
      signozclickhousemetrics:
        dsn: "tcp://admin:27ff0399-0d3a-4bd8-919d-17c2181e6fb9@signoz-clickhouse:9000/signoz_metrics"
        timeout: 10s
        retry_on_failure:
          enabled: true
          initial_interval: 5s
          max_interval: 30s
          max_elapsed_time: 300s
      # ClickHouse exporter for meter data (usage metrics)
      signozclickhousemeter:
        dsn: "tcp://admin:27ff0399-0d3a-4bd8-919d-17c2181e6fb9@signoz-clickhouse:9000/signoz_meter"
        timeout: 45s
        sending_queue:
          enabled: false
      # ClickHouse exporter for logs
      clickhouselogsexporter:
        dsn: tcp://admin:27ff0399-0d3a-4bd8-919d-17c2181e6fb9@signoz-clickhouse:9000/?database=signoz_logs
        timeout: 10s
        retry_on_failure:
          enabled: true
          initial_interval: 5s
          max_interval: 30s
      # Metadata exporter for service metadata
      metadataexporter:
        dsn: "tcp://admin:27ff0399-0d3a-4bd8-919d-17c2181e6fb9@signoz-clickhouse:9000/signoz_metadata"
        timeout: 10s
        cache:
          provider: in_memory
      # Debug exporter for debugging (optional)
      debug:
        verbosity: detailed
        sampling_initial: 5
        sampling_thereafter: 200
    service:
      pipelines:
        # Traces pipeline - exports to ClickHouse and signozmeter connector
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch, signozspanmetrics/delta, resourcedetection]
          exporters: [clickhousetraces, metadataexporter, signozmeter]
        # Metrics pipeline
        metrics:
          receivers: [otlp,
            postgresql/auth, postgresql/inventory, postgresql/orders,
            postgresql/ai-insights, postgresql/alert-processor, postgresql/distribution,
            postgresql/external, postgresql/forecasting, postgresql/notification,
            postgresql/orchestrator, postgresql/pos, postgresql/procurement,
            postgresql/production, postgresql/recipes, postgresql/sales,
            postgresql/suppliers, postgresql/tenant, postgresql/training,
            redis, rabbitmq, k8s_cluster, prometheus]
          processors: [memory_limiter, batch, resourcedetection]
          exporters: [signozclickhousemetrics]
        # Meter pipeline - receives from signozmeter connector
        metrics/meter:
          receivers: [signozmeter]
          processors: [batch/meter]
          exporters: [signozclickhousemeter]
        # Logs pipeline - includes both OTLP and Kubernetes pod logs
        logs:
          receivers: [otlp, filelog]
          processors: [memory_limiter, batch, resourcedetection, k8sattributes]
          exporters: [clickhouselogsexporter]
# ClusterRole configuration for Kubernetes monitoring
# CRITICAL: Required for k8s_cluster receiver to access Kubernetes API
# Without these permissions, k8s metrics will not appear in SigNoz UI
clusterRole:
  create: true
  name: "signoz-otel-collector-bakery-ia"
  annotations: {}
  # Complete RBAC rules required by k8sclusterreceiver
  # Based on OpenTelemetry and SigNoz official documentation
  rules:
    # Core API group - fundamental Kubernetes resources
    - apiGroups: [""]
      resources:
        - "events"
        - "namespaces"
        - "nodes"
        - "nodes/proxy"
        - "nodes/metrics"
        - "nodes/spec"
        - "pods"
        - "pods/status"
        - "replicationcontrollers"
        - "replicationcontrollers/status"
        - "resourcequotas"
        - "services"
        - "endpoints"
      verbs: ["get", "list", "watch"]
    # Apps API group - modern workload controllers
    - apiGroups: ["apps"]
      resources: ["deployments", "daemonsets", "statefulsets", "replicasets"]
      verbs: ["get", "list", "watch"]
    # Batch API group - job management
    - apiGroups: ["batch"]
      resources: ["jobs", "cronjobs"]
      verbs: ["get", "list", "watch"]
    # Autoscaling API group - HPA metrics (CRITICAL)
    - apiGroups: ["autoscaling"]
      resources: ["horizontalpodautoscalers"]
      verbs: ["get", "list", "watch"]
    # Extensions API group - legacy support
    - apiGroups: ["extensions"]
      resources: ["deployments", "daemonsets", "replicasets"]
      verbs: ["get", "list", "watch"]
    # Metrics API group - resource metrics
    - apiGroups: ["metrics.k8s.io"]
      resources: ["nodes", "pods"]
      verbs: ["get", "list", "watch"]
clusterRoleBinding:
  annotations: {}
  name: "signoz-otel-collector-bakery-ia"
# Additional Configuration
serviceAccount:
  create: true
  annotations: {}
  name: "signoz-otel-collector"
# Security Context
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
# Network Policies (disabled for dev)
networkPolicy:
  enabled: false
# Monitoring SigNoz itself
selfMonitoring:
  enabled: true
  serviceMonitor:
    enabled: false
View File

@@ -10,989 +10,3 @@ global:
  clusterName: "bakery-ia-prod"
  domain: "monitoring.bakewise.ai"
  # Docker Hub credentials - applied to all sub-charts (including Zookeeper, ClickHouse, etc)
  imagePullSecrets:
    - dockerhub-creds
# Docker Hub credentials for pulling images (root level for SigNoz components)
imagePullSecrets:
  - dockerhub-creds
# SigNoz Main Component (unified frontend + query service)
# BREAKING CHANGE: v0.89.0+ uses unified component instead of separate frontend/queryService
signoz:
  replicaCount: 2
  image:
    repository: signoz/signoz
    tag: v0.106.0 # Latest stable version
    pullPolicy: IfNotPresent
  service:
    type: ClusterIP
    port: 8080 # HTTP/API port
    internalPort: 8085 # Internal gRPC port
  # DISABLE built-in ingress - using unified bakery-ingress-prod instead
  # Route configured in infrastructure/kubernetes/overlays/prod/prod-ingress.yaml
  ingress:
    enabled: false
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 4Gi
  # Pod Anti-affinity for HA
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/component: query-service
            topologyKey: kubernetes.io/hostname
  # Environment variables (new format - replaces configVars)
  env:
    signoz_telemetrystore_provider: "clickhouse"
    dot_metrics_enabled: "true"
    signoz_emailing_enabled: "true"
    signoz_alertmanager_provider: "signoz"
    # Retention configuration (30 days for prod)
    signoz_traces_ttl_duration_hrs: "720"
    signoz_metrics_ttl_duration_hrs: "720"
    signoz_logs_ttl_duration_hrs: "720"
    # OpAMP Server Configuration
    # WARNING: OpAMP can cause gRPC instability and collector reloads
    # Only enable if you have a stable OpAMP backend server
    signoz_opamp_server_enabled: "false"
    # signoz_opamp_server_endpoint: "0.0.0.0:4320"
    # SMTP configuration for email alerts - now using Mailu as SMTP server
    signoz_smtp_enabled: "true"
    signoz_smtp_host: "mailu-postfix.bakery-ia.svc.cluster.local"
    signoz_smtp_port: "587"
    signoz_smtp_from: "alerts@bakewise.ai"
    signoz_smtp_username: "alerts@bakewise.ai"
    # Password should be set via secret: signoz_smtp_password
  persistence:
    enabled: true
    size: 20Gi
    storageClass: "standard"
  # Horizontal Pod Autoscaler
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 5
    targetCPUUtilizationPercentage: 70
    targetMemoryUtilizationPercentage: 80
# AlertManager Configuration
alertmanager:
  enabled: true
  replicaCount: 2
  image:
    repository: signoz/alertmanager
    tag: 0.23.5
    pullPolicy: IfNotPresent
  service:
    type: ClusterIP
    port: 9093
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 512Mi
  # Pod Anti-affinity for HA
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - signoz-alertmanager
            topologyKey: kubernetes.io/hostname
  persistence:
    enabled: true
    size: 5Gi
    storageClass: "standard"
  config:
    global:
      resolve_timeout: 5m
      smtp_smarthost: 'mailu-postfix.bakery-ia.svc.cluster.local:587'
      smtp_from: 'alerts@bakewise.ai'
      smtp_auth_username: 'alerts@bakewise.ai'
      smtp_auth_password: '${SMTP_PASSWORD}'
      smtp_require_tls: true
    route:
      group_by: ['alertname', 'cluster', 'service', 'severity']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 12h
      receiver: 'critical-alerts'
      routes:
        - match:
            severity: critical
          receiver: 'critical-alerts'
          continue: true
        - match:
            severity: warning
          receiver: 'warning-alerts'
    receivers:
      - name: 'critical-alerts'
        email_configs:
          - to: 'critical-alerts@bakewise.ai'
            headers:
              Subject: '[CRITICAL] {{ .GroupLabels.alertname }} - Bakery IA'
        # Slack webhook for critical alerts
        slack_configs:
          - api_url: '${SLACK_WEBHOOK_URL}'
            channel: '#alerts-critical'
            title: '[CRITICAL] {{ .GroupLabels.alertname }}'
            text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
      - name: 'warning-alerts'
        email_configs:
          - to: 'oncall@bakewise.ai'
            headers:
              Subject: '[WARNING] {{ .GroupLabels.alertname }} - Bakery IA'
# ClickHouse Configuration - Time Series Database
clickhouse:
  enabled: true
  installCustomStorageClass: false
  image:
    registry: docker.io
    repository: clickhouse/clickhouse-server
    tag: 25.5.6 # Updated to official recommended version
    pullPolicy: IfNotPresent
  # ClickHouse resources (nested config)
  clickhouse:
    resources:
      requests:
        cpu: 1000m
        memory: 2Gi
      limits:
        cpu: 4000m
        memory: 8Gi
  # Pod Anti-affinity for HA
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - signoz-clickhouse
          topologyKey: kubernetes.io/hostname
  persistence:
    enabled: true
    size: 100Gi
    storageClass: "standard"
  # Cold storage configuration for better disk space management
  coldStorage:
    enabled: true
    defaultKeepFreeSpaceBytes: 10737418240 # Keep 10GB free
    ttl:
      deleteTTLDays: 30 # Move old data to cold storage after 30 days
# Zookeeper Configuration (required by ClickHouse for coordination)
zookeeper:
  enabled: true
  replicaCount: 3 # CRITICAL: Always use 3 replicas for production HA
  image:
    tag: 3.7.1 # Official recommended version
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi
  persistence:
    enabled: true
    size: 10Gi
    storageClass: "standard"
# OpenTelemetry Collector - Integrated with SigNoz
otelCollector:
  enabled: true
  replicaCount: 2
  image:
    repository: signoz/signoz-otel-collector
    tag: v0.129.12 # Updated to latest recommended version
    pullPolicy: IfNotPresent
  # Init containers for the Otel Collector pod
  initContainers:
    fix-postgres-tls:
      enabled: true
      image:
        registry: docker.io
        repository: busybox
        tag: 1.35
        pullPolicy: IfNotPresent
      command:
        - sh
        - -c
        - |
          echo "Fixing PostgreSQL TLS file permissions..."
          cp /etc/postgres-tls-source/* /etc/postgres-tls/
          chmod 600 /etc/postgres-tls/server-key.pem
          chmod 644 /etc/postgres-tls/server-cert.pem
          chmod 644 /etc/postgres-tls/ca-cert.pem
          echo "PostgreSQL TLS permissions fixed"
      volumeMounts:
        - name: postgres-tls-source
          mountPath: /etc/postgres-tls-source
          readOnly: true
        - name: postgres-tls-fixed
          mountPath: /etc/postgres-tls
          readOnly: false
  service:
    type: ClusterIP
    ports:
      - name: otlp-grpc
        port: 4317
        targetPort: 4317
        protocol: TCP
      - name: otlp-http
        port: 4318
        targetPort: 4318
        protocol: TCP
      - name: prometheus
        port: 8889
        targetPort: 8889
        protocol: TCP
      - name: metrics
        port: 8888
        targetPort: 8888
        protocol: TCP
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 2000m
      memory: 2Gi
  # Additional environment variables for receivers
  additionalEnvs:
    POSTGRES_MONITOR_USER: "monitoring"
    POSTGRES_MONITOR_PASSWORD: "monitoring_369f9c001f242b07ef9e2826e17169ca"
    REDIS_PASSWORD: "OxdmdJjdVNXp37MNC2IFoMnTpfGGFv1k"
    RABBITMQ_USER: "bakery"
    RABBITMQ_PASSWORD: "forecast123"
  # Mount TLS certificates for secure connections
  extraVolumes:
    - name: redis-tls
      secret:
        secretName: redis-tls-secret
    - name: postgres-tls
      secret:
        secretName: postgres-tls
    - name: postgres-tls-fixed
      emptyDir: {}
    - name: varlogpods
      hostPath:
        path: /var/log/pods
  extraVolumeMounts:
    - name: redis-tls
      mountPath: /etc/redis-tls
      readOnly: true
    - name: postgres-tls
      mountPath: /etc/postgres-tls-source
      readOnly: true
    - name: postgres-tls-fixed
      mountPath: /etc/postgres-tls
      readOnly: false
    - name: varlogpods
      mountPath: /var/log/pods
      readOnly: true
  # Enable OpAMP for dynamic configuration management
  command:
    name: /signoz-otel-collector
    extraArgs:
      - --config=/conf/otel-collector-config.yaml
      - --manager-config=/conf/otel-collector-opamp-config.yaml
      - --feature-gates=-pkg.translator.prometheus.NormalizeName
  # Full OTEL Collector Configuration
  config:
    # Connectors - bridge between pipelines
    connectors:
      signozmeter:
        dimensions:
          - name: service.name
          - name: deployment.environment
          - name: host.name
        metrics_flush_interval: 1h
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133
      zpages:
        endpoint: 0.0.0.0:55679
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
            max_recv_msg_size_mib: 32 # Increased for larger payloads
          http:
            endpoint: 0.0.0.0:4318
            cors:
              allowed_origins:
                - "https://monitoring.bakewise.ai"
                - "https://*.bakewise.ai"
      # Filelog receiver for Kubernetes pod logs
      # Collects container stdout/stderr from /var/log/pods
      filelog:
        include:
          - /var/log/pods/*/*/*.log
        exclude:
          # Exclude SigNoz's own logs to avoid recursive collection
          - /var/log/pods/bakery-ia_signoz-*/*/*.log
        include_file_path: true
        include_file_name: false
        operators:
          # Parse CRI-O / containerd log format
          - type: regex_parser
            regex: '^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) (?P<log>.*)$'
            timestamp:
              parse_from: attributes.time
              layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          # Fix timestamp parsing - extract from the parsed time field
          - type: move
            from: attributes.time
            to: attributes.timestamp
          # Extract Kubernetes metadata from file path
          - type: regex_parser
            id: extract_metadata_from_filepath
            regex: '^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[^\/]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$'
            parse_from: attributes["log.file.path"]
          # Move metadata to resource attributes
          - type: move
            from: attributes.namespace
            to: resource["k8s.namespace.name"]
          - type: move
            from: attributes.pod_name
            to: resource["k8s.pod.name"]
          - type: move
            from: attributes.container_name
            to: resource["k8s.container.name"]
          - type: move
            from: attributes.log
            to: body
      # Kubernetes Cluster Receiver - Collects cluster-level metrics
      # Provides information about nodes, namespaces, pods, and other cluster resources
      k8s_cluster:
        collection_interval: 30s
        node_conditions_to_report:
          - Ready
          - MemoryPressure
          - DiskPressure
          - PIDPressure
          - NetworkUnavailable
        allocatable_types_to_report:
          - cpu
          - memory
          - pods
      # Prometheus receiver for scraping metrics
      prometheus:
        config:
          scrape_configs:
            - job_name: 'kubernetes-nodes-cadvisor'
              scrape_interval: 30s
              scrape_timeout: 10s
              scheme: https
              tls_config:
                insecure_skip_verify: true
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
              kubernetes_sd_configs:
                - role: node
              relabel_configs:
                - action: labelmap
                  regex: __meta_kubernetes_node_label_(.+)
                - target_label: __address__
                  replacement: kubernetes.default.svc:443
                - source_labels: [__meta_kubernetes_node_name]
                  regex: (.+)
                  target_label: __metrics_path__
                  replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
            - job_name: 'kubernetes-apiserver'
              scrape_interval: 30s
              scrape_timeout: 10s
              scheme: https
              tls_config:
                insecure_skip_verify: true
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
              kubernetes_sd_configs:
                - role: endpoints
              relabel_configs:
                - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
                  action: keep
                  regex: default;kubernetes;https
      # Redis receiver for cache metrics
      # ENABLED: Using existing credentials from redis-secrets with TLS
      redis:
        endpoint: redis-service.bakery-ia:6379
        password: ${env:REDIS_PASSWORD}
        collection_interval: 60s
        transport: tcp
        tls:
          insecure_skip_verify: false
          cert_file: /etc/redis-tls/redis-cert.pem
          key_file: /etc/redis-tls/redis-key.pem
          ca_file: /etc/redis-tls/ca-cert.pem
        metrics:
          redis.maxmemory:
            enabled: true
          redis.cmd.latency:
            enabled: true
      # RabbitMQ receiver via management API
      # ENABLED: Using existing credentials from rabbitmq-secrets
      rabbitmq:
        endpoint: http://rabbitmq-service.bakery-ia:15672
        username: ${env:RABBITMQ_USER}
        password: ${env:RABBITMQ_PASSWORD}
        collection_interval: 30s
      # PostgreSQL receivers for database metrics
      # Monitor all databases with proper TLS configuration
      postgresql/auth:
        endpoint: auth-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - auth_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/inventory:
        endpoint: inventory-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - inventory_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/orders:
        endpoint: orders-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - orders_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/ai-insights:
        endpoint: ai-insights-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - ai_insights_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/alert-processor:
        endpoint: alert-processor-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - alert_processor_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/distribution:
        endpoint: distribution-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - distribution_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/external:
        endpoint: external-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - external_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/forecasting:
        endpoint: forecasting-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - forecasting_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/notification:
        endpoint: notification-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - notification_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/orchestrator:
        endpoint: orchestrator-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - orchestrator_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/pos:
        endpoint: pos-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - pos_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/procurement:
        endpoint: procurement-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - procurement_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/production:
        endpoint: production-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - production_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/recipes:
        endpoint: recipes-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - recipes_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/sales:
        endpoint: sales-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - sales_db
        collection_interval: 60s
        tls:
          insecure: false
          cert_file: /etc/postgres-tls/server-cert.pem
          key_file: /etc/postgres-tls/server-key.pem
          ca_file: /etc/postgres-tls/ca-cert.pem
      postgresql/suppliers:
        endpoint: suppliers-db-service.bakery-ia:5432
username: ${env:POSTGRES_MONITOR_USER}
password: ${env:POSTGRES_MONITOR_PASSWORD}
databases:
- suppliers_db
collection_interval: 60s
tls:
insecure: false
cert_file: /etc/postgres-tls/server-cert.pem
key_file: /etc/postgres-tls/server-key.pem
ca_file: /etc/postgres-tls/ca-cert.pem
postgresql/tenant:
endpoint: tenant-db-service.bakery-ia:5432
username: ${env:POSTGRES_MONITOR_USER}
password: ${env:POSTGRES_MONITOR_PASSWORD}
databases:
- tenant_db
collection_interval: 60s
tls:
insecure: false
cert_file: /etc/postgres-tls/server-cert.pem
key_file: /etc/postgres-tls/server-key.pem
ca_file: /etc/postgres-tls/ca-cert.pem
postgresql/training:
endpoint: training-db-service.bakery-ia:5432
username: ${env:POSTGRES_MONITOR_USER}
password: ${env:POSTGRES_MONITOR_PASSWORD}
databases:
- training_db
collection_interval: 60s
tls:
insecure: false
cert_file: /etc/postgres-tls/server-cert.pem
key_file: /etc/postgres-tls/server-key.pem
ca_file: /etc/postgres-tls/ca-cert.pem
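    # All postgresql receivers above assume POSTGRES_MONITOR_USER exists on each
    # database with monitoring rights; a hypothetical bootstrap (names illustrative):
    #   CREATE USER otel_monitor WITH PASSWORD '<from-secret>';
    #   GRANT pg_monitor TO otel_monitor;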
processors:
# High-performance batch processing (official recommendation)
batch:
timeout: 1s # Reduced from 10s for faster processing
send_batch_size: 50000 # Increased from 2048 (official recommendation for traces)
send_batch_max_size: 50000
# Batch processor for meter data
batch/meter:
timeout: 1s
send_batch_size: 20000
send_batch_max_size: 25000
memory_limiter:
check_interval: 1s
    limit_mib: 1500 # ~73% of the 2Gi (2048Mi) container limit, leaving headroom for spikes
spike_limit_mib: 300
# Resource detection for K8s
resourcedetection:
detectors: [env, system, docker]
timeout: 5s
# Add resource attributes
resource:
attributes:
- key: deployment.environment
value: production
action: upsert
- key: cluster.name
value: bakery-ia-prod
action: upsert
# Kubernetes attributes processor - CRITICAL for logs
# Extracts pod, namespace, container metadata from log attributes
k8sattributes:
auth_type: "serviceAccount"
passthrough: false
extract:
metadata:
- k8s.pod.name
- k8s.pod.uid
- k8s.deployment.name
- k8s.namespace.name
- k8s.node.name
- k8s.container.name
      labels: # each rule needs a key to read from the pod; tag_name is the resulting attribute name
        - tag_name: "app"
          key: "app"
        - tag_name: "pod-template-hash"
          key: "pod-template-hash"
        - tag_name: "version"
          key: "version"
      annotations:
        - tag_name: "description"
          key: "description"
# SigNoz span metrics processor with delta aggregation (recommended)
# Generates RED metrics (Rate, Error, Duration) from trace spans
signozspanmetrics/delta:
aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
metrics_exporter: signozclickhousemetrics
latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s]
dimensions_cache_size: 100000
dimensions:
- name: service.namespace
default: default
- name: deployment.environment
default: production
- name: signoz.collector.id
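    # Each dimension listed above becomes a label on the generated RED metrics;
    # spans missing service.namespace or deployment.environment fall back to the
    # declared defaults.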
exporters:
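    # NOTE: the DSNs below embed the ClickHouse admin password; in a real
    # deployment inject it from a Secret (e.g. via environment substitution)
    # rather than committing it to version control.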
# ClickHouse exporter for traces
clickhousetraces:
datasource: tcp://admin:27ff0399-0d3a-4bd8-919d-17c2181e6fb9@signoz-clickhouse:9000/?database=signoz_traces
timeout: 10s
retry_on_failure:
enabled: true
initial_interval: 5s
max_interval: 30s
max_elapsed_time: 300s
# ClickHouse exporter for metrics
signozclickhousemetrics:
dsn: "tcp://admin:27ff0399-0d3a-4bd8-919d-17c2181e6fb9@signoz-clickhouse:9000/signoz_metrics"
timeout: 10s
retry_on_failure:
enabled: true
initial_interval: 5s
max_interval: 30s
max_elapsed_time: 300s
# ClickHouse exporter for meter data (usage metrics)
signozclickhousemeter:
dsn: "tcp://admin:27ff0399-0d3a-4bd8-919d-17c2181e6fb9@signoz-clickhouse:9000/signoz_meter"
timeout: 45s
sending_queue:
enabled: false
# ClickHouse exporter for logs
clickhouselogsexporter:
dsn: tcp://admin:27ff0399-0d3a-4bd8-919d-17c2181e6fb9@signoz-clickhouse:9000/?database=signoz_logs
timeout: 10s
retry_on_failure:
enabled: true
initial_interval: 5s
max_interval: 30s
# Metadata exporter for service metadata
metadataexporter:
dsn: "tcp://admin:27ff0399-0d3a-4bd8-919d-17c2181e6fb9@signoz-clickhouse:9000/signoz_metadata"
timeout: 10s
cache:
provider: in_memory
# Debug exporter for debugging (optional)
debug:
verbosity: detailed
sampling_initial: 5
sampling_thereafter: 200
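    # The debug exporter is not wired into any pipeline below; to inspect data
    # temporarily, append it to a pipeline's exporter list, e.g.
    #   exporters: [clickhousetraces, debug]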
service:
extensions: [health_check, zpages]
pipelines:
# Traces pipeline - exports to ClickHouse and signozmeter connector
traces:
receivers: [otlp]
      processors: [memory_limiter, signozspanmetrics/delta, resourcedetection, resource, batch] # memory_limiter first, batch last (recommended processor order)
exporters: [clickhousetraces, metadataexporter, signozmeter]
# Metrics pipeline - includes all infrastructure receivers
metrics:
receivers: [otlp,
postgresql/auth, postgresql/inventory, postgresql/orders,
postgresql/ai-insights, postgresql/alert-processor, postgresql/distribution,
postgresql/external, postgresql/forecasting, postgresql/notification,
postgresql/orchestrator, postgresql/pos, postgresql/procurement,
postgresql/production, postgresql/recipes, postgresql/sales,
postgresql/suppliers, postgresql/tenant, postgresql/training,
redis, rabbitmq, k8s_cluster, prometheus]
      processors: [memory_limiter, resourcedetection, resource, batch]
exporters: [signozclickhousemetrics]
# Meter pipeline - receives from signozmeter connector
metrics/meter:
receivers: [signozmeter]
processors: [batch/meter]
exporters: [signozclickhousemeter]
# Logs pipeline - includes both OTLP and Kubernetes pod logs
logs:
receivers: [otlp, filelog]
      processors: [memory_limiter, k8sattributes, resourcedetection, resource, batch] # k8sattributes runs before batch so metadata is attached per record
exporters: [clickhouselogsexporter]
# HPA for OTEL Collector
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 10
targetCPUUtilizationPercentage: 70
targetMemoryUtilizationPercentage: 80
# ClusterRole configuration for Kubernetes monitoring
# CRITICAL: Required for k8s_cluster receiver to access Kubernetes API
# Without these permissions, k8s metrics will not appear in SigNoz UI
clusterRole:
create: true
name: "signoz-otel-collector-bakery-ia"
annotations: {}
# Complete RBAC rules required by k8sclusterreceiver
# Based on OpenTelemetry and SigNoz official documentation
rules:
# Core API group - fundamental Kubernetes resources
- apiGroups: [""]
resources:
- "events"
- "namespaces"
- "nodes"
- "nodes/proxy"
- "nodes/metrics"
- "nodes/spec"
- "pods"
- "pods/status"
- "replicationcontrollers"
- "replicationcontrollers/status"
- "resourcequotas"
- "services"
- "endpoints"
verbs: ["get", "list", "watch"]
# Apps API group - modern workload controllers
- apiGroups: ["apps"]
resources: ["deployments", "daemonsets", "statefulsets", "replicasets"]
verbs: ["get", "list", "watch"]
# Batch API group - job management
- apiGroups: ["batch"]
resources: ["jobs", "cronjobs"]
verbs: ["get", "list", "watch"]
# Autoscaling API group - HPA metrics (CRITICAL)
- apiGroups: ["autoscaling"]
resources: ["horizontalpodautoscalers"]
verbs: ["get", "list", "watch"]
# Extensions API group - legacy support
- apiGroups: ["extensions"]
resources: ["deployments", "daemonsets", "replicasets"]
verbs: ["get", "list", "watch"]
# Metrics API group - resource metrics
- apiGroups: ["metrics.k8s.io"]
resources: ["nodes", "pods"]
verbs: ["get", "list", "watch"]
clusterRoleBinding:
annotations: {}
name: "signoz-otel-collector-bakery-ia"
# Schema Migrator - Manages ClickHouse schema migrations
schemaMigrator:
enabled: true
image:
repository: signoz/signoz-schema-migrator
    tag: v0.129.12 # pinned schema-migrator release; review before bumping
pullPolicy: IfNotPresent
# Enable Helm hooks for proper upgrade handling
upgradeHelmHooks: true
# Additional Configuration
serviceAccount:
create: true
annotations: {}
name: "signoz"
# Security Context
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
# Pod Disruption Budgets for HA
podDisruptionBudget:
frontend:
enabled: true
minAvailable: 1
queryService:
enabled: true
minAvailable: 1
alertmanager:
enabled: true
minAvailable: 1
clickhouse:
enabled: true
minAvailable: 1
# Network Policies for security
networkPolicy:
enabled: true
policyTypes:
- Ingress
- Egress
# Monitoring SigNoz itself
selfMonitoring:
enabled: true
serviceMonitor:
enabled: true
interval: 30s

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: gateway
app.kubernetes.io/component: gateway
spec:
imagePullSecrets:
- name: dockerhub-creds
containers:
- name: gateway
image: bakery/gateway:latest

View File

@@ -5,3 +5,4 @@ resources:
- gateway-service.yaml
- nominatim/nominatim.yaml
- nominatim/nominatim-init-job.yaml
- unbound/unbound.yaml

View File

@@ -15,8 +15,6 @@ spec:
app.kubernetes.io/name: nominatim-init
app.kubernetes.io/component: data-init
spec:
imagePullSecrets:
- name: dockerhub-creds
restartPolicy: OnFailure
containers:
- name: nominatim-import

View File

@@ -0,0 +1,81 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: unbound-resolver
namespace: bakery-ia
labels:
app.kubernetes.io/name: unbound-resolver
app.kubernetes.io/component: dns
app.kubernetes.io/part-of: bakery-ia
spec:
replicas: 1 # Scale to 2+ in production with anti-affinity
selector:
matchLabels:
app.kubernetes.io/name: unbound-resolver
app.kubernetes.io/component: dns
template:
metadata:
labels:
app.kubernetes.io/name: unbound-resolver
app.kubernetes.io/component: dns
spec:
containers:
- name: unbound
        image: mvance/unbound:latest # consider pinning a specific tag in production
ports:
- containerPort: 53
name: dns-udp
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
resources:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "300m"
memory: "384Mi"
readinessProbe:
exec:
command:
- sh
- -c
            - drill -D -p 53 example.org @127.0.0.1 || nslookup -type=A example.org 127.0.0.1
initialDelaySeconds: 10
periodSeconds: 30
livenessProbe:
exec:
command:
- sh
- -c
            - drill -D -p 53 example.org @127.0.0.1 || nslookup -type=A example.org 127.0.0.1
initialDelaySeconds: 30
periodSeconds: 60
securityContext:
capabilities:
add: ["NET_BIND_SERVICE"]
---
apiVersion: v1
kind: Service
metadata:
name: unbound-dns
namespace: bakery-ia
labels:
app.kubernetes.io/name: unbound-resolver
app.kubernetes.io/component: dns
spec:
type: ClusterIP
ports:
- name: dns-udp
port: 53
targetPort: 53
protocol: UDP
- name: dns-tcp
port: 53
targetPort: 53
protocol: TCP
selector:
app.kubernetes.io/name: unbound-resolver
app.kubernetes.io/component: dns
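# Post-deploy check (hypothetical debug pod; any image with dig works): the AD
# flag in the answer confirms DNSSEC validation is active.
#   kubectl -n bakery-ia run dnssec-check --rm -it --image=alpine:3 --restart=Never -- \
#     sh -c "apk add -q bind-tools && dig @unbound-dns.bakery-ia.svc.cluster.local example.org +dnssec"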

View File

@@ -1,5 +1,20 @@
# Dev-specific Mailu Helm values for Bakery-IA
# Overrides base configuration for development environment
# Development-tuned Mailu configuration
global:
  # Mailu needs a DNSSEC-validating resolver; point it at the Unbound service
  custom_dns_servers: "unbound-dns.bakery-ia.svc.cluster.local" # substitute the Unbound Service ClusterIP at deploy time if the chart requires an IP
# Component-specific DNS configuration
admin:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - "unbound-dns.bakery-ia.svc.cluster.local" # placeholder: Kubernetes only accepts IP addresses in dnsConfig.nameservers; replace with the Unbound ClusterIP
rspamd:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - "unbound-dns.bakery-ia.svc.cluster.local" # placeholder: Kubernetes only accepts IP addresses in dnsConfig.nameservers; replace with the Unbound ClusterIP
# Domain configuration for dev
domain: "bakery-ia.local"
@@ -12,7 +27,64 @@ externalRelay:
username: "postmaster@bakery-ia.local"
password: "mailgun-api-key-replace-in-production"
# Ingress configuration for dev - disabled to use with existing ingress
# Environment-specific configurations
persistence:
enabled: true
# Development: use default storage class
storageClass: "standard"
size: "5Gi"
# Resource optimizations for development
resources:
admin:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "256Mi"
front:
requests:
cpu: "50m"
memory: "64Mi"
limits:
cpu: "200m"
memory: "128Mi"
postfix:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "300m"
memory: "256Mi"
dovecot:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "300m"
memory: "256Mi"
rspamd:
requests:
cpu: "50m"
memory: "64Mi"
limits:
cpu: "200m"
memory: "128Mi"
clamav:
requests:
cpu: "100m"
memory: "256Mi"
limits:
cpu: "300m"
memory: "512Mi"
replicaCount: 1 # Single replica for development
# Security settings
secretKey: "generate-strong-key-here-for-development"
# Ingress configuration for development - disabled to use with existing ingress
ingress:
enabled: false # Disable chart's Ingress; use existing one
tls: false # Disable TLS in chart since ingress handles it
@@ -33,6 +105,15 @@ welcomeMessage:
# Log level for dev
logLevel: "DEBUG"
# Development-specific overrides
env:
  DEBUG: "true"
  LOG_LEVEL: "DEBUG" # keep consistent with the logLevel setting above
# Disable or simplify monitoring in development
monitoring:
enabled: false
# Network Policy for dev
networkPolicy:
enabled: true
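# Example deploy (hypothetical release and values file names), layering these
# dev overrides on the base values:
#   helm upgrade --install mailu mailu/mailu -n bakery-ia \
#     -f mailu-values-base.yaml -f mailu-values-dev.yaml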

View File

@@ -1,5 +1,20 @@
# Production-specific Mailu Helm values for Bakery-IA
# Overrides base configuration for production environment
# Production-tuned Mailu configuration
global:
  # Mailu needs a DNSSEC-validating resolver; point it at the Unbound service
  custom_dns_servers: "unbound-dns.bakery-ia.svc.cluster.local" # substitute the Unbound Service ClusterIP at deploy time if the chart requires an IP
# Component-specific DNS configuration
admin:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - "unbound-dns.bakery-ia.svc.cluster.local" # placeholder: Kubernetes only accepts IP addresses in dnsConfig.nameservers; replace with the Unbound ClusterIP
rspamd:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - "unbound-dns.bakery-ia.svc.cluster.local" # placeholder: Kubernetes only accepts IP addresses in dnsConfig.nameservers; replace with the Unbound ClusterIP
# Domain configuration for production
domain: "bakewise.ai"
@@ -12,6 +27,63 @@ externalRelay:
username: "postmaster@bakewise.ai"
password: "PRODUCTION_MAILGUN_API_KEY" # This should be set via secret
# Environment-specific configurations
persistence:
enabled: true
# Production: use microk8s-hostpath or longhorn
storageClass: "longhorn" # Assuming Longhorn is available in production
size: "20Gi" # Larger storage for production email volume
# Resource allocations for production
resources:
admin:
requests:
cpu: "200m"
memory: "256Mi"
limits:
cpu: "1"
memory: "512Mi"
front:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "256Mi"
postfix:
requests:
cpu: "200m"
memory: "256Mi"
limits:
cpu: "1"
memory: "512Mi"
dovecot:
requests:
cpu: "200m"
memory: "256Mi"
limits:
cpu: "1"
memory: "512Mi"
rspamd:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "256Mi"
clamav:
requests:
cpu: "200m"
memory: "512Mi"
limits:
cpu: "1"
memory: "1Gi"
replicaCount: 1 # Can be increased in production as needed
# Security settings
secretKey: "generate-strong-key-here-for-production"
# Ingress configuration for production - disabled to use with existing ingress
ingress:
enabled: false # Disable chart's Ingress; use existing one
@@ -40,7 +112,24 @@ antivirus:
enabled: true
flavor: "clamav"
# Network Policy for production
# Production-specific settings
env:
DEBUG: "false"
LOG_LEVEL: "WARNING"
TLS_FLAVOR: "cert"
REDIS_PASSWORD: "secure-redis-password"
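  # NOTE: placeholder only; inject the real value from a Kubernetes Secret at
  # deploy time (e.g. helm --set-string env.REDIS_PASSWORD=...) instead of
  # committing it here.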
# Enable monitoring in production
monitoring:
enabled: true
# Production-specific security settings
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
# Network policies for production
networkPolicy:
enabled: true
ingressController:

View File

@@ -1,6 +1,11 @@
# Base Mailu Helm values for Bakery-IA
# Preserves critical configurations from the original Kustomize setup
# Global DNS configuration for DNSSEC validation
global:
  # Mailu v1.9+ requires a DNSSEC-validating resolver; this points at Unbound
  custom_dns_servers: "unbound-dns.bakery-ia.svc.cluster.local" # substitute the Unbound Service ClusterIP at deploy time if the chart requires an IP
# Domain configuration
domain: "DOMAIN_PLACEHOLDER"
hostnames:
@@ -204,3 +209,17 @@ networkPolicy:
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/component: controller
# DNS Policy Configuration for DNSSEC validation
# These settings ensure Mailu components use the Unbound DNS resolver
dnsPolicy: "None"
dnsConfig:
nameservers:
- "unbound-dns.bakery-ia.svc.cluster.local" # Points to the Unbound service in the bakery-ia namespace
options:
- name: ndots
value: "5"
- name: timeout
value: "5"
- name: attempts
value: "3"

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: {{SERVICE_NAME}}-db
app.kubernetes.io/component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
containers:
- name: postgres
image: postgres:17-alpine

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: redis
app.kubernetes.io/component: cache
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 999 # redis group
initContainers:

View File

@@ -1,65 +0,0 @@
#!/bin/bash
# Setup Docker Hub image pull secrets for all namespaces
# This script creates docker-registry secrets for pulling images from Docker Hub
set -e
# Docker Hub credentials
DOCKER_SERVER="docker.io"
DOCKER_USERNAME="uals"
DOCKER_PASSWORD="dckr_pat_zzEY5Q58x1S0puraIoKEtbpue3A"
DOCKER_EMAIL="ualfaro@gmail.com"
SECRET_NAME="dockerhub-creds"
# List of namespaces used in the project
NAMESPACES=(
"bakery-ia"
"bakery-ia-dev"
"bakery-ia-prod"
"default"
)
echo "Setting up Docker Hub image pull secrets..."
echo "==========================================="
echo ""
for namespace in "${NAMESPACES[@]}"; do
echo "Processing namespace: $namespace"
# Create namespace if it doesn't exist
if ! kubectl get namespace "$namespace" >/dev/null 2>&1; then
echo " Creating namespace: $namespace"
kubectl create namespace "$namespace"
fi
# Delete existing secret if it exists
if kubectl get secret "$SECRET_NAME" -n "$namespace" >/dev/null 2>&1; then
echo " Deleting existing secret in namespace: $namespace"
kubectl delete secret "$SECRET_NAME" -n "$namespace"
fi
# Create the docker-registry secret
echo " Creating Docker Hub secret in namespace: $namespace"
kubectl create secret docker-registry "$SECRET_NAME" \
--docker-server="$DOCKER_SERVER" \
--docker-username="$DOCKER_USERNAME" \
--docker-password="$DOCKER_PASSWORD" \
--docker-email="$DOCKER_EMAIL" \
-n "$namespace"
echo " ✓ Secret created successfully in namespace: $namespace"
echo ""
done
echo "==========================================="
echo "Docker Hub secrets setup completed!"
echo ""
echo "The secret '$SECRET_NAME' has been created in all namespaces:"
for namespace in "${NAMESPACES[@]}"; do
echo " - $namespace"
done
echo ""
echo "Next steps:"
echo "1. Apply Kubernetes manifests with imagePullSecrets configured"
echo "2. Verify pods can pull images: kubectl get pods -A"

View File

@@ -1,67 +0,0 @@
#!/bin/bash
# Setup GitHub Container Registry (GHCR) image pull secrets for all namespaces
# This script creates docker-registry secrets for pulling images from GHCR
set -e
# GitHub Container Registry credentials
# Note: Use a GitHub Personal Access Token with 'read:packages' scope
GHCR_SERVER="ghcr.io"
GHCR_USERNAME="uals" # GitHub username
GHCR_PASSWORD="ghp_zzEY5Q58x1S0puraIoKEtbpue3A" # GitHub Personal Access Token
GHCR_EMAIL="ualfaro@gmail.com"
SECRET_NAME="ghcr-creds"
# List of namespaces used in the project
NAMESPACES=(
"bakery-ia"
"bakery-ia-dev"
"bakery-ia-prod"
"default"
)
echo "Setting up GitHub Container Registry image pull secrets..."
echo "=========================================================="
echo ""
for namespace in "${NAMESPACES[@]}"; do
echo "Processing namespace: $namespace"
# Create namespace if it doesn't exist
if ! kubectl get namespace "$namespace" >/dev/null 2>&1; then
echo " Creating namespace: $namespace"
kubectl create namespace "$namespace"
fi
# Delete existing secret if it exists
if kubectl get secret "$SECRET_NAME" -n "$namespace" >/dev/null 2>&1; then
echo " Deleting existing secret in namespace: $namespace"
kubectl delete secret "$SECRET_NAME" -n "$namespace"
fi
# Create the docker-registry secret for GHCR
echo " Creating GHCR secret in namespace: $namespace"
kubectl create secret docker-registry "$SECRET_NAME" \
--docker-server="$GHCR_SERVER" \
--docker-username="$GHCR_USERNAME" \
--docker-password="$GHCR_PASSWORD" \
--docker-email="$GHCR_EMAIL" \
-n "$namespace"
echo " ✓ Secret created successfully in namespace: $namespace"
echo ""
done
echo "=========================================================="
echo "GitHub Container Registry secrets setup completed!"
echo ""
echo "The secret '$SECRET_NAME' has been created in all namespaces:"
for namespace in "${NAMESPACES[@]}"; do
echo " - $namespace"
done
echo ""
echo "Next steps:"
echo "1. Update your Kubernetes manifests to include the GHCR imagePullSecrets"
echo "2. Verify pods can pull images from GHCR: kubectl get pods -A"
echo "3. Consider updating your CI/CD pipelines to push images to GHCR"

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: ai-insights-db
app.kubernetes.io/component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 70
initContainers:

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: alert-processor-db
app.kubernetes.io/component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 70
initContainers:

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: auth-db
app.kubernetes.io/component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 70
initContainers:

View File

@@ -21,8 +21,6 @@ spec:
app: demo-session-db
component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 70
initContainers:

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: distribution-db
app.kubernetes.io/component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 70
initContainers:

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: external-db
app.kubernetes.io/component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 70
initContainers:

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: forecasting-db
app.kubernetes.io/component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 70
initContainers:

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: inventory-db
app.kubernetes.io/component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 70
initContainers:

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: notification-db
app.kubernetes.io/component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 70
initContainers:

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: orchestrator-db
app.kubernetes.io/component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 70
initContainers:

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: orders-db
app.kubernetes.io/component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 70
initContainers:

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: pos-db
app.kubernetes.io/component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 70
initContainers:

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: procurement-db
app.kubernetes.io/component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 70
initContainers:

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: production-db
app.kubernetes.io/component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 70
initContainers:

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: rabbitmq
app.kubernetes.io/component: message-broker
spec:
imagePullSecrets:
- name: dockerhub-creds
containers:
- name: rabbitmq
image: rabbitmq:4.1-management-alpine

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: recipes-db
app.kubernetes.io/component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 70
initContainers:

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: sales-db
app.kubernetes.io/component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 70
initContainers:

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: suppliers-db
app.kubernetes.io/component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 70
initContainers:

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: tenant-db
app.kubernetes.io/component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 70
initContainers:

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: training-db
app.kubernetes.io/component: database
spec:
imagePullSecrets:
- name: dockerhub-creds
securityContext:
fsGroup: 70
initContainers:

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: ai-insights-service
app.kubernetes.io/component: microservice
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
# Wait for Redis to be ready
- name: wait-for-redis

View File

@@ -16,8 +16,6 @@ spec:
app.kubernetes.io/name: ai-insights-migration
app.kubernetes.io/component: migration
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
- name: wait-for-db
image: postgres:17-alpine

View File

@@ -16,8 +16,6 @@ spec:
app.kubernetes.io/name: alert-processor-migration
app.kubernetes.io/component: migration
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
- name: wait-for-db
image: postgres:17-alpine

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: auth-service
app.kubernetes.io/component: microservice
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
# Wait for Redis to be ready
- name: wait-for-redis

View File

@@ -16,9 +16,6 @@ spec:
app.kubernetes.io/name: auth-migration
app.kubernetes.io/component: migration
spec:
imagePullSecrets:
- name: dockerhub-creds
- name: ghcr-creds
initContainers:
- name: wait-for-db
image: postgres:17-alpine

View File

@@ -17,8 +17,6 @@ spec:
labels:
app: demo-cleanup
spec:
imagePullSecrets:
- name: dockerhub-creds
template:
metadata:
labels:

View File

@@ -19,8 +19,6 @@ spec:
component: background-jobs
service: demo-session
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
- name: wait-for-migrations
image: postgres:17-alpine

View File

@@ -15,8 +15,6 @@ spec:
app.kubernetes.io/name: demo-session-migration
app.kubernetes.io/component: migration
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
- name: wait-for-db
image: postgres:17-alpine

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: distribution-service
app.kubernetes.io/component: microservice
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
# Wait for Redis to be ready
- name: wait-for-redis

View File

@@ -16,8 +16,6 @@ spec:
app.kubernetes.io/name: distribution-migration
app.kubernetes.io/component: migration
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
- name: wait-for-db
image: postgres:17-alpine

View File

@@ -22,8 +22,6 @@ spec:
app: external-service
job: data-rotation
spec:
imagePullSecrets:
- name: dockerhub-creds
ttlSecondsAfterFinished: 172800
backoffLimit: 2

View File

@@ -23,8 +23,6 @@ spec:
app.kubernetes.io/component: microservice
version: "2.0"
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
# Wait for Redis to be ready
- name: wait-for-redis

View File

@@ -17,8 +17,6 @@ spec:
app: external-service
job: data-init
spec:
imagePullSecrets:
- name: dockerhub-creds
restartPolicy: OnFailure
initContainers:

View File

@@ -16,9 +16,6 @@ spec:
app.kubernetes.io/name: external-migration
app.kubernetes.io/component: migration
spec:
imagePullSecrets:
- name: dockerhub-creds
- name: ghcr-creds
initContainers:
- name: wait-for-db
image: postgres:17-alpine

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: forecasting-service
app.kubernetes.io/component: microservice
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
# Wait for Redis to be ready
- name: wait-for-redis

View File

@@ -16,8 +16,6 @@ spec:
app.kubernetes.io/name: forecasting-migration
app.kubernetes.io/component: migration
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
- name: wait-for-db
image: postgres:17-alpine

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: frontend
app.kubernetes.io/component: frontend
spec:
imagePullSecrets:
- name: dockerhub-creds
containers:
- name: frontend
image: bakery/dashboard:latest

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: inventory-service
app.kubernetes.io/component: microservice
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
# Wait for Redis to be ready
- name: wait-for-redis

View File

@@ -16,8 +16,6 @@ spec:
app.kubernetes.io/name: inventory-migration
app.kubernetes.io/component: migration
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
- name: wait-for-db
image: postgres:17-alpine

View File

@@ -16,8 +16,6 @@ spec:
app.kubernetes.io/name: notification-migration
app.kubernetes.io/component: migration
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
- name: wait-for-db
image: postgres:17-alpine

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: notification-service
app.kubernetes.io/component: microservice
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
# Wait for Redis to be ready
- name: wait-for-redis

View File

@@ -16,8 +16,6 @@ spec:
app.kubernetes.io/name: orchestrator-migration
app.kubernetes.io/component: migration
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
- name: wait-for-db
image: postgres:17-alpine

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: orchestrator-service
app.kubernetes.io/component: microservice
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
# Wait for Redis to be ready
- name: wait-for-redis

View File

@@ -16,8 +16,6 @@ spec:
app.kubernetes.io/name: orders-migration
app.kubernetes.io/component: migration
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
- name: wait-for-db
image: postgres:17-alpine

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: orders-service
app.kubernetes.io/component: microservice
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
# Wait for Redis to be ready
- name: wait-for-redis

View File

@@ -16,8 +16,6 @@ spec:
app.kubernetes.io/name: pos-migration
app.kubernetes.io/component: migration
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
- name: wait-for-db
image: postgres:17-alpine

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: pos-service
app.kubernetes.io/component: microservice
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
# Wait for Redis to be ready
- name: wait-for-redis

View File

@@ -16,8 +16,6 @@ spec:
app.kubernetes.io/name: procurement-migration
app.kubernetes.io/component: migration
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
- name: wait-for-db
image: postgres:17-alpine

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: procurement-service
app.kubernetes.io/component: microservice
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
# Wait for Redis to be ready
- name: wait-for-redis

View File

@@ -16,8 +16,6 @@ spec:
app.kubernetes.io/name: production-migration
app.kubernetes.io/component: migration
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
- name: wait-for-db
image: postgres:17-alpine

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: production-service
app.kubernetes.io/component: microservice
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
# Wait for Redis to be ready
- name: wait-for-redis

View File

@@ -16,8 +16,6 @@ spec:
app.kubernetes.io/name: recipes-migration
app.kubernetes.io/component: migration
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
- name: wait-for-db
image: postgres:17-alpine

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: recipes-service
app.kubernetes.io/component: microservice
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
# Wait for Redis to be ready
- name: wait-for-redis

View File

@@ -16,8 +16,6 @@ spec:
app.kubernetes.io/name: sales-migration
app.kubernetes.io/component: migration
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
- name: wait-for-db
image: postgres:17-alpine

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: sales-service
app.kubernetes.io/component: microservice
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
# Wait for Redis to be ready
- name: wait-for-redis

View File

@@ -16,8 +16,6 @@ spec:
app.kubernetes.io/name: suppliers-migration
app.kubernetes.io/component: migration
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
- name: wait-for-db
image: postgres:17-alpine

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: suppliers-service
app.kubernetes.io/component: microservice
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
# Wait for Redis to be ready
- name: wait-for-redis

View File

@@ -16,9 +16,6 @@ spec:
app.kubernetes.io/name: tenant-migration
app.kubernetes.io/component: migration
spec:
imagePullSecrets:
- name: dockerhub-creds
- name: ghcr-creds
initContainers:
- name: wait-for-db
image: postgres:17-alpine

View File

@@ -19,9 +19,6 @@ spec:
app.kubernetes.io/name: tenant-service
app.kubernetes.io/component: microservice
spec:
imagePullSecrets:
- name: dockerhub-creds
- name: ghcr-creds
initContainers:
# Wait for Redis to be ready
- name: wait-for-redis

View File

@@ -16,8 +16,6 @@ spec:
app.kubernetes.io/name: training-migration
app.kubernetes.io/component: migration
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
- name: wait-for-db
image: postgres:17-alpine

View File

@@ -19,8 +19,6 @@ spec:
app.kubernetes.io/name: training-service
app.kubernetes.io/component: microservice
spec:
imagePullSecrets:
- name: dockerhub-creds
initContainers:
# Wait for Redis to be ready
- name: wait-for-redis

View File

@@ -53,6 +53,8 @@ BASE_IMAGES=(
"ghcr.io/mailu/postfix:2024.06"
"ghcr.io/mailu/dovecot:2024.06"
"ghcr.io/mailu/rspamd:2024.06"
# DNS resolver (Unbound)
"mvance/unbound:latest"
)
# Local registry configuration

secrets_test.yaml (new file, 72 lines)
View File

@@ -0,0 +1,72 @@
# Secret for Gitea webhook validation
# Used by EventListener to validate incoming webhooks
apiVersion: v1
kind: Secret
metadata:
name: gitea-webhook-secret
namespace: {{ .Values.namespace }}
labels:
app.kubernetes.io/name: {{ .Values.labels.app.name }}
app.kubernetes.io/component: triggers
annotations:
note: "Webhook secret for validating incoming webhooks"
type: Opaque
stringData:
secretToken: {{ .Values.secrets.webhook.token | quote }}
---
# Secret for Gitea container registry credentials
# Used by Kaniko to push images to Gitea registry
apiVersion: v1
kind: Secret
metadata:
name: gitea-registry-credentials
namespace: {{ .Values.namespace }}
labels:
app.kubernetes.io/name: {{ .Values.labels.app.name }}
app.kubernetes.io/component: build
annotations:
note: "Registry credentials for pushing images"
type: kubernetes.io/dockerconfigjson
stringData:
.dockerconfigjson: |
{
"auths": {
{{ .Values.secrets.registry.registryUrl | quote }}: {
"username": {{ .Values.secrets.registry.username | quote }},
"password": {{ .Values.secrets.registry.password | quote }}
}
}
}
---
# Secret for Git credentials (used by pipeline to push GitOps updates)
apiVersion: v1
kind: Secret
metadata:
name: gitea-git-credentials
namespace: {{ .Values.namespace }}
labels:
app.kubernetes.io/name: {{ .Values.labels.app.name }}
app.kubernetes.io/component: gitops
annotations:
note: "Git credentials for GitOps updates"
type: Opaque
stringData:
username: {{ .Values.secrets.git.username | quote }}
password: {{ .Values.secrets.git.password | quote }}
---
# Secret for Flux GitRepository access
# Used by Flux to pull from Gitea repository
apiVersion: v1
kind: Secret
metadata:
name: gitea-credentials
namespace: {{ .Values.pipeline.deployment.fluxNamespace }}
labels:
app.kubernetes.io/name: {{ .Values.labels.app.name }}
app.kubernetes.io/component: flux
annotations:
note: "Credentials for Flux GitRepository access"
type: Opaque
stringData:
username: {{ .Values.secrets.git.username | quote }}
password: {{ .Values.secrets.git.password | quote }}
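# Render check for these templates (hypothetical chart layout and values file):
#   helm template . --show-only templates/secrets_test.yaml -f values.yaml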

test_secrets.yaml (new file, 22 lines)
View File

@@ -0,0 +1,22 @@
# Test version of the secrets file to isolate the issue
apiVersion: v1
kind: Secret
metadata:
name: gitea-registry-credentials
namespace: {{ .Values.namespace }}
labels:
app.kubernetes.io/name: {{ .Values.labels.app.name }}
app.kubernetes.io/component: build
annotations:
note: "Registry credentials for pushing images"
type: kubernetes.io/dockerconfigjson
stringData:
.dockerconfigjson: |
{
"auths": {
{{ .Values.secrets.registry.registryUrl | quote }}: {
"username": {{ .Values.secrets.registry.username | quote }},
"password": {{ .Values.secrets.registry.password | quote }}
}
}
}