1. [Overview](#overview)
2. [Monitoring & Observability](#monitoring--observability)
3. [CI/CD Operations](#cicd-operations)
4. [Security Operations](#security-operations)
5. [Database Management](#database-management)
6. [Backup & Recovery](#backup--recovery)
7. [Performance Optimization](#performance-optimization)
8. [Scaling Operations](#scaling-operations)
9. [Incident Response](#incident-response)
10. [Maintenance Tasks](#maintenance-tasks)
11. [Compliance & Audit](#compliance--audit)

---
- **Capacity:** 10-tenant pilot (scalable to 100+)
- **Security:** TLS encryption, RBAC, audit logging
- **Monitoring:** Prometheus, Grafana, AlertManager, SigNoz
- **CI/CD:** Tekton Pipelines, Gitea, Flux CD (GitOps)
- **Email:** Mailu (integrated email server)

**Key Metrics (10-tenant baseline):**
- **Uptime Target:** 99.5% (at most about 3.65 hours of downtime per month, i.e. 0.5% of an average 730-hour month)

| Role | Responsibilities |
|------|------------------|
| **DevOps Engineer** | Deployment, infrastructure, scaling, CI/CD |
| **SRE** | Monitoring, incident response, performance |
| **Security Admin** | Access control, security patches, compliance |
| **Database Admin** | Backups, optimization, migrations |
| **On-Call Engineer** | 24/7 incident response (if applicable) |
| **CI/CD Admin** | Pipeline management, GitOps workflows |

---
- **Database Monitoring** - All 18 PostgreSQL databases + Redis + RabbitMQ
- **Kubernetes Monitoring** - Cluster, node, pod, and container metrics

**Port Forwarding (if ingress not available):**
```bash
# SigNoz Frontend (Main UI)
kubectl port-forward -n bakery-ia svc/signoz 8080:8080

# SigNoz AlertManager
kubectl port-forward -n bakery-ia svc/signoz-alertmanager 9093:9093

# OTel Collector (for debugging)
kubectl port-forward -n bakery-ia svc/signoz-otel-collector 4317:4317 # gRPC
kubectl port-forward -n bakery-ia svc/signoz-otel-collector 4318:4318 # HTTP
```
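When several SigNoz endpoints are needed at once, the four port-forwards above can be started from one terminal and stopped together. This is a minimal sketch, assuming bash 4+ and kubectl on the PATH; `signoz_port_forwards` is a hypothetical helper name (not part of the repository), and its namespace argument defaults to `bakery-ia` as in the commands above.

```bash
# Hypothetical helper (not part of the repository): start the four SigNoz
# port-forwards shown above in the background and stop them together.
# Usage: signoz_port_forwards [namespace]   (defaults to bakery-ia)
signoz_port_forwards() {
  local ns="${1:-bakery-ia}"
  local -a pids=()

  # Same services and ports as the manual commands above
  kubectl port-forward -n "$ns" svc/signoz 8080:8080 &
  pids+=("$!")
  kubectl port-forward -n "$ns" svc/signoz-alertmanager 9093:9093 &
  pids+=("$!")
  kubectl port-forward -n "$ns" svc/signoz-otel-collector 4317:4317 &  # gRPC
  pids+=("$!")
  kubectl port-forward -n "$ns" svc/signoz-otel-collector 4318:4318 &  # HTTP
  pids+=("$!")

  # Double quotes expand the PID list now, so the trap still has the PIDs
  # even though "pids" is local to this function.
  trap "kill ${pids[*]} 2>/dev/null" INT TERM EXIT
  # Block until the forwards exit (or Ctrl-C triggers the trap)
  wait
  trap - INT TERM EXIT
}
```

Run `signoz_port_forwards` and press Ctrl-C to close all four tunnels; while it runs, the SigNoz UI is at http://localhost:8080.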
### Key SigNoz Dashboards and Features

---
## CI/CD Operations

### CI/CD Infrastructure Overview

The platform's CI/CD pipeline is built from three components:
- **Gitea** - Git server and container registry
- **Tekton** - Pipeline automation
- **Flux CD** - GitOps deployment
### Access CI/CD Systems

**Gitea (Git Server):**
- URL: http://gitea.bakery-ia.local (development) or http://gitea.bakewise.ai (production)
- Admin panel: http://gitea.bakery-ia.local/admin

**Tekton Dashboard:**
```bash
# Port forward to access Tekton dashboard
kubectl port-forward -n tekton-pipelines svc/tekton-dashboard 9097:9097
# Access at: http://localhost:9097
```
**Flux Status:**
```bash
# Verify Flux prerequisites and controller health
flux check

# List Git sources and Kustomizations with their Ready status
kubectl get gitrepository -n flux-system
kubectl get kustomization -n flux-system
```
### CI/CD Monitoring

**Check pipeline status:**
```bash
# List all PipelineRuns
kubectl get pipelineruns -n tekton-pipelines

# Check Tekton controller logs
kubectl logs -n tekton-pipelines -l app=tekton-pipelines-controller

# Check Tekton dashboard logs
kubectl logs -n tekton-pipelines -l app=tekton-dashboard
```
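For the daily "Check for failed pipeline runs" task, the PipelineRun list can be narrowed to failures only. This sketch reads each run's `Succeeded` condition through kubectl's JSONPath output instead of parsing table columns; `list_failed_pipelineruns` is a hypothetical helper name, and the namespace defaults to `tekton-pipelines` as in the commands above.

```bash
# Hypothetical helper: print the names of PipelineRuns whose "Succeeded"
# condition is "False". Running runs report "Unknown" and are skipped.
# Usage: list_failed_pipelineruns [namespace]   (defaults to tekton-pipelines)
list_failed_pipelineruns() {
  local ns="${1:-tekton-pipelines}"
  # One "name<TAB>status" line per PipelineRun
  kubectl get pipelineruns -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Succeeded")].status}{"\n"}{end}' |
    awk -F'\t' '$2 == "False" { print $1 }'
}
```

`kubectl describe pipelinerun <name> -n tekton-pipelines` then shows why a listed run failed.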
**Monitor GitOps synchronization:**
```bash
# Check GitRepository status
kubectl get gitrepository -n flux-system -o wide

# Check Kustomization status
kubectl get kustomization -n flux-system -o wide

# Review recent Flux events (reconciliation history), newest last
kubectl get events -n flux-system --sort-by='.lastTimestamp'
```
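To turn the GitOps checks above into a pass/fail signal (for example in a cron job or before a release), each Flux object's Ready condition can be compared against `True`. This is a minimal sketch, assuming all GitRepositories and Kustomizations live in `flux-system`; `flux_not_ready` is a hypothetical helper name. It prints every object that is not ready and returns 1 if it finds at least one.

```bash
# Hypothetical helper: list Flux GitRepositories and Kustomizations whose Ready
# condition is not "True". Returns 1 if any are found, 0 otherwise.
# Usage: flux_not_ready [namespace]   (defaults to flux-system)
flux_not_ready() {
  local ns="${1:-flux-system}" kind lines rc=0
  for kind in gitrepository kustomization; do
    # "name<TAB>ready-status" per object; the status is empty if no Ready condition exists yet
    lines="$(kubectl get "$kind" -n "$ns" \
      -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' |
      awk -F'\t' -v kind="$kind" '$2 != "True" { print kind "/" $1 " Ready=" ($2 == "" ? "<none>" : $2) }')"
    if [[ -n "$lines" ]]; then
      printf '%s\n' "$lines"
      rc=1
    fi
  done
  return "$rc"
}
```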
### CI/CD Troubleshooting

**Pipeline not triggering:**
```bash
# Gitea records each webhook delivery and its HTTP response on the webhook's
# page under the repository's Settings > Webhooks; check there first.

# Check Tekton Triggers controller logs
kubectl logs -n tekton-pipelines -l app=tekton-triggers-controller

# Verify EventListener pods are running (Tekton labels them "eventlistener=<name>")
kubectl get pods -n tekton-pipelines -l eventlistener

# Check TriggerBinding configuration
kubectl get triggerbinding -n tekton-pipelines
```
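If the Triggers components look healthy, posting an event directly to the EventListener separates "Gitea never delivers the webhook" from "Tekton rejects or ignores the event". A minimal sketch follows. It assumes the EventListener's Service follows Tekton's `el-<name>` naming and has been port-forwarded to `localhost:8080` (for example `kubectl port-forward -n tekton-pipelines svc/el-<name> 8080:8080`). `send_test_push` and its default payload are hypothetical placeholders: if an interceptor validates a webhook secret or the TriggerBinding reads other fields, paste a real payload from Gitea's webhook delivery history instead.

```bash
# Hypothetical helper: POST a Gitea-style push event straight to an EventListener.
# Usage: send_test_push [url] [json-payload]
# The default URL assumes a port-forward to the EventListener Service on localhost:8080;
# the default payload is a placeholder, not a complete Gitea push event.
send_test_push() {
  local default_payload='{"ref":"refs/heads/main"}'
  local url="${1:-http://localhost:8080}"
  local payload="${2:-$default_payload}"
  # X-Gitea-Event names the event type, as Gitea does for real deliveries.
  # -w prints the HTTP status so accepted and rejected requests are easy to tell apart.
  curl -sS -X POST "$url" \
    -H 'Content-Type: application/json' \
    -H 'X-Gitea-Event: push' \
    -d "$payload" \
    -w '\nHTTP status: %{http_code}\n'
}
```

A new entry in `kubectl get pipelineruns -n tekton-pipelines` after the request confirms that the trigger fired.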
**Build failures:**
```bash
# Check Kaniko logs for build errors; every Tekton step is its own container, and
# selector-based queries show only the last 10 lines unless --tail is set
kubectl logs -n tekton-pipelines -l tekton.dev/task=kaniko-build --all-containers --tail=100

# Inspect TaskRun parameters (e.g. the Dockerfile path) and step status
kubectl describe taskrun -n tekton-pipelines
```
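Label-selector queries mix together every TaskRun of a task; to read one failed build in full, the TaskRun's pod can be looked up from its status. This sketch assumes TaskRuns run in `tekton-pipelines` as above and reads the pod name from the TaskRun's `.status.podName` field; `taskrun_logs` is a hypothetical helper name.

```bash
# Hypothetical helper: print every step's log for a single TaskRun.
# Usage: taskrun_logs <taskrun-name> [namespace]   (defaults to tekton-pipelines)
taskrun_logs() {
  local name="$1" ns="${2:-tekton-pipelines}" pod
  if [[ -z "$name" ]]; then
    echo "usage: taskrun_logs <taskrun-name> [namespace]" >&2
    return 2
  fi
  # The TaskRun status records which pod executed its steps
  pod="$(kubectl get taskrun "$name" -n "$ns" -o jsonpath='{.status.podName}')" || return
  if [[ -z "$pod" ]]; then
    echo "TaskRun $name has no pod yet" >&2
    return 1
  fi
  # Each step is a separate container; --prefix tags lines with pod/container names
  kubectl logs -n "$ns" "$pod" --all-containers --prefix
}
```

TaskRun names for a failed build are listed by `kubectl get taskruns -n tekton-pipelines`.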
**Flux not applying changes:**
```bash
# Check GitRepository status
kubectl describe gitrepository -n flux-system

# Check Kustomization reconciliation
kubectl describe kustomization -n flux-system

# Check Flux controller logs: source-controller fetches Git, kustomize-controller
# applies Kustomizations, helm-controller handles HelmReleases (if used)
kubectl logs -n flux-system deploy/source-controller
kubectl logs -n flux-system deploy/kustomize-controller
kubectl logs -n flux-system deploy/helm-controller
```
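After fixing the cause, Flux can be told to re-fetch Git and re-apply immediately instead of waiting for the next sync interval. A minimal sketch with the `flux` CLI, assuming the Kustomization is named `flux-system` (the name `flux bootstrap` creates by default; yours may differ); `flux_resync` is a hypothetical helper name.

```bash
# Hypothetical helper: force an immediate Flux sync, then show the result.
# Usage: flux_resync [kustomization-name] [namespace]   (both default to flux-system)
flux_resync() {
  local name="${1:-flux-system}" ns="${2:-flux-system}"
  # --with-source first reconciles the Kustomization's source (its GitRepository)
  flux reconcile kustomization "$name" -n "$ns" --with-source || return
  # Show the Ready status and applied revision of every Kustomization
  flux get kustomizations -n "$ns"
}
```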
### CI/CD Maintenance Tasks

**Daily Tasks:**
- [ ] Check for failed pipeline runs
- [ ] Verify GitOps synchronization status
- [ ] Clean up old PipelineRun resources

**Weekly Tasks:**
- [ ] Review pipeline performance metrics
- [ ] Update pipeline definitions if needed
- [ ] Rotate CI/CD secrets

**Monthly Tasks:**
- [ ] Update Tekton and Flux versions
- [ ] Review and optimize pipeline performance
- [ ] Audit CI/CD access permissions
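For the daily "Clean up old PipelineRun resources" task, a fixed retention count is easier to audit than ad-hoc deletes. This sketch keeps the N most recent PipelineRuns and deletes the rest; it assumes the runs live in `tekton-pipelines`, and `prune_pipelineruns` is a hypothetical helper that only prints what it would delete unless `--apply` is given. Deleting a PipelineRun also garbage-collects the TaskRuns and pods it owns.

```bash
# Hypothetical helper: keep the newest N PipelineRuns, delete older ones.
# Usage: prune_pipelineruns [keep=20] [namespace=tekton-pipelines] [--apply]
# Without --apply it is a dry run that only prints what would be deleted.
prune_pipelineruns() {
  local keep="${1:-20}" ns="${2:-tekton-pipelines}" mode="${3:-}"
  local -a runs=()
  # Names as "pipelinerun.tekton.dev/<name>", sorted oldest first
  mapfile -t runs < <(kubectl get pipelineruns -n "$ns" \
    --sort-by=.metadata.creationTimestamp -o name)

  local excess=$(( ${#runs[@]} - keep ))
  if (( excess <= 0 )); then
    echo "nothing to prune (${#runs[@]} runs, keeping $keep)"
    return 0
  fi

  # The oldest "excess" entries are the ones beyond the retention count
  local -a old=("${runs[@]:0:excess}")
  if [[ "$mode" == "--apply" ]]; then
    kubectl delete -n "$ns" "${old[@]}"
  else
    printf 'would delete %s\n' "${old[@]}"
  fi
}
```

For example, `prune_pipelineruns 20 tekton-pipelines` previews the cleanup, and appending `--apply` performs it.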
---

## Security Operations

### Security Posture Overview
- [TLS Configuration](./tls-configuration.md) - Certificate management
- [RBAC Implementation](./rbac-implementation.md) - Access control configuration
- [Monitoring Stack README](../infrastructure/kubernetes/base/components/monitoring/README.md) - Detailed monitoring documentation
- [CI/CD Infrastructure README](../infrastructure/cicd/README.md) - Gitea, Tekton, and Flux CD setup and operations
- [SigNoz Monitoring README](../infrastructure/monitoring/signoz/README.md) - SigNoz deployment and configuration

**External Resources:**
- Kubernetes: https://kubernetes.io/docs