Improve monitoring 6

Urtzi Alfaro
2026-01-10 13:43:38 +01:00
parent c05538cafb
commit b089c216db
13 changed files with 1248 additions and 2546 deletions


@@ -38,7 +38,8 @@ Bakery-IA is an **AI-powered SaaS platform** designed specifically for the Spani
**Infrastructure:**
- Docker containers, Kubernetes orchestration
- PostgreSQL 17, Redis 7.4, RabbitMQ 4.1
- Prometheus + Grafana monitoring
- **SigNoz unified observability platform** - Traces, metrics, logs
- OpenTelemetry instrumentation across all services
- HTTPS with automatic certificate renewal
---
@@ -711,6 +712,14 @@ Data Collection → Feature Engineering → Prophet Training
- Service decoupling
- Asynchronous processing
**4. Distributed Tracing (OpenTelemetry)**
- End-to-end request tracking across all 18 microservices
- Automatic instrumentation for FastAPI, HTTPX, SQLAlchemy, Redis
- Performance bottleneck identification
- Database query performance analysis
- External API call monitoring
- Error tracking with full context
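
End-to-end tracking across services depends on each hop forwarding the same trace ID. The sketch below illustrates the W3C Trace Context `traceparent` header that OpenTelemetry propagates by default. It is a standard-library illustration only, not the project's code: the helper names `new_traceparent` and `child_traceparent` are hypothetical, and in practice the auto-instrumentation libraries handle this automatically.

```python
import re
import secrets

# W3C Trace Context "traceparent" layout: version-traceid-parentid-flags
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace, for example at the first service to see a request."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(incoming: str) -> str:
    """Keep the incoming trace ID and mint a new span ID for the next hop."""
    match = TRACEPARENT_RE.match(incoming)
    if match is None:
        # Missing or malformed header: start a fresh trace instead of failing.
        return new_traceparent()
    version, trace_id, _parent_span_id, flags = match.groups()
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

Because every downstream call reuses the trace ID and only changes the span ID, a tracing backend can stitch the spans from different services into a single request timeline.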
### Scalability & Performance
**1. Microservices Architecture**
@@ -731,6 +740,16 @@ Data Collection → Feature Engineering → Prophet Training
- 1,000+ req/sec per gateway instance
- 10,000+ concurrent connections
**4. Observability & Monitoring**
- **SigNoz Platform**: Unified traces, metrics, and logs
- **Auto-Instrumentation**: Zero-code instrumentation via OpenTelemetry
- **Application Monitoring**: All 18 services reporting metrics
- **Infrastructure Monitoring**: 18 PostgreSQL databases, Redis, RabbitMQ
- **Kubernetes Monitoring**: Node, pod, container metrics
- **Log Aggregation**: Centralized logs with trace correlation
- **Real-Time Alerting**: Email and Slack notifications
- **Query Performance**: ClickHouse backend for fast analytics
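
Log aggregation with trace correlation means each log line carries the ID of the trace it belongs to, so a log entry can be linked back to its request. The following is a minimal standard-library sketch of that idea. The `current_trace_id` context variable, the `TraceIdFilter` class, and the logger name are assumptions made for illustration; a real OpenTelemetry setup would read the ID from the active span context rather than from a hand-managed variable.

```python
import contextvars
import logging

# Hypothetical holder for the active trace ID (an assumption for this sketch).
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Attach the active trace ID to every log record passing through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True  # never drop the record, only enrich it


def build_logger(stream) -> logging.Logger:
    """Return a logger that writes trace-tagged lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
    )
    handler.addFilter(TraceIdFilter())
    logger = logging.getLogger("bakery.example")  # illustrative logger name
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

With the trace ID set for the current request, a call such as `logger.info("order created")` emits a line that includes `trace_id=<id>`, which a log backend can then use to join logs and traces.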
---
## Security & Compliance
@@ -786,8 +805,13 @@ Data Collection → Feature Engineering → Prophet Training
- **Orchestration**: Kubernetes
- **Ingress**: NGINX Ingress Controller
- **Certificates**: Let's Encrypt (auto-renewal)
- **Monitoring**: Prometheus + Grafana
- **Logging**: ELK Stack (planned)
- **Observability**: SigNoz (unified traces, metrics, logs)
- **Distributed Tracing**: OpenTelemetry auto-instrumentation (FastAPI, HTTPX, SQLAlchemy, Redis)
- **Application Metrics**: RED metrics (Rate, Errors, Duration) from all 18 services
- **Infrastructure Metrics**: PostgreSQL (18 databases), Redis, RabbitMQ, Kubernetes cluster
- **Log Management**: Centralized logs with trace correlation and Kubernetes metadata
- **Alerting**: Multi-channel notifications (email, Slack) via AlertManager
- **Telemetry Backend**: ClickHouse for high-performance time-series storage
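
To make the RED metrics concrete, here is a small sketch of how rate, error ratio, and a duration percentile could be derived from a window of request records. The `Request` dataclass, the `red_metrics` function, the choice of HTTP 5xx as "error", and the nearest-rank p95 are all assumptions for illustration; the observability backend computes these from span and metric data on its own.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    status_code: int


def red_metrics(requests: list[Request], window_seconds: float) -> dict[str, float]:
    """Compute Rate, Errors, and Duration for one observation window."""
    if not requests:
        return {"rate_per_sec": 0.0, "error_ratio": 0.0, "p95_ms": 0.0}

    # Errors: count server-side failures (HTTP 5xx) only.
    errors = sum(1 for r in requests if r.status_code >= 500)

    # Duration: nearest-rank 95th percentile, using integer arithmetic
    # for the rank so floating-point rounding cannot shift it.
    durations = sorted(r.duration_ms for r in requests)
    rank = (95 * len(durations) + 99) // 100  # ceil(0.95 * n)

    return {
        "rate_per_sec": len(requests) / window_seconds,
        "error_ratio": errors / len(requests),
        "p95_ms": durations[rank - 1],
    }
```

A percentile is used for duration rather than a mean because a few slow requests can hide behind a healthy-looking average.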
### CI/CD Pipeline
1. Code push to GitHub
@@ -834,11 +858,14 @@ Data Collection → Feature Engineering → Prophet Training
- Stripe integration
- Automated billing
### 5. Real-Time Operations
### 5. Real-Time Operations & Observability
- SSE for instant alerts
- WebSocket for live updates
- Sub-second dashboard refresh
- Always up-to-date data
- **Full-stack observability** with SigNoz
- Distributed tracing for performance debugging
- Real-time metrics from all layers (app, DB, cache, queue, cluster)
### 6. Developer-Friendly
- RESTful APIs