SigNoz Monitoring Setup Guide
This guide explains how to set up complete observability for the Bakery IA platform using SigNoz, which provides unified visualization of metrics, logs, and traces.
Table of Contents
- Architecture Overview
- Prerequisites
- SigNoz Deployment
- Service Configuration
- Data Flow
- Verification
- Troubleshooting
Architecture Overview
The monitoring setup has four layers, with telemetry flowing from the services at the top to the SigNoz UI at the bottom:
┌─────────────────────────────────────────────────────────────┐
│ Bakery IA Services │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Auth │ │ Inventory│ │ Orders │ │ ... │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │ │
│ └─────────────┴─────────────┴─────────────┘ │
│ │ │
│ OpenTelemetry Protocol (OTLP) │
│ Traces / Metrics / Logs │
└──────────────────────────┼───────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ SigNoz OpenTelemetry Collector │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Receivers: │ │
│ │ - OTLP gRPC (4317) - OTLP HTTP (4318) │ │
│ │ - Prometheus Scraper (service discovery) │ │
│ └────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ┌────────────────────┴───────────────────────────────────┐ │
│ │ Processors: batch, memory_limiter, resourcedetection │ │
│ └────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ┌────────────────────┴───────────────────────────────────┐ │
│ │ Exporters: ClickHouse (traces, metrics, logs) │ │
│ └────────────────────────────────────────────────────────┘ │
└──────────────────────────┼───────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ ClickHouse Database │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Traces │ │ Metrics │ │ Logs │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└──────────────────────────┼───────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ SigNoz Query Service │
│ & Frontend UI │
│ https://monitoring.bakery-ia.local │
└──────────────────────────────────────────────────────────────┘
Key Components
- Services: Generate telemetry data using OpenTelemetry SDK
- OpenTelemetry Collector: Receives, processes, and exports telemetry
- ClickHouse: Stores traces, metrics, and logs
- SigNoz UI: Query and visualize all telemetry data
Prerequisites
- Kubernetes cluster (Kind, Minikube, or production cluster)
- Helm 3.x installed
- kubectl configured
- At least 4 GB of RAM available for SigNoz components
SigNoz Deployment
1. Add SigNoz Helm Repository
helm repo add signoz https://charts.signoz.io
helm repo update
2. Create Namespace
kubectl create namespace signoz
3. Deploy SigNoz
# For development environment
helm install signoz signoz/signoz \
-n signoz \
-f infrastructure/helm/signoz-values-dev.yaml
# For production environment
helm install signoz signoz/signoz \
-n signoz \
-f infrastructure/helm/signoz-values-prod.yaml
4. Verify Deployment
# Check all pods are running
kubectl get pods -n signoz
# Expected output:
# signoz-alertmanager-0
# signoz-clickhouse-0
# signoz-frontend-*
# signoz-otel-collector-*
# signoz-query-service-*
# Check services
kubectl get svc -n signoz
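The pod check above can also be scripted. The following is a minimal sketch, assuming the pod list is obtained with `kubectl get pods -n signoz -o json` and parsed with `json.loads`. The function name `missing_components` is hypothetical, and the prefixes are copied from the expected output listed above.

```python
from typing import List

# Component name prefixes copied from the expected output listed above.
EXPECTED_PREFIXES = (
    "signoz-alertmanager",
    "signoz-clickhouse",
    "signoz-frontend",
    "signoz-otel-collector",
    "signoz-query-service",
)


def missing_components(pod_list: dict) -> List[str]:
    """Return the expected prefixes that have no pod in the Running phase.

    `pod_list` is the parsed JSON of `kubectl get pods -n signoz -o json`.
    """
    running = [
        item["metadata"]["name"]
        for item in pod_list.get("items", [])
        if item.get("status", {}).get("phase") == "Running"
    ]
    return [
        prefix
        for prefix in EXPECTED_PREFIXES
        if not any(name.startswith(prefix) for name in running)
    ]
```

An empty result means every expected component has at least one running pod. Note that the Running phase does not guarantee that the containers have passed their readiness probes.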
Service Configuration
Each microservice needs to be configured to send telemetry to SigNoz.
Environment Variables
Add these environment variables to your service deployments:
env:
  # OpenTelemetry Collector endpoint
  - name: OTEL_COLLECTOR_ENDPOINT
    value: "http://signoz-otel-collector.signoz.svc.cluster.local:4318"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://signoz-otel-collector.signoz.svc.cluster.local:4318"
  # Service identification
  - name: OTEL_SERVICE_NAME
    value: "your-service-name"  # e.g., "auth-service"
  # Enable tracing
  - name: ENABLE_TRACING
    value: "true"
  # Enable logs export
  - name: OTEL_LOGS_EXPORTER
    value: "otlp"
  - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED
    value: "true"
  # Enable metrics export (optional, default: true)
  - name: ENABLE_OTEL_METRICS
    value: "true"
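To illustrate how a service might consume these variables at startup, here is a minimal sketch. The `TelemetrySettings` class and `load_telemetry_settings` function are hypothetical and are not part of the shared library. The fallback values, other than metrics defaulting to enabled, are assumptions, as is the choice to prefer `OTEL_EXPORTER_OTLP_ENDPOINT` over `OTEL_COLLECTOR_ENDPOINT`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings such as "true" or "1"."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class TelemetrySettings:
    service_name: str
    endpoint: str
    tracing_enabled: bool
    metrics_enabled: bool
    logs_exporter: str


def load_telemetry_settings(env: Mapping[str, str] = os.environ) -> TelemetrySettings:
    # Assumption: the standard OTLP variable wins over the project-specific one.
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or env.get("OTEL_COLLECTOR_ENDPOINT", "")
    return TelemetrySettings(
        service_name=env.get("OTEL_SERVICE_NAME", "unknown-service"),  # assumed fallback
        endpoint=endpoint.rstrip("/"),
        tracing_enabled=_as_bool(env.get("ENABLE_TRACING", "false")),  # assumed default
        metrics_enabled=_as_bool(env.get("ENABLE_OTEL_METRICS", "true")),
        logs_exporter=env.get("OTEL_LOGS_EXPORTER", "none"),  # assumed fallback
    )
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.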
Prometheus Annotations
Add these annotations to enable Prometheus metrics scraping:
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8000"
    prometheus.io/path: "/metrics"
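The three annotations together describe a scrape target. The sketch below shows one way a discovery step could turn them into a URL. The `scrape_url` function is hypothetical, and the real collector's relabeling rules may behave differently, for example when the port annotation is missing.

```python
from typing import Mapping, Optional


def scrape_url(pod_ip: str, annotations: Mapping[str, str]) -> Optional[str]:
    """Build the metrics URL for a pod, or return None if it should not be scraped."""
    if annotations.get("prometheus.io/scrape", "").lower() != "true":
        return None
    port = annotations.get("prometheus.io/port")
    if not port:
        # Assumption of this sketch: without an explicit port there is no target.
        return None
    # /metrics is the conventional default path for Prometheus scrapes.
    path = annotations.get("prometheus.io/path", "/metrics")
    if not path.startswith("/"):
        path = "/" + path
    return f"http://{pod_ip}:{port}{path}"
```

With the annotations shown above and a pod IP of 10.0.0.5, the function yields `http://10.0.0.5:8000/metrics`.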
Complete Example
See infrastructure/kubernetes/base/components/auth/auth-service.yaml for a complete example.
Automated Configuration Script
Use the provided script to add monitoring configuration to all services:
# Run from project root
./infrastructure/kubernetes/add-monitoring-config.sh
Data Flow
1. Traces
Automatic Instrumentation:
# In your service's main.py
from shared.service_base import StandardFastAPIService
service = AuthService() # Extends StandardFastAPIService
app = service.create_app()
# Tracing is automatically enabled if ENABLE_TRACING=true
# All FastAPI endpoints, HTTP clients, Redis, PostgreSQL are auto-instrumented
Manual Instrumentation:
from shared.monitoring.tracing import add_trace_attributes, add_trace_event
# Add custom attributes to current span
add_trace_attributes(
    user_id="123",
    tenant_id="abc",
    operation="user_registration"
)
# Add events for important operations
add_trace_event("user_authenticated", user_id="123", method="jwt")
2. Metrics
Dual Export Strategy:
Services export metrics in two ways:
- Prometheus format at the /metrics endpoint (scraped by SigNoz)
- OTLP push directly to the SigNoz collector (real-time)
Built-in Metrics:
# Automatically collected by BaseFastAPIService:
# - http_requests_total
# - http_request_duration_seconds
# - active_connections
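For orientation, this is roughly what a counter looks like in the Prometheus text format served at /metrics. The sketch is a simplified renderer written for illustration, not the code the services use. The `method` and `status` labels in the usage note are hypothetical, and label-value escaping is omitted.

```python
from typing import Iterable, Mapping, Tuple

Sample = Tuple[Mapping[str, str], float]


def render_counter(name: str, help_text: str, samples: Iterable[Sample]) -> str:
    """Render one counter family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        # Labels are sorted so the output is stable between scrapes.
        label_str = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        suffix = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"
```

Calling `render_counter("http_requests_total", "Total HTTP requests", [({"method": "GET", "status": "200"}, 3)])` produces a HELP line, a TYPE line, and the sample line `http_requests_total{method="GET",status="200"} 3`.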
Custom Metrics:
# Define in your service
custom_metrics = {
    "user_registrations": {
        "type": "counter",
        "description": "Total user registrations",
        "labels": ["status"]
    },
    "login_duration_seconds": {
        "type": "histogram",
        "description": "Login request duration"
    }
}
service = AuthService(custom_metrics=custom_metrics)
# Use in your code
service.metrics_collector.increment_counter(
    "user_registrations",
    labels={"status": "success"}
)
3. Logs
Automatic Export:
# Logs are automatically exported if OTEL_LOGS_EXPORTER=otlp
import logging
logger = logging.getLogger(__name__)
# This will appear in SigNoz
logger.info("User logged in", extra={"user_id": "123", "tenant_id": "abc"})
Structured Logging with Context:
from shared.monitoring.logs_exporter import add_log_context
# Add context that persists across log calls
log_ctx = add_log_context(
    request_id="req_123",
    user_id="user_456",
    tenant_id="tenant_789"
)
# All subsequent logs include this context
log_ctx.info("Processing order") # Includes request_id, user_id, tenant_id
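The implementation of `add_log_context` is not shown in this guide. As an illustration of the general technique, persistent context can be built with the standard library's `logging.LoggerAdapter`. The `ContextLogger` class and `with_log_context` function below are hypothetical names for this sketch only.

```python
import logging


class ContextLogger(logging.LoggerAdapter):
    """Logger adapter that attaches fixed key-value context to every record."""

    def process(self, msg, kwargs):
        # Merge per-call `extra` on top of the persistent context, so a single
        # call can add fields without losing request_id, tenant_id, and so on.
        per_call = kwargs.get("extra") or {}
        kwargs["extra"] = {**self.extra, **per_call}
        return msg, kwargs


def with_log_context(logger: logging.Logger, **context) -> ContextLogger:
    """Return a logger-like object that includes `context` in every record."""
    return ContextLogger(logger, context)
```

Each context key becomes an attribute on the emitted `LogRecord`, which a JSON formatter or an OTLP log handler can then export as a structured field.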
Trace Correlation:
from shared.monitoring.logs_exporter import get_current_trace_context
# Get trace context for correlation
trace_ctx = get_current_trace_context()
logger.info("Processing request", extra=trace_ctx)
# Logs now include trace_id and span_id for correlation
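For correlation to work, the IDs written to logs must match the form shown for traces. The OpenTelemetry Python API exposes trace and span IDs as integers, while trace UIs conventionally display them as fixed-width lowercase hex. The sketch below shows that conversion. `format_trace_context` is a hypothetical helper, and the actual return value of `get_current_trace_context` may differ.

```python
from typing import Dict


def format_trace_context(trace_id: int, span_id: int) -> Dict[str, str]:
    """Format integer IDs as the hex strings used for trace-log correlation."""
    return {
        "trace_id": format(trace_id, "032x"),  # 128-bit ID, 32 hex characters
        "span_id": format(span_id, "016x"),    # 64-bit ID, 16 hex characters
    }
```

Zero padding matters here: an ID with leading zeros that is logged without padding will not match the same ID in the trace view.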
Verification
1. Check Service Health
# Check that services are exporting telemetry
kubectl logs -n bakery-ia deployment/auth-service | grep -i "telemetry\|otel\|signoz"
# Expected output includes:
# - "Distributed tracing configured"
# - "OpenTelemetry logs export configured"
# - "OpenTelemetry metrics export configured"
2. Access SigNoz UI
# Port-forward (for local development)
kubectl port-forward -n signoz svc/signoz-frontend 3301:3301
# Or via Ingress
open https://monitoring.bakery-ia.local
3. Verify Data Ingestion
Traces:
- Go to SigNoz UI → Traces
- You should see traces from your services
- Click on a trace to see the full span tree
Metrics:
- Go to SigNoz UI → Metrics
- Query: http_requests_total
- Filter by service: service="auth-service"
Logs:
- Go to SigNoz UI → Logs
- Filter by service: service_name="auth-service"
- Search for specific log messages
4. Test Trace-Log Correlation
- Find a trace in SigNoz UI
- Copy the trace_id
- Go to Logs tab
- Search: trace_id="<your-trace-id>"
- You should see all logs for that trace
Troubleshooting
No Data in SigNoz
1. Check OpenTelemetry Collector:
# Check collector logs
kubectl logs -n signoz deployment/signoz-otel-collector
# Should see:
# - "Receiver is starting"
# - "Exporter is starting"
# - No error messages
2. Check Service Configuration:
# Verify environment variables
kubectl get deployment auth-service -n bakery-ia -o yaml | grep -A 20 "env:"
# Verify annotations
kubectl get deployment auth-service -n bakery-ia -o yaml | grep -A 5 "annotations:"
3. Check Network Connectivity:
# Test from service pod
kubectl exec -n bakery-ia deployment/auth-service -- \
curl -v http://signoz-otel-collector.signoz.svc.cluster.local:4318/v1/traces
# Should return: 405 Method Not Allowed (POST required)
# If connection refused, check network policies
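If curl is not installed in the service image, the same probe can be run with the Python standard library. This is a minimal sketch. The `probe` and `interpret_probe` functions are hypothetical helpers, and the diagnostic wording is illustrative.

```python
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str, timeout: float = 3.0) -> Optional[int]:
    """Send a GET and return the HTTP status, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 405 for a GET against /v1/traces
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, connection refused, or timeout


def interpret_probe(status: Optional[int]) -> str:
    """Translate a probe result into a short diagnosis."""
    if status is None:
        return "unreachable: check the service name, port, and network policies"
    if status == 405:
        return "reachable: the OTLP HTTP receiver rejects GET, as expected"
    if 200 <= status < 500:
        return f"reachable: collector answered with HTTP {status}"
    return f"reachable but unhealthy: collector answered with HTTP {status}"
```

For example, `interpret_probe(probe("http://signoz-otel-collector.signoz.svc.cluster.local:4318/v1/traces"))` can be run from a Python shell inside the service pod.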
Traces Not Appearing
Check instrumentation:
# Verify tracing is enabled
import os
print(os.getenv("ENABLE_TRACING")) # Should be "true"
print(os.getenv("OTEL_COLLECTOR_ENDPOINT")) # Should be set
Check trace sampling:
# Verify sampling rate (default 100%)
kubectl logs -n bakery-ia deployment/auth-service | grep "sampling"
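This guide does not show how the services sample traces. If the rate is ever lowered below 100%, a common approach is deterministic sampling on the trace ID, so that every service makes the same decision for a given trace. The sketch below illustrates that idea, assuming a ratio-based scheme similar in spirit to OpenTelemetry's trace-ID ratio sampler. It is not taken from the project's code.

```python
TRACE_ID_MASK = (1 << 64) - 1  # only the low 64 bits take part in the decision


def should_sample(trace_id: int, ratio: float) -> bool:
    """Return True if a trace with this ID is kept at the given sampling ratio."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    # Map the ratio onto the 64-bit ID space and keep IDs below the bound.
    bound = round(ratio * (TRACE_ID_MASK + 1))
    return (trace_id & TRACE_ID_MASK) < bound
```

At a ratio of 1.0 every trace is kept, which matches the default noted above. At lower ratios, a missing trace may simply have been sampled out rather than lost.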
Metrics Not Appearing
1. Verify Prometheus annotations:
kubectl get pods -n bakery-ia -o yaml | grep "prometheus.io"
2. Test metrics endpoint:
# Port-forward service
kubectl port-forward -n bakery-ia deployment/auth-service 8000:8000
# Test endpoint
curl http://localhost:8000/metrics
# Should return Prometheus format metrics
3. Check SigNoz scrape configuration:
# Check collector config
kubectl get configmap -n signoz signoz-otel-collector -o yaml | grep -A 30 "prometheus:"
Logs Not Appearing
1. Verify log export is enabled:
kubectl get deployment auth-service -n bakery-ia -o yaml | grep -A 1 OTEL_LOGS_EXPORTER
# Should show the variable name followed by: value: "otlp"
2. Check log format:
# Logs should be JSON formatted
kubectl logs -n bakery-ia deployment/auth-service | head -5
3. Verify OTLP endpoint:
# Test logs endpoint
kubectl exec -n bakery-ia deployment/auth-service -- \
curl -X POST http://signoz-otel-collector.signoz.svc.cluster.local:4318/v1/logs \
-H "Content-Type: application/json" \
-d '{"resourceLogs":[]}'
# Should return 200 OK or 400 Bad Request (not connection error)
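The empty payload above only proves connectivity. To see a test record arrive in the Logs view, the request body needs at least one log record. The sketch below builds a minimal OTLP/JSON body. `build_log_payload` is a hypothetical helper, and the payload carries only the fields needed for a visible record.

```python
import json
import time


def build_log_payload(service_name: str, message: str, severity: str = "INFO") -> str:
    """Build a minimal OTLP/JSON body containing a single log record."""
    record = {
        # OTLP/JSON encodes 64-bit integers as strings.
        "timeUnixNano": str(time.time_ns()),
        "severityText": severity,
        "body": {"stringValue": message},
    }
    payload = {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeLogs": [{"logRecords": [record]}],
        }]
    }
    return json.dumps(payload)
```

The resulting string can be passed to the curl command above in place of the empty `resourceLogs` body.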
Performance Tuning
For Development
The default configuration is optimized for local development with minimal resources.
For Production
Update the following in signoz-values-prod.yaml:
# Increase collector resources
otelCollector:
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 2Gi
  # Increase batch sizes
  config:
    processors:
      batch:
        timeout: 10s
        send_batch_size: 10000  # Increased from 1024
  # Add more replicas
  replicaCount: 2
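The two batch settings interact: a batch is sent when it reaches `send_batch_size` items or when `timeout` has elapsed, whichever comes first. The sketch below models that behavior in simplified form to make the trade-off concrete. The `Batcher` class is written for illustration and is not the collector's implementation.

```python
import time
from typing import Callable, List, Optional


class Batcher:
    """Flush items when the batch is full or the oldest item is too old."""

    def __init__(self, send_batch_size: int, timeout_s: float,
                 export: Callable[[List[object]], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.send_batch_size = send_batch_size
        self.timeout_s = timeout_s
        self._export = export
        self._clock = clock
        self._items: List[object] = []
        self._first_item_at: Optional[float] = None

    def add(self, item: object) -> None:
        if not self._items:
            self._first_item_at = self._clock()  # start the timeout window
        self._items.append(item)
        if len(self._items) >= self.send_batch_size:
            self.flush()  # size trigger

    def tick(self) -> None:
        """Call periodically; flushes a partial batch once the timeout elapses."""
        if self._items and self._clock() - self._first_item_at >= self.timeout_s:
            self.flush()  # time trigger

    def flush(self) -> None:
        if self._items:
            batch, self._items = self._items, []
            self._first_item_at = None
            self._export(batch)
```

Larger batches mean fewer writes to ClickHouse but more memory held in the collector, which is why the memory limits are raised alongside `send_batch_size`. Under low traffic, the timeout bounds how long telemetry waits before it is exported.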
Best Practices
- Use Structured Logging: Always use key-value pairs for better querying
- Add Context: Include user_id, tenant_id, request_id in logs
- Trace Business Operations: Add custom spans for important operations
- Monitor Collector Health: Set up alerts for collector errors
- Retention Policy: Configure ClickHouse retention per signal (traces, metrics, logs) to match your storage budget and how far back you need to query
Support
For issues or questions:
- Ask in the SigNoz community Slack: https://signoz.io/slack
- Review the OpenTelemetry docs: https://opentelemetry.io/docs/
- Create an issue in the project repository