bakery-ia/docs/MONITORING_QUICKSTART.md
Urtzi Alfaro 29d19087f1 Update monitoring packages to latest versions
- Updated all OpenTelemetry packages to latest versions:
  - opentelemetry-api: 1.27.0 → 1.39.1
  - opentelemetry-sdk: 1.27.0 → 1.39.1
  - opentelemetry-exporter-otlp-proto-grpc: 1.27.0 → 1.39.1
  - opentelemetry-exporter-otlp-proto-http: 1.27.0 → 1.39.1
  - opentelemetry-instrumentation-fastapi: 0.48b0 → 0.60b1
  - opentelemetry-instrumentation-httpx: 0.48b0 → 0.60b1
  - opentelemetry-instrumentation-redis: 0.48b0 → 0.60b1
  - opentelemetry-instrumentation-sqlalchemy: 0.48b0 → 0.60b1

- Removed prometheus-client==0.23.1 from all services
- Unified all services to use the same monitoring package versions

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
2026-01-08 19:25:52 +01:00

SigNoz Monitoring Quick Start

Get complete observability (metrics, logs, traces, system metrics) in under 10 minutes using OpenTelemetry.

What You'll Get

  • Distributed Tracing - Complete request flows across all services
  • Application Metrics - HTTP requests, durations, error rates, custom business metrics
  • System Metrics - CPU usage, memory usage, disk I/O, network I/O per service
  • Structured Logs - Searchable logs correlated with traces
  • Unified Dashboard - Single UI for all telemetry data

All data is pushed via the OpenTelemetry Protocol (OTLP): no Prometheus, no scraping needed!

Prerequisites

  • Kubernetes cluster running (Kind/Minikube/Production)
  • Helm 3.x installed
  • kubectl configured

Step 1: Deploy SigNoz

# Add Helm repository
helm repo add signoz https://charts.signoz.io
helm repo update

# Create namespace
kubectl create namespace signoz

# Install SigNoz
helm install signoz signoz/signoz \
  -n signoz \
  -f infrastructure/helm/signoz-values-dev.yaml

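# Wait for pods to be ready (2-3 minutes)
# If no pods match the selector, list the labels the chart applied with:
#   kubectl get pods -n signoz --show-labels
kubectl wait --for=condition=ready pod -l app=signoz -n signoz --timeout=300s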

Step 2: Configure Services

Each service needs OpenTelemetry environment variables. The auth-service is already configured as an example.

Quick Configuration (for remaining services)

Add these environment variables to each service deployment:

env:
  # OpenTelemetry Collector endpoint
  - name: OTEL_COLLECTOR_ENDPOINT
    value: "http://signoz-otel-collector.signoz.svc.cluster.local:4318"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://signoz-otel-collector.signoz.svc.cluster.local:4318"
  - name: OTEL_SERVICE_NAME
    value: "your-service-name"  # e.g., "inventory-service"

  # Enable tracing
  - name: ENABLE_TRACING
    value: "true"

  # Enable logs export
  - name: OTEL_LOGS_EXPORTER
    value: "otlp"

  # Enable metrics export (includes system metrics)
  - name: ENABLE_OTEL_METRICS
    value: "true"
  - name: ENABLE_SYSTEM_METRICS
    value: "true"
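
The guide does not show how a service consumes these variables, so the following is only a minimal sketch of one way a Python service might read them at startup. The `MonitoringConfig` class, the `load_monitoring_config` helper, and the fallback from `OTEL_EXPORTER_OTLP_ENDPOINT` to `OTEL_COLLECTOR_ENDPOINT` are assumptions made for illustration; the per-signal paths (`/v1/traces`, `/v1/metrics`, `/v1/logs`) follow the OTLP/HTTP convention for a base endpoint on port 4318.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Values treated as "enabled" for the ENABLE_* flags (an assumption).
TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class MonitoringConfig:
    """Hypothetical container for the monitoring settings of one service."""

    service_name: str
    otlp_endpoint: str
    tracing_enabled: bool
    metrics_enabled: bool
    system_metrics_enabled: bool
    logs_exporter: str

    def signal_url(self, signal: str) -> str:
        """Return the OTLP/HTTP URL for one signal, e.g. <endpoint>/v1/traces."""
        if signal not in ("traces", "metrics", "logs"):
            raise ValueError(f"unknown signal: {signal}")
        return f"{self.otlp_endpoint.rstrip('/')}/v1/{signal}"


def _flag(env: Mapping[str, str], name: str) -> bool:
    """Interpret an environment variable as a boolean; unset means False."""
    return env.get(name, "false").strip().lower() in TRUTHY


def load_monitoring_config(env: Mapping[str, str] = os.environ) -> MonitoringConfig:
    """Build the config from the variables listed in the deployment snippet."""
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or env.get(
        "OTEL_COLLECTOR_ENDPOINT", ""
    )
    return MonitoringConfig(
        service_name=env.get("OTEL_SERVICE_NAME", "unknown_service"),
        otlp_endpoint=endpoint,
        tracing_enabled=_flag(env, "ENABLE_TRACING"),
        metrics_enabled=_flag(env, "ENABLE_OTEL_METRICS"),
        system_metrics_enabled=_flag(env, "ENABLE_SYSTEM_METRICS"),
        logs_exporter=env.get("OTEL_LOGS_EXPORTER", "none"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise in a unit test before touching any deployment.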

Using the Configuration Script

# Generate configuration patches for all services
./infrastructure/kubernetes/add-monitoring-config.sh

# This creates /tmp/*-otel-patch.yaml files
# Review and manually add to each service deployment

Step 3: Deploy Updated Services

# Apply updated configurations
kubectl apply -k infrastructure/kubernetes/overlays/dev/

# Or restart services to pick up new env vars
kubectl rollout restart deployment -n bakery-ia

# Wait for rollout
kubectl rollout status deployment -n bakery-ia --timeout=5m

Step 4: Access SigNoz UI

Via Ingress

# Add to /etc/hosts if needed
echo "127.0.0.1 monitoring.bakery-ia.local" | sudo tee -a /etc/hosts

# Access UI (`open` is macOS-specific; on Linux use xdg-open or paste the URL into a browser)
open https://monitoring.bakery-ia.local

Via Port Forward

kubectl port-forward -n signoz svc/signoz-frontend 3301:3301
open http://localhost:3301

Step 5: Explore Your Data

Traces

  1. Go to Services tab
  2. See all your services listed
  3. Click on a service → View traces
  4. Click on a trace → See detailed span tree with timing

Metrics

HTTP Metrics (automatically collected):

  • http_requests_total - Total requests by method, endpoint, status
  • http_request_duration_seconds - Request latency
  • active_requests - Current active HTTP requests

System Metrics (automatically collected per service):

  • process.cpu.utilization - Process CPU usage %
  • process.memory.usage - Process memory in bytes
  • process.memory.utilization - Process memory %
  • process.threads.count - Number of threads
  • system.cpu.utilization - System-wide CPU %
  • system.memory.usage - System memory usage
  • system.disk.io.read - Disk bytes read
  • system.disk.io.write - Disk bytes written
  • system.network.io.sent - Network bytes sent
  • system.network.io.received - Network bytes received

Custom Business Metrics (if configured):

  • User registrations
  • Orders created
  • Login attempts
  • etc.
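
As a rough illustration of how such business metrics might be recorded, the sketch below wraps three counters behind a small class. The `BusinessMetrics` class, the metric names, and the attribute keys are hypothetical and not taken from the project. The `meter` argument is expected to expose `create_counter(name, unit=..., description=...)`, which is the shape of the OpenTelemetry Python metrics API.

```python
from typing import Any


class BusinessMetrics:
    """Thin wrapper around counters created from an OpenTelemetry-style meter."""

    def __init__(self, meter: Any) -> None:
        # Metric names are illustrative only, not the project's actual names.
        self._registrations = meter.create_counter(
            "user_registrations",
            unit="1",
            description="Completed user registrations",
        )
        self._orders = meter.create_counter(
            "orders_created",
            unit="1",
            description="Orders created",
        )
        self._logins = meter.create_counter(
            "login_attempts",
            unit="1",
            description="Login attempts, labelled by outcome",
        )

    def record_registration(self, tenant_id: str) -> None:
        """Count one registration for the given tenant."""
        self._registrations.add(1, {"tenant_id": tenant_id})

    def record_order(self, tenant_id: str) -> None:
        """Count one created order for the given tenant."""
        self._orders.add(1, {"tenant_id": tenant_id})

    def record_login(self, tenant_id: str, success: bool) -> None:
        """Count one login attempt and whether it succeeded."""
        self._logins.add(1, {"tenant_id": tenant_id, "success": success})
```

With the OpenTelemetry Python API installed, the meter would typically come from `opentelemetry.metrics.get_meter("auth-service")`. Taking it as a constructor argument keeps the wrapper testable with a stand-in object.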

Logs

  1. Go to Logs tab
  2. Filter by service: service_name="auth-service"
  3. Search for specific messages
  4. See structured fields (user_id, tenant_id, etc.)

Trace-Log Correlation

  1. Find a trace in Traces tab
  2. Note the trace_id
  3. Go to Logs tab
  4. Filter: trace_id="<the-trace-id>"
  5. See all logs for that specific request!
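
The correlation steps above only work if every log line carries the trace ID of the request that produced it. The following is a minimal standard-library sketch of that idea, not the project's actual logging setup. The `current_trace_id` context variable, the JSON field names, and the sample trace ID are assumptions; in an instrumented service the ID would come from the active OpenTelemetry span.

```python
import contextvars
import io
import json
import logging

# Hypothetical holder for the active request's trace ID. In an instrumented
# service this value would be read from the current OpenTelemetry span.
current_trace_id = contextvars.ContextVar("current_trace_id", default="")


class TraceIdFilter(logging.Filter):
    """Copy the current trace ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so fields stay searchable."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "service_name": record.name,
            "message": record.getMessage(),
            "trace_id": getattr(record, "trace_id", ""),
        })


def build_logger(service_name: str, stream) -> logging.Logger:
    """Create a logger that writes trace-aware JSON lines to the given stream."""
    logger = logging.getLogger(service_name)
    logger.handlers.clear()  # avoid duplicate output if called more than once
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def demo() -> dict:
    """Log one message inside a simulated request and return the parsed line."""
    stream = io.StringIO()
    logger = build_logger("auth-service", stream)
    token = current_trace_id.set("4bf92f3577b34da6a3ce929d0e0e4736")
    try:
        logger.info("login succeeded for user_id=%s", 42)
    finally:
        current_trace_id.reset(token)
    return json.loads(stream.getvalue())
```

Because the trace ID is a top-level field in each JSON line, a filter such as trace_id="<the-trace-id>" can match it directly.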

Verification Commands

# Check if services are sending telemetry
kubectl logs -n bakery-ia deployment/auth-service | grep -i "telemetry\|otel"

# Check SigNoz collector is receiving data
kubectl logs -n signoz deployment/signoz-otel-collector | tail -50

# Test connectivity to collector
kubectl exec -n bakery-ia deployment/auth-service -- \
  curl -v http://signoz-otel-collector.signoz.svc.cluster.local:4318

Common Issues

No data in SigNoz

# 1. Verify environment variables are set (shows names and values)
kubectl get deployment auth-service -n bakery-ia -o yaml | grep -A1 -E "OTEL|ENABLE_"

# 2. Check collector logs
kubectl logs -n signoz deployment/signoz-otel-collector

# 3. Restart service
kubectl rollout restart deployment/auth-service -n bakery-ia

Services not appearing

# Check network connectivity
kubectl exec -n bakery-ia deployment/auth-service -- \
  curl http://signoz-otel-collector.signoz.svc.cluster.local:4318

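# Any HTTP response (even an error status such as 404) means the collector is reachable.
# "Connection refused" or a timeout points to a network or service problem.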
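
If `curl` is not available in a service image, the same reachability check can be approximated with Python's standard library. This is a minimal sketch under that assumption; `collector_reachable` is a hypothetical helper, and it only tests that a TCP connection can be opened, not that the collector accepts telemetry.

```python
import socket


def collector_reachable(host: str, port: int = 4318, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, DNS failures, and timeouts.
        return False


if __name__ == "__main__":
    host = "signoz-otel-collector.signoz.svc.cluster.local"
    state = "reachable" if collector_reachable(host) else "NOT reachable"
    print(f"{host}:4318 is {state}")
```

A `False` result inside the pod usually means the service DNS name, namespace, or a network policy needs checking before looking at the application itself.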

Architecture

┌──────────────────────────────────────────────┐
│         Your Microservices                   │
│  ┌──────┐  ┌──────┐  ┌──────┐                │
│  │ auth │  │ inv  │  │orders│  ...           │
│  └──┬───┘  └──┬───┘  └──┬───┘                │
│     │         │         │                    │
│     └─────────┼─────────┘                    │
│               │                              │
│         OTLP Push                            │
│  (traces, metrics, logs)                     │
└───────────────┬──────────────────────────────┘
                │
                ▼
┌──────────────────────────────────────────────┐
│   SigNoz OpenTelemetry Collector             │
│   :4317 (gRPC)  :4318 (HTTP)                 │
│                                              │
│   Receivers: OTLP only (no Prometheus)       │
│   Processors: batch, memory_limiter          │
│   Exporters: ClickHouse                      │
└───────────────┬──────────────────────────────┘
                │
                ▼
┌──────────────────────────────────────────────┐
│         ClickHouse Database                  │
│   Stores: traces, metrics, logs              │
└───────────────┬──────────────────────────────┘
                │
                ▼
┌──────────────────────────────────────────────┐
│       SigNoz Frontend UI                     │
│   monitoring.bakery-ia.local or :3301        │
└──────────────────────────────────────────────┘
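
To make the push path in the diagram concrete, here is a hedged sketch that builds one minimal span in the OTLP JSON encoding and posts it to the collector's HTTP receiver at `/v1/traces`. The helper names, the `manual-smoke-test` scope name, and the span name are illustrative assumptions. Real services send telemetry through the OpenTelemetry SDK exporters rather than hand-built payloads, so this is useful mainly as a smoke test.

```python
import json
import secrets
import time
import urllib.request


def build_test_span(service_name: str, span_name: str = "smoke-test") -> dict:
    """Build a minimal OTLP/JSON trace payload containing a single span."""
    now = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}}
                ]
            },
            "scopeSpans": [{
                "scope": {"name": "manual-smoke-test"},
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 32 hex characters
                    "spanId": secrets.token_hex(8),    # 16 hex characters
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(now - 1_000_000),
                    "endTimeUnixNano": str(now),
                }],
            }],
        }]
    }


def send_test_span(base_endpoint: str, service_name: str, timeout: float = 5.0) -> int:
    """POST the payload to <base_endpoint>/v1/traces and return the HTTP status."""
    body = json.dumps(build_test_span(service_name)).encode("utf-8")
    request = urllib.request.Request(
        base_endpoint.rstrip("/") + "/v1/traces",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `send_test_span("http://signoz-otel-collector.signoz.svc.cluster.local:4318", "smoke-test-service")` from inside the cluster should, if the pipeline is healthy, make a service of that name appear in the SigNoz Services tab shortly afterwards.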

What Makes This Different

Pure OpenTelemetry - No Prometheus involved:

  • All metrics pushed via OTLP (not scraped)
  • Automatic system metrics collection (CPU, memory, disk, network)
  • Unified data model for all telemetry
  • Native trace-metric-log correlation
  • Lower resource usage (no scraping overhead)

Next Steps

  • Create Dashboards - Build custom views for your metrics
  • Set Up Alerts - Configure alerts for errors, latency, resource usage
  • Explore System Metrics - Monitor CPU, memory per service
  • Query Logs - Use the log query language to filter and search
  • Correlate Everything - Jump from traces → logs → metrics

Need Help?


Metrics You Get Out of the Box:

| Category | Metric                        | Description                                |
|----------|-------------------------------|--------------------------------------------|
| HTTP     | http_requests_total           | Total requests by method, endpoint, status |
| HTTP     | http_request_duration_seconds | Request latency histogram                  |
| HTTP     | active_requests               | Current active requests                    |
| Process  | process.cpu.utilization       | Process CPU usage %                        |
| Process  | process.memory.usage          | Process memory in bytes                    |
| Process  | process.memory.utilization    | Process memory %                           |
| Process  | process.threads.count         | Thread count                               |
| System   | system.cpu.utilization        | System CPU %                               |
| System   | system.memory.usage           | System memory usage                        |
| System   | system.memory.utilization     | System memory %                            |
| Disk     | system.disk.io.read           | Disk read bytes                            |
| Disk     | system.disk.io.write          | Disk write bytes                           |
| Network  | system.network.io.sent        | Network sent bytes                         |
| Network  | system.network.io.received    | Network received bytes                     |