# SigNoz Telemetry Verification Guide

## Overview

This guide explains how to verify that your services are sending metrics, logs, and traces to SigNoz, and that SigNoz is collecting them.

## Current Configuration

### SigNoz Components

- **Version**: v0.106.0
- **OTel Collector**: v0.129.12
- **Namespace**: `bakery-ia`
- **Ingress URL**: https://monitoring.bakery-ia.local

### Telemetry Endpoints

The OTel Collector exposes the following endpoints:

| Protocol | Port | Purpose |
|----------|------|---------|
| OTLP gRPC | 4317 | Traces, Metrics, Logs (gRPC) |
| OTLP HTTP | 4318 | Traces, Metrics, Logs (HTTP) |
| Jaeger gRPC | 14250 | Jaeger traces (gRPC) |
| Jaeger HTTP | 14268 | Jaeger traces (HTTP) |
| Metrics | 8888 | Prometheus metrics from collector |
| Health Check | 13133 | Collector health status |

### Service Configuration

Services are configured via the `bakery-config` ConfigMap:

```yaml
# Observability enabled
ENABLE_TRACING: "true"
ENABLE_METRICS: "true"
ENABLE_LOGS: "true"

# OTel Collector endpoint
OTEL_EXPORTER_OTLP_ENDPOINT: "http://signoz-otel-collector.bakery-ia.svc.cluster.local:4317"
OTEL_EXPORTER_OTLP_PROTOCOL: "grpc"
```

### Shared Tracing Library

Services use `shared/monitoring/tracing.py`, which:

- Auto-instruments FastAPI endpoints
- Auto-instruments HTTPX (inter-service calls)
- Auto-instruments Redis operations
- Auto-instruments SQLAlchemy (PostgreSQL)
- Uses the OTLP exporter to send traces to SigNoz

**Default endpoint**: `http://signoz-otel-collector.bakery-ia:4318` (HTTP)

## Verification Steps

### 1. Quick Verification Script

Run the automated verification script:

```bash
./infrastructure/helm/verify-signoz-telemetry.sh
```

This script checks:

- ✅ SigNoz components are running
- ✅ OTel Collector endpoints are exposed
- ✅ Configuration is correct
- ✅ Health checks pass
- ✅ Data is being collected in ClickHouse

### 2. Manual Verification

#### Check SigNoz Components Status

```bash
kubectl get pods -n bakery-ia | grep signoz
```

Expected output:

```
signoz-0                               1/1   Running
signoz-otel-collector-xxxxx            1/1   Running
chi-signoz-clickhouse-cluster-0-0-0    1/1   Running
signoz-zookeeper-0                     1/1   Running
signoz-clickhouse-operator-xxxxx       2/2   Running
```

#### Check OTel Collector Logs

```bash
kubectl logs -n bakery-ia -l app.kubernetes.io/component=otel-collector --tail=50
```

Look for:

- `"msg":"Everything is ready. Begin running and processing data."`
- No error messages about invalid processors
- Evidence of data reception (traces/metrics/logs)

#### Check Service Logs for Tracing

```bash
# Check a specific service (e.g., gateway)
kubectl logs -n bakery-ia -l app=gateway --tail=100 | grep -i "tracing\|otel"
```

Expected output:

```
Distributed tracing configured service=gateway-service otel_endpoint=http://signoz-otel-collector.bakery-ia:4318
```

### 3. Generate Test Traffic

Run the traffic generation script:

```bash
./infrastructure/helm/generate-test-traffic.sh
```

This script:

1. Makes API calls to various service endpoints
2. Checks service logs for telemetry
3. Waits for data processing (30 seconds)

### 4. Verify Data in ClickHouse

```bash
# Get ClickHouse password
CH_PASSWORD=$(kubectl get secret -n bakery-ia signoz-clickhouse -o jsonpath='{.data.admin-password}' 2>/dev/null | base64 -d)

# Get ClickHouse pod
CH_POD=$(kubectl get pods -n bakery-ia -l clickhouse.altinity.com/chi=signoz-clickhouse -o jsonpath='{.items[0].metadata.name}')

# Check traces
kubectl exec -n bakery-ia "$CH_POD" -- clickhouse-client --user=admin --password="$CH_PASSWORD" --query="
SELECT
    serviceName,
    COUNT() as trace_count,
    min(timestamp) as first_trace,
    max(timestamp) as last_trace
FROM signoz_traces.signoz_index_v2
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY serviceName
ORDER BY trace_count DESC
"

# Check metrics
kubectl exec -n bakery-ia "$CH_POD" -- clickhouse-client --user=admin --password="$CH_PASSWORD" --query="
SELECT
    metric_name,
    COUNT() as sample_count
FROM signoz_metrics.samples_v4
WHERE unix_milli >= toUnixTimestamp(now() - INTERVAL 1 HOUR) * 1000
GROUP BY metric_name
ORDER BY sample_count DESC
LIMIT 10
"

# Check logs
kubectl exec -n bakery-ia "$CH_POD" -- clickhouse-client --user=admin --password="$CH_PASSWORD" --query="
SELECT
    COUNT() as log_count,
    min(timestamp) as first_log,
    max(timestamp) as last_log
FROM signoz_logs.logs
WHERE timestamp >= now() - INTERVAL 1 HOUR
"
```

### 5. Access SigNoz UI

#### Via Ingress (Recommended)

1. Add to `/etc/hosts`:

   ```
   127.0.0.1 monitoring.bakery-ia.local
   ```

2. Access: https://monitoring.bakery-ia.local

#### Via Port-Forward

```bash
kubectl port-forward -n bakery-ia svc/signoz 3301:8080
```

Then access: http://localhost:3301

### 6. Explore Telemetry Data in SigNoz UI

1. **Traces**:
   - Go to the "Services" tab
   - You should see your services listed (gateway, auth-service, inventory-service, etc.)
   - Click on a service to see its traces
   - Click on individual traces to see span details

2. **Metrics**:
   - Go to the "Dashboards" or "Metrics" tab
   - You should see infrastructure metrics (PostgreSQL, Redis, RabbitMQ)
   - You should see service metrics (request rate, latency, errors)

3. **Logs**:
   - Go to the "Logs" tab
   - You should see logs from your services
   - You can filter by service name, log level, etc.

## Troubleshooting

### Services Can't Connect to OTel Collector

**Symptoms**:

```
[ERROR] opentelemetry.exporter.otlp.proto.grpc.exporter: Failed to export traces
error code: StatusCode.UNAVAILABLE
```

**Solutions**:

1. **Check OTel Collector is running**:

   ```bash
   kubectl get pods -n bakery-ia -l app.kubernetes.io/component=otel-collector
   ```

2. **Verify service can reach collector**:

   ```bash
   # From a service pod (replace <pod-name> with the name of a running service pod)
   kubectl exec -it -n bakery-ia <pod-name> -- curl -v http://signoz-otel-collector.bakery-ia.svc.cluster.local:4318
   ```

3. **Check endpoint configuration**:
   - gRPC endpoint should NOT have `http://` prefix
   - HTTP endpoint should have `http://` prefix

   Update your service's tracing setup:

   ```python
   # For gRPC (recommended)
   setup_tracing(app, "my-service", otel_endpoint="signoz-otel-collector.bakery-ia.svc.cluster.local:4317")

   # For HTTP
   setup_tracing(app, "my-service", otel_endpoint="http://signoz-otel-collector.bakery-ia.svc.cluster.local:4318")
   ```

4. **Restart services after config changes**:

   ```bash
   # Replace <service-name> with the deployment to restart
   kubectl rollout restart deployment/<service-name> -n bakery-ia
   ```

### No Data in SigNoz

**Possible causes**:

1. **Services haven't been called yet**
   - Solution: Generate traffic using the test script

2. **Tracing not initialized**
   - Check service logs for tracing initialization messages
   - Verify `ENABLE_TRACING=true` in ConfigMap

3. **Wrong OTel endpoint**
   - Verify `OTEL_EXPORTER_OTLP_ENDPOINT` in ConfigMap
   - Should be: `http://signoz-otel-collector.bakery-ia.svc.cluster.local:4317`

4. **Service not using tracing library**
   - Check if service imports and calls `setup_tracing()` in main.py

   ```python
   from fastapi import FastAPI
   from shared.monitoring.tracing import setup_tracing

   app = FastAPI(title="My Service")
   setup_tracing(app, "my-service")
   ```

### OTel Collector Errors

**Check collector logs**:

```bash
kubectl logs -n bakery-ia -l app.kubernetes.io/component=otel-collector --tail=100
```

**Common errors**:

1. **Invalid processor error**:
   - Check `signoz-values-dev.yaml` has `signozspanmetrics/delta` (not `spanmetrics`)
   - Already fixed in your configuration

2. **ClickHouse connection error**:
   - Verify ClickHouse is running
   - Check ClickHouse service is accessible

3. **Configuration validation error**:
   - Validate YAML syntax in `signoz-values-dev.yaml`
   - Check all processors used in pipelines are defined

## Infrastructure Metrics

SigNoz automatically collects metrics from your infrastructure:

### PostgreSQL Databases

- **Receivers configured for**:
  - auth_db (auth-db-service:5432)
  - inventory_db (inventory-db-service:5432)
  - orders_db (orders-db-service:5432)
- **Metrics collected**:
  - Connection counts
  - Query performance
  - Database size
  - Table statistics

### Redis

- **Endpoint**: redis-service:6379
- **Metrics collected**:
  - Memory usage
  - Keys count
  - Hit/miss ratio
  - Command stats

### RabbitMQ

- **Endpoint**: rabbitmq-service:15672 (management API)
- **Metrics collected**:
  - Queue lengths
  - Message rates
  - Connection counts
  - Consumer activity

## Best Practices

### 1. Service Implementation

Always initialize tracing in your service's `main.py`:

```python
import os

from fastapi import FastAPI
from shared.monitoring.tracing import setup_tracing

app = FastAPI(title="My Service")

# Initialize tracing
otel_endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://signoz-otel-collector.bakery-ia:4318")
setup_tracing(
    app,
    service_name="my-service",
    service_version=os.getenv("SERVICE_VERSION", "1.0.0"),
    otel_endpoint=otel_endpoint
)
```

### 2. Custom Spans

Add custom spans for important operations:

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

@app.post("/process")
async def process_data(data: dict):
    with tracer.start_as_current_span("process_data") as span:
        span.set_attribute("data.size", len(data))
        # Attribute values must not be None, so fall back to a placeholder
        span.set_attribute("data.type", data.get("type", "unknown"))

        # Your processing logic
        result = process(data)

        span.set_attribute("result.status", "success")
        return result
```

### 3. Error Tracking

Record exceptions in spans:

```python
from shared.monitoring.tracing import record_exception

try:
    result = risky_operation()
except Exception as e:
    record_exception(e)
    raise
```

### 4. Correlation

Use trace IDs in logs for correlation:

```python
from shared.monitoring.tracing import get_current_trace_id

trace_id = get_current_trace_id()
logger.info("Processing request", trace_id=trace_id)
```

## Next Steps

1. ✅ **Verify SigNoz is running** - Run verification script
2. ✅ **Generate test traffic** - Run traffic generation script
3. ✅ **Check data collection** - Query ClickHouse or use UI
4. ✅ **Access SigNoz UI** - Visualize traces, metrics, and logs
5. ⏭️ **Set up dashboards** - Create custom dashboards for your use cases
6. ⏭️ **Configure alerts** - Set up alerts for critical metrics
7. ⏭️ **Document** - Document common queries and dashboard configurations

## Useful Commands

```bash
# Quick status check
kubectl get pods -n bakery-ia | grep signoz

# View OTel Collector metrics
kubectl port-forward -n bakery-ia svc/signoz-otel-collector 8888:8888
# Then visit: http://localhost:8888/metrics

# Restart OTel Collector
kubectl rollout restart deployment/signoz-otel-collector -n bakery-ia

# View all services with telemetry
kubectl get pods -n bakery-ia -l tier!=infrastructure

# Check specific service logs (replace <service-name> with the app label)
kubectl logs -n bakery-ia -l app=<service-name> --tail=100 -f

# Port-forward to SigNoz UI
kubectl port-forward -n bakery-ia svc/signoz 3301:8080
```

## Resources

- [SigNoz Documentation](https://signoz.io/docs/)
- [OpenTelemetry Python](https://opentelemetry.io/docs/languages/python/)
- [SigNoz GitHub](https://github.com/SigNoz/signoz)
- [Helm Chart Values](infrastructure/helm/signoz-values-dev.yaml)
- [Verification Script](infrastructure/helm/verify-signoz-telemetry.sh)
- [Traffic Generation Script](infrastructure/helm/generate-test-traffic.sh)
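
The endpoint rule from the Troubleshooting section (gRPC endpoints without an `http://` prefix, HTTP endpoints with one) is easy to get wrong when the same environment variable feeds both exporters. The following is a minimal sketch of a helper that rewrites an endpoint into the form this guide expects. The function name `normalize_otel_endpoint` is hypothetical and is not part of `shared/monitoring/tracing.py`; the fallback ports 4317 and 4318 are taken from the Telemetry Endpoints table.

```python
from urllib.parse import urlsplit


def normalize_otel_endpoint(endpoint: str, protocol: str) -> str:
    """Rewrite an OTLP endpoint into the form this guide expects.

    Hypothetical helper for illustration only:
    - "grpc" returns host:port with no scheme
    - "http" or "http/protobuf" returns scheme://host:port plus any path
    """
    protocol = protocol.lower()

    # Without a scheme, urlsplit would misread "host:port", so prefix "//"
    # to make it parse the value as a network location.
    parts = urlsplit(endpoint if "://" in endpoint else f"//{endpoint}")
    if parts.hostname is None:
        raise ValueError(f"cannot parse endpoint: {endpoint!r}")

    if protocol == "grpc":
        # Default to the OTLP gRPC port when none is given
        return f"{parts.hostname}:{parts.port or 4317}"

    if protocol in ("http", "http/protobuf"):
        scheme = parts.scheme or "http"
        # Default to the OTLP HTTP port when none is given
        return f"{scheme}://{parts.hostname}:{parts.port or 4318}{parts.path}"

    raise ValueError(f"unsupported protocol: {protocol!r}")
```

A service could call this on the value of `OTEL_EXPORTER_OTLP_ENDPOINT` before passing it to `setup_tracing()`, assuming `setup_tracing()` follows the prefix rule described in Troubleshooting. For example, `normalize_otel_endpoint("http://signoz-otel-collector.bakery-ia.svc.cluster.local:4317", "grpc")` returns `signoz-otel-collector.bakery-ia.svc.cluster.local:4317`.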