bakery-ia/docs/SIGNOZ_VERIFICATION_GUIDE.md
2026-01-09 11:18:20 +01:00

SigNoz Telemetry Verification Guide

Overview

This guide explains how to verify that your services are correctly sending metrics, logs, and traces to SigNoz, and that SigNoz is collecting them properly.

Current Configuration

SigNoz Components

Telemetry Endpoints

The OTel Collector exposes the following endpoints:

Protocol       Port    Purpose
OTLP gRPC      4317    Traces, Metrics, Logs (gRPC)
OTLP HTTP      4318    Traces, Metrics, Logs (HTTP)
Jaeger gRPC    14250   Jaeger traces (gRPC)
Jaeger HTTP    14268   Jaeger traces (HTTP)
Metrics        8888    Prometheus metrics from collector
Health Check   13133   Collector health status
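
A plain TCP connect test is one quick way to confirm that these ports accept connections. The sketch below is illustrative only: it assumes the collector ports are reachable from where it runs (for example after a kubectl port-forward), and the names COLLECTOR_PORTS, port_open, and check_collector are hypothetical helpers, not part of this repository. A successful connect shows only that something is listening, not that it speaks OTLP.

```python
import socket

# Ports copied from the endpoint table above.
COLLECTOR_PORTS = {
    "otlp-grpc": 4317,
    "otlp-http": 4318,
    "jaeger-grpc": 14250,
    "jaeger-http": 14268,
    "metrics": 8888,
    "health": 13133,
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_collector(host: str, ports: dict = COLLECTOR_PORTS) -> dict:
    """Map each named port to whether it accepted a TCP connection."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling check_collector("localhost") after forwarding the ports returns a dictionary such as {"otlp-grpc": True, ...}, which makes it easy to spot a port that is not exposed.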

Service Configuration

Services are configured via the bakery-config ConfigMap:

# Observability enabled
ENABLE_TRACING: "true"
ENABLE_METRICS: "true"
ENABLE_LOGS: "true"

# OTel Collector endpoint
OTEL_EXPORTER_OTLP_ENDPOINT: "http://signoz-otel-collector.bakery-ia.svc.cluster.local:4317"
OTEL_EXPORTER_OTLP_PROTOCOL: "grpc"
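
Because the endpoint port and the protocol must agree, a small consistency check can catch a mismatch before deployment. The following is a minimal sketch, assuming the ConfigMap values have already been loaded into a Python dict; check_otlp_config and EXPECTED_PORT are hypothetical names, and the port pairing simply follows the endpoint table in this guide.

```python
from urllib.parse import urlparse

# Conventional OTLP ports, as listed in the endpoint table of this guide.
EXPECTED_PORT = {"grpc": 4317, "http/protobuf": 4318}


def check_otlp_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    endpoint = config.get("OTEL_EXPORTER_OTLP_ENDPOINT", "")
    protocol = config.get("OTEL_EXPORTER_OTLP_PROTOCOL", "")
    if not endpoint:
        problems.append("OTEL_EXPORTER_OTLP_ENDPOINT is not set")
        return problems
    # Prefix scheme-less values with // so urlparse treats them as host:port.
    parsed = urlparse(endpoint if "://" in endpoint else f"//{endpoint}")
    expected = EXPECTED_PORT.get(protocol)
    if expected is None:
        problems.append(f"unrecognized protocol {protocol!r}")
    elif parsed.port != expected:
        problems.append(
            f"protocol {protocol!r} usually pairs with port {expected}, got {parsed.port}"
        )
    return problems
```

With the values shown above (port 4317 and protocol grpc) the function returns an empty list; swapping the port to 4318 while keeping grpc yields one problem entry.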

Shared Tracing Library

Services use shared/monitoring/tracing.py, which:

  • Auto-instruments FastAPI endpoints
  • Auto-instruments HTTPX (inter-service calls)
  • Auto-instruments Redis operations
  • Auto-instruments SQLAlchemy (PostgreSQL)
  • Uses OTLP exporter to send traces to SigNoz

Default endpoint: http://signoz-otel-collector.bakery-ia:4318 (HTTP)

Verification Steps

1. Quick Verification Script

Run the automated verification script:

./infrastructure/helm/verify-signoz-telemetry.sh

This script checks:

  • SigNoz components are running
  • OTel Collector endpoints are exposed
  • Configuration is correct
  • Health checks pass
  • Data is being collected in ClickHouse

2. Manual Verification

Check SigNoz Components Status

kubectl get pods -n bakery-ia | grep signoz

Expected output:

signoz-0                                      1/1     Running
signoz-otel-collector-xxxxx                   1/1     Running
chi-signoz-clickhouse-cluster-0-0-0           1/1     Running
signoz-zookeeper-0                            1/1     Running
signoz-clickhouse-operator-xxxxx              2/2     Running
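
To check this output automatically rather than by eye, the text can be parsed for pods that are not fully ready. This is a rough sketch that assumes the default kubectl get pods column order (NAME, READY, STATUS); unready_pods is a hypothetical helper name.

```python
def unready_pods(kubectl_output: str) -> list:
    """Return names of pods that are not Running with all containers ready."""
    bad = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "NAME":
            continue  # skip blank lines and the header row
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, _, total = ready.partition("/")
        if status != "Running" or ready_count != total:
            bad.append(name)
    return bad
```

Feeding it the output of the kubectl command above should give an empty list when every SigNoz component is healthy.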

Check OTel Collector Logs

kubectl logs -n bakery-ia -l app.kubernetes.io/component=otel-collector --tail=50

Look for:

  • "msg":"Everything is ready. Begin running and processing data."
  • No error messages about invalid processors
  • Evidence of data reception (traces/metrics/logs)
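
The first two checks in this list can be approximated with a small log scanner. The sketch below assumes only that the readiness message quoted above appears verbatim and that problem lines contain the word "error" in some casing; summarize_collector_log is a hypothetical helper, and the matching is deliberately crude.

```python
READY_MARKER = "Everything is ready. Begin running and processing data."


def summarize_collector_log(lines):
    """Report whether the ready marker appeared and collect lines mentioning errors."""
    ready = False
    errors = []
    for line in lines:
        if READY_MARKER in line:
            ready = True
        # Assumption: any line containing "error" (case-insensitive) is worth a look.
        if "error" in line.lower():
            errors.append(line.rstrip())
    return {"ready": ready, "errors": errors}
```

One way to use it is to pipe the kubectl logs output into a script that calls summarize_collector_log(sys.stdin) and prints the result.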

Check Service Logs for Tracing

# Check a specific service (e.g., gateway)
kubectl logs -n bakery-ia -l app=gateway --tail=100 | grep -i "tracing\|otel"

Expected output:

Distributed tracing configured
service=gateway-service
otel_endpoint=http://signoz-otel-collector.bakery-ia:4318

3. Generate Test Traffic

Run the traffic generation script:

./infrastructure/helm/generate-test-traffic.sh

This script:

  1. Makes API calls to various service endpoints
  2. Checks service logs for telemetry
  3. Waits for data processing (30 seconds)
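
If the shell script is unavailable, the first step can be approximated in Python. This is not the contents of generate-test-traffic.sh, only a sketch of the same idea; fetch_status and generate_traffic are hypothetical names, and the URLs you pass in are whatever endpoints your services actually expose.

```python
import time
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx responses still exercise the service
    except (urllib.error.URLError, OSError):
        return 0


def generate_traffic(urls, rounds=3, fetch=fetch_status, pause=0.0):
    """Call every URL `rounds` times and count the status codes seen."""
    seen = Counter()
    for _ in range(rounds):
        for url in urls:
            seen[fetch(url)] += 1
        time.sleep(pause)
    return seen
```

A count of 0 in the result means some calls never reached a service. As with the script, allow some time (the script waits 30 seconds) before looking for the resulting telemetry.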

4. Verify Data in ClickHouse

# Get ClickHouse password
CH_PASSWORD=$(kubectl get secret -n bakery-ia signoz-clickhouse -o jsonpath='{.data.admin-password}' 2>/dev/null | base64 -d)

# Get ClickHouse pod
CH_POD=$(kubectl get pods -n bakery-ia -l clickhouse.altinity.com/chi=signoz-clickhouse -o jsonpath='{.items[0].metadata.name}')

# Check traces
kubectl exec -n bakery-ia "$CH_POD" -- clickhouse-client --user=admin --password="$CH_PASSWORD" --query="
SELECT
    serviceName,
    COUNT() as trace_count,
    min(timestamp) as first_trace,
    max(timestamp) as last_trace
FROM signoz_traces.signoz_index_v2
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY serviceName
ORDER BY trace_count DESC
"

# Check metrics
kubectl exec -n bakery-ia "$CH_POD" -- clickhouse-client --user=admin --password="$CH_PASSWORD" --query="
SELECT
    metric_name,
    COUNT() as sample_count
FROM signoz_metrics.samples_v4
WHERE unix_milli >= toUnixTimestamp(now() - INTERVAL 1 HOUR) * 1000
GROUP BY metric_name
ORDER BY sample_count DESC
LIMIT 10
"

# Check logs
kubectl exec -n bakery-ia "$CH_POD" -- clickhouse-client --user=admin --password="$CH_PASSWORD" --query="
SELECT
    COUNT() as log_count,
    min(timestamp) as first_log,
    max(timestamp) as last_log
FROM signoz_logs.logs
WHERE timestamp >= now() - INTERVAL 1 HOUR
"

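The result of the trace query can also be checked programmatically, for example to flag services that have gone quiet. The sketch below assumes the query output is tab-separated with the four columns selected above, and that timestamps are printed as "YYYY-MM-DD HH:MM:SS" followed by an optional fractional part in the same timezone as the `now` value you supply. parse_trace_summary and stale_services are hypothetical helpers.

```python
from datetime import datetime, timedelta


def parse_trace_summary(tsv: str):
    """Parse rows of serviceName, trace_count, first_trace, last_trace."""
    rows = []
    for line in tsv.splitlines():
        if not line.strip():
            continue
        service, count, _first, last = line.split("\t")
        # Keep second precision only; the fractional part is not needed here.
        last_seen = datetime.strptime(last[:19], "%Y-%m-%d %H:%M:%S")
        rows.append((service, int(count), last_seen))
    return rows


def stale_services(rows, now, max_age=timedelta(minutes=10)):
    """Return services whose most recent trace is older than max_age."""
    return [svc for svc, _count, last in rows if now - last > max_age]
```

The 10-minute threshold is arbitrary; pick a value that matches how often your services normally receive traffic.
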
5. Access SigNoz UI

Via Ingress

  1. Add to /etc/hosts:

    127.0.0.1 monitoring.bakery-ia.local
    
  2. Access: https://monitoring.bakery-ia.local

Via Port-Forward

kubectl port-forward -n bakery-ia svc/signoz 3301:8080

Then access: http://localhost:3301

6. Explore Telemetry Data in SigNoz UI

  1. Traces:

    • Go to "Services" tab
    • You should see your services listed (gateway, auth-service, inventory-service, etc.)
    • Click on a service to see its traces
    • Click on individual traces to see span details
  2. Metrics:

    • Go to "Dashboards" or "Metrics" tab
    • You should see infrastructure metrics (PostgreSQL, Redis, RabbitMQ)
    • You should see service metrics (request rate, latency, errors)
  3. Logs:

    • Go to "Logs" tab
    • You should see logs from your services
    • You can filter by service name, log level, etc.

Troubleshooting

Services Can't Connect to OTel Collector

Symptoms:

[ERROR] opentelemetry.exporter.otlp.proto.grpc.exporter: Failed to export traces
error code: StatusCode.UNAVAILABLE

Solutions:

  1. Check OTel Collector is running:

    kubectl get pods -n bakery-ia -l app.kubernetes.io/component=otel-collector
    
  2. Verify service can reach collector:

    # From a service pod
    kubectl exec -it -n bakery-ia <service-pod> -- curl -v http://signoz-otel-collector.bakery-ia.svc.cluster.local:4318
    
  3. Check endpoint configuration:

    • When passed to setup_tracing(), the gRPC endpoint (port 4317) should NOT have the http:// prefix
    • The HTTP endpoint (port 4318) should have the http:// prefix

    Update your service's tracing setup:

    # For gRPC (recommended)
    setup_tracing(app, "my-service", otel_endpoint="signoz-otel-collector.bakery-ia.svc.cluster.local:4317")
    
    # For HTTP
    setup_tracing(app, "my-service", otel_endpoint="http://signoz-otel-collector.bakery-ia.svc.cluster.local:4318")
    
  4. Restart services after config changes:

    kubectl rollout restart deployment/<service-name> -n bakery-ia
    

No Data in SigNoz

Possible causes:

  1. Services haven't been called yet

    • Solution: Generate traffic using the test script
  2. Tracing not initialized

    • Check service logs for tracing initialization messages
    • Verify ENABLE_TRACING=true in ConfigMap
  3. Wrong OTel endpoint

    • Verify OTEL_EXPORTER_OTLP_ENDPOINT in ConfigMap
    • Should be: http://signoz-otel-collector.bakery-ia.svc.cluster.local:4317
  4. Service not using tracing library

    • Check that the service imports and calls setup_tracing() in main.py:

    from fastapi import FastAPI
    from shared.monitoring.tracing import setup_tracing
    
    app = FastAPI(title="My Service")
    setup_tracing(app, "my-service")
    

OTel Collector Errors

Check collector logs:

kubectl logs -n bakery-ia -l app.kubernetes.io/component=otel-collector --tail=100

Common errors:

  1. Invalid processor error:

    • Check that signoz-values-dev.yaml uses the signozspanmetrics/delta processor (not spanmetrics)
    • Already fixed in your configuration
  2. ClickHouse connection error:

    • Verify ClickHouse is running
    • Check ClickHouse service is accessible
  3. Configuration validation error:

    • Validate YAML syntax in signoz-values-dev.yaml
    • Check all processors used in pipelines are defined
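
The last check can be automated once the collector configuration is available as a Python dict (for example after parsing the relevant section of signoz-values-dev.yaml with a YAML library). The sketch below assumes the standard collector layout, with a top-level processors map and pipelines under service.pipelines; undefined_processors is a hypothetical helper name.

```python
def undefined_processors(config: dict) -> dict:
    """Map each pipeline name to processors it references but never defines."""
    defined = config.get("processors") or {}
    pipelines = (config.get("service") or {}).get("pipelines") or {}
    missing = {}
    for name, pipeline in pipelines.items():
        unknown = [p for p in pipeline.get("processors", []) if p not in defined]
        if unknown:
            missing[name] = unknown
    return missing
```

An empty result means every processor named in a pipeline has a matching definition; a non-empty result names the pipeline and the processor to fix.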

Infrastructure Metrics

SigNoz automatically collects metrics from your infrastructure:

PostgreSQL Databases

  • Receivers configured for:

    • auth_db (auth-db-service:5432)
    • inventory_db (inventory-db-service:5432)
    • orders_db (orders-db-service:5432)
  • Metrics collected:

    • Connection counts
    • Query performance
    • Database size
    • Table statistics

Redis

  • Endpoint: redis-service:6379
  • Metrics collected:
    • Memory usage
    • Keys count
    • Hit/miss ratio
    • Command stats

RabbitMQ

  • Endpoint: rabbitmq-service:15672 (management API)
  • Metrics collected:
    • Queue lengths
    • Message rates
    • Connection counts
    • Consumer activity

Best Practices

1. Service Implementation

Always initialize tracing in your service's main.py:

from fastapi import FastAPI
from shared.monitoring.tracing import setup_tracing
import os

app = FastAPI(title="My Service")

# Initialize tracing
otel_endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://signoz-otel-collector.bakery-ia:4318")
setup_tracing(
    app,
    service_name="my-service",
    service_version=os.getenv("SERVICE_VERSION", "1.0.0"),
    otel_endpoint=otel_endpoint
)

2. Custom Spans

Add custom spans for important operations:

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

@app.post("/process")
async def process_data(data: dict):
    with tracer.start_as_current_span("process_data") as span:
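        span.set_attribute("data.size", len(data))
        # Span attribute values cannot be None, so default and coerce to str
        span.set_attribute("data.type", str(data.get("type", "unknown")))

        # Your processing logic (process() stands in for your own function)
        result = process(data)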

        span.set_attribute("result.status", "success")
        return result

3. Error Tracking

Record exceptions in spans:

from shared.monitoring.tracing import record_exception

try:
    result = risky_operation()
except Exception as e:
    record_exception(e)
    raise

4. Correlation

Use trace IDs in logs for correlation:

from shared.monitoring.tracing import get_current_trace_id

trace_id = get_current_trace_id()
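# Keyword arguments such as trace_id= assume a structlog-style logger;
# with the standard logging module, pass extra={"trace_id": trace_id} instead.
logger.info("Processing request", trace_id=trace_id)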
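
For services that use the standard logging module, the trace ID can be attached to every record with a logging filter instead of being passed on each call. This is a minimal sketch under stated assumptions: get_trace_id is any zero-argument callable (such as get_current_trace_id, assuming it returns a string or None), and TraceIdFilter and build_logger are hypothetical names.

```python
import logging


class TraceIdFilter(logging.Filter):
    """Attach a trace_id attribute to every record passing through."""

    def __init__(self, get_trace_id):
        super().__init__()
        self._get_trace_id = get_trace_id

    def filter(self, record):
        # Fall back to "-" when no trace is active.
        record.trace_id = self._get_trace_id() or "-"
        return True  # never drop the record, only annotate it


def build_logger(name, get_trace_id, stream=None):
    """Create a logger whose output lines include the current trace ID."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)  # defaults to stderr when stream is None
    handler.setFormatter(
        logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
    )
    handler.addFilter(TraceIdFilter(get_trace_id))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger
```

Putting the filter on the handler keeps call sites unchanged: a plain logger.info("Processing request") then carries the trace ID, which can be pasted into the SigNoz trace search.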

Next Steps

  1. Verify SigNoz is running - Run verification script
  2. Generate test traffic - Run traffic generation script
  3. Check data collection - Query ClickHouse or use UI
  4. Access SigNoz UI - Visualize traces, metrics, and logs
  5. ⏭️ Set up dashboards - Create custom dashboards for your use cases
  6. ⏭️ Configure alerts - Set up alerts for critical metrics
  7. ⏭️ Document - Document common queries and dashboard configurations

Useful Commands

# Quick status check
kubectl get pods -n bakery-ia | grep signoz

# View OTel Collector metrics
kubectl port-forward -n bakery-ia svc/signoz-otel-collector 8888:8888
# Then visit: http://localhost:8888/metrics

# Restart OTel Collector
kubectl rollout restart deployment/signoz-otel-collector -n bakery-ia

# View all services with telemetry
kubectl get pods -n bakery-ia -l tier!=infrastructure

# Check specific service logs
kubectl logs -n bakery-ia -l app=<service-name> --tail=100 -f

# Port-forward to SigNoz UI
kubectl port-forward -n bakery-ia svc/signoz 3301:8080

Resources