
Database Monitoring with SigNoz

This guide explains how to collect metrics and logs from PostgreSQL, Redis, and RabbitMQ databases and send them to SigNoz.

Table of Contents

  1. Overview
  2. PostgreSQL Monitoring
  3. Redis Monitoring
  4. RabbitMQ Monitoring
  5. Database Logs Export
  6. Dashboard Examples

Overview

Database monitoring provides:

  • Metrics: Connection pools, query performance, cache hit rates, disk usage
  • Logs: Query logs, error logs, slow query logs
  • Correlation: Link database metrics with application traces

Three approaches for database monitoring:

  1. OpenTelemetry Collector Receivers (Recommended)

    • Deploy OTel collector as sidecar or separate deployment
    • Scrape database metrics and forward to SigNoz
    • No code changes needed
  2. Application-Level Instrumentation (Already Implemented)

    • Use OpenTelemetry auto-instrumentation in your services
    • Captures database queries as spans in traces
    • Shows query duration, errors in application context
  3. Database Exporters (Advanced)

    • Dedicated exporters (postgres_exporter, redis_exporter)
    • More detailed database-specific metrics
    • Requires additional deployment

PostgreSQL Monitoring

Option 1: OpenTelemetry PostgreSQL Receiver

Deploy a dedicated OpenTelemetry collector instance that scrapes PostgreSQL metrics and forwards them to SigNoz.

Step 1: Create PostgreSQL Monitoring User

-- Create monitoring user with read-only access
CREATE USER otel_monitor WITH PASSWORD 'your-secure-password';
GRANT pg_monitor TO otel_monitor;
GRANT CONNECT ON DATABASE your_database TO otel_monitor;
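
To confirm the grants work before wiring up the collector, connect as the new user and read a statistics view (the host and database names below are illustrative, substitute your own):

# Verify the monitoring user can read pg_stat views
psql "postgresql://otel_monitor:your-secure-password@auth-db-service:5432/auth_db" \
  -c "SELECT datname, numbackends FROM pg_stat_database LIMIT 5;"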

Step 2: Deploy OTel Collector for PostgreSQL

Create a dedicated collector deployment:

# infrastructure/kubernetes/base/monitoring/postgres-otel-collector.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-otel-collector
  namespace: bakery-ia
  labels:
    app: postgres-otel-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres-otel-collector
  template:
    metadata:
      labels:
        app: postgres-otel-collector
    spec:
      containers:
      - name: otel-collector
        image: otel/opentelemetry-collector-contrib:latest  # pin a specific version in production
        ports:
        - containerPort: 4318
          name: otlp-http
        - containerPort: 4317
          name: otlp-grpc
        env:
        # Make the secret from Step 3 available so ${POSTGRES_MONITOR_PASSWORD}
        # in config.yaml resolves; without this the receiver cannot authenticate.
        - name: POSTGRES_MONITOR_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-monitor-secrets
              key: POSTGRES_MONITOR_PASSWORD
        - name: ENVIRONMENT
          value: production  # adjust per environment
        volumeMounts:
        - name: config
          mountPath: /etc/otel-collector
        command:
          - /otelcol-contrib
          - --config=/etc/otel-collector/config.yaml
      volumes:
      - name: config
        configMap:
          name: postgres-otel-collector-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-otel-collector-config
  namespace: bakery-ia
data:
  config.yaml: |
    receivers:
      # PostgreSQL receiver for each database
      postgresql/auth:
        endpoint: auth-db-service:5432
        username: otel_monitor
        password: ${POSTGRES_MONITOR_PASSWORD}
        databases:
          - auth_db
        collection_interval: 30s
        tls:
          insecure: true  # the receiver defaults to TLS; keep only if the DB has no SSL
        # Toggles use a nested 'enabled' key; verify names against your collector version.
        metrics:
          postgresql.backends:
            enabled: true
          postgresql.bgwriter.buffers.allocated:
            enabled: true
          postgresql.bgwriter.buffers.writes:
            enabled: true
          postgresql.blocks_read:
            enabled: true
          postgresql.commits:
            enabled: true
          postgresql.connection.max:
            enabled: true
          postgresql.database.count:
            enabled: true
          postgresql.database.size:
            enabled: true
          postgresql.deadlocks:
            enabled: true
          postgresql.index.scans:
            enabled: true
          postgresql.index.size:
            enabled: true
          postgresql.operations:
            enabled: true
          postgresql.rollbacks:
            enabled: true
          postgresql.rows:
            enabled: true
          postgresql.table.count:
            enabled: true
          postgresql.table.size:
            enabled: true
          postgresql.temp_files:
            enabled: true

      postgresql/inventory:
        endpoint: inventory-db-service:5432
        username: otel_monitor
        password: ${POSTGRES_MONITOR_PASSWORD}
        databases:
          - inventory_db
        collection_interval: 30s
        tls:
          insecure: true  # the receiver defaults to TLS; keep only if the DB has no SSL

      # Add more PostgreSQL receivers for other databases...

    processors:
      batch:
        timeout: 10s
        send_batch_size: 1024

      memory_limiter:
        check_interval: 1s
        limit_mib: 512

      resourcedetection:
        detectors: [env, system]

      # Add database labels
      resource:
        attributes:
          - key: database.system
            value: postgresql
            action: insert
          - key: deployment.environment
            value: ${ENVIRONMENT}
            action: insert

    exporters:
      # Send to SigNoz
      otlphttp:
        endpoint: http://signoz-otel-collector.bakery-ia.svc.cluster.local:4318
        tls:
          insecure: true

      # Debug output (the `logging` exporter is deprecated in recent
      # collector releases in favor of `debug`)
      debug:
        verbosity: normal

    service:
      pipelines:
        metrics:
          receivers: [postgresql/auth, postgresql/inventory]
          # memory_limiter first, batch last (recommended processor order)
          processors: [memory_limiter, resourcedetection, resource, batch]
          exporters: [otlphttp, debug]

Step 3: Create Secrets

# Create secret for monitoring user password
kubectl create secret generic postgres-monitor-secrets \
  -n bakery-ia \
  --from-literal=POSTGRES_MONITOR_PASSWORD='your-secure-password'
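
To double-check what was stored:

# Read the password back out of the secret (base64-decoded)
kubectl get secret postgres-monitor-secrets -n bakery-ia \
  -o jsonpath='{.data.POSTGRES_MONITOR_PASSWORD}' | base64 -d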

Step 4: Deploy

kubectl apply -f infrastructure/kubernetes/base/monitoring/postgres-otel-collector.yaml
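
After the rollout, confirm the collector started cleanly and is scraping (labels and names match the manifest above):

kubectl rollout status -n bakery-ia deployment/postgres-otel-collector
kubectl logs -n bakery-ia deployment/postgres-otel-collector | grep -iE 'postgres|error'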

Option 2: Application-Level Database Metrics (Already Implemented)

Your services already collect database metrics via SQLAlchemy instrumentation:

Metrics automatically collected:

  • db.client.connections.usage - Active database connections
  • db.client.operation.duration - Query duration (SELECT, INSERT, UPDATE, DELETE)
  • Query traces with SQL statements (in trace spans)

View in SigNoz:

  1. Go to Traces → Select a service → Filter by db.operation
  2. See individual database queries with duration
  3. Identify slow queries causing latency

PostgreSQL Metrics Reference

| Metric | Description |
| --- | --- |
| postgresql.backends | Number of active connections |
| postgresql.database.size | Database size in bytes |
| postgresql.commits | Transaction commits |
| postgresql.rollbacks | Transaction rollbacks |
| postgresql.deadlocks | Deadlock count |
| postgresql.blocks_read | Blocks read from disk |
| postgresql.table.size | Table size in bytes |
| postgresql.index.size | Index size in bytes |
| postgresql.rows | Rows inserted/updated/deleted |

Redis Monitoring

Option 1: OpenTelemetry Redis Receiver

Add a redis receiver to the collector configuration. Inject REDIS_PASSWORD into the collector environment the same way as the PostgreSQL monitoring secret.

# Add to the postgres-otel-collector config, or create a separate collector
receivers:
  redis:
    endpoint: redis-service.bakery-ia:6379
    password: ${REDIS_PASSWORD}
    collection_interval: 30s
    tls:  # omit this block entirely if Redis is not TLS-enabled
      insecure_skip_verify: false
      cert_file: /etc/redis-tls/redis-cert.pem
      key_file: /etc/redis-tls/redis-key.pem
      ca_file: /etc/redis-tls/ca-cert.pem
    # Toggles use a nested 'enabled' key; verify names against your collector version.
    metrics:
      redis.clients.connected:
        enabled: true
      redis.clients.blocked:
        enabled: true
      redis.commands.processed:
        enabled: true
      redis.commands.duration:
        enabled: true
      redis.db.keys:
        enabled: true
      redis.db.expires:
        enabled: true
      redis.keyspace.hits:
        enabled: true
      redis.keyspace.misses:
        enabled: true
      redis.memory.used:
        enabled: true
      redis.memory.peak:
        enabled: true
      redis.memory.fragmentation_ratio:
        enabled: true
      redis.cpu.time:
        enabled: true
      redis.replication.offset:
        enabled: true
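
If the receiver lives in the shared collector, wire it into the metrics pipeline as well; a minimal sketch, matching the pipeline defined earlier:

service:
  pipelines:
    metrics:
      receivers: [postgresql/auth, postgresql/inventory, redis]
      processors: [memory_limiter, resourcedetection, resource, batch]
      exporters: [otlphttp, debug]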

Option 2: Application-Level Redis Metrics (Already Implemented)

Your services already collect Redis metrics via Redis instrumentation:

Metrics automatically collected:

  • Redis command traces (GET, SET, etc.) in spans
  • Command duration
  • Command errors

Redis Metrics Reference

| Metric | Description |
| --- | --- |
| redis.clients.connected | Connected clients |
| redis.commands.processed | Total commands processed |
| redis.keyspace.hits | Keyspace hits (numerator of the cache hit rate) |
| redis.keyspace.misses | Keyspace misses |
| redis.memory.used | Memory usage in bytes |
| redis.memory.fragmentation_ratio | Memory fragmentation ratio |
| redis.db.keys | Number of keys per database |

RabbitMQ Monitoring

RabbitMQ exposes metrics via its management API.

receivers:
  rabbitmq:
    endpoint: http://rabbitmq-service.bakery-ia:15672
    username: ${RABBITMQ_USER}
    password: ${RABBITMQ_PASSWORD}
    collection_interval: 30s
    # Toggles use a nested 'enabled' key; verify names against your collector version.
    metrics:
      rabbitmq.consumer.count:
        enabled: true
      rabbitmq.message.current:
        enabled: true
      rabbitmq.message.acknowledged:
        enabled: true
      rabbitmq.message.delivered:
        enabled: true
      rabbitmq.message.published:
        enabled: true
      rabbitmq.queue.count:
        enabled: true
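
The credentials only need read access to the management API, and should be injected into the collector environment like the other secrets. Inside the RabbitMQ pod, a user with the monitoring tag is sufficient (name and password below are placeholders):

# Create a read-only monitoring user for the management API
rabbitmqctl add_user otel_monitor 'your-secure-password'
rabbitmqctl set_user_tags otel_monitor monitoring
rabbitmqctl set_permissions -p / otel_monitor "" "" ".*"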

RabbitMQ Metrics Reference

| Metric | Description |
| --- | --- |
| rabbitmq.consumer.count | Active consumers |
| rabbitmq.message.current | Messages in queue |
| rabbitmq.message.acknowledged | Messages acknowledged |
| rabbitmq.message.delivered | Messages delivered |
| rabbitmq.message.published | Messages published |
| rabbitmq.queue.count | Number of queues |

Database Logs Export

PostgreSQL Logs

Option 1: Configure PostgreSQL to Log to Stdout (Kubernetes-native)

PostgreSQL logs should go to stdout/stderr, which Kubernetes automatically captures.

Update PostgreSQL configuration:

# In your postgres deployment ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
  namespace: bakery-ia
data:
  postgresql.conf: |
    # Logging
    logging_collector = off  # Use stdout/stderr instead
    log_destination = 'stderr'
    log_statement = 'all'  # Or 'ddl', 'mod', 'none'
    log_duration = on
    log_line_prefix = '%m [%p]: user=%u,db=%d,app=%a,client=%h '  # %m = timestamp with milliseconds
    log_min_duration_statement = 100  # Log queries > 100ms
    log_checkpoints = on
    log_connections = on
    log_disconnections = on
    log_lock_waits = on
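
Apply the ConfigMap and restart the database pods so PostgreSQL re-reads postgresql.conf (file and workload names below are examples):

kubectl apply -f postgres-config.yaml
kubectl rollout restart -n bakery-ia statefulset/auth-db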

Option 2: OpenTelemetry Filelog Receiver

If PostgreSQL writes to files, use filelog receiver:

receivers:
  filelog/postgres:
    include:
      - /var/log/postgresql/*.log
    start_at: end
    operators:
      - type: regex_parser
        regex: '^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)(?: [A-Z]+)? \[(?P<pid>\d+)\]: user=(?P<user>[^,]+),db=(?P<database>[^,]+),app=(?P<application>[^,]+),client=(?P<client>[^ ]+) (?P<level>[A-Z]+):  (?P<message>.*)'
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y-%m-%d %H:%M:%S.%f'
        # Map the parsed level to log severity via the parser's severity block
        severity:
          parse_from: attributes.level
      - type: add
        field: attributes["database.system"]
        value: "postgresql"

processors:
  batch: {}  # referenced by the pipeline below

  resource/postgres:
    attributes:
      - key: database.system
        value: postgresql
        action: insert
      - key: service.name
        value: postgres-logs
        action: insert

exporters:
  otlphttp/logs:
    # Base endpoint only: the exporter appends /v1/logs itself
    endpoint: http://signoz-otel-collector.bakery-ia.svc.cluster.local:4318

service:
  pipelines:
    logs/postgres:
      receivers: [filelog/postgres]
      processors: [resource/postgres, batch]
      exporters: [otlphttp/logs]

Redis Logs

Redis logs go to stdout by default, and Kubernetes captures them automatically. To view them in SigNoz:

  1. Ensure Redis pods log to stdout (the default when no logfile directive is set)
  2. Deploy the Kubernetes logs collection DaemonSet below so pod logs are forwarded to SigNoz

Kubernetes Logs Collection (All Pods)

Deploy a DaemonSet to collect all Kubernetes pod logs:

# infrastructure/kubernetes/base/monitoring/logs-collector-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-logs-collector
  namespace: bakery-ia
spec:
  selector:
    matchLabels:
      name: otel-logs-collector
  template:
    metadata:
      labels:
        name: otel-logs-collector
    spec:
      serviceAccountName: otel-logs-collector
      containers:
      - name: otel-collector
        image: otel/opentelemetry-collector-contrib:latest
        command:
          - /otelcol-contrib
          - --config=/etc/otel-collector/config.yaml
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config
          mountPath: /etc/otel-collector
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: config
        configMap:
          name: otel-logs-collector-config
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-logs-collector
rules:
- apiGroups: [""]
  resources: ["pods", "namespaces"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-logs-collector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-logs-collector
subjects:
- kind: ServiceAccount
  name: otel-logs-collector
  namespace: bakery-ia
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-logs-collector
  namespace: bakery-ia
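
The DaemonSet references an otel-logs-collector-config ConfigMap that still needs to be defined. A minimal sketch, assuming containerd-style logs under /var/log/pods (adjust the paths and exclude pattern to your runtime):

apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-logs-collector-config
  namespace: bakery-ia
data:
  config.yaml: |
    receivers:
      filelog:
        include:
          - /var/log/pods/*/*/*.log
        exclude:
          # Avoid ingesting the collector's own logs
          - /var/log/pods/*/otel-collector/*.log
        start_at: end
        operators:
          # Parses CRI/containerd log lines (available in recent collector versions)
          - type: container
    processors:
      batch: {}
    exporters:
      otlphttp:
        endpoint: http://signoz-otel-collector.bakery-ia.svc.cluster.local:4318
        tls:
          insecure: true
    service:
      pipelines:
        logs:
          receivers: [filelog]
          processors: [batch]
          exporters: [otlphttp]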

Dashboard Examples

PostgreSQL Dashboard in SigNoz

Create a custom dashboard with these panels:

  1. Active Connections

    • Query: postgresql.backends
    • Group by: database.name
  2. Query Rate

    • Query: rate(postgresql.commits[5m])
  3. Database Size

    • Query: postgresql.database.size
    • Group by: database.name
  4. Slow Queries

    • Go to Traces
    • Filter: db.system="postgresql" AND duration > 1s
    • See slow queries with full SQL
  5. Connection Pool Usage

    • Query: db.client.connections.usage
    • Group by: service

Redis Dashboard

  1. Hit Rate

    • Query: redis.keyspace.hits / (redis.keyspace.hits + redis.keyspace.misses)
  2. Memory Usage

    • Query: redis.memory.used
  3. Connected Clients

    • Query: redis.clients.connected
  4. Commands Per Second

    • Query: rate(redis.commands.processed[1m])

Quick Reference: What's Monitored

| Database | Metrics | Logs | Traces |
| --- | --- | --- | --- |
| PostgreSQL | Via receiver, via app instrumentation | Stdout/stderr, optional filelog | Query spans in traces |
| Redis | Via receiver, via app instrumentation | Stdout/stderr | Command spans in traces |
| RabbitMQ | Via receiver | Stdout/stderr | Publish/consume spans |

Deployment Checklist

  • Deploy OpenTelemetry collector for database metrics
  • Create monitoring users in PostgreSQL
  • Configure database logging to stdout
  • Verify metrics appear in SigNoz
  • Create database dashboards
  • Set up alerts for connection limits, slow queries, high memory

Troubleshooting

No PostgreSQL metrics

# Check collector logs
kubectl logs -n bakery-ia deployment/postgres-otel-collector

# Test connection to the database. The collector image does not ship psql,
# so run a throwaway client pod instead:
kubectl run psql-client --rm -it -n bakery-ia --image=postgres:16 \
  --env="PGPASSWORD=your-secure-password" -- \
  psql -h auth-db-service -U otel_monitor -d auth_db -c "SELECT 1"

No Redis metrics

# Check Redis connectivity. redis-cli is not in the collector image;
# exec into the Redis pod itself (workload name may differ):
kubectl exec -n bakery-ia deployment/redis -- \
  redis-cli -a 'your-redis-password' ping

Logs not appearing

# Check if logs are going to stdout
kubectl logs -n bakery-ia postgres-pod-name

# Check logs collector
kubectl logs -n bakery-ia daemonset/otel-logs-collector

Best Practices

  1. Use dedicated monitoring users - Don't use application database users
  2. Set appropriate collection intervals - 30s-60s for metrics
  3. Monitor connection pool saturation - Alert before exhausting connections
  4. Track slow queries - Set log_min_duration_statement appropriately
  5. Monitor disk usage - PostgreSQL database size growth
  6. Track cache hit rates - Redis keyspace hits/misses ratio

Additional Resources