Documentation includes:

1. OpAMP Root Cause Analysis:
   - Explains OpAMP (Open Agent Management Protocol) functionality
   - Documents how OpAMP was overwriting the config with "nop" receivers
   - Provides two solution paths:
     * Option 1: Disable OpAMP (current solution)
     * Option 2: Fix OpAMP server configuration (recommended for prod)
   - References: SigNoz architecture and OTel collector docs
2. Database Receivers Configuration:
   - PostgreSQL: Complete setup for 21 database instances
     * SQL commands to create monitoring users
     * Proper pg_monitor role permissions
     * Environment variable configuration
   - Redis: Configuration with/without TLS
     * Uses existing redis-secrets
     * Optional TLS certificate generation
   - RabbitMQ: Management API setup
     * Uses existing rabbitmq-secrets
     * Port 15672 management interface
3. Automation Script:
   - create-pg-monitoring-users.sh
   - Creates monitoring user in all 21 PostgreSQL databases
   - Generates secure random password
   - Verifies permissions
   - Provides next-step commands

Resources Referenced:
- PostgreSQL: https://signoz.io/docs/integrations/postgresql/
- Redis: https://signoz.io/blog/redis-opentelemetry/
- RabbitMQ: https://signoz.io/blog/opentelemetry-rabbitmq-metrics-monitoring/
- OpAMP: https://signoz.io/docs/operate/configuration/
- OTel Config: https://signoz.io/docs/opentelemetry-collection-agents/opentelemetry-collector/configuration/

Current Infrastructure Discovered:
- 21 PostgreSQL databases (all services have dedicated DBs)
- 1 Redis instance (password in redis-secrets)
- 1 RabbitMQ instance (credentials in rabbitmq-secrets)

Next Implementation Steps:
1. Run create-pg-monitoring-users.sh script
2. Create Kubernetes secrets for monitoring credentials
3. Update signoz-values-dev.yaml with receivers
4. Enable receivers in metrics pipeline
5. Test and verify metric collection

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
# SigNoz Complete Configuration Guide

## Root Cause Analysis and Solutions

This document analyzes the SigNoz telemetry collection issues and describes the proper configuration for all receivers.

---

## Problem 1: OpAMP Configuration Corruption

### Root Cause

**What is OpAMP?**

[OpAMP (Open Agent Management Protocol)](https://signoz.io/docs/operate/configuration/) is a protocol for remote configuration management of OpenTelemetry Collectors. In SigNoz, the backend runs an OpAMP server, and the SigNoz OTel Collector connects to it. The server uses this connection to push configuration changes, such as log pipelines, to the collector at runtime.

**The Issue:**

- OpAMP successfully connected to the SigNoz backend and received remote configuration
- The remote configuration contained only `nop` (no-operation) receivers and exporters
- This configuration overwrote the local collector configuration at runtime
- Result: the collector appeared healthy but could not receive or export any data

**Why This Happened:**

1. The SigNoz backend's OpAMP server was pushing an invalid or incomplete configuration
2. The collector's `--manager-config` flag pointed to the OpAMP configuration, so the collector connected to the OpAMP server and accepted its remote configuration
3. With `--copy-path=/var/tmp/collector-config.yaml`, the collector ran from a copy of its configuration at that path. The remote configuration overwrote that copy and replaced the good config

### Solution Options

#### Option 1: Disable OpAMP (Current Solution)

OpAMP is pushing a bad configuration and we already have a working static configuration, so we disabled it. The patch below replaces the collector's container arguments so that only the static config file is loaded:

```bash
kubectl patch deployment -n bakery-ia signoz-otel-collector --type=json -p='[
  {
    "op": "replace",
    "path": "/spec/template/spec/containers/0/args",
    "value": [
      "--config=/conf/otel-collector-config.yaml",
      "--feature-gates=-pkg.translator.prometheus.NormalizeName"
    ]
  }
]'
```

**Important:** This patch must be reapplied after every `helm install` or `helm upgrade`, because the Helm chart doesn't support disabling OpAMP via values.
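
Every `helm install` or `helm upgrade` undoes the patch, so a quick check can tell you whether it needs to be reapplied. This is a minimal sketch. Like the patch path above, it assumes the collector is the first container in the pod. `opamp_flag_present` and `ensure_opamp_disabled` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: detect whether a Helm upgrade restored the OpAMP flags and, if so,
# re-apply the Option 1 patch. Helper names are hypothetical.
NAMESPACE="${NAMESPACE:-bakery-ia}"
DEPLOYMENT="signoz-otel-collector"

# Succeeds when the container args read from stdin still include
# --manager-config, i.e. OpAMP is active again.
opamp_flag_present() {
  grep -q -- '--manager-config'
}

ensure_opamp_disabled() {
  local args
  args="$(kubectl get deployment -n "$NAMESPACE" "$DEPLOYMENT" \
    -o jsonpath='{.spec.template.spec.containers[0].args}')" || return 1
  if printf '%s\n' "$args" | opamp_flag_present; then
    echo "OpAMP flags found; re-applying the patch"
    kubectl patch deployment -n "$NAMESPACE" "$DEPLOYMENT" --type=json -p='[
      {
        "op": "replace",
        "path": "/spec/template/spec/containers/0/args",
        "value": [
          "--config=/conf/otel-collector-config.yaml",
          "--feature-gates=-pkg.translator.prometheus.NormalizeName"
        ]
      }
    ]'
  else
    echo "OpAMP is already disabled"
  fi
}
```

Run `ensure_opamp_disabled` right after each Helm upgrade. The detection step is kept in its own function so that it can be tested without a cluster.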

#### Option 2: Fix OpAMP Configuration (Recommended for Production)

To properly use OpAMP:

1. **Check SigNoz Backend Configuration:**
   - Verify the SigNoz service is properly configured to serve OpAMP
   - Check logs: `kubectl logs -n bakery-ia statefulset/signoz`
   - Look for OpAMP-related errors

2. **Configure OpAMP Server Settings:**

   According to the [SigNoz configuration documentation](https://signoz.io/docs/operate/configuration/), set these environment variables in the SigNoz StatefulSet:

   ```yaml
   signoz:
     env:
       OPAMP_ENABLED: "true"
       OPAMP_SERVER_ENDPOINT: "ws://signoz:4320/v1/opamp"
   ```

3. **Verify OpAMP Configuration File:**

   ```bash
   kubectl get configmap -n bakery-ia signoz-otel-collector -o yaml
   ```

   The output should contain:

   ```yaml
   otel-collector-opamp-config.yaml: |
     server_endpoint: "ws://signoz:4320/v1/opamp"
   ```

4. **Monitor OpAMP Status:**

   ```bash
   kubectl logs -n bakery-ia deployment/signoz-otel-collector | grep -i opamp
   ```

### References

- [SigNoz Architecture](https://signoz.io/docs/architecture/)
- [OpenTelemetry Collector Configuration](https://signoz.io/docs/opentelemetry-collection-agents/opentelemetry-collector/configuration/)
- [SigNoz Helm Chart](https://github.com/SigNoz/charts)

---

## Problem 2: Database and Infrastructure Receivers Configuration

### Overview

You have the following infrastructure requiring monitoring:

- **21 PostgreSQL databases** (auth, inventory, orders, forecasting, production, etc.)
- **1 Redis instance** (caching layer)
- **1 RabbitMQ instance** (message queue)

All receivers were disabled because they lacked proper credentials and configuration.

---

## PostgreSQL Receiver Configuration

### Prerequisites

According to the [SigNoz PostgreSQL Integration Guide](https://signoz.io/docs/integrations/postgresql/), each PostgreSQL instance needs a monitoring user with the proper permissions.

### Step 1: Create Monitoring Users

For each PostgreSQL database, create a dedicated monitoring user:

**For PostgreSQL 10 and newer:**

```sql
CREATE USER monitoring WITH PASSWORD 'your_secure_password';
GRANT pg_monitor TO monitoring;
GRANT SELECT ON pg_stat_database TO monitoring;
```

**For PostgreSQL 9.6 to 9.x** (the `pg_monitor` role was introduced in PostgreSQL 10):

```sql
CREATE USER monitoring WITH PASSWORD 'your_secure_password';
GRANT SELECT ON pg_stat_database TO monitoring;
```

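### Step 2: Create Monitoring User for All Databases

Run this script to create the monitoring user in all PostgreSQL databases. You can safely run it more than once. The role is created only if it is missing, and its password is reset to the newly generated value on every run.

```bash
#!/bin/bash
# File: infrastructure/scripts/create-pg-monitoring-users.sh
set -euo pipefail

NAMESPACE="bakery-ia"

# NOTE: this list has 19 entries, while this guide counts 21 PostgreSQL
# databases; add any missing deployments here.
DATABASES=(
  "auth-db"
  "inventory-db"
  "orders-db"
  "ai-insights-db"
  "alert-processor-db"
  "demo-session-db"
  "distribution-db"
  "external-db"
  "forecasting-db"
  "notification-db"
  "orchestrator-db"
  "pos-db"
  "procurement-db"
  "production-db"
  "recipes-db"
  "sales-db"
  "suppliers-db"
  "tenant-db"
  "training-db"
)

MONITORING_PASSWORD="monitoring_secure_pass_$(openssl rand -hex 16)"

echo "Creating monitoring users with password: $MONITORING_PASSWORD"
echo "Save this password for your SigNoz configuration!"

for db in "${DATABASES[@]}"; do
  echo "Processing $db..."
  # The SQL is streamed on stdin (hence -i). Sending all statements in one
  # psql -c string runs them as a single transaction, so an "already exists"
  # error from CREATE USER would roll back the GRANTs and keep the old
  # password. Instead, the role is created conditionally and the password is
  # always (re)set. \$\$ is escaped so PostgreSQL receives a literal $$.
  if ! kubectl exec -i -n "$NAMESPACE" "deployment/$db" -- \
      psql -U postgres -v ON_ERROR_STOP=1 <<SQL; then
DO \$\$
BEGIN
  IF NOT EXISTS (SELECT FROM pg_roles WHERE rolname = 'monitoring') THEN
    CREATE ROLE monitoring LOGIN;
  END IF;
END
\$\$;
ALTER ROLE monitoring WITH PASSWORD '$MONITORING_PASSWORD';
GRANT pg_monitor TO monitoring;
GRANT SELECT ON pg_stat_database TO monitoring;
SQL
    echo "WARNING: could not configure the monitoring user in $db" >&2
  fi
done

echo ""
echo "Monitoring users created!"
echo "Password: $MONITORING_PASSWORD"
```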
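
After the script runs, you can confirm that the grant took effect in every database. This is a sketch that assumes the same `deployment/<db>` naming and `postgres` superuser as the script above. `verify_monitoring_role` and `verify_all` are hypothetical names, and the database list is passed in as arguments.

```bash
#!/usr/bin/env bash
# Sketch: check that the "monitoring" role is a member of pg_monitor in each
# PostgreSQL deployment. Function names are hypothetical.
NAMESPACE="${NAMESPACE:-bakery-ia}"

# Succeeds when psql in deployment/$1 reports membership ("t").
# pg_has_role raises an error if the role does not exist; that also counts
# as a failure here because the captured output is then not "t".
verify_monitoring_role() {
  local db="$1" result
  result="$(kubectl exec -n "$NAMESPACE" "deployment/$db" -- \
    psql -U postgres -tA -c "SELECT pg_has_role('monitoring', 'pg_monitor', 'MEMBER')" \
    2>/dev/null)" || result=""
  [[ "$result" == "t" ]]
}

# Prints OK/MISSING per database and returns non-zero if any check failed.
verify_all() {
  local db failed=0
  for db in "$@"; do
    if verify_monitoring_role "$db"; then
      echo "OK      $db"
    else
      echo "MISSING $db"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `verify_all "${DATABASES[@]}"` reuses the list from the script. Any `MISSING` line points to a database where the script printed a warning or was skipped.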

### Step 3: Store Credentials in Kubernetes Secret

```bash
kubectl create secret generic -n bakery-ia postgres-monitoring-secrets \
  --from-literal=POSTGRES_MONITOR_USER=monitoring \
  --from-literal=POSTGRES_MONITOR_PASSWORD=<password-from-script>
```

### Step 4: Configure PostgreSQL Receivers in SigNoz

Update `infrastructure/helm/signoz-values-dev.yaml`:

```yaml
otelCollector:
  config:
    receivers:
      # PostgreSQL receivers for database metrics
      postgresql/auth:
        endpoint: auth-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - auth_db
        collection_interval: 60s
        tls:
          insecure: true  # Set to false if using TLS

      postgresql/inventory:
        endpoint: inventory-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - inventory_db
        collection_interval: 60s
        tls:
          insecure: true

      postgresql/orders:
        endpoint: orders-db-service.bakery-ia:5432
        username: ${env:POSTGRES_MONITOR_USER}
        password: ${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - orders_db
        collection_interval: 60s
        tls:
          insecure: true

      # Add all other databases...

    # Update metrics pipeline
    service:
      pipelines:
        metrics:
          receivers:
            - otlp
            - postgresql/auth
            - postgresql/inventory
            - postgresql/orders
            # Add all PostgreSQL receivers
          processors: [memory_limiter, batch, resourcedetection]
          exporters: [signozclickhousemetrics]
```
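
Writing a `postgresql/<name>` block by hand for every database is repetitive and easy to get wrong. The sketch below generates the blocks instead. It assumes every database follows the same pattern as the three examples above:

- deployment `<name>-db`
- service `<name>-db-service.bakery-ia:5432`
- database `<name>_db`, with hyphens turned into underscores

Check that assumption against each service before you paste the output. `gen_pg_receivers` and `gen_pg_pipeline_entries` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Sketch: emit postgresql receiver blocks and pipeline entries, indented to
# sit under otelCollector.config.receivers and under
# otelCollector.config.service.pipelines.metrics.receivers respectively.
# Assumes <name>-db -> <name>-db-service / <name>_db naming (hypothetical).

gen_pg_receivers() {
  local db name dbname
  for db in "$@"; do
    name="${db%-db}"              # auth-db     -> auth
    dbname="${name//-/_}_db"      # ai-insights -> ai_insights_db
    # \${env:...} is escaped so the collector, not the shell, expands it.
    cat <<EOF
      postgresql/${name}:
        endpoint: ${db}-service.bakery-ia:5432
        username: \${env:POSTGRES_MONITOR_USER}
        password: \${env:POSTGRES_MONITOR_PASSWORD}
        databases:
          - ${dbname}
        collection_interval: 60s
        tls:
          insecure: true
EOF
  done
}

gen_pg_pipeline_entries() {
  local db
  for db in "$@"; do
    printf '            - postgresql/%s\n' "${db%-db}"
  done
}
```

For example, `gen_pg_receivers "${DATABASES[@]}"` prints the receiver blocks, using the array from the Step 2 script. `gen_pg_pipeline_entries "${DATABASES[@]}"` prints the matching list items for the metrics pipeline. Generating both from one list keeps the receivers and the pipeline in sync.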

### Step 5: Add Environment Variables to OTel Collector Deployment

The collector expands `${env:...}` references from its own process environment. The Helm release must therefore inject these variables from the `postgres-monitoring-secrets` secret. Modify your Helm values:

```yaml
otelCollector:
  env:
    - name: POSTGRES_MONITOR_USER
      valueFrom:
        secretKeyRef:
          name: postgres-monitoring-secrets
          key: POSTGRES_MONITOR_USER
    - name: POSTGRES_MONITOR_PASSWORD
      valueFrom:
        secretKeyRef:
          name: postgres-monitoring-secrets
          key: POSTGRES_MONITOR_PASSWORD
```

### References

- [PostgreSQL Monitoring with OpenTelemetry | SigNoz](https://signoz.io/blog/opentelemetry-postgresql-metrics-monitoring/)
- [PostgreSQL Integration | SigNoz](https://signoz.io/docs/integrations/postgresql/)

---

## Redis Receiver Configuration

### Current Infrastructure

- **Service**: `redis-service.bakery-ia:6379`
- **Password**: Available in secret `redis-secrets`
- **TLS**: Currently not configured

### Step 1: Check if Redis Requires TLS

Redis is password-protected, so pass the password from `redis-secrets`:

```bash
kubectl exec -n bakery-ia deployment/redis -- \
  redis-cli -a <password> --no-auth-warning CONFIG GET tls-port
```

If TLS is not configured (`tls-port` is `0` or empty), you can use `insecure: true`.
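
You can script the decision rule above. `redis-cli CONFIG GET tls-port` prints the parameter name and its value on two lines. The hypothetical helper below reads that reply and succeeds only for a non-zero port. An empty reply, which Redis returns for a parameter it does not know, counts as TLS disabled.

```bash
#!/usr/bin/env bash
# Sketch: decide whether Redis has TLS enabled from the output of
#   redis-cli CONFIG GET tls-port
# (line 1: "tls-port", line 2: the port). Helper name is hypothetical.
redis_tls_enabled() {
  local key value
  IFS= read -r key || return 1                          # empty reply: no TLS
  IFS= read -r value || [[ -n "$value" ]] || return 1   # tolerate no final newline
  [[ "$key" == "tls-port" && "$value" =~ ^[0-9]+$ && "$value" != "0" ]]
}

# Usage (requires cluster access):
#   kubectl exec -n bakery-ia deployment/redis -- \
#     redis-cli -a <password> --no-auth-warning CONFIG GET tls-port \
#     | redis_tls_enabled && echo "TLS on: set insecure: false" \
#                         || echo "TLS off: insecure: true is fine"
```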

### Step 2: Configure Redis Receiver

Update `infrastructure/helm/signoz-values-dev.yaml`:

```yaml
otelCollector:
  config:
    receivers:
      # Redis receiver for cache metrics
      redis:
        endpoint: redis-service.bakery-ia:6379
        password: ${env:REDIS_PASSWORD}
        collection_interval: 60s
        transport: tcp
        tls:
          insecure: true  # Change to false if using TLS
        metrics:
          redis.maxmemory:
            enabled: true
          redis.cmd.latency:
            enabled: true

    service:
      pipelines:
        metrics:
          # Keep the receivers already in this pipeline (e.g. postgresql/*)
          # and add redis
          receivers: [otlp, redis]

  env:
    - name: REDIS_PASSWORD
      valueFrom:
        secretKeyRef:
          name: redis-secrets
          key: REDIS_PASSWORD
```

### Optional: Configure TLS for Redis

If you want to enable TLS for Redis (recommended for production):

1. **Generate TLS Certificates:**

   ```bash
   # Create CA (-subj avoids the interactive prompts)
   openssl genrsa -out ca-key.pem 4096
   openssl req -new -x509 -days 3650 -key ca-key.pem -out ca-cert.pem \
     -subj "/CN=bakery-ia-redis-ca"

   # Create Redis server certificate
   openssl genrsa -out redis-key.pem 4096
   openssl req -new -key redis-key.pem -out redis.csr -subj "/CN=redis-service"
   # The collector is written in Go, which verifies the hostname against the
   # subjectAltName and ignores the CN, so list the service DNS names here.
   openssl x509 -req -days 3650 -in redis.csr -CA ca-cert.pem -CAkey ca-key.pem \
     -CAcreateserial -out redis-cert.pem \
     -extfile <(printf "subjectAltName=DNS:redis-service,DNS:redis-service.bakery-ia,DNS:redis-service.bakery-ia.svc.cluster.local")

   # Create Kubernetes secret
   kubectl create secret generic -n bakery-ia redis-tls \
     --from-file=ca-cert.pem=ca-cert.pem \
     --from-file=redis-cert.pem=redis-cert.pem \
     --from-file=redis-key.pem=redis-key.pem
   ```

2. **Mount Certificates in OTel Collector:**

   ```yaml
   otelCollector:
     volumes:
       - name: redis-tls
         secret:
           secretName: redis-tls

     volumeMounts:
       - name: redis-tls
         mountPath: /etc/redis-tls
         readOnly: true

     config:
       receivers:
         redis:
           tls:
             insecure: false
             ca_file: /etc/redis-tls/ca-cert.pem
             # Client certificate presented by the collector. Redis requires one
             # when tls-auth-clients is enabled (its default). Reusing the server
             # key pair is a shortcut; a separate client certificate is cleaner.
             cert_file: /etc/redis-tls/redis-cert.pem
             key_file: /etc/redis-tls/redis-key.pem
   ```

### References

- [Redis Monitoring with OpenTelemetry | SigNoz](https://signoz.io/blog/redis-opentelemetry/)
- [Redis Monitoring 101 | SigNoz](https://signoz.io/blog/redis-monitoring/)

---

## RabbitMQ Receiver Configuration

### Current Infrastructure

- **Service**: `rabbitmq-service.bakery-ia`
  - Port 5672: AMQP protocol
  - Port 15672: Management API (required for metrics)
- **Credentials**:
  - Username: `bakery`
  - Password: Available in secret `rabbitmq-secrets`

### Step 1: Enable RabbitMQ Management Plugin

```bash
kubectl exec -n bakery-ia deployment/rabbitmq -- rabbitmq-plugins enable rabbitmq_management
```

### Step 2: Verify Management API Access

```bash
kubectl port-forward -n bakery-ia svc/rabbitmq-service 15672:15672
# In a browser: http://localhost:15672
# Log in with: bakery / <password>
```
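
You can do the same check without a browser by calling `/api/overview` with `curl` while the port-forward is running. The sketch maps the HTTP status to a likely cause. `check_rabbitmq_mgmt` and `describe_mgmt_status` are hypothetical helper names, and the mapping is a rough guide rather than an exhaustive list.

```bash
#!/usr/bin/env bash
# Sketch: probe the RabbitMQ management API through the port-forward above.
# Helper names are hypothetical.

# Maps an HTTP status code (as printed by curl -w '%{http_code}') to a hint.
describe_mgmt_status() {
  case "$1" in
    200) echo "OK: management API reachable and credentials accepted" ;;
    401) echo "Unauthorized: check the username/password from rabbitmq-secrets" ;;
    000) echo "No response: is the port-forward running and the management plugin enabled?" ;;
    *)   echo "Unexpected HTTP status $1" ;;
  esac
}

# Usage: check_rabbitmq_mgmt bakery '<password>'
check_rabbitmq_mgmt() {
  local code
  # curl prints 000 and exits non-zero when it cannot connect; keep the code.
  code="$(curl -s -o /dev/null -w '%{http_code}' -u "$1:$2" \
    http://localhost:15672/api/overview)" || true
  describe_mgmt_status "${code:-000}"
  [[ "$code" == "200" ]]
}
```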

### Step 3: Configure RabbitMQ Receiver

Update `infrastructure/helm/signoz-values-dev.yaml`:

```yaml
otelCollector:
  config:
    receivers:
      # RabbitMQ receiver via the management API
      rabbitmq:
        endpoint: http://rabbitmq-service.bakery-ia:15672
        username: ${env:RABBITMQ_USER}
        password: ${env:RABBITMQ_PASSWORD}
        collection_interval: 30s

    service:
      pipelines:
        metrics:
          # Keep the receivers already in this pipeline and add rabbitmq
          receivers: [otlp, rabbitmq]

  env:
    - name: RABBITMQ_USER
      valueFrom:
        secretKeyRef:
          name: rabbitmq-secrets
          key: RABBITMQ_USER
    - name: RABBITMQ_PASSWORD
      valueFrom:
        secretKeyRef:
          name: rabbitmq-secrets
          key: RABBITMQ_PASSWORD
```

### References

- [RabbitMQ Monitoring with OpenTelemetry | SigNoz](https://signoz.io/blog/opentelemetry-rabbitmq-metrics-monitoring/)
- [OpenTelemetry Receivers | SigNoz](https://signoz.io/docs/userguide/otel-metrics-receivers/)

---

## Complete Implementation Plan

### Phase 1: Enable Basic Infrastructure Monitoring (No TLS)

1. **Create PostgreSQL monitoring users** (all 21 databases)
2. **Create Kubernetes secrets** for credentials
3. **Update Helm values** with receiver configurations
4. **Configure environment variables** in the OTel Collector
5. **Apply the Helm upgrade** and re-apply the OpAMP patch
6. **Verify metrics collection**

### Phase 2: Enable TLS (Optional, Production-Ready)

1. **Generate TLS certificates** for Redis
2. **Configure Redis TLS** in the deployment
3. **Update the Redis receiver** with TLS settings
4. **Configure PostgreSQL TLS** if required
5. **Test and verify** secure connections

### Phase 3: Enable OpAMP (Optional, Advanced)

1. **Fix the SigNoz OpAMP server configuration**
2. **Test remote configuration** in the dev environment
3. **Gradually enable** OpAMP after validation
4. **Monitor** for configuration corruption

---

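## Verification Commands

### Check Collector Metrics

```bash
# The port-forward blocks; run it in a separate terminal. Forwarding to the
# deployment reaches the pod directly, even if the Service does not expose 8888.
kubectl port-forward -n bakery-ia deployment/signoz-otel-collector 8888:8888
curl -s http://localhost:8888/metrics | grep "otelcol_receiver_accepted"
```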
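
The raw `grep` output lists one line per receiver and transport. To see at a glance which receivers are delivering metrics, you can total the accepted metric points per receiver. This sketch assumes the Prometheus text format served on port 8888. It also accepts an optional `_total` suffix, in case your collector version appends one. `accepted_points_by_receiver` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: total otelcol_receiver_accepted_metric_points per receiver from the
# collector's Prometheus-format self-metrics (read on stdin). A receiver that
# is missing from the output, or stuck at 0, has not delivered metrics.
accepted_points_by_receiver() {
  awk '
    /^otelcol_receiver_accepted_metric_points(_total)?\{/ {
      if (match($0, /receiver="[^"]*"/)) {
        name = substr($0, RSTART + 10, RLENGTH - 11)  # strip receiver=" and "
        line = $0
        sub(/^[^}]*\}[ \t]*/, "", line)               # drop name and labels
        split(line, v, /[ \t]+/)                      # v[1] is the sample value
        total[name] += v[1]
      }
    }
    END { for (r in total) printf "%s %.0f\n", r, total[r] }
  ' | sort
}
```

With the port-forward running, `curl -s http://localhost:8888/metrics | accepted_points_by_receiver` should eventually list `postgresql/*`, `redis` and `rabbitmq` with non-zero totals.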

### Check Database Connectivity

```bash
kubectl exec -n bakery-ia deployment/signoz-otel-collector -- \
  /bin/sh -c "nc -zv auth-db-service 5432"
```

### Check RabbitMQ Management API

```bash
# The management API requires authentication; replace <password> with the
# value from rabbitmq-secrets.
kubectl exec -n bakery-ia deployment/signoz-otel-collector -- \
  /bin/sh -c "wget -qO- http://bakery:<password>@rabbitmq-service:15672/api/overview"
```

### Check Redis Connectivity

```bash
kubectl exec -n bakery-ia deployment/signoz-otel-collector -- \
  /bin/sh -c "nc -zv redis-service 6379"
```

---

## Troubleshooting

### PostgreSQL Connection Refused

- Verify the monitoring user exists: `kubectl exec -n bakery-ia deployment/auth-db -- psql -U postgres -c "\du"`
- Check that the user can log in: `kubectl exec -n bakery-ia deployment/auth-db -- psql -U monitoring -d postgres -c "SELECT 1"`
- Check that the user holds `pg_monitor`: `kubectl exec -n bakery-ia deployment/auth-db -- psql -U postgres -c "SELECT pg_has_role('monitoring', 'pg_monitor', 'MEMBER')"`

### Redis Authentication Failed

- Verify the password: `kubectl get secret -n bakery-ia redis-secrets -o jsonpath='{.data.REDIS_PASSWORD}' | base64 -d`
- Test the connection: `kubectl exec -n bakery-ia deployment/redis -- redis-cli -a <password> PING`

### RabbitMQ Management API Not Available

- Check plugin status: `kubectl exec -n bakery-ia deployment/rabbitmq -- rabbitmq-plugins list`
- Enable the plugin: `kubectl exec -n bakery-ia deployment/rabbitmq -- rabbitmq-plugins enable rabbitmq_management`

---

## Summary

**Current Status:**

- ✅ OTel Collector receiving traces (97+ spans)
- ✅ ClickHouse authentication fixed
- ✅ OpAMP disabled (preventing config corruption)
- ❌ PostgreSQL receivers not configured (no monitoring users)
- ❌ Redis receiver not configured (missing from pipeline)
- ❌ RabbitMQ receiver not configured (missing from pipeline)

**Next Steps:**

1. Create PostgreSQL monitoring users across all 21 databases
2. Configure the Redis receiver with existing credentials
3. Configure the RabbitMQ receiver with existing credentials
4. Test and verify that all metrics are flowing
5. Optionally enable TLS for production
6. Optionally fix and re-enable OpAMP for dynamic configuration