# SigNoz Monitoring Quick Start

Get complete observability (metrics, logs, traces, system metrics) in under 10 minutes using OpenTelemetry.

## What You'll Get

✅ **Distributed Tracing** - Complete request flows across all services
✅ **Application Metrics** - HTTP requests, durations, error rates, custom business metrics
✅ **System Metrics** - CPU usage, memory usage, disk I/O, network I/O per service
✅ **Structured Logs** - Searchable logs correlated with traces
✅ **Unified Dashboard** - Single UI for all telemetry data

**All data is pushed via OTLP (the OpenTelemetry Protocol) - no Prometheus, no scraping needed!**

## Prerequisites

- Kubernetes cluster running (Kind/Minikube/Production)
- Helm 3.x installed
- kubectl configured

## Step 1: Deploy SigNoz

```bash
# Add Helm repository
helm repo add signoz https://charts.signoz.io
helm repo update

# Create namespace
kubectl create namespace signoz

# Install SigNoz
helm install signoz signoz/signoz \
  -n signoz \
  -f infrastructure/helm/signoz-values-dev.yaml

# Wait for pods to be ready (2-3 minutes)
kubectl wait --for=condition=ready pod -l app=signoz -n signoz --timeout=300s
```

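If the `app=signoz` selector above matches no pods (labels depend on the chart version and values file; `kubectl get pods -n signoz --show-labels` shows what is actually set), readiness can also be checked without labels. The following is a minimal sketch, not part of the repository: `pods_not_ready` is a hypothetical helper that parses the default `kubectl get pods` columns and skips pods of finished Jobs.

```bash
# Hypothetical helper: list pods in a namespace whose containers are not all
# ready. Pods of finished Jobs (STATUS "Completed") are ignored.
# Prints nothing when every remaining pod is ready.
pods_not_ready() {
  # Columns of `kubectl get pods --no-headers`: NAME READY STATUS RESTARTS AGE
  kubectl get pods -n "${1:-signoz}" --no-headers |
    awk '$3 != "Completed" { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}
```

For example, `until [ -z "$(pods_not_ready signoz)" ]; do sleep 5; done` blocks until the list is empty. The helper treats an empty pod list as ready, so it is only meaningful after `helm install` has created the pods.
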
## Step 2: Configure Services

Each service needs OpenTelemetry environment variables. The auth-service is already configured as an example.

### Quick Configuration (for remaining services)

Add these environment variables to each service deployment:

```yaml
env:
  # OpenTelemetry Collector endpoint
  - name: OTEL_COLLECTOR_ENDPOINT
    value: "http://signoz-otel-collector.signoz.svc.cluster.local:4318"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://signoz-otel-collector.signoz.svc.cluster.local:4318"
  - name: OTEL_SERVICE_NAME
    value: "your-service-name"  # e.g., "inventory-service"

  # Enable tracing
  - name: ENABLE_TRACING
    value: "true"

  # Enable logs export
  - name: OTEL_LOGS_EXPORTER
    value: "otlp"

  # Enable metrics export (includes system metrics)
  - name: ENABLE_OTEL_METRICS
    value: "true"
  - name: ENABLE_SYSTEM_METRICS
    value: "true"
```

### Using the Configuration Script

```bash
# Generate configuration patches for all services
./infrastructure/kubernetes/add-monitoring-config.sh

# This creates /tmp/*-otel-patch.yaml files
# Review and manually add to each service deployment
```

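The script's output format is not shown here. As a rough illustration of what one generated patch might contain, the sketch below prints the env entries from the Quick Configuration block for a single service. `render_otel_env` is a hypothetical helper, not the repository's script, and its default endpoint is simply the collector address used in this guide.

```bash
# Hypothetical helper (not the repository's script): print the OpenTelemetry
# env entries from the Quick Configuration block for one service.
render_otel_env() {
  service_name="$1"
  # Assumed default: the collector endpoint used throughout this guide.
  endpoint="${2:-http://signoz-otel-collector.signoz.svc.cluster.local:4318}"
  cat <<EOF
env:
  - name: OTEL_COLLECTOR_ENDPOINT
    value: "${endpoint}"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "${endpoint}"
  - name: OTEL_SERVICE_NAME
    value: "${service_name}"
  - name: ENABLE_TRACING
    value: "true"
  - name: OTEL_LOGS_EXPORTER
    value: "otlp"
  - name: ENABLE_OTEL_METRICS
    value: "true"
  - name: ENABLE_SYSTEM_METRICS
    value: "true"
EOF
}
```

Running `render_otel_env inventory-service > /tmp/inventory-service-otel-patch.yaml` would produce a file matching the `/tmp/*-otel-patch.yaml` naming mentioned above, ready for review before it is merged into the deployment.
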
## Step 3: Deploy Updated Services

```bash
# Apply updated configurations
kubectl apply -k infrastructure/kubernetes/overlays/dev/

# Or restart services to pick up new env vars
kubectl rollout restart deployment -n bakery-ia

# Wait for rollout
kubectl rollout status deployment -n bakery-ia --timeout=5m
```

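To see which deployment is slow or failing, the rollout wait can also be run one deployment at a time. This is a minimal sketch; `wait_for_rollouts` is a hypothetical helper name, and the namespace and timeout are passed in rather than taken from the repository.

```bash
# Hypothetical helper: wait for every deployment in a namespace, one by one.
# Returns non-zero if any rollout fails or times out.
wait_for_rollouts() {
  ns="$1"
  timeout="${2:-5m}"
  failed=0
  # `-o name` prints one resource per line, e.g. deployment.apps/auth-service
  for deploy in $(kubectl get deployments -n "$ns" -o name); do
    kubectl rollout status "$deploy" -n "$ns" --timeout="$timeout" || failed=1
  done
  return "$failed"
}
```

Usage: `wait_for_rollouts bakery-ia 5m`. Because the loop continues after a failure, the output shows the status of every deployment, not only the first one that times out.
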
## Step 4: Access SigNoz UI

### Via Ingress

```bash
# Add to /etc/hosts if needed
echo "127.0.0.1 monitoring.bakery-ia.local" | sudo tee -a /etc/hosts

# Access UI (`open` is the macOS command; on Linux use `xdg-open` or paste the URL into a browser)
open https://monitoring.bakery-ia.local
```

### Via Port Forward

```bash
# Keep the port-forward running in this terminal
kubectl port-forward -n signoz svc/signoz-frontend 3301:3301

# Then, from a second terminal, open the UI
open http://localhost:3301
```

## Step 5: Explore Your Data

### Traces

1. Go to the **Services** tab
2. See all your services listed
3. Click on a service → View traces
4. Click on a trace → See detailed span tree with timing

### Metrics

**HTTP Metrics** (automatically collected):
- `http_requests_total` - Total requests by method, endpoint, status
- `http_request_duration_seconds` - Request latency
- `active_requests` - Current active HTTP requests

**System Metrics** (automatically collected per service):
- `process.cpu.utilization` - Process CPU usage %
- `process.memory.usage` - Process memory in bytes
- `process.memory.utilization` - Process memory %
- `process.threads.count` - Number of threads
- `system.cpu.utilization` - System-wide CPU %
- `system.memory.usage` - System memory usage
- `system.memory.utilization` - System memory %
- `system.disk.io.read` - Disk bytes read
- `system.disk.io.write` - Disk bytes written
- `system.network.io.sent` - Network bytes sent
- `system.network.io.received` - Network bytes received

**Custom Business Metrics** (if configured):
- User registrations
- Orders created
- Login attempts
- etc.

### Logs

1. Go to the **Logs** tab
2. Filter by service: `service_name="auth-service"`
3. Search for specific messages
4. See structured fields (user_id, tenant_id, etc.)

### Trace-Log Correlation

1. Find a trace in the **Traces** tab
2. Note the `trace_id`
3. Go to the **Logs** tab
4. Filter: `trace_id="<the-trace-id>"`
5. See all logs for that specific request!

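
OpenTelemetry trace IDs are 16 bytes, written as 32 hexadecimal characters, so a filter that returns no logs is often the result of a truncated or mistyped ID. The sketch below validates an ID and prints the filter from step 4; `trace_log_filter` is a hypothetical helper, not something SigNoz provides.

```bash
# Hypothetical helper: print the Logs-tab filter for a trace ID,
# or fail if the ID is not 32 hexadecimal characters.
trace_log_filter() {
  id="$(printf '%s' "$1" | tr 'A-F' 'a-f')"   # normalize to lowercase hex
  if printf '%s' "$id" | grep -Eq '^[0-9a-f]{32}$'; then
    printf 'trace_id="%s"\n' "$id"
  else
    echo "not a 32-character hex trace ID: $1" >&2
    return 1
  fi
}
```

Usage: `trace_log_filter 4bf92f3577b34da6a3ce929d0e0e4736` prints a filter that can be pasted into the **Logs** tab.
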
## Verification Commands

```bash
# Check if services are sending telemetry
kubectl logs -n bakery-ia deployment/auth-service | grep -i "telemetry\|otel"

# Check SigNoz collector is receiving data
kubectl logs -n signoz deployment/signoz-otel-collector | tail -50

# Test connectivity to collector
kubectl exec -n bakery-ia deployment/auth-service -- \
  curl -v http://signoz-otel-collector.signoz.svc.cluster.local:4318
```

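A plain `curl` to the collector root only shows that the port answers. To check that the OTLP/HTTP receiver itself responds, one option is to POST an empty export request to the standard `/v1/traces` path. This is a sketch under assumptions: `curl` exists in the service image, an empty `resourceSpans` list is accepted by the collector, and the helper names (`otlp_url`, `describe_probe_result`, `probe_collector`) are hypothetical.

```bash
# Build the OTLP/HTTP URL for a signal: traces, metrics, or logs.
otlp_url() {
  base="${1%/}"   # strip one trailing slash, if present
  printf '%s/v1/%s\n' "$base" "$2"
}

# Interpret the HTTP status code printed by curl's %{http_code}.
describe_probe_result() {
  case "$1" in
    2??) echo "collector accepted the request" ;;
    000) echo "no HTTP response (DNS failure, refused connection, or timeout)" ;;
    *)   echo "collector reachable but returned HTTP $1" ;;
  esac
}

# Send an empty OTLP/JSON trace export from inside the auth-service pod.
probe_collector() {
  url="$(otlp_url "${1:-http://signoz-otel-collector.signoz.svc.cluster.local:4318}" traces)"
  code="$(kubectl exec -n bakery-ia deployment/auth-service -- \
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
      -X POST -H 'Content-Type: application/json' \
      -d '{"resourceSpans":[]}' "$url")"
  describe_probe_result "$code"
}
```

Calling `probe_collector` with no arguments targets the collector endpoint from Step 2; pass a different base URL to test another collector.
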
## Common Issues

### No data in SigNoz

```bash
# 1. Verify environment variables are set
kubectl get deployment auth-service -n bakery-ia -o yaml | grep OTEL

# 2. Check collector logs
kubectl logs -n signoz deployment/signoz-otel-collector

# 3. Restart service
kubectl rollout restart deployment/auth-service -n bakery-ia
```

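Note that `grep OTEL` does not match the `ENABLE_TRACING` and `ENABLE_SYSTEM_METRICS` variables from Step 2. A sketch that checks all seven variables by name follows. `missing_otel_vars` is a hypothetical helper, and it only sees variables declared directly under `env:` (not ones injected through `envFrom`).

```bash
# Hypothetical helper: print the Step 2 variables that a deployment lacks.
# Prints nothing when all seven are present.
missing_otel_vars() {
  deploy="$1"
  ns="${2:-bakery-ia}"
  # Space-separated names of all env vars declared on the deployment's containers
  present="$(kubectl get deployment "$deploy" -n "$ns" \
    -o jsonpath='{.spec.template.spec.containers[*].env[*].name}')"
  for var in OTEL_COLLECTOR_ENDPOINT OTEL_EXPORTER_OTLP_ENDPOINT OTEL_SERVICE_NAME \
             ENABLE_TRACING OTEL_LOGS_EXPORTER ENABLE_OTEL_METRICS ENABLE_SYSTEM_METRICS; do
    case " $present " in
      *" $var "*) ;;        # already set
      *) echo "$var" ;;     # missing
    esac
  done
}
```

Usage: `missing_otel_vars auth-service bakery-ia`. An empty result means the configuration is in place and the problem is more likely in the collector or the network.
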
### Services not appearing

```bash
# Check network connectivity
kubectl exec -n bakery-ia deployment/auth-service -- \
  curl http://signoz-otel-collector.signoz.svc.cluster.local:4318

# Any HTTP response (for example a 404 on the root path) means the collector is reachable.
# "Connection refused" or a timeout means it is not.
```

## Architecture

```
┌──────────────────────────────────────────────┐
│              Your Microservices              │
│ ┌──────┐  ┌──────┐  ┌──────┐                 │
│ │ auth │  │ inv  │  │orders│  ...            │
│ └──┬───┘  └──┬───┘  └──┬───┘                 │
│    │         │         │                     │
│    └─────────┴─────────┘                     │
│              │                               │
│          OTLP Push                           │
│   (traces, metrics, logs)                    │
└──────────────┼───────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────┐
│        SigNoz OpenTelemetry Collector        │
│         :4317 (gRPC)    :4318 (HTTP)         │
│                                              │
│  Receivers:  OTLP only (no Prometheus)       │
│  Processors: batch, memory_limiter           │
│  Exporters:  ClickHouse                      │
└──────────────┼───────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────┐
│             ClickHouse Database              │
│        Stores: traces, metrics, logs         │
└──────────────┼───────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────┐
│              SigNoz Frontend UI              │
│     monitoring.bakery-ia.local or :3301      │
└──────────────────────────────────────────────┘
```

## What Makes This Different

**Pure OpenTelemetry** - No Prometheus involved:
- ✅ All metrics pushed via OTLP (not scraped)
- ✅ Automatic system metrics collection (CPU, memory, disk, network)
- ✅ Unified data model for all telemetry
- ✅ Native trace-metric-log correlation
- ✅ Lower resource usage (no scraping overhead)

## Next Steps

- **Create Dashboards** - Build custom views for your metrics
- **Set Up Alerts** - Configure alerts for errors, latency, resource usage
- **Explore System Metrics** - Monitor CPU and memory per service
- **Query Logs** - Use the powerful log query language
- **Correlate Everything** - Jump from traces → logs → metrics

## Need Help?

- [Full Documentation](./MONITORING_SETUP.md) - Detailed setup guide
- [SigNoz Docs](https://signoz.io/docs/) - Official documentation
- [OpenTelemetry Python](https://opentelemetry.io/docs/instrumentation/python/) - Python instrumentation

---

**Metrics You Get Out of the Box:**

| Category | Metric | Description |
|----------|--------|-------------|
| HTTP | `http_requests_total` | Total requests by method, endpoint, status |
| HTTP | `http_request_duration_seconds` | Request latency histogram |
| HTTP | `active_requests` | Current active requests |
| Process | `process.cpu.utilization` | Process CPU usage % |
| Process | `process.memory.usage` | Process memory in bytes |
| Process | `process.memory.utilization` | Process memory % |
| Process | `process.threads.count` | Thread count |
| System | `system.cpu.utilization` | System CPU % |
| System | `system.memory.usage` | System memory usage |
| System | `system.memory.utilization` | System memory % |
| Disk | `system.disk.io.read` | Disk read bytes |
| Disk | `system.disk.io.write` | Disk write bytes |
| Network | `system.network.io.sent` | Network sent bytes |
| Network | `system.network.io.received` | Network received bytes |