# SigNoz Monitoring Quick Start

Get metrics, logs, traces, and system metrics flowing into SigNoz in under 10 minutes using OpenTelemetry.

## What You'll Get

✅ **Distributed Tracing** - Complete request flows across all services
✅ **Application Metrics** - HTTP requests, durations, error rates, custom business metrics
✅ **System Metrics** - CPU usage, memory usage, disk I/O, network I/O per service
✅ **Structured Logs** - Searchable logs correlated with traces
✅ **Unified Dashboard** - Single UI for all telemetry data

**All data is pushed via OTLP (the OpenTelemetry Protocol) - No Prometheus, no scraping needed!**

## Prerequisites

- A running Kubernetes cluster (Kind, Minikube, or production)
- Helm 3.x installed
- kubectl configured

## Step 1: Deploy SigNoz

```bash
# Add Helm repository
helm repo add signoz https://charts.signoz.io
helm repo update

# Create namespace
kubectl create namespace signoz

# Install SigNoz
helm install signoz signoz/signoz \
  -n signoz \
  -f infrastructure/helm/signoz-values-dev.yaml

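# Wait for pods to be ready (2-3 minutes)
# If the selector matches nothing, check the actual pod labels with:
#   kubectl get pods -n signoz --show-labels
kubectl wait --for=condition=ready pod -l app=signoz -n signoz --timeout=300s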
```

## Step 2: Configure Services

Each service needs OpenTelemetry environment variables. The auth-service is already configured as an example.

### Quick Configuration (for remaining services)

Add these environment variables to each service deployment:

```yaml
env:
  # OpenTelemetry Collector endpoint
  - name: OTEL_COLLECTOR_ENDPOINT
    value: "http://signoz-otel-collector.signoz.svc.cluster.local:4318"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://signoz-otel-collector.signoz.svc.cluster.local:4318"
  - name: OTEL_SERVICE_NAME
    value: "your-service-name"  # e.g., "inventory-service"

  # Enable tracing
  - name: ENABLE_TRACING
    value: "true"

  # Enable logs export
  - name: OTEL_LOGS_EXPORTER
    value: "otlp"

  # Enable metrics export (includes system metrics)
  - name: ENABLE_OTEL_METRICS
    value: "true"
  - name: ENABLE_SYSTEM_METRICS
    value: "true"
```
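To make clear what each variable controls, here is a minimal sketch of how a Python service might read this configuration at startup. `TelemetrySettings` and `load_telemetry_settings` are hypothetical names used for illustration, and the rule that `"true"`, `"1"`, or `"yes"` enables a flag is an assumption; the actual services may parse these values differently.

```python
import os
from dataclasses import dataclass


def _flag(name: str, default: bool = False) -> bool:
    """Read a boolean env var; assumes 'true', '1', or 'yes' means enabled."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes"}


@dataclass(frozen=True)
class TelemetrySettings:
    """Hypothetical container for the variables set in the deployment."""
    service_name: str
    otlp_endpoint: str
    logs_exporter: str
    tracing_enabled: bool
    metrics_enabled: bool
    system_metrics_enabled: bool


def load_telemetry_settings() -> TelemetrySettings:
    """Collect the telemetry env vars, with local-development fallbacks."""
    return TelemetrySettings(
        service_name=os.environ.get("OTEL_SERVICE_NAME", "unknown-service"),
        # Fallback endpoint is an assumption for running outside the cluster.
        otlp_endpoint=os.environ.get(
            "OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318"
        ),
        logs_exporter=os.environ.get("OTEL_LOGS_EXPORTER", "none"),
        tracing_enabled=_flag("ENABLE_TRACING"),
        metrics_enabled=_flag("ENABLE_OTEL_METRICS"),
        system_metrics_enabled=_flag("ENABLE_SYSTEM_METRICS"),
    )
```

Logging the result of `load_telemetry_settings()` once at startup is a cheap way to confirm that a pod actually received the values from its deployment.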

### Using the Configuration Script

```bash
# Generate configuration patches for all services
./infrastructure/kubernetes/add-monitoring-config.sh

# This creates /tmp/*-otel-patch.yaml files
# Review and manually add to each service deployment
```

## Step 3: Deploy Updated Services

```bash
# Apply updated configurations
kubectl apply -k infrastructure/kubernetes/overlays/dev/

# Or restart services to pick up new env vars
kubectl rollout restart deployment -n bakery-ia

# Wait for rollout
kubectl rollout status deployment -n bakery-ia --timeout=5m
```

## Step 4: Access SigNoz UI

### Via Ingress

```bash
# Add to /etc/hosts if needed
echo "127.0.0.1 monitoring.bakery-ia.local" | sudo tee -a /etc/hosts

# Access UI (macOS; on Linux use xdg-open or paste the URL into a browser)
open https://monitoring.bakery-ia.local
```

### Via Port Forward

```bash
kubectl port-forward -n signoz svc/signoz-frontend 3301:3301
# macOS; on Linux use xdg-open or paste the URL into a browser
open http://localhost:3301
```

## Step 5: Explore Your Data

### Traces

1. Go to **Services** tab
2. See all your services listed
3. Click on a service → View traces
4. Click on a trace → See detailed span tree with timing

### Metrics

**HTTP Metrics** (automatically collected):
- `http_requests_total` - Total requests by method, endpoint, status
- `http_request_duration_seconds` - Request latency
- `active_requests` - Current active HTTP requests

**System Metrics** (automatically collected per service):
- `process.cpu.utilization` - Process CPU usage %
- `process.memory.usage` - Process memory in bytes
- `process.memory.utilization` - Process memory %
- `process.threads.count` - Number of threads
- `system.cpu.utilization` - System-wide CPU %
- `system.memory.usage` - System memory usage
- `system.disk.io.read` - Disk bytes read
- `system.disk.io.write` - Disk bytes written
- `system.network.io.sent` - Network bytes sent
- `system.network.io.received` - Network bytes received

**Custom Business Metrics** (if configured), for example:
- User registrations
- Orders created
- Login attempts
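The error rates mentioned earlier are not a separate metric; they can be derived from `http_requests_total` by comparing error responses to all responses. The sketch below shows that arithmetic on hand-written samples. The `(endpoint, status, count)` tuple shape, the function name, and the choice to count only 5xx responses as errors are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One sample: (endpoint, HTTP status code, request count). Hypothetical shape.
Sample = Tuple[str, int, int]


def error_rate_by_endpoint(samples: Iterable[Sample]) -> Dict[str, float]:
    """Return the fraction of requests per endpoint that ended in a 5xx status."""
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for endpoint, status, count in samples:
        totals[endpoint] += count
        if status >= 500:  # assumption: only server errors count as errors
            errors[endpoint] += count
    return {
        endpoint: errors[endpoint] / total
        for endpoint, total in totals.items()
        if total > 0
    }


# Illustrative numbers only, not real measurements.
example = [("/login", 200, 95), ("/login", 500, 5), ("/orders", 200, 10)]
rates = error_rate_by_endpoint(example)
```

In a dashboard the same ratio would normally be computed by the query itself; the code only makes the calculation explicit.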

### Logs

1. Go to **Logs** tab
2. Filter by service: `service_name="auth-service"`
3. Search for specific messages
4. See structured fields (user_id, tenant_id, etc.)

### Trace-Log Correlation

1. Find a trace in **Traces** tab
2. Note the `trace_id`
3. Go to **Logs** tab
4. Filter: `trace_id="<the-trace-id>"`
5. See all logs for that specific request!
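The correlation steps above only work if every log line carries the ID of the trace it belongs to. Here is a minimal standard-library sketch of that idea: a logging filter copies the current trace ID onto each record, and a formatter emits it as a `trace_id` field. The `current_trace_id` context variable is a hypothetical stand-in for the active span context that an OpenTelemetry SDK would provide; the class and function names are also illustrative.

```python
import contextvars
import json
import logging

# Hypothetical stand-in for the active trace context of the current request.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


class TraceIdFilter(logging.Filter):
    """Attach the current trace ID (or None) to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True  # never drop the record, only annotate it


class JsonFormatter(logging.Formatter):
    """Render records as one JSON object per line with a trace_id field."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
                "trace_id": getattr(record, "trace_id", None),
            }
        )


def build_logger(name: str, stream=None) -> logging.Logger:
    """Create a logger that writes trace-annotated JSON lines to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)  # defaults to stderr when None
    handler.setFormatter(JsonFormatter())
    handler.addFilter(TraceIdFilter())
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger
```

Setting `current_trace_id` at the start of a request and resetting it at the end makes every line logged in between searchable by that trace ID.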

## Verification Commands

```bash
# Check if services are sending telemetry
kubectl logs -n bakery-ia deployment/auth-service | grep -i "telemetry\|otel"

# Check SigNoz collector is receiving data
kubectl logs -n signoz deployment/signoz-otel-collector | tail -50

# Test connectivity to collector (any HTTP response means it is reachable)
kubectl exec -n bakery-ia deployment/auth-service -- \
  curl -v http://signoz-otel-collector.signoz.svc.cluster.local:4318
```
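For an end-to-end check that goes beyond reachability, a hand-built span can be posted to the collector's OTLP/HTTP traces path, `/v1/traces`. The sketch below uses only the standard library and OTLP's JSON encoding. The function names, the scope name, and the span name are made up for illustration, and the collector URL is the one used throughout this guide.

```python
import json
import secrets
import time
import urllib.request

# Collector address from this guide; adjust if your release name differs.
COLLECTOR_URL = "http://signoz-otel-collector.signoz.svc.cluster.local:4318"


def build_test_span_payload(service_name: str) -> dict:
    """Build an OTLP/JSON export request containing one short internal span."""
    now_ns = time.time_ns()
    span = {
        "traceId": secrets.token_hex(16),  # 16 random bytes, 32 hex characters
        "spanId": secrets.token_hex(8),    # 8 random bytes, 16 hex characters
        "name": "connectivity-check",      # hypothetical span name
        "kind": 1,                         # 1 = SPAN_KIND_INTERNAL
        "startTimeUnixNano": str(now_ns - 1_000_000),  # 1 ms before now
        "endTimeUnixNano": str(now_ns),
    }
    resource = {
        "attributes": [
            {"key": "service.name", "value": {"stringValue": service_name}}
        ]
    }
    scope_spans = [{"scope": {"name": "manual-check"}, "spans": [span]}]
    return {"resourceSpans": [{"resource": resource, "scopeSpans": scope_spans}]}


def send_test_span(service_name: str, base_url: str = COLLECTOR_URL,
                   timeout: float = 5.0) -> int:
    """POST the test span to the collector and return the HTTP status code."""
    body = json.dumps(build_test_span_payload(service_name)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/traces",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `send_test_span("connectivity-check")` from inside the cluster should return a 2xx status if the collector accepted the span, after which a service with that name should show up in the SigNoz UI.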

## Common Issues

### No data in SigNoz

```bash
# 1. Verify environment variables are set
kubectl get deployment auth-service -n bakery-ia -o yaml | grep OTEL

# 2. Check collector logs
kubectl logs -n signoz deployment/signoz-otel-collector

# 3. Restart service
kubectl rollout restart deployment/auth-service -n bakery-ia
```

### Services not appearing

```bash
# Check network connectivity
kubectl exec -n bakery-ia deployment/auth-service -- \
  curl http://signoz-otel-collector.signoz.svc.cluster.local:4318

# Any HTTP response (even an error status) means the collector is reachable.
# "Connection refused" or a timeout means it is not.
```
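If the service image does not ship `curl`, the same reachability check can be done with a plain TCP connection. This is a minimal sketch using Python's `socket` module; `can_connect` is a hypothetical helper name, and a successful result only shows that the port accepts connections, not that telemetry is being ingested.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, DNS failures, and timeouts.
        return False
```

From inside a service pod, `can_connect("signoz-otel-collector.signoz.svc.cluster.local", 4318)` returning `False` points to a DNS, network policy, or collector availability problem rather than an instrumentation problem.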

## Architecture

```
┌─────────────────────────────────────────────┐
│         Your Microservices                   │
│  ┌──────┐  ┌──────┐  ┌──────┐              │
│  │ auth │  │ inv  │  │orders│  ...         │
│  └──┬───┘  └──┬───┘  └──┬───┘              │
│     │         │         │                    │
│     └─────────┴─────────┘                    │
│              │                               │
│         OTLP Push                            │
│  (traces, metrics, logs)                    │
└──────────────┼──────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────┐
│   SigNoz OpenTelemetry Collector             │
│   :4317 (gRPC)  :4318 (HTTP)                │
│                                              │
│   Receivers: OTLP only (no Prometheus)      │
│   Processors: batch, memory_limiter         │
│   Exporters: ClickHouse                     │
└──────────────┼──────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────┐
│         ClickHouse Database                   │
│   Stores: traces, metrics, logs              │
└──────────────┼──────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────┐
│       SigNoz Frontend UI                      │
│   monitoring.bakery-ia.local or :3301        │
└──────────────────────────────────────────────┘
```

## What Makes This Different

**Pure OpenTelemetry** - No Prometheus involved:
- ✅ All metrics pushed via OTLP (not scraped)
- ✅ Automatic system metrics collection (CPU, memory, disk, network)
- ✅ Unified data model for all telemetry
- ✅ Native trace-metric-log correlation
- ✅ No scrape jobs to configure or run

## Next Steps

- **Create Dashboards** - Build custom views for your metrics
- **Set Up Alerts** - Configure alerts for errors, latency, resource usage
- **Explore System Metrics** - Monitor CPU, memory per service
- **Query Logs** - Filter and search logs with the log query language
- **Correlate Everything** - Jump from traces → logs → metrics

## Need Help?

- [Full Documentation](./MONITORING_SETUP.md) - Detailed setup guide
- [SigNoz Docs](https://signoz.io/docs/) - Official documentation
- [OpenTelemetry Python](https://opentelemetry.io/docs/instrumentation/python/) - Python instrumentation

---

**Metrics You Get Out of the Box:**

| Category | Metrics | Description |
|----------|---------|-------------|
| HTTP | `http_requests_total` | Total requests by method, endpoint, status |
| HTTP | `http_request_duration_seconds` | Request latency histogram |
| HTTP | `active_requests` | Current active requests |
| Process | `process.cpu.utilization` | Process CPU usage % |
| Process | `process.memory.usage` | Process memory in bytes |
| Process | `process.memory.utilization` | Process memory % |
| Process | `process.threads.count` | Thread count |
| System | `system.cpu.utilization` | System CPU % |
| System | `system.memory.usage` | System memory usage |
| System | `system.memory.utilization` | System memory % |
| Disk | `system.disk.io.read` | Disk read bytes |
| Disk | `system.disk.io.write` | Disk write bytes |
| Network | `system.network.io.sent` | Network sent bytes |
| Network | `system.network.io.received` | Network received bytes |