Improve monitoring 4

Urtzi Alfaro
2026-01-09 14:48:44 +01:00
parent 7ef85c1188
commit 22dab143ba
21 changed files with 1911 additions and 202 deletions

@@ -0,0 +1,190 @@
# SigNoz Dashboards for Bakery IA
This directory contains comprehensive SigNoz dashboard configurations for monitoring the Bakery IA system.
## Available Dashboards
### 1. Infrastructure Monitoring
- **File**: `infrastructure-monitoring.json`
- **Purpose**: Monitor Kubernetes infrastructure, pod health, and resource utilization
- **Key Metrics**: CPU usage, memory usage, network traffic, pod status, container health
### 2. Application Performance
- **File**: `application-performance.json`
- **Purpose**: Monitor microservice performance and API metrics
- **Key Metrics**: Request rate, error rate, latency percentiles, endpoint performance
### 3. Database Performance
- **File**: `database-performance.json`
- **Purpose**: Monitor PostgreSQL and Redis database performance
- **Key Metrics**: Connections, query execution time, cache hit ratio, locks, replication status
### 4. API Performance
- **File**: `api-performance.json`
- **Purpose**: Monitor REST and GraphQL API performance
- **Key Metrics**: Request volume, response times, status codes, endpoint analysis
### 5. Error Tracking
- **File**: `error-tracking.json`
- **Purpose**: Track and analyze system errors
- **Key Metrics**: Error rates, error distribution, recent errors, HTTP errors, database errors
### 6. User Activity
- **File**: `user-activity.json`
- **Purpose**: Monitor user behavior and activity patterns
- **Key Metrics**: Active users, sessions, API calls per user, session duration
### 7. System Health
- **File**: `system-health.json`
- **Purpose**: Overall system health monitoring
- **Key Metrics**: Availability, health scores, resource utilization, service status
### 8. Alert Management
- **File**: `alert-management.json`
- **Purpose**: Monitor and manage system alerts
- **Key Metrics**: Active alerts, alert rates, alert distribution, firing alerts
### 9. Log Analysis
- **File**: `log-analysis.json`
- **Purpose**: Search and analyze system logs
- **Key Metrics**: Log volume, error logs, log distribution, log search
## How to Import Dashboards
### Method 1: Using SigNoz UI
1. **Access SigNoz UI**: Open your SigNoz instance in a web browser
2. **Navigate to Dashboards**: Go to the "Dashboards" section
3. **Import Dashboard**: Click on "Import Dashboard" button
4. **Upload JSON**: Select the JSON file from this directory
5. **Configure**: Adjust any variables or settings as needed
6. **Save**: Save the imported dashboard
**Note**: The dashboards now use the correct SigNoz JSON schema with proper filter arrays.
### Method 2: Using SigNoz API
```bash
# Import a single dashboard
curl -X POST "http://<SIGNOZ_HOST>:3301/api/v1/dashboards/import" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <API_KEY>" \
-d @infrastructure-monitoring.json
# Import all dashboards
for file in *.json; do
curl -X POST "http://<SIGNOZ_HOST>:3301/api/v1/dashboards/import" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <API_KEY>" \
-d @"$file"
done
```
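If the directory also holds JSON files that are not dashboards (for example a collection index with `dashboards` and `usage.recommended_import_order` fields), a wildcard loop would post them as well. The following is a minimal Python sketch of deriving the file list from such an index instead. The `index` literal is a shortened stand-in and `import_order` is a hypothetical helper, not part of SigNoz:

```python
def import_order(index, present_files):
    """Return dashboard files in the index's recommended order, skipping absent ones."""
    by_id = {entry["id"]: entry["file"] for entry in index["dashboards"]}
    ordered_ids = index["usage"]["recommended_import_order"]
    return [by_id[i] for i in ordered_ids if i in by_id and by_id[i] in present_files]


# Shortened stand-in for a collection index (assumed layout).
index = {
    "dashboards": [
        {"id": "system-health", "file": "system-health.json"},
        {"id": "infrastructure-monitoring", "file": "infrastructure-monitoring.json"},
        {"id": "log-analysis", "file": "log-analysis.json"},
    ],
    "usage": {
        "recommended_import_order": [
            "infrastructure-monitoring",
            "system-health",
            "log-analysis",
        ]
    },
}

# Only two of the three files exist locally in this example.
files = import_order(index, {"system-health.json", "infrastructure-monitoring.json"})
print(files)
```

Each returned name can then be passed to the `curl` command above in place of `$file`.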
### Method 3: Using Kubernetes ConfigMap
```bash
# Create a ConfigMap with all dashboards
kubectl create configmap signoz-dashboards \
--from-file=infrastructure-monitoring.json \
--from-file=application-performance.json \
--from-file=database-performance.json \
--from-file=api-performance.json \
--from-file=error-tracking.json \
--from-file=user-activity.json \
--from-file=system-health.json \
--from-file=alert-management.json \
--from-file=log-analysis.json \
-n signoz
```
## Dashboard Variables
Most dashboards include variables that allow you to filter and customize the view:
- **Namespace**: Filter by Kubernetes namespace (e.g., `bakery-ia`, `default`)
- **Service**: Filter by specific microservice
- **Severity**: Filter by error/alert severity
- **Environment**: Filter by deployment environment
- **Time Range**: Adjust the time window for analysis
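To preview which filter values a panel ends up with for a given variable selection, the `${name}` placeholders can be resolved offline. This is an illustrative sketch only: it assumes placeholders follow the `${name}` form used in these JSON files and says nothing about how SigNoz itself performs the substitution. `resolve_filters` is a hypothetical helper:

```python
from string import Template


def resolve_filters(filters, variables):
    """Return a copy of the filters with ${name} placeholders filled in.

    Placeholders without a matching variable are left untouched.
    """
    return [
        {**f, "value": Template(f["value"]).safe_substitute(variables)}
        for f in filters
    ]


filters = [
    {"key": "severity", "operator": "=", "value": "${severity}"},
    {"key": "status", "operator": "=", "value": "firing"},
]

resolved = resolve_filters(filters, {"severity": "critical"})
for f in resolved:
    print(f["key"], f["operator"], f["value"])
```

The original `filters` list is not modified, so the same panel definition can be resolved against several variable selections.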
## Metrics Reference
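The dashboards query Prometheus-style metric names such as `http_server_requests_seconds_count`, `container_cpu_usage_seconds_total`, and `pg_stat_activity_count`, alongside application-specific metrics such as `error_total`, `alerts_active`, and `active_users`. These metrics must reach SigNoz (for example through the OpenTelemetry Collector) before the panels show data. If you add custom metrics, ensure they are properly instrumented in your services.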
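One way to check a dashboard against the metrics your instance actually receives is to collect the metric names its panels reference and compare them with a known list. A minimal sketch, assuming the `dashboard.panels[].query.metric` layout used by the files in this directory; the `available` set is a made-up example, and how you obtain the real list depends on your setup:

```python
def metrics_used(doc):
    """Collect the metric names referenced by a dashboard's panels."""
    panels = doc.get("dashboard", {}).get("panels", [])
    return {
        panel["query"]["metric"]
        for panel in panels
        if "metric" in panel.get("query", {})
    }


def missing_metrics(doc, available):
    """Return the metrics a dashboard needs that are not in `available`."""
    return sorted(metrics_used(doc) - set(available))


dashboard = {
    "dashboard": {
        "panels": [
            {"title": "Active Alerts", "query": {"metric": "alerts_active"}},
            {"title": "Alert Rate", "query": {"metric": "alerts_total"}},
        ]
    }
}

# Hypothetical list of metrics known to exist in the backend.
available = {"alerts_total", "http_server_requests_seconds_count"}
print(missing_metrics(dashboard, available))
```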
## Troubleshooting
### Dashboard Import Errors
If you encounter errors when importing dashboards:
1. **Validate JSON**: Ensure the JSON files are valid
```bash
jq . infrastructure-monitoring.json
```
2. **Check Metrics**: Verify that the metrics exist in your SigNoz instance
3. **Adjust Time Range**: Try different time ranges if no data appears
4. **Check Filters**: Ensure filters match your actual service names and tags
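The first and last checks can be automated before an import is attempted. The sketch below assumes the layout used by the files in this directory (`dashboard.panels[].query.filters`, each filter an object with `key`, `operator`, and `value`); `check_dashboard` and `check_file` are hypothetical helpers, not SigNoz tooling:

```python
import json


def check_dashboard(doc):
    """Return a list of human-readable problems found in one dashboard document."""
    panels = doc.get("dashboard", {}).get("panels")
    if not isinstance(panels, list):
        return ["missing dashboard.panels array"]
    problems = []
    for panel in panels:
        title = panel.get("title", "<untitled>")
        filters = panel.get("query", {}).get("filters", [])
        if not isinstance(filters, list):
            problems.append(f"{title}: filters must be an array")
            continue
        for f in filters:
            if not isinstance(f, dict):
                problems.append(f"{title}: filter entries must be objects")
                continue
            missing = {"key", "operator", "value"} - set(f)
            if missing:
                problems.append(f"{title}: filter missing {sorted(missing)}")
    return problems


def check_file(path):
    """Parse a dashboard file (raises on invalid JSON) and check its structure."""
    with open(path, encoding="utf-8") as fh:
        return check_dashboard(json.load(fh))


sample = {
    "dashboard": {
        "panels": [
            {"title": "Good", "query": {"filters": [
                {"key": "service", "operator": "=", "value": "${service}"}]}},
            {"title": "Legacy", "query": {"filters": {"namespace": "${namespace}"}}},
        ]
    }
}
for problem in check_dashboard(sample):
    print(problem)
```

An empty result means the file matches the expected structure; it does not guarantee that the referenced metrics or label names exist in your instance.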
### "e.filter is not a function" Error
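This error occurs when the dashboard JSON uses an incorrect filter format. The message is a JavaScript `TypeError`: the UI most likely calls the array method `.filter()` on a value that is a plain object rather than an array. The fix has been applied: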
**Before (incorrect)**:
```json
"filters": {
"namespace": "${namespace}"
}
```
**After (correct)**:
```json
"filters": [
{
"key": "namespace",
"operator": "=",
"value": "${namespace}"
}
]
```
All dashboards in this directory now use the correct array format for filters.
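If you maintain your own dashboards in the old object form, the conversion can be scripted. A minimal sketch, assuming every legacy filter is a simple equality match (the operator `=` is an assumption taken from the example above); `normalize_filters` is a hypothetical helper:

```python
import json


def normalize_filters(node):
    """Recursively convert object-style "filters" into the array format."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "filters" and isinstance(value, dict):
                # {"namespace": "${namespace}"} -> [{"key": ..., "operator": "=", "value": ...}]
                out[key] = [
                    {"key": k, "operator": "=", "value": v}
                    for k, v in value.items()
                ]
            else:
                out[key] = normalize_filters(value)
        return out
    if isinstance(node, list):
        return [normalize_filters(item) for item in node]
    return node


legacy = {
    "query": {
        "metric": "alerts_total",
        "filters": {"namespace": "${namespace}"},
    }
}
fixed = normalize_filters(legacy)
print(json.dumps(fixed["query"]["filters"], indent=2))
```

Documents that already use the array form pass through unchanged, so the function is safe to run on every file.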
### Missing Data
If dashboards show no data:
1. **Verify Instrumentation**: Ensure your services are properly instrumented with OpenTelemetry
2. **Check Time Range**: Adjust the time range to include recent data
3. **Validate Metrics**: Confirm the metrics are being collected and stored
4. **Review Filters**: Check that filters match your actual deployment
## Customization
You can customize these dashboards by:
1. **Editing JSON**: Modify the JSON files to add/remove panels or adjust queries
2. **Cloning in UI**: Clone existing dashboards and modify them in the SigNoz UI
3. **Adding Variables**: Add new variables for additional filtering options
4. **Adjusting Layout**: Change the grid layout and panel sizes
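As an illustration of the first and third options, the sketch below adds a dropdown variable and a matching filter to every panel of a dashboard document. It assumes the JSON layout used in this directory; `add_variable` is a hypothetical helper, and whether every panel should receive the new filter depends on your dashboards:

```python
import copy


def add_variable(doc, name, label, values, default="*"):
    """Return a copy of the dashboard with a new variable and a filter on each panel."""
    out = copy.deepcopy(doc)
    dash = out["dashboard"]
    dash.setdefault("variables", []).append(
        {"name": name, "label": label, "type": "dropdown",
         "default": default, "values": values}
    )
    for panel in dash.get("panels", []):
        panel.setdefault("query", {}).setdefault("filters", []).append(
            {"key": name, "operator": "=", "value": "${" + name + "}"}
        )
    return out


original = {
    "dashboard": {
        "panels": [{"title": "Log Volume", "query": {"metric": "log_lines_total", "filters": []}}],
        "variables": [],
    }
}
updated = add_variable(original, "environment", "Environment", ["*", "staging", "production"])
print(updated["dashboard"]["panels"][0]["query"]["filters"])
```

The input document is left untouched, so the result can be reviewed or diffed before it is written back to disk.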
## Best Practices
1. **Regular Reviews**: Review dashboards regularly to ensure they meet your monitoring needs
2. **Alert Integration**: Set up alerts based on key metrics shown in these dashboards
3. **Team Access**: Share relevant dashboards with appropriate team members
4. **Documentation**: Document any custom metrics or specific monitoring requirements
## Support
For issues with these dashboards:
1. Check the [SigNoz documentation](https://signoz.io/docs/)
2. Review the [Bakery IA monitoring guide](../SIGNOZ_COMPLETE_CONFIGURATION_GUIDE.md)
3. Consult the OpenTelemetry metrics specification
## License
These dashboard configurations are provided under the same license as the Bakery IA project.

@@ -0,0 +1,104 @@
{
"dashboard": {
"title": "Bakery IA - Alert Management",
"description": "Alert monitoring and management dashboard",
"tags": ["alerts", "monitoring", "management"],
"panels": [
{
"title": "Active Alerts",
"type": "stat",
"query": {
"metric": "alerts_active",
"aggregate": "sum",
"filters": [
{
"key": "severity",
"operator": "=",
"value": "${severity}"
},
{
"key": "status",
"operator": "=",
"value": "firing"
}
]
},
"unit": "number"
},
{
"title": "Alert Rate",
"type": "timeseries",
"query": {
"metric": "alerts_total",
"aggregate": "rate",
"filters": [
{
"key": "severity",
"operator": "=",
"value": "${severity}"
}
]
},
"unit": "alerts/s"
},
{
"title": "Alerts by Severity",
"type": "pie",
"query": {
"metric": "alerts_total",
"aggregate": "sum",
"groupBy": ["severity"],
"filters": [
{
"key": "severity",
"operator": "=",
"value": "${severity}"
}
]
}
},
{
"title": "Alerts by Status",
"type": "pie",
"query": {
"metric": "alerts_total",
"aggregate": "sum",
"groupBy": ["status"],
"filters": [
{
"key": "status",
"operator": "=",
"value": "${status}"
}
]
}
}
],
"variables": [
{
"name": "severity",
"label": "Severity",
"type": "dropdown",
"default": "*",
"values": ["*", "critical", "high", "medium", "low"]
},
{
"name": "status",
"label": "Status",
"type": "dropdown",
"default": "*",
"values": ["*", "firing", "resolved", "acknowledged"]
}
],
"layout": {
"type": "grid",
"columns": 12,
"gap": [16, 16]
},
"refresh": "15s",
"time": {
"from": "now-1h",
"to": "now"
}
}
}

@@ -0,0 +1,102 @@
{
"dashboard": {
"title": "Bakery IA - API Performance",
"description": "Comprehensive API performance monitoring for Bakery IA REST and GraphQL endpoints",
"tags": ["api", "performance", "rest", "graphql"],
"panels": [
{
"title": "Request Volume",
"type": "timeseries",
"query": {
"metric": "http_server_requests_seconds_count",
"aggregate": "sum",
"groupBy": ["api"],
"filters": [
{
"key": "api",
"operator": "=",
"value": "${api}"
}
]
},
"unit": "req/s"
},
{
"title": "Error Rate",
"type": "timeseries",
"query": {
"metric": "http_server_requests_seconds_count",
"aggregate": "sum",
"groupBy": ["api", "status"],
"filters": [
{
"key": "api",
"operator": "=",
"value": "${api}"
},
{
"key": "status",
"operator": "=~",
"value": "5.."
}
]
},
"unit": "req/s"
},
{
"title": "Average Response Time",
"type": "timeseries",
"query": {
"metric": "http_server_requests_seconds_sum",
"aggregate": "avg",
"groupBy": ["api", "endpoint"],
"filters": [
{
"key": "api",
"operator": "=",
"value": "${api}"
}
]
},
"unit": "seconds"
},
{
"title": "P95 Latency",
"type": "timeseries",
"query": {
"metric": "http_server_requests_seconds_bucket",
"aggregate": "histogram_quantile",
"quantile": 0.95,
"groupBy": ["api", "endpoint"],
"filters": [
{
"key": "api",
"operator": "=",
"value": "${api}"
}
]
},
"unit": "seconds"
}
],
"variables": [
{
"name": "api",
"label": "API Service",
"type": "dropdown",
"default": "*",
"values": ["*", "gateway-api", "auth-api", "inventory-api", "production-api", "forecasting-api", "procurement-api"]
}
],
"layout": {
"type": "grid",
"columns": 12,
"gap": [16, 16]
},
"refresh": "15s",
"time": {
"from": "now-1h",
"to": "now"
}
}
}

@@ -0,0 +1,101 @@
{
"dashboard": {
"title": "Bakery IA - Application Performance",
"description": "Application performance monitoring dashboard for Bakery IA microservices",
"tags": ["application", "performance", "apm"],
"panels": [
{
"title": "Request Rate",
"type": "timeseries",
"query": {
"metric": "http_server_requests_seconds_count",
"aggregate": "sum",
"groupBy": ["service"],
"filters": [
{
"key": "service",
"operator": "=",
"value": "${service}"
}
]
},
"unit": "req/s"
},
{
"title": "Error Rate",
"type": "timeseries",
"query": {
"metric": "http_server_requests_seconds_count",
"aggregate": "sum",
"groupBy": ["service", "status"],
"filters": [
{
"key": "service",
"operator": "=",
"value": "${service}"
},
{
"key": "status",
"operator": "=~",
"value": "5.."
}
]
},
"unit": "req/s"
},
{
"title": "Average Response Time",
"type": "timeseries",
"query": {
"metric": "http_server_requests_seconds_sum",
"aggregate": "avg",
"groupBy": ["service"],
"filters": [
{
"key": "service",
"operator": "=",
"value": "${service}"
}
]
},
"unit": "seconds"
},
{
"title": "Throughput",
"type": "timeseries",
"query": {
"metric": "http_server_requests_seconds_count",
"aggregate": "rate",
"groupBy": ["service"],
"filters": [
{
"key": "service",
"operator": "=",
"value": "${service}"
}
]
},
"unit": "req/s"
}
],
"variables": [
{
"name": "service",
"label": "Service",
"type": "dropdown",
"default": "*",
"values": ["*", "auth-service", "gateway-service", "forecasting-service", "inventory-service", "production-service", "procurement-service"]
}
],
"layout": {
"type": "grid",
"columns": 12,
"gap": [16, 16]
},
"refresh": "15s",
"time": {
"from": "now-30m",
"to": "now"
}
}
}

@@ -0,0 +1,101 @@
{
"dashboard": {
"title": "Bakery IA - Database Performance",
"description": "Comprehensive database performance monitoring for PostgreSQL and Redis",
"tags": ["database", "postgresql", "redis", "performance"],
"panels": [
{
"title": "Database Connections",
"type": "timeseries",
"query": {
"metric": "pg_stat_activity_count",
"aggregate": "sum",
"groupBy": ["datname"],
"filters": [
{
"key": "datname",
"operator": "=",
"value": "${database}"
}
]
},
"unit": "number"
},
{
"title": "Active Queries",
"type": "timeseries",
"query": {
"metric": "pg_stat_activity_count",
"aggregate": "sum",
"groupBy": ["datname"],
"filters": [
{
"key": "datname",
"operator": "=",
"value": "${database}"
},
{
"key": "state",
"operator": "=",
"value": "active"
}
]
},
"unit": "number"
},
{
"title": "Database Size",
"type": "timeseries",
"query": {
"metric": "pg_database_size_bytes",
"aggregate": "sum",
"groupBy": ["datname"],
"filters": [
{
"key": "datname",
"operator": "=",
"value": "${database}"
}
]
},
"unit": "bytes"
},
{
"title": "Query Execution Time",
"type": "timeseries",
"query": {
"metric": "pg_stat_statements_total_time",
"aggregate": "avg",
"groupBy": ["datname"],
"filters": [
{
"key": "datname",
"operator": "=",
"value": "${database}"
}
]
},
"unit": "seconds"
}
],
"variables": [
{
"name": "database",
"label": "Database",
"type": "dropdown",
"default": "*",
"values": ["*", "postgresql", "redis"]
}
],
"layout": {
"type": "grid",
"columns": 12,
"gap": [16, 16]
},
"refresh": "30s",
"time": {
"from": "now-1h",
"to": "now"
}
}
}

@@ -0,0 +1,105 @@
{
"dashboard": {
"title": "Bakery IA - Error Tracking",
"description": "Comprehensive error tracking and analysis dashboard",
"tags": ["errors", "exceptions", "tracking"],
"panels": [
{
"title": "Total Errors",
"type": "stat",
"query": {
"metric": "error_total",
"aggregate": "sum",
"filters": [
{
"key": "service",
"operator": "=",
"value": "${service}"
}
]
},
"unit": "number"
},
{
"title": "Error Rate",
"type": "timeseries",
"query": {
"metric": "error_total",
"aggregate": "rate",
"groupBy": ["service"],
"filters": [
{
"key": "service",
"operator": "=",
"value": "${service}"
}
]
},
"unit": "errors/s"
},
{
"title": "HTTP 5xx Errors",
"type": "timeseries",
"query": {
"metric": "http_server_requests_seconds_count",
"aggregate": "sum",
"groupBy": ["service", "status"],
"filters": [
{
"key": "service",
"operator": "=",
"value": "${service}"
},
{
"key": "status",
"operator": "=~",
"value": "5.."
}
]
},
"unit": "number"
},
{
"title": "HTTP 4xx Errors",
"type": "timeseries",
"query": {
"metric": "http_server_requests_seconds_count",
"aggregate": "sum",
"groupBy": ["service", "status"],
"filters": [
{
"key": "service",
"operator": "=",
"value": "${service}"
},
{
"key": "status",
"operator": "=~",
"value": "4.."
}
]
},
"unit": "number"
}
],
"variables": [
{
"name": "service",
"label": "Service",
"type": "dropdown",
"default": "*",
"values": ["*", "auth-service", "gateway-service", "inventory-service", "production-service", "forecasting-service"]
}
],
"layout": {
"type": "grid",
"columns": 12,
"gap": [16, 16]
},
"refresh": "15s",
"time": {
"from": "now-1h",
"to": "now"
}
}
}

@@ -0,0 +1,213 @@
{
"name": "Bakery IA Dashboard Collection",
"description": "Complete set of SigNoz dashboards for Bakery IA monitoring",
"version": "1.0.0",
"author": "Bakery IA Team",
"license": "MIT",
"dashboards": [
{
"id": "infrastructure-monitoring",
"name": "Infrastructure Monitoring",
"description": "Kubernetes infrastructure and resource monitoring",
"file": "infrastructure-monitoring.json",
"tags": ["infrastructure", "kubernetes", "system"],
"category": "infrastructure"
},
{
"id": "application-performance",
"name": "Application Performance",
"description": "Microservice performance and API metrics",
"file": "application-performance.json",
"tags": ["application", "performance", "apm"],
"category": "performance"
},
{
"id": "database-performance",
"name": "Database Performance",
"description": "PostgreSQL and Redis database monitoring",
"file": "database-performance.json",
"tags": ["database", "postgresql", "redis"],
"category": "database"
},
{
"id": "api-performance",
"name": "API Performance",
"description": "REST and GraphQL API performance monitoring",
"file": "api-performance.json",
"tags": ["api", "rest", "graphql"],
"category": "api"
},
{
"id": "error-tracking",
"name": "Error Tracking",
"description": "System error tracking and analysis",
"file": "error-tracking.json",
"tags": ["errors", "exceptions", "tracking"],
"category": "monitoring"
},
{
"id": "user-activity",
"name": "User Activity",
"description": "User behavior and activity monitoring",
"file": "user-activity.json",
"tags": ["user", "activity", "behavior"],
"category": "user"
},
{
"id": "system-health",
"name": "System Health",
"description": "Overall system health monitoring",
"file": "system-health.json",
"tags": ["system", "health", "overview"],
"category": "overview"
},
{
"id": "alert-management",
"name": "Alert Management",
"description": "Alert monitoring and management",
"file": "alert-management.json",
"tags": ["alerts", "notifications", "management"],
"category": "alerts"
},
{
"id": "log-analysis",
"name": "Log Analysis",
"description": "Log search and analysis",
"file": "log-analysis.json",
"tags": ["logs", "search", "analysis"],
"category": "logs"
}
],
"categories": [
{
"id": "infrastructure",
"name": "Infrastructure",
"description": "Kubernetes and system infrastructure monitoring"
},
{
"id": "performance",
"name": "Performance",
"description": "Application and service performance monitoring"
},
{
"id": "database",
"name": "Database",
"description": "Database performance and health monitoring"
},
{
"id": "api",
"name": "API",
"description": "API performance and usage monitoring"
},
{
"id": "monitoring",
"name": "Monitoring",
"description": "Error tracking and system monitoring"
},
{
"id": "user",
"name": "User",
"description": "User activity and behavior monitoring"
},
{
"id": "overview",
"name": "Overview",
"description": "System-wide overview and health dashboards"
},
{
"id": "alerts",
"name": "Alerts",
"description": "Alert management and monitoring"
},
{
"id": "logs",
"name": "Logs",
"description": "Log analysis and search"
}
],
"usage": {
"import_methods": [
"ui_import",
"api_import",
"kubernetes_configmap"
],
"recommended_import_order": [
"infrastructure-monitoring",
"system-health",
"application-performance",
"api-performance",
"database-performance",
"error-tracking",
"alert-management",
"log-analysis",
"user-activity"
]
},
"requirements": {
"signoz_version": ">= 0.10.0",
"opentelemetry_collector": ">= 0.45.0",
"metrics": [
"container_cpu_usage_seconds_total",
"container_memory_working_set_bytes",
"http_server_requests_seconds_count",
"http_server_requests_seconds_sum",
"pg_stat_activity_count",
"pg_stat_statements_total_time",
"error_total",
"alerts_total",
"kube_pod_status_phase",
"container_network_receive_bytes_total",
"kube_pod_container_status_restarts_total",
"kube_pod_container_status_ready",
"container_fs_reads_total",
"kubernetes_events",
"http_server_requests_seconds_bucket",
"http_server_active_requests",
"http_server_up",
"db_query_duration_seconds_sum",
"db_connections_active",
"http_client_request_duration_seconds_count",
"http_client_request_duration_seconds_sum",
"graphql_execution_time_seconds",
"graphql_errors_total",
"pg_stat_database_blks_hit",
"pg_stat_database_xact_commit",
"pg_locks_count",
"pg_table_size_bytes",
"pg_stat_user_tables_seq_scan",
"redis_memory_used_bytes",
"redis_commands_processed_total",
"redis_keyspace_hits",
"pg_stat_database_deadlocks",
"pg_stat_database_conn_errors",
"pg_replication_lag_bytes",
"pg_replication_is_replica",
"active_users",
"user_sessions_total",
"api_calls_per_user",
"session_duration_seconds",
"system_availability",
"service_health_score",
"system_cpu_usage",
"system_memory_usage",
"service_availability",
"alerts_active",
"log_lines_total"
]
},
"support": {
"documentation": "https://signoz.io/docs/",
"bakery_ia_docs": "../SIGNOZ_COMPLETE_CONFIGURATION_GUIDE.md",
"issues": "https://github.com/your-repo/issues"
},
"notes": {
"format_fix": "All dashboards have been updated to use the correct SigNoz JSON schema with proper filter arrays to resolve the 'e.filter is not a function' error.",
"compatibility": "Tested with SigNoz v0.10.0+ and OpenTelemetry Collector v0.45.0+",
"customization": "You can customize these dashboards by editing the JSON files or cloning them in the SigNoz UI"
}
}

@@ -0,0 +1,105 @@
{
"dashboard": {
"title": "Bakery IA - Infrastructure Monitoring",
"description": "Comprehensive infrastructure monitoring dashboard for Bakery IA system",
"tags": ["infrastructure", "system", "kubernetes"],
"panels": [
{
"title": "CPU Usage",
"type": "timeseries",
"query": {
"metric": "container_cpu_usage_seconds_total",
"aggregate": "sum",
"groupBy": ["namespace"],
"filters": [
{
"key": "namespace",
"operator": "=",
"value": "${namespace}"
}
]
},
"unit": "percent",
"yAxis": {
"min": 0,
"max": 100
}
},
{
"title": "Memory Usage",
"type": "timeseries",
"query": {
"metric": "container_memory_working_set_bytes",
"aggregate": "sum",
"groupBy": ["namespace"],
"filters": [
{
"key": "namespace",
"operator": "=",
"value": "${namespace}"
}
]
},
"unit": "bytes"
},
{
"title": "Network Traffic",
"type": "timeseries",
"query": {
"metric": "container_network_receive_bytes_total",
"aggregate": "sum",
"groupBy": ["namespace"],
"filters": [
{
"key": "namespace",
"operator": "=",
"value": "${namespace}"
}
]
},
"unit": "bytes"
},
{
"title": "Pod Status",
"type": "stat",
"query": {
"metric": "kube_pod_status_phase",
"aggregate": "count",
"groupBy": ["phase"],
"filters": [
{
"key": "namespace",
"operator": "=",
"value": "${namespace}"
},
{
"key": "phase",
"operator": "=",
"value": "Running"
}
]
},
"unit": "number"
}
],
"variables": [
{
"name": "namespace",
"label": "Namespace",
"type": "dropdown",
"default": "bakery-ia",
"values": ["bakery-ia", "default", "kube-system"]
}
],
"layout": {
"type": "grid",
"columns": 12,
"gap": [16, 16]
},
"refresh": "30s",
"time": {
"from": "now-1h",
"to": "now"
}
}
}

@@ -0,0 +1,99 @@
{
"dashboard": {
"title": "Bakery IA - Log Analysis",
"description": "Comprehensive log analysis and search dashboard",
"tags": ["logs", "analysis", "search"],
"panels": [
{
"title": "Log Volume",
"type": "timeseries",
"query": {
"metric": "log_lines_total",
"aggregate": "sum",
"groupBy": ["service"],
"filters": [
{
"key": "service",
"operator": "=",
"value": "${service}"
}
]
},
"unit": "logs/s"
},
{
"title": "Error Logs",
"type": "timeseries",
"query": {
"metric": "log_lines_total",
"aggregate": "sum",
"groupBy": ["service"],
"filters": [
{
"key": "service",
"operator": "=",
"value": "${service}"
},
{
"key": "level",
"operator": "=",
"value": "error"
}
]
},
"unit": "logs/s"
},
{
"title": "Logs by Level",
"type": "pie",
"query": {
"metric": "log_lines_total",
"aggregate": "sum",
"groupBy": ["level"],
"filters": [
{
"key": "service",
"operator": "=",
"value": "${service}"
}
]
}
},
{
"title": "Logs by Service",
"type": "pie",
"query": {
"metric": "log_lines_total",
"aggregate": "sum",
"groupBy": ["service"],
"filters": [
{
"key": "service",
"operator": "=",
"value": "${service}"
}
]
}
}
],
"variables": [
{
"name": "service",
"label": "Service",
"type": "dropdown",
"default": "*",
"values": ["*", "auth-service", "gateway-service", "inventory-service", "production-service", "forecasting-service"]
}
],
"layout": {
"type": "grid",
"columns": 12,
"gap": [16, 16]
},
"refresh": "30s",
"time": {
"from": "now-1h",
"to": "now"
}
}
}

@@ -0,0 +1,92 @@
{
"dashboard": {
"title": "Bakery IA - System Health",
"description": "Comprehensive system health monitoring dashboard",
"tags": ["system", "health", "monitoring"],
"panels": [
{
"title": "System Availability",
"type": "stat",
"query": {
"metric": "system_availability",
"aggregate": "avg",
"filters": [
{
"key": "namespace",
"operator": "=",
"value": "${namespace}"
}
]
},
"unit": "percent"
},
{
"title": "Service Health Score",
"type": "stat",
"query": {
"metric": "service_health_score",
"aggregate": "avg",
"filters": [
{
"key": "namespace",
"operator": "=",
"value": "${namespace}"
}
]
},
"unit": "number"
},
{
"title": "CPU Usage",
"type": "timeseries",
"query": {
"metric": "system_cpu_usage",
"aggregate": "avg",
"filters": [
{
"key": "namespace",
"operator": "=",
"value": "${namespace}"
}
]
},
"unit": "percent"
},
{
"title": "Memory Usage",
"type": "timeseries",
"query": {
"metric": "system_memory_usage",
"aggregate": "avg",
"filters": [
{
"key": "namespace",
"operator": "=",
"value": "${namespace}"
}
]
},
"unit": "percent"
}
],
"variables": [
{
"name": "namespace",
"label": "Namespace",
"type": "dropdown",
"default": "bakery-ia",
"values": ["bakery-ia", "default"]
}
],
"layout": {
"type": "grid",
"columns": 12,
"gap": [16, 16]
},
"refresh": "30s",
"time": {
"from": "now-1h",
"to": "now"
}
}
}

@@ -0,0 +1,96 @@
{
"dashboard": {
"title": "Bakery IA - User Activity",
"description": "User activity and behavior monitoring dashboard",
"tags": ["user", "activity", "behavior"],
"panels": [
{
"title": "Active Users",
"type": "timeseries",
"query": {
"metric": "active_users",
"aggregate": "sum",
"groupBy": ["service"],
"filters": [
{
"key": "service",
"operator": "=",
"value": "${service}"
}
]
},
"unit": "number"
},
{
"title": "User Sessions",
"type": "timeseries",
"query": {
"metric": "user_sessions_total",
"aggregate": "sum",
"groupBy": ["service"],
"filters": [
{
"key": "service",
"operator": "=",
"value": "${service}"
}
]
},
"unit": "number"
},
{
"title": "API Calls per User",
"type": "timeseries",
"query": {
"metric": "api_calls_per_user",
"aggregate": "avg",
"groupBy": ["service"],
"filters": [
{
"key": "service",
"operator": "=",
"value": "${service}"
}
]
},
"unit": "number"
},
{
"title": "Session Duration",
"type": "timeseries",
"query": {
"metric": "session_duration_seconds",
"aggregate": "avg",
"groupBy": ["service"],
"filters": [
{
"key": "service",
"operator": "=",
"value": "${service}"
}
]
},
"unit": "seconds"
}
],
"variables": [
{
"name": "service",
"label": "Service",
"type": "dropdown",
"default": "*",
"values": ["*", "auth-service", "gateway-service", "inventory-service", "production-service"]
}
],
"layout": {
"type": "grid",
"columns": 12,
"gap": [16, 16]
},
"refresh": "30s",
"time": {
"from": "now-1h",
"to": "now"
}
}
}