Clean code
281
HEALTH_CHECKS.md
@@ -1,281 +0,0 @@
# Unified Health Check System

This document describes the unified health check system implemented across all microservices in the bakery-ia platform.

## Overview

The unified health check system provides standardized health monitoring endpoints across all services, with comprehensive database verification, Kubernetes integration, and detailed health reporting.

## Key Features

- **Standardized Endpoints**: All services now provide the same health check endpoints
- **Database Verification**: Comprehensive database health checks including table existence verification
- **Kubernetes Integration**: Proper separation of liveness and readiness probes
- **Detailed Reporting**: Rich health status information for debugging and monitoring
- **App State Integration**: Health checks automatically detect service ready state

## Health Check Endpoints

### `/health` - Basic Health Check
- **Purpose**: Basic service health status
- **Use Case**: General health monitoring, API gateways
- **Response**: Service name, version, status, and timestamp
- **Status Codes**: 200 (healthy/starting)

### `/health/ready` - Kubernetes Readiness Probe
- **Purpose**: Indicates whether the service is ready to receive traffic
- **Use Case**: Kubernetes readiness probe, load balancer health checks
- **Checks**: Application state, database connectivity, table verification, custom checks
- **Status Codes**: 200 (ready), 503 (not ready)

### `/health/live` - Kubernetes Liveness Probe
- **Purpose**: Indicates whether the service is alive and should not be restarted
- **Use Case**: Kubernetes liveness probe
- **Response**: Simple alive status
- **Status Codes**: 200 (alive)

### `/health/database` - Detailed Database Health
- **Purpose**: Comprehensive database health information for debugging
- **Use Case**: Database monitoring, troubleshooting
- **Checks**: Connectivity, table existence, connection pool status, response times
- **Status Codes**: 200 (healthy), 503 (unhealthy)
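
The status-code contract listed for the four endpoints can be condensed into a small lookup. The following is only an illustrative sketch of that contract, not the shared library's code; the function name `status_code_for` is hypothetical.

```python
def status_code_for(endpoint: str, healthy: bool) -> int:
    """Return the HTTP status this document lists for a health endpoint."""
    # Endpoints for which the document lists only 200.
    always_ok = {"/health", "/health/live"}
    # Endpoints for which the document lists 200 or 503.
    gated = {"/health/ready", "/health/database"}

    if endpoint in always_ok:
        return 200
    if endpoint in gated:
        return 200 if healthy else 503
    raise ValueError(f"unknown health endpoint: {endpoint}")


print(status_code_for("/health/ready", healthy=False))  # 503
print(status_code_for("/health/live", healthy=False))   # 200
```

Keeping `/health/live` independent of downstream checks is what prevents a database outage from triggering container restarts.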

## Implementation

### Services Updated

The following services have been updated to use the unified health check system:

1. **Training Service** (`training-service`)
   - Full implementation with database manager integration
   - Table verification for ML training tables
   - Expected tables: `model_training_logs`, `trained_models`, `model_performance_metrics`, `training_job_queue`, `model_artifacts`

2. **Orders Service** (`orders-service`)
   - Legacy database integration with custom health checks
   - Expected tables: `customers`, `customer_contacts`, `customer_orders`, `order_items`, `order_status_history`, `procurement_plans`, `procurement_requirements`

3. **Inventory Service** (`inventory-service`)
   - Full database manager integration
   - Food safety and inventory table verification
   - Expected tables: `ingredients`, `stock`, `stock_movements`, `product_transformations`, `stock_alerts`, `food_safety_compliance`, `temperature_logs`, `food_safety_alerts`
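
Table verification amounts to comparing each service's expected table list against the tables that actually exist. The sketch below shows one way to express that comparison, assuming the list of existing table names has already been fetched from the database; `verify_tables` is a hypothetical helper, and its result keys simply mirror the field names that appear in the response examples in this document.

```python
from typing import Dict, Iterable, List, Union


def verify_tables(
    expected: Iterable[str], existing: Iterable[str]
) -> Dict[str, Union[bool, List[str]]]:
    """Split the expected tables into verified and missing ones."""
    existing_set = set(existing)
    expected_list = list(expected)
    verified = [t for t in expected_list if t in existing_set]
    missing = [t for t in expected_list if t not in existing_set]
    return {
        "tables_exist": not missing,
        "tables_verified": verified,
        "missing_tables": missing,
    }


report = verify_tables(
    expected=["customers", "customer_orders", "order_items"],
    existing=["customers", "order_items", "alembic_version"],
)
print(report["missing_tables"])  # ['customer_orders']
```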

### Code Integration

#### Basic Setup
```python
from shared.monitoring.health_checks import setup_fastapi_health_checks

# Setup unified health checks
health_manager = setup_fastapi_health_checks(
    app=app,
    service_name="my-service",
    version="1.0.0",
    database_manager=database_manager,
    expected_tables=['table1', 'table2'],
    custom_checks={"custom_check": custom_check_function}
)
```

#### With Custom Checks
```python
async def custom_health_check():
    """Custom health check function"""
    return await some_service_check()

health_manager = setup_fastapi_health_checks(
    app=app,
    service_name="my-service",
    version="1.0.0",
    database_manager=database_manager,
    expected_tables=['table1', 'table2'],
    custom_checks={"external_service": custom_health_check}
)
```
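
How the shared library evaluates the `custom_checks` mapping is not shown in this document. As a rough sketch under stated assumptions (each check is an async callable returning a truthy value, and a timeout or exception counts as a failure), the mapping could be evaluated concurrently like this; `run_custom_checks` and the two sample checks are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Dict

CheckFn = Callable[[], Awaitable[bool]]


async def run_custom_checks(
    checks: Dict[str, CheckFn], timeout: float = 2.0
) -> Dict[str, bool]:
    """Run all checks concurrently and map each name to pass/fail."""

    async def run_one(check: CheckFn) -> bool:
        try:
            return bool(await asyncio.wait_for(check(), timeout))
        except Exception:
            # A raised exception or a timeout counts as a failed check.
            return False

    names = list(checks)
    results = await asyncio.gather(*(run_one(checks[name]) for name in names))
    return dict(zip(names, results))


async def always_ok() -> bool:
    return True


async def always_broken() -> bool:
    raise ConnectionError("dependency unreachable")


results = asyncio.run(run_custom_checks({"ok": always_ok, "broken": always_broken}))
print(results)  # {'ok': True, 'broken': False}
```

Bounding each check with a timeout keeps one slow dependency from pushing the whole readiness response past the probe's own `timeoutSeconds`.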

#### Service Ready State
```python
# In your lifespan function
async def lifespan(app: FastAPI):
    # Startup logic
    await initialize_service()

    # Mark service as ready
    app.state.ready = True

    yield

    # Shutdown logic
```
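
The document does not show how the health checks read this flag. One defensive way to do it, sketched here with a `SimpleNamespace` standing in for `app.state`, is to treat a flag that was never set as "not ready"; `is_app_ready` is a hypothetical helper.

```python
from types import SimpleNamespace


def is_app_ready(state) -> bool:
    """Treat a missing `ready` attribute as not ready."""
    return bool(getattr(state, "ready", False))


state = SimpleNamespace()          # stand-in for app.state before startup finishes
before_startup = is_app_ready(state)

state.ready = True                 # what the lifespan function does after startup
after_startup = is_app_ready(state)

print(before_startup, after_startup)  # False True
```

Defaulting to `False` means a service that forgets to set the flag stays out of rotation instead of receiving traffic before it has initialized.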

## Kubernetes Configuration

### Updated Probe Configuration

The microservice template and specific service configurations have been updated to use the new endpoints:

```yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 8000
  initialDelaySeconds: 30
  timeoutSeconds: 5
  periodSeconds: 10
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /health/ready
    port: 8000
  initialDelaySeconds: 15
  timeoutSeconds: 3
  periodSeconds: 5
  failureThreshold: 5
```

### Key Changes from Previous Configuration

1. **Liveness Probe**: Now uses `/health/live` instead of `/health`
2. **Readiness Probe**: Now uses `/health/ready` instead of `/health`
3. **Improved Timing**: Adjusted timeouts and failure thresholds for better reliability
4. **Separate Concerns**: Liveness and readiness are now properly separated
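
To make the timing values concrete, the arithmetic below estimates how long a failing service keeps failing before each probe trips. It is a rough approximation (`periodSeconds × failureThreshold`) that ignores `timeoutSeconds` and scheduling jitter; the `Probe` class is a hypothetical helper, not part of the platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    initial_delay_seconds: int
    period_seconds: int
    failure_threshold: int

    def failure_window_seconds(self) -> int:
        """Approximate time covered by the consecutive failures that trip the probe."""
        return self.period_seconds * self.failure_threshold


# Values copied from the probe configuration in this document.
liveness = Probe(initial_delay_seconds=30, period_seconds=10, failure_threshold=3)
readiness = Probe(initial_delay_seconds=15, period_seconds=5, failure_threshold=5)

print(liveness.failure_window_seconds())   # 30
print(readiness.failure_window_seconds())  # 25
```

With these settings a pod is pulled from rotation after roughly 25 seconds of failed readiness checks and restarted after roughly 30 seconds of failed liveness checks.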

## Health Check Response Examples

### Basic Health Check Response
```json
{
  "status": "healthy",
  "service": "training-service",
  "version": "1.0.0",
  "timestamp": "2025-01-27T10:30:00Z"
}
```

### Readiness Check Response (Ready)
```json
{
  "status": "ready",
  "checks": {
    "application": true,
    "database_connectivity": true,
    "database_tables": true
  },
  "database": {
    "status": "healthy",
    "tables_verified": ["model_training_logs", "trained_models"],
    "missing_tables": [],
    "errors": []
  }
}
```
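
The `checks` object suggests that readiness is an all-or-nothing aggregate of the individual checks. The sketch below shows that aggregation under two assumptions that this document does not confirm: the not-ready status string is `"not_ready"`, and any single failing check yields a 503. `build_readiness_response` is a hypothetical name.

```python
from typing import Dict, Tuple


def build_readiness_response(checks: Dict[str, bool]) -> Tuple[int, dict]:
    """Return (HTTP status code, JSON body) for a set of check results."""
    ready = all(checks.values())
    body = {
        # "not_ready" is an assumed label; only "ready" appears in the document.
        "status": "ready" if ready else "not_ready",
        "checks": dict(checks),
    }
    return (200 if ready else 503), body


code, body = build_readiness_response(
    {"application": True, "database_connectivity": True, "database_tables": False}
)
print(code, body["status"])  # 503 not_ready
```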

### Database Health Response
```json
{
  "status": "healthy",
  "connectivity": true,
  "tables_exist": true,
  "tables_verified": ["model_training_logs", "trained_models"],
  "missing_tables": [],
  "errors": [],
  "connection_info": {
    "service_name": "training-service",
    "database_type": "postgresql",
    "pool_size": 20,
    "current_checked_out": 2
  },
  "response_time_ms": 15.23
}
```
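
A monitoring script could derive simple signals from this payload, such as connection pool utilization. The sketch below parses a trimmed copy of the example response; `pool_utilization` is a hypothetical helper, and the field names are taken from the example rather than from a published schema.

```python
import json

SAMPLE = """
{
  "status": "healthy",
  "connection_info": {"pool_size": 20, "current_checked_out": 2},
  "response_time_ms": 15.23
}
"""


def pool_utilization(payload: dict) -> float:
    """Fraction of pooled connections currently checked out."""
    info = payload["connection_info"]
    return info["current_checked_out"] / info["pool_size"]


payload = json.loads(SAMPLE)
utilization = pool_utilization(payload)
print(f"{utilization:.0%}")  # 10%
```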

## Testing

### Manual Testing
```bash
# Test all endpoints for a running service
curl http://localhost:8000/health
curl http://localhost:8000/health/ready
curl http://localhost:8000/health/live
curl http://localhost:8000/health/database
```
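
The same four requests can be scripted with the Python standard library when only the status codes matter. This is a minimal sketch, separate from the test script mentioned in this document; `probe_endpoints` is a hypothetical helper, and a value of `0` is used here to mean that the service could not be reached.

```python
import urllib.error
import urllib.request
from typing import Dict

ENDPOINTS = ("/health", "/health/ready", "/health/live", "/health/database")


def probe_endpoints(base_url: str, timeout: float = 3.0) -> Dict[str, int]:
    """Return the HTTP status code per health endpoint (0 if unreachable)."""
    statuses: Dict[str, int] = {}
    for path in ENDPOINTS:
        try:
            with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
                statuses[path] = resp.status
        except urllib.error.HTTPError as exc:
            # Non-2xx responses such as 503 are raised as HTTPError.
            statuses[path] = exc.code
        except (urllib.error.URLError, OSError):
            statuses[path] = 0
    return statuses
```

For a locally running service, `probe_endpoints("http://localhost:8000")` returns one status code per endpoint, which makes it easy to spot a 503 from `/health/ready` while `/health/live` still answers 200.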

### Automated Testing
Use the provided test script:
```bash
python test_unified_health_checks.py
```

## Migration Guide

### For Existing Services

1. **Add Health Check Import**:
   ```python
   from shared.monitoring.health_checks import setup_fastapi_health_checks
   ```

2. **Add Database Manager Import** (if using shared database):
   ```python
   from app.core.database import database_manager
   ```

3. **Setup Health Checks** (after app creation, before router inclusion):
   ```python
   health_manager = setup_fastapi_health_checks(
       app=app,
       service_name="your-service-name",
       version=settings.VERSION,
       database_manager=database_manager,
       expected_tables=["table1", "table2"]
   )
   ```

4. **Remove Old Health Endpoints**:
   Remove any existing `@app.get("/health")` endpoints

5. **Add Ready State Management**:
   ```python
   # In lifespan function after successful startup
   app.state.ready = True
   ```

6. **Update Kubernetes Configuration**:
   Update deployment YAML to use new probe endpoints

### For Services Using Legacy Database

If your service doesn't use the shared database manager:

```python
async def legacy_database_check():
    """Custom health check for legacy database"""
    return await your_db_health_check()

health_manager = setup_fastapi_health_checks(
    app=app,
    service_name="your-service",
    version=settings.VERSION,
    database_manager=None,
    expected_tables=None,
    custom_checks={"legacy_database": legacy_database_check}
)
```
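
The body of `your_db_health_check()` is left to each service. As one possible shape, assuming the legacy driver is blocking and that the custom-check contract accepts a boolean result, the probe can run a trivial query in a worker thread. `sqlite3` is only a stand-in for the real driver here, and `_ping` and `conn_factory` are hypothetical names.

```python
import asyncio
import sqlite3
from typing import Callable


def _ping(conn_factory: Callable[[], sqlite3.Connection]) -> bool:
    """Blocking probe: open a connection and run a trivial query."""
    try:
        conn = conn_factory()
        try:
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
        return True
    except Exception:
        return False


async def legacy_database_check(
    conn_factory: Callable[[], sqlite3.Connection] = lambda: sqlite3.connect(":memory:"),
) -> bool:
    # Run the blocking driver call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(_ping, conn_factory)


print(asyncio.run(legacy_database_check()))  # True
```

`asyncio.to_thread` requires Python 3.9 or later; on older versions `loop.run_in_executor` serves the same purpose.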

## Benefits

1. **Consistency**: All services now provide the same health check interface
2. **Better Kubernetes Integration**: Proper separation of liveness and readiness concerns
3. **Enhanced Debugging**: Detailed health information for troubleshooting
4. **Database Verification**: Comprehensive database health checks including table verification
5. **Monitoring Ready**: Rich health status information for monitoring systems
6. **Maintainability**: Centralized health check logic reduces code duplication

## Future Enhancements

1. **Metrics Integration**: Add Prometheus metrics for health check performance
2. **Circuit Breaker**: Implement circuit breaker pattern for external service checks
3. **Health Check Dependencies**: Add dependency health checks between services
4. **Performance Thresholds**: Add configurable performance thresholds for health checks
5. **Health Check Scheduling**: Add scheduled background health checks
@@ -1,27 +0,0 @@
#!/bin/bash

# Script to seed the orders database with test data
set -e

echo "🌱 Seeding Orders Database with Test Data"
echo "========================================="

# Change to the orders service directory
cd services/orders

# Make sure we're in a virtual environment or have the dependencies
echo "📦 Setting up environment..."

# Run the seeding script
echo "🚀 Running seeding script..."
python scripts/seed_test_data.py

echo "✅ Database seeding completed!"
echo ""
echo "🎯 Test data created:"
echo " - 6 customers (including VIP, wholesale, and inactive)"
echo " - 25 orders in various statuses"
echo " - Order items with different products"
echo " - Order status history"
echo ""
echo "📋 You can now test the frontend with real data!"