11 KiB
Enhanced Inter-Service Communication System
This directory contains the enhanced inter-service communication system that integrates with the new repository pattern architecture. The system provides circuit breakers, caching, monitoring, and event tracking for all service-to-service communications.
Architecture Overview
Base Components
- BaseServiceClient - Foundation class providing authentication, retries, and basic HTTP operations
- EnhancedServiceClient - Adds circuit breaker, caching, and monitoring capabilities
- ServiceRegistry - Central registry for managing all enhanced service clients
Enhanced Service Clients
Each service has a specialized enhanced client:
- EnhancedDataServiceClient - Sales data, weather, traffic, products with optimized caching
- EnhancedAuthServiceClient - Authentication, user management, permissions with security focus
- EnhancedTrainingServiceClient - ML training, model management, deployment with pipeline monitoring
- EnhancedForecastingServiceClient - Forecasting, predictions, scenarios with analytics
- EnhancedTenantServiceClient - Tenant management, memberships, organization features
- EnhancedNotificationServiceClient - Notifications, templates, delivery tracking
Key Features
Circuit Breaker Pattern
- States: Closed (normal), Open (failing), Half-Open (testing recovery)
- Configuration: Failure threshold, recovery timeout, success threshold
- Monitoring: State changes tracked and logged
Intelligent Caching
- TTL-based: Different cache durations for different data types
- Invalidation: Pattern-based cache invalidation on updates
- Statistics: Hit/miss ratios and performance metrics
- Manual Control: Clear specific cache patterns when needed
Event Integration
- Repository Events: Entity created/updated/deleted events
- Correlation IDs: Track operations across services
- Metadata: Rich event metadata for debugging and monitoring
Monitoring & Metrics
- Request Metrics: Success/failure rates, latencies
- Cache Metrics: Hit rates, entry counts
- Circuit Breaker Metrics: State changes, failure counts
- Health Checks: Per-service and aggregate health status
Usage Examples
Basic Usage with Service Registry
from shared.clients.enhanced_service_client import ServiceRegistry
from shared.config.base import BaseServiceSettings
# Initialize registry
config = BaseServiceSettings()
registry = ServiceRegistry(config, calling_service="forecasting")
# Get enhanced clients
data_client = registry.get_data_client()
auth_client = registry.get_auth_client()
training_client = registry.get_training_client()
# Use with full features
sales_data = await data_client.get_all_sales_data_with_monitoring(
tenant_id="tenant-123",
start_date="2024-01-01",
end_date="2024-12-31",
correlation_id="forecast-job-456"
)
Data Service Operations
# Get sales data with intelligent caching
sales_data = await data_client.get_sales_data_cached(
tenant_id="tenant-123",
start_date="2024-01-01",
end_date="2024-01-31",
aggregation="daily"
)
# Upload sales data with cache invalidation and events
result = await data_client.upload_sales_data_with_events(
tenant_id="tenant-123",
sales_data=sales_records,
correlation_id="data-import-789"
)
# Get weather data with caching (30 min TTL)
weather_data = await data_client.get_weather_historical_cached(
tenant_id="tenant-123",
start_date="2024-01-01",
end_date="2024-01-31"
)
Authentication & User Management
# Authenticate with security monitoring
auth_result = await auth_client.authenticate_user_cached(
email="user@example.com",
password="password"
)
# Check permissions with caching
has_access = await auth_client.check_user_permissions_cached(
user_id="user-123",
tenant_id="tenant-456",
resource="sales_data",
action="read"
)
# Create user with events
user = await auth_client.create_user_with_events(
user_data={
"email": "new@example.com",
"name": "New User",
"role": "analyst"
},
tenant_id="tenant-123",
correlation_id="user-creation-789"
)
Training & ML Operations
# Create training job with monitoring
job = await training_client.create_training_job_with_monitoring(
tenant_id="tenant-123",
include_weather=True,
include_traffic=False,
min_data_points=30,
correlation_id="training-pipeline-456"
)
# Get active model with caching
model = await training_client.get_active_model_for_product_cached(
tenant_id="tenant-123",
product_name="croissants"
)
# Deploy model with events
deployment = await training_client.deploy_model_with_events(
tenant_id="tenant-123",
model_id="model-789",
correlation_id="deployment-123"
)
# Get pipeline status
status = await training_client.get_training_pipeline_status("tenant-123")
Forecasting & Predictions
# Create forecast with monitoring
forecast = await forecasting_client.create_forecast_with_monitoring(
tenant_id="tenant-123",
model_id="model-456",
start_date="2024-02-01",
end_date="2024-02-29",
correlation_id="forecast-creation-789"
)
# Get predictions with caching
predictions = await forecasting_client.get_predictions_cached(
tenant_id="tenant-123",
forecast_id="forecast-456",
start_date="2024-02-01",
end_date="2024-02-07"
)
# Real-time prediction with caching
prediction = await forecasting_client.create_realtime_prediction_with_monitoring(
tenant_id="tenant-123",
model_id="model-456",
target_date="2024-02-01",
features={"temperature": 20, "day_of_week": 1},
correlation_id="realtime-pred-123"
)
# Get forecasting dashboard
dashboard = await forecasting_client.get_forecasting_dashboard("tenant-123")
Tenant Management
# Create tenant with monitoring
tenant = await tenant_client.create_tenant_with_monitoring(
name="New Bakery Chain",
owner_id="user-123",
description="Multi-location bakery chain",
correlation_id="tenant-creation-456"
)
# Add member with events
membership = await tenant_client.add_tenant_member_with_events(
tenant_id="tenant-123",
user_id="user-456",
role="manager",
correlation_id="member-add-789"
)
# Get tenant analytics
analytics = await tenant_client.get_tenant_analytics("tenant-123")
Notification Management
# Send notification with monitoring
notification = await notification_client.send_notification_with_monitoring(
recipient_id="user-123",
notification_type="forecast_ready",
title="Forecast Complete",
message="Your weekly forecast is ready for review",
tenant_id="tenant-456",
priority="high",
channels=["email", "in_app"],
correlation_id="forecast-notification-789"
)
# Send bulk notification
bulk_result = await notification_client.send_bulk_notification_with_monitoring(
recipients=["user-123", "user-456", "user-789"],
notification_type="system_update",
title="System Maintenance",
message="Scheduled maintenance tonight at 2 AM",
priority="normal",
correlation_id="maintenance-notification-123"
)
# Get delivery analytics
analytics = await notification_client.get_delivery_analytics(
tenant_id="tenant-123",
start_date="2024-01-01",
end_date="2024-01-31"
)
Health Monitoring
Individual Service Health
# Get specific service health
data_health = data_client.get_data_service_health()
auth_health = auth_client.get_auth_service_health()
training_health = training_client.get_training_service_health()
# Health includes:
# - Circuit breaker status
# - Cache statistics and configuration
# - Service-specific features
# - Supported endpoints
Registry-Level Health
# Get all service health status
all_health = registry.get_all_health_status()
# Get aggregate metrics
metrics = registry.get_aggregate_metrics()
# Returns:
# - Total cache hits/misses and hit rate
# - Circuit breaker states for all services
# - Count of healthy vs total services
Configuration
Cache TTL Configuration
Each enhanced client has optimized cache TTL values:
# Data Service
sales_cache_ttl = 600 # 10 minutes
weather_cache_ttl = 1800 # 30 minutes
traffic_cache_ttl = 3600 # 1 hour
product_cache_ttl = 300 # 5 minutes
# Auth Service
user_cache_ttl = 300 # 5 minutes
token_cache_ttl = 60 # 1 minute
permission_cache_ttl = 900 # 15 minutes
# Training Service
job_cache_ttl = 180 # 3 minutes
model_cache_ttl = 600 # 10 minutes
metrics_cache_ttl = 300 # 5 minutes
# And so on...
Circuit Breaker Configuration
CircuitBreakerConfig(
failure_threshold=5, # Failures before opening
recovery_timeout=60, # Seconds before testing recovery
success_threshold=2, # Successes needed to close
timeout=30 # Request timeout in seconds
)
Event System Integration
All enhanced clients integrate with the enhanced event system:
Event Types
- EntityCreatedEvent - When entities are created
- EntityUpdatedEvent - When entities are modified
- EntityDeletedEvent - When entities are removed
Event Metadata
- correlation_id - Track operations across services
- source_service - Service that generated the event
- destination_service - Target service
- tenant_id - Tenant context
- user_id - User context
- tags - Additional metadata
Usage in Enhanced Clients
Events are automatically published for:
- Data uploads and modifications
- User creation/updates/deletion
- Training job lifecycle
- Model deployments
- Forecast creation
- Tenant management operations
- Notification delivery
Error Handling & Resilience
Circuit Breaker Protection
- Automatically stops requests when services are failing
- Provides fallback to cached data when available
- Gradually tests service recovery
Retry Logic
- Exponential backoff for transient failures
- Configurable retry counts and delays
- Authentication token refresh on 401 errors
Cache Fallbacks
- Returns cached data when services are unavailable
- Graceful degradation with stale data warnings
- Manual cache invalidation for data consistency
Integration with Repository Pattern
The enhanced clients seamlessly integrate with the new repository pattern:
Service Layer Integration
class ForecastingService:
def __init__(self,
forecast_repository: ForecastRepository,
service_registry: ServiceRegistry):
self.forecast_repository = forecast_repository
self.data_client = service_registry.get_data_client()
self.training_client = service_registry.get_training_client()
async def create_forecast(self, tenant_id: str, model_id: str):
# Get data through enhanced client
sales_data = await self.data_client.get_all_sales_data_with_monitoring(
tenant_id=tenant_id,
correlation_id=f"forecast_data_{datetime.utcnow().isoformat()}"
)
# Use repository for database operations
forecast = await self.forecast_repository.create({
"tenant_id": tenant_id,
"model_id": model_id,
"status": "pending"
})
return forecast
This completes the comprehensive enhanced inter-service communication system that integrates seamlessly with the new repository pattern architecture, providing resilience, monitoring, and advanced features for all service interactions.