# Enhanced Inter-Service Communication System

This directory contains the enhanced inter-service communication system that integrates with the new repository pattern architecture. The system provides circuit breakers, caching, monitoring, and event tracking for all service-to-service communications.

## Architecture Overview

### Base Components

1. **BaseServiceClient** - Foundation class providing authentication, retries, and basic HTTP operations
2. **EnhancedServiceClient** - Adds circuit breaker, caching, and monitoring capabilities
3. **ServiceRegistry** - Central registry for managing all enhanced service clients

### Enhanced Service Clients

Each service has a specialized enhanced client:

- **EnhancedDataServiceClient** - Sales data, weather, traffic, products with optimized caching
- **EnhancedAuthServiceClient** - Authentication, user management, permissions with security focus
- **EnhancedTrainingServiceClient** - ML training, model management, deployment with pipeline monitoring
- **EnhancedForecastingServiceClient** - Forecasting, predictions, scenarios with analytics
- **EnhancedTenantServiceClient** - Tenant management, memberships, organization features
- **EnhancedNotificationServiceClient** - Notifications, templates, delivery tracking

## Key Features

### Circuit Breaker Pattern

- **States**: Closed (normal), Open (failing), Half-Open (testing recovery)
- **Configuration**: Failure threshold, recovery timeout, success threshold
- **Monitoring**: State changes tracked and logged

### Intelligent Caching

- **TTL-based**: Different cache durations for different data types
- **Invalidation**: Pattern-based cache invalidation on updates
- **Statistics**: Hit/miss ratios and performance metrics
- **Manual Control**: Clear specific cache patterns when needed

### Event Integration

- **Repository Events**: Entity created/updated/deleted events
- **Correlation IDs**: Track operations across services
- **Metadata**: Rich event metadata for debugging and monitoring

### Monitoring & Metrics

- **Request Metrics**: Success/failure rates, latencies
- **Cache Metrics**: Hit rates, entry counts
- **Circuit Breaker Metrics**: State changes, failure counts
- **Health Checks**: Per-service and aggregate health status

## Usage Examples

### Basic Usage with Service Registry

```python
from shared.clients.enhanced_service_client import ServiceRegistry
from shared.config.base import BaseServiceSettings

# Initialize registry
config = BaseServiceSettings()
registry = ServiceRegistry(config, calling_service="forecasting")

# Get enhanced clients
data_client = registry.get_data_client()
auth_client = registry.get_auth_client()
training_client = registry.get_training_client()

# Use with full features
sales_data = await data_client.get_all_sales_data_with_monitoring(
    tenant_id="tenant-123",
    start_date="2024-01-01",
    end_date="2024-12-31",
    correlation_id="forecast-job-456"
)
```

The remaining examples assume that the clients they use (`forecasting_client`, `tenant_client`, `notification_client`, and so on) have been obtained from the registry in the same way.

### Data Service Operations

```python
# Get sales data with intelligent caching
sales_data = await data_client.get_sales_data_cached(
    tenant_id="tenant-123",
    start_date="2024-01-01",
    end_date="2024-01-31",
    aggregation="daily"
)

# Upload sales data with cache invalidation and events
result = await data_client.upload_sales_data_with_events(
    tenant_id="tenant-123",
    sales_data=sales_records,
    correlation_id="data-import-789"
)

# Get weather data with caching (30 min TTL)
weather_data = await data_client.get_weather_historical_cached(
    tenant_id="tenant-123",
    start_date="2024-01-01",
    end_date="2024-01-31"
)
```

### Authentication & User Management

```python
# Authenticate with security monitoring
auth_result = await auth_client.authenticate_user_cached(
    email="user@example.com",
    password="password"
)

# Check permissions with caching
has_access = await auth_client.check_user_permissions_cached(
    user_id="user-123",
    tenant_id="tenant-456",
    resource="sales_data",
    action="read"
)

# Create user with events
user = await auth_client.create_user_with_events(
    user_data={
        "email": "new@example.com",
        "name": "New User",
        "role": "analyst"
    },
    tenant_id="tenant-123",
    correlation_id="user-creation-789"
)
```

### Training & ML Operations

```python
# Create training job with monitoring
job = await training_client.create_training_job_with_monitoring(
    tenant_id="tenant-123",
    include_weather=True,
    include_traffic=False,
    min_data_points=30,
    correlation_id="training-pipeline-456"
)

# Get active model with caching
model = await training_client.get_active_model_for_product_cached(
    tenant_id="tenant-123",
    product_name="croissants"
)

# Deploy model with events
deployment = await training_client.deploy_model_with_events(
    tenant_id="tenant-123",
    model_id="model-789",
    correlation_id="deployment-123"
)

# Get pipeline status
status = await training_client.get_training_pipeline_status("tenant-123")
```

### Forecasting & Predictions

```python
# Create forecast with monitoring
forecast = await forecasting_client.create_forecast_with_monitoring(
    tenant_id="tenant-123",
    model_id="model-456",
    start_date="2024-02-01",
    end_date="2024-02-29",
    correlation_id="forecast-creation-789"
)

# Get predictions with caching
predictions = await forecasting_client.get_predictions_cached(
    tenant_id="tenant-123",
    forecast_id="forecast-456",
    start_date="2024-02-01",
    end_date="2024-02-07"
)

# Real-time prediction with monitoring
prediction = await forecasting_client.create_realtime_prediction_with_monitoring(
    tenant_id="tenant-123",
    model_id="model-456",
    target_date="2024-02-01",
    features={"temperature": 20, "day_of_week": 1},
    correlation_id="realtime-pred-123"
)

# Get forecasting dashboard
dashboard = await forecasting_client.get_forecasting_dashboard("tenant-123")
```

### Tenant Management

```python
# Create tenant with monitoring
tenant = await tenant_client.create_tenant_with_monitoring(
    name="New Bakery Chain",
    owner_id="user-123",
    description="Multi-location bakery chain",
    correlation_id="tenant-creation-456"
)

# Add member with events
membership = await tenant_client.add_tenant_member_with_events(
    tenant_id="tenant-123",
    user_id="user-456",
    role="manager",
    correlation_id="member-add-789"
)

# Get tenant analytics
analytics = await tenant_client.get_tenant_analytics("tenant-123")
```

### Notification Management

```python
# Send notification with monitoring
notification = await notification_client.send_notification_with_monitoring(
    recipient_id="user-123",
    notification_type="forecast_ready",
    title="Forecast Complete",
    message="Your weekly forecast is ready for review",
    tenant_id="tenant-456",
    priority="high",
    channels=["email", "in_app"],
    correlation_id="forecast-notification-789"
)

# Send bulk notification
bulk_result = await notification_client.send_bulk_notification_with_monitoring(
    recipients=["user-123", "user-456", "user-789"],
    notification_type="system_update",
    title="System Maintenance",
    message="Scheduled maintenance tonight at 2 AM",
    priority="normal",
    correlation_id="maintenance-notification-123"
)

# Get delivery analytics
analytics = await notification_client.get_delivery_analytics(
    tenant_id="tenant-123",
    start_date="2024-01-01",
    end_date="2024-01-31"
)
```

## Health Monitoring

### Individual Service Health

```python
# Get specific service health
data_health = data_client.get_data_service_health()
auth_health = auth_client.get_auth_service_health()
training_health = training_client.get_training_service_health()

# Health includes:
# - Circuit breaker status
# - Cache statistics and configuration
# - Service-specific features
# - Supported endpoints
```

### Registry-Level Health

```python
# Get all service health status
all_health = registry.get_all_health_status()

# Get aggregate metrics
metrics = registry.get_aggregate_metrics()

# Returns:
# - Total cache hits/misses and hit rate
# - Circuit breaker states for all services
# - Count of healthy vs total services
```

## Configuration

### Cache TTL Configuration

Each enhanced client has cache TTL values tuned to how quickly its data changes:

```python
# Data Service
sales_cache_ttl = 600        # 10 minutes
weather_cache_ttl = 1800     # 30 minutes
traffic_cache_ttl = 3600     # 1 hour
product_cache_ttl = 300      # 5 minutes

# Auth Service
user_cache_ttl = 300         # 5 minutes
token_cache_ttl = 60         # 1 minute
permission_cache_ttl = 900   # 15 minutes

# Training Service
job_cache_ttl = 180          # 3 minutes
model_cache_ttl = 600        # 10 minutes
metrics_cache_ttl = 300      # 5 minutes

# And so on...
```

### Circuit Breaker Configuration

```python
CircuitBreakerConfig(
    failure_threshold=5,   # Failures before opening
    recovery_timeout=60,   # Seconds before testing recovery
    success_threshold=2,   # Successes needed to close
    timeout=30             # Request timeout in seconds
)
```

## Event System Integration

All enhanced clients integrate with the enhanced event system.

### Event Types

- **EntityCreatedEvent** - When entities are created
- **EntityUpdatedEvent** - When entities are modified
- **EntityDeletedEvent** - When entities are removed

### Event Metadata

- **correlation_id** - Track operations across services
- **source_service** - Service that generated the event
- **destination_service** - Target service
- **tenant_id** - Tenant context
- **user_id** - User context
- **tags** - Additional metadata

### Usage in Enhanced Clients

Events are automatically published for:

- Data uploads and modifications
- User creation/updates/deletion
- Training job lifecycle
- Model deployments
- Forecast creation
- Tenant management operations
- Notification delivery

## Error Handling & Resilience

### Circuit Breaker Protection

- Automatically stops requests when services are failing
- Provides fallback to cached data when available
- Gradually tests service recovery

### Retry Logic

- Exponential backoff for transient failures
- Configurable retry counts and delays
- Authentication token refresh on 401 errors

### Cache Fallbacks

- Returns cached data when services are unavailable
- Graceful degradation with stale data warnings
- Manual cache invalidation for data consistency

## Integration with Repository Pattern

The enhanced clients integrate with the new repository pattern: a service uses its repository for database operations and the enhanced clients for calls to other services.

### Service Layer Integration

```python
from datetime import datetime

from shared.clients.enhanced_service_client import ServiceRegistry

# ForecastRepository is the service's own repository class and must be
# imported from wherever it is defined in the service.


class ForecastingService:
    def __init__(self,
                 forecast_repository: ForecastRepository,
                 service_registry: ServiceRegistry):
        self.forecast_repository = forecast_repository
        self.data_client = service_registry.get_data_client()
        self.training_client = service_registry.get_training_client()

    async def create_forecast(self, tenant_id: str, model_id: str):
        # Get data through enhanced client
        sales_data = await self.data_client.get_all_sales_data_with_monitoring(
            tenant_id=tenant_id,
            correlation_id=f"forecast_data_{datetime.utcnow().isoformat()}"
        )

        # Use repository for database operations
        forecast = await self.forecast_repository.create({
            "tenant_id": tenant_id,
            "model_id": model_id,
            "status": "pending"
        })

        return forecast
```

Together, these components integrate the enhanced inter-service communication system with the repository pattern architecture and provide resilience, monitoring, and event tracking for all service interactions.
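
To make the Closed / Open / Half-Open behavior described under Circuit Breaker Pattern more concrete, here is a minimal, self-contained sketch of such a state machine. It is not the implementation used by `EnhancedServiceClient`. The names `BreakerSettings` and `SimpleCircuitBreaker` are hypothetical and exist only for illustration, and only the three thresholds from the configuration section are modeled (the request `timeout` is left out). The injectable `clock` is also an assumption, added so the timing can be exercised without sleeping.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class CircuitState(Enum):
    CLOSED = "closed"        # normal operation
    OPEN = "open"            # failing, requests are rejected
    HALF_OPEN = "half_open"  # testing recovery


@dataclass
class BreakerSettings:
    """Hypothetical settings object mirroring the documented thresholds."""
    failure_threshold: int = 5      # failures before opening
    recovery_timeout: float = 60.0  # seconds before testing recovery
    success_threshold: int = 2      # successes needed to close


class SimpleCircuitBreaker:
    """Illustrative state machine only; not the project's implementation."""

    def __init__(self,
                 settings: Optional[BreakerSettings] = None,
                 clock: Callable[[], float] = time.monotonic):
        self.settings = settings or BreakerSettings()
        self._clock = clock
        self.state = CircuitState.CLOSED
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def allow_request(self) -> bool:
        """Return True if a request may be attempted right now."""
        if self.state is CircuitState.OPEN:
            elapsed = self._clock() - self._opened_at
            if elapsed >= self.settings.recovery_timeout:
                # Recovery window elapsed: let trial requests through.
                self.state = CircuitState.HALF_OPEN
                self._successes = 0
                return True
            return False
        return True

    def record_success(self) -> None:
        if self.state is CircuitState.HALF_OPEN:
            self._successes += 1
            if self._successes >= self.settings.success_threshold:
                self.state = CircuitState.CLOSED
                self._failures = 0
        else:
            # A success while closed resets the consecutive-failure count.
            self._failures = 0

    def record_failure(self) -> None:
        if self.state is CircuitState.HALF_OPEN:
            # Any failure during a recovery test reopens the circuit.
            self._trip()
            return
        self._failures += 1
        if self._failures >= self.settings.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = CircuitState.OPEN
        self._opened_at = self._clock()
        self._failures = 0
```

A caller would check `allow_request()` before each outbound call, then report the outcome with `record_success()` or `record_failure()`. When `allow_request()` returns `False`, the caller can fall back to cached data, which matches the degradation behavior described under Cache Fallbacks.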