Initial commit - production deployment

2026-01-21 17:17:16 +01:00
commit c23d00dd92
2289 changed files with 638440 additions and 0 deletions

View File

@@ -0,0 +1,54 @@
# =============================================================================
# Orchestrator Service Dockerfile - Environment-Configurable Base Images
# =============================================================================
# Build arguments for registry configuration:
# - BASE_REGISTRY: Registry URL (default: docker.io for Docker Hub)
# - PYTHON_IMAGE: Python image name and tag (default: python:3.11-slim)
# =============================================================================
ARG BASE_REGISTRY=docker.io
ARG PYTHON_IMAGE=python:3.11-slim
FROM ${BASE_REGISTRY}/${PYTHON_IMAGE} AS shared
WORKDIR /shared
COPY shared/ /shared/
ARG BASE_REGISTRY=docker.io
ARG PYTHON_IMAGE=python:3.11-slim
FROM ${BASE_REGISTRY}/${PYTHON_IMAGE}
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
gcc \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements
COPY shared/requirements-tracing.txt /tmp/
COPY services/orchestrator/requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r /tmp/requirements-tracing.txt
RUN pip install --no-cache-dir -r requirements.txt
# Copy shared libraries from the shared stage
COPY --from=shared /shared /app/shared
# Copy application code
COPY services/orchestrator/ .
# Add shared libraries to Python path
ENV PYTHONPATH="/app:/app/shared:${PYTHONPATH:-}"
ENV PYTHONUNBUFFERED=1
# Expose port
EXPOSE 8000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
# Run application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

View File

@@ -0,0 +1,928 @@
# Orchestrator Service
## Overview
The **Orchestrator Service** automates daily operational workflows by coordinating tasks across multiple microservices. It schedules and executes recurring jobs like daily forecasting, production planning, procurement needs calculation, and report generation. Operating on a configurable schedule (default: daily at 8:00 AM Madrid time), it ensures that bakery owners start each day with fresh forecasts, optimized production plans, and actionable insights - all without manual intervention.
## Key Features
### Workflow Automation
- **Daily Forecasting** - Generate 7-day demand forecasts every morning
- **Production Planning** - Calculate production schedules from forecasts
- **Procurement Planning** - Identify purchasing needs automatically
- **Inventory Projections** - Project stock levels for next 14 days
- **Report Generation** - Daily summaries, weekly digests
- **Model Retraining** - Weekly ML model updates
- **Alert Cleanup** - Archive resolved alerts
### Scheduling System
- **Cron-Based Scheduling** - Flexible schedule configuration
- **Timezone-Aware** - Respects tenant timezone (Madrid default)
- **Configurable Frequency** - Daily, weekly, monthly workflows
- **Time-Based Execution** - Run at optimal times (early morning)
- **Holiday Awareness** - Skip or adjust on public holidays
- **Weekend Handling** - Different schedules for weekends
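As an illustration of the scheduling options above, separate cron triggers with an explicit timezone could express the daily, weekday/weekend, and weekly cases; the expressions below are examples only (the daily and weekly ones mirror the defaults listed under Configuration), not the service's shipped setup.
```python
# Illustrative cron triggers only - the exact expressions are assumptions, not shipped config.
from apscheduler.triggers.cron import CronTrigger

MADRID = "Europe/Madrid"

daily = CronTrigger.from_crontab("0 8 * * *", timezone=MADRID)            # every day at 08:00
weekdays = CronTrigger.from_crontab("0 8 * * mon-fri", timezone=MADRID)   # weekday schedule
weekends = CronTrigger.from_crontab("0 9 * * sat,sun", timezone=MADRID)   # later start on weekends
weekly_retrain = CronTrigger.from_crontab("0 9 * * mon", timezone=MADRID) # weekly job (Mondays)
```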
### Workflow Execution
- **Sequential Workflows** - Execute steps in correct order
- **Parallel Execution** - Run independent tasks concurrently
- **Error Handling** - Retry failed tasks with exponential backoff
- **Timeout Management** - Cancel long-running tasks
- **Progress Tracking** - Monitor workflow execution status
- **Result Caching** - Cache workflow results in Redis
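The error-handling and timeout behaviour listed above might reduce to a helper like the sketch below; `call_step` is a placeholder, and the limits mirror the `max_retries`, `retry_delay_seconds`, and timeout defaults that appear elsewhere in this document.
```python
# Sketch only: retry a workflow step with exponential backoff and a hard timeout.
# call_step() is a placeholder for the real per-step coroutine.
import asyncio

MAX_RETRIES = 3          # matches the max_retries default in the schema below
RETRY_DELAY = 300        # seconds; matches retry_delay_seconds, doubled per attempt
STEP_TIMEOUT = 600       # cancel steps that run longer than this

async def run_step_with_retries(call_step, *args):
    for attempt in range(MAX_RETRIES + 1):
        try:
            return await asyncio.wait_for(call_step(*args), timeout=STEP_TIMEOUT)
        except Exception:
            if attempt == MAX_RETRIES:
                raise
            await asyncio.sleep(RETRY_DELAY * (2 ** attempt))  # exponential backoff
```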
### Multi-Tenant Management
- **Per-Tenant Workflows** - Execute for all active tenants
- **Tenant Priority** - Prioritize by subscription tier
- **Tenant Filtering** - Skip suspended or cancelled tenants
- **Load Balancing** - Distribute tenant workflows evenly
- **Resource Limits** - Prevent resource exhaustion
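One plausible implementation of the load-balancing and resource-limit points above is a semaphore that caps how many tenant workflows run at once (the service exposes a `MAX_CONCURRENT_TENANTS` setting); this is a sketch, not the actual scheduler code.
```python
# Sketch only: bound concurrent per-tenant workflows with a semaphore.
# process_tenant() is a placeholder for the real per-tenant workflow coroutine.
import asyncio

MAX_CONCURRENT_TENANTS = 5  # mirrors the MAX_CONCURRENT_TENANTS setting

async def run_for_all_tenants(tenant_ids, process_tenant):
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_TENANTS)

    async def bounded(tenant_id):
        async with semaphore:
            return await process_tenant(tenant_id)

    # Gather results; exceptions are returned instead of cancelling the whole batch.
    return await asyncio.gather(*(bounded(t) for t in tenant_ids), return_exceptions=True)
```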
### Monitoring & Observability
- **Workflow Metrics** - Execution time, success rate
- **Health Checks** - Service and job health monitoring
- **Failure Alerts** - Notify on workflow failures
- **Audit Logging** - Complete execution history
- **Performance Tracking** - Identify slow workflows
- **Cost Tracking** - Monitor computational costs
### Leader Election
- **Distributed Coordination** - Redis-based leader election
- **High Availability** - Multiple orchestrator instances
- **Automatic Failover** - New leader elected on failure
- **Split-Brain Prevention** - Ensure only one leader
- **Leader Health** - Continuous health monitoring
### 🆕 Enterprise Tier: Network Dashboard & Orchestration (NEW)
- **Aggregated Network Metrics** - Single dashboard view consolidating all child outlet data
- **Production Coordination** - Central production facility gets visibility into network-wide demand
- **Distribution Integration** - Dashboard displays active delivery routes and shipment status
- **Network Demand Forecasting** - Aggregated demand forecasts across all retail outlets
- **Multi-Location Performance** - Compare performance metrics across all locations
- **Child Outlet Visibility** - Drill down into individual outlet performance
- **Enterprise KPIs** - Network-level metrics: total production, total sales, network-wide waste reduction
- **Subscription Gating** - Enterprise dashboard requires Enterprise tier subscription
## Business Value
### For Bakery Owners
- **Zero Manual Work** - Forecasts and plans generated automatically
- **Consistent Execution** - Never forget to plan production
- **Early Morning Ready** - Start day with fresh data (8:00 AM)
- **Weekend Coverage** - Works 7 days/week, 365 days/year
- **Reliable** - Automatic retries on failures
- **Transparent** - Clear audit trail of all automation
### Quantifiable Impact
- **Time Savings**: 15-20 hours/week on manual planning (€900-1,200/month)
- **Consistency**: 100% vs. 70-80% manual execution rate
- **Early Detection**: Issues identified before business hours
- **Error Reduction**: 95%+ accuracy vs. 80-90% manual
- **Staff Freedom**: Staff focus on operations, not planning
- **Scalability**: Handles 10,000+ tenants automatically
### For Platform Operations
- **Automation**: 95%+ of platform operations automated
- **Scalability**: Linear cost scaling with tenants
- **Reliability**: 99.9%+ workflow success rate
- **Predictability**: Consistent execution times
- **Resource Efficiency**: Optimal resource utilization
- **Cost Control**: Prevent runaway computational costs
## Technology Stack
- **Framework**: FastAPI (Python 3.11+) - Async web framework
- **Scheduler**: APScheduler - Job scheduling
- **Database**: PostgreSQL 17 - Workflow history
- **Caching**: Redis 7.4 - Leader election, results cache
- **Messaging**: RabbitMQ 4.1 - Event publishing
- **HTTP Client**: HTTPx - Async service calls
- **ORM**: SQLAlchemy 2.0 (async) - Database abstraction
- **Logging**: Structlog - Structured JSON logging
- **Metrics**: Prometheus Client - Workflow metrics
## API Endpoints (Key Routes)
### Workflow Management
- `GET /api/v1/orchestrator/workflows` - List workflows
- `GET /api/v1/orchestrator/workflows/{workflow_id}` - Get workflow details
- `POST /api/v1/orchestrator/workflows/{workflow_id}/execute` - Manually trigger workflow
- `PUT /api/v1/orchestrator/workflows/{workflow_id}` - Update workflow configuration
- `POST /api/v1/orchestrator/workflows/{workflow_id}/enable` - Enable workflow
- `POST /api/v1/orchestrator/workflows/{workflow_id}/disable` - Disable workflow
### Execution History
- `GET /api/v1/orchestrator/executions` - List workflow executions
- `GET /api/v1/orchestrator/executions/{execution_id}` - Get execution details
- `GET /api/v1/orchestrator/executions/{execution_id}/logs` - Get execution logs
- `GET /api/v1/orchestrator/executions/failed` - List failed executions
- `POST /api/v1/orchestrator/executions/{execution_id}/retry` - Retry failed execution
### Scheduling
- `GET /api/v1/orchestrator/schedule` - Get current schedule
- `PUT /api/v1/orchestrator/schedule` - Update schedule
- `GET /api/v1/orchestrator/schedule/next-run` - Get next execution time
### Health & Monitoring
- `GET /api/v1/orchestrator/health` - Service health
- `GET /api/v1/orchestrator/leader` - Current leader instance
- `GET /api/v1/orchestrator/metrics` - Workflow metrics
- `GET /api/v1/orchestrator/statistics` - Execution statistics
### 🆕 Enterprise Network Dashboard (NEW)
- `GET /api/v1/{parent_tenant}/orchestrator/enterprise/dashboard` - Get aggregated enterprise network dashboard
- `GET /api/v1/{parent_tenant}/orchestrator/enterprise/network-summary` - Get network-wide summary metrics
- `GET /api/v1/{parent_tenant}/orchestrator/enterprise/production-overview` - Get production coordination overview
- `GET /api/v1/{parent_tenant}/orchestrator/enterprise/distribution-status` - Get current distribution/delivery status
- `GET /api/v1/{parent_tenant}/orchestrator/enterprise/child-performance` - Compare performance across child outlets
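As a quick illustration, a parent-tenant client could call the dashboard route with `httpx` as sketched below; the local port matches the service default (8018), while the bearer-token auth shown is an assumption.
```python
# Illustrative call to the enterprise dashboard route; base URL and auth header are assumed.
import asyncio
import httpx

async def fetch_enterprise_dashboard(parent_tenant: str, token: str) -> dict:
    url = f"http://localhost:8018/api/v1/{parent_tenant}/orchestrator/enterprise/dashboard"
    async with httpx.AsyncClient(timeout=30.0) as client:
        response = await client.get(url, headers={"Authorization": f"Bearer {token}"})
        response.raise_for_status()
        return response.json()

# Example usage: asyncio.run(fetch_enterprise_dashboard("acme-group", "<jwt>"))
```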
## Database Schema
### Main Tables
**orchestrator_workflows**
```sql
CREATE TABLE orchestrator_workflows (
id UUID PRIMARY KEY,
workflow_name VARCHAR(255) NOT NULL UNIQUE,
workflow_type VARCHAR(100) NOT NULL, -- daily, weekly, monthly, on_demand
description TEXT,
-- Schedule
cron_expression VARCHAR(100), -- e.g., "0 8 * * *" for 8 AM daily
timezone VARCHAR(50) DEFAULT 'Europe/Madrid',
is_enabled BOOLEAN DEFAULT TRUE,
-- Execution
max_execution_time_seconds INTEGER DEFAULT 3600,
max_retries INTEGER DEFAULT 3,
retry_delay_seconds INTEGER DEFAULT 300,
-- Workflow steps
steps JSONB NOT NULL, -- Array of workflow steps
-- Status
last_execution_at TIMESTAMP,
last_success_at TIMESTAMP,
last_failure_at TIMESTAMP,
next_execution_at TIMESTAMP,
consecutive_failures INTEGER DEFAULT 0,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
```
**orchestrator_executions**
```sql
CREATE TABLE orchestrator_executions (
id UUID PRIMARY KEY,
workflow_id UUID REFERENCES orchestrator_workflows(id),
workflow_name VARCHAR(255) NOT NULL,
execution_type VARCHAR(50) NOT NULL, -- scheduled, manual
triggered_by UUID, -- User ID if manual
-- Tenant
tenant_id UUID, -- NULL for global workflows
-- Status
status VARCHAR(50) DEFAULT 'pending', -- pending, running, completed, failed, cancelled
started_at TIMESTAMP,
completed_at TIMESTAMP,
duration_seconds INTEGER,
-- Results
steps_completed INTEGER DEFAULT 0,
steps_total INTEGER DEFAULT 0,
steps_failed INTEGER DEFAULT 0,
error_message TEXT,
result_summary JSONB,
-- Leader info
executed_by_instance VARCHAR(255), -- Instance ID that ran this
    created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_executions_workflow_date ON orchestrator_executions (workflow_id, created_at DESC);
CREATE INDEX idx_executions_tenant_date ON orchestrator_executions (tenant_id, created_at DESC);
```
**orchestrator_execution_logs**
```sql
CREATE TABLE orchestrator_execution_logs (
id UUID PRIMARY KEY,
execution_id UUID REFERENCES orchestrator_executions(id) ON DELETE CASCADE,
step_name VARCHAR(255) NOT NULL,
step_index INTEGER NOT NULL,
log_level VARCHAR(50) NOT NULL, -- info, warning, error
log_message TEXT NOT NULL,
log_data JSONB,
    logged_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_execution_logs_execution ON orchestrator_execution_logs (execution_id, step_index);
```
**orchestrator_leader**
```sql
CREATE TABLE orchestrator_leader (
id INTEGER PRIMARY KEY DEFAULT 1, -- Always 1 (singleton)
instance_id VARCHAR(255) NOT NULL,
instance_hostname VARCHAR(255),
became_leader_at TIMESTAMP NOT NULL,
last_heartbeat_at TIMESTAMP NOT NULL,
heartbeat_interval_seconds INTEGER DEFAULT 30,
CONSTRAINT single_leader CHECK (id = 1)
);
```
**orchestrator_metrics**
```sql
CREATE TABLE orchestrator_metrics (
id UUID PRIMARY KEY,
metric_date DATE NOT NULL,
workflow_name VARCHAR(255),
-- Volume
total_executions INTEGER DEFAULT 0,
successful_executions INTEGER DEFAULT 0,
failed_executions INTEGER DEFAULT 0,
-- Performance
avg_duration_seconds INTEGER,
min_duration_seconds INTEGER,
max_duration_seconds INTEGER,
-- Reliability
success_rate_percentage DECIMAL(5, 2),
avg_retry_count DECIMAL(5, 2),
calculated_at TIMESTAMP DEFAULT NOW(),
UNIQUE(metric_date, workflow_name)
);
```
### Indexes for Performance
```sql
CREATE INDEX idx_workflows_enabled ON orchestrator_workflows(is_enabled, next_execution_at);
CREATE INDEX idx_executions_status ON orchestrator_executions(status, started_at);
CREATE INDEX idx_executions_workflow_status ON orchestrator_executions(workflow_id, status);
CREATE INDEX idx_metrics_date ON orchestrator_metrics(metric_date DESC);
```
## Business Logic Examples
### Daily Workflow Orchestration
```python
async def execute_daily_workflow():
"""
Main daily workflow executed at 8:00 AM Madrid time.
Coordinates forecasting, production, and procurement.
"""
workflow_name = "daily_operations"
execution_id = uuid.uuid4()
logger.info("Starting daily workflow", execution_id=str(execution_id))
# Create execution record
execution = OrchestratorExecution(
id=execution_id,
workflow_name=workflow_name,
execution_type='scheduled',
status='running',
started_at=datetime.utcnow()
)
db.add(execution)
await db.flush()
try:
# Get all active tenants
        result = await db.execute(select(Tenant).where(Tenant.status == 'active'))
        tenants = result.scalars().all()
execution.steps_total = len(tenants) * 5 # 5 steps per tenant
for tenant in tenants:
try:
# Step 1: Generate forecasts
await log_step(execution_id, "generate_forecasts", tenant.id, "Starting forecast generation")
forecast_result = await trigger_forecasting(tenant.id)
await log_step(execution_id, "generate_forecasts", tenant.id, f"Generated {forecast_result['count']} forecasts")
execution.steps_completed += 1
# Step 2: Calculate production needs
await log_step(execution_id, "calculate_production", tenant.id, "Calculating production needs")
production_result = await trigger_production_planning(tenant.id)
await log_step(execution_id, "calculate_production", tenant.id, f"Planned {production_result['batches']} batches")
execution.steps_completed += 1
# Step 3: Calculate procurement needs
await log_step(execution_id, "calculate_procurement", tenant.id, "Calculating procurement needs")
procurement_result = await trigger_procurement_planning(tenant.id)
await log_step(execution_id, "calculate_procurement", tenant.id, f"Identified {procurement_result['needs_count']} procurement needs")
execution.steps_completed += 1
# Step 4: Generate inventory projections
await log_step(execution_id, "project_inventory", tenant.id, "Projecting inventory")
inventory_result = await trigger_inventory_projection(tenant.id)
await log_step(execution_id, "project_inventory", tenant.id, "Inventory projections completed")
execution.steps_completed += 1
# Step 5: Send daily summary
await log_step(execution_id, "send_summary", tenant.id, "Sending daily summary")
await send_daily_summary(tenant.id, {
'forecasts': forecast_result,
'production': production_result,
'procurement': procurement_result
})
await log_step(execution_id, "send_summary", tenant.id, "Daily summary sent")
execution.steps_completed += 1
except Exception as e:
execution.steps_failed += 1
await log_step(execution_id, "tenant_workflow", tenant.id, f"Failed: {str(e)}", level='error')
logger.error("Tenant workflow failed",
tenant_id=str(tenant.id),
error=str(e))
continue
# Mark execution complete
execution.status = 'completed'
execution.completed_at = datetime.utcnow()
execution.duration_seconds = int((execution.completed_at - execution.started_at).total_seconds())
await db.commit()
logger.info("Daily workflow completed",
execution_id=str(execution_id),
tenants_processed=len(tenants),
duration_seconds=execution.duration_seconds)
# Publish event
await publish_event('orchestrator', 'orchestrator.workflow_completed', {
'workflow_name': workflow_name,
'execution_id': str(execution_id),
'tenants_processed': len(tenants),
'steps_completed': execution.steps_completed,
'steps_failed': execution.steps_failed
})
except Exception as e:
execution.status = 'failed'
execution.error_message = str(e)
execution.completed_at = datetime.utcnow()
execution.duration_seconds = int((execution.completed_at - execution.started_at).total_seconds())
await db.commit()
logger.error("Daily workflow failed",
execution_id=str(execution_id),
error=str(e))
# Send alert
await send_workflow_failure_alert(workflow_name, str(e))
raise
async def trigger_forecasting(tenant_id: UUID) -> dict:
"""
Call forecasting service to generate forecasts.
"""
async with httpx.AsyncClient() as client:
response = await client.post(
f"{FORECASTING_SERVICE_URL}/api/v1/forecasting/generate",
json={'tenant_id': str(tenant_id), 'days_ahead': 7},
timeout=300.0
)
if response.status_code != 200:
raise Exception(f"Forecasting failed: {response.text}")
return response.json()
async def trigger_production_planning(tenant_id: UUID) -> dict:
"""
Call production service to generate production schedules.
"""
async with httpx.AsyncClient() as client:
response = await client.post(
f"{PRODUCTION_SERVICE_URL}/api/v1/production/schedules/generate",
json={'tenant_id': str(tenant_id)},
timeout=180.0
)
if response.status_code != 200:
raise Exception(f"Production planning failed: {response.text}")
return response.json()
async def trigger_procurement_planning(tenant_id: UUID) -> dict:
"""
Call procurement service to calculate needs.
"""
async with httpx.AsyncClient() as client:
response = await client.post(
f"{PROCUREMENT_SERVICE_URL}/api/v1/procurement/needs/calculate",
json={'tenant_id': str(tenant_id), 'days_ahead': 14},
timeout=180.0
)
if response.status_code != 200:
raise Exception(f"Procurement planning failed: {response.text}")
return response.json()
```
### Leader Election
```python
async def start_leader_election():
"""
Participate in leader election using Redis.
Only the leader executes workflows.
"""
instance_id = f"{socket.gethostname()}_{uuid.uuid4().hex[:8]}"
while True:
try:
# Try to become leader
is_leader = await try_become_leader(instance_id)
if is_leader:
logger.info("This instance is the leader", instance_id=instance_id)
# Start workflow scheduler
await start_workflow_scheduler()
# Maintain leadership with heartbeats
while True:
await asyncio.sleep(30) # Heartbeat every 30 seconds
if not await maintain_leadership(instance_id):
logger.warning("Lost leadership", instance_id=instance_id)
break
else:
# Not leader, check again in 60 seconds
logger.info("This instance is a follower", instance_id=instance_id)
await asyncio.sleep(60)
except Exception as e:
logger.error("Leader election error",
instance_id=instance_id,
error=str(e))
await asyncio.sleep(60)
async def try_become_leader(instance_id: str) -> bool:
"""
Try to acquire leadership using Redis lock.
"""
# Try to set leader lock in Redis
lock_key = "orchestrator:leader:lock"
lock_acquired = await redis.set(
lock_key,
instance_id,
ex=90, # Expire in 90 seconds
nx=True # Only set if not exists
)
if lock_acquired:
# Record in database
        result = await db.execute(select(OrchestratorLeader).where(OrchestratorLeader.id == 1))
        leader = result.scalar_one_or_none()
if not leader:
leader = OrchestratorLeader(
id=1,
instance_id=instance_id,
instance_hostname=socket.gethostname(),
became_leader_at=datetime.utcnow(),
last_heartbeat_at=datetime.utcnow()
)
db.add(leader)
else:
leader.instance_id = instance_id
leader.instance_hostname = socket.gethostname()
leader.became_leader_at = datetime.utcnow()
leader.last_heartbeat_at = datetime.utcnow()
await db.commit()
return True
return False
async def maintain_leadership(instance_id: str) -> bool:
"""
Maintain leadership by refreshing Redis lock.
"""
lock_key = "orchestrator:leader:lock"
# Check if we still hold the lock
current_leader = await redis.get(lock_key)
if current_leader != instance_id:
return False
# Refresh lock
await redis.expire(lock_key, 90)
# Update heartbeat
    result = await db.execute(select(OrchestratorLeader).where(OrchestratorLeader.id == 1))
    leader = result.scalar_one_or_none()
if leader and leader.instance_id == instance_id:
leader.last_heartbeat_at = datetime.utcnow()
await db.commit()
return True
return False
```
### Workflow Scheduler
```python
async def start_workflow_scheduler():
"""
Start APScheduler to execute workflows on schedule.
"""
from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.cron import CronTrigger
scheduler = AsyncIOScheduler(timezone='Europe/Madrid')
# Get workflow configurations
    result = await db.execute(select(OrchestratorWorkflow).where(OrchestratorWorkflow.is_enabled == True))
    workflows = result.scalars().all()
for workflow in workflows:
# Parse cron expression
trigger = CronTrigger.from_crontab(workflow.cron_expression, timezone=workflow.timezone)
# Add job to scheduler
scheduler.add_job(
execute_workflow,
trigger=trigger,
args=[workflow.id],
id=str(workflow.id),
name=workflow.workflow_name,
max_instances=1, # Prevent concurrent executions
replace_existing=True
)
logger.info("Scheduled workflow",
workflow_name=workflow.workflow_name,
cron=workflow.cron_expression)
# Start scheduler
scheduler.start()
logger.info("Workflow scheduler started")
# Keep scheduler running
while True:
await asyncio.sleep(3600) # Check every hour
```
## Events & Messaging
### Published Events (RabbitMQ)
**Exchange**: `orchestrator`
**Routing Keys**: `orchestrator.workflow_completed`, `orchestrator.workflow_failed`
**Workflow Completed Event**
```json
{
"event_type": "orchestrator_workflow_completed",
"workflow_name": "daily_operations",
"execution_id": "uuid",
"tenants_processed": 125,
"steps_completed": 625,
"steps_failed": 3,
"duration_seconds": 1820,
"timestamp": "2025-11-06T08:30:20Z"
}
```
**Workflow Failed Event**
```json
{
"event_type": "orchestrator_workflow_failed",
"workflow_name": "daily_operations",
"execution_id": "uuid",
"error_message": "Database connection timeout",
"tenants_affected": 45,
"timestamp": "2025-11-06T08:15:30Z"
}
```
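The workflow code above calls a `publish_event(exchange, routing_key, payload)` helper that lives in shared code and is not shown in this README; a minimal aio-pika sketch consistent with the exchange and routing keys listed in this section might look like the following (connection handling simplified).
```python
# Hedged sketch of publishing a workflow event to the "orchestrator" topic exchange.
# The real publish_event helper is in shared code; this is only an approximation.
import json
import aio_pika

async def publish_event(exchange_name: str, routing_key: str, payload: dict) -> None:
    connection = await aio_pika.connect_robust("amqp://guest:guest@localhost:5672/")
    async with connection:
        channel = await connection.channel()
        exchange = await channel.declare_exchange(exchange_name, aio_pika.ExchangeType.TOPIC, durable=True)
        message = aio_pika.Message(
            body=json.dumps(payload).encode(),
            content_type="application/json",
            delivery_mode=aio_pika.DeliveryMode.PERSISTENT,
        )
        await exchange.publish(message, routing_key=routing_key)
```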
### Consumed Events
None - Orchestrator initiates workflows but doesn't consume events
## Custom Metrics (Prometheus)
```python
# Workflow metrics
workflow_executions_total = Counter(
'orchestrator_workflow_executions_total',
'Total workflow executions',
['workflow_name', 'status']
)
workflow_duration_seconds = Histogram(
'orchestrator_workflow_duration_seconds',
'Workflow execution duration',
['workflow_name'],
buckets=[60, 300, 600, 1200, 1800, 3600]
)
workflow_success_rate = Gauge(
'orchestrator_workflow_success_rate_percentage',
'Workflow success rate',
['workflow_name']
)
tenants_processed_total = Counter(
'orchestrator_tenants_processed_total',
'Total tenants processed',
['workflow_name', 'status']
)
leader_instance = Gauge(
'orchestrator_leader_instance',
'Current leader instance (1=leader, 0=follower)',
['instance_id']
)
```
## Configuration
### Environment Variables
**Service Configuration:**
- `PORT` - Service port (default: 8018)
- `DATABASE_URL` - PostgreSQL connection string
- `REDIS_URL` - Redis connection string
- `RABBITMQ_URL` - RabbitMQ connection string
**Workflow Configuration:**
- `DAILY_WORKFLOW_CRON` - Daily workflow schedule (default: "0 8 * * *")
- `WEEKLY_WORKFLOW_CRON` - Weekly workflow schedule (default: "0 9 * * 1")
- `DEFAULT_TIMEZONE` - Default timezone (default: "Europe/Madrid")
- `MAX_WORKFLOW_DURATION_SECONDS` - Max execution time (default: 3600)
**Leader Election:**
- `ENABLE_LEADER_ELECTION` - Enable HA mode (default: true)
- `LEADER_HEARTBEAT_SECONDS` - Heartbeat interval (default: 30)
- `LEADER_LOCK_TTL_SECONDS` - Lock expiration (default: 90)
**Service URLs:**
- `FORECASTING_SERVICE_URL` - Forecasting service URL
- `PRODUCTION_SERVICE_URL` - Production service URL
- `PROCUREMENT_SERVICE_URL` - Procurement service URL
- `INVENTORY_SERVICE_URL` - Inventory service URL
## Development Setup
### Prerequisites
- Python 3.11+
- PostgreSQL 17
- Redis 7.4
- RabbitMQ 4.1
### Local Development
```bash
cd services/orchestrator
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
export DATABASE_URL=postgresql://user:pass@localhost:5432/orchestrator
export REDIS_URL=redis://localhost:6379/0
export RABBITMQ_URL=amqp://guest:guest@localhost:5672/
export FORECASTING_SERVICE_URL=http://localhost:8003
export PRODUCTION_SERVICE_URL=http://localhost:8007
alembic upgrade head
python main.py
```
## Integration Points
### Dependencies
- **All Services** - Calls service APIs to execute workflows
- **🆕 Tenant Service** (NEW) - Fetch tenant hierarchy for enterprise dashboards
- **🆕 Forecasting Service** (NEW) - Fetch network-aggregated demand forecasts
- **🆕 Distribution Service** (NEW) - Fetch active delivery routes and shipment status
- **🆕 Production Service** (NEW) - Fetch production metrics across network
- **Redis** - Leader election and caching
- **PostgreSQL** - Workflow history
- **RabbitMQ** - Event publishing
### Dependents
- **All Services** - Benefit from automated workflows
- **Monitoring** - Tracks workflow execution
- **🆕 Frontend Enterprise Dashboard** (NEW) - Displays aggregated network metrics for parent tenants
## Business Value for VUE Madrid
### Problem Statement
Manual daily operations don't scale:
- Staff forget to generate forecasts daily
- Production planning done inconsistently
- Procurement needs identified too late
- Reports generated manually
- No weekend/holiday coverage
- Human error in execution
### Solution
Bakery-IA Orchestrator provides:
- **Fully Automated**: 95%+ operations automated
- **Consistent Execution**: 100% vs. 70-80% manual
- **Early Morning Ready**: Data ready before business opens
- **365-Day Coverage**: Works weekends and holidays
- **Error Recovery**: Automatic retries
- **Scalable**: Handles 10,000+ tenants
### Quantifiable Impact
**Time Savings:**
- 15-20 hours/week per bakery on manual planning
- €900-1,200/month labor cost savings per bakery
- 100% consistency vs. 70-80% manual execution
**Operational Excellence:**
- 99.9%+ workflow success rate
- Issues identified before business hours
- Zero forgotten forecasts or plans
- Predictable daily operations
**Platform Scalability:**
- Linear cost scaling with tenants
- 10,000+ tenant capacity with one orchestrator
- €0.01-0.05 per tenant per day computational cost
- High availability with leader election
### ROI for Platform
**Investment**: €50-200/month (compute + infrastructure)
**Value Delivered**: €900-1,200/month per tenant
**Platform Scale**: €90,000-120,000/month at 100 tenants
**Cost Ratio**: <1% of value delivered
---
## 🆕 Forecast Validation Integration (NEW)
### Overview
The orchestrator now integrates with the Forecasting Service's validation system to automatically validate forecast accuracy and trigger model improvements.
### Daily Workflow Integration
The daily workflow now includes a **Step 5: Validate Previous Forecasts** after generating new forecasts:
```python
# Step 5: Validate previous day's forecasts
await log_step(execution_id, "validate_forecasts", tenant.id, "Validating forecasts")
validation_result = await forecast_client.validate_forecasts(
tenant_id=tenant.id,
orchestration_run_id=execution_id
)
await log_step(
execution_id,
"validate_forecasts",
tenant.id,
f"Validation complete: MAPE={validation_result.get('overall_mape', 'N/A')}%"
)
execution.steps_completed += 1
```
### What Gets Validated
Every morning at 8:00 AM, the orchestrator:
1. **Generates today's forecasts** (Steps 1-4)
2. **Validates yesterday's forecasts** (Step 5) by:
- Fetching yesterday's forecast predictions
- Fetching yesterday's actual sales from Sales Service
- Calculating accuracy metrics (MAE, MAPE, RMSE, R², Accuracy %)
- Storing validation results in `validation_runs` table
- Identifying poor-performing products/locations
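For reference, the accuracy metrics above follow their standard definitions; the snippet below is an illustrative calculation of MAE, MAPE, RMSE, and an accuracy percentage from paired forecast/actual values, not the Forecasting Service's actual code.
```python
# Illustrative metric calculations for forecast validation (not the service's actual code).
def validation_metrics(forecasts: list[float], actuals: list[float]) -> dict:
    pairs = [(f, a) for f, a in zip(forecasts, actuals) if a > 0]  # skip zero-sales days for MAPE
    mae = sum(abs(f - a) for f, a in pairs) / len(pairs)
    mape = 100 * sum(abs(f - a) / a for f, a in pairs) / len(pairs)
    rmse = (sum((f - a) ** 2 for f, a in pairs) / len(pairs)) ** 0.5
    return {
        "mae": round(mae, 2),
        "mape": round(mape, 2),
        "rmse": round(rmse, 2),
        "accuracy_pct": round(max(0.0, 100 - mape), 2),  # one common convention
    }

# Example: validation_metrics([100, 80, 120], [90, 85, 130]) -> MAPE of roughly 8.2%
```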
### Benefits
**For Bakery Owners:**
- **Daily Accuracy Tracking**: See how accurate yesterday's forecast was
- **Product-Level Insights**: Know which products have reliable forecasts
- **Continuous Improvement**: Models automatically retrain when accuracy drops
- **Trust & Confidence**: Validated accuracy metrics build trust in forecasts
**For Platform Operations:**
- **Automated Quality Control**: No manual validation needed
- **Early Problem Detection**: Performance degradation identified within 24 hours
- **Model Health Monitoring**: Track accuracy trends over time
- **Automatic Retraining**: Models improve automatically when needed
### Validation Metrics
Each validation run tracks:
- **Overall Metrics**: MAPE, MAE, RMSE, R², Accuracy %
- **Coverage**: % of forecasts with actual sales data
- **Product Performance**: Top/bottom performers by MAPE
- **Location Performance**: Accuracy by location/POS
- **Trend Analysis**: Week-over-week accuracy changes
### Historical Data Handling
When late sales data arrives (e.g., from CSV imports or delayed POS sync):
- **Webhook Integration**: Sales Service notifies Forecasting Service
- **Gap Detection**: System identifies dates with forecasts but no validation
- **Automatic Backfill**: Validates historical forecasts retroactively
- **Complete Coverage**: Ensures 100% of forecasts eventually get validated
### Performance Monitoring & Retraining
**Weekly Evaluation** (runs Sunday night):
```python
# Analyze 30-day performance
await retraining_service.evaluate_and_trigger_retraining(
tenant_id=tenant.id,
auto_trigger=True # Automatically retrain poor performers
)
```
**Retraining Triggers:**
- MAPE > 30% (critical threshold)
- MAPE increased > 5% in 30 days
- Model age > 30 days
- Manual trigger via API
**Automatic Actions:**
- Identifies products with MAPE > 30%
- Triggers retraining via Training Service
- Tracks retraining job status
- Validates improved accuracy after retraining
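Combining the triggers and actions above, the weekly evaluation essentially reduces to a decision like this sketch; the thresholds are taken from this section, while the function and its inputs are illustrative.
```python
# Hypothetical retraining decision based on the thresholds listed above.
from datetime import datetime, timedelta

MAPE_CRITICAL = 30.0        # retrain if 30-day MAPE exceeds this
MAPE_DRIFT = 5.0            # retrain if MAPE worsened by more than this over 30 days
MAX_MODEL_AGE = timedelta(days=30)

def should_retrain(current_mape: float, mape_30_days_ago: float, trained_at: datetime) -> bool:
    if current_mape > MAPE_CRITICAL:
        return True
    if current_mape - mape_30_days_ago > MAPE_DRIFT:
        return True
    if datetime.utcnow() - trained_at > MAX_MODEL_AGE:
        return True
    return False
```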
### Integration Flow
```
Daily Orchestrator (8:00 AM)
  ├─ Steps 1-4: Generate forecasts, production, procurement
  └─ Step 5: Validate yesterday's forecasts
       → Forecasting Service validates vs Sales Service
       → Store validation results in validation_runs table
       → If poor performance detected → Queue for retraining

Weekly Retraining Job (Sunday night)
  → Trigger Training Service for poor performers
  → Models improve over time
```
### Expected Results
**After 1 month:**
- 100% validation coverage (all forecasts validated)
- Baseline accuracy metrics established
- Poor performers identified for retraining
**After 3 months:**
- 10-15% accuracy improvement from automatic retraining
- Reduced MAPE from 25% → 15% average
- Better inventory decisions from trusted forecasts
- Reduced waste from more accurate predictions
**After 6 months:**
- Continuous model improvement cycle established
- Optimal accuracy for each product category
- Predictable performance metrics
- Trust in forecast-driven decisions
### Monitoring Dashboard Additions
New metrics available for dashboards:
1. **Validation Status Card**
- Last validation: timestamp, status
- Overall MAPE: % with trend arrow
- Validation coverage: %
- Health status: healthy/warning/critical
2. **Accuracy Trends Graph**
- 30-day MAPE trend line
- Target threshold lines (20%, 30%)
- Product performance distribution
3. **Retraining Activity**
- Models retrained this week
- Retraining success rate
- Products pending retraining
- Next scheduled retraining
---
**Copyright © 2025 Bakery-IA. All rights reserved.**

View File

@@ -0,0 +1,105 @@
# A generic, single database configuration for orchestrator service
[alembic]
# path to migration scripts
script_location = migrations
# template used to generate migration file names; The default value is %%(rev)s_%%(slug)s
# Uncomment the line below if you want the files to be prepended with date and time
# see https://alembic.sqlalchemy.org/en/latest/tutorial.html#editing-the-ini-file
# for all available tokens
file_template = %%(year)d%%(month).2d%%(day).2d_%%(hour).2d%%(minute).2d_%%(rev)s_%%(slug)s
# sys.path path, will be prepended to sys.path if present.
# defaults to the current working directory.
prepend_sys_path = .
# timezone to use when rendering the date within the migration file
# as well as the filename.
# If specified, requires the python>=3.9 or backports.zoneinfo library.
# Any required deps can be installed by adding `alembic[tz]` to the pip requirements
# string value is passed to ZoneInfo()
# leave blank for localtime
# timezone =
# max length of characters to apply to the
# "slug" field
# max_length = 40
# version_num, name, path
version_locations = %(here)s/migrations/versions
# version path separator; As mentioned above, this is the character used to split
# version_locations. The default within new alembic.ini files is "os", which uses
# os.pathsep. If this key is omitted entirely, it falls back to the legacy
# behavior of splitting on spaces and/or commas.
# Valid values for version_path_separator are:
#
# version_path_separator = :
# version_path_separator = ;
# version_path_separator = space
# Use os.pathsep. Default configuration used for new projects.
version_path_separator = os
# set to 'true' to search source files recursively
# in each "version_locations" directory
# new in Alembic version 1.10.0
# recursive_version_locations = false
# the output encoding used when revision files
# are written from script.py.mako
# output_encoding = utf-8
sqlalchemy.url = driver://user:pass@localhost/dbname
[post_write_hooks]
# post_write_hooks defines scripts or Python functions that are run
# on newly generated revision scripts. See the documentation for further
# detail and examples
# format using "black" - use the console_scripts runner, against the "black" entrypoint
# hooks = black
# black.type = console_scripts
# black.entrypoint = black
# black.options = -l 79 REVISION_SCRIPT_FILENAME
# lint with attempts to fix using "ruff" - use the exec runner, execute a binary
# hooks = ruff
# ruff.type = exec
# ruff.executable = %(here)s/.venv/bin/ruff
# ruff.options = --fix REVISION_SCRIPT_FILENAME
# Logging configuration
[loggers]
keys = root,sqlalchemy,alembic
[handlers]
keys = console
[formatters]
keys = generic
[logger_root]
level = WARN
handlers = console
qualname =
[logger_sqlalchemy]
level = WARN
handlers =
qualname = sqlalchemy.engine
[logger_alembic]
level = INFO
handlers =
qualname = alembic
[handler_console]
class = StreamHandler
args = (sys.stdout,)
level = NOTSET
formatter = generic
[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s

View File

@@ -0,0 +1,4 @@
from .orchestration import router as orchestration_router
from .internal_demo import router as internal_demo_router
__all__ = ["orchestration_router", "internal_demo_router"]

View File

@@ -0,0 +1,177 @@
"""
Internal API for Alert Intelligence Service
Provides orchestrator context for alert enrichment
"""
from fastapi import APIRouter, Header, HTTPException, Query
from typing import Optional, List, Dict, Any
from datetime import datetime, timedelta
from uuid import UUID
from pydantic import BaseModel
router = APIRouter(prefix="/api/internal", tags=["internal"])
class OrchestrationAction(BaseModel):
"""Recent orchestration action"""
id: str
type: str # purchase_order, production_batch
status: str # created, pending_approval, approved, completed
delivery_date: Optional[datetime]
reasoning: Optional[Dict[str, Any]]
estimated_resolution: Optional[datetime]
created_at: datetime
class RecentActionsResponse(BaseModel):
"""Response with recent orchestrator actions"""
actions: List[OrchestrationAction]
count: int
@router.get("/recent-actions", response_model=RecentActionsResponse)
async def get_recent_actions(
tenant_id: str = Query(..., description="Tenant ID"),
ingredient_id: Optional[str] = Query(None, description="Filter by ingredient"),
product_id: Optional[str] = Query(None, description="Filter by product"),
hours_ago: int = Query(24, description="Look back hours"),
):
"""
Get recent orchestrator actions for alert context enrichment.
Only accessible by internal services (alert-intelligence).
Returns orchestration runs with details about POs created, batches adjusted, etc.
This helps the alert system understand if AI already addressed similar issues.
"""
from shared.database.base import create_database_manager
from ..core.config import get_settings
from ..models.orchestration_run import OrchestrationRun, OrchestrationStatus
from sqlalchemy import select, and_, desc
import structlog
logger = structlog.get_logger()
try:
settings = get_settings()
db_manager = create_database_manager(settings.DATABASE_URL, "orchestrator")
async with db_manager.get_session() as session:
cutoff_time = datetime.utcnow() - timedelta(hours=hours_ago)
# Query recent orchestration runs
query = select(OrchestrationRun).where(
and_(
OrchestrationRun.tenant_id == UUID(tenant_id),
OrchestrationRun.created_at >= cutoff_time,
OrchestrationRun.status.in_([
OrchestrationStatus.completed,
OrchestrationStatus.partial_success
])
)
).order_by(desc(OrchestrationRun.created_at))
result = await session.execute(query)
runs = result.scalars().all()
actions = []
for run in runs:
run_metadata = run.run_metadata or {}
# Add purchase order actions
if run.purchase_orders_created > 0:
po_details = run_metadata.get('purchase_orders', [])
# If metadata has PO details, use them
if po_details:
for po in po_details:
# Filter by ingredient if specified
if ingredient_id:
po_items = po.get('items', [])
has_ingredient = any(
item.get('ingredient_id') == ingredient_id
for item in po_items
)
if not has_ingredient:
continue
actions.append(OrchestrationAction(
id=po.get('id', str(run.id)),
type="purchase_order",
status=po.get('status', 'created'),
delivery_date=po.get('delivery_date'),
reasoning=run_metadata.get('reasoning'),
estimated_resolution=po.get('delivery_date'),
created_at=run.created_at
))
else:
# Fallback: create generic action from run
actions.append(OrchestrationAction(
id=str(run.id),
type="purchase_order",
status="created",
delivery_date=None,
reasoning=run_metadata.get('reasoning'),
estimated_resolution=None,
created_at=run.created_at
))
# Add production batch actions
if run.production_batches_created > 0:
batch_details = run_metadata.get('production_batches', [])
if batch_details:
for batch in batch_details:
# Filter by product if specified
if product_id and batch.get('product_id') != product_id:
continue
actions.append(OrchestrationAction(
id=batch.get('id', str(run.id)),
type="production_batch",
status=batch.get('status', 'created'),
delivery_date=None,
reasoning=run_metadata.get('reasoning'),
estimated_resolution=batch.get('scheduled_date'),
created_at=run.created_at
))
else:
# Fallback: create generic action from run
if not product_id: # Only add if no product filter
actions.append(OrchestrationAction(
id=str(run.id),
type="production_batch",
status="created",
delivery_date=None,
reasoning=run_metadata.get('reasoning'),
estimated_resolution=None,
created_at=run.created_at
))
logger.info(
"recent_actions_fetched",
tenant_id=tenant_id,
hours_ago=hours_ago,
action_count=len(actions),
ingredient_id=ingredient_id,
product_id=product_id
)
return RecentActionsResponse(
actions=actions,
count=len(actions)
)
except Exception as e:
logger.error("error_fetching_recent_actions", error=str(e), tenant_id=tenant_id)
raise HTTPException(
status_code=500,
detail=f"Failed to fetch recent actions: {str(e)}"
)
@router.get("/health")
async def internal_health():
"""Internal health check"""
return {"status": "healthy", "api": "internal"}

View File

@@ -0,0 +1,277 @@
"""
Internal Demo API Endpoints for Orchestrator Service
Used by demo_session service to clone data for virtual demo tenants
"""
from fastapi import APIRouter, Depends, HTTPException, Header
from typing import Dict, Any
from uuid import UUID
import structlog
import os
import json
from app.core.database import get_db
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy import select, delete, func
from app.models.orchestration_run import OrchestrationRun, OrchestrationStatus
import uuid
from datetime import datetime, timezone, timedelta
from typing import Optional
import sys
from pathlib import Path
# Add shared utilities to path
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent))
from shared.utils.demo_dates import adjust_date_for_demo
from app.core.config import settings
router = APIRouter(prefix="/internal/demo", tags=["internal"])
logger = structlog.get_logger()
async def ensure_unique_run_number(db: AsyncSession, base_run_number: str) -> str:
"""Ensure the run number is unique by appending a suffix if needed"""
proposed_run_number = base_run_number
# Check if the proposed run number already exists in the database
while True:
result = await db.execute(
select(OrchestrationRun)
.where(OrchestrationRun.run_number == proposed_run_number)
)
existing_run = result.scalar_one_or_none()
if not existing_run:
# Run number is unique, return it
return proposed_run_number
# Generate a new run number with an additional random suffix
random_suffix = str(uuid.uuid4())[:4].upper()
proposed_run_number = f"{base_run_number[:50-len(random_suffix)-1]}-{random_suffix}"
async def load_fixture_data_for_tenant(
db: AsyncSession,
tenant_uuid: UUID,
demo_account_type: str,
reference_time: datetime,
base_tenant_id: Optional[str] = None
) -> int:
"""
Load orchestration run data from JSON fixture directly into the virtual tenant.
Returns the number of runs created.
"""
from shared.utils.seed_data_paths import get_seed_data_path
from shared.utils.demo_dates import resolve_time_marker, adjust_date_for_demo
# Load fixture data
if demo_account_type == "enterprise_child" and base_tenant_id:
json_file = get_seed_data_path("enterprise", "11-orchestrator.json", child_id=base_tenant_id)
else:
json_file = get_seed_data_path(demo_account_type, "11-orchestrator.json")
with open(json_file, 'r', encoding='utf-8') as f:
fixture_data = json.load(f)
orchestration_run_data = fixture_data.get("orchestration_run")
if not orchestration_run_data:
logger.warning("No orchestration_run data in fixture")
return 0
# Parse and adjust dates from fixture to reference_time
base_started_at = resolve_time_marker(orchestration_run_data.get("started_at"), reference_time)
base_completed_at = resolve_time_marker(orchestration_run_data.get("completed_at"), reference_time)
# Adjust dates to make them appear recent relative to session creation
started_at = adjust_date_for_demo(base_started_at, reference_time) if base_started_at else reference_time - timedelta(hours=2)
completed_at = adjust_date_for_demo(base_completed_at, reference_time) if base_completed_at else started_at + timedelta(minutes=15)
# Generate unique run number with session context
current_year = reference_time.year
unique_suffix = str(uuid.uuid4())[:8].upper()
run_number = f"ORCH-DEMO-PROF-{current_year}-001-{unique_suffix}"
# Create orchestration run for virtual tenant
new_run = OrchestrationRun(
id=uuid.uuid4(), # Generate new UUID
tenant_id=tenant_uuid,
run_number=run_number,
status=OrchestrationStatus[orchestration_run_data["status"]],
run_type=orchestration_run_data.get("run_type", "daily"),
priority="normal",
started_at=started_at,
completed_at=completed_at,
duration_seconds=orchestration_run_data.get("duration_seconds", 900),
# Step statuses from orchestration_results
forecasting_status="success",
forecasting_started_at=started_at,
forecasting_completed_at=started_at + timedelta(minutes=2),
production_status="success",
production_started_at=started_at + timedelta(minutes=2),
production_completed_at=started_at + timedelta(minutes=5),
procurement_status="success",
procurement_started_at=started_at + timedelta(minutes=5),
procurement_completed_at=started_at + timedelta(minutes=8),
notification_status="success",
notification_started_at=started_at + timedelta(minutes=8),
notification_completed_at=completed_at,
# Results from orchestration_results
forecasts_generated=fixture_data.get("orchestration_results", {}).get("forecasts_generated", 10),
production_batches_created=fixture_data.get("orchestration_results", {}).get("production_batches_created", 18),
procurement_plans_created=0,
purchase_orders_created=fixture_data.get("orchestration_results", {}).get("purchase_orders_created", 6),
notifications_sent=fixture_data.get("orchestration_results", {}).get("notifications_sent", 8),
# Metadata
triggered_by="system",
created_at=started_at,
updated_at=completed_at
)
db.add(new_run)
await db.flush()
logger.info(
"Loaded orchestration run from fixture",
tenant_id=str(tenant_uuid),
run_number=new_run.run_number,
started_at=started_at.isoformat()
)
return 1
@router.post("/clone")
async def clone_demo_data(
base_tenant_id: str,
virtual_tenant_id: str,
demo_account_type: str,
session_id: Optional[str] = None,
session_created_at: Optional[str] = None,
db: AsyncSession = Depends(get_db)
):
"""
Clone orchestration run demo data from base tenant to virtual tenant
This endpoint is called by the demo_session service during session initialization.
It clones orchestration runs with date adjustments to make them appear recent.
If the base tenant has no orchestration runs, it will first seed them from the fixture.
"""
start_time = datetime.now(timezone.utc)
# Parse session_created_at or use current time
if session_created_at:
try:
reference_time = datetime.fromisoformat(session_created_at.replace('Z', '+00:00'))
        except ValueError:
reference_time = datetime.now(timezone.utc)
else:
reference_time = datetime.now(timezone.utc)
logger.info(
"Starting orchestration runs cloning with date adjustment",
base_tenant_id=base_tenant_id,
virtual_tenant_id=virtual_tenant_id,
demo_account_type=demo_account_type,
session_id=session_id,
reference_time=reference_time.isoformat()
)
try:
virtual_uuid = uuid.UUID(virtual_tenant_id)
# Load fixture data directly into virtual tenant (no base tenant cloning)
runs_created = await load_fixture_data_for_tenant(
db,
virtual_uuid,
demo_account_type,
reference_time,
base_tenant_id
)
await db.commit()
duration_ms = int((datetime.now(timezone.utc) - start_time).total_seconds() * 1000)
logger.info(
"Orchestration runs loaded from fixture successfully",
virtual_tenant_id=str(virtual_tenant_id),
runs_created=runs_created,
duration_ms=duration_ms
)
return {
"service": "orchestrator",
"status": "completed",
"success": True,
"records_cloned": runs_created,
"runs_cloned": runs_created,
"duration_ms": duration_ms
}
except Exception as e:
logger.error("Failed to clone orchestration runs", error=str(e), exc_info=True)
await db.rollback()
raise HTTPException(status_code=500, detail=f"Failed to clone orchestration runs: {str(e)}")
@router.delete("/tenant/{virtual_tenant_id}")
async def delete_demo_data(
virtual_tenant_id: str,
db: AsyncSession = Depends(get_db)
):
"""Delete all orchestration runs for a virtual demo tenant"""
logger.info("Deleting orchestration runs for virtual tenant", virtual_tenant_id=virtual_tenant_id)
start_time = datetime.now(timezone.utc)
try:
virtual_uuid = uuid.UUID(virtual_tenant_id)
# Count records
run_count = await db.scalar(
select(func.count(OrchestrationRun.id))
.where(OrchestrationRun.tenant_id == virtual_uuid)
)
# Delete orchestration runs
await db.execute(
delete(OrchestrationRun)
.where(OrchestrationRun.tenant_id == virtual_uuid)
)
await db.commit()
duration_ms = int((datetime.now(timezone.utc) - start_time).total_seconds() * 1000)
logger.info(
"Orchestration runs deleted successfully",
virtual_tenant_id=virtual_tenant_id,
duration_ms=duration_ms
)
return {
"service": "orchestrator",
"status": "deleted",
"virtual_tenant_id": virtual_tenant_id,
"records_deleted": {
"orchestration_runs": run_count,
"total": run_count
},
"duration_ms": duration_ms
}
except Exception as e:
logger.error("Failed to delete orchestration runs", error=str(e), exc_info=True)
await db.rollback()
raise HTTPException(status_code=500, detail=str(e))
@router.get("/clone/health")
async def health_check():
"""Health check for demo cloning endpoint"""
return {"status": "healthy", "service": "orchestrator"}

View File

@@ -0,0 +1,346 @@
# ================================================================
# services/orchestrator/app/api/orchestration.py
# ================================================================
"""
Orchestration API Endpoints
Testing and manual trigger endpoints for orchestration
"""
import uuid
from typing import Optional
from fastapi import APIRouter, Depends, HTTPException, Request
from pydantic import BaseModel, Field
import structlog
from app.core.database import get_db
from app.repositories.orchestration_run_repository import OrchestrationRunRepository
from sqlalchemy.ext.asyncio import AsyncSession
logger = structlog.get_logger()
router = APIRouter(prefix="/api/v1/tenants/{tenant_id}/orchestrator", tags=["Orchestration"])
# ================================================================
# REQUEST/RESPONSE SCHEMAS
# ================================================================
class OrchestratorTestRequest(BaseModel):
"""Request schema for testing orchestrator"""
test_scenario: Optional[str] = Field(None, description="Test scenario: full, production_only, procurement_only")
dry_run: bool = Field(False, description="Dry run mode (no actual changes)")
class OrchestratorTestResponse(BaseModel):
"""Response schema for orchestrator test"""
success: bool
message: str
tenant_id: str
forecasting_completed: bool = False
production_completed: bool = False
procurement_completed: bool = False
notifications_sent: bool = False
summary: dict = {}
class OrchestratorWorkflowRequest(BaseModel):
"""Request schema for daily workflow trigger"""
dry_run: bool = Field(False, description="Dry run mode (no actual changes)")
class OrchestratorWorkflowResponse(BaseModel):
"""Response schema for daily workflow trigger"""
success: bool
message: str
tenant_id: str
run_id: Optional[str] = None
forecasting_completed: bool = False
production_completed: bool = False
procurement_completed: bool = False
notifications_sent: bool = False
summary: dict = {}
# ================================================================
# API ENDPOINTS
# ================================================================
@router.post("/test", response_model=OrchestratorTestResponse)
async def trigger_orchestrator_test(
tenant_id: str,
request_data: OrchestratorTestRequest,
request: Request,
db: AsyncSession = Depends(get_db)
):
"""
Trigger orchestrator for testing purposes
This endpoint allows manual triggering of the orchestration workflow
for a specific tenant, useful for testing during development.
Args:
tenant_id: Tenant ID to orchestrate
request_data: Test request with scenario and dry_run options
request: FastAPI request object
db: Database session
Returns:
OrchestratorTestResponse with results
"""
logger.info("Orchestrator test trigger requested",
tenant_id=tenant_id,
test_scenario=request_data.test_scenario,
dry_run=request_data.dry_run)
try:
# Get scheduler service from app state
if not hasattr(request.app.state, 'scheduler_service'):
raise HTTPException(
status_code=503,
detail="Orchestrator scheduler service not available"
)
scheduler_service = request.app.state.scheduler_service
# Trigger orchestration
tenant_uuid = uuid.UUID(tenant_id)
result = await scheduler_service.trigger_orchestration_for_tenant(
tenant_id=tenant_uuid,
test_scenario=request_data.test_scenario
)
# Get the latest run for this tenant
repo = OrchestrationRunRepository(db)
latest_run = await repo.get_latest_run_for_tenant(tenant_uuid)
# Build response
response = OrchestratorTestResponse(
success=result.get('success', False),
message=result.get('message', 'Orchestration completed'),
tenant_id=tenant_id,
forecasting_completed=latest_run.forecasting_status == 'success' if latest_run else False,
production_completed=latest_run.production_status == 'success' if latest_run else False,
procurement_completed=latest_run.procurement_status == 'success' if latest_run else False,
notifications_sent=latest_run.notification_status == 'success' if latest_run else False,
summary={
'forecasts_generated': latest_run.forecasts_generated if latest_run else 0,
'batches_created': latest_run.production_batches_created if latest_run else 0,
'pos_created': latest_run.purchase_orders_created if latest_run else 0,
'notifications_sent': latest_run.notifications_sent if latest_run else 0
}
)
logger.info("Orchestrator test completed",
tenant_id=tenant_id,
success=response.success)
return response
except ValueError as e:
raise HTTPException(status_code=400, detail=f"Invalid tenant ID: {str(e)}")
except Exception as e:
logger.error("Orchestrator test failed",
tenant_id=tenant_id,
error=str(e),
exc_info=True)
raise HTTPException(status_code=500, detail=f"Orchestrator test failed: {str(e)}")
@router.post("/run-daily-workflow", response_model=OrchestratorWorkflowResponse)
async def run_daily_workflow(
tenant_id: str,
    request: Request,
    request_data: Optional[OrchestratorWorkflowRequest] = None,
db: AsyncSession = Depends(get_db)
):
"""
Trigger the daily orchestrated workflow for a tenant
This endpoint runs the complete daily workflow which includes:
1. Forecasting Service: Generate demand forecasts
2. Production Service: Create production schedule from forecasts
3. Procurement Service: Generate procurement plan
4. Notification Service: Send relevant notifications
This is the production endpoint used by the dashboard scheduler button.
Args:
tenant_id: Tenant ID to orchestrate
request_data: Optional request data with dry_run flag
request: FastAPI request object
db: Database session
Returns:
OrchestratorWorkflowResponse with workflow execution results
"""
logger.info("Daily workflow trigger requested", tenant_id=tenant_id)
# Handle optional request_data
if request_data is None:
request_data = OrchestratorWorkflowRequest()
try:
# Get scheduler service from app state
if not hasattr(request.app.state, 'scheduler_service'):
raise HTTPException(
status_code=503,
detail="Orchestrator scheduler service not available"
)
scheduler_service = request.app.state.scheduler_service
# Trigger orchestration (use full workflow, not test scenario)
tenant_uuid = uuid.UUID(tenant_id)
result = await scheduler_service.trigger_orchestration_for_tenant(
tenant_id=tenant_uuid,
test_scenario=None # Full production workflow
)
# Get the latest run for this tenant
repo = OrchestrationRunRepository(db)
latest_run = await repo.get_latest_run_for_tenant(tenant_uuid)
# Build response
response = OrchestratorWorkflowResponse(
success=result.get('success', False),
message=result.get('message', 'Daily workflow completed successfully'),
tenant_id=tenant_id,
run_id=str(latest_run.id) if latest_run else None,
forecasting_completed=latest_run.forecasting_status == 'success' if latest_run else False,
production_completed=latest_run.production_status == 'success' if latest_run else False,
procurement_completed=latest_run.procurement_status == 'success' if latest_run else False,
notifications_sent=latest_run.notification_status == 'success' if latest_run else False,
summary={
'run_number': latest_run.run_number if latest_run else 0,
'forecasts_generated': latest_run.forecasts_generated if latest_run else 0,
'production_batches_created': latest_run.production_batches_created if latest_run else 0,
'purchase_orders_created': latest_run.purchase_orders_created if latest_run else 0,
'notifications_sent': latest_run.notifications_sent if latest_run else 0,
'duration_seconds': latest_run.duration_seconds if latest_run else 0
}
)
logger.info("Daily workflow completed",
tenant_id=tenant_id,
success=response.success,
run_id=response.run_id)
return response
except ValueError as e:
raise HTTPException(status_code=400, detail=f"Invalid tenant ID: {str(e)}")
except Exception as e:
logger.error("Daily workflow failed",
tenant_id=tenant_id,
error=str(e),
exc_info=True)
raise HTTPException(status_code=500, detail=f"Daily workflow failed: {str(e)}")
@router.get("/health")
async def orchestrator_health():
"""Check orchestrator health"""
return {
"status": "healthy",
"service": "orchestrator",
"message": "Orchestrator service is running"
}
@router.get("/runs", response_model=dict)
async def list_orchestration_runs(
tenant_id: str,
limit: int = 10,
offset: int = 0,
db: AsyncSession = Depends(get_db)
):
"""
List orchestration runs for a tenant
Args:
tenant_id: Tenant ID
limit: Maximum number of runs to return
offset: Number of runs to skip
db: Database session
Returns:
List of orchestration runs
"""
try:
tenant_uuid = uuid.UUID(tenant_id)
repo = OrchestrationRunRepository(db)
runs = await repo.list_runs(
tenant_id=tenant_uuid,
limit=limit,
offset=offset
)
return {
"runs": [
{
"id": str(run.id),
"run_number": run.run_number,
"status": run.status.value,
"started_at": run.started_at.isoformat() if run.started_at else None,
"completed_at": run.completed_at.isoformat() if run.completed_at else None,
"duration_seconds": run.duration_seconds,
"forecasts_generated": run.forecasts_generated,
"batches_created": run.production_batches_created,
"pos_created": run.purchase_orders_created
}
for run in runs
],
"total": len(runs),
"limit": limit,
"offset": offset
}
except ValueError as e:
raise HTTPException(status_code=400, detail=f"Invalid tenant ID: {str(e)}")
except Exception as e:
logger.error("Error listing orchestration runs",
tenant_id=tenant_id,
error=str(e))
raise HTTPException(status_code=500, detail=str(e))
@router.get("/last-run")
async def get_last_orchestration_run(
tenant_id: str,
db: AsyncSession = Depends(get_db)
):
"""
Get timestamp of last orchestration run
Lightweight endpoint for health status frontend migration (Phase 4).
Returns only the timestamp and run number for the tenant's most recent run (regardless of status).
Args:
tenant_id: Tenant ID
Returns:
Dict with timestamp and runNumber (or None if no runs)
"""
try:
tenant_uuid = uuid.UUID(tenant_id)
repo = OrchestrationRunRepository(db)
# Get the most recent run for this tenant (no status filter is applied)
latest_run = await repo.get_latest_run_for_tenant(tenant_uuid)
if not latest_run:
return {"timestamp": None, "runNumber": None}
return {
"timestamp": latest_run.started_at.isoformat() if latest_run.started_at else None,
"runNumber": latest_run.run_number
}
except ValueError as e:
raise HTTPException(status_code=400, detail=f"Invalid tenant ID: {str(e)}")
except Exception as e:
logger.error("Error getting last orchestration run",
tenant_id=tenant_id,
error=str(e))
raise HTTPException(status_code=500, detail=str(e))
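# ----------------------------------------------------------------------------
# Illustrative call sketch for the two read endpoints above (hedged: the full
# route prefix comes from the RouteBuilder configured in main.py, and the
# tenant UUID is a placeholder, not a real identifier):
#
#   GET .../runs?tenant_id=<tenant-uuid>&limit=10&offset=0
#       -> {"runs": [{"id": "...", "run_number": "ORCH-20260121-0001",
#                     "status": "completed", ...}],
#           "total": 10, "limit": 10, "offset": 0}
#
#   GET .../last-run?tenant_id=<tenant-uuid>
#       -> {"timestamp": "2026-01-21T05:30:12+00:00", "runNumber": "ORCH-20260121-0001"}
# ----------------------------------------------------------------------------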

View File

@@ -0,0 +1,133 @@
# ================================================================
# services/orchestrator/app/core/config.py
# ================================================================
"""
Orchestrator Service Configuration
"""
import os
from pydantic import Field
from shared.config.base import BaseServiceSettings
class OrchestratorSettings(BaseServiceSettings):
"""Orchestrator service specific settings"""
# Service Identity
APP_NAME: str = "Orchestrator Service"
SERVICE_NAME: str = "orchestrator-service"
VERSION: str = "1.0.0"
DESCRIPTION: str = "Automated orchestration of forecasting, production, and procurement workflows"
# Database configuration (minimal - only for audit logs)
@property
def DATABASE_URL(self) -> str:
"""Build database URL from secure components"""
# Try complete URL first (for backward compatibility)
complete_url = os.getenv("ORCHESTRATOR_DATABASE_URL")
if complete_url:
return complete_url
# Build from components (secure approach)
user = os.getenv("ORCHESTRATOR_DB_USER", "orchestrator_user")
password = os.getenv("ORCHESTRATOR_DB_PASSWORD", "orchestrator_pass123")
host = os.getenv("ORCHESTRATOR_DB_HOST", "localhost")
port = os.getenv("ORCHESTRATOR_DB_PORT", "5432")
name = os.getenv("ORCHESTRATOR_DB_NAME", "orchestrator_db")
return f"postgresql+asyncpg://{user}:{password}@{host}:{port}/{name}"
# Orchestration Settings
ORCHESTRATION_ENABLED: bool = os.getenv("ORCHESTRATION_ENABLED", "true").lower() == "true"
ORCHESTRATION_SCHEDULE: str = os.getenv("ORCHESTRATION_SCHEDULE", "30 5 * * *") # 5:30 AM daily (cron format)
ORCHESTRATION_HOUR: int = int(os.getenv("ORCHESTRATION_HOUR", "2")) # Hour to run daily orchestration (default: 2 AM)
ORCHESTRATION_MINUTE: int = int(os.getenv("ORCHESTRATION_MINUTE", "0")) # Minute to run (default: :00)
ORCHESTRATION_TIMEOUT_SECONDS: int = int(os.getenv("ORCHESTRATION_TIMEOUT_SECONDS", "600")) # 10 minutes
# Tenant Processing
MAX_CONCURRENT_TENANTS: int = int(os.getenv("MAX_CONCURRENT_TENANTS", "5"))
TENANT_TIMEOUT_SECONDS: int = int(os.getenv("TENANT_TIMEOUT_SECONDS", "180")) # 3 minutes per tenant
# Retry Configuration
MAX_RETRIES: int = int(os.getenv("MAX_RETRIES", "3"))
RETRY_DELAY_SECONDS: int = int(os.getenv("RETRY_DELAY_SECONDS", "30"))
ENABLE_EXPONENTIAL_BACKOFF: bool = os.getenv("ENABLE_EXPONENTIAL_BACKOFF", "true").lower() == "true"
# Circuit Breaker
CIRCUIT_BREAKER_ENABLED: bool = os.getenv("CIRCUIT_BREAKER_ENABLED", "true").lower() == "true"
CIRCUIT_BREAKER_FAILURE_THRESHOLD: int = int(os.getenv("CIRCUIT_BREAKER_FAILURE_THRESHOLD", "5"))
CIRCUIT_BREAKER_RESET_TIMEOUT: int = int(os.getenv("CIRCUIT_BREAKER_RESET_TIMEOUT", "300")) # 5 minutes
# ================================================================
# CIRCUIT BREAKER SETTINGS - Enhanced with Pydantic validation
# ================================================================
CIRCUIT_BREAKER_TIMEOUT_DURATION: int = Field(
default=60,
description="Seconds to wait before attempting recovery"
)
CIRCUIT_BREAKER_SUCCESS_THRESHOLD: int = Field(
default=2,
description="Successful calls needed to close circuit"
)
# ================================================================
# SAGA PATTERN SETTINGS
# ================================================================
SAGA_TIMEOUT_SECONDS: int = Field(
default=600,
description="Timeout for saga execution (10 minutes)"
)
SAGA_ENABLE_COMPENSATION: bool = Field(
default=True,
description="Enable saga compensation on failure"
)
# Service Integration URLs
FORECASTING_SERVICE_URL: str = os.getenv("FORECASTING_SERVICE_URL", "http://forecasting-service:8000")
PRODUCTION_SERVICE_URL: str = os.getenv("PRODUCTION_SERVICE_URL", "http://production-service:8000")
PROCUREMENT_SERVICE_URL: str = os.getenv("PROCUREMENT_SERVICE_URL", "http://procurement-service:8000")
NOTIFICATION_SERVICE_URL: str = os.getenv("NOTIFICATION_SERVICE_URL", "http://notification-service:8000")
TENANT_SERVICE_URL: str = os.getenv("TENANT_SERVICE_URL", "http://tenant-service:8000")
# Notification Settings
SEND_NOTIFICATIONS: bool = os.getenv("SEND_NOTIFICATIONS", "true").lower() == "true"
NOTIFY_ON_SUCCESS: bool = os.getenv("NOTIFY_ON_SUCCESS", "true").lower() == "true"
NOTIFY_ON_FAILURE: bool = os.getenv("NOTIFY_ON_FAILURE", "true").lower() == "true"
# Audit and Logging
AUDIT_ORCHESTRATION_RUNS: bool = os.getenv("AUDIT_ORCHESTRATION_RUNS", "true").lower() == "true"
DETAILED_LOGGING: bool = os.getenv("DETAILED_LOGGING", "true").lower() == "true"
# AI Enhancement Settings
ORCHESTRATION_USE_AI_INSIGHTS: bool = os.getenv("ORCHESTRATION_USE_AI_INSIGHTS", "true").lower() == "true"
AI_INSIGHTS_SERVICE_URL: str = os.getenv("AI_INSIGHTS_SERVICE_URL", "http://ai-insights-service:8000")
AI_INSIGHTS_MIN_CONFIDENCE: int = int(os.getenv("AI_INSIGHTS_MIN_CONFIDENCE", "70"))
# Redis Cache Settings (for dashboard performance)
REDIS_HOST: str = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT: int = int(os.getenv("REDIS_PORT", "6379"))
REDIS_DB: int = int(os.getenv("REDIS_DB", "0"))
REDIS_PASSWORD: str = os.getenv("REDIS_PASSWORD", "")
REDIS_TLS_ENABLED: str = os.getenv("REDIS_TLS_ENABLED", "false")
CACHE_ENABLED: bool = os.getenv("CACHE_ENABLED", "true").lower() == "true"
CACHE_TTL_HEALTH: int = int(os.getenv("CACHE_TTL_HEALTH", "30")) # 30 seconds
CACHE_TTL_INSIGHTS: int = int(os.getenv("CACHE_TTL_INSIGHTS", "60")) # 1 minute (reduced for faster metrics updates)
CACHE_TTL_SUMMARY: int = int(os.getenv("CACHE_TTL_SUMMARY", "60")) # 1 minute
# Enterprise dashboard cache TTLs
CACHE_TTL_ENTERPRISE_SUMMARY: int = int(os.getenv("CACHE_TTL_ENTERPRISE_SUMMARY", "60")) # 1 minute
CACHE_TTL_ENTERPRISE_PERFORMANCE: int = int(os.getenv("CACHE_TTL_ENTERPRISE_PERFORMANCE", "60")) # 1 minute
CACHE_TTL_ENTERPRISE_DISTRIBUTION: int = int(os.getenv("CACHE_TTL_ENTERPRISE_DISTRIBUTION", "30")) # 30 seconds
CACHE_TTL_ENTERPRISE_FORECAST: int = int(os.getenv("CACHE_TTL_ENTERPRISE_FORECAST", "120")) # 2 minutes
CACHE_TTL_ENTERPRISE_NETWORK: int = int(os.getenv("CACHE_TTL_ENTERPRISE_NETWORK", "60")) # 1 minute
# Global settings instance
settings = OrchestratorSettings()
def get_settings():
"""Get the global settings instance"""
return settings
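# ----------------------------------------------------------------------------
# Hedged sketch: one way the hour/minute settings above can be turned into an
# APScheduler trigger. The actual wiring lives in OrchestratorSchedulerService;
# the timezone value here is an assumption, not read from these settings.
#
#   from apscheduler.triggers.cron import CronTrigger
#   trigger = CronTrigger(
#       hour=settings.ORCHESTRATION_HOUR,
#       minute=settings.ORCHESTRATION_MINUTE,
#       timezone="Europe/Madrid",  # assumed tenant-default timezone
#   )
# ----------------------------------------------------------------------------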

View File

@@ -0,0 +1,48 @@
# ================================================================
# services/orchestrator/app/core/database.py
# ================================================================
"""
Database connection and session management for Orchestrator Service
Minimal database - only for audit trail
"""
from shared.database.base import DatabaseManager
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker
from .config import settings
# Initialize database manager
database_manager = DatabaseManager(
database_url=settings.DATABASE_URL,
echo=settings.DEBUG
)
# Create async session factory
AsyncSessionLocal = async_sessionmaker(
database_manager.async_engine,
class_=AsyncSession,
expire_on_commit=False,
autocommit=False,
autoflush=False,
)
async def get_db() -> AsyncSession:
"""
Dependency to get database session.
Used in FastAPI endpoints via Depends(get_db).
"""
async with AsyncSessionLocal() as session:
try:
yield session
finally:
await session.close()
async def init_db():
"""Initialize database (create tables if needed)"""
await database_manager.create_all()
async def close_db():
"""Close database connections"""
await database_manager.close()
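# ----------------------------------------------------------------------------
# Hedged sketch: using the session factory outside FastAPI dependency
# injection (e.g. from a background job). Illustrative only; the scheduler's
# actual session handling may differ. Requires `from sqlalchemy import text`.
#
#   async def example_audit_query():
#       async with AsyncSessionLocal() as session:
#           result = await session.execute(text("SELECT 1"))
#           return result.scalar()
# ----------------------------------------------------------------------------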

View File

@@ -0,0 +1,237 @@
# ================================================================
# services/orchestrator/app/main.py
# ================================================================
"""
Orchestrator Service - FastAPI Application
Automated orchestration of forecasting, production, and procurement workflows
"""
from fastapi import FastAPI, Request
from sqlalchemy import text
from app.core.config import settings
from app.core.database import database_manager
from shared.service_base import StandardFastAPIService
class OrchestratorService(StandardFastAPIService):
"""Orchestrator Service with standardized setup"""
expected_migration_version = "001_initial_schema"
def __init__(self):
# Define expected database tables for health checks
orchestrator_expected_tables = [
'orchestration_runs'
]
self.rabbitmq_client = None
self.event_publisher = None
self.leader_election = None
self.scheduler_service = None
super().__init__(
service_name="orchestrator-service",
app_name=settings.APP_NAME,
description=settings.DESCRIPTION,
version=settings.VERSION,
api_prefix="", # Empty because RouteBuilder already includes /api/v1
database_manager=database_manager,
expected_tables=orchestrator_expected_tables,
enable_messaging=True # Enable RabbitMQ for event publishing
)
async def verify_migrations(self):
"""Verify database schema matches the latest migrations"""
try:
async with self.database_manager.get_session() as session:
result = await session.execute(text("SELECT version_num FROM alembic_version"))
version = result.scalar()
if version != self.expected_migration_version:
self.logger.error(f"Migration version mismatch: expected {self.expected_migration_version}, got {version}")
raise RuntimeError(f"Migration version mismatch: expected {self.expected_migration_version}, got {version}")
self.logger.info(f"Migration verification successful: {version}")
except Exception as e:
self.logger.error(f"Migration verification failed: {e}")
raise
async def _setup_messaging(self):
"""Setup messaging for orchestrator service"""
from shared.messaging import UnifiedEventPublisher, RabbitMQClient
try:
self.rabbitmq_client = RabbitMQClient(settings.RABBITMQ_URL, service_name="orchestrator-service")
await self.rabbitmq_client.connect()
# Create event publisher
self.event_publisher = UnifiedEventPublisher(self.rabbitmq_client, "orchestrator-service")
self.logger.info("Orchestrator service messaging setup completed")
except Exception as e:
self.logger.error("Failed to setup orchestrator messaging", error=str(e))
raise
async def _cleanup_messaging(self):
"""Cleanup messaging for orchestrator service"""
try:
if self.rabbitmq_client:
await self.rabbitmq_client.disconnect()
self.logger.info("Orchestrator service messaging cleanup completed")
except Exception as e:
self.logger.error("Error during orchestrator messaging cleanup", error=str(e))
async def on_startup(self, app: FastAPI):
"""Custom startup logic for orchestrator service"""
# Verify migrations first
await self.verify_migrations()
# Call parent startup (includes database, messaging, etc.)
await super().on_startup(app)
self.logger.info("Orchestrator Service starting up...")
# Initialize leader election for horizontal scaling
# Only the leader pod will run the scheduler
await self._setup_leader_election(app)
# REMOVED: Delivery tracking service - moved to procurement service (domain ownership)
async def _setup_leader_election(self, app: FastAPI):
"""
Setup leader election for scheduler.
CRITICAL FOR HORIZONTAL SCALING:
Without leader election, each pod would run the same scheduled jobs,
causing duplicate forecasts, production schedules, and database contention.
"""
from shared.leader_election import LeaderElectionService
import redis.asyncio as redis
try:
# Create Redis connection for leader election
redis_url = f"redis://:{settings.REDIS_PASSWORD}@{settings.REDIS_HOST}:{settings.REDIS_PORT}/{settings.REDIS_DB}"
if settings.REDIS_TLS_ENABLED.lower() == "true":
redis_url = redis_url.replace("redis://", "rediss://")
redis_client = redis.from_url(redis_url, decode_responses=False)
await redis_client.ping()
# Use shared leader election service
self.leader_election = LeaderElectionService(
redis_client,
service_name="orchestrator"
)
# Define callbacks for leader state changes
async def on_become_leader():
self.logger.info("This pod became the leader - starting scheduler")
from app.services.orchestrator_service import OrchestratorSchedulerService
self.scheduler_service = OrchestratorSchedulerService(self.event_publisher, settings)
await self.scheduler_service.start()
app.state.scheduler_service = self.scheduler_service
self.logger.info("Orchestrator scheduler service started (leader only)")
async def on_lose_leader():
self.logger.warning("This pod lost leadership - stopping scheduler")
if self.scheduler_service:
await self.scheduler_service.stop()
self.scheduler_service = None
if hasattr(app.state, 'scheduler_service'):
app.state.scheduler_service = None
self.logger.info("Orchestrator scheduler service stopped (no longer leader)")
# Start leader election
await self.leader_election.start(
on_become_leader=on_become_leader,
on_lose_leader=on_lose_leader
)
# Store leader election in app state for health checks
app.state.leader_election = self.leader_election
self.logger.info("Leader election initialized",
is_leader=self.leader_election.is_leader,
instance_id=self.leader_election.instance_id)
except Exception as e:
self.logger.error("Failed to setup leader election, falling back to standalone mode",
error=str(e))
# Fallback: start scheduler anyway (for single-pod deployments)
from app.services.orchestrator_service import OrchestratorSchedulerService
self.scheduler_service = OrchestratorSchedulerService(self.event_publisher, settings)
await self.scheduler_service.start()
app.state.scheduler_service = self.scheduler_service
self.logger.warning("Scheduler started in standalone mode (no leader election)")
async def on_shutdown(self, app: FastAPI):
"""Custom shutdown logic for orchestrator service"""
self.logger.info("Orchestrator Service shutting down...")
# Stop leader election (this will also stop scheduler if we're the leader)
if self.leader_election:
await self.leader_election.stop()
self.logger.info("Leader election stopped")
# Stop scheduler service if still running
if self.scheduler_service:
await self.scheduler_service.stop()
self.logger.info("Orchestrator scheduler service stopped")
def get_service_features(self):
"""Return orchestrator-specific features"""
return [
"automated_orchestration",
"forecasting_integration",
"production_scheduling",
"procurement_planning",
"notification_dispatch",
"leader_election",
"retry_mechanism",
"circuit_breaker"
]
# Create service instance
service = OrchestratorService()
# Create FastAPI app with standardized setup
app = service.create_app()
# Setup standard endpoints (health, readiness, metrics)
service.setup_standard_endpoints()
# Include routers
# BUSINESS: Orchestration operations
from app.api.orchestration import router as orchestration_router
from app.api.internal import router as internal_router
service.add_router(orchestration_router)
service.add_router(internal_router)
# INTERNAL: Service-to-service endpoints for demo data cloning
from app.api.internal_demo import router as internal_demo_router
service.add_router(internal_demo_router, tags=["internal-demo"])
@app.middleware("http")
async def logging_middleware(request: Request, call_next):
"""Add request logging middleware"""
import time
start_time = time.time()
response = await call_next(request)
process_time = time.time() - start_time
service.logger.info("HTTP request processed",
method=request.method,
url=str(request.url),
status_code=response.status_code,
process_time=round(process_time, 4))
return response
if __name__ == "__main__":
import uvicorn
uvicorn.run(
"main:app",
host="0.0.0.0",
port=8000,
reload=settings.DEBUG
)
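# ----------------------------------------------------------------------------
# Hedged sketch of the Redis lease semantics that leader election typically
# relies on: only the pod whose SET NX succeeds holds the lease and runs the
# scheduler. The shared LeaderElectionService encapsulates this; the key name
# and TTL below are assumptions, not taken from that library.
#
#   import redis.asyncio as redis
#
#   async def try_acquire_lease(client: redis.Redis, instance_id: str) -> bool:
#       # nx=True -> set only if the key is absent; ex=30 -> 30 second lease TTL
#       return bool(await client.set("leader:orchestrator-service", instance_id, nx=True, ex=30))
# ----------------------------------------------------------------------------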

View File

View File

@@ -0,0 +1,894 @@
"""
AI-Enhanced Orchestration Saga
Integrates ML insights into daily workflow orchestration
"""
import pandas as pd
import numpy as np
from typing import Dict, List, Any, Optional, Tuple
from datetime import datetime, timedelta
from uuid import UUID
import structlog
from shared.clients.ai_insights_client import AIInsightsClient
logger = structlog.get_logger()
class AIEnhancedOrchestrator:
"""
Enhanced orchestration engine that integrates ML insights into daily workflow.
Workflow:
1. Pre-Orchestration: Gather all relevant insights for target date
2. Intelligent Planning: Modify orchestration plan based on insights
3. Execution: Apply insights with confidence-based decision making
4. Feedback Tracking: Record outcomes for continuous learning
Replaces hardcoded logic with learned intelligence from:
- Demand Forecasting
- Supplier Performance
- Safety Stock Optimization
- Price Forecasting
- Production Yield Prediction
- Dynamic Business Rules
"""
def __init__(
self,
ai_insights_base_url: str = "http://ai-insights-service:8000",
min_confidence_threshold: int = 70
):
self.ai_insights_client = AIInsightsClient(ai_insights_base_url)
self.min_confidence_threshold = min_confidence_threshold
self.applied_insights = [] # Track applied insights for feedback
async def orchestrate_with_ai(
self,
tenant_id: str,
target_date: datetime,
base_orchestration_plan: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
"""
Run AI-enhanced orchestration for a target date.
Args:
tenant_id: Tenant identifier
target_date: Date to orchestrate for
base_orchestration_plan: Optional base plan to enhance (if None, creates new)
Returns:
Enhanced orchestration plan with applied insights and metadata
"""
logger.info(
"Starting AI-enhanced orchestration",
tenant_id=tenant_id,
target_date=target_date.isoformat()
)
# Step 1: Gather insights for target date
insights = await self._gather_insights(tenant_id, target_date)
logger.info(
"Insights gathered",
demand_forecasts=len(insights['demand_forecasts']),
supplier_alerts=len(insights['supplier_alerts']),
inventory_optimizations=len(insights['inventory_optimizations']),
price_opportunities=len(insights['price_opportunities']),
yield_predictions=len(insights['yield_predictions']),
business_rules=len(insights['business_rules'])
)
# Step 2: Initialize or load base plan
if base_orchestration_plan is None:
orchestration_plan = self._create_base_plan(target_date)
else:
orchestration_plan = base_orchestration_plan.copy()
# Step 3: Apply insights to plan
enhanced_plan = await self._apply_insights_to_plan(
orchestration_plan, insights, tenant_id
)
# Step 4: Generate execution summary
execution_summary = self._generate_execution_summary(
enhanced_plan, insights
)
logger.info(
"AI-enhanced orchestration complete",
tenant_id=tenant_id,
insights_applied=execution_summary['total_insights_applied'],
modifications=execution_summary['total_modifications']
)
return {
'tenant_id': tenant_id,
'target_date': target_date.isoformat(),
'orchestrated_at': datetime.utcnow().isoformat(),
'plan': enhanced_plan,
'insights_used': insights,
'execution_summary': execution_summary,
'applied_insights': self.applied_insights
}
async def _gather_insights(
self,
tenant_id: str,
target_date: datetime
) -> Dict[str, List[Dict[str, Any]]]:
"""
Gather all relevant insights for target date from AI Insights Service.
Returns insights categorized by type:
- demand_forecasts
- supplier_alerts
- inventory_optimizations
- price_opportunities
- yield_predictions
- business_rules
"""
# Get orchestration-ready insights
insights = await self.ai_insights_client.get_orchestration_ready_insights(
tenant_id=UUID(tenant_id),
target_date=target_date,
min_confidence=self.min_confidence_threshold
)
# Categorize insights by source
categorized = {
'demand_forecasts': [],
'supplier_alerts': [],
'inventory_optimizations': [],
'price_opportunities': [],
'yield_predictions': [],
'business_rules': [],
'other': []
}
for insight in insights:
source_model = insight.get('source_model', '')
category = insight.get('category', '')
if source_model == 'hybrid_forecaster' or category == 'demand':
categorized['demand_forecasts'].append(insight)
elif source_model == 'supplier_performance_predictor':
categorized['supplier_alerts'].append(insight)
elif source_model == 'safety_stock_optimizer':
categorized['inventory_optimizations'].append(insight)
elif source_model == 'price_forecaster':
categorized['price_opportunities'].append(insight)
elif source_model == 'yield_predictor':
categorized['yield_predictions'].append(insight)
elif source_model == 'business_rules_engine':
categorized['business_rules'].append(insight)
else:
categorized['other'].append(insight)
return categorized
def _create_base_plan(self, target_date: datetime) -> Dict[str, Any]:
"""Create base orchestration plan with default hardcoded values."""
return {
'target_date': target_date.isoformat(),
'procurement': {
'orders': [],
'supplier_selections': {},
'order_quantities': {}
},
'inventory': {
'safety_stock_levels': {},
'reorder_points': {},
'transfers': []
},
'production': {
'production_runs': [],
'recipe_quantities': {},
'worker_assignments': {}
},
'sales': {
'forecasted_demand': {},
'pricing_adjustments': {}
},
'modifications': [],
'ai_enhancements': []
}
async def _apply_insights_to_plan(
self,
plan: Dict[str, Any],
insights: Dict[str, List[Dict[str, Any]]],
tenant_id: str
) -> Dict[str, Any]:
"""
Apply categorized insights to orchestration plan.
Each insight type modifies specific parts of the plan:
- Demand forecasts → sales forecasts, production quantities
- Supplier alerts → supplier selection, procurement timing
- Inventory optimizations → safety stock levels, reorder points
- Price opportunities → procurement timing, order quantities
- Yield predictions → production quantities, worker assignments
- Business rules → cross-cutting modifications
"""
enhanced_plan = plan.copy()
# Apply demand forecasts
if insights['demand_forecasts']:
enhanced_plan = await self._apply_demand_forecasts(
enhanced_plan, insights['demand_forecasts'], tenant_id
)
# Apply supplier alerts
if insights['supplier_alerts']:
enhanced_plan = await self._apply_supplier_alerts(
enhanced_plan, insights['supplier_alerts'], tenant_id
)
# Apply inventory optimizations
if insights['inventory_optimizations']:
enhanced_plan = await self._apply_inventory_optimizations(
enhanced_plan, insights['inventory_optimizations'], tenant_id
)
# Apply price opportunities
if insights['price_opportunities']:
enhanced_plan = await self._apply_price_opportunities(
enhanced_plan, insights['price_opportunities'], tenant_id
)
# Apply yield predictions
if insights['yield_predictions']:
enhanced_plan = await self._apply_yield_predictions(
enhanced_plan, insights['yield_predictions'], tenant_id
)
# Apply business rules (highest priority, can override)
if insights['business_rules']:
enhanced_plan = await self._apply_business_rules(
enhanced_plan, insights['business_rules'], tenant_id
)
return enhanced_plan
async def _apply_demand_forecasts(
self,
plan: Dict[str, Any],
forecasts: List[Dict[str, Any]],
tenant_id: str
) -> Dict[str, Any]:
"""
Apply demand forecasts to sales and production planning.
Modifications:
- Update sales forecasted_demand
- Adjust production recipe_quantities
- Record insight application
"""
for forecast in forecasts:
if forecast['confidence'] < self.min_confidence_threshold:
continue
metrics = forecast.get('metrics_json', {})
product_id = metrics.get('product_id')
predicted_demand = metrics.get('predicted_demand')
forecast_date = metrics.get('forecast_date')
if not product_id or predicted_demand is None:
continue
# Update sales forecast
plan['sales']['forecasted_demand'][product_id] = {
'quantity': predicted_demand,
'confidence': forecast['confidence'],
'source': 'ai_forecast',
'insight_id': forecast.get('id')
}
# Adjust production quantities (demand + buffer)
buffer_pct = 1.10 # 10% buffer for uncertainty
production_quantity = int(predicted_demand * buffer_pct)
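# e.g. predicted_demand = 120 with the 10% buffer -> production_quantity = int(120 * 1.10) = 132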
plan['production']['recipe_quantities'][product_id] = {
'quantity': production_quantity,
'demand_forecast': predicted_demand,
'buffer_applied': buffer_pct,
'source': 'ai_forecast',
'insight_id': forecast.get('id')
}
# Record modification
plan['modifications'].append({
'type': 'demand_forecast_applied',
'insight_id': forecast.get('id'),
'product_id': product_id,
'predicted_demand': predicted_demand,
'production_quantity': production_quantity,
'confidence': forecast['confidence']
})
# Track for feedback
self.applied_insights.append({
'insight_id': forecast.get('id'),
'type': 'demand_forecast',
'applied_at': datetime.utcnow().isoformat(),
'tenant_id': tenant_id,
'metrics': {
'product_id': product_id,
'predicted_demand': predicted_demand,
'production_quantity': production_quantity
}
})
logger.info(
"Applied demand forecast",
product_id=product_id,
predicted_demand=predicted_demand,
production_quantity=production_quantity
)
return plan
async def _apply_supplier_alerts(
self,
plan: Dict[str, Any],
alerts: List[Dict[str, Any]],
tenant_id: str
) -> Dict[str, Any]:
"""
Apply supplier performance alerts to procurement decisions.
Modifications:
- Switch suppliers for low reliability
- Adjust lead times for delays
- Increase order quantities for short deliveries
"""
for alert in alerts:
if alert['confidence'] < self.min_confidence_threshold:
continue
metrics = alert.get('metrics_json', {})
supplier_id = metrics.get('supplier_id')
reliability_score = metrics.get('reliability_score')
predicted_delay = metrics.get('predicted_delivery_delay_days')
if not supplier_id:
continue
# Low reliability: recommend supplier switch
if reliability_score and reliability_score < 70:
plan['procurement']['supplier_selections'][supplier_id] = {
'action': 'avoid',
'reason': f'Low reliability score: {reliability_score}',
'alternative_required': True,
'source': 'supplier_alert',
'insight_id': alert.get('id')
}
plan['modifications'].append({
'type': 'supplier_switch_recommended',
'insight_id': alert.get('id'),
'supplier_id': supplier_id,
'reliability_score': reliability_score,
'confidence': alert['confidence']
})
# Delay predicted: adjust lead time
if predicted_delay and predicted_delay > 1:
plan['procurement']['supplier_selections'][supplier_id] = {
'action': 'adjust_lead_time',
'additional_lead_days': int(predicted_delay),
'reason': f'Predicted delay: {predicted_delay} days',
'source': 'supplier_alert',
'insight_id': alert.get('id')
}
plan['modifications'].append({
'type': 'lead_time_adjusted',
'insight_id': alert.get('id'),
'supplier_id': supplier_id,
'additional_days': int(predicted_delay),
'confidence': alert['confidence']
})
# Track for feedback
self.applied_insights.append({
'insight_id': alert.get('id'),
'type': 'supplier_alert',
'applied_at': datetime.utcnow().isoformat(),
'tenant_id': tenant_id,
'metrics': {
'supplier_id': supplier_id,
'reliability_score': reliability_score,
'predicted_delay': predicted_delay
}
})
logger.info(
"Applied supplier alert",
supplier_id=supplier_id,
reliability_score=reliability_score,
predicted_delay=predicted_delay
)
return plan
async def _apply_inventory_optimizations(
self,
plan: Dict[str, Any],
optimizations: List[Dict[str, Any]],
tenant_id: str
) -> Dict[str, Any]:
"""
Apply safety stock optimizations to inventory management.
Modifications:
- Update safety stock levels (from hardcoded 95% to learned optimal)
- Adjust reorder points accordingly
"""
for optimization in optimizations:
if optimization['confidence'] < self.min_confidence_threshold:
continue
metrics = optimization.get('metrics_json', {})
product_id = metrics.get('inventory_product_id')
optimal_safety_stock = metrics.get('optimal_safety_stock')
optimal_service_level = metrics.get('optimal_service_level')
if not product_id or optimal_safety_stock is None:
continue
# Update safety stock level
plan['inventory']['safety_stock_levels'][product_id] = {
'quantity': optimal_safety_stock,
'service_level': optimal_service_level,
'source': 'ai_optimization',
'insight_id': optimization.get('id'),
'replaced_hardcoded': True
}
# Adjust reorder point (lead time demand + safety stock)
# This would use demand forecast if available
lead_time_demand = metrics.get('lead_time_demand', optimal_safety_stock * 2)
reorder_point = lead_time_demand + optimal_safety_stock
plan['inventory']['reorder_points'][product_id] = {
'quantity': reorder_point,
'lead_time_demand': lead_time_demand,
'safety_stock': optimal_safety_stock,
'source': 'ai_optimization',
'insight_id': optimization.get('id')
}
plan['modifications'].append({
'type': 'safety_stock_optimized',
'insight_id': optimization.get('id'),
'product_id': product_id,
'optimal_safety_stock': optimal_safety_stock,
'optimal_service_level': optimal_service_level,
'confidence': optimization['confidence']
})
# Track for feedback
self.applied_insights.append({
'insight_id': optimization.get('id'),
'type': 'inventory_optimization',
'applied_at': datetime.utcnow().isoformat(),
'tenant_id': tenant_id,
'metrics': {
'product_id': product_id,
'optimal_safety_stock': optimal_safety_stock,
'reorder_point': reorder_point
}
})
logger.info(
"Applied safety stock optimization",
product_id=product_id,
optimal_safety_stock=optimal_safety_stock,
reorder_point=reorder_point
)
return plan
async def _apply_price_opportunities(
self,
plan: Dict[str, Any],
opportunities: List[Dict[str, Any]],
tenant_id: str
) -> Dict[str, Any]:
"""
Apply price forecasting opportunities to procurement timing.
Modifications:
- Advance orders for predicted price increases
- Delay orders for predicted price decreases
- Increase quantities for bulk opportunities
"""
for opportunity in opportunities:
if opportunity['confidence'] < self.min_confidence_threshold:
continue
metrics = opportunity.get('metrics_json', {})
ingredient_id = metrics.get('ingredient_id')
recommendation = metrics.get('recommendation')
expected_price_change = metrics.get('expected_price_change_pct')
if not ingredient_id or not recommendation:
continue
# Buy now: price increasing
if recommendation == 'buy_now' and expected_price_change and expected_price_change > 5:
plan['procurement']['order_quantities'][ingredient_id] = {
'action': 'increase',
'multiplier': 1.5, # Buy 50% more
'reason': f'Price expected to increase {expected_price_change:.1f}%',
'source': 'price_forecast',
'insight_id': opportunity.get('id')
}
plan['modifications'].append({
'type': 'bulk_purchase_opportunity',
'insight_id': opportunity.get('id'),
'ingredient_id': ingredient_id,
'expected_price_change': expected_price_change,
'quantity_multiplier': 1.5,
'confidence': opportunity['confidence']
})
# Wait: price decreasing
elif recommendation == 'wait' and expected_price_change and expected_price_change < -5:
plan['procurement']['order_quantities'][ingredient_id] = {
'action': 'delay',
'delay_days': 7,
'reason': f'Price expected to decrease {abs(expected_price_change):.1f}%',
'source': 'price_forecast',
'insight_id': opportunity.get('id')
}
plan['modifications'].append({
'type': 'procurement_delayed',
'insight_id': opportunity.get('id'),
'ingredient_id': ingredient_id,
'expected_price_change': expected_price_change,
'delay_days': 7,
'confidence': opportunity['confidence']
})
# Track for feedback
self.applied_insights.append({
'insight_id': opportunity.get('id'),
'type': 'price_opportunity',
'applied_at': datetime.utcnow().isoformat(),
'tenant_id': tenant_id,
'metrics': {
'ingredient_id': ingredient_id,
'recommendation': recommendation,
'expected_price_change': expected_price_change
}
})
logger.info(
"Applied price opportunity",
ingredient_id=ingredient_id,
recommendation=recommendation,
expected_price_change=expected_price_change
)
return plan
async def _apply_yield_predictions(
self,
plan: Dict[str, Any],
predictions: List[Dict[str, Any]],
tenant_id: str
) -> Dict[str, Any]:
"""
Apply production yield predictions to production planning.
Modifications:
- Increase production quantities for low predicted yield
- Optimize worker assignments
- Adjust production timing
"""
for prediction in predictions:
if prediction['confidence'] < self.min_confidence_threshold:
continue
metrics = prediction.get('metrics_json', {})
recipe_id = metrics.get('recipe_id')
predicted_yield = metrics.get('predicted_yield')
expected_waste = metrics.get('expected_waste')
if not recipe_id or predicted_yield is None:
continue
# Low yield: increase production quantity to compensate
if 0 < predicted_yield < 90:  # guard against zero/negative yields before dividing below
current_quantity = plan['production']['recipe_quantities'].get(
recipe_id, {}
).get('quantity', 100)
# Adjust quantity to account for predicted waste
adjusted_quantity = int(current_quantity * (100 / predicted_yield))
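# e.g. current_quantity = 100 and predicted_yield = 80 -> adjusted_quantity = int(100 * (100 / 80)) = 125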
plan['production']['recipe_quantities'][recipe_id] = {
'quantity': adjusted_quantity,
'predicted_yield': predicted_yield,
'waste_compensation': adjusted_quantity - current_quantity,
'source': 'yield_prediction',
'insight_id': prediction.get('id')
}
plan['modifications'].append({
'type': 'yield_compensation_applied',
'insight_id': prediction.get('id'),
'recipe_id': recipe_id,
'predicted_yield': predicted_yield,
'original_quantity': current_quantity,
'adjusted_quantity': adjusted_quantity,
'confidence': prediction['confidence']
})
# Track for feedback
self.applied_insights.append({
'insight_id': prediction.get('id'),
'type': 'yield_prediction',
'applied_at': datetime.utcnow().isoformat(),
'tenant_id': tenant_id,
'metrics': {
'recipe_id': recipe_id,
'predicted_yield': predicted_yield,
'expected_waste': expected_waste
}
})
logger.info(
"Applied yield prediction",
recipe_id=recipe_id,
predicted_yield=predicted_yield
)
return plan
async def _apply_business_rules(
self,
plan: Dict[str, Any],
rules: List[Dict[str, Any]],
tenant_id: str
) -> Dict[str, Any]:
"""
Apply dynamic business rules to orchestration plan.
Business rules can override other insights based on business logic.
"""
for rule in rules:
if rule['confidence'] < self.min_confidence_threshold:
continue
# Business rules are flexible and defined in JSONB
# Parse recommendation_actions to understand what to apply
actions = rule.get('recommendation_actions', [])
for action in actions:
action_type = action.get('action')
params = action.get('params', {})
# Example: Force supplier switch
if action_type == 'force_supplier_switch':
supplier_id = params.get('from_supplier_id')
alternate_id = params.get('to_supplier_id')
if supplier_id and alternate_id:
plan['procurement']['supplier_selections'][supplier_id] = {
'action': 'replace',
'alternate_supplier': alternate_id,
'reason': rule.get('description'),
'source': 'business_rule',
'insight_id': rule.get('id'),
'override': True
}
# Example: Halt production
elif action_type == 'halt_production':
recipe_id = params.get('recipe_id')
if recipe_id:
plan['production']['recipe_quantities'][recipe_id] = {
'quantity': 0,
'halted': True,
'reason': rule.get('description'),
'source': 'business_rule',
'insight_id': rule.get('id')
}
plan['modifications'].append({
'type': 'business_rule_applied',
'insight_id': rule.get('id'),
'rule_description': rule.get('description'),
'confidence': rule['confidence']
})
# Track for feedback
self.applied_insights.append({
'insight_id': rule.get('id'),
'type': 'business_rule',
'applied_at': datetime.utcnow().isoformat(),
'tenant_id': tenant_id,
'metrics': {'actions': len(actions)}
})
logger.info(
"Applied business rule",
rule_description=rule.get('title')
)
return plan
def _generate_execution_summary(
self,
plan: Dict[str, Any],
insights: Dict[str, List[Dict[str, Any]]]
) -> Dict[str, Any]:
"""Generate summary of AI-enhanced orchestration execution."""
total_insights_available = sum(len(v) for v in insights.values())
total_insights_applied = len(self.applied_insights)
total_modifications = len(plan.get('modifications', []))
# Count by type (map plural category names to the singular `type` values
# recorded in self.applied_insights; a bare rstrip('s') would mangle
# 'price_opportunities')
singular = {
'demand_forecasts': 'demand_forecast', 'supplier_alerts': 'supplier_alert',
'inventory_optimizations': 'inventory_optimization', 'price_opportunities': 'price_opportunity',
'yield_predictions': 'yield_prediction', 'business_rules': 'business_rule', 'other': 'other'
}
insights_by_type = {}
for category, category_insights in insights.items():
insights_by_type[category] = {
'available': len(category_insights),
'applied': len([
i for i in self.applied_insights
if i['type'] == singular.get(category, category)
])
}
return {
'total_insights_available': total_insights_available,
'total_insights_applied': total_insights_applied,
'total_modifications': total_modifications,
'application_rate': round(
(total_insights_applied / total_insights_available * 100)
if total_insights_available > 0 else 0,
2
),
'insights_by_type': insights_by_type,
'modifications_summary': self._summarize_modifications(plan)
}
def _summarize_modifications(self, plan: Dict[str, Any]) -> Dict[str, int]:
"""Summarize modifications by type."""
modifications = plan.get('modifications', [])
summary = {}
for mod in modifications:
mod_type = mod.get('type', 'unknown')
summary[mod_type] = summary.get(mod_type, 0) + 1
return summary
async def record_orchestration_feedback(
self,
tenant_id: str,
target_date: datetime,
actual_outcomes: Dict[str, Any]
) -> Dict[str, Any]:
"""
Record feedback for applied insights to enable continuous learning.
Args:
tenant_id: Tenant identifier
target_date: Orchestration target date
actual_outcomes: Actual results:
- actual_demand: {product_id: actual_quantity}
- actual_yields: {recipe_id: actual_yield_pct}
- actual_costs: {ingredient_id: actual_price}
- supplier_performance: {supplier_id: on_time_delivery}
Returns:
Feedback recording results
"""
logger.info(
"Recording orchestration feedback",
tenant_id=tenant_id,
target_date=target_date.isoformat(),
applied_insights=len(self.applied_insights)
)
feedback_results = []
for applied in self.applied_insights:
insight_id = applied.get('insight_id')
insight_type = applied.get('type')
metrics = applied.get('metrics', {})
# Prepare feedback based on type
feedback_data = {
'applied': True,
'applied_at': applied.get('applied_at'),
'outcome_date': target_date.isoformat()
}
# Demand forecast feedback
if insight_type == 'demand_forecast':
product_id = metrics.get('product_id')
predicted_demand = metrics.get('predicted_demand')
actual_demand = actual_outcomes.get('actual_demand', {}).get(product_id)
if actual_demand is not None:
error = abs(actual_demand - predicted_demand)
error_pct = (error / actual_demand * 100) if actual_demand > 0 else 0
feedback_data['outcome_metrics'] = {
'predicted_demand': predicted_demand,
'actual_demand': actual_demand,
'error': error,
'error_pct': round(error_pct, 2),
'accuracy': round(100 - error_pct, 2)
}
# Yield prediction feedback
elif insight_type == 'yield_prediction':
recipe_id = metrics.get('recipe_id')
predicted_yield = metrics.get('predicted_yield')
actual_yield = actual_outcomes.get('actual_yields', {}).get(recipe_id)
if actual_yield is not None:
error = abs(actual_yield - predicted_yield)
feedback_data['outcome_metrics'] = {
'predicted_yield': predicted_yield,
'actual_yield': actual_yield,
'error': round(error, 2),
'accuracy': round(100 - (error / actual_yield * 100), 2) if actual_yield > 0 else 0
}
# Record feedback via AI Insights Client
try:
await self.ai_insights_client.record_feedback(
tenant_id=UUID(tenant_id),
insight_id=UUID(insight_id) if insight_id else None,
feedback_data=feedback_data
)
feedback_results.append({
'insight_id': insight_id,
'insight_type': insight_type,
'status': 'recorded',
'feedback': feedback_data
})
except Exception as e:
logger.error(
"Error recording feedback",
insight_id=insight_id,
error=str(e)
)
feedback_results.append({
'insight_id': insight_id,
'insight_type': insight_type,
'status': 'failed',
'error': str(e)
})
logger.info(
"Feedback recording complete",
total=len(feedback_results),
successful=len([r for r in feedback_results if r['status'] == 'recorded'])
)
return {
'tenant_id': tenant_id,
'target_date': target_date.isoformat(),
'feedback_recorded_at': datetime.utcnow().isoformat(),
'total_insights': len(self.applied_insights),
'feedback_results': feedback_results,
'successful': len([r for r in feedback_results if r['status'] == 'recorded']),
'failed': len([r for r in feedback_results if r['status'] == 'failed'])
}
async def close(self):
"""Close HTTP client connections."""
await self.ai_insights_client.close()
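# ----------------------------------------------------------------------------
# Hedged usage sketch for AIEnhancedOrchestrator (illustrative only; the
# tenant and product identifiers and the outcome figure are placeholders, and
# how the caller obtains actual outcomes is outside this module):
#
#   orchestrator = AIEnhancedOrchestrator(
#       ai_insights_base_url="http://ai-insights-service:8000",
#       min_confidence_threshold=70,
#   )
#   try:
#       target = datetime.utcnow() + timedelta(days=1)
#       result = await orchestrator.orchestrate_with_ai(
#           tenant_id="<tenant-uuid>",
#           target_date=target,
#       )
#       # ... later, once real numbers are known for the target date:
#       await orchestrator.record_orchestration_feedback(
#           tenant_id="<tenant-uuid>",
#           target_date=target,
#           actual_outcomes={"actual_demand": {"<product-id>": 118}},
#       )
#   finally:
#       await orchestrator.close()
# ----------------------------------------------------------------------------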

View File

@@ -0,0 +1,13 @@
# ================================================================
# services/orchestrator/app/models/__init__.py
# ================================================================
"""
Orchestrator Service Models
"""
from .orchestration_run import OrchestrationRun, OrchestrationStatus
__all__ = [
"OrchestrationRun",
"OrchestrationStatus",
]

View File

@@ -0,0 +1,113 @@
# ================================================================
# services/orchestrator/app/models/orchestration_run.py
# ================================================================
"""
Orchestration Run Models - Audit trail for orchestration executions
"""
import uuid
import enum
from datetime import datetime, timezone
from sqlalchemy import Column, String, DateTime, Integer, Text, Boolean, Enum as SQLEnum
from sqlalchemy.dialects.postgresql import UUID, JSONB
from sqlalchemy.sql import func
from shared.database.base import Base
class OrchestrationStatus(enum.Enum):
"""Orchestration run status"""
pending = "pending"
running = "running"
completed = "completed"
partial_success = "partial_success"
failed = "failed"
cancelled = "cancelled"
class OrchestrationRun(Base):
"""Audit trail for orchestration executions"""
__tablename__ = "orchestration_runs"
# Primary identification
id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
run_number = Column(String(50), nullable=False, unique=True, index=True)
# Run details
tenant_id = Column(UUID(as_uuid=True), nullable=False, index=True)
status = Column(SQLEnum(OrchestrationStatus), nullable=False, default=OrchestrationStatus.pending, index=True)
run_type = Column(String(50), nullable=False, default="scheduled") # scheduled, manual, test
priority = Column(String(20), nullable=False, default="normal") # normal, high, critical
# Timing
started_at = Column(DateTime(timezone=True), nullable=False, default=lambda: datetime.now(timezone.utc))
completed_at = Column(DateTime(timezone=True), nullable=True)
duration_seconds = Column(Integer, nullable=True)
# Step tracking
forecasting_started_at = Column(DateTime(timezone=True), nullable=True)
forecasting_completed_at = Column(DateTime(timezone=True), nullable=True)
forecasting_status = Column(String(20), nullable=True) # success, failed, skipped
forecasting_error = Column(Text, nullable=True)
production_started_at = Column(DateTime(timezone=True), nullable=True)
production_completed_at = Column(DateTime(timezone=True), nullable=True)
production_status = Column(String(20), nullable=True) # success, failed, skipped
production_error = Column(Text, nullable=True)
procurement_started_at = Column(DateTime(timezone=True), nullable=True)
procurement_completed_at = Column(DateTime(timezone=True), nullable=True)
procurement_status = Column(String(20), nullable=True) # success, failed, skipped
procurement_error = Column(Text, nullable=True)
notification_started_at = Column(DateTime(timezone=True), nullable=True)
notification_completed_at = Column(DateTime(timezone=True), nullable=True)
notification_status = Column(String(20), nullable=True) # success, failed, skipped
notification_error = Column(Text, nullable=True)
# AI Insights tracking
ai_insights_started_at = Column(DateTime(timezone=True), nullable=True)
ai_insights_completed_at = Column(DateTime(timezone=True), nullable=True)
ai_insights_status = Column(String(20), nullable=True) # success, failed, skipped
ai_insights_error = Column(Text, nullable=True)
ai_insights_generated = Column(Integer, nullable=False, default=0)
ai_insights_posted = Column(Integer, nullable=False, default=0)
# Results summary
forecasts_generated = Column(Integer, nullable=False, default=0)
production_batches_created = Column(Integer, nullable=False, default=0)
procurement_plans_created = Column(Integer, nullable=False, default=0)
purchase_orders_created = Column(Integer, nullable=False, default=0)
notifications_sent = Column(Integer, nullable=False, default=0)
# Forecast data passed between services
forecast_data = Column(JSONB, nullable=True) # Store forecast results for downstream services
# Error handling
retry_count = Column(Integer, nullable=False, default=0)
max_retries_reached = Column(Boolean, nullable=False, default=False)
error_message = Column(Text, nullable=True)
error_details = Column(JSONB, nullable=True)
# External references
forecast_id = Column(UUID(as_uuid=True), nullable=True)
production_schedule_id = Column(UUID(as_uuid=True), nullable=True)
procurement_plan_id = Column(UUID(as_uuid=True), nullable=True)
# Saga tracking
saga_steps_total = Column(Integer, nullable=False, default=0)
saga_steps_completed = Column(Integer, nullable=False, default=0)
# Audit fields
created_at = Column(DateTime(timezone=True), server_default=func.now(), nullable=False)
updated_at = Column(DateTime(timezone=True), server_default=func.now(), onupdate=func.now(), nullable=False)
triggered_by = Column(String(100), nullable=True) # scheduler, user_id, api
# Performance metrics
fulfillment_rate = Column(Integer, nullable=True) # Percentage as integer (0-100)
on_time_delivery_rate = Column(Integer, nullable=True) # Percentage as integer (0-100)
cost_accuracy = Column(Integer, nullable=True) # Percentage as integer (0-100)
quality_score = Column(Integer, nullable=True) # Rating as integer (0-100)
# Metadata
run_metadata = Column(JSONB, nullable=True)

View File

@@ -0,0 +1,193 @@
# ================================================================
# services/orchestrator/app/repositories/orchestration_run_repository.py
# ================================================================
"""
Orchestration Run Repository - Database operations for orchestration audit trail
"""
import uuid
from datetime import datetime, date, timezone
from typing import List, Optional, Dict, Any
from sqlalchemy import select, and_, desc, func
from sqlalchemy.ext.asyncio import AsyncSession
from app.models.orchestration_run import OrchestrationRun, OrchestrationStatus
class OrchestrationRunRepository:
"""Repository for orchestration run operations"""
def __init__(self, db: AsyncSession):
self.db = db
async def create_run(self, run_data: Dict[str, Any]) -> OrchestrationRun:
"""Create a new orchestration run"""
run = OrchestrationRun(**run_data)
self.db.add(run)
await self.db.flush()
return run
async def get_run_by_id(self, run_id: uuid.UUID) -> Optional[OrchestrationRun]:
"""Get orchestration run by ID"""
stmt = select(OrchestrationRun).where(OrchestrationRun.id == run_id)
result = await self.db.execute(stmt)
return result.scalar_one_or_none()
async def update_run(self, run_id: uuid.UUID, updates: Dict[str, Any]) -> Optional[OrchestrationRun]:
"""Update orchestration run"""
run = await self.get_run_by_id(run_id)
if not run:
return None
for key, value in updates.items():
if hasattr(run, key):
setattr(run, key, value)
run.updated_at = datetime.now(timezone.utc)
await self.db.flush()
return run
async def list_runs(
self,
tenant_id: Optional[uuid.UUID] = None,
status: Optional[OrchestrationStatus] = None,
start_date: Optional[date] = None,
end_date: Optional[date] = None,
limit: int = 50,
offset: int = 0
) -> List[OrchestrationRun]:
"""List orchestration runs with filters"""
conditions = []
if tenant_id:
conditions.append(OrchestrationRun.tenant_id == tenant_id)
if status:
conditions.append(OrchestrationRun.status == status)
if start_date:
conditions.append(func.date(OrchestrationRun.started_at) >= start_date)
if end_date:
conditions.append(func.date(OrchestrationRun.started_at) <= end_date)
stmt = (
select(OrchestrationRun)
.where(and_(*conditions) if conditions else True)
.order_by(desc(OrchestrationRun.started_at))
.limit(limit)
.offset(offset)
)
result = await self.db.execute(stmt)
return result.scalars().all()
async def get_latest_run_for_tenant(self, tenant_id: uuid.UUID) -> Optional[OrchestrationRun]:
"""Get the most recent orchestration run for a tenant"""
stmt = (
select(OrchestrationRun)
.where(OrchestrationRun.tenant_id == tenant_id)
.order_by(desc(OrchestrationRun.started_at))
.limit(1)
)
result = await self.db.execute(stmt)
return result.scalar_one_or_none()
async def generate_run_number(self) -> str:
"""
Generate unique run number atomically using database-level counting.
Uses MAX(run_number) + 1 approach to avoid race conditions
between reading count and inserting new record.
"""
today = date.today()
date_str = today.strftime("%Y%m%d")
# Get the highest run number for today's prefix
# (MAX on the suffix is less race-prone than counting rows; the unique
# constraint on run_number catches any remaining collision)
stmt = select(func.max(OrchestrationRun.run_number)).where(
OrchestrationRun.run_number.like(f"ORCH-{date_str}-%")
)
result = await self.db.execute(stmt)
max_run_number = result.scalar()
if max_run_number:
# Extract the numeric suffix and increment it
try:
suffix = int(max_run_number.split('-')[-1])
next_number = suffix + 1
except (ValueError, IndexError):
# Fallback to 1 if parsing fails
next_number = 1
else:
# No runs for today yet
next_number = 1
return f"ORCH-{date_str}-{next_number:04d}"
async def get_failed_runs(self, limit: int = 10) -> List[OrchestrationRun]:
"""Get recent failed orchestration runs"""
stmt = (
select(OrchestrationRun)
.where(OrchestrationRun.status == OrchestrationStatus.failed)
.order_by(desc(OrchestrationRun.started_at))
.limit(limit)
)
result = await self.db.execute(stmt)
return result.scalars().all()
async def get_run_statistics(
self,
start_date: Optional[date] = None,
end_date: Optional[date] = None
) -> Dict[str, Any]:
"""Get orchestration run statistics"""
conditions = []
if start_date:
conditions.append(func.date(OrchestrationRun.started_at) >= start_date)
if end_date:
conditions.append(func.date(OrchestrationRun.started_at) <= end_date)
where_clause = and_(*conditions) if conditions else True
# Total runs
total_stmt = select(func.count(OrchestrationRun.id)).where(where_clause)
total_result = await self.db.execute(total_stmt)
total_runs = total_result.scalar() or 0
# Successful runs
success_stmt = select(func.count(OrchestrationRun.id)).where(
and_(
where_clause,
OrchestrationRun.status == OrchestrationStatus.completed
)
)
success_result = await self.db.execute(success_stmt)
successful_runs = success_result.scalar() or 0
# Failed runs
failed_stmt = select(func.count(OrchestrationRun.id)).where(
and_(
where_clause,
OrchestrationRun.status == OrchestrationStatus.failed
)
)
failed_result = await self.db.execute(failed_stmt)
failed_runs = failed_result.scalar() or 0
# Average duration
avg_duration_stmt = select(func.avg(OrchestrationRun.duration_seconds)).where(
and_(
where_clause,
OrchestrationRun.status == OrchestrationStatus.completed
)
)
avg_duration_result = await self.db.execute(avg_duration_stmt)
avg_duration = avg_duration_result.scalar() or 0
return {
'total_runs': total_runs,
'successful_runs': successful_runs,
'failed_runs': failed_runs,
'success_rate': (successful_runs / total_runs * 100) if total_runs > 0 else 0,
'average_duration_seconds': float(avg_duration) if avg_duration else 0
}
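# ----------------------------------------------------------------------------
# Hedged usage sketch (illustrative only; AsyncSessionLocal comes from
# app.core.database, tenant_uuid is a placeholder, and the caller owns the
# session lifecycle, since this repository only flushes and never commits):
#
#   async with AsyncSessionLocal() as session:
#       repo = OrchestrationRunRepository(session)
#       run = await repo.create_run({
#           "run_number": await repo.generate_run_number(),
#           "tenant_id": tenant_uuid,
#           "status": OrchestrationStatus.running,
#           "run_type": "manual",
#       })
#       await session.commit()
# ----------------------------------------------------------------------------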

View File

@@ -0,0 +1,162 @@
"""
Orchestration Notification Service - Simplified
Emits minimal events using EventPublisher.
All enrichment handled by alert_processor.
"""
from datetime import datetime, timezone
from typing import Optional, Dict, Any
from uuid import UUID
import structlog
from shared.messaging import UnifiedEventPublisher
logger = structlog.get_logger()
class OrchestrationNotificationService:
"""
Service for emitting orchestration notifications using EventPublisher.
"""
def __init__(self, event_publisher: UnifiedEventPublisher):
self.publisher = event_publisher
async def emit_orchestration_run_started_notification(
self,
tenant_id: UUID,
run_id: str,
run_type: str, # 'scheduled', 'manual', 'triggered'
scope: str, # 'full', 'inventory_only', 'production_only'
) -> None:
"""
Emit notification when an orchestration run starts.
"""
metadata = {
"run_id": run_id,
"run_type": run_type,
"scope": scope,
"started_at": datetime.now(timezone.utc).isoformat(),
}
await self.publisher.publish_notification(
event_type="operations.orchestration_run_started",
tenant_id=tenant_id,
data=metadata
)
logger.info(
"orchestration_run_started_notification_emitted",
tenant_id=str(tenant_id),
run_id=run_id
)
async def emit_orchestration_run_completed_notification(
self,
tenant_id: UUID,
run_id: str,
duration_seconds: float,
actions_created: int,
actions_by_type: Dict[str, int], # e.g., {'purchase_order': 2, 'production_batch': 3}
status: str = "success",
) -> None:
"""
Emit notification when an orchestration run completes.
"""
# Build message with action summary
if actions_created == 0:
action_summary = "No actions needed"
else:
action_summary = ", ".join([f"{count} {action_type}" for action_type, count in actions_by_type.items()])
metadata = {
"run_id": run_id,
"status": status,
"duration_seconds": float(duration_seconds),
"actions_created": actions_created,
"actions_by_type": actions_by_type,
"action_summary": action_summary,
"completed_at": datetime.now(timezone.utc).isoformat(),
}
await self.publisher.publish_notification(
event_type="operations.orchestration_run_completed",
tenant_id=tenant_id,
data=metadata
)
logger.info(
"orchestration_run_completed_notification_emitted",
tenant_id=str(tenant_id),
run_id=run_id,
actions_created=actions_created
)
async def emit_action_created_notification(
self,
tenant_id: UUID,
run_id: str,
action_id: str,
action_type: str, # 'purchase_order', 'production_batch', 'inventory_adjustment'
action_details: Dict[str, Any], # Type-specific details
reason: str,
estimated_impact: Optional[Dict[str, Any]] = None,
) -> None:
"""
Emit notification when the orchestrator creates an action.
"""
metadata = {
"run_id": run_id,
"action_id": action_id,
"action_type": action_type,
"action_details": action_details,
"reason": reason,
"estimated_impact": estimated_impact,
"created_at": datetime.now(timezone.utc).isoformat(),
}
await self.publisher.publish_notification(
event_type="operations.action_created",
tenant_id=tenant_id,
data=metadata
)
logger.info(
"action_created_notification_emitted",
tenant_id=str(tenant_id),
action_id=action_id,
action_type=action_type
)
async def emit_action_completed_notification(
self,
tenant_id: UUID,
action_id: str,
action_type: str,
action_status: str, # 'approved', 'completed', 'rejected', 'cancelled'
completed_by: Optional[str] = None,
) -> None:
"""
Emit notification when an orchestrator action is completed/resolved.
"""
metadata = {
"action_id": action_id,
"action_type": action_type,
"action_status": action_status,
"completed_by": completed_by,
"completed_at": datetime.now(timezone.utc).isoformat(),
}
await self.publisher.publish_notification(
event_type="operations.action_completed",
tenant_id=tenant_id,
data=metadata
)
logger.info(
"action_completed_notification_emitted",
tenant_id=str(tenant_id),
action_id=action_id,
action_status=action_status
)
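# ----------------------------------------------------------------------------
# Hedged usage sketch (illustrative only; identifiers and counts are
# placeholders, and the UnifiedEventPublisher is assumed to be already
# connected, as done during startup in main.py):
#
#   notifier = OrchestrationNotificationService(event_publisher)
#   await notifier.emit_orchestration_run_completed_notification(
#       tenant_id=tenant_uuid,
#       run_id="ORCH-20260121-0001",
#       duration_seconds=142.5,
#       actions_created=5,
#       actions_by_type={"purchase_order": 2, "production_batch": 3},
#       status="success",
#   )
# ----------------------------------------------------------------------------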

File diff suppressed because it is too large

View File

@@ -0,0 +1,728 @@
"""
Orchestrator Scheduler Service - REFACTORED
Coordinates daily auto-generation workflow: Forecasting → Production → Procurement
CHANGES FROM ORIGINAL:
- Updated to use new EventPublisher pattern for all messaging
- Integrated OrchestrationSaga for error handling and compensation
- Added circuit breakers for all service calls
- Implemented real Forecasting Service integration
- Implemented real Production Service integration
- Implemented real Tenant Service integration
- Implemented real Notification Service integration
- NO backwards compatibility, NO feature flags - complete rewrite
"""
import asyncio
import uuid
from datetime import datetime, date, timezone
from decimal import Decimal
from typing import List, Dict, Any, Optional
import structlog
from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.cron import CronTrigger
# Updated imports - removed old alert system
from shared.messaging import UnifiedEventPublisher
from shared.clients.forecast_client import ForecastServiceClient
from shared.clients.production_client import ProductionServiceClient
from shared.clients.procurement_client import ProcurementServiceClient
from shared.clients.notification_client import NotificationServiceClient
from shared.clients.tenant_client import TenantServiceClient
from shared.clients.inventory_client import InventoryServiceClient
from shared.clients.suppliers_client import SuppliersServiceClient
from shared.clients.recipes_client import RecipesServiceClient
from shared.clients.training_client import TrainingServiceClient
from shared.utils.circuit_breaker import CircuitBreaker, CircuitBreakerOpenError
from app.core.config import settings
from app.repositories.orchestration_run_repository import OrchestrationRunRepository
from app.models.orchestration_run import OrchestrationStatus
from app.services.orchestration_saga import OrchestrationSaga
logger = structlog.get_logger()
class OrchestratorSchedulerService:
"""
Orchestrator Service using EventPublisher for messaging
Handles automated daily orchestration of forecasting, production, and procurement
"""
def __init__(self, event_publisher: UnifiedEventPublisher, config):
self.publisher = event_publisher
self.config = config
# APScheduler instance for running daily orchestration
self.scheduler = None
# Service clients
self.forecast_client = ForecastServiceClient(config, "orchestrator-service")
self.production_client = ProductionServiceClient(config, "orchestrator-service")
self.procurement_client = ProcurementServiceClient(config, "orchestrator-service")
self.notification_client = NotificationServiceClient(config, "orchestrator-service")
self.tenant_client = TenantServiceClient(config)
self.training_client = TrainingServiceClient(config, "orchestrator-service")
# Clients for centralized data fetching
self.inventory_client = InventoryServiceClient(config, "orchestrator-service")
self.suppliers_client = SuppliersServiceClient(config, "orchestrator-service")
self.recipes_client = RecipesServiceClient(config, "orchestrator-service")
# Circuit breakers for each service
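# Parameter semantics (assumed from the shared CircuitBreaker utility):
# failure_threshold - consecutive failures before the breaker opens,
# timeout_duration - seconds to wait before allowing a half-open probe,
# success_threshold - successes in half-open state needed to close the breaker again.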
self.forecast_breaker = CircuitBreaker(
failure_threshold=5,
timeout_duration=60,
success_threshold=2
)
self.production_breaker = CircuitBreaker(
failure_threshold=5,
timeout_duration=60,
success_threshold=2
)
self.procurement_breaker = CircuitBreaker(
failure_threshold=5,
timeout_duration=60,
success_threshold=2
)
self.tenant_breaker = CircuitBreaker(
failure_threshold=3,
timeout_duration=30,
success_threshold=2
)
self.inventory_breaker = CircuitBreaker(
failure_threshold=5,
timeout_duration=60,
success_threshold=2
)
self.suppliers_breaker = CircuitBreaker(
failure_threshold=5,
timeout_duration=60,
success_threshold=2
)
self.recipes_breaker = CircuitBreaker(
failure_threshold=5,
timeout_duration=60,
success_threshold=2
)
async def emit_orchestration_run_started(
self,
tenant_id: uuid.UUID,
run_id: str,
run_type: str, # 'scheduled', 'manual', 'triggered'
scope: str, # 'full', 'inventory_only', 'production_only'
):
"""
Emit notification when an orchestration run starts.
"""
metadata = {
"run_id": run_id,
"run_type": run_type,
"scope": scope,
"started_at": datetime.now(timezone.utc).isoformat(),
}
await self.publisher.publish_notification(
event_type="operations.orchestration_run_started",
tenant_id=tenant_id,
data=metadata
)
logger.info(
"orchestration_run_started_notification_emitted",
tenant_id=str(tenant_id),
run_id=run_id
)
async def emit_orchestration_run_completed(
self,
tenant_id: uuid.UUID,
run_id: str,
duration_seconds: float,
actions_created: int,
actions_by_type: Dict[str, int], # e.g., {'purchase_order': 2, 'production_batch': 3}
status: str = "success",
):
"""
Emit notification when an orchestration run completes.
"""
# Build message with action summary
if actions_created == 0:
action_summary = "No actions needed"
else:
action_summary = ", ".join([f"{count} {action_type}" for action_type, count in actions_by_type.items()])
metadata = {
"run_id": run_id,
"status": status,
"duration_seconds": float(duration_seconds),
"actions_created": actions_created,
"actions_by_type": actions_by_type,
"action_summary": action_summary,
"completed_at": datetime.now(timezone.utc).isoformat(),
}
await self.publisher.publish_notification(
event_type="operations.orchestration_run_completed",
tenant_id=tenant_id,
data=metadata
)
logger.info(
"orchestration_run_completed_notification_emitted",
tenant_id=str(tenant_id),
run_id=run_id,
actions_created=actions_created
)
async def emit_action_created_notification(
self,
tenant_id: uuid.UUID,
run_id: str,
action_id: str,
action_type: str, # 'purchase_order', 'production_batch', 'inventory_adjustment'
action_details: Dict[str, Any], # Type-specific details
reason: str,
estimated_impact: Optional[Dict[str, Any]] = None,
):
"""
Emit notification when the orchestrator creates an action.
"""
metadata = {
"run_id": run_id,
"action_id": action_id,
"action_type": action_type,
"action_details": action_details,
"reason": reason,
"estimated_impact": estimated_impact,
"created_at": datetime.now(timezone.utc).isoformat(),
}
await self.publisher.publish_notification(
event_type="operations.action_created",
tenant_id=tenant_id,
data=metadata
)
logger.info(
"action_created_notification_emitted",
tenant_id=str(tenant_id),
action_id=action_id,
action_type=action_type
)
async def emit_action_completed_notification(
self,
tenant_id: uuid.UUID,
action_id: str,
action_type: str,
action_status: str, # 'approved', 'completed', 'rejected', 'cancelled'
completed_by: Optional[str] = None,
):
"""
Emit notification when an orchestrator action is completed/resolved.
"""
metadata = {
"action_id": action_id,
"action_type": action_type,
"action_status": action_status,
"completed_by": completed_by,
"completed_at": datetime.now(timezone.utc).isoformat(),
}
await self.publisher.publish_notification(
event_type="operations.action_completed",
tenant_id=tenant_id,
data=metadata
)
logger.info(
"action_completed_notification_emitted",
tenant_id=str(tenant_id),
action_id=action_id,
action_status=action_status
)
async def run_daily_orchestration(self):
"""
Main orchestration workflow - runs daily
Executes for all active tenants in parallel (with limits)
"""
if not settings.ORCHESTRATION_ENABLED:
logger.info("Orchestration disabled via config")
return
logger.info("Starting daily orchestration workflow")
try:
# Get all active tenants
active_tenants = await self._get_active_tenants()
if not active_tenants:
logger.warning("No active tenants found for orchestration")
return
logger.info("Processing tenants",
total_tenants=len(active_tenants))
# Process tenants with concurrency limit
semaphore = asyncio.Semaphore(settings.MAX_CONCURRENT_TENANTS)
async def process_with_semaphore(tenant_id):
async with semaphore:
return await self._orchestrate_tenant(tenant_id)
# Process all tenants in parallel (but limited by semaphore)
tasks = [process_with_semaphore(tenant_id) for tenant_id in active_tenants]
results = await asyncio.gather(*tasks, return_exceptions=True)
# Log summary
successful = sum(1 for r in results if r and not isinstance(r, Exception))
failed = len(results) - successful
logger.info("Daily orchestration completed",
total_tenants=len(active_tenants),
successful=successful,
failed=failed)
except Exception as e:
logger.error("Error in daily orchestration",
error=str(e), exc_info=True)
async def _orchestrate_tenant(self, tenant_id: uuid.UUID) -> bool:
"""
Orchestrate workflow for a single tenant using Saga pattern
Returns True if successful, False otherwise
"""
logger.info("Starting orchestration for tenant", tenant_id=str(tenant_id))
# Create orchestration run record
async with self.config.database_manager.get_session() as session:
repo = OrchestrationRunRepository(session)
run_number = await repo.generate_run_number()
run = await repo.create_run({
'run_number': run_number,
'tenant_id': tenant_id,
'status': OrchestrationStatus.running,
'run_type': 'scheduled',
'started_at': datetime.now(timezone.utc),
'triggered_by': 'scheduler'
})
await session.commit()
run_id = run.id
try:
# Emit orchestration started event
await self.emit_orchestration_run_started(
tenant_id=tenant_id,
run_id=str(run_id),
run_type='scheduled',
scope='full'
)
# Set timeout for entire tenant orchestration
async with asyncio.timeout(settings.TENANT_TIMEOUT_SECONDS):
# Execute orchestration using Saga pattern
# AI enhancement is enabled via ORCHESTRATION_USE_AI_INSIGHTS config
saga = OrchestrationSaga(
forecast_client=self.forecast_client,
production_client=self.production_client,
procurement_client=self.procurement_client,
notification_client=self.notification_client,
inventory_client=self.inventory_client,
suppliers_client=self.suppliers_client,
recipes_client=self.recipes_client,
training_client=self.training_client,
use_ai_enhancement=settings.ORCHESTRATION_USE_AI_INSIGHTS,
ai_insights_base_url=settings.AI_INSIGHTS_SERVICE_URL,
ai_insights_min_confidence=settings.AI_INSIGHTS_MIN_CONFIDENCE,
# Pass circuit breakers to saga for fault tolerance
forecast_breaker=self.forecast_breaker,
production_breaker=self.production_breaker,
procurement_breaker=self.procurement_breaker,
inventory_breaker=self.inventory_breaker,
suppliers_breaker=self.suppliers_breaker,
recipes_breaker=self.recipes_breaker
)
result = await saga.execute_orchestration(
tenant_id=str(tenant_id),
orchestration_run_id=str(run_id)
)
if result['success']:
# Update orchestration run with saga results
await self._complete_orchestration_run_with_saga(
run_id,
result
)
# Emit orchestration completed event
await self.emit_orchestration_run_completed(
tenant_id=tenant_id,
run_id=str(run_id),
duration_seconds=result.get('duration_seconds', 0),
actions_created=result.get('total_actions', 0),
actions_by_type=result.get('actions_by_type', {}),
status='success'
)
logger.info("Tenant orchestration completed successfully",
tenant_id=str(tenant_id), run_id=str(run_id))
return True
else:
# Saga failed (with compensation)
await self._mark_orchestration_failed(
run_id,
result.get('error', 'Saga execution failed')
)
# Emit orchestration failed event
await self.emit_orchestration_run_completed(
tenant_id=tenant_id,
run_id=str(run_id),
duration_seconds=result.get('duration_seconds', 0),
actions_created=0,
actions_by_type={},
status='failed'
)
return False
except asyncio.TimeoutError:
logger.error("Tenant orchestration timeout",
tenant_id=str(tenant_id),
timeout_seconds=settings.TENANT_TIMEOUT_SECONDS)
await self._mark_orchestration_failed(run_id, "Timeout exceeded")
return False
except Exception as e:
logger.error("Tenant orchestration failed",
tenant_id=str(tenant_id),
error=str(e), exc_info=True)
await self._mark_orchestration_failed(run_id, str(e))
return False
async def _get_active_tenants(self) -> List[uuid.UUID]:
"""
Get list of active tenants for orchestration
REAL IMPLEMENTATION (no stubs)
"""
try:
logger.info("Fetching active tenants from Tenant Service")
# Call Tenant Service with circuit breaker
tenants_data = await self.tenant_breaker.call(
self.tenant_client.get_active_tenants
)
if not tenants_data:
logger.warning("Tenant Service returned no active tenants")
return []
# Extract tenant IDs
tenant_ids = []
for tenant in tenants_data:
tenant_id = tenant.get('id') or tenant.get('tenant_id')
if tenant_id:
# Convert string to UUID if needed
if isinstance(tenant_id, str):
tenant_id = uuid.UUID(tenant_id)
tenant_ids.append(tenant_id)
logger.info(f"Found {len(tenant_ids)} active tenants for orchestration")
return tenant_ids
except CircuitBreakerOpenError:
logger.error("Circuit breaker open for Tenant Service, skipping orchestration")
return []
except Exception as e:
logger.error("Error getting active tenants", error=str(e), exc_info=True)
return []
async def _complete_orchestration_run_with_saga(
self,
run_id: uuid.UUID,
saga_result: Dict[str, Any]
):
"""
Complete orchestration run with saga results
Args:
run_id: Orchestration run ID
saga_result: Result from saga execution
"""
async with self.config.database_manager.get_session() as session:
repo = OrchestrationRunRepository(session)
run = await repo.get_run_by_id(run_id)
if run:
started_at = run.started_at
completed_at = datetime.now(timezone.utc)
duration = (completed_at - started_at).total_seconds()
# Extract results from saga
forecast_id = saga_result.get('forecast_id')
production_schedule_id = saga_result.get('production_schedule_id')
procurement_plan_id = saga_result.get('procurement_plan_id')
notifications_sent = saga_result.get('notifications_sent', 0)
# Get saga summary
saga_summary = saga_result.get('saga_summary', {})
total_steps = saga_summary.get('total_steps', 0)
completed_steps = saga_summary.get('completed_steps', 0)
# Extract actual counts from saga result (no placeholders)
forecast_data = saga_result.get('forecast_data', {})
production_data = saga_result.get('production_data', {})
procurement_data = saga_result.get('procurement_data', {})
forecasts_generated = forecast_data.get('forecasts_created', 0)
production_batches_created = production_data.get('batches_created', 0)
purchase_orders_created = procurement_data.get('pos_created', 0)
# Extract AI insights tracking
ai_insights_generated = saga_result.get('ai_insights_generated', 0)
ai_insights_posted = saga_result.get('ai_insights_posted', 0)
ai_insights_errors = saga_result.get('ai_insights_errors', [])
# Generate reasoning metadata for the orchestrator context
reasoning_metadata = self._generate_reasoning_metadata(
forecast_data,
production_data,
procurement_data,
ai_insights_generated,
ai_insights_posted
)
await repo.update_run(run_id, {
'status': OrchestrationStatus.completed,
'completed_at': completed_at,
'duration_seconds': int(duration),
'forecast_id': forecast_id,
'forecasting_status': 'success',
'forecasting_completed_at': completed_at,
'forecasts_generated': forecasts_generated,
'production_schedule_id': production_schedule_id,
'production_status': 'success',
'production_completed_at': completed_at,
'production_batches_created': production_batches_created,
'procurement_plan_id': procurement_plan_id,
'procurement_status': 'success',
'procurement_completed_at': completed_at,
'procurement_plans_created': 1, # Always 1 plan per orchestration
'purchase_orders_created': purchase_orders_created,
'notification_status': 'success',
'notification_completed_at': completed_at,
'notifications_sent': notifications_sent,
'ai_insights_status': 'success' if not ai_insights_errors else 'partial',
'ai_insights_generated': ai_insights_generated,
'ai_insights_posted': ai_insights_posted,
'ai_insights_completed_at': completed_at,
'saga_steps_total': total_steps,
'saga_steps_completed': completed_steps,
'run_metadata': reasoning_metadata
})
await session.commit()
def _generate_reasoning_metadata(
self,
forecast_data: Dict[str, Any],
production_data: Dict[str, Any],
procurement_data: Dict[str, Any],
ai_insights_generated: int,
ai_insights_posted: int
) -> Dict[str, Any]:
"""
Generate reasoning metadata for orchestration run that will be used by alert processor.
This creates structured reasoning data that the alert processor can use to provide
context when showing AI reasoning to users.
"""
reasoning_metadata = {
'reasoning': {
'type': 'daily_orchestration_summary',
'timestamp': datetime.now(timezone.utc).isoformat(),
'summary': 'Daily orchestration run completed successfully',
'details': {}
},
'purchase_orders': [],
'production_batches': [],
'ai_insights': {
'generated': ai_insights_generated,
'posted': ai_insights_posted
}
}
# Add forecast reasoning
if forecast_data:
reasoning_metadata['reasoning']['details']['forecasting'] = {
'forecasts_created': forecast_data.get('forecasts_created', 0),
'method': 'automated_daily_forecast',
'reasoning': 'Generated forecasts based on historical patterns and seasonal trends'
}
# Add production reasoning
if production_data:
reasoning_metadata['reasoning']['details']['production'] = {
'batches_created': production_data.get('batches_created', 0),
'method': 'demand_based_scheduling',
'reasoning': 'Scheduled production batches based on forecasted demand and inventory levels'
}
# Add procurement reasoning
if procurement_data:
reasoning_metadata['reasoning']['details']['procurement'] = {
'requirements_created': procurement_data.get('requirements_created', 0),
'pos_created': procurement_data.get('pos_created', 0),
'method': 'automated_procurement',
'reasoning': 'Generated procurement plan based on production needs and inventory optimization'
}
# Add purchase order details with reasoning
if procurement_data and procurement_data.get('purchase_orders'):
for po in procurement_data['purchase_orders']:
po_reasoning = {
'id': po.get('id'),
'status': po.get('status', 'created'),
'delivery_date': po.get('delivery_date'),
'reasoning': {
'type': 'inventory_optimization',
'parameters': {
'trigger': 'low_stock_prediction',
'min_depletion_days': po.get('min_depletion_days', 3),
'quantity': po.get('quantity'),
'unit': po.get('unit'),
'supplier': po.get('supplier_name'),
'financial_impact_eur': po.get('estimated_savings_eur', 0)
}
}
}
reasoning_metadata['purchase_orders'].append(po_reasoning)
# Add production batch details with reasoning
if production_data and production_data.get('production_batches'):
for batch in production_data['production_batches']:
batch_reasoning = {
'id': batch.get('id'),
'status': batch.get('status', 'scheduled'),
'scheduled_date': batch.get('scheduled_date'),
'reasoning': {
'type': 'demand_forecasting',
'parameters': {
'trigger': 'forecasted_demand',
'forecasted_quantity': batch.get('forecasted_quantity'),
'product_name': batch.get('product_name'),
'financial_impact_eur': batch.get('estimated_revenue_eur', 0)
}
}
}
reasoning_metadata['production_batches'].append(batch_reasoning)
return reasoning_metadata
async def _mark_orchestration_failed(self, run_id: uuid.UUID, error_message: str):
"""Mark orchestration run as failed"""
async with self.config.database_manager.get_session() as session:
repo = OrchestrationRunRepository(session)
run = await repo.get_run_by_id(run_id)
if run:
started_at = run.started_at
completed_at = datetime.now(timezone.utc)
duration = (completed_at - started_at).total_seconds()
await repo.update_run(run_id, {
'status': OrchestrationStatus.failed,
'completed_at': completed_at,
'duration_seconds': int(duration),
'error_message': error_message
})
await session.commit()
# Manual trigger for testing
async def trigger_orchestration_for_tenant(
self,
tenant_id: uuid.UUID,
test_scenario: Optional[str] = None
) -> Dict[str, Any]:
"""
Manually trigger orchestration for a tenant (for testing)
Args:
tenant_id: Tenant ID to orchestrate
test_scenario: Optional test scenario (full, production_only, procurement_only)
Returns:
Dict with orchestration results
"""
logger.info("Manual orchestration trigger",
tenant_id=str(tenant_id),
test_scenario=test_scenario)
success = await self._orchestrate_tenant(tenant_id)
return {
'success': success,
'tenant_id': str(tenant_id),
'test_scenario': test_scenario,
'message': 'Orchestration completed' if success else 'Orchestration failed'
}
async def start(self):
"""Start the orchestrator scheduler service"""
if not settings.ORCHESTRATION_ENABLED:
logger.info("Orchestration disabled via config")
return
# Initialize APScheduler
self.scheduler = AsyncIOScheduler()
# Add daily orchestration job
self.scheduler.add_job(
self.run_daily_orchestration,
trigger=CronTrigger(
hour=settings.ORCHESTRATION_HOUR,
minute=settings.ORCHESTRATION_MINUTE
),
id='daily_orchestration',
name='Daily Orchestration Workflow',
replace_existing=True,
max_instances=1,
coalesce=True
)
# Start the scheduler
self.scheduler.start()
# Log next run time
next_run = self.scheduler.get_job('daily_orchestration').next_run_time
logger.info(
"OrchestratorSchedulerService started with daily job",
orchestration_hour=settings.ORCHESTRATION_HOUR,
orchestration_minute=settings.ORCHESTRATION_MINUTE,
next_run=next_run.isoformat() if next_run else None
)
async def stop(self):
"""Stop the orchestrator scheduler service"""
if self.scheduler and self.scheduler.running:
self.scheduler.shutdown(wait=True)
logger.info("OrchestratorSchedulerService stopped")
else:
logger.info("OrchestratorSchedulerService already stopped")
def get_circuit_breaker_stats(self) -> Dict[str, Any]:
"""Get circuit breaker statistics for monitoring"""
return {
'forecast_service': self.forecast_breaker.get_stats(),
'production_service': self.production_breaker.get_stats(),
'procurement_service': self.procurement_breaker.get_stats(),
'tenant_service': self.tenant_breaker.get_stats(),
'inventory_service': self.inventory_breaker.get_stats(),
'suppliers_service': self.suppliers_breaker.get_stats(),
'recipes_service': self.recipes_breaker.get_stats()
}

View File

@@ -0,0 +1,265 @@
# services/orchestrator/app/utils/cache.py
"""
Redis caching utilities for dashboard endpoints
"""
import json
import redis.asyncio as redis
from typing import Optional, Any, Callable
from functools import wraps
import structlog
from app.core.config import settings
from pydantic import BaseModel
logger = structlog.get_logger()
# Redis client instance
_redis_client: Optional[redis.Redis] = None
async def get_redis_client() -> redis.Redis:
"""Get or create Redis client"""
global _redis_client
if _redis_client is None:
try:
# Check if TLS is enabled - convert string to boolean properly
redis_tls_str = str(getattr(settings, 'REDIS_TLS_ENABLED', 'false')).lower()
redis_tls_enabled = redis_tls_str in ('true', '1', 'yes', 'on')
connection_kwargs = {
'host': str(getattr(settings, 'REDIS_HOST', 'localhost')),
'port': int(getattr(settings, 'REDIS_PORT', 6379)),
'db': int(getattr(settings, 'REDIS_DB', 0)),
'decode_responses': True,
'socket_connect_timeout': 5,
'socket_timeout': 5
}
# Add password if configured
redis_password = getattr(settings, 'REDIS_PASSWORD', None)
if redis_password:
connection_kwargs['password'] = redis_password
# Add SSL/TLS support if enabled
if redis_tls_enabled:
import ssl
connection_kwargs['ssl'] = True
connection_kwargs['ssl_cert_reqs'] = ssl.CERT_NONE
logger.debug(f"Redis TLS enabled - connecting with SSL to {connection_kwargs['host']}:{connection_kwargs['port']}")
_redis_client = redis.Redis(**connection_kwargs)
# Test connection
await _redis_client.ping()
logger.info(f"Redis client connected successfully (TLS: {redis_tls_enabled})")
except Exception as e:
logger.warning(f"Failed to connect to Redis: {e}. Caching will be disabled.")
_redis_client = None
return _redis_client
async def close_redis():
"""Close Redis connection"""
global _redis_client
if _redis_client:
await _redis_client.close()
_redis_client = None
logger.info("Redis connection closed")
async def get_cached(key: str) -> Optional[Any]:
"""
Get cached value by key
Args:
key: Cache key
Returns:
Cached value (deserialized from JSON) or None if not found or error
"""
try:
client = await get_redis_client()
if not client:
return None
cached = await client.get(key)
if cached:
logger.debug(f"Cache hit: {key}")
return json.loads(cached)
else:
logger.debug(f"Cache miss: {key}")
return None
except Exception as e:
logger.warning(f"Cache get error for key {key}: {e}")
return None
def _serialize_value(value: Any) -> Any:
"""
Recursively serialize values for JSON storage, handling Pydantic models properly.
Args:
value: Value to serialize
Returns:
JSON-serializable value
"""
if isinstance(value, BaseModel):
# Convert Pydantic model to dictionary
return value.model_dump()
elif isinstance(value, (list, tuple)):
# Recursively serialize list/tuple elements
return [_serialize_value(item) for item in value]
elif isinstance(value, dict):
# Recursively serialize dictionary values
return {key: _serialize_value(val) for key, val in value.items()}
else:
# For other types, use default serialization
return value
async def set_cached(key: str, value: Any, ttl: int = 60) -> bool:
"""
Set cached value with TTL
Args:
key: Cache key
value: Value to cache (will be JSON serialized)
ttl: Time to live in seconds
Returns:
True if successful, False otherwise
"""
try:
client = await get_redis_client()
if not client:
return False
# Serialize value properly before JSON encoding
serialized_value = _serialize_value(value)
serialized = json.dumps(serialized_value)
await client.setex(key, ttl, serialized)
logger.debug(f"Cache set: {key} (TTL: {ttl}s)")
return True
except Exception as e:
logger.warning(f"Cache set error for key {key}: {e}")
return False
async def delete_cached(key: str) -> bool:
"""
Delete cached value
Args:
key: Cache key
Returns:
True if successful, False otherwise
"""
try:
client = await get_redis_client()
if not client:
return False
await client.delete(key)
logger.debug(f"Cache deleted: {key}")
return True
except Exception as e:
logger.warning(f"Cache delete error for key {key}: {e}")
return False
async def delete_pattern(pattern: str) -> int:
"""
Delete all keys matching pattern
Args:
pattern: Redis key pattern (e.g., "dashboard:*")
Returns:
Number of keys deleted
"""
try:
client = await get_redis_client()
if not client:
return 0
keys = []
async for key in client.scan_iter(match=pattern):
keys.append(key)
if keys:
deleted = await client.delete(*keys)
logger.info(f"Deleted {deleted} keys matching pattern: {pattern}")
return deleted
return 0
except Exception as e:
logger.warning(f"Cache delete pattern error for {pattern}: {e}")
return 0
def cache_response(key_prefix: str, ttl: int = 60):
"""
Decorator to cache endpoint responses
Args:
key_prefix: Prefix for cache key (will be combined with tenant_id)
ttl: Time to live in seconds
Usage:
@cache_response("dashboard:health", ttl=30)
async def get_health(tenant_id: str):
...
"""
def decorator(func: Callable):
@wraps(func)
async def wrapper(*args, **kwargs):
# Extract tenant_id from kwargs or args
tenant_id = kwargs.get('tenant_id')
if not tenant_id and args:
# Try to find tenant_id in args (assuming it's the first argument)
tenant_id = args[0] if len(args) > 0 else None
if not tenant_id:
# No tenant_id, skip caching
return await func(*args, **kwargs)
# Build cache key
cache_key = f"{key_prefix}:{tenant_id}"
# Try to get from cache
cached_value = await get_cached(cache_key)
if cached_value is not None:
return cached_value
# Execute function
result = await func(*args, **kwargs)
# Cache result
await set_cached(cache_key, result, ttl)
return result
return wrapper
return decorator
def make_cache_key(prefix: str, tenant_id: str, **params) -> str:
"""
Create a cache key with optional parameters
Args:
prefix: Key prefix
tenant_id: Tenant ID
**params: Additional parameters to include in key
Returns:
Cache key string
"""
key_parts = [prefix, tenant_id]
for k, v in sorted(params.items()):
if v is not None:
key_parts.append(f"{k}:{v}")
return ":".join(key_parts)

View File

View File

@@ -0,0 +1,232 @@
# Migration Guide - Consolidated Schema
## Overview
This guide explains how to use the new consolidated initial schema migration for the Orchestration Service.
## Background
The orchestration service schema was previously split across two migration files:
1. `20251029_1700_add_orchestration_runs.py` - Initial table creation
2. `20251105_add_ai_insights_tracking.py` - AI insights fields addition
These have been consolidated into a single, well-structured initial schema file: `001_initial_schema.py`
## For New Deployments
If you're deploying the orchestration service for the first time:
1. **Use the consolidated migration:**
```bash
cd services/orchestrator
alembic upgrade head
```
2. **The migration will create:**
- `orchestration_runs` table with all columns
- `orchestrationstatus` enum type
- All 15 indexes for optimal query performance
3. **Verify the migration:**
```bash
alembic current
# Should show: 001_initial_schema (head)
```
## For Existing Deployments
If your database already has the orchestration schema from the old migrations:
### Option 1: Keep Existing Migration History (Recommended)
**Do nothing.** Your existing migrations are functionally equivalent to the consolidated version. The schema structure is identical.
- Your alembic_version table will show: `20251105_add_ai_insights`
- The new consolidated migration is for future deployments
- No action needed - your database is up to date
### Option 2: Reset Migration History (For Clean State)
Only do this if you want a clean migration history and can afford downtime:
1. **Backup your database:**
```bash
pg_dump -h localhost -U user orchestrator_db > backup.sql
```
2. **Drop and recreate schema:**
```bash
# In PostgreSQL
DROP SCHEMA public CASCADE;
CREATE SCHEMA public;
```
3. **Apply consolidated migration:**
```bash
cd services/orchestrator
alembic upgrade head
```
4. **Restore data (if needed):**
Use a data-only dump for this step (for example, one created with `pg_dump --data-only`), since the migration recreates the schema itself:
```bash
psql -h localhost -U user orchestrator_db < backup_data_only.sql
```
⚠️ **Warning:** This approach requires downtime and careful data backup/restore.
## Checking Your Current Migration Status
### Check Alembic version:
```bash
cd services/orchestrator
alembic current
```
### Check applied migrations:
```bash
alembic history
```
### Verify table structure:
```sql
-- Check if table exists
SELECT EXISTS (
SELECT FROM information_schema.tables
WHERE table_schema = 'public'
AND table_name = 'orchestration_runs'
);
-- Check column count
SELECT COUNT(*) FROM information_schema.columns
WHERE table_schema = 'public'
AND table_name = 'orchestration_runs';
-- Should return 54 columns
-- Check indexes
SELECT indexname FROM pg_indexes
WHERE tablename = 'orchestration_runs';
-- Should return 16 rows (15 explicit indexes plus the primary key index)
```
## Migration File Comparison
### Old Migration Chain
```
None → 20251029_1700 → 20251105_add_ai_insights
```
### New Consolidated Migration
```
None → 001_initial_schema
```
Both result in the exact same database schema, but the new version is:
- ✅ Better organized and documented
- ✅ Easier to understand and maintain
- ✅ Fixes revision ID inconsistencies
- ✅ Properly categorizes fields
- ✅ Eliminates duplicate index definitions
## Troubleshooting
### Issue: "relation already exists"
**Cause:** Database already has the schema from old migrations.
**Solution:**
- For existing deployments, no action needed
- For a fresh start, see "Option 2: Reset Migration History" above
### Issue: "enum type already exists"
**Cause:** The orchestrationstatus enum was created by old migration.
**Solution:**
- The migration uses `create_type=False` and `checkfirst=True` to handle this
- Should not be an issue in practice
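For reference, the guard in `001_initial_schema.py` looks roughly like this:
```python
# Sketch of how the consolidated migration avoids recreating an existing enum type
from alembic import op
from sqlalchemy.dialects import postgresql

orchestrationstatus_enum = postgresql.ENUM(
    'pending', 'running', 'completed', 'partial_success', 'failed', 'cancelled',
    name='orchestrationstatus',
    create_type=False,  # don't auto-create the type when the table is created
)
orchestrationstatus_enum.create(op.get_bind(), checkfirst=True)  # no-op if it already exists
```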
### Issue: "duplicate key value violates unique constraint on alembic_version"
**Cause:** Trying to apply new migration on database with old migrations.
**Solution:**
- Don't apply the new migration on existing databases
- The old migrations already provide the same schema
## Deprecation Notice
### Files Superseded (Do Not Delete Yet)
The following migration files are superseded but kept for reference:
- `20251029_1700_add_orchestration_runs.py`
- `20251105_add_ai_insights_tracking.py`
**Why keep them?**
- Existing deployments reference these migrations
- Provides migration history for troubleshooting
- Can be removed in future major version
### Future Cleanup
In a future major version (e.g., v2.0.0), after all deployments have migrated:
1. Archive old migration files to `migrations/archive/`
2. Update documentation to reference only consolidated schema
3. Clean up alembic version history
## Best Practices
1. **Always backup before migrations:**
```bash
pg_dump -Fc -h localhost -U user orchestrator_db > backup_$(date +%Y%m%d).dump
```
2. **Test migrations in staging first:**
- Never run migrations directly in production
- Verify schema changes in staging environment
- Check application compatibility
3. **Monitor migration performance:**
- Initial migration should complete in < 1 second for an empty database
- Index creation time scales with data volume
4. **Use version control:**
- All migration files are in git
- Never modify existing migration files
- Create new migrations for schema changes
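For example, a schema change should be added as a fresh revision rather than by editing an existing file (standard Alembic commands; the revision message is illustrative):
```bash
cd services/orchestrator
alembic revision --autogenerate -m "add_new_column_to_orchestration_runs"
alembic upgrade head
```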
## Getting Help
If you encounter issues with migrations:
1. Check migration status: `alembic current`
2. Review migration history: `alembic history`
3. Check database schema: See SQL queries in "Checking Your Current Migration Status" section
4. Review logs: Check alembic output for error details
5. Consult SCHEMA_DOCUMENTATION.md for expected schema structure
## Next Steps
After successfully applying migrations:
1. **Verify application startup:**
```bash
docker-compose up orchestrator
```
2. **Run health checks:**
```bash
curl http://localhost:8000/health
```
3. **Test basic operations:**
- Create a test orchestration run
- Query run status
- Verify data persistence
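One way to create a test run, assuming the service exposes an HTTP endpoint wrapping `trigger_orchestration_for_tenant` (the exact route is deployment-specific and shown here only as an illustration):
```bash
# Hypothetical route - replace with your actual orchestrator API path and tenant UUID
curl -X POST "http://localhost:8000/api/v1/orchestration/trigger?tenant_id=<TENANT_UUID>"
```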
4. **Monitor logs:**
```bash
docker-compose logs -f orchestrator
```
## Related Documentation
- `SCHEMA_DOCUMENTATION.md` - Complete schema reference
- `001_initial_schema.py` - Consolidated migration file
- `../../README.md` - Orchestration service overview

View File

@@ -0,0 +1,295 @@
# Orchestration Service Database Schema
## Overview
This document describes the database schema for the Orchestration Service, which tracks and manages the execution of orchestration workflows across the bakery system.
## Schema Version History
### Initial Schema (001_initial_schema)
This is the consolidated initial schema that includes all tables, columns, indexes, and constraints from the original fragmented migrations.
**Consolidated from:**
- `20251029_1700_add_orchestration_runs.py` - Base orchestration_runs table
- `20251105_add_ai_insights_tracking.py` - AI insights tracking additions
## Tables
### orchestration_runs
The main audit trail table for orchestration executions. This table tracks the entire lifecycle of an orchestration run, including all workflow steps, results, and performance metrics.
#### Columns
##### Primary Identification
| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `id` | UUID | No | Primary key, auto-generated UUID |
| `run_number` | VARCHAR(50) | No | Unique human-readable run identifier (indexed, unique) |
##### Run Details
| Column | Type | Nullable | Default | Description |
|--------|------|----------|---------|-------------|
| `tenant_id` | UUID | No | - | Tenant/organization identifier (indexed) |
| `status` | ENUM | No | 'pending' | Current run status (indexed) |
| `run_type` | VARCHAR(50) | No | 'scheduled' | Type of run: scheduled, manual, test (indexed) |
| `priority` | VARCHAR(20) | No | 'normal' | Run priority: normal, high, critical |
##### Timing
| Column | Type | Nullable | Default | Description |
|--------|------|----------|---------|-------------|
| `started_at` | TIMESTAMP | No | now() | When the run started (indexed) |
| `completed_at` | TIMESTAMP | Yes | NULL | When the run completed (indexed) |
| `duration_seconds` | INTEGER | Yes | NULL | Total duration in seconds |
##### Step Tracking - Forecasting
| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `forecasting_started_at` | TIMESTAMP | Yes | When forecasting step started |
| `forecasting_completed_at` | TIMESTAMP | Yes | When forecasting step completed |
| `forecasting_status` | VARCHAR(20) | Yes | Status: success, failed, skipped |
| `forecasting_error` | TEXT | Yes | Error message if failed |
##### Step Tracking - Production
| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `production_started_at` | TIMESTAMP | Yes | When production step started |
| `production_completed_at` | TIMESTAMP | Yes | When production step completed |
| `production_status` | VARCHAR(20) | Yes | Status: success, failed, skipped |
| `production_error` | TEXT | Yes | Error message if failed |
##### Step Tracking - Procurement
| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `procurement_started_at` | TIMESTAMP | Yes | When procurement step started |
| `procurement_completed_at` | TIMESTAMP | Yes | When procurement step completed |
| `procurement_status` | VARCHAR(20) | Yes | Status: success, failed, skipped |
| `procurement_error` | TEXT | Yes | Error message if failed |
##### Step Tracking - Notifications
| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `notification_started_at` | TIMESTAMP | Yes | When notification step started |
| `notification_completed_at` | TIMESTAMP | Yes | When notification step completed |
| `notification_status` | VARCHAR(20) | Yes | Status: success, failed, skipped |
| `notification_error` | TEXT | Yes | Error message if failed |
##### Step Tracking - AI Insights
| Column | Type | Nullable | Default | Description |
|--------|------|----------|---------|-------------|
| `ai_insights_started_at` | TIMESTAMP | Yes | NULL | When AI insights step started |
| `ai_insights_completed_at` | TIMESTAMP | Yes | NULL | When AI insights step completed |
| `ai_insights_status` | VARCHAR(20) | Yes | NULL | Status: success, failed, skipped |
| `ai_insights_error` | TEXT | Yes | NULL | Error message if failed |
| `ai_insights_generated` | INTEGER | No | 0 | Number of AI insights generated |
| `ai_insights_posted` | INTEGER | No | 0 | Number of AI insights posted |
##### Results Summary
| Column | Type | Nullable | Default | Description |
|--------|------|----------|---------|-------------|
| `forecasts_generated` | INTEGER | No | 0 | Total forecasts generated |
| `production_batches_created` | INTEGER | No | 0 | Total production batches created |
| `procurement_plans_created` | INTEGER | No | 0 | Total procurement plans created |
| `purchase_orders_created` | INTEGER | No | 0 | Total purchase orders created |
| `notifications_sent` | INTEGER | No | 0 | Total notifications sent |
##### Data Storage
| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `forecast_data` | JSONB | Yes | Forecast results for downstream services |
| `run_metadata` | JSONB | Yes | Additional run metadata |
##### Error Handling
| Column | Type | Nullable | Default | Description |
|--------|------|----------|---------|-------------|
| `retry_count` | INTEGER | No | 0 | Number of retry attempts |
| `max_retries_reached` | BOOLEAN | No | false | Whether max retries was reached |
| `error_message` | TEXT | Yes | NULL | General error message |
| `error_details` | JSONB | Yes | NULL | Detailed error information |
##### External References
| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `forecast_id` | UUID | Yes | Reference to forecast record |
| `production_schedule_id` | UUID | Yes | Reference to production schedule |
| `procurement_plan_id` | UUID | Yes | Reference to procurement plan |
##### Saga Tracking
| Column | Type | Nullable | Default | Description |
|--------|------|----------|---------|-------------|
| `saga_steps_total` | INTEGER | No | 0 | Total saga steps planned |
| `saga_steps_completed` | INTEGER | No | 0 | Saga steps completed |
##### Audit Fields
| Column | Type | Nullable | Default | Description |
|--------|------|----------|---------|-------------|
| `created_at` | TIMESTAMP | No | now() | Record creation timestamp |
| `updated_at` | TIMESTAMP | No | now() | Record last update timestamp (auto-updated) |
| `triggered_by` | VARCHAR(100) | Yes | NULL | Who/what triggered the run (indexed) |
##### Performance Metrics
| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `fulfillment_rate` | INTEGER | Yes | Fulfillment rate percentage (0-100, indexed) |
| `on_time_delivery_rate` | INTEGER | Yes | On-time delivery rate percentage (0-100, indexed) |
| `cost_accuracy` | INTEGER | Yes | Cost accuracy percentage (0-100, indexed) |
| `quality_score` | INTEGER | Yes | Quality score (0-100, indexed) |
#### Indexes
##### Single Column Indexes
- `ix_orchestration_runs_run_number` - UNIQUE index on run_number for fast lookups
- `ix_orchestration_runs_tenant_id` - Tenant filtering
- `ix_orchestration_runs_status` - Status filtering
- `ix_orchestration_runs_started_at` - Temporal queries
- `ix_orchestration_runs_completed_at` - Temporal queries
- `ix_orchestration_runs_run_type` - Type filtering
- `ix_orchestration_runs_trigger` - Trigger source filtering
##### Composite Indexes (for common query patterns)
- `ix_orchestration_runs_tenant_status` - (tenant_id, status) - Tenant's runs by status
- `ix_orchestration_runs_tenant_type` - (tenant_id, run_type) - Tenant's runs by type
- `ix_orchestration_runs_tenant_started` - (tenant_id, started_at) - Tenant's runs by date
- `ix_orchestration_runs_status_started` - (status, started_at) - Global runs by status and date
##### Performance Metric Indexes
- `ix_orchestration_runs_fulfillment_rate` - Fulfillment rate queries
- `ix_orchestration_runs_on_time_delivery_rate` - Delivery performance queries
- `ix_orchestration_runs_cost_accuracy` - Cost tracking queries
- `ix_orchestration_runs_quality_score` - Quality filtering
## Enums
### orchestrationstatus
Represents the current status of an orchestration run.
**Values:**
- `pending` - Run is queued but not yet started
- `running` - Run is currently executing
- `completed` - Run completed successfully
- `partial_success` - Run completed but some steps failed
- `failed` - Run failed to complete
- `cancelled` - Run was cancelled
## Workflow Steps
The orchestration service coordinates multiple workflow steps in sequence:
1. **Forecasting** - Generate demand forecasts
2. **Production** - Create production schedules
3. **Procurement** - Generate procurement plans and purchase orders
4. **Notifications** - Send notifications to stakeholders
5. **AI Insights** - Generate and post AI-driven insights
Each step tracks:
- Start/completion timestamps
- Status (success/failed/skipped)
- Error messages (if applicable)
- Step-specific metrics
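Because each step records its own start and completion timestamps, per-step durations can be derived directly in SQL; a sketch using the column names defined above:
```sql
-- Per-step durations (seconds) for recent completed runs
SELECT run_number,
       EXTRACT(EPOCH FROM (forecasting_completed_at - forecasting_started_at)) AS forecasting_s,
       EXTRACT(EPOCH FROM (production_completed_at - production_started_at)) AS production_s,
       EXTRACT(EPOCH FROM (procurement_completed_at - procurement_started_at)) AS procurement_s
FROM orchestration_runs
WHERE status = 'completed'
ORDER BY started_at DESC
LIMIT 20;
```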
## Data Flow
```
Orchestration Run
1. Forecasting → forecast_data (JSONB)
2. Production → production_schedule_id (UUID)
3. Procurement → procurement_plan_id (UUID)
4. Notifications → notifications_sent (count)
5. AI Insights → ai_insights_posted (count)
```
## Query Patterns
### Common Queries
1. **Get active runs for a tenant:**
```sql
SELECT * FROM orchestration_runs
WHERE tenant_id = ? AND status IN ('pending', 'running')
ORDER BY started_at DESC;
```
*Uses: ix_orchestration_runs_tenant_status*
2. **Get run history for a date range:**
```sql
SELECT * FROM orchestration_runs
WHERE tenant_id = ? AND started_at BETWEEN ? AND ?
ORDER BY started_at DESC;
```
*Uses: ix_orchestration_runs_tenant_started*
3. **Get performance metrics summary:**
```sql
SELECT AVG(fulfillment_rate), AVG(on_time_delivery_rate),
AVG(cost_accuracy), AVG(quality_score)
FROM orchestration_runs
WHERE tenant_id = ? AND status = 'completed'
AND started_at > ?;
```
*Uses: ix_orchestration_runs_tenant_started + metric indexes*
4. **Find failed runs needing attention:**
```sql
SELECT * FROM orchestration_runs
WHERE status = 'failed' AND retry_count < 3
AND max_retries_reached = false
ORDER BY started_at DESC;
```
*Uses: ix_orchestration_runs_status*
## Migration Notes
### Consolidation Changes
The original schema was split across two migrations:
1. Base table with most fields
2. AI insights tracking added later
This consolidation:
- ✅ Combines all fields into one initial migration
- ✅ Fixes revision ID inconsistencies
- ✅ Removes duplicate index definitions
- ✅ Organizes fields logically by category
- ✅ Adds comprehensive documentation
- ✅ Improves maintainability
### Old Migration Files
The following files are superseded by `001_initial_schema.py`:
- `20251029_1700_add_orchestration_runs.py`
- `20251105_add_ai_insights_tracking.py`
**Important:** If your database was already migrated using the old files, you should not apply the new consolidated migration. The new migration is for fresh deployments or can be used after resetting the migration history.
## Best Practices
1. **Always set tenant_id** - Required for multi-tenant isolation
2. **Use run_number for user-facing displays** - More readable than UUID
3. **Track all step timing** - Helps identify bottlenecks
4. **Store detailed errors** - Use error_details JSONB for structured error data
5. **Update metrics in real-time** - Keep counts and statuses current
6. **Use saga tracking** - Helps monitor overall progress
7. **Leverage indexes** - Use composite indexes for multi-column queries
## Performance Considerations
- All timestamp columns have indexes for temporal queries
- Composite indexes optimize common multi-column filters
- JSONB columns (forecast_data, error_details, run_metadata) allow flexible data storage
- Performance metric indexes enable fast analytics queries
- Unique constraint on run_number prevents duplicates
## Future Enhancements
Potential schema improvements for future versions:
- Add foreign key constraints to external references (if services support it)
- Add partition by started_at for very high-volume deployments
- Add GIN indexes on JSONB columns for complex queries (see the sketch below)
- Add materialized views for common analytics queries
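As a sketch of the GIN-index enhancement listed above (not part of the current schema):
```sql
-- Example only: speeds up JSONB containment queries on run_metadata,
-- e.g. run_metadata @> '{"reasoning": {"type": "daily_orchestration_summary"}}'
CREATE INDEX ix_orchestration_runs_run_metadata_gin
    ON orchestration_runs USING GIN (run_metadata);
```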

View File

@@ -0,0 +1,141 @@
"""Alembic environment configuration for inventory service"""
import asyncio
import os
import sys
from logging.config import fileConfig
from sqlalchemy import pool
from sqlalchemy.engine import Connection
from sqlalchemy.ext.asyncio import async_engine_from_config
from alembic import context
# Add the service directory to the Python path
service_path = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
if service_path not in sys.path:
sys.path.insert(0, service_path)
# Add shared modules to path
shared_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..", "shared"))
if shared_path not in sys.path:
sys.path.insert(0, shared_path)
try:
from app.core.config import settings
from shared.database.base import Base
# Import all models to ensure they are registered with Base.metadata
from app.models import * # noqa: F401, F403
except ImportError as e:
print(f"Import error in migrations env.py: {e}")
print(f"Current Python path: {sys.path}")
raise
# this is the Alembic Config object
config = context.config
# Determine service name from file path
service_name = os.path.basename(os.path.dirname(os.path.dirname(__file__)))
service_name_upper = service_name.upper().replace('-', '_')
# Set database URL from environment variables with multiple fallback strategies
database_url = (
os.getenv(f'{service_name_upper}_DATABASE_URL') or # Service-specific
os.getenv('DATABASE_URL') # Generic fallback
)
# If DATABASE_URL is not set, construct from individual components
if not database_url:
# Try generic PostgreSQL environment variables first
postgres_host = os.getenv('POSTGRES_HOST')
postgres_port = os.getenv('POSTGRES_PORT', '5432')
postgres_db = os.getenv('POSTGRES_DB')
postgres_user = os.getenv('POSTGRES_USER')
postgres_password = os.getenv('POSTGRES_PASSWORD')
if all([postgres_host, postgres_db, postgres_user, postgres_password]):
database_url = f"postgresql+asyncpg://{postgres_user}:{postgres_password}@{postgres_host}:{postgres_port}/{postgres_db}"
else:
# Try service-specific environment variables
db_host = os.getenv(f'{service_name_upper}_DB_HOST', f'{service_name}-db-service')
db_port = os.getenv(f'{service_name_upper}_DB_PORT', '5432')
db_name = os.getenv(f'{service_name_upper}_DB_NAME', f'{service_name.replace("-", "_")}_db')
db_user = os.getenv(f'{service_name_upper}_DB_USER', f'{service_name.replace("-", "_")}_user')
db_password = os.getenv(f'{service_name_upper}_DB_PASSWORD')
if db_password:
database_url = f"postgresql+asyncpg://{db_user}:{db_password}@{db_host}:{db_port}/{db_name}"
else:
# Final fallback: try to get from settings object
try:
database_url = getattr(settings, 'DATABASE_URL', None)
except Exception:
pass
if not database_url:
error_msg = f"ERROR: No database URL configured for {service_name} service"
print(error_msg)
raise Exception(error_msg)
config.set_main_option("sqlalchemy.url", database_url)
# Interpret the config file for Python logging
if config.config_file_name is not None:
fileConfig(config.config_file_name)
# Set target metadata
target_metadata = Base.metadata
def run_migrations_offline() -> None:
"""Run migrations in 'offline' mode."""
url = config.get_main_option("sqlalchemy.url")
context.configure(
url=url,
target_metadata=target_metadata,
literal_binds=True,
dialect_opts={"paramstyle": "named"},
compare_type=True,
compare_server_default=True,
)
with context.begin_transaction():
context.run_migrations()
def do_run_migrations(connection: Connection) -> None:
"""Execute migrations with the given connection."""
context.configure(
connection=connection,
target_metadata=target_metadata,
compare_type=True,
compare_server_default=True,
)
with context.begin_transaction():
context.run_migrations()
async def run_async_migrations() -> None:
"""Run migrations in 'online' mode with async support."""
connectable = async_engine_from_config(
config.get_section(config.config_ini_section, {}),
prefix="sqlalchemy.",
poolclass=pool.NullPool,
)
async with connectable.connect() as connection:
await connection.run_sync(do_run_migrations)
await connectable.dispose()
def run_migrations_online() -> None:
"""Run migrations in 'online' mode."""
asyncio.run(run_async_migrations())
if context.is_offline_mode():
run_migrations_offline()
else:
run_migrations_online()
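# Illustrative invocation (comments only; values are placeholders). Assuming this env.py
# lives under services/orchestrator/migrations, the service-specific variable resolved
# above would be ORCHESTRATOR_DATABASE_URL:
#
#   export ORCHESTRATOR_DATABASE_URL="postgresql+asyncpg://user:pass@localhost:5432/orchestrator_db"
#   alembic upgrade head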

View File

@@ -0,0 +1,26 @@
"""${message}
Revision ID: ${up_revision}
Revises: ${down_revision | comma,n}
Create Date: ${create_date}
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
${imports if imports else ""}
# revision identifiers, used by Alembic.
revision: str = ${repr(up_revision)}
down_revision: Union[str, None] = ${repr(down_revision)}
branch_labels: Union[str, Sequence[str], None] = ${repr(branch_labels)}
depends_on: Union[str, Sequence[str], None] = ${repr(depends_on)}
def upgrade() -> None:
${upgrades if upgrades else "pass"}
def downgrade() -> None:
${downgrades if downgrades else "pass"}

View File

@@ -0,0 +1,201 @@
"""Initial orchestration schema
Revision ID: 001_initial_schema
Revises:
Create Date: 2025-11-05 00:00:00.000000
This is the consolidated initial schema for the orchestration service.
It includes all tables, enums, indexes, and constraints needed for the
orchestration_runs table and related functionality.
Tables:
- orchestration_runs: Main audit trail for orchestration executions
Enums:
- orchestrationstatus: Status values for orchestration runs
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = '001_initial_schema'
down_revision = None
branch_labels = None
depends_on = None
def upgrade():
"""Create initial orchestration schema"""
# ================================================================
# Create Enums
# ================================================================
# Create PostgreSQL enum type for orchestration status
orchestrationstatus_enum = postgresql.ENUM(
'pending',
'running',
'completed',
'partial_success',
'failed',
'cancelled',
name='orchestrationstatus',
create_type=False
)
orchestrationstatus_enum.create(op.get_bind(), checkfirst=True)
# ================================================================
# Create Tables
# ================================================================
# Create orchestration_runs table
op.create_table(
'orchestration_runs',
# Primary identification
sa.Column('id', postgresql.UUID(as_uuid=True), nullable=False),
sa.Column('run_number', sa.String(length=50), nullable=False),
# Run details
sa.Column('tenant_id', postgresql.UUID(as_uuid=True), nullable=False),
sa.Column('status', orchestrationstatus_enum, nullable=False, server_default='pending'),
sa.Column('run_type', sa.String(length=50), nullable=False, server_default=sa.text("'scheduled'::character varying")),
sa.Column('priority', sa.String(length=20), nullable=False, server_default=sa.text("'normal'::character varying")),
# Timing
sa.Column('started_at', sa.DateTime(timezone=True), server_default=sa.text('now()'), nullable=False),
sa.Column('completed_at', sa.DateTime(timezone=True), nullable=True),
sa.Column('duration_seconds', sa.Integer(), nullable=True),
# Forecasting step tracking
sa.Column('forecasting_started_at', sa.DateTime(timezone=True), nullable=True),
sa.Column('forecasting_completed_at', sa.DateTime(timezone=True), nullable=True),
sa.Column('forecasting_status', sa.String(length=20), nullable=True),
sa.Column('forecasting_error', sa.Text(), nullable=True),
# Production step tracking
sa.Column('production_started_at', sa.DateTime(timezone=True), nullable=True),
sa.Column('production_completed_at', sa.DateTime(timezone=True), nullable=True),
sa.Column('production_status', sa.String(length=20), nullable=True),
sa.Column('production_error', sa.Text(), nullable=True),
# Procurement step tracking
sa.Column('procurement_started_at', sa.DateTime(timezone=True), nullable=True),
sa.Column('procurement_completed_at', sa.DateTime(timezone=True), nullable=True),
sa.Column('procurement_status', sa.String(length=20), nullable=True),
sa.Column('procurement_error', sa.Text(), nullable=True),
# Notification step tracking
sa.Column('notification_started_at', sa.DateTime(timezone=True), nullable=True),
sa.Column('notification_completed_at', sa.DateTime(timezone=True), nullable=True),
sa.Column('notification_status', sa.String(length=20), nullable=True),
sa.Column('notification_error', sa.Text(), nullable=True),
# AI Insights step tracking
sa.Column('ai_insights_started_at', sa.DateTime(timezone=True), nullable=True),
sa.Column('ai_insights_completed_at', sa.DateTime(timezone=True), nullable=True),
sa.Column('ai_insights_status', sa.String(length=20), nullable=True),
sa.Column('ai_insights_error', sa.Text(), nullable=True),
sa.Column('ai_insights_generated', sa.Integer(), nullable=False, server_default=sa.text('0')),
sa.Column('ai_insights_posted', sa.Integer(), nullable=False, server_default=sa.text('0')),
# Results summary
sa.Column('forecasts_generated', sa.Integer(), nullable=False, server_default=sa.text('0')),
sa.Column('production_batches_created', sa.Integer(), nullable=False, server_default=sa.text('0')),
sa.Column('procurement_plans_created', sa.Integer(), nullable=False, server_default=sa.text('0')),
sa.Column('purchase_orders_created', sa.Integer(), nullable=False, server_default=sa.text('0')),
sa.Column('notifications_sent', sa.Integer(), nullable=False, server_default=sa.text('0')),
# Forecast data passed between services
sa.Column('forecast_data', postgresql.JSONB(astext_type=sa.Text()), nullable=True),
# Error handling
sa.Column('retry_count', sa.Integer(), nullable=False, server_default=sa.text('0')),
sa.Column('max_retries_reached', sa.Boolean(), nullable=False, server_default=sa.text('false')),
sa.Column('error_message', sa.Text(), nullable=True),
sa.Column('error_details', postgresql.JSONB(astext_type=sa.Text()), nullable=True),
# External references
sa.Column('forecast_id', postgresql.UUID(as_uuid=True), nullable=True),
sa.Column('production_schedule_id', postgresql.UUID(as_uuid=True), nullable=True),
sa.Column('procurement_plan_id', postgresql.UUID(as_uuid=True), nullable=True),
# Saga tracking
sa.Column('saga_steps_total', sa.Integer(), nullable=False, server_default=sa.text('0')),
sa.Column('saga_steps_completed', sa.Integer(), nullable=False, server_default=sa.text('0')),
# Audit fields
sa.Column('created_at', sa.DateTime(timezone=True), server_default=sa.text('now()'), nullable=False),
sa.Column('updated_at', sa.DateTime(timezone=True), server_default=sa.text('now()'), onupdate=sa.text('now()'), nullable=False),
sa.Column('triggered_by', sa.String(length=100), nullable=True),
# Performance metrics
sa.Column('fulfillment_rate', sa.Integer(), nullable=True),
sa.Column('on_time_delivery_rate', sa.Integer(), nullable=True),
sa.Column('cost_accuracy', sa.Integer(), nullable=True),
sa.Column('quality_score', sa.Integer(), nullable=True),
# Metadata
sa.Column('run_metadata', postgresql.JSONB(astext_type=sa.Text()), nullable=True),
# Constraints
sa.PrimaryKeyConstraint('id', name=op.f('pk_orchestration_runs'))
)
# ================================================================
# Create Indexes
# ================================================================
# Primary lookup indexes
op.create_index('ix_orchestration_runs_run_number', 'orchestration_runs', ['run_number'], unique=True)
op.create_index('ix_orchestration_runs_tenant_id', 'orchestration_runs', ['tenant_id'], unique=False)
op.create_index('ix_orchestration_runs_status', 'orchestration_runs', ['status'], unique=False)
# Temporal indexes
op.create_index('ix_orchestration_runs_started_at', 'orchestration_runs', ['started_at'], unique=False)
op.create_index('ix_orchestration_runs_completed_at', 'orchestration_runs', ['completed_at'], unique=False)
# Classification indexes
op.create_index('ix_orchestration_runs_run_type', 'orchestration_runs', ['run_type'], unique=False)
op.create_index('ix_orchestration_runs_trigger', 'orchestration_runs', ['triggered_by'], unique=False)
# Composite indexes for common queries
op.create_index('ix_orchestration_runs_tenant_status', 'orchestration_runs', ['tenant_id', 'status'], unique=False)
op.create_index('ix_orchestration_runs_tenant_type', 'orchestration_runs', ['tenant_id', 'run_type'], unique=False)
op.create_index('ix_orchestration_runs_tenant_started', 'orchestration_runs', ['tenant_id', 'started_at'], unique=False)
op.create_index('ix_orchestration_runs_status_started', 'orchestration_runs', ['status', 'started_at'], unique=False)
# Performance metric indexes
op.create_index('ix_orchestration_runs_fulfillment_rate', 'orchestration_runs', ['fulfillment_rate'], unique=False)
op.create_index('ix_orchestration_runs_on_time_delivery_rate', 'orchestration_runs', ['on_time_delivery_rate'], unique=False)
op.create_index('ix_orchestration_runs_cost_accuracy', 'orchestration_runs', ['cost_accuracy'], unique=False)
op.create_index('ix_orchestration_runs_quality_score', 'orchestration_runs', ['quality_score'], unique=False)
def downgrade():
"""Drop orchestration schema"""
# Drop indexes
op.drop_index('ix_orchestration_runs_quality_score', table_name='orchestration_runs')
op.drop_index('ix_orchestration_runs_cost_accuracy', table_name='orchestration_runs')
op.drop_index('ix_orchestration_runs_on_time_delivery_rate', table_name='orchestration_runs')
op.drop_index('ix_orchestration_runs_fulfillment_rate', table_name='orchestration_runs')
op.drop_index('ix_orchestration_runs_status_started', table_name='orchestration_runs')
op.drop_index('ix_orchestration_runs_tenant_started', table_name='orchestration_runs')
op.drop_index('ix_orchestration_runs_tenant_type', table_name='orchestration_runs')
op.drop_index('ix_orchestration_runs_tenant_status', table_name='orchestration_runs')
op.drop_index('ix_orchestration_runs_trigger', table_name='orchestration_runs')
op.drop_index('ix_orchestration_runs_run_type', table_name='orchestration_runs')
op.drop_index('ix_orchestration_runs_completed_at', table_name='orchestration_runs')
op.drop_index('ix_orchestration_runs_started_at', table_name='orchestration_runs')
op.drop_index('ix_orchestration_runs_status', table_name='orchestration_runs')
op.drop_index('ix_orchestration_runs_tenant_id', table_name='orchestration_runs')
op.drop_index('ix_orchestration_runs_run_number', table_name='orchestration_runs')
# Drop table
op.drop_table('orchestration_runs')
# Drop enum type
op.execute("DROP TYPE IF EXISTS orchestrationstatus")

View File

@@ -0,0 +1,55 @@
# Orchestrator Service Dependencies
# FastAPI and web framework
fastapi==0.119.0
uvicorn[standard]==0.32.1
pydantic==2.12.3
pydantic-settings==2.7.1
# Database (minimal - only for audit logs)
sqlalchemy==2.0.44
asyncpg==0.30.0
alembic==1.17.0
psycopg2-binary==2.9.10
# HTTP clients (for service orchestration)
httpx==0.28.1
# Data processing and ML support
pandas==2.2.2
numpy==1.26.4
# Redis for leader election
redis==6.4.0
# Message queuing
aio-pika==9.4.3
# Scheduling (APScheduler for cron-based scheduling)
APScheduler==3.10.4
# Logging and monitoring
structlog==25.4.0
psutil==5.9.8
opentelemetry-api==1.39.1
opentelemetry-sdk==1.39.1
opentelemetry-instrumentation-fastapi==0.60b1
opentelemetry-exporter-otlp-proto-grpc==1.39.1
opentelemetry-exporter-otlp-proto-http==1.39.1
opentelemetry-instrumentation-httpx==0.60b1
opentelemetry-instrumentation-redis==0.60b1
opentelemetry-instrumentation-sqlalchemy==0.60b1
# Date and time utilities
python-dateutil==2.9.0.post0
pytz==2024.2
# Validation
email-validator==2.2.0
# Authentication and JWT
python-jose[cryptography]==3.3.0
# Development dependencies
python-multipart==0.0.6
pytest==8.3.4
pytest-asyncio==0.25.2