From 9a7f4343f103d8463c35bb369e5f4514349560ba Mon Sep 17 00:00:00 2001 From: Urtzi Alfaro Date: Wed, 26 Nov 2025 06:59:30 +0100 Subject: [PATCH] docs: Add comprehensive alert system architecture and panel de control documentation --- docs/ALERT-SYSTEM-ARCHITECTURE.md | 2119 +++++++++++++++++++++++++++++ frontend/README.md | 132 ++ 2 files changed, 2251 insertions(+) create mode 100644 docs/ALERT-SYSTEM-ARCHITECTURE.md diff --git a/docs/ALERT-SYSTEM-ARCHITECTURE.md b/docs/ALERT-SYSTEM-ARCHITECTURE.md new file mode 100644 index 00000000..c88b2d5d --- /dev/null +++ b/docs/ALERT-SYSTEM-ARCHITECTURE.md @@ -0,0 +1,2119 @@ +# Alert System Architecture + +**Last Updated**: 2025-11-25 +**Status**: Production-Ready +**Version**: 2.0 + +--- + +## Table of Contents + +1. [Overview](#1-overview) +2. [Event Flow & Lifecycle](#2-event-flow--lifecycle) +3. [Three-Tier Enrichment Strategy](#3-three-tier-enrichment-strategy) +4. [Enrichment Process](#4-enrichment-process) +5. [Priority Scoring Algorithm](#5-priority-scoring-algorithm) +6. [Alert Types & Classification](#6-alert-types--classification) +7. [Smart Actions & User Agency](#7-smart-actions--user-agency) +8. [Alert Lifecycle & State Transitions](#8-alert-lifecycle--state-transitions) +9. [Escalation System](#9-escalation-system) +10. [Alert Chaining & Deduplication](#10-alert-chaining--deduplication) +11. [Cronjob Integration](#11-cronjob-integration) +12. [Service Integration Patterns](#12-service-integration-patterns) +13. [Frontend Integration](#13-frontend-integration) +14. [Redis Pub/Sub Architecture](#14-redis-pubsub-architecture) +15. [Database Schema](#15-database-schema) +16. [Performance & Monitoring](#16-performance--monitoring) + +--- + +## 1. Overview + +### 1.1 Philosophy + +The Bakery-IA alert system transforms passive notifications into **context-aware, actionable guidance**. Every alert includes enrichment context, priority scoring, and suggested actions, enabling users to make informed decisions quickly. 
+ +**Core Principles**: +- **Alerts are not just notifications** - They're AI-enhanced action items +- **Context over noise** - Every alert includes business impact and suggested actions +- **Smart prioritization** - Multi-factor scoring ensures critical issues surface first +- **Progressive enhancement** - Different event types get appropriate enrichment levels +- **User agency** - System respects what users can actually control + +### 1.2 Architecture Goals + +✅ **Performance**: 80% faster notification processing, 70% less SSE traffic +✅ **Type Safety**: Complete TypeScript definitions matching backend +✅ **Developer Experience**: 18 specialized React hooks for different use cases +✅ **Production Ready**: Backward compatible, fully documented, deployment-ready + +--- + +## 2. Event Flow & Lifecycle + +### 2.1 Event Generation + +Services detect issues via three patterns: + +#### **Scheduled Background Jobs** +- Inventory service: Stock checks every 5-15 minutes +- Production service: Capacity checks every 10-45 minutes +- Forecasting service: Demand analysis (Friday 3 PM weekly) + +#### **Event-Driven** +- RabbitMQ subscriptions to business events +- Example: Order created → Check stock availability → Emit low stock alert + +#### **Database Triggers** +- Direct PostgreSQL notifications for critical state changes +- Example: Stock quantity falls below threshold → Immediate alert + +### 2.2 Alert Publishing Flow + +``` +Service detects issue + ↓ +Validates against RawAlert schema (title, message, type, severity, metadata) + ↓ +Generates deduplication key (type + entity IDs) + ↓ +Checks Redis (prevent duplicates within 15-minute window) + ↓ +Publishes to RabbitMQ (alerts.exchange with routing key) + ↓ +Alert Processor consumes message + ↓ +Conditional enrichment based on event type + ↓ +Stores in PostgreSQL + ↓ +Publishes to Redis (domain-based channels) + ↓ +Gateway streams via SSE + ↓ +Frontend hooks receive and display +``` + +### 2.3 Complete Event Flow Diagram 
+ +``` +Domain Service → RabbitMQ → Alert Processor → PostgreSQL → Redis → Gateway → Frontend + ↓ ↓ + Conditional Enrichment SSE Stream + - Alert: Full (500-800ms) - Domain filtered + - Notification: Fast (20-30ms) - Wildcard support + - Recommendation: Medium (50-80ms) - Real-time updates +``` + +--- + +## 3. Three-Tier Enrichment Strategy + +### 3.1 Tier 1: ALERTS (Full Enrichment) + +**When**: Critical business events requiring user decisions + +**Enrichment Pipeline** (7 steps): +1. Orchestrator Context Query +2. Business Impact Analysis +3. Urgency Assessment +4. User Agency Evaluation +5. Multi-Factor Priority Scoring +6. Timing Intelligence +7. Smart Action Generation + +**Processing Time**: 500-800ms +**Database**: Full alert record with all enrichment fields +**TTL**: Indefinite (until resolved) + +**Examples**: +- Low stock warning requiring PO approval +- Production delay affecting customer orders +- Equipment failure needing immediate attention + +### 3.2 Tier 2: NOTIFICATIONS (Lightweight) + +**When**: Informational state changes + +**Enrichment**: +- Format title/message +- Set placement hint +- Assign domain +- **No priority scoring** +- **No orchestrator queries** + +**Processing Time**: 20-30ms (80% faster than alerts) +**Database**: Minimal notification record +**TTL**: 7 days (automatic cleanup) + +**Examples**: +- Stock received confirmation +- Batch completed notification +- PO sent to supplier + +### 3.3 Tier 3: RECOMMENDATIONS (Moderate) + +**When**: AI suggestions for optimization + +**Enrichment**: +- Light priority scoring (info level by default) +- Confidence assessment +- Estimated impact calculation +- **No orchestrator context** +- Dismissible by users + +**Processing Time**: 50-80ms +**Database**: Recommendation record with impact fields +**TTL**: 30 days or until dismissed + +**Examples**: +- Demand surge prediction +- Inventory optimization suggestion +- Cost reduction opportunity + +### 3.4 Performance Comparison + +| Event Class | 
Old | New | Improvement | +|-------------|-----|-----|-------------| +| Alert | 200-300ms | 500-800ms | Baseline (more enrichment) | +| Notification | 200-300ms | 20-30ms | **80% faster** | +| Recommendation | 200-300ms | 50-80ms | **60% faster** | + +**Overall**: 54% average improvement due to selective enrichment + +--- + +## 4. Enrichment Process + +### 4.1 Orchestrator Context Enrichment + +**Purpose**: Determine if AI has already addressed the alert + +**Service**: `orchestrator_client.py` + +**Query**: Daily Orchestrator microservice for related actions + +**Questions Answered**: +- Has AI already created a purchase order for this low stock? +- What's the PO ID and current status? +- When will the delivery arrive? +- What's the estimated cost savings? + +**Response Fields**: +```python +{ + "already_addressed": bool, + "action_type": "purchase_order" | "production_batch" | "schedule_adjustment", + "action_id": str, # e.g., "PO-12345" + "action_status": "pending_approval" | "approved" | "in_progress", + "delivery_date": datetime, + "estimated_savings_eur": Decimal +} +``` + +**Caching**: Results cached to avoid redundant queries + +### 4.2 Business Impact Analysis + +**Service**: `context_enrichment.py` + +**Dimensions Analyzed**: + +#### Financial Impact +```python +financial_impact_eur: Decimal +# Calculation examples: +# - Low stock: lost_sales = out_of_stock_days × avg_daily_revenue_per_product +# - Production delay: penalty_fees + rush_order_costs +# - Equipment failure: repair_cost + lost_production_value +``` + +#### Customer Impact +```python +affected_customers: List[str] # Customer names +affected_orders: int # Count of at-risk orders +customer_satisfaction_impact: "low" | "medium" | "high" +# Based on order priority, customer tier, delay duration +``` + +#### Operational Impact +```python +production_batches_at_risk: List[str] # Batch IDs +waste_risk_kg: Decimal # Spoilage or overproduction +equipment_downtime_hours: Decimal +``` + +### 4.3 Urgency 
Context + +**Fields**: +```python +deadline: datetime # When consequences occur +time_until_consequence_hours: Decimal # Countdown +can_wait_until_tomorrow: bool # For overnight batch processing +auto_action_countdown_seconds: int # For escalation alerts +``` + +**Urgency Scoring**: +- \>48h until consequence: Low urgency (20 points) +- 24-48h: Medium urgency (50 points) +- 6-24h: High urgency (80 points) +- <6h: Critical urgency (100 points) + +### 4.4 User Agency Assessment + +**Purpose**: Determine what the user can actually do + +**Fields**: +```python +can_user_fix: bool # Can user resolve this directly? +requires_external_party: bool # Need supplier/customer action? +external_party_name: str # "Supplier Inc." +external_party_contact: str # "+34-123-456-789" +blockers: List[str] # What prevents immediate action +``` + +**User Agency Scoring**: +- Can fix directly: 80 points +- Requires external party: 50 points +- Has blockers: -30 penalty +- No control: 20 points + +### 4.5 Trend Context (for trend_warning alerts) + +**Fields**: +```python +metric_name: str # "weekend_demand" +current_value: Decimal # 450 +baseline_value: Decimal # 300 +change_percentage: Decimal # 50 +direction: "increasing" | "decreasing" | "volatile" +significance: "low" | "medium" | "high" +period_days: int # 7 +possible_causes: List[str] # ["Holiday weekend", "Promotion"] +``` + +### 4.6 Timing Intelligence + +**Service**: `timing_intelligence.py` + +**Delivery Method Decisions**: + +```python +def decide_timing(alert): + if alert.priority_score >= 90: # Critical + return "SEND_NOW" # Immediate push notification + + if is_business_hours() and alert.priority_score >= 70: + return "SEND_NOW" # Important during work hours + + if is_night_hours() and alert.priority_score < 90: + return "SCHEDULE_LATER" # Queue for 8 AM + + if alert.priority_score < 50: + return "BATCH_FOR_DIGEST" # Daily summary email +``` + +**Considerations**: +- Priority level +- Business hours (8 AM - 8 PM) +- User preferences (digest settings) +- Alert type
(action_needed vs informational) + +### 4.7 Smart Actions Generation + +**Service**: `context_enrichment.py` + +**Action Structure**: +```typescript +{ + label: string, // "Approve Purchase Order" + type: SmartActionType, // approve_po + variant: "primary" | "secondary" | "tertiary", + metadata: object, // Context for action handler + disabled: boolean, // Based on user permissions/state + estimated_time_minutes: number, // How long action takes + consequence: string // "Order will be placed immediately" +} +``` + +**Action Examples by Alert Type**: + +**Low Stock Alert**: +```javascript +[ + { + label: "Approve Purchase Order", + type: "approve_po", + variant: "primary", + metadata: { po_id: "PO-12345", amount: 1500.00 } + }, + { + label: "Contact Supplier", + type: "call_supplier", + variant: "secondary", + metadata: { supplier_contact: "+34-123-456-789" } + } +] +``` + +**Production Delay Alert**: +```javascript +[ + { + label: "Adjust Schedule", + type: "reschedule_production", + variant: "primary", + metadata: { batch_id: "BATCH-001", delay_minutes: 30 } + }, + { + label: "Notify Customer", + type: "send_notification", + variant: "secondary", + metadata: { customer_id: "CUST-456" } + } +] +``` + +--- + +## 5. 
Priority Scoring Algorithm + +### 5.1 Multi-Factor Weighted Scoring + +**Formula**: +``` +Priority Score (0-100) = + (Business_Impact × 0.40) + + (Urgency × 0.30) + + (User_Agency × 0.20) + + (Confidence × 0.10) +``` + +### 5.2 Business Impact Score (40% weight) + +**Financial Impact**: +- ≤€50: 20 points +- €50-200: 40 points +- €200-500: 60 points +- \>€500: 100 points + +**Customer Impact**: +- 1 affected customer: 30 points +- 2-5 customers: 50 points +- 5+ customers: 100 points + +**Operational Impact**: +- 1 order at risk: 30 points +- 2-10 orders: 60 points +- 10+ orders: 100 points + +**Weighted Average**: +```python +business_impact_score = ( + financial_score * 0.5 + + customer_score * 0.3 + + operational_score * 0.2 +) +``` + +### 5.3 Urgency Score (30% weight) + +**Time Until Consequence**: +- \>48 hours: 20 points +- 24-48 hours: 50 points +- 6-24 hours: 80 points +- <6 hours: 100 points + +**Deadline Approaching Bonus**: +- Within 24h of deadline: +30 points +- Within 6h of deadline: +50 points (capped at 100) + +### 5.4 User Agency Score (20% weight) + +**Base Score**: +- Can user fix directly: 80 points +- Requires coordination: 50 points +- No control: 20 points + +**Modifiers**: +- Has external party contact: +20 bonus +- Requires supplier action: -20 penalty +- Has known blockers: -30 penalty + +### 5.5 Confidence Score (10% weight) + +**Data Quality Assessment**: +- High confidence (complete data): 100 points +- Medium confidence (some assumptions): 70 points +- Low confidence (many unknowns): 40 points + +### 5.6 Priority Levels + +**Mapping**: +- **CRITICAL** (90-100): Immediate action required, high business impact +- **IMPORTANT** (70-89): Action needed today, moderate impact +- **STANDARD** (50-69): Action recommended this week +- **INFO** (0-49): Informational, no urgency + +--- + +## 6. 
Alert Types & Classification + +### 6.1 Alert Type Classes + +**ACTION_NEEDED** (~70% of alerts): +- User decision required +- Appears in action queue +- Has deadline +- Examples: Low stock, pending PO approval, equipment failure + +**PREVENTED_ISSUE** (~10% of alerts): +- AI already handled the problem +- Positive framing: "I prevented X by doing Y" +- User awareness only, no action needed +- Examples: "Stock shortage prevented by auto-PO" + +**TREND_WARNING** (~15% of alerts): +- Proactive insight about emerging patterns +- Gives user time to prepare +- May become action_needed if ignored +- Examples: "Demand trending up 35% this week" + +**ESCALATION** (~3% of alerts): +- Time-sensitive with auto-action countdown +- System will act automatically if user doesn't +- Countdown timer shown prominently +- Examples: "Critical stock, auto-ordering in 2 hours" + +**INFORMATION** (~2% of alerts): +- FYI only, no action expected +- Low priority +- Often batched for digest emails +- Examples: "Production batch completed" + +### 6.2 Event Domains + +- **inventory**: Stock levels, expiration, movements +- **production**: Batches, capacity, equipment +- **procurement**: Purchase orders, deliveries, suppliers +- **forecasting**: Demand predictions, trends +- **orders**: Customer orders, fulfillment +- **orchestrator**: AI-driven automation actions +- **delivery**: Delivery tracking, receipt +- **sales**: Sales analytics, patterns + +### 6.3 Alert Type Catalog (40+ types) + +#### Inventory Domain +``` +critical_stock_shortage (action_needed, critical) +low_stock_warning (action_needed, important) +expired_products (action_needed, critical) +stock_depleted_by_order (information, standard) +stock_received (notification, info) +stock_movement (notification, info) +``` + +#### Production Domain +``` +production_delay (action_needed, important) +equipment_failure (action_needed, critical) +capacity_overload (action_needed, important) +quality_control_failure (action_needed, 
critical) +batch_state_changed (notification, info) +batch_completed (notification, info) +``` + +#### Procurement Domain +``` +po_approval_needed (action_needed, important) +po_approval_escalation (escalation, critical) +delivery_overdue (action_needed, critical) +po_approved (notification, info) +po_sent (notification, info) +delivery_scheduled (notification, info) +delivery_received (notification, info) +``` + +#### Delivery Tracking +``` +delivery_scheduled (information, info) +delivery_arriving_soon (action_needed, important) +delivery_overdue (action_needed, critical) +stock_receipt_incomplete (action_needed, important) +``` + +#### Forecasting Domain +``` +demand_surge_predicted (trend_warning, important) +weekend_demand_surge (trend_warning, standard) +weather_impact_forecast (trend_warning, standard) +holiday_preparation (trend_warning, important) +``` + +#### Operations Domain +``` +orchestration_run_started (notification, info) +orchestration_run_completed (notification, info) +action_created (notification, info) +``` + +### 6.4 Placement Hints + +**Where alerts appear**: +- `ACTION_QUEUE`: Dashboard action section (action_needed) +- `NOTIFICATION_PANEL`: Bell icon dropdown (notifications) +- `DASHBOARD_INLINE`: Embedded in relevant page section +- `TOAST`: Immediate popup (critical alerts) +- `EMAIL_DIGEST`: End-of-day summary email + +--- + +## 7. 
Smart Actions & User Agency + +### 7.1 Action Types + +**Complete Enumeration**: +```python +class SmartActionType(str, Enum): + # Procurement + APPROVE_PO = "approve_po" + REJECT_PO = "reject_po" + MODIFY_PO = "modify_po" + CALL_SUPPLIER = "call_supplier" + + # Production + START_PRODUCTION_BATCH = "start_production_batch" + RESCHEDULE_PRODUCTION = "reschedule_production" + HALT_PRODUCTION = "halt_production" + + # Inventory + MARK_DELIVERY_RECEIVED = "mark_delivery_received" + COMPLETE_STOCK_RECEIPT = "complete_stock_receipt" + ADJUST_STOCK_MANUALLY = "adjust_stock_manually" + + # Customer Service + NOTIFY_CUSTOMER = "notify_customer" + CANCEL_ORDER = "cancel_order" + ADJUST_DELIVERY_DATE = "adjust_delivery_date" + + # System + SNOOZE_ALERT = "snooze_alert" + DISMISS_ALERT = "dismiss_alert" + ESCALATE_TO_MANAGER = "escalate_to_manager" +``` + +### 7.2 Action Lifecycle + +**1. Generation** (enrichment stage): +- Service context: What's possible in this situation? +- User agency: Can user execute this action? +- Permissions: Does user have required role? +- Conditional rendering: Disable if prerequisites not met + +**2. Display** (frontend): +- Primary action highlighted (most recommended) +- Secondary actions offered (alternatives) +- Disabled actions shown with reason tooltip +- Consequence preview on hover + +**3. Execution** (API call): +- Handler routes by action type +- Executes business logic (PO approval, schedule change, etc.) +- Creates audit trail +- Emits follow-up events/notifications +- May create new alerts + +**4. 
Escalation** (if unacted): +- 24h: Alert priority boosted +- 48h: Type changed to escalation +- 72h: Priority boosted further, countdown timer shown +- System may auto-execute if configured + +### 7.3 Consequence Preview + +**Purpose**: Build trust by showing impact before action + +**Example**: +```typescript +{ + action: "approve_po", + consequence: { + immediate: "Order will be sent to supplier within 5 minutes", + timing: "Delivery expected in 2-3 business days", + cost: "€1,250.00 will be added to monthly expenses", + impact: "Resolves low stock for 3 ingredients affecting 8 orders" + } +} +``` + +**Display**: +- Shown on hover or in confirmation modal +- Highlights positive outcomes (orders fulfilled) +- Notes financial impact (€ amount) +- Clarifies timing (when effect occurs) + +--- + +## 8. Alert Lifecycle & State Transitions + +### 8.1 Alert States + +``` +Created → Active + ↓ + ├─→ Acknowledged (user saw it) + ├─→ In Progress (user taking action) + ├─→ Resolved (action completed) + ├─→ Dismissed (user chose to ignore) + └─→ Snoozed (remind me later) +``` + +### 8.2 State Transitions + +**Created → Active**: +- Automatic on creation +- Appears in relevant UI sections based on placement hints + +**Active → Acknowledged**: +- User clicks alert or views action queue +- Tracked for analytics (response time) + +**Acknowledged → In Progress**: +- User starts working on resolution +- May set estimated completion time + +**In Progress → Resolved**: +- Smart action executed successfully +- Or user manually marks as resolved +- `resolved_at` timestamp set + +**Active → Dismissed**: +- User chooses not to act +- May require dismissal reason (for audit) + +**Active → Snoozed**: +- User requests reminder later (e.g., in 1 hour, tomorrow morning) +- Returns to Active at scheduled time + +### 8.3 Key Fields + +**Lifecycle Tracking**: +```python +status: AlertStatus # Current state +created_at: datetime # When alert was created +acknowledged_at: datetime # When user 
first viewed +resolved_at: datetime # When action completed +action_created_at: datetime # For escalation age calculation +``` + +**Interaction Tracking**: +```python +interactions: List[AlertInteraction] # All user interactions +last_interaction_at: datetime # Most recent interaction +response_time_seconds: int # Time to first action +resolution_time_seconds: int # Time to resolution +``` + +### 8.4 Alert Interactions + +**Tracked Events**: +- `view`: User viewed alert +- `acknowledge`: User acknowledged alert +- `action_taken`: User executed smart action +- `snooze`: User snoozed alert +- `dismiss`: User dismissed alert +- `resolve`: User resolved alert + +**Interaction Record**: +```python +class AlertInteraction(Base): + id: UUID + tenant_id: UUID + alert_id: UUID + user_id: UUID + interaction_type: InteractionType + action_type: Optional[SmartActionType] + metadata: dict # Context of interaction + created_at: datetime +``` + +**Analytics Usage**: +- Measure alert effectiveness (% resolved) +- Track response times (how quickly users act) +- Identify ignored alerts (high dismiss rate) +- Optimize smart action suggestions + +--- + +## 9. 
Escalation System + +### 9.1 Time-Based Escalation + +**Purpose**: Prevent action fatigue and ensure critical alerts do not go stale + +**Escalation Rules**: +```python +# Applied hourly to action_needed alerts + +if alert.status == "active" and alert.type_class == "action_needed": + age_hours = (now - alert.action_created_at).total_seconds() / 3600 + + escalation_boost = 0 + + # Age-based escalation + if age_hours > 72: + escalation_boost = 20 + elif age_hours > 48: + escalation_boost = 10 + + # Deadline-based escalation + if alert.deadline: + hours_to_deadline = (alert.deadline - now).total_seconds() / 3600 + if hours_to_deadline < 6: + escalation_boost = max(escalation_boost, 30) + elif hours_to_deadline < 24: + escalation_boost = max(escalation_boost, 15) + + # Skip if already critical + if alert.priority_score >= 90: + escalation_boost = 0 + + # Apply boost (capped at +30; score stays within 0-100) + alert.priority_score = min(alert.priority_score + min(escalation_boost, 30), 100) + alert.priority_level = calculate_level(alert.priority_score) +``` + +### 9.2 Escalation Cronjob + +**Schedule**: Every hour at :15 (`15 * * * *`) + +**Configuration**: +```yaml +alert-priority-recalculation-cronjob: + schedule: "15 * * * *" + resources: + memory: 256Mi + cpu: 100m + timeout: 30 minutes + concurrency: Forbid + batch_size: 50 +``` + +**Processing Logic**: +1. Query all `action_needed` alerts with `status=active` +2. Batch process (50 alerts at a time) +3. Calculate escalation boost for each +4. Update `priority_score` and `priority_level` +5. Add `escalation_metadata` (boost amount, reason) +6. Invalidate Redis cache (`tenant:{id}:alerts:*`) +7. 
Log escalation events for analytics + +### 9.3 Escalation Metadata + +**Stored in enrichment_context**: +```json +{ + "escalation": { + "applied_at": "2025-11-25T15:00:00Z", + "boost_amount": 20, + "reason": "pending_72h", + "previous_score": 65, + "new_score": 85, + "previous_level": "standard", + "new_level": "important" + } +} +``` + +### 9.4 Escalation to Auto-Action + +**When**: +- Alert >72h old +- Priority ≥90 (critical) +- Has auto-action configured + +**Process**: +```python +if age_hours > 72 and priority_score >= 90: + alert.type_class = "escalation" + alert.auto_action_countdown_seconds = 7200 # 2 hours + alert.auto_action_type = determine_auto_action(alert) + alert.auto_action_metadata = {...} +``` + +**Frontend Display**: +- Shows countdown timer: "Auto-approving PO in 1h 23m" +- Primary action becomes "Cancel Auto-Action" +- User can cancel or let system proceed + +--- + +## 10. Alert Chaining & Deduplication + +### 10.1 Deduplication Strategy + +**Purpose**: Prevent alert spam when same issue detected multiple times + +**Deduplication Key**: +```python +def generate_dedup_key(tenant_id, alert_type, entity_ids): + key_parts = [alert_type] + + # Add entity identifiers + if product_id: + key_parts.append(f"product:{product_id}") + if supplier_id: + key_parts.append(f"supplier:{supplier_id}") + if batch_id: + key_parts.append(f"batch:{batch_id}") + + key = ":".join(key_parts) + return f"{tenant_id}:alert:{key}" +``` + +**Redis Check**: +```python +dedup_key = generate_dedup_key(...) +if redis.exists(dedup_key): + return # Skip, alert already exists +else: + redis.setex(dedup_key, 900, "1") # 15-minute window + create_alert(...) 
+``` + +### 10.2 Alert Chaining + +**Purpose**: Link related alerts to tell a coherent story + +**Database Fields** (added in migration 20251123): +```python +action_created_at: datetime # Original creation time (for age) +superseded_by_action_id: UUID # Links to solving action +hidden_from_ui: bool # Hide superseded alerts +``` + +### 10.3 Chaining Methods + +**1. Mark as Superseded**: +```python +def mark_alert_as_superseded(alert_id, solving_action_id): + alert = db.query(Alert).filter(Alert.id == alert_id).first() + alert.superseded_by_action_id = solving_action_id + alert.hidden_from_ui = True + alert.updated_at = now() + db.commit() + # Invalidate cache: DEL takes exact keys, so expand the pattern with SCAN + for key in redis.scan_iter(match=f"tenant:{alert.tenant_id}:alerts:*"): + redis.delete(key) +``` + +**2. Create Combined Alert**: +```python +def create_combined_alert(original_alert, solving_action): + # Create new prevented_issue alert + combined_alert = Alert( + tenant_id=original_alert.tenant_id, + alert_type="prevented_issue", + type_class="prevented_issue", + title="Stock shortage prevented", + message=f"I detected low stock for {product_name} and created " + f"PO-{po_number} automatically. Order will arrive in 2 days.", + priority_level="info", + metadata={ + "original_alert_id": str(original_alert.id), + "solving_action_id": str(solving_action.id), + "problem": original_alert.message, + "solution": solving_action.description + } + ) + db.add(combined_alert) + db.commit() + + # Mark original as superseded + mark_alert_as_superseded(original_alert.id, combined_alert.id) +``` + +**3. Find Related Alerts**: +```python +def find_related_alert(tenant_id, alert_type, product_id): + return db.query(Alert).filter( + Alert.tenant_id == tenant_id, + Alert.alert_type == alert_type, + Alert.metadata['product_id'].astext == product_id, + Alert.created_at > now() - timedelta(hours=24), + Alert.hidden_from_ui == False + ).first() +``` + +**4. 
Filter Hidden Alerts**: +```python +def get_active_alerts(tenant_id): + return db.query(Alert).filter( + Alert.tenant_id == tenant_id, + Alert.status.in_(["active", "acknowledged"]), + Alert.hidden_from_ui == False # Exclude superseded alerts + ).all() +``` + +### 10.4 Chaining Example Flow + +``` +Step 1: Low stock detected + → Create LOW_STOCK alert (action_needed, priority: 75) + → User sees "Low stock for flour, action needed" + +Step 2: Daily Orchestrator runs + → Finds LOW_STOCK alert + → Creates purchase order automatically + → PO-12345 created with delivery date + +Step 3: Orchestrator chains alerts + → Calls mark_alert_as_superseded(low_stock_alert.id, po.id) + → Creates PREVENTED_ISSUE alert + → Message: "I prevented flour shortage by creating PO-12345. + Delivery arrives Nov 28. Approve or modify if needed." + +Step 4: User sees only prevented_issue alert + → Original low stock alert hidden from UI + → User understands: problem detected → AI acted → needs approval + → Single coherent narrative, not 3 separate alerts +``` + +--- + +## 11. 
Cronjob Integration + +### 11.1 Why CronJobs Are Needed + +**Event System Cannot**: +- Emit events "2 hours before delivery" +- Detect "alert is now 48 hours old" +- Poll external state (procurement PO status) + +**CronJobs Excel At**: +- Time-based conditions +- Periodic checks +- Predictive alerts +- Batch recalculations + +### 11.2 Delivery Tracking CronJob + +**Schedule**: Every hour at :30 (`30 * * * *`) + +**Configuration**: +```yaml +delivery-tracking-cronjob: + schedule: "30 * * * *" + resources: + memory: 256Mi + cpu: 100m + timeout: 30 minutes + concurrency: Forbid +``` + +**Service**: `DeliveryTrackingService` in Orchestrator + +**Processing Flow**: +```python +def check_expected_deliveries(): + # Query procurement service for expected deliveries + deliveries = procurement_api.get_expected_deliveries( + from_date=now(), + to_date=now() + timedelta(days=3) + ) + + for delivery in deliveries: + current_time = now() + expected_time = delivery.expected_delivery_datetime + window_start = delivery.delivery_window_start + window_end = delivery.delivery_window_end + + # T-2h: Arriving soon alert + if current_time >= (window_start - timedelta(hours=2)) and \ + current_time < window_start: + send_arriving_soon_alert(delivery) + + # Window end + 2h, no stock receipt: Incomplete alert (checked first; the overdue test below would shadow it) + elif current_time > (window_end + timedelta(hours=2)) and \ + not delivery.marked_received and \ + not delivery.stock_receipt_id: + send_receipt_incomplete_alert(delivery) + + # T+30min: Overdue alert + elif current_time > (window_end + timedelta(minutes=30)) and \ + not delivery.marked_received: + send_overdue_alert(delivery) +``` + +**Alert Types Generated**: + +1. **DELIVERY_ARRIVING_SOON** (T-2h): +```python +{ + "alert_type": "delivery_arriving_soon", + "type_class": "action_needed", + "priority_level": "important", + "placement": "action_queue", + "smart_actions": [ + { + "type": "mark_delivery_received", + "label": "Mark as Received", + "variant": "primary" + } + ] +} +``` + +2. 
**DELIVERY_OVERDUE** (T+30min): +```python +{ + "alert_type": "delivery_overdue", + "type_class": "action_needed", + "priority_level": "critical", + "priority_score": 95, + "smart_actions": [ + { + "type": "call_supplier", + "label": "Call Supplier", + "metadata": { + "supplier_contact": "+34-123-456-789" + } + } + ] +} +``` + +3. **STOCK_RECEIPT_INCOMPLETE** (Post-window): +```python +{ + "alert_type": "stock_receipt_incomplete", + "type_class": "action_needed", + "priority_level": "important", + "priority_score": 80, + "smart_actions": [ + { + "type": "complete_stock_receipt", + "label": "Complete Stock Receipt", + "metadata": { + "po_id": "...", + "draft_receipt_id": "..." + } + } + ] +} +``` + +### 11.3 Delivery Alert Lifecycle + +``` +PO Approved + ↓ +DELIVERY_SCHEDULED (informational, notification_panel) + ↓ T-2 hours +DELIVERY_ARRIVING_SOON (action_needed, action_queue) + ↓ Expected time + 30 min +DELIVERY_OVERDUE (critical, action_queue + toast) + ↓ Window passed + 2 hours +STOCK_RECEIPT_INCOMPLETE (important, action_queue) +``` + +### 11.4 Priority Recalculation CronJob + +See [Section 9.2](#92-escalation-cronjob) for details. + +### 11.5 Decision Matrix: Events vs CronJobs + +| Feature | Event System | CronJob | Best Choice | +|---------|--------------|---------|-------------| +| State change notification | ✅ Excellent | ❌ Poor | Event System | +| Time-based alerts | ❌ Complex | ✅ Simple | CronJob ✅ | +| Real-time updates | ✅ Instant | ❌ Delayed | Event System | +| Predictive alerts | ❌ Hard | ✅ Easy | CronJob ✅ | +| Priority escalation | ❌ Complex | ✅ Natural | CronJob ✅ | +| Deadline tracking | ❌ Complex | ✅ Simple | CronJob ✅ | +| Batch processing | ❌ Not designed | ✅ Ideal | CronJob ✅ | + +--- + +## 12. 
Service Integration Patterns + +### 12.1 Base Alert Service + +**All services extend**: `BaseAlertService` from `shared/alerts/base_service.py` + +**Core Method**: +```python +async def publish_item( + self, + tenant_id: UUID, + item_data: dict, + item_type: ItemType = ItemType.ALERT +): + # Validate schema + validated_item = validate_item(item_data, item_type) + + # Generate deduplication key + dedup_key = self.generate_dedup_key(tenant_id, validated_item) + + # Check Redis for duplicates (15-minute window) + if await self.redis.exists(dedup_key): + logger.info(f"Skipping duplicate {item_type}: {dedup_key}") + return + + # Publish to RabbitMQ + await self.rabbitmq.publish( + exchange="alerts.exchange", + routing_key=f"{item_type}.{validated_item['severity']}", + message={ + "tenant_id": str(tenant_id), + "item_type": item_type, + "data": validated_item + } + ) + + # Set deduplication key + await self.redis.setex(dedup_key, 900, "1") # 15 minutes +``` + +### 12.2 Inventory Service + +**Service Class**: `InventoryAlertService` + +**Background Jobs**: +```python +# Check stock levels every 5 minutes +@scheduler.scheduled_job('interval', minutes=5) +async def check_stock_levels(): + service = InventoryAlertService() + critical_items = await service.find_critical_stock() + + for item in critical_items: + await service.publish_item( + tenant_id=item.tenant_id, + item_data={ + "type": "critical_stock_shortage", + "severity": "high", + "title": f"Critical: {item.name} stock depleted", + "message": f"Only {item.current_stock}{item.unit} remaining. " + f"Required: {item.minimum_stock}{item.unit}", + "actions": ["approve_po", "call_supplier"], + "metadata": { + "ingredient_id": str(item.id), + "current_stock": item.current_stock, + "minimum_stock": item.minimum_stock, + "unit": item.unit + } + }, + item_type=ItemType.ALERT + ) + +# Check expiring products every 2 hours +@scheduler.scheduled_job('interval', hours=2) +async def check_expiring_products(): + # Similar pattern... 
+``` + +**Event-Driven Alerts**: +```python +# Listen to order events +@event_handler("order.created") +async def on_order_created(event): + service = InventoryAlertService() + order = event.data + + # Check if order depletes stock below threshold + for item in order.items: + stock_after_order = calculate_remaining_stock(item) + + if stock_after_order < item.minimum_stock: + await service.publish_item( + tenant_id=order.tenant_id, + item_data={ + "type": "stock_depleted_by_order", + "severity": "medium", + # ... details + }, + item_type=ItemType.ALERT + ) +``` + +**Recommendations**: +```python +async def analyze_inventory_optimization(): + # Analyze stock patterns + # Generate optimization recommendations + await service.publish_item( + tenant_id=tenant_id, + item_data={ + "type": "inventory_optimization", + "title": "Reduce waste by adjusting par levels", + "suggested_actions": ["adjust_par_levels"], + "estimated_impact": "Save €250/month", + "confidence_score": 0.85 + }, + item_type=ItemType.RECOMMENDATION + ) +``` + +### 12.3 Production Service + +**Service Class**: `ProductionAlertService` + +**Background Jobs**: +```python +@scheduler.scheduled_job('interval', minutes=15) +async def check_production_capacity(): + # Check if scheduled batches exceed capacity + # Emit capacity_overload alerts + +@scheduler.scheduled_job('interval', minutes=10) +async def check_production_delays(): + # Check batches behind schedule + # Emit production_delay alerts +``` + +**Event-Driven**: +```python +@event_handler("equipment.status_changed") +async def on_equipment_failure(event): + if event.data.status == "failed": + await service.publish_item( + item_data={ + "type": "equipment_failure", + "severity": "high", + "priority_score": 95, # Manual override + # ... 
+ } + ) +``` + +### 12.4 Forecasting Service + +**Service Class**: `ForecastingRecommendationService` + +**Scheduled Analysis**: +```python +@scheduler.scheduled_job('cron', day_of_week='fri', hour=15) +async def check_weekend_demand_surge(): + forecast = await get_weekend_forecast() + + if forecast.predicted_demand > (forecast.baseline * 1.3): + await service.publish_item( + item_data={ + "type": "demand_surge_weekend", + "title": "Weekend demand surge predicted", + "message": f"Demand trending up {forecast.increase_pct}%. " + f"Consider increasing production.", + "suggested_actions": ["increase_production"], + "confidence_score": forecast.confidence + }, + item_type=ItemType.RECOMMENDATION + ) +``` + +### 12.5 Procurement Service + +**Service Class**: `ProcurementEventService` (mixed alerts + notifications) + +**Event-Driven**: +```python +@event_handler("po.created") +async def on_po_created(event): + po = event.data + + if po.amount > APPROVAL_THRESHOLD: + # Emit alert requiring approval + await service.publish_item( + item_data={ + "type": "po_approval_needed", + "severity": "medium", + # ... + }, + item_type=ItemType.ALERT + ) + else: + # Emit notification (auto-approved) + await service.publish_item( + item_data={ + "type": "po_approved", + "message": f"PO-{po.number} auto-approved (€{po.amount})", + "old_state": "draft", + "new_state": "approved" + }, + item_type=ItemType.NOTIFICATION + ) +``` + +--- + +## 13. 
Frontend Integration + +### 13.1 React Hooks Catalog (18 hooks) + +#### Alert Hooks (4) +```typescript +// Inventory and production alerts at 'important' priority or higher +const { alerts, criticalAlerts, isLoading } = useAlerts({ + domains: ['inventory', 'production'], + minPriority: 'important' +}); + +// Critical alerts only +const { criticalAlerts } = useCriticalAlerts(); + +// Action-needed alerts only +const { alerts } = useActionNeededAlerts(); + +// Domain-specific alerts +const { alerts } = useAlertsByDomain('inventory'); +``` + +#### Notification Hooks (9) +```typescript +// All notifications +const { notifications } = useEventNotifications(); + +// Domain-specific notifications +const { notifications } = useProductionNotifications(); +const { notifications } = useInventoryNotifications(); +const { notifications } = useSupplyChainNotifications(); +const { notifications } = useOperationsNotifications(); + +// Type-specific notifications +const { notifications } = useBatchNotifications(); +const { notifications } = useDeliveryNotifications(); +const { notifications } = useOrchestrationNotifications(); + +// Generic domain filter +const { notifications } = useNotificationsByDomain('production'); +``` + +#### Recommendation Hooks (5) +```typescript +// All recommendations +const { recommendations } = useRecommendations(); + +// Type-specific recommendations +const { recommendations } = useDemandRecommendations(); +const { recommendations } = useInventoryOptimizationRecommendations(); +const { recommendations } = useCostReductionRecommendations(); + +// High confidence only +const { recommendations } = useHighConfidenceRecommendations(0.8); + +// Generic filters +const { recommendations } = useRecommendationsByDomain('forecasting'); +const { recommendations } = useRecommendationsByType('demand_surge'); +``` + +### 13.2 Base SSE Hook + +**`useSSE` Hook**: +```typescript +function useSSE(channels: string[]) { + // Explicit element type; a bare useState([]) would infer never[] + const [events, setEvents] = useState<unknown[]>([]); + const [isConnected, setIsConnected] = 
useState(false); + + // Join the channels into a stable string so that a new array identity + // on each render does not close and reopen the connection + const channelKey = channels.join(','); + + useEffect(() => { + const eventSource = new EventSource( + `/api/events/sse?channels=${channelKey}` + ); + + eventSource.onopen = () => setIsConnected(true); + + eventSource.onmessage = (event) => { + const data = JSON.parse(event.data); + setEvents(prev => [data, ...prev]); + }; + + eventSource.onerror = () => setIsConnected(false); + + return () => eventSource.close(); + }, [channelKey]); + + return { events, isConnected }; +} +``` + +### 13.3 TypeScript Definitions + +**Alert Type**: +```typescript +interface Alert { + id: string; + tenant_id: string; + alert_type: string; + type_class: AlertTypeClass; + service: string; + title: string; + message: string; + status: AlertStatus; + priority_score: number; + priority_level: PriorityLevel; + + // Enrichment + orchestrator_context?: OrchestratorContext; + business_impact?: BusinessImpact; + urgency_context?: UrgencyContext; + user_agency?: UserAgency; + trend_context?: TrendContext; + + // Actions + smart_actions?: SmartAction[]; + + // Metadata + alert_metadata?: Record<string, unknown>; + created_at: string; + updated_at: string; + resolved_at?: string; +} + +enum AlertTypeClass { + ACTION_NEEDED = "action_needed", + PREVENTED_ISSUE = "prevented_issue", + TREND_WARNING = "trend_warning", + ESCALATION = "escalation", + INFORMATION = "information" +} + +enum PriorityLevel { + CRITICAL = "critical", + IMPORTANT = "important", + STANDARD = "standard", + INFO = "info" +} + +enum AlertStatus { + ACTIVE = "active", + ACKNOWLEDGED = "acknowledged", + IN_PROGRESS = "in_progress", + RESOLVED = "resolved", + DISMISSED = "dismissed", + SNOOZED = "snoozed" +} +``` + +### 13.4 Component Integration Examples + +**Action Queue Card**: +```typescript +function UnifiedActionQueueCard() { + const { alerts } = useAlerts({ + typeClass: ['action_needed', 'escalation'], + includeResolved: false + }); + + const groupedAlerts = useMemo(() => { + return groupByTimeCategory(alerts); + // Returns: { urgent: [...], today: [...], 
thisWeek: [...] } + }, [alerts]); + + return ( + +

Actions Needed

+ {groupedAlerts.urgent.length > 0 && ( + + )} + {groupedAlerts.today.length > 0 && ( + + )} +
+ ); +} +``` + +**Health Hero Component**: +```typescript +function GlanceableHealthHero() { + const { criticalAlerts } = useCriticalAlerts(); + const { notifications } = useEventNotifications(); + + const healthStatus = useMemo(() => { + if (criticalAlerts.length > 0) return 'red'; + if (hasUrgentNotifications(notifications)) return 'yellow'; + return 'green'; + }, [criticalAlerts, notifications]); + + return ( + + + {healthStatus === 'red' && ( + + )} + + ); +} +``` + +**Event-Driven Refetch**: +```typescript +function InventoryStats() { + const { data, refetch } = useInventoryStats(); + const { notifications } = useInventoryNotifications(); + + useEffect(() => { + const relevantEvent = notifications.find( + n => n.event_type === 'stock_received' + ); + + if (relevantEvent) { + refetch(); // Update stats on stock change + } + }, [notifications, refetch]); + + return ; +} +``` + +--- + +## 14. Redis Pub/Sub Architecture + +### 14.1 Channel Naming Convention + +**Pattern**: `tenant:{tenant_id}:{domain}.{event_type}` + +**Examples**: +``` +tenant:123e4567-e89b-12d3-a456-426614174000:inventory.alerts +tenant:123e4567-e89b-12d3-a456-426614174000:inventory.notifications +tenant:123e4567-e89b-12d3-a456-426614174000:production.alerts +tenant:123e4567-e89b-12d3-a456-426614174000:production.notifications +tenant:123e4567-e89b-12d3-a456-426614174000:supply_chain.alerts +tenant:123e4567-e89b-12d3-a456-426614174000:supply_chain.notifications +tenant:123e4567-e89b-12d3-a456-426614174000:operations.notifications +tenant:123e4567-e89b-12d3-a456-426614174000:recommendations +``` + +### 14.2 Domain-Based Routing + +**Alert Processor publishes to Redis**: +```python +def publish_to_redis(alert): + domain = alert.domain # inventory, production, etc. 
+ channel = f"tenant:{alert.tenant_id}:{domain}.alerts" + + redis.publish(channel, json.dumps({ + "id": str(alert.id), + "alert_type": alert.alert_type, + "type_class": alert.type_class, + "priority_level": alert.priority_level, + "title": alert.title, + "message": alert.message, + # ... full alert data + })) +``` + +### 14.3 Gateway SSE Endpoint + +**Multi-Channel Subscription**: +```python +@app.get("/api/events/sse") +async def sse_endpoint( + channels: str, # Comma-separated: "inventory.alerts,production.alerts" + tenant_id: UUID = Depends(get_current_tenant) +): + async def event_stream(): + pubsub = redis.pubsub() + + # Use PSUBSCRIBE so glob patterns such as "*.alerts" match; + # plain SUBSCRIBE only matches exact channel names + for channel in channels.split(','): + full_channel = f"tenant:{tenant_id}:{channel}" + await pubsub.psubscribe(full_channel) + + try: + # Stream events (pattern subscriptions deliver 'pmessage') + async for message in pubsub.listen(): + if message['type'] == 'pmessage': + yield f"data: {message['data']}\n\n" + finally: + # Release the subscriptions when the client disconnects + await pubsub.punsubscribe() + + return StreamingResponse( + event_stream(), + media_type="text/event-stream" + ) +``` + +**Wildcard Support**: +```typescript +// Frontend can subscribe to: +"*.alerts" // All alert channels +"inventory.*" // All inventory events +"*.notifications" // All notification channels +``` + +### 14.4 Traffic Reduction + +**Before (legacy)**: +- All pages subscribe to a single `tenant:{id}:events` channel +- 100% of events sent to all pages +- High bandwidth, slow filtering + +**After (domain-based)**: +- Dashboard: Subscribes to `*.alerts`, `*.notifications`, `recommendations` +- Inventory page: Subscribes to `inventory.alerts`, `inventory.notifications` +- Production page: Subscribes to `production.alerts`, `production.notifications` + +**Traffic Reduction by Page**: +| Page | Old Traffic | New Traffic | Reduction | +|------|-------------|-------------|-----------| +| Dashboard | 100% | 100% | 0% (needs all) | +| Inventory | 100% | 15% | **85%** | +| Production | 100% | 20% | **80%** | +| Supply Chain | 100% | 18% | **82%** | + +**Average**: 70% reduction on 
specialized pages + +--- + +## 15. Database Schema + +### 15.1 Alerts Table + +```sql +CREATE TABLE alerts ( + -- Identity + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL, + + -- Classification + alert_type VARCHAR(100) NOT NULL, + type_class VARCHAR(50) NOT NULL, -- action_needed, prevented_issue, etc. + service VARCHAR(50) NOT NULL, + event_domain VARCHAR(50), -- Added in migration 20251125 + + -- Content + title VARCHAR(500) NOT NULL, + message TEXT NOT NULL, + + -- Status + status VARCHAR(50) NOT NULL DEFAULT 'active', + + -- Priority + priority_score INTEGER NOT NULL DEFAULT 50, + priority_level VARCHAR(50) NOT NULL DEFAULT 'standard', + + -- Enrichment Context (JSONB) + orchestrator_context JSONB, + business_impact JSONB, + urgency_context JSONB, + user_agency JSONB, + trend_context JSONB, + + -- Smart Actions + smart_actions JSONB, -- Array of action objects + + -- Timing + timing_decision VARCHAR(50), + scheduled_send_time TIMESTAMP, + + -- Escalation (Added in migration 20251123) + action_created_at TIMESTAMP, -- For age calculation + superseded_by_action_id UUID, -- Links to solving action + hidden_from_ui BOOLEAN DEFAULT FALSE, + + -- Metadata + alert_metadata JSONB, + + -- Timestamps + created_at TIMESTAMP NOT NULL DEFAULT NOW(), + updated_at TIMESTAMP NOT NULL DEFAULT NOW(), + resolved_at TIMESTAMP, + + -- Foreign Keys + FOREIGN KEY (tenant_id) REFERENCES tenants(id) ON DELETE CASCADE +); +``` + +### 15.2 Indexes + +```sql +-- Tenant filtering +CREATE INDEX idx_alerts_tenant_status +ON alerts(tenant_id, status); + +-- Priority sorting +CREATE INDEX idx_alerts_tenant_priority_created +ON alerts(tenant_id, priority_score DESC, created_at DESC); + +-- Type class filtering +CREATE INDEX idx_alerts_tenant_typeclass_status +ON alerts(tenant_id, type_class, status); + +-- Timing queries +CREATE INDEX idx_alerts_timing_scheduled +ON alerts(timing_decision, scheduled_send_time); + +-- Escalation queries (Added in migration 20251123) 
+CREATE INDEX idx_alerts_tenant_action_created +ON alerts(tenant_id, action_created_at); + +CREATE INDEX idx_alerts_superseded_by +ON alerts(superseded_by_action_id); + +CREATE INDEX idx_alerts_tenant_hidden_status +ON alerts(tenant_id, hidden_from_ui, status); + +-- Domain filtering (Added in migration 20251125) +CREATE INDEX idx_alerts_tenant_domain +ON alerts(tenant_id, event_domain); +``` + +### 15.3 Alert Interactions Table + +```sql +CREATE TABLE alert_interactions ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL, + alert_id UUID NOT NULL, + user_id UUID NOT NULL, + + -- Interaction type + interaction_type VARCHAR(50) NOT NULL, -- view, acknowledge, action_taken, etc. + action_type VARCHAR(50), -- Smart action type if applicable + + -- Context + metadata JSONB, + response_time_seconds INTEGER, -- Time from alert creation to this interaction + + -- Timestamps + created_at TIMESTAMP NOT NULL DEFAULT NOW(), + + -- Foreign Keys + FOREIGN KEY (tenant_id) REFERENCES tenants(id) ON DELETE CASCADE, + FOREIGN KEY (alert_id) REFERENCES alerts(id) ON DELETE CASCADE, + FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE +); + +CREATE INDEX idx_interactions_alert ON alert_interactions(alert_id); +CREATE INDEX idx_interactions_tenant_user ON alert_interactions(tenant_id, user_id); +``` + +### 15.4 Notifications Table + +```sql +CREATE TABLE notifications ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL, + + -- Classification + event_type VARCHAR(100) NOT NULL, + event_domain VARCHAR(50) NOT NULL, + + -- Content + title VARCHAR(500) NOT NULL, + message TEXT NOT NULL, + + -- State change tracking + entity_type VARCHAR(50), -- "purchase_order", "batch", etc. 
+ entity_id UUID, + old_state VARCHAR(50), + new_state VARCHAR(50), + + -- Display + placement_hint VARCHAR(50) DEFAULT 'notification_panel', + + -- Metadata + notification_metadata JSONB, + + -- Timestamps + created_at TIMESTAMP NOT NULL DEFAULT NOW(), + expires_at TIMESTAMP DEFAULT (NOW() + INTERVAL '7 days'), + + -- Foreign Keys + FOREIGN KEY (tenant_id) REFERENCES tenants(id) ON DELETE CASCADE +); + +CREATE INDEX idx_notifications_tenant_created +ON notifications(tenant_id, created_at DESC); + +CREATE INDEX idx_notifications_tenant_domain +ON notifications(tenant_id, event_domain); +``` + +### 15.5 Recommendations Table + +```sql +CREATE TABLE recommendations ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL, + + -- Classification + recommendation_type VARCHAR(100) NOT NULL, + event_domain VARCHAR(50) NOT NULL, + + -- Content + title VARCHAR(500) NOT NULL, + message TEXT NOT NULL, + + -- Actions & Impact + suggested_actions JSONB, -- Array of suggested action types + estimated_impact TEXT, -- "Save €250/month" + confidence_score DECIMAL(3, 2), -- 0.00 - 1.00 + + -- Status + status VARCHAR(50) DEFAULT 'active', -- active, dismissed, implemented + + -- Metadata + recommendation_metadata JSONB, + + -- Timestamps + created_at TIMESTAMP NOT NULL DEFAULT NOW(), + expires_at TIMESTAMP DEFAULT (NOW() + INTERVAL '30 days'), + dismissed_at TIMESTAMP, + + -- Foreign Keys + FOREIGN KEY (tenant_id) REFERENCES tenants(id) ON DELETE CASCADE +); + +CREATE INDEX idx_recommendations_tenant_status +ON recommendations(tenant_id, status); + +CREATE INDEX idx_recommendations_tenant_domain +ON recommendations(tenant_id, event_domain); +``` + +### 15.6 Migrations + +**Key Migrations**: + +1. **20251015_1230_initial_schema.py** + - Created alerts, notifications, recommendations tables + - Initial indexes + - Full enrichment fields + +2. 
**20251123_add_alert_enhancements.py** + - Added `action_created_at` for escalation tracking + - Added `superseded_by_action_id` for chaining + - Added `hidden_from_ui` flag + - Created indexes for escalation queries + - Backfilled `action_created_at` for existing alerts + +3. **20251125_add_event_domain_column.py** + - Added `event_domain` to alerts table + - Added index on (tenant_id, event_domain) + - Populated domain from existing alert_type patterns + +--- + +## 16. Performance & Monitoring + +### 16.1 Performance Metrics + +**Processing Speed**: +- Alert enrichment: 500-800ms (full pipeline) +- Notification processing: 20-30ms (80% faster) +- Recommendation processing: 50-80ms (60% faster) +- Average improvement: 54% + +**Database Query Performance**: +- Get active alerts by tenant: <50ms +- Get critical alerts with priority sort: <100ms +- Escalation age calculation: <150ms +- Alert chaining lookup: <75ms + +**API Response Times**: +- GET /alerts (paginated): <200ms +- POST /alerts/{id}/acknowledge: <50ms +- POST /alerts/{id}/resolve: <100ms + +**SSE Traffic**: +- Legacy (single channel): 100% of events to all pages +- New (domain-based): 70% reduction on specialized pages +- Dashboard: No change (needs all events) +- Domain pages: 80-85% reduction + +### 16.2 Caching Strategy + +**Redis Cache Keys**: +``` +tenant:{tenant_id}:alerts:active +tenant:{tenant_id}:alerts:critical +tenant:{tenant_id}:orchestrator_context:{action_id} +``` + +**Cache Invalidation**: +- On alert creation: Invalidate `alerts:active` +- On priority update: Invalidate `alerts:critical` +- On escalation: Invalidate all alert caches +- On resolution: Invalidate both active and critical + +**TTL**: +- Alert lists: 5 minutes +- Orchestrator context: 15 minutes +- Deduplication keys: 15 minutes + +### 16.3 Monitoring Metrics + +**Prometheus Metrics**: +```python +# Alert creation rate +alert_created_total = Counter('alert_created_total', 'Total alerts created', ['tenant_id', 'alert_type']) + 
+# Enrichment timing +enrichment_duration_seconds = Histogram('enrichment_duration_seconds', 'Enrichment processing time', ['event_type']) + +# Priority distribution +alert_priority_distribution = Histogram('alert_priority_distribution', 'Alert priority scores', ['priority_level']) + +# Resolution metrics +alert_resolution_time_seconds = Histogram('alert_resolution_time_seconds', 'Time to resolve alerts', ['alert_type']) + +# Escalation tracking +alert_escalated_total = Counter('alert_escalated_total', 'Alerts escalated', ['escalation_reason']) + +# Deduplication hits +alert_deduplicated_total = Counter('alert_deduplicated_total', 'Alerts deduplicated', ['alert_type']) +``` + +**Key Metrics to Monitor**: +- Alert creation rate (per tenant, per type) +- Average resolution time (should decrease over time) +- Escalation rate (high rate indicates alerts being ignored) +- Deduplication hit rate (should be 10-20%) +- Enrichment performance (p50, p95, p99) +- SSE connection count and duration + +### 16.4 Health Checks + +**Alert Processor Health**: +```python +@app.get("/health") +async def health_check(): + checks = { + "database": await check_db_connection(), + "redis": await check_redis_connection(), + "rabbitmq": await check_rabbitmq_connection(), + "orchestrator_api": await check_orchestrator_api() + } + + overall_healthy = all(checks.values()) + status_code = 200 if overall_healthy else 503 + + return JSONResponse( + status_code=status_code, + content={ + "status": "healthy" if overall_healthy else "unhealthy", + "checks": checks, + "timestamp": datetime.utcnow().isoformat() + } + ) +``` + +**CronJob Monitoring**: +```yaml +# Kubernetes CronJob metrics +- Last successful run timestamp +- Last failed run timestamp +- Average execution duration +- Alert count processed per run +- Error count per run +``` + +### 16.5 Troubleshooting Guide + +**Problem**: Alerts not appearing in frontend + +**Diagnosis**: +1. 
Check alert created in database: `SELECT * FROM alerts WHERE tenant_id=... ORDER BY created_at DESC LIMIT 10;` +2. Check Redis pub/sub: `SUBSCRIBE tenant:{id}:inventory.alerts` +3. Check SSE connection: Browser dev tools → Network → EventStream +4. Check frontend hook subscription: Console logs + +**Problem**: Slow enrichment + +**Diagnosis**: +1. Check Prometheus metrics for `enrichment_duration_seconds` +2. Identify slow enrichment service (orchestrator, priority scoring, etc.) +3. Check orchestrator API response time +4. Review database query performance (EXPLAIN ANALYZE) + +**Problem**: High escalation rate + +**Diagnosis**: +1. Query alerts by age: `SELECT alert_type, COUNT(*) FROM alerts WHERE action_created_at < NOW() - INTERVAL '48 hours' GROUP BY alert_type;` +2. Check if certain alert types are consistently ignored +3. Review smart actions (are they actionable?) +4. Check user permissions (can users actually execute actions?) + +**Problem**: Duplicate alerts + +**Diagnosis**: +1. Check deduplication key generation logic +2. Verify Redis connection (dedup keys being set?) +3. Review deduplication window (15 minutes may be too short) +4. Check for race conditions in concurrent alert creation + +--- + +## 17. 
Deployment Guide + +### 17.1 5-Week Deployment Timeline + +**Week 1: Backend & Gateway** +- Day 1: Database migration in dev environment +- Day 2-3: Deploy alert processor with dual publishing +- Day 4: Deploy updated gateway +- Day 5: Monitoring & validation + +**Week 2-3: Frontend Integration** +- Dashboard components with event hooks +- Priority components (ActionQueue, HealthHero, ExecutionTracker) +- Domain pages (Inventory, Production, Supply Chain) + +**Week 4: Cutover** +- Verify complete migration +- Remove dual publishing +- Database cleanup (remove legacy columns) + +**Week 5: Optimization** +- Performance tuning +- Monitoring dashboards +- Alert rules refinement + +### 17.2 Pre-Deployment Checklist + +- ✅ Database migration scripts tested +- ✅ Backward compatibility verified +- ✅ Rollback procedure documented +- ✅ Monitoring metrics defined +- ✅ Performance benchmarks set +- ✅ Example integrations tested +- ✅ Documentation complete + +### 17.3 Rollback Procedure + +**If issues occur**: +1. Stop new alert processor deployment +2. Revert gateway to previous version +3. Roll back database migration (if safe) +4. Resume dual publishing if partially migrated +5. Investigate root cause +6. 
Fix and redeploy + +--- + +## Appendix + +### Related Documentation + +- [Frontend README](../frontend/README.md) - Frontend architecture and components +- [Alert Processor Service README](../services/alert_processor/README.md) - Service implementation details +- [Inventory Service README](../services/inventory/README.md) - Stock receipt system +- [Orchestrator Service README](../services/orchestrator/README.md) - Delivery tracking +- [Technical Documentation Summary](./TECHNICAL-DOCUMENTATION-SUMMARY.md) - System overview + +### Version History + +- **v2.0** (2025-11-25): Complete architecture with escalation, chaining, cronjobs +- **v1.5** (2025-11-23): Added stock receipt system and delivery tracking +- **v1.0** (2025-11-15): Initial three-tier enrichment system + +### Contributors + +This alert system was designed and implemented collaboratively to support the Bakery-IA platform's mission of providing intelligent, context-aware alerts that respect user time and decision-making agency. 
+ +--- + +**Last Updated**: 2025-11-25 +**Status**: Production-Ready ✅ +**Next Review**: As needed based on system evolution diff --git a/frontend/README.md b/frontend/README.md index d7f64d5e..4a363933 100644 --- a/frontend/README.md +++ b/frontend/README.md @@ -18,10 +18,142 @@ The **Bakery-IA Frontend Dashboard** is a modern, responsive React-based web app ### Real-Time Operational Dashboard - **Live KPI Cards** - Real-time metrics for sales, inventory, production - **Alert Stream (SSE)** - Instant notifications for critical events +- **AI Impact Showcase** - Celebrate AI wins with handling rate and savings metrics +- **Prevented Issues Card** - Highlights problems AI automatically resolved - **Production Status** - Live view of current production batches - **Inventory Levels** - Color-coded stock levels with expiry warnings - **Order Pipeline** - Track customer orders from placement to fulfillment +### Enriched Alert System (NEW) +- **Multi-Dimensional Priority Scoring** - Intelligent 0-100 priority scores with 4 weighted factors + - Business Impact (40%): Financial consequences, affected orders + - Urgency (30%): Time sensitivity, deadlines + - User Agency (20%): Can you take action? + - AI Confidence (10%): Prediction certainty +- **Smart Alert Classification** - 5 alert types for clear user intent + - 🔴 ACTION_NEEDED: Requires your decision or action + - 🎉 PREVENTED_ISSUE: AI already handled (celebration!) 
+ - 📊 TREND_WARNING: Pattern detected, early warning + - ⏱️ ESCALATION: Auto-action pending, can cancel + - ℹ️ INFORMATION: FYI only, no action needed +- **3-Tab Alert Hub** - Organized navigation (All Alerts / For Me / Archived) +- **Auto-Action Countdown** - Real-time timer for escalation alerts with one-click cancel +- **Priority Score Explainer** - Educational modal showing exact scoring formula +- **Trend Visualizations** - Inline sparklines and directional indicators for trend warnings +- **Action Consequence Previews** - See outcomes before taking action (financial impact, affected systems, reversibility) +- **Response Time Gamification** - Track your alert response performance by priority level with benchmarks +- **Email Digests** - Daily/weekly summaries with celebration-first messaging +- **Full Internationalization** - Complete translations (English, Spanish, Basque) + +### Panel de Control (Dashboard Redesign - NEW) +A comprehensive dashboard redesign focused on Jobs-To-Be-Done principles, progressive disclosure, and mobile-first UX. 
+ +#### **New Dashboard Components** +- **GlanceableHealthHero** - Traffic light status system (🟢🟡🔴) + - Understand bakery health in 3 seconds (5 AM test) + - Collapsible checklist with progressive disclosure + - Shows urgent action count prominently + - Real-time SSE integration for critical alerts + - Mobile-optimized with large touch targets (44x44px minimum) +- **SetupWizardBlocker** - Full-page setup wizard + - Blocks dashboard access when <50% setup complete + - Step-by-step wizard interface with numbered steps + - Progress bar (0-100%) with completion indicators + - Clear CTAs for each configuration section + - Ensures critical data exists before AI can function +- **CollapsibleSetupBanner** - Compact reminder banner + - Appears when 50-99% setup complete + - Collapsible (default: collapsed) to minimize distraction + - Dismissible for 7 days via localStorage + - Shows remaining sections with item counts + - Links directly to incomplete sections +- **UnifiedActionQueueCard** - Consolidated action queue + - Time-based grouping (Urgent / Today / This Week) + - Smart actions with embedded delivery workflows + - Escalation badges show pending duration + - StockReceiptModal integration for delivery actions + - Real-time updates via SSE +- **ExecutionProgressTracker** - Plan vs actual tracking + - Visual progress bars for production, deliveries, approvals + - Shows what's on track vs behind schedule + - Business impact highlights (orders at risk) +- **IntelligentSystemSummaryCard** - AI insights dashboard + - Shows what AI has done and why + - Celebration-focused messaging for prevented issues + - Recommendations with confidence scores + +#### **Three-State Setup Flow Logic** +``` +Progress < 50% → SetupWizardBlocker (BLOCKS dashboard access) +Progress 50-99% → CollapsibleSetupBanner (REMINDS but allows access) +Progress 100% → Hidden (COMPLETE, no reminder) +``` + +**Setup Progress Calculation**: +- **Inventory**: Minimum 3 ingredients, recommended 10 +- 
**Suppliers**: Minimum 1 supplier, recommended 3 +- **Recipes**: Minimum 1 recipe, recommended 3 +- **Quality**: Optional, recommended 2 templates + +**Rationale**: Critical data (ingredients, suppliers, recipes) must exist for AI to function. Recommended data improves AI but isn't required. Progressive disclosure prevents overwhelming new users while reminding them of missing features. + +#### **Design Principles** +- **Glanceable First (5-Second Test)** - Users should understand the status within 3 seconds, at 5 AM, on a phone +- **Mobile-First / One-Handed** - All critical actions in the thumb zone, 44x44px minimum touch targets +- **Progressive Disclosure** - Show the 20% that matters 80% of the time, hide complexity until requested +- **Outcome-Focused** - Show business impact (€, time saved), not just features +- **Trust-Building** - Always show AI reasoning, escalation tracking, financial impact transparency + +#### **StockReceiptModal Integration Pattern** +Cross-component communication uses CustomEvents to open the stock receipt modal from dashboard alerts: + +```typescript +// Emit from smartActionHandlers.ts +// `detail` has the shape: +// { receipt_id?: string, po_id: string, tenant_id: string, mode: 'create' | 'edit' } +window.dispatchEvent(new CustomEvent('stock-receipt:open', { detail })); + +// Listen in UnifiedActionQueueCard.tsx +useEffect(() => { + // addEventListener passes a plain Event, so cast to CustomEvent to read `detail` + const handler = (e: Event) => { + setStockReceiptData({ + isOpen: true, + receipt: (e as CustomEvent).detail + }); + }; + window.addEventListener('stock-receipt:open', handler); + return () => window.removeEventListener('stock-receipt:open', handler); +}, []); +``` + +**Workflow**: Delivery alerts (`DELIVERY_ARRIVING_SOON`, `STOCK_RECEIPT_INCOMPLETE`) open the modal with PO context. The user completes the stock receipt with lot-level tracking and expiration dates. Confirmation triggers a `delivery.received` event, which auto-resolves related alerts. 
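The workflow above says that delivery alerts open the modal with PO context. Below is a minimal sketch of how a smart action handler might derive the `stock-receipt:open` payload from an alert. The names `StockReceiptOpenDetail`, `DeliveryAlertLike`, and `buildStockReceiptDetail` are hypothetical, as is the assumption that `po_id` and `receipt_id` are stored in `alert_metadata` and that an existing receipt implies `edit` mode; none of these are confirmed by the codebase.

```typescript
// Payload shape, mirroring the fields listed in the 'stock-receipt:open' example.
interface StockReceiptOpenDetail {
  receipt_id?: string;
  po_id: string;
  tenant_id: string;
  mode: 'create' | 'edit';
}

// Hypothetical minimal view of a delivery alert; the real Alert type has more fields.
interface DeliveryAlertLike {
  alert_type: string;
  tenant_id: string;
  // Assumption: PO and receipt identifiers travel in the alert metadata.
  alert_metadata?: { po_id?: string; receipt_id?: string };
}

// Alert types the workflow names as modal triggers.
const STOCK_RECEIPT_ALERT_TYPES: readonly string[] = [
  'DELIVERY_ARRIVING_SOON',
  'STOCK_RECEIPT_INCOMPLETE',
];

// Returns the event payload for a delivery alert, or null when the alert
// should not open the modal (other alert type, or no purchase order attached).
function buildStockReceiptDetail(
  alert: DeliveryAlertLike
): StockReceiptOpenDetail | null {
  if (STOCK_RECEIPT_ALERT_TYPES.indexOf(alert.alert_type) === -1) {
    return null;
  }
  const poId = alert.alert_metadata?.po_id;
  if (!poId) {
    return null;
  }
  const receiptId = alert.alert_metadata?.receipt_id;
  // Assumption: an existing receipt means the user is resuming it ('edit').
  return receiptId
    ? { receipt_id: receiptId, po_id: poId, tenant_id: alert.tenant_id, mode: 'edit' }
    : { po_id: poId, tenant_id: alert.tenant_id, mode: 'create' };
}
```

A handler could then dispatch the event only when the result is non-null, for example `if (detail) window.dispatchEvent(new CustomEvent('stock-receipt:open', { detail }))`. Keeping the payload construction in a pure function makes it testable without a DOM.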
+ +#### **Deleted Components (Cleanup Rationale)** +The dashboard redesign replaced or merged 7 legacy components: +- `HealthStatusCard.tsx` → Replaced by **GlanceableHealthHero** (traffic light system) +- `InsightsGrid.tsx` → Merged into **IntelligentSystemSummaryCard** +- `ProductionTimelineCard.tsx` → Replaced by **ExecutionProgressTracker** +- `ActionQueueCard.tsx` → Replaced by **UnifiedActionQueueCard** (time-based grouping) +- `ConfigurationProgressWidget.tsx` → Replaced by **SetupWizardBlocker** + **CollapsibleSetupBanner** +- `AlertContextActions.tsx` → Merged into Alert Hub +- `OrchestrationSummaryCard.tsx` → Merged into system summary + +**Net Impact**: Deleted ~1,200 lines of old code, added ~811 lines of new focused components, saved ~390 lines overall while improving UX. + +#### **Dashboard Layout Order** +1. **Setup Flow** - Blocker or banner (contextual) +2. **GlanceableHealthHero** - Traffic light status +3. **UnifiedActionQueueCard** - What needs attention +4. **ExecutionProgressTracker** - Plan vs actual +5. **AI Impact Showcase** - Celebration cards for prevented issues +6. **IntelligentSystemSummaryCard** - What AI did and why +7. **Quick Action Links** - Navigation shortcuts + ### Inventory Management - **Stock Overview** - All ingredients with current levels and locations - **Low Stock Alerts** - Automatic warnings when stock falls below thresholds