2120 lines
58 KiB
Markdown
2120 lines
58 KiB
Markdown
# Alert System Architecture
|
||
|
||
**Last Updated**: 2025-11-25
|
||
**Status**: Production-Ready
|
||
**Version**: 2.0
|
||
|
||
---
|
||
|
||
## Table of Contents
|
||
|
||
1. [Overview](#1-overview)
|
||
2. [Event Flow & Lifecycle](#2-event-flow--lifecycle)
|
||
3. [Three-Tier Enrichment Strategy](#3-three-tier-enrichment-strategy)
|
||
4. [Enrichment Process](#4-enrichment-process)
|
||
5. [Priority Scoring Algorithm](#5-priority-scoring-algorithm)
|
||
6. [Alert Types & Classification](#6-alert-types--classification)
|
||
7. [Smart Actions & User Agency](#7-smart-actions--user-agency)
|
||
8. [Alert Lifecycle & State Transitions](#8-alert-lifecycle--state-transitions)
|
||
9. [Escalation System](#9-escalation-system)
|
||
10. [Alert Chaining & Deduplication](#10-alert-chaining--deduplication)
|
||
11. [Cronjob Integration](#11-cronjob-integration)
|
||
12. [Service Integration Patterns](#12-service-integration-patterns)
|
||
13. [Frontend Integration](#13-frontend-integration)
|
||
14. [Redis Pub/Sub Architecture](#14-redis-pubsub-architecture)
|
||
15. [Database Schema](#15-database-schema)
|
||
16. [Performance & Monitoring](#16-performance--monitoring)
|
||
|
||
---
|
||
|
||
## 1. Overview
|
||
|
||
### 1.1 Philosophy
|
||
|
||
The Bakery-IA alert system transforms passive notifications into **context-aware, actionable guidance**. Every alert includes enrichment context, priority scoring, and suggested actions, enabling users to make informed decisions quickly.
|
||
|
||
**Core Principles**:
|
||
- **Alerts are not just notifications** - They're AI-enhanced action items
|
||
- **Context over noise** - Every alert includes business impact and suggested actions
|
||
- **Smart prioritization** - Multi-factor scoring ensures critical issues surface first
|
||
- **Progressive enhancement** - Different event types get appropriate enrichment levels
|
||
- **User agency** - System respects what users can actually control
|
||
|
||
### 1.2 Architecture Goals
|
||
|
||
✅ **Performance**: 80% faster notification processing, 70% less SSE traffic
|
||
✅ **Type Safety**: Complete TypeScript definitions matching backend
|
||
✅ **Developer Experience**: 18 specialized React hooks for different use cases
|
||
✅ **Production Ready**: Backward compatible, fully documented, deployment-ready
|
||
|
||
---
|
||
|
||
## 2. Event Flow & Lifecycle
|
||
|
||
### 2.1 Event Generation
|
||
|
||
Services detect issues via three patterns:
|
||
|
||
#### **Scheduled Background Jobs**
|
||
- Inventory service: Stock checks every 5-15 minutes
|
||
- Production service: Capacity checks every 10-45 minutes
|
||
- Forecasting service: Demand analysis (Friday 3 PM weekly)
|
||
|
||
#### **Event-Driven**
|
||
- RabbitMQ subscriptions to business events
|
||
- Example: Order created → Check stock availability → Emit low stock alert
|
||
|
||
#### **Database Triggers**
|
||
- Direct PostgreSQL notifications for critical state changes
|
||
- Example: Stock quantity falls below threshold → Immediate alert
|
||
|
||
### 2.2 Alert Publishing Flow
|
||
|
||
```
|
||
Service detects issue
|
||
↓
|
||
Validates against RawAlert schema (title, message, type, severity, metadata)
|
||
↓
|
||
Generates deduplication key (type + entity IDs)
|
||
↓
|
||
Checks Redis (prevent duplicates within 15-minute window)
|
||
↓
|
||
Publishes to RabbitMQ (alerts.exchange with routing key)
|
||
↓
|
||
Alert Processor consumes message
|
||
↓
|
||
Conditional enrichment based on event type
|
||
↓
|
||
Stores in PostgreSQL
|
||
↓
|
||
Publishes to Redis (domain-based channels)
|
||
↓
|
||
Gateway streams via SSE
|
||
↓
|
||
Frontend hooks receive and display
|
||
```
|
||
|
||
### 2.3 Complete Event Flow Diagram
|
||
|
||
```
|
||
Domain Service → RabbitMQ → Alert Processor → PostgreSQL → Redis → Gateway → Frontend
|
||
↓ ↓
|
||
Conditional Enrichment SSE Stream
|
||
- Alert: Full (500-800ms) - Domain filtered
|
||
- Notification: Fast (20-30ms) - Wildcard support
|
||
- Recommendation: Medium (50-80ms) - Real-time updates
|
||
```
|
||
|
||
---
|
||
|
||
## 3. Three-Tier Enrichment Strategy
|
||
|
||
### 3.1 Tier 1: ALERTS (Full Enrichment)
|
||
|
||
**When**: Critical business events requiring user decisions
|
||
|
||
**Enrichment Pipeline** (7 steps):
|
||
1. Orchestrator Context Query
|
||
2. Business Impact Analysis
|
||
3. Urgency Assessment
|
||
4. User Agency Evaluation
|
||
5. Multi-Factor Priority Scoring
|
||
6. Timing Intelligence
|
||
7. Smart Action Generation
|
||
|
||
**Processing Time**: 500-800ms
|
||
**Database**: Full alert record with all enrichment fields
|
||
**TTL**: Indefinite (until resolved)
|
||
|
||
**Examples**:
|
||
- Low stock warning requiring PO approval
|
||
- Production delay affecting customer orders
|
||
- Equipment failure needing immediate attention
|
||
|
||
### 3.2 Tier 2: NOTIFICATIONS (Lightweight)
|
||
|
||
**When**: Informational state changes
|
||
|
||
**Enrichment**:
|
||
- Format title/message
|
||
- Set placement hint
|
||
- Assign domain
|
||
- **No priority scoring**
|
||
- **No orchestrator queries**
|
||
|
||
**Processing Time**: 20-30ms (80% faster than alerts)
|
||
**Database**: Minimal notification record
|
||
**TTL**: 7 days (automatic cleanup)
|
||
|
||
**Examples**:
|
||
- Stock received confirmation
|
||
- Batch completed notification
|
||
- PO sent to supplier
|
||
|
||
### 3.3 Tier 3: RECOMMENDATIONS (Moderate)
|
||
|
||
**When**: AI suggestions for optimization
|
||
|
||
**Enrichment**:
|
||
- Light priority scoring (info level by default)
|
||
- Confidence assessment
|
||
- Estimated impact calculation
|
||
- **No orchestrator context**
|
||
- Dismissible by users
|
||
|
||
**Processing Time**: 50-80ms
|
||
**Database**: Recommendation record with impact fields
|
||
**TTL**: 30 days or until dismissed
|
||
|
||
**Examples**:
|
||
- Demand surge prediction
|
||
- Inventory optimization suggestion
|
||
- Cost reduction opportunity
|
||
|
||
### 3.4 Performance Comparison
|
||
|
||
| Event Class | Old | New | Improvement |
|
||
|-------------|-----|-----|-------------|
|
||
| Alert | 200-300ms | 500-800ms | Baseline (more enrichment) |
|
||
| Notification | 200-300ms | 20-30ms | **80% faster** |
|
||
| Recommendation | 200-300ms | 50-80ms | **60% faster** |
|
||
|
||
**Overall**: 54% average improvement due to selective enrichment
|
||
|
||
---
|
||
|
||
## 4. Enrichment Process
|
||
|
||
### 4.1 Orchestrator Context Enrichment
|
||
|
||
**Purpose**: Determine if AI has already addressed the alert
|
||
|
||
**Service**: `orchestrator_client.py`
|
||
|
||
**Query**: Daily Orchestrator microservice for related actions
|
||
|
||
**Questions Answered**:
|
||
- Has AI already created a purchase order for this low stock?
|
||
- What's the PO ID and current status?
|
||
- When will the delivery arrive?
|
||
- What's the estimated cost savings?
|
||
|
||
**Response Fields**:
|
||
```python
|
||
{
|
||
"already_addressed": bool,
|
||
"action_type": "purchase_order" | "production_batch" | "schedule_adjustment",
|
||
"action_id": str, # e.g., "PO-12345"
|
||
"action_status": "pending_approval" | "approved" | "in_progress",
|
||
"delivery_date": datetime,
|
||
"estimated_savings_eur": Decimal
|
||
}
|
||
```
|
||
|
||
**Caching**: Results cached to avoid redundant queries
|
||
|
||
### 4.2 Business Impact Analysis
|
||
|
||
**Service**: `context_enrichment.py`
|
||
|
||
**Dimensions Analyzed**:
|
||
|
||
#### Financial Impact
|
||
```python
|
||
financial_impact_eur: Decimal
|
||
# Calculation examples:
|
||
# - Low stock: lost_sales = out_of_stock_days × avg_daily_revenue_per_product
|
||
# - Production delay: penalty_fees + rush_order_costs
|
||
# - Equipment failure: repair_cost + lost_production_value
|
||
```
|
||
|
||
#### Customer Impact
|
||
```python
|
||
affected_customers: List[str] # Customer names
|
||
affected_orders: int # Count of at-risk orders
|
||
customer_satisfaction_impact: "low" | "medium" | "high"
|
||
# Based on order priority, customer tier, delay duration
|
||
```
|
||
|
||
#### Operational Impact
|
||
```python
|
||
production_batches_at_risk: List[str] # Batch IDs
|
||
waste_risk_kg: Decimal # Spoilage or overproduction
|
||
equipment_downtime_hours: Decimal
|
||
```
|
||
|
||
### 4.3 Urgency Context
|
||
|
||
**Fields**:
|
||
```python
|
||
deadline: datetime # When consequences occur
|
||
time_until_consequence_hours: Decimal # Countdown
|
||
can_wait_until_tomorrow: bool # For overnight batch processing
|
||
auto_action_countdown_seconds: int # For escalation alerts
|
||
```
|
||
|
||
**Urgency Scoring**:
|
||
- \>48h until consequence: Low urgency (20 points)
|
||
- 24-48h: Medium urgency (50 points)
|
||
- 6-24h: High urgency (80 points)
|
||
- <6h: Critical urgency (100 points)
|
||
|
||
### 4.4 User Agency Assessment
|
||
|
||
**Purpose**: Determine what user can actually do
|
||
|
||
**Fields**:
|
||
```python
|
||
can_user_fix: bool # Can user resolve this directly?
|
||
requires_external_party: bool # Need supplier/customer action?
|
||
external_party_name: str # "Supplier Inc."
|
||
external_party_contact: str # "+34-123-456-789"
|
||
blockers: List[str] # What prevents immediate action
|
||
```
|
||
|
||
**User Agency Scoring**:
|
||
- Can fix directly: 80 points
|
||
- Requires external party: 50 points
|
||
- Has blockers: -30 penalty
|
||
- No control: 20 points
|
||
|
||
### 4.5 Trend Context (for trend_warning alerts)
|
||
|
||
**Fields**:
|
||
```python
|
||
metric_name: str # "weekend_demand"
|
||
current_value: Decimal # 450
|
||
baseline_value: Decimal # 300
|
||
change_percentage: Decimal # 50
|
||
direction: "increasing" | "decreasing" | "volatile"
|
||
significance: "low" | "medium" | "high"
|
||
period_days: int # 7
|
||
possible_causes: List[str] # ["Holiday weekend", "Promotion"]
|
||
```
|
||
|
||
### 4.6 Timing Intelligence
|
||
|
||
**Service**: `timing_intelligence.py`
|
||
|
||
**Delivery Method Decisions**:
|
||
|
||
```python
|
||
def decide_timing(alert):
|
||
if priority >= 90: # Critical
|
||
return "SEND_NOW" # Immediate push notification
|
||
|
||
if is_business_hours() and priority >= 70:
|
||
return "SEND_NOW" # Important during work hours
|
||
|
||
if is_night_hours() and priority < 90:
|
||
return "SCHEDULE_LATER" # Queue for 8 AM
|
||
|
||
if priority < 50:
|
||
return "BATCH_FOR_DIGEST" # Daily summary email
|
||
```
|
||
|
||
**Considerations**:
|
||
- Priority level
|
||
- Business hours (8 AM - 8 PM)
|
||
- User preferences (digest settings)
|
||
- Alert type (action_needed vs informational)
|
||
|
||
### 4.7 Smart Actions Generation
|
||
|
||
**Service**: `context_enrichment.py`
|
||
|
||
**Action Structure**:
|
||
```typescript
|
||
{
|
||
label: string, // "Approve Purchase Order"
|
||
type: SmartActionType, // approve_po
|
||
variant: "primary" | "secondary" | "tertiary",
|
||
metadata: object, // Context for action handler
|
||
disabled: boolean, // Based on user permissions/state
|
||
estimated_time_minutes: number, // How long action takes
|
||
consequence: string // "Order will be placed immediately"
|
||
}
|
||
```
|
||
|
||
**Action Examples by Alert Type**:
|
||
|
||
**Low Stock Alert**:
|
||
```javascript
|
||
[
|
||
{
|
||
label: "Approve Purchase Order",
|
||
type: "approve_po",
|
||
variant: "primary",
|
||
metadata: { po_id: "PO-12345", amount: 1500.00 }
|
||
},
|
||
{
|
||
label: "Contact Supplier",
|
||
type: "call_supplier",
|
||
variant: "secondary",
|
||
metadata: { supplier_contact: "+34-123-456-789" }
|
||
}
|
||
]
|
||
```
|
||
|
||
**Production Delay Alert**:
|
||
```javascript
|
||
[
|
||
{
|
||
label: "Adjust Schedule",
|
||
type: "reschedule_production",
|
||
variant: "primary",
|
||
metadata: { batch_id: "BATCH-001", delay_minutes: 30 }
|
||
},
|
||
{
|
||
label: "Notify Customer",
|
||
type: "send_notification",
|
||
variant: "secondary",
|
||
metadata: { customer_id: "CUST-456" }
|
||
}
|
||
]
|
||
```
|
||
|
||
---
|
||
|
||
## 5. Priority Scoring Algorithm
|
||
|
||
### 5.1 Multi-Factor Weighted Scoring
|
||
|
||
**Formula**:
|
||
```
|
||
Priority Score (0-100) =
|
||
(Business_Impact × 0.40) +
|
||
(Urgency × 0.30) +
|
||
(User_Agency × 0.20) +
|
||
(Confidence × 0.10)
|
||
```
|
||
|
||
### 5.2 Business Impact Score (40% weight)
|
||
|
||
**Financial Impact**:
|
||
- ≤€50: 20 points
|
||
- €50-200: 40 points
|
||
- €200-500: 60 points
|
||
- \>€500: 100 points
|
||
|
||
**Customer Impact**:
|
||
- 1 affected customer: 30 points
|
||
- 2-5 customers: 50 points
|
||
- 5+ customers: 100 points
|
||
|
||
**Operational Impact**:
|
||
- 1 order at risk: 30 points
|
||
- 2-10 orders: 60 points
|
||
- 10+ orders: 100 points
|
||
|
||
**Weighted Average**:
|
||
```python
|
||
business_impact_score = (
|
||
financial_score * 0.5 +
|
||
customer_score * 0.3 +
|
||
operational_score * 0.2
|
||
)
|
||
```
|
||
|
||
### 5.3 Urgency Score (30% weight)
|
||
|
||
**Time Until Consequence**:
|
||
- \>48 hours: 20 points
|
||
- 24-48 hours: 50 points
|
||
- 6-24 hours: 80 points
|
||
- <6 hours: 100 points
|
||
|
||
**Deadline Approaching Bonus**:
|
||
- Within 24h of deadline: +30 points
|
||
- Within 6h of deadline: +50 points (capped at 100)
|
||
|
||
### 5.4 User Agency Score (20% weight)
|
||
|
||
**Base Score**:
|
||
- Can user fix directly: 80 points
|
||
- Requires coordination: 50 points
|
||
- No control: 20 points
|
||
|
||
**Modifiers**:
|
||
- Has external party contact: +20 bonus
|
||
- Requires supplier action: -20 penalty
|
||
- Has known blockers: -30 penalty
|
||
|
||
### 5.5 Confidence Score (10% weight)
|
||
|
||
**Data Quality Assessment**:
|
||
- High confidence (complete data): 100 points
|
||
- Medium confidence (some assumptions): 70 points
|
||
- Low confidence (many unknowns): 40 points
|
||
|
||
### 5.6 Priority Levels
|
||
|
||
**Mapping**:
|
||
- **CRITICAL** (90-100): Immediate action required, high business impact
|
||
- **IMPORTANT** (70-89): Action needed today, moderate impact
|
||
- **STANDARD** (50-69): Action recommended this week
|
||
- **INFO** (0-49): Informational, no urgency
|
||
|
||
---
|
||
|
||
## 6. Alert Types & Classification
|
||
|
||
### 6.1 Alert Type Classes
|
||
|
||
**ACTION_NEEDED** (~70% of alerts):
|
||
- User decision required
|
||
- Appears in action queue
|
||
- Has deadline
|
||
- Examples: Low stock, pending PO approval, equipment failure
|
||
|
||
**PREVENTED_ISSUE** (~10% of alerts):
|
||
- AI already handled the problem
|
||
- Positive framing: "I prevented X by doing Y"
|
||
- User awareness only, no action needed
|
||
- Examples: "Stock shortage prevented by auto-PO"
|
||
|
||
**TREND_WARNING** (~15% of alerts):
|
||
- Proactive insight about emerging patterns
|
||
- Gives user time to prepare
|
||
- May become action_needed if ignored
|
||
- Examples: "Demand trending up 35% this week"
|
||
|
||
**ESCALATION** (~3% of alerts):
|
||
- Time-sensitive with auto-action countdown
|
||
- System will act automatically if user doesn't
|
||
- Countdown timer shown prominently
|
||
- Examples: "Critical stock, auto-ordering in 2 hours"
|
||
|
||
**INFORMATION** (~2% of alerts):
|
||
- FYI only, no action expected
|
||
- Low priority
|
||
- Often batched for digest emails
|
||
- Examples: "Production batch completed"
|
||
|
||
### 6.2 Event Domains
|
||
|
||
- **inventory**: Stock levels, expiration, movements
|
||
- **production**: Batches, capacity, equipment
|
||
- **procurement**: Purchase orders, deliveries, suppliers
|
||
- **forecasting**: Demand predictions, trends
|
||
- **orders**: Customer orders, fulfillment
|
||
- **orchestrator**: AI-driven automation actions
|
||
- **delivery**: Delivery tracking, receipt
|
||
- **sales**: Sales analytics, patterns
|
||
|
||
### 6.3 Alert Type Catalog (40+ types)
|
||
|
||
#### Inventory Domain
|
||
```
|
||
critical_stock_shortage (action_needed, critical)
|
||
low_stock_warning (action_needed, important)
|
||
expired_products (action_needed, critical)
|
||
stock_depleted_by_order (information, standard)
|
||
stock_received (notification, info)
|
||
stock_movement (notification, info)
|
||
```
|
||
|
||
#### Production Domain
|
||
```
|
||
production_delay (action_needed, important)
|
||
equipment_failure (action_needed, critical)
|
||
capacity_overload (action_needed, important)
|
||
quality_control_failure (action_needed, critical)
|
||
batch_state_changed (notification, info)
|
||
batch_completed (notification, info)
|
||
```
|
||
|
||
#### Procurement Domain
|
||
```
|
||
po_approval_needed (action_needed, important)
|
||
po_approval_escalation (escalation, critical)
|
||
delivery_overdue (action_needed, critical)
|
||
po_approved (notification, info)
|
||
po_sent (notification, info)
|
||
delivery_scheduled (notification, info)
|
||
delivery_received (notification, info)
|
||
```
|
||
|
||
#### Delivery Tracking
|
||
```
|
||
delivery_scheduled (information, info)
|
||
delivery_arriving_soon (action_needed, important)
|
||
delivery_overdue (action_needed, critical)
|
||
stock_receipt_incomplete (action_needed, important)
|
||
```
|
||
|
||
#### Forecasting Domain
|
||
```
|
||
demand_surge_predicted (trend_warning, important)
|
||
weekend_demand_surge (trend_warning, standard)
|
||
weather_impact_forecast (trend_warning, standard)
|
||
holiday_preparation (trend_warning, important)
|
||
```
|
||
|
||
#### Operations Domain
|
||
```
|
||
orchestration_run_started (notification, info)
|
||
orchestration_run_completed (notification, info)
|
||
action_created (notification, info)
|
||
```
|
||
|
||
### 6.4 Placement Hints
|
||
|
||
**Where alerts appear**:
|
||
- `ACTION_QUEUE`: Dashboard action section (action_needed)
|
||
- `NOTIFICATION_PANEL`: Bell icon dropdown (notifications)
|
||
- `DASHBOARD_INLINE`: Embedded in relevant page section
|
||
- `TOAST`: Immediate popup (critical alerts)
|
||
- `EMAIL_DIGEST`: End-of-day summary email
|
||
|
||
---
|
||
|
||
## 7. Smart Actions & User Agency
|
||
|
||
### 7.1 Action Types
|
||
|
||
**Complete Enumeration**:
|
||
```python
|
||
class SmartActionType(str, Enum):
|
||
# Procurement
|
||
APPROVE_PO = "approve_po"
|
||
REJECT_PO = "reject_po"
|
||
MODIFY_PO = "modify_po"
|
||
CALL_SUPPLIER = "call_supplier"
|
||
|
||
# Production
|
||
START_PRODUCTION_BATCH = "start_production_batch"
|
||
RESCHEDULE_PRODUCTION = "reschedule_production"
|
||
HALT_PRODUCTION = "halt_production"
|
||
|
||
# Inventory
|
||
MARK_DELIVERY_RECEIVED = "mark_delivery_received"
|
||
COMPLETE_STOCK_RECEIPT = "complete_stock_receipt"
|
||
ADJUST_STOCK_MANUALLY = "adjust_stock_manually"
|
||
|
||
# Customer Service
|
||
NOTIFY_CUSTOMER = "notify_customer"
|
||
CANCEL_ORDER = "cancel_order"
|
||
ADJUST_DELIVERY_DATE = "adjust_delivery_date"
|
||
|
||
# System
|
||
SNOOZE_ALERT = "snooze_alert"
|
||
DISMISS_ALERT = "dismiss_alert"
|
||
ESCALATE_TO_MANAGER = "escalate_to_manager"
|
||
```
|
||
|
||
### 7.2 Action Lifecycle
|
||
|
||
**1. Generation** (enrichment stage):
|
||
- Service context: What's possible in this situation?
|
||
- User agency: Can user execute this action?
|
||
- Permissions: Does user have required role?
|
||
- Conditional rendering: Disable if prerequisites not met
|
||
|
||
**2. Display** (frontend):
|
||
- Primary action highlighted (most recommended)
|
||
- Secondary actions offered (alternatives)
|
||
- Disabled actions shown with reason tooltip
|
||
- Consequence preview on hover
|
||
|
||
**3. Execution** (API call):
|
||
- Handler routes by action type
|
||
- Executes business logic (PO approval, schedule change, etc.)
|
||
- Creates audit trail
|
||
- Emits follow-up events/notifications
|
||
- May create new alerts
|
||
|
||
**4. Escalation** (if unacted):
|
||
- 24h: Alert priority boosted
|
||
- 48h: Type changed to escalation
|
||
- 72h: Priority boosted further, countdown timer shown
|
||
- System may auto-execute if configured
|
||
|
||
### 7.3 Consequence Preview
|
||
|
||
**Purpose**: Build trust by showing impact before action
|
||
|
||
**Example**:
|
||
```typescript
|
||
{
|
||
action: "approve_po",
|
||
consequence: {
|
||
immediate: "Order will be sent to supplier within 5 minutes",
|
||
timing: "Delivery expected in 2-3 business days",
|
||
cost: "€1,250.00 will be added to monthly expenses",
|
||
impact: "Resolves low stock for 3 ingredients affecting 8 orders"
|
||
}
|
||
}
|
||
```
|
||
|
||
**Display**:
|
||
- Shown on hover or in confirmation modal
|
||
- Highlights positive outcomes (orders fulfilled)
|
||
- Notes financial impact (€ amount)
|
||
- Clarifies timing (when effect occurs)
|
||
|
||
---
|
||
|
||
## 8. Alert Lifecycle & State Transitions
|
||
|
||
### 8.1 Alert States
|
||
|
||
```
|
||
Created → Active
|
||
↓
|
||
├─→ Acknowledged (user saw it)
|
||
├─→ In Progress (user taking action)
|
||
├─→ Resolved (action completed)
|
||
├─→ Dismissed (user chose to ignore)
|
||
└─→ Snoozed (remind me later)
|
||
```
|
||
|
||
### 8.2 State Transitions
|
||
|
||
**Created → Active**:
|
||
- Automatic on creation
|
||
- Appears in relevant UI sections based on placement hints
|
||
|
||
**Active → Acknowledged**:
|
||
- User clicks alert or views action queue
|
||
- Tracked for analytics (response time)
|
||
|
||
**Acknowledged → In Progress**:
|
||
- User starts working on resolution
|
||
- May set estimated completion time
|
||
|
||
**In Progress → Resolved**:
|
||
- Smart action executed successfully
|
||
- Or user manually marks as resolved
|
||
- `resolved_at` timestamp set
|
||
|
||
**Active → Dismissed**:
|
||
- User chooses not to act
|
||
- May require dismissal reason (for audit)
|
||
|
||
**Active → Snoozed**:
|
||
- User requests reminder later (e.g., in 1 hour, tomorrow morning)
|
||
- Returns to Active at scheduled time
|
||
|
||
### 8.3 Key Fields
|
||
|
||
**Lifecycle Tracking**:
|
||
```python
|
||
status: AlertStatus # Current state
|
||
created_at: datetime # When alert was created
|
||
acknowledged_at: datetime # When user first viewed
|
||
resolved_at: datetime # When action completed
|
||
action_created_at: datetime # For escalation age calculation
|
||
```
|
||
|
||
**Interaction Tracking**:
|
||
```python
|
||
interactions: List[AlertInteraction] # All user interactions
|
||
last_interaction_at: datetime # Most recent interaction
|
||
response_time_seconds: int # Time to first action
|
||
resolution_time_seconds: int # Time to resolution
|
||
```
|
||
|
||
### 8.4 Alert Interactions
|
||
|
||
**Tracked Events**:
|
||
- `view`: User viewed alert
|
||
- `acknowledge`: User acknowledged alert
|
||
- `action_taken`: User executed smart action
|
||
- `snooze`: User snoozed alert
|
||
- `dismiss`: User dismissed alert
|
||
- `resolve`: User resolved alert
|
||
|
||
**Interaction Record**:
|
||
```python
|
||
class AlertInteraction(Base):
|
||
id: UUID
|
||
tenant_id: UUID
|
||
alert_id: UUID
|
||
user_id: UUID
|
||
interaction_type: InteractionType
|
||
action_type: Optional[SmartActionType]
|
||
metadata: dict # Context of interaction
|
||
created_at: datetime
|
||
```
|
||
|
||
**Analytics Usage**:
|
||
- Measure alert effectiveness (% resolved)
|
||
- Track response times (how quickly users act)
|
||
- Identify ignored alerts (high dismiss rate)
|
||
- Optimize smart action suggestions
|
||
|
||
---
|
||
|
||
## 9. Escalation System
|
||
|
||
### 9.1 Time-Based Escalation
|
||
|
||
**Purpose**: Prevent action fatigue and ensure critical alerts don't age
|
||
|
||
**Escalation Rules**:
|
||
```python
|
||
# Applied hourly to action_needed alerts
|
||
|
||
if alert.status == "active" and alert.type_class == "action_needed":
|
||
age_hours = (now - alert.action_created_at).hours
|
||
|
||
escalation_boost = 0
|
||
|
||
# Age-based escalation
|
||
if age_hours > 72:
|
||
escalation_boost = 20
|
||
elif age_hours > 48:
|
||
escalation_boost = 10
|
||
|
||
# Deadline-based escalation
|
||
if alert.deadline:
|
||
hours_to_deadline = (alert.deadline - now).hours
|
||
if hours_to_deadline < 6:
|
||
escalation_boost = max(escalation_boost, 30)
|
||
elif hours_to_deadline < 24:
|
||
escalation_boost = max(escalation_boost, 15)
|
||
|
||
# Skip if already critical
|
||
if alert.priority_score >= 90:
|
||
escalation_boost = 0
|
||
|
||
# Apply boost (capped at +30)
|
||
alert.priority_score += min(escalation_boost, 30)
|
||
alert.priority_level = calculate_level(alert.priority_score)
|
||
```
|
||
|
||
### 9.2 Escalation Cronjob
|
||
|
||
**Schedule**: Every hour at :15 (`:15 * * * *`)
|
||
|
||
**Configuration**:
|
||
```yaml
|
||
alert-priority-recalculation-cronjob:
|
||
schedule: "15 * * * *"
|
||
resources:
|
||
memory: 256Mi
|
||
cpu: 100m
|
||
timeout: 30 minutes
|
||
concurrency: Forbid
|
||
batch_size: 50
|
||
```
|
||
|
||
**Processing Logic**:
|
||
1. Query all `action_needed` alerts with `status=active`
|
||
2. Batch process (50 alerts at a time)
|
||
3. Calculate escalation boost for each
|
||
4. Update `priority_score` and `priority_level`
|
||
5. Add `escalation_metadata` (boost amount, reason)
|
||
6. Invalidate Redis cache (`tenant:{id}:alerts:*`)
|
||
7. Log escalation events for analytics
|
||
|
||
### 9.3 Escalation Metadata
|
||
|
||
**Stored in enrichment_context**:
|
||
```json
|
||
{
|
||
"escalation": {
|
||
"applied_at": "2025-11-25T15:00:00Z",
|
||
"boost_amount": 20,
|
||
"reason": "pending_72h",
|
||
"previous_score": 65,
|
||
"new_score": 85,
|
||
"previous_level": "standard",
|
||
"new_level": "important"
|
||
}
|
||
}
|
||
```
|
||
|
||
### 9.4 Escalation to Auto-Action
|
||
|
||
**When**:
|
||
- Alert >72h old
|
||
- Priority ≥90 (critical)
|
||
- Has auto-action configured
|
||
|
||
**Process**:
|
||
```python
|
||
if age_hours > 72 and priority_score >= 90:
|
||
alert.type_class = "escalation"
|
||
alert.auto_action_countdown_seconds = 7200 # 2 hours
|
||
alert.auto_action_type = determine_auto_action(alert)
|
||
alert.auto_action_metadata = {...}
|
||
```
|
||
|
||
**Frontend Display**:
|
||
- Shows countdown timer: "Auto-approving PO in 1h 23m"
|
||
- Primary action becomes "Cancel Auto-Action"
|
||
- User can cancel or let system proceed
|
||
|
||
---
|
||
|
||
## 10. Alert Chaining & Deduplication
|
||
|
||
### 10.1 Deduplication Strategy
|
||
|
||
**Purpose**: Prevent alert spam when same issue detected multiple times
|
||
|
||
**Deduplication Key**:
|
||
```python
|
||
def generate_dedup_key(tenant_id, alert_type, entity_ids):
|
||
key_parts = [alert_type]
|
||
|
||
# Add entity identifiers
|
||
if product_id:
|
||
key_parts.append(f"product:{product_id}")
|
||
if supplier_id:
|
||
key_parts.append(f"supplier:{supplier_id}")
|
||
if batch_id:
|
||
key_parts.append(f"batch:{batch_id}")
|
||
|
||
key = ":".join(key_parts)
|
||
return f"{tenant_id}:alert:{key}"
|
||
```
|
||
|
||
**Redis Check**:
|
||
```python
|
||
dedup_key = generate_dedup_key(...)
|
||
if redis.exists(dedup_key):
|
||
return # Skip, alert already exists
|
||
else:
|
||
redis.setex(dedup_key, 900, "1") # 15-minute window
|
||
create_alert(...)
|
||
```
|
||
|
||
### 10.2 Alert Chaining
|
||
|
||
**Purpose**: Link related alerts to tell coherent story
|
||
|
||
**Database Fields** (added in migration 20251123):
|
||
```python
|
||
action_created_at: datetime # Original creation time (for age)
|
||
superseded_by_action_id: UUID # Links to solving action
|
||
hidden_from_ui: bool # Hide superseded alerts
|
||
```
|
||
|
||
### 10.3 Chaining Methods
|
||
|
||
**1. Mark as Superseded**:
|
||
```python
|
||
def mark_alert_as_superseded(alert_id, solving_action_id):
|
||
alert = db.query(Alert).filter(Alert.id == alert_id).first()
|
||
alert.superseded_by_action_id = solving_action_id
|
||
alert.hidden_from_ui = True
|
||
alert.updated_at = now()
|
||
db.commit()
|
||
|
||
# Invalidate cache
|
||
redis.delete(f"tenant:{alert.tenant_id}:alerts:*")
|
||
```
|
||
|
||
**2. Create Combined Alert**:
|
||
```python
|
||
def create_combined_alert(original_alert, solving_action):
|
||
# Create new prevented_issue alert
|
||
combined_alert = Alert(
|
||
tenant_id=original_alert.tenant_id,
|
||
alert_type="prevented_issue",
|
||
type_class="prevented_issue",
|
||
title=f"Stock shortage prevented",
|
||
message=f"I detected low stock for {product_name} and created "
|
||
f"PO-{po_number} automatically. Order will arrive in 2 days.",
|
||
priority_level="info",
|
||
metadata={
|
||
"original_alert_id": str(original_alert.id),
|
||
"solving_action_id": str(solving_action.id),
|
||
"problem": original_alert.message,
|
||
"solution": solving_action.description
|
||
}
|
||
)
|
||
db.add(combined_alert)
|
||
db.commit()
|
||
|
||
# Mark original as superseded
|
||
mark_alert_as_superseded(original_alert.id, combined_alert.id)
|
||
```
|
||
|
||
**3. Find Related Alerts**:
|
||
```python
|
||
def find_related_alert(tenant_id, alert_type, product_id):
|
||
return db.query(Alert).filter(
|
||
Alert.tenant_id == tenant_id,
|
||
Alert.alert_type == alert_type,
|
||
Alert.metadata['product_id'].astext == product_id,
|
||
Alert.created_at > now() - timedelta(hours=24),
|
||
Alert.hidden_from_ui == False
|
||
).first()
|
||
```
|
||
|
||
**4. Filter Hidden Alerts**:
|
||
```python
|
||
def get_active_alerts(tenant_id):
|
||
return db.query(Alert).filter(
|
||
Alert.tenant_id == tenant_id,
|
||
Alert.status.in_(["active", "acknowledged"]),
|
||
Alert.hidden_from_ui == False # Exclude superseded alerts
|
||
).all()
|
||
```
|
||
|
||
### 10.4 Chaining Example Flow
|
||
|
||
```
|
||
Step 1: Low stock detected
|
||
→ Create LOW_STOCK alert (action_needed, priority: 75)
|
||
→ User sees "Low stock for flour, action needed"
|
||
|
||
Step 2: Daily Orchestrator runs
|
||
→ Finds LOW_STOCK alert
|
||
→ Creates purchase order automatically
|
||
→ PO-12345 created with delivery date
|
||
|
||
Step 3: Orchestrator chains alerts
|
||
→ Calls mark_alert_as_superseded(low_stock_alert.id, po.id)
|
||
→ Creates PREVENTED_ISSUE alert
|
||
→ Message: "I prevented flour shortage by creating PO-12345.
|
||
Delivery arrives Nov 28. Approve or modify if needed."
|
||
|
||
Step 4: User sees only prevented_issue alert
|
||
→ Original low stock alert hidden from UI
|
||
→ User understands: problem detected → AI acted → needs approval
|
||
→ Single coherent narrative, not 3 separate alerts
|
||
```
|
||
|
||
---
|
||
|
||
## 11. Cronjob Integration
|
||
|
||
### 11.1 Why CronJobs Are Needed
|
||
|
||
**Event System Cannot**:
|
||
- Emit events "2 hours before delivery"
|
||
- Detect "alert is now 48 hours old"
|
||
- Poll external state (procurement PO status)
|
||
|
||
**CronJobs Excel At**:
|
||
- Time-based conditions
|
||
- Periodic checks
|
||
- Predictive alerts
|
||
- Batch recalculations
|
||
|
||
### 11.2 Delivery Tracking CronJob
|
||
|
||
**Schedule**: Every hour at :30 (`:30 * * * *`)
|
||
|
||
**Configuration**:
|
||
```yaml
|
||
delivery-tracking-cronjob:
|
||
schedule: "30 * * * *"
|
||
resources:
|
||
memory: 256Mi
|
||
cpu: 100m
|
||
timeout: 30 minutes
|
||
concurrency: Forbid
|
||
```
|
||
|
||
**Service**: `DeliveryTrackingService` in Orchestrator
|
||
|
||
**Processing Flow**:
|
||
```python
|
||
def check_expected_deliveries():
|
||
# Query procurement service for expected deliveries
|
||
deliveries = procurement_api.get_expected_deliveries(
|
||
from_date=now(),
|
||
to_date=now() + timedelta(days=3)
|
||
)
|
||
|
||
for delivery in deliveries:
|
||
current_time = now()
|
||
expected_time = delivery.expected_delivery_datetime
|
||
window_start = delivery.delivery_window_start
|
||
window_end = delivery.delivery_window_end
|
||
|
||
# T-2h: Arriving soon alert
|
||
if current_time >= (window_start - timedelta(hours=2)) and \
|
||
current_time < window_start:
|
||
send_arriving_soon_alert(delivery)
|
||
|
||
# T+30min: Overdue alert
|
||
elif current_time > (window_end + timedelta(minutes=30)) and \
|
||
not delivery.marked_received:
|
||
send_overdue_alert(delivery)
|
||
|
||
# Window passed, not received: Incomplete alert
|
||
elif current_time > (window_end + timedelta(hours=2)) and \
|
||
not delivery.marked_received and \
|
||
not delivery.stock_receipt_id:
|
||
send_receipt_incomplete_alert(delivery)
|
||
```
|
||
|
||
**Alert Types Generated**:
|
||
|
||
1. **DELIVERY_ARRIVING_SOON** (T-2h):
|
||
```python
|
||
{
|
||
"alert_type": "delivery_arriving_soon",
|
||
"type_class": "action_needed",
|
||
"priority_level": "important",
|
||
"placement": "action_queue",
|
||
"smart_actions": [
|
||
{
|
||
"type": "mark_delivery_received",
|
||
"label": "Mark as Received",
|
||
"variant": "primary"
|
||
}
|
||
]
|
||
}
|
||
```
|
||
|
||
2. **DELIVERY_OVERDUE** (T+30min):
|
||
```python
|
||
{
|
||
"alert_type": "delivery_overdue",
|
||
"type_class": "action_needed",
|
||
"priority_level": "critical",
|
||
"priority_score": 95,
|
||
"smart_actions": [
|
||
{
|
||
"type": "call_supplier",
|
||
"label": "Call Supplier",
|
||
"metadata": {
|
||
"supplier_contact": "+34-123-456-789"
|
||
}
|
||
}
|
||
]
|
||
}
|
||
```
|
||
|
||
3. **STOCK_RECEIPT_INCOMPLETE** (Post-window):
|
||
```python
|
||
{
|
||
"alert_type": "stock_receipt_incomplete",
|
||
"type_class": "action_needed",
|
||
"priority_level": "important",
|
||
"priority_score": 80,
|
||
"smart_actions": [
|
||
{
|
||
"type": "complete_stock_receipt",
|
||
"label": "Complete Stock Receipt",
|
||
"metadata": {
|
||
"po_id": "...",
|
||
"draft_receipt_id": "..."
|
||
}
|
||
}
|
||
]
|
||
}
|
||
```
|
||
|
||
### 11.3 Delivery Alert Lifecycle
|
||
|
||
```
|
||
PO Approved
|
||
↓
|
||
DELIVERY_SCHEDULED (informational, notification_panel)
|
||
↓ T-2 hours
|
||
DELIVERY_ARRIVING_SOON (action_needed, action_queue)
|
||
↓ Expected time + 30 min
|
||
DELIVERY_OVERDUE (critical, action_queue + toast)
|
||
↓ Window passed + 2 hours
|
||
STOCK_RECEIPT_INCOMPLETE (important, action_queue)
|
||
```
|
||
|
||
### 11.4 Priority Recalculation CronJob
|
||
|
||
See [Section 9.2](#92-escalation-cronjob) for details.
|
||
|
||
### 11.5 Decision Matrix: Events vs CronJobs
|
||
|
||
| Feature | Event System | CronJob | Best Choice |
|
||
|---------|--------------|---------|-------------|
|
||
| State change notification | ✅ Excellent | ❌ Poor | Event System |
|
||
| Time-based alerts | ❌ Complex | ✅ Simple | CronJob ✅ |
|
||
| Real-time updates | ✅ Instant | ❌ Delayed | Event System |
|
||
| Predictive alerts | ❌ Hard | ✅ Easy | CronJob ✅ |
|
||
| Priority escalation | ❌ Complex | ✅ Natural | CronJob ✅ |
|
||
| Deadline tracking | ❌ Complex | ✅ Simple | CronJob ✅ |
|
||
| Batch processing | ❌ Not designed | ✅ Ideal | CronJob ✅ |
|
||
|
||
---
|
||
|
||
## 12. Service Integration Patterns
|
||
|
||
### 12.1 Base Alert Service
|
||
|
||
**All services extend**: `BaseAlertService` from `shared/alerts/base_service.py`
|
||
|
||
**Core Method**:
|
||
```python
|
||
async def publish_item(
|
||
self,
|
||
tenant_id: UUID,
|
||
item_data: dict,
|
||
item_type: ItemType = ItemType.ALERT
|
||
):
|
||
# Validate schema
|
||
validated_item = validate_item(item_data, item_type)
|
||
|
||
# Generate deduplication key
|
||
dedup_key = self.generate_dedup_key(tenant_id, validated_item)
|
||
|
||
# Check Redis for duplicates (15-minute window)
|
||
if await self.redis.exists(dedup_key):
|
||
logger.info(f"Skipping duplicate {item_type}: {dedup_key}")
|
||
return
|
||
|
||
# Publish to RabbitMQ
|
||
await self.rabbitmq.publish(
|
||
exchange="alerts.exchange",
|
||
routing_key=f"{item_type}.{validated_item['severity']}",
|
||
message={
|
||
"tenant_id": str(tenant_id),
|
||
"item_type": item_type,
|
||
"data": validated_item
|
||
}
|
||
)
|
||
|
||
# Set deduplication key
|
||
await self.redis.setex(dedup_key, 900, "1") # 15 minutes
|
||
```
|
||
|
||
### 12.2 Inventory Service
|
||
|
||
**Service Class**: `InventoryAlertService`
|
||
|
||
**Background Jobs**:
|
||
```python
|
||
# Check stock levels every 5 minutes
|
||
@scheduler.scheduled_job('interval', minutes=5)
|
||
async def check_stock_levels():
|
||
service = InventoryAlertService()
|
||
critical_items = await service.find_critical_stock()
|
||
|
||
for item in critical_items:
|
||
await service.publish_item(
|
||
tenant_id=item.tenant_id,
|
||
item_data={
|
||
"type": "critical_stock_shortage",
|
||
"severity": "high",
|
||
"title": f"Critical: {item.name} stock depleted",
|
||
"message": f"Only {item.current_stock}{item.unit} remaining. "
|
||
f"Required: {item.minimum_stock}{item.unit}",
|
||
"actions": ["approve_po", "call_supplier"],
|
||
"metadata": {
|
||
"ingredient_id": str(item.id),
|
||
"current_stock": item.current_stock,
|
||
"minimum_stock": item.minimum_stock,
|
||
"unit": item.unit
|
||
}
|
||
},
|
||
item_type=ItemType.ALERT
|
||
)
|
||
|
||
# Check expiring products every 2 hours
|
||
@scheduler.scheduled_job('interval', hours=2)
|
||
async def check_expiring_products():
|
||
# Similar pattern...
|
||
```
|
||
|
||
**Event-Driven Alerts**:
|
||
```python
|
||
# Listen to order events
|
||
@event_handler("order.created")
|
||
async def on_order_created(event):
|
||
service = InventoryAlertService()
|
||
order = event.data
|
||
|
||
# Check if order depletes stock below threshold
|
||
for item in order.items:
|
||
stock_after_order = calculate_remaining_stock(item)
|
||
|
||
if stock_after_order < item.minimum_stock:
|
||
await service.publish_item(
|
||
tenant_id=order.tenant_id,
|
||
item_data={
|
||
"type": "stock_depleted_by_order",
|
||
"severity": "medium",
|
||
# ... details
|
||
},
|
||
item_type=ItemType.ALERT
|
||
)
|
||
```
|
||
|
||
**Recommendations**:
|
||
```python
|
||
async def analyze_inventory_optimization():
|
||
# Analyze stock patterns
|
||
# Generate optimization recommendations
|
||
await service.publish_item(
|
||
tenant_id=tenant_id,
|
||
item_data={
|
||
"type": "inventory_optimization",
|
||
"title": "Reduce waste by adjusting par levels",
|
||
"suggested_actions": ["adjust_par_levels"],
|
||
"estimated_impact": "Save €250/month",
|
||
"confidence_score": 0.85
|
||
},
|
||
item_type=ItemType.RECOMMENDATION
|
||
)
|
||
```
|
||
|
||
### 12.3 Production Service
|
||
|
||
**Service Class**: `ProductionAlertService`
|
||
|
||
**Background Jobs**:
|
||
```python
|
||
@scheduler.scheduled_job('interval', minutes=15)
|
||
async def check_production_capacity():
|
||
# Check if scheduled batches exceed capacity
|
||
# Emit capacity_overload alerts
|
||
|
||
@scheduler.scheduled_job('interval', minutes=10)
|
||
async def check_production_delays():
|
||
# Check batches behind schedule
|
||
# Emit production_delay alerts
|
||
```
|
||
|
||
**Event-Driven**:
|
||
```python
|
||
@event_handler("equipment.status_changed")
|
||
async def on_equipment_failure(event):
|
||
if event.data.status == "failed":
|
||
await service.publish_item(
|
||
item_data={
|
||
"type": "equipment_failure",
|
||
"severity": "high",
|
||
"priority_score": 95, # Manual override
|
||
# ...
|
||
}
|
||
)
|
||
```
|
||
|
||
### 12.4 Forecasting Service
|
||
|
||
**Service Class**: `ForecastingRecommendationService`
|
||
|
||
**Scheduled Analysis**:
|
||
```python
|
||
@scheduler.scheduled_job('cron', day_of_week='fri', hour=15)
|
||
async def check_weekend_demand_surge():
|
||
forecast = await get_weekend_forecast()
|
||
|
||
if forecast.predicted_demand > (forecast.baseline * 1.3):
|
||
await service.publish_item(
|
||
item_data={
|
||
"type": "demand_surge_weekend",
|
||
"title": "Weekend demand surge predicted",
|
||
"message": f"Demand trending up {forecast.increase_pct}%. "
|
||
f"Consider increasing production.",
|
||
"suggested_actions": ["increase_production"],
|
||
"confidence_score": forecast.confidence
|
||
},
|
||
item_type=ItemType.RECOMMENDATION
|
||
)
|
||
```
|
||
|
||
### 12.5 Procurement Service
|
||
|
||
**Service Class**: `ProcurementEventService` (mixed alerts + notifications)
|
||
|
||
**Event-Driven**:
|
||
```python
|
||
@event_handler("po.created")
|
||
async def on_po_created(event):
|
||
po = event.data
|
||
|
||
if po.amount > APPROVAL_THRESHOLD:
|
||
# Emit alert requiring approval
|
||
await service.publish_item(
|
||
item_data={
|
||
"type": "po_approval_needed",
|
||
"severity": "medium",
|
||
# ...
|
||
},
|
||
item_type=ItemType.ALERT
|
||
)
|
||
else:
|
||
# Emit notification (auto-approved)
|
||
await service.publish_item(
|
||
item_data={
|
||
"type": "po_approved",
|
||
"message": f"PO-{po.number} auto-approved (€{po.amount})",
|
||
"old_state": "draft",
|
||
"new_state": "approved"
|
||
},
|
||
item_type=ItemType.NOTIFICATION
|
||
)
|
||
```
|
||
|
||
---
|
||
|
||
## 13. Frontend Integration
|
||
|
||
### 13.1 React Hooks Catalog (18 hooks)
|
||
|
||
#### Alert Hooks (4)
|
||
```typescript
|
||
// Subscribe to all critical alerts
|
||
const { alerts, criticalAlerts, isLoading } = useAlerts({
|
||
domains: ['inventory', 'production'],
|
||
minPriority: 'important'
|
||
});
|
||
|
||
// Critical alerts only
|
||
const { criticalAlerts } = useCriticalAlerts();
|
||
|
||
// Action-needed alerts only
|
||
const { alerts } = useActionNeededAlerts();
|
||
|
||
// Domain-specific alerts
|
||
const { alerts } = useAlertsByDomain('inventory');
|
||
```
|
||
|
||
#### Notification Hooks (9)
|
||
```typescript
|
||
// All notifications
|
||
const { notifications } = useEventNotifications();
|
||
|
||
// Domain-specific notifications
|
||
const { notifications } = useProductionNotifications();
|
||
const { notifications } = useInventoryNotifications();
|
||
const { notifications } = useSupplyChainNotifications();
|
||
const { notifications } = useOperationsNotifications();
|
||
|
||
// Type-specific notifications
|
||
const { notifications } = useBatchNotifications();
|
||
const { notifications } = useDeliveryNotifications();
|
||
const { notifications } = useOrchestrationNotifications();
|
||
|
||
// Generic domain filter
|
||
const { notifications } = useNotificationsByDomain('production');
|
||
```
|
||
|
||
#### Recommendation Hooks (5)
|
||
```typescript
|
||
// All recommendations
|
||
const { recommendations } = useRecommendations();
|
||
|
||
// Type-specific recommendations
|
||
const { recommendations } = useDemandRecommendations();
|
||
const { recommendations } = useInventoryOptimizationRecommendations();
|
||
const { recommendations } = useCostReductionRecommendations();
|
||
|
||
// High confidence only
|
||
const { recommendations } = useHighConfidenceRecommendations(0.8);
|
||
|
||
// Generic filters
|
||
const { recommendations } = useRecommendationsByDomain('forecasting');
|
||
const { recommendations } = useRecommendationsByType('demand_surge');
|
||
```
|
||
|
||
### 13.2 Base SSE Hook
|
||
|
||
**`useSSE` Hook**:
|
||
```typescript
|
||
function useSSE(channels: string[]) {
|
||
const [events, setEvents] = useState<Event[]>([]);
|
||
const [isConnected, setIsConnected] = useState(false);
|
||
|
||
useEffect(() => {
|
||
const eventSource = new EventSource(
|
||
`/api/events/sse?channels=${channels.join(',')}`
|
||
);
|
||
|
||
eventSource.onopen = () => setIsConnected(true);
|
||
|
||
eventSource.onmessage = (event) => {
|
||
const data = JSON.parse(event.data);
|
||
setEvents(prev => [data, ...prev]);
|
||
};
|
||
|
||
eventSource.onerror = () => setIsConnected(false);
|
||
|
||
return () => eventSource.close();
|
||
}, [channels]);
|
||
|
||
return { events, isConnected };
|
||
}
|
||
```
|
||
|
||
### 13.3 TypeScript Definitions
|
||
|
||
**Alert Type**:
|
||
```typescript
|
||
interface Alert {
|
||
id: string;
|
||
tenant_id: string;
|
||
alert_type: string;
|
||
type_class: AlertTypeClass;
|
||
service: string;
|
||
title: string;
|
||
message: string;
|
||
status: AlertStatus;
|
||
priority_score: number;
|
||
priority_level: PriorityLevel;
|
||
|
||
// Enrichment
|
||
orchestrator_context?: OrchestratorContext;
|
||
business_impact?: BusinessImpact;
|
||
urgency_context?: UrgencyContext;
|
||
user_agency?: UserAgency;
|
||
trend_context?: TrendContext;
|
||
|
||
// Actions
|
||
smart_actions?: SmartAction[];
|
||
|
||
// Metadata
|
||
alert_metadata?: Record<string, any>;
|
||
created_at: string;
|
||
updated_at: string;
|
||
resolved_at?: string;
|
||
}
|
||
|
||
enum AlertTypeClass {
|
||
ACTION_NEEDED = "action_needed",
|
||
PREVENTED_ISSUE = "prevented_issue",
|
||
TREND_WARNING = "trend_warning",
|
||
ESCALATION = "escalation",
|
||
INFORMATION = "information"
|
||
}
|
||
|
||
enum PriorityLevel {
|
||
CRITICAL = "critical",
|
||
IMPORTANT = "important",
|
||
STANDARD = "standard",
|
||
INFO = "info"
|
||
}
|
||
|
||
enum AlertStatus {
|
||
ACTIVE = "active",
|
||
ACKNOWLEDGED = "acknowledged",
|
||
IN_PROGRESS = "in_progress",
|
||
RESOLVED = "resolved",
|
||
DISMISSED = "dismissed",
|
||
SNOOZED = "snoozed"
|
||
}
|
||
```
|
||
|
||
### 13.4 Component Integration Examples
|
||
|
||
**Action Queue Card**:
|
||
```typescript
|
||
function UnifiedActionQueueCard() {
|
||
const { alerts } = useAlerts({
|
||
typeClass: ['action_needed', 'escalation'],
|
||
includeResolved: false
|
||
});
|
||
|
||
const groupedAlerts = useMemo(() => {
|
||
return groupByTimeCategory(alerts);
|
||
// Returns: { urgent: [...], today: [...], thisWeek: [...] }
|
||
}, [alerts]);
|
||
|
||
return (
|
||
<Card>
|
||
<h2>Actions Needed</h2>
|
||
{groupedAlerts.urgent.length > 0 && (
|
||
<UrgentSection alerts={groupedAlerts.urgent} />
|
||
)}
|
||
{groupedAlerts.today.length > 0 && (
|
||
<TodaySection alerts={groupedAlerts.today} />
|
||
)}
|
||
</Card>
|
||
);
|
||
}
|
||
```
|
||
|
||
**Health Hero Component**:
|
||
```typescript
|
||
function GlanceableHealthHero() {
|
||
const { criticalAlerts } = useCriticalAlerts();
|
||
const { notifications } = useEventNotifications();
|
||
|
||
const healthStatus = useMemo(() => {
|
||
if (criticalAlerts.length > 0) return 'red';
|
||
if (hasUrgentNotifications(notifications)) return 'yellow';
|
||
return 'green';
|
||
}, [criticalAlerts, notifications]);
|
||
|
||
return (
|
||
<Card>
|
||
<StatusIndicator color={healthStatus} />
|
||
{healthStatus === 'red' && (
|
||
<UrgentBadge count={criticalAlerts.length} />
|
||
)}
|
||
</Card>
|
||
);
|
||
}
|
||
```
|
||
|
||
**Event-Driven Refetch**:
|
||
```typescript
|
||
function InventoryStats() {
|
||
const { data, refetch } = useInventoryStats();
|
||
const { notifications } = useInventoryNotifications();
|
||
|
||
useEffect(() => {
|
||
const relevantEvent = notifications.find(
|
||
n => n.event_type === 'stock_received'
|
||
);
|
||
|
||
if (relevantEvent) {
|
||
refetch(); // Update stats on stock change
|
||
}
|
||
}, [notifications, refetch]);
|
||
|
||
return <StatsCard data={data} />;
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## 14. Redis Pub/Sub Architecture
|
||
|
||
### 14.1 Channel Naming Convention
|
||
|
||
**Pattern**: `tenant:{tenant_id}:{domain}.{event_type}`
|
||
|
||
**Examples**:
|
||
```
|
||
tenant:123e4567-e89b-12d3-a456-426614174000:inventory.alerts
|
||
tenant:123e4567-e89b-12d3-a456-426614174000:inventory.notifications
|
||
tenant:123e4567-e89b-12d3-a456-426614174000:production.alerts
|
||
tenant:123e4567-e89b-12d3-a456-426614174000:production.notifications
|
||
tenant:123e4567-e89b-12d3-a456-426614174000:supply_chain.alerts
|
||
tenant:123e4567-e89b-12d3-a456-426614174000:supply_chain.notifications
|
||
tenant:123e4567-e89b-12d3-a456-426614174000:operations.notifications
|
||
tenant:123e4567-e89b-12d3-a456-426614174000:recommendations
|
||
```
|
||
|
||
### 14.2 Domain-Based Routing
|
||
|
||
**Alert Processor publishes to Redis**:
|
||
```python
|
||
def publish_to_redis(alert):
|
||
domain = alert.domain # inventory, production, etc.
|
||
channel = f"tenant:{alert.tenant_id}:{domain}.alerts"
|
||
|
||
redis.publish(channel, json.dumps({
|
||
"id": str(alert.id),
|
||
"alert_type": alert.alert_type,
|
||
"type_class": alert.type_class,
|
||
"priority_level": alert.priority_level,
|
||
"title": alert.title,
|
||
"message": alert.message,
|
||
# ... full alert data
|
||
}))
|
||
```
|
||
|
||
### 14.3 Gateway SSE Endpoint
|
||
|
||
**Multi-Channel Subscription**:
|
||
```python
|
||
@app.get("/api/events/sse")
|
||
async def sse_endpoint(
|
||
channels: str, # Comma-separated: "inventory.alerts,production.alerts"
|
||
tenant_id: UUID = Depends(get_current_tenant)
|
||
):
|
||
async def event_stream():
|
||
pubsub = redis.pubsub()
|
||
|
||
# Subscribe to requested channels
|
||
for channel in channels.split(','):
|
||
full_channel = f"tenant:{tenant_id}:{channel}"
|
||
await pubsub.subscribe(full_channel)
|
||
|
||
# Stream events
|
||
async for message in pubsub.listen():
|
||
if message['type'] == 'message':
|
||
yield f"data: {message['data']}\n\n"
|
||
|
||
return StreamingResponse(
|
||
event_stream(),
|
||
media_type="text/event-stream"
|
||
)
|
||
```
|
||
|
||
**Wildcard Support**:
|
||
```typescript
|
||
// Frontend can subscribe to:
|
||
"*.alerts" // All alert channels
|
||
"inventory.*" // All inventory events
|
||
"*.notifications" // All notification channels
|
||
```
|
||
|
||
### 14.4 Traffic Reduction
|
||
|
||
**Before (legacy)**:
|
||
- All pages subscribe to single `tenant:{id}:events` channel
|
||
- 100% of events sent to all pages
|
||
- High bandwidth, slow filtering
|
||
|
||
**After (domain-based)**:
|
||
- Dashboard: Subscribes to `*.alerts`, `*.notifications`, `recommendations`
|
||
- Inventory page: Subscribes to `inventory.alerts`, `inventory.notifications`
|
||
- Production page: Subscribes to `production.alerts`, `production.notifications`
|
||
|
||
**Traffic Reduction by Page**:
|
||
| Page | Old Traffic | New Traffic | Reduction |
|
||
|------|-------------|-------------|-----------|
|
||
| Dashboard | 100% | 100% | 0% (needs all) |
|
||
| Inventory | 100% | 15% | **85%** |
|
||
| Production | 100% | 20% | **80%** |
|
||
| Supply Chain | 100% | 18% | **82%** |
|
||
|
||
**Average**: 70% reduction on specialized pages
|
||
|
||
---
|
||
|
||
## 15. Database Schema
|
||
|
||
### 15.1 Alerts Table
|
||
|
||
```sql
|
||
CREATE TABLE alerts (
|
||
-- Identity
|
||
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
||
tenant_id UUID NOT NULL,
|
||
|
||
-- Classification
|
||
alert_type VARCHAR(100) NOT NULL,
|
||
type_class VARCHAR(50) NOT NULL, -- action_needed, prevented_issue, etc.
|
||
service VARCHAR(50) NOT NULL,
|
||
event_domain VARCHAR(50), -- Added in migration 20251125
|
||
|
||
-- Content
|
||
title VARCHAR(500) NOT NULL,
|
||
message TEXT NOT NULL,
|
||
|
||
-- Status
|
||
status VARCHAR(50) NOT NULL DEFAULT 'active',
|
||
|
||
-- Priority
|
||
priority_score INTEGER NOT NULL DEFAULT 50,
|
||
priority_level VARCHAR(50) NOT NULL DEFAULT 'standard',
|
||
|
||
-- Enrichment Context (JSONB)
|
||
orchestrator_context JSONB,
|
||
business_impact JSONB,
|
||
urgency_context JSONB,
|
||
user_agency JSONB,
|
||
trend_context JSONB,
|
||
|
||
-- Smart Actions
|
||
smart_actions JSONB, -- Array of action objects
|
||
|
||
-- Timing
|
||
timing_decision VARCHAR(50),
|
||
scheduled_send_time TIMESTAMP,
|
||
|
||
-- Escalation (Added in migration 20251123)
|
||
action_created_at TIMESTAMP, -- For age calculation
|
||
superseded_by_action_id UUID, -- Links to solving action
|
||
hidden_from_ui BOOLEAN DEFAULT FALSE,
|
||
|
||
-- Metadata
|
||
alert_metadata JSONB,
|
||
|
||
-- Timestamps
|
||
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
|
||
updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
|
||
resolved_at TIMESTAMP,
|
||
|
||
-- Foreign Keys
|
||
FOREIGN KEY (tenant_id) REFERENCES tenants(id) ON DELETE CASCADE
|
||
);
|
||
```
|
||
|
||
### 15.2 Indexes
|
||
|
||
```sql
|
||
-- Tenant filtering
|
||
CREATE INDEX idx_alerts_tenant_status
|
||
ON alerts(tenant_id, status);
|
||
|
||
-- Priority sorting
|
||
CREATE INDEX idx_alerts_tenant_priority_created
|
||
ON alerts(tenant_id, priority_score DESC, created_at DESC);
|
||
|
||
-- Type class filtering
|
||
CREATE INDEX idx_alerts_tenant_typeclass_status
|
||
ON alerts(tenant_id, type_class, status);
|
||
|
||
-- Timing queries
|
||
CREATE INDEX idx_alerts_timing_scheduled
|
||
ON alerts(timing_decision, scheduled_send_time);
|
||
|
||
-- Escalation queries (Added in migration 20251123)
|
||
CREATE INDEX idx_alerts_tenant_action_created
|
||
ON alerts(tenant_id, action_created_at);
|
||
|
||
CREATE INDEX idx_alerts_superseded_by
|
||
ON alerts(superseded_by_action_id);
|
||
|
||
CREATE INDEX idx_alerts_tenant_hidden_status
|
||
ON alerts(tenant_id, hidden_from_ui, status);
|
||
|
||
-- Domain filtering (Added in migration 20251125)
|
||
CREATE INDEX idx_alerts_tenant_domain
|
||
ON alerts(tenant_id, event_domain);
|
||
```
|
||
|
||
### 15.3 Alert Interactions Table
|
||
|
||
```sql
|
||
CREATE TABLE alert_interactions (
|
||
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
||
tenant_id UUID NOT NULL,
|
||
alert_id UUID NOT NULL,
|
||
user_id UUID NOT NULL,
|
||
|
||
-- Interaction type
|
||
interaction_type VARCHAR(50) NOT NULL, -- view, acknowledge, action_taken, etc.
|
||
action_type VARCHAR(50), -- Smart action type if applicable
|
||
|
||
-- Context
|
||
metadata JSONB,
|
||
response_time_seconds INTEGER, -- Time from alert creation to this interaction
|
||
|
||
-- Timestamps
|
||
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
|
||
|
||
-- Foreign Keys
|
||
FOREIGN KEY (tenant_id) REFERENCES tenants(id) ON DELETE CASCADE,
|
||
FOREIGN KEY (alert_id) REFERENCES alerts(id) ON DELETE CASCADE,
|
||
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
|
||
);
|
||
|
||
CREATE INDEX idx_interactions_alert ON alert_interactions(alert_id);
|
||
CREATE INDEX idx_interactions_tenant_user ON alert_interactions(tenant_id, user_id);
|
||
```
|
||
|
||
### 15.4 Notifications Table
|
||
|
||
```sql
|
||
CREATE TABLE notifications (
|
||
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
||
tenant_id UUID NOT NULL,
|
||
|
||
-- Classification
|
||
event_type VARCHAR(100) NOT NULL,
|
||
event_domain VARCHAR(50) NOT NULL,
|
||
|
||
-- Content
|
||
title VARCHAR(500) NOT NULL,
|
||
message TEXT NOT NULL,
|
||
|
||
-- State change tracking
|
||
entity_type VARCHAR(50), -- "purchase_order", "batch", etc.
|
||
entity_id UUID,
|
||
old_state VARCHAR(50),
|
||
new_state VARCHAR(50),
|
||
|
||
-- Display
|
||
placement_hint VARCHAR(50) DEFAULT 'notification_panel',
|
||
|
||
-- Metadata
|
||
notification_metadata JSONB,
|
||
|
||
-- Timestamps
|
||
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
|
||
expires_at TIMESTAMP DEFAULT (NOW() + INTERVAL '7 days'),
|
||
|
||
-- Foreign Keys
|
||
FOREIGN KEY (tenant_id) REFERENCES tenants(id) ON DELETE CASCADE
|
||
);
|
||
|
||
CREATE INDEX idx_notifications_tenant_created
|
||
ON notifications(tenant_id, created_at DESC);
|
||
|
||
CREATE INDEX idx_notifications_tenant_domain
|
||
ON notifications(tenant_id, event_domain);
|
||
```
|
||
|
||
### 15.5 Recommendations Table
|
||
|
||
```sql
|
||
CREATE TABLE recommendations (
|
||
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
||
tenant_id UUID NOT NULL,
|
||
|
||
-- Classification
|
||
recommendation_type VARCHAR(100) NOT NULL,
|
||
event_domain VARCHAR(50) NOT NULL,
|
||
|
||
-- Content
|
||
title VARCHAR(500) NOT NULL,
|
||
message TEXT NOT NULL,
|
||
|
||
-- Actions & Impact
|
||
suggested_actions JSONB, -- Array of suggested action types
|
||
estimated_impact TEXT, -- "Save €250/month"
|
||
confidence_score DECIMAL(3, 2), -- 0.00 - 1.00
|
||
|
||
-- Status
|
||
status VARCHAR(50) DEFAULT 'active', -- active, dismissed, implemented
|
||
|
||
-- Metadata
|
||
recommendation_metadata JSONB,
|
||
|
||
-- Timestamps
|
||
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
|
||
expires_at TIMESTAMP DEFAULT (NOW() + INTERVAL '30 days'),
|
||
dismissed_at TIMESTAMP,
|
||
|
||
-- Foreign Keys
|
||
FOREIGN KEY (tenant_id) REFERENCES tenants(id) ON DELETE CASCADE
|
||
);
|
||
|
||
CREATE INDEX idx_recommendations_tenant_status
|
||
ON recommendations(tenant_id, status);
|
||
|
||
CREATE INDEX idx_recommendations_tenant_domain
|
||
ON recommendations(tenant_id, event_domain);
|
||
```
|
||
|
||
### 15.6 Migrations
|
||
|
||
**Key Migrations**:
|
||
|
||
1. **20251015_1230_initial_schema.py**
|
||
- Created alerts, notifications, recommendations tables
|
||
- Initial indexes
|
||
- Full enrichment fields
|
||
|
||
2. **20251123_add_alert_enhancements.py**
|
||
- Added `action_created_at` for escalation tracking
|
||
- Added `superseded_by_action_id` for chaining
|
||
- Added `hidden_from_ui` flag
|
||
- Created indexes for escalation queries
|
||
- Backfilled `action_created_at` for existing alerts
|
||
|
||
3. **20251125_add_event_domain_column.py**
|
||
- Added `event_domain` to alerts table
|
||
- Added index on (tenant_id, event_domain)
|
||
- Populated domain from existing alert_type patterns
|
||
|
||
---
|
||
|
||
## 16. Performance & Monitoring
|
||
|
||
### 16.1 Performance Metrics
|
||
|
||
**Processing Speed**:
|
||
- Alert enrichment: 500-800ms (full pipeline)
|
||
- Notification processing: 20-30ms (80% faster)
|
||
- Recommendation processing: 50-80ms (60% faster)
|
||
- Average improvement: 54%
|
||
|
||
**Database Query Performance**:
|
||
- Get active alerts by tenant: <50ms
|
||
- Get critical alerts with priority sort: <100ms
|
||
- Escalation age calculation: <150ms
|
||
- Alert chaining lookup: <75ms
|
||
|
||
**API Response Times**:
|
||
- GET /alerts (paginated): <200ms
|
||
- POST /alerts/{id}/acknowledge: <50ms
|
||
- POST /alerts/{id}/resolve: <100ms
|
||
|
||
**SSE Traffic**:
|
||
- Legacy (single channel): 100% of events to all pages
|
||
- New (domain-based): 70% reduction on specialized pages
|
||
- Dashboard: No change (needs all events)
|
||
- Domain pages: 80-85% reduction
|
||
|
||
### 16.2 Caching Strategy
|
||
|
||
**Redis Cache Keys**:
|
||
```
|
||
tenant:{tenant_id}:alerts:active
|
||
tenant:{tenant_id}:alerts:critical
|
||
tenant:{tenant_id}:orchestrator_context:{action_id}
|
||
```
|
||
|
||
**Cache Invalidation**:
|
||
- On alert creation: Invalidate `alerts:active`
|
||
- On priority update: Invalidate `alerts:critical`
|
||
- On escalation: Invalidate all alert caches
|
||
- On resolution: Invalidate both active and critical
|
||
|
||
**TTL**:
|
||
- Alert lists: 5 minutes
|
||
- Orchestrator context: 15 minutes
|
||
- Deduplication keys: 15 minutes
|
||
|
||
### 16.3 Monitoring Metrics
|
||
|
||
**Prometheus Metrics**:
|
||
```python
|
||
# Alert creation rate
|
||
alert_created_total = Counter('alert_created_total', 'Total alerts created', ['tenant_id', 'alert_type'])
|
||
|
||
# Enrichment timing
|
||
enrichment_duration_seconds = Histogram('enrichment_duration_seconds', 'Enrichment processing time', ['event_type'])
|
||
|
||
# Priority distribution
|
||
alert_priority_distribution = Histogram('alert_priority_distribution', 'Alert priority scores', ['priority_level'])
|
||
|
||
# Resolution metrics
|
||
alert_resolution_time_seconds = Histogram('alert_resolution_time_seconds', 'Time to resolve alerts', ['alert_type'])
|
||
|
||
# Escalation tracking
|
||
alert_escalated_total = Counter('alert_escalated_total', 'Alerts escalated', ['escalation_reason'])
|
||
|
||
# Deduplication hits
|
||
alert_deduplicated_total = Counter('alert_deduplicated_total', 'Alerts deduplicated', ['alert_type'])
|
||
```
|
||
|
||
**Key Metrics to Monitor**:
|
||
- Alert creation rate (per tenant, per type)
|
||
- Average resolution time (should decrease over time)
|
||
- Escalation rate (high rate indicates alerts being ignored)
|
||
- Deduplication hit rate (should be 10-20%)
|
||
- Enrichment performance (p50, p95, p99)
|
||
- SSE connection count and duration
|
||
|
||
### 16.4 Health Checks
|
||
|
||
**Alert Processor Health**:
|
||
```python
|
||
@app.get("/health")
|
||
async def health_check():
|
||
checks = {
|
||
"database": await check_db_connection(),
|
||
"redis": await check_redis_connection(),
|
||
"rabbitmq": await check_rabbitmq_connection(),
|
||
"orchestrator_api": await check_orchestrator_api()
|
||
}
|
||
|
||
overall_healthy = all(checks.values())
|
||
status_code = 200 if overall_healthy else 503
|
||
|
||
return JSONResponse(
|
||
status_code=status_code,
|
||
content={
|
||
"status": "healthy" if overall_healthy else "unhealthy",
|
||
"checks": checks,
|
||
"timestamp": datetime.utcnow().isoformat()
|
||
}
|
||
)
|
||
```
|
||
|
||
**CronJob Monitoring**:
|
||
```yaml
|
||
# Kubernetes CronJob metrics
|
||
- Last successful run timestamp
|
||
- Last failed run timestamp
|
||
- Average execution duration
|
||
- Alert count processed per run
|
||
- Error count per run
|
||
```
|
||
|
||
### 16.5 Troubleshooting Guide
|
||
|
||
**Problem**: Alerts not appearing in frontend
|
||
|
||
**Diagnosis**:
|
||
1. Check alert created in database: `SELECT * FROM alerts WHERE tenant_id=... ORDER BY created_at DESC LIMIT 10;`
|
||
2. Check Redis pub/sub: `SUBSCRIBE tenant:{id}:inventory.alerts`
|
||
3. Check SSE connection: Browser dev tools → Network → EventStream
|
||
4. Check frontend hook subscription: Console logs
|
||
|
||
**Problem**: Slow enrichment
|
||
|
||
**Diagnosis**:
|
||
1. Check Prometheus metrics for `enrichment_duration_seconds`
|
||
2. Identify slow enrichment service (orchestrator, priority scoring, etc.)
|
||
3. Check orchestrator API response time
|
||
4. Review database query performance (EXPLAIN ANALYZE)
|
||
|
||
**Problem**: High escalation rate
|
||
|
||
**Diagnosis**:
|
||
1. Query alerts by age: `SELECT alert_type, COUNT(*) FROM alerts WHERE action_created_at < NOW() - INTERVAL '48 hours' GROUP BY alert_type;`
|
||
2. Check if certain alert types are consistently ignored
|
||
3. Review smart actions (are they actionable?)
|
||
4. Check user permissions (can users actually execute actions?)
|
||
|
||
**Problem**: Duplicate alerts
|
||
|
||
**Diagnosis**:
|
||
1. Check deduplication key generation logic
|
||
2. Verify Redis connection (dedup keys being set?)
|
||
3. Review deduplication window (15 minutes may be too short)
|
||
4. Check for race conditions in concurrent alert creation
|
||
|
||
---
|
||
|
||
## 17. Deployment Guide
|
||
|
||
### 17.1 5-Week Deployment Timeline
|
||
|
||
**Week 1: Backend & Gateway**
|
||
- Day 1: Database migration in dev environment
|
||
- Day 2-3: Deploy alert processor with dual publishing
|
||
- Day 4: Deploy updated gateway
|
||
- Day 5: Monitoring & validation
|
||
|
||
**Week 2-3: Frontend Integration**
|
||
- Dashboard components with event hooks
|
||
- Priority components (ActionQueue, HealthHero, ExecutionTracker)
|
||
- Domain pages (Inventory, Production, Supply Chain)
|
||
|
||
**Week 4: Cutover**
|
||
- Verify complete migration
|
||
- Remove dual publishing
|
||
- Database cleanup (remove legacy columns)
|
||
|
||
**Week 5: Optimization**
|
||
- Performance tuning
|
||
- Monitoring dashboards
|
||
- Alert rules refinement
|
||
|
||
### 17.2 Pre-Deployment Checklist
|
||
|
||
- ✅ Database migration scripts tested
|
||
- ✅ Backward compatibility verified
|
||
- ✅ Rollback procedure documented
|
||
- ✅ Monitoring metrics defined
|
||
- ✅ Performance benchmarks set
|
||
- ✅ Example integrations tested
|
||
- ✅ Documentation complete
|
||
|
||
### 17.3 Rollback Procedure
|
||
|
||
**If issues occur**:
|
||
1. Stop new alert processor deployment
|
||
2. Revert gateway to previous version
|
||
3. Roll back database migration (if safe)
|
||
4. Resume dual publishing if partially migrated
|
||
5. Investigate root cause
|
||
6. Fix and redeploy
|
||
|
||
---
|
||
|
||
## Appendix
|
||
|
||
### Related Documentation
|
||
|
||
- [Frontend README](../frontend/README.md) - Frontend architecture and components
|
||
- [Alert Processor Service README](../services/alert_processor/README.md) - Service implementation details
|
||
- [Inventory Service README](../services/inventory/README.md) - Stock receipt system
|
||
- [Orchestrator Service README](../services/orchestrator/README.md) - Delivery tracking
|
||
- [Technical Documentation Summary](./TECHNICAL-DOCUMENTATION-SUMMARY.md) - System overview
|
||
|
||
### Version History
|
||
|
||
- **v2.0** (2025-11-25): Complete architecture with escalation, chaining, cronjobs
|
||
- **v1.5** (2025-11-23): Added stock receipt system and delivery tracking
|
||
- **v1.0** (2025-11-15): Initial three-tier enrichment system
|
||
|
||
### Contributors
|
||
|
||
This alert system was designed and implemented collaboratively to support the Bakery-IA platform's mission of providing intelligent, context-aware alerts that respect user time and decision-making agency.
|
||
|
||
---
|
||
|
||
**Last Updated**: 2025-11-25
|
||
**Status**: Production-Ready ✅
|
||
**Next Review**: As needed based on system evolution
|