This commit implements multiple improvements to the onboarding wizard:
**1. Unified UI Components:**
- Created InfoCard component for consistent "why is important" blocks across all steps
- Created TemplateCard component for consistent template displays
- Both components use global CSS variables for proper dark mode support
**2. Initial Stock Entry Step Improvements:**
- Fixed title/subtitle positioning using unified InfoCard component
- Fixed missing count bug in warning message (now uses {{count}} interpolation)
- Fixed dark mode colors using CSS variables (--color-success, --color-info, etc.)
- Changed next button title from "completar configuración" to "Continuar →"
- Implemented stock creation API call using useAddStock hook
- Products with stock now properly save to backend on step completion
**3. Dark Mode Fixes:**
- Fixed QualitySetupStep: Enhanced button selection visibility with rings and shadows
- Fixed TeamSetupStep: Enhanced role selection visibility with rings and shadows
- Fixed AddressAutocomplete: Replaced all hardcoded colors with CSS variables
- All dropdown results, icons, and hover states now properly adapt to dark mode
**4. Streamlined Wizard Flow:**
- Removed POI Detection step from wizard (step previously added complexity)
- POI detection now runs automatically in background after tenant registration
- Non-blocking approach ensures users aren't delayed by POI detection
- Removed Revision step (setup-review) as it adds no user value
- Completion step is now the final step before dashboard
**5. Backend Updates:**
- Updated onboarding_progress.py to remove poi-detection from ONBOARDING_STEPS
- Updated onboarding_progress.py to remove setup-review from ONBOARDING_STEPS
- Updated step dependencies to reflect streamlined flow
- POI detection documented as automatic background process
All changes maintain backward compatibility and use proper TypeScript types.
BACKEND IMPLEMENTATION: Implemented template code auto-generation for quality
check templates following the proven pattern from orders and inventory services.
IMPLEMENTATION DETAILS:
**New Method: _generate_template_code()**
Location: services/production/app/services/quality_template_service.py:447-513
Format: TPL-{TYPE}-{SEQUENCE}
- TYPE: 2-letter prefix based on check_type
- SEQUENCE: Sequential 4-digit number per type per tenant
- Examples:
- Product Quality → TPL-PQ-0001, TPL-PQ-0002, etc.
- Process Hygiene → TPL-PH-0001, TPL-PH-0002, etc.
- Equipment → TPL-EQ-0001
- Safety → TPL-SA-0001
- Cleaning → TPL-CL-0001
- Temperature Control → TPL-TC-0001
- Documentation → TPL-DC-0001
**Type Mapping:**
- product_quality → PQ
- process_hygiene → PH
- equipment → EQ
- safety → SA
- cleaning → CL
- temperature → TC
- documentation → DC
- Fallback: First 2 chars of template name or "TP"
**Generation Logic:**
1. Map check_type to 2-letter prefix
2. Query database for count of existing codes with same prefix
3. Increment sequence number (count + 1)
4. Format as TPL-{TYPE}-{SEQUENCE:04d}
5. Fallback to UUID-based code if any error occurs
**Integration:**
- Updated create_template() method (lines 42-50)
- Auto-generates template code ONLY if not provided
- Maintains support for custom codes from users
- Logs generation for audit trail
**Benefits:**
✅ Database-enforced uniqueness per tenant per type
✅ Meaningful codes grouped by quality check type
✅ Follows established pattern (orders, inventory)
✅ Thread-safe with async database context
✅ Graceful fallback to UUID on errors
✅ Full audit logging
**Technical Details:**
- Uses SQLAlchemy select with func.count for efficient counting
- Filters by tenant_id and template_code prefix
- Uses LIKE operator for prefix matching (TPL-{type}-%)
- Executed within service's async db session
**Testing Suggestions:**
1. Create template without code → should auto-generate
2. Create template with custom code → should use provided code
3. Create multiple templates of same type → should increment
4. Create templates of different types → separate sequences
5. Verify tenant isolation
This completes the quality template backend auto-generation,
matching the frontend changes in QualityTemplateWizard.tsx
BACKEND IMPLEMENTATION: Implemented SKU auto-generation following the proven
pattern from the orders service (order_number generation).
IMPLEMENTATION DETAILS:
**New Method: _generate_sku()**
Location: services/inventory/app/services/inventory_service.py:1069-1104
Format: SKU-{PREFIX}-{SEQUENCE}
- PREFIX: First 3 characters of product name (uppercase)
- SEQUENCE: Sequential 4-digit number per prefix per tenant
- Examples:
- "Flour" → SKU-FLO-0001, SKU-FLO-0002, etc.
- "Bread" → SKU-BRE-0001, SKU-BRE-0002, etc.
- "Sourdough Starter" → SKU-SOU-0001, etc.
**Generation Logic:**
1. Extract prefix from product name (first 3 chars)
2. Query database for count of existing SKUs with same prefix
3. Increment sequence number (count + 1)
4. Format as SKU-{PREFIX}-{SEQUENCE:04d}
5. Fallback to UUID-based SKU if any error occurs
**Integration:**
- Updated create_ingredient() method (line 52-54)
- Auto-generates SKU ONLY if not provided by frontend
- Maintains support for custom SKUs from users
- Logs generation for audit trail
**Benefits:**
✅ Database-enforced uniqueness per tenant
✅ Meaningful, sequential SKUs grouped by product type
✅ Follows established orders service pattern
✅ Thread-safe with database transaction context
✅ Graceful fallback to UUID on errors
✅ Full audit logging
**Technical Details:**
- Uses SQLAlchemy select with func.count for efficient counting
- Filters by tenant_id for tenant isolation
- Uses LIKE operator for prefix matching (SKU-{prefix}-%)
- Executed within get_db_transaction() context for safety
**Testing Suggestions:**
1. Create ingredient without SKU → should auto-generate
2. Create ingredient with custom SKU → should use provided SKU
3. Create multiple ingredients with same name prefix → should increment
4. Verify tenant isolation (different tenants can have same SKU)
NEXT: Consider adding similar generation for:
- Quality template codes (TPL-{TYPE}-{SEQUENCE})
- Production batch numbers (if not already implemented)
This completes the backend implementation for inventory SKU generation,
matching the frontend changes that delegated generation to backend.
ProductionTimelineItem schema requires a 'reasoning' field (string), but the
dashboard service was only providing 'reasoning_data'. Added the reasoning
text field with fallback to auto-generated text if not present in batch data.
Fixes Pydantic validation error: 'Field required' for reasoning field.
The previous logic required batches to both START and END within the date range,
which excluded batches that start today but end later. Now correctly filters
batches based on their planned_start_time only, so today's batches include all
batches scheduled to start today regardless of their end time.
Fixes bug where PENDING batches with today's start date were not appearing
in the dashboard production timeline.
Error: 500 Internal Server Error on /dashboard/action-queue
Pydantic validation error: ActionItem requires 'reasoning' and 'consequence' fields
Root Cause:
-----------
Purchase order approval actions were missing required fields:
- Had: reasoning_data (dict) - not a valid field
- Needed: reasoning (string) and consequence (string)
The Fix:
--------
services/orchestrator/app/services/dashboard_service.py line 380-396
Changed from:
'reasoning_data': {...} # Invalid field
To:
'reasoning': 'Pending approval for {supplier} - {type}'
'consequence': 'Delayed delivery may impact production schedule'
Now action items have all required fields for Pydantic validation to pass.
Fixes the 500 error on action-queue endpoint.
MAJOR FIX: Dashboard now shows actual business data instead of mock values
The Bug:
--------
Dashboard insights were displaying hardcoded values:
- Savings: Always showed "€124 this week"
- Deliveries: Always showed "0 arriving today"
- Not reflecting actual business activity
Root Cause:
-----------
Line 468-472 in dashboard.py had hardcoded mock savings data:
```python
savings_data = {
"weekly_savings": 124,
"trend_percentage": 12
}
```
Deliveries data wasn't being fetched from any service.
The Fix:
--------
1. **Real Savings Calculation:**
- Fetches last 7 days of purchase orders
- Sums up actual savings from price optimization
- Uses po.optimization_data.savings field
- Calculates: weekly_savings from PO optimization savings
- Result: Shows €X based on actual cost optimizations
2. **Real Deliveries Data:**
- Fetches pending purchase orders from procurement service
- Counts POs with expected_delivery_date == today
- Result: Shows actual number of deliveries arriving today
3. **Data Flow:**
```
procurement_client.get_pending_purchase_orders()
→ Filter by created_at (last 7 days) for savings
→ Filter by expected_delivery_date (today) for deliveries
→ Calculate totals
→ Pass to dashboard_service.calculate_insights()
```
Benefits:
---------
✅ Savings widget shows real optimization results
✅ Deliveries widget shows actual incoming deliveries
✅ Inventory widget already uses real stock data
✅ Waste widget already uses real sustainability data
✅ Dashboard reflects actual business activity
Note: Trend percentage for savings still defaults to 12% as it
requires historical comparison data (future enhancement).
The alert_processor service was crashing with:
ImportError: cannot import name 'get_db' from 'shared.database.base'
Root cause: alerts.py was using Depends(get_db) pattern which doesn't exist
in shared.database.base.
Fix: Refactored alerts.py to follow the same pattern as analytics.py:
- Removed Depends(get_db) dependency injection
- Each endpoint now creates a DatabaseManager instance
- Uses db_manager.get_session() context manager for database access
This matches the alert_processor service's existing architecture in analytics.py.
Moved alert-related methods from ProcurementServiceClient to a new
dedicated AlertsServiceClient for better separation of concerns.
Changes:
- Created shared/clients/alerts_client.py:
* get_alerts_summary() - Alert counts by severity/status
* get_critical_alerts() - Filtered list of urgent alerts
* get_alerts_by_severity() - Filter by any severity level
* get_alert_by_id() - Get specific alert details
* Includes severity mapping (critical → urgent)
- Updated shared/clients/__init__.py:
* Added AlertsServiceClient import/export
* Added get_alerts_client() factory function
- Updated procurement_client.py:
* Removed get_critical_alerts() method
* Removed get_alerts_summary() method
* Kept only procurement-specific methods
- Updated dashboard.py:
* Import and initialize alerts_client
* Use alerts_client for alert operations
* Use procurement_client only for procurement operations
Benefits:
- Better separation of concerns
- Alerts logically grouped with alert_processor service
- Cleaner, more maintainable service client architecture
- Each client maps to its domain service
Completed migration from generic .get() calls to typed service client
methods for better code clarity and maintainability.
Changes:
- Production timeline: Use get_todays_batches() instead of .get()
- Insights: Use get_sustainability_widget() and get_stock_status()
All dashboard endpoints now use domain-specific typed methods instead
of raw HTTP paths, making the code more discoverable and type-safe.
Implemented missing alert endpoints that the dashboard requires for
health status and action queue functionality.
Alert Processor Service Changes:
- Created alerts_repository.py:
* get_alerts() - Filter alerts by severity/status/resolved with pagination
* get_alerts_summary() - Count alerts by severity and status
* get_alert_by_id() - Get specific alert
- Created alerts.py API endpoints:
* GET /api/v1/tenants/{tenant_id}/alerts/summary - Alert counts
* GET /api/v1/tenants/{tenant_id}/alerts - Filtered alert list
* GET /api/v1/tenants/{tenant_id}/alerts/{alert_id} - Single alert
- Severity mapping: "critical" (dashboard) maps to "urgent" (alert_processor)
- Status enum: active, resolved, acknowledged, ignored
- Severity enum: low, medium, high, urgent
API Server Changes:
- Registered alerts_router in api_server.py
- Exported alerts_router in __init__.py
Procurement Client Changes:
- Updated get_critical_alerts() to use /alerts path
- Updated get_alerts_summary() to use /alerts/summary path
- Added severity mapping (critical → urgent)
- Added documentation about gateway routing
This fixes the 404 errors for alert endpoints in the dashboard.
Created typed, domain-specific methods in service clients instead of
using generic .get() calls with paths. This improves type safety,
discoverability, and maintainability.
Service Client Changes:
- ProcurementServiceClient:
* get_pending_purchase_orders() - POs awaiting approval
* get_critical_alerts() - Critical severity alerts
* get_alerts_summary() - Alert counts by severity
- ProductionServiceClient:
* get_todays_batches() - Today's production timeline
* get_production_batches_by_status() - Filter by status
- InventoryServiceClient:
* get_stock_status() - Dashboard stock metrics
* get_sustainability_widget() - Sustainability data
Dashboard API Changes:
- Updated all endpoints to use new dedicated methods
- Cleaner, more maintainable code
- Better error handling and logging
- Fixed inventory data type handling (list vs dict)
Note: Alert endpoints return 404 - alert_processor service needs
endpoints: /alerts/summary and /alerts (filtered by severity).
Fixed 404 errors by adding service name prefixes to all client endpoint calls.
Gateway routing requires paths like /production/..., /procurement/..., /inventory/...
Changes:
- Production endpoints: Add /production/ prefix
- Procurement endpoints: Add /procurement/ prefix
- Inventory endpoints: Add /inventory/ prefix
- Handle inventory API returning list instead of dict for stock-status
Fixes:
- 404 errors for production-batches, purchase-orders, alerts endpoints
- AttributeError when inventory_data is a list
All service client calls now match gateway routing expectations.
Replace direct httpx calls with shared service client architecture for
better fault tolerance, authentication, and consistency.
Changes:
- Remove httpx import and usage
- Add service client imports (inventory, production, procurement)
- Initialize service clients at module level
- Refactor all 5 dashboard endpoints to use service clients:
* health-status: Use inventory/production/procurement clients
* orchestration-summary: Use procurement/production clients
* action-queue: Use procurement client
* production-timeline: Use production client
* insights: Use inventory client
Benefits:
- Built-in circuit breaker pattern for fault tolerance
- Automatic service authentication with JWT tokens
- Consistent error handling and retry logic
- Removes hardcoded service URLs
- Better testability and maintainability
Fixed 'items' is not defined error in demo seed script:
- Changed 'items' references to 'items_data' (the function parameter)
- Updated product_names extraction to use dict.get() for safety
- Fixed create_po_reasoning_supplier_contract call to use correct parameters
(contract_quantity instead of trust_score)
This resolves the warnings about failed reasoning_data generation
during purchase order seeding.
Convert pipe-separated reasoning codes to structured JSON format for:
- Safety stock calculator (statistical calculations, errors)
- Price forecaster (procurement recommendations, volatility)
- Order optimization (EOQ, tier pricing)
This enables i18n translation of internal calculation reasoning
and provides structured data for frontend AI insights display.
Benefits:
- Consistent with PO/Batch reasoning_data format
- Frontend can translate using same i18n infrastructure
- Structured parameters enable rich UI visualization
- No legacy string parsing needed
Changes:
- safety_stock_calculator.py: Replace reasoning str with reasoning_data dict
- price_forecaster.py: Convert recommendation reasoning to structured format
- optimization.py: Update EOQ and tier pricing to use reasoning_data
Part of complete i18n implementation for AI insights.
This commit removes the last hardcoded English text from reasoning fields
across all backend services, completing the i18n implementation.
Changes by service:
Safety Stock Calculator (safety_stock_calculator.py):
- CALC:STATISTICAL_Z_SCORE - Statistical calculation with Z-score
- CALC:ADVANCED_VARIABILITY - Advanced formula with demand and lead time variability
- CALC:FIXED_PERCENTAGE - Fixed percentage of lead time demand
- All calculation methods now use structured codes with pipe-separated parameters
Price Forecaster (price_forecaster.py):
- PRICE_FORECAST:DECREASE_EXPECTED - Price expected to decrease
- PRICE_FORECAST:INCREASE_EXPECTED - Price expected to increase
- PRICE_FORECAST:HIGH_VOLATILITY - High price volatility detected
- PRICE_FORECAST:BELOW_AVERAGE - Current price below average (buy opportunity)
- PRICE_FORECAST:STABLE - Price stable, normal schedule
- All forecasts include relevant parameters (change_pct, days, etc.)
Optimization Utils (shared/utils/optimization.py):
- EOQ:BASE - Economic Order Quantity base calculation
- EOQ:MOQ_APPLIED - Minimum order quantity constraint applied
- EOQ:MAX_APPLIED - Maximum order quantity constraint applied
- TIER_PRICING:CURRENT_TIER - Current tier pricing
- TIER_PRICING:UPGRADED - Upgraded to higher tier for savings
- All optimizations include calculation parameters
Format: All codes use pattern "CATEGORY:TYPE|param1=value|param2=value"
This allows frontend to parse and translate with parameters while maintaining
technical accuracy for logging and debugging.
Frontend can now translate ALL reasoning codes across the entire system.
Demo Seed Scripts:
- Updated seed_demo_purchase_orders.py to use structured reasoning_data
* Imports create_po_reasoning_low_stock and create_po_reasoning_supplier_contract
* Generates reasoning_data with product names, stock levels, and consequences
* Removed deprecated reasoning/consequence TEXT fields
- Updated seed_demo_batches.py to use structured reasoning_data
* Imports create_batch_reasoning_forecast_demand and create_batch_reasoning_regular_schedule
* Generates intelligent reasoning based on batch priority and AI assistance
* Adds reasoning_data to all production batches
Backend Services - Error Code Implementation:
- Updated safety_stock_calculator.py with error codes
* Replaced "Lead time or demand std dev is zero or negative" with ERROR:LEAD_TIME_INVALID
* Replaced "Insufficient historical demand data" with ERROR:INSUFFICIENT_DATA
- Updated replenishment_planning_service.py with error codes
* Replaced "Insufficient data for safety stock calculation" with ERROR:INSUFFICIENT_DATA
* Frontend can now translate error codes using i18n
Demo data will now display with translatable reasoning in EN/ES/EU languages.
Backend services return error codes that frontend translates for user's language.
Completed the migration to structured reasoning_data for multilingual
dashboard support. Removed hardcoded TEXT fields (reasoning, consequence)
and updated all related code to use JSONB reasoning_data.
Changes:
1. Models Updated (removed TEXT fields):
- PurchaseOrder: Removed reasoning, consequence TEXT columns
- ProductionBatch: Removed reasoning TEXT column
- Both now use only reasoning_data (JSONB/JSON)
2. Dashboard Service Updated:
- Changed to return reasoning_data instead of TEXT fields
- Creates default reasoning_data if missing
- PO actions: reasoning_data with type and parameters
- Production timeline: reasoning_data for each batch
3. Unified Schemas Updated (no separate migration):
- services/procurement/migrations/001_unified_initial_schema.py
- services/production/migrations/001_unified_initial_schema.py
- Removed reasoning/consequence columns from table definitions
- Updated comments to reflect i18n approach
Database Schema:
- purchase_orders: Only reasoning_data (JSONB)
- production_batches: Only reasoning_data (JSON)
Backend now generates:
{
"type": "low_stock_detection",
"parameters": {
"supplier_name": "Harinas del Norte",
"days_until_stockout": 3,
...
},
"consequence": {
"type": "stockout_risk",
"severity": "high"
}
}
Next Steps:
- Frontend: Create i18n translation keys
- Frontend: Update components to translate reasoning_data
- Test multilingual support (ES, EN, CA)
Implemented proper reasoning data generation for purchase orders and
production batches to enable multilingual dashboard support.
Backend Strategy:
- Generate structured JSON with type codes and parameters
- Store only reasoning_data (JSONB), not hardcoded text
- Frontend will translate using i18n libraries
Changes:
1. Created shared/schemas/reasoning_types.py
- Defined reasoning types for POs and batches
- Created helper functions for common reasoning patterns
- Supports multiple reasoning types (low_stock, forecast_demand, etc.)
2. Production Service (services/production/app/services/production_service.py)
- Generate reasoning_data when creating batches from forecast
- Include parameters: product_name, predicted_demand, current_stock, etc.
- Structure supports frontend i18n interpolation
3. Procurement Service (services/procurement/app/services/procurement_service.py)
- Implemented actual PO creation (was placeholder before!)
- Groups requirements by supplier
- Generates reasoning_data based on context (low_stock vs forecast)
- Creates PO items automatically
Example reasoning_data:
{
"type": "low_stock_detection",
"parameters": {
"supplier_name": "Harinas del Norte",
"product_names": ["Flour Type 55", "Flour Type 45"],
"days_until_stockout": 3,
"current_stock": 45.5,
"required_stock": 200
},
"consequence": {
"type": "stockout_risk",
"severity": "high",
"impact_days": 3
}
}
Frontend will translate:
- EN: "Low stock detected for Harinas del Norte. Stock runs out in 3 days."
- ES: "Stock bajo detectado para Harinas del Norte. Se agota en 3 días."
- CA: "Estoc baix detectat per Harinas del Norte. S'esgota en 3 dies."
Next steps:
- Remove TEXT fields (reasoning, consequence) from models
- Update dashboard service to use reasoning_data
- Create frontend i18n translation keys
- Update dashboard components to translate dynamically
Fixed critical React error #306 by adding proper null handling for
reasoning and consequence fields in the dashboard service.
Issue: When database columns (reasoning, consequence) contain NULL
values, Python's .get() method returns None, which becomes undefined
in JavaScript, causing React to throw error #306 when trying to render.
Solution: Changed from `.get("field", "default")` to `.get("field") or "default"`
to properly handle None values throughout the dashboard service.
Changes:
- Purchase order actions: Added null coalescing for reasoning/consequence
- Production timeline: Added null coalescing for reasoning field
- Alert actions: Added null coalescing for description and source
- Onboarding actions: Added null coalescing for title and consequence
This ensures all text fields always have valid string values before
being sent to the frontend, preventing undefined rendering errors.
Consolidated incremental migrations into single unified initial schema files for both procurement and production services. This simplifies database setup and eliminates migration chain complexity.
Changes:
- Procurement: Merged 3 migrations into 001_unified_initial_schema.py
- Initial schema (20251015_1229)
- Add supplier_price_list_id (20251030_0737)
- Add JTBD reasoning fields (20251107)
- Production: Merged 3 migrations into 001_unified_initial_schema.py
- Initial schema (20251015_1231)
- Add waste tracking fields (20251023_0900)
- Add JTBD reasoning fields (20251107)
All new fields (reasoning, consequence, reasoning_data, waste_defect_type, is_ai_assisted, supplier_price_list_id) are now included in the initial schemas from the start.
Updated model files to use deferred() for reasoning fields to prevent breaking queries when running against existing databases.
- Fix frontend import: Change from useAppContext to useTenant store
- Fix backend imports: Use app.core.database instead of shared.database
- Remove auth dependencies from dashboard endpoints
- Add database migrations for reasoning fields in procurement and production
Migrations:
- procurement: Add reasoning, consequence, reasoning_data to purchase_orders
- production: Add reasoning, reasoning_data to production_batches
Implements comprehensive dashboard redesign based on Jobs To Be Done methodology
focused on answering: "What requires my attention right now?"
## Backend Implementation
### Dashboard Service (NEW)
- Health status calculation (green/yellow/red traffic light)
- Action queue prioritization (critical/important/normal)
- Orchestration summary with narrative format
- Production timeline transformation
- Insights calculation and consequence prediction
### API Endpoints (NEW)
- GET /dashboard/health-status - Overall bakery health indicator
- GET /dashboard/orchestration-summary - What system did automatically
- GET /dashboard/action-queue - Prioritized tasks requiring attention
- GET /dashboard/production-timeline - Today's production schedule
- GET /dashboard/insights - Key metrics (savings, inventory, waste, deliveries)
### Enhanced Models
- PurchaseOrder: Added reasoning, consequence, reasoning_data fields
- ProductionBatch: Added reasoning, reasoning_data fields
- Enables transparency into automation decisions
## Frontend Implementation
### API Hooks (NEW)
- useBakeryHealthStatus() - Real-time health monitoring
- useOrchestrationSummary() - System transparency
- useActionQueue() - Prioritized action management
- useProductionTimeline() - Production tracking
- useInsights() - Glanceable metrics
### Dashboard Components (NEW)
- HealthStatusCard: Traffic light indicator with checklist
- ActionQueueCard: Prioritized actions with reasoning/consequences
- OrchestrationSummaryCard: Narrative of what system did
- ProductionTimelineCard: Chronological production view
- InsightsGrid: 2x2 grid of key metrics
### Main Dashboard Page (REPLACED)
- Complete rewrite with mobile-first design
- All sections integrated with error handling
- Real-time refresh and quick action links
- Old dashboard backed up as DashboardPage.legacy.tsx
## Key Features
### Automation-First
- Shows what orchestrator did overnight
- Builds trust through transparency
- Explains reasoning for all automated decisions
### Action-Oriented
- Prioritizes tasks over information display
- Clear consequences for each action
- Large touch-friendly buttons
### Progressive Disclosure
- Shows 20% of info that matters 80% of time
- Expandable details when needed
- No overwhelming metrics
### Mobile-First
- One-handed operation
- Large touch targets (min 44px)
- Responsive grid layouts
### Trust-Building
- Narrative format ("I planned your day")
- Reasoning inputs transparency
- Clear status indicators
## User Segments Supported
1. Solo Bakery Owner (Primary)
- Simple health indicator
- Action checklist (max 3-5 items)
- Mobile-optimized
2. Multi-Location Owner
- Multi-tenant support (existing)
- Comparison capabilities
- Delegation ready
3. Enterprise/Central Bakery (Future)
- Network topology support
- Advanced analytics ready
## JTBD Analysis Delivered
Main Job: "Help me quickly understand bakery status and know what needs my intervention"
Emotional Jobs Addressed:
- Feel in control despite automation
- Reduce daily anxiety
- Feel competent with technology
- Trust system as safety net
Social Jobs Addressed:
- Demonstrate professional management
- Avoid being bottleneck
- Show sustainability
## Technical Stack
Backend: Python, FastAPI, SQLAlchemy, PostgreSQL
Frontend: React, TypeScript, TanStack Query, Tailwind CSS
Architecture: Microservices with circuit breakers
## Breaking Changes
- Complete dashboard page rewrite (old version backed up)
- New API endpoints require orchestrator service deployment
- Database migrations needed for reasoning fields
## Migration Required
Run migrations to add new model fields:
- purchase_orders: reasoning, consequence, reasoning_data
- production_batches: reasoning, reasoning_data
## Documentation
See DASHBOARD_REDESIGN_SUMMARY.md for complete implementation details,
JTBD analysis, success metrics, and deployment guide.
BREAKING CHANGE: Dashboard page completely redesigned with new data structures
Root cause: The validation in NotificationBaseRepository._validate_notification_data
was checking enum objects against string lists, causing validation to fail when the
EnhancedNotificationService passed NotificationType/NotificationPriority/NotificationStatus
enum objects instead of their string values.
The validation now properly handles both enum objects (by extracting their .value)
and string values, fixing the "Invalid notification type" error from orchestrator.
Changes:
- Updated priority validation to handle enum objects
- Updated notification type validation to handle enum objects
- Updated status validation to handle enum objects
Fixes the error:
"Invalid notification data: ['Invalid notification type. Must be one of: ['email', 'whatsapp', 'push', 'sms']']"
CRITICAL FIX - Database Transaction:
- Removed duplicate commit logic from _store_in_db() inner function (lines 805-811)
- Prevents 'Method commit() can't be called here' error
- Now only outer scope handles commits (line 821 for new sessions, parent for parent sessions)
- Fixes issue where all 5 models trained successfully but failed to store in DB
MINOR FIX - Logging:
- Fixed Spanish holidays logger call (line 977-978)
- Removed invalid keyword arguments (region=, years=) from logger.info()
- Now uses f-string format consistent with rest of codebase
- Prevents 'Logger._log() got an unexpected keyword argument' warning
Impact:
- Training pipeline can now complete successfully
- Models will be stored in database after training
- No more cascading transaction failures
- Cleaner logs without warnings
Root cause: Double commit introduced during recent session management fixes
Related commits: 673108e, 74215d3, fd0a96e, b2de56e, e585e9f
Issues Fixed:
4️⃣ data_processor.py (Line 230-232):
- Second update_log_progress call without commit after data preparation
- Added commit() after completion update to prevent deadlock
- Added debug logging for visibility
5️⃣ prophet_manager.py _store_model (Line 750):
- Created TRIPLE nested session (training_service → trainer → lock → _store_model)
- Refactored _store_model to accept optional session parameter
- Uses parent session from lock context instead of creating new one
- Updated call site to pass db_session parameter
Complete Session Hierarchy After All Fixes:
training_service.py (session)
└─ commit() ← FIX#2 (e585e9f)
└─ trainer.py (new session) ✅ OK
└─ data_processor.py (new session)
└─ commit() after first update ← FIX#3 (b2de56e)
└─ commit() after second update ← FIX#4 (THIS)
└─ prophet_manager.train_bakery_model (uses parent or new session) ← FIX#1 (caff497)
└─ lock.acquire(session)
└─ _store_model(session=parent) ← FIX#5 (THIS)
└─ NO NESTED SESSION ✅
All nested session deadlocks in training path are now resolved.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Root Cause:
After fixing the training_service.py deadlock, training progressed to
data preparation but got stuck there. The data_processor.py creates
another nested session at line 143, updates training_log without
committing, causing another deadlock scenario.
Session Hierarchy:
1. training_service.py: outer session (fixed in e585e9f)
2. trainer.py: creates own session (passes deadlock due to commit)
3. data_processor.py: creates ANOTHER nested session (THIS FIX)
Fix:
Added explicit db_session.commit() after progress update in data_processor
(line 153) to ensure the UPDATE is committed before continuing with data
processing operations that may interact with other sessions.
This completes the chain of nested session fixes:
- caff497: prophet_manager + hybrid_trainer session passing
- e585e9f: training_service commit before trainer call
- THIS: data_processor commit after progress update
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Root Cause (Actual):
The actual nested session issue was in training_service.py, not just in
the trainer methods. The flow was:
1. training_service.py creates outer session (line 173)
2. Updates training_log at line 235-237 (uncommitted)
3. Calls trainer.train_tenant_models() at line 239
4. Trainer creates its own session at line 93
5. DEADLOCK: Outer session has uncommitted UPDATE, inner session can't proceed
Fix:
Added explicit session.commit() after the ml_training progress update
(line 241) to ensure the UPDATE is committed before trainer creates
its own session. This prevents the deadlock condition.
Related to previous commit caff497 which fixed nested sessions in
prophet_manager and hybrid_trainer, but missed the actual root cause
in training_service.py.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Root Cause:
The training process was hanging at the first progress update due to a
nested database session issue. The main trainer created a session and
repositories, then called prophet_manager.train_bakery_model() which
created another nested session with an advisory lock. This caused a
deadlock where:
1. Outer session had uncommitted UPDATE on model_training_logs
2. Inner session tried to acquire advisory lock
3. Neither could proceed, causing training to hang indefinitely
Changes Made:
1. prophet_manager.py:
- Added optional 'session' parameter to train_bakery_model()
- Refactored to use parent session if provided, otherwise create new one
- Prevents nested session creation during training
2. hybrid_trainer.py:
- Added optional 'session' parameter to train_hybrid_model()
- Passes session to prophet_manager to maintain single session context
3. trainer.py:
- Updated _train_single_product() to accept and pass session
- Updated _train_all_models_enhanced() to accept and pass session
- Pass db_session from main training context to all training methods
- Added explicit db_session.flush() after critical progress update
- This ensures updates are visible before acquiring locks
Impact:
- Eliminates nested session deadlocks
- Training now proceeds past initial progress update
- Maintains single database session context throughout training
- Prevents database transaction conflicts
Related Issues:
- Fixes training hang during onboarding process
- Not directly related to audit_metadata changes but exposed by them
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Root Cause:
Training process was stuck at 40% because blocking synchronous ML operations
(model.fit(), model.predict(), study.optimize()) were freezing the asyncio
event loop, preventing RabbitMQ heartbeats, WebSocket communication, and
progress updates.
Changes:
1. prophet_manager.py:
- Wrapped model.fit() at line 189 with asyncio.to_thread()
- Wrapped study.optimize() at line 453 with asyncio.to_thread()
2. hybrid_trainer.py:
- Made _train_xgboost() async and wrapped model.fit() with asyncio.to_thread()
- Made _evaluate_hybrid_model() async and wrapped predict() calls
- Fixed predict() method to wrap blocking predict() calls
Impact:
- Event loop no longer blocks during ML training
- RabbitMQ heartbeats continue during training
- WebSocket progress updates work correctly
- Training can now complete successfully
Fixes: Training hang at 40% during onboarding phase
Root Causes Fixed:
1. BatchForecastResponse schema mismatch in forecasting service
- Changed 'batch_id' to 'id' (required field name)
- Changed 'products_processed' to 'total_products'
- Changed 'success' to 'status' with "completed" value
- Changed 'message' to 'error_message'
- Added all required fields: batch_name, completed_products, failed_products,
requested_at, completed_at, processing_time_ms, forecasts
- This was causing "11 validation errors for BatchForecastResponse"
which made the forecast service return None, triggering saga failure
2. Missing pandas dependency in orchestrator service
- Added pandas==2.2.2 and numpy==1.26.4 to requirements.txt
- Fixes "No module named 'pandas'" warning when loading AI enhancement
These issues prevented the orchestrator from completing Step 3 (generate_forecasts)
in the daily workflow, causing the entire saga to fail and compensate.
Root cause analysis:
- The orchestration saga was failing at the 'fetch_shared_data_snapshot' step
- Lines 350-356 had a logic error: tried to import pandas in exception handler after pandas import already failed
- This caused an uncaught exception that propagated up and failed the entire saga
The fix:
- Replaced pandas DataFrame placeholder with a simple dict for traffic_predictions
- Since traffic predictions are marked as "not yet implemented", pandas is not needed yet
- This eliminates the pandas dependency from the orchestrator service
- When traffic predictions are implemented in Phase 5, the dict can be converted to DataFrame
Impact:
- Orchestration saga will no longer fail due to missing pandas
- AI enhancement warning will still appear (requires separate fix to add pandas to requirements if needed)
- Traffic predictions placeholder now uses empty dict instead of empty DataFrame
Remove invalid 'calling_service_name' parameter from AIInsightsClient
constructor call. The client only accepts 'base_url' and 'timeout' parameters.
This resolves the TypeError that was causing orchestration workflow failures.