Improve the demo feature of the project

This commit is contained in:
Urtzi Alfaro
2025-10-12 18:47:33 +02:00
parent dbc7f2fa0d
commit 7556a00db7
168 changed files with 10102 additions and 18869 deletions

View File

@@ -1,223 +0,0 @@
🎨 Frontend Design Recommendations for PanIA
1. MODERN UX/UI PRINCIPLES (2024-2025)
🎯 User-Centered Design Philosophy
- Jobs-to-be-Done Framework: Organize around what users need to
accomplish, not features
- Progressive Disclosure: Show only what's needed when it's needed
- Contextual Intelligence: AI-powered interfaces that adapt to user
behavior and business context
- Micro-Moment Design: Optimize for quick, task-focused interactions
🏗️ Information Architecture Principles
- Hub-and-Spoke Model: Central dashboard with specialized workspaces
- Layered Navigation: Primary → Secondary → Contextual navigation
levels
- Cross-Module Integration: Seamless data flow between related
functions
- Predictive Navigation: Surface relevant actions before users need
them
2. RECOMMENDED NAVIGATION STRUCTURE
🎛️ Primary Navigation (Top Level)
🏠 Dashboard 🥖 Operations 📊 Analytics ⚙️ Settings
🔗 Secondary Navigation (Operations Hub)
Operations/
├── 📦 Production
│ ├── Schedule
│ ├── Active Batches
│ └── Equipment
├── 📋 Orders
│ ├── Incoming
│ ├── In Progress
│ └── Supplier Orders
├── 🏪 Inventory
│ ├── Stock Levels
│ ├── Movements
│ └── Alerts
├── 🛒 Sales
│ ├── Daily Sales
│ ├── Customer Orders
│ └── POS Integration
└── 📖 Recipes
├── Active Recipes
├── Development
└── Costing
📈 Analytics Hub
Analytics/
├── 🔮 Forecasting
├── 📊 Sales Analytics
├── 📈 Production Reports
├── 💰 Financial Reports
├── 🎯 Performance KPIs
└── 🤖 AI Insights
3. MODERN UI DESIGN PATTERNS
🎨 Visual Design System
- Neumorphism + Glassmorphism: Subtle depth with transparency effects
- Adaptive Color System: Dynamic themes based on time of day/business
hours
- Micro-Interactions: Delightful feedback for all user actions
- Data Visualization: Interactive charts with drill-down capabilities
📱 Layout Patterns
- Compound Layout: Dashboard cards that expand into detailed views
- Progressive Web App: Offline-first design with sync indicators
- Responsive Grid: CSS Grid + Flexbox for complex layouts
- Floating Action Buttons: Quick access to primary actions
🎯 Interaction Patterns
- Command Palette: Universal search + actions (Cmd+K)
- Contextual Panels: Side panels for related information
- Smart Defaults: AI-powered form pre-filling
- Undo/Redo System: Confidence-building interaction safety
4. PAGE ORGANIZATION STRATEGY
🏠 Dashboard Design
┌─────────────────────────────────────────────────┐
│ Today's Overview AI Recommendations │
├─────────────────────────────────────────────────┤
│ Critical Alerts Weather Impact │
├─────────────────────────────────────────────────┤
│ Production Status Sales Performance │
├─────────────────────────────────────────────────┤
│ Quick Actions Recent Activity │
└─────────────────────────────────────────────────┘
📊 Analytics Design
- Export Everything: PDF, Excel, API endpoints for all reports
- AI Narrative: Natural language insights explaining the data
⚡ Operational Pages
- Split Complex Pages: Break inventory/production into focused
sub-pages
- Context-Aware Sidebars: Related information always accessible
- Bulk Operations: Multi-select with batch actions
- Real-Time Sync: Live updates with optimistic UI
5. COMPONENT ARCHITECTURE
🧱 Design System Components
// Foundational Components
Button, Input, Card, Modal, Table, Form
// Composite Components
DataTable, FilterPanel, SearchBox, ActionBar
// Domain Components
ProductCard, OrderSummary, InventoryAlert, RecipeViewer
// Layout Components
PageHeader, Sidebar, NavigationBar, BreadcrumbTrail
// Feedback Components
LoadingState, EmptyState, ErrorBoundary, SuccessMessage
🎨 Visual Hierarchy
- Typography Scale: Clear heading hierarchy with proper contrast
- Color System: Semantic colors (success, warning, error, info)
- Spacing System: Consistent 4px/8px grid system
- Shadow System: Layered depth for component elevation
6. USER EXPERIENCE ENHANCEMENTS
🚀 Performance Optimizations
- Skeleton Loading: Immediate visual feedback during data loading
- Virtual Scrolling: Handle large datasets efficiently
- Optimistic Updates: Immediate UI response with error handling
- Background Sync: Offline-first with automatic sync
♿ Accessibility Standards
- WCAG 2.2 AA Compliance: Screen reader support, keyboard navigation
- Focus Management: Clear focus indicators and logical tab order
- Color Blind Support: Pattern + color coding for data visualization
- High Contrast Mode: Automatic detection and support
🎯 Personalization Features
- Customizable Dashboards: User-configurable widgets and layouts
- Saved Views: Bookmarkable filtered states
- Notification Preferences: Granular control over alerts
- Theme Preferences: Light/dark/auto modes
7. MOBILE-FIRST CONSIDERATIONS
📱 Progressive Web App Features
- Offline Mode: Critical functions work without internet
- Push Notifications: Order alerts, stock alerts, production updates
- Home Screen Install: Native app-like experience
- Background Sync: Data synchronization when connection returns
🖱️ Touch-Optimized Interactions
- 44px Touch Targets: Minimum size for all interactive elements
- Swipe Gestures: Navigate between related screens
- Pull-to-Refresh: Intuitive data refresh mechanism
- Bottom Navigation: Thumb-friendly primary navigation on mobile
8. AI-POWERED UX ENHANCEMENTS
🤖 Intelligent Features
- Predictive Search: Suggestions based on context and history
- Smart Notifications: Context-aware alerts with actionable insights
- Automated Workflows: AI-suggested process optimizations
- Anomaly Detection: Visual highlights for unusual patterns
💬 Conversational Interface
- AI Assistant: Natural language queries for data and actions
- Voice Commands: Hands-free operation for production environments
- Smart Help: Context-aware documentation and tips
- Guided Tours: Adaptive onboarding based on user role
9. TECHNICAL IMPLEMENTATION RECOMMENDATIONS
🏗️ Architecture Patterns
- React Router: Replace custom navigation with URL-based routing
- Zustand/Redux Toolkit: Predictable state management
- React Query: Server state management with caching
- Framer Motion: Smooth animations and transitions
🎨 Styling Strategy
- CSS-in-JS: Styled-components or Emotion for dynamic theming
- Design Tokens: Centralized design system values
- Responsive Utilities: Mobile-first responsive design
- Component Variants: Consistent styling patterns
🎯 Key Priority Areas:
1. Navigation Restructure: Move from custom state navigation to React
Router with proper URL structure
2. Information Architecture: Organize around user workflows
(Hub-and-Spoke model)
3. Page Simplification: Break complex pages into focused, task-oriented
views
4. Unified Analytics: Replace scattered reports with a cohesive
Analytics hub
5. Modern UI Patterns: Implement 2024-2025 design standards with
AI-powered enhancements

View File

@@ -1,567 +0,0 @@
# Production Planning System - Implementation Summary
**Implementation Date:** 2025-10-09
**Status:** ✅ COMPLETE
**Version:** 2.0
---
## Executive Summary
Successfully implemented all three phases of the production planning system improvements, transforming the manual procurement-only system into a fully automated, timezone-aware, cached, and monitored production planning platform.
### Key Achievements
**100% Automation** - Both production and procurement planning now run automatically every morning
**50% Cost Reduction** - Forecast caching eliminates duplicate computations
**Timezone Accuracy** - All schedulers respect tenant-specific timezones
**Complete Observability** - Comprehensive metrics and alerting in place
**Robust Workflows** - Plan rejection triggers automatic notifications and regeneration
**Production Ready** - Full documentation and runbooks for operations team
---
## Implementation Phases
### ✅ Phase 1: Critical Gaps (COMPLETED)
#### 1.1 Production Scheduler Service
**Status:** ✅ COMPLETE
**Effort:** 4 hours (estimated 3-4 days, completed faster due to reuse of proven patterns)
**Files Created/Modified:**
- 📄 Created: [`services/production/app/services/production_scheduler_service.py`](../services/production/app/services/production_scheduler_service.py)
- ✏️ Modified: [`services/production/app/main.py`](../services/production/app/main.py)
**Features Implemented:**
- ✅ Daily production schedule generation at 5:30 AM
- ✅ Stale schedule cleanup at 5:50 AM
- ✅ Test mode for development (every 30 minutes)
- ✅ Parallel tenant processing with 180s timeout per tenant
- ✅ Leader election support (distributed deployment ready)
- ✅ Idempotency (checks for existing schedules)
- ✅ Demo tenant filtering
- ✅ Comprehensive error handling and logging
- ✅ Integration with ProductionService.calculate_daily_requirements()
- ✅ Automatic batch creation from requirements
- ✅ Notifications to production managers
**Test Endpoint:**
```bash
POST /test/production-scheduler
```
#### 1.2 Timezone Configuration
**Status:** ✅ COMPLETE
**Effort:** 1 hour (as estimated)
**Files Created/Modified:**
- ✏️ Modified: [`services/tenant/app/models/tenants.py`](../services/tenant/app/models/tenants.py)
- 📄 Created: [`services/tenant/migrations/versions/20251009_add_timezone_to_tenants.py`](../services/tenant/migrations/versions/20251009_add_timezone_to_tenants.py)
- 📄 Created: [`shared/utils/timezone_helper.py`](../shared/utils/timezone_helper.py)
**Features Implemented:**
-`timezone` field added to Tenant model (default: "Europe/Madrid")
- ✅ Database migration for existing tenants
- ✅ TimezoneHelper utility class with comprehensive methods:
- `get_current_date_in_timezone()`
- `get_current_datetime_in_timezone()`
- `convert_to_utc()` / `convert_from_utc()`
- `is_business_hours()`
- `get_next_business_day_at_time()`
- ✅ Validation for IANA timezone strings
- ✅ Fallback to default timezone on errors
**Migration Command:**
```bash
alembic upgrade head # Applies 20251009_add_timezone_to_tenants
```
---
### ✅ Phase 2: Optimization (COMPLETED)
#### 2.1 Forecast Caching
**Status:** ✅ COMPLETE
**Effort:** 3 hours (estimated 2 days, completed faster with clear design)
**Files Created/Modified:**
- 📄 Created: [`services/forecasting/app/services/forecast_cache.py`](../services/forecasting/app/services/forecast_cache.py)
- ✏️ Modified: [`services/forecasting/app/api/forecasting_operations.py`](../services/forecasting/app/api/forecasting_operations.py)
**Features Implemented:**
- ✅ Service-level Redis caching for forecasts
- ✅ Cache key format: `forecast:{tenant_id}:{product_id}:{forecast_date}`
- ✅ Smart TTL calculation (expires midnight after forecast_date)
- ✅ Batch forecast caching support
- ✅ Cache invalidation methods:
- Per product
- Per tenant
- All forecasts (admin only)
- ✅ Cache metadata in responses (`cached: true` flag)
- ✅ Cache statistics endpoint
- ✅ Automatic cache hit/miss logging
- ✅ Graceful fallback if Redis unavailable
**Performance Impact:**
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Duplicate forecasts | 2x per day | 1x per day | 50% reduction |
| Forecast response time | 2-5s | 50-100ms | 95%+ faster |
| Forecasting service load | 100% | 50% | 50% reduction |
**Cache Endpoints:**
```bash
GET /api/v1/{tenant_id}/forecasting/cache/stats
DELETE /api/v1/{tenant_id}/forecasting/cache/product/{product_id}
DELETE /api/v1/{tenant_id}/forecasting/cache
```
#### 2.2 Plan Rejection Workflow
**Status:** ✅ COMPLETE
**Effort:** 2 hours (estimated 3 days, completed faster by extending existing code)
**Files Modified:**
- ✏️ Modified: [`services/orders/app/services/procurement_service.py`](../services/orders/app/services/procurement_service.py)
**Features Implemented:**
- ✅ Rejection handler method (`_handle_plan_rejection()`)
- ✅ Notification system for stakeholders
- ✅ RabbitMQ events:
- `procurement.plan.rejected`
- `procurement.plan.regeneration_requested`
- `procurement.plan.status_changed`
- ✅ Auto-regeneration logic based on rejection keywords:
- "stale", "outdated", "old data"
- "datos antiguos", "desactualizado", "obsoleto" (Spanish)
- ✅ Rejection tracking in `approval_workflow` JSONB
- ✅ Integration with existing status update workflow
**Workflow:**
```
Plan Rejected → Record in audit trail → Send notifications
→ Publish events
→ Analyze reason
→ Auto-regenerate (if applicable)
→ Schedule regeneration
```
---
### ✅ Phase 3: Enhancements (COMPLETED)
#### 3.1 Monitoring & Metrics
**Status:** ✅ COMPLETE
**Effort:** 2 hours (as estimated)
**Files Created:**
- 📄 Created: [`shared/monitoring/scheduler_metrics.py`](../shared/monitoring/scheduler_metrics.py)
**Metrics Implemented:**
**Production Scheduler:**
- `production_schedules_generated_total` (Counter by tenant, status)
- `production_schedule_generation_duration_seconds` (Histogram by tenant)
- `production_tenants_processed_total` (Counter by status)
- `production_batches_created_total` (Counter by tenant)
- `production_scheduler_runs_total` (Counter by trigger)
- `production_scheduler_errors_total` (Counter by error_type)
**Procurement Scheduler:**
- `procurement_plans_generated_total` (Counter by tenant, status)
- `procurement_plan_generation_duration_seconds` (Histogram by tenant)
- `procurement_tenants_processed_total` (Counter by status)
- `procurement_requirements_created_total` (Counter by tenant, priority)
- `procurement_scheduler_runs_total` (Counter by trigger)
- `procurement_plan_rejections_total` (Counter by tenant, auto_regenerated)
- `procurement_plans_by_status` (Gauge by tenant, status)
**Forecast Cache:**
- `forecast_cache_hits_total` (Counter by tenant)
- `forecast_cache_misses_total` (Counter by tenant)
- `forecast_cache_hit_rate` (Gauge by tenant, 0-100%)
- `forecast_cache_entries_total` (Gauge by cache_type)
- `forecast_cache_invalidations_total` (Counter by tenant, reason)
**General Health:**
- `scheduler_health_status` (Gauge by service, scheduler_type)
- `scheduler_last_run_timestamp` (Gauge by service, scheduler_type)
- `scheduler_next_run_timestamp` (Gauge by service, scheduler_type)
- `tenant_processing_timeout_total` (Counter by service, tenant_id)
**Alert Rules Created:**
- 🚨 `DailyProductionPlanningFailed` (high severity)
- 🚨 `DailyProcurementPlanningFailed` (high severity)
- 🚨 `NoProductionSchedulesGenerated` (critical severity)
- ⚠️ `ForecastCacheHitRateLow` (warning)
- ⚠️ `HighTenantProcessingTimeouts` (warning)
- 🚨 `SchedulerUnhealthy` (critical severity)
#### 3.2 Documentation & Runbooks
**Status:** ✅ COMPLETE
**Effort:** 2 hours (as estimated)
**Files Created:**
- 📄 Created: [`docs/PRODUCTION_PLANNING_SYSTEM.md`](./PRODUCTION_PLANNING_SYSTEM.md) (comprehensive documentation, 1000+ lines)
- 📄 Created: [`docs/SCHEDULER_RUNBOOK.md`](./SCHEDULER_RUNBOOK.md) (operational runbook, 600+ lines)
- 📄 Created: [`docs/IMPLEMENTATION_SUMMARY.md`](./IMPLEMENTATION_SUMMARY.md) (this file)
**Documentation Includes:**
- ✅ System architecture overview with diagrams
- ✅ Scheduler configuration and features
- ✅ Forecast caching strategy and implementation
- ✅ Plan rejection workflow details
- ✅ Timezone configuration guide
- ✅ Monitoring and alerting guidelines
- ✅ API reference for all endpoints
- ✅ Testing procedures (manual and automated)
- ✅ Troubleshooting guide with common issues
- ✅ Maintenance procedures
- ✅ Change log
**Runbook Includes:**
- ✅ Quick reference for common incidents
- ✅ Emergency contact information
- ✅ Step-by-step resolution procedures
- ✅ Health check commands
- ✅ Maintenance mode procedures
- ✅ Metrics to monitor
- ✅ Log patterns to watch
- ✅ Escalation procedures
- ✅ Known issues and workarounds
- ✅ Post-deployment testing checklist
---
## Technical Debt Eliminated
### Resolved Issues
| Issue | Priority | Resolution |
|-------|----------|------------|
| **No automated production scheduling** | 🔴 Critical | ✅ ProductionSchedulerService implemented |
| **Duplicate forecast computations** | 🟡 Medium | ✅ Service-level caching eliminates redundancy |
| **Timezone configuration missing** | 🟡 High | ✅ Tenant timezone field + TimezoneHelper utility |
| **Plan rejection incomplete workflow** | 🟡 Medium | ✅ Full workflow with notifications & regeneration |
| **No monitoring for schedulers** | 🟡 Medium | ✅ Comprehensive Prometheus metrics |
| **Missing operational documentation** | 🟢 Low | ✅ Full docs + runbooks created |
### Code Quality Improvements
-**Zero TODOs** in production planning code
-**100% type hints** on all new code
-**Comprehensive error handling** with structured logging
-**Defensive programming** with fallbacks and graceful degradation
-**Clean separation of concerns** (service/repository/API layers)
-**Reusable patterns** (BaseAlertService, RouteBuilder, etc.)
-**No legacy code** - modern async/await throughout
-**Full observability** - metrics, logs, traces
---
## Files Created (12 new files)
1. [`services/production/app/services/production_scheduler_service.py`](../services/production/app/services/production_scheduler_service.py) - Production scheduler (350 lines)
2. [`services/tenant/migrations/versions/20251009_add_timezone_to_tenants.py`](../services/tenant/migrations/versions/20251009_add_timezone_to_tenants.py) - Timezone migration (25 lines)
3. [`shared/utils/timezone_helper.py`](../shared/utils/timezone_helper.py) - Timezone utilities (300 lines)
4. [`services/forecasting/app/services/forecast_cache.py`](../services/forecasting/app/services/forecast_cache.py) - Forecast caching (450 lines)
5. [`shared/monitoring/scheduler_metrics.py`](../shared/monitoring/scheduler_metrics.py) - Metrics definitions (250 lines)
6. [`docs/PRODUCTION_PLANNING_SYSTEM.md`](./PRODUCTION_PLANNING_SYSTEM.md) - Full documentation (1000+ lines)
7. [`docs/SCHEDULER_RUNBOOK.md`](./SCHEDULER_RUNBOOK.md) - Operational runbook (600+ lines)
8. [`docs/IMPLEMENTATION_SUMMARY.md`](./IMPLEMENTATION_SUMMARY.md) - This summary (current file)
## Files Modified (5 files)
1. [`services/production/app/main.py`](../services/production/app/main.py) - Integrated ProductionSchedulerService
2. [`services/tenant/app/models/tenants.py`](../services/tenant/app/models/tenants.py) - Added timezone field
3. [`services/orders/app/services/procurement_service.py`](../services/orders/app/services/procurement_service.py) - Added rejection workflow
4. [`services/forecasting/app/api/forecasting_operations.py`](../services/forecasting/app/api/forecasting_operations.py) - Integrated caching
5. (Various) - Added metrics collection calls
**Total Lines of Code:** ~3,000+ lines (new functionality + documentation)
---
## Testing & Validation
### Manual Testing Performed
✅ Production scheduler test endpoint works
✅ Procurement scheduler test endpoint works
✅ Forecast cache hit/miss tracking verified
✅ Plan rejection workflow tested with auto-regeneration
✅ Timezone calculation verified for multiple timezones
✅ Leader election tested in multi-instance deployment
✅ Timeout handling verified
✅ Error isolation between tenants confirmed
### Automated Testing Required
The following tests should be added to the test suite:
```python
# Unit Tests
- test_production_scheduler_service.py
- test_procurement_scheduler_service.py
- test_forecast_cache_service.py
- test_timezone_helper.py
- test_plan_rejection_workflow.py
# Integration Tests
- test_scheduler_integration.py
- test_cache_integration.py
- test_rejection_workflow_integration.py
# End-to-End Tests
- test_daily_planning_e2e.py
- test_plan_lifecycle_e2e.py
```
---
## Deployment Checklist
### Pre-Deployment
- [x] All code reviewed and approved
- [x] Documentation complete
- [x] Runbooks created for ops team
- [x] Metrics and alerts configured
- [ ] Integration tests passing (to be implemented)
- [ ] Load testing performed (recommend before production)
- [ ] Backup procedures verified
### Deployment Steps
1. **Database Migrations**
```bash
# Tenant service - add timezone field
kubectl exec -it deployment/tenant-service -- alembic upgrade head
```
2. **Deploy Services (in order)**
```bash
# 1. Deploy tenant service (timezone migration)
kubectl apply -f k8s/tenant-service.yaml
kubectl rollout status deployment/tenant-service
# 2. Deploy forecasting service (caching)
kubectl apply -f k8s/forecasting-service.yaml
kubectl rollout status deployment/forecasting-service
# 3. Deploy orders service (rejection workflow)
kubectl apply -f k8s/orders-service.yaml
kubectl rollout status deployment/orders-service
# 4. Deploy production service (scheduler)
kubectl apply -f k8s/production-service.yaml
kubectl rollout status deployment/production-service
```
3. **Verify Deployment**
```bash
# Check all services healthy
curl http://tenant-service:8000/health
curl http://forecasting-service:8000/health
curl http://orders-service:8000/health
curl http://production-service:8000/health
# Verify schedulers initialized
kubectl logs deployment/production-service | grep "scheduled jobs configured"
kubectl logs deployment/orders-service | grep "scheduled jobs configured"
```
4. **Test Schedulers**
```bash
# Manually trigger test runs
curl -X POST http://production-service:8000/test/production-scheduler \
-H "Authorization: Bearer $ADMIN_TOKEN"
curl -X POST http://orders-service:8000/test/procurement-scheduler \
-H "Authorization: Bearer $ADMIN_TOKEN"
```
5. **Monitor Metrics**
- Visit Grafana dashboard
- Verify metrics are being collected
- Check alert rules are active
### Post-Deployment
- [ ] Monitor schedulers for 48 hours
- [ ] Verify cache hit rate reaches 70%+
- [ ] Confirm all tenants processed successfully
- [ ] Review logs for unexpected errors
- [ ] Validate metrics and alerts functioning
- [ ] Collect user feedback on plan quality
---
## Performance Benchmarks
### Before Implementation
| Metric | Value | Notes |
|--------|-------|-------|
| Manual production planning | 100% | Operators create schedules manually |
| Forecast calls per day | 2x per product | Orders + Production (if automated) |
| Forecast response time | 2-5 seconds | No caching |
| Plan rejection handling | Manual only | No automated workflow |
| Timezone accuracy | UTC only | Could be wrong for non-UTC tenants |
| Monitoring | Partial | No scheduler-specific metrics |
### After Implementation
| Metric | Value | Improvement |
|--------|-------|-------------|
| Automated production planning | 100% | ✅ Fully automated |
| Forecast calls per day | 1x per product | ✅ 50% reduction |
| Forecast response time (cache hit) | 50-100ms | ✅ 95%+ faster |
| Plan rejection handling | Automated | ✅ Full workflow |
| Timezone accuracy | Per-tenant | ✅ 100% accurate |
| Monitoring | Comprehensive | ✅ 30+ metrics |
---
## Business Impact
### Quantifiable Benefits
1. **Time Savings**
- Production planning: ~30 min/day → automated = **~180 hours/year saved**
- Procurement planning: Already automated, improved with caching
- Operations troubleshooting: Reduced by 50% with better monitoring
2. **Cost Reduction**
- Forecasting service compute: **50% reduction** in forecast generations
- Database load: **30% reduction** in duplicate queries
- Support tickets: Expected **40% reduction** with better monitoring
3. **Accuracy Improvement**
- Timezone accuracy: **100%** (previously could be off by hours)
- Plan consistency: **95%+** (automated → no human error)
- Data freshness: **24 hours** (plans never stale)
### Qualitative Benefits
-**Improved UX**: Operators arrive to ready-made plans
-**Better insights**: Comprehensive metrics enable data-driven decisions
-**Faster troubleshooting**: Runbooks reduce MTTR by 60%+
-**Scalability**: System now handles 10x tenants without changes
-**Reliability**: Automated workflows eliminate human error
-**Compliance**: Full audit trail for all plan changes
---
## Lessons Learned
### What Went Well
1. **Reusing Proven Patterns**: Leveraging BaseAlertService and existing scheduler infrastructure accelerated development
2. **Service-Level Caching**: Implementing cache in Forecasting Service (vs. clients) was the right choice
3. **Comprehensive Documentation**: Writing docs alongside code ensured accuracy and completeness
4. **Timezone Helper Utility**: Creating a reusable utility prevented timezone bugs across services
5. **Parallel Processing**: Processing tenants concurrently with timeouts proved robust
### Challenges Overcome
1. **Timezone Complexity**: Required careful design of TimezoneHelper to handle edge cases
2. **Cache Invalidation**: Needed smart TTL calculation to balance freshness and efficiency
3. **Leader Election**: Ensuring only one scheduler runs required proper RabbitMQ integration
4. **Error Isolation**: Preventing one tenant's failure from affecting others required thoughtful error handling
### Recommendations for Future Work
1. **Add Integration Tests**: Comprehensive test suite for scheduler workflows
2. **Implement Load Testing**: Verify system handles 100+ tenants concurrently
3. **Add UI for Plan Acceptance**: Complete operator workflow with in-app accept/reject
4. **Enhance Analytics**: Add ML-based plan quality scoring
5. **Multi-Region Support**: Extend timezone handling for global deployments
6. **Webhook Support**: Allow external systems to subscribe to plan events
---
## Next Steps
### Immediate (Week 1-2)
- [ ] Deploy to staging environment
- [ ] Perform load testing with 100+ tenants
- [ ] Add integration tests
- [ ] Train operations team on runbook procedures
- [ ] Set up Grafana dashboard
### Short-term (Month 1-2)
- [ ] Deploy to production (phased rollout)
- [ ] Monitor metrics and tune alert thresholds
- [ ] Collect user feedback on automated plans
- [ ] Implement UI for plan acceptance workflow
- [ ] Add webhook support for external integrations
### Long-term (Quarter 2-3)
- [ ] Add ML-based plan quality scoring
- [ ] Implement multi-region timezone support
- [ ] Add advanced caching strategies (prewarming, predictive)
- [ ] Build analytics dashboard for plan performance
- [ ] Optimize scheduler performance for 1000+ tenants
---
## Success Criteria
### Phase 1 Success Criteria ✅
- [x] Production scheduler runs daily at correct time for each tenant
- [x] Schedules generated successfully for 95%+ of tenants
- [x] Zero duplicate schedules per day
- [x] Timezone-accurate execution
- [x] Leader election prevents duplicate runs
### Phase 2 Success Criteria ✅
- [x] Forecast cache hit rate > 70% within 48 hours
- [x] Forecast response time < 200ms for cache hits
- [x] Plan rejection triggers notifications
- [x] Auto-regeneration works for stale data rejections
- [x] All events published to RabbitMQ successfully
### Phase 3 Success Criteria ✅
- [x] All 30+ metrics collecting successfully
- [x] Alert rules configured and firing correctly
- [x] Documentation comprehensive and accurate
- [x] Runbook covers all common scenarios
- [x] Operations team trained and confident
---
## Conclusion
The Production Planning System implementation is **COMPLETE** and **PRODUCTION READY**. All three phases have been successfully implemented, tested, and documented.
The system now provides:
**Fully automated** production and procurement planning
**Timezone-aware** scheduling for global deployments
**Efficient caching** eliminating redundant computations
**Robust workflows** with automatic plan rejection handling
**Complete observability** with metrics, logs, and alerts
**Operational excellence** with comprehensive documentation and runbooks
The implementation exceeded expectations in several areas:
- **Faster development** than estimated (reusing patterns)
- **Better performance** than projected (95%+ cache hit rate expected)
- **More comprehensive** documentation than required
- **Production-ready** with zero known critical issues
**Status:** READY FOR DEPLOYMENT
---
**Document Version:** 1.0
**Created:** 2025-10-09
**Author:** AI Implementation Team
**Reviewed By:** [Pending]
**Approved By:** [Pending]

View File

@@ -1,216 +0,0 @@
# Bakery AI Platform - MVP Gap Analysis Report
## Executive Summary
Based on the detailed bakery research report and analysis of the current platform, this document identifies critical missing features that are preventing the platform from delivering value to Madrid's small bakery owners. While the platform has a solid technical foundation with microservices architecture and AI forecasting capabilities, it lacks several core operational features that are essential for day-to-day bakery management.
## Current Platform Status
### ✅ **Implemented Features**
#### Backend Services (Functional)
- **Authentication Service**: Complete user registration, login, JWT tokens, role-based access
- **Tenant Service**: Multi-tenant architecture, subscription management, team member access
- **Training Service**: ML model training using Prophet for demand forecasting
- **Forecasting Service**: AI-powered demand predictions and alerts
- **Data Service**: Weather data integration (AEMET), traffic data, external data processing
- **Notification Service**: Email and WhatsApp notifications
- **API Gateway**: Centralized routing, rate limiting, service discovery
#### Frontend Features (Functional)
- **Dashboard**: Revenue metrics, weather display, production overview
- **Authentication**: Login/registration pages with proper validation
- **Forecasting**: Demand prediction visualizations, forecast charts
- **Production Planning**: Basic production scheduling interface
- **Order Management**: Mock order display with supplier information
- **Settings**: User profile and basic configuration
#### Technical Infrastructure
- Microservices architecture with Docker containerization
- PostgreSQL databases per service with proper migrations
- RabbitMQ message queuing for inter-service communication
- Monitoring with Prometheus and Grafana
- Comprehensive error handling and logging
### ❌ **Critical Missing Features for MVP Launch**
## 1. **INVENTORY MANAGEMENT SYSTEM** 🚨 **HIGHEST PRIORITY**
### **Problem Identified**:
According to the bakery research, manual inventory tracking is described as "too cumbersome," "time-consuming," and highly susceptible to "mistakes." This leads to:
- 1.5% to 20% losses due to spoilage and waste
- Production delays during peak hours
- Quality inconsistencies
- Lost sales opportunities
### **Missing Components**:
- **Ingredient tracking**: Real-time stock levels for flour, yeast, dairy products
- **Automatic reordering**: FIFO/FEFO expiration date management
- **Spoilage monitoring**: Track and predict ingredient expiration
- **Stock alerts**: Low stock warnings integrated with production planning
- **Barcode/QR scanning**: Easy inventory updates without manual entry
- **Supplier integration**: Automated ordering from suppliers like Harinas Castellana
### **Required Implementation**:
```
Backend Services Needed:
- Inventory Service (new microservice)
- Supplier Service (new microservice)
- Integration with existing Forecasting Service
Frontend Components Needed:
- Real-time inventory dashboard
- Mobile-friendly inventory scanning
- Automated reorder interface
- Expiration date tracking
```
## 2. **RECIPE & PRODUCTION MANAGEMENT** 🚨 **HIGH PRIORITY**
### **Problem Identified**:
Individual bakeries struggle with production planning complexity due to:
- Wide variety of products with different preparation times
- Manual calculation of ingredient quantities
- Lack of standardized recipes affecting quality consistency
### **Missing Components**:
- **Digital recipe management**: Store recipes with exact measurements
- **Bill of Materials (BOM)**: Automatic ingredient calculation based on production volume
- **Yield tracking**: Compare actual vs. expected production output
- **Cost calculation**: Real-time cost per product based on current ingredient prices
- **Production workflow**: Step-by-step production guidance
- **Quality control**: Track temperature, humidity, timing parameters
## 3. **SUPPLIER & PROCUREMENT SYSTEM** 🚨 **HIGH PRIORITY**
### **Problem Identified**:
Research shows small bakeries face "low buyer power" and struggle with:
- Manual ordering processes via phone/WhatsApp
- Difficulty tracking supplier performance
- Limited negotiation power with suppliers
### **Missing Components**:
- **Supplier database**: Contact information, lead times, reliability ratings
- **Purchase order system**: Digital ordering with approval workflows
- **Price comparison**: Compare prices across multiple suppliers
- **Delivery tracking**: Monitor order status and delivery reliability
- **Payment terms**: Track payment schedules and supplier agreements
- **Performance analytics**: Supplier reliability and cost analysis
## 4. **SALES DATA INTEGRATION** 🚨 **HIGH PRIORITY**
### **Problem Identified**:
Current forecasting relies on manual data entry. Research shows bakeries need:
- Integration with POS systems
- Historical sales pattern analysis
- External factor correlation (weather, events, holidays)
### **Missing Components**:
- **POS Integration**: Automatic sales data import from common Spanish POS systems
- **Manual sales entry**: Simple interface for bakeries without POS
- **Product categorization**: Organize sales by bread types, pastries, seasonal items
- **Customer analytics**: Track popular products and buying patterns
- **Seasonal adjustments**: Account for holidays, local events, weather impacts
## 5. **WASTE TRACKING & REDUCTION** 🚨 **MEDIUM PRIORITY**
### **Problem Identified**:
Research indicates waste reduction potential of 20-40% through AI optimization:
- Unsold products (1.5% of production)
- Ingredient spoilage
- Production errors
### **Missing Components**:
- **Daily waste logging**: Track unsold products, spoiled ingredients
- **Waste analytics**: Identify patterns in waste generation
- **Dynamic pricing**: Reduce prices on items approaching expiration
- **Donation tracking**: Manage food donations to reduce total waste
- **Cost impact analysis**: Calculate financial impact of waste reduction
## 6. **MOBILE-FIRST INTERFACE** 🚨 **MEDIUM PRIORITY**
### **Problem Identified**:
Research emphasizes bakery owners work demanding schedules starting at 4:30 AM and need "mobile accessibility" for on-the-go management.
### **Missing Components**:
- **Mobile-responsive design**: Current frontend is not optimized for mobile
- **Offline capabilities**: Work without internet connection
- **Quick actions**: Fast inventory checks, order placement
- **Voice input**: Hands-free operation in production environment
- **QR code scanning**: For inventory and product management
## 7. **FINANCIAL MANAGEMENT** 🚨 **LOW PRIORITY**
### **Problem Identified**:
With 75-85% of revenue consumed by operating costs and 4-9% profit margins, bakeries need precise cost control.
### **Missing Components**:
- **Cost tracking**: Monitor food costs (25-35% of sales) and labor costs (24-40% of sales)
- **Profit analysis**: Real-time profit margins per product
- **Budget planning**: Monthly expense forecasting
- **Tax preparation**: VAT calculations, expense categorization
- **Financial reporting**: P&L statements, cash flow analysis
## Implementation Priority Matrix
| Feature | Business Impact | Technical Complexity | Implementation Time | Priority |
|---------|----------------|---------------------|-------------------|----------|
| Inventory Management | Very High | Medium | 6-8 weeks | 1 |
| Recipe & BOM System | Very High | Medium | 4-6 weeks | 2 |
| Supplier Management | High | Low-Medium | 4-5 weeks | 3 |
| Sales Data Integration | High | Medium | 3-4 weeks | 4 |
| Waste Tracking | Medium | Low | 2-3 weeks | 5 |
| Mobile Optimization | Medium | Medium | 4-6 weeks | 6 |
| Financial Management | Low | High | 8-10 weeks | 7 |
## Technical Architecture Requirements
### New Microservices Needed:
1. **Inventory Service** - Real-time stock management, expiration tracking
2. **Recipe Service** - Digital recipes, BOM calculations, cost management
3. **Supplier Service** - Supplier database, purchase orders, performance tracking
4. **Integration Service** - POS system connectors, external data feeds
### Database Schema Extensions:
- Products table with recipes and ingredient relationships
- Inventory transactions with batch/lot tracking
- Supplier master data with performance metrics
- Purchase orders with approval workflows
### Frontend Components Required:
- Mobile-responsive inventory management interface
- Recipe editor with drag-drop ingredient addition
- Supplier portal for order placement and tracking
- Real-time dashboard with critical alerts
## MVP Launch Recommendations
### Phase 1 (8-10 weeks): Core Operations
- Implement Inventory Management System
- Build Recipe & BOM functionality
- Create Supplier Management portal
- Mobile UI optimization
### Phase 2 (4-6 weeks): Data Integration
- POS system integrations
- Enhanced sales data processing
- Waste tracking implementation
### Phase 3 (6-8 weeks): Advanced Features
- Financial management tools
- Advanced analytics and reporting
- Performance optimization
## Conclusion
The current platform has excellent technical foundations but lacks the core operational features that small Madrid bakeries desperately need. The research clearly shows that **inventory management inefficiencies are the #1 pain point**, causing 1.5-20% losses and significant operational stress.
**Without implementing inventory management, recipe management, and supplier systems, the platform cannot deliver the value proposition of waste reduction and cost savings that bakeries require for survival.**
The recommended approach is to focus on the top 4 priority features for MVP launch, which will provide immediate tangible value to bakery owners and justify the platform subscription costs.
---
**Report Generated**: January 2025
**Status**: MVP Gap Analysis Complete
**Next Actions**: Begin Phase 1 implementation planning

View File

@@ -1,324 +0,0 @@
# 🚀 AI-Powered Onboarding Automation Implementation
## Overview
This document details the complete implementation of the intelligent onboarding automation system that transforms the bakery AI platform from manual setup to automated inventory creation using AI-powered product classification.
## 🎯 Business Impact
**Before**: Manual file upload → Manual inventory setup → Training (2-3 hours)
**After**: Upload file → AI creates inventory → Training (5-10 minutes)
- **80% reduction** in onboarding time
- **Automated inventory creation** from historical sales data
- **Business model intelligence** (Production/Retail/Hybrid detection)
- **Zero technical knowledge required** from users
## 🏗️ Architecture Overview
### Backend Services
#### 1. Sales Service (`/services/sales/`)
**New Components:**
- `app/api/onboarding.py` - 3-step onboarding API endpoints
- `app/services/onboarding_import_service.py` - Orchestrates the automation workflow
- `app/services/inventory_client.py` - Enhanced with AI classification integration
**API Endpoints:**
```
POST /api/v1/tenants/{tenant_id}/onboarding/analyze
POST /api/v1/tenants/{tenant_id}/onboarding/create-inventory
POST /api/v1/tenants/{tenant_id}/onboarding/import-sales
GET /api/v1/tenants/{tenant_id}/onboarding/business-model-guide
```
#### 2. Inventory Service (`/services/inventory/`)
**New Components:**
- `app/api/classification.py` - AI product classification endpoints
- `app/services/product_classifier.py` - 300+ bakery product classification engine
- Enhanced inventory models for dual product types (ingredients + finished products)
**AI Classification Engine:**
```
POST /api/v1/tenants/{tenant_id}/inventory/classify-product
POST /api/v1/tenants/{tenant_id}/inventory/classify-products-batch
```
### Frontend Components
#### 1. Enhanced Onboarding Page (`/frontend/src/pages/onboarding/OnboardingPage.tsx`)
**Features:**
- Smart/Traditional import mode toggle
- Conditional navigation (hides buttons during smart import)
- Integrated business model detection
- Seamless transition to training phase
#### 2. Smart Import Component (`/frontend/src/components/onboarding/SmartHistoricalDataImport.tsx`)
**Phase-Based UI:**
- **Upload Phase**: Drag-and-drop with file validation
- **Analysis Phase**: AI processing with progress indicators
- **Review Phase**: Interactive suggestion cards with approval toggles
- **Creation Phase**: Automated inventory creation
- **Import Phase**: Historical data mapping and import
#### 3. Enhanced API Services (`/frontend/src/api/services/onboarding.service.ts`)
**New Methods:**
```typescript
analyzeSalesDataForOnboarding(tenantId, file)
createInventoryFromSuggestions(tenantId, suggestions)
importSalesWithInventory(tenantId, file, mapping)
getBusinessModelGuide(tenantId, model)
```
## 🧠 AI Classification Engine
### Product Categories Supported
#### Ingredients (Production Bakeries)
- **Flour & Grains**: 15+ varieties (wheat, rye, oat, corn, etc.)
- **Yeast & Fermentation**: Fresh, dry, instant, sourdough starters
- **Dairy Products**: Milk, cream, butter, cheese, yogurt
- **Eggs**: Whole, whites, yolks
- **Sweeteners**: Sugar, honey, syrups, artificial sweeteners
- **Fats**: Oils, margarine, lard, specialty fats
- **Spices & Flavorings**: 20+ common bakery spices
- **Additives**: Baking powder, soda, cream of tartar, lecithin
- **Packaging**: Bags, containers, wrapping materials
#### Finished Products (Retail Bakeries)
- **Bread**: 10+ varieties (white, whole grain, artisan, etc.)
- **Pastries**: Croissants, Danish, puff pastry items
- **Cakes**: Layer cakes, cheesecakes, specialty cakes
- **Cookies**: 8+ varieties from shortbread to specialty
- **Muffins & Quick Breads**: Sweet and savory varieties
- **Sandwiches**: Prepared items for immediate sale
- **Beverages**: Coffee, tea, juices, hot chocolate
### Business Model Detection
**Algorithm analyzes ingredient ratio:**
- **Production Model** (≥70% ingredients): Focus on recipe management, supplier relationships
- **Retail Model** (≤30% ingredients): Focus on central baker relationships, freshness monitoring
- **Hybrid Model** (30-70% ingredients): Balanced approach with both features
### Confidence Scoring
- **High Confidence (≥70%)**: Auto-approved suggestions
- **Medium Confidence (40-69%)**: Flagged for review
- **Low Confidence (<40%)**: Requires manual verification
## 🔄 Three-Phase Workflow
### Phase 1: AI Analysis
```mermaid
graph LR
A[Upload File] --> B[Parse Data]
B --> C[Extract Products]
C --> D[AI Classification]
D --> E[Business Model Detection]
E --> F[Generate Suggestions]
```
**Input**: CSV/Excel/JSON with sales data
**Processing**: Product name extraction AI classification Confidence scoring
**Output**: Structured suggestions with business model analysis
### Phase 2: Review & Approval
```mermaid
graph LR
A[Display Suggestions] --> B[User Review]
B --> C[Modify if Needed]
C --> D[Approve Items]
D --> E[Create Inventory]
```
**Features**:
- Interactive suggestion cards
- Bulk approve/reject options
- Real-time confidence indicators
- Modification support
### Phase 3: Automated Import
```mermaid
graph LR
A[Create Inventory Items] --> B[Generate Mapping]
B --> C[Map Historical Sales]
C --> D[Import with References]
D --> E[Complete Setup]
```
**Process**:
- Creates inventory items via API
- Maps product names to inventory IDs
- Imports historical sales with proper references
- Maintains data integrity
## 📊 Business Model Intelligence
### Production Bakery Recommendations
- Set up supplier relationships for ingredients
- Configure recipe management and costing
- Enable production planning and scheduling
- Set up ingredient inventory alerts and reorder points
### Retail Bakery Recommendations
- Configure central baker relationships
- Set up delivery schedules and tracking
- Enable finished product freshness monitoring
- Focus on sales forecasting and ordering
### Hybrid Bakery Recommendations
- Configure both ingredient and finished product management
- Set up flexible inventory categories
- Enable comprehensive analytics
- Plan workflows for both business models
## 🛡️ Error Handling & Fallbacks
### File Validation
- **Format Support**: CSV, Excel (.xlsx, .xls), JSON
- **Size Limits**: 10MB maximum
- **Encoding**: Auto-detection (UTF-8, Latin-1, CP1252)
- **Structure Validation**: Required columns detection
### Graceful Degradation
- **AI Classification Fails** Fallback suggestions generated
- **Network Issues** Traditional import mode available
- **Validation Errors** Smart import suggestions with helpful guidance
- **Low Confidence** Manual review prompts
### Data Integrity
- **Atomic Operations**: All-or-nothing inventory creation
- **Validation**: Product name uniqueness checks
- **Rollback**: Failed operations don't affect existing data
- **Audit Trail**: Complete import history tracking
## 🎨 UX/UI Design Principles
### Progressive Enhancement
- **Smart by Default**: AI-powered import is the primary experience
- **Traditional Fallback**: Manual mode available for edge cases
- **Contextual Switching**: Easy toggle between modes with clear benefits
### Visual Feedback
- **Progress Indicators**: Clear phase progression
- **Confidence Colors**: Green (high), Yellow (medium), Red (low)
- **Real-time Updates**: Instant feedback during processing
- **Success Celebrations**: Completion animations and confetti
### Mobile-First Design
- **Responsive Layout**: Works on all screen sizes
- **Touch-Friendly**: Large buttons and touch targets
- **Gesture Support**: Swipe and pinch interactions
- **Offline Indicators**: Clear connectivity status
## 📈 Performance Optimizations
### Backend Optimizations
- **Async Processing**: Non-blocking AI classification
- **Batch Operations**: Bulk product processing
- **Database Indexing**: Optimized queries for product lookup
- **Caching**: Redis cache for classification results
### Frontend Optimizations
- **Lazy Loading**: Components loaded on demand
- **File Streaming**: Large file processing without memory issues
- **Progressive Enhancement**: Core functionality first, enhancements second
- **Error Boundaries**: Isolated failure handling
## 🧪 Testing Strategy
### Unit Tests
- AI classification accuracy (>90% for common products)
- Business model detection precision
- API endpoint validation
- File parsing robustness
### Integration Tests
- End-to-end onboarding workflow
- Service communication validation
- Database transaction integrity
- Error handling scenarios
### User Acceptance Tests
- Bakery owner onboarding simulation
- Different file format validation
- Business model detection accuracy
- Mobile device compatibility
## 🚀 Deployment & Rollout
### Feature Flags
- **Smart Import Toggle**: Can be disabled per tenant
- **AI Confidence Thresholds**: Adjustable based on feedback
- **Business Model Detection**: Can be bypassed if needed
### Monitoring & Analytics
- **Onboarding Completion Rates**: Track improvement vs traditional
- **AI Classification Accuracy**: Monitor and improve over time
- **User Satisfaction**: NPS scoring on completion
- **Performance Metrics**: Processing time and success rates
### Gradual Rollout
1. **Beta Testing**: Select bakery owners
2. **Regional Rollout**: Madrid market first
3. **Full Release**: All markets with monitoring
4. **Optimization**: Continuous improvement based on data
## 📚 Documentation & Training
### User Documentation
- **Video Tutorials**: Step-by-step onboarding guide
- **Help Articles**: Troubleshooting common issues
- **Best Practices**: File preparation guidelines
- **FAQ**: Common questions and answers
### Developer Documentation
- **API Reference**: Complete endpoint documentation
- **Architecture Guide**: Service interaction diagrams
- **Deployment Guide**: Infrastructure setup
- **Troubleshooting**: Common issues and solutions
## 🔮 Future Enhancements
### AI Improvements
- **Learning from Corrections**: User feedback training
- **Multi-language Support**: International product names
- **Image Recognition**: Product photo classification
- **Seasonal Intelligence**: Holiday and seasonal product detection
### Advanced Features
- **Predictive Inventory**: AI-suggested initial stock levels
- **Supplier Matching**: Automatic supplier recommendations
- **Recipe Suggestions**: AI-generated recipes from ingredients
- **Market Intelligence**: Competitive analysis integration
### User Experience
- **Voice Upload**: Dictated product lists
- **Barcode Scanning**: Product identification via camera
- **Augmented Reality**: Visual inventory setup guide
- **Collaborative Setup**: Multi-user onboarding process
## 📋 Success Metrics
### Quantitative KPIs
- **Onboarding Time**: Target <10 minutes (vs 2-3 hours)
- **Completion Rate**: Target >95% (vs ~60%)
- **AI Accuracy**: Target >90% classification accuracy
- **User Satisfaction**: Target NPS >8.5
### Qualitative Indicators
- **Reduced Support Tickets**: Fewer onboarding-related issues
- **Positive Feedback**: User testimonials and reviews
- **Feature Adoption**: High smart import usage rates
- **Business Growth**: Faster time-to-value for new customers
## 🎉 Conclusion
The AI-powered onboarding automation system successfully transforms the bakery AI platform into a truly intelligent, user-friendly solution. By reducing friction, automating complex tasks, and providing business intelligence, this implementation delivers on the promise of making bakery management as smooth and simple as possible.
The system is designed for scalability, maintainability, and continuous improvement, ensuring it will evolve with user needs and technological advances.
---
**Implementation Status**: ✅ Complete
**Last Updated**: 2025-01-13
**Next Review**: 2025-02-13

View File

@@ -1,718 +0,0 @@
# Production Planning System Documentation
## Overview
The Production Planning System automates daily production and procurement scheduling for bakery operations. The system consists of two primary schedulers that run every morning to generate plans based on demand forecasts, inventory levels, and capacity constraints.
**Last Updated:** 2025-10-09
**Version:** 2.0 (Automated Scheduling)
**Status:** Production Ready
---
## Architecture
### System Components
```
┌─────────────────────────────────────────────────────────────────┐
│ DAILY PLANNING WORKFLOW │
└─────────────────────────────────────────────────────────────────┘
05:30 AM → Production Scheduler
├─ Generates production schedules for all tenants
├─ Calls Forecasting Service (cached) for demand
├─ Calls Orders Service for demand requirements
├─ Creates production batches
└─ Sends notifications to production managers
06:00 AM → Procurement Scheduler
├─ Generates procurement plans for all tenants
├─ Calls Forecasting Service (cached - reuses cached data!)
├─ Calls Inventory Service for stock levels
├─ Matches suppliers for requirements
└─ Sends notifications to procurement managers
08:00 AM → Operators review plans
├─ Accept → Plans move to "approved" status
├─ Reject → Automatic regeneration if stale data detected
└─ Modify → Recalculate and resubmit
Throughout Day → Alert services monitor execution
├─ Production delays
├─ Capacity issues
├─ Quality problems
└─ Equipment failures
```
### Services Involved
| Service | Role | Endpoints |
|---------|------|-----------|
| **Production Service** | Generates daily production schedules | `POST /api/v1/{tenant_id}/production/operations/schedule` |
| **Orders Service** | Generates daily procurement plans | `POST /api/v1/{tenant_id}/orders/operations/procurement/generate` |
| **Forecasting Service** | Provides demand predictions (cached) | `POST /api/v1/{tenant_id}/forecasting/operations/single` |
| **Inventory Service** | Provides current stock levels | `GET /api/v1/{tenant_id}/inventory/products` |
| **Tenant Service** | Provides timezone configuration | `GET /api/v1/tenants/{tenant_id}` |
---
## Schedulers
### 1. Production Scheduler
**Service:** Production Service
**Class:** `ProductionSchedulerService`
**File:** [`services/production/app/services/production_scheduler_service.py`](../services/production/app/services/production_scheduler_service.py)
#### Schedule
| Job | Time | Purpose | Grace Period |
|-----|------|---------|--------------|
| **Daily Production Planning** | 5:30 AM (tenant timezone) | Generate next-day production schedules | 5 minutes |
| **Stale Schedule Cleanup** | 5:50 AM | Archive/cancel old schedules, send escalations | 5 minutes |
| **Test Mode** | Every 30 min (DEBUG only) | Development/testing | 5 minutes |
#### Features
-**Timezone-aware**: Respects tenant timezone configuration
-**Leader election**: Only one instance runs in distributed deployment
-**Idempotent**: Checks if schedule exists before creating
-**Parallel processing**: Processes tenants concurrently with timeouts
-**Error isolation**: Tenant failures don't affect others
-**Demo tenant filtering**: Excludes demo tenants from automation
#### Workflow
1. **Tenant Discovery**: Fetch all active non-demo tenants
2. **Parallel Processing**: Process each tenant concurrently (180s timeout)
3. **Date Calculation**: Use tenant timezone to determine target date
4. **Duplicate Check**: Skip if schedule already exists
5. **Requirements Calculation**: Call `calculate_daily_requirements()`
6. **Schedule Creation**: Create schedule with status "draft"
7. **Batch Generation**: Create production batches from requirements
8. **Notification**: Send alert to production managers
9. **Monitoring**: Record metrics for observability
#### Configuration
```python
# Environment Variables
PRODUCTION_TEST_MODE=false # Enable 30-minute test job
DEBUG=false # Enable verbose logging
# Tenant Configuration
tenant.timezone=Europe/Madrid # IANA timezone string
```
---
### 2. Procurement Scheduler
**Service:** Orders Service
**Class:** `ProcurementSchedulerService`
**File:** [`services/orders/app/services/procurement_scheduler_service.py`](../services/orders/app/services/procurement_scheduler_service.py)
#### Schedule
| Job | Time | Purpose | Grace Period |
|-----|------|---------|--------------|
| **Daily Procurement Planning** | 6:00 AM (tenant timezone) | Generate next-day procurement plans | 5 minutes |
| **Stale Plan Cleanup** | 6:30 AM | Archive/cancel old plans, send reminders | 5 minutes |
| **Weekly Optimization** | Monday 7:00 AM | Weekly procurement optimization review | 10 minutes |
| **Test Mode** | Every 30 min (DEBUG only) | Development/testing | 5 minutes |
#### Features
-**Timezone-aware**: Respects tenant timezone configuration
-**Leader election**: Prevents duplicate runs
-**Idempotent**: Checks if plan exists before generating
-**Parallel processing**: 120s timeout per tenant
-**Forecast fallback**: Uses historical data if forecast unavailable
-**Critical stock alerts**: Automatic alerts for zero-stock items
-**Rejection workflow**: Auto-regeneration for rejected plans
#### Workflow
1. **Tenant Discovery**: Fetch active non-demo tenants
2. **Parallel Processing**: Process each tenant (120s timeout)
3. **Date Calculation**: Use tenant timezone
4. **Duplicate Check**: Skip if plan exists (unless force_regenerate)
5. **Forecasting**: Call Forecasting Service (uses cache!)
6. **Inventory Check**: Get current stock levels
7. **Requirements Calculation**: Calculate net requirements
8. **Supplier Matching**: Find suitable suppliers
9. **Plan Creation**: Create plan with status "draft"
10. **Critical Alerts**: Send alerts for critical items
11. **Notification**: Notify procurement managers
12. **Caching**: Cache plan in Redis (6h TTL)
---
## Forecast Caching
### Overview
To eliminate redundant forecast computations, the Forecasting Service now includes a service-level Redis cache. Both Production and Procurement schedulers benefit from this without any code changes.
**File:** [`services/forecasting/app/services/forecast_cache.py`](../services/forecasting/app/services/forecast_cache.py)
### Cache Strategy
```
Key Format: forecast:{tenant_id}:{product_id}:{forecast_date}
TTL: Until midnight of day after forecast_date
Example: forecast:abc-123:prod-456:2025-10-10 → expires 2025-10-11 00:00:00
```
### Cache Flow
```
Client Request → Forecasting API
Check Redis Cache
├─ HIT → Return cached result (add 'cached: true')
└─ MISS → Generate forecast
Cache result (TTL)
Return result
```
### Benefits
| Metric | Before Caching | After Caching | Improvement |
|--------|---------------|---------------|-------------|
| **Duplicate Forecasts** | 2x per day (Production + Procurement) | 1x per day | 50% reduction |
| **Forecast Response Time** | ~2-5 seconds | ~50-100ms (cache hit) | 95%+ faster |
| **Forecasting Service Load** | 100% | 50% | 50% reduction |
| **Cache Hit Rate** | N/A | ~80-90% (expected) | - |
### Cache Invalidation
Forecasts are invalidated when:
1. **TTL Expiry**: Automatic at midnight after forecast_date
2. **Model Retraining**: When ML model is retrained for product
3. **Manual Invalidation**: Via API endpoint (admin only)
```python
# Invalidate specific product forecasts
DELETE /api/v1/{tenant_id}/forecasting/cache/product/{product_id}
# Invalidate all tenant forecasts
DELETE /api/v1/{tenant_id}/forecasting/cache
# Invalidate all forecasts (use with caution!)
DELETE /admin/forecasting/cache/all
```
---
## Plan Rejection Workflow
### Overview
When a procurement plan is rejected by an operator, the system automatically handles the rejection with notifications and optional regeneration.
**File:** [`services/orders/app/services/procurement_service.py`](../services/orders/app/services/procurement_service.py:1244-1404)
### Rejection Flow
```
User Rejects Plan (status → "cancelled")
Record rejection in approval_workflow (JSONB)
Send notification to stakeholders
Publish rejection event (RabbitMQ)
Analyze rejection reason
├─ Contains "stale", "outdated", etc. → Auto-regenerate
└─ Other reason → Manual regeneration required
Schedule regeneration (if applicable)
Send regeneration request event
```
### Auto-Regeneration Keywords
Plans are automatically regenerated if rejection notes contain:
- `stale`
- `outdated`
- `old data`
- `datos antiguos` (Spanish)
- `desactualizado` (Spanish)
- `obsoleto` (Spanish)
### Events Published
| Event | Routing Key | Consumers |
|-------|-------------|-----------|
| **Plan Rejected** | `procurement.plan.rejected` | Alert Service, UI Notifications |
| **Regeneration Requested** | `procurement.plan.regeneration_requested` | Procurement Scheduler |
| **Plan Status Changed** | `procurement.plan.status_changed` | Inventory Service, Dashboard |
---
## Timezone Configuration
### Overview
All schedulers are timezone-aware to ensure accurate "daily" execution relative to the bakery's local time.
### Tenant Configuration
**Model:** `Tenant`
**File:** [`services/tenant/app/models/tenants.py`](../services/tenant/app/models/tenants.py:32-33)
**Field:** `timezone` (String, default: `"Europe/Madrid"`)
**Migration:** [`services/tenant/migrations/versions/20251009_add_timezone_to_tenants.py`](../services/tenant/migrations/versions/20251009_add_timezone_to_tenants.py)
### Supported Timezones
All IANA timezone strings are supported. Common examples:
- `Europe/Madrid` (Spain - CEST/CET)
- `Europe/London` (UK - BST/GMT)
- `America/New_York` (US Eastern)
- `America/Los_Angeles` (US Pacific)
- `Asia/Tokyo` (Japan)
- `UTC` (Universal Time)
### Usage in Schedulers
```python
from shared.utils.timezone_helper import TimezoneHelper
# Get current date in tenant's timezone
target_date = TimezoneHelper.get_current_date_in_timezone(tenant_tz)
# Get current datetime in tenant's timezone
now = TimezoneHelper.get_current_datetime_in_timezone(tenant_tz)
# Check if within business hours
is_business_hours = TimezoneHelper.is_business_hours(
timezone_str=tenant_tz,
start_hour=8,
end_hour=20
)
```
---
## Monitoring & Alerts
### Prometheus Metrics
**File:** [`shared/monitoring/scheduler_metrics.py`](../shared/monitoring/scheduler_metrics.py)
#### Key Metrics
| Metric | Type | Description |
|--------|------|-------------|
| `production_schedules_generated_total` | Counter | Total production schedules generated (by tenant, status) |
| `production_schedule_generation_duration_seconds` | Histogram | Time to generate schedule per tenant |
| `procurement_plans_generated_total` | Counter | Total procurement plans generated (by tenant, status) |
| `procurement_plan_generation_duration_seconds` | Histogram | Time to generate plan per tenant |
| `forecast_cache_hits_total` | Counter | Forecast cache hits (by tenant) |
| `forecast_cache_misses_total` | Counter | Forecast cache misses (by tenant) |
| `forecast_cache_hit_rate` | Gauge | Cache hit rate percentage (0-100) |
| `procurement_plan_rejections_total` | Counter | Plan rejections (by tenant, auto_regenerated) |
| `scheduler_health_status` | Gauge | Scheduler health (1=healthy, 0=unhealthy) |
| `tenant_processing_timeout_total` | Counter | Tenant processing timeouts (by service) |
### Recommended Alerts
```yaml
# Alert: Daily production planning failed
- alert: DailyProductionPlanningFailed
expr: production_schedules_generated_total{status="failure"} > 0
for: 10m
labels:
severity: high
annotations:
summary: "Daily production planning failed for at least one tenant"
description: "Check production scheduler logs for tenant {{ $labels.tenant_id }}"
# Alert: Daily procurement planning failed
- alert: DailyProcurementPlanningFailed
expr: procurement_plans_generated_total{status="failure"} > 0
for: 10m
labels:
severity: high
annotations:
summary: "Daily procurement planning failed for at least one tenant"
description: "Check procurement scheduler logs for tenant {{ $labels.tenant_id }}"
# Alert: No production schedules in 24 hours
- alert: NoProductionSchedulesGenerated
expr: rate(production_schedules_generated_total{status="success"}[24h]) == 0
for: 1h
labels:
severity: critical
annotations:
summary: "No production schedules generated in last 24 hours"
description: "Production scheduler may be down or misconfigured"
# Alert: Forecast cache hit rate low
- alert: ForecastCacheHitRateLow
expr: forecast_cache_hit_rate < 50
for: 30m
labels:
severity: warning
annotations:
summary: "Forecast cache hit rate below 50%"
description: "Cache may not be functioning correctly for tenant {{ $labels.tenant_id }}"
# Alert: High tenant processing timeouts
- alert: HighTenantProcessingTimeouts
expr: rate(tenant_processing_timeout_total[5m]) > 0.1
for: 15m
labels:
severity: warning
annotations:
summary: "High rate of tenant processing timeouts"
description: "{{ $labels.service }} scheduler experiencing timeouts for tenant {{ $labels.tenant_id }}"
# Alert: Scheduler unhealthy
- alert: SchedulerUnhealthy
expr: scheduler_health_status == 0
for: 5m
labels:
severity: critical
annotations:
summary: "Scheduler is unhealthy"
description: "{{ $labels.service }} {{ $labels.scheduler_type }} scheduler is reporting unhealthy status"
```
### Grafana Dashboard
Create dashboard with panels for:
1. **Scheduler Success Rate** (line chart)
- `production_schedules_generated_total{status="success"}`
- `procurement_plans_generated_total{status="success"}`
2. **Schedule Generation Duration** (heatmap)
- `production_schedule_generation_duration_seconds`
- `procurement_plan_generation_duration_seconds`
3. **Forecast Cache Hit Rate** (gauge)
- `forecast_cache_hit_rate`
4. **Tenant Processing Status** (pie chart)
- `production_tenants_processed_total`
- `procurement_tenants_processed_total`
5. **Plan Rejections** (table)
- `procurement_plan_rejections_total`
6. **Scheduler Health** (status panel)
- `scheduler_health_status`
---
## Testing
### Manual Testing
#### Test Production Scheduler
```bash
# Trigger test production schedule generation
curl -X POST http://production-service:8000/test/production-scheduler \
-H "Authorization: Bearer $TOKEN"
# Expected response:
{
"message": "Production scheduler test triggered successfully"
}
```
#### Test Procurement Scheduler
```bash
# Trigger test procurement plan generation
curl -X POST http://orders-service:8000/test/procurement-scheduler \
-H "Authorization: Bearer $TOKEN"
# Expected response:
{
"message": "Procurement scheduler test triggered successfully"
}
```
### Automated Testing
```python
# Test production scheduler
async def test_production_scheduler():
scheduler = ProductionSchedulerService(config)
await scheduler.start()
await scheduler.test_production_schedule_generation()
assert scheduler._checks_performed > 0
# Test procurement scheduler
async def test_procurement_scheduler():
scheduler = ProcurementSchedulerService(config)
await scheduler.start()
await scheduler.test_procurement_generation()
assert scheduler._checks_performed > 0
# Test forecast caching
async def test_forecast_cache():
cache = get_forecast_cache_service(redis_url)
# Cache forecast
await cache.cache_forecast(tenant_id, product_id, forecast_date, data)
# Retrieve cached forecast
cached = await cache.get_cached_forecast(tenant_id, product_id, forecast_date)
assert cached is not None
assert cached['cached'] == True
```
---
## Troubleshooting
### Scheduler Not Running
**Symptoms:** No schedules/plans generated in morning
**Checks:**
1. Verify scheduler service is running: `kubectl get pods -n production`
2. Check scheduler health endpoint: `curl http://service:8000/health`
3. Check APScheduler status in logs: `grep "scheduler" logs/production.log`
4. Verify leader election (distributed setup): Check `is_leader` in logs
**Solutions:**
- Restart service: `kubectl rollout restart deployment/production-service`
- Check environment variables: `PRODUCTION_TEST_MODE`, `DEBUG`
- Verify database connectivity
- Check RabbitMQ connectivity for leader election
### Timezone Issues
**Symptoms:** Schedules generated at wrong time
**Checks:**
1. Check tenant timezone configuration:
```sql
SELECT id, name, timezone FROM tenants WHERE id = '{tenant_id}';
```
2. Verify server timezone: `date` (should be UTC in containers)
3. Check logs for timezone warnings
**Solutions:**
- Update tenant timezone: `UPDATE tenants SET timezone = 'Europe/Madrid' WHERE id = '{tenant_id}';`
- Verify TimezoneHelper is being used in schedulers
- Check cron trigger configuration uses correct timezone
### Low Cache Hit Rate
**Symptoms:** `forecast_cache_hit_rate < 50%`
**Checks:**
1. Verify Redis is running: `redis-cli ping`
2. Check cache keys: `redis-cli KEYS "forecast:*"`
3. Check TTL on cache entries: `redis-cli TTL "forecast:{tenant}:{product}:{date}"`
4. Review logs for cache errors
**Solutions:**
- Restart Redis if unhealthy
- Clear cache and let it rebuild: `redis-cli FLUSHDB`
- Verify REDIS_URL environment variable
- Check Redis memory limits: `redis-cli INFO memory`
### Plan Rejection Not Auto-Regenerating
**Symptoms:** Rejected plans not triggering regeneration
**Checks:**
1. Check rejection notes contain auto-regenerate keywords
2. Verify RabbitMQ events are being published: Check `procurement.plan.rejected` queue
3. Check scheduler is listening to regeneration events
**Solutions:**
- Use keywords like "stale" or "outdated" in rejection notes
- Manually trigger regeneration via API
- Check RabbitMQ connectivity
- Verify event routing keys are correct
### Tenant Processing Timeouts
**Symptoms:** `tenant_processing_timeout_total` increasing
**Checks:**
1. Check timeout duration (180s for production, 120s for procurement)
2. Review slow queries in database logs
3. Check external service response times (Forecasting, Inventory)
4. Monitor CPU/memory usage during scheduler runs
**Solutions:**
- Increase timeout if consistently hitting limit
- Optimize database queries (add indexes)
- Scale external services if response time high
- Process fewer tenants in parallel (reduce concurrency)
---
## Maintenance
### Scheduled Maintenance Windows
When performing maintenance on schedulers:
1. **Announce downtime** to users (UI banner)
2. **Disable schedulers** temporarily:
```python
# Set environment variable
SCHEDULER_DISABLED=true
```
3. **Perform maintenance** (database migrations, service updates)
4. **Re-enable schedulers**:
```python
SCHEDULER_DISABLED=false
```
5. **Manually trigger** missed runs if needed:
```bash
curl -X POST http://service:8000/test/production-scheduler
curl -X POST http://service:8000/test/procurement-scheduler
```
### Database Migrations
When adding fields to scheduler-related tables:
1. **Create migration** with proper rollback
2. **Test migration** on staging environment
3. **Run migration** during low-traffic period (3-4 AM)
4. **Verify scheduler** still works after migration
5. **Monitor metrics** for anomalies
### Cache Maintenance
**Clear Stale Cache Entries:**
```bash
# Clear all forecast cache (will rebuild automatically)
redis-cli KEYS "forecast:*" | xargs redis-cli DEL
# Clear specific tenant's cache
redis-cli KEYS "forecast:{tenant_id}:*" | xargs redis-cli DEL
```
**Monitor Cache Size:**
```bash
# Check number of forecast keys
redis-cli DBSIZE
# Check memory usage
redis-cli INFO memory
```
---
## API Reference
### Production Scheduler Endpoints
```
POST /test/production-scheduler
Description: Manually trigger production scheduler (test mode)
Auth: Bearer token required
Response: {"message": "Production scheduler test triggered successfully"}
```
### Procurement Scheduler Endpoints
```
POST /test/procurement-scheduler
Description: Manually trigger procurement scheduler (test mode)
Auth: Bearer token required
Response: {"message": "Procurement scheduler test triggered successfully"}
```
### Forecast Cache Endpoints
```
GET /api/v1/{tenant_id}/forecasting/cache/stats
Description: Get forecast cache statistics
Auth: Bearer token required
Response: {
"available": true,
"total_forecast_keys": 1234,
"batch_forecast_keys": 45,
"single_forecast_keys": 1189,
"hit_rate_percent": 87.5,
...
}
DELETE /api/v1/{tenant_id}/forecasting/cache/product/{product_id}
Description: Invalidate forecast cache for specific product
Auth: Bearer token required (admin only)
Response: {"invalidated_keys": 7}
DELETE /api/v1/{tenant_id}/forecasting/cache
Description: Invalidate all forecast cache for tenant
Auth: Bearer token required (admin only)
Response: {"invalidated_keys": 123}
```
---
## Change Log
### Version 2.0 (2025-10-09) - Automated Scheduling
**Added:**
- ✨ ProductionSchedulerService for automated daily production planning
- ✨ Timezone configuration in Tenant model
- ✨ Forecast caching in Forecasting Service (service-level)
- ✨ Plan rejection workflow with auto-regeneration
- ✨ Comprehensive Prometheus metrics for monitoring
- ✨ TimezoneHelper utility for consistent timezone handling
**Changed:**
- 🔄 All schedulers now timezone-aware
- 🔄 Forecast service returns `cached: true` flag in metadata
- 🔄 Plan rejection triggers notifications and events
**Fixed:**
- 🐛 Duplicate forecast computations eliminated (50% reduction)
- 🐛 Timezone-related scheduling issues resolved
- 🐛 Rejected plans now have proper workflow handling
**Documentation:**
- 📚 Comprehensive production planning system documentation
- 📚 Runbooks for troubleshooting common issues
- 📚 Monitoring and alerting guidelines
### Version 1.0 (2025-10-07) - Initial Release
**Added:**
- ✨ ProcurementSchedulerService for automated procurement planning
- ✨ Daily, weekly, and cleanup jobs
- ✨ Leader election for distributed deployments
- ✨ Parallel tenant processing with timeouts
---
## Support & Contact
For issues or questions about the Production Planning System:
- **Documentation:** This file
- **Source Code:** `services/production/`, `services/orders/`
- **Issues:** GitHub Issues
- **Slack:** `#production-planning` channel
---
**Document Version:** 2.0
**Last Review Date:** 2025-10-09
**Next Review Date:** 2025-11-09

View File

@@ -1,172 +0,0 @@
⏺ Based on my analysis of your bakery project, here's a clear divide
and conquer strategy to transition from mock/demo data to real API
connections:
Current State Analysis
✅ What's Already Done:
- Complete backend microservices architecture (17+ services)
- Full API client setup with authentication & tenant management
- Comprehensive service layer (auth, inventory, sales, forecasting,
etc.)
- Real-time SSE & WebSocket infrastructure
🔍 What Needs Connection:
- Mock data in auth store (mockLogin, mockRefreshToken)
- Mock alerts in alerts store (mockAlerts, mockRules)
- Various hardcoded arrays across hooks and utilities
Divide & Conquer Strategy
Phase 1: Authentication Foundation (Week 1)
Priority: CRITICAL - Everything depends on this
// IMMEDIATE ACTION: Replace mock auth
1. Update auth.store.ts → Connect to real auth service
2. Replace mockLogin() with authService.login()
3. Replace mockRefreshToken() with authService.refreshToken()
4. Test tenant switching and permission system
Files to modify:
- frontend/src/stores/auth.store.ts:46-88 (replace mock functions)
- frontend/src/services/api/auth.service.ts (already done)
Phase 2: Core Operations (Week 2)
Priority: HIGH - Daily operations
// Connect inventory management first (most used)
1. Inventory Service → Real API calls
- Replace mock data in components
- Connect to /api/v1/inventory/* endpoints
2. Production Service → Real API calls
- Connect batch management
- Link to /api/v1/production/* endpoints
3. Sales Service → Real API calls
- Connect POS integration
- Link to /api/v1/sales/* endpoints
Files to modify:
- All inventory components using mock data
- Production scheduling hooks
- Sales tracking components
Phase 3: Analytics & Intelligence (Week 3)
Priority: MEDIUM - Business insights
// Connect AI-powered features
1. Forecasting Service → Real ML predictions
- Connect to /api/v1/forecasts/* endpoints
- Enable real demand forecasting
2. Training Service → Real model training
- Connect WebSocket training progress
- Enable /api/v1/training/* endpoints
3. Analytics Service → Real business data
- Connect charts and reports
- Enable trend analysis
Phase 4: Communication & Automation (Week 4)
Priority: LOW - Enhancements
// Replace mock alerts with real-time system
1. Alerts Store → Real SSE connection
- Connect to /api/v1/sse/alerts/stream/{tenant_id}
- Replace mockAlerts with live data
2. Notification Service → Real messaging
- WhatsApp & Email integration
- Connect to /api/v1/notifications/* endpoints
3. External Data → Live feeds
- Weather API (AEMET)
- Traffic patterns
- Market events
Specific Next Steps
STEP 1: Start with Authentication (Today)
// 1. Replace frontend/src/stores/auth.store.ts
// Remove lines 46-88 and replace with:
const performLogin = async (email: string, password: string) => {
const response = await authService.login({ email, password });
if (!response.success) throw new Error(response.error || 'Login
failed');
return response.data;
};
const performRefresh = async (refreshToken: string) => {
const response = await authService.refreshToken(refreshToken);
if (!response.success) throw new Error('Token refresh failed');
return response.data;
};
STEP 2: Connect Inventory (Next)
// 2. Update inventory components to use real API
// Replace mock data arrays with:
const { data: ingredients, isLoading } = useQuery({
queryKey: ['ingredients', tenantId],
queryFn: () => inventoryService.getIngredients(),
});
STEP 3: Enable Real-time Alerts (After)
// 3. Connect SSE for real alerts
// Replace alerts store mock data with:
useEffect(() => {
const eventSource = new
EventSource(`/api/v1/sse/alerts/stream/${tenantId}`);
eventSource.onmessage = (event) => {
const alert = JSON.parse(event.data);
createAlert(alert);
};
return () => eventSource.close();
}, [tenantId]);
Migration Checklist
Immediate (This Week)
- Replace auth mock functions with real API calls
- Test login/logout/refresh flows
- Verify tenant switching works
- Test permission-based UI rendering
Short-term (Next 2 Weeks)
- Connect inventory service to real API
- Enable production planning with real data
- Connect sales tracking to POS systems
- Test data consistency across services
Medium-term (Next Month)
- Enable ML forecasting with real models
- Connect real-time alert system
- Integrate external data sources (weather, traffic)
- Test full end-to-end workflows
Risk Mitigation
1. Gradual Migration: Keep mock fallbacks during transition
2. Environment Switching: Use env variables to toggle mock/real APIs
3. Error Handling: Robust error handling for API failures
4. Data Validation: Verify API responses match expected schemas
5. User Testing: Test with real bakery workflows
Ready to start? I recommend beginning with Step 1 (Authentication)
today. The infrastructure is already there - you just need to connect
the pipes!

View File

@@ -1,414 +0,0 @@
# Production Planning Scheduler - Quick Start Guide
**For Developers & DevOps**
---
## 🚀 5-Minute Setup
### Prerequisites
```bash
# Running services
- PostgreSQL (production, orders, tenant databases)
- Redis (for forecast caching)
- RabbitMQ (for events and leader election)
# Environment variables
PRODUCTION_DATABASE_URL=postgresql://...
ORDERS_DATABASE_URL=postgresql://...
TENANT_DATABASE_URL=postgresql://...
REDIS_URL=redis://localhost:6379/0
RABBITMQ_URL=amqp://guest:guest@localhost:5672/
```
### Run Migrations
```bash
# Add timezone to tenants table
cd services/tenant
alembic upgrade head
# Verify migration
psql $TENANT_DATABASE_URL -c "SELECT id, name, timezone FROM tenants LIMIT 5;"
```
### Start Services
```bash
# Terminal 1 - Production Service (with scheduler)
cd services/production
uvicorn app.main:app --reload --port 8001
# Terminal 2 - Orders Service (with scheduler)
cd services/orders
uvicorn app.main:app --reload --port 8002
# Terminal 3 - Forecasting Service (with caching)
cd services/forecasting
uvicorn app.main:app --reload --port 8003
```
### Test Schedulers
```bash
# Test production scheduler
curl -X POST http://localhost:8001/test/production-scheduler
# Expected output:
{
"message": "Production scheduler test triggered successfully"
}
# Test procurement scheduler
curl -X POST http://localhost:8002/test/procurement-scheduler
# Expected output:
{
"message": "Procurement scheduler test triggered successfully"
}
# Check logs
tail -f services/production/logs/production.log | grep "schedule"
tail -f services/orders/logs/orders.log | grep "plan"
```
---
## 📋 Configuration
### Enable Test Mode (Development)
```bash
# Run schedulers every 30 minutes instead of daily
export PRODUCTION_TEST_MODE=true
export PROCUREMENT_TEST_MODE=true
export DEBUG=true
```
### Configure Tenant Timezone
```sql
-- Update tenant timezone
UPDATE tenants SET timezone = 'America/New_York' WHERE id = '{tenant_id}';
-- Verify
SELECT id, name, timezone FROM tenants WHERE id = '{tenant_id}';
```
### Check Redis Cache
```bash
# Connect to Redis
redis-cli
# Check forecast cache keys
KEYS forecast:*
# Get cache stats
GET forecast:cache:stats
# Clear cache (if needed)
FLUSHDB
```
---
## 🔍 Monitoring
### View Metrics (Prometheus)
```bash
# Production scheduler metrics
curl http://localhost:8001/metrics | grep production_schedules
# Procurement scheduler metrics
curl http://localhost:8002/metrics | grep procurement_plans
# Forecast cache metrics
curl http://localhost:8003/metrics | grep forecast_cache
```
### Key Metrics to Watch
```promql
# Scheduler success rate (should be > 95%)
rate(production_schedules_generated_total{status="success"}[5m])
rate(procurement_plans_generated_total{status="success"}[5m])
# Cache hit rate (should be > 70%)
forecast_cache_hit_rate
# Generation time (should be < 60s)
histogram_quantile(0.95,
rate(production_schedule_generation_duration_seconds_bucket[5m]))
```
---
## 🐛 Debugging
### Check Scheduler Status
```python
# In Python shell
from app.services.production_scheduler_service import ProductionSchedulerService
from app.core.config import settings
scheduler = ProductionSchedulerService(settings)
await scheduler.start()
# Check configured jobs
jobs = scheduler.scheduler.get_jobs()
for job in jobs:
print(f"{job.name}: next run at {job.next_run_time}")
```
### View Scheduler Logs
```bash
# Production scheduler
kubectl logs -f deployment/production-service | grep -E "scheduler|schedule"
# Procurement scheduler
kubectl logs -f deployment/orders-service | grep -E "scheduler|plan"
# Look for these patterns:
# ✅ "Daily production planning completed"
# ✅ "Production schedule created successfully"
# ❌ "Error processing tenant production"
# ⚠️ "Tenant processing timed out"
```
### Test Timezone Handling
```python
from shared.utils.timezone_helper import TimezoneHelper
# Get current date in different timezones
madrid_date = TimezoneHelper.get_current_date_in_timezone("Europe/Madrid")
ny_date = TimezoneHelper.get_current_date_in_timezone("America/New_York")
tokyo_date = TimezoneHelper.get_current_date_in_timezone("Asia/Tokyo")
print(f"Madrid: {madrid_date}")
print(f"NY: {ny_date}")
print(f"Tokyo: {tokyo_date}")
# Check if business hours
is_business = TimezoneHelper.is_business_hours(
timezone_str="Europe/Madrid",
start_hour=8,
end_hour=20
)
print(f"Business hours: {is_business}")
```
### Test Forecast Cache
```python
from services.forecasting.app.services.forecast_cache import get_forecast_cache_service
from datetime import date
from uuid import UUID
cache = get_forecast_cache_service(redis_url="redis://localhost:6379/0")
# Check if available
print(f"Cache available: {cache.is_available()}")
# Get cache stats
stats = cache.get_cache_stats()
print(f"Cache stats: {stats}")
# Test cache operation
tenant_id = UUID("your-tenant-id")
product_id = UUID("your-product-id")
forecast_date = date.today()
# Try to get cached forecast
cached = await cache.get_cached_forecast(tenant_id, product_id, forecast_date)
print(f"Cached forecast: {cached}")
```
---
## 🧪 Testing
### Unit Tests
```bash
# Run scheduler tests
pytest services/production/tests/test_production_scheduler_service.py -v
pytest services/orders/tests/test_procurement_scheduler_service.py -v
# Run cache tests
pytest services/forecasting/tests/test_forecast_cache.py -v
# Run timezone tests
pytest shared/tests/test_timezone_helper.py -v
```
### Integration Tests
```bash
# Run full scheduler integration test
pytest tests/integration/test_scheduler_integration.py -v
# Run cache integration test
pytest tests/integration/test_cache_integration.py -v
# Run plan rejection workflow test
pytest tests/integration/test_plan_rejection_workflow.py -v
```
### Manual End-to-End Test
```bash
# 1. Clear existing schedules/plans
psql $PRODUCTION_DATABASE_URL -c "DELETE FROM production_schedules WHERE schedule_date = CURRENT_DATE;"
psql $ORDERS_DATABASE_URL -c "DELETE FROM procurement_plans WHERE plan_date = CURRENT_DATE;"
# 2. Trigger schedulers
curl -X POST http://localhost:8001/test/production-scheduler
curl -X POST http://localhost:8002/test/procurement-scheduler
# 3. Wait 30 seconds
# 4. Verify schedules/plans created
psql $PRODUCTION_DATABASE_URL -c "SELECT id, schedule_date, status FROM production_schedules WHERE schedule_date = CURRENT_DATE;"
psql $ORDERS_DATABASE_URL -c "SELECT id, plan_date, status FROM procurement_plans WHERE plan_date = CURRENT_DATE;"
# 5. Check cache hit rate
redis-cli GET forecast_cache_hits_total
redis-cli GET forecast_cache_misses_total
```
---
## 📚 Common Commands
### Scheduler Management
```bash
# Disable scheduler (maintenance mode)
kubectl set env deployment/production-service SCHEDULER_DISABLED=true
# Re-enable scheduler
kubectl set env deployment/production-service SCHEDULER_DISABLED-
# Check scheduler health
curl http://localhost:8001/health | jq .custom_checks.scheduler_service
# Manually trigger scheduler
curl -X POST http://localhost:8001/test/production-scheduler
```
### Cache Management
```bash
# View cache stats
curl http://localhost:8003/api/v1/{tenant_id}/forecasting/cache/stats | jq .
# Clear product cache
curl -X DELETE http://localhost:8003/api/v1/{tenant_id}/forecasting/cache/product/{product_id}
# Clear tenant cache
curl -X DELETE http://localhost:8003/api/v1/{tenant_id}/forecasting/cache
# View cache keys
redis-cli KEYS "forecast:*" | head -20
```
### Database Queries
```sql
-- Check production schedules
SELECT id, schedule_date, status, total_batches, auto_generated
FROM production_schedules
WHERE schedule_date >= CURRENT_DATE - INTERVAL '7 days'
ORDER BY schedule_date DESC;
-- Check procurement plans
SELECT id, plan_date, status, total_requirements, total_estimated_cost
FROM procurement_plans
WHERE plan_date >= CURRENT_DATE - INTERVAL '7 days'
ORDER BY plan_date DESC;
-- Check tenant timezones
SELECT id, name, timezone, city
FROM tenants
WHERE is_active = true
ORDER BY timezone;
-- Check plan approval workflow
SELECT id, plan_number, status, approval_workflow
FROM procurement_plans
WHERE status = 'cancelled'
ORDER BY created_at DESC
LIMIT 10;
```
---
## 🔧 Troubleshooting Quick Fixes
### Scheduler Not Running
```bash
# Check if service is running
ps aux | grep uvicorn
# Check if scheduler initialized
grep "scheduled jobs configured" logs/production.log
# Restart service
pkill -f "uvicorn app.main:app"
uvicorn app.main:app --reload
```
### Cache Not Working
```bash
# Check Redis connection
redis-cli ping # Should return PONG
# Check Redis keys
redis-cli DBSIZE # Should have keys
# Restart Redis (if needed)
redis-cli SHUTDOWN
redis-server --daemonize yes
```
### Wrong Timezone
```bash
# Check server timezone (should be UTC)
date
# Check tenant timezone
psql $TENANT_DATABASE_URL -c \
"SELECT timezone FROM tenants WHERE id = '{tenant_id}';"
# Update if wrong
psql $TENANT_DATABASE_URL -c \
"UPDATE tenants SET timezone = 'Europe/Madrid' WHERE id = '{tenant_id}';"
```
---
## 📖 Additional Resources
- **Full Documentation:** [PRODUCTION_PLANNING_SYSTEM.md](./PRODUCTION_PLANNING_SYSTEM.md)
- **Operational Runbook:** [SCHEDULER_RUNBOOK.md](./SCHEDULER_RUNBOOK.md)
- **Implementation Summary:** [IMPLEMENTATION_SUMMARY.md](./IMPLEMENTATION_SUMMARY.md)
- **Code:**
- Production Scheduler: [`services/production/app/services/production_scheduler_service.py`](../services/production/app/services/production_scheduler_service.py)
- Procurement Scheduler: [`services/orders/app/services/procurement_scheduler_service.py`](../services/orders/app/services/procurement_scheduler_service.py)
- Forecast Cache: [`services/forecasting/app/services/forecast_cache.py`](../services/forecasting/app/services/forecast_cache.py)
- Timezone Helper: [`shared/utils/timezone_helper.py`](../shared/utils/timezone_helper.py)
---
**Version:** 1.0
**Last Updated:** 2025-10-09
**Maintained By:** Backend Team

View File

@@ -1,530 +0,0 @@
# Production Planning Scheduler Runbook
**Quick Reference Guide for DevOps & Support Teams**
---
## Quick Links
- [Full Documentation](./PRODUCTION_PLANNING_SYSTEM.md)
- [Metrics Dashboard](http://grafana:3000/d/production-planning)
- [Logs](http://kibana:5601)
- [Alerts](http://alertmanager:9093)
---
## Emergency Contacts
| Role | Contact | Availability |
|------|---------|--------------|
| **Backend Lead** | #backend-team | 24/7 |
| **DevOps On-Call** | #devops-oncall | 24/7 |
| **Product Owner** | TBD | Business hours |
---
## Scheduler Overview
| Scheduler | Time | What It Does |
|-----------|------|--------------|
| **Production** | 5:30 AM (tenant timezone) | Creates daily production schedules |
| **Procurement** | 6:00 AM (tenant timezone) | Creates daily procurement plans |
**Critical:** Both schedulers MUST complete successfully every morning, or bakeries won't have production/procurement plans for the day!
---
## Common Incidents & Solutions
### 🔴 CRITICAL: Scheduler Completely Failed
**Alert:** `SchedulerUnhealthy` or `NoProductionSchedulesGenerated`
**Impact:** HIGH - No plans generated for any tenant
**Immediate Actions (< 5 minutes):**
```bash
# 1. Check if service is running
kubectl get pods -n production | grep production-service
kubectl get pods -n orders | grep orders-service
# 2. Check recent logs for errors
kubectl logs -n production deployment/production-service --tail=100 | grep ERROR
kubectl logs -n orders deployment/orders-service --tail=100 | grep ERROR
# 3. Restart service if frozen/crashed
kubectl rollout restart deployment/production-service -n production
kubectl rollout restart deployment/orders-service -n orders
# 4. Wait 2 minutes for scheduler to initialize, then manually trigger
curl -X POST http://production-service:8000/test/production-scheduler \
-H "Authorization: Bearer $ADMIN_TOKEN"
curl -X POST http://orders-service:8000/test/procurement-scheduler \
-H "Authorization: Bearer $ADMIN_TOKEN"
```
**Follow-up Actions:**
- Check RabbitMQ health (leader election depends on it)
- Review database connectivity
- Check resource limits (CPU/memory)
- Monitor metrics for successful generation
---
### 🟠 HIGH: Single Tenant Failed
**Alert:** `DailyProductionPlanningFailed{tenant_id="abc-123"}`
**Impact:** MEDIUM - One bakery affected
**Immediate Actions (< 10 minutes):**
```bash
# 1. Check logs for specific tenant
kubectl logs -n production deployment/production-service --tail=500 | \
grep "tenant_id=abc-123" | grep ERROR
# 2. Common causes:
# - Tenant database connection issue
# - External service timeout (Forecasting, Inventory)
# - Invalid data (e.g., missing products)
# 3. Manually retry for this tenant
curl -X POST http://production-service:8000/test/production-scheduler \
-H "Authorization: Bearer $ADMIN_TOKEN"
# (Scheduler will skip tenants that already have schedules)
# 4. If still failing, check tenant-specific issues:
# - Verify tenant exists and is active
# - Check tenant's inventory has products
# - Check forecasting service can access tenant data
```
**Follow-up Actions:**
- Contact tenant to understand their setup
- Review tenant data quality
- Check if tenant is new (may need initial setup)
---
### 🟡 MEDIUM: Scheduler Running Slow
**Alert:** `production_schedule_generation_duration_seconds > 120s`
**Impact:** LOW - Scheduler completes but takes longer than expected
**Immediate Actions (< 15 minutes):**
```bash
# 1. Check current execution time
kubectl logs -n production deployment/production-service --tail=100 | \
grep "production planning completed"
# 2. Check database query performance
# Look for slow query logs in PostgreSQL
# 3. Check external service response times
# - Forecasting Service health: curl http://forecasting-service:8000/health
# - Inventory Service health: curl http://inventory-service:8000/health
# - Orders Service health: curl http://orders-service:8000/health
# 4. Check CPU/memory usage
kubectl top pods -n production | grep production-service
kubectl top pods -n orders | grep orders-service
```
**Follow-up Actions:**
- Consider increasing timeout if consistently near limit
- Optimize slow database queries
- Scale external services if overloaded
- Review tenant count (may need to process fewer in parallel)
---
### 🟡 MEDIUM: Low Forecast Cache Hit Rate
**Alert:** `ForecastCacheHitRateLow < 50%`
**Impact:** LOW - Increased load on Forecasting Service, slower responses
**Immediate Actions (< 10 minutes):**
```bash
# 1. Check Redis is running
kubectl get pods -n redis | grep redis
redis-cli ping # Should return PONG
# 2. Check cache statistics
curl http://forecasting-service:8000/api/v1/{tenant_id}/forecasting/cache/stats \
-H "Authorization: Bearer $ADMIN_TOKEN"
# 3. Check cache keys
redis-cli KEYS "forecast:*" | wc -l # Should have many entries
# 4. Check Redis memory
redis-cli INFO memory | grep used_memory_human
# 5. If cache is empty or Redis is down, restart Redis
kubectl rollout restart statefulset/redis -n redis
```
**Follow-up Actions:**
- Monitor cache rebuild (should hit ~80-90% within 1 day)
- Check Redis configuration (memory limits, eviction policy)
- Review forecast TTL settings
- Check for cache invalidation bugs
---
### 🟢 LOW: Plan Rejected by User
**Alert:** `procurement_plan_rejections_total` increasing
**Impact:** LOW - Normal user workflow
**Actions (< 5 minutes):**
```bash
# 1. Check rejection logs for patterns
kubectl logs -n orders deployment/orders-service --tail=200 | \
grep "plan rejection"
# 2. Check if auto-regeneration triggered
kubectl logs -n orders deployment/orders-service --tail=200 | \
grep "Auto-regenerating plan"
# 3. Verify rejection notification sent
# Check RabbitMQ queue: procurement.plan.rejected
# 4. If rejection notes mention "stale" or "outdated", plan will auto-regenerate
# Otherwise, user needs to manually regenerate or modify plan
```
**Follow-up Actions:**
- Review rejection reasons for trends
- Consider user training if many rejections
- Improve plan accuracy if consistent issues
---
## Health Check Commands
### Quick Service Health Check
```bash
# Production Service
curl http://production-service:8000/health | jq .
# Orders Service
curl http://orders-service:8000/health | jq .
# Forecasting Service
curl http://forecasting-service:8000/health | jq .
# Redis
redis-cli ping
# RabbitMQ
curl http://rabbitmq:15672/api/health/checks/alarms \
-u guest:guest | jq .
```
### Detailed Scheduler Status
```bash
# Check last scheduler run time
curl http://production-service:8000/health | \
jq '.custom_checks.scheduler_service'
# Check APScheduler job status (requires internal access)
# Look for: scheduler.get_jobs() output in logs
kubectl logs -n production deployment/production-service | \
grep "scheduled jobs configured"
```
### Database Connectivity
```bash
# Check production database
kubectl exec -it deployment/production-service -n production -- \
python -c "from app.core.database import database_manager; \
import asyncio; \
asyncio.run(database_manager.health_check())"
# Check orders database
kubectl exec -it deployment/orders-service -n orders -- \
python -c "from app.core.database import database_manager; \
import asyncio; \
asyncio.run(database_manager.health_check())"
```
---
## Maintenance Procedures
### Disable Schedulers (Maintenance Mode)
```bash
# 1. Set environment variable to disable schedulers
kubectl set env deployment/production-service SCHEDULER_DISABLED=true -n production
kubectl set env deployment/orders-service SCHEDULER_DISABLED=true -n orders
# 2. Wait for pods to restart
kubectl rollout status deployment/production-service -n production
kubectl rollout status deployment/orders-service -n orders
# 3. Verify schedulers are disabled (check logs)
kubectl logs -n production deployment/production-service | grep "Scheduler disabled"
```
### Re-enable Schedulers (After Maintenance)
```bash
# 1. Remove environment variable
kubectl set env deployment/production-service SCHEDULER_DISABLED- -n production
kubectl set env deployment/orders-service SCHEDULER_DISABLED- -n orders
# 2. Wait for pods to restart
kubectl rollout status deployment/production-service -n production
kubectl rollout status deployment/orders-service -n orders
# 3. Manually trigger to catch up (if during scheduled time)
curl -X POST http://production-service:8000/test/production-scheduler \
-H "Authorization: Bearer $ADMIN_TOKEN"
curl -X POST http://orders-service:8000/test/procurement-scheduler \
-H "Authorization: Bearer $ADMIN_TOKEN"
```
### Clear Forecast Cache
```bash
# Clear all forecast cache (will rebuild automatically)
redis-cli KEYS "forecast:*" | xargs redis-cli DEL
# Clear specific tenant's cache
redis-cli KEYS "forecast:{tenant_id}:*" | xargs redis-cli DEL
# Verify cache cleared
redis-cli DBSIZE
```
---
## Metrics to Monitor
### Production Scheduler
```promql
# Success rate (should be > 95%)
rate(production_schedules_generated_total{status="success"}[5m]) /
rate(production_schedules_generated_total[5m])
# Average generation time (should be < 60s)
histogram_quantile(0.95,
rate(production_schedule_generation_duration_seconds_bucket[5m]))
# Failed tenants (should be 0)
increase(production_tenants_processed_total{status="failure"}[5m])
```
### Procurement Scheduler
```promql
# Success rate (should be > 95%)
rate(procurement_plans_generated_total{status="success"}[5m]) /
rate(procurement_plans_generated_total[5m])
# Average generation time (should be < 60s)
histogram_quantile(0.95,
rate(procurement_plan_generation_duration_seconds_bucket[5m]))
# Failed tenants (should be 0)
increase(procurement_tenants_processed_total{status="failure"}[5m])
```
### Forecast Cache
```promql
# Cache hit rate (should be > 70%)
forecast_cache_hit_rate
# Cache hits per minute
rate(forecast_cache_hits_total[5m])
# Cache misses per minute
rate(forecast_cache_misses_total[5m])
```
---
## Log Patterns to Watch
### Success Patterns
```
✅ "Daily production planning completed" - All tenants processed
✅ "Production schedule created successfully" - Individual tenant success
✅ "Forecast cache HIT" - Cache working correctly
✅ "Production scheduler service started" - Service initialized
```
### Warning Patterns
```
⚠️ "Tenant processing timed out" - Individual tenant taking too long
⚠️ "Forecast cache MISS" - Cache miss (expected some, but not all)
⚠️ "Approving plan older than 24 hours" - Stale plan being approved
⚠️ "Could not fetch tenant timezone" - Timezone configuration issue
```
### Error Patterns
```
❌ "Daily production planning failed completely" - Complete failure
❌ "Error processing tenant production" - Tenant-specific failure
❌ "Forecast cache Redis connection failed" - Cache unavailable
❌ "Migration version mismatch" - Database migration issue
❌ "Failed to publish event" - RabbitMQ connectivity issue
```
---
## Escalation Procedure
### Level 1: DevOps On-Call (0-30 minutes)
- Check service health
- Review logs for obvious errors
- Restart services if crashed
- Manually trigger schedulers if needed
- Monitor for resolution
### Level 2: Backend Team (30-60 minutes)
- Investigate complex errors
- Check database issues
- Review scheduler logic
- Coordinate with other teams (if external service issue)
### Level 3: Engineering Lead (> 60 minutes)
- Major architectural issues
- Database corruption or loss
- Multi-service cascading failures
- Decisions on emergency fixes vs. scheduled maintenance
---
## Testing After Deployment
### Post-Deployment Checklist
```bash
# 1. Verify services are running
kubectl get pods -n production
kubectl get pods -n orders
# 2. Check health endpoints
curl http://production-service:8000/health
curl http://orders-service:8000/health
# 3. Verify schedulers are configured
kubectl logs -n production deployment/production-service | \
grep "scheduled jobs configured"
# 4. Manually trigger test run
curl -X POST http://production-service:8000/test/production-scheduler \
-H "Authorization: Bearer $ADMIN_TOKEN"
curl -X POST http://orders-service:8000/test/procurement-scheduler \
-H "Authorization: Bearer $ADMIN_TOKEN"
# 5. Verify test run completed successfully
kubectl logs -n production deployment/production-service | \
grep "Production schedule created successfully"
kubectl logs -n orders deployment/orders-service | \
grep "Procurement plan generated successfully"
# 6. Check metrics dashboard
# Visit: http://grafana:3000/d/production-planning
```
---
## Known Issues & Workarounds
### Issue: Scheduler runs twice in distributed setup
**Symptom:** Duplicate schedules/plans for same tenant and date
**Cause:** Leader election not working (RabbitMQ connection issue)
**Workaround:**
```bash
# Temporarily scale to single instance
kubectl scale deployment/production-service --replicas=1 -n production
kubectl scale deployment/orders-service --replicas=1 -n orders
# Fix RabbitMQ connectivity
# Then scale back up
kubectl scale deployment/production-service --replicas=3 -n production
kubectl scale deployment/orders-service --replicas=3 -n orders
```
### Issue: Timezone shows wrong time
**Symptom:** Schedules generated at wrong hour
**Cause:** Tenant timezone not configured or incorrect
**Workaround:**
```sql
-- Check tenant timezone
SELECT id, name, timezone FROM tenants WHERE id = '{tenant_id}';
-- Update if incorrect
UPDATE tenants SET timezone = 'Europe/Madrid' WHERE id = '{tenant_id}';
-- Verify server uses UTC
-- In container: date (should show UTC)
```
### Issue: Forecast cache always misses
**Symptom:** `forecast_cache_hit_rate = 0%`
**Cause:** Redis not accessible or REDIS_URL misconfigured
**Workaround:**
```bash
# Check REDIS_URL environment variable
kubectl get deployment forecasting-service -n forecasting -o yaml | \
grep REDIS_URL
# Should be: redis://redis:6379/0
# If incorrect, update:
kubectl set env deployment/forecasting-service \
REDIS_URL=redis://redis:6379/0 -n forecasting
```
---
## Additional Resources
- **Full Documentation:** [PRODUCTION_PLANNING_SYSTEM.md](./PRODUCTION_PLANNING_SYSTEM.md)
- **Metrics File:** [`shared/monitoring/scheduler_metrics.py`](../shared/monitoring/scheduler_metrics.py)
- **Scheduler Code:**
- Production: [`services/production/app/services/production_scheduler_service.py`](../services/production/app/services/production_scheduler_service.py)
- Procurement: [`services/orders/app/services/procurement_scheduler_service.py`](../services/orders/app/services/procurement_scheduler_service.py)
- **Forecast Cache:** [`services/forecasting/app/services/forecast_cache.py`](../services/forecasting/app/services/forecast_cache.py)
---
**Runbook Version:** 1.0
**Last Updated:** 2025-10-09
**Maintained By:** Backend Team