bakery-ia

Author	SHA1	Message	Date
Urtzi Alfaro	e8fda39e50	Improve metrics	2026-01-08 20:48:24 +01:00
Urtzi Alfaro	29d19087f1	Update monitoring packages to latest versions - Updated all OpenTelemetry packages to latest versions: - opentelemetry-api: 1.27.0 → 1.39.1 - opentelemetry-sdk: 1.27.0 → 1.39.1 - opentelemetry-exporter-otlp-proto-grpc: 1.27.0 → 1.39.1 - opentelemetry-exporter-otlp-proto-http: 1.27.0 → 1.39.1 - opentelemetry-instrumentation-fastapi: 0.48b0 → 0.60b1 - opentelemetry-instrumentation-httpx: 0.48b0 → 0.60b1 - opentelemetry-instrumentation-redis: 0.48b0 → 0.60b1 - opentelemetry-instrumentation-sqlalchemy: 0.48b0 → 0.60b1 - Removed prometheus-client==0.23.1 from all services - Unified all services to use the same monitoring package versions Generated by Mistral Vibe. Co-Authored-By: Mistral Vibe <vibe@mistral.ai>	2026-01-08 19:25:52 +01:00
Urtzi Alfaro	c07df124fb	Improve UI	2025-12-30 14:40:20 +01:00
Urtzi Alfaro	02f0c91a15	Fix UI issues	2025-12-29 19:33:35 +01:00
Urtzi Alfaro	ff830a3415	demo seed change	2025-12-13 23:57:54 +01:00
Urtzi Alfaro	667e6e0404	New alert service	2025-12-05 20:07:01 +01:00
Urtzi Alfaro	972db02f6d	New enterprise feature	2025-11-30 09:12:40 +01:00
Urtzi Alfaro	843cd2bf5c	Improve the UI and training	2025-11-15 15:20:10 +01:00
Urtzi Alfaro	c349b845a6	Bug fixes of training	2025-11-14 20:27:39 +01:00
Urtzi Alfaro	a8d8828935	Improve features	2025-11-14 07:23:56 +01:00
Urtzi Alfaro	5783c7ed05	Add POI feature and improve the overall backend implementation	2025-11-12 15:34:10 +01:00
Urtzi Alfaro	3007bde05b	Improve kubernetes for prod	2025-11-06 11:04:50 +01:00
Urtzi Alfaro	3ad093d38b	Fix orchestrator issues	2025-11-05 22:54:14 +01:00
Claude	0c4a941ca8	Fix critical double commit bug and Spanish holidays warning CRITICAL FIX - Database Transaction: - Removed duplicate commit logic from _store_in_db() inner function (lines 805-811) - Prevents 'Method commit() can't be called here' error - Now only outer scope handles commits (line 821 for new sessions, parent for parent sessions) - Fixes issue where all 5 models trained successfully but failed to store in DB MINOR FIX - Logging: - Fixed Spanish holidays logger call (line 977-978) - Removed invalid keyword arguments (region=, years=) from logger.info() - Now uses f-string format consistent with rest of codebase - Prevents 'Logger._log() got an unexpected keyword argument' warning Impact: - Training pipeline can now complete successfully - Models will be stored in database after training - No more cascading transaction failures - Cleaner logs without warnings Root cause: Double commit introduced during recent session management fixes Related commits: `673108e`, `74215d3`, `fd0a96e`, `b2de56e`, `e585e9f`	2025-11-05 18:34:00 +00:00
Urtzi Alfaro	673108e3c0	Fix deadlock issues in training 2	2025-11-05 19:26:52 +01:00
Urtzi Alfaro	74215d3e85	Fix deadlock issues in training	2025-11-05 18:47:20 +01:00
Urtzi Alfaro	fd0a96e254	Fix remaining nested session issues in training pipeline Issues Fixed: 4️⃣ data_processor.py (Line 230-232): - Second update_log_progress call without commit after data preparation - Added commit() after completion update to prevent deadlock - Added debug logging for visibility 5️⃣ prophet_manager.py _store_model (Line 750): - Created TRIPLE nested session (training_service → trainer → lock → _store_model) - Refactored _store_model to accept optional session parameter - Uses parent session from lock context instead of creating new one - Updated call site to pass db_session parameter Complete Session Hierarchy After All Fixes: training_service.py (session) └─ commit() ← FIX #2 (`e585e9f`) └─ trainer.py (new session) ✅ OK └─ data_processor.py (new session) └─ commit() after first update ← FIX #3 (`b2de56e`) └─ commit() after second update ← FIX #4 (THIS) └─ prophet_manager.train_bakery_model (uses parent or new session) ← FIX #1 (`caff497`) └─ lock.acquire(session) └─ _store_model(session=parent) ← FIX #5 (THIS) └─ NO NESTED SESSION ✅ All nested session deadlocks in training path are now resolved. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-05 16:41:53 +01:00
Urtzi Alfaro	b2de56ead3	Fix additional nested session deadlock in data_processor.py Root Cause: After fixing the training_service.py deadlock, training progressed to data preparation but got stuck there. The data_processor.py creates another nested session at line 143, updates training_log without committing, causing another deadlock scenario. Session Hierarchy: 1. training_service.py: outer session (fixed in `e585e9f`) 2. trainer.py: creates own session (passes deadlock due to commit) 3. data_processor.py: creates ANOTHER nested session (THIS FIX) Fix: Added explicit db_session.commit() after progress update in data_processor (line 153) to ensure the UPDATE is committed before continuing with data processing operations that may interact with other sessions. This completes the chain of nested session fixes: - `caff497`: prophet_manager + hybrid_trainer session passing - `e585e9f`: training_service commit before trainer call - THIS: data_processor commit after progress update 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-05 16:39:05 +01:00
Urtzi Alfaro	e585e9fac0	Fix critical nested session deadlock in training_service.py Root Cause (Actual): The actual nested session issue was in training_service.py, not just in the trainer methods. The flow was: 1. training_service.py creates outer session (line 173) 2. Updates training_log at line 235-237 (uncommitted) 3. Calls trainer.train_tenant_models() at line 239 4. Trainer creates its own session at line 93 5. DEADLOCK: Outer session has uncommitted UPDATE, inner session can't proceed Fix: Added explicit session.commit() after the ml_training progress update (line 241) to ensure the UPDATE is committed before trainer creates its own session. This prevents the deadlock condition. Related to previous commit `caff497` which fixed nested sessions in prophet_manager and hybrid_trainer, but missed the actual root cause in training_service.py. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-05 16:30:15 +01:00
Urtzi Alfaro	caff49761d	Fix training hang caused by nested database sessions and deadlocks Root Cause: The training process was hanging at the first progress update due to a nested database session issue. The main trainer created a session and repositories, then called prophet_manager.train_bakery_model() which created another nested session with an advisory lock. This caused a deadlock where: 1. Outer session had uncommitted UPDATE on model_training_logs 2. Inner session tried to acquire advisory lock 3. Neither could proceed, causing training to hang indefinitely Changes Made: 1. prophet_manager.py: - Added optional 'session' parameter to train_bakery_model() - Refactored to use parent session if provided, otherwise create new one - Prevents nested session creation during training 2. hybrid_trainer.py: - Added optional 'session' parameter to train_hybrid_model() - Passes session to prophet_manager to maintain single session context 3. trainer.py: - Updated _train_single_product() to accept and pass session - Updated _train_all_models_enhanced() to accept and pass session - Pass db_session from main training context to all training methods - Added explicit db_session.flush() after critical progress update - This ensures updates are visible before acquiring locks Impact: - Eliminates nested session deadlocks - Training now proceeds past initial progress update - Maintains single database session context throughout training - Prevents database transaction conflicts Related Issues: - Fixes training hang during onboarding process - Not directly related to audit_metadata changes but exposed by them 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-05 16:13:32 +01:00
Claude	c64585af57	Fix training hang by wrapping blocking ML operations in thread pool Root Cause: Training process was stuck at 40% because blocking synchronous ML operations (model.fit(), model.predict(), study.optimize()) were freezing the asyncio event loop, preventing RabbitMQ heartbeats, WebSocket communication, and progress updates. Changes: 1. prophet_manager.py: - Wrapped model.fit() at line 189 with asyncio.to_thread() - Wrapped study.optimize() at line 453 with asyncio.to_thread() 2. hybrid_trainer.py: - Made _train_xgboost() async and wrapped model.fit() with asyncio.to_thread() - Made _evaluate_hybrid_model() async and wrapped predict() calls - Fixed predict() method to wrap blocking predict() calls Impact: - Event loop no longer blocks during ML training - RabbitMQ heartbeats continue during training - WebSocket progress updates work correctly - Training can now complete successfully Fixes: Training hang at 40% during onboarding phase	2025-11-05 14:34:53 +00:00
Claude	136761af19	Fix AuditLogger.log_event() parameter name: metadata -> audit_metadata	2025-11-05 14:17:39 +00:00
Claude	8df90338b2	Fix training log race conditions and audit event error Critical fixes for training session logging: 1. Training log race condition fix: - Add explicit session commits after creating training logs - Handle duplicate key errors gracefully when multiple sessions try to create the same log simultaneously - Implement retry logic to query for existing logs after duplicate key violations - Prevents "Training log not found" errors during training 2. Audit event async generator error fix: - Replace incorrect next(get_db()) usage with proper async context manager (database_manager.get_session()) - Fixes "'async_generator' object is not an iterator" error - Ensures audit logging works correctly These changes address race conditions in concurrent database sessions and ensure training logs are properly synchronized across the training pipeline.	2025-11-05 13:24:22 +00:00
Claude	5a84be83d6	Fix multiple critical bugs in onboarding training step This commit addresses all identified bugs and issues in the training code path: ## Critical Fixes: - Add get_start_time() method to TrainingLogRepository and fix non-existent method call - Remove duplicate training.started event from API endpoint (trainer publishes the accurate one) - Add missing progress events for 80-100% range (85%, 92%, 94%) to eliminate progress "dead zone" ## High Priority Fixes: - Fix division by zero risk in time estimation with double-check and max() safety - Remove unreachable exception handler in training_operations.py - Simplify WebSocket token refresh logic to only reconnect on actual user session changes ## Medium Priority Fixes: - Fix auto-start training effect with useRef to prevent duplicate starts - Add HTTP polling debounce delay (5s) to prevent race conditions with WebSocket - Extract all magic numbers to centralized constants files: - Backend: services/training/app/core/training_constants.py - Frontend: frontend/src/constants/training.ts - Standardize error logging with exc_info=True on critical errors ## Code Quality Improvements: - All progress percentages now use named constants - All timeouts and intervals now use named constants - Improved code maintainability and readability - Better separation of concerns ## Files Changed: - Backend: training_service.py, trainer.py, training_events.py, progress_tracker.py - Backend: training_operations.py, training_log_repository.py, training_constants.py (new) - Frontend: training.ts (hooks), MLTrainingStep.tsx, training.ts (constants, new) All training progress events now properly flow from 0% to 100% with no gaps.	2025-11-05 13:02:39 +00:00
Claude	799e7dbaeb	Fix training job concurrent database session conflicts Root Cause: - Multiple parallel training tasks (3 at a time) were sharing the same database session - This caused SQLAlchemy session state conflicts: "Session is already flushing" and "rollback() is already in progress" - Additionally, duplicate model records were being created by both trainer and training_service Fixes: 1. Separated model training from database writes: - Training happens in parallel (CPU-intensive) - Database writes happen sequentially after training completes - This eliminates concurrent session access 2. Removed duplicate database writes: - Trainer now writes all model records sequentially after parallel training - Training service now retrieves models instead of creating duplicates - Performance metrics are also created by trainer (no duplicates) 3. Added proper data flow: - _train_single_product: Only trains models, stores results - _write_training_results_to_database: Sequential DB writes after training - _store_trained_models: Changed to retrieve existing models - _create_performance_metrics: Changed to verify existing metrics Benefits: - Eliminates database session conflicts - Prevents duplicate model records - Maintains parallel training performance - Ensures data consistency Files Modified: - services/training/app/ml/trainer.py - services/training/app/services/training_service.py Resolves: Onboarding training job database session conflicts	2025-11-05 12:41:42 +00:00
Urtzi Alfaro	394ad3aea4	Improve AI logic	2025-11-05 13:34:56 +01:00
Urtzi Alfaro	5adb0e39c0	Improve the frontend 5	2025-11-02 20:24:44 +01:00
Urtzi Alfaro	269d3b5032	Add user delete process	2025-10-31 11:54:19 +01:00
Urtzi Alfaro	36217a2729	Improve the frontend 2	2025-10-29 06:58:05 +01:00
Urtzi Alfaro	858d985c92	Improve the frontend modals	2025-10-27 16:33:26 +01:00
Urtzi Alfaro	8d30172483	Improve the frontend	2025-10-21 19:50:07 +02:00
Urtzi Alfaro	05da20357d	Improve the security of the DB	2025-10-19 19:22:37 +02:00
Urtzi Alfaro	312e36c893	Update requirements and infra versions	2025-10-17 23:09:40 +02:00
Urtzi Alfaro	dbb48d8e2c	Improve the sales import	2025-10-15 21:09:42 +02:00
Urtzi Alfaro	8f9e9a7edc	Add role-based filtering and improve code	2025-10-15 16:12:49 +02:00
Urtzi Alfaro	96ad5c6692	Refactor datetime and timezone utils	2025-10-12 23:16:04 +02:00
Urtzi Alfaro	7556a00db7	Improve the demo feature of the project	2025-10-12 18:47:33 +02:00
Urtzi Alfaro	dbc7f2fa0d	Re-create migrations init tables	2025-10-09 20:47:31 +02:00
Urtzi Alfaro	3c689b4f98	REFACTOR external service and improve websocket training	2025-10-09 14:11:02 +02:00
Urtzi Alfaro	7c72f83c51	REFACTOR ALL APIs fix 1	2025-10-07 07:15:07 +02:00
Urtzi Alfaro	38fb98bc27	REFACTOR ALL APIs	2025-10-06 15:27:01 +02:00
Urtzi Alfaro	0fdc3b0211	Fix issues	2025-10-01 16:25:53 +02:00
Urtzi Alfaro	2eeebfc1e0	Fix Alembic issue	2025-10-01 11:24:06 +02:00
Urtzi Alfaro	7cc4b957a5	Fix DB issues 2	2025-09-30 21:58:10 +02:00
Urtzi Alfaro	147893015e	Fix DB issues	2025-09-30 13:32:51 +02:00
Urtzi Alfaro	ec6bcb4c7d	Add migration services	2025-09-30 08:12:45 +02:00
Urtzi Alfaro	2712a60a2a	Refactor services alembic	2025-09-29 19:16:34 +02:00
Urtzi Alfaro	befcc126b0	Refactor all main.py	2025-09-29 13:13:12 +02:00
Urtzi Alfaro	4777e59e7a	Add base kubernetes support final fix 4	2025-09-29 07:54:25 +02:00
Urtzi Alfaro	57f77638cc	Add base kubernetes support final fix 2	2025-09-28 19:48:05 +02:00
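The nested-session deadlock chain fixed in commits `caff497`, `e585e9f`, `b2de56e` and `fd0a96e` can be illustrated with a small sketch. This is not the training service's code: SQLite stands in for the real database and SQLAlchemy sessions, SQLite's single-writer lock stands in for the row and advisory locks, and `store_model`, `demo` and the table names are hypothetical. With `timeout=0.1` the blocked write fails with `OperationalError` instead of hanging indefinitely as the real pipeline did.

```python
import os
import sqlite3
import tempfile
from typing import Optional, Tuple


def store_model(db_path: str, conn: Optional[sqlite3.Connection] = None) -> None:
    """Hypothetical stand-in for _store_model: reuse the caller's connection
    if one is passed, otherwise open, commit and close a private one."""
    own = conn is None
    c = sqlite3.connect(db_path, timeout=0.1) if own else conn
    try:
        c.execute("INSERT INTO models (name) VALUES ('prophet')")
        if own:
            c.commit()
    finally:
        if own:
            c.close()


def demo() -> Tuple[bool, int]:
    """Reproduce the blocked nested write, then apply both fixes."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        outer = sqlite3.connect(path, timeout=0.1)
        outer.execute("CREATE TABLE training_logs (id INTEGER PRIMARY KEY, progress INTEGER)")
        outer.execute("CREATE TABLE models (id INTEGER PRIMARY KEY, name TEXT)")
        outer.execute("INSERT INTO training_logs (progress) VALUES (0)")
        outer.commit()

        # Bug: the outer connection holds an uncommitted progress UPDATE...
        outer.execute("UPDATE training_logs SET progress = 40")
        blocked = False
        try:
            store_model(path)  # ...so a nested, independent connection cannot write.
        except sqlite3.OperationalError:  # "database is locked" after the timeout
            blocked = True

        # Fix in the style of e585e9f / b2de56e: commit before the nested writer runs.
        outer.commit()
        store_model(path)

        # Fix in the style of caff497 / fd0a96e: pass the parent connection down.
        outer.execute("UPDATE training_logs SET progress = 80")
        store_model(path, conn=outer)
        outer.commit()

        count = outer.execute("SELECT COUNT(*) FROM models").fetchone()[0]
        outer.close()
        return blocked, count
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` returns whether the nested write was blocked and how many model rows were stored once both fixes are applied.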


138 Commits
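The thread-pool fix in commit `c64585af57` follows the standard `asyncio.to_thread()` pattern (Python 3.9+). A minimal sketch under assumptions: `blocking_fit` is a hypothetical stand-in for `model.fit()`, and `heartbeat` is a hypothetical stand-in for the RabbitMQ heartbeats and progress updates that must keep running while training.

```python
import asyncio
import time
from typing import List, Tuple


def blocking_fit(n_iterations: int) -> float:
    """Hypothetical stand-in for a synchronous call such as model.fit();
    it sleeps to simulate work that would otherwise hold the event loop."""
    time.sleep(0.2)
    return float(n_iterations)


async def heartbeat(stop: asyncio.Event, ticks: List[float]) -> None:
    """Stand-in for heartbeats / progress updates: records a timestamp
    roughly every 20 ms until told to stop."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def train_model() -> Tuple[float, int]:
    stop = asyncio.Event()
    ticks: List[float] = []
    hb = asyncio.create_task(heartbeat(stop, ticks))
    # Run the blocking call in a worker thread; the event loop keeps
    # scheduling the heartbeat task while we await the result.
    result = await asyncio.to_thread(blocking_fit, 10)
    stop.set()
    await hb
    return result, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(train_model()))
```

If `blocking_fit(10)` were called directly inside `train_model()` instead of through `asyncio.to_thread()`, the heartbeat task would not get to run until the call returned, which is the event-loop starvation the commit describes.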