Commit Graph

12 Commits

Author SHA1 Message Date
Urtzi Alfaro
caff49761d Fix training hang caused by nested database sessions and deadlocks
Root Cause:
The training process was hanging at the first progress update due to a
nested database session issue. The main trainer created a session and
repositories, then called prophet_manager.train_bakery_model(), which
created a second, nested session that acquired an advisory lock. This caused
a deadlock (sketched below) where:
1. Outer session had uncommitted UPDATE on model_training_logs
2. Inner session tried to acquire advisory lock
3. Neither could proceed, causing training to hang indefinitely
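
A condensed sketch of that shape, with the two sessions made explicit (the SQL, the lock key, and the helper name are illustrative assumptions, not the project's actual code):

```python
from sqlalchemy import text
from sqlalchemy.orm import Session, sessionmaker

def train_with_nested_session(outer: Session, product_id: int) -> None:
    # 1. The outer session updates model_training_logs but has not committed yet.
    outer.execute(
        text("UPDATE model_training_logs SET progress = :p WHERE product_id = :pid"),
        {"p": 5, "pid": product_id},
    )
    # 2. train_bakery_model() then opens a second session on the same engine ...
    inner_factory = sessionmaker(bind=outer.get_bind())
    with inner_factory() as inner:
        # 3. ... and requests the advisory lock; as described above, that wait is
        #    never released because the outer transaction never reaches commit.
        inner.execute(text("SELECT pg_advisory_xact_lock(:key)"), {"key": product_id})
```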

Changes Made:
1. prophet_manager.py:
   - Added optional 'session' parameter to train_bakery_model()
   - Refactored to use parent session if provided, otherwise create new one
   - Prevents nested session creation during training (see the sketch after this list)

2. hybrid_trainer.py:
   - Added optional 'session' parameter to train_hybrid_model()
   - Passes session to prophet_manager to maintain single session context

3. trainer.py:
   - Updated _train_single_product() to accept and pass session
   - Updated _train_all_models_enhanced() to accept and pass session
   - Pass db_session from main training context to all training methods
   - Added explicit db_session.flush() after critical progress update
   - This ensures updates are visible before acquiring locks
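
A minimal, self-contained sketch of the resulting pattern; the session_scope helper, the SQLite stand-in engine, and the simplified signature are assumptions, while the "reuse the parent session, flush before locking" idea is what the changes above describe:

```python
from contextlib import contextmanager
from typing import Optional

from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session, sessionmaker

engine = create_engine("sqlite://")            # stand-in engine for the sketch
SessionLocal = sessionmaker(bind=engine)

@contextmanager
def session_scope(session: Optional[Session] = None):
    """Yield the caller's session if given; otherwise open and manage a new one."""
    if session is not None:
        yield session                          # caller owns commit/rollback
        return
    own = SessionLocal()
    try:
        yield own
        own.commit()
    except Exception:
        own.rollback()
        raise
    finally:
        own.close()

def train_bakery_model(product_id: int, session: Optional[Session] = None) -> None:
    # A single transaction performs the progress update, the flush, and any
    # later lock acquisition, so there is no second session to deadlock with.
    with session_scope(session) as db:
        db.execute(
            text("UPDATE model_training_logs SET progress = :p WHERE product_id = :pid"),
            {"p": 10, "pid": product_id},
        )
        db.flush()          # push pending ORM state before any lock is taken
        ...                 # model fitting happens here
```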

Impact:
- Eliminates nested session deadlocks
- Training now proceeds past initial progress update
- Maintains single database session context throughout training
- Prevents database transaction conflicts

Related Issues:
- Fixes training hang during onboarding process
- Not directly related to audit_metadata changes but exposed by them

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 16:13:32 +01:00
Claude
c64585af57 Fix training hang by wrapping blocking ML operations in thread pool
Root Cause:
The training process was stuck at 40% because blocking synchronous ML operations
(model.fit(), model.predict(), study.optimize()) were freezing the asyncio
event loop, preventing RabbitMQ heartbeats, WebSocket communication, and
progress updates.

Changes:
1. prophet_manager.py:
   - Wrapped model.fit() at line 189 with asyncio.to_thread()
   - Wrapped study.optimize() at line 453 with asyncio.to_thread()

2. hybrid_trainer.py:
   - Made _train_xgboost() async and wrapped model.fit() with asyncio.to_thread()
   - Made _evaluate_hybrid_model() async and wrapped predict() calls
   - Updated the public predict() method to wrap its blocking predict() calls (the pattern is sketched below)
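
A minimal sketch of that wrapping pattern; the coroutine names and arguments are placeholders, and only the asyncio.to_thread() usage reflects the change:

```python
import asyncio

async def fit_in_thread(model, train_df):
    # model.fit() is synchronous and CPU-bound; running it in a worker thread
    # keeps the event loop free for heartbeats, WebSockets, and progress updates.
    return await asyncio.to_thread(model.fit, train_df)

async def optimize_in_thread(study, objective, n_trials: int = 50):
    # Same idea for a blocking study.optimize() call.
    await asyncio.to_thread(study.optimize, objective, n_trials=n_trials)

async def predict_in_thread(model, future_df):
    return await asyncio.to_thread(model.predict, future_df)
```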

Impact:
- Event loop no longer blocks during ML training
- RabbitMQ heartbeats continue during training
- WebSocket progress updates work correctly
- Training can now complete successfully

Fixes: Training hang at 40% during onboarding phase
2025-11-05 14:34:53 +00:00
Urtzi Alfaro
394ad3aea4 Improve AI logic 2025-11-05 13:34:56 +01:00
Urtzi Alfaro
96ad5c6692 Refactor datetime and timezone utils 2025-10-12 23:16:04 +02:00
Urtzi Alfaro
3c689b4f98 REFACTOR external service and improve websocket training 2025-10-09 14:11:02 +02:00
Urtzi Alfaro
03737430ee Fix new services implementation 3 2025-08-14 16:47:34 +02:00
Urtzi Alfaro
8d125ab0d5 Refactor the traffic fetching system 2025-08-10 18:32:47 +02:00
Urtzi Alfaro
488bb3ef93 REFACTOR - Database logic 2025-08-08 09:08:41 +02:00
Urtzi Alfaro
cd6fd875f7 Improve training test 3 2025-07-29 12:45:39 +02:00
Urtzi Alfaro
98f546af12 Improve training code 2025-07-28 19:28:39 +02:00
Urtzi Alfaro
e63a99b818 Checking onboarding flow - fix 4 2025-07-27 16:29:53 +02:00
Urtzi Alfaro
f3071c00bd Add all the code for training service 2025-07-19 16:59:37 +02:00