Root Cause:
Training process was stuck at 40% because blocking synchronous ML operations
(model.fit(), model.predict(), study.optimize()) were freezing the asyncio
event loop, preventing RabbitMQ heartbeats, WebSocket communication, and
progress updates.
Changes:
1. prophet_manager.py:
- Wrapped model.fit() at line 189 with asyncio.to_thread()
- Wrapped study.optimize() at line 453 with asyncio.to_thread()
2. hybrid_trainer.py:
- Made _train_xgboost() async and wrapped model.fit() with asyncio.to_thread()
- Made _evaluate_hybrid_model() async and wrapped predict() calls
- Fixed predict() method to wrap blocking predict() calls
Impact:
- Event loop no longer blocks during ML training
- RabbitMQ heartbeats continue during training
- WebSocket progress updates work correctly
- Training can now complete successfully
Fixes: Training hang at 40% during onboarding phase