Fix training hang by wrapping blocking ML operations in thread pool

Root Cause:
The training process was stuck at 40% because blocking, synchronous ML
operations (model.fit(), model.predict(), study.optimize()) ran directly on
the asyncio event loop thread, freezing the loop and preventing RabbitMQ
heartbeats, WebSocket communication, and progress updates.
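
For context, a minimal, self-contained sketch of the failure mode and the
fix (illustrative only, not code from this repo):

```python
import asyncio
import time

async def heartbeat():
    # Stands in for RabbitMQ heartbeats / WebSocket progress updates.
    while True:
        print("heartbeat")
        await asyncio.sleep(1)

def blocking_fit():
    # Stands in for model.fit() / study.optimize(): synchronous, long-running.
    time.sleep(5)

async def main():
    hb = asyncio.create_task(heartbeat())
    # Calling blocking_fit() directly here would pin the event loop thread
    # for 5 seconds: heartbeat() never gets scheduled and the broker can
    # time out the connection. With asyncio.to_thread() the call runs on a
    # worker thread instead, so heartbeat() keeps firing once per second.
    await asyncio.to_thread(blocking_fit)
    hb.cancel()

asyncio.run(main())
```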

Changes:
1. prophet_manager.py:
   - Wrapped model.fit() at line 189 with asyncio.to_thread()
   - Wrapped study.optimize() at line 453 with asyncio.to_thread()

2. hybrid_trainer.py (pattern sketched below):
   - Made _train_xgboost() async and wrapped model.fit() with asyncio.to_thread()
   - Made _evaluate_hybrid_model() async and wrapped its predict() calls
   - Updated the predict() method to wrap its blocking predict() calls the same way
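
The hybrid_trainer.py hunks are not shown on this page; based on the change
list above, the pattern applied there plausibly looks like the following
sketch (class and attribute names such as HybridTrainer and xgb_model are
assumptions, not taken from the repo):

```python
import asyncio

class HybridTrainer:
    # Skeleton only; the real class lives in hybrid_trainer.py.
    def __init__(self, xgb_model):
        self.xgb_model = xgb_model

    async def _train_xgboost(self, X_train, y_train):
        # fit() is synchronous and CPU-bound; run it on a worker thread so
        # the event loop stays free for heartbeats and progress updates.
        await asyncio.to_thread(self.xgb_model.fit, X_train, y_train)

    async def predict(self, X):
        # Same pattern for inference: hand the blocking call to a thread
        # and await its result.
        return await asyncio.to_thread(self.xgb_model.predict, X)
```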

Impact:
- Event loop no longer blocks during ML training
- RabbitMQ heartbeats continue during training
- WebSocket progress updates work correctly
- Training can now complete successfully
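
A note on why the existing error handling keeps working (a generic sketch,
not repo code): asyncio.to_thread() propagates both return values and
exceptions from the worker thread back to the awaiting coroutine, so the
try/except around model.fit() behaves exactly as it did around the old
synchronous call.

```python
import asyncio

async def fit_in_thread(model, data):
    # to_thread() re-raises the callable's exceptions in the awaiting
    # coroutine, so surrounding try/except blocks still catch fit errors.
    try:
        return await asyncio.to_thread(model.fit, data)
    except Exception as fit_error:
        raise RuntimeError(f"Model fit failed: {fit_error}") from fit_error
```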

Fixes: Training hang at 40% during onboarding phase

Author: Claude
Date:   2025-11-05 14:34:53 +00:00
Parent: 94b3b343f5
Commit: c64585af57
2 changed files with 29 additions and 12 deletions

prophet_manager.py

@@ -186,7 +186,9 @@ class BakeryProphetManager:
         # Fit the model with enhanced error handling
         try:
             logger.info(f"Starting Prophet model fit for {inventory_product_id}")
-            model.fit(prophet_data)
+            # ✅ FIX: Run blocking model.fit() in thread pool to avoid blocking event loop
+            import asyncio
+            await asyncio.to_thread(model.fit, prophet_data)
             logger.info(f"Prophet model fit completed successfully for {inventory_product_id}")
         except Exception as fit_error:
             error_details = {
@@ -450,7 +452,15 @@ class BakeryProphetManager:
             direction='minimize',
             sampler=optuna.samplers.TPESampler(seed=product_seed)
         )
-        study.optimize(objective, n_trials=n_trials, timeout=const.OPTUNA_TIMEOUT_SECONDS, show_progress_bar=False)
+        # ✅ FIX: Run blocking study.optimize() in thread pool to avoid blocking event loop
+        import asyncio
+        await asyncio.to_thread(
+            study.optimize,
+            objective,
+            n_trials=n_trials,
+            timeout=const.OPTUNA_TIMEOUT_SECONDS,
+            show_progress_bar=False
+        )
 
         # Return best parameters
         best_params = study.best_params