Fix training hang caused by nested database sessions and deadlocks
Root Cause:
The training process hung at the first progress update because of nested database sessions. The main trainer created a session and repositories, then called prophet_manager.train_bakery_model(), which opened a second, nested session that took an advisory lock. This produced a deadlock:
1. The outer session held an uncommitted UPDATE on model_training_logs.
2. The inner session tried to acquire the advisory lock.
3. Because the outer call was awaiting the inner one with its transaction still open, neither could proceed and training hung indefinitely.

Changes Made:
1. prophet_manager.py:
   - Added an optional 'session' parameter to train_bakery_model()
   - Refactored to use the parent session if one is provided, otherwise create a new one
   - Prevents nested session creation during training
2. hybrid_trainer.py:
   - Added an optional 'session' parameter to train_hybrid_model()
   - Passes the session to prophet_manager so training stays in a single session context
3. trainer.py:
   - Updated _train_single_product() to accept and pass session
   - Updated _train_all_models_enhanced() to accept and pass session
   - Pass db_session from the main training context to all training methods
   - Added an explicit db_session.flush() after the critical progress update, so the pending UPDATE is sent to the database within the current transaction before any locks are acquired

Impact:
- Eliminates nested-session deadlocks
- Training now proceeds past the initial progress update
- Maintains a single database session context throughout training
- Prevents transaction conflicts between nested sessions

Related Issues:
- Fixes the training hang during the onboarding process
- Not directly related to the audit_metadata changes, but exposed by them

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
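The wait cycle described under Root Cause can be reproduced in miniature. This is a toy sketch, assuming the lock held by the outer transaction (through its uncommitted UPDATE) can be modeled by an asyncio.Lock; inner_train, outer_train and run_scenario are hypothetical names, and a real database lock differs in many details. Passing reuse_session=True models the fix: the nested call runs inside the caller's transaction and does not wait on it.

```python
import asyncio


async def inner_train(lock: asyncio.Lock, reuse_session: bool) -> str:
    # Toy stand-in for prophet_manager.train_bakery_model().
    if reuse_session:
        # Same session/transaction as the caller: the lock the caller
        # already holds is not requested a second time.
        return "trained"
    # Nested session: a separate transaction that must wait for the lock.
    async with lock:
        return "trained"


async def outer_train(lock: asyncio.Lock, reuse_session: bool, timeout: float) -> str:
    # Toy stand-in for the main trainer: holding `lock` models the
    # uncommitted UPDATE kept open while it awaits the nested call.
    async with lock:
        try:
            return await asyncio.wait_for(inner_train(lock, reuse_session), timeout)
        except asyncio.TimeoutError:
            # Without the timeout this await would never return.
            return "hung"


async def run_scenario(reuse_session: bool, timeout: float = 0.1) -> str:
    # Create the lock inside the running event loop.
    return await outer_train(asyncio.Lock(), reuse_session, timeout)


if __name__ == "__main__":
    print(asyncio.run(run_scenario(reuse_session=False)))  # prints "hung"
    print(asyncio.run(run_scenario(reuse_session=True)))   # prints "trained"
```

Assuming PostgreSQL (which the advisory lock suggests), this shape also explains why the database's own deadlock detector would typically not break the cycle: the outer connection is idle in its transaction rather than waiting on a database lock, so half of the cycle lives in application code.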
```diff
@@ -56,7 +56,8 @@ class HybridProphetXGBoost:
         inventory_product_id: str,
         df: pd.DataFrame,
         job_id: str,
-        validation_split: float = 0.2
+        validation_split: float = 0.2,
+        session = None
     ) -> Dict[str, Any]:
         """
         Train hybrid Prophet + XGBoost model.
```
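Because session defaults to None, existing callers keep working, and the callee decides whether to borrow a session or open its own. The commit message says prophet_manager.train_bakery_model() now uses the parent session if one is provided and otherwise creates a new one. A minimal sketch of that branch as an async context manager, assuming a hypothetical FakeSession in place of a real async session (in SQLAlchemy 2.x the factory would typically be an async_sessionmaker); session_scope and this trimmed train_bakery_model are illustrative, not the project's code.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Callable, Optional


class FakeSession:
    """Hypothetical stand-in for an async DB session (e.g. SQLAlchemy AsyncSession)."""

    created = 0  # how many sessions have been opened in total

    def __init__(self) -> None:
        FakeSession.created += 1
        self.closed = False

    async def __aenter__(self) -> "FakeSession":
        return self

    async def __aexit__(self, *exc) -> None:
        self.closed = True


@asynccontextmanager
async def session_scope(
    session: Optional[FakeSession],
    factory: Callable[[], FakeSession] = FakeSession,
) -> AsyncIterator[FakeSession]:
    if session is not None:
        # Borrow the caller's session: no new connection or transaction.
        # The caller stays responsible for commit/rollback and closing.
        yield session
    else:
        # No parent session: open one we own and close it on exit.
        async with factory() as owned:
            yield owned


async def train_bakery_model(job_id: str, session: Optional[FakeSession] = None) -> FakeSession:
    # Hypothetical, trimmed body: only reports which session the work used.
    async with session_scope(session) as s:
        return s
```

The design choice is ownership: only the code that opened a session should commit or close it, so a borrowed session is handed back to the caller still open.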
```diff
@@ -67,6 +68,7 @@ class HybridProphetXGBoost:
             df: Training data (must have 'ds', 'y' and regressor columns)
             job_id: Training job identifier
             validation_split: Fraction of data for validation
+            session: Optional database session (uses parent session if provided to avoid nested sessions)
 
         Returns:
             Dictionary with model metadata and performance metrics
```
```diff
@@ -80,11 +82,13 @@ class HybridProphetXGBoost:
 
         # Step 1: Train Prophet model (base forecaster)
         logger.info("Step 1: Training Prophet base model")
+        # ✅ FIX: Pass session to prophet_manager to avoid nested session issues
         prophet_result = await self.prophet_manager.train_bakery_model(
             tenant_id=tenant_id,
             inventory_product_id=inventory_product_id,
             df=df.copy(),
-            job_id=job_id
+            job_id=job_id,
+            session=session
         )
 
         self.prophet_model_data = prophet_result
```
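Combined with the trainer.py changes listed in the commit message, forwarding session=session here lets a single session flow from the main training context down to prophet_manager. A minimal sketch of that threading, assuming trimmed signatures (the real methods also take tenant_id, df and other arguments) and a hypothetical RecordingSession in place of a real database session; the *Sketch classes are illustrative only.

```python
import asyncio
from typing import Any, Dict, List, Optional


class RecordingSession:
    """Hypothetical session stand-in that records which layer used it."""

    def __init__(self) -> None:
        self.used_by: List[str] = []


class ProphetManagerSketch:
    async def train_bakery_model(
        self, job_id: str, session: Optional[RecordingSession] = None
    ) -> Dict[str, Any]:
        # Fall back to a private session only when no parent session is given.
        s = session if session is not None else RecordingSession()
        s.used_by.append("prophet_manager")
        return {"job_id": job_id, "session": s}


class HybridTrainerSketch:
    def __init__(self) -> None:
        self.prophet_manager = ProphetManagerSketch()

    async def train_hybrid_model(
        self, job_id: str, session: Optional[RecordingSession] = None
    ) -> Dict[str, Any]:
        # Forward the caller's session instead of letting the manager open one.
        return await self.prophet_manager.train_bakery_model(job_id=job_id, session=session)


class TrainerSketch:
    def __init__(self) -> None:
        self.hybrid_trainer = HybridTrainerSketch()

    async def _train_single_product(
        self, job_id: str, session: Optional[RecordingSession] = None
    ) -> Dict[str, Any]:
        return await self.hybrid_trainer.train_hybrid_model(job_id=job_id, session=session)

    async def _train_all_models_enhanced(
        self, job_ids: List[str], session: Optional[RecordingSession] = None
    ) -> List[Dict[str, Any]]:
        # Train products sequentially, all on the same session.
        return [await self._train_single_product(j, session=session) for j in job_ids]
```

Each layer only forwards the argument; the single place that may still open a session is the innermost fallback, which is exactly the path the main training context no longer takes.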