This commit addresses all identified bugs and issues in the training code path:

## Critical Fixes:
- Add get_start_time() method to TrainingLogRepository and fix non-existent method call
- Remove duplicate training.started event from API endpoint (trainer publishes the accurate one)
- Add missing progress events for 80-100% range (85%, 92%, 94%) to eliminate progress "dead zone"

## High Priority Fixes:
- Fix division by zero risk in time estimation with double-check and max() safety
- Remove unreachable exception handler in training_operations.py
- Simplify WebSocket token refresh logic to only reconnect on actual user session changes

## Medium Priority Fixes:
- Fix auto-start training effect with useRef to prevent duplicate starts
- Add HTTP polling debounce delay (5s) to prevent race conditions with WebSocket
- Extract all magic numbers to centralized constants files:
  - Backend: services/training/app/core/training_constants.py
  - Frontend: frontend/src/constants/training.ts
- Standardize error logging with exc_info=True on critical errors

## Code Quality Improvements:
- All progress percentages now use named constants
- All timeouts and intervals now use named constants
- Improved code maintainability and readability
- Better separation of concerns

## Files Changed:
- Backend: training_service.py, trainer.py, training_events.py, progress_tracker.py
- Backend: training_operations.py, training_log_repository.py, training_constants.py (new)
- Frontend: training.ts (hooks), MLTrainingStep.tsx, training.ts (constants, new)

All training progress events now properly flow from 0% to 100% with no gaps.
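
The division-by-zero guard mentioned under High Priority Fixes might look roughly like the sketch below. The function name `estimate_seconds_remaining`, its signature, and the small epsilon values are assumptions made for illustration; only `MAX_ESTIMATED_TIME_REMAINING_SECONDS` and its value come from the constants file in this commit.

```python
from typing import Optional

# Value taken from training_constants.py in this commit.
MAX_ESTIMATED_TIME_REMAINING_SECONDS = 1800  # 30 minutes


def estimate_seconds_remaining(
    elapsed_seconds: float, progress_percent: float
) -> Optional[float]:
    """Hypothetical helper: estimate remaining time from progress so far."""
    # First check: without elapsed time or progress there is nothing to extrapolate.
    if elapsed_seconds <= 0 or progress_percent <= 0:
        return None
    if progress_percent >= 100:
        return 0.0
    # Second guard: max() keeps both divisors strictly positive.
    rate = progress_percent / max(elapsed_seconds, 1e-6)  # percent per second
    remaining = (100 - progress_percent) / max(rate, 1e-9)
    # Cap the estimate so early, noisy readings do not produce huge values.
    return min(remaining, MAX_ESTIMATED_TIME_REMAINING_SECONDS)
```

Returning `None` before any progress is reported lets the caller omit the estimate rather than show a misleading number; the actual trainer may handle that case differently.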
"""
Training Progress Constants
Centralized constants for training progress tracking and timing
"""

# Progress Milestones (percentage)
PROGRESS_STARTED = 0
PROGRESS_DATA_VALIDATION = 10
PROGRESS_DATA_ANALYSIS = 20
PROGRESS_DATA_PREPARATION_COMPLETE = 30
PROGRESS_ML_TRAINING_START = 40
PROGRESS_TRAINING_COMPLETE = 85
PROGRESS_STORING_MODELS = 92
PROGRESS_STORING_METRICS = 94
PROGRESS_COMPLETED = 100

# Progress Ranges
PROGRESS_TRAINING_RANGE_START = 20  # After data analysis
PROGRESS_TRAINING_RANGE_END = 80  # Before finalization
PROGRESS_TRAINING_RANGE_WIDTH = PROGRESS_TRAINING_RANGE_END - PROGRESS_TRAINING_RANGE_START  # 60%

# Time Limits and Intervals (seconds)
MAX_ESTIMATED_TIME_REMAINING_SECONDS = 1800  # 30 minutes
WEBSOCKET_HEARTBEAT_INTERVAL_SECONDS = 30
WEBSOCKET_RECONNECT_MAX_ATTEMPTS = 3
WEBSOCKET_RECONNECT_INITIAL_DELAY_SECONDS = 1
WEBSOCKET_RECONNECT_MAX_DELAY_SECONDS = 10

# Training Timeouts (seconds unless the name ends in _MS)
TRAINING_SKIP_OPTION_DELAY_SECONDS = 120  # 2 minutes
HTTP_POLLING_INTERVAL_MS = 5000  # 5 seconds
HTTP_POLLING_DEBOUNCE_MS = 5000  # 5 seconds before enabling after WebSocket disconnect

# Frontend Display
TRAINING_COMPLETION_DELAY_MS = 2000  # Delay before navigating after completion
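
As a rough illustration of how the progress range constants could be used, the sketch below maps the number of finished training items onto the 20-80% band. The function `training_progress_percent` and its parameters are hypothetical and do not appear in the commit; the three constants are copied from the file above.

```python
# Values copied from training_constants.py.
PROGRESS_TRAINING_RANGE_START = 20
PROGRESS_TRAINING_RANGE_END = 80
PROGRESS_TRAINING_RANGE_WIDTH = PROGRESS_TRAINING_RANGE_END - PROGRESS_TRAINING_RANGE_START


def training_progress_percent(items_completed: int, total_items: int) -> int:
    """Hypothetical helper: map completed items onto the training range."""
    # With nothing to train, stay at the start of the range instead of dividing by zero.
    if total_items <= 0:
        return PROGRESS_TRAINING_RANGE_START
    # Clamp so out-of-range counts cannot push progress outside 20-80%.
    completed = min(max(items_completed, 0), total_items)
    # Integer arithmetic avoids floating-point rounding in the reported percentage.
    return PROGRESS_TRAINING_RANGE_START + (
        completed * PROGRESS_TRAINING_RANGE_WIDTH
    ) // total_items
```

For example, 5 of 10 items finished would be reported as 50%, and the milestone constants (85, 92, 94, 100) would then cover the finalization steps after the range ends at 80.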