Commit Graph

16 Commits

Author SHA1 Message Date
Claude
5a84be83d6 Fix multiple critical bugs in onboarding training step
This commit addresses the bugs and issues identified in the training code path:

## Critical Fixes:
- Add get_start_time() method to TrainingLogRepository and fix the call to a non-existent method
- Remove duplicate training.started event from API endpoint (trainer publishes the accurate one)
- Add missing progress events for 80-100% range (85%, 92%, 94%) to eliminate progress "dead zone"
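A minimal sketch of how late-stage milestones could be emitted without skipping any. This is not the repository's code: `LATE_STAGE_MILESTONES` and `milestones_crossed` are hypothetical names, and only the 85%, 92%, and 94% values come from this commit message.

```python
from bisect import bisect_right
from typing import List

# Hypothetical milestone table; 85, 92 and 94 are the values named in the
# commit message, 100 marks completion.
LATE_STAGE_MILESTONES = (85, 92, 94, 100)


def milestones_crossed(previous: float, current: float) -> List[int]:
    """Return every milestone in the half-open range (previous, current].

    Publishing one event per returned value means a jump from 80% to 93%
    still produces the 85% and 92% events instead of a silent gap.
    """
    lo = bisect_right(LATE_STAGE_MILESTONES, previous)
    hi = bisect_right(LATE_STAGE_MILESTONES, current)
    return list(LATE_STAGE_MILESTONES[lo:hi])
```

A caller would publish one progress event per returned milestone, for example for each value in `milestones_crossed(80, 93)`.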

## High Priority Fixes:
- Guard against division by zero in time estimation with an explicit check and a max() floor
- Remove unreachable exception handler in training_operations.py
- Simplify WebSocket token refresh logic to only reconnect on actual user session changes
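The time estimation guard listed above might look roughly like the following. This is an illustrative sketch, not the code in the training service: the function name, its parameters, and `MIN_ELAPSED_SECONDS` are assumptions.

```python
from typing import Optional

# Hypothetical floor for elapsed time; keeps the divisor strictly positive.
MIN_ELAPSED_SECONDS = 1e-3


def estimate_remaining_seconds(
    elapsed_seconds: float, progress_percent: float
) -> Optional[float]:
    """Estimate remaining time from linear progress, never dividing by zero."""
    # Explicit check: with no progress yet there is no rate to extrapolate.
    if progress_percent <= 0:
        return None
    if progress_percent >= 100:
        return 0.0
    # max() floor: an elapsed time of zero cannot reach the division.
    rate = progress_percent / max(elapsed_seconds, MIN_ELAPSED_SECONDS)
    # rate > 0 here because progress_percent > 0 and the divisor is positive.
    return (100 - progress_percent) / rate
```

Returning `None` when no progress has been made leaves it to the caller to decide how to display an unknown estimate.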

## Medium Priority Fixes:
- Fix auto-start training effect with useRef to prevent duplicate starts
- Add HTTP polling debounce delay (5s) to prevent race conditions with WebSocket
- Extract all magic numbers to centralized constants files:
  - Backend: services/training/app/core/training_constants.py
  - Frontend: frontend/src/constants/training.ts
- Standardize error logging with exc_info=True on critical errors
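For the logging item above, a small sketch of the `exc_info=True` pattern using Python's standard `logging` module. The `run_step` helper and the log message are hypothetical and only illustrate the pattern.

```python
import io
import logging
from typing import Callable


def run_step(logger: logging.Logger, step: Callable[[], None]) -> bool:
    """Run one step; on failure, log the error together with its traceback."""
    try:
        step()
    except Exception:
        # exc_info=True attaches the active exception's traceback to the record.
        logger.error("Training step failed", exc_info=True)
        return False
    return True


def demo() -> str:
    """Capture the log output of a failing step in memory and return it."""
    buffer = io.StringIO()
    logger = logging.getLogger("training_demo")
    logger.handlers.clear()
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(buffer))
    run_step(logger, lambda: 1 / 0)
    return buffer.getvalue()
```

Without `exc_info=True` only the message line would be logged, which makes a failure much harder to diagnose.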

## Code Quality Improvements:
- All progress percentages now use named constants
- All timeouts and intervals now use named constants
- Improved code maintainability and readability
- Better separation of concerns

## Files Changed:
- Backend: training_service.py, trainer.py, training_events.py, progress_tracker.py
- Backend: training_operations.py, training_log_repository.py, training_constants.py (new)
- Frontend: training.ts (hooks), MLTrainingStep.tsx, training.ts (constants, new)

All training progress events now properly flow from 0% to 100% with no gaps.
2025-11-05 13:02:39 +00:00
Urtzi Alfaro
3c689b4f98 REFACTOR external service and improve websocket training 2025-10-09 14:11:02 +02:00
Urtzi Alfaro
4777e59e7a Add base kubernetes support final fix 4 2025-09-29 07:54:25 +02:00
Urtzi Alfaro
63a3f9c77a Add base kubernetes support 2025-09-27 11:18:13 +02:00
Urtzi Alfaro
2f6f13bfef Training job in the background 2025-08-01 16:26:36 +02:00
Urtzi Alfaro
84ed4a7a2e Start fixing forecast service API 3 2025-07-29 15:08:55 +02:00
Urtzi Alfaro
938fd24e3a Fix data fetch 5 2025-07-27 21:32:29 +02:00
Urtzi Alfaro
e63a99b818 Checking onboarding flow - fix 4 2025-07-27 16:29:53 +02:00
Urtzi Alfaro
e2b85162f0 Fix generating pytest for training service 2025-07-25 14:10:27 +02:00
Urtzi Alfaro
153ae3f154 Fix forecasting service 2025-07-21 20:43:17 +02:00
Urtzi Alfaro
9a67f3d175 Improve base config 2025-07-19 21:44:52 +02:00
Urtzi Alfaro
c7fd6135f0 Improve auth models 2025-07-19 21:16:25 +02:00
Urtzi Alfaro
f3071c00bd Add all the code for training service 2025-07-19 16:59:37 +02:00
Urtzi Alfaro
4073222888 Fix imports 2025-07-18 14:41:39 +02:00
Urtzi Alfaro
cb80a93c4b Few fixes 2025-07-17 14:34:24 +02:00
Urtzi Alfaro
347ff51bd7 Initial microservices setup from artifacts 2025-07-17 13:09:24 +02:00