Commit Graph

102 Commits

Author SHA1 Message Date
Claude
5a84be83d6 Fix multiple critical bugs in onboarding training step
This commit addresses all identified bugs and issues in the training code path:

## Critical Fixes:
- Add get_start_time() method to TrainingLogRepository and fix non-existent method call
- Remove duplicate training.started event from API endpoint (trainer publishes the accurate one)
- Add missing progress events for 80-100% range (85%, 92%, 94%) to eliminate progress "dead zone"

## High Priority Fixes:
- Fix division by zero risk in time estimation with double-check and max() safety
- Remove unreachable exception handler in training_operations.py
- Simplify WebSocket token refresh logic to only reconnect on actual user session changes

## Medium Priority Fixes:
- Fix auto-start training effect with useRef to prevent duplicate starts
- Add HTTP polling debounce delay (5s) to prevent race conditions with WebSocket
- Extract all magic numbers to centralized constants files:
  - Backend: services/training/app/core/training_constants.py
  - Frontend: frontend/src/constants/training.ts
- Standardize error logging with exc_info=True on critical errors

## Code Quality Improvements:
- All progress percentages now use named constants
- All timeouts and intervals now use named constants
- Improved code maintainability and readability
- Better separation of concerns

## Files Changed:
- Backend: training_service.py, trainer.py, training_events.py, progress_tracker.py
- Backend: training_operations.py, training_log_repository.py, training_constants.py (new)
- Frontend: training.ts (hooks), MLTrainingStep.tsx, training.ts (constants, new)

All training progress events now properly flow from 0% to 100% with no gaps.
2025-11-05 13:02:39 +00:00
Claude
799e7dbaeb Fix training job concurrent database session conflicts
Root Cause:
- Multiple parallel training tasks (3 at a time) were sharing the same database session
- This caused SQLAlchemy session state conflicts: "Session is already flushing" and "rollback() is already in progress"
- Additionally, duplicate model records were being created by both trainer and training_service

Fixes:
1. Separated model training from database writes:
   - Training happens in parallel (CPU-intensive)
   - Database writes happen sequentially after training completes
   - This eliminates concurrent session access

2. Removed duplicate database writes:
   - Trainer now writes all model records sequentially after parallel training
   - Training service now retrieves models instead of creating duplicates
   - Performance metrics are also created by trainer (no duplicates)

3. Added proper data flow:
   - _train_single_product: Only trains models, stores results
   - _write_training_results_to_database: Sequential DB writes after training
   - _store_trained_models: Changed to retrieve existing models
   - _create_performance_metrics: Changed to verify existing metrics

Benefits:
- Eliminates database session conflicts
- Prevents duplicate model records
- Maintains parallel training performance
- Ensures data consistency

Files Modified:
- services/training/app/ml/trainer.py
- services/training/app/services/training_service.py

Resolves: Onboarding training job database session conflicts
2025-11-05 12:41:42 +00:00
Urtzi Alfaro
394ad3aea4 Improve AI logic 2025-11-05 13:34:56 +01:00
Urtzi Alfaro
5adb0e39c0 Improve the frontend 5 2025-11-02 20:24:44 +01:00
Urtzi Alfaro
269d3b5032 Add user delete process 2025-10-31 11:54:19 +01:00
Urtzi Alfaro
36217a2729 Improve the frontend 2 2025-10-29 06:58:05 +01:00
Urtzi Alfaro
858d985c92 Improve the frontend modals 2025-10-27 16:33:26 +01:00
Urtzi Alfaro
8d30172483 Improve the frontend 2025-10-21 19:50:07 +02:00
Urtzi Alfaro
05da20357d Improve teh securty of teh DB 2025-10-19 19:22:37 +02:00
Urtzi Alfaro
dbb48d8e2c Improve the sales import 2025-10-15 21:09:42 +02:00
Urtzi Alfaro
8f9e9a7edc Add role-based filtering and imporve code 2025-10-15 16:12:49 +02:00
Urtzi Alfaro
96ad5c6692 Refactor datetime and timezone utils 2025-10-12 23:16:04 +02:00
Urtzi Alfaro
3c689b4f98 REFACTOR external service and improve websocket training 2025-10-09 14:11:02 +02:00
Urtzi Alfaro
7c72f83c51 REFACTOR ALL APIs fix 1 2025-10-07 07:15:07 +02:00
Urtzi Alfaro
38fb98bc27 REFACTOR ALL APIs 2025-10-06 15:27:01 +02:00
Urtzi Alfaro
7cc4b957a5 Fix DB issue 2s 2025-09-30 21:58:10 +02:00
Urtzi Alfaro
147893015e Fix DB issues 2025-09-30 13:32:51 +02:00
Urtzi Alfaro
ec6bcb4c7d Add migration services 2025-09-30 08:12:45 +02:00
Urtzi Alfaro
befcc126b0 Refactor all main.py 2025-09-29 13:13:12 +02:00
Urtzi Alfaro
4777e59e7a Add base kubernetes support final fix 4 2025-09-29 07:54:25 +02:00
Urtzi Alfaro
63a3f9c77a Add base kubernetes support 2025-09-27 11:18:13 +02:00
Urtzi Alfaro
a8f6e9d593 Simplify the onboardinf flow components 3 2025-09-08 21:52:56 +02:00
Urtzi Alfaro
0faaa25e58 Start integrating the onboarding flow with backend 3 2025-09-04 23:19:53 +02:00
Urtzi Alfaro
f33f5d242a Fix issues 4 2025-08-17 15:21:10 +02:00
Urtzi Alfaro
cafd316c4b Fix issues 3 2025-08-17 13:35:05 +02:00
Urtzi Alfaro
d21094a940 Fix issues 2 2025-08-17 11:12:17 +02:00
Urtzi Alfaro
109961ef6e Fix issues 2025-08-17 10:28:58 +02:00
Urtzi Alfaro
8914786973 New Frontend 2025-08-16 20:13:40 +02:00
Urtzi Alfaro
f7de9115d1 Fix new services implementation 5 2025-08-15 17:53:59 +02:00
Urtzi Alfaro
03737430ee Fix new services implementation 3 2025-08-14 16:47:34 +02:00
Urtzi Alfaro
fbe7470ad9 REFACTOR data service 2025-08-12 18:17:30 +02:00
Urtzi Alfaro
8d125ab0d5 Refactor the traffic fetching system 2025-08-10 18:32:47 +02:00
Urtzi Alfaro
3c2acc934a Improve the traffic fetching system 2025-08-10 17:31:38 +02:00
Urtzi Alfaro
312fdc8ef3 Improve the traffic fetching system 2025-08-08 23:29:48 +02:00
Urtzi Alfaro
8af17f1433 Improve the design of the frontend 2 2025-08-08 23:06:54 +02:00
Urtzi Alfaro
488bb3ef93 REFACTOR - Database logic 2025-08-08 09:08:41 +02:00
Urtzi Alfaro
32a7b913d0 Fix new Frontend 15 2025-08-04 21:46:12 +02:00
Urtzi Alfaro
8bb14ecc4f Fix new Frontend 14 2025-08-04 19:17:31 +02:00
Urtzi Alfaro
0ba543a19a Fix new Frontend 13 2025-08-04 18:58:12 +02:00
Urtzi Alfaro
35b02ca364 Fix new Frontend 12 2025-08-04 18:21:42 +02:00
Urtzi Alfaro
935f45a283 Add training job status in the db 2025-08-03 14:55:13 +02:00
Urtzi Alfaro
66716c054d Fix user delete flow 2 2025-08-02 17:53:28 +02:00
Urtzi Alfaro
3681429e11 Improve user delete flow 2025-08-02 17:09:53 +02:00
Urtzi Alfaro
277e8bec73 Add user role 2025-08-02 09:41:50 +02:00
Urtzi Alfaro
6e6c2cc42e Websocket fix 3 2025-08-01 20:43:02 +02:00
Urtzi Alfaro
37938b614f Websocket fix 2 2025-08-01 18:13:34 +02:00
Urtzi Alfaro
81e7ab7432 Websocket fix 1 2025-08-01 17:55:14 +02:00
Urtzi Alfaro
2f6f13bfef Training job in the background 2025-08-01 16:26:36 +02:00
Urtzi Alfaro
e581a144be Improve the event messaging for training service 2 2025-07-31 15:34:35 +02:00
Urtzi Alfaro
923b2d48d2 Improve the event messaging for training service 2025-07-30 21:21:02 +02:00