732 Commits

Author SHA1 Message Date
Claude
5a84be83d6 Fix multiple critical bugs in onboarding training step
This commit addresses all identified bugs and issues in the training code path:

## Critical Fixes:
- Add get_start_time() method to TrainingLogRepository and fix non-existent method call
- Remove duplicate training.started event from API endpoint (trainer publishes the accurate one)
- Add missing progress events for 80-100% range (85%, 92%, 94%) to eliminate progress "dead zone"

## High Priority Fixes:
- Fix division-by-zero risk in time estimation with an explicit zero check plus a max() floor on the divisor
- Remove unreachable exception handler in training_operations.py
- Simplify WebSocket token refresh logic to only reconnect on actual user session changes
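
A minimal sketch of the kind of guard the division-by-zero fix describes, assuming the estimate is derived from elapsed time and a 0-100 progress value. The function name, parameters, and the 0.1 floor are hypothetical and not taken from the repository:

```python
from typing import Optional


def estimate_seconds_remaining(elapsed_seconds: float,
                               progress_percent: float) -> Optional[float]:
    """Estimate remaining time from elapsed time and progress (0-100).

    Hypothetical helper: illustrates an explicit zero check combined
    with a max() floor, not the repository's actual implementation.
    """
    # First check: with no progress there is nothing to extrapolate from.
    if progress_percent <= 0:
        return None
    # Second check: max() keeps the divisor away from zero even if the
    # guard above is later changed or bypassed.
    safe_progress = max(progress_percent, 0.1)
    estimated_total = elapsed_seconds * 100.0 / safe_progress
    # Never report a negative remaining time.
    return max(estimated_total - elapsed_seconds, 0.0)
```

Returning `None` rather than a sentinel number lets a caller distinguish "no estimate yet" from "zero seconds left".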

## Medium Priority Fixes:
- Fix auto-start training effect with useRef to prevent duplicate starts
- Add HTTP polling debounce delay (5s) to prevent race conditions with WebSocket
- Extract all magic numbers to centralized constants files:
  - Backend: services/training/app/core/training_constants.py
  - Frontend: frontend/src/constants/training.ts
- Standardize error logging with exc_info=True on critical errors
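
As an illustration of the constants extraction, a backend constants module might look roughly like the sketch below. Only the values 85, 92, and 94 come from this commit message; every name here, and the helper function, is an assumption made for the example:

```python
from typing import Optional

# Late-stage progress milestones. The values are from the commit message;
# the constant names are hypothetical.
PROGRESS_LATE_STAGE_MILESTONES = (85, 92, 94)
PROGRESS_COMPLETE = 100


def next_milestone(current_percent: float) -> Optional[int]:
    """Return the next milestone strictly above current_percent, if any."""
    for milestone in (*PROGRESS_LATE_STAGE_MILESTONES, PROGRESS_COMPLETE):
        if milestone > current_percent:
            return milestone
    return None  # already at or beyond 100%
```

Keeping the milestones in one tuple means a new intermediate event can be added in a single place instead of as a literal scattered through the trainer.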

## Code Quality Improvements:
- All progress percentages now use named constants
- All timeouts and intervals now use named constants
- Improved code maintainability and readability
- Better separation of concerns

## Files Changed:
- Backend: training_service.py, trainer.py, training_events.py, progress_tracker.py
- Backend: training_operations.py, training_log_repository.py, training_constants.py (new)
- Frontend: training.ts (hooks), MLTrainingStep.tsx, training.ts (constants, new)

All training progress events now properly flow from 0% to 100% with no gaps.
2025-11-05 13:02:39 +00:00
ualsweb
e3ea92640b Merge pull request #1 from ualsweb/claude/fix-onboarding-training-job-011CUpkdtoMWGH7ANd33zRbm
Fix training job concurrent database session conflicts
2025-11-05 13:44:07 +01:00
Claude
799e7dbaeb Fix training job concurrent database session conflicts
Root Cause:
- Multiple parallel training tasks (3 at a time) were sharing the same database session
- This caused SQLAlchemy session state conflicts: "Session is already flushing" and "rollback() is already in progress"
- Additionally, duplicate model records were being created by both trainer and training_service

Fixes:
1. Separated model training from database writes:
   - Training happens in parallel (CPU-intensive)
   - Database writes happen sequentially after training completes
   - This eliminates concurrent session access

2. Removed duplicate database writes:
   - Trainer now writes all model records sequentially after parallel training
   - Training service now retrieves models instead of creating duplicates
   - Performance metrics are also created by trainer (no duplicates)

3. Added proper data flow:
   - _train_single_product: Only trains models, stores results
   - _write_training_results_to_database: Sequential DB writes after training
   - _store_trained_models: Changed to retrieve existing models
   - _create_performance_metrics: Changed to verify existing metrics
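
The data flow above can be sketched as follows. This is a simplified illustration, not the code in trainer.py: the function names, the in-memory `store` list standing in for the database session, and the use of `asyncio.Semaphore` to cap concurrency are all assumptions. Only the limit of 3 parallel tasks comes from the commit message.

```python
import asyncio
from typing import Dict, List

MAX_PARALLEL_TRAININGS = 3  # "3 at a time", per the commit message


async def train_single_product(product_id: str,
                               semaphore: asyncio.Semaphore) -> Dict[str, str]:
    """Train one model and return the result. Touches no database state."""
    async with semaphore:
        await asyncio.sleep(0)  # stand-in for the real training work
        return {"product_id": product_id, "status": "trained"}


async def write_results_sequentially(results: List[Dict[str, str]],
                                     store: List[Dict[str, str]]) -> None:
    """Persist results one at a time, so only one writer uses the session."""
    for result in results:
        store.append(result)  # stand-in for a session add + commit


async def run_training(product_ids: List[str]) -> List[Dict[str, str]]:
    semaphore = asyncio.Semaphore(MAX_PARALLEL_TRAININGS)
    # Phase 1: parallel training, no shared session involved.
    results = await asyncio.gather(
        *(train_single_product(pid, semaphore) for pid in product_ids)
    )
    # Phase 2: sequential writes after all training has finished.
    store: List[Dict[str, str]] = []
    await write_results_sequentially(list(results), store)
    return store
```

Because `asyncio.gather` returns results in the order the awaitables were passed, the sequential write phase sees products in their original order regardless of which training task finished first.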

Benefits:
- Eliminates database session conflicts
- Prevents duplicate model records
- Maintains parallel training performance
- Ensures data consistency

Files Modified:
- services/training/app/ml/trainer.py
- services/training/app/services/training_service.py

Resolves: Onboarding training job database session conflicts
2025-11-05 12:41:42 +00:00
Urtzi Alfaro
394ad3aea4 Improve AI logic 2025-11-05 13:34:56 +01:00
Urtzi Alfaro
5c87fbcf48 Improve the frontend 6 2025-11-02 20:26:25 +01:00
Urtzi Alfaro
5adb0e39c0 Improve the frontend 5 2025-11-02 20:24:44 +01:00
Urtzi Alfaro
0220da1725 Improve the frontend 4 2025-11-01 21:35:03 +01:00
Urtzi Alfaro
f44d235c6d Add user delete process 2 2025-10-31 18:57:58 +01:00
Urtzi Alfaro
269d3b5032 Add user delete process 2025-10-31 11:54:19 +01:00
Urtzi Alfaro
63f5c6d512 Improve the frontend 3 2025-10-30 21:08:07 +01:00
Urtzi Alfaro
36217a2729 Improve the frontend 2 2025-10-29 06:58:05 +01:00
Urtzi Alfaro
858d985c92 Improve the frontend modals 2025-10-27 16:33:26 +01:00
Urtzi Alfaro
61376b7a9f Improve the frontend and fix TODOs 2025-10-24 13:05:04 +02:00
Urtzi Alfaro
07c33fa578 Improve the frontend and repository layer 2025-10-23 07:44:54 +02:00
Urtzi Alfaro
8d30172483 Improve the frontend 2025-10-21 19:50:07 +02:00
Urtzi Alfaro
05da20357d Improve the security of the DB 2025-10-19 19:22:37 +02:00
Urtzi Alfaro
62971c07d7 Update landing page 2025-10-18 16:03:23 +02:00
Urtzi Alfaro
312e36c893 Update requirements and infra versions 2025-10-17 23:09:40 +02:00
Urtzi Alfaro
7e089b80cf Improve public pages 2025-10-17 18:14:28 +02:00
Urtzi Alfaro
d4060962e4 Improve demo seed 2025-10-17 07:31:14 +02:00
Urtzi Alfaro
b6cb800758 Improve GDPR implementation 2025-10-16 07:28:04 +02:00
Urtzi Alfaro
dbb48d8e2c Improve the sales import 2025-10-15 21:09:42 +02:00
Urtzi Alfaro
8f9e9a7edc Add role-based filtering and improve code 2025-10-15 16:12:49 +02:00
Urtzi Alfaro
96ad5c6692 Refactor datetime and timezone utils 2025-10-12 23:16:04 +02:00
Urtzi Alfaro
7556a00db7 Improve the demo feature of the project 2025-10-12 18:47:33 +02:00
Urtzi Alfaro
dbc7f2fa0d Re-create migrations init tables 2025-10-09 20:47:31 +02:00
Urtzi Alfaro
b420af32c5 REFACTOR production scheduler 2025-10-09 18:01:24 +02:00
Urtzi Alfaro
3c689b4f98 REFACTOR external service and improve websocket training 2025-10-09 14:11:02 +02:00
Urtzi Alfaro
7c72f83c51 REFACTOR ALL APIs fix 1 2025-10-07 07:15:07 +02:00
Urtzi Alfaro
38fb98bc27 REFACTOR ALL APIs 2025-10-06 15:27:01 +02:00
Urtzi Alfaro
dc8221bd2f Add DEMO feature to the project 2025-10-03 14:09:34 +02:00
Urtzi Alfaro
1243c2ca6d Add fixes to procurement logic and fix real-time connections 2025-10-02 13:20:30 +02:00
Urtzi Alfaro
c9d8d1d071 Fix onboarding process not getting the subscription plan 2025-10-01 21:56:38 +02:00
Urtzi Alfaro
b93fb850c3 Add tilt support 2025-10-01 18:58:30 +02:00
Urtzi Alfaro
0fdc3b0211 Fix issues 2025-10-01 16:25:53 +02:00
Urtzi Alfaro
36b44c41f1 Fix issues 2025-10-01 14:39:10 +02:00
Urtzi Alfaro
6fa655275f Fix notification service health issues 2025-10-01 12:28:00 +02:00
Urtzi Alfaro
016742d63f Fix startup issues 2025-10-01 12:17:59 +02:00
Urtzi Alfaro
2eeebfc1e0 Fix Alembic issue 2025-10-01 11:24:06 +02:00
Urtzi Alfaro
7cc4b957a5 Fix DB issue 2s 2025-09-30 21:58:10 +02:00
Urtzi Alfaro
147893015e Fix DB issues 2025-09-30 13:32:51 +02:00
Urtzi Alfaro
ec6bcb4c7d Add migration services 2025-09-30 08:12:45 +02:00
Urtzi Alfaro
d1c83dce74 Clean code 2025-09-29 19:18:45 +02:00
Urtzi Alfaro
2712a60a2a Refactor services alembic 2025-09-29 19:16:34 +02:00
Urtzi Alfaro
befcc126b0 Refactor all main.py 2025-09-29 13:13:12 +02:00
Urtzi Alfaro
4777e59e7a Add base kubernetes support final fix 4 2025-09-29 07:54:25 +02:00
Urtzi Alfaro
57f77638cc Add base kubernetes support final fix 2 2025-09-28 19:48:05 +02:00
Urtzi Alfaro
83f1d9df87 Add base kubernetes support final fix 1 2025-09-28 14:01:45 +02:00
Urtzi Alfaro
3816383760 Add base kubernetes support final 2025-09-28 13:54:28 +02:00
Urtzi Alfaro
b95ecf1c53 Add base kubernetes support 5 2025-09-27 22:55:42 +02:00