Root Cause:
The training process was stuck at 40% because blocking synchronous ML
operations (model.fit(), model.predict(), study.optimize()) were freezing the
asyncio event loop, preventing RabbitMQ heartbeats, WebSocket communication,
and progress updates.
Changes:
1. prophet_manager.py:
- Wrapped model.fit() at line 189 with asyncio.to_thread()
- Wrapped study.optimize() at line 453 with asyncio.to_thread()
2. hybrid_trainer.py:
- Made _train_xgboost() async and wrapped model.fit() with asyncio.to_thread()
- Made _evaluate_hybrid_model() async and wrapped predict() calls
- Fixed the predict() method to wrap its blocking predict() calls (pattern sketched below)
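A minimal sketch of the wrapping pattern (assuming Python 3.9+ for asyncio.to_thread; the helper name and the Optuna call shape are illustrative, not the exact code from these files):

```python
import asyncio

from prophet import Prophet  # assumed dependency, per model.fit() above

async def fit_in_thread(model: Prophet, df) -> Prophet:
    # Offload the synchronous fit to the default thread pool so the
    # event loop keeps servicing heartbeats and progress updates.
    return await asyncio.to_thread(model.fit, df)

# The same pattern applies to the Optuna call, e.g.:
#   await asyncio.to_thread(study.optimize, objective, n_trials=50)
```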
Impact:
- Event loop no longer blocks during ML training
- RabbitMQ heartbeats continue during training
- WebSocket progress updates work correctly
- Training can now complete successfully
Fixes: Training hang at 40% during onboarding phase
Critical fixes for training session logging:
1. Training log race condition fix:
- Add explicit session commits after creating training logs
- Handle duplicate key errors gracefully when multiple sessions
try to create the same log simultaneously
- Implement retry logic to query for the existing log after a
duplicate key violation (first sketch below)
- Prevents "Training log not found" errors during training
2. Audit event async generator error fix:
- Replace the incorrect next(get_db()) usage with the proper
async context manager, database_manager.get_session() (second sketch below)
- Fixes the "'async_generator' object is not an iterator" error
- Ensures audit logging works correctly
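A minimal sketch of the get-or-create pattern from fix 1, assuming an async SQLAlchemy session and a hypothetical TrainingLog model with a unique session_id column:

```python
from sqlalchemy import select
from sqlalchemy.exc import IntegrityError

async def get_or_create_training_log(session, session_id: str):
    log = TrainingLog(session_id=session_id)  # hypothetical ORM model
    session.add(log)
    try:
        await session.commit()  # explicit commit so other sessions see the row
        return log
    except IntegrityError:
        # Duplicate key: a concurrent session won the insert race.
        await session.rollback()
        result = await session.execute(
            select(TrainingLog).where(TrainingLog.session_id == session_id)
        )
        return result.scalar_one()  # retry path: fetch the existing log
```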
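And a sketch of fix 2; database_manager.get_session() is named in the commit, while AuditEvent and the surrounding function shape are assumptions:

```python
# Broken: get_db() is an async generator dependency, so next() raises
# "'async_generator' object is not an iterator".
#   db = next(get_db())

async def record_audit_event(event_data: dict) -> None:
    # Fixed: acquire the session through the async context manager.
    async with database_manager.get_session() as session:
        session.add(AuditEvent(**event_data))  # hypothetical ORM model
        await session.commit()
```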
These changes address race conditions in concurrent database
sessions and ensure training logs are properly synchronized
across the training pipeline.
This commit addresses all identified bugs and issues in the training code path:
## Critical Fixes:
- Add a get_start_time() method to TrainingLogRepository, fixing a call to a previously non-existent method (sketched after this list)
- Remove duplicate training.started event from API endpoint (trainer publishes the accurate one)
- Add missing progress events for 80-100% range (85%, 92%, 94%) to eliminate progress "dead zone"
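A sketch of the new repository method; the model and column names (TrainingLog.started_at, session_id) are assumptions:

```python
from datetime import datetime
from typing import Optional

from sqlalchemy import select

class TrainingLogRepository:
    def __init__(self, session):
        self.session = session

    async def get_start_time(self, session_id: str) -> Optional[datetime]:
        # Return the session's start timestamp, or None if no log exists.
        result = await self.session.execute(
            select(TrainingLog.started_at).where(
                TrainingLog.session_id == session_id
            )
        )
        return result.scalar_one_or_none()
```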
## High Priority Fixes:
- Fix division-by-zero risk in time estimation with an explicit zero check plus a max() floor on the divisor (sketched after this list)
- Remove unreachable exception handler in training_operations.py
- Simplify WebSocket token refresh logic to only reconnect on actual user session changes
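A sketch of the double-guarded time estimate; the function name, signature, and the 0.1 floor are illustrative:

```python
def estimate_remaining_seconds(elapsed: float, progress_pct: float) -> float:
    # First guard: no progress yet means no meaningful estimate.
    if progress_pct <= 0:
        return float("inf")
    # Second guard: max() floors the divisor so it can never reach zero.
    safe_pct = max(progress_pct, 0.1)
    return elapsed * (100.0 - safe_pct) / safe_pct
```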
## Medium Priority Fixes:
- Fix auto-start training effect with useRef to prevent duplicate starts
- Add HTTP polling debounce delay (5s) to prevent race conditions with WebSocket
- Extract all magic numbers to centralized constants files (illustrative excerpt after this list):
- Backend: services/training/app/core/training_constants.py
- Frontend: frontend/src/constants/training.ts
- Standardize error logging with exc_info=True on critical errors
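An illustrative excerpt of the backend constants file; the 85/92/94 percentages and the 5s debounce come from this commit, but the constant names are assumptions:

```python
# services/training/app/core/training_constants.py (illustrative excerpt)

# Progress percentages published to the frontend
PROGRESS_MODELS_TRAINED = 85
PROGRESS_METRICS_SAVED = 92
PROGRESS_FINALIZING = 94
PROGRESS_COMPLETE = 100

# Timeouts and intervals (seconds)
HTTP_POLL_DEBOUNCE_SECONDS = 5
```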
## Code Quality Improvements:
- All progress percentages now use named constants
- All timeouts and intervals now use named constants
- Improved code maintainability and readability
- Better separation of concerns
## Files Changed:
- Backend: training_service.py, trainer.py, training_events.py, progress_tracker.py
- Backend: training_operations.py, training_log_repository.py, training_constants.py (new)
- Frontend: training.ts (hooks), MLTrainingStep.tsx, training.ts (constants, new)
All training progress events now properly flow from 0% to 100% with no gaps.
Root Cause:
- Multiple parallel training tasks (3 at a time) were sharing the same database session
- This caused SQLAlchemy session state conflicts: "Session is already flushing" and "rollback() is already in progress"
- Additionally, duplicate model records were being created by both trainer and training_service
Fixes:
1. Separated model training from database writes (see the sketch after this list):
- Training happens in parallel (CPU-intensive)
- Database writes happen sequentially after training completes
- This eliminates concurrent session access
2. Removed duplicate database writes:
- Trainer now writes all model records sequentially after parallel training
- Training service now retrieves models instead of creating duplicates
- Performance metrics are also created by trainer (no duplicates)
3. Added proper data flow:
- _train_single_product: Only trains models, stores results
- _write_training_results_to_database: Sequential DB writes after training
- _store_trained_models: Changed to retrieve existing models
- _create_performance_metrics: Changed to verify existing metrics
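A minimal sketch of the two-phase flow, using the method names above; the signatures and the Semaphore placement are assumptions:

```python
import asyncio

async def train_products(products: list, session) -> None:
    sem = asyncio.Semaphore(3)  # three parallel trainings, per the root cause

    async def train_one(product):
        async with sem:
            # Phase 1: pure compute, no database access here.
            return await _train_single_product(product)

    results = await asyncio.gather(*(train_one(p) for p in products))

    # Phase 2: strictly sequential writes on the single shared session.
    for result in results:
        await _write_training_results_to_database(session, result)
```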
Benefits:
- Eliminates database session conflicts
- Prevents duplicate model records
- Maintains parallel training performance
- Ensures data consistency
Files Modified:
- services/training/app/ml/trainer.py
- services/training/app/services/training_service.py
Resolves: Onboarding training job database session conflicts