Commit Graph

373 Commits

Author SHA1 Message Date
Urtzi Alfaro
48e61f4970 Delete unused migrations 2025-11-05 14:46:04 +01:00
Claude
7b81b1a537 Create consolidated initial schema migration for orchestration service
This commit consolidates the fragmented orchestration service migrations
into a single, well-structured initial schema version file.

Changes:
- Created 001_initial_schema.py consolidating all table definitions
- Merged fields from 2 previous migrations into one comprehensive file
- Added SCHEMA_DOCUMENTATION.md with complete schema reference
- Added MIGRATION_GUIDE.md for deployment instructions

Schema includes:
- orchestration_runs table (47 columns)
- orchestrationstatus enum type
- 15 optimized indexes for query performance
- Full step tracking (forecasting, production, procurement, notifications, AI insights)
- Saga pattern support
- Performance metrics tracking
- Error handling and retry logic

Benefits:
- Better organization and documentation
- Fixes revision ID inconsistencies from old migrations
- Eliminates duplicate index definitions
- Logically categorizes fields by purpose
- Easier to understand and maintain
- Comprehensive documentation for developers

The consolidated migration provides the same final schema as the
original migration chain but in a cleaner, more maintainable format.
2025-11-05 13:41:57 +00:00
ualsweb
fb2e1af270 Merge pull request #4 from ualsweb/claude/audit-orchestration-scheduler-011CUpnzhnQBA2aqEg24omEb
Fix all critical orchestration scheduler issues and add improvements
2025-11-05 14:36:03 +01:00
Claude
961bd2328f Fix all critical orchestration scheduler issues and add improvements
This commit addresses all 15 issues identified in the orchestration scheduler analysis:

HIGH PRIORITY FIXES:
1.  Database update methods already in orchestrator service (not in saga)
2.  Add null check for training_client before using it
3.  Fix cron schedule config from "0 5" to "30 5" (5:30 AM)
4.  Standardize on timezone-aware datetime (datetime.now(timezone.utc))
5.  Implement saga compensation logic with actual deletion calls
6.  Extract actual counts from saga results (no placeholders)

MEDIUM PRIORITY FIXES:
7.  Add circuit breakers for inventory/suppliers/recipes clients
8.  Pass circuit breakers to saga and use them in all service calls
9.  Add calling_service_name to AI Insights client
10.  Add database indexes on (tenant_id, started_at) and (status, started_at)
11.  Handle empty shared data gracefully (fail if all 3 fetches fail)

LOW PRIORITY IMPROVEMENTS:
12.  Make notification/validation failures more visible with explicit logging
13.  Track AI insights status in orchestration_runs table
14.  Improve run number generation atomicity using MAX() approach
15.  Optimize tenant ID handling (consistent UUID usage)

CHANGES:
- services/orchestrator/app/core/config.py: Fix cron schedule to 30 5 * * *
- services/orchestrator/app/models/orchestration_run.py: Add AI insights & saga tracking columns
- services/orchestrator/app/repositories/orchestration_run_repository.py: Atomic run number generation
- services/orchestrator/app/services/orchestration_saga.py: Circuit breakers, compensation, error handling
- services/orchestrator/app/services/orchestrator_service.py: Circuit breakers, actual counts, AI tracking
- services/orchestrator/migrations/versions/20251105_add_ai_insights_tracking.py: New migration

All issues resolved. No backwards compatibility. No TODOs. Production-ready.
2025-11-05 13:33:13 +00:00
Claude
8df90338b2 Fix training log race conditions and audit event error
Critical fixes for training session logging:

1. Training log race condition fix:
   - Add explicit session commits after creating training logs
   - Handle duplicate key errors gracefully when multiple sessions
     try to create the same log simultaneously
   - Implement retry logic to query for existing logs after
     duplicate key violations
   - Prevents "Training log not found" errors during training

2. Audit event async generator error fix:
   - Replace incorrect next(get_db()) usage with proper
     async context manager (database_manager.get_session())
   - Fixes "'async_generator' object is not an iterator" error
   - Ensures audit logging works correctly

These changes address race conditions in concurrent database
sessions and ensure training logs are properly synchronized
across the training pipeline.
2025-11-05 13:24:22 +00:00
Claude
5a84be83d6 Fix multiple critical bugs in onboarding training step
This commit addresses all identified bugs and issues in the training code path:

## Critical Fixes:
- Add get_start_time() method to TrainingLogRepository and fix non-existent method call
- Remove duplicate training.started event from API endpoint (trainer publishes the accurate one)
- Add missing progress events for 80-100% range (85%, 92%, 94%) to eliminate progress "dead zone"

## High Priority Fixes:
- Fix division by zero risk in time estimation with double-check and max() safety
- Remove unreachable exception handler in training_operations.py
- Simplify WebSocket token refresh logic to only reconnect on actual user session changes

## Medium Priority Fixes:
- Fix auto-start training effect with useRef to prevent duplicate starts
- Add HTTP polling debounce delay (5s) to prevent race conditions with WebSocket
- Extract all magic numbers to centralized constants files:
  - Backend: services/training/app/core/training_constants.py
  - Frontend: frontend/src/constants/training.ts
- Standardize error logging with exc_info=True on critical errors

## Code Quality Improvements:
- All progress percentages now use named constants
- All timeouts and intervals now use named constants
- Improved code maintainability and readability
- Better separation of concerns

## Files Changed:
- Backend: training_service.py, trainer.py, training_events.py, progress_tracker.py
- Backend: training_operations.py, training_log_repository.py, training_constants.py (new)
- Frontend: training.ts (hooks), MLTrainingStep.tsx, training.ts (constants, new)

All training progress events now properly flow from 0% to 100% with no gaps.
2025-11-05 13:02:39 +00:00
Claude
799e7dbaeb Fix training job concurrent database session conflicts
Root Cause:
- Multiple parallel training tasks (3 at a time) were sharing the same database session
- This caused SQLAlchemy session state conflicts: "Session is already flushing" and "rollback() is already in progress"
- Additionally, duplicate model records were being created by both trainer and training_service

Fixes:
1. Separated model training from database writes:
   - Training happens in parallel (CPU-intensive)
   - Database writes happen sequentially after training completes
   - This eliminates concurrent session access

2. Removed duplicate database writes:
   - Trainer now writes all model records sequentially after parallel training
   - Training service now retrieves models instead of creating duplicates
   - Performance metrics are also created by trainer (no duplicates)

3. Added proper data flow:
   - _train_single_product: Only trains models, stores results
   - _write_training_results_to_database: Sequential DB writes after training
   - _store_trained_models: Changed to retrieve existing models
   - _create_performance_metrics: Changed to verify existing metrics

Benefits:
- Eliminates database session conflicts
- Prevents duplicate model records
- Maintains parallel training performance
- Ensures data consistency

Files Modified:
- services/training/app/ml/trainer.py
- services/training/app/services/training_service.py

Resolves: Onboarding training job database session conflicts
2025-11-05 12:41:42 +00:00
Urtzi Alfaro
394ad3aea4 Improve AI logic 2025-11-05 13:34:56 +01:00
Urtzi Alfaro
5adb0e39c0 Improve the frontend 5 2025-11-02 20:24:44 +01:00
Urtzi Alfaro
0220da1725 Improve the frontend 4 2025-11-01 21:35:03 +01:00
Urtzi Alfaro
f44d235c6d Add user delete process 2 2025-10-31 18:57:58 +01:00
Urtzi Alfaro
269d3b5032 Add user delete process 2025-10-31 11:54:19 +01:00
Urtzi Alfaro
63f5c6d512 Improve the frontend 3 2025-10-30 21:08:07 +01:00
Urtzi Alfaro
36217a2729 Improve the frontend 2 2025-10-29 06:58:05 +01:00
Urtzi Alfaro
858d985c92 Improve the frontend modals 2025-10-27 16:33:26 +01:00
Urtzi Alfaro
61376b7a9f Improve the frontend and fix TODOs 2025-10-24 13:05:04 +02:00
Urtzi Alfaro
07c33fa578 Improve the frontend and repository layer 2025-10-23 07:44:54 +02:00
Urtzi Alfaro
8d30172483 Improve the frontend 2025-10-21 19:50:07 +02:00
Urtzi Alfaro
05da20357d Improve teh securty of teh DB 2025-10-19 19:22:37 +02:00
Urtzi Alfaro
62971c07d7 Update landing page 2025-10-18 16:03:23 +02:00
Urtzi Alfaro
312e36c893 Update requirements and insfra versions 2025-10-17 23:09:40 +02:00
Urtzi Alfaro
7e089b80cf Improve public pages 2025-10-17 18:14:28 +02:00
Urtzi Alfaro
d4060962e4 Improve demo seed 2025-10-17 07:31:14 +02:00
Urtzi Alfaro
b6cb800758 Improve GDPR implementation 2025-10-16 07:28:04 +02:00
Urtzi Alfaro
dbb48d8e2c Improve the sales import 2025-10-15 21:09:42 +02:00
Urtzi Alfaro
8f9e9a7edc Add role-based filtering and imporve code 2025-10-15 16:12:49 +02:00
Urtzi Alfaro
96ad5c6692 Refactor datetime and timezone utils 2025-10-12 23:16:04 +02:00
Urtzi Alfaro
7556a00db7 Improve the demo feature of the project 2025-10-12 18:47:33 +02:00
Urtzi Alfaro
dbc7f2fa0d Re-create migrations init tables 2025-10-09 20:47:31 +02:00
Urtzi Alfaro
b420af32c5 REFACTOR production scheduler 2025-10-09 18:01:24 +02:00
Urtzi Alfaro
3c689b4f98 REFACTOR external service and improve websocket training 2025-10-09 14:11:02 +02:00
Urtzi Alfaro
7c72f83c51 REFACTOR ALL APIs fix 1 2025-10-07 07:15:07 +02:00
Urtzi Alfaro
38fb98bc27 REFACTOR ALL APIs 2025-10-06 15:27:01 +02:00
Urtzi Alfaro
dc8221bd2f Add DEMO feature to the project 2025-10-03 14:09:34 +02:00
Urtzi Alfaro
1243c2ca6d Add fixes to procurement logic and fix rel-time connections 2025-10-02 13:20:30 +02:00
Urtzi Alfaro
c9d8d1d071 Fix onboarding process not getting the subcription plan 2025-10-01 21:56:38 +02:00
Urtzi Alfaro
0fdc3b0211 Fix issues 2025-10-01 16:25:53 +02:00
Urtzi Alfaro
36b44c41f1 Fix issues 2025-10-01 14:39:10 +02:00
Urtzi Alfaro
6fa655275f Fix notification service health issues 2025-10-01 12:28:00 +02:00
Urtzi Alfaro
2eeebfc1e0 Fix Alembic issue 2025-10-01 11:24:06 +02:00
Urtzi Alfaro
7cc4b957a5 Fix DB issue 2s 2025-09-30 21:58:10 +02:00
Urtzi Alfaro
147893015e Fix DB issues 2025-09-30 13:32:51 +02:00
Urtzi Alfaro
ec6bcb4c7d Add migration services 2025-09-30 08:12:45 +02:00
Urtzi Alfaro
2712a60a2a Refactor services alembic 2025-09-29 19:16:34 +02:00
Urtzi Alfaro
befcc126b0 Refactor all main.py 2025-09-29 13:13:12 +02:00
Urtzi Alfaro
4777e59e7a Add base kubernetes support final fix 4 2025-09-29 07:54:25 +02:00
Urtzi Alfaro
57f77638cc Add base kubernetes support final fix 2 2025-09-28 19:48:05 +02:00
Urtzi Alfaro
3816383760 Add base kubernetes support final 2025-09-28 13:54:28 +02:00
Urtzi Alfaro
b95ecf1c53 Add base kubernetes support 5 2025-09-27 22:55:42 +02:00
Urtzi Alfaro
222f945466 Add base kubernetes support 2 2025-09-27 12:10:43 +02:00