# Training Service - Developer Guide
## Quick Reference for Common Tasks
### Using Constants
Always use constants instead of magic numbers:
```python
from app.core import constants as const
# ✅ Good
if len(sales_data) < const.MIN_DATA_POINTS_REQUIRED:
    raise ValueError("Insufficient data")

# ❌ Bad
if len(sales_data) < 30:
    raise ValueError("Insufficient data")
```
### Timezone Handling
Always use the timezone utilities instead of manipulating `tzinfo` by hand:
```python
from app.utils.timezone_utils import ensure_timezone_aware, prepare_prophet_datetime
# ✅ Good - Ensure timezone-aware
dt = ensure_timezone_aware(user_input_date)
# ✅ Good - Prepare for Prophet
df = prepare_prophet_datetime(df, 'ds')

# ❌ Bad - Manual timezone handling
if dt.tzinfo is None:
    dt = dt.replace(tzinfo=timezone.utc)
```
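This guide does not show how the helpers are implemented. As a rough illustration only, a helper like `ensure_timezone_aware` could behave as in the sketch below. The function name `ensure_timezone_aware_sketch` is hypothetical, and treating naive values as UTC is an assumption taken from the manual snippet above, not from the real module.

```python
from datetime import datetime, timezone


def ensure_timezone_aware_sketch(dt: datetime) -> datetime:
    """Hypothetical stand-in for ensure_timezone_aware.

    Assumption: naive datetimes are interpreted as UTC; aware ones are
    returned unchanged.
    """
    if dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt


naive = datetime(2025, 10, 7, 12, 0)
aware = ensure_timezone_aware_sketch(naive)
```

Keeping this logic in one function means the "naive means UTC" decision lives in a single place instead of being repeated at every call site.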
### Error Handling
Raise an exception on failure; never return an empty list that hides the error from the caller:
```python
# ✅ Good
if not data:
    raise ValueError(f"No data available for {tenant_id}")

# ❌ Bad
if not data:
    logger.error("No data")
    return []
```
### Database Sessions
Call `get_session()` exactly once and use the result as an async context manager:
```python
# ✅ Good
async with self.database_manager.get_session() as session:
    await session.execute(query)

# ❌ Bad
async with self.database_manager.get_session()() as session:  # Double call!
    await session.execute(query)
```
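To see why the double call fails, here is a small self-contained sketch. `FakeDatabaseManager` is a hypothetical stand-in, and it assumes `get_session()` is built with `contextlib.asynccontextmanager`; the real manager may be implemented differently.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeDatabaseManager:
    """Hypothetical stand-in for the service's database manager."""

    @asynccontextmanager
    async def get_session(self):
        # A real manager would open a session here and close it afterwards.
        yield "session"


async def demo() -> tuple[str, str]:
    manager = FakeDatabaseManager()

    # Correct: a single call returns an async context manager.
    async with manager.get_session() as session:
        good = session

    # Incorrect: the extra parentheses call the returned context manager
    # object with no arguments, which raises TypeError.
    try:
        async with manager.get_session()() as session:
            bad = "unexpectedly worked"
    except TypeError as exc:
        bad = type(exc).__name__
    return good, bad


good, bad = asyncio.run(demo())
```

Under these assumptions the mistake surfaces as a `TypeError` at the `async with` line, before any query runs.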
### Parallel Execution
Use `asyncio.gather` for concurrent operations:
```python
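# ✅ Good - Parallel
tasks = [train_product(pid) for pid in product_ids]
# With return_exceptions=True, failures are returned as exception objects
# inside `results` instead of being raised, so inspect each result.
results = await asyncio.gather(*tasks, return_exceptions=True)

# ❌ Bad - Sequential
results = []
for pid in product_ids:
    result = await train_product(pid)
    results.append(result)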
```
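Because `return_exceptions=True` places failures inside the result list, the caller still has to separate them from successes, otherwise errors are silently dropped. The sketch below shows one way to do that; `train_product` here is a hypothetical stand-in for the real coroutine, and the product IDs are made up.

```python
import asyncio


async def train_product(product_id: str) -> dict:
    """Hypothetical stand-in for the real per-product training coroutine."""
    await asyncio.sleep(0)
    if product_id == "bad":
        raise ValueError(f"No data available for {product_id}")
    return {"product_id": product_id, "status": "trained"}


async def train_all(product_ids: list[str]) -> tuple[dict, dict]:
    tasks = [train_product(pid) for pid in product_ids]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    succeeded, failed = {}, {}
    # gather preserves input order, so results line up with product_ids.
    for pid, result in zip(product_ids, results):
        if isinstance(result, BaseException):
            failed[pid] = result
        else:
            succeeded[pid] = result
    return succeeded, failed


succeeded, failed = asyncio.run(train_all(["bread", "bad", "croissant"]))
```

One failing product does not cancel the others, and the `failed` mapping can then be logged or re-raised in line with the error-handling rule above.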
### HTTP Client Configuration
Timeouts are configured automatically in `DataClient`:
```python
# No need to configure timeouts manually
# They're set in DataClient.__init__() using constants
client = DataClient() # Timeouts already configured
```
## File Organization
### Core Modules
- `core/constants.py` - All configuration constants
- `core/config.py` - Service settings
- `core/database.py` - Database configuration
### Utilities
- `utils/timezone_utils.py` - Timezone handling functions
- `utils/__init__.py` - Utility exports
### ML Components
- `ml/trainer.py` - Main training orchestration
- `ml/prophet_manager.py` - Prophet model management
- `ml/data_processor.py` - Data preprocessing
### Services
- `services/data_client.py` - External service communication
- `services/training_service.py` - Training job management
- `services/training_orchestrator.py` - Training pipeline coordination
## Common Pitfalls
### ❌ Don't Create Legacy Aliases
```python
# ❌ Bad
OldClassName = MyNewClass  # Legacy alias for the old name - removed!
```
### ❌ Don't Use Magic Numbers
```python
# ❌ Bad
if score > 0.8:  # What does 0.8 mean?
    ...

# ✅ Good
if score > const.IMPROVEMENT_SIGNIFICANCE_THRESHOLD:
    ...
```
### ❌ Don't Return Empty Lists on Error
```python
# ❌ Bad
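try:
    ...  # operation that may fail
except Exception as e:
    logger.error(f"Failed: {e}")
    return []

# ✅ Good
try:
    ...  # operation that may fail
except Exception as e:
    logger.error(f"Failed: {e}")
    # `from e` keeps the original traceback attached to the new error
    raise RuntimeError(f"Operation failed: {e}") from e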
```
### ❌ Don't Handle Timezones Manually
```python
# ❌ Bad
if dt.tzinfo is None:
    dt = dt.replace(tzinfo=timezone.utc)

# ✅ Good
from app.utils.timezone_utils import ensure_timezone_aware
dt = ensure_timezone_aware(dt)
```
## Testing Checklist
Before submitting code:
- [ ] All magic numbers replaced with constants
- [ ] Timezone handling uses utility functions
- [ ] Errors raise exceptions (rather than returning empty collections)
- [ ] Database sessions use single `get_session()` call
- [ ] Parallel operations use `asyncio.gather`
- [ ] No legacy compatibility aliases
- [ ] No commented-out code
- [ ] Logging uses structured logging
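The checklist does not say which logging library the service uses. As an illustration of what "structured" means here, the sketch below emits key/value fields with the standard library only. `JsonFormatter`, the logger name, and the field names are all hypothetical, and the service may well use a dedicated library instead.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: emits each record as one JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "event": record.getMessage(),
            # Fields passed via `extra=` become attributes on the record.
            "tenant_id": getattr(record, "tenant_id", None),
            "product_id": getattr(record, "product_id", None),
        }
        return json.dumps(payload)


stream = io.StringIO()  # in-memory sink so the example is self-contained
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("training.sketch")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

# Structured: a fixed event name plus separate fields, no f-string message.
logger.info("training_failed", extra={"tenant_id": "t-1", "product_id": "p-9"})
logged = json.loads(stream.getvalue())
```

Keeping identifiers in separate fields rather than interpolating them into the message makes the logs filterable by tenant or product.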
## Performance Guidelines
### Training Jobs
- ✅ Use parallel execution for multiple products
- ✅ Reduce Optuna trials for low-volume products
- ✅ Use constants for all thresholds
- ⚠️ Monitor memory usage during parallel training
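One common way to keep memory in check during parallel training is to cap how many jobs run at once with `asyncio.Semaphore`. The sketch below is a suggestion under stated assumptions, not the service's actual mechanism: `MAX_CONCURRENT_TRAININGS`, `train_bounded`, and the stand-in `train_product` are hypothetical names.

```python
import asyncio

MAX_CONCURRENT_TRAININGS = 2  # hypothetical limit; tune to available memory


async def train_product(product_id: str, tracker: dict) -> str:
    """Hypothetical stand-in that records how many jobs run at once."""
    tracker["running"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["running"])
    await asyncio.sleep(0.01)  # simulates the training work
    tracker["running"] -= 1
    return product_id


async def train_bounded(product_ids: list[str]) -> tuple[list, int]:
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_TRAININGS)
    tracker = {"running": 0, "peak": 0}

    async def guarded(pid: str) -> str:
        # At most MAX_CONCURRENT_TRAININGS coroutines pass this point at once.
        async with semaphore:
            return await train_product(pid, tracker)

    results = await asyncio.gather(
        *(guarded(pid) for pid in product_ids), return_exceptions=True
    )
    return results, tracker["peak"]


results, peak = asyncio.run(train_bounded(["a", "b", "c", "d", "e"]))
```

All products are still scheduled through a single `asyncio.gather` call, but the peak number of simultaneous trainings never exceeds the limit.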
### Database Operations
- ✅ Use repository pattern
- ✅ Batch operations when possible
- ✅ Close sessions properly
- ⚠️ Connection pool limits not yet configured
### HTTP Requests
- ✅ Timeouts configured automatically
- ✅ Use shared clients from `shared/clients`
- ⚠️ Circuit breaker not yet implemented
- ⚠️ Request retries delegated to base client
## Debugging Tips
### Training Failures
1. Check logs for data validation errors
2. Verify timezone consistency in date ranges
3. Check minimum data point requirements
4. Review Prophet error messages
### Performance Issues
1. Check if parallel training is being used
2. Verify Optuna trial counts
3. Monitor database connection usage
4. Check HTTP timeout configurations
### Data Quality Issues
1. Review validation errors in logs
2. Check zero-ratio thresholds
3. Verify product classification
4. Review date range alignment
## Migration from Old Code
### If You Find Legacy Code
1. Check whether a legacy alias still exists (it should have been removed)
2. Update imports to use new names
3. Remove backward compatibility wrappers
4. Update documentation
### If You Find Magic Numbers
1. Add constant to `core/constants.py`
2. Update usage to reference constant
3. Document what the number represents
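The three steps above can be sketched as follows. The constant names come from this guide, but the values are only the literals from its "Bad" examples, the comment describing the threshold is a guess, and `validate_sales_data` is a hypothetical caller.

```python
# Sketch of entries in core/constants.py; values are illustrative only.

# Minimum number of sales records needed before training
# (30 is the literal used in this guide's "Bad" example).
MIN_DATA_POINTS_REQUIRED = 30

# Assumed meaning: score above which a result counts as a real improvement
# (0.8 is the literal used in this guide's "Bad" example).
IMPROVEMENT_SIGNIFICANCE_THRESHOLD = 0.8


def validate_sales_data(sales_data: list) -> None:
    """Hypothetical caller that references the constant, not a literal."""
    if len(sales_data) < MIN_DATA_POINTS_REQUIRED:
        raise ValueError(
            f"Insufficient data: {len(sales_data)} points, "
            f"need at least {MIN_DATA_POINTS_REQUIRED}"
        )
```

In the service itself the caller would import the module (`from app.core import constants as const`) rather than define the constants inline as this single-file sketch does.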
### If You Find Manual Timezone Handling
1. Import from `utils/timezone_utils`
2. Use appropriate utility function
3. Remove manual implementation
## Getting Help
- Review `IMPLEMENTATION_SUMMARY.md` for recent changes
- Check constants in `core/constants.py` for configuration
- Look at `utils/timezone_utils.py` for timezone functions
- Refer to the analysis report for architectural decisions
---
*Last Updated: 2025-10-07*
*Status: Current*