REFACTOR external service and improve websocket training
230
services/training/DEVELOPER_GUIDE.md
Normal file
@@ -0,0 +1,230 @@
# Training Service - Developer Guide

## Quick Reference for Common Tasks

### Using Constants
Always use constants instead of magic numbers:

```python
from app.core import constants as const

# ✅ Good
if len(sales_data) < const.MIN_DATA_POINTS_REQUIRED:
    raise ValueError("Insufficient data")

# ❌ Bad
if len(sales_data) < 30:
    raise ValueError("Insufficient data")
```
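
For orientation, a constants module along these lines keeps each threshold named and documented in one place. This is only a sketch: the two constant names appear in this guide, but the values are borrowed from the "bad" examples (30 and 0.8) for illustration, and the helper `validate_sales_data` is hypothetical. The real `core/constants.py` may differ.

```python
from typing import Final, Sequence

# Illustrative values only; check core/constants.py for the real ones.
MIN_DATA_POINTS_REQUIRED: Final[int] = 30  # minimum rows needed to train
IMPROVEMENT_SIGNIFICANCE_THRESHOLD: Final[float] = 0.8  # assumed value


def validate_sales_data(sales_data: Sequence[object]) -> None:
    """Hypothetical helper: raise if there are too few data points."""
    if len(sales_data) < MIN_DATA_POINTS_REQUIRED:
        raise ValueError(
            f"Insufficient data: got {len(sales_data)}, "
            f"need at least {MIN_DATA_POINTS_REQUIRED}"
        )
```

Including the constant in the error message means the log line stays accurate if the threshold is later changed.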

### Timezone Handling
Always use timezone utilities:

```python
from app.utils.timezone_utils import ensure_timezone_aware, prepare_prophet_datetime

# ✅ Good - Ensure timezone-aware
dt = ensure_timezone_aware(user_input_date)

# ✅ Good - Prepare for Prophet
df = prepare_prophet_datetime(df, 'ds')

# ❌ Bad - Manual timezone handling
if dt.tzinfo is None:
    dt = dt.replace(tzinfo=timezone.utc)
```
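
To make the intent concrete, here is a rough sketch of what a helper like `ensure_timezone_aware` could do. It assumes naive datetimes should be interpreted as UTC (as the manual version above does); the `default_tz` parameter is an invention for this example, and the actual implementation in `utils/timezone_utils.py` may behave differently.

```python
from datetime import datetime, timezone


def ensure_timezone_aware(dt: datetime, default_tz: timezone = timezone.utc) -> datetime:
    """Sketch: attach default_tz to naive datetimes, leave aware ones untouched."""
    # utcoffset() is None for naive datetimes, including ones whose tzinfo
    # object exists but reports no offset.
    if dt.tzinfo is None or dt.utcoffset() is None:
        return dt.replace(tzinfo=default_tz)
    return dt


naive = datetime(2025, 10, 7, 12, 0)
aware = ensure_timezone_aware(naive)
```

Centralizing this check means a future change to the default timezone happens in one function rather than at every call site.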

### Error Handling
Always raise an exception on failure. Never return an empty list, because callers cannot tell it apart from a legitimately empty result:

```python
# ✅ Good
if not data:
    raise ValueError(f"No data available for {tenant_id}")

# ❌ Bad
if not data:
    logger.error("No data")
    return []
```

### Database Sessions
Call `get_session()` exactly once when entering the context manager:

```python
# ✅ Good
async with self.database_manager.get_session() as session:
    await session.execute(query)

# ❌ Bad
async with self.database_manager.get_session()() as session:  # Double call!
    await session.execute(query)
```
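
The sketch below shows why the single call matters, assuming `get_session()` is built with `contextlib.asynccontextmanager`. `FakeSession` and `FakeDatabaseManager` are hypothetical stand-ins; the real database manager may be implemented differently.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, List


class FakeSession:
    """Stand-in for a real database session (hypothetical)."""

    def __init__(self) -> None:
        self.closed = False
        self.executed: List[str] = []

    async def execute(self, query: str) -> None:
        self.executed.append(query)

    async def close(self) -> None:
        self.closed = True


class FakeDatabaseManager:
    """Hypothetical manager; the real one may be implemented differently."""

    @asynccontextmanager
    async def get_session(self) -> AsyncIterator[FakeSession]:
        session = FakeSession()
        try:
            yield session
        finally:
            # Runs on normal exit and on exceptions, so the session always closes.
            await session.close()


async def run_query(manager: FakeDatabaseManager, query: str) -> FakeSession:
    async with manager.get_session() as session:  # single call
        await session.execute(query)
    return session


session = asyncio.run(run_query(FakeDatabaseManager(), "SELECT 1"))
```

In this sketch, `get_session()()` raises `TypeError`, because the object returned by the first call is a context manager rather than a session factory.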

### Parallel Execution
Use `asyncio.gather` for concurrent operations:

```python
# ✅ Good - Parallel
tasks = [train_product(pid) for pid in product_ids]
results = await asyncio.gather(*tasks, return_exceptions=True)

# ❌ Bad - Sequential
results = []
for pid in product_ids:
    result = await train_product(pid)
    results.append(result)
```
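
With `return_exceptions=True`, failures come back as exception objects inside the results list, so the caller has to separate them from successful results. The sketch below does that and also caps concurrency with a semaphore, which is one possible way to keep memory use in check. `MAX_CONCURRENT_TRAININGS`, `train_all`, and the placeholder `train_product` are assumptions made for this example.

```python
import asyncio
from typing import Dict, List, Tuple

MAX_CONCURRENT_TRAININGS = 3  # hypothetical limit, not from the codebase


async def train_product(product_id: str) -> str:
    """Placeholder for the real training coroutine."""
    await asyncio.sleep(0)
    if product_id == "bad":
        raise ValueError(f"No data available for {product_id}")
    return f"model-{product_id}"


async def train_all(
    product_ids: List[str],
) -> Tuple[Dict[str, str], Dict[str, BaseException]]:
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_TRAININGS)

    async def bounded(pid: str) -> str:
        async with semaphore:  # at most MAX_CONCURRENT_TRAININGS run at once
            return await train_product(pid)

    results = await asyncio.gather(
        *(bounded(pid) for pid in product_ids), return_exceptions=True
    )

    trained: Dict[str, str] = {}
    failed: Dict[str, BaseException] = {}
    # gather preserves input order, so zip pairs each id with its own result.
    for pid, result in zip(product_ids, results):
        if isinstance(result, BaseException):
            failed[pid] = result
        else:
            trained[pid] = result
    return trained, failed


trained, failed = asyncio.run(train_all(["a", "bad", "b"]))
```

One failing product no longer aborts the whole batch, and the `failed` mapping can be logged or surfaced to the caller instead of being silently dropped.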

### HTTP Client Configuration
Timeouts are configured automatically in `DataClient`:

```python
# No need to configure timeouts manually
# They're set in DataClient.__init__() using constants
client = DataClient()  # Timeouts already configured
```

## File Organization

### Core Modules
- `core/constants.py` - All configuration constants
- `core/config.py` - Service settings
- `core/database.py` - Database configuration

### Utilities
- `utils/timezone_utils.py` - Timezone handling functions
- `utils/__init__.py` - Utility exports

### ML Components
- `ml/trainer.py` - Main training orchestration
- `ml/prophet_manager.py` - Prophet model management
- `ml/data_processor.py` - Data preprocessing

### Services
- `services/data_client.py` - External service communication
- `services/training_service.py` - Training job management
- `services/training_orchestrator.py` - Training pipeline coordination

## Common Pitfalls

### ❌ Don't Create Legacy Aliases
```python
# ❌ Bad
OldClassName = MyNewClass  # Legacy alias for the old name; these were removed
```

### ❌ Don't Use Magic Numbers
```python
# ❌ Bad
if score > 0.8:  # What does 0.8 mean?
    ...

# ✅ Good
if score > const.IMPROVEMENT_SIGNIFICANCE_THRESHOLD:
    ...
```

### ❌ Don't Return Empty Lists on Error
```python
# ❌ Bad
except Exception as e:
    logger.error(f"Failed: {e}")
    return []

# ✅ Good
except Exception as e:
    logger.error(f"Failed: {e}")
    # "from e" keeps the original exception attached as the cause
    raise RuntimeError(f"Operation failed: {e}") from e
```
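
When re-raising, chaining with `raise ... from e` keeps the original exception available as `__cause__`, which tends to make tracebacks easier to follow. A small self-contained sketch follows; `DataFetchError` and `parse_sales_rows` are hypothetical names used only for illustration.

```python
import logging

logger = logging.getLogger(__name__)


class DataFetchError(RuntimeError):
    """Hypothetical domain error; the service may define its own hierarchy."""


def parse_sales_rows(raw_rows):
    """Convert raw rows to floats, raising instead of returning []."""
    try:
        return [float(row["quantity"]) for row in raw_rows]
    except (KeyError, TypeError, ValueError) as e:
        logger.error("Failed to parse sales rows: %s", e)
        # "from e" stores the original exception as __cause__.
        raise DataFetchError(f"Operation failed: {e}") from e
```

Catching a narrow set of exception types, as here, also avoids masking unrelated bugs behind a generic failure message.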

### ❌ Don't Handle Timezones Manually
```python
# ❌ Bad
if dt.tzinfo is None:
    dt = dt.replace(tzinfo=timezone.utc)

# ✅ Good
from app.utils.timezone_utils import ensure_timezone_aware
dt = ensure_timezone_aware(dt)
```

## Testing Checklist

Before submitting code:
- [ ] All magic numbers replaced with constants
- [ ] Timezone handling uses utility functions
- [ ] Errors raise exceptions (rather than returning empty collections)
- [ ] Database sessions use a single `get_session()` call
- [ ] Parallel operations use `asyncio.gather`
- [ ] No legacy compatibility aliases
- [ ] No commented-out code
- [ ] Logging uses structured logging
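
As an illustration of the structured-logging item, the sketch below emits one JSON object per log call using only the standard library. This guide does not name the logging library the service uses, so treat `JsonFormatter`, the logger name, and the field names as assumptions rather than the project's actual setup.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object (illustrative only)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "event": record.getMessage(),
            # Fields passed via `extra=` become attributes on the record.
            "tenant_id": getattr(record, "tenant_id", None),
            "product_id": getattr(record, "product_id", None),
        }
        return json.dumps(payload)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("training.example")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info("training_started", extra={"tenant_id": "t-1", "product_id": "p-9"})
```

Keeping identifiers in separate fields, instead of interpolating them into the message string, lets log tooling filter by tenant or product.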

## Performance Guidelines

### Training Jobs
- ✅ Use parallel execution for multiple products
- ✅ Reduce Optuna trials for low-volume products
- ✅ Use constants for all thresholds
- ⚠️ Monitor memory usage during parallel training
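
One way to apply the Optuna guideline above is to choose the trial budget from a product's sales volume. The sketch below is an assumption-heavy illustration: the constant names, their values, and `choose_trial_count` are all hypothetical, and real thresholds would belong in `core/constants.py`.

```python
# All names and numbers below are hypothetical; the real thresholds belong
# in core/constants.py.
LOW_VOLUME_DAILY_SALES = 5.0
OPTUNA_TRIALS_LOW_VOLUME = 10
OPTUNA_TRIALS_DEFAULT = 50


def choose_trial_count(daily_sales):
    """Pick a smaller trial budget when average daily sales are low."""
    if not daily_sales:
        raise ValueError("No sales data available")
    mean_sales = sum(daily_sales) / len(daily_sales)
    if mean_sales < LOW_VOLUME_DAILY_SALES:
        return OPTUNA_TRIALS_LOW_VOLUME
    return OPTUNA_TRIALS_DEFAULT
```

The returned value would typically be passed as `n_trials` to Optuna's `study.optimize(...)`.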

### Database Operations
- ✅ Use repository pattern
- ✅ Batch operations when possible
- ✅ Close sessions properly
- ⚠️ Connection pool limits not yet configured

### HTTP Requests
- ✅ Timeouts configured automatically
- ✅ Use shared clients from `shared/clients`
- ⚠️ Circuit breaker not yet implemented
- ⚠️ Request retries delegated to base client

## Debugging Tips

### Training Failures
1. Check logs for data validation errors
2. Verify timezone consistency in date ranges
3. Check minimum data point requirements
4. Review Prophet error messages

### Performance Issues
1. Check if parallel training is being used
2. Verify Optuna trial counts
3. Monitor database connection usage
4. Check HTTP timeout configurations

### Data Quality Issues
1. Review validation errors in logs
2. Check zero-ratio thresholds
3. Verify product classification
4. Review date range alignment
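
When checking zero-ratio thresholds, it can help to compute the ratio by hand for a suspect product. The sketch below assumes "zero ratio" means the fraction of observations with zero sales; `MAX_ZERO_RATIO` and its value are hypothetical, and the service's own definition may differ.

```python
MAX_ZERO_RATIO = 0.7  # hypothetical threshold, not taken from the codebase


def zero_ratio(quantities):
    """Fraction of observations that are exactly zero."""
    if not quantities:
        raise ValueError("No data available")
    return sum(1 for q in quantities if q == 0) / len(quantities)


sample = [0, 0, 3, 0, 5, 0, 0, 0, 0, 0]
ratio = zero_ratio(sample)
is_intermittent = ratio > MAX_ZERO_RATIO
```

Comparing the computed ratio with the configured constant shows quickly whether a product was excluded or reclassified because of sparse sales.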

## Migration from Old Code

### If You Find Legacy Code
1. Check whether an alias exists (aliases should be removed)
2. Update imports to use new names
3. Remove backward compatibility wrappers
4. Update documentation

### If You Find Magic Numbers
1. Add a constant to `core/constants.py`
2. Update usage to reference the constant
3. Document what the number represents

### If You Find Manual Timezone Handling
1. Import from `utils/timezone_utils`
2. Use the appropriate utility function
3. Remove the manual implementation

## Getting Help

- Review `IMPLEMENTATION_SUMMARY.md` for recent changes
- Check constants in `core/constants.py` for configuration
- Look at `utils/timezone_utils.py` for timezone functions
- Refer to the analysis report for architectural decisions

---

*Last Updated: 2025-10-07*

*Status: Current*