Add POI feature and improve the overall backend implementation

Urtzi Alfaro
2025-11-12 15:34:10 +01:00
parent e8096cd979
commit 5783c7ed05
173 changed files with 16862 additions and 9078 deletions


@@ -38,7 +38,8 @@ The **Training Service** is the machine learning pipeline engine of Bakery-IA, r
### Feature Engineering
- **Historical Data Aggregation** - Collect sales data for model training
- **External Data Integration** - Fetch weather, traffic, holiday data
- **Feature Extraction** - Generate 20+ temporal and contextual features
- **POI Feature Integration** - Merge location-based POI features into training data
- **Feature Extraction** - Generate 30+ temporal, contextual, and location-based features
- **Data Validation** - Ensure minimum data requirements before training
- **Outlier Detection** - Filter anomalous data points
@@ -61,6 +62,10 @@ async def train_model_pipeline(tenant_id: str, product_id: str):
    weather_data = await fetch_weather_data(tenant_id)
    traffic_data = await fetch_traffic_data(tenant_id)
    holiday_data = await fetch_holiday_calendar()
    poi_features = await fetch_poi_features(tenant_id)  # NEW: Location context
    # Merge POI features into training dataframe
    features = merge_poi_features(features, poi_features)
    # Step 3: Prophet Model Training
    model = Prophet(
@@ -533,11 +538,56 @@ async def test_training_progress():
asyncio.run(test_training_progress())
```
## POI Feature Integration
### How POI Features Enhance Training
The Training Service integrates location-based point-of-interest (POI) features from the External Service to improve forecast accuracy:
**POI Features Included:**
- `school_density` - Number of schools within a 1 km radius
- `office_density` - Number of offices and business centers nearby
- `residential_density` - Residential area proximity
- `transport_hub_proximity` - Distance to metro, bus, train stations
- `commercial_zone_score` - Commercial activity in the area
- `restaurant_density` - Nearby restaurants and cafes
- `competitor_proximity` - Distance to competing bakeries
- And 11+ more location-based features
**Integration Process:**
1. **Fetch POI Context** - Retrieve the tenant's POI features from the External Service (`/poi-context/{tenant_id}`)
2. **Extract ML Features** - Parse `ml_features` JSON object from POI context
3. **Merge with Training Data** - Add POI features as additional columns in training dataframe
4. **Prophet Training** - Include POI features as regressors in Prophet model
5. **Feature Importance** - Track which POI features most impact predictions
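
Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the payload shape (a top-level `ml_features` object mapping feature names to numbers) is assumed from the description above, the helper names `extract_ml_features` and `attach_poi_features` are hypothetical, and plain dictionaries stand in for dataframe rows so the example runs with only the standard library.

```python
import json
from numbers import Real


def extract_ml_features(poi_context_json: str) -> dict[str, float]:
    """Pull numeric entries out of the `ml_features` object (assumed payload shape)."""
    context = json.loads(poi_context_json)
    raw = context.get("ml_features") or {}
    # Keep only real numbers; bool is a Real subclass, so exclude it explicitly.
    return {
        name: float(value)
        for name, value in raw.items()
        if isinstance(value, Real) and not isinstance(value, bool)
    }


def attach_poi_features(rows: list[dict], poi_features: dict[str, float]) -> list[dict]:
    """Return new rows with the tenant-level POI values added as extra columns."""
    # POI features describe the location, so every row gets the same values.
    return [{**row, **poi_features} for row in rows]


# Hypothetical payload and training rows, for illustration only.
payload = (
    '{"tenant_id": "t-1", "ml_features": '
    '{"school_density": 3, "office_density": 12.5, "zone": "centro"}}'
)
rows = [{"ds": "2025-11-10", "y": 42.0}, {"ds": "2025-11-11", "y": 38.0}]

poi_features = extract_ml_features(payload)
training_rows = attach_poi_features(rows, poi_features)
```

Non-numeric entries such as `zone` are dropped here because they could not be used directly as numeric model inputs. If a POI feature name collides with an existing column, this sketch lets the POI value win; a real integration would likely want to detect that case instead.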
**Example POI Feature Integration:**
```python
from app.ml.poi_feature_integrator import POIFeatureIntegrator
# Initialize POI integrator
poi_integrator = POIFeatureIntegrator(external_service_url)
# Fetch and merge POI features
poi_features = await poi_integrator.fetch_poi_features(tenant_id)
training_df = poi_integrator.merge_poi_features(training_df, poi_features)
# POI features now available as columns:
# training_df['school_density'], training_df['office_density'], etc.
# Add POI features as Prophet regressors
for feature_name in poi_features:
    prophet_model.add_regressor(feature_name)
```
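
Prophet expects every registered regressor to be present as a column both in the dataframe passed to `fit` and in the future dataframe passed to `predict`, so it can help to filter the feature names before registering them. The helper below is a hypothetical sketch of such a check (`select_regressor_names` is not part of the codebase shown here); it uses plain lists and dictionaries so it runs without Prophet installed.

```python
def select_regressor_names(columns, poi_features):
    """Return POI feature names that exist as columns and can be registered."""
    reserved = {"ds", "y"}  # Prophet's date and target column names
    available = set(columns)
    return sorted(
        name
        for name in poi_features
        if name in available and name not in reserved
    )


# Illustrative values only.
columns = ["ds", "y", "school_density", "office_density"]
poi_features = {
    "school_density": 3.0,
    "office_density": 12.5,
    "competitor_proximity": 0.4,  # not merged into the columns above
}

regressor_names = select_regressor_names(columns, poi_features)

# With a Prophet model in hand, registration would then look like:
# for name in regressor_names:
#     prophet_model.add_regressor(name)
```

Sorting the names keeps the regressor order stable between training runs, which makes comparing trained models easier.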
**Endpoint Used:**
- `GET {EXTERNAL_SERVICE_URL}/poi-context/{tenant_id}` - Fetch POI features
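
As a rough illustration of calling this endpoint from async code, the sketch below builds the URL and performs the request with only the standard library. The function names, the timeout default, and the example host `external-service:8000` are assumptions made for this example; the actual service may well use a dedicated HTTP client instead.

```python
import asyncio
import json
import urllib.request
from urllib.parse import quote


def poi_context_url(base_url: str, tenant_id: str) -> str:
    """Build the POI context URL for a tenant."""
    # Strip a trailing slash from the base and escape the tenant id.
    return f"{base_url.rstrip('/')}/poi-context/{quote(tenant_id, safe='')}"


def _get_json(url: str, timeout: float) -> dict:
    # Blocking GET; the response body is parsed as JSON.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


async def fetch_poi_context(base_url: str, tenant_id: str, timeout: float = 10.0) -> dict:
    """Fetch the POI context without blocking the event loop."""
    url = poi_context_url(base_url, tenant_id)
    # Run the blocking call in a worker thread (Python 3.9+).
    return await asyncio.to_thread(_get_json, url, timeout)


# Hypothetical base URL and tenant id, for illustration only.
example_url = poi_context_url("http://external-service:8000/", "tenant-123")
```

Calling `await fetch_poi_context(base_url, tenant_id)` inside the training pipeline would return the parsed POI context, from which the `ml_features` object can then be read.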
## Integration Points
### Dependencies (Services Called)
- **Sales Service** - Fetch historical sales data for training
- **External Service** - Fetch weather, traffic, holiday data
- **External Service** - Fetch weather, traffic, holiday, and POI feature data
- **PostgreSQL** - Store job queue, models, metrics, logs
- **RabbitMQ** - Publish training completion events
- **File System** - Store model artifacts