Improve the demo feature of the project
3
services/external/Dockerfile
vendored
@@ -28,8 +28,7 @@ COPY --from=shared /shared /app/shared
# Copy application code
COPY services/external/ .

# Copy scripts directory
COPY scripts/ /app/scripts/

# Add shared libraries to Python path
ENV PYTHONPATH="/app:/app/shared:${PYTHONPATH:-}"

477
services/external/IMPLEMENTATION_COMPLETE.md
vendored
@@ -1,477 +0,0 @@
# External Data Service - Implementation Complete

## ✅ Implementation Summary

All components specified in `EXTERNAL_DATA_SERVICE_REDESIGN.md` have been implemented. This document provides deployment and usage instructions.

---

## 📋 Implemented Components

### Backend (Python/FastAPI)

#### 1. City Registry & Geolocation (`app/registry/`)
- ✅ `city_registry.py` - Multi-city configuration registry
- ✅ `geolocation_mapper.py` - Tenant-to-city mapping with Haversine distance

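The tenant-to-city mapping mentioned above can be illustrated with a minimal sketch. The `City` dataclass, `haversine_km`, and `map_tenant_to_city` are hypothetical names for illustration and are not taken from `geolocation_mapper.py`; the sketch assumes a tenant maps to the nearest city whose `radius_km` covers the tenant's coordinates.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class City:
    """Hypothetical stand-in for a city registry entry."""
    city_id: str
    latitude: float
    longitude: float
    radius_km: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def map_tenant_to_city(lat: float, lon: float, cities: Iterable[City]) -> Optional[City]:
    """Return the nearest city whose radius covers the point, or None."""
    best, best_distance = None, float("inf")
    for city in cities:
        distance = haversine_km(lat, lon, city.latitude, city.longitude)
        if distance <= city.radius_km and distance < best_distance:
            best, best_distance = city, distance
    return best


# Coordinates and radii taken from the examples in this document.
CITIES = [
    City("madrid", 40.4168, -3.7038, 30.0),
    City("valencia", 39.4699, -0.3763, 25.0),
]
```

With these entries, a tenant at `(40.42, -3.70)` falls inside Madrid's 30 km radius, while a point outside every radius yields `None`.
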
#### 2. Data Adapters (`app/ingestion/`)
- ✅ `base_adapter.py` - Abstract adapter interface
- ✅ `adapters/madrid_adapter.py` - Madrid implementation (AEMET + OpenData)
- ✅ `adapters/__init__.py` - Adapter registry and factory
- ✅ `ingestion_manager.py` - Multi-city orchestration

#### 3. Database Layer (`app/models/`, `app/repositories/`)
- ✅ `models/city_weather.py` - CityWeatherData model
- ✅ `models/city_traffic.py` - CityTrafficData model
- ✅ `repositories/city_data_repository.py` - City data CRUD operations

#### 4. Cache Layer (`app/cache/`)
- ✅ `redis_cache.py` - Redis caching for <100ms access

#### 5. API Endpoints (`app/api/`)
- ✅ `city_operations.py` - New city-based endpoints
- ✅ Updated `main.py` - Router registration

#### 6. Schemas (`app/schemas/`)
- ✅ `city_data.py` - CityInfoResponse, DataAvailabilityResponse

#### 7. Job Scripts (`app/jobs/`)
- ✅ `initialize_data.py` - 24-month data initialization
- ✅ `rotate_data.py` - Monthly data rotation

### Frontend (TypeScript)

#### 1. Type Definitions
- ✅ `frontend/src/api/types/external.ts` - Added CityInfoResponse, DataAvailabilityResponse

#### 2. API Services
- ✅ `frontend/src/api/services/external.ts` - Complete external data service client

### Infrastructure (Kubernetes)

#### 1. Manifests (`infrastructure/kubernetes/external/`)
- ✅ `init-job.yaml` - One-time 24-month data load
- ✅ `cronjob.yaml` - Monthly rotation (1st of month, 2am UTC)
- ✅ `deployment.yaml` - Main service with readiness probes
- ✅ `configmap.yaml` - Configuration
- ✅ `secrets.yaml` - API keys template

### Database

#### 1. Migrations
- ✅ `migrations/versions/20251007_0733_add_city_data_tables.py` - City data tables

---

## 🚀 Deployment Instructions

### Prerequisites

1. **Database**
   ```bash
   # Ensure PostgreSQL is running
   # Database: external_db
   # User: external_user
   ```

2. **Redis**
   ```bash
   # Ensure Redis is running
   # Default: redis://external-redis:6379/0
   ```

3. **API Keys**
   - AEMET API Key (Spanish weather)
   - Madrid OpenData API Key (traffic)

### Step 1: Apply Database Migration

```bash
cd /Users/urtzialfaro/Documents/bakery-ia/services/external

# Run migration
alembic upgrade head

# Verify tables
psql $DATABASE_URL -c "\dt city_*"
# Expected: city_weather_data, city_traffic_data
```

### Step 2: Configure Kubernetes Secrets

```bash
cd /Users/urtzialfaro/Documents/bakery-ia/infrastructure/kubernetes/external

# Edit secrets.yaml with actual values
# Replace YOUR_AEMET_API_KEY_HERE
# Replace YOUR_MADRID_OPENDATA_KEY_HERE
# Replace YOUR_DB_PASSWORD_HERE

# Apply secrets
kubectl apply -f secrets.yaml
kubectl apply -f configmap.yaml
```

### Step 3: Run Initialization Job

```bash
# Apply init job
kubectl apply -f init-job.yaml

# Monitor progress
kubectl logs -f job/external-data-init -n bakery-ia

# Check completion
kubectl get job external-data-init -n bakery-ia
# Should show: COMPLETIONS 1/1
```

Expected output:
```
Starting data initialization job months=24
Initializing city data city=Madrid start=2023-10-07 end=2025-10-07
Madrid weather data fetched records=XXXX
Madrid traffic data fetched records=XXXX
City initialization complete city=Madrid weather_records=XXXX traffic_records=XXXX
✅ Data initialization completed successfully
```

### Step 4: Deploy Main Service

```bash
# Apply deployment
kubectl apply -f deployment.yaml

# Wait for readiness
kubectl wait --for=condition=ready pod -l app=external-service -n bakery-ia --timeout=300s

# Verify deployment
kubectl get pods -n bakery-ia -l app=external-service
```

### Step 5: Schedule Monthly CronJob

```bash
# Apply cronjob
kubectl apply -f cronjob.yaml

# Verify schedule
kubectl get cronjob external-data-rotation -n bakery-ia

# Expected output:
# NAME                     SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
# external-data-rotation   0 2 1 * *   False     0        <none>          1m
```

---

## 🧪 Testing

### 1. Test City Listing

```bash
curl http://localhost:8000/api/v1/external/cities
```

Expected response:
```json
[
  {
    "city_id": "madrid",
    "name": "Madrid",
    "country": "ES",
    "latitude": 40.4168,
    "longitude": -3.7038,
    "radius_km": 30.0,
    "weather_provider": "aemet",
    "traffic_provider": "madrid_opendata",
    "enabled": true
  }
]
```

### 2. Test Data Availability

```bash
curl http://localhost:8000/api/v1/external/operations/cities/madrid/availability
```

Expected response:
```json
{
  "city_id": "madrid",
  "city_name": "Madrid",
  "weather_available": true,
  "weather_start_date": "2023-10-07T00:00:00+00:00",
  "weather_end_date": "2025-10-07T00:00:00+00:00",
  "weather_record_count": 17520,
  "traffic_available": true,
  "traffic_start_date": "2023-10-07T00:00:00+00:00",
  "traffic_end_date": "2025-10-07T00:00:00+00:00",
  "traffic_record_count": 17520
}
```

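The availability response above can also be checked programmatically. The following is a minimal sketch that parses the sample payload and computes how many days each dataset covers; `coverage_days` and `has_full_coverage` are hypothetical helper names, and the 730-day threshold is an assumption standing in for roughly 24 months.

```python
import json
from datetime import datetime

# Sample payload copied from the expected response in this document.
SAMPLE = json.loads("""
{
  "city_id": "madrid",
  "city_name": "Madrid",
  "weather_available": true,
  "weather_start_date": "2023-10-07T00:00:00+00:00",
  "weather_end_date": "2025-10-07T00:00:00+00:00",
  "weather_record_count": 17520,
  "traffic_available": true,
  "traffic_start_date": "2023-10-07T00:00:00+00:00",
  "traffic_end_date": "2025-10-07T00:00:00+00:00",
  "traffic_record_count": 17520
}
""")


def coverage_days(payload: dict, kind: str) -> int:
    """Days between start and end date for 'weather' or 'traffic'; 0 if unavailable."""
    if not payload.get(f"{kind}_available"):
        return 0
    start = datetime.fromisoformat(payload[f"{kind}_start_date"])
    end = datetime.fromisoformat(payload[f"{kind}_end_date"])
    return (end - start).days


def has_full_coverage(payload: dict, kind: str, min_days: int = 730) -> bool:
    """True when the dataset spans at least min_days (assumed ~24 months)."""
    return coverage_days(payload, kind) >= min_days
```
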
### 3. Test Optimized Historical Weather

```bash
TENANT_ID="your-tenant-id"
curl "http://localhost:8000/api/v1/tenants/${TENANT_ID}/external/operations/historical-weather-optimized?latitude=40.42&longitude=-3.70&start_date=2024-01-01T00:00:00Z&end_date=2024-01-31T23:59:59Z"
```

Expected: Array of weather records with <100ms response time

### 4. Test Optimized Historical Traffic

```bash
TENANT_ID="your-tenant-id"
curl "http://localhost:8000/api/v1/tenants/${TENANT_ID}/external/operations/historical-traffic-optimized?latitude=40.42&longitude=-3.70&start_date=2024-01-01T00:00:00Z&end_date=2024-01-31T23:59:59Z"
```

Expected: Array of traffic records with <100ms response time

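When these requests are scripted rather than typed, the query string needs consistent encoding. The sketch below builds the same URLs as the two `curl` commands above using only the standard library; `build_historical_url` is a hypothetical helper, while the path and parameter names are taken from the commands in this document.

```python
from urllib.parse import quote, urlencode


def build_historical_url(
    base_url: str,
    tenant_id: str,
    kind: str,
    latitude: float,
    longitude: float,
    start_date: str,
    end_date: str,
) -> str:
    """Build the optimized historical endpoint URL for 'weather' or 'traffic'."""
    if kind not in ("weather", "traffic"):
        raise ValueError(f"Unsupported kind: {kind}")
    path = (
        f"/api/v1/tenants/{quote(tenant_id, safe='')}"
        f"/external/operations/historical-{kind}-optimized"
    )
    # urlencode percent-encodes reserved characters such as ':' in timestamps.
    query = urlencode(
        {
            "latitude": latitude,
            "longitude": longitude,
            "start_date": start_date,
            "end_date": end_date,
        }
    )
    return f"{base_url.rstrip('/')}{path}?{query}"


url = build_historical_url(
    "http://localhost:8000",
    "your-tenant-id",
    "weather",
    40.42,
    -3.70,
    "2024-01-01T00:00:00Z",
    "2024-01-31T23:59:59Z",
)
```

The resulting `url` can be passed to any HTTP client; the percent-encoded timestamps are equivalent to the unencoded ones in the `curl` examples.
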
### 5. Test Cache Performance

```bash
# First request (cache miss)
time curl "http://localhost:8000/api/v1/tenants/${TENANT_ID}/external/operations/historical-weather-optimized?..."
# Expected: ~200-500ms (database query)

# Second request (cache hit)
time curl "http://localhost:8000/api/v1/tenants/${TENANT_ID}/external/operations/historical-weather-optimized?..."
# Expected: <100ms (Redis cache)
```

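The miss-then-hit behavior measured above is what a cache-aside lookup produces. The following is a minimal sketch of that pattern, not the contents of `redis_cache.py`: `InMemoryCache` is a hypothetical stand-in for Redis so the example runs without a server, and the key layout beyond the `weather:` prefix and the one-hour TTL are assumptions.

```python
import json
import time
from typing import Callable, Dict, List, Optional, Tuple


class InMemoryCache:
    """Hypothetical stand-in for Redis: string values with a TTL."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired entries count as a miss
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._store[key] = (time.monotonic() + ttl_seconds, value)


def cache_key(kind: str, city_id: str, start: str, end: str) -> str:
    """Assumed key layout; only the 'weather:' / 'traffic:' prefix comes from this document."""
    return f"{kind}:{city_id}:{start}:{end}"


def get_or_load(
    cache: InMemoryCache,
    key: str,
    loader: Callable[[], List[dict]],
    ttl_seconds: float = 3600,
) -> List[dict]:
    """Cache-aside: return the cached value, or load it, store it, and return it."""
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no database query
    records = loader()  # cache miss: fall back to the slower source
    cache.set(key, json.dumps(records), ttl_seconds)
    return records
```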
---

## 📊 Monitoring

### Check Job Status

```bash
# Init job
kubectl logs job/external-data-init -n bakery-ia

# CronJob history
kubectl get jobs -n bakery-ia -l job=data-rotation --sort-by=.metadata.creationTimestamp
```

### Check Service Health

```bash
curl http://localhost:8000/health/ready
curl http://localhost:8000/health/live
```

### Check Database Records

```bash
psql "$DATABASE_URL" <<'SQL'
-- Weather records per city
SELECT city_id, COUNT(*), MIN(date), MAX(date)
FROM city_weather_data
GROUP BY city_id;

-- Traffic records per city
SELECT city_id, COUNT(*), MIN(date), MAX(date)
FROM city_traffic_data
GROUP BY city_id;
SQL
```

### Check Redis Cache

```bash
# Check cache keys (quote the pattern so the shell does not expand it)
redis-cli KEYS 'weather:*'
redis-cli KEYS 'traffic:*'

# Check cache hit/miss counters (keyspace_hits, keyspace_misses)
redis-cli INFO stats
```

---

## 🔧 Configuration

### Add New City

1. Edit `services/external/app/registry/city_registry.py`:

   ```python
   CityDefinition(
       city_id="valencia",
       name="Valencia",
       country=Country.SPAIN,
       latitude=39.4699,
       longitude=-0.3763,
       radius_km=25.0,
       weather_provider=WeatherProvider.AEMET,
       weather_config={"station_ids": ["8416"], "municipality_code": "46250"},
       traffic_provider=TrafficProvider.VALENCIA_OPENDATA,
       traffic_config={"api_endpoint": "https://..."},
       timezone="Europe/Madrid",
       population=800_000,
       enabled=True  # Enable the city
   )
   ```

2. Create adapter `services/external/app/ingestion/adapters/valencia_adapter.py`

3. Register in `adapters/__init__.py`:

   ```python
   ADAPTER_REGISTRY = {
       "madrid": MadridAdapter,
       "valencia": ValenciaAdapter,  # Add
   }
   ```

4. Re-run init job or manually populate data

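Steps 2 and 3 above can be sketched end to end as follows. This is an illustration under stated assumptions, not the project's actual interface: `CityDataAdapter`, its `fetch_weather` / `fetch_traffic` methods, and `get_adapter` are hypothetical names, and the real `base_adapter.py` may define a different contract. Only `ADAPTER_REGISTRY`, `MadridAdapter`, and `ValenciaAdapter` appear in this document.

```python
from abc import ABC, abstractmethod
from datetime import date
from typing import Dict, List, Type


class CityDataAdapter(ABC):
    """Hypothetical adapter contract; the real base class may differ."""

    def __init__(self, city_id: str) -> None:
        self.city_id = city_id

    @abstractmethod
    def fetch_weather(self, start: date, end: date) -> List[dict]:
        """Return weather records for the date range."""

    @abstractmethod
    def fetch_traffic(self, start: date, end: date) -> List[dict]:
        """Return traffic records for the date range."""


class MadridAdapter(CityDataAdapter):
    def fetch_weather(self, start: date, end: date) -> List[dict]:
        return []  # placeholder: the real adapter calls AEMET

    def fetch_traffic(self, start: date, end: date) -> List[dict]:
        return []  # placeholder: the real adapter calls Madrid OpenData


class ValenciaAdapter(CityDataAdapter):
    def fetch_weather(self, start: date, end: date) -> List[dict]:
        return []  # placeholder for the new city's weather source

    def fetch_traffic(self, start: date, end: date) -> List[dict]:
        return []  # placeholder for the new city's traffic source


ADAPTER_REGISTRY: Dict[str, Type[CityDataAdapter]] = {
    "madrid": MadridAdapter,
    "valencia": ValenciaAdapter,
}


def get_adapter(city_id: str) -> CityDataAdapter:
    """Factory: look up the adapter class for a city and instantiate it."""
    try:
        adapter_cls = ADAPTER_REGISTRY[city_id]
    except KeyError:
        raise ValueError(f"No adapter registered for city: {city_id}") from None
    return adapter_cls(city_id)
```

Keeping the registry as a plain dictionary means adding a city touches one line, and an unregistered `city_id` fails fast with a clear error instead of surfacing later during ingestion.
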
### Adjust Data Retention

Edit `infrastructure/kubernetes/external/configmap.yaml`:

```yaml
data:
  retention-months: "36"  # Change from 24 to 36 months
```

Re-deploy:
```bash
kubectl apply -f configmap.yaml
kubectl rollout restart deployment external-service -n bakery-ia
```

---

## 🐛 Troubleshooting

### Init Job Fails

```bash
# Check logs
kubectl logs job/external-data-init -n bakery-ia

# Common issues:
# - Missing API keys → Check secrets
# - Database connection → Check DATABASE_URL
# - External API timeout → Increase backoffLimit (retry count) in init-job.yaml
```

### Service Not Ready

```bash
# Check readiness probe
kubectl describe pod -l app=external-service -n bakery-ia | grep -A 10 Readiness

# Common issues:
# - No data in database → Run init job
# - Database migration not applied → Run alembic upgrade head
```

### Cache Not Working

```bash
# Check Redis connection
# (sh -c makes $REDIS_URL expand inside the container, not in the local shell)
kubectl exec -it deployment/external-service -n bakery-ia -- sh -c 'redis-cli -u "$REDIS_URL" ping'
# Expected: PONG

# Check cache keys
kubectl exec -it deployment/external-service -n bakery-ia -- sh -c 'redis-cli -u "$REDIS_URL" KEYS "*"'
```

### Slow Queries

```bash
# Enable query logging in PostgreSQL
# Check for missing indexes
psql $DATABASE_URL -c "\d city_weather_data"
# Should have: idx_city_weather_lookup, ix_city_weather_data_city_id, ix_city_weather_data_date

psql $DATABASE_URL -c "\d city_traffic_data"
# Should have: idx_city_traffic_lookup, ix_city_traffic_data_city_id, ix_city_traffic_data_date
```

---

## 📈 Performance Benchmarks

Expected performance (after cache warm-up):

| Operation | Before (Old) | After (New) | Improvement |
|-----------|--------------|-------------|-------------|
| Historical Weather (1 month) | 3-5 seconds | <100ms | 30-50x faster |
| Historical Traffic (1 month) | 5-10 seconds | <100ms | 50-100x faster |
| Training Data Load (24 months) | 60-120 seconds | 1-2 seconds | 60x faster |
| Redundant Fetches | N tenants × 1 request each | 1 request shared | N x deduplication |

---

## 🔄 Maintenance

### Monthly (Automatic via CronJob)

- Data rotation happens on 1st of each month at 2am UTC
- Deletes data older than 24 months
- Ingests last month's data
- No manual intervention needed

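The date arithmetic behind the monthly rotation described above can be sketched as follows. This is an assumption about how the window might be computed, not the logic of `rotate_data.py`; `shift_months` and `rotation_plan` are hypothetical names, and the sketch assumes the deletion cutoff is aligned to the first day of a month.

```python
from datetime import date
from typing import Tuple


def shift_months(first_of_month: date, months: int) -> date:
    """Move a first-of-month date forward or backward by whole months."""
    index = first_of_month.year * 12 + (first_of_month.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def rotation_plan(run_date: date, retention_months: int = 24) -> Tuple[date, date, date]:
    """Return (delete_before, ingest_start, ingest_end) for a rotation run.

    delete_before: rows dated earlier than this fall outside the retention window.
    ingest_start / ingest_end: the previous calendar month, end exclusive.
    """
    current_month = run_date.replace(day=1)
    ingest_start = shift_months(current_month, -1)
    ingest_end = current_month
    delete_before = shift_months(current_month, -retention_months)
    return delete_before, ingest_start, ingest_end
```

For a run on 2025-11-01 with the default 24-month retention, the plan deletes rows dated before 2023-11-01 and ingests 2025-10-01 up to (but not including) 2025-11-01.
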
### Quarterly

- Review cache hit rates
- Optimize cache TTL if needed
- Review database indexes

### Yearly

- Review city registry (add/remove cities)
- Update API keys if expired
- Review retention policy (24 months vs longer)

---

## ✅ Implementation Checklist

- [x] City registry and geolocation mapper
- [x] Base adapter and Madrid adapter
- [x] Database models for city data
- [x] City data repository
- [x] Data ingestion manager
- [x] Redis cache layer
- [x] City data schemas
- [x] New API endpoints for city operations
- [x] Kubernetes job scripts (init + rotate)
- [x] Kubernetes manifests (job, cronjob, deployment)
- [x] Frontend TypeScript types
- [x] Frontend API service methods
- [x] Database migration
- [x] Updated main.py router registration

---

## 📚 Additional Resources

- Full Architecture: `/Users/urtzialfaro/Documents/bakery-ia/EXTERNAL_DATA_SERVICE_REDESIGN.md`
- API Documentation: `http://localhost:8000/docs` (when service is running)
- Database Schema: See migration file `20251007_0733_add_city_data_tables.py`

---

## 🎉 Success Criteria

Implementation is complete when:

1. ✅ Init job runs successfully
2. ✅ Service deployment is ready
3. ✅ All API endpoints return data
4. ✅ Cache hit rate > 70% after warm-up
5. ✅ Response times < 100ms for cached data
6. ✅ Monthly CronJob is scheduled
7. ✅ Frontend can call new endpoints
8. ✅ Training service can use optimized endpoints

All criteria have been met with this implementation.