diff --git a/DOCKER_MAINTENANCE.md b/DOCKER_MAINTENANCE.md new file mode 100644 index 00000000..094ed099 --- /dev/null +++ b/DOCKER_MAINTENANCE.md @@ -0,0 +1,120 @@ +# Docker Maintenance Guide for Local Development + +## The Problem + +When developing with Tilt and local Kubernetes (Kind), Docker accumulates: +- **Multiple image versions** from each code change (Tilt rebuilds) +- **Unused volumes** from previous cluster runs +- **Build cache** that grows over time + +This quickly fills up disk space, causing pods to fail with "No space left on device" errors. + +## Quick Fix (When You Hit Disk Issues) + +```bash +# Clean up all unused Docker resources +docker system prune -a --volumes -f +``` + +This removes: +- All unused images +- All unused volumes +- All build cache + +**Expected recovery**: 60-100GB + +## Regular Maintenance + +### Option 1: Use the Cleanup Script (Recommended) + +Run the maintenance script weekly: + +```bash +./scripts/cleanup-docker.sh +``` + +Or run it automatically without confirmation: + +```bash +./scripts/cleanup-docker.sh --auto +``` + +### Option 2: Manual Commands + +```bash +# Remove images older than 24 hours +docker image prune -af --filter "until=24h" + +# Remove unused volumes +docker volume prune -f + +# Remove build cache +docker builder prune -af +``` + +### Option 3: Set Up Automated Cleanup + +Add to your crontab (run every Sunday at 2 AM): + +```bash +crontab -e +# Add this line: +0 2 * * 0 /Users/urtzialfaro/Documents/bakery-ia/scripts/cleanup-docker.sh --auto >> /tmp/docker-cleanup.log 2>&1 +``` + +## Monitoring Disk Usage + +### Check Docker disk usage: +```bash +docker system df +``` + +### Check Kind node disk usage: +```bash +docker exec bakery-ia-local-control-plane df -h /var +``` + +### Alert thresholds: +- **< 70%**: Healthy ✅ +- **70-85%**: Consider cleanup soon ⚠️ +- **> 85%**: Run cleanup immediately 🚨 +- **> 95%**: Critical - pods will fail ❌ + +## Prevention Tips + +1. **Run cleanup weekly** to prevent accumulation +2. **Monitor disk usage** before long dev sessions +3. **Delete old Kind clusters** when switching projects: + ```bash + kind delete cluster --name bakery-ia-local + ``` +4. **Increase Docker disk allocation** in Docker Desktop settings if you frequently rebuild many services + +## Troubleshooting + +### Pods in CrashLoopBackOff after disk issues: + +1. Run cleanup (see Quick Fix above) +2. Restart failed pods: + ```bash + kubectl get pods -n bakery-ia | grep -E "(CrashLoopBackOff|Error)" | awk '{print $1}' | xargs kubectl delete pod -n bakery-ia + ``` + +### Cleanup didn't free enough space: + +If still above 90% after cleanup: + +```bash +# Nuclear option - rebuild everything +kind delete cluster --name bakery-ia-local +docker system prune -a --volumes -f +# Then recreate cluster with your setup scripts +``` + +## What Happened Today (2026-01-12) + +- **Issue**: Disk was 100% full (113GB/113GB), causing database pods to crash +- **Root cause**: 122 unused Docker images + 16 unused volumes + 6GB build cache +- **Solution**: Ran `docker system prune -a --volumes -f` +- **Result**: Freed 89GB, disk now at 22% usage (24GB/113GB) +- **All services recovered successfully** diff --git a/Lista_de_Mejoras_Propuestas.md b/Lista_de_Mejoras_Propuestas.md deleted file mode 100644 index f7a3b350..00000000 --- a/Lista_de_Mejoras_Propuestas.md +++ /dev/null @@ -1,185 +0,0 @@ -# Lista de Mejoras Propuestas - -## 1. Nueva Sección: Seguridad y Ciberseguridad (Añadir después de 5.2) - -### 5.3. 
Arquitectura de Seguridad y Cumplimiento Normativo Europeo - -**Autenticación y Autorización:** -- JWT con rotación de tokens cada 15 minutos -- Control de acceso basado en roles (RBAC) -- Rate limiting (300 req/min por cliente) -- Autenticación multifactor (MFA) planificada para Q1 2026 - -**Protección de Datos:** -- Cifrado AES-256 en reposo -- HTTPS obligatorio con certificados Let's Encrypt auto-renovables -- Aislamiento multi-tenant a nivel de base de datos -- Prevención SQL injection mediante Pydantic schemas -- Protección XSS y CORS - -**Monitorización y Trazabilidad:** -- OpenTelemetry para trazabilidad distribuida end-to-end -- SigNoz como plataforma unificada de observabilidad -- Logs centralizados con correlación de trazas -- Auditoría completa de accesos y cambios -- Alertas en tiempo real (email y Slack) - -**Cumplimiento RGPD:** -- Privacy by design -- Derecho al olvido implementado -- Exportación de datos en CSV/JSON -- Gestión de consentimientos con historial -- Anonimización de datos analíticos - -**Infraestructura Segura:** -- Kubernetes con políticas de seguridad de pods -- Actualizaciones automáticas de seguridad -- Backups cifrados diarios -- Plan de recuperación ante desastres (DR) -- PostgreSQL 17 con configuraciones hardened - -## 2. Actualizar Sección 4.4 (Competencia) - Añadir Ventaja de Seguridad - -**Ventaja Competitiva en Seguridad:** -- Arquitectura "Security-First" vs. soluciones legacy vulnerables -- Cumplimiento RGPD desde el diseño (competidores retrofitting) -- Observabilidad completa (OpenTelemetry + SigNoz) vs. cajas negras -- Certificaciones de seguridad planificadas (ISO 27001, ENS) -- Alineación con NIS2 Directive (obligatoria 2024 para cadena alimentaria) - -## 3. Actualizar Sección 8.1 (Financiación) - Nuevas Líneas de Ayuda - -### Añadir subsección: "Financiación Europea en Ciberseguridad 2026-2027" - -**Programas Europeos Identificados (2026-2027):** - -1. **Digital Europe Programme - Ciberseguridad (€390M totales)** - - **UPTAKE Program**: €15M para SMEs en cumplimiento normativo - - **AI-Based Cybersecurity**: €15M para sistemas IA de seguridad - - Cofinanciación: hasta 75% de costes de proyecto - - Proyectos: €3-5M por proyecto, duración 36 meses - - **Elegibilidad**: Bakery-IA califica como SME tecnológica - - **Solicitud estimada**: €200.000 (Q1 2026) - -2. **INCIBE EMPRENDE Program (España Digital 2026)** - - Presupuesto: €191M (2023-2026) - - 34 entidades colaboradoras en España - - Ideación, incubación y aceleración en ciberseguridad - - Fondos Next Generation-EU - - **Solicitud estimada**: €50.000 (Q2 2026) - -3. **ENISA Emprendedoras Digitales** - - Préstamos participativos hasta €51M movilizables - - Líneas específicas para emprendimiento digital - - Sin avales personales, condiciones flexibles - - **Solicitud estimada**: €75.000 (Q2 2026) - -**Alineación con Prioridades UE:** -- NIS2 Directive (seguridad sector alimentario) -- Cyber Resilience Act (productos digitales seguros) -- AI Act (IA transparente y auditable) -- GDPR (protección datos desde diseño) - -## 4. 
Actualizar Sección 5.2 (Arquitectura Técnica) - -### Añadir detalles de la documentación técnica: - -**Arquitectura de Microservicios (21 servicios independientes):** -- API Gateway centralizado con JWT y cache Redis (95% hit rate) -- Frontend: React 18 + TypeScript (PWA mobile-first) -- 18 bases de datos PostgreSQL 17 (patrón database-per-service) -- Redis 7.4 para caché y RabbitMQ 4.1 para eventos -- Kubernetes en VPS (escalabilidad horizontal) - -**Innovación Técnica Destacable:** -- **Sistema de Alertas Enriquecidas** (3 niveles: Alertas/Notificaciones/Recomendaciones) -- **Priorización Inteligente** con scoring 0-100 (4 factores ponderados) -- **Escalado Temporal** (+10 a 48h, +20 a 72h, +30 cerca deadline) -- **Encadenamiento Causal** (stock shortage → retraso producción → riesgo pedido) -- **Deduplicación** (95% reducción spam de alertas) -- **SSE + WebSocket** para actualizaciones en tiempo real -- **Prophet ML** con 20+ features (AEMET, tráfico Madrid, festivos) - -**Observabilidad de Clase Empresarial:** -- **SigNoz**: Trazas, métricas y logs unificados -- **OpenTelemetry**: Auto-instrumentación de 18 servicios -- **ClickHouse**: Backend de alto rendimiento para análisis -- **Alerting**: Multi-canal (email, Slack) vía AlertManager -- Monitorización: 18 DBs PostgreSQL, Redis, RabbitMQ, Kubernetes - -## 5. Actualizar Resumen Ejecutivo (Sección 0) - -### Añadir bullet: - -- **Seguridad y Cumplimiento:** Arquitectura Security-First con cumplimiento RGPD, observabilidad completa (OpenTelemetry/SigNoz), y alineación con normativas europeas (NIS2, Cyber Resilience Act). Elegible para €390M del Digital Europe Programme en ciberseguridad. - -## 6. Actualizar Sección 9 (Decálogo) - Añadir Oportunidades de Financiación - -**6. Alineación Estratégica con Prioridades Europeas 2026-2027:** - - **€390M disponibles** en Digital Europe Programme para ciberseguridad - - **€191M** del programa INCIBE EMPRENDE (España Digital 2026) - - Bakery-IA califica para **3 líneas de financiación simultáneas**: - * UPTAKE (€200K) - Cumplimiento normativo SMEs - * INCIBE EMPRENDE (€50K) - Aceleración cybersecurity startups - * ENISA Digital (€75K) - Préstamo participativo - - **Total potencial**: €325.000 en financiación no dilutiva adicional - - Ventaja competitiva: Security-First vs. competidores legacy - -## 7. Nueva Tabla de Costes Operativos - Añadir Línea de Seguridad - -| **Seguridad y Cumplimiento** | | | -| Certificado SSL (Let's Encrypt) | Gratuito | €0 | Renovación automática | -| SigNoz Observability (self-hosted) | Incluido en VPS | €0 | Vs. €500+/mes en SaaS | -| Auditoría RGPD anual | Externa | €1,200/año | Compliance obligatorio | -| Backups cifrados (off-site) | Backblaze B2 | €5/mes | €60/año | - -## 8. Actualizar Roadmap (Sección 9) - Añadir Hitos de Seguridad - -**Q1 2026:** -- ✅ Implementación MFA (Multi-Factor Authentication) -- ✅ Solicitud Digital Europe Programme (UPTAKE) -- ✅ Auditoría RGPD externa - -**Q2 2026:** -- 📋 Certificación ISO 27001 (inicio proceso) -- 📋 Implementación NIS2 compliance -- 📋 Solicitud INCIBE EMPRENDE - -**Q3 2026:** -- 📋 Penetration testing externo -- 📋 Certificación ENS (Esquema Nacional de Seguridad) - -## 9. Actualizar Petición Concreta a VUE (Sección 9.2) - -### Añadir nuevo punto 5: - -**5. 
Conexión con Programas Europeos de Ciberseguridad:** - - Orientación para solicitud Digital Europe Programme (UPTAKE: €200K) - - Introducción a INCIBE EMPRENDE y red de 34 entidades colaboradoras - - Asesoramiento en preparación de propuestas técnicas para fondos EU - - Contacto con CDTI para programa NEOTEC (R&D+Ciberseguridad) - -## 10. Añadir Anexo 7 - Compliance y Certificaciones - -## ANEXO 7: ROADMAP DE COMPLIANCE Y CERTIFICACIONES - -**Normativas Aplicables:** -- ✅ RGPD (Reglamento General de Protección de Datos) - Implementado -- 📋 NIS2 Directive (Seguridad de redes y sistemas de información) - Q2 2026 -- 📋 Cyber Resilience Act (Productos digitales seguros) - Q3 2026 -- 📋 AI Act (Transparencia y auditoría de IA) - Q4 2026 - -**Certificaciones Planificadas:** -- 📋 ISO 27001 (Gestión de Seguridad de la Información) - 12-18 meses -- 📋 ENS Medio (Esquema Nacional de Seguridad) - 6-9 meses -- 📋 SOC 2 Type II (para clientes Enterprise) - 18-24 meses - -**Inversión Estimada en Compliance (3 años):** €25,000 -**ROI Esperado:** Acceso a clientes Enterprise (+€150K ARR potencial) - -## Resumen de Cambios Cuantitativos: -- Nueva financiación identificada: €325.000 (vs. €18.000 original) -- Nuevos programas: 3 líneas europeas de ciberseguridad -- Secciones nuevas: 2 (Seguridad 5.3, Compliance Anexo 7) -- Actualizaciones: 8 secciones existentes mejoradas -- Ventaja competitiva: Security-First enfatizada en 4 lugares \ No newline at end of file diff --git a/STRIPE_TESTING_GUIDE.md b/STRIPE_TESTING_GUIDE.md index 8caaf18d..9502808a 100644 --- a/STRIPE_TESTING_GUIDE.md +++ b/STRIPE_TESTING_GUIDE.md @@ -45,20 +45,72 @@ Before you begin testing, ensure you have: ### Step 3: Configure Environment Variables -#### Frontend `.env` file: -```bash -# Create or update: /frontend/.env -VITE_STRIPE_PUBLISHABLE_KEY=pk_test_your_publishable_key_here +**IMPORTANT:** This project **runs exclusively in Kubernetes**: +- **Development:** Kind/Colima + Tilt for local K8s development +- **Production:** MicroK8s on Ubuntu VPS + +All configuration is managed through Kubernetes ConfigMaps and Secrets. + +#### Frontend - Kubernetes ConfigMap: + +Update [infrastructure/kubernetes/base/configmap.yaml](infrastructure/kubernetes/base/configmap.yaml:374-378): + +```yaml +# FRONTEND CONFIGURATION section (lines 374-378) +VITE_STRIPE_PUBLISHABLE_KEY: "pk_test_your_actual_stripe_publishable_key_here" +VITE_PILOT_MODE_ENABLED: "true" +VITE_PILOT_COUPON_CODE: "PILOT2025" +VITE_PILOT_TRIAL_MONTHS: "3" ``` -#### Backend `.env` file: -```bash -# Create or update: /services/tenant/.env -STRIPE_SECRET_KEY=sk_test_your_secret_key_here -STRIPE_WEBHOOK_SECRET=whsec_your_webhook_secret_here +#### Backend - Kubernetes Secrets: + +Update [infrastructure/kubernetes/base/secrets.yaml](infrastructure/kubernetes/base/secrets.yaml:142-150): + +```yaml +# payment-secrets section (lines 142-150) +apiVersion: v1 +kind: Secret +metadata: + name: payment-secrets + namespace: bakery-ia +type: Opaque +data: + STRIPE_SECRET_KEY: + STRIPE_WEBHOOK_SECRET: ``` -**Note:** The webhook secret will be obtained in Step 4 when setting up webhooks. 
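+Fill the two `data` values with base64-encoded strings. A filled-in sketch for reference — the encodings below are just the placeholder strings used in the commands that follow, not real credentials:
+
+```yaml
+data:
+  STRIPE_SECRET_KEY: c2tfdGVzdF95b3VyX2FjdHVhbF9rZXlfaGVyZQ==      # base64 of "sk_test_your_actual_key_here"
+  STRIPE_WEBHOOK_SECRET: d2hzZWNfeW91cl9hY3R1YWxfc2VjcmV0X2hlcmU=  # base64 of "whsec_your_actual_secret_here"
+```
+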
+**Encode your Stripe keys:** +```bash +# Encode Stripe secret key +echo -n "sk_test_your_actual_key_here" | base64 + +# Encode webhook secret (obtained in next step) +echo -n "whsec_your_actual_secret_here" | base64 +``` + +#### Apply Configuration Changes: + +After updating the files, apply to your Kubernetes cluster: + +```bash +# Development (Kind/Colima with Tilt) +# Tilt will automatically detect changes and reload + +# Or manually apply +kubectl apply -f infrastructure/kubernetes/base/configmap.yaml +kubectl apply -f infrastructure/kubernetes/base/secrets.yaml + +# Production (MicroK8s on VPS) +microk8s kubectl apply -f infrastructure/kubernetes/base/configmap.yaml +microk8s kubectl apply -f infrastructure/kubernetes/base/secrets.yaml + +# Restart deployments to pick up new config +kubectl rollout restart deployment/frontend-deployment -n bakery-ia +kubectl rollout restart deployment/tenant-service -n bakery-ia +``` + +**Note:** The webhook secret will be obtained in the next step when setting up webhooks. ### Step 4: Install/Update Dependencies @@ -82,33 +134,179 @@ npm install ### Step 1: Create Products and Prices +**Important:** Your application uses EUR currency and has specific pricing for the Spanish market. + +#### 1.1 Create Starter Plan Product + 1. In Stripe Dashboard (Test Mode), go to **Products** → **Add product** +2. Fill in product details: + - **Product name:** `Starter Plan` + - **Description:** `Plan básico para panaderías pequeñas - Includes basic forecasting, waste tracking, and supplier management` + - **Statement descriptor:** `BAKERY-STARTER` (appears on customer's credit card statement) -2. **Create Starter Plan:** - - Product name: `Starter Plan` - - Description: `Basic subscription for small businesses` - - Pricing: - - Price: `$29.00` - - Billing period: `Monthly` - - Currency: `USD` (or your preferred currency) - - Click **Save product** - - Copy the **Price ID** (starts with `price_...`) - you'll need this +3. **Create Monthly Price:** + - Click **Add another price** + - Price: `€49.00` + - Billing period: `Monthly` + - Currency: `EUR` + - Price description: `Starter Plan - Monthly` + - Click **Save price** + - **⚠️ COPY THE PRICE ID** (starts with `price_...`) - Label it as `STARTER_MONTHLY_PRICE_ID` -3. **Create Professional Plan:** - - Product name: `Professional Plan` - - Description: `Advanced subscription for growing businesses` - - Pricing: - - Price: `$99.00` - - Billing period: `Monthly` - - Currency: `USD` - - Click **Save product** - - Copy the **Price ID** +4. **Create Yearly Price (with discount):** + - Click **Add another price** on the same product + - Price: `€490.00` (17% discount - equivalent to 2 months free) + - Billing period: `Yearly` + - Currency: `EUR` + - Price description: `Starter Plan - Yearly (2 months free)` + - Click **Save price** + - **⚠️ COPY THE PRICE ID** - Label it as `STARTER_YEARLY_PRICE_ID` -4. **Update your application configuration:** - - Store these Price IDs in your application settings - - You'll use these when creating subscriptions +#### 1.2 Create Professional Plan Product -### Step 2: Configure Webhooks +1. Click **Add product** to create a new product +2. Fill in product details: + - **Product name:** `Professional Plan` + - **Description:** `Plan profesional para panaderías en crecimiento - Business analytics, enhanced AI (92% accurate), what-if scenarios, multi-location support` + - **Statement descriptor:** `BAKERY-PRO` + +3. 
**Create Monthly Price:** + - Price: `€149.00` + - Billing period: `Monthly` + - Currency: `EUR` + - Price description: `Professional Plan - Monthly` + - **⚠️ COPY THE PRICE ID** - Label it as `PROFESSIONAL_MONTHLY_PRICE_ID` + +4. **Create Yearly Price (with discount):** + - Price: `€1,490.00` (17% discount - equivalent to 2 months free) + - Billing period: `Yearly` + - Currency: `EUR` + - Price description: `Professional Plan - Yearly (2 months free)` + - **⚠️ COPY THE PRICE ID** - Label it as `PROFESSIONAL_YEARLY_PRICE_ID` + +#### 1.3 Create Enterprise Plan Product + +1. Click **Add product** to create a new product +2. Fill in product details: + - **Product name:** `Enterprise Plan` + - **Description:** `Plan enterprise para grandes operaciones - Unlimited features, production distribution, centralized dashboard, white-label, SSO, dedicated support` + - **Statement descriptor:** `BAKERY-ENTERPRISE` + +3. **Create Monthly Price:** + - Price: `€499.00` + - Billing period: `Monthly` + - Currency: `EUR` + - Price description: `Enterprise Plan - Monthly` + - **⚠️ COPY THE PRICE ID** - Label it as `ENTERPRISE_MONTHLY_PRICE_ID` + +4. **Create Yearly Price (with discount):** + - Price: `€4,990.00` (17% discount - equivalent to 2 months free) + - Billing period: `Yearly` + - Currency: `EUR` + - Price description: `Enterprise Plan - Yearly (2 months free)` + - **⚠️ COPY THE PRICE ID** - Label it as `ENTERPRISE_YEARLY_PRICE_ID` + +#### 1.4 Trial Period Configuration + +**IMPORTANT:** Your system does NOT use Stripe's default trial periods. Instead: +- All trial periods are managed through the **PILOT2025 coupon only** +- Do NOT configure default trials on products in Stripe +- The coupon provides a 90-day (3-month) trial extension +- Without the PILOT2025 coupon, subscriptions start with immediate billing + +#### 1.5 Price ID Reference Sheet + +Create a reference document with all your Price IDs: + +``` +STARTER_MONTHLY_PRICE_ID=price_XXXXXXXXXXXXXXXX +STARTER_YEARLY_PRICE_ID=price_XXXXXXXXXXXXXXXX +PROFESSIONAL_MONTHLY_PRICE_ID=price_XXXXXXXXXXXXXXXX +PROFESSIONAL_YEARLY_PRICE_ID=price_XXXXXXXXXXXXXXXX +ENTERPRISE_MONTHLY_PRICE_ID=price_XXXXXXXXXXXXXXXX +ENTERPRISE_YEARLY_PRICE_ID=price_XXXXXXXXXXXXXXXX +``` + +⚠️ **You'll need these Price IDs when configuring your backend environment variables or updating subscription creation logic.** + +### Step 2: Create the PILOT2025 Coupon (3 Months Free) + +**CRITICAL:** This coupon provides a 90-day (3-month) trial period where customers pay €0. + +#### How it Works: +Your application validates the `PILOT2025` coupon code and, when valid: +1. Creates the Stripe subscription with `trial_period_days=90` +2. Stripe automatically: + - Sets subscription status to `trialing` + - Charges **€0 for the first 90 days** + - Schedules the first invoice for day 91 + - Automatically begins normal billing after trial ends + +**IMPORTANT:** You do NOT need to create a coupon in Stripe Dashboard. The coupon is managed entirely in your application's database. Stripe only needs to know about the trial period duration (90 days). + +#### Verify PILOT2025 Coupon in Your Database: + +The PILOT2025 coupon should already exist in your database (created by the startup seeder). 
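+As described under "How it Works" above, the only Stripe-side effect of a valid coupon is the trial parameter on the subscription call. A minimal sketch of that logic (the function name and wiring are illustrative, not the actual `services/tenant` code):
+
+```python
+import stripe
+
+def create_subscription(customer_id: str, price_id: str, pilot_coupon_valid: bool):
+    """Create the Stripe subscription; the 90-day trial is added only when
+    PILOT2025 was validated against the application's coupons table."""
+    params = {
+        "customer": customer_id,
+        "items": [{"price": price_id}],
+    }
+    if pilot_coupon_valid:
+        # Stripe sets status=trialing and schedules the first invoice for day 91
+        params["trial_period_days"] = 90
+    return stripe.Subscription.create(**params)
+```
+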
Verify it exists: + +```sql +SELECT * FROM coupons WHERE code = 'PILOT2025'; +``` + +**Expected values:** +- `code`: `PILOT2025` +- `discount_type`: `trial_extension` +- `discount_value`: `90` (days) +- `max_redemptions`: `20` +- `active`: `true` +- `valid_until`: ~180 days from creation + +If the coupon doesn't exist, it will be created automatically on application startup via the [startup_seeder.py](services/tenant/app/jobs/startup_seeder.py). + +#### Environment Variables for Pilot Mode: + +**Frontend - Kubernetes ConfigMap:** + +These are already configured in [infrastructure/kubernetes/base/configmap.yaml](infrastructure/kubernetes/base/configmap.yaml:374-378): + +```yaml +# FRONTEND CONFIGURATION (lines 374-378) +VITE_PILOT_MODE_ENABLED: "true" +VITE_PILOT_COUPON_CODE: "PILOT2025" +VITE_PILOT_TRIAL_MONTHS: "3" +VITE_STRIPE_PUBLISHABLE_KEY: "pk_test_your_stripe_publishable_key_here" +``` + +Update the `VITE_STRIPE_PUBLISHABLE_KEY` value with your actual test key from Stripe, then apply: + +```bash +# Development (Tilt auto-reloads, but you can manually apply) +kubectl apply -f infrastructure/kubernetes/base/configmap.yaml + +# Production +microk8s kubectl apply -f infrastructure/kubernetes/base/configmap.yaml +microk8s kubectl rollout restart deployment/frontend-deployment -n bakery-ia +``` + +**Backend Configuration:** + +The backend coupon configuration is managed in code at [services/tenant/app/jobs/startup_seeder.py](services/tenant/app/jobs/startup_seeder.py). The PILOT2025 coupon with max_redemptions=20 is created automatically on application startup. + +**Important Notes:** +- ✅ Coupon is validated in your application database, NOT in Stripe +- ✅ Works with ALL three tiers (Starter, Professional, Enterprise) +- ✅ Provides a 90-day (3-month) trial with **€0 charge** during trial +- ✅ Without this coupon, subscriptions start billing immediately +- ✅ After the 90-day trial ends, Stripe automatically bills the full monthly price +- ✅ Application tracks coupon usage to prevent double-redemption per tenant +- ✅ Limited to first 20 pilot customers + +**Billing Example with PILOT2025:** +- Day 0: Customer subscribes to Professional Plan (€149/month) with PILOT2025 +- Day 0-90: Customer pays **€0** (trialing status in Stripe) +- Day 91: Stripe automatically charges €149.00 and status becomes "active" +- Every 30 days after: €149.00 charged automatically + +### Step 3: Configure Webhooks 1. Navigate to **Developers** → **Webhooks** 2. Click **+ Add endpoint** @@ -125,6 +323,9 @@ npm install - `invoice.payment_succeeded` - `invoice.payment_failed` - `customer.subscription.trial_will_end` + - `coupon.created` (for coupon tracking) + - `coupon.deleted` (for coupon tracking) + - `promotion_code.created` (if using promotion codes) 5. Click **Add endpoint** @@ -186,25 +387,29 @@ Stripe provides test card numbers to simulate different scenarios. **Never use r ## Testing Scenarios -### Scenario 1: Successful Registration with Payment +### Scenario 1: Successful Registration with Payment (Starter Plan) -**Objective:** Test the complete registration flow with valid payment method. +**Objective:** Test the complete registration flow with valid payment method for Starter Plan. **Steps:** -1. **Start your applications:** - ```bash - # Terminal 1 - Backend - cd services/tenant - uvicorn app.main:app --reload --port 8000 +1. 
**Start your Kubernetes environment:** - # Terminal 2 - Frontend - cd frontend - npm run dev + **Development (Kind/Colima + Tilt):** + ```bash + # Start Tilt (this starts all services) + tilt up + + # Or if already running, just verify services are healthy + kubectl get pods -n bakery-ia ``` + **Access the application:** + - Tilt will provide the URLs (usually port-forwarded) + - Or use: `kubectl port-forward svc/frontend-service 3000:3000 -n bakery-ia` + 2. **Navigate to registration page:** - - Open browser: `http://localhost:5173/register` (or your frontend URL) + - Open browser: `http://localhost:3000/register` (or your Tilt-provided URL) 3. **Fill in user details:** - Full Name: `John Doe` @@ -225,24 +430,29 @@ Stripe provides test card numbers to simulate different scenarios. **Never use r - Country: `US` 5. **Select a plan:** - - Choose `Starter Plan` or `Professional Plan` + - Choose `Starter Plan` (€49/month or €490/year) -6. **Submit the form** +6. **Do NOT apply coupon** (this scenario tests without the PILOT2025 coupon) + +7. **Submit the form** **Expected Results:** - ✅ Payment method created successfully - ✅ User account created -- ✅ Subscription created in Stripe +- ✅ Subscription created in Stripe with immediate billing (no trial) - ✅ Database records created - ✅ User redirected to dashboard - ✅ No console errors +- ✅ First invoice created immediately for €49.00 **Verification:** 1. **In Stripe Dashboard:** - Go to **Customers** → Find "John Doe" - Go to **Subscriptions** → See active subscription - - Status should be `active` + - Status should be `active` (no trial period without coupon) + - Verify pricing: €49.00 EUR / month + - Check that first invoice was paid immediately 2. **In your database:** ```sql @@ -251,10 +461,108 @@ Stripe provides test card numbers to simulate different scenarios. **Never use r - Verify subscription record exists - Status should be `active` - Check `stripe_customer_id` is populated + - Verify `tier` = `starter` + - Verify `billing_cycle` = `monthly` or `yearly` + - Check `current_period_start` and `current_period_end` are set 3. **Check application logs:** - Look for successful subscription creation messages - Verify no error logs + - Check that subscription tier is cached properly + +--- + +### Scenario 1B: Registration with PILOT2025 Coupon (3 Months Free) + +**Objective:** Test the complete registration flow with the PILOT2025 coupon applied for 3-month free trial. + +**Steps:** + +1. **Start your applications** (same as Scenario 1) + +2. **Navigate to registration page:** + - Open browser: `http://localhost:5173/register` + +3. **Fill in user details:** + - Full Name: `Maria Garcia` + - Email: `maria.garcia+pilot@example.com` + - Company: `Panadería Piloto` + - Password: Create a test password + +4. **Fill in payment details:** + - Card Number: `4242 4242 4242 4242` + - Expiry: `12/30` + - CVC: `123` + - Complete remaining billing details + +5. **Select a plan:** + - Choose `Professional Plan` (€149/month or €1,490/year) + +6. **Apply the PILOT2025 coupon:** + - Enter coupon code: `PILOT2025` + - Click "Apply" or "Validate" + - Verify coupon is accepted and shows "3 months free trial" + +7. 
**Submit the form**
+
+**Expected Results:**
+- ✅ Payment method created successfully
+- ✅ User account created
+- ✅ Subscription created in Stripe with 90-day trial period
+- ✅ Database records created
+- ✅ User redirected to dashboard
+- ✅ No console errors
+- ✅ Trial period: 90 days from today
+- ✅ First invoice will be created after trial ends (90 days from now)
+- ✅ Coupon redemption tracked in database
+
+**Verification:**
+
+1. **In Stripe Dashboard:**
+   - Go to **Customers** → Find "Maria Garcia"
+   - Go to **Subscriptions** → See active subscription
+   - **Status should be `trialing`** (this is key!)
+   - Verify pricing: €149.00 EUR / month (will charge this after trial)
+   - **Check trial end date:** Should be **90 days** from creation (hover over trial end timestamp)
+   - **Verify amount due now: €0.00** (no charge during trial)
+   - Check upcoming invoice: Should be scheduled for 90 days from now for €149.00
+
+2. **In Stripe Dashboard - Check Trial Status:**
+   - Note: There is NO coupon to check in Stripe Dashboard (coupons are managed in your database)
+   - Instead, verify the subscription has the correct trial period
+
+3. **In your database:**
+   ```sql
+   -- Check subscription
+   SELECT * FROM subscriptions WHERE tenant_id = 'maria-tenant-id';
+
+   -- Check coupon redemption
+   SELECT * FROM coupon_redemptions WHERE tenant_id = 'maria-tenant-id';
+   ```
+   - Subscription status should be `active`
+   - Verify `tier` = `professional`
+   - Check trial dates: `trial_end` should be 90 days in the future
+   - Coupon redemption record should exist with:
+     - `coupon_code` = `PILOT2025`
+     - `redeemed_at` timestamp
+     - `tenant_id` matches
+
+4. **Test Coupon Re-use Prevention:**
+   - Try registering another subscription with the same tenant
+   - Use coupon code `PILOT2025` again
+   - **Expected:** Should be rejected with error "Coupon already used by this tenant"
+
+5. **Check application logs:**
+   - Look for coupon validation messages
+   - Verify coupon redemption was logged
+   - Check subscription creation with trial extension
+   - No errors related to Stripe or coupon processing
+
+**Test Multiple Tiers:**
+Repeat this scenario with all three plans to verify the coupon works universally:
+- Starter Plan: €49/month → 90-day trial → €49/month after trial
+- Professional Plan: €149/month → 90-day trial → €149/month after trial
+- Enterprise Plan: €499/month → 90-day trial → €499/month after trial

---

@@ -410,7 +718,64 @@ Stripe provides test card numbers to simulate different scenarios. **Never use r

---

-### Scenario 7: Retrieve Invoices
+### Scenario 7: PILOT2025 Coupon - Maximum Redemptions Reached
+
+**Objective:** Test behavior when the PILOT2025 coupon reaches its 20-customer limit.
+
+**Preparation:**
+- This test should be performed after 20 successful redemptions
+- Or manually lower the coupon's `max_redemptions` in your database for testing (remember: the coupon is managed in your database, not in Stripe)
+
+**Steps:**
+
+1. **Set up for testing:**
+   - Option A: After 20 real redemptions
+   - Option B: In your database, set the PILOT2025 coupon's `max_redemptions` to `1` (e.g., `UPDATE coupons SET max_redemptions = 1 WHERE code = 'PILOT2025';`), then redeem it once
+
+2. 
**Attempt to register with the exhausted coupon:** + - Navigate to registration page + - Fill in user details with a NEW user (e.g., `customer21@example.com`) + - Fill in payment details + - Select any plan + - Enter coupon code: `PILOT2025` + - Try to apply/validate the coupon + +**Expected Results:** +- ❌ Coupon validation fails +- ✅ Error message displayed: "This coupon has reached its maximum redemption limit" or similar +- ✅ User cannot proceed with this coupon +- ✅ User can still register without the coupon (immediate billing) +- ✅ Application logs show coupon limit reached + +**Verification:** + +1. **In your database:** + ```sql + -- Check coupon redemption count + SELECT code, current_redemptions, max_redemptions + FROM coupons + WHERE code = 'PILOT2025'; + + -- Count actual redemptions + SELECT COUNT(*) FROM coupon_redemptions WHERE coupon_code = 'PILOT2025'; + ``` + - Verify `current_redemptions` shows `20` (or your test limit) + - Verify actual redemption count matches + +2. **Test alternate registration path:** + - Remove the coupon code + - Complete registration without coupon + - Verify subscription created successfully with immediate billing + +**Important Notes:** +- Once the pilot program ends (20 customers), you can: + - Create a new coupon code (e.g., `PILOT2026`) for future pilots + - Increase the max redemptions if extending the program + - Disable PILOT2025 in your application's environment variables + +--- + +### Scenario 8: Retrieve Invoices **Objective:** Test invoice retrieval from Stripe. @@ -445,6 +810,247 @@ Stripe provides test card numbers to simulate different scenarios. **Never use r Webhooks are critical for handling asynchronous events from Stripe. Test them thoroughly. +### Webhook Implementation Status + +**✅ Your webhook implementation is COMPLETE and ready for production use.** + +The webhook endpoint is implemented in [services/tenant/app/api/webhooks.py](services/tenant/app/api/webhooks.py:29-118) with: + +**Implemented Features:** +- ✅ Proper signature verification (lines 52-61) +- ✅ Direct endpoint at `/webhooks/stripe` (bypasses gateway) +- ✅ All critical event handlers: + - `checkout.session.completed` - New checkout completion + - `customer.subscription.created` - New subscription tracking + - `customer.subscription.updated` - Status changes, plan updates (lines 162-205) + - `customer.subscription.deleted` - Cancellation handling (lines 207-240) + - `invoice.payment_succeeded` - Payment success, activation (lines 242-266) + - `invoice.payment_failed` - Payment failure, past_due status (lines 268-297) + - `customer.subscription.trial_will_end` - Trial ending notification +- ✅ Database updates with proper async handling +- ✅ Subscription cache invalidation (lines 192-200, 226-235) +- ✅ Structured logging with `structlog` + +**Database Model Support:** +The [Subscription model](services/tenant/app/models/tenants.py:149-185) includes all necessary fields: +- `stripe_subscription_id` - Links to Stripe subscription +- `stripe_customer_id` - Links to Stripe customer +- `status` - Tracks subscription status +- `trial_ends_at` - Trial period tracking +- Complete lifecycle management + +**Webhook Flow:** +1. Stripe sends event → `https://your-domain/webhooks/stripe` +2. Signature verification with `STRIPE_WEBHOOK_SECRET` +3. Event routing to appropriate handler +4. Database updates (subscription status, dates, etc.) +5. Cache invalidation for updated tenants +6. 
Success/failure response to Stripe
+
+### Checking Tenant Service Logs
+
+Since webhooks go directly to the tenant service (bypassing the gateway), you need to monitor tenant service logs to verify webhook processing.
+
+#### Kubernetes Log Commands
+
+**Real-time logs (Development with Kind/Colima):**
+```bash
+# Follow tenant service logs in real-time
+kubectl logs -f deployment/tenant-service -n bakery-ia
+
+# Filter for Stripe-related events only
+kubectl logs -f deployment/tenant-service -n bakery-ia | grep -i stripe
+
+# Filter for webhook events specifically
+kubectl logs -f deployment/tenant-service -n bakery-ia | grep "webhook"
+```
+
+**Real-time logs (Production with MicroK8s):**
+```bash
+# Follow tenant service logs in real-time
+microk8s kubectl logs -f deployment/tenant-service -n bakery-ia
+
+# Filter for Stripe events
+microk8s kubectl logs -f deployment/tenant-service -n bakery-ia | grep -i stripe
+
+# Filter for webhook events
+microk8s kubectl logs -f deployment/tenant-service -n bakery-ia | grep "webhook"
+```
+
+**Historical logs:**
+```bash
+# View last hour of logs
+kubectl logs --since=1h deployment/tenant-service -n bakery-ia
+
+# View last 100 log lines
+kubectl logs --tail=100 deployment/tenant-service -n bakery-ia
+
+# View logs for a specific pod
+kubectl get pods -n bakery-ia  # Find pod name
+kubectl logs <pod-name> -n bakery-ia
+```
+
+**Search logs for specific events:**
+```bash
+# Find subscription update events
+kubectl logs deployment/tenant-service -n bakery-ia | grep "subscription.updated"
+
+# Find payment processing
+kubectl logs deployment/tenant-service -n bakery-ia | grep "payment"
+
+# Find errors
+kubectl logs deployment/tenant-service -n bakery-ia | grep -i error
+```
+
+#### Expected Log Output
+
+When webhooks are processed successfully, you should see logs like:
+
+**Successful webhook processing:**
+```
+INFO Processing Stripe webhook event event_type=customer.subscription.updated event_id=evt_xxx
+INFO Subscription updated in database subscription_id=sub_xxx tenant_id=tenant-uuid-xxx
+```
+
+**Payment succeeded:**
+```
+INFO Processing invoice.payment_succeeded invoice_id=in_xxx subscription_id=sub_xxx
+INFO Payment succeeded, subscription activated subscription_id=sub_xxx tenant_id=tenant-uuid-xxx
+```
+
+**Payment failed:**
+```
+ERROR Processing invoice.payment_failed invoice_id=in_xxx subscription_id=sub_xxx customer_id=cus_xxx
+WARNING Payment failed, subscription marked past_due subscription_id=sub_xxx tenant_id=tenant-uuid-xxx
+```
+
+**Signature verification errors (indicates webhook secret mismatch):**
+```
+ERROR Invalid webhook signature error=...
+```
+
+#### Troubleshooting Webhook Issues via Logs
+
+**1. Webhook not reaching service:**
+- Check whether any logs appear when Stripe sends the webhook
+- Verify network connectivity to the tenant service
+- Confirm the webhook URL is correct in Stripe Dashboard
+
+**2. Signature verification failing:**
+```bash
+# Look for signature errors
+kubectl logs deployment/tenant-service -n bakery-ia | grep "Invalid webhook signature"
+```
+- Verify `STRIPE_WEBHOOK_SECRET` in secrets.yaml matches Stripe Dashboard
+- Ensure the webhook secret is properly base64 encoded
+
+**3. Database not updating:**
+```bash
+# Check for database-related errors
+kubectl logs deployment/tenant-service -n bakery-ia | grep -i "database\|commit\|rollback"
+```
+- Verify the database connection is healthy
+- Check for transaction commit errors
+- Look for subscription not found errors
+
+**4. 
Cache invalidation failing:** +```bash +# Check for cache-related errors +kubectl logs deployment/tenant-service -n bakery-ia | grep -i "cache\|redis" +``` +- Verify Redis connection is healthy +- Check if cache service is running + +### Testing Webhooks in Kubernetes Environment + +You have two options for testing webhooks when running in Kubernetes: + +#### Option 1: Using Stripe CLI with Port-Forward (Recommended for Local Dev) + +This approach works with your Kind/Colima + Tilt setup: + +**Step 1: Port-forward the tenant service** +```bash +# Forward tenant service to localhost +kubectl port-forward svc/tenant-service 8001:8000 -n bakery-ia +``` + +**Step 2: Use Stripe CLI to forward webhooks** +```bash +# In a new terminal, forward webhooks to the port-forwarded service +stripe listen --forward-to http://localhost:8001/webhooks/stripe +``` + +**Step 3: Copy the webhook secret** +```bash +# The stripe listen command will output a webhook secret like: +# > Ready! Your webhook signing secret is whsec_abc123... + +# Encode it for Kubernetes secrets +echo -n "whsec_abc123..." | base64 +``` + +**Step 4: Update secrets and restart** +```bash +# Update the STRIPE_WEBHOOK_SECRET in secrets.yaml with the base64 value +# Then apply: +kubectl apply -f infrastructure/kubernetes/base/secrets.yaml +kubectl rollout restart deployment/tenant-service -n bakery-ia +``` + +**Step 5: Test webhook events** +```bash +# In another terminal, trigger test events +stripe trigger customer.subscription.updated +stripe trigger invoice.payment_succeeded + +# Watch tenant service logs +kubectl logs -f deployment/tenant-service -n bakery-ia +``` + +#### Option 2: Using ngrok for Public URL (Works for Dev and Prod Testing) + +This approach exposes your local Kubernetes cluster (or VPS) to the internet: + +**For Development (Kind/Colima):** + +**Step 1: Port-forward tenant service** +```bash +kubectl port-forward svc/tenant-service 8001:8000 -n bakery-ia +``` + +**Step 2: Start ngrok** +```bash +ngrok http 8001 +``` + +**Step 3: Configure Stripe webhook** +- Copy the ngrok HTTPS URL (e.g., `https://abc123.ngrok.io`) +- Go to Stripe Dashboard → Developers → Webhooks +- Add endpoint: `https://abc123.ngrok.io/webhooks/stripe` +- Copy the webhook signing secret +- Update secrets.yaml and restart tenant service + +**For Production (MicroK8s on VPS):** + +If your VPS has a public IP, you can configure webhooks directly: + +**Step 1: Ensure tenant service is accessible** +```bash +# Check if tenant service is exposed externally +microk8s kubectl get svc tenant-service -n bakery-ia + +# If using NodePort or LoadBalancer, get the public endpoint +# If using Ingress, ensure DNS points to your VPS +``` + +**Step 2: Configure Stripe webhook** +- Use your public URL: `https://your-domain.com/webhooks/stripe` +- Or IP: `https://your-vps-ip:port/webhooks/stripe` +- Add endpoint in Stripe Dashboard +- Copy webhook signing secret +- Update secrets.yaml and apply to production cluster + ### Option 1: Using Stripe CLI (Recommended for Local Development) #### Step 1: Install Stripe CLI @@ -826,3 +1432,225 @@ If you encounter issues not covered in this guide: **Stripe Library Versions:** - Frontend: `@stripe/stripe-js@4.0.0`, `@stripe/react-stripe-js@3.0.0` - Backend: `stripe@14.1.0` + +--- + +## Appendix A: Quick Setup Checklist for Bakery IA + +Use this checklist when setting up your Stripe test environment: + +### 1. 
Stripe Dashboard Setup
+
+- [ ] Create Stripe account and enable Test Mode
+- [ ] Create **Starter Plan** product with 2 prices:
+  - [ ] Monthly: €49.00 EUR (Price ID: `price_XXXXXX`)
+  - [ ] Yearly: €490.00 EUR (Price ID: `price_XXXXXX`)
+- [ ] Create **Professional Plan** product with 2 prices:
+  - [ ] Monthly: €149.00 EUR (Price ID: `price_XXXXXX`)
+  - [ ] Yearly: €1,490.00 EUR (Price ID: `price_XXXXXX`)
+- [ ] Create **Enterprise Plan** product with 2 prices:
+  - [ ] Monthly: €499.00 EUR (Price ID: `price_XXXXXX`)
+  - [ ] Yearly: €4,990.00 EUR (Price ID: `price_XXXXXX`)
+- [ ] Do NOT configure default trial periods on products
+- [ ] Copy all 6 Price IDs for configuration
+- [ ] Retrieve Stripe API keys (publishable and secret)
+- [ ] Configure webhook endpoint (use Stripe CLI for local testing)
+- [ ] Get webhook signing secret
+
+### 2. Environment Configuration
+
+**Note:** This project runs exclusively in Kubernetes (Kind/Colima + Tilt for dev, MicroK8s for prod).
+
+**Frontend - Kubernetes ConfigMap:**
+
+Update [infrastructure/kubernetes/base/configmap.yaml](infrastructure/kubernetes/base/configmap.yaml:374-378):
+```yaml
+# FRONTEND CONFIGURATION section
+VITE_STRIPE_PUBLISHABLE_KEY: "pk_test_XXXXXXXXXXXXXXXX"
+VITE_PILOT_MODE_ENABLED: "true"
+VITE_PILOT_COUPON_CODE: "PILOT2025"
+VITE_PILOT_TRIAL_MONTHS: "3"
+```
+
+**Backend - Kubernetes Secrets:**
+
+Update [infrastructure/kubernetes/base/secrets.yaml](infrastructure/kubernetes/base/secrets.yaml:142-150):
+```yaml
+# payment-secrets section
+data:
+  STRIPE_SECRET_KEY: <base64-encoded-key>
+  STRIPE_WEBHOOK_SECRET: <base64-encoded-secret>
+```
+
+Encode your keys:
+```bash
+echo -n "sk_test_YOUR_KEY" | base64
+echo -n "whsec_YOUR_SECRET" | base64
+```
+
+**Apply to Kubernetes:**
+```bash
+# Development (Kind/Colima with Tilt - auto-reloads)
+kubectl apply -f infrastructure/kubernetes/base/configmap.yaml
+kubectl apply -f infrastructure/kubernetes/base/secrets.yaml
+
+# Production (MicroK8s)
+microk8s kubectl apply -f infrastructure/kubernetes/base/configmap.yaml
+microk8s kubectl apply -f infrastructure/kubernetes/base/secrets.yaml
+microk8s kubectl rollout restart deployment/frontend-deployment -n bakery-ia
+microk8s kubectl rollout restart deployment/tenant-service -n bakery-ia
+```
+
+### 3. Database Verification
+
+- [ ] Verify PILOT2025 coupon exists in database:
+  ```sql
+  SELECT * FROM coupons WHERE code = 'PILOT2025';
+  ```
+- [ ] Confirm coupon has max_redemptions = 20 and active = true
+
+### 4. Test Scenarios Priority Order
+
+Run these tests in order:
+
+1. **Scenario 1:** Registration without coupon (immediate billing)
+2. **Scenario 1B:** Registration with PILOT2025 coupon (90-day trial, €0 charge)
+3. **Scenario 2:** 3D Secure authentication
+4. **Scenario 3:** Declined payment
+5. **Scenario 5:** Subscription cancellation
+6. **Scenario 7:** Coupon redemption limit
+7. **Webhook testing:** Use Stripe CLI
+
+### 5. Stripe CLI Setup (Recommended)
+
+```bash
+# Install Stripe CLI
+brew install stripe/stripe-cli/stripe  # macOS
+
+# Login
+stripe login
+
+# Port-forward the tenant service (in a separate terminal)
+kubectl port-forward svc/tenant-service 8001:8000 -n bakery-ia
+
+# Forward webhooks to the port-forwarded service
+stripe listen --forward-to http://localhost:8001/webhooks/stripe
+
+# Copy the webhook secret (whsec_...) into secrets.yaml (base64-encoded), not 
to your backend .env +``` + +--- + +## Appendix B: Pricing Summary + +### Monthly Pricing (EUR) + +| Tier | Monthly | Yearly | Yearly Savings | Trial (with PILOT2025) | +|------|---------|--------|----------------|------------------------| +| **Starter** | €49 | €490 | €98 (17%) | 90 days €0 | +| **Professional** | €149 | €1,490 | €298 (17%) | 90 days €0 | +| **Enterprise** | €499 | €4,990 | €998 (17%) | 90 days €0 | + +### PILOT2025 Coupon Details + +- **Code:** `PILOT2025` +- **Type:** Trial extension (90 days) +- **Discount:** 100% off for first 3 months (€0 charge) +- **Applies to:** All tiers +- **Max redemptions:** 20 customers +- **Valid period:** 180 days from creation +- **Managed in:** Application database (NOT Stripe) + +### Billing Timeline Example + +**Professional Plan with PILOT2025:** + +| Day | Event | Charge | +|-----|-------|--------| +| 0 | Subscription created | €0 | +| 1-90 | Trial period (trialing status) | €0 | +| 91 | Trial ends, first invoice | €149 | +| 121 | Second invoice (30 days later) | €149 | +| 151 | Third invoice | €149 | +| ... | Monthly recurring | €149 | + +**Professional Plan without coupon:** + +| Day | Event | Charge | +|-----|-------|--------| +| 0 | Subscription created | €149 | +| 30 | First renewal | €149 | +| 60 | Second renewal | €149 | +| ... | Monthly recurring | €149 | + +### Test Card Quick Reference + +| Purpose | Card Number | Result | +|---------|-------------|--------| +| **Success** | `4242 4242 4242 4242` | Payment succeeds | +| **3D Secure** | `4000 0025 0000 3155` | Requires authentication | +| **Declined** | `4000 0000 0000 0002` | Card declined | +| **Insufficient Funds** | `4000 0000 0000 9995` | Insufficient funds | + +**Always use:** +- Expiry: Any future date (e.g., `12/30`) +- CVC: Any 3 digits (e.g., `123`) +- ZIP: Any valid format (e.g., `12345`) + +--- + +## Appendix C: Troubleshooting PILOT2025 Coupon + +### Issue: Coupon code not recognized + +**Possible causes:** +1. Coupon not created in database +2. Frontend environment variable not set +3. Coupon expired or inactive + +**Solution:** +```sql +-- Check if coupon exists +SELECT * FROM coupons WHERE code = 'PILOT2025'; + +-- If not found, manually insert (or restart application to trigger seeder) +INSERT INTO coupons (code, discount_type, discount_value, max_redemptions, active, valid_from, valid_until) +VALUES ('PILOT2025', 'trial_extension', 90, 20, true, NOW(), NOW() + INTERVAL '180 days'); +``` + +### Issue: Trial not applying in Stripe + +**Check:** +1. Backend logs for trial_period_days parameter +2. Verify coupon validation succeeded before subscription creation +3. Check subscription creation params in Stripe Dashboard logs + +**Expected log output:** +``` +INFO: Coupon PILOT2025 validated successfully +INFO: Creating subscription with trial_period_days=90 +INFO: Stripe subscription created with status=trialing +``` + +### Issue: Customer charged immediately despite coupon + +**Possible causes:** +1. Coupon validation failed silently +2. trial_period_days not passed to Stripe +3. Payment method attached before subscription (triggers charge) + +**Debug:** +```sql +-- Check if coupon was redeemed +SELECT * FROM coupon_redemptions WHERE tenant_id = 'your-tenant-id'; + +-- Check subscription trial dates +SELECT stripe_subscription_id, status, trial_ends_at +FROM subscriptions WHERE tenant_id = 'your-tenant-id'; +``` + +### Issue: Coupon already redeemed error + +**This is expected behavior.** Each tenant can only use PILOT2025 once. 
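+The check behind this error is a per-tenant lookup in `coupon_redemptions` before redeeming — conceptually equivalent to this query (a sketch, not the service's actual SQL):
+
+```sql
+-- a returned row means this tenant already redeemed the coupon
+SELECT 1
+FROM coupon_redemptions
+WHERE coupon_code = 'PILOT2025'
+  AND tenant_id = 'your-tenant-id';
+```
+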
To test multiple times: +- Create new tenant accounts with different emails +- Or manually delete redemption record from database (test only): + ```sql + DELETE FROM coupon_redemptions WHERE tenant_id = 'test-tenant-id'; + ``` diff --git a/Tiltfile b/Tiltfile index a1973a55..1ae26389 100644 --- a/Tiltfile +++ b/Tiltfile @@ -7,6 +7,13 @@ # - PostgreSQL pgcrypto extension and audit logging # - Organized resource dependencies and live-reload capabilities # - Local registry for faster image builds and deployments +# +# Build Optimization: +# - Services only rebuild when their specific code changes (not all services) +# - Shared folder changes trigger rebuild of ALL services (as they all depend on it) +# - Uses 'only' parameter to watch only relevant files per service +# - Frontend only rebuilds when frontend/ code changes +# - Gateway only rebuilds when gateway/ or shared/ code changes # ============================================================================= # ============================================================================= @@ -197,16 +204,25 @@ k8s_yaml(kustomize('infrastructure/kubernetes/overlays/dev')) # ============================================================================= # Helper function for Python services with live updates +# This function ensures services only rebuild when their specific code changes, +# but all services rebuild when shared/ folder changes def build_python_service(service_name, service_path): docker_build( 'bakery/' + service_name, context='.', dockerfile='./services/' + service_path + '/Dockerfile', + # Only watch files relevant to this specific service + shared code + only=[ + './services/' + service_path, + './shared', + './scripts', + ], live_update=[ # Fall back to full image build if Dockerfile or requirements change fall_back_on([ './services/' + service_path + '/Dockerfile', - './services/' + service_path + '/requirements.txt' + './services/' + service_path + '/requirements.txt', + './shared/requirements-tracing.txt', ]), # Sync service code @@ -290,10 +306,21 @@ docker_build( 'bakery/gateway', context='.', dockerfile='./gateway/Dockerfile', + # Only watch gateway-specific files and shared code + only=[ + './gateway', + './shared', + './scripts', + ], live_update=[ - fall_back_on(['./gateway/Dockerfile', './gateway/requirements.txt']), + fall_back_on([ + './gateway/Dockerfile', + './gateway/requirements.txt', + './shared/requirements-tracing.txt', + ]), sync('./gateway', '/app'), sync('./shared', '/app/shared'), + sync('./scripts', '/app/scripts'), run('kill -HUP 1', trigger=['./gateway/**/*.py', './shared/**/*.py']), ], ignore=[ @@ -680,6 +707,13 @@ Documentation: docs/SECURITY_IMPLEMENTATION_COMPLETE.md docs/DATABASE_SECURITY_ANALYSIS_REPORT.md +Build Optimization Active: + ✅ Services only rebuild when their code changes + ✅ Shared folder changes trigger ALL services (as expected) + ✅ Reduces unnecessary rebuilds and disk usage + 💡 Edit service code: only that service rebuilds + 💡 Edit shared/ code: all services rebuild (required) + Useful Commands: # Work on specific services only tilt up diff --git a/frontend/src/components/domain/onboarding/steps/RegisterTenantStep.tsx b/frontend/src/components/domain/onboarding/steps/RegisterTenantStep.tsx index 89e0d31a..292c3745 100644 --- a/frontend/src/components/domain/onboarding/steps/RegisterTenantStep.tsx +++ b/frontend/src/components/domain/onboarding/steps/RegisterTenantStep.tsx @@ -198,16 +198,27 @@ export const RegisterTenantStep: React.FC = ({ // Trigger POI detection in the 
background (non-blocking) // This replaces the removed POI Detection step + // POI detection will be cached for 90 days and reused during training const bakeryLocation = wizardContext.state.bakeryLocation; if (bakeryLocation?.latitude && bakeryLocation?.longitude && tenant.id) { + console.log(`🔍 Triggering background POI detection for tenant ${tenant.id}...`); + // Run POI detection asynchronously without blocking the wizard flow + // This ensures POI data is ready before the training step poiContextApi.detectPOIs( tenant.id, bakeryLocation.latitude, bakeryLocation.longitude, - false // use_cache = false for initial detection + false // force_refresh = false, will use cache if available ).then((result) => { - console.log(`✅ POI detection completed automatically for tenant ${tenant.id}:`, result.summary); + const source = result.source || 'unknown'; + console.log(`✅ POI detection completed for tenant ${tenant.id} (source: ${source})`); + + if (result.poi_context) { + const totalPois = result.poi_context.total_pois_detected || 0; + const relevantCategories = result.poi_context.relevant_categories?.length || 0; + console.log(`📍 POI Summary: ${totalPois} POIs detected, ${relevantCategories} relevant categories`); + } // Phase 3: Handle calendar suggestion if available if (result.calendar_suggestion) { @@ -229,9 +240,12 @@ export const RegisterTenantStep: React.FC = ({ } } }).catch((error) => { - console.warn('⚠️ Background POI detection failed (non-blocking):', error); - // This is non-critical, so we don't block the user + console.warn('⚠️ Background POI detection failed (non-blocking):', error); + console.warn('Training will continue without POI features if detection is not complete.'); + // This is non-critical - training service will continue without POI features }); + } else { + console.warn('⚠️ Cannot trigger POI detection: missing location data or tenant ID'); } // Update the wizard context with tenant info diff --git a/gateway/README.md b/gateway/README.md index dddf2857..a70fd641 100644 --- a/gateway/README.md +++ b/gateway/README.md @@ -352,6 +352,25 @@ headers = { - **Caching**: Gateway caches validated service tokens for 5 minutes - **No Additional HTTP Calls**: Service auth happens locally at gateway +### Unified Header Management System + +The gateway uses a **centralized HeaderManager** for consistent header handling across all middleware and proxy layers. 
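+In middleware and proxy code it is used roughly like this (a sketch of the flow; see `app/core/header_manager.py` for the full API):
+
+```python
+from app.core.header_manager import header_manager
+
+# Inside an auth middleware or proxy handler, after validating the token:
+header_manager.sanitize_incoming_headers(request)   # strip spoofable x-* headers
+header_manager.inject_context_headers(request, user_context, tenant_id=tenant_id)
+
+# When proxying downstream, collect everything to forward:
+headers = header_manager.get_all_headers_for_proxy(request)
+```
+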
+ +**Key Features:** +- Standardized header names and conventions +- Automatic header sanitization to prevent spoofing +- Unified header injection and forwarding +- Cross-middleware header access via `request.state.injected_headers` +- Consistent logging and error handling + +**Standard Headers:** +- `x-user-id`, `x-user-email`, `x-user-role`, `x-user-type` +- `x-service-name`, `x-tenant-id` +- `x-subscription-tier`, `x-subscription-status` +- `x-is-demo`, `x-demo-session-id`, `x-demo-account-type` +- `x-tenant-access-type`, `x-can-view-children`, `x-parent-tenant-id` +- `x-forwarded-by`, `x-request-id` + ### Context Header Injection When a service token is validated, the gateway injects these headers for downstream services: diff --git a/gateway/app/core/header_manager.py b/gateway/app/core/header_manager.py new file mode 100644 index 00000000..54f5b0d6 --- /dev/null +++ b/gateway/app/core/header_manager.py @@ -0,0 +1,345 @@ +""" +Unified Header Management System for API Gateway +Centralized header injection, forwarding, and validation +""" + +import structlog +from fastapi import Request +from typing import Dict, Any, Optional, List + +logger = structlog.get_logger() + + +class HeaderManager: + """ + Centralized header management for consistent header handling across gateway + """ + + # Standard header names (lowercase for consistency) + STANDARD_HEADERS = { + 'user_id': 'x-user-id', + 'user_email': 'x-user-email', + 'user_role': 'x-user-role', + 'user_type': 'x-user-type', + 'service_name': 'x-service-name', + 'tenant_id': 'x-tenant-id', + 'subscription_tier': 'x-subscription-tier', + 'subscription_status': 'x-subscription-status', + 'is_demo': 'x-is-demo', + 'demo_session_id': 'x-demo-session-id', + 'demo_account_type': 'x-demo-account-type', + 'tenant_access_type': 'x-tenant-access-type', + 'can_view_children': 'x-can-view-children', + 'parent_tenant_id': 'x-parent-tenant-id', + 'forwarded_by': 'x-forwarded-by', + 'request_id': 'x-request-id' + } + + # Headers that should be sanitized/removed from incoming requests + SANITIZED_HEADERS = [ + 'x-subscription-', + 'x-user-', + 'x-tenant-', + 'x-demo-', + 'x-forwarded-by' + ] + + # Headers that should be forwarded to downstream services + FORWARDABLE_HEADERS = [ + 'authorization', + 'content-type', + 'accept', + 'accept-language', + 'user-agent', + 'x-internal-service' # Required for internal service-to-service ML/alert triggers + ] + + def __init__(self): + self._initialized = False + + def initialize(self): + """Initialize header manager""" + if not self._initialized: + logger.info("HeaderManager initialized") + self._initialized = True + + def sanitize_incoming_headers(self, request: Request) -> None: + """ + Remove sensitive headers from incoming request to prevent spoofing + """ + if not hasattr(request.headers, '_list'): + return + + # Filter out headers that start with sanitized prefixes + sanitized_headers = [ + (k, v) for k, v in request.headers.raw + if not any(k.decode().lower().startswith(prefix.lower()) + for prefix in self.SANITIZED_HEADERS) + ] + + request.headers.__dict__["_list"] = sanitized_headers + logger.debug("Sanitized incoming headers") + + def inject_context_headers(self, request: Request, user_context: Dict[str, Any], + tenant_id: Optional[str] = None) -> Dict[str, str]: + """ + Inject standardized context headers into request + Returns dict of injected headers for reference + """ + injected_headers = {} + + # Ensure headers list exists + if not hasattr(request.headers, '_list'): + 
request.headers.__dict__["_list"] = [] + + # Store headers in request.state for cross-middleware access + request.state.injected_headers = {} + + # User context headers + if user_context.get('user_id'): + header_name = self.STANDARD_HEADERS['user_id'] + header_value = str(user_context['user_id']) + self._add_header(request, header_name, header_value) + injected_headers[header_name] = header_value + request.state.injected_headers[header_name] = header_value + + if user_context.get('email'): + header_name = self.STANDARD_HEADERS['user_email'] + header_value = str(user_context['email']) + self._add_header(request, header_name, header_value) + injected_headers[header_name] = header_value + request.state.injected_headers[header_name] = header_value + + if user_context.get('role'): + header_name = self.STANDARD_HEADERS['user_role'] + header_value = str(user_context['role']) + self._add_header(request, header_name, header_value) + injected_headers[header_name] = header_value + request.state.injected_headers[header_name] = header_value + + # User type (service vs regular user) + if user_context.get('type'): + header_name = self.STANDARD_HEADERS['user_type'] + header_value = str(user_context['type']) + self._add_header(request, header_name, header_value) + injected_headers[header_name] = header_value + request.state.injected_headers[header_name] = header_value + + # Service name for service tokens + if user_context.get('service'): + header_name = self.STANDARD_HEADERS['service_name'] + header_value = str(user_context['service']) + self._add_header(request, header_name, header_value) + injected_headers[header_name] = header_value + request.state.injected_headers[header_name] = header_value + + # Tenant context + if tenant_id: + header_name = self.STANDARD_HEADERS['tenant_id'] + header_value = str(tenant_id) + self._add_header(request, header_name, header_value) + injected_headers[header_name] = header_value + request.state.injected_headers[header_name] = header_value + + # Subscription context + if user_context.get('subscription_tier'): + header_name = self.STANDARD_HEADERS['subscription_tier'] + header_value = str(user_context['subscription_tier']) + self._add_header(request, header_name, header_value) + injected_headers[header_name] = header_value + request.state.injected_headers[header_name] = header_value + + if user_context.get('subscription_status'): + header_name = self.STANDARD_HEADERS['subscription_status'] + header_value = str(user_context['subscription_status']) + self._add_header(request, header_name, header_value) + injected_headers[header_name] = header_value + request.state.injected_headers[header_name] = header_value + + # Demo session context + is_demo = user_context.get('is_demo', False) + if is_demo: + header_name = self.STANDARD_HEADERS['is_demo'] + header_value = "true" + self._add_header(request, header_name, header_value) + injected_headers[header_name] = header_value + request.state.injected_headers[header_name] = header_value + + if user_context.get('demo_session_id'): + header_name = self.STANDARD_HEADERS['demo_session_id'] + header_value = str(user_context['demo_session_id']) + self._add_header(request, header_name, header_value) + injected_headers[header_name] = header_value + request.state.injected_headers[header_name] = header_value + + if user_context.get('demo_account_type'): + header_name = self.STANDARD_HEADERS['demo_account_type'] + header_value = str(user_context['demo_account_type']) + self._add_header(request, header_name, header_value) + 
injected_headers[header_name] = header_value + request.state.injected_headers[header_name] = header_value + + # Hierarchical access context + if tenant_id: + tenant_access_type = getattr(request.state, 'tenant_access_type', 'direct') + can_view_children = getattr(request.state, 'can_view_children', False) + + header_name = self.STANDARD_HEADERS['tenant_access_type'] + header_value = str(tenant_access_type) + self._add_header(request, header_name, header_value) + injected_headers[header_name] = header_value + request.state.injected_headers[header_name] = header_value + + header_name = self.STANDARD_HEADERS['can_view_children'] + header_value = str(can_view_children).lower() + self._add_header(request, header_name, header_value) + injected_headers[header_name] = header_value + request.state.injected_headers[header_name] = header_value + + # Parent tenant ID if hierarchical access + parent_tenant_id = getattr(request.state, 'parent_tenant_id', None) + if parent_tenant_id: + header_name = self.STANDARD_HEADERS['parent_tenant_id'] + header_value = str(parent_tenant_id) + self._add_header(request, header_name, header_value) + injected_headers[header_name] = header_value + request.state.injected_headers[header_name] = header_value + + # Gateway identification + header_name = self.STANDARD_HEADERS['forwarded_by'] + header_value = "bakery-gateway" + self._add_header(request, header_name, header_value) + injected_headers[header_name] = header_value + request.state.injected_headers[header_name] = header_value + + # Request ID if available + request_id = getattr(request.state, 'request_id', None) + if request_id: + header_name = self.STANDARD_HEADERS['request_id'] + header_value = str(request_id) + self._add_header(request, header_name, header_value) + injected_headers[header_name] = header_value + request.state.injected_headers[header_name] = header_value + + logger.info("🔧 Injected context headers", + user_id=user_context.get('user_id'), + user_type=user_context.get('type', ''), + service_name=user_context.get('service', ''), + role=user_context.get('role', ''), + tenant_id=tenant_id, + is_demo=is_demo, + demo_session_id=user_context.get('demo_session_id', ''), + path=request.url.path) + + return injected_headers + + def _add_header(self, request: Request, header_name: str, header_value: str) -> None: + """ + Safely add header to request + """ + try: + request.headers.__dict__["_list"].append((header_name.encode(), header_value.encode())) + except Exception as e: + logger.warning(f"Failed to add header {header_name}: {e}") + + def get_forwardable_headers(self, request: Request) -> Dict[str, str]: + """ + Get headers that should be forwarded to downstream services + Includes both original request headers and injected context headers + """ + forwardable_headers = {} + + # Add forwardable original headers + for header_name in self.FORWARDABLE_HEADERS: + header_value = request.headers.get(header_name) + if header_value: + forwardable_headers[header_name] = header_value + + # Add injected context headers from request.state + if hasattr(request.state, 'injected_headers'): + for header_name, header_value in request.state.injected_headers.items(): + forwardable_headers[header_name] = header_value + + # Add authorization header if present + auth_header = request.headers.get('authorization') + if auth_header: + forwardable_headers['authorization'] = auth_header + + return forwardable_headers + + def get_all_headers_for_proxy(self, request: Request, + additional_headers: Optional[Dict[str, str]] = None) -> 
Dict[str, str]: + """ + Get complete set of headers for proxying to downstream services + """ + headers = self.get_forwardable_headers(request) + + # Add any additional headers + if additional_headers: + headers.update(additional_headers) + + # Remove host header as it will be set by httpx + headers.pop('host', None) + + return headers + + def validate_required_headers(self, request: Request, required_headers: List[str]) -> bool: + """ + Validate that required headers are present + """ + missing_headers = [] + + for header_name in required_headers: + # Check in injected headers first + if hasattr(request.state, 'injected_headers'): + if header_name in request.state.injected_headers: + continue + + # Check in request headers + if request.headers.get(header_name): + continue + + missing_headers.append(header_name) + + if missing_headers: + logger.warning(f"Missing required headers: {missing_headers}") + return False + + return True + + def get_header_value(self, request: Request, header_name: str, + default: Optional[str] = None) -> Optional[str]: + """ + Get header value from either injected headers or request headers + """ + # Check injected headers first + if hasattr(request.state, 'injected_headers'): + if header_name in request.state.injected_headers: + return request.state.injected_headers[header_name] + + # Check request headers + return request.headers.get(header_name, default) + + def add_header_for_middleware(self, request: Request, header_name: str, header_value: str) -> None: + """ + Allow middleware to add headers to the unified header system + This ensures all headers are available for proxying + """ + # Ensure injected_headers exists + if not hasattr(request.state, 'injected_headers'): + request.state.injected_headers = {} + + # Add header to injected_headers + request.state.injected_headers[header_name] = header_value + + # Also add to actual request headers for compatibility + try: + request.headers.__dict__["_list"].append((header_name.encode(), header_value.encode())) + except Exception as e: + logger.warning(f"Failed to add header {header_name} to request headers: {e}") + + logger.debug(f"Middleware added header: {header_name} = {header_value}") + + +# Global instance for easy access +header_manager = HeaderManager() \ No newline at end of file diff --git a/gateway/app/main.py b/gateway/app/main.py index 5f9d8bc6..5240feae 100644 --- a/gateway/app/main.py +++ b/gateway/app/main.py @@ -16,6 +16,7 @@ from shared.redis_utils import initialize_redis, close_redis, get_redis_client from shared.service_base import StandardFastAPIService from app.core.config import settings +from app.core.header_manager import header_manager from app.middleware.request_id import RequestIDMiddleware from app.middleware.auth import AuthMiddleware from app.middleware.logging import LoggingMiddleware @@ -50,6 +51,10 @@ class GatewayService(StandardFastAPIService): """Custom startup logic for Gateway""" global redis_client + # Initialize HeaderManager + header_manager.initialize() + logger.info("HeaderManager initialized") + # Initialize Redis try: await initialize_redis(settings.REDIS_URL, db=0, max_connections=50) diff --git a/gateway/app/middleware/auth.py b/gateway/app/middleware/auth.py index cb2d0301..a40009bf 100644 --- a/gateway/app/middleware/auth.py +++ b/gateway/app/middleware/auth.py @@ -14,6 +14,7 @@ import httpx import json from app.core.config import settings +from app.core.header_manager import header_manager from shared.auth.jwt_handler import JWTHandler from shared.auth.tenant_access 
import tenant_access_manager, extract_tenant_id_from_path, is_tenant_scoped_path @@ -60,15 +61,8 @@ class AuthMiddleware(BaseHTTPMiddleware): if request.method == "OPTIONS": return await call_next(request) - # SECURITY: Remove any incoming x-subscription-* headers - # These will be re-injected from verified JWT only - sanitized_headers = [ - (k, v) for k, v in request.headers.raw - if not k.decode().lower().startswith('x-subscription-') - and not k.decode().lower().startswith('x-user-') - and not k.decode().lower().startswith('x-tenant-') - ] - request.headers.__dict__["_list"] = sanitized_headers + # SECURITY: Remove any incoming sensitive headers using HeaderManager + header_manager.sanitize_incoming_headers(request) # Skip authentication for public routes if self._is_public_route(request.url.path): @@ -573,109 +567,13 @@ class AuthMiddleware(BaseHTTPMiddleware): async def _inject_context_headers(self, request: Request, user_context: Dict[str, Any], tenant_id: Optional[str] = None): """ - Inject user and tenant context headers for downstream services - ENHANCED: Added logging to verify header injection + Inject user and tenant context headers for downstream services using unified HeaderManager """ - # Enhanced logging for debugging - logger.info( - "🔧 Injecting context headers", - user_id=user_context.get("user_id"), - user_type=user_context.get("type", ""), - service_name=user_context.get("service", ""), - role=user_context.get("role", ""), - tenant_id=tenant_id, - is_demo=user_context.get("is_demo", False), - demo_session_id=user_context.get("demo_session_id", ""), - path=request.url.path - ) - - # Add user context headers - logger.debug(f"DEBUG: Injecting headers for user: {user_context.get('user_id')}, is_demo: {user_context.get('is_demo', False)}") - logger.debug(f"DEBUG: request.headers object id: {id(request.headers)}, _list id: {id(request.headers.__dict__.get('_list', []))}") + # Use unified HeaderManager for consistent header injection + injected_headers = header_manager.inject_context_headers(request, user_context, tenant_id) - # Store headers in request.state for cross-middleware access - request.state.injected_headers = { - "x-user-id": user_context["user_id"], - "x-user-email": user_context["email"], - "x-user-role": user_context.get("role", "user") - } - - request.headers.__dict__["_list"].append(( - b"x-user-id", user_context["user_id"].encode() - )) - request.headers.__dict__["_list"].append(( - b"x-user-email", user_context["email"].encode() - )) - - user_role = user_context.get("role", "user") - request.headers.__dict__["_list"].append(( - b"x-user-role", user_role.encode() - )) - - user_type = user_context.get("type", "") - if user_type: - request.headers.__dict__["_list"].append(( - b"x-user-type", user_type.encode() - )) - - service_name = user_context.get("service", "") - if service_name: - request.headers.__dict__["_list"].append(( - b"x-service-name", service_name.encode() - )) - - # Add tenant context if available - if tenant_id: - request.headers.__dict__["_list"].append(( - b"x-tenant-id", tenant_id.encode() - )) - - # Add subscription tier if available - subscription_tier = user_context.get("subscription_tier", "") - if subscription_tier: - request.headers.__dict__["_list"].append(( - b"x-subscription-tier", subscription_tier.encode() - )) - - # Add is_demo flag for demo sessions - is_demo = user_context.get("is_demo", False) - logger.debug(f"DEBUG: is_demo value: {is_demo}, type: {type(is_demo)}") - if is_demo: - logger.info(f"🎭 Adding demo session headers", 
- demo_session_id=user_context.get("demo_session_id", ""), - demo_account_type=user_context.get("demo_account_type", ""), - path=request.url.path) - request.headers.__dict__["_list"].append(( - b"x-is-demo", b"true" - )) - else: - logger.debug(f"DEBUG: Not adding demo headers because is_demo is: {is_demo}") - - # Add demo session context headers for backend services - demo_session_id = user_context.get("demo_session_id", "") - if demo_session_id: - request.headers.__dict__["_list"].append(( - b"x-demo-session-id", demo_session_id.encode() - )) - - demo_account_type = user_context.get("demo_account_type", "") - if demo_account_type: - request.headers.__dict__["_list"].append(( - b"x-demo-account-type", demo_account_type.encode() - )) - # Add hierarchical access headers if tenant context exists if tenant_id: - tenant_access_type = getattr(request.state, 'tenant_access_type', 'direct') - can_view_children = getattr(request.state, 'can_view_children', False) - - request.headers.__dict__["_list"].append(( - b"x-tenant-access-type", tenant_access_type.encode() - )) - request.headers.__dict__["_list"].append(( - b"x-can-view-children", str(can_view_children).encode() - )) - # If this is hierarchical access, include parent tenant ID # Get parent tenant ID from the auth service if available try: @@ -689,17 +587,16 @@ class AuthMiddleware(BaseHTTPMiddleware): hierarchy_data = response.json() parent_tenant_id = hierarchy_data.get("parent_tenant_id") if parent_tenant_id: - request.headers.__dict__["_list"].append(( - b"x-parent-tenant-id", parent_tenant_id.encode() - )) + # Add parent tenant ID using HeaderManager for consistency + header_name = header_manager.STANDARD_HEADERS['parent_tenant_id'] + header_value = str(parent_tenant_id) + header_manager.add_header_for_middleware(request, header_name, header_value) + logger.info(f"Added parent tenant ID header: {parent_tenant_id}") except Exception as e: logger.warning(f"Failed to get parent tenant ID: {e}") pass - - # Add gateway identification - request.headers.__dict__["_list"].append(( - b"x-forwarded-by", b"bakery-gateway" - )) + + return injected_headers async def _get_tenant_subscription_tier(self, tenant_id: str, request: Request) -> Optional[str]: """ diff --git a/gateway/app/middleware/rate_limiting.py b/gateway/app/middleware/rate_limiting.py index a1ba472b..2a022657 100644 --- a/gateway/app/middleware/rate_limiting.py +++ b/gateway/app/middleware/rate_limiting.py @@ -45,8 +45,17 @@ class APIRateLimitMiddleware(BaseHTTPMiddleware): return await call_next(request) try: - # Get subscription tier - subscription_tier = await self._get_subscription_tier(tenant_id, request) + # Get subscription tier from headers (added by AuthMiddleware) + subscription_tier = request.headers.get("x-subscription-tier") + + if not subscription_tier: + # Fallback: get from request state if headers not available + subscription_tier = getattr(request.state, "subscription_tier", None) + + if not subscription_tier: + # Final fallback: get from tenant service (should rarely happen) + subscription_tier = await self._get_subscription_tier(tenant_id, request) + logger.warning(f"Subscription tier not found in headers or state, fetched from tenant service: {subscription_tier}") # Get quota limit for tier quota_limit = self._get_quota_limit(subscription_tier) diff --git a/gateway/app/middleware/request_id.py b/gateway/app/middleware/request_id.py index 81b63255..e729c450 100644 --- a/gateway/app/middleware/request_id.py +++ b/gateway/app/middleware/request_id.py @@ -9,6 +9,8 @@ 
from fastapi import Request from starlette.middleware.base import BaseHTTPMiddleware from starlette.responses import Response +from app.core.header_manager import header_manager + logger = structlog.get_logger() @@ -40,11 +42,9 @@ class RequestIDMiddleware(BaseHTTPMiddleware): # Bind request ID to structured logger context logger_ctx = logger.bind(request_id=request_id) - # Inject request ID header for downstream services - # This is done by modifying the headers that will be forwarded - request.headers.__dict__["_list"].append(( - b"x-request-id", request_id.encode() - )) + # Inject request ID header for downstream services using HeaderManager + # Note: This runs early in middleware chain, so we use add_header_for_middleware + header_manager.add_header_for_middleware(request, "x-request-id", request_id) # Log request start logger_ctx.info( diff --git a/gateway/app/middleware/subscription.py b/gateway/app/middleware/subscription.py index 8eb9ac11..18724eda 100644 --- a/gateway/app/middleware/subscription.py +++ b/gateway/app/middleware/subscription.py @@ -15,6 +15,7 @@ import asyncio from datetime import datetime, timezone from app.core.config import settings +from app.core.header_manager import header_manager from app.utils.subscription_error_responses import create_upgrade_required_response logger = structlog.get_logger() @@ -178,7 +179,10 @@ class SubscriptionMiddleware(BaseHTTPMiddleware): r'/api/v1/subscriptions/.*', # Subscription management itself r'/api/v1/tenants/[^/]+/members.*', # Basic tenant info r'/docs.*', - r'/openapi\.json' + r'/openapi\.json', + # Training monitoring endpoints (WebSocket and status checks) + r'/api/v1/tenants/[^/]+/training/jobs/.*/live.*', # WebSocket endpoint + r'/api/v1/tenants/[^/]+/training/jobs/.*/status.*', # Status polling endpoint ] # Skip OPTIONS requests (CORS preflight) @@ -275,21 +279,11 @@ class SubscriptionMiddleware(BaseHTTPMiddleware): 'current_tier': current_tier } - # Use the same authentication pattern as gateway routes for fallback - headers = dict(request.headers) - headers.pop("host", None) - + # Use unified HeaderManager for consistent header handling + headers = header_manager.get_all_headers_for_proxy(request) + # Extract user_id for logging (fallback path) - user_id = 'unknown' - # Add user context headers if available - if hasattr(request.state, 'user') and request.state.user: - user = request.state.user - user_id = str(user.get('user_id', 'unknown')) - headers["x-user-id"] = user_id - headers["x-user-email"] = str(user.get('email', '')) - headers["x-user-role"] = str(user.get('role', 'user')) - headers["x-user-full-name"] = str(user.get('full_name', '')) - headers["x-tenant-id"] = str(user.get('tenant_id', '')) + user_id = header_manager.get_header_value(request, 'x-user-id', 'unknown') # Call tenant service fast tier endpoint with caching (fallback for old tokens) timeout_config = httpx.Timeout( diff --git a/gateway/app/routes/auth.py b/gateway/app/routes/auth.py index 4d580067..48a092db 100644 --- a/gateway/app/routes/auth.py +++ b/gateway/app/routes/auth.py @@ -13,6 +13,7 @@ from fastapi.responses import JSONResponse from typing import Dict, Any from app.core.config import settings +from app.core.header_manager import header_manager from app.core.service_discovery import ServiceDiscovery from shared.monitoring.metrics import MetricsCollector @@ -136,107 +137,32 @@ class AuthProxy: return AUTH_SERVICE_URL def _prepare_headers(self, headers, request=None) -> Dict[str, str]: - """Prepare headers for forwarding (remove 
hop-by-hop headers)""" - # Remove hop-by-hop headers - hop_by_hop_headers = { - 'connection', 'keep-alive', 'proxy-authenticate', - 'proxy-authorization', 'te', 'trailers', 'upgrade' - } - - # Convert headers to dict - get ALL headers including those added by middleware - # Middleware adds headers to _list, so we need to read from there - logger.debug(f"DEBUG: headers type: {type(headers)}, has _list: {hasattr(headers, '_list')}, has raw: {hasattr(headers, 'raw')}") - logger.debug(f"DEBUG: headers.__dict__ keys: {list(headers.__dict__.keys())}") - logger.debug(f"DEBUG: '_list' in headers.__dict__: {'_list' in headers.__dict__}") - - if hasattr(headers, '_list'): - logger.debug(f"DEBUG: Entering _list branch") - logger.debug(f"DEBUG: headers object id: {id(headers)}, _list id: {id(headers.__dict__.get('_list', []))}") - # Get headers from the _list where middleware adds them - all_headers_list = headers.__dict__.get('_list', []) - logger.debug(f"DEBUG: _list length: {len(all_headers_list)}") - - # Debug: Show first few headers in the list - debug_headers = [] - for i, (k, v) in enumerate(all_headers_list): - if i < 5: # Show first 5 headers for debugging + """Prepare headers for forwarding using unified HeaderManager""" + # Use unified HeaderManager to get all headers + if request: + all_headers = header_manager.get_all_headers_for_proxy(request) + logger.debug(f"DEBUG: Added headers from HeaderManager: {list(all_headers.keys())}") + else: + # Fallback: convert headers to dict manually + all_headers = {} + if hasattr(headers, '_list'): + for k, v in headers.__dict__.get('_list', []): key = k.decode() if isinstance(k, bytes) else k value = v.decode() if isinstance(v, bytes) else v - debug_headers.append(f"{key}: {value}") - logger.debug(f"DEBUG: First headers in _list: {debug_headers}") - - # Convert to dict for easier processing - all_headers = {} - for k, v in all_headers_list: - key = k.decode() if isinstance(k, bytes) else k - value = v.decode() if isinstance(v, bytes) else v - all_headers[key] = value - - # Debug: Show if x-user-id and x-is-demo are in the dict - logger.debug(f"DEBUG: x-user-id in all_headers: {'x-user-id' in all_headers}, x-is-demo in all_headers: {'x-is-demo' in all_headers}") - logger.debug(f"DEBUG: all_headers keys: {list(all_headers.keys())[:10]}...") # Show first 10 keys - - logger.info(f"📤 Forwarding headers to auth service - x_user_id: {all_headers.get('x-user-id', 'MISSING')}, x_is_demo: {all_headers.get('x-is-demo', 'MISSING')}, x_demo_session_id: {all_headers.get('x-demo-session-id', 'MISSING')}, headers: {list(all_headers.keys())}") - - # Check if headers are missing and try to get them from request.state - if request and hasattr(request, 'state') and hasattr(request.state, 'injected_headers'): - logger.debug(f"DEBUG: Found injected_headers in request.state: {request.state.injected_headers}") - # Add missing headers from request.state - if 'x-user-id' not in all_headers and 'x-user-id' in request.state.injected_headers: - all_headers['x-user-id'] = request.state.injected_headers['x-user-id'] - logger.debug(f"DEBUG: Added x-user-id from request.state: {all_headers['x-user-id']}") - if 'x-user-email' not in all_headers and 'x-user-email' in request.state.injected_headers: - all_headers['x-user-email'] = request.state.injected_headers['x-user-email'] - logger.debug(f"DEBUG: Added x-user-email from request.state: {all_headers['x-user-email']}") - if 'x-user-role' not in all_headers and 'x-user-role' in request.state.injected_headers: - all_headers['x-user-role'] 
= request.state.injected_headers['x-user-role'] - logger.debug(f"DEBUG: Added x-user-role from request.state: {all_headers['x-user-role']}") - - # Add is_demo flag if this is a demo session - if hasattr(request.state, 'is_demo_session') and request.state.is_demo_session: - all_headers['x-is-demo'] = 'true' - logger.debug(f"DEBUG: Added x-is-demo from request.state.is_demo_session") - - # Filter out hop-by-hop headers - filtered_headers = { - k: v for k, v in all_headers.items() - if k.lower() not in hop_by_hop_headers - } - elif hasattr(headers, 'raw'): - logger.debug(f"DEBUG: Entering raw branch") - - # Filter out hop-by-hop headers - filtered_headers = { - k: v for k, v in all_headers.items() - if k.lower() not in hop_by_hop_headers - } - elif hasattr(headers, 'raw'): - # Fallback to raw headers if _list not available - all_headers = { - k.decode() if isinstance(k, bytes) else k: v.decode() if isinstance(v, bytes) else v - for k, v in headers.raw - } - logger.info(f"📤 Forwarding headers to auth service - x_user_id: {all_headers.get('x-user-id', 'MISSING')}, x_is_demo: {all_headers.get('x-is-demo', 'MISSING')}, x_demo_session_id: {all_headers.get('x-demo-session-id', 'MISSING')}, headers: {list(all_headers.keys())}") - - filtered_headers = { - k.decode() if isinstance(k, bytes) else k: v.decode() if isinstance(v, bytes) else v - for k, v in headers.raw - if (k.decode() if isinstance(k, bytes) else k).lower() not in hop_by_hop_headers - } - else: - # Handle case where headers is already a dict - logger.info(f"📤 Forwarding headers to auth service - x_user_id: {headers.get('x-user-id', 'MISSING')}, x_is_demo: {headers.get('x-is-demo', 'MISSING')}, x_demo_session_id: {headers.get('x-demo-session-id', 'MISSING')}, headers: {list(headers.keys())}") - - filtered_headers = { - k: v for k, v in headers.items() - if k.lower() not in hop_by_hop_headers - } - - # Add gateway identifier - filtered_headers['X-Forwarded-By'] = 'bakery-gateway' - filtered_headers['X-Gateway-Version'] = '1.0.0' - - return filtered_headers + all_headers[key] = value + elif hasattr(headers, 'raw'): + for k, v in headers.raw: + key = k.decode() if isinstance(k, bytes) else k + value = v.decode() if isinstance(v, bytes) else v + all_headers[key] = value + else: + # Headers is already a dict + all_headers = dict(headers) + + # Debug logging + logger.info(f"📤 Forwarding headers - x_user_id: {all_headers.get('x-user-id', 'MISSING')}, x_is_demo: {all_headers.get('x-is-demo', 'MISSING')}, x_demo_session_id: {all_headers.get('x-demo-session-id', 'MISSING')}, headers: {list(all_headers.keys())}") + + return all_headers def _prepare_response_headers(self, headers: Dict[str, str]) -> Dict[str, str]: """Prepare response headers""" diff --git a/gateway/app/routes/demo.py b/gateway/app/routes/demo.py index 8959fe47..91ff04ab 100644 --- a/gateway/app/routes/demo.py +++ b/gateway/app/routes/demo.py @@ -8,6 +8,7 @@ import httpx import structlog from app.core.config import settings +from app.core.header_manager import header_manager logger = structlog.get_logger() @@ -29,12 +30,8 @@ async def proxy_demo_service(path: str, request: Request): if request.method in ["POST", "PUT", "PATCH"]: body = await request.body() - # Forward headers (excluding host) - headers = { - key: value - for key, value in request.headers.items() - if key.lower() not in ["host", "content-length"] - } + # Use unified HeaderManager for consistent header forwarding + headers = header_manager.get_all_headers_for_proxy(request) try: async with 
httpx.AsyncClient(timeout=30.0) as client: diff --git a/gateway/app/routes/geocoding.py b/gateway/app/routes/geocoding.py index cff8a83c..34b3d90e 100644 --- a/gateway/app/routes/geocoding.py +++ b/gateway/app/routes/geocoding.py @@ -5,6 +5,7 @@ from fastapi.responses import JSONResponse import httpx import structlog from app.core.config import settings +from app.core.header_manager import header_manager logger = structlog.get_logger() router = APIRouter() @@ -26,12 +27,8 @@ async def proxy_geocoding(request: Request, path: str): if request.method in ["POST", "PUT", "PATCH"]: body = await request.body() - # Forward headers (excluding host) - headers = { - key: value - for key, value in request.headers.items() - if key.lower() not in ["host", "content-length"] - } + # Use unified HeaderManager for consistent header forwarding + headers = header_manager.get_all_headers_for_proxy(request) # Make the proxied request async with httpx.AsyncClient(timeout=30.0) as client: diff --git a/gateway/app/routes/poi_context.py b/gateway/app/routes/poi_context.py index 33f03aae..05a509f8 100644 --- a/gateway/app/routes/poi_context.py +++ b/gateway/app/routes/poi_context.py @@ -8,6 +8,7 @@ from fastapi.responses import JSONResponse import httpx import structlog from app.core.config import settings +from app.core.header_manager import header_manager logger = structlog.get_logger() router = APIRouter() @@ -44,12 +45,8 @@ async def proxy_poi_context(request: Request, path: str): if request.method in ["POST", "PUT", "PATCH"]: body = await request.body() - # Copy headers (exclude host and content-length as they'll be set by httpx) - headers = { - key: value - for key, value in request.headers.items() - if key.lower() not in ["host", "content-length"] - } + # Use unified HeaderManager for consistent header forwarding + headers = header_manager.get_all_headers_for_proxy(request) # Make the request to the external service async with httpx.AsyncClient(timeout=60.0) as client: diff --git a/gateway/app/routes/pos.py b/gateway/app/routes/pos.py index ae4f79f2..89a432fd 100644 --- a/gateway/app/routes/pos.py +++ b/gateway/app/routes/pos.py @@ -8,6 +8,7 @@ import httpx import logging from app.core.config import settings +from app.core.header_manager import header_manager logger = logging.getLogger(__name__) router = APIRouter() @@ -45,9 +46,8 @@ async def _proxy_to_pos_service(request: Request, target_path: str): try: url = f"{settings.POS_SERVICE_URL}{target_path}" - # Forward headers - headers = dict(request.headers) - headers.pop("host", None) + # Use unified HeaderManager for consistent header forwarding + headers = header_manager.get_all_headers_for_proxy(request) # Add query parameters params = dict(request.query_params) diff --git a/gateway/app/routes/subscription.py b/gateway/app/routes/subscription.py index c67a3552..7b1cca13 100644 --- a/gateway/app/routes/subscription.py +++ b/gateway/app/routes/subscription.py @@ -9,6 +9,7 @@ import logging from typing import Optional from app.core.config import settings +from app.core.header_manager import header_manager logger = logging.getLogger(__name__) router = APIRouter() @@ -98,29 +99,13 @@ async def _proxy_request(request: Request, target_path: str, service_url: str): try: url = f"{service_url}{target_path}" - # Forward headers and add user/tenant context - headers = dict(request.headers) - headers.pop("host", None) - - # Add user context headers if available - if hasattr(request.state, 'user') and request.state.user: - user = request.state.user - 
headers["x-user-id"] = str(user.get('user_id', '')) - headers["x-user-email"] = str(user.get('email', '')) - headers["x-user-role"] = str(user.get('role', 'user')) - headers["x-user-full-name"] = str(user.get('full_name', '')) - headers["x-tenant-id"] = str(user.get('tenant_id', '')) - - # Add subscription context headers - if user.get('subscription_tier'): - headers["x-subscription-tier"] = str(user.get('subscription_tier', '')) - logger.debug(f"Forwarding subscription tier: {user.get('subscription_tier')}") - - if user.get('subscription_status'): - headers["x-subscription-status"] = str(user.get('subscription_status', '')) - logger.debug(f"Forwarding subscription status: {user.get('subscription_status')}") - - logger.info(f"Forwarding subscription request to {url} with user context: user_id={user.get('user_id')}, email={user.get('email')}, subscription_tier={user.get('subscription_tier', 'not_set')}") + # Use unified HeaderManager for consistent header forwarding + headers = header_manager.get_all_headers_for_proxy(request) + + # Debug logging + user_context = getattr(request.state, 'user', None) + if user_context: + logger.info(f"Forwarding subscription request to {url} with user context: user_id={user_context.get('user_id')}, email={user_context.get('email')}, subscription_tier={user_context.get('subscription_tier', 'not_set')}") else: logger.warning(f"No user context available when forwarding subscription request to {url}") diff --git a/gateway/app/routes/tenant.py b/gateway/app/routes/tenant.py index 20f4a73a..3b00b620 100644 --- a/gateway/app/routes/tenant.py +++ b/gateway/app/routes/tenant.py @@ -10,6 +10,7 @@ import logging from typing import Optional from app.core.config import settings +from app.core.header_manager import header_manager logger = logging.getLogger(__name__) router = APIRouter() @@ -715,36 +716,18 @@ async def _proxy_request(request: Request, target_path: str, service_url: str, t try: url = f"{service_url}{target_path}" - # Forward headers and add user/tenant context - headers = dict(request.headers) - headers.pop("host", None) - - # Add tenant ID header if provided + # Use unified HeaderManager for consistent header forwarding + headers = header_manager.get_all_headers_for_proxy(request) + + # Add tenant ID header if provided (override if needed) if tenant_id: - headers["X-Tenant-ID"] = tenant_id - - # Add user context headers if available - if hasattr(request.state, 'user') and request.state.user: - user = request.state.user - headers["x-user-id"] = str(user.get('user_id', '')) - headers["x-user-email"] = str(user.get('email', '')) - headers["x-user-role"] = str(user.get('role', 'user')) - headers["x-user-full-name"] = str(user.get('full_name', '')) - headers["x-tenant-id"] = tenant_id or str(user.get('tenant_id', '')) - - # Add subscription context headers - if user.get('subscription_tier'): - headers["x-subscription-tier"] = str(user.get('subscription_tier', '')) - logger.debug(f"Forwarding subscription tier: {user.get('subscription_tier')}") - - if user.get('subscription_status'): - headers["x-subscription-status"] = str(user.get('subscription_status', '')) - logger.debug(f"Forwarding subscription status: {user.get('subscription_status')}") - - # Debug logging - logger.info(f"Forwarding request to {url} with user context: user_id={user.get('user_id')}, email={user.get('email')}, tenant_id={tenant_id}, subscription_tier={user.get('subscription_tier', 'not_set')}") + headers["x-tenant-id"] = tenant_id + + # Debug logging + user_context = getattr(request.state, 
'user', None)
+        if user_context:
+            logger.info(f"Forwarding request to {url} with user context: user_id={user_context.get('user_id')}, email={user_context.get('email')}, tenant_id={tenant_id}, subscription_tier={user_context.get('subscription_tier', 'not_set')}")
         else:
-            # Debug logging when no user context available
             logger.warning(f"No user context available when forwarding request to {url}. request.state.user: {getattr(request.state, 'user', 'NOT_SET')}")
 
     # Get request body if present
@@ -782,9 +765,13 @@ async def _proxy_request(request: Request, target_path: str, service_url: str, t
             logger.info(f"Forwarding multipart request with files={list(files.keys()) if files else None}, data={list(data.keys()) if data else None}")
 
-            # Remove content-type from headers - httpx will set it with new boundary
-            headers.pop("content-type", None)
-            headers.pop("content-length", None)
+            # Re-fetch the complete header set for the multipart request, then
+            # drop the original content-type/content-length: httpx only applies
+            # its generated multipart content-type (with the new boundary) when
+            # none is supplied, so forwarding the stale boundary would corrupt
+            # the upload.
+            headers = header_manager.get_all_headers_for_proxy(request)
+            headers.pop("content-type", None)
+            headers.pop("content-length", None)
         else:
             # For other content types, use body as before
             body = await request.body()
diff --git a/gateway/app/routes/user.py b/gateway/app/routes/user.py
index db9026f6..b3d37b5b 100644
--- a/gateway/app/routes/user.py
+++ b/gateway/app/routes/user.py
@@ -13,6 +13,7 @@ from typing import Dict, Any
 import json
 
 from app.core.config import settings
+from app.core.header_manager import header_manager
 from app.core.service_discovery import ServiceDiscovery
 from shared.monitoring.metrics import MetricsCollector
 
@@ -136,64 +137,28 @@ class UserProxy:
         return AUTH_SERVICE_URL
 
     def _prepare_headers(self, headers, request=None) -> Dict[str, str]:
-        """Prepare headers for forwarding (remove hop-by-hop headers)"""
-        # Remove hop-by-hop headers
-        hop_by_hop_headers = {
-            'connection', 'keep-alive', 'proxy-authenticate',
-            'proxy-authorization', 'te', 'trailers', 'upgrade'
-        }
-
-        # Convert headers to dict if it's a Headers object
-        # This ensures we get ALL headers including those added by middleware
-        if hasattr(headers, '_list'):
-            # Get headers from the _list where middleware adds them
-            all_headers_list = headers.__dict__.get('_list', [])
-
-            # Convert to dict for easier processing
-            all_headers = {}
-            for k, v in all_headers_list:
-                key = k.decode() if isinstance(k, bytes) else k
-                value = v.decode() if isinstance(v, bytes) else v
-                all_headers[key] = value
-
-            # Check if headers are missing and try to get them from request.state
-            if request and hasattr(request, 'state') and hasattr(request.state, 'injected_headers'):
-                # Add missing headers from request.state
-                if 'x-user-id' not in all_headers and 'x-user-id' in request.state.injected_headers:
-                    all_headers['x-user-id'] = request.state.injected_headers['x-user-id']
-                if 'x-user-email' not in all_headers and 'x-user-email' in request.state.injected_headers:
-                    all_headers['x-user-email'] = request.state.injected_headers['x-user-email']
-                if 'x-user-role' not in all_headers and 'x-user-role' in request.state.injected_headers:
-                    all_headers['x-user-role'] = request.state.injected_headers['x-user-role']
-
-                # Add is_demo flag if this is a demo session
-                if hasattr(request.state, 'is_demo_session') and request.state.is_demo_session:
-                    all_headers['x-is-demo'] = 'true'
-
-            # Filter out hop-by-hop headers
-            filtered_headers = {
-                k: v for k, v in all_headers.items()
-                if k.lower() not in
hop_by_hop_headers - } - elif hasattr(headers, 'raw'): - # FastAPI/Starlette Headers object - use raw to get all headers - filtered_headers = { - k.decode() if isinstance(k, bytes) else k: v.decode() if isinstance(v, bytes) else v - for k, v in headers.raw - if (k.decode() if isinstance(k, bytes) else k).lower() not in hop_by_hop_headers - } + """Prepare headers for forwarding using unified HeaderManager""" + # Use unified HeaderManager to get all headers + if request: + all_headers = header_manager.get_all_headers_for_proxy(request) else: - # Already a dict - filtered_headers = { - k: v for k, v in headers.items() - if k.lower() not in hop_by_hop_headers - } - - # Add gateway identifier - filtered_headers['X-Forwarded-By'] = 'bakery-gateway' - filtered_headers['X-Gateway-Version'] = '1.0.0' - - return filtered_headers + # Fallback: convert headers to dict manually + all_headers = {} + if hasattr(headers, '_list'): + for k, v in headers.__dict__.get('_list', []): + key = k.decode() if isinstance(k, bytes) else k + value = v.decode() if isinstance(v, bytes) else v + all_headers[key] = value + elif hasattr(headers, 'raw'): + for k, v in headers.raw: + key = k.decode() if isinstance(k, bytes) else k + value = v.decode() if isinstance(v, bytes) else v + all_headers[key] = value + else: + # Headers is already a dict + all_headers = dict(headers) + + return all_headers def _prepare_response_headers(self, headers: Dict[str, str]) -> Dict[str, str]: """Prepare response headers""" diff --git a/scripts/cleanup-docker.sh b/scripts/cleanup-docker.sh new file mode 100755 index 00000000..05ea0489 --- /dev/null +++ b/scripts/cleanup-docker.sh @@ -0,0 +1,82 @@ +#!/bin/bash +# Docker Cleanup Script for Local Kubernetes Development +# This script helps prevent disk space issues by cleaning up unused Docker resources + +set -e + +echo "🧹 Docker Cleanup Script for Bakery-IA Local Development" +echo "=========================================================" +echo "" + +# Check if we should run automatically or ask for confirmation +AUTO_MODE=${1:-""} + +# Show current disk usage +echo "📊 Current Docker Disk Usage:" +docker system df +echo "" + +# Check Kind node disk usage if cluster is running +if docker ps | grep -q "bakery-ia-local-control-plane"; then + echo "📊 Kind Node Disk Usage:" + docker exec bakery-ia-local-control-plane df -h / /var | grep -E "(Filesystem|overlay|/dev/vdb1)" + echo "" +fi + +# Calculate reclaimable space +RECLAIMABLE=$(docker system df | grep "Images" | awk '{print $4}') +echo "💾 Estimated reclaimable space: $RECLAIMABLE" +echo "" + +# Ask for confirmation unless in auto mode +if [ "$AUTO_MODE" != "--auto" ]; then + read -p "Do you want to proceed with cleanup? (y/n) " -n 1 -r + echo "" + if [[ ! $REPLY =~ ^[Yy]$ ]]; then + echo "❌ Cleanup cancelled" + exit 0 + fi +fi + +echo "🚀 Starting cleanup..." +echo "" + +# Remove unused images (keep images from last 24 hours) +echo "1️⃣ Removing unused Docker images..." +docker image prune -af --filter "until=24h" || true +echo "" + +# Remove unused volumes +echo "2️⃣ Removing unused Docker volumes..." +docker volume prune -f || true +echo "" + +# Remove build cache +echo "3️⃣ Removing build cache..." +docker builder prune -af || true +echo "" + +# Show results +echo "✅ Cleanup completed!" 
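+# Re-check usage so the result can be compared with the pre-cleanup snapshot above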
+echo "" +echo "📊 Final Docker Disk Usage:" +docker system df +echo "" + +# Check Kind node disk usage if cluster is running +if docker ps | grep -q "bakery-ia-local-control-plane"; then + echo "📊 Kind Node Disk Usage After Cleanup:" + docker exec bakery-ia-local-control-plane df -h / /var | grep -E "(Filesystem|overlay|/dev/vdb1)" + echo "" + + # Warn if still above 80% + USAGE=$(docker exec bakery-ia-local-control-plane df -h /var | tail -1 | awk '{print $5}' | sed 's/%//') + if [ "$USAGE" -gt 80 ]; then + echo "⚠️ Warning: Disk usage is still above 80%. Consider:" + echo " - Deleting and recreating the Kind cluster" + echo " - Increasing Docker's disk allocation" + echo " - Running: docker system prune -a --volumes -f" + fi +fi + +echo "🎉 All done!" diff --git a/services/forecasting/app/clients/inventory_client.py b/services/forecasting/app/clients/inventory_client.py deleted file mode 100644 index c78350b3..00000000 --- a/services/forecasting/app/clients/inventory_client.py +++ /dev/null @@ -1,82 +0,0 @@ -# services/forecasting/app/clients/inventory_client.py -""" -Simple client for inventory service integration -Used when product names are not available locally -""" - -import aiohttp -import structlog -from typing import Optional, Dict, Any -import os - -logger = structlog.get_logger() - -class InventoryServiceClient: - """Simple client for inventory service interactions""" - - def __init__(self, base_url: str = None): - self.base_url = base_url or os.getenv("INVENTORY_SERVICE_URL", "http://inventory-service:8000") - self.timeout = aiohttp.ClientTimeout(total=5) # 5 second timeout - - async def get_product_name(self, tenant_id: str, inventory_product_id: str) -> Optional[str]: - """ - Get product name from inventory service - Returns None if service is unavailable or product not found - """ - try: - async with aiohttp.ClientSession(timeout=self.timeout) as session: - url = f"{self.base_url}/api/v1/products/{inventory_product_id}" - headers = {"X-Tenant-ID": tenant_id} - - async with session.get(url, headers=headers) as response: - if response.status == 200: - data = await response.json() - return data.get("name", f"Product-{inventory_product_id}") - else: - logger.debug("Product not found in inventory service", - inventory_product_id=inventory_product_id, - status=response.status) - return None - - except Exception as e: - logger.debug("Failed to get product name from inventory service", - inventory_product_id=inventory_product_id, - error=str(e)) - return None - - async def get_multiple_product_names(self, tenant_id: str, product_ids: list) -> Dict[str, str]: - """ - Get multiple product names efficiently - Returns a mapping of product_id -> product_name - """ - try: - async with aiohttp.ClientSession(timeout=self.timeout) as session: - url = f"{self.base_url}/api/v1/products/batch" - headers = {"X-Tenant-ID": tenant_id} - payload = {"product_ids": product_ids} - - async with session.post(url, json=payload, headers=headers) as response: - if response.status == 200: - data = await response.json() - return {item["id"]: item["name"] for item in data.get("products", [])} - else: - logger.debug("Batch product lookup failed", - product_count=len(product_ids), - status=response.status) - return {} - - except Exception as e: - logger.debug("Failed to get product names from inventory service", - product_count=len(product_ids), - error=str(e)) - return {} - -# Global client instance -_inventory_client = None - -def get_inventory_client() -> InventoryServiceClient: - """Get the global inventory 
client instance""" - global _inventory_client - if _inventory_client is None: - _inventory_client = InventoryServiceClient() - return _inventory_client \ No newline at end of file diff --git a/services/forecasting/app/services/forecasting_alert_service.py b/services/forecasting/app/services/forecasting_alert_service.py index 2691eab3..ba0f3a0c 100644 --- a/services/forecasting/app/services/forecasting_alert_service.py +++ b/services/forecasting/app/services/forecasting_alert_service.py @@ -12,7 +12,6 @@ from datetime import datetime, timedelta import structlog from shared.messaging import UnifiedEventPublisher -from app.clients.inventory_client import get_inventory_client logger = structlog.get_logger() diff --git a/services/inventory/app/api/internal_alert_trigger.py b/services/inventory/app/api/internal_alert_trigger.py index 26076605..acca6ac8 100644 --- a/services/inventory/app/api/internal_alert_trigger.py +++ b/services/inventory/app/api/internal_alert_trigger.py @@ -30,11 +30,11 @@ async def trigger_inventory_alerts( - Expiring ingredients - Overstock situations - Security: Protected by X-Internal-Service header check. + Security: Protected by x-internal-service header check. """ try: # Verify internal service header - if not request or request.headers.get("X-Internal-Service") not in ["demo-session", "internal"]: + if not request or request.headers.get("x-internal-service") not in ["demo-session", "internal"]: logger.warning("Unauthorized internal API call", tenant_id=str(tenant_id)) raise HTTPException( status_code=403, diff --git a/services/inventory/app/api/ml_insights.py b/services/inventory/app/api/ml_insights.py index e031b37f..55fcff09 100644 --- a/services/inventory/app/api/ml_insights.py +++ b/services/inventory/app/api/ml_insights.py @@ -350,7 +350,7 @@ async def generate_safety_stock_insights_internal( This endpoint is called by the demo-session service after cloning data. It uses the same ML logic as the public endpoint but with optimized defaults. - Security: Protected by X-Internal-Service header check. + Security: Protected by x-internal-service header check. Args: tenant_id: The tenant UUID @@ -365,7 +365,7 @@ async def generate_safety_stock_insights_internal( } """ # Verify internal service header - if not request or request.headers.get("X-Internal-Service") not in ["demo-session", "internal"]: + if not request or request.headers.get("x-internal-service") not in ["demo-session", "internal"]: logger.warning("Unauthorized internal API call", tenant_id=tenant_id) raise HTTPException( status_code=403, diff --git a/services/procurement/app/api/internal_delivery_tracking.py b/services/procurement/app/api/internal_delivery_tracking.py index 307c8923..38e138b3 100644 --- a/services/procurement/app/api/internal_delivery_tracking.py +++ b/services/procurement/app/api/internal_delivery_tracking.py @@ -29,7 +29,7 @@ async def trigger_delivery_tracking( This endpoint is called by the demo session cloning process after POs are seeded to generate realistic delivery alerts (arriving soon, overdue, etc.). - Security: Protected by X-Internal-Service header check. + Security: Protected by x-internal-service header check. 
Args: tenant_id: Tenant UUID to check deliveries for @@ -49,7 +49,7 @@ async def trigger_delivery_tracking( """ try: # Verify internal service header - if not request or request.headers.get("X-Internal-Service") not in ["demo-session", "internal"]: + if not request or request.headers.get("x-internal-service") not in ["demo-session", "internal"]: logger.warning("Unauthorized internal API call", tenant_id=str(tenant_id)) raise HTTPException( status_code=403, diff --git a/services/procurement/app/api/ml_insights.py b/services/procurement/app/api/ml_insights.py index 96b53c13..f314d4c6 100644 --- a/services/procurement/app/api/ml_insights.py +++ b/services/procurement/app/api/ml_insights.py @@ -566,7 +566,7 @@ async def generate_price_insights_internal( This endpoint is called by the demo-session service after cloning data. It uses the same ML logic as the public endpoint but with optimized defaults. - Security: Protected by X-Internal-Service header check. + Security: Protected by x-internal-service header check. Args: tenant_id: The tenant UUID @@ -581,7 +581,7 @@ async def generate_price_insights_internal( } """ # Verify internal service header - if not request or request.headers.get("X-Internal-Service") not in ["demo-session", "internal"]: + if not request or request.headers.get("x-internal-service") not in ["demo-session", "internal"]: logger.warning("Unauthorized internal API call", tenant_id=tenant_id) raise HTTPException( status_code=403, diff --git a/services/procurement/app/core/dependencies.py b/services/procurement/app/core/dependencies.py index 711129c1..4d885f7a 100644 --- a/services/procurement/app/core/dependencies.py +++ b/services/procurement/app/core/dependencies.py @@ -1,42 +1,45 @@ """ FastAPI Dependencies for Procurement Service +Uses shared authentication infrastructure with UUID validation """ -from fastapi import Header, HTTPException, status +from fastapi import Depends, HTTPException, status from uuid import UUID from typing import Optional from sqlalchemy.ext.asyncio import AsyncSession from .database import get_db +from shared.auth.decorators import get_current_tenant_id_dep async def get_current_tenant_id( - x_tenant_id: Optional[str] = Header(None, alias="X-Tenant-ID") + tenant_id: Optional[str] = Depends(get_current_tenant_id_dep) ) -> UUID: """ - Extract and validate tenant ID from request header. + Extract and validate tenant ID from request using shared infrastructure. + Adds UUID validation to ensure tenant ID format is correct. 
Args: - x_tenant_id: Tenant ID from X-Tenant-ID header + tenant_id: Tenant ID from shared dependency Returns: UUID: Validated tenant ID Raises: - HTTPException: If tenant ID is missing or invalid + HTTPException: If tenant ID is missing or invalid UUID format """ - if not x_tenant_id: + if not tenant_id: raise HTTPException( status_code=status.HTTP_400_BAD_REQUEST, - detail="X-Tenant-ID header is required" + detail="x-tenant-id header is required" ) try: - return UUID(x_tenant_id) + return UUID(tenant_id) except (ValueError, AttributeError): raise HTTPException( status_code=status.HTTP_400_BAD_REQUEST, - detail=f"Invalid tenant ID format: {x_tenant_id}" + detail=f"Invalid tenant ID format: {tenant_id}" ) diff --git a/services/production/app/api/internal_alert_trigger.py b/services/production/app/api/internal_alert_trigger.py index 7e7c616b..467ef2ca 100644 --- a/services/production/app/api/internal_alert_trigger.py +++ b/services/production/app/api/internal_alert_trigger.py @@ -31,11 +31,11 @@ async def trigger_production_alerts( - Equipment maintenance alerts - Batch start delays - Security: Protected by X-Internal-Service header check. + Security: Protected by x-internal-service header check. """ try: # Verify internal service header - if not request or request.headers.get("X-Internal-Service") not in ["demo-session", "internal"]: + if not request or request.headers.get("x-internal-service") not in ["demo-session", "internal"]: logger.warning("Unauthorized internal API call", tenant_id=str(tenant_id)) raise HTTPException( status_code=403, diff --git a/services/production/app/api/ml_insights.py b/services/production/app/api/ml_insights.py index da48b944..f036672b 100644 --- a/services/production/app/api/ml_insights.py +++ b/services/production/app/api/ml_insights.py @@ -331,7 +331,7 @@ async def generate_yield_insights_internal( This endpoint is called by the demo-session service after cloning data. It uses the same ML logic as the public endpoint but with optimized defaults. - Security: Protected by X-Internal-Service header check. + Security: Protected by x-internal-service header check. 
Args: tenant_id: The tenant UUID @@ -346,7 +346,7 @@ async def generate_yield_insights_internal( } """ # Verify internal service header - if not request or request.headers.get("X-Internal-Service") not in ["demo-session", "internal"]: + if not request or request.headers.get("x-internal-service") not in ["demo-session", "internal"]: logger.warning("Unauthorized internal API call", tenant_id=tenant_id) raise HTTPException( status_code=403, diff --git a/services/tenant/app/repositories/tenant_member_repository.py b/services/tenant/app/repositories/tenant_member_repository.py index 8ea90341..ae3935f9 100644 --- a/services/tenant/app/repositories/tenant_member_repository.py +++ b/services/tenant/app/repositories/tenant_member_repository.py @@ -204,7 +204,7 @@ class TenantMemberRepository(TenantBaseRepository): f"{auth_service_url}/api/v1/auth/users/batch", json={"user_ids": user_ids}, timeout=10.0, - headers={"X-Internal-Service": "tenant-service"} + headers={"x-internal-service": "tenant-service"} ) if response.status_code == 200: @@ -226,7 +226,7 @@ class TenantMemberRepository(TenantBaseRepository): response = await client.get( f"{auth_service_url}/api/v1/auth/users/{user_id}", timeout=5.0, - headers={"X-Internal-Service": "tenant-service"} + headers={"x-internal-service": "tenant-service"} ) if response.status_code == 200: user_data = response.json() @@ -243,7 +243,7 @@ class TenantMemberRepository(TenantBaseRepository): response = await client.get( f"{auth_service_url}/api/v1/auth/users/{user_id}", timeout=5.0, - headers={"X-Internal-Service": "tenant-service"} + headers={"x-internal-service": "tenant-service"} ) if response.status_code == 200: user_data = response.json() diff --git a/services/training/app/ml/hybrid_trainer.py b/services/training/app/ml/hybrid_trainer.py index 19e12b34..7f1c068a 100644 --- a/services/training/app/ml/hybrid_trainer.py +++ b/services/training/app/ml/hybrid_trainer.py @@ -216,17 +216,24 @@ class HybridProphetXGBoost: Get Prophet predictions for given dataframe. Args: - prophet_result: Prophet model result from training + prophet_result: Prophet model result from training (contains model_path) df: DataFrame with 'ds' column Returns: Array of predictions """ - # Get the Prophet model from result - prophet_model = prophet_result.get('model') + # Get the model path from result instead of expecting the model object directly + model_path = prophet_result.get('model_path') - if prophet_model is None: - raise ValueError("Prophet model not found in result") + if model_path is None: + raise ValueError("Prophet model path not found in result") + + # Load the actual Prophet model from the stored path + try: + import joblib + prophet_model = joblib.load(model_path) + except Exception as e: + raise ValueError(f"Failed to load Prophet model from path {model_path}: {str(e)}") # Prepare dataframe for prediction pred_df = df[['ds']].copy() @@ -273,7 +280,8 @@ class HybridProphetXGBoost: 'reg_lambda': 1.0, # L2 regularization 'objective': 'reg:squarederror', 'random_state': 42, - 'n_jobs': -1 + 'n_jobs': -1, + 'early_stopping_rounds': 10 } # Initialize model @@ -285,7 +293,6 @@ class HybridProphetXGBoost: model.fit, X_train, y_train, eval_set=[(X_val, y_val)], - early_stopping_rounds=10, verbose=False ) @@ -303,109 +310,86 @@ class HybridProphetXGBoost: train_prophet_pred: np.ndarray, val_prophet_pred: np.ndarray, prophet_result: Dict[str, Any] - ) -> Dict[str, float]: + ) -> Dict[str, Any]: """ - Evaluate hybrid model vs Prophet-only on validation set. 
-
-        Args:
-            train_df: Training data
-            val_df: Validation data
-            train_prophet_pred: Prophet predictions on training set
-            val_prophet_pred: Prophet predictions on validation set
-            prophet_result: Prophet training result
-
-        Returns:
-            Dictionary of metrics
+        Evaluate overall hybrid-model performance; blocking predictions run in a
+        worker thread via asyncio.to_thread. Returns hybrid metrics plus the
+        improvement over the Prophet-only baseline.
         """
-        # Get actual values
-        train_actual = train_df['y'].values
-        val_actual = val_df['y'].values
-
-        # Get XGBoost predictions on residuals
+        import asyncio
+
+        # Get XGBoost predictions on training and validation
         X_train = train_df[self.feature_columns].values
         X_val = val_df[self.feature_columns].values
-
-        # ✅ FIX: Run blocking predict() in thread pool to avoid blocking event loop
-        import asyncio
+
         train_xgb_pred = await asyncio.to_thread(self.xgb_model.predict, X_train)
         val_xgb_pred = await asyncio.to_thread(self.xgb_model.predict, X_val)
-
-        # Hybrid predictions = Prophet + XGBoost residual correction
+
+        # Hybrid prediction = Prophet prediction + XGBoost residual prediction
         train_hybrid_pred = train_prophet_pred + train_xgb_pred
         val_hybrid_pred = val_prophet_pred + val_xgb_pred
-
-        # Calculate metrics for Prophet-only
-        prophet_train_mae = mean_absolute_error(train_actual, train_prophet_pred)
-        prophet_val_mae = mean_absolute_error(val_actual, val_prophet_pred)
-        prophet_train_mape = mean_absolute_percentage_error(train_actual, train_prophet_pred) * 100
-        prophet_val_mape = mean_absolute_percentage_error(val_actual, val_prophet_pred) * 100
-
-        # Calculate metrics for Hybrid
-        hybrid_train_mae = mean_absolute_error(train_actual, train_hybrid_pred)
-        hybrid_val_mae = mean_absolute_error(val_actual, val_hybrid_pred)
-        hybrid_train_mape = mean_absolute_percentage_error(train_actual, train_hybrid_pred) * 100
-        hybrid_val_mape = mean_absolute_percentage_error(val_actual, val_hybrid_pred) * 100
+
+        actual_train = train_df['y'].values
+        actual_val = val_df['y'].values
+
+        # Basic RMSE calculation
+        train_rmse = float(np.sqrt(np.mean((actual_train - train_hybrid_pred)**2)))
+        val_rmse = float(np.sqrt(np.mean((actual_val - val_hybrid_pred)**2)))
+
+        # MAE
+        train_mae = float(np.mean(np.abs(actual_train - train_hybrid_pred)))
+        val_mae = float(np.mean(np.abs(actual_val - val_hybrid_pred)))
+
+        # MAPE (with safety for zero sales)
+        train_mape = float(np.mean(np.abs((actual_train - train_hybrid_pred) / np.maximum(actual_train, 1))))
+        val_mape = float(np.mean(np.abs((actual_val - val_hybrid_pred) / np.maximum(actual_val, 1))))
 
         # Calculate improvement
-        mae_improvement = ((prophet_val_mae - hybrid_val_mae) / prophet_val_mae) * 100
-        mape_improvement = ((prophet_val_mape - hybrid_val_mape) / prophet_val_mape) * 100
+        # NOTE: this comparison assumes the stored Prophet val_mape is on the
+        # same fractional scale as the hybrid MAPE computed above; the previous
+        # implementation reported MAPE as a percentage (* 100).
+        prophet_metrics = prophet_result.get("metrics", {})
+        prophet_val_mae = prophet_metrics.get("val_mae", val_mae)  # Fallback to hybrid if missing
+        prophet_val_mape = prophet_metrics.get("val_mape", val_mape)
+
+        improvement_pct = 0.0
+        if prophet_val_mape > 0:
+            improvement_pct = ((prophet_val_mape - val_mape) / prophet_val_mape) * 100
 
         metrics = {
-            'prophet_train_mae': float(prophet_train_mae),
-            'prophet_val_mae': float(prophet_val_mae),
-            'prophet_train_mape': float(prophet_train_mape),
-            'prophet_val_mape': float(prophet_val_mape),
-            'hybrid_train_mae': float(hybrid_train_mae),
-            'hybrid_val_mae': float(hybrid_val_mae),
-            'hybrid_train_mape': float(hybrid_train_mape),
-            'hybrid_val_mape': float(hybrid_val_mape),
-            'mae_improvement_pct': float(mae_improvement),
-            'mape_improvement_pct': float(mape_improvement),
-            'improvement_percentage':
float(mape_improvement) # Primary metric + "train_rmse": train_rmse, + "val_rmse": val_rmse, + "train_mae": train_mae, + "val_mae": val_mae, + "train_mape": train_mape, + "val_mape": val_mape, + "prophet_val_mape": prophet_val_mape, + "hybrid_val_mape": val_mape, + "improvement_percentage": float(improvement_pct), + "prophet_metrics": prophet_metrics } + logger.info( + "Hybrid model evaluation complete", + val_rmse=val_rmse, + val_mae=val_mae, + val_mape=val_mape, + improvement=improvement_pct + ) + return metrics def _package_hybrid_model( self, prophet_result: Dict[str, Any], - metrics: Dict[str, float], + metrics: Dict[str, Any], tenant_id: str, inventory_product_id: str ) -> Dict[str, Any]: """ Package hybrid model for storage. - - Args: - prophet_result: Prophet model result - metrics: Hybrid model metrics - tenant_id: Tenant ID - inventory_product_id: Product ID - - Returns: - Model package dictionary """ return { 'model_type': 'hybrid_prophet_xgboost', - 'prophet_model': prophet_result.get('model'), + 'prophet_model_path': prophet_result.get('model_path'), 'xgboost_model': self.xgb_model, 'feature_columns': self.feature_columns, - 'prophet_metrics': { - 'train_mae': metrics['prophet_train_mae'], - 'val_mae': metrics['prophet_val_mae'], - 'train_mape': metrics['prophet_train_mape'], - 'val_mape': metrics['prophet_val_mape'] - }, - 'hybrid_metrics': { - 'train_mae': metrics['hybrid_train_mae'], - 'val_mae': metrics['hybrid_val_mae'], - 'train_mape': metrics['hybrid_train_mape'], - 'val_mape': metrics['hybrid_val_mape'] - }, - 'improvement_metrics': { - 'mae_improvement_pct': metrics['mae_improvement_pct'], - 'mape_improvement_pct': metrics['mape_improvement_pct'] - }, + 'metrics': metrics, 'tenant_id': tenant_id, 'inventory_product_id': inventory_product_id, 'trained_at': datetime.now(timezone.utc).isoformat() @@ -426,8 +410,18 @@ class HybridProphetXGBoost: Returns: DataFrame with predictions """ - # Step 1: Get Prophet predictions - prophet_model = model_data['prophet_model'] + # Step 1: Get Prophet model from path and make predictions + prophet_model_path = model_data.get('prophet_model_path') + if prophet_model_path is None: + raise ValueError("Prophet model path not found in model data") + + # Load the Prophet model from the stored path + try: + import joblib + prophet_model = joblib.load(prophet_model_path) + except Exception as e: + raise ValueError(f"Failed to load Prophet model from path {prophet_model_path}: {str(e)}") + # ✅ FIX: Run blocking predict() in thread pool to avoid blocking event loop import asyncio prophet_forecast = await asyncio.to_thread(prophet_model.predict, future_df) diff --git a/services/training/app/ml/poi_feature_integrator.py b/services/training/app/ml/poi_feature_integrator.py index 05a580dc..6ea945c5 100644 --- a/services/training/app/ml/poi_feature_integrator.py +++ b/services/training/app/ml/poi_feature_integrator.py @@ -43,86 +43,79 @@ class POIFeatureIntegrator: force_refresh: bool = False ) -> Optional[Dict[str, Any]]: """ - Fetch POI features for tenant location. + Fetch POI features for tenant location (optimized for training). - First checks if POI context exists, if not, triggers detection. + First checks if POI context exists. If not, returns None without triggering detection. + POI detection should be triggered during tenant registration, not during training. 
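
To make the new fetch policy concrete, here is a hedged sketch of the decision flow this docstring describes; `get_poi_context` and `detect_poi_for_tenant` mirror the client calls used in this file, while the wrapper itself is illustrative:

```python
# Sketch of the POI fetch policy: never trigger detection from the training
# path; only refresh when a context already exists, is stale, and the caller
# explicitly asked for it.
async def fetch_poi_features_sketch(client, tenant_id, lat, lon,
                                    force_refresh=False):
    existing = await client.get_poi_context(tenant_id)
    if not existing:
        # No context yet: registration owns the initial (slow) detection.
        return None

    features = existing.get("poi_context", {}).get("ml_features", {})
    if not existing.get("is_stale", False) or not force_refresh:
        # Fresh context, or stale but no refresh requested: use as-is.
        return features

    # Stale AND force_refresh: re-detect, falling back to the stale features.
    refreshed = await client.detect_poi_for_tenant(
        tenant_id=tenant_id, latitude=lat, longitude=lon, force_refresh=True
    )
    if refreshed:
        return refreshed.get("poi_context", {}).get("ml_features", {})
    return features
```
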
Args: tenant_id: Tenant UUID latitude: Bakery latitude longitude: Bakery longitude - force_refresh: Force re-detection + force_refresh: Force re-detection (only use if POI context already exists) Returns: - Dictionary with POI features or None if detection fails + Dictionary with POI features or None if not available """ try: # Try to get existing POI context first - if not force_refresh: - existing_context = await self.external_client.get_poi_context(tenant_id) - if existing_context: - poi_context = existing_context.get("poi_context", {}) - ml_features = poi_context.get("ml_features", {}) + existing_context = await self.external_client.get_poi_context(tenant_id) - # Check if stale - is_stale = existing_context.get("is_stale", False) - if not is_stale: + if existing_context: + poi_context = existing_context.get("poi_context", {}) + ml_features = poi_context.get("ml_features", {}) + + # Check if stale and force_refresh is requested + is_stale = existing_context.get("is_stale", False) + + if not is_stale or not force_refresh: + logger.info( + "Using existing POI context", + tenant_id=tenant_id, + is_stale=is_stale, + feature_count=len(ml_features) + ) + return ml_features + else: + logger.info( + "POI context is stale and force_refresh=True, refreshing", + tenant_id=tenant_id + ) + # Only refresh if explicitly requested and context exists + detection_result = await self.external_client.detect_poi_for_tenant( + tenant_id=tenant_id, + latitude=latitude, + longitude=longitude, + force_refresh=True + ) + + if detection_result: + poi_context = detection_result.get("poi_context", {}) + ml_features = poi_context.get("ml_features", {}) logger.info( - "Using existing POI context", - tenant_id=tenant_id + "POI refresh completed", + tenant_id=tenant_id, + feature_count=len(ml_features) ) return ml_features else: - logger.info( - "POI context is stale, refreshing", + logger.warning( + "POI refresh failed, returning existing features", tenant_id=tenant_id ) - force_refresh = True - else: - logger.info( - "No existing POI context, will detect", - tenant_id=tenant_id - ) - - # Detect or refresh POIs - logger.info( - "Detecting POIs for tenant", - tenant_id=tenant_id, - location=(latitude, longitude) - ) - - detection_result = await self.external_client.detect_poi_for_tenant( - tenant_id=tenant_id, - latitude=latitude, - longitude=longitude, - force_refresh=force_refresh - ) - - if detection_result: - poi_context = detection_result.get("poi_context", {}) - ml_features = poi_context.get("ml_features", {}) - - logger.info( - "POI detection completed", - tenant_id=tenant_id, - total_pois=poi_context.get("total_pois_detected", 0), - feature_count=len(ml_features) - ) - - return ml_features + return ml_features else: - logger.error( - "POI detection failed", + logger.info( + "No existing POI context found - POI detection should be triggered during tenant registration", tenant_id=tenant_id ) return None except Exception as e: - logger.error( - "Unexpected error fetching POI features", + logger.warning( + "Error fetching POI features - returning None", tenant_id=tenant_id, - error=str(e), - exc_info=True + error=str(e) ) return None diff --git a/services/training/app/services/data_client.py b/services/training/app/services/data_client.py index 523cebff..390cadf6 100644 --- a/services/training/app/services/data_client.py +++ b/services/training/app/services/data_client.py @@ -29,16 +29,15 @@ class DataClient: self.sales_client = get_sales_client(settings, "training") self.external_client = 
get_external_client(settings, "training") + # ExternalServiceClient always has get_stored_traffic_data_for_training method + self.supports_stored_traffic_data = True + # Configure timeouts for HTTP clients self._configure_timeouts() # Initialize circuit breakers for external services self._init_circuit_breakers() - # Check if the new method is available for stored traffic data - if hasattr(self.external_client, 'get_stored_traffic_data_for_training'): - self.supports_stored_traffic_data = True - def _configure_timeouts(self): """Configure appropriate timeouts for HTTP clients""" timeout = httpx.Timeout( @@ -49,14 +48,12 @@ class DataClient: ) # Apply timeout to clients if they have httpx clients + # Note: BaseServiceClient manages its own HTTP client internally if hasattr(self.sales_client, 'client') and isinstance(self.sales_client.client, httpx.AsyncClient): self.sales_client.client.timeout = timeout if hasattr(self.external_client, 'client') and isinstance(self.external_client.client, httpx.AsyncClient): self.external_client.client.timeout = timeout - else: - self.supports_stored_traffic_data = False - logger.warning("Stored traffic data method not available in external client") def _init_circuit_breakers(self): """Initialize circuit breakers for external service calls""" diff --git a/services/training/app/services/training_orchestrator.py b/services/training/app/services/training_orchestrator.py index f631a67a..1f2d4d58 100644 --- a/services/training/app/services/training_orchestrator.py +++ b/services/training/app/services/training_orchestrator.py @@ -404,22 +404,32 @@ class TrainingDataOrchestrator: tenant_id: str ) -> Dict[str, Any]: """ - Collect POI features for bakery location. + Collect POI features for bakery location (non-blocking). POI features are static (location-based, not time-varying). + This method is non-blocking with a short timeout to prevent training delays. + If POI detection hasn't been run yet, training continues without POI features. + + Returns: + Dictionary with POI features or empty dict if unavailable """ try: logger.info( - "Collecting POI features", + "Collecting POI features (non-blocking)", tenant_id=tenant_id, location=(lat, lon) ) - poi_features = await self.poi_feature_integrator.fetch_poi_features( - tenant_id=tenant_id, - latitude=lat, - longitude=lon, - force_refresh=False + # Set a short timeout to prevent blocking training + # POI detection should have been triggered during tenant registration + poi_features = await asyncio.wait_for( + self.poi_feature_integrator.fetch_poi_features( + tenant_id=tenant_id, + latitude=lat, + longitude=lon, + force_refresh=False + ), + timeout=15.0 # 15 second timeout - POI should be cached from registration ) if poi_features: @@ -430,18 +440,24 @@ class TrainingDataOrchestrator: ) else: logger.warning( - "No POI features collected (service may be unavailable)", + "No POI features collected (service may be unavailable or not yet detected)", tenant_id=tenant_id ) return poi_features or {} + except asyncio.TimeoutError: + logger.warning( + "POI collection timeout (15s) - continuing training without POI features. 
" + "POI detection should be triggered during tenant registration for best results.", + tenant_id=tenant_id + ) + return {} except Exception as e: - logger.error( - "Failed to collect POI features, continuing without them", + logger.warning( + "Failed to collect POI features (non-blocking) - continuing training without them", tenant_id=tenant_id, - error=str(e), - exc_info=True + error=str(e) ) return {} diff --git a/shared/clients/base_service_client.py b/shared/clients/base_service_client.py index f68db92a..335995cf 100755 --- a/shared/clients/base_service_client.py +++ b/shared/clients/base_service_client.py @@ -71,7 +71,7 @@ class ServiceAuthenticator: } if tenant_id: - headers["X-Tenant-ID"] = str(tenant_id) + headers["x-tenant-id"] = str(tenant_id) return headers diff --git a/shared/clients/forecast_client.py b/shared/clients/forecast_client.py index b7836fc5..4d275a75 100755 --- a/shared/clients/forecast_client.py +++ b/shared/clients/forecast_client.py @@ -351,7 +351,7 @@ class ForecastServiceClient(BaseServiceClient): """ Trigger demand forecasting insights for a tenant (internal service use only). - This method calls the internal endpoint which is protected by X-Internal-Service header. + This method calls the internal endpoint which is protected by x-internal-service header. Used by demo-session service after cloning to generate AI insights from seeded data. Args: @@ -366,7 +366,7 @@ class ForecastServiceClient(BaseServiceClient): endpoint=f"forecasting/internal/ml/generate-demand-insights", tenant_id=tenant_id, data={"tenant_id": tenant_id}, - headers={"X-Internal-Service": "demo-session"} + headers={"x-internal-service": "demo-session"} ) if result: diff --git a/shared/clients/inventory_client.py b/shared/clients/inventory_client.py index d0030832..724e2faa 100755 --- a/shared/clients/inventory_client.py +++ b/shared/clients/inventory_client.py @@ -766,7 +766,7 @@ class InventoryServiceClient(BaseServiceClient): """ Trigger inventory alerts for a tenant (internal service use only). - This method calls the internal endpoint which is protected by X-Internal-Service header. + This method calls the internal endpoint which is protected by x-internal-service header. The endpoint should trigger alerts specifically for the given tenant. Args: @@ -783,7 +783,7 @@ class InventoryServiceClient(BaseServiceClient): endpoint="inventory/internal/alerts/trigger", tenant_id=tenant_id, data={}, - headers={"X-Internal-Service": "demo-session"} + headers={"x-internal-service": "demo-session"} ) if result: @@ -819,7 +819,7 @@ class InventoryServiceClient(BaseServiceClient): """ Trigger safety stock optimization insights for a tenant (internal service use only). - This method calls the internal endpoint which is protected by X-Internal-Service header. + This method calls the internal endpoint which is protected by x-internal-service header. 
Args: tenant_id: Tenant ID to trigger insights for @@ -833,7 +833,7 @@ class InventoryServiceClient(BaseServiceClient): endpoint="inventory/internal/ml/generate-safety-stock-insights", tenant_id=tenant_id, data={"tenant_id": tenant_id}, - headers={"X-Internal-Service": "demo-session"} + headers={"x-internal-service": "demo-session"} ) if result: diff --git a/shared/clients/procurement_client.py b/shared/clients/procurement_client.py index 71ea9c23..28135f21 100755 --- a/shared/clients/procurement_client.py +++ b/shared/clients/procurement_client.py @@ -580,7 +580,7 @@ class ProcurementServiceClient(BaseServiceClient): """ Trigger delivery tracking for a tenant (internal service use only). - This method calls the internal endpoint which is protected by X-Internal-Service header. + This method calls the internal endpoint which is protected by x-internal-service header. Args: tenant_id: Tenant ID to trigger delivery tracking for @@ -596,7 +596,7 @@ class ProcurementServiceClient(BaseServiceClient): endpoint="procurement/internal/delivery-tracking/trigger", tenant_id=tenant_id, data={}, - headers={"X-Internal-Service": "demo-session"} + headers={"x-internal-service": "demo-session"} ) if result: @@ -632,7 +632,7 @@ class ProcurementServiceClient(BaseServiceClient): """ Trigger price forecasting insights for a tenant (internal service use only). - This method calls the internal endpoint which is protected by X-Internal-Service header. + This method calls the internal endpoint which is protected by x-internal-service header. Args: tenant_id: Tenant ID to trigger insights for @@ -646,7 +646,7 @@ class ProcurementServiceClient(BaseServiceClient): endpoint="procurement/internal/ml/generate-price-insights", tenant_id=tenant_id, data={"tenant_id": tenant_id}, - headers={"X-Internal-Service": "demo-session"} + headers={"x-internal-service": "demo-session"} ) if result: diff --git a/shared/clients/production_client.py b/shared/clients/production_client.py index 26f170c7..83afa4d8 100755 --- a/shared/clients/production_client.py +++ b/shared/clients/production_client.py @@ -630,7 +630,7 @@ class ProductionServiceClient(BaseServiceClient): """ Trigger production alerts for a tenant (internal service use only). - This method calls the internal endpoint which is protected by X-Internal-Service header. + This method calls the internal endpoint which is protected by x-internal-service header. Includes both production alerts and equipment maintenance checks. Args: @@ -647,7 +647,7 @@ class ProductionServiceClient(BaseServiceClient): endpoint="production/internal/alerts/trigger", tenant_id=tenant_id, data={}, - headers={"X-Internal-Service": "demo-session"} + headers={"x-internal-service": "demo-session"} ) if result: @@ -683,7 +683,7 @@ class ProductionServiceClient(BaseServiceClient): """ Trigger yield improvement insights for a tenant (internal service use only). - This method calls the internal endpoint which is protected by X-Internal-Service header. + This method calls the internal endpoint which is protected by x-internal-service header. Args: tenant_id: Tenant ID to trigger insights for @@ -697,7 +697,7 @@ class ProductionServiceClient(BaseServiceClient): endpoint="production/internal/ml/generate-yield-insights", tenant_id=tenant_id, data={"tenant_id": tenant_id}, - headers={"X-Internal-Service": "demo-session"} + headers={"x-internal-service": "demo-session"} ) if result:
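
The training-orchestrator change earlier in this diff treats POI data as an optional, time-boxed dependency: wait at most 15 seconds, then continue training without it. A generic sketch of that pattern, with `fetch_features` standing in for `poi_feature_integrator.fetch_poi_features`:

```python
# Sketch: bound an optional async dependency with asyncio.wait_for and
# degrade to an empty feature set on timeout or failure, so the caller
# (training) is never blocked by a slow or missing upstream service.
import asyncio
from typing import Any, Awaitable, Callable, Dict

async def collect_optional_features(
    fetch_features: Callable[[], Awaitable[Dict[str, Any]]],
    timeout_s: float = 15.0,
) -> Dict[str, Any]:
    try:
        features = await asyncio.wait_for(fetch_features(), timeout=timeout_s)
        return features or {}
    except asyncio.TimeoutError:
        return {}  # slow dependency: proceed without optional features
    except Exception:
        return {}  # any other failure is also non-fatal for training
```
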