Sales Service
Overview
The Sales Service is the foundational data layer of Bakery-IA, responsible for collecting, processing, and analyzing historical sales data. It provides the critical training data for AI forecasting models and delivers comprehensive sales analytics to help bakery owners understand their business performance. This service handles bulk data imports, real-time sales tracking, and generates actionable insights from sales patterns.
Key Features
Sales Data Management
- Historical Sales Recording - Complete sales transaction history with timestamps
- Product Catalog Integration - Link sales to products for detailed analytics
- Multi-Channel Support - Track sales from POS, online orders, and manual entries
- Data Validation - Ensure data quality and consistency
- Bulk Import/Export - CSV/Excel file processing for historical data migration
- Real-Time Updates - Live sales data ingestion from POS systems
Sales Analytics
- Revenue Tracking - Daily, weekly, monthly, yearly revenue reports
- Product Performance - Best sellers, slow movers, profitability by product
- Trend Analysis - Identify growth patterns and seasonal variations
- Customer Insights - Purchase frequency, average transaction value
- Comparative Analytics - Period-over-period comparisons
- Sales Forecasting Input - Clean, structured data for ML training
Data Import & Onboarding
- CSV Upload - Import historical sales from spreadsheets
- Excel Support - Process .xlsx files with multiple sheets
- Column Mapping - Flexible mapping of user data to system fields
- Duplicate Detection - Prevent duplicate sales entries
- Error Handling - Detailed error reporting for failed imports
- Progress Tracking - Real-time import job status updates
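To make the column-mapping step above concrete, here is a minimal sketch, assuming the mapping is the same source-column-to-system-field dictionary stored in sales_import_jobs.column_mapping; the function name and the derivation of total_amount are illustrative, not the service's actual importer.
import pandas as pd

def apply_column_mapping(csv_path: str, column_mapping: dict[str, str]) -> pd.DataFrame:
    """Rename user-supplied CSV columns to the system's canonical field names.

    column_mapping maps source column -> system field, e.g.
    {"date": "sale_date", "product": "product_name", "quantity": "quantity", "price": "unit_price"}.
    """
    df = pd.read_csv(csv_path)
    # Keep only the mapped columns and rename them to the system's field names
    df = df[list(column_mapping)].rename(columns=column_mapping)
    # Derive total_amount when the source file only carries quantity and unit price
    if {"quantity", "unit_price"}.issubset(df.columns) and "total_amount" not in df.columns:
        df["total_amount"] = df["quantity"] * df["unit_price"]
    return df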
Audit & Compliance
- Complete Audit Trail - Track all data modifications
- Data Retention - Configurable retention policies
- GDPR Compliance - Customer data anonymization and deletion
- Export Capabilities - Generate reports for accounting and tax compliance
Business Value
For Bakery Owners
- Business Intelligence - Understand which products drive revenue
- Trend Identification - Spot seasonal patterns and optimize inventory
- Performance Tracking - Monitor daily/weekly/monthly KPIs
- Historical Analysis - Learn from past performance to improve future decisions
- Tax Compliance - Export sales data for accounting and tax reporting
Quantifiable Impact
- Time Savings: 5-8 hours/week on manual sales tracking and reporting
- Accuracy: 99%+ data accuracy vs. manual entry
- Insights Speed: Real-time analytics vs. weekly/monthly manual reports
- Forecasting Foundation: Clean sales data improves forecast accuracy by 15-25%
For AI/ML Systems
- Training Data Quality - High-quality, structured data for Prophet models
- Feature Engineering - Pre-processed data with temporal features
- Data Completeness - Fill gaps and handle missing data
- Real-Time Updates - Continuous model improvement with new sales data
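As a hedged sketch of what this means in practice: Prophet trains on a two-column frame (ds, y), so a preparation step like the one below aggregates daily quantities and fills days with no sales; the function name and aggregation choice are assumptions, not the actual training pipeline.
import pandas as pd

def to_prophet_frame(sales: pd.DataFrame) -> pd.DataFrame:
    """Reshape raw sales rows (sale_date, quantity) into the ds/y frame Prophet expects.

    Filter to a single product first if forecasting per product.
    """
    daily = (
        sales.assign(sale_date=pd.to_datetime(sales["sale_date"]))
        .groupby("sale_date")["quantity"]
        .sum()
    )
    # Fill gaps: days with no recorded sales become explicit zero rows
    daily = daily.asfreq("D", fill_value=0)
    return daily.rename_axis("ds").reset_index(name="y")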
Technology Stack
- Framework: FastAPI (Python 3.11+) - Async web framework
- Database: PostgreSQL 17 - Sales transaction storage
- Data Processing: Pandas, NumPy - CSV/Excel processing and analytics
- ORM: SQLAlchemy 2.0 (async) - Database abstraction
- File Processing: openpyxl - Excel file handling
- Validation: Pydantic - Data validation and serialization
- Logging: Structlog - Structured JSON logging
- Metrics: Prometheus Client - Custom metrics
- Caching: Redis 7.4 - Analytics cache
API Endpoints (Key Routes)
Sales Data Management
- POST /api/v1/sales - Create single sales record
- POST /api/v1/sales/batch - Create multiple sales records
- GET /api/v1/sales - List sales with filtering and pagination
- GET /api/v1/sales/{sale_id} - Get specific sale details
- PUT /api/v1/sales/{sale_id} - Update sales record
- DELETE /api/v1/sales/{sale_id} - Delete sales record (soft delete)
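As a usage illustration for the create endpoint above, a minimal httpx call (httpx is already in the stack via its instrumentation); the payload fields mirror the sales_data columns, but the exact request schema and auth handling are assumptions, not a verified contract.
import httpx

payload = {
    "product_id": "00000000-0000-0000-0000-000000000001",  # hypothetical product UUID
    "sale_date": "2024-01-01",
    "quantity": 50,
    "unit_price": 1.50,
    "total_amount": 75.00,
    "channel": "pos",
}
# Tenant scoping is assumed to come from the JWT rather than the request body
response = httpx.post(
    "http://localhost:8002/api/v1/sales",
    json=payload,
    headers={"Authorization": "Bearer <jwt-token>"},
)
response.raise_for_status()
print(response.json())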
Bulk Operations
- POST /api/v1/sales/import/csv - Upload CSV file for bulk import
- POST /api/v1/sales/import/excel - Upload Excel file for bulk import
- GET /api/v1/sales/import/jobs - List import job history
- GET /api/v1/sales/import/jobs/{job_id} - Get import job status
- GET /api/v1/sales/export/csv - Export sales data to CSV
- GET /api/v1/sales/export/excel - Export sales data to Excel
Analytics
- GET /api/v1/sales/analytics/summary - Overall sales summary
- GET /api/v1/sales/analytics/revenue - Revenue by period
- GET /api/v1/sales/analytics/products - Product performance metrics
- GET /api/v1/sales/analytics/trends - Trend analysis and patterns
- GET /api/v1/sales/analytics/comparison - Period comparison
- GET /api/v1/sales/analytics/top-products - Best selling products
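These endpoints are, at their core, aggregations over sales_data; the sketch below shows roughly how revenue-by-period and period comparison could be computed with pandas (the stack's analytics library), assuming a frame with sale_date and total_amount columns. It is not the service's actual query layer.
import pandas as pd

def revenue_by_period(sales: pd.DataFrame, freq: str = "MS") -> pd.Series:
    """Sum total_amount per period (freq='D' daily, 'W' weekly, 'MS' month start)."""
    sales = sales.assign(sale_date=pd.to_datetime(sales["sale_date"]))
    return sales.set_index("sale_date")["total_amount"].resample(freq).sum()

def period_over_period(sales: pd.DataFrame, freq: str = "MS") -> pd.DataFrame:
    """Compare each period's revenue against the previous period."""
    revenue = revenue_by_period(sales, freq)
    return pd.DataFrame({
        "revenue": revenue,
        "previous": revenue.shift(1),
        "change_pct": (revenue / revenue.shift(1) - 1) * 100,  # growth vs. previous period
    })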
Data Quality
- POST /api/v1/sales/validate - Validate sales data before import
- GET /api/v1/sales/duplicates - Find potential duplicate records
- POST /api/v1/sales/clean - Clean and normalize data
- GET /api/v1/sales/data-quality - Data quality metrics
Database Schema
Main Tables
sales_data
CREATE TABLE sales_data (
id UUID PRIMARY KEY,
tenant_id UUID NOT NULL,
product_id UUID NOT NULL,
sale_date DATE NOT NULL,
sale_timestamp TIMESTAMP NOT NULL,
quantity DECIMAL(10, 2) NOT NULL,
unit_price DECIMAL(10, 2) NOT NULL,
total_amount DECIMAL(10, 2) NOT NULL,
currency VARCHAR(3) DEFAULT 'EUR',
channel VARCHAR(50), -- pos, online, manual
location VARCHAR(255),
customer_id UUID,
transaction_id VARCHAR(100),
payment_method VARCHAR(50),
discount_amount DECIMAL(10, 2) DEFAULT 0,
tax_amount DECIMAL(10, 2),
notes TEXT,
metadata JSONB,
is_deleted BOOLEAN DEFAULT FALSE,
deleted_at TIMESTAMP,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
created_by UUID
);

-- PostgreSQL does not support inline INDEX clauses; indexes are created separately
CREATE INDEX idx_tenant_date ON sales_data (tenant_id, sale_date);
CREATE INDEX idx_product_date ON sales_data (product_id, sale_date);
CREATE INDEX idx_transaction ON sales_data (transaction_id);
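For reference, a hedged sketch of how this table might be mapped with the stack's SQLAlchemy 2.0 declarative style; only a subset of columns is shown, and the class name simply mirrors the SalesData used in snippets later in this document rather than confirmed source code.
import uuid
from datetime import date, datetime
from decimal import Decimal

from sqlalchemy import Boolean, Date, DateTime, Numeric, String
from sqlalchemy.dialects.postgresql import JSONB, UUID
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class SalesData(Base):
    __tablename__ = "sales_data"

    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True))
    product_id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True))
    sale_date: Mapped[date] = mapped_column(Date)
    sale_timestamp: Mapped[datetime] = mapped_column(DateTime)
    quantity: Mapped[Decimal] = mapped_column(Numeric(10, 2))
    unit_price: Mapped[Decimal] = mapped_column(Numeric(10, 2))
    total_amount: Mapped[Decimal] = mapped_column(Numeric(10, 2))
    channel: Mapped[str | None] = mapped_column(String(50))
    metadata_: Mapped[dict | None] = mapped_column("metadata", JSONB)  # "metadata" is reserved on declarative classes
    is_deleted: Mapped[bool] = mapped_column(Boolean, default=False)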
sales_import_jobs
CREATE TABLE sales_import_jobs (
id UUID PRIMARY KEY,
tenant_id UUID NOT NULL,
job_name VARCHAR(255),
file_name VARCHAR(255),
file_size_bytes BIGINT,
file_type VARCHAR(50), -- csv, xlsx
total_rows INTEGER,
processed_rows INTEGER DEFAULT 0,
successful_rows INTEGER DEFAULT 0,
failed_rows INTEGER DEFAULT 0,
status VARCHAR(50), -- pending, processing, completed, failed
error_log JSONB,
column_mapping JSONB,
started_at TIMESTAMP,
completed_at TIMESTAMP,
created_by UUID,
created_at TIMESTAMP DEFAULT NOW()
);
sales_products (Cache)
CREATE TABLE sales_products (
id UUID PRIMARY KEY,
tenant_id UUID NOT NULL,
product_name VARCHAR(255) NOT NULL,
product_category VARCHAR(100),
unit VARCHAR(50),
last_sale_date DATE,
total_sales_count INTEGER DEFAULT 0,
total_revenue DECIMAL(12, 2) DEFAULT 0,
average_price DECIMAL(10, 2),
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
UNIQUE(tenant_id, product_name)
);
Events & Messaging
Published Events (RabbitMQ)
Exchange: sales
Routing Key: sales.data.imported
Sales Data Imported Event
{
  "event_type": "sales_data_imported",
  "tenant_id": "uuid",
  "import_job_id": "uuid",
  "file_name": "sales_history_2024.csv",
  "total_records": 15000,
  "successful_records": 14850,
  "failed_records": 150,
  "date_range": {
    "start_date": "2024-01-01",
    "end_date": "2024-12-31"
  },
  "products_affected": 45,
  "total_revenue": 125000.50,
  "trigger_retraining": true,
  "timestamp": "2025-11-06T10:30:00Z"
}
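A rough sketch of how this event could be published to the sales exchange; aio-pika is assumed as the RabbitMQ client since the document does not name one, and the function is illustrative only.
import json
import aio_pika

async def publish_sales_imported(rabbitmq_url: str, event: dict) -> None:
    """Publish a sales_data_imported event to the 'sales' topic exchange."""
    connection = await aio_pika.connect_robust(rabbitmq_url)
    async with connection:
        channel = await connection.channel()
        exchange = await channel.declare_exchange(
            "sales", aio_pika.ExchangeType.TOPIC, durable=True
        )
        await exchange.publish(
            aio_pika.Message(
                body=json.dumps(event).encode(),
                content_type="application/json",
            ),
            routing_key="sales.data.imported",
        )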
Consumed Events
- From POS: Real-time sales transactions
- From Orders: Completed order sales data
Custom Metrics (Prometheus)
from prometheus_client import Counter, Gauge, Histogram

sales_records_created_total = Counter(
    'sales_records_created_total',
    'Total sales records created',
    ['tenant_id', 'channel']  # pos, online, manual
)

sales_import_jobs_total = Counter(
    'sales_import_jobs_total',
    'Total import jobs',
    ['tenant_id', 'status', 'file_type']
)

sales_revenue_total = Counter(
    'sales_revenue_euros_total',
    'Total sales revenue in euros',
    ['tenant_id', 'product_category']
)

import_processing_duration = Histogram(
    'sales_import_duration_seconds',
    'Import job processing time',
    ['tenant_id', 'file_type'],
    buckets=[1, 5, 10, 30, 60, 120, 300, 600]
)

data_quality_score = Gauge(
    'sales_data_quality_score',
    'Data quality score 0-100',
    ['tenant_id']
)
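For context, a sketch of how these metrics would typically be updated at the end of an import run; the surrounding function and record fields are illustrative, not the service's actual code.
import time

def record_import_metrics(tenant_id: str, file_type: str, records: list[dict]) -> None:
    """Illustrative only: bump the counters and duration histogram for one import job."""
    start = time.monotonic()
    for record in records:
        sales_records_created_total.labels(
            tenant_id=tenant_id, channel=record.get("channel", "manual")
        ).inc()
        sales_revenue_total.labels(
            tenant_id=tenant_id, product_category=record.get("product_category", "unknown")
        ).inc(float(record["total_amount"]))
    import_processing_duration.labels(tenant_id=tenant_id, file_type=file_type).observe(
        time.monotonic() - start
    )
    sales_import_jobs_total.labels(tenant_id=tenant_id, status="completed", file_type=file_type).inc()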
Configuration
Environment Variables
Service Configuration:
- PORT - Service port (default: 8002)
- DATABASE_URL - PostgreSQL connection string
- REDIS_URL - Redis connection string
- RABBITMQ_URL - RabbitMQ connection string
Import Configuration:
- MAX_IMPORT_FILE_SIZE_MB - Maximum file size (default: 50)
- MAX_IMPORT_ROWS - Maximum rows per import (default: 100000)
- IMPORT_BATCH_SIZE - Rows per batch insert (default: 1000)
- ENABLE_DUPLICATE_DETECTION - Check for duplicates (default: true)
Data Retention:
- SALES_DATA_RETENTION_YEARS - Years to keep data (default: 10)
- ENABLE_SOFT_DELETE - Use soft deletes (default: true)
- AUTO_CLEANUP_ENABLED - Automatic old data cleanup (default: false)
Analytics Cache:
- ANALYTICS_CACHE_TTL_MINUTES - Cache lifetime (default: 60)
- ENABLE_ANALYTICS_CACHE - Enable caching (default: true)
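A minimal sketch of loading this configuration, assuming pydantic-settings as the settings loader (the document only names Pydantic for validation); field names bind to the environment variables above case-insensitively.
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    port: int = 8002
    database_url: str = "postgresql://user:pass@localhost:5432/sales"
    redis_url: str = "redis://localhost:6379/0"
    rabbitmq_url: str | None = None

    max_import_file_size_mb: int = 50
    max_import_rows: int = 100_000
    import_batch_size: int = 1000
    enable_duplicate_detection: bool = True

    sales_data_retention_years: int = 10
    enable_soft_delete: bool = True
    auto_cleanup_enabled: bool = False

    analytics_cache_ttl_minutes: int = 60
    enable_analytics_cache: bool = True

settings = Settings()  # reads the environment, falling back to the documented defaults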
Development Setup
Prerequisites
- Python 3.11+
- PostgreSQL 17
- Redis 7.4
- RabbitMQ 4.1 (optional)
Local Development
cd services/sales
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
export DATABASE_URL=postgresql://user:pass@localhost:5432/sales
export REDIS_URL=redis://localhost:6379/0
alembic upgrade head
python main.py
Testing
# Unit tests
pytest tests/unit/ -v
# Integration tests
pytest tests/integration/ -v
# Import tests
pytest tests/import/ -v
# Test with coverage
pytest --cov=app tests/ --cov-report=html
Sample Data Import
# Create sample CSV
cat > sample_sales.csv << EOF
date,product,quantity,price
2024-01-01,Baguette,50,1.50
2024-01-01,Croissant,30,2.00
2024-01-02,Baguette,55,1.50
EOF
# Import via API
curl -X POST http://localhost:8002/api/v1/sales/import/csv \
-H "Content-Type: multipart/form-data" \
-F "file=@sample_sales.csv" \
-F "tenant_id=your-tenant-id"
Integration Points
Dependencies
- PostgreSQL - Sales data storage
- Redis - Analytics caching
- RabbitMQ - Event publishing
- File System - Temporary file storage for imports
Dependents
- Forecasting Service - Fetch sales data for model training
- Training Service - Historical sales for ML training
- Analytics Dashboard - Display sales reports and charts
- AI Insights Service - Analyze sales patterns
- Inventory Service - Correlate sales with stock levels
- Production Service - Plan production based on sales history
Data Quality Measures
Validation Rules
# Sales record validation
from datetime import date, datetime


def parse_date(value) -> date:
    """Accept a date or an ISO-formatted date string such as '2024-01-01'."""
    return value if isinstance(value, date) else date.fromisoformat(str(value))


class SalesRecordValidator:
    def validate(self, record: dict) -> tuple[bool, list[str]]:
        errors = []

        # Required fields
        if not record.get('sale_date'):
            errors.append("sale_date is required")
        if not record.get('product_id'):
            errors.append("product_id is required")
        if not record.get('quantity') or record['quantity'] <= 0:
            errors.append("quantity must be positive")
        # Explicit None check so a legitimate 0.00 price is not flagged as negative
        if record.get('unit_price') is None or record['unit_price'] < 0:
            errors.append("unit_price is required and cannot be negative")

        # Business logic validation
        if record.get('discount_amount', 0) > record.get('total_amount', 0):
            errors.append("discount cannot exceed total amount")

        # Date validation
        if record.get('sale_date'):
            sale_date = parse_date(record['sale_date'])
            if sale_date > datetime.now().date():
                errors.append("sale_date cannot be in the future")

        return len(errors) == 0, errors
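A quick usage example against a row shaped like the sample CSV shown later in this document; the record dict is hypothetical.
validator = SalesRecordValidator()
ok, errors = validator.validate({
    "sale_date": "2024-01-01",
    "product_id": "00000000-0000-0000-0000-000000000001",  # hypothetical UUID
    "quantity": 50,
    "unit_price": 1.50,
    "total_amount": 75.00,
})
print(ok, errors)  # -> True []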
Duplicate Detection
from datetime import timedelta


def detect_duplicates(tenant_id: str, records: list[dict]) -> list[dict]:
    """Find potential duplicate sales records.

    A candidate is a duplicate when an existing row for the same tenant and
    product has the same quantity and total within a +/- 5 minute window.
    `db` is the active SQLAlchemy session; `calculate_match_confidence` is a
    project helper that scores how closely the two records match.
    """
    duplicates = []
    for record in records:
        existing = db.query(SalesData).filter(
            SalesData.tenant_id == tenant_id,
            SalesData.product_id == record['product_id'],
            SalesData.sale_timestamp.between(
                record['sale_timestamp'] - timedelta(minutes=5),
                record['sale_timestamp'] + timedelta(minutes=5)
            ),
            SalesData.quantity == record['quantity'],
            SalesData.total_amount == record['total_amount']
        ).first()
        if existing:
            duplicates.append({
                'new_record': record,
                'existing_record_id': existing.id,
                'match_confidence': calculate_match_confidence(record, existing)
            })
    return duplicates
Security Measures
Data Protection
- Tenant Isolation - All sales data scoped to tenant_id
- Input Validation - Pydantic schemas for all inputs
- SQL Injection Prevention - Parameterized queries
- File Upload Security - Virus scanning, size limits, type validation
- Soft Deletes - Preserve data for audit trail
Access Control
- Authentication Required - JWT tokens for all endpoints
- Role-Based Access - Different permissions for owner/manager/staff
- Audit Logging - Track all data modifications
- GDPR Compliance - Customer data anonymization and export
Performance Optimization
Database Optimization
- Indexes - Optimized indexes on tenant_id, sale_date, product_id
- Partitioning - Table partitioning by year for large datasets
- Batch Inserts - Insert 1000 rows per transaction during imports
- Connection Pooling - Reuse database connections
- Query Optimization - Materialized views for common analytics
Import Performance
# Batch import optimization
async def bulk_import_sales(job_id: str, records: list[dict], batch_size: int = 1000):
    """Optimized bulk import: insert in batches and report progress per batch.

    `db` is the async SQLAlchemy session; `update_import_progress` persists the
    percentage on the sales_import_jobs row identified by job_id.
    """
    total_records = len(records)
    for i in range(0, total_records, batch_size):
        batch = records[i:i + batch_size]
        # Prepare batch for bulk insert
        sales_objects = [SalesData(**record) for record in batch]
        # Stage the whole batch, then commit once per batch
        db.add_all(sales_objects)
        await db.commit()
        # Update progress (0-100) on the import job record
        progress = (i + len(batch)) / total_records * 100
        await update_import_progress(job_id, progress)
Troubleshooting
Common Issues
Issue: Import fails with "File too large" error
- Cause: File exceeds MAX_IMPORT_FILE_SIZE_MB
- Solution: Split the file into smaller chunks or increase the limit
Issue: Duplicate records detected
- Cause: Re-importing same data or POS sync issues
- Solution: Enable duplicate detection or manual review
Issue: Slow analytics queries
- Cause: Large dataset without proper indexes
- Solution: Add indexes, enable caching, or use materialized views
Issue: Missing sales data
- Cause: POS integration not working
- Solution: Check POS service logs and webhook configuration
Competitive Advantages
- Bulk Import - Easy migration from existing systems
- Multi-Channel Support - Unified view across POS, online, manual
- Real-Time Analytics - Instant insights vs. batch processing
- Data Quality - Automated validation and cleaning
- ML-Ready Data - Structured data perfect for forecasting
- Spanish Market - Euro currency, Spanish date formats
- GDPR Compliant - Built-in compliance features
Future Enhancements
- Real-Time Streaming - Apache Kafka for high-volume sales
- Advanced Analytics - Customer segmentation, cohort analysis
- Predictive Analytics - Predict next purchase, customer lifetime value
- Multi-Currency - Support for international bakeries
- Mobile POS - Native mobile sales capture apps
- Blockchain Audit - Immutable sales records for compliance
- AI-Powered Cleaning - Automatic data quality improvements
For VUE Madrid Business Plan: The Sales Service provides the foundational data infrastructure that powers all AI/ML capabilities in Bakery-IA. The ability to easily import historical data (15,000+ records in minutes) and generate real-time analytics demonstrates technical sophistication and reduces customer onboarding time from days to hours. This is critical for rapid customer acquisition and SaaS scalability.