Urtzi Alfaro 2eeebfc1e0 Fix Alembic issue

2025-10-01 11:24:06 +02:00

Migration Scripts Documentation

This document describes the migration regeneration scripts and the improvements made to ensure reliable migration generation.

Overview

The migration system consists of:

Main migration generation script (regenerate_migrations_k8s.sh)
Database cleanup helper (cleanup_databases_k8s.sh)
Enhanced DatabaseInitManager (shared/database/init_manager.py)

Problem Summary

The original migration generation script had several critical issues:

Root Cause

Tables already existed in databases - Created by K8s migration jobs using create_all() fallback
Table drop mechanism failed silently - Errors were hidden, script continued anyway
Alembic detected no changes - When tables matched models, empty migrations were generated
File copy verification was insufficient - kubectl cp reported success but files weren't copied locally

Impact

11 out of 14 services generated empty migrations (only pass statements)
Only 3 services (pos, suppliers, alert-processor) worked correctly because their DBs were clean
No visibility into actual errors during table drops
Migration files weren't being copied to local machine despite "success" messages

Solutions Implemented

1. Fixed Script Table Drop Mechanism

File: regenerate_migrations_k8s.sh

Changes Made:

Before (Lines 404-405):

# Failed silently, errors hidden in log file
kubectl exec ... -- sh -c "DROP TABLE ..." 2>>$LOG_FILE

After (Lines 397-512):

# Complete database schema reset with proper error handling
async with engine.begin() as conn:
    await conn.execute(text('DROP SCHEMA IF EXISTS public CASCADE'))
    await conn.execute(text('CREATE SCHEMA public'))
    await conn.execute(text('GRANT ALL ON SCHEMA public TO PUBLIC'))

Key Improvements:

✅ Uses engine.begin() instead of engine.connect(), so the transaction is committed automatically on success and rolled back on error
✅ Drops entire schema with CASCADE for guaranteed clean slate
✅ Captures and displays error output in real-time (not hidden in logs)
✅ Falls back to individual table drops if schema drop fails
✅ Verifies database is empty after cleanup
✅ Fails fast if cleanup fails (prevents generating empty migrations)
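
The cleanup sequence listed above (schema drop, per-table fallback, emptiness check, fail fast) can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `reset_schema`, the `Execute` callable, and `LIST_TABLES` are hypothetical names. The sketch assumes `execute` runs each statement in its own transaction (for example by wrapping `engine.begin()` and `conn.execute(text(sql))`) and returns the result rows.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

# Hypothetical executor type: runs one SQL statement and returns its rows.
Execute = Callable[[str], Awaitable[Sequence[tuple]]]

RESET_STATEMENTS = (
    "DROP SCHEMA IF EXISTS public CASCADE",
    "CREATE SCHEMA public",
    "GRANT ALL ON SCHEMA public TO PUBLIC",
)
LIST_TABLES = "SELECT tablename FROM pg_tables WHERE schemaname = 'public'"


async def reset_schema(execute: Execute) -> None:
    """Reset the public schema, fall back to per-table drops, then verify."""
    try:
        for statement in RESET_STATEMENTS:
            await execute(statement)
    except Exception as exc:
        # Show the error right away instead of hiding it in a log file.
        print(f"Schema reset failed ({exc}); dropping tables individually")
        for (table,) in await execute(LIST_TABLES):
            await execute(f'DROP TABLE IF EXISTS "{table}" CASCADE')

    remaining = await execute(LIST_TABLES)
    if remaining:
        names = ", ".join(row[0] for row in remaining)
        # Fail fast: autogenerate against a non-empty database yields empty migrations.
        raise RuntimeError(f"Database is not empty after cleanup: {names}")
```

Passing the executor in as a parameter keeps the control flow testable without a live database; with SQLAlchemy it would be a thin wrapper around the engine.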

2. Enhanced kubectl cp Verification

File: regenerate_migrations_k8s.sh (Lines 547-595)

Improvements:

# Verify file was actually copied
if [ "$CP_EXIT_CODE" -eq 0 ] && [ -f "path/to/file" ]; then
    LOCAL_FILE_SIZE=$(wc -c < "path/to/file" | tr -d ' ')

    if [ "$LOCAL_FILE_SIZE" -gt 0 ]; then
        echo "✓ Migration file copied: $FILENAME ($LOCAL_FILE_SIZE bytes)"
    else
        echo "✗ Migration file is empty (0 bytes)"
        # Clean up and fail
    fi
fi

Key Improvements:

✅ Checks exit code AND file existence
✅ Verifies file size > 0 bytes
✅ Displays actual error output from kubectl cp
✅ Removes empty files automatically
✅ Better warning messages for empty migrations
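
The same three-part check (exit code, file existence, non-zero size) can be expressed in Python. This is a hedged sketch only: the real check lives in the shell script, and `verify_copied_file` and its parameters are hypothetical names chosen for illustration.

```python
import os


def verify_copied_file(path: str, cp_exit_code: int, cp_output: str = "") -> int:
    """Return the file size in bytes, or raise if the copy did not really succeed."""
    if cp_exit_code != 0:
        # Surface the actual kubectl cp output rather than a generic message.
        raise RuntimeError(f"kubectl cp failed (exit {cp_exit_code}): {cp_output}")
    if not os.path.isfile(path):
        raise RuntimeError(f"kubectl cp reported success but {path} does not exist")
    size = os.path.getsize(path)
    if size == 0:
        os.remove(path)  # do not leave an empty migration file behind
        raise RuntimeError(f"Migration file is empty (0 bytes): {path}")
    return size
```

Raising on each failure mode separately keeps the reason visible to the caller, which is the point of the improvements listed above.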

3. Enhanced Error Visibility

Changes Throughout Script:

✅ All Python error output captured and displayed: 2>&1 instead of 2>>$LOG_FILE
✅ Error messages shown in console immediately
✅ Detailed failure reasons in summary
✅ Exit codes checked for all critical operations
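
For readers more familiar with Python than shell redirection, the effect of `2>&1` plus an exit-code check looks roughly like this. It is an illustrative sketch; `run_visible` is a hypothetical helper and not part of the script.

```python
import subprocess
from typing import List


def run_visible(command: List[str]) -> str:
    """Run a command, echo stdout and stderr together, and fail on a non-zero exit."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same effect as the shell's 2>&1
        text=True,
    )
    # Print the combined output immediately so errors appear in the console.
    print(result.stdout, end="")
    if result.returncode != 0:
        raise RuntimeError(
            f"{command[0]} exited with {result.returncode}:\n{result.stdout}"
        )
    return result.stdout
```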

4. Modified DatabaseInitManager

File: shared/database/init_manager.py

New Features:

Environment-Aware Fallback Control:

def __init__(
    self,
    # ... existing params
    allow_create_all_fallback: bool = True,
    environment: Optional[str] = None
):
    self.environment = environment or os.getenv('ENVIRONMENT', 'development')
    self.allow_create_all_fallback = allow_create_all_fallback

Production Protection (Lines 74-93):

elif not db_state["has_migrations"]:
    if self.allow_create_all_fallback:
        # Development mode: use create_all()
        self.logger.warning("No migrations found - using create_all() as fallback")
        result = await self._handle_no_migrations()
    else:
        # Production mode: FAIL instead of using create_all()
        error_msg = (
            f"No migration files found for {self.service_name} and "
            f"create_all() fallback is disabled (environment: {self.environment}). "
            f"Migration files must be generated before deployment."
        )
        raise Exception(error_msg)

Key Improvements:

✅ Auto-detects environment from ENVIRONMENT env var
✅ Disables create_all() in production - Forces proper migrations
✅ Allows fallback in dev/local/test - Maintains developer convenience
✅ Clear error messages when migrations are missing
✅ Backwards compatible - Default behavior unchanged

Environment Detection:

| Environment Value | Fallback Allowed? | Behavior |
|---|---|---|
| `development`, `dev`, `local`, `test` | ✅ Yes | Uses `create_all()` if no migrations |
| `staging`, `production`, `prod` | ❌ No | Fails with clear error message |
| Not set (default: `development`) | ✅ Yes | Uses `create_all()` if no migrations |
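
The mapping in the table above can be sketched as a small function. This is not the code in `init_manager.py`: `fallback_allowed` and `FALLBACK_ENVIRONMENTS` are hypothetical names. Two behaviours are assumptions that the source does not state: names are compared case-insensitively, and unrecognised names are treated like production.

```python
import os
from typing import Mapping, Optional

# Environments in which the table above allows the create_all() fallback.
FALLBACK_ENVIRONMENTS = {"development", "dev", "local", "test"}


def fallback_allowed(
    environment: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Decide whether create_all() may be used when no migrations exist."""
    # An explicit argument wins; otherwise read ENVIRONMENT, defaulting to development.
    name = environment or env.get("ENVIRONMENT", "development")
    # Assumption: case-insensitive match, unknown names are treated as strict.
    return name.strip().lower() in FALLBACK_ENVIRONMENTS
```

Treating unknown names as strict is the conservative choice: a typo in a manifest then fails loudly instead of silently enabling the fallback.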

5. Pre-flight Checks

File: regenerate_migrations_k8s.sh (Lines 75-187)

New Pre-flight Check System:

preflight_checks() {
    # Check kubectl installation and version
    # Check Kubernetes cluster connectivity
    # Check namespace exists
    # Check service pods are running
    # Check database drivers available
    # Check local directory structure
    # Check disk space
}

Verifications:

✅ kubectl installation and version
✅ Kubernetes cluster connectivity
✅ Namespace exists
✅ Service pods running (shows count: X/14)
✅ Database drivers (asyncpg) available
✅ Local migration directories exist
✅ Sufficient disk space
✅ Option to continue even if checks fail
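
A pre-flight system of this kind is essentially a list of named checks that are all run before any work starts. The sketch below shows that pattern in Python under stated assumptions: `run_preflight`, `default_checks`, and the 100 MiB disk threshold are hypothetical, and only two of the checks listed above are implemented.

```python
import shutil
from typing import Callable, List, Tuple

Check = Tuple[str, Callable[[], bool]]


def run_preflight(checks: List[Check]) -> List[str]:
    """Run every check and return the names of the ones that failed."""
    failed = []
    for name, check in checks:
        try:
            ok = check()
        except Exception as exc:  # a crashing check counts as a failure
            print(f"[fail] {name}: {exc}")
            ok = False
        else:
            print(("[ok]   " if ok else "[fail] ") + name)
        if not ok:
            failed.append(name)
    return failed


def default_checks(min_free_bytes: int = 100 * 1024 * 1024) -> List[Check]:
    """Two example checks; the threshold is an illustrative assumption."""
    return [
        ("kubectl installed", lambda: shutil.which("kubectl") is not None),
        ("sufficient disk space",
         lambda: shutil.disk_usage(".").free >= min_free_bytes),
    ]
```

Returning the list of failures, rather than exiting on the first one, lets the caller show every problem at once and then offer the option to continue.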

6. Database Cleanup Helper Script

New File: cleanup_databases_k8s.sh

Purpose:

Standalone script to manually clean all service databases before running migration generation.

Usage:

# Clean all databases (with confirmation)
./cleanup_databases_k8s.sh

# Clean all databases without confirmation
./cleanup_databases_k8s.sh --yes

# Clean only specific service
./cleanup_databases_k8s.sh --service auth --yes

# Use different namespace
./cleanup_databases_k8s.sh --namespace staging

Features:

✅ Drops all tables using schema CASCADE
✅ Verifies cleanup success
✅ Shows before/after table counts
✅ Can target specific services
✅ Requires explicit confirmation (unless --yes)
✅ Comprehensive summary with success/failure counts

Recommended Workflow

For Clean Migration Generation:

# Step 1: Clean all databases
./cleanup_databases_k8s.sh --yes

# Step 2: Generate migrations
./regenerate_migrations_k8s.sh --verbose

# Step 3: Review generated migrations
ls -lh services/*/migrations/versions/

# Step 4: Apply migrations (if testing)
./regenerate_migrations_k8s.sh --apply

For Production Deployment:

1. Local Development:

# Generate migrations with clean databases
./cleanup_databases_k8s.sh --yes
./regenerate_migrations_k8s.sh --verbose

2. Commit Migrations:

git add services/*/migrations/versions/*.py
git commit -m "Add initial schema migrations"

3. Build Docker Images:

- Migration files are included in Docker images
- No runtime generation needed

4. Deploy to Production:

- Set ENVIRONMENT=production in K8s manifests
- If migrations are missing, the deployment fails with a clear error
- No create_all() fallback in production
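
Because production refuses to start without migration files, it can help to check for them before building images. The following is a possible pre-commit or CI check, not something the scripts provide: `services_without_migrations` is a hypothetical helper that assumes the `services/*/migrations/versions/*.py` layout used in this document.

```python
from pathlib import Path
from typing import List


def services_without_migrations(root: str = "services") -> List[str]:
    """Return the names of service directories with no non-empty migration file."""
    missing = []
    for service_dir in sorted(Path(root).iterdir()):
        if not service_dir.is_dir():
            continue
        versions = service_dir / "migrations" / "versions"
        # An absent versions directory simply yields no files.
        files = list(versions.glob("*.py")) if versions.is_dir() else []
        if not any(f.stat().st_size > 0 for f in files):
            missing.append(service_dir.name)
    return missing
```

A non-empty result would be a reason to stop the build, since those services would fail at startup with ENVIRONMENT=production.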

Environment Variables

For DatabaseInitManager:

# Kubernetes deployment example
env:
  - name: ENVIRONMENT
    value: "production"  # or "staging", "development", "local", "test"

Behavior by Environment:

development/dev/local/test: Allows create_all() fallback if no migrations
production/staging/prod: Requires migrations, fails without them

Script Options

regenerate_migrations_k8s.sh

./regenerate_migrations_k8s.sh [OPTIONS]

Options:
  --dry-run         Show what would be done without making changes
  --skip-backup     Skip backing up existing migrations
  --apply           Automatically apply migrations after generation
  --check-existing  Check for and copy existing migrations from pods first
  --verbose         Enable detailed logging
  --skip-db-check   Skip database connectivity check
  --namespace NAME  Use specific Kubernetes namespace (default: bakery-ia)

cleanup_databases_k8s.sh

./cleanup_databases_k8s.sh [OPTIONS]

Options:
  --namespace NAME  Use specific Kubernetes namespace (default: bakery-ia)
  --service NAME    Clean only specific service database
  --yes             Skip confirmation prompt

Troubleshooting

Problem: Empty Migrations Generated

Symptoms:

def upgrade() -> None:
    pass

def downgrade() -> None:
    pass

Root Cause: Tables matching the models already exist in the database, so Alembic autogenerate detects no differences and writes an empty migration.

Solution:

# Clean database first
./cleanup_databases_k8s.sh --service <service-name> --yes

# Regenerate migrations
./regenerate_migrations_k8s.sh --verbose

Problem: "Database cleanup failed"

Symptoms:

✗ Database schema reset failed
ERROR: permission denied for schema public

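Solution: Check the database user's permissions. PostgreSQL has no separate DROP SCHEMA privilege: a schema can only be dropped by its owner or a superuser, and recreating it requires the CREATE privilege on the database. GRANT ALL PRIVILEGES ON SCHEMA only grants USAGE and CREATE inside the schema, so make the service user the owner instead:

ALTER SCHEMA public OWNER TO <service_user>;
GRANT CREATE ON DATABASE <database_name> TO <service_user>;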

Problem: "No migration file found in pod"

Symptoms:

✗ No migration file found in pod

Possible Causes:

Alembic autogenerate failed (check logs)
Models not properly imported
Migration directory permissions

Solution:

# Check pod logs
kubectl logs -n bakery-ia <pod-name> -c <service>-service

# Check if models are importable
kubectl exec -n bakery-ia <pod-name> -c <service>-service -- \
  python3 -c "from app.models import *; print('OK')"

Problem: kubectl cp Shows Success But File Not Copied

Symptoms:

✓ Migration file copied: file.py
# But ls shows empty directory

Solution: The updated script verifies the file size and, when the copied file is empty, reports:

✗ Migration file is empty (0 bytes)

If this persists, check:

Filesystem permissions
Available disk space
Pod container status

Testing

Verify Script Improvements:

# 1. Run pre-flight checks
./regenerate_migrations_k8s.sh --dry-run

# 2. Test database cleanup
./cleanup_databases_k8s.sh --service auth --yes

# 3. Verify database is empty
kubectl exec -n bakery-ia <auth-pod> -c auth-service -- \
  python3 -c "
import asyncio, os
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy import text

async def check():
    engine = create_async_engine(os.getenv('AUTH_DATABASE_URL'))
    async with engine.connect() as conn:
        result = await conn.execute(text('SELECT COUNT(*) FROM pg_tables WHERE schemaname=\\'public\\''))
        print(f'Tables: {result.scalar()}')
    await engine.dispose()

asyncio.run(check())
"

# Expected output: Tables: 0

# 4. Generate migration
./regenerate_migrations_k8s.sh --verbose

# 5. Verify migration has content
grep -F "op.create_table" services/auth/migrations/versions/*.py

Migration File Validation

Valid Migration (Has Schema Operations):

def upgrade() -> None:
    op.create_table('users',
        sa.Column('id', sa.UUID(), nullable=False),
        sa.Column('email', sa.String(255), nullable=False),
        # ...
    )

Invalid Migration (Empty):

def upgrade() -> None:
    pass  # ⚠ WARNING: No schema operations!

The script now:

✅ Detects empty migrations
✅ Shows warning with explanation
✅ Suggests checking database cleanup
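
One way to detect an empty migration is to parse the file and inspect the body of `upgrade()`. The sketch below uses Python's `ast` module. It illustrates the idea and is not necessarily how the script performs the check; `is_empty_migration` and `_is_noop` are hypothetical names.

```python
import ast


def _is_noop(stmt: ast.stmt) -> bool:
    """True for `pass` and for bare string expressions such as docstrings."""
    if isinstance(stmt, ast.Pass):
        return True
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Constant)
        and isinstance(stmt.value.value, str)
    )


def is_empty_migration(source: str) -> bool:
    """Return True when upgrade() contains no schema operations."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == "upgrade":
            return all(_is_noop(stmt) for stmt in node.body)
    return True  # assumption: a file without upgrade() is treated as empty
```

Parsing rather than searching for the string `pass` avoids false positives from comments, such as the marker comments Alembic writes around autogenerated commands.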

Summary of Changes

| Area | Before | After |
|---|---|---|
| Table Drops | Failed silently, errors hidden | Proper error handling, visible errors |
| Database Reset | Individual table drops (didn't work) | Full schema DROP CASCADE (guaranteed clean) |
| File Copy | No verification | Checks exit code, file existence, and size |
| Error Visibility | Errors redirected to log file | Errors shown in console immediately |
| Production Safety | Always allowed create_all() fallback | Fails in production without migrations |
| Pre-flight Checks | Basic kubectl check only | Comprehensive environment verification |
| Database Cleanup | Manual kubectl commands | Dedicated helper script |
| Empty Migration Detection | Silent generation | Clear warnings with explanation |

Future Improvements (Not Implemented)

Potential future enhancements:

Parallel migration generation for faster execution
Migration content diffing against previous versions
Automatic rollback on migration generation failure
Integration with CI/CD pipelines
Migration validation against database constraints
Automatic schema comparison and drift detection

Related Files

regenerate_migrations_k8s.sh - Main migration generation script
cleanup_databases_k8s.sh - Database cleanup helper
shared/database/init_manager.py - Enhanced database initialization manager
services/*/migrations/versions/*.py - Generated migration files
services/*/migrations/env.py - Alembic environment configuration
