bakery-ia/MIGRATION_SCRIPTS_README.md
2025-10-01 11:24:06 +02:00

Migration Scripts Documentation

This document describes the migration regeneration scripts and the improvements made to ensure reliable migration generation.

Overview

The migration system consists of:

  1. Main migration generation script (regenerate_migrations_k8s.sh)
  2. Database cleanup helper (cleanup_databases_k8s.sh)
  3. Enhanced DatabaseInitManager (shared/database/init_manager.py)

Problem Summary

The original migration generation script had several critical issues:

Root Cause

  1. Tables already existed in databases - Created by K8s migration jobs using create_all() fallback
  2. Table drop mechanism failed silently - Errors were hidden, script continued anyway
  3. Alembic detected no changes - When tables matched models, empty migrations were generated
  4. File copy verification was insufficient - kubectl cp reported success but files weren't copied locally

Impact

  • 11 out of 14 services generated empty migrations (only pass statements)
  • Only 3 services (pos, suppliers, alert-processor) worked correctly because their DBs were clean
  • No visibility into actual errors during table drops
  • Migration files weren't being copied to local machine despite "success" messages

Solutions Implemented

1. Fixed Script Table Drop Mechanism

File: regenerate_migrations_k8s.sh

Changes Made:

Before (Lines 404-405):

# Failed silently, errors hidden in log file
kubectl exec ... -- sh -c "DROP TABLE ..." 2>>$LOG_FILE

After (Lines 397-512):

# Complete database schema reset with proper error handling
async with engine.begin() as conn:
    await conn.execute(text('DROP SCHEMA IF EXISTS public CASCADE'))
    await conn.execute(text('CREATE SCHEMA public'))
    await conn.execute(text('GRANT ALL ON SCHEMA public TO PUBLIC'))

Key Improvements:

  • Uses engine.begin() instead of engine.connect(), so the statements run in a transaction that commits on success and rolls back on error
  • Drops entire schema with CASCADE for guaranteed clean slate
  • Captures and displays error output in real-time (not hidden in logs)
  • Falls back to individual table drops if schema drop fails
  • Verifies database is empty after cleanup
  • Fails fast if cleanup fails (prevents generating empty migrations)
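The reset-then-verify part of this flow can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: `run_sql` is a hypothetical async callable that executes one SQL statement and returns a scalar (or None), standing in for `conn.execute(text(...))`, and the fallback to individual table drops is left out.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Statements taken from the schema reset shown above.
RESET_STATEMENTS = (
    "DROP SCHEMA IF EXISTS public CASCADE",
    "CREATE SCHEMA public",
    "GRANT ALL ON SCHEMA public TO PUBLIC",
)
COUNT_TABLES = "SELECT COUNT(*) FROM pg_tables WHERE schemaname = 'public'"

# Hypothetical type: runs one statement, returns a scalar result or None.
RunSql = Callable[[str], Awaitable[Optional[int]]]


async def reset_and_verify(run_sql: RunSql) -> None:
    """Reset the public schema, then fail fast if any table survives."""
    for statement in RESET_STATEMENTS:
        await run_sql(statement)

    remaining = await run_sql(COUNT_TABLES)
    if remaining:
        # Stop here: autogenerate against a non-empty database may
        # produce an empty migration.
        raise RuntimeError(f"Database cleanup failed: {remaining} table(s) remain")
```

Raising on a non-empty database is what keeps a failed cleanup from silently turning into an empty migration later in the run.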

2. Enhanced kubectl cp Verification

File: regenerate_migrations_k8s.sh (Lines 547-595)

Improvements:

# Verify file was actually copied
if [ "$CP_EXIT_CODE" -eq 0 ] && [ -f "path/to/file" ]; then
    LOCAL_FILE_SIZE=$(wc -c < "path/to/file" | tr -d ' ')

    if [ "$LOCAL_FILE_SIZE" -gt 0 ]; then
        echo "✓ Migration file copied: $FILENAME ($LOCAL_FILE_SIZE bytes)"
    else
        echo "✗ Migration file is empty (0 bytes)"
        # Clean up and fail
    fi
fi

Key Improvements:

  • Checks exit code AND file existence
  • Verifies file size > 0 bytes
  • Displays actual error output from kubectl cp
  • Removes empty files automatically
  • Better warning messages for empty migrations
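The same three-part check (exit code, existence, non-zero size) can be expressed in Python for illustration. The function name and signature below are hypothetical; the script itself performs this check in shell.

```python
from pathlib import Path


def verify_copied_file(path: str, cp_exit_code: int) -> bool:
    """Return True only if the copy succeeded and left a non-empty file."""
    target = Path(path)
    if cp_exit_code != 0 or not target.is_file():
        return False
    if target.stat().st_size == 0:
        # An empty file is treated as a failed copy and removed.
        target.unlink()
        return False
    return True
```

Checking the file on disk, rather than trusting the exit code alone, is what catches the case where kubectl cp reports success but nothing usable arrives.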

3. Enhanced Error Visibility

Changes Throughout Script:

  • All Python error output captured and displayed: 2>&1 instead of 2>>$LOG_FILE
  • Error messages shown in console immediately
  • Detailed failure reasons in summary
  • Exit codes checked for all critical operations
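As a rough Python analogue of the 2>&1 pattern, the sketch below merges stderr into stdout, checks the exit code, and surfaces the output immediately on failure. `run_visible` is a hypothetical helper name used only for illustration.

```python
import subprocess
import sys
from typing import List, Tuple


def run_visible(cmd: List[str]) -> Tuple[int, str]:
    """Run cmd with stderr merged into stdout (the 2>&1 pattern)."""
    completed = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same effect as 2>&1
        text=True,
    )
    if completed.returncode != 0:
        # Show the captured output right away instead of hiding it in a log.
        print(completed.stdout, end="", file=sys.stderr)
    return completed.returncode, completed.stdout
```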

4. Modified DatabaseInitManager

File: shared/database/init_manager.py

New Features:

Environment-Aware Fallback Control:

def __init__(
    self,
    # ... existing params
    allow_create_all_fallback: bool = True,
    environment: Optional[str] = None
):
    self.environment = environment or os.getenv('ENVIRONMENT', 'development')
    self.allow_create_all_fallback = allow_create_all_fallback

Production Protection (Lines 74-93):

elif not db_state["has_migrations"]:
    if self.allow_create_all_fallback:
        # Development mode: use create_all()
        self.logger.warning("No migrations found - using create_all() as fallback")
        result = await self._handle_no_migrations()
    else:
        # Production mode: FAIL instead of using create_all()
        error_msg = (
            f"No migration files found for {self.service_name} and "
            f"create_all() fallback is disabled (environment: {self.environment}). "
            f"Migration files must be generated before deployment."
        )
        raise Exception(error_msg)

Key Improvements:

  • Auto-detects environment from ENVIRONMENT env var
  • Disables create_all() in production - Forces proper migrations
  • Allows fallback in dev/local/test - Maintains developer convenience
  • Clear error messages when migrations are missing
  • Backwards compatible - Default behavior unchanged

Environment Detection:

| Environment Value | Fallback Allowed? | Behavior |
|---|---|---|
| development, dev, local, test | Yes | Uses create_all() if no migrations |
| staging, production, prod | No | Fails with clear error message |
| Not set (default: development) | Yes | Uses create_all() if no migrations |
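One way the mapping in this table could be implemented is sketched below. It is not taken from init_manager.py: the function name is hypothetical, and both the case-insensitive comparison and the strict handling of unrecognized names are assumptions, since the source does not describe either.

```python
import os
from typing import Optional

# Environment names from the table above.
FALLBACK_ALLOWED = {"development", "dev", "local", "test"}
FALLBACK_FORBIDDEN = {"staging", "production", "prod"}


def fallback_allowed(environment: Optional[str] = None) -> bool:
    """Decide whether create_all() may be used when no migrations exist."""
    # Assumption: names are compared case-insensitively.
    env = (environment or os.getenv("ENVIRONMENT", "development")).strip().lower()
    if env in FALLBACK_FORBIDDEN:
        return False
    if env in FALLBACK_ALLOWED:
        return True
    # Assumption: unknown names are treated strictly; the source does not
    # say how DatabaseInitManager handles them.
    return False
```

Treating unknown names as production-like errs on the side of failing loudly rather than silently creating tables.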

5. Pre-flight Checks

File: regenerate_migrations_k8s.sh (Lines 75-187)

New Pre-flight Check System:

preflight_checks() {
    # Check kubectl installation and version
    # Check Kubernetes cluster connectivity
    # Check namespace exists
    # Check service pods are running
    # Check database drivers available
    # Check local directory structure
    # Check disk space
}

Verifications:

  • kubectl installation and version
  • Kubernetes cluster connectivity
  • Namespace exists
  • Service pods running (shows count: X/14)
  • Database drivers (asyncpg) available
  • Local migration directories exist
  • Sufficient disk space
  • Option to continue even if checks fail
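Two of these checks (CLI availability and free disk space) are easy to illustrate with the Python standard library. The sketch below is not the script's shell implementation; the function name and the 100 MB default threshold are assumptions chosen for the example.

```python
import shutil
from typing import List, Tuple


def preflight(binary: str = "kubectl", path: str = ".",
              min_free_mb: int = 100) -> List[Tuple[str, bool]]:
    """Run two local checks and return (name, passed) pairs."""
    results = []
    # Is the CLI on PATH?
    results.append((f"{binary} installed", shutil.which(binary) is not None))
    # Is there enough free disk space at `path`?
    free_mb = shutil.disk_usage(path).free // (1024 * 1024)
    results.append((f"disk space >= {min_free_mb} MB", free_mb >= min_free_mb))
    return results
```

Returning every result instead of stopping at the first failure fits the option to continue even when some checks fail.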

6. Database Cleanup Helper Script

New File: cleanup_databases_k8s.sh

Purpose:

Standalone script to manually clean all service databases before running migration generation.

Usage:

# Clean all databases (with confirmation)
./cleanup_databases_k8s.sh

# Clean all databases without confirmation
./cleanup_databases_k8s.sh --yes

# Clean only specific service
./cleanup_databases_k8s.sh --service auth --yes

# Use different namespace
./cleanup_databases_k8s.sh --namespace staging

Features:

  • Drops all tables using schema CASCADE
  • Verifies cleanup success
  • Shows before/after table counts
  • Can target specific services
  • Requires explicit confirmation (unless --yes)
  • Comprehensive summary with success/failure counts

Recommended Workflow

For Clean Migration Generation:

# Step 1: Clean all databases
./cleanup_databases_k8s.sh --yes

# Step 2: Generate migrations
./regenerate_migrations_k8s.sh --verbose

# Step 3: Review generated migrations
ls -lh services/*/migrations/versions/

# Step 4: Apply migrations (if testing)
./regenerate_migrations_k8s.sh --apply

For Production Deployment:

  1. Local Development:

    # Generate migrations with clean databases
    ./cleanup_databases_k8s.sh --yes
    ./regenerate_migrations_k8s.sh --verbose
    
  2. Commit Migrations:

    git add services/*/migrations/versions/*.py
    git commit -m "Add initial schema migrations"
    
  3. Build Docker Images:

    • Migration files are included in Docker images
    • No runtime generation needed

  4. Deploy to Production:

    • Set ENVIRONMENT=production in K8s manifests
    • If migrations are missing, the deployment fails with a clear error
    • No create_all() fallback in production
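Because a production deployment fails when migrations are missing, it can help to catch the gap before building images. The sketch below is a hypothetical pre-commit or CI helper, not part of the existing scripts; it assumes the services/*/migrations/versions/ layout used elsewhere in this document.

```python
from pathlib import Path
from typing import List


def services_missing_migrations(repo_root: str) -> List[str]:
    """List services whose migrations/versions directory has no .py files."""
    missing = []
    for service_dir in sorted(Path(repo_root, "services").iterdir()):
        if not service_dir.is_dir():
            continue
        versions = service_dir / "migrations" / "versions"
        has_files = versions.is_dir() and any(versions.glob("*.py"))
        if not has_files:
            missing.append(service_dir.name)
    return missing
```

An empty result means every service directory ships at least one migration file; it says nothing about whether those files contain schema operations.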

Environment Variables

For DatabaseInitManager:

# Kubernetes deployment example
env:
  - name: ENVIRONMENT
    value: "production"  # or "staging", "development", "local", "test"

Behavior by Environment:

  • development/dev/local/test: Allows create_all() fallback if no migrations
  • production/staging/prod: Requires migrations, fails without them

Script Options

regenerate_migrations_k8s.sh

./regenerate_migrations_k8s.sh [OPTIONS]

Options:
  --dry-run         Show what would be done without making changes
  --skip-backup     Skip backing up existing migrations
  --apply           Automatically apply migrations after generation
  --check-existing  Check for and copy existing migrations from pods first
  --verbose         Enable detailed logging
  --skip-db-check   Skip database connectivity check
  --namespace NAME  Use specific Kubernetes namespace (default: bakery-ia)

cleanup_databases_k8s.sh

./cleanup_databases_k8s.sh [OPTIONS]

Options:
  --namespace NAME  Use specific Kubernetes namespace (default: bakery-ia)
  --service NAME    Clean only specific service database
  --yes             Skip confirmation prompt

Troubleshooting

Problem: Empty Migrations Generated

Symptoms:

def upgrade() -> None:
    pass

def downgrade() -> None:
    pass

Root Cause: Tables matching the models already exist in the database, so Alembic autogenerate detects no differences

Solution:

# Clean database first
./cleanup_databases_k8s.sh --service <service-name> --yes

# Regenerate migrations
./regenerate_migrations_k8s.sh --verbose

Problem: "Database cleanup failed"

Symptoms:

✗ Database schema reset failed
ERROR: permission denied for schema public

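Solution: Check the database user's permissions. PostgreSQL has no separate DROP SCHEMA privilege: only the schema owner or a superuser can drop a schema. Grant access to the schema and, if the schema drop itself is still rejected, transfer ownership:

GRANT ALL PRIVILEGES ON SCHEMA public TO <service_user>;
ALTER SCHEMA public OWNER TO <service_user>;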

Problem: "No migration file found in pod"

Symptoms:

✗ No migration file found in pod

Possible Causes:

  1. Alembic autogenerate failed (check logs)
  2. Models not properly imported
  3. Migration directory permissions

Solution:

# Check pod logs
kubectl logs -n bakery-ia <pod-name> -c <service>-service

# Check if models are importable
kubectl exec -n bakery-ia <pod-name> -c <service>-service -- \
  python3 -c "from app.models import *; print('OK')"

Problem: kubectl cp Shows Success But File Not Copied

Symptoms:

✓ Migration file copied: file.py
# But ls shows empty directory

Solution: The script now verifies the file size and reports an empty copy:

✗ Migration file is empty (0 bytes)

If this persists, check:

  1. Filesystem permissions
  2. Available disk space
  3. Pod container status

Testing

Verify Script Improvements:

# 1. Run pre-flight checks
./regenerate_migrations_k8s.sh --dry-run

# 2. Test database cleanup
./cleanup_databases_k8s.sh --service auth --yes

# 3. Verify database is empty
kubectl exec -n bakery-ia <auth-pod> -c auth-service -- \
  python3 -c "
import asyncio, os
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy import text

async def check():
    engine = create_async_engine(os.getenv('AUTH_DATABASE_URL'))
    async with engine.connect() as conn:
        result = await conn.execute(text('SELECT COUNT(*) FROM pg_tables WHERE schemaname=\\'public\\''))
        print(f'Tables: {result.scalar()}')
    await engine.dispose()

asyncio.run(check())
"

# Expected output: Tables: 0

# 4. Generate migration
./regenerate_migrations_k8s.sh --verbose

# 5. Verify migration has content
grep "op.create_table" services/auth/migrations/versions/*.py

Migration File Validation

Valid Migration (Has Schema Operations):

def upgrade() -> None:
    op.create_table('users',
        sa.Column('id', sa.UUID(), nullable=False),
        sa.Column('email', sa.String(255), nullable=False),
        # ...
    )

Invalid Migration (Empty):

def upgrade() -> None:
    pass  # ⚠ WARNING: No schema operations!

The script now:

  • Detects empty migrations
  • Shows warning with explanation
  • Suggests checking database cleanup
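Empty-migration detection can be approximated by parsing the file and inspecting the body of upgrade(). The sketch below uses Python's ast module; it is an assumption about one possible approach, not a description of how the script performs the check, and `is_empty_migration` is a hypothetical name.

```python
import ast


def is_empty_migration(source: str) -> bool:
    """Return True if upgrade() contains only `pass` (and/or a docstring)."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == "upgrade":
            for stmt in node.body:
                if isinstance(stmt, ast.Pass):
                    continue
                # A bare string expression is a docstring, not an operation.
                if (isinstance(stmt, ast.Expr)
                        and isinstance(stmt.value, ast.Constant)
                        and isinstance(stmt.value.value, str)):
                    continue
                return False
            return True
    # Assumption: a file without upgrade() is also reported as empty.
    return True
```

Parsing ignores comments, so comment lines that appear alongside `pass` in a generated file do not affect the result.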

Summary of Changes

| Area | Before | After |
|---|---|---|
| Table Drops | Failed silently, errors hidden | Proper error handling, visible errors |
| Database Reset | Individual table drops (didn't work) | Full schema DROP CASCADE (guaranteed clean) |
| File Copy | No verification | Checks exit code, file existence, and size |
| Error Visibility | Errors redirected to log file | Errors shown in console immediately |
| Production Safety | Always allowed create_all() fallback | Fails in production without migrations |
| Pre-flight Checks | Basic kubectl check only | Comprehensive environment verification |
| Database Cleanup | Manual kubectl commands | Dedicated helper script |
| Empty Migration Detection | Silent generation | Clear warnings with explanation |

Future Improvements (Not Implemented)

Potential future enhancements:

  1. Parallel migration generation for faster execution
  2. Migration content diffing against previous versions
  3. Automatic rollback on migration generation failure
  4. Integration with CI/CD pipelines
  5. Migration validation against database constraints
  6. Automatic schema comparison and drift detection

Related Files

  • regenerate_migrations_k8s.sh - Main migration generation script
  • cleanup_databases_k8s.sh - Database cleanup helper
  • shared/database/init_manager.py - Enhanced database initialization manager
  • services/*/migrations/versions/*.py - Generated migration files
  • services/*/migrations/env.py - Alembic environment configuration