# Migration Scripts Documentation

This document describes the migration regeneration scripts and the improvements made to ensure reliable migration generation.

## Overview

The migration system consists of:

1. **Main migration generation script** (`regenerate_migrations_k8s.sh`)
2. **Database cleanup helper** (`cleanup_databases_k8s.sh`)
3. **Enhanced DatabaseInitManager** (`shared/database/init_manager.py`)

## Problem Summary

The original migration generation script had several critical issues:

### Root Cause

1. **Tables already existed in databases** - Created by K8s migration jobs using the `create_all()` fallback
2. **Table drop mechanism failed silently** - Errors were hidden and the script continued anyway
3. **Alembic detected no changes** - When tables matched models, empty migrations were generated
4. **File copy verification was insufficient** - `kubectl cp` reported success but files weren't copied locally

### Impact

- **11 out of 14 services** generated empty migrations (only `pass` statements)
- Only **3 services** (pos, suppliers, alert-processor) worked correctly because their DBs were clean
- No visibility into actual errors during table drops
- Migration files weren't being copied to the local machine despite "success" messages

## Solutions Implemented

### 1. Fixed Script Table Drop Mechanism

**File**: `regenerate_migrations_k8s.sh`

#### Changes Made:

**Before** (Lines 404-405):

```bash
# Failed silently, errors hidden in log file
kubectl exec ... -- sh -c "DROP TABLE ..." 2>>$LOG_FILE
```

**After** (Lines 397-512):

```python
# Complete database schema reset with proper error handling
# (Python embedded in the shell script)
async with engine.begin() as conn:
    await conn.execute(text('DROP SCHEMA IF EXISTS public CASCADE'))
    await conn.execute(text('CREATE SCHEMA public'))
    await conn.execute(text('GRANT ALL ON SCHEMA public TO PUBLIC'))
```

#### Key Improvements:

- ✅ Uses `engine.begin()` instead of `engine.connect()` so the statements run in a transaction that commits on success
- ✅ Drops the entire schema with CASCADE for a guaranteed clean slate
- ✅ Captures and displays error output in real time (not hidden in logs)
- ✅ Falls back to individual table drops if the schema drop fails
- ✅ Verifies the database is empty after cleanup
- ✅ Fails fast if cleanup fails (prevents generating empty migrations)

### 2. Enhanced kubectl cp Verification

**File**: `regenerate_migrations_k8s.sh` (Lines 547-595)

#### Improvements:

```bash
# Verify file was actually copied
if [ "$CP_EXIT_CODE" -eq 0 ] && [ -f "path/to/file" ]; then
    LOCAL_FILE_SIZE=$(wc -c < "path/to/file" | tr -d ' ')
    if [ "$LOCAL_FILE_SIZE" -gt 0 ]; then
        echo "✓ Migration file copied: $FILENAME ($LOCAL_FILE_SIZE bytes)"
    else
        echo "✗ Migration file is empty (0 bytes)"
        # Clean up and fail
    fi
fi
```

#### Key Improvements:

- ✅ Checks exit code AND file existence
- ✅ Verifies file size > 0 bytes
- ✅ Displays actual error output from `kubectl cp`
- ✅ Removes empty files automatically
- ✅ Better warning messages for empty migrations

### 3. Enhanced Error Visibility

#### Changes Throughout Script:

- ✅ All Python error output captured and displayed: `2>&1` instead of `2>>$LOG_FILE`
- ✅ Error messages shown in console immediately
- ✅ Detailed failure reasons in summary
- ✅ Exit codes checked for all critical operations

### 4. Modified DatabaseInitManager

**File**: `shared/database/init_manager.py`

#### New Features:

**Environment-Aware Fallback Control**:

```python
def __init__(
    self,
    # ... existing params
    allow_create_all_fallback: bool = True,
    environment: Optional[str] = None
):
    self.environment = environment or os.getenv('ENVIRONMENT', 'development')
    self.allow_create_all_fallback = allow_create_all_fallback
```

**Production Protection** (Lines 74-93):

```python
elif not db_state["has_migrations"]:
    if self.allow_create_all_fallback:
        # Development mode: use create_all()
        self.logger.warning("No migrations found - using create_all() as fallback")
        result = await self._handle_no_migrations()
    else:
        # Production mode: FAIL instead of using create_all()
        error_msg = (
            f"No migration files found for {self.service_name} and "
            f"create_all() fallback is disabled (environment: {self.environment}). "
            f"Migration files must be generated before deployment."
        )
        raise Exception(error_msg)
```

#### Key Improvements:

- ✅ **Auto-detects environment** from the `ENVIRONMENT` env var
- ✅ **Disables `create_all()` in production** - Forces proper migrations
- ✅ **Allows fallback in dev/local/test** - Maintains developer convenience
- ✅ **Clear error messages** when migrations are missing
- ✅ **Backwards compatible** - Default behavior unchanged

#### Environment Detection:

| Environment Value | Fallback Allowed? | Behavior |
|-------------------|-------------------|----------|
| `development`, `dev`, `local`, `test` | ✅ Yes | Uses `create_all()` if no migrations |
| `staging`, `production`, `prod` | ❌ No | Fails with clear error message |
| Not set (default: `development`) | ✅ Yes | Uses `create_all()` if no migrations |

### 5. Pre-flight Checks

**File**: `regenerate_migrations_k8s.sh` (Lines 75-187)

#### New Pre-flight Check System:

```bash
preflight_checks() {
    # Check kubectl installation and version
    # Check Kubernetes cluster connectivity
    # Check namespace exists
    # Check service pods are running
    # Check database drivers available
    # Check local directory structure
    # Check disk space
}
```

#### Verifications:

- ✅ kubectl installation and version
- ✅ Kubernetes cluster connectivity
- ✅ Namespace exists
- ✅ Service pods running (shows count: X/14)
- ✅ Database drivers (asyncpg) available
- ✅ Local migration directories exist
- ✅ Sufficient disk space
- ✅ Option to continue even if checks fail

### 6. Database Cleanup Helper Script

**New File**: `cleanup_databases_k8s.sh`

#### Purpose:

Standalone script to manually clean all service databases before running migration generation.

#### Usage:

```bash
# Clean all databases (with confirmation)
./cleanup_databases_k8s.sh

# Clean all databases without confirmation
./cleanup_databases_k8s.sh --yes

# Clean only specific service
./cleanup_databases_k8s.sh --service auth --yes

# Use different namespace
./cleanup_databases_k8s.sh --namespace staging
```

#### Features:

- ✅ Drops all tables using schema CASCADE
- ✅ Verifies cleanup success
- ✅ Shows before/after table counts
- ✅ Can target specific services
- ✅ Requires explicit confirmation (unless `--yes`)
- ✅ Comprehensive summary with success/failure counts

## Recommended Workflow

### For Clean Migration Generation:

```bash
# Step 1: Clean all databases
./cleanup_databases_k8s.sh --yes

# Step 2: Generate migrations
./regenerate_migrations_k8s.sh --verbose

# Step 3: Review generated migrations
ls -lh services/*/migrations/versions/

# Step 4: Apply migrations (if testing)
./regenerate_migrations_k8s.sh --apply
```

### For Production Deployment:

1. **Local Development**:

   ```bash
   # Generate migrations with clean databases
   ./cleanup_databases_k8s.sh --yes
   ./regenerate_migrations_k8s.sh --verbose
   ```

2. **Commit Migrations**:

   ```bash
   git add services/*/migrations/versions/*.py
   git commit -m "Add initial schema migrations"
   ```

3. **Build Docker Images**:
   - Migration files are included in Docker images
   - No runtime generation needed

4. **Deploy to Production**:
   - Set `ENVIRONMENT=production` in K8s manifests
   - If migrations missing → Deployment will fail with clear error
   - No `create_all()` fallback in production

## Environment Variables

### For DatabaseInitManager:

```yaml
# Kubernetes deployment example
env:
  - name: ENVIRONMENT
    value: "production"  # or "staging", "development", "local", "test"
```

**Behavior by Environment**:

- **development/dev/local/test**: Allows `create_all()` fallback if no migrations
- **production/staging/prod**: Requires migrations, fails without them

## Script Options

### regenerate_migrations_k8s.sh

```bash
./regenerate_migrations_k8s.sh [OPTIONS]

Options:
  --dry-run           Show what would be done without making changes
  --skip-backup       Skip backing up existing migrations
  --apply             Automatically apply migrations after generation
  --check-existing    Check for and copy existing migrations from pods first
  --verbose           Enable detailed logging
  --skip-db-check     Skip database connectivity check
  --namespace NAME    Use specific Kubernetes namespace (default: bakery-ia)
```

### cleanup_databases_k8s.sh

```bash
./cleanup_databases_k8s.sh [OPTIONS]

Options:
  --namespace NAME    Use specific Kubernetes namespace (default: bakery-ia)
  --service NAME      Clean only specific service database
  --yes               Skip confirmation prompt
```

## Troubleshooting

### Problem: Empty Migrations Generated

**Symptoms**:

```python
def upgrade() -> None:
    pass

def downgrade() -> None:
    pass
```

**Root Cause**: Tables that match the models already exist in the database, so Alembic autogenerate finds nothing to change.

**Solution**:

```bash
# Clean database first
./cleanup_databases_k8s.sh --service <service-name> --yes

# Regenerate migrations
./regenerate_migrations_k8s.sh --verbose
```

### Problem: "Database cleanup failed"

**Symptoms**:

```
✗ Database schema reset failed
ERROR: permission denied for schema public
```

**Solution**: Check database user permissions. PostgreSQL has no separate `DROP SCHEMA` privilege: dropping a schema requires that the user owns it or is a superuser. The user also needs `USAGE` and `CREATE` on the `public` schema, which the following grants:

```sql
GRANT ALL PRIVILEGES ON SCHEMA public TO <db_user>;
```

### Problem: "No migration file found in pod"

**Symptoms**:

```
✗ No migration file found in pod
```

**Possible Causes**:

1. Alembic autogenerate failed (check logs)
2. Models not properly imported
3. Migration directory permissions

**Solution**:

```bash
# Check pod logs
kubectl logs -n bakery-ia <pod-name> -c <service>-service

# Check if models are importable
kubectl exec -n bakery-ia <pod-name> -c <service>-service -- \
  python3 -c "from app.models import *; print('OK')"
```

### Problem: kubectl cp Shows Success But File Not Copied

**Symptoms**:

```
✓ Migration file copied: file.py
# But ls shows empty directory
```

**Solution**: The new script verifies the file size and reports an empty copy:

```
✗ Migration file is empty (0 bytes)
```

If this persists, check:

1. Filesystem permissions
2. Available disk space
3. Pod container status

## Testing

### Verify Script Improvements:

```bash
# 1. Run pre-flight checks
./regenerate_migrations_k8s.sh --dry-run

# 2. Test database cleanup
./cleanup_databases_k8s.sh --service auth --yes

# 3. Verify database is empty
kubectl exec -n bakery-ia <pod-name> -c auth-service -- \
  python3 -c "
import asyncio, os
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy import text

async def check():
    engine = create_async_engine(os.getenv('AUTH_DATABASE_URL'))
    async with engine.connect() as conn:
        result = await conn.execute(text('SELECT COUNT(*) FROM pg_tables WHERE schemaname=\\'public\\''))
        print(f'Tables: {result.scalar()}')
    await engine.dispose()

asyncio.run(check())
"
# Expected output: Tables: 0

# 4. Generate migration
./regenerate_migrations_k8s.sh --verbose

# 5. Verify migration has content
cat services/auth/migrations/versions/*.py | grep "op.create_table"
```

## Migration File Validation

### Valid Migration (Has Schema Operations):

```python
def upgrade() -> None:
    op.create_table('users',
        sa.Column('id', sa.UUID(), nullable=False),
        sa.Column('email', sa.String(255), nullable=False),
        # ...
    )
```

### Invalid Migration (Empty):

```python
def upgrade() -> None:
    pass  # ⚠ WARNING: No schema operations!
```

The script now:

- ✅ Detects empty migrations
- ✅ Shows warning with explanation
- ✅ Suggests checking database cleanup

## Summary of Changes

| Area | Before | After |
|------|--------|-------|
| **Table Drops** | Failed silently, errors hidden | Proper error handling, visible errors |
| **Database Reset** | Individual table drops (didn't work) | Full schema DROP CASCADE (guaranteed clean) |
| **File Copy** | No verification | Checks exit code, file existence, and size |
| **Error Visibility** | Errors redirected to log file | Errors shown in console immediately |
| **Production Safety** | Always allowed `create_all()` fallback | Fails in production without migrations |
| **Pre-flight Checks** | Basic kubectl check only | Comprehensive environment verification |
| **Database Cleanup** | Manual kubectl commands | Dedicated helper script |
| **Empty Migration Detection** | Silent generation | Clear warnings with explanation |

## Future Improvements (Not Implemented)

Potential future enhancements:

1. Parallel migration generation for faster execution
2. Migration content diffing against previous versions
3. Automatic rollback on migration generation failure
4. Integration with CI/CD pipelines
5. Migration validation against database constraints
6. Automatic schema comparison and drift detection

## Related Files

- `regenerate_migrations_k8s.sh` - Main migration generation script
- `cleanup_databases_k8s.sh` - Database cleanup helper
- `shared/database/init_manager.py` - Enhanced database initialization manager
- `services/*/migrations/versions/*.py` - Generated migration files
- `services/*/migrations/env.py` - Alembic environment configuration
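
The empty-migration check described under Migration File Validation can also be run locally against the generated files. The following is a minimal sketch, not the logic used by `regenerate_migrations_k8s.sh`: the function names `is_empty_migration` and `find_empty_migrations` are hypothetical, and the sketch assumes that a migration counts as empty when its `upgrade()` body contains nothing but `pass` statements and docstrings.

```python
import ast
from pathlib import Path


def is_empty_migration(source: str) -> bool:
    """Return True if upgrade() holds no statements besides pass/docstrings.

    Hypothetical helper: parses the migration source without importing it,
    so Alembic and SQLAlchemy do not need to be installed.
    """
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == "upgrade":
            for stmt in node.body:
                if isinstance(stmt, ast.Pass):
                    continue  # a bare `pass` is not a schema operation
                if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant):
                    continue  # docstring or other bare constant
                return False  # any other statement counts as real content
            return True
    # No upgrade() defined at all: nothing would be applied, so treat as empty.
    return True


def find_empty_migrations(root: str = ".") -> list[Path]:
    """List migration files under `root` whose upgrade() is empty."""
    pattern = "services/*/migrations/versions/*.py"
    return [
        path
        for path in sorted(Path(root).glob(pattern))
        if is_empty_migration(path.read_text(encoding="utf-8"))
    ]


if __name__ == "__main__":
    for path in find_empty_migrations():
        print(f"Empty migration: {path}")
```

Run from the repository root, the script prints one line per empty migration file. Parsing with `ast` rather than searching for `op.create_table` means migrations that only alter or drop objects are still treated as non-empty.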