# Migration Scripts Documentation
This document describes the migration regeneration scripts and the improvements made to ensure reliable migration generation.
## Overview

The migration system consists of:

- Main migration generation script (`regenerate_migrations_k8s.sh`)
- Database cleanup helper (`cleanup_databases_k8s.sh`)
- Enhanced `DatabaseInitManager` (`shared/database/init_manager.py`)
## Problem Summary

The original migration generation script had several critical issues.

### Root Cause

- **Tables already existed in databases** - Created by K8s migration jobs using the `create_all()` fallback
- **Table drop mechanism failed silently** - Errors were hidden, and the script continued anyway
- **Alembic detected no changes** - When tables matched models, empty migrations were generated
- **File copy verification was insufficient** - `kubectl cp` reported success but files weren't copied locally

### Impact

- 11 out of 14 services generated empty migrations (only `pass` statements)
- Only 3 services (pos, suppliers, alert-processor) worked correctly because their DBs were clean
- No visibility into actual errors during table drops
- Migration files weren't being copied to the local machine despite "success" messages
## Solutions Implemented

### 1. Fixed Script Table Drop Mechanism

**File:** `regenerate_migrations_k8s.sh`

**Changes Made:**

Before (lines 404-405):

```bash
# Failed silently, errors hidden in log file
kubectl exec ... -- sh -c "DROP TABLE ..." 2>>$LOG_FILE
```

After (lines 397-512):

```python
# Complete database schema reset with proper error handling
async with engine.begin() as conn:
    await conn.execute(text('DROP SCHEMA IF EXISTS public CASCADE'))
    await conn.execute(text('CREATE SCHEMA public'))
    await conn.execute(text('GRANT ALL ON SCHEMA public TO PUBLIC'))
```
**Key Improvements:**

- ✅ Uses `engine.begin()` instead of `engine.connect()` for proper transaction management
- ✅ Drops the entire schema with CASCADE for a guaranteed clean slate
- ✅ Captures and displays error output in real time (not hidden in logs)
- ✅ Falls back to individual table drops if the schema drop fails
- ✅ Verifies the database is empty after cleanup
- ✅ Fails fast if cleanup fails (prevents generating empty migrations)
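The drop-with-fallback strategy above can be sketched as follows. This is a minimal illustration, not the script's exact code: `execute` and `list_tables` are hypothetical stand-ins for the script's real async SQLAlchemy calls.

```python
# Sketch of the schema-reset-with-fallback strategy.
# `execute` runs one SQL statement and raises on error; `list_tables`
# returns the names of tables still present. Both are assumed helpers.

def reset_schema(execute, list_tables):
    """Prefer a full schema reset; fall back to per-table drops; fail fast."""
    try:
        # Fast path: guaranteed clean slate via schema CASCADE
        execute("DROP SCHEMA IF EXISTS public CASCADE")
        execute("CREATE SCHEMA public")
        execute("GRANT ALL ON SCHEMA public TO PUBLIC")
    except Exception as err:
        # Fallback: drop tables one by one, surfacing the original error
        print(f"Schema reset failed ({err}); falling back to table drops")
        for table in list_tables():
            execute(f'DROP TABLE IF EXISTS "{table}" CASCADE')

    # Verify the database really is empty; abort rather than generate
    # empty migrations against leftover tables
    remaining = list_tables()
    if remaining:
        raise RuntimeError(f"Cleanup failed, tables remain: {remaining}")
```

The key design point is the final verification: success is judged by the observed state of the database, not by whether the drop commands reported success.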
### 2. Enhanced kubectl cp Verification

**File:** `regenerate_migrations_k8s.sh` (lines 547-595)

**Improvements:**

```bash
# Verify file was actually copied
if [ $CP_EXIT_CODE -eq 0 ] && [ -f "path/to/file" ]; then
    LOCAL_FILE_SIZE=$(wc -c < "path/to/file" | tr -d ' ')
    if [ "$LOCAL_FILE_SIZE" -gt 0 ]; then
        echo "✓ Migration file copied: $FILENAME ($LOCAL_FILE_SIZE bytes)"
    else
        echo "✗ Migration file is empty (0 bytes)"
        # Clean up and fail
    fi
fi
```
**Key Improvements:**

- ✅ Checks exit code AND file existence
- ✅ Verifies file size > 0 bytes
- ✅ Displays actual error output from `kubectl cp`
- ✅ Removes empty files automatically
- ✅ Better warning messages for empty migrations
### 3. Enhanced Error Visibility

**Changes Throughout Script:**

- ✅ All Python error output captured and displayed: `2>&1` instead of `2>>$LOG_FILE`
- ✅ Error messages shown in the console immediately
- ✅ Detailed failure reasons in the summary
- ✅ Exit codes checked for all critical operations
### 4. Modified DatabaseInitManager

**File:** `shared/database/init_manager.py`

**New Features:**

Environment-Aware Fallback Control:

```python
def __init__(
    self,
    # ... existing params
    allow_create_all_fallback: bool = True,
    environment: Optional[str] = None
):
    self.environment = environment or os.getenv('ENVIRONMENT', 'development')
    self.allow_create_all_fallback = allow_create_all_fallback
```

Production Protection (lines 74-93):

```python
elif not db_state["has_migrations"]:
    if self.allow_create_all_fallback:
        # Development mode: use create_all()
        self.logger.warning("No migrations found - using create_all() as fallback")
        result = await self._handle_no_migrations()
    else:
        # Production mode: FAIL instead of using create_all()
        error_msg = (
            f"No migration files found for {self.service_name} and "
            f"create_all() fallback is disabled (environment: {self.environment}). "
            f"Migration files must be generated before deployment."
        )
        raise Exception(error_msg)
```
**Key Improvements:**

- ✅ Auto-detects environment from the `ENVIRONMENT` env var
- ✅ Disables `create_all()` in production - forces proper migrations
- ✅ Allows fallback in dev/local/test - maintains developer convenience
- ✅ Clear error messages when migrations are missing
- ✅ Backwards compatible - default behavior unchanged
**Environment Detection:**

| Environment Value | Fallback Allowed? | Behavior |
|---|---|---|
| `development`, `dev`, `local`, `test` | ✅ Yes | Uses `create_all()` if no migrations |
| `staging`, `production`, `prod` | ❌ No | Fails with clear error message |
| Not set (default: `development`) | ✅ Yes | Uses `create_all()` if no migrations |
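The mapping in the table reduces to a small predicate. This is a sketch using the environment names from the table; the function name `fallback_allowed` is illustrative, not the manager's actual API:

```python
# Sketch of the environment-to-fallback mapping.
import os
from typing import Optional

PROD_LIKE = {"staging", "production", "prod"}

def fallback_allowed(environment: Optional[str] = None) -> bool:
    """Return True if create_all() may be used when no migrations exist."""
    env = (environment or os.getenv("ENVIRONMENT", "development")).lower()
    if env in PROD_LIKE:
        return False
    # Anything else (including unset) is treated like development,
    # matching the backwards-compatible default described above.
    return True
```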
### 5. Pre-flight Checks

**File:** `regenerate_migrations_k8s.sh` (lines 75-187)

**New Pre-flight Check System:**

```bash
preflight_checks() {
    # Check kubectl installation and version
    # Check Kubernetes cluster connectivity
    # Check namespace exists
    # Check service pods are running
    # Check database drivers available
    # Check local directory structure
    # Check disk space
}
```
**Verifications:**
- ✅ kubectl installation and version
- ✅ Kubernetes cluster connectivity
- ✅ Namespace exists
- ✅ Service pods running (shows count: X/14)
- ✅ Database drivers (asyncpg) available
- ✅ Local migration directories exist
- ✅ Sufficient disk space
- ✅ Option to continue even if checks fail
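Two of the checks above (tool presence, disk space) can be sketched in Python to show the collect-failures-then-decide pattern. The 100 MB threshold and the `preflight_checks` signature are illustrative assumptions, not values from the script:

```python
# Minimal pre-flight sketch: gather failure descriptions instead of
# aborting on the first problem, so the user sees everything at once.
import shutil

def preflight_checks(required_tool: str = "kubectl",
                     path: str = ".",
                     min_free_bytes: int = 100 * 1024 * 1024) -> list:
    """Return a list of failed-check descriptions (empty means all passed)."""
    failures = []
    if shutil.which(required_tool) is None:
        failures.append(f"{required_tool} not found on PATH")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        failures.append(f"only {free} bytes free, need {min_free_bytes}")
    return failures
```

Returning the full failure list (rather than exiting on the first failure) is what makes the "option to continue even if checks fail" behavior possible.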
### 6. Database Cleanup Helper Script

**New File:** `cleanup_databases_k8s.sh`

**Purpose:** Standalone script to manually clean all service databases before running migration generation.

**Usage:**

```bash
# Clean all databases (with confirmation)
./cleanup_databases_k8s.sh

# Clean all databases without confirmation
./cleanup_databases_k8s.sh --yes

# Clean only a specific service
./cleanup_databases_k8s.sh --service auth --yes

# Use a different namespace
./cleanup_databases_k8s.sh --namespace staging
```
**Features:**
- ✅ Drops all tables using schema CASCADE
- ✅ Verifies cleanup success
- ✅ Shows before/after table counts
- ✅ Can target specific services
- ✅ Requires explicit confirmation (unless --yes)
- ✅ Comprehensive summary with success/failure counts
## Recommended Workflow

### For Clean Migration Generation

```bash
# Step 1: Clean all databases
./cleanup_databases_k8s.sh --yes

# Step 2: Generate migrations
./regenerate_migrations_k8s.sh --verbose

# Step 3: Review generated migrations
ls -lh services/*/migrations/versions/

# Step 4: Apply migrations (if testing)
./regenerate_migrations_k8s.sh --apply
```
### For Production Deployment

1. **Local Development:**

   ```bash
   # Generate migrations with clean databases
   ./cleanup_databases_k8s.sh --yes
   ./regenerate_migrations_k8s.sh --verbose
   ```

2. **Commit Migrations:**

   ```bash
   git add services/*/migrations/versions/*.py
   git commit -m "Add initial schema migrations"
   ```

3. **Build Docker Images:**
   - Migration files are included in Docker images
   - No runtime generation needed

4. **Deploy to Production:**
   - Set `ENVIRONMENT=production` in K8s manifests
   - If migrations are missing, the deployment fails with a clear error
   - No `create_all()` fallback in production
## Environment Variables

### For DatabaseInitManager

```yaml
# Kubernetes deployment example
env:
  - name: ENVIRONMENT
    value: "production"  # or "staging", "development", "local", "test"
```

**Behavior by Environment:**

- **development/dev/local/test:** Allows `create_all()` fallback if no migrations
- **production/staging/prod:** Requires migrations, fails without them
## Script Options

### regenerate_migrations_k8s.sh

```bash
./regenerate_migrations_k8s.sh [OPTIONS]

Options:
  --dry-run          Show what would be done without making changes
  --skip-backup      Skip backing up existing migrations
  --apply            Automatically apply migrations after generation
  --check-existing   Check for and copy existing migrations from pods first
  --verbose          Enable detailed logging
  --skip-db-check    Skip database connectivity check
  --namespace NAME   Use specific Kubernetes namespace (default: bakery-ia)
```
### cleanup_databases_k8s.sh

```bash
./cleanup_databases_k8s.sh [OPTIONS]

Options:
  --namespace NAME   Use specific Kubernetes namespace (default: bakery-ia)
  --service NAME     Clean only specific service database
  --yes              Skip confirmation prompt
```
## Troubleshooting

### Problem: Empty Migrations Generated

**Symptoms:**

```python
def upgrade() -> None:
    pass

def downgrade() -> None:
    pass
```

**Root Cause:** Tables already exist in the database and match the models.

**Solution:**

```bash
# Clean the database first
./cleanup_databases_k8s.sh --service <service-name> --yes

# Regenerate migrations
./regenerate_migrations_k8s.sh --verbose
```
### Problem: "Database cleanup failed"

**Symptoms:**

```
✗ Database schema reset failed
ERROR: permission denied for schema public
```

**Solution:** Check database user permissions. The user needs the privilege to drop and recreate the `public` schema:

```sql
GRANT ALL PRIVILEGES ON SCHEMA public TO <service_user>;
```
### Problem: "No migration file found in pod"

**Symptoms:**

```
✗ No migration file found in pod
```

**Possible Causes:**

- Alembic autogenerate failed (check logs)
- Models not properly imported
- Migration directory permissions

**Solution:**

```bash
# Check pod logs
kubectl logs -n bakery-ia <pod-name> -c <service>-service

# Check if models are importable
kubectl exec -n bakery-ia <pod-name> -c <service>-service -- \
  python3 -c "from app.models import *; print('OK')"
```
### Problem: kubectl cp Shows Success But File Not Copied

**Symptoms:**

```
✓ Migration file copied: file.py
# But ls shows an empty directory
```

**Solution:** The new script now verifies file size and will show:

```
✗ Migration file is empty (0 bytes)
```

If this persists, check:

- Filesystem permissions
- Available disk space
- Pod container status
## Testing

### Verify Script Improvements

```bash
# 1. Run pre-flight checks
./regenerate_migrations_k8s.sh --dry-run

# 2. Test database cleanup
./cleanup_databases_k8s.sh --service auth --yes

# 3. Verify database is empty
kubectl exec -n bakery-ia <auth-pod> -c auth-service -- \
  python3 -c "
import asyncio, os
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy import text

async def check():
    engine = create_async_engine(os.getenv('AUTH_DATABASE_URL'))
    async with engine.connect() as conn:
        result = await conn.execute(text('SELECT COUNT(*) FROM pg_tables WHERE schemaname=\\'public\\''))
        print(f'Tables: {result.scalar()}')
    await engine.dispose()

asyncio.run(check())
"
# Expected output: Tables: 0

# 4. Generate migration
./regenerate_migrations_k8s.sh --verbose

# 5. Verify migration has content
cat services/auth/migrations/versions/*.py | grep "op.create_table"
```
## Migration File Validation

**Valid Migration (Has Schema Operations):**

```python
def upgrade() -> None:
    op.create_table('users',
        sa.Column('id', sa.UUID(), nullable=False),
        sa.Column('email', sa.String(255), nullable=False),
        # ...
    )
```

**Invalid Migration (Empty):**

```python
def upgrade() -> None:
    pass  # ⚠ WARNING: No schema operations!
```

The script now:

- ✅ Detects empty migrations
- ✅ Shows warning with explanation
- ✅ Suggests checking database cleanup
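One way to implement this detection is to treat a migration as empty when its `upgrade()` body contains no Alembic operations (no `op.` calls). This is an illustrative sketch, not the script's exact logic:

```python
# Classify a migration file's source as empty or substantive by
# inspecting the upgrade() function for Alembic op.* calls.
import re

def is_empty_migration(source: str) -> bool:
    """Return True if the migration defines an upgrade() with no op.* calls."""
    match = re.search(r"def upgrade\(\).*?(?=\ndef |\Z)", source, re.DOTALL)
    if match is None:
        return True  # no upgrade() at all counts as empty
    return "op." not in match.group(0)
```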
## Summary of Changes
| Area | Before | After |
|---|---|---|
| Table Drops | Failed silently, errors hidden | Proper error handling, visible errors |
| Database Reset | Individual table drops (didn't work) | Full schema DROP CASCADE (guaranteed clean) |
| File Copy | No verification | Checks exit code, file existence, and size |
| Error Visibility | Errors redirected to log file | Errors shown in console immediately |
| Production Safety | Always allowed create_all() fallback | Fails in production without migrations |
| Pre-flight Checks | Basic kubectl check only | Comprehensive environment verification |
| Database Cleanup | Manual kubectl commands | Dedicated helper script |
| Empty Migration Detection | Silent generation | Clear warnings with explanation |
## Future Improvements (Not Implemented)
Potential future enhancements:
- Parallel migration generation for faster execution
- Migration content diffing against previous versions
- Automatic rollback on migration generation failure
- Integration with CI/CD pipelines
- Migration validation against database constraints
- Automatic schema comparison and drift detection
## Related Files

- `regenerate_migrations_k8s.sh` - Main migration generation script
- `cleanup_databases_k8s.sh` - Database cleanup helper
- `shared/database/init_manager.py` - Enhanced database initialization manager
- `services/*/migrations/versions/*.py` - Generated migration files
- `services/*/migrations/env.py` - Alembic environment configuration