# Migration Scripts Documentation

This document describes the migration regeneration scripts and the improvements made to ensure reliable migration generation.

## Overview

The migration system consists of:
1. **Main migration generation script** (`regenerate_migrations_k8s.sh`)
2. **Database cleanup helper** (`cleanup_databases_k8s.sh`)
3. **Enhanced DatabaseInitManager** (`shared/database/init_manager.py`)

## Problem Summary

The original migration generation script had several critical issues:

### Root Cause
1. **Tables already existed in databases** - Created by K8s migration jobs using the `create_all()` fallback
2. **Table drop mechanism failed silently** - Errors were hidden and the script continued anyway
3. **Alembic detected no changes** - Because the existing tables matched the models, empty migrations were generated
4. **File copy verification was insufficient** - `kubectl cp` reported success even though the files were not copied locally

### Impact
- **11 out of 14 services** generated empty migrations (only `pass` statements)
- Only **3 services** (pos, suppliers, alert-processor) worked correctly because their DBs were clean
- No visibility into actual errors during table drops
- Migration files were not copied to the local machine despite "success" messages

## Solutions Implemented

### 1. Fixed Script Table Drop Mechanism

**File**: `regenerate_migrations_k8s.sh`

#### Changes Made:

**Before** (Lines 404-405):
```bash
# Failed silently, errors hidden in log file
kubectl exec ... -- sh -c "DROP TABLE ..." 2>>$LOG_FILE
```

**After** (Lines 397-512):
```python
# Complete database schema reset with proper error handling
async with engine.begin() as conn:
    await conn.execute(text('DROP SCHEMA IF EXISTS public CASCADE'))
    await conn.execute(text('CREATE SCHEMA public'))
    await conn.execute(text('GRANT ALL ON SCHEMA public TO PUBLIC'))
```

#### Key Improvements:
- ✅ Uses `engine.begin()` instead of `engine.connect()` for proper transaction management
- ✅ Drops entire schema with CASCADE for guaranteed clean slate
- ✅ Captures and displays error output in real-time (not hidden in logs)
- ✅ Falls back to individual table drops if schema drop fails
- ✅ Verifies database is empty after cleanup
- ✅ Fails fast if cleanup fails (prevents generating empty migrations)

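The reset, fallback, and verification steps listed above can be sketched as one control flow. This is a minimal illustration, not the script's actual code: `run_sql` is a hypothetical coroutine that executes a single statement and returns its rows, and each call is assumed to run in its own transaction (in PostgreSQL a failed statement aborts the surrounding transaction, so the fallback cannot reuse it).

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical executor type: runs one SQL statement and returns its rows.
RunSQL = Callable[[str], Awaitable[List[tuple]]]

LIST_TABLES = "SELECT tablename FROM pg_tables WHERE schemaname = 'public'"


async def reset_public_schema(run_sql: RunSQL) -> None:
    """Reset the public schema, fall back to per-table drops, then verify."""
    try:
        await run_sql("DROP SCHEMA IF EXISTS public CASCADE")
        await run_sql("CREATE SCHEMA public")
        await run_sql("GRANT ALL ON SCHEMA public TO PUBLIC")
    except Exception as exc:
        # Show the error immediately instead of hiding it in a log file.
        print(f"Schema reset failed: {exc}; falling back to table drops")
        for (table,) in await run_sql(LIST_TABLES):
            await run_sql(f'DROP TABLE IF EXISTS "{table}" CASCADE')

    # Fail fast: leftover tables would lead to an empty migration.
    remaining = await run_sql(LIST_TABLES)
    if remaining:
        raise RuntimeError(f"Cleanup incomplete, {len(remaining)} table(s) remain")
```

Injecting the executor keeps the sketch independent of any particular driver; with SQLAlchemy it could wrap `conn.execute(text(...))` inside a fresh `engine.begin()` block per call.
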
### 2. Enhanced kubectl cp Verification

**File**: `regenerate_migrations_k8s.sh` (Lines 547-595)

#### Improvements:
```bash
# Verify file was actually copied
if [ "$CP_EXIT_CODE" -eq 0 ] && [ -f "path/to/file" ]; then
    LOCAL_FILE_SIZE=$(wc -c < "path/to/file" | tr -d ' ')

    if [ "$LOCAL_FILE_SIZE" -gt 0 ]; then
        echo "✓ Migration file copied: $FILENAME ($LOCAL_FILE_SIZE bytes)"
    else
        echo "✗ Migration file is empty (0 bytes)"
        # Clean up and fail
    fi
fi
```

#### Key Improvements:
- ✅ Checks exit code AND file existence
- ✅ Verifies file size > 0 bytes
- ✅ Displays actual error output from kubectl cp
- ✅ Removes empty files automatically
- ✅ Better warning messages for empty migrations

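The same three-part check (exit code, existence, size) can be expressed as a small Python helper, which may be convenient for unit-testing the logic outside the shell script. The function name `verify_copied_file` and its error types are illustrative assumptions, not part of the script.

```python
from pathlib import Path


def verify_copied_file(path: Path, cp_exit_code: int) -> int:
    """Return the file size in bytes, or raise if the copy did not really succeed."""
    if cp_exit_code != 0:
        raise RuntimeError(f"kubectl cp exited with code {cp_exit_code}")
    if not path.is_file():
        raise FileNotFoundError(f"kubectl cp reported success but {path} is missing")
    size = path.stat().st_size
    if size == 0:
        # Remove the empty file so it is not mistaken for a real migration.
        path.unlink()
        raise ValueError(f"{path.name} is empty (0 bytes)")
    return size
```
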
### 3. Enhanced Error Visibility

#### Changes Throughout Script:
- ✅ All Python error output captured and displayed: `2>&1` instead of `2>>$LOG_FILE`
- ✅ Error messages shown in console immediately
- ✅ Detailed failure reasons in summary
- ✅ Exit codes checked for all critical operations

### 4. Modified DatabaseInitManager

**File**: `shared/database/init_manager.py`

#### New Features:

**Environment-Aware Fallback Control**:
```python
def __init__(
    self,
    # ... existing params
    allow_create_all_fallback: bool = True,
    environment: Optional[str] = None
):
    self.environment = environment or os.getenv('ENVIRONMENT', 'development')
    self.allow_create_all_fallback = allow_create_all_fallback
```

**Production Protection** (Lines 74-93):
```python
elif not db_state["has_migrations"]:
    if self.allow_create_all_fallback:
        # Development mode: use create_all()
        self.logger.warning("No migrations found - using create_all() as fallback")
        result = await self._handle_no_migrations()
    else:
        # Production mode: FAIL instead of using create_all()
        error_msg = (
            f"No migration files found for {self.service_name} and "
            f"create_all() fallback is disabled (environment: {self.environment}). "
            f"Migration files must be generated before deployment."
        )
        raise Exception(error_msg)
```

#### Key Improvements:
- ✅ **Auto-detects environment** from `ENVIRONMENT` env var
- ✅ **Disables `create_all()` in production** - Forces proper migrations
- ✅ **Allows fallback in dev/local/test** - Maintains developer convenience
- ✅ **Clear error messages** when migrations are missing
- ✅ **Backwards compatible** - Default behavior unchanged

#### Environment Detection:
| Environment Value | Fallback Allowed? | Behavior |
|-------------------|-------------------|----------|
| `development`, `dev`, `local`, `test` | ✅ Yes | Uses `create_all()` if no migrations |
| `staging`, `production`, `prod` | ❌ No | Fails with clear error message |
| Not set (default: `development`) | ✅ Yes | Uses `create_all()` if no migrations |

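The table above amounts to a simple lookup. The sketch below shows one way to derive the flag from the environment name; `fallback_allowed` is a hypothetical helper, not a function from `init_manager.py`. Two behaviors here are assumptions that the table does not specify: values are normalized with `strip().lower()`, and unrecognized values are treated like production (fail closed).

```python
import os
from typing import Optional

# Environment names taken from the table above.
DEV_ENVIRONMENTS = {"development", "dev", "local", "test"}


def fallback_allowed(environment: Optional[str] = None) -> bool:
    """Return True if the create_all() fallback may be used in this environment."""
    env = environment or os.getenv("ENVIRONMENT", "development")
    # Assumption: anything not listed as a dev environment fails closed.
    return env.strip().lower() in DEV_ENVIRONMENTS
```

A caller could pass the result as `allow_create_all_fallback` when constructing the manager.
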
### 5. Pre-flight Checks

**File**: `regenerate_migrations_k8s.sh` (Lines 75-187)

#### New Pre-flight Check System:
```bash
preflight_checks() {
    # Check kubectl installation and version
    # Check Kubernetes cluster connectivity
    # Check namespace exists
    # Check service pods are running
    # Check database drivers available
    # Check local directory structure
    # Check disk space
}
```

#### Verifications:
- ✅ kubectl installation and version
- ✅ Kubernetes cluster connectivity
- ✅ Namespace exists
- ✅ Service pods running (shows count: X/14)
- ✅ Database drivers (asyncpg) available
- ✅ Local migration directories exist
- ✅ Sufficient disk space
- ✅ Option to continue even if checks fail

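As an illustration of the pattern (run every check, report each result, collect the failures), here is a small Python sketch. It is not a port of `preflight_checks()`: the function names and the 100 MiB free-space threshold are assumptions, and only two of the listed checks are shown.

```python
import shutil
from typing import Callable, Dict, List


def kubectl_installed() -> bool:
    """Check that a kubectl binary is on PATH."""
    return shutil.which("kubectl") is not None


def enough_disk_space(path: str = ".", min_free_bytes: int = 100 * 1024 * 1024) -> bool:
    """Check free space at path; the 100 MiB default is an arbitrary assumption."""
    return shutil.disk_usage(path).free >= min_free_bytes


def run_preflight(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every check, print its result, and return the names of failed checks."""
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception as exc:
            # A check that crashes counts as a failure, with the reason shown.
            print(f"FAIL {name}: {exc}")
            failed.append(name)
            continue
        print(f"{'OK' if ok else 'FAIL'} {name}")
        if not ok:
            failed.append(name)
    return failed
```

Returning the list of failures rather than exiting leaves the caller free to offer the "continue anyway" option described above.
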
### 6. Database Cleanup Helper Script

**New File**: `cleanup_databases_k8s.sh`

#### Purpose:
Standalone script to manually clean all service databases before running migration generation.

#### Usage:
```bash
# Clean all databases (with confirmation)
./cleanup_databases_k8s.sh

# Clean all databases without confirmation
./cleanup_databases_k8s.sh --yes

# Clean only specific service
./cleanup_databases_k8s.sh --service auth --yes

# Use different namespace
./cleanup_databases_k8s.sh --namespace staging
```

#### Features:
- ✅ Drops all tables using schema CASCADE
- ✅ Verifies cleanup success
- ✅ Shows before/after table counts
- ✅ Can target specific services
- ✅ Requires explicit confirmation (unless --yes)
- ✅ Comprehensive summary with success/failure counts

## Recommended Workflow

### For Clean Migration Generation:

```bash
# Step 1: Clean all databases
./cleanup_databases_k8s.sh --yes

# Step 2: Generate migrations
./regenerate_migrations_k8s.sh --verbose

# Step 3: Review generated migrations
ls -lh services/*/migrations/versions/

# Step 4: Apply migrations (if testing)
./regenerate_migrations_k8s.sh --apply
```

### For Production Deployment:

1. **Local Development**:
   ```bash
   # Generate migrations with clean databases
   ./cleanup_databases_k8s.sh --yes
   ./regenerate_migrations_k8s.sh --verbose
   ```

2. **Commit Migrations**:
   ```bash
   git add services/*/migrations/versions/*.py
   git commit -m "Add initial schema migrations"
   ```

3. **Build Docker Images**:
   - Migration files are included in Docker images
   - No runtime generation needed

4. **Deploy to Production**:
   - Set `ENVIRONMENT=production` in K8s manifests
   - If migrations are missing, the deployment fails with a clear error
   - No `create_all()` fallback in production

## Environment Variables

### For DatabaseInitManager:

```yaml
# Kubernetes deployment example
env:
  - name: ENVIRONMENT
    value: "production"  # or "staging", "development", "local", "test"
```

**Behavior by Environment**:
- **development/dev/local/test**: Allows `create_all()` fallback if no migrations
- **production/staging/prod**: Requires migrations, fails without them

## Script Options

### regenerate_migrations_k8s.sh

```bash
./regenerate_migrations_k8s.sh [OPTIONS]

Options:
  --dry-run           Show what would be done without making changes
  --skip-backup       Skip backing up existing migrations
  --apply             Automatically apply migrations after generation
  --check-existing    Check for and copy existing migrations from pods first
  --verbose           Enable detailed logging
  --skip-db-check     Skip database connectivity check
  --namespace NAME    Use specific Kubernetes namespace (default: bakery-ia)
```

### cleanup_databases_k8s.sh

```bash
./cleanup_databases_k8s.sh [OPTIONS]

Options:
  --namespace NAME    Use specific Kubernetes namespace (default: bakery-ia)
  --service NAME      Clean only specific service database
  --yes               Skip confirmation prompt
```

## Troubleshooting

### Problem: Empty Migrations Generated

**Symptoms**:
```python
def upgrade() -> None:
    pass

def downgrade() -> None:
    pass
```

**Root Cause**: Tables that match the models already exist in the database, so Alembic detects no changes.

**Solution**:
```bash
# Clean database first
./cleanup_databases_k8s.sh --service <service-name> --yes

# Regenerate migrations
./regenerate_migrations_k8s.sh --verbose
```

### Problem: "Database cleanup failed"

**Symptoms**:
```
✗ Database schema reset failed
ERROR: permission denied for schema public
```

**Solution**:
Check database user permissions. In PostgreSQL a schema can only be dropped by its owner or a superuser, so granting privileges on the schema is not enough for `DROP SCHEMA`. Make the service user the owner:
```sql
-- Required for DROP SCHEMA: the user must own the schema
ALTER SCHEMA public OWNER TO <service_user>;
-- Grants only USAGE and CREATE on the schema
GRANT ALL PRIVILEGES ON SCHEMA public TO <service_user>;
```

### Problem: "No migration file found in pod"

**Symptoms**:
```
✗ No migration file found in pod
```

**Possible Causes**:
1. Alembic autogenerate failed (check logs)
2. Models not properly imported
3. Migration directory permissions

**Solution**:
```bash
# Check pod logs
kubectl logs -n bakery-ia <pod-name> -c <service>-service

# Check if models are importable
kubectl exec -n bakery-ia <pod-name> -c <service>-service -- \
  python3 -c "from app.models import *; print('OK')"
```

### Problem: kubectl cp Shows Success But File Not Copied

**Symptoms**:
```
✓ Migration file copied: file.py
# But ls shows empty directory
```

**Solution**: The script now verifies the file size and reports an empty copy:
```
✗ Migration file is empty (0 bytes)
```

If this persists, check:
1. Filesystem permissions
2. Available disk space
3. Pod container status

## Testing

### Verify Script Improvements:

```bash
# 1. Run pre-flight checks
./regenerate_migrations_k8s.sh --dry-run

# 2. Test database cleanup
./cleanup_databases_k8s.sh --service auth --yes

# 3. Verify database is empty
kubectl exec -n bakery-ia <auth-pod> -c auth-service -- \
  python3 -c "
import asyncio, os
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy import text

async def check():
    engine = create_async_engine(os.getenv('AUTH_DATABASE_URL'))
    async with engine.connect() as conn:
        result = await conn.execute(text('SELECT COUNT(*) FROM pg_tables WHERE schemaname=\\'public\\''))
        print(f'Tables: {result.scalar()}')
    await engine.dispose()

asyncio.run(check())
"

# Expected output: Tables: 0

# 4. Generate migration
./regenerate_migrations_k8s.sh --verbose

# 5. Verify migration has content
cat services/auth/migrations/versions/*.py | grep "op.create_table"
```

## Migration File Validation

### Valid Migration (Has Schema Operations):
```python
def upgrade() -> None:
    op.create_table('users',
        sa.Column('id', sa.UUID(), nullable=False),
        sa.Column('email', sa.String(255), nullable=False),
        # ...
    )
```

### Invalid Migration (Empty):
```python
def upgrade() -> None:
    pass  # ⚠ WARNING: No schema operations!
```

The script now:
- ✅ Detects empty migrations
- ✅ Shows warning with explanation
- ✅ Suggests checking database cleanup

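One way to detect an empty migration is to parse the file and inspect the body of `upgrade()`. The sketch below uses Python's standard `ast` module; it shows the idea and is not necessarily how the script performs the check. The helper name `is_empty_migration` is hypothetical, and treating a file without `upgrade()` as empty is an assumption.

```python
import ast


def is_empty_migration(source: str) -> bool:
    """Return True if upgrade() holds nothing but `pass`, comments, or a docstring."""
    tree = ast.parse(source)  # parses only; the migration code is never executed
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == "upgrade":
            for stmt in node.body:
                if isinstance(stmt, ast.Pass):
                    continue
                if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant):
                    continue  # docstring or bare literal
                return False  # found a real statement, e.g. op.create_table(...)
            return True
    # Assumption: a file without upgrade() is not a usable migration either.
    return True
```

Because comments never appear in the syntax tree, the marker comments that Alembic places around autogenerated commands do not affect the result.
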
## Summary of Changes

| Area | Before | After |
|------|--------|-------|
| **Table Drops** | Failed silently, errors hidden | Proper error handling, visible errors |
| **Database Reset** | Individual table drops (didn't work) | Full schema DROP CASCADE (guaranteed clean) |
| **File Copy** | No verification | Checks exit code, file existence, and size |
| **Error Visibility** | Errors redirected to log file | Errors shown in console immediately |
| **Production Safety** | Always allowed create_all() fallback | Fails in production without migrations |
| **Pre-flight Checks** | Basic kubectl check only | Comprehensive environment verification |
| **Database Cleanup** | Manual kubectl commands | Dedicated helper script |
| **Empty Migration Detection** | Silent generation | Clear warnings with explanation |

## Future Improvements (Not Implemented)

Potential future enhancements:
1. Parallel migration generation for faster execution
2. Migration content diffing against previous versions
3. Automatic rollback on migration generation failure
4. Integration with CI/CD pipelines
5. Migration validation against database constraints
6. Automatic schema comparison and drift detection

## Related Files

- `regenerate_migrations_k8s.sh` - Main migration generation script
- `cleanup_databases_k8s.sh` - Database cleanup helper
- `shared/database/init_manager.py` - Enhanced database initialization manager
- `services/*/migrations/versions/*.py` - Generated migration files
- `services/*/migrations/env.py` - Alembic environment configuration