Fix Alembic issue

This commit is contained in:
Urtzi Alfaro
2025-10-01 11:24:06 +02:00
parent 7cc4b957a5
commit 2eeebfc1e0
62 changed files with 6114 additions and 3676 deletions

MIGRATION_SCRIPTS_README.md Normal file

@@ -0,0 +1,451 @@
# Migration Scripts Documentation
This document describes the migration regeneration scripts and the improvements made to ensure reliable migration generation.
## Overview
The migration system consists of:
1. **Main migration generation script** (`regenerate_migrations_k8s.sh`)
2. **Database cleanup helper** (`cleanup_databases_k8s.sh`)
3. **Enhanced DatabaseInitManager** (`shared/database/init_manager.py`)
## Problem Summary
The original migration generation script had several critical issues:
### Root Cause
1. **Tables already existed in databases** - Created by K8s migration jobs using `create_all()` fallback
2. **Table drop mechanism failed silently** - Errors were hidden, script continued anyway
3. **Alembic detected no changes** - When tables matched models, empty migrations were generated
4. **File copy verification was insufficient** - `kubectl cp` reported success but files weren't copied locally
### Impact
- **11 out of 14 services** generated empty migrations (only `pass` statements)
- Only **3 services** (pos, suppliers, alert-processor) worked correctly because their DBs were clean
- No visibility into actual errors during table drops
- Migration files weren't being copied to local machine despite "success" messages
## Solutions Implemented
### 1. Fixed Script Table Drop Mechanism
**File**: `regenerate_migrations_k8s.sh`
#### Changes Made:
**Before** (Lines 404-405):
```bash
# Failed silently, errors hidden in log file
kubectl exec ... -- sh -c "DROP TABLE ..." 2>>$LOG_FILE
```
**After** (Lines 397-512):
```python
# Complete database schema reset with proper error handling
# (Python embedded in the shell script)
async with engine.begin() as conn:
    await conn.execute(text('DROP SCHEMA IF EXISTS public CASCADE'))
    await conn.execute(text('CREATE SCHEMA public'))
    await conn.execute(text('GRANT ALL ON SCHEMA public TO PUBLIC'))
```
#### Key Improvements:
- ✅ Uses `engine.begin()` instead of `engine.connect()` for proper transaction management
- ✅ Drops entire schema with CASCADE for guaranteed clean slate
- ✅ Captures and displays error output in real-time (not hidden in logs)
- ✅ Falls back to individual table drops if schema drop fails
- ✅ Verifies database is empty after cleanup
- ✅ Fails fast if cleanup fails (prevents generating empty migrations)
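The control flow described above (schema drop, per-table fallback, verification, fail fast) can be sketched in isolation. This is a minimal illustration, not the script's actual code: `reset_database`, `CleanupError`, and the three injected callables are hypothetical names, and the real database calls are left to the caller.

```python
from typing import Callable


class CleanupError(RuntimeError):
    """Raised when the database still contains tables after cleanup."""


def reset_database(
    drop_schema: Callable[[], None],
    drop_tables_individually: Callable[[], None],
    count_tables: Callable[[], int],
) -> str:
    """Try a full schema drop, fall back to per-table drops, then verify.

    Returns the name of the strategy that was used. Raises CleanupError
    if tables remain, so the caller can stop before generating migrations.
    """
    try:
        drop_schema()
        strategy = "schema"
    except Exception as exc:
        # Show the error immediately instead of hiding it in a log file.
        print(f"Schema drop failed: {exc}; falling back to per-table drops")
        drop_tables_individually()
        strategy = "tables"

    remaining = count_tables()
    if remaining != 0:
        raise CleanupError(f"{remaining} table(s) remain after cleanup")
    return strategy
```

Raising instead of returning a status makes it hard for a caller to ignore a failed cleanup, which is the failure mode that produced the empty migrations.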
### 2. Enhanced kubectl cp Verification
**File**: `regenerate_migrations_k8s.sh` (Lines 547-595)
#### Improvements:
```bash
# Verify file was actually copied
if [ "$CP_EXIT_CODE" -eq 0 ] && [ -f "path/to/file" ]; then
    LOCAL_FILE_SIZE=$(wc -c < "path/to/file" | tr -d ' ')
    if [ "$LOCAL_FILE_SIZE" -gt 0 ]; then
        echo "✓ Migration file copied: $FILENAME ($LOCAL_FILE_SIZE bytes)"
    else
        echo "✗ Migration file is empty (0 bytes)"
        # Clean up and fail
    fi
fi
```
#### Key Improvements:
- ✅ Checks exit code AND file existence
- ✅ Verifies file size > 0 bytes
- ✅ Displays actual error output from kubectl cp
- ✅ Removes empty files automatically
- ✅ Better warning messages for empty migrations
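The same three checks (exit code, file existence, non-zero size) can be expressed in Python for readers who want to reuse them outside the shell script. This is a sketch under assumptions: `verify_copied_file` is a hypothetical helper, not part of the repository, and it expects the caller to pass in the exit code of `kubectl cp`.

```python
from pathlib import Path


def verify_copied_file(exit_code: int, local_path: Path) -> bool:
    """Return True only if the copy command succeeded AND left a non-empty file."""
    if exit_code != 0:
        print(f"Copy command failed with exit code {exit_code}")
        return False
    if not local_path.is_file():
        print(f"Copy reported success but {local_path} does not exist")
        return False
    size = local_path.stat().st_size
    if size == 0:
        # Remove the empty file so it is not mistaken for a real migration.
        local_path.unlink()
        print(f"Migration file is empty (0 bytes): {local_path.name}")
        return False
    print(f"Migration file copied: {local_path.name} ({size} bytes)")
    return True
```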
### 3. Enhanced Error Visibility
#### Changes Throughout Script:
- ✅ All Python error output captured and displayed: `2>&1` instead of `2>>$LOG_FILE`
- ✅ Error messages shown in console immediately
- ✅ Detailed failure reasons in summary
- ✅ Exit codes checked for all critical operations
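The effect of `2>&1` plus an exit-code check can be illustrated with Python's `subprocess` module. This is only an analogy to what the shell script does; `run_visible` is a hypothetical name introduced for the example.

```python
import subprocess
import sys
from typing import List, Tuple


def run_visible(cmd: List[str]) -> Tuple[int, str]:
    """Run a command with stderr merged into stdout and report failures at once."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same idea as 2>&1 in the shell
        text=True,
    )
    if proc.returncode != 0:
        # Print the captured output immediately rather than burying it in a log.
        print(f"Command failed (exit {proc.returncode}):")
        print(proc.stdout)
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # A child process that writes to stderr and exits with a non-zero code.
    child = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]
    run_visible(child)
```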
### 4. Modified DatabaseInitManager
**File**: `shared/database/init_manager.py`
#### New Features:
**Environment-Aware Fallback Control**:
```python
import os
from typing import Optional

def __init__(
    self,
    # ... existing params
    allow_create_all_fallback: bool = True,
    environment: Optional[str] = None
):
    # Explicit argument wins; otherwise read ENVIRONMENT, defaulting to development
    self.environment = environment or os.getenv('ENVIRONMENT', 'development')
    self.allow_create_all_fallback = allow_create_all_fallback
```
**Production Protection** (Lines 74-93):
```python
elif not db_state["has_migrations"]:
    if self.allow_create_all_fallback:
        # Development mode: use create_all()
        self.logger.warning("No migrations found - using create_all() as fallback")
        result = await self._handle_no_migrations()
    else:
        # Production mode: FAIL instead of using create_all()
        error_msg = (
            f"No migration files found for {self.service_name} and "
            f"create_all() fallback is disabled (environment: {self.environment}). "
            f"Migration files must be generated before deployment."
        )
        raise Exception(error_msg)
```
#### Key Improvements:
- ✅ **Auto-detects environment** from `ENVIRONMENT` env var
- ✅ **Disables `create_all()` in production** - Forces proper migrations
- ✅ **Allows fallback in dev/local/test** - Maintains developer convenience
- ✅ **Clear error messages** when migrations are missing
- ✅ **Backwards compatible** - Default behavior unchanged
#### Environment Detection:
| Environment Value | Fallback Allowed? | Behavior |
|-------------------|-------------------|----------|
| `development`, `dev`, `local`, `test` | ✅ Yes | Uses `create_all()` if no migrations |
| `staging`, `production`, `prod` | ❌ No | Fails with clear error message |
| Not set (default: `development`) | ✅ Yes | Uses `create_all()` if no migrations |
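The table can be read as a single lookup. The sketch below is an illustration, not the code in `init_manager.py`: `fallback_allowed` and `FALLBACK_ENVIRONMENTS` are hypothetical names, and two behaviors are assumptions that the table does not specify (matching is case-insensitive, and unrecognized values are treated like production).

```python
import os
from typing import Optional

# Values for which the table allows the create_all() fallback.
FALLBACK_ENVIRONMENTS = {"development", "dev", "local", "test"}


def fallback_allowed(environment: Optional[str] = None) -> bool:
    """Decide whether create_all() may be used when no migrations exist.

    Assumption: any value outside FALLBACK_ENVIRONMENTS (including unknown
    ones) is treated as strict, i.e. migrations are required.
    """
    env = environment or os.getenv("ENVIRONMENT", "development")
    return env.strip().lower() in FALLBACK_ENVIRONMENTS
```

Treating unknown values as strict is the conservative choice: a typo in a manifest then fails loudly instead of silently re-enabling `create_all()`.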
### 5. Pre-flight Checks
**File**: `regenerate_migrations_k8s.sh` (Lines 75-187)
#### New Pre-flight Check System:
```bash
preflight_checks() {
    # Check kubectl installation and version
    # Check Kubernetes cluster connectivity
    # Check namespace exists
    # Check service pods are running
    # Check database drivers available
    # Check local directory structure
    # Check disk space
}
```
#### Verifications:
- ✅ kubectl installation and version
- ✅ Kubernetes cluster connectivity
- ✅ Namespace exists
- ✅ Service pods running (shows count: X/14)
- ✅ Database drivers (asyncpg) available
- ✅ Local migration directories exist
- ✅ Sufficient disk space
- ✅ Option to continue even if checks fail
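The local part of these checks (tool on `PATH`, directory present, free disk space) can be sketched with the standard library alone. This is an illustration under assumptions: `preflight_checks` here is a hypothetical Python helper, the 100 MB threshold is invented for the example, and the cluster-side checks (connectivity, namespace, pods, drivers) are deliberately left out because they require a live cluster.

```python
import os
import shutil
from typing import List, Tuple


def preflight_checks(migrations_dir: str = ".", min_free_mb: int = 100) -> List[Tuple[str, bool]]:
    """Run local pre-flight checks and return (check name, passed) pairs."""
    results: List[Tuple[str, bool]] = []

    # kubectl must be on PATH before any cluster check can run.
    results.append(("kubectl installed", shutil.which("kubectl") is not None))

    dir_ok = os.path.isdir(migrations_dir)
    results.append(("migration directory exists", dir_ok))

    # Measure free space on the target directory, or the cwd if it is missing.
    probe = migrations_dir if dir_ok else "."
    free_mb = shutil.disk_usage(probe).free // (1024 * 1024)
    results.append(("sufficient disk space", free_mb >= min_free_mb))

    return results


if __name__ == "__main__":
    for name, ok in preflight_checks():
        print(f"[{'ok' if ok else 'FAIL'}] {name}")
```

Returning all results instead of stopping at the first failure matches the option to continue even if some checks fail.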
### 6. Database Cleanup Helper Script
**New File**: `cleanup_databases_k8s.sh`
#### Purpose:
Standalone script to manually clean all service databases before running migration generation.
#### Usage:
```bash
# Clean all databases (with confirmation)
./cleanup_databases_k8s.sh
# Clean all databases without confirmation
./cleanup_databases_k8s.sh --yes
# Clean only specific service
./cleanup_databases_k8s.sh --service auth --yes
# Use different namespace
./cleanup_databases_k8s.sh --namespace staging
```
#### Features:
- ✅ Drops all tables using schema CASCADE
- ✅ Verifies cleanup success
- ✅ Shows before/after table counts
- ✅ Can target specific services
- ✅ Requires explicit confirmation (unless --yes)
- ✅ Comprehensive summary with success/failure counts
## Recommended Workflow
### For Clean Migration Generation:
```bash
# Step 1: Clean all databases
./cleanup_databases_k8s.sh --yes
# Step 2: Generate migrations
./regenerate_migrations_k8s.sh --verbose
# Step 3: Review generated migrations
ls -lh services/*/migrations/versions/
# Step 4: Apply migrations (if testing)
./regenerate_migrations_k8s.sh --apply
```
### For Production Deployment:
1. **Local Development**:
```bash
# Generate migrations with clean databases
./cleanup_databases_k8s.sh --yes
./regenerate_migrations_k8s.sh --verbose
```
2. **Commit Migrations**:
```bash
git add services/*/migrations/versions/*.py
git commit -m "Add initial schema migrations"
```
3. **Build Docker Images**:
- Migration files are included in Docker images
- No runtime generation needed
4. **Deploy to Production**:
- Set `ENVIRONMENT=production` in K8s manifests
- If migrations missing → Deployment will fail with clear error
- No `create_all()` fallback in production
## Environment Variables
### For DatabaseInitManager:
```yaml
# Kubernetes deployment example
env:
  - name: ENVIRONMENT
    value: "production"  # or "staging", "development", "local", "test"
```
**Behavior by Environment**:
- **development/dev/local/test**: Allows `create_all()` fallback if no migrations
- **production/staging/prod**: Requires migrations, fails without them
## Script Options
### regenerate_migrations_k8s.sh
```bash
./regenerate_migrations_k8s.sh [OPTIONS]

Options:
  --dry-run          Show what would be done without making changes
  --skip-backup      Skip backing up existing migrations
  --apply            Automatically apply migrations after generation
  --check-existing   Check for and copy existing migrations from pods first
  --verbose          Enable detailed logging
  --skip-db-check    Skip database connectivity check
  --namespace NAME   Use specific Kubernetes namespace (default: bakery-ia)
```
### cleanup_databases_k8s.sh
```bash
./cleanup_databases_k8s.sh [OPTIONS]

Options:
  --namespace NAME   Use specific Kubernetes namespace (default: bakery-ia)
  --service NAME     Clean only specific service database
  --yes              Skip confirmation prompt
```
## Troubleshooting
### Problem: Empty Migrations Generated
**Symptoms**:
```python
def upgrade() -> None:
    pass

def downgrade() -> None:
    pass
```
**Root Cause**: Tables matching the models already exist in the database, so Alembic autogenerate detects no changes
**Solution**:
```bash
# Clean database first
./cleanup_databases_k8s.sh --service <service-name> --yes
# Regenerate migrations
./regenerate_migrations_k8s.sh --verbose
```
### Problem: "Database cleanup failed"
**Symptoms**:
```
✗ Database schema reset failed
ERROR: permission denied for schema public
```
**Solution**:
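Check database user permissions. PostgreSQL has no separate `DROP SCHEMA` privilege: only the schema owner (or a superuser) can drop a schema, so granting privileges on `public` is not sufficient. Recreating the schema afterwards also requires the `CREATE` privilege on the database. Make the service user the owner of the schema (run this as a superuser or as the current owner):
```sql
ALTER SCHEMA public OWNER TO <service_user>;
```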
### Problem: "No migration file found in pod"
**Symptoms**:
```
✗ No migration file found in pod
```
**Possible Causes**:
1. Alembic autogenerate failed (check logs)
2. Models not properly imported
3. Migration directory permissions
**Solution**:
```bash
# Check pod logs
kubectl logs -n bakery-ia <pod-name> -c <service>-service
# Check if models are importable
kubectl exec -n bakery-ia <pod-name> -c <service>-service -- \
python3 -c "from app.models import *; print('OK')"
```
### Problem: kubectl cp Shows Success But File Not Copied
**Symptoms**:
```
✓ Migration file copied: file.py
# But ls shows empty directory
```
**Solution**: The script now verifies that the copied file exists and is larger than 0 bytes. For an empty file it reports:
```
✗ Migration file is empty (0 bytes)
```
If this persists, check:
1. Filesystem permissions
2. Available disk space
3. Pod container status
## Testing
### Verify Script Improvements:
```bash
# 1. Run pre-flight checks
./regenerate_migrations_k8s.sh --dry-run
# 2. Test database cleanup
./cleanup_databases_k8s.sh --service auth --yes
# 3. Verify database is empty
kubectl exec -n bakery-ia <auth-pod> -c auth-service -- \
    python3 -c "
import asyncio, os
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy import text

async def check():
    engine = create_async_engine(os.getenv('AUTH_DATABASE_URL'))
    async with engine.connect() as conn:
        result = await conn.execute(text('SELECT COUNT(*) FROM pg_tables WHERE schemaname=\\'public\\''))
        print(f'Tables: {result.scalar()}')
    await engine.dispose()

asyncio.run(check())
"
# Expected output: Tables: 0
# 4. Generate migration
./regenerate_migrations_k8s.sh --verbose
# 5. Verify migration has content
cat services/auth/migrations/versions/*.py | grep "op.create_table"
```
## Migration File Validation
### Valid Migration (Has Schema Operations):
```python
def upgrade() -> None:
    op.create_table('users',
        sa.Column('id', sa.UUID(), nullable=False),
        sa.Column('email', sa.String(255), nullable=False),
        # ...
    )
```
### Invalid Migration (Empty):
```python
def upgrade() -> None:
    pass  # ⚠ WARNING: No schema operations!
```
The script now:
- ✅ Detects empty migrations
- ✅ Shows warning with explanation
- ✅ Suggests checking database cleanup
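One way to detect an empty migration is to parse the file and inspect the body of `upgrade()`. The sketch below uses Python's `ast` module. It is not necessarily how the script performs its check (this document does not say), and `is_empty_migration` is a hypothetical helper name.

```python
import ast


def is_empty_migration(source: str) -> bool:
    """Return True if upgrade() contains nothing but `pass` and/or a docstring.

    Comments (such as Alembic's auto-generated markers) are ignored because
    they never reach the syntax tree. A file without upgrade() counts as empty.
    """
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == "upgrade":
            meaningful = [
                stmt
                for stmt in node.body
                if not isinstance(stmt, ast.Pass)
                and not (
                    isinstance(stmt, ast.Expr)
                    and isinstance(stmt.value, ast.Constant)
                    and isinstance(stmt.value.value, str)
                )
            ]
            return not meaningful
    return True
```

Parsing is more robust than searching for `op.create_table`, since a valid migration may contain only `op.add_column` or `op.create_index` calls.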
## Summary of Changes
| Area | Before | After |
|------|--------|-------|
| **Table Drops** | Failed silently, errors hidden | Proper error handling, visible errors |
| **Database Reset** | Individual table drops (didn't work) | Full schema DROP CASCADE (guaranteed clean) |
| **File Copy** | No verification | Checks exit code, file existence, and size |
| **Error Visibility** | Errors redirected to log file | Errors shown in console immediately |
| **Production Safety** | Always allowed create_all() fallback | Fails in production without migrations |
| **Pre-flight Checks** | Basic kubectl check only | Comprehensive environment verification |
| **Database Cleanup** | Manual kubectl commands | Dedicated helper script |
| **Empty Migration Detection** | Silent generation | Clear warnings with explanation |
## Future Improvements (Not Implemented)
Potential future enhancements:
1. Parallel migration generation for faster execution
2. Migration content diffing against previous versions
3. Automatic rollback on migration generation failure
4. Integration with CI/CD pipelines
5. Migration validation against database constraints
6. Automatic schema comparison and drift detection
## Related Files
- `regenerate_migrations_k8s.sh` - Main migration generation script
- `cleanup_databases_k8s.sh` - Database cleanup helper
- `shared/database/init_manager.py` - Enhanced database initialization manager
- `services/*/migrations/versions/*.py` - Generated migration files
- `services/*/migrations/env.py` - Alembic environment configuration