bakery-ia/ARCHITECTURE_QUICK_REFERENCE.md

2025-10-01 12:17:59 +02:00
# Service Initialization - Quick Reference
## The Problem You Identified
**Question**: "We have a migration job that runs Alembic migrations. Why should we also run migrations in the service init process?"
**Answer**: **You shouldn't!** This is architectural redundancy that should be fixed.
## Current State (Redundant ❌)
```
┌───────────────────────────────────────────┐
│  Kubernetes Deployment Starts             │
└───────────────────────────────────────────┘
                      ▼
┌───────────────────────────────────────────┐
│  1. Migration Job Runs                    │
│     - Command: run_migrations.py          │
│     - Calls: initialize_service_database  │
│     - Runs: alembic upgrade head          │
│     - Status: Complete ✓                  │
└───────────────────────────────────────────┘
                      ▼
┌───────────────────────────────────────────┐
│  2. Service Pod Starts                    │
│     - Startup: _handle_database_tables()  │
│     - Calls: initialize_service_database  │ ← REDUNDANT!
│     - Runs: alembic upgrade head          │ ← REDUNDANT!
│     - Status: Complete ✓                  │
└───────────────────────────────────────────┘
                      ▼
           Service Ready (Slower)
```
**Problems**:
- ❌ Same code runs twice
- ❌ Slower startup per pod (~3-5 s instead of ~1-2 s)
- ❌ Confusion: who is responsible for migrations?
- ❌ Race conditions possible with multiple replicas
## Recommended State (Efficient ✅)
```
┌───────────────────────────────────────────┐
│  Kubernetes Deployment Starts             │
└───────────────────────────────────────────┘
                      ▼
┌───────────────────────────────────────────┐
│  1. Migration Job Runs                    │
│     - Command: run_migrations.py          │
│     - Runs: alembic upgrade head          │
│     - Status: Complete ✓                  │
└───────────────────────────────────────────┘
                      ▼
┌───────────────────────────────────────────┐
│  2. Service Pod Starts                    │
│     - Startup: _verify_database_ready()   │ ← VERIFY ONLY!
│     - Checks: Tables exist? ✓             │
│     - Checks: Alembic version? ✓          │
│     - NO migration execution              │
└───────────────────────────────────────────┘
                      ▼
           Service Ready (Faster!)
```
**Benefits**:
- ✅ Clear separation of concerns
- ✅ Faster service startup (~3-5 s → ~1-2 s per pod)
- ✅ No migration race conditions between replicas
- ✅ Easier debugging
## Implementation (3 Simple Changes)
### 1. Add to `shared/database/init_manager.py`
```python
from typing import Any, Dict


class DatabaseInitManager:
    def __init__(
        self,
        # ... existing params
        verify_only: bool = False,  # ← ADD THIS
    ):
        self.verify_only = verify_only

    async def initialize_database(self) -> Dict[str, Any]:
        if self.verify_only:
            # Only check that the DB is ready; never run migrations
            return await self._verify_database_state()
        # Existing full initialization (migrations, create_all fallback)
        # ...
```
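`_verify_database_state()` is referenced above but not shown. Below is a minimal sketch of what it could check, assuming two conditions define "ready": every required table exists, and the revision stored in Alembic's `alembic_version` table matches the head revision the service was built against. It uses the standard-library `sqlite3` module so it runs standalone, whereas the real service talks to PostgreSQL through its async database manager. The function name `verify_database_state`, its parameters, and `example_table` are illustrative assumptions, not existing project code.

```python
import sqlite3
from typing import Any, Dict, Iterable, Optional


def verify_database_state(
    conn: sqlite3.Connection,
    required_tables: Iterable[str],
    expected_head: str,
) -> Dict[str, Any]:
    """Verify-only readiness check: inspects the schema, never migrates."""
    # List existing tables (SQLite catalog; on PostgreSQL you would query
    # information_schema.tables instead).
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    existing = {name for (name,) in rows.fetchall()}
    missing = sorted(set(required_tables) - existing)

    # Alembic records the applied revision in alembic_version.version_num.
    # Assumes a single migration head (one row).
    current: Optional[str] = None
    if "alembic_version" in existing:
        row = conn.execute("SELECT version_num FROM alembic_version").fetchone()
        current = row[0] if row else None

    return {
        "ready": not missing and current == expected_head,
        "missing_tables": missing,
        "current_revision": current,
        "expected_revision": expected_head,
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL)")
    conn.execute("INSERT INTO alembic_version VALUES ('abc123')")
    conn.execute("CREATE TABLE example_table (id INTEGER PRIMARY KEY)")
    print(verify_database_state(conn, ["example_table"], expected_head="abc123"))
```

Returning a dict rather than raising leaves the failure policy to the caller; with `SKIP_MIGRATIONS=true`, a result with `ready: False` should be treated as fatal.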
### 2. Update `shared/service_base.py`
```python
import os


async def _handle_database_tables(self):  # method of the shared service base class
    # SKIP_MIGRATIONS=true → verify the schema only; the migration Job owns upgrades
    skip_migrations = os.getenv("SKIP_MIGRATIONS", "false").lower() == "true"
    result = await initialize_service_database(
        database_manager=self.database_manager,
        service_name=self.service_name,
        verify_only=skip_migrations,  # ← ADD THIS PARAMETER
    )
```
### 3. Add to Kubernetes Deployments
```yaml
containers:
  - name: external-service
    env:
      - name: SKIP_MIGRATIONS  # ← ADD THIS
        value: "true"          # Service only verifies, doesn't run migrations
      - name: ENVIRONMENT
        value: "production"    # Disable create_all fallback
```
## Quick Decision Matrix
| Environment | SKIP_MIGRATIONS | ENVIRONMENT | Behavior |
|-------------|-----------------|-------------|----------|
| **Development** | `false` | `development` | Run migrations on startup, allow create_all |
| **Staging** | `true` | `staging` | Verify only, fail if not ready |
| **Production** | `true` | `production` | Verify only, fail if not ready |
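The decision matrix can be expressed as a small pure function, which makes it easy to unit-test. This is a sketch: `StartupBehavior` and `resolve_startup_behavior` are hypothetical names, and treating a missing `ENVIRONMENT` as "no create_all" is an assumption, not documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class StartupBehavior:
    run_migrations: bool    # False means verify only and fail if the DB is not ready
    allow_create_all: bool  # create_all() fallback, development only


def resolve_startup_behavior(env: Mapping[str, str]) -> StartupBehavior:
    """Map SKIP_MIGRATIONS and ENVIRONMENT to the behavior in the decision matrix."""
    # Same parsing rule as _handle_database_tables(): only "true" (any case) skips.
    skip = env.get("SKIP_MIGRATIONS", "false").lower() == "true"
    # Assumption: a missing ENVIRONMENT is treated as non-development.
    environment = env.get("ENVIRONMENT", "").lower()
    return StartupBehavior(
        run_migrations=not skip,
        # create_all is only a fallback for pods allowed to change the schema.
        allow_create_all=not skip and environment == "development",
    )


if __name__ == "__main__":
    print(resolve_startup_behavior(os.environ))
```

Leaving `SKIP_MIGRATIONS` unset keeps `run_migrations=True`, which matches the backwards-compatibility answer in the FAQ.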
## What Each Component Does
### Migration Job (runs once on deployment)
```
✓ Creates tables (if first deployment)
✓ Runs pending migrations
✓ Updates alembic_version
✗ Does NOT start service
```
### Service Startup (runs on every pod)
**With SKIP_MIGRATIONS=false** (current):
```
✓ Checks database connection
✓ Checks for migrations
✓ Runs alembic upgrade head ← REDUNDANT
✓ Starts service
Time: ~3-5 seconds
```
**With SKIP_MIGRATIONS=true** (recommended):
```
✓ Checks database connection
✓ Verifies tables exist
✓ Verifies alembic_version exists
✗ Does NOT run migrations
✓ Starts service
Time: ~1-2 seconds ← 50-60% FASTER
```
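The two startup paths above, with fail-fast behavior in verify-only mode, can be sketched as one coroutine. The callables `check_connection`, `verify_schema`, and `run_migrations` stand in for the service's real database code and, like `DatabaseNotReadyError`, are hypothetical names. Raising makes the process exit, so Kubernetes reports the pod as failing instead of letting it serve against an unmigrated schema.

```python
import asyncio
from typing import Awaitable, Callable


class DatabaseNotReadyError(RuntimeError):
    """Raised when verify-only startup finds an unmigrated database."""


async def start_service(
    verify_only: bool,
    check_connection: Callable[[], Awaitable[None]],
    verify_schema: Callable[[], Awaitable[bool]],
    run_migrations: Callable[[], Awaitable[None]],
) -> None:
    # Both paths: fail early if the database is unreachable.
    await check_connection()
    if verify_only:
        # SKIP_MIGRATIONS=true: the migration Job owns the schema; only verify it.
        if not await verify_schema():
            raise DatabaseNotReadyError(
                "database not ready: check the migration Job logs"
            )
    else:
        # SKIP_MIGRATIONS=false: legacy path that repeats the Job's work.
        await run_migrations()
    # ... start serving requests here


if __name__ == "__main__":
    async def _connected() -> None:
        pass

    async def _ready() -> bool:
        return True

    async def _migrate() -> None:
        raise AssertionError("must not run in verify-only mode")

    asyncio.run(start_service(True, _connected, _ready, _migrate))
    print("verify-only startup OK")
```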
## Testing the Change
### Before (Current Behavior):
```bash
# Check service logs
kubectl logs -n bakery-ia deployment/external-service | grep -i migration
# Output shows:
#   [info] Running pending migrations service=external
#   INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
#   [info] Migrations applied successfully service=external
```
### After (With SKIP_MIGRATIONS=true):
```bash
# Check service logs (the verification message does not contain "migration",
# so match "database" as well)
kubectl logs -n bakery-ia deployment/external-service | grep -iE 'migration|database'
# Output shows:
#   [info] Migration skip enabled - verifying database only
#   [info] Database verified successfully
```
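To check many pods or automate this before/after comparison, a small helper can classify a pod's log text using the messages shown above. The marker strings are copied from the example output and are assumptions about the real log format; `classify_startup` is a hypothetical name.

```python
from typing import Tuple

# Marker strings copied from the example log output above (assumed format).
MIGRATION_MARKERS: Tuple[str, ...] = (
    "Running pending migrations",
    "Migrations applied successfully",
)
VERIFY_MARKERS: Tuple[str, ...] = (
    "Migration skip enabled",
    "Database verified successfully",
)


def classify_startup(log_text: str) -> str:
    """Return 'migrations-ran', 'verify-only', or 'unknown' for one pod's logs."""
    # Check migrations first: if they ran, SKIP_MIGRATIONS is not in effect.
    if any(marker in log_text for marker in MIGRATION_MARKERS):
        return "migrations-ran"
    if any(marker in log_text for marker in VERIFY_MARKERS):
        return "verify-only"
    return "unknown"
```

For example, wrap it in a script that prints `classify_startup(sys.stdin.read())` and pipe the `kubectl logs` output into it; any pod reporting `migrations-ran` is still running the redundant migration step.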
## Rollout Strategy
### Step 1: Development (Test)
```bash
# In local development, test the change:
export SKIP_MIGRATIONS=true
# Apply migrations first (e.g. alembic upgrade head); verify-only mode fails on an unmigrated DB
# Start service - should verify DB and start fast
```
### Step 2: Staging (Validate)
```yaml
# Update staging manifests
env:
  - name: SKIP_MIGRATIONS
    value: "true"
  - name: ENVIRONMENT
    value: "staging"
```
### Step 3: Production (Deploy)
```yaml
# Update production manifests
env:
  - name: SKIP_MIGRATIONS
    value: "true"
  - name: ENVIRONMENT
    value: "production"
```
## Expected Results
### Performance:
- 📊 **Service startup**: 3-5s → 1-2s (50-60% faster)
- 📊 **Horizontal scaling**: Faster (new replicas skip the migration step)
- 📊 **Database load**: Reduced (no redundant migration queries)
### Reliability:
- 🛡️ **No race conditions**: Only job handles migrations
- 🛡️ **Clear errors**: "DB not ready" vs "migration failed"
- 🛡️ **Fail-fast**: Services won't start if DB not initialized
### Maintainability:
- 📝 **Clear logs**: Migration job logs separate from service logs
- 📝 **Easier debugging**: Check job for migration issues
- 📝 **Clean architecture**: Operations separated from application
## FAQs
**Q: What if migrations fail in the job?**
A: Service pods won't start (they'll fail verification), which is correct behavior.
**Q: What about development where I want fast iteration?**
A: Keep `SKIP_MIGRATIONS=false` in development. Services will still run migrations.
**Q: Is this backwards compatible?**
A: Yes! Default behavior is unchanged. SKIP_MIGRATIONS only activates when explicitly set.
**Q: What about database schema drift?**
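A: Services check `alembic_version` on startup. To catch drift, the check should compare the recorded revision with the head revision shipped in the service image and fail startup on a mismatch. This detects version mismatches, not manual schema edits made outside Alembic.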
**Q: Can I still use create_all() in development?**
A: Yes! Set `ENVIRONMENT=development` and `SKIP_MIGRATIONS=false`.
## Summary
**Your Question**: Why run migrations in both job and service?
**Answer**: You shouldn't! This is redundant architecture.
**Solution**: Add `SKIP_MIGRATIONS=true` to service deployments.
**Result**: Faster, clearer, more reliable service initialization.
**See Full Details**: `SERVICE_INITIALIZATION_ARCHITECTURE.md`