Add improvements 2

2026-01-12 22:15:11 +01:00
parent 230bbe6a19
commit b931a5c45e
40 changed files with 1820 additions and 887 deletions
--- a/DOCKER_MAINTENANCE.md
+++ b/DOCKER_MAINTENANCE.md
@@ -0,0 +1,120 @@
+# Docker Maintenance Guide for Local Development
+
+## The Problem
+
+When developing with Tilt and local Kubernetes (Kind), Docker accumulates:
+- **Multiple image versions** from each code change (Tilt rebuilds)
+- **Unused volumes** from previous cluster runs
+- **Build cache** that grows over time
+
+This quickly fills up disk space, causing pods to fail with "No space left on device" errors.
+
+## Quick Fix (When You Hit Disk Issues)
+
+```bash
+# Clean up all unused Docker resources
+docker system prune -a --volumes -f
+```
+
+This removes:
+- All stopped containers and unused networks
+- All unused images and all build cache
+- All unused volumes (anonymous volumes only on Docker 23.0 and later)
+
+**Expected recovery**: typically 60-100GB on this setup (the 2026-01-12 cleanup freed 89GB)
+
+## Regular Maintenance
+
+### Option 1: Use the Cleanup Script (Recommended)
+
+Run the maintenance script weekly:
+
+```bash
+./scripts/cleanup-docker.sh
+```
+
+Or run it automatically without confirmation:
+
+```bash
+./scripts/cleanup-docker.sh --auto
+```
+
+### Option 2: Manual Commands
+
+```bash
+# Remove unused images created more than 24 hours ago
+docker image prune -af --filter "until=24h"
+
+# Remove unused volumes (Docker 23.0+: anonymous only; add -a to include named volumes)
+docker volume prune -f
+
+# Remove build cache
+docker builder prune -af
+```
+
+### Option 3: Set Up Automated Cleanup
+
+Add the following entry to your crontab (it runs every Sunday at 2 AM):
+
+```bash
+crontab -e
+# Add this line (adjust the path to your local checkout):
+0 2 * * 0 /Users/urtzialfaro/Documents/bakery-ia/scripts/cleanup-docker.sh --auto >> /tmp/docker-cleanup.log 2>&1
+```
+
+## Monitoring Disk Usage
+
+### Check Docker disk usage:
+```bash
+docker system df
+```
+
+### Check Kind node disk usage:
+```bash
+docker exec bakery-ia-local-control-plane df -h /var
+```
+
+### Alert thresholds:
+- **< 70%**: Healthy ✅
+- **70-85%**: Consider cleanup soon ⚠️
+- **85-95%**: Run cleanup immediately 🚨
+- **> 95%**: Critical - pods will fail ❌
+
+## Prevention Tips
+
+1. **Run cleanup weekly** to prevent accumulation
+2. **Monitor disk usage** before long dev sessions
+3. **Delete old Kind clusters** when switching projects:
+   ```bash
+   kind delete cluster --name bakery-ia-local
+   ```
+4. **Increase Docker disk allocation** in Docker Desktop settings if you frequently rebuild many services
+
+## Troubleshooting
+
+### Pods in CrashLoopBackOff after disk issues:
+
+1. Run cleanup (see Quick Fix above)
+2. Restart failed pods:
+   ```bash
+   kubectl get pods -n bakery-ia | grep -E "(CrashLoopBackOff|Error)" | awk '{print $1}' | xargs kubectl delete pod -n bakery-ia
+   ```
+
+### Cleanup didn't free enough space:
+
+If disk usage is still above 90% after cleanup:
+
+```bash
+# Nuclear option - rebuild everything
+kind delete cluster --name bakery-ia-local
+docker system prune -a --volumes -f
+# Then recreate cluster with your setup scripts
+```
+
+## What Happened Today (2026-01-12)
+
+- **Issue**: Disk was 100% full (113GB/113GB), causing database pods to crash
+- **Root cause**: 122 unused Docker images + 16 unused volumes + 6GB build cache
+- **Solution**: Ran `docker system prune -a --volumes -f`
+- **Result**: Freed 89GB, disk now at 22% usage (24GB/113GB)
+- **Outcome**: All services recovered successfully