bakery-ia/DOCKER_MAINTENANCE.md

# Docker Maintenance Guide for Local Development

## The Problem

When developing with Tilt and local Kubernetes (Kind), Docker accumulates:
- **Multiple image versions** from each code change (Tilt rebuilds)
- **Unused volumes** from previous cluster runs
- **Build cache** that grows over time

This quickly fills up disk space, causing pods to fail with "No space left on device" errors.

## Quick Fix (When You Hit Disk Issues)

```bash
# Clean up all unused Docker resources
docker system prune -a --volumes -f
```

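This removes:
- All stopped containers
- All unused networks
- All unused images (not only dangling ones, because of `-a`)
- Unused volumes (on Docker 23.0 and later, `--volumes` covers anonymous volumes only)
- All build cache

**Expected recovery**: 60-100GB when the disk is nearly full (the 2026-01-12 incident below freed 89GB)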
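
To see how much a prune actually reclaims on your machine, one option is to wrap the command between two `docker system df` reports. This is only a minimal sketch: the function name `prune_with_report` is hypothetical and is not part of the repository's scripts.

```bash
# Sketch: report Docker disk usage before and after a full prune.
# prune_with_report is a hypothetical helper name, not an existing script.
prune_with_report() {
  echo "== Before =="
  docker system df || return 1

  # Same command as the Quick Fix; this deletes data irreversibly.
  docker system prune -a --volumes -f || return 1

  echo "== After =="
  docker system df
}

# Usage (uncomment to run):
# prune_with_report
```

Comparing the `RECLAIMABLE` column of the first report with the space actually freed gives a rough idea of how much a weekly cleanup is worth on your setup.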

## Regular Maintenance

### Option 1: Use the Cleanup Script (Recommended)

Run the maintenance script weekly:

```bash
./scripts/cleanup-docker.sh
```

Or run it automatically without confirmation:

```bash
./scripts/cleanup-docker.sh --auto
```

### Option 2: Manual Commands

```bash
# Remove unused images created more than 24 hours ago
docker image prune -af --filter "until=24h"

# Remove unused volumes (anonymous volumes only on Docker 23.0+; add -a to include named ones)
docker volume prune -f

# Remove build cache
docker builder prune -af
```
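
If you want to check what the manual steps would do before deleting anything, the three commands can be wrapped in a function with a dry-run switch. The sketch below is an assumption-laden illustration: `staged_cleanup`, `run_step`, and the `DRY_RUN` variable are hypothetical names, and the age filter defaults to the `24h` used above.

```bash
# Sketch: run the three prune steps in order, with an optional dry run.
# staged_cleanup, run_step and DRY_RUN are hypothetical names.

# Print the command; execute it only when DRY_RUN is not set to 1.
run_step() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# $1: age passed to the image "until=" filter (default: 24h).
staged_cleanup() {
  run_step docker image prune -af --filter "until=${1:-24h}" &&
  run_step docker volume prune -f &&
  run_step docker builder prune -af
}

# Usage:
# DRY_RUN=1 staged_cleanup 72h   # only print the commands
# staged_cleanup                 # actually prune, images older than 24h
```

Chaining with `&&` stops at the first failing step, so a Docker daemon that is not running produces one error instead of three.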

### Option 3: Set Up Automated Cleanup

Add to your crontab (run every Sunday at 2 AM):

```bash
crontab -e
# Add this line (replace the path with your own checkout location):
0 2 * * 0 /Users/urtzialfaro/Documents/bakery-ia/scripts/cleanup-docker.sh --auto >> /tmp/docker-cleanup.log 2>&1
```

## Monitoring Disk Usage

### Check Docker disk usage:
```bash
docker system df
```

### Check Kind node disk usage:
```bash
docker exec bakery-ia-local-control-plane df -h /var
```
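
For scripting, the usage percentage can be pulled out of the `df` output as a bare number. The following is a minimal sketch, assuming the node's `df` accepts the POSIX `-P` flag (one header line, one data line per filesystem); `usage_percent` is a hypothetical helper name.

```bash
# Sketch: extract the capacity value (as a bare integer) from `df -P` output.
# usage_percent is a hypothetical helper name; it reads df output on stdin.
usage_percent() {
  # Line 1 is the header; on line 2, column 5 is the capacity, e.g. "87%".
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage (node name taken from the command above):
# docker exec bakery-ia-local-control-plane df -P /var | usage_percent
```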

### Alert thresholds (`Use%` reported by `df`):
- **< 70%**: Healthy ✅
- **70-85%**: Consider cleanup soon ⚠️
- **> 85%**: Run cleanup immediately 🚨
- **> 95%**: Critical - pods will fail ❌
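
The thresholds above can be encoded in a small function so that a script or shell prompt can react to them. This is a sketch only: `disk_status` and the status labels it prints are hypothetical names chosen for illustration.

```bash
# Sketch: map a disk usage percentage (integer) to the alert levels above.
# disk_status and its output labels are hypothetical.
disk_status() {
  if [ "$1" -gt 95 ]; then
    echo "critical"        # > 95%: pods will fail
  elif [ "$1" -gt 85 ]; then
    echo "cleanup-now"     # > 85%: run cleanup immediately
  elif [ "$1" -ge 70 ]; then
    echo "cleanup-soon"    # 70-85%: consider cleanup soon
  else
    echo "healthy"         # < 70%
  fi
}

# Usage:
# disk_status 88
```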

## Prevention Tips

1. **Run cleanup weekly** to prevent accumulation
2. **Monitor disk usage** before long dev sessions
3. **Delete old Kind clusters** when switching projects:
   ```bash
   kind delete cluster --name bakery-ia-local
   ```
4. **Increase Docker disk allocation** in Docker Desktop settings if you frequently rebuild many services

## Troubleshooting

### Pods in CrashLoopBackOff after disk issues:

1. Run cleanup (see Quick Fix above)
2. Restart failed pods:
   ```bash
   kubectl get pods -n bakery-ia | grep -E "(CrashLoopBackOff|Error)" | awk '{print $1}' | xargs kubectl delete pod -n bakery-ia
   ```
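
A variant of the restart step that matches only on the `STATUS` column and does nothing when no pod has failed could look like the sketch below. `failed_pods` is a hypothetical helper name; it assumes the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...).

```bash
# Sketch: print the names of pods whose STATUS contains
# CrashLoopBackOff or Error.
# failed_pods is a hypothetical helper name; it reads the output of
# `kubectl get pods --no-headers` on stdin.
failed_pods() {
  awk '$3 ~ /CrashLoopBackOff|Error/ { print $1 }'
}

# Usage: delete matching pods one by one; the loop body never runs
# when nothing matches.
# kubectl get pods -n bakery-ia --no-headers | failed_pods |
#   while read -r pod; do kubectl delete pod -n bakery-ia "$pod"; done
```

Looping with `while read` avoids calling `kubectl delete pod` with an empty argument list, which some `xargs` implementations would do on empty input.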

### Cleanup didn't free enough space:

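If disk usage is still above 90% after cleanup, delete the cluster and prune again. This destroys all cluster state, including database contents: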

```bash
# Nuclear option - rebuild everything
kind delete cluster --name bakery-ia-local
docker system prune -a --volumes -f
# Then recreate cluster with your setup scripts
```

## Incident Log (2026-01-12)

- **Issue**: Disk was 100% full (113GB/113GB), causing database pods to crash
- **Root cause**: 122 unused Docker images + 16 unused volumes + 6GB build cache
- **Solution**: Ran `docker system prune -a --volumes -f`
- **Result**: Freed 89GB, disk now at 22% usage (24GB/113GB)
- **Outcome**: All services recovered successfully