bakery-ia/DOCKER_MAINTENANCE.md
2026-01-12 22:15:11 +01:00

# Docker Maintenance Guide for Local Development
## The Problem
When developing with Tilt and local Kubernetes (Kind), Docker accumulates:
- **Multiple image versions** from each code change (Tilt rebuilds)
- **Unused volumes** from previous cluster runs
- **Build cache** that grows over time
This quickly fills up disk space, causing pods to fail with "No space left on device" errors.
## Quick Fix (When You Hit Disk Issues)
```bash
# Clean up all unused Docker resources
docker system prune -a --volumes -f
```
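This removes:
- All stopped containers and unused networks
- All unused images (not just dangling ones, because of `-a`)
- Unused volumes (because of `--volumes`; on Docker 23.0 and later this covers anonymous volumes only)
- All build cache
**Expected recovery**: 60-100GB on a heavily used setup (the 2026-01-12 incident described at the end of this guide freed 89GB)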
## Regular Maintenance
### Option 1: Use the Cleanup Script (Recommended)
Run the maintenance script weekly:
```bash
./scripts/cleanup-docker.sh
```
Or run it automatically without confirmation:
```bash
./scripts/cleanup-docker.sh --auto
```
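The contents of `scripts/cleanup-docker.sh` are not shown in this guide. As a rough illustration only, a script with this interface could be structured as below. The function names `confirm_cleanup` and `run_cleanup` are hypothetical, and the prune commands are simply the ones listed under Option 2; the real script may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the actual scripts/cleanup-docker.sh.

# Succeeds if cleanup should proceed: either --auto was passed
# or the user answers "y"/"yes" at the prompt.
confirm_cleanup() {
  if [ "${1:-}" = "--auto" ]; then
    return 0
  fi
  printf 'Remove unused Docker images, volumes and build cache? [y/N] '
  answer=""
  read -r answer || true   # an empty answer (or EOF) counts as "no"
  case "$answer" in
    y|Y|yes|YES) return 0 ;;
    *) return 1 ;;
  esac
}

# Runs the same commands as the manual option, then prints the remaining usage.
run_cleanup() {
  docker image prune -af --filter "until=24h"
  docker volume prune -f
  docker builder prune -af
  docker system df
}
```

A script built this way would end with a line such as `confirm_cleanup "${1:-}" && run_cleanup`, so that `--auto` skips the prompt and anything else asks first.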
### Option 2: Manual Commands
```bash
# Remove unused images created more than 24 hours ago
docker image prune -af --filter "until=24h"
# Remove unused volumes (Docker 23.0+ prunes only anonymous volumes unless you add --all)
docker volume prune -f
# Remove build cache
docker builder prune -af
```
### Option 3: Set Up Automated Cleanup
Add the following entry to your crontab to run the cleanup every Sunday at 2 AM (replace the path with the location of your own checkout):
```bash
crontab -e
# Add this line:
0 2 * * 0 /Users/urtzialfaro/Documents/bakery-ia/scripts/cleanup-docker.sh --auto >> /tmp/docker-cleanup.log 2>&1
```
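Because the crontab entry above hard-codes one developer's home directory, it can help to generate the line from the checkout path instead. The helper below is a minimal sketch; `cron_entry` is a hypothetical name, and the schedule and log path are copied from the entry above.

```bash
# Print the crontab entry for a given repository checkout (hypothetical helper).
# $1: absolute path to the bakery-ia checkout
cron_entry() {
  printf '0 2 * * 0 %s/scripts/cleanup-docker.sh --auto >> /tmp/docker-cleanup.log 2>&1\n' "$1"
}

# Possible usage from the repository root (replaces any earlier cleanup entry):
#   ( crontab -l 2>/dev/null | grep -v 'cleanup-docker.sh'; cron_entry "$(pwd)" ) | crontab -
```

Filtering out the old line before appending the new one keeps repeated installs from creating duplicate entries.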
## Monitoring Disk Usage
### Check Docker disk usage:
```bash
docker system df
```
### Check Kind node disk usage:
```bash
docker exec bakery-ia-local-control-plane df -h /var
```
### Alert thresholds:
- **< 70%**: Healthy
- **70-85%**: Consider cleanup soon
- **85-95%**: Run cleanup immediately 🚨
- **> 95%**: Critical - pods will fail ❌
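The thresholds above can be checked from a script. The following is a minimal sketch: `usage_percent` and `disk_status` are hypothetical helper names, and the level labels are illustrative. It assumes `df -P` (POSIX output format) is available inside the Kind node, which keeps the usage percentage in the fifth column of the second line.

```bash
# Extract the usage percentage from `df -P` output on stdin
# (second line, fifth column, with the % sign stripped).
usage_percent() {
  awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# Map an integer usage percentage to one of the alert levels.
disk_status() {
  pct="$1"
  if [ "$pct" -gt 95 ]; then
    echo "critical"
  elif [ "$pct" -gt 85 ]; then
    echo "cleanup-now"
  elif [ "$pct" -ge 70 ]; then
    echo "cleanup-soon"
  else
    echo "healthy"
  fi
}

# Possible usage against the Kind node named in this guide:
#   pct="$(docker exec bakery-ia-local-control-plane df -P /var | usage_percent)"
#   disk_status "$pct"
```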
## Prevention Tips
1. **Run cleanup weekly** to prevent accumulation
2. **Monitor disk usage** before long dev sessions
3. **Delete old Kind clusters** when switching projects:
```bash
kind delete cluster --name bakery-ia-local
```
4. **Increase Docker disk allocation** in Docker Desktop settings if you frequently rebuild many services
## Troubleshooting
### Pods in CrashLoopBackOff after disk issues:
1. Run cleanup (see Quick Fix above)
2. Restart failed pods:
```bash
kubectl get pods -n bakery-ia | grep -E "(CrashLoopBackOff|Error)" | awk '{print $1}' | xargs kubectl delete pod -n bakery-ia
```
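The one-liner above can also be written as two small functions that look only at the STATUS column and do nothing when no pod matches. This is a sketch under assumptions: `failed_pods` and `restart_failed_pods` are hypothetical names, and the column position assumes the default `kubectl get pods` layout (NAME, READY, STATUS, ...).

```bash
# Read `kubectl get pods --no-headers` output on stdin and print the names of
# pods whose STATUS (third column) ends in CrashLoopBackOff or Error.
failed_pods() {
  awk '$3 ~ /(CrashLoopBackOff|Error)$/ { print $1 }'
}

# Delete the failed pods in a namespace so their controllers recreate them.
# $1: namespace (defaults to bakery-ia, as used in this guide)
restart_failed_pods() {
  ns="${1:-bakery-ia}"
  pods="$(kubectl get pods -n "$ns" --no-headers | failed_pods)"
  if [ -z "$pods" ]; then
    echo "No failed pods in namespace $ns"
    return 0
  fi
  printf '%s\n' "$pods" | xargs kubectl delete pod -n "$ns"
}
```

Returning early on an empty list avoids calling `kubectl delete pod` with no pod names, which would otherwise fail with an error on systems where `xargs` runs the command even for empty input.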
### Cleanup didn't free enough space:
If disk usage is still above 90% after cleanup:
```bash
# Nuclear option - rebuild everything
kind delete cluster --name bakery-ia-local
docker system prune -a --volumes -f
# Then recreate cluster with your setup scripts
```
## What Happened Today (2026-01-12)
- **Issue**: Disk was 100% full (113GB/113GB), causing database pods to crash
- **Root cause**: 122 unused Docker images + 16 unused volumes + 6GB build cache
- **Solution**: Ran `docker system prune -a --volumes -f`
- **Result**: Freed 89GB, disk now at 22% usage (24GB/113GB)
- **Outcome**: All services recovered successfully