# Docker Maintenance Guide for Local Development

## The Problem

When developing with Tilt and local Kubernetes (Kind), Docker accumulates:
- **Multiple image versions** from each code change (Tilt rebuilds)
- **Unused volumes** from previous cluster runs
- **Build cache** that grows over time

This quickly fills up disk space, causing pods to fail with "No space left on device" errors.

## Quick Fix (When You Hit Disk Issues)

```bash
# Clean up all unused Docker resources
docker system prune -a --volumes -f
```

This removes:
- All stopped containers and unused networks
- All unused images (not just dangling ones, because of `-a`)
- Unused volumes (on Docker 23.0 and later, `--volumes` covers anonymous volumes only; named volumes need `docker volume prune -a`)
- All build cache

**Expected recovery**: 60-100GB

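Because the prune cannot be undone, it can help to look at what Docker considers reclaimable before running it. The following is a minimal sketch of such a preview; the function name `preview_prune` and the `DOCKER` override variable are illustrative and not part of this repository.

```bash
# Command used to talk to Docker. Overridable (e.g. DOCKER=echo) to print
# the commands instead of running them.
DOCKER="${DOCKER:-docker}"

# preview_prune: show reclaimable space and unreferenced volumes
# without deleting anything.
preview_prune() {
  echo "== Reclaimable space by type =="
  "$DOCKER" system df
  echo "== Volumes not referenced by any container =="
  "$DOCKER" volume ls --filter dangling=true
}
```

Calling `preview_prune` before the quick fix shows the RECLAIMABLE column of `docker system df`, which gives a rough idea of how much space the prune is likely to free.
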
## Regular Maintenance

### Option 1: Use the Cleanup Script (Recommended)

Run the maintenance script weekly:

```bash
./scripts/cleanup-docker.sh
```

Or run it non-interactively, skipping the confirmation prompt:

```bash
./scripts/cleanup-docker.sh --auto
```

### Option 2: Manual Commands

```bash
# Remove unused images created more than 24 hours ago
docker image prune -af --filter "until=24h"

# Remove unused volumes (anonymous only on Docker 23.0+; add -a to include named volumes)
docker volume prune -f

# Remove all build cache
docker builder prune -af
```

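The contents of `scripts/cleanup-docker.sh` are not shown in this guide. As a rough illustration only, a script with an `--auto` flag could wrap the three commands above along these lines. Everything here (the function name `cleanup_docker`, the `DOCKER` override, the prompt text) is an assumption and may differ from the real script.

```bash
# Hypothetical sketch; the real scripts/cleanup-docker.sh may differ.

# Command used to talk to Docker. Overridable (e.g. DOCKER=echo) for a dry run.
DOCKER="${DOCKER:-docker}"

# cleanup_docker [--auto]
# Runs the three prune commands from Option 2.
# Without --auto, asks for confirmation on stdin first.
cleanup_docker() {
  local auto=0
  if [ "${1:-}" = "--auto" ]; then
    auto=1
  fi

  if [ "$auto" -eq 0 ]; then
    printf 'Prune unused Docker images, volumes and build cache? [y/N] '
    local reply
    read -r reply
    case "$reply" in
      y|Y|yes) ;;
      *) echo "Aborted."; return 1 ;;
    esac
  fi

  "$DOCKER" image prune -af --filter "until=24h"
  "$DOCKER" volume prune -f
  "$DOCKER" builder prune -af
}
```

With `DOCKER=echo` set, `cleanup_docker --auto` only prints the commands it would run, which is a cheap way to check the flags before pointing it at a real Docker daemon.
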
### Option 3: Set Up Automated Cleanup

Add the following entry to your crontab to run the cleanup every Sunday at 2 AM. Replace the path with the absolute path to your own checkout:

```bash
crontab -e
# Add this line:
0 2 * * 0 /Users/urtzialfaro/Documents/bakery-ia/scripts/cleanup-docker.sh --auto >> /tmp/docker-cleanup.log 2>&1
```

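The crontab entry above hardcodes one developer's home directory. A small helper can build the same entry for any checkout; this is a sketch, and `cron_line` is an illustrative name rather than something the repository provides.

```bash
# cron_line REPO_DIR
# Prints a crontab entry that runs the cleanup script every Sunday at 02:00.
# Cron fields: minute hour day-of-month month day-of-week (0 = Sunday).
cron_line() {
  local repo_dir="$1"
  printf '0 2 * * 0 %s/scripts/cleanup-docker.sh --auto >> /tmp/docker-cleanup.log 2>&1\n' "$repo_dir"
}
```

From the repository root, `(crontab -l 2>/dev/null; cron_line "$(pwd)") | crontab -` appends the entry to the current crontab. Running it twice adds the line twice, so check `crontab -l` afterwards.
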
## Monitoring Disk Usage

### Check Docker disk usage:
```bash
docker system df
```

### Check Kind node disk usage:
```bash
docker exec bakery-ia-local-control-plane df -h /var
```

### Alert thresholds:
- **< 70%**: Healthy ✅
- **70-85%**: Consider cleanup soon ⚠️
- **85-95%**: Run cleanup immediately 🚨
- **> 95%**: Critical - pods will fail ❌

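The thresholds above can also be checked mechanically. The sketch below assumes the usage figure comes from `df -P` (POSIX output, one row per filesystem); the function names and the status labels are illustrative.

```bash
# parse_use_percent: reads `df -P` output on stdin and prints the Capacity
# column of the first filesystem row as a bare integer (e.g. 22).
parse_use_percent() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# classify_disk_usage PCT: maps an integer percentage onto the alert thresholds.
classify_disk_usage() {
  local pct="$1"
  if [ "$pct" -lt 70 ]; then
    echo "healthy"
  elif [ "$pct" -le 85 ]; then
    echo "cleanup-soon"
  elif [ "$pct" -le 95 ]; then
    echo "cleanup-now"
  else
    echo "critical"
  fi
}
```

For the Kind node, this could be used as `classify_disk_usage "$(docker exec bakery-ia-local-control-plane df -P /var | parse_use_percent)"`.
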
## Prevention Tips

1. **Run cleanup weekly** to prevent accumulation
2. **Monitor disk usage** before long dev sessions
3. **Delete old Kind clusters** when switching projects:
   ```bash
   kind delete cluster --name bakery-ia-local
   ```
4. **Increase Docker disk allocation** in Docker Desktop settings if you frequently rebuild many services

## Troubleshooting

### Pods in CrashLoopBackOff after disk issues:

1. Run cleanup (see Quick Fix above)
2. Restart failed pods:
   ```bash
   kubectl get pods -n bakery-ia | grep -E "(CrashLoopBackOff|Error)" | awk '{print $1}' | xargs kubectl delete pod -n bakery-ia
   ```

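The `grep` in the restart command matches anywhere on the line. A slightly stricter variant, sketched below, looks only at the STATUS column, so a pod whose name happens to contain `Error` is not selected. The function name `failed_pods` is illustrative, and it assumes the default column order of `kubectl get pods` (NAME, READY, STATUS, ...).

```bash
# failed_pods: reads `kubectl get pods --no-headers` output on stdin and
# prints the names of pods whose STATUS column is CrashLoopBackOff or Error.
failed_pods() {
  awk '$3 == "CrashLoopBackOff" || $3 == "Error" { print $1 }'
}
```

It slots into the same pipeline: `kubectl get pods -n bakery-ia --no-headers | failed_pods | xargs kubectl delete pod -n bakery-ia`.
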
### Cleanup didn't free enough space:

If disk usage is still above 90% after cleanup:

```bash
# Nuclear option: delete the cluster and prune everything
kind delete cluster --name bakery-ia-local
docker system prune -a --volumes -f
# Then recreate the cluster with your setup scripts
```

## What Happened Today (2026-01-12)

- **Issue**: Disk was 100% full (113GB/113GB), causing database pods to crash
- **Root cause**: 122 unused Docker images + 16 unused volumes + 6GB build cache
- **Solution**: Ran `docker system prune -a --volumes -f`
- **Result**: Freed 89GB, disk now at 22% usage (24GB/113GB)
- **All services recovered successfully**