# Base Image Caching Solution for Docker Hub Rate Limiting

## Overview

This solution is a simple, short-term way to reduce Docker Hub usage by pre-pulling and caching base images. It can be set up quickly and removes most repeated pulls.

## Problem Addressed

- **Docker Hub Rate Limiting**: 100 pulls/6h per IP address for anonymous users
- **Build Failures**: Timeouts and authentication errors during CI/CD
- **Inconsistent Builds**: Different base image versions causing issues

## Solution Architecture

```
[Docker Hub] → [Pre-Pull Script] → [Local Cache/Registry] → [Service Builds]
```

## Implementation Options

### Option 1: Simple Docker Cache (Easiest)

```bash
# Just run the prepull script
./scripts/prepull-base-images.sh
```

**How it works:**

- Pulls all base images once with authentication
- Docker caches them locally
- Subsequent builds on the same host use the cached images
- Reduces Docker Hub pulls by ~90% (estimated)

### Option 2: Local Registry (More Robust)

```bash
# Start local registry
docker run -d -p 5000:5000 --name bakery-registry \
  -v "$(pwd)/registry-data:/var/lib/registry" \
  registry:2

# Run prepull script with local registry enabled
USE_LOCAL_REGISTRY=true ./scripts/prepull-base-images.sh
```

**How it works:**

- Runs a local Docker registry
- Pre-pull script pushes images to the local registry
- All builds pull from the local registry
- Can be shared across team members

### Option 3: Pull-Through Cache (Most Advanced)

Configure the Docker daemon (`daemon.json`), then restart it:

```json
{
  "registry-mirrors": ["http://localhost:5000"],
  "insecure-registries": ["localhost:5000"]
}
```

```bash
# Start registry as pull-through cache
docker run -d -p 5000:5000 --name bakery-registry \
  -v "$(pwd)/registry-data:/var/lib/registry" \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
  registry:2
```

**How it works:**

- Local registry acts as a transparent cache
- First request pulls from Docker Hub and caches the image
- Subsequent requests are served from the cache
- Transparent to builds for images hosted on Docker Hub (`registry-mirrors` does not apply to other registries)

## Quick Start Guide

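The steps in this guide all run `scripts/prepull-base-images.sh`, which this document does not reproduce. As a rough orientation, here is a minimal sketch of what such a script might contain, assuming it simply loops over the image list and optionally re-tags and pushes each image to a local registry. `LOCAL_REGISTRY`, `DRY_RUN`, and the function names are hypothetical; only `USE_LOCAL_REGISTRY` appears in this document, and the real script may work differently.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of scripts/prepull-base-images.sh; the real script may differ.
set -euo pipefail

# Subset of the images listed under "Base Images Covered".
BASE_IMAGES=(
  "python:3.11-slim"
  "postgres:17-alpine"
  "redis:7.4-alpine"
)

USE_LOCAL_REGISTRY="${USE_LOCAL_REGISTRY:-false}"
LOCAL_REGISTRY="${LOCAL_REGISTRY:-localhost:5000}"  # assumed variable name
DRY_RUN="${DRY_RUN:-false}"                         # assumed variable name

# Run a command, or only print it when DRY_RUN=true.
run() {
  if [[ "$DRY_RUN" == "true" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Map an upstream image name to its name in the local registry.
local_ref() {
  echo "${LOCAL_REGISTRY}/$1"
}

# Pull every base image; optionally re-tag and push it to the local registry.
prepull_all() {
  local img
  for img in "${BASE_IMAGES[@]}"; do
    run docker pull "$img"
    if [[ "$USE_LOCAL_REGISTRY" == "true" ]]; then
      run docker tag "$img" "$(local_ref "$img")"
      run docker push "$(local_ref "$img")"
    fi
  done
}
```

Calling `prepull_all` performs the pulls. With `DRY_RUN=true` it only prints the `docker` commands it would run, which is a cheap way to review the image list before touching Docker Hub.
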
### 1. Simple Caching (5 minutes)

```bash
# Make script executable
chmod +x scripts/prepull-base-images.sh

# Run the script
./scripts/prepull-base-images.sh

# Verify images are cached
# (plain `docker images` prints repository and tag in separate columns,
# so format them as name:tag before matching)
docker images --format '{{.Repository}}:{{.Tag}}' \
  | grep -E '^(python:3\.11-slim|postgres:17-alpine)$'
```

### 2. Local Registry (10 minutes)

```bash
# Build local registry image
cd scripts/local-registry
docker build -t bakery-registry .

# Start registry
docker run -d -p 5000:5000 --name bakery-registry \
  -v "$(pwd)/registry-data:/var/lib/registry" \
  bakery-registry

# Run prepull with local registry
USE_LOCAL_REGISTRY=true ../prepull-base-images.sh

# Verify registry contents
curl http://localhost:5000/v2/_catalog
```

### 3. CI/CD Integration

**GitHub Actions Example:**

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Docker
        uses: docker/setup-buildx-action@v2

      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Pre-pull base images
        run: ./scripts/prepull-base-images.sh

      - name: Cache Docker layers
        uses: actions/cache@v3
        with:
          path: /tmp/.buildx-cache
          key: ${{ runner.os }}-buildx-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-buildx-

      # Pre-pull and build run in the same job: each job starts on a fresh
      # runner, so images pulled in a separate job would not be available here.
      - name: Build services
        run: ./scripts/build-services.sh
```

**Tekton Pipeline Example:**

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: prepull-base-images
spec:
  steps:
    # Login and pulls share one step so the credentials written by
    # `docker login` are visible to `docker pull`.
    # The docker CLI also needs a reachable Docker daemon, which is not shown here.
    - name: prepull-images
      image: docker:cli
      env:
        - name: DOCKER_USERNAME
          valueFrom:
            secretKeyRef:
              name: docker-creds
              key: username
        - name: DOCKER_PASSWORD
          valueFrom:
            secretKeyRef:
              name: docker-creds
              key: password
      script: |
        #!/bin/sh
        # docker:cli is Alpine-based and ships no bash, so use POSIX sh.
        set -eu
        echo "$DOCKER_PASSWORD" | docker login -u "$DOCKER_USERNAME" --password-stdin
        for img in python:3.11-slim postgres:17-alpine redis:7.4-alpine; do
          echo "Pulling $img..."
          docker pull "$img"
        done
```

## Base Images Covered

The script pre-pulls all base images used in the Bakery-IA project:

### Primary Base Images

- `python:3.11-slim` - Main Python runtime
- `postgres:17-alpine` - Database init containers
- `redis:7.4-alpine` - Redis init containers

### Utility Images

- `busybox:1.36` - Lightweight utility container
- `busybox:latest` - Latest busybox
- `curlimages/curl:latest` - Curl utility
- `bitnami/kubectl:1.28` - Kubernetes CLI

### Build System Images

- `alpine:3.18` - Lightweight base
- `alpine:3.19` - Newer Alpine release
- `gcr.io/kaniko-project/executor:v1.23.0` - Kaniko builder (hosted on gcr.io, so it does not count against Docker Hub limits)
- `alpine/git:2.43.0` - Git client

## Benefits

### Immediate Benefits

- **Reduces Docker Hub pulls by an estimated 90%+** - Each base image is pulled only once per cache
- **Avoids most rate limiting issues** - Authenticated pulls get a higher limit than anonymous ones
- **Faster builds** - Cached images need no network download
- **More reliable CI/CD** - Fewer timeout failures

### Long-Term Benefits

- **Consistent build environments** - Same base images for all builds
- **Easier debugging** - Known image versions
- **Better security** - Controlled image updates
- **Foundation for improvement** - Can evolve to a pull-through cache

## Monitoring and Maintenance

### Check Cache Status

```bash
# List cached images
docker images

# Check disk usage
docker system df

# Clean up dangling images only.
# Avoid `docker image prune -a` here: it also removes cached base images
# that no container is currently using.
docker image prune
```

### Update Base Images

```bash
# Run prepull script monthly to get updates
./scripts/prepull-base-images.sh

# Or add a crontab entry (03:00 on the 1st of every month)
0 3 1 * * /path/to/prepull-base-images.sh
```

## Security Considerations

### Credential Management

- Store Docker Hub credentials in a secrets management system
- Rotate credentials periodically
- Use least-privilege access

### Image Verification

```bash
# Verify image integrity
docker trust inspect python:3.11-slim

# Scan for vulnerabilities
# (`docker scan` has been removed from current Docker releases; Docker Scout replaces it)
docker scout cves python:3.11-slim
```

## Comparison with Other Solutions

| Solution | Complexity | Docker Hub Usage | Implementation Time | Maintenance |
|----------|------------|------------------|---------------------|-------------|
| **This Solution** | Low | Very Low | 5-30 minutes | Low |
| GHCR Migration | Medium | None | 1-2 days | Medium |
| Pull-Through Cache | Medium | Very Low | 1 day | Medium |
| Immutable Base Images | High | None | 1-2 weeks | High |

## Migration Path

This solution can evolve over time:

```
Phase 1: Simple caching (Current)
→ Phase 2: Local registry
→ Phase 3: Pull-through cache
→ Phase 4: Immutable base images
```

## Troubleshooting

### Common Issues

**Issue: Authentication fails**

```bash
# Solution: Verify credentials interactively
docker login -u your-username

# Or non-interactively, as in CI
echo "$DOCKER_PASSWORD" | docker login -u "$DOCKER_USERNAME" --password-stdin
```

**Issue: Local registry not accessible**

```bash
# Solution: Check registry status
docker ps | grep registry
curl http://localhost:5000/v2/
```

**Issue: Images not found in cache**

```bash
# Solution: Verify the image is present (exits non-zero if it is missing)
docker image inspect python:3.11-slim > /dev/null

# If missing, pull manually
docker pull python:3.11-slim
```

## Conclusion

This base image caching solution mitigates Docker Hub rate limiting right away while requiring minimal changes to your existing infrastructure. It works as a short-term fix and as a foundation for more advanced caching strategies later.

**Recommended Next Steps:**

1. Implement simple caching first
2. Monitor Docker Hub usage reduction
3. Consider adding a local registry if needed
4. Plan for a long-term solution (GHCR or immutable base images)
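
For step 2, one way to watch the remaining pull quota is to read the rate-limit headers that Docker Hub returns for its `ratelimitpreview/test` repository. The sketch below assumes `curl` and `jq` are installed and checks the anonymous (per-IP) quota; the function names are illustrative and not part of this project's scripts.

```bash
#!/usr/bin/env bash
# Sketch: read Docker Hub rate-limit headers. Function names are hypothetical.
set -euo pipefail

TOKEN_URL="https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull"
MANIFEST_URL="https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest"

# Read raw HTTP headers on stdin and print the numeric part of one header.
# Example input line: "ratelimit-remaining: 76;w=21600"
ratelimit_value() {
  local name="$1"
  tr -d '\r' | awk -F': *' -v h="$name" \
    'tolower($1) == h { split($2, a, ";"); print a[1] }'
}

# Fetch an anonymous token, then request only the headers of the test manifest.
fetch_ratelimit_headers() {
  local token
  token="$(curl -fsS "$TOKEN_URL" | jq -r .token)"
  curl -fsS --head -H "Authorization: Bearer ${token}" "$MANIFEST_URL"
}
```

Running `fetch_ratelimit_headers | ratelimit_value ratelimit-remaining` before and after a build gives a rough count of the pulls that build consumed. The request uses `HEAD`, which Docker documents as not counting against the limit.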