bakery-ia/scripts/BASE_IMAGE_CACHING_SOLUTION.md
2026-01-19 11:55:17 +01:00

Base Image Caching Solution for Docker Hub Rate Limiting

Overview

This solution is a simple, short-term way to reduce Docker Hub usage: base images are pre-pulled once and then served from a local cache. It is designed to be set up in minutes (see the Quick Start Guide) while removing most repeated pulls.

Problem Addressed

  • Docker Hub Rate Limiting: 100 pulls/6h for anonymous users, counted per IP address, so shared CI runners can exhaust it quickly
  • Build Failures: Timeouts and authentication errors during CI/CD
  • Inconsistent Builds: Different base image versions causing issues
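To see how close a machine is to the limit, Docker Hub reports the current quota in response headers on a special test repository. The sketch below follows that documented procedure. The function names `parse_ratelimit` and `check_ratelimit` are illustrative, and the token extraction with `sed` assumes the token endpoint returns compact JSON.

```shell
# Convert a rate-limit header value such as "100;w=21600"
# into "<pulls> <window in hours>".
parse_ratelimit() {
  local value="$1"
  local pulls="${value%%;*}"     # part before the first ';'
  local window="${value##*w=}"   # seconds after 'w='
  echo "$pulls $(( window / 3600 ))"
}

# Ask Docker Hub for the anonymous limit that applies to this IP address.
# A HEAD request is used so the check itself does not count as a pull.
check_ratelimit() {
  local token
  token="$(curl -fsS "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" \
    | sed -E 's/.*"token":"([^"]+)".*/\1/')"
  curl -fsS --head -H "Authorization: Bearer $token" \
    "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest" \
    | tr -d '\r' | grep -i '^ratelimit-'
}
```

Running `check_ratelimit` prints the `ratelimit-limit` and `ratelimit-remaining` headers; either value can be passed to `parse_ratelimit` to make it easier to read.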

Solution Architecture

[Docker Hub] → [Pre-Pull Script] → [Local Cache/Registry] → [Service Builds]

Implementation Options

Option 1: Simple Docker Cache (Easiest)

# Just run the prepull script
./scripts/prepull-base-images.sh

How it works:

  • Pulls all base images once with authentication
  • Docker caches them locally
  • Subsequent builds use cached images
  • Reduces Docker Hub pulls by roughly 90% when the same images were previously pulled on every build
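The contents of `prepull-base-images.sh` are not shown here, so the following is only a minimal sketch of the idea described above, not the actual script. The function name `prepull_images` is hypothetical. It skips any image that is already in the local Docker cache and pulls the rest.

```shell
# Pull each image passed as an argument unless it is already cached locally.
prepull_images() {
  local img
  for img in "$@"; do
    if docker image inspect "$img" >/dev/null 2>&1; then
      # Already in the local image store: no Docker Hub request needed
      echo "cached: $img"
    else
      echo "pulling: $img"
      docker pull --quiet "$img" >/dev/null
    fi
  done
}

# Example:
#   prepull_images python:3.11-slim postgres:17-alpine redis:7.4-alpine
```

Skipping cached images keeps repeated runs cheap. The trade-off is that a tag whose contents changed upstream is not refreshed until the local copy is removed or pulled explicitly.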

Option 2: Local Registry (More Robust)

# Start local registry
docker run -d -p 5000:5000 --name bakery-registry \
  -v "$(pwd)/registry-data:/var/lib/registry" \
  registry:2

# Run prepull script with local registry enabled
USE_LOCAL_REGISTRY=true ./scripts/prepull-base-images.sh

How it works:

  • Runs a local Docker registry
  • Pre-pull script pushes images to local registry
  • All builds pull from local registry
  • Can be shared across team members
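The push step can be sketched as pull, re-tag, push. This is an illustration under assumptions rather than the behaviour of the real script: `LOCAL_REGISTRY`, `local_ref`, and `mirror_image` are hypothetical names, and the mapping simply prefixes the original reference with the registry address.

```shell
# Address of the local registry (assumed default matches the examples above)
LOCAL_REGISTRY="${LOCAL_REGISTRY:-localhost:5000}"

# Map an upstream reference to its name in the local registry,
# e.g. python:3.11-slim -> localhost:5000/python:3.11-slim
local_ref() {
  printf '%s/%s\n' "$LOCAL_REGISTRY" "$1"
}

# Pull an image from upstream, re-tag it, and push it to the local registry.
mirror_image() {
  local src="$1" dst
  dst="$(local_ref "$src")"
  docker pull "$src"
  docker tag "$src" "$dst"
  docker push "$dst"
}
```

With this approach builds only benefit if they reference the re-tagged name (for example `FROM localhost:5000/python:3.11-slim`), so Dockerfiles or build arguments need to point at the local registry.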

Option 3: Pull-Through Cache (Most Advanced)

# Configure the Docker daemon (/etc/docker/daemon.json on Linux), then restart it
{
  "registry-mirrors": ["http://localhost:5000"],
  "insecure-registries": ["localhost:5000"]
}

# Start registry as pull-through cache
# (a registry in proxy mode does not accept pushes, so it cannot double as the Option 2 registry)
docker run -d -p 5000:5000 --name bakery-registry \
  -v "$(pwd)/registry-data:/var/lib/registry" \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
  registry:2

How it works:

  • Local registry acts as transparent cache
  • First request pulls from Docker Hub and caches
  • Subsequent requests served from cache
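  • Completely transparent to builds: image names in Dockerfiles stay unchanged
  • Only Docker Hub images are mirrored; images from other registries (for example gcr.io) bypass the cache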
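Because the mirror is transparent, a misconfigured daemon fails silently: pulls simply keep going to Docker Hub. A small check along the following lines can confirm that the daemon picked up the mirror setting. The function name `mirror_configured` is hypothetical, and the sketch assumes the mirror list is exposed as `RegistryConfig.Mirrors` in `docker info`.

```shell
# Return 0 if the Docker daemon lists the given mirror, 1 if not,
# and 2 if the daemon could not be queried.
mirror_configured() {
  local mirror="$1" mirrors
  mirrors="$(docker info --format '{{json .RegistryConfig.Mirrors}}')" || return 2
  case "$mirrors" in
    *"$mirror"*) return 0 ;;
    *)           return 1 ;;
  esac
}

# Example:
#   mirror_configured localhost:5000 && echo "mirror active"
```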

Quick Start Guide

1. Simple Caching (5 minutes)

# Make script executable
chmod +x scripts/prepull-base-images.sh

# Run the script
./scripts/prepull-base-images.sh

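# Verify images are cached
# (plain `docker images` prints repository and tag in separate columns, so format them as repo:tag first)
docker images --format '{{.Repository}}:{{.Tag}}' | grep -E '^(python:3\.11-slim|postgres:17-alpine)$'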

2. Local Registry (10 minutes)

# Build local registry image
cd scripts/local-registry
docker build -t bakery-registry .

# Start registry
docker run -d -p 5000:5000 --name bakery-registry \
  -v "$(pwd)/registry-data:/var/lib/registry" \
  bakery-registry

# Run prepull with local registry
USE_LOCAL_REGISTRY=true ../prepull-base-images.sh

# Verify registry contents
curl http://localhost:5000/v2/_catalog

3. CI/CD Integration

GitHub Actions Example:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      # Only useful if the build writes its layer cache to this path
      # (for example with --cache-to type=local,dest=/tmp/.buildx-cache)
      - name: Cache Docker layers
        uses: actions/cache@v4
        with:
          path: /tmp/.buildx-cache
          key: ${{ runner.os }}-buildx-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-buildx-

      - name: Pre-pull base images
        run: ./scripts/prepull-base-images.sh

      # Pre-pull and build share one job: every job starts on a fresh runner,
      # so images pulled in a separate job would not be available here
      - name: Build services
        run: ./scripts/build-services.sh

Tekton Pipeline Example:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: prepull-base-images
spec:
  steps:
    # Login and pull run in one step so the pull reuses the login credentials.
    # The docker CLI also needs a reachable daemon (for example a dind sidecar set via DOCKER_HOST).
    - name: prepull-images
      image: docker:cli
      env:
        - name: DOCKER_USERNAME
          valueFrom:
            secretKeyRef:
              name: docker-creds
              key: username
        - name: DOCKER_PASSWORD
          valueFrom:
            secretKeyRef:
              name: docker-creds
              key: password
      script: |
        #!/bin/sh
        # docker:cli is Alpine-based and ships without bash, so use POSIX sh (no arrays)
        set -eu
        echo "$DOCKER_PASSWORD" | docker login -u "$DOCKER_USERNAME" --password-stdin
        for img in python:3.11-slim postgres:17-alpine redis:7.4-alpine; do
          echo "Pulling $img..."
          docker pull "$img"
        done

Base Images Covered

The script pre-pulls all base images used in the Bakery-IA project:

Primary Base Images

  • python:3.11-slim - Main Python runtime
  • postgres:17-alpine - Database init containers
  • redis:7.4-alpine - Redis init containers

Utility Images

  • busybox:1.36 - Lightweight utility container
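  • busybox:latest - Floating tag; its contents change over time, so prefer busybox:1.36 where reproducibility matters
  • curlimages/curl:latest - Curl utility (also a floating tag)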
  • bitnami/kubectl:1.28 - Kubernetes CLI

Build System Images

  • alpine:3.18 - Lightweight base
  • alpine:3.19 - Newer Alpine release
  • gcr.io/kaniko-project/executor:v1.23.0 - Kaniko builder (hosted on gcr.io, so it does not count against Docker Hub limits)
  • alpine/git:2.43.0 - Git client
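The list above has to be kept in sync with the Dockerfiles by hand. As a rough aid, the following sketch extracts the image references from `FROM` lines so they can be compared with the list. The function name `list_base_images` is hypothetical, and the output also includes build-stage aliases and `scratch` if a Dockerfile uses them.

```shell
# Print the unique image references found on FROM lines
# of the Dockerfiles given as arguments.
list_base_images() {
  awk 'toupper($1) == "FROM" {
         i = 2
         while ($i ~ /^--/) i++   # skip flags such as --platform=linux/amd64
         if ($i != "") print $i
       }' "$@" | sort -u
}

# Example (the path pattern is an assumption about the repository layout):
#   list_base_images services/*/Dockerfile
```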

Benefits

Immediate Benefits

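  • Reduces Docker Hub pulls by roughly 90% - Each base image is pulled once per cache refresh instead of once per build
  • Fewer rate limiting issues - Authenticated pulls get a higher limit, and cached images need no pull at all
  • Faster builds - Cached images are read from local disk instead of the network
  • More reliable CI/CD - Builds no longer depend on Docker Hub being reachable for base images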
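The size of the reduction depends on how often an image was pulled before. As a back-of-the-envelope check, assuming every build previously pulled the image once and the cache is refreshed a fixed number of times over the same period, the saving can be computed as follows. The function name `pull_reduction_pct` is illustrative.

```shell
# Percentage of pulls saved when <builds> per-build pulls are replaced
# by <refreshes> cache refreshes (default: 1). Integer arithmetic, rounds down.
pull_reduction_pct() {
  local builds="$1" refreshes="${2:-1}"
  echo $(( (builds - refreshes) * 100 / builds ))
}

pull_reduction_pct 10      # 10 builds, 1 refresh  -> 90
pull_reduction_pct 30 1    # 30 builds, 1 refresh  -> 96
```

With ten or more builds per cache refresh the saving is 90% or higher. With only a handful of builds it is correspondingly smaller.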

Long-Term Benefits

  • Consistent build environments - Same base images for all builds
  • Easier debugging - Known image versions
  • Better security - Controlled image updates
  • Foundation for improvement - Can evolve to pull-through cache

Monitoring and Maintenance

Check Cache Status

# List cached images
docker images

# Check disk usage
docker system df

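# Clean up dangling images only
# (avoid `docker image prune -a`: it also deletes cached base images that no container is using)
docker image prune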

Update Base Images

# Run prepull script monthly to get updates
./scripts/prepull-base-images.sh

# Or create a cron job
0 3 1 * * /path/to/prepull-base-images.sh

Security Considerations

Credential Management

  • Store Docker Hub credentials in secrets management system
  • Rotate credentials periodically
  • Use least-privilege access

Image Verification

# Verify image integrity
docker trust inspect python:3.11-slim

# Scan for vulnerabilities (`docker scan` has been removed from the Docker CLI; Docker Scout replaces it)
docker scout cves python:3.11-slim
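Another lightweight integrity check is to compare the digest of the cached image with a known-good value recorded earlier. The sketch below assumes the image was pulled from a registry, so that it has at least one repo digest. `pinned_ref` and `verify_digest` are hypothetical helper names, and an image pushed to several registries may list a different digest entry first.

```shell
# Print the digest-pinned reference (repo@sha256:...) of a cached image.
pinned_ref() {
  docker image inspect --format '{{index .RepoDigests 0}}' "$1"
}

# Succeed only if the cached image has the expected digest.
verify_digest() {
  local actual
  actual="$(pinned_ref "$1")" || return 2
  [ "${actual##*@}" = "$2" ]
}

# Example (the digest is a placeholder, not a real value):
#   verify_digest python:3.11-slim sha256:<expected-digest> || echo "digest mismatch"
```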

Comparison with Other Solutions

| Solution | Complexity | Docker Hub Usage | Implementation Time | Maintenance |
|---|---|---|---|---|
| This Solution | Low | Very Low | 5-30 minutes | Low |
| GHCR Migration | Medium | None | 1-2 days | Medium |
| Pull-Through Cache | Medium | Very Low | 1 day | Medium |
| Immutable Base Images | High | None | 1-2 weeks | High |

Migration Path

This solution can evolve over time:

Phase 1: Simple caching (Current) → Phase 2: Local registry → Phase 3: Pull-through cache → Phase 4: Immutable base images

Troubleshooting

Common Issues

Issue: Authentication fails

# Solution: Verify credentials
docker login -u your-username
echo "$DOCKER_PASSWORD" | docker login -u "$DOCKER_USERNAME" --password-stdin

Issue: Local registry not accessible

# Solution: Check registry status
docker ps | grep registry
curl http://localhost:5000/v2/
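A registry container can take a moment to start answering requests, so a single `curl` right after `docker run` may fail even though nothing is wrong. A small retry loop, sketched here with the hypothetical name `wait_for_registry`, separates a slow start from a real failure.

```shell
# Poll the registry API until it answers or the attempts run out.
# Arguments: URL (default http://localhost:5000/v2/), attempts (default 10).
wait_for_registry() {
  local url="${1:-http://localhost:5000/v2/}" tries="${2:-10}" i=1
  while [ "$i" -le "$tries" ]; do
    if curl -fsS -o /dev/null "$url"; then
      echo "registry ready after $i attempt(s)"
      return 0
    fi
    sleep 1
    i=$(( i + 1 ))
  done
  echo "registry not reachable at $url" >&2
  return 1
}
```

If the loop gives up, `docker logs bakery-registry` is usually the next place to look.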

Issue: Images not found in cache

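# Solution: Verify images are pulled
# (format as repo:tag first; plain `docker images` prints them in separate columns)
docker images --format '{{.Repository}}:{{.Tag}}' | grep -Fx 'python:3.11-slim'
# If missing, pull manually
docker pull python:3.11-slim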

Conclusion

This simple base image caching solution provides an immediate fix for Docker Hub rate limiting issues while requiring minimal changes to your existing infrastructure. It serves as both a short-term solution and a foundation for more advanced caching strategies in the future.

Recommended Next Steps:

  1. Implement simple caching first
  2. Monitor Docker Hub usage reduction
  3. Consider adding local registry if needed
  4. Plan for long-term solution (GHCR or immutable base images)