bakery-ia/DEPLOYMENT_TROUBLESHOOTING.md
2025-12-17 20:50:22 +01:00

Deployment Troubleshooting Guide

This guide addresses common deployment issues encountered with the Bakery IA system.

Too Many Open Files Error

Symptoms

failed to create fsnotify watcher: too many open files
Error streaming distribution-service-7ff4db8c48-k4xw7 logs: failed to create fsnotify watcher: too many open files

Root Cause

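This error occurs when a node exhausts its inotify limits. Kubernetes and Docker components use inotify to watch for file system changes (for example, when streaming logs), and every watcher counts against per-user kernel limits, typically fs.inotify.max_user_instances. The error is common in development environments that run many containers on one machine.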
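To find out which processes hold the inotify instances on a Linux node, one option is to scan /proc for file descriptors that point at anon_inode:inotify. The following is a minimal sketch and not part of the Bakery IA scripts; the function name count_inotify_instances is made up for illustration, and without root it only sees processes the current user may inspect.

```python
import os
from collections import Counter


def count_inotify_instances(proc_root="/proc"):
    """Return a Counter mapping PID -> number of open inotify instances."""
    counts = Counter()
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return counts  # no /proc here (for example on macOS)
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while scanning
            if target == "anon_inode:inotify":
                counts[int(entry)] += 1
    return counts


if __name__ == "__main__":
    # Print the ten processes holding the most inotify instances.
    for pid, n in count_inotify_instances().most_common(10):
        print(f"pid {pid}: {n} inotify instance(s)")
```

Run it with sudo on the affected node to cover all users. A single process near the top with hundreds of instances usually points at a leaking watcher rather than at limits that are merely too low.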

Solutions

For macOS (Docker Desktop)

  1. Increase Docker Resources:

    • Open Docker Desktop
    • Go to Settings > Resources > Advanced
    • Increase memory allocation to 8GB or more
    • Restart Docker Desktop
  2. Clean Docker System:

    # Warning: deletes all stopped containers plus unused images, networks, and volumes
    docker system prune -a --volumes
    
  3. Adjust macOS System Limits:

    # Persist the host file limits (recent macOS releases may ignore /etc/sysctl.conf)
    echo "kern.maxfiles=1048576" | sudo tee -a /etc/sysctl.conf
    echo "kern.maxfilesperproc=65536" | sudo tee -a /etc/sysctl.conf
    
    # Apply changes
    sudo sysctl -w kern.maxfiles=1048576
    sudo sysctl -w kern.maxfilesperproc=65536
    

For Linux (Kubernetes Nodes)

  1. Temporary Fix:

    sudo sysctl -w fs.inotify.max_user_watches=524288
    sudo sysctl -w fs.inotify.max_user_instances=1024
    sudo sysctl -w fs.inotify.max_queued_events=16384
    
  2. Permanent Fix:

    # Add to /etc/sysctl.conf
    echo "fs.inotify.max_user_watches=524288" | sudo tee -a /etc/sysctl.conf
    echo "fs.inotify.max_user_instances=1024" | sudo tee -a /etc/sysctl.conf
    echo "fs.inotify.max_queued_events=16384" | sudo tee -a /etc/sysctl.conf
    
    # Apply changes
    sudo sysctl -p
    
  3. Restart Kubernetes Components:

    sudo systemctl restart kubelet
    sudo systemctl restart docker  # or containerd, if that is the node's container runtime
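After applying the sysctl settings, it can help to confirm that the kernel actually reports the new values. The sketch below reads them from /proc/sys/fs/inotify and compares them with the targets used in this section; the helper names read_inotify_limits and below_expected are illustrative and not part of the project's scripts.

```python
from pathlib import Path

# Target values taken from the sysctl commands in this section.
EXPECTED = {
    "max_user_watches": 524288,
    "max_user_instances": 1024,
    "max_queued_events": 16384,
}


def read_inotify_limits(base="/proc/sys/fs/inotify"):
    """Return {name: value} for every inotify limit readable under base."""
    limits = {}
    for name in EXPECTED:
        try:
            limits[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            continue  # not Linux, or the entry is missing or unreadable
    return limits


def below_expected(limits):
    """Return the names whose current value is lower than the target."""
    return sorted(name for name, value in limits.items() if value < EXPECTED[name])


if __name__ == "__main__":
    current = read_inotify_limits()
    for name, value in current.items():
        print(f"fs.inotify.{name} = {value}")
    low = below_expected(current)
    if low:
        print("Still below target: " + ", ".join(low))
    else:
        print("All readable limits meet the target.")
```

Running it again after a reboot shows whether the entries in /etc/sysctl.conf were picked up.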
    

For Kind Clusters

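# Delete and recreate the cluster; this discards everything deployed in it.
# Kind nodes share the host kernel, so on Linux raise the host inotify limits first.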
kind delete cluster
kind create cluster

For Minikube

minikube stop
minikube start

Prevention

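Add a security context to your deployments so containers run as a non-root user and cannot escalate privileges. This hardens the workload but does not cap inotify usage; the node-level limits above remain the actual fix: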

securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: false

RouteBuilder TypeError Fix

Symptoms

TypeError: RouteBuilder.build_resource_detail_route() takes from 2 to 4 positional arguments but 5 were given

Root Cause

The call used the wrong RouteBuilder method. Python counts self in the error message, so build_resource_detail_route accepts one to three explicit arguments, while the failing call passed four.

Solution

Use the correct RouteBuilder methods:

  • For nested resources: Use build_nested_resource_route()

    # Wrong
    route_builder.build_resource_detail_route("forecasts", "forecast_id", "feedback")
    
    # Correct
    route_builder.build_nested_resource_route("forecasts", "forecast_id", "feedback")
    
  • For resource actions: Use build_resource_action_route()

    # Wrong
    route_builder.build_resource_detail_route("forecasts", "forecast_id", "feedback", "retrain")
    
    # Correct
    route_builder.build_resource_action_route("forecasts", "forecast_id", "retrain")
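The argument counts in the error message are easier to read with a small reproduction. The class below is a hypothetical stand-in, not the real RouteBuilder from shared/routing/route_builder.py: its parameter names, defaults, and route format are assumptions chosen only so that the method takes one required and two optional arguments besides self.

```python
import inspect


class RouteBuilder:
    """Hypothetical stand-in, NOT the class in shared/routing/route_builder.py."""

    def __init__(self, service):
        self.service = service

    # Assumed signature: one required and two optional parameters besides self.
    def build_resource_detail_route(self, resource, id_param="id", sub_path=""):
        route = f"/{self.service}/{resource}/{{{id_param}}}"
        return f"{route}/{sub_path}" if sub_path else route


def explicit_parameters(method):
    """List the parameter names a caller can pass (self is already bound)."""
    return list(inspect.signature(method).parameters)


if __name__ == "__main__":
    builder = RouteBuilder("forecasting")
    print(explicit_parameters(builder.build_resource_detail_route))
    try:
        # Four explicit arguments plus the implicit self make five.
        builder.build_resource_detail_route(
            "forecasts", "forecast_id", "feedback", "retrain"
        )
    except TypeError as exc:
        print(exc)
```

The printed exception contains "takes from 2 to 4 positional arguments but 5 were given", matching the symptom above. Checking inspect.signature on the real method is a quick way to confirm how many arguments it accepts before changing a call site.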
    

Files Fixed

  • services/forecasting/app/api/forecast_feedback.py

General Kubernetes Troubleshooting

Check Pod Status

kubectl get pods -n bakery-ia
kubectl describe pod distribution-service -n bakery-ia

Check Logs

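# kubectl logs needs an exact pod name; deployment/<name> lets kubectl pick a pod
kubectl logs deployment/distribution-service -n bakery-ia
kubectl logs -f deployment/distribution-service -n bakery-ia  # Follow logs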

Check Resource Usage

kubectl top pods -n bakery-ia  # requires metrics-server in the cluster
kubectl describe nodes | grep -A 10 "Allocated resources"
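The read-only checks above can be collected into one script so they run in the same order every time. This is a sketch under a few assumptions: kubectl is on PATH and already points at the right cluster, the helper names diagnostic_commands and run_diagnostics are invented here, and the node-level grep is left out because it needs a shell pipeline.

```python
import shutil
import subprocess


def diagnostic_commands(workload, namespace):
    """Build the read-only kubectl commands from this section as argv lists."""
    return [
        ["kubectl", "get", "pods", "-n", namespace],
        ["kubectl", "describe", "pod", workload, "-n", namespace],
        # deployment/<name> lets kubectl pick a pod; --tail keeps the output short.
        ["kubectl", "logs", f"deployment/{workload}", "-n", namespace, "--tail=100"],
        ["kubectl", "top", "pods", "-n", namespace],
    ]


def run_diagnostics(workload, namespace):
    """Run each command and print its output; keep going if one fails."""
    for argv in diagnostic_commands(workload, namespace):
        print("$ " + " ".join(argv))
        result = subprocess.run(argv, capture_output=True, text=True)
        print(result.stdout or result.stderr)


if __name__ == "__main__":
    if shutil.which("kubectl") is None:
        print("kubectl not found on PATH")
    else:
        run_diagnostics("distribution-service", "bakery-ia")
```

Passing the commands as argv lists avoids shell quoting problems, and none of them modifies cluster state.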

Restart Deployment

kubectl rollout restart deployment distribution-service -n bakery-ia

Scale Down/Up

kubectl scale deployment distribution-service -n bakery-ia --replicas=1  # scale down
kubectl scale deployment distribution-service -n bakery-ia --replicas=2  # scale back up

Running Fix Scripts

Fix Inotify Limits

cd scripts
./fix_kubernetes_inotify.sh

Fix RouteBuilder Issues

The RouteBuilder issues have been fixed in the codebase. If you encounter similar issues:

  1. Check the RouteBuilder method signatures in shared/routing/route_builder.py
  2. Use the appropriate method for your routing pattern
  3. Follow the examples in the fixed forecast feedback API

Support

If issues persist after trying these solutions:

  1. Check the specific error message and logs
  2. Verify system resources (CPU, memory, disk)
  3. Review recent changes to the codebase
  4. Consult the architecture documentation for service boundaries