diff --git a/DEPLOYMENT_TROUBLESHOOTING.md b/DEPLOYMENT_TROUBLESHOOTING.md
deleted file mode 100644
index 273cd493..00000000
--- a/DEPLOYMENT_TROUBLESHOOTING.md
+++ /dev/null
@@ -1,195 +0,0 @@
-# Deployment Troubleshooting Guide
-
-This guide addresses common deployment issues encountered with the Bakery IA system.
-
-## Table of Contents
-
-- [Too Many Open Files Error](#too-many-open-files-error)
-- [RouteBuilder TypeError Fix](#routebuilder-typeerror-fix)
-- [General Kubernetes Troubleshooting](#general-kubernetes-troubleshooting)
-
-## Too Many Open Files Error
-
-### Symptoms
-```
-failed to create fsnotify watcher: too many open files
-Error streaming distribution-service-7ff4db8c48-k4xw7 logs: failed to create fsnotify watcher: too many open files
-```
-
-### Root Cause
-This error occurs when the system hits inotify limits, which are used by Kubernetes and Docker to monitor file system changes. This is common in development environments with many containers.
-
-### Solutions
-
-#### For macOS (Docker Desktop)
-
-1. **Increase Docker Resources**:
-   - Open Docker Desktop
-   - Go to Settings > Resources > Advanced
-   - Increase memory allocation to 8GB or more
-   - Restart Docker Desktop
-
-2. **Clean Docker System**:
-   ```bash
-   docker system prune -a --volumes
-   ```
-
-3. **Adjust macOS System Limits**:
-   ```bash
-   # Add to /etc/sysctl.conf
-   echo "kern.maxfiles=1048576" | sudo tee -a /etc/sysctl.conf
-   echo "kern.maxfilesperproc=65536" | sudo tee -a /etc/sysctl.conf
-
-   # Apply changes
-   sudo sysctl -w kern.maxfiles=1048576
-   sudo sysctl -w kern.maxfilesperproc=65536
-   ```
-
-#### For Linux (Kubernetes Nodes)
-
-1. **Temporary Fix**:
-   ```bash
-   sudo sysctl -w fs.inotify.max_user_watches=524288
-   sudo sysctl -w fs.inotify.max_user_instances=1024
-   sudo sysctl -w fs.inotify.max_queued_events=16384
-   ```
-
-2. **Permanent Fix**:
-   ```bash
-   # Add to /etc/sysctl.conf
-   echo "fs.inotify.max_user_watches=524288" | sudo tee -a /etc/sysctl.conf
-   echo "fs.inotify.max_user_instances=1024" | sudo tee -a /etc/sysctl.conf
-   echo "fs.inotify.max_queued_events=16384" | sudo tee -a /etc/sysctl.conf
-
-   # Apply changes
-   sudo sysctl -p
-   ```
-
-3. **Restart Kubernetes Components**:
-   ```bash
-   sudo systemctl restart kubelet
-   sudo systemctl restart docker
-   ```
-
-#### For Kind Clusters
-
-```bash
-# Delete and recreate cluster
-kind delete cluster
-kind create cluster
-```
-
-#### For Minikube
-
-```bash
-minikube stop
-minikube start
-```
-
-### Prevention
-
-Add security context to your deployments to limit resource usage:
-
-```yaml
-securityContext:
-  runAsUser: 1000
-  runAsGroup: 1000
-  allowPrivilegeEscalation: false
-  readOnlyRootFilesystem: false
-```
-
-## RouteBuilder TypeError Fix
-
-### Symptoms
-```
-TypeError: RouteBuilder.build_resource_detail_route() takes from 2 to 4 positional arguments but 5 were given
-```
-
-### Root Cause
-Incorrect usage of RouteBuilder methods. The `build_resource_detail_route` method only accepts 2-3 parameters, but was being called with 4-5 parameters.
-
-### Solution
-
-Use the correct RouteBuilder methods:
-
-- **For nested resources**: Use `build_nested_resource_route()`
-  ```python
-  # Wrong
-  route_builder.build_resource_detail_route("forecasts", "forecast_id", "feedback")
-
-  # Correct
-  route_builder.build_nested_resource_route("forecasts", "forecast_id", "feedback")
-  ```
-
-- **For resource actions**: Use `build_resource_action_route()`
-  ```python
-  # Wrong
-  route_builder.build_resource_detail_route("forecasts", "forecast_id", "feedback", "retrain")
-
-  # Correct
-  route_builder.build_resource_action_route("forecasts", "forecast_id", "retrain")
-  ```
-
-### Files Fixed
-- `services/forecasting/app/api/forecast_feedback.py`
-
-## General Kubernetes Troubleshooting
-
-### Check Pod Status
-```bash
-kubectl get pods -n bakery-ia
-kubectl describe pod distribution-service -n bakery-ia
-```
-
-### Check Logs
-```bash
-kubectl logs distribution-service -n bakery-ia
-kubectl logs -f distribution-service -n bakery-ia  # Follow logs
-```
-
-### Check Resource Usage
-```bash
-kubectl top pods -n bakery-ia
-kubectl describe nodes | grep -A 10 "Allocated resources"
-```
-
-### Restart Deployment
-```bash
-kubectl rollout restart deployment distribution-service -n bakery-ia
-```
-
-### Scale Down/Up
-```bash
-kubectl scale deployment distribution-service -n bakery-ia --replicas=1
-kubectl scale deployment distribution-service -n bakery-ia --replicas=2
-```
-
-## Running Fix Scripts
-
-### Fix Inotify Limits
-```bash
-cd scripts
-./fix_kubernetes_inotify.sh
-```
-
-### Fix RouteBuilder Issues
-The RouteBuilder issues have been fixed in the codebase. If you encounter similar issues:
-
-1. Check the RouteBuilder method signatures in `shared/routing/route_builder.py`
-2. Use the appropriate method for your routing pattern
-3. Follow the examples in the fixed forecast feedback API
-
-## Additional Resources
-
-- [Kubernetes Inotify Limits](https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files)
-- [Docker Desktop Resource Limits](https://docs.docker.com/desktop/settings/mac/#resources)
-- [RouteBuilder Documentation](shared/routing/route_builder.py)
-
-## Support
-
-If issues persist after trying these solutions:
-
-1. Check the specific error message and logs
-2. Verify system resources (CPU, memory, disk)
-3. Review recent changes to the codebase
-4. Consult the architecture documentation for service boundaries
\ No newline at end of file
diff --git a/scripts/fix_inotify_limits.sh b/scripts/fix_inotify_limits.sh
deleted file mode 100755
index 666fb377..00000000
--- a/scripts/fix_inotify_limits.sh
+++ /dev/null
@@ -1,48 +0,0 @@
-#!/bin/bash
-
-# Script to fix "too many open files" error in Kubernetes
-# This error occurs when the system hits inotify limits
-
-echo "Fixing inotify limits for Kubernetes..."
-
-# Check current inotify limits
-echo "Current inotify limits:"
-sysctl fs.inotify.max_user_watches
-sysctl fs.inotify.max_user_instances
-sysctl fs.inotify.max_queued_events
-
-echo ""
-echo "Increasing inotify limits..."
-
-# Increase inotify limits (temporary - lasts until reboot)
-sudo sysctl -w fs.inotify.max_user_watches=524288
-sudo sysctl -w fs.inotify.max_user_instances=1024
-sudo sysctl -w fs.inotify.max_queued_events=16384
-
-# Verify the changes
-echo ""
-echo "New inotify limits:"
-sysctl fs.inotify.max_user_watches
-sysctl fs.inotify.max_user_instances
-sysctl fs.inotify.max_queued_events
-
-echo ""
-echo "For permanent fix, add these lines to /etc/sysctl.conf:"
-echo "fs.inotify.max_user_watches=524288"
-echo "fs.inotify.max_user_instances=1024"
-echo "fs.inotify.max_queued_events=16384"
-echo ""
-echo "Then run: sudo sysctl -p"
-
-echo ""
-echo "If you're using Docker Desktop or Kind, you may need to:"
-echo "1. Restart Docker Desktop"
-echo "2. Or for Kind: kind delete cluster && kind create cluster"
-echo "3. Or adjust the node's system limits directly"
-
-echo ""
-echo "For production environments, consider adding these limits to your deployment:"
-echo "securityContext:"
-echo "  runAsUser: 1000"
-echo "  runAsGroup: 1000"
-echo "  fsGroup: 1000"
\ No newline at end of file
diff --git a/scripts/fix_kubernetes_inotify.sh b/scripts/fix_kubernetes_inotify.sh
deleted file mode 100755
index d79220bf..00000000
--- a/scripts/fix_kubernetes_inotify.sh
+++ /dev/null
@@ -1,100 +0,0 @@
-#!/bin/bash
-
-# Script to fix "too many open files" error in Kubernetes
-# This error occurs when the system hits inotify limits
-
-echo "🔧 Fixing Kubernetes inotify limits..."
-
-# Check if we're running on macOS (Docker Desktop) or Linux
-if [[ "$(uname)" == "Darwin" ]]; then
-    echo "🍎 Detected macOS - Docker Desktop environment"
-    echo ""
-    echo "For Docker Desktop on macOS, you need to:"
-    echo "1. Open Docker Desktop settings"
-    echo "2. Go to 'Resources' -> 'Advanced'"
-    echo "3. Increase the memory allocation (recommended: 8GB+)"
-    echo "4. Restart Docker Desktop"
-    echo ""
-    echo "Alternatively, you can run:"
-    echo "docker system prune -a --volumes"
-    echo "Then restart Docker Desktop"
-
-    # Also check if we can adjust macOS system limits
-    echo ""
-    echo "Checking current macOS inotify limits..."
-    sysctl kern.maxfilesperproc
-    sysctl kern.maxfiles
-
-    echo ""
-    echo "To increase macOS limits permanently, add to /etc/sysctl.conf:"
-    echo "kern.maxfiles=1048576"
-    echo "kern.maxfilesperproc=65536"
-    echo "Then run: sudo sysctl -w kern.maxfiles=1048576"
-    echo "And: sudo sysctl -w kern.maxfilesperproc=65536"
-
-elif [[ "$(uname)" == "Linux" ]]; then
-    echo "🐧 Detected Linux environment"
-
-    # Check if we're in a Kubernetes cluster
-    if kubectl cluster-info >/dev/null 2>&1; then
-        echo "🎯 Detected Kubernetes cluster"
-
-        # Check current inotify limits
-        echo ""
-        echo "Current inotify limits:"
-        sysctl fs.inotify.max_user_watches
-        sysctl fs.inotify.max_user_instances
-        sysctl fs.inotify.max_queued_events
-
-        # Increase limits temporarily
-        echo ""
-        echo "Increasing inotify limits temporarily..."
-        sudo sysctl -w fs.inotify.max_user_watches=524288
-        sudo sysctl -w fs.inotify.max_user_instances=1024
-        sudo sysctl -w fs.inotify.max_queued_events=16384
-
-        # Verify changes
-        echo ""
-        echo "New inotify limits:"
-        sysctl fs.inotify.max_user_watches
-        sysctl fs.inotify.max_user_instances
-        sysctl fs.inotify.max_queued_events
-
-        # Check if we can make permanent changes
-        if [[ -f /etc/sysctl.conf ]]; then
-            echo ""
-            echo "Making inotify limits permanent..."
-            sudo bash -c 'cat >> /etc/sysctl.conf << EOF
-# Increased inotify limits for Kubernetes
-fs.inotify.max_user_watches=524288
-fs.inotify.max_user_instances=1024
-fs.inotify.max_queued_events=16384
-EOF'
-            sudo sysctl -p
-        fi
-
-        # Check for Docker containers that might need restarting
-        echo ""
-        echo "Checking for running containers that might need restarting..."
-        docker ps --format "{{.Names}}" | while read container; do
-            echo "Restarting container: $container"
-            docker restart "$container" >/dev/null 2>&1 || echo "Failed to restart $container"
-        done
-
-    else
-        echo "⚠️ Kubernetes cluster not detected"
-        echo "This script should be run on a Kubernetes node or with kubectl access"
-    fi
-else
-    echo "❓ Unsupported operating system: $(uname)"
-fi
-
-echo ""
-echo "📋 Additional recommendations:"
-echo "1. For Kind clusters: kind delete cluster && kind create cluster"
-echo "2. For Minikube: minikube stop && minikube start"
-echo "3. For production: Adjust node system limits and restart kubelet"
-echo "4. Consider adding resource limits to your deployments"
-
-echo ""
-echo "✅ Inotify fix script completed!"
\ No newline at end of file