Delete files
@@ -1,195 +0,0 @@
# Deployment Troubleshooting Guide

This guide addresses common deployment issues encountered with the Bakery IA system.

## Table of Contents

- [Too Many Open Files Error](#too-many-open-files-error)
- [RouteBuilder TypeError Fix](#routebuilder-typeerror-fix)
- [General Kubernetes Troubleshooting](#general-kubernetes-troubleshooting)

## Too Many Open Files Error

### Symptoms
```
failed to create fsnotify watcher: too many open files
Error streaming distribution-service-7ff4db8c48-k4xw7 logs: failed to create fsnotify watcher: too many open files
```

### Root Cause
This error occurs when the host reaches its inotify limits. Kubernetes and Docker tooling use inotify to watch for file system changes (for example, when streaming logs), so development environments that run many containers can exhaust the default limits.

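To see which processes hold inotify instances on a Linux host, one option is to scan the file descriptors under `/proc`. The sketch below assumes a Linux node, where inotify descriptors appear as symlinks to `anon_inode:inotify`. The function name `count_inotify_instances` is illustrative and not part of the Bakery IA codebase.

```python
import os
from collections import Counter
from pathlib import Path


def count_inotify_instances(proc_root: str = "/proc") -> Counter:
    """Return a Counter mapping pid -> number of open inotify instances."""
    counts: Counter = Counter()
    root = Path(proc_root)
    if not root.is_dir():
        return counts  # not a Linux host (for example macOS)
    for pid_dir in root.iterdir():
        if not pid_dir.name.isdigit():
            continue  # skip non-process entries such as /proc/sys
        try:
            fds = list((pid_dir / "fd").iterdir())
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                if os.readlink(fd) == "anon_inode:inotify":
                    counts[int(pid_dir.name)] += 1
            except OSError:
                continue  # descriptor was closed while scanning
    return counts


if __name__ == "__main__":
    for pid, n in count_inotify_instances().most_common(10):
        print(f"pid {pid}: {n} inotify instance(s)")
```

Run it with `sudo` to include processes owned by other users. A process near the top of the list with an unexpectedly high count is a likely contributor to the error.
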
### Solutions

#### For macOS (Docker Desktop)

1. **Increase Docker Resources**:
   - Open Docker Desktop
   - Go to Settings > Resources > Advanced
   - Increase memory allocation to 8 GB or more
   - Restart Docker Desktop

2. **Clean Docker System** (this deletes all unused images, stopped containers, unused networks, and unused volumes):
   ```bash
   docker system prune -a --volumes
   ```

3. **Adjust macOS System Limits** (these are file descriptor limits; inotify itself exists only inside the Linux VM that Docker Desktop runs):
   ```bash
   # Add to /etc/sysctl.conf
   echo "kern.maxfiles=1048576" | sudo tee -a /etc/sysctl.conf
   echo "kern.maxfilesperproc=65536" | sudo tee -a /etc/sysctl.conf

   # Apply changes
   sudo sysctl -w kern.maxfiles=1048576
   sudo sysctl -w kern.maxfilesperproc=65536
   ```

#### For Linux (Kubernetes Nodes)

1. **Temporary Fix**:
   ```bash
   sudo sysctl -w fs.inotify.max_user_watches=524288
   sudo sysctl -w fs.inotify.max_user_instances=1024
   sudo sysctl -w fs.inotify.max_queued_events=16384
   ```

2. **Permanent Fix**:
   ```bash
   # Add to /etc/sysctl.conf
   echo "fs.inotify.max_user_watches=524288" | sudo tee -a /etc/sysctl.conf
   echo "fs.inotify.max_user_instances=1024" | sudo tee -a /etc/sysctl.conf
   echo "fs.inotify.max_queued_events=16384" | sudo tee -a /etc/sysctl.conf

   # Apply changes
   sudo sysctl -p
   ```

3. **Restart Kubernetes Components**:
   ```bash
   sudo systemctl restart kubelet
   sudo systemctl restart docker
   ```

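After applying the settings, a quick check can confirm that the node meets the values recommended above. This is a minimal sketch that assumes the limits are exposed under `/proc/sys/fs/inotify`, as on standard Linux kernels. `find_low_limits` is a hypothetical helper name.

```python
from pathlib import Path

# Values recommended in this guide.
RECOMMENDED = {
    "max_user_watches": 524288,
    "max_user_instances": 1024,
    "max_queued_events": 16384,
}


def find_low_limits(base: str = "/proc/sys/fs/inotify") -> dict:
    """Return {name: (current, recommended)} for every limit below the recommendation."""
    low = {}
    for name, wanted in RECOMMENDED.items():
        try:
            current = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            continue  # not Linux, or the value could not be read
        if current < wanted:
            low[name] = (current, wanted)
    return low


if __name__ == "__main__":
    low = find_low_limits()
    if not low:
        print("No inotify limit below the recommended values was found.")
    for name, (current, wanted) in low.items():
        print(f"fs.inotify.{name} = {current} (recommended: {wanted})")
```

An empty result means every readable limit is at or above the recommendation. On a non-Linux host the result is also empty because nothing can be read.
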
#### For Kind Clusters

```bash
# Delete and recreate cluster
kind delete cluster
kind create cluster
```

#### For Minikube

```bash
minikube stop
minikube start
```

### Prevention

Add a security context to your deployments so that containers run as a non-root user without privilege escalation. Note that a security context does not change inotify limits; those are set on the node, as described above.

```yaml
securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: false
```

## RouteBuilder TypeError Fix

### Symptoms
```
TypeError: RouteBuilder.build_resource_detail_route() takes from 2 to 4 positional arguments but 5 were given
```

### Root Cause
Incorrect usage of RouteBuilder methods. Python counts the implicit `self` in the totals it reports, so `build_resource_detail_route` accepts one to three explicit arguments but was being called with four.

### Solution

Use the correct RouteBuilder methods:

- **For nested resources**: Use `build_nested_resource_route()`
  ```python
  # Wrong
  route_builder.build_resource_detail_route("forecasts", "forecast_id", "feedback")

  # Correct
  route_builder.build_nested_resource_route("forecasts", "forecast_id", "feedback")
  ```

- **For resource actions**: Use `build_resource_action_route()`
  ```python
  # Wrong
  route_builder.build_resource_detail_route("forecasts", "forecast_id", "feedback", "retrain")

  # Correct
  route_builder.build_resource_action_route("forecasts", "forecast_id", "retrain")
  ```

### Files Fixed
- `services/forecasting/app/api/forecast_feedback.py`

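The argument-count mismatch can be reproduced with a small stand-in class. The signature below is an assumption inferred from the error message, not the actual code in `shared/routing/route_builder.py`. The parameter names and the returned path format are hypothetical.

```python
class RouteBuilder:
    """Hypothetical stand-in for the real RouteBuilder; signature inferred from the error."""

    def build_resource_detail_route(self, resource, id_param=None, sub_resource=None):
        # One required and two optional arguments, plus the implicit `self`:
        # Python reports this as "takes from 2 to 4 positional arguments".
        parts = [resource]
        if id_param is not None:
            parts.append("{" + id_param + "}")
        if sub_resource is not None:
            parts.append(sub_resource)
        return "/" + "/".join(parts)


route_builder = RouteBuilder()

# Three explicit arguments fit the assumed signature.
ok_path = route_builder.build_resource_detail_route("forecasts", "forecast_id", "feedback")

# A fourth explicit argument (five including `self`) raises the TypeError.
try:
    route_builder.build_resource_detail_route(
        "forecasts", "forecast_id", "feedback", "retrain"
    )
except TypeError as exc:
    message = str(exc)
    print(message)
```

When the reported counts look one higher than the arguments visible at the call site, the difference is the bound instance.
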
## General Kubernetes Troubleshooting

### Check Pod Status
```bash
kubectl get pods -n bakery-ia
kubectl describe pod distribution-service -n bakery-ia
```

### Check Logs
```bash
# Pod names carry a generated suffix, so address the deployment instead
kubectl logs deployment/distribution-service -n bakery-ia
kubectl logs -f deployment/distribution-service -n bakery-ia  # Follow logs
```

### Check Resource Usage
```bash
kubectl top pods -n bakery-ia
kubectl describe nodes | grep -A 10 "Allocated resources"
```

### Restart Deployment
```bash
kubectl rollout restart deployment distribution-service -n bakery-ia
```

### Scale Down/Up
```bash
kubectl scale deployment distribution-service -n bakery-ia --replicas=1
kubectl scale deployment distribution-service -n bakery-ia --replicas=2
```

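When the same checks are repeated often, they can be collected into one helper. The sketch below is based on the commands in this section and assumes `kubectl` is on the `PATH` and already configured for the cluster. The function names are hypothetical.

```python
import shutil
import subprocess


def triage_commands(namespace: str, deployment: str) -> list[list[str]]:
    """Build read-only kubectl invocations as argument lists."""
    return [
        ["kubectl", "get", "pods", "-n", namespace],
        ["kubectl", "describe", "pod", deployment, "-n", namespace],
        ["kubectl", "logs", f"deployment/{deployment}", "-n", namespace, "--tail=100"],
        ["kubectl", "top", "pods", "-n", namespace],
    ]


def run_triage(namespace: str, deployment: str) -> None:
    """Run each command in turn and print its output."""
    if shutil.which("kubectl") is None:
        print("kubectl not found on PATH; nothing to run")
        return
    for cmd in triage_commands(namespace, deployment):
        print("$ " + " ".join(cmd))
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout or result.stderr)
```

For example, `run_triage("bakery-ia", "distribution-service")` prints pod status, pod details, recent logs, and resource usage in one pass. Only read-only commands are included, so it does not restart or scale anything.
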
## Running Fix Scripts

### Fix Inotify Limits
```bash
cd scripts
./fix_kubernetes_inotify.sh
```

### Fix RouteBuilder Issues
The RouteBuilder issues have been fixed in the codebase. If you encounter similar issues:

1. Check the RouteBuilder method signatures in `shared/routing/route_builder.py`
2. Use the appropriate method for your routing pattern
3. Follow the examples in the fixed forecast feedback API

## Additional Resources

- [Kubernetes Inotify Limits](https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files)
- [Docker Desktop Resource Limits](https://docs.docker.com/desktop/settings/mac/#resources)
- [RouteBuilder Documentation](shared/routing/route_builder.py)

## Support

If issues persist after trying these solutions:

1. Check the specific error message and logs
2. Verify system resources (CPU, memory, disk)
3. Review recent changes to the codebase
4. Consult the architecture documentation for service boundaries
@@ -1,48 +0,0 @@
#!/bin/bash

# Script to fix "too many open files" error in Kubernetes
# This error occurs when the system hits inotify limits

echo "Fixing inotify limits for Kubernetes..."

# Check current inotify limits
echo "Current inotify limits:"
sysctl fs.inotify.max_user_watches
sysctl fs.inotify.max_user_instances
sysctl fs.inotify.max_queued_events

echo ""
echo "Increasing inotify limits..."

# Increase inotify limits (temporary - lasts until reboot)
sudo sysctl -w fs.inotify.max_user_watches=524288
sudo sysctl -w fs.inotify.max_user_instances=1024
sudo sysctl -w fs.inotify.max_queued_events=16384

# Verify the changes
echo ""
echo "New inotify limits:"
sysctl fs.inotify.max_user_watches
sysctl fs.inotify.max_user_instances
sysctl fs.inotify.max_queued_events

echo ""
echo "For permanent fix, add these lines to /etc/sysctl.conf:"
echo "fs.inotify.max_user_watches=524288"
echo "fs.inotify.max_user_instances=1024"
echo "fs.inotify.max_queued_events=16384"
echo ""
echo "Then run: sudo sysctl -p"

echo ""
echo "If you're using Docker Desktop or Kind, you may need to:"
echo "1. Restart Docker Desktop"
echo "2. Or for Kind: kind delete cluster && kind create cluster"
echo "3. Or adjust the node's system limits directly"

# The security context below hardens the pod; it does not change inotify limits
echo ""
echo "For production environments, consider adding a security context to your deployment:"
echo "securityContext:"
echo "  runAsUser: 1000"
echo "  runAsGroup: 1000"
echo "  fsGroup: 1000"
@@ -1,100 +0,0 @@
#!/bin/bash

# Script to fix "too many open files" error in Kubernetes
# This error occurs when the system hits inotify limits

echo "🔧 Fixing Kubernetes inotify limits..."

# Check if we're running on macOS (Docker Desktop) or Linux
if [[ "$(uname)" == "Darwin" ]]; then
    echo "🍎 Detected macOS - Docker Desktop environment"
    echo ""
    echo "For Docker Desktop on macOS, you need to:"
    echo "1. Open Docker Desktop settings"
    echo "2. Go to 'Resources' -> 'Advanced'"
    echo "3. Increase the memory allocation (recommended: 8GB+)"
    echo "4. Restart Docker Desktop"
    echo ""
    echo "Alternatively, you can run:"
    echo "docker system prune -a --volumes"
    echo "Then restart Docker Desktop"

    # macOS has no inotify; show the file descriptor limits instead
    echo ""
    echo "Checking current macOS file descriptor limits..."
    sysctl kern.maxfilesperproc
    sysctl kern.maxfiles

    echo ""
    echo "To increase macOS limits permanently, add to /etc/sysctl.conf:"
    echo "kern.maxfiles=1048576"
    echo "kern.maxfilesperproc=65536"
    echo "Then run: sudo sysctl -w kern.maxfiles=1048576"
    echo "And: sudo sysctl -w kern.maxfilesperproc=65536"

elif [[ "$(uname)" == "Linux" ]]; then
    echo "🐧 Detected Linux environment"

    # Check if we're in a Kubernetes cluster
    if kubectl cluster-info >/dev/null 2>&1; then
        echo "🎯 Detected Kubernetes cluster"

        # Check current inotify limits
        echo ""
        echo "Current inotify limits:"
        sysctl fs.inotify.max_user_watches
        sysctl fs.inotify.max_user_instances
        sysctl fs.inotify.max_queued_events

        # Increase limits temporarily
        echo ""
        echo "Increasing inotify limits temporarily..."
        sudo sysctl -w fs.inotify.max_user_watches=524288
        sudo sysctl -w fs.inotify.max_user_instances=1024
        sudo sysctl -w fs.inotify.max_queued_events=16384

        # Verify changes
        echo ""
        echo "New inotify limits:"
        sysctl fs.inotify.max_user_watches
        sysctl fs.inotify.max_user_instances
        sysctl fs.inotify.max_queued_events

        # Persist the limits; skip the append if an earlier run already added them
        if [[ -f /etc/sysctl.conf ]] && ! grep -q '^fs.inotify.max_user_watches' /etc/sysctl.conf; then
            echo ""
            echo "Making inotify limits permanent..."
            sudo bash -c 'cat >> /etc/sysctl.conf << EOF
# Increased inotify limits for Kubernetes
fs.inotify.max_user_watches=524288
fs.inotify.max_user_instances=1024
fs.inotify.max_queued_events=16384
EOF'
            sudo sysctl -p
        fi

        # Restart every running Docker container so it picks up the new limits
        echo ""
        echo "Checking for running containers that might need restarting..."
        docker ps --format "{{.Names}}" | while read -r container; do
            echo "Restarting container: $container"
            docker restart "$container" >/dev/null 2>&1 || echo "Failed to restart $container"
        done

    else
        echo "⚠️ Kubernetes cluster not detected"
        echo "This script should be run on a Kubernetes node or with kubectl access"
    fi
else
    echo "❓ Unsupported operating system: $(uname)"
fi

echo ""
echo "📋 Additional recommendations:"
echo "1. For Kind clusters: kind delete cluster && kind create cluster"
echo "2. For Minikube: minikube stop && minikube start"
echo "3. For production: Adjust node system limits and restart kubelet"
echo "4. Consider adding resource limits to your deployments"

echo ""
echo "✅ Inotify fix script completed!"