Delete files
@@ -1,195 +0,0 @@
# Deployment Troubleshooting Guide

This guide addresses common deployment issues encountered with the Bakery IA system.

## Table of Contents

- [Too Many Open Files Error](#too-many-open-files-error)
- [RouteBuilder TypeError Fix](#routebuilder-typeerror-fix)
- [General Kubernetes Troubleshooting](#general-kubernetes-troubleshooting)

## Too Many Open Files Error

### Symptoms
```
failed to create fsnotify watcher: too many open files
Error streaming distribution-service-7ff4db8c48-k4xw7 logs: failed to create fsnotify watcher: too many open files
```

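### Root Cause
This error occurs when the system hits its inotify limits. Kubernetes and Docker use inotify to monitor file system changes, and the kernel caps how many inotify instances and watches each user may hold (`fs.inotify.max_user_instances` and `fs.inotify.max_user_watches`). Hitting these caps is common in development environments with many containers.
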
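Before changing anything, it can help to see how the node's current limits compare with the values this guide recommends. The following is a minimal sketch, assuming a Linux host where the tunables are exposed under `/proc/sys/fs/inotify` (the same values `sysctl fs.inotify.*` prints). The function names `read_limits` and `below_recommended` are illustrative and not part of the Bakery IA codebase.

```python
from pathlib import Path

# Standard Linux location of the inotify tunables.
INOTIFY_DIR = Path("/proc/sys/fs/inotify")

# Values recommended in this guide.
RECOMMENDED = {
    "max_user_watches": 524288,
    "max_user_instances": 1024,
    "max_queued_events": 16384,
}


def read_limits(base: Path = INOTIFY_DIR) -> dict:
    """Return the current value of each tunable that exists under `base`."""
    limits = {}
    for name in RECOMMENDED:
        path = base / name
        if path.is_file():
            limits[name] = int(path.read_text().strip())
    return limits


def below_recommended(limits: dict) -> dict:
    """Map each tunable below the guide's value to (current, recommended)."""
    return {
        name: (current, RECOMMENDED[name])
        for name, current in limits.items()
        if current < RECOMMENDED[name]
    }
```

Calling `below_recommended(read_limits())` on a node returns an empty dict when every limit already meets the recommended value. On macOS the directory does not exist, so `read_limits()` returns an empty dict.
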
### Solutions

#### For macOS (Docker Desktop)

1. **Increase Docker Resources**:
   - Open Docker Desktop
   - Go to Settings > Resources > Advanced
   - Increase memory allocation to 8GB or more
   - Restart Docker Desktop

2. **Clean Docker System**:
   ```bash
   # Warning: deletes stopped containers plus unused images, networks, and volumes
   docker system prune -a --volumes
   ```

3. **Adjust macOS System Limits** (these are file descriptor limits; macOS itself has no inotify):
   ```bash
   # Add to /etc/sysctl.conf
   echo "kern.maxfiles=1048576" | sudo tee -a /etc/sysctl.conf
   echo "kern.maxfilesperproc=65536" | sudo tee -a /etc/sysctl.conf

   # Apply changes
   sudo sysctl -w kern.maxfiles=1048576
   sudo sysctl -w kern.maxfilesperproc=65536
   ```

#### For Linux (Kubernetes Nodes)

1. **Temporary Fix**:
   ```bash
   sudo sysctl -w fs.inotify.max_user_watches=524288
   sudo sysctl -w fs.inotify.max_user_instances=1024
   sudo sysctl -w fs.inotify.max_queued_events=16384
   ```

2. **Permanent Fix**:
   ```bash
   # Add to /etc/sysctl.conf
   echo "fs.inotify.max_user_watches=524288" | sudo tee -a /etc/sysctl.conf
   echo "fs.inotify.max_user_instances=1024" | sudo tee -a /etc/sysctl.conf
   echo "fs.inotify.max_queued_events=16384" | sudo tee -a /etc/sysctl.conf

   # Apply changes
   sudo sysctl -p
   ```

3. **Restart Kubernetes Components**:
   ```bash
   sudo systemctl restart kubelet
   sudo systemctl restart docker
   ```

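Appending with `tee -a` adds the same lines again each time the permanent fix is repeated. One way to avoid duplicates is to merge the settings into the existing file content instead. The following is a minimal sketch under that assumption; `merge_sysctl` is a hypothetical helper, and it only transforms text, leaving the actual write to `/etc/sysctl.conf` (with appropriate privileges) to the caller.

```python
def merge_sysctl(existing: str, settings: dict) -> str:
    """Return `existing` with each key in `settings` set exactly once."""
    remaining = dict(settings)
    out = []
    for line in existing.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith(("#", ";"))
        key = stripped.split("=", 1)[0].strip() if "=" in stripped else None
        if not is_comment and key in remaining:
            # First occurrence of a managed key: rewrite it in place.
            out.append(f"{key}={remaining.pop(key)}")
        elif not is_comment and key in settings:
            # Later duplicate of a managed key: drop it.
            continue
        else:
            # Comments, blank lines, and unrelated keys pass through unchanged.
            out.append(line)
    # Append managed keys that were not present at all.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"
```

Running the function twice on its own output yields the same text, so the fix can be reapplied safely before `sudo sysctl -p`.
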
#### For Kind Clusters

```bash
# Delete and recreate cluster
kind delete cluster
kind create cluster
```

#### For Minikube

```bash
minikube stop
minikube start
```

### Prevention

Add a security context to your deployments so containers run as a non-root user and cannot escalate privileges. This hardens the workload; it does not change the node's inotify limits, which still need the settings above:

```yaml
securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: false
```

## RouteBuilder TypeError Fix

### Symptoms
```
TypeError: RouteBuilder.build_resource_detail_route() takes from 2 to 4 positional arguments but 5 were given
```

### Root Cause
Incorrect usage of RouteBuilder methods. Python counts `self` in this message, so `build_resource_detail_route` accepts one to three explicit arguments but was being called with four.

### Solution

Use the correct RouteBuilder methods:

- **For nested resources**: Use `build_nested_resource_route()`
  ```python
  # Wrong
  route_builder.build_resource_detail_route("forecasts", "forecast_id", "feedback")

  # Correct
  route_builder.build_nested_resource_route("forecasts", "forecast_id", "feedback")
  ```

- **For resource actions**: Use `build_resource_action_route()`
  ```python
  # Wrong
  route_builder.build_resource_detail_route("forecasts", "forecast_id", "feedback", "retrain")

  # Correct
  route_builder.build_resource_action_route("forecasts", "forecast_id", "retrain")
  ```

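To make the argument counting concrete, the error can be reproduced with a stand-in class. This is only a sketch: the `RouteBuilder` below is hypothetical, and its parameter names, defaults, and returned paths are assumptions chosen to match the error message, not the real implementation in `shared/routing/route_builder.py`.

```python
class RouteBuilder:
    """Hypothetical stand-in; signatures and paths are assumptions."""

    def build_resource_detail_route(self, resource, id_param="id", prefix=""):
        # self + 1 required + 2 optional = "from 2 to 4 positional arguments"
        return f"{prefix}/{resource}/{{{id_param}}}"

    def build_nested_resource_route(self, parent, parent_id_param, child):
        return f"/{parent}/{{{parent_id_param}}}/{child}"

    def build_resource_action_route(self, resource, id_param, action):
        return f"/{resource}/{{{id_param}}}/{action}"


route_builder = RouteBuilder()

try:
    # Four explicit arguments plus the implicit `self` makes five.
    route_builder.build_resource_detail_route(
        "forecasts", "forecast_id", "feedback", "retrain"
    )
except TypeError as exc:
    error_message = str(exc)

# The action route takes exactly the three explicit arguments it needs.
action_route = route_builder.build_resource_action_route(
    "forecasts", "forecast_id", "retrain"
)
```

Here `error_message` ends with "takes from 2 to 4 positional arguments but 5 were given", the same count as in the symptom above.
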
### Files Fixed
- `services/forecasting/app/api/forecast_feedback.py`

## General Kubernetes Troubleshooting

### Check Pod Status
```bash
kubectl get pods -n bakery-ia
# Substitute a full pod name from the output above, for example:
kubectl describe pod distribution-service-7ff4db8c48-k4xw7 -n bakery-ia
```

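When a namespace holds many pods, filtering the status output programmatically can be quicker than reading the table. The following is a minimal sketch, assuming `kubectl` is on the `PATH` and the JSON layout produced by `kubectl get pods -o json` (an `items` list with `metadata.name` and `status.phase`). The function names are illustrative.

```python
import json
import subprocess


def pods_not_running(pod_list: dict) -> list:
    """Return (pod name, phase) for pods whose phase is not Running or Succeeded."""
    result = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in ("Running", "Succeeded"):
            result.append((pod["metadata"]["name"], phase))
    return result


def fetch_pods(namespace: str = "bakery-ia") -> dict:
    """Run `kubectl get pods -o json` for the namespace and parse the output."""
    completed = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)
```

`pods_not_running(fetch_pods())` lists the pods worth passing to `kubectl describe pod`. A pod in the `Running` phase can still be restarting repeatedly, so treat an empty result as a first check only.
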
### Check Logs
```bash
# deployment/<name> selects a pod of the deployment; a bare deployment name is not a pod name
kubectl logs deployment/distribution-service -n bakery-ia
kubectl logs -f deployment/distribution-service -n bakery-ia  # Follow logs
```

### Check Resource Usage
```bash
kubectl top pods -n bakery-ia
kubectl describe nodes | grep -A 10 "Allocated resources"
```

### Restart Deployment
```bash
kubectl rollout restart deployment distribution-service -n bakery-ia
```

### Scale Down/Up
```bash
kubectl scale deployment distribution-service -n bakery-ia --replicas=1
kubectl scale deployment distribution-service -n bakery-ia --replicas=2
```

## Running Fix Scripts

### Fix Inotify Limits
```bash
cd scripts
./fix_kubernetes_inotify.sh
```

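Raising the limits treats the symptom; it can also be useful to see which processes hold the inotify instances. On Linux, each inotify file descriptor appears under `/proc/<pid>/fd` as a symlink to `anon_inode:inotify`. The following is a minimal sketch built on that fact; `inotify_instances_by_pid` is an illustrative name, and reading other users' processes usually requires root.

```python
import os
from collections import Counter
from pathlib import Path


def inotify_instances_by_pid(proc_root: Path = Path("/proc")) -> Counter:
    """Count inotify file descriptors held by each process under `proc_root`."""
    counts = Counter()
    for pid_dir in proc_root.iterdir():
        if not pid_dir.name.isdigit():
            continue  # Skip non-process entries such as /proc/self or /proc/sys.
        try:
            fds = list((pid_dir / "fd").iterdir())
        except OSError:
            # The process exited or we lack permission; skip it.
            continue
        for fd in fds:
            try:
                target = os.readlink(fd)
            except OSError:
                continue
            if target == "anon_inode:inotify":
                counts[int(pid_dir.name)] += 1
    return counts
```

`inotify_instances_by_pid().most_common(5)` gives the five heaviest users as `(pid, count)` pairs, which can be compared against `fs.inotify.max_user_instances`.
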
### Fix RouteBuilder Issues
The RouteBuilder issues have been fixed in the codebase. If you encounter similar issues:

1. Check the RouteBuilder method signatures in `shared/routing/route_builder.py`
2. Use the appropriate method for your routing pattern
3. Follow the examples in the fixed forecast feedback API

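Step 1 can be automated with the standard library: `inspect.signature(...).bind(...)` checks whether a set of arguments fits a method without calling it. The following is a minimal sketch; the `RouteBuilder` class is a hypothetical stand-in with an assumed signature, and `call_error` is an illustrative helper name.

```python
import inspect
from typing import Optional


class RouteBuilder:
    """Hypothetical stand-in for the class in shared/routing/route_builder.py."""

    def build_resource_detail_route(self, resource, id_param="id", prefix=""):
        # Assumed signature: one required and two optional arguments.
        return f"{prefix}/{resource}/{{{id_param}}}"


def call_error(method, *args, **kwargs) -> Optional[str]:
    """Return None if the arguments fit the method's signature, else the reason."""
    try:
        # For a bound method the signature already excludes `self`.
        inspect.signature(method).bind(*args, **kwargs)
    except TypeError as exc:
        return str(exc)
    return None
```

A check such as `call_error(route_builder.build_resource_detail_route, "forecasts", "forecast_id", "feedback", "retrain")` returns a message instead of raising, which makes it easy to use in a unit test that guards route definitions.
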
## Additional Resources

- [Kubernetes Inotify Limits](https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files)
- [Docker Desktop Resource Limits](https://docs.docker.com/desktop/settings/mac/#resources)
- [RouteBuilder Documentation](shared/routing/route_builder.py)

## Support

If issues persist after trying these solutions:

1. Check the specific error message and logs
2. Verify system resources (CPU, memory, disk)
3. Review recent changes to the codebase
4. Consult the architecture documentation for service boundaries
@@ -1,48 +0,0 @@
#!/bin/bash

# Script to fix "too many open files" error in Kubernetes
# This error occurs when the system hits inotify limits

echo "Fixing inotify limits for Kubernetes..."

# Check current inotify limits
echo "Current inotify limits:"
sysctl fs.inotify.max_user_watches
sysctl fs.inotify.max_user_instances
sysctl fs.inotify.max_queued_events

echo ""
echo "Increasing inotify limits..."

# Increase inotify limits (temporary - lasts until reboot)
sudo sysctl -w fs.inotify.max_user_watches=524288
sudo sysctl -w fs.inotify.max_user_instances=1024
sudo sysctl -w fs.inotify.max_queued_events=16384

# Verify the changes
echo ""
echo "New inotify limits:"
sysctl fs.inotify.max_user_watches
sysctl fs.inotify.max_user_instances
sysctl fs.inotify.max_queued_events

echo ""
echo "For permanent fix, add these lines to /etc/sysctl.conf:"
echo "fs.inotify.max_user_watches=524288"
echo "fs.inotify.max_user_instances=1024"
echo "fs.inotify.max_queued_events=16384"
echo ""
echo "Then run: sudo sysctl -p"

echo ""
echo "If you're using Docker Desktop or Kind, you may need to:"
echo "1. Restart Docker Desktop"
echo "2. Or for Kind: kind delete cluster && kind create cluster"
echo "3. Or adjust the node's system limits directly"

echo ""
echo "For production environments, consider adding these limits to your deployment:"
echo "securityContext:"
echo "  runAsUser: 1000"
echo "  runAsGroup: 1000"
echo "  fsGroup: 1000"
@@ -1,100 +0,0 @@
#!/bin/bash

# Script to fix "too many open files" error in Kubernetes
# This error occurs when the system hits inotify limits

echo "🔧 Fixing Kubernetes inotify limits..."

# Check if we're running on macOS (Docker Desktop) or Linux
if [[ "$(uname)" == "Darwin" ]]; then
    echo "🍎 Detected macOS - Docker Desktop environment"
    echo ""
    echo "For Docker Desktop on macOS, you need to:"
    echo "1. Open Docker Desktop settings"
    echo "2. Go to 'Resources' -> 'Advanced'"
    echo "3. Increase the memory allocation (recommended: 8GB+)"
    echo "4. Restart Docker Desktop"
    echo ""
    echo "Alternatively, you can run:"
    echo "docker system prune -a --volumes"
    echo "Then restart Docker Desktop"

    # Also check if we can adjust macOS system limits
    # (macOS has no inotify; these are the file descriptor limits)
    echo ""
    echo "Checking current macOS file descriptor limits..."
    sysctl kern.maxfilesperproc
    sysctl kern.maxfiles

    echo ""
    echo "To increase macOS limits permanently, add to /etc/sysctl.conf:"
    echo "kern.maxfiles=1048576"
    echo "kern.maxfilesperproc=65536"
    echo "Then run: sudo sysctl -w kern.maxfiles=1048576"
    echo "And: sudo sysctl -w kern.maxfilesperproc=65536"

elif [[ "$(uname)" == "Linux" ]]; then
    echo "🐧 Detected Linux environment"

    # Check if we're in a Kubernetes cluster
    if kubectl cluster-info >/dev/null 2>&1; then
        echo "🎯 Detected Kubernetes cluster"

        # Check current inotify limits
        echo ""
        echo "Current inotify limits:"
        sysctl fs.inotify.max_user_watches
        sysctl fs.inotify.max_user_instances
        sysctl fs.inotify.max_queued_events

        # Increase limits temporarily
        echo ""
        echo "Increasing inotify limits temporarily..."
        sudo sysctl -w fs.inotify.max_user_watches=524288
        sudo sysctl -w fs.inotify.max_user_instances=1024
        sudo sysctl -w fs.inotify.max_queued_events=16384

        # Verify changes
        echo ""
        echo "New inotify limits:"
        sysctl fs.inotify.max_user_watches
        sysctl fs.inotify.max_user_instances
        sysctl fs.inotify.max_queued_events

        # Check if we can make permanent changes.
        # Skip when the settings are already present so reruns do not append duplicates.
        if [[ -f /etc/sysctl.conf ]] && ! grep -q '^fs.inotify.max_user_watches' /etc/sysctl.conf; then
            echo ""
            echo "Making inotify limits permanent..."
            sudo bash -c 'cat >> /etc/sysctl.conf << EOF
# Increased inotify limits for Kubernetes
fs.inotify.max_user_watches=524288
fs.inotify.max_user_instances=1024
fs.inotify.max_queued_events=16384
EOF'
            sudo sysctl -p
        fi

        # Check for Docker containers that might need restarting
        echo ""
        echo "Checking for running containers that might need restarting..."
        docker ps --format "{{.Names}}" | while read -r container; do
            echo "Restarting container: $container"
            docker restart "$container" >/dev/null 2>&1 || echo "Failed to restart $container"
        done

    else
        echo "⚠️ Kubernetes cluster not detected"
        echo "This script should be run on a Kubernetes node or with kubectl access"
    fi
else
    echo "❓ Unsupported operating system: $(uname)"
fi

echo ""
echo "📋 Additional recommendations:"
echo "1. For Kind clusters: kind delete cluster && kind create cluster"
echo "2. For Minikube: minikube stop && minikube start"
echo "3. For production: Adjust node system limits and restart kubelet"
echo "4. Consider adding resource limits to your deployments"

echo ""
echo "✅ Inotify fix script completed!"