# Deployment Troubleshooting Guide
This guide addresses common deployment issues encountered with the Bakery IA system.
## Table of Contents
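- [Too Many Open Files Error](#too-many-open-files-error)
- [RouteBuilder TypeError Fix](#routebuilder-typeerror-fix)
- [General Kubernetes Troubleshooting](#general-kubernetes-troubleshooting)
- [Running Fix Scripts](#running-fix-scripts)
- [Additional Resources](#additional-resources)
- [Support](#support)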
## Too Many Open Files Error
### Symptoms
```
failed to create fsnotify watcher: too many open files
Error streaming distribution-service-7ff4db8c48-k4xw7 logs: failed to create fsnotify watcher: too many open files
```
### Root Cause
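This error occurs when the system hits its inotify limits. Kubernetes and Docker use inotify to monitor file system changes, and each fsnotify watcher (such as the one created when streaming pod logs) draws on a per-user quota of inotify instances and watches. Exhausting that quota is common in development environments that run many containers.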
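To find out which processes are holding inotify instances, one option is to walk `/proc`. The following is a minimal sketch, assuming a Linux host where each inotify descriptor appears under `/proc/<pid>/fd` as a symlink to `anon_inode:inotify`. The function name `count_inotify_instances` and its `proc_root` parameter are illustrative and not part of the Bakery IA codebase.

```python
import os
from pathlib import Path


def count_inotify_instances(proc_root="/proc"):
    """Return {pid: number of inotify file descriptors} for readable processes.

    Assumes a Linux-style /proc layout, where each inotify instance shows up
    as a symlink to "anon_inode:inotify" under /proc/<pid>/fd.
    """
    counts = {}
    root = Path(proc_root)
    if not root.is_dir():
        return counts  # e.g. macOS, where /proc does not exist
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/sys
        try:
            fds = list((entry / "fd").iterdir())
        except OSError:
            continue  # process exited or belongs to another user
        total = 0
        for fd in fds:
            try:
                if os.readlink(fd) == "anon_inode:inotify":
                    total += 1
            except OSError:
                continue  # descriptor closed while we were scanning
        if total:
            counts[int(entry.name)] = total
    return counts


if __name__ == "__main__":
    # Print the heaviest users first.
    for pid, n in sorted(count_inotify_instances().items(), key=lambda kv: -kv[1]):
        print(f"pid {pid}: {n} inotify instance(s)")
```

Running it without root only reports processes owned by the current user. Running it with `sudo` gives a fuller picture.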
### Solutions
#### For macOS (Docker Desktop)
1. **Increase Docker Resources**:
- Open Docker Desktop
- Go to Settings > Resources > Advanced
- Increase memory allocation to 8GB or more
- Restart Docker Desktop
2. **Clean Docker System**:
```bash
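# Destructive: deletes stopped containers, unused networks and images, build cache, and unused volumes
docker system prune -a --volumes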
```
3. **Adjust macOS System Limits**:
```bash
# Add to /etc/sysctl.conf
echo "kern.maxfiles=1048576" | sudo tee -a /etc/sysctl.conf
echo "kern.maxfilesperproc=65536" | sudo tee -a /etc/sysctl.conf
# Apply changes
sudo sysctl -w kern.maxfiles=1048576
sudo sysctl -w kern.maxfilesperproc=65536
```
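The `kern.*` values are system-wide ceilings. An individual process is also bounded by its own open-file limit (`RLIMIT_NOFILE`). As a quick check, the sketch below reads that per-process limit with Python's standard `resource` module, which is available on macOS and Linux. The helper names are illustrative.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe_limit(value):
    """Render a limit, mapping RLIM_INFINITY to a readable label."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft={describe_limit(soft)} hard={describe_limit(hard)}")
```

A low soft limit here can produce "too many open files" errors in a single process even when the system-wide values are generous.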
#### For Linux (Kubernetes Nodes)
1. **Temporary Fix**:
```bash
sudo sysctl -w fs.inotify.max_user_watches=524288
sudo sysctl -w fs.inotify.max_user_instances=1024
sudo sysctl -w fs.inotify.max_queued_events=16384
```
2. **Permanent Fix**:
```bash
# Add to /etc/sysctl.conf
echo "fs.inotify.max_user_watches=524288" | sudo tee -a /etc/sysctl.conf
echo "fs.inotify.max_user_instances=1024" | sudo tee -a /etc/sysctl.conf
echo "fs.inotify.max_queued_events=16384" | sudo tee -a /etc/sysctl.conf
# Apply changes
sudo sysctl -p
```
3. **Restart Kubernetes Components**:
```bash
sudo systemctl restart kubelet
sudo systemctl restart docker
```
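Before changing anything, it can help to compare the node's current limits with the values used in this guide. The sketch below assumes the standard Linux location `/proc/sys/fs/inotify`. The function names are illustrative, and the script only prints the `sysctl -w` commands that would be needed without applying them.

```python
from pathlib import Path

# Values taken from the sysctl commands in this guide.
RECOMMENDED = {
    "max_user_watches": 524288,
    "max_user_instances": 1024,
    "max_queued_events": 16384,
}


def read_inotify_limits(base="/proc/sys/fs/inotify"):
    """Read the current limits; a missing or unreadable file maps to None."""
    limits = {}
    for name in RECOMMENDED:
        try:
            limits[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            limits[name] = None
    return limits


def sysctl_commands_needed(current, recommended=RECOMMENDED):
    """Return `sysctl -w` commands for known limits below the recommended value."""
    commands = []
    for name, wanted in recommended.items():
        value = current.get(name)
        if value is not None and value < wanted:
            commands.append(f"sudo sysctl -w fs.inotify.{name}={wanted}")
    return commands


if __name__ == "__main__":
    for command in sysctl_commands_needed(read_inotify_limits()):
        print(command)
```

Limits that cannot be read (for example on a non-Linux host) are skipped rather than reported, so an empty output does not prove the limits are sufficient there.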
#### For Kind Clusters
```bash
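# kind nodes share the host kernel, so on Linux raise the host inotify limits first (see the Linux section)
# Then delete and recreate the cluster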
kind delete cluster
kind create cluster
```
#### For Minikube
```bash
minikube stop
minikube start
```
### Prevention
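Add a security context to each container in your deployments so that it runs as a non-root user and cannot escalate privileges. This hardens the pods but does not by itself reduce inotify usage, so keep the raised limits described above in place:
```yaml
securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: false
```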
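To check whether a container spec carries these settings, one could compare it with the expected values. The sketch below works on a plain dictionary, such as one container entry from `kubectl get deployment ... -o json`. The function name `security_context_drift` and the `EXPECTED` table are illustrative assumptions, not project code.

```python
# Expected values mirror the securityContext snippet in this guide.
EXPECTED = {
    "runAsUser": 1000,
    "runAsGroup": 1000,
    "allowPrivilegeEscalation": False,
    "readOnlyRootFilesystem": False,
}


def security_context_drift(container, expected=EXPECTED):
    """Return {field: (actual, wanted)} for fields that differ from `expected`.

    A field that is absent from the container spec is reported with actual=None.
    """
    actual = container.get("securityContext") or {}
    return {
        field: (actual.get(field), wanted)
        for field, wanted in expected.items()
        if actual.get(field) != wanted
    }


if __name__ == "__main__":
    # Made-up container spec for illustration.
    container = {"name": "example", "securityContext": {"runAsUser": 0}}
    for field, (have, want) in security_context_drift(container).items():
        print(f"{field}: have {have!r}, want {want!r}")
```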
## RouteBuilder TypeError Fix
### Symptoms
```
TypeError: RouteBuilder.build_resource_detail_route() takes from 2 to 4 positional arguments but 5 were given
```
### Root Cause
Incorrect usage of RouteBuilder methods. Python's count in the error message includes the implicit `self`, so `build_resource_detail_route` accepts one to three explicit arguments but was being called with four.
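The error message can be reproduced with a small stand-in class. This is a sketch under stated assumptions: the class below is hypothetical, with one required and two optional parameters chosen only to match the "from 2 to 4" wording. Its parameter names and return format do not come from `shared/routing/route_builder.py`.

```python
class RouteBuilder:
    """Hypothetical stand-in; the real class lives in shared/routing/route_builder.py."""

    def build_resource_detail_route(self, resource, id_param="id", sub_resource=None):
        # Illustrative path format only, not the project's actual output.
        path = f"/{resource}/{{{id_param}}}"
        return f"{path}/{sub_resource}" if sub_resource else path


def call_with(*args):
    """Call the method and return either its result or the TypeError text."""
    try:
        return RouteBuilder().build_resource_detail_route(*args)
    except TypeError as exc:
        return str(exc)


if __name__ == "__main__":
    # Three explicit arguments fit the signature.
    print(call_with("forecasts", "forecast_id", "feedback"))
    # Four explicit arguments plus self make five, which raises the TypeError.
    print(call_with("forecasts", "forecast_id", "feedback", "retrain"))
```

The second call passes four explicit arguments. Python adds `self` to the count, which is why the message reports that 5 were given.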
### Solution
Use the correct RouteBuilder methods:
- **For nested resources**: Use `build_nested_resource_route()`
```python
# Wrong
route_builder.build_resource_detail_route("forecasts", "forecast_id", "feedback")
# Correct
route_builder.build_nested_resource_route("forecasts", "forecast_id", "feedback")
```
- **For resource actions**: Use `build_resource_action_route()`
```python
# Wrong: four arguments raise the TypeError shown under Symptoms
route_builder.build_resource_detail_route("forecasts", "forecast_id", "feedback", "retrain")
# Correct
route_builder.build_resource_action_route("forecasts", "forecast_id", "retrain")
```
### Files Fixed
- `services/forecasting/app/api/forecast_feedback.py`
## General Kubernetes Troubleshooting
### Check Pod Status
```bash
kubectl get pods -n bakery-ia
# Replace <pod-name> with a name from the list above, e.g. distribution-service-7ff4db8c48-k4xw7
kubectl describe pod <pod-name> -n bakery-ia
```
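When the namespace holds many pods, the same check can be scripted against `kubectl get pods -n bakery-ia -o json`. The sketch below filters the parsed output by the standard `status.phase` field. The function name is illustrative, and the sample data is inlined so the example runs without a cluster.

```python
import json


def pods_not_running(pod_list):
    """Return (name, phase) for pods whose phase is neither Running nor Succeeded.

    `pod_list` is the parsed output of `kubectl get pods -n bakery-ia -o json`.
    """
    problems = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in ("Running", "Succeeded"):
            problems.append((pod["metadata"]["name"], phase))
    return problems


if __name__ == "__main__":
    # Inline sample so the sketch runs without a cluster; the second pod name is made up.
    sample = json.loads("""
    {"items": [
      {"metadata": {"name": "distribution-service-7ff4db8c48-k4xw7"}, "status": {"phase": "Running"}},
      {"metadata": {"name": "example-pod"}, "status": {"phase": "Pending"}}
    ]}
    """)
    print(pods_not_running(sample))
```

A pod in the `Running` phase can still have crashing containers, so `kubectl describe` remains the more reliable check for a specific pod.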
### Check Logs
```bash
# Pod names carry a generated suffix, so address the deployment instead
kubectl logs deployment/distribution-service -n bakery-ia
kubectl logs -f deployment/distribution-service -n bakery-ia # Follow logs
```
### Check Resource Usage
```bash
kubectl top pods -n bakery-ia
kubectl describe nodes | grep -A 10 "Allocated resources"
```
### Restart Deployment
```bash
kubectl rollout restart deployment distribution-service -n bakery-ia
```
### Scale Down/Up
```bash
kubectl scale deployment distribution-service -n bakery-ia --replicas=1
kubectl scale deployment distribution-service -n bakery-ia --replicas=2
```
## Running Fix Scripts
### Fix Inotify Limits
```bash
cd scripts
./fix_kubernetes_inotify.sh
```
### Fix RouteBuilder Issues
The RouteBuilder issues have been fixed in the codebase. If you encounter similar issues:
1. Check the RouteBuilder method signatures in `shared/routing/route_builder.py`
2. Use the appropriate method for your routing pattern
3. Follow the examples in the fixed forecast feedback API
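Step 1 can be done from a Python shell with the standard `inspect` module. The sketch below lists the public methods of a class together with their signatures. It uses a hypothetical `ExampleBuilder` so that it runs on its own. In the project one would pass the real `RouteBuilder` class, whose import path is assumed here from the file location.

```python
import inspect


def describe_public_methods(cls):
    """Map each public method name of `cls` to its signature string."""
    return {
        name: str(inspect.signature(member))
        for name, member in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_")
    }


class ExampleBuilder:
    """Hypothetical stand-in so the sketch runs without the project on the path."""

    def build_detail(self, resource, id_param="id", sub_resource=None):
        return resource


if __name__ == "__main__":
    # In the project, one would pass the real class instead, for example:
    #   from shared.routing.route_builder import RouteBuilder
    # (import path assumed from the file location given in this guide)
    for name, signature in describe_public_methods(ExampleBuilder).items():
        print(f"{name}{signature}")
```

Counting the parameters after `self` in the printed signature gives the number of arguments a call may pass.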
## Additional Resources
- [Kubernetes Inotify Limits](https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files)
- [Docker Desktop Resource Limits](https://docs.docker.com/desktop/settings/mac/#resources)
- [RouteBuilder Documentation](shared/routing/route_builder.py)
## Support
If issues persist after trying these solutions:
1. Check the specific error message and logs
2. Verify system resources (CPU, memory, disk)
3. Review recent changes to the codebase
4. Consult the architecture documentation for service boundaries