# Deployment Troubleshooting Guide

This guide addresses common deployment issues encountered with the Bakery IA system.

## Table of Contents

- [Too Many Open Files Error](#too-many-open-files-error)
- [RouteBuilder TypeError Fix](#routebuilder-typeerror-fix)
- [General Kubernetes Troubleshooting](#general-kubernetes-troubleshooting)

## Too Many Open Files Error

### Symptoms
```
failed to create fsnotify watcher: too many open files
Error streaming distribution-service-7ff4db8c48-k4xw7 logs: failed to create fsnotify watcher: too many open files
```

### Root Cause
This error occurs when the system hits the Linux kernel's inotify limits (the maximum number of watches and watcher instances per user). Kubernetes and Docker use inotify to monitor file system changes, and the `fsnotify` watcher named in the message is backed by inotify on Linux. Hitting these limits is common in development environments with many containers.

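To see how close a node is to the problem, the current limits can be read directly from `/proc/sys/fs/inotify`. The following is a minimal sketch and not part of the Bakery IA codebase: the function names are illustrative, and the comparison values are the ones this guide suggests for Linux nodes.

```python
from pathlib import Path

# Values suggested in the Linux section of this guide (not kernel defaults).
RECOMMENDED = {
    "max_user_watches": 524288,
    "max_user_instances": 1024,
    "max_queued_events": 16384,
}


def read_inotify_limits(base=Path("/proc/sys/fs/inotify")):
    """Return {name: int} for each inotify limit file that exists under base."""
    limits = {}
    for name in RECOMMENDED:
        path = Path(base) / name
        if path.is_file():
            limits[name] = int(path.read_text().strip())
    return limits


def below_recommended(limits):
    """Return the names whose current value is lower than the suggested one."""
    return sorted(n for n, v in limits.items() if v < RECOMMENDED[n])


if __name__ == "__main__":
    current = read_inotify_limits()
    if not current:
        print("No inotify limits found (probably not a Linux host)")
    for name, value in current.items():
        print(f"fs.inotify.{name} = {value}")
    print("Below suggested values:", below_recommended(current))
```

On macOS the files do not exist on the host, so the script reports that nothing was found; the limits that matter there live inside the Docker Desktop virtual machine.
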
### Solutions

#### For macOS (Docker Desktop)

1. **Increase Docker Resources**:
   - Open Docker Desktop
   - Go to Settings > Resources > Advanced
   - Increase memory allocation to 8GB or more
   - Restart Docker Desktop

2. **Clean Docker System**:
   ```bash
   # Warning: deletes stopped containers and all unused images, networks, and volumes
   docker system prune -a --volumes
   ```

3. **Adjust macOS System Limits**:
   ```bash
   # Add to /etc/sysctl.conf
   echo "kern.maxfiles=1048576" | sudo tee -a /etc/sysctl.conf
   echo "kern.maxfilesperproc=65536" | sudo tee -a /etc/sysctl.conf

   # Apply changes
   sudo sysctl -w kern.maxfiles=1048576
   sudo sysctl -w kern.maxfilesperproc=65536
   ```

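The kernel-wide values above only cap what a process may request; each process also carries its own soft and hard limit on open file descriptors. As a rough check, the sketch below reads those per-process limits with Python's standard `resource` module (Unix only). The helper names are illustrative and not part of the project.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```

A low soft limit in the shell that launches your tooling can trigger the same error message even when the kernel-wide values are generous.
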
#### For Linux (Kubernetes Nodes)

1. **Temporary Fix**:
   ```bash
   # Takes effect immediately but is lost on reboot
   sudo sysctl -w fs.inotify.max_user_watches=524288
   sudo sysctl -w fs.inotify.max_user_instances=1024
   sudo sysctl -w fs.inotify.max_queued_events=16384
   ```

2. **Permanent Fix**:
   ```bash
   # Add to /etc/sysctl.conf
   echo "fs.inotify.max_user_watches=524288" | sudo tee -a /etc/sysctl.conf
   echo "fs.inotify.max_user_instances=1024" | sudo tee -a /etc/sysctl.conf
   echo "fs.inotify.max_queued_events=16384" | sudo tee -a /etc/sysctl.conf

   # Apply changes
   sudo sysctl -p
   ```

3. **Restart Kubernetes Components**:
   ```bash
   sudo systemctl restart kubelet
   # Only on nodes that use Docker as the container runtime
   sudo systemctl restart docker
   ```

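Because `tee -a` appends, running the permanent fix more than once leaves duplicate entries in `/etc/sysctl.conf`. One way to avoid that is to merge the settings instead of appending them. The sketch below shows the idea on plain text; `merge_sysctl` and `INOTIFY_SETTINGS` are hypothetical names, and writing the result back to the file (with root privileges) is left out.

```python
# The three values from the permanent fix in this guide.
INOTIFY_SETTINGS = {
    "fs.inotify.max_user_watches": 524288,
    "fs.inotify.max_user_instances": 1024,
    "fs.inotify.max_queued_events": 16384,
}


def merge_sysctl(conf_text, settings):
    """Return conf_text with each key in settings assigned exactly once.

    The first assignment of a managed key is rewritten in place, later
    duplicates are dropped, and missing keys are appended. Comments and
    unrelated lines are kept unchanged.
    """
    seen = set()
    out = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip() if "=" in stripped else None
        if key in settings and not stripped.startswith(("#", ";")):
            if key not in seen:
                out.append(f"{key}={settings[key]}")
                seen.add(key)
            continue  # drop repeated assignments of a managed key
        out.append(line)
    for key, value in settings.items():
        if key not in seen:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "# tuning\nvm.swappiness=10\nfs.inotify.max_user_watches=8192\n"
    print(merge_sysctl(sample, INOTIFY_SETTINGS), end="")
```

Running the function on its own output returns the same text, so it is safe to apply repeatedly.
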
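#### For Kind Clusters

On a Linux host, Kind nodes are containers that share the host kernel, so raise the inotify limits on the host first (see the Linux steps above) and then recreate the cluster:

```bash
# Delete and recreate cluster
kind delete cluster
kind create cluster
```

#### For Minikube

```bash
minikube stop
minikube start
```
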
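### Prevention

Add a container-level security context to your deployments so that containers run as a non-root user and cannot escalate privileges. This hardens the workload; the inotify limits themselves are node-level kernel settings and still need the fixes described above.

```yaml
securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: false
```
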
## RouteBuilder TypeError Fix

### Symptoms
```
TypeError: RouteBuilder.build_resource_detail_route() takes from 2 to 4 positional arguments but 5 were given
```

### Root Cause
Incorrect usage of RouteBuilder methods. The counts in the error message include the implicit `self`, so `build_resource_detail_route` takes at most three explicit arguments, while the failing call passed four.

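The way Python counts arguments in this message is easy to misread, so here is a small reproduction. The `RouteBuilder` class below is a hypothetical stand-in with an assumed signature (one required and two optional parameters, chosen only to match the error text) and an invented path format. The real class lives in `shared/routing/route_builder.py` and may differ.

```python
class RouteBuilder:
    """Hypothetical stand-in, not the project's real RouteBuilder."""

    def __init__(self, service_name):
        self.service_name = service_name

    # One required and two optional parameters. Counting `self`, Python
    # reports this as "from 2 to 4 positional arguments".
    def build_resource_detail_route(self, resource, id_param=None, suffix=None):
        parts = [self.service_name, resource]
        if id_param:
            parts.append("{" + id_param + "}")
        if suffix:
            parts.append(suffix)
        return "/" + "/".join(parts)


def call_error(func, *args):
    """Call func(*args) and return the TypeError message, or None on success."""
    try:
        func(*args)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    rb = RouteBuilder("forecasting")
    # Two explicit arguments: accepted.
    print(rb.build_resource_detail_route("forecasts", "forecast_id"))
    # Four explicit arguments plus `self` make five: rejected.
    print(call_error(rb.build_resource_detail_route,
                     "forecasts", "forecast_id", "feedback", "retrain"))
```
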
### Solution

Use the correct RouteBuilder methods:

- **For nested resources**: Use `build_nested_resource_route()`
  ```python
  # Wrong
  route_builder.build_resource_detail_route("forecasts", "forecast_id", "feedback")

  # Correct
  route_builder.build_nested_resource_route("forecasts", "forecast_id", "feedback")
  ```

- **For resource actions**: Use `build_resource_action_route()`
  ```python
  # Wrong
  route_builder.build_resource_detail_route("forecasts", "forecast_id", "feedback", "retrain")

  # Correct
  route_builder.build_resource_action_route("forecasts", "forecast_id", "retrain")
  ```

### Files Fixed
- `services/forecasting/app/api/forecast_feedback.py`

## General Kubernetes Troubleshooting

### Check Pod Status
```bash
kubectl get pods -n bakery-ia
# `describe` matches pods by name prefix, so the generated suffix can be omitted
kubectl describe pod distribution-service -n bakery-ia
```

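When a namespace holds many pods, it can help to filter the listing down to the ones that need attention. The sketch below works on the JSON that `kubectl get pods -o json` produces and uses only standard pod status fields (`phase`, `containerStatuses[].ready`, `restartCount`). The function name and the sample data are made up for illustration.

```python
def unhealthy_pods(pod_list):
    """Return (name, phase, restarts) for pods that are not fully ready."""
    problems = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        containers = status.get("containerStatuses", [])
        restarts = sum(c.get("restartCount", 0) for c in containers)
        all_ready = bool(containers) and all(c.get("ready") for c in containers)
        if phase == "Succeeded":
            continue  # completed Jobs are not a problem
        if phase != "Running" or not all_ready:
            problems.append((name, phase, restarts))
    return problems


# Trimmed, invented sample in the shape of `kubectl get pods -o json` output.
SAMPLE = {
    "items": [
        {"metadata": {"name": "distribution-service-aaa"},
         "status": {"phase": "Running",
                    "containerStatuses": [{"ready": True, "restartCount": 0}]}},
        {"metadata": {"name": "distribution-service-bbb"},
         "status": {"phase": "Running",
                    "containerStatuses": [{"ready": False, "restartCount": 7}]}},
        {"metadata": {"name": "forecasting-service-ccc"},
         "status": {"phase": "Pending"}},
    ]
}

if __name__ == "__main__":
    for name, phase, restarts in unhealthy_pods(SAMPLE):
        print(f"{name}: phase={phase} restarts={restarts}")
```

To use it on a live cluster, replace `SAMPLE` with `json.load(sys.stdin)` and pipe in the output of `kubectl get pods -n bakery-ia -o json`.
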
### Check Logs
```bash
# `kubectl logs` needs an exact pod name or a resource reference such as deployment/<name>
kubectl logs deployment/distribution-service -n bakery-ia
kubectl logs -f deployment/distribution-service -n bakery-ia  # Follow logs
```

### Check Resource Usage
```bash
# Requires metrics-server to be installed in the cluster
kubectl top pods -n bakery-ia
kubectl describe nodes | grep -A 10 "Allocated resources"
```

### Restart Deployment
```bash
kubectl rollout restart deployment distribution-service -n bakery-ia
```

### Scale Down/Up
```bash
kubectl scale deployment distribution-service -n bakery-ia --replicas=1
kubectl scale deployment distribution-service -n bakery-ia --replicas=2
```

## Running Fix Scripts

### Fix Inotify Limits
```bash
cd scripts
./fix_kubernetes_inotify.sh
```

### Fix RouteBuilder Issues
The RouteBuilder issues have been fixed in the codebase. If you encounter similar issues:

1. Check the RouteBuilder method signatures in `shared/routing/route_builder.py`
2. Use the appropriate method for your routing pattern
3. Follow the examples in the fixed forecast feedback API

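Step 1 can be done without opening the file: Python's standard `inspect` module prints a method's signature and can test whether a set of arguments would bind to it, without calling the method. In the sketch below, `check_call` is an illustrative helper and `RouteBuilder` is again a hypothetical stand-in with an assumed signature. In the project you would import the real class instead.

```python
import inspect


def check_call(func, *args, **kwargs):
    """Return None if func would accept the arguments, else the error text."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError as exc:
        return str(exc)
    return None


class RouteBuilder:
    """Hypothetical stand-in; the signature below is an assumption."""

    def build_resource_detail_route(self, resource, id_param=None, suffix=None):
        return (resource, id_param, suffix)


if __name__ == "__main__":
    rb = RouteBuilder()
    # For a bound method the signature is shown without `self`.
    print(inspect.signature(rb.build_resource_detail_route))
    print(check_call(rb.build_resource_detail_route, "forecasts", "forecast_id"))
    print(check_call(rb.build_resource_detail_route,
                     "forecasts", "forecast_id", "feedback", "retrain"))
```

Because `bind` only checks the arguments against the signature, this is safe to run against methods that have side effects.
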
## Additional Resources

- [Kubernetes Inotify Limits](https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files)
- [Docker Desktop Resource Limits](https://docs.docker.com/desktop/settings/mac/#resources)
- [RouteBuilder Documentation](shared/routing/route_builder.py)

## Support

If issues persist after trying these solutions:

1. Check the specific error message and logs
2. Verify system resources (CPU, memory, disk)
3. Review recent changes to the codebase
4. Consult the architecture documentation for service boundaries
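
For step 2, a quick local snapshot is often enough to rule out an exhausted machine. The sketch below uses only the Python standard library, so it covers CPU count, load average, and disk usage. Memory is left out because the standard library has no portable call for it. The function name and the reported fields are illustrative.

```python
import os
import shutil


def resource_snapshot(path="/"):
    """Return CPU count, 1-minute load average (if available), and disk usage."""
    usage = shutil.disk_usage(path)
    snapshot = {
        "cpu_count": os.cpu_count(),
        "disk_total_gib": round(usage.total / 2**30, 1),
        "disk_free_gib": round(usage.free / 2**30, 1),
        "disk_used_pct": round(100 * usage.used / usage.total, 1),
    }
    try:
        snapshot["load_avg_1m"] = round(os.getloadavg()[0], 2)
    except (AttributeError, OSError):
        pass  # load average is not available on this platform
    return snapshot


if __name__ == "__main__":
    for key, value in resource_snapshot().items():
        print(f"{key}: {value}")
```

A 1-minute load average well above the CPU count, or a nearly full disk, points to a host problem and not a deployment problem.
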