DEPLOYMENT_TROUBLESHOOTING.md
# Deployment Troubleshooting Guide

This guide addresses common deployment issues encountered with the Bakery IA system.

## Table of Contents

- [Too Many Open Files Error](#too-many-open-files-error)
- [RouteBuilder TypeError Fix](#routebuilder-typeerror-fix)
- [General Kubernetes Troubleshooting](#general-kubernetes-troubleshooting)
- [Running Fix Scripts](#running-fix-scripts)
- [Additional Resources](#additional-resources)
- [Support](#support)
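
## Too Many Open Files Error

### Symptoms

```
failed to create fsnotify watcher: too many open files
Error streaming distribution-service-7ff4db8c48-k4xw7 logs: failed to create fsnotify watcher: too many open files
```

### Root Cause

This error occurs when the node runs out of inotify resources. inotify is the Linux kernel facility that Kubernetes components, container runtimes, and many tools use to watch for file-system changes; following a container's logs, for example, typically creates a watcher on the log file. Creating a watcher fails with "too many open files" (`EMFILE`) when the user's `fs.inotify.max_user_instances` limit or the process's file-descriptor limit is reached; exhausting `fs.inotify.max_user_watches` surfaces as "no space left on device" instead. This is common in development environments that run many containers on a single node.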
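
To see how close a node is to these limits, a minimal Python sketch (hypothetical, not part of the Bakery IA repository) reads the current values from `/proc/sys/fs/inotify/` and counts the inotify instances held by each process. Each instance appears in `/proc/<pid>/fd` as a descriptor whose link target is `anon_inode:inotify`. Run it on a Linux node with `sudo` so other users' processes are visible.

```python
import os
from collections import Counter
from pathlib import Path

LIMIT_NAMES = ("max_user_instances", "max_user_watches", "max_queued_events")


def read_inotify_limits(base: str = "/proc/sys/fs/inotify") -> dict:
    """Read the current kernel inotify limits (same values `sysctl` reports)."""
    return {name: int(Path(base, name).read_text().strip()) for name in LIMIT_NAMES}


def count_inotify_instances(proc_root: str = "/proc") -> Counter:
    """Count inotify instances per PID by scanning /proc/<pid>/fd symlinks."""
    counts: Counter = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/sys
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or permission denied (run with sudo)
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while scanning
            if target == "anon_inode:inotify":
                counts[int(entry)] += 1
    return counts


def process_name(pid: int, proc_root: str = "/proc") -> str:
    """Return the short command name of a process, or '?' if it is gone."""
    try:
        return Path(proc_root, str(pid), "comm").read_text().strip()
    except OSError:
        return "?"


if __name__ == "__main__":
    print(read_inotify_limits())
    counts = count_inotify_instances()
    print(f"inotify instances visible: {sum(counts.values())}")
    for pid, n in counts.most_common(10):
        print(f"{n:6d}  pid {pid:<8} {process_name(pid)}")
```

`max_user_instances` is enforced per user, so compare it with the instances owned by a single user (for example root, which the kubelet and container runtime typically run as) rather than with the grand total.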

### Solutions

#### For macOS (Docker Desktop)

Docker Desktop runs containers inside a Linux VM, so the inotify limits behind this error belong to that VM's kernel; macOS itself has no inotify. The `kern.*` settings in step 3 raise macOS file limits only and do not change the VM's inotify limits.

1. **Increase Docker Resources**:
   - Open Docker Desktop
   - Go to Settings > Resources > Advanced
   - Increase memory allocation to 8GB or more
   - Restart Docker Desktop

2. **Clean Docker System** (this permanently deletes stopped containers, unused networks and images, build cache, and unused volumes along with their data):
   ```bash
   docker system prune -a --volumes
   ```

3. **Adjust macOS System Limits**:
   ```bash
   # Add to /etc/sysctl.conf
   echo "kern.maxfiles=1048576" | sudo tee -a /etc/sysctl.conf
   echo "kern.maxfilesperproc=65536" | sudo tee -a /etc/sysctl.conf

   # Apply changes
   sudo sysctl -w kern.maxfiles=1048576
   sudo sysctl -w kern.maxfilesperproc=65536
   ```

#### For Linux (Kubernetes Nodes)

1. **Temporary Fix** (takes effect immediately, lost on reboot):
   ```bash
   sudo sysctl -w fs.inotify.max_user_watches=524288
   sudo sysctl -w fs.inotify.max_user_instances=1024
   sudo sysctl -w fs.inotify.max_queued_events=16384
   ```

2. **Permanent Fix**:
   ```bash
   # Add to /etc/sysctl.conf
   echo "fs.inotify.max_user_watches=524288" | sudo tee -a /etc/sysctl.conf
   echo "fs.inotify.max_user_instances=1024" | sudo tee -a /etc/sysctl.conf
   echo "fs.inotify.max_queued_events=16384" | sudo tee -a /etc/sysctl.conf

   # Apply changes
   sudo sysctl -p
   ```

3. **Restart Kubernetes Components**:
   ```bash
   sudo systemctl restart kubelet
   # Only if Docker is the node's container runtime; otherwise: sudo systemctl restart containerd
   sudo systemctl restart docker
   ```
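
Because `tee -a` appends on every run, repeating the permanent fix leaves duplicate lines in `/etc/sysctl.conf`. A minimal Python sketch of an idempotent alternative sets each key exactly once; the helper and the drop-in file name `99-inotify.conf` are assumptions, not part of the repository. Files under `/etc/sysctl.d/` are loaded with `sudo sysctl --system`, whereas `sysctl -p` with no argument only reads `/etc/sysctl.conf`.

```python
import re
from pathlib import Path

# Values from the "Permanent Fix" step above.
INOTIFY_SETTINGS = {
    "fs.inotify.max_user_watches": 524288,
    "fs.inotify.max_user_instances": 1024,
    "fs.inotify.max_queued_events": 16384,
}

# Matches "key = value" lines; comment lines starting with '#' or ';' never match.
_KEY_RE = re.compile(r"\s*([\w.]+)\s*=")


def merge_sysctl_settings(config_text: str, settings: dict) -> str:
    """Return config_text with each key in `settings` set exactly once.

    The first existing line for a key is rewritten in place, later duplicates
    are dropped, and missing keys are appended. Other lines are kept as-is.
    """
    remaining = dict(settings)
    out_lines = []
    for line in config_text.splitlines():
        match = _KEY_RE.match(line)
        key = match.group(1) if match else None
        if key in settings:
            if key in remaining:
                out_lines.append(f"{key}={remaining.pop(key)}")
            continue  # duplicates of an already-written key are dropped
        out_lines.append(line)
    out_lines.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out_lines) + "\n"


if __name__ == "__main__":
    # Hypothetical drop-in file name; writing under /etc requires root.
    path = Path("/etc/sysctl.d/99-inotify.conf")
    current = path.read_text() if path.exists() else ""
    path.write_text(merge_sysctl_settings(current, INOTIFY_SETTINGS))
    print(f"Updated {path}; apply with: sudo sysctl --system")
```

Running it twice produces the same file, which makes it safe to call from provisioning scripts.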

#### For Kind Clusters

Kind nodes are containers that share the host kernel, so raise the limits on the host first (see the Linux steps above), then recreate the cluster:

```bash
# Delete and recreate the cluster (add --name <cluster> if it is not named "kind")
kind delete cluster
kind create cluster
```

#### For Minikube

```bash
minikube stop
minikube start
```
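
### Prevention

Persisting the kernel limits above is what keeps this error from recurring. As general hardening, also add a security context to your deployments; note that `securityContext` restricts privileges and does not cap inotify or file-descriptor usage:

```yaml
# Container-level: spec.template.spec.containers[].securityContext
securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: false
```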

## RouteBuilder TypeError Fix

### Symptoms

```
TypeError: RouteBuilder.build_resource_detail_route() takes from 2 to 4 positional arguments but 5 were given
```

### Root Cause

`RouteBuilder` methods were used incorrectly. `build_resource_detail_route` accepts at most three arguments besides `self` (Python counts `self` in the message, hence "from 2 to 4"), but it was called with four, and it was also being used for nested-resource and action routes, which have dedicated builder methods.
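
The arity in the message can be reproduced with a hypothetical stand-in class. Its parameter names are illustrative only and are not taken from `shared/routing/route_builder.py`; the point is that a method with one required and two optional parameters reports "2 to 4" once the bound `self` is counted.

```python
class RouteBuilder:
    """Hypothetical stand-in; the real class is in shared/routing/route_builder.py."""

    def build_resource_detail_route(self, resource, id_param=None, suffix=None):
        # One required and two optional parameters: with `self`, 2 to 4 positional.
        return (resource, id_param, suffix)


def reproduce_arity_error() -> str:
    """Call the method with four explicit arguments and return the TypeError text."""
    builder = RouteBuilder()
    try:
        builder.build_resource_detail_route("forecasts", "forecast_id", "feedback", "retrain")
    except TypeError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(reproduce_arity_error())
```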

### Solution

Use the correct RouteBuilder methods:

- **For nested resources**: Use `build_nested_resource_route()`
  ```python
  # Wrong
  route_builder.build_resource_detail_route("forecasts", "forecast_id", "feedback")

  # Correct
  route_builder.build_nested_resource_route("forecasts", "forecast_id", "feedback")
  ```

- **For resource actions**: Use `build_resource_action_route()`
  ```python
  # Wrong
  route_builder.build_resource_detail_route("forecasts", "forecast_id", "feedback", "retrain")

  # Correct
  route_builder.build_resource_action_route("forecasts", "forecast_id", "retrain")
  ```

### Files Fixed

- `services/forecasting/app/api/forecast_feedback.py`

## General Kubernetes Troubleshooting

### Check Pod Status

```bash
kubectl get pods -n bakery-ia
kubectl describe pod distribution-service -n bakery-ia  # Matches pods whose names start with distribution-service
```
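
When many pods are listed, it can help to filter for the unhealthy ones. A minimal Python sketch, assuming `kubectl` is on `PATH` and pointed at the cluster, parses `kubectl get pods -o json` and reports pods that are not fully ready, with the container's waiting reason (such as `CrashLoopBackOff`) and total restart count. The fields used (`status.phase`, `containerStatuses[].ready`, `restartCount`, `state.waiting.reason`) come from the core Pod API; the helper itself is hypothetical.

```python
import json
import subprocess


def unhealthy_pods(pod_list: dict) -> list:
    """Return (name, reason, restarts) for pods that are not fully ready.

    `pod_list` is the parsed output of `kubectl get pods -o json`.
    Completed pods (phase Succeeded) are skipped.
    """
    problems = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase == "Succeeded":
            continue
        statuses = status.get("containerStatuses", [])
        restarts = sum(cs.get("restartCount", 0) for cs in statuses)
        # A waiting reason such as CrashLoopBackOff says more than the pod phase.
        waiting = [
            cs["state"]["waiting"].get("reason", "Waiting")
            for cs in statuses
            if "waiting" in cs.get("state", {})
        ]
        all_ready = bool(statuses) and all(cs.get("ready", False) for cs in statuses)
        if phase != "Running" or not all_ready:
            reason = waiting[0] if waiting else (phase if phase != "Running" else "NotReady")
            problems.append((pod["metadata"]["name"], reason, restarts))
    return problems


if __name__ == "__main__":
    raw = subprocess.run(
        ["kubectl", "get", "pods", "-n", "bakery-ia", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for name, reason, restarts in unhealthy_pods(json.loads(raw)):
        print(f"{name}: {reason} (restarts: {restarts})")
```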

### Check Logs

`kubectl logs` needs an exact pod name; the `deployment/<name>` form selects one of the deployment's pods:

```bash
kubectl logs deployment/distribution-service -n bakery-ia
kubectl logs -f deployment/distribution-service -n bakery-ia  # Follow logs
```

### Check Resource Usage

`kubectl top` requires the Metrics Server to be installed in the cluster.

```bash
kubectl top pods -n bakery-ia
kubectl describe nodes | grep -A 10 "Allocated resources"
```
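
To find the heaviest pods quickly, a minimal Python sketch can sort the `kubectl top` output by memory. It assumes `kubectl top` already works in the cluster and that the output uses the usual `NAME CPU(cores) MEMORY(bytes)` columns with millicore (`250m`) and binary-suffix (`120Mi`) values; the helper names are hypothetical.

```python
import subprocess

_MEM_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_cpu_millicores(value: str) -> int:
    # "250m" is millicores; a bare number is whole cores.
    return int(value[:-1]) if value.endswith("m") else int(float(value) * 1000)


def parse_memory_bytes(value: str) -> int:
    for suffix, factor in _MEM_UNITS.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)  # plain bytes


def top_pods_by_memory(output: str) -> list:
    """Parse `kubectl top pods --no-headers` output into (name, millicores, bytes), heaviest first."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        name, cpu, mem = fields
        rows.append((name, parse_cpu_millicores(cpu), parse_memory_bytes(mem)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    out = subprocess.run(
        ["kubectl", "top", "pods", "-n", "bakery-ia", "--no-headers"],
        check=True, capture_output=True, text=True,
    ).stdout
    for name, cpu, mem in top_pods_by_memory(out):
        print(f"{name:60} {cpu:>6}m {mem // 1024**2:>6}Mi")
```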

### Restart Deployment

```bash
kubectl rollout restart deployment distribution-service -n bakery-ia
```

### Scale Down/Up

```bash
kubectl scale deployment distribution-service -n bakery-ia --replicas=1
kubectl scale deployment distribution-service -n bakery-ia --replicas=2
```

## Running Fix Scripts

### Fix Inotify Limits

```bash
cd scripts
./fix_kubernetes_inotify.sh
```

### Fix RouteBuilder Issues

The RouteBuilder issues have been fixed in the codebase. If you encounter similar issues:

1. Check the RouteBuilder method signatures in `shared/routing/route_builder.py`
2. Use the appropriate method for your routing pattern
3. Follow the examples in the fixed forecast feedback API (`services/forecasting/app/api/forecast_feedback.py`)
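
Step 1 can be checked programmatically with the standard-library `inspect` module: `inspect.signature()` shows a bound method's parameters (without `self`), and `Signature.bind()` raises `TypeError` when a call would not fit. The `RouteBuilder` class below is a hypothetical stand-in with illustrative parameter names; point `check_call` at the real methods from `shared/routing/route_builder.py` instead.

```python
import inspect
from typing import Callable, Optional


class RouteBuilder:
    """Hypothetical stand-in; check the real signatures in shared/routing/route_builder.py."""

    def build_resource_detail_route(self, resource, id_param=None, suffix=None):
        return (resource, id_param, suffix)


def check_call(func: Callable, *args, **kwargs) -> Optional[str]:
    """Return None if the arguments fit func's signature, else the reason they don't."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    builder = RouteBuilder()
    # The signature of a bound method excludes `self`.
    print(inspect.signature(builder.build_resource_detail_route))
    print(check_call(builder.build_resource_detail_route,
                     "forecasts", "forecast_id", "feedback", "retrain"))
```

Binding the arguments without calling the method makes this safe to use in a unit test that guards route definitions.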

## Additional Resources

- [Kind Known Issues: Pod Errors Due to Too Many Open Files](https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files)
- [Docker Desktop Resource Limits](https://docs.docker.com/desktop/settings/mac/#resources)
- [RouteBuilder Source](shared/routing/route_builder.py)

## Support

If issues persist after trying these solutions:

1. Check the specific error message and logs
2. Verify system resources (CPU, memory, disk)
3. Review recent changes to the codebase
4. Consult the architecture documentation for service boundaries