Improve monitoring for prod
docs/README.md
@@ -1,120 +1,404 @@
# Bakery IA - Documentation Index
# Bakery-IA Documentation

Welcome to the Bakery IA documentation! This guide will help you navigate through all aspects of the project, from getting started to advanced operations.
**Comprehensive documentation for deploying, operating, and maintaining the Bakery-IA platform**

## Quick Links

- **New to the project?** Start with [Getting Started](01-getting-started/README.md)
- **Need to understand the system?** See [Architecture Overview](02-architecture/system-overview.md)
- **Looking for APIs?** Check [API Reference](08-api-reference/README.md)
- **Deploying to production?** Read [Deployment Guide](05-deployment/README.md)
- **Having issues?** Visit [Troubleshooting](09-operations/troubleshooting.md)

## Documentation Structure

### 📚 [01. Getting Started](01-getting-started/)
Start here if you're new to the project.
- [Quick Start Guide](01-getting-started/README.md) - Get up and running quickly
- [Installation](01-getting-started/installation.md) - Detailed installation instructions
- [Development Setup](01-getting-started/development-setup.md) - Configure your dev environment

### 🏗️ [02. Architecture](02-architecture/)
Understand the system design and components.
- [System Overview](02-architecture/system-overview.md) - High-level architecture
- [Microservices](02-architecture/microservices.md) - Service architecture details
- [Data Flow](02-architecture/data-flow.md) - How data moves through the system
- [AI/ML Components](02-architecture/ai-ml-components.md) - Machine learning architecture

### ⚡ [03. Features](03-features/)
Detailed documentation for each major feature.

#### AI & Analytics
- [AI Insights Platform](03-features/ai-insights/overview.md) - ML-powered insights
- [Dynamic Rules Engine](03-features/ai-insights/dynamic-rules-engine.md) - Pattern detection and rules

#### Tenant Management
- [Deletion System](03-features/tenant-management/deletion-system.md) - Complete tenant deletion
- [Multi-Tenancy](03-features/tenant-management/multi-tenancy.md) - Tenant isolation and management
- [Roles & Permissions](03-features/tenant-management/roles-permissions.md) - RBAC system

#### Other Features
- [Orchestration System](03-features/orchestration/orchestration-refactoring.md) - Workflow orchestration
- [Sustainability Features](03-features/sustainability/sustainability-features.md) - Environmental tracking
- [Hyperlocal Calendar](03-features/calendar/hyperlocal-calendar.md) - Event management

### 💻 [04. Development](04-development/)
Tools and workflows for developers.
- [Development Workflow](04-development/README.md) - Daily development practices
- [Tilt vs Skaffold](04-development/tilt-vs-skaffold.md) - Development tool comparison
- [Testing Guide](04-development/testing-guide.md) - Testing strategies and best practices
- [Debugging](04-development/debugging.md) - Troubleshooting during development

### 🚀 [05. Deployment](05-deployment/)
Deploy and configure the system.
- [Kubernetes Setup](05-deployment/README.md) - K8s deployment guide
- [Security Configuration](05-deployment/security-configuration.md) - Security setup
- [Database Setup](05-deployment/database-setup.md) - Database configuration
- [Monitoring](05-deployment/monitoring.md) - Observability setup

### 🔒 [06. Security](06-security/)
Security implementation and best practices.
- [Security Overview](06-security/README.md) - Security architecture
- [Database Security](06-security/database-security.md) - DB security configuration
- [RBAC Implementation](06-security/rbac-implementation.md) - Role-based access control
- [TLS Configuration](06-security/tls-configuration.md) - Transport security
- [Security Checklist](06-security/security-checklist.md) - Pre-deployment checklist

### ⚖️ [07. Compliance](07-compliance/)
Data privacy and regulatory compliance.
- [GDPR Implementation](07-compliance/gdpr.md) - GDPR compliance
- [Data Privacy](07-compliance/data-privacy.md) - Privacy controls
- [Audit Logging](07-compliance/audit-logging.md) - Audit trail system

### 📖 [08. API Reference](08-api-reference/)
API documentation and integration guides.
- [API Overview](08-api-reference/README.md) - API introduction
- [AI Insights API](08-api-reference/ai-insights-api.md) - AI endpoints
- [Authentication](08-api-reference/authentication.md) - Auth mechanisms
- [Tenant API](08-api-reference/tenant-api.md) - Tenant management endpoints

### 🔧 [09. Operations](09-operations/)
Production operations and maintenance.
- [Operations Guide](09-operations/README.md) - Ops overview
- [Monitoring & Observability](09-operations/monitoring-observability.md) - System monitoring
- [Backup & Recovery](09-operations/backup-recovery.md) - Data backup procedures
- [Troubleshooting](09-operations/troubleshooting.md) - Common issues and solutions
- [Runbooks](09-operations/runbooks/) - Step-by-step operational procedures

### 📋 [10. Reference](10-reference/)
Additional reference materials.
- [Changelog](10-reference/changelog.md) - Project history and milestones
- [Service Tokens](10-reference/service-tokens.md) - Token configuration
- [Glossary](10-reference/glossary.md) - Terms and definitions
- [Smart Procurement](10-reference/smart-procurement.md) - Procurement feature details

## Additional Resources

- **Main README**: [Project README](../README.md) - Project overview and quick start
- **Archived Docs**: [Archive](archive/) - Historical documentation and progress reports

## Contributing to Documentation

When updating documentation:
1. Keep content focused and concise
2. Use clear headings and structure
3. Include code examples where relevant
4. Update this index when adding new documents
5. Cross-link related documents
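Items 4 and 5 can be checked mechanically. The following is a minimal sketch, assuming the index uses inline Markdown links with relative paths; `extract_links` and `find_broken_links` are hypothetical helper names, not part of this repository.

```python
import re
from pathlib import Path

# Matches inline Markdown links such as [Title](path/to/file.md).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def extract_links(markdown: str) -> list[str]:
    """Return relative link targets, skipping URLs and in-page anchors."""
    targets = []
    for target in LINK_RE.findall(markdown):
        if "://" in target or target.startswith(("#", "mailto:")):
            continue
        # Drop any "#section" fragment so only the file path remains.
        targets.append(target.split("#", 1)[0])
    return targets


def find_broken_links(index_path: Path) -> list[str]:
    """List link targets in the index that do not exist on disk."""
    base = index_path.parent
    text = index_path.read_text(encoding="utf-8")
    return [t for t in extract_links(text) if not (base / t).exists()]
```

Running `find_broken_links(Path("docs/README.md"))` from the repository root would list every linked file or folder that is missing, which is one way to notice an index entry that was never created or was later renamed.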

## Documentation Standards

- Use Markdown format
- Include a clear title and introduction
- Add a table of contents for long documents
- Use code blocks with language tags
- Keep line length reasonable for readability
- Update the last modified date at the bottom
**Last Updated:** 2026-01-07
**Version:** 2.0

---

**Last Updated**: 2025-11-04
## 📚 Documentation Structure

### 🚀 Getting Started

#### For New Deployments
- **[PILOT_LAUNCH_GUIDE.md](./PILOT_LAUNCH_GUIDE.md)** - Complete guide to deploying the production environment
  - VPS provisioning and setup
  - Domain and DNS configuration
  - TLS/SSL certificates
  - Email and WhatsApp setup
  - Kubernetes deployment
  - Configuration and secrets
  - Verification and testing
  - **Start here for production pilot launch**

#### For Production Operations
- **[PRODUCTION_OPERATIONS_GUIDE.md](./PRODUCTION_OPERATIONS_GUIDE.md)** - Complete operations manual
  - Monitoring and observability
  - Security operations
  - Database management
  - Backup and recovery
  - Performance optimization
  - Scaling operations
  - Incident response
  - Maintenance tasks
  - Compliance and audit
  - **Use this for day-to-day operations**

---

## 🔐 Security Documentation

### Core Security Guides
- **[security-checklist.md](./security-checklist.md)** - Pre-deployment and ongoing security checklist
  - Deployment steps with verification
  - Security validation procedures
  - Post-deployment tasks
  - Maintenance schedules

- **[database-security.md](./database-security.md)** - Database security implementation
  - 15 databases secured (14 PostgreSQL + 1 Redis)
  - TLS encryption details
  - Access control
  - Audit logging
  - Compliance (GDPR, PCI-DSS, SOC 2)

- **[tls-configuration.md](./tls-configuration.md)** - TLS/SSL setup and management
  - Certificate infrastructure
  - PostgreSQL TLS configuration
  - Redis TLS configuration
  - Certificate rotation procedures
  - Troubleshooting

### Access Control
- **[rbac-implementation.md](./rbac-implementation.md)** - Role-based access control
  - 4 user roles (Viewer, Member, Admin, Owner)
  - 3 subscription tiers (Starter, Professional, Enterprise)
  - Implementation guidelines
  - API endpoint protection

### Compliance & Audit
- **[audit-logging.md](./audit-logging.md)** - Audit logging implementation
  - Event registry system
  - 11 microservices with audit endpoints
  - Filtering and search capabilities
  - Export functionality

- **[gdpr.md](./gdpr.md)** - GDPR compliance guide
  - Data protection requirements
  - Privacy by design
  - User rights implementation
  - Data retention policies

---

## 📊 Monitoring Documentation

- **[MONITORING_DEPLOYMENT_SUMMARY.md](./MONITORING_DEPLOYMENT_SUMMARY.md)** - Complete monitoring implementation
  - Prometheus, AlertManager, Grafana, Jaeger
  - 50+ alert rules
  - 11 dashboards
  - High availability setup
  - **Complete technical reference**

- **[QUICK_START_MONITORING.md](./QUICK_START_MONITORING.md)** - Quick setup guide (15 min)
  - Step-by-step deployment
  - Configuration updates
  - Verification procedures
  - Troubleshooting
  - **Use this for rapid deployment**

---

## 🏗️ Architecture & Features

- **[TECHNICAL-DOCUMENTATION-SUMMARY.md](./TECHNICAL-DOCUMENTATION-SUMMARY.md)** - System architecture overview
  - 18 microservices
  - Technology stack
  - Data models
  - Integration points

- **[wizard-flow-specification.md](./wizard-flow-specification.md)** - Onboarding wizard specification
  - Multi-step setup process
  - Data collection flows
  - Validation rules

- **[poi-detection-system.md](./poi-detection-system.md)** - POI detection implementation
  - Nominatim geocoding
  - OSM data integration
  - Self-hosted solution

- **[sustainability-features.md](./sustainability-features.md)** - Sustainability tracking
  - Carbon footprint calculation
  - Food waste monitoring
  - Reporting features

- **[deletion-system.md](./deletion-system.md)** - Safe deletion system
  - Soft delete implementation
  - Cascade rules
  - Recovery procedures

---

## 💬 Communication Setup

### WhatsApp Integration
- **[whatsapp/implementation-summary.md](./whatsapp/implementation-summary.md)** - WhatsApp integration overview
- **[whatsapp/master-account-setup.md](./whatsapp/master-account-setup.md)** - Master account configuration
- **[whatsapp/multi-tenant-implementation.md](./whatsapp/multi-tenant-implementation.md)** - Multi-tenancy setup
- **[whatsapp/shared-account-guide.md](./whatsapp/shared-account-guide.md)** - Shared account management

---

## 🛠️ Development & Testing

- **[DEV-HTTPS-SETUP.md](./DEV-HTTPS-SETUP.md)** - HTTPS setup for local development
  - Self-signed certificates
  - Browser configuration
  - Testing with SSL

---

## 📖 How to Use This Documentation

### For Initial Production Deployment
```
1. Read: PILOT_LAUNCH_GUIDE.md (complete walkthrough)
2. Check: security-checklist.md (pre-deployment)
3. Set up: QUICK_START_MONITORING.md (monitoring)
4. Verify: All checklists completed
```

### For Day-to-Day Operations
```
1. Reference: PRODUCTION_OPERATIONS_GUIDE.md (operations manual)
2. Monitor: Use Grafana dashboards (see monitoring docs)
3. Maintain: Follow maintenance schedules (in operations guide)
4. Secure: Review security-checklist.md monthly
```

### For Security Audits
```
1. Review: security-checklist.md (audit checklist)
2. Verify: database-security.md (database hardening)
3. Check: tls-configuration.md (certificate status)
4. Audit: audit-logging.md (event logs)
5. Compliance: gdpr.md (GDPR requirements)
```

### For Troubleshooting
```
1. Check: PRODUCTION_OPERATIONS_GUIDE.md (incident response)
2. Review: Monitoring dashboards (Grafana)
3. Consult: Specific component docs (database, TLS, etc.)
4. Execute: Emergency procedures (in operations guide)
```

---

## 📋 Quick Reference

### Deployment Flow
```
Pilot Launch Guide
↓
Security Checklist
↓
Monitoring Setup
↓
Production Operations
```

### Operations Flow
```
Daily: Health checks (operations guide)
↓
Weekly: Resource review (operations guide)
↓
Monthly: Security audit (security checklist)
↓
Quarterly: Full audit + disaster recovery test
```

### Documentation Maintenance
```
After each deployment: Update deployment notes
After incidents: Update troubleshooting sections
Monthly: Review and update operations procedures
Quarterly: Full documentation review
```

---

## 🔧 Support & Resources

### Internal Resources
- Pilot Launch Guide: Complete deployment walkthrough
- Operations Guide: Day-to-day operations manual
- Security Documentation: Complete security reference
- Monitoring Guides: Observability and alerting

### External Resources
- **Kubernetes:** https://kubernetes.io/docs
- **MicroK8s:** https://microk8s.io/docs
- **Prometheus:** https://prometheus.io/docs
- **Grafana:** https://grafana.com/docs
- **PostgreSQL:** https://www.postgresql.org/docs

### Emergency Contacts
- DevOps Team: devops@yourdomain.com
- On-Call: oncall@yourdomain.com
- Security Team: security@yourdomain.com

---

## 📝 Documentation Standards

### File Naming Convention
- `UPPERCASE.md` - Core guides and summaries
- `lowercase-hyphenated.md` - Component-specific documentation
- `folder/specific-topic.md` - Organized by category
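The convention above can be expressed as a small classifier. This is a sketch only: the function name `classify_doc` and the labels `core`, `component`, `category`, and `nonconforming` are assumptions made for illustration, and uppercase names are allowed to use either `_` or `-` as separators because both appear in this index.

```python
import re
from pathlib import PurePosixPath

# UPPERCASE.md, e.g. PILOT_LAUNCH_GUIDE.md or DEV-HTTPS-SETUP.md
UPPERCASE = re.compile(r"^[A-Z0-9]+(?:[_-][A-Z0-9]+)*\.md$")
# lowercase-hyphenated.md, e.g. security-checklist.md
LOWERCASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*\.md$")


def classify_doc(path: str) -> str:
    """Classify a documentation path by the naming convention."""
    p = PurePosixPath(path)
    if UPPERCASE.match(p.name):
        return "core"
    if LOWERCASE.match(p.name):
        # A lowercase file inside a subfolder is organized by category.
        in_folder = p.parent != PurePosixPath(".")
        return "category" if in_folder else "component"
    return "nonconforming"
```

For example, `classify_doc("whatsapp/master-account-setup.md")` returns `"category"`, while a mixed-case name such as `Mixed_Case.md` is reported as `"nonconforming"`.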

### Documentation Types
- **Guides:** Step-by-step instructions (PILOT_LAUNCH_GUIDE.md)
- **References:** Technical specifications (database-security.md)
- **Checklists:** Verification procedures (security-checklist.md)
- **Summaries:** Implementation overviews (TECHNICAL-DOCUMENTATION-SUMMARY.md)

### Update Frequency
- **Core guides:** After each major deployment or architectural change
- **Security docs:** Monthly review, update as needed
- **Monitoring docs:** Update when adding dashboards/alerts
- **Operations docs:** Update after significant incidents or process changes

---

## 🎯 Document Status

### Active & Maintained
✅ All documents listed above are current and actively maintained

### Deprecated & Removed
The following outdated documents have been consolidated into the new guides or removed:
- ❌ pilot-launch-cost-effective-plan.md → PILOT_LAUNCH_GUIDE.md
- ❌ K8S-MIGRATION-GUIDE.md → PILOT_LAUNCH_GUIDE.md
- ❌ MIGRATION-CHECKLIST.md → PILOT_LAUNCH_GUIDE.md
- ❌ MIGRATION-SUMMARY.md → PILOT_LAUNCH_GUIDE.md
- ❌ vps-sizing-production.md → PILOT_LAUNCH_GUIDE.md
- ❌ k8s-production-readiness.md → PILOT_LAUNCH_GUIDE.md
- ❌ DEV-PROD-PARITY-ANALYSIS.md → Not needed for pilot
- ❌ DEV-PROD-PARITY-CHANGES.md → Not needed for pilot
- ❌ colima-setup.md → Development-specific, not needed for prod

---

## 🚀 Quick Start Paths

### Path 1: New Production Deployment (First Time)
```
Time: 2-4 hours

1. PILOT_LAUNCH_GUIDE.md
   ├── Pre-Launch Checklist
   ├── VPS Provisioning
   ├── Infrastructure Setup
   ├── Domain & DNS
   ├── TLS Certificates
   ├── Email Setup
   ├── Kubernetes Deployment
   └── Verification

2. QUICK_START_MONITORING.md
   └── Set up monitoring (15 min)

3. security-checklist.md
   └── Verify security measures

4. PRODUCTION_OPERATIONS_GUIDE.md
   └── Set up ongoing operations
```

### Path 2: Operations & Maintenance
```
Daily:
- PRODUCTION_OPERATIONS_GUIDE.md → Daily Tasks
- Check Grafana dashboards
- Review alerts

Weekly:
- PRODUCTION_OPERATIONS_GUIDE.md → Weekly Tasks
- Review resource usage
- Check error logs

Monthly:
- security-checklist.md → Monthly audit
- PRODUCTION_OPERATIONS_GUIDE.md → Monthly Tasks
- Test backup restore
```

### Path 3: Security Hardening
```
1. security-checklist.md
   └── Complete security audit

2. database-security.md
   └── Verify database hardening

3. tls-configuration.md
   └── Check certificate status

4. rbac-implementation.md
   └── Review access controls

5. audit-logging.md
   └── Review audit logs

6. gdpr.md
   └── Verify compliance
```

---

## 📞 Getting Help

### For Deployment Issues
1. Check PILOT_LAUNCH_GUIDE.md troubleshooting section
2. Review specific component docs (database, TLS, etc.)
3. Contact DevOps team

### For Operations Issues
1. Check PRODUCTION_OPERATIONS_GUIDE.md incident response
2. Review monitoring dashboards
3. Check recent events: `kubectl get events`
4. Contact On-Call engineer

### For Security Concerns
1. Review security-checklist.md
2. Check audit logs
3. Contact Security team immediately

---

## ✅ Pre-Deployment Checklist

Before going to production, ensure you have:

- [ ] Read PILOT_LAUNCH_GUIDE.md completely
- [ ] Provisioned VPS with correct specs
- [ ] Registered domain name
- [ ] Configured DNS (Cloudflare recommended)
- [ ] Set up email service (Zoho/Gmail)
- [ ] Created WhatsApp Business account
- [ ] Generated strong passwords for all services
- [ ] Reviewed security-checklist.md
- [ ] Planned backup strategy
- [ ] Set up monitoring (QUICK_START_MONITORING.md)
- [ ] Documented access credentials securely
- [ ] Trained team on operations procedures
- [ ] Prepared incident response plan
- [ ] Scheduled regular maintenance windows
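A checklist in this task-list form can be gated automatically before a release. The sketch below assumes the `- [ ]` / `- [x]` syntax used above; `unchecked_items` is a hypothetical helper name, not an existing tool in this project.

```python
import re

# Matches Markdown task-list items: "- [ ] text" or "- [x] text".
TASK_RE = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def unchecked_items(markdown: str) -> list[str]:
    """Return the text of every task-list item that is still unchecked."""
    pending = []
    for line in markdown.splitlines():
        match = TASK_RE.match(line)
        if match and match.group(1) == " ":
            pending.append(match.group(2).strip())
    return pending
```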

---

**🎉 Ready to Deploy?**

Start with **[PILOT_LAUNCH_GUIDE.md](./PILOT_LAUNCH_GUIDE.md)** for your production deployment!

For questions or issues, contact: devops@yourdomain.com

---

**Documentation Version:** 2.0
**Last Major Update:** 2026-01-07
**Next Review:** 2026-04-07
**Maintained By:** DevOps Team
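The review date above falls three months after the last major update, which matches the quarterly documentation review. A minimal sketch of that date arithmetic follows, assuming a three-month interval; `add_months` is a hypothetical helper, and clamping to the end of the month is a design choice made here for short months.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the month's length."""
    # Count months from year 0 so year rollover is handled by divmod.
    index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))
```

With these assumptions, `add_months(date(2026, 1, 7), 3)` gives `date(2026, 4, 7)`, the review date listed above.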