bakery-ia/COMPLETION_CHECKLIST.md
2025-10-31 11:54:19 +01:00

# Completion Checklist - Tenant & User Deletion System
**Current Status:** 75% Complete
**Time to 100%:** ~4 hours implementation + 2 days testing
---
## Phase 1: Complete Remaining Services (1.5 hours)
### POS Service (30 minutes)
- [ ] Create `services/pos/app/services/tenant_deletion_service.py`
- [ ] Copy template from QUICK_START_REMAINING_SERVICES.md
- [ ] Import models: POSConfiguration, POSTransaction, POSSession
- [ ] Implement `get_tenant_data_preview()`
- [ ] Implement `delete_tenant_data()` with correct order:
  - [ ] 1. POSTransaction
  - [ ] 2. POSSession
  - [ ] 3. POSConfiguration
- [ ] Add endpoints to `services/pos/app/api/{router}.py`
  - [ ] DELETE /tenant/{tenant_id}
  - [ ] GET /tenant/{tenant_id}/deletion-preview
- [ ] Test manually:
```bash
curl -X GET "http://localhost:8000/api/v1/pos/tenant/{id}/deletion-preview"
curl -X DELETE "http://localhost:8000/api/v1/pos/tenant/{id}"
```
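The deletion order listed for the POS service (transactions, then sessions, then configuration) matters whenever child rows reference parent rows. The following is a minimal, self-contained sketch of the preview-then-delete pattern using an in-memory SQLite database. The table names, columns, and foreign keys are illustrative assumptions; they are not the project's actual POS models, nor the template in QUICK_START_REMAINING_SERVICES.md.

```python
import sqlite3

# Hypothetical tables standing in for POSConfiguration, POSSession and
# POSTransaction; the real models and their relationships may differ.
SCHEMA = """
CREATE TABLE pos_configuration (id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL);
CREATE TABLE pos_session (
    id INTEGER PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    config_id INTEGER NOT NULL REFERENCES pos_configuration(id)
);
CREATE TABLE pos_transaction (
    id INTEGER PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    session_id INTEGER NOT NULL REFERENCES pos_session(id)
);
"""

# Children first, parents last, mirroring the checklist order.
DELETION_ORDER = ["pos_transaction", "pos_session", "pos_configuration"]


def get_tenant_data_preview(conn, tenant_id):
    """Count the rows that a deletion would remove, per table."""
    return {
        table: conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE tenant_id = ?", (tenant_id,)
        ).fetchone()[0]
        for table in DELETION_ORDER
    }


def delete_tenant_data(conn, tenant_id):
    """Delete tenant rows in dependency order inside one transaction."""
    deleted = {}
    with conn:  # commits on success, rolls back on error
        for table in DELETION_ORDER:
            cur = conn.execute(
                f"DELETE FROM {table} WHERE tenant_id = ?", (tenant_id,)
            )
            deleted[table] = cur.rowcount
    return deleted


def build_demo_db():
    """Create an in-memory database with one tenant's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO pos_configuration VALUES (1, 't1')")
    conn.execute("INSERT INTO pos_session VALUES (1, 't1', 1)")
    conn.executemany(
        "INSERT INTO pos_transaction VALUES (?, 't1', 1)", [(1,), (2,)]
    )
    conn.commit()
    return conn
```

With foreign keys enforced, reversing `DELETION_ORDER` would make the first `DELETE` fail, which is why the checklist fixes the order explicitly.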
### External Service (30 minutes)
- [ ] Create `services/external/app/services/tenant_deletion_service.py`
- [ ] Copy template
- [ ] Import models: ExternalDataCache, APIKeyUsage
- [ ] Implement `get_tenant_data_preview()`
- [ ] Implement `delete_tenant_data()` with order:
  - [ ] 1. APIKeyUsage
  - [ ] 2. ExternalDataCache
- [ ] Add endpoints to `services/external/app/api/{router}.py`
  - [ ] DELETE /tenant/{tenant_id}
  - [ ] GET /tenant/{tenant_id}/deletion-preview
- [ ] Test manually
### Alert Processor Service (30 minutes)
- [ ] Create `services/alert_processor/app/services/tenant_deletion_service.py`
- [ ] Copy template
- [ ] Import models: Alert, AlertRule, AlertHistory
- [ ] Implement `get_tenant_data_preview()`
- [ ] Implement `delete_tenant_data()` with order:
  - [ ] 1. AlertHistory
  - [ ] 2. Alert
  - [ ] 3. AlertRule
- [ ] Add endpoints to `services/alert_processor/app/api/{router}.py`
  - [ ] DELETE /tenant/{tenant_id}
  - [ ] GET /tenant/{tenant_id}/deletion-preview
- [ ] Test manually
---
## Phase 2: Refactor Existing Services (2.5 hours)
### Forecasting Service (45 minutes)
- [ ] Review existing deletion logic in forecasting service
- [ ] Create new `services/forecasting/app/services/tenant_deletion_service.py`
- [ ] Extend BaseTenantDataDeletionService
- [ ] Move existing logic into standard pattern
- [ ] Import models: Forecast, PredictionBatch, etc.
- [ ] Update endpoints to use new pattern
  - [ ] Replace existing DELETE logic
  - [ ] Add deletion-preview endpoint
- [ ] Test both endpoints
### Training Service (45 minutes)
- [ ] Review existing deletion logic
- [ ] Create new `services/training/app/services/tenant_deletion_service.py`
- [ ] Extend BaseTenantDataDeletionService
- [ ] Move existing logic into standard pattern
- [ ] Import models: TrainingJob, TrainedModel, ModelArtifact
- [ ] Update endpoints to use new pattern
- [ ] Test both endpoints
### Notification Service (45 minutes)
- [ ] Review existing deletion logic
- [ ] Create new `services/notification/app/services/tenant_deletion_service.py`
- [ ] Extend BaseTenantDataDeletionService
- [ ] Move existing logic into standard pattern
- [ ] Import models: Notification, NotificationPreference, etc.
- [ ] Update endpoints to use new pattern
- [ ] Test both endpoints
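
The "standard pattern" that each refactored service moves into can be pictured as a small abstract base class with a preview method, a delete method, and a wrapper that records errors instead of raising. The sketch below is an assumption about the general shape only: `SketchDeletionResult`, `SketchBaseDeletionService`, `safe_delete`, and the in-memory subclass are hypothetical names, and the real `BaseTenantDataDeletionService` interface may differ.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class SketchDeletionResult:
    """Hypothetical result object; not the project's TenantDataDeletionResult."""
    tenant_id: str
    service_name: str
    deleted_counts: dict = field(default_factory=dict)
    errors: list = field(default_factory=list)

    @property
    def success(self) -> bool:
        return not self.errors


class SketchBaseDeletionService(ABC):
    service_name = "unknown"

    @abstractmethod
    async def get_tenant_data_preview(self, tenant_id: str) -> dict:
        """Return {entity_name: row_count} without deleting anything."""

    @abstractmethod
    async def delete_tenant_data(self, tenant_id: str) -> SketchDeletionResult:
        """Delete all tenant data and report what was removed."""

    async def safe_delete(self, tenant_id: str) -> SketchDeletionResult:
        """Run delete_tenant_data(), turning exceptions into recorded errors."""
        try:
            return await self.delete_tenant_data(tenant_id)
        except Exception as exc:  # broad on purpose: report, don't crash
            result = SketchDeletionResult(tenant_id, self.service_name)
            result.errors.append(str(exc))
            return result


class InMemoryForecastDeletion(SketchBaseDeletionService):
    """Toy subclass backed by a dict instead of a database."""
    service_name = "forecasting"

    def __init__(self, rows_by_tenant):
        self.rows_by_tenant = rows_by_tenant

    async def get_tenant_data_preview(self, tenant_id):
        return {"forecasts": len(self.rows_by_tenant.get(tenant_id, []))}

    async def delete_tenant_data(self, tenant_id):
        removed = self.rows_by_tenant.pop(tenant_id)  # KeyError if unknown
        result = SketchDeletionResult(tenant_id, self.service_name)
        result.deleted_counts["forecasts"] = len(removed)
        return result
```

Keeping the error handling in the base class means each service subclass only has to describe what to count and what to delete.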
---
## Phase 3: Integration (2 hours)
### Update Auth Service
- [ ] Open `services/auth/app/services/admin_delete.py`
- [ ] Import DeletionOrchestrator:
```python
from app.services.deletion_orchestrator import DeletionOrchestrator
```
- [ ] Update `_delete_tenant_data()` method:
```python
async def _delete_tenant_data(self, tenant_id: str):
    # NOTE: `tenant_info` is not defined in this snippet; it is assumed to
    # have been loaded earlier in the surrounding code.
    orchestrator = DeletionOrchestrator(auth_token=self.get_service_token())
    job = await orchestrator.orchestrate_tenant_deletion(
        tenant_id=tenant_id,
        tenant_name=tenant_info.get("name"),
        initiated_by=self.requesting_user_id,
    )
    return job.to_dict()
```
- [ ] Remove old manual service calls
- [ ] Test complete user deletion flow
### Verify Service URLs
- [ ] Check orchestrator SERVICE_DELETION_ENDPOINTS
- [ ] Update URLs for your environment:
  - [ ] Development: localhost ports
  - [ ] Staging: service names
  - [ ] Production: service names
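
One way to keep the per-environment URLs in a single place is a small helper that derives each endpoint from the service name and the environment. This is only a sketch: the port numbers, the `-service` hostname suffix, and the function name are placeholders, and the actual structure of `SERVICE_DELETION_ENDPOINTS` in the orchestrator may look different. Only the `/api/v1/<service>/tenant/<id>` path shape is taken from the curl examples in Phase 1.

```python
# All hostnames and ports below are illustrative placeholders.
SERVICE_PORTS_DEV = {"pos": 8001, "external": 8002, "alert_processor": 8003}


def deletion_endpoint(service: str, tenant_id: str, environment: str) -> str:
    """Build the DELETE URL for one service in the given environment."""
    if environment == "development":
        # Local development: every service listens on its own localhost port.
        base = f"http://localhost:{SERVICE_PORTS_DEV[service]}"
    elif environment in ("staging", "production"):
        # Deployed environments: address the service by name, not by port.
        host = service.replace("_", "-") + "-service"
        base = f"http://{host}:8000"
    else:
        raise ValueError(f"unknown environment: {environment}")
    return f"{base}/api/v1/{service}/tenant/{tenant_id}"
```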
---
## Phase 4: Testing (2 days)
### Unit Tests (Day 1)
- [ ] Test TenantDataDeletionResult
```python
def test_deletion_result_creation():
    result = TenantDataDeletionResult("tenant-123", "test-service")
    assert result.tenant_id == "tenant-123"
    assert result.success is True
```
- [ ] Test BaseTenantDataDeletionService
```python
async def test_safe_delete_handles_errors():
    # Test error handling
    ...
```
- [ ] Test each service deletion class
```python
async def test_orders_deletion():
    # Create test data
    # Call delete_tenant_data()
    # Verify data deleted
    ...
```
- [ ] Test DeletionOrchestrator
```python
async def test_orchestrator_parallel_execution():
    # Mock service responses
    # Verify all called
    ...
```
- [ ] Test DeletionJob tracking
```python
def test_job_status_tracking():
    # Create job
    # Check status transitions
    ...
```
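
To make "check status transitions" concrete, it helps to write down which transitions are legal. The sketch below models a job as a tiny state machine. The status names (`pending`, `in_progress`, `completed`, `failed`) and the `SketchDeletionJob` class are assumptions for illustration; the real `DeletionJob` may use different states or fields.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class JobStatus(str, Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


# Which statuses may follow which; terminal states allow nothing.
ALLOWED = {
    JobStatus.PENDING: {JobStatus.IN_PROGRESS},
    JobStatus.IN_PROGRESS: {JobStatus.COMPLETED, JobStatus.FAILED},
    JobStatus.COMPLETED: set(),
    JobStatus.FAILED: set(),
}


@dataclass
class SketchDeletionJob:
    """Hypothetical stand-in for DeletionJob, tracking status history."""
    tenant_id: str
    status: JobStatus = JobStatus.PENDING
    history: list = field(default_factory=list)

    def transition(self, new_status: JobStatus) -> None:
        """Move to new_status, rejecting transitions not listed in ALLOWED."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"illegal transition {self.status.value} -> {new_status.value}"
            )
        self.history.append((self.status, new_status, datetime.now(timezone.utc)))
        self.status = new_status
```

A unit test can then walk a job through `PENDING`, `IN_PROGRESS`, and `COMPLETED`, and assert that any further transition raises.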
### Integration Tests (Day 1-2)
- [ ] Test tenant deletion endpoint
```python
async def test_delete_tenant_endpoint():
    # `client` and `tenant_id` are assumed to come from test fixtures
    response = await client.delete(f"/api/v1/tenants/{tenant_id}")
    assert response.status_code == 200
```
- [ ] Test service-to-service calls
```python
async def test_orders_deletion_via_orchestrator():
    # Create tenant with orders
    # Delete tenant
    # Verify orders deleted
    ...
```
- [ ] Test CASCADE deletes
```python
async def test_cascade_deletes_children():
    # Create parent with children
    # Delete parent
    # Verify children also deleted
    ...
```
- [ ] Test error handling
```python
async def test_partial_failure_handling():
    # Mock one service failure
    # Verify job shows failure
    # Verify other services succeeded
    ...
```
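
The partial-failure case is easiest to reason about with a concrete model of parallel fan-out. The sketch below uses `asyncio.gather(..., return_exceptions=True)` so that one failing service does not cancel the others. `call_service` is a network-free stand-in for the real HTTP call, and the summary format and the choice to mark the whole job `failed` are assumptions, not the actual `DeletionOrchestrator` behavior.

```python
import asyncio


async def call_service(name: str, tenant_id: str, failing) -> dict:
    """Stand-in for an HTTP DELETE to one service (no network involved)."""
    await asyncio.sleep(0)  # yield control, as a real request would
    if name in failing:
        raise RuntimeError(f"{name} unavailable")
    return {"service": name, "tenant_id": tenant_id, "deleted": 1}


async def delete_everywhere(tenant_id: str, services: list, failing=frozenset()) -> dict:
    """Call every service concurrently and summarise per-service outcomes."""
    outcomes = await asyncio.gather(
        *(call_service(s, tenant_id, failing) for s in services),
        return_exceptions=True,  # one failure must not cancel the others
    )
    summary = {"succeeded": [], "failed": {}}
    # gather() preserves input order, so outcomes line up with services.
    for name, outcome in zip(services, outcomes):
        if isinstance(outcome, Exception):
            summary["failed"][name] = str(outcome)
        else:
            summary["succeeded"].append(name)
    summary["status"] = "failed" if summary["failed"] else "completed"
    return summary
```

For example, `asyncio.run(delete_everywhere("t1", ["orders", "pos"], failing={"pos"}))` reports `orders` as succeeded and `pos` as failed, which is the shape the partial-failure test needs to assert on.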
### E2E Tests (Day 2)
- [ ] Test complete tenant deletion
```python
async def test_complete_tenant_deletion():
    # Create tenant with data in all services
    # Delete tenant
    # Verify all data deleted
    # Check deletion job status
    ...
```
- [ ] Test complete user deletion
```python
async def test_user_deletion_with_owned_tenants():
    # Create user with owned tenants
    # Create other admins
    # Delete user
    # Verify ownership transferred
    # Verify user data deleted
    ...
```
- [ ] Test owner deletion with tenant deletion
```python
async def test_owner_deletion_no_other_admins():
    # Create user with tenant (no other admins)
    # Delete user
    # Verify tenant deleted
    # Verify all cascade deletes
    ...
```
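
The last two E2E scenarios differ only in whether another admin exists to take over a tenant. That branching can be expressed as a pure function, which is convenient to test before wiring up real services. The data shape and the rule of picking the first remaining admin as the new owner are assumptions made for this sketch; the actual selection logic may differ.

```python
def plan_user_deletion(user_id: str, tenants: dict) -> dict:
    """Decide, per owned tenant, whether to transfer ownership or delete.

    `tenants` maps tenant_id -> {"owner": user_id, "admins": [user_ids]}.
    """
    plan = {"transfer": {}, "delete": []}
    for tenant_id, info in tenants.items():
        if info["owner"] != user_id:
            continue  # not owned by this user, nothing to decide
        other_admins = [a for a in info["admins"] if a != user_id]
        if other_admins:
            # Assumption: the first remaining admin becomes the new owner.
            plan["transfer"][tenant_id] = other_admins[0]
        else:
            # No one left to own the tenant, so it is deleted with the user.
            plan["delete"].append(tenant_id)
    return plan
```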
### Manual Testing (Throughout)
- [ ] Test with small dataset (<100 records)
- [ ] Test with medium dataset (1,000 records)
- [ ] Test with large dataset (10,000+ records)
- [ ] Measure performance
- [ ] Verify database queries are efficient
- [ ] Check logs for errors
- [ ] Verify audit trail
---
## Phase 5: Database Persistence (1 day)
### Create Migration
- [ ] Create deletion_jobs table:
```sql
CREATE TABLE deletion_jobs (
    id UUID PRIMARY KEY,
    tenant_id UUID NOT NULL,
    tenant_name VARCHAR(255),
    initiated_by UUID,
    status VARCHAR(50) NOT NULL,
    service_results JSONB,
    total_items_deleted INTEGER DEFAULT 0,
    started_at TIMESTAMP WITH TIME ZONE,
    completed_at TIMESTAMP WITH TIME ZONE,
    error_log TEXT[],
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
CREATE INDEX idx_deletion_jobs_tenant ON deletion_jobs(tenant_id);
CREATE INDEX idx_deletion_jobs_status ON deletion_jobs(status);
CREATE INDEX idx_deletion_jobs_initiated ON deletion_jobs(initiated_by);
```
- [ ] Run migration in dev
- [ ] Run migration in staging
### Update Orchestrator
- [ ] Add database session to DeletionOrchestrator
- [ ] Save job to database in orchestrate_tenant_deletion()
- [ ] Update job status in database
- [ ] Query jobs from database in get_job_status()
- [ ] Query jobs from database in list_jobs()
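
The persistence steps amount to three operations: insert a job row when deletion starts, update it when services report back, and read it on demand. The sketch below shows that round trip against SQLite as a stand-in for PostgreSQL, using a subset of the `deletion_jobs` columns. The function names other than `get_job_status`, and the `pending`/`completed` status values, are assumptions for illustration.

```python
import json
import sqlite3
import uuid

# SQLite stand-in for the PostgreSQL table: UUIDs become TEXT and
# service_results (JSONB) is stored as a JSON string.
DDL = """
CREATE TABLE deletion_jobs (
    id TEXT PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    status TEXT NOT NULL,
    service_results TEXT,
    total_items_deleted INTEGER DEFAULT 0
)
"""


def create_job(conn, tenant_id: str) -> str:
    """Insert a new job row and return its generated id."""
    job_id = str(uuid.uuid4())
    with conn:
        conn.execute(
            "INSERT INTO deletion_jobs (id, tenant_id, status) VALUES (?, ?, ?)",
            (job_id, tenant_id, "pending"),
        )
    return job_id


def finish_job(conn, job_id: str, service_results: dict) -> None:
    """Store per-service counts and mark the job completed."""
    total = sum(service_results.values())
    with conn:
        conn.execute(
            "UPDATE deletion_jobs SET status = ?, service_results = ?, "
            "total_items_deleted = ? WHERE id = ?",
            ("completed", json.dumps(service_results), total, job_id),
        )


def get_job_status(conn, job_id: str):
    """Return the job as a dict, or None if the id is unknown."""
    row = conn.execute(
        "SELECT tenant_id, status, service_results, total_items_deleted "
        "FROM deletion_jobs WHERE id = ?",
        (job_id,),
    ).fetchone()
    if row is None:
        return None
    tenant_id, status, results, total = row
    return {
        "tenant_id": tenant_id,
        "status": status,
        "service_results": json.loads(results) if results else {},
        "total_items_deleted": total,
    }
```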
### Add Job API Endpoints
- [ ] Create `services/auth/app/api/deletion_jobs.py`
```python
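# Assumes FastAPI, as the decorator style suggests
from typing import Optional

from fastapi import APIRouter

router = APIRouter()

@router.get("/deletion-jobs/{job_id}")
async def get_job_status(job_id: str):
    # Query from database
    ...

@router.get("/deletion-jobs")
async def list_deletion_jobs(
    tenant_id: Optional[str] = None,
    status: Optional[str] = None,
    limit: int = 100,
):
    # Query from database with filters
    ...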
```
- [ ] Test job status endpoints
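
The list endpoint's optional filters translate naturally into a parameterized query that only includes the conditions actually supplied. The helper below is a sketch of that idea: `build_list_jobs_query` is a hypothetical name, the upper limit of 1000 is an arbitrary assumption, and the `?` placeholders follow the SQLite/qmark style, so a PostgreSQL driver would need its own placeholder syntax.

```python
from typing import Optional


def build_list_jobs_query(
    tenant_id: Optional[str] = None,
    status: Optional[str] = None,
    limit: int = 100,
):
    """Return (sql, params) for listing jobs with optional filters."""
    clauses, params = [], []
    if tenant_id is not None:
        clauses.append("tenant_id = ?")
        params.append(tenant_id)
    if status is not None:
        clauses.append("status = ?")
        params.append(status)
    sql = "SELECT id, tenant_id, status FROM deletion_jobs"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY created_at DESC LIMIT ?"
    params.append(max(1, min(limit, 1000)))  # clamp to a sane page size
    return sql, params
```

Because values only ever travel through `params`, user-supplied filters are never interpolated into the SQL string.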
---
## Phase 6: Production Prep (2 days)
### Performance Testing
- [ ] Create test dataset with 100K records
- [ ] Run deletion and measure time
- [ ] Identify bottlenecks
- [ ] Optimize slow queries
- [ ] Add batch processing if needed
- [ ] Re-test and verify improvement
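
If a single large `DELETE` turns out to be the bottleneck, batch processing usually means deleting a bounded number of rows per transaction until none remain. The sketch below demonstrates the loop against SQLite, which exposes an implicit `rowid`. PostgreSQL has no `rowid`, so a primary-key subquery would play the same role there. The function name and default batch size are assumptions, and the table name must come from trusted code since it is interpolated into the SQL.

```python
import sqlite3


def delete_in_batches(conn, table: str, tenant_id: str, batch_size: int = 1000) -> int:
    """Delete a tenant's rows in chunks, committing after each chunk."""
    total = 0
    while True:
        with conn:  # one short transaction per batch
            cur = conn.execute(
                f"DELETE FROM {table} WHERE rowid IN ("
                f"SELECT rowid FROM {table} WHERE tenant_id = ? LIMIT ?)",
                (tenant_id, batch_size),
            )
        if cur.rowcount == 0:
            return total  # nothing left for this tenant
        total += cur.rowcount
```

Short transactions keep locks brief and make progress visible, at the cost of the deletion no longer being atomic across batches.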
### Monitoring Setup
- [ ] Add Prometheus metrics:
```python
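# Metric classes from the official prometheus_client package
from prometheus_client import Counter, Gauge, Histogram

deletion_duration_seconds = Histogram(...)
deletion_items_deleted = Counter(...)
deletion_errors_total = Counter(...)
deletion_jobs_status = Gauge(...)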
```
- [ ] Create Grafana dashboard:
  - [ ] Active deletions gauge
  - [ ] Deletion rate graph
  - [ ] Error rate graph
  - [ ] Average duration graph
  - [ ] Items deleted by service
- [ ] Configure alerts:
  - [ ] Alert if deletion >5 minutes
  - [ ] Alert if >10% error rate
  - [ ] Alert if service timeouts
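
The three alert conditions can be pinned down as plain predicates before they are translated into real alerting rules. The sketch below only expresses the thresholds from the checklist; the `DeletionWindowStats` shape and the alert names are hypothetical, and in practice these conditions would more likely live in the monitoring system's own rule format.

```python
from dataclasses import dataclass


@dataclass
class DeletionWindowStats:
    """Aggregated numbers for one monitoring window (hypothetical shape)."""
    max_duration_seconds: float
    total_jobs: int
    failed_jobs: int
    service_timeouts: int


def evaluate_alerts(stats: DeletionWindowStats) -> list:
    """Return the names of the checklist alert conditions that fire."""
    alerts = []
    if stats.max_duration_seconds > 5 * 60:  # deletion took over 5 minutes
        alerts.append("deletion_too_slow")
    if stats.total_jobs and stats.failed_jobs / stats.total_jobs > 0.10:
        alerts.append("high_error_rate")  # more than 10% of jobs failed
    if stats.service_timeouts > 0:
        alerts.append("service_timeouts")
    return alerts
```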
### Documentation Updates
- [ ] Update API documentation
- [ ] Create operations runbook
- [ ] Document rollback procedures
- [ ] Create troubleshooting guide
### Rollout Plan
- [ ] Deploy to dev environment
- [ ] Run full test suite
- [ ] Deploy to staging
- [ ] Run smoke tests
- [ ] Deploy to production with feature flag
- [ ] Monitor for 24 hours
- [ ] Enable for all tenants
---
## Phase 7: Optional Enhancements (Future)
### Soft Delete (2 days)
- [ ] Add deleted_at column to tenants table
- [ ] Implement 30-day retention
- [ ] Add restoration endpoint
- [ ] Add cleanup job for expired deletions
- [ ] Update queries to filter deleted tenants
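
The 30-day retention rule splits soft-deleted tenants into two groups: those still restorable and those due for the cleanup job. A minimal sketch of that boundary, assuming a timezone-aware `deleted_at` value and hypothetical function names:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=30)


def is_restorable(deleted_at: Optional[datetime], now: datetime) -> bool:
    """A soft-deleted tenant can be restored until the retention window ends."""
    return deleted_at is not None and now - deleted_at < RETENTION


def is_due_for_cleanup(deleted_at: Optional[datetime], now: datetime) -> bool:
    """The cleanup job hard-deletes tenants whose window has expired."""
    return deleted_at is not None and now - deleted_at >= RETENTION
```

The two predicates are complementary for any soft-deleted tenant, so a tenant is never both restorable and eligible for cleanup.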
### Advanced Features (1 week)
- [ ] WebSocket progress updates
- [ ] Email notifications on completion
- [ ] Deletion reports (PDF download)
- [ ] Scheduled deletions
- [ ] Deletion preview aggregation
---
## Sign-Off Checklist
### Code Quality
- [ ] All services implemented
- [ ] All endpoints tested
- [ ] No linter or type-checker warnings
- [ ] Code reviewed
- [ ] Documentation complete
### Testing
- [ ] Unit tests passing (>80% coverage)
- [ ] Integration tests passing
- [ ] E2E tests passing
- [ ] Performance tests passing
- [ ] Manual testing complete
### Production Readiness
- [ ] Monitoring configured
- [ ] Alerts configured
- [ ] Logging verified
- [ ] Rollback plan documented
- [ ] Runbook created
### Security & Compliance
- [ ] Authorization verified
- [ ] Audit logging enabled
- [ ] GDPR compliance verified
- [ ] Data retention policy documented
- [ ] Security review completed
---
## Quick Reference
### Files to Create (3 new services):
1. `services/pos/app/services/tenant_deletion_service.py`
2. `services/external/app/services/tenant_deletion_service.py`
3. `services/alert_processor/app/services/tenant_deletion_service.py`
### Files to Create for Refactoring (3 refactored services):
1. `services/forecasting/app/services/tenant_deletion_service.py`
2. `services/training/app/services/tenant_deletion_service.py`
3. `services/notification/app/services/tenant_deletion_service.py`
### Files to Update (integration):
1. `services/auth/app/services/admin_delete.py`
### Tests to Write (~50 tests):
- 10 unit tests (base classes)
- 24 service-specific tests (2 per service × 12 services)
- 10 integration tests
- 6 E2E tests
### Time Estimate:
- Implementation: 4 hours
- Testing: 2 days
- Deployment: 2 days
- **Total: ~5 days**
---
## Success Criteria
✅ All 12 services have deletion logic
✅ All deletion endpoints working
✅ Orchestrator coordinating successfully
✅ Job tracking persisted to database
✅ All tests passing
✅ Performance acceptable (<5 min for large tenants)
✅ Monitoring in place
✅ Documentation complete
✅ Production deployment successful
---
**Keep this checklist handy and mark items as you complete them!**
**Remember:** Templates and examples are in QUICK_START_REMAINING_SERVICES.md