# Security Deployment Checklist
**Last Updated:** November 2025
**Status:** Production Deployment Guide
**Security Grade Target:** A-

---
## Table of Contents
1. [Overview](#overview)
2. [Pre-Deployment Checklist](#pre-deployment-checklist)
3. [Deployment Steps](#deployment-steps)
4. [Verification Checklist](#verification-checklist)
5. [Post-Deployment Tasks](#post-deployment-tasks)
6. [Ongoing Maintenance](#ongoing-maintenance)
7. [Security Hardening Roadmap](#security-hardening-roadmap)
8. [Related Documentation](#related-documentation)
9. [Quick Reference](#quick-reference)
---
## Overview
This checklist verifies that the required security measures are in place before the Bakery IA platform is deployed to production.
### Security Grade Targets
| Phase | Security Grade | Timeframe |
|-------|----------------|-----------|
| Pre-Implementation | D- | Baseline |
| Phase 1 Complete | C+ | Week 1-2 |
| Phase 2 Complete | B | Week 3-4 |
| Phase 3 Complete | A- | Week 5-6 |
| Full Hardening | A | Month 3 |
---
## Pre-Deployment Checklist
### Infrastructure Preparation
#### Certificate Infrastructure
- [ ] Generate TLS certificates using `/infrastructure/tls/generate-certificates.sh`
- [ ] Verify CA certificate created (10-year validity)
- [ ] Verify PostgreSQL server certificates (3-year validity)
- [ ] Verify Redis server certificates (3-year validity)
- [ ] Store CA private key securely (NOT in version control)
- [ ] Document certificate expiry dates (October 2028)
#### Kubernetes Cluster
- [ ] Kubernetes cluster running (Kind, GKE, EKS, or AKS)
- [ ] `kubectl` configured and working
- [ ] Namespace `bakery-ia` created
- [ ] Storage class available for PVCs
- [ ] Sufficient resources (CPU: 4+ cores, RAM: 8GB+, Storage: 50GB+)
#### Secrets Management
- [ ] Generate strong passwords (32+ characters): `openssl rand -base64 32` (32 random bytes, printed as 44 base64 characters)
- [ ] Create `.env` file with new passwords (use `.env.example` as template)
- [ ] Update `infrastructure/kubernetes/base/secrets.yaml` with base64-encoded passwords
- [ ] Generate AES-256 key for Kubernetes secrets encryption
- [ ] **Verify passwords are NOT default values** (`*_pass123` is insecure!)
- [ ] Store backup of passwords in secure password manager
- [ ] Document password rotation schedule (every 90 days)
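
The password generation and base64-encoding steps above can be sketched as two small helpers. The function names `gen_password` and `encode_secret` are illustrative assumptions and are not scripts from the repository; the sketch only prints values and does not edit `secrets.yaml`.

```bash
#!/usr/bin/env bash
# Illustrative helpers (names are assumptions, not repository scripts).

# Print one random password: 32 random bytes, base64-encoded (44 characters).
gen_password() {
  openssl rand -base64 32
}

# Base64-encode a value for the `data:` field of a Kubernetes Secret.
# printf is used instead of echo so no trailing newline ends up in the value.
encode_secret() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Example: generate a password and print its encoded form.
#   pw=$(gen_password)
#   encode_secret "$pw"
```

Encoding with `echo` instead of `printf` is a common source of login failures, because the newline becomes part of the stored password.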
### Security Configuration Files
#### Database Security
- [ ] PostgreSQL TLS secret created: `postgres-tls-secret.yaml`
- [ ] Redis TLS secret created: `redis-tls-secret.yaml`
- [ ] PostgreSQL logging ConfigMap created: `postgres-logging-config.yaml`
- [ ] PostgreSQL init ConfigMap includes pgcrypto extension
#### Application Security
- [ ] All database URLs include the `?ssl=require` parameter (libpq-based drivers use `sslmode=require` instead)
- [ ] Redis URLs use `rediss://` protocol
- [ ] Service-to-service authentication configured
- [ ] CORS configured for frontend
- [ ] Rate limiting enabled on authentication endpoints
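
The two URL requirements above can be checked mechanically before deployment. The following is a minimal sketch, assuming connection strings are available as plain text; `check_url` is a hypothetical helper and the URLs in the comments are made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check_url is not part of the repository.
# Returns 0 if the URL satisfies the checklist, 1 if it does not,
# and 2 if the scheme is not one this sketch knows about.
check_url() {
  case "$1" in
    postgres://*|postgresql://*|postgresql+*://*)
      # PostgreSQL URLs must carry ssl=require as a query parameter.
      case "$1" in
        *"?ssl=require"*|*"&ssl=require"*) return 0 ;;
        *) return 1 ;;
      esac ;;
    rediss://*) return 0 ;;   # TLS scheme
    redis://*)  return 1 ;;   # plaintext scheme
    *)          return 2 ;;
  esac
}

# Example with made-up URLs:
#   check_url 'postgresql+asyncpg://user:pw@auth-db:5432/auth_db?ssl=require' && echo ok
#   check_url 'redis://redis:6379/0' || echo "plaintext Redis URL"
```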
---
## Deployment Steps
### Phase 1: Database Security (CRITICAL - Week 1)
**Time Required:** 2-3 hours
#### Step 1.1: Deploy PersistentVolumeClaims
```bash
# Verify PVCs exist in database YAML files
grep -r "PersistentVolumeClaim" infrastructure/kubernetes/base/components/databases/
# Apply database deployments (includes PVCs)
kubectl apply -f infrastructure/kubernetes/base/components/databases/
# Verify PVCs are bound
kubectl get pvc -n bakery-ia
```
**Expected:** 15 PVCs (14 PostgreSQL + 1 Redis) in "Bound" state
- [ ] All PostgreSQL PVCs created (2Gi each)
- [ ] Redis PVC created
- [ ] All PVCs in "Bound" state
- [ ] Storage class supports dynamic provisioning
#### Step 1.2: Deploy TLS Certificates
```bash
# Create TLS secrets
kubectl apply -f infrastructure/kubernetes/base/secrets/postgres-tls-secret.yaml
kubectl apply -f infrastructure/kubernetes/base/secrets/redis-tls-secret.yaml
# Verify secrets created
kubectl get secrets -n bakery-ia | grep tls
```
**Expected:** `postgres-tls` and `redis-tls` secrets exist
- [ ] PostgreSQL TLS secret created
- [ ] Redis TLS secret created
- [ ] Secrets contain all required keys (cert, key, ca)
#### Step 1.3: Deploy PostgreSQL Configuration
```bash
# Apply PostgreSQL logging config
kubectl apply -f infrastructure/kubernetes/base/configmaps/postgres-logging-config.yaml
# Apply PostgreSQL init config (pgcrypto)
kubectl apply -f infrastructure/kubernetes/base/configs/postgres-init-config.yaml
# Verify ConfigMaps
kubectl get configmap -n bakery-ia | grep postgres
```
- [ ] PostgreSQL logging ConfigMap created
- [ ] PostgreSQL init ConfigMap created (includes pgcrypto)
- [ ] Configuration includes SSL settings
#### Step 1.4: Update Application Secrets
```bash
# Apply updated secrets with strong passwords
kubectl apply -f infrastructure/kubernetes/base/secrets.yaml
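# Verify secrets updated (the output contains base64-encoded, not encrypted, values; keep it out of shared terminals and logs)
kubectl get secret bakery-ia-secrets -n bakery-ia -o yaml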
```
- [ ] All database passwords updated (32+ characters)
- [ ] Redis password updated
- [ ] JWT secret updated
- [ ] Database connection URLs include SSL parameters
#### Step 1.5: Deploy Databases
```bash
# Deploy all databases
kubectl apply -f infrastructure/kubernetes/base/components/databases/
# Wait for databases to be ready (may take 5-10 minutes)
kubectl wait --for=condition=ready pod -l app.kubernetes.io/component=database -n bakery-ia --timeout=600s
# Check database pod status
kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database
```
**Expected:** All 14 PostgreSQL + 1 Redis pods in "Running" state
- [ ] All 14 PostgreSQL database pods running
- [ ] Redis pod running
- [ ] No pod crashes or restarts
- [ ] Init containers completed successfully
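
The "no pod crashes or restarts" item can be checked from the pod listing rather than by eye. This is a small sketch, assuming the default `kubectl get pods --no-headers` column order (NAME, READY, STATUS, RESTARTS, AGE); `pods_with_restarts` is an illustrative name.

```bash
#!/usr/bin/env bash
# Illustrative filter (not a repository script): reads `kubectl get pods
# --no-headers` output on stdin and prints "<pod> <restarts>" for every pod
# whose RESTARTS column (4th field) is greater than zero.
pods_with_restarts() {
  awk '$4 + 0 > 0 { print $1, $4 }'
}

# Example:
#   kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database \
#     --no-headers | pods_with_restarts
```

Empty output means no database pod has restarted since it was created.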
### Phase 2: Service Deployment (Week 2)
#### Step 2.1: Deploy Database Migrations
```bash
# Apply migration jobs
kubectl apply -f infrastructure/kubernetes/base/migrations/
# Wait for migrations to complete
kubectl wait --for=condition=complete job -l app.kubernetes.io/component=migration -n bakery-ia --timeout=600s
# Check migration status
kubectl get jobs -n bakery-ia | grep migration
```
**Expected:** All migration jobs show "COMPLETIONS = 1/1"
- [ ] All database migration jobs completed successfully
- [ ] No migration errors in logs
- [ ] Database schemas created
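
To list only the migration jobs that have not succeeded, the job status can be read with a JSONPath template instead of grepping the table output. The sketch below assumes each job is expected to succeed exactly once; `incomplete_jobs` is an illustrative name.

```bash
#!/usr/bin/env bash
# Illustrative filter (not a repository script): reads lines of
# "<job-name><TAB><succeeded-count>" on stdin and prints the names of jobs
# whose succeeded count is not 1 (an empty count means no success yet).
incomplete_jobs() {
  awk -F'\t' 'NF && $2 != "1" { print $1 }'
}

# Example: feed it name/succeeded pairs produced by kubectl.
#   kubectl get jobs -n bakery-ia -l app.kubernetes.io/component=migration \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.succeeded}{"\n"}{end}' \
#     | incomplete_jobs
```

For any job it prints, `kubectl logs -n bakery-ia job/<job-name>` is the usual next step.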
#### Step 2.2: Deploy Services
```bash
# Deploy all microservices
kubectl apply -f infrastructure/kubernetes/base/components/services/
# Wait for services to be ready
kubectl wait --for=condition=ready pod -l app.kubernetes.io/component=service -n bakery-ia --timeout=600s
# Check service status
kubectl get pods -n bakery-ia -l app.kubernetes.io/component=service
```
**Expected:** All 15 service pods in "Running" state
- [ ] All microservice pods running
- [ ] Services connect to databases with TLS
- [ ] No SSL/TLS errors in logs
- [ ] Health endpoints responding
#### Step 2.3: Deploy Gateway and Frontend
```bash
# Deploy API gateway
kubectl apply -f infrastructure/kubernetes/base/components/gateway/
# Deploy frontend
kubectl apply -f infrastructure/kubernetes/base/components/frontend/
# Check deployment status
kubectl get pods -n bakery-ia
```
- [ ] Gateway pod running
- [ ] Frontend pod running
- [ ] Ingress configured (if applicable)
### Phase 3: Security Hardening (Week 3-4)
#### Step 3.1: Enable Kubernetes Secrets Encryption
```bash
# REQUIRES CLUSTER RECREATION
# Delete existing cluster (WARNING: destroys all data)
kind delete cluster --name bakery-ia-local
# Create cluster with encryption enabled
kind create cluster --config kind-config.yaml
# Re-deploy entire stack
kubectl apply -f infrastructure/kubernetes/base/namespace.yaml
./scripts/apply-security-changes.sh
```
- [ ] Encryption configuration file created
- [ ] Kind cluster configured with encryption
- [ ] All secrets encrypted at rest
- [ ] Encryption verified (secrets stored in etcd carry the `k8s:enc:` prefix; check kube-apiserver logs for provider errors)
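
The encryption configuration file mentioned in the checklist can be generated as follows. This is a sketch under assumptions: the output path is chosen by the caller, the key name `key1` is arbitrary, and how `kind-config.yaml` mounts the file into the API server is not shown here. The `aescbc` provider with a 32-byte key corresponds to the AES-256 key from the pre-deployment checklist.

```bash
#!/usr/bin/env bash
# Illustrative helper (not a repository script): writes a Kubernetes
# EncryptionConfiguration with a freshly generated 32-byte AES key.
write_encryption_config() {
  local out="$1" key
  # 32 random bytes, base64-encoded on a single line.
  key=$(head -c 32 /dev/urandom | base64 | tr -d '\n')
  (
    umask 077   # the file contains key material; keep it owner-only
    cat > "$out" <<EOF
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: ${key}
      - identity: {}
EOF
  )
}

# Example (path is a placeholder):
#   write_encryption_config ./encryption-config.yaml
```

The trailing `identity` provider lets the API server still read secrets that were written before encryption was enabled. Keep the generated file out of version control.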
#### Step 3.2: Configure Audit Logging
```bash
# Verify PostgreSQL logging enabled
kubectl exec -n bakery-ia <postgres-pod> -- sh -c \
'psql -U $POSTGRES_USER -d $POSTGRES_DB -c "SHOW log_statement;"'
# Should show: all
```
- [ ] PostgreSQL logs all statements
- [ ] Connection logging enabled
- [ ] Query duration logging enabled
- [ ] Log rotation configured
#### Step 3.3: Enable pgcrypto Extension
```bash
# Verify pgcrypto installed
kubectl exec -n bakery-ia <postgres-pod> -- sh -c \
'psql -U $POSTGRES_USER -d $POSTGRES_DB -c "SELECT * FROM pg_extension WHERE extname='"'"'pgcrypto'"'"';"'
# Should return one row
```
- [ ] pgcrypto extension available in all databases
- [ ] Encryption functions tested
- [ ] Documentation for using column-level encryption provided
---
## Verification Checklist
### Database Security Verification
#### PostgreSQL TLS
```bash
# 1. Verify SSL enabled
kubectl exec -n bakery-ia auth-db-<pod-id> -- sh -c \
'psql -U auth_user -d auth_db -c "SHOW ssl;"'
# Expected: on
# 2. Verify TLS version
kubectl exec -n bakery-ia auth-db-<pod-id> -- sh -c \
'psql -U auth_user -d auth_db -c "SHOW ssl_min_protocol_version;"'
# Expected: TLSv1.2
# 3. Verify certificate permissions
kubectl exec -n bakery-ia auth-db-<pod-id> -- ls -la /tls/
# Expected: server-key.pem = 600, server-cert.pem = 644
# 4. Check certificate expiry
kubectl exec -n bakery-ia auth-db-<pod-id> -- \
openssl x509 -in /tls/server-cert.pem -noout -dates
# Expected: notAfter=Oct 17 00:00:00 2028 GMT
```
**Verification Checklist:**
- [ ] SSL enabled on all 14 PostgreSQL databases
- [ ] TLS 1.2+ enforced
- [ ] Certificates have correct permissions (key=600, cert=644)
- [ ] Certificates valid until 2028
- [ ] All certificates owned by postgres user
#### Redis TLS
```bash
# 1. Test Redis TLS connection
kubectl exec -n bakery-ia redis-<pod-id> -- redis-cli \
--tls \
--cert /tls/redis-cert.pem \
--key /tls/redis-key.pem \
--cacert /tls/ca-cert.pem \
-a <redis-password> \
ping
# Expected: PONG
# 2. Verify plaintext connections are rejected (same command without the TLS flags)
kubectl exec -n bakery-ia redis-<pod-id> -- redis-cli -a <redis-password> ping
# Expected: no PONG; the connection is refused or reset
```
**Verification Checklist:**
- [ ] Redis responds to TLS connections
- [ ] Plaintext connections refused
- [ ] Password authentication working
- [ ] No "wrong version number" errors in logs
#### Service Connections
```bash
# 1. Check migration jobs
kubectl get jobs -n bakery-ia | grep migration
# Expected: All show "1/1" completions
# 2. Check service logs for SSL enforcement
kubectl logs -n bakery-ia auth-service-<pod-id> | grep "SSL enforcement"
# Expected: "SSL enforcement added to database URL"
# 3. Check for connection errors
kubectl logs -n bakery-ia auth-service-<pod-id> | grep -i "error" | grep -i "ssl"
# Expected: No SSL/TLS errors
```
**Verification Checklist:**
- [ ] All migration jobs completed successfully
- [ ] Services show SSL enforcement in logs
- [ ] No TLS/SSL connection errors
- [ ] All services can connect to databases
- [ ] Health endpoints return 200 OK
### Data Persistence Verification
```bash
# 1. Check all PVCs
kubectl get pvc -n bakery-ia
# Expected: 15 PVCs, all "Bound"
# 2. Check PVC sizes
kubectl get pvc -n bakery-ia -o custom-columns=NAME:.metadata.name,SIZE:.spec.resources.requests.storage
# Expected: PostgreSQL=2Gi, Redis=1Gi
# 3. Test data persistence (restart a database)
kubectl delete pod auth-db-<pod-id> -n bakery-ia
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=auth-db -n bakery-ia --timeout=120s
# Data should persist after restart
```
**Verification Checklist:**
- [ ] All 15 PVCs in "Bound" state
- [ ] Correct storage sizes allocated
- [ ] Data persists across pod restarts
- [ ] No emptyDir volumes for databases
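
The "all 15 PVCs bound" check can be scripted so it fails loudly in CI or a deploy script. This is a minimal sketch, assuming the default `kubectl get pvc --no-headers` column order (NAME, STATUS, ...); `all_pvcs_bound` is an illustrative name.

```bash
#!/usr/bin/env bash
# Illustrative check (not a repository script): reads `kubectl get pvc
# --no-headers` output on stdin, prints "<bound>/<expected> bound", and
# returns non-zero unless exactly <expected> PVCs are in the Bound state.
all_pvcs_bound() {
  awk -v want="$1" '
    $2 == "Bound" { n++ }
    END {
      printf "%d/%d bound\n", n, want
      exit (n == want ? 0 : 1)
    }'
}

# Example:
#   kubectl get pvc -n bakery-ia --no-headers | all_pvcs_bound 15
```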
### Password Security Verification
```bash
# 1. Check password strength
kubectl get secret bakery-ia-secrets -n bakery-ia -o jsonpath='{.data.AUTH_DB_PASSWORD}' | base64 -d | wc -c
# Expected: 32 or more characters
# 2. Verify passwords are NOT defaults
kubectl get secret bakery-ia-secrets -n bakery-ia -o jsonpath='{.data.AUTH_DB_PASSWORD}' | base64 -d
# Should NOT be: auth_pass123
```
**Verification Checklist:**
- [ ] All passwords 32+ characters
- [ ] Passwords use cryptographically secure random generation
- [ ] No default passwords (`*_pass123`) in use
- [ ] Passwords backed up in secure location
- [ ] Password rotation schedule documented
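
The length and default-value checks above can be combined into one test that never prints the password. The sketch below is an assumption-labelled helper, not a repository script; `password_ok` is an illustrative name.

```bash
#!/usr/bin/env bash
# Illustrative check (not a repository script): succeeds only if the given
# password is at least 32 characters long and does not follow the insecure
# default pattern *_pass123.
password_ok() {
  local pw="$1"
  [ "${#pw}" -ge 32 ] || return 1
  case "$pw" in
    *_pass123) return 1 ;;
  esac
  return 0
}

# Example: check one key of the secret without echoing its value.
#   pw=$(kubectl get secret bakery-ia-secrets -n bakery-ia \
#        -o jsonpath='{.data.AUTH_DB_PASSWORD}' | base64 -d)
#   password_ok "$pw" && echo "AUTH_DB_PASSWORD ok" || echo "AUTH_DB_PASSWORD weak"
```

Repeating the call for each password key in the secret covers the "all passwords" items in the checklist.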
### Compliance Verification
**GDPR Article 32:**
- [ ] Encryption in transit implemented (TLS)
- [ ] Encryption at rest available (pgcrypto + K8s)
- [ ] Privacy policy claims are accurate
- [ ] User data access logging enabled
**PCI-DSS:**
- [ ] Requirement 4: Transmission encryption (TLS) ✓
- [ ] Requirement 3: Stored data protection (pgcrypto) ✓
- [ ] Requirement 10: Access tracking (audit logs) ✓
**SOC 2:**
- [ ] CC6.1: Access controls (RBAC) ✓
- [ ] CC6.6: Transit encryption (TLS) ✓
- [ ] CC6.7: Rest encryption (K8s + pgcrypto) ✓
---
## Post-Deployment Tasks
### Immediate (First 24 Hours)
#### Backup Configuration
```bash
# 1. Test backup script
./scripts/encrypted-backup.sh
# 2. Verify backup created
ls -lh /path/to/backups/
# 3. Test restore process
gpg --decrypt backup_file.sql.gz.gpg | gunzip | head -n 10
```
- [ ] Backup script tested and working
- [ ] Backups encrypted with GPG
- [ ] Restore process documented and tested
- [ ] Backup storage location configured
- [ ] Backup retention policy defined
#### Monitoring Setup
```bash
# 1. Set up certificate expiry monitoring
# Add to monitoring system: Alert 90 days before October 2028
# 2. Set up database health checks
# Monitor: Connection count, query performance, disk usage
# 3. Set up audit log monitoring
# Monitor: Failed login attempts, privilege escalations
```
- [ ] Certificate expiry alerts configured
- [ ] Database health monitoring enabled
- [ ] Audit log monitoring configured
- [ ] Security event alerts configured
- [ ] Performance monitoring enabled
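
Until a monitoring system carries the certificate expiry alert, a scheduled script can perform the same check with `openssl x509 -checkend`. This is a sketch, assuming the certificate is available as a local PEM file; `cert_expires_within` is an illustrative name.

```bash
#!/usr/bin/env bash
# Illustrative check (not a repository script): returns 0 if the PEM
# certificate expires within the given number of days, 1 otherwise.
# An unreadable file is also reported as "expiring" so that failures
# are not silently ignored.
cert_expires_within() {
  local cert="$1" days="$2"
  # -checkend exits 0 when the certificate is still valid after N seconds.
  ! openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null 2>&1
}

# Example (path is a placeholder for a copy of the server certificate):
#   if cert_expires_within ./server-cert.pem 90; then
#     echo "certificate expires within 90 days; schedule rotation"
#   fi
```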
### First Week
#### Security Audit
```bash
# 1. Review audit logs
kubectl logs -n bakery-ia <db-pod> | grep -i "authentication failed"
# 2. Review access patterns
kubectl logs -n bakery-ia <db-pod> | grep -i "connection received"
# 3. Check for anomalies
kubectl logs -n bakery-ia <db-pod> | grep -iE "(error|warning|fatal)"
```
- [ ] Audit logs reviewed for suspicious activity
- [ ] No unauthorized access attempts
- [ ] All services connecting properly
- [ ] No security warnings in logs
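
When reviewing failed logins, a per-user count is often more useful than raw log lines. The sketch below assumes PostgreSQL's standard message text (`password authentication failed for user "<name>"`); `auth_failures_by_user` is an illustrative name.

```bash
#!/usr/bin/env bash
# Illustrative filter (not a repository script): reads PostgreSQL log lines
# on stdin and prints "<user> <failed-attempts>" sorted by user name.
auth_failures_by_user() {
  grep -i 'authentication failed for user' \
    | sed -E 's/.*for user "([^"]+)".*/\1/' \
    | sort | uniq -c | awk '{ print $2, $1 }'
}

# Example:
#   kubectl logs -n bakery-ia <db-pod> | auth_failures_by_user
```

A user name that is not one of the service accounts, or a sudden spike for one account, is worth investigating.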
#### Documentation
- [ ] Update runbooks with new security procedures
- [ ] Document certificate rotation process
- [ ] Document password rotation process
- [ ] Update disaster recovery plan
- [ ] Share security documentation with team
### First Month
#### Access Control Implementation
- [ ] Implement role decorators on critical endpoints
- [ ] Add subscription tier checks on premium features
- [ ] Implement rate limiting on ML operations
- [ ] Add audit logging for destructive operations
- [ ] Test RBAC enforcement
#### Backup and Recovery
- [ ] Set up automated daily backups (2 AM)
- [ ] Configure backup rotation (30/90/365 days)
- [ ] Test disaster recovery procedure
- [ ] Document recovery time objectives (RTO)
- [ ] Document recovery point objectives (RPO)
---
## Ongoing Maintenance
### Daily
- [ ] Monitor database health (automated)
- [ ] Check backup completion (automated)
- [ ] Review critical alerts
### Weekly
- [ ] Review audit logs for anomalies
- [ ] Check certificate expiry dates
- [ ] Verify backup integrity
- [ ] Review access control logs
### Monthly
- [ ] Review security posture
- [ ] Update security documentation
- [ ] Test backup restore process
- [ ] Review and update RBAC policies
- [ ] Check for security updates
### Quarterly (Every 90 Days)
- [ ] **Rotate all passwords**
- [ ] Review and update security policies
- [ ] Conduct security audit
- [ ] Update disaster recovery plan
- [ ] Review compliance status
- [ ] Security team training
### Annually
- [ ] Full security assessment
- [ ] Penetration testing
- [ ] Compliance audit (GDPR, PCI-DSS, SOC 2)
- [ ] Update security roadmap
- [ ] Review and update all security documentation
### Before Certificate Expiry (Oct 2028 - Alert 90 Days Prior)
- [ ] Generate new TLS certificates
- [ ] Test new certificates in staging
- [ ] Schedule maintenance window
- [ ] Update Kubernetes secrets
- [ ] Restart database pods
- [ ] Verify new certificates working
- [ ] Update documentation with new expiry dates
---
## Security Hardening Roadmap
### Completed (Security Grade: A-)
- ✅ TLS encryption for all database connections
- ✅ Strong password policy (32-character passwords)
- ✅ Data persistence with PVCs
- ✅ Kubernetes secrets encryption
- ✅ PostgreSQL audit logging
- ✅ pgcrypto extension for encryption at rest
- ✅ Automated encrypted backups
### Phase 1: Critical Security (Weeks 1-2)
- [ ] Add role decorators to all deletion endpoints
- [ ] Implement owner-only checks for billing/subscription
- [ ] Add service-to-service authentication
- [ ] Implement audit logging for critical operations
- [ ] Add rate limiting on authentication endpoints
### Phase 2: Premium Feature Gating (Weeks 3-4)
- [ ] Implement forecast horizon limits per tier
- [ ] Implement training job quotas per tier
- [ ] Implement dataset size limits for ML
- [ ] Add tier checks to advanced analytics
- [ ] Add tier checks to scenario modeling
- [ ] Implement usage quota tracking
### Phase 3: Advanced Access Control (Month 2)
- [ ] Fine-grained resource permissions
- [ ] Department-based access control
- [ ] Approval workflows for critical operations
- [ ] Data retention policies
- [ ] GDPR data export functionality
### Phase 4: Infrastructure Hardening (Month 3)
- [ ] Network policies for service isolation
- [ ] Pod Security Standards enforced via Pod Security Admission (PodSecurityPolicy was removed in Kubernetes 1.25)
- [ ] Resource quotas and limits
- [ ] Container image scanning
- [ ] Secrets management with HashiCorp Vault (optional)
### Phase 5: Advanced Features (Month 4-6)
- [ ] Mutual TLS (mTLS) for service-to-service
- [ ] Database activity monitoring (DAM)
- [ ] SIEM integration
- [ ] Automated certificate rotation
- [ ] Multi-region disaster recovery
### Long-term (6+ Months)
- [ ] Migrate to managed database services (AWS RDS, Cloud SQL)
- [ ] Implement HashiCorp Vault for secrets
- [ ] Deploy Istio service mesh
- [ ] Implement zero-trust networking
- [ ] SOC 2 Type II certification
---
## Related Documentation
### Security Guides
- [Database Security](./database-security.md) - Complete database security guide
- [RBAC Implementation](./rbac-implementation.md) - Access control details
- [TLS Configuration](./tls-configuration.md) - TLS/SSL setup guide
### Source Reports
- [Database Security Analysis Report](../DATABASE_SECURITY_ANALYSIS_REPORT.md)
- [Security Implementation Complete](../SECURITY_IMPLEMENTATION_COMPLETE.md)
- [RBAC Analysis Report](../RBAC_ANALYSIS_REPORT.md)
- [TLS Implementation Complete](../TLS_IMPLEMENTATION_COMPLETE.md)
### Operational Guides
- [Backup and Recovery Guide](../operations/backup-recovery.md) (if present)
- [Monitoring Guide](../operations/monitoring.md) (if present)
- [Incident Response Plan](../operations/incident-response.md) (if present)
---
## Quick Reference
### Common Verification Commands
```bash
# Verify all databases running
kubectl get pods -n bakery-ia -l app.kubernetes.io/component=database
# Verify all PVCs bound
kubectl get pvc -n bakery-ia
# Verify TLS secrets
kubectl get secrets -n bakery-ia | grep tls
# Check certificate expiry
kubectl exec -n bakery-ia <pod> -- \
openssl x509 -in /tls/server-cert.pem -noout -dates
# Test database connection
kubectl exec -n bakery-ia <pod> -- sh -c \
'psql -U $POSTGRES_USER -d $POSTGRES_DB -c "SELECT version();"'
# Test Redis connection (sh -c makes $REDIS_PASSWORD expand inside the pod; assumes the container sets it)
kubectl exec -n bakery-ia <pod> -- sh -c \
'redis-cli --tls --cert /tls/redis-cert.pem \
--key /tls/redis-key.pem \
--cacert /tls/ca-cert.pem \
-a "$REDIS_PASSWORD" ping'
# View recent audit logs
kubectl logs -n bakery-ia <db-pod> --tail=100
# Restart all services
kubectl rollout restart deployment -n bakery-ia
```
### Emergency Procedures
**Database Pod Not Starting:**
```bash
# 1. Check init container logs
kubectl logs -n bakery-ia <pod> -c fix-tls-permissions
# 2. Check main container logs
kubectl logs -n bakery-ia <pod>
# 3. Describe pod for events
kubectl describe pod <pod> -n bakery-ia
```
**Services Can't Connect to Database:**
```bash
# 1. Verify database is listening
kubectl exec -n bakery-ia <db-pod> -- netstat -tlnp
# 2. Check service logs
kubectl logs -n bakery-ia <service-pod> | grep -i "database\|error"
# 3. Restart service
kubectl rollout restart deployment/<service> -n bakery-ia
```
**Lost Database Password:**
```bash
# 1. Read the current password from the Kubernetes secret
kubectl get secret bakery-ia-secrets -n bakery-ia -o jsonpath='{.data.AUTH_DB_PASSWORD}' | base64 -d
# 2. Or check .env file (if available)
grep AUTH_DB_PASSWORD .env
# 3. Last resort: Reset password (requires database restart)
```
---
**Document Version:** 1.0
**Last Review:** November 2025
**Next Review:** February 2026
**Owner:** Security Team
**Approval Required:** DevOps Lead, Security Lead