Fix training log race conditions and audit event error
Critical fixes for training session logging:
1. Training log race condition fix:
   - Commit the session explicitly after creating a training log so the
     new row is visible to other sessions
   - Handle duplicate key errors gracefully when multiple sessions try
     to create the same log simultaneously
   - After a duplicate key violation, retry a query for the existing log
     instead of failing
   - Prevents "Training log not found" errors during training
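2. Audit event async generator error fix:
   - Replace the incorrect next(get_db()) call with the async context
     manager database_manager.get_session(). get_db() is an async
     generator, so the built-in next() cannot advance it
   - Fixes the "'async_generator' object is not an iterator" error, which
     was caught by the surrounding except block and only logged as a
     warning, so the audit event was never written
   - Ensures audit logging for training job creation works correctly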
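The get-or-create pattern in item 1 can be sketched as follows. This is a minimal, self-contained illustration using Python's built-in sqlite3 module, not the service's actual code: the real fix presumably runs on its own async database session, and the table name, columns, and the retries/delay parameters below are hypothetical.

```python
import sqlite3
import time


def get_or_create_training_log(
    conn: sqlite3.Connection,
    job_id: str,
    tenant_id: str,
    retries: int = 3,
    delay: float = 0.05,
) -> int:
    """Return the id of the training log for job_id, creating it if needed.

    Assumes a hypothetical table with a UNIQUE constraint on job_id:
    training_logs(id INTEGER PRIMARY KEY, job_id TEXT UNIQUE, tenant_id TEXT).
    """
    try:
        cur = conn.execute(
            "INSERT INTO training_logs (job_id, tenant_id) VALUES (?, ?)",
            (job_id, tenant_id),
        )
        # Commit right away so the new row is visible to other connections.
        conn.commit()
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # Duplicate key: another session created this log first. Roll back
        # the failed transaction before issuing any new statements.
        conn.rollback()

    # Query for the existing row, retrying briefly in case it is not yet visible.
    for _ in range(retries):
        row = conn.execute(
            "SELECT id FROM training_logs WHERE job_id = ?", (job_id,)
        ).fetchone()
        if row is not None:
            return row[0]
        time.sleep(delay)
    raise LookupError(f"training log for job {job_id!r} not found after duplicate key error")
```

Calling the function twice with the same job_id returns the same id and leaves a single row in the table. The uniqueness guarantee comes from the database constraint rather than from application logic, which is why catching the duplicate key error is safer than checking for the row before inserting: two sessions can both pass such a check before either one inserts.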
These changes address race conditions between concurrent database
sessions and ensure that training logs are committed and consistently
visible across the training pipeline.
@@ -236,27 +236,27 @@ async def start_training_job(
 
     # Log audit event for training job creation
     try:
-        from app.core.database import get_db
-        db = next(get_db())
-        await audit_logger.log_event(
-            db_session=db,
-            tenant_id=tenant_id,
-            user_id=current_user["user_id"],
-            action=AuditAction.CREATE.value,
-            resource_type="training_job",
-            resource_id=job_id,
-            severity=AuditSeverity.MEDIUM.value,
-            description=f"Started training job (tier: {tier})",
-            metadata={
-                "job_id": job_id,
-                "tier": tier,
-                "estimated_dataset_size": estimated_dataset_size,
-                "quota_usage": quota_result.get('current', 0) if quota_result else 0,
-                "quota_limit": quota_limit if quota_limit else "unlimited"
-            },
-            endpoint="/jobs",
-            method="POST"
-        )
+        from app.core.database import database_manager
+        async with database_manager.get_session() as db:
+            await audit_logger.log_event(
+                db_session=db,
+                tenant_id=tenant_id,
+                user_id=current_user["user_id"],
+                action=AuditAction.CREATE.value,
+                resource_type="training_job",
+                resource_id=job_id,
+                severity=AuditSeverity.MEDIUM.value,
+                description=f"Started training job (tier: {tier})",
+                metadata={
+                    "job_id": job_id,
+                    "tier": tier,
+                    "estimated_dataset_size": estimated_dataset_size,
+                    "quota_usage": quota_result.get('current', 0) if quota_result else 0,
+                    "quota_limit": quota_limit if quota_limit else "unlimited"
+                },
+                endpoint="/jobs",
+                method="POST"
+            )
     except Exception as audit_error:
         logger.warning("Failed to log audit event", error=str(audit_error))
 
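The difference between the removed and added lines can be reproduced in isolation. The sketch below is self-contained and uses a hypothetical FakeSession in place of a real database session. It assumes that database_manager.get_session() behaves like an asynccontextmanager-style factory, which the diff's `async with` usage suggests but does not show.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSession:
    """Hypothetical stand-in for an async database session."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def get_db():
    """Async generator dependency, shaped like the get_db() the old code called."""
    session = FakeSession()
    try:
        yield session
    finally:
        await session.close()


@asynccontextmanager
async def get_session():
    """Context-manager counterpart, analogous to database_manager.get_session()."""
    session = FakeSession()
    try:
        yield session
    finally:
        # Runs both on normal exit and when the body of the async with raises.
        await session.close()


def old_pattern() -> str:
    try:
        next(get_db())  # async generators have no __next__, so this raises TypeError
    except TypeError as exc:
        return str(exc)
    return "no error"


async def new_pattern() -> bool:
    async with get_session() as db:
        assert not db.closed  # the session is usable inside the block
    return db.closed  # and is closed once the block exits
```

old_pattern() returns the same "'async_generator' object is not an iterator" message quoted in the commit. In the original handler that TypeError was raised inside the try block, so the except clause caught it and only logged a warning, which is why the failure was silent. The `async with` form also guarantees that the session is cleaned up when the audit call raises.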