Fix: Enforce single-worker deployment for session cache cluster safety
Addresses: Backend session cache not cluster-safe (multi-worker issue)

Problem:
- Session cache is process-local (InMemorySessionCache)
- Multi-worker deployments (uvicorn --workers N) create separate processes
- Each process has its own independent session cache
- Sessions cached in Worker A are invisible to Workers B, C, D
- Users randomly logged out when requests land on different workers
- Also affects RuntimeState, rate limiter, and background jobs

Solution (Option A - Strict single-worker enforcement):
- Enhance startup validation with clearer error messages
- Update error messages to explain the problem and how to fix it
- Document single-worker requirement prominently in Docker configs
- Update module docstrings to clarify constraints

Changes:
1. app/startup.py:
   - Enhanced _check_single_worker_mode() error message with troubleshooting
   - Enhanced _stage_check_worker_mode_and_acquire_lock() error message
   - Removed unused import
2. app/utils/session_cache.py:
   - Updated module docstring to explain constraints more clearly
   - Added references to deployment documentation
   - Clarified multi-worker solution for future implementation
3. app/utils/runtime_state.py:
   - Updated module docstring with deployment constraint references
   - Aligned messaging with session_cache.py
4. Docker/Dockerfile.backend:
   - Added comprehensive comments about single-worker requirement
   - Explained impact in multi-worker deployments
   - Referenced deployment constraints documentation
5. Docker/docker-compose.yml, compose.prod.yml, compose.debug.yml:
   - Added documentation comments about BANGUI_WORKERS constraint
   - Explained why single-worker is required
6. backend/tests/test_startup_integration.py:
   - Fixed test unpacking to match function return signature (3 values, not 2)

This ensures multi-worker deployments fail loudly at startup with clear
guidance on what went wrong and how to fix it. The database-backed scheduler
lock provides defense-in-depth for container orchestration scenarios.

For future multi-worker support, implement:
- Redis or database-backed session cache
- Shared RuntimeState coordination
- Distributed APScheduler backend

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
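
The "fail loudly at startup" behaviour described in the message can be sketched as below. This is an illustrative stand-in only: the body of _check_single_worker_mode() is not shown in this commit view, so the function name check_single_worker_mode, the exception class, the error wording, and the assumption that BANGUI_WORKERS is read from the environment are all hypothetical.

```python
import os


class MultiWorkerDeploymentError(RuntimeError):
    """Hypothetical error type for an unsupported worker count."""


def check_single_worker_mode(env=None):
    """Return the worker count, or raise unless it is exactly 1.

    Hypothetical stand-in for _check_single_worker_mode(); it assumes the
    worker count comes from a BANGUI_WORKERS environment variable that
    defaults to 1 when unset.
    """
    env = os.environ if env is None else env
    raw = env.get("BANGUI_WORKERS", "1")
    try:
        workers = int(raw)
    except ValueError:
        raise MultiWorkerDeploymentError(
            f"BANGUI_WORKERS must be an integer, got {raw!r}"
        ) from None
    if workers != 1:
        # Explain both the cause and the fix, as the commit message asks.
        raise MultiWorkerDeploymentError(
            f"BANGUI_WORKERS={workers} is not supported: the session cache, "
            "RuntimeState, rate limiter, and background jobs are process-local, "
            "so every worker would hold its own copy. Set BANGUI_WORKERS=1."
        )
    return workers
```

Passing a plain dict as env, for example check_single_worker_mode({"BANGUI_WORKERS": "4"}), exercises the failure path without touching the real process environment.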
@@ -83,11 +83,12 @@ async def test_startup_shared_resources_complete_flow() -> None:
 mock_blocklist_import_register.return_value = None

 # Call startup_shared_resources
-http_session, scheduler = await startup_shared_resources(app, settings)
+http_session, scheduler, startup_db = await startup_shared_resources(app, settings)

 # Verify all stages completed successfully
 assert http_session is not None
 assert scheduler is not None
+assert startup_db is not None
 assert scheduler.running

 # Verify resources were initialized
@@ -178,7 +179,7 @@ async def test_startup_shared_resources_scheduler_starts() -> None:
 mock_geo_cache.init_geoip = MagicMock()
 mock_geo_cache_class.return_value = mock_geo_cache

-http_session, scheduler = await startup_shared_resources(app, settings)
+http_session, scheduler, startup_db = await startup_shared_resources(app, settings)

 # Verify scheduler is running
 assert scheduler.running
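
Why the two-name unpacking in the old tests had to change can be shown with a small stand-alone sketch. The coroutine startup_shared_resources_stub below is a hypothetical placeholder that only mimics the three-value return shape; it is not the project's startup_shared_resources.

```python
import asyncio


async def startup_shared_resources_stub():
    """Hypothetical stand-in returning three values, like the updated function."""
    return "http_session", "scheduler", "startup_db"


async def main():
    # Old test shape: two targets for three returned values fails at unpack time.
    try:
        http_session, scheduler = await startup_shared_resources_stub()
    except ValueError as exc:
        old_error = str(exc)
    else:
        old_error = None

    # New test shape: three targets match the three returned values.
    http_session, scheduler, startup_db = await startup_shared_resources_stub()
    return old_error, (http_session, scheduler, startup_db)


old_error, values = asyncio.run(main())
```

The old form raises ValueError before any assertion runs, so those tests could not pass until the third name was added.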