Fix: Enforce single-worker deployment for session cache cluster safety

Addresses: Backend session cache not cluster-safe (multi-worker issue)

Problem:
- Session cache is process-local (InMemorySessionCache)
- Multi-worker deployments (uvicorn --workers N) create separate processes
- Each process has its own independent session cache
- Sessions cached in Worker A are invisible to Workers B, C, D
- Users randomly logged out when requests land on different workers
- Also affects RuntimeState, rate limiter, and background jobs

Solution (Option A - Strict single-worker enforcement):
- Enhance startup validation with clearer error messages
- Update error messages to explain the problem and how to fix it
- Document single-worker requirement prominently in Docker configs
- Update module docstrings to clarify constraints

Changes:
1. app/startup.py:
   - Enhanced _check_single_worker_mode() error message with troubleshooting
   - Enhanced _stage_check_worker_mode_and_acquire_lock() error message
   - Removed unused import

2. app/utils/session_cache.py:
   - Updated module docstring to explain constraints more clearly
   - Added references to deployment documentation
   - Clarified multi-worker solution for future implementation

3. app/utils/runtime_state.py:
   - Updated module docstring with deployment constraint references
   - Aligned messaging with session_cache.py

4. Docker/Dockerfile.backend:
   - Added comprehensive comments about single-worker requirement
   - Explained impact in multi-worker deployments
   - Referenced deployment constraints documentation

5. Docker/docker-compose.yml, compose.prod.yml, compose.debug.yml:
   - Added documentation comments about BANGUI_WORKERS constraint
   - Explained why single-worker is required

6. backend/tests/test_startup_integration.py:
   - Fixed test unpacking to match function return signature (3 values, not 2)

This ensures multi-worker deployments fail loudly at startup with clear
guidance on what went wrong and how to fix it. The database-backed scheduler
lock provides defense-in-depth for container orchestration scenarios.

For future multi-worker support, implement:
- Redis or database-backed session cache
- Shared RuntimeState coordination
- Distributed APScheduler backend

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
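
The commit message describes a startup check that rejects any BANGUI_WORKERS value other than 1. The body of _check_single_worker_mode() is not part of this page, so the following is only a minimal sketch of what such a check could look like. The function name, the exception class, the error wording, and the choice to treat an empty value as unset are all assumptions made for illustration.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


class MultiWorkerConfigError(RuntimeError):
    """Hypothetical error type: the configured worker count is not exactly one."""


def check_single_worker_mode(environ: Mapping[str, str] = os.environ) -> int:
    """Return 1 when BANGUI_WORKERS is unset (or empty) or "1"; raise otherwise.

    Assumption: an empty string is treated the same as an unset variable.
    """
    raw = environ.get("BANGUI_WORKERS", "").strip()
    if raw in ("", "1"):
        return 1
    # Fail loudly and say both what is wrong and how to fix it.
    raise MultiWorkerConfigError(
        f"BANGUI_WORKERS={raw!r} is not supported. The session cache, "
        "RuntimeState, rate limiter, and background jobs are process-local, "
        "so extra workers would each hold their own copy. "
        "Set BANGUI_WORKERS=1 (or leave it unset) and restart."
    )
```

Passing the environment in as a mapping keeps the check easy to test without mutating os.environ.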
@@ -24,18 +24,26 @@ IMPACT IN MULTI-WORKER DEPLOYMENTS:
 - fail2ban activation/recovery tracking (pending_recovery, last_activation)
 is per-worker and unreliable across processes.
 
-MULTI-WORKER SOLUTION:
-To deploy BanGUI with multiple workers (e.g., via gunicorn -w 4), you must:
-1. Replace RuntimeState with a shared store (Redis, shared memory, database).
-2. Replace InMemorySessionCache with RedisSessionCache (see session_cache.py).
-3. Ensure all workers use the same backend for coordination.
-
 SINGLE-WORKER ENFORCEMENT:
-See TASK-002 in Docs/Tasks.md for deployment configuration that enforces
-single-worker mode, preventing this issue entirely.
+BanGUI enforces single-worker mode at startup:
+1. Environment variable check: BANGUI_WORKERS must be 1 or unset
+2. Database lock: Only one instance can run the scheduler at a time
+3. Startup validation: Fails loudly if multi-worker scenario is detected
 
-For now, BanGUI is deployed as single-worker only — this constraint is
-acceptable and keeps the implementation simple.
+See Docs/Architekture.md § Deployment Constraints for full details.
+
+MULTI-WORKER SOLUTION (Future):
+To deploy BanGUI with multiple workers in the future (e.g., via gunicorn -w 4):
+1. Replace RuntimeState with a shared store (Redis, shared memory, database)
+2. Replace InMemorySessionCache with a shared backend (Redis, database)
+3. Replace APScheduler with a distributed scheduler backend
+4. Ensure all workers use the same backend for coordination
+
+CURRENT STATUS:
+For now, BanGUI is deployed as single-worker only. This constraint is
+acceptable and keeps the implementation simple. The database-backed scheduler
+lock ensures only one instance runs background jobs, even in container
+orchestration scenarios where multiple instances may start.
 """
 
 from __future__ import annotations
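
The docstring above mentions a database-backed scheduler lock that lets only one instance run background jobs. The actual schema and acquisition logic are not shown in this commit, so the sketch below only illustrates the general idea with SQLite: a table constrained to a single row, where the first instance to insert its name wins. The table name, column names, and function name are hypothetical, and stale-lock handling (heartbeats, expiry, release on shutdown) is deliberately left out.

```python
import sqlite3


def try_acquire_scheduler_lock(conn: sqlite3.Connection, owner: str) -> bool:
    """Return True if `owner` holds the single scheduler lock row afterwards."""
    # The CHECK constraint allows exactly one row (id = 1) in the table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scheduler_lock ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), "
        "owner TEXT NOT NULL)"
    )
    # INSERT OR IGNORE does nothing if another instance already took the row.
    conn.execute(
        "INSERT OR IGNORE INTO scheduler_lock (id, owner) VALUES (1, ?)",
        (owner,),
    )
    conn.commit()
    # Read the row back to see who actually holds the lock.
    row = conn.execute("SELECT owner FROM scheduler_lock WHERE id = 1").fetchone()
    return row is not None and row[0] == owner
```

An instance that gets False back would skip starting the scheduler (or abort startup, depending on policy) rather than run duplicate jobs.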
@@ -19,16 +19,24 @@ IMPACT IN MULTI-WORKER DEPLOYMENTS:
 - Worker B still has the stale session in its cache → request is accepted.
 - User appears still logged in (from their perspective).
 
-This is a security issue: logout does not work reliably across workers.
+This is a CRITICAL SECURITY ISSUE: logout does not work reliably across workers.
 
-MULTI-WORKER SOLUTION:
-To deploy BanGUI with multiple workers (e.g., via gunicorn -w 4), replace
-InMemorySessionCache with a shared backend such as:
-- RedisSessionCache — backed by Redis (recommended for production).
-- DatabaseSessionCache — backed by SQLite or PostgreSQL.
-- SharedMemorySessionCache — backed by IPC (for local multi-process).
+SINGLE-WORKER ENFORCEMENT:
+BanGUI enforces single-worker mode to prevent this issue:
+1. Environment variable check: BANGUI_WORKERS must be 1 or unset
+2. Database lock: Only one instance can run the scheduler at a time
+3. Startup validation: Fails loudly if multi-worker scenario is detected
 
-The SessionCache Protocol is already designed for pluggable backends:
+See Docs/Architekture.md § Deployment Constraints for full details.
+
+MULTI-WORKER SOLUTION (Future):
+If multi-worker support is needed in the future, replace InMemorySessionCache
+with a shared backend such as:
+- RedisSessionCache — backed by Redis (recommended for production)
+- DatabaseSessionCache — backed by SQLite or PostgreSQL
+- SharedMemorySessionCache — backed by IPC (for local multi-process)
+
+The SessionCache Protocol is designed for pluggable backends:
 class SessionCache(Protocol):
 def get(token: str) -> Session | None: ...
 def set(token: str, session: Session, ttl_seconds: float) -> None: ...
@@ -36,17 +44,16 @@ MULTI-WORKER SOLUTION:
 def clear() -> None: ...
 
 To add Redis support:
-1. Create RedisSessionCache in this module (implements SessionCache).
-2. Update runtime_state.set_runtime_settings() to instantiate RedisSessionCache
-when REDIS_URL is configured.
-3. See Backend-Development.md § "Session Cache Pluggability" for details.
+1. Create RedisSessionCache in this module (implements SessionCache)
+2. Update app/main.py _update_session_cache() to instantiate RedisSessionCache
+when BANGUI_REDIS_URL is configured
+3. Update Backend-Development.md with multi-worker deployment guidelines
 
-SINGLE-WORKER ENFORCEMENT:
-See TASK-002 in Docs/Tasks.md for deployment configuration that enforces
-single-worker mode, preventing this issue entirely.
-
-For now, BanGUI is deployed as single-worker only — this constraint is
-acceptable and keeps the implementation simple.
+CURRENT STATUS:
+For now, BanGUI is deployed as single-worker only. This constraint is
+acceptable and keeps the implementation simple. The database-backed scheduler
+lock ensures only one instance runs background jobs, even in container
+orchestration scenarios where multiple instances may start.
 """
 
 from __future__ import annotations
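
The session_cache.py docstring argues that logout is unreliable when each worker keeps its own cache. The sketch below reproduces that failure in a single process by treating two InMemorySessionCache instances as two workers. It is not BanGUI's implementation: the Session fields, the delete() method (the diff elides the lines between set() and clear()), and the logout() helper are assumptions made so the example is self-contained.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Session:
    """Stand-in for BanGUI's Session model; the fields are assumptions."""
    token: str
    user: str


class SessionCache(Protocol):
    def get(self, token: str) -> Session | None: ...
    def set(self, token: str, session: Session, ttl_seconds: float) -> None: ...
    def delete(self, token: str) -> None: ...  # assumed; elided in the diff
    def clear(self) -> None: ...


class InMemorySessionCache:
    """Process-local cache: one instance models one worker's private state."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[Session, float]] = {}

    def get(self, token: str) -> Session | None:
        entry = self._entries.get(token)
        if entry is None:
            return None
        session, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._entries[token]  # drop expired entries lazily
            return None
        return session

    def set(self, token: str, session: Session, ttl_seconds: float) -> None:
        self._entries[token] = (session, time.monotonic() + ttl_seconds)

    def delete(self, token: str) -> None:
        self._entries.pop(token, None)

    def clear(self) -> None:
        self._entries.clear()


def logout(cache: SessionCache, token: str) -> None:
    """Hypothetical logout helper: it only touches the cache it is given."""
    cache.delete(token)


# Two workers, each with its own cache, have both cached the same session.
session = Session(token="t1", user="alice")
worker_a, worker_b = InMemorySessionCache(), InMemorySessionCache()
worker_a.set("t1", session, ttl_seconds=60)
worker_b.set("t1", session, ttl_seconds=60)

logout(worker_a, "t1")                 # the logout request lands on worker A
stale_on_b = worker_b.get("t1")        # worker B still returns the session

# With one shared cache (the single-worker case) the logout is visible at once.
shared = InMemorySessionCache()
shared.set("t1", session, ttl_seconds=60)
logout(shared, "t1")
after_shared_logout = shared.get("t1")
```

Because the helper is typed against the SessionCache Protocol, a Redis- or database-backed class with the same methods could replace InMemorySessionCache without changing callers, which is the pluggability the docstring describes.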