Fix: Enforce single-worker deployment for session cache cluster safety
Addresses: Backend session cache not cluster-safe (multi-worker issue)

Problem:
- Session cache is process-local (InMemorySessionCache)
- Multi-worker deployments (uvicorn --workers N) create separate processes
- Each process has its own independent session cache
- Sessions cached in Worker A are invisible to Workers B, C, D
- Users randomly logged out when requests land on different workers
- Also affects RuntimeState, rate limiter, and background jobs

Solution (Option A - Strict single-worker enforcement):
- Enhance startup validation with clearer error messages
- Update error messages to explain the problem and how to fix it
- Document single-worker requirement prominently in Docker configs
- Update module docstrings to clarify constraints

Changes:
1. app/startup.py:
   - Enhanced _check_single_worker_mode() error message with troubleshooting
   - Enhanced _stage_check_worker_mode_and_acquire_lock() error message
   - Removed unused import
2. app/utils/session_cache.py:
   - Updated module docstring to explain constraints more clearly
   - Added references to deployment documentation
   - Clarified multi-worker solution for future implementation
3. app/utils/runtime_state.py:
   - Updated module docstring with deployment constraint references
   - Aligned messaging with session_cache.py
4. Docker/Dockerfile.backend:
   - Added comprehensive comments about single-worker requirement
   - Explained impact in multi-worker deployments
   - Referenced deployment constraints documentation
5. Docker/docker-compose.yml, compose.prod.yml, compose.debug.yml:
   - Added documentation comments about BANGUI_WORKERS constraint
   - Explained why single-worker is required
6. backend/tests/test_startup_integration.py:
   - Fixed test unpacking to match function return signature (3 values, not 2)

This ensures multi-worker deployments fail loudly at startup with clear
guidance on what went wrong and how to fix it. The database-backed scheduler
lock provides defense-in-depth for container orchestration scenarios.
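
To illustrate the "fail loudly at startup" behaviour, here is a minimal sketch of a worker-count guard. It is not the code from app/startup.py: `check_single_worker` and `MultiWorkerError` are hypothetical names, and only the `BANGUI_WORKERS` variable and its default of 1 come from this commit.

```python
import os


class MultiWorkerError(RuntimeError):
    """Raised when the configured worker count is not exactly 1 (hypothetical)."""


def check_single_worker(env=None):
    """Return the worker count, or raise if it is anything other than 1.

    Hypothetical stand-in for startup validation; the real
    _check_single_worker_mode() may look quite different.
    """
    env = os.environ if env is None else env
    raw = env.get("BANGUI_WORKERS", "1")  # unset is treated as the default of 1
    try:
        workers = int(raw)
    except ValueError:
        raise MultiWorkerError(
            f"BANGUI_WORKERS={raw!r} is not an integer; unset it or set it to 1."
        ) from None
    if workers != 1:
        raise MultiWorkerError(
            f"BANGUI_WORKERS={workers} is not supported: the session cache is "
            "process-local, so each worker would keep its own sessions. "
            "Unset BANGUI_WORKERS or set it to 1."
        )
    return workers
```

Calling `check_single_worker()` before the application is created would make a misconfigured container exit immediately instead of logging users out at random later on.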

For future multi-worker support, implement:
- Redis or database-backed session cache
- Shared RuntimeState coordination
- Distributed APScheduler backend

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
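
As a rough illustration of what a database-backed session cache could look like, the sketch below stores sessions in a shared SQLite file so that any process opening the same file sees the same sessions. The class name, table layout, and method names are assumptions made for this example, not part of the commit. SQLite is used only because it ships with Python; since the commit itself lists SQLite lock contention as a multi-worker concern, a real implementation would more likely target Redis or a server database.

```python
import sqlite3
import time


class SqliteSessionCache:
    """Hypothetical shared session store: every worker opens the same file."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            "token TEXT PRIMARY KEY, "
            "user_id TEXT NOT NULL, "
            "expires_at REAL NOT NULL)"
        )
        self._conn.commit()

    def put(self, token, user_id, ttl_seconds):
        """Store or refresh a session that expires ttl_seconds from now."""
        self._conn.execute(
            "INSERT OR REPLACE INTO sessions VALUES (?, ?, ?)",
            (token, user_id, time.time() + ttl_seconds),
        )
        self._conn.commit()

    def get(self, token):
        """Return the user id for a live session, or None if absent or expired."""
        row = self._conn.execute(
            "SELECT user_id FROM sessions WHERE token = ? AND expires_at > ?",
            (token, time.time()),
        ).fetchone()
        return row[0] if row else None

    def close(self):
        self._conn.close()
```

Two instances pointed at the same path behave like two workers: a session written through one is readable through the other, which is exactly what the process-local cache cannot offer.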
@@ -67,4 +67,19 @@ USER bangui
 HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
 CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/api/health')" || exit 1

+# ⚠️ IMPORTANT: Single-Worker Requirement
+# BanGUI must always run as a single worker process:
+# - Do NOT pass --workers or --worker-class to uvicorn
+# - Do NOT use gunicorn with -w 4 or similar
+# - Do NOT override BANGUI_WORKERS to > 1
+#
+# Why? The session cache is process-local. Multiple workers would cause:
+# - Random user logouts (sessions not shared between workers)
+# - Duplicate background jobs (each worker runs the scheduler)
+# - SQLite lock contention and timeouts
+#
+# For high availability, use container orchestration (Kubernetes, Docker Swarm)
+# to run multiple instances, not multiple workers in a single process.
+#
+# See Docs/Architekture.md § Deployment Constraints for details.
 CMD ["uvicorn", "app.main:create_app", "--factory", "--host", "0.0.0.0", "--port", "8000"]

@@ -65,6 +65,8 @@ services:
 # Secure=false is intentional for local HTTP development.
 # In production, Secure=true prevents session cookies over unencrypted HTTP.
 BANGUI_SESSION_COOKIE_SECURE: "false"
+# BANGUI_WORKERS should not be set (defaults to 1).
+# Never set it to > 1; the session cache is process-local.
 volumes:
 - ../backend/app:/app/app:z
 - ../fail2ban-master:/app/fail2ban-master:ro,z

@@ -58,7 +58,11 @@ services:
 BANGUI_FAIL2BAN_SOCKET: "/var/run/fail2ban/fail2ban.sock"
 BANGUI_FAIL2BAN_CONFIG_DIR: "/config/fail2ban"
 BANGUI_LOG_LEVEL: "info"
-BANGUI_WORKERS: "1" # APScheduler requires single worker — do not change
+# ⚠️ BANGUI_WORKERS MUST be 1 — see session_cache.py docstring for details
+# BanGUI uses a process-local session cache. Multiple workers in a single process
+# would cause users to be randomly logged out as sessions wouldn't be shared.
+# For HA, run multiple BanGUI instances (each with --workers 1) via orchestration.
+BANGUI_WORKERS: "1"
 BANGUI_SESSION_SECRET: "${BANGUI_SESSION_SECRET:?Set BANGUI_SESSION_SECRET}"
 BANGUI_TIMEZONE: "${BANGUI_TIMEZONE:-UTC}"
 volumes:

@@ -41,6 +41,8 @@ services:
 - BANGUI_FAIL2BAN_SOCKET=/var/run/fail2ban/fail2ban.sock
 - BANGUI_FAIL2BAN_CONFIG_DIR=/config/fail2ban
 - BANGUI_LOG_LEVEL=info
+# ⚠️ BANGUI_WORKERS MUST be 1 — the session cache is process-local
+# Multiple workers would cause random logouts and duplicate background jobs
 - BANGUI_SESSION_SECRET=${BANGUI_SESSION_SECRET:?Set BANGUI_SESSION_SECRET}
 - BANGUI_TIMEZONE=${BANGUI_TIMEZONE:-UTC}
 volumes:
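
The "random logouts" failure mode that these comments warn about can be reproduced in miniature. The sketch below is a simplified model, not BanGUI code: `Worker` and `simulate` are hypothetical names, each `Worker` object stands in for one uvicorn worker process with its own in-memory cache, and round-robin dispatch is assumed for clarity (real request distribution across workers is decided by the operating system and is not strictly round-robin).

```python
import itertools


class Worker:
    """Stand-in for one worker process with its own in-memory session cache."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # process-local: invisible to every other worker

    def login(self, token, user):
        self.sessions[token] = user

    def handle(self, token):
        """Return True if this worker recognises the session token."""
        return token in self.sessions


def simulate(num_workers, num_requests):
    """Log in on the first worker, then spread later requests round-robin."""
    workers = [Worker(f"worker-{i}") for i in range(num_workers)]
    workers[0].login("tok", "alice")  # the login request lands on worker-0
    dispatch = itertools.cycle(workers)
    return [next(dispatch).handle("tok") for _ in range(num_requests)]
```

With one worker, `simulate(1, 4)` authenticates every request; with four workers, `simulate(4, 4)` authenticates only the request that happens to reach the worker that handled the login, which the user experiences as being logged out.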