Addresses: Backend session cache not cluster-safe (multi-worker issue)
Problem:
- Session cache is process-local (InMemorySessionCache)
- Multi-worker deployments (uvicorn --workers N) create separate processes
- Each process has its own independent session cache
- Sessions cached in Worker A are invisible to Workers B, C, D
- Users randomly logged out when requests land on different workers
- Also affects RuntimeState, rate limiter, and background jobs
Solution (Option A - Strict single-worker enforcement):
- Enhance startup validation with clearer error messages
- Update error messages to explain the problem and how to fix it
- Document single-worker requirement prominently in Docker configs
- Update module docstrings to clarify constraints
Changes:
1. app/startup.py:
- Enhanced _check_single_worker_mode() error message with troubleshooting
- Enhanced _stage_check_worker_mode_and_acquire_lock() error message
- Removed unused import
2. app/utils/session_cache.py:
- Updated module docstring to explain constraints more clearly
- Added references to deployment documentation
- Clarified multi-worker solution for future implementation
3. app/utils/runtime_state.py:
- Updated module docstring with deployment constraint references
- Aligned messaging with session_cache.py
4. Docker/Dockerfile.backend:
- Added comprehensive comments about single-worker requirement
- Explained impact in multi-worker deployments
- Referenced deployment constraints documentation
5. Docker/docker-compose.yml, compose.prod.yml, compose.debug.yml:
- Added documentation comments about BANGUI_WORKERS constraint
- Explained why single-worker is required
6. backend/tests/test_startup_integration.py:
- Fixed test unpacking to match function return signature (3 values, not 2)
This ensures multi-worker deployments fail loudly at startup with clear
guidance on what went wrong and how to fix it. The database-backed scheduler
lock provides defense-in-depth for container orchestration scenarios.
For future multi-worker support, implement:
- Redis or database-backed session cache
- Shared RuntimeState coordination
- Distributed APScheduler backend
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TASK-004: Replace module-level mutable runtime flags in service layer with
injected state holder, eliminating hidden global state and improving testability
and synchronization boundaries.
Changes:
- Create JailServiceState dataclass in app/utils/runtime_state.py to hold
backend capability cache and synchronization lock
- Add JailServiceState as a field in RuntimeState (with default_factory)
- Remove module-level _backend_cmd_supported and _backend_cmd_lock from
jail_service.py
- Refactor _check_backend_cmd_supported() to accept state parameter
- Inject JailServiceState into list_jails() and _fetch_jail_summary() via
parameters
- Add get_jail_service_state() dependency provider in app/dependencies.py
- Add JailServiceStateDep type alias for router injection
- Update jails router to receive and pass state to service functions
- Update all tests to use jail_service_state fixture and pass state to functions
- Remove duplicate _MAX_PAGE_SIZE constant definition
- Document mutable state management in Backend-Development.md
- Update Architecture.md to describe JailServiceState and state nesting pattern
Benefits:
- Eliminates global mutable state and associated race conditions
- Makes state visible to callers (not hidden in module scope)
- Enables test isolation (each test gets fresh state)
- Prepares codebase for multi-worker deployments (state can be extracted to
shared backend)
- Synchronization boundaries are now explicit (state.get_backend_cmd_lock())
Compliance:
- All tests pass (17 passed in TestListJails, TestGetJail, TestLockInitialization)
- No ruff linting errors
- Type-safe: JailServiceState properly typed with asyncio.Lock, bool | None
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add comprehensive docstring to runtime_state.py explaining single-process
constraint, impacts in multi-worker deployments, and solution approach
- Add comprehensive docstring to session_cache.py explaining process-local
cache limitation, security implications, and Redis/database alternatives
- Update Architecture.md to clarify session cache is process-local and
describe single-worker enforcement via TASK-002
- Update Architecture.md runtime state section with detailed explanation of
per-process state and multi-worker impacts
- Add Backend-Development.md section 13.7.2 documenting session cache
pluggability pattern with example Redis implementation
- All tests pass; linting passes; type checking has pre-existing errors
This is the short-term fix for TASK-003: enforce single-worker deployment
(TASK-002) and document the constraint clearly. The long-term fix (Redis
backend) is deferred as a follow-up.
Move session cache initialization from per-request _build_app_context to
startup lifespan handler. The session cache type is now decided once at app
startup based on settings, making _build_app_context pure (read-only).
Changes:
- Move cache initialization logic to new _update_session_cache() in main.py
- Call _update_session_cache() during lifespan startup to initialize cache
- Remove three if/elif/elif branches mutating state.session_cache from _build_app_context
- Add cache swap logic to set_runtime_settings() in runtime_state.py to handle
runtime settings changes (e.g., setup wizard updates)
- Keep app.state.session_cache initialization in create_app() for test compatibility
This ensures:
- _build_app_context is pure and doesn't mutate app state on each request
- Session cache configuration decisions are centralized at startup
- Settings changes during runtime (via setup wizard) also trigger cache swap
- Cache initialization logic is isolated in one place
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>