From d982fe3efce91f60664d7ce8d128dcf83383c087 Mon Sep 17 00:00:00 2001 From: Lukas Date: Sun, 26 Apr 2026 11:43:34 +0200 Subject: [PATCH] TASK-003: Document process-local constraint for RuntimeState and SessionCache - Add comprehensive docstring to runtime_state.py explaining single-process constraint, impacts in multi-worker deployments, and solution approach - Add comprehensive docstring to session_cache.py explaining process-local cache limitation, security implications, and Redis/database alternatives - Update Architecture.md to clarify session cache is process-local and describe single-worker enforcement via TASK-002 - Update Architecture.md runtime state section with detailed explanation of per-process state and multi-worker impacts - Add Backend-Development.md section 13.7.2 documenting session cache pluggability pattern with example Redis implementation - All tests pass; linting passes; type checking has pre-existing errors This is the short-term fix for TASK-003: enforce single-worker deployment (TASK-002) and document the constraint clearly. The long-term fix (Redis backend) is deferred as a follow-up. --- Docs/Architekture.md | 4 +- Docs/Backend-Development.md | 69 ++++++++++++++++++++++++++++++ backend/app/utils/runtime_state.py | 35 ++++++++++++--- backend/app/utils/session_cache.py | 44 +++++++++++++++++++ 4 files changed, 145 insertions(+), 7 deletions(-) diff --git a/Docs/Architekture.md b/Docs/Architekture.md index b1b45c9..9ef3286 100644 --- a/Docs/Architekture.md +++ b/Docs/Architekture.md @@ -668,9 +668,9 @@ BanGUI maintains its **own SQLite database** (separate from the fail2ban databas - Session expiry is configurable (set during setup, stored in `settings`). - The frontend `AuthProvider` checks session validity on mount and redirects to `/login` if invalid. - The backend `dependencies.py` provides an `authenticated` dependency that validates the session cookie on every protected endpoint. -- **Session validation cache** — validated session tokens are cached in memory for 10 seconds (`_session_cache` dict in `dependencies.py`) to avoid a SQLite round-trip on every request from the same browser. The cache is invalidated immediately on logout. This cache is process-local and not safe for multi-worker or distributed deployments. A clustered deployment should replace `_session_cache` with a shared cache or remove it entirely. +- **Session validation cache** (`InMemorySessionCache` in `app.utils.session_cache`) — validated session tokens are cached in memory for 10 seconds (configurable via `session_cache_ttl_seconds`) to avoid a SQLite round-trip on every request from the same browser. The cache is invalidated immediately on logout. **⚠️ This cache is process-local and not safe for multi-worker or distributed deployments.** In single-worker mode (enforced by TASK-002), this is safe and improves performance. For multi-worker deployments, replace `InMemorySessionCache` with a shared backend (Redis, database, shared memory) implementing the `SessionCache` protocol. See `app/utils/session_cache.py` module docstring for implementation details. - **GeoCache** — `GeoCache` instance is created at startup and stored on `app.state.geo_cache`. It encapsulates all IP geolocation caching: in-memory lookup cache, negative cache for unresolvable IPs (with TTL), dirty set for persistence, and thread-safe async locking. Cache is loaded from the `geo_cache` SQLite table on startup. New resolutions are accumulated in memory and periodically flushed to the database by the `geo_cache_flush` background task. Stale entries are re-resolved by the `geo_re_resolve` task. Injected into routes and tasks via FastAPI's dependency system. -- **Runtime state** — `RuntimeState` is process-local and only safe when BanGUI runs as a single asyncio worker. Mutating runtime state must not span `await` points because the current design relies on cooperative scheduling. Multi-worker or multi-process deployments must replace this runtime state with a shared coordination backend such as Redis, shared memory, or a database-backed store. +- **Runtime state** (`RuntimeState` in `app.utils.runtime_state`) — stores mutable application state: `server_status` (fail2ban online/offline), `last_activation` (jail activation tracking), `pending_recovery` (crash detection), and `runtime_settings` (effective configuration). **⚠️ RuntimeState is process-local and only safe when BanGUI runs as a single asyncio worker.** Mutations must not span `await` points (cooperative scheduling within a single event loop is safe). In multi-worker deployments, each process has its own copy — logouts from worker A don't affect worker B's cache, health status updates are per-worker, and activation tracking is unreliable. BanGUI enforces single-worker mode (TASK-002) to prevent this issue. For future multi-worker support, replace RuntimeState with a shared coordination backend (Redis, shared memory, database). See `app/utils/runtime_state.py` module docstring for details. - **Setup-completion flag** — once `is_setup_complete()` returns `True`, the result is stored in `app.state._setup_complete_cached`. The `SetupRedirectMiddleware` skips the DB query on all subsequent requests, removing 1 SQL query per request for the common post-setup case. The completion flag is only written after the runtime database is successfully initialized and all initial setup settings are persisted, preventing a failed setup from permanently bypassing the setup wizard. --- diff --git a/Docs/Backend-Development.md b/Docs/Backend-Development.md index c21620e..ec2eacc 100644 --- a/Docs/Backend-Development.md +++ b/Docs/Backend-Development.md @@ -645,6 +645,75 @@ async def get_session_repo() -> SessionRepository: - Before each deployment, run `mypy --strict` to ensure all dependency providers return values compatible with their Protocol types. - The `cast()` calls in `dependencies.py` are a documented signal that structural compatibility is being verified externally, not via explicit class inheritance. +#### 13.7.2 Session Cache Pluggability — Process-Local vs. Shared Backends + +Session validation is expensive (SQLite lookup + password verification). To improve performance, **validated session tokens are cached** using the `SessionCache` interface (`app.utils.session_cache`). The default implementation, `InMemorySessionCache`, stores cached sessions in process-local memory. + +**Current implementation (single-worker):** + +```python +from app.utils.session_cache import SessionCache, InMemorySessionCache, NoOpSessionCache + +class SessionCache(Protocol): + """Interface for session token validation cache backends.""" + def get(self, token: str) -> Session | None: ... + def set(self, token: str, session: Session, ttl_seconds: float) -> None: ... + def invalidate(self, token: str) -> None: ... + def clear(self) -> None: ... + +# Default in-memory implementation — PROCESS-LOCAL +class InMemorySessionCache: + def __init__(self) -> None: + self._entries: dict[str, tuple[Session, float]] = {} +``` + +**Single-worker constraint:** + +`InMemorySessionCache` is **process-local** — each worker process has its own dict. In single-worker mode (enforced by TASK-002), this is safe and improves performance. In multi-worker deployments: +- A logout by worker A clears the session from A's cache, but worker B still has it → logout doesn't work. +- Enabling/disabling the cache requires restarting all workers to take effect. + +**Multi-worker solution:** + +To support multiple workers (future enhancement), implement a shared backend behind the same `SessionCache` Protocol: + +```python +# Example Redis implementation (not yet in codebase) +class RedisSessionCache: + """Session cache backed by Redis.""" + def __init__(self, redis_url: str) -> None: + self.client = aioredis.from_url(redis_url) + + async def get(self, token: str) -> Session | None: + data = await self.client.get(f"session:{token}") + return Session.model_validate_json(data) if data else None + + async def set(self, token: str, session: Session, ttl_seconds: float) -> None: + await self.client.setex( + f"session:{token}", + int(ttl_seconds), + session.model_dump_json() + ) + + async def invalidate(self, token: str) -> None: + await self.client.delete(f"session:{token}") + + async def clear(self) -> None: + await self.client.flushdb() +``` + +To adopt a Redis backend: +1. Create `RedisSessionCache` in `app.utils.session_cache`. +2. Update `app.utils.runtime_state.set_runtime_settings()` to instantiate `RedisSessionCache` when `REDIS_URL` env var is set. +3. Update `app.config.Settings` to accept optional `REDIS_URL`. +4. Tests continue to use `InMemorySessionCache` (no Redis dependency in dev). + +**Implementation rules:** +- All cache methods must be `async` (even if the backend is sync). +- Never log session tokens or session data. +- TTL must be respected — expired entries must be removed on access. +- See `app/utils/session_cache.py` for the full Protocol definition and current implementations. + ### 14.8 Composition over Inheritance - Favour **composing** small, focused objects over deep inheritance hierarchies. diff --git a/backend/app/utils/runtime_state.py b/backend/app/utils/runtime_state.py index 95c60ff..9b346e0 100644 --- a/backend/app/utils/runtime_state.py +++ b/backend/app/utils/runtime_state.py @@ -6,11 +6,36 @@ framework state bag limited to shared infrastructure handles and immutable configuration while still allowing existing code to access runtime values via attribute proxying. -RuntimeState is designed for a single-process, single-worker asyncio -deployment. Mutations must complete without awaiting across read-modify-write -sequences. If BanGUI is ever deployed with multiple workers or across processes, -this module must be replaced with a shared backend such as Redis or shared -memory. +⚠️ SINGLE-PROCESS CONSTRAINT +============================== + +RuntimeState is designed for a single-process, single-worker asyncio deployment. +This means: + - Each process has its own independent copy of all runtime state. + - Changes to runtime_state in one process are NOT visible to other processes. + - Mutations must complete without awaiting across read-modify-write sequences + (cooperative scheduling within a single event loop is safe). + +IMPACT IN MULTI-WORKER DEPLOYMENTS: + - Logout processed by worker A clears the session from A's in-memory cache, + but worker B still has that session in its own cache and will accept it. + - Health status updates (server_status) received by worker A are invisible + to worker B's dashboard responses — each worker reports stale data. + - fail2ban activation/recovery tracking (pending_recovery, last_activation) + is per-worker and unreliable across processes. + +MULTI-WORKER SOLUTION: + To deploy BanGUI with multiple workers (e.g., via gunicorn -w 4), you must: + 1. Replace RuntimeState with a shared store (Redis, shared memory, database). + 2. Replace InMemorySessionCache with RedisSessionCache (see session_cache.py). + 3. Ensure all workers use the same backend for coordination. + +SINGLE-WORKER ENFORCEMENT: + See TASK-002 in Docs/Tasks.md for deployment configuration that enforces + single-worker mode, preventing this issue entirely. + +For now, BanGUI is deployed as single-worker only — this constraint is +acceptable and keeps the implementation simple. """ from __future__ import annotations diff --git a/backend/app/utils/session_cache.py b/backend/app/utils/session_cache.py index 5d63c66..9900085 100644 --- a/backend/app/utils/session_cache.py +++ b/backend/app/utils/session_cache.py @@ -3,6 +3,50 @@ This module defines a cache interface for authenticated sessions and a default process-local in-memory implementation. The backend can swap the cache implementation without changing the authentication dependency logic. + +⚠️ PROCESS-LOCAL CONSTRAINT (InMemorySessionCache) +==================================================== + +InMemorySessionCache stores validated sessions in a process-local dict. +This means: + - Each worker process has its own independent session cache. + - A session invalidated (logout) by worker A is still valid in worker B. + - Changes to the cache in one process are NOT visible to other processes. + +IMPACT IN MULTI-WORKER DEPLOYMENTS: + - User logs out (worker A clears session from its cache). + - User makes a new request → routed to worker B. + - Worker B still has the stale session in its cache → request is accepted. + - User appears still logged in (from their perspective). + + This is a security issue: logout does not work reliably across workers. + +MULTI-WORKER SOLUTION: + To deploy BanGUI with multiple workers (e.g., via gunicorn -w 4), replace + InMemorySessionCache with a shared backend such as: + - RedisSessionCache — backed by Redis (recommended for production). + - DatabaseSessionCache — backed by SQLite or PostgreSQL. + - SharedMemorySessionCache — backed by IPC (for local multi-process). + + The SessionCache Protocol is already designed for pluggable backends: + class SessionCache(Protocol): + def get(token: str) -> Session | None: ... + def set(token: str, session: Session, ttl_seconds: float) -> None: ... + def invalidate(token: str) -> None: ... + def clear() -> None: ... + + To add Redis support: + 1. Create RedisSessionCache in this module (implements SessionCache). + 2. Update runtime_state.set_runtime_settings() to instantiate RedisSessionCache + when REDIS_URL is configured. + 3. See Backend-Development.md § "Session Cache Pluggability" for details. + +SINGLE-WORKER ENFORCEMENT: + See TASK-002 in Docs/Tasks.md for deployment configuration that enforces + single-worker mode, preventing this issue entirely. + +For now, BanGUI is deployed as single-worker only — this constraint is +acceptable and keeps the implementation simple. """ from __future__ import annotations