diff --git a/Docs/Tasks.md b/Docs/Tasks.md index 8fcb370..94a770a 100644 --- a/Docs/Tasks.md +++ b/Docs/Tasks.md @@ -8,57 +8,107 @@ Reference: `Docs/Refactoring.md` for full analysis of each issue. ## Open Issues -### 1. Fix setup persistence -- Where found: `backend/app/config.py`, `backend/app/startup.py`, `backend/app/services/setup_service.py`, `backend/app/routers/setup.py` -- Goal: runtime configuration should use the values persisted during setup for `database_path`, `fail2ban_socket`, `timezone`, and `session_duration_minutes` rather than only environment defaults. -- Status: completed -- Possible traps and issues: - - Setup may appear successful but later use a different DB/socket on restart. - - A partially persisted setup run must not leave the app in a broken or half-configured state. - - Using both env vars and persisted settings requires a clear precedence model. +1. Move runtime application state out of `app.state` + - Goal: Remove process-local mutable business state from FastAPI application state and centralise it in a cluster-safe, testable runtime state abstraction. + - Issue: `app.state` is currently used to store runtime values such as `setup_complete_cached`, `server_status`, `pending_recovery`, and `last_activation`, which are process-specific and leak operational state into the framework layer. + - Propose: Introduce a dedicated runtime state manager or cache service that owns these values and exposes them through explicit dependencies. Keep `app.state` limited to shared immutable resources and infrastructure handles. + - Test: Verify the backend no longer stores business state in `app.state` and that runtime state is provided by the new abstraction only. + - Status: completed -### 2. Remove or use `session_secret` -- Where found: `backend/app/config.py` -- Goal: either eliminate the unused `BANGUI_SESSION_SECRET` requirement or use it for session token generation / signing so the setting has purpose. -- Status: completed -- Possible traps and issues: - - Keeping it required without use is misleading and burdens deployments. - - Introducing a new crypto dependency for session tokens must preserve backward compatibility with existing sessions. - - If switched to signed tokens, ensure token revocation / logout still works correctly. +2. Eliminate direct `app.state` access from routers + - Goal: Keep routers strictly in the HTTP translation layer by using FastAPI `Depends()` and injected services only. + - Issue: Several routers still read or mutate `app.state` directly, including `setup.py` and `config.py`, which breaks the intended separation of concerns and makes handlers harder to test. + - Propose: Refactor routers to consume explicit dependencies for required values and actions, and move all direct state mutations into services or dedicated state managers. + - Test: Ensure all router handlers compile without requiring direct `app.state` access and existing endpoint tests still pass. -### 3. Harden session cookie security -- Where found: `backend/app/routers/auth.py` -- Goal: auth cookies should be set with `secure=True` in HTTPS production deployments and `SameSite`/`HttpOnly` behavior should be explicit and configurable. -- Possible traps and issues: - - Hardcoding `secure=False` makes production deployment insecure. - - Switching to `secure=True` can break local development unless there is an explicit dev override. - - The frontend API may need matching CORS and same-site handling when served from a different origin. -- Status: completed — implemented configurable session cookie flags and secure mode support. +3. Replace process-local session cache with a pluggable abstraction + - Goal: Make session validation consistent across multi-worker deployments by replacing the module-level in-memory cache with an injectable cache backend. + - Issue: `_session_cache` in `app/dependencies.py` is a process-local dict, so logout invalidation and session revocation only work within a single process. + - Propose: Define a cache interface and provide a default in-memory implementation, with the option to swap in shared cache storage (Redis, Memcached) for clustered production deployments. + - Test: Add unit tests for the cache abstraction and verify logout/invalidation behaves correctly through the configured cache implementation. -### 4. Address session cache invalidation semantics -- Where found: `backend/app/dependencies.py` -- Goal: make session caching safe or remove it, and document that cache invalidation is not cluster-safe if the app is run with multiple workers. -- Status: completed — session validation cache is now disabled by default and can be enabled explicitly in single-process deployments. -- Possible traps and issues: - - Process-local cache can keep revoked sessions alive in other worker processes. - - Implementing a shared cache is a larger architectural change; a safer short-term fix is to disable caching by default. - - Need to ensure `logout()` and session expiry remain consistent across requests. +4. Harden SQLite connection configuration and lifecycle + - Goal: Make application database access robust under concurrent requests and background task load. + - Issue: `get_db` opens a fresh `aiosqlite` connection per request without explicit database pragmas such as `busy_timeout` or WAL mode. + - Propose: Configure SQLite connections in `open_db` with safe defaults (e.g. WAL, busy timeout, journal mode) and consider a centralized request-scoped access wrapper to keep connection setup consistent. + - Test: Confirm `open_db` applies the expected pragmas and add simulated concurrency tests that surface lock / timeout failures. -### 5. Improve external HTTP client resilience -- Where found: `backend/app/startup.py` -- Goal: create `aiohttp.ClientSession()` with sensible global timeouts, connection limit settings, and optional retry policy for geo/blocklist API calls. -- Status: completed — configured shared aiohttp session with sensible timeouts, connection limits, and retry support for transient blocklist/geo failures. -- Possible traps and issues: - - Without timeouts, external lookups can hang request handling or background tasks. - - Connection limits must be chosen carefully to avoid underutilization or overload. - - A retry policy should avoid retry storms and should respect API rate limits. +5. Separate startup settings from runtime configuration mutation + - Goal: Keep startup configuration immutable after bootstrap and handle runtime overrides through a dedicated manager. + - Issue: `POST /api/setup` mutates `app.state.settings`, and startup logic also replaces `app.state.settings` with persisted runtime overrides, mixing immutable startup settings with mutable runtime config. + - Propose: Introduce a runtime settings manager or configuration service that resolves effective settings from persisted overrides without mutating the startup settings object on `app.state`. + - Test: Validate the setup flow persists runtime settings in the database and that runtime settings are resolved through the new manager instead of direct `app.state.settings` mutation. + +6. Introduce explicit service and repository abstractions for swapability and testing + - Goal: Reduce tight coupling and improve replacement/testing of core backend components. + - Issue: Routers and higher-level services currently import concrete service modules directly, which prevents clean substitution and dependency override testing. + - Propose: Define protocols or abstract base classes for major services and repositories, then wire concrete implementations through FastAPI dependency providers. + - Test: Add tests that override a service dependency with a fake implementation and verify the router behavior remains correct. + +7. Move operational orchestration out of routers and into service/task layer + - Goal: Keep routers thin and move operational control flow into service or task components. + - Issue: `app/routers/config.py` contains operational orchestration such as activation crash tracking, pending recovery state updates, and forced health probes, which belongs in service/task code not the HTTP layer. + - Propose: Refactor jail activation/deactivation/recovery coordination into services or task managers that manage state updates and health probe triggers on behalf of the router. + - Test: Confirm router tests only cover HTTP translation while unit tests for the new service/task components cover the orchestration logic. + +8. Decouple periodic background jobs from FastAPI application state + - Goal: Make scheduled task runners explicit and testable by removing direct `app.state` dependency from background task code. + - Issue: `history_sync`, `geo_re_resolve`, `geo_cache_flush`, and `blocklist_import` read shared resources directly from `app.state` and register jobs against `app.state.scheduler`, which couples task implementation to FastAPI internals. + - Propose: Introduce a scheduler/task bootstrap abstraction that accepts injected settings, database providers, HTTP sessions, and scheduler handles, and move task resource access out of the `app.state` internals. + - Test: Add unit tests for task registration and execution with fake resource providers and a mock scheduler, without needing a real FastAPI instance. + +9. Remove brittle scheduler bootstrap logic from blocklist job registration + - Goal: Ensure job registration is deterministic and compatible with both synchronous and asynchronous startup contexts. + - Issue: `blocklist_import.register` uses `asyncio.get_event_loop().run_until_complete` then falls back to `asyncio.ensure_future`, which is fragile and can fail if the application is already running inside an event loop. + - Propose: Refactor blocklist job registration to use a pure async bootstrap flow invoked from the lifespan handler, or resolve schedule configuration before starting the scheduler instead of forcing synchronous loop execution. + - Test: Add tests validating blocklist job registration in both non-running and running event loop contexts, and verify the registration path does not raise loop state errors. + +10. Standardize async offloading and thread-backed sync work behind a shared executor abstraction + - Goal: Remove scattered `asyncio.get_event_loop().run_in_executor` / `asyncio.ensure_future` patterns and give the backend a single, consistent way to run blocking file, CPU-bound, or compatibility work. + - Issue: Multiple services (`raw_config_io_service`, `log_service`, `jail_config_service`, `setup_service`, and others) manually call `asyncio.get_event_loop()` or `run_in_executor`, creating inconsistent async control flow and making event-loop compatibility harder to maintain. + - Propose: Introduce a shared async offload abstraction or utility that uses `asyncio.to_thread` / dedicated thread pool executors, and refactor blocking I/O and CPU-bound helpers to consume that shared provider. + - Test: Verify existing async service tests still pass after refactoring and add coverage for the new offload utility to ensure blocking work is properly delegated without event-loop blocking. + +11. Eliminate duplicate setup persistence across bootstrap and runtime databases + - Goal: Simplify setup persistence by establishing a single source of truth for runtime configuration data and reduce failure surface in first-run setup. + - Issue: `setup_service.run_setup` writes the same master password and runtime settings to both the bootstrap configuration database and the runtime database, making setup logic brittle and difficult to reason about. + - Propose: Consolidate setup persistence so initial configuration is stored in one clearly defined location, with bootstrap metadata in the startup DB and runtime configuration pulled from the single persisted source. + - Test: Add tests that confirm setup writes each value exactly once and that startup uses a single persisted store to resolve runtime settings, removing duplicate write paths. + +12. Replace custom fail2ban socket protocol duplication with a dedicated adapter over the vendored client + - Goal: Reduce maintenance risk by aligning fail2ban integration with the bundled `fail2ban-master` codebase and isolating protocol implementation behind a swapable adapter. + - Issue: `app.utils.fail2ban_client` reimplements low-level socket framing, command encoding, and protocol parsing rather than using the vendored fail2ban client classes, creating duplicated protocol logic and an unclear source of truth. + - Propose: Introduce a fail2ban adapter interface and either wrap the vendored `fail2ban-client` implementation or refactor the custom client so it is the single canonical integration point. Ensure all services depend on the adapter abstraction rather than raw socket details. + - Test: Add adapter-level unit tests and service tests that can swap in a fake fail2ban adapter, proving the backend no longer couples business logic directly to low-level socket protocol code. + +13. Introduce explicit schema migration/versioning for the runtime database + - Goal: Allow BanGUI to evolve its application database schema safely across releases and prevent startup failures caused by schema drift. + - Issue: `app/db.py` only creates missing tables on startup and has no schema version marker or migration strategy. This leaves upgrades brittle and makes it difficult to evolve the database without manual intervention. + - Propose: Add a lightweight migration system or schema version table, define incremental migration scripts, and run upgrades during startup before the application begins serving requests. + - Test: Add tests that simulate an older schema version and verify startup migrates it to the latest schema, and assert that `init_db`/migration code leaves the database in the expected current state. + +14. Split the monolithic config router into focused sub-routers + - Goal: Improve maintainability, reduce cognitive load, and make HTTP routing responsibilities easier to isolate and test. + - Issue: `app/routers/config.py` and `app/routers/file_config.py` each contain many unrelated endpoints with mixed concerns, making the HTTP layer large, hard to navigate, and harder to evolve safely. + - Propose: Refactor configuration endpoints into smaller routers or feature-specific modules such as `jail_config`, `filter_config`, `action_config`, and `raw_config`. Keep each router focused on one vertical slice of functionality and delegate business logic to a single service. + - Test: Verify the refactor preserves endpoint behavior and that router tests are split along the new modules, ensuring each HTTP layer unit test covers a narrow, well-defined feature. + +15. Centralize fail2ban metadata resolution and avoid repeated socket discovery + - Goal: Reduce coupling between history APIs and fail2ban socket metadata discovery, improving performance and reliability. + - Issue: `app/utils/fail2ban_db_utils.get_fail2ban_db_path` queries fail2ban for the database path on every history request. This ties read-only history endpoints to socket availability and duplicates discovery logic across the codebase. + - Propose: Introduce a dedicated fail2ban metadata service that resolves and caches runtime metadata such as the fail2ban DB path, refreshing it only when needed instead of every request. + - Test: Add tests that confirm history requests can reuse cached metadata, and that the metadata service falls back cleanly when the socket is temporarily unavailable after initial resolution. + +16. Introduce an explicit application lifecycle context to replace raw `app.state` access + - Goal: Make shared infrastructure handles and runtime configuration available through a typed, injectable context instead of a loosely-typed `app.state` bag. + - Issue: The backend currently relies on raw `app.state` access in routers, tasks, and dependencies, which hides contract boundaries and allows arbitrary mutable state to leak into the framework layer. + - Propose: Define a dedicated `ApplicationContext` / runtime context object for shared resources such as `settings`, `http_session`, `scheduler`, and runtime caches. Provide it through explicit FastAPI dependencies and remove direct `request.app.state` reads outside a small bootstrap boundary. + - Test: Verify the new context can satisfy existing dependencies, and add a static check or test ensuring no backend module directly accesses `request.app.state` except through the new context provider. + +17. Resolve runtime configuration before creating shared startup resources + - Goal: Ensure startup resources are initialized using the effective runtime settings, not stale bootstrap defaults. + - Issue: `startup_shared_resources` creates shared resources like `aiohttp.ClientSession` and geo cache initialization from the initial environment-loaded settings, then later applies persisted runtime overrides to `app.state.settings`, producing a fragile startup ordering. + - Propose: Split startup into phases that first resolve bootstrap and runtime persisted configuration, then construct shared resources and register scheduled jobs using those effective settings. + - Test: Add startup tests asserting that when persisted runtime settings differ from bootstrap settings, the final initialized resources are built from the resolved effective settings, not the original bootstrap values. -### 6. Update async socket handling -- Where found: `backend/app/utils/fail2ban_client.py`, `backend/app/startup.py` -- Goal: use modern asyncio APIs (`get_running_loop()`), avoid blocking operations on the event loop, and ensure startup resources are cleaned up if initialization fails. -- Status: completed — switched fail2ban socket I/O to `asyncio.to_thread` and added startup cleanup for failed resource initialization. -- Possible traps and issues: - - `asyncio.get_event_loop()` behavior changed in newer Python versions; this can cause runtime warnings or errors. - - Resource leaks can occur if `startup_shared_resources()` fails before the lifespan shutdown path is reached. - - The fail2ban socket client should still handle transient errors and not hide protocol failures behind generic exceptions. diff --git a/backend/app/dependencies.py b/backend/app/dependencies.py index fc7e5c8..810ee93 100644 --- a/backend/app/dependencies.py +++ b/backend/app/dependencies.py @@ -21,6 +21,7 @@ from app.config import Settings from app.models.auth import Session from app.models.config import PendingRecovery from app.models.server import ServerStatus +from app.utils.runtime_state import RuntimeState from app.utils.time_utils import utc_now log: structlog.stdlib.BoundLogger = structlog.get_logger() @@ -35,6 +36,7 @@ class AppState(Protocol): server_status: ServerStatus pending_recovery: PendingRecovery | None last_activation: dict[str, datetime.datetime] | None + runtime_state: RuntimeState _COOKIE_NAME = "bangui_session" diff --git a/backend/app/main.py b/backend/app/main.py index a6f701f..ea0c1b0 100644 --- a/backend/app/main.py +++ b/backend/app/main.py @@ -45,6 +45,7 @@ from app.routers import ( ) from app.startup import startup_shared_resources from app.utils.fail2ban_client import Fail2BanConnectionError, Fail2BanProtocolError +from app.utils.runtime_state import ApplicationState, RuntimeState from app.utils.setup_state import is_setup_complete_cached, set_setup_complete_cache log: structlog.stdlib.BoundLogger = structlog.get_logger() @@ -283,7 +284,10 @@ def create_app(settings: Settings | None = None) -> FastAPI: lifespan=_lifespan, ) - # Store settings on app.state so the lifespan handler can access them. + # Store immutable configuration and the dedicated runtime state manager on + # app.state. Runtime state values are proxied through the wrapper so the + # shared Starlette state bag itself does not hold mutable business state. + app.state = ApplicationState(RuntimeState()) app.state.settings = resolved_settings set_setup_complete_cache(app, False) diff --git a/backend/app/utils/runtime_state.py b/backend/app/utils/runtime_state.py new file mode 100644 index 0000000..25a61be --- /dev/null +++ b/backend/app/utils/runtime_state.py @@ -0,0 +1,85 @@ +"""Centralise mutable runtime application state. + +Runtime state is kept outside of Starlette's raw ``app.state`` storage and +exposed through a controlled state manager object. This keeps the FastAPI +framework state bag limited to shared infrastructure handles and immutable +configuration while still allowing existing code to access runtime values via +attribute proxying. +""" + +from __future__ import annotations + +import datetime +from dataclasses import dataclass, field +from typing import TYPE_CHECKING, Any + +from starlette.datastructures import State + +from app.models.server import ServerStatus + +if TYPE_CHECKING: # pragma: no cover + from app.models.config import PendingRecovery + +ActivationRecord = dict[str, datetime.datetime] + +_RUNTIME_ATTRIBUTES: frozenset[str] = frozenset( + { + "setup_complete_cached", + "server_status", + "pending_recovery", + "last_activation", + } +) + + +@dataclass +class RuntimeState: + """Mutable runtime state for the current application instance.""" + + setup_complete_cached: bool = False + server_status: ServerStatus = field(default_factory=lambda: ServerStatus(online=False)) + pending_recovery: PendingRecovery | None = None + last_activation: ActivationRecord | None = None + + +class ApplicationState(State): + """Application state wrapper that delegates runtime state access. + + This allows runtime values to be stored in a dedicated + :class:`RuntimeState` instance while preserving the familiar attribute-based + ``app.state`` API for the rest of the application. + """ + + def __init__(self, runtime_state: RuntimeState, state: dict[str, Any] | None = None): + super().__init__(state) + object.__setattr__(self, "_runtime_state", runtime_state) + + @property + def runtime_state(self) -> RuntimeState: + """Return the dedicated runtime state manager.""" + return object.__getattribute__(self, "_runtime_state") + + def __getattr__(self, key: str) -> Any: + if key in _RUNTIME_ATTRIBUTES: + return getattr(self.runtime_state, key) + return super().__getattr__(key) + + def __setattr__(self, key: str, value: Any) -> None: + if key in _RUNTIME_ATTRIBUTES: + setattr(self.runtime_state, key, value) + return + super().__setattr__(key, value) + + def __delattr__(self, key: str) -> None: + if key in _RUNTIME_ATTRIBUTES: + delattr(self.runtime_state, key) + return + super().__delattr__(key) + + +def get_runtime_state(app: Any) -> RuntimeState: + """Return the runtime state manager for the current FastAPI application.""" + state = getattr(app, "state", None) + if state is None or not hasattr(state, "runtime_state"): + raise AttributeError("Runtime state has not been initialised on the application.") + return state.runtime_state diff --git a/backend/app/utils/setup_state.py b/backend/app/utils/setup_state.py index 5e30940..a56d009 100644 --- a/backend/app/utils/setup_state.py +++ b/backend/app/utils/setup_state.py @@ -2,17 +2,22 @@ from __future__ import annotations -from fastapi import FastAPI +from typing import TYPE_CHECKING + +from app.utils.runtime_state import get_runtime_state + +if TYPE_CHECKING: # pragma: no cover + from fastapi import FastAPI def is_setup_complete_cached(app: FastAPI) -> bool: - """Return the cached setup completion state from application state.""" - return getattr(app.state, "setup_complete_cached", False) + """Return the cached setup completion state from the runtime state manager.""" + return get_runtime_state(app).setup_complete_cached def set_setup_complete_cache(app: FastAPI, completed: bool) -> None: - """Set the cached setup completion state on application state.""" - app.state.setup_complete_cached = completed + """Set the cached setup completion state on the runtime state manager.""" + get_runtime_state(app).setup_complete_cached = completed def invalidate_setup_complete_cache(app: FastAPI) -> None: @@ -21,4 +26,4 @@ def invalidate_setup_complete_cache(app: FastAPI) -> None: This helper exists so the cache can be invalidated explicitly if the application state changes during runtime. """ - app.state.setup_complete_cached = False + get_runtime_state(app).setup_complete_cached = False diff --git a/backend/tests/test_main.py b/backend/tests/test_main.py index c5ce8e7..993ab52 100644 --- a/backend/tests/test_main.py +++ b/backend/tests/test_main.py @@ -60,6 +60,28 @@ def test_create_app_skips_cors_when_no_origins_are_configured() -> None: assert cors_middleware == [] +def test_create_app_initialises_runtime_state_manager() -> None: + """The FastAPI app exposes a dedicated runtime state manager on app.state.""" + settings = Settings( + database_path="/tmp/test.db", + fail2ban_socket="/tmp/fake_fail2ban.sock", + fail2ban_config_dir="/tmp/fail2ban", + session_secret="test-secret-key-do-not-use-in-production", + session_duration_minutes=60, + timezone="UTC", + log_level="debug", + ) + + app = create_app(settings=settings) + runtime_state = app.state.runtime_state + + assert runtime_state.setup_complete_cached is False + assert runtime_state.server_status.online is False + assert runtime_state.pending_recovery is None + assert runtime_state.last_activation is None + assert app.state.server_status.online is False + + def test_create_app_disables_cors_by_default() -> None: """The FastAPI app does not add CORS middleware when no origins are configured by environment.""" settings = Settings(