|
|
|
|
@@ -8,57 +8,107 @@ Reference: `Docs/Refactoring.md` for full analysis of each issue.
|
|
|
|
|
|
|
|
|
|
## Open Issues
|
|
|
|
|
|
|
|
|
|
### 1. Fix setup persistence
|
|
|
|
|
- Where found: `backend/app/config.py`, `backend/app/startup.py`, `backend/app/services/setup_service.py`, `backend/app/routers/setup.py`
|
|
|
|
|
- Goal: runtime configuration should use the values persisted during setup for `database_path`, `fail2ban_socket`, `timezone`, and `session_duration_minutes` rather than only environment defaults.
|
|
|
|
|
- Status: completed
|
|
|
|
|
- Possible traps and issues:
|
|
|
|
|
- Setup may appear successful but later use a different DB/socket on restart.
|
|
|
|
|
- A partially persisted setup run must not leave the app in a broken or half-configured state.
|
|
|
|
|
- Using both env vars and persisted settings requires a clear precedence model.
|
|
|
|
|
1. Move runtime application state out of `app.state`
|
|
|
|
|
- Goal: Remove process-local mutable business state from FastAPI application state and centralise it in a cluster-safe, testable runtime state abstraction.
|
|
|
|
|
- Issue: `app.state` is currently used to store runtime values such as `setup_complete_cached`, `server_status`, `pending_recovery`, and `last_activation`, which are process-specific and leak operational state into the framework layer.
|
|
|
|
|
- Propose: Introduce a dedicated runtime state manager or cache service that owns these values and exposes them through explicit dependencies. Keep `app.state` limited to shared immutable resources and infrastructure handles.
|
|
|
|
|
- Test: Verify the backend no longer stores business state in `app.state` and that runtime state is provided by the new abstraction only.
|
|
|
|
|
- Status: completed
|
|
|
|
|
|
|
|
|
|
### 2. Remove or use `session_secret`
|
|
|
|
|
- Where found: `backend/app/config.py`
|
|
|
|
|
- Goal: either eliminate the unused `BANGUI_SESSION_SECRET` requirement or use it for session token generation / signing so the setting has purpose.
|
|
|
|
|
- Status: completed
|
|
|
|
|
- Possible traps and issues:
|
|
|
|
|
- Keeping it required without use is misleading and burdens deployments.
|
|
|
|
|
- Introducing a new crypto dependency for session tokens must preserve backward compatibility with existing sessions.
|
|
|
|
|
- If switched to signed tokens, ensure token revocation / logout still works correctly.
|
|
|
|
|
2. Eliminate direct `app.state` access from routers
|
|
|
|
|
- Goal: Keep routers strictly in the HTTP translation layer by using FastAPI `Depends()` and injected services only.
|
|
|
|
|
- Issue: Several routers still read or mutate `app.state` directly, including `setup.py` and `config.py`, which breaks the intended separation of concerns and makes handlers harder to test.
|
|
|
|
|
- Propose: Refactor routers to consume explicit dependencies for required values and actions, and move all direct state mutations into services or dedicated state managers.
|
|
|
|
|
- Test: Ensure all router handlers compile without requiring direct `app.state` access and existing endpoint tests still pass.
|
|
|
|
|
|
|
|
|
|
### 3. Harden session cookie security
|
|
|
|
|
- Where found: `backend/app/routers/auth.py`
|
|
|
|
|
- Goal: auth cookies should be set with `secure=True` in HTTPS production deployments and `SameSite`/`HttpOnly` behavior should be explicit and configurable.
|
|
|
|
|
- Possible traps and issues:
|
|
|
|
|
- Hardcoding `secure=False` makes production deployment insecure.
|
|
|
|
|
- Switching to `secure=True` can break local development unless there is an explicit dev override.
|
|
|
|
|
- The frontend API may need matching CORS and same-site handling when served from a different origin.
|
|
|
|
|
- Status: completed — implemented configurable session cookie flags and secure mode support.
|
|
|
|
|
3. Replace process-local session cache with a pluggable abstraction
|
|
|
|
|
- Goal: Make session validation consistent across multi-worker deployments by replacing the module-level in-memory cache with an injectable cache backend.
|
|
|
|
|
- Issue: `_session_cache` in `app/dependencies.py` is a process-local dict, so logout invalidation and session revocation only work within a single process.
|
|
|
|
|
- Propose: Define a cache interface and provide a default in-memory implementation, with the option to swap in shared cache storage (Redis, Memcached) for clustered production deployments.
|
|
|
|
|
- Test: Add unit tests for the cache abstraction and verify logout/invalidation behaves correctly through the configured cache implementation.
|
|
|
|
|
|
|
|
|
|
### 4. Address session cache invalidation semantics
|
|
|
|
|
- Where found: `backend/app/dependencies.py`
|
|
|
|
|
- Goal: make session caching safe or remove it, and document that cache invalidation is not cluster-safe if the app is run with multiple workers.
|
|
|
|
|
- Status: completed — session validation cache is now disabled by default and can be enabled explicitly in single-process deployments.
|
|
|
|
|
- Possible traps and issues:
|
|
|
|
|
- Process-local cache can keep revoked sessions alive in other worker processes.
|
|
|
|
|
- Implementing a shared cache is a larger architectural change; a safer short-term fix is to disable caching by default.
|
|
|
|
|
- Need to ensure `logout()` and session expiry remain consistent across requests.
|
|
|
|
|
4. Harden SQLite connection configuration and lifecycle
|
|
|
|
|
- Goal: Make application database access robust under concurrent requests and background task load.
|
|
|
|
|
- Issue: `get_db` opens a fresh `aiosqlite` connection per request without explicit database pragmas such as `busy_timeout` or WAL mode.
|
|
|
|
|
- Propose: Configure SQLite connections in `open_db` with safe defaults (e.g. WAL, busy timeout, journal mode) and consider a centralized request-scoped access wrapper to keep connection setup consistent.
|
|
|
|
|
- Test: Confirm `open_db` applies the expected pragmas and add simulated concurrency tests that surface lock / timeout failures.
|
|
|
|
|
|
|
|
|
|
### 5. Improve external HTTP client resilience
|
|
|
|
|
- Where found: `backend/app/startup.py`
|
|
|
|
|
- Goal: create `aiohttp.ClientSession()` with sensible global timeouts, connection limit settings, and optional retry policy for geo/blocklist API calls.
|
|
|
|
|
- Status: completed — configured shared aiohttp session with sensible timeouts, connection limits, and retry support for transient blocklist/geo failures.
|
|
|
|
|
- Possible traps and issues:
|
|
|
|
|
- Without timeouts, external lookups can hang request handling or background tasks.
|
|
|
|
|
- Connection limits must be chosen carefully to avoid underutilization or overload.
|
|
|
|
|
- A retry policy should avoid retry storms and should respect API rate limits.
|
|
|
|
|
5. Separate startup settings from runtime configuration mutation
|
|
|
|
|
- Goal: Keep startup configuration immutable after bootstrap and handle runtime overrides through a dedicated manager.
|
|
|
|
|
- Issue: `POST /api/setup` mutates `app.state.settings`, and startup logic also replaces `app.state.settings` with persisted runtime overrides, mixing immutable startup settings with mutable runtime config.
|
|
|
|
|
- Propose: Introduce a runtime settings manager or configuration service that resolves effective settings from persisted overrides without mutating the startup settings object on `app.state`.
|
|
|
|
|
- Test: Validate the setup flow persists runtime settings in the database and that runtime settings are resolved through the new manager instead of direct `app.state.settings` mutation.
|
|
|
|
|
|
|
|
|
|
6. Introduce explicit service and repository abstractions for swapability and testing
|
|
|
|
|
- Goal: Reduce tight coupling and improve replacement/testing of core backend components.
|
|
|
|
|
- Issue: Routers and higher-level services currently import concrete service modules directly, which prevents clean substitution and dependency override testing.
|
|
|
|
|
- Propose: Define protocols or abstract base classes for major services and repositories, then wire concrete implementations through FastAPI dependency providers.
|
|
|
|
|
- Test: Add tests that override a service dependency with a fake implementation and verify the router behavior remains correct.
|
|
|
|
|
|
|
|
|
|
7. Move operational orchestration out of routers and into service/task layer
|
|
|
|
|
- Goal: Keep routers thin and move operational control flow into service or task components.
|
|
|
|
|
- Issue: `app/routers/config.py` contains operational orchestration such as activation crash tracking, pending recovery state updates, and forced health probes, which belongs in service/task code not the HTTP layer.
|
|
|
|
|
- Propose: Refactor jail activation/deactivation/recovery coordination into services or task managers that manage state updates and health probe triggers on behalf of the router.
|
|
|
|
|
- Test: Confirm router tests only cover HTTP translation while unit tests for the new service/task components cover the orchestration logic.
|
|
|
|
|
|
|
|
|
|
8. Decouple periodic background jobs from FastAPI application state
|
|
|
|
|
- Goal: Make scheduled task runners explicit and testable by removing direct `app.state` dependency from background task code.
|
|
|
|
|
- Issue: `history_sync`, `geo_re_resolve`, `geo_cache_flush`, and `blocklist_import` read shared resources directly from `app.state` and register jobs against `app.state.scheduler`, which couples task implementation to FastAPI internals.
|
|
|
|
|
- Propose: Introduce a scheduler/task bootstrap abstraction that accepts injected settings, database providers, HTTP sessions, and scheduler handles, and move task resource access out of the `app.state` internals.
|
|
|
|
|
- Test: Add unit tests for task registration and execution with fake resource providers and a mock scheduler, without needing a real FastAPI instance.
|
|
|
|
|
|
|
|
|
|
9. Remove brittle scheduler bootstrap logic from blocklist job registration
|
|
|
|
|
- Goal: Ensure job registration is deterministic and compatible with both synchronous and asynchronous startup contexts.
|
|
|
|
|
- Issue: `blocklist_import.register` uses `asyncio.get_event_loop().run_until_complete` then falls back to `asyncio.ensure_future`, which is fragile and can fail if the application is already running inside an event loop.
|
|
|
|
|
- Propose: Refactor blocklist job registration to use a pure async bootstrap flow invoked from the lifespan handler, or resolve schedule configuration before starting the scheduler instead of forcing synchronous loop execution.
|
|
|
|
|
- Test: Add tests validating blocklist job registration in both non-running and running event loop contexts, and verify the registration path does not raise loop state errors.
|
|
|
|
|
|
|
|
|
|
10. Standardize async offloading and thread-backed sync work behind a shared executor abstraction
|
|
|
|
|
- Goal: Remove scattered `asyncio.get_event_loop().run_in_executor` / `asyncio.ensure_future` patterns and give the backend a single, consistent way to run blocking file, CPU-bound, or compatibility work.
|
|
|
|
|
- Issue: Multiple services (`raw_config_io_service`, `log_service`, `jail_config_service`, `setup_service`, and others) manually call `asyncio.get_event_loop()` or `run_in_executor`, creating inconsistent async control flow and making event-loop compatibility harder to maintain.
|
|
|
|
|
- Propose: Introduce a shared async offload abstraction or utility that uses `asyncio.to_thread` / dedicated thread pool executors, and refactor blocking I/O and CPU-bound helpers to consume that shared provider.
|
|
|
|
|
- Test: Verify existing async service tests still pass after refactoring and add coverage for the new offload utility to ensure blocking work is properly delegated without event-loop blocking.
|
|
|
|
|
|
|
|
|
|
11. Eliminate duplicate setup persistence across bootstrap and runtime databases
|
|
|
|
|
- Goal: Simplify setup persistence by establishing a single source of truth for runtime configuration data and reduce failure surface in first-run setup.
|
|
|
|
|
- Issue: `setup_service.run_setup` writes the same master password and runtime settings to both the bootstrap configuration database and the runtime database, making setup logic brittle and difficult to reason about.
|
|
|
|
|
- Propose: Consolidate setup persistence so initial configuration is stored in one clearly defined location, with bootstrap metadata in the startup DB and runtime configuration pulled from the single persisted source.
|
|
|
|
|
- Test: Add tests that confirm setup writes each value exactly once and that startup uses a single persisted store to resolve runtime settings, removing duplicate write paths.
|
|
|
|
|
|
|
|
|
|
12. Replace custom fail2ban socket protocol duplication with a dedicated adapter over the vendored client
|
|
|
|
|
- Goal: Reduce maintenance risk by aligning fail2ban integration with the bundled `fail2ban-master` codebase and isolating protocol implementation behind a swapable adapter.
|
|
|
|
|
- Issue: `app.utils.fail2ban_client` reimplements low-level socket framing, command encoding, and protocol parsing rather than using the vendored fail2ban client classes, creating duplicated protocol logic and an unclear source of truth.
|
|
|
|
|
- Propose: Introduce a fail2ban adapter interface and either wrap the vendored `fail2ban-client` implementation or refactor the custom client so it is the single canonical integration point. Ensure all services depend on the adapter abstraction rather than raw socket details.
|
|
|
|
|
- Test: Add adapter-level unit tests and service tests that can swap in a fake fail2ban adapter, proving the backend no longer couples business logic directly to low-level socket protocol code.
|
|
|
|
|
|
|
|
|
|
13. Introduce explicit schema migration/versioning for the runtime database
|
|
|
|
|
- Goal: Allow BanGUI to evolve its application database schema safely across releases and prevent startup failures caused by schema drift.
|
|
|
|
|
- Issue: `app/db.py` only creates missing tables on startup and has no schema version marker or migration strategy. This leaves upgrades brittle and makes it difficult to evolve the database without manual intervention.
|
|
|
|
|
- Propose: Add a lightweight migration system or schema version table, define incremental migration scripts, and run upgrades during startup before the application begins serving requests.
|
|
|
|
|
- Test: Add tests that simulate an older schema version and verify startup migrates it to the latest schema, and assert that `init_db`/migration code leaves the database in the expected current state.
|
|
|
|
|
|
|
|
|
|
14. Split the monolithic config router into focused sub-routers
|
|
|
|
|
- Goal: Improve maintainability, reduce cognitive load, and make HTTP routing responsibilities easier to isolate and test.
|
|
|
|
|
- Issue: `app/routers/config.py` and `app/routers/file_config.py` each contain many unrelated endpoints with mixed concerns, making the HTTP layer large, hard to navigate, and harder to evolve safely.
|
|
|
|
|
- Propose: Refactor configuration endpoints into smaller routers or feature-specific modules such as `jail_config`, `filter_config`, `action_config`, and `raw_config`. Keep each router focused on one vertical slice of functionality and delegate business logic to a single service.
|
|
|
|
|
- Test: Verify the refactor preserves endpoint behavior and that router tests are split along the new modules, ensuring each HTTP layer unit test covers a narrow, well-defined feature.
|
|
|
|
|
|
|
|
|
|
15. Centralize fail2ban metadata resolution and avoid repeated socket discovery
|
|
|
|
|
- Goal: Reduce coupling between history APIs and fail2ban socket metadata discovery, improving performance and reliability.
|
|
|
|
|
- Issue: `app/utils/fail2ban_db_utils.get_fail2ban_db_path` queries fail2ban for the database path on every history request. This ties read-only history endpoints to socket availability and duplicates discovery logic across the codebase.
|
|
|
|
|
- Propose: Introduce a dedicated fail2ban metadata service that resolves and caches runtime metadata such as the fail2ban DB path, refreshing it only when needed instead of every request.
|
|
|
|
|
- Test: Add tests that confirm history requests can reuse cached metadata, and that the metadata service falls back cleanly when the socket is temporarily unavailable after initial resolution.
|
|
|
|
|
|
|
|
|
|
16. Introduce an explicit application lifecycle context to replace raw `app.state` access
|
|
|
|
|
- Goal: Make shared infrastructure handles and runtime configuration available through a typed, injectable context instead of a loosely-typed `app.state` bag.
|
|
|
|
|
- Issue: The backend currently relies on raw `app.state` access in routers, tasks, and dependencies, which hides contract boundaries and allows arbitrary mutable state to leak into the framework layer.
|
|
|
|
|
- Propose: Define a dedicated `ApplicationContext` / runtime context object for shared resources such as `settings`, `http_session`, `scheduler`, and runtime caches. Provide it through explicit FastAPI dependencies and remove direct `request.app.state` reads outside a small bootstrap boundary.
|
|
|
|
|
- Test: Verify the new context can satisfy existing dependencies, and add a static check or test ensuring no backend module directly accesses `request.app.state` except through the new context provider.
|
|
|
|
|
|
|
|
|
|
17. Resolve runtime configuration before creating shared startup resources
|
|
|
|
|
- Goal: Ensure startup resources are initialized using the effective runtime settings, not stale bootstrap defaults.
|
|
|
|
|
- Issue: `startup_shared_resources` creates shared resources like `aiohttp.ClientSession` and geo cache initialization from the initial environment-loaded settings, then later applies persisted runtime overrides to `app.state.settings`, producing a fragile startup ordering.
|
|
|
|
|
- Propose: Split startup into phases that first resolve bootstrap and runtime persisted configuration, then construct shared resources and register scheduled jobs using those effective settings.
|
|
|
|
|
- Test: Add startup tests asserting that when persisted runtime settings differ from bootstrap settings, the final initialized resources are built from the resolved effective settings, not the original bootstrap values.
|
|
|
|
|
|
|
|
|
|
### 6. Update async socket handling
|
|
|
|
|
- Where found: `backend/app/utils/fail2ban_client.py`, `backend/app/startup.py`
|
|
|
|
|
- Goal: use modern asyncio APIs (`get_running_loop()`), avoid blocking operations on the event loop, and ensure startup resources are cleaned up if initialization fails.
|
|
|
|
|
- Status: completed — switched fail2ban socket I/O to `asyncio.to_thread` and added startup cleanup for failed resource initialization.
|
|
|
|
|
- Possible traps and issues:
|
|
|
|
|
- `asyncio.get_event_loop()` behavior changed in newer Python versions; this can cause runtime warnings or errors.
|
|
|
|
|
- Resource leaks can occur if `startup_shared_resources()` fails before the lifespan shutdown path is reached.
|
|
|
|
|
- The fail2ban socket client should still handle transient errors and not hide protocol failures behind generic exceptions.
|
|
|
|
|
|
|
|
|
|
|