Files
BanGUI/Docs/Tasks.md

42 KiB
Raw Blame History

BanGUI — Task List

This document breaks the entire BanGUI project into development stages, ordered so that each stage builds on the previous one. Every task is described in prose with enough detail for a developer to begin work. References point to the relevant documentation.

Reference: Docs/Refactoring.md for full analysis of each issue.


Open Issues


Task 1 — Fix setup_service: mark setup_done only after successful DB init

Status: Completed

Severity: Bug

Where: backend/app/services/setup_service.pyrun_setup() function, last few lines where settings_repo.set_setting(db, _KEY_SETUP_DONE, "1") is called unconditionally.

Goal: _KEY_SETUP_DONE must only be written to the bootstrap DB after _ensure_database_initialized() has confirmed the runtime database was successfully created and all initial settings were persisted. The write must be the final step in a successful path, not an unconditional epilogue.

Possible traps and issues:

  • The _ensure_database_initialized() helper returns False on failure rather than raising — callers must check the return value explicitly rather than relying on exceptions.
  • The bootstrap DB and the runtime DB are two different connections; make sure the set_setting call targets the bootstrap DB (the one that holds setup_completed) and not the runtime DB.
  • run_setup() is not idempotent by design (it raises if called twice), so a failed first run that incorrectly sets the flag will permanently block the setup wizard without a manual DB reset.

Docs changes needed: Docs/Architekture.md section 6 "Authentication & Session Management" describes the setup-completion flag. The note about is_setup_complete() caching should clarify that the flag is only written after a fully successful setup.

Why this is needed: If the runtime DB creation fails (e.g., path does not exist, permission denied) but setup_completed = 1 is still written to the bootstrap DB, every subsequent request will pass the setup-completion check, get_runtime_database_path() will return the invalid path, and opening that DB will throw a 503 error on every request with no way to re-run setup through the UI.


Task 2 — Fix log_service.preview_log: stale run_blocking call site

Status: Completed

Severity: Bug

Where: backend/app/services/log_service.py line 74 — the call await run_blocking(None, _read_tail_lines, str(path), req.num_lines).

Goal: Change the call to await run_blocking(_read_tail_lines, str(path), req.num_lines). The run_blocking signature is run_blocking(func, *args, *, executor=None) — the first argument is the callable, not an executor. The None was left over from an older signature where executor was the first positional argument.

Possible traps and issues:

  • asyncio.to_thread(None, ...) does not raise immediately at the await site; it may produce a confusing TypeError: 'NoneType' object is not callable at the point where the thread runs, which can be hard to trace if logs are sparse.
  • Verify no other call sites in log_service.py or any future callers of run_blocking use the old positional-executor convention.
  • After fixing, run the log-preview endpoint manually or via the test suite to confirm the fix.

Docs changes needed: None — this is a straightforward call-site correction, not a design change.

Why this is needed: The GET /api/config log preview endpoint (POST /api/config/preview-log) calls preview_log(), which will crash with a TypeError on every invocation. The feature has been silently broken since the run_blocking signature was changed.


Task 3 — Fix filter_config router: import ConfigWriteError from the correct module

Status: Completed

Severity: Bug

Where: backend/app/routers/filter_config.py line 8 — from app.services.config_file_service import ConfigWriteError. Compare with backend/app/routers/action_config.py line 8 — from app.exceptions import ConfigWriteError.

filter_config_service raises its own internal exceptions that may or may not match the imported class identity.

Goal: All routers must catch ConfigWriteError from app.exceptions — the single canonical definition. The filter_config.py import must be changed to from app.exceptions import ConfigWriteError. The same class must also be the one actually raised by filter_config_service (verify the service raises from app.exceptions, not from a local definition).

Possible traps and issues:

  • Python exception matching uses class identity (isinstance), not structural typing. If the service raises config_file_service.ConfigWriteError but the router catches exceptions.ConfigWriteError, the except block silently misses and the exception propagates to the global 500 handler.
  • filter_config_service.py imports ConfigWriteError from config_file_service (line 8 of that file). After fixing the router, also ensure the service raises the canonical one from app.exceptions.
  • This fix depends on Task 4 (consolidating all exception classes into app.exceptions) being done first or in parallel.

Docs changes needed: Docs/Architekture.md section 2.2 services table — add a note that all domain exceptions are raised from app.exceptions, not from individual service modules.

Why this is needed: Any config file write error in a filter operation (bad content, permission denied, disk full) will be swallowed and returned as HTTP 500 instead of the correct 400/503. Users will see an unhelpful "unexpected error" message with no actionable detail.


Task 4 — Consolidate all exception classes into app.exceptions

Status: Completed

Severity: High

Where: backend/app/exceptions.py — canonical location, currently has JailNotFoundInConfigError, ConfigWriteError, ConfigValidationError, ConfigOperationError, and others.

Duplicates in:

  • backend/app/services/config_file_service.py lines 128, 167, 171 — JailNotFoundInConfigError, JailNameError, ConfigWriteError
  • backend/app/services/jail_config_service.py lines 84, 123, 127 — same three classes
  • backend/app/services/filter_config_service.py lines 45, 95 — FilterNotFoundError, JailNameError
  • backend/app/services/action_config_service.py lines 44, 116 — ActionNotFoundError, JailNameError
  • backend/app/services/config_file_service.py lines 1611 and 2223 — FilterNotFoundError, ActionNotFoundError

Goal: Every exception class lives exactly once in app.exceptions. All services and routers import exceptions from there. Local class definitions in service files are removed. The complete canonical list: JailNotFoundError, JailNotFoundInConfigError, JailNameError, JailAlreadyActiveError, JailAlreadyInactiveError, JailOperationError, ConfigValidationError, ConfigOperationError, ConfigWriteError, FilterNotFoundError, FilterInvalidRegexError, FilterAlreadyExistsError, FilterNameError, FilterReadonlyError, ActionNotFoundError, ActionAlreadyExistsError, ActionNameError, ActionReadonlyError, ServerOperationError.

Possible traps and issues:

  • When removing a local class definition from a service, check every except clause in every router and other service that previously imported that class from the service module — those imports must be updated to app.exceptions.
  • filter_config.py and action_config.py import JailNameError and JailNotFoundInConfigError from jail_config_service — these must also move to app.exceptions.
  • Run the full test suite after each service is migrated, not at the end, to catch regressions early.
  • Pay attention to JailAlreadyActiveError and JailAlreadyInactiveError — these are currently only in jail_config_service and not yet in app.exceptions.

Docs changes needed: Docs/Architekture.md section 2.2 — add exceptions.py to the utils description table, noting it is the single source of truth for all domain exceptions.

Why this is needed: With multiple copies of the same class name in different modules, isinstance checks silently fail when a service raises class A but the router catches class B with the same name. The result is unexpected 500 errors that are hard to reproduce and diagnose.


Task 5 — Resolve config_file_service.py god object and dual implementations

Status: Done

Severity: High

Where: backend/app/services/config_file_service.py — 3126 lines embedding jail, filter, and action domain logic. backend/app/services/jail_config_service.py — 1066 lines. backend/app/services/filter_config_service.py — 927 lines. backend/app/services/action_config_service.py — 1070 lines. backend/app/helpers/config_file_helpers.py — re-exports private symbols from config_file_service.

Goal: Establish one canonical implementation per domain. The intended split (per Docs/Architekture.md) is:

  • config_file_service.py — shared infrastructure only: config file parsing, atomic writes, path helpers, _parse_jails_sync, _build_parser, _safe_jail_name, etc.
  • jail_config_service.py — jail activation/deactivation, activate_jail, deactivate_jail, validate_jail_config, rollback_jail, list_inactive_jails.
  • filter_config_service.py — filter lifecycle: list_filters, get_filter, create_filter, update_filter, delete_filter, assign_filter_to_jail.
  • action_config_service.py — action lifecycle: same shape as filter.

Duplicate function bodies in jail_config_service.py that duplicate what is already in config_file_service.py must be removed. The canonical versions stay in config_file_service.py (or are promoted to app/helpers/ if they are shared infrastructure). config_file_helpers.py should be able to import from config_file_service without re-exporting private names.

Possible traps and issues:

  • The routers currently import from jail_config_service (e.g., jail_config.py line 33). After removing duplicates, confirm all router imports still resolve.
  • start_daemon() and wait_for_fail2ban() are duplicated verbatim in two files — decide on one owner (likely config_file_service) and delete the other copy; update all callers.
  • This is the largest refactoring task. Do it in stages: first remove class duplicates (covered by Task 4), then consolidate functions one group at a time, running tests after each group.
  • The helpers/config_file_helpers.py re-exports private symbols (_parse_jails_sync etc.) because the domain services need them. After this task those symbols should be importable directly from config_file_service without being private, or should be moved to a proper shared util module.

Docs changes needed: Docs/Architekture.md section 2.2 services table — update the description of config_file_service.py, jail_config_service.py, filter_config_service.py, action_config_service.py to reflect the final split after the refactor.

Why this is needed: 7201 lines across 5 files with two active implementations of the same logic is the dominant source of maintenance risk in the backend. Any bug fix or feature to jail/filter/action config must be applied in multiple places, and it is currently unclear which implementation is authoritative.


Task 6 — Extract fail2ban response helpers into a shared util module

Severity: High

Where: 16 copies across 8 service files:

  • _ok() in jail_service.py:90, config_service.py:77, config_file_service.py:570 (inner fn), server_service.py:64, health_service.py:33
  • _to_dict() in jail_service.py:113, config_service.py:98, health_service.py:60
  • _ensure_list() in jail_service.py:134, config_service.py:112
  • _is_truthy() in config_file_service.py:294, jail_config_service.py:169, filter_config_service.py:157, action_config_service.py:158
  • _is_not_found_error() in jail_service.py:155, config_service.py:147

Goal: Create backend/app/utils/fail2ban_utils.py (or extend fail2ban_client.py) with public functions:

  • unwrap_response(response) -> object (the current _ok logic)
  • response_to_dict(pairs) -> dict[str, object] (the current _to_dict logic)
  • response_to_list(value) -> list[str] (the current _ensure_list logic)
  • is_truthy(value: str) -> bool (the current _is_truthy logic)
  • is_not_found_error(exc: Exception) -> bool (the current _is_not_found_error logic)

All 8 service files then import from this one location and delete their local copies.

Possible traps and issues:

  • _ok() in config_file_service.py is defined as an inner function inside a specific method, not at module level — it must be extracted before it can be replaced.
  • Check all call sites for subtle differences: _ok in some files casts the return, in others it does not. Ensure the shared version covers all variants.
  • _is_truthy is always paired with _TRUE_VALUES = frozenset({"true", "yes", "1"}) — move both together. This overlaps with Task 9 (consolidating _TRUE_VALUES).
  • After replacement, run the full test suite — a signature mismatch will cause runtime TypeErrors that may not be caught by static analysis.

Docs changes needed: Docs/Architekture.md section 2.2 utils table — add fail2ban_utils.py (or document the extension to fail2ban_client.py) and describe the response-parsing helpers.

Why this is needed: A future bug fix to the pickle-response parsing logic must be applied in 16 places. A discrepancy between copies already exists: the inner _ok in config_file_service.py does not cast the return, while the top-level versions do.


Task 7 — Adopt constants.py FAIL2BAN_SOCKET_TIMEOUT_SECONDS everywhere

Severity: High

Where: backend/app/utils/constants.py line 16 — FAIL2BAN_SOCKET_TIMEOUT_SECONDS: Final[float] = 5.0 (never imported by any service).

Hardcoded local constants in:

  • backend/app/services/jail_service.py:6510.0
  • backend/app/services/server_service.py:3110.0
  • backend/app/services/jail_config_service.py:5810.0
  • backend/app/services/config_service.py:6310.0
  • backend/app/services/config_file_service.py:10810.0
  • backend/app/services/action_config_service.py:9310.0
  • backend/app/services/ban_service.py:555.0
  • backend/app/services/health_service.py:305.0
  • backend/app/services/fail2ban_metadata_service.py:655.0 (inline literal, not even a named constant)

Goal: constants.py should export two constants:

  • FAIL2BAN_SOCKET_TIMEOUT_SECONDS: Final[float] = 10.0 — standard operational timeout
  • FAIL2BAN_SOCKET_FAST_TIMEOUT_SECONDS: Final[float] = 5.0 — for health probes and metadata queries that should fail quickly

Every service deletes its local _SOCKET_TIMEOUT definition and imports the appropriate constant. Decide intentionally which services should use the fast vs standard timeout rather than having the difference be accidental.

Possible traps and issues:

  • The current value in constants.py is 5.0 but most services use 10.0. Changing to a single value would alter the behaviour of health probes or operational commands. Use two explicit constants to preserve the intentional difference.
  • fail2ban_metadata_service.py hardcodes 5.0 as an inline literal inside _resolve_db_path() with no named constant at all — easy to miss when grepping for _SOCKET_TIMEOUT.
  • Static analysis (mypy/ruff) will not catch this — must be found by grep or manual review.

Docs changes needed: Docs/Architekture.md section 4.3 fail2ban communication — mention the timeout constants and their purpose.

Why this is needed: The current inconsistency (five services at 10 s, three at 5 s) appears to be accidental. A planned change to socket timeout policy requires touching 8+ files instead of one. A misconfigured timeout on a slow host will cause silent bans to fail; having one value to change makes this tunable.


Task 8 — Remove app/helpers/ indirection layer; fix circular imports structurally

Status: Completed

Severity: Medium

Where: backend/app/helpers/config_file_helpers.py — 18 lines, imports 5 private symbols from config_file_service and re-exports them unchanged. backend/app/helpers/log_helpers.py — 22 lines, wraps log_service.preview_log and log_service.test_regex one-for-one. backend/app/helpers/jail_helpers.py — wraps jail_service.reload_all as reload_jails.

Goal: Eliminate the helpers layer entirely by fixing the underlying circular imports that made it necessary:

  1. config_file_helpers.py exists because jail_config_service, filter_config_service, action_config_service need private symbols from config_file_service but importing directly would create a cycle (after Task 5 splits the god object, the symbols these services need should live in a proper shared location and be importable directly).
  2. log_helpers.py exists because config_service imports from log_service but some path would create a cycle. Move test_regex and preview_log to log_service and import directly from there.
  3. jail_helpers.py is a pointless one-function wrapper. Import jail_service.reload_all directly.

After the underlying causes are fixed, delete all three helper files and update all import sites.

Possible traps and issues:

  • Do not delete helpers until the circular imports are resolved — doing so in the wrong order will cause ImportError at startup.
  • Check __all__ in config_file_helpers.py — it exports symbols that were previously accessed by external code. After removal, all consumers must be updated to import from the canonical location.
  • This task should follow Task 5 (god object split) since that is the root cause of most of the cycles.

Docs changes needed: Docs/Architekture.md section 2.1 — remove the helpers/ directory from the project structure diagram, or rename it to document its true role.

Why this is needed: The helpers layer hides import cycles rather than eliminating them and adds three levels of indirection to reach simple utility functions. A new developer reading the import chain cannot understand why the layer exists.


Task 9 — Consolidate _TRUE_VALUES into constants.py

Status: Completed Severity: Low (part of Task 6 cleanup)

Where: _TRUE_VALUES = frozenset({"true", "yes", "1"}) defined in:

  • backend/app/services/config_file_service.py:117
  • backend/app/services/jail_config_service.py:67
  • backend/app/services/action_config_service.py:105

filter_config_service.py:35 imports it as a private symbol: from app.services.config_file_service import _TRUE_VALUES.

Goal: Add FAIL2BAN_TRUTHY_VALUES: Final[frozenset[str]] = frozenset({"true", "yes", "1"}) to backend/app/utils/constants.py. All four service files import from there; local definitions and the cross-service private import are removed.

Possible traps and issues:

  • The cross-service import of a private symbol (_TRUE_VALUES with leading underscore) breaks the layering contract — if config_file_service ever moves or renames the constant, filter_config_service silently breaks at runtime. Fix the import in filter_config_service as part of this task.
  • This constant is closely coupled with _is_truthy() (Task 6) — consider combining both changes.

Docs changes needed: Docs/Architekture.md section 2.2 utils — note that constants.py also holds the shared set of truthy string values used by config parsers.

Why this is needed: Importing a private symbol from a peer service module is a layering violation. If the implementation ever changes (e.g., adding "on" as a truthy value), all four definitions must be found and updated.


Task 10 — Replace ban_service deferred local imports with top-level imports

Status: Completed

Severity: Medium

Where: backend/app/services/ban_service.py lines 172, 340, 596, 695 — four separate from app.repositories.history_archive_repo import ... statements inside function bodies.

Goal: Move all four history_archive_repo imports to the top of ban_service.py. If a true circular import exists, resolve it structurally (move the shared dependency to a lower-level module both can import) rather than deferring.

Possible traps and issues:

  • Deferred imports suggest a hidden circular dependency: ban_servicehistory_archive_repoban_service or similar. Before moving the import, trace the full import chain to confirm. If a cycle exists, it must be broken (e.g., by extracting a shared data model or helper that neither module imports the other for).
  • history_archive_repo should be a pure data-access module — it should not import from any service. If it does, that is the actual bug to fix.
  • Moving imports is safe if there is no cycle; confirm by running python -c "from app.services import ban_service" after the change.

Docs changes needed: None.

Why this is needed: Deferred function-body imports hide module-level code dependencies, break standard import analysis tools, and run the import machinery on every call to those functions (Python memoises the result, but the attribute lookup overhead and code clarity cost are real). Four identical deferred imports in one file indicate a structural problem.


Task 11 — Replace history_sync.py deferred import inside loop

Status: Completed

Severity: Medium

Where: backend/app/tasks/history_sync.py line 79 — from app.repositories.history_archive_repo import archive_ban_event inside a while True: loop body.

Goal: Move the import to top-level. Same structural fix as Task 10 — if a circular import prevents this, break the cycle instead of deferring.

Possible traps and issues:

  • This import fires on every iteration of the pagination loop (potentially hundreds of times per sync run). Python caches the module after the first import so there is no repeated I/O cost, but the sys.modules lookup and attribute access on every loop iteration is unnecessary overhead.
  • If moving the import to the top causes a circular import error, the circular dependency must be resolved before this task is closed.

Docs changes needed: None.

Why this is needed: An import statement inside a loop is an unconventional pattern that confuses readers and signals an unresolved dependency issue. Standard tools (isort, ruff) will flag it.


Task 12 — Replace blocklist_service importlib workaround with proper injection

Status: Completed

Severity: Medium

Where: backend/app/services/blocklist_service.py lines 364366:

jail_svc = importlib.import_module("app.services.jail_service")
ban_ip_fn = getattr(jail_svc, "ban_ip")

This was used as a workaround for a circular import between blocklist_service and jail_service.

Goal: The import_source() function already accepts an optional ban_ip parameter for testing. Make this parameter required (not optional) at the service level and always pass jail_service.ban_ip from the router or from the scheduled task that calls import_source. Remove the importlib fallback entirely.

Possible traps and issues:

  • All callers of import_source() (the scheduled blocklist task tasks/blocklist_import.py and the manual-trigger router endpoint) must be updated to pass ban_ip=jail_service.ban_ip.
  • After making the parameter required, mypy will flag any caller that does not pass it.
  • Verify the circular import that originally motivated this workaround — if blocklist_service imports jail_service at the top of the file and jail_service imports blocklist_service, address the cycle properly (the dependency should only flow one way: blocklist_service depends on jail_service, not the reverse).

Docs changes needed: None.

Why this is needed: importlib.import_module with a string breaks static analysis, IDE navigation, and refactoring tools. The function effectively already supports injection but bypasses it in production. The workaround made the production code path differ from the tested code path.


Task 13 — Wire geo_batch_lookup through dependency injection

Severity: Medium

Where: Five call sites where geo_batch_lookup=geo_service.lookup_batch is passed explicitly as an argument:

  • backend/app/routers/dashboard.py:126 and :178
  • backend/app/routers/bans.py:80
  • backend/app/routers/blocklist.py:143
  • backend/app/routers/jails.py:602

Goal: Add a GeoBatchLookupDep type alias and get_geo_batch_lookup() dependency provider to backend/app/dependencies.py, similar to existing deps like Fail2BanSocketDep. Routers declare geo_lookup: GeoBatchLookupDep in their handler signatures and stop passing geo_service.lookup_batch manually. Service functions that accept geo_batch_lookup as a parameter should continue doing so (good for testability) — the injection just moves from the router body to the dependency declaration.

Possible traps and issues:

  • The callable type annotation for geo_batch_lookup needs to be precise enough for mypy. Either use a Protocol or import the exact type from geo_service.
  • Some service functions accept geo_batch_lookup as a required parameter; after this change, always confirm those services are called with it.
  • This is low-risk but touches five routers and dependencies.py — run the full test suite after.

Docs changes needed: Docs/Architekture.md section 2.2 dependencies — mention the geo lookup dependency provider.

Why this is needed: Manually threading geo_service.lookup_batch at call sites is not dependency injection — it is partial application scattered across files. Any future change to the geo lookup implementation (switching provider, adding caching policy) requires touching five routers instead of one dependency function.


Task 14 — Fix router prefix structure for config sub-routers

Severity: Medium

Where: backend/app/routers/config.py — this aggregator router has prefix="/api/config" and includes four sub-routers. The sub-routers are defined as router = APIRouter() with no prefix:

  • backend/app/routers/jail_config.py:42
  • backend/app/routers/filter_config.py:26
  • backend/app/routers/action_config.py:25
  • backend/app/routers/config_misc.py:23

Route paths like "/jails" in jail_config.py resolve to /api/config/jails only by implicit inheritance from the parent.

Goal: Each sub-router should declare its own prefix explicitly (e.g. APIRouter(prefix="/jails", tags=["Jail Config"])). The aggregator config.py then calls router.include_router(jail_config.router) without any additional prefix argument. This makes each router self-describing and independently testable.

Possible traps and issues:

  • Assert that the final URL paths are identical before and after the change. Use FastAPI's route inspection (app.routes) in a test to validate.
  • config_misc.py routes may not have a natural single sub-prefix — list its routes first and decide on a grouping.
  • Frontend src/api/endpoints.ts hard-codes the final URL paths; changing prefixes here will break the frontend unless the paths remain the same.

Docs changes needed: Docs/Architekture.md section 8.2 endpoint groups — verify the listed paths match the actual registered routes after the fix.

Why this is needed: A sub-router with no prefix is not independently testable via TestClient(sub_router_app) because its routes resolve to bare paths like /jails instead of /api/config/jails. Implicit inheritance from a parent makes the actual URL opaque to anyone reading the sub-router file.


Task 15 — Remove 7 ghost imports from routers/config.py

Severity: Medium

Where: backend/app/routers/config.py lines 613:

from app.services import (
    action_config_service,
    config_file_service,
    config_service,
    filter_config_service,
    jail_config_service,
    jail_service,
    log_service,
)  # noqa: F401

None of these are referenced in the file body.

Goal: Remove the entire import block. If these imports are present to force module initialisation for side effects, document exactly what side effect is required and trigger it explicitly in the startup lifespan instead.

Possible traps and issues:

  • If any of these modules register something at import time (class registries, decorators, etc.), removing the import will silently skip that registration. Audit each module for module-level side effects before deleting the import.
  • The # noqa: F401 suppresses the linter warning that would normally catch unused imports — this is a code smell that was masking the issue.
  • After removing, run the full application startup and hit all config endpoints to confirm no registration was skipped.

Docs changes needed: None.

Why this is needed: Imports for side effects are an anti-pattern that creates hidden startup dependencies. They suppress linter warnings, make the code misleading, and force unnecessary module-level execution at startup even when the imports are truly vestigial.


Task 16 — Reset _backend_cmd_supported between test runs

Severity: Medium

Where: backend/app/services/jail_service.py lines 7778:

_backend_cmd_supported: bool | None = None
_backend_cmd_lock: asyncio.Lock = asyncio.Lock()

Module-level mutable globals that persist for the lifetime of the Python process.

Goal: Provide a _reset_capability_cache() function (or use importlib.reload in tests) so the test suite can reset _backend_cmd_supported to None between test cases. Alternatively, move the cache into the Fail2BanMetadataService-style class instance (similar to how default_fail2ban_metadata_service is structured) so tests can construct a fresh instance per test.

Possible traps and issues:

  • asyncio.Lock() objects in Python 3.10+ are not bound to the running event loop at creation, so creating them at module level is safe. But if a test creates a new event loop (common with pytest-asyncio), an existing asyncio.Lock may behave unexpectedly.
  • The cleanest fix is to encapsulate the cache and lock into a class, instantiate it as a module-level singleton identical to Fail2BanMetadataService, and expose a invalidate_backend_capability_cache() function for test teardown.

Docs changes needed: None.

Why this is needed: A test that forces _backend_cmd_supported = False (by mocking a socket error) will corrupt the capability cache for all subsequent tests in the same process. This produces flaky test failures that are very hard to diagnose.


Task 17 — Remove unittest.mock.Mock check from production runtime_state

Severity: Low

Where: backend/app/utils/runtime_state.py lines 1516 and line 107:

from unittest.mock import Mock as _Mock
...
if runtime_settings is not None and _Mock is not None and isinstance(runtime_settings, _Mock):
    return get_app_settings(app)

Goal: Remove _Mock entirely from runtime_state.py. The test-isolation problem it was solving should be handled in test fixtures by injecting a real Settings instance rather than a Mock. Tests that currently rely on this fallback should be updated to use create_app(settings=my_test_settings) with a real test settings object.

Possible traps and issues:

  • Find all tests that pass a Mock as runtime_settings and replace the mock with a Settings(**test_overrides) instance. This is the correct approach.
  • After removing the check, run the full test suite. Any test that previously relied on the Mock detection will fail fast rather than silently falling back.

Docs changes needed: None.

Why this is needed: unittest.mock is a test library. Its presence in a production module is a layering violation that adds import overhead at every startup and puts test scaffolding logic in a path that runs on every request.


Status: Completed

Severity: Low

Where:

  • backend/app/dependencies.py:61_COOKIE_NAME = "bangui_session"
  • backend/app/routers/auth.py:29_COOKIE_NAME = "bangui_session"

Goal: Add SESSION_COOKIE_NAME: Final[str] = "bangui_session" to backend/app/utils/constants.py. Both files import from there and remove their local definition.

Possible traps and issues:

  • Search for any other files that hard-code the string "bangui_session" directly (e.g., in tests) and update those too.
  • The cookie name is also referenced in the frontend (document.cookie or fetch clients) — verify it matches.

Docs changes needed: None.

Why this is needed: If the cookie name ever needs to change (e.g., to support multi-tenant deployments or versioning), it must currently be changed in two backend files and potentially in frontend code. A single constant with a clear name is self-documenting.


Task 19 — Remove dead loop variable and asyncio import in auth_service

Severity: Low

Where: backend/app/services/auth_service.py lines 10 and 75:

import asyncio          # line 10
...
loop = asyncio.get_running_loop()  # line 75 — assigned but never used
return await run_blocking(lambda: bool(bcrypt.checkpw(plain_bytes, hashed_bytes)))

Goal: Delete line 75 (loop = asyncio.get_running_loop()). Remove import asyncio from line 10 since it will then be unused.

Possible traps and issues:

  • Verify asyncio is not used anywhere else in the file before removing the import.
  • The run_blocking(lambda: ...) call relies on asyncio.to_thread internally — this is correct and does not need the loop variable.

Docs changes needed: None.

Why this is needed: Dead code misleads readers into thinking loop is used for something. The original intent was likely loop.run_in_executor(...) before run_blocking was introduced, but the old assignment was never cleaned up.


Task 20 — Offload ensure_jail_configs to a thread at startup

Severity: Low

Where: backend/app/startup.py line 120:

ensure_jail_configs(Path(settings.fail2ban_config_dir) / "jail.d")

ensure_jail_configs in backend/app/utils/jail_config.py performs synchronous filesystem writes (Path.write_text, Path.mkdir) directly in the async startup path.

Goal: Wrap the call in await asyncio.to_thread(ensure_jail_configs, Path(settings.fail2ban_config_dir) / "jail.d") or use await run_blocking(ensure_jail_configs, ...) to offload the blocking I/O to a thread pool.

Possible traps and issues:

  • ensure_jail_configs is currently synchronous (def, not async def). It should remain synchronous — just call it via asyncio.to_thread.
  • On typical local filesystems this completes in microseconds, but on NFS or Docker-volume mounts with high I/O latency it can block the event loop for tens of milliseconds, preventing any other startup task from running.
  • Ensure the call remains before the scheduler is started (the blocklist-import task depends on the jail config files being present).

Docs changes needed: None.

Why this is needed: Synchronous I/O in the async startup path violates the "Async Everything" design principle (Architecture doc section 10). It blocks the event loop during startup and sets a bad precedent for future startup tasks.


Task 21 — Consolidate start_daemon / wait_for_fail2ban duplicates

Severity: Low

Where: start_daemon() and wait_for_fail2ban() are defined identically in:

  • backend/app/services/config_file_service.py (lines approx. 560620)
  • backend/app/services/jail_config_service.py

Goal: Keep one canonical implementation in config_file_service.py (as the shared infrastructure layer). Delete the copy from jail_config_service.py. Update jail_config_service to call the version from config_file_service directly (no intermediary helper).

Possible traps and issues:

  • After Task 5 splits the god object, confirm whether start_daemon and wait_for_fail2ban belong in config_file_service (shared infra) or in a dedicated fail2ban_lifecycle.py util. Either placement is fine as long as there is only one.
  • Check if any tests import start_daemon or wait_for_fail2ban from jail_config_service directly — those tests will need updating.

Docs changes needed: None.

Why this is needed: Any change to the daemon-start retry logic (timeouts, exception handling) must currently be applied twice. This will diverge over time as the two copies evolve independently.


Task 22 — Remove DEFAULT_BLOCKING_EXECUTOR dead code

Severity: Low

Where: backend/app/utils/async_utils.py lines 1619:

DEFAULT_BLOCKING_EXECUTOR: ThreadPoolExecutor = ThreadPoolExecutor(
    max_workers=16,
    thread_name_prefix="bangui-blocking",
)

run_blocking() ignores this when executor=None (the default at every call site) and delegates to asyncio.to_thread instead. DEFAULT_BLOCKING_EXECUTOR is never passed to any run_blocking call.

Goal: Either wire DEFAULT_BLOCKING_EXECUTOR as the actual default used inside run_blocking (replacing asyncio.to_thread) to get the deliberate size-16 pool, or delete the constant entirely if asyncio.to_thread (default executor, size = min(32, os.cpu_count() + 4)) is acceptable.

Possible traps and issues:

  • If the 16-worker pool was intentional (to limit concurrent blocking operations under high load), then run_blocking should default to using it: return await loop.run_in_executor(DEFAULT_BLOCKING_EXECUTOR, functools.partial(func, *args, **kwargs)).
  • asyncio.to_thread uses the loop's default executor which is configured at the OS level and may be smaller than 16. If blocking I/O throughput is a concern, the named pool is better.
  • ThreadPoolExecutor created at module level is never explicitly shut down — add a shutdown hook in the app lifespan if the pool is kept.

Docs changes needed: None.

Why this is needed: A named 16-worker thread pool that is never actually used consumes OS thread resources from startup for zero benefit. It also misleads readers who assume run_blocking uses it.


Task 23 — Fix InMemorySessionCache instantiated when disabled

Severity: Low

Where: backend/app/main.py line 296:

app.state.session_cache = InMemorySessionCache()

This is unconditional — InMemorySessionCache is always created even when settings.session_cache_enabled = False.

Goal: Instantiate the cache conditionally: use InMemorySessionCache() when session_cache_enabled is True and a NoOpSessionCache (a minimal SessionCache-protocol-compatible stub that always returns None from get and does nothing on set/invalidate) otherwise. The get_session_cache dependency in dependencies.py can then always return the cache without a None check because the correct implementation (real or no-op) is always present.

Possible traps and issues:

  • The null-cache pattern avoids the if cache_enabled: checks scattered in require_auth. After this change, move all cache logic into the cache implementations — require_auth calls cache.get(token) unconditionally and the no-op always returns None.
  • get_session_cache currently raises HTTP 503 if app_context.session_cache is None — after this change the cache is always non-None; remove that guard.

Docs changes needed: Docs/Architekture.md section 6 — update the session validation cache note to describe the no-op pattern used for disabled mode.

Why this is needed: Allocating an in-memory structure and its lock mechanisms for a feature that is disabled at configuration time wastes memory and creates a correctness gap: if code accidentally calls session_cache.set() when disabled, it will populate a cache that is never invalidated on logout.


Task 24 — Remove unused asyncio import from log_service.py

Severity: Low

Where: backend/app/services/log_service.py line 9 — import asyncio is present but asyncio is never referenced in the file body.

Goal: Delete line 9.

Possible traps and issues:

  • Confirm asyncio is not referenced in any conditional branch (including type-checking TYPE_CHECKING blocks).

Docs changes needed: None.

Why this is needed: Unused imports add noise, slow down import time marginally, and may confuse linters or readers who assume there must be async behaviour somewhere in the module.


Task 25 — Extend service protocols coverage

Severity: Low (foundation for future testability)

Where: backend/app/services/protocols.py — only AuthService and JailService are defined. backend/app/repositories/protocols.py — only SessionRepository is defined.

No concrete service or repository module declares that it implements any protocol (class AuthServiceImpl(AuthService): does not exist anywhere).

Goal: Add Protocol definitions to services/protocols.py for: BanService, ConfigService, HistoryService, GeoService, BlocklistService, HealthService, ServerService. Add Protocol definitions to repositories/protocols.py for: BlocklistRepository, ImportLogRepository, GeoCacheRepository, Fail2BanDbRepository.

Use typing.runtime_checkable on protocols that need instance-check support in tests. Expose the protocols as Dep type aliases in dependencies.py for each new service wired through DI.

Possible traps and issues:

  • Not all services are injected through dependencies.py today — many are called as modules directly (from app.services import ban_service). The protocol coverage task should go alongside a step to inject these through Depends().
  • Protocols must be kept in sync with the concrete implementations — a method added to the concrete module must be added to the protocol. Consider a test that instantiates a concrete module as the protocol type to catch divergence.

Docs changes needed: Docs/Architekture.md section 10 design principles — reinforce the Dependency Inversion entry: all services injected via dependencies.py must have a corresponding Protocol.

Why this is needed: Without protocols for the majority of services, unit tests must either use the real implementation (slow, requires running fail2ban) or patch methods at the module level (fragile). Protocols enable clean mock injection via Depends() in tests.


Completed Issues