42 KiB
BanGUI — Task List
This document breaks the entire BanGUI project into development stages, ordered so that each stage builds on the previous one. Every task is described in prose with enough detail for a developer to begin work. References point to the relevant documentation.
Reference: Docs/Refactoring.md for full analysis of each issue.
Open Issues
Task 1 — Fix setup_service: mark setup_done only after successful DB init
Status: Completed
Severity: Bug
Where:
backend/app/services/setup_service.py — run_setup() function, last few lines where settings_repo.set_setting(db, _KEY_SETUP_DONE, "1") is called unconditionally.
Goal:
_KEY_SETUP_DONE must only be written to the bootstrap DB after _ensure_database_initialized() has confirmed the runtime database was successfully created and all initial settings were persisted. The write must be the final step in a successful path, not an unconditional epilogue.
Possible traps and issues:
- The
_ensure_database_initialized()helper returnsFalseon failure rather than raising — callers must check the return value explicitly rather than relying on exceptions. - The bootstrap DB and the runtime DB are two different connections; make sure the
set_settingcall targets the bootstrap DB (the one that holdssetup_completed) and not the runtime DB. run_setup()is not idempotent by design (it raises if called twice), so a failed first run that incorrectly sets the flag will permanently block the setup wizard without a manual DB reset.
Docs changes needed:
Docs/Architekture.md section 6 "Authentication & Session Management" describes the setup-completion flag. The note about is_setup_complete() caching should clarify that the flag is only written after a fully successful setup.
Why this is needed:
If the runtime DB creation fails (e.g., path does not exist, permission denied) but setup_completed = 1 is still written to the bootstrap DB, every subsequent request will pass the setup-completion check, get_runtime_database_path() will return the invalid path, and opening that DB will throw a 503 error on every request with no way to re-run setup through the UI.
Task 2 — Fix log_service.preview_log: stale run_blocking call site
Status: Completed
Severity: Bug
Where:
backend/app/services/log_service.py line 74 — the call await run_blocking(None, _read_tail_lines, str(path), req.num_lines).
Goal:
Change the call to await run_blocking(_read_tail_lines, str(path), req.num_lines). The run_blocking signature is run_blocking(func, *args, *, executor=None) — the first argument is the callable, not an executor. The None was left over from an older signature where executor was the first positional argument.
Possible traps and issues:
asyncio.to_thread(None, ...)does not raise immediately at theawaitsite; it may produce a confusingTypeError: 'NoneType' object is not callableat the point where the thread runs, which can be hard to trace if logs are sparse.- Verify no other call sites in
log_service.pyor any future callers ofrun_blockinguse the old positional-executor convention. - After fixing, run the log-preview endpoint manually or via the test suite to confirm the fix.
Docs changes needed: None — this is a straightforward call-site correction, not a design change.
Why this is needed:
The GET /api/config log preview endpoint (POST /api/config/preview-log) calls preview_log(), which will crash with a TypeError on every invocation. The feature has been silently broken since the run_blocking signature was changed.
Task 3 — Fix filter_config router: import ConfigWriteError from the correct module
Status: Completed
Severity: Bug
Where:
backend/app/routers/filter_config.py line 8 — from app.services.config_file_service import ConfigWriteError.
Compare with backend/app/routers/action_config.py line 8 — from app.exceptions import ConfigWriteError.
filter_config_service raises its own internal exceptions that may or may not match the imported class identity.
Goal:
All routers must catch ConfigWriteError from app.exceptions — the single canonical definition. The filter_config.py import must be changed to from app.exceptions import ConfigWriteError. The same class must also be the one actually raised by filter_config_service (verify the service raises from app.exceptions, not from a local definition).
Possible traps and issues:
- Python exception matching uses class identity (
isinstance), not structural typing. If the service raisesconfig_file_service.ConfigWriteErrorbut the router catchesexceptions.ConfigWriteError, theexceptblock silently misses and the exception propagates to the global 500 handler. filter_config_service.pyimportsConfigWriteErrorfromconfig_file_service(line 8 of that file). After fixing the router, also ensure the service raises the canonical one fromapp.exceptions.- This fix depends on Task 4 (consolidating all exception classes into
app.exceptions) being done first or in parallel.
Docs changes needed:
Docs/Architekture.md section 2.2 services table — add a note that all domain exceptions are raised from app.exceptions, not from individual service modules.
Why this is needed: Any config file write error in a filter operation (bad content, permission denied, disk full) will be swallowed and returned as HTTP 500 instead of the correct 400/503. Users will see an unhelpful "unexpected error" message with no actionable detail.
Task 4 — Consolidate all exception classes into app.exceptions
Severity: High
Where:
backend/app/exceptions.py — canonical location, currently has JailNotFoundInConfigError, ConfigWriteError, ConfigValidationError, ConfigOperationError, and others.
Duplicates in:
backend/app/services/config_file_service.pylines 128, 167, 171 —JailNotFoundInConfigError,JailNameError,ConfigWriteErrorbackend/app/services/jail_config_service.pylines 84, 123, 127 — same three classesbackend/app/services/filter_config_service.pylines 45, 95 —FilterNotFoundError,JailNameErrorbackend/app/services/action_config_service.pylines 44, 116 —ActionNotFoundError,JailNameErrorbackend/app/services/config_file_service.pylines 1611 and 2223 —FilterNotFoundError,ActionNotFoundError
Goal:
Every exception class lives exactly once in app.exceptions. All services and routers import exceptions from there. Local class definitions in service files are removed. The complete canonical list: JailNotFoundError, JailNotFoundInConfigError, JailNameError, JailAlreadyActiveError, JailAlreadyInactiveError, JailOperationError, ConfigValidationError, ConfigOperationError, ConfigWriteError, FilterNotFoundError, FilterInvalidRegexError, FilterAlreadyExistsError, FilterNameError, FilterReadonlyError, ActionNotFoundError, ActionAlreadyExistsError, ActionNameError, ActionReadonlyError, ServerOperationError.
Possible traps and issues:
- When removing a local class definition from a service, check every
exceptclause in every router and other service that previously imported that class from the service module — those imports must be updated toapp.exceptions. filter_config.pyandaction_config.pyimportJailNameErrorandJailNotFoundInConfigErrorfromjail_config_service— these must also move toapp.exceptions.- Run the full test suite after each service is migrated, not at the end, to catch regressions early.
- Pay attention to
JailAlreadyActiveErrorandJailAlreadyInactiveError— these are currently only injail_config_serviceand not yet inapp.exceptions.
Docs changes needed:
Docs/Architekture.md section 2.2 — add exceptions.py to the utils description table, noting it is the single source of truth for all domain exceptions.
Why this is needed:
With multiple copies of the same class name in different modules, isinstance checks silently fail when a service raises class A but the router catches class B with the same name. The result is unexpected 500 errors that are hard to reproduce and diagnose.
Task 5 — Resolve config_file_service.py god object and dual implementations
Severity: High
Where:
backend/app/services/config_file_service.py — 3126 lines embedding jail, filter, and action domain logic.
backend/app/services/jail_config_service.py — 1066 lines.
backend/app/services/filter_config_service.py — 927 lines.
backend/app/services/action_config_service.py — 1070 lines.
backend/app/helpers/config_file_helpers.py — re-exports private symbols from config_file_service.
Goal:
Establish one canonical implementation per domain. The intended split (per Docs/Architekture.md) is:
config_file_service.py— shared infrastructure only: config file parsing, atomic writes, path helpers,_parse_jails_sync,_build_parser,_safe_jail_name, etc.jail_config_service.py— jail activation/deactivation,activate_jail,deactivate_jail,validate_jail_config,rollback_jail,list_inactive_jails.filter_config_service.py— filter lifecycle:list_filters,get_filter,create_filter,update_filter,delete_filter,assign_filter_to_jail.action_config_service.py— action lifecycle: same shape as filter.
Duplicate function bodies in jail_config_service.py that duplicate what is already in config_file_service.py must be removed. The canonical versions stay in config_file_service.py (or are promoted to app/helpers/ if they are shared infrastructure). config_file_helpers.py should be able to import from config_file_service without re-exporting private names.
Possible traps and issues:
- The routers currently import from
jail_config_service(e.g.,jail_config.pyline 33). After removing duplicates, confirm all router imports still resolve. start_daemon()andwait_for_fail2ban()are duplicated verbatim in two files — decide on one owner (likelyconfig_file_service) and delete the other copy; update all callers.- This is the largest refactoring task. Do it in stages: first remove class duplicates (covered by Task 4), then consolidate functions one group at a time, running tests after each group.
- The
helpers/config_file_helpers.pyre-exports private symbols (_parse_jails_syncetc.) because the domain services need them. After this task those symbols should be importable directly fromconfig_file_servicewithout being private, or should be moved to a proper shared util module.
Docs changes needed:
Docs/Architekture.md section 2.2 services table — update the description of config_file_service.py, jail_config_service.py, filter_config_service.py, action_config_service.py to reflect the final split after the refactor.
Why this is needed: 7201 lines across 5 files with two active implementations of the same logic is the dominant source of maintenance risk in the backend. Any bug fix or feature to jail/filter/action config must be applied in multiple places, and it is currently unclear which implementation is authoritative.
Task 6 — Extract fail2ban response helpers into a shared util module
Severity: High
Where: 16 copies across 8 service files:
_ok()injail_service.py:90,config_service.py:77,config_file_service.py:570(inner fn),server_service.py:64,health_service.py:33_to_dict()injail_service.py:113,config_service.py:98,health_service.py:60_ensure_list()injail_service.py:134,config_service.py:112_is_truthy()inconfig_file_service.py:294,jail_config_service.py:169,filter_config_service.py:157,action_config_service.py:158_is_not_found_error()injail_service.py:155,config_service.py:147
Goal:
Create backend/app/utils/fail2ban_utils.py (or extend fail2ban_client.py) with public functions:
unwrap_response(response) -> object(the current_oklogic)response_to_dict(pairs) -> dict[str, object](the current_to_dictlogic)response_to_list(value) -> list[str](the current_ensure_listlogic)is_truthy(value: str) -> bool(the current_is_truthylogic)is_not_found_error(exc: Exception) -> bool(the current_is_not_found_errorlogic)
All 8 service files then import from this one location and delete their local copies.
Possible traps and issues:
_ok()inconfig_file_service.pyis defined as an inner function inside a specific method, not at module level — it must be extracted before it can be replaced.- Check all call sites for subtle differences:
_okin some files casts the return, in others it does not. Ensure the shared version covers all variants. _is_truthyis always paired with_TRUE_VALUES = frozenset({"true", "yes", "1"})— move both together. This overlaps with Task 9 (consolidating_TRUE_VALUES).- After replacement, run the full test suite — a signature mismatch will cause runtime
TypeErrors that may not be caught by static analysis.
Docs changes needed:
Docs/Architekture.md section 2.2 utils table — add fail2ban_utils.py (or document the extension to fail2ban_client.py) and describe the response-parsing helpers.
Why this is needed:
A future bug fix to the pickle-response parsing logic must be applied in 16 places. A discrepancy between copies already exists: the inner _ok in config_file_service.py does not cast the return, while the top-level versions do.
Task 7 — Adopt constants.py FAIL2BAN_SOCKET_TIMEOUT_SECONDS everywhere
Severity: High
Where:
backend/app/utils/constants.py line 16 — FAIL2BAN_SOCKET_TIMEOUT_SECONDS: Final[float] = 5.0 (never imported by any service).
Hardcoded local constants in:
backend/app/services/jail_service.py:65—10.0backend/app/services/server_service.py:31—10.0backend/app/services/jail_config_service.py:58—10.0backend/app/services/config_service.py:63—10.0backend/app/services/config_file_service.py:108—10.0backend/app/services/action_config_service.py:93—10.0backend/app/services/ban_service.py:55—5.0backend/app/services/health_service.py:30—5.0backend/app/services/fail2ban_metadata_service.py:65—5.0(inline literal, not even a named constant)
Goal:
constants.py should export two constants:
FAIL2BAN_SOCKET_TIMEOUT_SECONDS: Final[float] = 10.0— standard operational timeoutFAIL2BAN_SOCKET_FAST_TIMEOUT_SECONDS: Final[float] = 5.0— for health probes and metadata queries that should fail quickly
Every service deletes its local _SOCKET_TIMEOUT definition and imports the appropriate constant. Decide intentionally which services should use the fast vs standard timeout rather than having the difference be accidental.
Possible traps and issues:
- The current value in
constants.pyis5.0but most services use10.0. Changing to a single value would alter the behaviour of health probes or operational commands. Use two explicit constants to preserve the intentional difference. fail2ban_metadata_service.pyhardcodes5.0as an inline literal inside_resolve_db_path()with no named constant at all — easy to miss when grepping for_SOCKET_TIMEOUT.- Static analysis (mypy/ruff) will not catch this — must be found by grep or manual review.
Docs changes needed:
Docs/Architekture.md section 4.3 fail2ban communication — mention the timeout constants and their purpose.
Why this is needed: The current inconsistency (five services at 10 s, three at 5 s) appears to be accidental. A planned change to socket timeout policy requires touching 8+ files instead of one. A misconfigured timeout on a slow host will cause silent bans to fail; having one value to change makes this tunable.
Task 8 — Remove app/helpers/ indirection layer; fix circular imports structurally
Severity: Medium
Where:
backend/app/helpers/config_file_helpers.py — 18 lines, imports 5 private symbols from config_file_service and re-exports them unchanged.
backend/app/helpers/log_helpers.py — 22 lines, wraps log_service.preview_log and log_service.test_regex one-for-one.
backend/app/helpers/jail_helpers.py — wraps jail_service.reload_all as reload_jails.
Goal: Eliminate the helpers layer entirely by fixing the underlying circular imports that made it necessary:
config_file_helpers.pyexists becausejail_config_service,filter_config_service,action_config_serviceneed private symbols fromconfig_file_servicebut importing directly would create a cycle (after Task 5 splits the god object, the symbols these services need should live in a proper shared location and be importable directly).log_helpers.pyexists becauseconfig_serviceimports fromlog_servicebut some path would create a cycle. Movetest_regexandpreview_logtolog_serviceand import directly from there.jail_helpers.pyis a pointless one-function wrapper. Importjail_service.reload_alldirectly.
After the underlying causes are fixed, delete all three helper files and update all import sites.
Possible traps and issues:
- Do not delete helpers until the circular imports are resolved — doing so in the wrong order will cause
ImportErrorat startup. - Check
__all__inconfig_file_helpers.py— it exports symbols that were previously accessed by external code. After removal, all consumers must be updated to import from the canonical location. - This task should follow Task 5 (god object split) since that is the root cause of most of the cycles.
Docs changes needed:
Docs/Architekture.md section 2.1 — remove the helpers/ directory from the project structure diagram, or rename it to document its true role.
Why this is needed: The helpers layer hides import cycles rather than eliminating them and adds three levels of indirection to reach simple utility functions. A new developer reading the import chain cannot understand why the layer exists.
Task 9 — Consolidate _TRUE_VALUES into constants.py
Severity: Low (part of Task 6 cleanup)
Where:
_TRUE_VALUES = frozenset({"true", "yes", "1"}) defined in:
backend/app/services/config_file_service.py:117backend/app/services/jail_config_service.py:67backend/app/services/action_config_service.py:105
filter_config_service.py:35 imports it as a private symbol: from app.services.config_file_service import _TRUE_VALUES.
Goal:
Add FAIL2BAN_TRUTHY_VALUES: Final[frozenset[str]] = frozenset({"true", "yes", "1"}) to backend/app/utils/constants.py. All four service files import from there; local definitions and the cross-service private import are removed.
Possible traps and issues:
- The cross-service import of a private symbol (
_TRUE_VALUESwith leading underscore) breaks the layering contract — ifconfig_file_serviceever moves or renames the constant,filter_config_servicesilently breaks at runtime. Fix the import infilter_config_serviceas part of this task. - This constant is closely coupled with
_is_truthy()(Task 6) — consider combining both changes.
Docs changes needed:
Docs/Architekture.md section 2.2 utils — note that constants.py also holds the shared set of truthy string values used by config parsers.
Why this is needed: Importing a private symbol from a peer service module is a layering violation. If the implementation ever changes (e.g., adding "on" as a truthy value), all four definitions must be found and updated.
Task 10 — Replace ban_service deferred local imports with top-level imports
Severity: Medium
Where:
backend/app/services/ban_service.py lines 172, 340, 596, 695 — four separate from app.repositories.history_archive_repo import ... statements inside function bodies.
Goal:
Move all four history_archive_repo imports to the top of ban_service.py. If a true circular import exists, resolve it structurally (move the shared dependency to a lower-level module both can import) rather than deferring.
Possible traps and issues:
- Deferred imports suggest a hidden circular dependency:
ban_service→history_archive_repo→ban_serviceor similar. Before moving the import, trace the full import chain to confirm. If a cycle exists, it must be broken (e.g., by extracting a shared data model or helper that neither module imports the other for). history_archive_reposhould be a pure data-access module — it should not import from any service. If it does, that is the actual bug to fix.- Moving imports is safe if there is no cycle; confirm by running
python -c "from app.services import ban_service"after the change.
Docs changes needed: None.
Why this is needed: Deferred function-body imports hide module-level code dependencies, break standard import analysis tools, and run the import machinery on every call to those functions (Python memoises the result, but the attribute lookup overhead and code clarity cost are real). Four identical deferred imports in one file indicate a structural problem.
Task 11 — Replace history_sync.py deferred import inside loop
Severity: Medium
Where:
backend/app/tasks/history_sync.py line 79 — from app.repositories.history_archive_repo import archive_ban_event inside a while True: loop body.
Goal: Move the import to top-level. Same structural fix as Task 10 — if a circular import prevents this, break the cycle instead of deferring.
Possible traps and issues:
- This import fires on every iteration of the pagination loop (potentially hundreds of times per sync run). Python caches the module after the first import so there is no repeated I/O cost, but the
sys.moduleslookup and attribute access on every loop iteration is unnecessary overhead. - If moving the import to the top causes a circular import error, the circular dependency must be resolved before this task is closed.
Docs changes needed: None.
Why this is needed:
An import statement inside a loop is an unconventional pattern that confuses readers and signals an unresolved dependency issue. Standard tools (isort, ruff) will flag it.
Task 12 — Replace blocklist_service importlib workaround with proper injection
Severity: Medium
Where:
backend/app/services/blocklist_service.py lines 364–366:
jail_svc = importlib.import_module("app.services.jail_service")
ban_ip_fn = getattr(jail_svc, "ban_ip")
This was used as a workaround for a circular import between blocklist_service and jail_service.
Goal:
The import_source() function already accepts an optional ban_ip parameter for testing. Make this parameter required (not optional) at the service level and always pass jail_service.ban_ip from the router or from the scheduled task that calls import_source. Remove the importlib fallback entirely.
Possible traps and issues:
- All callers of
import_source()(the scheduled blocklist tasktasks/blocklist_import.pyand the manual-trigger router endpoint) must be updated to passban_ip=jail_service.ban_ip. - After making the parameter required, mypy will flag any caller that does not pass it.
- Verify the circular import that originally motivated this workaround — if
blocklist_serviceimportsjail_serviceat the top of the file andjail_serviceimportsblocklist_service, address the cycle properly (the dependency should only flow one way:blocklist_servicedepends onjail_service, not the reverse).
Docs changes needed: None.
Why this is needed:
importlib.import_module with a string breaks static analysis, IDE navigation, and refactoring tools. The function effectively already supports injection but bypasses it in production. The workaround made the production code path differ from the tested code path.
Task 13 — Wire geo_batch_lookup through dependency injection
Severity: Medium
Where:
Five call sites where geo_batch_lookup=geo_service.lookup_batch is passed explicitly as an argument:
backend/app/routers/dashboard.py:126and:178backend/app/routers/bans.py:80backend/app/routers/blocklist.py:143backend/app/routers/jails.py:602
Goal:
Add a GeoBatchLookupDep type alias and get_geo_batch_lookup() dependency provider to backend/app/dependencies.py, similar to existing deps like Fail2BanSocketDep. Routers declare geo_lookup: GeoBatchLookupDep in their handler signatures and stop passing geo_service.lookup_batch manually. Service functions that accept geo_batch_lookup as a parameter should continue doing so (good for testability) — the injection just moves from the router body to the dependency declaration.
Possible traps and issues:
- The callable type annotation for
geo_batch_lookupneeds to be precise enough for mypy. Either use aProtocolor import the exact type fromgeo_service. - Some service functions accept
geo_batch_lookupas a required parameter; after this change, always confirm those services are called with it. - This is low-risk but touches five routers and
dependencies.py— run the full test suite after.
Docs changes needed:
Docs/Architekture.md section 2.2 dependencies — mention the geo lookup dependency provider.
Why this is needed:
Manually threading geo_service.lookup_batch at call sites is not dependency injection — it is partial application scattered across files. Any future change to the geo lookup implementation (switching provider, adding caching policy) requires touching five routers instead of one dependency function.
Task 14 — Fix router prefix structure for config sub-routers
Severity: Medium
Where:
backend/app/routers/config.py — this aggregator router has prefix="/api/config" and includes four sub-routers. The sub-routers are defined as router = APIRouter() with no prefix:
backend/app/routers/jail_config.py:42backend/app/routers/filter_config.py:26backend/app/routers/action_config.py:25backend/app/routers/config_misc.py:23
Route paths like "/jails" in jail_config.py resolve to /api/config/jails only by implicit inheritance from the parent.
Goal:
Each sub-router should declare its own prefix explicitly (e.g. APIRouter(prefix="/jails", tags=["Jail Config"])). The aggregator config.py then calls router.include_router(jail_config.router) without any additional prefix argument. This makes each router self-describing and independently testable.
Possible traps and issues:
- Assert that the final URL paths are identical before and after the change. Use FastAPI's route inspection (
app.routes) in a test to validate. config_misc.pyroutes may not have a natural single sub-prefix — list its routes first and decide on a grouping.- Frontend
src/api/endpoints.tshard-codes the final URL paths; changing prefixes here will break the frontend unless the paths remain the same.
Docs changes needed:
Docs/Architekture.md section 8.2 endpoint groups — verify the listed paths match the actual registered routes after the fix.
Why this is needed:
A sub-router with no prefix is not independently testable via TestClient(sub_router_app) because its routes resolve to bare paths like /jails instead of /api/config/jails. Implicit inheritance from a parent makes the actual URL opaque to anyone reading the sub-router file.
Task 15 — Remove 7 ghost imports from routers/config.py
Severity: Medium
Where:
backend/app/routers/config.py lines 6–13:
from app.services import (
action_config_service,
config_file_service,
config_service,
filter_config_service,
jail_config_service,
jail_service,
log_service,
) # noqa: F401
None of these are referenced in the file body.
Goal: Remove the entire import block. If these imports are present to force module initialisation for side effects, document exactly what side effect is required and trigger it explicitly in the startup lifespan instead.
Possible traps and issues:
- If any of these modules register something at import time (class registries, decorators, etc.), removing the import will silently skip that registration. Audit each module for module-level side effects before deleting the import.
- The
# noqa: F401suppresses the linter warning that would normally catch unused imports — this is a code smell that was masking the issue. - After removing, run the full application startup and hit all config endpoints to confirm no registration was skipped.
Docs changes needed: None.
Why this is needed: Imports for side effects are an anti-pattern that creates hidden startup dependencies. They suppress linter warnings, make the code misleading, and force unnecessary module-level execution at startup even when the imports are truly vestigial.
Task 16 — Reset _backend_cmd_supported between test runs
Severity: Medium
Where:
backend/app/services/jail_service.py lines 77–78:
_backend_cmd_supported: bool | None = None
_backend_cmd_lock: asyncio.Lock = asyncio.Lock()
Module-level mutable globals that persist for the lifetime of the Python process.
Goal:
Provide a _reset_capability_cache() function (or use importlib.reload in tests) so the test suite can reset _backend_cmd_supported to None between test cases. Alternatively, move the cache into the Fail2BanMetadataService-style class instance (similar to how default_fail2ban_metadata_service is structured) so tests can construct a fresh instance per test.
Possible traps and issues:
asyncio.Lock()objects in Python 3.10+ are not bound to the running event loop at creation, so creating them at module level is safe. But if a test creates a new event loop (common withpytest-asyncio), an existingasyncio.Lockmay behave unexpectedly.- The cleanest fix is to encapsulate the cache and lock into a class, instantiate it as a module-level singleton identical to
Fail2BanMetadataService, and expose ainvalidate_backend_capability_cache()function for test teardown.
Docs changes needed: None.
Why this is needed:
A test that forces _backend_cmd_supported = False (by mocking a socket error) will corrupt the capability cache for all subsequent tests in the same process. This produces flaky test failures that are very hard to diagnose.
Task 17 — Remove unittest.mock.Mock check from production runtime_state
Severity: Low
Where:
backend/app/utils/runtime_state.py lines 15–16 and line 107:
from unittest.mock import Mock as _Mock
...
if runtime_settings is not None and _Mock is not None and isinstance(runtime_settings, _Mock):
return get_app_settings(app)
Goal:
Remove _Mock entirely from runtime_state.py. The test-isolation problem it was solving should be handled in test fixtures by injecting a real Settings instance rather than a Mock. Tests that currently rely on this fallback should be updated to use create_app(settings=my_test_settings) with a real test settings object.
Possible traps and issues:
- Find all tests that pass a
Mockasruntime_settingsand replace the mock with aSettings(**test_overrides)instance. This is the correct approach. - After removing the check, run the full test suite. Any test that previously relied on the
Mockdetection will fail fast rather than silently falling back.
Docs changes needed: None.
Why this is needed:
unittest.mock is a test library. Its presence in a production module is a layering violation that adds import overhead at every startup and puts test scaffolding logic in a path that runs on every request.
Task 18 — Move _COOKIE_NAME to constants.py
Severity: Low
Where:
backend/app/dependencies.py:61—_COOKIE_NAME = "bangui_session"backend/app/routers/auth.py:29—_COOKIE_NAME = "bangui_session"
Goal:
Add SESSION_COOKIE_NAME: Final[str] = "bangui_session" to backend/app/utils/constants.py. Both files import from there and remove their local definition.
Possible traps and issues:
- Search for any other files that hard-code the string
"bangui_session"directly (e.g., in tests) and update those too. - The cookie name is also referenced in the frontend (
document.cookieor fetch clients) — verify it matches.
Docs changes needed: None.
Why this is needed: If the cookie name ever needs to change (e.g., to support multi-tenant deployments or versioning), it must currently be changed in two backend files and potentially in frontend code. A single constant with a clear name is self-documenting.
Task 19 — Remove dead loop variable and asyncio import in auth_service
Severity: Low
Where:
backend/app/services/auth_service.py lines 10 and 75:
import asyncio # line 10
...
loop = asyncio.get_running_loop() # line 75 — assigned but never used
return await run_blocking(lambda: bool(bcrypt.checkpw(plain_bytes, hashed_bytes)))
Goal:
Delete line 75 (loop = asyncio.get_running_loop()). Remove import asyncio from line 10 since it will then be unused.
Possible traps and issues:
- Verify
asynciois not used anywhere else in the file before removing the import. - The
run_blocking(lambda: ...)call relies onasyncio.to_threadinternally — this is correct and does not need theloopvariable.
Docs changes needed: None.
Why this is needed:
Dead code misleads readers into thinking loop is used for something. The original intent was likely loop.run_in_executor(...) before run_blocking was introduced, but the old assignment was never cleaned up.
Task 20 — Offload ensure_jail_configs to a thread at startup
Severity: Low
Where:
backend/app/startup.py line 120:
ensure_jail_configs(Path(settings.fail2ban_config_dir) / "jail.d")
ensure_jail_configs in backend/app/utils/jail_config.py performs synchronous filesystem writes (Path.write_text, Path.mkdir) directly in the async startup path.
Goal:
Wrap the call in await asyncio.to_thread(ensure_jail_configs, Path(settings.fail2ban_config_dir) / "jail.d") or use await run_blocking(ensure_jail_configs, ...) to offload the blocking I/O to a thread pool.
Possible traps and issues:
ensure_jail_configsis currently synchronous (def, notasync def). It should remain synchronous — just call it viaasyncio.to_thread.- On typical local filesystems this completes in microseconds, but on NFS or Docker-volume mounts with high I/O latency it can block the event loop for tens of milliseconds, preventing any other startup task from running.
- Ensure the call remains before the scheduler is started (the blocklist-import task depends on the jail config files being present).
Docs changes needed: None.
Why this is needed: Synchronous I/O in the async startup path violates the "Async Everything" design principle (Architecture doc section 10). It blocks the event loop during startup and sets a bad precedent for future startup tasks.
Task 21 — Consolidate start_daemon / wait_for_fail2ban duplicates
Severity: Low
Where:
start_daemon() and wait_for_fail2ban() are defined identically in:
backend/app/services/config_file_service.py(lines approx. 560–620)backend/app/services/jail_config_service.py
Goal:
Keep one canonical implementation in config_file_service.py (as the shared infrastructure layer). Delete the copy from jail_config_service.py. Update jail_config_service to call the version from config_file_service directly (no intermediary helper).
Possible traps and issues:
- After Task 5 splits the god object, confirm whether
start_daemonandwait_for_fail2banbelong inconfig_file_service(shared infra) or in a dedicatedfail2ban_lifecycle.pyutil. Either placement is fine as long as there is only one. - Check if any tests import
start_daemonorwait_for_fail2banfromjail_config_servicedirectly — those tests will need updating.
Docs changes needed: None.
Why this is needed: Any change to the daemon-start retry logic (timeouts, exception handling) must currently be applied twice. This will diverge over time as the two copies evolve independently.
Task 22 — Remove DEFAULT_BLOCKING_EXECUTOR dead code
Severity: Low
Where:
backend/app/utils/async_utils.py lines 16–19:
DEFAULT_BLOCKING_EXECUTOR: ThreadPoolExecutor = ThreadPoolExecutor(
max_workers=16,
thread_name_prefix="bangui-blocking",
)
run_blocking() ignores this when executor=None (the default at every call site) and delegates to asyncio.to_thread instead. DEFAULT_BLOCKING_EXECUTOR is never passed to any run_blocking call.
Goal:
Either wire DEFAULT_BLOCKING_EXECUTOR as the actual default used inside run_blocking (replacing asyncio.to_thread) to get the deliberate size-16 pool, or delete the constant entirely if asyncio.to_thread (default executor, size = min(32, os.cpu_count() + 4)) is acceptable.
Possible traps and issues:
- If the 16-worker pool was intentional (to limit concurrent blocking operations under high load), then
run_blockingshould default to using it:return await loop.run_in_executor(DEFAULT_BLOCKING_EXECUTOR, functools.partial(func, *args, **kwargs)). asyncio.to_threaduses the loop's default executor which is configured at the OS level and may be smaller than 16. If blocking I/O throughput is a concern, the named pool is better.ThreadPoolExecutorcreated at module level is never explicitly shut down — add a shutdown hook in the app lifespan if the pool is kept.
Docs changes needed: None.
Why this is needed:
A named 16-worker thread pool that is never actually used consumes OS thread resources from startup for zero benefit. It also misleads readers who assume run_blocking uses it.
Task 23 — Fix InMemorySessionCache instantiated when disabled
Severity: Low
Where:
backend/app/main.py line 296:
app.state.session_cache = InMemorySessionCache()
This is unconditional — InMemorySessionCache is always created even when settings.session_cache_enabled = False.
Goal:
Instantiate the cache conditionally: use InMemorySessionCache() when session_cache_enabled is True and a NoOpSessionCache (a minimal SessionCache-protocol-compatible stub that always returns None from get and does nothing on set/invalidate) otherwise. The get_session_cache dependency in dependencies.py can then always return the cache without a None check because the correct implementation (real or no-op) is always present.
Possible traps and issues:
- The null-cache pattern avoids the
if cache_enabled:checks scattered inrequire_auth. After this change, move all cache logic into the cache implementations —require_authcallscache.get(token)unconditionally and the no-op always returnsNone. get_session_cachecurrently raises HTTP 503 ifapp_context.session_cache is None— after this change the cache is always non-None; remove that guard.
Docs changes needed:
Docs/Architekture.md section 6 — update the session validation cache note to describe the no-op pattern used for disabled mode.
Why this is needed:
Allocating an in-memory structure and its lock mechanisms for a feature that is disabled at configuration time wastes memory and creates a correctness gap: if code accidentally calls session_cache.set() when disabled, it will populate a cache that is never invalidated on logout.
Task 24 — Remove unused asyncio import from log_service.py
Severity: Low
Where:
backend/app/services/log_service.py line 9 — import asyncio is present but asyncio is never referenced in the file body.
Goal: Delete line 9.
Possible traps and issues:
- Confirm
asynciois not referenced in any conditional branch (including type-checkingTYPE_CHECKINGblocks).
Docs changes needed: None.
Why this is needed:
Unused imports add noise, slow down import time marginally, and may confuse linters or readers who assume there must be async behaviour somewhere in the module.
Task 25 — Extend service protocols coverage
Severity: Low (foundation for future testability)
Where:
backend/app/services/protocols.py — only AuthService and JailService are defined.
backend/app/repositories/protocols.py — only SessionRepository is defined.
No concrete service or repository module declares that it implements any protocol (class AuthServiceImpl(AuthService): does not exist anywhere).
Goal:
Add Protocol definitions to services/protocols.py for: BanService, ConfigService, HistoryService, GeoService, BlocklistService, HealthService, ServerService. Add Protocol definitions to repositories/protocols.py for: BlocklistRepository, ImportLogRepository, GeoCacheRepository, Fail2BanDbRepository.
Use typing.runtime_checkable on protocols that need instance-check support in tests. Expose the protocols as Dep type aliases in dependencies.py for each new service wired through DI.
Possible traps and issues:
- Not all services are injected through
dependencies.pytoday — many are called as modules directly (from app.services import ban_service). The protocol coverage task should go alongside a step to inject these throughDepends(). - Protocols must be kept in sync with the concrete implementations — a method added to the concrete module must be added to the protocol. Consider a test that instantiates a concrete module as the protocol type to catch divergence.
Docs changes needed:
Docs/Architekture.md section 10 design principles — reinforce the Dependency Inversion entry: all services injected via dependencies.py must have a corresponding Protocol.
Why this is needed:
Without protocols for the majority of services, unit tests must either use the real implementation (slow, requires running fail2ban) or patch methods at the module level (fragile). Protocols enable clean mock injection via Depends() in tests.