Files
BanGUI/Docs/Tasks.md
Lukas 83452ffc23 Refactor backend services and jail configuration
- Refactor action_config_service, filter_config_service, jail_config_service, and jail_service
- Add jail_socket utility module for socket communication
- Update test_jail_service with new test cases
- Update architecture and task documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-25 18:34:03 +02:00

339 lines
21 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
### T-07 · Break cross-service import: `jail_config_service` imports `jail_service`
**Where found:** `backend/app/services/jail_config_service.py``import app.services.jail_service as jail_service`
**Why this is needed:** Services at the same layer should not depend on each other. Shared logic should move to a lower-level utility. This creates a dependency cycle risk and makes both services harder to test independently.
**Goal:** Shared socket operations extracted to `app/utils/` or `app/utils/jail_socket.py`. No service imports a sibling service.
**What to do:**
1. Identify which functions from `jail_service` are called by `jail_config_service`.
2. Extract shared low-level socket helpers to `app/utils/jail_socket.py` (or extend `fail2ban_client.py`).
3. Update both services to import from the utility layer.
**Possible traps and issues:**
- Some `jail_service` functions that are called may themselves import geo, config, or other services — trace the full dependency graph before extracting.
- APScheduler tasks reference `jail_service` — ensure they still work after any reorganisation.
**Docs changes needed:** `Docs/Architekture.md` — add rule: services must not import sibling services.
**Doc references:** `Docs/Architekture.md`
---
### T-08 · `_SOCKET_TIMEOUT` defined 6× — `constants.py` constant unused
**Where found:** `backend/app/utils/constants.py:16` (defines `FAIL2BAN_SOCKET_TIMEOUT_SECONDS = 5.0` but is never imported); `ban_service.py` (5.0), `jail_service.py` (10.0), `config_service.py` (10.0), `server_service.py` (10.0), `log_service.py` (10.0), `jail_config_service.py` (10.0), `config_file_utils.py` (10.0)
**Why this is needed:** `constants.py` has a module docstring saying "import from this module rather than hard-coding values" — and then nothing imports the socket timeout. The values also disagree: `ban_service` uses `5.0` while all others use `10.0`, creating silent inconsistency in timeout behaviour across endpoints.
**Goal:** One constant in `constants.py`, imported everywhere. Decide on a single default (or two named constants for fast/slow operations).
**What to do:**
1. In `constants.py`, define `FAIL2BAN_SOCKET_TIMEOUT_FAST: Final[float] = 5.0` (for health/metadata probes) and `FAIL2BAN_SOCKET_TIMEOUT: Final[float] = 10.0` (for command operations) — or a single value if appropriate.
2. Replace all local `_SOCKET_TIMEOUT` float literals with imports from `constants`.
3. Confirm `ban_service` should actually be `5.0` or `10.0` (intentional vs accidental discrepancy).
**Possible traps and issues:**
- Changing `ban_service` from 5.0 to 10.0 (or vice versa) changes observable timeout behaviour. Decide deliberately.
- `fail2ban_metadata_service.py` also has `socket_timeout: float = 5.0` hardcoded inline — include this in the sweep.
**Docs changes needed:** None.
**Doc references:** `backend/app/utils/constants.py`
---
### T-09 · `_since_unix` has two divergent implementations — latent window-boundary bug
**Where found:** `backend/app/services/ban_service.py:393` (uses `time.time()`, applies `_TIME_RANGE_SLACK_SECONDS = 60`); `backend/app/services/history_service.py:53` (uses `datetime.now(UTC).timestamp()`, no slack)
**Why this is needed:** The `ban_service` docstring explicitly explains why `time.time()` is used and why the 60-second slack is needed for consistency with fail2ban's timestamp storage. `history_service` quietly omits both, so history queries return a slightly different window boundary than ban queries for the same `TimeRange` parameter. Inconsistent windows cause "same data, different count" bugs in the UI.
**Goal:** Single `_since_unix` function in a shared utility module, with documented rationale, used by both services.
**What to do:**
1. Move `_since_unix` (with the `time.time()` + slack approach) to `app/utils/time_utils.py` (which already exists).
2. Delete both local implementations.
3. Import from `time_utils` in both services.
4. Decide consciously whether history queries should also use the slack (probably yes for consistency).
**Possible traps and issues:**
- Adding the 60-second slack to history queries may change row counts in tests that assert exact counts against seeded timestamps.
- Confirm `_TIME_RANGE_SLACK_SECONDS` belongs in `constants.py`.
**Docs changes needed:** `Docs/Backend-Development.md` — document the timestamp handling rationale.
**Doc references:** `backend/app/utils/time_utils.py`, `backend/app/utils/constants.py`
---
### T-10 · `get_geo_batch_lookup` is false injectability — module function pointer injection
**Where found:** `backend/app/dependencies.py``get_geo_batch_lookup()` returns `geo_service.lookup_batch` (a module-level function)
**Why this is needed:** The dependency provider exists to give the appearance of injectable geo lookup, but because `geo_service` uses module-level global state (T-04), tests that inject a different callable into routers still have the global cache active. The abstraction provides type-level indirection without runtime isolation.
**Goal:** Once T-04 is done (GeoCache as an object), inject the `GeoCache` instance and call methods on it directly. The `GeoBatchLookup` callable protocol becomes a method reference on the injected instance.
**What to do:**
1. Complete T-04 first.
2. Update `get_geo_batch_lookup` to retrieve `GeoCache` from `app.state` and return its `lookup_batch` method.
3. Or inject `GeoCache` directly and let routers call `.lookup_batch()` on it.
**Possible traps and issues:** Blocked on T-04.
**Docs changes needed:** None beyond T-04.
**Doc references:** `backend/app/dependencies.py`
---
### T-11 · Repositories injected as module references via `cast()` — structural type-safety gap
**Where found:** `backend/app/dependencies.py``get_session_repo()`, `get_blocklist_repo()`, `get_settings_repo()`, `get_import_log_repo()`, `get_history_archive_repo()`, `get_geo_cache_repo()`, `get_fail2ban_db_repo()` all return the module itself cast to the Protocol type.
**Why this is needed:** The `cast()` call is a signal that the type system is being overridden. Modules pass Protocol structural checks only because their top-level `async def` functions happen to match the Protocol method signatures. This is fragile — a module rename, a function rename, or an added required parameter will silently pass mypy but fail at runtime.
**Goal:** Repository modules become proper singleton instances, or the dependency providers are acknowledged as module-adapters with explicit documentation.
**What to do (option A — correct):**
1. Convert each repository module's functions into a class with the same method signatures.
2. Instantiate singletons at startup and store on `app.state` or as module-level instances.
3. Update dependency providers to return the instance without `cast()`.
**What to do (option B — minimal):**
1. Document in each `get_*_repo` provider why the module-as-Protocol pattern is intentional.
2. Add a CI check (or mypy plugin) that validates structural compatibility doesn't silently break.
**Possible traps and issues:**
- Option A is a significant refactor affecting all repository call sites.
- Option B risks the pattern silently breaking in future.
**Docs changes needed:** `Docs/Backend-Development.md` — document repository injection pattern and why it works.
**Doc references:** `Docs/Backend-Development.md`, `backend/app/repositories/protocols.py`
---
### T-12 · Apply `useListData` consistently across all data-fetching hooks
**Where found:** `frontend/src/hooks/useJailList.ts`, `useJailDetail.ts`, `useServerStatus.ts`, `useBanTrend.ts`, `useDashboardCountryData.ts` — all re-implement abort-controller / loading / error state manually. `useListData.ts` exists and is used by `useBlocklists`, `useJailConfigs`, `useActionList`, `useFilterList`.
**Why this is needed:** At least 5 hooks implement the same 40-line pattern. Any fix to the pattern (e.g. abort-guard in `.finally()`) must be applied to every copy independently. `useHistory` has a real bug because of this (see T-18).
**Goal:** All hooks that load a list and need refresh semantics use `useListData` or a shared base.
**What to do:**
1. Audit all hooks for the manual abort-controller pattern.
2. Refactor `useJailList` first (cleanest candidate — no mutations).
3. For hooks with side-effects beyond listing (e.g. `useJailDetail`), split into data hook + command hook (see T-13) and use `useListData` for the data half.
4. Extend `useListData` if needed to support `onSuccess` callbacks returning non-array data (e.g. `total`).
**Possible traps and issues:**
- `useListData` currently requires `selector: (response) → TItem[]`. Hooks that expose `total` alongside items need `onSuccess` to capture it — the `onSuccess` callback already exists in `UseListDataOptions`.
- `useServerStatus` has a polling interval and window-focus refetch that `useListData` does not support — may need a `usePolledData` variant or extension.
**Docs changes needed:** None.
**Doc references:** `frontend/src/hooks/useListData.ts`
---
### T-13 · Split `useJailDetail` — SRP violation (read state + write commands in one hook)
**Where found:** `frontend/src/hooks/useJailDetail.ts`
**Why this is needed:** The hook manages fetch state AND exposes 8 write operations (`start`, `stop`, `reload`, `setIdle`, `addIp`, `removeIp`, `toggleIgnoreSelf`, `load`). Read concerns and command concerns are independent. The consumer (`JailDetailPage`) passes them separately through props anyway, so the UI doesn't depend on them being co-located.
**Goal:** `useJailData(name)` for reading, `useJailCommands(name, onSuccess)` for mutations.
**What to do:**
1. Create `useJailData(name): { jail, ignoreList, ignoreSelf, loading, error, refresh }` using `useListData` or the abort-controller pattern.
2. Create `useJailCommands(name, onSuccess: () => void): { start, stop, reload, setIdle, addIp, removeIp, toggleIgnoreSelf }` — each command calls the API and then calls `onSuccess()` to trigger a refresh.
3. In `JailDetailPage`, call both hooks and pass results to child components.
4. Delete `useJailDetail.ts`.
**Possible traps and issues:**
- `JailDetailPage` destructures all properties from `useJailDetail` in a single line — update the destructuring.
- Commands currently call `load()` directly. The `onSuccess` callback pattern keeps them decoupled from the data hook's internals.
**Docs changes needed:** None.
**Doc references:** `frontend/src/hooks/useJailDetail.ts`, `frontend/src/pages/JailDetailPage.tsx`
---
### T-14 · Move jail detail sub-sections from `pages/jail/` to `components/jail/`
**Where found:** `frontend/src/pages/jail/``JailInfoSection`, `PatternsSection`, `BantimeEscalationSection`, `IgnoreListSection`, `jailDetailPageStyles.ts`. `BannedIpsSection` is in `frontend/src/components/jail/` (correct location).
**Why this is needed:** The project convention is `pages/` for route-level components and `components/` for reusable UI. Sub-sections are reusable UI building blocks, not route components. `BannedIpsSection` already lives in the right place. The inconsistency makes the directory structure misleading.
**Goal:** All `*Section` components for jail detail under `components/jail/`. `pages/jail/` deleted or contains only route-level entry points.
**What to do:**
1. Move `JailInfoSection`, `PatternsSection`, `BantimeEscalationSection`, `IgnoreListSection`, and `jailDetailPageStyles.ts` to `frontend/src/components/jail/`.
2. Update imports in `JailDetailPage.tsx`.
3. Remove empty `pages/jail/` directory.
**Possible traps and issues:**
- Confirm no other page imports from `pages/jail/` before deleting it.
- Update any path-based test imports.
**Docs changes needed:** `Docs/Web-Development.md` — document the `pages/` vs `components/` convention explicitly.
**Doc references:** `Docs/Web-Development.md`
---
### T-15 · Replace `window` event bus for session expiry with React context callback
**Where found:** `frontend/src/api/client.ts``window.dispatchEvent(new Event(SESSION_EXPIRED_EVENT))`; `frontend/src/providers/AuthProvider.tsx``window.addEventListener(SESSION_EXPIRED_EVENT, ...)`
**Why this is needed:** Using `window` as a side-channel bypasses React's component tree, breaks in non-browser environments (SSR, test environments without full JSDOM), and creates an invisible coupling between the API client and the auth provider. It also fires globally — any code anywhere that dispatches `bangui:session-expired` on window will trigger a logout, with no tracing possible.
**Goal:** The API client receives an `onUnauthorized` callback injected at the provider level, called directly instead of via a DOM event.
**What to do:**
1. Add an `onUnauthorized` callback to the API client (e.g. a module-level setter `setUnauthorizedHandler(fn)`).
2. In `AuthProvider`, call `setUnauthorizedHandler(handleSessionExpired)` on mount and reset it on unmount.
3. In `client.ts`, call the handler directly instead of `window.dispatchEvent`.
4. Remove `SESSION_EXPIRED_EVENT` string constant and the `window.addEventListener` in `AuthProvider`.
**Possible traps and issues:**
- The module-level handler setter is still global state — an alternative is to pass the handler as a parameter to `request()` via a context object, but that changes the API signature more significantly.
- Tests that mock `window.dispatchEvent` need updating.
- SSR / Vitest environments that already mock `window` may need adjustment.
**Docs changes needed:** None.
**Doc references:** `frontend/src/api/client.ts`, `frontend/src/providers/AuthProvider.tsx`
---
### T-16 · Centralise `PAGE_SIZE` frontend constants
**Where found:** `frontend/src/hooks/useBans.ts:14` (`PAGE_SIZE = 100`); `frontend/src/pages/HistoryPage.tsx:45` (`PAGE_SIZE = 50`)
**Why this is needed:** Page sizes can silently diverge from backend defaults. If the backend changes `_DEFAULT_PAGE_SIZE`, the frontend won't know. Having multiple files define the same concept differently is also misleading.
**Goal:** All pagination constants in `frontend/src/utils/constants.ts`.
**What to do:**
1. Add `BAN_PAGE_SIZE = 100`, `HISTORY_PAGE_SIZE = 50` to `frontend/src/utils/constants.ts` (create it if it doesn't exist).
2. Replace local `const PAGE_SIZE = ...` in each hook/page with imports.
**Possible traps and issues:** Trivial. Verify test snapshots don't hard-code the old inline constant.
**Docs changes needed:** None.
**Doc references:** `frontend/src/utils/constants.ts`
---
### T-17 · `useHistory` is missing abort-signal guards — stale state update bug
**Where found:** `frontend/src/hooks/useHistory.ts``.then()`, `.catch()`, `.finally()` callbacks update state without checking `abortRef.current.signal.aborted`
**Why this is needed:** Every other data-fetching hook in the codebase guards all state-update callbacks against aborted signals. `useHistory` does not. If the component unmounts mid-request, `setItems`, `setTotal`, `setLoading` will all fire on an unmounted component. In React 18 this is a no-op but it still indicates a broken invariant and `handleFetchError` could misclassify the abort as a real error (depends on whether `fetch` threw `AbortError` or the API module swallowed it).
**Goal:** All callbacks in `useHistory` check the abort signal before mutating state.
**What to do:**
1. Capture the controller in a local variable inside `load()` (already done: `abortRef.current = new AbortController()`).
2. In `.then()`: add `if (abortRef.current.signal.aborted) return;` before `setItems(...)`.
3. In `.catch()`: add the same guard before `handleFetchError(...)`.
4. In `.finally()`: add `if (!abortRef.current.signal.aborted)` before `setLoading(false)`.
**Possible traps and issues:**
- `abortRef.current` may have been replaced by a new controller before the callback fires. Capture the controller in a closure variable at the top of `load()`: `const controller = abortRef.current`.
**Docs changes needed:** None.
**Doc references:** `frontend/src/hooks/useHistory.ts`
---
### T-18 · Merge `useDashboardCountryData` and `useMapData` — near-identical hooks
**Where found:** `frontend/src/hooks/useDashboardCountryData.ts` and `frontend/src/hooks/useMapData.ts`
**Why this is needed:** Both hooks call `fetchBansByCountry`, maintain the same state shape (`countries`, `countryNames`, `bans`, `total`, `loading`, `error`), and implement the same abort-controller pattern. The only behavioural difference is that `useMapData` adds a 300ms debounce. Any bug fix must be applied to both.
**Goal:** A single `useBansByCountry` base hook; `useMapData` adds the debounce on top.
**What to do:**
1. Create `useBansByCountry(range, origin, source, countryCode?)` — the shared fetch logic without debounce.
2. Refactor `useDashboardCountryData` to wrap `useBansByCountry`.
3. Refactor `useMapData` to wrap `useBansByCountry` and add the debounce layer.
4. Keep the existing hook names as thin wrappers to preserve call sites.
**Possible traps and issues:**
- `useMapData` returns `{ data }` (the full response object) whereas `useDashboardCountryData` unpacks `countries`, `countryNames`, `bans`, `total`. Normalise the return shape before collapsing.
- Tests for each hook test them independently — update or merge.
**Docs changes needed:** None.
**Doc references:** `frontend/src/hooks/useDashboardCountryData.ts`, `frontend/src/hooks/useMapData.ts`
---
### T-19 · Move `DashboardFilterProvider` — page-scoped provider in wrong directory
**Where found:** `frontend/src/providers/DashboardFilterProvider.tsx` — instantiated only inside `DashboardPage.tsx`
**Why this is needed:** The `providers/` directory implies app-wide providers (alongside `AuthProvider`, `ThemeProvider`, `TimezoneProvider`). `DashboardFilterProvider` wraps only `DashboardPageContent` and is not used anywhere else. Its placement implies reuse that doesn't exist, misleading future contributors about its scope.
**Goal:** Co-located with its only consumer.
**What to do:**
1. Move `DashboardFilterProvider.tsx` to `frontend/src/pages/` (alongside `DashboardPage.tsx`) or to `frontend/src/pages/dashboard/` if the page is split into a subdirectory.
2. Update imports in `DashboardPage.tsx` and any tests.
**Possible traps and issues:** Only `DashboardPage.tsx` imports it — confirm with grep before moving.
**Docs changes needed:** `Docs/Web-Development.md` — document what belongs in `providers/` (app-wide) vs co-located.
**Doc references:** `Docs/Web-Development.md`
---
### T-20 · Replace inline `style={{}}` objects with `makeStyles` classes
**Where found:** `frontend/src/pages/map/MapBansTable.tsx` (multiple), `pages/JailDetailPage.tsx`, `pages/HistoryPage.tsx`, `pages/history/IpDetailView.tsx`, `components/WorldMap.tsx`, `components/TopCountriesPieChart.tsx`, `components/TopCountriesBarChart.tsx`
**Why this is needed:** The project uses Fluent UI's `makeStyles` with atomic CSS caching. Inline `style={{}}` objects are allocated on every render, bypass the atomic CSS cache, and are inconsistent with the established pattern. Exceptions are acceptable only for truly dynamic values (e.g. tooltip `left`/`top` that change on mouse move) — static layout values must use `makeStyles`.
**Goal:** All static layout properties moved to `makeStyles`. Inline styles only for genuinely dynamic values.
**What to do:**
1. Audit each file listed above.
2. For static `display: flex`, `gap`, `margin`, `padding` — move to `makeStyles` in that component's style block.
3. Keep inline `style` only where the value is truly dynamic at runtime (e.g. `WorldMap` tooltip position, `TopCountriesBarChart` chart height).
**Possible traps and issues:**
- `MapBansTable.tsx` has several consecutive inline `style` objects for its pagination row — these can be collapsed into one named class.
- Some components use both `tokens.*` values and inline styles — ensure `makeStyles` is imported where it isn't already.
**Docs changes needed:** `Docs/Web-Development.md` — add styling rule: use `makeStyles` for all static styles; inline `style` only for runtime-dynamic values.
**Doc references:** `Docs/Web-Development.md`, Fluent UI v9 docs on `makeStyles`
---
### T-21 · `Fail2BanMetadataService` has inline socket timeout hardcoded
**Where found:** `backend/app/services/fail2ban_metadata_service.py:64``socket_timeout: float = 5.0`
**Why this is needed:** Part of the same constant duplication as T-08. This file doesn't use `_SOCKET_TIMEOUT` as a module constant — it hardcodes `5.0` inline as a local variable. Should use the constant from `constants.py` once T-08 is done.
**Goal:** After T-08, replace with `FAIL2BAN_SOCKET_TIMEOUT_FAST` import.
**What to do:** Covered by T-08 sweep.
**Possible traps and issues:** None — dependent on T-08.
**Docs changes needed:** None.
**Doc references:** `backend/app/services/fail2ban_metadata_service.py`