BanGUI/Docs/Tasks.md
Lukas d6da81131f Add tests for background tasks and fail2ban client utility
- tests/test_tasks/test_blocklist_import.py: 14 tests, 96% coverage
- tests/test_tasks/test_health_check.py: 12 tests, 100% coverage
- tests/test_tasks/test_geo_cache_flush.py: 8 tests, 100% coverage
- tests/test_services/test_fail2ban_client.py: 24 new tests, 96% coverage

Total: 50 new tests (628 → 678 passing). Overall coverage 85% → 87%.
ruff, mypy --strict, tsc, and eslint all clean.
2026-03-13 10:29:22 +01:00

BanGUI — Task List

This document breaks the entire BanGUI project into development stages, ordered so that each stage builds on the previous one. Every task is described in prose with enough detail for a developer to begin work. References point to the relevant documentation.


Task 1 — Make Geo-Cache Persistent DONE

Goal: Minimise calls to the external geo-IP lookup service by caching results in the database.

Details:

  • Currently geo-IP results may only live in memory and are lost on restart. Persist every successful geo-lookup result into the database so the external service is called as rarely as possible.
  • On each geo-lookup request, first query the database for a cached entry for that IP. Only call the external service if no cached entry exists (or the entry has expired, if a TTL policy is desired).
  • After a successful external lookup, write the result back to the database immediately.
  • Review the existing implementation in app/services/geo_service.py and the related repository/model code. Verify that:
    • The DB table/model for geo-cache entries exists and has the correct schema (IP, country, city, latitude, longitude, looked-up timestamp, etc.).
    • The repository layer exposes get_by_ip and upsert (or equivalent) methods.
    • The service checks the cache before calling the external API.
    • Bulk inserts are used where multiple IPs need to be resolved at once (see Task 3).
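
The cache-first flow described above can be sketched roughly as follows. This is a minimal illustration using the standard-library sqlite3 module, not the project's actual code: the table layout is a reduced, hypothetical schema, and fetch_remote is a placeholder for whatever calls the external geo-IP service.

```python
import sqlite3
import time
from typing import Callable, Dict, Optional

# Hypothetical, reduced schema; the real geo-cache table may have more columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS geo_cache (
    ip TEXT PRIMARY KEY,
    country TEXT,
    city TEXT,
    looked_up_at REAL NOT NULL
)
"""

Geo = Dict[str, str]


def get_by_ip(db: sqlite3.Connection, ip: str) -> Optional[Geo]:
    """Return the cached entry for an IP, or None if it was never resolved."""
    row = db.execute(
        "SELECT country, city FROM geo_cache WHERE ip = ?", (ip,)
    ).fetchone()
    return None if row is None else {"country": row[0], "city": row[1]}


def upsert(db: sqlite3.Connection, ip: str, geo: Geo) -> None:
    """Insert or overwrite the cached entry for an IP."""
    db.execute(
        "INSERT OR REPLACE INTO geo_cache (ip, country, city, looked_up_at) "
        "VALUES (?, ?, ?, ?)",
        (ip, geo["country"], geo["city"], time.time()),
    )
    db.commit()


def lookup(
    db: sqlite3.Connection,
    ip: str,
    fetch_remote: Callable[[str], Optional[Geo]],
) -> Optional[Geo]:
    """Check the database first; call the external service only on a miss."""
    cached = get_by_ip(db, ip)
    if cached is not None:
        return cached
    geo = fetch_remote(ip)  # placeholder for the external geo-IP call
    if geo is not None:
        upsert(db, ip, geo)  # write back immediately after a successful lookup
    return geo
```

A TTL policy, if desired, could be layered on by comparing looked_up_at against the current time inside get_by_ip.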

Task 2 — Fix geo_lookup_request_failed Warnings DONE

Goal: Investigate and fix the frequent geo_lookup_request_failed log warnings that occur with an empty error field.

Resolution: The root cause was str(exc) returning "" for aiohttp exceptions with no message (e.g. ServerDisconnectedError). Fixed by:

  • Replacing error=str(exc) with error=repr(exc) in both lookup() and _batch_api_call() so the exception class name is always present in the log.
  • Adding exc_type=type(exc).__name__ field to every network-error log event for easy filtering.
  • Moving import aiohttp from the TYPE_CHECKING block to a regular runtime import and replacing the raw-float timeout arguments with aiohttp.ClientTimeout(total=...), removing the # type: ignore[arg-type] workarounds.
  • Three new tests in TestErrorLogging verify empty-message exceptions are correctly captured.
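
The difference between str(exc) and repr(exc) for a message-less exception can be reproduced without aiohttp. The sketch below uses a stand-in exception class (ServerDisconnected is hypothetical and only mimics an exception raised with no arguments); the log_fields helper is likewise illustrative.

```python
from typing import Dict


class ServerDisconnected(Exception):
    """Stand-in for a network exception raised without a message."""


def log_fields(exc: BaseException) -> Dict[str, str]:
    """Build log fields that always identify the exception class."""
    return {"error": repr(exc), "exc_type": type(exc).__name__}


exc = ServerDisconnected()
print(repr(str(exc)))   # the empty string that produced the blank "error" field
print(log_fields(exc))  # repr() keeps the class name even with no message
```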

Observed behaviour (from container logs):

{"ip": "197.221.98.153", "error": "", "event": "geo_lookup_request_failed", ...}
{"ip": "197.231.178.38", "error": "", "event": "geo_lookup_request_failed", ...}
{"ip": "197.234.201.154", "error": "", "event": "geo_lookup_request_failed", ...}
{"ip": "197.234.206.108", "error": "", "event": "geo_lookup_request_failed", ...}

Details:

  • Open app/services/geo_service.py and trace the code path that emits the geo_lookup_request_failed event.
  • The error field is empty, which suggests the request may silently fail (e.g. the external service returns a non-200 status, an empty body, or the response parsing swallows the real error).
  • Ensure the actual HTTP status code and response body (or exception message) are captured and logged in the error field so failures are diagnosable.
  • Check whether the external geo-IP service has rate-limiting or IP-range restrictions that could explain the failures.
  • Add proper error handling: distinguish between transient errors (timeout, 429, 5xx) and permanent ones (invalid IP, 404) so retries can be applied only when appropriate.
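
One way to express the transient/permanent distinction is a small predicate over the HTTP status. This is a sketch under stated assumptions: the function name is hypothetical, and treating "no response at all" (timeout, dropped connection) as a status of None is a convention chosen here, not something the source prescribes.

```python
from typing import Optional


def is_transient(status: Optional[int]) -> bool:
    """Return True when a failed lookup is worth retrying.

    status is the HTTP status code, or None when no response was received
    (for example a timeout or a dropped connection).
    """
    if status is None:
        return True   # network-level failure: assume it may succeed later
    if status == 429:
        return True   # rate limited: retry after backing off
    if 500 <= status < 600:
        return True   # server-side error
    return False      # other statuses (e.g. 404) are treated as permanent
```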

Task 3 — Non-Blocking Web Requests & Bulk DB Operations DONE

Goal: Ensure the web UI remains responsive while geo-IP lookups and database writes are in progress.

Resolution:

  • Bulk DB writes: geo_service.lookup_batch now collects resolved IPs into pos_rows / neg_ips lists across the chunk loop and flushes them with two executemany calls per chunk instead of one execute per IP.
  • lookup_cached_only: New function that returns (geo_map, uncached) immediately from the in-memory + SQLite cache with no API calls. Used by bans_by_country for its hot path.
  • Background geo resolution: bans_by_country calls lookup_cached_only for an instant response, then fires asyncio.create_task(geo_service.lookup_batch(uncached, …)) to populate the cache in the background for subsequent requests.
  • Batch enrichment for get_active_bans: jail_service.get_active_bans now accepts http_session / app_db and resolves all banned IPs in a single lookup_batch call (chunked into 100-IP batches) instead of firing one coroutine per IP through asyncio.gather.
  • 12 new tests across test_geo_service.py, test_jail_service.py, and test_ban_service.py; ruff and mypy --strict clean; 145 tests pass.
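
The "two executemany calls per chunk" idea can be illustrated with sqlite3 as below. The table names, columns, and the flush_chunk helper are hypothetical; only the pos_rows / neg_ips naming comes from the description above.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical tables: one for resolved IPs, one for IPs that could not be resolved.
SCHEMA = """
CREATE TABLE IF NOT EXISTS geo_cache (ip TEXT PRIMARY KEY, country TEXT);
CREATE TABLE IF NOT EXISTS geo_negative (ip TEXT PRIMARY KEY);
"""


def flush_chunk(
    db: sqlite3.Connection,
    pos_rows: List[Tuple[str, str]],
    neg_ips: Iterable[str],
) -> None:
    """Write one chunk of results with two bulk statements instead of one per IP."""
    db.executemany(
        "INSERT INTO geo_cache (ip, country) VALUES (?, ?) "
        "ON CONFLICT(ip) DO UPDATE SET country = excluded.country",
        pos_rows,
    )
    db.executemany(
        "INSERT OR IGNORE INTO geo_negative (ip) VALUES (?)",
        [(ip,) for ip in neg_ips],
    )
    db.commit()
```

The ON CONFLICT clause requires SQLite 3.24 or newer; INSERT OR REPLACE is an alternative on older versions.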

Details:

  • After the geo-IP service was integrated, web UI requests became slow or appeared to hang because geo lookups and individual DB writes block the async event loop.
  • Bulk DB operations: When multiple IPs need geo data at once (e.g. loading the ban list), collect all uncached IPs and resolve them in a single batch. Use bulk INSERT … ON CONFLICT (or equivalent) to write results to the DB in one round-trip instead of one query per IP.
  • Non-blocking external calls: Make sure all HTTP calls to the external geo-IP service use an async HTTP client (httpx.AsyncClient or similar) so the event loop is never blocked by network I/O.
  • Non-blocking DB access: Ensure all database operations use the async SQLAlchemy session (or are off-loaded to a thread) so they do not block request handling.
  • Background processing: Consider moving bulk geo-lookups into a background task (e.g. the existing task infrastructure in app/tasks/) so the API endpoint returns immediately and the UI is updated once results are ready.

Task 4 — Better Jail Configuration DONE

Goal: Expose the full fail2ban configuration surface (jails, filters, actions) in the web UI.

Reference config directory: /home/lukas/Volume/repo/BanGUI/Docker/fail2ban-dev-config/fail2ban/

Implementation summary:

  • Backend: New app/models/file_config.py, app/services/file_config_service.py, and app/routers/file_config.py with full CRUD for jail.d/, filter.d/, action.d/ files. Path-traversal prevention via _assert_within() + _validate_new_name(). app/config.py extended with fail2ban_config_dir setting.
  • Backend (socket): Added delete_log_path() to config_service.py + DELETE /api/config/jails/{name}/logpath endpoint.
  • Docker: Both compose files updated with BANGUI_FAIL2BAN_CONFIG_DIR env var; volume mount changed from :ro to :rw.
  • Frontend: New Jail Files, Filters, Actions tabs in ConfigPage.tsx. Delete buttons for log paths in jail accordion. Full API call layer in api/config.ts + new types in types/config.ts.
  • Tests: 44 service unit tests + 19 router integration tests; all pass; ruff clean.
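
A path-traversal guard in the spirit of _assert_within() could be written as follows. This is an illustrative sketch, not the project's implementation: the function name, signature, and error type are assumptions.

```python
from pathlib import Path


def assert_within(base: Path, candidate: Path) -> Path:
    """Resolve candidate and raise if it escapes the base directory."""
    base_resolved = base.resolve()
    resolved = candidate.resolve()  # collapses ".." segments and follows symlinks
    if resolved != base_resolved and base_resolved not in resolved.parents:
        raise ValueError(f"{candidate} is outside {base}")
    return resolved
```

Resolving both paths before comparing them means a request such as filter.d/../../passwd is rejected even though its textual prefix looks harmless.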

Task 4c audit findings — options not yet exposed in the UI:

  • Per-jail: ignoreip, bantime.increment, bantime.rndtime, bantime.maxtime, bantime.factor, bantime.formula, bantime.multipliers, bantime.overalljails, ignorecommand, prefregex, timezone, journalmatch, usedns, backend (shown read-only), destemail, sender, action override
  • Global: allowipv6, before includes

4a — Activate / Deactivate Jail Configs DONE

  • Listed all .conf and .local files in jail.d/ via GET /api/config/jail-files.
  • Toggle enabled/disabled via PUT /api/config/jail-files/{filename}/enabled which patches the enabled = true/false line in the config file, preserving all comments.
  • Frontend: Jail Files tab with enabled Switch per file and read-only content viewer.
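
Patching only the enabled line while leaving comments untouched can be done with a targeted regular expression. The helper below is a hypothetical sketch; as a simplification it rewrites the first uncommented enabled line in the file and does not track which section that line belongs to.

```python
import re

# Matches an uncommented "enabled = <value>" line; commented lines start with "#".
_ENABLED_RE = re.compile(
    r"^([ \t]*enabled[ \t]*=[ \t]*)(\S+)(.*)$",
    re.MULTILINE | re.IGNORECASE,
)


def set_enabled(text: str, enabled: bool) -> str:
    """Return the config text with the first 'enabled' value replaced."""
    value = "true" if enabled else "false"
    new_text, count = _ENABLED_RE.subn(
        lambda m: f"{m.group(1)}{value}{m.group(3)}", text, count=1
    )
    if count == 0:
        raise ValueError("no 'enabled =' line found")
    return new_text
```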

4b — Editable Log Paths DONE

  • Added DELETE /api/config/jails/{name}/logpath?log_path=… endpoint (uses fail2ban socket set <jail> dellogpath).
  • Frontend: each log path in the Jails accordion now has a dismiss button to remove it.

4c — Audit Missing Config Options DONE

  • Audit findings documented above.

4d — Filter Configuration (filter.d) DONE

  • Listed all filter files via GET /api/config/filters.
  • View and edit individual filters via GET/PUT /api/config/filters/{name}.
  • Create new filter via POST /api/config/filters.
  • Frontend: Filters tab with accordion-per-file, editable textarea, save button, and create-new form.

4e — Action Configuration (action.d) DONE

  • Listed all action files via GET /api/config/actions.
  • View and edit individual actions via GET/PUT /api/config/actions/{name}.
  • Create new action via POST /api/config/actions.
  • Frontend: Actions tab with identical structure to Filters tab.

4f — Create New Configuration Files DONE

  • Create filter and action files via POST /api/config/filters and POST /api/config/actions with name validation (_SAFE_NAME_RE) and 512 KB content size limit.
  • Frontend: "New Filter/Action File" section at the bottom of each tab with name input, content textarea, and create button.
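
The name and size checks for new files might be sketched like this. The regular expression is a guess at what a "safe name" pattern could look like and is not the actual _SAFE_NAME_RE; only the 512 KB limit is taken from the description above.

```python
import re

# Hypothetical pattern: letters, digits, dot, underscore, hyphen; no path separators.
SAFE_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,127}")
MAX_CONTENT_BYTES = 512 * 1024  # 512 KB content size limit


def validate_new_file(name: str, content: str) -> None:
    """Raise ValueError if the file name or content is not acceptable."""
    if not SAFE_NAME_RE.fullmatch(name) or ".." in name:
        raise ValueError(f"unsafe file name: {name!r}")
    if len(content.encode("utf-8")) > MAX_CONTENT_BYTES:
        raise ValueError("content exceeds 512 KB")
```

Measuring the encoded byte length rather than the character count keeps the limit meaningful for non-ASCII content.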

Task 5 — Add Log Path to Jail (Config UI) DONE

Goal: Allow users to add new log file paths to an existing fail2ban jail directly from the Configuration → Jails tab, completing the "Add Log Observation" feature from Features.md § 6.3.

Implementation summary:

  • ConfigPage.tsx JailAccordionPanel:
    • Added addLogPath and AddLogPathRequest imports.
    • Added state: newLogPath, newLogPathTail (default true), addingLogPath.
    • Added handleAddLogPath callback: calls addLogPath(jail.name, { log_path, tail }), appends path to logPaths state, clears input, shows success/error feedback.
    • Added inline "Add Log Path" form below the existing log-path list — an Input for the file path, a Switch for tail/head selection, and an "Add" button with aria-label="Add log path".
  • 6 new frontend tests in src/components/__tests__/ConfigPageLogPath.test.tsx covering: rendering, disabled state, enabled state, successful add, success message, and API error surfacing.
  • tsc --noEmit, eslint: zero errors.

Task 6 — Expose Ban-Time Escalation Settings DONE

Goal: Surface fail2ban's incremental ban-time escalation settings in the web UI, as called out in Features.md § 5 (Jail Detail) and Features.md § 6 (Edit Configuration).

Features.md requirements:

  • §5 Jail Detail: "Shows ban-time escalation settings if incremental banning is enabled (factor, formula, multipliers, max time)."
  • §6 Edit Configuration: "Configure ban-time escalation: enable incremental banning and set factor, formula, multipliers, maximum ban time, and random jitter."

Tasks:

6a — Backend: Add BantimeEscalation model and extend jail + config models

  • Add BantimeEscalation Pydantic model with fields: increment (bool), factor (float|None), formula (str|None), multipliers (str|None), max_time (int|None), rnd_time (int|None), overall_jails (bool).
  • Add bantime_escalation: BantimeEscalation | None field to Jail in app/models/jail.py.
  • Add escalation fields to JailConfig in app/models/config.py (mirrored via BantimeEscalation).
  • Add escalation fields to JailConfigUpdate in app/models/config.py.

6b — Backend: Read escalation settings from fail2ban socket

  • In jail_service.get_jail_detail(): fetch the seven bantime.* socket commands in the existing asyncio.gather() block; populate bantime_escalation on the returned Jail.
  • In config_service.get_jail_config(): same gather pattern; populate bantime_escalation on JailConfig.

6c — Backend: Write escalation settings to fail2ban socket

  • In config_service.update_jail_config(): when JailConfigUpdate.bantime_escalation is provided, send set <jail> bantime.increment, followed by one set command for each sub-field that is not None.
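
The "skip fields that are None" rule can be sketched as a function that turns an update object into a list of socket commands. Everything here is illustrative: a dataclass stands in for the Pydantic model, the command-list shape and the "true"/"false" encoding are assumptions, and the bantime.* key names follow the option names listed in the Task 4c audit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BantimeEscalationUpdate:
    """Dataclass stand-in for the Pydantic update model."""
    increment: bool
    factor: Optional[float] = None
    formula: Optional[str] = None
    multipliers: Optional[str] = None
    max_time: Optional[int] = None
    rnd_time: Optional[int] = None
    overall_jails: Optional[bool] = None


# Model attribute -> fail2ban option name.
_FIELD_TO_KEY = (
    ("factor", "bantime.factor"),
    ("formula", "bantime.formula"),
    ("multipliers", "bantime.multipliers"),
    ("max_time", "bantime.maxtime"),
    ("rnd_time", "bantime.rndtime"),
    ("overall_jails", "bantime.overalljails"),
)


def _encode(value: object) -> str:
    """Encode a Python value as a command token (assumed encoding)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_set_commands(jail: str, esc: BantimeEscalationUpdate) -> List[List[str]]:
    """Always set bantime.increment, then only the sub-fields that are not None."""
    commands = [["set", jail, "bantime.increment", _encode(esc.increment)]]
    for attr, key in _FIELD_TO_KEY:
        value = getattr(esc, attr)
        if value is not None:
            commands.append(["set", jail, key, _encode(value)])
    return commands
```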

6d — Frontend: Update types

  • types/jail.ts: add BantimeEscalation interface; add bantime_escalation: BantimeEscalation | null to Jail.
  • types/config.ts: add bantime_escalation: BantimeEscalation | null to JailConfig; add BantimeEscalationUpdate and include it in JailConfigUpdate.

6e — Frontend: Show escalation in Jail Detail

  • In JailDetailPage.tsx, add a "Ban-time Escalation" info card that is only rendered when bantime_escalation?.increment === true.
  • Show: increment enabled indicator, factor, formula, multipliers, max time, random jitter.

6f — Frontend: Edit escalation in ConfigPage

  • In ConfigPage.tsx JailAccordionPanel, add a "Ban-time Escalation" section with:
    • A Switch for increment (enable/disable).
    • When enabled: numeric inputs for max_time (seconds), rnd_time (seconds), factor; text inputs for formula and multipliers; Switch for overall_jails.
    • Saving triggers updateJailConfig with the escalation payload.

6g — Tests

  • Backend: unit tests in test_config_service.py verifying that escalation fields are fetched and written.
  • Backend: router integration tests in test_config.py verifying the escalation round-trip.
  • Frontend: update ConfigPageLogPath.test.tsx mock JailConfig to include bantime_escalation: null.

Task 7 — Expose Remaining Per-Jail Config Fields (usedns, date_pattern, prefregex) DONE

Goal: Surface the three remaining per-jail configuration fields — DNS look-up mode (usedns), custom date pattern (datepattern), and prefix regex (prefregex) — in both the backend API response model and the Configuration → Jails UI, completing the editable jail config surface defined in Features.md § 6.

Implementation summary:

  • Backend model (app/models/config.py):
    • Added use_dns: str (default "warn") and prefregex: str (default "") to JailConfig.
    • Added prefregex: str | None to JailConfigUpdate (None = skip, "" = clear, non-empty = set).
  • Backend service (app/services/config_service.py):
    • Added get <jail> usedns and get <jail> prefregex to the asyncio.gather() block in get_jail_config().
    • Populated use_dns and prefregex on the returned JailConfig.
    • Added prefregex validation (regex compile-check) and set <jail> prefregex write in update_jail_config().
  • Frontend types (types/config.ts):
    • Added use_dns: string and prefregex: string to JailConfig.
    • Added prefregex?: string | null to JailConfigUpdate.
  • Frontend ConfigPage (ConfigPage.tsx JailAccordionPanel):
    • Added state and editable Input for date_pattern (hints "Leave blank for auto-detect").
    • Added state and Select dropdown for dns_mode with options yes / warn / no / raw.
    • Added state and editable Input for prefregex (hints "Leave blank to disable").
    • All three included in handleSave() update payload.
  • Tests: 8 new service unit tests + 3 new router integration tests; ConfigPageLogPath.test.tsx mock updated; 628 tests pass; 85% coverage; ruff + mypy + tsc + eslint clean.

Background: Task 4c audit found several options not yet exposed in the UI. Task 6 covered ban-time escalation. This task covers the three remaining fields that are most commonly changed through the fail2ban configuration:

  • usedns — controls whether fail2ban resolves hostnames ("yes" / "warn" / "no" / "raw").
  • datepattern — custom date format for log parsing; empty / unset means fail2ban auto-detects.
  • prefregex — a prefix regex prepended to every failregex for pre-filtering log lines; empty means disabled.

Tasks:

7a — Backend: Add use_dns and prefregex to JailConfig model

  • Add use_dns: str field to JailConfig in app/models/config.py (default "warn").
  • Add prefregex: str field to JailConfig (default ""; empty string means not set).
  • Add prefregex: str | None to JailConfigUpdate (None = skip, "" = clear, non-empty = set).

7b — Backend: Read usedns and prefregex from fail2ban socket

  • In config_service.get_jail_config(): add get <jail> usedns and get <jail> prefregex to the existing asyncio.gather() block.
  • Populate use_dns and prefregex on the returned JailConfig.

7c — Backend: Write prefregex to fail2ban socket

  • In config_service.update_jail_config(): validate prefregex with _validate_regex if non-empty, then set <jail> prefregex <value> when JailConfigUpdate.prefregex is not None.
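
The three-state handling of prefregex (None = skip, "" = clear, non-empty = validate and set) could be sketched as below. The helper name and the returned command shape are hypothetical, and compiling with Python's re module is only an approximation of what fail2ban itself will accept.

```python
import re
from typing import List, Optional


def prefregex_command(jail: str, prefregex: Optional[str]) -> Optional[List[str]]:
    """Return the socket command for a prefregex update, or None to skip it."""
    if prefregex is None:
        return None  # field not provided: leave the jail untouched
    if prefregex:
        try:
            re.compile(prefregex)  # compile-check only for non-empty values
        except re.error as exc:
            raise ValueError(f"invalid prefregex: {exc}") from exc
    # An empty string falls through and clears the setting.
    return ["set", jail, "prefregex", prefregex]
```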

7d — Frontend: Update types

  • types/config.ts: add use_dns: string and prefregex: string to JailConfig.
  • types/config.ts: add prefregex?: string | null to JailConfigUpdate.

7e — Frontend: Edit date_pattern, use_dns, and prefregex in ConfigPage

  • In ConfigPage.tsx JailAccordionPanel, add:
    • Text input for date_pattern (empty = auto-detect; non-empty value is sent as-is).
    • Select dropdown for use_dns with options "yes" / "warn" / "no" / "raw".
    • Text input for prefregex (empty = not set / cleared).
    • All three are included in the handleSave() update payload.

7f — Tests

  • Backend: add usedns and prefregex entries to _DEFAULT_JAIL_RESPONSES in test_config_service.py.
  • Backend: add unit tests verifying new fields are fetched and prefregex is written via update_jail_config().
  • Backend: update _make_jail_config() in test_config.py to include use_dns and prefregex.
  • Backend: add router integration tests for the new update fields.
  • Frontend: update ConfigPageLogPath.test.tsx mock JailConfig to include use_dns and prefregex.

Task 8 — Improve Test Coverage for Background Tasks and Utilities DONE

Goal: Raise test coverage for the background-task modules and the fail2ban client utility to ≥ 80 %, closing the critical-path gap flagged in the Step 6.2 review.

Coverage before this task (from last full run):

| Module | Before |
| --- | --- |
| app/tasks/blocklist_import.py | 23 % |
| app/tasks/health_check.py | 43 % |
| app/tasks/geo_cache_flush.py | 60 % |
| app/utils/fail2ban_client.py | 58 % |

8a — Tests for blocklist_import

Create tests/test_tasks/test_blocklist_import.py:

  • _run_import happy path: mock blocklist_service.import_all, verify structured log emitted.
  • _run_import exception path: simulate unexpected exception, verify log.exception called.
  • _apply_schedule hourly: mock scheduler, verify add_job called with "interval" trigger and correct hours.
  • _apply_schedule daily: verify "cron" trigger with hour and minute.
  • _apply_schedule weekly: verify "cron" trigger with day_of_week, hour, minute.
  • _apply_schedule replaces an existing job: confirm remove_job called first when job already exists.
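
The scheduler-mocking approach used in these tests can be illustrated with unittest.mock. The apply_schedule function below is a simplified, hypothetical stand-in for the real _apply_schedule (its parameters and job ID are assumptions); the point is how a MagicMock scheduler lets a test assert on the add_job and remove_job calls.

```python
from typing import Callable
from unittest.mock import MagicMock

JOB_ID = "blocklist_import"  # hypothetical stable job ID


def apply_schedule(scheduler, job: Callable[[], None], frequency: str,
                   *, hours: int = 1, hour: int = 0, minute: int = 0,
                   day_of_week: int = 0) -> None:
    """Replace any existing job, then register one matching the frequency."""
    if scheduler.get_job(JOB_ID) is not None:
        scheduler.remove_job(JOB_ID)
    if frequency == "hourly":
        scheduler.add_job(job, "interval", hours=hours, id=JOB_ID)
    elif frequency == "daily":
        scheduler.add_job(job, "cron", hour=hour, minute=minute, id=JOB_ID)
    elif frequency == "weekly":
        scheduler.add_job(job, "cron", day_of_week=day_of_week,
                          hour=hour, minute=minute, id=JOB_ID)
    else:
        raise ValueError(f"unknown frequency: {frequency}")


def test_hourly_uses_interval_trigger() -> None:
    scheduler = MagicMock()
    scheduler.get_job.return_value = None  # no job registered yet
    job = MagicMock()
    apply_schedule(scheduler, job, "hourly", hours=6)
    scheduler.remove_job.assert_not_called()
    scheduler.add_job.assert_called_once_with(job, "interval", hours=6, id=JOB_ID)


def test_existing_job_is_replaced() -> None:
    scheduler = MagicMock()
    scheduler.get_job.return_value = object()  # pretend a job already exists
    apply_schedule(scheduler, MagicMock(), "daily", hour=3, minute=30)
    scheduler.remove_job.assert_called_once_with(JOB_ID)
```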

8b — Tests for health_check

Create tests/test_tasks/test_health_check.py:

  • _run_probe online status: verify app.state.server_status is updated correctly.
  • _run_probe offline→online transition: verify "fail2ban_came_online" log event.
  • _run_probe online→offline transition: verify "fail2ban_went_offline" log event.
  • _run_probe stable online (no transition): verify no transition log events.
  • register: verify add_job is called with "interval" trigger and initial offline status set.
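
The transition cases covered by these tests reduce to a comparison of the previous and current status. The helper below is a hypothetical sketch of that decision, using the two event names quoted above; the real _run_probe presumably does more (probing the socket, updating app.state).

```python
from typing import Optional


def transition_event(was_online: bool, is_online: bool) -> Optional[str]:
    """Return the log event for a status change, or None when nothing changed."""
    if is_online and not was_online:
        return "fail2ban_came_online"
    if was_online and not is_online:
        return "fail2ban_went_offline"
    return None  # stable online or stable offline: no transition event
```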

8c — Tests for geo_cache_flush

Create tests/test_tasks/test_geo_cache_flush.py:

  • _run_flush with dirty IPs: verify geo_service.flush_dirty is called and debug log emitted when count > 0.
  • _run_flush with nothing: verify flush_dirty called but no debug log.
  • register: verify add_job called with correct interval and stable job ID.

8d — Extended tests for fail2ban_client

Extend tests/test_services/test_fail2ban_client.py:

  • send() success path: mock run_in_executor, verify response is returned and debug log emitted.
  • send() Fail2BanConnectionError: verify exception is re-raised and warning log emitted.
  • send() Fail2BanProtocolError: verify exception is re-raised and error log emitted.
  • _send_command_sync connection closed mid-stream (empty chunk): verify Fail2BanConnectionError.
  • _send_command_sync pickle parse error (bad bytes in response): verify Fail2BanProtocolError.
  • _coerce_command_token for str, bool, int, float, list, dict, set, and a custom object (stringified).
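
A token-coercion helper matching the cases listed in the last bullet might behave as sketched here. This is an assumption about the behaviour, not the actual _coerce_command_token: primitives pass through unchanged, containers are coerced element by element, and anything else is stringified.

```python
from typing import Any


def coerce_command_token(token: Any) -> Any:
    """Reduce a command token to primitives and plain containers."""
    if isinstance(token, (str, bool, int, float)):
        return token
    if isinstance(token, list):
        return [coerce_command_token(item) for item in token]
    if isinstance(token, set):
        return {coerce_command_token(item) for item in token}
    if isinstance(token, dict):
        return {
            coerce_command_token(key): coerce_command_token(value)
            for key, value in token.items()
        }
    return str(token)  # custom objects are sent as their string form
```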

Result: 50 new tests added (678 total). Coverage after:

| Module | Before | After |
| --- | --- | --- |
| app/tasks/blocklist_import.py | 23 % | 96 % |
| app/tasks/health_check.py | 43 % | 100 % |
| app/tasks/geo_cache_flush.py | 60 % | 100 % |
| app/utils/fail2ban_client.py | 58 % | 96 % |

Overall backend coverage: 85 % → 87 %. ruff, mypy --strict, tsc, and eslint all clean.