BanGUI/Docs/Tasks.md
Lukas 5d24780c63 TASK-028: Add exception logging to fire-and-forget asyncio.create_task()
- Create logged_task() helper in backend/app/utils/async_utils.py to wrap
  fire-and-forget coroutines with exception logging
- Ensures unhandled task exceptions are always logged to structlog instead of
  silently discarded (Python 3.11+ RuntimeWarning)
- Update ban_service.py to use logged_task() for geo_cache.lookup_batch()
  background resolution
- Add comprehensive tests for logged_task() in test_async_utils.py
- Document fire-and-forget task conventions in Backend-Development.md

The logged_task() wrapper catches any exception raised in a background task,
logs it with full traceback context and task name, and never re-raises.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 15:17:30 +02:00


TASK-028 — Fire-and-forget asyncio.create_task() silently discards exceptions

Severity: Low

Where found

backend/app/services/ban_service.py line ~614:

asyncio.create_task(  # noqa: RUF006
    geo_cache.lookup_batch(uncached, http_session, db=app_db),
    name="geo_bans_by_country",
)

Why this is needed

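The task reference is immediately discarded. Any exception raised inside geo_cache.lookup_batch() (network errors, aiohttp timeouts, DB write failures) becomes an unhandled task exception. asyncio reports such an exception only when the task object is garbage-collected, as a "Task exception was never retrieved" message passed to the event loop's exception handler, which by default writes to the standard-library asyncio logger rather than to structlog. Unless that logger is routed into the structured logs, errors in background geo resolution are invisible there. Discarding the reference also leaves the event loop holding only a weak reference to the task, so the task can be garbage-collected before it finishes.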

Goal

Ensure exceptions in fire-and-forget tasks are always logged.

What to do

  1. Wrap the task body in a logging wrapper:
    async def _logged_task(coro: Coroutine[Any, Any, Any], name: str) -> None:
        try:
            await coro
        except Exception:
            log.exception("background_task_failed", task_name=name)
    
    asyncio.create_task(_logged_task(geo_cache.lookup_batch(...), "geo_bans_by_country"))
    
  2. Extract _logged_task into backend/app/utils/async_utils.py as a reusable helper so the same pattern is used for all fire-and-forget tasks.

Possible traps and issues

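  • The wrapper must not re-raise the exception, only log it. Catch Exception rather than BaseException so that asyncio.CancelledError still propagates during shutdown.
  • log.exception() called inside the except block picks up the active exception; structlog renders the traceback provided the configured processors format exc_info.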

Docs changes needed

  • Backend-Development.md — fire-and-forget task conventions.

Doc references


TASK-029 — Fail2BanConnectionError leaks socket path in HTTP error responses

Severity: Medium

Where found

backend/app/exceptions.py: Fail2BanConnectionError.__init__() formats the message as f"{message} (socket: {socket_path})". backend/app/main.py: _fail2ban_connection_handler() returns {"detail": f"Cannot reach fail2ban: {exc}"} verbatim.

Why this is needed

Every 502 response caused by fail2ban being unreachable includes the full socket path (e.g., Cannot reach fail2ban: [Errno 2] No such file or directory (socket: /var/run/fail2ban/fail2ban.sock)) in the JSON error body. This discloses internal infrastructure details to unauthenticated users who can trigger the error. Similarly, _fail2ban_protocol_handler includes raw exception details that may expose internal parsing logic.

Goal

Return generic, user-friendly error messages in HTTP responses. Log full details server-side only.

What to do

  1. In _fail2ban_connection_handler(), replace:
    content={"detail": f"Cannot reach fail2ban: {exc}"}
    
    with:
    content={"detail": "Cannot reach the fail2ban service. Check the server status page."}
    
  2. In _fail2ban_protocol_handler(), similarly return a generic message.
  3. Both handlers already log error=str(exc) server-side — this is correct and should remain.
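
The split between what is logged and what is returned can be sketched without the web framework. The helper name connection_error_body and the constant GENERIC_DETAIL are hypothetical, and standard-library logging again stands in for structlog; the real handlers would wrap the returned dict in their 502 JSON response.

```python
import logging

log = logging.getLogger(__name__)  # stand-in for the project's structlog logger

GENERIC_DETAIL = "Cannot reach the fail2ban service. Check the server status page."


def connection_error_body(exc: Exception) -> dict[str, str]:
    """Log the full exception server-side; return a body with no internals."""
    log.warning("fail2ban_connection_error error=%s", exc)  # full detail stays in logs
    return {"detail": GENERIC_DETAIL}  # nothing derived from exc reaches the client
```

Keeping the message in one constant also gives tests a single value to assert against instead of a copied string.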

Possible traps and issues

  • Update any tests that assert the exact detail string in 502 responses.
  • If the frontend displays this error message directly to the user, ensure it still makes sense after genericizing.

Docs changes needed

  • Backend-Development.md — error message hygiene (no internal paths/details in responses).

Doc references


TASK-030 — ip-api.com geo lookups use plain HTTP — IP addresses sent unencrypted

Severity: Medium

Where found

backend/app/services/geo_cache.py lines ~41-46:

_API_URL = "http://ip-api.com/json/{ip}?fields=..."
_BATCH_API_URL = "http://ip-api.com/batch?fields=..."

Why this is needed

All banned and monitored IP addresses are transmitted to ip-api.com in cleartext over HTTP. These are potentially sensitive data (PII under GDPR/CCPA — IP addresses identify users). Any network path between the BanGUI server and ip-api.com's servers can observe or modify the traffic. Forged responses would corrupt the geo database silently.

Goal

Use encrypted transport for all geo API calls, or switch to a local resolver.

What to do

ip-api.com's free tier does not support HTTPS. The recommended approach:

  1. Promote the existing geoip_db_path setting (MaxMind GeoLite2-Country MMDB) to the primary resolver.
  2. Use ip-api.com as a secondary fallback only when the MMDB is unavailable or returns no result.
  3. Add documentation and compose file examples for downloading and mounting the GeoLite2 MMDB.
  4. If ip-api.com HTTP is retained as a fallback, add a config flag BANGUI_GEOIP_ALLOW_HTTP_FALLBACK (default false) and warn clearly at startup when enabled.
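
The resolver ordering described in steps 1, 2 and 4 might look roughly like this. It is a sketch under assumptions: resolve_country, the Resolver alias and the two lookup callables are hypothetical stand-ins for the MMDB reader and the ip-api.com client, and allow_http_fallback mirrors the proposed BANGUI_GEOIP_ALLOW_HTTP_FALLBACK flag.

```python
from collections.abc import Callable
from typing import Optional

Resolver = Callable[[str], Optional[str]]  # IP -> country code, or None


def resolve_country(
    ip: str,
    mmdb_lookup: Optional[Resolver],
    http_lookup: Resolver,
    allow_http_fallback: bool = False,
) -> Optional[str]:
    """Prefer the local MMDB; use plain-HTTP ip-api.com only if explicitly allowed."""
    if mmdb_lookup is not None:  # None models "MMDB not configured or unavailable"
        country = mmdb_lookup(ip)
        if country is not None:
            return country
    if allow_http_fallback:
        return http_lookup(ip)  # sends the IP in cleartext
    return None
```

With the flag at its default of false, no IP address leaves the host even when the MMDB has no answer; the caller simply receives None.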

Possible traps and issues

  • The MaxMind GeoLite2 database requires a free account and a license key to download — document the setup process.
  • The GeoLite2-Country MMDB does not include ASN or organisation data — these fields will be null when using the local resolver. The GeoInfo model must handle nullable asn and org.

Docs changes needed

  • Features.md — document the geo resolution mechanism and MMDB setup.
  • Architekture.md — update the external API dependency section.
  • Backend-Development.md — configuration for geoip_db_path.

Doc references


TASK-031 — bcrypt 72-byte truncation not enforced — long passwords silently equivalent to their prefix

Severity: Medium

Where found

backend/app/models/auth.py: LoginRequest.password: str = Field(...) (no max_length). backend/app/models/setup.py: SetupRequest.master_password has min_length=8 but no max_length.

Why this is needed

bcrypt only uses the first 72 bytes of its input, and many implementations truncate longer input silently. A user who sets a 100-character ASCII password can then be authenticated by supplying only its first 72 characters, so the extra characters add no security. Because the limit is in bytes, a password containing multi-byte UTF-8 characters reaches it in fewer than 72 characters. The user believes the full password protects the account, while an attacker only has to match its 72-byte prefix.

Goal

Enforce a maximum password length of 72 bytes, or pre-hash before bcrypt to remove the limit entirely.

What to do

Option A (simple):

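  1. Add max_length=72 to SetupRequest.master_password and LoginRequest.password. Pydantic's max_length counts characters, not bytes, so also validate that the UTF-8 encoding of the password is at most 72 bytes.
  2. Update the setup wizard UI to reflect the 72-byte maximum (72 characters for ASCII-only passwords).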
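
A character limit and a byte limit differ as soon as a password contains non-ASCII characters. The check below is a minimal sketch of the byte-based rule for Option A; the function name check_password_length and the constant are hypothetical, and in the real models it would be wired in as a Pydantic validator on both password fields.

```python
BCRYPT_MAX_BYTES = 72


def check_password_length(password: str) -> str:
    """Reject passwords whose UTF-8 encoding exceeds bcrypt's 72-byte input."""
    n = len(password.encode("utf-8"))
    if n > BCRYPT_MAX_BYTES:
        raise ValueError(f"password is {n} bytes; maximum is {BCRYPT_MAX_BYTES}")
    return password
```

For example, 72 repetitions of "é" pass a 72-character limit but encode to 144 bytes, so this check rejects them.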

Option B (removes the 72-byte limit entirely):

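  1. Pre-hash the password with HMAC-SHA256 using the session_secret as the key, then base64-encode the digest before passing it to bcrypt (a raw digest can contain NUL bytes, which some bcrypt implementations treat as the end of the input):
    digest = hmac.new(secret.encode(), password.encode(), hashlib.sha256).digest()
    pre_hashed = base64.b64encode(digest)  # 44 ASCII bytes, well under the 72-byte limit
    bcrypt.hashpw(pre_hashed, bcrypt.gensalt())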
    
  2. Apply consistently in both run_setup() and _check_password().
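
The pre-hash step for Option B can be isolated in one function so that run_setup() and _check_password() cannot drift apart. This sketch uses only the standard library and stops before the bcrypt call; the name prehash_password is hypothetical. Base64-encoding the digest is a common precaution so that the bytes handed to bcrypt never contain NUL.

```python
import base64
import hashlib
import hmac


def prehash_password(password: str, secret: str) -> bytes:
    """HMAC-SHA256 the password, then base64-encode so the result has no NUL bytes."""
    digest = hmac.new(
        secret.encode("utf-8"), password.encode("utf-8"), hashlib.sha256
    ).digest()
    return base64.b64encode(digest)  # always 44 ASCII bytes for a 32-byte digest
```

The output length is fixed regardless of password length, so two passwords that share their first 72 bytes no longer collapse to the same bcrypt input.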

Option A is recommended as the simpler, lower-risk fix. Option B is architecturally cleaner but requires a stored hash migration.

Possible traps and issues

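  • Option A: Users whose existing password exceeds 72 bytes will be rejected by validation at login. Where bcrypt truncated the password at setup, they can usually still log in with its first 72 bytes; otherwise they need to reset. For a single-admin app this is acceptable.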
  • Option B: If the session_secret changes, all stored password hashes become invalid (since the pre-hash key changes). This is a hidden coupling — document it explicitly.

Docs changes needed

  • Features.md — document the password length constraint.
  • Backend-Development.md — bcrypt usage notes.

Doc references


TASK-032 — geo_cache table grows unboundedly — no eviction or purge

Severity: Medium

Where found

backend/app/repositories/geo_cache_repo.py has upsert_entry, bulk_upsert_entries, and upsert_neg_entry, but no DELETE functions. backend/app/db.py: the geo_cache table has no last_seen or created_at column.

Why this is needed

Every unique IP address ever seen by fail2ban gets a row in geo_cache. The table is never trimmed. A BanGUI instance monitoring a busy server can accumulate millions of rows over months, increasing the DB file size and degrading query performance on every geo lookup.

Goal

Implement a retention policy that prunes geo cache entries not referenced recently.

What to do

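  1. Add a migration (_MIGRATIONS[2]) that adds a last_seen TEXT column to geo_cache and backfills existing rows with UPDATE geo_cache SET last_seen = CURRENT_TIMESTAMP. SQLite's ALTER TABLE ... ADD COLUMN rejects a non-constant default such as CURRENT_TIMESTAMP, so a NOT NULL DEFAULT CURRENT_TIMESTAMP column cannot be added in place; enforcing that constraint would require rebuilding the table.
  2. Update upsert_entry and bulk_upsert_entries to set last_seen = CURRENT_TIMESTAMP on every upsert.
  3. Add delete_stale_entries(db: aiosqlite.Connection, cutoff_iso: str) -> int to geo_cache_repo.py. The cutoff string must use the same layout as the stored values (CURRENT_TIMESTAMP yields UTC as YYYY-MM-DD HH:MM:SS), because the comparison is textual.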
  4. Create backend/app/tasks/geo_cache_cleanup.py — a nightly task that calls delete_stale_entries with a 90-day cutoff.
  5. Register the task in startup_shared_resources.
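
The migration and purge can be sketched with the standard-library sqlite3 module; the real code would use aiosqlite and the project's migration list. The constant names, cutoff_for, and the assumption that geo_cache is keyed by ip are all illustrative. The column is added without a default and backfilled, since SQLite does not accept CURRENT_TIMESTAMP as a default in ALTER TABLE ... ADD COLUMN.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical migration body: nullable column first, then backfill existing rows.
MIGRATION_ADD_LAST_SEEN = """
ALTER TABLE geo_cache ADD COLUMN last_seen TEXT;
UPDATE geo_cache SET last_seen = CURRENT_TIMESTAMP WHERE last_seen IS NULL;
"""

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # same layout as SQLite's CURRENT_TIMESTAMP (UTC)


def delete_stale_entries(db: sqlite3.Connection, cutoff: str) -> int:
    """Delete rows last seen before `cutoff`; return the number removed."""
    cur = db.execute("DELETE FROM geo_cache WHERE last_seen < ?", (cutoff,))
    db.commit()
    return cur.rowcount


def cutoff_for(days: int, now: datetime) -> str:
    """Format `now - days` so it compares correctly with stored timestamps."""
    return (now.astimezone(timezone.utc) - timedelta(days=days)).strftime(TS_FORMAT)
```

A nightly task would then call delete_stale_entries(db, cutoff_for(90, datetime.now(timezone.utc))). Passing datetime.isoformat() output instead would insert a "T" separator and break the textual comparison for timestamps on the cutoff date.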

Possible traps and issues

  • Adding a column requires a migration. Coordinate with TASK-023 (migration atomicity) and TASK-022 (session hash migration) — all three migrations must be sequenced correctly as _MIGRATIONS[2], [3], etc.
  • IPs that have not been seen in 90 days will lose their geo data — on their next appearance they will be re-resolved from ip-api.com or the MMDB. This is acceptable.

Docs changes needed

  • Architekture.md — update the geo_cache table description and add the cleanup task.
  • Backend-Development.md — document the geo cache retention policy.

Doc references


Severity: Medium

Where found

backend/app/routers/auth.py: login() returns LoginResponse(token=signed_token, expires_at=expires_at) in the JSON body and sets the HttpOnly cookie. backend/app/models/auth.py: the LoginResponse.token field.

Why this is needed

The LoginResponse JSON body contains the full signed session token. JavaScript running on the page (including third-party analytics scripts or a future XSS injection) can read the response body from a fetch() call and store the token in localStorage or a non-HttpOnly cookie. The Bearer-header authentication path (Authorization: Bearer <token>) then allows using that extracted token, completely bypassing the protections provided by the HttpOnly cookie.

Goal

Prevent the session token from being accessible to JavaScript when using cookie-based authentication.

What to do

  1. For browser SPA consumers: Remove the token field from LoginResponse. The HttpOnly cookie is the only token the browser needs.
  2. If an API-first (non-browser) token flow is required, create a separate endpoint POST /api/auth/token that returns a token in the body and does not set a cookie. Document this endpoint as "for programmatic API clients only, not for browser use".
  3. Update the frontend — verify that AuthProvider does not use response.token (confirmed: it currently does not).
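
The intended shape of step 1 is that the token travels only in the cookie while the JSON body keeps expires_at. The following framework-free sketch illustrates this with a dataclass and the standard-library cookie builder; the cookie name "session", the helper name and the chosen attributes are assumptions, not the project's actual settings.

```python
from dataclasses import asdict, dataclass
from http.cookies import SimpleCookie


@dataclass
class LoginResponse:
    expires_at: str  # no `token` field: the cookie is the only credential carrier


def session_cookie_header(signed_token: str, max_age: int) -> str:
    """Build a Set-Cookie value that page JavaScript cannot read."""
    cookie = SimpleCookie()
    cookie["session"] = signed_token  # cookie name "session" is hypothetical
    cookie["session"]["httponly"] = True
    cookie["session"]["max-age"] = max_age
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()
```

The frontend can still read expires_at from the body to schedule a re-login prompt, while the signed token itself never appears in anything a fetch() caller can inspect.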

Possible traps and issues

  • Any existing API client that relies on the token in the LoginResponse body will break. Check tests.
  • The expires_at field in LoginResponse is useful for the frontend to know when to prompt for re-login — this can remain.
  • The Bearer-token path in require_auth (Authorization: Bearer) remains functional for programmatic clients using the dedicated token endpoint.

Docs changes needed

  • Features.md — document the authentication flow (cookie for browser, token endpoint for API clients).
  • Backend-Development.md — authentication endpoint design.
  • Web-Development.md — document that the frontend uses only the HttpOnly cookie.

Doc references