BanGUI/Docs/Tasks.md
Lukas e2560f5db0 TASK-032: Implement geo_cache retention policy and cleanup
Add automatic cleanup of stale geolocation cache entries to prevent
unbounded database growth. Resolves the issue where unique IP addresses
accumulated indefinitely in the geo_cache table, degrading query performance.

## Changes

### Database Schema (Migration 3)
- Add 'last_seen' column to geo_cache table tracking last reference time
- Existing entries default to current timestamp

### Repository Layer (geo_cache_repo.py)
- Update upsert_entry() to set/refresh last_seen on insert/update
- Update upsert_neg_entry() to set/refresh last_seen on negative cache hits
- Update bulk_upsert_entries() to set/refresh last_seen in batch operations
- Add delete_stale_entries(db, cutoff_iso) -> int for purging old entries

### Background Task (geo_cache_cleanup.py)
- New APScheduler task that runs nightly (24-hour interval)
- Calculates cutoff as 90 days ago from current time (UTC)
- Deletes all entries with last_seen older than cutoff
- Logs operation results (info when deleted > 0, debug when 0 deleted)
- Configurable retention period via GEO_CACHE_RETENTION_DAYS constant

### Application Startup (startup.py)
- Register geo_cache_cleanup task in scheduler during app startup
- Placed after geo_cache_flush in task registration order

### Tests
- Add delete_stale_entries test cases covering:
  * Removal of old entries beyond cutoff
  * No deletion when all entries are recent
  * Empty table edge case
- Update existing test fixtures to include last_seen column
- Add full test suite for cleanup task registration and execution

### Documentation
- Architekture.md: Document cleanup task, update schema/diagram
- Backend-Development.md: Add retention policy documentation

## Behavior

When an IP is accessed, its last_seen is refreshed. After 90 days of no
access, an IP is purged by the nightly cleanup. On next encounter, the IP
is re-resolved from MaxMind MMDB or ip-api.com (if configured).

This is acceptable because:
1. Stale geolocation data may become inaccurate over time
2. Re-resolution cost is minimal compared to unbounded storage growth
3. Active IPs maintain fresh data through their last_seen updates

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 19:24:34 +02:00


TASK-032 — geo_cache table grows unboundedly — no eviction or purge

Severity: Medium

Where found

backend/app/repositories/geo_cache_repo.py: has upsert_entry, bulk_upsert_entries, and upsert_neg_entry, but no DELETE functions. backend/app/db.py: the geo_cache table has no last_seen or created_at column.

Why this is needed

Every unique IP address ever seen by fail2ban gets a row in geo_cache. The table is never trimmed. A BanGUI instance monitoring a busy server can accumulate millions of rows over months, increasing the DB file size and degrading query performance on every geo lookup.

Goal

Implement a retention policy that prunes geo cache entries not referenced recently.

What to do

  1. Add a migration (_MIGRATIONS[2]) that adds a last_seen TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP column to geo_cache.
  2. Update upsert_entry and bulk_upsert_entries to set last_seen = CURRENT_TIMESTAMP on every upsert.
  3. Add delete_stale_entries(db: aiosqlite.Connection, cutoff_iso: str) -> int to geo_cache_repo.py.
  4. Create backend/app/tasks/geo_cache_cleanup.py — a nightly task that calls delete_stale_entries with a 90-day cutoff.
  5. Register the task in startup_shared_resources.
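Steps 1 to 3 can be sketched as follows. This is a minimal, synchronous illustration using the standard-library sqlite3 module rather than aiosqlite, and the geo_cache columns other than last_seen (ip, country_code) are assumptions made for the example, not the project's actual schema. One detail worth checking against the real migration: SQLite rejects ALTER TABLE ... ADD COLUMN when the default is CURRENT_TIMESTAMP, so the sketch adds the column with a constant default and backfills existing rows explicitly. It also writes timestamps from Python in one ISO-8601 UTC format so that string comparison against cutoff_iso matches chronological order.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def utc_iso(dt: datetime) -> str:
    """Render a datetime as a UTC ISO-8601 string (same offset, so strings sort by time)."""
    return dt.astimezone(timezone.utc).isoformat(timespec="seconds")


def migrate_add_last_seen(db: sqlite3.Connection, now_iso: str) -> None:
    """Add last_seen and backfill existing rows with the given timestamp."""
    # Constant default plus explicit backfill, because SQLite does not accept
    # CURRENT_TIMESTAMP as the default of a column added via ALTER TABLE.
    db.execute("ALTER TABLE geo_cache ADD COLUMN last_seen TEXT NOT NULL DEFAULT ''")
    db.execute("UPDATE geo_cache SET last_seen = ?", (now_iso,))
    db.commit()


def upsert_entry(db: sqlite3.Connection, ip: str, country_code: str, now_iso: str) -> None:
    """Insert or update one entry and refresh last_seen in both cases."""
    db.execute(
        "INSERT INTO geo_cache (ip, country_code, last_seen) VALUES (?, ?, ?) "
        "ON CONFLICT(ip) DO UPDATE SET "
        "country_code = excluded.country_code, last_seen = excluded.last_seen",
        (ip, country_code, now_iso),
    )
    db.commit()


def delete_stale_entries(db: sqlite3.Connection, cutoff_iso: str) -> int:
    """Delete entries whose last_seen is older than cutoff_iso; return the row count."""
    cur = db.execute("DELETE FROM geo_cache WHERE last_seen < ?", (cutoff_iso,))
    db.commit()
    return cur.rowcount


# Demo on an in-memory database with a hypothetical pre-migration schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE geo_cache (ip TEXT PRIMARY KEY, country_code TEXT)")
db.execute("INSERT INTO geo_cache (ip, country_code) VALUES ('192.0.2.1', 'DE')")

now = datetime(2026, 4, 26, 17, 0, tzinfo=timezone.utc)
# Pretend the migration ran 120 days ago, so the existing row is already stale.
migrate_add_last_seen(db, utc_iso(now - timedelta(days=120)))
upsert_entry(db, "198.51.100.7", "US", utc_iso(now))

deleted = delete_stale_entries(db, utc_iso(now - timedelta(days=90)))
remaining = [row[0] for row in db.execute("SELECT ip FROM geo_cache ORDER BY ip")]
print(deleted, remaining)
```

If the real code keeps CURRENT_TIMESTAMP in its upserts, the stored values use a space separator ("YYYY-MM-DD HH:MM:SS") while an isoformat() cutoff uses "T", so comparisons can be off for entries dated exactly on the cutoff day. That is negligible for a 90-day window but easy to avoid by using one format throughout.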

Possible traps and issues

  • Adding a column requires a migration. Coordinate with TASK-023 (migration atomicity) and TASK-022 (session hash migration) — all three migrations must be sequenced correctly as _MIGRATIONS[2], [3], etc.
  • IPs that have not been seen in 90 days will lose their geo data — on their next appearance they will be re-resolved from ip-api.com or the MMDB. This is acceptable.
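The nightly task from step 4 might look roughly like the sketch below. It is synchronous and takes the delete function as a parameter so it can run without a database; the function names run_cleanup and compute_cutoff_iso are hypothetical, and the real task would presumably be async and call delete_stale_entries on an aiosqlite connection. The logging split (info when rows were deleted, debug otherwise) follows the commit message.

```python
import logging
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

log = logging.getLogger("geo_cache_cleanup")

GEO_CACHE_RETENTION_DAYS = 90  # retention window named in the commit message


def compute_cutoff_iso(now: datetime, retention_days: int = GEO_CACHE_RETENTION_DAYS) -> str:
    """Return the UTC ISO-8601 timestamp that lies retention_days before now."""
    cutoff = now.astimezone(timezone.utc) - timedelta(days=retention_days)
    return cutoff.isoformat(timespec="seconds")


def run_cleanup(delete_stale: Callable[[str], int], now: Optional[datetime] = None) -> int:
    """Compute the cutoff, delete stale entries, and log the outcome."""
    now = now or datetime.now(timezone.utc)
    cutoff_iso = compute_cutoff_iso(now)
    deleted = delete_stale(cutoff_iso)
    if deleted > 0:
        log.info("geo_cache cleanup removed %d entries older than %s", deleted, cutoff_iso)
    else:
        log.debug("geo_cache cleanup found no entries older than %s", cutoff_iso)
    return deleted


# Demo with a stand-in for the repository call.
seen_cutoffs = []


def fake_delete(cutoff_iso: str) -> int:
    seen_cutoffs.append(cutoff_iso)
    return 3  # pretend three rows were purged


result = run_cleanup(fake_delete, now=datetime(2026, 4, 26, 17, 24, 34, tzinfo=timezone.utc))
print(result, seen_cutoffs)
```

With APScheduler 3.x, a job like this is typically registered as an interval job, for example scheduler.add_job(job, "interval", hours=24); how BanGUI wires this into startup_shared_resources is not shown here.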

Docs changes needed

  • Architekture.md — update the geo_cache table description and add the cleanup task.
  • Backend-Development.md — document the geo cache retention policy.

Doc references


Severity: Medium

Where found

backend/app/routers/auth.py: login() returns LoginResponse(token=signed_token, expires_at=expires_at) in the JSON body and sets the HttpOnly cookie. backend/app/models/auth.py: LoginResponse.token field.

Why this is needed

The LoginResponse JSON body contains the full signed session token. JavaScript running on the page (including third-party analytics scripts or a future XSS injection) can read the response body from a fetch() call and store the token in localStorage or a non-HttpOnly cookie. The Bearer-header authentication path (Authorization: Bearer <token>) then allows using that extracted token, completely bypassing the protections provided by the HttpOnly cookie.

Goal

Prevent the session token from being accessible to JavaScript when using cookie-based authentication.

What to do

  1. For browser SPA consumers: Remove the token field from LoginResponse. The HttpOnly cookie is the only token the browser needs.
  2. If an API-first (non-browser) token flow is required, create a separate endpoint POST /api/auth/token that returns a token in the body and does not set a cookie. Document this endpoint as "for programmatic API clients only, not for browser use".
  3. Update the frontend — verify that AuthProvider does not use response.token (confirmed: it currently does not).

Possible traps and issues

  • Any existing API client that relies on the token in the LoginResponse body will break. Check tests.
  • The expires_at field in LoginResponse is useful for the frontend to know when to prompt for re-login — this can remain.
  • The Bearer-token path in require_auth (Authorization: Bearer) remains functional for programmatic clients using the dedicated token endpoint.
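The intended shape of the browser login response can be illustrated with a framework-agnostic sketch that uses only the standard library. The helper name build_login_response, the cookie name "session", and the Secure/SameSite/Path attributes are assumptions for the example; the actual login() handler and its cookie settings may differ. The point is that the signed token travels only in the HttpOnly cookie, while the JSON body carries expires_at and nothing else.

```python
import json
from http.cookies import SimpleCookie
from typing import Dict, Tuple


def build_login_response(
    signed_token: str, expires_at: str, cookie_name: str = "session"
) -> Tuple[Dict[str, str], str]:
    """Return (headers, json_body) for a browser login without a token in the body."""
    cookie = SimpleCookie()
    cookie[cookie_name] = signed_token
    cookie[cookie_name]["httponly"] = True   # not readable from JavaScript
    cookie[cookie_name]["secure"] = True     # assumption: served over HTTPS
    cookie[cookie_name]["samesite"] = "Lax"  # assumption: illustrative policy
    cookie[cookie_name]["path"] = "/"

    headers = {"Set-Cookie": cookie[cookie_name].OutputString()}
    # The body keeps expires_at so the frontend can prompt for re-login,
    # but deliberately omits the token field.
    body = json.dumps({"expires_at": expires_at})
    return headers, body


headers, body = build_login_response("abc.def", "2026-04-27T19:24:34+00:00")
print(headers["Set-Cookie"])
print(body)
```

A separate POST /api/auth/token endpoint for programmatic clients would do the reverse: return the token in the body and set no cookie.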

Docs changes needed

  • Features.md — document the authentication flow (cookie for browser, token endpoint for API clients).
  • Backend-Development.md — authentication endpoint design.
  • Web-Development.md — document that the frontend uses only the HttpOnly cookie.

Doc references