Add automatic cleanup of stale geolocation cache entries to prevent unbounded database growth. Resolves the issue where unique IP addresses accumulated indefinitely in the geo_cache table, degrading query performance.

## Changes

### Database Schema (Migration 3)
- Add 'last_seen' column to geo_cache table tracking last reference time
- Existing entries default to current timestamp

### Repository Layer (geo_cache_repo.py)
- Update upsert_entry() to set/refresh last_seen on insert/update
- Update upsert_neg_entry() to set/refresh last_seen on negative cache hits
- Update bulk_upsert_entries() to set/refresh last_seen in batch operations
- Add delete_stale_entries(db, cutoff_iso) -> int for purging old entries

### Background Task (geo_cache_cleanup.py)
- New APScheduler task that runs nightly (24-hour interval)
- Calculates cutoff as 90 days ago from current time (UTC)
- Deletes all entries with last_seen older than cutoff
- Logs operation results (info when deleted > 0, debug when 0 deleted)
- Configurable retention period via GEO_CACHE_RETENTION_DAYS constant

### Application Startup (startup.py)
- Register geo_cache_cleanup task in scheduler during app startup
- Placed after geo_cache_flush in task registration order

### Tests
- Add delete_stale_entries test cases covering:
  * Removal of old entries beyond cutoff
  * No deletion when all entries are recent
  * Empty table edge case
- Update existing test fixtures to include last_seen column
- Add full test suite for cleanup task registration and execution

### Documentation
- Architekture.md: Document cleanup task, update schema/diagram
- Backend-Development.md: Add retention policy documentation

## Behavior

When an IP is accessed, its last_seen is refreshed. After 90 days of no access, an IP is purged by the nightly cleanup. On next encounter, the IP is re-resolved from MaxMind MMDB or ip-api.com (if configured).

This is acceptable because:
1. Stale geolocation data may become inaccurate over time
2. Re-resolution cost is minimal compared to unbounded storage growth
3. Active IPs maintain fresh data through their last_seen updates

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TASK-032 — geo_cache table grows unboundedly — no eviction or purge
Severity: Medium
Where found
backend/app/repositories/geo_cache_repo.py — has upsert_entry, bulk_upsert_entries, upsert_neg_entry — but no DELETE functions. backend/app/db.py — geo_cache table has no last_seen or created_at column.
Why this is needed
Every unique IP address ever seen by fail2ban gets a row in geo_cache. The table is never trimmed. A BanGUI instance monitoring a busy server can accumulate millions of rows over months, increasing the DB file size and degrading query performance on every geo lookup.
Goal
Implement a retention policy that prunes geo cache entries not referenced recently.
What to do
- Add a migration (_MIGRATIONS[2]) that adds a last_seen TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP column to geo_cache.
- Update upsert_entry and bulk_upsert_entries to set last_seen = CURRENT_TIMESTAMP on every upsert.
- Add delete_stale_entries(db: aiosqlite.Connection, cutoff_iso: str) -> int to geo_cache_repo.py.
- Create backend/app/tasks/geo_cache_cleanup.py, a nightly task that calls delete_stale_entries with a 90-day cutoff.
- Register the task in startup_shared_resources.
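The purge step and the cutoff calculation from the list above can be sketched as follows. This is a minimal, synchronous illustration using the standard-library sqlite3 module in place of aiosqlite; the two-column table layout and the run_cleanup helper are assumptions made for the example, not the project's actual code.

```python
import logging
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional

log = logging.getLogger("geo_cache_cleanup")

GEO_CACHE_RETENTION_DAYS = 90  # retention window named in the task


def delete_stale_entries(db: sqlite3.Connection, cutoff_iso: str) -> int:
    """Delete rows whose last_seen is older than cutoff_iso; return the count."""
    cur = db.execute("DELETE FROM geo_cache WHERE last_seen < ?", (cutoff_iso,))
    db.commit()
    return cur.rowcount


def run_cleanup(db: sqlite3.Connection, now: Optional[datetime] = None) -> int:
    """One cleanup pass (hypothetical helper): compute cutoff, purge, log."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=GEO_CACHE_RETENTION_DAYS)
    deleted = delete_stale_entries(db, cutoff.isoformat())
    if deleted > 0:
        log.info("geo_cache cleanup removed %d stale entries", deleted)
    else:
        log.debug("geo_cache cleanup found nothing to remove")
    return deleted


# Usage with an in-memory database and an assumed minimal schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE geo_cache (ip TEXT PRIMARY KEY, last_seen TEXT NOT NULL)")
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
db.executemany(
    "INSERT INTO geo_cache (ip, last_seen) VALUES (?, ?)",
    [
        ("192.0.2.1", (now - timedelta(days=120)).isoformat()),  # stale
        ("192.0.2.2", (now - timedelta(days=5)).isoformat()),    # recent
    ],
)
deleted = run_cleanup(db, now=now)
remaining = [row[0] for row in db.execute("SELECT ip FROM geo_cache")]
```

The comparison is a plain string comparison, so it is only reliable when last_seen and cutoff_iso share one format. SQLite's CURRENT_TIMESTAMP produces YYYY-MM-DD HH:MM:SS, while Python's isoformat() uses a T separator and a UTC offset, so the real implementation would need to settle on one of the two.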
Possible traps and issues
- Adding a column requires a migration. Coordinate with TASK-023 (migration atomicity) and TASK-022 (session hash migration): all three migrations must be sequenced correctly as _MIGRATIONS[2], [3], etc.
- IPs that have not been seen in 90 days will lose their geo data; on their next appearance they will be re-resolved from ip-api.com or the MMDB. This is acceptable.
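One further detail may be worth checking when writing the migration: SQLite documents that ALTER TABLE ... ADD COLUMN does not accept CURRENT_TIMESTAMP as a column default, so the statement as specified may fail on a populated table. A possible workaround is to add the column with a constant default and backfill it in the same transaction. The sketch below assumes a version-indexed migration list tracked via PRAGMA user_version; the MIGRATIONS list, its first two entries, and the migrate helper are hypothetical stand-ins, not the project's _MIGRATIONS.

```python
import sqlite3
from typing import List, Sequence

# Hypothetical stand-in for _MIGRATIONS: entry i moves the schema from
# version i to version i + 1. The first two entries are invented filler.
MIGRATIONS: List[List[str]] = [
    ["CREATE TABLE geo_cache (ip TEXT PRIMARY KEY, country TEXT)"],
    ["CREATE INDEX idx_geo_cache_country ON geo_cache (country)"],
    [
        # Constant default first; a CURRENT_TIMESTAMP default may be rejected.
        "ALTER TABLE geo_cache ADD COLUMN last_seen TEXT NOT NULL DEFAULT ''",
        # Backfill existing rows with the time the migration ran.
        "UPDATE geo_cache SET last_seen = CURRENT_TIMESTAMP",
    ],
]


def migrate(db: sqlite3.Connection, migrations: Sequence[Sequence[str]]) -> int:
    """Apply pending migrations in order, one transaction per migration.

    Expects a connection opened with isolation_level=None so that the
    explicit BEGIN/COMMIT below controls the transaction boundaries.
    """
    version = db.execute("PRAGMA user_version").fetchone()[0]
    for index in range(version, len(migrations)):
        db.execute("BEGIN")
        try:
            for statement in migrations[index]:
                db.execute(statement)
            db.execute(f"PRAGMA user_version = {index + 1}")
            db.execute("COMMIT")
        except Exception:
            db.execute("ROLLBACK")
            raise
    return db.execute("PRAGMA user_version").fetchone()[0]


# Usage: bring a database to version 2, add a row, then apply the new migration.
db = sqlite3.connect(":memory:", isolation_level=None)
migrate(db, MIGRATIONS[:2])
db.execute("INSERT INTO geo_cache (ip, country) VALUES ('192.0.2.1', 'DE')")
version = migrate(db, MIGRATIONS)
last_seen = db.execute("SELECT last_seen FROM geo_cache").fetchone()[0]
```

Because each migration bumps the stored version inside its own transaction, a failure partway through leaves the database at the last fully applied version, which is the property the sequencing note above depends on.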
Docs changes needed
- Architekture.md: update the geo_cache table description and add the cleanup task.
- Backend-Development.md: document the geo cache retention policy.
Doc references
- Architekture.md — application database schema
- Backend-Development.md — background tasks
TASK-033 — Session token returned in JSON body alongside HttpOnly cookie
Severity: Medium
Where found
backend/app/routers/auth.py — login() returns LoginResponse(token=signed_token, expires_at=expires_at) in the JSON body and sets the HttpOnly cookie. backend/app/models/auth.py — LoginResponse.token field.
Why this is needed
The LoginResponse JSON body contains the full signed session token. JavaScript running on the page (including third-party analytics scripts or a future XSS injection) can read the response body from a fetch() call and store the token in localStorage or a non-HttpOnly cookie. The Bearer-header authentication path (Authorization: Bearer <token>) then allows using that extracted token, completely bypassing the protections provided by the HttpOnly cookie.
Goal
Prevent the session token from being accessible to JavaScript when using cookie-based authentication.
What to do
- For browser SPA consumers: Remove the token field from LoginResponse. The HttpOnly cookie is the only token the browser needs.
- If an API-first (non-browser) token flow is required, create a separate endpoint POST /api/auth/token that returns a token in the body and does not set a cookie. Document this endpoint as "for programmatic API clients only, not for browser use".
- Update the frontend: verify that AuthProvider does not use response.token (confirmed: it currently does not).
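The browser-facing half of this change can be sketched without any web framework. The example below is an illustration under assumptions: the cookie name session, the Secure/SameSite attributes, and the build_login_reply helper are not taken from the codebase. It only shows the intended shape, where the token travels in the Set-Cookie header and the JSON body carries expires_at alone.

```python
from dataclasses import asdict, dataclass
from http.cookies import SimpleCookie
from typing import Dict, Tuple


@dataclass
class LoginResponse:
    """Body returned to the browser: expiry only, no token field."""
    expires_at: str


def build_login_reply(signed_token: str, expires_at: str) -> Tuple[Dict[str, str], str]:
    """Return (JSON body, Set-Cookie header value) for a successful login."""
    cookie = SimpleCookie()
    cookie["session"] = signed_token          # cookie name is an assumption
    cookie["session"]["httponly"] = True      # not readable from JavaScript
    cookie["session"]["secure"] = True        # assumed: served over HTTPS
    cookie["session"]["samesite"] = "Strict"  # assumed policy
    cookie["session"]["path"] = "/"
    body = asdict(LoginResponse(expires_at=expires_at))
    return body, cookie["session"].OutputString()


body, set_cookie = build_login_reply("signed-token-value", "2025-01-01T00:00:00+00:00")
```

A separate POST /api/auth/token handler for programmatic clients would do the inverse: return the token in the body and emit no Set-Cookie header.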
Possible traps and issues
- Any existing API client that relies on the token in the LoginResponse body will break. Check tests.
- The expires_at field in LoginResponse is useful for the frontend to know when to prompt for re-login, so it can remain.
- The Bearer-token path in require_auth (Authorization: Bearer) remains functional for programmatic clients using the dedicated token endpoint.
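To make the two authentication paths concrete, the sketch below shows one way a dependency like require_auth could locate the token. The extract_session_token helper, the cookie name session, and the cookie-before-Bearer precedence are all assumptions for illustration; the real require_auth may order or validate these differently.

```python
from http.cookies import SimpleCookie
from typing import Mapping, Optional


def extract_session_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the token from the session cookie, else from a Bearer header."""
    cookie = SimpleCookie(headers.get("Cookie", ""))
    if "session" in cookie:  # browser path: HttpOnly cookie
        return cookie["session"].value
    scheme, _, credentials = headers.get("Authorization", "").partition(" ")
    if scheme.lower() == "bearer" and credentials.strip():
        return credentials.strip()  # programmatic path: Authorization header
    return None


from_cookie = extract_session_token({"Cookie": "session=abc"})
from_bearer = extract_session_token({"Authorization": "Bearer xyz"})
missing = extract_session_token({})
```

The returned value would still need signature and expiry checks before the request is treated as authenticated.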
Docs changes needed
- Features.md: document the authentication flow (cookie for browser, token endpoint for API clients).
- Backend-Development.md: authentication endpoint design.
- Web-Development.md: document that the frontend uses only the HttpOnly cookie.
Doc references
- Features.md — authentication
- Backend-Development.md — auth router design
- Web-Development.md — AuthProvider