Add automatic cleanup of stale geolocation cache entries to prevent unbounded database growth.

Resolves the issue where unique IP addresses accumulated indefinitely in the geo_cache table, degrading query performance.

## Changes

### Database Schema (Migration 3)
- Add 'last_seen' column to geo_cache table tracking last reference time
- Existing entries default to current timestamp

### Repository Layer (geo_cache_repo.py)
- Update upsert_entry() to set/refresh last_seen on insert/update
- Update upsert_neg_entry() to set/refresh last_seen on negative cache hits
- Update bulk_upsert_entries() to set/refresh last_seen in batch operations
- Add delete_stale_entries(db, cutoff_iso) -> int for purging old entries

### Background Task (geo_cache_cleanup.py)
- New APScheduler task that runs nightly (24-hour interval)
- Calculates cutoff as 90 days ago from current time (UTC)
- Deletes all entries with last_seen older than cutoff
- Logs operation results (info when deleted > 0, debug when 0 deleted)
- Configurable retention period via GEO_CACHE_RETENTION_DAYS constant

### Application Startup (startup.py)
- Register geo_cache_cleanup task in scheduler during app startup
- Placed after geo_cache_flush in task registration order

### Tests
- Add delete_stale_entries test cases covering:
  * Removal of old entries beyond cutoff
  * No deletion when all entries are recent
  * Empty table edge case
- Update existing test fixtures to include last_seen column
- Add full test suite for cleanup task registration and execution

### Documentation
- Architekture.md: Document cleanup task, update schema/diagram
- Backend-Development.md: Add retention policy documentation

## Behavior

When an IP is accessed, its last_seen is refreshed. After 90 days of no access, an IP is purged by the nightly cleanup. On next encounter, the IP is re-resolved from MaxMind MMDB or ip-api.com (if configured). This is acceptable because:

1. Stale geolocation data may become inaccurate over time
2. Re-resolution cost is minimal compared to unbounded storage growth
3. Active IPs maintain fresh data through their last_seen updates

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
66 lines
4.3 KiB
Markdown
## TASK-032 — `geo_cache` table grows unboundedly — no eviction or purge
**Severity:** Medium
### Where found
`backend/app/repositories/geo_cache_repo.py` — has `upsert_entry`, `bulk_upsert_entries`, `upsert_neg_entry` — but **no DELETE functions**. `backend/app/db.py` — `geo_cache` table has no `last_seen` or `created_at` column.
### Why this is needed
Every unique IP address ever seen by fail2ban gets a row in `geo_cache`. The table is never trimmed. A BanGUI instance monitoring a busy server can accumulate millions of rows over months, increasing the DB file size and degrading query performance on every geo lookup.
### Goal
Implement a retention policy that prunes geo cache entries not referenced recently.
### What to do
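1. Add a migration (`_MIGRATIONS[2]`) that adds a `last_seen TEXT NOT NULL` column to `geo_cache`. SQLite's `ALTER TABLE ... ADD COLUMN` rejects non-constant defaults such as `CURRENT_TIMESTAMP`, so add the column with a constant default and backfill existing rows with the current timestamp in the same migration.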
2. Update `upsert_entry`, `upsert_neg_entry`, and `bulk_upsert_entries` to set `last_seen = CURRENT_TIMESTAMP` on every upsert, so that both inserts and updates refresh it.
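3. Add `delete_stale_entries(db: aiosqlite.Connection, cutoff_iso: str) -> int` to `geo_cache_repo.py`. It deletes every row whose `last_seen` is older than `cutoff_iso` and returns the number of rows removed.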
4. Create `backend/app/tasks/geo_cache_cleanup.py` — a nightly task that calls `delete_stale_entries` with a 90-day cutoff.
5. Register the task in `startup_shared_resources`.
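
The repository and task steps above can be sketched as follows. This is a minimal illustration, not the project's implementation: it uses the synchronous stdlib `sqlite3` module instead of `aiosqlite`, the `ip` and `country_code` columns are assumed, and `to_iso` and `run_cleanup` are hypothetical helper names. Timestamps are written from Python in one fixed UTC format so that the text comparison in the `DELETE` behaves like a time comparison.

```python
import logging
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional

GEO_CACHE_RETENTION_DAYS = 90  # constant name taken from the commit message
log = logging.getLogger("geo_cache_cleanup")


def to_iso(moment: datetime) -> str:
    """Format an aware UTC datetime as a second-precision ISO 8601 string."""
    return moment.isoformat(timespec="seconds")


def upsert_entry(db: sqlite3.Connection, ip: str, country_code: str,
                 now: Optional[datetime] = None) -> None:
    """Insert or update a row; last_seen is refreshed in both cases."""
    seen = to_iso(now or datetime.now(timezone.utc))
    # Assumed schema: geo_cache(ip PRIMARY KEY, country_code, last_seen).
    # The ON CONFLICT upsert syntax needs SQLite 3.24 or newer.
    db.execute(
        "INSERT INTO geo_cache (ip, country_code, last_seen) VALUES (?, ?, ?) "
        "ON CONFLICT(ip) DO UPDATE SET "
        "country_code = excluded.country_code, last_seen = excluded.last_seen",
        (ip, country_code, seen),
    )
    db.commit()


def delete_stale_entries(db: sqlite3.Connection, cutoff_iso: str) -> int:
    """Delete rows last seen before cutoff_iso and return how many were removed."""
    cursor = db.execute("DELETE FROM geo_cache WHERE last_seen < ?", (cutoff_iso,))
    db.commit()
    return cursor.rowcount


def run_cleanup(db: sqlite3.Connection, now: Optional[datetime] = None) -> int:
    """One cleanup pass: compute the UTC cutoff, purge, and log the result."""
    now = now or datetime.now(timezone.utc)
    cutoff_iso = to_iso(now - timedelta(days=GEO_CACHE_RETENTION_DAYS))
    deleted = delete_stale_entries(db, cutoff_iso)
    if deleted > 0:
        log.info("geo_cache cleanup removed %d entries older than %s", deleted, cutoff_iso)
    else:
        log.debug("geo_cache cleanup found no entries older than %s", cutoff_iso)
    return deleted
```

In the real task, `run_cleanup` would be the coroutine that the scheduler invokes once every 24 hours; passing `now` explicitly keeps the cutoff logic testable without waiting for time to pass.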
### Possible traps and issues
- Adding a column requires a migration. Coordinate with TASK-023 (migration atomicity) and TASK-022 (session hash migration) — all three migrations must be sequenced correctly as `_MIGRATIONS[2]`, `[3]`, etc.
- IPs that have not been seen in 90 days will lose their geo data — on their next appearance they will be re-resolved from ip-api.com or the MMDB. This is acceptable.
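
To make the sequencing trap concrete, here is a hedged sketch of an index-ordered migration runner. How BanGUI actually tracks applied migrations is not stated in this document; tracking via `PRAGMA user_version`, the `apply_migrations` helper, the placeholder second migration, and the `ip`/`country_code` columns are all assumptions for illustration. The sketch also works around an SQLite restriction: `ALTER TABLE ... ADD COLUMN` does not accept `CURRENT_TIMESTAMP` as a default, so the column is added with a constant default and then backfilled.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable, List

Migration = Callable[[sqlite3.Connection], None]


def _create_geo_cache(conn: sqlite3.Connection) -> None:
    # Hypothetical initial schema; the real table has more columns.
    conn.execute("CREATE TABLE geo_cache (ip TEXT PRIMARY KEY, country_code TEXT)")


def _placeholder(conn: sqlite3.Connection) -> None:
    # Stands in for whichever earlier migration occupies index 1.
    pass


def _add_last_seen(conn: sqlite3.Connection) -> None:
    # SQLite rejects a non-constant default here, so use '' and backfill.
    conn.execute("ALTER TABLE geo_cache ADD COLUMN last_seen TEXT NOT NULL DEFAULT ''")
    now_iso = datetime.now(timezone.utc).isoformat(timespec="seconds")
    conn.execute("UPDATE geo_cache SET last_seen = ?", (now_iso,))


MIGRATIONS: List[Migration] = [_create_geo_cache, _placeholder, _add_last_seen]


def apply_migrations(conn: sqlite3.Connection, migrations: List[Migration]) -> int:
    """Apply every migration whose index is >= the stored version, in order."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for index in range(current, len(migrations)):
        migrations[index](conn)
        # PRAGMA does not accept bound parameters; index is always an int.
        conn.execute(f"PRAGMA user_version = {index + 1}")
        conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Because each migration is identified only by its list position, two branches that both append "the next migration" will collide on the same index, which is why the ordering has to be agreed across the related tasks before merging.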
### Docs changes needed
- `Architekture.md` — update the `geo_cache` table description and add the cleanup task.
- `Backend-Development.md` — document the geo cache retention policy.
### Doc references
- [Architekture.md](Architekture.md) — application database schema
- [Backend-Development.md](Backend-Development.md) — background tasks
---
## TASK-033 — Session token returned in JSON body alongside HttpOnly cookie
**Severity:** Medium
### Where found
`backend/app/routers/auth.py` — `login()` returns `LoginResponse(token=signed_token, expires_at=expires_at)` in the JSON body **and** sets the HttpOnly cookie. `backend/app/models/auth.py` — `LoginResponse.token` field.
### Why this is needed
The `LoginResponse` JSON body contains the full signed session token. JavaScript running on the page (including third-party analytics scripts or a future XSS injection) can read the response body from a `fetch()` call and store the token in `localStorage` or a non-HttpOnly cookie. The Bearer-header authentication path (`Authorization: Bearer <token>`) then allows using that extracted token, completely bypassing the protections provided by the HttpOnly cookie.
### Goal
Prevent the session token from being accessible to JavaScript when using cookie-based authentication.
### What to do
1. For browser SPA consumers: Remove the `token` field from `LoginResponse`. The HttpOnly cookie is the only token the browser needs.
2. If an API-first (non-browser) token flow is required, create a separate endpoint `POST /api/auth/token` that returns a token in the body and does **not** set a cookie. Document this endpoint as "for programmatic API clients only, not for browser use".
3. Update the frontend — verify that `AuthProvider` does not use `response.token` (confirmed: it currently does not).
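
The split between the browser login and the programmatic token endpoint might look like the sketch below. It is framework-agnostic and uses only the standard library, so it does not reflect the actual router code; the cookie name `session`, the cookie attributes, and the helper names `build_login_reply` and `build_token_reply` are assumptions.

```python
from dataclasses import asdict, dataclass
from http.cookies import SimpleCookie
from typing import Dict, Tuple

SESSION_COOKIE_NAME = "session"  # hypothetical; use the name the app already sets


@dataclass
class LoginResponse:
    """Browser login body: expiry only, no token field."""
    expires_at: str


def build_login_reply(signed_token: str, expires_at: str) -> Tuple[Dict[str, str], str]:
    """Return the JSON body and the Set-Cookie header value for a browser login."""
    cookie = SimpleCookie()
    cookie[SESSION_COOKIE_NAME] = signed_token
    morsel = cookie[SESSION_COOKIE_NAME]
    morsel["httponly"] = True      # not readable from JavaScript
    morsel["secure"] = True        # assumed: sent over HTTPS only
    morsel["samesite"] = "Strict"  # assumed policy
    morsel["path"] = "/"
    return asdict(LoginResponse(expires_at=expires_at)), morsel.OutputString()


def build_token_reply(signed_token: str, expires_at: str) -> Dict[str, str]:
    """Body for POST /api/auth/token: token in the body, and no cookie is set."""
    return {"token": signed_token, "expires_at": expires_at}
```

With this shape, nothing in the browser flow exposes the token to page scripts, while programmatic clients receive it only from the endpoint that is documented for them.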
### Possible traps and issues
- Any existing API client that relies on the token in the `LoginResponse` body will break. Check tests.
- The `expires_at` field in `LoginResponse` is useful for the frontend to know when to prompt for re-login — this can remain.
- The Bearer-token path in `require_auth` (`Authorization: Bearer`) remains functional for programmatic clients using the dedicated token endpoint.
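
For reference, a dual-path token lookup of the kind `require_auth` performs could be sketched like this. The real dependency is not shown in this document, so the function name `extract_session_token`, the cookie name, the lowercase header keys, and the cookie-before-Bearer precedence are all assumptions.

```python
from http.cookies import SimpleCookie
from typing import Mapping, Optional

SESSION_COOKIE_NAME = "session"  # hypothetical cookie name


def extract_session_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the session token from the cookie, else from a Bearer header."""
    # Browser path: the HttpOnly cookie arrives in the Cookie request header.
    cookie = SimpleCookie()
    cookie.load(headers.get("cookie", ""))
    if SESSION_COOKIE_NAME in cookie:
        return cookie[SESSION_COOKIE_NAME].value

    # Programmatic path: "Authorization: Bearer <token>".
    scheme, _, credentials = headers.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and credentials.strip():
        return credentials.strip()
    return None
```

Keeping both paths is harmless once the browser login stops returning the token: a page script has nothing to place in the `Authorization` header.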
### Docs changes needed
- `Features.md` — document the authentication flow (cookie for browser, token endpoint for API clients).
- `Backend-Development.md` — authentication endpoint design.
- `Web-Development.md` — document that the frontend uses only the HttpOnly cookie.
### Doc references
- [Features.md](Features.md) — authentication
- [Backend-Development.md](Backend-Development.md) — auth router design
- [Web-Development.md](Web-Development.md) — AuthProvider