TASK-032: Implement geo_cache retention policy and cleanup
Add automatic cleanup of stale geolocation cache entries to prevent unbounded database growth. Resolves the issue where unique IP addresses accumulated indefinitely in the geo_cache table, degrading query performance. ## Changes ### Database Schema (Migration 3) - Add 'last_seen' column to geo_cache table tracking last reference time - Existing entries default to current timestamp ### Repository Layer (geo_cache_repo.py) - Update upsert_entry() to set/refresh last_seen on insert/update - Update upsert_neg_entry() to set/refresh last_seen on negative cache hits - Update bulk_upsert_entries() to set/refresh last_seen in batch operations - Add delete_stale_entries(db, cutoff_iso) -> int for purging old entries ### Background Task (geo_cache_cleanup.py) - New APScheduler task that runs nightly (24-hour interval) - Calculates cutoff as 90 days ago from current time (UTC) - Deletes all entries with last_seen older than cutoff - Logs operation results (info when deleted > 0, debug when 0 deleted) - Configurable retention period via GEO_CACHE_RETENTION_DAYS constant ### Application Startup (startup.py) - Register geo_cache_cleanup task in scheduler during app startup - Placed after geo_cache_flush in task registration order ### Tests - Add delete_stale_entries test cases covering: * Removal of old entries beyond cutoff * No deletion when all entries are recent * Empty table edge case - Update existing test fixtures to include last_seen column - Add full test suite for cleanup task registration and execution ### Documentation - Architekture.md: Document cleanup task, update schema/diagram - Backend-Development.md: Add retention policy documentation ## Behavior When an IP is accessed, its last_seen is refreshed. After 90 days of no access, an IP is purged by the nightly cleanup. On next encounter, the IP is re-resolved from MaxMind MMDB or ip-api.com (if configured). This is acceptable because: 1. Stale geolocation data may become inaccurate over time 2. Re-resolution cost is minimal compared to unbounded storage growth 3. Active IPs maintain fresh data through their last_seen updates Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit is contained in:
@@ -133,7 +133,8 @@ backend/
|
||||
│ │ ├── geo_cache_repo.py # IP geolocation cache persistence│ │ └── import_log_repo.py # Import run history records
|
||||
│ ├── tasks/ # APScheduler background jobs
|
||||
│ │ ├── blocklist_import.py# Scheduled blocklist download and application
|
||||
│ │ ├── geo_cache_flush.py # Periodic geo cache persistence (dirty-set flush to SQLite)│ │ ├── geo_re_resolve.py # Periodic re-resolution of stale geo cache records│ │ └── health_check.py # Periodic fail2ban connectivity probe
|
||||
│ │ ├── geo_cache_flush.py # Periodic geo cache persistence (dirty-set flush to SQLite)│ │ ├── geo_cache_cleanup.py # Periodic purge of stale geo cache entries
|
||||
│ │ ├── geo_re_resolve.py # Periodic re-resolution of stale geo cache records│ │ └── health_check.py # Periodic fail2ban connectivity probe
|
||||
│ └── utils/ # Helpers, constants, shared types
|
||||
│ ├── fail2ban_client.py # Async wrapper around the fail2ban socket protocol
|
||||
│ ├── fail2ban_response.py # Canonical response parsing: ok(), to_dict(), ensure_list(), is_not_found_error()
|
||||
@@ -241,6 +242,7 @@ APScheduler background jobs that run on a schedule without user interaction.
|
||||
| Task | Purpose |
|
||||
|---|---|
|
||||
| `blocklist_import.py` | Downloads all enabled blocklist sources, validates entries, applies bans, records results in the import log |
|
||||
| `geo_cache_cleanup.py` | Periodically removes entries from the `geo_cache` table that have not been referenced in the configured retention period (default: 90 days). Prevents unbounded database growth. |
|
||||
| `geo_cache_flush.py` | Periodically flushes newly resolved IPs from the in-memory dirty set to the `geo_cache` SQLite table (default: every 60 seconds). GET requests populate only the in-memory cache; this task persists them without blocking any request. |
|
||||
| `geo_re_resolve.py` | Periodically re-resolves stale entries in `geo_cache` to keep geolocation data fresh |
|
||||
| `health_check.py` | Periodically pings the fail2ban socket and updates the cached server status so the frontend always has fresh data |
|
||||
@@ -649,7 +651,7 @@ BanGUI maintains its **own SQLite database** (separate from the fail2ban databas
|
||||
|---|---|
|
||||
| `settings` | Key-value store for application configuration (master password hash, fail2ban socket path, database path, timezone, session duration) |
|
||||
| `sessions` | Active session token hashes with expiry timestamps. Tokens are stored as one-way SHA256 hashes to prevent token hijacking if the database is exposed. |
|
||||
| `geo_cache` | Resolved IP geolocation results (ip, country_code, country_name, asn, org, cached_at). Loaded into memory at startup via `load_cache_from_db()`; new entries are flushed back by the `geo_cache_flush` background task. |
|
||||
| `geo_cache` | Resolved IP geolocation results (ip, country_code, country_name, asn, org, cached_at, last_seen). Tracks the last time each IP address was referenced to enable retention policies. Entries older than 90 days are automatically purged by the `geo_cache_cleanup` task to prevent unbounded growth. Loaded into memory at startup via `load_cache_from_db()`; new entries are flushed back by the `geo_cache_flush` background task. |
|
||||
| `blocklist_sources` | Registered blocklist URLs (id, name, url, enabled, created_at, updated_at) |
|
||||
| `import_logs` | Record of every blocklist import run (id, source_id, timestamp, ips_imported, ips_skipped, errors, status) |
|
||||
|
||||
@@ -701,6 +703,7 @@ APScheduler 4.x (async mode) manages recurring background tasks.
|
||||
│ (async, in-process) │
|
||||
├──────────────────────┤
|
||||
│ blocklist_import │ ── runs on configured schedule (default: daily 03:00)
|
||||
│ geo_cache_cleanup │ ── runs every 24 hours (nightly)
|
||||
│ geo_cache_flush │ ── runs every 60 seconds
|
||||
│ health_check │ ── runs every 30 seconds
|
||||
└──────────────────────┘
|
||||
|
||||
Reference in New Issue
Block a user