TASK-032: Implement geo_cache retention policy and cleanup

Add automatic cleanup of stale geolocation cache entries to prevent
unbounded database growth. Resolves the issue where unique IP addresses
accumulated indefinitely in the geo_cache table, degrading query performance.

## Changes

### Database Schema (Migration 3)
- Add 'last_seen' column to geo_cache table tracking last reference time
- Existing entries default to current timestamp
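
A minimal sketch of what such a migration could look like, assuming the database is SQLite accessed through the standard `sqlite3` module and that `last_seen` is stored as ISO-8601 text. The helper name `migrate_add_last_seen` is hypothetical, not the project's actual migration code:

```python
import sqlite3
from datetime import datetime, timezone


def migrate_add_last_seen(conn: sqlite3.Connection) -> None:
    """Add last_seen to geo_cache and backfill existing rows (hypothetical helper)."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(geo_cache)")}
    if "last_seen" in columns:
        return  # migration already applied
    # SQLite rejects non-constant defaults such as CURRENT_TIMESTAMP in
    # ALTER TABLE ... ADD COLUMN, so add the column first and backfill after.
    conn.execute("ALTER TABLE geo_cache ADD COLUMN last_seen TEXT")
    now_iso = datetime.now(timezone.utc).isoformat()
    conn.execute("UPDATE geo_cache SET last_seen = ?", (now_iso,))
    conn.commit()
```

Backfilling with the migration time gives every existing entry a full retention window before it becomes eligible for cleanup.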

### Repository Layer (geo_cache_repo.py)
- Update upsert_entry() to set/refresh last_seen on insert/update
- Update upsert_neg_entry() to set/refresh last_seen on negative cache hits
- Update bulk_upsert_entries() to set/refresh last_seen in batch operations
- Add delete_stale_entries(db, cutoff_iso) -> int for purging old entries
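
For illustration, `delete_stale_entries` could be sketched as below. This assumes a synchronous `sqlite3` connection and ISO-8601 UTC timestamps in `last_seen`; the real repository function may be async and use a different driver:

```python
import sqlite3


def delete_stale_entries(db: sqlite3.Connection, cutoff_iso: str) -> int:
    """Delete rows whose last_seen is older than cutoff_iso; return the count.

    Assumes last_seen and cutoff_iso share one ISO-8601 format and UTC offset,
    so that plain string comparison orders them chronologically.
    """
    cursor = db.execute(
        "DELETE FROM geo_cache WHERE last_seen < ?",
        (cutoff_iso,),
    )
    db.commit()
    return cursor.rowcount
```

An index on `last_seen` would probably keep this delete cheap on large tables, though whether the migration creates one is not stated here.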

### Background Task (geo_cache_cleanup.py)
- New APScheduler task that runs nightly (24-hour interval)
- Calculates cutoff as 90 days ago from current time (UTC)
- Deletes all entries with last_seen older than cutoff
- Logs operation results (info when deleted > 0, debug when 0 deleted)
- Configurable retention period via GEO_CACHE_RETENTION_DAYS constant
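
The cutoff calculation described above can be sketched as follows. Only `GEO_CACHE_RETENTION_DAYS` is taken from this commit; the function name `compute_cutoff_iso` and its parameters are assumptions for illustration:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

GEO_CACHE_RETENTION_DAYS = 90


def compute_cutoff_iso(
    now: Optional[datetime] = None,
    retention_days: int = GEO_CACHE_RETENTION_DAYS,
) -> str:
    """Return the ISO-8601 UTC timestamp before which entries count as stale."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - timedelta(days=retention_days)).isoformat()


# A run on 2026-04-26 00:00 UTC yields a cutoff 90 days earlier.
print(compute_cutoff_iso(datetime(2026, 4, 26, tzinfo=timezone.utc)))
# -> 2026-01-26T00:00:00+00:00
```

Accepting `now` as a parameter keeps the calculation deterministic in tests.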

### Application Startup (startup.py)
- Register geo_cache_cleanup task in scheduler during app startup
- Placed after geo_cache_flush in task registration order

### Tests
- Add delete_stale_entries test cases covering:
  * Removal of old entries beyond cutoff
  * No deletion when all entries are recent
  * Empty table edge case
- Update existing test fixtures to include last_seen column
- Add full test suite for cleanup task registration and execution

### Documentation
- Architekture.md: Document cleanup task, update schema/diagram
- Backend-Development.md: Add retention policy documentation

## Behavior

When an IP is accessed, its last_seen is refreshed. After 90 days of no
access, an IP is purged by the nightly cleanup. On next encounter, the IP
is re-resolved from MaxMind MMDB or ip-api.com (if configured).

This is acceptable because:
1. Stale geolocation data may become inaccurate over time
2. Re-resolution cost is minimal compared to unbounded storage growth
3. Active IPs maintain fresh data through their last_seen updates

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 19:24:34 +02:00
parent 32aad186c3
commit e2560f5db0
9 changed files with 405 additions and 89 deletions

@@ -1,84 +1,3 @@
## TASK-030 — ip-api.com geo lookups use plain HTTP — IP addresses sent unencrypted
**Severity:** Medium
### Where found
`backend/app/services/geo_cache.py` lines ~41-46:
```python
_API_URL = "http://ip-api.com/json/{ip}?fields=..."
_BATCH_API_URL = "http://ip-api.com/batch?fields=..."
```
### Why this is needed
All banned and monitored IP addresses are transmitted to ip-api.com in cleartext over HTTP. These are potentially sensitive data (PII under GDPR/CCPA — IP addresses identify users). Any network path between the BanGUI server and ip-api.com's servers can observe or modify the traffic. Forged responses would corrupt the geo database silently.
### Goal
Use encrypted transport for all geo API calls, or switch to a local resolver.
### What to do
ip-api.com's free tier does not support HTTPS. The recommended approach:
1. Promote the existing `geoip_db_path` setting (MaxMind GeoLite2-Country MMDB) to the **primary** resolver.
2. Use ip-api.com as a secondary fallback only when the MMDB is unavailable or returns no result.
3. Add documentation and compose file examples for downloading and mounting the GeoLite2 MMDB.
4. If ip-api.com HTTP is retained as a fallback, add a config flag `BANGUI_GEOIP_ALLOW_HTTP_FALLBACK` (default `false`) and warn clearly at startup when enabled.
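The resolver order in the steps above could be sketched roughly like this. It is a simplified illustration only: the lookups are passed in as plain callables, and `resolve_country`, `mmdb_lookup`, and `http_lookup` are hypothetical names rather than functions from the codebase. The `allow_http_fallback` parameter stands in for the proposed `BANGUI_GEOIP_ALLOW_HTTP_FALLBACK` flag.
```python
from typing import Callable, Optional

# A lookup takes an IP string and returns a country code, or None if unknown.
Lookup = Callable[[str], Optional[str]]


def resolve_country(
    ip: str,
    mmdb_lookup: Optional[Lookup],
    http_lookup: Optional[Lookup],
    allow_http_fallback: bool = False,
) -> Optional[str]:
    """Try the local MMDB first; use the HTTP API only when explicitly allowed."""
    if mmdb_lookup is not None:
        result = mmdb_lookup(ip)
        if result is not None:
            return result
    # Reached only when the MMDB is unavailable or returned no result.
    if allow_http_fallback and http_lookup is not None:
        return http_lookup(ip)
    return None
```
With the flag left at its default of `false`, no IP address leaves the host even when the MMDB has no answer.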
### Possible traps and issues
- The MaxMind GeoLite2 database requires a free account and a license key to download — document the setup process.
- The GeoLite2-Country MMDB does not include ASN or organisation data — these fields will be `null` when using the local resolver. The `GeoInfo` model must handle nullable `asn` and `org`.
### Docs changes needed
- `Features.md` — document the geo resolution mechanism and MMDB setup.
- `Architekture.md` — update the external API dependency section.
- `Backend-Development.md` — configuration for `geoip_db_path`.
### Doc references
- [Features.md](Features.md) — geolocation
- [Architekture.md](Architekture.md) — external API dependencies
---
## TASK-031 — bcrypt 72-byte truncation not enforced — long passwords silently equivalent to their prefix
**Severity:** Medium
### Where found
`backend/app/models/auth.py`: `LoginRequest.password: str = Field(...)` (no `max_length`). `backend/app/models/setup.py`: `SetupRequest.master_password` has `min_length=8` but no `max_length`.
### Why this is needed
bcrypt silently truncates its input at 72 bytes before hashing. A user who sets a 100-character ASCII password can therefore be authenticated by supplying only its first 72 characters; the remaining characters add no security. An attacker only has to search the 72-byte prefix, so the password is weaker than the user intended.
### Goal
Enforce a maximum password length of 72 bytes, or pre-hash before bcrypt to remove the limit entirely.
### What to do
**Option A (simple):**
1. Add `max_length=72` to `SetupRequest.master_password` and `LoginRequest.password`.
2. Update the setup wizard UI to reflect the 72-character maximum.
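One caveat for Option A: `max_length=72` on a `str` field counts characters, while bcrypt's limit is in bytes, so a password with multi-byte UTF-8 characters can pass the check and still be truncated. A byte-length check along these lines would close that gap; `fits_bcrypt_limit` is a hypothetical helper name, shown only as a sketch:
```python
BCRYPT_MAX_BYTES = 72


def fits_bcrypt_limit(password: str) -> bool:
    """Return True if the UTF-8 encoding of password fits bcrypt's 72-byte input."""
    return len(password.encode("utf-8")) <= BCRYPT_MAX_BYTES


print(fits_bcrypt_limit("a" * 72))  # True: 72 characters, 72 bytes
print(fits_bcrypt_limit("é" * 40))  # False: 40 characters, but 80 bytes
```
Such a check could run in a field validator on both request models, next to or instead of the character-based `max_length`.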
**Option B (removes the 72-byte limit entirely):**
1. Pre-hash the password with HMAC-SHA256 using the `session_secret` as the key before passing to bcrypt:
```python
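import base64, bcrypt, hashlib, hmac
# Base64-encode the digest: raw digest bytes can contain NUL, which bcrypt may reject or truncate at.
pre_hashed = base64.b64encode(hmac.new(secret.encode(), password.encode(), hashlib.sha256).digest())
bcrypt.hashpw(pre_hashed, bcrypt.gensalt())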
```
2. Apply consistently in both `run_setup()` and `_check_password()`.
Option A is recommended as the simpler, lower-risk fix. Option B is architecturally cleaner but requires a stored hash migration.
### Possible traps and issues
- Option A: Users who already have passwords longer than 72 characters will need to reset. For a single-admin app this is acceptable.
- Option B: If the `session_secret` changes, all stored password hashes become invalid (since the pre-hash key changes). This is a hidden coupling — document it explicitly.
### Docs changes needed
- `Features.md` — document the password length constraint.
- `Backend-Development.md` — bcrypt usage notes.
### Doc references
- [Features.md](Features.md) — authentication and setup
- [Backend-Development.md](Backend-Development.md) — password hashing
---
## TASK-032 — `geo_cache` table grows unboundedly — no eviction or purge
**Severity:** Medium