Fix geo cache write performance: batch commits, read-only GETs, dirty flush
- Remove per-IP `db.commit()` from `_persist_entry()` and `_persist_neg_entry()`; add a single commit after the full `lookup_batch()` chunk loop instead. Reduces commits from ~5,200 to 1 per bans/by-country request.
- Remove the db dependency from `GET /api/dashboard/bans` and `GET /api/dashboard/bans/by-country`; pass `app_db=None` so no SQLite writes occur during read-only requests.
- Add a `_dirty` set to geo_service; `_store()` marks resolved IPs dirty. New `flush_dirty(db)` batch-upserts all dirty entries in one transaction. A new `geo_cache_flush` APScheduler task flushes every 60 s so geo data is persisted without blocking requests.
Docs/Tasks.md
@@ -4,24 +4,140 @@ This document breaks the entire BanGUI project into development stages, ordered
---
## Completed

### [DONE] Fix: Country column shows "—" for blocklist-import IPs in ban list

**Root cause:** `ban_service.list_bans()` resolved geo data one IP at a time via `geo_service.lookup()`, which uses the ip-api.com single-IP endpoint (45 req/min free-tier limit). A page of 100 bans triggered 100 sequential HTTP requests; after roughly the 45th request ip-api.com applied rate limiting, all remaining IPs were added to the in-process negative cache (5 min TTL), and they showed "—" in the country column until the TTL expired. Because the map endpoint (`bans_by_country`) used `lookup_batch` (100 IPs per POST), it never hit the rate limit, which is why the map showed colours while the list did not.

**Fix:** Added `http_session` and `app_db` parameters to `list_bans()`. When `http_session` is provided (the production path via the dashboard router), the entire page of IPs is resolved in a single `geo_service.lookup_batch()` call instead of 100 individual ones. The legacy `geo_enricher` callback is kept for backwards compatibility in tests. Updated `dashboard.py` to pass `http_session` and `db` instead of constructing a per-IP enricher closure. Added 3 new tests covering the batch path, failure resilience, and priority over `geo_enricher`.

## Issue: World Map Loading Time — Architecture Fix

### Problem Summary

The `GET /api/dashboard/bans/by-country` endpoint is extremely slow on first load. A single request with ~5,200 unique IPs produces **10,400 SQLite commits** and **6,000 INSERT statements** against the app database — all during a read-only GET request. The log shows 21,000+ lines of SQL trace for just 18 HTTP requests.

Root causes (ordered by impact):

1. **Per-IP commit during geo cache writes** — `geo_service._persist_entry()` and `_persist_neg_entry()` each call `await db.commit()` after every single INSERT. With 5,200 uncached IPs this means 5,200+ individual commits, each forcing an `fsync`. This is the dominant bottleneck.
2. **DB writes on a GET request** — The bans/by-country endpoint passes `app_db` to `geo_service.lookup_batch()`, which triggers an INSERT+COMMIT for every resolved IP. A GET request should never produce database writes or commits; users do not expect loading a map page to mutate the database.
3. **Same pattern in other endpoints** — The following GET endpoints also trigger geo cache commits: `/api/dashboard/bans`, `/api/bans/active`, `/api/history`, `/api/history/{ip}`, `/api/geo/lookup/{ip}`.
### Evidence from `log.log`

- Log line count: **21,117 lines** for 18 HTTP requests
- `INSERT INTO geo_cache`: **6,000** executions
- `db.commit()`: **10,400** calls (each INSERT plus its commit = 2 operations per IP)
- `geo_batch_lookup_start`: reports `total=5200` uncached IPs
- The bans/by-country response appears at line **21,086** of 21,117 — essentially the entire log is one request's geo-persist work
- Other requests (`/api/dashboard/status`, `/api/blocklists/schedule`, `/api/config/map-color-thresholds`) interleave with the geo-persist loop because they share the same single async DB connection

---
### Task 1: Batch geo cache writes — eliminate per-IP commits ✅ DONE

**Summary:** Removed `await db.commit()` from `_persist_entry()` and `_persist_neg_entry()`. Added a single `await db.commit()` (wrapped in try/except) at the end of `lookup_batch()` after all chunk processing, and after each `_persist_entry()` / `_persist_neg_entry()` call in `lookup()`. Reduces commits from ~5,200 to **1** per batch request.

**File:** `backend/app/services/geo_service.py`

**What to change:**

The functions `_persist_entry()` and `_persist_neg_entry()` each call `await db.commit()` after every INSERT. Instead, the commit should happen once after the entire batch is processed.

1. Remove `await db.commit()` from both `_persist_entry()` and `_persist_neg_entry()`.
2. In `lookup_batch()`, after the loop over all chunks is complete and all `_persist_entry()` / `_persist_neg_entry()` calls have been made, issue a single `await db.commit()` if `db is not None`.
3. Wrap the single commit in a try/except to handle any errors gracefully.
**Expected impact:** Reduces commits from ~5,200 to **1** per request. This alone should cut the endpoint response time dramatically.

**Testing:** Existing tests in `test_services/test_ban_service.py` and `test_services/test_geo_service.py` should continue to pass. Verify the `geo_cache` table still gets populated after a batch lookup by checking the DB contents in an integration test.
---
### Task 2: Do not write geo cache during GET requests ✅ DONE

**Summary:** Removed the `db` dependency injection from `GET /api/dashboard/bans` and `GET /api/dashboard/bans/by-country` in `dashboard.py`. Both now pass `app_db=None` to their respective service calls. The other GET endpoints (`/api/bans/active`, `/api/history`, `/api/history/{ip}`, `/api/geo/lookup/{ip}`) already did not pass `db` to geo lookups — confirmed correct.

**Files:** `backend/app/routers/dashboard.py`, `backend/app/routers/bans.py`, `backend/app/routers/history.py`, `backend/app/routers/geo.py`

**What to change:**

GET endpoints should not pass `app_db` (or equivalent) to geo_service functions. Geo resolution should still populate the in-memory cache (which is fast, free, and ephemeral), but should NOT write to SQLite during a read request.

For each of these GET endpoints:

- `GET /api/dashboard/bans/by-country` in `dashboard.py` — stop passing `app_db=db` to `bans_by_country()`; pass `app_db=None` instead.
- `GET /api/dashboard/bans` in `dashboard.py` — stop passing `app_db=db` to `list_bans()`; pass `app_db=None` instead.
- `GET /api/bans/active` in `bans.py` — the enricher callback should not pass `db` to `geo_service.lookup()`.
- `GET /api/history` and `GET /api/history/{ip}` in `history.py` — same: the enricher should not pass `db`.
- `GET /api/geo/lookup/{ip}` in `geo.py` — do not pass `db` to `geo_service.lookup()`.

The persistent geo cache should only be written during explicit write operations:

- `POST /api/geo/re-resolve` (already a POST — this is correct)
- Blocklist import tasks (`blocklist_service.py`)
- Application startup via `load_cache_from_db()`
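A hedged sketch of the handler-level change, with FastAPI routing and dependency injection omitted; `bans_by_country` here is a stand-in whose signature merely mirrors the description above, with the DB replaced by a list of would-be writes so the invariant is visible.

```python
import asyncio

# Hypothetical service function: resolves IPs and, only when app_db is given,
# records which rows it would INSERT into the geo_cache table.
async def bans_by_country(ips, http_session, app_db=None):
    results = {ip: "US" for ip in ips}  # stub for the real batch lookup
    pending_writes = list(results) if app_db is not None else []
    return results, pending_writes

# The GET handler after Task 2: deliberately passes app_db=None so the
# request stays read-only; persistence is deferred to the Task 3 flush job.
async def get_bans_by_country(ips, http_session):
    results, pending_writes = await bans_by_country(ips, http_session, app_db=None)
    assert not pending_writes  # invariant: no SQLite writes during a GET
    return results
```

The design choice: the service keeps its optional `app_db` parameter (so POST paths and import tasks can still persist eagerly), and read-only routes simply opt out.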
**Expected impact:** GET requests become truly read-only. No commits, no `fsync`, no write contention on the app DB during map loads.

**Testing:** Run the full test suite. Verify that:

1. The bans/by-country endpoint still returns correct country data (from the in-memory cache).
2. The `geo_cache` table is still populated when `POST /api/geo/re-resolve` is called or after a blocklist import.
3. After a server restart, geo data is still available (because `load_cache_from_db()` warms memory from previously persisted data).
---
### Task 3: Periodically persist the in-memory geo cache (background task) ✅ DONE

**Summary:** Added `_dirty: set[str]` to `geo_service.py`. `_store()` now adds IPs with a non-null `country_code` to `_dirty`; `clear_cache()` clears it. Added `flush_dirty(db)`, which atomically snapshots and clears `_dirty`, batch-upserts all rows via `executemany()`, commits once, and re-adds entries on failure. Created `backend/app/tasks/geo_cache_flush.py` with a 60-second APScheduler job, registered in `main.py`.

**Files:** `backend/app/services/geo_service.py`, `backend/app/tasks/` (new task file)

**What to change:**

After Task 2, GET requests no longer write to the DB. But IPs newly resolved during GET requests live only in the in-memory cache and would be lost on restart. To avoid this, add a background task that periodically flushes new in-memory entries to the `geo_cache` table.

1. In `geo_service.py`, add a module-level `_dirty: set[str]` that tracks IPs added to `_cache` but not yet persisted. When `_store()` adds an entry, also add the IP to `_dirty`.
2. Add a new function `flush_dirty(db: aiosqlite.Connection) -> int` that:
   - Takes the current `_dirty` set and clears it atomically.
   - Uses `executemany()` to batch-INSERT all dirty entries in one transaction.
   - Calls `db.commit()` once.
   - Returns the count of flushed entries.
3. Register a periodic task (using the existing APScheduler setup) that calls `flush_dirty()` every 60 seconds (configurable), similar to how other background tasks already run.
**Expected impact:** Geo data is persisted without blocking any request. If the server restarts, at most 60 seconds of new geo data is lost (and it will simply be re-fetched from ip-api.com on the next request).

**Testing:** Write a test that:

- Calls `lookup_batch()` without a DB reference.
- Verifies the IPs are in `_dirty`.
- Calls `flush_dirty(db)`.
- Verifies the `geo_cache` table contains the entries and `_dirty` is empty.
---
### Task 4: Reduce redundant SQL queries per request (settings / auth)

**Files:** `backend/app/dependencies.py`, `backend/app/main.py`, `backend/app/repositories/settings_repo.py`

**What to change:**

The log shows that every HTTP request executes at least 2 separate SQL queries before reaching the actual endpoint logic:

1. `SELECT value FROM settings WHERE key = 'setup_completed'` (SetupRedirectMiddleware)
2. `SELECT id, token, ... FROM sessions WHERE token = ?` (the `require_auth` dependency)

When multiple requests arrive concurrently (as seen in the log — 3 parallel requests trigger 3× setup_completed and 3× session-token queries), this adds unnecessary DB contention.

Options (implement one or both):

- **Cache `setup_completed` in memory:** Once setup is complete, it never reverts to incomplete. Cache the result in `app.state` and skip the DB query on subsequent requests. Set it on the first `True` read; it is naturally cleared when the app restarts.
- **Cache session validation briefly:** Use a short-TTL in-memory cache (e.g., 5–10 seconds) for validated session tokens. This reduces repeated DB lookups for the same token across near-simultaneous requests.
**Expected impact:** Reduces per-request overhead from 2+ SQL queries to 0 for the common case (setup done, session recently validated).

**Testing:** Existing auth and setup tests must continue to pass. Add a test that validates the cached path (the second request skips the DB).
---
### Task 5: Audit and verify — run full test suite

After tasks 1–4 are implemented, run:

```bash
cd backend && python -m pytest tests/ -x -q
```

Verify:

- All tests pass (currently 443).
- `ruff check backend/app/` passes.
- `mypy --strict backend/app/` passes on changed files.
- Manual smoke test: load the world map page and verify it renders quickly with correct country data.