Lukas 44a5a3d70e Fix geo cache write performance: batch commits, read-only GETs, dirty flush
- Remove per-IP db.commit() from _persist_entry() and _persist_neg_entry();
  add a single commit after the full lookup_batch() chunk loop instead.
  Reduces commits from ~5,200 to 1 per bans/by-country request.

- Remove db dependency from GET /api/dashboard/bans and
  GET /api/dashboard/bans/by-country; pass app_db=None so no SQLite
  writes occur during read-only requests.

- Add _dirty set to geo_service; _store() marks resolved IPs dirty.
  New flush_dirty(db) batch-upserts all dirty entries in one transaction.
  New geo_cache_flush APScheduler task flushes every 60 s so geo data
  is persisted without blocking requests.
2026-03-10 18:45:58 +01:00


BanGUI — Task List

This document breaks the entire BanGUI project into development stages, ordered so that each stage builds on the previous one. Every task is described in prose with enough detail for a developer to begin work. References point to the relevant documentation.


Issue: World Map Loading Time — Architecture Fix

Problem Summary

The GET /api/dashboard/bans/by-country endpoint is extremely slow on first load. A single request with ~5,200 unique IPs produces 10,400 SQLite commits and 6,000 INSERT statements against the app database — all during a read-only GET request. The log shows 21,000+ lines of SQL trace for just 18 HTTP requests.

Root causes (ordered by impact):

  1. Per-IP commit during geo cache writes — geo_service._persist_entry() and _persist_neg_entry() each call await db.commit() after every single INSERT. With 5,200 uncached IPs this means 5,200+ individual commits, each forcing an fsync. This is the dominant bottleneck.
  2. DB writes on a GET request — The bans/by-country endpoint passes app_db to geo_service.lookup_batch(), which triggers INSERT+COMMIT for every resolved IP. A GET request should never produce database writes/commits. Users do not expect loading a map page to mutate the database.
  3. Same pattern exists in other endpoints — The following GET endpoints also trigger geo cache commits: /api/dashboard/bans, /api/bans/active, /api/history, /api/history/{ip}, /api/geo/lookup/{ip}.

Evidence from log.log

  • Log line count: 21,117 lines for 18 HTTP requests
  • INSERT INTO geo_cache: 6,000 executions
  • db.commit(): 10,400 calls (each INSERT + its commit = 2 ops per IP)
  • geo_batch_lookup_start: reports total=5200 uncached IPs
  • The bans/by-country response is at line 21,086 out of 21,117 — the entire log is essentially one request's geo persist work
  • Other requests (/api/dashboard/status, /api/blocklists/schedule, /api/config/map-color-thresholds) interleave with the geo persist loop because they share the same single async DB connection

Task 1: Batch geo cache writes — eliminate per-IP commits DONE

Summary: Removed await db.commit() from _persist_entry() and _persist_neg_entry(). Added a single await db.commit() (wrapped in try/except) at the end of lookup_batch() after all chunk processing, and after each _persist_entry / _persist_neg_entry call in lookup(). Reduces commits from ~5,200 to 1 per batch request.

File: backend/app/services/geo_service.py

What to change:

The functions _persist_entry() and _persist_neg_entry() each call await db.commit() after every INSERT. Instead, the commit should happen once after the entire batch is processed.

  1. Remove await db.commit() from both _persist_entry() and _persist_neg_entry().
  2. In lookup_batch(), after the loop over all chunks is complete and all _persist_entry() / _persist_neg_entry() calls have been made, issue a single await db.commit() if db is not None.
  3. Wrap the single commit in a try/except to handle any errors gracefully.
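
A minimal, self-contained sketch of the batched-commit shape. It uses the stdlib sqlite3 module so it runs as-is; the real service is async and uses aiosqlite, but the structure is the same. Function names mirror the task; the schema and the resolve callback are illustrative, not the real code.

```python
import sqlite3

def _persist_entry(db, ip, country_code):
    # No per-row commit any more; the caller commits once per batch.
    db.execute(
        "INSERT OR REPLACE INTO geo_cache (ip, country_code) VALUES (?, ?)",
        (ip, country_code),
    )

def lookup_batch(ips, resolve, db=None):
    results = {ip: resolve(ip) for ip in ips}
    if db is not None:
        for ip, cc in results.items():
            _persist_entry(db, ip, cc)
        try:
            db.commit()  # one commit (one fsync) for the whole batch
        except sqlite3.Error:
            pass  # the real service should log this instead
    return results

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE geo_cache (ip TEXT PRIMARY KEY, country_code TEXT)")
out = lookup_batch(["1.2.3.4", "5.6.7.8"], lambda ip: "DE", db)
row_count = db.execute("SELECT COUNT(*) FROM geo_cache").fetchone()[0]
```

With 5,200 IPs this shape issues 5,200 INSERTs but only one commit, which is where the fsync cost lives.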

Expected impact: Reduces commits from ~5,200 to 1 per request. This alone should cut the endpoint response time dramatically.

Testing: Existing tests in test_services/test_ban_service.py and test_services/test_geo_service.py should continue to pass. Verify the geo_cache table still gets populated after a batch lookup by checking the DB contents in an integration test.


Task 2: Do not write geo cache during GET requests DONE

Summary: Removed db dependency injection from GET /api/dashboard/bans and GET /api/dashboard/bans/by-country in dashboard.py. Both now pass app_db=None to their respective service calls. The other GET endpoints (/api/bans/active, /api/history, /api/history/{ip}, /api/geo/lookup/{ip}) already did not pass db to geo lookups — confirmed correct.

Files: backend/app/routers/dashboard.py, backend/app/routers/bans.py, backend/app/routers/history.py, backend/app/routers/geo.py

What to change:

GET endpoints should not pass app_db (or equivalent) to geo_service functions. The geo resolution should still populate the in-memory cache (which is fast, free, and ephemeral), but should NOT write to SQLite during a read request.

For each of these GET endpoints:

  • GET /api/dashboard/bans/by-country in dashboard.py — stop passing app_db=db to bans_by_country(); pass app_db=None instead.
  • GET /api/dashboard/bans in dashboard.py — stop passing app_db=db to list_bans(); pass app_db=None instead.
  • GET /api/bans/active in bans.py — the enricher callback should not pass db to geo_service.lookup().
  • GET /api/history and GET /api/history/{ip} in history.py — same: enricher should not pass db.
  • GET /api/geo/lookup/{ip} in geo.py — do not pass db to geo_service.lookup().

The persistent geo cache should only be written during explicit write operations:

  • POST /api/geo/re-resolve (already a POST — this is correct)
  • Blocklist import tasks (blocklist_service.py)
  • Application startup via load_cache_from_db()

Expected impact: GET requests become truly read-only. No commits, no fsync, no write contention on the app DB during map loads.

Testing: Run the full test suite. Verify that:

  1. The bans/by-country endpoint still returns correct country data (from in-memory cache).
  2. The geo_cache table is still populated when POST /api/geo/re-resolve is called or after blocklist import.
  3. After a server restart, geo data is still available (because load_cache_from_db() warms memory from previously persisted data).

Task 3: Periodically persist the in-memory geo cache (background task) DONE

Summary: Added _dirty: set[str] to geo_service.py. _store() now adds IPs with a non-null country_code to _dirty; clear_cache() clears it. Added flush_dirty(db) which atomically snapshots/clears _dirty, batch-upserts all rows via executemany(), commits once, and re-adds entries on failure. Created backend/app/tasks/geo_cache_flush.py with a 60-second APScheduler job, registered in main.py.

Files: backend/app/services/geo_service.py, backend/app/tasks/ (new task file)

What to change:

After Task 2, GET requests no longer write to the DB. But newly resolved IPs during GET requests only live in the in-memory cache and would be lost on restart. To avoid this, add a background task that periodically flushes new in-memory entries to the geo_cache table.

  1. In geo_service.py, add a module-level _dirty: set[str] that tracks IPs added to _cache but not yet persisted. When _store() adds an entry, also add the IP to _dirty.
  2. Add a new function flush_dirty(db: aiosqlite.Connection) -> int that:
    • Takes the current _dirty set and clears it atomically.
    • Uses executemany() to batch-INSERT all dirty entries in one transaction.
    • Calls db.commit() once.
    • Returns the count of flushed entries.
  3. Register a periodic task (using the existing APScheduler setup) that calls flush_dirty() every 60 seconds (configurable). This is similar to how other background tasks already run.

Expected impact: Geo data is persisted without blocking any request. If the server restarts, at most 60 seconds of new geo data is lost (and it will simply be re-fetched from ip-api.com on the next request).

Testing: Write a test that:

  • Calls lookup_batch() without a DB reference.
  • Verifies IPs are in _dirty.
  • Calls flush_dirty(db).
  • Verifies the geo_cache table contains the entries and _dirty is empty.

Task 4: Reduce redundant SQL queries per request (settings / auth)

Files: backend/app/dependencies.py, backend/app/main.py, backend/app/repositories/settings_repo.py

What to change:

The log shows that every single HTTP request executes at least 2 separate SQL queries before reaching the actual endpoint logic:

  1. SELECT value FROM settings WHERE key = 'setup_completed' (SetupRedirectMiddleware)
  2. SELECT id, token, ... FROM sessions WHERE token = ? (require_auth dependency)

When multiple requests arrive concurrently (as seen in the log — 3 parallel requests trigger 3× setup_completed + 3× session token queries), this adds unnecessary DB contention.

Options (implement one or both):

  • Cache setup_completed in memory: Once setup is complete, it never goes back to incomplete. Cache the result in app.state and skip the DB query on subsequent requests. Set it on the first True read; because app.state is rebuilt on startup, the cached flag resets automatically when the app restarts.
  • Cache session validation briefly: Use a short TTL in-memory cache (e.g., 5–10 seconds) for validated session tokens. This reduces repeated DB lookups for the same token across near-simultaneous requests.
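
A sketch of the second option, a tiny TTL cache in front of the session lookup. The names and the dict-shaped session row are assumptions; the real dependency lives in dependencies.py. Note the trade-off: session revocation can lag by up to the TTL, which is why it should stay short.

```python
import time

_session_cache: dict = {}  # token -> (validated_at, session row)
SESSION_TTL = 10.0  # seconds; revocation lags by at most this long

def validate_session(token, db_lookup, now=time.monotonic):
    hit = _session_cache.get(token)
    if hit is not None and now() - hit[0] < SESSION_TTL:
        return hit[1]  # recently validated: no SQL at all
    session = db_lookup(token)  # SELECT ... FROM sessions WHERE token = ?
    if session is not None:
        _session_cache[token] = (now(), session)
    return session

# Near-simultaneous requests with the same token hit the DB only once.
calls = []
def fake_db_lookup(token):
    calls.append(token)
    return {"token": token, "user_id": 1}

validate_session("abc", fake_db_lookup)
validate_session("abc", fake_db_lookup)  # cache hit, no second query
```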

Expected impact: Reduces per-request overhead from 2+ SQL queries to 0 for the common case (setup done, session recently validated).

Testing: Existing auth and setup tests must continue to pass. Add a test that validates the cached path (second request skips DB).


Task 5: Audit and verify — run full test suite

After tasks 1–4 are implemented, run:

cd backend && python -m pytest tests/ -x -q

Verify:

  • All tests pass (currently 443).
  • ruff check backend/app/ passes.
  • mypy --strict backend/app/ passes on changed files.
  • Manual smoke test: load the world map page and verify it renders quickly with correct country data.