# Performance Guidelines Query optimization patterns for BanGUI backend services. --- ## Never Load Unbounded Result Sets Loading large result sets into Python memory causes OOM crashes, slow responses, and unbounded growth. Every query that processes large datasets must use one of the following strategies. ### The Problem With millions of ban records: - Loading all rows as Python dicts → 200-400 MB+ memory spike - Python loop aggregation (O(n) per item) → seconds of CPU time - Offset pagination on large tables → O(n) scan before returning results ### The Solution: SQL Aggregation SQL GROUP BY executes inside SQLite's optimized query planner, using indexes where available, and returns only the aggregated result (typically a few KB). ```python # BAD: loads 1M rows into Python all_rows = await get_all_archived_history(db, since=since) agg = {} for row in all_rows: # O(n) Python loop agg[row["ip"]] = agg.get(row["ip"], 0) + 1 # GOOD: SQL aggregation, returns lightweight {ip, count} pairs ip_counts = await get_ip_ban_counts(db, since=since) # [{ip: "1.2.3.4", event_count: 42}, ...] — a few KB regardless of table size ``` ### Aggregation Reference | Use Case | SQL Pattern | Repository Function | |----------|-------------|-------------------| | Ban count per IP | `SELECT ip, COUNT(*) FROM history_archive ... GROUP BY ip` | `get_ip_ban_counts()` | | Ban count per jail | `SELECT jail, COUNT(*) FROM history_archive ... GROUP BY jail ORDER BY COUNT(*) DESC` | `get_jail_ban_counts()` | | Ban count per time bucket | `SELECT CAST((timeofban - ?) / ? AS INTEGER), COUNT(*) ... GROUP BY bucket_idx` | `get_ban_counts_by_bucket()` | | Paginated rows (no offset) | `WHERE id < ? ORDER BY id DESC LIMIT ?` | `get_archived_history_keyset()` | | Total count | `SELECT COUNT(*) FROM ...` (fast with where clause) | included in `get_jail_ban_counts()` return | ### Pagination vs Aggregation Use **aggregation** when: - Displaying summary data (counts, totals, group-by results) - Building country/jail/timeline dashboards - Only need counts, not individual row data Use **pagination** when: - Displaying individual records (ban list, history) - Clients need access to specific rows - Exporting or bulk operations ### Batch Geo Lookups When you need geo data for many IPs, batch in a single call rather than per-IP: ```python # BAD: N sequential API calls for ip in unique_ips: geo = await geo_service.lookup(ip) # 45 req/min rate limit × N calls # GOOD: one batch call, geo_service handles rate limiting geo_map, uncached = geo_cache_lookup(unique_ips) # uses in-memory cache if uncached: asyncio.create_task(geo_cache.lookup_batch(uncached, http_session)) # fire-and-forget ``` ### Index Requirements SQLite needs indexes on: - Columns used in WHERE clauses (timeofban, jail, action) - Columns used in GROUP BY (ip, jail, bucket index) - Sort columns for pagination (id) Current indexes on `history_archive`: - `idx_history_archive_timeofban` — for time-range filtering - `idx_history_archive_jail_timeofban` — for jail + time filtering - `idx_history_archive_action_timeofban` — for action + time filtering - `idx_history_archive_id` — for keyset pagination Before adding a new query pattern, verify it uses an existing index or add one with a benchmark test. ### Memory Monitoring Watch for these warning signs: - Python RSS > 500 MB in container metrics - Response time > 5s for dashboard endpoints - Query time > 1s in SQLite EXPLAIN ANALYZE output Use `EXPLAIN QUERY PLAN` to verify index usage: ```sql EXPLAIN QUERY PLAN SELECT ip, COUNT(*) FROM history_archive WHERE timeofban >= ? GROUP BY ip; ``` Expected: `USING INDEX idx_history_archive_timeofban` in the output.