BanGUI/Docs/PERFORMANCE.md

# Performance Guidelines

Query optimization patterns for BanGUI backend services.

---

## Never Load Unbounded Result Sets

Loading large result sets into Python memory causes OOM crashes, slow responses, and memory use that grows without bound as the table grows. Every query that processes large datasets must use one of the following strategies.

### The Problem

With millions of ban records:
- Loading all rows as Python dicts → 200-400 MB+ memory spike
- Aggregating in a Python loop (O(n) over all rows) → seconds of CPU time
- Offset pagination on large tables → SQLite reads and discards every skipped row (O(offset)) before returning results

### The Solution: SQL Aggregation

SQL `GROUP BY` executes inside SQLite's query engine, which uses indexes where available and returns only the aggregated result (typically a few KB).

```python
# BAD: loads 1M rows into Python
all_rows = await get_all_archived_history(db, since=since)
agg = {}
for row in all_rows:  # O(n) Python loop
    agg[row["ip"]] = agg.get(row["ip"], 0) + 1

# GOOD: SQL aggregation, returns lightweight {ip, count} pairs
ip_counts = await get_ip_ban_counts(db, since=since)
# [{"ip": "1.2.3.4", "event_count": 42}, ...]: size grows with distinct IPs, not with row count
```
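
For illustration, a repository function along the lines of `get_ip_ban_counts()` might look like the sketch below. It uses the synchronous standard-library `sqlite3` module and an assumed minimal `history_archive` schema (`id`, `ip`, `jail`, `timeofban`). The name `get_ip_ban_counts_sketch` is hypothetical; the real function is async and its exact signature and columns are not shown in this document.

```python
import sqlite3


def get_ip_ban_counts_sketch(db: sqlite3.Connection, since: int) -> list[dict]:
    """Return one {ip, event_count} dict per IP, aggregated inside SQLite."""
    cur = db.execute(
        "SELECT ip, COUNT(*) AS event_count "
        "FROM history_archive "
        "WHERE timeofban >= ? "
        "GROUP BY ip "
        "ORDER BY event_count DESC, ip",
        (since,),
    )
    return [{"ip": ip, "event_count": n} for ip, n in cur]


# Demo on an in-memory database (assumed schema, illustration only).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE history_archive ("
    "id INTEGER PRIMARY KEY, ip TEXT, jail TEXT, timeofban INTEGER)"
)
db.executemany(
    "INSERT INTO history_archive (ip, jail, timeofban) VALUES (?, ?, ?)",
    [("1.2.3.4", "sshd", 100), ("1.2.3.4", "sshd", 200), ("5.6.7.8", "nginx", 150)],
)
ip_counts = get_ip_ban_counts_sketch(db, since=0)
```

Only the aggregated pairs cross the database boundary, so Python never holds the individual ban rows.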

### Aggregation Reference

| Use Case | SQL Pattern | Repository Function |
|----------|-------------|-------------------|
| Ban count per IP | `SELECT ip, COUNT(*) FROM history_archive ... GROUP BY ip` | `get_ip_ban_counts()` |
| Ban count per jail | `SELECT jail, COUNT(*) FROM history_archive ... GROUP BY jail ORDER BY COUNT(*) DESC` | `get_jail_ban_counts()` |
| Ban count per time bucket | `SELECT CAST((timeofban - ?) / ? AS INTEGER) AS bucket_idx, COUNT(*) ... GROUP BY bucket_idx` | `get_ban_counts_by_bucket()` |
| Paginated rows (no offset) | `WHERE id < ? ORDER BY id DESC LIMIT ?` | `get_archived_history_keyset()` |
| Total count | `SELECT COUNT(*) FROM ...` (fast when the `WHERE` clause is served by an index) | Included in the return value of `get_jail_ban_counts()` |
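
The time-bucket pattern from the table can be sketched as follows. This is a minimal illustration, assuming `timeofban` is stored as integer seconds; the function name, its parameters, and the zero-filling of empty buckets are assumptions, not the actual `get_ban_counts_by_bucket()` implementation.

```python
import sqlite3


def get_ban_counts_by_bucket_sketch(
    db: sqlite3.Connection, start: int, bucket_seconds: int, num_buckets: int
) -> list[int]:
    """Count bans per fixed-width time bucket; empty buckets are reported as 0."""
    end = start + bucket_seconds * num_buckets
    cur = db.execute(
        "SELECT CAST((timeofban - ?) / ? AS INTEGER) AS bucket_idx, COUNT(*) "
        "FROM history_archive "
        "WHERE timeofban >= ? AND timeofban < ? "
        "GROUP BY bucket_idx",
        (start, bucket_seconds, start, end),
    )
    counts = [0] * num_buckets
    for bucket_idx, n in cur:
        counts[bucket_idx] = n
    return counts


# Demo: three 60-second buckets starting at t=0 (assumed schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE history_archive (id INTEGER PRIMARY KEY, ip TEXT, timeofban INTEGER)")
db.executemany(
    "INSERT INTO history_archive (ip, timeofban) VALUES (?, ?)",
    [("a", 0), ("b", 30), ("c", 60), ("d", 130), ("e", 200)],  # t=200 is out of range
)
bucket_counts = get_ban_counts_by_bucket_sketch(db, start=0, bucket_seconds=60, num_buckets=3)
```

The result has one integer per bucket, so its size depends on the chart resolution rather than on the number of ban rows.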

### Pagination vs Aggregation

Use **aggregation** when:
- Displaying summary data (counts, totals, group-by results)
- Building country/jail/timeline dashboards
- You only need counts, not individual row data

Use **pagination** when:
- Displaying individual records (ban list, history)
- Clients need access to specific rows
- Running exports or bulk operations
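
When pagination is the right choice, the keyset pattern (`WHERE id < ? ORDER BY id DESC LIMIT ?`) could be wrapped roughly as shown below. The function name, the `before_id` cursor parameter, and the returned `(rows, next_cursor)` shape are assumptions for illustration; the actual `get_archived_history_keyset()` signature is not shown in this document.

```python
import sqlite3
from typing import Optional


def get_archived_history_keyset_sketch(
    db: sqlite3.Connection, before_id: Optional[int], limit: int
) -> tuple[list[tuple], Optional[int]]:
    """Return one page of rows (newest first) and the cursor for the next page."""
    if before_id is None:
        cur = db.execute(
            "SELECT id, ip, jail FROM history_archive ORDER BY id DESC LIMIT ?",
            (limit,),
        )
    else:
        cur = db.execute(
            "SELECT id, ip, jail FROM history_archive "
            "WHERE id < ? ORDER BY id DESC LIMIT ?",
            (before_id, limit),
        )
    rows = cur.fetchall()
    # A short page means nothing is left; otherwise continue below the last id seen.
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor


# Demo: walk five rows in pages of two (assumed schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE history_archive (id INTEGER PRIMARY KEY, ip TEXT, jail TEXT)")
db.executemany(
    "INSERT INTO history_archive (ip, jail) VALUES (?, ?)",
    [(f"10.0.0.{i}", "sshd") for i in range(1, 6)],
)
page_ids = []
cursor = None
while True:
    rows, cursor = get_archived_history_keyset_sketch(db, cursor, limit=2)
    page_ids.append([row[0] for row in rows])
    if cursor is None:
        break
```

Each page starts directly at the cursor position, so later pages cost the same as the first one.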

### Batch Geo Lookups

When you need geo data for many IPs, batch them into a single call rather than looking up each IP separately:

```python
# BAD: N sequential API calls
for ip in unique_ips:
    geo = await geo_service.lookup(ip)  # every call counts against the 45 req/min rate limit

# GOOD: serve hits from the in-memory cache, then resolve all misses in one batch call;
# the geo service handles rate limiting
geo_map, uncached = geo_cache_lookup(unique_ips)
if uncached:
    # Fire-and-forget. Keep a reference to the returned task if it must not be
    # garbage-collected before it finishes.
    asyncio.create_task(geo_cache.lookup_batch(uncached, http_session))
```
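
The cache-first split used above can be sketched in a few lines. `split_cached` and the plain-dict cache are hypothetical stand-ins; the actual `geo_cache_lookup` implementation is not shown in this document.

```python
def split_cached(ips, cache):
    """Partition IPs into (hits found in the cache, misses still to resolve)."""
    geo_map = {}
    uncached = []
    for ip in dict.fromkeys(ips):  # de-duplicate while preserving order
        if ip in cache:
            geo_map[ip] = cache[ip]
        else:
            uncached.append(ip)
    return geo_map, uncached


# Demo with a one-entry cache (values are illustrative country codes).
cache = {"1.2.3.4": "DE"}
geo_map, uncached = split_cached(["1.2.3.4", "5.6.7.8", "5.6.7.8"], cache)
```

Only the `uncached` list needs to reach the rate-limited service, and duplicates are removed before the batch is built.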

### Index Requirements

SQLite needs indexes on:
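- Columns used in WHERE clauses (`timeofban`, `jail`, `action`)
- Columns used in GROUP BY (`ip`, `jail`); the bucket index is a computed expression, so bucket queries rely on the `timeofban` index for filtering
- Sort columns for pagination (`id`)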

Current indexes on `history_archive`:
- `idx_history_archive_timeofban` — for time-range filtering
- `idx_history_archive_jail_timeofban` — for jail + time filtering
- `idx_history_archive_action_timeofban` — for action + time filtering
- `idx_history_archive_id` — for keyset pagination

Before adding a new query pattern, verify that it uses an existing index, or add a new index together with a benchmark test.

### Memory Monitoring

Watch for these warning signs:
- Python RSS > 500 MB in container metrics
- Response time > 5s for dashboard endpoints
- Query time > 1s, measured with `.timer on` in the `sqlite3` shell or with application-side timing (SQLite has no `EXPLAIN ANALYZE`)
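
One way to catch slow queries in the application is a small timing wrapper, sketched below. The helper `timed_query`, the logger name, and the threshold constant are all hypothetical and not part of the existing codebase.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("bangui.perf")  # hypothetical logger name

SLOW_QUERY_SECONDS = 1.0  # assumed threshold, matching the 1s warning sign


@contextmanager
def timed_query(label: str, threshold: float = SLOW_QUERY_SECONDS):
    """Measure wall-clock time of the wrapped block and warn when it is slow."""
    timing = {"label": label, "seconds": 0.0, "slow": False}
    start = time.perf_counter()
    try:
        yield timing
    finally:
        timing["seconds"] = time.perf_counter() - start
        timing["slow"] = timing["seconds"] > threshold
        if timing["slow"]:
            logger.warning("slow query %s: %.3fs", label, timing["seconds"])


# Demo: a zero threshold flags any work, a 60s threshold does not.
with timed_query("demo-slow", threshold=0.0) as t_slow:
    sum(range(100_000))
with timed_query("demo-fast", threshold=60.0) as t_fast:
    pass
```

In a repository function, the `with` block would wrap the database call so that the measured time covers both query execution and row fetching.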

Use `EXPLAIN QUERY PLAN` to verify index usage:
```sql
EXPLAIN QUERY PLAN SELECT ip, COUNT(*) FROM history_archive WHERE timeofban >= ? GROUP BY ip;
```

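Expected: a `SEARCH` line containing `USING INDEX idx_history_archive_timeofban` in the output. A `SCAN` line that names no index means SQLite is reading the whole table.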
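
The same check can be automated in a test. The sketch below runs `EXPLAIN QUERY PLAN` through the standard-library `sqlite3` module against an in-memory table with an assumed schema; the helper name `query_plan` is hypothetical.

```python
import sqlite3


def query_plan(db: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = db.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail is the last column of each row


# Assumed minimal schema plus the time-range index named in this document.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE history_archive ("
    "id INTEGER PRIMARY KEY, ip TEXT, jail TEXT, timeofban INTEGER)"
)
db.execute("CREATE INDEX idx_history_archive_timeofban ON history_archive (timeofban)")

plan = query_plan(
    db,
    "SELECT ip, COUNT(*) FROM history_archive WHERE timeofban >= ? GROUP BY ip",
    (0,),
)
uses_index = any("idx_history_archive_timeofban" in step for step in plan)
```

Asserting on `uses_index` in a test makes a dropped or renamed index fail fast instead of surfacing later as a slow dashboard. The exact plan text varies between SQLite versions, so matching on the index name is more robust than comparing whole lines.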