# Performance Guidelines
Query optimization patterns for BanGUI backend services.

---
## Never Load Unbounded Result Sets
Loading large result sets into Python memory causes OOM crashes, slow responses, and unbounded memory growth. Every query that processes a large dataset must use one of the following strategies.
### The Problem
With millions of ban records:
- Loading all rows as Python dicts → 200-400 MB+ memory spike
- Python loop aggregation (O(n) over all rows, with interpreter overhead on each one) → seconds of CPU time
- Offset pagination on large tables → SQLite must step over every skipped row (O(offset)) before returning results
### The Solution: SQL Aggregation
`GROUP BY` runs inside SQLite's query engine, uses indexes where available, and returns only the aggregated result (typically a few KB).
```python
# BAD: loads 1M rows into Python
all_rows = await get_all_archived_history(db, since=since)
agg = {}
for row in all_rows:  # O(n) Python loop
    agg[row["ip"]] = agg.get(row["ip"], 0) + 1

# GOOD: SQL aggregation, returns lightweight {ip, count} pairs
ip_counts = await get_ip_ban_counts(db, since=since)
# [{"ip": "1.2.3.4", "event_count": 42}, ...]: sized by distinct IPs, not by row count
```
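As a rough illustration of the "GOOD" path, the sketch below shows what a function like `get_ip_ban_counts()` might do internally. It uses the standard-library `sqlite3` module synchronously; the function name `get_ip_ban_counts_sketch`, the table columns, and the ordering are assumptions made for the example, not the actual repository code.

```python
import sqlite3


def get_ip_ban_counts_sketch(db: sqlite3.Connection, since: int) -> list[dict]:
    """Hypothetical stand-in for get_ip_ban_counts(): aggregate inside SQLite."""
    cur = db.execute(
        "SELECT ip, COUNT(*) AS event_count "
        "FROM history_archive WHERE timeofban >= ? "
        "GROUP BY ip ORDER BY event_count DESC, ip",
        (since,),
    )
    # Only one small dict per distinct IP ever reaches Python.
    return [{"ip": ip, "event_count": n} for ip, n in cur]


# Assumed minimal schema, for demonstration only.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE history_archive "
    "(id INTEGER PRIMARY KEY, ip TEXT, jail TEXT, timeofban INTEGER)"
)
db.executemany(
    "INSERT INTO history_archive (ip, jail, timeofban) VALUES (?, ?, ?)",
    [
        ("1.2.3.4", "sshd", 100),
        ("1.2.3.4", "sshd", 200),
        ("5.6.7.8", "nginx", 150),
        ("9.9.9.9", "sshd", 50),  # older than `since`, filtered out
    ],
)
ip_counts = get_ip_ban_counts_sketch(db, since=100)
```

The Python side never sees individual ban rows, so memory use follows the number of distinct IPs in the time window.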
### Aggregation Reference
| Use Case | SQL Pattern | Repository Function |
|----------|-------------|-------------------|
| Ban count per IP | `SELECT ip, COUNT(*) FROM history_archive ... GROUP BY ip` | `get_ip_ban_counts()` |
| Ban count per jail | `SELECT jail, COUNT(*) FROM history_archive ... GROUP BY jail ORDER BY COUNT(*) DESC` | `get_jail_ban_counts()` |
| Ban count per time bucket | `SELECT CAST((timeofban - ?) / ? AS INTEGER) AS bucket_idx, COUNT(*) ... GROUP BY bucket_idx` | `get_ban_counts_by_bucket()` |
| Paginated rows (no offset) | `WHERE id < ? ORDER BY id DESC LIMIT ?` | `get_archived_history_keyset()` |
| Total count | `SELECT COUNT(*) FROM ...` (fast when the `WHERE` clause can use an index) | included in `get_jail_ban_counts()` return |
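
The time-bucket row is the least obvious pattern in the table, so here is a minimal sketch of it using the standard-library `sqlite3` module. The function name `get_ban_counts_by_bucket_sketch`, its parameters, and the zero-filling of empty buckets are assumptions for illustration; the real `get_ban_counts_by_bucket()` may behave differently.

```python
import sqlite3


def get_ban_counts_by_bucket_sketch(
    db: sqlite3.Connection, since: int, bucket_seconds: int, num_buckets: int
) -> list[int]:
    """Hypothetical sketch: count bans per fixed-width time bucket in SQL."""
    rows = db.execute(
        "SELECT CAST((timeofban - ?) / ? AS INTEGER) AS bucket_idx, COUNT(*) "
        "FROM history_archive WHERE timeofban >= ? "
        "GROUP BY bucket_idx",
        (since, bucket_seconds, since),
    ).fetchall()
    # SQLite returns only non-empty buckets; fill the gaps with zeros.
    counts = [0] * num_buckets
    for idx, n in rows:
        if 0 <= idx < num_buckets:
            counts[idx] = n
    return counts


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE history_archive (id INTEGER PRIMARY KEY, timeofban INTEGER)")
db.executemany(
    "INSERT INTO history_archive (timeofban) VALUES (?)",
    [(10,), (50,), (70,), (125,), (130,)],
)
buckets = get_ban_counts_by_bucket_sketch(db, since=0, bucket_seconds=60, num_buckets=4)
```

With 60-second buckets starting at 0, the five sample timestamps fall into buckets 0, 0, 1, 2, and 2, and the last bucket stays empty.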
### Pagination vs Aggregation
Use **aggregation** when:
- Displaying summary data (counts, totals, group-by results)
- Building country/jail/timeline dashboards
- You only need counts, not individual row data

Use **pagination** when:
- Displaying individual records (ban list, history)
- Clients need access to specific rows
- Exporting or bulk operations
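
For the pagination case, a keyset query in the `WHERE id < ? ORDER BY id DESC LIMIT ?` style could look roughly like the sketch below. The function name `get_archived_history_keyset_sketch`, the returned cursor convention, and the two-column schema are assumptions for the example rather than the actual `get_archived_history_keyset()` signature.

```python
import sqlite3
from typing import Optional


def get_archived_history_keyset_sketch(
    db: sqlite3.Connection, before_id: Optional[int], limit: int
) -> tuple[list[tuple], Optional[int]]:
    """Hypothetical sketch: newest-first keyset pagination on the id column."""
    if before_id is None:
        # First page: start from the newest row.
        rows = db.execute(
            "SELECT id, ip FROM history_archive ORDER BY id DESC LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        # Later pages: seek directly past the last id the client saw.
        rows = db.execute(
            "SELECT id, ip FROM history_archive WHERE id < ? ORDER BY id DESC LIMIT ?",
            (before_id, limit),
        ).fetchall()
    # A short page means there is nothing left to fetch.
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE history_archive (id INTEGER PRIMARY KEY, ip TEXT)")
db.executemany(
    "INSERT INTO history_archive (id, ip) VALUES (?, ?)",
    [(i, f"10.0.0.{i}") for i in range(1, 6)],
)
page1, cursor1 = get_archived_history_keyset_sketch(db, None, 2)
page2, cursor2 = get_archived_history_keyset_sketch(db, cursor1, 2)
page3, cursor3 = get_archived_history_keyset_sketch(db, cursor2, 2)
```

Each page costs one index seek plus `limit` rows, no matter how deep into the table the client has scrolled. When the table size is an exact multiple of `limit`, this cursor convention yields one final empty page.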
### Batch Geo Lookups
When you need geo data for many IPs, batch in a single call rather than per-IP:
```python
import asyncio

# BAD: N sequential API calls
for ip in unique_ips:
    geo = await geo_service.lookup(ip)  # 45 req/min rate limit × N calls

# GOOD: one batch call, geo_service handles rate limiting
geo_map, uncached = geo_cache_lookup(unique_ips)  # uses in-memory cache
if uncached:
    # Fire-and-forget; asyncio keeps only a weak reference to the task,
    # so store refresh_task somewhere long-lived until it finishes
    refresh_task = asyncio.create_task(geo_cache.lookup_batch(uncached, http_session))
```
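The cache-then-background-refresh flow above can be sketched end to end as follows. Everything here is a hypothetical stand-in: `_geo_cache`, `geo_cache_lookup_sketch`, `lookup_batch_sketch`, and `schedule_refresh` are illustration names, and the batch resolver just writes placeholder data instead of calling a real geo API. The one non-obvious detail it demonstrates is keeping a strong reference to the fire-and-forget task, which the `asyncio` documentation recommends.

```python
import asyncio

_geo_cache: dict[str, dict] = {}  # hypothetical in-memory cache
_background_tasks: set[asyncio.Task] = set()  # keeps fire-and-forget tasks alive


def geo_cache_lookup_sketch(ips: list[str]) -> tuple[dict[str, dict], list[str]]:
    """Split IPs into cache hits (returned now) and misses (resolved later)."""
    geo_map = {ip: _geo_cache[ip] for ip in ips if ip in _geo_cache}
    uncached = [ip for ip in ips if ip not in _geo_cache]
    return geo_map, uncached


async def lookup_batch_sketch(ips: list[str]) -> None:
    """Placeholder batch resolver: a real one would call the geo API once."""
    await asyncio.sleep(0)
    for ip in ips:
        _geo_cache[ip] = {"country": "unknown"}


def schedule_refresh(uncached: list[str]) -> asyncio.Task:
    """Start the batch lookup without blocking the current request."""
    task = asyncio.create_task(lookup_batch_sketch(uncached))
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return task


async def demo():
    _geo_cache["1.2.3.4"] = {"country": "DE"}
    geo_map, uncached = geo_cache_lookup_sketch(["1.2.3.4", "5.6.7.8"])
    task = schedule_refresh(uncached)
    await task  # awaited only so the demo can inspect the cache afterwards
    return geo_map, uncached, sorted(_geo_cache)


result = asyncio.run(demo())
```

In a request handler the task would not be awaited; the response is built from `geo_map` and the misses show up in the cache on a later request.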
### Index Requirements
SQLite needs indexes on:
- Columns used in WHERE clauses (timeofban, jail, action)
- Columns used in GROUP BY (ip, jail, bucket index)
- Sort columns for pagination (id)

Current indexes on `history_archive`:
- `idx_history_archive_timeofban`: time-range filtering
- `idx_history_archive_jail_timeofban`: jail + time filtering
- `idx_history_archive_action_timeofban`: action + time filtering
- `idx_history_archive_id`: keyset pagination

Before adding a new query pattern, verify it uses an existing index or add one with a benchmark test.
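
One way to make the "verify it uses an existing index" step repeatable is a small test helper that inspects the query plan. The sketch below is an assumption-laden example: the helper names `plan_details` and `uses_index` are invented here, and the column list of `idx_history_archive_jail_timeofban` is guessed from its name rather than taken from the real schema.

```python
import sqlite3


def plan_details(db: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in db.execute("EXPLAIN QUERY PLAN " + sql, params)]


def uses_index(db: sqlite3.Connection, sql: str, params: tuple, index_name: str) -> bool:
    """True if any step of the plan mentions the given index."""
    return any(index_name in detail for detail in plan_details(db, sql, params))


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE history_archive "
    "(id INTEGER PRIMARY KEY, ip TEXT, jail TEXT, action TEXT, timeofban INTEGER)"
)
# Assumed definition: the real index may cover different columns.
db.execute(
    "CREATE INDEX idx_history_archive_jail_timeofban "
    "ON history_archive (jail, timeofban)"
)

indexed = uses_index(
    db,
    "SELECT COUNT(*) FROM history_archive WHERE jail = ? AND timeofban >= ?",
    ("sshd", 0),
    "idx_history_archive_jail_timeofban",
)
# Filtering on ip alone has no supporting index in this sketch.
unindexed = uses_index(
    db,
    "SELECT COUNT(*) FROM history_archive WHERE ip = ?",
    ("1.2.3.4",),
    "idx_history_archive",
)
```

A check like this catches a missing index before the table is large enough for the slowdown to show up in timings.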
### Memory Monitoring
Watch for these warning signs:
- Python RSS > 500 MB in container metrics
- Response time > 5s for dashboard endpoints
- Query time > 1s (SQLite has no `EXPLAIN ANALYZE`; time queries with `.timer on` in the `sqlite3` shell or around the call in application code)

Use `EXPLAIN QUERY PLAN` to verify index usage:
```sql
EXPLAIN QUERY PLAN SELECT ip, COUNT(*) FROM history_archive WHERE timeofban >= ? GROUP BY ip;
```
Expected: the plan contains `USING INDEX idx_history_archive_timeofban`. A `SCAN history_archive` line with no index named means SQLite is reading the whole table.
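
`EXPLAIN QUERY PLAN` shows the plan but not how long the query takes, so the 1 s warning threshold has to be measured by the caller. A minimal sketch of such a wrapper follows; the helper name `timed_query`, the logger name, and the choice to log rather than raise are all assumptions for illustration.

```python
import logging
import sqlite3
import time

logger = logging.getLogger("bangui.perf")  # hypothetical logger name
SLOW_QUERY_SECONDS = 1.0  # threshold taken from the warning signs above


def timed_query(db, sql, params=(), threshold=SLOW_QUERY_SECONDS):
    """Run a query, return (rows, elapsed_seconds), and log if it was slow.

    Uses fetchall(), so it is meant for bounded results such as aggregates.
    """
    start = time.perf_counter()
    rows = db.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    if elapsed > threshold:
        logger.warning("slow query (%.2fs): %s", elapsed, sql)
    return rows, elapsed


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE history_archive (id INTEGER PRIMARY KEY, ip TEXT, timeofban INTEGER)"
)
db.execute("INSERT INTO history_archive (ip, timeofban) VALUES ('1.2.3.4', 100)")
rows, elapsed = timed_query(
    db,
    "SELECT ip, COUNT(*) FROM history_archive WHERE timeofban >= ? GROUP BY ip",
    (0,),
)
```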