feat: implement API versioning /api/v1/
- All backend routers moved to /api/v1/ prefix
- Frontend BASE_URL updated to /api/v1
- Setup redirect middleware updated to redirect to /api/v1/setup
- Health router path fixed: prefix=/api/v1/health, @router.get('')
- conftest.py: set server_status=online for test fixture
- Created Docs/API_VERSIONING.md with deprecation policy
- Updated Docs/Backend-Development.md with versioning section
- Updated Instructions.md curl examples
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Docs/PERFORMANCE.md (new file, 98 lines)
# Performance Guidelines

Query optimization patterns for BanGUI backend services.

---

## Never Load Unbounded Result Sets

Loading large result sets into Python memory causes OOM crashes, slow responses, and unbounded memory growth. Every query that processes a large dataset must use one of the strategies below.

### The Problem

With millions of ban records:

- Loading all rows as Python dicts → 200-400 MB+ memory spike
- Aggregating in a Python loop (one interpreted iteration per row, O(n) overall) → seconds of CPU time
- Offset pagination on large tables → SQLite must step through every skipped row, so each page costs O(offset + limit)

### The Solution: SQL Aggregation

SQL `GROUP BY` runs inside SQLite's C engine, can use the indexes chosen by the query planner, and returns only the aggregated result (typically a few KB).

```python
# BAD: loads 1M rows into Python
all_rows = await get_all_archived_history(db, since=since)
agg = {}
for row in all_rows:  # O(n) Python loop
    agg[row["ip"]] = agg.get(row["ip"], 0) + 1

# GOOD: SQL aggregation, returns lightweight {ip, count} pairs
ip_counts = await get_ip_ban_counts(db, since=since)
# -> [{"ip": "1.2.3.4", "event_count": 42}, ...], a few KB regardless of table size
```
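
For illustration, here is a minimal synchronous sketch of what a `get_ip_ban_counts()`-style query could look like, using the standard-library `sqlite3` module. It assumes a `history_archive` table with `ip` and `timeofban` (Unix timestamp) columns. The name `ip_ban_counts` is hypothetical, and the real repository function is async, so its signature and return type may differ.

```python
import sqlite3


def ip_ban_counts(conn: sqlite3.Connection, since: int) -> list[dict[str, object]]:
    """Hypothetical helper: per-IP ban counts since a Unix timestamp.

    The GROUP BY runs inside SQLite, so Python only receives one row per IP.
    """
    rows = conn.execute(
        """
        SELECT ip, COUNT(*) AS event_count
        FROM history_archive
        WHERE timeofban >= ?
        GROUP BY ip
        ORDER BY event_count DESC, ip
        """,
        (since,),
    ).fetchall()
    # Shape the lightweight result like the example comment above.
    return [{"ip": ip, "event_count": count} for ip, count in rows]
```

Memory use here scales with the number of distinct IPs, not with the number of archived bans.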
### Aggregation Reference

| Use Case | SQL Pattern | Repository Function |
|----------|-------------|---------------------|
| Ban count per IP | `SELECT ip, COUNT(*) FROM history_archive ... GROUP BY ip` | `get_ip_ban_counts()` |
| Ban count per jail | `SELECT jail, COUNT(*) FROM history_archive ... GROUP BY jail ORDER BY COUNT(*) DESC` | `get_jail_ban_counts()` |
| Ban count per time bucket | `SELECT CAST((timeofban - ?) / ? AS INTEGER) AS bucket_idx, COUNT(*) ... GROUP BY bucket_idx` | `get_ban_counts_by_bucket()` |
| Paginated rows (keyset, no offset) | `WHERE id < ? ORDER BY id DESC LIMIT ?` | `get_archived_history_keyset()` |
| Total count | `SELECT COUNT(*) FROM ...` (fast when the `WHERE` clause can use an index) | included in the `get_jail_ban_counts()` return value |
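
The time-bucket row is the least obvious pattern in the table. Below is a hedged sketch using synchronous `sqlite3`, a hypothetical `ban_counts_by_bucket` name, and an assumed integer Unix-timestamp `timeofban` column. It computes each ban's bucket index inside SQL and counts per bucket.

```python
import sqlite3


def ban_counts_by_bucket(
    conn: sqlite3.Connection, since: int, bucket_seconds: int
) -> dict[int, int]:
    """Hypothetical helper: {bucket_idx: ban_count} for bans at or after `since`.

    Bucket 0 covers [since, since + bucket_seconds), bucket 1 the next window,
    and so on. The WHERE clause keeps (timeofban - since) non-negative, so
    SQLite's integer division behaves like a floor.
    """
    if bucket_seconds <= 0:
        raise ValueError("bucket_seconds must be positive")
    rows = conn.execute(
        """
        SELECT CAST((timeofban - ?) / ? AS INTEGER) AS bucket_idx, COUNT(*)
        FROM history_archive
        WHERE timeofban >= ?
        GROUP BY bucket_idx
        ORDER BY bucket_idx
        """,
        (since, bucket_seconds, since),
    ).fetchall()
    return dict(rows)
```

Buckets with no bans are absent from the result. A timeline endpoint would fill those gaps with zeros in Python, which is cheap because the loop runs over buckets, not rows.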
### Pagination vs Aggregation

Use **aggregation** when:

- Displaying summary data (counts, totals, group-by results)
- Building country/jail/timeline dashboards
- Only need counts, not individual row data

Use **pagination** when:

- Displaying individual records (ban list, history)
- Clients need access to specific rows
- Exporting or bulk operations
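
Here is a minimal sketch of keyset pagination for record lists. It uses synchronous `sqlite3`, an integer `id` column, and a hypothetical `fetch_history_page` name. Instead of `OFFSET`, the client passes back the smallest `id` it has already seen. Each page is then a range scan starting at that key, so its cost does not grow with page depth.

```python
import sqlite3
from typing import Optional


def fetch_history_page(
    conn: sqlite3.Connection, before_id: Optional[int], limit: int = 50
) -> tuple[list[tuple], Optional[int]]:
    """Hypothetical helper: return (rows, next_cursor), newest first.

    Pass next_cursor back as before_id to get the following page;
    a next_cursor of None means there are no older rows.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if before_id is None:
        rows = conn.execute(
            "SELECT id, ip, jail, timeofban FROM history_archive "
            "ORDER BY id DESC LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT id, ip, jail, timeofban FROM history_archive "
            "WHERE id < ? ORDER BY id DESC LIMIT ?",
            (before_id, limit),
        ).fetchall()
    # A short page means the end has been reached.
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor
```

The trade-off is that clients can only move forward from a cursor. They cannot jump to an arbitrary page number, which is usually acceptable for ban lists and exports.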
### Batch Geo Lookups

When you need geo data for many IPs, batch them into a single call rather than looking them up one IP at a time:

```python
import asyncio

_background_tasks: set[asyncio.Task] = set()  # hypothetical module-level registry

# BAD: N sequential API calls
for ip in unique_ips:
    geo = await geo_service.lookup(ip)  # 45 req/min rate limit × N calls

# GOOD: one batch call, geo_service handles rate limiting
geo_map, uncached = geo_cache_lookup(unique_ips)  # uses in-memory cache
if uncached:
    # Fire-and-forget: keep a strong reference so the task is not garbage-collected
    task = asyncio.create_task(geo_cache.lookup_batch(uncached, http_session))
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
```
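
The cache-split pattern above can be sketched in a self-contained form. Everything here is hypothetical: `GeoCache`, its methods, and the injected `fetch_batch` coroutine stand in for BanGUI's geo cache and HTTP client. Real rate limiting is left to the fetcher. The sketch shows the shape of the approach:

- Serve cache hits from memory immediately.
- Fetch misses in chunks inside a background task.
- Hold a strong reference to that task until it finishes.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable

# Hypothetical fetcher: one request for a chunk of IPs -> {ip: country_code}.
BatchFetcher = Callable[[list[str]], Awaitable[dict[str, str]]]


class GeoCache:
    """Illustrative in-memory geo cache (not BanGUI's actual geo_cache)."""

    def __init__(self, fetch_batch: BatchFetcher, batch_size: int = 100) -> None:
        self._data: dict[str, str] = {}
        self._fetch_batch = fetch_batch
        self._batch_size = batch_size
        # Strong references keep fire-and-forget tasks alive until they finish.
        self._tasks: set[asyncio.Task] = set()

    def lookup(self, ips: Iterable[str]) -> tuple[dict[str, str], list[str]]:
        """Split IPs into cached results and de-duplicated cache misses (no I/O)."""
        hits: dict[str, str] = {}
        misses: list[str] = []
        for ip in dict.fromkeys(ips):  # de-duplicate while keeping order
            if ip in self._data:
                hits[ip] = self._data[ip]
            else:
                misses.append(ip)
        return hits, misses

    async def lookup_batch(self, ips: list[str]) -> None:
        """Fetch misses with one request per chunk and store the results."""
        for start in range(0, len(ips), self._batch_size):
            chunk = ips[start : start + self._batch_size]
            self._data.update(await self._fetch_batch(chunk))

    def schedule_batch(self, ips: list[str]) -> asyncio.Task:
        """Start a background fill; must be called inside a running event loop."""
        task = asyncio.create_task(self.lookup_batch(ips))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return task
```

A request handler would call `lookup()`, return the hits right away, and call `schedule_batch()` for the misses. Later requests then find those IPs in the cache.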
### Index Requirements

SQLite needs indexes on:

- Columns used in WHERE clauses (timeofban, jail, action)
- Columns used in GROUP BY (ip, jail, bucket index)
- Sort columns for pagination (id)

Current indexes on `history_archive`:

- `idx_history_archive_timeofban`: time-range filtering
- `idx_history_archive_jail_timeofban`: jail + time filtering
- `idx_history_archive_action_timeofban`: action + time filtering
- `idx_history_archive_id`: keyset pagination

Before adding a new query pattern, verify that it uses an existing index, or add a new index together with a benchmark test.
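
The following sketches the index set listed above. It creates a hypothetical `history_archive` schema; the column names come from this page and the types are assumed. The composite column order is inferred from the index names, so check it against the real migration before relying on it.

```python
import sqlite3

# Assumed schema; only the index names are taken from the list above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS history_archive (
    id        INTEGER PRIMARY KEY,
    ip        TEXT    NOT NULL,
    jail      TEXT    NOT NULL,
    action    TEXT    NOT NULL,
    timeofban INTEGER NOT NULL  -- Unix timestamp (assumed)
);
CREATE INDEX IF NOT EXISTS idx_history_archive_timeofban
    ON history_archive (timeofban);
CREATE INDEX IF NOT EXISTS idx_history_archive_jail_timeofban
    ON history_archive (jail, timeofban);
CREATE INDEX IF NOT EXISTS idx_history_archive_action_timeofban
    ON history_archive (action, timeofban);
CREATE INDEX IF NOT EXISTS idx_history_archive_id
    ON history_archive (id);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the table and indexes; IF NOT EXISTS makes this safe to rerun."""
    conn.executescript(SCHEMA)
```

The order of the composite columns matters. `(jail, timeofban)` serves `WHERE jail = ? AND timeofban >= ?` but not a filter on `timeofban` alone. If `id` is declared `INTEGER PRIMARY KEY` as assumed here, it is an alias for the rowid and the table is already ordered by it. In that case `idx_history_archive_id` would be redundant; it is included only to mirror the list above.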
### Memory Monitoring

Watch for these warning signs:

- Python RSS > 500 MB in container metrics
- Response time > 5 s for dashboard endpoints
- Query time > 1 s for a single SQLite query (SQLite has no `EXPLAIN ANALYZE`; time queries in application code or with the `sqlite3` shell's `.timer on`)

Use `EXPLAIN QUERY PLAN` to verify index usage:

```sql
EXPLAIN QUERY PLAN SELECT ip, COUNT(*) FROM history_archive WHERE timeofban >= ? GROUP BY ip;
```

Expected: `USING INDEX idx_history_archive_timeofban` in the output.
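
Both checks can be scripted. The sketch below uses the standard-library `sqlite3` module, and the function names and threshold constant are illustrative. `query_plan()` returns the `detail` column of `EXPLAIN QUERY PLAN`. The exact wording of that column varies between SQLite versions, so `uses_index()` matches on the index name rather than the full line. `timed_fetchall()` measures wall-clock time in Python, because SQLite has no `EXPLAIN ANALYZE`.

```python
import logging
import sqlite3
import time
from typing import Any, Sequence

logger = logging.getLogger(__name__)

SLOW_QUERY_SECONDS = 1.0  # mirrors the "query time > 1 s" warning sign


def query_plan(conn: sqlite3.Connection, sql: str, params: Sequence[Any] = ()) -> list[str]:
    """Return the human-readable detail lines of EXPLAIN QUERY PLAN for `sql`."""
    # Each row is (id, parent, notused, detail); placeholders still need values.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


def uses_index(
    conn: sqlite3.Connection, sql: str, index_name: str, params: Sequence[Any] = ()
) -> bool:
    """True if any step of the query plan mentions the given index."""
    return any(index_name in detail for detail in query_plan(conn, sql, params))


def timed_fetchall(
    conn: sqlite3.Connection,
    sql: str,
    params: Sequence[Any] = (),
    threshold: float = SLOW_QUERY_SECONDS,
) -> tuple[list[Any], float]:
    """Run a query and return (rows, seconds); log a warning above the threshold."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    if elapsed > threshold:
        logger.warning("slow query (%.3f s): %s", elapsed, sql)
    return rows, elapsed
```

A benchmark test for a new query pattern could assert `uses_index(...)` against a schema built like production's. It could also run `timed_fetchall()` on a seeded table large enough to expose a full scan.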