Add database migration 5: Indexes for history_archive query performance

- Add composite index on (jail, timeofban DESC) for dashboard filtering
- Add composite index on (timeofban DESC, jail, action) for time-range queries
- Add single-column indexes on ip and action for targeted filtering
- Update schema version to 5 and document in Backend-Development.md

Indexes optimize common dashboard and API query patterns with pagination.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 20:17:58 +02:00
parent 187cd8250d
commit b6631b86e4
4 changed files with 257 additions and 19 deletions

@@ -102,6 +102,68 @@ rows = await db.execute(
)
```
### Database Performance & Indexing
Large archive datasets can degrade query performance without proper indexing. The `history_archive` table supports multiple filter patterns:
**Query Patterns (Indexed for Performance):**
1. **MAX(timeofban)** — `history_sync_task` queries the latest timestamp to determine where to resume syncing from fail2ban. This is a covering index lookup on `idx_history_archive_timeofban_jail_action`.
2. **Jail filter with time ordering** — Dashboard and API endpoints filter by `jail` and sort by `timeofban DESC` for pagination. This is accelerated by `idx_history_archive_jail_timeofban`.
3. **Time-range filter** — Queries filter by `timeofban >= since` to fetch recent records. This uses the composite index `idx_history_archive_timeofban_jail_action` which includes `timeofban` as the leading column for efficient range scans.
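4. **IP filter** — Users can search by exact IP or by IP prefix (using `LIKE 'prefix%'`). The `idx_history_archive_ip` index accelerates these searches. For prefix matches, confirm index use with `EXPLAIN QUERY PLAN`: assuming the database is SQLite, its `LIKE` optimization only applies under specific collation and `case_sensitive_like` settings.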
5. **Action filter** — Queries may filter by action ('ban' or 'unban'). The `idx_history_archive_action` index supports this.
6. **Purge old entries** — Background tasks delete entries older than a threshold (`timeofban < cutoff`). This uses `idx_history_archive_timeofban_jail_action`.
**Current Indexes (defined in `backend/app/db.py` Migration 5):**
- `idx_history_archive_jail_timeofban(jail, timeofban DESC)` — Composite index for jail-filtered queries.
- `idx_history_archive_timeofban_jail_action(timeofban DESC, jail, action)` — Covering index for time-range queries and MAX lookups.
- `idx_history_archive_ip(ip)` — Single-column index for IP searches.
- `idx_history_archive_action(action)` — Single-column index for action filtering.
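
The index list above can be sketched as plain DDL. This is a minimal illustration, not the actual contents of `backend/app/db.py`: it assumes SQLite, uses the standard-library `sqlite3` module, and invents a reduced table schema and a helper named `apply_index_migration`. Only the index names and column orders come from the list above.

```python
import sqlite3

# Hypothetical minimal schema: only the columns named in the index list.
# The real history_archive table may define more columns or constraints.
SCHEMA = """
CREATE TABLE history_archive (
    id INTEGER PRIMARY KEY,
    jail TEXT NOT NULL,
    ip TEXT NOT NULL,
    timeofban INTEGER NOT NULL,
    action TEXT NOT NULL
);
"""

# Index names and column orders are taken from the list above.
MIGRATION_5 = """
CREATE INDEX IF NOT EXISTS idx_history_archive_jail_timeofban
    ON history_archive (jail, timeofban DESC);
CREATE INDEX IF NOT EXISTS idx_history_archive_timeofban_jail_action
    ON history_archive (timeofban DESC, jail, action);
CREATE INDEX IF NOT EXISTS idx_history_archive_ip
    ON history_archive (ip);
CREATE INDEX IF NOT EXISTS idx_history_archive_action
    ON history_archive (action);
"""


def apply_index_migration(conn: sqlite3.Connection) -> list[str]:
    """Create the four indexes and return the index names found on the table."""
    conn.executescript(MIGRATION_5)
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'history_archive' "
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
index_names = apply_index_migration(conn)
print(index_names)
```

Because every statement uses `IF NOT EXISTS`, running the sketch twice against the same database leaves the schema unchanged, which is a convenient property for a migration step.
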
**Benchmark Results:**
Query benchmarks (see `backend/tests/test_repositories/test_history_archive_indexing.py`) verify that common operations complete within expected thresholds on datasets with 10,000+ records:
| Operation | Time budget | Result (with indexes) |
|-----------|-------------|----------------------|
| MAX(timeofban) | <0.01s | ✓ Uses covering index |
| Count with jail filter | <0.10s | ✓ Covering index scan |
| List with jail + order | <0.05s | ✓ Index fully utilized |
| Time-range filter | <0.05s | ✓ Range scan on timeofban |
| Combined filters | <0.05s | ✓ Composite indexes used |
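
A budget check of this kind might look like the sketch below. It is not the code from `test_history_archive_indexing.py`: the schema, the synthetic rows, the jail names, and the `timed` helper are assumptions made for illustration, and it uses the standard-library `sqlite3` module rather than the project's async database layer. The two budgets are copied from the table above.

```python
import sqlite3
import time

# Budgets in seconds, copied from the table above.
BUDGETS = {"max_timeofban": 0.01, "count_by_jail": 0.10}


def timed(conn, sql, params=()):
    """Run a query and return (rows, elapsed_seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return rows, time.perf_counter() - start


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE history_archive (
    id INTEGER PRIMARY KEY, jail TEXT, ip TEXT, timeofban INTEGER, action TEXT
);
CREATE INDEX idx_history_archive_jail_timeofban
    ON history_archive (jail, timeofban DESC);
CREATE INDEX idx_history_archive_timeofban_jail_action
    ON history_archive (timeofban DESC, jail, action);
""")

# Synthetic data: 10,000 rows spread evenly over five made-up jails.
conn.executemany(
    "INSERT INTO history_archive (jail, ip, timeofban, action) VALUES (?, ?, ?, ?)",
    [
        (f"jail{i % 5}", f"10.0.{i // 256}.{i % 256}", 1_700_000_000 + i,
         "ban" if i % 2 == 0 else "unban")
        for i in range(10_000)
    ],
)
conn.commit()

latest, max_elapsed = timed(conn, "SELECT MAX(timeofban) FROM history_archive")
count, count_elapsed = timed(
    conn, "SELECT COUNT(*) FROM history_archive WHERE jail = ?", ("jail0",)
)

# In a real test these booleans would become assertions.
within_budget = {
    "max_timeofban": max_elapsed < BUDGETS["max_timeofban"],
    "count_by_jail": count_elapsed < BUDGETS["count_by_jail"],
}
print(latest, count, within_budget)
```

Wall-clock budgets depend on the machine running the tests, so a sketch like this is best treated as a regression guard rather than a precise measurement.
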
**Adding New Indexes:**
If you add new query patterns to `history_archive_repo.py`:
1. **Analyze the WHERE and ORDER BY clauses** — Identify which columns are filtered and sorted.
2. **Check EXPLAIN QUERY PLAN** in a local test:
```python
# Run the candidate query through EXPLAIN QUERY PLAN and print each plan step.
async with db.execute("EXPLAIN QUERY PLAN SELECT ...") as cur:
    rows = await cur.fetchall()
for row in rows:
    print(row[3])  # Column 3 holds the plan text
```
3. **If the plan shows a full table scan, add an index** whose column order matches the query: equality-filtered columns first, then the range-filtered or sorted column.
4. **Create a migration** in `backend/app/db.py` following the pattern from Migration 5.
5. **Add a benchmark test** to verify the new index improves query performance.
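
Steps 1 to 3 can be tried end to end in an in-memory database. The sketch below assumes SQLite and the standard-library `sqlite3` module; the reduced schema, the example query, and the `query_plan` helper are hypothetical. It captures the plan for a jail-filtered, time-ordered query before and after creating the matching index.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan text of each EXPLAIN QUERY PLAN row (column 3)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical reduced schema, for illustration only.
conn.execute(
    "CREATE TABLE history_archive ("
    "id INTEGER PRIMARY KEY, jail TEXT, ip TEXT, timeofban INTEGER, action TEXT)"
)

# Example query: equality filter on jail, sort on timeofban.
sql = (
    "SELECT ip, timeofban FROM history_archive "
    "WHERE jail = ? ORDER BY timeofban DESC LIMIT 50"
)

# Without an index the planner has to scan the table and sort the result.
plan_before = query_plan(conn, sql, ("sshd",))

# Equality column first, then the sort column, matching the query shape.
conn.execute(
    "CREATE INDEX idx_history_archive_jail_timeofban "
    "ON history_archive (jail, timeofban DESC)"
)
plan_after = query_plan(conn, sql, ("sshd",))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies between SQLite versions, so a test built on this idea should look for the index name in the plan rather than compare whole strings.
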
**Index Tradeoffs:**
- **Pros**: Faster SELECT queries, reduced CPU during queries.
- **Cons**: Slower INSERT/UPDATE/DELETE (indexes must be maintained), larger database file size.
For `history_archive`, the read-heavy workload justifies these indexes because:
- Inserts are batched during sync (one batch per minute), not per-request.
- Deletes happen once per day during purge.
- SELECT queries run on every API request to the history endpoint.
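
The storage side of this tradeoff can be observed directly. The following sketch, again assuming SQLite with a hypothetical reduced schema and synthetic rows, loads the same data into two in-memory databases and compares their page counts with and without the four indexes. The `build` helper is an illustration, not project code.

```python
import sqlite3

# Index definitions as listed in the "Current Indexes" section.
INDEX_DDL = [
    "CREATE INDEX idx_history_archive_jail_timeofban "
    "ON history_archive (jail, timeofban DESC)",
    "CREATE INDEX idx_history_archive_timeofban_jail_action "
    "ON history_archive (timeofban DESC, jail, action)",
    "CREATE INDEX idx_history_archive_ip ON history_archive (ip)",
    "CREATE INDEX idx_history_archive_action ON history_archive (action)",
]


def build(with_indexes: bool, n_rows: int = 5000) -> int:
    """Load synthetic rows and return the database size in pages."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history_archive ("
        "id INTEGER PRIMARY KEY, jail TEXT, ip TEXT, timeofban INTEGER, action TEXT)"
    )
    if with_indexes:
        for ddl in INDEX_DDL:
            conn.execute(ddl)
    conn.executemany(
        "INSERT INTO history_archive (jail, ip, timeofban, action) VALUES (?, ?, ?, ?)",
        [
            (f"jail{i % 5}", f"10.0.{i // 256}.{i % 256}", 1_700_000_000 + i, "ban")
            for i in range(n_rows)
        ],
    )
    conn.commit()
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()
    return pages


pages_plain = build(with_indexes=False)
pages_indexed = build(with_indexes=True)
print(pages_plain, pages_indexed)
```
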
---
## 3. Project Structure