Add database migration 5: Indexes for history_archive query performance

- Add composite index on (jail, timeofban DESC) for dashboard filtering
- Add composite index on (timeofban DESC, jail, action) for time-range queries
- Add single-column indexes on ip and action for targeted filtering
- Update schema version to 5 and document in Backend-Development.md

Indexes optimize common dashboard and API query patterns with pagination.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@@ -102,6 +102,68 @@ rows = await db.execute(
)
```

### Database Performance & Indexing

Large archive datasets can degrade query performance without proper indexing. The `history_archive` table supports multiple filter patterns:

**Query Patterns (Indexed for Performance):**

1. **MAX(timeofban)**: `history_sync_task` queries for the latest timestamp to know where to resume syncing from fail2ban. This is a covering index lookup on `idx_history_archive_timeofban_jail_action`, whose leading column is `timeofban`.

2. **Jail filter with time ordering**: Dashboard and API endpoints filter by `jail` and sort by `timeofban DESC` for pagination. This is accelerated by `idx_history_archive_jail_timeofban`.

3. **Time-range filter**: Queries filter by `timeofban >= since` to fetch recent records. This uses the composite index `idx_history_archive_timeofban_jail_action`, which has `timeofban` as its leading column for efficient range scans.

4. **IP filter**: Users can search by exact IP or by IP prefix (`ip LIKE 'prefix%'`). The `idx_history_archive_ip` index accelerates exact matches. Assuming the database is SQLite, a prefix `LIKE` can use an index only when SQLite's LIKE optimization applies (for example, the index uses `NOCASE` collation or `case_sensitive_like` is enabled), so confirm prefix searches with `EXPLAIN QUERY PLAN`.

5. **Action filter**: Queries may filter by action (`'ban'` or `'unban'`). The `idx_history_archive_action` index supports this.

6. **Purge old entries**: Background tasks delete entries older than a threshold (`timeofban < cutoff`). This uses `idx_history_archive_timeofban_jail_action`.
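The IP filter pattern is the one most likely to miss its index, so it is worth checking directly. The following is a minimal sketch, assuming the database is SQLite and using the standard-library `sqlite3` module instead of the project's async driver. The table layout and the `idx_demo_ip_nocase` index are hypothetical and not part of Migration 5. It prints the plan for an exact match, for a prefix `LIKE` against the plain index, and for the same prefix `LIKE` once a `NOCASE` index exists.

```python
import sqlite3

# Hypothetical minimal schema: only the columns named in this section.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history_archive ("
    "id INTEGER PRIMARY KEY, jail TEXT, ip TEXT, timeofban INTEGER, action TEXT)"
)
conn.execute("CREATE INDEX idx_history_archive_ip ON history_archive(ip)")


def plan(sql: str) -> str:
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Exact match: can be answered by a lookup in the plain index.
exact_plan = plan("SELECT * FROM history_archive WHERE ip = '192.168.1.10'")

# Prefix match against the plain (BINARY collation) index.
prefix_plan = plan("SELECT * FROM history_archive WHERE ip LIKE '192.168.%'")

# Hypothetical extra index with NOCASE collation, which matches SQLite's
# default case-insensitive LIKE and enables the LIKE optimization.
conn.execute(
    "CREATE INDEX idx_demo_ip_nocase ON history_archive(ip COLLATE NOCASE)"
)
nocase_plan = plan("SELECT * FROM history_archive WHERE ip LIKE '192.168.%'")

print("exact: ", exact_plan)
print("prefix:", prefix_plan)
print("nocase:", nocase_plan)
```

If the prefix plan does not mention an index search, the options are a `NOCASE` index as sketched here or a different query shape. Which one suits the project depends on how `history_archive_repo.py` builds the query.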

**Current Indexes (defined in `backend/app/db.py` Migration 5):**

- `idx_history_archive_jail_timeofban(jail, timeofban DESC)`: Composite index for jail-filtered queries.
- `idx_history_archive_timeofban_jail_action(timeofban DESC, jail, action)`: Covering index for time-range queries and MAX lookups.
- `idx_history_archive_ip(ip)`: Single-column index for IP searches.
- `idx_history_archive_action(action)`: Single-column index for action filtering.
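The four definitions above can be sketched as plain `CREATE INDEX` statements. This is an illustration under assumptions, not the code in `backend/app/db.py`. It uses the standard-library `sqlite3` module, a hypothetical table layout containing only the columns named here, and `PRAGMA user_version` as a stand-in for however the project records its schema version.

```python
import sqlite3

# Index definitions as listed above. The statements are idempotent so the
# migration can be re-run safely.
MIGRATION_5 = [
    "CREATE INDEX IF NOT EXISTS idx_history_archive_jail_timeofban "
    "ON history_archive(jail, timeofban DESC)",
    "CREATE INDEX IF NOT EXISTS idx_history_archive_timeofban_jail_action "
    "ON history_archive(timeofban DESC, jail, action)",
    "CREATE INDEX IF NOT EXISTS idx_history_archive_ip "
    "ON history_archive(ip)",
    "CREATE INDEX IF NOT EXISTS idx_history_archive_action "
    "ON history_archive(action)",
]


def apply_migration_5(conn: sqlite3.Connection) -> None:
    """Create the indexes and record the schema version (assumed mechanism)."""
    for statement in MIGRATION_5:
        conn.execute(statement)
    conn.execute("PRAGMA user_version = 5")  # assumption: version kept here
    conn.commit()


# Hypothetical minimal schema for demonstration purposes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history_archive ("
    "id INTEGER PRIMARY KEY, jail TEXT, ip TEXT, timeofban INTEGER, action TEXT)"
)
apply_migration_5(conn)

index_names = {
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'history_archive'"
    )
}
schema_version = conn.execute("PRAGMA user_version").fetchone()[0]

# Plan for the dashboard pattern: filter by jail, newest first, paginated.
jail_plan = " | ".join(
    row[3]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM history_archive "
        "WHERE jail = 'sshd' ORDER BY timeofban DESC LIMIT 50"
    )
)
print(sorted(index_names), schema_version)
print(jail_plan)
```

Because the composite index stores `timeofban` in descending order within each `jail`, the dashboard query can read rows in the requested order without a separate sort step.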

**Benchmark Results:**

Query benchmarks (see `backend/tests/test_repositories/test_history_archive_indexing.py`) verify that common operations complete within expected thresholds on datasets with 10,000+ records:

| Operation | Time Budget | Index usage (with indexes) |
|-----------|-------------|----------------------------|
| MAX(timeofban) | <0.01s | ✓ Uses covering index |
| Count with jail filter | <0.10s | ✓ Covering index scan |
| List with jail + order | <0.05s | ✓ Index fully utilized |
| Time-range filter | <0.05s | ✓ Range scan on timeofban |
| Combined filters | <0.05s | ✓ Composite indexes used |
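A benchmark of this kind can be sketched as follows. It is not the contents of `test_history_archive_indexing.py`. The schema, the synthetic rows, and the helper names `build_archive` and `timed_max_timeofban` are hypothetical, and the standard-library `sqlite3` module stands in for the project's async driver.

```python
import sqlite3
import time


def build_archive(n_rows: int = 10_000) -> sqlite3.Connection:
    """Create an in-memory archive with synthetic rows and the time index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history_archive ("
        "id INTEGER PRIMARY KEY, jail TEXT, ip TEXT, "
        "timeofban INTEGER, action TEXT)"
    )
    conn.execute(
        "CREATE INDEX idx_history_archive_timeofban_jail_action "
        "ON history_archive(timeofban DESC, jail, action)"
    )
    rows = [
        (
            f"jail{i % 5}",
            f"10.0.{(i // 256) % 256}.{i % 256}",
            1_700_000_000 + i,  # strictly increasing timestamps
            "ban" if i % 2 == 0 else "unban",
        )
        for i in range(n_rows)
    ]
    conn.executemany(
        "INSERT INTO history_archive (jail, ip, timeofban, action) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def timed_max_timeofban(conn: sqlite3.Connection) -> tuple[int, float]:
    """Return the latest timestamp and the wall-clock seconds the query took."""
    start = time.perf_counter()
    (latest,) = conn.execute(
        "SELECT MAX(timeofban) FROM history_archive"
    ).fetchone()
    return latest, time.perf_counter() - start


conn = build_archive()
latest, elapsed = timed_max_timeofban(conn)
print(f"MAX(timeofban) = {latest}, took {elapsed:.6f}s")
```

A real test would compare `elapsed` against the budget from the table (for example `assert elapsed < 0.01`). Wall-clock thresholds can be noisy on shared CI runners, so pairing them with a query-plan check tends to be more stable.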

**Adding New Indexes:**

If you add new query patterns to `history_archive_repo.py`:

1. **Analyze the WHERE and ORDER BY clauses**: Identify which columns are filtered and sorted.
2. **Check EXPLAIN QUERY PLAN** in a local test:
   ```python
   async with db.execute("EXPLAIN QUERY PLAN SELECT ...") as cur:
       rows = await cur.fetchall()
   for row in rows:
       print(row[3])  # Column 3 holds the plan detail text
   ```
3. **If the plan shows a full table scan, add an index** whose columns match the query: equality-filtered columns first, followed by the range or sort column.
4. **Create a migration** in `backend/app/db.py` following the pattern from Migration 5.
5. **Add a benchmark test** to verify the new index improves query performance.
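Steps 2 and 3 can be automated with a small helper that flags full table scans in the plan output. This is a sketch under assumptions. The helper `full_table_scans`, the index `idx_demo_ip_timeofban`, and the query are hypothetical, and the plan text format is that of SQLite's `EXPLAIN QUERY PLAN`.

```python
import sqlite3


def full_table_scans(
    conn: sqlite3.Connection, sql: str, params: tuple = ()
) -> list[str]:
    """Return plan steps that scan a table without using any index."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    details = [row[3] for row in rows]
    # "SCAN <table>" (or "SCAN TABLE <table>" on older SQLite) with no
    # "USING ... INDEX" suffix means every row is visited.
    return [d for d in details if d.startswith("SCAN") and "USING" not in d]


# Hypothetical table with no secondary indexes yet.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history_archive ("
    "id INTEGER PRIMARY KEY, jail TEXT, ip TEXT, timeofban INTEGER, action TEXT)"
)

# Hypothetical new query pattern: all events for one IP, newest first.
query = (
    "SELECT * FROM history_archive WHERE ip = ? "
    "ORDER BY timeofban DESC LIMIT 50"
)

before = full_table_scans(conn, query, ("203.0.113.7",))

# Equality column first, then the sort column, as described in step 3.
conn.execute(
    "CREATE INDEX idx_demo_ip_timeofban "
    "ON history_archive(ip, timeofban DESC)"
)
after = full_table_scans(conn, query, ("203.0.113.7",))

print("before:", before)
print("after: ", after)
```

An empty list after the index is created indicates the planner no longer reads the whole table for this query. The same helper could back an assertion in the benchmark test from step 5.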

**Index Tradeoffs:**

- **Pros**: Faster SELECT queries, reduced CPU during queries.
- **Cons**: Slower INSERT/UPDATE/DELETE (indexes must be maintained), larger database file size.

For `history_archive`, the read-heavy workload justifies these indexes because:

- Inserts are batched during sync (one batch per minute), not per-request.
- Deletes happen once per day during purge.
- SELECT queries run on every API request to the history endpoint.
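The write side of this tradeoff can be sketched as two small functions, one for the batched insert and one for the purge. This is an illustration under assumptions. The function names `insert_batch` and `purge_older_than` and the table layout are hypothetical, and the standard-library `sqlite3` module stands in for the project's async driver.

```python
import sqlite3


def insert_batch(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Insert one sync batch in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO history_archive (jail, ip, timeofban, action) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )


def purge_older_than(conn: sqlite3.Connection, cutoff: int) -> int:
    """Delete entries with timeofban < cutoff and return how many were removed."""
    with conn:
        cursor = conn.execute(
            "DELETE FROM history_archive WHERE timeofban < ?", (cutoff,)
        )
    return cursor.rowcount


# Hypothetical minimal schema with the time-leading index from Migration 5.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history_archive ("
    "id INTEGER PRIMARY KEY, jail TEXT, ip TEXT, timeofban INTEGER, action TEXT)"
)
conn.execute(
    "CREATE INDEX idx_history_archive_timeofban_jail_action "
    "ON history_archive(timeofban DESC, jail, action)"
)

insert_batch(
    conn,
    [("sshd", f"198.51.100.{i}", 100 + i, "ban") for i in range(5)],
)
deleted = purge_older_than(conn, cutoff=102)
remaining = conn.execute("SELECT COUNT(*) FROM history_archive").fetchone()[0]
print(deleted, remaining)
```

Grouping a whole batch into one transaction means the index maintenance cost is paid in one commit per sync cycle instead of once per row.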

---

## 3. Project Structure
@@ -1,21 +1,3 @@
## 37) Multi-worker safety check depends on one environment variable
- Where found:
  - [backend/app/startup.py](backend/app/startup.py#L61)
- Why this is needed:
  - Other process managers can still launch multiple workers without this variable.
- Goal:
  - Enforce scheduler single-executor safety regardless of launcher.
- What to do:
  - Add robust single-run lock/leader mechanism for scheduler ownership.
- Possible traps and issues:
  - Locking strategy must be reliable in container orchestration.
- Docs changes needed:
  - Expand deployment constraints and supported run modes.
- Doc references:
  - [Docs/Architekture.md](Docs/Architekture.md)

---

## 38) History archive query paths may need explicit indexing plan
- Where found:
  - [backend/app/db.py](backend/app/db.py)