Refactor pagination with cursor-based support and standardized response format

- Implement cursor-based pagination in pagination.py
- Update response models to standardize pagination structure
- Add cursor pagination utilities for repositories
- Update HistoryArchiveRepository and ImportLogRepository with new pagination
- Add comprehensive tests for cursor pagination
- Update documentation for backend development and task tracking

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 17:54:05 +02:00
parent be974b9b0d
commit 67b26a3ef7
8 changed files with 613 additions and 51 deletions


@@ -159,6 +159,100 @@ If you add new query patterns to `history_archive_repo.py`:
- **Pros**: Faster SELECT queries, reduced CPU during queries.
- **Cons**: Slower INSERT/UPDATE/DELETE (indexes must be maintained), larger database file size.
---
## 7.5 Cursor-Based Pagination for Large Result Sets
**Problem:** Offset-based pagination (`LIMIT ? OFFSET ?`) must scan and discard every row before the offset before it can return a page, so the cost grows with page depth. On a 10M-row table, fetching the last page takes 15+ seconds because SQLite must step through all preceding rows.
**Solution:** Use keyset pagination (cursor-based) with `WHERE id > last_id` instead of OFFSET. This leverages indexes to jump directly to the next page in O(log N) time.
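The difference between the two query shapes can be illustrated with a minimal sketch using the standard-library `sqlite3` module. The `items` table and its contents are hypothetical and exist only for this example; both queries return the same page, but only the keyset form seeks directly to its starting row.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 101)],
)

page_size = 10

# Offset pagination: SQLite steps past the first 20 rows before returning page 3.
offset_page = conn.execute(
    "SELECT id FROM items ORDER BY id ASC LIMIT ? OFFSET ?",
    (page_size, 20),
).fetchall()

# Keyset pagination: seek directly past the last id seen on page 2.
last_id = 20
keyset_page = conn.execute(
    "SELECT id FROM items WHERE id > ? ORDER BY id ASC LIMIT ?",
    (last_id, page_size),
).fetchall()

offset_ids = [row[0] for row in offset_page]
keyset_ids = [row[0] for row in keyset_page]
```

On a table this small the timing difference is not measurable; the sketch only shows that the two forms are interchangeable for sequential paging.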
### Offset vs. Cursor Pagination
| Aspect | Offset (`LIMIT ? OFFSET ?`) | Cursor (`WHERE id > ?`) |
|--------|-----|-----|
| Performance | O(N): scans and discards all rows before the offset | O(log N): index seek to the cursor position |
| Last page on 10M rows | 15+ seconds ⚠️ | <50ms ✓ |
| API Contract | `page`, `page_size` | `cursor`, `page_size` |
| Navigation | Stateless; any page at any time | Stateless; sequential, each request needs the cursor returned by the previous page |
| Count query | Required (slow on large tables) | Not required |
### When to Use Cursor Pagination
- ✓ **Use cursor pagination** for large tables (>100K rows) with frequent pagination queries
- ✓ **Use cursor pagination** for real-time feeds where rows are constantly added/modified
- ✓ **Use cursor pagination** if your API already exposes cursor tokens to clients
- ✗ **Keep offset pagination** for small datasets or administrative interfaces where performance is not critical
### Implementation Pattern
**1. Add indexes on sort columns:**
Cursor queries use `WHERE id > :cursor ORDER BY id ASC LIMIT :page_size`. Ensure the sort column is indexed or part of a composite index.
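One way to check that a cursor query actually uses an index is `EXPLAIN QUERY PLAN`. The sketch below assumes a hypothetical `items` table whose `id` is the `INTEGER PRIMARY KEY`; the exact wording of the plan text varies between SQLite versions, so it only looks for the `SEARCH` and `SCAN` keywords.

```python
import sqlite3

# Hypothetical table; id is the rowid alias, so it is indexed implicitly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")


def plan(sql, params):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# Keyset query: expected to be an index SEARCH on the primary key.
keyset_plan = plan(
    "SELECT id, name FROM items WHERE id > ? ORDER BY id ASC LIMIT ?",
    (0, 50),
)

# Offset query: expected to be a full SCAN in id order.
offset_plan = plan(
    "SELECT id, name FROM items ORDER BY id ASC LIMIT ? OFFSET ?",
    (50, 5000),
)

print(keyset_plan)
print(offset_plan)
```

If the keyset plan reports a `SCAN` on a real table, the sort column is probably missing an index.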
**2. Use cursor pagination utilities:**
```python
from app.utils.pagination import encode_cursor, decode_cursor
# Fetch next page using cursor
last_row_id = decode_cursor(cursor) if cursor else None
items, has_more = await repo.get_items_keyset(
    page_size=50,
    last_row_id=last_row_id,
)
# Encode cursor for next page
next_cursor = encode_cursor(items[-1]["id"]) if items and has_more else None
```
**3. Return cursor in pagination metadata:**
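The response includes `cursor` (for cursor pagination) in addition to `page`, `page_size`, and `has_next_page`. In the example below, `total` and `total_pages` are `-1`, which is consistent with cursor pagination skipping the COUNT query: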
```json
{
  "items": [...],
  "pagination": {
    "page": 1,
    "page_size": 50,
    "total": -1,
    "total_pages": -1,
    "has_next_page": true,
    "has_prev_page": false,
    "cursor": "eyJpZCI6IDQyN30="
  }
}
```
**4. Repositories supporting cursor pagination:**
- `import_log_repo.list_logs_keyset()` — Import log with cursor pagination
- `history_archive_repo.get_archived_history_keyset()` — Archived bans with cursor pagination
Both functions return `(items, has_more)` instead of `(items, total)` to avoid expensive COUNT queries.
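A common way to compute `has_more` without a COUNT query is to fetch one row more than the page size and check whether the extra row exists. The sketch below is a synchronous stand-in, not the repository code: the function name `get_items_keyset_sketch`, the `items` table, and the assumption that IDs are positive integers are all hypothetical.

```python
import sqlite3
from typing import Optional


def get_items_keyset_sketch(
    conn: sqlite3.Connection,
    page_size: int,
    last_row_id: Optional[int] = None,
):
    """Hypothetical keyset fetch returning (items, has_more)."""
    # Assumes ids are positive, so 0 means "start from the beginning".
    start_after = last_row_id if last_row_id is not None else 0
    rows = conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id ASC LIMIT ?",
        (start_after, page_size + 1),  # one extra row signals a further page
    ).fetchall()
    has_more = len(rows) > page_size
    items = [{"id": r[0], "name": r[1]} for r in rows[:page_size]]
    return items, has_more


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 26)],
)

# Walk every page, feeding the last id of each page into the next request.
pages = []
last_id = None
while True:
    items, has_more = get_items_keyset_sketch(conn, page_size=10, last_row_id=last_id)
    pages.append((len(items), has_more))
    if not has_more:
        break
    last_id = items[-1]["id"]
```

With 25 rows and a page size of 10, the loop issues three queries and never counts the table.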
### Cursor Format & Security
Cursors are **opaque base64-encoded JSON** objects. Clients must not decode or modify them:
```python
from app.utils.pagination import decode_cursor

# Cursor structure (internal only; never expose raw JSON to client):
#   {"id": 12345}
# Base64-encoded cursor sent to client:
cursor = "eyJpZCI6IDEyMzQ1fQ=="
# Decode with decode_cursor(), which validates the format
last_id = decode_cursor(cursor)
```
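For illustration, a pair of helpers with this behavior could look like the sketch below. This is not the code in `app/utils/pagination.py`: the function bodies, the use of URL-safe base64, and the choice of `ValueError` for rejected cursors are assumptions made for the example.

```python
import base64
import binascii
import json


def encode_cursor(row_id: int) -> str:
    """Encode a row id as base64 JSON (sketch, not the project helper)."""
    payload = json.dumps({"id": row_id})
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_cursor(cursor: str) -> int:
    """Decode and validate a cursor; raise ValueError if it is malformed."""
    try:
        raw = base64.urlsafe_b64decode(cursor.encode("ascii"))
        data = json.loads(raw)
    except (binascii.Error, ValueError, UnicodeError) as exc:
        raise ValueError("Invalid cursor") from exc
    row_id = data.get("id") if isinstance(data, dict) else None
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(row_id, int) or isinstance(row_id, bool) or row_id < 0:
        raise ValueError("Invalid cursor")
    return row_id


token = encode_cursor(12345)
round_trip = decode_cursor(token)
```

Validation only guards against malformed input. Because the payload is not signed, a client can still construct a well-formed cursor for any ID, so authorization has to be enforced by the query itself.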
Benefits:
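- ✓ **Opaque to client**: Format can evolve without breaking API compatibility
- ✓ **Deterministic**: Same row ID always produces the same cursor
- ✓ **Validated**: Invalid/malformed cursors are rejected with clear errors. Base64 encoding is not a signature, so a well-formed cursor for a different ID is still accepted; do not rely on cursors for access control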
For `history_archive`, the read-heavy workload justifies these indexes because:
- Inserts are batched during sync (one batch per minute), not per-request.
- Deletes happen once per day during purge.


@@ -1,39 +1,3 @@
## [IMPORTANT] Promise cancellation not checked in .then()/.catch() chains
**Where found**
- `frontend/src/components/blocklist/BlocklistSourcesSection.tsx:84-88`
- `frontend/src/components/blocklist/BlocklistScheduleSection.tsx:49-58`
- Multiple components use this pattern
**Why this is needed**
When the user navigates away, `.then()` chains don't check whether the request was cancelled. State is then updated on an unmounted component, which causes React warnings, memory leaks, and notifications shown in the wrong context.
**Goal**
Check for cancellation in all `.then()/.catch()` chains.
**What to do**
1. Replace `.then()/.catch()` with `async/await` and cancellation check
2. Or use wrapper hook to hide logic
**Possible traps and issues**
- Checking `signal.aborted` after `await` introduces race conditions
- Better: let AbortError propagate, catch it in catch block
**Docs changes needed**
- Update `Docs/Web-Development.md` § Async Patterns
**Doc references**
- `Docs/Web-Development.md` (async patterns)
---
## [MEDIUM] Inefficient database pagination uses OFFSET
**Where found**