Refactor pagination with cursor-based support and standardized response format

- Implement cursor-based pagination in pagination.py
- Update response models to standardize pagination structure
- Add cursor pagination utilities for repositories
- Update HistoryArchiveRepository and ImportLogRepository with new pagination
- Add comprehensive tests for cursor pagination
- Update documentation for backend development and task tracking

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 17:54:05 +02:00
parent be974b9b0d
commit 67b26a3ef7
8 changed files with 613 additions and 51 deletions
--- a/Docs/Backend-Development.md
+++ b/Docs/Backend-Development.md
@@ -159,6 +159,100 @@ If you add new query patterns to `history_archive_repo.py`:
 - **Pros**: Faster SELECT queries, reduced CPU during queries.
 - **Cons**: Slower INSERT/UPDATE/DELETE (indexes must be maintained), larger database file size.

+---
+
+## 7.5 Cursor-Based Pagination for Large Result Sets
+
+**Problem:** Offset-based pagination (`LIMIT ? OFFSET ?`) must scan and discard every row before the offset before it can return a page. On a 10M-row table, fetching the last page takes 15+ seconds because SQLite steps through all preceding rows.
+
+**Solution:** Use keyset (cursor-based) pagination with `WHERE id > last_id` instead of `OFFSET`. The query seeks through the index on `id` straight to the first row of the next page in O(log N) time.
+
+### Offset vs. Cursor Pagination
+
+| Aspect | Offset (`LIMIT ? OFFSET ?`) | Cursor (`WHERE id > ?`) |
+|--------|-----|-----|
+| Performance | O(N): scans and discards all rows before the offset | O(log N): index seek to the cursor position |
+| Last page on 10M rows | 15+ seconds ⚠️ | <50ms ✓ |
+| API Contract | `page`, `page_size` | `cursor`, `page_size` |
+| Navigation | Random access (any page at any time) | Sequential (next page only, via an opaque cursor) |
+| Count query | Required (slow on large tables) | Not required |
+
+### When to Use Cursor Pagination
+
+- ✓ **Use cursor pagination** for large tables (>100K rows) with frequent pagination queries
+- ✓ **Use cursor pagination** for real-time feeds where rows are constantly added/modified
+- ✓ **Use cursor pagination** if your API already exposes cursor tokens to clients
+- ✗ **Stay with offset pagination** for small datasets or administrative interfaces where performance is not critical
+
+### Implementation Pattern
+
+**1. Add indexes on sort columns:**
+
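+Cursor queries use `WHERE id > :cursor ORDER BY id ASC LIMIT :page_size`. Ensure the sort column is indexed, either on its own or in a composite index where it directly follows any equality-filtered columns; otherwise the query falls back to a scan and sort.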
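As an illustration of the query shape above, the following minimal sketch compares the two approaches on an in-memory SQLite database. The `items` table and the helper names `page_by_offset` and `page_by_keyset` are hypothetical and are not part of this repository; the point is only that both queries return the same rows, while the keyset form starts from the last seen `id` instead of counting past earlier rows.

```python
import sqlite3

# Hypothetical table for illustration only; not the project's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 101)],
)

PAGE_SIZE = 10


def page_by_offset(page: int) -> list[int]:
    """Offset pagination: SQLite steps over (page - 1) * PAGE_SIZE rows first."""
    rows = conn.execute(
        "SELECT id FROM items ORDER BY id ASC LIMIT ? OFFSET ?",
        (PAGE_SIZE, (page - 1) * PAGE_SIZE),
    )
    return [row[0] for row in rows]


def page_by_keyset(last_id: int) -> list[int]:
    """Keyset pagination: seek past last_id using the primary-key index."""
    rows = conn.execute(
        "SELECT id FROM items WHERE id > :cursor ORDER BY id ASC LIMIT :page_size",
        {"cursor": last_id, "page_size": PAGE_SIZE},
    )
    return [row[0] for row in rows]


# Page 3 by offset and "everything after id 20" by keyset are the same rows.
third_page_offset = page_by_offset(3)
third_page_keyset = page_by_keyset(last_id=20)
```

Because `id` is declared `INTEGER PRIMARY KEY` here, SQLite already indexes it; a different sort column would need its own index.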
+
+**2. Use cursor pagination utilities:**
+
+```python
+from app.utils.pagination import encode_cursor, decode_cursor
+
+# Fetch next page using cursor
+last_row_id = decode_cursor(cursor) if cursor else None
+items, has_more = await repo.get_items_keyset(
+    page_size=50,
+    last_row_id=last_row_id,
+)
+
+# Encode cursor for next page
+next_cursor = encode_cursor(items[-1]["id"]) if items and has_more else None
+```
+
+**3. Return cursor in pagination metadata:**
+
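+The response includes `cursor` (set only for cursor pagination) alongside `page`, `page_size`, and `has_next_page`. In the example, `total` and `total_pages` are `-1`, which appears to mark them as unknown because no COUNT query is run: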
+
+```json
+{
+  "items": [...],
+  "pagination": {
+    "page": 1,
+    "page_size": 50,
+    "total": -1,
+    "total_pages": -1,
+    "has_next_page": true,
+    "has_prev_page": false,
+    "cursor": "eyJpZCI6IDQyN30="
+  }
+}
+```
+
+**4. Repositories supporting cursor pagination:**
+
+- `import_log_repo.list_logs_keyset()` — Import log with cursor pagination
+- `history_archive_repo.get_archived_history_keyset()` — Archived bans with cursor pagination
+
+Both functions return `(items, has_more)` instead of `(items, total)` to avoid expensive COUNT queries.
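One common way to produce `has_more` without a COUNT query is to request one row more than the page size and check whether that extra row came back. The sketch below assumes this technique; it is not taken from the repositories listed above, and the `logs` table and `list_rows_keyset_sketch` function are hypothetical. It is also synchronous for brevity, whereas the repository functions are awaited.

```python
import sqlite3
from typing import Optional


def list_rows_keyset_sketch(
    conn: sqlite3.Connection,
    page_size: int,
    last_row_id: Optional[int] = None,
) -> tuple[list[dict], bool]:
    """Return (items, has_more) for the page that follows last_row_id."""
    # Assumption: ids are positive, so 0 means "start from the beginning".
    start_after = 0 if last_row_id is None else last_row_id
    rows = conn.execute(
        "SELECT id, message FROM logs WHERE id > ? ORDER BY id ASC LIMIT ?",
        (start_after, page_size + 1),  # one extra row signals a further page
    ).fetchall()
    has_more = len(rows) > page_size
    items = [{"id": r[0], "message": r[1]} for r in rows[:page_size]]
    return items, has_more


# Hypothetical data: 25 log rows with ids 1..25.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, message TEXT)")
conn.executemany(
    "INSERT INTO logs (id, message) VALUES (?, ?)",
    [(i, f"log {i}") for i in range(1, 26)],
)

first_items, first_has_more = list_rows_keyset_sketch(conn, page_size=10)
last_items, last_has_more = list_rows_keyset_sketch(conn, page_size=10, last_row_id=20)
```

The extra row is never returned to the caller; it is only used to decide whether a next cursor should be issued.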
+
+### Cursor Format & Security
+
+Cursors are **opaque base64-encoded JSON** objects. Clients must not decode or modify them:
+
+```python
+# Cursor structure (internal only — never expose raw JSON to client)
+{"id": 12345}
+
+# Base64-encoded cursor sent to client:
+# eyJpZCI6IDEyMzQ1fQ==
+
+# Decode with decode_cursor() which validates the format
+last_id = decode_cursor(cursor)
+```
+
+Benefits:
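+- ✓ **Opaque to client**: Format can evolve without breaking API compatibility
+- ✓ **Deterministic**: Same row ID always produces the same cursor
+- ✓ **Validated**: Invalid or malformed cursors are rejected with clear errors. Base64 is an encoding, not a signature, so a client can still craft a well-formed cursor for another row ID; do not rely on the cursor for access control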
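To make the cursor format concrete, here is a minimal sketch of an encode/decode pair that reproduces the example cursor shown above. The names `encode_cursor_sketch` and `decode_cursor_sketch` are hypothetical stand-ins; the real `encode_cursor` and `decode_cursor` in `app.utils.pagination` may validate differently or raise a different error type.

```python
import base64
import binascii
import json


def encode_cursor_sketch(row_id: int) -> str:
    """Serialize {"id": row_id} as JSON and base64-encode it."""
    payload = json.dumps({"id": row_id}).encode("utf-8")
    return base64.b64encode(payload).decode("ascii")


def decode_cursor_sketch(cursor: str) -> int:
    """Decode a cursor and return its row id, raising ValueError if invalid."""
    try:
        raw = base64.b64decode(cursor, validate=True)
        data = json.loads(raw)
    except (binascii.Error, ValueError) as exc:
        # Covers bad base64, bad UTF-8, and bad JSON.
        raise ValueError("Malformed cursor") from exc
    row_id = data.get("id") if isinstance(data, dict) else None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(row_id, int) or isinstance(row_id, bool) or row_id < 0:
        raise ValueError("Cursor does not contain a valid row id")
    return row_id


example_cursor = encode_cursor_sketch(12345)
round_trip_id = decode_cursor_sketch(example_cursor)
```

Validation of this kind rejects garbage input but does not detect a deliberately forged cursor; detecting that would require signing the payload, which the format described here does not do.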
+
+
 For `history_archive`, the read-heavy workload justifies these indexes because:
 - Inserts are batched during sync (one batch per minute), not per-request.
 - Deletes happen once per day during purge.