Refactor pagination with cursor-based support and standardized response format
- Implement cursor-based pagination in pagination.py
- Update response models to standardize pagination structure
- Add cursor pagination utilities for repositories
- Update HistoryArchiveRepository and ImportLogRepository with new pagination
- Add comprehensive tests for cursor pagination
- Update documentation for backend development and task tracking

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@@ -159,6 +159,100 @@ If you add new query patterns to `history_archive_repo.py`:
- **Pros**: Faster SELECT queries, reduced CPU during queries.
- **Cons**: Slower INSERT/UPDATE/DELETE (indexes must be maintained), larger database file size.

---

## 7.5 Cursor-Based Pagination for Large Result Sets

**Problem:** Offset-based pagination (`LIMIT ? OFFSET ?`) must step through and discard every skipped row before it returns the requested page. On a 10M-row table, fetching the last page takes 15+ seconds because SQLite has to walk all preceding rows.

**Solution:** Use keyset (cursor-based) pagination: filter with `WHERE id > last_id` instead of OFFSET. The index lets SQLite seek directly to the start of the next page in O(log N) time.

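
The difference between the two query shapes can be sketched with the standard-library `sqlite3` module. This is a minimal illustration, not the project's repository code: the `items` table and the `fetch_offset` / `fetch_keyset` helpers are hypothetical names chosen for the example.

```python
import sqlite3

# In-memory table standing in for a large production table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 1001)],
)


def fetch_offset(page: int, page_size: int) -> list[tuple]:
    """Offset pagination: SQLite walks past (page - 1) * page_size rows first."""
    offset = (page - 1) * page_size
    return conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()


def fetch_keyset(last_id: int, page_size: int) -> list[tuple]:
    """Keyset pagination: the primary-key index seeks straight to id > last_id."""
    return conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()


# Both calls return the same final page; only the access path differs.
offset_page = fetch_offset(page=20, page_size=50)
keyset_page = fetch_keyset(last_id=950, page_size=50)
```

With only 1,000 rows the timing difference is negligible; the gap described above shows up once the skipped range reaches millions of rows.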
### Offset vs. Cursor Pagination

| Aspect | Offset (`LIMIT ? OFFSET ?`) | Cursor (`WHERE id > ?`) |
|--------|-----|-----|
| Performance | O(N): scans and discards the skipped rows | O(log N): index seek |
| Last page on 10M rows | 15+ seconds ⚠️ | <50ms ✓ |
| API contract | `page`, `page_size` | `cursor`, `page_size` |
| Navigation | Stateless; any page at any time | Stateless; forward from an opaque cursor |
| Count query | Required (slow on large tables) | Not required |

### When to Use Cursor Pagination

- ✓ **Use cursor pagination** for large tables (>100K rows) with frequent pagination queries
- ✓ **Use cursor pagination** for real-time feeds where rows are constantly added/modified
- ✓ **Use cursor pagination** if your API already exposes cursor tokens to clients
- ✗ **Keep offset pagination** for small datasets or administrative interfaces where performance is not critical

### Implementation Pattern

**1. Add indexes on sort columns:**

Cursor queries use `WHERE id > :cursor ORDER BY id ASC LIMIT :page_size`. Ensure the sort column is indexed or part of a composite index.

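
One way to check that a cursor query actually uses an index is SQLite's `EXPLAIN QUERY PLAN`. The sketch below assumes a hypothetical `logs` table sorted by a non-primary-key column `created_at`; the table, the index name `idx_logs_created_at`, and the `query_plan` helper are illustrative only. A real cursor on a non-unique column would also need a tie-breaker such as `id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL, message TEXT)"
)

# Keyset query on a sort column other than the primary key.
QUERY = (
    "SELECT id, created_at FROM logs "
    "WHERE created_at > ? ORDER BY created_at ASC LIMIT ?"
)


def query_plan(sql: str, params: tuple) -> list[str]:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


# Without an index the plan is a full scan plus a sort step.
plan_before = query_plan(QUERY, ("2024-01-01", 50))

# After indexing the sort column, the plan should reference the index.
conn.execute("CREATE INDEX idx_logs_created_at ON logs (created_at)")
plan_after = query_plan(QUERY, ("2024-01-01", 50))

print(plan_before)
print(plan_after)
```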
**2. Use cursor pagination utilities:**

```python
from app.utils.pagination import encode_cursor, decode_cursor

# Fetch next page using cursor (no cursor means start from the first page)
last_row_id = decode_cursor(cursor) if cursor else None
items, has_more = await repo.get_items_keyset(
    page_size=50,
    last_row_id=last_row_id,
)

# Encode cursor for next page (only when more rows remain)
next_cursor = encode_cursor(items[-1]["id"]) if items and has_more else None
```

**3. Return cursor in pagination metadata:**

The response includes `cursor` (for cursor pagination) in addition to `page`, `page_size`, and `has_next_page`:

```json
{
  "items": [...],
  "pagination": {
    "page": 1,
    "page_size": 50,
    "total": -1,
    "total_pages": -1,
    "has_next_page": true,
    "has_prev_page": false,
    "cursor": "eyJpZCI6IDQyN30="
  }
}
```

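
A response of this shape could be assembled as sketched here. The `CursorPagination` dataclass and `build_response` helper are hypothetical and are not the project's response models. Treating `-1` as "unknown" for `total` and `total_pages`, and `cursor` as the token for the next page, are assumptions read from the sample above.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class CursorPagination:
    """Pagination metadata mirroring the field order of the sample response."""

    page: int = 1
    page_size: int = 50
    total: int = -1          # assumed sentinel: no COUNT query was run
    total_pages: int = -1    # assumed sentinel: unknown without a total
    has_next_page: bool = False
    has_prev_page: bool = False
    cursor: Optional[str] = None


def build_response(
    items: list[dict[str, Any]],
    page_size: int,
    has_more: bool,
    next_cursor: Optional[str],
) -> dict[str, Any]:
    """Wrap a page of items and its cursor metadata in one response dict."""
    meta = CursorPagination(
        page_size=page_size,
        has_next_page=has_more,
        cursor=next_cursor,
    )
    return {"items": items, "pagination": asdict(meta)}


response = build_response([{"id": 427}], 50, True, "eyJpZCI6IDQyN30=")
```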
**4. Repositories supporting cursor pagination:**

- `import_log_repo.list_logs_keyset()`: Import log with cursor pagination
- `history_archive_repo.get_archived_history_keyset()`: Archived bans with cursor pagination

Both functions return `(items, has_more)` instead of `(items, total)` to avoid expensive COUNT queries.

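
A common way to compute `has_more` without a COUNT query is to request one row more than the page size and check whether it came back. The sketch below is an assumption about how such a function might work, not the repositories' actual code: it is synchronous, uses plain `sqlite3`, and the `import_log` schema and `list_rows_keyset` name are hypothetical. It also assumes row IDs are positive, so `0` can stand for "no cursor".

```python
import sqlite3
from typing import Any, Optional


def list_rows_keyset(
    conn: sqlite3.Connection,
    page_size: int,
    last_row_id: Optional[int] = None,
) -> tuple[list[dict[str, Any]], bool]:
    """Return (items, has_more) by fetching page_size + 1 rows."""
    rows = conn.execute(
        "SELECT id, message FROM import_log WHERE id > ? ORDER BY id ASC LIMIT ?",
        (last_row_id or 0, page_size + 1),
    ).fetchall()
    # The extra row only signals that another page exists; it is not returned.
    has_more = len(rows) > page_size
    items = [{"id": r[0], "message": r[1]} for r in rows[:page_size]]
    return items, has_more


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE import_log (id INTEGER PRIMARY KEY, message TEXT)")
conn.executemany(
    "INSERT INTO import_log (message) VALUES (?)",
    [(f"entry {i}",) for i in range(1, 6)],
)

first_items, first_has_more = list_rows_keyset(conn, page_size=2)
last_items, last_has_more = list_rows_keyset(conn, page_size=2, last_row_id=4)
```

Here the first call returns IDs 1 and 2 with `has_more` set, and the call after ID 4 returns the single remaining row with `has_more` cleared.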
### Cursor Format & Security

Cursors are **opaque base64-encoded JSON** objects. Clients must not decode or modify them:

```python
from app.utils.pagination import decode_cursor

# Cursor structure (internal only; never expose raw JSON to the client):
#   {"id": 12345}

# Base64-encoded cursor sent to the client:
cursor = "eyJpZCI6IDEyMzQ1fQ=="

# decode_cursor() validates the format before returning the row ID
last_id = decode_cursor(cursor)
```

Benefits:
- ✓ **Opaque to client**: Format can evolve without breaking API compatibility
- ✓ **Deterministic**: Same row ID always produces the same cursor
- ✓ **Validated**: Invalid or malformed cursors are rejected with clear errors. Base64 encoding alone does not detect a well-formed cursor that a client has altered.

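
The cursor format described above can be reproduced with the standard library. The functions below are a stand-in sketch that reuses the names `encode_cursor` and `decode_cursor` for readability; the real helpers in `app.utils.pagination` may differ, and the specific validation rules (non-negative integer ID, `ValueError` on failure) are assumptions.

```python
import base64
import binascii
import json


def encode_cursor(row_id: int) -> str:
    """Serialize {"id": row_id} as JSON and base64-encode it."""
    payload = json.dumps({"id": row_id}).encode("utf-8")
    return base64.b64encode(payload).decode("ascii")


def decode_cursor(cursor: str) -> int:
    """Decode a cursor and return its row ID, rejecting malformed input."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(cursor, validate=True)
        payload = json.loads(raw)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("Malformed cursor") from exc

    row_id = payload.get("id") if isinstance(payload, dict) else None
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(row_id, int) or isinstance(row_id, bool) or row_id < 0:
        raise ValueError("Cursor does not contain a valid row id")
    return row_id


token = encode_cursor(12345)
round_trip = decode_cursor(token)
```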
For `history_archive`, the read-heavy workload justifies these indexes because:
- Inserts are batched during sync (one batch per minute), not per-request.
- Deletes happen once per day during purge.