Fix HIGH priority issues: unbounded queries, rate limiting, health checks

Issue #3 - Unbounded Query Results (OOM):
- get_all_archived_history() now uses keyset pagination with bounded max_rows (50k default)
- Added 'id' field to records from get_archived_history() and get_archived_history_keyset()
- Protocol signature updated with page_size, max_rows, last_ban_id params

Issue #7 - Docker Health Check Fails:
- Added curl to Dockerfile.backend runtime image
- HEALTHCHECK now uses 'curl -f http://localhost:8000/api/health'
- compose.prod.yml: increased start_period to 40s, timeout to 10s
- Frontend healthcheck proxies to backend /api/health

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:47:36 +02:00
parent 1830da496d
commit 0d5882b32f
39 changed files with 2067 additions and 339 deletions


@@ -124,6 +124,31 @@ Check logs for these key events:
If duplication occurs frequently, consider migrating to Redis-backed locking (see Advanced section below) for higher reliability.
### Troubleshooting: "Scheduler stops completely"
**Symptom:** Background tasks (blocklist import, geo cache cleanup, history sync, session cleanup) stop running. No errors appear in the logs, but the tasks never execute.
**Cause:** The instance holding the scheduler lock crashed without releasing it, or its heartbeat is failing silently.
**Diagnosis:**
1. Check if lock exists: `SELECT * FROM scheduler_lock;`
2. If the lock row references a PID that is no longer running, the lock is orphaned
3. Check logs for `scheduler_lock_heartbeat_lost` warnings
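The first two diagnosis steps can be scripted. The following is a minimal sketch, not part of the project: it assumes the `scheduler_lock` table stores the holder's process ID in an integer column named `pid` (a hypothetical column name; adjust it to the real schema), and that the script runs on the same POSIX host or container as the lock holder, since PIDs are not comparable across hosts or container PID namespaces.

```python
import os
import sqlite3


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists on the local POSIX host."""
    if pid <= 0:
        return False
    try:
        # Signal 0 sends nothing; it only checks that the process exists.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but is owned by another user.
        return True
    return True


def find_orphaned_locks(conn: sqlite3.Connection) -> list[int]:
    """Return PIDs recorded in scheduler_lock that are no longer running.

    Assumes a hypothetical integer column named `pid`.
    """
    rows = conn.execute("SELECT pid FROM scheduler_lock").fetchall()
    return [pid for (pid,) in rows if not pid_is_running(pid)]
```

An empty result means every recorded holder is still alive locally, so the heartbeat warnings in step 3 are the more likely lead.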
**Solution:**
1. **Clear the orphaned lock:** `DELETE FROM scheduler_lock;`
2. **Restart the instance** that should hold the lock
3. Verify lock acquisition: `grep "scheduler_lock_acquired" logs`
4. If the heartbeat keeps failing, check database latency (a SQLite heartbeat write should complete in under 100 ms)
**Prevention:**
- Monitor `scheduler_lock_heartbeat_lost` events: more than 3 in an hour indicates a problem
- Ensure database I/O is not bottlenecked (SSD recommended for SQLite)
- Consider reducing heartbeat interval if network latency causes false timeouts
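The "more than 3 in an hour" rule can be checked mechanically. The sketch below is illustrative only: the log line format is an assumption (each line starting with a naive ISO-8601 timestamp followed by a space), and the function name is hypothetical. Only the event name and the threshold come from the guidance above.

```python
from datetime import datetime, timedelta

EVENT = "scheduler_lock_heartbeat_lost"
THRESHOLD = 3                 # alert when MORE than this many events occur
WINDOW = timedelta(hours=1)   # ...within this window


def heartbeat_loss_exceeded(lines, now):
    """Return True if more than THRESHOLD events fall in the WINDOW ending at `now`.

    Assumed (hypothetical) line format:
    "2026-05-01T21:10:00 scheduler_lock_heartbeat_lost ..."
    `now` must be a naive datetime to match the naive timestamps.
    """
    count = 0
    for line in lines:
        if EVENT not in line:
            continue
        stamp = line.split(" ", 1)[0]
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines that do not match the assumed format
        if now - WINDOW <= when <= now:
            count += 1
    return count > THRESHOLD
```

Calling it with the lines of the application log and `datetime.now()` gives a yes/no signal that could feed whatever alerting is already in place.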
### Advanced: Migrating to Redis
For very high-traffic deployments with strict data consistency requirements, you can replace the SQLite-backed lock with Redis: