Fix HIGH priority issues: unbounded queries, rate limiting, health checks

Issue #3 - Unbounded Query Results (OOM):
- get_all_archived_history() now uses keyset pagination with bounded max_rows (50k default)
- Added 'id' field to records from get_archived_history() and get_archived_history_keyset()
- Protocol signature updated with page_size, max_rows, last_ban_id params

Issue #7 - Docker Health Check Fails:
- Added curl to Dockerfile.backend runtime image
- HEALTHCHECK now uses 'curl -f http://localhost:8000/api/health'
- compose.prod.yml: increased start_period to 40s, timeout to 10s
- Frontend healthcheck proxies to backend /api/health

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:47:36 +02:00
parent 1830da496d
commit 0d5882b32f
39 changed files with 2067 additions and 339 deletions
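The keyset-pagination change for Issue #3 can be illustrated with a minimal Python sketch. It is not the project's `get_all_archived_history()`: the table `archived_bans` and its columns are hypothetical, and only the parameter names `page_size`, `max_rows` and `last_ban_id` come from the commit message. Each page seeks past the last `id` already seen instead of using `OFFSET`, and the loop stops once `max_rows` rows have been collected, so memory use stays bounded however large the archive grows.

```python
import sqlite3
from typing import Any, Dict, List


def fetch_archived_keyset(
    conn: sqlite3.Connection,
    page_size: int = 1000,
    max_rows: int = 50_000,
    last_ban_id: int = 0,
) -> List[Dict[str, Any]]:
    """Collect rows with id > last_ban_id in id order, page by page, capped at max_rows.

    Table and column names (archived_bans: id, ip, banned_at) are hypothetical.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    rows: List[Dict[str, Any]] = []
    while len(rows) < max_rows:
        # Never request more rows than the remaining max_rows budget allows.
        limit = min(page_size, max_rows - len(rows))
        page = conn.execute(
            "SELECT id, ip, banned_at FROM archived_bans "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_ban_id, limit),
        ).fetchall()
        for row_id, ip, banned_at in page:
            rows.append({"id": row_id, "ip": ip, "banned_at": banned_at})
        if len(page) < limit:
            break  # short or empty page: nothing left after the cursor
        last_ban_id = page[-1][0]  # advance the keyset cursor
    return rows


# Usage: 2500 archived rows, read in pages of 1000 but capped at 2200.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE archived_bans (id INTEGER PRIMARY KEY, ip TEXT, banned_at TEXT)")
conn.executemany(
    "INSERT INTO archived_bans (ip, banned_at) VALUES (?, ?)",
    [(f"10.0.{i // 256}.{i % 256}", "2026-05-01") for i in range(2500)],
)
rows = fetch_archived_keyset(conn, page_size=1000, max_rows=2200)
print(len(rows), rows[0]["id"], rows[-1]["id"])  # 2200 1 2200
```

Because `WHERE id > ?` can use the primary-key index, every page costs roughly the same, whereas `LIMIT ... OFFSET n` makes SQLite step over the first n rows on each call. In this sketch, the `id` in each record is what a caller would pass back as `last_ban_id` to resume where it stopped.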
--- a/Docs/Deployment.md
+++ b/Docs/Deployment.md
@@ -124,6 +124,31 @@ Check logs for these key events:

 If duplication occurs frequently, consider migrating to Redis-backed locking (see Advanced section below) for higher reliability.

+### Troubleshooting: "Scheduler stops completely"
+
+**Symptom:** Background tasks (blocklist import, geo cache cleanup, history sync, session cleanup) stop running. The logs show no errors, but the tasks never execute.
+
+**Cause:** The instance holding the scheduler lock crashed without releasing it, or its heartbeat is failing silently.
+
+**Diagnosis:**
+
+1. Check whether a lock row exists: `SELECT * FROM scheduler_lock;`
+2. If the lock row records a PID that is no longer running, the lock is orphaned
+3. Check the logs for `scheduler_lock_heartbeat_lost` warnings
+
+**Solution:**
+
+1. **Clear the orphaned lock:** `DELETE FROM scheduler_lock;`
+2. **Restart the instance** that should hold the lock
+3. Verify lock acquisition by searching the logs for `scheduler_lock_acquired` (e.g. `grep "scheduler_lock_acquired" <log file>`)
+4. If the heartbeat keeps failing, check database latency (SQLite heartbeat writes should complete in under 100 ms)
+
+**Prevention:**
+
+- Monitor `scheduler_lock_heartbeat_lost` events: more than 3 in an hour indicates a problem
+- Ensure database I/O is not a bottleneck (an SSD is recommended for SQLite)
+- Consider reducing the heartbeat interval if network latency causes false timeouts
+
 ### Advanced: Migrating to Redis

 For very high-traffic deployments with strict data consistency requirements, you can replace the SQLite-backed lock with Redis:
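The "Scheduler stops completely" diagnosis and prevention steps can be scripted. The sketch below is a minimal Python illustration under several assumptions: the `scheduler_lock` table has a `pid` column (hypothetical; the real schema is not shown in this diff), the check runs in the same PID namespace as the lock holder (each Docker container has its own PID namespace by default, so a PID recorded by another container means nothing here), and each log line begins with an ISO 8601 timestamp (also hypothetical). The liveness probe uses `os.kill(pid, 0)`, which is a POSIX idiom; signal 0 does not have that meaning on Windows.

```python
import os
import sqlite3
import subprocess
import sys
from datetime import datetime, timedelta
from typing import Iterable, Optional


def pid_is_running(pid: int) -> bool:
    """POSIX-only liveness probe: signal 0 checks that the PID exists without signalling it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no process with this PID
    except PermissionError:
        return True  # process exists but is owned by another user
    return True


def find_orphaned_lock(conn: sqlite3.Connection) -> Optional[int]:
    """Return the lock holder's PID if that process is gone, or None.

    Assumes a hypothetical `pid` column in scheduler_lock, and that this check
    runs in the same PID namespace (host or container) as the lock holder.
    """
    row = conn.execute("SELECT pid FROM scheduler_lock LIMIT 1").fetchone()
    if row is None:
        return None  # no lock is held at all
    pid = int(row[0])
    return None if pid_is_running(pid) else pid


def heartbeat_losses_exceed(
    log_lines: Iterable[str], now: datetime, threshold: int = 3
) -> bool:
    """True if more than `threshold` heartbeat-lost events occurred in the last hour.

    Assumes (hypothetically) that each log line starts with an ISO 8601
    timestamp followed by a space.
    """
    window_start = now - timedelta(hours=1)
    count = 0
    for line in log_lines:
        if "scheduler_lock_heartbeat_lost" not in line:
            continue
        stamp = datetime.fromisoformat(line.split(" ", 1)[0])
        if window_start <= stamp <= now:
            count += 1
    return count > threshold


# Usage: a lock held by this (live) process is not orphaned.
lock_conn = sqlite3.connect(":memory:")
lock_conn.execute("CREATE TABLE scheduler_lock (pid INTEGER)")
lock_conn.execute("INSERT INTO scheduler_lock (pid) VALUES (?)", (os.getpid(),))
print(find_orphaned_lock(lock_conn))  # None

# A child that has exited and been reaped leaves an orphaned lock behind.
child = subprocess.Popen([sys.executable, "-c", "pass"])
child.wait()
lock_conn.execute("UPDATE scheduler_lock SET pid = ?", (child.pid,))
orphan = find_orphaned_lock(lock_conn)

now = datetime(2026, 5, 1, 22, 0, 0)
logs = [
    "2026-05-01T21:10:00 WARNING scheduler_lock_heartbeat_lost",
    "2026-05-01T21:20:00 WARNING scheduler_lock_heartbeat_lost",
    "2026-05-01T21:30:00 WARNING scheduler_lock_heartbeat_lost",
    "2026-05-01T21:40:00 WARNING scheduler_lock_heartbeat_lost",
    "2026-05-01T19:00:00 WARNING scheduler_lock_heartbeat_lost",  # outside the window
    "2026-05-01T21:50:00 INFO scheduler_lock_acquired",
]
print(heartbeat_losses_exceed(logs, now))  # True: 4 events in the last hour
```

Treat a non-None result from `find_orphaned_lock` as a prompt to run the documented `DELETE FROM scheduler_lock;` cleanup, not as proof: PIDs can be reused, so an unrelated process that happens to get the old PID makes the lock look live.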