Fix HIGH priority issues: unbounded queries, rate limiting, health checks

Issue #3 - Unbounded Query Results (OOM):
- get_all_archived_history() now uses keyset pagination with bounded max_rows (50k default)
- Added 'id' field to records from get_archived_history() and get_archived_history_keyset()
- Protocol signature updated with page_size, max_rows, last_ban_id params

Issue #7 - Docker Health Check Fails:
- Added curl to Dockerfile.backend runtime image
- HEALTHCHECK now uses 'curl -f http://localhost:8000/api/health'
- compose.prod.yml: increased start_period to 40s, timeout to 10s
- Frontend healthcheck proxies to backend /api/health

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
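The keyset pagination described for Issue #3 can be sketched as follows. This is a minimal, hypothetical example using Python's built-in `sqlite3`, not the project's code: the table `archived_bans`, its `ip` column, and the function name `fetch_all_keyset` are assumptions. Only `page_size`, `max_rows`, `last_ban_id`, and the 50k default come from this commit.

```python
import sqlite3


def fetch_all_keyset(conn: sqlite3.Connection, page_size: int = 1000,
                     max_rows: int = 50_000) -> list:
    """Read rows in id order, one bounded page at a time, never returning more than max_rows."""
    # ASSUMPTION: hypothetical table `archived_bans(id INTEGER PRIMARY KEY, ip TEXT)`.
    rows = []
    last_ban_id = 0  # keyset cursor: highest id already returned (assumes ids are positive)
    while len(rows) < max_rows:
        # Never ask for more rows than the remaining max_rows budget allows.
        limit = min(page_size, max_rows - len(rows))
        page = conn.execute(
            "SELECT id, ip FROM archived_bans WHERE id > ? ORDER BY id LIMIT ?",
            (last_ban_id, limit),
        ).fetchall()
        if not page:  # nothing left after the cursor
            break
        rows.extend({"id": row_id, "ip": ip} for row_id, ip in page)
        last_ban_id = page[-1][0]  # advance the cursor past this page
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE archived_bans (id INTEGER PRIMARY KEY, ip TEXT)")
    conn.executemany("INSERT INTO archived_bans (ip) VALUES (?)",
                     [(f"10.0.0.{i % 256}",) for i in range(2500)])
    rows = fetch_all_keyset(conn, page_size=1000, max_rows=2200)
    print(len(rows), rows[-1]["id"])  # 2200 2200
```

Each page seeks with `WHERE id > ?` on the primary key instead of using `OFFSET`, so later pages are not slower than earlier ones. `max_rows` caps memory even if the archive keeps growing.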
@@ -124,6 +124,31 @@ Check logs for these key events:
If duplication occurs frequently, consider migrating to Redis-backed locking (see the Advanced section below) for higher reliability.

### Troubleshooting: "Scheduler stops completely"

**Symptom:** Background tasks (blocklist import, geo cache cleanup, history sync, session cleanup) stop running. The logs show no errors, but the tasks do not execute.

**Cause:** The instance holding the scheduler lock crashed without releasing it, or its heartbeat is failing silently.

**Diagnosis:**
1. Check whether the lock exists: `SELECT * FROM scheduler_lock;`
2. If the lock exists but its PID no longer belongs to a running process, the lock is orphaned
3. Check the logs for `scheduler_lock_heartbeat_lost` warnings
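Step 2 can be automated with a small script. This is a hypothetical sketch: the column name `pid` in `scheduler_lock` and both function names are assumptions, since this document does not show the table schema. It is POSIX-only, and because Docker gives each container its own PID namespace, the check is only meaningful when run inside the same container (same host and namespace) as the instance that wrote the lock.

```python
import os
import sqlite3
from typing import Optional


def pid_is_running(pid: int) -> bool:
    """POSIX only: True if a process with this PID exists in the current PID namespace."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0 performs the existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def lock_is_orphaned(conn: sqlite3.Connection) -> Optional[bool]:
    """None if no lock row exists, else True when the recorded holder PID is gone."""
    # ASSUMPTION: the holder's PID is stored in a column named `pid`.
    row = conn.execute("SELECT pid FROM scheduler_lock LIMIT 1").fetchone()
    if row is None:
        return None
    return not pid_is_running(int(row[0]))


if __name__ == "__main__":
    import sys
    # Open the application's SQLite file read-only so the check cannot modify it.
    db = sqlite3.connect(f"file:{sys.argv[1]}?mode=ro", uri=True)
    state = lock_is_orphaned(db)
    print("no lock held" if state is None else "orphaned lock" if state else "holder is alive")
```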
**Solution:**
1. **Clear the orphaned lock:** `DELETE FROM scheduler_lock;`
2. **Restart the instance** that should hold the lock
3. Verify lock acquisition: `grep "scheduler_lock_acquired" logs`
4. If the heartbeat keeps failing, check database latency (SQLite heartbeat writes should complete in under 100 ms)
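For step 4, one way to estimate whether the disk can keep SQLite writes under 100 ms is to time small committed writes in a scratch database placed in the same directory as the real one. This is a hypothetical probe, not the scheduler's actual heartbeat query (which this document does not show): the `probe` table and `measure_write_latency_ms` are assumptions. It deliberately avoids touching the live `scheduler_lock` row.

```python
import os
import sqlite3
import sys
import tempfile
import time


def measure_write_latency_ms(directory: str, runs: int = 20) -> float:
    """Worst-case latency (ms) of a small UPDATE + COMMIT in a scratch SQLite file.

    The scratch file lives in `directory`, so it shares the disk with the real
    database without touching the real `scheduler_lock` row.
    """
    fd, path = tempfile.mkstemp(suffix=".db", dir=directory)
    os.close(fd)  # SQLite treats the empty file as a new database
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE probe (id INTEGER PRIMARY KEY, beat REAL)")
        conn.execute("INSERT INTO probe (id, beat) VALUES (1, 0)")
        conn.commit()
        worst = 0.0
        for _ in range(runs):
            start = time.perf_counter()
            conn.execute("UPDATE probe SET beat = ? WHERE id = 1", (time.time(),))
            conn.commit()  # the commit (journal write + fsync) is usually the slow part
            worst = max(worst, (time.perf_counter() - start) * 1000)
        return worst
    finally:
        conn.close()
        os.remove(path)  # leave no scratch files behind


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    worst_ms = measure_write_latency_ms(target)
    verdict = "OK" if worst_ms < 100 else "slower than the 100 ms guideline"
    print(f"worst write: {worst_ms:.1f} ms ({verdict})")
```

Reporting the worst case rather than the average matches the failure mode: a single slow commit is enough to miss a heartbeat.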
**Prevention:**
- Monitor `scheduler_lock_heartbeat_lost` events: more than 3 in an hour indicates a problem
- Ensure database I/O is not bottlenecked (an SSD is recommended for SQLite)
- Consider reducing the heartbeat interval if network latency causes false timeouts
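The "more than 3 in an hour" rule can be checked with a sliding window. This is a minimal, hypothetical sketch: `HeartbeatLossMonitor` is not part of the project, and it assumes you already extract one timestamp per `scheduler_lock_heartbeat_lost` log line (the log format is not shown here), feeding them in chronological order.

```python
from collections import deque
from datetime import datetime, timedelta


class HeartbeatLossMonitor:
    """Alerts when more than `threshold` heartbeat-loss events fall within `window`."""

    def __init__(self, threshold: int = 3, window: timedelta = timedelta(hours=1)):
        self.threshold = threshold
        self.window = window
        self._events = deque()  # timestamps of recent events, oldest first

    def record(self, ts: datetime) -> bool:
        """Record one event (call in timestamp order); return True if the alert fires."""
        self._events.append(ts)
        # Evict events older than `window`, measured back from the newest event.
        while ts - self._events[0] > self.window:
            self._events.popleft()
        return len(self._events) > self.threshold


if __name__ == "__main__":
    monitor = HeartbeatLossMonitor()
    t0 = datetime(2024, 1, 1, 12, 0)
    alerts = [monitor.record(t0 + timedelta(minutes=m)) for m in (0, 10, 20, 30)]
    print(alerts)  # [False, False, False, True]
```

A deque keeps only the events inside the current window, so memory stays bounded no matter how long the monitor runs.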
### Advanced: Migrating to Redis

For very high-traffic deployments with strict data consistency requirements, you can replace the SQLite-backed lock with Redis: