# Troubleshooting Guide

## Scheduler Lock Issues

### Lock Held by Crashed Instance (Orphaned Lock)

**Symptom:** Background tasks stop running. Logs show `scheduler_lock_held_by_other_instance` but no other instance is running.

**Diagnosis:**

```bash
sqlite3 /var/lib/bangui/bangui.db "SELECT pid, hostname, heartbeat_at FROM scheduler_lock;"
```

If `heartbeat_at` is older than 5 minutes and the PID no longer exists, the lock is orphaned.

**Recovery:**

```bash
sqlite3 /var/lib/bangui/bangui.db "DELETE FROM scheduler_lock;"
```

Restart the backend. It will acquire a fresh lock.

**Prevention:**

- Monitor `scheduler_lock_heartbeat_lost` events in logs
- If there are more than 3 occurrences per hour, investigate database I/O performance

---

### Two Instances Both Running Scheduler

**Symptom:** Duplicate blocklist imports, duplicate geo cache cleanups, or duplicate history syncs.

**Cause:** Both instances believe they hold the lock.

**Diagnosis:**

1. Check which instance holds the lock: `SELECT pid, hostname FROM scheduler_lock;`
2. Compare with running processes: `ps aux | grep bangui`

**Solution:**

1. Stop one instance immediately
2. Clear the lock: `DELETE FROM scheduler_lock;`
3. Restart the remaining instance

**Prevention:**

- Avoid starting instances simultaneously; let the first instance begin its heartbeat before starting another
- Check that `BANGUI_SINGLE_INSTANCE=true` is set if single-instance operation is required

---

### Heartbeat Update Failures

**Symptom:** Logs show `scheduler_lock_heartbeat_lost` repeatedly, then the lock is lost.

**Cause:** Database writes are failing or extremely slow (>5 seconds per write).

**Diagnosis:**

```bash
time sqlite3 /var/lib/bangui/bangui.db "UPDATE scheduler_lock SET heartbeat_at = unixepoch();"
```

If this takes more than 1 second, database I/O is degraded.

**Solution:**

1. Check database integrity: `sqlite3 /var/lib/bangui/bangui.db "PRAGMA integrity_check;"`
2. Move the database to faster storage (SSD)
3. Check for other I/O bottlenecks on the host

---

### Lock Not Acquired at Startup

**Symptom:** Instance fails to start with error "Could not acquire scheduler lock".

**Cause:** Another instance already holds the lock and appears healthy.

**Diagnosis:**

```bash
sqlite3 /var/lib/bangui/bangui.db "SELECT pid, hostname, heartbeat_at FROM scheduler_lock;"
ps aux | grep
```

**Solution:**

- If the other instance is healthy and should run the scheduler: this instance must wait
- If the other instance has crashed: `DELETE FROM scheduler_lock;` then restart this instance
- If running a single instance: ensure no other instances are running before startup

---

## General Recovery Commands

Clear all locks:

```bash
sqlite3 /var/lib/bangui/bangui.db "DELETE FROM scheduler_lock;"
```

Check lock status:

```bash
sqlite3 /var/lib/bangui/bangui.db "SELECT * FROM scheduler_lock;"
```

Verify database integrity:

```bash
sqlite3 /var/lib/bangui/bangui.db "PRAGMA integrity_check;"
```

---

## Getting Help

If issues persist after following this guide:

1. Enable debug logging: `BANGUI_LOG_LEVEL=debug`
2. Collect logs around the failure time
3. Check `Docs/Deployment.md` for configuration guidance
4. Check `Docs/Observability.md` for monitoring setup
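
---

## Example: Scripted Orphaned-Lock Check

The orphaned-lock rule from the start of this guide (a `heartbeat_at` older than 5 minutes and a PID that no longer exists) can be sketched as a small shell function. This is a minimal illustration, not something shipped with the application: the names `is_lock_orphaned` and `STALE_AFTER_SECONDS` are hypothetical, and the sketch assumes `heartbeat_at` stores Unix epoch seconds, as the `unixepoch()` update in the heartbeat diagnosis suggests. The PID check is only meaningful when run on the host named in the lock row.

```bash
#!/usr/bin/env bash
# Sketch only: is_lock_orphaned and STALE_AFTER_SECONDS are hypothetical names.
# Assumes heartbeat_at is stored as Unix epoch seconds.

STALE_AFTER_SECONDS=300  # the 5-minute threshold used in this guide

# Usage: is_lock_orphaned HEARTBEAT_AT PID [NOW]
# Returns 0 if the lock looks orphaned, 1 otherwise.
is_lock_orphaned() {
  local heartbeat_at="$1"
  local pid="$2"
  local now="${3:-$(date +%s)}"

  # A recent heartbeat means the holder is presumably still alive.
  if [ $(( now - heartbeat_at )) -le "$STALE_AFTER_SECONDS" ]; then
    return 1
  fi

  # kill -0 sends no signal; it only tests whether the PID can be signalled.
  # The /proc check (Linux) covers processes owned by another user.
  if kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ]; then
    return 1
  fi

  return 0
}

# Example wiring, commented out so the sketch has no side effects:
# row=$(sqlite3 -separator ' ' /var/lib/bangui/bangui.db \
#   "SELECT heartbeat_at, pid FROM scheduler_lock;")
# if [ -n "$row" ] && is_lock_orphaned "${row%% *}" "${row##* }"; then
#   echo "Lock looks orphaned; follow the recovery steps in this guide."
# fi
```

The function only reports; it deliberately does not run `DELETE FROM scheduler_lock;` itself, so a human still confirms the hostname and process list before clearing the lock.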