# Troubleshooting Guide

## Scheduler Lock Issues

### Lock Held by Crashed Instance (Orphaned Lock)

**Symptom:** Background tasks stop running. Logs show `scheduler_lock_held_by_other_instance` but no other instance is running.

**Diagnosis:**
```bash
sqlite3 /var/lib/bangui/bangui.db "SELECT pid, hostname, heartbeat_at FROM scheduler_lock;"
```

If `heartbeat_at` is more than 5 minutes old and the PID no longer exists on the host named in `hostname`, the lock is orphaned.
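
The two conditions above can be checked with a short script. This is a minimal sketch, assuming `heartbeat_at` stores Unix epoch seconds; `lock_is_stale` and `pid_is_gone` are hypothetical helper names, not part of the backend.

```bash
# Hypothetical helper: succeeds (exit 0) when the heartbeat is older than the
# allowed age. Assumes heartbeat_at is stored as Unix epoch seconds.
lock_is_stale() {
    local heartbeat_at=$1      # value read from scheduler_lock.heartbeat_at
    local now=$2               # current time in epoch seconds
    local max_age=${3:-300}    # 5 minutes, matching the threshold above
    [ $(( now - heartbeat_at )) -gt "$max_age" ]
}

# Hypothetical helper: succeeds when signal 0 cannot be delivered to the PID.
# Only meaningful on the host recorded in the lock row. Run it as root or as
# the service user, because kill -0 also fails on a permission error.
pid_is_gone() {
    ! kill -0 "$1" 2>/dev/null
}

# Example usage (not executed here); sqlite3 separates columns with '|':
#   row=$(sqlite3 /var/lib/bangui/bangui.db "SELECT pid, heartbeat_at FROM scheduler_lock;")
#   pid=${row%%|*}; hb=${row##*|}
#   lock_is_stale "$hb" "$(date +%s)" && pid_is_gone "$pid" && echo "lock is orphaned"
```

Passing the current time as an argument keeps the staleness check easy to verify by hand against the values returned by the diagnosis query.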

**Recovery:**
```bash
sqlite3 /var/lib/bangui/bangui.db "DELETE FROM scheduler_lock;"
```

Restart the backend. It will acquire a fresh lock at startup.

**Prevention:**
- Monitor `scheduler_lock_heartbeat_lost` events in logs
- If >3 occurrences per hour, investigate database I/O performance
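
The hourly threshold above can be checked against a log excerpt. This is a rough sketch, assuming the event name appears literally in each matching log line; both function names are hypothetical, and the caller is expected to supply only the last hour of logs because the log format is not specified here.

```bash
# Hypothetical helper: counts scheduler_lock_heartbeat_lost events read from
# stdin. grep -c exits non-zero when the count is 0, hence the "|| true".
count_heartbeat_lost() {
    grep -c 'scheduler_lock_heartbeat_lost' || true
}

# Hypothetical helper: succeeds when the count exceeds the limit
# (default 3, matching the per-hour threshold above).
heartbeat_loss_is_excessive() {
    [ "$1" -gt "${2:-3}" ]
}

# Example usage (not executed here). The unit name "bangui" is an assumption;
# substitute however your deployment exposes the backend logs:
#   n=$(journalctl -u bangui --since "1 hour ago" | count_heartbeat_lost)
#   heartbeat_loss_is_excessive "$n" && echo "investigate database I/O ($n events)"
```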

---

### Two Instances Both Running Scheduler

**Symptom:** Duplicate blocklist imports, duplicate geo cache cleanups, or duplicate history syncs.

**Cause:** Both instances believe they hold the lock.

**Diagnosis:**
1. Check which instance holds the lock: `SELECT pid, hostname FROM scheduler_lock;`
2. Compare with running processes on the host shown in `hostname`: `ps aux | grep '[b]angui'` (the bracket keeps `grep` from matching itself)
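
The comparison in step 2 can be scripted so that any running process other than the lock holder stands out. This is a minimal sketch; `extra_instances` is a hypothetical helper, and it only sees processes on the local host.

```bash
# Hypothetical helper: given the lock holder's PID followed by the PIDs of all
# running instances, prints every running PID that does not hold the lock.
extra_instances() {
    local lock_pid=$1
    shift
    local pid
    for pid in "$@"; do
        [ "$pid" = "$lock_pid" ] || echo "$pid"
    done
}

# Example usage (not executed here). The process pattern "bangui" is taken
# from the diagnosis step and may need adjusting for your deployment:
#   lock_pid=$(sqlite3 /var/lib/bangui/bangui.db "SELECT pid FROM scheduler_lock;")
#   extra_instances "$lock_pid" $(pgrep -f bangui)
```

Any PID printed belongs to an instance that is running without holding the lock and is a candidate for the instance to stop.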

**Solution:**
1. Stop one instance immediately
2. Clear lock: `DELETE FROM scheduler_lock;`
3. Restart the remaining instance

**Prevention:**
- Avoid starting instances simultaneously: let the first instance begin its heartbeat before starting another
- Check `BANGUI_SINGLE_INSTANCE=true` is set if single-instance operation is required

---

### Heartbeat Update Failures

**Symptom:** Logs show `scheduler_lock_heartbeat_lost` repeatedly, then the lock is lost.

**Cause:** Database writes are failing or extremely slow (more than 5 seconds per write).

**Diagnosis:**
```bash
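# Self-assignment times a real write without changing the stored heartbeat,
# so a stale lock is not made to look fresh by the test itself.
time sqlite3 /var/lib/bangui/bangui.db "UPDATE scheduler_lock SET heartbeat_at = heartbeat_at;"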
```

If this takes >1 second, database I/O is degraded.

**Solution:**
1. Check database integrity: `sqlite3 /var/lib/bangui/bangui.db "PRAGMA integrity_check;"`
2. Move database to faster storage (SSD)
3. Check for other I/O bottlenecks on the host

---

### Lock Not Acquired at Startup

**Symptom:** Instance fails to start with error "Could not acquire scheduler lock".

**Cause:** Another instance already holds the lock and appears healthy.

**Diagnosis:**
```bash
sqlite3 /var/lib/bangui/bangui.db "SELECT pid, hostname, heartbeat_at FROM scheduler_lock;"
# Replace <pid> with the pid returned above; run this on the host named in hostname.
ps -p <pid>
```

**Solution:**
- If the other instance is healthy and should run the scheduler, this instance must wait
- If the other instance has crashed, run `DELETE FROM scheduler_lock;` and then restart this instance
- If you run a single instance, make sure no other instance is running before startup

---

## General Recovery Commands

Clear all locks:
```bash
sqlite3 /var/lib/bangui/bangui.db "DELETE FROM scheduler_lock;"
```

Check lock status:
```bash
sqlite3 /var/lib/bangui/bangui.db "SELECT * FROM scheduler_lock;"
```

Verify database integrity:
```bash
sqlite3 /var/lib/bangui/bangui.db "PRAGMA integrity_check;"
```

---

## Getting Help

If issues persist after following this guide:

1. Enable debug logging: `BANGUI_LOG_LEVEL=debug`
2. Collect logs around the failure time
3. Check `Docs/Deployment.md` for configuration guidance
4. Check `Docs/Observability.md` for monitoring setup