Troubleshooting Guide
Scheduler Lock Issues
Lock Held by Crashed Instance (Orphaned Lock)
Symptom: Background tasks stop running. Logs show scheduler_lock_held_by_other_instance but no other instance is running.
Diagnosis:
sqlite3 /var/lib/bangui/bangui.db "SELECT pid, hostname, heartbeat_at FROM scheduler_lock;"
If heartbeat_at is older than 5 minutes and the PID no longer exists, the lock is orphaned.
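The two conditions above can be checked together in a short script. This is a minimal sketch, not part of the backend: it assumes heartbeat_at stores Unix epoch seconds, that the script runs on the same POSIX host as the recorded PID, and the function names pid_exists and orphaned_locks are hypothetical.

```python
import os
import sqlite3
import time
from typing import List, Optional, Tuple

STALE_AFTER_SECONDS = 5 * 60  # staleness threshold quoted in this guide


def pid_exists(pid: int) -> bool:
    """Return True if a process with this PID exists on the local host (POSIX)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def orphaned_locks(conn: sqlite3.Connection,
                   now: Optional[float] = None) -> List[Tuple]:
    """Return (pid, hostname, heartbeat_at) rows that look orphaned."""
    now = time.time() if now is None else now
    rows = conn.execute(
        "SELECT pid, hostname, heartbeat_at FROM scheduler_lock"
    ).fetchall()
    # Assumption: heartbeat_at is stored as Unix epoch seconds.
    return [
        row for row in rows
        if now - row[2] > STALE_AFTER_SECONDS and not pid_exists(row[0])
    ]
```

A row returned here only suggests an orphaned lock. If the hostname column names a different machine, the PID check is not meaningful and the process must be verified on that host.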
Recovery:
sqlite3 /var/lib/bangui/bangui.db "DELETE FROM scheduler_lock;"
Restart the backend. It will acquire the lock fresh.
Prevention:
- Monitor scheduler_lock_heartbeat_lost events in logs
- If >3 occurrences per hour, investigate database I/O performance
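One way to apply the "more than 3 per hour" rule is to bucket log lines by clock hour. The sketch below is illustrative only: the log line layout (an ISO 8601 timestamp as the first whitespace-separated token) is an assumption, and noisy_hours is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable

EVENT = "scheduler_lock_heartbeat_lost"
THRESHOLD_PER_HOUR = 3  # from the prevention guidance in this guide


def noisy_hours(lines: Iterable[str],
                threshold: int = THRESHOLD_PER_HOUR) -> Dict[datetime, int]:
    """Map each clock hour with more than `threshold` events to its event count."""
    counts: Counter = Counter()
    for line in lines:
        if EVENT not in line:
            continue
        # Assumed format: ISO 8601 timestamp as the first token of the line.
        stamp = datetime.fromisoformat(line.split(maxsplit=1)[0])
        counts[stamp.replace(minute=0, second=0, microsecond=0)] += 1
    return {hour: n for hour, n in counts.items() if n > threshold}
```

If the real log format differs (for example JSON lines), only the timestamp extraction needs to change.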
Two Instances Both Running Scheduler
Symptom: Duplicate blocklist imports, duplicate geo cache cleanups, or duplicate history syncs.
Cause: Both instances believe they hold the lock.
Diagnosis:
- Check which instance holds the lock: SELECT pid, hostname FROM scheduler_lock;
- Compare with running processes: ps aux | grep bangui
Solution:
- Stop one instance immediately
- Clear lock: DELETE FROM scheduler_lock;
- Restart the remaining instance
Prevention:
- Ensure only one instance starts before heartbeat begins
- Check that BANGUI_SINGLE_INSTANCE=true is set if single-instance operation is required
Heartbeat Update Failures
Symptom: Logs show scheduler_lock_heartbeat_lost repeatedly, then the lock is lost.
Cause: Database writes failing or extremely slow (>5 seconds per write).
Diagnosis:
time sqlite3 /var/lib/bangui/bangui.db "UPDATE scheduler_lock SET heartbeat_at = unixepoch();"
If this takes >1 second, database I/O is degraded.
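The same measurement can be taken from Python without overwriting the stored heartbeat. This is a rough sketch under assumptions: time_heartbeat_write is a hypothetical helper, and it writes heartbeat_at back to itself so the timestamp the running instance relies on is not changed. A running backend may hold the database lock, which would inflate the reading.

```python
import sqlite3
import time

SLOW_WRITE_SECONDS = 1.0  # threshold quoted in this guide


def time_heartbeat_write(conn: sqlite3.Connection) -> float:
    """Time one committed UPDATE on scheduler_lock, leaving heartbeat_at as-is."""
    start = time.perf_counter()
    # Writing the column back to itself keeps the stored heartbeat value.
    conn.execute("UPDATE scheduler_lock SET heartbeat_at = heartbeat_at")
    conn.commit()
    return time.perf_counter() - start


def is_degraded(elapsed_seconds: float) -> bool:
    """Apply the >1 second rule of thumb from this guide."""
    return elapsed_seconds > SLOW_WRITE_SECONDS
```

Usage would look like time_heartbeat_write(sqlite3.connect("/var/lib/bangui/bangui.db")), with the result passed to is_degraded.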
Solution:
- Check disk health: sqlite3 /var/lib/bangui/bangui.db "PRAGMA integrity_check;"
- Move database to faster storage (SSD)
- Check for other I/O bottlenecks on the host
Lock Not Acquired at Startup
Symptom: Instance fails to start with error "Could not acquire scheduler lock".
Cause: Another instance already holds the lock and appears healthy.
Diagnosis:
sqlite3 /var/lib/bangui/bangui.db "SELECT pid, hostname, heartbeat_at FROM scheduler_lock;"
ps aux | grep <pid>
Solution:
- If other instance is healthy and should run scheduler: this instance must wait
- If other instance is crashed: DELETE FROM scheduler_lock; then restart this instance
- If running single instance: ensure no other instances are running before startup
Rate Limiting
Getting 429 Too Many Requests
Symptom: API returns HTTP 429 with rate_limit_exceeded error code.
Cause: You have exceeded the per-IP rate limit for a specific operation.
Diagnosis:
- Check the Retry-After header in the response; it tells you how many seconds to wait
- Look for the log event *_rate_limit_exceeded, which shows the bucket and client IP
Rate limit buckets:
| Bucket | Limit | Window | Operations |
|---|---|---|---|
| bans:ban | 100 | 1 minute | Ban IP addresses |
| bans:unban | 100 | 1 minute | Unban IP addresses |
| blocklist:import | 10 | 1 hour | Import blocklists |
| config:update | 50 | 1 minute | Update configuration |
| jail:update | 100 | 1 minute | Update jail config |
| jail:create | 100 | 1 minute | Add log paths, assign filters/actions |
| jail:delete | 100 | 1 minute | Remove log paths, actions |
| jail:activate | 100 | 1 minute | Activate jails |
| jail:deactivate | 100 | 1 minute | Deactivate jails |
| filter:update | 50 | 1 minute | Update filters |
| filter:create | 50 | 1 minute | Create filters |
| filter:delete | 50 | 1 minute | Delete filters |
| action:update | 50 | 1 minute | Update actions |
| action:create | 50 | 1 minute | Create actions |
| action:delete | 50 | 1 minute | Delete actions |
Solution:
- Wait for the Retry-After period before retrying
- If you hit the limit during legitimate bulk operations, consider batching requests
- For blocklist imports (10/hour), ensure automated imports are not more frequent
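A client can honor Retry-After automatically. The sketch below is transport-agnostic and hypothetical: send stands for whatever function performs the HTTP request and returns (status, headers, body), and the header is assumed to carry whole seconds, as described in this guide.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def call_with_retry_after(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, waiting out Retry-After on HTTP 429, up to `max_attempts` tries."""
    response = send()
    for _ in range(max_attempts - 1):
        status, headers, _body = response
        if status != 429:
            break
        # Assumption: Retry-After is a number of seconds; fall back to 1 second.
        sleep(float(headers.get("Retry-After", "1")))
        response = send()
    return response
```

The lookup uses the exact key "Retry-After". Some HTTP libraries expose case-insensitive header mappings, others do not, so adjust the key handling to the library in use.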
Prevention:
- Monitor *_rate_limit_exceeded log events
- Adjust limits via environment variables if needed (see Docs/CONFIGURATION.md)
- For bulk operations, implement client-side throttling
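Client-side throttling for bulk operations could look like the sliding-window limiter below. It is a sketch, not BanGUI code: the Throttle class is hypothetical, and the limit and window should be set at or below the values in the bucket table.

```python
import time
from collections import deque
from typing import Callable, Deque


class Throttle:
    """Allow at most `limit` calls per `window` seconds (sliding window)."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.limit, self.window = limit, window
        self._clock, self._sleep = clock, sleep
        self._sent: Deque[float] = deque()  # timestamps of recent calls

    def wait(self) -> None:
        """Block until another call fits inside the window, then record it."""
        now = self._clock()
        # Drop calls that have already left the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) >= self.limit:
            # Sleep until the oldest recorded call ages out of the window.
            self._sleep(self.window - (now - self._sent[0]))
            now = self._clock()
            self._sent.popleft()
        self._sent.append(now)
```

For example, Throttle(limit=100, window=60.0) with a wait() call before each ban request keeps a bulk job within the bans:ban bucket, provided no other client shares the same IP.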
Note: If rate limiting triggers unexpectedly for legitimate use, check for:
- Internal monitoring scripts hitting endpoints too frequently
- Multiple users behind the same proxy IP
- Stale rate limit state after process restart (uses in-memory tracking)
General Recovery Commands
Clear all locks:
sqlite3 /var/lib/bangui/bangui.db "DELETE FROM scheduler_lock;"
Check lock status:
sqlite3 /var/lib/bangui/bangui.db "SELECT * FROM scheduler_lock;"
Verify database integrity:
sqlite3 /var/lib/bangui/bangui.db "PRAGMA integrity_check;"
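For scripted health checks, the integrity check can be run from Python's standard sqlite3 module. This is a small sketch with a hypothetical helper name, integrity_problems. SQLite reports a single row containing ok when no problems are found.

```python
import sqlite3
from typing import List


def integrity_problems(conn: sqlite3.Connection) -> List[str]:
    """Return the messages from PRAGMA integrity_check, or [] if the database is healthy."""
    rows = conn.execute("PRAGMA integrity_check").fetchall()
    messages = [row[0] for row in rows]
    return [] if messages == ["ok"] else messages
```

A non-empty result lists the problems SQLite found and is worth attaching to any report filed under Getting Help.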
Getting Help
If issues persist after following this guide:
- Enable debug logging: BANGUI_LOG_LEVEL=debug
- Collect logs around the failure time
- Check Docs/Deployment.md for configuration guidance
- Check Docs/Observability.md for monitoring setup