
Lukas cc6dbcf3f0 feat: implement API versioning /api/v1/

- All backend routers moved to /api/v1/ prefix
- Frontend BASE_URL updated to /api/v1
- Setup redirect middleware updated to redirect to /api/v1/setup
- Health router path fixed: prefix=/api/v1/health, @router.get('')
- conftest.py: set server_status=online for test fixture
- Created Docs/API_VERSIONING.md with deprecation policy
- Updated Docs/Backend-Development.md with versioning section
- Updated Instructions.md curl examples

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-05-02 21:29:30 +02:00

Troubleshooting Guide

Scheduler Lock Issues

Lock Held by Crashed Instance (Orphaned Lock)

Symptom: Background tasks stop running. Logs show scheduler_lock_held_by_other_instance but no other instance is running.

Diagnosis:

sqlite3 /var/lib/bangui/bangui.db "SELECT pid, hostname, heartbeat_at FROM scheduler_lock;"

If heartbeat_at is more than 5 minutes old and the PID no longer exists on the host named in hostname, the lock is orphaned.
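The same check can be scripted. Below is a minimal Python sketch. It assumes the scheduler_lock columns queried above and assumes heartbeat_at holds Unix epoch seconds, which the unixepoch() update elsewhere in this guide suggests. find_orphaned_lock and pid_exists are hypothetical helpers and are not part of BanGUI. A PID only means something on the host named in hostname, so the sketch makes no judgement about locks held by another host.

```python
import os
import socket
import sqlite3
import time
from typing import Callable, Optional, Tuple

STALE_AFTER_S = 5 * 60  # "more than 5 minutes old", per this guide (assumed epoch seconds)


def pid_exists(pid: int) -> bool:
    """Return True if a process with this PID exists on this host (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks that the PID exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def find_orphaned_lock(
    conn: sqlite3.Connection,
    now: Optional[float] = None,
    pid_alive: Callable[[int], bool] = pid_exists,
) -> Optional[Tuple[int, str, float]]:
    """Return (pid, hostname, age_s) if the lock looks orphaned, else None (hypothetical helper)."""
    now = time.time() if now is None else now
    row = conn.execute("SELECT pid, hostname, heartbeat_at FROM scheduler_lock").fetchone()
    if row is None:
        return None  # nobody holds the lock
    pid, hostname, heartbeat_at = row
    age_s = now - heartbeat_at
    if age_s <= STALE_AFTER_S:
        return None  # heartbeat is recent, so the holder is presumed alive
    if hostname != socket.gethostname():
        return None  # cannot inspect PIDs on another host; check there manually
    if pid_alive(pid):
        return None  # stale heartbeat but the process still runs: investigate, do not delete
    return pid, hostname, age_s
```

Open the database with sqlite3.connect("/var/lib/bangui/bangui.db") on the host in question and pass the connection in. A non-None result matches the orphaned-lock criteria described above.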

Recovery:

sqlite3 /var/lib/bangui/bangui.db "DELETE FROM scheduler_lock;"

Restart the backend; it will acquire a fresh lock.
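An unconditional DELETE can race with a backend that is starting at the same moment and has just taken the lock. Below is a minimal Python sketch of a more cautious variant. It assumes the scheduler_lock columns queried above (pid, heartbeat_at) and deletes the row only if it still holds the values you inspected, so a lock freshly taken by a new instance is left alone. delete_if_unchanged is a hypothetical helper.

```python
import sqlite3


def delete_if_unchanged(conn: sqlite3.Connection, pid: int, heartbeat_at: int) -> bool:
    """Delete the lock row only if it still matches what was inspected (hypothetical helper).

    Returns True if a row was deleted, False if the lock changed in the meantime.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM scheduler_lock WHERE pid = ? AND heartbeat_at = ?",
            (pid, heartbeat_at),
        )
    return cur.rowcount > 0
```

Pass the pid and heartbeat_at values from the diagnosis query. If the function returns False, the lock has changed since you looked, so re-run the diagnosis before doing anything else.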

Prevention:

Monitor logs for scheduler_lock_heartbeat_lost events
If they occur more than 3 times per hour, investigate database I/O performance
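Below is a minimal Python sketch of the more-than-3-per-hour check. It assumes structured JSON-lines logs in which each record has an event field and a timezone-aware ISO 8601 timestamp field. Those field names are assumptions, so adjust them to whatever BanGUI actually emits. heartbeat_losses_last_hour and should_investigate_io are hypothetical helpers.

```python
import json
from datetime import datetime, timedelta
from typing import Iterable

EVENT = "scheduler_lock_heartbeat_lost"
THRESHOLD_PER_HOUR = 3  # per this guide: more than 3 per hour warrants an I/O investigation


def heartbeat_losses_last_hour(lines: Iterable[str], now: datetime) -> int:
    """Count heartbeat-lost events whose timestamp falls in (now - 1h, now]."""
    cutoff = now - timedelta(hours=1)
    count = 0
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as tracebacks
        if not isinstance(record, dict) or record.get("event") != EVENT:
            continue
        ts = datetime.fromisoformat(record["timestamp"])  # assumed field name
        if cutoff < ts <= now:
            count += 1
    return count


def should_investigate_io(lines: Iterable[str], now: datetime) -> bool:
    """True when heartbeat losses in the last hour exceed the guide's threshold."""
    return heartbeat_losses_last_hour(lines, now) > THRESHOLD_PER_HOUR
```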

Two Instances Both Running Scheduler

Symptom: Duplicate blocklist imports, duplicate geo cache cleanups, or duplicate history syncs.

Cause: Both instances believe they hold the lock.

Diagnosis:

Check which instance holds the lock: sqlite3 /var/lib/bangui/bangui.db "SELECT pid, hostname FROM scheduler_lock;"
Compare with the processes running on each host: ps aux | grep bangui

Solution:

Stop one instance immediately
Clear lock: DELETE FROM scheduler_lock;
Restart the remaining instance

Prevention:

Start instances one at a time, so that the first instance holds the lock and has begun its heartbeat before another one starts
If single-instance operation is required, check that BANGUI_SINGLE_INSTANCE=true is set
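Two instances can only both believe they hold the lock if acquisition is not atomic. To illustrate, here is a minimal Python sketch of an atomic acquire on a single-row SQLite table. It assumes a hypothetical schema with a fixed id = 1 row, which may differ from BanGUI's real table. The insert-or-takeover runs as one UPSERT inside a BEGIN IMMEDIATE transaction and replaces the holder only when its heartbeat is stale, so at most one contender can win. ON CONFLICT ... DO UPDATE requires SQLite 3.24 or later. try_acquire is a hypothetical helper.

```python
import sqlite3
import time
from typing import Optional

STALE_AFTER_S = 5 * 60  # same staleness threshold this guide uses for orphaned locks

# Hypothetical schema: the CHECK constraint allows at most one lock row.
SCHEMA = """
CREATE TABLE IF NOT EXISTS scheduler_lock (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    pid INTEGER NOT NULL,
    hostname TEXT NOT NULL,
    heartbeat_at INTEGER NOT NULL
)
"""


def try_acquire(conn: sqlite3.Connection, pid: int, hostname: str,
                now: Optional[int] = None) -> bool:
    """Take the lock if it is free or stale; return True if (pid, hostname) holds it.

    `conn` must be opened with isolation_level=None so BEGIN/COMMIT are manual.
    """
    now = int(time.time()) if now is None else now
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading or writing
    try:
        conn.execute(
            """
            INSERT INTO scheduler_lock (id, pid, hostname, heartbeat_at)
            VALUES (1, ?, ?, ?)
            ON CONFLICT(id) DO UPDATE SET
                pid = excluded.pid,
                hostname = excluded.hostname,
                heartbeat_at = excluded.heartbeat_at
            WHERE scheduler_lock.heartbeat_at < ?
            """,
            (pid, hostname, now, now - STALE_AFTER_S),
        )
        holder = conn.execute("SELECT pid, hostname FROM scheduler_lock WHERE id = 1").fetchone()
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return holder == (pid, hostname)
```

Because the check and the takeover happen in one write transaction, a second instance sees either a fresh heartbeat (and loses) or a stale one (and replaces it). It never sees a state where both instances hold the lock.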

Heartbeat Update Failures

Symptom: Logs show scheduler_lock_heartbeat_lost repeatedly, and then the lock is lost.

Cause: Database writes are failing or are extremely slow (more than 5 seconds per write).

Diagnosis:

time sqlite3 /var/lib/bangui/bangui.db "UPDATE scheduler_lock SET heartbeat_at = unixepoch();"

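If this takes more than 1 second, database I/O is degraded. Note that this command overwrites heartbeat_at for the current lock holder, and unixepoch() requires SQLite 3.38.0 or later.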
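To repeat the measurement and compare it against this guide's thresholds (more than 1 second means degraded I/O, more than 5 seconds breaks the heartbeat), here is a minimal Python sketch that times several committed writes. It writes to a hypothetical scratch table named _io_probe instead of scheduler_lock, so it does not touch the real heartbeat. Run it against a copy of the database on the same storage, or accept that it briefly creates and drops a small table in the live one. probe_write_latency is a hypothetical helper.

```python
import sqlite3
import statistics
import time

DEGRADED_S = 1.0        # this guide: more than 1 s per write means degraded I/O
HEARTBEAT_FAIL_S = 5.0  # this guide: more than 5 s per write causes heartbeat failures


def probe_write_latency(db_path: str, samples: int = 5) -> dict:
    """Time `samples` individually committed single-row writes; report median and max seconds."""
    if samples < 1:
        raise ValueError("samples must be >= 1")
    # isolation_level=None puts the connection in autocommit mode, so each INSERT commits on its own.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS _io_probe (t REAL)")
        timings = []
        for _ in range(samples):
            start = time.perf_counter()
            conn.execute("INSERT INTO _io_probe (t) VALUES (?)", (time.time(),))
            timings.append(time.perf_counter() - start)
        conn.execute("DROP TABLE _io_probe")  # remove the scratch table again
    finally:
        conn.close()
    worst = max(timings)
    return {
        "median_s": statistics.median(timings),
        "max_s": worst,
        "degraded": worst > DEGRADED_S,
        "heartbeat_at_risk": worst > HEARTBEAT_FAIL_S,
    }
```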

Solution:

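Check database file integrity: sqlite3 /var/lib/bangui/bangui.db "PRAGMA integrity_check;" (this checks the database file, not the disk hardware; use the host's disk tools to check disk health)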
Move database to faster storage (SSD)
Check for other I/O bottlenecks on the host

Lock Not Acquired at Startup

Symptom: Instance fails to start with error "Could not acquire scheduler lock".

Cause: Another instance already holds the lock and appears healthy.

Diagnosis:

sqlite3 /var/lib/bangui/bangui.db "SELECT pid, hostname, heartbeat_at FROM scheduler_lock;"
ps -p <pid>

Run the ps check on the host named in hostname; a PID is only meaningful on that host.

Solution:

If the other instance is healthy and should run the scheduler, this instance must wait
If the other instance has crashed, run DELETE FROM scheduler_lock; and then restart this instance
If you run a single instance, make sure no other instances are running before startup

Rate Limiting

Getting 429 Too Many Requests

Symptom: API returns HTTP 429 with rate_limit_exceeded error code.

Cause: You have exceeded the per-IP rate limit for a specific operation.

Diagnosis:

Check the Retry-After header in the response; it gives the number of seconds to wait
Look for *_rate_limit_exceeded log events, which record the bucket and client IP

Rate limit buckets:

| Bucket | Limit (requests) | Window | Operations |
|---|---|---|---|
| `bans:ban` | 100 | 1 minute | Ban IP addresses |
| `bans:unban` | 100 | 1 minute | Unban IP addresses |
| `blocklist:import` | 10 | 1 hour | Import blocklists |
| `config:update` | 50 | 1 minute | Update configuration |
| `jail:update` | 100 | 1 minute | Update jail config |
| `jail:create` | 100 | 1 minute | Add log paths, assign filters/actions |
| `jail:delete` | 100 | 1 minute | Remove log paths, actions |
| `jail:activate` | 100 | 1 minute | Activate jails |
| `jail:deactivate` | 100 | 1 minute | Deactivate jails |
| `filter:update` | 50 | 1 minute | Update filters |
| `filter:create` | 50 | 1 minute | Create filters |
| `filter:delete` | 50 | 1 minute | Delete filters |
| `action:update` | 50 | 1 minute | Update actions |
| `action:create` | 50 | 1 minute | Create actions |
| `action:delete` | 50 | 1 minute | Delete actions |
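The buckets above are counted per client IP and tracked in memory. Below is a minimal Python sketch of how such a limiter can work, assuming a sliding-window log. This guide does not say which algorithm BanGUI uses. SlidingWindowLimiter and its method names are hypothetical. check() returns None when a request is allowed, or the number of seconds a Retry-After header would carry.

```python
import math
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

# (limit, window in seconds) for a subset of the buckets in the table above
BUCKETS: Dict[str, Tuple[int, float]] = {
    "bans:ban": (100, 60.0),
    "blocklist:import": (10, 3600.0),
    "config:update": (50, 60.0),
}


class SlidingWindowLimiter:
    """Hypothetical in-memory per-(bucket, IP) sliding-window limiter."""

    def __init__(self, buckets: Dict[str, Tuple[int, float]]) -> None:
        self.buckets = buckets
        # (bucket, client_ip) -> timestamps of accepted requests, oldest first
        self.hits: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def check(self, bucket: str, client_ip: str, now: Optional[float] = None) -> Optional[int]:
        """Record the request and return None if allowed, else Retry-After seconds."""
        now = time.monotonic() if now is None else now
        limit, window = self.buckets[bucket]
        hits = self.hits[(bucket, client_ip)]
        while hits and hits[0] <= now - window:
            hits.popleft()  # drop requests that have left the window
        if len(hits) >= limit:
            # The oldest accepted request leaves the window at hits[0] + window.
            return max(1, math.ceil(hits[0] + window - now))
        hits.append(now)
        return None
```

Because the state lives in a dict inside one process, each worker process keeps separate counters, and all counters are lost when the process restarts.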

Solution:

Wait for the Retry-After period before retrying
If you hit the limit during legitimate bulk operations, consider batching requests
For blocklist imports (limit: 10 per hour), make sure automated imports do not run more often than that
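Below is a minimal Python client-side sketch of waiting for the Retry-After period. It assumes the header carries a number of seconds, as described above. HTTP also allows an HTTP-date form, which this sketch does not parse and replaces with a 1-second pause. The send callable stands in for whatever HTTP client you use. call_with_retry is a hypothetical helper.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


def call_with_retry(send: Callable[[], Response], max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send`, sleeping for Retry-After seconds after each HTTP 429 response."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        status, headers = send()
        if status != 429 or attempt == max_attempts:
            break  # success, another error, or out of attempts
        # Plain dict lookup; real HTTP clients usually offer case-insensitive headers.
        retry_after = headers.get("Retry-After", "1")
        try:
            delay = max(0.0, float(retry_after))
        except ValueError:
            delay = 1.0  # HTTP-date form or malformed value: fall back to a short pause
        sleep(delay)
    return status, headers
```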

Prevention:

Monitor *_rate_limit_exceeded log events
Adjust limits via environment variables if needed (see Docs/CONFIGURATION.md)
For bulk operations, implement client-side throttling
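For client-side throttling, below is a minimal Python sketch that spaces calls evenly so a batch stays under a bucket's limit (for example 100 bans per minute, or 10 blocklist imports per hour). Throttle is a hypothetical helper. It assumes a single client process, because separate processes would each keep their own spacing.

```python
import time
from typing import Callable


class Throttle:
    """Allow at most `limit` calls per `window_s` seconds by spacing them evenly."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if limit < 1:
            raise ValueError("limit must be >= 1")
        self.interval = window_s / limit  # minimum gap between consecutive calls
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = float("-inf")  # the first call never waits

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Call throttle.wait() before each request. With Throttle(100, 60.0), requests go out at most every 0.6 seconds, which keeps a bulk ban run within the `bans:ban` bucket.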

Note: If rate limiting triggers unexpectedly for legitimate use, check for:

Internal monitoring scripts hitting endpoints too frequently
Multiple users behind the same proxy IP
Stale rate limit state after process restart (uses in-memory tracking)

General Recovery Commands

Clear all locks (stop every backend instance first):

sqlite3 /var/lib/bangui/bangui.db "DELETE FROM scheduler_lock;"

Check lock status:

sqlite3 /var/lib/bangui/bangui.db "SELECT * FROM scheduler_lock;"

Verify database integrity:

sqlite3 /var/lib/bangui/bangui.db "PRAGMA integrity_check;"

Getting Help

If issues persist after following this guide:

Enable debug logging: BANGUI_LOG_LEVEL=debug
Collect logs around the failure time
Check Docs/Deployment.md for configuration guidance
Check Docs/Observability.md for monitoring setup
