# Troubleshooting Guide

## Scheduler Lock Issues

### Lock Held by Crashed Instance (Orphaned Lock)

**Symptom:** Background tasks stop running. Logs show `scheduler_lock_held_by_other_instance` but no other instance is running.

**Diagnosis:**
```bash
sqlite3 /var/lib/bangui/bangui.db "SELECT pid, hostname, heartbeat_at FROM scheduler_lock;"
```

If `heartbeat_at` is older than 5 minutes and the PID no longer exists, the lock is orphaned.
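The orphaned-lock check can also be scripted. The following is a minimal sketch, not part of the application: it assumes `heartbeat_at` stores Unix epoch seconds (consistent with the `unixepoch()` call used elsewhere in this guide) and that the script runs on the host named in the `hostname` column, since a PID can only be checked locally. The function names are hypothetical.

```python
import os
import sqlite3
import time
from typing import Callable, List, Optional, Tuple

STALE_AFTER_SECONDS = 5 * 60  # the 5-minute threshold from this guide


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists on the local host (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without signalling the process
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def find_orphaned_locks(
    conn: sqlite3.Connection,
    now: Optional[float] = None,
    pid_alive: Callable[[int], bool] = pid_is_running,
) -> List[Tuple[int, str, float]]:
    """Return lock rows whose heartbeat is stale and whose PID is gone.

    Assumes heartbeat_at holds Unix epoch seconds (an assumption, see above).
    """
    now = time.time() if now is None else now
    rows = conn.execute(
        "SELECT pid, hostname, heartbeat_at FROM scheduler_lock"
    ).fetchall()
    return [
        (pid, hostname, heartbeat_at)
        for pid, hostname, heartbeat_at in rows
        if now - heartbeat_at > STALE_AFTER_SECONDS and not pid_alive(pid)
    ]
```

To use it, pass a connection opened with `sqlite3.connect("/var/lib/bangui/bangui.db")`. A non-empty result suggests the lock is orphaned. The sketch only reports and never deletes anything.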

**Recovery:**
```bash
sqlite3 /var/lib/bangui/bangui.db "DELETE FROM scheduler_lock;"
```

Restart the backend. It will acquire the lock fresh.

**Prevention:**
- Monitor `scheduler_lock_heartbeat_lost` events in logs
- If >3 occurrences per hour, investigate database I/O performance
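
The hourly threshold above can be checked with a small log scan. This is a rough sketch under a stated assumption: each log line starts with an ISO 8601 timestamp (for example `2024-05-01T10:02:11`) followed by whitespace. The guide does not specify the log format, so adapt the parsing to your logging setup. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List

EVENT = "scheduler_lock_heartbeat_lost"
THRESHOLD_PER_HOUR = 3  # from this guide: more than 3 per hour warrants investigation


def heartbeat_losses_per_hour(lines: Iterable[str]) -> Counter:
    """Count heartbeat-lost events per clock hour.

    Assumes the first whitespace-separated field of each line is an
    ISO 8601 timestamp; lines that do not match are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        if EVENT not in line:
            continue
        first_field = line.split(maxsplit=1)[0]
        try:
            timestamp = datetime.fromisoformat(first_field)
        except ValueError:
            continue  # not the assumed format
        counts[timestamp.strftime("%Y-%m-%d %H:00")] += 1
    return counts


def hours_over_threshold(lines: Iterable[str]) -> List[str]:
    """Return the hours in which the event count exceeded the threshold."""
    counts = heartbeat_losses_per_hour(lines)
    return sorted(hour for hour, n in counts.items() if n > THRESHOLD_PER_HOUR)
```

Feeding it the lines of a log file (for example `hours_over_threshold(open(path))`) returns the hours that need a closer look at database I/O.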

---

### Two Instances Both Running Scheduler

**Symptom:** Duplicate blocklist imports, duplicate geo cache cleanups, or duplicate history syncs.

**Cause:** Both instances believe they hold the lock.

**Diagnosis:**
1. Check which instance holds the lock: `SELECT pid, hostname FROM scheduler_lock;`
2. Compare with running processes: `ps aux | grep bangui`

**Solution:**
1. Stop one instance immediately
2. Clear lock: `DELETE FROM scheduler_lock;`
3. Restart the remaining instance

**Prevention:**
- Ensure only one instance starts before heartbeat begins
- Check `BANGUI_SINGLE_INSTANCE=true` is set if single-instance operation is required

---

### Heartbeat Update Failures

**Symptom:** Logs show `scheduler_lock_heartbeat_lost` repeatedly, then lock is lost.

**Cause:** Database writes failing or extremely slow (>5 seconds per write).

**Diagnosis:**
```bash
time sqlite3 /var/lib/bangui/bangui.db "UPDATE scheduler_lock SET heartbeat_at = unixepoch();"
```

If this takes >1 second, database I/O is degraded.
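
The same timing check can be run from Python, which is useful where the installed `sqlite3` CLI predates SQLite 3.38 and therefore lacks `unixepoch()`. The sketch below passes the timestamp in from Python instead. It assumes `heartbeat_at` stores epoch seconds, and the function names are hypothetical. Like the shell command, it overwrites `heartbeat_at` on the existing lock row.

```python
import sqlite3
import time

DEGRADED_AFTER_SECONDS = 1.0  # threshold from this guide


def time_heartbeat_write(conn: sqlite3.Connection) -> float:
    """Update heartbeat_at, commit, and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    conn.execute(
        "UPDATE scheduler_lock SET heartbeat_at = ?",
        (int(time.time()),),  # epoch seconds, assumed to match the column's format
    )
    conn.commit()  # include the commit, since that is where the disk write happens
    return time.perf_counter() - start


def is_degraded(elapsed_seconds: float) -> bool:
    """Return True if a write took longer than the guide's 1-second threshold."""
    return elapsed_seconds > DEGRADED_AFTER_SECONDS
```

Open the connection with `sqlite3.connect("/var/lib/bangui/bangui.db")` and call `is_degraded(time_heartbeat_write(conn))`. Repeating the measurement a few times gives a more reliable picture than a single run.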

**Solution:**
1. Check database integrity: `sqlite3 /var/lib/bangui/bangui.db "PRAGMA integrity_check;"`
2. Move database to faster storage (SSD)
3. Check for other I/O bottlenecks on the host

---

### Lock Not Acquired at Startup

**Symptom:** Instance fails to start with error "Could not acquire scheduler lock".

**Cause:** Another instance already holds the lock and appears healthy.

**Diagnosis:**
```bash
sqlite3 /var/lib/bangui/bangui.db "SELECT pid, hostname, heartbeat_at FROM scheduler_lock;"
# Replace <pid> with the pid value returned by the query above
ps -p <pid>
```

**Solution:**
- If the other instance is healthy and should run the scheduler: this instance must wait
- If the other instance has crashed: `DELETE FROM scheduler_lock;` then restart this instance
- If running a single instance: ensure no other instances are running before startup

---

## Rate Limiting

### Getting 429 Too Many Requests

**Symptom:** API returns HTTP 429 with `rate_limit_exceeded` error code.

**Cause:** You have exceeded the per-IP rate limit for a specific operation.

**Diagnosis:**
1. Check the `Retry-After` header in the response, which tells you how many seconds to wait
2. Look for the log event `*_rate_limit_exceeded`, which shows the bucket and client IP

**Rate limit buckets:**

| Bucket | Limit | Window | Operations |
|--------|-------|--------|------------|
| `bans:ban` | 100 | 1 minute | Ban IP addresses |
| `bans:unban` | 100 | 1 minute | Unban IP addresses |
| `blocklist:import` | 10 | 1 hour | Import blocklists |
| `config:update` | 50 | 1 minute | Update configuration |
| `jail:update` | 100 | 1 minute | Update jail config |
| `jail:create` | 100 | 1 minute | Add log paths, assign filters/actions |
| `jail:delete` | 100 | 1 minute | Remove log paths, actions |
| `jail:activate` | 100 | 1 minute | Activate jails |
| `jail:deactivate` | 100 | 1 minute | Deactivate jails |
| `filter:update` | 50 | 1 minute | Update filters |
| `filter:create` | 50 | 1 minute | Create filters |
| `filter:delete` | 50 | 1 minute | Delete filters |
| `action:update` | 50 | 1 minute | Update actions |
| `action:create` | 50 | 1 minute | Create actions |
| `action:delete` | 50 | 1 minute | Delete actions |

**Solution:**
1. Wait for the `Retry-After` period before retrying
2. If you hit the limit during legitimate bulk operations, consider batching requests
3. For blocklist imports (10/hour), ensure automated imports are not more frequent
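
A client can honour `Retry-After` automatically. The sketch below is one possible shape, not the project's client: it assumes the header carries a number of seconds, as described above, and that the header mapping uses the exact key `Retry-After`. `send` is a hypothetical callable that wraps whatever HTTP library you use and returns `(status, headers, body)`.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Parse Retry-After as seconds; fall back to `default` if missing or malformed."""
    raw = headers.get("Retry-After", "")
    try:
        return max(0.0, float(raw))
    except ValueError:
        return default


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, waiting for the Retry-After period after each HTTP 429.

    Returns the first non-429 response, or the last 429 once attempts run out.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        response = send()
        status, headers, _body = response
        if status != 429 or attempt == max_attempts:
            break
        sleep(retry_after_seconds(headers))
    return response
```

Capping the number of attempts keeps a misbehaving script from retrying indefinitely against a bucket such as `blocklist:import`, whose window is a full hour.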

**Prevention:**
- Monitor `*_rate_limit_exceeded` log events
- Adjust limits via environment variables if needed (see `Docs/CONFIGURATION.md`)
- For bulk operations, implement client-side throttling
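
Client-side throttling can be as simple as spacing requests evenly. The following sketch is an illustration only; the `Throttle` class is hypothetical. It derives a minimum interval from a bucket's limit and window, for example 100 requests per 60 seconds gives 0.6 seconds between calls. How the server counts its windows is not documented here, so passing a limit slightly below the bucket's value leaves some headroom.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Space calls evenly so at most `limit` happen per `window_seconds`."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if limit < 1 or window_seconds <= 0:
            raise ValueError("limit must be >= 1 and window_seconds must be > 0")
        self.interval = window_seconds / limit
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

In a bulk job, create one throttle per bucket (for instance `Throttle(90, 60)` for `bans:ban`) and call `wait()` before each request.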

**Note:** If rate limiting triggers unexpectedly for legitimate use, check for:
- Internal monitoring scripts hitting endpoints too frequently
- Multiple users behind the same proxy IP
- Stale rate limit state after process restart (uses in-memory tracking)

---

## General Recovery Commands

Clear all locks:
```bash
sqlite3 /var/lib/bangui/bangui.db "DELETE FROM scheduler_lock;"
```

Check lock status:
```bash
sqlite3 /var/lib/bangui/bangui.db "SELECT * FROM scheduler_lock;"
```

Verify database integrity:
```bash
sqlite3 /var/lib/bangui/bangui.db "PRAGMA integrity_check;"
```

---

## Getting Help

If issues persist after following this guide:

1. Enable debug logging: `BANGUI_LOG_LEVEL=debug`
2. Collect logs around the failure time
3. Check `Docs/Deployment.md` for configuration guidance
4. Check `Docs/Observability.md` for monitoring setup