Fix health check endpoint to return 503 when fail2ban is offline

The health check endpoint now properly indicates service unavailability:
- Returns HTTP 200 when fail2ban is online
- Returns HTTP 503 when fail2ban is offline

This allows Docker and other orchestration tools to correctly detect when
fail2ban is unreachable and mark the backend container as unhealthy, so that
an orchestrator (standalone Docker only reports the status and does not
restart on it) can restart the container instead of treating it as healthy
despite fail2ban being down.
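
A minimal, framework-independent sketch of the status mapping described
above. The function name `build_health_response` and the JSON field names are
assumptions made for illustration; the actual response bodies in health.py are
not shown in this message.

```python
from http import HTTPStatus


def build_health_response(fail2ban_online: bool) -> tuple[int, dict[str, str]]:
    """Map fail2ban reachability to an HTTP status code and a JSON-serialisable body.

    Hypothetical helper: the field names below are illustrative only.
    """
    if fail2ban_online:
        # fail2ban reachable: report healthy with HTTP 200
        return HTTPStatus.OK.value, {"status": "ok", "fail2ban": "online"}
    # fail2ban unreachable: HTTP 503 makes status-based health probes fail
    return HTTPStatus.SERVICE_UNAVAILABLE.value, {
        "status": "unavailable",
        "fail2ban": "offline",
    }
```

A route handler for GET /api/health could call this helper and pass the code
and body to whatever response type the web framework provides.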

Changes:
- Update GET /api/health to return 503 on fail2ban offline
- Return appropriate JSON response bodies for each state
- Update tests to verify both online (200) and offline (503) scenarios
- Update Dockerfile HEALTHCHECK documentation
- Add Health Checks section to Deployment.md documentation

All tests pass with 100% coverage on health.py.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 21:56:42 +02:00
parent 52f237d5d4
commit 94d6352d1d
5 changed files with 60 additions and 72 deletions

@@ -1,48 +1,3 @@
## [CRITICAL] Background tasks not idempotent
**Where found**
- `backend/app/tasks/blocklist_import.py` — bans applied without checking if already banned
- `backend/app/tasks/geo_cache_flush.py` — cache entries written without transaction
- Multi-step operations not wrapped in transaction
**Why this is needed**
If a task crashes mid-execution, partial state remains. On retry: bans applied again → duplicates, cache entries written twice → corruption.
**Goal**
Make all background tasks idempotent — retrying produces same result as running once.
**What to do**
1. Use operation IDs to deduplicate:
```python
operation_id = f"import_{source.id}_{datetime.now().date().isoformat()}"
if await import_log_repo.get_by_operation_id(operation_id):
    return  # Already done
```
2. Use transactions for multi-step operations
3. Store operation state before execution
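The three steps above can be sketched together as follows. This is an illustration under stated assumptions, not the project's implementation: it uses the synchronous standard-library `sqlite3` module instead of the async repositories in `backend/app/tasks/`, and the table names `import_log` and `bans` and all function names are hypothetical.

```python
import sqlite3
from datetime import date


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical tables: one row per operation, one row per banned IP
    conn.execute(
        "CREATE TABLE IF NOT EXISTS import_log "
        "(operation_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS bans (ip TEXT PRIMARY KEY)")


def make_operation_id(source_id: int, day: date) -> str:
    # Deterministic key: the same source and day always give the same ID
    return f"import_{source_id}_{day.isoformat()}"


def run_import_once(conn: sqlite3.Connection, operation_id: str, ips: list[str]) -> bool:
    """Apply bans at most once per operation_id. Returns False if already completed."""
    row = conn.execute(
        "SELECT state FROM import_log WHERE operation_id = ?", (operation_id,)
    ).fetchone()
    if row is not None and row[0] == "completed":
        return False  # Already done, retry is a no-op

    # Step 3: store the operation state before doing any work
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO import_log (operation_id, state) VALUES (?, 'pending')",
            (operation_id,),
        )
    try:
        # Step 2: bans and the state change commit together or not at all
        with conn:
            conn.executemany(
                "INSERT OR IGNORE INTO bans (ip) VALUES (?)", [(ip,) for ip in ips]
            )
            conn.execute(
                "UPDATE import_log SET state = 'completed' WHERE operation_id = ?",
                (operation_id,),
            )
    except Exception:
        # Record the failure so that a later retry can start from a known state
        with conn:
            conn.execute(
                "UPDATE import_log SET state = 'failed' WHERE operation_id = ?",
                (operation_id,),
            )
        raise
    return True
```

Calling `run_import_once` twice with the same `operation_id` applies the bans once; the second call sees the `completed` row and returns `False`. `INSERT OR IGNORE` additionally keeps a retry after a failed run from creating duplicate bans.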
**Possible traps and issues**
- Idempotency keys must be unique but deterministic
- Transactions require database support
- State machine (pending → completed/failed) must be enforced
**Docs changes needed**
- Update `Docs/Backend-Development.md` § Task Idempotency
**Doc references**
- `Docs/Backend-Development.md` (task design)
- `backend/app/tasks/` (task implementations)
---
## [CRITICAL] Health check endpoint returns wrong status code
**Where found**