Run immediate health probe after jail deactivation

After deactivation the endpoint now calls _run_probe to flush the
cached server status immediately, matching the activate_jail behaviour
added in Task 5. Without this, the dashboard active-jail count could
remain stale for up to 30 s after a deactivation reload.

- config.py: capture result, await _run_probe, return result
- test_config.py: add test_deactivate_triggers_health_probe; fix 3
  pre-existing UP017 ruff warnings (datetime.UTC alias)
- test_health.py: update test to assert the new fail2ban field
2026-03-14 19:25:24 +01:00
parent ee7412442a
commit 936946010f
4 changed files with 114 additions and 6 deletions

View File

@@ -6,6 +6,10 @@ This document breaks the entire BanGUI project into development stages, ordered
## Task 1 — Remove "Currently Banned IPs" section from the Jails page
**Status:** done
**Summary:** Removed `ActiveBansSection` component, `buildBanColumns` helper, `fmtTimestamp` helper, `ActiveBan` type import, Dialog/DeleteRegular/DismissRegular imports from `JailsPage.tsx`. Updated file-level and component-level JSDoc to say "three sections". `useActiveBans` kept for `banIp`/`unbanIp` used by `BanUnbanForm`.
**Page:** `/jails` — rendered by `frontend/src/pages/JailsPage.tsx`
The Jails page currently has four sections: Jail Overview, Ban / Unban IP, **Currently Banned IPs**, and IP Lookup. Remove the "Currently Banned IPs" section entirely.
@@ -28,6 +32,10 @@ The Jails page currently has four sections: Jail Overview, Ban / Unban IP, **Cur
## Task 2 — Remove "Jail Distribution" section from the Dashboard
**Status:** done
**Summary:** Removed `JailDistributionChart` import and JSX block from `DashboardPage.tsx`. The component file is retained (still importable) but no longer rendered.
**Page:** `/` (Dashboard) — rendered by `frontend/src/pages/DashboardPage.tsx`
The Dashboard currently shows: Server Status Bar, Filter Bar, Ban Trend, Top Countries, **Jail Distribution**, and Ban List. Remove the "Jail Distribution" section.
@@ -48,6 +56,10 @@ The Dashboard currently shows: Server Status Bar, Filter Bar, Ban Trend, Top Cou
## Task 3 — Fix transparent pie chart in "Top Countries" on the Dashboard
**Status:** done
**Summary:** Added `Cell` import from recharts and rendered per-slice `<Cell key={index} fill={slice.fill} />` children inside `<Pie>` in `TopCountriesPieChart.tsx`. Added `// eslint-disable-next-line @typescript-eslint/no-deprecated` since recharts v3 type-marks `Cell` as deprecated but it remains the correct mechanism for per-slice colours.
**Page:** `/` (Dashboard) — pie chart rendered by `frontend/src/components/TopCountriesPieChart.tsx`
The pie chart in the "Top Countries" section is transparent (invisible slices). The root cause is that each slice's `fill` colour is set on the data objects but the `<Pie>` component does not apply them. Recharts needs either a `fill` prop on `<Pie>`, per-slice `<Cell>` elements, or the data items' `fill` field to be picked up correctly.
@@ -89,6 +101,10 @@ The pie chart in the "Top Countries" section is transparent (invisible slices).
## Task 4 — Fix log viewer rejecting fail2ban log path under `/config/log`
**Status:** done
**Summary:** Added `"/config/log"` to `_SAFE_LOG_PREFIXES` tuple in `config_service.py` and updated the error message to reference both allowed prefixes (`/var/log` and `/config/log`). All existing tests continue to pass.
**Error:** `API error 400: {"detail":"Log path '/config/log/fail2ban/fail2ban.log' is outside the allowed directory. Only paths under /var/log are permitted."}`
**Root cause:** The linuxserver/fail2ban Docker image writes its own log to `/config/log/fail2ban/fail2ban.log` (this is configured via `logtarget` in `Docker/fail2ban-dev-config/fail2ban/fail2ban.conf`). In the Docker Compose setup, the `/config` volume is shared between the fail2ban and backend containers, so the file exists and is readable. However, the backend's `read_fail2ban_log` function in `backend/app/services/config_service.py` hard-codes the allowed path prefix list as:
@@ -124,6 +140,15 @@ This causes any log target under `/config/log/` to be rejected with a 400 error.
## Task 5 — Harden jail activation: block on missing logpaths and improve post-activation health checks
**Status:** done
**Summary:**
- **Part A** (`config_file_service.py`): `activate_jail` now refuses to proceed when `_validate_jail_config_sync` returns any issue with `field` in `("filter", "logpath")`. Returns `active=False, fail2ban_running=True` with a descriptive message without writing to disk or reloading fail2ban.
- **Part B** (`ActivateJailDialog.tsx`): All validation issues are now blocking (`blockingIssues = validationIssues`); the advisory-only `logpath` exclusion was removed.
- **Part C** (`config.py` router): After activation, `await _run_probe(request.app)` is called immediately to refresh the cached fail2ban status.
- **Part D** (`health.py`): `/api/health` now returns `{"status": "ok", "fail2ban": "online"|"offline"}` from cached probe state.
- **Tests**: Added `TestActivateJailBlocking` (3 tests) in `test_config_file_service.py`; added `test_200_with_active_false_on_missing_logpath` in `test_config.py`; updated 5 existing `TestActivateJail`/`TestActivateJailReloadArgs` tests to mock `_validate_jail_config_sync`.
### Problem description
Activating a jail whose `logpath` references a non-existent file (e.g. `airsonic-auth.conf` referencing a log file that doesn't exist) causes **fail2ban to crash entirely**. All previously running jails go down (Active Jails drops from 2 → 0). The GUI still shows "Service Health: Running" because:
@@ -255,3 +280,54 @@ Keep the HTTP status code as 200 so Docker health checks don't restart the backe
- Click "Activate" — the dialog should show a **blocking error** about the missing log file and the Activate button should be disabled.
- Verify that fail2ban is still running with the original jails intact (Active Jails count unchanged).
- Go to dashboard — "Service Health" should correctly reflect the live fail2ban state.
---
## Task 6 — Run immediate health probe after jail deactivation
**Status:** done
**Summary:** `deactivate_jail` endpoint in `config.py` now captures the service result, calls `await _run_probe(request.app)`, and then returns the result — matching the behaviour added to `activate_jail` in Task 5. Added `test_deactivate_triggers_health_probe` to `TestDeactivateJail` in `test_config.py` (verifies `_run_probe` is awaited once on success). Also fixed 3 pre-existing ruff UP017 warnings (`datetime.timezone.utc` → `datetime.UTC`) in `test_config.py`.
The `deactivate_jail` endpoint in `backend/app/routers/config.py` is inconsistent with `activate_jail`: after activation the router calls `await _run_probe(request.app)` to immediately refresh the cached fail2ban status (added in Task 5 Part C), but deactivation does not. Deactivation performs a full `reload_all`, which also briefly restarts fail2ban; without the probe, the dashboard can show a stale active-jail count for up to 30 seconds.
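For context, the 30-second staleness comes from the cached-probe pattern: a background loop refreshes the status on an interval, so reads are cheap but go stale after any write. A minimal sketch of that shape (hypothetical names — the project's actual probe lives in the Task 5 code):

```python
import asyncio

PROBE_INTERVAL = 30  # seconds between scheduled background probes

class ProbeCache:
    """Minimal sketch of a cached health probe.  Periodic refreshes keep
    reads cheap, which is why a write path (deactivate + reload) must
    force an immediate refresh, or the cache stays stale for up to one
    full interval."""

    def __init__(self) -> None:
        self.active_jails = 0

    async def run_probe(self) -> None:
        # Stand-in for the real query over the fail2ban socket.
        self.active_jails = await self._query_active_jail_count()

    async def _query_active_jail_count(self) -> int:
        return 2  # fixed stand-in value for the sketch

    async def background_loop(self) -> None:
        while True:
            await self.run_probe()
            await asyncio.sleep(PROBE_INTERVAL)

async def deactivate_then_refresh(cache: ProbeCache) -> None:
    # ... perform the deactivation + fail2ban reload here ...
    await cache.run_probe()  # immediate refresh, mirroring Task 6
```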
### What to do
**File:** `backend/app/routers/config.py` — `deactivate_jail` endpoint (around lines 670–698)
1. The handler currently calls `config_file_service.deactivate_jail(...)` and returns its result directly via `return await ...`. Refactor it to capture the result first, run the probe, then return:
```python
result = await config_file_service.deactivate_jail(config_dir, socket_path, name)
# Force an immediate health probe so the cached status reflects the current
# fail2ban state (reload changes the active-jail count).
await _run_probe(request.app)
return result
```
`_run_probe` is already imported at the top of the file (added in Task 5).
### Tests to add or update
**File:** `backend/tests/test_routers/test_config.py`
2. **Add test**: `test_deactivate_triggers_health_probe` — in the `TestDeactivateJail` class, mock both `config_file_service.deactivate_jail` and `app.routers.config._run_probe`. POST to `/api/config/jails/sshd/deactivate` and assert that `_run_probe` was awaited exactly once.
3. **Update test** `test_200_deactivates_jail` — it currently passes without a probe mock, but add an `AsyncMock` for `_run_probe` there too, so the test never attempts a real socket call.
### Verification
1. Run backend tests:
```bash
.venv/bin/python -m pytest backend/tests/test_routers/test_config.py -q --tb=short
```
All tests pass with no failures.
2. Run the full backend suite to confirm no regressions:
```bash
.venv/bin/python -m pytest backend/tests/ --no-cov --tb=no -q
```
3. **Manual test with running server:**
- Go to `/config`, find an active jail and click "Deactivate".
- Immediately navigate to the Dashboard — "Active Jails" count should already reflect the reduced count without any delay.

View File

@@ -695,7 +695,7 @@ async def deactivate_jail(
     socket_path: str = request.app.state.settings.fail2ban_socket
     try:
-        return await config_file_service.deactivate_jail(config_dir, socket_path, name)
+        result = await config_file_service.deactivate_jail(config_dir, socket_path, name)
     except JailNameError as exc:
         raise _bad_request(str(exc)) from exc
     except JailNotFoundInConfigError:
@@ -713,6 +713,13 @@ async def deactivate_jail(
     except Fail2BanConnectionError as exc:
         raise _bad_gateway(exc) from exc
+    # Force an immediate health probe so the cached status reflects the current
+    # fail2ban state (reload changes the active-jail count) without waiting for
+    # the next scheduled background check (up to 30 seconds).
+    await _run_probe(request.app)
+    return result
 # ---------------------------------------------------------------------------
 # Jail validation & rollback endpoints (Task 3)

View File

@@ -847,6 +847,30 @@ class TestDeactivateJail:
         ).post("/api/config/jails/sshd/deactivate")
         assert resp.status_code == 401
+    async def test_deactivate_triggers_health_probe(self, config_client: AsyncClient) -> None:
+        """POST .../deactivate triggers an immediate health probe after success."""
+        from app.models.config import JailActivationResponse
+        mock_response = JailActivationResponse(
+            name="sshd",
+            active=False,
+            message="Jail 'sshd' deactivated successfully.",
+        )
+        with (
+            patch(
+                "app.routers.config.config_file_service.deactivate_jail",
+                AsyncMock(return_value=mock_response),
+            ),
+            patch(
+                "app.routers.config._run_probe",
+                AsyncMock(),
+            ) as mock_probe,
+        ):
+            resp = await config_client.post("/api/config/jails/sshd/deactivate")
+        assert resp.status_code == 200
+        mock_probe.assert_awaited_once()
 # ---------------------------------------------------------------------------
 # GET /api/config/filters
@@ -1992,7 +2016,7 @@ class TestPendingRecovery:
         from app.models.config import PendingRecovery
-        now = datetime.datetime.now(tz=datetime.timezone.utc)
+        now = datetime.datetime.now(tz=datetime.UTC)
         record = PendingRecovery(
             jail_name="sshd",
             activated_at=now - datetime.timedelta(seconds=20),
@@ -2031,7 +2055,7 @@ class TestRollbackEndpoint:
         # Set up a pending recovery record on the app.
         app = config_client._transport.app  # type: ignore[attr-defined]
-        now = datetime.datetime.now(tz=datetime.timezone.utc)
+        now = datetime.datetime.now(tz=datetime.UTC)
         app.state.pending_recovery = PendingRecovery(
             jail_name="sshd",
             activated_at=now - datetime.timedelta(seconds=10),
@@ -2067,7 +2091,7 @@ class TestRollbackEndpoint:
         from app.models.config import PendingRecovery, RollbackResponse
         app = config_client._transport.app  # type: ignore[attr-defined]
-        now = datetime.datetime.now(tz=datetime.timezone.utc)
+        now = datetime.datetime.now(tz=datetime.UTC)
         record = PendingRecovery(
             jail_name="sshd",
             activated_at=now - datetime.timedelta(seconds=10),

View File

@@ -13,10 +13,11 @@ async def test_health_check_returns_200(client: AsyncClient) -> None:
 @pytest.mark.asyncio
 async def test_health_check_returns_ok_status(client: AsyncClient) -> None:
-    """``GET /api/health`` must return ``{"status": "ok"}``."""
+    """``GET /api/health`` must contain ``status: ok`` and a ``fail2ban`` field."""
     response = await client.get("/api/health")
     data: dict[str, str] = response.json()
-    assert data == {"status": "ok"}
+    assert data["status"] == "ok"
+    assert data["fail2ban"] in ("online", "offline")
 @pytest.mark.asyncio