Lukas 936946010f Run immediate health probe after jail deactivation
After deactivation the endpoint now calls _run_probe to flush the
cached server status immediately, matching the activate_jail behaviour
added in Task 5. Without this, the dashboard active-jail count could
remain stale for up to 30 s after a deactivation reload.

- config.py: capture result, await _run_probe, return result
- test_config.py: add test_deactivate_triggers_health_probe; fix 3
  pre-existing UP017 ruff warnings (datetime.UTC alias)
- test_health.py: update test to assert the new fail2ban field
2026-03-14 19:25:24 +01:00

# BanGUI — Task List
This document breaks the entire BanGUI project into development stages, ordered so that each stage builds on the previous one. Every task is described in prose with enough detail for a developer to begin work. References point to the relevant documentation.
---
## Task 1 — Remove "Currently Banned IPs" section from the Jails page
**Status:** done
**Summary:** Removed `ActiveBansSection` component, `buildBanColumns` helper, `fmtTimestamp` helper, `ActiveBan` type import, Dialog/DeleteRegular/DismissRegular imports from `JailsPage.tsx`. Updated file-level and component-level JSDoc to say "three sections". `useActiveBans` kept for `banIp`/`unbanIp` used by `BanUnbanForm`.
**Page:** `/jails` — rendered by `frontend/src/pages/JailsPage.tsx`
The Jails page currently has four sections: Jail Overview, Ban / Unban IP, **Currently Banned IPs**, and IP Lookup. Remove the "Currently Banned IPs" section entirely.
### What to do
1. In `frontend/src/pages/JailsPage.tsx`, find the `ActiveBansSection` sub-component (the function that renders the "Currently Banned IPs" heading, the DataGrid of active bans, the refresh/clear-all buttons, and the confirmation dialog). Delete the entire `ActiveBansSection` function.
2. In the `JailsPage` component at the bottom of the same file, remove the `<ActiveBansSection />` JSX element from the returned markup.
3. Remove any imports, hooks, types, or helper functions that were **only** used by `ActiveBansSection` and are now unused (e.g. `useActiveBans` if it is no longer referenced elsewhere, the `buildBanColumns` helper, the `ActiveBan` type import, etc.). Check the remaining code — `useActiveBans` is also destructured for `banIp`/`unbanIp` in `JailsPage`, so keep it if still needed there.
4. Update the file-level JSDoc comment at the top of the file: change the list from four sections to three and remove the "Currently Banned IPs" bullet.
5. Update the JSDoc on the `JailsPage` component to say "three sections" instead of "four sections".
### Verification
- `npx tsc --noEmit` passes with no errors.
- `npx eslint src/pages/JailsPage.tsx` reports no warnings.
- The `/jails` page renders without the "Currently Banned IPs" section; the remaining three sections (Jail Overview, Ban/Unban IP, IP Lookup) still work.
---
## Task 2 — Remove "Jail Distribution" section from the Dashboard
**Status:** done
**Summary:** Removed `JailDistributionChart` import and JSX block from `DashboardPage.tsx`. The component file is retained (still importable) but no longer rendered.
**Page:** `/` (Dashboard) — rendered by `frontend/src/pages/DashboardPage.tsx`
The Dashboard currently shows: Server Status Bar, Filter Bar, Ban Trend, Top Countries, **Jail Distribution**, and Ban List. Remove the "Jail Distribution" section.
### What to do
1. In `frontend/src/pages/DashboardPage.tsx`, delete the entire `{/* Jail Distribution section */}` block — this is the `<div className={styles.section}>` that wraps the `<JailDistributionChart>` component (approximately lines 164–177).
2. Remove the `JailDistributionChart` import at the top of the file (`import { JailDistributionChart } from "../components/JailDistributionChart";`).
3. If the `JailDistributionChart` component file (`frontend/src/components/JailDistributionChart.tsx`) is not imported anywhere else in the codebase, you may optionally delete it, but this is not required.
### Verification
- `npx tsc --noEmit` passes with no errors.
- `npx eslint src/pages/DashboardPage.tsx` reports no warnings.
- The Dashboard renders without the "Jail Distribution" chart; all other sections remain functional.
---
## Task 3 — Fix transparent pie chart in "Top Countries" on the Dashboard
**Status:** done
**Summary:** Added `Cell` import from recharts and rendered per-slice `<Cell key={index} fill={slice.fill} />` children inside `<Pie>` in `TopCountriesPieChart.tsx`. Added `// eslint-disable-next-line @typescript-eslint/no-deprecated` since recharts v3 type-marks `Cell` as deprecated but it remains the correct mechanism for per-slice colours.
**Page:** `/` (Dashboard) — pie chart rendered by `frontend/src/components/TopCountriesPieChart.tsx`
The pie chart in the "Top Countries" section is transparent (invisible slices). The root cause is that each slice's `fill` colour is set on the data objects but the `<Pie>` component does not apply them. Recharts needs either a `fill` prop on `<Pie>`, per-slice `<Cell>` elements, or the data items' `fill` field to be picked up correctly.
### What to do
1. Open `frontend/src/components/TopCountriesPieChart.tsx`.
2. The `buildSlices` helper already attaches a resolved `fill` colour string to every `SliceData` item. However, the `<Pie>` element does not render individual `<Cell>` elements to apply those colours.
3. Import `Cell` from `recharts`:
```tsx
import { Cell, Legend, Pie, PieChart, ResponsiveContainer, Tooltip } from "recharts";
```
4. Inside the `<Pie>` element, add child `<Cell>` elements that map each slice to its colour:
```tsx
<Pie
  data={slices}
  dataKey="value"
  nameKey="name"
  cx="50%"
  cy="50%"
  outerRadius={90}
  label={…}
  labelLine={false}
>
  {slices.map((slice, index) => (
    <Cell key={index} fill={slice.fill} />
  ))}
</Pie>
```
This ensures each pie slice is painted with the colour from `CHART_PALETTE` that `buildSlices` resolved.
### Verification
- `npx tsc --noEmit` passes with no errors.
- The pie chart on the Dashboard now displays coloured slices (blue, red, green, gold, purple) matching the Fluent palette.
- The legend and tooltip still work and show correct country names and percentages.
---
## Task 4 — Fix log viewer rejecting fail2ban log path under `/config/log`
**Status:** done
**Summary:** Added `"/config/log"` to `_SAFE_LOG_PREFIXES` tuple in `config_service.py` and updated the error message to reference both allowed prefixes (`/var/log` and `/config/log`). All existing tests continue to pass.
**Error:** `API error 400: {"detail":"Log path '/config/log/fail2ban/fail2ban.log' is outside the allowed directory. Only paths under /var/log are permitted."}`
**Root cause:** The linuxserver/fail2ban Docker image writes its own log to `/config/log/fail2ban/fail2ban.log` (this is configured via `logtarget` in `Docker/fail2ban-dev-config/fail2ban/fail2ban.conf`). In the Docker Compose setup, the `/config` volume is shared between the fail2ban and backend containers, so the file exists and is readable. However, the backend's `read_fail2ban_log` function in `backend/app/services/config_service.py` hard-codes the allowed path prefix list as:
```python
_SAFE_LOG_PREFIXES: tuple[str, ...] = ("/var/log",)
```
This causes any log target under `/config/log/` to be rejected with a 400 error.
### What to do
1. Open `backend/app/services/config_service.py`.
2. Find the `_SAFE_LOG_PREFIXES` constant (line ~771). Add `"/config/log"` as a second allowed prefix:
```python
_SAFE_LOG_PREFIXES: tuple[str, ...] = ("/var/log", "/config/log")
```
This is safe because:
- `/config/log` is a volume mount controlled by the Docker Compose setup, not user input.
- The path is still validated via `Path.resolve()` to prevent traversal (e.g. `/config/log/../../etc/shadow` resolves outside and is rejected).
3. Update the error message on the next line (inside `read_fail2ban_log`) to reflect both allowed directories:
```python
"Only paths under /var/log or /config/log are permitted."
```
4. Check the existing test `test_path_outside_safe_dir_raises_operation_error` in `backend/tests/test_services/test_config_service.py` — it patches `_SAFE_LOG_PREFIXES` to `("/var/log",)` itself, so it should still pass unchanged. Verify no other tests break.
### Verification
- `python -m pytest backend/tests/test_services/test_config_service.py -q` passes.
- The log viewer page in the UI successfully loads the fail2ban log file at `/config/log/fail2ban/fail2ban.log` without a 400 error.
---
## Task 5 — Harden jail activation: block on missing logpaths and improve post-activation health checks
**Status:** done
**Summary:**
- **Part A** (`config_file_service.py`): `activate_jail` now refuses to proceed when `_validate_jail_config_sync` returns any issue with `field` in `("filter", "logpath")`. Returns `active=False, fail2ban_running=True` with a descriptive message without writing to disk or reloading fail2ban.
- **Part B** (`ActivateJailDialog.tsx`): All validation issues are now blocking (`blockingIssues = validationIssues`); the advisory-only `logpath` exclusion was removed.
- **Part C** (`config.py` router): After activation, `await _run_probe(request.app)` is called immediately to refresh the cached fail2ban status.
- **Part D** (`health.py`): `/api/health` now returns `{"status": "ok", "fail2ban": "online"|"offline"}` from cached probe state.
- **Tests**: Added `TestActivateJailBlocking` (3 tests) in `test_config_file_service.py`; added `test_200_with_active_false_on_missing_logpath` in `test_config.py`; updated 5 existing `TestActivateJail`/`TestActivateJailReloadArgs` tests to mock `_validate_jail_config_sync`.
### Problem description
Activating a jail whose `logpath` references a non-existent file (e.g. `airsonic-auth.conf` referencing a log file that doesn't exist) causes **fail2ban to crash entirely**. All previously running jails go down (Active Jails drops from 2 → 0). The GUI still shows "Service Health: Running" because:
1. The **backend `/api/health`** endpoint (`backend/app/routers/health.py`) only checks that the FastAPI process itself is alive — it does **not** probe fail2ban. So Docker health checks pass even when fail2ban is dead.
2. The **background health probe** runs every 30 seconds. If the user checks the dashboard during the window between the crash and the next probe, the cached status is stale (still shows "Running").
3. The **pre-activation validation** (`_validate_jail_config_sync` in `backend/app/services/config_file_service.py`) treats missing log paths as **warnings only** (`field="logpath"` issues). The `ActivateJailDialog` frontend component filters these as "advisory" and does not block activation. This means a jail with a non-existent log file is activated, causing fail2ban to crash on reload.
### What to do
#### Part A — Block activation when log files don't exist (backend)
**File:** `backend/app/services/config_file_service.py`
1. In `_validate_jail_config_sync()` (around lines 715–723), verify that the existence check for literal, non-glob log paths appends a `JailValidationIssue` with `field="logpath"`. Because the `valid` field at the bottom of the function (line ~729) is computed as `len(issues) == 0`, appending that issue already yields `valid=False` — the validation side needs no change.
The actual gap is in `activate_jail()` itself: find the post-validation block (around lines 1130–1138) where `warnings` are collected. Currently activation **always proceeds** regardless of the validation result. Change this:
2. In `activate_jail()`, after running `_validate_jail_config_sync`, check `validation_result.valid`. If it is `False` **and** any issue has `field` in `("filter", "logpath")`, **refuse to activate**. Return a `JailActivationResponse` with `active=False`, `fail2ban_running=True`, and a descriptive `message` listing the blocking issues. This prevents writing `enabled = true` and reloading fail2ban with a known-bad config.
```python
# Block activation on critical validation failures (missing filter or logpath).
blocking = [i for i in validation_result.issues if i.field in ("filter", "logpath")]
if blocking:
    log.warning("jail_activation_blocked", jail=name, issues=[str(i) for i in blocking])
    return JailActivationResponse(
        name=name,
        active=False,
        fail2ban_running=True,
        validation_warnings=[f"{i.field}: {i.message}" for i in validation_result.issues],
        message=(
            f"Jail {name!r} cannot be activated: "
            + "; ".join(i.message for i in blocking)
        ),
    )
```
#### Part B — Block activation in the frontend dialog
**File:** `frontend/src/components/config/ActivateJailDialog.tsx`
Currently `blockingIssues` is computed by filtering **out** `logpath` issues (line ~175):
```tsx
const blockingIssues = validationIssues.filter((i) => i.field !== "logpath");
```
Change this so that `logpath` issues **are** blocking too:
```tsx
const blockingIssues = validationIssues.filter(
  (i) => i.field !== "logpath" || i.message.includes("not found"),
);
```
Or simply remove the `logpath` exclusion entirely so all validation issues block:
```tsx
const blockingIssues = validationIssues; // all issues block activation
const advisoryIssues: JailValidationIssue[] = []; // nothing is advisory anymore
```
The "Activate" button should remain disabled when `blockingIssues.length > 0` (this logic already exists).
#### Part C — Run an immediate health probe after activation (backend)
**File:** `backend/app/routers/config.py` — `activate_jail` endpoint (around lines 640–660)
After `config_file_service.activate_jail()` returns, **trigger an immediate health check** so the cached status is updated right away (instead of waiting up to 30 seconds):
```python
# Force an immediate health probe to refresh cached status.
from app.tasks.health_check import _run_probe
await _run_probe(request.app)
```
Add this right after the `last_activation` recording block (around line 653), before the `return result`. This ensures the dashboard immediately reflects the current fail2ban state.
#### Part D — Include fail2ban liveness in `/api/health` endpoint
**File:** `backend/app/routers/health.py`
The current health endpoint always returns `{"status": "ok"}`. Enhance it to also report fail2ban status from the cached probe:
```python
@router.get("/health", summary="Application health check")
async def health_check(request: Request) -> JSONResponse:
    cached: ServerStatus = getattr(
        request.app.state, "server_status", ServerStatus(online=False)
    )
    return JSONResponse(content={
        "status": "ok",
        "fail2ban": "online" if cached.online else "offline",
    })
```
Keep the HTTP status code as 200 so Docker health checks don't restart the backend container when fail2ban is down. But having `"fail2ban": "offline"` in the response allows monitoring and debugging.
### Tests to add or update
**File:** `backend/tests/test_services/test_config_file_service.py`
1. **Add test**: `test_activate_jail_blocked_when_logpath_missing` — mock `_validate_jail_config_sync` to return `valid=False` with a `logpath` issue. Assert `activate_jail()` returns `active=False` and `fail2ban_running=True` without calling `reload_all`.
2. **Add test**: `test_activate_jail_blocked_when_filter_missing` — same pattern but with a `filter` issue.
3. **Add test**: `test_activate_jail_proceeds_when_only_regex_warnings` — mock validation with a non-blocking `failregex` issue and assert activation still proceeds.
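The shape of the first blocking test can be sketched standalone with `unittest.mock.AsyncMock`; here `activate` is a stand-in mirroring the Part A logic, not the real `config_file_service.activate_jail`:

```python
import asyncio
from unittest.mock import AsyncMock

async def activate(validation_valid: bool, reload_all: AsyncMock) -> dict:
    # Stand-in for activate_jail(): refuse before touching fail2ban.
    if not validation_valid:
        return {"active": False, "fail2ban_running": True}
    await reload_all()
    return {"active": True, "fail2ban_running": True}

reload_all = AsyncMock()
result = asyncio.run(activate(validation_valid=False, reload_all=reload_all))
assert result == {"active": False, "fail2ban_running": True}
reload_all.assert_not_awaited()  # fail2ban was never reloaded
```

The key assertion is `assert_not_awaited()` on the reload mock: a blocked activation must return early without ever touching fail2ban.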
**File:** `backend/tests/test_routers/test_config.py`
4. **Add test**: `test_activate_returns_400_style_response_on_missing_logpath` — POST to `/api/config/jails/{name}/activate` with a jail that has a missing logpath. Assert the response body has `active=False` and contains the logpath error message.
**File:** `backend/tests/test_tasks/test_health_check.py`
5. **Existing tests** should still pass — `_run_probe` behavior is unchanged.
### Verification
1. Run backend tests:
```bash
.venv/bin/python -m pytest backend/tests/ -q --tb=short
```
All tests pass with no failures.
2. Run frontend type check and lint:
```bash
cd frontend && npx tsc --noEmit && npx eslint src/components/config/ActivateJailDialog.tsx
```
3. **Manual test with running server:**
- Go to `/config`, find a jail with a non-existent logpath (e.g. `airsonic-auth`).
- Click "Activate" — the dialog should show a **blocking error** about the missing log file and the Activate button should be disabled.
- Verify that fail2ban is still running with the original jails intact (Active Jails count unchanged).
- Go to dashboard — "Service Health" should correctly reflect the live fail2ban state.
---
## Task 6 — Run immediate health probe after jail deactivation
**Status:** done
**Summary:** `deactivate_jail` endpoint in `config.py` now captures the service result, calls `await _run_probe(request.app)`, and then returns the result — matching the behaviour added to `activate_jail` in Task 5. Added `test_deactivate_triggers_health_probe` to `TestDeactivateJail` in `test_config.py` (verifies `_run_probe` is awaited once on success). Also fixed 3 pre-existing ruff UP017 warnings (`datetime.timezone.utc` → `datetime.UTC`) in `test_config.py`.
The `deactivate_jail` endpoint in `backend/app/routers/config.py` is inconsistent with `activate_jail`: after activation the router calls `await _run_probe(request.app)` to immediately refresh the cached fail2ban status (added in Task 5 Part C). Deactivation performs a full `reload_all` which also causes a brief fail2ban restart; without the probe the dashboard can show a stale active-jail count for up to 30 seconds.
### What to do
**File:** `backend/app/routers/config.py` — `deactivate_jail` endpoint (around lines 670–698)
1. The handler currently calls `config_file_service.deactivate_jail(...)` and returns its result directly via `return await ...`. Refactor it to capture the result first, run the probe, then return:
```python
result = await config_file_service.deactivate_jail(config_dir, socket_path, name)
# Force an immediate health probe so the cached status reflects the current
# fail2ban state (reload changes the active-jail count).
await _run_probe(request.app)
return result
```
`_run_probe` is already imported at the top of the file (added in Task 5).
### Tests to add or update
**File:** `backend/tests/test_routers/test_config.py`
2. **Add test**: `test_deactivate_triggers_health_probe` — in the `TestDeactivateJail` class, mock both `config_file_service.deactivate_jail` and `app.routers.config._run_probe`. POST to `/api/config/jails/sshd/deactivate` and assert that `_run_probe` was awaited exactly once.
3. **Update test** `test_200_deactivates_jail` — it currently passes without a probe mock, but add a mock for `_run_probe` there too so the test can never make real socket calls.
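The probe assertion in the new test reduces to this pattern (a sketch: `handler` stands in for the refactored `deactivate_jail` endpoint, and the two mocks for `config_file_service.deactivate_jail` and `_run_probe`):

```python
import asyncio
from unittest.mock import AsyncMock

async def handler(deactivate: AsyncMock, run_probe: AsyncMock) -> dict:
    result = await deactivate()  # capture the service result first...
    await run_probe()            # ...refresh the cached status...
    return result                # ...then return.

deactivate = AsyncMock(return_value={"name": "sshd", "active": False})
run_probe = AsyncMock()
result = asyncio.run(handler(deactivate, run_probe))
run_probe.assert_awaited_once()
print(result)  # {'name': 'sshd', 'active': False}
```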
### Verification
1. Run backend tests:
```bash
.venv/bin/python -m pytest backend/tests/test_routers/test_config.py -q --tb=short
```
All tests pass with no failures.
2. Run the full backend suite to confirm no regressions:
```bash
.venv/bin/python -m pytest backend/tests/ --no-cov --tb=no -q
```
3. **Manual test with running server:**
- Go to `/config`, find an active jail and click "Deactivate".
- Immediately navigate to the Dashboard — "Active Jails" count should already reflect the reduced count without any delay.