- jail_service.restart(): replace invalid ["restart"] socket command with ["stop"], matching fail2ban transmitter protocol. The daemon is now stopped via socket; the caller starts it via subprocess. - config_file_service: expose _start_daemon and _wait_for_fail2ban as public start_daemon / wait_for_fail2ban functions. - restart_fail2ban router: orchestrate stop (socket) → start (subprocess) → probe (socket). Returns 204 on success, 503 when fail2ban does not come back within 10 s. Catches JailOperationError → 409. - reload_fail2ban router: add JailOperationError catch → 409 Conflict, consistent with other jail control endpoints. - Tests: add TestJailControls.test_restart_* (3 cases), TestReloadFail2ban 502/409 cases, TestRestartFail2ban (5 cases), TestRollbackJail (6 integration tests verifying file-write, subprocess invocation, socket- probe truthiness, active_jails count, and offline-at-call-time).
151 lines
6.4 KiB
Markdown
151 lines
6.4 KiB
Markdown
# BanGUI — Task List
|
|
|
|
This document breaks the entire BanGUI project into development stages, ordered so that each stage builds on the previous one. Every task is described in prose with enough detail for a developer to begin work. References point to the relevant documentation.
|
|
|
|
---
|
|
## Fix: `restart` and `reload` endpoints — correctness and safety ✅ DONE
|
|
|
|
### Background
|
|
|
|
Two bugs were found by analysing the fail2ban socket protocol
|
|
(`fail2ban-master/fail2ban/server/transmitter.py`) against the current
|
|
backend implementation.
|
|
|
|
---
|
|
|
|
### Task 1 — Fix `jail_service.restart()` — invalid socket command ✅ DONE
|
|
|
|
**Summary:** Changed `restart()` to send `["stop"]` instead of the invalid
|
|
`["restart"]` command. Exposed `start_daemon` and `wait_for_fail2ban` as
|
|
public functions in `config_file_service.py`. The router now orchestrates
|
|
the full stop→start→probe sequence.
|
|
|
|
**File:** `backend/app/services/jail_service.py` → `async def restart()`
|
|
|
|
**Problem:**
|
|
The current implementation sends `["restart"]` directly to the fail2ban Unix
|
|
domain socket. The fail2ban server transmitter (`transmitter.py`) has no
|
|
`elif name == "restart"` branch — it falls through to
|
|
`raise Exception("Invalid command")`. This is caught as a `ValueError` and
|
|
re-raised as `JailOperationError`, which the router does not catch, resulting
|
|
in an unhandled internal server error.
|
|
|
|
**Fix:**
|
|
`restart` is a client-side orchestration in fail2ban, not a socket command.
|
|
Replace the body of `jail_service.restart()` with two sequential socket calls:
|
|
1. Send `["stop"]` — this calls `server.quit()` on the daemon side (shuts down
|
|
all jails and terminates the process).
|
|
2. Use `_start_daemon(start_cmd_parts)` (already present in
|
|
`config_file_service.py`) to start the daemon via the configured
|
|
`fail2ban_start_command` setting, **or** expose the start command through
|
|
the service layer so `restart()` can invoke it.
|
|
|
|
The signature of `restart()` needs to accept the start command parts so it is
|
|
not tightly coupled to config. Alternatively, keep the socket `stop` call in
|
|
`jail_service.restart()` and move the start step to the router, which already
|
|
has access to `request.app.state.settings.fail2ban_start_command`.
|
|
|
|
**Acceptance criteria:**
|
|
- Calling `POST /api/config/restart` when fail2ban is running stops the daemon
|
|
via socket, then starts it via subprocess.
|
|
- No `JailOperationError` is raised for normal operation.
|
|
- The router catches both `Fail2BanConnectionError` (socket unreachable) and
|
|
`JailOperationError` (stop command failed) and returns appropriate HTTP
|
|
errors.
|
|
|
|
---
|
|
|
|
### Task 2 — Fix `restart_fail2ban` router — missing `JailOperationError` catch ✅ DONE
|
|
|
|
**Summary:** Added `except JailOperationError` branch returning HTTP 409 Conflict
|
|
to `restart_fail2ban`. Also imports `JailOperationError` from `jail_service`.
|
|
|
|
**File:** `backend/app/routers/config.py` → `async def restart_fail2ban()`
|
|
|
|
**Problem:**
|
|
The router only catches `Fail2BanConnectionError`. If `jail_service.restart()`
|
|
raises `JailOperationError` (e.g. fail2ban reports the stop failed), it
|
|
propagates as an unhandled 500.
|
|
|
|
**Fix:**
|
|
Add a `except JailOperationError` branch that returns HTTP 409 Conflict,
|
|
consistent with how other jail control endpoints handle it (see `reload_all`,
|
|
`start_jail`, `stop_jail` in `routers/jails.py`).
|
|
|
|
---
|
|
|
|
### Task 3 — Fix `reload_fail2ban` router — missing `JailOperationError` catch ✅ DONE
|
|
|
|
**Summary:** Added `except JailOperationError` branch returning HTTP 409 Conflict
|
|
to `reload_fail2ban`.
|
|
|
|
**File:** `backend/app/routers/config.py` → `async def reload_fail2ban()`
|
|
|
|
**Problem:**
|
|
Same pattern as Task 2. `jail_service.reload_all()` can raise
|
|
`JailOperationError` (e.g. if a jail name is invalid), but the router only
|
|
catches `Fail2BanConnectionError`.
|
|
|
|
**Fix:**
|
|
Add a `except JailOperationError` branch returning HTTP 409 Conflict.
|
|
|
|
---
|
|
|
|
### Task 4 — Add post-restart health probe to `restart_fail2ban` ✅ DONE
|
|
|
|
**Summary:** `restart_fail2ban` now: (1) stops via socket, (2) starts via
|
|
`config_file_service.start_daemon()`, (3) probes with
|
|
`config_file_service.wait_for_fail2ban()`. Returns 204 on success, 503 when
|
|
fail2ban does not come back within 10 s.
|
|
|
|
**File:** `backend/app/routers/config.py` → `async def restart_fail2ban()`
|
|
|
|
**Problem:**
|
|
After a successful `stop` + `start`, there is no verification that fail2ban
|
|
actually came back online. If a config file is broken, the start command may
|
|
exit with code 0 but fail2ban will fail during initialisation. The router
|
|
returns HTTP 204 No Content regardless.
|
|
|
|
**Fix:**
|
|
After `jail_service.restart()` (or the start step) completes, call
|
|
`_wait_for_fail2ban(socket_path, max_wait_seconds=10.0)` (already implemented
|
|
in `config_file_service.py`) and change the response:
|
|
|
|
- If fail2ban is responsive: return 204 No Content (or a 200 with a body
|
|
such as `{"fail2ban_running": true, "active_jails": <count>}`).
|
|
- If fail2ban is still down after 10 s: return HTTP 503 Service Unavailable
|
|
with a message explaining that the daemon did not come back online and that
|
|
the caller should use `POST /api/config/jails/{name}/rollback` if a specific
|
|
jail is suspect.
|
|
|
|
Change the response model and HTTP status code accordingly. Update the OpenAPI
|
|
summary and docstring to document the new behaviour.
|
|
|
|
---
|
|
|
|
### Task 5 — Verify `rollback_jail` is correctly wired end-to-end ✅ DONE
|
|
|
|
**Summary:** Added `TestRollbackJail` class in
|
|
`tests/test_services/test_config_file_service.py` with 6 integration tests
|
|
covering: `.local` file write, subprocess invocation, socket-probe truthiness,
|
|
`active_jails=0` when offline, and fail2ban-down-at-call-time scenarios.
|
|
|
|
**Files:**
|
|
- `backend/app/routers/config.py` → `async def rollback_jail()`
|
|
- `backend/app/services/config_file_service.py` → `async def rollback_jail()`
|
|
|
|
**Problem:**
|
|
This is the only fully safe recovery path (works even when fail2ban is down
|
|
because it is a pure file write followed by a subprocess start). Verify with
|
|
an integration test that:
|
|
|
|
1. A jail `.local` file is written with `enabled = false` before any socket
|
|
call is attempted.
|
|
2. The start command is invoked via subprocess (not via socket).
|
|
3. `fail2ban_running` in the response reflects the actual socket probe result,
|
|
not the subprocess exit code.
|
|
4. `active_jails` is 0 (not a stale count) when `fail2ban_running` is false.
|
|
|
|
Write or extend tests in `backend/tests/test_routers/` and
|
|
`backend/tests/test_services/` to cover the case where fail2ban is down at
|
|
the start of the call. |