# BanGUI — Task List This document breaks the entire BanGUI project into development stages, ordered so that each stage builds on the previous one. Every task is described in prose with enough detail for a developer to begin work. References point to the relevant documentation. --- ## Fix: `restart` and `reload` endpoints — correctness and safety ✅ DONE ### Background Two bugs were found by analysing the fail2ban socket protocol (`fail2ban-master/fail2ban/server/transmitter.py`) against the current backend implementation. --- ### Task 1 — Fix `jail_service.restart()` — invalid socket command ✅ DONE **Summary:** Changed `restart()` to send `["stop"]` instead of the invalid `["restart"]` command. Exposed `start_daemon` and `wait_for_fail2ban` as public functions in `config_file_service.py`. The router now orchestrates the full stop→start→probe sequence. **File:** `backend/app/services/jail_service.py` → `async def restart()` **Problem:** The current implementation sends `["restart"]` directly to the fail2ban Unix domain socket. The fail2ban server transmitter (`transmitter.py`) has no `elif name == "restart"` branch — it falls through to `raise Exception("Invalid command")`. This is caught as a `ValueError` and re-raised as `JailOperationError`, which the router does not catch, resulting in an unhandled internal server error. **Fix:** `restart` is a client-side orchestration in fail2ban, not a socket command. Replace the body of `jail_service.restart()` with two sequential socket calls: 1. Send `["stop"]` — this calls `server.quit()` on the daemon side (shuts down all jails and terminates the process). 2. Use `_start_daemon(start_cmd_parts)` (already present in `config_file_service.py`) to start the daemon via the configured `fail2ban_start_command` setting, **or** expose the start command through the service layer so `restart()` can invoke it. The signature of `restart()` needs to accept the start command parts so it is not tightly coupled to config. Alternatively, keep the socket `stop` call in `jail_service.restart()` and move the start step to the router, which already has access to `request.app.state.settings.fail2ban_start_command`. **Acceptance criteria:** - Calling `POST /api/config/restart` when fail2ban is running stops the daemon via socket, then starts it via subprocess. - No `JailOperationError` is raised for normal operation. - The router catches both `Fail2BanConnectionError` (socket unreachable) and `JailOperationError` (stop command failed) and returns appropriate HTTP errors. --- ### Task 2 — Fix `restart_fail2ban` router — missing `JailOperationError` catch ✅ DONE **Summary:** Added `except JailOperationError` branch returning HTTP 409 Conflict to `restart_fail2ban`. Also imports `JailOperationError` from `jail_service`. **File:** `backend/app/routers/config.py` → `async def restart_fail2ban()` **Problem:** The router only catches `Fail2BanConnectionError`. If `jail_service.restart()` raises `JailOperationError` (e.g. fail2ban reports the stop failed), it propagates as an unhandled 500. **Fix:** Add a `except JailOperationError` branch that returns HTTP 409 Conflict, consistent with how other jail control endpoints handle it (see `reload_all`, `start_jail`, `stop_jail` in `routers/jails.py`). --- ### Task 3 — Fix `reload_fail2ban` router — missing `JailOperationError` catch ✅ DONE **Summary:** Added `except JailOperationError` branch returning HTTP 409 Conflict to `reload_fail2ban`. **File:** `backend/app/routers/config.py` → `async def reload_fail2ban()` **Problem:** Same pattern as Task 2. `jail_service.reload_all()` can raise `JailOperationError` (e.g. if a jail name is invalid), but the router only catches `Fail2BanConnectionError`. **Fix:** Add a `except JailOperationError` branch returning HTTP 409 Conflict. --- ### Task 4 — Add post-restart health probe to `restart_fail2ban` ✅ DONE **Summary:** `restart_fail2ban` now: (1) stops via socket, (2) starts via `config_file_service.start_daemon()`, (3) probes with `config_file_service.wait_for_fail2ban()`. Returns 204 on success, 503 when fail2ban does not come back within 10 s. **File:** `backend/app/routers/config.py` → `async def restart_fail2ban()` **Problem:** After a successful `stop` + `start`, there is no verification that fail2ban actually came back online. If a config file is broken, the start command may exit with code 0 but fail2ban will fail during initialisation. The router returns HTTP 204 No Content regardless. **Fix:** After `jail_service.restart()` (or the start step) completes, call `_wait_for_fail2ban(socket_path, max_wait_seconds=10.0)` (already implemented in `config_file_service.py`) and change the response: - If fail2ban is responsive: return 204 No Content (or a 200 with a body such as `{"fail2ban_running": true, "active_jails": }`). - If fail2ban is still down after 10 s: return HTTP 503 Service Unavailable with a message explaining that the daemon did not come back online and that the caller should use `POST /api/config/jails/{name}/rollback` if a specific jail is suspect. Change the response model and HTTP status code accordingly. Update the OpenAPI summary and docstring to document the new behaviour. --- ### Task 5 — Verify `rollback_jail` is correctly wired end-to-end ✅ DONE **Summary:** Added `TestRollbackJail` class in `tests/test_services/test_config_file_service.py` with 6 integration tests covering: `.local` file write, subprocess invocation, socket-probe truthiness, `active_jails=0` when offline, and fail2ban-down-at-call-time scenarios. **Files:** - `backend/app/routers/config.py` → `async def rollback_jail()` - `backend/app/services/config_file_service.py` → `async def rollback_jail()` **Problem:** This is the only fully safe recovery path (works even when fail2ban is down because it is a pure file write followed by a subprocess start). Verify with an integration test that: 1. A jail `.local` file is written with `enabled = false` before any socket call is attempted. 2. The start command is invoked via subprocess (not via socket). 3. `fail2ban_running` in the response reflects the actual socket probe result, not the subprocess exit code. 4. `active_jails` is 0 (not a stale count) when `fail2ban_running` is false. Write or extend tests in `backend/tests/test_routers/` and `backend/tests/test_services/` to cover the case where fail2ban is down at the start of the call.