- jail_service.restart(): replace invalid ["restart"] socket command with ["stop"], matching fail2ban transmitter protocol. The daemon is now stopped via socket; the caller starts it via subprocess. - config_file_service: expose _start_daemon and _wait_for_fail2ban as public start_daemon / wait_for_fail2ban functions. - restart_fail2ban router: orchestrate stop (socket) → start (subprocess) → probe (socket). Returns 204 on success, 503 when fail2ban does not come back within 10 s. Catches JailOperationError → 409. - reload_fail2ban router: add JailOperationError catch → 409 Conflict, consistent with other jail control endpoints. - Tests: add TestJailControls.test_restart_* (3 cases), TestReloadFail2ban 502/409 cases, TestRestartFail2ban (5 cases), TestRollbackJail (6 integration tests verifying file-write, subprocess invocation, socket- probe truthiness, active_jails count, and offline-at-call-time).
6.4 KiB
BanGUI — Task List
This document breaks the entire BanGUI project into development stages, ordered so that each stage builds on the previous one. Every task is described in prose with enough detail for a developer to begin work. References point to the relevant documentation.
Fix: restart and reload endpoints — correctness and safety ✅ DONE
Background
Two bugs were found by analysing the fail2ban socket protocol
(fail2ban-master/fail2ban/server/transmitter.py) against the current
backend implementation.
Task 1 — Fix jail_service.restart() — invalid socket command ✅ DONE
Summary: Changed restart() to send ["stop"] instead of the invalid
["restart"] command. Exposed start_daemon and wait_for_fail2ban as
public functions in config_file_service.py. The router now orchestrates
the full stop→start→probe sequence.
File: backend/app/services/jail_service.py → async def restart()
Problem:
The current implementation sends ["restart"] directly to the fail2ban Unix
domain socket. The fail2ban server transmitter (transmitter.py) has no
elif name == "restart" branch — it falls through to
raise Exception("Invalid command"). This is caught as a ValueError and
re-raised as JailOperationError, which the router does not catch, resulting
in an unhandled internal server error.
Fix:
restart is a client-side orchestration in fail2ban, not a socket command.
Replace the body of jail_service.restart() with two sequential socket calls:
- Send
["stop"]— this callsserver.quit()on the daemon side (shuts down all jails and terminates the process). - Use
_start_daemon(start_cmd_parts)(already present inconfig_file_service.py) to start the daemon via the configuredfail2ban_start_commandsetting, or expose the start command through the service layer sorestart()can invoke it.
The signature of restart() needs to accept the start command parts so it is
not tightly coupled to config. Alternatively, keep the socket stop call in
jail_service.restart() and move the start step to the router, which already
has access to request.app.state.settings.fail2ban_start_command.
Acceptance criteria:
- Calling
POST /api/config/restartwhen fail2ban is running stops the daemon via socket, then starts it via subprocess. - No
JailOperationErroris raised for normal operation. - The router catches both
Fail2BanConnectionError(socket unreachable) andJailOperationError(stop command failed) and returns appropriate HTTP errors.
Task 2 — Fix restart_fail2ban router — missing JailOperationError catch ✅ DONE
Summary: Added except JailOperationError branch returning HTTP 409 Conflict
to restart_fail2ban. Also imports JailOperationError from jail_service.
File: backend/app/routers/config.py → async def restart_fail2ban()
Problem:
The router only catches Fail2BanConnectionError. If jail_service.restart()
raises JailOperationError (e.g. fail2ban reports the stop failed), it
propagates as an unhandled 500.
Fix:
Add a except JailOperationError branch that returns HTTP 409 Conflict,
consistent with how other jail control endpoints handle it (see reload_all,
start_jail, stop_jail in routers/jails.py).
Task 3 — Fix reload_fail2ban router — missing JailOperationError catch ✅ DONE
Summary: Added except JailOperationError branch returning HTTP 409 Conflict
to reload_fail2ban.
File: backend/app/routers/config.py → async def reload_fail2ban()
Problem:
Same pattern as Task 2. jail_service.reload_all() can raise
JailOperationError (e.g. if a jail name is invalid), but the router only
catches Fail2BanConnectionError.
Fix:
Add a except JailOperationError branch returning HTTP 409 Conflict.
Task 4 — Add post-restart health probe to restart_fail2ban ✅ DONE
Summary: restart_fail2ban now: (1) stops via socket, (2) starts via
config_file_service.start_daemon(), (3) probes with
config_file_service.wait_for_fail2ban(). Returns 204 on success, 503 when
fail2ban does not come back within 10 s.
File: backend/app/routers/config.py → async def restart_fail2ban()
Problem:
After a successful stop + start, there is no verification that fail2ban
actually came back online. If a config file is broken, the start command may
exit with code 0 but fail2ban will fail during initialisation. The router
returns HTTP 204 No Content regardless.
Fix:
After jail_service.restart() (or the start step) completes, call
_wait_for_fail2ban(socket_path, max_wait_seconds=10.0) (already implemented
in config_file_service.py) and change the response:
- If fail2ban is responsive: return 204 No Content (or a 200 with a body
such as
{"fail2ban_running": true, "active_jails": <count>}). - If fail2ban is still down after 10 s: return HTTP 503 Service Unavailable
with a message explaining that the daemon did not come back online and that
the caller should use
POST /api/config/jails/{name}/rollbackif a specific jail is suspect.
Change the response model and HTTP status code accordingly. Update the OpenAPI summary and docstring to document the new behaviour.
Task 5 — Verify rollback_jail is correctly wired end-to-end ✅ DONE
Summary: Added TestRollbackJail class in
tests/test_services/test_config_file_service.py with 6 integration tests
covering: .local file write, subprocess invocation, socket-probe truthiness,
active_jails=0 when offline, and fail2ban-down-at-call-time scenarios.
Files:
backend/app/routers/config.py→async def rollback_jail()backend/app/services/config_file_service.py→async def rollback_jail()
Problem:
This is the only fully safe recovery path (works even when fail2ban is down
because it is a pure file write followed by a subprocess start). Verify with
an integration test that:
- A jail
.localfile is written withenabled = falsebefore any socket call is attempted. - The start command is invoked via subprocess (not via socket).
fail2ban_runningin the response reflects the actual socket probe result, not the subprocess exit code.active_jailsis 0 (not a stale count) whenfail2ban_runningis false.
Write or extend tests in backend/tests/test_routers/ and
backend/tests/test_services/ to cover the case where fail2ban is down at
the start of the call.