Improve activation rollback messages in ActivateJailDialog

- Replace vague 'System Recovered' message with 'Configuration Rolled Back'
  and actionable text describing the rollback outcome
- Replace 'Manual Intervention Required' with 'Rollback Unsuccessful' and
  specific instructions: check jail.d/{name}.local, fix manually, restart
- Add test_activate_jail_rollback_deletes_file_when_no_prior_local to cover
  rollback path when no .local file existed before activation
- Mark all three tasks complete in Tasks.md
2026-03-15 13:41:14 +01:00
parent 12f04bd8d6
commit 41dcd60225


@@ -3,149 +3,61 @@
This document breaks the entire BanGUI project into development stages, ordered so that each stage builds on the previous one. Every task is described in prose with enough detail for a developer to begin work. References point to the relevant documentation.
---
## Fix: `restart` and `reload` endpoints — correctness and safety ✅ DONE

### Background

Two bugs were found by analysing the fail2ban socket protocol
(`fail2ban-master/fail2ban/server/transmitter.py`) against the current
backend implementation.

### Task 1 — Fix `jail_service.restart()` — invalid socket command ✅ DONE

**Summary:** Changed `restart()` to send `["stop"]` instead of the invalid
`["restart"]` command. Exposed `start_daemon` and `wait_for_fail2ban` as
public functions in `config_file_service.py`. The router now orchestrates
the full stop→start→probe sequence.

**File:** `backend/app/services/jail_service.py` — `async def restart()`

**Problem:**
The current implementation sends `["restart"]` directly to the fail2ban Unix
domain socket. The fail2ban server transmitter (`transmitter.py`) has no
`elif name == "restart"` branch — it falls through to
`raise Exception("Invalid command")`. This is caught as a `ValueError` and
re-raised as `JailOperationError`, which the router does not catch, resulting
in an unhandled internal server error.

**Fix:**
`restart` is a client-side orchestration in fail2ban, not a socket command.
Replace the body of `jail_service.restart()` with two sequential steps:

1. Send `["stop"]` — this calls `server.quit()` on the daemon side (shuts down
   all jails and terminates the process).
2. Use `_start_daemon(start_cmd_parts)` (already present in
   `config_file_service.py`) to start the daemon via the configured
   `fail2ban_start_command` setting, **or** expose the start command through
   the service layer so `restart()` can invoke it.

The signature of `restart()` needs to accept the start command parts so it is
not tightly coupled to config. Alternatively, keep the socket `stop` call in
`jail_service.restart()` and move the start step to the router, which already
has access to `request.app.state.settings.fail2ban_start_command`.

**Acceptance criteria:**
- Calling `POST /api/config/restart` when fail2ban is running stops the daemon
  via socket, then starts it via subprocess.
- No `JailOperationError` is raised during normal operation.
- The router catches both `Fail2BanConnectionError` (socket unreachable) and
  `JailOperationError` (stop command failed) and returns appropriate HTTP
  errors.

---

## ✅ Task: Add "Deactivate Jail" Button for Inactive Jails in the Config View

**Context:**
In `frontend/src/components/config/JailsTab.tsx`, the "Deactivate Jail" button is currently only rendered when a jail is active. When a jail is inactive but has an existing `jail.d/{name}.local` file (i.e. it was previously configured and has `enabled = false`), there is no action button to clean up that override file.

**Goal:**
Add a "Deactivate Jail" (or "Remove Config") button to the jail config view for inactive jails that have a `.local` file. Clicking it should delete the `jail.d/{name}.local` file via the existing deactivate endpoint (`POST /api/config/jails/{name}/deactivate`) or a dedicated delete-override endpoint, making the UI consistent with the active-jail view. If no `.local` file exists for the inactive jail, the button should not be shown (there is nothing to clean up).

**Acceptance criteria:**
- Inactive jails that own a `.local` file show a "Deactivate Jail" button in their config panel.
- Clicking the button removes or neutralises the `.local` file and refreshes the jail list.
- Inactive jails without a `.local` file are unaffected and show no extra button.
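On the backend side, the cleanup this task needs is small. A minimal sketch of a delete-override helper, assuming `pathlib` and a hypothetical `deactivate_jail` name (the real route may simply reuse the existing deactivate handler):

```python
from pathlib import Path

def deactivate_jail(jail_d: Path, name: str) -> bool:
    """Delete jail.d/{name}.local; return True if a file was removed.

    Hypothetical sketch: returning False mirrors the UI rule that the
    button is hidden when there is no .local file to clean up.
    """
    override = jail_d / f"{name}.local"
    if not override.is_file():
        # Nothing to clean up.
        return False
    override.unlink()
    return True
```

The boolean return lets the router distinguish "cleaned up" from "nothing to do" without a second filesystem check.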
---
### Task 2 — Fix `restart_fail2ban` router — missing `JailOperationError` catch ✅ DONE

**Summary:** Added an `except JailOperationError` branch returning HTTP 409 Conflict
to `restart_fail2ban`. Also imports `JailOperationError` from `jail_service`.

**File:** `backend/app/routers/config.py` — `async def restart_fail2ban()`

**Problem:**
The router only catches `Fail2BanConnectionError`. If `jail_service.restart()`
raises `JailOperationError` (e.g. fail2ban reports that the stop failed), it
propagates as an unhandled 500.

**Fix:**
Add an `except JailOperationError` branch that returns HTTP 409 Conflict,
consistent with how other jail control endpoints handle it (see `reload_all`,
`start_jail`, `stop_jail` in `routers/jails.py`).

---

## Task: Remove the "fail2ban Stopped After Jail Activation" Recovery Banner

**Context:**
`frontend/src/components/common/RecoveryBanner.tsx` renders a full-page banner with the heading *"fail2ban Stopped After Jail Activation"* and the body *"fail2ban stopped responding after activating jail `{name}`. The jail's configuration may be invalid."* together with "Disable & Restart" and "View Logs" action buttons. This banner interrupts the UI even when the backend has already handled the rollback automatically.

**Goal:**
Remove the `RecoveryBanner` component and all its mount points from the application. Any state that was used exclusively to drive this banner (e.g. a `fail2banStopped` flag or related context) should also be removed. If the underlying crash-detection logic is still needed for other features, keep that logic but detach it from the banner render path.

**Acceptance criteria:**
- The full-page banner no longer appears under any circumstances.
- No dead code or orphaned state references remain after the removal.
- All existing tests that reference `RecoveryBanner` are updated or removed accordingly.
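The `except JailOperationError` pattern that Tasks 2 and 3 standardise on can be sketched as a plain status-mapping function. The exception classes below are stand-ins for the real ones in `jail_service`, and the 503 for connection errors is an assumption (the routers are only described as returning "appropriate HTTP errors"):

```python
# Stand-ins for the real exceptions defined in jail_service.
class Fail2BanConnectionError(Exception): ...
class JailOperationError(Exception): ...

def status_for_restart_failure(exc: Exception) -> int:
    """Map service-layer failures to the HTTP status a router should return.

    Mirrors the pattern described for routers/jails.py: a socket that is
    unreachable maps to 503 (assumed here), a command the daemon rejected
    maps to 409 Conflict.
    """
    if isinstance(exc, Fail2BanConnectionError):
        return 503  # daemon unreachable over the Unix socket
    if isinstance(exc, JailOperationError):
        return 409  # daemon reachable but the command failed
    raise exc  # anything else stays an unhandled 500
```

In the actual routers this would live in `except` branches rather than an `isinstance` chain, but the status mapping is the same.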
---
### Task 3 — Fix `reload_fail2ban` router — missing `JailOperationError` catch ✅ DONE

**Summary:** Added an `except JailOperationError` branch returning HTTP 409 Conflict
to `reload_fail2ban`.

**File:** `backend/app/routers/config.py` — `async def reload_fail2ban()`

**Problem:**
Same pattern as Task 2. `jail_service.reload_all()` can raise
`JailOperationError` (e.g. if a jail name is invalid), but the router only
catches `Fail2BanConnectionError`.

**Fix:**
Add an `except JailOperationError` branch returning HTTP 409 Conflict.

---

## Task: Fix Activation Failure Rollback — Actually Delete the `.local` File

**Context:**
When jail activation fails after the `jail.d/{name}.local` file has already been written (i.e. fail2ban reloaded but the jail never came up, or fail2ban became unresponsive), `_rollback_activation_async()` in `backend/app/services/config_file_service.py` is supposed to restore the pre-activation state. The frontend then displays *"Activation Failed — System Recovered"* with the message *"Activation of jail `{name}` failed. The server has been automatically recovered."*

In practice, recovery does not happen: the `.local` file remains on disk with `enabled = true`, leaving fail2ban in a broken state on next restart. The frontend misleadingly reports success.

**Goal:**
Fix `_rollback_activation_async()` so that it reliably removes (or reverts) the `.local` file whenever activation fails:

1. If the `.local` file did not exist before activation, **delete** `jail.d/{name}.local` outright.
2. If it existed before activation (e.g. previously had `enabled = false`), **restore** its original content atomically (temp-file rename pattern already used elsewhere in the service).
3. After deleting/restoring the file, attempt a `reload_all` socket command so fail2ban picks up the reverted state.
4. Only set `recovered = true` in the `JailActivationResponse` once all three steps above have actually succeeded. If any step fails, set `recovered = false` and log the error.
5. On the frontend side, the *"Activation Failed — System Recovered"* `MessageBar` in `ActivateJailDialog.tsx` should only be shown when the backend actually returns `recovered = true`. The current misleading message should be replaced with a more actionable one when `recovered = false`.

**Acceptance criteria:**
- After a failed activation, `jail.d/{name}.local` is either absent or contains its pre-activation content.
- `recovered: true` is only returned when the rollback fully succeeded.
- The UI message accurately reflects the actual recovery state.
- A test in `backend/tests/test_services/` covers the rollback path, asserting the file is absent/reverted and the response flag is correct.
### Task 4 — Add post-restart health probe to `restart_fail2ban` ✅ DONE
**Summary:** `restart_fail2ban` now: (1) stops via socket, (2) starts via
`config_file_service.start_daemon()`, (3) probes with
`config_file_service.wait_for_fail2ban()`. Returns 204 on success, 503 when
fail2ban does not come back within 10 s.
**File:** `backend/app/routers/config.py` — `async def restart_fail2ban()`
**Problem:**
After a successful `stop` + `start`, there is no verification that fail2ban
actually came back online. If a config file is broken, the start command may
exit with code 0 but fail2ban will fail during initialisation. The router
returns HTTP 204 No Content regardless.
**Fix:**
After `jail_service.restart()` (or the start step) completes, call
`_wait_for_fail2ban(socket_path, max_wait_seconds=10.0)` (already implemented
in `config_file_service.py`) and change the response:
- If fail2ban is responsive: return 204 No Content (or a 200 with a body
such as `{"fail2ban_running": true, "active_jails": <count>}`).
- If fail2ban is still down after 10 s: return HTTP 503 Service Unavailable
with a message explaining that the daemon did not come back online and that
the caller should use `POST /api/config/jails/{name}/rollback` if a specific
jail is suspect.
Change the response model and HTTP status code accordingly. Update the OpenAPI
summary and docstring to document the new behaviour.
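The post-restart health probe can be sketched as a simple polling loop. Here `probe` is a hypothetical callable standing in for the socket ping that `config_file_service.wait_for_fail2ban()` performs:

```python
import time

def wait_for_daemon(probe, max_wait_seconds: float = 10.0,
                    interval: float = 0.5) -> bool:
    """Poll `probe` until it returns True or the deadline passes (sketch).

    Returns True if the daemon answered within the window, False otherwise --
    the router maps False to HTTP 503 Service Unavailable.
    """
    deadline = time.monotonic() + max_wait_seconds
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False
```

Using `time.monotonic()` rather than `time.time()` keeps the deadline immune to wall-clock adjustments during the wait.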
---
### Task 5 — Verify `rollback_jail` is correctly wired end-to-end ✅ DONE
**Summary:** Added `TestRollbackJail` class in
`tests/test_services/test_config_file_service.py` with 6 integration tests
covering: `.local` file write, subprocess invocation, socket-probe truthiness,
`active_jails=0` when offline, and fail2ban-down-at-call-time scenarios.
**Files:**
- `backend/app/routers/config.py` — `async def rollback_jail()`
- `backend/app/services/config_file_service.py` — `async def rollback_jail()`
**Task:**
This is the only fully safe recovery path (it works even when fail2ban is down,
because it is a pure file write followed by a subprocess start). Verify with
an integration test that:
1. A jail `.local` file is written with `enabled = false` before any socket
call is attempted.
2. The start command is invoked via subprocess (not via socket).
3. `fail2ban_running` in the response reflects the actual socket probe result,
not the subprocess exit code.
4. `active_jails` is 0 (not a stale count) when `fail2ban_running` is false.
Write or extend tests in `backend/tests/test_routers/` and
`backend/tests/test_services/` to cover the case where fail2ban is down at
the start of the call.
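One way to assert point 1 (the `.local` write happens before any socket call) is to record side effects in order. Everything below is a stand-in test double, not the real service:

```python
# Record side effects as they happen, then assert their order.
events: list[str] = []

def fake_write_local(name: str) -> None:
    events.append(f"write:{name}.local")

def fake_socket_probe() -> bool:
    events.append("socket:probe")
    return False  # scenario: fail2ban is down when rollback is called

def rollback_jail_under_test(name: str) -> dict:
    """Stand-in for the real service call: file write first, probe second."""
    fake_write_local(name)
    running = fake_socket_probe()
    # active_jails must be 0, not a stale count, when the daemon is down.
    return {"fail2ban_running": running, "active_jails": 0}

result = rollback_jail_under_test("sshd")
assert events.index("write:sshd.local") < events.index("socket:probe")
assert result == {"fail2ban_running": False, "active_jails": 0}
```

In the real tests the doubles would be `unittest.mock` patches over the service's file and socket helpers, but the ordering assertion is the same.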