Remove inactive jails section from Jail management page
The Jail page is now a pure operational view showing only jails that fail2ban reports as active. The backend GET /api/jails already queried only the fail2ban socket status command, so no backend changes were needed.

Frontend changes:

- Remove Inactive Jails table, Show-inactive toggle, and all related state (showInactive, inactiveJails, activateTarget)
- Remove fetchInactiveJails() call and loadInactive/handleActivated callbacks
- Remove ActivateJailDialog import and usage
- Remove unused imports: useCallback, useEffect, Switch, InactiveJail

Inactive-jail discovery and activation remain fully functional via the Configuration page Jails tab (JailsTab.tsx) — unchanged.
483 Docs/Tasks.md
@@ -4,273 +4,254 @@ This document breaks the entire BanGUI project into development stages, ordered
---
## Stage 1 — Bug Fix: Jail Activation / Deactivation Reload Stream

## Task 1 — Jail Page: Show Only Active Jails (No Inactive Configs)
### 1.1 Fix `reload_all` to include newly activated jails in the start stream ✅ DONE

**Status:** done

**Problem:**

When a user activates an inactive jail (e.g. `apache-auth`), the backend writes `enabled = true` to `jail.d/apache-auth.local` and calls `jail_service.reload_all()`. However, `reload_all` queries the *currently running* jails via `["status"]` to build the start stream. Since the new jail is not yet running, it is excluded from the stream. After `reload --all`, fail2ban's end-of-reload phase deletes every jail not in the stream — so the newly activated jail never starts.
**Summary:** Backend `GET /api/jails` already returned only active jails (it queries the fail2ban socket `status` command). Frontend `JailsPage.tsx` updated: removed the "Inactive Jails" section, the "Show inactive" toggle, the `fetchInactiveJails()` call, the `ActivateJailDialog` import/usage, and the `InactiveJail` type import. The Config page (`JailsTab.tsx`) retains full inactive-jail management. All backend tests pass (96/96); TypeScript and ESLint report zero errors.

### Problem

The Jail management page (`JailsPage.tsx`) currently displays inactive jail configurations alongside active jails. Inactive jails — those defined in config files but not running — belong on the **Configuration** page (`ConfigPage.tsx`, Jails tab), not on the operational Jail management page. The Jail page should be a pure operational view: only jails that fail2ban reports as active/running appear here.
The inverse bug exists for deactivation: the jail is still running when `["status"]` is queried, so it remains in the stream and may be restarted despite `enabled = false` being written to the config.
### Goal

**Fix:**

Add keyword-only `include_jails` and `exclude_jails` parameters to `jail_service.reload_all()`. Callers merge these into the stream derived from the current status. `activate_jail` passes `include_jails=[name]`; `deactivate_jail` passes `exclude_jails=[name]`. All existing callers are unaffected (both params default to `None`).
Remove all inactive-jail display and activation UI from the Jail management page. The Jail page shows only jails that are currently loaded in the running fail2ban instance. Users who want to discover and activate inactive jails do so exclusively through the Configuration page's Jails tab.

**Files:**

- `backend/app/services/jail_service.py` — `reload_all()`
- `backend/app/services/config_file_service.py` — `activate_jail()`, `deactivate_jail()`
### Backend Changes

**Acceptance criteria:**

- Activating an inactive jail via the API actually starts it in fail2ban.
- Deactivating a running jail via the API actually stops it after reload.
- All other callers of `reload_all()` (config save, filter/action updates) continue to work without changes.

1. **Review `GET /api/jails`** in `backend/app/routers/jails.py` and `jail_service.py`. Confirm this endpoint only returns jails that are reported as active by fail2ban via the socket (`status` command). If it already does, no change needed. If it includes inactive/config-only jails in its response, strip them out.
2. **No new endpoints needed.** The inactive-jail listing and activation endpoints already live under `/api/config/jails` and `/api/config/jails/{name}/activate` in `config.py` / `config_file_service.py` — those stay as-is for the Config page.
### Frontend Changes

3. **`JailsPage.tsx`** — Remove the "Inactive Jails" section, the toggle that reveals inactive jails, and the `fetchInactiveJails()` call. The page should only call `fetchJails()` (which queries `/api/jails`) and render that list. Remove the `ActivateJailDialog` import and usage from this page if present.
4. **`JailsPage.tsx`** — Remove any "Activate" buttons or affordances that reference inactive jails. The jail overview table should show: jail name, status (running / stopped / idle), backend type, currently banned count, total bans, currently failed, total failed, find time, ban time, max retries. No "Inactive" badge or "Activate" button.
5. **Verify the Config page** (`ConfigPage.tsx` → Jails tab / `JailsTab.tsx`) still shows the full list including inactive jails with Active/Inactive badges and the Activate button. This is the only place where inactive jails are managed. No changes expected here — just verify nothing broke.
### Tests

6. **Backend:** If there are existing tests for `GET /api/jails` that assert inactive jails are included, update them so they assert inactive jails are excluded.
7. **Frontend:** Update or remove any component tests for the inactive-jail section on `JailsPage`. Ensure Config-page tests for inactive jail activation still pass.
### Acceptance Criteria

- The Jail page shows zero inactive jails under any circumstance.
- All Jail page data comes only from the fail2ban socket's active jail list.
- Inactive-jail discovery and activation remain fully functional on the Configuration page, Jails tab.
- No regressions in existing jail control actions (start, stop, reload, idle, ignore-list) on the Jail page.

---
### 1.2 Add unit tests for `reload_all` with `include_jails` / `exclude_jails` ✅ DONE

## Task 2 — Configuration Subpage: fail2ban Log Viewer & Service Health

Write tests that verify the new parameters produce the correct fail2ban command stream.

**Status:** not started

**References:** [Features.md § 6 — Configuration View](Features.md), [Architekture.md § 2](Architekture.md)
**Test cases:**

1. `reload_all(sock, include_jails=["apache-auth"])` when currently running jails are `["sshd", "nginx"]` → the stream sent to fail2ban must contain `["start", "apache-auth"]`, `["start", "nginx"]`, and `["start", "sshd"]`.
2. `reload_all(sock, exclude_jails=["sshd"])` when currently running jails are `["sshd", "nginx"]` → the stream must contain only `["start", "nginx"]`, **not** `["start", "sshd"]`.
3. `reload_all(sock, include_jails=["new"], exclude_jails=["old"])` when running jails are `["old", "nginx"]` → stream must contain `["start", "new"]` and `["start", "nginx"]`, **not** `["start", "old"]`.
4. `reload_all(sock)` without extra args continues to work exactly as before (backwards compatibility).
### Problem

**Files:**

- `backend/tests/test_services/test_jail_service.py`

There is currently no way to view the fail2ban daemon log (`/var/log/fail2ban.log` or wherever the log target is configured) through the web interface. There is also no dedicated place in the Configuration section that shows at a glance whether fail2ban is running correctly. The existing health probe (`health_service.py`) and dashboard status bar give connectivity info, but the Configuration page should have its own panel showing service health alongside the raw log output.
### Goal

Add a new **Log** tab to the Configuration page. This tab shows two things:

1. A **Service Health panel** — a compact summary showing whether fail2ban is running, its version, active jail count, total bans, total failures, and the current log level/target. This reuses data from the existing health probe.
2. A **Log viewer** — displays the tail of the fail2ban daemon log file with newest entries at the bottom. Supports manual refresh and optional auto-refresh on an interval.
### Backend Changes

#### New Endpoint: Read fail2ban Log

1. **Create `GET /api/config/fail2ban-log`** in `backend/app/routers/config.py` (or a new router file `backend/app/routers/log.py` if `config.py` is getting large).
   - **Query parameters:**
     - `lines` (int, default 200, max 2000) — number of lines to return from the tail of the log file.
     - `filter` (optional string) — a plain-text substring filter; only return lines containing this string (for searching).
   - **Response model:** `Fail2BanLogResponse` with fields:
     - `log_path: str` — the resolved path of the log file being read.
     - `lines: list[str]` — the log lines.
     - `total_lines: int` — total number of lines in the file (so the UI can indicate if it's truncated).
     - `log_level: str` — the current fail2ban log level.
     - `log_target: str` — the current fail2ban log target.
   - **Behaviour:** Query the fail2ban socket for `get logtarget` to find the current log file path. Read the last N lines from that file using an efficient tail implementation (read from end of file, do not load the entire file into memory). If the log target is not a file (stdout, syslog, systemd-journal), return an informative error explaining that log viewing is only available when fail2ban logs to a file.
   - **Security:** Validate that the resolved log path is under an expected directory (e.g. `/var/log/`). Do not allow path traversal. Never expose arbitrary file contents.
2. **Create the service method** `read_fail2ban_log()` in `backend/app/services/config_service.py` (or a new `log_service.py`).
   - Use `fail2ban_client.py` to query `get logtarget` and `get loglevel`.
   - Implement an async file tail: open the file, seek to end, read backwards until N newlines are found OR the beginning of the file is reached.
   - Apply the optional substring filter on the server side before returning.
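A minimal synchronous sketch of such a backwards tail read; `tail_lines` is an illustrative name, and the real service method would run this in a thread executor to stay async:

```python
import os

def tail_lines(path: str, n: int) -> list[str]:
    """Return the last n lines of a file without loading it fully into memory."""
    chunk = 4096
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Read backwards in chunks until enough newlines are collected
        # (n+1 newlines guarantee n complete lines) or the file start is hit.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(chunk, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    return [line.decode("utf-8", "replace") for line in data.splitlines()[-n:]]
```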
3. **Create Pydantic models** in `backend/app/models/config.py`:
   - `Fail2BanLogResponse(log_path: str, lines: list[str], total_lines: int, log_level: str, log_target: str)`

#### Extend Health Data for Config Page

4. **Create `GET /api/config/service-status`** (or reuse/extend `GET /api/dashboard/status` if appropriate).
   - Returns: `online` (bool), `version` (str), `jail_count` (int), `total_bans` (int), `total_failures` (int), `log_level` (str), `log_target` (str), `db_path` (str), `uptime` or `start_time` if available.
   - This can delegate to the existing `health_service.probe()` and augment with the log-level/target info from the socket.
### Frontend Changes

#### New Tab: Log

5. **Create `frontend/src/components/config/LogTab.tsx`.**
   - **Service Health panel** at the top:
     - A status badge: green "Running" or red "Offline".
     - Version, active jails count, total bans, total failures displayed in a compact row of stat cards.
     - Current log level and log target shown as labels.
     - If fail2ban is offline, show a prominent warning banner with the text: "fail2ban is not running or unreachable. Check the server and socket configuration."
   - **Log viewer** below:
     - A monospace-font scrollable container showing the log lines.
     - A toolbar above the log area with:
       - A **Refresh** button to re-fetch the log.
       - An **Auto-refresh** toggle (off by default) with a selectable interval (5s, 10s, 30s).
       - A **Lines** dropdown to choose how many lines to load (100, 200, 500, 1000).
       - A **Filter** text input to search within the log (sends the filter param to the backend).
     - Log lines should be syntax-highlighted or at minimum color-coded by log level (ERROR = red, WARNING = yellow, INFO = default, DEBUG = muted).
     - The container auto-scrolls to the bottom on load and on refresh (since newest entries are at the end).
     - If the log target is not a file, show an info banner: "fail2ban is logging to [target]. File-based log viewing is not available."
6. **Register the tab** in `ConfigPage.tsx`. Add a "Log" tab after the existing tabs (Jails, Filters, Actions, Global, Server, Map, Regex Tester). Use a log-file icon.

7. **Create API functions** in `frontend/src/api/config.ts`:
   - `fetchFail2BanLog(lines?: number, filter?: string): Promise<Fail2BanLogResponse>`
   - `fetchServiceStatus(): Promise<ServiceStatusResponse>`

8. **Create TypeScript types** in `frontend/src/types/config.ts` (or wherever config types live):
   - `Fail2BanLogResponse { log_path: string; lines: string[]; total_lines: number; log_level: string; log_target: string; }`
   - `ServiceStatusResponse { online: boolean; version: string; jail_count: number; total_bans: number; total_failures: number; log_level: string; log_target: string; }`
### Tests

9. **Backend:** Write tests for the new log endpoint — mock the file read, test line-count limiting, test the substring filter, test the error case when log target is not a file, test path-traversal prevention.
10. **Backend:** Write tests for the service-status endpoint.
11. **Frontend:** Write component tests for `LogTab.tsx` — renders health panel, renders log lines, filter input works, handles offline state.
### Acceptance Criteria

- The Configuration page has a new "Log" tab.
- The Log tab shows a clear health summary with running/offline state and key metrics.
- The Log tab displays the tail of the fail2ban daemon log file.
- Users can choose how many lines to display, can refresh manually, and can optionally enable auto-refresh.
- Users can filter log lines by substring.
- Log lines are visually differentiated by severity level.
- If fail2ban logs to a non-file target, a clear message is shown instead of the log viewer.
- The log endpoint does not allow reading arbitrary files — only the actual fail2ban log target.
---

### 1.3 Add integration-level tests for activate / deactivate endpoints ✅ DONE

## Task 3 — Invalid Jail Config Recovery: Detect Broken fail2ban & Auto-Disable Bad Jails
Verify that the `POST /api/config/jails/{name}/activate` and `POST /api/config/jails/{name}/deactivate` endpoints pass the correct `include_jails` / `exclude_jails` arguments through to `reload_all`. These tests mock `jail_service.reload_all` and assert on the keyword arguments it receives.

**Status:** not started

**References:** [Features.md § 5 — Jail Management](Features.md), [Features.md § 6 — Configuration View](Features.md), [Architekture.md § 2](Architekture.md)

**Files:**

- `backend/tests/test_routers/test_config.py` (or a new `test_config_activate.py`)
### Problem

When a user activates a jail from the Configuration page, the system writes `enabled = true` to a `.local` override file and triggers a fail2ban reload. If the jail's configuration is invalid (bad regex, missing log file, broken filter reference, syntax error in an action), fail2ban may **refuse to start entirely** — not just skip the one bad jail but stop the whole daemon. At that point every jail is down, all monitoring stops, and the user is locked out of all fail2ban operations in BanGUI.

The current `activate_jail()` flow in `config_file_service.py` does a post-reload check (queries fail2ban for the jail's status and returns `active=false` if it didn't start), but this only works when fail2ban is still running. If the entire daemon crashes after the reload, the socket is gone and BanGUI cannot query anything. The user sees generic "offline" errors but has no clear path to fix the problem.
### Goal

Build a multi-layered safety net that:

1. **Pre-validates** the jail config before activating it (catch obvious errors before the reload).
2. **Detects** when fail2ban goes down after a jail activation (detect the crash quickly).
3. **Alerts** the user with a clear, actionable message explaining which jail was just activated and that it likely caused the failure.
4. **Offers a one-click rollback** that disables the bad jail config and restarts fail2ban.
### Plan

#### Layer 1: Pre-Activation Validation

1. **Extend `activate_jail()` in `config_file_service.py`** (or add a new `validate_jail_config()` method) to perform dry-run checks before writing the `.local` file and reloading:
   - **Filter existence:** Verify the jail's `filter` setting references a filter file that actually exists in `filter.d/`.
   - **Action existence:** Verify every action referenced by the jail exists in `action.d/`.
   - **Regex compilation:** Attempt to compile all `failregex` and `ignoreregex` patterns with Python's `re` module. Report which pattern is broken.
   - **Log path check:** Verify that the log file paths declared in the jail config actually exist on disk and are readable.
   - **Syntax check:** Parse the full merged config (base + overrides) and check for obvious syntax issues (malformed interpolation, missing required keys).
2. **Return validation errors as a structured response** before proceeding with activation. The response should list every issue found so the user can fix them before trying again.
3. **Create a new endpoint `POST /api/config/jails/{name}/validate`** that runs only the validation step without actually activating. The frontend can call this for a "Check Config" button.
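A sketch of what the pre-activation checks could look like. The function name, parameters, and the `<HOST>` substitution are assumptions for illustration, not the real `config_file_service` API:

```python
import re
from pathlib import Path

def validate_jail_config(filter_name: str, failregexes: list[str],
                         log_paths: list[str],
                         filter_dir: str = "/etc/fail2ban/filter.d") -> list[str]:
    """Collect every problem found so the user can fix all of them at once."""
    issues: list[str] = []
    # Filter existence: the jail's filter must resolve to a file in filter.d/
    if not (Path(filter_dir) / f"{filter_name}.conf").exists():
        issues.append(f"filter '{filter_name}' not found in {filter_dir}")
    # Regex compilation: fail2ban's <HOST> tag is not valid Python re syntax,
    # so substitute a named group before compiling (assumption for this sketch)
    for pattern in failregexes:
        try:
            re.compile(pattern.replace("<HOST>", r"(?P<host>\S+)"))
        except re.error as exc:
            issues.append(f"failregex does not compile: {pattern!r} ({exc})")
    # Log path check: declared log files must exist on disk
    for path in log_paths:
        if not Path(path).exists():
            issues.append(f"log file missing: {path}")
    return issues
```

An empty list means the jail passed all checks; a non-empty list maps directly onto the structured error response of step 2.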
#### Layer 2: Post-Activation Health Check

4. **After each `activate_jail()` reload**, perform a health-check sequence with retries:
   - Wait 2 seconds after sending the reload command.
   - Probe the fail2ban socket with `ping`.
   - If the probe succeeds, check if the specific jail is active.
   - If the probe fails (socket gone / connection refused), retry up to 3 times with 2-second intervals.
   - Return the probe result as part of the activation response.
5. **Extend the `JailActivationResponse` model** to include:
   - `fail2ban_running: bool` — whether the fail2ban daemon is still running after reload.
   - `validation_warnings: list[str]` — any non-fatal warnings from the pre-validation step.
   - `error: str | None` — a human-readable error message if something went wrong.
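The retry sequence above can be sketched as follows, with `ping` and `jail_is_active` standing in for the real socket calls (injected here so the logic is testable in isolation):

```python
import asyncio

async def verify_after_reload(ping, jail_is_active, jail: str,
                              retries: int = 3, delay: float = 2.0) -> dict:
    """Probe the socket after a reload, retrying if the daemon is briefly unreachable."""
    await asyncio.sleep(delay)  # give fail2ban time to finish the reload
    for attempt in range(retries):
        try:
            await ping()
            # daemon answered: now check whether the specific jail came up
            return {"fail2ban_running": True, "active": await jail_is_active(jail)}
        except ConnectionError:
            if attempt < retries - 1:
                await asyncio.sleep(delay)
    # socket never answered: probable daemon crash after activation
    return {"fail2ban_running": False, "active": False}
```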
#### Layer 3: Automatic Crash Detection via Background Task

6. **Extend `tasks/health_check.py`** (the periodic health probe that runs every 30 seconds):
   - Track the **last known activation event**: when a jail was activated, store its name and timestamp in an in-memory variable (or a lightweight DB record).
   - If the health check detects that fail2ban transitioned from `online` to `offline`, and a jail was activated within the last 60 seconds, flag this as a **probable activation failure**.
   - Store a `PendingRecovery` record: `{ jail_name: str, activated_at: datetime, detected_at: datetime, recovered: bool }`.
7. **Create a new endpoint `GET /api/config/pending-recovery`** that returns the current `PendingRecovery` record (or `null` if none).
   - The frontend polls this endpoint (or it is included in the dashboard status response) to detect when a recovery state is active.
#### Layer 4: User Alert & One-Click Rollback

8. **Frontend — Global alert banner.** When the health status transitions to offline and a `PendingRecovery` record exists:
   - Show a **full-width warning banner** at the top of every page (not just the Config page). The banner is dismissible only after the issue is resolved.
   - Banner text: "fail2ban stopped after activating jail **{name}**. The jail's configuration may be invalid. Disable this jail and restart fail2ban?"
   - Two buttons:
     - **"Disable & Restart"** — calls the rollback endpoint (see below).
     - **"View Details"** — navigates to the Config page Log tab so the user can inspect the fail2ban log for the exact error message.
9. **Create a rollback endpoint `POST /api/config/jails/{name}/rollback`** in the backend:
   - Writes `enabled = false` to the jail's `.local` override (same as `deactivate_jail()` but works even when fail2ban is down since it only writes a file).
   - Attempts to start (not reload) the fail2ban daemon via the configured start command (e.g. `systemctl start fail2ban` or `fail2ban-client start`). Make the start command configurable in the app settings.
   - Waits up to 10 seconds for the socket to come back, probing every 2 seconds.
   - Returns a response indicating whether fail2ban is back online and how many jails are now active.
   - Clears the `PendingRecovery` record on success.
10. **Frontend — Rollback result.** After the rollback call returns:
    - If successful: show a success toast "fail2ban restarted with {n} active jails. The jail **{name}** has been disabled." and dismiss the banner.
    - If fail2ban still doesn't start: show an error dialog explaining that the problem may not be limited to the last activated jail. Suggest the user check the fail2ban log (link to the Log tab) or SSH into the server. Keep the banner visible.
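The rollback flow in step 9 can be sketched like this; `write_disabled`, `start_daemon`, and `probe` are injected stand-ins for the real file write, configured start command, and socket ping:

```python
import asyncio

async def rollback_jail(write_disabled, start_daemon, probe,
                        timeout: float = 10.0, interval: float = 2.0) -> bool:
    """Disable the jail on disk, start the daemon, then wait for the socket."""
    write_disabled()      # works even when fail2ban is down: it only writes a file
    await start_daemon()  # e.g. the configured `systemctl start fail2ban` command
    waited = 0.0
    while waited <= timeout:
        try:
            await probe()  # socket ping
            return True    # fail2ban is back online
        except ConnectionError:
            await asyncio.sleep(interval)
            waited += interval
    return False  # daemon did not come back within the timeout
```

On `True`, the caller would clear the `PendingRecovery` record and report the active jail count; on `False`, the UI keeps the banner visible.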
#### Layer 5: Config Page Enhancements

11. **On the Config page Jails tab**, when activating a jail:
    - Before activation, show a confirmation dialog that includes any validation warnings from the pre-check.
    - During activation, show a spinner with the text "Activating jail and verifying fail2ban…" (acknowledge the post-activation health check takes a few seconds).
    - After activation, if `fail2ban_running` is false in the response, immediately show the recovery banner and rollback option without waiting for the background health check.
12. **Add a "Validate" button** next to the "Activate" button on inactive jails. Clicking it calls `POST /api/config/jails/{name}/validate` and shows the validation results in a panel (green for pass, red for each issue found).
### Backend File Map

| File | Changes |
|---|---|
| `services/config_file_service.py` | Add `validate_jail_config()`, extend `activate_jail()` with pre-validation and post-reload health check. |
| `routers/config.py` | Add `POST /api/config/jails/{name}/validate`, `GET /api/config/pending-recovery`, `POST /api/config/jails/{name}/rollback`. |
| `models/config.py` | Add `JailValidationResult`, `PendingRecovery`, extend `JailActivationResponse`. |
| `tasks/health_check.py` | Track last activation event, detect crash-after-activation, write `PendingRecovery` record. |
| `services/health_service.py` | Add helper to attempt daemon start (not just probe). |
### Frontend File Map

| File | Changes |
|---|---|
| `components/config/ActivateJailDialog.tsx` | Add pre-validation call, show warnings, show extended activation feedback. |
| `components/config/JailsTab.tsx` | Add "Validate" button next to "Activate" for inactive jails. |
| `components/common/RecoveryBanner.tsx` (new) | Global warning banner for activation failures with rollback button. |
| `pages/AppLayout.tsx` (or root layout) | Mount the `RecoveryBanner` component so it appears on all pages. |
| `api/config.ts` | Add `validateJailConfig()`, `fetchPendingRecovery()`, `rollbackJail()`. |
| `types/config.ts` | Add `JailValidationResult`, `PendingRecovery`, extend `JailActivationResponse`. |
### Tests

13. **Backend:** Test `validate_jail_config()` — valid config passes, missing filter fails, bad regex fails, missing log path fails.
14. **Backend:** Test the rollback endpoint — mock file write, mock daemon start, verify response for success and failure cases.
15. **Backend:** Test the health-check crash detection — simulate online→offline transition with a recent activation, verify `PendingRecovery` is set.
16. **Frontend:** Test `RecoveryBanner` — renders when `PendingRecovery` is present, disappears after successful rollback, shows error on failed rollback.
17. **Frontend:** Test the "Validate" button on the Jails tab — shows green on valid, shows errors on invalid.
### Acceptance Criteria

- Obvious config errors (missing filter, bad regex, missing log file) are caught **before** the jail is activated.
- If fail2ban crashes after a jail activation, BanGUI detects it within 30 seconds and shows a prominent alert.
- The user can disable the problematic jail and restart fail2ban with a single click from the alert banner.
- If the automatic rollback succeeds, BanGUI confirms fail2ban is back and shows the number of recovered jails.
- If the automatic rollback fails, the user is guided to check the log or intervene manually.
- A standalone "Validate" button lets users check a jail's config without activating it.
- All new endpoints have tests covering success, failure, and edge cases.

---
## Stage 2 — Socket Connection Resilience

### 2.1 Add retry logic to `Fail2BanClient.send` for transient connection errors ✅ DONE

**Problem:**

The logs show intermittent `fail2ban_connection_error` events during parallel command bursts (e.g. when fetching jail details after a reload). The fail2ban Unix socket can momentarily refuse connections while processing a reload.

**Task:**

Add a configurable retry mechanism (default 2 retries, 100 ms backoff) to `Fail2BanClient.send()` that catches `ConnectionRefusedError` / `FileNotFoundError` and retries before raising `Fail2BanConnectionError`. This must not retry on protocol-level errors (e.g. unknown jail) — only on connection failures.
**Files:**

- `backend/app/utils/fail2ban_client.py`

**Acceptance criteria:**

- Transient socket errors during reload bursts are retried transparently.
- Non-connection errors (e.g. unknown jail) are raised immediately without retry.
- A structured log message is emitted for each retry attempt.
- Unit tests cover retry success, retry exhaustion, and non-retryable errors.

---
### 2.2 Serialize concurrent `reload_all` calls ✅ DONE

**Problem:**

Multiple browser tabs or fast UI clicks could trigger concurrent `reload_all` calls. Sending overlapping `reload --all` commands to the fail2ban socket is undefined behavior and may cause jail loss.

**Task:**

Add an asyncio lock inside `reload_all` (module-level `asyncio.Lock`) so that concurrent calls are serialized. If a reload is already in progress, subsequent calls wait rather than firing in parallel.
**Files:**

- `backend/app/services/jail_service.py`

**Acceptance criteria:**

- Two concurrent `reload_all` calls are serialized; the second waits for the first to finish.
- Unit test demonstrates that the lock prevents overlapping socket commands.

---
## Stage 3 — Activate / Deactivate UX Improvements

### 3.1 Return the jail's runtime status after activation ✅ DONE

**Problem:**

After activating a jail, the API returns `active: True` optimistically before verifying that fail2ban actually started the jail. If the reload silently fails (e.g. bad regex in the jail config), the frontend shows the jail as active but it is not.

**Task:**

After calling `reload_all`, query `["status"]` and verify the activated jail appears in the running jail list. If it does not, return `active: False` with a warning message explaining the jail config may be invalid. Log a warning event.

**Files:**

- `backend/app/services/config_file_service.py` — `activate_jail()`
**Acceptance criteria:**

- Successful activation returns `active: True` only after verification.
- If the jail doesn't start (e.g. bad config), the response has `active: False` and a descriptive message.
- A structured log event is emitted on verification failure.

---
### 3.2 Frontend feedback for activation failure

**Task:**

If the activation endpoint returns `active: False`, the ConfigPage jail detail pane should show a warning toast/banner explaining that the jail could not be started and the user should check the jail configuration (filters, log paths, regex etc.).

**Files:**

- `frontend/src/hooks/useConfigActiveStatus.ts` (or relevant hook)
- `frontend/src/components/config/` (jail detail component)

---
## Stage 4 — Parallel Command Throttling

### 4.1 Limit concurrent fail2ban socket commands ✅ DONE

**Problem:**

When loading jail details for multiple active jails, the backend fires dozens of `get` commands in parallel (bantime, findtime, maxretry, failregex, etc. × N jails). The fail2ban socket is single-threaded and some commands time out or fail with connection errors under this load.

**Task:**

Introduce an asyncio `Semaphore` (configurable, default 10) that limits the number of in-flight fail2ban commands. All code paths that use `Fail2BanClient.send()` should acquire the semaphore first. This can be implemented as a connection-pool wrapper or a middleware in the client.
**Files:**

- `backend/app/utils/fail2ban_client.py`

**Acceptance criteria:**

- No more than N commands are sent to the socket concurrently.
- Connection errors during jail detail fetches are eliminated under normal load.
- A structured log event is emitted when a command waits for the semaphore.

---
---
|
||||
|
||||
## Stage 5 — Test Coverage Hardening
|
||||
|
||||
### 5.1 Add tests for `activate_jail` and `deactivate_jail` service functions ✅ DONE
|
||||
|
||||
**Task:**
|
||||
Write comprehensive unit tests for `config_file_service.activate_jail` and `config_file_service.deactivate_jail`, covering:
|
||||
- Happy path: jail exists, is inactive, local file is written, reload includes it, response is correct.
|
||||
- Jail not found in config → `JailNotFoundInConfigError`.
|
||||
- Jail already active → `JailAlreadyActiveError`.
|
||||
- Jail already inactive → `JailAlreadyInactiveError`.
|
||||
- Reload fails → activation still returns but with logged warning.
|
||||
- Override parameters (bantime, findtime, etc.) are written to the `.local` file correctly.
|
||||
|
||||
**Files:**
|
||||
- `backend/tests/test_services/test_config_file_service.py`
|
||||
|
||||
---
|
||||
|
||||
### 5.2 Add tests for deactivate path with `exclude_jails` ✅ DONE
|
||||
|
||||
**Task:**
|
||||
Verify that `deactivate_jail` passes `exclude_jails=[name]` to `reload_all`, ensuring the jail is removed from the start stream. Mock `jail_service.reload_all` and assert the keyword arguments.

**Files:**

- `backend/tests/test_services/test_config_file_service.py`

---

## Stage 6 — Bug Fix: 502 "Resource temporarily unavailable" on fail2ban Socket

### 6.1 Add retry with back-off to `_send_command_sync` for transient `OSError` ✅ DONE

**Problem:**

Under concurrent load the fail2ban Unix socket returns `[Errno 11] Resource temporarily unavailable` (EAGAIN). The `_send_command_sync` function in `fail2ban_client.py` catches this as a generic `OSError` and immediately raises `Fail2BanConnectionError`, which the routers translate into a 502 response. There is no retry.

**Task:**

Wrap the `sock.connect()` / `sock.sendall()` / `sock.recv()` block inside a retry loop (max 3 attempts, exponential back-off starting at 150 ms). Only retry on `OSError` with `errno` in `{errno.EAGAIN, errno.ECONNREFUSED, errno.ENOBUFS}` — all other `OSError` variants and all `Fail2BanProtocolError` cases must be raised immediately.

Emit a structured log event (`fail2ban_socket_retry`) on each retry attempt containing the attempt number, the errno, and the socket path. After the final retry is exhausted, raise `Fail2BanConnectionError` as today.
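A minimal sketch of the retry loop with the socket I/O abstracted behind a callable; the exception class and helper names are assumptions, and the real code wraps the `connect()`/`sendall()`/`recv()` block directly:

```python
import errno
import time

RETRYABLE = {errno.EAGAIN, errno.ECONNREFUSED, errno.ENOBUFS}

class Fail2BanConnectionError(Exception):
    """Stand-in for the client's real error class."""

def send_with_retry(attempt_send, max_attempts=3, base_delay=0.15):
    """Retry transient socket errors with exponential back-off."""
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt_send()
        except OSError as exc:
            if exc.errno not in RETRYABLE:
                raise  # e.g. ENOENT (socket file missing): fail fast
            if attempt == max_attempts:
                raise Fail2BanConnectionError(str(exc)) from exc
            # the real code emits the fail2ban_socket_retry log event here
            time.sleep(base_delay * 2 ** (attempt - 1))  # 0.15 s, 0.3 s, ...
```

Keeping the retryable errno set as a module constant makes the policy easy to test and adjust.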

**Files:**

- `backend/app/utils/fail2ban_client.py` — `_send_command_sync()`

**Acceptance criteria:**

- A transient EAGAIN on the first attempt is silently retried and succeeds on the second attempt without surfacing a 502.
- Non-retryable socket errors (e.g. `ENOENT` — socket file missing) are raised immediately on the first attempt.
- A `Fail2BanProtocolError` (unpickle failure) is never retried.
- After 3 consecutive EAGAIN failures, `Fail2BanConnectionError` is raised as before.
- Each retry is logged with `structlog`.

---

### 6.2 Add a concurrency semaphore to `Fail2BanClient.send` ✅ DONE

**Problem:**

Dashboard page load fires many parallel `get` commands (jail details, ban stats, trend data). The fail2ban socket is single-threaded; flooding it causes the EAGAIN errors from 6.1.

**Task:**

Introduce an `asyncio.Semaphore` (configurable, default 10) at the module level in `fail2ban_client.py`. Acquire the semaphore in `Fail2BanClient.send()` before dispatching `_send_command_sync` to the thread-pool executor. This caps the number of in-flight socket commands and prevents the socket backlog from overflowing.
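A sketch of the capped `send()`: an `asyncio.sleep` stands in for the `run_in_executor(_send_command_sync, ...)` dispatch, and the in-flight/peak counters exist only to make the cap observable in this example. In the real client the semaphore is created once at module import; here it lives inside `flood()` so the sketch runs under `asyncio.run` on any Python version:

```python
import asyncio

_MAX_INFLIGHT = 10  # would come from settings in the real client
_inflight = 0
_peak = 0

async def send(sem: asyncio.Semaphore, command: str) -> list:
    global _inflight, _peak
    async with sem:
        _inflight += 1
        _peak = max(_peak, _inflight)
        await asyncio.sleep(0.01)  # the blocking socket call goes here
        _inflight -= 1
        return ["ok", command]

async def flood() -> int:
    # module-level in the real client; local here for portability
    sem = asyncio.Semaphore(_MAX_INFLIGHT)
    await asyncio.gather(*(send(sem, f"status jail{i}") for i in range(25)))
    return _peak
```

With 25 concurrent callers the observed peak never exceeds the semaphore's limit of 10, which is exactly the acceptance criterion below.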

**Files:**

- `backend/app/utils/fail2ban_client.py`

**Acceptance criteria:**

- No more than 10 commands are sent to the socket concurrently.
- Under normal load, the 502 errors are eliminated.
- A structured log event is emitted when a command has to wait for the semaphore (debug level).

---

### 6.3 Unit tests for socket retry and semaphore ✅ DONE

**Task:**

Write tests that verify:

1. A single transient `OSError(errno.EAGAIN)` is retried and the command succeeds.
2. Three consecutive EAGAIN failures raise `Fail2BanConnectionError`.
3. An `OSError(errno.ENOENT)` (socket missing) is raised immediately without retry.
4. The semaphore limits concurrency — launch 20 parallel `send()` calls against a mock that records timestamps and assert no more than 10 overlap.
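Item 4 can be verified without touching the real socket: record `(start, end)` intervals inside the semaphore, then sweep the endpoints to count the maximum overlap. Function names here are illustrative, not the real test suite:

```python
import asyncio
import time

async def run_with_sem(sem: asyncio.Semaphore, log: list) -> None:
    async with sem:
        start = time.monotonic()
        await asyncio.sleep(0.01)  # stand-in for the mocked send()
        log.append((start, time.monotonic()))

def max_overlap(intervals: list) -> int:
    # Sweep-line: +1 at each start, -1 at each end; at equal timestamps
    # the -1 sorts first, so touching intervals do not count as overlap.
    events = sorted(
        [(s, 1) for s, _ in intervals] + [(e, -1) for _, e in intervals]
    )
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak

async def run_20_parallel() -> int:
    sem = asyncio.Semaphore(10)
    log: list = []
    await asyncio.gather(*(run_with_sem(sem, log) for _ in range(20)))
    return max_overlap(log)
```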

**Files:**

- `backend/tests/test_utils/test_fail2ban_client.py`

---

## Stage 7 — Bug Fix: Empty Bans-by-Jail Response

### 7.1 Investigate and fix the empty `bans_by_jail` query ✅ DONE

**Problem:**

`GET /api/dashboard/bans/by-jail?range=30d` returns `{"jails":[],"total":0}` even though ban data exists in the fail2ban database. The query in `ban_service.bans_by_jail()` filters on `WHERE timeofban >= ?` using a Unix timestamp computed from `datetime.now(tz=UTC)`. If the fail2ban database stores `timeofban` in local time rather than UTC (which is the default for fail2ban ≤ 1.0), the comparison silently excludes all rows because the UTC timestamp is hours ahead of the local-time values.

**Task:**

1. Query the fail2ban database for a few sample `timeofban` values and compare them to `datetime.now(tz=UTC).timestamp()` and `time.time()`. Determine whether fail2ban stores bans in UTC or local time.
2. If fail2ban uses `time.time()` (which returns the UTC epoch on all platforms), then the bug is elsewhere — add debug logging to `bans_by_jail` that logs `since`, the actual `SELECT COUNT(*)` result, and `db_path` so the root cause can be traced from production logs.
3. If the timestamps are local time, change `_since_unix()` to use `time.time()` (always the UTC epoch) instead of `datetime.now(tz=UTC).timestamp()` to stay consistent. Both should be equivalent on correctly configured systems, but `time.time()` avoids any timezone-aware datetime pitfalls.
4. Add a guard: if `total == 0` and the range is `30d` or `365d`, run a `SELECT COUNT(*) FROM bans` (no WHERE) and log the result. If there are rows in the table but zero match the filter, log a warning with the `since` timestamp and the min/max `timeofban` values from the table. This makes future debugging trivial.
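For step 3, the two expressions are interchangeable on a correctly configured host, which a quick equivalence check confirms; the `_since_unix` signature below is an assumption about the service's internal helper:

```python
import time
from datetime import datetime, timezone

def _since_unix(range_seconds: int) -> int:
    """Proposed fix (signature assumed): derive 'since' from time.time()."""
    return int(time.time()) - range_seconds

# Sanity check for step 1: both expressions yield the UTC epoch on a
# correctly configured host, so any large gap points at the system clock.
gap = abs(datetime.now(tz=timezone.utc).timestamp() - time.time())
since_30d = _since_unix(30 * 86400)
```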

**Files:**

- `backend/app/services/ban_service.py` — `_since_unix()`, `bans_by_jail()`

**Acceptance criteria:**

- `bans_by_jail` returns the correct jail counts for the requested time range.
- When zero results are returned despite data existing, a warning log is emitted with diagnostic information (since timestamp, db row count, min/max timeofban).
- `_since_unix()` uses a method consistent with how fail2ban stores timestamps.

---

### 7.2 Add a `/api/dashboard/bans/by-jail` diagnostic endpoint or debug logging ✅ DONE

**Task:**

Add debug-level structured log output to `bans_by_jail` that includes:

- The resolved `db_path`.
- The computed `since` Unix timestamp and its ISO representation.
- The raw `total` count from the first query.
- The number of jail groups returned.

This allows operators to diagnose empty-result issues from the container logs without code changes.

**Files:**

- `backend/app/services/ban_service.py` — `bans_by_jail()`

---

### 7.3 Unit tests for `bans_by_jail` with a seeded in-memory database ✅ DONE

**Task:**

Write tests that create a temporary SQLite database matching the fail2ban `bans` table schema, seed it with rows at known timestamps, and call `bans_by_jail` (mocking `_get_fail2ban_db_path` to point at the temp database). Verify:

1. Rows within the time range are counted and grouped by jail correctly.
2. Rows outside the range are excluded.
3. The `origin` filter (`"blocklist"` / `"selfblock"`) partitions results as expected.
4. An empty database returns `{"jails": [], "total": 0}` without error.
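The seeding pattern can be shown end-to-end with an in-memory database; the schema is trimmed to the columns the grouped query touches, and `bans_by_jail` is inlined as a stub rather than imported from the real service:

```python
import sqlite3
import time

# Inlined stub of the grouped query; the real bans_by_jail does more
# (origin filter, db path resolution, logging).
def bans_by_jail(conn: sqlite3.Connection, since: int) -> dict:
    rows = conn.execute(
        "SELECT jail, COUNT(*) FROM bans WHERE timeofban >= ? GROUP BY jail",
        (since,),
    ).fetchall()
    return {
        "jails": [{"jail": jail, "count": count} for jail, count in rows],
        "total": sum(count for _, count in rows),
    }

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bans (jail TEXT, ip TEXT, timeofban INTEGER)")
now = int(time.time())
conn.executemany(
    "INSERT INTO bans VALUES (?, ?, ?)",
    [
        ("sshd", "1.2.3.4", now - 60),
        ("sshd", "5.6.7.8", now - 7200),
        ("apache-auth", "9.9.9.9", now - 90),
        ("sshd", "1.1.1.1", now - 40 * 86400),  # outside the 30d window
    ],
)
result = bans_by_jail(conn, now - 30 * 86400)
```

Seeding timestamps relative to `now` keeps the test deterministic regardless of when it runs.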

**Files:**

- `backend/tests/test_services/test_ban_service.py`

---
```diff
@@ -10,7 +10,7 @@
  * geo-location details.
  */

-import { useCallback, useEffect, useState } from "react";
+import { useState } from "react";
 import {
   Badge,
   Button,
@@ -32,7 +32,6 @@ import {
   MessageBarBody,
   Select,
   Spinner,
-  Switch,
   Text,
   Tooltip,
   makeStyles,
@@ -53,11 +52,8 @@ import {
   StopRegular,
 } from "@fluentui/react-icons";
 import { Link } from "react-router-dom";
-import { fetchInactiveJails } from "../api/config";
-import { ActivateJailDialog } from "../components/config";
 import { useActiveBans, useIpLookup, useJails } from "../hooks/useJails";
 import type { ActiveBan, JailSummary } from "../types/jail";
-import type { InactiveJail } from "../types/config";
 import { ApiError } from "../api/client";

 // ---------------------------------------------------------------------------
@@ -323,25 +319,6 @@ function JailOverviewSection(): React.JSX.Element {
   const { jails, total, loading, error, refresh, startJail, stopJail, setIdle, reloadJail, reloadAll } =
     useJails();
   const [opError, setOpError] = useState<string | null>(null);
-  const [showInactive, setShowInactive] = useState(true);
-  const [inactiveJails, setInactiveJails] = useState<InactiveJail[]>([]);
-  const [activateTarget, setActivateTarget] = useState<InactiveJail | null>(null);
-
-  const loadInactive = useCallback((): void => {
-    fetchInactiveJails()
-      .then((res) => { setInactiveJails(res.jails); })
-      .catch(() => { /* non-critical */ });
-  }, []);
-
-  useEffect(() => {
-    loadInactive();
-  }, [loadInactive]);
-
-  const handleActivated = useCallback((): void => {
-    setActivateTarget(null);
-    refresh();
-    loadInactive();
-  }, [refresh, loadInactive]);

   const handle = (fn: () => Promise<void>): void => {
     setOpError(null);
@@ -350,9 +327,6 @@ function JailOverviewSection(): React.JSX.Element {
     });
   };

-  const activeNameSet = new Set(jails.map((j) => j.name));
-  const inactiveToShow = inactiveJails.filter((j) => !activeNameSet.has(j.name));
-
   return (
     <div className={styles.section}>
       <div className={styles.sectionHeader}>
@@ -365,11 +339,6 @@ function JailOverviewSection(): React.JSX.Element {
         )}
       </Text>
       <div className={styles.actionRow}>
-        <Switch
-          label="Show inactive"
-          checked={showInactive}
-          onChange={(_e, d) => { setShowInactive(d.checked); }}
-        />
         <Button
           size="small"
           appearance="subtle"
@@ -481,85 +450,6 @@ function JailOverviewSection(): React.JSX.Element {
         </div>
       )}

-      {/* Inactive jails table */}
-      {showInactive && inactiveToShow.length > 0 && (
-        <div style={{ marginTop: tokens.spacingVerticalM }}>
-          <Text
-            size={300}
-            weight="semibold"
-            style={{ color: tokens.colorNeutralForeground3, marginBottom: tokens.spacingVerticalXS }}
-            block
-          >
-            Inactive jails ({String(inactiveToShow.length)})
-          </Text>
-          <div className={styles.tableWrapper}>
-            <table style={{ width: "100%", borderCollapse: "collapse", fontSize: tokens.fontSizeBase200 }}>
-              <thead>
-                <tr style={{ borderBottom: `1px solid ${tokens.colorNeutralStroke2}` }}>
-                  <th style={{ textAlign: "left", padding: "6px 8px", fontWeight: tokens.fontWeightSemibold }}>Jail</th>
-                  <th style={{ textAlign: "left", padding: "6px 8px", fontWeight: tokens.fontWeightSemibold }}>Status</th>
-                  <th style={{ textAlign: "left", padding: "6px 8px", fontWeight: tokens.fontWeightSemibold }}>Filter</th>
-                  <th style={{ textAlign: "left", padding: "6px 8px", fontWeight: tokens.fontWeightSemibold }}>Port</th>
-                  <th style={{ textAlign: "left", padding: "6px 8px" }} />
-                </tr>
-              </thead>
-              <tbody>
-                {inactiveToShow.map((j) => (
-                  <tr
-                    key={j.name}
-                    style={{
-                      borderBottom: `1px solid ${tokens.colorNeutralStroke2}`,
-                      opacity: 0.7,
-                    }}
-                  >
-                    <td style={{ padding: "6px 8px" }}>
-                      <Link
-                        to="/config"
-                        style={{
-                          fontFamily: "Consolas, 'Courier New', monospace",
-                          fontSize: "0.85rem",
-                          textDecoration: "none",
-                          color: tokens.colorBrandForeground1,
-                        }}
-                      >
-                        {j.name}
-                      </Link>
-                    </td>
-                    <td style={{ padding: "6px 8px" }}>
-                      <Badge appearance="filled" color="subtle">inactive</Badge>
-                    </td>
-                    <td style={{ padding: "6px 8px" }}>
-                      <Text size={200} style={{ fontFamily: "Consolas, 'Courier New', monospace" }}>
-                        {j.filter || "—"}
-                      </Text>
-                    </td>
-                    <td style={{ padding: "6px 8px" }}>
-                      <Text size={200}>{j.port ?? "—"}</Text>
-                    </td>
-                    <td style={{ padding: "6px 8px" }}>
-                      <Button
-                        size="small"
-                        appearance="primary"
-                        icon={<PlayRegular />}
-                        onClick={() => { setActivateTarget(j); }}
-                      >
-                        Activate
-                      </Button>
-                    </td>
-                  </tr>
-                ))}
-              </tbody>
-            </table>
-          </div>
-        </div>
-      )}
-
-      <ActivateJailDialog
-        jail={activateTarget}
-        open={activateTarget !== null}
-        onClose={() => { setActivateTarget(null); }}
-        onActivated={handleActivated}
-      />
     </div>
   );
 }
```