docs: update documentation and e2e tests

- Add configuration docs for database and rate limiting
- Remove completed tasks from tracking list
- Update testing requirements with new test patterns
- Enhance web development docs with frontend guidelines
- Expand page loading and ban records e2e test coverage
2026-05-04 08:34:18 +02:00
parent 23c3a0d9e6
commit e41831447f
6 changed files with 199 additions and 148 deletions
--- a/Docs/CONFIGURATION.md
+++ b/Docs/CONFIGURATION.md
@@ -163,6 +163,27 @@ BANGUI_TIMEZONE=America/New_York

 ---

+## `manual-Jail` Fail2ban Jail (E2E Test Dependency)
+
+The E2E test **E2E-3** (`e2e/tests/02_ban_records.robot`) writes authentication-failure lines via `Docker/simulate_failed_logins.sh` and asserts that the resulting ban appears in the BanGUI UI. The test depends on the following `manual-Jail` configuration in `Docker/fail2ban-dev-config/fail2ban/jail.d/manual-Jail.conf`:
+
+| Parameter | Value | Relevance to E2E-3 |
+|-----------|-------|---------------------|
+| `maxretry` | `3` | Ban triggers after 3 matching lines. `simulate_failed_logins.sh` writes 5 lines by default — enough to trigger the ban reliably. |
+| `findtime` | `120` | Time window in seconds during which `maxretry` failures accumulate. |
+| `bantime` | `60` | Ban duration in seconds. Teardown unbans via `check_ban_status.sh --unban` regardless of bantime. |
+| `logpath` | `/remotelogs/bangui/auth.log` | fail2ban reads this path inside the container. `simulate_failed_logins.sh` writes to `Docker/logs/auth.log`, which must be volume-mapped to `/remotelogs/bangui/auth.log`. |
+| `backend` | `polling` | fail2ban re-reads the log file on its own interval (not event-driven). A 15 s sleep in the E2E test gives fail2ban time to detect the ban. |
+| `ignoreip` | `127.0.0.0/8 ::1 172.16.0.0/12` | Test IP `192.168.100.99` is not ignored. Ensure local overrides do not add this IP to `ignoreip`. |
+
+**Log path mapping (Docker/Podman compose):** The host file `Docker/logs/auth.log` must be mounted to `/remotelogs/bangui/auth.log` inside the `bangui-fail2ban-dev` container. If the volume mapping is changed, `simulate_failed_logins.sh` will write to a path fail2ban does not watch, and the test will fail at Step 3 (the API assertion) with no ban recorded.
+
+**Test IP:** `192.168.100.99` (private address space, RFC 1918; not a link-local address). Safe to use because it is outside all `ignoreip` ranges and unlikely to appear in real traffic.
+
+**Scheduling note:** The backend does not receive push notifications from fail2ban. `GET /api/bans/active` queries the fail2ban Unix socket directly (on-demand). The history archive is populated by `history_sync`, a periodic job running every 300 s (`HISTORY_SYNC_INTERVAL` in `backend/app/tasks/history_sync.py`). The E2E test uses `GET /api/bans/active` for the API assertion (avoids the archive lag) and the History page with `?page_size=500` for the UI assertion.
+
+---
+
 ## Cross-References

 - [Deployment.md](./Deployment.md) — Docker configuration, health checks, graceful shutdown
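Note on the `ignoreip` row in the `manual-Jail` table above: the claim that the test IP is not ignored can be checked mechanically. The following is a minimal sketch using Python's standard `ipaddress` module. The helper name `is_ignored` is hypothetical, and the sketch assumes every `ignoreip` entry is an IP address or CIDR range (fail2ban also accepts DNS host names, which this does not handle).

```python
import ipaddress

# Values copied from the manual-Jail table; the helper below is illustrative only.
IGNOREIP = "127.0.0.0/8 ::1 172.16.0.0/12"
TEST_IP = "192.168.100.99"


def is_ignored(ip: str, ignoreip: str) -> bool:
    """Return True if ip falls inside any whitespace-separated ignoreip entry."""
    addr = ipaddress.ip_address(ip)
    for entry in ignoreip.split():
        # strict=False lets entries with host bits set parse as networks.
        net = ipaddress.ip_network(entry, strict=False)
        # Only compare addresses and networks of the same IP version.
        if addr.version == net.version and addr in net:
            return True
    return False


if __name__ == "__main__":
    print(f"{TEST_IP} ignored: {is_ignored(TEST_IP, IGNOREIP)}")
```

Running the same check against a local override of `ignoreip` is one way to confirm the test IP can still be banned before starting the suite.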
--- a/Docs/Tasks.md
+++ b/Docs/Tasks.md
@@ -1,136 +1,3 @@
-
-## [E2E-1] Set up Robot Framework E2E test infrastructure
-
-**Where found:**
-No E2E test suite exists in the repository. There are 87 backend pytest files and 78 frontend vitest files but zero integration/E2E tests that exercise the full running stack.
-
-**Why this is needed:**
-Unit and component tests cannot catch regressions that span the full system: frontend → backend → fail2ban → database. A running-stack test suite is the only safety net for deployment-breaking changes.
-
-**Goal:**
-Create the `e2e/` directory layout, install Robot Framework with the Browser library (Playwright-backed), configure shared keywords for startup/teardown and authentication, and add a `make e2e` target so the suite can be run with a single command.
-
-**What to do:**
-1. Create `e2e/requirements.txt`:
-   ```
-   robotframework>=7
-   robotframework-browser>=18
-   ```
-2. Run `rfbrowser init` after install to download Playwright browsers.
-3. Create the directory layout:
-   ```
-   e2e/
-   ├── requirements.txt
-   ├── resources/
-   │   ├── common.resource      # variables, shared setup/teardown
-   │   └── auth.resource        # Login As Admin keyword
-   └── tests/
-       ├── 01_page_loading.robot
-       ├── 02_ban_records.robot
-       ├── 03_blocklist_import.robot
-       └── 04_config_edit.robot
-   ```
-4. `common.resource` must define:
-   - `${FRONTEND_URL}` = `http://localhost:5173`
-   - `${BACKEND_URL}` = `http://localhost:8000`
-   - Suite Setup: wait for `GET ${BACKEND_URL}/api/health` to return 200 before any test runs (poll with timeout 120 s).
-5. `auth.resource` must implement `Login As Admin`:
-   - Check `GET /api/setup/status`; if setup is not done, complete the setup wizard first.
-   - POST credentials to `/api/auth/login` or drive the login form at `/login`.
-   - Store the resulting session for subsequent page navigations.
-6. Add to `Makefile`:
-   ```makefile
-   e2e: up
-       @echo "Waiting for stack to be healthy…"
-       @sleep 60
-       robot --outputdir e2e/results e2e/tests/
-   ```
-7. Add `e2e/results/` to `.gitignore`.
-
-**Possible traps and issues:**
- `BANGUI_SESSION_SECRET` env var is required; tests will fail with a startup error if it is not set. Document that `make e2e` requires the variable in the environment.
- `make up` uses `podman-compose` or `podman compose` auto-detected at Makefile eval time. If neither is installed the `e2e` target silently fails.
- The backend `start_period` in the healthcheck is 45 s; the frontend is 30 s on top of that. The 60 s sleep in the Makefile target may not be enough on a cold build — prefer polling `/api/health` until ready.
- `rfbrowser init` must be re-run whenever the `robotframework-browser` version changes.
- The Browser library uses Chromium headless by default. CI environments may need `--no-sandbox` flags passed via `New Browser    chromium    headless=true    args=['--no-sandbox']`.
-
-**Docs changes needed:**
- Add an "E2E Testing" section to [Testing-Requirements.md](Testing-Requirements.md) describing how to run `make e2e`, required env vars, and how to view the HTML report in `e2e/results/`.
- Add `e2e/results/` to the `.gitignore` list documented in [Backend-Development.md](Backend-Development.md).
-
-**Doc references:**
- [Testing-Requirements.md](Testing-Requirements.md)
- [Backend-Development.md](Backend-Development.md)
- [Deployment.md](Deployment.md) — for env var documentation
- Robot Framework: https://robotframework.org/#getting-started
- Browser library: https://robotframework-browser.org/
-
---
-
-## [E2E-2] Page loading tests — all routes render without error
-
-**Where found:**
-The frontend has eight distinct routes (`/setup`, `/login`, `/`, `/map`, `/jails`, `/jails/:name`, `/config`, `/history`, `/blocklists`). Each is wrapped in a `PageErrorBoundary`. There is no test that verifies all of them load successfully against a live stack.
-
-**Why this is needed:**
-A broken import, a missing API field, or a bad runtime dependency can cause a page to show the error boundary fallback ("Something went wrong") instead of its content. Unit tests mock API responses, so they cannot catch this class of regression.
-
-**Goal:**
-Every protected page loads, shows its primary content element, and does **not** show the `PageErrorBoundary` fallback when the stack is running correctly.
-
-**What to do:**
-1. Create `e2e/tests/01_page_loading.robot`.
-2. Suite Setup: call `Login As Admin` from `auth.resource`.
-3. For each route, implement a test case with the pattern:
-   ```robot
-   *** Test Cases ***
-   Login Page Loads Without Error
-       # Must run before Login As Admin — navigate while unauthenticated
-       Go To    ${FRONTEND_URL}/login
-       Wait For Elements State    css=form    visible    timeout=15s
-       Get Text    body    not contains    Something went wrong
-
-   Dashboard Loads Without Error
-       Go To    ${FRONTEND_URL}/
-       Wait For Elements State    css=main    visible    timeout=15s
-       Get Text    body    not contains    Something went wrong
-
-   Map Page Loads Without Error
-       Go To    ${FRONTEND_URL}/map
-       Wait For Elements State    css=canvas,svg    visible    timeout=15s
-       Get Text    body    not contains    Something went wrong
-   ```
-4. Cover all routes:
-   - `/setup` — assert setup form OR redirect to `/login` (setup already done).
-   - `/login` — assert login form visible.
-   - `/` — assert dashboard stats/charts visible.
-   - `/map` — assert SVG or canvas element visible.
-   - `/jails` — assert a table or list visible.
-   - `/jails/:name` — navigate to `/jails/manual-Jail`; assert jail detail heading visible.
-   - `/config` — assert tab navigation visible.
-   - `/history` — assert history table visible.
-   - `/blocklists` — assert blocklists panel visible.
-5. Assert HTTP status for each page via `${response}=    GET    ${FRONTEND_URL}/<path>` and `Should Be Equal As Integers    ${response.status}    200`.
-
-**Possible traps and issues:**
- The `/login` page test must run **before** `Login As Admin` is called, or the session cookie will cause an immediate redirect to `/`. Either make it the first test case with its own `[Setup]    New Page` (no auth), or run it in a separate suite that has no Suite Setup.
- The frontend is a SPA; `GET /map` at the Vite dev server always returns 200 with `index.html`. HTTP status checks here are not meaningful — focus on DOM assertions after client-side routing.
- The `/jails/:name` test assumes `manual-Jail` exists. If fail2ban has not started or the jail is not active the page may render an empty or error state. Add a guard: check jail exists via `GET /api/jails` before navigating.
- `PageErrorBoundary` renders per-page; the text "Something went wrong" must not be matched against the window title or other benign text. Scope the assertion to the `<main>` element.
- Page elements have no `data-testid` attributes on the production components — only on test mocks. CSS selectors (`css=main`, `css=table`, `css=canvas`) are fragile. See [E2E-6] for the task to add `data-testid` attributes.
- The Vite dev server takes ~30 s to compile on first load. The first navigation may time out; increase the default timeout to 30 s for the first test only.
-
-**Docs changes needed:**
- Document the expected selectors and page landmarks in [Web-Development.md](Web-Development.md) so frontend developers know which elements are load-tested.
-
-**Doc references:**
- [Web-Development.md](Web-Development.md)
- [Web-Design.md](Web-Design.md)
- [Testing-Requirements.md](Testing-Requirements.md)
- `frontend/src/App.tsx` — canonical route definitions
-
---
-
 ## [E2E-3] Ban records appear in UI after simulated failed logins

 **Where found:**
@@ -360,4 +227,4 @@ Key interactive and landmark elements across all pages and the config form have
 - [Web-Development.md](Web-Development.md)
 - [Testing-Requirements.md](Testing-Requirements.md)
 - `frontend/src/components/ErrorBoundary.tsx`
- `frontend/src/components/config/__tests__/AutoSaveIndicator.test.tsx`
+- `frontend/src/components/config/__tests__/AutoSaveIndicator.test.tsx`
--- a/Docs/Testing-Requirements.md
+++ b/Docs/Testing-Requirements.md
@@ -59,4 +59,24 @@ After a run, open `e2e/results/report.html` in a browser to view the detailed HT

 ### Writing New E2E Tests

-Place new `.robot` files in `e2e/tests/`. Use `e2e/resources/common.resource` for shared variables and setup/teardown, and `e2e/resources/auth.resource` for the `Login As Admin` keyword.
+Place new `.robot` files in `e2e/tests/`. Use `e2e/resources/common.resource` for shared variables and setup/teardown, and `e2e/resources/auth.resource` for the `Login As Admin` keyword.
+
+### E2E-3 — Ban Pipeline Timing
+
+Test **E2E-3** (`e2e/tests/02_ban_records.robot`: *Simulated Failed Logins Appear As Ban Records*) exercises the full ban pipeline:
+
+```
+simulate_failed_logins.sh → fail2ban log scan → ban recorded in fail2ban DB
+     → backend polls socket (on-demand, no push) → /api/bans/active
+     → history_sync archive (every 300 s) → /api/history
+```
+
+Key timing facts:
+
+- **fail2ban** (`manual-Jail`, `backend=polling`) re-reads `auth.log` on its own interval, not event-driven.
+- **maxretry=3** means a ban triggers after the 3rd matching line. `simulate_failed_logins.sh` writes 5 lines to ensure the threshold is crossed.
+- **15 s sleep** in the test gives fail2ban time to detect and record the ban before the first assertion. This is a heuristic — the actual polling interval depends on fail2ban's internal cycle.
+- **history_sync** runs every 300 s (`HISTORY_SYNC_INTERVAL` in `backend/app/tasks/history_sync.py`). The History page reads from the archive DB, so it may lag up to 300 s behind real-time. The E2E test uses `GET /api/bans/active` (direct socket query) for the API assertion to avoid this lag.
+- **Pagination**: the History page paginates results. Use `?page_size=500` to push the test IP onto the first page, or assert via the API.
+
+If the test fails at Step 3 (no ban detected via API) but `check_ban_status.sh` shows the IP is banned inside the container, the backend-to-fail2ban socket path is broken. If `check_ban_status.sh` also shows no ban, the log volume mapping is wrong (fail2ban is not reading the file `simulate_failed_logins.sh` writes to).
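Note on the 15 s sleep listed under the key timing facts: because the sleep is a heuristic, one alternative is to poll for the ban until a deadline. The sketch below is a generic polling helper and not part of this commit. `wait_until` and its parameters are hypothetical names, and the `check` callable stands in for whatever query the test uses (for example a request to `/api/bans/active`).

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 1.0,
) -> bool:
    """Call check() repeatedly until it returns True or timeout seconds pass.

    Returns True as soon as check() succeeds, False once the deadline is hit.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    attempts = {"n": 0}

    def ban_visible() -> bool:
        # Stand-in for the real query; succeeds on the third attempt.
        attempts["n"] += 1
        return attempts["n"] >= 3

    print(wait_until(ban_visible, timeout=5.0, interval=0.1))
```

Inside a Robot suite, the built-in `Wait Until Keyword Succeeds` keyword covers the same retry pattern without extra Python code.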
--- a/Docs/Web-Development.md
+++ b/Docs/Web-Development.md
@@ -1792,6 +1792,30 @@ it("should render a row for each ban", () => {
 });
 ```

+### E2E Page Loading Selectors
+
+The E2E test suite (`e2e/tests/01_page_loading.robot`) verifies every protected route renders without triggering `PageErrorBoundary`. Tests use CSS selectors scoped to semantic page landmarks — not `data-testid` attributes (see [E2E-6] for the attribute task).
+
+**Why fragile selectors are intentional:** E2E tests catch real rendering failures. Fragile selectors mean tests may need updating when markup changes, but they reliably detect PageErrorBoundary fallbacks, broken imports, and missing API fields that unit tests with mocks cannot catch.
+
+**Page landmarks by route:**
+
+| Route | Expected selectors | Notes |
+|---|---|---|
+| `/login` | `css=form` | Login form visible when setup complete |
+| `/setup` | `css=form,button` | Setup wizard; redirects to `/login` if done |
+| `/` (Dashboard) | `css=main` | Dashboard stats/charts rendered after auth |
+| `/map` | `css=canvas,svg,.map-container` | Map canvas or SVG; D3/Canvas renders after load |
+| `/jails` | `css=main,table,.jails-list` | Jail list table visible |
+| `/jails/:name` | `css=main,h1,h2,.jail-detail` | Jail detail heading; guard: check `/api/jails` first |
+| `/config` | `css=main,.tabs,.config-editor` | Tab navigation and editor panel |
+| `/history` | `css=main,table,.history-table` | History table visible |
+| `/blocklists` | `css=main,.blocklists-panel,.panel` | Blocklist management panel |
+
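+**Error detection:** Tests assert `Get Text    css=body    not contains    Something went wrong`, which is the `PageErrorBoundary` fallback title. The assertion reads all text under `<body>`, so it catches the fallback wherever it renders; a benign occurrence of the same phrase elsewhere on the page would also fail the test.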
+
+**Vite SPA note:** All routes return HTTP 200 from the dev server (SPA routing). HTTP status checks are not meaningful — focus on DOM state after client-side navigation.
+
 ---

 ## 15. Error Observability & Telemetry
--- a/e2e/tests/01_page_loading.robot
+++ b/e2e/tests/01_page_loading.robot
@@ -1,13 +1,95 @@
 *** Settings ***
 Resource    ${CURDIR}/../../resources/common.resource
+Resource    ${CURDIR}/../../resources/auth.resource

 *** Test Cases ***
-Page Loads And Shows Navigation
+Login Page Loads Without Error
+    [Documentation]    The login page test must run unauthenticated, before Login As Admin; a fresh New Page carries no session cookie.
+    ...    Vite SPA always returns 200; focus on DOM assertions after client-side routing.
    New Browser    chromium    headless=${TRUE}
-    New Page    ${FRONTEND_URL}
+    New Page
+    Go To    ${FRONTEND_URL}/login
+    Wait For Elements State    css=form    visible    timeout=15s
+    Get Text    css=body    not contains    Something went wrong
+    Close Browser

-    # Confirm the page title or root element is present.
-    ${title}=    Get Title
-    Should Not Be Empty    ${title}
+Setup Page Loads Without Error
+    [Documentation]    Setup wizard accessible before auth; may redirect to /login if already done.
+    New Browser    chromium    headless=${TRUE}
+    Login As Admin
+    Go To    ${FRONTEND_URL}/setup
+    Wait For Elements State    css=form,button    visible    timeout=15s
+    Get Text    css=body    not contains    Something went wrong
+    Close Browser

+Dashboard Page Loads Without Error
+    New Browser    chromium    headless=${TRUE}
+    Login As Admin
+    Go To    ${FRONTEND_URL}/
+    Wait For Elements State    css=main    visible    timeout=15s
+    Get Text    css=body    not contains    Something went wrong
+    Close Browser
+
+Map Page Loads Without Error
+    New Browser    chromium    headless=${TRUE}
+    Login As Admin
+    Go To    ${FRONTEND_URL}/map
+    Wait For Elements State    css=canvas,svg,.map-container    visible    timeout=15s
+    Get Text    css=body    not contains    Something went wrong
+    Close Browser
+
+Jails Page Loads Without Error
+    New Browser    chromium    headless=${TRUE}
+    Login As Admin
+    Go To    ${FRONTEND_URL}/jails
+    Wait For Elements State    css=main,table,.jails-list    visible    timeout=15s
+    Get Text    css=body    not contains    Something went wrong
+    Close Browser
+
+Jail Detail Page Loads Without Error
+    [Documentation]    Guard: check jail exists via GET /api/jails first; use first jail name.
+    New Browser    chromium    headless=${TRUE}
+    Login As Admin
+
+    # Guard: find an active jail before navigating to /jails/:name
+    ${response}=    GET    ${BACKEND_URL}/api/jails
+    ${jails}=    Set Variable    ${response.json()}
+    ${count}=    Get Length    ${jails}
+
+    IF    ${count} > 0
+        ${first_jail}=    Get From List    ${jails}    0
+        ${jail_name}=    Set Variable    ${first_jail}[name]
+        Log    Using jail: ${jail_name}
+    ELSE
+        ${jail_name}=    Set Variable    manual-Jail
+        Log    No jails found; using fallback name: ${jail_name}
+    END
+
+    Go To    ${FRONTEND_URL}/jails/${jail_name}
+    Wait For Elements State    css=main,h1,h2,.jail-detail    visible    timeout=15s
+    Get Text    css=body    not contains    Something went wrong
+    Close Browser
+
+Config Page Loads Without Error
+    New Browser    chromium    headless=${TRUE}
+    Login As Admin
+    Go To    ${FRONTEND_URL}/config
+    Wait For Elements State    css=main,.tabs,.config-editor    visible    timeout=15s
+    Get Text    css=body    not contains    Something went wrong
+    Close Browser
+
+History Page Loads Without Error
+    New Browser    chromium    headless=${TRUE}
+    Login As Admin
+    Go To    ${FRONTEND_URL}/history
+    Wait For Elements State    css=main,table,.history-table    visible    timeout=15s
+    Get Text    css=body    not contains    Something went wrong
+    Close Browser
+
+Blocklists Page Loads Without Error
+    New Browser    chromium    headless=${TRUE}
+    Login As Admin
+    Go To    ${FRONTEND_URL}/blocklists
+    Wait For Elements State    css=main,.blocklists-panel,.panel    visible    timeout=15s
+    Get Text    css=body    not contains    Something went wrong
    Close Browser
--- a/e2e/tests/02_ban_records.robot
+++ b/e2e/tests/02_ban_records.robot
@@ -2,15 +2,52 @@
 Resource    ${CURDIR}/../../resources/common.resource
 Resource    ${CURDIR}/../../resources/auth.resource

+# Test IP — stable across runs so teardown can reliably unban it.
+# Chosen from private address space (RFC 1918).
+# Must NOT overlap with any ignoreip rule in the fail2ban jail config.
+Suite Setup    Login As Admin
+
 *** Test Cases ***
-Ban Records Are Visible
-    New Browser    chromium    headless=${TRUE}
-    Login As Admin
+Simulated Failed Logins Appear As Ban Records
+    [Documentation]    Verifies the full ban pipeline:
+    ...    fail2ban log parsing → fail2ban ban → backend socket poll → UI rendering.
+    ...
+    ...    Key timing facts:
+    ...    - simulate_failed_logins.sh writes 5 lines (COUNT=5).
+    ...    - manual-Jail maxretry=3 → ban triggers after 3rd matching line.
+    ...    - fail2ban backend=polling → fail2ban re-reads auth.log on its own schedule.
+    ...    - Backend has no push mechanism; /api/bans/active queries fail2ban on demand.
+    ...    - history_sync runs every 300 s; history page reads from the archive DB.
+    ...    - A direct API assertion (Step 3) isolates backend from UI rendering issues.
+    [Teardown]    Run Keywords
+    ...    Run Process    bash
+    ...    ${CURDIR}/../../Docker/check_ban_status.sh
+    ...    --unban    192.168.100.99    timeout=30s
+    ...    AND
+    ...    Run Process    truncate    -s    0
+    ...    ${CURDIR}/../../Docker/logs/auth.log

-    Go To    ${FRONTEND_URL}/bans
+    # Step 1 — write authentication-failure lines
+    ${result}=    Run Process
+    ...    bash
+    ...    ${CURDIR}/../../Docker/simulate_failed_logins.sh
+    ...    5
+    ...    192.168.100.99
+    ...    timeout=15s
+    Should Be Equal As Integers    ${result.rc}    0

-    # Basic presence check — the ban table or empty state should be present.
-    ${content}=    Get Page Source
-    Should Not Be Empty    ${content}
+    # Step 2 — wait for fail2ban to process the ban
+    # backend=polling: fail2ban re-reads the log on its own interval; the ban is applied almost immediately once the failures are detected.
+    Sleep    15s

-    Close Browser
+    # Step 3 — backend API: confirm ban is visible via fail2ban socket query
+    ${resp}=    GET    ${BACKEND_URL}/api/bans/active    expected_status=200
+    Should Contain    ${resp.text}    192.168.100.99
+
+    # Step 4 — History page: confirm UI surfaces the ban record
+    Go To    ${FRONTEND_URL}/history?page_size=500
+    Wait For Elements State    css=table,tbody    visible    timeout=20s
+    Get Text    body    contains    192.168.100.99
+
+    # Step 5 — confirm jail name is shown alongside the IP
+    Get Text    body    contains    manual-Jail
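Note on the Step 3 assertion: `Should Contain    ${resp.text}    192.168.100.99` is a substring match on the raw response body, so a shorter address such as `192.168.100.9` would also match a body that only contains `192.168.100.99`. The following is a minimal sketch of an exact-match check on the parsed JSON. The helper `contains_value` is hypothetical, and the sample body is invented for illustration because the real `/api/bans/active` schema is not shown in this commit; the helper walks any JSON shape for that reason.

```python
import json
from typing import Any


def contains_value(payload: Any, needle: str) -> bool:
    """Return True if any string leaf in a parsed JSON structure equals needle."""
    if isinstance(payload, str):
        return payload == needle
    if isinstance(payload, dict):
        return any(contains_value(value, needle) for value in payload.values())
    if isinstance(payload, list):
        return any(contains_value(item, needle) for item in payload)
    # Numbers, booleans and null can never equal an IP string.
    return False


if __name__ == "__main__":
    # Hypothetical response body, used only to exercise the helper.
    body = '{"bans": [{"ip": "192.168.100.99", "jail": "manual-Jail"}]}'
    parsed = json.loads(body)
    print(contains_value(parsed, "192.168.100.99"))
    print(contains_value(parsed, "192.168.100.9"))
```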