BanGUI — Task List
This document breaks the entire BanGUI project into development stages, ordered so that each stage builds on the previous one. Every task is described in prose with enough detail for a developer to begin work. References point to the relevant documentation.
Agent Operating Instructions
These instructions apply to every AI agent working in this repository. Read them fully before touching any file.
Repository Layout
```
backend/app/              FastAPI application (Python 3.12+, async)
  models/                 Pydantic v2 models
  repositories/           Database access (aiosqlite)
  routers/                FastAPI routers — thin, delegate to services
  services/               Business logic; all fail2ban interaction lives here
  tasks/                  APScheduler background jobs
  utils/                  Shared helpers (fail2ban_client.py, ip_utils.py, …)
backend/tests/            pytest-asyncio test suite mirroring app/ structure
Docker/                   Compose files, Dockerfiles, dev config for fail2ban
  fail2ban-dev-config/    Bind-mounted into the fail2ban container in debug mode
frontend/src/             React + TypeScript SPA (Vite, Fluent UI)
fail2ban-master/          Vendored fail2ban source — DO NOT EDIT
Docs/                     Architecture, design notes, this file
```
Coding Conventions
- Python: async/await throughout. Use `structlog` for logging (`log.info`/`log.warning`/`log.error` with keyword args). Pydantic v2 models. Type-annotated functions.
- Error handling: Raise domain-specific exceptions (`JailNotFoundError`, `JailOperationError`, `Fail2BanConnectionError`) from service functions; let routers map them to HTTP responses. Never swallow exceptions silently in routers.
- fail2ban socket: All communication goes through `app.utils.fail2ban_client.Fail2BanClient`. Use `_safe_get` when a missing command is non-fatal; use `_ok()` when the response is required.
- Tests: Mirror the `app/` tree under `tests/`. Use `pytest-asyncio` + `unittest.mock.patch` or `AsyncMock`. Keep fixtures in `conftest.py`. Run with `pytest backend/tests` from the repo root (venv at `.venv/`).
- Docker dev config: `Docker/fail2ban-dev-config/` is bind-mounted read-write into the fail2ban container. Changes here take effect after `fail2ban-client reload`.
- Compose: Use `Docker/compose.debug.yml` for local development; `Docker/compose.prod.yml` for production. Never commit secrets.
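As a rough sketch of the error-handling convention, the exception names below come from the list above, while the class bodies, the `map_to_http_status` helper, and the specific status codes are illustrative assumptions rather than the actual router code:

```python
# Domain-specific exceptions raised by service functions (names from the
# convention above; bodies and HTTP mapping are illustrative assumptions).
class Fail2BanConnectionError(Exception):
    """The fail2ban socket could not be reached."""

class JailNotFoundError(Exception):
    """The named jail does not exist in the fail2ban server."""

class JailOperationError(Exception):
    """A jail command was accepted but failed."""

def map_to_http_status(exc: Exception) -> int:
    """Hypothetical router-side mapping; real routers build HTTP responses."""
    if isinstance(exc, JailNotFoundError):
        return 404
    if isinstance(exc, Fail2BanConnectionError):
        return 503
    return 500
```

The point of the split is that services never know about HTTP; routers never inspect fail2ban internals.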
How to Verify Changes
- Run the backend test suite: `cd /home/lukas/Volume/repo/BanGUI && .venv/bin/pytest backend/tests -x -q`
- Check for Python type errors: `.venv/bin/mypy backend/app` (if mypy is installed).
- For runtime verification, start the debug stack: `docker compose -f Docker/compose.debug.yml up`.
- Inspect fail2ban logs inside the container: `docker exec bangui-fail2ban-dev fail2ban-client status` and `docker logs bangui-fail2ban-dev`.
Agent Workflow
- Read the task description fully. Read every source file mentioned before editing anything.
- Make the minimal change that solves the stated problem — do not refactor surrounding code.
- After every code change run the test suite to confirm no regressions.
- If a fix requires a config file change in `Docker/fail2ban-dev-config/`, also update `Docker/compose.prod.yml` or document the required change if it affects the production config volume.
- Mark the task complete only after tests pass and the root-cause error no longer appears.
Bug Fixes — fail2ban Runtime Errors (2026-03-14)
The following three independent bugs were identified from fail2ban logs. They are ordered by severity. Resolve them in order; each is self-contained.
Task 1 — UnknownJailException('airsonic-auth') on reload
Priority: High
Files: Docker/fail2ban-dev-config/fail2ban/jail.d/airsonic-auth.conf, backend/app/services/config_file_service.py
Observed error
```
fail2ban.transmitter ERROR Command ['reload', '--all', [], [['start', 'airsonic-auth'],
['start', 'bangui-sim'], ['start', 'blocklist-import']]] has failed.
Received UnknownJailException('airsonic-auth')
```
Root cause
When a user activates the airsonic-auth jail through BanGUI, `config_file_service.py` writes a local override and then calls `jail_service.reload_all(socket_path, include_jails=['airsonic-auth'])`. The reload stream therefore contains `['start', 'airsonic-auth']`. fail2ban cannot create the jail object because its logpath (`/remotelogs/airsonic/airsonic.log`) does not exist in the dev environment — the airsonic service is not running and no log volume is mounted. fail2ban registers a config-level parse failure for the jail, and the server falls back to `UnknownJailException` when the reload asks it to start that name.
The current error handling in `config_file_service.py` catches the reload exception and rolls back the config file, but it does so with a generic `except Exception` and logs only a warning. The real error is never surfaced to the user in a way that explains why activation failed.
What to implement
- Dev-config fix (`Docker/fail2ban-dev-config`): The `airsonic-auth.conf` file is a sample jail showing how a remote-log jail is configured. Add a prominent comment block at the top explaining that this jail is intentionally `enabled = false` and that it will fail to start unless the `/remotelogs/airsonic/` log directory is mounted into the fail2ban container. This makes the intent explicit and prevents confusion.
- Logpath pre-validation in the activation service (`config_file_service.py`): Before calling `reload_all` during jail activation, read the `logpath` values from the jail's config (parse the `.conf` and any `.local` override using the existing `conffile_parser`). For each logpath that is not `/dev/null` and does not contain a glob wildcard, check whether the path exists on the filesystem (the backend container shares the fail2ban config volume with read-write access). If any required logpath is missing, abort the activation immediately — do not write the local override, do not call reload — and return a `JailActivationResponse` with `active=False` and a clear `message` explaining which logpath is missing. Add a `validation_warnings` entry listing the missing paths.
- Specific exception detection (`jail_service.py`): In `reload_all`, when `_ok()` raises `ValueError` and the message matches `unknownjail`/`unknown jail` (use the existing `_is_not_found_error` helper), re-raise a `JailNotFoundError` instead of the generic `JailOperationError`. Update callers in `config_file_service.py` to catch `JailNotFoundError` separately and include the jail name in the activation failure message.
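A minimal sketch of the pre-validation rule described above. The `missing_logpaths` helper name is hypothetical; the real implementation lives in `config_file_service.py` and parses the jail config first:

```python
from pathlib import Path

def missing_logpaths(logpaths: list[str]) -> list[str]:
    """Return the logpaths that must exist but do not.

    Mirrors the rule above: /dev/null and glob patterns are skipped;
    every other path must exist on the shared config volume.
    """
    missing = []
    for raw in logpaths:
        path = raw.strip()
        if path == "/dev/null":
            continue
        if any(ch in path for ch in "*?["):
            continue  # globs may legitimately match nothing yet
        if not Path(path).exists():
            missing.append(path)
    return missing
```

If the returned list is non-empty, activation aborts before the local override is written, and the paths are reported via `validation_warnings`.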
Acceptance criteria
- Attempting to activate `airsonic-auth` (without mounting the log volume) returns a 422-class response with a message mentioning the missing logpath — no `UnknownJailException` appears in the fail2ban log.
- Activating `bangui-sim` (whose logpath exists in dev) continues to work correctly.
- All existing tests in `backend/tests/` pass without modification.
Task 2 — iptables-allports action fails with exit code 4 (Script error) on blocklist-import jail
Priority: High
Files: Docker/fail2ban-dev-config/fail2ban/jail.d/blocklist-import.conf, Docker/fail2ban-dev-config/fail2ban/jail.d/bangui-sim.conf, Docker/fail2ban-dev-config/fail2ban/jail.conf
Observed error
```
fail2ban.utils   ERROR 753c588a7860 -- returned 4
fail2ban.actions ERROR Failed to execute ban jail 'blocklist-import' action
                 'iptables-allports' … Error starting action
                 Jail('blocklist-import')/iptables-allports: 'Script error'
```
Root cause
Exit code 4 from an iptables invocation means the xtables advisory lock could not be obtained (another iptables call was in progress simultaneously). The `iptables-allports` `actionstart` script does not pass the `-w` (wait-for-lock) flag by default. In the Docker dev environment, fail2ban starts multiple jails in parallel during `reload --all`. Each jail's `actionstart` sub-process calls iptables concurrently. The xtables lock contention causes some of them to exit with code 4. fail2ban interprets any non-zero exit from an action script as `Script error` and aborts the action.
A secondary risk: if the host kernel uses nf_tables (`iptables-nft`) but the container runs `iptables-legacy` binaries, the same exit code 4 can also appear due to a backend mismatch.
What to implement
- Add the xtables lock-wait flag globally in `Docker/fail2ban-dev-config/fail2ban/jail.conf`, under the `[DEFAULT]` section. Set `banaction = iptables-allports[lockingopt="-w 5"]`. The `lockingopt` parameter is appended to every iptables call inside `iptables-allports`, telling it to wait up to 5 seconds for the xtables lock instead of failing immediately. This is the canonical fix recommended in the fail2ban documentation for containerised deployments.
- Remove the redundant `banaction` overrides from `blocklist-import.conf` and `bangui-sim.conf` (both already specify `banaction = iptables-allports`, which now comes from the global default). Removing the per-jail override keeps the configuration DRY and ensures the lock-wait flag applies consistently.
- Production compose note: Add a comment in `Docker/compose.prod.yml` near the fail2ban service block stating that the `fail2ban-config` volume must contain a `jail.d/` with `banaction = iptables-allports[lockingopt="-w 5"]` or equivalent, because the production volume is not pre-seeded by the repository. This is a documentation action only — do not auto-seed the production volume.
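Assuming the stock `iptables-allports` action shipped with fail2ban, the resulting `[DEFAULT]` block in the dev `jail.conf` would look roughly like this (sketch, not the full file):

```ini
# Docker/fail2ban-dev-config/fail2ban/jail.conf (sketch of the [DEFAULT] change)
[DEFAULT]
# Wait up to 5 seconds for the xtables lock instead of failing with exit code 4
banaction = iptables-allports[lockingopt="-w 5"]
```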
Acceptance criteria
- After `docker compose -f Docker/compose.debug.yml restart fail2ban`, the fail2ban log shows no `returned 4` or `Script error` lines during startup.
- `docker exec bangui-fail2ban-dev fail2ban-client status blocklist-import` reports the jail as running.
- `docker exec bangui-fail2ban-dev fail2ban-client status bangui-sim` reports the jail as running.
Task 3 — Suppress log noise from unsupported get <jail> idle/backend commands
Priority: Low
Files: backend/app/services/jail_service.py
Observed error
```
fail2ban.transmitter ERROR Command ['get', 'bangui-sim', 'idle'] has failed.
Received Exception('Invalid command (no get action or not yet implemented)')
fail2ban.transmitter ERROR Command ['get', 'bangui-sim', 'backend'] has failed.
Received Exception('Invalid command (no get action or not yet implemented)')
```
… (repeated ~15 times per polling cycle)
Root cause
`_fetch_jail_summary()` in `jail_service.py` sends `["get", name, "backend"]` and `["get", name, "idle"]` to the fail2ban daemon via the socket. The running fail2ban version (LinuxServer.io container image) does not implement these two sub-commands in its transmitter. fail2ban logs every unrecognised command as an ERROR on its side. BanGUI already handles this gracefully — `asyncio.gather(return_exceptions=True)` captures the exceptions, and `_safe_bool` / `_safe_str` fall back to `False` / `"polling"` respectively. The BanGUI application does not malfunction, but the fail2ban log is flooded with spurious ERROR lines on every jail-list refresh (~15 errors per request across all jails, repeated on every dashboard poll).
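The graceful-degradation path described above can be sketched like this; `_safe_bool` and `_safe_str` stand in for the real helpers, whose exact signatures are an assumption here:

```python
import asyncio

def _safe_bool(value, default=False):
    # With return_exceptions=True, a failed command arrives as an exception
    # object instead of a value; fall back to the default in that case.
    return default if isinstance(value, BaseException) else bool(value)

def _safe_str(value, default="polling"):
    return default if isinstance(value, BaseException) else str(value)

async def fetch_summary_fields():
    async def get_idle():
        return False  # stands in for: await client.send(["get", name, "idle"])

    async def get_backend():
        # Simulates the daemon rejecting the unimplemented sub-command.
        raise Exception("Invalid command (no get action or not yet implemented)")

    idle, backend = await asyncio.gather(
        get_idle(), get_backend(), return_exceptions=True
    )
    return _safe_bool(idle), _safe_str(backend)
```

The BanGUI side degrades cleanly; the problem is purely the ERROR lines the daemon writes before rejecting each command.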
What to implement
The fix is a capability-detection probe executed once at startup (or at most once per polling cycle, re-used for all jails). The probe sends `["get", "<first_jail>", "backend"]` and caches the result in a module-level boolean flag.
- In `jail_service.py`, add a module-level `asyncio.Lock` and a nullable `bool` flag: `_backend_cmd_supported: bool | None = None` and `_backend_cmd_lock: asyncio.Lock = asyncio.Lock()`.
- Add an async helper `_check_backend_cmd_supported(client, jail_name) -> bool` that:
  - Returns the cached value immediately if already determined.
  - Acquires `_backend_cmd_lock`, checks again (double-check idiom), then sends `["get", jail_name, "backend"]`.
  - Sets `_backend_cmd_supported = True` if the command returns without exception, `False` if it raises any `Exception`.
  - Returns the flag value.
- In `_fetch_jail_summary()`, call this helper (using the first available jail name) before the main `asyncio.gather`. If the flag is `False`, replace the `["get", name, "backend"]` and `["get", name, "idle"]` gather entries with trivial coroutines (plain `async def` stubs) that immediately return the default values (`"polling"` and `False`) without sending any socket command. This eliminates the fail2ban-side log noise entirely without changing the public API or the fallback values.
- If `_backend_cmd_supported` later becomes `True` (a future fail2ban version that implements the commands), the commands are sent as before and real values are returned.
Acceptance criteria
- After one polling cycle where `backend`/`idle` are detected as unsupported, the fail2ban log shows zero `Invalid command` lines for those commands on subsequent refreshes.
- The `GET /api/jails` response still returns `backend: "polling"` and `idle: false` for each jail (unchanged defaults).
- All existing tests in `backend/tests/` pass.
- Because `_backend_cmd_supported` starts as `None` on every application restart, the probe re-runs after a restart, so a fail2ban upgrade that adds support is detected automatically within one polling cycle.