fix: retry, semaphore, reload lock, activation verify, bans_by_jail diagnostics

Stage 1.1-1.3: reload_all include/exclude_jails params already implemented;
  added keyword-arg assertions in router and service tests.

Stage 2.1/6.1: _send_command_sync retry loop (3 attempts, 150ms exp backoff)
  retrying on EAGAIN/ECONNREFUSED/ENOBUFS; immediate raise on all other errors.

Stage 2.2: asyncio.Lock at module level in jail_service.reload_all to
  serialize concurrent reload--all commands.

Stage 3.1: activate_jail re-queries _get_active_jail_names after reload;
  returns active=False with descriptive message if jail did not start.

Stage 4.1/6.2: asyncio.Semaphore (max 10) in Fail2BanClient.send, lazy-
  initialized; logs fail2ban_command_waiting_semaphore at debug when waiting.

Stage 5.1/5.2: unit tests asserting reload_all is called with include_jails
  and exclude_jails; activation verification happy/sad path tests.

Stage 6.3: TestSendCommandSyncRetry (5 cases) + TestFail2BanClientSemaphore
  concurrency test.

Stage 7.1-7.3: _since_unix uses time.time(); bans_by_jail debug logging with
  since_iso; diagnostic warning when total==0 despite table rows; unit test
  verifying the warning fires for stale data.
This commit is contained in:
2026-03-14 11:09:55 +01:00
parent 2274e20123
commit 2f2e5a7419
9 changed files with 880 additions and 115 deletions

View File

@@ -37,6 +37,12 @@ log: structlog.stdlib.BoundLogger = structlog.get_logger()
_SOCKET_TIMEOUT: float = 10.0
# Guard against concurrent reload_all calls. Overlapping ``reload --all``
# commands sent to fail2ban's socket produce undefined behaviour and may cause
# jails to be permanently removed from the daemon. Serialising them here
# ensures only one reload stream is in-flight at a time.
_reload_all_lock: asyncio.Lock = asyncio.Lock()
# ---------------------------------------------------------------------------
# Custom exceptions
# ---------------------------------------------------------------------------
@@ -540,7 +546,12 @@ async def reload_jail(socket_path: str, name: str) -> None:
raise JailOperationError(str(exc)) from exc
async def reload_all(socket_path: str) -> None:
async def reload_all(
socket_path: str,
*,
include_jails: list[str] | None = None,
exclude_jails: list[str] | None = None,
) -> None:
"""Reload all fail2ban jails at once.
Fetches the current jail list first so that a ``['start', name]`` entry
@@ -548,8 +559,14 @@ async def reload_all(socket_path: str) -> None:
non-empty stream the end-of-reload phase deletes every jail that received
no configuration commands.
*include_jails* are added to the stream (e.g. a newly activated jail that
is not yet running). *exclude_jails* are removed from the stream (e.g. a
jail that was just deactivated and should not be restarted).
Args:
socket_path: Path to the fail2ban Unix domain socket.
include_jails: Extra jail names to add to the start stream.
exclude_jails: Jail names to remove from the start stream.
Raises:
JailOperationError: If fail2ban reports the operation failed.
@@ -557,17 +574,26 @@ async def reload_all(socket_path: str) -> None:
cannot be reached.
"""
client = Fail2BanClient(socket_path=socket_path, timeout=_SOCKET_TIMEOUT)
try:
# Resolve jail names so we can build the minimal config stream.
status_raw = _ok(await client.send(["status"]))
status_dict = _to_dict(status_raw)
jail_list_raw: str = str(status_dict.get("Jail list", ""))
jail_names = [n.strip() for n in jail_list_raw.split(",") if n.strip()]
stream: list[list[str]] = [["start", n] for n in jail_names]
_ok(await client.send(["reload", "--all", [], stream]))
log.info("all_jails_reloaded")
except ValueError as exc:
raise JailOperationError(str(exc)) from exc
async with _reload_all_lock:
try:
# Resolve jail names so we can build the minimal config stream.
status_raw = _ok(await client.send(["status"]))
status_dict = _to_dict(status_raw)
jail_list_raw: str = str(status_dict.get("Jail list", ""))
jail_names = [n.strip() for n in jail_list_raw.split(",") if n.strip()]
# Merge include/exclude sets so the stream matches the desired state.
names_set: set[str] = set(jail_names)
if include_jails:
names_set.update(include_jails)
if exclude_jails:
names_set -= set(exclude_jails)
stream: list[list[str]] = [["start", n] for n in sorted(names_set)]
_ok(await client.send(["reload", "--all", [], stream]))
log.info("all_jails_reloaded")
except ValueError as exc:
raise JailOperationError(str(exc)) from exc
# ---------------------------------------------------------------------------