BanGUI/Docs/Backend-Development.md
Lukas d476e9d611 TASK-020: Fix log_target security vulnerability (defense in depth)
**Issue:**
- log_target accepted arbitrary paths, allowing authenticated users to write
  files as root via fail2ban (e.g., /etc/cron.d/bangui-pwned)
- fail2ban runs as root and opens files specified in log_target

**Solution:**
1. **Model layer validation:** Already existed in GlobalConfigUpdate, prevents
   invalid paths before reaching service
2. **Service layer validation:** Added defensive check in update_global_config()
   that validates log_target even if model validation is bypassed
3. **New validation helper:** Added validate_log_target() utility that accepts
   special values (STDOUT, STDERR, SYSLOG) or paths within allowed directories

**Changes:**
- app/utils/path_utils.py: Added validate_log_target() helper
- app/services/config_service.py: Added service-layer validation before
  sending command to fail2ban
- backend/tests: Fixed session_secret length issues in fixtures (min 32 chars)
- backend/tests: Added tests for valid special log targets
- Docs/Backend-Development.md: Documented log_target security requirements

**Test Coverage:**
- Model validation rejects /etc/passwd (existing test)
- Model validation accepts STDOUT, STDERR, SYSLOG special values
- Model validation accepts paths in allowed directories
- Service layer validation tested with special values

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 14:23:56 +02:00


Backend Development — Rules & Guidelines

Rules and conventions every backend developer must follow. Read this before writing your first line of code.


1. Language & Typing

  • Python 3.12+ is the minimum version.
  • Every function, method, and variable must have explicit type annotations — no exceptions.
  • Use str, int, float, bool, None for primitives.
  • Use list[T], dict[K, V], set[T], tuple[T, ...] (lowercase, built-in generics) — never typing.List, typing.Dict, etc.
  • Use T | None instead of Optional[T].
  • Use TypeAlias, TypeVar, Protocol, and NewType when they improve clarity.
  • Return types are mandatory — including -> None.
  • Never use Any unless there is no other option and a comment explains why.
  • Run mypy --strict (or pyright in strict mode) — the codebase must pass with zero errors.
# Good
def get_jail_by_name(name: str) -> Jail | None:
    ...

# Bad — missing types
def get_jail_by_name(name):
    ...

2. Core Libraries

| Purpose | Library | Notes |
| --- | --- | --- |
| Web framework | FastAPI | Async endpoints only. |
| Data validation & settings | Pydantic v2 | All request/response bodies and config models. |
| Async HTTP client | aiohttp (ClientSession) | For external calls (blocklists, IP lookups). |
| Scheduling | APScheduler 4.x (async) | Blocklist imports, periodic health checks. |
| Structured logging | structlog | Every log call must use structlog; never print() or logging directly. |
| Database | aiosqlite | Async SQLite access for the application database. |
| Testing | pytest + pytest-asyncio + httpx (AsyncClient) | Every feature needs tests. |
| Mocking | unittest.mock / pytest-mock | Isolate external dependencies. |
| Date & time | datetime (stdlib), always timezone-aware | Use datetime.datetime.now(datetime.UTC). Never naive datetimes. |
| IP / Network | ipaddress (stdlib) | Validate and normalise IPs and CIDR ranges. |
| Environment / config | pydantic-settings | Load .env and environment variables into typed models. |
| fail2ban integration | fail2ban client (bundled) | Use the local copy at ./fail2ban-master. Import from ./fail2ban-master/fail2ban/client to communicate with the fail2ban socket. Do not install fail2ban as a pip package. |
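The IP / Network row can be illustrated with a minimal sketch that normalises user input with the standard ipaddress module. normalise_ip_or_cidr is a hypothetical helper name. Masking host bits (strict=False) instead of rejecting them is an assumption.

```python
import ipaddress


def normalise_ip_or_cidr(value: str) -> str:
    """Return the canonical form of an IP address or CIDR range.

    Raises ValueError for anything that is neither.
    """
    text: str = value.strip()
    if "/" in text:
        # strict=False masks host bits: "10.0.0.5/24" becomes "10.0.0.0/24".
        return str(ipaddress.ip_network(text, strict=False))
    # Canonical text form, e.g. IPv6 is lower-cased and zero-compressed.
    return str(ipaddress.ip_address(text))
```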

fail2ban Client Usage

The repository ships with a vendored copy of fail2ban located at ./fail2ban-master. All communication with the fail2ban daemon must go through the client classes found in ./fail2ban-master/fail2ban/client. Add the ./fail2ban-master directory to sys.path (or configure it in pyproject.toml as a path dependency) so that from fail2ban.client ... resolves to the bundled copy.

import sys
from pathlib import Path

# Ensure the bundled fail2ban is importable
sys.path.insert(0, str(Path(__file__).resolve().parents[2] / "fail2ban-master"))

from fail2ban.client.csocket import CSSocket  # noqa: E402

Libraries you must NOT use

  • requests — use aiohttp (async).
  • flask — we use FastAPI.
  • celery — we use APScheduler.
  • print() for logging — use structlog.
  • json.loads / json.dumps on Pydantic models: use .model_dump() / .model_dump_json() to serialise and .model_validate() / .model_validate_json() to parse.

Timestamp Handling

Timestamp consistency is critical for accurate ban history queries across the dashboard and history endpoints. Follow these rules:

Rule 1: Use consistent UTC timestamps

  • All timestamps in the database are stored as Unix epochs (seconds since 1970-01-01 UTC).
  • fail2ban stores timestamps using time.time(), which is always UTC epoch seconds.
  • When querying fail2ban's SQLite database by timestamp, use app.utils.time_utils.since_unix() (not manual datetime calculations).

Rule 2: Time-range windows include a 60-second slack

  • The since_unix() function includes a 60-second slack window (TIME_RANGE_SLACK_SECONDS in app.utils.constants).
  • This slack accommodates:
    • Clock drift between the local system and fail2ban.
    • Test seeding delays when timestamps are manually set to exact boundaries.
  • The slack ensures that dashboard and history queries return consistent row counts for the same time range.

Rule 3: Never duplicate timestamp calculation logic

  • All services that query by time range must import and use since_unix().
  • Do not recalculate timestamps locally using datetime or time modules in service code.
  • If you need a timestamp for a time range, use since_unix().

Example:

from app.utils.time_utils import since_unix

# Get all bans from the last 24 hours (with 60-second slack)
since_ts: int = since_unix("24h")
async with db.execute(
    "SELECT * FROM bans WHERE timeofban >= ?",
    (since_ts,),
) as cursor:
    rows = await cursor.fetchall()

3. Project Structure

backend/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI app factory, lifespan
│   ├── config.py            # Pydantic settings
│   ├── dependencies.py      # FastAPI dependency providers
│   ├── models/              # Pydantic schemas (request, response, domain)
│   ├── routers/             # FastAPI routers grouped by feature
│   ├── services/            # Business logic — one service per domain
│   ├── repositories/        # Database access layer
│   ├── tasks/               # APScheduler jobs
│   └── utils/               # Helpers, constants, shared types
├── tests/
│   ├── conftest.py
│   ├── test_routers/
│   ├── test_services/
│   └── test_repositories/
├── pyproject.toml
└── .env.example
  • Routers receive requests, validate input via Pydantic, and delegate to services.
  • Services contain business logic and call repositories or external clients.
  • Repositories handle raw database queries — nothing else.
  • Never put business logic inside routers or repositories.

4. FastAPI Conventions

  • Use async def for every endpoint — no sync endpoints.

  • Every endpoint must declare explicit response models (response_model=...).

  • Use Pydantic models for request bodies and query parameters — never raw dicts.

  • Use Depends() for dependency injection (database sessions, services, auth).

  • Group endpoints into routers by feature domain (routers/jails.py, routers/bans.py, …).

  • Use appropriate HTTP status codes: 201 for creation, 204 for deletion with no body, 404 for not found, etc.

  • Protected endpoints should return 401 Unauthorized or 403 Forbidden when the session is invalid or expired; the frontend treats these responses as a session-expiry event and redirects the user to /login.

  • Use HTTPException or custom exception handlers — never return error dicts manually.

  • GET endpoints are read-only — never call db.commit() or execute INSERT/UPDATE/DELETE inside a GET handler. If a GET path produces side-effects (e.g., caching resolved data), that write belongs in a background task, a scheduled flush, or a separate POST endpoint. Users and HTTP caches assume GET is idempotent and non-mutating.

    # Good — pass db=None on GET so geo_service never commits
    result = await geo_service.lookup_batch(ips, http_session, db=None)
    
    # Bad — triggers INSERT + COMMIT per IP inside a GET handler
    result = await geo_service.lookup_batch(ips, http_session, db=app_db)
    
from fastapi import APIRouter, Depends, HTTPException, status
from app.models.jail import JailResponse, JailListResponse
from app.services.jail_service import JailService

router: APIRouter = APIRouter(prefix="/api/jails", tags=["Jails"])

@router.get("/", response_model=JailListResponse)
async def list_jails(service: JailService = Depends()) -> JailListResponse:
    jails: list[JailResponse] = await service.get_all_jails()
    return JailListResponse(jails=jails)

5. Pydantic Models

  • Every model inherits from pydantic.BaseModel.
  • Use model_config = ConfigDict(strict=True) where appropriate.
  • Field names use snake_case in Python, export as camelCase to the frontend via alias generators if needed.
  • Validate at the boundary — once data enters a Pydantic model it is trusted.
  • Use Field(...) with descriptions for every field to keep auto-generated docs useful.
  • Separate request models, response models, and domain (internal) models — do not reuse one model for all three.
from datetime import datetime
from pydantic import BaseModel, Field

class BanResponse(BaseModel):
    ip: str = Field(..., description="Banned IP address")
    jail: str = Field(..., description="Jail that issued the ban")
    banned_at: datetime = Field(..., description="UTC timestamp of the ban")
    expires_at: datetime | None = Field(None, description="UTC expiry, None if permanent")
    ban_count: int = Field(..., ge=1, description="Number of times this IP was banned")

Using Literal Types for Constrained Strings

When a field should only accept a small set of predefined values, use Literal to enforce this at the type level:

from typing import Literal
from pydantic import BaseModel, Field

LogLevel = Literal["CRITICAL", "ERROR", "WARNING", "NOTICE", "INFO", "DEBUG"]

class GlobalConfigUpdate(BaseModel):
    log_level: LogLevel | None = Field(
        default=None,
        description="Log level: CRITICAL, ERROR, WARNING, NOTICE, INFO, or DEBUG.",
    )

This provides:

  • Type safety — IDEs and type checkers enforce valid values.
  • API documentation — OpenAPI docs automatically list all allowed values.
  • Validation — Pydantic rejects invalid values and provides a clear error message.

Custom Field Validators

For fields that require complex validation (e.g., file paths that must be within allowed directories), use @field_validator:

from pydantic import BaseModel, Field, field_validator
from app.utils.path_utils import validate_log_path

class AddLogPathRequest(BaseModel):
    log_path: str = Field(..., description="Absolute path to the log file to monitor.")

    @field_validator("log_path", mode="after")
    @classmethod
    def validate_log_path_field(cls, value: str) -> str:
        """Validate that the log path is within allowed directories."""
        return validate_log_path(value)

Path Validation Helper:

For query parameters and other contexts where Pydantic validators cannot be used directly, use the validate_log_path() helper from app.utils.path_utils:

from fastapi import HTTPException, Query, status
from app.utils.path_utils import validate_log_path

@router.delete("/{name}/logpath", status_code=status.HTTP_204_NO_CONTENT)
async def delete_log_path(
    name: str,
    log_path: str = Query(...),
) -> None:
    try:
        validate_log_path(log_path)
    except ValueError as e:
        raise HTTPException(
            status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
            detail=str(e),
        ) from e
    # ... rest of handler

Key points:

  • Use mode="after" in field validators so the check runs after Pydantic's basic type coercion.
  • Raise ValueError if validation fails. Pydantic wraps it in a ValidationError, which FastAPI returns as an HTTP 422 response for request bodies.
  • For query parameters that cannot use Pydantic validators, call the validate_log_path() helper and raise HTTP 422 yourself.
  • Never use string prefix matching for path validation (e.g., path.startswith("/var/log")). The helper compares paths component by component with pathlib (Path.is_relative_to()), which rejects bypasses like /var/log_evil/file.log.
  • Symlinks are resolved before validating to prevent symlink-based escapes.

6. Async Rules

  • Never call blocking / synchronous I/O in an async function — no time.sleep(), no synchronous file reads, no requests.get().

  • Use aiohttp.ClientSession for HTTP calls, aiosqlite for database access.

  • Use asyncio.TaskGroup (Python 3.11+) when you need to run independent coroutines concurrently.

  • Long-running startup/shutdown logic goes into the FastAPI lifespan context manager.

  • Never call db.commit() inside a loop. With aiosqlite, every commit serialises through a background thread and forces an fsync. N rows × 1 commit = N fsyncs. Accumulate all writes in the loop, then issue a single db.commit() once after the loop ends. The difference between 5,000 commits and 1 commit can be seconds vs milliseconds.

    # Good — one commit for the whole batch
    for ip, info in results.items():
        await db.execute(INSERT_SQL, (ip, info.country_code, ...))
    await db.commit()  # ← single fsync
    
    # Bad — one fsync per row
    for ip, info in results.items():
        await db.execute(INSERT_SQL, (ip, info.country_code, ...))
        await db.commit()  # ← fsync on every iteration
    
  • Prefer executemany() over calling execute() in a loop when inserting or updating multiple rows with the same SQL template. aiosqlite passes the entire batch to SQLite in one call, reducing Python↔thread overhead on top of the single-commit saving.

    # Good
    await db.executemany(INSERT_SQL, [(ip, info.country_code, ...) for ip, info in results.items()])
    await db.commit()
    
  • Shared resources (DB connections, HTTP sessions) are created once during startup and closed during shutdown — never inside request handlers.

from contextlib import asynccontextmanager
from collections.abc import AsyncGenerator
from fastapi import FastAPI
import aiohttp
import aiosqlite

@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncGenerator[None]:
    # Startup
    app.state.http_session = aiohttp.ClientSession()
    app.state.db = await aiosqlite.connect("bangui.db")
    yield
    # Shutdown
    await app.state.http_session.close()
    await app.state.db.close()

6.1 Database Query Conventions

LIKE Queries and Wildcard Escaping

SQLite's LIKE operator treats % (any sequence of characters) and _ (any single character) as wildcards. When querying with user-supplied filters that may contain these characters, you must escape them to prevent unintended matches.

The Problem:

# Bad — ip_filter="10.0.0_" matches "10.0.0.1", "10.0.0.2", etc.
ip_filter = "10.0.0_"
await db.execute(
    "SELECT * FROM bans WHERE ip LIKE ?",
    (f"{ip_filter}%",)  # ← wildcard characters not escaped
)

The Solution:

Use the escape_like() helper from app.utils.fail2ban_db_utils:

from app.utils.fail2ban_db_utils import escape_like

# Good — wildcard characters are escaped
ip_filter = "10.0.0_"
await db.execute(
    "SELECT * FROM bans WHERE ip LIKE ? ESCAPE '\\'",
    (f"{escape_like(ip_filter)}%",)  # ← underscores escaped to literal
)

How escape_like() works:

The function escapes backslashes first, then % and _ signs:

def escape_like(s: str) -> str:
    return s.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

Key rules:

  1. Backslash escapes first — to prevent double-escaping when the input contains backslashes.
  2. Add ESCAPE '\\' to the SQL. This tells SQLite which character to use for escaping. In a regular Python string literal '\\' is a single backslash, so SQLite receives ESCAPE '\'.
  3. Dots are not wildcards — they do not need escaping; normal IP addresses pass through unchanged.

Test example:

assert escape_like("10.0.0_") == "10.0.0\\_"
assert escape_like("10.0.0%test") == "10.0.0\\%test"
assert escape_like("10.0.0.1") == "10.0.0.1"  # Unchanged
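You can check the effect of escaping directly against SQLite. This sketch uses the synchronous stdlib sqlite3 module and an in-memory table only to demonstrate the LIKE semantics. Application code keeps using aiosqlite as described in § 6. The table layout and find_ip_prefix() are hypothetical.

```python
import sqlite3


def escape_like(s: str) -> str:
    # Same logic as the helper shown above: backslash first, then % and _.
    return s.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def find_ip_prefix(conn: sqlite3.Connection, prefix: str, escape: bool = True) -> list[str]:
    """Return banned IPs starting with `prefix`, optionally without escaping."""
    pattern: str = f"{escape_like(prefix) if escape else prefix}%"
    rows = conn.execute(
        "SELECT ip FROM bans WHERE ip LIKE ? ESCAPE '\\' ORDER BY ip",
        (pattern,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bans (ip TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO bans (ip) VALUES (?)",
    [("10.0.0.1",), ("10.0.0.2",), ("10.0.0_x",)],
)
```

Without escaping, the "_" in "10.0.0_" matches any character, so all three rows come back. With escape_like() only the literal "10.0.0_x" matches.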

7. Logging

  • Use structlog for every log message.
  • Bind contextual key-value pairs — never format strings manually.
  • Log levels: debug for development detail, info for operational events, warning for recoverable issues, error for failures, critical for fatal problems.
  • Never log sensitive data (passwords, tokens, session tokens, raw credentials, private keys).
    • For session correlation without leaking token material, use a one-way hash fragment: hashlib.sha256(token.encode()).hexdigest()[:12].
    • Use numeric database IDs for entity correlation instead of raw identifiers: session_id=session.id instead of token=session.token.
import hashlib

import aiosqlite
import structlog

log: structlog.stdlib.BoundLogger = structlog.get_logger()

async def ban_ip(ip: str, jail: str) -> None:
    log.info("banning_ip", ip=ip, jail=jail)
    try:
        await _execute_ban(ip, jail)
        log.info("ip_banned", ip=ip, jail=jail)
    except BanError as exc:
        log.error("ban_failed", ip=ip, jail=jail, error=str(exc))
        raise

async def logout_session(db: aiosqlite.Connection, token: str) -> None:
    # Use a one-way hash for token correlation in logs
    token_hash = hashlib.sha256(token.encode()).hexdigest()[:12]
    await session_repo.delete_session(db, token)
    log.info("session_terminated", token_hash=token_hash)

8. Error Handling

  • Define custom exception classes for domain errors (e.g., JailNotFoundError, BanFailedError).
  • Catch specific exceptions — never bare except: or except Exception: without re-raising.
  • Map domain exceptions to HTTP status codes via FastAPI exception handlers registered on the app.
  • Always log errors with context before raising.
from fastapi import Request
from fastapi.responses import JSONResponse

class JailNotFoundError(Exception):
    def __init__(self, name: str) -> None:
        self.name: str = name
        super().__init__(f"Jail '{name}' not found")

# In main.py
@app.exception_handler(JailNotFoundError)
async def jail_not_found_handler(request: Request, exc: JailNotFoundError) -> JSONResponse:
    return JSONResponse(status_code=404, content={"detail": f"Jail '{exc.name}' not found"})

Routers and Exception Propagation

  • Routers must NOT construct HTTPException for domain errors — let domain exceptions propagate.
  • Routers should never have helper functions like _bad_gateway(), _not_found(), _conflict() etc. that convert domain exceptions to HTTPException.
  • All domain exception types must have corresponding handlers registered in main.py via app.add_exception_handler().
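  • Register handlers from most specific to least specific for readability. Dispatch does not depend on that order: Starlette walks the raised exception's class hierarchy (MRO) and uses the handler of the closest registered class, so a JailNotFoundError handler wins over a generic base-class handler either way.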
# ❌ BAD — routers constructing HTTPException for domain exceptions
@router.get("/{name}")
async def get_jail(name: str, socket_path: Fail2BanSocketDep) -> JailDetailResponse:
    try:
        return await jail_service.get_jail(socket_path, name)
    except JailNotFoundError:
        raise HTTPException(status_code=404, detail=f"Jail not found: {name!r}") from None

# ✅ GOOD — domain exception propagates to global handler
@router.get("/{name}")
async def get_jail(name: str, socket_path: Fail2BanSocketDep) -> JailDetailResponse:
    return await jail_service.get_jail(socket_path, name)

All domain exceptions raised by services propagate to handlers in main.py, ensuring:

  1. Consistent error response format across the entire API.
  2. No duplicated exception-to-HTTP-status mapping logic.
  3. Easy to audit all error codes — they are all in one place.

9. Testing

  • Every new feature or bug fix must include tests.
  • Tests live in tests/ mirroring the app/ structure.
  • Use pytest with pytest-asyncio for async tests.
  • Use httpx.AsyncClient to test FastAPI endpoints (not TestClient which is sync).
  • Mock external dependencies (fail2ban socket, aiohttp calls) — tests must never touch real infrastructure.
  • Aim for >80 % line coverage — critical paths (auth, banning, scheduling) must be 100 %.
  • Test names follow test_<unit>_<scenario>_<expected> pattern.
from collections.abc import AsyncIterator

import pytest
import pytest_asyncio
from httpx import ASGITransport, AsyncClient

from app.main import create_app

@pytest_asyncio.fixture
async def client() -> AsyncIterator[AsyncClient]:
    app = create_app()
    transport: ASGITransport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as ac:
        yield ac

@pytest.mark.asyncio
async def test_list_jails_returns_200(client: AsyncClient) -> None:
    response = await client.get("/api/jails/")
    assert response.status_code == 200
    data: dict[str, object] = response.json()
    assert "jails" in data
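Here is a minimal sketch of mocking the fail2ban client with unittest.mock.AsyncMock, so a service can be tested without a socket. count_banned and Fail2BanSender are hypothetical. The (return_code, data) reply shape matches the one described in § 11. In the real suite, pytest-asyncio would collect the test (with @pytest.mark.asyncio). The decorator is omitted here so the block runs without pytest.

```python
from typing import Protocol
from unittest.mock import AsyncMock


class Fail2BanSender(Protocol):
    async def send(self, command: list[str]) -> tuple[int, list[str]]: ...


async def count_banned(client: Fail2BanSender, jail: str) -> int:
    """Hypothetical service function: number of IPs currently banned in a jail."""
    code, payload = await client.send(["get", jail, "banip"])
    if code != 0:
        raise ValueError(f"fail2ban error: {payload!r}")
    return len(payload)


async def test_count_banned_two_ips_returns_2() -> None:
    # AsyncMock attributes are awaitable mocks, so `await fake_client.send(...)` works.
    fake_client = AsyncMock()
    fake_client.send.return_value = (0, ["192.0.2.1", "192.0.2.2"])
    assert await count_banned(fake_client, "sshd") == 2
    fake_client.send.assert_awaited_once_with(["get", "sshd", "banip"])
```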

9.1 Background Tasks and Scheduler Architecture

BanGUI uses APScheduler 4.x (async mode) to manage background jobs that execute on a schedule without user interaction. This section documents how to write and register background tasks.

Task Location and Structure

All background tasks live in backend/app/tasks/ as separate modules. Each task:

  • Exports a register(app: FastAPI) -> None or async def register(app: FastAPI) -> None function.
  • Opens its own database connection using app.db.open_db() or the task_db() helper.
  • Closes connections when work completes (use the async context manager pattern).
  • Runs independently of the FastAPI request/response cycle.

Example Task

# backend/app/tasks/my_task.py
import structlog
from fastapi import FastAPI
from apscheduler.schedulers.asyncio import AsyncIOScheduler

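log: structlog.stdlib.BoundLogger = structlog.get_logger()

async def my_background_job(app: FastAPI) -> None:
    """Do important work on a schedule."""
    log.info("my_background_job_started")
    try:
        db = await app.db.open_db(app.state.settings.database_path)
        try:
            # Do work...
            pass
        finally:
            await db.close()
    except Exception:
        # Deliberate catch-all, an exception to the rule in § 8: log the failure
        # with its traceback via structlog and let the next scheduled run retry,
        # instead of letting the error escape into the scheduler.
        log.error("my_background_job_failed", exc_info=True)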

def register(app: FastAPI) -> None:
    """Register the job with the scheduler."""
    scheduler: AsyncIOScheduler = app.state.scheduler
    scheduler.add_job(
        my_background_job,
        args=(app,),
        trigger="interval",
        seconds=60,
        id="my_task",
        name="My Background Job",
    )

Accessing Shared Resources in Tasks

Since tasks do not have access to Depends(get_db) (no request scope), they must:

  1. Open their own DB connection via app.state.db_factory.open_db(path).
  2. Access app-level state: app.state.http_session, app.state.geo_cache, app.state.settings, etc.
  3. Use structlog for all logging (never print()).

Single-Worker Requirement

The scheduler is bound to a single asyncio event loop and cannot be shared across multiple worker processes. BanGUI enforces single-worker mode to prevent duplicate task execution.

  • Deployment constraint: Set BANGUI_WORKERS=1 (default).
  • Startup validation: startup_shared_resources() raises RuntimeError if BANGUI_WORKERS > 1.
  • See Architekture.md § 9.2 for full details.
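Here is a minimal sketch of the kind of check described above, assuming BANGUI_WORKERS is read as a plain integer from the environment. The function name check_single_worker and the rejection of values other than 1 (including 0) are assumptions. The real check lives in startup_shared_resources().

```python
import os
from collections.abc import Mapping


def check_single_worker(environ: Mapping[str, str] = os.environ) -> int:
    """Fail fast unless exactly one worker is configured."""
    raw: str = environ.get("BANGUI_WORKERS", "1")
    try:
        workers = int(raw)
    except ValueError as exc:
        raise RuntimeError(f"BANGUI_WORKERS must be an integer, got {raw!r}") from exc
    if workers != 1:
        # Each worker would start its own scheduler and run every job again.
        raise RuntimeError(f"BanGUI requires exactly one worker (BANGUI_WORKERS={workers})")
    return workers
```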

10. Code Style & Tooling

| Tool | Purpose |
| --- | --- |
| Ruff | Linter and formatter (replaces black, isort, flake8). |
| mypy or pyright | Static type checking in strict mode. |
| pre-commit | Run ruff + type checker before every commit. |
  • Line length: 120 characters max.
  • Strings: use double quotes (").
  • Imports: sorted by ruff (stdlib, then third-party, then local), with one module per import statement (import os and import sys on separate lines, never import os, sys).
  • No unused imports, no unused variables, no # type: ignore without explanation.
  • Docstrings in Google style on every public function, class, and module.

11. fail2ban Response Utilities

All services that interact with the fail2ban daemon must use the canonical response parsing utilities from app.utils.fail2ban_response. This ensures consistent error handling, type safety, and makes it easy to fix bugs in response handling across the entire codebase.

Available Functions

ok(response: object) -> object
Extracts the payload from a fail2ban (return_code, data) response tuple.

  • Raises ValueError if return code ≠ 0 or response shape is invalid.
  • Use this on every response from Fail2BanClient.send().

to_dict(pairs: object) -> dict[str, object]
Converts a list of (key, value) pairs (fail2ban's native response format) to a Python dict.

  • Silently ignores malformed entries and non-list/tuple inputs.
  • Always returns a dict (empty if input is invalid).

ensure_list(value: object | None) -> list[str]
Coerces fail2ban response values (which may be None, a single string, or a list) to a normalized list of strings.

  • Handles all three cases consistently.
  • Returns empty list for None or empty strings.

is_not_found_error(exc: Exception) -> bool
Checks if an exception indicates a jail does not exist.

  • Checks for multiple error message patterns (case-insensitive).
  • Use this to distinguish "jail not found" errors from other failures.

Example Usage

from app.utils.fail2ban_response import ok, to_dict, ensure_list, is_not_found_error
from app.utils.fail2ban_client import Fail2BanClient

client = Fail2BanClient(socket_path="/var/run/fail2ban/fail2ban.sock")

try:
    # Get jail status
    response = await client.send(["status", "sshd", "short"])
    status_dict = to_dict(ok(response))  # Extract payload and convert to dict
    
    # Get list of banned IPs
    ban_response = await client.send(["get", "sshd", "banip"])
    banned_ips = ensure_list(ok(ban_response))  # Normalize to list of strings
    
except ValueError as exc:
    if is_not_found_error(exc):
        raise JailNotFoundError("sshd") from exc
    raise

Why This Matters

Before this utility module, every service implemented its own copy of these functions, leading to:

  • Code duplication across 7+ service files.
  • Subtle inconsistencies in error handling.
  • Difficult maintenance — every bug fix required touching multiple files.

Now, all services import from a single authoritative source, making response handling consistent, maintainable, and type-safe.


12. Configuration & Secrets

  • All configuration lives in environment variables loaded through pydantic-settings.
  • Secrets (master password hash, session key) are never committed to the repository.
  • Provide a .env.example with all keys and placeholder values.
  • Validate config at startup — fail fast with a clear error if a required value is missing.
from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    database_path: str = Field("bangui.db", description="Path to SQLite database")
    fail2ban_socket: str = Field("/var/run/fail2ban/fail2ban.sock", description="fail2ban socket path")
    session_secret: str = Field(..., min_length=32, description="Secret key for session signing (min. 32 characters)")
    log_level: str = Field("info", description="Logging level")

    model_config = SettingsConfigDict(env_prefix="BANGUI_", env_file=".env")

Session Secret Configuration

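The session_secret is the HMAC key used to sign all session tokens. It must be at least 32 characters long, which gives HMAC-SHA256 at least 32 bytes (256 bits) of key material. The actual strength depends on how the secret is generated, so use a random value as shown below.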

Minimum Length: 32 characters

Why 32 characters? Session tokens are signed using HMAC-SHA256, whose security is bounded by the secret's entropy. An attacker can brute-force a short or guessable secret offline from a single captured token and then forge valid tokens. The constraint is enforced at startup: the application refuses to start if session_secret is shorter than 32 characters.

Generation: Generate a secure secret using Python:

python -c "import secrets; print(secrets.token_hex(32))"

This produces a 64-character hexadecimal string (256 bits) suitable for production use.

Environment Variable:

BANGUI_SESSION_SECRET="your-32-character-minimum-secret-here"

Never commit the actual secret to the repository. Provide a .env.example with a placeholder:

# .env.example
BANGUI_SESSION_SECRET="set-this-to-a-32-character-minimum-secret"

Session Cookie Secure Flag

The session_cookie_secure setting controls the Secure flag on the session cookie. This flag prevents browsers from sending the session cookie over unencrypted HTTP.

Default: true — Production deployments are secure by default. Cookies are only sent over HTTPS.

Local Development: Set BANGUI_SESSION_COOKIE_SECURE=false in your compose file or .env to allow cookies over HTTP (required for localhost:8000).

# Docker/compose.debug.yml
environment:
  BANGUI_SESSION_COOKIE_SECURE: "false"  # Allow HTTP during local development

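Important: With Secure=true, browsers will not store or send the session cookie when the backend is served over plain HTTP. In production, terminate TLS at nginx (or your reverse proxy) and pass X-Forwarded-Proto: https so FastAPI knows the original connection was secure. If the backend runs under Uvicorn, it only honours that header when proxy headers are enabled and the proxy's address is listed in --forwarded-allow-ips.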

fail2ban_start_command Configuration

The fail2ban_start_command setting specifies the shell command used to start the fail2ban daemon during recovery operations (e.g., after a rollback).

Format & Parsing:

  • The command is split into arguments using shlex.split(), which respects shell quoting rules.
  • Paths with spaces must be quoted. Example: "/opt/my tools/fail2ban-client" start.
  • The command is not executed through a shell — no shell variables or globbing are interpreted.

Validation:

  • The command is validated at startup using shlex.split(). Mismatched quotes will raise a ValueError with the problematic command in the error message.

Environment Variables:

BANGUI_FAIL2BAN_START_COMMAND="fail2ban-client start"           # Default
BANGUI_FAIL2BAN_START_COMMAND="systemctl start fail2ban"        # systemd
BANGUI_FAIL2BAN_START_COMMAND='"/opt/my tools/fail2ban" start'  # Quoted path

Common Pitfall: Using .split() instead of shlex.split() would break commands with spaces in paths. Always use quoted strings for paths that contain whitespace.
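Here is a minimal sketch of the parse-and-validate step, assuming validation simply wraps shlex.split(). parse_start_command is a hypothetical name. Rejecting an empty command is an added assumption.

```python
import shlex


def parse_start_command(command: str) -> list[str]:
    """Split fail2ban_start_command into argv using shell quoting rules."""
    try:
        argv: list[str] = shlex.split(command)
    except ValueError as exc:  # e.g. "No closing quotation"
        raise ValueError(f"Invalid fail2ban_start_command {command!r}: {exc}") from exc
    if not argv:
        raise ValueError("fail2ban_start_command must not be empty")
    return argv
```

The resulting list can be passed to asyncio.create_subprocess_exec(*argv). That call starts the program directly without a shell, matching the no-shell behaviour described above.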

Log Path Validation & Allowlisting

Authenticated users can instruct fail2ban to monitor additional log files through the API endpoint POST /api/config/jails/{name}/logpath. To prevent path-traversal attacks and unauthorized reads of sensitive system files, all requested log paths must resolve to locations within a configurable allowlist of safe directories.

Allowed Directories:

  • Configured via the BANGUI_ALLOWED_LOG_DIRS environment variable (comma-separated list).
  • Defaults to: ["/var/log", "/config/log"].

Path Validation Rules:

  1. The requested path is resolved to its canonical form using Path(log_path).resolve(), which:
    • Expands relative paths to absolute paths.
    • Resolves symbolic links to their real targets.
    • Normalizes . and .. components.
  2. The resolved path is checked using Path.is_relative_to() against each allowed directory prefix.
  3. If the resolved path is not relative to any allowed directory, a ValueError is raised with a descriptive error message.

Implementation:

  • Validation occurs in the Pydantic model AddLogPathRequest using a @field_validator.
  • The validator runs at request time, before the service layer is invoked.
  • Symlinks that escape allowed directories are rejected (see symlink bypass tests).

Important: Use is_relative_to(), not startswith() or string prefix matching. The latter is bypassable with paths like /var/log_evil/file.log.
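The rules above fit in a few lines of pathlib. The following is an illustration under stated assumptions, not the code in app.utils.path_utils. validate_log_path_sketch takes the allowlist as an argument instead of reading BANGUI_ALLOWED_LOG_DIRS. It also resolves the allowlist entries and returns the resolved path.

```python
from pathlib import Path


def validate_log_path_sketch(log_path: str, allowed_dirs: list[str]) -> str:
    """Return the canonical path if it lies inside an allowed directory.

    Raises ValueError otherwise.
    """
    # resolve(): absolute path, symlinks followed, "." and ".." collapsed.
    resolved: Path = Path(log_path).resolve()
    for allowed in allowed_dirs:
        # Resolve the allowlist entry too, so a symlinked base directory
        # (e.g. /tmp -> /private/tmp on macOS) does not cause false rejections.
        if resolved.is_relative_to(Path(allowed).resolve()):
            return str(resolved)
    raise ValueError(
        f"Log path {log_path!r} resolves to {str(resolved)!r}, "
        f"which is outside the allowed directories {allowed_dirs}"
    )
```

is_relative_to() compares whole path components. So /var/log_evil/file.log is not inside /var/log, even though the string starts with "/var/log".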

Environment Variables:

BANGUI_ALLOWED_LOG_DIRS="/var/log,/config/log"                    # Default
BANGUI_ALLOWED_LOG_DIRS="/var/log,/config/log,/home/app/logs"     # Custom directory

Log Target Validation (fail2ban)

The log_target field on the global config endpoint (PUT /api/config/global) is critical for security because fail2ban runs as root. Users can only set log targets to:

  1. Special values: STDOUT, STDERR, SYSLOG (case-insensitive)
  2. File paths: Must resolve to one of the configured allowed directories (same allowlist as log paths)

Why This Matters:

  • fail2ban creates/opens files with root privileges. Without validation, an attacker could write to arbitrary system paths (e.g., /etc/cron.d/malicious_script).
  • Validation occurs at both the Pydantic model layer (GlobalConfigUpdate.validate_log_target()) and the service layer (update_global_config()) for defense in depth.
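  • The service-layer check also covers callers that reach update_global_config() without going through the HTTP request model. An unsafe path cannot slip through when model validation is bypassed or skipped.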

Implementation:

# Model layer: Automatic validation via @field_validator
update = GlobalConfigUpdate(log_target="/etc/passwd")  # Raises ValidationError → HTTP 422

# Service layer: Defense in depth
await config_service.update_global_config(socket_path, update)  # Validates again before sending to fail2ban

Login Rate Limiting

The login endpoint (POST /api/auth/login) is protected against brute-force attacks using an in-memory rate limiter.

Design:

  • Uses a dict[str, deque[float]] keyed by client IP, storing login attempt timestamps within a time window.
  • Attempts outside the window are automatically removed during validation checks.
  • Expired IP entries are cleaned up to prevent unbounded memory growth.

Rate Limit Rules:

  • 5 attempts per 60 seconds per IP address.
  • Requests exceeding the limit return HTTP 429 Too Many Requests with a Retry-After header.
  • Each failed login triggers a 10-second server-side delay (asyncio.sleep) to further slow attacks, on top of bcrypt hashing (~100ms).

IP Extraction (Proxy Safety):

  • When behind nginx, the rate limiter reads the real client IP from X-Forwarded-For or X-Real-IP headers.
  • These headers are trusted only when the immediate connection source is in the configured trusted-proxy list.
  • This prevents clients from spoofing the headers to bypass the rate limit.
  • Falls back to the direct connection IP when proxy headers cannot be trusted.

Process-Local Limitation:

  • The rate limiter is process-local (in-memory). In multi-worker deployments (e.g., Gunicorn with 4 workers), each worker keeps its own counters. A client whose requests are spread across N workers could therefore make up to 5 × N attempts per window.
  • This is acceptable only because BanGUI enforces a single-worker deployment. See the TASK-002/003 notes for details.

Implementation:

  • Rate limiter: app.utils.rate_limiter.RateLimiter
  • IP extraction: app.utils.client_ip.get_client_ip()
  • Dependency: LoginRateLimiterDep in app.dependencies

13. File I/O Conventions

All file write operations to critical configuration files must be atomic to prevent corruption if the process is killed mid-write.

Atomic File Writes

Configuration files (e.g., fail2ban jail configs in jail.d/) are essential for system operation. A truncated or corrupt config file can break fail2ban's ability to reload and may disable active protection.

Rule: Always use write-to-temp + atomic rename

Never use Path.write_text() or file.write() directly for critical files. Instead:

  1. Create a temporary file in the same directory as the target (crucial for atomic os.replace()).
  2. Write content to the temp file.
  3. Atomically rename the temp file to replace the target.
  4. Clean up the temp file if an error occurs.

Implementation Pattern:

import contextlib
import os
import tempfile
from pathlib import Path

target = Path("/path/to/config/file.conf")
content: str = "[sshd]\nenabled = true\n"  # example payload

tmp_name: str | None = None
try:
    # Create temp file in target's directory (same filesystem = atomic rename)
    with tempfile.NamedTemporaryFile(
        mode="w",
        encoding="utf-8",
        dir=target.parent,
        delete=False,
        suffix=".tmp",
    ) as tmp:
        # Record the name before writing so a failed write can still be cleaned up
        tmp_name = tmp.name
        tmp.write(content)
    # Atomic rename (single syscall on POSIX systems)
    os.replace(tmp_name, target)
except OSError as exc:
    # Clean up temp file on error
    with contextlib.suppress(OSError):
        if tmp_name is not None:
            os.unlink(tmp_name)
    # ConfigWriteError: the project's exception for failed config writes
    raise ConfigWriteError(f"Cannot write config: {exc}") from exc

Why this matters:

  • Path.write_text() overwrites in place. If the process dies mid-write, the file is left truncated or partially written.
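  • os.replace() maps to a single rename() call on POSIX systems, which is atomic. It works only when source and target are on the same filesystem; across filesystems it fails (EXDEV) rather than falling back to a copy.
  • Creating the temp file in target.parent guarantees that both are on the same filesystem.
  • If the process is killed mid-write, the target is left untouched. Readers see either the complete old file or the complete new file, never a partial one.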
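The same pattern can be packaged as a reusable helper. The sketch below goes one step beyond the rule above, as an assumption rather than a documented requirement. It flushes and fsyncs the temp file before the rename, then fsyncs the directory, so the new contents also survive a power loss and not only a killed process. The name atomic_write_text is hypothetical. Unlike the project's helpers, it re-raises OSError instead of ConfigWriteError.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, content: str) -> None:
    """Replace target with content atomically, fsyncing before and after the rename."""
    tmp_name: str | None = None
    try:
        with tempfile.NamedTemporaryFile(
            mode="w", encoding="utf-8", dir=target.parent, delete=False, suffix=".tmp"
        ) as tmp:
            tmp_name = tmp.name
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # file data is on disk before it becomes visible
        os.replace(tmp_name, target)
        tmp_name = None  # renamed; nothing left to clean up
        # Persist the directory entry change too (POSIX only: directories cannot be opened on Windows).
        dir_fd = os.open(target.parent, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    except OSError:
        if tmp_name is not None:
            with contextlib.suppress(OSError):
                os.unlink(tmp_name)
        raise
```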

Files requiring atomic writes:

  • All config files under jail.d/ (created/modified by _write_conf_file and _create_conf_file)
  • Any critical state files that fail2ban relies on

Examples in the codebase:

  • app/services/config_file_helpers.py: _write_conf_file, _create_conf_file
  • app/services/jail_config_service.py: _write_local_file_sync, _restore_local_file_sync

14. Git & Workflow

  • Branch naming: feature/<short-description>, fix/<short-description>, chore/<short-description>.
  • Commit messages: imperative mood, first line at most 72 characters (Add jail reload endpoint, Fix ban history query).
  • Every pull request must pass: ruff, type checker, all tests.
  • Do not merge with failing CI.
  • Keep pull requests small and focused — one feature or fix per PR.

15. Coding Principles

These principles are non-negotiable. Every backend contributor must internalise and apply them daily.

15.1 Clean Code

  • Write code that reads like well-written prose — a new developer should understand intent without asking.
  • Meaningful names — variables, functions, and classes must reveal their purpose. Avoid abbreviations (cnt, mgr, tmp) unless universally understood.
  • Small functions — each function does exactly one thing. If you need a comment to explain a block inside a function, extract it into its own function.
  • No magic numbers or strings — use named constants.
  • Boy Scout Rule — leave every file cleaner than you found it.
  • Avoid deep nesting — prefer early returns (guard clauses) to keep the happy path at the top indentation level.
# Good — guard clause, clear name, one job
async def get_active_ban(ip: str, jail: str) -> Ban:
    ban: Ban | None = await repo.find_ban(ip=ip, jail=jail)
    if ban is None:
        raise BanNotFoundError(ip=ip, jail=jail)
    if ban.is_expired():
        raise BanExpiredError(ip=ip, jail=jail)
    return ban

# Bad — nested, vague name
async def check(ip, j):
    b = await repo.find_ban(ip=ip, jail=j)
    if b:
        if not b.is_expired():
            return b
        else:
            raise Exception("expired")
    else:
        raise Exception("not found")

15.2 Separation of Concerns (SoC)

  • Each module, class, and function must have a single, well-defined responsibility.
  • Routers → HTTP layer only (parse requests, return responses).
  • Services → business logic and orchestration.
  • Repositories → data access and persistence.
  • Models → data shapes and validation.
  • Tasks → scheduled background jobs.
  • Never mix layers — a router must not execute SQL, and a repository must not raise HTTPException.

15.3 Single Responsibility Principle (SRP)

  • A class or module should have one and only one reason to change.
  • If a service handles both ban management and email notifications, split it into BanService and NotificationService.

15.4 Don't Repeat Yourself (DRY)

  • Extract shared logic into utility functions, base classes, or dependency providers.
  • If the same block of code appears in more than one place, refactor it into a single source of truth.
  • But don't over-abstract — premature DRY that couples unrelated features is worse than a little duplication (see Rule of Three: refactor when something appears a third time).

15.5 KISS — Keep It Simple, Stupid

  • Choose the simplest solution that works correctly.
  • Avoid clever tricks, premature optimisation, and over-engineering.
  • If a standard library function does the job, prefer it over a custom implementation.

15.6 YAGNI — You Aren't Gonna Need It

  • Do not build features, abstractions, or config options "just in case".
  • Implement what is required now. Extend later when a real need emerges.

15.7 Dependency Inversion Principle (DIP)

  • High-level modules (services) must not depend on low-level modules (repositories) directly. Both should depend on abstractions (protocols / interfaces).
  • Use FastAPI's Depends() to inject implementations — this makes swapping and testing trivial.
from typing import Protocol

class BanRepository(Protocol):
    async def find_ban(self, ip: str, jail: str) -> Ban | None: ...
    async def save_ban(self, ban: Ban) -> None: ...

class SqliteBanRepository:
    """Concrete implementation — depends on aiosqlite."""
    async def find_ban(self, ip: str, jail: str) -> Ban | None: ...
    async def save_ban(self, ban: Ban) -> None: ...

15.7.1 Repository Module Pattern: Module-as-Protocol Structural Compatibility

BanGUI uses module-level functions for repository implementations, not classes. Each repository module (e.g., session_repo.py, blocklist_repo.py) exports async functions that match the signatures defined in the Protocol interface in protocols.py. This is a structural typing pattern — mypy accepts the module as a valid Protocol implementation because the function signatures match, even though the module is not explicitly annotated as implementing the Protocol.

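This approach works with FastAPI's dependency injection. The provider returns the module object itself, typed as the Protocol; dependencies.py additionally wraps such returns in cast(), as discussed below: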

# In app/repositories/session_repo.py
async def create_session(db: aiosqlite.Connection, token: str, created_at: str, expires_at: str) -> Session:
    """Insert a new session row."""
    ...

# In app/repositories/protocols.py
class SessionRepository(Protocol):
    async def create_session(
        self,
        db: aiosqlite.Connection,
        token: str,
        created_at: str,
        expires_at: str,
    ) -> Session:
        ...

# In app/dependencies.py
async def get_session_repo() -> SessionRepository:
    """Provide the concrete session repository implementation."""
    from app.repositories import session_repo
    return session_repo  # ← mypy accepts this because the module has matching functions

Why this pattern is used:

  • Simplicity — no boilerplate class/instance wrapping.
  • Compatibility — Python's structural typing (PEP 544) means the module automatically satisfies the Protocol interface if function signatures match.
  • Testability — the same DIP principle applies; services depend on the Protocol, not the module directly, so tests can mock the Protocol.

Risks and mitigations:

  • Silent breakage if function signatures change. If a parameter is added to or removed from a module function, the module no longer satisfies the Protocol. mypy reports this where the module is returned or assigned as the Protocol type directly. It does not report it where the value is wrapped in cast(), because cast() tells mypy to accept the stated type without checking it. Protocol signatures in protocols.py are therefore the source of truth: always check that module functions match the Protocol definitions before merging changes. The CI/CD pipeline validates this compatibility at build time.

How the validation works (CI check):

  • Before each deployment, run mypy --strict to ensure all dependency providers return values compatible with their Protocol types. This catches mismatches only for providers that do not use cast().
  • The cast() calls in dependencies.py are a documented signal that structural compatibility is verified externally (by review and CI), not via explicit class inheritance or by mypy at that call site.
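A cast() hides mismatches from mypy, so a cheap runtime test can complement the CI check. The helper below is a hypothetical sketch, not part of BanGUI. It uses inspect.signature() to compare a repository module against its Protocol. It assumes both files use the same annotation style, since string annotations (from `from __future__ import annotations`) are compared as text.

```python
import inspect
import types


def protocol_mismatches(module: types.ModuleType, protocol: type) -> list[str]:
    """List Protocol methods that the module's functions do not match.

    Compares each public method declared on the Protocol (minus self) with the
    module-level function of the same name: parameters, return annotation, and
    whether both are coroutine functions. An empty list means they match.
    """
    problems: list[str] = []
    for name, member in vars(protocol).items():
        if name.startswith("_") or not inspect.isfunction(member):
            continue
        func = getattr(module, name, None)
        if func is None or not callable(func):
            problems.append(f"{name}: missing from {module.__name__}")
            continue
        expected = inspect.signature(member)
        actual = inspect.signature(func)
        # Drop the Protocol's `self` so the parameter lists are comparable.
        expected_params = list(expected.parameters.values())[1:]
        if expected_params != list(actual.parameters.values()):
            problems.append(f"{name}: parameters differ")
        if expected.return_annotation != actual.return_annotation:
            problems.append(f"{name}: return type differs")
        if inspect.iscoroutinefunction(member) != inspect.iscoroutinefunction(func):
            problems.append(f"{name}: async/sync mismatch")
    return problems
```

In a pytest test this could be used as `assert protocol_mismatches(session_repo, SessionRepository) == []`.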

15.7.2 Session Cache Pluggability: Process-Local vs. Shared Backends

Session validation is expensive (SQLite lookup + password verification). To improve performance, validated session tokens are cached using the SessionCache interface (app.utils.session_cache). The default implementation, InMemorySessionCache, stores cached sessions in process-local memory.

Current implementation (single-worker):

from app.utils.session_cache import SessionCache, InMemorySessionCache, NoOpSessionCache

class SessionCache(Protocol):
    """Interface for session token validation cache backends."""
    def get(self, token: str) -> Session | None: ...
    def set(self, token: str, session: Session, ttl_seconds: float) -> None: ...
    def invalidate(self, token: str) -> None: ...
    def clear(self) -> None: ...

# Default in-memory implementation — PROCESS-LOCAL
class InMemorySessionCache:
    def __init__(self) -> None:
        self._entries: dict[str, tuple[Session, float]] = {}

Single-worker constraint:

InMemorySessionCache is process-local — each worker process has its own dict. In single-worker mode (enforced by TASK-002), this is safe and improves performance. In multi-worker deployments:

  • A logout handled by worker A removes the session from A's cache, but worker B still holds it. Requests routed to B therefore remain authenticated until B's cache entry expires.
  • Enabling/disabling the cache requires restarting all workers to take effect.

Multi-worker solution:

To support multiple workers (future enhancement), implement a shared backend behind the same SessionCache Protocol:

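# Example Redis implementation (not yet in codebase)
# redis-py >= 4.2 ships the asyncio client; the standalone aioredis package is deprecated
from redis import asyncio as aioredis

SESSION_KEY_PREFIX = "session:"


class RedisSessionCache:
    """Session cache backed by Redis (Session is the app's session model)."""
    def __init__(self, redis_url: str) -> None:
        self.client: aioredis.Redis = aioredis.from_url(redis_url)

    async def get(self, token: str) -> Session | None:
        data = await self.client.get(f"{SESSION_KEY_PREFIX}{token}")
        return Session.model_validate_json(data) if data else None

    async def set(self, token: str, session: Session, ttl_seconds: float) -> None:
        await self.client.setex(
            f"{SESSION_KEY_PREFIX}{token}",
            int(ttl_seconds),
            session.model_dump_json(),
        )

    async def invalidate(self, token: str) -> None:
        await self.client.delete(f"{SESSION_KEY_PREFIX}{token}")

    async def clear(self) -> None:
        # Delete only session keys; flushdb() would wipe every key in the database
        async for key in self.client.scan_iter(match=f"{SESSION_KEY_PREFIX}*"):
            await self.client.delete(key)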

To adopt a Redis backend:

  1. Create RedisSessionCache in app.utils.session_cache.
  2. Update app.utils.runtime_state.set_runtime_settings() to instantiate RedisSessionCache when REDIS_URL env var is set.
  3. Update app.config.Settings to accept optional REDIS_URL.
  4. Tests continue to use InMemorySessionCache (no Redis dependency in dev).

Implementation rules:

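  • All cache methods must be async (even if the backend is sync). The Protocol shown above still declares synchronous methods, so it and its callers must be migrated to async def before an async backend such as the Redis example can satisfy it.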
  • Never log session tokens or session data.
  • TTL must be respected — expired entries must be removed on access.
  • See app/utils/session_cache.py for the full Protocol definition and current implementations.

15.8 Composition over Inheritance

  • Favour composing small, focused objects over deep inheritance hierarchies.
  • Use mixins or protocols only when a clear "is-a" relationship exists; otherwise, pass collaborators as constructor arguments.

15.9 Fail Fast

  • Validate inputs as early as possible — at the API boundary with Pydantic, at service entry with assertions or domain checks.
  • Raise specific exceptions immediately rather than letting bad data propagate silently.

15.10 Law of Demeter (Principle of Least Knowledge)

  • A function should only call methods on:
    1. Its own object (self).
    2. Objects passed as parameters.
    3. Objects it creates.
  • Avoid long accessor chains like request.state.db.cursor().execute(...) — wrap them in a meaningful method.

15.11 Defensive Programming

  • Never trust external input — validate and sanitise everything that crosses a boundary (HTTP request, file, socket, environment variable).
  • Handle edge cases explicitly: empty lists, None values, negative numbers, empty strings.
  • Use type narrowing and exhaustive pattern matching (match / case) to eliminate impossible states.

15.12 SSRF Prevention (Server-Side Request Forgery)

When user-supplied URLs are fetched by the backend, validate them before making any HTTP requests:

  1. Use Pydantic's AnyHttpUrl type to restrict schemes to http:// and https:// only.

    • Rejects file://, ftp://, gopher://, and other non-http schemes at the model boundary.
  2. Validate resolved IP addresses before fetching:

    • Parse the hostname and resolve it via DNS (using socket.getaddrinfo()).
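    • Use the ipaddress.ip_address() properties to reject private and reserved ranges. is_private alone does not cover every case (multicast addresses, for example, are not private), so combine it with is_loopback, is_link_local, is_multicast, is_reserved, and is_unspecified. The ranges to reject include: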
      • RFC 1918: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
      • Loopback: 127.0.0.0/8, ::1/128
      • Link-local: 169.254.0.0/16, fe80::/10
      • IPv6 site-local, multicast, and reserved ranges.
    • Raise ValueError if validation fails; let the router convert it to HTTP 400.
  3. Guard against DNS rebinding:

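    • Validate the resolved addresses when the URL is accepted (before it is persisted). This check alone is not sufficient. A hostname under attacker control can resolve to a public address during validation and to a private one when the blocklist is fetched later.
    • Re-validate at HTTP client time as well. One option is to pass aiohttp.TCPConnector a custom resolver that rejects private addresses, so the address actually connected to is the one that was checked.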
  4. Example implementation (see backend/app/utils/ip_utils.py):

    • is_private_ip(ip_str: str) → bool: Checks if IP is private/reserved/loopback/link-local.
    • async validate_blocklist_url(url: AnyHttpUrl) → None: Async DNS resolution + private IP check.
    • Service layer calls await validate_blocklist_url(url) before persisting; router catches ValueError and returns 400.
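The two helpers named above can be sketched with the standard library. This is an illustration, not the code in backend/app/utils/ip_utils.py. The sketch differs in a few ways:

- it takes a plain str URL instead of AnyHttpUrl
- it parses the URL with urllib.parse.urlsplit()
- it resolves the host with the event loop's getaddrinfo()
- it rejects the URL if any resolved address is non-public

```python
import asyncio
import ipaddress
import socket
from urllib.parse import urlsplit


def is_private_ip(ip_str: str) -> bool:
    """True if the address is private, loopback, link-local, multicast or reserved."""
    ip = ipaddress.ip_address(ip_str)
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
        or (isinstance(ip, ipaddress.IPv6Address) and ip.is_site_local)
    )


async def validate_blocklist_url(url: str) -> None:
    """Raise ValueError unless url is http(s) and every resolved address is public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"Unsupported URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    loop = asyncio.get_running_loop()
    try:
        infos = await loop.getaddrinfo(parts.hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        raise ValueError(f"Cannot resolve {parts.hostname!r}") from exc
    # Reject if ANY resolved address is non-public (a name may have several records).
    for _family, _type, _proto, _canonname, sockaddr in infos:
        address = str(sockaddr[0])
        if is_private_ip(address):
            raise ValueError(f"{parts.hostname!r} resolves to non-public address {address}")
```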

16. Quick Reference — Do / Don't

| Do | Don't |
|---|---|
| Type every function, variable, return | Leave types implicit |
| Use async def for I/O | Use sync functions for I/O |
| Validate with Pydantic at the boundary | Pass raw dicts through the codebase |
| Log with structlog + context keys | Use print() or format strings in logs |
| Write tests for every feature | Ship untested code |
| Use aiohttp for HTTP calls | Use requests |
| Handle errors with custom exceptions | Use bare except: |
| Keep routers thin, logic in services | Put business logic in routers |
| Use datetime.now(UTC) (from datetime import UTC, datetime) | Use naive datetimes |
| Run ruff + mypy before committing | Push code that doesn't pass linting |
| Keep GET endpoints read-only (no db.commit()) | Call db.commit() / INSERT inside GET handlers |
| Batch DB writes; issue one db.commit() after the loop | Commit inside a loop (1 fsync per row) |
| Use executemany() for bulk inserts | Call execute() + commit() per row in a loop |
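The last two rows can be sketched with the standard-library sqlite3 module so the example runs on its own. aiosqlite exposes the same executemany() and commit() methods as coroutines (await db.executemany(...), then await db.commit()). The bans table and its columns are hypothetical.

```python
import sqlite3


def insert_bans(conn: sqlite3.Connection, rows: list[tuple[str, str, int]]) -> None:
    """Insert all rows as one batch inside one transaction."""
    conn.executemany(
        "INSERT INTO bans (ip, jail, banned_at) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()  # a single commit (and fsync) for the whole batch
```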