# Backend Development — Rules & Guidelines

Rules and conventions every backend developer must follow. Read this before writing your first line of code.

---

## 1. Language & Typing

- **Python 3.12+** is the minimum version.
- **Every** function, method, and variable must have explicit type annotations — no exceptions.
- Use `str`, `int`, `float`, `bool`, `None` for primitives.
- Use `list[T]`, `dict[K, V]`, `set[T]`, `tuple[T, ...]` (lowercase, built-in generics) — never `typing.List`, `typing.Dict`, etc.
- Use `T | None` instead of `Optional[T]`.
- Use `TypeAlias`, `TypeVar`, `Protocol`, and `NewType` when they improve clarity.
- Return types are **mandatory** — including `-> None`.
- Never use `Any` unless there is no other option and a comment explains why.
- Run `mypy --strict` (or `pyright` in strict mode) — the codebase must pass with zero errors.

```python
# Good
def get_jail_by_name(name: str) -> Jail | None:
    ...

# Bad — missing types
def get_jail_by_name(name):
    ...
```

---

## 2. Core Libraries

| Purpose | Library | Notes |
|---|---|---|
| Web framework | **FastAPI** | Async endpoints only. |
| Data validation & settings | **Pydantic v2** | All request/response bodies and config models. |
| Async HTTP client | **aiohttp** (`ClientSession`) | For external calls (blocklists, IP lookups). |
| Scheduling | **APScheduler 4.x** (async) | Blocklist imports, periodic health checks. |
| Structured logging | **structlog** | Every log call must use structlog — never `print()` or `logging` directly. |
| Database | **aiosqlite** | Async SQLite access for the application database. |
| Testing | **pytest** + **pytest-asyncio** + **httpx** (`AsyncClient`) | Every feature needs tests. |
| Mocking | **unittest.mock** / **pytest-mock** | Isolate external dependencies. |
| Date & time | **datetime** (stdlib) — always timezone-aware | Use `datetime.datetime.now(datetime.UTC)`. Never naive datetimes. |
| IP / Network | **ipaddress** (stdlib) | Validate and normalise IPs and CIDR ranges. |
| Environment / config | **pydantic-settings** | Load `.env` and environment variables into typed models. |
| fail2ban integration | **fail2ban client** (bundled) | Use the local copy at [`./fail2ban-master`](../fail2ban-master). Import from [`./fail2ban-master/fail2ban/client`](../fail2ban-master/fail2ban/client) to communicate with the fail2ban socket. Do **not** install fail2ban as a pip package. |

### fail2ban Client Usage

The repository ships with a vendored copy of fail2ban located at `./fail2ban-master`. All communication with the fail2ban daemon must go through the client classes found in `./fail2ban-master/fail2ban/client`. Add the `fail2ban-master` directory to `sys.path` (or configure it in `pyproject.toml` as a path dependency) so that `from fail2ban.client ...` resolves to the bundled copy.

```python
import sys
from pathlib import Path

# Ensure the bundled fail2ban is importable
sys.path.insert(0, str(Path(__file__).resolve().parents[2] / "fail2ban-master"))

from fail2ban.client.csocket import CSocket  # noqa: E402
```

### Libraries you must NOT use

- `requests` — use `aiohttp` (async).
- `flask` — we use FastAPI.
- `celery` — we use APScheduler.
- `print()` for logging — use `structlog`.
- `json.loads` / `json.dumps` on Pydantic models — use `.model_dump()` / `.model_validate()`.

### Timestamp Handling

Timestamp consistency is critical for accurate ban history queries across the dashboard and history endpoints. Follow these rules:

**Rule 1: Use consistent UTC timestamps**

- All timestamps in the database are stored as Unix epochs (seconds since 1970-01-01 UTC).
- fail2ban stores timestamps using `time.time()`, which is always UTC epoch seconds.
- When querying fail2ban's SQLite database by timestamp, use `app.utils.time_utils.since_unix()` (not manual datetime calculations).

**Rule 2: Time-range windows include a 60-second slack**

- The `since_unix()` function includes a 60-second slack window (`TIME_RANGE_SLACK_SECONDS` in `app.utils.constants`).
- This slack accommodates:
  - Clock drift between the local system and fail2ban.
  - Test seeding delays when timestamps are manually set to exact boundaries.
- The slack ensures that dashboard and history queries return consistent row counts for the same time range.

**Rule 3: Never duplicate timestamp calculation logic**

- All services that query by time range must import and use `since_unix()`.
- Do not recalculate timestamps locally using `datetime` or `time` modules in service code.
- If you need a timestamp for a time range, use `since_unix()`.

**Example:**

```python
from app.utils.time_utils import since_unix

# Get all bans from the last 24 hours (with 60-second slack)
since_ts: int = since_unix("24h")
rows = await db.execute(
    "SELECT * FROM bans WHERE timeofban >= ?",
    (since_ts,)
)
```

---

## 3. Project Structure

```
backend/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI app factory, lifespan
│   ├── config.py            # Pydantic settings
│   ├── dependencies.py      # FastAPI dependency providers
│   ├── models/              # Pydantic schemas (request, response, domain)
│   ├── routers/             # FastAPI routers grouped by feature
│   ├── services/            # Business logic — one service per domain
│   ├── repositories/        # Database access layer
│   ├── tasks/               # APScheduler jobs
│   └── utils/               # Helpers, constants, shared types
├── tests/
│   ├── conftest.py
│   ├── test_routers/
│   ├── test_services/
│   └── test_repositories/
├── pyproject.toml
└── .env.example
```

- **Routers** receive requests, validate input via Pydantic, and delegate to **services**.
- **Services** contain business logic and call **repositories** or external clients.
- **Repositories** handle raw database queries — nothing else.
- Never put business logic inside routers or repositories.

---

## 4. FastAPI Conventions

- Use **async def** for every endpoint — no sync endpoints.
- Every endpoint must declare explicit **response models** (`response_model=...`).
- Use **Pydantic models** for request bodies and query parameters — never raw dicts.
- Use **Depends()** for dependency injection (database sessions, services, auth).
- Group endpoints into routers by feature domain (`routers/jails.py`, `routers/bans.py`, …).
- Use appropriate HTTP status codes: `201` for creation, `204` for deletion with no body, `404` for not found, etc.
- Protected endpoints should return `401 Unauthorized` or `403 Forbidden` when the session is invalid or expired; the frontend treats these responses as a session-expiry event and redirects the user to `/login`.
- Use **HTTPException** or custom exception handlers — never return error dicts manually.
- **GET endpoints are read-only — never call `db.commit()` or execute INSERT/UPDATE/DELETE inside a GET handler.** If a GET path produces side-effects (e.g., caching resolved data), that write belongs in a background task, a scheduled flush, or a separate POST endpoint. Users and HTTP caches assume GET is idempotent and non-mutating.

  ```python
  # Good — pass db=None on GET so geo_service never commits
  result = await geo_service.lookup_batch(ips, http_session, db=None)

  # Bad — triggers INSERT + COMMIT per IP inside a GET handler
  result = await geo_service.lookup_batch(ips, http_session, db=app_db)
  ```

```python
from fastapi import APIRouter, Depends, HTTPException, status

from app.models.jail import JailResponse, JailListResponse
from app.services.jail_service import JailService

router: APIRouter = APIRouter(prefix="/api/jails", tags=["Jails"])


@router.get("/", response_model=JailListResponse)
async def list_jails(service: JailService = Depends()) -> JailListResponse:
    jails: list[JailResponse] = await service.get_all_jails()
    return JailListResponse(jails=jails)
```

---

## 5. Pydantic Models

- Every model inherits from `pydantic.BaseModel`.
- Use `model_config = ConfigDict(strict=True)` where appropriate.
- Field names use **snake_case** in Python, export as **camelCase** to the frontend via alias generators if needed.
- Validate at the boundary — once data enters a Pydantic model it is trusted.
- Use `Field(...)` with descriptions for every field to keep auto-generated docs useful.
- Separate **request models**, **response models**, and **domain (internal) models** — do not reuse one model for all three.

```python
from datetime import datetime

from pydantic import BaseModel, Field


class BanResponse(BaseModel):
    ip: str = Field(..., description="Banned IP address")
    jail: str = Field(..., description="Jail that issued the ban")
    banned_at: datetime = Field(..., description="UTC timestamp of the ban")
    expires_at: datetime | None = Field(None, description="UTC expiry, None if permanent")
    ban_count: int = Field(..., ge=1, description="Number of times this IP was banned")
```

### Using `Literal` Types for Constrained Strings

When a field should only accept a small set of predefined values, use `Literal` to enforce this at the type level:

```python
from typing import Literal

from pydantic import BaseModel, Field

LogLevel = Literal["CRITICAL", "ERROR", "WARNING", "NOTICE", "INFO", "DEBUG"]


class GlobalConfigUpdate(BaseModel):
    log_level: LogLevel | None = Field(
        default=None,
        description="Log level: CRITICAL, ERROR, WARNING, NOTICE, INFO, or DEBUG.",
    )
```

This provides:

- **Type safety** — IDEs and type checkers enforce valid values.
- **API documentation** — OpenAPI docs automatically list all allowed values.
- **Validation** — Pydantic rejects invalid values and provides a clear error message.

### Custom Field Validators

For fields that require complex validation (e.g., file paths that must be within allowed directories), use `@field_validator`:

```python
from pydantic import BaseModel, Field, field_validator

from app.utils.path_utils import validate_log_path


class AddLogPathRequest(BaseModel):
    log_path: str = Field(..., description="Absolute path to the log file to monitor.")

    @field_validator("log_path", mode="after")
    @classmethod
    def validate_log_path_field(cls, value: str) -> str:
        """Validate that the log path is within allowed directories."""
        return validate_log_path(value)
```

**Path Validation Helper:**

For query parameters and other contexts where Pydantic validators cannot be used directly, use the `validate_log_path()` helper from `app.utils.path_utils`:

```python
from fastapi import HTTPException, Query, status

from app.utils.path_utils import validate_log_path


@router.delete("/{name}/logpath")
async def delete_log_path(
    name: str,
    log_path: str = Query(...),
) -> None:
    try:
        validate_log_path(log_path)
    except ValueError as e:
        raise HTTPException(
            status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
            detail=str(e),
        ) from e
    # ... rest of handler
```

**Key points:**

- Use `mode="after"` in model validators to validate after Pydantic's basic type coercion.
- Raise `ValueError` if validation fails; Pydantic turns it into a validation error, which FastAPI returns as an HTTP 422 response.
- For query parameters that cannot use Pydantic validators, use the `validate_log_path()` helper and raise HTTP 422.
- **Never use string prefix matching** for path validation (e.g., `path.startswith("/var/log")`). The helper uses `Path.relative_to()` to prevent bypasses like `/var/log_evil/file.log`.
- Symlinks are resolved before validating to prevent symlink-based escapes.

---

## 6. Async Rules

- **Never** call blocking / synchronous I/O in an async function — no `time.sleep()`, no synchronous file reads, no `requests.get()`.
- Use `aiohttp.ClientSession` for HTTP calls, `aiosqlite` for database access.
- Use `asyncio.TaskGroup` (Python 3.11+) when you need to run independent coroutines concurrently.
- Long-running startup/shutdown logic goes into the **FastAPI lifespan** context manager.
- **Never call `db.commit()` inside a loop.** With aiosqlite, every commit serialises through a background thread and forces an `fsync`. N rows × 1 commit = N fsyncs. Accumulate all writes in the loop, then issue a single `db.commit()` once after the loop ends. The difference between 5,000 commits and 1 commit can be seconds vs milliseconds.

  ```python
  # Good — one commit for the whole batch
  for ip, info in results.items():
      await db.execute(INSERT_SQL, (ip, info.country_code, ...))
  await db.commit()  # ← single fsync

  # Bad — one fsync per row
  for ip, info in results.items():
      await db.execute(INSERT_SQL, (ip, info.country_code, ...))
      await db.commit()  # ← fsync on every iteration
  ```

- **Prefer `executemany()` over calling `execute()` in a loop** when inserting or updating multiple rows with the same SQL template. aiosqlite passes the entire batch to SQLite in one call, reducing Python↔thread overhead on top of the single-commit saving.

  ```python
  # Good — build every parameter tuple from `info`, then commit once
  await db.executemany(INSERT_SQL, [(ip, info.country_code, ...) for ip, info in results.items()])
  await db.commit()
  ```

- Shared resources (DB connections, HTTP sessions) are created once during startup and closed during shutdown — never inside request handlers.

```python
from contextlib import asynccontextmanager
from collections.abc import AsyncGenerator

from fastapi import FastAPI
import aiohttp
import aiosqlite


@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncGenerator[None]:
    # Startup
    app.state.http_session = aiohttp.ClientSession()
    app.state.db = await aiosqlite.connect("bangui.db")
    yield
    # Shutdown
    await app.state.http_session.close()
    await app.state.db.close()
```

---

## 6.1 Database Query Conventions

### LIKE Queries and Wildcard Escaping

SQLite's `LIKE` operator treats `%` (any sequence of characters) and `_` (any single character) as wildcards. When querying with user-supplied filters that may contain these characters, you must escape them to prevent unintended matches.

**The Problem:**

```python
# Bad — ip_filter="10.0.0_" matches "10.0.0.1", "10.0.0.2", etc.
ip_filter = "10.0.0_"
await db.execute(
    "SELECT * FROM bans WHERE ip LIKE ?",
    (f"{ip_filter}%",)  # ← wildcard characters not escaped
)
```

**The Solution:**

Use the `escape_like()` helper from `app.utils.fail2ban_db_utils`:

```python
from app.utils.fail2ban_db_utils import escape_like

# Good — wildcard characters are escaped
ip_filter = "10.0.0_"
await db.execute(
    "SELECT * FROM bans WHERE ip LIKE ? ESCAPE '\\'",
    (f"{escape_like(ip_filter)}%",)  # ← underscores escaped to literal
)
```

**How `escape_like()` works:**

The function escapes backslashes first, then `%` and `_` signs:

```python
def escape_like(s: str) -> str:
    return s.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
```

**Key rules:**

1. **Backslash escapes first** — to prevent double-escaping when the input contains backslashes.
2. **Add `ESCAPE '\\'` to the SQL** — tells SQLite which character to use for escaping.
3. **Dots are not wildcards** — they do not need escaping; normal IP addresses pass through unchanged.
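
The effect of these rules can be reproduced end to end. The sketch below is illustrative only: it uses the synchronous stdlib `sqlite3` module and an in-memory table instead of `aiosqlite`, repeats `escape_like()` so the snippet is self-contained, and the `find_by_prefix` helper and sample rows are hypothetical.

```python
import sqlite3


def escape_like(s: str) -> str:
    # Same logic as the helper shown above, repeated for self-containment.
    return s.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def find_by_prefix(conn: sqlite3.Connection, prefix: str, *, escaped: bool) -> list[str]:
    """Return IPs starting with `prefix`, with or without wildcard escaping."""
    pattern: str = f"{escape_like(prefix) if escaped else prefix}%"
    sql: str = "SELECT ip FROM bans WHERE ip LIKE ?"
    if escaped:
        sql += " ESCAPE '\\'"  # SQL text becomes: ... LIKE ? ESCAPE '\'
    sql += " ORDER BY ip"
    return [row[0] for row in conn.execute(sql, (pattern,))]


conn: sqlite3.Connection = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bans (ip TEXT)")
conn.executemany(
    "INSERT INTO bans VALUES (?)",
    [("10.0.0.1",), ("10.0.0_1",), ("10.0.0x9",)],
)

# Without escaping, "_" matches any single character, so all three rows match.
unescaped_hits: list[str] = find_by_prefix(conn, "10.0.0_", escaped=False)
# With escaping, only the row containing a literal underscore matches.
escaped_hits: list[str] = find_by_prefix(conn, "10.0.0_", escaped=True)
```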

**Test example:**

```python
assert escape_like("10.0.0_") == "10.0.0\\_"
assert escape_like("10.0.0%test") == "10.0.0\\%test"
assert escape_like("10.0.0.1") == "10.0.0.1"  # Unchanged
```

---

## 7. Logging

- Use **structlog** for every log message.
- Bind contextual key-value pairs — never format strings manually.
- Log levels: `debug` for development detail, `info` for operational events, `warning` for recoverable issues, `error` for failures, `critical` for fatal problems.
- **Never log sensitive data** (passwords, tokens, session tokens, raw credentials, private keys).
- For session correlation without leaking token material, use a one-way hash fragment: `hashlib.sha256(token.encode()).hexdigest()[:12]`.
- Use numeric database IDs for entity correlation instead of raw identifiers: `session_id=session.id` instead of `token=session.token`.

```python
import hashlib

import aiosqlite
import structlog

log: structlog.stdlib.BoundLogger = structlog.get_logger()


async def ban_ip(ip: str, jail: str) -> None:
    log.info("banning_ip", ip=ip, jail=jail)
    try:
        await _execute_ban(ip, jail)
        log.info("ip_banned", ip=ip, jail=jail)
    except BanError as exc:
        log.error("ban_failed", ip=ip, jail=jail, error=str(exc))
        raise


async def logout_session(db: aiosqlite.Connection, token: str) -> None:
    # Use a one-way hash for token correlation in logs
    token_hash = hashlib.sha256(token.encode()).hexdigest()[:12]
    await session_repo.delete_session(db, token)
    log.info("session_terminated", token_hash=token_hash)
```

---

## 8. Error Handling

- Define **custom exception classes** for domain errors (e.g., `JailNotFoundError`, `BanFailedError`).
- Catch specific exceptions — never bare `except:` or `except Exception:` without re-raising.
- Map domain exceptions to HTTP status codes via FastAPI **exception handlers** registered on the app.
- Always log errors with context before raising.

```python
from fastapi import Request
from fastapi.responses import JSONResponse


class JailNotFoundError(Exception):
    def __init__(self, name: str) -> None:
        self.name: str = name
        super().__init__(f"Jail '{name}' not found")


# In main.py
@app.exception_handler(JailNotFoundError)
async def jail_not_found_handler(request: Request, exc: JailNotFoundError) -> JSONResponse:
    return JSONResponse(status_code=404, content={"detail": f"Jail '{exc.name}' not found"})
```

### Routers and Exception Propagation

- **Routers must NOT construct `HTTPException` for domain errors** — let domain exceptions propagate.
- Routers should never have helper functions like `_bad_gateway()`, `_not_found()`, `_conflict()` etc. that convert domain exceptions to `HTTPException`.
- All domain exception types must have corresponding handlers registered in `main.py` via `app.add_exception_handler()`.
- Register one handler per domain exception class. Starlette (which FastAPI builds on) selects the handler by walking the raised exception's class hierarchy (its MRO), so the most specific registered handler wins regardless of registration order. Listing handlers from most specific to least specific in `main.py` is still a useful readability convention.

```python
# ❌ BAD — routers constructing HTTPException for domain exceptions
@router.get("/{name}")
async def get_jail(name: str, socket_path: Fail2BanSocketDep) -> JailDetailResponse:
    try:
        return await jail_service.get_jail(socket_path, name)
    except JailNotFoundError:
        raise HTTPException(status_code=404, detail=f"Jail not found: {name!r}") from None


# ✅ GOOD — domain exception propagates to global handler
@router.get("/{name}")
async def get_jail(name: str, socket_path: Fail2BanSocketDep) -> JailDetailResponse:
    return await jail_service.get_jail(socket_path, name)
```

All domain exceptions raised by services propagate to handlers in `main.py`, ensuring:

1. Consistent error response format across the entire API.
2. No duplicated exception-to-HTTP-status mapping logic.
3. Easy to audit all error codes — they are all in one place.

---

## 9. Testing

- **Every** new feature or bug fix must include tests.
- Tests live in `tests/` mirroring the `app/` structure.
- Use `pytest` with `pytest-asyncio` for async tests.
- Use `httpx.AsyncClient` to test FastAPI endpoints (not `TestClient` which is sync).
- Mock external dependencies (fail2ban socket, aiohttp calls) — tests must never touch real infrastructure.
- Aim for **>80 % line coverage** — critical paths (auth, banning, scheduling) must be 100 %.
- Test names follow `test___` pattern.

```python
from collections.abc import AsyncIterator

import pytest
import pytest_asyncio
from httpx import ASGITransport, AsyncClient

from app.main import create_app


# Async fixtures need pytest_asyncio.fixture when pytest-asyncio runs in strict mode
@pytest_asyncio.fixture
async def client() -> AsyncIterator[AsyncClient]:
    app = create_app()
    transport: ASGITransport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as ac:
        yield ac


@pytest.mark.asyncio
async def test_list_jails_returns_200(client: AsyncClient) -> None:
    response = await client.get("/api/jails/")
    assert response.status_code == 200
    data: dict[str, object] = response.json()
    assert "jails" in data
```

---

## 9.1 Background Tasks and Scheduler Architecture

BanGUI uses **APScheduler 4.x** (async mode) to manage background jobs that execute on a schedule without user interaction. This section documents how to write and register background tasks.

### Task Location and Structure

All background tasks live in `backend/app/tasks/` as separate modules. Each task:

- Exports a `register(app: FastAPI) -> None` or `async def register(app: FastAPI) -> None` function.
- Opens its own database connection using `app.db.open_db()` or the `task_db()` helper.
- Closes connections when work completes (use the async context manager pattern).
- Runs independently of the FastAPI request/response cycle.

### Example Task

```python
# backend/app/tasks/my_task.py
import structlog
from apscheduler.schedulers.asyncio import AsyncIOScheduler
from fastapi import FastAPI

log = structlog.get_logger()


async def my_background_job(app: FastAPI) -> None:
    """Do important work on a schedule."""
    log.info("my_background_job_started")
    try:
        # `app` is the FastAPI instance here, so the DB factory is reached via app.state
        db = await app.state.db_factory.open_db(app.state.settings.database_path)
        try:
            # Do work...
            pass
        finally:
            await db.close()
    except Exception:
        log.error("my_background_job_failed", exc_info=True)


def register(app: FastAPI) -> None:
    """Register the job with the scheduler."""
    scheduler: AsyncIOScheduler = app.state.scheduler
    scheduler.add_job(
        my_background_job,
        args=(app,),
        trigger="interval",
        seconds=60,
        id="my_task",
        name="My Background Job",
    )
```

### Accessing Shared Resources in Tasks

Since tasks do not have access to `Depends(get_db)` (no request scope), they must:

1. **Open their own DB connection** via `app.state.db_factory.open_db(path)`.
2. **Access app-level state** — `app.state.http_session`, `app.state.geo_cache`, `app.state.settings`, etc.
3. **Use structlog** for all logging (never `print()`).

### Single-Worker Requirement

**The scheduler is bound to a single asyncio event loop and cannot be shared across multiple worker processes.** BanGUI enforces single-worker mode to prevent duplicate task execution.

- **Deployment constraint:** Set `BANGUI_WORKERS=1` (default).
- **Startup validation:** `startup_shared_resources()` raises `RuntimeError` if `BANGUI_WORKERS > 1`.
- See [Architekture.md § 9.2](Architekture.md) for full details.

---

## 10. Code Style & Tooling

| Tool | Purpose |
|---|---|
| **Ruff** | Linter and formatter (replaces black, isort, flake8). |
| **mypy** or **pyright** | Static type checking in strict mode. |
| **pre-commit** | Run ruff + type checker before every commit. |

- Line length: **120 characters** max.
- Strings: use **double quotes** (`"`).
- Imports: sorted by ruff — stdlib → third-party → local, one import per line.
- No unused imports, no unused variables, no `# type: ignore` without explanation.
- Docstrings in **Google style** on every public function, class, and module.

---

## 11. fail2ban Response Utilities

All services that interact with the fail2ban daemon must use the canonical response parsing utilities from `app.utils.fail2ban_response`.
This ensures consistent error handling, type safety, and makes it easy to fix bugs in response handling across the entire codebase.

### Available Functions

**`ok(response: object) -> object`**

Extracts the payload from a fail2ban `(return_code, data)` response tuple.

- Raises `ValueError` if return code ≠ 0 or response shape is invalid.
- Use this on every response from `Fail2BanClient.send()`.

**`to_dict(pairs: object) -> dict[str, object]`**

Converts a list of `(key, value)` pairs (fail2ban's native response format) to a Python dict.

- Silently ignores malformed entries and non-list/tuple inputs.
- Always returns a dict (empty if input is invalid).

**`ensure_list(value: object | None) -> list[str]`**

Coerces fail2ban response values (which may be `None`, a single string, or a list) to a normalized list of strings.

- Handles all three cases consistently.
- Returns empty list for `None` or empty strings.

**`is_not_found_error(exc: Exception) -> bool`**

Checks if an exception indicates a jail does not exist.

- Checks for multiple error message patterns (case-insensitive).
- Use this to distinguish "jail not found" errors from other failures.

### Example Usage

```python
from app.utils.fail2ban_response import ok, to_dict, ensure_list, is_not_found_error
from app.utils.fail2ban_client import Fail2BanClient

client = Fail2BanClient(socket_path="/var/run/fail2ban/fail2ban.sock")

try:
    # Get jail status
    response = await client.send(["status", "sshd", "short"])
    status_dict = to_dict(ok(response))  # Extract payload and convert to dict

    # Get list of banned IPs
    ban_response = await client.send(["get", "sshd", "banip"])
    banned_ips = ensure_list(ok(ban_response))  # Normalize to list of strings
except ValueError as exc:
    if is_not_found_error(exc):
        raise JailNotFoundError("sshd") from exc
    raise
```

### Why This Matters

Before this utility module, every service implemented its own copy of these functions, leading to:

- Code duplication across 7+ service files.
- Subtle inconsistencies in error handling.
- Difficult maintenance — every bug fix required touching multiple files.

Now, all services import from a single authoritative source, making response handling consistent, maintainable, and type-safe.

---

## 12. Configuration & Secrets

- All configuration lives in **environment variables** loaded through **pydantic-settings**.
- Secrets (master password hash, session key) are **never** committed to the repository.
- Provide a `.env.example` with all keys and placeholder values.
- Validate config at startup — fail fast with a clear error if a required value is missing.

```python
from pydantic import Field
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    database_path: str = Field("bangui.db", description="Path to SQLite database")
    fail2ban_socket: str = Field("/var/run/fail2ban/fail2ban.sock", description="fail2ban socket path")
    session_secret: str = Field(..., description="Secret key for session signing")
    log_level: str = Field("info", description="Logging level")

    model_config = {"env_prefix": "BANGUI_", "env_file": ".env"}
```

### Session Cookie Security

The `session_cookie_secure` configuration controls the `Secure` flag on the session cookie. This flag prevents browsers from sending the session cookie over unencrypted HTTP.

**Default:** `true` — Production deployments are secure by default. Cookies are only sent over HTTPS.

**Local Development:** Set `BANGUI_SESSION_COOKIE_SECURE=false` in your compose file or `.env` to allow cookies over HTTP (required for `localhost:8000`).

```yaml
# Docker/compose.debug.yml
environment:
  BANGUI_SESSION_COOKIE_SECURE: "false"  # Allow HTTP during local development
```

**Important:** If `Secure=true` is set, browsers will reject the session cookie when the backend is served over HTTP. Ensure your nginx/reverse proxy terminates TLS and passes `X-Forwarded-Proto: https` so FastAPI knows the connection is secure.

### fail2ban_start_command Configuration

The `fail2ban_start_command` setting specifies the shell command used to start the fail2ban daemon during recovery operations (e.g., after a rollback).

**Format & Parsing:**

- The command is split into arguments using `shlex.split()`, which respects shell quoting rules.
- Paths with spaces must be quoted. Example: `"/opt/my tools/fail2ban-client" start`.
- The command is **not** executed through a shell — no shell variables or globbing are interpreted.

**Validation:**

- The command is validated at startup using `shlex.split()`. Mismatched quotes will raise a `ValueError` with the problematic command in the error message.

**Environment Variables:**

```bash
BANGUI_FAIL2BAN_START_COMMAND="fail2ban-client start"            # Default
BANGUI_FAIL2BAN_START_COMMAND="systemctl start fail2ban"         # systemd
BANGUI_FAIL2BAN_START_COMMAND='"/opt/my tools/fail2ban" start'   # Quoted path
```

**Common Pitfall:** Using `.split()` instead of `shlex.split()` would break commands with spaces in paths. Always use quoted strings for paths that contain whitespace.

### Log Path Validation & Allowlisting

Authenticated users can instruct fail2ban to monitor additional log files through the API endpoint `POST /api/config/jails/{name}/logpath`. To prevent path-traversal attacks and unauthorized reads of sensitive system files, all requested log paths must resolve to locations within a configurable allowlist of safe directories.

**Allowed Directories:**

- Configured via the `BANGUI_ALLOWED_LOG_DIRS` environment variable (comma-separated list).
- Defaults to: `["/var/log", "/config/log"]`.

**Path Validation Rules:**

1. The requested path is resolved to its canonical form using `Path(log_path).resolve()`, which:
   - Expands relative paths to absolute paths.
   - Resolves symbolic links to their real targets.
   - Normalizes `.` and `..` components.
2. The resolved path is checked using `Path.is_relative_to()` against each allowed directory prefix.
3. If the resolved path is not relative to any allowed directory, a `ValueError` is raised with a descriptive error message.

**Implementation:**

- Validation occurs in the Pydantic model `AddLogPathRequest` using a `@field_validator`.
- The validator runs at request time, before the service layer is invoked.
- Symlinks that escape allowed directories are rejected (see [symlink bypass tests](../../backend/tests/test_models.py)).

**Important:** Use `is_relative_to()`, not `startswith()` or string prefix matching. The latter is bypassable with paths like `/var/log_evil/file.log`.

**Environment Variables:**

```bash
BANGUI_ALLOWED_LOG_DIRS="/var/log,/config/log"                  # Default
BANGUI_ALLOWED_LOG_DIRS="/var/log,/config/log,/home/app/logs"   # Custom directory
```

### Login Rate Limiting

The login endpoint (`POST /api/auth/login`) is protected against brute-force attacks using an in-memory rate limiter.

**Design:**

- Uses a `dict[str, deque[float]]` keyed by client IP, storing login attempt timestamps within a time window.
- Attempts outside the window are automatically removed during validation checks.
- Expired IP entries are cleaned up to prevent unbounded memory growth.

**Rate Limit Rules:**

- **5 attempts per 60 seconds** per IP address.
- Requests exceeding the limit return **HTTP 429 Too Many Requests** with a `Retry-After` header.
- Each failed login triggers a 10-second server-side delay (`asyncio.sleep`) to further slow attacks, on top of bcrypt hashing (~100ms).

**IP Extraction (Proxy Safety):**

- When behind nginx, the rate limiter reads the real client IP from `X-Forwarded-For` or `X-Real-IP` headers.
- Only trusts these headers when the immediate connection source is in a configured trusted proxy list.
- Prevents attackers from spoofing these headers to bypass rate limits.
- Falls back to the direct connection IP when proxy headers cannot be trusted.

**Process-Local Limitation:**

- The rate limiter is process-local (in-memory).
  In multi-worker deployments (e.g., Gunicorn with 4 workers), each worker maintains its own rate limit counter.
- This is acceptable because the single-worker constraint is enforced elsewhere. See [TASK-002/003 notes](Instructions.md) for details.

**Implementation:**

- Rate limiter: `app.utils.rate_limiter.RateLimiter`
- IP extraction: `app.utils.client_ip.get_client_ip()`
- Dependency: `LoginRateLimiterDep` in `app.dependencies`

---

## 13. Git & Workflow

- **Branch naming:** `feature/`, `fix/`, `chore/`.
- **Commit messages:** imperative tense, max 72 chars first line (`Add jail reload endpoint`, `Fix ban history query`).
- Every merge request must pass: ruff, type checker, all tests.
- Do not merge with failing CI.
- Keep pull requests small and focused — one feature or fix per PR.

---

## 14. Coding Principles

These principles are **non-negotiable**. Every backend contributor must internalise and apply them daily.

### 14.1 Clean Code

- Write code that **reads like well-written prose** — a new developer should understand intent without asking.
- **Meaningful names** — variables, functions, and classes must reveal their purpose. Avoid abbreviations (`cnt`, `mgr`, `tmp`) unless universally understood.
- **Small functions** — each function does exactly one thing. If you need a comment to explain a block inside a function, extract it into its own function.
- **No magic numbers or strings** — use named constants.
- **Boy Scout Rule** — leave every file cleaner than you found it.
- **Avoid deep nesting** — prefer early returns (guard clauses) to keep the happy path at the top indentation level.

```python
# Good — guard clause, clear name, one job
async def get_active_ban(ip: str, jail: str) -> Ban:
    ban: Ban | None = await repo.find_ban(ip=ip, jail=jail)
    if ban is None:
        raise BanNotFoundError(ip=ip, jail=jail)
    if ban.is_expired():
        raise BanExpiredError(ip=ip, jail=jail)
    return ban


# Bad — nested, vague name
async def check(ip, j):
    b = await repo.find_ban(ip=ip, jail=j)
    if b:
        if not b.is_expired():
            return b
        else:
            raise Exception("expired")
    else:
        raise Exception("not found")
```

### 14.2 Separation of Concerns (SoC)

- Each module, class, and function must have a **single, well-defined responsibility**.
- **Routers** → HTTP layer only (parse requests, return responses).
- **Services** → business logic and orchestration.
- **Repositories** → data access and persistence.
- **Models** → data shapes and validation.
- **Tasks** → scheduled background jobs.
- Never mix layers — a router must not execute SQL, and a repository must not raise `HTTPException`.

### 14.3 Single Responsibility Principle (SRP)

- A class or module should have **one and only one reason to change**.
- If a service handles both ban management *and* email notifications, split it into `BanService` and `NotificationService`.

### 14.4 Don't Repeat Yourself (DRY)

- Extract shared logic into utility functions, base classes, or dependency providers.
- If the same block of code appears in more than one place, **refactor it** into a single source of truth.
- But don't over-abstract — premature DRY that couples unrelated features is worse than a little duplication (see **Rule of Three**: refactor when something appears a third time).

### 14.5 KISS — Keep It Simple, Stupid

- Choose the simplest solution that works correctly.
- Avoid clever tricks, premature optimisation, and over-engineering.
- If a standard library function does the job, prefer it over a custom implementation.

### 14.6 YAGNI — You Aren't Gonna Need It

- Do **not** build features, abstractions, or config options "just in case".
- Implement what is required **now**. Extend later when a real need emerges.

### 15.7 Dependency Inversion Principle (DIP)

- High-level modules (services) must not depend on low-level modules (repositories) directly. Both should depend on **abstractions** (protocols / interfaces).
- Use FastAPI's `Depends()` to inject implementations — this makes swapping and testing trivial.

```python
from typing import Protocol


class BanRepository(Protocol):
    async def find_ban(self, ip: str, jail: str) -> Ban | None: ...
    async def save_ban(self, ban: Ban) -> None: ...


class SqliteBanRepository:
    """Concrete implementation — depends on aiosqlite."""

    async def find_ban(self, ip: str, jail: str) -> Ban | None: ...
    async def save_ban(self, ban: Ban) -> None: ...
```

#### 15.7.1 Repository Module Pattern — Module-as-Protocol Structural Compatibility

BanGUI uses **module-level functions** for repository implementations, not classes. Each repository module (e.g., `session_repo.py`, `blocklist_repo.py`) exports async functions that match the signatures defined in the Protocol interface in `protocols.py`.

This is a **structural typing pattern** — mypy accepts the module as a valid Protocol implementation because the function signatures match, *even though* the module is not explicitly annotated as implementing the Protocol. This approach works correctly with FastAPI's dependency injection via `cast()`:

```python
# In app/repositories/session_repo.py
async def create_session(db: aiosqlite.Connection, token: str, created_at: str, expires_at: str) -> Session:
    """Insert a new session row."""
    ...


# In app/repositories/protocols.py
class SessionRepository(Protocol):
    async def create_session(
        self,
        db: aiosqlite.Connection,
        token: str,
        created_at: str,
        expires_at: str,
    ) -> Session: ...
# In app/dependencies.py
async def get_session_repo() -> SessionRepository:
    """Provide the concrete session repository implementation."""
    from app.repositories import session_repo

    return session_repo  # ← mypy accepts this because the module has matching functions
```

**Why this pattern is used:**
- **Simplicity** — no boilerplate class/instance wrapping.
- **Compatibility** — Python's **structural typing** (PEP 544) means the module automatically satisfies the Protocol interface if function signatures match.
- **Testability** — the same DIP principle applies; services depend on the Protocol, not the module directly, so tests can mock the Protocol.

**Risks and mitigations:**
- **Silent breakage if function signatures change** — If a parameter is added to or removed from a module function, the module no longer satisfies the Protocol. Where a dependency provider wraps the module in `cast()`, mypy does not flag the mismatch, because `cast()` bypasses the structural check. To prevent this, **Protocol signatures in `protocols.py` are the source of truth**. Always check that module functions match the Protocol definitions before merging changes. The CI/CD pipeline validates this compatibility at build time.

**How the validation works (CI check):**
- Before each deployment, run `mypy --strict` to ensure all dependency providers return values compatible with their Protocol types.
- The `cast()` calls in `dependencies.py` are a documented signal that structural compatibility is being verified externally, not via explicit class inheritance.

#### 15.7.2 Session Cache Pluggability — Process-Local vs. Shared Backends

Session validation is expensive (SQLite lookup + password verification). To improve performance, **validated session tokens are cached** using the `SessionCache` interface (`app.utils.session_cache`). The default implementation, `InMemorySessionCache`, stores cached sessions in process-local memory.

**Current implementation (single-worker):**

```python
# Excerpt from app/utils/session_cache.py, which exports
# SessionCache, InMemorySessionCache, and NoOpSessionCache.
from typing import Protocol


class SessionCache(Protocol):
    """Interface for session token validation cache backends."""

    def get(self, token: str) -> Session | None: ...
    def set(self, token: str, session: Session, ttl_seconds: float) -> None: ...
    def invalidate(self, token: str) -> None: ...
    def clear(self) -> None: ...


# Default in-memory implementation — PROCESS-LOCAL
class InMemorySessionCache:
    def __init__(self) -> None:
        self._entries: dict[str, tuple[Session, float]] = {}
```

**Single-worker constraint:**

`InMemorySessionCache` is **process-local** — each worker process has its own dict. In single-worker mode (enforced by TASK-002), this is safe and improves performance. In multi-worker deployments:
- A logout handled by worker A clears the session from A's cache, but worker B still has it cached, so the session stays valid on worker B.
- Enabling/disabling the cache requires restarting all workers to take effect.

**Multi-worker solution:**

To support multiple workers (future enhancement), implement a shared backend behind the same `SessionCache` Protocol:

```python
# Example Redis implementation (not yet in codebase)
# Assumes the redis-py package, whose asyncio client is commonly imported under this alias.
from redis import asyncio as aioredis


class RedisSessionCache:
    """Session cache backed by Redis.

    These methods are async, while the Protocol excerpt above declares sync
    methods; the Protocol would need async signatures for this backend to satisfy it.
    """

    def __init__(self, redis_url: str) -> None:
        self.client = aioredis.from_url(redis_url)

    async def get(self, token: str) -> Session | None:
        data = await self.client.get(f"session:{token}")
        return Session.model_validate_json(data) if data else None

    async def set(self, token: str, session: Session, ttl_seconds: float) -> None:
        # Redis expires the key itself once the TTL has elapsed.
        await self.client.setex(
            f"session:{token}", int(ttl_seconds), session.model_dump_json()
        )

    async def invalidate(self, token: str) -> None:
        await self.client.delete(f"session:{token}")

    async def clear(self) -> None:
        # flushdb() removes every key in the selected Redis database, not only session keys.
        await self.client.flushdb()
```

To adopt a Redis backend:
1. Create `RedisSessionCache` in `app.utils.session_cache`.
2. Update `app.utils.runtime_state.set_runtime_settings()` to instantiate `RedisSessionCache` when `REDIS_URL` env var is set.
3. Update `app.config.Settings` to accept optional `REDIS_URL`.
4. Tests continue to use `InMemorySessionCache` (no Redis dependency in dev).

**Implementation rules:**
- All cache methods must be `async` (even if the backend is sync).
- Never log session tokens or session data.
- TTL must be respected — expired entries must be removed on access.
- See `app/utils/session_cache.py` for the full Protocol definition and current implementations.

### 15.8 Composition over Inheritance

- Favour **composing** small, focused objects over deep inheritance hierarchies.
- Use mixins or protocols only when a clear "is-a" relationship exists; otherwise, pass collaborators as constructor arguments.

### 15.9 Fail Fast

- Validate inputs as early as possible — at the API boundary with Pydantic, at service entry with assertions or domain checks.
- Raise specific exceptions immediately rather than letting bad data propagate silently.

### 15.10 Law of Demeter (Principle of Least Knowledge)

- A function should only call methods on:
  1. Its own object (`self`).
  2. Objects passed as parameters.
  3. Objects it creates.
- Avoid long accessor chains like `request.state.db.cursor().execute(...)` — wrap them in a meaningful method.

### 15.11 Defensive Programming

- Never trust external input — validate and sanitise everything that crosses a boundary (HTTP request, file, socket, environment variable).
- Handle edge cases explicitly: empty lists, `None` values, negative numbers, empty strings.
- Use type narrowing and exhaustive pattern matching (`match` / `case`) to eliminate impossible states.

### 15.12 SSRF Prevention (Server-Side Request Forgery)

When user-supplied URLs are fetched by the backend, validate them before making any HTTP requests:

1. **Use Pydantic's `AnyHttpUrl` type** to restrict schemes to `http://` and `https://` only.
   - Rejects `file://`, `ftp://`, `gopher://`, and other non-http schemes at the model boundary.
2. **Validate resolved IP addresses** before fetching:
   - Parse the hostname and resolve it via DNS (using `socket.getaddrinfo()`).
   - Use `ipaddress.ip_address()` to reject the ranges below. `is_private` covers the first three groups; multicast and IPv6 site-local addresses are not reported by `is_private` and need `is_multicast` and `is_site_local` (plus `is_reserved` for reserved ranges):
     - RFC 1918: `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`
     - Loopback: `127.0.0.0/8`, `::1/128`
     - Link-local: `169.254.0.0/16`, `fe80::/10`
     - IPv6 site-local, multicast, and reserved ranges.
   - Raise `ValueError` if validation fails; let the router convert it to HTTP 400.
3. **Guard against DNS rebinding**:
   - Validate DNS at URL creation/validation time (performed during request deserialization).
   - For additional safety, re-validate the connection IP at HTTP client time (e.g., custom `aiohttp.TCPConnector` can inspect the resolved address during connect).
4. **Example implementation** (see `backend/app/utils/ip_utils.py`):
   - `is_private_ip(ip_str: str) -> bool`: Checks if IP is private/reserved/loopback/link-local.
   - `async validate_blocklist_url(url: AnyHttpUrl) -> None`: Async DNS resolution + private IP check.
   - Service layer calls `await validate_blocklist_url(url)` before persisting; router catches `ValueError` and returns 400.

---

## 16. Quick Reference — Do / Don't

| Do | Don't |
|---|---|
| Type every function, variable, return | Leave types implicit |
| Use `async def` for I/O | Use sync functions for I/O |
| Validate with Pydantic at the boundary | Pass raw dicts through the codebase |
| Log with structlog + context keys | Use `print()` or format strings in logs |
| Write tests for every feature | Ship untested code |
| Use `aiohttp` for HTTP calls | Use `requests` |
| Handle errors with custom exceptions | Use bare `except:` |
| Keep routers thin, logic in services | Put business logic in routers |
| Use `datetime.datetime.now(datetime.UTC)` | Use naive datetimes |
| Run ruff + mypy before committing | Push code that doesn't pass linting |
| Keep GET endpoints read-only (no `db.commit()`) | Call `db.commit()` / INSERT inside GET handlers |
| Batch DB writes; issue one `db.commit()` after the loop | Commit inside a loop (1 fsync per row) |
| Use `executemany()` for bulk inserts | Call `execute()` + `commit()` per row in a loop |
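
The batching rows in the table above can be sketched as follows. This is a minimal illustration, not code from the repository: the `blocked_ips` table, its columns, and the `import_blocked_ips` and `demo` helpers are hypothetical, and the stdlib `sqlite3` module stands in for `aiosqlite` so that the sketch runs without third-party packages. With `aiosqlite`, the same `executemany()` and `commit()` calls would be awaited.

```python
import sqlite3

# Hypothetical table and column names, used only for this illustration.
CREATE_TABLE_SQL: str = (
    "CREATE TABLE IF NOT EXISTS blocked_ips ("
    "ip TEXT PRIMARY KEY, source TEXT NOT NULL)"
)
INSERT_SQL: str = "INSERT OR IGNORE INTO blocked_ips (ip, source) VALUES (?, ?)"
COUNT_SQL: str = "SELECT COUNT(*) FROM blocked_ips"


def import_blocked_ips(db: sqlite3.Connection, rows: list[tuple[str, str]]) -> int:
    """Insert all rows in one batch and return the resulting row count."""
    db.execute(CREATE_TABLE_SQL)
    # One executemany() call replaces a Python loop of execute() calls.
    db.executemany(INSERT_SQL, rows)
    # A single commit after the whole batch, not one commit per row.
    db.commit()
    count: int = db.execute(COUNT_SQL).fetchone()[0]
    return count


def demo() -> int:
    """Run the batch import against a throwaway in-memory database."""
    db: sqlite3.Connection = sqlite3.connect(":memory:")
    try:
        rows: list[tuple[str, str]] = [
            ("192.0.2.1", "example-list"),
            ("192.0.2.2", "example-list"),
            ("192.0.2.1", "example-list"),  # duplicate, skipped by INSERT OR IGNORE
        ]
        return import_blocked_ips(db, rows)
    finally:
        db.close()
```

Calling `demo()` writes three input rows, one of them a duplicate, with a single `executemany()` and a single `commit()`; `INSERT OR IGNORE` skips the duplicate, so two rows end up stored.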