BanGUI/Docs/Architekture.md
Lukas 9b4aee7f37 docs: enhance Pydantic validator constraints and mark task complete
Verified that BanGUI's codebase is fully compliant with the constraint that
Pydantic validators must not execute at import time or have side effects.

Changes:
- Architekture.md § 2.1: Added explicit 'No I/O or Side Effects' constraint
  for model validators, explaining why this prevents circular dependencies
- Backend-Development.md: Enhanced validator documentation with subsection
  on import-time execution, including wrong/correct examples
- Tasks.md: Marked '[Backend] Pydantic validators execute at import time'
  as COMPLETE with verification results and regression prevention guidance

Verification Summary:
✓ Audited 14 model files: no problematic imports or function calls
✓ Import time: 0.159s (fast, no import-time side effects)
✓ Type checking: mypy --strict passes on all models
✓ Unit tests: 17 tests pass (100%)
✓ Correct pattern in use: validation in routers/services, not models

The codebase architecture is sound—no code changes required, only
documentation clarification to prevent future violations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 19:37:03 +02:00

BanGUI — Architecture

This document describes the system architecture of BanGUI, a web application for monitoring, managing, and configuring fail2ban. It defines every major component, module, and data flow so that any developer can understand how the pieces fit together before writing code.


1. High-Level Overview

BanGUI is a two-tier web application with a clear separation between frontend and backend, connected through a RESTful JSON API.

┌──────────────────────────────────────────────────────────────────┐
│                          Browser                                 │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                   Frontend (React + Fluent UI)             │  │
│  │  TypeScript · Vite · Single-Page Application               │  │
│  └──────────────────────────┬─────────────────────────────────┘  │
└─────────────────────────────┼────────────────────────────────────┘
                              │  HTTP / JSON (REST API)
┌─────────────────────────────┼────────────────────────────────────┐
│                          Server                                  │
│  ┌──────────────────────────┴─────────────────────────────────┐  │
│  │                   Backend (FastAPI)                        │  │
│  │  Python 3.12+ · Async · Pydantic v2 · structlog            │  │
│  └─────┬──────────────┬──────────────┬────────────────────────┘  │
│        │              │              │                           │
│  ┌─────┴─────┐  ┌─────┴─────┐  ┌────┴─────┐                      │
│  │  SQLite   │  │ fail2ban  │  │ External │                      │
│  │  (App DB) │  │  (Socket) │  │   APIs   │                      │
│  └───────────┘  └───────────┘  └──────────┘                      │
└──────────────────────────────────────────────────────────────────┘

Component Summary

| Component | Technology | Purpose |
|---|---|---|
| Frontend | TypeScript, React, Fluent UI v9, Vite | User interface: displays data, captures user input, communicates with the backend API |
| Backend | Python 3.12+, FastAPI, Pydantic v2, aiosqlite | Business logic, data persistence, fail2ban communication, scheduling |
| Application Database | SQLite (via aiosqlite) | Stores BanGUI's own data: configuration, session state, blocklist sources, import logs |
| fail2ban | Unix domain socket | The monitored service: BanGUI reads status, issues commands, and reads the fail2ban database |
| MaxMind GeoLite2 | Offline MMDB file (mounted into container) | IP geolocation (primary resolver); local, encrypted |
| External APIs | HTTP (via aiohttp) | Blocklist downloads; IP geolocation fallback (only if MMDB unavailable and HTTP fallback enabled) |

2. Backend Architecture

The backend follows a layered architecture with strict separation of concerns. Dependencies flow inward: routers depend on services, services depend on repositories — never the reverse.

                ┌──────────────────────────────────┐
                │        FastAPI Application       │
                │          (main.py)               │
                └──────────┬───────────────────────┘
                           │
          ┌────────────────┼────────────────┐
          │                │                │
    ┌─────┴──────┐  ┌─────┴──────┐  ┌──────┴──────┐
    │  Routers   │  │   Tasks    │  │   Config    │
    │  (HTTP)    │  │ (Scheduled)│  │ (Settings)  │
    └─────┬──────┘  └─────┬──────┘  └─────────────┘
          │               │
    ┌─────┴───────────────┴──────┐
    │         Services           │
    │     (Business Logic)       │
    └─────┬──────────────┬───────┘
          │              │
    ┌─────┴──────┐ ┌─────┴──────┐
    │Repositories│ │  External  │
    │ (Database) │ │  Clients   │
    └─────┬──────┘ └─────┬──────┘
          │              │
    ┌─────┴──────┐ ┌─────┴──────┐
    │  SQLite    │ │fail2ban /  │
    │            │ │HTTP APIs   │
    └────────────┘ └────────────┘

2.1 Project Structure

backend/
├── app/
│   ├── __init__.py
│   ├── main.py                # FastAPI app factory, lifespan, exception handlers
│   ├── config.py              # Pydantic settings (env vars, .env loading)
│   ├── db.py                  # Database connection and initialization
│   ├── exceptions.py          # Shared domain exception classes; all services and routers import from here
│   ├── dependencies.py        # FastAPI Depends() providers (DB, services, auth)
│   ├── models/                # Pydantic schemas
│   │   ├── auth.py            #   Login request/response, session models
│   │   ├── ban.py             #   Ban request/response/domain models
│   │   ├── jail.py            #   Jail request/response/domain models
│   │   ├── config.py          #   Configuration view/edit models
│   │   ├── blocklist.py       #   Blocklist source/import models
│   │   ├── history.py         #   Ban history models
│   │   ├── server.py          #   Server status, health check models
│   │   └── setup.py           #   Setup wizard models
│   ├── routers/               # FastAPI routers (HTTP layer only)
│   │   ├── auth.py            #   POST /api/auth/login, POST /api/auth/logout
│   │   ├── setup.py           #   POST /api/setup (first-run configuration)
│   │   ├── dashboard.py       #   GET /api/dashboard/status, GET /api/dashboard/bans
│   │   ├── jails.py           #   CRUD + controls for jails
│   │   ├── bans.py            #   Ban/unban actions, currently banned list
│   │   ├── config.py          #   View/edit fail2ban configuration
│   │   ├── history.py         #   Historical ban queries
│   │   ├── blocklist.py       #   Blocklist source management, manual import trigger
│   │   ├── geo.py             #   IP geolocation and lookup
│   │   └── server.py          #   Server settings (log level, DB purge, etc.)
│   ├── services/              # Business logic (one service per domain)
│   │   ├── auth_service.py    #   Password verification, session creation/validation
│   │   ├── setup_service.py   #   First-run setup logic, configuration persistence
│   │   ├── jail_service.py    #   Jail listing, start/stop/reload, status aggregation
│   │   ├── ban_service.py     #   Ban/unban execution, currently-banned queries
│   │   ├── config_service.py  #   Read/write fail2ban config, regex validation
│   │   ├── config_file_service.py #   Shared config parsing and file-level operations
│   │   ├── raw_config_io_service.py #   Raw config file I/O wrapper
│   │   ├── jail_config_service.py #   jail config activation/deactivation logic
│   │   ├── filter_config_service.py #   filter config lifecycle management
│   │   ├── action_config_service.py #   action config lifecycle management
│   │   ├── log_service.py     #   Log preview and regex test operations
│   │   ├── fail2ban_metadata_service.py #   Resolve and cache the fail2ban SQLite DB path via the fail2ban socket
│   │   ├── history_service.py #   Historical ban queries, per-IP timeline
│   │   ├── blocklist_service.py # Orchestration: source CRUD, scheduling, import triggers
│   │   ├── blocklist_downloader.py #   HTTP download with retry logic
│   │   ├── blocklist_parser.py #   Parse and validate IP addresses
│   │   ├── blocklist_ban_executor.py #   Ban execution with error handling
│   │   ├── blocklist_import_workflow.py #   Import orchestration (coordinates components)
│   │   ├── geo_service.py     #   IP-to-country resolution, ASN/RIR lookup
│   │   ├── server_service.py  #   Server settings, log management, DB purge
│   │   └── health_service.py  #   fail2ban connectivity checks, version detection
│   ├── repositories/          # Data access layer (raw queries only)
│   │   ├── settings_repo.py   #   App configuration CRUD in SQLite
│   │   ├── session_repo.py    #   Session storage and lookup
│   │   ├── blocklist_repo.py  #   Blocklist sources and import log persistence
│   │   ├── fail2ban_db_repo.py #   fail2ban SQLite ban history read operations
│   │   ├── geo_cache_repo.py  #   IP geolocation cache persistence
│   │   └── import_log_repo.py #   Import run history records
│   ├── tasks/                 # APScheduler background jobs
│   │   ├── blocklist_import.py #   Scheduled blocklist download and application
│   │   ├── geo_cache_flush.py #   Periodic geo cache persistence (dirty-set flush to SQLite)
│   │   ├── geo_cache_cleanup.py #   Periodic purge of stale geo cache entries
│   │   ├── geo_re_resolve.py  #   Periodic re-resolution of stale geo cache records
│   │   └── health_check.py    #   Periodic fail2ban connectivity probe
│   └── utils/                 # Helpers, constants, shared types
│       ├── fail2ban_client.py #   Async wrapper around the fail2ban socket protocol
│       ├── fail2ban_response.py #   Canonical response parsing: ok(), to_dict(), ensure_list(), is_not_found_error()
│       ├── fail2ban_db_utils.py #   fail2ban database query helpers
│       ├── ip_utils.py        #   IP/CIDR validation and normalisation
│       ├── time_utils.py      #   Timezone-aware datetime helpers
│       ├── config_file_utils.py #   fail2ban config file I/O
│       ├── conffile_parser.py #   fail2ban config file parser/serializer
│       ├── config_parser.py   #   Structured config object parser
│       ├── config_writer.py   #   Atomic config file write operations
│       ├── jail_config.py     #   Jail config helper
│       └── constants.py       #   Shared constants (default paths, limits, etc.)
├── tests/
│   ├── conftest.py            # Shared fixtures (test app, client, mock DB)
│   ├── test_routers/          # One test file per router
│   ├── test_services/         # One test file per service
│   └── test_repositories/     # One test file per repository
├── pyproject.toml
└── .env.example

2.2 Module Purposes

Routers (app/routers/)

The HTTP interface layer. Each router maps URL paths to handler functions. Routers parse and validate incoming requests using Pydantic models, delegate all logic to services, and return typed responses. They contain zero business logic.

| Router | Prefix | Purpose |
|---|---|---|
| auth.py | /api/auth | Login (password check), logout, session validation |
| setup.py | /api/setup | First-run wizard: save initial configuration |
| dashboard.py | /api/dashboard | Server status bar data, recent bans for the dashboard |
| jails.py | /api/jails | List jails, jail detail, start/stop/reload/idle controls |
| bans.py | /api/bans | Ban an IP, unban an IP, unban all, list currently banned IPs |
| config.py | /api/config | Read and write fail2ban jail/filter/server configuration via the socket; also serves the fail2ban log tail and service status for the Log tab |
| file_config.py | /api/config | Read and write fail2ban config files on disk (jail.d/, filter.d/, action.d/): list, get, and overwrite raw file contents, toggle jail enabled/disabled |
| history.py | /api/history | Query historical bans, per-IP timeline |
| blocklist.py | /api/blocklists | CRUD blocklist sources, trigger import, view import logs |
| geo.py | /api/geo | IP geolocation lookup, ASN and RIR data |
| server.py | /api/server | Log level, log target, DB path, purge age, flush logs |
| health.py | /api/health | fail2ban connectivity health check and status |

Services (app/services/)

The business logic layer. Services orchestrate operations, enforce rules, and coordinate between repositories, the fail2ban client, and external APIs. Each service covers a single domain.

Service Layer Responsibilities:

Services must be independent of HTTP concerns. They work with domain models (DTOs), not response models. This ensures:

  • Domain logic can evolve without affecting API shape
  • Services are reusable across different frontends
  • Testing is simpler (no mocking HTTP response types)
  • Changes to endpoint responses don't require service changes

Domain Models and Response Mapping:

Services return domain models (e.g., DomainActiveBanList, DomainBansByCountry) that represent pure business logic. Response models (e.g., ActiveBanListResponse, BansByCountryResponse) are defined in app/models/ and used only by routers.

Conversion happens at the router boundary:

  1. Router calls service → receives domain model
  2. Router calls mapper function to convert domain model → response model
  3. Router returns response model to HTTP client

Example:

# In ban_service.py
async def get_active_bans(...) -> DomainActiveBanList:
    """Service returns domain model (not HTTP-aware)."""
    ...

# In routers/bans.py (router boundary)
domain_result = await ban_service.get_active_bans(...)
return map_domain_active_ban_list_to_response(domain_result)

Mapper functions live in app/mappers/ and are thin, mechanical translations between structures.
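
As a minimal sketch of the boundary conversion described above, the following uses plain dataclasses on both sides so that it runs without dependencies; in BanGUI the response side would be a Pydantic model from app/models/. The field names (ip, jail, country_code), the camelCase response shape, and the total field are assumptions made for illustration, not the project's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DomainActiveBan:
    """Domain model returned by the service (field names are illustrative)."""
    ip: str
    jail: str
    country_code: Optional[str]


@dataclass(frozen=True)
class DomainActiveBanList:
    bans: tuple


@dataclass(frozen=True)
class ActiveBan:
    """Stand-in for the response model (a Pydantic model in the real code)."""
    ip: str
    jail: str
    countryCode: Optional[str]


@dataclass(frozen=True)
class ActiveBanListResponse:
    items: list
    total: int


def map_domain_active_ban_to_response(ban: DomainActiveBan) -> ActiveBan:
    # Mechanical field-by-field translation; no business rules live here.
    return ActiveBan(ip=ban.ip, jail=ban.jail, countryCode=ban.country_code)


def map_domain_active_ban_list_to_response(
    result: DomainActiveBanList,
) -> ActiveBanListResponse:
    items = [map_domain_active_ban_to_response(b) for b in result.bans]
    # Response-only metadata (here a total) is added in the mapper,
    # so the service and its tests never see it.
    return ActiveBanListResponse(items=items, total=len(items))
```

Because the mapper is the only place that knows both shapes, renaming a response field touches this function and the response model, and nothing in the service layer.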

Motivation:

  • The fail2ban domain doesn't care about field names like country_code (snake_case) vs countryCode (camelCase)
  • If the API needs pagination metadata added to the response, only the mapper changes
  • If repositories change their output schema, only services need updating (routers are unaffected)
  • Services can be tested with simple dataclasses; no need for Pydantic serialization overhead

| Service | Purpose |
|---|---|
| auth_service.py | Hashes and verifies the master password, creates and validates session tokens, enforces session expiry |
| setup_service.py | Validates setup input, persists initial configuration, ensures setup runs only once |
| jail_service.py | Retrieves jail list and details from fail2ban, aggregates metrics (banned count, failure count), sends start/stop/reload/idle commands |
| ban_service.py | Executes ban and unban commands via the fail2ban socket, queries the currently banned IP list, validates IPs before banning |
| config_service.py | Reads active jail and filter configuration from fail2ban, writes configuration changes, validates regex patterns, triggers reload; reads the fail2ban log file tail and queries service status for the Log tab |
| file_config_service.py | Reads and writes raw fail2ban config files on disk (jail.d/, filter.d/, action.d/); lists files, reads content, overwrites files, toggles enabled/disabled |
| jail_config_service.py | Discovers inactive jails by parsing jail.conf / jail.local / jail.d/*; writes .local overrides to activate/deactivate jails; triggers fail2ban reload; validates jail configurations |
| filter_config_service.py | Discovers available filters by scanning filter.d/; reads, creates, updates, and deletes filter definitions; assigns filters to jails |
| action_config_service.py | Discovers available actions by scanning action.d/; reads, creates, updates, and deletes action definitions; assigns actions to jails |
| config_file_service.py | Shared utilities for configuration parsing and manipulation: parses config files, validates names/IPs, manages atomic file writes, probes fail2ban socket |
| raw_config_io_service.py | Low-level file I/O for raw fail2ban config files |
| fail2ban_metadata_service.py | Resolves the fail2ban SQLite database path by querying the fail2ban socket and caches the result for reuse across services |
| log_service.py | Log preview and regex test operations (extracted from config_service) |
| history_service.py | Queries the fail2ban database for historical ban records, builds per-IP timelines, computes ban counts and repeat-offender flags, and syncs new records into BanGUI's archive table |
| blocklist_service.py | Orchestration layer for blocklist imports. Delegates to focused components: BlocklistDownloader (HTTP download with retry), BlocklistParser (IP validation), BanExecutor (fail2ban integration), and BlocklistImportWorkflow (orchestrates the flow). Maintains public API for source CRUD, preview, scheduling, and import triggers. |
| geo_cache.py | GeoCache class that encapsulates all IP geolocation caching: resolves IP addresses to country, ASN, and organization using a primary local MaxMind GeoLite2-Country database (if available) with optional HTTP fallback to ip-api.com (disabled by default for security). Maintains in-memory and persistent caches with negative cache support, and manages background re-resolution. Instantiated once at startup with allow_http_fallback flag and stored on app.state.geo_cache |
| geo_service.py | (Deprecated) Backward-compatibility wrappers that delegate to the GeoCache instance. Kept for compatibility with existing code. New code should use GeoCache directly or via dependency injection |
| server_service.py | Reads and writes fail2ban server-level settings (log level, log target, syslog socket, DB location, purge age) |
| health_service.py | Probes fail2ban socket connectivity, retrieves server version and global stats, reports online/offline status |

Blocklist Import Architecture

The blocklist import flow has been refactored to separate concerns into focused components:

blocklist_service.py (Public API)
    │
    ├─ import_source() ──┐
    │                    │
    └─ import_all()      ├──> BlocklistImportWorkflow (Orchestrator)
                         │         │
                         │         ├──> BlocklistDownloader
                         │         │       • HTTP GET with retry logic
                         │         │       • Exponential backoff (429, 5xx)
                         │         │       • Timeout handling
                         │         │
                         │         ├──> BlocklistParser
                         │         │       • Parse text to IP lines
                         │         │       • Validate IPv4/IPv6 addresses
                         │         │       • Skip CIDRs and malformed entries
                         │         │
                         │         ├──> BanExecutor
                         │         │       • Ban each IP via fail2ban socket
                         │         │       • Abort on JailNotFoundError
                         │         │       • Continue on individual ban failures
                         │         │
                         │         └──> Geo pre-warming
                         │               (optional batch lookup for newly banned IPs)
                         │
                         └──> Result logging (import_log_repo)

Component Responsibilities:

  • BlocklistDownloader: Handles HTTP transport concerns (retries, timeouts, backoff)
  • BlocklistParser: Handles parsing and validation logic (clean, testable, no I/O)
  • BanExecutor: Handles fail2ban integration with error aggregation
  • BlocklistImportWorkflow: Coordinates the flow, handles result aggregation and geo pre-warming
  • blocklist_service.py: Maintains public API (source CRUD, scheduling, import triggers)
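
The parser's "no I/O" responsibility can be illustrated with a short sketch. It is not the project's BlocklistParser; the ParseResult type, the function name, and the rule that '#' starts a comment are assumptions. Only the behaviour named above (validate IPv4/IPv6, skip CIDRs and malformed entries) is taken from the text.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class ParseResult:
    valid_ips: list = field(default_factory=list)
    skipped: int = 0  # CIDRs and malformed entries


def parse_blocklist(text: str) -> ParseResult:
    """Extract single IPv4/IPv6 addresses from blocklist text (pure function)."""
    result = ParseResult()
    for raw in text.splitlines():
        # Assumption: everything after '#' is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        try:
            # ip_address() raises ValueError for CIDR notation such as
            # 10.0.0.0/8 and for anything that is not an address.
            result.valid_ips.append(str(ipaddress.ip_address(line)))
        except ValueError:
            result.skipped += 1
    return result
```

Since the function takes text and returns data, it can be unit-tested without a network, a socket, or a database.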

Benefits of This Architecture:

  • Each component is independently testable with mock dependencies
  • Error handling is clear: JailNotFoundError stops processing, JailOperationError continues
  • Components can be evolved independently (e.g., replace HTTP client, add batch validation)
  • Logging is contextual and tied to the appropriate layer
  • Retry logic and transient error handling are isolated
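
The error policy above (abort on JailNotFoundError, continue on JailOperationError) might look roughly like the following. The two exception classes are local stand-ins for the ones in app/exceptions.py, and BanSummary, execute_bans, and the ban_ip callable are hypothetical names used only for this sketch.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


class JailNotFoundError(Exception):
    """Stand-in: the target jail does not exist, so no ban can succeed."""


class JailOperationError(Exception):
    """Stand-in: one ban command failed; other IPs may still succeed."""


@dataclass
class BanSummary:
    banned: int = 0
    errors: list = field(default_factory=list)
    aborted: bool = False


async def execute_bans(
    ips: list,
    ban_ip: Callable[[str], Awaitable[None]],  # hypothetical socket call
) -> BanSummary:
    summary = BanSummary()
    for ip in ips:
        try:
            await ban_ip(ip)
            summary.banned += 1
        except JailNotFoundError as exc:
            # Fatal for the whole run: record the error and stop.
            summary.errors.append(f"{ip}: {exc}")
            summary.aborted = True
            break
        except JailOperationError as exc:
            # Per-IP failure: record it and keep going.
            summary.errors.append(f"{ip}: {exc}")
    return summary


def run_bans(ips: list, ban_ip: Callable[[str], Awaitable[None]]) -> BanSummary:
    """Synchronous convenience wrapper for scripts and tests."""
    return asyncio.run(execute_bans(ips, ban_ip))
```

Passing the ban call in as a parameter is what makes the component testable with a fake: no fail2ban socket is needed to exercise both error paths.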

DNS-Rebinding Protection

The Vulnerability:

A DNS-rebinding attack exploits a time-of-check-to-time-of-use (TOCTOU) window between when a blocklist URL is validated and when it is actually fetched:

  1. User adds blocklist URL http://attacker.com/blocklist.txt
  2. blocklist_service.create_source() calls validate_blocklist_url() which performs DNS resolution
  3. attacker.com resolves to a public IP (attacker's real server) — validation passes ✓
  4. Later, when BlocklistDownloader fetches the URL, the attacker's DNS server responds with 192.168.1.1
  5. The HTTP client connects to the private IP, potentially accessing internal services

The Protection:

BanGUI closes this window by adding a second DNS-rebinding check at connection time:

  1. Create-time validation (app/utils/ip_utils.py:validate_blocklist_url): Confirms the URL resolves to a public IP when created
  2. Connection-time validation (app/services/dns_validated_connector.py): Validates that all resolved IPs are public when the actual HTTP connection is made

The HTTP session is created with a custom socket factory that intercepts DNS resolution results before socket creation. If any resolved IP is private or reserved, the connection is rejected with a clear error.

Implementation:

  • app/services/dns_validated_connector.py: Provides create_dns_validated_socket_factory() which returns a socket factory that validates IPs using is_private_ip()
  • app/startup.py:_create_http_session(): Passes the socket factory to aiohttp.TCPConnector, protecting all HTTP requests globally
  • All blocklist imports automatically inherit this protection through the shared session

Protected IP Ranges:

The validation blocks all RFC 1918 private ranges, loopback, link-local, ULA, multicast, and reserved addresses:

  • IPv4: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, 224.0.0.0/4, 240.0.0.0/4, 255.255.255.255/32
  • IPv6: ::1/128, fe80::/10, fc00::/7, ff00::/8, and others (via ipaddress.IPv6Address.is_private, etc.)
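
A check of this kind can be sketched with the standard ipaddress module. The body below is an assumption about how is_private_ip() could be written, not a copy of the project's implementation, and all_public() is a hypothetical helper that expresses the connection-time rule (reject if any resolved address is non-public).

```python
import ipaddress
from typing import Iterable


def is_private_ip(address: str) -> bool:
    """Return True for addresses that must never be connected to."""
    ip = ipaddress.ip_address(address)
    return (
        ip.is_private        # RFC 1918, ULA (fc00::/7), and similar ranges
        or ip.is_loopback    # 127.0.0.0/8, ::1
        or ip.is_link_local  # 169.254.0.0/16, fe80::/10
        or ip.is_multicast   # 224.0.0.0/4, ff00::/8
        or ip.is_reserved    # 240.0.0.0/4 and other reserved blocks
        or ip.is_unspecified # 0.0.0.0, ::
    )


def all_public(resolved: Iterable[str]) -> bool:
    """Connection-time rule: every resolved address must be public."""
    addresses = list(resolved)
    # An empty resolution result is treated as a failure, not a pass.
    return bool(addresses) and not any(is_private_ip(a) for a in addresses)
```

Checking every resolved address, rather than only the first, matters because a hostile DNS response can mix one public record with one private record.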

Startup DAG (app/startup_dag.py, app/startup.py)

The startup process is orchestrated by an explicit Directed Acyclic Graph (DAG) that defines all resource initialization stages, their dependencies, health checks, and rollback strategy. This replaces implicit ordering with explicit, documented prerequisites.

Why This Exists:

Previously, startup resources were created in a procedural sequence without documented dependencies. If a stage was reordered or a prerequisite was missed, initialization could fail in non-obvious ways. Partial failures could leave stale resources (open database connections, HTTP sessions, running schedulers) that prevented clean rollback.

Startup Stages (in order):

1. WORKER_MODE
   └─ Validates that BANGUI_WORKERS=1 (scheduler cannot run in multiple workers)

2. DATABASE
   ├─ Prerequisite: WORKER_MODE
   ├─ Creates database directory
   ├─ Initializes database schema
   ├─ Caches setup completion state
   └─ Loads persisted runtime settings

3. GEO_CACHE
   ├─ Prerequisite: DATABASE
   ├─ Loads IP geolocation cache from database
   ├─ Counts unresolved IPs
   ├─ Initializes MaxMind GeoLite2 database
   └─ Configures HTTP fallback (if enabled)

4. HTTP_SESSION
   ├─ Prerequisite: GEO_CACHE
   ├─ Creates aiohttp.ClientSession
   └─ Configures timeouts and connection limits

5. SCHEDULER
   ├─ Prerequisite: HTTP_SESSION
   ├─ Creates APScheduler AsyncIOScheduler
   └─ Starts the scheduler

6. TASKS
   ├─ Prerequisite: SCHEDULER
   ├─ Registers health_check task (fail2ban connectivity probe)
   ├─ Registers blocklist_import task (scheduled imports)
   ├─ Registers geo_cache_cleanup task (stale entry purge)
   ├─ Registers geo_cache_flush task (periodic persistence)
   ├─ Registers geo_re_resolve task (stale record re-resolution)
   ├─ Registers history_sync task (ban history sync)
   └─ Registers session_cleanup task (expired session purge)
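
The stage list above is a dependency graph, and a valid execution order can be derived from it mechanically. The sketch below uses the standard library's graphlib for that purpose; this is a swapped-in technique for illustration, and StartupDAG may resolve ordering differently. Stage names are written as plain strings rather than StartupStage members.

```python
from graphlib import TopologicalSorter

# Stage -> prerequisites, copied from the stage list above.
PREREQUISITES = {
    "WORKER_MODE": set(),
    "DATABASE": {"WORKER_MODE"},
    "GEO_CACHE": {"DATABASE"},
    "HTTP_SESSION": {"GEO_CACHE"},
    "SCHEDULER": {"HTTP_SESSION"},
    "TASKS": {"SCHEDULER"},
}


def startup_order(prerequisites: dict) -> list:
    """Return stages so that every prerequisite precedes its dependants.

    Raises graphlib.CycleError if the declared dependencies form a cycle,
    which is the failure an explicit DAG is meant to surface early.
    """
    return list(TopologicalSorter(prerequisites).static_order())
```

With a strictly linear chain like this one there is exactly one valid order, so the result matches the numbered list.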

Failure Mode & Rollback:

If any stage fails:

  1. All completed stages are rolled back in reverse order (TASKS → SCHEDULER → HTTP_SESSION → GEO_CACHE → DATABASE → WORKER_MODE)
  2. Each rollback suppresses exceptions to ensure all resources are cleaned up
  3. Database connections are closed
  4. HTTP sessions are closed
  5. The scheduler is shut down
  6. The application startup fails with a clear error message
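
The rollback behaviour in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the StartupDAG implementation: run_with_rollback and the (name, start, rollback) tuple shape are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

Step = Callable[[], Awaitable[None]]


async def run_with_rollback(stages: list) -> list:
    """Run (name, start, rollback) triples in order.

    If a stage fails, completed stages are rolled back in reverse order.
    Rollback errors are suppressed so that every completed stage gets its
    cleanup attempt, then a RuntimeError naming the failed stage is raised.
    Returns the names of the completed stages on success.
    """
    completed = []
    for name, start, rollback in stages:
        try:
            await start()
        except Exception as exc:
            for _, undo in reversed(completed):
                try:
                    await undo()
                except Exception:
                    pass  # keep cleaning up the remaining stages
            raise RuntimeError(f"startup failed at stage {name}") from exc
        completed.append((name, rollback))
    return [done for done, _ in completed]


def run_startup(stages: list) -> list:
    """Synchronous wrapper, convenient in scripts and tests."""
    return asyncio.run(run_with_rollback(stages))
```

Chaining the original exception with `from exc` keeps the root cause visible in the traceback while the message states which stage failed.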

Health Checks:

After all stages complete, a final health check verifies:

  • All resources have initialized successfully
  • Resources pass their individual health_check() methods
  • No failures occurred during any stage

Implementation:

  • StartupDAG: Orchestrates the entire flow, manages prerequisites, and handles failures
  • StartupStage: Enum defining the 6 startup stages
  • StageDependency: Defines stage metadata (description, prerequisites, rollback policy)
  • StartupContext: Tracks registered resources, completed stages, and failure state
  • startup_shared_resources(): Main entry point that builds and executes the DAG
  • stage*(): Functions that implement each stage's initialization logic

Example Usage in Tests:

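# Test that a stage with missing prerequisites fails.
# Assumes pytest with pytest-asyncio; the import path follows the heading above.
import aiohttp
import pytest

from app.startup_dag import StartupDAG, StartupStage


@pytest.mark.asyncio
async def test_stage_with_missing_prerequisite_fails():
    dag = StartupDAG()
    dag.register_stage(StartupStage.HTTP_SESSION, "Create HTTP session",
                       prerequisites=frozenset([StartupStage.DATABASE]))
    dag.register_stage(StartupStage.SCHEDULER, "Create scheduler")

    async def http_session_func():
        return aiohttp.ClientSession()

    # DATABASE has not completed, so executing HTTP_SESSION raises RuntimeError
    with pytest.raises(RuntimeError):
        await dag.execute_stage(StartupStage.HTTP_SESSION, http_session_func)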

Mappers (app/mappers/)

The response mapping layer. Mappers convert domain models (returned by services) to response models (consumed by HTTP routers). This layer enforces the separation between business logic and API shape.

Location: app/mappers/

Responsibilities:

  • Convert service domain models to API response models
  • Mechanical, thin translation — no business logic
  • Used exclusively at the router boundary

Pattern:

Each domain model has a corresponding mapper function:

# Domain model (from service) → mapper function → response model
# DomainActiveBan → map_domain_active_ban_to_response() → ActiveBan (response)

# Service returns domain models:
async def get_active_bans(...) -> DomainActiveBanList: ...

# Router converts at the boundary:
domain_result = await ban_service.get_active_bans(...)
return map_domain_active_ban_list_to_response(domain_result)

Why separate?

When API requirements change (e.g., a field is added or renamed), only two things change:

  1. The response model in app/models/
  2. The mapper function in app/mappers/

Routers and services stay the same.

Without this layer, changes to API shape would require modifying services and their tests.

Repositories (app/repositories/)

The data access layer. Repositories execute raw SQL queries against the application SQLite database. They return plain data or domain models — they never raise HTTP exceptions or contain business logic.

| Repository | Purpose |
|---|---|
| settings_repo.py | CRUD operations for application settings (master password hash, DB path, fail2ban socket path, preferences) |
| session_repo.py | Store, retrieve, and delete session records for authentication |
| blocklist_repo.py | Persist blocklist source definitions (name, URL, enabled/disabled) |
| fail2ban_db_repo.py | Read historical ban records from the fail2ban SQLite database |
| geo_cache_repo.py | Persist and query IP geo resolution cache |
| import_log_repo.py | Record import run results (timestamp, source, IPs imported, errors) for the import log view |

Every repository in app/repositories/ has a corresponding protocol in app/repositories/protocols.py, including settings_repo.py and history_archive_repo.py.
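
The benefit of pairing each repository with a protocol is that services can be written and tested against the interface alone. The sketch below shows the idea with typing.Protocol; the SessionRepository name, its method names, and the logout() function are assumptions for illustration and are not taken from app/repositories/protocols.py.

```python
import asyncio
from typing import Optional, Protocol


class SessionRepository(Protocol):
    """Structural interface a service depends on (method names assumed)."""

    async def get(self, token: str) -> Optional[dict]: ...
    async def delete(self, token: str) -> None: ...


class InMemorySessionRepo:
    """Test double: satisfies the protocol without touching SQLite."""

    def __init__(self) -> None:
        self._rows = {}

    async def put(self, token: str, row: dict) -> None:
        self._rows[token] = row

    async def get(self, token: str) -> Optional[dict]:
        return self._rows.get(token)

    async def delete(self, token: str) -> None:
        self._rows.pop(token, None)


async def logout(repo: SessionRepository, token: str) -> bool:
    """Service-level function written against the protocol only."""
    if await repo.get(token) is None:
        return False
    await repo.delete(token)
    return True


def logout_sync(repo: SessionRepository, token: str) -> bool:
    """Synchronous wrapper for quick experiments."""
    return asyncio.run(logout(repo, token))
```

InMemorySessionRepo never inherits from SessionRepository; a type checker accepts it because the method signatures match, which keeps test doubles free of any import from the repository layer.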

Models (app/models/)

Pydantic schemas that define data shapes and validation. Models are split into three categories per domain: request, response, and domain models.

| Model file | Purpose |
|---|---|
| auth.py | Login request/response and session models |
| ban.py | Ban creation and lookup models |
| blocklist.py | Blocklist source and import log models |
| config.py | fail2ban config view/edit models |
| file_config.py | Raw config file read/write models |
| geo.py | Geo and ASN lookup models |
| history.py | Historical ban query and timeline models |
| jail.py | Jail listing and status models |
| server.py | Server status and settings models |
| setup.py | First-run setup wizard models |

Model Layering Rules: Models are pure data classes (leaf nodes) in the dependency graph. They must not import from application-layer modules (app.services, app.config, app.utils). Models may import from:

  • Standard library and third-party packages (Pydantic, typing)
  • Other models in app.models/ (sibling models)
  • app.models.response (response envelopes)

Critical Constraint — No I/O or Side Effects: Pydantic validators, field defaults, and computed fields must be pure functions with no side effects:

  • NO imports from app.config, app.services, app.utils, or app.routers (these are application-layer modules)
  • NO calls to get_settings(), file I/O, database queries, network calls, or any runtime-dependent functions
  • NO default_factory that calls app-layer functions

These constraints ensure that importing a model file does not trigger application initialization, and they prevent hidden circular dependencies.

Validation that requires access to app-level state (e.g., allowed log directories, settings, database) belongs in the router or service layer, not in model validators. It runs at the boundary, where settings and services are already available.
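
The split between pure model validation and boundary validation can be sketched as below. To keep the example dependency-free, a frozen dataclass stands in for the Pydantic model; LogPathRequest, validate_log_path, and the allowed_dirs parameter are hypothetical names. The point illustrated is only where each check lives.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class LogPathRequest:
    """Stand-in for a Pydantic model: validation here is pure."""
    path: str

    def __post_init__(self) -> None:
        # Pure check: depends only on the value, never on settings or I/O.
        if not PurePosixPath(self.path).is_absolute():
            raise ValueError("path must be absolute")


def validate_log_path(request: LogPathRequest, allowed_dirs: list) -> str:
    """Boundary check: runs in a router or service.

    allowed_dirs is supplied by the caller (for example from settings);
    the model module never imports app.config to obtain it.
    """
    path = PurePosixPath(request.path)
    if ".." in path.parts:
        raise ValueError("path must not contain '..'")
    for base in allowed_dirs:
        if path.is_relative_to(base):
            return str(path)
    raise PermissionError(f"{path} is outside the allowed log directories")
```

The model can be imported and instantiated in a unit test without any application state, while the state-dependent rule receives that state as an explicit argument.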

Tasks (app/tasks/)

APScheduler background jobs that run on a schedule without user interaction.

| Task | Purpose |
|---|---|
| blocklist_import.py | Downloads all enabled blocklist sources, validates entries, applies bans, records results in the import log |
| geo_cache_cleanup.py | Periodically removes entries from the geo_cache table that have not been referenced in the configured retention period (default: 90 days). Prevents unbounded database growth. |
| geo_cache_flush.py | Periodically flushes newly resolved IPs from the in-memory dirty set to the geo_cache SQLite table (default: every 60 seconds). GET requests populate only the in-memory cache; this task persists them without blocking any request. |
| geo_re_resolve.py | Periodically re-resolves stale entries in geo_cache to keep geolocation data fresh |
| health_check.py | Periodically pings the fail2ban socket and updates the cached server status so the frontend always has fresh data |
| history_sync.py | Periodically copies new records from the fail2ban SQLite database into BanGUI's history_archive table; delegates the sync algorithm to history_service.py |
| session_cleanup.py | Periodically removes expired sessions from the sessions SQLite table (default: every 6 hours). Without this cleanup, the table grows unbounded and degrades query performance. |
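
The dirty-set flush performed by geo_cache_flush.py can be illustrated with a small sketch. It uses the synchronous sqlite3 module so that it is self-contained, whereas BanGUI uses aiosqlite; the GeoCacheSketch class and the two-column geo_cache schema are assumptions, not the project's actual code or table layout.

```python
import sqlite3


class GeoCacheSketch:
    """In-memory cache plus a dirty set that a periodic task flushes."""

    def __init__(self) -> None:
        self._cache = {}
        self._dirty = set()

    def remember(self, ip: str, country: str) -> None:
        # Request path: memory only, no database write.
        self._cache[ip] = country
        self._dirty.add(ip)

    def flush(self, conn: sqlite3.Connection) -> int:
        # Task path: swap the dirty set first, so entries added while the
        # write is in progress land in the next flush instead of being lost.
        pending, self._dirty = self._dirty, set()
        rows = [(ip, self._cache[ip]) for ip in pending]
        conn.executemany(
            "INSERT OR REPLACE INTO geo_cache (ip, country) VALUES (?, ?)",
            rows,
        )
        conn.commit()
        return len(rows)
```

A second flush with nothing new to persist writes zero rows, which is why the task is cheap to run at a short interval.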

Utils (app/utils/)

Pure helper modules with no framework dependencies.

Module Purpose
fail2ban_client.py Async client that communicates with fail2ban via its Unix domain socket — sends commands and parses responses using the fail2ban protocol. Modelled after ./fail2ban-master/fail2ban/client/csocket.py and ./fail2ban-master/fail2ban/client/fail2banclient.py.
jail_socket.py Low-level jail reload operations (reload_all) extracted to break service dependencies. Used by jail_service, jail_config_service, action_config_service, and filter_config_service to avoid circular imports between sibling services.
ip_utils.py Validates IPv4/IPv6 addresses and CIDR ranges using the ipaddress stdlib module, normalises formats
jail_utils.py Jail helper functions for configuration and status inference
jail_config.py Jail config parser and serializer for fail2ban config manipulation
time_utils.py Timezone-aware datetime construction, formatting helpers, time-range calculations
log_utils.py Structured log formatting and enrichment helpers
conffile_parser.py Parses Fail2ban .conf files into structured objects and serialises back to text
config_parser.py Builds structured config objects from file content tokens
config_writer.py Atomic config file writes, backups, and safe replace semantics
config_file_utils.py Common file-level config utility helpers
fail2ban_db_utils.py Fail2ban DB path discovery and ban-history parsing helpers
setup_utils.py Setup wizard helper utilities
constants.py Shared constants: default socket path, default database path, time-range presets, parser truthy values, limits
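
As an illustration of the kind of helper ip_utils.py provides, the sketch below validates and normalises addresses and CIDR ranges with the ipaddress stdlib module. The function names are hypothetical and may not match the real module.

```python
import ipaddress


def normalise_ip(value: str) -> str:
    """Return the canonical text form of an IPv4/IPv6 address; ValueError if invalid."""
    return str(ipaddress.ip_address(value.strip()))


def normalise_network(value: str) -> str:
    """Return the canonical CIDR form; strict=False tolerates set host bits."""
    return str(ipaddress.ip_network(value.strip(), strict=False))


def is_valid_ip_or_cidr(value: str) -> bool:
    """True when value parses as a single address or as a CIDR range."""
    for parser in (normalise_ip, normalise_network):
        try:
            parser(value)
            return True
        except ValueError:
            continue
    return False
```

Normalising before storage or comparison means that `2001:0DB8::0001` and `2001:db8::1` are treated as the same address.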

Configuration (app/config.py)

A single Pydantic settings model that loads all configuration from environment variables (prefixed BANGUI_) and an optional .env file. Validated at startup — the application refuses to start if required values are missing.
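
The fail-fast behaviour can be illustrated without Pydantic. The sketch below is a stdlib-only stand-in, not the real settings model: the field names database_path and log_level, the ConfigError class, and load_settings() are assumptions made for the example, and .env file loading is omitted.

```python
import os
from dataclasses import dataclass
from typing import Mapping

PREFIX = "BANGUI_"


class ConfigError(RuntimeError):
    """Raised at startup when required configuration is missing."""


@dataclass(frozen=True)
class Settings:
    database_path: str       # hypothetical required setting
    log_level: str = "INFO"  # hypothetical optional setting


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from BANGUI_-prefixed variables, failing fast when one is missing."""
    values = {
        key[len(PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(PREFIX)
    }
    if "database_path" not in values:
        raise ConfigError("BANGUI_DATABASE_PATH is required")
    return Settings(
        database_path=values["database_path"],
        log_level=values.get("log_level", "INFO").upper(),
    )
```

Calling `load_settings()` once during startup, before any router is mounted, gives the "refuses to start" behaviour: the exception aborts the process before it serves a request.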

Dependencies (app/dependencies.py)

FastAPI Depends() providers that inject shared resources into route handlers: the database connection, service instances, the authenticated session, and the fail2ban client. This is the wiring layer that connects routers to services without tight coupling.

Application Entry Point (app/main.py)

The FastAPI app factory. Responsibilities:

  • Creates the FastAPI instance with metadata (title, version, docs URL)
  • Registers the lifespan context manager (startup: open DB, create aiohttp session, start scheduler; shutdown: close all)
  • Mounts all routers
  • Registers global exception handlers that map domain exceptions to HTTP status codes
  • Applies the setup-redirect middleware (returns 423 Locked for all API requests when no configuration exists, except for /api/setup and /api/health)
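
The decision made by the setup-redirect middleware can be expressed as a pure function, which makes the exemption rules easy to test in isolation. This is a sketch under assumptions: the function name is hypothetical, non-API paths are assumed to pass through ungated, and exemptions are assumed to cover sub-paths of /api/setup and /api/health.

```python
from http import HTTPStatus
from typing import Optional

# Paths that stay reachable before setup is complete.
EXEMPT_PREFIXES = ("/api/setup", "/api/health")


def setup_gate_status(path: str, setup_complete: bool) -> Optional[int]:
    """Return 423 when the request must be blocked, or None to let it through."""
    if setup_complete:
        return None
    if not path.startswith("/api/"):
        return None  # assumption: only API requests are gated
    if any(path == p or path.startswith(p + "/") for p in EXEMPT_PREFIXES):
        return None
    return int(HTTPStatus.LOCKED)
```

A middleware would call this with the request path and the cached setup flag, and return a 423 response whenever the result is not None.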

2.3 Dependency Wiring and Service Composition

BanGUI uses a lightweight dependency injection (DI) pattern based on FastAPI's Depends() framework. There is no heavy container library — the composition root is implicit and managed through simple provider functions in app/dependencies.py.

The DI Pattern

Every injectable dependency follows this structure:

  1. Provider Function — An async function in app/dependencies.py that creates and returns a dependency:

    async def get_settings(app_context: ...) -> Settings:
        """Provide application settings."""
        return app_context.runtime_settings or app_context.settings
    
  2. Type Alias — An Annotated alias that decorates the provider for use in route signatures:

    SettingsDep = Annotated[Settings, Depends(get_settings)]
    
  3. Injection Point — Routers declare their dependencies using the type alias:

    async def my_route(settings: SettingsDep) -> Response:
        # FastAPI automatically calls get_settings() and injects the result
        ...
    

Service Composition Root

Services are not instantiated by a container. Instead, they are composed by routers and tasks through explicit parameter passing. This keeps dependencies visible and avoids implicit side effects.

Example: How ban_service.get_active_bans() is wired:

# Step 1: Provider bundles what the router needs (dependencies.py)
async def get_ban_service_context(
    db: Annotated[aiosqlite.Connection, Depends(get_db)],
    fail2ban_db_repo: Annotated[Fail2BanDbRepository, Depends(get_fail2ban_db_repo)],
) -> BanServiceContext:
    """Combine database connection and repository."""
    return BanServiceContext(db=db, fail2ban_db_repo=fail2ban_db_repo)

BanServiceContextDep = Annotated[BanServiceContext, Depends(get_ban_service_context)]

# Step 2: Router uses the context and calls the service
@router.get("/active")
async def get_active_bans(
    ban_ctx: BanServiceContextDep,
    socket_path: Fail2BanSocketDep,
    geo_cache: GeoCacheDep,
) -> ActiveBanListResponse:
    # Router explicitly passes everything the service needs
    domain_result = await ban_service.get_active_bans(
        socket_path,
        geo_cache=geo_cache,
        app_db=ban_ctx.db,  # ← Explicit, no magic
    )
    return map_domain_active_ban_list_to_response(domain_result)

# Step 3: Service function accepts dependencies as parameters
async def get_active_bans(
    socket_path: str,
    geo_cache: GeoCache,
    app_db: aiosqlite.Connection,
) -> DomainActiveBanList:
    """Retrieve active bans. All dependencies are explicit parameters."""
    # Service logic here
    ...

Why this pattern?

  • Explicit: No hidden coupling. Every dependency is visible in function signatures.
  • Testable: Easy to mock dependencies by passing test doubles.
  • Lightweight: No heavyweight DI container library needed. FastAPI's Depends() is sufficient.
  • Debuggable: Stack traces and type checkers understand the full dependency chain.
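
The testability claim can be made concrete with a small example. The sketch below uses hypothetical names throughout (GeoLookup, enrich_bans, FakeGeoCache); it shows only that a service function with explicit parameters can be exercised with a test double and no FastAPI machinery.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional, Protocol


class GeoLookup(Protocol):
    """The slice of the geo cache this service needs (hypothetical interface)."""

    def country_for(self, ip: str) -> Optional[str]: ...


@dataclass
class EnrichedBan:
    ip: str
    country: Optional[str]


async def enrich_bans(ips: List[str], geo_cache: GeoLookup) -> List[EnrichedBan]:
    """Service-style function: every collaborator arrives as a parameter."""
    return [EnrichedBan(ip=ip, country=geo_cache.country_for(ip)) for ip in ips]


class FakeGeoCache:
    """Test double: a dict lookup instead of a real geolocation backend."""

    def __init__(self, table: dict) -> None:
        self._table = table

    def country_for(self, ip: str) -> Optional[str]:
        return self._table.get(ip)


result = asyncio.run(
    enrich_bans(["203.0.113.7", "198.51.100.9"], FakeGeoCache({"203.0.113.7": "DE"}))
)
```

Because nothing is resolved implicitly, the test needs no patching: the fake is passed in exactly where the router would pass the real cache.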

Service Context Dependencies

For convenience, related repositories and the database connection are bundled into context objects. These keep routers from depending on the raw database connection directly, which would violate the repository boundary.

Available Service Contexts:

| Context | Includes | Used By |
|---|---|---|
| SessionServiceContext | db, session_repo | auth router |
| BlocklistServiceContext | db, blocklist_repo, import_log_repo, settings_repo | blocklist router |
| SettingsServiceContext | db, settings_repo | server settings router |
| BanServiceContext | db, fail2ban_db_repo | ban router |
| HistoryServiceContext | db, fail2ban_db_repo, history_archive_repo | history router |

Each context is created by a provider function:

async def get_ban_service_context(
    db: Annotated[aiosqlite.Connection, Depends(get_db)],
    fail2ban_db_repo: Annotated[Fail2BanDbRepository, Depends(get_fail2ban_db_repo)],
) -> BanServiceContext:
    return BanServiceContext(db=db, fail2ban_db_repo=fail2ban_db_repo)

Adding a New Service

Follow this checklist when creating a new service:

  1. Create the service module: app/services/my_service.py
  2. Define the service functions — Each function takes its dependencies as explicit parameters (no imports of other services at the same layer)
  3. Export key functions — Only the public API functions are called by routers
  4. If database access is needed:
    • Routers depend on the appropriate ServiceContextDep (e.g., BanServiceContextDep)
    • Pass context.db and context.repository to the service function
  5. If a new context is needed:
    • Create a @dataclass in app/dependencies.py to hold the related resources
    • Create a provider function get_<service>_context() that combines them
    • Create a type alias <Service>ContextDep for router injection
  6. Register the service — No registration step; FastAPI discovers it via Depends()

Example: Adding a new service that needs blocklist and settings repos:

# app/services/my_new_service.py
async def do_something(
    db: aiosqlite.Connection,
    blocklist_repo: BlocklistRepository,
    settings_repo: SettingsRepository,
) -> MyResult:
    """Do something with blocklist and settings data."""
    sources = await blocklist_repo.list_sources(db)
    settings = await settings_repo.load(db)
    # Business logic
    return ...

# app/routers/my_router.py
from app.dependencies import BlocklistServiceContextDep
from app.services import my_new_service

@router.get("/something")
async def my_endpoint(
    ctx: BlocklistServiceContextDep,  # ← Already has db, blocklist_repo, settings_repo
) -> MyResponse:
    result = await my_new_service.do_something(
        db=ctx.db,
        blocklist_repo=ctx.blocklist_repo,
        settings_repo=ctx.settings_repo,
    )
    return MyResponse(...)

The Repository Boundary

Routers must not depend on raw database connections directly. The repository boundary is enforced by not exporting DbDep to routers. Instead:

  • Routers declare a ServiceContextDep which includes both the db and the needed repositories
  • Services receive the db connection and repositories as parameters
  • Repositories are the only modules that execute SQL; services never call SQL directly

This ensures:

  • Queries are centralized and testable
  • Changes to the database layer don't leak into business logic
  • Repositories can be mocked independently for testing
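
A minimal sketch of the boundary, assuming a key-value settings table: the repository owns the SQL, and the service applies a business rule without ever writing a query. The class and function names are hypothetical, and the synchronous sqlite3 module stands in for aiosqlite.

```python
import sqlite3
from typing import Optional


class SettingsRepository:
    """The only place that knows the SQL for the settings table."""

    def get(self, db: sqlite3.Connection, key: str) -> Optional[str]:
        row = db.execute("SELECT value FROM settings WHERE key = ?", (key,)).fetchone()
        return row[0] if row else None

    def put(self, db: sqlite3.Connection, key: str, value: str) -> None:
        with db:  # one transaction per write
            db.execute(
                "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
                (key, value),
            )


def session_duration_minutes(
    db: sqlite3.Connection, repo: SettingsRepository, default: int = 60
) -> int:
    """Service-level rule: fall back to a default when nothing is stored. No SQL here."""
    raw = repo.get(db, "session_duration")
    return int(raw) if raw is not None else default


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
repo = SettingsRepository()
before = session_duration_minutes(db, repo)
repo.put(db, "session_duration", "30")
after = session_duration_minutes(db, repo)
```

The service receives the connection only to hand it on to the repository, so a schema change touches the repository alone.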

Lifecycle and Scope

  • Request-scoped: Database connections are created fresh for each request and closed after the response is sent. This prevents contention and locking issues with SQLite.
  • Application-scoped: Shared resources like aiohttp.ClientSession, the scheduler, and the GeoCache are created at startup and reused across all requests.
  • Singleton: Some services (e.g., Fail2BanMetadataService) are instantiated once and cached in app.state or imported as module-level instances.

3. Frontend Architecture

The frontend is a React single-page application built with TypeScript, Vite, and Fluent UI v9. It communicates exclusively with the backend REST API — it never accesses fail2ban, the database, or external services directly.

┌──────────────────────────────────────────────────────────────┐
│                     React Application                        │
│                                                              │
│   ┌──────────┐    ┌────────────┐    ┌──────────────────┐    │
│   │  Pages   │───▶│ Components │───▶│   Fluent UI v9   │    │
│   └────┬─────┘    └────────────┘    └──────────────────┘    │
│        │                                                     │
│   ┌────┴─────┐    ┌────────────┐    ┌──────────────────┐    │
│   │  Hooks   │───▶│  API Layer │───▶│  Backend (REST)  │    │
│   └──────────┘    └────────────┘    └──────────────────┘    │
│                                                              │
│   ┌──────────┐    ┌────────────┐    ┌──────────────────┐    │
│   │Providers │    │   Types    │    │     Theme        │    │
│   │(Context) │    │(Interfaces)│    │(Tokens, Styles)  │    │
│   └──────────┘    └────────────┘    └──────────────────┘    │
└──────────────────────────────────────────────────────────────┘

3.1 Project Structure

frontend/
├── public/
├── src/
│   ├── api/                   # API client and per-domain request functions
│   │   ├── client.ts          #   Central fetch wrapper (typed GET/POST/PUT/DELETE)
│   │   ├── endpoints.ts       #   API path constants
│   │   ├── auth.ts            #   Login, logout, session check
│   │   ├── dashboard.ts       #   Dashboard status and ban list
│   │   ├── jails.ts           #   Jail CRUD and controls
│   │   ├── bans.ts            #   Ban/unban actions, banned list
│   │   ├── config.ts          #   Configuration read/write
│   │   ├── history.ts         #   Ban history queries
│   │   ├── blocklist.ts       #   Blocklist source management
│   │   ├── geo.ts             #   IP lookup / geolocation
│   │   └── server.ts          #   Server settings
│   ├── assets/                # Static images, fonts, icons
│   ├── components/            # Reusable UI components
│   │   ├── BanTable.tsx       #   Data table for ban entries
│   │   ├── JailCard.tsx       #   Summary card for a jail
│   │   ├── StatusBar.tsx      #   Server status indicator strip
│   │   ├── TimeRangeSelector.tsx # Quick preset picker (24h, 7d, 30d, 365d)
│   │   ├── IpInput.tsx        #   IP address input with validation
│   │   ├── RegexTester.tsx    #   Side-by-side regex match preview
│   │   ├── WorldMap.tsx       #   Country-outline map with ban counts
│   │   ├── ImportLogTable.tsx #   Blocklist import run history
│   │   ├── ConfirmDialog.tsx  #   Reusable confirmation modal
│   │   ├── RequireAuth.tsx    #   Route guard: redirects unauthenticated users to /login
│   │   ├── SetupGuard.tsx     #   Route guard: redirects to /setup if setup incomplete
│   │   └── ...                #   (additional shared components)
│   ├── hooks/                 # Custom React hooks (stateful logic + API calls)
│   │   ├── useAuth.ts         #   Login state, login/logout actions
│   │   ├── useBans.ts         #   Fetch ban list for a time range
│   │   ├── useJails.ts        #   Fetch jail list and details
│   │   ├── useConfig.ts       #   Fetch and update configuration
│   │   ├── useHistory.ts      #   Fetch historical ban data
│   │   ├── useBlocklists.ts   #   Fetch and manage blocklist sources
│   │   ├── useServerStatus.ts #   Poll server health / status
│   │   └── useGeo.ts          #   IP lookup hook
│   ├── layouts/               # Page-level layout wrappers
│   │   └── AppLayout.tsx      #   Sidebar navigation + header + content area
│   ├── pages/                 # Route-level page components (one per route)
│   │   ├── SetupPage.tsx      #   First-run wizard
│   │   ├── LoginPage.tsx      #   Password prompt
│   │   ├── DashboardPage.tsx  #   Ban overview, status bar
│   │   ├── WorldMapPage.tsx   #   Geographical ban map
│   │   ├── JailsPage.tsx      #   Jail list, detail, controls, ban/unban
│   │   ├── ConfigPage.tsx     #   Configuration viewer/editor
│   │   ├── HistoryPage.tsx    #   Ban history browser
│   │   └── BlocklistPage.tsx  #   Blocklist source management + import log
│   ├── providers/             # React context providers
│   │   ├── AuthProvider.tsx   #   Authentication state and guards
│   │   └── ThemeProvider.tsx  #   Light/dark theme switching
│   ├── theme/                 # Fluent UI theme definitions
│   │   ├── customTheme.ts     #   Brand colour ramp, light and dark themes
│   │   └── tokens.ts          #   Spacing, sizing, and z-index constants
│   ├── types/                 # Shared TypeScript interfaces
│   │   ├── auth.ts            #   LoginRequest, SessionInfo
│   │   ├── ban.ts             #   Ban, BanListResponse, BanRequest
│   │   ├── jail.ts            #   Jail, JailDetail, JailListResponse
│   │   ├── config.ts          #   ConfigSection, ConfigUpdateRequest
│   │   ├── history.ts         #   HistoryEntry, IpTimeline
│   │   ├── blocklist.ts       #   BlocklistSource, ImportLogEntry
│   │   ├── geo.ts             #   GeoInfo, AsnInfo
│   │   ├── server.ts          #   ServerStatus, ServerSettings
│   │   └── api.ts             #   ApiError, PaginatedResponse
│   ├── utils/                 # Pure helper functions
│   │   ├── formatDate.ts      #   Date/time formatting with timezone support
│   │   ├── formatIp.ts        #   IP display formatting
│   │   ├── crypto.ts          #   Browser-native SHA-256 helper (SubtleCrypto)
│   │   └── constants.ts       #   Frontend constants (time presets, etc.)
│   ├── App.tsx                # Root: FluentProvider + BrowserRouter + routes
│   ├── main.tsx               # Vite entry point
│   └── vite-env.d.ts          # Vite type shims
├── tsconfig.json
├── vite.config.ts
└── package.json

3.2 Module Purposes

Pages (src/pages/)

Top-level route components. Each page composes layout, components, and hooks to create a full screen. Pages contain no business logic — they orchestrate what is displayed and delegate data fetching to hooks.

| Page | Route | Purpose |
|---|---|---|
| SetupPage | /setup | First-run wizard: set master password, database path, fail2ban connection, preferences |
| LoginPage | /login | Single-field password prompt; redirects to requested page after success |
| DashboardPage | / | Server status bar, ban list table, time-range selector |
| WorldMapPage | /map | World map with per-country ban counts, country filter |
| JailsPage | /jails | Jail overview list, jail detail panel, controls (start/stop/reload), ban/unban forms, IP lookup, whitelist management |
| ConfigPage | /config | View and edit jail parameters, filter regex, server settings, regex tester, add log observation |
| HistoryPage | /history | Browse all past bans, filter by jail/IP/time, per-IP timeline drill-down |
| BlocklistPage | /blocklists | Manage blocklist sources, schedule configuration, import log, manual import trigger |

Components (src/components/)

Reusable UI building blocks. Components receive data via props, emit changes via callbacks, and never call the API directly. Built exclusively with Fluent UI v9 components.

Component Purpose
StatusBar Displays fail2ban server status (online/offline, version, jail count, total bans)
BanTable Sortable data table for ban entries with columns for time, IP, jail, country, etc.
JailCard Summary card showing jail name, status badge, key metrics
TimeRangeSelector Quick-preset picker for filtering data (24h, 7d, 30d, 365d)
IpInput IP address text field with inline validation
WorldMap SVG/Canvas country-outline map with count overlays and click-to-filter
RegexTester Side-by-side sample log + regex input with live match highlighting
ImportLogTable Table displaying blocklist import history
ConfirmDialog Reusable Fluent UI Dialog for destructive action confirmations
RequireAuth Route guard: renders children only when authenticated; otherwise redirects to /login?next=<path>
SetupGuard Route guard: checks GET /api/setup on mount and redirects to /setup if not complete; shows a spinner while loading
config/ConfigListDetail Reusable two-pane master/detail layout used by the Jails, Filters, and Actions config tabs. Left pane lists items with active/inactive badges (active sorted first, keyboard navigable); right pane renders the selected item's detail content. Collapses to a dropdown on narrow screens.
config/RawConfigSection Collapsible section that lazily loads the raw text of a config file into a monospace textarea. Provides a Save button backed by a configurable save callback; shows idle/saving/saved/error feedback. Used by all three config tabs.
config/AutoSaveIndicator Small inline indicator showing the current save state (idle, saving, saved, error) for form fields that auto-save on change.

Hooks (src/hooks/)

Encapsulate all stateful logic, side effects, and API calls. Components and pages consume hooks to stay declarative.

Hook Purpose
useAuth Manages login state, provides login(), logout(), and isAuthenticated
useBans Fetches ban list for a given time range, returns { bans, loading, error }
useJails Fetches jail list and individual jail detail
useConfig Reads and writes fail2ban jail configuration via the socket-based API
useFilterConfig Fetches and manages a single filter file's parsed configuration
useActionConfig Fetches and manages a single action file's parsed configuration
useJailFileConfig Fetches and manages a single jail.d config file
useConfigActiveStatus Derives active status sets for jails, filters, and actions by correlating the live jail list with the config file lists; returns { activeJails, activeFilters, activeActions, loading, error, refresh }
useAutoSave Debounced auto-save hook: invokes a save callback after the user stops typing, tracks saving/saved/error state
useHistory Queries historical ban data with filters
useBlocklists Manages blocklist sources and import triggers
useServerStatus Polls the server status endpoint at an interval
useGeo Performs IP geolocation lookups on demand

API Layer (src/api/)

A thin typed wrapper around fetch. All HTTP communication is centralised here — components and hooks never construct HTTP requests directly.

Module Purpose
client.ts Central get<T>, post<T>, put<T>, del<T> functions with error handling and credentials
endpoints.ts All API path constants in one place — no hard-coded URLs anywhere else
auth.ts login(), logout(), checkSession()
dashboard.ts fetchStatus(), fetchRecentBans()
jails.ts fetchJails(), fetchJailDetail(), startJail(), stopJail(), reloadJail()
bans.ts banIp(), unbanIp(), unbanAll(), fetchBannedIps()
config.ts Socket-based config: fetchJailConfigs(), updateJailConfig(), testRegex(). File-based config: fetchJailFiles(), fetchJailFile(), writeJailFile(), setJailFileEnabled(), fetchFilterFiles(), fetchFilterFile(), writeFilterFile(), fetchActionFiles(), fetchActionFile(), writeActionFile(), reloadConfig()
history.ts fetchHistory(), fetchIpTimeline()
blocklist.ts fetchSources(), addSource(), removeSource(), triggerImport(), fetchImportLog()
geo.ts lookupIp()
server.ts fetchServerSettings(), updateServerSettings()

Types (src/types/)

Shared TypeScript interfaces and type aliases. Purely declarative — no runtime code. Grouped by domain. Any type used by two or more files lives here.

Providers (src/providers/)

React context providers for application-wide concerns.

Provider Purpose
AuthProvider Holds authentication state; exposes isAuthenticated, login(), and logout() via useAuth()
TimezoneProvider Reads the configured IANA timezone from the backend and supplies it to all children via useTimezone()
ThemeProvider Manages light/dark theme selection, supplies the active Fluent UI theme to FluentProvider

Theme (src/theme/)

Fluent UI custom theme definitions and design token constants. No component logic — only colours, spacing, and sizing values.

Utils (src/utils/)

Pure helper functions with no React or framework dependency. Date formatting, IP display formatting, shared constants, and cryptographic utilities.

Utility Purpose
formatDate.ts Date/time formatting with IANA timezone support
formatIp.ts IP address display formatting
crypto.ts sha256Hex(input) — SHA-256 digest via browser-native SubtleCrypto API; used to hash passwords before transmission
constants.ts Frontend constants (time presets, etc.)

4. Data Flow

4.1 Request Lifecycle

Every user action follows this flow through the system:

User Action (click, form submit)
       │
       ▼
   Page / Component
       │  calls hook
       ▼
   Hook (useXxx)
       │  calls API function
       ▼
   API Layer (src/api/)
       │  HTTP request
       ▼
   FastAPI Router (app/routers/)
       │  validates input (Pydantic)
       │  calls Depends() for auth + services
       ▼
   Service (app/services/)
       │  enforces business rules
       │  calls repository or fail2ban client
       ▼
   Repository (app/repositories/)     or     fail2ban Client (app/utils/)
       │  executes SQL query                       │  sends socket command
       ▼                                           ▼
   SQLite Database                             fail2ban Server
       │                                           │
       └──────────── response bubbles back up ─────┘

4.2 Authentication Flow

┌─────────┐     POST /api/auth/login      ┌─────────────┐
│  Login   │ ─────────────────────────────▶│ auth router  │
│  Page    │     { password: "***" }       │              │
└─────────┘                                └──────┬───────┘
                                                  │
                                           ┌──────┴───────┐
                                           │ auth_service  │
                                           │ - verify hash │
                                           │ - create token│
                                           └──────┬───────┘
                                                  │
                                           ┌──────┴───────┐
                                           │ session_repo  │
                                           │ - store token │
                                           └──────┬───────┘
                                                  │
  Set-Cookie: session=<token>                     │
◀─────────────────────────────────────────────────┘
  • The master password is hashed and stored during setup.
  • On login, the submitted password is verified against the stored hash.
  • A session token is created, stored in the database, and returned as an HTTP-only cookie.
  • Every subsequent request is authenticated via the session cookie using a FastAPI dependency.
  • The AuthProvider on the frontend guards all routes except /setup and /login.
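
The token half of this flow (create a token, store only its hash, validate by hashing the incoming value) can be sketched as follows. SessionStore is a hypothetical in-memory stand-in for session_repo, and password verification is omitted.

```python
import hashlib
import secrets
import time
from typing import Optional


def hash_token(token: str) -> str:
    """One-way SHA-256 digest; only this value is ever stored."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


class SessionStore:
    """Hypothetical stand-in for session_repo: token hash -> expiry timestamp."""

    def __init__(self) -> None:
        self._expiry_by_hash = {}

    def create(self, ttl_seconds: float, now: Optional[float] = None) -> str:
        token = secrets.token_urlsafe(32)
        issued = time.time() if now is None else now
        self._expiry_by_hash[hash_token(token)] = issued + ttl_seconds
        return token  # sent to the browser as the cookie value; never persisted

    def is_valid(self, token: str, now: Optional[float] = None) -> bool:
        current = time.time() if now is None else now
        expiry = self._expiry_by_hash.get(hash_token(token))
        return expiry is not None and current < expiry


store = SessionStore()
token = store.create(ttl_seconds=60, now=1000.0)
```

Because lookups hash the presented token first, a copy of the stored hashes alone is not enough to produce a valid cookie.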

4.3 fail2ban Communication

BanGUI communicates with fail2ban through its Unix domain socket using the fail2ban client-server protocol.

┌────────────────────┐          ┌──────────────────┐
│  ban_service.py    │          │  fail2ban server  │
│  jail_service.py   │──socket──│                   │
│  config_service.py │          │  /var/run/fail2ban│
│  health_service.py │          │  /fail2ban.sock   │
└────────────────────┘          └──────────────────┘

The fail2ban_client.py utility module wraps this communication:

  • Opens an async connection to the Unix socket
  • Serialises commands using the fail2ban protocol (pickle-based, see ./fail2ban-master/fail2ban/client/csocket.py)
  • Parses responses into typed Python objects
  • Handles connection errors gracefully (timeout, socket not found, permission denied)
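
The framing can be sketched as below. This is an assumption-laden illustration rather than the real fail2ban_client.py: the end-marker value is assumed to match CSPROTO.END in the vendored protocol.py and should be checked there, the function names are hypothetical, and error mapping is reduced to a timeout.

```python
import asyncio
import pickle
from typing import Any, List

# Assumed to equal CSPROTO.END in ./fail2ban-master/fail2ban/protocol.py.
END_MARKER = b"<F2B_END_COMMAND>"


def encode_command(command: List[str]) -> bytes:
    """Pickle the command list and append the end-of-message marker."""
    return pickle.dumps(command, pickle.HIGHEST_PROTOCOL) + END_MARKER


def decode_response(raw: bytes) -> Any:
    """Strip the end marker and unpickle. Only for the trusted local socket."""
    if not raw.endswith(END_MARKER):
        raise ValueError("incomplete fail2ban response")
    return pickle.loads(raw[: -len(END_MARKER)])


async def send_command(socket_path: str, command: List[str], timeout: float = 5.0) -> Any:
    """Open the Unix socket, send one command, and read until the end marker."""
    reader, writer = await asyncio.wait_for(
        # A larger stream limit leaves room for long banned-IP lists.
        asyncio.open_unix_connection(socket_path, limit=16 * 1024 * 1024),
        timeout,
    )
    try:
        writer.write(encode_command(command))
        await writer.drain()
        raw = await asyncio.wait_for(reader.readuntil(END_MARKER), timeout)
        return decode_response(raw)
    finally:
        writer.close()
        await writer.wait_closed()
```

A call such as `await send_command("/var/run/fail2ban/fail2ban.sock", ["status"])` would return whatever object the server pickled in reply. Unpickling is acceptable here only because the peer is the local fail2ban server; the same code must never be pointed at an untrusted socket.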

Reference source: The vendored fail2ban source at ./fail2ban-master is included in the repository as an authoritative protocol reference. When implementing or debugging socket communication, consult:

File What it documents
./fail2ban-master/fail2ban/client/csocket.py CSocket class — low-level Unix socket connection, pickle serialisation, CSPROTO.END framing
./fail2ban-master/fail2ban/client/fail2banclient.py Fail2banClient — command dispatch, argument handling, response beautification
./fail2ban-master/fail2ban/client/beautifier.py Response parser — converts raw server replies into human-readable / structured output
./fail2ban-master/fail2ban/protocol.py CSPROTO constants and the full list of supported commands with descriptions
./fail2ban-master/fail2ban/client/configreader.py Config file parsing used by fail2ban — reference for understanding jail/filter structure

Key commands used:

| Command | Purpose |
|---|---|
| status | Get global server status (number of jails, fail2ban version) |
| status <jail> | Get jail detail (banned IPs, failure count, filter info) |
| set <jail> banip <ip> | Ban an IP in a specific jail |
| set <jail> unbanip <ip> | Unban an IP from a specific jail |
| set <jail> idle on/off | Toggle jail idle mode |
| start/stop <jail> | Start or stop a jail |
| reload <jail> | Reload a single jail configuration |
| reload | Reload all jails |
| get <jail> ... | Read jail settings (findtime, bantime, maxretry, filter, actions, etc.) |
| set <jail> ... | Write jail settings |
| set loglevel <level> | Change server log level |
| set logtarget <target> | Change server log target |
| set dbpurgeage <seconds> | Set database purge age |
| flushlogs | Flush and re-open log files |

4.4 fail2ban Database Access

In addition to the live socket, BanGUI reads the fail2ban SQLite database directly for historical data that the socket protocol does not expose (ban history, past log matches). This is read-only access.

history_service.py ──read-only──▶ fail2ban.db (SQLite)

The fail2ban database contains:

  • bans table — historical ban records (IP, jail, timestamp, ban data)
  • jails table — jail definitions
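  • logs table — monitored log files per jail with the last read position (matched log lines are stored with each ban record in the data column of the bans table)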

BanGUI queries these tables to power the Ban History page and the per-IP timeline view.
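
Read-only access can be enforced at the connection level rather than by convention. The sketch below opens a SQLite file through a `mode=ro` URI so that any accidental write fails. The helper names are hypothetical, and the column names (jail, ip, timeofban) are assumed from the fail2ban schema and should be verified against the installed version.

```python
import sqlite3
from pathlib import Path
from typing import List, Tuple


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open a SQLite file read-only via a URI so accidental writes raise an error."""
    return sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)


def recent_bans(conn: sqlite3.Connection, jail: str, limit: int = 50) -> List[Tuple[str, int]]:
    """Return (ip, timeofban) pairs for one jail, newest first."""
    rows = conn.execute(
        "SELECT ip, timeofban FROM bans WHERE jail = ? ORDER BY timeofban DESC LIMIT ?",
        (jail, limit),
    ).fetchall()
    return [(ip, ts) for ip, ts in rows]
```

With this connection, an `INSERT` or `UPDATE` raises `sqlite3.OperationalError`, so a bug in a history query cannot modify fail2ban's own data.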

4.5 External API Communication

geo_service.py ──aiohttp──▶ IP Geolocation API (country, ASN, RIR)
blocklist_service.py ──aiohttp──▶ Blocklist URLs (plain-text IP lists)

All external HTTP calls go through a shared aiohttp.ClientSession created during startup and closed during shutdown. External data is validated before use (IP format, response structure).
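
Validation of a downloaded plain-text blocklist might look like the sketch below. The function name is hypothetical, and treating `#` as a comment marker is an assumption about the list format; the two return values correspond loosely to the imported and skipped counts recorded per import run.

```python
import ipaddress
from typing import List, Tuple


def parse_blocklist(text: str) -> Tuple[List[str], int]:
    """Return (valid entries in canonical form, number of invalid lines skipped)."""
    valid: List[str] = []
    skipped = 0
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # blank and comment-only lines are not counted as errors
        try:
            if "/" in line:
                valid.append(str(ipaddress.ip_network(line, strict=False)))
            else:
                valid.append(str(ipaddress.ip_address(line)))
        except ValueError:
            skipped += 1
    return valid, skipped


sample = "# example list\n203.0.113.7\n\n10.0.0.0/8  # private\nnot-an-ip\n2001:DB8::1\n"
entries, skipped_count = parse_blocklist(sample)
```

Only the canonical entries would be handed on to the ban logic, so malformed or hostile lines in a remote list never reach fail2ban.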


5. Database Design

BanGUI maintains its own SQLite database (separate from the fail2ban database) to store application state.

5.1 Application Database Tables

Table Purpose
settings Key-value store for application configuration (master password hash, fail2ban socket path, database path, timezone, session duration)
sessions Active session token hashes with expiry timestamps. Tokens are stored as one-way SHA-256 hashes so that a leaked database cannot be used to hijack sessions.
geo_cache Resolved IP geolocation results (ip, country_code, country_name, asn, org, cached_at, last_seen). Tracks the last time each IP address was referenced to enable retention policies. Entries older than 90 days are automatically purged by the geo_cache_cleanup task to prevent unbounded growth. Loaded into memory at startup via load_cache_from_db(); new entries are flushed back by the geo_cache_flush background task.
blocklist_sources Registered blocklist URLs (id, name, url, enabled, created_at, updated_at)
import_logs Record of every blocklist import run (id, source_id, timestamp, ips_imported, ips_skipped, errors, status)

5.2 Database Boundaries

| Database | Owner | BanGUI Access |
|---|---|---|
| BanGUI application DB (bangui.db) | BanGUI | Read + Write |
| fail2ban DB (fail2ban.db) | fail2ban | Read-only (for history queries) |

6. Setup & Configuration Persistence

6.1 Initial Setup Wizard & One-Time Configuration

The setup wizard (POST /api/setup) runs once during first-time startup to configure:

  • Master password (bcrypt-hashed)
  • Runtime database path (where BanGUI stores operational state)
  • fail2ban Unix socket path
  • IANA timezone
  • Session duration (in minutes)
  • Map color thresholds for geolocation visualization

Atomicity & Crash-Safety:

Setup is implemented with explicit transaction boundaries across two SQLite databases (bootstrap config DB and runtime app DB) to ensure atomicity:

  1. Phase 1 (Bootstrap DB transaction): Set setup_state = "in_progress" and persist database_path. This commit is the first checkpoint — if the process crashes after it, the next setup attempt detects the "in_progress" state and cleans up.

  2. Phase 2 (Filesystem + Runtime DB): Initialize runtime database schema outside a transaction (idempotent via CREATE TABLE IF NOT EXISTS).

  3. Phase 3 (Runtime DB transaction): Batch-write all runtime settings (password hash, paths, config) atomically in a single BEGIN IMMEDIATE ... COMMIT transaction. Either all settings are persisted or none are.

  4. Phase 4 (Bootstrap DB transaction): Set setup_state = "complete" and setup_completed = "1". This is the final commit point — only when this succeeds is setup considered complete.

Password Hash Idempotency:

The bcrypt password hash is computed early (before any DB writes) to ensure that if setup is retried after a crash, the same hash is used throughout all retry attempts. This prevents divergent hashes due to bcrypt's random salt generation.

State Machine:

| State | Meaning | Recovery |
|---|---|---|
| null | Setup not started | Normal flow: begin setup |
| "in_progress" | Bootstrap DB marked, runtime DB being initialized | Retry from beginning (runtime DB may be partial) |
| "complete" | All settings persisted, setup finished | Skip setup (already done) |

If the next startup finds setup_state = "in_progress", a previous setup attempt crashed. Cleanup logic can then either retry setup directly or remove the partial runtime database before retrying.
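
The four phases and the resulting state machine can be sketched with two SQLite connections. This is an illustration under assumptions, not the real setup code: the helper names are hypothetical, both databases are plain key-value tables, the password hash step is omitted, and a flag simulates a crash during Phase 3.

```python
import sqlite3
from typing import Dict, Optional

SCHEMA = "CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT)"


def open_db() -> sqlite3.Connection:
    # isolation_level=None turns off implicit transactions; BEGIN/COMMIT are explicit.
    return sqlite3.connect(":memory:", isolation_level=None)


def _put(conn: sqlite3.Connection, key: str, value: str) -> None:
    conn.execute("INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)", (key, value))


def get_state(bootstrap: sqlite3.Connection) -> Optional[str]:
    row = bootstrap.execute("SELECT value FROM settings WHERE key = 'setup_state'").fetchone()
    return row[0] if row else None


def run_setup(bootstrap, runtime, settings: Dict[str, str], fail_in_phase_3: bool = False) -> None:
    # Phase 1: first checkpoint in the bootstrap DB.
    bootstrap.execute("BEGIN IMMEDIATE")
    _put(bootstrap, "setup_state", "in_progress")
    bootstrap.execute("COMMIT")

    # Phase 2: idempotent schema creation, outside any transaction.
    runtime.execute(SCHEMA)

    # Phase 3: all runtime settings in one transaction, all or nothing.
    runtime.execute("BEGIN IMMEDIATE")
    try:
        for key, value in settings.items():
            _put(runtime, key, value)
        if fail_in_phase_3:
            raise RuntimeError("simulated crash")
        runtime.execute("COMMIT")
    except Exception:
        runtime.execute("ROLLBACK")
        raise

    # Phase 4: final commit point.
    bootstrap.execute("BEGIN IMMEDIATE")
    _put(bootstrap, "setup_state", "complete")
    _put(bootstrap, "setup_completed", "1")
    bootstrap.execute("COMMIT")


bootstrap, runtime = open_db(), open_db()
bootstrap.execute(SCHEMA)
```

A failure in Phase 3 leaves the bootstrap state at "in_progress" and the runtime settings table empty, which is exactly the condition the recovery column of the state table describes.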

Backward Compatibility:

The setup_completed = "1" key is still written for backward compatibility with cache detection. Modern code checks setup_state = "complete" for clearer semantics.


8. Authentication & Session Management

  • Single-user model — one master password, no usernames.
  • Password is hashed with bcrypt (see § 6.1) and stored in the application database during setup.
  • Sessions are token-based, stored server-side in the sessions table as one-way SHA-256 hashes, and delivered to the browser as HTTP-only secure cookies.
  • Session token hashing — Session tokens are hashed before storage to prevent token hijacking if the database file is exposed. Only the hash (token_hash) is stored in the database; the raw token is never persisted. When validating a session, the incoming token is hashed before the database lookup. This ensures the database alone is not sufficient to usurp a session — an attacker would also need knowledge of the original token value.
  • Session expiry is configurable (set during setup, stored in settings).
  • The frontend AuthProvider checks session validity on mount and redirects to /login if invalid.
  • The backend dependencies.py provides an authenticated dependency that validates the session cookie on every protected endpoint.
  • Session validation cache (InMemorySessionCache in app.utils.session_cache) — validated session tokens are cached in memory for 10 seconds (configurable via session_cache_ttl_seconds) to avoid a SQLite round-trip on every request from the same browser. The cache is invalidated immediately on logout. ⚠️ This cache is process-local and not safe for multi-worker or distributed deployments. In single-worker mode (enforced by TASK-002), this is safe and improves performance. For multi-worker deployments, replace InMemorySessionCache with a shared backend (Redis, database, shared memory) implementing the SessionCache protocol. See app/utils/session_cache.py module docstring for implementation details.
  • GeoCache: a GeoCache instance is created at startup with a configurable allow_http_fallback flag and stored on app.state.geo_cache. It implements a primary + fallback resolution strategy: (1) try local MaxMind GeoLite2-Country MMDB database (primary, encrypted, no network traffic), (2) if unavailable/no result and allowed, fall back to ip-api.com HTTP API (unencrypted, disabled by default for security). Encapsulates in-memory lookup cache, negative cache for unresolvable IPs (5-minute TTL), dirty set for persistence, and thread-safe async locking. Cache is loaded from the geo_cache SQLite table on startup. New resolutions are accumulated in memory and periodically flushed to the database by the geo_cache_flush background task. Stale entries are re-resolved by the geo_re_resolve task. Injected into routes and tasks via FastAPI's dependency system. See Backend-Development.md § IP Geolocation Resolution for setup and security details.
  • Runtime state (RuntimeState in app.utils.runtime_state) — stores mutable application state: server_status (fail2ban online/offline), last_activation (jail activation tracking), pending_recovery (crash detection), runtime_settings (effective configuration), and service-specific state holders like jail_service_state (JailServiceState for jail capability detection cache). RuntimeState fields are managed through dedicated functions (e.g., record_activation(), clear_pending_recovery()) and via dependency injection to services. Service-specific state (like JailServiceState) is nested within RuntimeState to keep all mutable state in one controlled location. ⚠️ RuntimeState is process-local and only safe when BanGUI runs as a single asyncio worker. Mutations must not span await points (cooperative scheduling within a single event loop is safe). In multi-worker deployments, each process has its own copy — logouts from worker A don't affect worker B's cache, health status updates are per-worker, and activation tracking is unreliable. BanGUI enforces single-worker mode (TASK-002) to prevent this issue. For future multi-worker support, replace RuntimeState with a shared coordination backend (Redis, shared memory, database). See app/utils/runtime_state.py module docstring for details.
  • Setup-completion flag — once is_setup_complete() returns True, the result is stored in app.state._setup_complete_cached. The SetupRedirectMiddleware skips the DB query on all subsequent requests, removing 1 SQL query per request for the common post-setup case. The completion flag is only written after the runtime database is successfully initialized and all initial setup settings are persisted, preventing a failed setup from permanently bypassing the setup wizard.
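
The token hashing and session validation cache bullets above can be sketched together as follows. This is a minimal illustration, not BanGUI's implementation: SHA-256 as the hash function, the `TtlSessionCache` class, its method names, and keying the cache by token hash are all assumptions made for the example.

```python
import hashlib
import time
from typing import Callable


def hash_token(raw_token: str) -> str:
    """Return the hex digest stored as token_hash (SHA-256 is an assumption)."""
    return hashlib.sha256(raw_token.encode("utf-8")).hexdigest()


class TtlSessionCache:
    """Hypothetical process-local cache of recently validated token hashes."""

    def __init__(
        self,
        ttl_seconds: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._expires_at: dict[str, float] = {}

    def mark_valid(self, token_hash: str) -> None:
        """Remember that this hash was validated against the database."""
        self._expires_at[token_hash] = self._clock() + self._ttl

    def is_valid(self, token_hash: str) -> bool:
        """Return True only while the cached entry has not expired."""
        expires = self._expires_at.get(token_hash)
        if expires is None:
            return False
        if self._clock() >= expires:
            del self._expires_at[token_hash]
            return False
        return True

    def invalidate(self, token_hash: str) -> None:
        """Drop the entry immediately, e.g. on logout."""
        self._expires_at.pop(token_hash, None)


def validate_session(raw_token: str, stored_hashes: set[str], cache: TtlSessionCache) -> bool:
    """Hash first, consult the cache, and fall back to the stored hashes.

    `stored_hashes` stands in for the database lookup by token_hash.
    """
    token_hash = hash_token(raw_token)
    if cache.is_valid(token_hash):
        return True
    if token_hash in stored_hashes:
        cache.mark_valid(token_hash)
        return True
    return False
```

Because the raw token never reaches storage in this sketch, a copy of `stored_hashes` alone cannot be replayed as a cookie value. The cache has the same process-local limitation described above.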

8.1 CSRF Protection

State-mutating endpoints (POST, PUT, DELETE, PATCH) that use cookie-based authentication are protected against Cross-Site Request Forgery (CSRF) attacks via a custom header check middleware.

Design:

  • For requests authenticated via the session cookie (not Bearer token), the CsrfMiddleware requires the custom header X-BanGUI-Request: 1 to be present.
  • The frontend API client automatically includes this header on all requests.
  • Cross-site fetch() calls cannot set custom headers without CORS preflight, which the backend rejects for non-allowed origins, providing defense-in-depth.
  • Safe HTTP methods (GET, HEAD, OPTIONS) bypass the check.
  • Bearer token authentication (via Authorization: Bearer header) bypasses the check because tokens are not CSRF-vulnerable (they are not automatically sent on cross-origin requests).
  • Requests missing the CSRF header receive a 403 Forbidden response with detail: "CSRF validation failed. Request rejected.".

This mechanism complements the existing SameSite=Lax cookie policy, which blocks traditional <form> POST requests but does not protect against JavaScript-initiated requests on a subdomain or same-origin XSS injection.
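
The decision rules listed under Design can be expressed as one pure function. This is a sketch under stated assumptions: the function name `csrf_rejection` is hypothetical, header keys are assumed to be lower-cased by the caller, and unsafe requests that carry neither a Bearer token nor the custom header are rejected whether or not a cookie is present. The real CsrfMiddleware may scope the check differently.

```python
from typing import Mapping, Optional

SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})
CSRF_HEADER = "x-bangui-request"  # lower-cased form of X-BanGUI-Request


def csrf_rejection(
    method: str, headers: Mapping[str, str]
) -> Optional[tuple[int, dict[str, str]]]:
    """Return (status, body) when the request must be rejected, else None."""
    # Safe methods never mutate state, so they bypass the check.
    if method.upper() in SAFE_METHODS:
        return None
    # Bearer tokens are not attached automatically by browsers.
    if headers.get("authorization", "").lower().startswith("bearer "):
        return None
    # Cookie-authenticated mutations must carry the custom header.
    if headers.get(CSRF_HEADER) == "1":
        return None
    return 403, {"detail": "CSRF validation failed. Request rejected."}
```

A middleware wrapper would call this function before dispatching the request and short-circuit with the returned status and body.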


9. Scheduling

APScheduler 4.x (async mode) manages recurring background tasks.

┌──────────────────────┐
│     APScheduler      │
│  (async, in-process) │
├──────────────────────┤
│  blocklist_import    │  ── runs on configured schedule (default: daily 03:00)
│  geo_cache_cleanup   │  ── runs every 24 hours (nightly)
│  geo_cache_flush     │  ── runs every 60 seconds
│  health_check        │  ── runs every 30 seconds
└──────────────────────┘
  • The scheduler is started during the FastAPI lifespan startup and stopped during shutdown.
  • Job schedules are persisted in the application database so they survive restarts.
  • Users can modify the blocklist import schedule through the web interface.
  • A manual "Run Now" button triggers the blocklist import job outside the schedule.

9.1 Background Tasks and Database Access

  • APScheduler jobs run outside FastAPI request/response scope and therefore cannot rely on Depends(get_db).
  • Background tasks must open their own application database connection via app.db.open_db and close it when the work completes.
  • Use a shared task helper (app.tasks.db.task_db) so every task follows the same async context manager pattern and avoids connection leaks.
  • This pattern is intentional: task code is structurally separate from request-handling dependencies and should not attempt to reuse request-scoped DB connections.
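
The connection-per-task pattern above could look roughly like this. It is a sketch only: the real helper is app.tasks.db.task_db, whose signature is not shown in this document, and the standard-library `sqlite3` module (which blocks) is used here purely as a stand-in for the application's async database driver. The `import_log` table and `example_task` are hypothetical.

```python
import asyncio
import sqlite3
from contextlib import asynccontextmanager
from typing import AsyncIterator


@asynccontextmanager
async def task_db(path: str = ":memory:") -> AsyncIterator[sqlite3.Connection]:
    """Open a dedicated connection for one task run and always close it."""
    conn = sqlite3.connect(path)  # stand-in for the app's async open_db call
    try:
        yield conn
        conn.commit()  # commit only if the task body finished without error
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()  # runs on success and failure, so no connection leaks


async def example_task() -> int:
    """Hypothetical background job that owns its connection for its lifetime."""
    async with task_db() as db:
        db.execute("CREATE TABLE import_log (id INTEGER PRIMARY KEY, source TEXT)")
        db.execute("INSERT INTO import_log (source) VALUES (?)", ("example",))
        (count,) = db.execute("SELECT COUNT(*) FROM import_log").fetchone()
        return count


if __name__ == "__main__":
    print(asyncio.run(example_task()))
```

Keeping the open/commit/close logic in one helper means an individual task cannot forget the cleanup step.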

9. API Design

9.1 Conventions

  • All endpoints are grouped under /api/ prefix.
  • JSON request and response bodies, validated by Pydantic models.
  • Authentication via session cookie on all endpoints except /api/setup and /api/auth/login.
  • Setup-redirect middleware: while no configuration exists, all API endpoints (except /api/setup and /api/health) return 423 Locked with {"detail": "Setup not complete.", "setup_required": true}. This ensures API consumers can detect setup as a distinct condition rather than transparently following redirects.
  • Standard HTTP status codes: 200 success, 201 created, 204 no content, 400 bad request, 401 unauthorized, 403 forbidden, 404 not found, 422 validation error, 423 locked, 500 server error.
  • Error responses follow a consistent shape: { "detail": "Human-readable message" }.
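
The setup-redirect convention above can be illustrated with a small decision function. This is a hedged sketch: `setup_gate` and `_is_exempt` are hypothetical names, and letting non-API paths through untouched is an assumption, since the document only specifies the behavior for API endpoints.

```python
from typing import Optional

SETUP_EXEMPT_PATHS = ("/api/setup", "/api/health")


def _is_exempt(path: str) -> bool:
    """Match an exempt path exactly or as a parent of a sub-path."""
    return any(path == p or path.startswith(p + "/") for p in SETUP_EXEMPT_PATHS)


def setup_gate(path: str, setup_complete: bool) -> Optional[tuple[int, dict[str, object]]]:
    """Return the 423 response while setup is pending, else None."""
    if setup_complete or not path.startswith("/api/"):
        return None
    if _is_exempt(path):
        return None
    return 423, {"detail": "Setup not complete.", "setup_required": True}
```

A client can branch on `setup_required` in the body instead of guessing from a redirect.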

9.2 Endpoint Groups

| Group | Endpoints | Description |
|---|---|---|
| Auth | POST /login, POST /logout | Session management |
| Setup | POST /setup | First-run configuration |
| Dashboard | GET /status, GET /bans | Overview data for the main page |
| Jails | GET /, GET /:name, POST /:name/start, POST /:name/stop, POST /:name/reload, POST /reload-all | Jail listing and controls |
| Bans | POST /ban, POST /unban, POST /unban-all, GET /banned | Ban management |
| Config | GET /, PUT /, POST /test-regex | Configuration viewing and editing |
| History | GET /, GET /ip/:ip | Historical ban browsing |
| Blocklists | GET /sources, POST /sources, DELETE /sources/:id, POST /import, GET /import-log | Blocklist management |
| Geo | GET /lookup/:ip | IP geolocation and enrichment |
| Server | GET /settings, PUT /settings, POST /flush-logs | Server-level settings |

9. Deployment Architecture

┌──────────────────────────────────────────────────┐
│                   Host Machine                   │
│                                                  │
│  ┌─────────────────────────────────────────────┐ │
│  │  Reverse Proxy (nginx / caddy)              │ │
│  │  - TLS termination                          │ │
│  │  - /api/* → backend (uvicorn)               │ │
│  │  - /*     → frontend (static files)         │ │
│  └──────────────┬───────────────┬──────────────┘ │
│                 │               │                │
│  ┌──────────────┴────┐  ┌───────┴──────────────┐ │
│  │ Backend           │  │ Frontend             │ │
│  │ uvicorn + FastAPI │  │ Static build (Vite)  │ │
│  │ (port 8000)       │  │ (served by proxy)    │ │
│  └────────┬──────────┘  └──────────────────────┘ │
│           │                                      │
│  ┌────────┴──────────────────────────────────┐   │
│  │  fail2ban (systemd service)               │   │
│  │  Socket: /var/run/fail2ban/fail2ban.sock  │   │
│  │  Database: /var/lib/fail2ban/fail2ban.db  │   │
│  └───────────────────────────────────────────┘   │
└──────────────────────────────────────────────────┘
  • The backend runs as an ASGI server (uvicorn) behind a reverse proxy.
  • The frontend is built to static files by Vite and served directly by the reverse proxy.
  • The backend process needs read access to the fail2ban socket and the fail2ban database.
  • Both the application database and the fail2ban database reside on the same host.

9.2 nginx Routing Rules

The reverse proxy (nginx) must route requests correctly to prevent frontend SPA fallback rules from hiding backend 404 errors. The following location blocks ensure proper behavior:

Location Block Priority

nginx selects the location block that handles a request in the following priority order:

  1. Exact matches (location =) — highest priority
  2. Regular expression matches (location ~) — second priority
  3. Prefix matches (location /prefix) — matched in order of specificity (longest first)
  4. Catch-all (location /) — lowest priority

Routing Configuration

| Location Block | Rule | Purpose |
|---|---|---|
| location /api/ | proxy_pass http://backend:8000; no try_files | Proxy all API requests to the FastAPI backend. Any unmatched API route (typos, invalid paths) returns 404 from the backend. |
| location /assets/ | try_files $uri =404; | Serve static assets with long-term caching. Return 404 if the file doesn't exist. |
| location / | try_files $uri $uri/ /index.html; | SPA fallback: serve index.html for all unmatched routes (client-side routing). |

Routing Behavior

Request → /api/some-endpoint
    ↓
    nginx matches location /api/ (longest prefix)
    ↓
    proxy_pass → backend:8000
    ↓
    Backend returns 404 if endpoint doesn't exist (✓ correct)
    Client sees 404, not SPA HTML

Request → /some-page
    ↓
    nginx matches location / (catch-all)
    ↓
    try_files looks for file, then directory, then /index.html
    ↓
    Serves /index.html (React Router handles client-side routing)
    ↓
    Client sees 200 with HTML (✓ correct for SPA)

Request → /api/typos
    ↓
    nginx matches location /api/ (longest prefix, NOT catch-all)
    ↓
    proxy_pass → backend:8000
    ↓
    FastAPI returns 404 (✓ correct, not caught by SPA fallback)

Critical Implementation Notes

  • Never add try_files to the /api/ location block — this would hide backend 404s.
  • The /api/ location takes precedence over the / catch-all regardless of where it appears in the config, because nginx selects the longest matching prefix. Listing it first is a readability convention, not a requirement.
  • No inherited try_files rules — the /api/ location has no global try_files that could affect it.
  • Backend 404 responses pass through nginx unchanged — nginx does not rewrite 404 responses from the backend.
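
The prefix-selection behavior that keeps /api/ requests away from the SPA fallback can be modeled in a few lines. This sketch covers plain prefix locations only (no exact or regular-expression locations), and `select_location` is a hypothetical helper written for illustration, not part of nginx or BanGUI.

```python
def select_location(path: str, prefixes: list[str]) -> str:
    """Return the longest configured prefix that matches the request path."""
    matches = [prefix for prefix in prefixes if path.startswith(prefix)]
    if not matches:
        raise LookupError(f"no location matches {path!r}")
    # Longest match wins; declaration order plays no role for prefix locations.
    return max(matches, key=len)


# The three prefix locations from the routing configuration above.
LOCATIONS = ["/", "/assets/", "/api/"]

if __name__ == "__main__":
    for request_path in ("/api/typos", "/some-page", "/assets/app.js"):
        print(request_path, "->", select_location(request_path, LOCATIONS))
```

Reversing `LOCATIONS` gives the same results, which is why a mistyped API path always reaches the backend and receives its 404.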

9.2a nginx Security Headers

nginx adds the following OWASP-recommended security headers to all responses:

| Header | Value | Purpose |
|---|---|---|
| Content-Security-Policy | default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data:; font-src 'self'; connect-src 'self'; frame-ancestors 'none'; | Prevents XSS attacks by restricting script execution to same-origin. style-src 'unsafe-inline' is required for Fluent UI v9's inline styles. |
| X-Frame-Options | DENY | Prevents clickjacking by disallowing iframe embedding. |
| X-Content-Type-Options | nosniff | Prevents MIME-sniffing; browsers must respect the declared Content-Type. |
| Referrer-Policy | no-referrer | Prevents leaking internal URLs in the Referer header to third-party resources. |
| Permissions-Policy | geolocation=(), microphone=(), camera=() | Disables access to browser APIs not needed by the application. |
| Strict-Transport-Security | Commented out | Must only be enabled after HTTPS is fully configured. Uncomment when TLS termination is production-ready. |

All headers use the always directive, ensuring they are included in error responses (4xx, 5xx) as well.
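
One way to guard these headers against regressions is a small check that could run against any captured response. The sketch below is an assumption about how such a check might look; `parse_csp` and `header_problems` are hypothetical helpers, and the expected values are copied from the header list above.

```python
from typing import Mapping

# Expected values from the header list above; keys are lower-cased for lookup.
EXPECTED_HEADERS = {
    "x-frame-options": "DENY",
    "x-content-type-options": "nosniff",
    "referrer-policy": "no-referrer",
    "permissions-policy": "geolocation=(), microphone=(), camera=()",
}


def parse_csp(value: str) -> dict[str, list[str]]:
    """Split a Content-Security-Policy value into directive -> source list."""
    directives: dict[str, list[str]] = {}
    for part in value.split(";"):
        tokens = part.split()
        if tokens:
            directives[tokens[0].lower()] = tokens[1:]
    return directives


def header_problems(headers: Mapping[str, str]) -> list[str]:
    """Return readable problems; an empty list means the headers match."""
    lowered = {name.lower(): value for name, value in headers.items()}
    problems = [
        f"{name}: expected {expected!r}, got {lowered.get(name)!r}"
        for name, expected in EXPECTED_HEADERS.items()
        if lowered.get(name) != expected
    ]
    csp = parse_csp(lowered.get("content-security-policy", ""))
    if csp.get("frame-ancestors") != ["'none'"]:
        problems.append("content-security-policy: frame-ancestors must be 'none'")
    # script-src falls back to default-src when it is not declared.
    script_src = csp.get("script-src", csp.get("default-src", []))
    if "'unsafe-inline'" in script_src:
        problems.append("content-security-policy: inline scripts must stay blocked")
    return problems
```

Running the check against a 4xx or 5xx response as well would confirm that the always directive is in effect.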

CSP and Fluent UI

Fluent UI v9 applies styles via inline style attributes on DOM elements. To support this, style-src 'unsafe-inline' is required. A stricter CSP using nonces would require server-side rendering of the HTML shell, which is outside the current architecture.


9.3 Deployment Constraints

Single-Executor Scheduler Requirement

BanGUI's background scheduler must run with exactly one executor process.

The application uses APScheduler's AsyncIOScheduler, which is bound to a single asyncio event loop and cannot be safely shared across multiple worker processes. If the app is deployed with --workers N (where N > 1), the following failures occur:

  • Each worker process creates its own independent scheduler instance.
  • All background jobs execute N times simultaneously (once per worker).
  • Results:
    • Duplicate blocklist imports — the same IP ranges are banned N times.
    • Duplicate history entries — the same historical events are recorded N times.
    • Duplicate ban operations — bans are executed multiple times, with potential state conflicts.
    • SQLite lock contention — concurrent writes to the same database from N workers cause lock timeouts.

Enforcement Mechanism

BanGUI enforces single-executor safety through a database-backed lock that works reliably in container orchestration environments:

  1. Fast check (env var): On startup, the BANGUI_WORKERS environment variable is checked (if set). If explicitly set to a value > 1, startup fails immediately with a clear error.

  2. Authoritative check (database lock): During startup, BanGUI acquires an atomic database lock in the scheduler_lock table. This lock:

    • Uses a singleton row (id=1) to prevent race conditions across simultaneously starting instances
    • Stores the PID, hostname, creation timestamp, and heartbeat timestamp of the lock holder
    • Is considered stale if the heartbeat hasn't been updated for 60 seconds
    • Is automatically cleaned up on stale instance detection, allowing failover in rolling deployments
  3. Lock acquisition (startup):

    • Clean up any stale locks (heartbeat older than 60 seconds)
    • Attempt to insert a new lock row with this instance's PID and hostname
    • If the INSERT fails (row already exists), reject startup with a clear error
    • If the INSERT succeeds, this instance holds the lock and will start the scheduler
  4. Lock maintenance (runtime): A periodic background task (scheduler_lock_heartbeat) updates the lock's heartbeat timestamp every 10 seconds, keeping it alive and preventing false positives from temporary load spikes.

  5. Lock release (shutdown): On graceful shutdown, the lock is released, allowing other instances to acquire it.
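
The acquire, heartbeat, and release steps above can be sketched with the standard-library `sqlite3` module. Treat this as an illustration under assumptions: apart from `heartbeat_at`, the column names are hypothetical, timestamps are assumed to be Unix seconds, and the function names are invented for the example.

```python
import sqlite3

STALE_AFTER_SECONDS = 60

# Singleton table: the CHECK constraint allows only the row with id = 1.
SCHEMA = """
CREATE TABLE IF NOT EXISTS scheduler_lock (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    pid INTEGER NOT NULL,
    hostname TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    heartbeat_at INTEGER NOT NULL
)
"""


def acquire_lock(conn: sqlite3.Connection, pid: int, hostname: str, now: int) -> bool:
    """Remove a stale lock, then try to insert the singleton row."""
    with conn:  # one transaction: committed on exit, rolled back on error
        conn.execute(
            "DELETE FROM scheduler_lock WHERE ? - heartbeat_at > ?",
            (now, STALE_AFTER_SECONDS),
        )
        try:
            conn.execute(
                "INSERT INTO scheduler_lock (id, pid, hostname, created_at, heartbeat_at) "
                "VALUES (1, ?, ?, ?, ?)",
                (pid, hostname, now, now),
            )
        except sqlite3.IntegrityError:
            return False  # a live instance already holds the lock
    return True


def heartbeat(conn: sqlite3.Connection, pid: int, hostname: str, now: int) -> None:
    """Refresh the heartbeat, but only for the current lock holder."""
    with conn:
        conn.execute(
            "UPDATE scheduler_lock SET heartbeat_at = ? "
            "WHERE id = 1 AND pid = ? AND hostname = ?",
            (now, pid, hostname),
        )


def release_lock(conn: sqlite3.Connection, pid: int, hostname: str) -> None:
    """Delete the row on graceful shutdown so another instance can start."""
    with conn:
        conn.execute(
            "DELETE FROM scheduler_lock WHERE id = 1 AND pid = ? AND hostname = ?",
            (pid, hostname),
        )
```

The primary-key conflict on `id = 1` is what rejects a second instance, so no separate "check, then insert" step is needed.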

Why database-backed instead of filesystem?

Database-backed locking is more reliable in container orchestration because:

  • Atomicity: SQLite transactions are atomic — no race condition window between checking and inserting
  • Container-safe: Works across containers with shared database volumes (no NFS/SMB edge cases)
  • Stale detection: Heartbeat-based TTL is simpler and more reliable than PID-based checks (PID reuse is common in containers)
  • No false positives: Timestamp-based expiration eliminates issues with PID reuse

Startup Sequence with Scheduler Lock

1. DATABASE stage
   └─ Initialize SQLite schema (including scheduler_lock table)

2. WORKER_MODE stage (formerly first, now depends on DATABASE)
   ├─ Fast check: Verify BANGUI_WORKERS env var if explicitly set
   └─ Authoritative check: Acquire scheduler lock in database
      → If lock held by another instance: Fail with clear error
      → If lock acquired: Continue to GEO_CACHE stage

3. (rest of startup continues as normal)

Troubleshooting

Problem: Startup fails with "Could not acquire scheduler lock"

Solution:

  1. Verify no other BanGUI instances are running
  2. Inspect the lock: sqlite3 bangui.db "SELECT * FROM scheduler_lock;"
  3. Check who holds the lock (hostname, PID, heartbeat time)
  4. If stale (heartbeat older than 60 seconds), clean it:
    sqlite3 bangui.db "DELETE FROM scheduler_lock WHERE (strftime('%s', 'now') - heartbeat_at) > 60;"
    
  5. Retry the failed instance

Problem: Stale lock after instance crash

BanGUI handles this automatically:

  • The next instance to start will detect the stale lock (heartbeat older than 60 seconds)
  • It will clean it up and acquire the lock
  • The new instance starts the scheduler as normal

No manual intervention is required.

Environment Variables

  • BANGUI_WORKERS (optional, default: unset)
    • If set to 1 or unset: Normal operation (any number of instances may attempt to start, but only one can hold the scheduler lock at a time)
    • If set to > 1: Startup fails immediately with an error (fast check)
    • Reason: Legacy env var for explicitly forbidding multi-worker deployments
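
The fast check on BANGUI_WORKERS could be as small as the following sketch. The function name `check_workers_env` is hypothetical, and rejecting non-integer values is an assumption, since the document only specifies the behavior for unset, 1, and values greater than 1.

```python
import os
from typing import Mapping


def check_workers_env(environ: Mapping[str, str] = os.environ) -> None:
    """Fail fast when BANGUI_WORKERS explicitly requests more than one worker."""
    raw = environ.get("BANGUI_WORKERS")
    if raw is None or raw.strip() == "":
        return  # unset: defer to the authoritative database lock
    try:
        workers = int(raw)
    except ValueError as exc:
        # Assumption: a malformed value is treated as a configuration error.
        raise RuntimeError(f"BANGUI_WORKERS must be an integer, got {raw!r}") from exc
    if workers > 1:
        raise RuntimeError(
            f"BANGUI_WORKERS={workers} is not supported; "
            "the scheduler requires exactly one worker"
        )
```

Passing the environment as a parameter keeps the check easy to test without mutating `os.environ`.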

Container Orchestration Examples

Docker Compose:

  • Single service instance (no scaling) — scheduler runs normally

Kubernetes:

  • Single Pod replica — scheduler runs normally
  • Multiple Pod replicas (during rolling update) — old Pod releases lock on shutdown, new Pod acquires it
    • No duplicate jobs, no startup failures
    • Health check should allow 30-60 seconds for lock handoff

systemd / process manager:

  • Single process — scheduler runs normally
  • Accidental multi-process restart — lock prevents duplicate jobs, other processes fail to start scheduler

Future Multi-Worker Support

To safely support multiple workers in the future:

  1. External job store: Move APScheduler from the in-memory job store to a persistent one (e.g., a SQLAlchemy-backed store on PostgreSQL, or a Redis-backed store).
  2. Distributed locking: Use a distributed lock (Redis, etcd) instead of the database lock for better performance.
  3. Process coordination: Add a coordination mechanism between worker processes so that the scheduler runs on exactly one designated worker.

Currently, the single-executor approach is simple, maintainable, and sufficient for BanGUI's operational requirements. The database lock provides reliable enforcement across all deployment scenarios.


10. Observability & Distributed Tracing

BanGUI implements distributed tracing via correlation IDs, which link requests, log entries, and errors across the frontend and backend.

Architecture

┌─────────────────────────────────────────────────────────────┐
│ Frontend (React + TypeScript)                               │
├─────────────────────────────────────────────────────────────┤
│ • API Client generates session-scoped UUID4 (correlation ID)│
│ • Telemetry service records structured events               │
│ • Error boundaries catch render errors                      │
│ • All telemetry events include correlation ID for tracing   │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ├─ Every request includes
                     │  X-Correlation-ID header
                     │
┌────────────────────┴────────────────────────────────────────┐
│ Backend (Python + FastAPI + structlog)                      │
├─────────────────────────────────────────────────────────────┤
│ • CorrelationIdMiddleware extracts/generates correlation ID │
│ • All logs automatically include correlation ID             │
│ • Error responses include correlation_id field              │
│ • structlog outputs JSON with correlation ID in all events  │
└─────────────────────────────────────────────────────────────┘

Correlation ID Flow

  1. Frontend → Backend:

    • API client generates/retrieves session-scoped UUID4
    • UUID4 sent in X-Correlation-ID request header
    • All requests use same session UUID (set once, reused)
  2. Backend Processing:

    • CorrelationIdMiddleware extracts/generates correlation ID
    • ID stored in structlog contextvars
    • All structured log entries include correlation ID automatically
    • Error responses include correlation_id field in JSON
  3. Backend → Frontend:

    • Response includes X-Correlation-ID header
    • Error responses include correlation_id in response body
    • Frontend error handlers extract correlation ID
  4. Frontend Error Logging:

    • Error handlers extract correlation ID from API response
    • Telemetry service logs error with correlation ID
    • Browser console and telemetry backends receive linked events
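
The backend half of this flow can be sketched with only the standard library. This is not the CorrelationIdMiddleware itself: the document says the real code stores the ID in structlog contextvars, whereas this example uses a plain `contextvars.ContextVar` and a `logging.Filter` as stand-ins. All function and class names here are hypothetical.

```python
import contextvars
import logging
import uuid
from typing import Mapping

HEADER = "x-correlation-id"  # lower-cased form of X-Correlation-ID

_correlation_id: contextvars.ContextVar[str] = contextvars.ContextVar(
    "correlation_id", default="-"
)


def bind_correlation_id(headers: Mapping[str, str]) -> str:
    """Reuse the incoming ID, or generate a UUID4, and bind it to the context."""
    cid = headers.get(HEADER) or str(uuid.uuid4())
    _correlation_id.set(cid)
    return cid


class CorrelationFilter(logging.Filter):
    """Attach the bound correlation ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = _correlation_id.get()
        return True


def error_body(detail: str) -> dict[str, str]:
    """Build an error payload that carries the current correlation ID."""
    return {"detail": detail, "correlation_id": _correlation_id.get()}


def response_headers() -> dict[str, str]:
    """Echo the ID back so the frontend can attach it to telemetry."""
    return {"X-Correlation-ID": _correlation_id.get()}
```

Because the ID lives in a context variable, code deeper in the call stack does not need it passed explicitly to log it.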

Example: Correlating an Error Across Systems

Scenario: User clicks "Ban IP" button → API returns 500 error → error logged and displayed

Frontend telemetry event:

{
  "event": "api_error",
  "severity": "error",
  "message": "Server error banning IP",
  "correlation_id": "550e8400-e29b-41d4-a716-446655440000",
  "context": {
    "status": 500,
    "endpoint": "/api/bans"
  },
  "timestamp": "2025-04-30T18:30:00.000Z"
}

Backend structured log:

{
  "event": "ban_service_error",
  "severity": "error",
  "message": "Failed to ban IP",
  "correlation_id": "550e8400-e29b-41d4-a716-446655440000",
  "context": {
    "ip": "192.168.1.1",
    "jail": "sshd",
    "error": "fail2ban socket error"
  },
  "timestamp": "2025-04-30T18:30:00.000Z"
}

Troubleshooting: Engineer searches logs for correlation ID 550e8400-e29b-41d4-a716-446655440000 and finds all related events (request received, jail lookup, fail2ban call, error response) in order.

Implementation Details

Backend:

  • Middleware: app/middleware/correlation.py
    • Generates UUID4 if X-Correlation-ID header missing
    • Stores in structlog contextvars for automatic inclusion in all logs
    • Adds correlation ID to response header and error responses
  • All error handlers include correlation_id in ErrorResponse
  • See backend/app/models/response.py for ErrorResponse.correlation_id field

Frontend:

  • API client: frontend/src/api/client.ts
    • Generates session-scoped UUID4 on first use
    • Includes in X-Correlation-ID header for all requests
    • Extracts from response headers and stores in ApiError
  • Telemetry service: frontend/src/utils/telemetry.ts
    • Structured event logging with correlation ID support
    • Redaction utilities for privacy/security
    • Handlers for custom backends (console logger by default)
  • Error handlers: frontend/src/utils/fetchError.ts
    • Extract correlation ID from API errors
    • Log with telemetry for distributed tracing
  • Error boundaries: frontend/src/components/{Error,Page,Section}ErrorBoundary.tsx
    • Catch render-time exceptions
    • Log with telemetry for observability

Privacy & Security

  • No sensitive data logged:

    • Passwords, tokens, session IDs never logged
    • PII (names, emails, IPs) logged only with explicit intent and redaction
    • Redaction utilities: telemetry.redact(), telemetry.redactObject()
  • Backend: Correlation IDs use opaque UUID4 (no user data embedded)

  • Frontend: Same session UUID for all requests (safe to expose in logs)
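
The redaction idea can be illustrated with a short recursive helper. This sketch is not the frontend's telemetry.redact() or telemetry.redactObject(), whose behavior is not shown in this document. The key list, the placeholder string, and the function name are assumptions chosen for the example.

```python
from typing import Any

# Hypothetical deny-list; a real list would follow the project's own policy.
SENSITIVE_KEYS = frozenset({"password", "token", "session_id", "authorization", "cookie"})
REDACTED = "[REDACTED]"


def redact_object(value: Any) -> Any:
    """Return a copy with values under sensitive keys replaced, at any depth."""
    if isinstance(value, dict):
        return {
            key: REDACTED if str(key).lower() in SENSITIVE_KEYS else redact_object(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact_object(item) for item in value]
    return value  # scalars pass through unchanged
```

Applying such a helper just before an event is emitted keeps the rule in one place rather than relying on each call site.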

Future Enhancements

  1. Backend error telemetry aggregation:

    • Send structured logs to observability platform (DataDog, Grafana Loki, etc.)
    • Query by correlation ID to trace entire request flow
  2. Frontend error reporting:

    • Send frontend telemetry to backend /api/telemetry endpoint
    • Store alongside backend logs for unified view
  3. Metrics & dashboards:

    • Error rates by endpoint, severity, error type
    • Latency percentiles and distribution
    • Request success/failure trends

11. Design Principles

These principles govern all architectural decisions in BanGUI.

| Principle | Application |
|---|---|
| Separation of Concerns | Frontend and backend are independent. Backend layers (router → service → repository) never mix responsibilities. |
| Service Independence | Services must not import other services at the same layer (e.g., jail_config_service must not import jail_service). Shared logic belongs in the utils layer (app/utils/). This prevents circular dependencies, improves testability, and keeps each service focused on its domain. |
| Single Responsibility | Each module, service, and component has one well-defined job. |
| Dependency Inversion | Services depend on abstractions (protocols), not concrete implementations. FastAPI Depends() wires everything. |
| Async Everything | All I/O is non-blocking. No synchronous database, HTTP, or socket calls anywhere in the backend. |
| Validate at the Boundary | Pydantic models validate all data entering the backend. TypeScript types enforce structure on the frontend. |
| Fail Fast | Configuration is validated at startup. Invalid input is rejected immediately with clear errors. |
| Composition over Inheritance | Small, focused objects are composed together rather than building deep class hierarchies. |
| DRY | Shared logic lives in utils, hooks, or base services and is never duplicated across modules. |
| KISS | The simplest correct solution wins. No premature abstractions or over-engineering. |
| YAGNI | Only build what is needed now. Extend when a real requirement appears. |