Lukas 7ec80fdeec refactor(logging): replace structlog with stdlib logging compat layer

- Remove structlog dependency from backend/pyproject.toml
- Add app.utils.logging_compat shim for keyword-arg logging API
- Add app.utils.json_formatter for JSON log output with extra fields
- Update all backend modules to use logging_compat.get_logger()
- Update docstrings in log_sanitizer.py and json_formatter.py
- Update test comment in test_async_utils.py
- Record 406 failing tests in Docs/Tasks.md for tracking

2026-05-10 13:37:54 +02:00

BanGUI — Architecture

This document describes the system architecture of BanGUI, a web application for monitoring, managing, and configuring fail2ban. It defines every major component, module, and data flow so that any developer can understand how the pieces fit together before writing code.

1. High-Level Overview

BanGUI is a two-tier web application with a clear separation between frontend and backend, connected through a RESTful JSON API.

┌──────────────────────────────────────────────────────────────────┐
│                          Browser                                 │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                   Frontend (React + Fluent UI)             │  │
│  │  TypeScript · Vite · Single-Page Application               │  │
│  └──────────────────────────┬─────────────────────────────────┘  │
└─────────────────────────────┼────────────────────────────────────┘
                              │  HTTP / JSON (REST API)
┌─────────────────────────────┼────────────────────────────────────┐
│                          Server                                  │
│  ┌──────────────────────────┴─────────────────────────────────┐  │
│  │                   Backend (FastAPI)                        │  │
│  │  Python 3.12+ · Async · Pydantic v2 · stdlib logging       │  │
│  └─────┬──────────────┬──────────────┬────────────────────────┘  │
│        │              │              │                           │
│  ┌─────┴─────┐  ┌─────┴─────┐  ┌────┴─────┐                      │
│  │  SQLite   │  │ fail2ban  │  │ External │                      │
│  │  (App DB) │  │  (Socket) │  │   APIs   │                      │
│  └───────────┘  └───────────┘  └──────────┘                      │
└──────────────────────────────────────────────────────────────────┘

Component Summary

Component	Technology	Purpose
Frontend	TypeScript, React, Fluent UI v9, Vite	User interface — displays data, captures user input, communicates with the backend API
Backend	Python 3.12+, FastAPI, Pydantic v2, aiosqlite	Business logic, data persistence, fail2ban communication, scheduling
Application Database	SQLite (via aiosqlite)	Stores BanGUI's own data: configuration, session state, blocklist sources, import logs
fail2ban	Unix domain socket	The monitored service — BanGUI reads status, issues commands, and reads the fail2ban database
MaxMind GeoLite2	Offline MMDB file (mounted into container)	IP geolocation (primary resolver) — local, encrypted
External APIs	HTTP (via aiohttp)	Blocklist downloads; IP geolocation fallback (only if MMDB unavailable and HTTP fallback enabled)

2. Backend Architecture

The backend follows a layered architecture with strict separation of concerns. Dependencies flow inward: routers depend on services, services depend on repositories — never the reverse.

                ┌─────────────────────────────────┐
                │        FastAPI Application       │
                │          (main.py)               │
                └──────────┬───────────────────────┘
                           │
          ┌────────────────┼────────────────┐
          │                │                │
    ┌─────┴──────┐  ┌─────┴──────┐  ┌──────┴──────┐
    │  Routers   │  │   Tasks    │  │   Config    │
    │  (HTTP)    │  │ (Scheduled)│  │ (Settings)  │
    └─────┬──────┘  └─────┬──────┘  └─────────────┘
          │               │
    ┌─────┴───────────────┴──────┐
    │         Services           │
    │     (Business Logic)       │
    └─────┬──────────────┬───────┘
          │              │
    ┌─────┴──────┐ ┌─────┴──────┐
    │Repositories│ │  External  │
    │ (Database) │ │  Clients   │
    └─────┬──────┘ └─────┬──────┘
          │              │
    ┌─────┴──────┐ ┌─────┴──────┐
    │  SQLite    │ │fail2ban /  │
    │            │ │HTTP APIs   │
    └────────────┘ └────────────┘

2.1 Project Structure

backend/
├── app/
│   ├── __init__.py
│   ├── main.py                # FastAPI app factory, lifespan, exception handlers
│   ├── config.py              # Pydantic settings (env vars, .env loading)
│   ├── db.py                  # Database connection and initialization
│   ├── exceptions.py          # Shared domain exception classes; all services and routers import from here
│   ├── dependencies.py        # FastAPI Depends() providers (DB, services, auth)
│   ├── models/                # Pydantic schemas
│   │   ├── auth.py            #   Login request/response, session models
│   │   ├── ban.py             #   Ban request/response/domain models
│   │   ├── jail.py            #   Jail request/response/domain models
│   │   ├── config.py          #   Configuration view/edit models
│   │   ├── blocklist.py       #   Blocklist source/import models
│   │   ├── history.py         #   Ban history models
│   │   ├── server.py          #   Server status, health check models
│   │   └── setup.py           #   Setup wizard models
│   ├── routers/               # FastAPI routers (HTTP layer only)
│   │   ├── auth.py            #   POST /api/auth/login, POST /api/auth/logout
│   │   ├── setup.py           #   POST /api/setup (first-run configuration)
│   │   ├── dashboard.py       #   GET /api/dashboard/status, GET /api/dashboard/bans
│   │   ├── jails.py           #   CRUD + controls for jails
│   │   ├── bans.py            #   Ban/unban actions, currently banned list
│   │   ├── config.py          #   View/edit fail2ban configuration
│   │   ├── history.py         #   Historical ban queries
│   │   ├── blocklist.py       #   Blocklist source management, manual import trigger
│   │   ├── geo.py             #   IP geolocation and lookup
│   │   └── server.py          #   Server settings (log level, DB purge, etc.)
│   ├── services/              # Business logic (one service per domain)
│   │   ├── auth_service.py    #   Password verification, session creation/validation
│   │   ├── setup_service.py   #   First-run setup logic, configuration persistence
│   │   ├── jail_service.py    #   Jail listing, start/stop/reload, status aggregation
│   │   ├── ban_service.py     #   Ban/unban execution, currently-banned queries
│   │   ├── config_service.py  #   Read/write fail2ban config, regex validation
│   │   ├── config_file_service.py #   Shared config parsing and file-level operations
│   │   ├── raw_config_io_service.py #   Raw config file I/O wrapper
│   │   ├── jail_config_service.py #   jail config activation/deactivation logic
│   │   ├── filter_config_service.py #   filter config lifecycle management
│   │   ├── action_config_service.py #   action config lifecycle management
│   │   ├── log_service.py     #   Log preview and regex test operations
│   │   ├── fail2ban_metadata_service.py #   Resolve and cache the fail2ban SQLite DB path via the fail2ban socket
│   │   ├── history_service.py #   Historical ban queries, per-IP timeline
│   │   ├── blocklist_service.py # Orchestration: source CRUD, scheduling, import triggers
│   │   ├── blocklist_downloader.py #   HTTP download with retry logic
│   │   ├── blocklist_parser.py #   Parse and validate IP addresses
│   │   ├── blocklist_ban_executor.py #   Ban execution with error handling
│   │   ├── blocklist_import_workflow.py #   Import orchestration (coordinates components)
│   │   ├── geo_service.py     #   IP-to-country resolution, ASN/RIR lookup
│   │   ├── server_service.py  #   Server settings, log management, DB purge
│   │   └── health_service.py  #   fail2ban connectivity checks, version detection
│   ├── repositories/          # Data access layer (raw queries only)
│   │   ├── settings_repo.py   #   App configuration CRUD in SQLite
│   │   ├── session_repo.py    #   Session storage and lookup
│   │   ├── blocklist_repo.py  #   Blocklist sources and import log persistence
│   │   ├── fail2ban_db_repo.py #   fail2ban SQLite ban history read operations
│   │   ├── geo_cache_repo.py  #   IP geolocation cache persistence
│   │   └── import_log_repo.py #   Import run history records
│   ├── tasks/                 # APScheduler background jobs
│   │   ├── blocklist_import.py #   Scheduled blocklist download and application
│   │   ├── geo_cache_flush.py #   Periodic geo cache persistence (dirty-set flush to SQLite)
│   │   ├── geo_cache_cleanup.py #   Periodic purge of stale geo cache entries
│   │   ├── geo_re_resolve.py  #   Periodic re-resolution of stale geo cache records
│   │   └── health_check.py    #   Periodic fail2ban connectivity probe
│   └── utils/                 # Helpers, constants, shared types
│       ├── fail2ban_client.py #   Async wrapper around the fail2ban socket protocol
│       ├── fail2ban_response.py #   Canonical response parsing: ok(), to_dict(), ensure_list(), is_not_found_error()
│       ├── fail2ban_db_utils.py #   fail2ban database query helpers
│       ├── ip_utils.py        #   IP/CIDR validation and normalisation
│       ├── time_utils.py      #   Timezone-aware datetime helpers
│       ├── config_file_utils.py #   fail2ban config file I/O
│       ├── conffile_parser.py #   fail2ban config file parser/serializer
│       ├── config_parser.py   #   Structured config object parser
│       ├── config_writer.py   #   Atomic config file write operations
│       ├── jail_config.py     #   Jail config helper
│       └── constants.py       #   Shared constants (default paths, limits, etc.)
├── tests/
│   ├── conftest.py            # Shared fixtures (test app, client, mock DB)
│   ├── test_routers/          # One test file per router
│   ├── test_services/         # One test file per service
│   └── test_repositories/     # One test file per repository
├── pyproject.toml
└── .env.example

2.2 Module Purposes

Routers (`app/routers/`)

The HTTP interface layer. Each router maps URL paths to handler functions. Routers parse and validate incoming requests using Pydantic models, delegate all logic to services, and return typed responses. They contain zero business logic.

Router	Prefix	Purpose
`auth.py`	`/api/auth`	Login (password check), logout, session validation
`setup.py`	`/api/setup`	First-run wizard — save initial configuration
`dashboard.py`	`/api/dashboard`	Server status bar data, recent bans for the dashboard
`jails.py`	`/api/jails`	List jails, jail detail, start/stop/reload/idle controls
`bans.py`	`/api/bans`	Ban an IP, unban an IP, unban all, list currently banned IPs
`config.py`	`/api/config`	Read and write fail2ban jail/filter/server configuration via the socket; also serves the fail2ban log tail and service status for the Log tab
`file_config.py`	`/api/config`	Read and write fail2ban config files on disk (jail.d/, filter.d/, action.d/) — list, get, and overwrite raw file contents, toggle jail enabled/disabled
`history.py`	`/api/history`	Query historical bans, per-IP timeline
`blocklist.py`	`/api/blocklists`	CRUD blocklist sources, trigger import, view import logs
`geo.py`	`/api/geo`	IP geolocation lookup, ASN and RIR data
`server.py`	`/api/server`	Log level, log target, DB path, purge age, flush logs
`health.py`	`/api/health`	fail2ban connectivity health check and status

Services (`app/services/`)

The business logic layer. Services orchestrate operations, enforce rules, and coordinate between repositories, the fail2ban client, and external APIs. Each service covers a single domain.

Service Layer Responsibilities:

Services must be independent of HTTP concerns. They work with domain models (DTOs), not response models. This ensures:

Domain logic can evolve without affecting API shape
Services are reusable across different frontends
Testing is simpler (no mocking HTTP response types)
Changes to endpoint responses don't require service changes

Domain Models and Response Mapping:

Services return domain models (e.g., DomainActiveBanList, DomainBansByCountry) that represent pure business logic. Response models (e.g., ActiveBanListResponse, BansByCountryResponse) are defined in app/models/ and used only by routers.

Conversion happens at the router boundary:

1. Router calls service → receives domain model
2. Router calls mapper function to convert domain model → response model
3. Router returns response model to HTTP client

Example:

# In ban_service.py
async def get_active_bans(...) -> DomainActiveBanList:
    """Service returns domain model (not HTTP-aware)."""
    ...

# In routers/bans.py (router boundary)
domain_result = await ban_service.get_active_bans(...)
return map_domain_active_ban_list_to_response(domain_result)

Mapper functions live in app/mappers/ and are thin, mechanical translations between structures.

Motivation:

The Fail2ban domain doesn't care about field names like country_code (snake_case) vs countryCode (camelCase)
If the API needs pagination metadata added to the response, only the mapper changes
If repositories change their output schema, only services need updating (routers are unaffected)
Services can be tested with simple dataclasses; no need for Pydantic serialization overhead
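
The service/mapper/response split described above can be sketched in a few lines. This is a minimal illustration, not the project's code: the field names (`ip`, `jail`, `country_code`) are assumptions, and the response type is a plain dataclass standing in for the Pydantic response model so the sketch runs with the standard library only.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class DomainActiveBan:
    """Domain model returned by the service layer (field names are assumed)."""
    ip: str
    jail: str
    country_code: Optional[str]


@dataclass(frozen=True)
class ActiveBan:
    """Stand-in for the Pydantic response model that only routers use."""
    ip: str
    jail: str
    countryCode: Optional[str]


def map_domain_active_ban_to_response(ban: DomainActiveBan) -> ActiveBan:
    """Thin, mechanical translation: renames fields, holds no business logic."""
    return ActiveBan(ip=ban.ip, jail=ban.jail, countryCode=ban.country_code)
```

Because the renaming lives only in the mapper, a service test can construct `DomainActiveBan` directly and never touch the API field names.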

Service	Purpose
`auth_service.py`	Hashes and verifies the master password, creates and validates session tokens, enforces session expiry
`setup_service.py`	Validates setup input, persists initial configuration, ensures setup runs only once
`jail_service.py`	Retrieves jail list and details from fail2ban, aggregates metrics (banned count, failure count), sends start/stop/reload/idle commands
`ban_service.py`	Executes ban and unban commands via the fail2ban socket, queries the currently banned IP list, validates IPs before banning
`config_service.py`	Reads active jail and filter configuration from fail2ban, writes configuration changes, validates regex patterns, triggers reload; reads the fail2ban log file tail and queries service status for the Log tab
`file_config_service.py`	Reads and writes raw fail2ban config files on disk (jail.d/, filter.d/, action.d/); lists files, reads content, overwrites files, toggles enabled/disabled
`jail_config_service.py`	Discovers inactive jails by parsing jail.conf / jail.local / jail.d/*; writes .local overrides to activate/deactivate jails; triggers fail2ban reload; validates jail configurations
`filter_config_service.py`	Discovers available filters by scanning filter.d/; reads, creates, updates, and deletes filter definitions; assigns filters to jails
`action_config_service.py`	Discovers available actions by scanning action.d/; reads, creates, updates, and deletes action definitions; assigns actions to jails
`config_file_service.py`	Shared utilities for configuration parsing and manipulation: parses config files, validates names/IPs, manages atomic file writes, probes fail2ban socket
`raw_config_io_service.py`	Low-level file I/O for raw fail2ban config files
`fail2ban_metadata_service.py`	Resolves the fail2ban SQLite database path by querying the fail2ban socket and caches the result for reuse across services
`log_service.py`	Log preview and regex test operations (extracted from config_service)
`history_service.py`	Queries the fail2ban database for historical ban records, builds per-IP timelines, computes ban counts and repeat-offender flags, and syncs new records into BanGUI's archive table
`blocklist_service.py`	Orchestration layer for blocklist imports. Delegates to focused components: `BlocklistDownloader` (HTTP download with retry), `BlocklistParser` (IP validation), `BanExecutor` (fail2ban integration), and `BlocklistImportWorkflow` (orchestrates the flow). Maintains public API for source CRUD, preview, scheduling, and import triggers.
`geo_cache.py`	GeoCache class that encapsulates all IP geolocation caching: resolves IP addresses to country, ASN, and organization using a primary local MaxMind GeoLite2-Country database (if available) with optional HTTP fallback to ip-api.com (disabled by default for security). Maintains in-memory and persistent caches with negative cache support, and manages background re-resolution. Instantiated once at startup with allow_http_fallback flag and stored on `app.state.geo_cache`
`geo_service.py`	(Deprecated) Backward-compatibility wrappers that delegate to the `GeoCache` instance. Kept for compatibility with existing code. New code should use `GeoCache` directly or via dependency injection
`server_service.py`	Reads and writes fail2ban server-level settings (log level, log target, syslog socket, DB location, purge age)
`health_service.py`	Probes fail2ban socket connectivity, retrieves server version and global stats, reports online/offline status

Blocklist Import Architecture

The blocklist import flow has been refactored to separate concerns into focused components:

blocklist_service.py (Public API)
    │
    ├─ import_source() ──┐
    │                    │
    └─ import_all()      ├──> BlocklistImportWorkflow (Orchestrator)
                         │         │
                         │         ├──> BlocklistDownloader
                         │         │       • HTTP GET with retry logic
                         │         │       • Exponential backoff (429, 5xx)
                         │         │       • Timeout handling
                         │         │
                         │         ├──> BlocklistParser
                         │         │       • Parse text to IP lines
                         │         │       • Validate IPv4/IPv6 addresses
                         │         │       • Skip CIDRs and malformed entries
                         │         │
                         │         ├──> BanExecutor
                         │         │       • Ban each IP via fail2ban socket
                         │         │       • Abort on JailNotFoundError
                         │         │       • Continue on individual ban failures
                         │         │
                         │         └──> Geo pre-warming
                         │               (optional batch lookup for newly banned IPs)
                         │
                         └──> Result logging (import_log_repo)
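
The retry behaviour attributed to BlocklistDownloader (exponential backoff on 429 and 5xx, timeout handling) could look roughly like the sketch below. It is an assumption-laden illustration: `download_with_retry`, `DownloadError`, the attempt count, and the delays are hypothetical, and the HTTP call is abstracted behind a `fetch` callable rather than a real aiohttp request.

```python
import asyncio
from typing import Awaitable, Callable, Optional

RETRYABLE_STATUSES = {429} | set(range(500, 600))


class DownloadError(Exception):
    """Raised on a non-retryable status or when all attempts are used up."""


async def download_with_retry(
    fetch: Callable[[], Awaitable[tuple[int, str]]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> str:
    """Call fetch() until it returns 200, backing off exponentially in between."""
    for attempt in range(max_attempts):
        status: Optional[int] = None
        body = ""
        try:
            status, body = await fetch()
        except asyncio.TimeoutError:
            pass  # a timeout is treated like a retryable failure
        if status == 200:
            return body
        if status is not None and status not in RETRYABLE_STATUSES:
            raise DownloadError(f"non-retryable HTTP status {status}")
        if attempt < max_attempts - 1:
            await sleep(base_delay * 2**attempt)  # 0.5 s, 1 s, 2 s, ...
    raise DownloadError(f"giving up after {max_attempts} attempts")
```

Injecting `sleep` keeps tests fast: a fake can record the requested delays instead of waiting.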

Component Responsibilities:

BlocklistDownloader: Handles HTTP transport concerns (retries, timeouts, backoff)
BlocklistParser: Handles parsing and validation logic (clean, testable, no I/O)
BanExecutor: Handles fail2ban integration with error aggregation
BlocklistImportWorkflow: Coordinates the flow, handles result aggregation and geo pre-warming
blocklist_service.py: Maintains public API (source CRUD, scheduling, import triggers)
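
As an illustration of the parser's role (validate IPv4/IPv6 addresses, skip CIDRs and malformed entries, no I/O), here is a minimal sketch using the standard `ipaddress` module. The function name `parse_blocklist` and the treatment of `#` as a comment marker are assumptions, not taken from the project.

```python
import ipaddress


def parse_blocklist(text: str) -> tuple[list[str], int]:
    """Return (normalised valid IP addresses, number of skipped entries)."""
    valid: list[str] = []
    skipped = 0
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue  # blank and comment-only lines are ignored, not counted
        try:
            # ip_address() accepts single addresses only, so CIDR notation
            # such as "10.0.0.0/8" raises ValueError and is skipped.
            valid.append(str(ipaddress.ip_address(line)))
        except ValueError:
            skipped += 1
    return valid, skipped
```

Since the function takes text and returns plain values, it can be unit-tested without a network or a fail2ban socket.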

Benefits of This Architecture:

Each component is independently testable with mock dependencies
Error handling is clear: JailNotFoundError aborts the import, while a JailOperationError on a single IP is recorded and processing continues
Components can be evolved independently (e.g., replace HTTP client, add batch validation)
Logging is contextual and tied to the appropriate layer
Retry logic and transient error handling are isolated
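
The abort-versus-continue policy can be sketched as follows. The exception names match the ones used in this document but are redefined locally as stand-ins; `BanRunResult`, `ban_all`, and the `ban_ip` callable are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Awaitable, Callable


class JailNotFoundError(Exception):
    """The target jail does not exist, so no later ban can succeed either."""


class JailOperationError(Exception):
    """One ban command failed; other IPs may still succeed."""


@dataclass
class BanRunResult:
    banned: int = 0
    errors: list[str] = field(default_factory=list)
    aborted: bool = False


async def ban_all(ips: list[str], ban_ip: Callable[[str], Awaitable[None]]) -> BanRunResult:
    """Ban each IP, aggregating errors instead of raising them."""
    result = BanRunResult()
    for ip in ips:
        try:
            await ban_ip(ip)
            result.banned += 1
        except JailNotFoundError as exc:
            result.errors.append(str(exc))
            result.aborted = True
            break  # abort: every remaining ban would fail the same way
        except JailOperationError as exc:
            result.errors.append(f"{ip}: {exc}")  # record and continue
    return result
```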

DNS-Rebinding Protection

The Vulnerability:

A DNS-rebinding attack exploits a time-of-check-to-time-of-use (TOCTOU) window between when a blocklist URL is validated and when it is actually fetched:

1. User adds blocklist URL http://attacker.com/blocklist.txt
2. blocklist_service.create_source() calls validate_blocklist_url() which performs DNS resolution
3. attacker.com resolves to a public IP (attacker's real server) — validation passes ✓
4. Later, when BlocklistDownloader fetches the URL, the attacker's DNS server responds with 192.168.1.1
5. The HTTP client connects to the private IP, potentially accessing internal services

The Protection:

BanGUI closes this window by adding a second DNS-rebinding check at connection time:

1. Create-time validation (app/utils/ip_utils.py:validate_blocklist_url): Confirms the URL resolves to a public IP when created
2. Connection-time validation (app/services/dns_validated_connector.py): Validates that all resolved IPs are public when the actual HTTP connection is made

The HTTP session is created with a custom socket factory that intercepts DNS resolution results before socket creation. If any resolved IP is private or reserved, the connection is rejected with a clear error.

Implementation:

app/services/dns_validated_connector.py: Provides create_dns_validated_socket_factory() which returns a socket factory that validates IPs using is_private_ip()
app/startup.py:_create_http_session(): Passes the socket factory to aiohttp.TCPConnector, protecting all HTTP requests globally
All blocklist imports automatically inherit this protection through the shared session

Protected IP Ranges:

The validation blocks all RFC 1918 private ranges, loopback, link-local, ULA, multicast, and reserved addresses:

IPv4: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, 224.0.0.0/4, 240.0.0.0/4, 255.255.255.255/32
IPv6: ::1/128, fe80::/10, fc00::/7, ff00::/8, and others (via ipaddress.IPv6Address.is_private, etc.)
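
A check covering the ranges listed above can be written with the standard `ipaddress` module. The sketch below is an assumption about how such a check might look, not the project's `is_private_ip()`; the names `is_blocked_ip` and `all_resolved_ips_public` are hypothetical.

```python
import ipaddress
from typing import Iterable


def is_blocked_ip(address: str) -> bool:
    """True if the address falls in a private, loopback, link-local,
    multicast, or reserved range (IPv4 or IPv6)."""
    ip = ipaddress.ip_address(address)
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
    )


def all_resolved_ips_public(resolved: Iterable[str]) -> bool:
    """Connection-time rule: reject if ANY resolved address is blocked.

    An empty result is also rejected, since there is nothing safe to connect to.
    """
    addresses = list(resolved)
    return bool(addresses) and not any(is_blocked_ip(a) for a in addresses)
```

Checking every resolved address matters because a hostile DNS answer can mix one public record with one private record.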

Reference:

OWASP SSRF Prevention Cheat Sheet
Tests: backend/tests/test_services/test_dns_validated_connector.py

Startup DAG (`app/startup_dag.py`, `app/startup.py`)

The startup process is orchestrated by an explicit Directed Acyclic Graph (DAG) that defines all resource initialization stages, their dependencies, health checks, and rollback strategy. This replaces implicit ordering with explicit, documented prerequisites.

Why This Exists:

Previously, startup resources were created in a procedural sequence without documented dependencies. If a stage was reordered or a prerequisite was missed, initialization could fail in non-obvious ways. Partial failures could leave stale resources (open database connections, HTTP sessions, running schedulers) that prevented clean rollback.

Startup Stages (in order):

1. WORKER_MODE
   └─ Validates that BANGUI_WORKERS=1 (scheduler cannot run in multiple workers)

2. DATABASE
   ├─ Prerequisite: WORKER_MODE
   ├─ Creates database directory
   ├─ Initializes database schema
   ├─ Caches setup completion state
   └─ Loads persisted runtime settings

3. GEO_CACHE
   ├─ Prerequisite: DATABASE
   ├─ Loads IP geolocation cache from database
   ├─ Counts unresolved IPs
   ├─ Initializes MaxMind GeoLite2 database
   └─ Configures HTTP fallback (if enabled)

4. HTTP_SESSION
   ├─ Prerequisite: GEO_CACHE
   ├─ Creates aiohttp.ClientSession
   └─ Configures timeouts and connection limits

5. SCHEDULER
   ├─ Prerequisite: HTTP_SESSION
   ├─ Creates APScheduler AsyncIOScheduler
   └─ Starts the scheduler

6. TASKS
   ├─ Prerequisite: SCHEDULER
   ├─ Registers health_check task (fail2ban connectivity probe)
   ├─ Registers blocklist_import task (scheduled imports)
   ├─ Registers geo_cache_cleanup task (stale entry purge)
   ├─ Registers geo_cache_flush task (periodic persistence)
   ├─ Registers geo_re_resolve task (stale record re-resolution)
   ├─ Registers history_sync task (ban history sync)
   └─ Registers session_cleanup task (expired session purge)

Failure Mode & Rollback:

If any stage fails:

All completed stages are rolled back in reverse order (TASKS → SCHEDULER → HTTP_SESSION → GEO_CACHE → DATABASE → WORKER_MODE)
Each rollback suppresses exceptions to ensure all resources are cleaned up
Database connections are closed
HTTP sessions are closed
The scheduler is shut down
The application startup fails with a clear error message
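
The prerequisite check and reverse-order rollback can be illustrated with a toy orchestrator. `MiniStartup` is a hypothetical stand-in written for this sketch; it does not reproduce the `StartupDAG` API, and stage names are plain strings here.

```python
from typing import Awaitable, Callable

Action = Callable[[], Awaitable[None]]


class MiniStartup:
    """Toy orchestrator: explicit prerequisites plus reverse-order rollback."""

    def __init__(self) -> None:
        self._prereqs: dict[str, frozenset[str]] = {}
        self._rollbacks: dict[str, Action] = {}
        self.completed: list[str] = []

    def register(self, name: str, rollback: Action,
                 prerequisites: frozenset[str] = frozenset()) -> None:
        self._prereqs[name] = prerequisites
        self._rollbacks[name] = rollback

    async def execute(self, name: str, init: Action) -> None:
        missing = self._prereqs[name] - set(self.completed)
        if missing:
            raise RuntimeError(f"{name} requires {sorted(missing)}")
        try:
            await init()
        except Exception:
            await self.rollback()  # undo everything that already succeeded
            raise
        self.completed.append(name)

    async def rollback(self) -> None:
        for name in reversed(self.completed):
            try:
                await self._rollbacks[name]()
            except Exception:
                pass  # suppressed so the remaining stages are still cleaned up
        self.completed.clear()
```

If the third of three stages raises, the two completed stages are undone newest-first and the original exception is re-raised, so startup still fails loudly.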

Health Checks:

After all stages complete, a final health check verifies:

All resources have initialized successfully
Resources pass their individual health_check() methods
No failures occurred during any stage

Implementation:

StartupDAG: Orchestrates the entire flow, manages prerequisites, and handles failures
StartupStage: Enum defining the 6 startup stages
StageDependency: Defines stage metadata (description, prerequisites, rollback policy)
StartupContext: Tracks registered resources, completed stages, and failure state
startup_shared_resources(): Main entry point that builds and executes the DAG
stage*(): Functions that implement each stage's initialization logic

Example Usage in Tests:

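# Test that a stage with missing prerequisites fails.
# Assumes an async-capable pytest setup and that StartupDAG and StartupStage
# are importable from app/startup_dag.py.
import aiohttp
import pytest

from app.startup_dag import StartupDAG, StartupStage


async def test_stage_with_missing_prerequisites_fails() -> None:
    dag = StartupDAG()
    dag.register_stage(StartupStage.HTTP_SESSION, "Create HTTP session",
                       prerequisites=frozenset([StartupStage.DATABASE]))
    dag.register_stage(StartupStage.SCHEDULER, "Create scheduler")

    async def http_session_func() -> aiohttp.ClientSession:
        return aiohttp.ClientSession()

    # DATABASE hasn't completed, so executing HTTP_SESSION raises RuntimeError
    with pytest.raises(RuntimeError):
        await dag.execute_stage(StartupStage.HTTP_SESSION, http_session_func)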

Mappers (`app/mappers/`)

The response mapping layer. Mappers convert domain models (returned by services) to response models (consumed by HTTP routers). This layer enforces the separation between business logic and API shape.

Location: app/mappers/

Responsibilities:

Convert service domain models to API response models
Mechanical, thin translation — no business logic
Used exclusively at the router boundary

Pattern:

Each domain model has a corresponding mapper function:

# Mapping direction: domain model (from service) → mapper → response model
#   DomainActiveBan → map_domain_active_ban_to_response() → ActiveBan (response)

# Service returns domain models:
async def get_active_bans(...) -> DomainActiveBanList:
    ...

# Router converts at the boundary:
domain_result = await ban_service.get_active_bans(...)
return map_domain_active_ban_list_to_response(domain_result)

Why separate?

When API requirements change (e.g., a field is added or renamed), only two things change:

Response model in app/models/
Mapper function in app/mappers/

Routers and services stay the same.

Without this layer, changes to API shape would require modifying services and their tests.

Repositories (`app/repositories/`)

The data access layer. Repositories execute raw SQL queries against the application SQLite database. They return plain data or domain models — they never raise HTTP exceptions or contain business logic.

Repository	Purpose
`settings_repo.py`	CRUD operations for application settings (master password hash, DB path, fail2ban socket path, preferences)
`session_repo.py`	Store, retrieve, and delete session records for authentication
`blocklist_repo.py`	Persist blocklist source definitions (name, URL, enabled/disabled)
`fail2ban_db_repo.py`	Read historical ban records from the fail2ban SQLite database
`geo_cache_repo.py`	Persist and query IP geo resolution cache
`import_log_repo.py`	Record import run results (timestamp, source, IPs imported, errors) for the import log view

Every repository in app/repositories/ has a corresponding protocol in app/repositories/protocols.py, including settings_repo.py and history_archive_repo.py.
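
To show what a repository protocol buys, here is a minimal sketch with `typing.Protocol`. The protocol name, its methods, and the in-memory double are hypothetical and not copied from app/repositories/protocols.py.

```python
from typing import Optional, Protocol


class SessionRepository(Protocol):
    """Hypothetical protocol describing what a service needs from the repo."""

    async def get(self, token: str) -> Optional[dict]: ...

    async def delete(self, token: str) -> None: ...


class InMemorySessionRepo:
    """Test double; satisfies the protocol structurally, without inheritance."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    async def put(self, token: str, row: dict) -> None:
        self._rows[token] = row

    async def get(self, token: str) -> Optional[dict]:
        return self._rows.get(token)

    async def delete(self, token: str) -> None:
        self._rows.pop(token, None)


async def is_logged_in(repo: SessionRepository, token: str) -> bool:
    """Service-style function that depends on the protocol, not on SQLite."""
    return await repo.get(token) is not None
```

A service typed against the protocol can be exercised with the in-memory double, while production wiring passes the SQLite-backed repository.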

Models (`app/models/`)

Pydantic schemas that define data shapes and validation. Models are split into three categories per domain: request, response, and domain models.

Model file	Purpose
`auth.py`	Login request/response and session models
`ban.py`	Ban creation and lookup models
`blocklist.py`	Blocklist source and import log models
`config.py`	Fail2ban config view/edit models
`file_config.py`	Raw config file read/write models
`geo.py`	Geo and ASN lookup models
`history.py`	Historical ban query and timeline models
`jail.py`	Jail listing and status models
`server.py`	Server status and settings models
`setup.py`	First-run setup wizard models

Model Layering Rules: Models are pure data classes (leaf nodes) in the dependency graph. They must not import from application-layer modules (app.services, app.config, app.utils). Models may import from:

Standard library and third-party packages (Pydantic, typing)
Other models in app.models/ (sibling models)
app.models.response (response envelopes)

Critical Constraint — No I/O or Side Effects: Pydantic validators, field defaults, and computed fields must be pure functions with no side effects:

❌ NO imports from app.config, app.services, app.utils, or app.routers (these are application-layer modules)
❌ NO calls to get_settings(), file I/O, database queries, network calls, or any runtime-dependent functions
❌ NO default_factory that calls app-layer functions

These constraints ensure that importing a model file does not trigger application initialization, and they prevent hidden circular dependencies.

Validation that requires access to app-level state (e.g., allowed log directories, settings, database) belongs in the router or service layer, not in model validators. Validation then occurs at the boundary, where settings and services are already available.
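
The split between pure model validation and boundary validation might look like the sketch below. It uses a frozen dataclass as a stand-in for a Pydantic model so it runs on the standard library alone; `LogPreviewRequest`, `ensure_log_path_allowed`, and the example paths are hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Iterable


@dataclass(frozen=True)
class LogPreviewRequest:
    """Pure data: the only check here needs no settings, I/O, or app imports."""
    log_path: str

    def __post_init__(self) -> None:
        if not self.log_path.startswith("/"):
            raise ValueError("log_path must be absolute")


def ensure_log_path_allowed(
    request: LogPreviewRequest, allowed_dirs: Iterable[str]
) -> PurePosixPath:
    """Boundary check: the caller passes settings-derived state in explicitly."""
    path = PurePosixPath(request.log_path)
    if ".." in path.parts:
        raise ValueError("log_path must not contain '..'")
    for directory in allowed_dirs:
        if path.is_relative_to(PurePosixPath(directory)):
            return path
    raise ValueError(f"{path} is outside the allowed log directories")
```

The model stays importable without any application state, and the router or service supplies `allowed_dirs` from settings at call time.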

Tasks (`app/tasks/`)

APScheduler background jobs that run on a schedule without user interaction.

Task	Purpose
`blocklist_import.py`	Downloads all enabled blocklist sources, validates entries, applies bans, records results in the import log
`geo_cache_cleanup.py`	Periodically removes entries from the `geo_cache` table that have not been referenced in the configured retention period (default: 90 days). Prevents unbounded database growth.
`geo_cache_flush.py`	Periodically flushes newly resolved IPs from the in-memory dirty set to the `geo_cache` SQLite table (default: every 60 seconds). GET requests populate only the in-memory cache; this task persists them without blocking any request.
`geo_re_resolve.py`	Periodically re-resolves stale entries in `geo_cache` to keep geolocation data fresh
`health_check.py`	Periodically pings the fail2ban socket and updates the cached server status so the frontend always has fresh data
`history_sync.py`	Periodically copies new records from the fail2ban SQLite database into BanGUI's `history_archive` table; delegates the sync algorithm to `history_service.py`
`session_cleanup.py`	Periodically removes expired sessions from the `sessions` SQLite table (default: every 6 hours). Without this cleanup, the table grows unbounded and degrades query performance.
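
The dirty-set flush described for `geo_cache_flush.py` can be sketched as follows. This is a simplified, synchronous illustration using the standard `sqlite3` module instead of aiosqlite; the class name and the two-column table layout are assumptions.

```python
import sqlite3


class MiniGeoCache:
    """Toy in-memory cache with a dirty set that is flushed to SQLite in batches."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._cache: dict[str, str] = {}
        self._dirty: set[str] = set()
        conn.execute(
            "CREATE TABLE IF NOT EXISTS geo_cache (ip TEXT PRIMARY KEY, country_code TEXT)"
        )

    def store(self, ip: str, country_code: str) -> None:
        """Request path: touches memory only, never the database."""
        self._cache[ip] = country_code
        self._dirty.add(ip)

    def flush_dirty(self) -> int:
        """Scheduled task: persist everything resolved since the last flush."""
        batch, self._dirty = self._dirty, set()
        rows = [(ip, self._cache[ip]) for ip in batch]
        with self._conn:  # one transaction for the whole batch
            self._conn.executemany(
                "INSERT OR REPLACE INTO geo_cache (ip, country_code) VALUES (?, ?)",
                rows,
            )
        return len(rows)
```

Swapping the dirty set for a fresh one before writing means entries stored during a flush are picked up by the next run rather than lost.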

Utils (`app/utils/`)

Pure helper modules with no framework dependencies.

Module	Purpose
`fail2ban_client.py`	Async client that communicates with fail2ban via its Unix domain socket — sends commands and parses responses using the fail2ban protocol. Modelled after `./fail2ban-master/fail2ban/client/csocket.py` and `./fail2ban-master/fail2ban/client/fail2banclient.py`.
`jail_socket.py`	Low-level jail reload operations (`reload_all`) extracted to break service dependencies. Used by `jail_service`, `jail_config_service`, `action_config_service`, and `filter_config_service` to avoid circular imports between sibling services.
`ip_utils.py`	Validates IPv4/IPv6 addresses and CIDR ranges using the `ipaddress` stdlib module, normalises formats
`jail_utils.py`	Jail helper functions for configuration and status inference
`jail_config.py`	Jail config parser and serializer for fail2ban config manipulation
`time_utils.py`	Timezone-aware datetime construction, formatting helpers, time-range calculations
`log_utils.py`	Structured log formatting and enrichment helpers
`conffile_parser.py`	Parses Fail2ban `.conf` files into structured objects and serialises back to text
`config_parser.py`	Builds structured config objects from file content tokens
`config_writer.py`	Atomic config file writes, backups, and safe replace semantics
`config_file_utils.py`	Common file-level config utility helpers
`fail2ban_db_utils.py`	Fail2ban DB path discovery and ban-history parsing helpers
`setup_utils.py`	Setup wizard helper utilities
`constants.py`	Shared constants: default socket path, default database path, time-range presets, parser truthy values, limits

Configuration (`app/config.py`)

A single Pydantic settings model that loads all configuration from environment variables (prefixed BANGUI_) and an optional .env file. Validated at startup — the application refuses to start if required values are missing.
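
The fail-fast loading of `BANGUI_`-prefixed variables can be illustrated without Pydantic. The sketch below is a standard-library stand-in, not the project's settings model: `session_secret` is a hypothetical required field, and `workers` mirrors the `BANGUI_WORKERS` variable mentioned in the startup section.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "BANGUI_"


@dataclass(frozen=True)
class Settings:
    session_secret: str  # hypothetical required value
    workers: int = 1     # read from BANGUI_WORKERS when present


def load_settings(environ: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from BANGUI_* variables; fail fast if a required one is missing."""
    env = os.environ if environ is None else environ
    values = {
        key[len(PREFIX):].lower(): value
        for key, value in env.items()
        if key.startswith(PREFIX)
    }
    if "session_secret" not in values:
        raise RuntimeError("missing required setting BANGUI_SESSION_SECRET")
    return Settings(
        session_secret=values["session_secret"],
        workers=int(values.get("workers", "1")),
    )
```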

Dependencies (`app/dependencies.py`)

FastAPI Depends() providers that inject shared resources into route handlers: the database connection, service instances, the authenticated session, and the fail2ban client. This is the wiring layer that connects routers to services without tight coupling.

Application Entry Point (`app/main.py`)

The FastAPI app factory. Responsibilities:

Creates the FastAPI instance with metadata (title, version, docs URL)
Registers the lifespan context manager (startup: open DB, create aiohttp session, start scheduler; shutdown: close all)
Mounts all routers
Registers global exception handlers that map domain exceptions to HTTP status codes with a hierarchical fallback chain
Applies the setup-redirect middleware (returns 423 Locked for all API requests when no configuration exists, except for /api/setup and /api/health)
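The decision made by the setup-redirect middleware can be sketched as a pure function. This is an illustration under assumptions: the function name is hypothetical, and matching the exempt paths by exact path or sub-path is a guess at how the exemption is applied.

```python
from typing import Optional

# Paths named in the text above as exempt from the setup lock.
EXEMPT_PATHS = ("/api/setup", "/api/health")


def lock_status_for(path: str, setup_complete: bool) -> Optional[int]:
    """Return 423 when the request must be rejected, otherwise None."""
    if setup_complete or not path.startswith("/api/"):
        return None  # configured, or not an API request at all
    for exempt in EXEMPT_PATHS:
        if path == exempt or path.startswith(exempt + "/"):
            return None
    return 423  # HTTP 423 Locked
```

Keeping the rule in a plain function like this makes it easy to unit-test separately from the middleware plumbing.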

Exception Handler Hierarchy:

Exception handlers are registered in order of specificity to ensure each exception type is caught by the most appropriate handler:

1. Specific network errors (Fail2BanConnectionError, Fail2BanProtocolError) → HTTP 502 Bad Gateway
2. Specific auth/rate errors (AuthenticationError, RateLimitError) → HTTP 401 Unauthorized / 429 Too Many Requests
3. Category handlers (NotFoundError, BadRequestError, ConflictError, OperationError, ServiceUnavailableError) → HTTP 404/400/409/500/503
4. DomainError catch-all → HTTP 500 (catches any unregistered DomainError subclass, ensuring proper error_code and metadata are returned)
5. HTTPException → HTTP status from exception (FastAPI built-in validation and routing errors)
6. ValueError → HTTP 400 Bad Request (Pydantic validation errors)
7. Exception catch-all → HTTP 500 Internal Server Error (absolute fallback for unexpected errors)

The DomainError catch-all handler (step 4) is critical: it ensures that any new DomainError subclass automatically gets the correct HTTP status (500), error_code, and metadata through its inherited error_code attribute and get_error_metadata() method, even if the developer forgot to create an explicit handler for it. This prevents silent failures where an unhandled exception would return a generic "internal_error" code instead of the specific error code defined by the exception class.
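The fallback behaviour can be modelled as a lookup along the exception's class hierarchy. The sketch below is a simplified stand-in, not the handlers registered in `app/main.py`: `JailNotFoundError`, `QuotaExceededError`, the `error_code` values, and the `resolve` helper are hypothetical names chosen for illustration.

```python
from http import HTTPStatus


class DomainError(Exception):
    """Base class; subclasses override error_code."""
    error_code = "domain_error"


class NotFoundError(DomainError):
    error_code = "not_found"


class JailNotFoundError(NotFoundError):
    # Hypothetical subclass with no handler of its own.
    error_code = "jail_not_found"


class QuotaExceededError(DomainError):
    # Hypothetical subclass that nobody registered a handler for.
    error_code = "quota_exceeded"


# Registered statuses; DomainError and Exception act as the fallbacks.
STATUS_BY_TYPE = {
    NotFoundError: HTTPStatus.NOT_FOUND,
    DomainError: HTTPStatus.INTERNAL_SERVER_ERROR,
    Exception: HTTPStatus.INTERNAL_SERVER_ERROR,
}


def resolve(exc: BaseException):
    """Walk the MRO and use the first class that has a registered status."""
    for cls in type(exc).__mro__:
        if cls in STATUS_BY_TYPE:
            # Only domain errors carry a specific code; everything else is generic.
            code = exc.error_code if isinstance(exc, DomainError) else "internal_error"
            return int(STATUS_BY_TYPE[cls]), code
    return 500, "internal_error"
```

In this model `QuotaExceededError` falls through to the `DomainError` entry and still reports its own `error_code`, which is the property the catch-all handler is meant to guarantee.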

2.3 Dependency Wiring and Service Composition

BanGUI uses a lightweight dependency injection (DI) pattern based on FastAPI's Depends() framework. There is no heavy container library — the composition root is implicit and managed through simple provider functions in app/dependencies.py.

The DI Pattern

Every injectable dependency follows this structure:

Provider Function — An async function in app/dependencies.py that creates and returns a dependency:

async def get_settings(app_context: ...) -> Settings:
    """Provide application settings."""
    return app_context.runtime_settings or app_context.settings

Type Alias — An Annotated alias that decorates the provider for use in route signatures:
```
SettingsDep = Annotated[Settings, Depends(get_settings)]
```

Injection Point — Routers declare their dependencies using the type alias:

async def my_route(settings: SettingsDep) -> Response:
    # FastAPI automatically calls get_settings() and injects the result
    ...

Module-Level Imports:

All repository and service modules are imported at module level in app/dependencies.py. These imports are safe at the top because no circular dependencies exist — repositories and services do not import from dependencies.py. This follows the principle of importing dependencies early and consistently:

# app/dependencies.py (top of file)
from app.repositories import (
    blocklist_repo,
    fail2ban_db_repo,
    session_repo,
    # ... other repository modules
)
from app.services import auth_service, health_service
from app.services.fail2ban_metadata_service import default_fail2ban_metadata_service

# Provider functions simply return the module
async def get_session_repo() -> SessionRepository:
    return session_repo

Exception: the import from app.db import open_db stays inside get_db(), because only that function uses it and the local import avoids paying the module load cost when app/dependencies.py is imported.

Service Composition Root

Services are not instantiated by a container. Instead, they are composed by routers and tasks through explicit parameter passing. This keeps dependencies visible and avoids implicit side effects.

Example: How ban_service.get_active_bans() is wired:

# Step 1: Router declares what it needs (dependencies.py)
async def get_ban_service_context(
    db: Annotated[aiosqlite.Connection, Depends(get_db)],
    fail2ban_db_repo: Annotated[Fail2BanDbRepository, Depends(get_fail2ban_db_repo)],
) -> BanServiceContext:
    """Combine database connection and repository."""
    return BanServiceContext(db=db, fail2ban_db_repo=fail2ban_db_repo)

BanServiceContextDep = Annotated[BanServiceContext, Depends(get_ban_service_context)]

# Step 2: Router uses the context and calls the service
@router.get("/active")
async def get_active_bans(
    ban_ctx: BanServiceContextDep,
    socket_path: Fail2BanSocketDep,
    geo_cache: GeoCacheDep,
) -> ActiveBanListResponse:
    # Router explicitly passes everything the service needs
    domain_result = await ban_service.get_active_bans(
        socket_path,
        geo_cache=geo_cache,
        app_db=ban_ctx.db,  # ← Explicit, no magic
    )
    return map_domain_active_ban_list_to_response(domain_result)

# Step 3: Service function accepts dependencies as parameters
async def get_active_bans(
    socket_path: str,
    geo_cache: GeoCache,
    app_db: aiosqlite.Connection,
) -> DomainActiveBanList:
    """Retrieve active bans. All dependencies are explicit parameters."""
    # Service logic here
    ...

Why this pattern?

Explicit: No hidden coupling. Every dependency is visible in function signatures.
Testable: Easy to mock dependencies by passing test doubles.
Lightweight: No heavyweight DI container library needed. FastAPI's Depends() is sufficient.
Debuggable: Stack traces and type checkers understand the full dependency chain.
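The testability claim can be made concrete with a reduced version of the service function. This is a sketch under assumptions: `FakeGeoCache`, its `lookup()` method, and the `fetch_banned_ips` parameter are hypothetical stand-ins, and the function body is not the real `ban_service` logic.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeGeoCache:
    """Test double; lookup() is a hypothetical interface."""
    countries: dict

    def lookup(self, ip: str) -> Optional[str]:
        return self.countries.get(ip)


async def fake_fetch_banned_ips(socket_path: str) -> list:
    """Stands in for the fail2ban socket call and returns canned data."""
    return ["203.0.113.7", "198.51.100.4"]


async def get_active_bans(socket_path, geo_cache, fetch_banned_ips):
    """Simplified service function: every collaborator arrives as a parameter."""
    ips = await fetch_banned_ips(socket_path)
    return [{"ip": ip, "country": geo_cache.lookup(ip)} for ip in ips]


def run_service_with_doubles() -> list:
    geo = FakeGeoCache({"203.0.113.7": "DE"})
    return asyncio.run(get_active_bans("/tmp/unused.sock", geo, fake_fetch_banned_ips))
```

No patching or container setup is needed: the test simply passes different arguments.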

Service Context Dependencies

For convenience, related repositories and the database connection are bundled into context objects. These prevent routers from depending on the raw database connection directly (which would violate the repository boundary).

Available Service Contexts:

Context	Includes	Used By
`SessionServiceContext`	`db`, `session_repo`	auth router
`BlocklistServiceContext`	`db`, `blocklist_repo`, `import_log_repo`, `settings_repo`	blocklist router
`SettingsServiceContext`	`db`, `settings_repo`	server settings router
`BanServiceContext`	`db`, `fail2ban_db_repo`	ban router
`HistoryServiceContext`	`db`, `fail2ban_db_repo`, `history_archive_repo`	history router

Each context is created by a provider function:

async def get_ban_service_context(
    db: Annotated[aiosqlite.Connection, Depends(get_db)],
    fail2ban_db_repo: Annotated[Fail2BanDbRepository, Depends(get_fail2ban_db_repo)],
) -> BanServiceContext:
    return BanServiceContext(db=db, fail2ban_db_repo=fail2ban_db_repo)

Adding a New Service

Follow this checklist when creating a new service:

Create the service module — app/services/my_service.py
Define the service functions — Each function takes its dependencies as explicit parameters (no imports of other services at the same layer)
Export key functions — Only the public API functions are called by routers
If database access is needed:
- Routers depend on the appropriate ServiceContextDep (e.g., BanServiceContextDep)
- Pass context.db and context.repository to the service function
If a new context is needed:
- Create a @dataclass in app/dependencies.py to hold the related resources
- Create a provider function get_<service>_context() that combines them
- Create a type alias <Service>ContextDep for router injection
Register the service — No registration step; FastAPI discovers it via Depends()

Example: Adding a new service that needs blocklist and settings repos:

# app/services/my_new_service.py
async def do_something(
    db: aiosqlite.Connection,
    blocklist_repo: BlocklistRepository,
    settings_repo: SettingsRepository,
) -> MyResult:
    """Do something with blocklist and settings data."""
    sources = await blocklist_repo.list_sources(db)
    settings = await settings_repo.load(db)
    # Business logic
    return ...

# app/routers/my_router.py
from app.dependencies import BlocklistServiceContextDep
from app.services import my_new_service

@router.get("/something")
async def my_endpoint(
    ctx: BlocklistServiceContextDep,  # ← Already has db, blocklist_repo, settings_repo
) -> MyResponse:
    result = await my_new_service.do_something(
        db=ctx.db,
        blocklist_repo=ctx.blocklist_repo,
        settings_repo=ctx.settings_repo,
    )
    return MyResponse(...)

The Repository Boundary

Services must not run SQL against raw database connections; they only pass the connection on to repositories. The repository boundary is enforced by not exporting DbDep to routers. Instead:

Routers declare a ServiceContextDep which includes both the db and the needed repositories
Services receive the db connection and repositories as parameters
Repositories are the only modules that execute SQL; services never call SQL directly

This ensures:

Queries are centralized and testable
Changes to the database layer don't leak into business logic
Repositories can be mocked independently for testing
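The boundary can be sketched as follows. The stdlib `sqlite3` module stands in for aiosqlite to keep the example synchronous, and `enabled_source_count`, `StubRepository`, and the reduced `blocklist_sources` schema are hypothetical.

```python
import sqlite3


class BlocklistRepository:
    """The only place in this sketch that contains SQL."""

    def list_sources(self, db):
        rows = db.execute("SELECT url FROM blocklist_sources WHERE enabled = 1 ORDER BY id")
        return [url for (url,) in rows]


def enabled_source_count(db, repo) -> int:
    """Service-level function: forwards db to the repository, never runs SQL itself."""
    return len(repo.list_sources(db))


class StubRepository:
    """Test double that needs no database at all."""

    def list_sources(self, db):
        return ["https://example.test/a.txt", "https://example.test/b.txt"]


def demo_repository_boundary():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE blocklist_sources (id INTEGER PRIMARY KEY, url TEXT, enabled INTEGER)")
    db.executemany(
        "INSERT INTO blocklist_sources (url, enabled) VALUES (?, ?)",
        [("https://example.test/a.txt", 1), ("https://example.test/b.txt", 0)],
    )
    real = enabled_source_count(db, BlocklistRepository())
    stubbed = enabled_source_count(None, StubRepository())  # no connection required
    db.close()
    return real, stubbed
```

Because the service only forwards the connection, swapping the repository for a stub removes the database from the test entirely.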

Lifecycle and Scope

Request-scoped: Database connections are created fresh for each request and closed after the response is sent. This prevents contention and locking issues with SQLite.
Application-scoped: Shared resources like aiohttp.ClientSession, the scheduler, and the GeoCache are created at startup and reused across all requests.
Singleton: Some services (e.g., Fail2BanMetadataService) are instantiated once and cached in app.state or imported as module-level instances.
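The request-scoped lifetime comes down to an open/yield/close shape. The sketch below models it with an async context manager and the stdlib `sqlite3` module; `request_db`, `handle_request`, and the `events` list are hypothetical and exist only to make the ordering visible. FastAPI dependencies written with `yield` follow the same try/finally structure.

```python
import asyncio
import sqlite3
from contextlib import asynccontextmanager

events = []  # records the order of open/close calls for the demo


@asynccontextmanager
async def request_db(path: str):
    """Open a connection for one request and always close it afterwards."""
    conn = sqlite3.connect(path)
    events.append("open")
    try:
        yield conn
    finally:
        conn.close()  # runs even if the handler raised
        events.append("close")


async def handle_request(path: str) -> int:
    async with request_db(path) as db:
        return db.execute("SELECT 1").fetchone()[0]


def demo_request_scope() -> list:
    events.clear()
    asyncio.run(handle_request(":memory:"))
    asyncio.run(handle_request(":memory:"))
    return list(events)
```

Each simulated request gets its own connection, and the connection is closed before the next request opens one.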

3. Frontend Architecture

The frontend is a React single-page application built with TypeScript, Vite, and Fluent UI v9. It communicates exclusively with the backend REST API — it never accesses fail2ban, the database, or external services directly.

┌──────────────────────────────────────────────────────────────┐
│                     React Application                        │
│                                                              │
│   ┌──────────┐    ┌────────────┐    ┌──────────────────┐    │
│   │  Pages   │───▶│ Components │───▶│   Fluent UI v9   │    │
│   └────┬─────┘    └────────────┘    └──────────────────┘    │
│        │                                                     │
│   ┌────┴─────┐    ┌────────────┐    ┌──────────────────┐    │
│   │  Hooks   │───▶│  API Layer │───▶│  Backend (REST)  │    │
│   └──────────┘    └────────────┘    └──────────────────┘    │
│                                                              │
│   ┌──────────┐    ┌────────────┐    ┌──────────────────┐    │
│   │Providers │    │   Types    │    │     Theme        │    │
│   │(Context) │    │(Interfaces)│    │(Tokens, Styles)  │    │
│   └──────────┘    └────────────┘    └──────────────────┘    │
└──────────────────────────────────────────────────────────────┘

3.1 Project Structure

frontend/
├── public/
├── src/
│   ├── api/                   # API client and per-domain request functions
│   │   ├── client.ts          #   Central fetch wrapper (typed GET/POST/PUT/DELETE)
│   │   ├── endpoints.ts       #   API path constants
│   │   ├── auth.ts            #   Login, logout, session check
│   │   ├── dashboard.ts       #   Dashboard status and ban list
│   │   ├── jails.ts           #   Jail CRUD and controls
│   │   ├── bans.ts            #   Ban/unban actions, banned list
│   │   ├── config.ts          #   Configuration read/write
│   │   ├── history.ts         #   Ban history queries
│   │   ├── blocklist.ts       #   Blocklist source management
│   │   ├── geo.ts             #   IP lookup / geolocation
│   │   └── server.ts          #   Server settings
│   ├── assets/                # Static images, fonts, icons
│   ├── components/            # Reusable UI components
│   │   ├── BanTable.tsx       #   Data table for ban entries
│   │   ├── JailCard.tsx       #   Summary card for a jail
│   │   ├── StatusBar.tsx      #   Server status indicator strip
│   │   ├── TimeRangeSelector.tsx # Quick preset picker (24h, 7d, 30d, 365d)
│   │   ├── IpInput.tsx        #   IP address input with validation
│   │   ├── RegexTester.tsx    #   Side-by-side regex match preview
│   │   ├── WorldMap.tsx       #   Country-outline map with ban counts
│   │   ├── ImportLogTable.tsx #   Blocklist import run history
│   │   ├── ConfirmDialog.tsx  #   Reusable confirmation modal
│   │   ├── RequireAuth.tsx    #   Route guard: redirects unauthenticated users to /login
│   │   ├── SetupGuard.tsx     #   Route guard: redirects to /setup if setup incomplete
│   │   └── ...                #   (additional shared components)
│   ├── hooks/                 # Custom React hooks (stateful logic + API calls)
│   │   ├── useAuth.ts         #   Login state, login/logout actions
│   │   ├── useBans.ts         #   Fetch ban list for a time range
│   │   ├── useJails.ts        #   Fetch jail list and details
│   │   ├── useConfig.ts       #   Fetch and update configuration
│   │   ├── useHistory.ts      #   Fetch historical ban data
│   │   ├── useBlocklists.ts   #   Fetch and manage blocklist sources
│   │   ├── useServerStatus.ts #   Poll server health / status
│   │   └── useGeo.ts          #   IP lookup hook
│   ├── layouts/               # Page-level layout wrappers
│   │   └── AppLayout.tsx      #   Sidebar navigation + header + content area
│   ├── pages/                 # Route-level page components (one per route)
│   │   ├── SetupPage.tsx      #   First-run wizard
│   │   ├── LoginPage.tsx      #   Password prompt
│   │   ├── DashboardPage.tsx  #   Ban overview, status bar
│   │   ├── WorldMapPage.tsx   #   Geographical ban map
│   │   ├── JailsPage.tsx      #   Jail list, detail, controls, ban/unban
│   │   ├── ConfigPage.tsx     #   Configuration viewer/editor
│   │   ├── HistoryPage.tsx    #   Ban history browser
│   │   └── BlocklistPage.tsx  #   Blocklist source management + import log
│   ├── providers/             # React context providers
│   │   ├── AuthProvider.tsx   #   Authentication state and guards
│   │   └── ThemeProvider.tsx  #   Light/dark theme switching
│   ├── theme/                 # Fluent UI theme definitions
│   │   ├── customTheme.ts     #   Brand colour ramp, light and dark themes
│   │   └── tokens.ts          #   Spacing, sizing, and z-index constants
│   ├── types/                 # Shared TypeScript interfaces
│   │   ├── auth.ts            #   LoginRequest, SessionInfo
│   │   ├── ban.ts             #   Ban, BanListResponse, BanRequest
│   │   ├── jail.ts            #   Jail, JailDetail, JailListResponse
│   │   ├── config.ts          #   ConfigSection, ConfigUpdateRequest
│   │   ├── history.ts         #   HistoryEntry, IpTimeline
│   │   ├── blocklist.ts       #   BlocklistSource, ImportLogEntry
│   │   ├── geo.ts             #   GeoInfo, AsnInfo
│   │   ├── server.ts          #   ServerStatus, ServerSettings
│   │   └── api.ts             #   ApiError, PaginatedResponse
│   ├── utils/                 # Pure helper functions
│   │   ├── formatDate.ts      #   Date/time formatting with timezone support
│   │   ├── formatIp.ts        #   IP display formatting
│   │   ├── crypto.ts          #   Browser-native SHA-256 helper (SubtleCrypto)
│   │   └── constants.ts       #   Frontend constants (time presets, etc.)
│   ├── App.tsx                # Root: FluentProvider + BrowserRouter + routes
│   ├── main.tsx               # Vite entry point
│   └── vite-env.d.ts          # Vite type shims
├── tsconfig.json
├── vite.config.ts
└── package.json

3.2 Module Purposes

Pages (`src/pages/`)

Top-level route components. Each page composes layout, components, and hooks to create a full screen. Pages contain no business logic — they orchestrate what is displayed and delegate data fetching to hooks.

Page	Route	Purpose
`SetupPage`	`/setup`	First-run wizard: set master password, database path, fail2ban connection, preferences
`LoginPage`	`/login`	Single-field password prompt; redirects to requested page after success
`DashboardPage`	`/`	Server status bar, ban list table, time-range selector
`WorldMapPage`	`/map`	World map with per-country ban counts, country filter
`JailsPage`	`/jails`	Jail overview list, jail detail panel, controls (start/stop/reload), ban/unban forms, IP lookup, whitelist management
`ConfigPage`	`/config`	View and edit jail parameters, filter regex, server settings, regex tester, add log observation
`HistoryPage`	`/history`	Browse all past bans, filter by jail/IP/time, per-IP timeline drill-down
`BlocklistPage`	`/blocklists`	Manage blocklist sources, schedule configuration, import log, manual import trigger

Components (`src/components/`)

Reusable UI building blocks. Components receive data via props, emit changes via callbacks, and never call the API directly. Built exclusively with Fluent UI v9 components.

Component	Purpose
`StatusBar`	Displays fail2ban server status (online/offline, version, jail count, total bans)
`BanTable`	Sortable data table for ban entries with columns for time, IP, jail, country, etc.
`JailCard`	Summary card showing jail name, status badge, key metrics
`TimeRangeSelector`	Quick-preset picker for filtering data (24h, 7d, 30d, 365d)
`IpInput`	IP address text field with inline validation
`WorldMap`	SVG/Canvas country-outline map with count overlays and click-to-filter
`RegexTester`	Side-by-side sample log + regex input with live match highlighting
`ImportLogTable`	Table displaying blocklist import history
`ConfirmDialog`	Reusable Fluent UI Dialog for destructive action confirmations
`RequireAuth`	Route guard: renders children only when authenticated; otherwise redirects to `/login?next=<path>`
`SetupGuard`	Route guard: checks `GET /api/setup` on mount and redirects to `/setup` if not complete; shows a spinner while loading
`config/ConfigListDetail`	Reusable two-pane master/detail layout used by the Jails, Filters, and Actions config tabs. Left pane lists items with active/inactive badges (active sorted first, keyboard navigable); right pane renders the selected item's detail content. Collapses to a dropdown on narrow screens.
`config/RawConfigSection`	Collapsible section that lazily loads the raw text of a config file into a monospace textarea. Provides a Save button backed by a configurable save callback; shows idle/saving/saved/error feedback. Used by all three config tabs.
`config/AutoSaveIndicator`	Small inline indicator showing the current save state (idle, saving, saved, error) for form fields that auto-save on change.

Hooks (`src/hooks/`)

Encapsulate all stateful logic, side effects, and API calls. Components and pages consume hooks to stay declarative.

Hook	Purpose
`useAuth`	Manages login state, provides `login()`, `logout()`, and `isAuthenticated`
`useBans`	Fetches ban list for a given time range, returns `{ bans, loading, error }`
`useJails`	Fetches jail list and individual jail detail
`useConfig`	Reads and writes fail2ban jail configuration via the socket-based API
`useFilterConfig`	Fetches and manages a single filter file's parsed configuration
`useActionConfig`	Fetches and manages a single action file's parsed configuration
`useJailFileConfig`	Fetches and manages a single jail.d config file
`useConfigActiveStatus`	Derives active status sets for jails, filters, and actions by correlating the live jail list with the config file lists; returns `{ activeJails, activeFilters, activeActions, loading, error, refresh }`
`useAutoSave`	Debounced auto-save hook: invokes a save callback after the user stops typing, tracks saving/saved/error state
`useHistory`	Queries historical ban data with filters
`useBlocklists`	Manages blocklist sources and import triggers
`useServerStatus`	Polls the server status endpoint at an interval
`useGeo`	Performs IP geolocation lookups on demand

API Layer (`src/api/`)

A thin typed wrapper around fetch. All HTTP communication is centralised here — components and hooks never construct HTTP requests directly.

Module	Purpose
`client.ts`	Central `get<T>`, `post<T>`, `put<T>`, `del<T>` functions with error handling and credentials
`endpoints.ts`	All API path constants in one place — no hard-coded URLs anywhere else
`auth.ts`	`login()`, `logout()`, `checkSession()`
`dashboard.ts`	`fetchStatus()`, `fetchRecentBans()`
`jails.ts`	`fetchJails()`, `fetchJailDetail()`, `startJail()`, `stopJail()`, `reloadJail()`
`bans.ts`	`banIp()`, `unbanIp()`, `unbanAll()`, `fetchBannedIps()`
`config.ts`	Socket-based config: `fetchJailConfigs()`, `updateJailConfig()`, `testRegex()`. File-based config: `fetchJailFiles()`, `fetchJailFile()`, `writeJailFile()`, `setJailFileEnabled()`, `fetchFilterFiles()`, `fetchFilterFile()`, `writeFilterFile()`, `fetchActionFiles()`, `fetchActionFile()`, `writeActionFile()`, `reloadConfig()`
`history.ts`	`fetchHistory()`, `fetchIpTimeline()`
`blocklist.ts`	`fetchSources()`, `addSource()`, `removeSource()`, `triggerImport()`, `fetchImportLog()`
`geo.ts`	`lookupIp()`
`server.ts`	`fetchServerSettings()`, `updateServerSettings()`

Types (`src/types/`)

Shared TypeScript interfaces and type aliases. Purely declarative — no runtime code. Grouped by domain. Any type used by two or more files lives here.

Providers (`src/providers/`)

React context providers for application-wide concerns.

Provider Ordering and Compile-Time Validation

Provider nesting is order-sensitive, and the order is enforced at compile time through TypeScript discriminated unions. The required order (outermost to innermost) is:

ThemeProvider — must be outermost; provides theme context to AppContents
FluentProvider — supplies Fluent UI theme and design tokens to all Fluent UI consumers
NotificationProvider — provides notification service; must wrap error boundaries
ErrorBoundary — catches catastrophic errors at the top level
BrowserRouter — enables client-side routing
NavigationCancellationProvider — manages route-aware request cancellation using useLocation()
AuthProvider — validates session on mount; must be inside BrowserRouter (uses useNavigate())
TimezoneProvider — fetches timezone after auth; wraps protected routes only

Compile-Time Validation:

A type-safe builder pattern (ProviderCompositionBuilder) in providerComposition.tsx enforces this order using TypeScript's discriminated unions. The builder prevents adding providers out of order at compile-time:

const tree = createProviderComposition()
  .withTheme({ children })
  .withFluent(theme)          // ✓ Must come after withTheme
  .withNotification()         // ✓ Must come after withFluent
  .withErrorBoundary()        // ✓ Correct order enforced
  .withBrowserRouter()
  .withNavigationCancellation()
  .withAuth()
  .build(routes);

Attempting to add providers out of order results in TypeScript errors (no runtime overhead).

Runtime Validation (Development):

A runtime validator (providerOrderValidator.tsx) provides fallback validation for development:

validateProviderPosition() — checks if a provider is correctly nested
validateProvidersExist() — ensures required providers are in the tree
hasProvider() — queries provider presence
useProviderValidation() — development-only hook that warns if required providers are missing

See src/providers/PROVIDER_ORDER.md for detailed dependency rationale.

Provider Reference:

Provider	Purpose
`AuthProvider`	Holds authentication state; exposes `isAuthenticated`, `login()`, and `logout()` via `useAuth()`. Synchronizes logout events across browser tabs in real-time using the BroadcastChannel API (with storage event fallback for older browsers). When a user logs out in any tab, all other open tabs immediately reflect the logout state without requiring a page refresh.
`TimezoneProvider`	Reads the configured IANA timezone from the backend and supplies it to all children via `useTimezone()`
`ThemeProvider`	Manages light/dark theme selection, supplies the active Fluent UI theme to `FluentProvider`
`NotificationProvider`	Provides notification service via `useNotification()` hook; must wrap error boundaries so they can display error notifications
`NavigationCancellationProvider`	Detects route changes and automatically aborts pending API requests; call `useNavigationAbortSignal()` to get an `AbortSignal` that lives for the current route

Theme (`src/theme/`)

Fluent UI custom theme definitions and design token constants. No component logic — only colours, spacing, and sizing values.

Utils (`src/utils/`)

Pure helper functions with no React or framework dependency. Date formatting, IP display formatting, shared constants, and cryptographic utilities.

Utility	Purpose
`formatDate.ts`	Date/time formatting with IANA timezone support
`formatIp.ts`	IP address display formatting
`crypto.ts`	`sha256Hex(input)` — SHA-256 digest via browser-native `SubtleCrypto` API; used to hash passwords before transmission
`constants.ts`	Frontend constants (time presets, etc.)

4. Data Flow

4.1 Request Lifecycle

Every user action follows this flow through the system:

User Action (click, form submit)
       │
       ▼
   Page / Component
       │  calls hook
       ▼
   Hook (useXxx)
       │  calls API function
       ▼
   API Layer (src/api/)
       │  HTTP request
       ▼
   FastAPI Router (app/routers/)
       │  validates input (Pydantic)
       │  calls Depends() for auth + services
       ▼
   Service (app/services/)
       │  enforces business rules
       │  calls repository or fail2ban client
       ▼
   Repository (app/repositories/)     or     fail2ban Client (app/utils/)
       │  executes SQL query                       │  sends socket command
       ▼                                           ▼
   SQLite Database                             fail2ban Server
       │                                           │
       └──────────── response bubbles back up ─────┘

4.2 Authentication Flow

┌─────────┐     POST /api/auth/login      ┌─────────────┐
│  Login   │ ─────────────────────────────▶│ auth router  │
│  Page    │     { password: "***" }       │              │
└─────────┘                                └──────┬───────┘
                                                  │
                                           ┌──────┴───────┐
                                           │ auth_service  │
                                           │ - verify hash │
                                           │ - create token│
                                           └──────┬───────┘
                                                  │
                                           ┌──────┴───────┐
                                           │ session_repo  │
                                           │ - store token │
                                           └──────┬───────┘
                                                  │
  Set-Cookie: session=<token>                     │
◀─────────────────────────────────────────────────┘

The master password is hashed and stored during setup.
On login, the submitted password is verified against the stored hash.
A session token is created, stored in the database, and returned as an HTTP-only cookie.
Every subsequent request is authenticated via the session cookie using a FastAPI dependency.
The AuthProvider on the frontend guards all routes except /setup and /login.
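The token handling in the steps above can be sketched with the standard library. This is an assumption-laden illustration rather than the code in `auth_service`: a dict stands in for the sessions table, the function names are hypothetical, and storing only a SHA-256 digest of the token follows the `sessions` table description in section 5.1.

```python
import hashlib
import secrets


def create_session(store: dict, ttl_seconds: int, now: float) -> str:
    """Create a random token, persist only its SHA-256 digest, return the raw token."""
    token = secrets.token_urlsafe(32)
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    store[digest] = now + ttl_seconds  # digest -> expiry timestamp
    return token  # this value goes into the HTTP-only cookie


def is_valid(store: dict, token: str, now: float) -> bool:
    """Hash the presented cookie value and look it up; expired entries are rejected."""
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    expires_at = store.get(digest)
    return expires_at is not None and now < expires_at
```

Since the store never contains the raw token, a leaked database row cannot be replayed as a cookie.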

4.3 fail2ban Communication

BanGUI communicates with fail2ban through its Unix domain socket using the fail2ban client-server protocol.

┌────────────────────┐          ┌──────────────────┐
│  ban_service.py    │          │  fail2ban server  │
│  jail_service.py   │──socket──│                   │
│  config_service.py │          │  /var/run/fail2ban│
│  health_service.py │          │  /fail2ban.sock   │
└────────────────────┘          └──────────────────┘

The fail2ban_client.py utility module wraps this communication:

Opens an async connection to the Unix socket
Serialises commands using the fail2ban protocol (pickle-based, see ./fail2ban-master/fail2ban/client/csocket.py)
Parses responses into typed Python objects
Handles connection errors gracefully (timeout, socket not found, permission denied)
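The framing described above (a pickled command followed by an end marker) can be sketched without opening a socket. The marker value is an assumption that should be checked against `CSPROTO.END` in the vendored `protocol.py`, the helper names are hypothetical, and the pickle protocol version used by the real client may differ.

```python
import pickle

# Assumed value of CSPROTO.END; confirm against ./fail2ban-master/fail2ban/protocol.py.
END = b"<F2B_END_COMMAND>"


def encode_command(command: list) -> bytes:
    """Pickle the command list and append the end-of-message marker."""
    return pickle.dumps(command) + END


def decode_frame(buffer: bytes):
    """Return (message, remaining bytes), or (None, buffer) if no full frame arrived yet."""
    payload, marker, rest = buffer.partition(END)
    if not marker:
        return None, buffer  # keep reading from the socket
    # Unpickling is only acceptable here because the peer is the local fail2ban server.
    return pickle.loads(payload), rest
```

A reader loop would append received bytes to a buffer and call `decode_frame` until it returns a message, which keeps partial reads from being parsed prematurely.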

Reference source: The vendored fail2ban source at ./fail2ban-master is included in the repository as an authoritative protocol reference. When implementing or debugging socket communication, consult:

File	What it documents
`./fail2ban-master/fail2ban/client/csocket.py`	`CSocket` class — low-level Unix socket connection, pickle serialisation, `CSPROTO.END` framing
`./fail2ban-master/fail2ban/client/fail2banclient.py`	`Fail2banClient` — command dispatch, argument handling, response beautification
`./fail2ban-master/fail2ban/client/beautifier.py`	Response parser — converts raw server replies into human-readable / structured output
`./fail2ban-master/fail2ban/protocol.py`	`CSPROTO` constants and the full list of supported commands with descriptions
`./fail2ban-master/fail2ban/client/configreader.py`	Config file parsing used by fail2ban — reference for understanding jail/filter structure

Key commands used:

Command	Purpose
`status`	Get global server status (number of jails, fail2ban version)
`status <jail>`	Get jail detail (banned IPs, failure count, filter info)
`set <jail> banip <ip>`	Ban an IP in a specific jail
`set <jail> unbanip <ip>`	Unban an IP from a specific jail
`set <jail> idle on/off`	Toggle jail idle mode
`start/stop <jail>`	Start or stop a jail
`reload <jail>`	Reload a single jail configuration
`reload`	Reload all jails
`get <jail> ...`	Read jail settings (findtime, bantime, maxretry, filter, actions, etc.)
`set <jail> ...`	Write jail settings
`set loglevel <level>`	Change server log level
`set logtarget <target>`	Change server log target
`set dbpurgeage <seconds>`	Set database purge age
`flushlogs`	Flush and re-open log files

4.4 fail2ban Database Access

In addition to the live socket, BanGUI reads the fail2ban SQLite database directly for historical data that the socket protocol does not expose (ban history, past log matches). This is read-only access.

history_service.py ──read-only──▶ fail2ban.db (SQLite)

The fail2ban database contains:

bans table — historical ban records (IP, jail, timestamp, ban data)
jails table — jail definitions
logs table — matched log lines per ban

BanGUI queries these tables to power the Ban History page and the per-IP timeline view.
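Read-only access can be enforced at the connection level with a SQLite `file:` URI. The sketch below uses the stdlib `sqlite3` module and a reduced stand-in schema; the column names and helper functions are assumptions for illustration, not the full fail2ban table definition or the project's repository code.

```python
import os
import sqlite3
import tempfile


def open_read_only(path: str) -> sqlite3.Connection:
    """Open an existing SQLite file in read-only mode through a file: URI."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def bans_for_ip(conn: sqlite3.Connection, ip: str) -> list:
    """Return (jail, timeofban) pairs for one IP, newest first."""
    rows = conn.execute(
        "SELECT jail, timeofban FROM bans WHERE ip = ? ORDER BY timeofban DESC", (ip,)
    )
    return rows.fetchall()


def demo_read_only():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fail2ban.db")
        seed = sqlite3.connect(path)
        # Simplified stand-in schema (assumed column names).
        seed.execute("CREATE TABLE bans (jail TEXT, ip TEXT, timeofban INTEGER)")
        seed.executemany(
            "INSERT INTO bans VALUES (?, ?, ?)",
            [("sshd", "203.0.113.7", 100), ("nginx", "203.0.113.7", 200)],
        )
        seed.commit()
        seed.close()

        conn = open_read_only(path)
        history = bans_for_ip(conn, "203.0.113.7")
        try:
            conn.execute("DELETE FROM bans")
            write_blocked = False
        except sqlite3.OperationalError:
            write_blocked = True  # SQLite refuses writes on a mode=ro connection
        conn.close()
        return history, write_blocked
```

With `mode=ro`, an accidental write fails at the database layer instead of relying on code review to keep the access read-only.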

4.5 External API Communication

geo_service.py ──aiohttp──▶ IP Geolocation API (country, ASN, RIR)
blocklist_service.py ──aiohttp──▶ Blocklist URLs (plain-text IP lists)

All external HTTP calls go through a shared aiohttp.ClientSession created during startup and closed during shutdown. External data is validated before use (IP format, response structure).
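Validation of a downloaded plain-text blocklist can be sketched with the stdlib `ipaddress` module. The `parse_blocklist` helper and its comment-stripping rule are hypothetical; the actual `blocklist_service` may accept a different line format.

```python
import ipaddress


def parse_blocklist(text: str):
    """Split a plain-text list into normalised valid entries and rejected lines."""
    valid = []
    rejected = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        try:
            # ip_network accepts single addresses and CIDR ranges alike.
            network = ipaddress.ip_network(line, strict=False)
        except ValueError:
            rejected.append(line)
            continue
        is_single = network.prefixlen == network.max_prefixlen
        valid.append(str(network.network_address) if is_single else str(network))
    return valid, rejected
```

Returning the rejected lines separately allows an import run to report skipped entries rather than failing the whole list.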

5. Database Design

BanGUI maintains its own SQLite database (separate from the fail2ban database) to store application state.

5.1 Application Database Tables

Table	Purpose
`settings`	Key-value store for application configuration (master password hash, fail2ban socket path, database path, timezone, session duration)
`sessions`	Active session token hashes with expiry timestamps. Tokens are stored as one-way SHA-256 hashes so that a leaked database cannot be used to hijack sessions.
`geo_cache`	Resolved IP geolocation results (ip, country_code, country_name, asn, org, cached_at, last_seen). Tracks the last time each IP address was referenced to enable retention policies. Entries older than 90 days are automatically purged by the `geo_cache_cleanup` task to prevent unbounded growth. Loaded into memory at startup via `load_cache_from_db()`; new entries are flushed back by the `geo_cache_flush` background task.
`blocklist_sources`	Registered blocklist URLs (id, name, url, enabled, created_at, updated_at)
`import_logs`	Record of every blocklist import run (id, source_id, timestamp, ips_imported, ips_skipped, errors, status)

5.2 Database Boundaries

Database	Owner	BanGUI Access
BanGUI application DB (`bangui.db`)	BanGUI	Read + Write
fail2ban DB (`fail2ban.db`)	fail2ban	Read-only (for history queries)

6. Setup & Configuration Persistence

6.1 Initial Setup Wizard & One-Time Configuration

The setup wizard (POST /api/setup) runs once during first-time startup to configure:

Master password (bcrypt-hashed)
Runtime database path (where BanGUI stores operational state)
fail2ban Unix socket path
IANA timezone
Session duration (in minutes)
Map color thresholds for geolocation visualization

Atomicity & Crash-Safety:

Setup is implemented with explicit transaction boundaries across two SQLite databases (bootstrap config DB and runtime app DB) to ensure atomicity:

Phase 1 (Bootstrap DB transaction): Set setup_state = "in_progress" and persist database_path. This commit is the first checkpoint: if the process crashes after it, the next setup attempt detects the "in_progress" state and cleans up.
Phase 2 (Filesystem + Runtime DB): Initialize runtime database schema outside a transaction (idempotent via CREATE TABLE IF NOT EXISTS).
Phase 3 (Runtime DB transaction): Batch-write all runtime settings (password hash, paths, config) atomically in a single BEGIN IMMEDIATE ... COMMIT transaction. Either all settings are persisted or none are.
Phase 4 (Bootstrap DB transaction): Set setup_state = "complete" and setup_completed = "1". This is the final commit point — only when this succeeds is setup considered complete.
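The all-or-nothing batch write in Phase 3 can be sketched as follows. This is a simplified illustration using the blocking `sqlite3` module; the function name and the `settings (key, value)` column layout are assumptions, and the connection is expected to be opened with `isolation_level=None` so that the transaction is controlled explicitly.

```python
import sqlite3


def write_settings_atomically(conn: sqlite3.Connection, settings: dict) -> None:
    """Persist every setting in one transaction, or none of them."""
    # BEGIN IMMEDIATE takes the write lock up front, so the batch cannot
    # fail halfway through because another writer got in first.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.executemany(
            "INSERT INTO settings (key, value) VALUES (?, ?) "
            "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
            list(settings.items()),
        )
        conn.execute("COMMIT")
    except Exception:
        # Any failure discards the rows written so far in this batch.
        conn.execute("ROLLBACK")
        raise
```

Because the upsert is idempotent, rerunning the batch after a crash in the "in_progress" state overwrites any earlier values instead of failing on duplicate keys.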

Password Hash Idempotency:

The bcrypt password hash is computed early (before any DB writes) to ensure that if setup is retried after a crash, the same hash is used throughout all retry attempts. This prevents divergent hashes due to bcrypt's random salt generation.

State Machine:

State	Meaning	Recovery
`null`	Setup not started	Normal flow: begin setup
`"in_progress"`	Bootstrap DB marked, runtime DB being initialized	Retry from beginning (runtime DB may be partial)
`"complete"`	All settings persisted, setup finished	Skip setup (already done)

If the next startup finds setup_state set to "in_progress", a previous setup attempt crashed partway through. Cleanup logic can then either retry directly or remove the partial runtime database before retrying.

Backward Compatibility:

The setup_completed = "1" key is still written for backward compatibility with cache detection. Modern code checks setup_state = "complete" for clearer semantics.

8. Authentication & Session Management

Single-user model — one master password, no usernames.
The password is hashed with bcrypt and stored in the application database during setup.
Sessions are token-based, stored server-side in the sessions table as one-way SHA256 hashes, and delivered to the browser as HTTP-only secure cookies.
Session token hashing — Session tokens are hashed before storage so that an exposed database file cannot be used to hijack sessions. Only the hash (token_hash) is stored in the database; the raw token is never persisted. When validating a session, the incoming token is hashed and the hash is looked up in the database. The database alone is therefore not sufficient to hijack a session: an attacker would also need the original token value.
Session expiry is configurable (set during setup, stored in settings).
The frontend AuthProvider checks session validity on mount and redirects to /login if invalid.
The backend dependencies.py provides an authenticated dependency that validates the session cookie on every protected endpoint.
Session validation cache (InMemorySessionCache in app.utils.session_cache) — validated session tokens are cached in memory for 10 seconds (configurable via session_cache_ttl_seconds) to avoid a SQLite round-trip on every request from the same browser. The cache is invalidated immediately on logout. ⚠️ This cache is process-local and not safe for multi-worker or distributed deployments. In single-worker mode (enforced by TASK-002), this is safe and improves performance. For multi-worker deployments, replace InMemorySessionCache with a shared backend (Redis, database, shared memory) implementing the SessionCache protocol. See app/utils/session_cache.py module docstring for implementation details.
GeoCache — GeoCache instance is created at startup with a configurable allow_http_fallback flag and stored on app.state.geo_cache. It implements a primary + fallback resolution strategy: (1) try local MaxMind GeoLite2-Country MMDB database (primary, encrypted, no network traffic), (2) if unavailable/no result and allowed, fall back to ip-api.com HTTP API (unencrypted, disabled by default for security). Encapsulates in-memory lookup cache, negative cache for unresolvable IPs (5-minute TTL), dirty set for persistence, and thread-safe async locking. Cache is loaded from the geo_cache SQLite table on startup. New resolutions are accumulated in memory and periodically flushed to the database by the geo_cache_flush background task. Stale entries are re-resolved by the geo_re_resolve task. Injected into routes and tasks via FastAPI's dependency system. See Backend-Development.md § IP Geolocation Resolution for setup and security details.
Runtime state (RuntimeState in app.utils.runtime_state) — stores mutable application state: server_status (fail2ban online/offline), last_activation (jail activation tracking), pending_recovery (crash detection), runtime_settings (effective configuration), and service-specific state holders like jail_service_state (JailServiceState for jail capability detection cache). RuntimeState fields are managed through dedicated functions (e.g., record_activation(), clear_pending_recovery()) and via dependency injection to services. Service-specific state (like JailServiceState) is nested within RuntimeState to keep all mutable state in one controlled location. ⚠️ RuntimeState is process-local and only safe when BanGUI runs as a single asyncio worker. Mutations must not span await points (cooperative scheduling within a single event loop is safe). In multi-worker deployments, each process has its own copy — logouts from worker A don't affect worker B's cache, health status updates are per-worker, and activation tracking is unreliable. BanGUI enforces single-worker mode (TASK-002) to prevent this issue. For future multi-worker support, replace RuntimeState with a shared coordination backend (Redis, shared memory, database). See app/utils/runtime_state.py module docstring for details.
Setup-completion flag — once is_setup_complete() returns True, the result is stored in app.state._setup_complete_cached. The SetupRedirectMiddleware skips the DB query on all subsequent requests, removing 1 SQL query per request for the common post-setup case. The completion flag is only written after the runtime database is successfully initialized and all initial setup settings are persisted, preventing a failed setup from permanently bypassing the setup wizard.
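The session token hashing scheme described above can be illustrated with a small sketch. The function names are hypothetical and a plain dict stands in for the `sessions` table; only the SHA-256 hashing of the raw token before storage and lookup reflects the text.

```python
import hashlib
import secrets


def hash_token(raw_token: str) -> str:
    """One-way SHA-256 hash of a session token, as stored in token_hash."""
    return hashlib.sha256(raw_token.encode("utf-8")).hexdigest()


def issue_session(store: dict, expires_at: float) -> str:
    """Create a session; only the hash is stored, the raw token goes to the cookie."""
    raw_token = secrets.token_urlsafe(32)
    store[hash_token(raw_token)] = expires_at
    return raw_token


def validate_session(store: dict, raw_token: str, now: float) -> bool:
    """Hash the incoming token, then look the hash up and check expiry."""
    expires_at = store.get(hash_token(raw_token))
    return expires_at is not None and now < expires_at
```

A fast unsalted hash is acceptable here because the token is a long random value rather than a human-chosen password, so it cannot be guessed from the stored hash.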

8.1 CSRF Protection

State-mutating endpoints (POST, PUT, DELETE, PATCH) that use cookie-based authentication are protected against Cross-Site Request Forgery (CSRF) attacks via a custom header check middleware.

Design:

For requests authenticated via the session cookie (not Bearer token), the CsrfMiddleware requires the custom header X-BanGUI-Request: 1 to be present.
The frontend API client automatically includes this header on all requests.
Cross-site fetch() calls cannot set custom headers without triggering a CORS preflight, which the backend rejects for non-allowed origins, providing defense in depth.
Safe HTTP methods (GET, HEAD, OPTIONS) bypass the check.
Bearer token authentication (via Authorization: Bearer header) bypasses the check because tokens are not CSRF-vulnerable (they are not automatically sent on cross-origin requests).
Requests missing the CSRF header receive a 403 Forbidden response with detail: "CSRF validation failed. Request rejected.".

This mechanism complements the existing SameSite=Lax cookie policy, which blocks traditional <form> POST requests but does not protect against JavaScript-initiated requests on a subdomain or same-origin XSS injection.
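The decision rules listed under Design can be condensed into one pure function. This is a sketch, not the actual CsrfMiddleware: the function name is hypothetical, header names are assumed to be lower-cased already, and letting through requests that carry no session cookie (so that the authentication dependency rejects them instead) is an assumption.

```python
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})
CSRF_HEADER = "x-bangui-request"


def csrf_allows(method: str, headers: dict, has_session_cookie: bool) -> bool:
    """Return True if the request may proceed, False if it should get a 403."""
    # Safe methods never mutate state and bypass the check.
    if method.upper() in SAFE_METHODS:
        return True
    # Bearer tokens are not attached automatically by browsers.
    if headers.get("authorization", "").lower().startswith("bearer "):
        return True
    # Assumption: without a session cookie there is nothing to forge.
    if not has_session_cookie:
        return True
    # Cookie-authenticated mutation: require the custom header.
    return headers.get(CSRF_HEADER) == "1"
```

For example, `csrf_allows("POST", {}, True)` is rejected, while the same call with `{"x-bangui-request": "1"}` passes.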

9. Scheduling

APScheduler 4.x (async mode) manages recurring background tasks.

┌──────────────────────┐
│     APScheduler      │
│  (async, in-process) │
├──────────────────────┤
│  blocklist_import    │  ── runs on configured schedule (default: daily 03:00)
│  geo_cache_cleanup   │  ── runs every 24 hours (nightly)
│  geo_cache_flush     │  ── runs every 60 seconds
│  health_check        │  ── runs every 30 seconds
└──────────────────────┘

The scheduler is started during the FastAPI lifespan startup and stopped during shutdown.
Job schedules are persisted in the application database so they survive restarts.
Users can modify the blocklist import schedule through the web interface.
A manual "Run Now" button triggers the blocklist import job outside the schedule.

10.1 Background Tasks and Database Access

APScheduler jobs run outside FastAPI request/response scope and therefore cannot rely on Depends(get_db).
Background tasks must open their own application database connection via app.db.open_db and close it when the work completes.
Use a shared task helper (app.tasks.db.task_db) so every task follows the same async context manager pattern and avoids connection leaks.
This pattern is intentional: task code is structurally separate from request-handling dependencies and should not attempt to reuse request-scoped DB connections.
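A helper in the spirit of `app.tasks.db.task_db` might look like the sketch below. The real signatures of `open_db` and `task_db` are not shown in this document, so both are stand-ins here, and the blocking `sqlite3` module is used only to keep the example self-contained; the actual backend presumably uses an async driver.

```python
import asyncio
import sqlite3
from contextlib import asynccontextmanager


async def open_db(path: str) -> sqlite3.Connection:
    """Stand-in for app.db.open_db (assumed to return a ready connection)."""
    return sqlite3.connect(path)


@asynccontextmanager
async def task_db(path: str):
    """Give a background task its own connection and always close it."""
    conn = await open_db(path)
    try:
        yield conn
    finally:
        # Runs on success and on error, so a failing job cannot leak it.
        conn.close()


async def example_job(path: str) -> int:
    """Hypothetical job body: everything that needs the DB stays inside the block."""
    async with task_db(path) as conn:
        return conn.execute("SELECT 1").fetchone()[0]
```

Each job then wraps its work in `async with task_db(...)`, which gives every task the same connection lifecycle without touching request-scoped dependencies.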

9. API Design

9.1 Conventions

All endpoints are grouped under /api/ prefix.
JSON request and response bodies, validated by Pydantic models.
Authentication via session cookie on all endpoints except /api/setup and /api/auth/login.
Setup-redirect middleware: while no configuration exists, all API endpoints (except /api/setup and /api/health) return 423 Locked with {"detail": "Setup not complete.", "setup_required": true}. This ensures API consumers can detect setup as a distinct condition rather than transparently following redirects.
Standard HTTP status codes: 200 success, 201 created, 204 no content, 400 bad request, 401 unauthorized, 403 forbidden (CSRF check failed), 404 not found, 422 validation error, 423 locked, 500 server error.
Error responses follow a consistent shape: { "detail": "Human-readable message" }.
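The setup-redirect rule above can be expressed as a small decision function. This is a sketch under assumptions: the name `setup_gate` is hypothetical, exempt paths are matched exactly, and the real SetupRedirectMiddleware may handle non-API paths differently.

```python
SETUP_EXEMPT_PATHS = ("/api/setup", "/api/health")


def setup_gate(path: str, setup_complete: bool):
    """Return None to let the request through, or (status, body) to short-circuit."""
    if setup_complete:
        return None
    # Only API routes are gated in this sketch; the frontend shell stays reachable.
    if not path.startswith("/api/"):
        return None
    if path in SETUP_EXEMPT_PATHS:
        return None
    return 423, {"detail": "Setup not complete.", "setup_required": True}
```

The body keeps the standard `detail` key, so clients that only understand the generic error shape still get a readable message, while setup-aware clients can branch on `setup_required`.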

9.2 Endpoint Groups

Group	Endpoints	Description
Auth	`POST /login`, `POST /logout`	Session management
Setup	`POST /setup`	First-run configuration
Dashboard	`GET /status`, `GET /bans`	Overview data for the main page
Jails	`GET /`, `GET /:name`, `POST /:name/start`, `POST /:name/stop`, `POST /:name/reload`, `POST /reload-all`	Jail listing and controls
Bans	`POST /ban`, `POST /unban`, `POST /unban-all`, `GET /banned`	Ban management
Config	`GET /`, `PUT /`, `POST /test-regex`	Configuration viewing and editing
History	`GET /`, `GET /ip/:ip`	Historical ban browsing
Blocklists	`GET /sources`, `POST /sources`, `DELETE /sources/:id`, `POST /import`, `GET /import-log`	Blocklist management
Geo	`GET /lookup/:ip`	IP geolocation and enrichment
Server	`GET /settings`, `PUT /settings`, `POST /flush-logs`	Server-level settings

9. Deployment Architecture

┌──────────────────────────────────────────────────┐
│                   Host Machine                   │
│                                                  │
│  ┌─────────────────────────────────────────────┐ │
│  │  Reverse Proxy (nginx / caddy)              │ │
│  │  - TLS termination                          │ │
│  │  - /api/* → backend (uvicorn)               │ │
│  │  - /*     → frontend (static files)         │ │
│  └──────────────┬───────────────┬──────────────┘ │
│                 │               │                 │
│  ┌──────────────┴───┐  ┌───────┴──────────────┐  │
│  │ Backend           │  │ Frontend             │  │
│  │ uvicorn + FastAPI │  │ Static build (Vite)  │  │
│  │ (port 8000)       │  │ (served by proxy)    │  │
│  └────────┬──────────┘  └──────────────────────┘  │
│           │                                       │
│  ┌────────┴──────────────────────────────────┐    │
│  │  fail2ban (systemd service)               │    │
│  │  Socket: /var/run/fail2ban/fail2ban.sock  │    │
│  │  Database: /var/lib/fail2ban/fail2ban.db  │    │
│  └───────────────────────────────────────────┘    │
└──────────────────────────────────────────────────┘

The backend runs as an ASGI server (uvicorn) behind a reverse proxy.
The frontend is built to static files by Vite and served directly by the reverse proxy.
The backend process needs read access to the fail2ban socket and the fail2ban database.
Both the application database and the fail2ban database reside on the same host.

10.2 nginx Routing Rules

The reverse proxy (nginx) must route requests correctly to prevent frontend SPA fallback rules from hiding backend 404 errors. The following location blocks ensure proper behavior:

Location Block Priority

nginx selects the location block that handles a request in the following priority order:

Exact matches (location =) — highest priority
Regular expression matches (location ~) — second priority
Prefix matches (location /prefix) — matched in order of specificity (longest first)
Catch-all (location /) — lowest priority

Routing Configuration

Location Block	Rule	Purpose
`location /api/`	`proxy_pass http://backend:8000;` — no `try_files`	Proxy all API requests to FastAPI backend. Any unmatched API route (typos, invalid paths) returns 404 from the backend.
`location /assets/`	`try_files $uri =404;`	Serve static assets with long-term caching. Return 404 if file doesn't exist.
`location /`	`try_files $uri $uri/ /index.html;`	SPA fallback: serve `index.html` for all unmatched routes (client-side routing).

Routing Behavior

Request → /api/some-endpoint
    ↓
    nginx matches location /api/ (longest prefix)
    ↓
    proxy_pass → backend:8000
    ↓
    Backend returns 404 if endpoint doesn't exist (✓ correct)
    Client sees 404, not SPA HTML

Request → /some-page
    ↓
    nginx matches location / (catch-all)
    ↓
    try_files looks for file, then directory, then /index.html
    ↓
    Serves /index.html (React Router handles client-side routing)
    ↓
    Client sees 200 with HTML (✓ correct for SPA)

Request → /api/typos
    ↓
    nginx matches location /api/ (longest prefix, NOT catch-all)
    ↓
    proxy_pass → backend:8000
    ↓
    FastAPI returns 404 (✓ correct, not caught by SPA fallback)

Critical Implementation Notes

Never add try_files to the /api/ location block — this would hide backend 404s.
The /api/ location takes precedence over the / catch-all regardless of where it appears in the config, because nginx selects the longest matching prefix.
No inherited try_files rules — the /api/ location has no global try_files that could affect it.
Backend 404 responses pass through nginx unchanged — nginx does not rewrite 404 responses from the backend.
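To make the prefix-matching behavior concrete, the following sketch models only the prefix part of nginx location selection in Python. It deliberately ignores exact (`=`) and regular-expression (`~`) locations, and the function name is hypothetical; it is a mental model, not a replacement for testing the real config.

```python
def select_location(path: str, prefixes: list[str]):
    """Pick the longest configured prefix that matches the request path."""
    matches = [prefix for prefix in prefixes if path.startswith(prefix)]
    # The order of the list does not matter: only the prefix length decides.
    return max(matches, key=len, default=None)


# The three prefix locations from the routing table above.
LOCATIONS = ["/", "/assets/", "/api/"]
```

With these locations, `/api/typos` resolves to `/api/` and is proxied to the backend, while `/some-page` matches only `/` and falls through to the SPA fallback.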

9.2a nginx Security Headers

nginx adds the following OWASP-recommended security headers to all responses:

Header	Value	Purpose
Content-Security-Policy	`default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data:; font-src 'self'; connect-src 'self'; frame-ancestors 'none';`	Prevents XSS attacks by restricting script execution to same-origin. `style-src 'unsafe-inline'` is required for Fluent UI v9's inline styles.
X-Frame-Options	`DENY`	Prevents clickjacking by disallowing iframe embedding.
X-Content-Type-Options	`nosniff`	Prevents MIME-sniffing; browsers must respect the declared Content-Type.
Referrer-Policy	`no-referrer`	Prevents leaking internal URLs in the `Referer` header to third-party resources.
Permissions-Policy	`geolocation=(), microphone=(), camera=()`	Disables access to browser APIs not needed by the application.
Strict-Transport-Security	Commented out	Must only be enabled after HTTPS is fully configured. Uncomment when TLS termination is production-ready.

All headers use the always directive, ensuring they are included in error responses (4xx, 5xx) as well.

CSP and Fluent UI

Fluent UI v9 applies styles via inline style attributes on DOM elements. To support this, style-src 'unsafe-inline' is required. A stricter CSP using nonces would require server-side rendering of the HTML shell, which is outside the current architecture.

9.3 Deployment Constraints

Single-Executor Scheduler Requirement

BanGUI's background scheduler must run with exactly one executor process.

The application uses APScheduler's AsyncIOScheduler, which is bound to a single asyncio event loop and cannot be safely shared across multiple worker processes. If the app is deployed with --workers N (where N > 1), the following failures occur:

Each worker process creates its own independent scheduler instance.
All background jobs execute N times simultaneously (once per worker).
Results:
- Duplicate blocklist imports — the same IP ranges are banned N times.
- Duplicate history entries — the same historical events are recorded N times.
- Duplicate ban operations — bans are executed multiple times, with potential state conflicts.
- SQLite lock contention — concurrent writes to the same database from N workers cause lock timeouts.

Enforcement Mechanism

BanGUI enforces single-executor safety through a database-backed lock that works reliably in container orchestration environments:

Fast check (env var): On startup, the BANGUI_WORKERS environment variable is checked (if set). If explicitly set to a value > 1, startup fails immediately with a clear error.
Authoritative check (database lock): During startup, BanGUI acquires an atomic database lock in the scheduler_lock table. This lock:
- Uses a singleton row (id=1) to prevent race conditions across simultaneously starting instances
- Stores the PID, hostname, creation timestamp, and heartbeat timestamp of the lock holder
- Is considered stale if the heartbeat hasn't been updated for 60 seconds
- Is automatically cleaned up on stale instance detection, allowing failover in rolling deployments
Lock acquisition (startup):
- Clean up any stale locks (heartbeat older than 60 seconds)
- Attempt to insert a new lock row with this instance's PID and hostname
- If the INSERT fails (row already exists), reject startup with a clear error
- If the INSERT succeeds, this instance holds the lock and will start the scheduler
Lock maintenance (runtime): A periodic background task (scheduler_lock_heartbeat) updates the lock's heartbeat timestamp every 10 seconds, keeping it alive and preventing false positives from temporary load spikes.
Lock release (shutdown): On graceful shutdown, the lock is released, allowing other instances to acquire it.
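The acquisition and heartbeat steps above can be sketched with plain `sqlite3`. The column names (other than `heartbeat_at`, which appears in the troubleshooting query), the function names, and the use of Unix timestamps are assumptions for illustration; the connection is expected to be opened with `isolation_level=None`.

```python
import os
import socket
import sqlite3
import time

STALE_AFTER_SECONDS = 60

SCHEMA = """
CREATE TABLE IF NOT EXISTS scheduler_lock (
    id INTEGER PRIMARY KEY CHECK (id = 1),  -- singleton row
    pid INTEGER NOT NULL,
    hostname TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    heartbeat_at INTEGER NOT NULL
)
"""


def try_acquire_lock(conn: sqlite3.Connection, now=None) -> bool:
    """Remove a stale lock, then try to insert the singleton row."""
    now = int(time.time()) if now is None else now
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "DELETE FROM scheduler_lock WHERE ? - heartbeat_at > ?",
            (now, STALE_AFTER_SECONDS),
        )
        conn.execute(
            "INSERT INTO scheduler_lock (id, pid, hostname, created_at, heartbeat_at) "
            "VALUES (1, ?, ?, ?, ?)",
            (os.getpid(), socket.gethostname(), now, now),
        )
    except sqlite3.IntegrityError:
        # The row already exists and is still fresh: another instance holds it.
        conn.execute("ROLLBACK")
        return False
    conn.execute("COMMIT")
    return True


def heartbeat(conn: sqlite3.Connection, now=None) -> None:
    """Refresh the heartbeat so the lock is not considered stale."""
    now = int(time.time()) if now is None else now
    conn.execute(
        "UPDATE scheduler_lock SET heartbeat_at = ? WHERE id = 1 AND pid = ?",
        (now, os.getpid()),
    )
```

Running the stale cleanup and the insert inside one `BEGIN IMMEDIATE` transaction is what closes the race window between two instances that start at the same moment.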

Why database-backed instead of filesystem?

Database-backed locking is more reliable in container orchestration because:

Atomicity: SQLite transactions are atomic — no race condition window between checking and inserting
Container-safe: Works across containers with shared database volumes (no NFS/SMB edge cases)
Stale detection: Heartbeat-based TTL is simpler and more reliable than PID-based checks (PID reuse is common in containers)
No false positives: Timestamp-based expiration eliminates issues with PID reuse

Startup Sequence with Scheduler Lock

1. DATABASE stage
   └─ Initialize SQLite schema (including scheduler_lock table)

2. WORKER_MODE stage (formerly first, now depends on DATABASE)
   ├─ Fast check: Verify BANGUI_WORKERS env var if explicitly set
   └─ Authoritative check: Acquire scheduler lock in database
      → If lock held by another instance: Fail with clear error
      → If lock acquired: Continue to GEO_CACHE stage

3. (rest of startup continues as normal)

Troubleshooting

Problem: Startup fails with "Could not acquire scheduler lock"

Solution:

Verify no other BanGUI instances are running
Inspect the lock: sqlite3 bangui.db "SELECT * FROM scheduler_lock;"
Check who holds the lock (hostname, PID, heartbeat time)

If stale (heartbeat older than 60 seconds), clean it:

sqlite3 bangui.db "DELETE FROM scheduler_lock WHERE (strftime('%s', 'now') - heartbeat_at) > 60;"

Retry the failed instance

Problem: Stale lock after instance crash

BanGUI handles this automatically:

The next instance to start will detect the stale lock (heartbeat older than 60 seconds)
It will clean it up and acquire the lock
The new instance starts the scheduler as normal

No manual intervention is required.

Environment Variables

BANGUI_WORKERS (optional, default: unset)
- If set to 1 or unset: Normal operation (the fast check passes; the database lock still ensures that only one instance runs the scheduler)
- If set to > 1: Startup fails immediately with an error (fast check)
- Reason: Legacy env var for explicitly forbidding multi-worker deployments

Container Orchestration Examples

Docker Compose:

Single service instance (no scaling) — scheduler runs normally

Kubernetes:

Single Pod replica — scheduler runs normally
Multiple Pod replicas (during rolling update) — old Pod releases lock on shutdown, new Pod acquires it
- No duplicate jobs, no startup failures
- Health check should allow 30-60 seconds for lock handoff

systemd / process manager:

Single process — scheduler runs normally
Accidental multi-process restart — lock prevents duplicate jobs, other processes fail to start scheduler

Future Multi-Worker Support

To safely support multiple workers in the future:

External job store: Move APScheduler from in-memory to a persistent store (e.g., SQLAlchemy-backed job store with PostgreSQL or Redis).
Distributed locking: Use a distributed lock (Redis, etcd) instead of database lock for better performance.
Process coordination: Add a coordination mechanism between worker processes so that the scheduler runs on exactly one designated worker.

Currently, the single-executor approach is simple, maintainable, and sufficient for BanGUI's operational requirements. The database lock provides reliable enforcement across all deployment scenarios.

10. Observability & Distributed Tracing

BanGUI implements distributed tracing via correlation IDs to correlate errors and requests across frontend and backend systems.

Architecture

┌─────────────────────────────────────────────────────────────┐
│ Frontend (React + TypeScript)                               │
├─────────────────────────────────────────────────────────────┤
│ • API Client generates session-scoped UUID4 (correlation ID)│
│ • Telemetry service records structured events               │
│ • Error boundaries catch render errors                      │
│ • All telemetry events include correlation ID for tracing   │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ├─ Every request includes
                     │  X-Correlation-ID header
                     │
┌────────────────────┴────────────────────────────────────────┐
│ Backend (Python + FastAPI + structlog)                      │
├─────────────────────────────────────────────────────────────┤
│ • CorrelationIdMiddleware extracts/generates correlation ID │
│ • All logs automatically include correlation ID              │
│ • Error responses include correlation_id field              │
│ • structlog outputs JSON with correlation ID in all events  │
└─────────────────────────────────────────────────────────────┘

Correlation ID Flow

Frontend → Backend:
- API client generates/retrieves session-scoped UUID4
- UUID4 sent in X-Correlation-ID request header
- All requests use same session UUID (set once, reused)
Backend Processing:
- CorrelationIdMiddleware extracts/generates correlation ID
- ID stored in structlog contextvars
- All structured log entries include correlation ID automatically
- Error responses include correlation_id field in JSON
Backend → Frontend:
- Response includes X-Correlation-ID header
- Error responses include correlation_id in response body
- Frontend error handlers extract correlation ID
Frontend Error Logging:
- Error handlers extract correlation ID from API response
- Telemetry service logs error with correlation ID
- Browser console and telemetry backends receive linked events
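The backend half of this flow can be sketched with the standard library alone. This is not the code in `app/middleware/correlation.py`: it swaps in `contextvars` plus a stdlib `logging.Filter` as a stand-in for the contextvars-based logging integration, the names are hypothetical, and header keys are assumed to be lower-cased already.

```python
import contextvars
import logging
import uuid

# Holds the correlation ID for the request currently being handled.
correlation_id_var = contextvars.ContextVar("correlation_id", default=None)


def resolve_correlation_id(headers: dict) -> str:
    """Reuse the client's X-Correlation-ID if present, otherwise generate a UUID4."""
    correlation_id = headers.get("x-correlation-id") or str(uuid.uuid4())
    correlation_id_var.set(correlation_id)
    return correlation_id


class CorrelationIdFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id_var.get()
        return True
```

Because a `ContextVar` is scoped to the running asyncio task, concurrent requests each see their own ID without passing it through every function call.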

Example: Correlating an Error Across Systems

Scenario: User clicks "Ban IP" button → API returns 500 error → error logged and displayed

Frontend telemetry event:

{
  "event": "api_error",
  "severity": "error",
  "message": "Server error banning IP",
  "correlation_id": "550e8400-e29b-41d4-a716-446655440000",
  "context": {
    "status": 500,
    "endpoint": "/api/bans"
  },
  "timestamp": "2025-04-30T18:30:00.000Z"
}

Backend structured log:

{
  "event": "ban_service_error",
  "severity": "error",
  "message": "Failed to ban IP",
  "correlation_id": "550e8400-e29b-41d4-a716-446655440000",
  "context": {
    "ip": "192.168.1.1",
    "jail": "sshd",
    "error": "fail2ban socket error"
  },
  "timestamp": "2025-04-30T18:30:00.000Z"
}

Troubleshooting: An engineer searches the logs for correlation ID 550e8400-e29b-41d4-a716-446655440000 and finds all related events (request received, jail lookup, fail2ban call, error response) in order.

Implementation Details

Backend:

Middleware: app/middleware/correlation.py
- Generates UUID4 if X-Correlation-ID header missing
- Stores in structlog contextvars for automatic inclusion in all logs
- Adds correlation ID to response header and error responses
All error handlers include correlation_id in ErrorResponse
See backend/app/models/response.py for ErrorResponse.correlation_id field

Frontend:

API client: frontend/src/api/client.ts
- Generates session-scoped UUID4 on first use
- Includes in X-Correlation-ID header for all requests
- Extracts from response headers and stores in ApiError
Telemetry service: frontend/src/utils/telemetry.ts
- Structured event logging with correlation ID support
- Redaction utilities for privacy/security
- Handlers for custom backends (console logger by default)
Error handlers: frontend/src/utils/fetchError.ts
- Extract correlation ID from API errors
- Log with telemetry for distributed tracing
Error boundaries: frontend/src/components/{Error,Page,Section}ErrorBoundary.tsx
- Catch render-time exceptions
- Log with telemetry for observability
- Note: ErrorBoundary.componentDidCatch() accesses errorInfo.componentStack which is not part of the public React.ErrorInfo type definition. This is a React DevTools implementation detail accessed via type casting (as any). It captures the React component hierarchy for debugging but may change in future React versions. See React issue #3623 for context.

Privacy & Security

No sensitive data logged:
- Passwords, tokens, session IDs never logged
- PII (names, emails, IPs) logged only with explicit intent and redaction
- Redaction utilities: telemetry.redact(), telemetry.redactObject()
Backend: Correlation IDs use opaque UUID4 (no user data embedded)
Frontend: Same session UUID for all requests (safe to expose in logs)

Future Enhancements

Backend error telemetry aggregation:
- Send structured logs to observability platform (DataDog, Grafana Loki, etc.)
- Query by correlation ID to trace entire request flow
Frontend error reporting:
- Send frontend telemetry to backend /api/telemetry endpoint
- Store alongside backend logs for unified view
Metrics & dashboards:
- Error rates by endpoint, severity, error type
- Latency percentiles and distribution
- Request success/failure trends

11. Design Principles

These principles govern all architectural decisions in BanGUI.

Principle	Application
Separation of Concerns	Frontend and backend are independent. Backend layers (router → service → repository) never mix responsibilities.
Service Independence	Services must not import other services at the same layer (e.g., `jail_config_service` must not import `jail_service`). Shared logic belongs in the utils layer (`app/utils/`). This prevents circular dependencies, improves testability, and keeps each service focused on its domain.
Single Responsibility	Each module, service, and component has one well-defined job.
Dependency Inversion	Services depend on abstractions (protocols), not concrete implementations. FastAPI `Depends()` wires everything.
Async Everything	All I/O is non-blocking. No synchronous database, HTTP, or socket calls anywhere in the backend.
Validate at the Boundary	Pydantic models validate all data entering the backend. TypeScript types enforce structure on the frontend.
Fail Fast	Configuration is validated at startup. Invalid input is rejected immediately with clear errors.
Composition over Inheritance	Small, focused objects are composed together rather than building deep class hierarchies.
DRY	Shared logic lives in utils, hooks, or base services — never duplicated across modules.
KISS	The simplest correct solution wins. No premature abstractions or over-engineering.
YAGNI	Only build what is needed now. Extend when a real requirement appears.
