BanGUI/Docs/Architekture.md
Lukas 1302ac821f Fix non-atomic setup persistence across DB contexts (Issue #30)
Implement transactional setup with explicit state machine and crash-safety
to prevent partial commits from leaving inconsistent state.

## Changes

### Core Implementation
1. **settings_repo.py**: Add atomic batch settings write
   - New set_settings_batch() method: writes multiple settings in single
     transaction (BEGIN IMMEDIATE ... COMMIT). Either all settings persist
     or none do, preventing partial state if crash occurs mid-batch.

2. **setup_service.py**: Refactor run_setup() with transactional phases
   - Phase 0: Compute password hash early (before any DB writes) to ensure
     idempotency. Same hash is used throughout retries, preventing divergent
     hashes from bcrypt's random salt.
   - Phase 1 (Bootstrap DB transaction): Set setup_state=in_progress and
     database_path, then commit. First checkpoint for crash detection.
   - Phase 2 (Filesystem): Initialize runtime database (idempotent)
   - Phase 3 (Runtime DB transaction): Batch-write all settings atomically
   - Phase 4 (Bootstrap DB transaction): Set setup_state=complete and
     setup_completed=1. Final commit point.

3. **protocols.py**: Add set_settings_batch to SettingsRepository protocol

### Testing
- Added 6 new transactionality tests covering:
  - State machine transitions (None → in_progress → complete)
  - Password hash idempotency across retries
  - Atomic batch writes (all-or-nothing persistence)
  - Bootstrap DB state tracking
  - Database path propagation to both DBs
  - Recovery on partial failure
- All 18 tests pass (12 existing + 6 new)

### Documentation
- Updated Docs/Architekture.md with new section 6:
  - Setup state machine with state transitions
  - Transaction boundary documentation
  - Password hash idempotency rationale
  - Backward compatibility notes

## Design Decisions

### Why This Approach
- Current code already idempotent via INSERT OR REPLACE, but password
  hash non-idempotency created silent inconsistency risk
- Simpler than multi-state machine: 2 states sufficient for detection
- Maintains backward compatibility (setup_completed key still written)
- Explicit transactions make crash-safety obvious to future maintainers

### Crash Scenarios Now Handled
1. Crash after Phase 1 → detected by setup_state=in_progress on retry
2. Crash after Phase 2 → runtime DB may be partial, safe to retry
3. Crash after Phase 3 → runtime DB rolls back on next connection
4. Crash after Phase 4 → setup_completed detected, skipped

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 19:19:53 +02:00


BanGUI — Architecture

This document describes the system architecture of BanGUI, a web application for monitoring, managing, and configuring fail2ban. It defines every major component, module, and data flow so that any developer can understand how the pieces fit together before writing code.


1. High-Level Overview

BanGUI is a two-tier web application with a clear separation between frontend and backend, connected through a RESTful JSON API.

┌──────────────────────────────────────────────────────────────────┐
│                          Browser                                 │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                   Frontend (React + Fluent UI)             │  │
│  │  TypeScript · Vite · Single-Page Application               │  │
│  └──────────────────────────┬─────────────────────────────────┘  │
└─────────────────────────────┼────────────────────────────────────┘
                              │  HTTP / JSON (REST API)
┌─────────────────────────────┼────────────────────────────────────┐
│                          Server                                  │
│  ┌──────────────────────────┴─────────────────────────────────┐  │
│  │                   Backend (FastAPI)                        │  │
│  │  Python 3.12+ · Async · Pydantic v2 · structlog            │  │
│  └─────┬──────────────┬──────────────┬────────────────────────┘  │
│        │              │              │                           │
│  ┌─────┴─────┐  ┌─────┴─────┐  ┌────┴─────┐                      │
│  │  SQLite   │  │ fail2ban  │  │ External │                      │
│  │  (App DB) │  │  (Socket) │  │   APIs   │                      │
│  └───────────┘  └───────────┘  └──────────┘                      │
└──────────────────────────────────────────────────────────────────┘

Component Summary

| Component | Technology | Purpose |
| --- | --- | --- |
| Frontend | TypeScript, React, Fluent UI v9, Vite | User interface — displays data, captures user input, communicates with the backend API |
| Backend | Python 3.12+, FastAPI, Pydantic v2, aiosqlite | Business logic, data persistence, fail2ban communication, scheduling |
| Application Database | SQLite (via aiosqlite) | Stores BanGUI's own data: configuration, session state, blocklist sources, import logs |
| fail2ban | Unix domain socket | The monitored service — BanGUI reads status, issues commands, and reads the fail2ban database |
| MaxMind GeoLite2 | Offline MMDB file (mounted into container) | IP geolocation (primary resolver) — local, encrypted |
| External APIs | HTTP (via aiohttp) | Blocklist downloads; IP geolocation fallback (only if MMDB unavailable and HTTP fallback enabled) |

2. Backend Architecture

The backend follows a layered architecture with strict separation of concerns. Dependencies flow inward: routers depend on services, services depend on repositories — never the reverse.

                ┌─────────────────────────────────┐
                │        FastAPI Application       │
                │          (main.py)               │
                └──────────┬───────────────────────┘
                           │
          ┌────────────────┼────────────────┐
          │                │                │
    ┌─────┴──────┐  ┌─────┴──────┐  ┌──────┴──────┐
    │  Routers   │  │   Tasks    │  │   Config    │
    │  (HTTP)    │  │ (Scheduled)│  │ (Settings)  │
    └─────┬──────┘  └─────┬──────┘  └─────────────┘
          │               │
    ┌─────┴───────────────┴──────┐
    │         Services           │
    │     (Business Logic)       │
    └─────┬──────────────┬───────┘
          │              │
    ┌─────┴──────┐ ┌─────┴──────┐
    │Repositories│ │  External  │
    │ (Database) │ │  Clients   │
    └─────┬──────┘ └─────┬──────┘
          │              │
    ┌─────┴──────┐ ┌─────┴──────┐
    │  SQLite    │ │fail2ban /  │
    │            │ │HTTP APIs   │
    └────────────┘ └────────────┘

2.1 Project Structure

backend/
├── app/
│   ├── __init__.py
│   ├── main.py                # FastAPI app factory, lifespan, exception handlers
│   ├── config.py              # Pydantic settings (env vars, .env loading)
│   ├── db.py                  # Database connection and initialization
│   ├── exceptions.py          # Shared domain exception classes; all services and routers import from here
│   ├── dependencies.py        # FastAPI Depends() providers (DB, services, auth)
│   ├── models/                # Pydantic schemas
│   │   ├── auth.py            #   Login request/response, session models
│   │   ├── ban.py             #   Ban request/response/domain models
│   │   ├── jail.py            #   Jail request/response/domain models
│   │   ├── config.py          #   Configuration view/edit models
│   │   ├── blocklist.py       #   Blocklist source/import models
│   │   ├── history.py         #   Ban history models
│   │   ├── server.py          #   Server status, health check models
│   │   └── setup.py           #   Setup wizard models
│   ├── routers/               # FastAPI routers (HTTP layer only)
│   │   ├── auth.py            #   POST /api/auth/login, POST /api/auth/logout
│   │   ├── setup.py           #   POST /api/setup (first-run configuration)
│   │   ├── dashboard.py       #   GET /api/dashboard/status, GET /api/dashboard/bans
│   │   ├── jails.py           #   CRUD + controls for jails
│   │   ├── bans.py            #   Ban/unban actions, currently banned list
│   │   ├── config.py          #   View/edit fail2ban configuration
│   │   ├── history.py         #   Historical ban queries
│   │   ├── blocklist.py       #   Blocklist source management, manual import trigger
│   │   ├── geo.py             #   IP geolocation and lookup
│   │   └── server.py          #   Server settings (log level, DB purge, etc.)
│   ├── services/              # Business logic (one service per domain)
│   │   ├── auth_service.py    #   Password verification, session creation/validation
│   │   ├── setup_service.py   #   First-run setup logic, configuration persistence
│   │   ├── jail_service.py    #   Jail listing, start/stop/reload, status aggregation
│   │   ├── ban_service.py     #   Ban/unban execution, currently-banned queries
│   │   ├── config_service.py  #   Read/write fail2ban config, regex validation
│   │   ├── config_file_service.py #   Shared config parsing and file-level operations
│   │   ├── raw_config_io_service.py #   Raw config file I/O wrapper
│   │   ├── jail_config_service.py #   jail config activation/deactivation logic
│   │   ├── filter_config_service.py #   filter config lifecycle management
│   │   ├── action_config_service.py #   action config lifecycle management
│   │   ├── log_service.py     #   Log preview and regex test operations
│   │   ├── fail2ban_metadata_service.py #   Resolve and cache the fail2ban SQLite DB path via the fail2ban socket
│   │   ├── history_service.py #   Historical ban queries, per-IP timeline
│   │   ├── blocklist_service.py # Orchestration: source CRUD, scheduling, import triggers
│   │   ├── blocklist_downloader.py #   HTTP download with retry logic
│   │   ├── blocklist_parser.py #   Parse and validate IP addresses
│   │   ├── blocklist_ban_executor.py #   Ban execution with error handling
│   │   ├── blocklist_import_workflow.py #   Import orchestration (coordinates components)
│   │   ├── geo_service.py     #   IP-to-country resolution, ASN/RIR lookup
│   │   ├── server_service.py  #   Server settings, log management, DB purge
│   │   └── health_service.py  #   fail2ban connectivity checks, version detection
│   ├── repositories/          # Data access layer (raw queries only)
│   │   ├── settings_repo.py   #   App configuration CRUD in SQLite
│   │   ├── session_repo.py    #   Session storage and lookup
│   │   ├── blocklist_repo.py  #   Blocklist sources and import log persistence
│   │   ├── fail2ban_db_repo.py #   fail2ban SQLite ban history read operations
│   │   ├── geo_cache_repo.py  #   IP geolocation cache persistence
│   │   └── import_log_repo.py #   Import run history records
│   ├── tasks/                 # APScheduler background jobs
│   │   ├── blocklist_import.py #   Scheduled blocklist download and application
│   │   ├── geo_cache_flush.py #   Periodic geo cache persistence (dirty-set flush to SQLite)
│   │   ├── geo_cache_cleanup.py #   Periodic purge of stale geo cache entries
│   │   ├── geo_re_resolve.py  #   Periodic re-resolution of stale geo cache records
│   │   └── health_check.py    #   Periodic fail2ban connectivity probe
│   └── utils/                 # Helpers, constants, shared types
│       ├── fail2ban_client.py #   Async wrapper around the fail2ban socket protocol
│       ├── fail2ban_response.py #   Canonical response parsing: ok(), to_dict(), ensure_list(), is_not_found_error()
│       ├── fail2ban_db_utils.py #   fail2ban database query helpers
│       ├── ip_utils.py        #   IP/CIDR validation and normalisation
│       ├── time_utils.py      #   Timezone-aware datetime helpers
│       ├── config_file_utils.py #   fail2ban config file I/O
│       ├── conffile_parser.py #   fail2ban config file parser/serializer
│       ├── config_parser.py   #   Structured config object parser
│       ├── config_writer.py   #   Atomic config file write operations
│       ├── jail_config.py     #   Jail config helper
│       └── constants.py       #   Shared constants (default paths, limits, etc.)
├── tests/
│   ├── conftest.py            # Shared fixtures (test app, client, mock DB)
│   ├── test_routers/          # One test file per router
│   ├── test_services/         # One test file per service
│   └── test_repositories/     # One test file per repository
├── pyproject.toml
└── .env.example

2.2 Module Purposes

Routers (app/routers/)

The HTTP interface layer. Each router maps URL paths to handler functions. Routers parse and validate incoming requests using Pydantic models, delegate all logic to services, and return typed responses. They contain zero business logic.

| Router | Prefix | Purpose |
| --- | --- | --- |
| auth.py | /api/auth | Login (password check), logout, session validation |
| setup.py | /api/setup | First-run wizard — save initial configuration |
| dashboard.py | /api/dashboard | Server status bar data, recent bans for the dashboard |
| jails.py | /api/jails | List jails, jail detail, start/stop/reload/idle controls |
| bans.py | /api/bans | Ban an IP, unban an IP, unban all, list currently banned IPs |
| config.py | /api/config | Read and write fail2ban jail/filter/server configuration via the socket; also serves the fail2ban log tail and service status for the Log tab |
| file_config.py | /api/config | Read and write fail2ban config files on disk (jail.d/, filter.d/, action.d/) — list, get, and overwrite raw file contents, toggle jail enabled/disabled |
| history.py | /api/history | Query historical bans, per-IP timeline |
| blocklist.py | /api/blocklists | CRUD blocklist sources, trigger import, view import logs |
| geo.py | /api/geo | IP geolocation lookup, ASN and RIR data |
| server.py | /api/server | Log level, log target, DB path, purge age, flush logs |
| health.py | /api/health | fail2ban connectivity health check and status |

Services (app/services/)

The business logic layer. Services orchestrate operations, enforce rules, and coordinate between repositories, the fail2ban client, and external APIs. Each service covers a single domain.

Service Layer Responsibilities:

Services must be independent of HTTP concerns. They work with domain models (DTOs), not response models. This ensures:

  • Domain logic can evolve without affecting API shape
  • Services are reusable across different frontends
  • Testing is simpler (no mocking HTTP response types)
  • Changes to endpoint responses don't require service changes

Domain Models and Response Mapping:

Services return domain models (e.g., DomainActiveBanList, DomainBansByCountry) that represent pure business logic. Response models (e.g., ActiveBanListResponse, BansByCountryResponse) are defined in app/models/ and used only by routers.

Conversion happens at the router boundary:

  1. Router calls service → receives domain model
  2. Router calls mapper function to convert domain model → response model
  3. Router returns response model to HTTP client

Example:

# In ban_service.py
async def get_active_bans(...) -> DomainActiveBanList:
    """Service returns domain model (not HTTP-aware)."""
    ...

# In routers/bans.py (router boundary)
domain_result = await ban_service.get_active_bans(...)
return map_domain_active_ban_list_to_response(domain_result)

Mapper functions live in app/mappers/ and are thin, mechanical translations between structures.
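
As a rough illustration of how thin such a mapper can be, the sketch below converts one domain object into one response object. The field names (`ip`, `jail`, `banned_at`, `country_code`) are assumptions made for this example, and plain dataclasses stand in for the Pydantic response models that the real code uses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DomainActiveBan:
    """Domain model as a service might return it (fields are hypothetical)."""
    ip: str
    jail: str
    banned_at: datetime
    country_code: Optional[str]


@dataclass(frozen=True)
class ActiveBan:
    """Response shape (a dataclass stand-in for a Pydantic response model)."""
    ip: str
    jail: str
    bannedAt: str
    countryCode: Optional[str]


def map_domain_active_ban_to_response(ban: DomainActiveBan) -> ActiveBan:
    # Purely mechanical: rename fields and serialise the timestamp.
    return ActiveBan(
        ip=ban.ip,
        jail=ban.jail,
        bannedAt=ban.banned_at.isoformat(),
        countryCode=ban.country_code,
    )


example = map_domain_active_ban_to_response(
    DomainActiveBan(
        ip="203.0.113.7",
        jail="sshd",
        banned_at=datetime(2026, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
        country_code="DE",
    )
)
```

Because the function only renames and serialises, a change in API naming touches this one function and the response model, not the service.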

Motivation:

  • The fail2ban domain logic does not depend on field-naming conventions such as country_code (snake_case) vs countryCode (camelCase)
  • If the API needs pagination metadata added to the response, only the mapper changes
  • If repositories change their output schema, only services need updating (routers are unaffected)
  • Services can be tested with simple dataclasses; no need for Pydantic serialization overhead

| Service | Purpose |
| --- | --- |
| auth_service.py | Hashes and verifies the master password, creates and validates session tokens, enforces session expiry |
| setup_service.py | Validates setup input, persists initial configuration, ensures setup runs only once |
| jail_service.py | Retrieves jail list and details from fail2ban, aggregates metrics (banned count, failure count), sends start/stop/reload/idle commands |
| ban_service.py | Executes ban and unban commands via the fail2ban socket, queries the currently banned IP list, validates IPs before banning |
| config_service.py | Reads active jail and filter configuration from fail2ban, writes configuration changes, validates regex patterns, triggers reload; reads the fail2ban log file tail and queries service status for the Log tab |
| file_config_service.py | Reads and writes raw fail2ban config files on disk (jail.d/, filter.d/, action.d/); lists files, reads content, overwrites files, toggles enabled/disabled |
| jail_config_service.py | Discovers inactive jails by parsing jail.conf / jail.local / jail.d/*; writes .local overrides to activate/deactivate jails; triggers fail2ban reload; validates jail configurations |
| filter_config_service.py | Discovers available filters by scanning filter.d/; reads, creates, updates, and deletes filter definitions; assigns filters to jails |
| action_config_service.py | Discovers available actions by scanning action.d/; reads, creates, updates, and deletes action definitions; assigns actions to jails |
| config_file_service.py | Shared utilities for configuration parsing and manipulation: parses config files, validates names/IPs, manages atomic file writes, probes fail2ban socket |
| raw_config_io_service.py | Low-level file I/O for raw fail2ban config files |
| fail2ban_metadata_service.py | Resolves the fail2ban SQLite database path by querying the fail2ban socket and caches the result for reuse across services |
| log_service.py | Log preview and regex test operations (extracted from config_service) |
| history_service.py | Queries the fail2ban database for historical ban records, builds per-IP timelines, computes ban counts and repeat-offender flags, and syncs new records into BanGUI's archive table |
| blocklist_service.py | Orchestration layer for blocklist imports. Delegates to focused components: BlocklistDownloader (HTTP download with retry), BlocklistParser (IP validation), BanExecutor (fail2ban integration), and BlocklistImportWorkflow (orchestrates the flow). Maintains public API for source CRUD, preview, scheduling, and import triggers. |
| geo_cache.py | GeoCache class that encapsulates all IP geolocation caching: resolves IP addresses to country, ASN, and organization using a primary local MaxMind GeoLite2-Country database (if available) with optional HTTP fallback to ip-api.com (disabled by default for security). Maintains in-memory and persistent caches with negative cache support, and manages background re-resolution. Instantiated once at startup with allow_http_fallback flag and stored on app.state.geo_cache |
| geo_service.py | (Deprecated) Backward-compatibility wrappers that delegate to the GeoCache instance. Kept for compatibility with existing code. New code should use GeoCache directly or via dependency injection |
| server_service.py | Reads and writes fail2ban server-level settings (log level, log target, syslog socket, DB location, purge age) |
| health_service.py | Probes fail2ban socket connectivity, retrieves server version and global stats, reports online/offline status |

Blocklist Import Architecture

The blocklist import flow has been refactored to separate concerns into focused components:

blocklist_service.py (Public API)
    │
    ├─ import_source() ──┐
    │                    │
    └─ import_all()      ├──> BlocklistImportWorkflow (Orchestrator)
                         │         │
                         │         ├──> BlocklistDownloader
                         │         │       • HTTP GET with retry logic
                         │         │       • Exponential backoff (429, 5xx)
                         │         │       • Timeout handling
                         │         │
                         │         ├──> BlocklistParser
                         │         │       • Parse text to IP lines
                         │         │       • Validate IPv4/IPv6 addresses
                         │         │       • Skip CIDRs and malformed entries
                         │         │
                         │         ├──> BanExecutor
                         │         │       • Ban each IP via fail2ban socket
                         │         │       • Abort on JailNotFoundError
                         │         │       • Continue on individual ban failures
                         │         │
                         │         └──> Geo pre-warming
                         │               (optional batch lookup for newly banned IPs)
                         │
                         └──> Result logging (import_log_repo)

Component Responsibilities:

  • BlocklistDownloader: Handles HTTP transport concerns (retries, timeouts, backoff)
  • BlocklistParser: Handles parsing and validation logic (clean, testable, no I/O)
  • BanExecutor: Handles fail2ban integration with error aggregation
  • BlocklistImportWorkflow: Coordinates the flow, handles result aggregation and geo pre-warming
  • blocklist_service.py: Maintains public API (source CRUD, scheduling, import triggers)
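
The parsing step described above (validate IPv4/IPv6 addresses, skip CIDRs and malformed entries, no I/O) can be sketched with the standard `ipaddress` module. The function name `parse_blocklist` and the handling of `#` comments are assumptions for this example, not the actual BlocklistParser API.

```python
import ipaddress


def parse_blocklist(text: str) -> tuple[list[str], int]:
    """Return (valid_ips, skipped_count) for a raw blocklist body.

    Hypothetical helper: single addresses are kept in normalised form,
    while CIDR ranges and malformed lines are counted as skipped.
    """
    valid: list[str] = []
    skipped = 0
    for raw_line in text.splitlines():
        # Assumption: anything after '#' is a comment.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        try:
            # ip_address() rejects CIDR notation such as "10.0.0.0/8".
            address = ipaddress.ip_address(line)
        except ValueError:
            skipped += 1
            continue
        valid.append(str(address))
    return valid, skipped


sample = "1.2.3.4\n# comment\n10.0.0.0/8\nnot-an-ip\n2001:db8::1\n\n"
ips, skipped = parse_blocklist(sample)
```

Keeping the function pure (text in, list out) is what makes it easy to test without a network or a fail2ban socket.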

Benefits of This Architecture:

  • Each component is independently testable with mock dependencies
  • Error handling is clear: JailNotFoundError stops processing, JailOperationError continues
  • Components can be evolved independently (e.g., replace HTTP client, add batch validation)
  • Logging is contextual and tied to the appropriate layer
  • Retry logic and transient error handling are isolated
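
The error-handling rule stated above (JailNotFoundError stops processing, JailOperationError continues) might look roughly like the following. The exception class names come from this document; `BanResult`, `ban_all`, and the `ban_ip` callable are hypothetical names introduced only for this sketch.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Iterable


class JailNotFoundError(Exception):
    """Stand-in for the domain exception: the target jail does not exist."""


class JailOperationError(Exception):
    """Stand-in for the domain exception: a single ban command failed."""


@dataclass
class BanResult:
    banned: int = 0
    failed: list[str] = field(default_factory=list)
    aborted: bool = False


async def ban_all(
    ips: Iterable[str],
    jail: str,
    ban_ip: Callable[[str, str], Awaitable[None]],
) -> BanResult:
    """Ban each IP, aggregating errors instead of failing the whole import."""
    result = BanResult()
    for ip in ips:
        try:
            await ban_ip(jail, ip)
        except JailNotFoundError:
            # Every later ban would fail the same way, so stop here.
            result.aborted = True
            break
        except JailOperationError:
            # One bad entry should not block the rest of the list.
            result.failed.append(ip)
        else:
            result.banned += 1
    return result
```

Passing the ban function in as a parameter keeps the sketch testable with a fake instead of a live fail2ban socket.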

DNS-Rebinding Protection

The Vulnerability:

A DNS-rebinding attack exploits a time-of-check-to-time-of-use (TOCTOU) window between when a blocklist URL is validated and when it is actually fetched:

  1. User adds blocklist URL http://attacker.com/blocklist.txt
  2. blocklist_service.create_source() calls validate_blocklist_url() which performs DNS resolution
  3. attacker.com resolves to a public IP (attacker's real server) — validation passes ✓
  4. Later, when BlocklistDownloader fetches the URL, the attacker's DNS server responds with 192.168.1.1
  5. The HTTP client connects to the private IP, potentially accessing internal services

The Protection:

BanGUI closes this window by adding a second DNS-rebinding check at connection time:

  1. Create-time validation (app/utils/ip_utils.py:validate_blocklist_url): Confirms the URL resolves to a public IP when created
  2. Connection-time validation (app/services/dns_validated_connector.py): Validates that all resolved IPs are public when the actual HTTP connection is made

The HTTP session is created with a custom socket factory that intercepts DNS resolution results before socket creation. If any resolved IP is private or reserved, the connection is rejected with a clear error.

Implementation:

  • app/services/dns_validated_connector.py: Provides create_dns_validated_socket_factory() which returns a socket factory that validates IPs using is_private_ip()
  • app/startup.py:_create_http_session(): Passes the socket factory to aiohttp.TCPConnector, protecting all HTTP requests globally
  • All blocklist imports automatically inherit this protection through the shared session

Protected IP Ranges:

The validation blocks all RFC 1918 private ranges, loopback, link-local, ULA, multicast, and reserved addresses:

  • IPv4: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, 224.0.0.0/4, 240.0.0.0/4, 255.255.255.255/32
  • IPv6: ::1/128, fe80::/10, fc00::/7, ff00::/8, and others (via ipaddress.IPv6Address.is_private, etc.)
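
A minimal sketch of the address check, using only the standard `ipaddress` module, is shown below. The function name `is_private_ip()` appears in this document, but the body here is an assumed implementation, and `assert_all_public()` is a hypothetical helper that mirrors the "reject if any resolved IP is private or reserved" rule.

```python
import ipaddress
from typing import Iterable


def is_private_ip(value: str) -> bool:
    """Return True if the address must not be contacted (assumed logic)."""
    address = ipaddress.ip_address(value)
    # Treat IPv4-mapped IPv6 (::ffff:a.b.c.d) like the embedded IPv4 address.
    mapped = getattr(address, "ipv4_mapped", None)
    if mapped is not None:
        address = mapped
    return (
        address.is_private
        or address.is_loopback
        or address.is_link_local
        or address.is_multicast
        or address.is_reserved
        or address.is_unspecified
    )


def assert_all_public(hostname: str, resolved: Iterable[str]) -> None:
    """Raise if any resolved address is non-public (hypothetical helper)."""
    blocked = [ip for ip in resolved if is_private_ip(ip)]
    if blocked:
        raise ValueError(f"{hostname} resolved to non-public address(es): {blocked}")
```

Running the same check at connection time, on the addresses the resolver actually returned, is what closes the window between validation and fetch.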

Reference:

Startup DAG (app/startup_dag.py, app/startup.py)

The startup process is orchestrated by an explicit Directed Acyclic Graph (DAG) that defines all resource initialization stages, their dependencies, health checks, and rollback strategy. This replaces implicit ordering with explicit, documented prerequisites.

Why This Exists:

Previously, startup resources were created in a procedural sequence without documented dependencies. If a stage was reordered or a prerequisite was missed, initialization could fail in non-obvious ways. Partial failures could leave stale resources (open database connections, HTTP sessions, running schedulers) that prevented clean rollback.

Startup Stages (in order):

1. WORKER_MODE
   └─ Validates that BANGUI_WORKERS=1 (scheduler cannot run in multiple workers)

2. DATABASE
   ├─ Prerequisite: WORKER_MODE
   ├─ Creates database directory
   ├─ Initializes database schema
   ├─ Caches setup completion state
   └─ Loads persisted runtime settings

3. GEO_CACHE
   ├─ Prerequisite: DATABASE
   ├─ Loads IP geolocation cache from database
   ├─ Counts unresolved IPs
   ├─ Initializes MaxMind GeoLite2 database
   └─ Configures HTTP fallback (if enabled)

4. HTTP_SESSION
   ├─ Prerequisite: GEO_CACHE
   ├─ Creates aiohttp.ClientSession
   └─ Configures timeouts and connection limits

5. SCHEDULER
   ├─ Prerequisite: HTTP_SESSION
   ├─ Creates APScheduler AsyncIOScheduler
   └─ Starts the scheduler

6. TASKS
   ├─ Prerequisite: SCHEDULER
   ├─ Registers health_check task (fail2ban connectivity probe)
   ├─ Registers blocklist_import task (scheduled imports)
   ├─ Registers geo_cache_cleanup task (stale entry purge)
   ├─ Registers geo_cache_flush task (periodic persistence)
   ├─ Registers geo_re_resolve task (stale record re-resolution)
   ├─ Registers history_sync task (ban history sync)
   └─ Registers session_cleanup task (expired session purge)

Failure Mode & Rollback:

If any stage fails:

  1. All completed stages are rolled back in reverse order (TASKS → SCHEDULER → HTTP_SESSION → GEO_CACHE → DATABASE → WORKER_MODE)
  2. Each rollback suppresses exceptions to ensure all resources are cleaned up
  3. Database connections are closed
  4. HTTP sessions are closed
  5. The scheduler is shut down
  6. The application startup fails with a clear error message
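
The prerequisite check and reverse-order rollback can be sketched in a few lines. This is not the StartupDAG API; `run_startup` and the tuple-based stage description are hypothetical simplifications that only illustrate the ordering and exception-suppression rules listed above.

```python
import asyncio
from typing import Awaitable, Callable

AsyncFn = Callable[[], Awaitable[None]]
# (name, prerequisites, start function, rollback function)
Stage = tuple[str, frozenset[str], AsyncFn, AsyncFn]


async def run_startup(stages: list[Stage]) -> list[str]:
    """Run stages in order; on failure, roll back completed ones in reverse."""
    completed: list[str] = []
    rollbacks: dict[str, AsyncFn] = {}
    try:
        for name, prerequisites, start, rollback in stages:
            missing = prerequisites - set(completed)
            if missing:
                raise RuntimeError(f"{name} is missing prerequisites: {sorted(missing)}")
            await start()
            completed.append(name)
            rollbacks[name] = rollback
    except Exception:
        for name in reversed(completed):
            try:
                await rollbacks[name]()
            except Exception:
                # Suppressed so that the remaining rollbacks still run.
                pass
        raise  # startup still fails with the original error
    return completed
```

Re-raising the original exception after cleanup keeps the failure visible while the suppressed rollback errors cannot mask it.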

Health Checks:

After all stages complete, a final health check verifies:

  • All resources have initialized successfully
  • Resources pass their individual health_check() methods
  • No failures occurred during any stage

Implementation:

  • StartupDAG: Orchestrates the entire flow, manages prerequisites, and handles failures
  • StartupStage: Enum defining the 6 startup stages
  • StageDependency: Defines stage metadata (description, prerequisites, rollback policy)
  • StartupContext: Tracks registered resources, completed stages, and failure state
  • startup_shared_resources(): Main entry point that builds and executes the DAG
  • stage*(): Functions that implement each stage's initialization logic

Example Usage in Tests:

# Test that a stage with missing prerequisites fails
dag = StartupDAG()
dag.register_stage(StartupStage.HTTP_SESSION, "Create HTTP session", 
                   prerequisites=frozenset([StartupStage.DATABASE]))
dag.register_stage(StartupStage.SCHEDULER, "Create scheduler")

async def http_session_func():
    return aiohttp.ClientSession()

# This will raise RuntimeError because DATABASE hasn't completed
await dag.execute_stage(StartupStage.HTTP_SESSION, http_session_func)

Mappers (app/mappers/)

The response mapping layer. Mappers convert domain models (returned by services) to response models (consumed by HTTP routers). This layer enforces the separation between business logic and API shape.

Location: app/mappers/

Responsibilities:

  • Convert service domain models to API response models
  • Mechanical, thin translation — no business logic
  • Used exclusively at the router boundary

Pattern:

Each domain model has a corresponding mapper function:

# Domain model (from service)
DomainActiveBan → map_domain_active_ban_to_response() → ActiveBan (response)

# Service returns domain models:
async def get_active_bans(...) -> DomainActiveBanList

# Router converts at the boundary:
domain_result = await ban_service.get_active_bans(...)
return map_domain_active_ban_list_to_response(domain_result)

Why separate?

When API requirements change (e.g., a field is added or renamed), only two things change:

  1. The response model in app/models/
  2. The mapper function in app/mappers/

Routers and services stay the same.

Without this layer, changes to API shape would require modifying services and their tests.

Repositories (app/repositories/)

The data access layer. Repositories execute raw SQL queries against the application SQLite database. They return plain data or domain models — they never raise HTTP exceptions or contain business logic.

| Repository | Purpose |
| --- | --- |
| settings_repo.py | CRUD operations for application settings (master password hash, DB path, fail2ban socket path, preferences) |
| session_repo.py | Store, retrieve, and delete session records for authentication |
| blocklist_repo.py | Persist blocklist source definitions (name, URL, enabled/disabled) |
| fail2ban_db_repo.py | Read historical ban records from the fail2ban SQLite database |
| geo_cache_repo.py | Persist and query IP geo resolution cache |
| import_log_repo.py | Record import run results (timestamp, source, IPs imported, errors) for the import log view |

Every repository in app/repositories/ has a corresponding protocol in app/repositories/protocols.py, including settings_repo.py and history_archive_repo.py.
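
To make the protocol idea concrete, here is a small sketch of a settings repository with an all-or-nothing batch write. It uses the synchronous `sqlite3` module instead of aiosqlite to stay self-contained, and the table layout, class names, and `get_all()` helper are assumptions; only the method name `set_settings_batch` and the `BEGIN IMMEDIATE ... COMMIT` pattern are taken from the project's description.

```python
import sqlite3
from typing import Mapping, Optional, Protocol


class SettingsRepository(Protocol):
    """Structural interface that a service can depend on."""

    def set_settings_batch(self, settings: Mapping[str, Optional[str]]) -> None: ...


class SqliteSettingsRepository:
    def __init__(self, conn: sqlite3.Connection) -> None:
        # The connection is expected to be in autocommit mode
        # (isolation_level=None) so transactions are controlled explicitly.
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS settings ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def set_settings_batch(self, settings: Mapping[str, Optional[str]]) -> None:
        """Write every setting in one transaction, or none of them."""
        self._conn.execute("BEGIN IMMEDIATE")
        try:
            self._conn.executemany(
                "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
                list(settings.items()),
            )
        except Exception:
            self._conn.execute("ROLLBACK")
            raise
        self._conn.execute("COMMIT")

    def get_all(self) -> dict[str, str]:
        return dict(self._conn.execute("SELECT key, value FROM settings"))


conn = sqlite3.connect(":memory:", isolation_level=None)
repo: SettingsRepository = SqliteSettingsRepository(conn)
```

Typing the variable as the protocol rather than the concrete class is what lets tests substitute an in-memory fake without inheritance.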

Models (app/models/)

Pydantic schemas that define data shapes and validation. Models are split into three categories per domain: request, response, and domain models.

| Model file | Purpose |
| --- | --- |
| auth.py | Login request/response and session models |
| ban.py | Ban creation and lookup models |
| blocklist.py | Blocklist source and import log models |
| config.py | fail2ban config view/edit models |
| file_config.py | Raw config file read/write models |
| geo.py | Geo and ASN lookup models |
| history.py | Historical ban query and timeline models |
| jail.py | Jail listing and status models |
| server.py | Server status and settings models |
| setup.py | First-run setup wizard models |

Tasks (app/tasks/)

APScheduler background jobs that run on a schedule without user interaction.

| Task | Purpose |
| --- | --- |
| blocklist_import.py | Downloads all enabled blocklist sources, validates entries, applies bans, records results in the import log |
| geo_cache_cleanup.py | Periodically removes entries from the geo_cache table that have not been referenced in the configured retention period (default: 90 days). Prevents unbounded database growth. |
| geo_cache_flush.py | Periodically flushes newly resolved IPs from the in-memory dirty set to the geo_cache SQLite table (default: every 60 seconds). GET requests populate only the in-memory cache; this task persists them without blocking any request. |
| geo_re_resolve.py | Periodically re-resolves stale entries in geo_cache to keep geolocation data fresh |
| health_check.py | Periodically pings the fail2ban socket and updates the cached server status so the frontend always has fresh data |
| history_sync.py | Periodically copies new records from the fail2ban SQLite database into BanGUI's history_archive table; delegates the sync algorithm to history_service.py |
| session_cleanup.py | Periodically removes expired sessions from the sessions SQLite table (default: every 6 hours). Without this cleanup, the table grows unbounded and degrades query performance. |
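
The dirty-set flush performed by geo_cache_flush.py can be pictured as follows. This is a simplified sketch with the synchronous `sqlite3` module; the class name `GeoCacheSketch`, its methods, and the two-column `geo_cache` schema are assumptions and do not reflect the real GeoCache class.

```python
import sqlite3


class GeoCacheSketch:
    """In-memory cache that remembers which entries still need persisting."""

    def __init__(self) -> None:
        self._cache: dict[str, str] = {}
        self._dirty: set[str] = set()

    def remember(self, ip: str, country_code: str) -> None:
        # Request handlers only touch memory; no database write happens here.
        self._cache[ip] = country_code
        self._dirty.add(ip)

    def flush_dirty(self, conn: sqlite3.Connection) -> int:
        """Persist all dirty entries in one transaction; return how many."""
        if not self._dirty:
            return 0
        batch = [(ip, self._cache[ip]) for ip in sorted(self._dirty)]
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT OR REPLACE INTO geo_cache (ip, country_code) VALUES (?, ?)",
                batch,
            )
        # Clear only what was written, and only after the commit succeeded.
        self._dirty.difference_update(ip for ip, _ in batch)
        return len(batch)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE geo_cache (ip TEXT PRIMARY KEY, country_code TEXT NOT NULL)")
cache = GeoCacheSketch()
cache.remember("203.0.113.7", "DE")
cache.remember("198.51.100.9", "US")
first_flush = cache.flush_dirty(db)
second_flush = cache.flush_dirty(db)
```

Clearing the dirty set only after the commit means a failed flush is retried on the next run instead of silently dropping entries.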

Utils (app/utils/)

Pure helper modules with no framework dependencies.

| Module | Purpose |
| --- | --- |
| fail2ban_client.py | Async client that communicates with fail2ban via its Unix domain socket; sends commands and parses responses using the fail2ban protocol. Modelled after ./fail2ban-master/fail2ban/client/csocket.py and ./fail2ban-master/fail2ban/client/fail2banclient.py. |
| jail_socket.py | Low-level jail reload operations (reload_all) extracted to break service dependencies. Used by jail_service, jail_config_service, action_config_service, and filter_config_service to avoid circular imports between sibling services. |
| ip_utils.py | Validates IPv4/IPv6 addresses and CIDR ranges using the ipaddress stdlib module, normalises formats |
| jail_utils.py | Jail helper functions for configuration and status inference |
| jail_config.py | Jail config parser and serializer for fail2ban config manipulation |
| time_utils.py | Timezone-aware datetime construction, formatting helpers, time-range calculations |
| log_utils.py | Structured log formatting and enrichment helpers |
| conffile_parser.py | Parses Fail2ban .conf files into structured objects and serialises back to text |
| config_parser.py | Builds structured config objects from file content tokens |
| config_writer.py | Atomic config file writes, backups, and safe replace semantics |
| config_file_utils.py | Common file-level config utility helpers |
| fail2ban_db_utils.py | Fail2ban DB path discovery and ban-history parsing helpers |
| setup_utils.py | Setup wizard helper utilities |
| constants.py | Shared constants: default socket path, default database path, time-range presets, parser truthy values, limits |
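
The "atomic config file writes, backups, and safe replace semantics" listed for config_writer.py can be illustrated with a minimal sketch. It is not the module's actual code: the function name atomic_write_text and the .bak backup suffix are assumptions made for illustration. The technique shown is the common one of writing to a temporary file in the same directory and then renaming it over the target.

```python
import os
import shutil
import tempfile
from pathlib import Path


def atomic_write_text(path, content, backup=True):
    """Replace `path` with `content` so readers never see a half-written file."""
    path = Path(path)
    if backup and path.exists():
        # Hypothetical backup convention: keep the previous version as <name>.bak.
        shutil.copy2(path, path.with_name(path.name + ".bak"))

    # The temp file lives in the target directory so os.replace() stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before the rename
        os.replace(tmp_name, path)  # swaps the new file in as a single rename step
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

If the process dies before os.replace(), the original file is untouched; if it dies after, the new file is complete.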

Configuration (app/config.py)

A single Pydantic settings model that loads all configuration from environment variables (prefixed BANGUI_) and an optional .env file. Validated at startup — the application refuses to start if required values are missing.

Dependencies (app/dependencies.py)

FastAPI Depends() providers that inject shared resources into route handlers: the database connection, service instances, the authenticated session, and the fail2ban client. This is the wiring layer that connects routers to services without tight coupling.

Application Entry Point (app/main.py)

The FastAPI app factory. Responsibilities:

  • Creates the FastAPI instance with metadata (title, version, docs URL)
  • Registers the lifespan context manager (startup: open DB, create aiohttp session, start scheduler; shutdown: close all)
  • Mounts all routers
  • Registers global exception handlers that map domain exceptions to HTTP status codes
  • Applies the setup-redirect middleware (returns 423 Locked for all API requests when no configuration exists, except for /api/setup and /api/health)
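
The setup-redirect behaviour in the last bullet can be sketched as a plain ASGI middleware with no framework imports. This is an illustration under assumptions, not the code in app/main.py: is_setup_complete is passed in as a hypothetical async callable, and the response body follows the shape given in the API conventions.

```python
import json

# Paths that stay reachable before setup has been completed.
EXEMPT_PREFIXES = ("/api/setup", "/api/health")


class SetupRedirectMiddleware:
    """Answer 423 Locked for API requests until setup is complete."""

    def __init__(self, app, is_setup_complete):
        self.app = app
        self.is_setup_complete = is_setup_complete  # hypothetical async callable -> bool

    async def __call__(self, scope, receive, send):
        path = scope.get("path", "")
        needs_check = (
            scope["type"] == "http"
            and path.startswith("/api/")
            and not path.startswith(EXEMPT_PREFIXES)
        )
        if needs_check and not await self.is_setup_complete():
            body = json.dumps(
                {"detail": "Setup not complete.", "setup_required": True}
            ).encode("utf-8")
            await send({
                "type": "http.response.start",
                "status": 423,
                "headers": [
                    (b"content-type", b"application/json"),
                    (b"content-length", str(len(body)).encode("ascii")),
                ],
            })
            await send({"type": "http.response.body", "body": body})
            return
        # Setup is done or the path is exempt: hand over to the wrapped app.
        await self.app(scope, receive, send)
```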

3. Frontend Architecture

The frontend is a React single-page application built with TypeScript, Vite, and Fluent UI v9. It communicates exclusively with the backend REST API — it never accesses fail2ban, the database, or external services directly.

┌──────────────────────────────────────────────────────────────┐
│                     React Application                        │
│                                                              │
│   ┌──────────┐    ┌────────────┐    ┌──────────────────┐    │
│   │  Pages   │───▶│ Components │───▶│   Fluent UI v9   │    │
│   └────┬─────┘    └────────────┘    └──────────────────┘    │
│        │                                                     │
│   ┌────┴─────┐    ┌────────────┐    ┌──────────────────┐    │
│   │  Hooks   │───▶│  API Layer │───▶│  Backend (REST)  │    │
│   └──────────┘    └────────────┘    └──────────────────┘    │
│                                                              │
│   ┌──────────┐    ┌────────────┐    ┌──────────────────┐    │
│   │Providers │    │   Types    │    │     Theme        │    │
│   │(Context) │    │(Interfaces)│    │(Tokens, Styles)  │    │
│   └──────────┘    └────────────┘    └──────────────────┘    │
└──────────────────────────────────────────────────────────────┘

3.1 Project Structure

frontend/
├── public/
├── src/
│   ├── api/                   # API client and per-domain request functions
│   │   ├── client.ts          #   Central fetch wrapper (typed GET/POST/PUT/DELETE)
│   │   ├── endpoints.ts       #   API path constants
│   │   ├── auth.ts            #   Login, logout, session check
│   │   ├── dashboard.ts       #   Dashboard status and ban list
│   │   ├── jails.ts           #   Jail CRUD and controls
│   │   ├── bans.ts            #   Ban/unban actions, banned list
│   │   ├── config.ts          #   Configuration read/write
│   │   ├── history.ts         #   Ban history queries
│   │   ├── blocklist.ts       #   Blocklist source management
│   │   ├── geo.ts             #   IP lookup / geolocation
│   │   └── server.ts          #   Server settings
│   ├── assets/                # Static images, fonts, icons
│   ├── components/            # Reusable UI components
│   │   ├── BanTable.tsx       #   Data table for ban entries
│   │   ├── JailCard.tsx       #   Summary card for a jail
│   │   ├── StatusBar.tsx      #   Server status indicator strip
│   │   ├── TimeRangeSelector.tsx # Quick preset picker (24h, 7d, 30d, 365d)
│   │   ├── IpInput.tsx        #   IP address input with validation
│   │   ├── RegexTester.tsx    #   Side-by-side regex match preview
│   │   ├── WorldMap.tsx       #   Country-outline map with ban counts
│   │   ├── ImportLogTable.tsx #   Blocklist import run history
│   │   ├── ConfirmDialog.tsx  #   Reusable confirmation modal
│   │   ├── RequireAuth.tsx    #   Route guard: redirects unauthenticated users to /login
│   │   ├── SetupGuard.tsx     #   Route guard: redirects to /setup if setup incomplete
│   │   └── ...                #   (additional shared components)
│   ├── hooks/                 # Custom React hooks (stateful logic + API calls)
│   │   ├── useAuth.ts         #   Login state, login/logout actions
│   │   ├── useBans.ts         #   Fetch ban list for a time range
│   │   ├── useJails.ts        #   Fetch jail list and details
│   │   ├── useConfig.ts       #   Fetch and update configuration
│   │   ├── useHistory.ts      #   Fetch historical ban data
│   │   ├── useBlocklists.ts   #   Fetch and manage blocklist sources
│   │   ├── useServerStatus.ts #   Poll server health / status
│   │   └── useGeo.ts          #   IP lookup hook
│   ├── layouts/               # Page-level layout wrappers
│   │   └── AppLayout.tsx      #   Sidebar navigation + header + content area
│   ├── pages/                 # Route-level page components (one per route)
│   │   ├── SetupPage.tsx      #   First-run wizard
│   │   ├── LoginPage.tsx      #   Password prompt
│   │   ├── DashboardPage.tsx  #   Ban overview, status bar
│   │   ├── WorldMapPage.tsx   #   Geographical ban map
│   │   ├── JailsPage.tsx      #   Jail list, detail, controls, ban/unban
│   │   ├── ConfigPage.tsx     #   Configuration viewer/editor
│   │   ├── HistoryPage.tsx    #   Ban history browser
│   │   └── BlocklistPage.tsx  #   Blocklist source management + import log
│   ├── providers/             # React context providers
│   │   ├── AuthProvider.tsx   #   Authentication state and guards
│   │   └── ThemeProvider.tsx  #   Light/dark theme switching
│   ├── theme/                 # Fluent UI theme definitions
│   │   ├── customTheme.ts     #   Brand colour ramp, light and dark themes
│   │   └── tokens.ts          #   Spacing, sizing, and z-index constants
│   ├── types/                 # Shared TypeScript interfaces
│   │   ├── auth.ts            #   LoginRequest, SessionInfo
│   │   ├── ban.ts             #   Ban, BanListResponse, BanRequest
│   │   ├── jail.ts            #   Jail, JailDetail, JailListResponse
│   │   ├── config.ts          #   ConfigSection, ConfigUpdateRequest
│   │   ├── history.ts         #   HistoryEntry, IpTimeline
│   │   ├── blocklist.ts       #   BlocklistSource, ImportLogEntry
│   │   ├── geo.ts             #   GeoInfo, AsnInfo
│   │   ├── server.ts          #   ServerStatus, ServerSettings
│   │   └── api.ts             #   ApiError, PaginatedResponse
│   ├── utils/                 # Pure helper functions
│   │   ├── formatDate.ts      #   Date/time formatting with timezone support
│   │   ├── formatIp.ts        #   IP display formatting
│   │   ├── crypto.ts          #   Browser-native SHA-256 helper (SubtleCrypto)
│   │   └── constants.ts       #   Frontend constants (time presets, etc.)
│   ├── App.tsx                # Root: FluentProvider + BrowserRouter + routes
│   ├── main.tsx               # Vite entry point
│   └── vite-env.d.ts          # Vite type shims
├── tsconfig.json
├── vite.config.ts
└── package.json

3.2 Module Purposes

Pages (src/pages/)

Top-level route components. Each page composes layout, components, and hooks to create a full screen. Pages contain no business logic — they orchestrate what is displayed and delegate data fetching to hooks.

| Page | Route | Purpose |
| --- | --- | --- |
| SetupPage | /setup | First-run wizard: set master password, database path, fail2ban connection, preferences |
| LoginPage | /login | Single-field password prompt; redirects to requested page after success |
| DashboardPage | / | Server status bar, ban list table, time-range selector |
| WorldMapPage | /map | World map with per-country ban counts, country filter |
| JailsPage | /jails | Jail overview list, jail detail panel, controls (start/stop/reload), ban/unban forms, IP lookup, whitelist management |
| ConfigPage | /config | View and edit jail parameters, filter regex, server settings, regex tester, add log observation |
| HistoryPage | /history | Browse all past bans, filter by jail/IP/time, per-IP timeline drill-down |
| BlocklistPage | /blocklists | Manage blocklist sources, schedule configuration, import log, manual import trigger |

Components (src/components/)

Reusable UI building blocks. Components receive data via props, emit changes via callbacks, and never call the API directly. Built exclusively with Fluent UI v9 components.

| Component | Purpose |
| --- | --- |
| StatusBar | Displays fail2ban server status (online/offline, version, jail count, total bans) |
| BanTable | Sortable data table for ban entries with columns for time, IP, jail, country, etc. |
| JailCard | Summary card showing jail name, status badge, key metrics |
| TimeRangeSelector | Quick-preset picker for filtering data (24h, 7d, 30d, 365d) |
| IpInput | IP address text field with inline validation |
| WorldMap | SVG/Canvas country-outline map with count overlays and click-to-filter |
| RegexTester | Side-by-side sample log + regex input with live match highlighting |
| ImportLogTable | Table displaying blocklist import history |
| ConfirmDialog | Reusable Fluent UI Dialog for destructive action confirmations |
| RequireAuth | Route guard: renders children only when authenticated; otherwise redirects to /login?next=<path> |
| SetupGuard | Route guard: checks GET /api/setup on mount and redirects to /setup if not complete; shows a spinner while loading |
| config/ConfigListDetail | Reusable two-pane master/detail layout used by the Jails, Filters, and Actions config tabs. Left pane lists items with active/inactive badges (active sorted first, keyboard navigable); right pane renders the selected item's detail content. Collapses to a dropdown on narrow screens. |
| config/RawConfigSection | Collapsible section that lazily loads the raw text of a config file into a monospace textarea. Provides a Save button backed by a configurable save callback; shows idle/saving/saved/error feedback. Used by all three config tabs. |
| config/AutoSaveIndicator | Small inline indicator showing the current save state (idle, saving, saved, error) for form fields that auto-save on change. |

Hooks (src/hooks/)

Encapsulate all stateful logic, side effects, and API calls. Components and pages consume hooks to stay declarative.

| Hook | Purpose |
| --- | --- |
| useAuth | Manages login state, provides login(), logout(), and isAuthenticated |
| useBans | Fetches ban list for a given time range, returns { bans, loading, error } |
| useJails | Fetches jail list and individual jail detail |
| useConfig | Reads and writes fail2ban jail configuration via the socket-based API |
| useFilterConfig | Fetches and manages a single filter file's parsed configuration |
| useActionConfig | Fetches and manages a single action file's parsed configuration |
| useJailFileConfig | Fetches and manages a single jail.d config file |
| useConfigActiveStatus | Derives active status sets for jails, filters, and actions by correlating the live jail list with the config file lists; returns { activeJails, activeFilters, activeActions, loading, error, refresh } |
| useAutoSave | Debounced auto-save hook: invokes a save callback after the user stops typing, tracks saving/saved/error state |
| useHistory | Queries historical ban data with filters |
| useBlocklists | Manages blocklist sources and import triggers |
| useServerStatus | Polls the server status endpoint at an interval |
| useGeo | Performs IP geolocation lookups on demand |

API Layer (src/api/)

A thin typed wrapper around fetch. All HTTP communication is centralised here — components and hooks never construct HTTP requests directly.

| Module | Purpose |
| --- | --- |
| client.ts | Central get<T>, post<T>, put<T>, del<T> functions with error handling and credentials |
| endpoints.ts | All API path constants in one place; no hard-coded URLs anywhere else |
| auth.ts | login(), logout(), checkSession() |
| dashboard.ts | fetchStatus(), fetchRecentBans() |
| jails.ts | fetchJails(), fetchJailDetail(), startJail(), stopJail(), reloadJail() |
| bans.ts | banIp(), unbanIp(), unbanAll(), fetchBannedIps() |
| config.ts | Socket-based config: fetchJailConfigs(), updateJailConfig(), testRegex(). File-based config: fetchJailFiles(), fetchJailFile(), writeJailFile(), setJailFileEnabled(), fetchFilterFiles(), fetchFilterFile(), writeFilterFile(), fetchActionFiles(), fetchActionFile(), writeActionFile(), reloadConfig() |
| history.ts | fetchHistory(), fetchIpTimeline() |
| blocklist.ts | fetchSources(), addSource(), removeSource(), triggerImport(), fetchImportLog() |
| geo.ts | lookupIp() |
| server.ts | fetchServerSettings(), updateServerSettings() |

Types (src/types/)

Shared TypeScript interfaces and type aliases. Purely declarative — no runtime code. Grouped by domain. Any type used by two or more files lives here.

Providers (src/providers/)

React context providers for application-wide concerns.

| Provider | Purpose |
| --- | --- |
| AuthProvider | Holds authentication state; exposes isAuthenticated, login(), and logout() via useAuth() |
| TimezoneProvider | Reads the configured IANA timezone from the backend and supplies it to all children via useTimezone() |
| ThemeProvider | Manages light/dark theme selection, supplies the active Fluent UI theme to FluentProvider |

Theme (src/theme/)

Fluent UI custom theme definitions and design token constants. No component logic — only colours, spacing, and sizing values.

Utils (src/utils/)

Pure helper functions with no React or framework dependency. Date formatting, IP display formatting, shared constants, and cryptographic utilities.

| Utility | Purpose |
| --- | --- |
| formatDate.ts | Date/time formatting with IANA timezone support |
| formatIp.ts | IP address display formatting |
| crypto.ts | sha256Hex(input): SHA-256 digest via browser-native SubtleCrypto API; used to hash passwords before transmission |
| constants.ts | Frontend constants (time presets, etc.) |

4. Data Flow

4.1 Request Lifecycle

Every user action follows this flow through the system:

User Action (click, form submit)
       │
       ▼
   Page / Component
       │  calls hook
       ▼
   Hook (useXxx)
       │  calls API function
       ▼
   API Layer (src/api/)
       │  HTTP request
       ▼
   FastAPI Router (app/routers/)
       │  validates input (Pydantic)
       │  calls Depends() for auth + services
       ▼
   Service (app/services/)
       │  enforces business rules
       │  calls repository or fail2ban client
       ▼
   Repository (app/repositories/)     or     fail2ban Client (app/utils/)
       │  executes SQL query                       │  sends socket command
       ▼                                           ▼
   SQLite Database                             fail2ban Server
       │                                           │
       └──────────── response bubbles back up ─────┘

4.2 Authentication Flow

┌─────────┐     POST /api/auth/login      ┌─────────────┐
│  Login   │ ─────────────────────────────▶│ auth router  │
│  Page    │     { password: "***" }       │              │
└─────────┘                                └──────┬───────┘
                                                  │
                                           ┌──────┴───────┐
                                           │ auth_service  │
                                           │ - verify hash │
                                           │ - create token│
                                           └──────┬───────┘
                                                  │
                                           ┌──────┴───────┐
                                           │ session_repo  │
                                           │ - store token │
                                           └──────┬───────┘
                                                  │
  Set-Cookie: session=<token>                     │
◀─────────────────────────────────────────────────┘
  • The master password is hashed and stored during setup.
  • On login, the submitted password is verified against the stored hash.
  • A session token is created, stored in the database, and returned as an HTTP-only cookie.
  • Every subsequent request is authenticated via the session cookie using a FastAPI dependency.
  • The AuthProvider on the frontend guards all routes except /setup and /login.
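
The token step of this flow can be sketched with the standard library. This is an illustration, not the code in auth_service or session_repo: the two-column sessions schema and the function names are assumptions. Only the idea of persisting a SHA-256 hash in a token_hash column instead of the raw token comes from the session management section of this document.

```python
import hashlib
import secrets
import sqlite3
import time

# Assumed minimal schema; the real sessions table may have more columns.
SCHEMA = "CREATE TABLE IF NOT EXISTS sessions (token_hash TEXT PRIMARY KEY, expires_at REAL NOT NULL)"


def hash_token(token):
    """One-way SHA-256 hash, so the raw token never reaches the database."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def create_session(conn, ttl_seconds, now=None):
    """Store the hash of a fresh random token and return the raw token for the cookie."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    conn.execute(
        "INSERT INTO sessions (token_hash, expires_at) VALUES (?, ?)",
        (hash_token(token), now + ttl_seconds),
    )
    conn.commit()
    return token


def is_session_valid(conn, token, now=None):
    """Hash the incoming cookie value and look it up; expired rows count as invalid."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT expires_at FROM sessions WHERE token_hash = ?", (hash_token(token),)
    ).fetchone()
    return row is not None and row[0] > now
```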

4.3 fail2ban Communication

BanGUI communicates with fail2ban through its Unix domain socket using the fail2ban client-server protocol.

┌────────────────────┐          ┌──────────────────┐
│  ban_service.py    │          │  fail2ban server  │
│  jail_service.py   │──socket──│                   │
│  config_service.py │          │  /var/run/fail2ban│
│  health_service.py │          │  /fail2ban.sock   │
└────────────────────┘          └──────────────────┘

The fail2ban_client.py utility module wraps this communication:

  • Opens an async connection to the Unix socket
  • Serialises commands using the fail2ban protocol (pickle-based, see ./fail2ban-master/fail2ban/client/csocket.py)
  • Parses responses into typed Python objects
  • Handles connection errors gracefully (timeout, socket not found, permission denied)
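
The serialisation and framing steps can be sketched without opening a socket. The end-marker value and the pickle protocol version below are assumptions to be confirmed against ./fail2ban-master/fail2ban/protocol.py and csocket.py; the function names are hypothetical and not taken from fail2ban_client.py.

```python
import pickle

# Assumed value of CSPROTO.END; verify against ./fail2ban-master/fail2ban/protocol.py.
END_MARKER = b"<F2B_END_COMMAND>"


def encode_command(command):
    """Pickle a command such as ["status", "sshd"] and append the end marker."""
    return pickle.dumps(list(command), pickle.HIGHEST_PROTOCOL) + END_MARKER


def decode_reply(raw):
    """Strip the end marker from a complete reply and unpickle the payload."""
    if not raw.endswith(END_MARKER):
        raise ValueError("incomplete reply: end marker not received yet")
    # Unpickling is only acceptable here because the peer is the local fail2ban socket.
    return pickle.loads(raw[: -len(END_MARKER)])
```

A reader loop would keep appending received bytes to a buffer until decode_reply() stops raising, which is why the marker check comes first.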

Reference source: The vendored fail2ban source at ./fail2ban-master is included in the repository as an authoritative protocol reference. When implementing or debugging socket communication, consult:

| File | What it documents |
| --- | --- |
| ./fail2ban-master/fail2ban/client/csocket.py | CSocket class: low-level Unix socket connection, pickle serialisation, CSPROTO.END framing |
| ./fail2ban-master/fail2ban/client/fail2banclient.py | Fail2banClient: command dispatch, argument handling, response beautification |
| ./fail2ban-master/fail2ban/client/beautifier.py | Response parser: converts raw server replies into human-readable / structured output |
| ./fail2ban-master/fail2ban/protocol.py | CSPROTO constants and the full list of supported commands with descriptions |
| ./fail2ban-master/fail2ban/client/configreader.py | Config file parsing used by fail2ban; reference for understanding jail/filter structure |

Key commands used:

| Command | Purpose |
| --- | --- |
| status | Get global server status (number of jails, fail2ban version) |
| status <jail> | Get jail detail (banned IPs, failure count, filter info) |
| set <jail> banip <ip> | Ban an IP in a specific jail |
| set <jail> unbanip <ip> | Unban an IP from a specific jail |
| set <jail> idle on/off | Toggle jail idle mode |
| start/stop <jail> | Start or stop a jail |
| reload <jail> | Reload a single jail configuration |
| reload | Reload all jails |
| get <jail> ... | Read jail settings (findtime, bantime, maxretry, filter, actions, etc.) |
| set <jail> ... | Write jail settings |
| set loglevel <level> | Change server log level |
| set logtarget <target> | Change server log target |
| set dbpurgeage <seconds> | Set database purge age |
| flushlogs | Flush and re-open log files |

4.4 fail2ban Database Access

In addition to the live socket, BanGUI reads the fail2ban SQLite database directly for historical data that the socket protocol does not expose (ban history, past log matches). This is read-only access.

history_service.py ──read-only──▶ fail2ban.db (SQLite)

The fail2ban database contains:

  • bans table — historical ban records (IP, jail, timestamp, ban data)
  • jails table — jail definitions
  • logs table — matched log lines per ban

BanGUI queries these tables to power the Ban History page and the per-IP timeline view.
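
One way to make the read-only guarantee explicit is to open the file through an SQLite URI with mode=ro. The sketch below shows that technique only; whether history_service.py does it this way is not stated here, and the column names in the query are assumptions to check against the fail2ban schema in use.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path):
    """Open an SQLite file so that any write attempt fails at the driver level."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def bans_for_ip(conn, ip):
    """Return the ban rows for one IP, newest first (assumed column names)."""
    return conn.execute(
        "SELECT jail, ip, timeofban FROM bans WHERE ip = ? ORDER BY timeofban DESC",
        (ip,),
    ).fetchall()
```

With this connection, an accidental INSERT or DELETE raises sqlite3.OperationalError instead of modifying fail2ban's data.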

4.5 External API Communication

geo_service.py ──aiohttp──▶ IP Geolocation API (country, ASN, RIR)
blocklist_service.py ──aiohttp──▶ Blocklist URLs (plain-text IP lists)

All external HTTP calls go through a shared aiohttp.ClientSession created during startup and closed during shutdown. External data is validated before use (IP format, response structure).
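
As an example of validating external data before use, a downloaded plain-text blocklist can be filtered with the ipaddress module. This is a sketch under assumptions: the function name is hypothetical, and treating "#" as a comment marker is a common blocklist convention rather than something this document specifies.

```python
import ipaddress


def parse_blocklist(text):
    """Return (valid entries, number of rejected lines) from a plain-text IP list."""
    entries = []
    skipped = 0
    for line in text.splitlines():
        # Assumed convention: everything after "#" is a comment.
        candidate = line.split("#", 1)[0].strip()
        if not candidate:
            continue
        try:
            network = ipaddress.ip_network(candidate, strict=False)
        except ValueError:
            skipped += 1  # not an IPv4/IPv6 address or CIDR range
            continue
        if network.num_addresses == 1:
            entries.append(str(network.network_address))  # single host, no /32 or /128 suffix
        else:
            entries.append(str(network))
    return entries, skipped
```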


5. Database Design

BanGUI maintains its own SQLite database (separate from the fail2ban database) to store application state.

5.1 Application Database Tables

| Table | Purpose |
| --- | --- |
| settings | Key-value store for application configuration (master password hash, fail2ban socket path, database path, timezone, session duration) |
| sessions | Active session token hashes with expiry timestamps. Tokens are stored as one-way SHA256 hashes to prevent token hijacking if the database is exposed. |
| geo_cache | Resolved IP geolocation results (ip, country_code, country_name, asn, org, cached_at, last_seen). Tracks the last time each IP address was referenced to enable retention policies. Entries older than 90 days are automatically purged by the geo_cache_cleanup task to prevent unbounded growth. Loaded into memory at startup via load_cache_from_db(); new entries are flushed back by the geo_cache_flush background task. |
| blocklist_sources | Registered blocklist URLs (id, name, url, enabled, created_at, updated_at) |
| import_logs | Record of every blocklist import run (id, source_id, timestamp, ips_imported, ips_skipped, errors, status) |
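
The interaction between the in-memory geo cache and the geo_cache table (new entries collected in a dirty set, then flushed by a background task) can be sketched as follows. The class is a simplified stand-in, not the real GeoCache: it is synchronous, keeps only two of the columns, and its method names are assumptions.

```python
import sqlite3


class GeoCacheSketch:
    """In-memory lookup cache that remembers which entries still need persisting."""

    def __init__(self):
        self._cache = {}     # ip -> country_code
        self._dirty = set()  # IPs resolved since the last flush

    def store(self, ip, country_code):
        self._cache[ip] = country_code
        self._dirty.add(ip)

    def lookup(self, ip):
        return self._cache.get(ip)

    def flush(self, conn):
        """Write all dirty entries in one transaction; return how many were written."""
        dirty, self._dirty = self._dirty, set()
        rows = [(ip, self._cache[ip]) for ip in dirty]
        try:
            with conn:  # commits on success, rolls back on error
                conn.executemany(
                    "INSERT OR REPLACE INTO geo_cache (ip, country_code) VALUES (?, ?)",
                    rows,
                )
        except sqlite3.Error:
            self._dirty |= dirty  # keep the entries for the next flush attempt
            raise
        return len(rows)
```

Request handlers only touch store() and lookup(), so they never wait on SQLite; the periodic task is the only caller of flush().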

5.2 Database Boundaries

| Database | Owner | BanGUI Access |
| --- | --- | --- |
| BanGUI application DB (bangui.db) | BanGUI | Read + Write |
| fail2ban DB (fail2ban.db) | fail2ban | Read-only (for history queries) |

6. Setup & Configuration Persistence

6.1 Initial Setup Wizard & One-Time Configuration

The setup wizard (POST /api/setup) runs once during first-time startup to configure:

  • Master password (bcrypt-hashed)
  • Runtime database path (where BanGUI stores operational state)
  • fail2ban Unix socket path
  • IANA timezone
  • Session duration (in minutes)
  • Map color thresholds for geolocation visualization

Atomicity & Crash-Safety:

Setup is implemented with explicit transaction boundaries across two SQLite databases (bootstrap config DB and runtime app DB) to ensure atomicity:

  1. Phase 1 (Bootstrap DB transaction): Set setup_state = "in_progress" and persist database_path. This commit is the first checkpoint: if the process crashes after it, the next setup attempt detects the "in_progress" state and cleans up.

  2. Phase 2 (Filesystem + Runtime DB): Initialize runtime database schema outside a transaction (idempotent via CREATE TABLE IF NOT EXISTS).

  3. Phase 3 (Runtime DB transaction): Batch-write all runtime settings (password hash, paths, config) atomically in a single BEGIN IMMEDIATE ... COMMIT transaction. Either all settings are persisted or none are.

  4. Phase 4 (Bootstrap DB transaction): Set setup_state = "complete" and setup_completed = "1". This is the final commit point — only when this succeeds is setup considered complete.
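
The all-or-nothing batch write of Phase 3 can be sketched with the standard sqlite3 module. The real set_settings_batch() lives in settings_repo.py and may differ; this version is synchronous, assumes a settings(key, value) table, and assumes the connection was opened with isolation_level=None so that the explicit BEGIN IMMEDIATE is the only transaction control.

```python
import sqlite3


def set_settings_batch(conn, settings):
    """Write every key/value pair in one transaction: all rows persist or none do."""
    # BEGIN IMMEDIATE takes the write lock up front instead of on the first INSERT.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.executemany(
            "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
            list(settings.items()),
        )
        conn.execute("COMMIT")
    except BaseException:
        # Any failure discards the whole batch, including rows already written.
        conn.execute("ROLLBACK")
        raise
```

For example, if the second of two settings violates a constraint, the first one is rolled back as well and the table keeps its previous contents.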

Password Hash Idempotency:

The bcrypt password hash is computed early (before any DB writes) to ensure that if setup is retried after a crash, the same hash is used throughout all retry attempts. This prevents divergent hashes due to bcrypt's random salt generation.

State Machine:

| State | Meaning | Recovery |
| --- | --- | --- |
| null | Setup not started | Normal flow: begin setup |
| "in_progress" | Bootstrap DB marked, runtime DB being initialized | Retry from beginning (runtime DB may be partial) |
| "complete" | All settings persisted, setup finished | Skip setup (already done) |

If the next startup finds setup_state = "in_progress", the previous setup run did not finish. Cleanup logic can then either retry setup or remove the partial runtime database before retrying.
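
The recovery decision for the three states can be written as a small lookup. This is a sketch: the action names returned here are hypothetical labels for the recovery descriptions, not identifiers from setup_service.py.

```python
from enum import Enum
from typing import Optional


class SetupState(str, Enum):
    IN_PROGRESS = "in_progress"
    COMPLETE = "complete"


def recovery_action(raw_state: Optional[str]) -> str:
    """Map the stored setup_state value to what the next startup should do."""
    if raw_state is None:
        return "begin_setup"            # setup never started
    state = SetupState(raw_state)       # raises ValueError for unknown values
    if state is SetupState.IN_PROGRESS:
        return "retry_from_beginning"   # runtime DB may be partial
    return "skip_setup"                 # setup already finished
```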

Backward Compatibility:

The setup_completed = "1" key is still written for backward compatibility with cache detection. Current code checks setup_state = "complete", which has clearer semantics.


8. Authentication & Session Management

  • Single-user model — one master password, no usernames.
  • Password is hashed with bcrypt and stored in the application database during setup (see section 6.1).
  • Sessions are token-based, stored server-side in the sessions table as one-way SHA-256 hashes, and delivered to the browser as HTTP-only secure cookies.
  • Session token hashing: session tokens are hashed before storage to prevent token hijacking if the database file is exposed. Only the hash (token_hash) is stored in the database; the raw token is never persisted. When validating a session, the incoming token is hashed before the database lookup. The database alone is therefore not sufficient to hijack a session; an attacker would also need the original token value.
  • Session expiry is configurable (set during setup, stored in settings).
  • The frontend AuthProvider checks session validity on mount and redirects to /login if invalid.
  • The backend dependencies.py provides an authenticated dependency that validates the session cookie on every protected endpoint.
  • Session validation cache (InMemorySessionCache in app.utils.session_cache) — validated session tokens are cached in memory for 10 seconds (configurable via session_cache_ttl_seconds) to avoid a SQLite round-trip on every request from the same browser. The cache is invalidated immediately on logout. ⚠️ This cache is process-local and not safe for multi-worker or distributed deployments. In single-worker mode (enforced by TASK-002), this is safe and improves performance. For multi-worker deployments, replace InMemorySessionCache with a shared backend (Redis, database, shared memory) implementing the SessionCache protocol. See app/utils/session_cache.py module docstring for implementation details.
  • GeoCache: a GeoCache instance is created at startup with a configurable allow_http_fallback flag and stored on app.state.geo_cache. It implements a primary + fallback resolution strategy: (1) try local MaxMind GeoLite2-Country MMDB database (primary, encrypted, no network traffic), (2) if unavailable/no result and allowed, fall back to ip-api.com HTTP API (unencrypted, disabled by default for security). Encapsulates in-memory lookup cache, negative cache for unresolvable IPs (5-minute TTL), dirty set for persistence, and thread-safe async locking. Cache is loaded from the geo_cache SQLite table on startup. New resolutions are accumulated in memory and periodically flushed to the database by the geo_cache_flush background task. Stale entries are re-resolved by the geo_re_resolve task. Injected into routes and tasks via FastAPI's dependency system. See Backend-Development.md § IP Geolocation Resolution for setup and security details.
  • Runtime state (RuntimeState in app.utils.runtime_state) — stores mutable application state: server_status (fail2ban online/offline), last_activation (jail activation tracking), pending_recovery (crash detection), runtime_settings (effective configuration), and service-specific state holders like jail_service_state (JailServiceState for jail capability detection cache). RuntimeState fields are managed through dedicated functions (e.g., record_activation(), clear_pending_recovery()) and via dependency injection to services. Service-specific state (like JailServiceState) is nested within RuntimeState to keep all mutable state in one controlled location. ⚠️ RuntimeState is process-local and only safe when BanGUI runs as a single asyncio worker. Mutations must not span await points (cooperative scheduling within a single event loop is safe). In multi-worker deployments, each process has its own copy — logouts from worker A don't affect worker B's cache, health status updates are per-worker, and activation tracking is unreliable. BanGUI enforces single-worker mode (TASK-002) to prevent this issue. For future multi-worker support, replace RuntimeState with a shared coordination backend (Redis, shared memory, database). See app/utils/runtime_state.py module docstring for details.
  • Setup-completion flag — once is_setup_complete() returns True, the result is stored in app.state._setup_complete_cached. The SetupRedirectMiddleware skips the DB query on all subsequent requests, removing 1 SQL query per request for the common post-setup case. The completion flag is only written after the runtime database is successfully initialized and all initial setup settings are persisted, preventing a failed setup from permanently bypassing the setup wizard.
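
The session validation cache described in this list (short TTL, immediate invalidation on logout) can be sketched as below. It is a simplified stand-in for InMemorySessionCache: the method names and the injectable clock are assumptions, and like the original it is only correct within a single process.

```python
import time


class InMemorySessionCacheSketch:
    """Remember recently validated token hashes for a short time."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so tests can control time
        self._entries = {}    # token_hash -> expiry timestamp

    def mark_valid(self, token_hash):
        """Record that the database just confirmed this session."""
        self._entries[token_hash] = self._clock() + self._ttl

    def is_valid(self, token_hash):
        """True only while the cached confirmation is younger than the TTL."""
        expires_at = self._entries.get(token_hash)
        if expires_at is None:
            return False
        if self._clock() >= expires_at:
            del self._entries[token_hash]  # expired: force a fresh DB lookup
            return False
        return True

    def invalidate(self, token_hash):
        """Called on logout so the session stops working immediately."""
        self._entries.pop(token_hash, None)
```

A cache miss is not a rejection; it only means the caller must fall back to the sessions table and call mark_valid() if the lookup succeeds.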

8.1 CSRF Protection

State-mutating endpoints (POST, PUT, DELETE, PATCH) that use cookie-based authentication are protected against Cross-Site Request Forgery (CSRF) attacks via a custom header check middleware.

Design:

  • For requests authenticated via the session cookie (not Bearer token), the CsrfMiddleware requires the custom header X-BanGUI-Request: 1 to be present.
  • The frontend API client automatically includes this header on all requests.
  • Cross-site fetch() calls cannot set custom headers without CORS preflight, which the backend rejects for non-allowed origins, providing defense-in-depth.
  • Safe HTTP methods (GET, HEAD, OPTIONS) bypass the check.
  • Bearer token authentication (via Authorization: Bearer header) bypasses the check because tokens are not CSRF-vulnerable (they are not automatically sent on cross-origin requests).
  • Requests missing the CSRF header receive a 403 Forbidden response with detail: "CSRF validation failed. Request rejected.".

This mechanism complements the existing SameSite=Lax cookie policy, which blocks traditional <form> POST requests but does not protect against JavaScript-initiated requests on a subdomain or same-origin XSS injection.
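
The decision rules of the design list can be condensed into one pure function. This is a sketch of the rules as stated above, not the CsrfMiddleware source: the function name and the uses_session_cookie parameter are assumptions made for illustration.

```python
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})
CSRF_HEADER = "x-bangui-request"  # header names are case-insensitive


def passes_csrf_check(method, headers, uses_session_cookie):
    """Return True if the request may proceed, False if it should get 403."""
    if method.upper() in SAFE_METHODS:
        return True  # safe methods bypass the check
    if not uses_session_cookie:
        return True  # e.g. Bearer token auth, which browsers do not attach automatically
    normalised = {name.lower(): value for name, value in headers.items()}
    return normalised.get(CSRF_HEADER) == "1"
```

For example, a cookie-authenticated POST without the header is rejected, while the same request with X-BanGUI-Request: 1 passes.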


9. Scheduling

APScheduler 4.x (async mode) manages recurring background tasks.

┌──────────────────────┐
│     APScheduler      │
│  (async, in-process) │
├──────────────────────┤
│  blocklist_import    │  ── runs on configured schedule (default: daily 03:00)
│  geo_cache_cleanup   │  ── runs every 24 hours (nightly)
│  geo_cache_flush     │  ── runs every 60 seconds
│  health_check        │  ── runs every 30 seconds
└──────────────────────┘
  • The scheduler is started during the FastAPI lifespan startup and stopped during shutdown.
  • Job schedules are persisted in the application database so they survive restarts.
  • Users can modify the blocklist import schedule through the web interface.
  • A manual "Run Now" button triggers the blocklist import job outside the schedule.

10.1 Background Tasks and Database Access

  • APScheduler jobs run outside FastAPI request/response scope and therefore cannot rely on Depends(get_db).
  • Background tasks must open their own application database connection via app.db.open_db and close it when the work completes.
  • Use a shared task helper (app.tasks.db.task_db) so every task follows the same async context manager pattern and avoids connection leaks.
  • This pattern is intentional: task code is structurally separate from request-handling dependencies and should not attempt to reuse request-scoped DB connections.
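
The helper pattern can be sketched with an async context manager. The real helper is app.tasks.db.task_db and builds on app.db.open_db; here the blocking sqlite3 module stands in for whatever async driver the project uses, and remove_expired_sessions with its expires_at column is a hypothetical task body.

```python
import sqlite3
from contextlib import asynccontextmanager


@asynccontextmanager
async def task_db(db_path):
    """Open a connection for one task run and always close it afterwards."""
    conn = sqlite3.connect(db_path)  # stand-in for app.db.open_db
    try:
        yield conn
    finally:
        conn.close()  # runs on success and on error, so no connection leaks


async def remove_expired_sessions(db_path, now):
    """Example task body: delete expired sessions and report how many were removed."""
    async with task_db(db_path) as conn:
        cursor = conn.execute("DELETE FROM sessions WHERE expires_at <= ?", (now,))
        conn.commit()
        return cursor.rowcount
```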

9. API Design

9.1 Conventions

  • All endpoints are grouped under /api/ prefix.
  • JSON request and response bodies, validated by Pydantic models.
  • Authentication via session cookie on all endpoints except /api/setup and /api/auth/login.
  • Setup-redirect middleware: while no configuration exists, all API endpoints (except /api/setup and /api/health) return 423 Locked with {"detail": "Setup not complete.", "setup_required": true}. This ensures API consumers can detect setup as a distinct condition rather than transparently following redirects.
  • Standard HTTP status codes: 200 success, 201 created, 204 no content, 400 bad request, 401 unauthorized, 403 forbidden (CSRF check failed), 404 not found, 422 validation error, 423 locked, 500 server error.
  • Error responses follow a consistent shape: { "detail": "Human-readable message" }.

9.2 Endpoint Groups

| Group | Endpoints | Description |
| --- | --- | --- |
| Auth | POST /login, POST /logout | Session management |
| Setup | POST /setup | First-run configuration |
| Dashboard | GET /status, GET /bans | Overview data for the main page |
| Jails | GET /, GET /:name, POST /:name/start, POST /:name/stop, POST /:name/reload, POST /reload-all | Jail listing and controls |
| Bans | POST /ban, POST /unban, POST /unban-all, GET /banned | Ban management |
| Config | GET /, PUT /, POST /test-regex | Configuration viewing and editing |
| History | GET /, GET /ip/:ip | Historical ban browsing |
| Blocklists | GET /sources, POST /sources, DELETE /sources/:id, POST /import, GET /import-log | Blocklist management |
| Geo | GET /lookup/:ip | IP geolocation and enrichment |
| Server | GET /settings, PUT /settings, POST /flush-logs | Server-level settings |

10. Deployment Architecture

┌──────────────────────────────────────────────────┐
│                   Host Machine                   │
│                                                  │
│  ┌─────────────────────────────────────────────┐ │
│  │  Reverse Proxy (nginx / caddy)              │ │
│  │  - TLS termination                          │ │
│  │  - /api/* → backend (uvicorn)               │ │
│  │  - /*     → frontend (static files)         │ │
│  └──────────────┬───────────────┬──────────────┘ │
│                 │               │                │
│  ┌──────────────┴────┐  ┌───────┴──────────────┐ │
│  │ Backend           │  │ Frontend             │ │
│  │ uvicorn + FastAPI │  │ Static build (Vite)  │ │
│  │ (port 8000)       │  │ (served by proxy)    │ │
│  └────────┬──────────┘  └──────────────────────┘ │
│           │                                      │
│  ┌────────┴──────────────────────────────────┐   │
│  │  fail2ban (systemd service)               │   │
│  │  Socket: /var/run/fail2ban/fail2ban.sock  │   │
│  │  Database: /var/lib/fail2ban/fail2ban.db  │   │
│  └───────────────────────────────────────────┘   │
└──────────────────────────────────────────────────┘
  • The backend runs as an ASGI server (uvicorn) behind a reverse proxy.
  • The frontend is built to static files by Vite and served directly by the reverse proxy.
  • The backend process needs write permission on the fail2ban socket (Linux requires it to connect to a Unix domain socket) and read access to the fail2ban database.
  • Both the application database and the fail2ban database reside on the same host.

10.2 nginx Routing Rules

The reverse proxy (nginx) must route requests correctly to prevent frontend SPA fallback rules from hiding backend 404 errors. The following location blocks ensure proper behavior:

Location Block Priority

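nginx selects the location block for a request in the following order:

  1. Exact matches (location =): highest priority.
  2. Regular expression matches (location ~ or ~*): checked in config order, and the first match wins, unless the longest matching prefix location is marked ^~.
  3. Prefix matches (location /prefix): the longest matching prefix is used when no regular expression matches.
  4. Catch-all (location /): the shortest possible prefix, so it applies only when nothing more specific matches.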
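
The selection order above can be modeled in a few lines. This is a simplified sketch for reasoning about routing, not nginx's implementation: it ignores named locations, nested locations, and URI normalization, and the `(modifier, pattern)` tuple representation is invented for illustration.

```python
import re
from typing import List, Optional, Tuple

# (modifier, pattern): modifier is "=", "^~", "~", "~*", or "" for a plain prefix.
Location = Tuple[str, str]


def select_location(uri: str, locations: List[Location]) -> Optional[Location]:
    """Pick the location nginx would use for `uri` (simplified model)."""
    # 1. An exact match ends the search immediately.
    for loc in locations:
        if loc[0] == "=" and loc[1] == uri:
            return loc
    # 2. Remember the longest matching prefix; config order is irrelevant here.
    best: Optional[Location] = None
    for loc in locations:
        if loc[0] in ("", "^~") and uri.startswith(loc[1]):
            if best is None or len(loc[1]) > len(best[1]):
                best = loc
    # A ^~ prefix suppresses the regular expression check.
    if best is not None and best[0] == "^~":
        return best
    # 3. Regular expressions are tried in config order; the first match wins.
    for loc in locations:
        if loc[0] == "~" and re.search(loc[1], uri):
            return loc
        if loc[0] == "~*" and re.search(loc[1], uri, re.IGNORECASE):
            return loc
    # 4. Otherwise fall back to the remembered prefix (possibly the "/" catch-all).
    return best
```

With only the prefix locations /, /api/, and /assets/, a request for /api/typos resolves to /api/. Adding a regular expression location that also matches an API path would take precedence unless /api/ is declared with ^~, which is worth keeping in mind if regex locations are ever introduced.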

Routing Configuration

| Location Block | Rule | Purpose |
|---|---|---|
| location /api/ | proxy_pass http://backend:8000; no try_files | Proxy all API requests to FastAPI backend. Any unmatched API route (typos, invalid paths) returns 404 from the backend. |
| location /assets/ | try_files $uri =404; | Serve static assets with long-term caching. Return 404 if file doesn't exist. |
| location / | try_files $uri $uri/ /index.html; | SPA fallback: serve index.html for all unmatched routes (client-side routing). |

Routing Behavior

Request → /api/some-endpoint
    ↓
    nginx matches location /api/ (longest prefix)
    ↓
    proxy_pass → backend:8000
    ↓
    Backend returns 404 if endpoint doesn't exist (✓ correct)
    Client sees 404, not SPA HTML

Request → /some-page
    ↓
    nginx matches location / (catch-all)
    ↓
    try_files looks for file, then directory, then /index.html
    ↓
    Serves /index.html (React Router handles client-side routing)
    ↓
    Client sees 200 with HTML (✓ correct for SPA)

Request → /api/typos
    ↓
    nginx matches location /api/ (longest prefix, NOT catch-all)
    ↓
    proxy_pass → backend:8000
    ↓
    FastAPI returns 404 (✓ correct, not caught by SPA fallback)

Critical Implementation Notes

  • Never add try_files to the /api/ location block: this would hide backend 404s.
  • The position of the /api/ block relative to the / catch-all does not matter. nginx picks the longest matching prefix regardless of config order, so /api/ always wins for API paths as long as no regular expression location matches them.
  • No inherited try_files rules: no try_files directive is defined outside the location blocks that could apply to /api/ requests.
  • Backend 404 responses pass through nginx unchanged: nginx does not rewrite 404 responses from the backend.

10.3 nginx Security Headers

nginx adds the following OWASP-recommended security headers to all responses:

| Header | Value | Purpose |
|---|---|---|
| Content-Security-Policy | default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data:; font-src 'self'; connect-src 'self'; frame-ancestors 'none'; | Mitigates XSS by restricting script execution to same-origin sources. style-src 'unsafe-inline' is required for Fluent UI v9's inline styles. |
| X-Frame-Options | DENY | Prevents clickjacking by disallowing iframe embedding. |
| X-Content-Type-Options | nosniff | Prevents MIME-sniffing; browsers must respect the declared Content-Type. |
| Referrer-Policy | no-referrer | Prevents leaking internal URLs in the Referer header to third-party resources. |
| Permissions-Policy | geolocation=(), microphone=(), camera=() | Disables access to browser APIs not needed by the application. |
| Strict-Transport-Security | (commented out) | Must only be enabled after HTTPS is fully configured. Uncomment when TLS termination is production-ready. |

All headers use the always directive, ensuring they are included in error responses (4xx, 5xx) as well.
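
One way to check this in a smoke test is to compare a response's headers against the expected set. The helper below is a hypothetical sketch (the name `missing_security_headers` is not part of the project); it checks presence only, not values, and leaves out Strict-Transport-Security because that header is currently commented out.

```python
from typing import Dict, List

# Headers the proxy is expected to add to every response, including 4xx and 5xx.
REQUIRED_HEADERS = (
    "Content-Security-Policy",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
    "Permissions-Policy",
)


def missing_security_headers(headers: Dict[str, str]) -> List[str]:
    """Return the required header names absent from `headers` (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [name for name in REQUIRED_HEADERS if name.lower() not in present]
```

Running the check against both a 200 response and a deliberately triggered 404 would confirm that the always directive is in effect.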

CSP and Fluent UI

Fluent UI v9 applies styles via inline style attributes on DOM elements. To support this, style-src 'unsafe-inline' is required. A stricter CSP using nonces would require server-side rendering of the HTML shell, which is outside the current architecture.


10.4 Deployment Constraints

Single-Worker Requirement

BanGUI's background scheduler must run with exactly one uvicorn worker process.

The application uses APScheduler's AsyncIOScheduler, which is bound to a single asyncio event loop and cannot be safely shared across multiple worker processes. If the app is deployed with --workers N (where N > 1), the following failures occur:

  • Each worker process creates its own independent scheduler instance.
  • All background jobs execute N times simultaneously (once per worker).
  • Results:
    • Duplicate blocklist imports — the same IP ranges are banned N times.
    • Duplicate history entries — the same historical events are recorded N times.
    • Duplicate ban operations — bans are executed multiple times, with potential state conflicts.
    • SQLite lock contention — concurrent writes to the same database from N workers cause lock timeouts.

Enforcement

  1. Environment variable: Set BANGUI_WORKERS=1 (default in Dockerfile.backend).
  2. Detection: On startup, startup_shared_resources() validates BANGUI_WORKERS and raises a clear RuntimeError if it is not 1.
  3. Single-process design: The application is optimized for a single-process, high-concurrency model using asyncio. Request handling is fully async and leverages the event loop efficiently.
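
The startup check in step 2 might look like the sketch below. The function name `validate_worker_count` and the error wording are assumptions, as is treating an unset variable as 1; the actual logic lives in startup_shared_resources() and is not reproduced in this document.

```python
import os
from typing import Mapping


def validate_worker_count(env: Mapping[str, str] = os.environ) -> None:
    """Raise RuntimeError unless BANGUI_WORKERS resolves to exactly 1."""
    # Assumption: an unset variable is treated as the single-worker default.
    raw = env.get("BANGUI_WORKERS", "1")
    try:
        workers = int(raw)
    except ValueError:
        raise RuntimeError(f"BANGUI_WORKERS must be an integer, got {raw!r}") from None
    if workers != 1:
        raise RuntimeError(
            f"BANGUI_WORKERS={workers} is not supported: the background scheduler "
            "requires exactly one worker process."
        )
```

Failing at startup rather than at the first scheduled job keeps the misconfiguration visible before any duplicate bans or imports can happen.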

Future Multi-Worker Support

To safely support multiple workers in the future:

  1. External job store: Move APScheduler from the in-memory job store to a persistent one (e.g., the SQLAlchemy job store backed by PostgreSQL, or the Redis job store).
  2. Distributed locking: Use a distributed lock (Redis, etcd) to ensure only one worker executes each scheduled job.
  3. Process coordination: Designate a single worker to run the scheduler (for example through leader election or an inter-process signal) so that all other workers skip scheduler startup.
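
The idea behind step 2 is that each scheduled run is claimed exactly once. The sketch below illustrates that idea with a uniqueness constraint in a shared table instead of a Redis or etcd lock; this is a deliberately swapped-in technique chosen to stay dependency-free, and the table and function names are hypothetical. SQLite is used only so the example runs standalone, not as a recommendation for a multi-worker deployment.

```python
import sqlite3


def init_claims(conn: sqlite3.Connection) -> None:
    """Create the claim table; the primary key enforces one claim per job run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_claims ("
        " job_id TEXT NOT NULL,"
        " run_at TEXT NOT NULL,"
        " worker TEXT NOT NULL,"
        " PRIMARY KEY (job_id, run_at))"
    )


def try_claim(conn: sqlite3.Connection, job_id: str, run_at: str, worker: str) -> bool:
    """Return True if this worker won the right to execute the given job run."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO job_claims (job_id, run_at, worker) VALUES (?, ?, ?)",
                (job_id, run_at, worker),
            )
        return True
    except sqlite3.IntegrityError:
        # Another worker already inserted the same (job_id, run_at) pair.
        return False
```

A worker would call `try_claim` before running a job and skip the run when it returns False, so a blocklist import scheduled for a given time executes once no matter how many schedulers fire.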

Currently, the single-worker approach is simple, maintainable, and sufficient for BanGUI's operational requirements.


11. Design Principles

These principles govern all architectural decisions in BanGUI.

| Principle | Application |
|---|---|
| Separation of Concerns | Frontend and backend are independent. Backend layers (router → service → repository) never mix responsibilities. |
| Service Independence | Services must not import other services at the same layer (e.g., jail_config_service must not import jail_service). Shared logic belongs in the utils layer (app/utils/). This prevents circular dependencies, improves testability, and keeps each service focused on its domain. |
| Single Responsibility | Each module, service, and component has one well-defined job. |
| Dependency Inversion | Services depend on abstractions (protocols), not concrete implementations. FastAPI Depends() wires everything. |
| Async Everything | All I/O is non-blocking. No synchronous database, HTTP, or socket calls anywhere in the backend. |
| Validate at the Boundary | Pydantic models validate all data entering the backend. TypeScript types enforce structure on the frontend. |
| Fail Fast | Configuration is validated at startup. Invalid input is rejected immediately with clear errors. |
| Composition over Inheritance | Small, focused objects are composed together rather than building deep class hierarchies. |
| DRY | Shared logic lives in utils, hooks, or base services, never duplicated across modules. |
| KISS | The simplest correct solution wins. No premature abstractions or over-engineering. |
| YAGNI | Only build what is needed now. Extend when a real requirement appears. |
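
As an illustration of the Dependency Inversion principle, the sketch below has a service depend on a protocol rather than on a concrete repository. The protocol and method names follow the SettingsRepository and set_settings_batch mentioned in the project history, but the signature shown is an assumption, and `InMemorySettingsRepository` and `ExampleSetupService` are hypothetical classes written for this example.

```python
import asyncio
from typing import Dict, Mapping, Protocol


class SettingsRepository(Protocol):
    """Abstraction the service depends on (signature assumed for illustration)."""

    async def set_settings_batch(self, settings: Mapping[str, str]) -> None: ...


class InMemorySettingsRepository:
    """Test double: satisfies the protocol structurally, with no inheritance."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    async def set_settings_batch(self, settings: Mapping[str, str]) -> None:
        # Build the merged state first, then swap it in as a single assignment.
        self.data = {**self.data, **settings}


class ExampleSetupService:
    """Depends only on the protocol, so any conforming repository can be injected."""

    def __init__(self, repo: SettingsRepository) -> None:
        self._repo = repo

    async def save(self, settings: Mapping[str, str]) -> None:
        await self._repo.set_settings_batch(settings)
```

In the application, FastAPI's Depends() would supply the database-backed repository, while tests can pass the in-memory one without touching a database.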