Lukas 1302ac821f Fix non-atomic setup persistence across DB contexts (Issue #30)

Implement transactional setup with explicit state machine and crash-safety
to prevent partial commits from leaving inconsistent state.

## Changes

### Core Implementation
1. **settings_repo.py**: Add atomic batch settings write
   - New set_settings_batch() method: writes multiple settings in single
     transaction (BEGIN IMMEDIATE ... COMMIT). Either all settings persist
     or none do, preventing partial state if crash occurs mid-batch.

2. **setup_service.py**: Refactor run_setup() with transactional phases
   - Phase 0: Compute password hash early (before any DB writes) to ensure
     idempotency. Same hash is used throughout retries, preventing divergent
     hashes from bcrypt's random salt.
   - Phase 1 (Bootstrap DB transaction): Set setup_state=in_progress and
     database_path, then commit. First checkpoint for crash detection.
   - Phase 2 (Filesystem): Initialize runtime database (idempotent)
   - Phase 3 (Runtime DB transaction): Batch-write all settings atomically
   - Phase 4 (Bootstrap DB transaction): Set setup_state=complete and
     setup_completed=1. Final commit point.

3. **protocols.py**: Add set_settings_batch to SettingsRepository protocol

### Testing
- Added 6 new transactionality tests covering:
  - State machine transitions (None → in_progress → complete)
  - Password hash idempotency across retries
  - Atomic batch writes (all-or-nothing persistence)
  - Bootstrap DB state tracking
  - Database path propagation to both DBs
  - Recovery on partial failure
- All 18 tests pass (12 existing + 6 new)

### Documentation
- Updated Docs/Architekture.md with new section 6:
  - Setup state machine with state transitions
  - Transaction boundary documentation
  - Password hash idempotency rationale
  - Backward compatibility notes

## Design Decisions

### Why This Approach
- Current code already idempotent via INSERT OR REPLACE, but password
  hash non-idempotency created silent inconsistency risk
- Simpler than multi-state machine: 2 states sufficient for detection
- Maintains backward compatibility (setup_completed key still written)
- Explicit transactions make crash-safety obvious to future maintainers

### Crash Scenarios Now Handled
1. Crash after Phase 1 → detected by setup_state=in_progress on retry
2. Crash after Phase 2 → runtime DB may be partial, safe to retry
3. Crash after Phase 3 → runtime DB rolls back on next connection
4. Crash after Phase 4 → setup_completed detected, skipped

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-04-29 19:19:53 +02:00

BanGUI — Architecture

This document describes the system architecture of BanGUI, a web application for monitoring, managing, and configuring fail2ban. It defines every major component, module, and data flow so that any developer can understand how the pieces fit together before writing code.

1. High-Level Overview

BanGUI is a two-tier web application with a clear separation between frontend and backend, connected through a RESTful JSON API.

┌──────────────────────────────────────────────────────────────────┐
│                          Browser                                 │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                   Frontend (React + Fluent UI)             │  │
│  │  TypeScript · Vite · Single-Page Application               │  │
│  └──────────────────────────┬─────────────────────────────────┘  │
└─────────────────────────────┼────────────────────────────────────┘
                              │  HTTP / JSON (REST API)
┌─────────────────────────────┼────────────────────────────────────┐
│                          Server                                  │
│  ┌──────────────────────────┴─────────────────────────────────┐  │
│  │                   Backend (FastAPI)                        │  │
│  │  Python 3.12+ · Async · Pydantic v2 · structlog            │  │
│  └─────┬──────────────┬──────────────┬────────────────────────┘  │
│        │              │              │                           │
│  ┌─────┴─────┐  ┌─────┴─────┐  ┌────┴─────┐                      │
│  │  SQLite   │  │ fail2ban  │  │ External │                      │
│  │  (App DB) │  │  (Socket) │  │   APIs   │                      │
│  └───────────┘  └───────────┘  └──────────┘                      │
└──────────────────────────────────────────────────────────────────┘

Component Summary

Component	Technology	Purpose
Frontend	TypeScript, React, Fluent UI v9, Vite	User interface — displays data, captures user input, communicates with the backend API
Backend	Python 3.12+, FastAPI, Pydantic v2, aiosqlite	Business logic, data persistence, fail2ban communication, scheduling
Application Database	SQLite (via aiosqlite)	Stores BanGUI's own data: configuration, session state, blocklist sources, import logs
fail2ban	Unix domain socket	The monitored service — BanGUI reads status, issues commands, and reads the fail2ban database
MaxMind GeoLite2	Offline MMDB file (mounted into container)	IP geolocation (primary resolver) — local, encrypted
External APIs	HTTP (via aiohttp)	Blocklist downloads; IP geolocation fallback (only if MMDB unavailable and HTTP fallback enabled)

2. Backend Architecture

The backend follows a layered architecture with strict separation of concerns. Dependencies flow inward: routers depend on services, services depend on repositories — never the reverse.

                ┌─────────────────────────────────┐
                │        FastAPI Application       │
                │          (main.py)               │
                └──────────┬───────────────────────┘
                           │
          ┌────────────────┼────────────────┐
          │                │                │
    ┌─────┴──────┐  ┌─────┴──────┐  ┌──────┴──────┐
    │  Routers   │  │   Tasks    │  │   Config    │
    │  (HTTP)    │  │ (Scheduled)│  │ (Settings)  │
    └─────┬──────┘  └─────┬──────┘  └─────────────┘
          │               │
    ┌─────┴───────────────┴──────┐
    │         Services           │
    │     (Business Logic)       │
    └─────┬──────────────┬───────┘
          │              │
    ┌─────┴──────┐ ┌─────┴──────┐
    │Repositories│ │  External  │
    │ (Database) │ │  Clients   │
    └─────┬──────┘ └─────┬──────┘
          │              │
    ┌─────┴──────┐ ┌─────┴──────┐
    │  SQLite    │ │fail2ban /  │
    │            │ │HTTP APIs   │
    └────────────┘ └────────────┘
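
The dependency direction in the diagram can be illustrated with a minimal sketch. The class and function names here (`InMemorySettingsRepository`, `SetupStatus`, `SetupService`, `get_setup_status_handler`) are hypothetical and only stand in for the real modules; the `setup_completed` key is the one mentioned in the commit message above. The point is that the handler knows only the service, and the service knows only the repository contract.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


class SettingsRepository(Protocol):
    """Data-access contract: the service depends on this, not on SQLite."""

    async def get_setting(self, key: str) -> Optional[str]: ...


class InMemorySettingsRepository:
    """Hypothetical stand-in for a SQLite-backed repository."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = data

    async def get_setting(self, key: str) -> Optional[str]:
        return self._data.get(key)


@dataclass(frozen=True)
class SetupStatus:
    """Domain model returned by the service (no HTTP concepts)."""

    completed: bool


class SetupService:
    """Business logic: knows about repositories, nothing about routers."""

    def __init__(self, settings: SettingsRepository) -> None:
        self._settings = settings

    async def get_status(self) -> SetupStatus:
        value = await self._settings.get_setting("setup_completed")
        return SetupStatus(completed=(value == "1"))


async def get_setup_status_handler(service: SetupService) -> Dict[str, bool]:
    """Router-level function: delegates to the service and shapes the reply."""
    status = await service.get_status()
    return {"completed": status.completed}
```

Because the service only sees the protocol, a test can pass the in-memory repository and never touch a database.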

2.1 Project Structure

backend/
├── app/
│   ├── __init__.py
│   ├── main.py                # FastAPI app factory, lifespan, exception handlers
│   ├── config.py              # Pydantic settings (env vars, .env loading)
│   ├── db.py                  # Database connection and initialization
│   ├── exceptions.py          # Shared domain exception classes; all services and routers import from here
│   ├── dependencies.py        # FastAPI Depends() providers (DB, services, auth)
│   ├── models/                # Pydantic schemas
│   │   ├── auth.py            #   Login request/response, session models
│   │   ├── ban.py             #   Ban request/response/domain models
│   │   ├── jail.py            #   Jail request/response/domain models
│   │   ├── config.py          #   Configuration view/edit models
│   │   ├── blocklist.py       #   Blocklist source/import models
│   │   ├── history.py         #   Ban history models
│   │   ├── server.py          #   Server status, health check models
│   │   └── setup.py           #   Setup wizard models
│   ├── routers/               # FastAPI routers (HTTP layer only)
│   │   ├── auth.py            #   POST /api/auth/login, POST /api/auth/logout
│   │   ├── setup.py           #   POST /api/setup (first-run configuration)
│   │   ├── dashboard.py       #   GET /api/dashboard/status, GET /api/dashboard/bans
│   │   ├── jails.py           #   CRUD + controls for jails
│   │   ├── bans.py            #   Ban/unban actions, currently banned list
│   │   ├── config.py          #   View/edit fail2ban configuration
│   │   ├── history.py         #   Historical ban queries
│   │   ├── blocklist.py       #   Blocklist source management, manual import trigger
│   │   ├── geo.py             #   IP geolocation and lookup
│   │   └── server.py          #   Server settings (log level, DB purge, etc.)
│   ├── services/              # Business logic (one service per domain)
│   │   ├── auth_service.py    #   Password verification, session creation/validation
│   │   ├── setup_service.py   #   First-run setup logic, configuration persistence
│   │   ├── jail_service.py    #   Jail listing, start/stop/reload, status aggregation
│   │   ├── ban_service.py     #   Ban/unban execution, currently-banned queries
│   │   ├── config_service.py  #   Read/write fail2ban config, regex validation
│   │   ├── config_file_service.py #   Shared config parsing and file-level operations
│   │   ├── raw_config_io_service.py #   Raw config file I/O wrapper
│   │   ├── jail_config_service.py #   Jail config activation/deactivation logic
│   │   ├── filter_config_service.py #   Filter config lifecycle management
│   │   ├── action_config_service.py #   Action config lifecycle management
│   │   ├── log_service.py     #   Log preview and regex test operations
│   │   ├── fail2ban_metadata_service.py #   Resolve and cache the fail2ban SQLite DB path via the fail2ban socket
│   │   ├── history_service.py #   Historical ban queries, per-IP timeline
│   │   ├── blocklist_service.py #   Orchestration: source CRUD, scheduling, import triggers
│   │   ├── blocklist_downloader.py #   HTTP download with retry logic
│   │   ├── blocklist_parser.py #   Parse and validate IP addresses
│   │   ├── blocklist_ban_executor.py #   Ban execution with error handling
│   │   ├── blocklist_import_workflow.py #   Import orchestration (coordinates components)
│   │   ├── geo_service.py     #   IP-to-country resolution, ASN/RIR lookup
│   │   ├── server_service.py  #   Server settings, log management, DB purge
│   │   └── health_service.py  #   fail2ban connectivity checks, version detection
│   ├── repositories/          # Data access layer (raw queries only)
│   │   ├── settings_repo.py   #   App configuration CRUD in SQLite
│   │   ├── session_repo.py    #   Session storage and lookup
│   │   ├── blocklist_repo.py  #   Blocklist sources and import log persistence
│   │   ├── fail2ban_db_repo.py #   fail2ban SQLite ban history read operations
│   │   ├── geo_cache_repo.py  #   IP geolocation cache persistence
│   │   └── import_log_repo.py #   Import run history records
│   ├── tasks/                 # APScheduler background jobs
│   │   ├── blocklist_import.py #   Scheduled blocklist download and application
│   │   ├── geo_cache_flush.py #   Periodic geo cache persistence (dirty-set flush to SQLite)
│   │   ├── geo_cache_cleanup.py #   Periodic purge of stale geo cache entries
│   │   ├── geo_re_resolve.py  #   Periodic re-resolution of stale geo cache records
│   │   └── health_check.py    #   Periodic fail2ban connectivity probe
│   └── utils/                 # Helpers, constants, shared types
│       ├── fail2ban_client.py #   Async wrapper around the fail2ban socket protocol
│       ├── fail2ban_response.py #   Canonical response parsing: ok(), to_dict(), ensure_list(), is_not_found_error()
│       ├── fail2ban_db_utils.py #   fail2ban database query helpers
│       ├── ip_utils.py        #   IP/CIDR validation and normalisation
│       ├── time_utils.py      #   Timezone-aware datetime helpers
│       ├── config_file_utils.py #   fail2ban config file I/O
│       ├── conffile_parser.py #   fail2ban config file parser/serializer
│       ├── config_parser.py   #   Structured config object parser
│       ├── config_writer.py   #   Atomic config file write operations
│       ├── jail_config.py     #   Jail config helper
│       └── constants.py       #   Shared constants (default paths, limits, etc.)
├── tests/
│   ├── conftest.py            # Shared fixtures (test app, client, mock DB)
│   ├── test_routers/          # One test file per router
│   ├── test_services/         # One test file per service
│   └── test_repositories/     # One test file per repository
├── pyproject.toml
└── .env.example

2.2 Module Purposes

Routers (`app/routers/`)

The HTTP interface layer. Each router maps URL paths to handler functions. Routers parse and validate incoming requests using Pydantic models, delegate all logic to services, and return typed responses. They contain zero business logic.

Router	Prefix	Purpose
`auth.py`	`/api/auth`	Login (password check), logout, session validation
`setup.py`	`/api/setup`	First-run wizard — save initial configuration
`dashboard.py`	`/api/dashboard`	Server status bar data, recent bans for the dashboard
`jails.py`	`/api/jails`	List jails, jail detail, start/stop/reload/idle controls
`bans.py`	`/api/bans`	Ban an IP, unban an IP, unban all, list currently banned IPs
`config.py`	`/api/config`	Read and write fail2ban jail/filter/server configuration via the socket; also serves the fail2ban log tail and service status for the Log tab
`file_config.py`	`/api/config`	Read and write fail2ban config files on disk (jail.d/, filter.d/, action.d/) — list, get, and overwrite raw file contents, toggle jail enabled/disabled
`history.py`	`/api/history`	Query historical bans, per-IP timeline
`blocklist.py`	`/api/blocklists`	CRUD blocklist sources, trigger import, view import logs
`geo.py`	`/api/geo`	IP geolocation lookup, ASN and RIR data
`server.py`	`/api/server`	Log level, log target, DB path, purge age, flush logs
`health.py`	`/api/health`	fail2ban connectivity health check and status

Services (`app/services/`)

The business logic layer. Services orchestrate operations, enforce rules, and coordinate between repositories, the fail2ban client, and external APIs. Each service covers a single domain.

Service Layer Responsibilities:

Services must be independent of HTTP concerns. They work with domain models (DTOs), not response models. This ensures:

Domain logic can evolve without affecting API shape
Services are reusable across different frontends
Testing is simpler (no mocking HTTP response types)
Changes to endpoint responses don't require service changes

Domain Models and Response Mapping:

Services return domain models (e.g., DomainActiveBanList, DomainBansByCountry) that represent pure business logic. Response models (e.g., ActiveBanListResponse, BansByCountryResponse) are defined in app/models/ and used only by routers.

Conversion happens at the router boundary:

Router calls service → receives domain model
Router calls mapper function to convert domain model → response model
Router returns response model to HTTP client

Example:

# In ban_service.py
async def get_active_bans(...) -> DomainActiveBanList:
    """Service returns domain model (not HTTP-aware)."""
    ...

# In routers/bans.py (router boundary)
domain_result = await ban_service.get_active_bans(...)
return map_domain_active_ban_list_to_response(domain_result)

Mapper functions live in app/mappers/ and are thin, mechanical translations between structures.
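
A mapper of this kind might look like the sketch below. The type and function names come from the text above, but every field (`ip`, `jail`, `country_code`, `countryCode`, `total`) is a hypothetical illustration, and plain dataclasses stand in for the Pydantic response models so the example stays dependency-free.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DomainActiveBan:
    """Domain model: field names follow backend conventions (assumed)."""

    ip: str
    jail: str
    country_code: Optional[str]


@dataclass(frozen=True)
class DomainActiveBanList:
    bans: Tuple[DomainActiveBan, ...]


@dataclass(frozen=True)
class ActiveBan:
    """Response model stand-in: field names follow the API shape (assumed)."""

    ip: str
    jail: str
    countryCode: Optional[str]


@dataclass(frozen=True)
class ActiveBanListResponse:
    bans: Tuple[ActiveBan, ...]
    total: int  # API-only metadata, added by the mapper rather than the service


def map_domain_active_ban_to_response(ban: DomainActiveBan) -> ActiveBan:
    """Mechanical field-by-field translation, no business logic."""
    return ActiveBan(ip=ban.ip, jail=ban.jail, countryCode=ban.country_code)


def map_domain_active_ban_list_to_response(
    result: DomainActiveBanList,
) -> ActiveBanListResponse:
    items = tuple(map_domain_active_ban_to_response(b) for b in result.bans)
    return ActiveBanListResponse(bans=items, total=len(items))
```

If the API later renames a field or adds metadata, only the response model and these two functions change; the service keeps returning the same domain objects.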

Motivation:

The fail2ban domain logic doesn't care whether a field is named country_code (snake_case) or countryCode (camelCase)
If the API needs pagination metadata added to the response, only the mapper changes
If repositories change their output schema, only services need updating (routers are unaffected)
Services can be tested with simple dataclasses; no need for Pydantic serialization overhead

Service	Purpose
`auth_service.py`	Hashes and verifies the master password, creates and validates session tokens, enforces session expiry
`setup_service.py`	Validates setup input, persists initial configuration, ensures setup runs only once
`jail_service.py`	Retrieves jail list and details from fail2ban, aggregates metrics (banned count, failure count), sends start/stop/reload/idle commands
`ban_service.py`	Executes ban and unban commands via the fail2ban socket, queries the currently banned IP list, validates IPs before banning
`config_service.py`	Reads active jail and filter configuration from fail2ban, writes configuration changes, validates regex patterns, triggers reload; reads the fail2ban log file tail and queries service status for the Log tab
`file_config_service.py`	Reads and writes raw fail2ban config files on disk (jail.d/, filter.d/, action.d/); lists files, reads content, overwrites files, toggles enabled/disabled
`jail_config_service.py`	Discovers inactive jails by parsing jail.conf / jail.local / jail.d/*; writes .local overrides to activate/deactivate jails; triggers fail2ban reload; validates jail configurations
`filter_config_service.py`	Discovers available filters by scanning filter.d/; reads, creates, updates, and deletes filter definitions; assigns filters to jails
`action_config_service.py`	Discovers available actions by scanning action.d/; reads, creates, updates, and deletes action definitions; assigns actions to jails
`config_file_service.py`	Shared utilities for configuration parsing and manipulation: parses config files, validates names/IPs, manages atomic file writes, probes fail2ban socket
`raw_config_io_service.py`	Low-level file I/O for raw fail2ban config files
`fail2ban_metadata_service.py`	Resolves the fail2ban SQLite database path by querying the fail2ban socket and caches the result for reuse across services
`log_service.py`	Log preview and regex test operations (extracted from config_service)
`history_service.py`	Queries the fail2ban database for historical ban records, builds per-IP timelines, computes ban counts and repeat-offender flags, and syncs new records into BanGUI's archive table
`blocklist_service.py`	Orchestration layer for blocklist imports. Delegates to focused components: `BlocklistDownloader` (HTTP download with retry), `BlocklistParser` (IP validation), `BanExecutor` (fail2ban integration), and `BlocklistImportWorkflow` (orchestrates the flow). Maintains public API for source CRUD, preview, scheduling, and import triggers.
`geo_cache.py`	GeoCache class that encapsulates all IP geolocation caching: resolves IP addresses to country, ASN, and organization using a primary local MaxMind GeoLite2-Country database (if available) with optional HTTP fallback to ip-api.com (disabled by default for security). Maintains in-memory and persistent caches with negative cache support, and manages background re-resolution. Instantiated once at startup with allow_http_fallback flag and stored on `app.state.geo_cache`
`geo_service.py`	(Deprecated) Backward-compatibility wrappers that delegate to the `GeoCache` instance. Kept for compatibility with existing code. New code should use `GeoCache` directly or via dependency injection
`server_service.py`	Reads and writes fail2ban server-level settings (log level, log target, syslog socket, DB location, purge age)
`health_service.py`	Probes fail2ban socket connectivity, retrieves server version and global stats, reports online/offline status

Blocklist Import Architecture

The blocklist import flow has been refactored to separate concerns into focused components:

blocklist_service.py (Public API)
    │
    ├─ import_source() ──┐
    │                    │
    └─ import_all()      ├──> BlocklistImportWorkflow (Orchestrator)
                         │         │
                         │         ├──> BlocklistDownloader
                         │         │       • HTTP GET with retry logic
                         │         │       • Exponential backoff (429, 5xx)
                         │         │       • Timeout handling
                         │         │
                         │         ├──> BlocklistParser
                         │         │       • Parse text to IP lines
                         │         │       • Validate IPv4/IPv6 addresses
                         │         │       • Skip CIDRs and malformed entries
                         │         │
                         │         ├──> BanExecutor
                         │         │       • Ban each IP via fail2ban socket
                         │         │       • Abort on JailNotFoundError
                         │         │       • Continue on individual ban failures
                         │         │
                         │         └──> Geo pre-warming
                         │               (optional batch lookup for newly banned IPs)
                         │
                         └──> Result logging (import_log_repo)

Component Responsibilities:

BlocklistDownloader: Handles HTTP transport concerns (retries, timeouts, backoff)
BlocklistParser: Handles parsing and validation logic (clean, testable, no I/O)
BanExecutor: Handles fail2ban integration with error aggregation
BlocklistImportWorkflow: Coordinates the flow, handles result aggregation and geo pre-warming
blocklist_service.py: Maintains public API (source CRUD, scheduling, import triggers)
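
As an illustration of the parsing step, here is a minimal sketch of a pure function with the behaviour described for BlocklistParser: keep valid IPv4/IPv6 addresses, skip CIDR ranges and malformed entries. The function name `parse_blocklist` and the handling of `#` comments are assumptions, not taken from the project.

```python
import ipaddress
from typing import List, Tuple


def parse_blocklist(text: str) -> Tuple[List[str], int]:
    """Return (valid_ips, skipped_count) for a downloaded blocklist body.

    Assumption: '#' starts a comment, and blank or comment-only lines
    are ignored without being counted as skipped.
    """
    valid: List[str] = []
    skipped = 0
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        try:
            # ip_address() rejects CIDR notation such as "10.0.0.0/8",
            # so ranges fall into the skipped bucket together with junk.
            address = ipaddress.ip_address(line)
        except ValueError:
            skipped += 1
            continue
        valid.append(str(address))  # str() yields the normalised form
    return valid, skipped
```

Keeping the parser free of I/O means it can be tested with plain strings, which matches the "clean, testable, no I/O" goal stated above.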

Benefits of This Architecture:

Each component is independently testable with mock dependencies
Error handling is explicit: a JailNotFoundError aborts the import, while a JailOperationError on an individual IP is recorded and processing continues
Components can be evolved independently (e.g., replace HTTP client, add batch validation)
Logging is contextual and tied to the appropriate layer
Retry logic and transient error handling are isolated
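
The retry behaviour attributed to BlocklistDownloader (exponential backoff on 429 and 5xx) could be isolated roughly as follows. This is a sketch under assumptions: `download_with_retry`, `DownloadError`, the `fetch` callable returning `(status, body)`, and the default delays are all hypothetical, and the HTTP client is injected so the logic can be exercised without a network.

```python
import asyncio
from typing import Awaitable, Callable, Tuple

Fetch = Callable[[str], Awaitable[Tuple[int, str]]]
Sleep = Callable[[float], Awaitable[None]]


class DownloadError(Exception):
    """Raised when a blocklist cannot be downloaded (hypothetical name)."""


def _is_retryable(status: int) -> bool:
    # Transient conditions named in the diagram above: 429 and 5xx.
    return status == 429 or 500 <= status < 600


async def download_with_retry(
    fetch: Fetch,
    url: str,
    *,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Sleep = asyncio.sleep,
) -> str:
    """Fetch url, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = await fetch(url)
        if status == 200:
            return body
        last_attempt = attempt == max_attempts - 1
        if not _is_retryable(status) or last_attempt:
            raise DownloadError(f"{url}: HTTP {status} after {attempt + 1} attempt(s)")
        # Delay doubles each time: base, 2*base, 4*base, ...
        await sleep(base_delay * (2 ** attempt))
    raise DownloadError(f"{url}: no attempts made")  # not reached
```

Injecting `sleep` lets a test record the delays instead of waiting for them.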

DNS-Rebinding Protection

The Vulnerability:

A DNS-rebinding attack exploits a time-of-check-to-time-of-use (TOCTOU) window between when a blocklist URL is validated and when it is actually fetched:

User adds blocklist URL http://attacker.com/blocklist.txt
blocklist_service.create_source() calls validate_blocklist_url() which performs DNS resolution
attacker.com resolves to a public IP (attacker's real server) — validation passes ✓
Later, when BlocklistDownloader fetches the URL, the attacker's DNS server responds with 192.168.1.1
The HTTP client connects to the private IP, potentially accessing internal services

The Protection:

BanGUI closes this window by adding a second DNS-rebinding check at connection time:

Create-time validation (app/utils/ip_utils.py:validate_blocklist_url): Confirms the URL resolves to a public IP when created
Connection-time validation (app/services/dns_validated_connector.py): Validates that all resolved IPs are public when the actual HTTP connection is made

The HTTP session is created with a custom socket factory that intercepts DNS resolution results before socket creation. If any resolved IP is private or reserved, the connection is rejected with a clear error.

Implementation:

app/services/dns_validated_connector.py: Provides create_dns_validated_socket_factory() which returns a socket factory that validates IPs using is_private_ip()
app/startup.py:_create_http_session(): Passes the socket factory to aiohttp.TCPConnector, protecting all HTTP requests globally
All blocklist imports automatically inherit this protection through the shared session

Protected IP Ranges:

The validation blocks all RFC 1918 private ranges, loopback, link-local, ULA, multicast, and reserved addresses:

IPv4: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, 224.0.0.0/4, 240.0.0.0/4, 255.255.255.255/32
IPv6: ::1/128, fe80::/10, fc00::/7, ff00::/8, and others (via ipaddress.IPv6Address.is_private, etc.)
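
A minimal sketch of the address check and a connection-time factory is shown below. `is_private_ip` is named in the text above, but its body here is an assumption built on the standard `ipaddress` module, and `validated_socket_factory` is a hypothetical name. The factory assumes the connector calls it with a getaddrinfo-style tuple `(family, type, proto, canonname, sockaddr)`; whether that matches the installed aiohttp version should be checked against its documentation.

```python
import ipaddress
import socket
from typing import Any, Tuple

AddrInfo = Tuple[int, int, int, str, Tuple[Any, ...]]


def is_private_ip(host: str) -> bool:
    """Return True for any address that must not be reached by a download."""
    address = ipaddress.ip_address(host)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so the IPv4 rules apply.
    if isinstance(address, ipaddress.IPv6Address) and address.ipv4_mapped:
        address = address.ipv4_mapped
    return (
        address.is_private
        or address.is_loopback
        or address.is_link_local
        or address.is_multicast
        or address.is_reserved
        or address.is_unspecified
    )


def validated_socket_factory(addr_info: AddrInfo) -> socket.socket:
    """Create a socket only if the already-resolved address is public.

    Running the check here, after DNS resolution and immediately before
    the connection, is what closes the rebinding window.
    """
    family, sock_type, proto, _canonname, sockaddr = addr_info
    host = sockaddr[0]
    if is_private_ip(host):
        raise OSError(f"refusing to connect to non-public address {host}")
    return socket.socket(family=family, type=sock_type, proto=proto)
```

Raising `OSError` is a design choice in this sketch: connection code generally already treats it as a failed connect, so no extra error path is needed.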

Reference:

OWASP SSRF Prevention Cheat Sheet
Tests: backend/tests/test_services/test_dns_validated_connector.py

Startup DAG (`app/startup_dag.py`, `app/startup.py`)

The startup process is orchestrated by an explicit Directed Acyclic Graph (DAG) that defines all resource initialization stages, their dependencies, health checks, and rollback strategy. This replaces implicit ordering with explicit, documented prerequisites.

Why This Exists:

Previously, startup resources were created in a procedural sequence without documented dependencies. If a stage was reordered or a prerequisite was missed, initialization could fail in non-obvious ways. Partial failures could leave stale resources (open database connections, HTTP sessions, running schedulers) that prevented clean rollback.

Startup Stages (in order):

1. WORKER_MODE
   └─ Validates that BANGUI_WORKERS=1 (scheduler cannot run in multiple workers)

2. DATABASE
   ├─ Prerequisite: WORKER_MODE
   ├─ Creates database directory
   ├─ Initializes database schema
   ├─ Caches setup completion state
   └─ Loads persisted runtime settings

3. GEO_CACHE
   ├─ Prerequisite: DATABASE
   ├─ Loads IP geolocation cache from database
   ├─ Counts unresolved IPs
   ├─ Initializes MaxMind GeoLite2 database
   └─ Configures HTTP fallback (if enabled)

4. HTTP_SESSION
   ├─ Prerequisite: GEO_CACHE
   ├─ Creates aiohttp.ClientSession
   └─ Configures timeouts and connection limits

5. SCHEDULER
   ├─ Prerequisite: HTTP_SESSION
   ├─ Creates APScheduler AsyncIOScheduler
   └─ Starts the scheduler

6. TASKS
   ├─ Prerequisite: SCHEDULER
   ├─ Registers health_check task (fail2ban connectivity probe)
   ├─ Registers blocklist_import task (scheduled imports)
   ├─ Registers geo_cache_cleanup task (stale entry purge)
   ├─ Registers geo_cache_flush task (periodic persistence)
   ├─ Registers geo_re_resolve task (stale record re-resolution)
   ├─ Registers history_sync task (ban history sync)
   └─ Registers session_cleanup task (expired session purge)

Failure Mode & Rollback:

If any stage fails:

All completed stages are rolled back in reverse order (TASKS → SCHEDULER → HTTP_SESSION → GEO_CACHE → DATABASE → WORKER_MODE)
Each rollback suppresses exceptions to ensure all resources are cleaned up
Database connections are closed
HTTP sessions are closed
The scheduler is shut down
The application startup fails with a clear error message
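
The prerequisite check and reverse-order rollback can be sketched in a few lines. `MiniStartupDAG` and `Stage` are hypothetical, simplified stand-ins and not the project's `StartupDAG`; stage names are plain strings here rather than the `StartupStage` enum.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, FrozenSet, List

AsyncFn = Callable[[], Awaitable[None]]


@dataclass(frozen=True)
class Stage:
    name: str
    start: AsyncFn
    rollback: AsyncFn
    prerequisites: FrozenSet[str] = frozenset()


class MiniStartupDAG:
    """Runs stages in order and undoes completed ones if a later one fails."""

    def __init__(self) -> None:
        self.completed: List[str] = []
        self._rollbacks: Dict[str, AsyncFn] = {}

    async def execute_stage(self, stage: Stage) -> None:
        missing = stage.prerequisites - set(self.completed)
        if missing:
            raise RuntimeError(
                f"{stage.name}: prerequisites not completed: {sorted(missing)}"
            )
        await stage.start()
        self.completed.append(stage.name)
        self._rollbacks[stage.name] = stage.rollback

    async def rollback(self) -> None:
        # Reverse order, and one failing cleanup must not block the others.
        for name in reversed(self.completed):
            try:
                await self._rollbacks[name]()
            except Exception:
                pass
        self.completed.clear()

    async def run(self, stages: List[Stage]) -> None:
        try:
            for stage in stages:
                await self.execute_stage(stage)
        except Exception:
            await self.rollback()
            raise  # startup still fails, but with resources released
```

Suppressing exceptions inside `rollback()` mirrors the rule above that each rollback step must not prevent the remaining resources from being cleaned up.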

Health Checks:

After all stages complete, a final health check verifies:

All resources have initialized successfully
Resources pass their individual health_check() methods
No failures occurred during any stage

Implementation:

StartupDAG: Orchestrates the entire flow, manages prerequisites, and handles failures
StartupStage: Enum defining the 6 startup stages
StageDependency: Defines stage metadata (description, prerequisites, rollback policy)
StartupContext: Tracks registered resources, completed stages, and failure state
startup_shared_resources(): Main entry point that builds and executes the DAG
stage*(): Functions that implement each stage's initialization logic

Example Usage in Tests:

# Test that a stage with missing prerequisites fails
# (run inside an async test function so that `await` is valid)
import aiohttp

dag = StartupDAG()
dag.register_stage(StartupStage.HTTP_SESSION, "Create HTTP session",
                   prerequisites=frozenset([StartupStage.DATABASE]))
dag.register_stage(StartupStage.SCHEDULER, "Create scheduler")

async def http_session_func():
    return aiohttp.ClientSession()

# This will raise RuntimeError because DATABASE hasn't completed
await dag.execute_stage(StartupStage.HTTP_SESSION, http_session_func)

Mappers (`app/mappers/`)

The response mapping layer. Mappers convert domain models (returned by services) to response models (consumed by HTTP routers). This layer enforces the separation between business logic and API shape.

Location: app/mappers/

Responsibilities:

Convert service domain models to API response models
Mechanical, thin translation — no business logic
Used exclusively at the router boundary

Pattern:

Each domain model has a corresponding mapper function:

# Domain model (from service)
DomainActiveBan → map_domain_active_ban_to_response() → ActiveBan (response)

# Service returns domain models:
async def get_active_bans(...) -> DomainActiveBanList

# Router converts at the boundary:
domain_result = await ban_service.get_active_bans(...)
return map_domain_active_ban_list_to_response(domain_result)

Why separate?

When API requirements change (e.g., new field added, field renamed), only:

Response model in app/models/ changes
Mapper function in app/mappers/ updates
Routers stay the same
Services don't change

Without this layer, changes to API shape would require modifying services and their tests.

Repositories (`app/repositories/`)

The data access layer. Repositories execute raw SQL queries against the application SQLite database. They return plain data or domain models — they never raise HTTP exceptions or contain business logic.

Repository	Purpose
`settings_repo.py`	CRUD operations for application settings (master password hash, DB path, fail2ban socket path, preferences)
`session_repo.py`	Store, retrieve, and delete session records for authentication
`blocklist_repo.py`	Persist blocklist source definitions (name, URL, enabled/disabled)
`fail2ban_db_repo.py`	Read historical ban records from the fail2ban SQLite database
`geo_cache_repo.py`	Persist and query IP geo resolution cache
`import_log_repo.py`	Record import run results (timestamp, source, IPs imported, errors) for the import log view

Every repository in app/repositories/ has a corresponding protocol in app/repositories/protocols.py, including settings_repo.py and history_archive_repo.py.
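
To make the protocol idea concrete, here is a minimal sketch of a settings protocol together with an all-or-nothing batch write. `set_settings_batch` and the `BEGIN IMMEDIATE ... COMMIT` pattern come from the commit message above; everything else (`get_setting`, `SqliteSettingsRepository`, `open_settings_db`, the `settings` table layout) is assumed, and the synchronous `sqlite3` module stands in for aiosqlite.

```python
import sqlite3
from typing import Mapping, Optional, Protocol


class SettingsRepository(Protocol):
    """Contract that services type against (method set is illustrative)."""

    def get_setting(self, key: str) -> Optional[str]: ...

    def set_settings_batch(self, settings: Mapping[str, str]) -> None: ...


def open_settings_db(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None disables implicit transactions, so the explicit
    # BEGIN IMMEDIATE / COMMIT below fully controls the boundaries.
    return sqlite3.connect(path, isolation_level=None)


class SqliteSettingsRepository:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS settings ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def get_setting(self, key: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT value FROM settings WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else row[0]

    def set_settings_batch(self, settings: Mapping[str, str]) -> None:
        """Write every setting or none of them."""
        self._conn.execute("BEGIN IMMEDIATE")
        try:
            self._conn.executemany(
                "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
                list(settings.items()),
            )
            self._conn.execute("COMMIT")
        except Exception:
            self._conn.execute("ROLLBACK")
            raise
```

Because `SqliteSettingsRepository` satisfies the protocol structurally, a service annotated with `SettingsRepository` accepts it, or a test double, without inheritance.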

Models (`app/models/`)

Pydantic schemas that define data shapes and validation. Models are split into three categories per domain: request models, response models, and domain models.

Model file	Purpose
`auth.py`	Login request/response and session models
`ban.py`	Ban creation and lookup models
`blocklist.py`	Blocklist source and import log models
`config.py`	Fail2ban config view/edit models
`file_config.py`	Raw config file read/write models
`geo.py`	Geo and ASN lookup models
`history.py`	Historical ban query and timeline models
`jail.py`	Jail listing and status models
`server.py`	Server status and settings models
`setup.py`	First-run setup wizard models

Tasks (`app/tasks/`)

APScheduler background jobs that run on a schedule without user interaction.

Task	Purpose
`blocklist_import.py`	Downloads all enabled blocklist sources, validates entries, applies bans, records results in the import log
`geo_cache_cleanup.py`	Periodically removes entries from the `geo_cache` table that have not been referenced in the configured retention period (default: 90 days). Prevents unbounded database growth.
`geo_cache_flush.py`	Periodically flushes newly resolved IPs from the in-memory dirty set to the `geo_cache` SQLite table (default: every 60 seconds). GET requests populate only the in-memory cache; this task persists them without blocking any request.
`geo_re_resolve.py`	Periodically re-resolves stale entries in `geo_cache` to keep geolocation data fresh
`health_check.py`	Periodically pings the fail2ban socket and updates the cached server status so the frontend always has fresh data
`history_sync.py`	Periodically copies new records from the fail2ban SQLite database into BanGUI's `history_archive` table; delegates the sync algorithm to `history_service.py`
`session_cleanup.py`	Periodically removes expired sessions from the `sessions` SQLite table (default: every 6 hours). Without this cleanup, the table grows unbounded and degrades query performance.
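
The dirty-set pattern behind `geo_cache_flush.py` can be sketched as follows. `DirtySetGeoCache`, its methods, and the two-column table layout are hypothetical simplifications; only the `geo_cache` table name and the idea of flushing a dirty set on a schedule come from the table above. The synchronous `sqlite3` module stands in for aiosqlite.

```python
import sqlite3
from typing import Dict, Set


class DirtySetGeoCache:
    """In-memory cache that remembers which entries still need persisting."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._memory: Dict[str, str] = {}
        self._dirty: Set[str] = set()
        conn.execute(
            "CREATE TABLE IF NOT EXISTS geo_cache ("
            "ip TEXT PRIMARY KEY, country_code TEXT NOT NULL)"
        )

    def remember(self, ip: str, country_code: str) -> None:
        """Called on the request path: memory only, no database write."""
        if self._memory.get(ip) != country_code:
            self._memory[ip] = country_code
            self._dirty.add(ip)

    def flush_dirty(self) -> int:
        """Called by the scheduled task: persist pending entries in one batch."""
        if not self._dirty:
            return 0
        batch = [(ip, self._memory[ip]) for ip in sorted(self._dirty)]
        with self._conn:  # commits on success, rolls back on error
            self._conn.executemany(
                "INSERT OR REPLACE INTO geo_cache (ip, country_code) "
                "VALUES (?, ?)",
                batch,
            )
        # Cleared only after a successful commit, so a failed flush retries.
        self._dirty.clear()
        return len(batch)
```

A scheduler would call `flush_dirty()` at the configured interval, which keeps database writes off the request path.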

Utils (`app/utils/`)

Pure helper modules with no framework dependencies.

Module	Purpose
`fail2ban_client.py`	Async client that communicates with fail2ban via its Unix domain socket — sends commands and parses responses using the fail2ban protocol. Modelled after `./fail2ban-master/fail2ban/client/csocket.py` and `./fail2ban-master/fail2ban/client/fail2banclient.py`.
`jail_socket.py`	Low-level jail reload operations (`reload_all`) extracted to break service dependencies. Used by `jail_service`, `jail_config_service`, `action_config_service`, and `filter_config_service` to avoid circular imports between sibling services.
`ip_utils.py`	Validates IPv4/IPv6 addresses and CIDR ranges using the `ipaddress` stdlib module, normalises formats
`jail_utils.py`	Jail helper functions for configuration and status inference
`jail_config.py`	Jail config parser and serializer for fail2ban config manipulation
`time_utils.py`	Timezone-aware datetime construction, formatting helpers, time-range calculations
`log_utils.py`	Structured log formatting and enrichment helpers
`conffile_parser.py`	Parses Fail2ban `.conf` files into structured objects and serialises back to text
`config_parser.py`	Builds structured config objects from file content tokens
`config_writer.py`	Atomic config file writes, backups, and safe replace semantics
`config_file_utils.py`	Common file-level config utility helpers
`fail2ban_db_utils.py`	Fail2ban DB path discovery and ban-history parsing helpers
`setup_utils.py`	Setup wizard helper utilities
`constants.py`	Shared constants: default socket path, default database path, time-range presets, parser truthy values, limits

Configuration (`app/config.py`)

A single Pydantic settings model that loads all configuration from environment variables (prefixed BANGUI_) and an optional .env file. Validated at startup — the application refuses to start if required values are missing.

Dependencies (`app/dependencies.py`)

FastAPI Depends() providers that inject shared resources into route handlers: the database connection, service instances, the authenticated session, and the fail2ban client. This is the wiring layer that connects routers to services without tight coupling.

Application Entry Point (`app/main.py`)

The FastAPI app factory. Responsibilities:

Creates the FastAPI instance with metadata (title, version, docs URL)
Registers the lifespan context manager (startup: open DB, create aiohttp session, start scheduler; shutdown: close all)
Mounts all routers
Registers global exception handlers that map domain exceptions to HTTP status codes
Applies the setup-redirect middleware (returns 423 Locked for all API requests when no configuration exists, except for /api/setup and /api/health)

3. Frontend Architecture

The frontend is a React single-page application built with TypeScript, Vite, and Fluent UI v9. It communicates exclusively with the backend REST API — it never accesses fail2ban, the database, or external services directly.

┌──────────────────────────────────────────────────────────────┐
│                     React Application                        │
│                                                              │
│   ┌──────────┐    ┌────────────┐    ┌──────────────────┐    │
│   │  Pages   │───▶│ Components │───▶│   Fluent UI v9   │    │
│   └────┬─────┘    └────────────┘    └──────────────────┘    │
│        │                                                     │
│   ┌────┴─────┐    ┌────────────┐    ┌──────────────────┐    │
│   │  Hooks   │───▶│  API Layer │───▶│  Backend (REST)  │    │
│   └──────────┘    └────────────┘    └──────────────────┘    │
│                                                              │
│   ┌──────────┐    ┌────────────┐    ┌──────────────────┐    │
│   │Providers │    │   Types    │    │     Theme        │    │
│   │(Context) │    │(Interfaces)│    │(Tokens, Styles)  │    │
│   └──────────┘    └────────────┘    └──────────────────┘    │
└──────────────────────────────────────────────────────────────┘

3.1 Project Structure

frontend/
├── public/
├── src/
│   ├── api/                   # API client and per-domain request functions
│   │   ├── client.ts          #   Central fetch wrapper (typed GET/POST/PUT/DELETE)
│   │   ├── endpoints.ts       #   API path constants
│   │   ├── auth.ts            #   Login, logout, session check
│   │   ├── dashboard.ts       #   Dashboard status and ban list
│   │   ├── jails.ts           #   Jail CRUD and controls
│   │   ├── bans.ts            #   Ban/unban actions, banned list
│   │   ├── config.ts          #   Configuration read/write
│   │   ├── history.ts         #   Ban history queries
│   │   ├── blocklist.ts       #   Blocklist source management
│   │   ├── geo.ts             #   IP lookup / geolocation
│   │   └── server.ts          #   Server settings
│   ├── assets/                # Static images, fonts, icons
│   ├── components/            # Reusable UI components
│   │   ├── BanTable.tsx       #   Data table for ban entries
│   │   ├── JailCard.tsx       #   Summary card for a jail
│   │   ├── StatusBar.tsx      #   Server status indicator strip
│   │   ├── TimeRangeSelector.tsx # Quick preset picker (24h, 7d, 30d, 365d)
│   │   ├── IpInput.tsx        #   IP address input with validation
│   │   ├── RegexTester.tsx    #   Side-by-side regex match preview
│   │   ├── WorldMap.tsx       #   Country-outline map with ban counts
│   │   ├── ImportLogTable.tsx #   Blocklist import run history
│   │   ├── ConfirmDialog.tsx  #   Reusable confirmation modal
│   │   ├── RequireAuth.tsx    #   Route guard: redirects unauthenticated users to /login
│   │   ├── SetupGuard.tsx     #   Route guard: redirects to /setup if setup incomplete
│   │   └── ...                #   (additional shared components)
│   ├── hooks/                 # Custom React hooks (stateful logic + API calls)
│   │   ├── useAuth.ts         #   Login state, login/logout actions
│   │   ├── useBans.ts         #   Fetch ban list for a time range
│   │   ├── useJails.ts        #   Fetch jail list and details
│   │   ├── useConfig.ts       #   Fetch and update configuration
│   │   ├── useHistory.ts      #   Fetch historical ban data
│   │   ├── useBlocklists.ts   #   Fetch and manage blocklist sources
│   │   ├── useServerStatus.ts #   Poll server health / status
│   │   └── useGeo.ts          #   IP lookup hook
│   ├── layouts/               # Page-level layout wrappers
│   │   └── AppLayout.tsx      #   Sidebar navigation + header + content area
│   ├── pages/                 # Route-level page components (one per route)
│   │   ├── SetupPage.tsx      #   First-run wizard
│   │   ├── LoginPage.tsx      #   Password prompt
│   │   ├── DashboardPage.tsx  #   Ban overview, status bar
│   │   ├── WorldMapPage.tsx   #   Geographical ban map
│   │   ├── JailsPage.tsx      #   Jail list, detail, controls, ban/unban
│   │   ├── ConfigPage.tsx     #   Configuration viewer/editor
│   │   ├── HistoryPage.tsx    #   Ban history browser
│   │   └── BlocklistPage.tsx  #   Blocklist source management + import log
│   ├── providers/             # React context providers
│   │   ├── AuthProvider.tsx   #   Authentication state and guards
│   │   └── ThemeProvider.tsx  #   Light/dark theme switching
│   ├── theme/                 # Fluent UI theme definitions
│   │   ├── customTheme.ts     #   Brand colour ramp, light and dark themes
│   │   └── tokens.ts          #   Spacing, sizing, and z-index constants
│   ├── types/                 # Shared TypeScript interfaces
│   │   ├── auth.ts            #   LoginRequest, SessionInfo
│   │   ├── ban.ts             #   Ban, BanListResponse, BanRequest
│   │   ├── jail.ts            #   Jail, JailDetail, JailListResponse
│   │   ├── config.ts          #   ConfigSection, ConfigUpdateRequest
│   │   ├── history.ts         #   HistoryEntry, IpTimeline
│   │   ├── blocklist.ts       #   BlocklistSource, ImportLogEntry
│   │   ├── geo.ts             #   GeoInfo, AsnInfo
│   │   ├── server.ts          #   ServerStatus, ServerSettings
│   │   └── api.ts             #   ApiError, PaginatedResponse
│   ├── utils/                 # Pure helper functions
│   │   ├── formatDate.ts      #   Date/time formatting with timezone support
│   │   ├── formatIp.ts        #   IP display formatting
│   │   ├── crypto.ts          #   Browser-native SHA-256 helper (SubtleCrypto)
│   │   └── constants.ts       #   Frontend constants (time presets, etc.)
│   ├── App.tsx                # Root: FluentProvider + BrowserRouter + routes
│   ├── main.tsx               # Vite entry point
│   └── vite-env.d.ts          # Vite type shims
├── tsconfig.json
├── vite.config.ts
└── package.json

3.2 Module Purposes

Pages (`src/pages/`)

Top-level route components. Each page composes layout, components, and hooks to create a full screen. Pages contain no business logic — they orchestrate what is displayed and delegate data fetching to hooks.

Page	Route	Purpose
`SetupPage`	`/setup`	First-run wizard: set master password, database path, fail2ban connection, preferences
`LoginPage`	`/login`	Single-field password prompt; redirects to requested page after success
`DashboardPage`	`/`	Server status bar, ban list table, time-range selector
`WorldMapPage`	`/map`	World map with per-country ban counts, country filter
`JailsPage`	`/jails`	Jail overview list, jail detail panel, controls (start/stop/reload), ban/unban forms, IP lookup, whitelist management
`ConfigPage`	`/config`	View and edit jail parameters, filter regex, server settings, regex tester, add log observation
`HistoryPage`	`/history`	Browse all past bans, filter by jail/IP/time, per-IP timeline drill-down
`BlocklistPage`	`/blocklists`	Manage blocklist sources, schedule configuration, import log, manual import trigger

Components (`src/components/`)

Reusable UI building blocks. Components receive data via props, emit changes via callbacks, and never call the API directly. Built exclusively with Fluent UI v9 components.

Component	Purpose
`StatusBar`	Displays fail2ban server status (online/offline, version, jail count, total bans)
`BanTable`	Sortable data table for ban entries with columns for time, IP, jail, country, etc.
`JailCard`	Summary card showing jail name, status badge, key metrics
`TimeRangeSelector`	Quick-preset picker for filtering data (24h, 7d, 30d, 365d)
`IpInput`	IP address text field with inline validation
`WorldMap`	SVG/Canvas country-outline map with count overlays and click-to-filter
`RegexTester`	Side-by-side sample log + regex input with live match highlighting
`ImportLogTable`	Table displaying blocklist import history
`ConfirmDialog`	Reusable Fluent UI Dialog for destructive action confirmations
`RequireAuth`	Route guard: renders children only when authenticated; otherwise redirects to `/login?next=<path>`
`SetupGuard`	Route guard: checks `GET /api/setup` on mount and redirects to `/setup` if not complete; shows a spinner while loading
`config/ConfigListDetail`	Reusable two-pane master/detail layout used by the Jails, Filters, and Actions config tabs. Left pane lists items with active/inactive badges (active sorted first, keyboard navigable); right pane renders the selected item's detail content. Collapses to a dropdown on narrow screens.
`config/RawConfigSection`	Collapsible section that lazily loads the raw text of a config file into a monospace textarea. Provides a Save button backed by a configurable save callback; shows idle/saving/saved/error feedback. Used by all three config tabs.
`config/AutoSaveIndicator`	Small inline indicator showing the current save state (idle, saving, saved, error) for form fields that auto-save on change.

Hooks (`src/hooks/`)

Encapsulate all stateful logic, side effects, and API calls. Components and pages consume hooks to stay declarative.

Hook	Purpose
`useAuth`	Manages login state, provides `login()`, `logout()`, and `isAuthenticated`
`useBans`	Fetches ban list for a given time range, returns `{ bans, loading, error }`
`useJails`	Fetches jail list and individual jail detail
`useConfig`	Reads and writes fail2ban jail configuration via the socket-based API
`useFilterConfig`	Fetches and manages a single filter file's parsed configuration
`useActionConfig`	Fetches and manages a single action file's parsed configuration
`useJailFileConfig`	Fetches and manages a single jail.d config file
`useConfigActiveStatus`	Derives active status sets for jails, filters, and actions by correlating the live jail list with the config file lists; returns `{ activeJails, activeFilters, activeActions, loading, error, refresh }`
`useAutoSave`	Debounced auto-save hook: invokes a save callback after the user stops typing, tracks saving/saved/error state
`useHistory`	Queries historical ban data with filters
`useBlocklists`	Manages blocklist sources and import triggers
`useServerStatus`	Polls the server status endpoint at an interval
`useGeo`	Performs IP geolocation lookups on demand

API Layer (`src/api/`)

A thin typed wrapper around fetch. All HTTP communication is centralised here — components and hooks never construct HTTP requests directly.

Module	Purpose
`client.ts`	Central `get<T>`, `post<T>`, `put<T>`, `del<T>` functions with error handling and credentials
`endpoints.ts`	All API path constants in one place — no hard-coded URLs anywhere else
`auth.ts`	`login()`, `logout()`, `checkSession()`
`dashboard.ts`	`fetchStatus()`, `fetchRecentBans()`
`jails.ts`	`fetchJails()`, `fetchJailDetail()`, `startJail()`, `stopJail()`, `reloadJail()`
`bans.ts`	`banIp()`, `unbanIp()`, `unbanAll()`, `fetchBannedIps()`
`config.ts`	Socket-based config: `fetchJailConfigs()`, `updateJailConfig()`, `testRegex()`. File-based config: `fetchJailFiles()`, `fetchJailFile()`, `writeJailFile()`, `setJailFileEnabled()`, `fetchFilterFiles()`, `fetchFilterFile()`, `writeFilterFile()`, `fetchActionFiles()`, `fetchActionFile()`, `writeActionFile()`, `reloadConfig()`
`history.ts`	`fetchHistory()`, `fetchIpTimeline()`
`blocklist.ts`	`fetchSources()`, `addSource()`, `removeSource()`, `triggerImport()`, `fetchImportLog()`
`geo.ts`	`lookupIp()`
`server.ts`	`fetchServerSettings()`, `updateServerSettings()`

Types (`src/types/`)

Shared TypeScript interfaces and type aliases. Purely declarative — no runtime code. Grouped by domain. Any type used by two or more files lives here.

Providers (`src/providers/`)

React context providers for application-wide concerns.

Provider	Purpose
`AuthProvider`	Holds authentication state; exposes `isAuthenticated`, `login()`, and `logout()` via `useAuth()`
`TimezoneProvider`	Reads the configured IANA timezone from the backend and supplies it to all children via `useTimezone()`
`ThemeProvider`	Manages light/dark theme selection, supplies the active Fluent UI theme to `FluentProvider`

Theme (`src/theme/`)

Fluent UI custom theme definitions and design token constants. No component logic — only colours, spacing, and sizing values.

Utils (`src/utils/`)

Pure helper functions with no React or framework dependency. Date formatting, IP display formatting, shared constants, and cryptographic utilities.

Utility	Purpose
`formatDate.ts`	Date/time formatting with IANA timezone support
`formatIp.ts`	IP address display formatting
`crypto.ts`	`sha256Hex(input)` — SHA-256 digest via browser-native `SubtleCrypto` API; used to hash passwords before transmission
`constants.ts`	Frontend constants (time presets, etc.)

4. Data Flow

4.1 Request Lifecycle

Every user action follows this flow through the system:

User Action (click, form submit)
       │
       ▼
   Page / Component
       │  calls hook
       ▼
   Hook (useXxx)
       │  calls API function
       ▼
   API Layer (src/api/)
       │  HTTP request
       ▼
   FastAPI Router (app/routers/)
       │  validates input (Pydantic)
       │  calls Depends() for auth + services
       ▼
   Service (app/services/)
       │  enforces business rules
       │  calls repository or fail2ban client
       ▼
   Repository (app/repositories/)     or     fail2ban Client (app/utils/)
       │  executes SQL query                       │  sends socket command
       ▼                                           ▼
   SQLite Database                             fail2ban Server
       │                                           │
       └──────────── response bubbles back up ─────┘

4.2 Authentication Flow

┌─────────┐     POST /api/auth/login      ┌─────────────┐
│  Login   │ ─────────────────────────────▶│ auth router  │
│  Page    │     { password: "***" }       │              │
└─────────┘                                └──────┬───────┘
                                                  │
                                           ┌──────┴───────┐
                                           │ auth_service  │
                                           │ - verify hash │
                                           │ - create token│
                                           └──────┬───────┘
                                                  │
                                           ┌──────┴───────┐
                                           │ session_repo  │
                                           │ - store token │
                                           └──────┬───────┘
                                                  │
  Set-Cookie: session=<token>                     │
◀─────────────────────────────────────────────────┘

The master password is hashed and stored during setup.
On login, the submitted password is verified against the stored hash.
A session token is created, stored in the database, and returned as an HTTP-only cookie.
Every subsequent request is authenticated via the session cookie using a FastAPI dependency.
The AuthProvider on the frontend guards all routes except /setup and /login.
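
The token handling in this flow can be sketched as follows. This is a minimal illustration and not the actual `auth_service` code: the function names are hypothetical, and the only facts taken from this document are that tokens are random, delivered in a cookie, and stored as SHA-256 hashes.

```python
import hashlib
import secrets


def hash_token(token: str) -> str:
    """Return the hex SHA-256 digest that would be stored instead of the raw token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def new_session_token() -> tuple[str, str]:
    """Create a random token; return (raw token for the cookie, hash for the DB)."""
    token = secrets.token_urlsafe(32)  # 32 random bytes encoded as URL-safe text
    return token, hash_token(token)


def is_known_session(cookie_token: str, stored_hashes: set[str]) -> bool:
    """Hash the incoming cookie value before looking it up (hypothetical helper)."""
    return hash_token(cookie_token) in stored_hashes
```

Only the second element of the returned pair would be written to the `sessions` table; the first goes into the `Set-Cookie` header and is never persisted.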

4.3 fail2ban Communication

BanGUI communicates with fail2ban through its Unix domain socket using the fail2ban client-server protocol.

┌────────────────────┐          ┌──────────────────┐
│  ban_service.py    │          │  fail2ban server  │
│  jail_service.py   │──socket──│                   │
│  config_service.py │          │  /var/run/fail2ban│
│  health_service.py │          │  /fail2ban.sock   │
└────────────────────┘          └──────────────────┘

The fail2ban_client.py utility module wraps this communication:

Opens an async connection to the Unix socket
Serialises commands using the fail2ban protocol (pickle-based, see ./fail2ban-master/fail2ban/client/csocket.py)
Parses responses into typed Python objects
Handles connection errors gracefully (timeout, socket not found, permission denied)
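
As a rough sketch of the steps above (not the actual `fail2ban_client.py`), a request can be framed by pickling the command list and appending the end marker. The marker value below is an assumption that should be checked against `CSPROTO.END` in the vendored `protocol.py`; `send_command` and its parameters are hypothetical names.

```python
import asyncio
import pickle

# Assumption: value of CSPROTO.END in ./fail2ban-master/fail2ban/protocol.py.
END_MARKER = b"<F2B_END_COMMAND>"


def encode_command(command: list[str]) -> bytes:
    """Pickle a command such as ["status", "sshd"] and append the end marker."""
    return pickle.dumps(command, pickle.HIGHEST_PROTOCOL) + END_MARKER


def decode_response(raw: bytes) -> object:
    """Strip the trailing end marker and unpickle the reply."""
    if not raw.endswith(END_MARKER):
        raise ValueError("incomplete fail2ban response: end marker missing")
    # Unpickling is only acceptable because the peer is the local fail2ban socket.
    return pickle.loads(raw[: -len(END_MARKER)])


async def send_command(socket_path: str, command: list[str], timeout: float = 5.0) -> object:
    """Hypothetical helper: one request/response round trip over the Unix socket."""
    reader, writer = await asyncio.wait_for(
        asyncio.open_unix_connection(socket_path), timeout
    )
    try:
        writer.write(encode_command(command))
        await writer.drain()
        # readuntil() returns the data including the separator. Replies larger
        # than the default stream limit need a bigger `limit` on the connection.
        raw = await asyncio.wait_for(reader.readuntil(END_MARKER), timeout)
        return decode_response(raw)
    finally:
        writer.close()
        await writer.wait_closed()
```

A missing socket, a permission problem, or a timeout would surface here as `FileNotFoundError`, `PermissionError`, or `asyncio.TimeoutError`, which a real client would presumably map to domain exceptions.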

Reference source: The vendored fail2ban source at ./fail2ban-master is included in the repository as an authoritative protocol reference. When implementing or debugging socket communication, consult:

File	What it documents
`./fail2ban-master/fail2ban/client/csocket.py`	`CSocket` class — low-level Unix socket connection, pickle serialisation, `CSPROTO.END` framing
`./fail2ban-master/fail2ban/client/fail2banclient.py`	`Fail2banClient` — command dispatch, argument handling, response beautification
`./fail2ban-master/fail2ban/client/beautifier.py`	Response parser — converts raw server replies into human-readable / structured output
`./fail2ban-master/fail2ban/protocol.py`	`CSPROTO` constants and the full list of supported commands with descriptions
`./fail2ban-master/fail2ban/client/configreader.py`	Config file parsing used by fail2ban — reference for understanding jail/filter structure

Key commands used:

Command	Purpose
`status`	Get global server status (number of jails, fail2ban version)
`status <jail>`	Get jail detail (banned IPs, failure count, filter info)
`set <jail> banip <ip>`	Ban an IP in a specific jail
`set <jail> unbanip <ip>`	Unban an IP from a specific jail
`set <jail> idle on/off`	Toggle jail idle mode
`start/stop <jail>`	Start or stop a jail
`reload <jail>`	Reload a single jail configuration
`reload`	Reload all jails
`get <jail> ...`	Read jail settings (findtime, bantime, maxretry, filter, actions, etc.)
`set <jail> ...`	Write jail settings
`set loglevel <level>`	Change server log level
`set logtarget <target>`	Change server log target
`set dbpurgeage <seconds>`	Set database purge age
`flushlogs`	Flush and re-open log files
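
For illustration, the commands in the table map naturally onto token lists, which is the form the fail2ban client sends over the socket. The builder functions below are hypothetical and only sketch how a service might validate an IP before constructing a command.

```python
import ipaddress
from typing import Optional


def status_command(jail: Optional[str] = None) -> list[str]:
    """Build `status` or `status <jail>`."""
    return ["status"] if jail is None else ["status", jail]


def ban_command(jail: str, ip: str) -> list[str]:
    """Build `set <jail> banip <ip>`; raises ValueError for an invalid address."""
    return ["set", jail, "banip", str(ipaddress.ip_address(ip))]


def unban_command(jail: str, ip: str) -> list[str]:
    """Build `set <jail> unbanip <ip>`; raises ValueError for an invalid address."""
    return ["set", jail, "unbanip", str(ipaddress.ip_address(ip))]
```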

4.4 fail2ban Database Access

In addition to the live socket, BanGUI reads the fail2ban SQLite database directly for historical data that the socket protocol does not expose (ban history, past log matches). This is read-only access.

history_service.py ──read-only──▶ fail2ban.db (SQLite)

The fail2ban database contains:

bans table: historical ban records (IP, jail, timestamp, ban data)
jails table: jail definitions
logs table: per-jail log file paths and read positions (the log lines matched for a ban are stored in the ban data of the bans table)

BanGUI queries these tables to power the Ban History page and the per-IP timeline view.
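
One way to enforce the read-only guarantee at the connection level is SQLite's `mode=ro` URI parameter. The sketch below uses the synchronous `sqlite3` module for brevity; the function names are hypothetical, and the column names `jail`, `ip`, and `timeofban` are assumptions about the fail2ban schema that should be checked against the installed version.

```python
import sqlite3
from pathlib import Path


def open_fail2ban_db(path: str) -> sqlite3.Connection:
    """Open the fail2ban database read-only via an SQLite URI."""
    uri = f"{Path(path).resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    conn.row_factory = sqlite3.Row
    return conn


def recent_bans(conn: sqlite3.Connection, since_epoch: int, limit: int = 100) -> list[dict]:
    """Return bans at or after `since_epoch`, newest first."""
    # Assumed columns: jail, ip, timeofban (Unix seconds).
    rows = conn.execute(
        "SELECT jail, ip, timeofban FROM bans "
        "WHERE timeofban >= ? ORDER BY timeofban DESC LIMIT ?",
        (since_epoch, limit),
    ).fetchall()
    return [dict(row) for row in rows]
```

With this connection mode, any accidental write raises `sqlite3.OperationalError` instead of modifying fail2ban's data.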

4.5 External API Communication

geo_service.py ──aiohttp──▶ IP Geolocation API (country, ASN, RIR)
blocklist_service.py ──aiohttp──▶ Blocklist URLs (plain-text IP lists)

All external HTTP calls go through a shared aiohttp.ClientSession created during startup and closed during shutdown. External data is validated before use (IP format, response structure).
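
As an example of validating external data before use, a downloaded plain-text blocklist could be filtered with the `ipaddress` module. This is a sketch with a hypothetical function name; treating `#` as a comment marker is an assumption about the list format, and the two return values loosely correspond to imported and skipped counts.

```python
import ipaddress


def parse_blocklist(text: str) -> tuple[list[str], int]:
    """Return (normalised IPs/CIDR ranges, number of invalid entries skipped)."""
    valid: list[str] = []
    skipped = 0
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()  # assumed: '#' starts a comment
        if not entry:
            continue
        try:
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            skipped += 1
            continue
        # Keep single addresses as addresses and ranges in CIDR notation.
        valid.append(str(network) if "/" in entry else str(network.network_address))
    return valid, skipped
```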

5. Database Design

BanGUI maintains its own SQLite database (separate from the fail2ban database) to store application state.

5.1 Application Database Tables

Table	Purpose
`settings`	Key-value store for application configuration (master password hash, fail2ban socket path, database path, timezone, session duration)
`sessions`	Active session token hashes with expiry timestamps. Tokens are stored as one-way SHA-256 hashes so that an exposed database does not reveal usable session tokens.
`geo_cache`	Resolved IP geolocation results (ip, country_code, country_name, asn, org, cached_at, last_seen). The `last_seen` column tracks when each IP address was last referenced, which enables retention policies. Entries not referenced within the retention period (default: 90 days) are purged automatically by the `geo_cache_cleanup` task to prevent unbounded growth. Loaded into memory at startup via `load_cache_from_db()`; new entries are flushed back by the `geo_cache_flush` background task.
`blocklist_sources`	Registered blocklist URLs (id, name, url, enabled, created_at, updated_at)
`import_logs`	Record of every blocklist import run (id, source_id, timestamp, ips_imported, ips_skipped, errors, status)

5.2 Database Boundaries

Database	Owner	BanGUI Access
BanGUI application DB (`bangui.db`)	BanGUI	Read + Write
fail2ban DB (`fail2ban.db`)	fail2ban	Read-only (for history queries)

6. Setup & Configuration Persistence

6.1 Initial Setup Wizard & One-Time Configuration

The setup wizard (POST /api/setup) runs once during first-time startup to configure:

Master password (bcrypt-hashed)
Runtime database path (where BanGUI stores operational state)
fail2ban Unix socket path
IANA timezone
Session duration (in minutes)
Map color thresholds for geolocation visualization

Atomicity & Crash-Safety:

Setup uses explicit transaction boundaries across two SQLite databases (the bootstrap config DB and the runtime app DB). Each phase commits on its own, so a crash between phases leaves a state that the next attempt can detect and recover from:

Phase 1 (Bootstrap DB transaction): Set setup_state = "in_progress" and persist database_path. This commit is the first checkpoint: if the process crashes after it, the next setup attempt detects the "in_progress" state and cleans up.
Phase 2 (Filesystem + Runtime DB): Initialize runtime database schema outside a transaction (idempotent via CREATE TABLE IF NOT EXISTS).
Phase 3 (Runtime DB transaction): Batch-write all runtime settings (password hash, paths, config) atomically in a single BEGIN IMMEDIATE ... COMMIT transaction. Either all settings are persisted or none are.
Phase 4 (Bootstrap DB transaction): Set setup_state = "complete" and setup_completed = "1". This is the final commit point — only when this succeeds is setup considered complete.
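
The all-or-nothing write in Phase 3 can be sketched with the standard `sqlite3` module. This is an illustration under assumptions, not the `settings_repo.py` implementation: the real repository may well be async, and the two-column `settings` table and the key names in the usage example are hypothetical.

```python
import sqlite3


def set_settings_batch(conn: sqlite3.Connection, settings: dict[str, str]) -> None:
    """Write all settings in one transaction: either every row persists or none."""
    # BEGIN IMMEDIATE takes the write lock up front rather than on first write.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.executemany(
            "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
            list(settings.items()),
        )
    except BaseException:
        conn.execute("ROLLBACK")  # discard rows written earlier in this batch
        raise
    conn.execute("COMMIT")


# Usage: isolation_level=None disables the module's implicit transactions,
# so the explicit BEGIN/COMMIT above are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
set_settings_batch(conn, {"timezone": "UTC", "session_duration_minutes": "60"})
```

If any row in a later batch fails, for example on a constraint violation, the rollback leaves the previously committed values untouched.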

Password Hash Idempotency:

The bcrypt password hash is computed early (before any DB writes) to ensure that if setup is retried after a crash, the same hash is used throughout all retry attempts. This prevents divergent hashes due to bcrypt's random salt generation.

State Machine:

State	Meaning	Recovery
`null`	Setup not started	Normal flow: begin setup
`"in_progress"`	Bootstrap DB marked, runtime DB being initialized	Retry from beginning (runtime DB may be partial)
`"complete"`	All settings persisted, setup finished	Skip setup (already done)

If startup finds setup_state = "in_progress", a previous setup attempt crashed partway through. Cleanup logic can then either retry setup directly or remove the partial runtime database before retrying.
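
The recovery column of the state table amounts to a small dispatch on the stored value. The sketch below uses hypothetical names (`SetupAction`, `next_setup_action`) and does not cover the legacy `setup_completed` key.

```python
from enum import Enum
from typing import Optional


class SetupAction(Enum):
    BEGIN = "begin"  # normal first-run flow
    RETRY = "retry"  # previous attempt crashed; start over
    SKIP = "skip"    # setup already finished


def next_setup_action(setup_state: Optional[str]) -> SetupAction:
    """Map the persisted setup_state value to the action taken on the next attempt."""
    if setup_state is None:
        return SetupAction.BEGIN
    if setup_state == "in_progress":
        return SetupAction.RETRY
    if setup_state == "complete":
        return SetupAction.SKIP
    raise ValueError(f"unknown setup_state: {setup_state!r}")
```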

Backward Compatibility:

The setup_completed = "1" key is still written for backward compatibility with cache detection. Modern code checks setup_state = "complete" for clearer semantics.

8. Authentication & Session Management

Single-user model — one master password, no usernames.
The password is hashed with bcrypt and stored in the application database during setup.
Sessions are token-based, stored server-side in the sessions table as one-way SHA-256 hashes, and delivered to the browser as HTTP-only secure cookies.
Session token hashing — Session tokens are hashed before storage to prevent token hijacking if the database file is exposed. Only the hash (token_hash) is stored in the database; the raw token is never persisted. When validating a session, the incoming token is hashed before the database lookup. This ensures the database alone is not sufficient to usurp a session — an attacker would also need knowledge of the original token value.
Session expiry is configurable (set during setup, stored in settings).
The frontend AuthProvider checks session validity on mount and redirects to /login if invalid.
The backend dependencies.py provides an authenticated dependency that validates the session cookie on every protected endpoint.
Session validation cache (InMemorySessionCache in app.utils.session_cache) — validated session tokens are cached in memory for 10 seconds (configurable via session_cache_ttl_seconds) to avoid a SQLite round-trip on every request from the same browser. The cache is invalidated immediately on logout. ⚠️ This cache is process-local and not safe for multi-worker or distributed deployments. In single-worker mode (enforced by TASK-002), this is safe and improves performance. For multi-worker deployments, replace InMemorySessionCache with a shared backend (Redis, database, shared memory) implementing the SessionCache protocol. See app/utils/session_cache.py module docstring for implementation details.
GeoCache — GeoCache instance is created at startup with a configurable allow_http_fallback flag and stored on app.state.geo_cache. It implements a primary + fallback resolution strategy: (1) try local MaxMind GeoLite2-Country MMDB database (primary, encrypted, no network traffic), (2) if unavailable/no result and allowed, fall back to ip-api.com HTTP API (unencrypted, disabled by default for security). Encapsulates in-memory lookup cache, negative cache for unresolvable IPs (5-minute TTL), dirty set for persistence, and thread-safe async locking. Cache is loaded from the geo_cache SQLite table on startup. New resolutions are accumulated in memory and periodically flushed to the database by the geo_cache_flush background task. Stale entries are re-resolved by the geo_re_resolve task. Injected into routes and tasks via FastAPI's dependency system. See Backend-Development.md § IP Geolocation Resolution for setup and security details.
Runtime state (RuntimeState in app.utils.runtime_state) — stores mutable application state: server_status (fail2ban online/offline), last_activation (jail activation tracking), pending_recovery (crash detection), runtime_settings (effective configuration), and service-specific state holders like jail_service_state (JailServiceState for jail capability detection cache). RuntimeState fields are managed through dedicated functions (e.g., record_activation(), clear_pending_recovery()) and via dependency injection to services. Service-specific state (like JailServiceState) is nested within RuntimeState to keep all mutable state in one controlled location. ⚠️ RuntimeState is process-local and only safe when BanGUI runs as a single asyncio worker. Mutations must not span await points (cooperative scheduling within a single event loop is safe). In multi-worker deployments, each process has its own copy — logouts from worker A don't affect worker B's cache, health status updates are per-worker, and activation tracking is unreliable. BanGUI enforces single-worker mode (TASK-002) to prevent this issue. For future multi-worker support, replace RuntimeState with a shared coordination backend (Redis, shared memory, database). See app/utils/runtime_state.py module docstring for details.
Setup-completion flag — once is_setup_complete() returns True, the result is stored in app.state._setup_complete_cached. The SetupRedirectMiddleware skips the DB query on all subsequent requests, removing 1 SQL query per request for the common post-setup case. The completion flag is only written after the runtime database is successfully initialized and all initial setup settings are persisted, preventing a failed setup from permanently bypassing the setup wizard.
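
To illustrate the session validation cache described in this list, a process-local TTL cache might look like the sketch below. The class and method names are hypothetical and do not reflect the actual `InMemorySessionCache` or `SessionCache` protocol; only the 10-second default and the invalidate-on-logout behaviour come from this document.

```python
import time
from typing import Callable


class TtlSessionCache:
    """Remembers recently validated token hashes for a short time (process-local)."""

    def __init__(self, ttl_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._expires_at: dict[str, float] = {}

    def put(self, token_hash: str) -> None:
        """Record a token hash that was just validated against the database."""
        self._expires_at[token_hash] = self._clock() + self._ttl

    def is_fresh(self, token_hash: str) -> bool:
        """True if the hash was validated within the TTL; expired entries are dropped."""
        expires = self._expires_at.get(token_hash)
        if expires is None:
            return False
        if self._clock() >= expires:
            del self._expires_at[token_hash]
            return False
        return True

    def invalidate(self, token_hash: str) -> None:
        """Forget a token hash immediately, for example on logout."""
        self._expires_at.pop(token_hash, None)
```

A cache miss would fall through to the database lookup, so the TTL bounds how long a session revoked elsewhere could still be accepted by this process.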

8.1 CSRF Protection

State-mutating endpoints (POST, PUT, DELETE, PATCH) that use cookie-based authentication are protected against Cross-Site Request Forgery (CSRF) attacks via a custom header check middleware.

Design:

For requests authenticated via the session cookie (not Bearer token), the CsrfMiddleware requires the custom header X-BanGUI-Request: 1 to be present.
The frontend API client automatically includes this header on all requests.
Cross-site fetch() calls cannot set custom headers without triggering a CORS preflight, which the backend rejects for non-allowed origins. This provides defense in depth.
Safe HTTP methods (GET, HEAD, OPTIONS) bypass the check.
Bearer token authentication (via Authorization: Bearer header) bypasses the check because tokens are not CSRF-vulnerable (they are not automatically sent on cross-origin requests).
Requests missing the CSRF header receive a 403 Forbidden response with detail: "CSRF validation failed. Request rejected.".

This mechanism complements the existing SameSite=Lax cookie policy, which blocks traditional <form> POST requests but does not protect against JavaScript-initiated requests on a subdomain or same-origin XSS injection.
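
The decision rules listed under Design can be condensed into one pure function. This is a sketch of the logic only, with a hypothetical function name and signature; the real CsrfMiddleware operates on framework request objects.

```python
from typing import Mapping, Optional

SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})
CSRF_HEADER = "x-bangui-request"


def csrf_rejection(method: str, headers: Mapping[str, str],
                   has_session_cookie: bool) -> Optional[tuple[int, dict]]:
    """Return (status, body) if the request must be rejected, else None."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if method.upper() in SAFE_METHODS:
        return None  # safe methods bypass the check
    if lowered.get("authorization", "").lower().startswith("bearer "):
        return None  # bearer tokens are not sent automatically by browsers
    if not has_session_cookie:
        return None  # assumed: the check applies only to cookie-authenticated requests
    if lowered.get(CSRF_HEADER) == "1":
        return None
    return 403, {"detail": "CSRF validation failed. Request rejected."}
```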

9. Scheduling

APScheduler 4.x (async mode) manages recurring background tasks.

┌──────────────────────┐
│     APScheduler      │
│  (async, in-process) │
├──────────────────────┤
│  blocklist_import    │  ── runs on configured schedule (default: daily 03:00)
│  geo_cache_cleanup   │  ── runs every 24 hours (nightly)
│  geo_cache_flush     │  ── runs every 60 seconds
│  health_check        │  ── runs every 30 seconds
└──────────────────────┘

The scheduler is started during the FastAPI lifespan startup and stopped during shutdown.
Job schedules are persisted in the application database so they survive restarts.
Users can modify the blocklist import schedule through the web interface.
A manual "Run Now" button triggers the blocklist import job outside the schedule.

9.1 Background Tasks and Database Access

APScheduler jobs run outside FastAPI request/response scope and therefore cannot rely on Depends(get_db).
Background tasks must open their own application database connection via app.db.open_db and close it when the work completes.
Use a shared task helper (app.tasks.db.task_db) so every task follows the same async context manager pattern and avoids connection leaks.
This pattern is intentional: task code is structurally separate from request-handling dependencies and should not attempt to reuse request-scoped DB connections.
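
The context manager pattern described above might look like the following sketch. It is not the actual `app.tasks.db.task_db`: the synchronous `sqlite3.connect` call stands in for `app.db.open_db`, and the task function, table columns, and parameters are hypothetical.

```python
import asyncio
import sqlite3
from contextlib import asynccontextmanager
from typing import AsyncIterator


@asynccontextmanager
async def task_db(db_path: str) -> AsyncIterator[sqlite3.Connection]:
    """Open a task-owned connection and always close it, even if the task fails."""
    conn = sqlite3.connect(db_path)  # stand-in for app.db.open_db
    try:
        yield conn
    finally:
        conn.close()


async def remove_expired_sessions(db_path: str, now: float) -> int:
    """Example task body: delete expired rows and report how many were removed."""
    async with task_db(db_path) as db:
        # Assumed schema: sessions(token_hash, expires_at as Unix seconds).
        cursor = db.execute("DELETE FROM sessions WHERE expires_at <= ?", (now,))
        db.commit()
        return cursor.rowcount
```

Because each task enters and leaves its own `async with` block, no connection outlives the job that opened it.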

9. API Design

9.1 Conventions

All endpoints are grouped under /api/ prefix.
JSON request and response bodies, validated by Pydantic models.
Authentication via session cookie on all endpoints except /api/setup and /api/auth/login.
Setup-redirect middleware: while no configuration exists, all API endpoints (except /api/setup and /api/health) return 423 Locked with {"detail": "Setup not complete.", "setup_required": true}. This ensures API consumers can detect setup as a distinct condition rather than transparently following redirects.
Standard HTTP status codes: 200 success, 201 created, 204 no content, 400 bad request, 401 unauthorized, 403 forbidden (including CSRF validation failures), 404 not found, 422 validation error, 423 locked, 500 server error.
Error responses follow a consistent shape: { "detail": "Human-readable message" }.
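
The setup guard described in the conventions above can be expressed as a small pure function. This is a sketch of the decision logic only, not the actual middleware: `setup_guard` and `SETUP_EXEMPT_PATHS` are hypothetical names, and exact-path matching for the exempt endpoints is an assumption.

```python
from typing import Optional, Tuple

# Endpoints that stay reachable before setup has run (from the convention above).
SETUP_EXEMPT_PATHS = ("/api/setup", "/api/health")


def setup_guard(path: str, setup_complete: bool) -> Optional[Tuple[int, dict]]:
    """Return (status, body) to short-circuit the request, or None to let it pass."""
    if setup_complete or not path.startswith("/api/"):
        return None
    # Assumption: exempt endpoints are matched exactly, ignoring a trailing slash.
    if path.rstrip("/") in SETUP_EXEMPT_PATHS:
        return None
    return 423, {"detail": "Setup not complete.", "setup_required": True}
```

A client can then branch on the status code or the `setup_required` flag instead of inspecting a redirect target.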

9.2 Endpoint Groups

Group	Endpoints	Description
Auth	`POST /login`, `POST /logout`	Session management
Setup	`POST /setup`	First-run configuration
Dashboard	`GET /status`, `GET /bans`	Overview data for the main page
Jails	`GET /`, `GET /:name`, `POST /:name/start`, `POST /:name/stop`, `POST /:name/reload`, `POST /reload-all`	Jail listing and controls
Bans	`POST /ban`, `POST /unban`, `POST /unban-all`, `GET /banned`	Ban management
Config	`GET /`, `PUT /`, `POST /test-regex`	Configuration viewing and editing
History	`GET /`, `GET /ip/:ip`	Historical ban browsing
Blocklists	`GET /sources`, `POST /sources`, `DELETE /sources/:id`, `POST /import`, `GET /import-log`	Blocklist management
Geo	`GET /lookup/:ip`	IP geolocation and enrichment
Server	`GET /settings`, `PUT /settings`, `POST /flush-logs`	Server-level settings

9. Deployment Architecture

┌──────────────────────────────────────────────────┐
│                   Host Machine                   │
│                                                  │
│  ┌─────────────────────────────────────────────┐ │
│  │  Reverse Proxy (nginx / caddy)              │ │
│  │  - TLS termination                          │ │
│  │  - /api/* → backend (uvicorn)               │ │
│  │  - /*     → frontend (static files)         │ │
│  └──────────────┬───────────────┬──────────────┘ │
│                 │               │                 │
│  ┌──────────────┴───┐  ┌───────┴──────────────┐  │
│  │ Backend           │  │ Frontend             │  │
│  │ uvicorn + FastAPI │  │ Static build (Vite)  │  │
│  │ (port 8000)       │  │ (served by proxy)    │  │
│  └────────┬──────────┘  └──────────────────────┘  │
│           │                                       │
│  ┌────────┴──────────────────────────────────┐    │
│  │  fail2ban (systemd service)               │    │
│  │  Socket: /var/run/fail2ban/fail2ban.sock  │    │
│  │  Database: /var/lib/fail2ban/fail2ban.db  │    │
│  └───────────────────────────────────────────┘    │
└──────────────────────────────────────────────────┘

The backend runs as an ASGI server (uvicorn) behind a reverse proxy.
The frontend is built to static files by Vite and served directly by the reverse proxy.
The backend process needs read/write access to the fail2ban socket (connecting to a Unix socket and sending commands requires write permission) and read access to the fail2ban database.
Both the application database and the fail2ban database reside on the same host.

10.2 nginx Routing Rules

The reverse proxy (nginx) must route requests correctly to prevent frontend SPA fallback rules from hiding backend 404 errors. The following location blocks ensure proper behavior:

Location Block Priority

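nginx selects the location block that handles a request in the following order:

Exact matches (location =): highest priority
Regular expression matches (location ~ and ~*): checked in configuration order, first match wins, unless the longest matching prefix is marked ^~
Prefix matches (location /prefix): the longest matching prefix is used when no regular expression matches
Catch-all (location /): the shortest possible prefix, so it applies only when nothing more specific matches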

Routing Configuration

Location Block	Rule	Purpose
`location /api/`	`proxy_pass http://backend:8000;` — no `try_files`	Proxy all API requests to FastAPI backend. Any unmatched API route (typos, invalid paths) returns 404 from the backend.
`location /assets/`	`try_files $uri =404;`	Serve static assets with long-term caching. Return 404 if file doesn't exist.
`location /`	`try_files $uri $uri/ /index.html;`	SPA fallback: serve `index.html` for all unmatched routes (client-side routing).

Routing Behavior

Request → /api/some-endpoint
    ↓
    nginx matches location /api/ (longest prefix)
    ↓
    proxy_pass → backend:8000
    ↓
    Backend returns 404 if endpoint doesn't exist (✓ correct)
    Client sees 404, not SPA HTML

Request → /some-page
    ↓
    nginx matches location / (catch-all)
    ↓
    try_files looks for file, then directory, then /index.html
    ↓
    Serves /index.html (React Router handles client-side routing)
    ↓
    Client sees 200 with HTML (✓ correct for SPA)

Request → /api/typos
    ↓
    nginx matches location /api/ (longest prefix, NOT catch-all)
    ↓
    proxy_pass → backend:8000
    ↓
    FastAPI returns 404 (✓ correct, not caught by SPA fallback)
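
The three flows above can be reproduced with a simplified model of prefix matching. The sketch below covers prefix locations only (no exact or regular expression locations, which this configuration does not use), and `match_location` is a hypothetical helper, not part of nginx.

```python
from typing import Dict, Optional

# Prefix locations from the routing table, mapped to what each one does.
LOCATIONS: Dict[str, str] = {
    "/": "SPA fallback (index.html)",
    "/assets/": "static asset or 404",
    "/api/": "proxy to backend",
}


def match_location(path: str, locations: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the longest configured prefix that matches `path`, or None."""
    locations = LOCATIONS if locations is None else locations
    candidates = [prefix for prefix in locations if path.startswith(prefix)]
    if not candidates:
        return None
    # The longest prefix wins; declaration order plays no role for prefix locations.
    return max(candidates, key=len)
```

For example, `match_location("/api/typos")` selects `/api/`, so the request reaches the backend and its 404 is returned as-is, while `match_location("/some-page")` falls through to `/`.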

Critical Implementation Notes

Never add try_files to the /api/ location block — this would hide backend 404s.
The /api/ location takes precedence over the / catch-all regardless of where it appears in the config, because nginx selects the longest matching prefix.
No inherited try_files rules — the /api/ location has no global try_files that could affect it.
Backend 404 responses pass through nginx unchanged — nginx does not rewrite 404 responses from the backend.

9.2a nginx Security Headers

nginx adds the following OWASP-recommended security headers to all responses:

Header	Value	Purpose
Content-Security-Policy	`default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data:; font-src 'self'; connect-src 'self'; frame-ancestors 'none';`	Prevents XSS attacks by restricting script execution to same-origin. `style-src 'unsafe-inline'` is required for Fluent UI v9's inline styles.
X-Frame-Options	`DENY`	Prevents clickjacking by disallowing iframe embedding.
X-Content-Type-Options	`nosniff`	Prevents MIME-sniffing; browsers must respect the declared Content-Type.
Referrer-Policy	`no-referrer`	Prevents leaking internal URLs in the `Referer` header to third-party resources.
Permissions-Policy	`geolocation=(), microphone=(), camera=()`	Disables access to browser APIs not needed by the application.
Strict-Transport-Security	Commented out	Must only be enabled after HTTPS is fully configured. Uncomment when TLS termination is production-ready.

All headers use the always directive, ensuring they are included in error responses (4xx, 5xx) as well.
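
One way to keep the table above honest is a small check that compares a response's headers against the expected values. The sketch below is a hypothetical helper (`missing_security_headers` is not part of BanGUI); the expected values are copied from the table, and Strict-Transport-Security is left out because it is currently commented out.

```python
from typing import Dict, List, Mapping

EXPECTED_HEADERS: Dict[str, str] = {
    "Content-Security-Policy": (
        "default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline'; "
        "img-src 'self' data:; font-src 'self'; connect-src 'self'; "
        "frame-ancestors 'none';"
    ),
    "X-Frame-Options": "DENY",
    "X-Content-Type-Options": "nosniff",
    "Referrer-Policy": "no-referrer",
    "Permissions-Policy": "geolocation=(), microphone=(), camera=()",
}


def missing_security_headers(response_headers: Mapping[str, str]) -> List[str]:
    """Return the names of expected headers that are absent or have another value."""
    # HTTP header names are case-insensitive, so compare on lowercased names.
    lowered = {name.lower(): value.strip() for name, value in response_headers.items()}
    return [
        name
        for name, expected in EXPECTED_HEADERS.items()
        if lowered.get(name.lower()) != expected
    ]
```

Running such a check against both a 200 response and an error response (for example a backend 404) would also exercise the `always` behavior described above.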

CSP and Fluent UI

Fluent UI v9 applies styles via inline style attributes on DOM elements. To support this, style-src 'unsafe-inline' is required. A stricter CSP using nonces would require server-side rendering of the HTML shell, which is outside the current architecture.

9.3 Deployment Constraints

Single-Worker Requirement

BanGUI's background scheduler must run with exactly one uvicorn worker process.

The application uses APScheduler's AsyncIOScheduler, which is bound to a single asyncio event loop and cannot be safely shared across multiple worker processes. If the app is deployed with --workers N (where N > 1), the following failures occur:

Each worker process creates its own independent scheduler instance.
All background jobs execute N times simultaneously (once per worker).
Results:
- Duplicate blocklist imports — the same IP ranges are banned N times.
- Duplicate history entries — the same historical events are recorded N times.
- Duplicate ban operations — bans are executed multiple times, with potential state conflicts.
- SQLite lock contention — concurrent writes to the same database from N workers cause lock timeouts.

Enforcement

Environment variable: Set BANGUI_WORKERS=1 (default in Dockerfile.backend).
Detection: On startup, startup_shared_resources() validates BANGUI_WORKERS and raises a clear RuntimeError if it is not 1.
Single-process design: The application is optimized for a single-process, high-concurrency model using asyncio. Request handling is fully async and leverages the event loop efficiently.
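
The startup check can be sketched as follows. This is not the body of startup_shared_resources(); `validate_worker_count` is a hypothetical helper, and treating an unset BANGUI_WORKERS as 1 is an assumption made for the example.

```python
import os
from typing import Mapping, Optional


def validate_worker_count(env: Optional[Mapping[str, str]] = None) -> int:
    """Raise RuntimeError unless BANGUI_WORKERS resolves to exactly 1."""
    env = os.environ if env is None else env
    raw = env.get("BANGUI_WORKERS", "1")  # assumption: unset means one worker
    try:
        workers = int(raw)
    except ValueError:
        raise RuntimeError(
            f"BANGUI_WORKERS must be the integer 1, got {raw!r}."
        ) from None
    if workers != 1:
        raise RuntimeError(
            f"BANGUI_WORKERS={workers} is not supported: the in-process scheduler "
            "would run every job once per worker. Set BANGUI_WORKERS=1."
        )
    return workers
```

Failing at startup, instead of logging a warning, matches the Fail Fast principle: a misconfigured deployment never reaches the point where jobs run more than once.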

Future Multi-Worker Support

To safely support multiple workers in the future:

External job store: Move APScheduler from in-memory to a persistent store (e.g., a SQLAlchemy-backed job store on PostgreSQL, or a Redis-backed job store).
Distributed locking: Use a distributed lock (Redis, etcd) to ensure only one worker executes each scheduled job.
Process coordination: Add a coordination mechanism between worker processes so that the scheduler runs on exactly one designated worker.

Currently, the single-worker approach is simple, maintainable, and sufficient for BanGUI's operational requirements.

10. Design Principles

These principles govern all architectural decisions in BanGUI.

Principle	Application
Separation of Concerns	Frontend and backend are independent. Backend layers (router → service → repository) never mix responsibilities.
Service Independence	Services must not import other services at the same layer (e.g., `jail_config_service` must not import `jail_service`). Shared logic belongs in the utils layer (`app/utils/`). This prevents circular dependencies, improves testability, and keeps each service focused on its domain.
Single Responsibility	Each module, service, and component has one well-defined job.
Dependency Inversion	Services depend on abstractions (protocols), not concrete implementations. FastAPI `Depends()` wires everything.
Async Everything	All I/O is non-blocking. No synchronous database, HTTP, or socket calls anywhere in the backend.
Validate at the Boundary	Pydantic models validate all data entering the backend. TypeScript types enforce structure on the frontend.
Fail Fast	Configuration is validated at startup. Invalid input is rejected immediately with clear errors.
Composition over Inheritance	Small, focused objects are composed together rather than building deep class hierarchies.
DRY	Shared logic lives in utils, hooks, or base services — never duplicated across modules.
KISS	The simplest correct solution wins. No premature abstractions or over-engineering.
YAGNI	Only build what is needed now. Extend when a real requirement appears.
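
The Dependency Inversion row can be illustrated with a protocol and an in-memory implementation. This is a sketch under assumptions: `set_settings_batch` is the only method name taken from the project, while `get_setting`, `InMemorySettingsRepository`, `SetupService`, and `mark_complete` are hypothetical names used for illustration.

```python
import asyncio
from typing import Dict, Mapping, Optional, Protocol


class SettingsRepository(Protocol):
    """Abstraction the service depends on; no concrete database is named here."""

    async def get_setting(self, key: str) -> Optional[str]: ...

    async def set_settings_batch(self, settings: Mapping[str, str]) -> None: ...


class InMemorySettingsRepository:
    """Test double that satisfies the protocol structurally (no inheritance)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    async def get_setting(self, key: str) -> Optional[str]:
        return self._data.get(key)

    async def set_settings_batch(self, settings: Mapping[str, str]) -> None:
        # A dict update stands in for a single all-or-nothing transaction.
        self._data.update(settings)


class SetupService:
    """Hypothetical service: receives its repository instead of constructing one."""

    def __init__(self, repo: SettingsRepository) -> None:
        self._repo = repo

    async def mark_complete(self) -> None:
        await self._repo.set_settings_batch(
            {"setup_state": "complete", "setup_completed": "1"}
        )
```

Because the service only knows the protocol, a test can pass the in-memory repository while production code wires a database-backed one through FastAPI `Depends()`.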
