BanGUI — Architecture
This document describes the system architecture of BanGUI, a web application for monitoring, managing, and configuring fail2ban. It defines every major component, module, and data flow so that any developer can understand how the pieces fit together before writing code.
1. High-Level Overview
BanGUI is a two-tier web application with a clear separation between frontend and backend, connected through a RESTful JSON API.
┌──────────────────────────────────────────────────────────────────┐
│ Browser │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Frontend (React + Fluent UI) │ │
│ │ TypeScript · Vite · Single-Page Application │ │
│ └──────────────────────────┬─────────────────────────────────┘ │
└─────────────────────────────┼────────────────────────────────────┘
│ HTTP / JSON (REST API)
┌─────────────────────────────┼────────────────────────────────────┐
│ Server │
│ ┌──────────────────────────┴─────────────────────────────────┐ │
│ │ Backend (FastAPI) │ │
│ │ Python 3.12+ · Async · Pydantic v2 · structlog │ │
│ └─────┬──────────────┬──────────────┬────────────────────────┘ │
│ │ │ │ │
│ ┌─────┴─────┐ ┌─────┴─────┐ ┌────┴─────┐ │
│ │ SQLite │ │ fail2ban │ │ External │ │
│ │ (App DB) │ │ (Socket) │ │ APIs │ │
│ └───────────┘ └───────────┘ └──────────┘ │
└──────────────────────────────────────────────────────────────────┘
Component Summary
| Component | Technology | Purpose |
|---|---|---|
| Frontend | TypeScript, React, Fluent UI v9, Vite | User interface — displays data, captures user input, communicates with the backend API |
| Backend | Python 3.12+, FastAPI, Pydantic v2, aiosqlite | Business logic, data persistence, fail2ban communication, scheduling |
| Application Database | SQLite (via aiosqlite) | Stores BanGUI's own data: configuration, session state, blocklist sources, import logs |
| fail2ban | Unix domain socket | The monitored service — BanGUI reads status, issues commands, and reads the fail2ban database |
| MaxMind GeoLite2 | Offline MMDB file (mounted into container) | IP geolocation (primary resolver) — local, encrypted |
| External APIs | HTTP (via aiohttp) | Blocklist downloads; IP geolocation fallback (only if MMDB unavailable and HTTP fallback enabled) |
2. Backend Architecture
The backend follows a layered architecture with strict separation of concerns. Dependencies flow inward: routers depend on services, services depend on repositories — never the reverse.
┌─────────────────────────────────┐
│ FastAPI Application │
│ (main.py) │
└──────────┬──────────────────────┘
│
┌────────────────┼────────────────┐
│ │ │
┌─────┴──────┐ ┌─────┴──────┐ ┌──────┴──────┐
│ Routers │ │ Tasks │ │ Config │
│ (HTTP) │ │ (Scheduled)│ │ (Settings) │
└─────┬──────┘ └─────┬──────┘ └─────────────┘
│ │
┌─────┴───────────────┴──────┐
│ Services │
│ (Business Logic) │
└─────┬──────────────┬───────┘
│ │
┌─────┴──────┐ ┌─────┴──────┐
│Repositories│ │ External │
│ (Database) │ │ Clients │
└─────┬──────┘ └─────┬──────┘
│ │
┌─────┴──────┐ ┌─────┴──────┐
│ SQLite │ │fail2ban / │
│ │ │HTTP APIs │
└────────────┘ └────────────┘
2.1 Project Structure
backend/
├── app/
│ ├── __init__.py
│ ├── `main.py` # FastAPI app factory, lifespan, exception handlers
│ ├── `config.py` # Pydantic settings (env vars, .env loading)
│ ├── `db.py` # Database connection and initialization
│ ├── `exceptions.py` # Shared domain exception classes; all services and routers import from here
│ ├── `dependencies.py` # FastAPI Depends() providers (DB, services, auth)
│ ├── `models/` # Pydantic schemas
│ │ ├── auth.py # Login request/response, session models
│ │ ├── ban.py # Ban request/response/domain models
│ │ ├── jail.py # Jail request/response/domain models
│ │ ├── config.py # Configuration view/edit models
│ │ ├── blocklist.py # Blocklist source/import models
│ │ ├── history.py # Ban history models
│ │ ├── server.py # Server status, health check models
│ │ └── setup.py # Setup wizard models
│ ├── routers/ # FastAPI routers (HTTP layer only)
│ │ ├── auth.py # POST /api/auth/login, POST /api/auth/logout
│ │ ├── setup.py # POST /api/setup (first-run configuration)
│ │ ├── dashboard.py # GET /api/dashboard/status, GET /api/dashboard/bans
│ │ ├── jails.py # CRUD + controls for jails
│ │ ├── bans.py # Ban/unban actions, currently banned list
│ │ ├── config.py # View/edit fail2ban configuration
│ │ ├── history.py # Historical ban queries
│ │ ├── blocklist.py # Blocklist source management, manual import trigger
│ │ ├── geo.py # IP geolocation and lookup
│ │ └── server.py # Server settings (log level, DB purge, etc.)
│ ├── services/ # Business logic (one service per domain)
│ │ ├── auth_service.py # Password verification, session creation/validation
│ │ ├── setup_service.py # First-run setup logic, configuration persistence
│ │ ├── jail_service.py # Jail listing, start/stop/reload, status aggregation
│ │ ├── ban_service.py # Ban/unban execution, currently-banned queries
│ │ ├── config_service.py # Read/write fail2ban config, regex validation
│ │ ├── config_file_service.py # Shared config parsing and file-level operations
│ │ ├── raw_config_io_service.py # Raw config file I/O wrapper
│ │ ├── jail_config_service.py # jail config activation/deactivation logic
│ │ ├── filter_config_service.py # filter config lifecycle management
│ │ ├── action_config_service.py # action config lifecycle management
│ │ ├── log_service.py # Log preview and regex test operations
│ │ ├── fail2ban_metadata_service.py # Resolve and cache the fail2ban SQLite DB path via the fail2ban socket
│ │ ├── history_service.py # Historical ban queries, per-IP timeline
│ │ ├── blocklist_service.py # Orchestration: source CRUD, scheduling, import triggers
│ │ ├── blocklist_downloader.py # HTTP download with retry logic
│ │ ├── blocklist_parser.py # Parse and validate IP addresses
│ │ ├── blocklist_ban_executor.py # Ban execution with error handling
│ │ ├── blocklist_import_workflow.py # Import orchestration (coordinates components)
│ │ ├── geo_service.py # IP-to-country resolution, ASN/RIR lookup
│ │ ├── server_service.py # Server settings, log management, DB purge
│ │ └── health_service.py # fail2ban connectivity checks, version detection
│ ├── repositories/ # Data access layer (raw queries only)
│ │ ├── settings_repo.py # App configuration CRUD in SQLite
│ │ ├── session_repo.py # Session storage and lookup
│ │ ├── blocklist_repo.py # Blocklist sources and import log persistence
│ │ ├── fail2ban_db_repo.py # fail2ban SQLite ban history read operations
│ │ ├── geo_cache_repo.py # IP geolocation cache persistence
│ │ └── import_log_repo.py # Import run history records
│ ├── tasks/ # APScheduler background jobs
│ │ ├── blocklist_import.py # Scheduled blocklist download and application
│ │ ├── geo_cache_flush.py # Periodic geo cache persistence (dirty-set flush to SQLite)
│ │ ├── geo_cache_cleanup.py # Periodic purge of stale geo cache entries
│ │ ├── geo_re_resolve.py # Periodic re-resolution of stale geo cache records
│ │ └── health_check.py # Periodic fail2ban connectivity probe
│ └── utils/ # Helpers, constants, shared types
│ ├── fail2ban_client.py # Async wrapper around the fail2ban socket protocol
│ ├── fail2ban_response.py # Canonical response parsing: ok(), to_dict(), ensure_list(), is_not_found_error()
│ ├── fail2ban_db_utils.py # fail2ban database query helpers
│ ├── ip_utils.py # IP/CIDR validation and normalisation
│ ├── time_utils.py # Timezone-aware datetime helpers
│ ├── config_file_utils.py # fail2ban config file I/O
│ ├── conffile_parser.py # fail2ban config file parser/serializer
│ ├── config_parser.py # Structured config object parser
│ ├── config_writer.py # Atomic config file write operations
│ ├── jail_config.py # Jail config helper
│ └── constants.py # Shared constants (default paths, limits, etc.)
├── tests/
│ ├── conftest.py # Shared fixtures (test app, client, mock DB)
│ ├── test_routers/ # One test file per router
│ ├── test_services/ # One test file per service
│ └── test_repositories/ # One test file per repository
├── pyproject.toml
└── .env.example
2.2 Module Purposes
Routers (app/routers/)
The HTTP interface layer. Each router maps URL paths to handler functions. Routers parse and validate incoming requests using Pydantic models, delegate all logic to services, and return typed responses. They contain zero business logic.
| Router | Prefix | Purpose |
|---|---|---|
| `auth.py` | `/api/auth` | Login (password check), logout, session validation |
| `setup.py` | `/api/setup` | First-run wizard: save initial configuration |
| `dashboard.py` | `/api/dashboard` | Server status bar data, recent bans for the dashboard |
| `jails.py` | `/api/jails` | List jails, jail detail, start/stop/reload/idle controls |
| `bans.py` | `/api/bans` | Ban an IP, unban an IP, unban all, list currently banned IPs |
| `config.py` | `/api/config` | Read and write fail2ban jail/filter/server configuration via the socket; also serves the fail2ban log tail and service status for the Log tab |
| `file_config.py` | `/api/config` | Read and write fail2ban config files on disk (`jail.d/`, `filter.d/`, `action.d/`): list, get, and overwrite raw file contents, toggle jail enabled/disabled |
| `history.py` | `/api/history` | Query historical bans, per-IP timeline |
| `blocklist.py` | `/api/blocklists` | CRUD blocklist sources, trigger import, view import logs |
| `geo.py` | `/api/geo` | IP geolocation lookup, ASN and RIR data |
| `server.py` | `/api/server` | Log level, log target, DB path, purge age, flush logs |
| `health.py` | `/api/health` | fail2ban connectivity health check and status |
Services (app/services/)
The business logic layer. Services orchestrate operations, enforce rules, and coordinate between repositories, the fail2ban client, and external APIs. Each service covers a single domain.
Service Layer Responsibilities:
Services must be independent of HTTP concerns. They work with domain models (DTOs), not response models. This ensures:
- Domain logic can evolve without affecting API shape
- Services are reusable across different frontends
- Testing is simpler (no mocking HTTP response types)
- Changes to endpoint responses don't require service changes
Domain Models and Response Mapping:
Services return domain models (e.g., DomainActiveBanList, DomainBansByCountry) that represent pure business logic. Response models (e.g., ActiveBanListResponse, BansByCountryResponse) are defined in app/models/ and used only by routers.
Conversion happens at the router boundary:
- Router calls service → receives domain model
- Router calls mapper function to convert domain model → response model
- Router returns response model to HTTP client
Example:
# In ban_service.py
async def get_active_bans(...) -> DomainActiveBanList:
    """Service returns domain model (not HTTP-aware)."""
    ...
# In routers/bans.py (router boundary)
domain_result = await ban_service.get_active_bans(...)
return map_domain_active_ban_list_to_response(domain_result)
Mapper functions live in app/mappers/ and are thin, mechanical translations between structures.
Motivation:
- The Fail2ban domain doesn't care about field names like `country_code` (snake_case) vs `countryCode` (camelCase)
- If the API needs pagination metadata added to the response, only the mapper changes
- If repositories change their output schema, only services need updating (routers are unaffected)
- Services can be tested with simple dataclasses; no need for Pydantic serialization overhead
| Service | Purpose |
|---|---|
| `auth_service.py` | Hashes and verifies the master password, creates and validates session tokens, enforces session expiry |
| `setup_service.py` | Validates setup input, persists initial configuration, ensures setup runs only once |
| `jail_service.py` | Retrieves jail list and details from fail2ban, aggregates metrics (banned count, failure count), sends start/stop/reload/idle commands |
| `ban_service.py` | Executes ban and unban commands via the fail2ban socket, queries the currently banned IP list, validates IPs before banning |
| `config_service.py` | Reads active jail and filter configuration from fail2ban, writes configuration changes, validates regex patterns, triggers reload; reads the fail2ban log file tail and queries service status for the Log tab |
| `file_config_service.py` | Reads and writes raw fail2ban config files on disk (`jail.d/`, `filter.d/`, `action.d/`); lists files, reads content, overwrites files, toggles enabled/disabled |
| `jail_config_service.py` | Discovers inactive jails by parsing `jail.conf` / `jail.local` / `jail.d/*`; writes `.local` overrides to activate/deactivate jails; triggers fail2ban reload; validates jail configurations |
| `filter_config_service.py` | Discovers available filters by scanning `filter.d/`; reads, creates, updates, and deletes filter definitions; assigns filters to jails |
| `action_config_service.py` | Discovers available actions by scanning `action.d/`; reads, creates, updates, and deletes action definitions; assigns actions to jails |
| `config_file_service.py` | Shared utilities for configuration parsing and manipulation: parses config files, validates names/IPs, manages atomic file writes, probes fail2ban socket |
| `raw_config_io_service.py` | Low-level file I/O for raw fail2ban config files |
| `fail2ban_metadata_service.py` | Resolves the fail2ban SQLite database path by querying the fail2ban socket and caches the result for reuse across services |
| `log_service.py` | Log preview and regex test operations (extracted from `config_service`) |
| `history_service.py` | Queries the fail2ban database for historical ban records, builds per-IP timelines, computes ban counts and repeat-offender flags, and syncs new records into BanGUI's archive table |
| `blocklist_service.py` | Orchestration layer for blocklist imports. Delegates to focused components: BlocklistDownloader (HTTP download with retry), BlocklistParser (IP validation), BanExecutor (fail2ban integration), and BlocklistImportWorkflow (orchestrates the flow). Maintains public API for source CRUD, preview, scheduling, and import triggers. |
| `geo_cache.py` | GeoCache class that encapsulates all IP geolocation caching: resolves IP addresses to country, ASN, and organization using a primary local MaxMind GeoLite2-Country database (if available) with optional HTTP fallback to ip-api.com (disabled by default for security). Maintains in-memory and persistent caches with negative cache support, and manages background re-resolution. Instantiated once at startup with the `allow_http_fallback` flag and stored on `app.state.geo_cache` |
| `geo_service.py` | (Deprecated) Backward-compatibility wrappers that delegate to the GeoCache instance. Kept for compatibility with existing code. New code should use GeoCache directly or via dependency injection |
| `server_service.py` | Reads and writes fail2ban server-level settings (log level, log target, syslog socket, DB location, purge age) |
| `health_service.py` | Probes fail2ban socket connectivity, retrieves server version and global stats, reports online/offline status |
Blocklist Import Architecture
The blocklist import flow has been refactored to separate concerns into focused components:
blocklist_service.py (Public API)
│
├─ import_source() ──┐
│ │
└─ import_all() ├──> BlocklistImportWorkflow (Orchestrator)
│ │
│ ├──> BlocklistDownloader
│ │ • HTTP GET with retry logic
│ │ • Exponential backoff (429, 5xx)
│ │ • Timeout handling
│ │
│ ├──> BlocklistParser
│ │ • Parse text to IP lines
│ │ • Validate IPv4/IPv6 addresses
│ │ • Skip CIDRs and malformed entries
│ │
│ ├──> BanExecutor
│ │ • Ban each IP via fail2ban socket
│ │ • Abort on JailNotFoundError
│ │ • Continue on individual ban failures
│ │
│ └──> Geo pre-warming
│ (optional batch lookup for newly banned IPs)
│
└──> Result logging (import_log_repo)
Component Responsibilities:
- BlocklistDownloader: Handles HTTP transport concerns (retries, timeouts, backoff)
- BlocklistParser: Handles parsing and validation logic (clean, testable, no I/O)
- BanExecutor: Handles fail2ban integration with error aggregation
- BlocklistImportWorkflow: Coordinates the flow, handles result aggregation and geo pre-warming
- blocklist_service.py: Maintains public API (source CRUD, scheduling, import triggers)
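The parser's "no I/O" role can be sketched with the standard library alone. The function below is a hypothetical illustration, not the project's `BlocklistParser`; the function name, the `#` comment handling, and the return shape are assumptions.

```python
import ipaddress


def parse_blocklist(text: str) -> tuple[list[str], int]:
    """Return (valid_ips, skipped_count) for a raw blocklist body."""
    valid: list[str] = []
    skipped = 0
    for raw_line in text.splitlines():
        # Assumption: '#' starts a comment; drop it and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only lines are not counted as skipped
        try:
            # ip_address() rejects CIDR notation such as "10.0.0.0/8",
            # so CIDRs land in the skipped count along with malformed entries.
            valid.append(str(ipaddress.ip_address(line)))
        except ValueError:
            skipped += 1
    return valid, skipped


ips, skipped = parse_blocklist(
    "198.51.100.4\n# comment\n10.0.0.0/8\nnot-an-ip\n2001:DB8::1\n"
)
```

Since the function takes text and returns plain values, it can be tested without mocking HTTP or the fail2ban socket.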
Benefits of This Architecture:
- Each component is independently testable with mock dependencies
- Error handling is clear: JailNotFoundError stops processing, JailOperationError continues
- Components can be evolved independently (e.g., replace HTTP client, add batch validation)
- Logging is contextual and tied to the appropriate layer
- Retry logic and transient error handling are isolated
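The abort-versus-continue rule can be illustrated as follows. This is a synchronous sketch under assumptions: the exception classes are local stand-ins for the project's `JailNotFoundError` and `JailOperationError`, and `execute_bans` and `fake_ban` are hypothetical names. The real executor talks to the fail2ban socket and is presumably asynchronous.

```python
class JailNotFoundError(Exception):
    """Stand-in for the project's exception of the same name."""


class JailOperationError(Exception):
    """Stand-in for a failure that affects a single ban only."""


def execute_bans(ips, ban_one):
    """Ban each IP via ban_one(ip); return (banned, errors)."""
    banned: list[str] = []
    errors: list[tuple[str, str]] = []
    for ip in ips:
        try:
            ban_one(ip)
        except JailNotFoundError:
            raise  # the jail is missing, so every later ban would fail too
        except JailOperationError as exc:
            errors.append((ip, str(exc)))  # record the failure and keep going
        else:
            banned.append(ip)
    return banned, errors


def fake_ban(ip: str) -> None:
    """Test double that fails for one address only."""
    if ip == "198.51.100.9":
        raise JailOperationError("already banned")


banned, errors = execute_bans(
    ["198.51.100.8", "198.51.100.9", "198.51.100.10"], fake_ban
)
```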
DNS-Rebinding Protection
The Vulnerability:
A DNS-rebinding attack exploits a time-of-check-to-time-of-use (TOCTOU) window between when a blocklist URL is validated and when it is actually fetched:
1. User adds blocklist URL `http://attacker.com/blocklist.txt`
2. `blocklist_service.create_source()` calls `validate_blocklist_url()`, which performs DNS resolution
3. `attacker.com` resolves to a public IP (the attacker's real server), so validation passes ✓
4. Later, when `BlocklistDownloader` fetches the URL, the attacker's DNS server responds with `192.168.1.1`
5. The HTTP client connects to the private IP, potentially accessing internal services
The Protection:
BanGUI closes this window by adding a second DNS-rebinding check at connection time:
- Create-time validation (`app/utils/ip_utils.py:validate_blocklist_url`): Confirms the URL resolves to a public IP when created
- Connection-time validation (`app/services/dns_validated_connector.py`): Validates that all resolved IPs are public when the actual HTTP connection is made
The HTTP session is created with a custom socket factory that intercepts DNS resolution results before socket creation. If any resolved IP is private or reserved, the connection is rejected with a clear error.
Implementation:
- `app/services/dns_validated_connector.py`: Provides `create_dns_validated_socket_factory()`, which returns a socket factory that validates IPs using `is_private_ip()`
- `app/startup.py:_create_http_session()`: Passes the socket factory to `aiohttp.TCPConnector`, protecting all HTTP requests globally
- All blocklist imports automatically inherit this protection through the shared session
Protected IP Ranges:
The validation blocks all RFC 1918 private ranges, loopback, link-local, ULA, multicast, and reserved addresses:
- IPv4: `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`, `127.0.0.0/8`, `224.0.0.0/4`, `240.0.0.0/4`, `255.255.255.255/32`
- IPv6: `::1/128`, `fe80::/10`, `fc00::/7`, `ff00::/8`, and others (via `ipaddress.IPv6Address.is_private`, etc.)
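A minimal sketch of such a range check, built on the standard-library `ipaddress` module, might look like this. The names `is_non_public` and `all_public` are hypothetical and are not the project's `is_private_ip()`; the exact set of properties combined here is an assumption.

```python
import ipaddress


def is_non_public(address: str) -> bool:
    """True if the address is private, loopback, link-local, multicast,
    reserved, or unspecified."""
    ip = ipaddress.ip_address(address)
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def all_public(resolved: list[str]) -> bool:
    """Connection-time rule: reject if ANY resolved address is non-public.

    An empty resolution result is treated as a rejection as well.
    """
    return bool(resolved) and not any(is_non_public(a) for a in resolved)
```

Checking every resolved address, rather than only the first, matters because a hostile DNS server can return a mix of public and private records.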
Reference:
- OWASP SSRF Prevention Cheat Sheet
- Tests: `backend/tests/test_services/test_dns_validated_connector.py`
Startup DAG (app/startup_dag.py, app/startup.py)
The startup process is orchestrated by an explicit Directed Acyclic Graph (DAG) that defines all resource initialization stages, their dependencies, health checks, and rollback strategy. This replaces implicit ordering with explicit, documented prerequisites.
Why This Exists:
Previously, startup resources were created in a procedural sequence without documented dependencies. If a stage was reordered or a prerequisite was missed, initialization could fail in non-obvious ways. Partial failures could leave stale resources (open database connections, HTTP sessions, running schedulers) that prevented clean rollback.
Startup Stages (in order):
1. WORKER_MODE
└─ Validates that BANGUI_WORKERS=1 (scheduler cannot run in multiple workers)
2. DATABASE
├─ Prerequisite: WORKER_MODE
├─ Creates database directory
├─ Initializes database schema
├─ Caches setup completion state
└─ Loads persisted runtime settings
3. GEO_CACHE
├─ Prerequisite: DATABASE
├─ Loads IP geolocation cache from database
├─ Counts unresolved IPs
├─ Initializes MaxMind GeoLite2 database
└─ Configures HTTP fallback (if enabled)
4. HTTP_SESSION
├─ Prerequisite: GEO_CACHE
├─ Creates aiohttp.ClientSession
└─ Configures timeouts and connection limits
5. SCHEDULER
├─ Prerequisite: HTTP_SESSION
├─ Creates APScheduler AsyncIOScheduler
└─ Starts the scheduler
6. TASKS
├─ Prerequisite: SCHEDULER
├─ Registers health_check task (fail2ban connectivity probe)
├─ Registers blocklist_import task (scheduled imports)
├─ Registers geo_cache_cleanup task (stale entry purge)
├─ Registers geo_cache_flush task (periodic persistence)
├─ Registers geo_re_resolve task (stale record re-resolution)
├─ Registers history_sync task (ban history sync)
└─ Registers session_cleanup task (expired session purge)
Failure Mode & Rollback:
If any stage fails:
- All completed stages are rolled back in reverse order (TASKS → SCHEDULER → HTTP_SESSION → GEO_CACHE → DATABASE → WORKER_MODE)
- Each rollback suppresses exceptions to ensure all resources are cleaned up
- Database connections are closed
- HTTP sessions are closed
- The scheduler is shut down
- The application startup fails with a clear error message
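The prerequisite check and reverse-order rollback can be sketched in a few lines. `MiniStartup` below is a toy stand-in written for this illustration, not the real `StartupDAG`; its method signatures and the string stage names are assumptions.

```python
import asyncio
from collections.abc import Awaitable, Callable
from contextlib import suppress

Step = Callable[[], Awaitable[None]]


class MiniStartup:
    """Toy startup graph: stages with prerequisites and rollbacks."""

    def __init__(self) -> None:
        self.prerequisites: dict[str, frozenset[str]] = {}
        self.completed: list[str] = []
        self.rollbacks: dict[str, Step] = {}

    def register_stage(self, name: str, prerequisites=frozenset()) -> None:
        self.prerequisites[name] = frozenset(prerequisites)

    async def execute_stage(self, name: str, init: Step, rollback: Step) -> None:
        missing = self.prerequisites[name] - set(self.completed)
        if missing:
            raise RuntimeError(f"{name} requires {sorted(missing)}")
        await init()
        self.completed.append(name)
        self.rollbacks[name] = rollback

    async def rollback_all(self) -> None:
        # Undo completed stages newest-first; a failing rollback must not
        # prevent the remaining ones from running.
        for name in reversed(self.completed):
            with suppress(Exception):
                await self.rollbacks[name]()
        self.completed.clear()


async def demo() -> list[str]:
    events: list[str] = []

    def step(label: str) -> Step:
        async def run() -> None:
            events.append(label)
        return run

    async def broken() -> None:
        raise OSError("cannot create session")

    dag = MiniStartup()
    dag.register_stage("DATABASE")
    dag.register_stage("GEO_CACHE", {"DATABASE"})
    dag.register_stage("HTTP_SESSION", {"GEO_CACHE"})
    try:
        await dag.execute_stage("DATABASE", step("init DATABASE"), step("undo DATABASE"))
        await dag.execute_stage("GEO_CACHE", step("init GEO_CACHE"), step("undo GEO_CACHE"))
        await dag.execute_stage("HTTP_SESSION", broken, step("undo HTTP_SESSION"))
    except OSError:
        await dag.rollback_all()
    return events


events = asyncio.run(demo())
```

In this sketch the failed stage never enters `completed`, so only the two stages that finished are undone, newest first.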
Health Checks:
After all stages complete, a final health check verifies:
- All resources have initialized successfully
- Resources pass their individual health_check() methods
- No failures occurred during any stage
Implementation:
- StartupDAG: Orchestrates the entire flow, manages prerequisites, and handles failures
- StartupStage: Enum defining the 6 startup stages
- StageDependency: Defines stage metadata (description, prerequisites, rollback policy)
- StartupContext: Tracks registered resources, completed stages, and failure state
- startup_shared_resources(): Main entry point that builds and executes the DAG
- stage*(): Functions that implement each stage's initialization logic
Example Usage in Tests:
# Test that a stage with missing prerequisites fails
dag = StartupDAG()
dag.register_stage(StartupStage.HTTP_SESSION, "Create HTTP session",
                   prerequisites=frozenset([StartupStage.DATABASE]))
dag.register_stage(StartupStage.SCHEDULER, "Create scheduler")

async def http_session_func():
    return aiohttp.ClientSession()

# This will raise RuntimeError because DATABASE hasn't completed
await dag.execute_stage(StartupStage.HTTP_SESSION, http_session_func)
Mappers (app/mappers/)
The response mapping layer. Mappers convert domain models (returned by services) to response models (consumed by HTTP routers). This layer enforces the separation between business logic and API shape.
Location: app/mappers/
Responsibilities:
- Convert service domain models to API response models
- Mechanical, thin translation — no business logic
- Used exclusively at the router boundary
Pattern:
Each domain model has a corresponding mapper function:
# Domain model (from service)
DomainActiveBan → map_domain_active_ban_to_response() → ActiveBan (response)
# Service returns domain models:
async def get_active_bans(...) -> DomainActiveBanList
# Router converts at the boundary:
domain_result = await ban_service.get_active_bans(...)
return map_domain_active_ban_list_to_response(domain_result)
Why separate?
When API requirements change (e.g., new field added, field renamed), only:
- Response model in `app/models/` changes
- Mapper function in `app/mappers/` updates
- Routers stay the same
- Services don't change
Without this layer, changes to API shape would require modifying services and their tests.
Repositories (app/repositories/)
The data access layer. Repositories execute raw SQL queries against the application SQLite database. They return plain data or domain models — they never raise HTTP exceptions or contain business logic.
| Repository | Purpose |
|---|---|
| `settings_repo.py` | CRUD operations for application settings (master password hash, DB path, fail2ban socket path, preferences) |
| `session_repo.py` | Store, retrieve, and delete session records for authentication |
| `blocklist_repo.py` | Persist blocklist source definitions (name, URL, enabled/disabled) |
| `fail2ban_db_repo.py` | Read historical ban records from the fail2ban SQLite database |
| `geo_cache_repo.py` | Persist and query IP geo resolution cache |
| `import_log_repo.py` | Record import run results (timestamp, source, IPs imported, errors) for the import log view |
Every repository in app/repositories/ has a corresponding protocol in app/repositories/protocols.py, including settings_repo.py and history_archive_repo.py.
Models (app/models/)
Pydantic schemas that define data shapes and validation. Models are split into three categories per domain.
| Model file | Purpose |
|---|---|
| `auth.py` | Login request/response and session models |
| `ban.py` | Ban creation and lookup models |
| `blocklist.py` | Blocklist source and import log models |
| `config.py` | Fail2ban config view/edit models |
| `file_config.py` | Raw config file read/write models |
| `geo.py` | Geo and ASN lookup models |
| `history.py` | Historical ban query and timeline models |
| `jail.py` | Jail listing and status models |
| `server.py` | Server status and settings models |
| `setup.py` | First-run setup wizard models |
Model Layering Rules: Models are pure data classes (leaf nodes) in the dependency graph. They must not import from application-layer modules (app.services, app.config, app.utils). Models may import from:
- Standard library and third-party packages (Pydantic, typing)
- Other models in `app.models/` (sibling models)
- `app.models.response` (response envelopes)
Critical Constraint — No I/O or Side Effects: Pydantic validators, field defaults, and computed fields must be pure functions with no side effects:
- ❌ NO imports from `app.config`, `app.services`, `app.utils`, or `app.routers` (these are application-layer modules)
- ❌ NO calls to `get_settings()`, file I/O, database queries, network calls, or any runtime-dependent functions
- ❌ NO `default_factory` that calls app-layer functions
These constraints ensure that importing a model file does not trigger application initialization, and they prevent hidden circular dependencies.
Validation that requires access to app-level state (e.g., allowed log directories, settings, database) must be moved to the router or service layer, not in model validators. Validation occurs at the boundary — where settings and services are already available.
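One way to picture this split is sketched below: the model only describes shape, while the check that needs configuration receives the allowed directories as an argument at the service boundary. `LogPreviewRequest` and `ensure_allowed_log_path` are hypothetical names, and a dataclass stands in for a Pydantic model.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class LogPreviewRequest:
    """Hypothetical model: shape only, no settings lookup, no I/O."""
    log_path: str


def ensure_allowed_log_path(
    request: LogPreviewRequest, allowed_dirs: list[str]
) -> PurePosixPath:
    """Service-layer check; allowed_dirs is supplied from settings by the caller."""
    path = PurePosixPath(request.log_path)
    # Pure paths do not collapse "..", so traversal segments stay visible.
    if ".." in path.parts:
        raise ValueError("path traversal is not allowed")
    if not any(path.is_relative_to(base) for base in allowed_dirs):
        raise ValueError(f"{path} is outside the allowed log directories")
    return path


checked = ensure_allowed_log_path(LogPreviewRequest("/var/log/auth.log"), ["/var/log"])
```

Importing the model in this arrangement pulls in nothing from the application layer, which is the property the rules above are meant to protect.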
Tasks (app/tasks/)
APScheduler background jobs that run on a schedule without user interaction.
| Task | Purpose |
|---|---|
| `blocklist_import.py` | Downloads all enabled blocklist sources, validates entries, applies bans, records results in the import log |
| `geo_cache_cleanup.py` | Periodically removes entries from the `geo_cache` table that have not been referenced in the configured retention period (default: 90 days). Prevents unbounded database growth. |
| `geo_cache_flush.py` | Periodically flushes newly resolved IPs from the in-memory dirty set to the `geo_cache` SQLite table (default: every 60 seconds). GET requests populate only the in-memory cache; this task persists them without blocking any request. |
| `geo_re_resolve.py` | Periodically re-resolves stale entries in `geo_cache` to keep geolocation data fresh |
| `health_check.py` | Periodically pings the fail2ban socket and updates the cached server status so the frontend always has fresh data |
| `history_sync.py` | Periodically copies new records from the fail2ban SQLite database into BanGUI's `history_archive` table; delegates the sync algorithm to `history_service.py` |
| `session_cleanup.py` | Periodically removes expired sessions from the `sessions` SQLite table (default: every 6 hours). Without this cleanup, the table grows unbounded and degrades query performance. |
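The dirty-set flush described for `geo_cache_flush.py` can be sketched with the standard-library `sqlite3` module. This is an illustration under assumptions: the class name, the two-column `geo_cache` schema, and the synchronous driver are all hypothetical (the backend itself uses aiosqlite).

```python
import sqlite3


class GeoDirtyCache:
    """Toy in-memory cache that tracks which entries still need persisting."""

    def __init__(self) -> None:
        self.entries: dict[str, str] = {}
        self.dirty: set[str] = set()

    def put(self, ip: str, country_code: str) -> None:
        # Request path: memory only, no database write.
        self.entries[ip] = country_code
        self.dirty.add(ip)

    def flush(self, conn: sqlite3.Connection) -> int:
        # Scheduled path: write all dirty entries in one batch.
        batch = [(ip, self.entries[ip]) for ip in sorted(self.dirty)]
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT OR REPLACE INTO geo_cache (ip, country_code) VALUES (?, ?)",
                batch,
            )
        self.dirty.clear()
        return len(batch)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE geo_cache (ip TEXT PRIMARY KEY, country_code TEXT)")
cache = GeoDirtyCache()
cache.put("203.0.113.7", "DE")
cache.put("198.51.100.4", "US")
first = cache.flush(conn)   # persists both entries
second = cache.flush(conn)  # nothing left to write
```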
Utils (app/utils/)
Pure helper modules with no framework dependencies.
| Module | Purpose |
|---|---|
| `fail2ban_client.py` | Async client that communicates with fail2ban via its Unix domain socket: sends commands and parses responses using the fail2ban protocol. Modelled after `./fail2ban-master/fail2ban/client/csocket.py` and `./fail2ban-master/fail2ban/client/fail2banclient.py`. |
| `jail_socket.py` | Low-level jail reload operations (`reload_all`) extracted to break service dependencies. Used by `jail_service`, `jail_config_service`, `action_config_service`, and `filter_config_service` to avoid circular imports between sibling services. |
| `ip_utils.py` | Validates IPv4/IPv6 addresses and CIDR ranges using the `ipaddress` stdlib module, normalises formats |
| `jail_utils.py` | Jail helper functions for configuration and status inference |
| `jail_config.py` | Jail config parser and serializer for fail2ban config manipulation |
| `time_utils.py` | Timezone-aware datetime construction, formatting helpers, time-range calculations |
| `log_utils.py` | Structured log formatting and enrichment helpers |
| `conffile_parser.py` | Parses Fail2ban `.conf` files into structured objects and serialises back to text |
| `config_parser.py` | Builds structured config objects from file content tokens |
| `config_writer.py` | Atomic config file writes, backups, and safe replace semantics |
| `config_file_utils.py` | Common file-level config utility helpers |
| `fail2ban_db_utils.py` | Fail2ban DB path discovery and ban-history parsing helpers |
| `setup_utils.py` | Setup wizard helper utilities |
| `constants.py` | Shared constants: default socket path, default database path, time-range presets, parser truthy values, limits |
Configuration (app/config.py)
A single Pydantic settings model that loads all configuration from environment variables (prefixed BANGUI_) and an optional .env file. Validated at startup — the application refuses to start if required values are missing.
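The fail-fast behaviour can be illustrated without Pydantic. The sketch below is a standard-library stand-in for the idea, not the actual settings model: the variable names `BANGUI_DATABASE_PATH` and `BANGUI_FAIL2BAN_SOCKET`, the default socket path, and the function `load_settings` are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SettingsSketch:
    """Hypothetical settings shape; field names are illustrative only."""
    database_path: str
    fail2ban_socket: str = "/var/run/fail2ban/fail2ban.sock"


def load_settings(env: dict[str, str]) -> SettingsSketch:
    """Build settings from BANGUI_-prefixed variables; fail if a required one is missing."""
    prefix = "BANGUI_"
    values = {
        key[len(prefix):].lower(): value
        for key, value in env.items()
        if key.startswith(prefix)
    }
    if "database_path" not in values:
        raise ValueError("BANGUI_DATABASE_PATH is required")
    known = {f.name for f in fields(SettingsSketch)}
    # Ignore prefixed variables this sketch does not model.
    return SettingsSketch(**{k: v for k, v in values.items() if k in known})


settings = load_settings({"BANGUI_DATABASE_PATH": "/data/bangui.db", "HOME": "/root"})
```

Raising at load time means a misconfigured deployment stops during startup rather than failing on the first request that needs the missing value.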
Dependencies (app/dependencies.py)
FastAPI Depends() providers that inject shared resources into route handlers: the database connection, service instances, the authenticated session, and the fail2ban client. This is the wiring layer that connects routers to services without tight coupling.
Application Entry Point (app/main.py)
The FastAPI app factory. Responsibilities:
- Creates the `FastAPI` instance with metadata (title, version, docs URL)
- Registers the lifespan context manager (startup: open DB, create aiohttp session, start scheduler; shutdown: close all)
- Mounts all routers
- Registers global exception handlers that map domain exceptions to HTTP status codes with a hierarchical fallback chain
- Applies the setup-redirect middleware (returns `423 Locked` for all API requests when no configuration exists, except for `/api/setup` and `/api/health`)
Exception Handler Hierarchy:
Exception handlers are registered in order of specificity to ensure each exception type is caught by the most appropriate handler:
1. Specific network errors (Fail2BanConnectionError, Fail2BanProtocolError) → HTTP 502 Bad Gateway
2. Specific auth/rate errors (AuthenticationError, RateLimitError) → HTTP 401 Unauthorized / 429 Too Many Requests
3. Category handlers (NotFoundError, BadRequestError, ConflictError, OperationError, ServiceUnavailableError) → HTTP 404/400/409/500/503
4. DomainError catch-all → HTTP 500 (catches any unregistered DomainError subclass, ensuring proper error_code and metadata are returned)
5. HTTPException → HTTP status from exception (FastAPI built-in validation and routing errors)
6. ValueError → HTTP 400 Bad Request (Pydantic validation errors)
7. Exception catch-all → HTTP 500 Internal Server Error (absolute fallback for unexpected errors)
The DomainError catch-all handler (step 4) is critical: it ensures that any new DomainError subclass automatically gets the correct HTTP status (500), error_code, and metadata through its inherited error_code attribute and get_error_metadata() method, even if the developer forgot to create an explicit handler for it. This prevents silent failures where an unhandled exception would return a generic "internal_error" code instead of the specific error code defined by the exception class.
2.3 Dependency Wiring and Service Composition
BanGUI uses a lightweight dependency injection (DI) pattern based on FastAPI's Depends() framework. There is no heavy container library — the composition root is implicit and managed through simple provider functions in app/dependencies.py.
The DI Pattern
Every injectable dependency follows this structure:
- Provider Function: an async function in app/dependencies.py that creates and returns a dependency:

    async def get_settings(app_context: ...) -> Settings:
        """Provide application settings."""
        return app_context.runtime_settings or app_context.settings

- Type Alias: an Annotated alias that decorates the provider for use in route signatures:

    SettingsDep = Annotated[Settings, Depends(get_settings)]

- Injection Point: routers declare their dependencies using the type alias:

    async def my_route(settings: SettingsDep) -> Response:
        # FastAPI automatically calls get_settings() and injects the result
        ...
Module-Level Imports:
All repository and service modules are imported at module level in app/dependencies.py. These imports are safe at the top because no circular dependencies exist — repositories and services do not import from dependencies.py. This follows the principle of importing dependencies early and consistently:
# app/dependencies.py (top of file)
from app.repositories import (
    blocklist_repo,
    fail2ban_db_repo,
    session_repo,
    # ... other repository modules
)
from app.services import auth_service, health_service
from app.services.fail2ban_metadata_service import default_fail2ban_metadata_service

# Provider functions simply return the module
async def get_session_repo() -> SessionRepository:
    return session_repo
Exception: The from app.db import open_db import remains local to get_db() because only that function uses it, and keeping it local defers the module load until the function is first called.
Service Composition Root
Services are not instantiated by a container. Instead, they are composed by routers and tasks through explicit parameter passing. This keeps dependencies visible and avoids implicit side effects.
Example: How ban_service.get_active_bans() is wired:
# Step 1: Router declares what it needs (dependencies.py)
async def get_ban_service_context(
    db: Annotated[aiosqlite.Connection, Depends(get_db)],
    fail2ban_db_repo: Annotated[Fail2BanDbRepository, Depends(get_fail2ban_db_repo)],
) -> BanServiceContext:
    """Combine database connection and repository."""
    return BanServiceContext(db=db, fail2ban_db_repo=fail2ban_db_repo)

BanServiceContextDep = Annotated[BanServiceContext, Depends(get_ban_service_context)]

# Step 2: Router uses the context and calls the service
@router.get("/active")
async def get_active_bans(
    ban_ctx: BanServiceContextDep,
    socket_path: Fail2BanSocketDep,
    geo_cache: GeoCacheDep,
) -> ActiveBanListResponse:
    # Router explicitly passes everything the service needs
    domain_result = await ban_service.get_active_bans(
        socket_path,
        geo_cache=geo_cache,
        app_db=ban_ctx.db,  # ← Explicit, no magic
    )
    return map_domain_active_ban_list_to_response(domain_result)

# Step 3: Service function accepts dependencies as parameters
async def get_active_bans(
    socket_path: str,
    geo_cache: GeoCache,
    app_db: aiosqlite.Connection,
) -> DomainActiveBanList:
    """Retrieve active bans. All dependencies are explicit parameters."""
    # Service logic here
    ...
Why this pattern?
- Explicit: No hidden coupling. Every dependency is visible in function signatures.
- Testable: Easy to mock dependencies by passing test doubles.
- Lightweight: No heavyweight DI container library needed. FastAPI's Depends() is sufficient.
- Debuggable: Stack traces and type checkers understand the full dependency chain.
Service Context Dependencies
For convenience, related repositories and the database connection are bundled into context objects. These prevent routers from depending on the raw database connection (which violates the repository boundary).
Available Service Contexts:
| Context | Includes | Used By |
|---|---|---|
| SessionServiceContext | db, session_repo | auth router |
| BlocklistServiceContext | db, blocklist_repo, import_log_repo, settings_repo | blocklist router |
| SettingsServiceContext | db, settings_repo | server settings router |
| BanServiceContext | db, fail2ban_db_repo | ban router |
| HistoryServiceContext | db, fail2ban_db_repo, history_archive_repo | history router |
Each context is created by a provider function:
async def get_ban_service_context(
    db: Annotated[aiosqlite.Connection, Depends(get_db)],
    fail2ban_db_repo: Annotated[Fail2BanDbRepository, Depends(get_fail2ban_db_repo)],
) -> BanServiceContext:
    return BanServiceContext(db=db, fail2ban_db_repo=fail2ban_db_repo)
Adding a New Service
Follow this checklist when creating a new service:
- Create the service module: app/services/my_service.py
- Define the service functions: each function takes its dependencies as explicit parameters (no imports of other services at the same layer)
- Export key functions: only the public API functions are called by routers
- If database access is needed:
  - Routers depend on the appropriate ServiceContextDep (e.g., BanServiceContextDep)
  - Pass context.db and context.repository to the service function
- If a new context is needed:
  - Create a @dataclass in app/dependencies.py to hold the related resources
  - Create a provider function get_<service>_context() that combines them
  - Create a type alias <Service>ContextDep for router injection
- Register the service: no registration step is needed; routers import the service module directly, and FastAPI resolves its inputs via Depends()
Example: Adding a new service that needs blocklist and settings repos:
# app/services/my_new_service.py
async def do_something(
    db: aiosqlite.Connection,
    blocklist_repo: BlocklistRepository,
    settings_repo: SettingsRepository,
) -> MyResult:
    """Do something with blocklist and settings data."""
    sources = await blocklist_repo.list_sources(db)
    settings = await settings_repo.load(db)
    # Business logic
    return ...

# app/routers/my_router.py
from app.dependencies import BlocklistServiceContextDep
from app.services import my_new_service

@router.get("/something")
async def my_endpoint(
    ctx: BlocklistServiceContextDep,  # ← Already has db, blocklist_repo, settings_repo
) -> MyResponse:
    result = await my_new_service.do_something(
        db=ctx.db,
        blocklist_repo=ctx.blocklist_repo,
        settings_repo=ctx.settings_repo,
    )
    return MyResponse(...)
The Repository Boundary
Services must not depend on raw database connections. The repository boundary is enforced by not exporting DbDep to routers. Instead:
- Routers declare a ServiceContextDep which includes both the db and the needed repositories
- Services receive the db connection and repositories as parameters
- Repositories are the only modules that execute SQL; services never call SQL directly
This ensures:
- Queries are centralized and testable
- Changes to the database layer don't leak into business logic
- Repositories can be mocked independently for testing
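A compact illustration of the boundary, using the standard-library sqlite3 module in place of aiosqlite so the sketch stays synchronous and self-contained. The method `list_enabled_sources`, the service function `count_https_sources`, and the `FakeRepo` double are hypothetical; the table and column names follow the blocklist_sources description in Section 5.1.

```python
import sqlite3


class BlocklistRepository:
    """Repository layer: the only place in this sketch that contains SQL."""

    def list_enabled_sources(self, db: sqlite3.Connection) -> list[tuple[str, str]]:
        rows = db.execute(
            "SELECT name, url FROM blocklist_sources WHERE enabled = 1 ORDER BY name"
        ).fetchall()
        return [(name, url) for name, url in rows]


def count_https_sources(db, repo) -> int:
    """Service layer: applies a business rule and passes db through untouched."""
    return sum(1 for _, url in repo.list_enabled_sources(db) if url.startswith("https://"))


class FakeRepo:
    """Test double: lets the service be tested without any database."""

    def list_enabled_sources(self, db) -> list[tuple[str, str]]:
        return [("a", "https://example.org/a.txt"), ("b", "http://example.org/b.txt")]
```

The service accepts the connection only to hand it to the repository, which is why it can be exercised with `db=None` and a fake repository.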
Lifecycle and Scope
- Request-scoped: Database connections are created fresh for each request and closed after the response is sent. This prevents contention and locking issues with SQLite.
- Application-scoped: Shared resources like aiohttp.ClientSession, the scheduler, and the GeoCache are created at startup and reused across all requests.
- Singleton: Some services (e.g., Fail2BanMetadataService) are instantiated once and cached in app.state or imported as module-level instances.
3. Frontend Architecture
The frontend is a React single-page application built with TypeScript, Vite, and Fluent UI v9. It communicates exclusively with the backend REST API — it never accesses fail2ban, the database, or external services directly.
┌──────────────────────────────────────────────────────────────┐
│ React Application │
│ │
│ ┌──────────┐ ┌────────────┐ ┌──────────────────┐ │
│ │ Pages │───▶│ Components │───▶│ Fluent UI v9 │ │
│ └────┬─────┘ └────────────┘ └──────────────────┘ │
│ │ │
│ ┌────┴─────┐ ┌────────────┐ ┌──────────────────┐ │
│ │ Hooks │───▶│ API Layer │───▶│ Backend (REST) │ │
│ └──────────┘ └────────────┘ └──────────────────┘ │
│ │
│ ┌──────────┐ ┌────────────┐ ┌──────────────────┐ │
│ │Providers │ │ Types │ │ Theme │ │
│ │(Context) │ │(Interfaces)│ │(Tokens, Styles) │ │
│ └──────────┘ └────────────┘ └──────────────────┘ │
└──────────────────────────────────────────────────────────────┘
3.1 Project Structure
frontend/
├── public/
├── src/
│ ├── api/ # API client and per-domain request functions
│ │ ├── client.ts # Central fetch wrapper (typed GET/POST/PUT/DELETE)
│ │ ├── endpoints.ts # API path constants
│ │ ├── auth.ts # Login, logout, session check
│ │ ├── dashboard.ts # Dashboard status and ban list
│ │ ├── jails.ts # Jail CRUD and controls
│ │ ├── bans.ts # Ban/unban actions, banned list
│ │ ├── config.ts # Configuration read/write
│ │ ├── history.ts # Ban history queries
│ │ ├── blocklist.ts # Blocklist source management
│ │ ├── geo.ts # IP lookup / geolocation
│ │ └── server.ts # Server settings
│ ├── assets/ # Static images, fonts, icons
│ ├── components/ # Reusable UI components
│ │ ├── BanTable.tsx # Data table for ban entries
│ │ ├── JailCard.tsx # Summary card for a jail
│ │ ├── StatusBar.tsx # Server status indicator strip
│ │ ├── TimeRangeSelector.tsx # Quick preset picker (24h, 7d, 30d, 365d)
│ │ ├── IpInput.tsx # IP address input with validation
│ │ ├── RegexTester.tsx # Side-by-side regex match preview
│ │ ├── WorldMap.tsx # Country-outline map with ban counts
│ │ ├── ImportLogTable.tsx # Blocklist import run history
│ │ ├── ConfirmDialog.tsx # Reusable confirmation modal
│ │ ├── RequireAuth.tsx # Route guard: redirects unauthenticated users to /login
│ │ ├── SetupGuard.tsx # Route guard: redirects to /setup if setup incomplete
│ │ └── ... # (additional shared components)
│ ├── hooks/ # Custom React hooks (stateful logic + API calls)
│ │ ├── useAuth.ts # Login state, login/logout actions
│ │ ├── useBans.ts # Fetch ban list for a time range
│ │ ├── useJails.ts # Fetch jail list and details
│ │ ├── useConfig.ts # Fetch and update configuration
│ │ ├── useHistory.ts # Fetch historical ban data
│ │ ├── useBlocklists.ts # Fetch and manage blocklist sources
│ │ ├── useServerStatus.ts # Poll server health / status
│ │ └── useGeo.ts # IP lookup hook
│ ├── layouts/ # Page-level layout wrappers
│ │ └── AppLayout.tsx # Sidebar navigation + header + content area
│ ├── pages/ # Route-level page components (one per route)
│ │ ├── SetupPage.tsx # First-run wizard
│ │ ├── LoginPage.tsx # Password prompt
│ │ ├── DashboardPage.tsx # Ban overview, status bar
│ │ ├── WorldMapPage.tsx # Geographical ban map
│ │ ├── JailsPage.tsx # Jail list, detail, controls, ban/unban
│ │ ├── ConfigPage.tsx # Configuration viewer/editor
│ │ ├── HistoryPage.tsx # Ban history browser
│ │ └── BlocklistPage.tsx # Blocklist source management + import log
│ ├── providers/ # React context providers
│ │ ├── AuthProvider.tsx # Authentication state and guards
│ │ └── ThemeProvider.tsx # Light/dark theme switching
│ ├── theme/ # Fluent UI theme definitions
│ │ ├── customTheme.ts # Brand colour ramp, light and dark themes
│ │ └── tokens.ts # Spacing, sizing, and z-index constants
│ ├── types/ # Shared TypeScript interfaces
│ │ ├── auth.ts # LoginRequest, SessionInfo
│ │ ├── ban.ts # Ban, BanListResponse, BanRequest
│ │ ├── jail.ts # Jail, JailDetail, JailListResponse
│ │ ├── config.ts # ConfigSection, ConfigUpdateRequest
│ │ ├── history.ts # HistoryEntry, IpTimeline
│ │ ├── blocklist.ts # BlocklistSource, ImportLogEntry
│ │ ├── geo.ts # GeoInfo, AsnInfo
│ │ ├── server.ts # ServerStatus, ServerSettings
│ │ └── api.ts # ApiError, PaginatedResponse
│ ├── utils/ # Pure helper functions
│ │ ├── formatDate.ts # Date/time formatting with timezone support
│ │ ├── formatIp.ts # IP display formatting
│ │ ├── crypto.ts # Browser-native SHA-256 helper (SubtleCrypto)
│ │ └── constants.ts # Frontend constants (time presets, etc.)
│ ├── App.tsx # Root: FluentProvider + BrowserRouter + routes
│ ├── main.tsx # Vite entry point
│ └── vite-env.d.ts # Vite type shims
├── tsconfig.json
├── vite.config.ts
└── package.json
3.2 Module Purposes
Pages (src/pages/)
Top-level route components. Each page composes layout, components, and hooks to create a full screen. Pages contain no business logic — they orchestrate what is displayed and delegate data fetching to hooks.
| Page | Route | Purpose |
|---|---|---|
| SetupPage | /setup | First-run wizard: set master password, database path, fail2ban connection, preferences |
| LoginPage | /login | Single-field password prompt; redirects to requested page after success |
| DashboardPage | / | Server status bar, ban list table, time-range selector |
| WorldMapPage | /map | World map with per-country ban counts, country filter |
| JailsPage | /jails | Jail overview list, jail detail panel, controls (start/stop/reload), ban/unban forms, IP lookup, whitelist management |
| ConfigPage | /config | View and edit jail parameters, filter regex, server settings, regex tester, add log observation |
| HistoryPage | /history | Browse all past bans, filter by jail/IP/time, per-IP timeline drill-down |
| BlocklistPage | /blocklists | Manage blocklist sources, schedule configuration, import log, manual import trigger |
Components (src/components/)
Reusable UI building blocks. Components receive data via props, emit changes via callbacks, and never call the API directly. Built exclusively with Fluent UI v9 components.
| Component | Purpose |
|---|---|
| StatusBar | Displays fail2ban server status (online/offline, version, jail count, total bans) |
| BanTable | Sortable data table for ban entries with columns for time, IP, jail, country, etc. |
| JailCard | Summary card showing jail name, status badge, key metrics |
| TimeRangeSelector | Quick-preset picker for filtering data (24h, 7d, 30d, 365d) |
| IpInput | IP address text field with inline validation |
| WorldMap | SVG/Canvas country-outline map with count overlays and click-to-filter |
| RegexTester | Side-by-side sample log + regex input with live match highlighting |
| ImportLogTable | Table displaying blocklist import history |
| ConfirmDialog | Reusable Fluent UI Dialog for destructive action confirmations |
| RequireAuth | Route guard: renders children only when authenticated; otherwise redirects to /login?next=<path> |
| SetupGuard | Route guard: checks GET /api/setup on mount and redirects to /setup if not complete; shows a spinner while loading |
| config/ConfigListDetail | Reusable two-pane master/detail layout used by the Jails, Filters, and Actions config tabs. Left pane lists items with active/inactive badges (active sorted first, keyboard navigable); right pane renders the selected item's detail content. Collapses to a dropdown on narrow screens. |
| config/RawConfigSection | Collapsible section that lazily loads the raw text of a config file into a monospace textarea. Provides a Save button backed by a configurable save callback; shows idle/saving/saved/error feedback. Used by all three config tabs. |
| config/AutoSaveIndicator | Small inline indicator showing the current save state (idle, saving, saved, error) for form fields that auto-save on change. |
Hooks (src/hooks/)
Encapsulate all stateful logic, side effects, and API calls. Components and pages consume hooks to stay declarative.
| Hook | Purpose |
|---|---|
| useAuth | Manages login state, provides login(), logout(), and isAuthenticated |
| useBans | Fetches ban list for a given time range, returns { bans, loading, error } |
| useJails | Fetches jail list and individual jail detail |
| useConfig | Reads and writes fail2ban jail configuration via the socket-based API |
| useFilterConfig | Fetches and manages a single filter file's parsed configuration |
| useActionConfig | Fetches and manages a single action file's parsed configuration |
| useJailFileConfig | Fetches and manages a single jail.d config file |
| useConfigActiveStatus | Derives active status sets for jails, filters, and actions by correlating the live jail list with the config file lists; returns { activeJails, activeFilters, activeActions, loading, error, refresh } |
| useAutoSave | Debounced auto-save hook: invokes a save callback after the user stops typing, tracks saving/saved/error state |
| useHistory | Queries historical ban data with filters |
| useBlocklists | Manages blocklist sources and import triggers |
| useServerStatus | Polls the server status endpoint at an interval |
| useGeo | Performs IP geolocation lookups on demand |
API Layer (src/api/)
A thin typed wrapper around fetch. All HTTP communication is centralised here — components and hooks never construct HTTP requests directly.
| Module | Purpose |
|---|---|
| client.ts | Central get<T>, post<T>, put<T>, del<T> functions with error handling and credentials |
| endpoints.ts | All API path constants in one place; no hard-coded URLs anywhere else |
| auth.ts | login(), logout(), checkSession() |
| dashboard.ts | fetchStatus(), fetchRecentBans() |
| jails.ts | fetchJails(), fetchJailDetail(), startJail(), stopJail(), reloadJail() |
| bans.ts | banIp(), unbanIp(), unbanAll(), fetchBannedIps() |
| config.ts | Socket-based config: fetchJailConfigs(), updateJailConfig(), testRegex(). File-based config: fetchJailFiles(), fetchJailFile(), writeJailFile(), setJailFileEnabled(), fetchFilterFiles(), fetchFilterFile(), writeFilterFile(), fetchActionFiles(), fetchActionFile(), writeActionFile(), reloadConfig() |
| history.ts | fetchHistory(), fetchIpTimeline() |
| blocklist.ts | fetchSources(), addSource(), removeSource(), triggerImport(), fetchImportLog() |
| geo.ts | lookupIp() |
| server.ts | fetchServerSettings(), updateServerSettings() |
Types (src/types/)
Shared TypeScript interfaces and type aliases. Purely declarative — no runtime code. Grouped by domain. Any type used by two or more files lives here.
Providers (src/providers/)
React context providers for application-wide concerns.
Provider Ordering and Compile-Time Validation
Provider nesting is order-sensitive, and the order is enforced at compile time through TypeScript discriminated unions. The required order (outermost to innermost) is:
1. ThemeProvider: must be outermost; provides theme context to AppContents
2. FluentProvider: supplies Fluent UI theme and design tokens to all Fluent UI consumers
3. NotificationProvider: provides notification service; must wrap error boundaries
4. ErrorBoundary: catches catastrophic errors at the top level
5. BrowserRouter: enables client-side routing
6. NavigationCancellationProvider: manages route-aware request cancellation using useLocation()
7. AuthProvider: validates session on mount; must be inside BrowserRouter (uses useNavigate())
8. TimezoneProvider: fetches timezone after auth; wraps protected routes only
Compile-Time Validation:
A type-safe builder pattern (ProviderCompositionBuilder) in providerComposition.tsx enforces this order using TypeScript's discriminated unions. The builder prevents adding providers out of order at compile-time:
const tree = createProviderComposition()
  .withTheme({ children })
  .withFluent(theme)            // ✓ Must come after withTheme
  .withNotification()           // ✓ Must come after withFluent
  .withErrorBoundary()          // ✓ Correct order enforced
  .withBrowserRouter()
  .withNavigationCancellation()
  .withAuth()
  .build(routes);
Attempting to add providers out of order results in TypeScript errors (no runtime overhead).
Runtime Validation (Development):
A runtime validator (providerOrderValidator.tsx) provides fallback validation for development:
- validateProviderPosition(): checks if a provider is correctly nested
- validateProvidersExist(): ensures required providers are in the tree
- hasProvider(): queries provider presence
- useProviderValidation(): development-only hook that warns if required providers are missing
See src/providers/PROVIDER_ORDER.md for detailed dependency rationale.
Provider Reference:
| Provider | Purpose |
|---|---|
| AuthProvider | Holds authentication state; exposes isAuthenticated, login(), and logout() via useAuth(). Synchronizes logout events across browser tabs in real-time using the BroadcastChannel API (with storage event fallback for older browsers). When a user logs out in any tab, all other open tabs immediately reflect the logout state without requiring a page refresh. |
| TimezoneProvider | Reads the configured IANA timezone from the backend and supplies it to all children via useTimezone() |
| ThemeProvider | Manages light/dark theme selection, supplies the active Fluent UI theme to FluentProvider |
| NotificationProvider | Provides notification service via useNotification() hook; must wrap error boundaries so they can display error notifications |
| NavigationCancellationProvider | Detects route changes and automatically aborts pending API requests; call useNavigationAbortSignal() to get an AbortSignal that lives for the current route |
Theme (src/theme/)
Fluent UI custom theme definitions and design token constants. No component logic — only colours, spacing, and sizing values.
Utils (src/utils/)
Pure helper functions with no React or framework dependency. Date formatting, IP display formatting, shared constants, and cryptographic utilities.
| Utility | Purpose |
|---|---|
| formatDate.ts | Date/time formatting with IANA timezone support |
| formatIp.ts | IP address display formatting |
| crypto.ts | sha256Hex(input): SHA-256 digest via browser-native SubtleCrypto API; used to hash passwords before transmission |
| constants.ts | Frontend constants (time presets, etc.) |
4. Data Flow
4.1 Request Lifecycle
Every user action follows this flow through the system:
User Action (click, form submit)
│
▼
Page / Component
│ calls hook
▼
Hook (useXxx)
│ calls API function
▼
API Layer (src/api/)
│ HTTP request
▼
FastAPI Router (app/routers/)
│ validates input (Pydantic)
│ calls Depends() for auth + services
▼
Service (app/services/)
│ enforces business rules
│ calls repository or fail2ban client
▼
Repository (app/repositories/) or fail2ban Client (app/utils/)
│ executes SQL query │ sends socket command
▼ ▼
SQLite Database fail2ban Server
│ │
└──────────── response bubbles back up ─────┘
4.2 Authentication Flow
┌─────────┐ POST /api/auth/login ┌─────────────┐
│ Login │ ─────────────────────────────▶│ auth router │
│ Page │ { password: "***" } │ │
└─────────┘ └──────┬───────┘
│
┌──────┴───────┐
│ auth_service │
│ - verify hash │
│ - create token│
└──────┬───────┘
│
┌──────┴───────┐
│ session_repo │
│ - store token │
└──────┬───────┘
│
Set-Cookie: session=<token> │
◀─────────────────────────────────────────────────┘
- The master password is hashed and stored during setup.
- On login, the submitted password is verified against the stored hash.
- A session token is created, stored in the database, and returned as an HTTP-only cookie.
- Every subsequent request is authenticated via the session cookie using a FastAPI dependency.
- The AuthProvider on the frontend guards all routes except /setup and /login.
4.3 fail2ban Communication
BanGUI communicates with fail2ban through its Unix domain socket using the fail2ban client-server protocol.
┌────────────────────┐ ┌──────────────────┐
│ ban_service.py │ │ fail2ban server │
│ jail_service.py │──socket──│ │
│ config_service.py │ │ /var/run/fail2ban│
│ health_service.py │ │ /fail2ban.sock │
└────────────────────┘ └──────────────────┘
The fail2ban_client.py utility module wraps this communication:
- Opens an async connection to the Unix socket
- Serialises commands using the fail2ban protocol (pickle-based, see ./fail2ban-master/fail2ban/client/csocket.py)
- Parses responses into typed Python objects
- Handles connection errors gracefully (timeout, socket not found, permission denied)
Reference source: The vendored fail2ban source at ./fail2ban-master is included in the repository as an authoritative protocol reference. When implementing or debugging socket communication, consult:

| File | What it documents |
|---|---|
| ./fail2ban-master/fail2ban/client/csocket.py | CSocket class: low-level Unix socket connection, pickle serialisation, CSPROTO.END framing |
| ./fail2ban-master/fail2ban/client/fail2banclient.py | Fail2banClient: command dispatch, argument handling, response beautification |
| ./fail2ban-master/fail2ban/client/beautifier.py | Response parser: converts raw server replies into human-readable / structured output |
| ./fail2ban-master/fail2ban/protocol.py | CSPROTO constants and the full list of supported commands with descriptions |
| ./fail2ban-master/fail2ban/client/configreader.py | Config file parsing used by fail2ban: reference for understanding jail/filter structure |
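The pickle-plus-end-marker framing mentioned above can be sketched without opening a socket. Treat this as an assumption-laden illustration: the marker value below is what CSPROTO.END is believed to be and should be confirmed against protocol.py in the vendored source, and `frame_command` and `unframe_reply` are hypothetical helper names, not functions from fail2ban_client.py.

```python
import pickle

# Assumed value of CSPROTO.END; verify against the vendored protocol.py.
END_MARKER = b"<F2B_END_COMMAND>"


def frame_command(command: list[str]) -> bytes:
    """Pickle the command parts as strings and append the end marker."""
    payload = pickle.dumps([str(part) for part in command], pickle.HIGHEST_PROTOCOL)
    return payload + END_MARKER


def unframe_reply(raw: bytes):
    """Strip the end marker and unpickle the reply.

    pickle.loads must only ever see data from the trusted local
    fail2ban socket; never feed it bytes from the network.
    """
    if not raw.endswith(END_MARKER):
        raise ValueError("incomplete reply: end marker missing")
    return pickle.loads(raw[: -len(END_MARKER)])
```

A real client would keep reading from the socket until the marker appears and only then call the unframing step.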
Key commands used:
| Command | Purpose |
|---|---|
| status | Get global server status (number of jails, fail2ban version) |
| status <jail> | Get jail detail (banned IPs, failure count, filter info) |
| set <jail> banip <ip> | Ban an IP in a specific jail |
| set <jail> unbanip <ip> | Unban an IP from a specific jail |
| set <jail> idle on/off | Toggle jail idle mode |
| start/stop <jail> | Start or stop a jail |
| reload <jail> | Reload a single jail configuration |
| reload | Reload all jails |
| get <jail> ... | Read jail settings (findtime, bantime, maxretry, filter, actions, etc.) |
| set <jail> ... | Write jail settings |
| set loglevel <level> | Change server log level |
| set logtarget <target> | Change server log target |
| set dbpurgeage <seconds> | Set database purge age |
| flushlogs | Flush and re-open log files |
4.4 fail2ban Database Access
In addition to the live socket, BanGUI reads the fail2ban SQLite database directly for historical data that the socket protocol does not expose (ban history, past log matches). This is read-only access.
history_service.py ──read-only──▶ fail2ban.db (SQLite)
The fail2ban database contains:
- bans table: historical ban records (IP, jail, timestamp, ban data)
- jails table: jail definitions
- logs table: matched log lines per ban
BanGUI queries these tables to power the Ban History page and the per-IP timeline view.
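Read-only access can be enforced at the connection level rather than by convention. The sketch below uses the standard-library sqlite3 module (the document's services use an async driver) and SQLite's `mode=ro` URI parameter. The helpers `open_read_only` and `recent_bans` are hypothetical, and the column names `jail`, `ip`, and `timeofban` are assumptions about the fail2ban schema that should be checked against the real database.

```python
import sqlite3
from pathlib import Path


def open_read_only(path: str) -> sqlite3.Connection:
    """Open an existing SQLite file so that any write attempt fails."""
    uri = f"{Path(path).resolve().as_uri()}?mode=ro"
    return sqlite3.connect(uri, uri=True)


def recent_bans(conn: sqlite3.Connection, jail: str, limit: int = 10) -> list[tuple[str, int]]:
    """Return (ip, timeofban) pairs for one jail, newest first."""
    rows = conn.execute(
        "SELECT ip, timeofban FROM bans WHERE jail = ? ORDER BY timeofban DESC LIMIT ?",
        (jail, limit),
    ).fetchall()
    return [(ip, ts) for ip, ts in rows]
```

With `mode=ro`, an accidental INSERT, UPDATE, or DELETE raises `sqlite3.OperationalError` instead of modifying data that fail2ban owns.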
4.5 External API Communication
geo_service.py ──aiohttp──▶ IP Geolocation API (country, ASN, RIR)
blocklist_service.py ──aiohttp──▶ Blocklist URLs (plain-text IP lists)
All external HTTP calls go through a shared aiohttp.ClientSession created during startup and closed during shutdown. External data is validated before use (IP format, response structure).
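As an illustration of validating external data before use, a plain-text blocklist can be filtered with the standard-library ipaddress module. This is a sketch, not blocklist_service itself: the function name `parse_blocklist`, the `#` comment convention, and the acceptance of CIDR ranges are assumptions made for the example.

```python
import ipaddress


def parse_blocklist(text: str) -> tuple[list[str], int]:
    """Return (valid entries, number of skipped lines).

    Blank lines and '#' comments are ignored; anything that is not a
    valid IPv4/IPv6 address or network is counted as skipped.
    """
    valid: list[str] = []
    skipped = 0
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        try:
            net = ipaddress.ip_network(line, strict=False)
        except ValueError:
            skipped += 1
            continue
        # Single hosts are stored without the /32 or /128 suffix.
        valid.append(str(net.network_address) if net.num_addresses == 1 else str(net))
    return valid, skipped
```

The two return values correspond naturally to the ips_imported and ips_skipped columns recorded for each import run.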
5. Database Design
BanGUI maintains its own SQLite database (separate from the fail2ban database) to store application state.
5.1 Application Database Tables
| Table | Purpose |
|---|---|
| settings | Key-value store for application configuration (master password hash, fail2ban socket path, database path, timezone, session duration) |
| sessions | Active session token hashes with expiry timestamps. Tokens are stored as one-way SHA256 hashes to prevent token hijacking if the database is exposed. |
| geo_cache | Resolved IP geolocation results (ip, country_code, country_name, asn, org, cached_at, last_seen). Tracks the last time each IP address was referenced to enable retention policies. Entries older than 90 days are automatically purged by the geo_cache_cleanup task to prevent unbounded growth. Loaded into memory at startup via load_cache_from_db(); new entries are flushed back by the geo_cache_flush background task. |
| blocklist_sources | Registered blocklist URLs (id, name, url, enabled, created_at, updated_at) |
| import_logs | Record of every blocklist import run (id, source_id, timestamp, ips_imported, ips_skipped, errors, status) |
5.2 Database Boundaries
| Database | Owner | BanGUI Access |
|---|---|---|
| BanGUI application DB (bangui.db) | BanGUI | Read + Write |
| fail2ban DB (fail2ban.db) | fail2ban | Read-only (for history queries) |
6. Setup & Configuration Persistence
6.1 Initial Setup Wizard & One-Time Configuration
The setup wizard (POST /api/setup) runs once during first-time startup to configure:
- Master password (bcrypt-hashed)
- Runtime database path (where BanGUI stores operational state)
- fail2ban Unix socket path
- IANA timezone
- Session duration (in minutes)
- Map color thresholds for geolocation visualization
Atomicity & Crash-Safety:
Setup is implemented with explicit transaction boundaries across two SQLite databases (bootstrap config DB and runtime app DB) to ensure atomicity:
- Phase 1 (Bootstrap DB transaction): Set setup_state = "in_progress" and persist database_path. This commit is the first checkpoint: if the process crashes after it, the next setup attempt will detect the partial state and clean up.
- Phase 2 (Filesystem + Runtime DB): Initialize runtime database schema outside a transaction (idempotent via CREATE TABLE IF NOT EXISTS).
- Phase 3 (Runtime DB transaction): Batch-write all runtime settings (password hash, paths, config) atomically in a single BEGIN IMMEDIATE ... COMMIT transaction. Either all settings are persisted or none are.
- Phase 4 (Bootstrap DB transaction): Set setup_state = "complete" and setup_completed = "1". This is the final commit point: setup is considered complete only when this commit succeeds.
Password Hash Idempotency:
The bcrypt password hash is computed early (before any DB writes) to ensure that if setup is retried after a crash, the same hash is used throughout all retry attempts. This prevents divergent hashes due to bcrypt's random salt generation.
State Machine:
| State | Meaning | Recovery |
|---|---|---|
| null | Setup not started | Normal flow: begin setup |
| "in_progress" | Bootstrap DB marked, runtime DB being initialized | Retry from beginning (runtime DB may be partial) |
| "complete" | All settings persisted, setup finished | Skip setup (already done) |
If the next startup finds setup_state still set to "in_progress", a previous attempt crashed partway through; cleanup logic can then either retry setup directly or remove the partial runtime database before retrying.
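The four phases can be sketched with two SQLite connections and explicit transactions. This is an illustrative model using the standard-library sqlite3 module, not the project's setup code: the key-value `settings` table layout, the helper names (`immediate_tx`, `put`, `get`, `run_setup`), and the pre-computed hash passed in as a plain string are all assumptions.

```python
import sqlite3
from contextlib import contextmanager

SCHEMA = "CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT)"


@contextmanager
def immediate_tx(db: sqlite3.Connection):
    """BEGIN IMMEDIATE ... COMMIT, rolling back on any error.

    Expects a connection opened with isolation_level=None (autocommit).
    """
    db.execute("BEGIN IMMEDIATE")
    try:
        yield
    except BaseException:
        db.execute("ROLLBACK")
        raise
    else:
        db.execute("COMMIT")


def put(db: sqlite3.Connection, key: str, value) -> None:
    db.execute("INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)", (key, value))


def get(db: sqlite3.Connection, key: str):
    row = db.execute("SELECT value FROM settings WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


def run_setup(bootstrap, runtime, runtime_path: str, settings: dict) -> None:
    bootstrap.execute(SCHEMA)
    with immediate_tx(bootstrap):          # Phase 1: first checkpoint
        put(bootstrap, "setup_state", "in_progress")
        put(bootstrap, "database_path", runtime_path)
    runtime.execute(SCHEMA)                # Phase 2: idempotent schema init
    with immediate_tx(runtime):            # Phase 3: all settings or none
        for key, value in settings.items():
            put(runtime, key, value)
    with immediate_tx(bootstrap):          # Phase 4: final commit point
        put(bootstrap, "setup_state", "complete")
        put(bootstrap, "setup_completed", "1")
```

If Phase 3 raises, the runtime transaction is rolled back and the bootstrap database still reads "in_progress", which matches the recovery row in the state table.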
Backward Compatibility:
The setup_completed = "1" key is still written for backward compatibility with cache detection. Modern code checks setup_state = "complete" for clearer semantics.
8. Authentication & Session Management
- Single-user model — one master password, no usernames.
- Password is hashed with a strong algorithm (e.g., bcrypt or argon2) and stored in the application database during setup.
- Sessions are token-based, stored server-side in the
sessionstable as one-way SHA256 hashes, and delivered to the browser as HTTP-only secure cookies. - Session token hashing — Session tokens are hashed before storage to prevent token hijacking if the database file is exposed. Only the hash (
`token_hash`) is stored in the database; the raw token is never persisted. When validating a session, the incoming token is hashed before the database lookup. This ensures the database alone is not sufficient to usurp a session: an attacker would also need knowledge of the original token value.
- Session expiry is configurable (set during setup, stored in `settings`).
- The frontend `AuthProvider` checks session validity on mount and redirects to `/login` if invalid.
- The backend `dependencies.py` provides an `authenticated` dependency that validates the session cookie on every protected endpoint.
- Session validation cache (`InMemorySessionCache` in `app.utils.session_cache`): validated session tokens are cached in memory for 10 seconds (configurable via `session_cache_ttl_seconds`) to avoid a SQLite round-trip on every request from the same browser. The cache is invalidated immediately on logout. ⚠️ This cache is process-local and not safe for multi-worker or distributed deployments. In single-worker mode (enforced by TASK-002), this is safe and improves performance. For multi-worker deployments, replace `InMemorySessionCache` with a shared backend (Redis, database, shared memory) implementing the `SessionCache` protocol. See the `app/utils/session_cache.py` module docstring for implementation details.
- GeoCache: a `GeoCache` instance is created at startup with a configurable `allow_http_fallback` flag and stored on `app.state.geo_cache`. It implements a primary + fallback resolution strategy: (1) try the local MaxMind GeoLite2-Country MMDB database (primary, encrypted, no network traffic), (2) if unavailable/no result and allowed, fall back to the ip-api.com HTTP API (unencrypted, disabled by default for security). It encapsulates an in-memory lookup cache, a negative cache for unresolvable IPs (5-minute TTL), a dirty set for persistence, and thread-safe async locking. The cache is loaded from the `geo_cache` SQLite table on startup. New resolutions are accumulated in memory and periodically flushed to the database by the `geo_cache_flush` background task. Stale entries are re-resolved by the `geo_re_resolve` task. The instance is injected into routes and tasks via FastAPI's dependency system. See Backend-Development.md § IP Geolocation Resolution for setup and security details.
- Runtime state (`RuntimeState` in `app.utils.runtime_state`): stores mutable application state: `server_status` (fail2ban online/offline), `last_activation` (jail activation tracking), `pending_recovery` (crash detection), `runtime_settings` (effective configuration), and service-specific state holders like `jail_service_state` (`JailServiceState` for the jail capability detection cache). RuntimeState fields are managed through dedicated functions (e.g., `record_activation()`, `clear_pending_recovery()`) and via dependency injection to services. Service-specific state (like `JailServiceState`) is nested within `RuntimeState` to keep all mutable state in one controlled location. ⚠️ RuntimeState is process-local and only safe when BanGUI runs as a single asyncio worker. Mutations must not span `await` points (cooperative scheduling within a single event loop is safe). In multi-worker deployments, each process has its own copy: logouts from worker A don't affect worker B's cache, health status updates are per-worker, and activation tracking is unreliable. BanGUI enforces single-worker mode (TASK-002) to prevent this issue. For future multi-worker support, replace RuntimeState with a shared coordination backend (Redis, shared memory, database). See the `app/utils/runtime_state.py` module docstring for details.
- Setup-completion flag: once `is_setup_complete()` returns `True`, the result is stored in `app.state._setup_complete_cached`. The `SetupRedirectMiddleware` skips the DB query on all subsequent requests, removing 1 SQL query per request for the common post-setup case. The completion flag is only written after the runtime database is successfully initialized and all initial setup settings are persisted, preventing a failed setup from permanently bypassing the setup wizard.
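The hashed-token lookup and the short-lived validation cache described above can be sketched as follows. This is a minimal illustration under stated assumptions, not BanGUI's implementation: SHA-256 is an assumed hash function, and `hash_token`, `TtlSessionCache`, and the injectable `clock` parameter are hypothetical names chosen for the example.

```python
import hashlib
import time
from typing import Callable, Dict


def hash_token(raw_token: str) -> str:
    """Return the digest stored in place of the raw token (SHA-256 is an assumption)."""
    return hashlib.sha256(raw_token.encode("utf-8")).hexdigest()


class TtlSessionCache:
    """Hypothetical process-local cache of validated token hashes with a fixed TTL."""

    def __init__(
        self,
        ttl_seconds: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expiry_by_hash: Dict[str, float] = {}

    def mark_valid(self, token_hash: str) -> None:
        # Remember a successful database validation until the TTL elapses.
        self._expiry_by_hash[token_hash] = self._clock() + self._ttl

    def is_valid(self, token_hash: str) -> bool:
        expires_at = self._expiry_by_hash.get(token_hash)
        if expires_at is None:
            return False
        if self._clock() >= expires_at:
            # Expired: drop the entry so the next request goes back to the database.
            del self._expiry_by_hash[token_hash]
            return False
        return True

    def invalidate(self, token_hash: str) -> None:
        # Called on logout so a cached session cannot outlive its database row.
        self._expiry_by_hash.pop(token_hash, None)
```

Because the dictionary lives inside one process, a second worker would keep serving a logged-out session until its own entry expired, which is the limitation the warning above describes.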
8.1 CSRF Protection
State-mutating endpoints (POST, PUT, DELETE, PATCH) that use cookie-based authentication are protected against Cross-Site Request Forgery (CSRF) attacks via a custom header check middleware.
Design:
- For requests authenticated via the session cookie (not Bearer token), the `CsrfMiddleware` requires the custom header `X-BanGUI-Request: 1` to be present.
- The frontend API client automatically includes this header on all requests.
- Cross-site `fetch()` calls cannot set custom headers without CORS preflight, which the backend rejects for non-allowed origins, providing defense-in-depth.
- Safe HTTP methods (GET, HEAD, OPTIONS) bypass the check.
- Bearer token authentication (via the `Authorization: Bearer` header) bypasses the check because tokens are not CSRF-vulnerable (they are not automatically sent on cross-origin requests).
- Requests missing the CSRF header receive a `403 Forbidden` response with detail: `"CSRF validation failed. Request rejected."`.
This mechanism complements the existing SameSite=Lax cookie policy, which blocks traditional <form> POST requests but does not protect against JavaScript-initiated requests on a subdomain or same-origin XSS injection.
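The decision rules listed under Design can be expressed as one pure function. The sketch below is framework-free and only illustrates the rule order; `passes_csrf_check` and its parameters are hypothetical names, and letting requests without a session cookie through (so that the authentication layer rejects them instead) is an assumption rather than documented behaviour.

```python
from typing import Mapping

SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})
CSRF_HEADER = "x-bangui-request"  # compared in lower case; HTTP header names are case-insensitive


def passes_csrf_check(
    method: str,
    headers: Mapping[str, str],
    has_session_cookie: bool,
) -> bool:
    """Return True when the request may proceed, False when it should get 403."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if method.upper() in SAFE_METHODS:
        return True  # safe methods bypass the check
    if lowered.get("authorization", "").startswith("Bearer "):
        return True  # bearer tokens are not sent automatically by browsers
    if not has_session_cookie:
        return True  # assumption: nothing cookie-authenticated to protect here
    return lowered.get(CSRF_HEADER) == "1"
```

A middleware built on this would respond with `403 Forbidden` whenever the function returns `False`.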
9. Scheduling
APScheduler 4.x (async mode) manages recurring background tasks.
┌──────────────────────┐
│ APScheduler │
│ (async, in-process) │
├──────────────────────┤
│ blocklist_import │ ── runs on configured schedule (default: daily 03:00)
│ geo_cache_cleanup │ ── runs every 24 hours (nightly)
│ geo_cache_flush │ ── runs every 60 seconds
│ health_check │ ── runs every 30 seconds
└──────────────────────┘
- The scheduler is started during the FastAPI lifespan startup and stopped during shutdown.
- Job schedules are persisted in the application database so they survive restarts.
- Users can modify the blocklist import schedule through the web interface.
- A manual "Run Now" button triggers the blocklist import job outside the schedule.
10.1 Background Tasks and Database Access
- APScheduler jobs run outside FastAPI request/response scope and therefore cannot rely on `Depends(get_db)`.
- Background tasks must open their own application database connection via `app.db.open_db` and close it when the work completes.
- Use a shared task helper (`app.tasks.db.task_db`) so every task follows the same async context manager pattern and avoids connection leaks.
- This pattern is intentional: task code is structurally separate from request-handling dependencies and should not attempt to reuse request-scoped DB connections.
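The async context manager pattern mentioned above might look like the following. This is a sketch, not the contents of `app.tasks.db`: it uses the standard-library `sqlite3` module pushed onto a worker thread as a stand-in for whatever async driver the project actually uses, and the `path` parameter and `example_task` function are hypothetical.

```python
import asyncio
import sqlite3
from contextlib import asynccontextmanager
from typing import AsyncIterator


@asynccontextmanager
async def task_db(path: str) -> AsyncIterator[sqlite3.Connection]:
    """Open a task-owned connection and always close it, even if the task fails."""
    # check_same_thread=False because the connection is used from worker threads.
    conn = await asyncio.to_thread(sqlite3.connect, path, check_same_thread=False)
    try:
        yield conn
    finally:
        await asyncio.to_thread(conn.close)


async def example_task(path: str) -> int:
    """A stand-in for a scheduled job that needs the application database."""
    async with task_db(path) as db:
        row = await asyncio.to_thread(lambda: db.execute("SELECT 1").fetchone())
        return row[0]
```

Keeping the open and close in one helper means an individual task cannot forget the `finally` branch, which is where connection leaks usually come from.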
9. API Design
9.1 Conventions
- All endpoints are grouped under the `/api/` prefix.
- JSON request and response bodies, validated by Pydantic models.
- Authentication via session cookie on all endpoints except `/api/setup` and `/api/auth/login`.
- Setup-redirect middleware: while no configuration exists, all API endpoints (except `/api/setup` and `/api/health`) return `423 Locked` with `{"detail": "Setup not complete.", "setup_required": true}`. This ensures API consumers can detect setup as a distinct condition rather than transparently following redirects.
- Standard HTTP status codes: `200` success, `201` created, `204` no content, `400` bad request, `401` unauthorized, `404` not found, `422` validation error, `423` locked, `500` server error.
- Error responses follow a consistent shape: `{ "detail": "Human-readable message" }`.
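The setup-redirect rule above reduces to a small decision function. The sketch below is illustrative only: `setup_gate` is a hypothetical name, and treating sub-paths of the exempt endpoints as exempt, as well as ignoring non-`/api/` paths, are assumptions.

```python
from typing import Optional, Tuple

EXEMPT_PATHS = ("/api/setup", "/api/health")
LOCKED_BODY = {"detail": "Setup not complete.", "setup_required": True}


def setup_gate(path: str, setup_complete: bool) -> Optional[Tuple[int, dict]]:
    """Return (status, body) to short-circuit the request, or None to let it through."""
    if setup_complete or not path.startswith("/api/"):
        return None
    if any(path == p or path.startswith(p + "/") for p in EXEMPT_PATHS):
        return None
    return 423, LOCKED_BODY
```

A client can then branch on the `423` status or on `setup_required` instead of guessing from a redirect.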
9.2 Endpoint Groups
| Group | Endpoints | Description |
|---|---|---|
| Auth | `POST /login`, `POST /logout` | Session management |
| Setup | `POST /setup` | First-run configuration |
| Dashboard | `GET /status`, `GET /bans` | Overview data for the main page |
| Jails | `GET /`, `GET /:name`, `POST /:name/start`, `POST /:name/stop`, `POST /:name/reload`, `POST /reload-all` | Jail listing and controls |
| Bans | `POST /ban`, `POST /unban`, `POST /unban-all`, `GET /banned` | Ban management |
| Config | `GET /`, `PUT /`, `POST /test-regex` | Configuration viewing and editing |
| History | `GET /`, `GET /ip/:ip` | Historical ban browsing |
| Blocklists | `GET /sources`, `POST /sources`, `DELETE /sources/:id`, `POST /import`, `GET /import-log` | Blocklist management |
| Geo | `GET /lookup/:ip` | IP geolocation and enrichment |
| Server | `GET /settings`, `PUT /settings`, `POST /flush-logs` | Server-level settings |
9. Deployment Architecture
┌──────────────────────────────────────────────────┐
│ Host Machine │
│ │
│ ┌─────────────────────────────────────────────┐ │
│ │ Reverse Proxy (nginx / caddy) │ │
│ │ - TLS termination │ │
│ │ - /api/* → backend (uvicorn) │ │
│ │ - /* → frontend (static files) │ │
│ └──────────────┬───────────────┬──────────────┘ │
│ │ │ │
│ ┌──────────────┴───┐ ┌───────┴──────────────┐ │
│ │ Backend │ │ Frontend │ │
│ │ uvicorn + FastAPI │ │ Static build (Vite) │ │
│ │ (port 8000) │ │ (served by proxy) │ │
│ └────────┬──────────┘ └──────────────────────┘ │
│ │ │
│ ┌────────┴──────────────────────────────────┐ │
│ │ fail2ban (systemd service) │ │
│ │ Socket: /var/run/fail2ban/fail2ban.sock │ │
│ │ Database: /var/lib/fail2ban/fail2ban.db │ │
│ └───────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘
- The backend runs as an ASGI server (uvicorn) behind a reverse proxy.
- The frontend is built to static files by Vite and served directly by the reverse proxy.
- The backend process needs read access to the fail2ban socket and the fail2ban database.
- Both the application database and the fail2ban database reside on the same host.
10.2 nginx Routing Rules
The reverse proxy (nginx) must route requests correctly to prevent frontend SPA fallback rules from hiding backend 404 errors. The following location blocks ensure proper behavior:
Location Block Priority
nginx selects the location block for a request in the following priority order; among plain prefix locations, the longest matching prefix wins:
- Exact matches (`location =`): highest priority
- Regular expression matches (`location ~`): second priority
- Prefix matches (`location /prefix`): matched in order of specificity (longest first)
- Catch-all (`location /`): lowest priority
Routing Configuration
| Location Block | Rule | Purpose |
|---|---|---|
| `location /api/` | `proxy_pass http://backend:8000;` (no `try_files`) | Proxy all API requests to FastAPI backend. Any unmatched API route (typos, invalid paths) returns 404 from the backend. |
| `location /assets/` | `try_files $uri =404;` | Serve static assets with long-term caching. Return 404 if file doesn't exist. |
| `location /` | `try_files $uri $uri/ /index.html;` | SPA fallback: serve index.html for all unmatched routes (client-side routing). |
Routing Behavior
Request → /api/some-endpoint
↓
nginx matches location /api/ (longest prefix)
↓
proxy_pass → backend:8000
↓
Backend returns 404 if endpoint doesn't exist (✓ correct)
Client sees 404, not SPA HTML
Request → /some-page
↓
nginx matches location / (catch-all)
↓
try_files looks for file, then directory, then /index.html
↓
Serves /index.html (React Router handles client-side routing)
↓
Client sees 200 with HTML (✓ correct for SPA)
Request → /api/typos
↓
nginx matches location /api/ (longest prefix, NOT catch-all)
↓
proxy_pass → backend:8000
↓
FastAPI returns 404 (✓ correct, not caught by SPA fallback)
Critical Implementation Notes
- Never add `try_files` to the `/api/` location block; this would hide backend 404s.
- The `/api/` location takes precedence over the `/` catch-all regardless of where each block appears in the config, because nginx selects the longest matching prefix.
- No inherited `try_files` rules: the `/api/` location has no global `try_files` that could affect it.
- Backend 404 responses pass through nginx unchanged; nginx does not rewrite 404 responses from the backend.
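To make the longest-prefix behaviour concrete, the sketch below models only nginx's plain prefix locations (exact `=` and regex `~` locations are deliberately left out). The function name `select_prefix_location` is hypothetical; the point is that the result does not depend on the order in which the blocks are declared.

```python
from typing import Sequence


def select_prefix_location(path: str, prefixes: Sequence[str]) -> str:
    """Return the longest configured prefix that the request path starts with."""
    matches = [prefix for prefix in prefixes if path.startswith(prefix)]
    if not matches:
        raise LookupError(f"no location matches {path!r}")
    return max(matches, key=len)


# Declaration order does not change the outcome.
locations_a = ["/", "/api/", "/assets/"]
locations_b = ["/api/", "/assets/", "/"]
api_choice = select_prefix_location("/api/typos", locations_a)
page_choice = select_prefix_location("/some-page", locations_b)
```

Here `/api/typos` resolves to the `/api/` block in either ordering, so the request is proxied to the backend and never reaches the SPA fallback.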
9.2a nginx Security Headers
nginx adds the following OWASP-recommended security headers to all responses:
| Header | Value | Purpose |
|---|---|---|
| `Content-Security-Policy` | `default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data:; font-src 'self'; connect-src 'self'; frame-ancestors 'none';` | Prevents XSS attacks by restricting script execution to same-origin. `style-src 'unsafe-inline'` is required for Fluent UI v9's inline styles. |
| `X-Frame-Options` | `DENY` | Prevents clickjacking by disallowing iframe embedding. |
| `X-Content-Type-Options` | `nosniff` | Prevents MIME-sniffing; browsers must respect the declared Content-Type. |
| `Referrer-Policy` | `no-referrer` | Prevents leaking internal URLs in the Referer header to third-party resources. |
| `Permissions-Policy` | `geolocation=(), microphone=(), camera=()` | Disables access to browser APIs not needed by the application. |
| `Strict-Transport-Security` | Commented out | Must only be enabled after HTTPS is fully configured. Uncomment when TLS termination is production-ready. |
All headers use the always directive, ensuring they are included in error responses (4xx, 5xx) as well.
CSP and Fluent UI
Fluent UI v9 applies styles via inline style attributes on DOM elements. To support this, style-src 'unsafe-inline' is required. A stricter CSP using nonces would require server-side rendering of the HTML shell, which is outside the current architecture.
9.3 Deployment Constraints
Single-Executor Scheduler Requirement
BanGUI's background scheduler must run with exactly one executor process.
The application uses APScheduler's AsyncIOScheduler, which is bound to a single asyncio event loop and cannot be safely shared across multiple worker processes. If the app is deployed with --workers N (where N > 1), the following failures occur:
- Each worker process creates its own independent scheduler instance.
- All background jobs execute N times simultaneously (once per worker).
- Results:
  - Duplicate blocklist imports: the same IP ranges are banned N times.
  - Duplicate history entries: the same historical events are recorded N times.
  - Duplicate ban operations: bans are executed multiple times, with potential state conflicts.
  - SQLite lock contention: concurrent writes to the same database from N workers cause lock timeouts.
Enforcement Mechanism
BanGUI enforces single-executor safety through a database-backed lock that works reliably in container orchestration environments:
- Fast check (env var): On startup, the `BANGUI_WORKERS` environment variable is checked (if set). If explicitly set to a value > 1, startup fails immediately with a clear error.
- Authoritative check (database lock): During startup, BanGUI acquires an atomic database lock in the `scheduler_lock` table. This lock:
  - Uses a singleton row (id=1) to prevent race conditions across simultaneously starting instances
  - Stores the PID, hostname, creation timestamp, and heartbeat timestamp of the lock holder
  - Is considered stale if the heartbeat hasn't been updated for 60 seconds
  - Is automatically cleaned up on stale instance detection, allowing failover in rolling deployments
- Lock acquisition (startup):
  - Clean up any stale locks (heartbeat older than 60 seconds)
  - Attempt to insert a new lock row with this instance's PID and hostname
  - If the INSERT fails (row already exists), reject startup with a clear error
  - If the INSERT succeeds, this instance holds the lock and will start the scheduler
- Lock maintenance (runtime): A periodic background task (`scheduler_lock_heartbeat`) updates the lock's heartbeat timestamp every 10 seconds, keeping it alive and preventing false positives from temporary load spikes.
- Lock release (shutdown): On graceful shutdown, the lock is released, allowing other instances to acquire it.
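The acquire, heartbeat, and release steps above can be sketched with the standard-library `sqlite3` module. This is an illustration under stated assumptions, not BanGUI's code: only the table name `scheduler_lock`, the singleton `id = 1`, the `heartbeat_at` column, and the 60-second staleness threshold come from this document; the remaining column names, the function names, and the injectable `now` parameter are hypothetical.

```python
import os
import socket
import sqlite3
import time
from typing import Optional

STALE_AFTER_SECONDS = 60  # staleness threshold stated in the text


def init_lock_table(conn: sqlite3.Connection) -> None:
    # CHECK (id = 1) plus the primary key allows at most one lock row.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scheduler_lock ("
        " id INTEGER PRIMARY KEY CHECK (id = 1),"
        " pid INTEGER NOT NULL,"
        " hostname TEXT NOT NULL,"
        " created_at INTEGER NOT NULL,"
        " heartbeat_at INTEGER NOT NULL)"
    )


def try_acquire_lock(conn: sqlite3.Connection, now: Optional[int] = None) -> bool:
    """Remove a stale lock, then try to insert our own row; True if we hold the lock."""
    now = int(time.time()) if now is None else now
    try:
        with conn:  # cleanup and insert commit (or roll back) together
            conn.execute(
                "DELETE FROM scheduler_lock WHERE ? - heartbeat_at > ?",
                (now, STALE_AFTER_SECONDS),
            )
            conn.execute(
                "INSERT INTO scheduler_lock (id, pid, hostname, created_at, heartbeat_at)"
                " VALUES (1, ?, ?, ?, ?)",
                (os.getpid(), socket.gethostname(), now, now),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # a live lock row already exists


def heartbeat(conn: sqlite3.Connection, now: Optional[int] = None) -> bool:
    """Refresh the heartbeat of the row owned by this process; False if we lost it."""
    now = int(time.time()) if now is None else now
    with conn:
        cursor = conn.execute(
            "UPDATE scheduler_lock SET heartbeat_at = ? WHERE id = 1 AND pid = ?",
            (now, os.getpid()),
        )
    return cursor.rowcount == 1


def release_lock(conn: sqlite3.Connection) -> None:
    with conn:
        conn.execute("DELETE FROM scheduler_lock WHERE id = 1 AND pid = ?", (os.getpid(),))
```

The uniqueness constraint, not a prior `SELECT`, decides who wins, so two instances starting at the same moment cannot both succeed. Matching on PID alone in `heartbeat` and `release_lock` is a simplification; a real implementation would likely also match the hostname.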
Why database-backed instead of filesystem?
Database-backed locking is more reliable in container orchestration because:
- Atomicity: SQLite transactions are atomic — no race condition window between checking and inserting
- Container-safe: Works across containers with shared database volumes (no NFS/SMB edge cases)
- Stale detection: Heartbeat-based TTL is simpler and more reliable than PID-based checks (PID reuse is common in containers)
- No false positives: Timestamp-based expiration eliminates issues with PID reuse
Startup Sequence with Scheduler Lock
1. DATABASE stage
└─ Initialize SQLite schema (including scheduler_lock table)
2. WORKER_MODE stage (formerly first, now depends on DATABASE)
├─ Fast check: Verify BANGUI_WORKERS env var if explicitly set
└─ Authoritative check: Acquire scheduler lock in database
→ If lock held by another instance: Fail with clear error
→ If lock acquired: Continue to GEO_CACHE stage
3. (rest of startup continues as normal)
Troubleshooting
Problem: Startup fails with "Could not acquire scheduler lock"
Solution:
- Verify no other BanGUI instances are running
- Inspect the lock: `sqlite3 bangui.db "SELECT * FROM scheduler_lock;"`
- Check who holds the lock (hostname, PID, heartbeat time)
- If stale (heartbeat older than 60 seconds), clean it: `sqlite3 bangui.db "DELETE FROM scheduler_lock WHERE (strftime('%s', 'now') - heartbeat_at) > 60;"`
- Retry the failed instance
Problem: Stale lock after instance crash
BanGUI handles this automatically:
- The next instance to start will detect the stale lock (heartbeat older than 60 seconds)
- It will clean it up and acquire the lock
- The new instance starts the scheduler as normal
No manual intervention is required.
Environment Variables
- `BANGUI_WORKERS` (optional, default: unset)
  - If set to `1` or unset: Normal operation (any number of instances may start, but only one holds the lock)
  - If set to > `1`: Startup fails immediately with an error (fast check)
  - Reason: Legacy env var for explicitly forbidding multi-worker deployments
Container Orchestration Examples
Docker Compose:
- Single service instance (no scaling) — scheduler runs normally
Kubernetes:
- Single Pod replica — scheduler runs normally
- Multiple Pod replicas (during rolling update) — old Pod releases lock on shutdown, new Pod acquires it
- No duplicate jobs, no startup failures
- Health check should allow 30-60 seconds for lock handoff
systemd / process manager:
- Single process — scheduler runs normally
- Accidental multi-process restart — lock prevents duplicate jobs, other processes fail to start scheduler
Future Multi-Worker Support
To safely support multiple workers in the future:
- External job store: Move APScheduler from in-memory to a persistent store (e.g., SQLAlchemy-backed job store with PostgreSQL or Redis).
- Distributed locking: Use a distributed lock (Redis, etcd) instead of database lock for better performance.
- Process coordination: Implement a process-to-worker pool communication mechanism so the scheduler runs only on one designated worker.
Currently, the single-executor approach is simple, maintainable, and sufficient for BanGUI's operational requirements. The database lock provides reliable enforcement across all deployment scenarios.
10. Observability & Distributed Tracing
BanGUI implements distributed tracing via correlation IDs to correlate errors and requests across frontend and backend systems.
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Frontend (React + TypeScript) │
├─────────────────────────────────────────────────────────────┤
│ • API Client generates session-scoped UUID4 (correlation ID)│
│ • Telemetry service records structured events │
│ • Error boundaries catch render errors │
│ • All telemetry events include correlation ID for tracing │
└────────────────────┬────────────────────────────────────────┘
│
├─ Every request includes
│ X-Correlation-ID header
│
┌────────────────────┴────────────────────────────────────────┐
│ Backend (Python + FastAPI + structlog) │
├─────────────────────────────────────────────────────────────┤
│ • CorrelationIdMiddleware extracts/generates correlation ID │
│ • All logs automatically include correlation ID │
│ • Error responses include correlation_id field │
│ • structlog outputs JSON with correlation ID in all events │
└─────────────────────────────────────────────────────────────┘
Correlation ID Flow
- Frontend → Backend:
  - API client generates/retrieves session-scoped UUID4
  - UUID4 sent in `X-Correlation-ID` request header
  - All requests use same session UUID (set once, reused)
- Backend Processing:
  - CorrelationIdMiddleware extracts/generates correlation ID
  - ID stored in structlog contextvars
  - All structured log entries include correlation ID automatically
  - Error responses include `correlation_id` field in JSON
- Backend → Frontend:
  - Response includes `X-Correlation-ID` header
  - Error responses include `correlation_id` in response body
  - Frontend error handlers extract correlation ID
- Frontend Error Logging:
  - Error handlers extract correlation ID from API response
  - Telemetry service logs error with correlation ID
  - Browser console and telemetry backends receive linked events
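The backend half of this flow (take the incoming ID or generate one, keep it in a context variable, attach it to every log record) can be sketched with the standard library alone. This is not the code in `app/middleware/correlation.py`; `correlation_id_var`, `bind_correlation_id`, and `CorrelationIdFilter` are hypothetical names, and the stdlib `logging` module stands in for whichever logging layer the backend uses.

```python
import contextvars
import logging
import uuid
from typing import Mapping

# Holds the ID for the current request; "-" marks log lines emitted outside a request.
correlation_id_var = contextvars.ContextVar("correlation_id", default="-")


def bind_correlation_id(headers: Mapping[str, str]) -> str:
    """Reuse the client's X-Correlation-ID if present, otherwise generate a UUID4."""
    lowered = {name.lower(): value for name, value in headers.items()}
    correlation_id = lowered.get("x-correlation-id") or str(uuid.uuid4())
    correlation_id_var.set(correlation_id)
    return correlation_id


class CorrelationIdFilter(logging.Filter):
    """Copy the current correlation ID onto every record passing through a handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id_var.get()
        return True
```

A middleware would call `bind_correlation_id(request.headers)` at the start of each request and echo the returned value in the `X-Correlation-ID` response header; a formatter can then emit `record.correlation_id` as a field in each log line.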
Example: Correlating an Error Across Systems
Scenario: User clicks "Ban IP" button → API returns 500 error → error logged and displayed
Frontend telemetry event:
{
"event": "api_error",
"severity": "error",
"message": "Server error banning IP",
"correlation_id": "550e8400-e29b-41d4-a716-446655440000",
"context": {
"status": 500,
"endpoint": "/api/bans"
},
"timestamp": "2025-04-30T18:30:00.000Z"
}
Backend structured log:
{
"event": "ban_service_error",
"severity": "error",
"message": "Failed to ban IP",
"correlation_id": "550e8400-e29b-41d4-a716-446655440000",
"context": {
"ip": "192.168.1.1",
"jail": "sshd",
"error": "fail2ban socket error"
},
"timestamp": "2025-04-30T18:30:00.000Z"
}
Troubleshooting: Engineer searches logs for correlation ID 550e8400-e29b-41d4-a716-446655440000 and finds all related events (request received, jail lookup, fail2ban call, error response) in order.
Implementation Details
Backend:
- Middleware: `app/middleware/correlation.py`
  - Generates UUID4 if `X-Correlation-ID` header missing
  - Stores in structlog contextvars for automatic inclusion in all logs
  - Adds correlation ID to response header and error responses
- All error handlers include `correlation_id` in `ErrorResponse`
- See `backend/app/models/response.py` for the `ErrorResponse.correlation_id` field
Frontend:
- API client: `frontend/src/api/client.ts`
  - Generates session-scoped UUID4 on first use
  - Includes in `X-Correlation-ID` header for all requests
  - Extracts from response headers and stores in `ApiError`
- Telemetry service: `frontend/src/utils/telemetry.ts`
  - Structured event logging with correlation ID support
  - Redaction utilities for privacy/security
  - Handlers for custom backends (console logger by default)
- Error handlers: `frontend/src/utils/fetchError.ts`
  - Extract correlation ID from API errors
  - Log with telemetry for distributed tracing
- Error boundaries: `frontend/src/components/{Error,Page,Section}ErrorBoundary.tsx`
  - Catch render-time exceptions
  - Log with telemetry for observability
  - Note: `ErrorBoundary.componentDidCatch()` accesses `errorInfo.componentStack` which is not part of the public React.ErrorInfo type definition. This is a React DevTools implementation detail accessed via type casting (`as any`). It captures the React component hierarchy for debugging but may change in future React versions. See React issue #3623 for context.
Privacy & Security
- No sensitive data logged:
  - Passwords, tokens, session IDs never logged
  - PII (names, emails, IPs) logged only with explicit intent and redaction
  - Redaction utilities: `telemetry.redact()`, `telemetry.redactObject()`
- Backend: Correlation IDs use opaque UUID4 (no user data embedded)
- Frontend: Same session UUID for all requests (safe to expose in logs)
Future Enhancements
- Backend error telemetry aggregation:
  - Send structured logs to observability platform (DataDog, Grafana Loki, etc.)
  - Query by correlation ID to trace entire request flow
- Frontend error reporting:
  - Send frontend telemetry to backend `/api/telemetry` endpoint
  - Store alongside backend logs for unified view
- Metrics & dashboards:
  - Error rates by endpoint, severity, error type
  - Latency percentiles and distribution
  - Request success/failure trends
11. Design Principles
These principles govern all architectural decisions in BanGUI.
| Principle | Application |
|---|---|
| Separation of Concerns | Frontend and backend are independent. Backend layers (router → service → repository) never mix responsibilities. |
| Service Independence | Services must not import other services at the same layer (e.g., jail_config_service must not import jail_service). Shared logic belongs in the utils layer (app/utils/). This prevents circular dependencies, improves testability, and keeps each service focused on its domain. |
| Single Responsibility | Each module, service, and component has one well-defined job. |
| Dependency Inversion | Services depend on abstractions (protocols), not concrete implementations. FastAPI Depends() wires everything. |
| Async Everything | All I/O is non-blocking. No synchronous database, HTTP, or socket calls anywhere in the backend. |
| Validate at the Boundary | Pydantic models validate all data entering the backend. TypeScript types enforce structure on the frontend. |
| Fail Fast | Configuration is validated at startup. Invalid input is rejected immediately with clear errors. |
| Composition over Inheritance | Small, focused objects are composed together rather than building deep class hierarchies. |
| DRY | Shared logic lives in utils, hooks, or base services — never duplicated across modules. |
| KISS | The simplest correct solution wins. No premature abstractions or over-engineering. |
| YAGNI | Only build what is needed now. Extend when a real requirement appears. |