# Database Schema Documentation
BanGUI uses two SQLite databases:
| Database | Purpose | Location |
|---|---|---|
| **BanGUI app DB** | Own configuration, sessions, blocklist sources, import logs, geo cache | `bangui.db` |
| **fail2ban DB** | fail2ban's internal ban/jail data (read-only) | Configured via `FAIL2BAN_DB` env var |
---
## 1. BanGUI Application Schema
Single source of truth: `backend/app/db.py`.
### 1.1 `settings`
Key-value store for application configuration.
| Column | Type | Constraints |
|---|---|---|
| `id` | INTEGER | PRIMARY KEY AUTOINCREMENT |
| `key` | TEXT | NOT NULL UNIQUE |
| `value` | TEXT | NOT NULL |
| `created_at` | TEXT | NOT NULL DEFAULT ISO 8601 |
| `updated_at` | TEXT | NOT NULL DEFAULT ISO 8601 |
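**Indexes:** PK, plus the implicit unique index SQLite creates for the `UNIQUE` constraint on `key`. No explicit indexes.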
**Purpose:** Stores app-wide settings (e.g., timezone, UI preferences). All settings access goes through `settings_repo` / `settings_service`.
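
The key-value access pattern can be illustrated with a small upsert sketch. It assumes the column layout above; the `set_setting` and `get_setting` helpers are hypothetical and are not taken from `settings_repo`, and the `strftime` default is only one possible way to produce the ISO 8601 timestamps. The `ON CONFLICT ... DO UPDATE` syntax needs SQLite 3.24 or newer.

```python
import sqlite3

# Assumed DDL, reconstructed from the column table above.
# The authoritative DDL lives in backend/app/db.py.
SETTINGS_DDL = """
CREATE TABLE IF NOT EXISTS settings (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    key        TEXT NOT NULL UNIQUE,
    value      TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ', 'now')),
    updated_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ', 'now'))
)
"""


def set_setting(conn, key, value):
    """Hypothetical helper: insert the key, or overwrite its value if it exists."""
    conn.execute(
        """
        INSERT INTO settings (key, value) VALUES (?, ?)
        ON CONFLICT(key) DO UPDATE SET
            value = excluded.value,
            updated_at = strftime('%Y-%m-%dT%H:%M:%SZ', 'now')
        """,
        (key, value),
    )
    conn.commit()


def get_setting(conn, key, default=None):
    """Hypothetical helper: return the stored value, or `default` if the key is absent."""
    row = conn.execute("SELECT value FROM settings WHERE key = ?", (key,)).fetchone()
    return row[0] if row else default
```
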
---
### 1.2 `sessions`
Session tokens for web authentication.
| Column | Type | Constraints |
|---|---|---|
| `id` | INTEGER | PRIMARY KEY AUTOINCREMENT |
| `token_hash` | TEXT | NOT NULL UNIQUE |
| `created_at` | TEXT | NOT NULL DEFAULT ISO 8601 |
| `expires_at` | TEXT | NOT NULL |
**Indexes:** `idx_sessions_token_hash` (UNIQUE) on `token_hash`.
**Purpose:** Web session management. Tokens are SHA-256 hashed before storage. Sessions expire and are cleaned up by `session_cleanup` task. See `auth_service.py`.
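
Storing only a SHA-256 digest means a leaked database does not reveal usable tokens. The sketch below shows the idea under stated assumptions: the helper names, the one-hour default lifetime, and the timestamp format are illustrative and not taken from `auth_service.py`.

```python
import hashlib
import secrets
import sqlite3
import time

# Assumed DDL, reconstructed from the column table above.
SESSIONS_DDL = """
CREATE TABLE IF NOT EXISTS sessions (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    token_hash TEXT NOT NULL UNIQUE,
    created_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ', 'now')),
    expires_at TEXT NOT NULL
)
"""

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed timestamp format (UTC)


def hash_token(token):
    """Return the hex SHA-256 digest that is stored instead of the raw token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def create_session(conn, ttl_seconds=3600):
    """Hypothetical helper: store a hashed token and return the raw token once."""
    token = secrets.token_urlsafe(32)
    expires_at = time.strftime(ISO_FORMAT, time.gmtime(time.time() + ttl_seconds))
    conn.execute(
        "INSERT INTO sessions (token_hash, expires_at) VALUES (?, ?)",
        (hash_token(token), expires_at),
    )
    conn.commit()
    return token


def is_valid(conn, token):
    """Hypothetical helper: True if the token's hash exists and has not expired."""
    now = time.strftime(ISO_FORMAT, time.gmtime())
    # Fixed-width UTC timestamps in one format compare correctly as strings.
    row = conn.execute(
        "SELECT 1 FROM sessions WHERE token_hash = ? AND expires_at > ?",
        (hash_token(token), now),
    ).fetchone()
    return row is not None
```
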
---
### 1.3 `blocklist_sources`
Blocklist source definitions for the import pipeline.
| Column | Type | Constraints |
|---|---|---|
| `id` | INTEGER | PRIMARY KEY AUTOINCREMENT |
| `name` | TEXT | NOT NULL |
| `url` | TEXT | NOT NULL UNIQUE |
| `enabled` | INTEGER | NOT NULL DEFAULT 1 (boolean) |
| `created_at` | TEXT | NOT NULL DEFAULT ISO 8601 |
| `updated_at` | TEXT | NOT NULL DEFAULT ISO 8601 |
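**Indexes:** PK, plus the implicit unique index SQLite creates for the `UNIQUE` constraint on `url`. No explicit indexes.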
**Purpose:** Defines sources for blocklist imports. See `blocklist_repo`, `blocklist_service`, `blocklist_import_workflow`.
---
### 1.4 `import_log`
Audit log of individual blocklist import operations.
| Column | Type | Constraints |
|---|---|---|
| `id` | INTEGER | PRIMARY KEY AUTOINCREMENT |
| `source_id` | INTEGER | REFERENCES `blocklist_sources(id)` ON DELETE RESTRICT |
| `source_url` | TEXT | NOT NULL |
| `timestamp` | INTEGER | NOT NULL (UNIX epoch) |
| `ips_imported` | INTEGER | NOT NULL DEFAULT 0 |
| `ips_skipped` | INTEGER | NOT NULL DEFAULT 0 |
| `errors` | TEXT | |
**Indexes:**
- `idx_import_log_id_desc` on `(id DESC)` — cursor pagination
- `idx_import_log_source_id_desc` on `(source_id, id DESC)` — filtered pagination
**Purpose:** Audit trail for imports. The `ON DELETE RESTRICT` rule on `source_id` blocks deletion of a blocklist source while log rows still reference it. See migration 9.
**Migration 8:** `timestamp` migrated from TEXT ISO 8601 to INTEGER UNIX epoch.
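
The two indexes above support keyset (cursor) pagination, where each page is fetched with `id < cursor` instead of `OFFSET`. The following is a minimal sketch of such a query; `fetch_page` and its parameters are hypothetical and may differ from what `blocklist_repo` does.

```python
import sqlite3


def fetch_page(conn, limit=50, before_id=None, source_id=None):
    """Hypothetical helper: return (rows, next_cursor), newest first by id."""
    clauses, params = [], []
    if source_id is not None:
        clauses.append("source_id = ?")  # matches the (source_id, id DESC) index
        params.append(source_id)
    if before_id is not None:
        clauses.append("id < ?")  # resume strictly after the previous page
        params.append(before_id)
    where = ("WHERE " + " AND ".join(clauses)) if clauses else ""
    rows = conn.execute(
        f"SELECT id, source_id, source_url, timestamp FROM import_log {where} "
        "ORDER BY id DESC LIMIT ?",
        (*params, limit),
    ).fetchall()
    # A full page suggests more rows may exist; a short page ends the listing.
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor
```

Pass the returned `next_cursor` as `before_id` to fetch the following page; a value of `None` means the listing is exhausted.
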
---
### 1.5 `geo_cache`
Geo-IP lookup cache for ban IP metadata.
| Column | Type | Constraints |
|---|---|---|
| `ip` | TEXT | PRIMARY KEY |
| `country_code` | TEXT | |
| `country_name` | TEXT | |
| `asn` | TEXT | |
| `org` | TEXT | |
| `cached_at` | TEXT | NOT NULL DEFAULT ISO 8601 |
**Additional (migration 3):**
| Column | Type | Constraints |
|---|---|---|
| `last_seen` | TEXT | NOT NULL DEFAULT ISO 8601 |
**Indexes:** PK only.
**Purpose:** Caches GeoIP results to reduce third-party API calls. TTL managed by `geo_cache_cleanup` task. See `geo_cache_repo`, `geo_service`.
---
### 1.6 `history_archive`
Archived ban/unban history mirrored from fail2ban DB.
| Column | Type | Constraints |
|---|---|---|
| `id` | INTEGER | PRIMARY KEY AUTOINCREMENT |
| `jail` | TEXT | NOT NULL |
| `ip` | TEXT | NOT NULL |
| `timeofban` | INTEGER | NOT NULL (UNIX epoch) |
| `bancount` | INTEGER | NOT NULL |
| `data` | TEXT | NOT NULL (JSON) |
| `action` | TEXT | NOT NULL CHECK IN ('ban', 'unban') |
| `created_at` | TEXT | NOT NULL DEFAULT ISO 8601 |
**Constraints:** `UNIQUE(ip, jail, action, timeofban)` prevents duplicate archive rows.
**Indexes:**
- `idx_history_archive_jail_timeofban` on `(jail, timeofban DESC)` — dashboard filter by jail + time ordering
- `idx_history_archive_timeofban_jail_action` on `(timeofban DESC, jail, action)` — timeline filters
- `idx_history_archive_ip` on `(ip)` — IP prefix/exact searches
- `idx_history_archive_action` on `(action)` — ban/unban filtering
**Purpose:** Long-term ban history. Synced from fail2ban DB by `history_sync` task. See `history_archive_repo`, `history_service`.
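
Because of the `UNIQUE(ip, jail, action, timeofban)` constraint, a sync can be re-run safely if inserts ignore conflicts. This sketch assumes `INSERT OR IGNORE` is an acceptable strategy; `archive_rows` is a hypothetical helper and the actual `history_sync` task may work differently.

```python
import json
import sqlite3

# Assumed DDL, reconstructed from the column table and constraint above.
HISTORY_DDL = """
CREATE TABLE IF NOT EXISTS history_archive (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    jail       TEXT NOT NULL,
    ip         TEXT NOT NULL,
    timeofban  INTEGER NOT NULL,
    bancount   INTEGER NOT NULL,
    data       TEXT NOT NULL,
    action     TEXT NOT NULL CHECK (action IN ('ban', 'unban')),
    created_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ', 'now')),
    UNIQUE (ip, jail, action, timeofban)
)
"""


def archive_rows(conn, rows):
    """Hypothetical helper: insert rows, skip duplicates, return the count of new rows.

    Each row is (jail, ip, timeofban, bancount, data_json, action).
    """
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO history_archive "
        "(jail, ip, timeofban, bancount, data, action) VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    # Ignored duplicates do not count as changes.
    return conn.total_changes - before
```
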
---
### 1.7 `scheduler_lock`
Database-backed mutex for multi-worker scheduler safety.
| Column | Type | Constraints |
|---|---|---|
| `id` | INTEGER | PRIMARY KEY CHECK (id = 1) — singleton row |
| `pid` | INTEGER | NOT NULL |
| `hostname` | TEXT | NOT NULL |
| `created_at` | REAL | NOT NULL (UNIX epoch) |
| `heartbeat_at` | REAL | NOT NULL (UNIX epoch) |
**Indexes:** PK only (singleton constraint).
**Purpose:** Only one worker process holds the scheduler lock at a time. Lock is heartbeat-renewed by `scheduler_lock_heartbeat` task. Uses `BEGIN IMMEDIATE` transaction to acquire atomically. See `scheduler_lock.py`.
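
A rough sketch of how a singleton-row lock can be taken atomically with `BEGIN IMMEDIATE` follows. The 30-second staleness threshold, the `try_acquire` helper, and the takeover rule are assumptions for illustration; the real logic is in `scheduler_lock.py`. The connection is expected to be opened with `isolation_level=None` so that the explicit `BEGIN`/`COMMIT` statements are not mixed with implicit transactions.

```python
import os
import socket
import sqlite3
import time

STALE_AFTER_SECONDS = 30.0  # hypothetical threshold, not taken from the source

# Assumed DDL, reconstructed from the column table above.
LOCK_DDL = """
CREATE TABLE IF NOT EXISTS scheduler_lock (
    id           INTEGER PRIMARY KEY CHECK (id = 1),
    pid          INTEGER NOT NULL,
    hostname     TEXT NOT NULL,
    created_at   REAL NOT NULL,
    heartbeat_at REAL NOT NULL
)
"""


def try_acquire(conn, pid=None, hostname=None, now=None):
    """Hypothetical helper: take the lock if it is free, stale, or already ours."""
    pid = os.getpid() if pid is None else pid
    hostname = socket.gethostname() if hostname is None else hostname
    now = time.time() if now is None else now
    # BEGIN IMMEDIATE takes the write lock up front, so the check and the
    # write below cannot interleave with another worker's attempt.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT pid, hostname, heartbeat_at FROM scheduler_lock WHERE id = 1"
        ).fetchone()
        if row is not None:
            held_by_other = (row[0], row[1]) != (pid, hostname)
            if held_by_other and now - row[2] < STALE_AFTER_SECONDS:
                conn.execute("ROLLBACK")
                return False
        # Replaces the single row; this also resets created_at on re-acquire.
        conn.execute(
            "INSERT OR REPLACE INTO scheduler_lock "
            "(id, pid, hostname, created_at, heartbeat_at) VALUES (1, ?, ?, ?, ?)",
            (pid, hostname, now, now),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise
```
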
---
### 1.8 `import_runs`
Tracks unique blocklist imports for idempotent retries.
| Column | Type | Constraints |
|---|---|---|
| `id` | INTEGER | PRIMARY KEY AUTOINCREMENT |
| `source_id` | INTEGER | NOT NULL REFERENCES `blocklist_sources(id)` ON DELETE CASCADE |
| `content_hash` | TEXT | NOT NULL |
| `status` | TEXT | NOT NULL CHECK IN ('pending', 'completed', 'failed') |
| `imported_count` | INTEGER | NOT NULL DEFAULT 0 |
| `skipped_count` | INTEGER | NOT NULL DEFAULT 0 |
| `error_message` | TEXT | |
| `created_at` | TEXT | NOT NULL DEFAULT ISO 8601 |
| `updated_at` | TEXT | NOT NULL DEFAULT ISO 8601 |
**Constraints:** `UNIQUE(source_id, content_hash)` — same source + content = same import run.
**Indexes:** `idx_import_runs_source_status` on `(source_id, status)` — lookup completed imports by source.
**Purpose:** Prevents duplicate IP bans on import crash/retry. See migration 6 and `blocklist_import_workflow`.
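
The idempotency rule can be sketched as "look up the run by `(source_id, content_hash)` before importing". In the example below, SHA-256 as the content hash and the helper names `begin_import` and `finish_import` are assumptions; `blocklist_import_workflow` may differ in detail.

```python
import hashlib
import sqlite3

# Assumed DDL, reconstructed from the table above. The foreign key to
# blocklist_sources is omitted to keep the sketch self-contained.
IMPORT_RUNS_DDL = """
CREATE TABLE IF NOT EXISTS import_runs (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    source_id      INTEGER NOT NULL,
    content_hash   TEXT NOT NULL,
    status         TEXT NOT NULL CHECK (status IN ('pending', 'completed', 'failed')),
    imported_count INTEGER NOT NULL DEFAULT 0,
    skipped_count  INTEGER NOT NULL DEFAULT 0,
    error_message  TEXT,
    created_at     TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ', 'now')),
    updated_at     TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ', 'now')),
    UNIQUE (source_id, content_hash)
)
"""


def begin_import(conn, source_id, payload):
    """Hypothetical helper: return (run_id, should_import) for this payload."""
    content_hash = hashlib.sha256(payload).hexdigest()
    # Creates the run on first sight; a retry of the same content is a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO import_runs (source_id, content_hash, status) "
        "VALUES (?, ?, 'pending')",
        (source_id, content_hash),
    )
    run_id, status = conn.execute(
        "SELECT id, status FROM import_runs WHERE source_id = ? AND content_hash = ?",
        (source_id, content_hash),
    ).fetchone()
    conn.commit()
    return run_id, status != "completed"


def finish_import(conn, run_id, imported, skipped):
    """Hypothetical helper: mark the run completed and record its counters."""
    conn.execute(
        """
        UPDATE import_runs
        SET status = 'completed', imported_count = ?, skipped_count = ?,
            updated_at = strftime('%Y-%m-%dT%H:%M:%SZ', 'now')
        WHERE id = ?
        """,
        (imported, skipped, run_id),
    )
    conn.commit()
```

A run that crashed while still `pending` is retried, while a `completed` run with the same content is skipped.
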
---
### 1.9 `schema_migrations`
Tracks applied schema versions.
| Column | Type | Constraints |
|---|---|---|
| `version` | INTEGER | PRIMARY KEY |
| `migrated_at` | TEXT | NOT NULL DEFAULT ISO 8601 |
**Indexes:** PK only.
**Purpose:** Idempotent schema migration tracker. Records each applied version number. See `init_db()` and `_migrate_schema()`.
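
A version-tracking runner of this kind can be sketched as follows. The `MIGRATIONS` mapping and its two demo statements are hypothetical placeholders, and `migrate` is not the project's `_migrate_schema()`. The connection is assumed to be opened with `isolation_level=None` so that each migration and its version record share one explicit transaction.

```python
import sqlite3

# Hypothetical migrations for illustration only; not BanGUI's real ones.
MIGRATIONS = {
    1: "CREATE TABLE IF NOT EXISTS demo (id INTEGER PRIMARY KEY)",
    2: "ALTER TABLE demo ADD COLUMN note TEXT",
}


def migrate(conn):
    """Apply every migration not yet recorded; return the versions applied now."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "version INTEGER PRIMARY KEY, "
        "migrated_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ', 'now')))"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    newly_applied = []
    for version in sorted(MIGRATIONS):
        if version in applied:
            continue  # already recorded, so re-running is a no-op
        conn.execute("BEGIN")
        try:
            # DDL is transactional in SQLite, so the schema change and the
            # version record are committed or rolled back together.
            conn.execute(MIGRATIONS[version])
            conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
        newly_applied.append(version)
    return newly_applied
```
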
---
## 2. Fail2ban Database Schema
Read-only access via `fail2ban_db_repo`. Fail2ban manages this DB; BanGUI mirrors data into `history_archive`.
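
One way to enforce read-only access at the connection level is SQLite's URI filename syntax with `mode=ro`. This is a sketch of that option only; whether `fail2ban_db_repo` opens the database this way is an assumption, and `open_read_only` is a hypothetical helper.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path):
    """Hypothetical helper: open an existing SQLite file so that writes are rejected."""
    # as_uri() needs an absolute path and percent-encodes special characters.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

With such a connection, any `INSERT`, `UPDATE`, or `DELETE` raises `sqlite3.OperationalError` instead of modifying fail2ban's data.
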
### 2.1 `fail2banDb`
| Column | Type | Constraints |
|---|---|---|
| `version` | INTEGER | |
Single row tracking DB schema version.
---
### 2.2 `jails`
| Column | Type | Constraints |
|---|---|---|
| `name` | TEXT | NOT NULL UNIQUE |
| `enabled` | INTEGER | NOT NULL DEFAULT 1 |
**Indexes:** `jails_name` on `(name)`.
---
### 2.3 `logs`
| Column | Type | Constraints |
|---|---|---|
| `jail` | TEXT | NOT NULL FK → `jails(name)` ON DELETE CASCADE |
| `path` | TEXT | |
| `firstlinemd5` | TEXT | |
| `lastfilepos` | INTEGER | DEFAULT 0 |
**Constraints:** `UNIQUE(jail, path)` and `UNIQUE(jail, path, firstlinemd5)`.
**Indexes:** `logs_path` on `(path)`, `logs_jail_path` on `(jail, path)`.
---
### 2.4 `bans`
| Column | Type | Constraints |
|---|---|---|
| `jail` | TEXT | NOT NULL FK → `jails(name)` |
| `ip` | TEXT | |
| `timeofban` | INTEGER | NOT NULL |
| `bantime` | INTEGER | NOT NULL |
| `bancount` | INTEGER | NOT NULL DEFAULT 1 |
| `data` | JSON | |
**Indexes:**
- `bans_jail_timeofban_ip` on `(jail, timeofban)`
- `bans_jail_ip` on `(jail, ip)`
- `bans_ip` on `(ip)`
---
### 2.5 `bips`
Backup IPs table (ban backup).
| Column | Type | Constraints |
|---|---|---|
| `ip` | TEXT | NOT NULL |
| `jail` | TEXT | NOT NULL FK → `jails(name)` |
| `timeofban` | INTEGER | NOT NULL |
| `bantime` | INTEGER | NOT NULL |
| `bancount` | INTEGER | NOT NULL DEFAULT 1 |
| `data` | JSON | |
**Constraints:** `PRIMARY KEY (ip, jail)`.
**Indexes:** `bips_timeofban` on `(timeofban)`, `bips_ip` on `(ip)`.
---
## 3. Relationships and Constraints
```
blocklist_sources (1) ──(id)──→ import_log.source_id  [RESTRICT on delete]
                      └─(id)──→ import_runs.source_id [CASCADE on delete]
settings: standalone (key-value, no FK)
sessions: standalone (token hash, no FK)
geo_cache: standalone (IP → geo data, no FK)
history_archive: standalone (archived ban history, no FK)
scheduler_lock: singleton row (id=1), no FK
schema_migrations: standalone (migration tracking, no FK)
```
Fail2ban tables are separate and read-only from BanGUI's perspective.
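
The difference between the two delete rules on `blocklist_sources` can be demonstrated with a trimmed-down schema. The DDL below keeps only the columns needed for the demonstration, and `connect` and `delete_source` are hypothetical helpers.

```python
import sqlite3

# Trimmed DDL: only the columns needed to show the two foreign key rules.
RELATIONSHIP_DDL = """
CREATE TABLE blocklist_sources (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    url  TEXT NOT NULL UNIQUE
);
CREATE TABLE import_log (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    source_id  INTEGER REFERENCES blocklist_sources(id) ON DELETE RESTRICT,
    source_url TEXT NOT NULL,
    timestamp  INTEGER NOT NULL
);
CREATE TABLE import_runs (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    source_id    INTEGER NOT NULL REFERENCES blocklist_sources(id) ON DELETE CASCADE,
    content_hash TEXT NOT NULL,
    status       TEXT NOT NULL
);
"""


def connect(path=":memory:"):
    """Open a connection with foreign key enforcement switched on."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys=ON")  # off by default in SQLite
    return conn


def delete_source(conn, source_id):
    """Return True if the source was deleted, False if RESTRICT blocked it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DELETE FROM blocklist_sources WHERE id = ?", (source_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

A source that still has `import_log` rows cannot be deleted, while deleting a source that only has `import_runs` rows removes those runs along with it.
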
---
## 4. Indexes Summary
| Table | Index | Columns |
|---|---|---|
| `sessions` | `idx_sessions_token_hash` | `token_hash` UNIQUE |
| `import_log` | `idx_import_log_id_desc` | `id DESC` |
| `import_log` | `idx_import_log_source_id_desc` | `source_id, id DESC` |
| `import_runs` | `idx_import_runs_source_status` | `source_id, status` |
| `history_archive` | `idx_history_archive_jail_timeofban` | `jail, timeofban DESC` |
| `history_archive` | `idx_history_archive_timeofban_jail_action` | `timeofban DESC, jail, action` |
| `history_archive` | `idx_history_archive_ip` | `ip` |
| `history_archive` | `idx_history_archive_action` | `action` |
| `jails` | `jails_name` | `name` |
| `logs` | `logs_path` | `path` |
| `logs` | `logs_jail_path` | `jail, path` |
| `bans` | `bans_jail_timeofban_ip` | `jail, timeofban` |
| `bans` | `bans_jail_ip` | `jail, ip` |
| `bans` | `bans_ip` | `ip` |
| `bips` | `bips_timeofban` | `timeofban` |
| `bips` | `bips_ip` | `ip` |
---
## 5. Migration History
| Version | Description |
|---|---|
| 1 | Initial schema: `settings`, `sessions`, `blocklist_sources`, `import_log`, `geo_cache`, `history_archive`, `schema_migrations` |
| 2 | Hash session tokens (`token_hash` column). Invalidates all existing sessions. |
| 3 | Add `last_seen` to `geo_cache` for retention policy. |
| 4 | Add `scheduler_lock` table for multi-worker scheduler mutex. |
| 5 | Add indexes to `history_archive` for query performance (4 indexes). |
| 6 | Add `import_runs` table for idempotent import tracking. |
| 7 | Add indexes to `import_log` for cursor-based pagination. |
| 8 | Migrate `import_log.timestamp` from TEXT ISO 8601 → INTEGER UNIX epoch. |
| 9 | Change `import_log.source_id` FK to `ON DELETE RESTRICT` (prevents orphaned logs). Recreate table with new FK semantics. |
**Current schema version:** 9 (`_CURRENT_SCHEMA_VERSION` in `db.py`).
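
SQLite cannot change a column's type in place, so a type change such as migration 8 generally follows the create-copy-drop-rename pattern. The sketch below illustrates that pattern only; it assumes the old values are UTC strings that SQLite's date functions can parse, omits the foreign key and the index re-creation, and is not the actual migration code in `db.py`.

```python
import sqlite3


def convert_timestamps(conn):
    """Illustrative only: rebuild import_log with an INTEGER epoch timestamp."""
    conn.executescript(
        """
        BEGIN;
        CREATE TABLE import_log_new (
            id           INTEGER PRIMARY KEY AUTOINCREMENT,
            source_id    INTEGER,
            source_url   TEXT NOT NULL,
            timestamp    INTEGER NOT NULL,
            ips_imported INTEGER NOT NULL DEFAULT 0,
            ips_skipped  INTEGER NOT NULL DEFAULT 0,
            errors       TEXT
        );
        -- strftime('%s', ...) yields the UNIX epoch as text; CAST makes it an integer.
        INSERT INTO import_log_new
            SELECT id, source_id, source_url,
                   CAST(strftime('%s', timestamp) AS INTEGER),
                   ips_imported, ips_skipped, errors
            FROM import_log;
        DROP TABLE import_log;
        ALTER TABLE import_log_new RENAME TO import_log;
        COMMIT;
        """
    )
```

Dropping the old table also drops its indexes, so a real migration would need to recreate them afterwards.
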
---
## 6. Performance Notes
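- **WAL mode** (`PRAGMA journal_mode=WAL`): readers do not block the writer and the writer does not block readers. Writes are still serialized to one writer at a time.
- **Foreign keys enforced** (`PRAGMA foreign_keys=ON`): data integrity at DB level. SQLite leaves enforcement off by default, so the pragma must be set on every connection for the `RESTRICT` and `CASCADE` rules to apply.
- **Busy timeout** of 5000 ms: a connection waits up to 5 seconds for a lock before failing, which reduces "database is locked" errors under contention.
- **`history_archive` indexes**: tuned for dashboard filter + time ordering + pagination. See migration 5 and `PERFORMANCE.md`.
- **`import_log` indexes**: tuned for cursor-based pagination (newest-first by id). See migration 7.
- **`geo_cache` PK on `ip`**: indexed B-tree lookup, O(log n), for geo enrichment on ban events, with no table scan.
- **`scheduler_lock` singleton** (`CHECK (id = 1)`): the table holds at most one row, so checking whether the lock exists is a single primary-key lookup.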
For detailed query patterns and benchmarks, see `Docs/PERFORMANCE.md`.
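
The connection settings listed in this section can be applied with a few pragmas when a connection is opened. This is a minimal sketch; the `connect` helper is hypothetical and the project's own connection setup may be organized differently.

```python
import sqlite3


def connect(db_path):
    """Hypothetical helper: open a connection with the pragmas described above."""
    conn = sqlite3.connect(db_path)
    # WAL is stored in the database file, so it persists across connections.
    conn.execute("PRAGMA journal_mode=WAL")
    # Foreign key enforcement is per connection and off by default.
    conn.execute("PRAGMA foreign_keys=ON")
    # Wait up to 5000 ms for a lock before raising "database is locked".
    conn.execute("PRAGMA busy_timeout=5000")
    return conn
```

WAL mode applies to file-backed databases only; an in-memory database keeps its `memory` journal mode.
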