Fix non-atomic setup persistence across DB contexts (Issue #30)

Implement transactional setup with explicit state machine and crash-safety to prevent partial commits from leaving inconsistent state.

## Changes

### Core Implementation

1. **settings_repo.py**: Add atomic batch settings write
   - New set_settings_batch() method: writes multiple settings in single transaction (BEGIN IMMEDIATE ... COMMIT). Either all settings persist or none do, preventing partial state if crash occurs mid-batch.

2. **setup_service.py**: Refactor run_setup() with transactional phases
   - Phase 0: Compute password hash early (before any DB writes) to ensure idempotency. Same hash is used throughout retries, preventing divergent hashes from bcrypt's random salt.
   - Phase 1 (Bootstrap DB transaction): Set setup_state=in_progress and database_path, then commit. First checkpoint for crash detection.
   - Phase 2 (Filesystem): Initialize runtime database (idempotent)
   - Phase 3 (Runtime DB transaction): Batch-write all settings atomically
   - Phase 4 (Bootstrap DB transaction): Set setup_state=complete and setup_completed=1. Final commit point.

3. **protocols.py**: Add set_settings_batch to SettingsRepository protocol

### Testing

- Added 6 new transactionality tests covering:
  - State machine transitions (None → in_progress → complete)
  - Password hash idempotency across retries
  - Atomic batch writes (all-or-nothing persistence)
  - Bootstrap DB state tracking
  - Database path propagation to both DBs
  - Recovery on partial failure
- All 18 tests pass (12 existing + 6 new)

### Documentation

- Updated Docs/Architekture.md with new section 6:
  - Setup state machine with state transitions
  - Transaction boundary documentation
  - Password hash idempotency rationale
  - Backward compatibility notes

## Design Decisions

### Why This Approach

- Current code already idempotent via INSERT OR REPLACE, but password hash non-idempotency created silent inconsistency risk
- Simpler than multi-state machine: 2 states sufficient for detection
- Maintains backward compatibility (setup_completed key still written)
- Explicit transactions make crash-safety obvious to future maintainers

### Crash Scenarios Now Handled

1. Crash after Phase 1 → detected by setup_state=in_progress on retry
2. Crash after Phase 2 → runtime DB may be partial, safe to retry
3. Crash after Phase 3 → runtime DB rolls back on next connection
4. Crash after Phase 4 → setup_completed detected, skipped

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
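
For reviewers, the batch write and the phase ordering described in the commit message can be sketched roughly as follows. This is an illustrative sketch, not the code in this commit: it uses the synchronous standard-library `sqlite3` module, and the `settings` table layout, the `open_db` and `get_setting` helpers, and the `master_password_hash` key are assumptions made for the example. Only `set_settings_batch`, `run_setup`, `setup_state`, `setup_completed`, and `database_path` are names taken from the commit.

```python
import sqlite3

# Hypothetical schema: a simple key/value settings table.
SCHEMA = "CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT)"


def open_db(path: str) -> sqlite3.Connection:
    # isolation_level=None turns off sqlite3's implicit transactions,
    # so BEGIN/COMMIT/ROLLBACK below are issued explicitly.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(SCHEMA)  # idempotent schema initialization
    return conn


def set_settings_batch(conn: sqlite3.Connection, settings: dict) -> None:
    """Persist every key/value pair in one transaction, or none of them."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        conn.executemany(
            "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
            list(settings.items()),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # discard rows written before the failure
        raise


def get_setting(conn: sqlite3.Connection, key: str):
    row = conn.execute("SELECT value FROM settings WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


def run_setup(bootstrap, runtime, password_hash: str, database_path: str) -> None:
    # Phase 0 happens in the caller: password_hash is computed once, before
    # any DB write, and the same value is passed in again on every retry.
    # Phase 1: mark setup as started in the bootstrap DB.
    set_settings_batch(
        bootstrap, {"setup_state": "in_progress", "database_path": database_path}
    )
    # Phase 2: initialize the runtime schema (safe to repeat).
    runtime.execute(SCHEMA)
    # Phase 3: write all runtime settings atomically.
    set_settings_batch(
        runtime,
        {"master_password_hash": password_hash, "database_path": database_path},
    )
    # Phase 4: final commit point in the bootstrap DB.
    set_settings_batch(bootstrap, {"setup_state": "complete", "setup_completed": "1"})
```

If a batch fails partway through, the `ROLLBACK` leaves the table as it was before `BEGIN IMMEDIATE`, which is what makes a retry from Phase 1 safe in this sketch.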
2026-04-29 19:19:53 +02:00
parent cc4370c50d
commit 1302ac821f
5 changed files with 376 additions and 30 deletions
--- a/Docs/Architekture.md
+++ b/Docs/Architekture.md
@@ -920,7 +920,51 @@ BanGUI maintains its **own SQLite database** (separate from the fail2ban databas

 ---

-## 6. Authentication & Session Management
+## 6. Setup & Configuration Persistence
+
+### 6.1 Initial Setup Wizard & One-Time Configuration
+
+The setup wizard (`POST /api/setup`) runs once during first-time startup to configure:
+- Master password (bcrypt-hashed)
+- Runtime database path (where BanGUI stores operational state)
+- fail2ban Unix socket path
+- IANA timezone
+- Session duration (in minutes)
+- Map color thresholds for geolocation visualization
+
+**Atomicity & Crash-Safety:**
+
+Setup is implemented with explicit transaction boundaries across two SQLite databases (bootstrap config DB and runtime app DB) to ensure atomicity:
+
+1. **Phase 1 (Bootstrap DB transaction)**: Set `setup_state = "in_progress"` and persist `database_path`. This commit is the first checkpoint: if the process crashes after it, the next setup attempt sees the `"in_progress"` state and can clean up before retrying.
+
+2. **Phase 2 (Filesystem + Runtime DB)**: Initialize runtime database schema outside a transaction (idempotent via `CREATE TABLE IF NOT EXISTS`).
+
+3. **Phase 3 (Runtime DB transaction)**: Batch-write all runtime settings (password hash, paths, config) atomically in a single `BEGIN IMMEDIATE ... COMMIT` transaction. Either all settings are persisted or none are.
+
+4. **Phase 4 (Bootstrap DB transaction)**: Set `setup_state = "complete"` and `setup_completed = "1"`. This is the final commit point — only when this succeeds is setup considered complete.
+
+**Password Hash Idempotency:**
+
+The bcrypt password hash is computed early (before any DB writes) to ensure that if setup is retried after a crash, the same hash is used throughout all retry attempts. This prevents divergent hashes due to bcrypt's random salt generation.
+
+**State Machine:**
+
+| State | Meaning | Recovery |
+|-------|---------|----------|
+| `null` | Setup not started | Normal flow: begin setup |
+| `"in_progress"` | Bootstrap DB marked, runtime DB being initialized | Retry from beginning (runtime DB may be partial) |
+| `"complete"` | All settings persisted, setup finished | Skip setup (already done) |
+
+If the next startup finds `setup_state = "in_progress"`, a previous setup attempt did not finish. Cleanup logic can then either retry setup directly or remove the partial runtime database before retrying.
+
+**Backward Compatibility:**
+
+The `setup_completed = "1"` key is still written for backward compatibility with cache detection. Modern code checks `setup_state = "complete"` for clearer semantics.
+
+---
+
+## 7. Authentication & Session Management

 - **Single-user model** — one master password, no usernames.
 - Password is hashed with a strong algorithm (e.g., bcrypt or argon2) and stored in the application database during setup.
@@ -934,7 +978,7 @@ BanGUI maintains its **own SQLite database** (separate from the fail2ban databas
 - **Runtime state** (`RuntimeState` in `app.utils.runtime_state`) — stores mutable application state: `server_status` (fail2ban online/offline), `last_activation` (jail activation tracking), `pending_recovery` (crash detection), `runtime_settings` (effective configuration), and service-specific state holders like `jail_service_state` (`JailServiceState` for jail capability detection cache). RuntimeState fields are managed through dedicated functions (e.g., `record_activation()`, `clear_pending_recovery()`) and via dependency injection to services. Service-specific state (like `JailServiceState`) is nested within `RuntimeState` to keep all mutable state in one controlled location. **⚠️  RuntimeState is process-local and only safe when BanGUI runs as a single asyncio worker.** Mutations must not span `await` points (cooperative scheduling within a single event loop is safe). In multi-worker deployments, each process has its own copy — logouts from worker A don't affect worker B's cache, health status updates are per-worker, and activation tracking is unreliable. BanGUI enforces single-worker mode (TASK-002) to prevent this issue. For future multi-worker support, replace RuntimeState with a shared coordination backend (Redis, shared memory, database). See `app/utils/runtime_state.py` module docstring for details.
 - **Setup-completion flag** — once `is_setup_complete()` returns `True`, the result is stored in `app.state._setup_complete_cached`. The `SetupRedirectMiddleware` skips the DB query on all subsequent requests, removing 1 SQL query per request for the common post-setup case. The completion flag is only written after the runtime database is successfully initialized and all initial setup settings are persisted, preventing a failed setup from permanently bypassing the setup wizard.

-### 6.1 CSRF Protection
+### 7.1 CSRF Protection

 State-mutating endpoints (POST, PUT, DELETE, PATCH) that use cookie-based authentication are protected against Cross-Site Request Forgery (CSRF) attacks via a **custom header check middleware**.

@@ -949,7 +993,7 @@ State-mutating endpoints (POST, PUT, DELETE, PATCH) that use cookie-based authen
 This mechanism complements the existing `SameSite=Lax` cookie policy, which blocks traditional `<form>` POST requests but does not protect against JavaScript-initiated requests on a subdomain or same-origin XSS injection.

 ---
-## 7. Scheduling
+## 8. Scheduling

 APScheduler 4.x (async mode) manages recurring background tasks.

@@ -972,7 +1016,7 @@ APScheduler 4.x (async mode) manages recurring background tasks.

 ---

-## 7.1 Background Tasks and Database Access
+## 8.1 Background Tasks and Database Access

 - APScheduler jobs run outside FastAPI request/response scope and therefore cannot rely on ``Depends(get_db)``.
 - Background tasks must open their own application database connection via ``app.db.open_db`` and close it when the work completes.
@@ -981,9 +1025,9 @@ APScheduler 4.x (async mode) manages recurring background tasks.

 ---

-## 8. API Design
+## 9. API Design

-### 8.1 Conventions
+### 9.1 Conventions

 - All endpoints are grouped under `/api/` prefix.
 - JSON request and response bodies, validated by Pydantic models.
@@ -992,7 +1036,7 @@ APScheduler 4.x (async mode) manages recurring background tasks.
 - Standard HTTP status codes: `200` success, `201` created, `204` no content, `400` bad request, `401` unauthorized, `404` not found, `422` validation error, `423` locked, `500` server error.
 - Error responses follow a consistent shape: `{ "detail": "Human-readable message" }`.

-### 8.2 Endpoint Groups
+### 9.2 Endpoint Groups

 | Group | Endpoints | Description |
 |---|---|---|
@@ -1043,7 +1087,7 @@ APScheduler 4.x (async mode) manages recurring background tasks.

 ---

-## 9.2 nginx Routing Rules
+## 10.2 nginx Routing Rules

 The reverse proxy (nginx) must route requests correctly to prevent frontend SPA fallback rules from hiding backend 404 errors. The following location blocks ensure proper behavior: