Fix non-atomic setup persistence across DB contexts (Issue #30)

Implement transactional setup with explicit state machine and crash-safety
to prevent partial commits from leaving inconsistent state.
## Changes
### Core Implementation
1. **settings_repo.py**: Add atomic batch settings write
   - New set_settings_batch() method: writes multiple settings in a single
     transaction (BEGIN IMMEDIATE ... COMMIT). Either all settings persist
     or none do, preventing partial state if a crash occurs mid-batch.
2. **setup_service.py**: Refactor run_setup() with transactional phases
   - Phase 0: Compute password hash early (before any DB writes) to ensure
     idempotency. Same hash is used throughout retries, preventing divergent
     hashes from bcrypt's random salt.
   - Phase 1 (Bootstrap DB transaction): Set setup_state=in_progress and
     database_path, then commit. First checkpoint for crash detection.
   - Phase 2 (Filesystem): Initialize runtime database (idempotent)
   - Phase 3 (Runtime DB transaction): Batch-write all settings atomically
   - Phase 4 (Bootstrap DB transaction): Set setup_state=complete and
     setup_completed=1. Final commit point.
3. **protocols.py**: Add set_settings_batch to SettingsRepository protocol
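
A minimal sketch of what an all-or-nothing batch write could look like. It assumes a plain `sqlite3` connection and a `settings(key, value)` table; the real `set_settings_batch()` signature and schema are not shown in this commit message, so the signature, table layout and `make_demo_db()` helper below are illustrative only.

```python
import sqlite3
from typing import Mapping


def set_settings_batch(conn: sqlite3.Connection, settings: Mapping[str, str]) -> None:
    """Write every key/value pair in one transaction, or none of them."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before the first write
    try:
        for key, value in settings.items():
            conn.execute(
                "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
                (key, value),
            )
    except Exception:
        conn.execute("ROLLBACK")  # discard everything written in this batch
        raise
    else:
        conn.execute("COMMIT")


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical helper: in-memory DB with an assumed settings schema."""
    # isolation_level=None selects autocommit mode, so the explicit
    # BEGIN IMMEDIATE / COMMIT statements above define the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
    return conn
```

If any single write fails (for example a `NOT NULL` violation), the whole batch is rolled back and the table stays as it was before the call.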
### Testing
- Added 6 new transactionality tests covering:
  - State machine transitions (None → in_progress → complete)
  - Password hash idempotency across retries
  - Atomic batch writes (all-or-nothing persistence)
  - Bootstrap DB state tracking
  - Database path propagation to both DBs
  - Recovery on partial failure
- All 18 tests pass (12 existing + 6 new)
### Documentation
- Updated Docs/Architekture.md with new section 6:
  - Setup state machine with state transitions
  - Transaction boundary documentation
  - Password hash idempotency rationale
  - Backward compatibility notes
## Design Decisions
### Why This Approach
- Current code is already idempotent via INSERT OR REPLACE, but password
  hash non-idempotency created a silent inconsistency risk
- Simpler than a multi-state machine: 2 states are sufficient for detection
- Maintains backward compatibility (setup_completed key still written)
- Explicit transactions make crash-safety obvious to future maintainers
### Crash Scenarios Now Handled
1. Crash after Phase 1 → detected by setup_state=in_progress on retry
2. Crash after Phase 2 → runtime DB may be partial, safe to retry
3. Crash during Phase 3 → uncommitted runtime DB transaction rolls back on next connection
4. Crash after Phase 4 → setup_completed detected, skipped
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@@ -920,7 +920,51 @@ BanGUI maintains its **own SQLite database** (separate from the fail2ban databas

 ---

-## 6. Authentication & Session Management
+## 6. Setup & Configuration Persistence
+
+### 6.1 Initial Setup Wizard & One-Time Configuration
+
+The setup wizard (`POST /api/setup`) runs once during first-time startup to configure:
+- Master password (bcrypt-hashed)
+- Runtime database path (where BanGUI stores operational state)
+- fail2ban Unix socket path
+- IANA timezone
+- Session duration (in minutes)
+- Map color thresholds for geolocation visualization
+
+**Atomicity & Crash-Safety:**
+
+Setup is implemented with explicit transaction boundaries across two SQLite databases (bootstrap config DB and runtime app DB) to ensure atomicity:
+
+1. **Phase 1 (Bootstrap DB transaction)**: Set `setup_state = "in_progress"` and persist `database_path`. On commit, this is the first checkpoint — if process crashes here, the next setup attempt will detect and clean up.
+
+2. **Phase 2 (Filesystem + Runtime DB)**: Initialize runtime database schema outside a transaction (idempotent via `CREATE TABLE IF NOT EXISTS`).
+
+3. **Phase 3 (Runtime DB transaction)**: Batch-write all runtime settings (password hash, paths, config) atomically in a single `BEGIN IMMEDIATE ... COMMIT` transaction. Either all settings are persisted or none are.
+
+4. **Phase 4 (Bootstrap DB transaction)**: Set `setup_state = "complete"` and `setup_completed = "1"`. This is the final commit point — only when this succeeds is setup considered complete.
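
The four phases above can be sketched roughly as follows. This is not the project's `run_setup()`: it assumes both databases hold a simple `settings(key, value)` table, and `open_db`, `write_atomically`, `get_setting` and the `fail_before_phase` test hook are hypothetical names introduced only for illustration.

```python
import sqlite3
from typing import Optional

SCHEMA = "CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
UPSERT = "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)"


def open_db(path: str) -> sqlite3.Connection:
    # Autocommit mode: the explicit BEGIN IMMEDIATE / COMMIT below
    # are the only transaction boundaries.
    return sqlite3.connect(path, isolation_level=None)


def write_atomically(conn: sqlite3.Connection, pairs: dict[str, str]) -> None:
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.executemany(UPSERT, list(pairs.items()))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def get_setting(conn: sqlite3.Connection, key: str) -> Optional[str]:
    row = conn.execute("SELECT value FROM settings WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


def run_setup(bootstrap, runtime, database_path, runtime_settings, fail_before_phase=None):
    def checkpoint(phase: int) -> None:
        if fail_before_phase == phase:  # test hook standing in for a crash
            raise RuntimeError(f"simulated crash before phase {phase}")

    bootstrap.execute(SCHEMA)  # sketch only: make sure the bootstrap table exists

    checkpoint(1)  # Phase 1: mark setup as started in the bootstrap DB
    write_atomically(bootstrap, {"setup_state": "in_progress", "database_path": database_path})

    checkpoint(2)  # Phase 2: idempotent schema creation in the runtime DB
    runtime.execute(SCHEMA)

    checkpoint(3)  # Phase 3: all runtime settings in one transaction
    write_atomically(runtime, runtime_settings)

    checkpoint(4)  # Phase 4: final commit point in the bootstrap DB
    write_atomically(bootstrap, {"setup_state": "complete", "setup_completed": "1"})
```

Calling `run_setup(..., fail_before_phase=4)` leaves `setup_state` at `"in_progress"`; calling it again without the hook rewrites the same keys and finishes with `"complete"`.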
+
+**Password Hash Idempotency:**
+
+The bcrypt password hash is computed early (before any DB writes) to ensure that if setup is retried after a crash, the same hash is used throughout all retry attempts. This prevents divergent hashes due to bcrypt's random salt generation.
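
To illustrate why the hash is computed once up front, here is a small sketch. It swaps bcrypt for salted PBKDF2 from the standard library so the example has no third-party dependency; like bcrypt, it draws a fresh random salt on every call. `hash_password`, `setup_with_retries` and the `write_phases` callback are hypothetical names, and the retry loop is an assumption about how retries might be driven.

```python
import hashlib
import os
from typing import Callable


def hash_password(password: str) -> str:
    """Stand-in for bcrypt: salted PBKDF2 with a new random salt per call."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return f"{salt.hex()}${digest.hex()}"


def setup_with_retries(
    password: str,
    write_phases: Callable[[str], None],
    max_attempts: int = 3,
) -> str:
    # Phase 0: hash exactly once, before any database write.
    password_hash = hash_password(password)
    last_error = None
    for _ in range(max_attempts):
        try:
            write_phases(password_hash)  # every attempt receives the same hash
            return password_hash
        except OSError as exc:
            last_error = exc
    raise RuntimeError("setup failed after retries") from last_error
```

Hashing inside the retry loop instead would store a different value on each attempt, because two calls to `hash_password("pw")` never return the same string.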
+
+**State Machine:**
+
+| State | Meaning | Recovery |
+|-------|---------|----------|
+| `null` | Setup not started | Normal flow: begin setup |
+| `"in_progress"` | Bootstrap DB marked, runtime DB being initialized | Retry from beginning (runtime DB may be partial) |
+| `"complete"` | All settings persisted, setup finished | Skip setup (already done) |
+
+If a crash is detected in `"in_progress"` state on the next startup, cleanup logic can detect this and either retry or remove the partial runtime database before retrying.
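
The recovery column of the table can be read as a small decision function. The sketch below is one possible encoding; `SetupAction` and `next_action` are hypothetical names, and rejecting unknown values is an assumption rather than documented behaviour.

```python
from enum import Enum
from typing import Optional


class SetupAction(Enum):
    BEGIN = "begin"  # setup not started: run the normal flow
    RETRY = "retry"  # crash detected: restart setup from the beginning
    SKIP = "skip"    # setup already finished


def next_action(setup_state: Optional[str]) -> SetupAction:
    """Map the stored setup_state value to the recovery action from the table."""
    if setup_state is None:
        return SetupAction.BEGIN
    if setup_state == "in_progress":
        return SetupAction.RETRY
    if setup_state == "complete":
        return SetupAction.SKIP
    raise ValueError(f"unknown setup_state: {setup_state!r}")
```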
+
+**Backward Compatibility:**
+
+The `setup_completed = "1"` key is still written for backward compatibility with cache detection. Modern code checks `setup_state = "complete"` for clearer semantics.
+
+---
+
+## 8. Authentication & Session Management

 - **Single-user model** — one master password, no usernames.
 - Password is hashed with a strong algorithm (e.g., bcrypt or argon2) and stored in the application database during setup.
@@ -934,7 +978,7 @@ BanGUI maintains its **own SQLite database** (separate from the fail2ban databas
 - **Runtime state** (`RuntimeState` in `app.utils.runtime_state`) — stores mutable application state: `server_status` (fail2ban online/offline), `last_activation` (jail activation tracking), `pending_recovery` (crash detection), `runtime_settings` (effective configuration), and service-specific state holders like `jail_service_state` (`JailServiceState` for jail capability detection cache). RuntimeState fields are managed through dedicated functions (e.g., `record_activation()`, `clear_pending_recovery()`) and via dependency injection to services. Service-specific state (like `JailServiceState`) is nested within `RuntimeState` to keep all mutable state in one controlled location. **⚠️ RuntimeState is process-local and only safe when BanGUI runs as a single asyncio worker.** Mutations must not span `await` points (cooperative scheduling within a single event loop is safe). In multi-worker deployments, each process has its own copy — logouts from worker A don't affect worker B's cache, health status updates are per-worker, and activation tracking is unreliable. BanGUI enforces single-worker mode (TASK-002) to prevent this issue. For future multi-worker support, replace RuntimeState with a shared coordination backend (Redis, shared memory, database). See `app/utils/runtime_state.py` module docstring for details.
 - **Setup-completion flag** — once `is_setup_complete()` returns `True`, the result is stored in `app.state._setup_complete_cached`. The `SetupRedirectMiddleware` skips the DB query on all subsequent requests, removing 1 SQL query per request for the common post-setup case. The completion flag is only written after the runtime database is successfully initialized and all initial setup settings are persisted, preventing a failed setup from permanently bypassing the setup wizard.
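
A framework-free sketch of the caching behaviour described for the setup-completion flag. Only the attribute name `_setup_complete_cached` comes from the text; `make_setup_check` and the `query_db` callback are hypothetical stand-ins for the middleware and `is_setup_complete()`, and a plain object stands in for `app.state`.

```python
from types import SimpleNamespace
from typing import Callable


def make_setup_check(state, query_db: Callable[[], bool]) -> Callable[[], bool]:
    """Return a check that queries the DB only until setup is seen as complete."""

    def setup_is_complete() -> bool:
        if getattr(state, "_setup_complete_cached", False):
            return True  # fast path: no SQL query after setup has finished
        done = query_db()
        if done:
            # Cache only the positive result, so a pending or failed setup
            # is re-checked on the next request.
            state._setup_complete_cached = True
        return done

    return setup_is_complete


# Usage: SimpleNamespace plays the role of app.state in this sketch.
state = SimpleNamespace()
check = make_setup_check(state, query_db=lambda: True)
check()  # queries once, then caches
check()  # served from the cached flag
```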

-### 6.1 CSRF Protection
+### 8.1 CSRF Protection

 State-mutating endpoints (POST, PUT, DELETE, PATCH) that use cookie-based authentication are protected against Cross-Site Request Forgery (CSRF) attacks via a **custom header check middleware**.

@@ -949,7 +993,7 @@ State-mutating endpoints (POST, PUT, DELETE, PATCH) that use cookie-based authen
 This mechanism complements the existing `SameSite=Lax` cookie policy, which blocks traditional `<form>` POST requests but does not protect against JavaScript-initiated requests on a subdomain or same-origin XSS injection.

 ---
-## 7. Scheduling
+## 9. Scheduling

 APScheduler 4.x (async mode) manages recurring background tasks.

@@ -972,7 +1016,7 @@ APScheduler 4.x (async mode) manages recurring background tasks.

 ---

-## 7.1 Background Tasks and Database Access
+## 10.1 Background Tasks and Database Access

 - APScheduler jobs run outside FastAPI request/response scope and therefore cannot rely on ``Depends(get_db)``.
 - Background tasks must open their own application database connection via ``app.db.open_db`` and close it when the work completes.
@@ -981,9 +1025,9 @@ APScheduler 4.x (async mode) manages recurring background tasks.

 ---

-## 8. API Design
+## 9. API Design

-### 8.1 Conventions
+### 9.1 Conventions

 - All endpoints are grouped under `/api/` prefix.
 - JSON request and response bodies, validated by Pydantic models.
@@ -992,7 +1036,7 @@ APScheduler 4.x (async mode) manages recurring background tasks.
 - Standard HTTP status codes: `200` success, `201` created, `204` no content, `400` bad request, `401` unauthorized, `404` not found, `422` validation error, `423` locked, `500` server error.
 - Error responses follow a consistent shape: `{ "detail": "Human-readable message" }`.

-### 8.2 Endpoint Groups
+### 9.2 Endpoint Groups

 | Group | Endpoints | Description |
 |---|---|---|
@@ -1043,7 +1087,7 @@ APScheduler 4.x (async mode) manages recurring background tasks.

 ---

-## 9.2 nginx Routing Rules
+## 10.2 nginx Routing Rules

 The reverse proxy (nginx) must route requests correctly to prevent frontend SPA fallback rules from hiding backend 404 errors. The following location blocks ensure proper behavior: