Add explicit database transaction isolation to multi-step operations

This commit addresses race conditions in multi-step database operations:

1. Wrap write operations in BEGIN IMMEDIATE ... COMMIT transactions:
   - import_run_repo: create_pending, mark_completed, mark_failed
   - geo_cache_repo: all upsert_*_and_commit functions
   - geo_cache_repo: bulk_upsert_entries_and_neg_entries_and_commit

2. Handle concurrent write collisions gracefully:
   - import_run_repo.create_pending can now raise IntegrityError
   - blocklist_import_workflow catches IntegrityError and retries lookup
   - Logs 'blocklist_import_lost_race' event when another request wins the race

3. Add comprehensive documentation:
   - Backend-Development.md § 6.3 Database Transactions
   - Explains when to use BEGIN IMMEDIATE
   - Shows transaction pattern with try-except-rollback
   - Documents race condition error handling pattern

The solution leverages SQLite's UNIQUE constraint for data integrity while
handling the concurrent case gracefully in application logic. BEGIN
EXCLUSIVE would add nothing here: in WAL mode it behaves the same as BEGIN
IMMEDIATE, and in rollback-journal mode it would also block readers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 22:04:15 +02:00
parent 94d6352d1d
commit f9e283541b
5 changed files with 259 additions and 112 deletions

@@ -1196,6 +1196,99 @@ async def test_migration_2_is_atomic(tmp_path: Path) -> None:
---
## 6.3 Database Transactions
Database transactions ensure atomicity for multi-step operations and prevent race conditions when concurrent requests interact with the database. BanGUI uses **SQLite with WAL (Write-Ahead Logging)** mode, which enables concurrent readers but serializes writers.
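To make the WAL behavior concrete, here is a minimal sketch using the standard-library `sqlite3` module rather than the `aiosqlite` connection BanGUI uses. The file name, the table `t`, the `count_rows` helper, and `timeout=0` (fail immediately instead of waiting for the lock) are assumptions for illustration only. It shows that while one connection holds the write lock, a second connection can still read but cannot start its own write transaction.

```python
import os
import sqlite3
import tempfile


def count_rows(conn: sqlite3.Connection) -> int:
    """Return the number of rows currently visible to this connection."""
    return conn.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]


# Hypothetical throwaway database file; WAL needs a real file, not :memory:.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None: Python never opens transactions on its own.
writer = sqlite3.connect(path, isolation_level=None, timeout=0)
mode = writer.execute("PRAGMA journal_mode=WAL").fetchall()[0][0]
writer.execute("CREATE TABLE t (x INTEGER)")

other = sqlite3.connect(path, isolation_level=None, timeout=0)

# The first connection takes the write lock and makes an uncommitted change.
writer.execute("BEGIN IMMEDIATE")
writer.execute("INSERT INTO t VALUES (1)")

# Readers are not blocked; they see the last committed state.
rows_seen = count_rows(other)

# A second writer cannot enter while the lock is held.
try:
    other.execute("BEGIN IMMEDIATE")
    second_writer_blocked = False
except sqlite3.OperationalError:
    second_writer_blocked = True

writer.execute("COMMIT")
```

With a non-zero `timeout`, the second `BEGIN IMMEDIATE` would wait for the lock up to that many seconds instead of failing right away.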
### When to Use Explicit Transactions
**Use `BEGIN IMMEDIATE ... COMMIT` for:**
1. **Multi-step logical operations** — Operations that should succeed or fail as a unit. Example:
```python
# Bad — two separate operations, race condition window exists
await db.execute("INSERT INTO sessions ...")
await db.commit()
# Good — atomic single operation, no need for explicit transaction
```
2. **Operations that combine multiple queries with shared state** — When the operation outcome depends on reading and then writing based on that read:
```python
# Bad: race condition, another request can run the same check between our read and write
existing_run = await import_run_repo.get_by_source_and_hash(db, source_id, content_hash)
if existing_run is None:
    run_id = await import_run_repo.create_pending(db, source_id, content_hash)

# Good: insert inside a transaction and let the UNIQUE constraint detect the race
try:
    await db.execute("BEGIN IMMEDIATE")
    cursor = await db.execute("INSERT INTO import_runs ...")
    await db.commit()
except aiosqlite.IntegrityError:
    # Another request won the race; end our transaction, then fetch the existing record
    await db.rollback()
    existing = await import_run_repo.get_by_source_and_hash(...)
    ...
```
3. **Bulk operations that should be all-or-nothing** — For example, upserting positive and negative geo cache entries:
```python
try:
    await db.execute("BEGIN IMMEDIATE")
    await bulk_upsert_entries(db, positive_rows)
    await bulk_upsert_neg_entries(db, negative_ips)
    await db.commit()
except Exception:
    await db.rollback()
    raise
```
**Do NOT use explicit transactions for:**
- Single SQL statements — SQLite guarantees atomic writes for individual statements. No explicit transaction needed.
- Read-only queries — Queries do not modify data and do not need transaction boundaries.
### Transaction Pattern
Always use this pattern for wrapped operations:
```python
try:
    await db.execute("BEGIN IMMEDIATE")
    # ... perform all operations ...
    await db.commit()
except Exception:
    await db.rollback()
    raise
```
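- **`BEGIN IMMEDIATE`**: Acquires the write lock when the transaction starts rather than at the first write, so no other writer can interleave with the statements inside the transaction. Without it, a transaction that starts with a read can fail with `SQLITE_BUSY` when it later tries to upgrade to a write lock.
- **`COMMIT`**: Persists all changes made in the transaction.
- **`ROLLBACK`**: Discards all changes on any exception, leaving the database in a consistent state.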
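The pattern can be exercised end to end with the standard-library `sqlite3` module. This is a sketch, not BanGUI's repository code: the `geo_cache` table, its columns, and the `upsert_entries` helper are hypothetical, and the connection is opened with `isolation_level=None` so that Python's implicit transaction handling does not interfere with the explicit `BEGIN IMMEDIATE`.

```python
import sqlite3


def upsert_entries(conn: sqlite3.Connection, rows: list) -> None:
    """Insert all rows atomically: either every row is stored or none is."""
    try:
        conn.execute("BEGIN IMMEDIATE")
        conn.executemany(
            "INSERT OR REPLACE INTO geo_cache (ip, country) VALUES (?, ?)", rows
        )
        conn.commit()
    except Exception:
        conn.rollback()
        raise


# Hypothetical schema, used only for this demonstration.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE geo_cache (ip TEXT PRIMARY KEY, country TEXT NOT NULL)")

# Both rows are valid, so both are committed together.
upsert_entries(conn, [("192.0.2.1", "DE"), ("192.0.2.2", "FR")])

try:
    # The second row violates NOT NULL, so the first row is rolled back as well.
    upsert_entries(conn, [("192.0.2.3", "US"), ("192.0.2.4", None)])
except sqlite3.IntegrityError:
    pass

stored_ips = [row[0] for row in conn.execute("SELECT ip FROM geo_cache ORDER BY ip")]
```

After the failed call, `stored_ips` contains only the two addresses from the first batch: the partially applied second batch left no trace.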
### Handling Race Condition Errors
When a `UNIQUE` constraint violation occurs due to a race condition (two concurrent requests attempt the same insert), the database raises `aiosqlite.IntegrityError`. **Handle this at the call site** by retrying the lookup:
```python
try:
    run_id = await import_run_repo.create_pending(db, source_id, content_hash)
except aiosqlite.IntegrityError:
    # Another concurrent request created it first
    existing = await import_run_repo.get_by_source_and_hash(db, source_id, content_hash)
    if existing is None:
        raise RuntimeError("Constraint error indicates row exists but lookup failed")
    run_id = existing.id
    log.info("lost_race", run_id=run_id)
```
This approach:
1. Lets the database constraint prevent data corruption.
2. Gracefully handles the concurrent case in application logic.
3. Avoids unnecessary locking overhead for the common case (no concurrent writers).
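The insert-first approach can be tried out with a self-contained sketch built on the standard-library `sqlite3` module. The `import_runs` column layout and the `create_or_get_run` helper are assumptions made for this example and are not taken from `import_run_repo`; two sequential calls stand in for two concurrent requests, with the second call playing the request that lost the race.

```python
import sqlite3


def create_or_get_run(conn: sqlite3.Connection, source_id: int, content_hash: str):
    """Return (run_id, created); created is False when the row already existed."""
    try:
        conn.execute("BEGIN IMMEDIATE")
        cursor = conn.execute(
            "INSERT INTO import_runs (source_id, content_hash, status) "
            "VALUES (?, ?, 'pending')",
            (source_id, content_hash),
        )
        conn.commit()
        return cursor.lastrowid, True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint fired: release our write lock, then look up the winner.
        conn.rollback()
        row = conn.execute(
            "SELECT id FROM import_runs WHERE source_id = ? AND content_hash = ?",
            (source_id, content_hash),
        ).fetchone()
        if row is None:
            raise RuntimeError("Constraint error indicates row exists but lookup failed")
        return row[0], False
    except Exception:
        conn.rollback()
        raise


# Hypothetical schema; only the UNIQUE constraint matters for the pattern.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE import_runs ("
    "id INTEGER PRIMARY KEY, source_id INTEGER NOT NULL, "
    "content_hash TEXT NOT NULL, status TEXT NOT NULL, "
    "UNIQUE (source_id, content_hash))"
)

first = create_or_get_run(conn, 1, "abc123")
second = create_or_get_run(conn, 1, "abc123")  # the request that lost the race
```

Both calls end up with the same run ID, and only one row exists in the table, which is the outcome the `UNIQUE` constraint is meant to guarantee.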
---
## 7. Structured Logging Policy
All logging in BanGUI services and tasks must use **structlog** for consistent, queryable event tracking. This policy defines when and how to log at each level.

@@ -1,48 +1,3 @@
## [CRITICAL] Health check endpoint returns wrong status code
**Where found**
- `backend/app/routers/health.py`: always returns 200, even when fail2ban is offline
**Why this is needed**
Docker health checks interpret 200 as "healthy". If fail2ban is offline but the backend returns 200, Docker considers the container healthy and does not restart it.
**Goal**
Return 503 Service Unavailable when fail2ban is offline.
**What to do**
1. Change health endpoint to return 503 when offline:
```python
if not server_status.online:
    return JSONResponse(
        status_code=503,
        content={"status": "unavailable", "fail2ban": "offline"}
    )
```
2. Update the Docker health check to treat 503 as "unhealthy"
**Possible traps and issues**
- Returning 503 causes orchestration tools to restart the container
- If fail2ban restarts frequently, health check becomes flaky
- Consider gradual degradation
**Docs changes needed**
- Update `Docker/Dockerfile.backend` health check documentation
- Update `Docs/Deployment.md` § Health Checks
**Doc references**
- `backend/app/routers/health.py`
- `Docker/Dockerfile.backend`
---
## [IMPORTANT] Database transactions lack explicit isolation
**Where found**