TASK-023: Make database migrations atomic

Replace non-atomic db.executescript() with explicit transaction control.
Wrap each migration's DDL statements and schema_migrations insert in a
single BEGIN IMMEDIATE ... COMMIT transaction to ensure atomicity.

Changes:
- Add _parse_migration_statements() to split migration scripts into
  individual statements while handling comments and string literals
- Update _apply_migration() to wrap all statements in a single explicit
  transaction with rollback on error
- Ensure _get_current_schema_version() uses execute() instead of
  executescript()
- Add 9 new tests for migration atomicity and statement parsing
- Update Backend-Development.md with migration authoring guidelines

If a crash occurs between DDL execution and schema_migrations insert,
the next startup will re-apply the entire migration atomically,
preventing partial migrations and data corruption.

Test coverage: 98% on db.py (up from 55%)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 14:40:27 +02:00
parent 81f009e323
commit a44f1ef35b
3 changed files with 378 additions and 6 deletions

@@ -364,6 +364,91 @@ assert escape_like("10.0.0.1") == "10.0.0.1" # Unchanged
---
## 6.2 Database Migrations
The application database schema is versioned and migrated automatically on startup via `app.db.init_db()`.
### Migration Design Principles
**Migrations must be atomic.** All schema changes for a single version (DDL statements) and the `schema_migrations` record insert must be wrapped in a single `BEGIN IMMEDIATE ... COMMIT` transaction. This prevents partial migrations if a process crashes mid-migration.
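If a crash occurs between migration steps, SQLite discards the uncommitted transaction, so neither the DDL changes nor the `schema_migrations` record persist. The next startup will:
1. Detect the missing `schema_migrations` record.
2. Re-apply the entire migration in a single transaction (all-or-nothing).
3. Leave the schema at either the old version or the new one, never in between.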
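
The all-or-nothing behaviour described above can be sketched with the standard-library `sqlite3` module. This is a minimal illustration, not the project's `_apply_migration()`: the function name `apply_migration_atomically` and the pre-split `statements` list are assumptions made for the example, and the real code uses an async connection.

```python
import sqlite3


def apply_migration_atomically(
    conn: sqlite3.Connection, version: int, statements: list[str]
) -> None:
    """Run every statement and record the version inside one transaction."""
    # Hypothetical helper for illustration; not the project's implementation.
    conn.execute("BEGIN IMMEDIATE")
    try:
        for statement in statements:
            conn.execute(statement)
        conn.execute(
            "INSERT INTO schema_migrations (version) VALUES (?)", (version,)
        )
        conn.execute("COMMIT")
    except Exception:
        # Undo the DDL and the version record together, then re-raise.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


# isolation_level=None turns off the module's implicit transaction handling,
# so the explicit BEGIN/COMMIT/ROLLBACK above are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE schema_migrations (version INTEGER PRIMARY KEY)")

try:
    apply_migration_atomically(
        conn,
        99,
        [
            "CREATE TABLE test_table (id INTEGER PRIMARY KEY)",
            "INSERT INTO nonexistent_table VALUES (1)",  # fails on purpose
        ],
    )
except sqlite3.OperationalError:
    pass

tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
]
recorded = conn.execute("SELECT COUNT(*) FROM schema_migrations").fetchone()[0]
```

After the failed run, `tables` holds only `schema_migrations` and `recorded` is `0`: the `CREATE TABLE` that succeeded before the error was rolled back along with everything else.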
### Writing a New Migration
1. **Add the DDL statements** to `_MIGRATIONS` dict in `app/db.py`:
```python
_MIGRATIONS: dict[int, str] = {
    1: _CREATE_INITIAL_SCHEMA,
    2: """
        -- Migration 2: Add new_column to users table.
        ALTER TABLE users ADD COLUMN new_column TEXT DEFAULT 'default_value';
        CREATE INDEX IF NOT EXISTS idx_users_new_column ON users(new_column);
    """,
}
```
2. **Update `_CURRENT_SCHEMA_VERSION`** to the new version number:
```python
_CURRENT_SCHEMA_VERSION: int = 2 # was 1
```
3. **Ensure idempotency where possible:**
- Use `CREATE TABLE IF NOT EXISTS` and `CREATE INDEX IF NOT EXISTS`.
- For `ALTER TABLE ADD COLUMN`, check if the column exists first using `PRAGMA table_info()` if re-applying the migration is a concern.
4. **Verify atomicity in tests:**
```python
from pathlib import Path

import pytest

import app.db as db_module
from app.db import _apply_migration, open_db  # assumed import path for both helpers


async def test_migration_2_is_atomic(tmp_path: Path) -> None:
    """Verify that a failing migration rolls back completely."""
    db = await open_db(str(tmp_path / "test.db"))
    try:
        await db.execute("CREATE TABLE schema_migrations (version INTEGER PRIMARY KEY);")
        await db.commit()
        # Register a test migration that fails midway.
        original = db_module._MIGRATIONS.copy()
        db_module._MIGRATIONS[99] = """
            CREATE TABLE test_table (id INTEGER PRIMARY KEY);
            INSERT INTO nonexistent_table VALUES (1);
        """
        try:
            with pytest.raises(Exception):
                await _apply_migration(db, 99)
            # Verify rollback: migration NOT recorded.
            async with db.execute(
                "SELECT version FROM schema_migrations WHERE version = 99;"
            ) as cursor:
                assert await cursor.fetchone() is None
            # Verify rollback: table NOT created.
            async with db.execute(
                "SELECT name FROM sqlite_master WHERE type='table' AND name='test_table';"
            ) as cursor:
                assert await cursor.fetchone() is None
        finally:
            # Restore the original migration registry.
            db_module._MIGRATIONS = original
    finally:
        await db.close()
```
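
For step 3, the `PRAGMA table_info()` check could look roughly like the sketch below. It uses the synchronous standard-library `sqlite3` module for brevity, and `column_exists` is a hypothetical helper, not a function from `app/db.py`. Because the check runs in Python rather than in the SQL script, it would have to happen outside the migration string.

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Return True if `table` already has a column named `column`."""
    # Hypothetical helper for illustration only.
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # The table name is interpolated, so pass only trusted identifiers.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")

# Running the guarded ALTER twice is safe: the second pass is skipped.
for _ in range(2):
    if not column_exists(conn, "users", "new_column"):
        conn.execute(
            "ALTER TABLE users ADD COLUMN new_column TEXT DEFAULT 'default_value'"
        )
```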
### Common Pitfalls
- **Non-idempotent statements:** SQLite has no `IF NOT EXISTS` clause for `ALTER TABLE ... ADD COLUMN`, so the statement fails with a duplicate-column error on re-run. Use explicit checks if needed.
- **Comments containing semicolons:** the migration parser strips comments before splitting statements, but avoid unusual comment syntax.
- **String literals with semicolons:** the parser handles these; no special escaping is needed.
- **Multiple operations in one migration:** keep migrations focused. Combine related DDL but split unrelated changes.
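
To make the comment and string-literal pitfalls concrete, here is a rough sketch of how a statement splitter can treat semicolons. It is not the project's `_parse_migration_statements()`; the name `split_sql_statements` and its exact behaviour are assumptions for illustration. It does not handle `CREATE TRIGGER ... BEGIN ... END;` bodies or backtick- and bracket-quoted identifiers.

```python
def split_sql_statements(script: str) -> list[str]:
    """Split a SQL script on semicolons outside comments and quoted text."""
    # Hypothetical sketch; the real parser may behave differently.
    statements: list[str] = []
    current: list[str] = []
    i, n = 0, len(script)
    while i < n:
        ch = script[i]
        if script.startswith("--", i):
            # Line comment: drop everything up to (not including) the newline.
            end = script.find("\n", i)
            i = n if end == -1 else end
        elif script.startswith("/*", i):
            # Block comment: drop everything through the closing */.
            end = script.find("*/", i + 2)
            i = n if end == -1 else end + 2
        elif ch in ("'", '"'):
            # Quoted literal or identifier: copy verbatim to the closing quote.
            # A doubled quote ('' or "") is an escaped quote, not the end.
            j = i + 1
            while j < n:
                if script[j] == ch:
                    if j + 1 < n and script[j + 1] == ch:
                        j += 2
                        continue
                    break
                j += 1
            current.append(script[i : j + 1])
            i = j + 1
        elif ch == ";":
            # Statement boundary: emit the accumulated text if non-empty.
            statement = "".join(current).strip()
            if statement:
                statements.append(statement)
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


script = """
-- Migration 2: add a column; then an index.
ALTER TABLE users ADD COLUMN note TEXT DEFAULT 'a;b';
/* block; comment */
CREATE INDEX idx_users_note ON users(note);
"""
parsed = split_sql_statements(script)
```

Here `parsed` contains two statements: the semicolons inside the two comments and inside `'a;b'` do not create extra splits.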
---
## 7. Logging
- Use **structlog** for every log message.