Replace the racy check-then-insert sequence with INSERT ... ON CONFLICT.
- upsert_pending uses RETURNING id for atomic upsert
- UNIQUE(source_id, content_hash) constraint from migration 6
- blocklist_import_workflow updated to use upsert_pending
- test_import_source_success fixed for async mock patterns
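
A minimal sketch of the atomic upsert, assuming SQLite 3.35+ (needed for RETURNING) and a hypothetical import_runs schema. Only the UNIQUE(source_id, content_hash) constraint and the RETURNING id clause come from this commit; the status column and the function signature are illustrative and may differ from the real upsert_pending.

```python
import sqlite3

# Hypothetical schema: only the UNIQUE(source_id, content_hash) constraint
# is taken from the commit message; the other columns are assumptions.
SCHEMA = """
CREATE TABLE import_runs (
    id INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL,
    content_hash TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    UNIQUE (source_id, content_hash)
)
"""


def upsert_pending(conn: sqlite3.Connection, source_id: int, content_hash: str) -> int:
    """Insert a pending run or return the id of the existing row, in one statement."""
    cur = conn.execute(
        """
        INSERT INTO import_runs (source_id, content_hash)
        VALUES (?, ?)
        ON CONFLICT (source_id, content_hash)
        DO UPDATE SET content_hash = excluded.content_hash
        RETURNING id
        """,
        (source_id, content_hash),
    )
    rows = cur.fetchall()  # drain the cursor before committing
    conn.commit()
    return rows[0][0]
```

The no-op DO UPDATE is there because DO NOTHING would return no row on conflict, so the caller could not learn the existing id without a second query.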
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- All backend routers moved to /api/v1/ prefix
- Frontend BASE_URL updated to /api/v1
- Setup redirect middleware updated to redirect to /api/v1/setup
- Health router path fixed: prefix=/api/v1/health, @router.get('')
- conftest.py: set server_status=online for test fixture
- Created Docs/API_VERSIONING.md with deprecation policy
- Updated Docs/Backend-Development.md with versioning section
- Updated Instructions.md curl examples
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CRITICAL FIX: Background tasks (especially blocklist_import) could crash mid-execution
and leave partial state. On retry, the same bans were applied again, causing duplicates.
Solution: Content-hash based operation tracking for blocklist imports:
- Added import_runs table (migration 6) to track operations by source + content hash
- Before banning, check if this exact content has already been imported
- If completed: skip banning (already done), optionally re-warm cache
- If new or failed: proceed with ban and mark as completed or failed
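
The check described above can be sketched as follows. This is an illustration only: an in-memory dict stands in for the import_runs table, SHA-256 is an assumed hash algorithm, and import_source here is a simplified, hypothetical signature rather than the real workflow function.

```python
import hashlib
from typing import Callable


def content_hash(content: str) -> str:
    """SHA-256 of the raw blocklist body (the hash algorithm is an assumption)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def import_source(
    source_id: int,
    content: str,
    runs: dict[tuple[int, str], str],
    ban: Callable[[str], None],
) -> str:
    """Ban every IP in content unless this exact content was already imported.

    `runs` is an in-memory stand-in for the import_runs table.
    """
    key = (source_id, content_hash(content))
    if runs.get(key) == "completed":
        return "skipped"  # already done; a real workflow might re-warm its cache here
    try:
        for ip in content.split():
            ban(ip)
    except Exception:
        runs[key] = "failed"  # a later retry will proceed again
        raise
    runs[key] = "completed"
    return "imported"
```

Calling import_source twice with the same source and content bans each IP once; changing the content produces a new hash and a fresh import.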
Changes:
- Database: Migration 6 adds import_runs table with operation state tracking
- Model: Added ImportRunEntry for import run records
- Repository: New import_run_repo module with CRUD operations
- Workflow: Updated blocklist_import_workflow to check operation history before banning
- Dependencies: Registered import_run_repo for dependency injection
- Tests: Added test_import_source_idempotent_on_retry and test_import_source_different_content_not_reused
- Documentation: Added Task Idempotency section to Backend-Development.md
Verification:
- All 7 import tests pass (5 existing + 2 new idempotency tests)
- Type checking: mypy --strict ✅
- Linting: ruff ✅
- No API changes, backwards compatible via automatic migration
Fixes: Background tasks not idempotent #CRITICAL
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implement transactional setup with explicit state machine and crash-safety
to prevent partial commits from leaving inconsistent state.
## Changes
### Core Implementation
1. **settings_repo.py**: Add atomic batch settings write
- New set_settings_batch() method: writes multiple settings in single
transaction (BEGIN IMMEDIATE ... COMMIT). Either all settings persist
or none do, preventing partial state if crash occurs mid-batch.
2. **setup_service.py**: Refactor run_setup() with transactional phases
- Phase 0: Compute password hash early (before any DB writes) to ensure
idempotency. Same hash is used throughout retries, preventing divergent
hashes from bcrypt's random salt.
- Phase 1 (Bootstrap DB transaction): Set setup_state=in_progress and
database_path, then commit. First checkpoint for crash detection.
- Phase 2 (Filesystem): Initialize runtime database (idempotent)
- Phase 3 (Runtime DB transaction): Batch-write all settings atomically
- Phase 4 (Bootstrap DB transaction): Set setup_state=complete and
setup_completed=1. Final commit point.
3. **protocols.py**: Add set_settings_batch to SettingsRepository protocol
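
A minimal sketch of the batch write from item 1, assuming a hypothetical settings(key, value) table and a connection opened with isolation_level=None so that the explicit BEGIN IMMEDIATE is not nested inside an implicit transaction. The real set_settings_batch() may be async and differ in detail.

```python
import sqlite3


def set_settings_batch(conn: sqlite3.Connection, settings: dict[str, str]) -> None:
    """Write all settings in one transaction: either every row persists or none."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        conn.executemany(
            "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
            settings.items(),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # discard any rows written before the failure
        raise
```

If the process dies between BEGIN IMMEDIATE and COMMIT, SQLite discards the uncommitted rows when the database is next opened, which gives the all-or-nothing behaviour described above.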
### Testing
- Added 6 new transactionality tests covering:
- State machine transitions (None → in_progress → complete)
- Password hash idempotency across retries
- Atomic batch writes (all-or-nothing persistence)
- Bootstrap DB state tracking
- Database path propagation to both DBs
- Recovery on partial failure
- All 18 tests pass (12 existing + 6 new)
### Documentation
- Updated Docs/Architekture.md with new section 6:
- Setup state machine with state transitions
- Transaction boundary documentation
- Password hash idempotency rationale
- Backward compatibility notes
## Design Decisions
### Why This Approach
- Previous code was already idempotent via INSERT OR REPLACE, but
re-hashing the password on every retry (fresh bcrypt salt each time)
created a risk of silently inconsistent hashes
- Simpler than a multi-state machine: two states (in_progress, complete)
are sufficient for crash detection
- Maintains backward compatibility (setup_completed key still written)
- Explicit transactions make crash-safety obvious to future maintainers
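
To illustrate the password hash point: a salted hash differs on every call, so it has to be computed once and reused across retries. The sketch below uses hashlib.pbkdf2_hmac with a random salt as a stand-in for bcrypt (which is not in the standard library); hash_password, setup_with_retries and write_hash are hypothetical names, not the project's API.

```python
import hashlib
import os
from typing import Callable


def hash_password(password: str) -> str:
    """Salted hash; like bcrypt, it draws a fresh random salt on every call."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt.hex() + "$" + digest.hex()


def setup_with_retries(
    password: str, write_hash: Callable[[str], None], attempts: int = 3
) -> str:
    """Compute the hash once (Phase 0), then reuse it for every write attempt."""
    password_hash = hash_password(password)
    last_error = None
    for _ in range(attempts):
        try:
            write_hash(password_hash)
            return password_hash
        except OSError as exc:
            last_error = exc  # transient failure: retry with the same hash
    raise RuntimeError("setup failed after retries") from last_error
```

Had hash_password been called inside the loop, each retry would have tried to persist a different value for the same password.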
### Crash Scenarios Now Handled
1. Crash after Phase 1 → detected by setup_state=in_progress on retry
2. Crash after Phase 2 → runtime DB may be partial, safe to retry
3. Crash during Phase 3 → uncommitted batch rolls back on next connection, safe to retry
4. Crash after Phase 4 → setup_completed detected, skipped
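
The state machine and the crash scenarios above can be simulated roughly like this. Plain dicts stand in for the bootstrap and runtime databases, and SimulatedCrash, crash_after and schema_version are invented for the illustration; the real run_setup() has a different signature.

```python
from typing import Optional


class SimulatedCrash(RuntimeError):
    """Raised by the sketch to imitate the process dying between phases."""


def run_setup(
    bootstrap: dict,
    runtime: dict,
    settings: dict,
    crash_after: Optional[int] = None,
) -> str:
    if bootstrap.get("setup_state") == "complete":
        return "skipped"  # scenario 4: nothing left to do
    resumed = bootstrap.get("setup_state") == "in_progress"  # scenario 1

    bootstrap["setup_state"] = "in_progress"  # Phase 1: first checkpoint
    if crash_after == 1:
        raise SimulatedCrash("after phase 1")

    runtime.setdefault("schema_version", 1)  # Phase 2: idempotent initialisation
    if crash_after == 2:
        raise SimulatedCrash("after phase 2")

    runtime.update(settings)  # Phase 3: stands in for the atomic batch write
    if crash_after == 3:
        raise SimulatedCrash("after phase 3")

    bootstrap["setup_state"] = "complete"  # Phase 4: final commit point
    bootstrap["setup_completed"] = "1"
    return "resumed" if resumed else "completed"
```

Every phase is safe to repeat, so a retry after a crash at any checkpoint converges on the same final state.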
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add ban domain model for core business logic separation
- Implement mapper pattern for DTO/domain conversions
- Update ban service with new domain-driven approach
- Refactor router endpoints to use new architecture
- Add comprehensive mapper tests
- Update documentation with architecture changes
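
A minimal sketch of the domain model plus mapper split, using standard-library dataclasses. The class names, fields and mapper function names are hypothetical and chosen only to show the shape of the pattern; the project's actual models may use a different library and field set.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Ban:
    """Domain model: typed values, no transport concerns (fields are assumptions)."""
    ip: str
    jail: str
    banned_at: datetime


@dataclass(frozen=True)
class BanDTO:
    """Wire format: JSON-friendly primitives only."""
    ip: str
    jail: str
    banned_at: str  # ISO 8601 timestamp


def to_dto(ban: Ban) -> BanDTO:
    """Domain -> DTO: serialise the timestamp for the API layer."""
    return BanDTO(ip=ban.ip, jail=ban.jail, banned_at=ban.banned_at.isoformat())


def to_domain(dto: BanDTO) -> Ban:
    """DTO -> domain: parse the timestamp back into a datetime."""
    return Ban(ip=dto.ip, jail=dto.jail, banned_at=datetime.fromisoformat(dto.banned_at))
```

Keeping the conversions in one place means routers only handle DTOs and the service only handles domain objects.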
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem: Repository modules use structural typing to satisfy Protocol interfaces via
cast(). A function rename, parameter change, or signature mismatch would silently pass
mypy but fail at runtime.
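
The failure mode can be reproduced in a few lines. In this sketch a SimpleNamespace stands in for a repository module, and list_sources / list_all_sources are hypothetical function names; the point is only that cast() makes the type checker accept the object without inspecting it.

```python
from types import SimpleNamespace
from typing import Protocol, cast


class BlocklistRepository(Protocol):
    def list_sources(self) -> list[str]: ...


# Stand-in for a repository module whose function was renamed after the
# Protocol was written (both names are hypothetical).
blocklist_repo_module = SimpleNamespace(list_all_sources=lambda: ["example"])


def get_blocklist_repo() -> BlocklistRepository:
    # cast() performs no runtime or static check; the checker simply trusts it.
    return cast(BlocklistRepository, blocklist_repo_module)
```

Type checking this file succeeds, yet get_blocklist_repo().list_sources() raises AttributeError when it is first called.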
Solution (Option B — minimal):
1. Aligned Protocol signatures in protocols.py with actual implementations:
- BlocklistRepository: dict[str, object] → dict[str, Any] (matches implementation)
- ImportLogRepository: dict[str, object] → ImportLogRow (typed model)
- GeoCacheRepository: dict[str, object] → GeoCacheRow; Iterable → Sequence
- HistoryArchiveRepository: dict[str, object] → dict[str, Any]
- ImportLogRepository: async compute_total_pages → sync (matches implementation)
2. Created CI validation script (backend/scripts/validate_repository_protocols.py)
that runs at build time to ensure all repository modules satisfy their Protocol
interfaces. Exit 0 if valid, 1 if any mismatch. Detects:
- Missing functions
- Parameter count mismatches
- Type annotation mismatches
- Return type mismatches
3. Updated backend/app/dependencies.py with explicit docstrings linking each
get_*_repo() provider to Backend-Development.md § 13.7.1, explaining the
module-as-Protocol pattern and that it is intentional and validated.
4. Documented the pattern in Backend-Development.md § 13.7.1:
'Repository Module Pattern — Module-as-Protocol Structural Compatibility'
explaining why the pattern works, risks (silent breakage), and how the
validation mitigates it.
5. Fixed type annotation in history_archive_repo.py:
- get_all_archived_history returns list[dict] → list[dict[str, Any]]
- Imported Any type
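
One way such a validation could work is sketched below with inspect.signature. This is not the contents of validate_repository_protocols.py; protocol_mismatches and the example lookup functions are hypothetical, and annotations are compared by simple equality, which is stricter than real structural subtyping.

```python
import inspect
from types import SimpleNamespace
from typing import Protocol


def protocol_mismatches(module: object, protocol: type) -> list[str]:
    """Compare a module's public functions against a Protocol's methods."""
    errors: list[str] = []
    for name, expected in vars(protocol).items():
        if name.startswith("_") or not inspect.isfunction(expected):
            continue
        actual = getattr(module, name, None)
        if not callable(actual):
            errors.append(f"{name}: missing")
            continue
        if inspect.iscoroutinefunction(expected) != inspect.iscoroutinefunction(actual):
            errors.append(f"{name}: async/sync mismatch")
        want_sig = inspect.signature(expected)
        got_sig = inspect.signature(actual)
        want = list(want_sig.parameters.values())[1:]  # drop `self`
        got = list(got_sig.parameters.values())
        if len(want) != len(got):
            errors.append(f"{name}: expected {len(want)} parameters, found {len(got)}")
        else:
            for w, g in zip(want, got):
                if w.annotation != g.annotation:
                    errors.append(f"{name}: parameter {g.name!r} annotation differs")
        if want_sig.return_annotation != got_sig.return_annotation:
            errors.append(f"{name}: return annotation differs")
    return errors


class GeoCacheRepository(Protocol):  # method name is hypothetical
    def lookup(self, ip: str) -> str: ...


def lookup(ip: str) -> str:
    return "DE"


def lookup_wrong(ip: str, strict: bool) -> int:
    return 0
```

A CI script could call this for each module and Protocol pair and exit with status 1 if any returned list is non-empty.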
Benefits:
- Prevents silent breakage of repository interfaces
- Formalizes the module-as-Protocol pattern as intentional
- CI validation prevents regressions without refactoring cost
- All repository tests pass (53/53)
- mypy --strict passes on modified files
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>