fix: atomic upsert for import runs (Issue #12)

Replace check-then-insert race condition with INSERT ON CONFLICT.
- upsert_pending uses RETURNING id for atomic upsert
- UNIQUE(source_id, content_hash) constraint from migration 6
- blocklist_import_workflow updated to use upsert_pending
- test_import_source_success fixed for async mock patterns

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-02 23:39:43 +02:00
parent 1285bc8571
commit e436727942
11 changed files with 144 additions and 164 deletions


@@ -1,99 +1,3 @@
### Issue #10: HIGH - Database Type Inconsistency (Timestamps Mixed Across Tables)
**Where found**:
- `backend/app/db.py` (lines 68-75)
- `import_log` table uses TEXT ISO 8601 format
- `history_archive` table uses INTEGER UNIX timestamp
- Frontend receives both formats
**Why this is needed**:
Frontend parsing code must handle multiple formats:
- Parser might parse one format incorrectly
- Inconsistent type representation makes bugs harder to track
- Aggregation queries mixing both formats require conversions
**Goal**:
Standardize all timestamps on UNIX timestamps (INTEGER seconds since epoch, UTC) throughout the entire database.
**What to do**:
1. Migrate `import_log.timestamp` from TEXT to INTEGER:
```sql
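-- Add the new INTEGER column, backfill it, then swap it in for the TEXT column.
ALTER TABLE import_log ADD COLUMN timestamp_unix INTEGER;
-- strftime('%s', ...) returns TEXT and yields NULL for values it cannot parse.
UPDATE import_log SET timestamp_unix = CAST(strftime('%s', timestamp) AS INTEGER);
-- DROP COLUMN requires SQLite 3.35 or newer.
ALTER TABLE import_log DROP COLUMN timestamp;
ALTER TABLE import_log RENAME COLUMN timestamp_unix TO timestamp;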
```
2. Update all code to use UNIX timestamps
3. Add validation in the repositories so that only UNIX timestamps (INTEGER seconds) are accepted
4. Update frontend parsing to handle UNIX timestamps
5. Write migration tests with various date formats
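
The migration steps above can be sketched with Python's standard `sqlite3` module. This is a minimal sketch under stated assumptions, not the project's actual migration: the function name `migrate_timestamps` is hypothetical, the database is assumed to be SQLite 3.35 or newer (needed for `DROP COLUMN`), and the caller is assumed to have no open transaction on the connection. It checks for unparseable values first so that a bad row stops the migration before anything is changed.

```python
import sqlite3


def migrate_timestamps(conn: sqlite3.Connection) -> int:
    """Convert import_log.timestamp from ISO 8601 TEXT to INTEGER seconds.

    Returns the number of rows that could not be parsed. If that number is
    non-zero, the table is left untouched. (Hypothetical helper.)
    """
    # strftime('%s', ...) yields NULL for invalid or NULL input.
    bad_rows = conn.execute(
        "SELECT COUNT(*) FROM import_log WHERE strftime('%s', timestamp) IS NULL"
    ).fetchone()[0]
    if bad_rows:
        return bad_rows

    # Run all four steps in one explicit transaction so a failure rolls back.
    conn.execute("BEGIN")
    try:
        conn.execute("ALTER TABLE import_log ADD COLUMN timestamp_unix INTEGER")
        conn.execute(
            "UPDATE import_log "
            "SET timestamp_unix = CAST(strftime('%s', timestamp) AS INTEGER)"
        )
        conn.execute("ALTER TABLE import_log DROP COLUMN timestamp")
        conn.execute(
            "ALTER TABLE import_log RENAME COLUMN timestamp_unix TO timestamp"
        )
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise
    return 0
```

Returning the count of bad rows instead of raising lets a migration runner decide whether to repair the data or abort.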
**Possible traps and issues**:
- Existing data with invalid timestamps causes migration to fail
- Timezone issues if code assumes local time
- Backward compatibility breaks for old APIs
- Performance impact of migration on large tables
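
The first two traps (invalid values and timezone assumptions) can also be handled in application code. The helper below is an illustrative sketch only: `iso_to_unix` is a hypothetical name, and treating timestamps without an offset as UTC is an assumption that would need to be checked against how `import_log` rows were originally written.

```python
from datetime import datetime, timezone
from typing import Optional


def iso_to_unix(value: str) -> Optional[int]:
    """Parse an ISO 8601 string into UNIX seconds, or return None if invalid."""
    try:
        parsed = datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # Assumption: stored values without an offset are UTC, not local time.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())
```

Attaching an explicit timezone before calling `timestamp()` avoids Python's default of interpreting naive datetimes in the server's local time.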
**Docs changes needed**:
- Document timestamp format requirement in API docs
- Add to backend development guide
- Create migration runbook
**Doc references**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "1.2 Data Type Inconsistency"
---
### Issue #11: HIGH - Foreign Key ON DELETE Semantics Problem (Data Loss)
**Where found**:
- `backend/app/db.py` (lines 61-70)
- `import_log.source_id` uses `ON DELETE SET NULL`
- `import_log.source_url` remains populated
- Orphaned log records with NULL source_id but populated URL
**Why this is needed**:
When a blocklist source is deleted:
- Import logs become orphaned and meaningless
- UI can't link logs back to source
- Data becomes inconsistent
**Goal**:
Fix foreign key cascade strategy to maintain data integrity.
**What to do**:
1. Decide cascade strategy:
- Option A: `ON DELETE CASCADE` - delete all logs when source deleted (data loss)
- Option B: `ON DELETE RESTRICT` - prevent source deletion if logs exist (prevent data loss)
- Option C: Keep source_id for history without URL (reference counting)
2. Recommendation: Use Option B with a proper deletion workflow:
```sql
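-- Note: DROP/ADD CONSTRAINT is PostgreSQL-style syntax. SQLite's ALTER TABLE
-- does not support it; there the table must be rebuilt with the new constraint.
ALTER TABLE import_log
    DROP CONSTRAINT fk_source_id,
    ADD CONSTRAINT fk_source_id
        FOREIGN KEY (source_id) REFERENCES blocklist_sources(id)
        ON DELETE RESTRICT;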
```
3. Before deleting a source, archive and then delete its old logs
4. Update deletion API to handle RESTRICT error
5. Document deletion procedures
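
How Option B behaves at runtime can be sketched with Python's standard `sqlite3` module. This is a minimal sketch, not the project's code: the schema is reduced to the columns named in this issue, and `create_schema` and `delete_source` are hypothetical helpers. It assumes SQLite, where foreign keys are only enforced after `PRAGMA foreign_keys = ON` is set on the connection and a RESTRICT violation surfaces as `sqlite3.IntegrityError`.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a reduced, illustrative schema with ON DELETE RESTRICT."""
    # SQLite ignores foreign keys unless they are enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE blocklist_sources (
            id  INTEGER PRIMARY KEY,
            url TEXT NOT NULL
        );
        CREATE TABLE import_log (
            id         INTEGER PRIMARY KEY,
            source_id  INTEGER REFERENCES blocklist_sources(id) ON DELETE RESTRICT,
            source_url TEXT
        );
        """
    )


def delete_source(conn: sqlite3.Connection, source_id: int,
                  purge_logs: bool = False) -> bool:
    """Delete a source; return False if RESTRICT blocked the deletion.

    With purge_logs=True the logs are removed first, in the same transaction.
    Archiving them beforehand is left out of this sketch.
    """
    try:
        with conn:  # commits on success, rolls back on error
            if purge_logs:
                conn.execute(
                    "DELETE FROM import_log WHERE source_id = ?", (source_id,)
                )
            conn.execute(
                "DELETE FROM blocklist_sources WHERE id = ?", (source_id,)
            )
    except sqlite3.IntegrityError:
        return False
    return True
```

A deletion API could map the `False` result to a conflict response that tells the caller to archive or purge the logs first.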
**Possible traps and issues**:
- Existing schema already has ON DELETE SET NULL - migration needed
- User tries to delete source with logs - must handle RESTRICT error
- Cascading deletes can cause unexpected data loss
- Historical logs might be valuable for audit
**Docs changes needed**:
- Add data retention policy to `Docs/Features.md`
- Document deletion constraints in API docs
- Add to admin guide
**Doc references**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "1.3 Foreign Key ON DELETE"
---
### Issue #12: HIGH - Race Condition in Concurrent Writes (Import Runs Duplication)
**Where found**: