fix: atomic upsert for import runs (Issue #12)
Replace check-then-insert race condition with INSERT ON CONFLICT.

- upsert_pending uses RETURNING id for atomic upsert
- UNIQUE(source_id, content_hash) constraint from migration 6
- blocklist_import_workflow updated to use upsert_pending
- test_import_source_success fixed for async mock patterns

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
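For illustration, here is a minimal sketch of what an atomic upsert of this shape could look like with Python's `sqlite3` module. Only the name `upsert_pending`, the use of `RETURNING id`, and the `UNIQUE(source_id, content_hash)` constraint come from the commit message. The other `import_runs` columns, the `'pending'` default, and the function body are assumptions, not the committed code.

```python
import sqlite3

# Hypothetical schema: only UNIQUE(source_id, content_hash) is taken from the commit message.
SCHEMA = """
CREATE TABLE import_runs (
    id INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL,
    content_hash TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    UNIQUE (source_id, content_hash)
);
"""


def upsert_pending(conn, source_id, content_hash):
    """Return the run id for (source_id, content_hash), creating the row if absent."""
    key = (source_id, content_hash)
    with conn:  # one transaction: commit on success, roll back on error
        if sqlite3.sqlite_version_info >= (3, 35, 0):
            # RETURNING needs SQLite 3.35+. The no-op DO UPDATE makes the
            # statement return the existing row's id when the key already exists.
            row = conn.execute(
                "INSERT INTO import_runs (source_id, content_hash) VALUES (?, ?) "
                "ON CONFLICT (source_id, content_hash) "
                "DO UPDATE SET source_id = excluded.source_id "
                "RETURNING id",
                key,
            ).fetchone()
        else:
            # Fallback for older SQLite: insert-or-ignore, then read the id back.
            conn.execute(
                "INSERT INTO import_runs (source_id, content_hash) VALUES (?, ?) "
                "ON CONFLICT (source_id, content_hash) DO NOTHING",
                key,
            )
            row = conn.execute(
                "SELECT id FROM import_runs WHERE source_id = ? AND content_hash = ?",
                key,
            ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
first = upsert_pending(conn, 1, "abc")
second = upsert_pending(conn, 1, "abc")  # same key: no duplicate row
other = upsert_pending(conn, 1, "def")   # new content hash: new row
```

Because the uniqueness check and the insert happen in one statement, two concurrent callers with the same key cannot both create a row, which is the gap a separate SELECT followed by INSERT leaves open.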
@@ -121,6 +121,7 @@ rm bangui.db bangui.db-wal bangui.db-shm
| 6 | Add import_runs table for idempotent imports |
| 7 | Add indexes to import_log |
| 8 | Migrate import_log.timestamp TEXT→INTEGER UNIX |
| 9 | Change import_log.source_id FK to ON DELETE RESTRICT |

## Adding New Migrations
@@ -358,6 +358,13 @@ Automated downloading and applying of external IP blocklists to block known mali
- Display the import log in the web interface, filterable by source and date range.
- Show a warning badge in the navigation if the most recent import encountered errors.

### Data Retention & Deletion

- Import logs are retained for audit and troubleshooting purposes.
- A blocklist source **cannot be deleted** while it has associated import logs (foreign key RESTRICT constraint).
- Before deleting a source, delete all its import logs first via the API.
- Attempting to delete a source with logs returns **HTTP 409 Conflict** with error code `blocklist_source_has_logs`.
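
The deletion rules above can be sketched as follows, assuming a SQLite backend accessed through Python's `sqlite3` module. The function name `delete_source`, the table columns, the response body shape, and the `204` success status are hypothetical. Only the RESTRICT constraint, the 409 status, and the `blocklist_source_has_logs` code come from the text.

```python
import sqlite3


def delete_source(conn, source_id):
    """Return a (status, body) pair for an attempt to delete a blocklist source."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("DELETE FROM blocklist_sources WHERE id = ?", (source_id,))
    except sqlite3.IntegrityError:
        # The RESTRICT foreign key rejected the delete: logs still reference the source.
        return 409, {"error": "blocklist_source_has_logs"}
    return 204, None  # assumed success status


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE blocklist_sources (id INTEGER PRIMARY KEY, url TEXT NOT NULL);
CREATE TABLE import_log (
    id INTEGER PRIMARY KEY,
    source_id INTEGER REFERENCES blocklist_sources(id) ON DELETE RESTRICT
);
INSERT INTO blocklist_sources (id, url) VALUES (1, 'https://example.com/list.txt');
INSERT INTO import_log (source_id) VALUES (1);
""")

blocked = delete_source(conn, 1)   # logs exist, so the delete is refused
with conn:
    conn.execute("DELETE FROM import_log WHERE source_id = 1")
allowed = delete_source(conn, 1)   # logs removed first, so the delete succeeds
```

Note that SQLite ignores foreign key constraints unless `PRAGMA foreign_keys = ON` is set on the connection, so the 409 path depends on that pragma being enabled.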

### Error Handling

- If a blocklist URL is unreachable, log the error and continue with remaining sources.
@@ -1,99 +1,3 @@
### Issue #10: HIGH - Database Type Inconsistency (Timestamps Mixed Across Tables)

**Where found**:
- `backend/app/db.py` (lines 68-75)
- `import_log` table uses TEXT ISO 8601 format
- `history_archive` table uses INTEGER UNIX timestamp
- Frontend receives both formats

**Why this is needed**:
Frontend parsing code must handle multiple formats:
- Parser might parse one format incorrectly
- Inconsistent type representation makes bugs harder to track
- Aggregation queries mixing both formats require conversions

**Goal**:
Standardize all timestamps on UNIX timestamps (INTEGER seconds since epoch) throughout entire database.

**What to do**:
1. Migrate `import_log.timestamp` from TEXT to INTEGER:
   ```sql
   ALTER TABLE import_log ADD COLUMN timestamp_unix INTEGER;
   UPDATE import_log SET timestamp_unix = strftime('%s', timestamp);
   ALTER TABLE import_log DROP COLUMN timestamp;
   ALTER TABLE import_log RENAME COLUMN timestamp_unix TO timestamp;
   ```
2. Update all code to use UNIX timestamps
3. Add validation in repositories that timestamps are UNIX format
4. Update frontend parsing to handle UNIX timestamps
5. Write migration tests with various date formats

**Possible traps and issues**:
- Existing data with invalid timestamps causes migration to fail
- Timezone issues if code assumes local time
- Backward compatibility breaks for old APIs
- Performance impact of migration on large tables

**Docs changes needed**:
- Document timestamp format requirement in API docs
- Add to backend development guide
- Create migration runbook

**Doc references**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "1.2 Data Type Inconsistency"
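
The "invalid timestamps" trap in Issue #10 can be checked before running the migration. The sketch below assumes a SQLite database, where `strftime('%s', ...)` yields NULL for text it cannot parse. The sample rows and the helper names `unconvertible_rows` and `to_unix` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE import_log (id INTEGER PRIMARY KEY, timestamp TEXT)")
conn.executemany(
    "INSERT INTO import_log (timestamp) VALUES (?)",
    [("2024-01-01T00:00:00",), ("2024-01-01 00:00:00",), ("not a date",)],
)


def unconvertible_rows(conn):
    """List rows whose TEXT timestamp would become NULL in the migration."""
    return conn.execute(
        "SELECT id, timestamp FROM import_log "
        "WHERE timestamp IS NOT NULL AND strftime('%s', timestamp) IS NULL"
    ).fetchall()


def to_unix(conn, text):
    """Convert one ISO 8601 string the way the migration's UPDATE would."""
    # strftime returns TEXT, so the CAST makes the integer result explicit.
    return conn.execute(
        "SELECT CAST(strftime('%s', ?) AS INTEGER)", (text,)
    ).fetchone()[0]


bad = unconvertible_rows(conn)
epoch = to_unix(conn, "2024-01-01T00:00:00")
```

SQLite interprets a timestamp without a timezone suffix as UTC, so values that were written in local time would be shifted by the conversion. That is the timezone trap listed above.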

---

### Issue #11: HIGH - Foreign Key ON DELETE Semantics Problem (Data Loss)

**Where found**:
- `backend/app/db.py` (lines 61-70)
- `import_log.source_id` uses `ON DELETE SET NULL`
- `import_log.source_url` remains populated
- Orphaned log records with NULL source_id but populated URL

**Why this is needed**:
When a blocklist source is deleted:
- Import logs become orphaned and meaningless
- UI can't link logs back to source
- Data becomes inconsistent

**Goal**:
Fix foreign key cascade strategy to maintain data integrity.

**What to do**:
1. Decide cascade strategy:
   - Option A: `ON DELETE CASCADE` - delete all logs when source deleted (data loss)
   - Option B: `ON DELETE RESTRICT` - prevent source deletion if logs exist (prevent data loss)
   - Option C: Keep source_id for history without URL (reference counting)
2. Recommendation: Use Option B with proper deletion workflow:
   ```sql
   ALTER TABLE import_log
   DROP CONSTRAINT fk_source_id,
   ADD CONSTRAINT fk_source_id
   FOREIGN KEY(source_id) REFERENCES blocklist_sources(id)
   ON DELETE RESTRICT;
   ```
3. If deleting source, first archive and delete old logs
4. Update deletion API to handle RESTRICT error
5. Document deletion procedures

**Possible traps and issues**:
- Existing schema already has ON DELETE SET NULL - migration needed
- User tries to delete source with logs - must handle RESTRICT error
- Cascading deletes can cause unexpected data loss
- Historical logs might be valuable for audit

**Docs changes needed**:
- Add data retention policy to `Docs/Features.md`
- Document deletion constraints in API docs
- Add to admin guide

**Doc references**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "1.3 Foreign Key ON DELETE"
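
One caveat on the SQL in Issue #11, assuming the database is SQLite (as the `bangui.db-wal` and `bangui.db-shm` files in the first hunk header suggest): SQLite has no `ALTER TABLE ... DROP CONSTRAINT` or `ADD CONSTRAINT`, so changing the `ON DELETE` action means rebuilding the table. The sketch below shows that rebuild under stated assumptions. The column list and the `import_log_new` name are hypothetical, and this is not necessarily how migration 9 is implemented.

```python
import sqlite3

# Hypothetical pre-migration schema with the old ON DELETE SET NULL action.
OLD_SCHEMA = """
CREATE TABLE blocklist_sources (id INTEGER PRIMARY KEY, url TEXT NOT NULL);
CREATE TABLE import_log (
    id INTEGER PRIMARY KEY,
    source_id INTEGER REFERENCES blocklist_sources(id) ON DELETE SET NULL,
    source_url TEXT
);
INSERT INTO blocklist_sources (id, url) VALUES (1, 'https://example.com/list.txt');
INSERT INTO import_log (source_id, source_url) VALUES (1, 'https://example.com/list.txt');
"""

# Rebuild: create the new table, copy the rows, then swap the names.
# Foreign key enforcement is switched off outside the transaction,
# because the pragma has no effect while a transaction is open.
REBUILD = """
PRAGMA foreign_keys = OFF;
BEGIN;
CREATE TABLE import_log_new (
    id INTEGER PRIMARY KEY,
    source_id INTEGER REFERENCES blocklist_sources(id) ON DELETE RESTRICT,
    source_url TEXT
);
INSERT INTO import_log_new (id, source_id, source_url)
    SELECT id, source_id, source_url FROM import_log;
DROP TABLE import_log;
ALTER TABLE import_log_new RENAME TO import_log;
COMMIT;
PRAGMA foreign_keys = ON;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(OLD_SCHEMA)
conn.executescript(REBUILD)

# Column 6 of PRAGMA foreign_key_list is the ON DELETE action.
on_delete = conn.execute("PRAGMA foreign_key_list(import_log)").fetchone()[6]
row_count = conn.execute("SELECT COUNT(*) FROM import_log").fetchone()[0]
```

Rows already orphaned by the old `SET NULL` behavior survive the copy, since a NULL `source_id` does not violate the new constraint.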

---

### Issue #12: HIGH - Race Condition in Concurrent Writes (Import Runs Duplication)

**Where found**: