feat(geo): add cache hit/miss metrics and prewarm support

- Add _hits/_misses counters to GeoCache for cache hit/miss ratio tracking
- Reset counters on clear()
- Count hits before misses in lookup_batch() to avoid interleaving
- Add synchronous prewarm() using asyncio.create_task for fire-and-forget
- Add hits/misses fields to GeoCacheStatsResponse model
- Add TestCacheMetrics (5 tests), TestPrewarm (3 tests), TestLargeBanList (2 tests)
- Fix _make_async_db() mock: db.execute is not async, returns ctx manager
- Move collections.abc to TYPE_CHECKING block (TC003)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-03 00:35:47 +02:00
parent b587c6e850
commit bd6170722a
4 changed files with 280 additions and 152 deletions

@@ -1,152 +1,3 @@
### Issue #12: HIGH - Race Condition in Concurrent Writes (Import Runs Duplication)
**Where found**:
- `backend/app/repositories/import_run_repo.py` (lines 89-100)
- `create_or_update()` not atomic
- Check then insert pattern (TOCTOU)
**Why this is needed**:
Two concurrent imports of the same source can create duplicate rows instead of updating the existing one.
**Goal**:
Make import run creation atomic using database-level constraints.
**What to do**:
1. Replace check-then-insert with INSERT ON CONFLICT:
```python
await self.db.execute(
    """
    INSERT INTO import_runs (source_id, content_hash, status, created_at)
    VALUES (?, ?, 'pending', CURRENT_TIMESTAMP)
    ON CONFLICT(source_id, content_hash) DO UPDATE SET
        status = 'pending',
        updated_at = CURRENT_TIMESTAMP
    """,
    # Bind parameters as one sequence (DB-API style), not as extra positional args
    (source_id, content_hash),
)
```
2. Ensure UNIQUE(source_id, content_hash) constraint exists
3. Test concurrent import scenario
4. Handle conflict resolution properly
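
The steps above can be tried in isolation with the standard-library `sqlite3` module. This is a minimal sketch, not the repository's actual code: the table layout, the synchronous connection, and the `create_or_update()` helper are assumptions made for illustration, and it needs SQLite 3.24 or newer for `ON CONFLICT ... DO UPDATE`.

```python
import sqlite3

# Hypothetical schema for illustration; the real import_runs table may differ.
SCHEMA = """
CREATE TABLE import_runs (
    id           INTEGER PRIMARY KEY,
    source_id    INTEGER NOT NULL,
    content_hash TEXT    NOT NULL,
    status       TEXT    NOT NULL,
    created_at   TEXT    NOT NULL,
    updated_at   TEXT,
    UNIQUE (source_id, content_hash)  -- the constraint ON CONFLICT relies on
)
"""

UPSERT = """
INSERT INTO import_runs (source_id, content_hash, status, created_at)
VALUES (?, ?, 'pending', CURRENT_TIMESTAMP)
ON CONFLICT(source_id, content_hash) DO UPDATE SET
    status = 'pending',
    updated_at = CURRENT_TIMESTAMP
"""


def create_or_update(conn: sqlite3.Connection, source_id: int, content_hash: str) -> None:
    """Insert a run, or refresh the existing one, in a single statement."""
    conn.execute(UPSERT, (source_id, content_hash))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Importing the same source/content twice must not create a second row.
create_or_update(conn, 1, "abc123")
create_or_update(conn, 1, "abc123")

row_count = conn.execute("SELECT COUNT(*) FROM import_runs").fetchone()[0]
updated_at = conn.execute("SELECT updated_at FROM import_runs").fetchone()[0]
```

Because the existence check and the write happen in one statement, there is no gap between them for a second writer to slip into. The sketch only exercises sequential calls, so a real concurrency test with multiple connections is still needed.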
**Possible traps and issues**:
- ON CONFLICT syntax varies by database (SQLite vs PostgreSQL)
- Concurrent inserts might still have race windows
- Error handling for constraint violations
**Docs changes needed**:
- Add concurrency guidelines to development docs
- Document data consistency model
**Doc references**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "10.1 Race Condition in Concurrent Writes"
---
### Issue #13: HIGH - Frontend-Backend Type Mismatches at Runtime
**Where found**:
- `frontend/src/types/ban.ts` expects `country_code: string | null`
- `backend/app/models/ban.py` could return empty string `""`
- Frontend type narrowing: `if (ban.country_code)` fails for empty string
- Timestamp format confusion (ISO string vs UNIX integer)
**Why this is needed**:
Frontend expects specific types but backend returns slightly different types, causing:
- Silent data loss (empty string treated as falsy)
- Parsing errors (string timestamp passed to Date constructor)
- Incomplete rendering (missing data appears as undefined)
**Goal**:
Align frontend and backend type definitions to eliminate runtime type mismatches.
**What to do**:
1. Add validation in backend to ensure types match frontend expectations:
```python
from pydantic import BaseModel, field_validator


class BanResponse(BaseModel):
    country_code: str | None = None

    @field_validator("country_code")
    @classmethod
    def validate_country_code(cls, v: str | None) -> str | None:
        # Never an empty string: must be None or a 2-char uppercase code
        if v is not None and (len(v) != 2 or not v.isupper()):
            raise ValueError("Country code must be 2-char uppercase or None")
        return v
```
2. Standardize timestamp format (use UNIX epoch everywhere)
3. Update frontend types to match backend validation
4. Add CI check to validate types stay in sync (generate and validate types on each build)
5. Write tests for edge cases (empty results, null fields, zero values)
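
Steps 1 and 2 could be backed by two small normalization helpers applied before a response is built. The sketch below uses only the standard library. The function names `normalize_country_code()` and `to_epoch_seconds()` are hypothetical, and treating naive datetimes as UTC is an assumption that should be checked against how the backend stores timestamps.

```python
from __future__ import annotations

from datetime import datetime, timezone


def normalize_country_code(raw: str | None) -> str | None:
    """Map missing or empty values to None; otherwise require a 2-letter uppercase code."""
    if raw is None or raw.strip() == "":
        return None
    code = raw.strip()
    if len(code) != 2 or not code.isalpha() or not code.isupper():
        raise ValueError(f"Invalid country code: {raw!r}")
    return code


def to_epoch_seconds(value: int | float | str | datetime) -> int:
    """Convert an epoch number, ISO 8601 string, or datetime to UNIX epoch seconds."""
    if isinstance(value, bool):
        # bool is a subclass of int; reject it explicitly
        raise TypeError("bool is not a valid timestamp")
    if isinstance(value, (int, float)):
        return int(value)
    if isinstance(value, str):
        value = datetime.fromisoformat(value)
    if value.tzinfo is None:
        # Assumption: naive timestamps are stored in UTC
        value = value.replace(tzinfo=timezone.utc)
    return int(value.timestamp())
```

With helpers like these, the frontend never sees `""` for a missing country and always receives an integer timestamp, so `if (ban.country_code)` and `new Date(ts * 1000)` behave predictably.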
**Possible traps and issues**:
- Frontend code assumes old types - breaking change
- Type generation script might silently fail
- Null vs empty string distinction not enforced
- Serialization/deserialization edge cases
**Docs changes needed**:
- Create `Docs/TYPE_SAFETY.md` explaining shared type system
- Add type constraints to the API documentation
- Document type generation process in development guide
**Doc references**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "4.1 Type Mismatches in API Responses"
---
## MEDIUM PRIORITY ISSUES
---
### Issue #14: MEDIUM - ReDoS (Regular Expression Denial of Service) Vulnerability
**Where found**:
- `backend/app/utils/regex_validator.py` (lines 71+)
- Pattern validation uses timeout but doesn't detect catastrophic backtracking patterns
**Why this is needed**:
Regex patterns like `(x+)+y` trigger catastrophic backtracking that the existing timeout check does not catch, which allows DoS attacks via filter configuration.
**Goal**:
Detect known ReDoS patterns before compiling them.
**What to do**:
1. Add regex pattern analysis library:
```bash
pip install regexploit
```
2. Update validator:
```python
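import re

# NOTE: `analyze`, `has_redos`, and `reason` are placeholder names in this plan;
# confirm the actual regexploit API before relying on them.
from regexploit import analyze


def validate_regex(pattern: str) -> None:
    # Check for ReDoS patterns
    analysis = analyze(pattern)
    if analysis.has_redos:
        raise ValueError(f"ReDoS pattern detected: {analysis.reason}")
    # Also confirm the pattern compiles. The stdlib `re.compile()` has no
    # `timeout` argument, so a time limit must be enforced separately.
    try:
        re.compile(pattern)
    except re.error as exc:
        raise ValueError(f"Invalid regex: {exc}") from exc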
```
3. Test against known ReDoS patterns
4. Add validation to filter/action config endpoints
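
As a dependency-free complement to step 2, nested unbounded quantifiers such as `(x+)+` can be flagged by walking the parse tree that CPython builds for a pattern. This is a rough heuristic sketch, not a replacement for a dedicated analyzer. It relies on CPython's undocumented internal parser modules (`re._parser` on 3.11+, `sre_parse` before that), which may change between versions. The function name `has_nested_unbounded_repeat()` is hypothetical, and the check ignores lookarounds, atomic groups, and possessive quantifiers.

```python
from __future__ import annotations

try:  # CPython 3.11+ keeps the parser inside the re package
    import re._constants as sre_constants
    import re._parser as sre_parse
except ImportError:  # older CPython versions
    import sre_constants
    import sre_parse

_REPEAT_OPS = {sre_constants.MAX_REPEAT, sre_constants.MIN_REPEAT}


def has_nested_unbounded_repeat(pattern: str) -> bool:
    """Return True if an unbounded quantifier (*, +, {n,}) encloses another one."""

    def walk(subpattern, inside_unbounded: bool) -> bool:
        for op, arg in subpattern:
            if op in _REPEAT_OPS:
                _min_count, max_count, body = arg
                unbounded = max_count == sre_constants.MAXREPEAT
                if unbounded and inside_unbounded:
                    return True
                if walk(body, inside_unbounded or unbounded):
                    return True
            elif op is sre_constants.SUBPATTERN:
                # The last element of arg is the group's body
                if walk(arg[-1], inside_unbounded):
                    return True
            elif op is sre_constants.BRANCH:
                # arg is (None, [alternative, ...])
                if any(walk(alt, inside_unbounded) for alt in arg[1]):
                    return True
        return False

    return walk(sre_parse.parse(pattern), False)
```

The heuristic is deliberately conservative: it also flags some patterns that are safe in practice (for example `(a+b)+`), so rejected patterns may need a manual override or an allow-list.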
**Possible traps and issues**:
- `regexploit` library might have false positives/negatives
- Some legitimate complex patterns might be rejected
- Performance cost of analysis on every pattern
- Library might not support all regex flavors
**Docs changes needed**:
- Add regex safety guidelines to config docs
- Document rejected pattern examples
- Add to `TROUBLESHOOTING.md` - "Regex pattern rejected"
**Doc references**:
- DETAILED_FINDINGS.md - Issue #6 "ReDoS Vulnerability"
---
### Issue #15: MEDIUM - N+1 Query Pattern in Geo Lookups
**Where found**: