## [CRITICAL] Missing security headers (CSP, X-Frame-Options, etc.) **Where found** - Backend does not set `Content-Security-Policy`, `X-Frame-Options`, `X-Content-Type-Options` headers - Frontend HTML served without CSP meta tags **Why this is needed** Without security headers, browsers won't protect against XSS, clickjacking, MIME-sniffing, referrer leakage attacks. **Goal** Add security headers to all HTTP responses. **What to do** 1. Add security headers middleware to `backend/app/main.py`: ```python @app.middleware("http") async def add_security_headers(request, call_next): response = await call_next(request) response.headers["Content-Security-Policy"] = "default-src 'self'" response.headers["X-Frame-Options"] = "DENY" response.headers["X-Content-Type-Options"] = "nosniff" return response ``` 2. In frontend `index.html`, add CSP meta tag 3. Test with browser DevTools Security tab **Possible traps and issues** - CSP `'unsafe-inline'` defeats security — avoid if possible - CDN resources may need explicit allowlist - Too restrictive CSP breaks functionality; too loose defeats security **Docs changes needed** - Add section in `Docs/Security.md` § HTTP Security Headers **Doc references** - `Docs/Security.md` (security headers) --- ## [CRITICAL] Background tasks lack timeout protection **Where found** - `backend/app/tasks/blocklist_import.py` — no timeout - `backend/app/tasks/health_check.py` — no timeout - All task functions lack timeout wrapper **Why this is needed** If task hangs (API unreachable, network partition), task runs forever. Never completes → lock never released → duplicate work, resource exhaustion. **Goal** Ensure all background tasks complete within bounded time or fail gracefully. **What to do** 1. Wrap all task functions with `asyncio.wait_for(task, timeout)`: ```python await asyncio.wait_for(blocklist_service.import_all(...), timeout=300) ``` 2. Set appropriate timeouts per task: - Blocklist import: 300s (5 min) - Health probe: 10s - Geo cache flush: 60s 3. Log timeout events and trigger alerts **Possible traps and issues** - Timeout too short → legitimate tasks killed prematurely - Timeout too long → resource leak if many tasks hang - Killing task mid-operation may leave inconsistent state **Docs changes needed** - Add section in `Docs/Backend-Development.md` § Background Tasks **Doc references** - `Docs/Backend-Development.md` (background tasks) - `backend/app/tasks/` (task modules) --- ## [CRITICAL] Background tasks not idempotent **Where found** - `backend/app/tasks/blocklist_import.py` — bans applied without checking if already banned - `backend/app/tasks/geo_cache_flush.py` — cache entries written without transaction - Multi-step operations not wrapped in transaction **Why this is needed** If task crashes mid-execution, partial state remains. On retry: bans applied again → duplicates, cache entries written twice → corruption. **Goal** Make all background tasks idempotent — retrying produces same result as running once. **What to do** 1. Use operation IDs to deduplicate: ```python operation_id = f"import_{source.id}_{datetime.now().date().isoformat()}" if await import_log_repo.get_by_operation_id(operation_id): return # Already done ``` 2. Use transactions for multi-step operations 3. Store operation state before execution **Possible traps and issues** - Idempotency keys must be unique but deterministic - Transactions require database support - State machine (pending → completed/failed) must be enforced **Docs changes needed** - Update `Docs/Backend-Development.md` § Task Idempotency **Doc references** - `Docs/Backend-Development.md` (task design) - `backend/app/tasks/` (task implementations) --- ## [CRITICAL] Health check endpoint returns wrong status code **Where found** - `backend/app/routers/health.py` — always returns 200, even when fail2ban offline **Why this is needed** Docker health checks interpret 200 as "healthy". If fail2ban offline but backend returns 200, Docker thinks container healthy and doesn't restart it. **Goal** Return 503 Service Unavailable when fail2ban is offline. **What to do** 1. Change health endpoint to return 503 when offline: ```python if not server_status.online: return JSONResponse( status_code=503, content={"status": "unavailable", "fail2ban": "offline"} ) ``` 2. Update Docker health check to expect 503 as "unhealthy" **Possible traps and issues** - Returning 503 causes orchestration tools to restart container - If fail2ban restarts frequently, health check becomes flaky - Consider gradual degradation **Docs changes needed** - Update `Docker/Dockerfile.backend` health check documentation - Update `Docs/Deployment.md` § Health Checks **Doc references** - `backend/app/routers/health.py` - `Docker/Dockerfile.backend` --- ## [IMPORTANT] Database transactions lack explicit isolation **Where found** - `backend/app/repositories/session_repo.py:40-60` — multiple queries without `BEGIN TRANSACTION` - Similar pattern in multi-step operations across repositories **Why this is needed** Without explicit boundaries, concurrent requests can race: Thread A checks if exists → not found, Thread B checks same → not found, Thread A inserts → succeeds, Thread B inserts → duplicate error or silent overwrite. **Goal** Wrap all multi-step operations in explicit transactions with appropriate isolation level. **What to do** 1. Use explicit `BEGIN IMMEDIATE` transaction: ```python await db.execute("BEGIN IMMEDIATE") try: await db.execute("INSERT INTO sessions ...") await db.commit() except Exception: await db.rollback() raise ``` 2. Use `IMMEDIATE` mode to lock immediately for writes 3. Document transaction boundaries clearly **Possible traps and issues** - Nested transactions (SAVEPOINTs) may be needed - Locks held too long cause contention - Deadlocks possible with concurrent writers **Docs changes needed** - Add section in `Docs/Backend-Development.md` § Database Transactions **Doc references** - `Docs/Backend-Development.md` (database design) --- ## [IMPORTANT] Scheduler lock race condition **Where found** - `backend/app/utils/scheduler_lock.py:56-58` — heartbeat interval 10 seconds **Why this is needed** Current design: Process A acquires lock, heartbeat misses, lock expires, Process B acquires lock, both running simultaneously → duplicate work, data corruption. **Goal** Implement robust distributed locking that prevents concurrent execution. **What to do** **Option A (Strengthen heartbeat):** - Reduce interval to 5s (half of timeout) - Use database advisory locks - Monitor heartbeat failures **Option B (Migrate to Redis):** - Use `redlock-py` or `aioredis` - Simpler, more reliable than database-backed **Current code improvements:** - Log when heartbeat fails - Add metric for lock contention - Test multi-process scenario **Possible traps and issues** - Database locks don't scale under high contention - Redis adds new dependency - Clock skew breaks timestamp-based expiry **Docs changes needed** - Update `Docs/Deployment.md` § Scheduler Lock - Add troubleshooting: "Blocklist import runs twice" **Doc references** - `Docs/Deployment.md` (scheduler) - `backend/app/utils/scheduler_lock.py` (lock implementation) --- ## [IMPORTANT] API pagination doesn't return metadata **Where found** - `backend/app/routers/history.py` — returns bare list, no pagination metadata - All paginated routers have same issue **Why this is needed** Frontend receives bare list, cannot determine: total results, whether more pages exist, last page number. Must guess or re-query. **Goal** Return pagination metadata with every paginated response. **What to do** 1. Create response wrapper: ```python class PaginatedResponse(BaseModel): data: list[Item] pagination: PaginationMetadata ``` 2. Update all paginated routers to return this wrapper 3. Update frontend to use metadata for UI **Possible traps and issues** - `SELECT COUNT(*)` is slow on large tables - Response shape change — old frontend may not handle **Docs changes needed** - Update API documentation § Pagination **Doc references** - `backend/app/utils/pagination.py` --- ## [IMPORTANT] Error response schema inconsistent **Where found** - Different handlers return different response shapes - Fail2Ban errors: `{ "error_code": "...", "detail": "..." }` - Validation errors: `{ "detail": [...] }` - Not found errors: `{ "detail": "...", "error_code": "..." }` **Why this is needed** Frontend must normalize multiple shapes, making error handling fragile and error-prone. **Goal** Unify all error responses to single schema. **What to do** 1. Define canonical error response: ```python class ErrorResponse(BaseModel): error_code: str message: str status: int details: dict | None = None ``` 2. Update all handlers to return this format 3. Update frontend to expect unified schema **Possible traps and issues** - Backward compatibility with old clients - FastAPI's built-in handlers may override custom - Rich detail structures need accommodation **Docs changes needed** - Update API documentation with unified error schema - Add error code reference table **Doc references** - `Docs/API.md` (error codes) - `backend/app/main.py` (exception handlers) --- ## [IMPORTANT] Provider ordering fragility (Frontend) **Where found** - `frontend/src/App.tsx` — 10-level deep provider nesting - `frontend/src/providers/PROVIDER_ORDER.md` — documents order, no compile-time enforcement **Why this is needed** Provider order (ThemeProvider → AppContents → FluentProvider → ...) enforced only at runtime. Accidental reorder caught only after deploy. **Goal** Add compile-time validation of provider ordering. **What to do** 1. Create provider composition utility enforcing order 2. Use TypeScript discriminated unions 3. Add ESLint rule to check provider wrapping **Possible traps and issues** - TypeScript doesn't easily enforce ordering - May be overkill — improve runtime error messages instead **Docs changes needed** - Update `Docs/Architekture.md` § 3.2 (Providers) **Doc references** - `Docs/Architekture.md` § 3.2 (Providers) - `frontend/src/providers/PROVIDER_ORDER.md` --- ## [IMPORTANT] Promise cancellation not checked in .then()/.catch() chains **Where found** - `frontend/src/components/blocklist/BlocklistSourcesSection.tsx:84-88` - `frontend/src/components/blocklist/BlocklistScheduleSection.tsx:49-58` - Multiple components use this pattern **Why this is needed** When user navigates away, `.then()` chains don't check if cancelled. State updated on unmounted component → React warnings, memory leak, notification shows wrong context. **Goal** Check for cancellation in all `.then()/.catch()` chains. **What to do** 1. Replace `.then()/.catch()` with `async/await` and cancellation check 2. Or use wrapper hook to hide logic **Possible traps and issues** - Checking `signal.aborted` after `await` introduces race conditions - Better: let AbortError propagate, catch it in catch block **Docs changes needed** - Update `Docs/Web-Development.md` § Async Patterns **Doc references** - `Docs/Web-Development.md` (async patterns) --- ## [MEDIUM] Inefficient database pagination uses OFFSET **Where found** - `backend/app/utils/pagination.py` — uses `OFFSET (page-1) * page_size` **Why this is needed** OFFSET scans and discards N rows to fetch N+limit. Last page on 10M row table: 15 seconds ⚠️ **Goal** Implement keyset pagination (cursor-based) for large result sets. **What to do** 1. **Short-term:** Add database indexes on sort columns 2. **Long-term:** Implement cursor-based pagination using WHERE instead of OFFSET 3. Frontend sends cursor (last row ID) instead of page number **Possible traps and issues** - Cursor must be deterministic - API contract changes - Cursor format must be opaque to client **Docs changes needed** - Update `Docs/Backend-Development.md` § Database Performance **Doc references** - `Docs/Backend-Development.md` (database performance) --- ## [MEDIUM] Session secret rotation not implemented **Where found** - `backend/app/config.py` — single `session_secret` with no rotation support **Why this is needed** If secret leaks, all sessions compromised. No way to invalidate old sessions. **Goal** Support gradual secret rotation without forcing logout. **What to do** 1. Store multiple secrets: current and previous 2. Accept tokens signed with either key 3. Re-sign tokens with current secret on validation **Possible traps and issues** - Rotation strategy must be documented - Metrics needed to track secret usage **Docs changes needed** - Update `Docs/Backend-Development.md` § Session Management **Doc references** - `Docs/Backend-Development.md` --- ## [MEDIUM] No CORS configuration **Where found** - `backend/app/main.py` — no CORS middleware added **Why this is needed** If frontend on different origin, cross-origin requests blocked without CORS configuration. **Goal** Add CORS middleware with proper origin whitelisting. **What to do** 1. Add CORS middleware with specific origin whitelist 2. Make configurable via environment variable 3. Default to localhost for development **Possible traps and issues** - `allow_origins=["*"]` defeats CORS security - Credentials require specific origins, not wildcard - Missing config silently fails in browser **Docs changes needed** - Update `Docs/Deployment.md` § CORS Configuration **Doc references** - `Docs/Deployment.md` --- ## [MEDIUM] Input validation missing for regex patterns (ReDoS) **Where found** - `backend/app/routers/config.py` — regex validation accepts arbitrary patterns without timeout **Why this is needed** Malicious regex causes catastrophic backtracking (ReDoS). Attacker sends pattern → compilation hangs → DoS. **Goal** Add timeout and complexity limits to regex validation. **What to do** 1. Add timeout to regex compilation (2 seconds recommended) 2. Add length limit (reject patterns > 1000 characters) 3. Use `signal.alarm()` (Unix) or timeout library **Possible traps and issues** - `signal.alarm()` Unix-only - Some valid complex regexes may timeout - Frontend should also validate (defense in depth) **Docs changes needed** - Update API docs to document regex validation limits **Doc references** - `backend/app/routers/config.py` --- ## [MEDIUM] No structured logging to external system **Where found** - Logs only go to stdout/file, no external aggregation **Why this is needed** Can't search across instances, historical logs lost on instance recycle. **Goal** Ship logs to centralized logging platform. **What to do** 1. **Short-term:** Ensure `structlog` JSON output is valid (already done) 2. **Long-term:** Ship to logging platform (ELK, Datadog, Papertrail) **Possible traps and issues** - External logging adds latency - Sensitive data must not be logged - Log volume can be massive **Docs changes needed** - Add `Docs/Observability.md` section on logging **Doc references** - `Docs/Observability.md` (new) --- ## [MEDIUM] No Application Performance Monitoring (APM) **Where found** - Backend: no metrics collection, latency tracking - Frontend: no error tracking, performance metrics - No observability into request performance **Why this is needed** Without metrics, blind in production: API slow? Unknown. Which endpoints fail most? Unknown. **Goal** Add comprehensive metrics collection and monitoring. **What to do** 1. **Backend metrics:** - Add Prometheus metrics: request count, latency, active requests - Expose `/metrics` endpoint 2. **Frontend metrics:** - Page load time, FCP, LCP using `web-vitals` - API error rates and latencies 3. **Aggregation:** - Prometheus + Grafana, or Datadog/NewRelic **Possible traps and issues** - Metrics collection has performance cost - Cardinality explosion with tags - PII in metrics **Docs changes needed** - Add `Docs/Observability.md` **Doc references** - `Docs/Observability.md` (new) --- ## [LOW] Frontend charts not memoized **Where found** - `frontend/src/components/TopCountriesPieChart.tsx` - `frontend/src/components/TopCountriesBarChart.tsx` **Why this is needed** Charts re-render on every parent update, Recharts reprocesses 5000+ points. **Goal** Memoize chart components. **What to do** 1. Wrap with `React.memo` with custom comparison 2. Ensure data objects are stable **Possible traps and issues** - Shallow comparison might not be enough - Memoization has memory cost **Docs changes needed** - No documentation changes **Doc references** - `frontend/src/components/TopCountriesChart.tsx` --- ## [LOW] No request deduplication on frontend **Where found** - `frontend/src/hooks/useFetchData.ts` — each call launches new request - User clicks "Refresh" twice → two identical requests **Why this is needed** Duplicates waste bandwidth, cause race conditions (response 2 arrives first, then response 1 overwrites with stale data). **Goal** Deduplicate identical in-flight requests. **What to do** 1. Implement request cache 2. Clear cache entry when response received 3. Use in `useFetchData` **Possible traps and issues** - Cache must be cleared on data mutation - Stale data in cache possible if not careful **Docs changes needed** - No documentation changes **Doc references** - `frontend/src/hooks/useFetchData.ts`