- Add SecurityHeadersMiddleware to backend/app/main.py - Implements Content-Security-Policy: default-src 'self' - Implements X-Frame-Options: DENY (clickjacking protection) - Implements X-Content-Type-Options: nosniff (MIME-sniffing protection) - Implements X-XSS-Protection: 1; mode=block (browser XSS filters) - Add CSP meta tag to frontend/index.html for defense-in-depth - Create Docs/Security.md with comprehensive security headers documentation - Add test suite (backend/tests/test_security_headers_middleware.py) with 5 tests - Tests verify headers are present on success and error responses - Tests ensure all four security headers are correctly set - All existing tests continue to pass Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
734 lines
17 KiB
Markdown
734 lines
17 KiB
Markdown
## [CRITICAL] Missing security headers (CSP, X-Frame-Options, etc.)
|
|
|
|
**Where found**
|
|
|
|
- Backend does not set `Content-Security-Policy`, `X-Frame-Options`, `X-Content-Type-Options` headers
|
|
- Frontend HTML served without CSP meta tags
|
|
|
|
**Why this is needed**
|
|
|
|
Without security headers, browsers won't protect against XSS, clickjacking, MIME-sniffing, referrer leakage attacks.
|
|
|
|
**Goal**
|
|
|
|
Add security headers to all HTTP responses.
|
|
|
|
**What to do**
|
|
|
|
1. Add security headers middleware to `backend/app/main.py`:
|
|
```python
|
|
@app.middleware("http")
|
|
async def add_security_headers(request, call_next):
|
|
response = await call_next(request)
|
|
response.headers["Content-Security-Policy"] = "default-src 'self'"
|
|
response.headers["X-Frame-Options"] = "DENY"
|
|
response.headers["X-Content-Type-Options"] = "nosniff"
|
|
return response
|
|
```
|
|
|
|
2. In frontend `index.html`, add CSP meta tag
|
|
3. Test with browser DevTools Security tab
|
|
|
|
**Possible traps and issues**
|
|
|
|
- CSP `'unsafe-inline'` defeats security — avoid if possible
|
|
- CDN resources may need explicit allowlist
|
|
- Too restrictive CSP breaks functionality; too loose defeats security
|
|
|
|
**Docs changes needed**
|
|
|
|
- Add section in `Docs/Security.md` § HTTP Security Headers
|
|
|
|
**Doc references**
|
|
|
|
- `Docs/Security.md` (security headers)
|
|
|
|
---
|
|
|
|
## [CRITICAL] Background tasks lack timeout protection
|
|
|
|
**Where found**
|
|
|
|
- `backend/app/tasks/blocklist_import.py` — no timeout
|
|
- `backend/app/tasks/health_check.py` — no timeout
|
|
- All task functions lack timeout wrapper
|
|
|
|
**Why this is needed**
|
|
|
|
If task hangs (API unreachable, network partition), task runs forever. Never completes → lock never released → duplicate work, resource exhaustion.
|
|
|
|
**Goal**
|
|
|
|
Ensure all background tasks complete within bounded time or fail gracefully.
|
|
|
|
**What to do**
|
|
|
|
1. Wrap all task functions with `asyncio.wait_for(task, timeout)`:
|
|
```python
|
|
await asyncio.wait_for(blocklist_service.import_all(...), timeout=300)
|
|
```
|
|
|
|
2. Set appropriate timeouts per task:
|
|
- Blocklist import: 300s (5 min)
|
|
- Health probe: 10s
|
|
- Geo cache flush: 60s
|
|
|
|
3. Log timeout events and trigger alerts
|
|
|
|
**Possible traps and issues**
|
|
|
|
- Timeout too short → legitimate tasks killed prematurely
|
|
- Timeout too long → resource leak if many tasks hang
|
|
- Killing task mid-operation may leave inconsistent state
|
|
|
|
**Docs changes needed**
|
|
|
|
- Add section in `Docs/Backend-Development.md` § Background Tasks
|
|
|
|
**Doc references**
|
|
|
|
- `Docs/Backend-Development.md` (background tasks)
|
|
- `backend/app/tasks/` (task modules)
|
|
|
|
---
|
|
|
|
## [CRITICAL] Background tasks not idempotent
|
|
|
|
**Where found**
|
|
|
|
- `backend/app/tasks/blocklist_import.py` — bans applied without checking if already banned
|
|
- `backend/app/tasks/geo_cache_flush.py` — cache entries written without transaction
|
|
- Multi-step operations not wrapped in transaction
|
|
|
|
**Why this is needed**
|
|
|
|
If task crashes mid-execution, partial state remains. On retry: bans applied again → duplicates, cache entries written twice → corruption.
|
|
|
|
**Goal**
|
|
|
|
Make all background tasks idempotent — retrying produces same result as running once.
|
|
|
|
**What to do**
|
|
|
|
1. Use operation IDs to deduplicate:
|
|
```python
|
|
operation_id = f"import_{source.id}_{datetime.now().date().isoformat()}"
|
|
if await import_log_repo.get_by_operation_id(operation_id):
|
|
return # Already done
|
|
```
|
|
|
|
2. Use transactions for multi-step operations
|
|
3. Store operation state before execution
|
|
|
|
**Possible traps and issues**
|
|
|
|
- Idempotency keys must be unique but deterministic
|
|
- Transactions require database support
|
|
- State machine (pending → completed/failed) must be enforced
|
|
|
|
**Docs changes needed**
|
|
|
|
- Update `Docs/Backend-Development.md` § Task Idempotency
|
|
|
|
**Doc references**
|
|
|
|
- `Docs/Backend-Development.md` (task design)
|
|
- `backend/app/tasks/` (task implementations)
|
|
|
|
---
|
|
|
|
## [CRITICAL] Health check endpoint returns wrong status code
|
|
|
|
**Where found**
|
|
|
|
- `backend/app/routers/health.py` — always returns 200, even when fail2ban offline
|
|
|
|
**Why this is needed**
|
|
|
|
Docker health checks interpret 200 as "healthy". If fail2ban offline but backend returns 200, Docker thinks container healthy and doesn't restart it.
|
|
|
|
**Goal**
|
|
|
|
Return 503 Service Unavailable when fail2ban is offline.
|
|
|
|
**What to do**
|
|
|
|
1. Change health endpoint to return 503 when offline:
|
|
```python
|
|
if not server_status.online:
|
|
return JSONResponse(
|
|
status_code=503,
|
|
content={"status": "unavailable", "fail2ban": "offline"}
|
|
)
|
|
```
|
|
|
|
2. Update Docker health check to expect 503 as "unhealthy"
|
|
|
|
**Possible traps and issues**
|
|
|
|
- Returning 503 causes orchestration tools to restart container
|
|
- If fail2ban restarts frequently, health check becomes flaky
|
|
- Consider gradual degradation
|
|
|
|
**Docs changes needed**
|
|
|
|
- Update `Docker/Dockerfile.backend` health check documentation
|
|
- Update `Docs/Deployment.md` § Health Checks
|
|
|
|
**Doc references**
|
|
|
|
- `backend/app/routers/health.py`
|
|
- `Docker/Dockerfile.backend`
|
|
|
|
---
|
|
|
|
## [IMPORTANT] Database transactions lack explicit isolation
|
|
|
|
**Where found**
|
|
|
|
- `backend/app/repositories/session_repo.py:40-60` — multiple queries without `BEGIN TRANSACTION`
|
|
- Similar pattern in multi-step operations across repositories
|
|
|
|
**Why this is needed**
|
|
|
|
Without explicit boundaries, concurrent requests can race: Thread A checks if exists → not found, Thread B checks same → not found, Thread A inserts → succeeds, Thread B inserts → duplicate error or silent overwrite.
|
|
|
|
**Goal**
|
|
|
|
Wrap all multi-step operations in explicit transactions with appropriate isolation level.
|
|
|
|
**What to do**
|
|
|
|
1. Use explicit `BEGIN IMMEDIATE` transaction:
|
|
```python
|
|
await db.execute("BEGIN IMMEDIATE")
|
|
try:
|
|
await db.execute("INSERT INTO sessions ...")
|
|
await db.commit()
|
|
except Exception:
|
|
await db.rollback()
|
|
raise
|
|
```
|
|
|
|
2. Use `IMMEDIATE` mode to lock immediately for writes
|
|
3. Document transaction boundaries clearly
|
|
|
|
**Possible traps and issues**
|
|
|
|
- Nested transactions (SAVEPOINTs) may be needed
|
|
- Locks held too long cause contention
|
|
- Deadlocks possible with concurrent writers
|
|
|
|
**Docs changes needed**
|
|
|
|
- Add section in `Docs/Backend-Development.md` § Database Transactions
|
|
|
|
**Doc references**
|
|
|
|
- `Docs/Backend-Development.md` (database design)
|
|
|
|
---
|
|
|
|
## [IMPORTANT] Scheduler lock race condition
|
|
|
|
**Where found**
|
|
|
|
- `backend/app/utils/scheduler_lock.py:56-58` — heartbeat interval 10 seconds
|
|
|
|
**Why this is needed**
|
|
|
|
Current design: Process A acquires lock, heartbeat misses, lock expires, Process B acquires lock, both running simultaneously → duplicate work, data corruption.
|
|
|
|
**Goal**
|
|
|
|
Implement robust distributed locking that prevents concurrent execution.
|
|
|
|
**What to do**
|
|
|
|
**Option A (Strengthen heartbeat):**
|
|
- Reduce interval to 5s (half of timeout)
|
|
- Use database advisory locks
|
|
- Monitor heartbeat failures
|
|
|
|
**Option B (Migrate to Redis):**
|
|
- Use `redlock-py` or `aioredis`
|
|
- Simpler, more reliable than database-backed
|
|
|
|
**Current code improvements:**
|
|
- Log when heartbeat fails
|
|
- Add metric for lock contention
|
|
- Test multi-process scenario
|
|
|
|
**Possible traps and issues**
|
|
|
|
- Database locks don't scale under high contention
|
|
- Redis adds new dependency
|
|
- Clock skew breaks timestamp-based expiry
|
|
|
|
**Docs changes needed**
|
|
|
|
- Update `Docs/Deployment.md` § Scheduler Lock
|
|
- Add troubleshooting: "Blocklist import runs twice"
|
|
|
|
**Doc references**
|
|
|
|
- `Docs/Deployment.md` (scheduler)
|
|
- `backend/app/utils/scheduler_lock.py` (lock implementation)
|
|
|
|
---
|
|
|
|
## [IMPORTANT] API pagination doesn't return metadata
|
|
|
|
**Where found**
|
|
|
|
- `backend/app/routers/history.py` — returns bare list, no pagination metadata
|
|
- All paginated routers have same issue
|
|
|
|
**Why this is needed**
|
|
|
|
Frontend receives bare list, cannot determine: total results, whether more pages exist, last page number. Must guess or re-query.
|
|
|
|
**Goal**
|
|
|
|
Return pagination metadata with every paginated response.
|
|
|
|
**What to do**
|
|
|
|
1. Create response wrapper:
|
|
```python
|
|
class PaginatedResponse(BaseModel):
|
|
data: list[Item]
|
|
pagination: PaginationMetadata
|
|
```
|
|
|
|
2. Update all paginated routers to return this wrapper
|
|
3. Update frontend to use metadata for UI
|
|
|
|
**Possible traps and issues**
|
|
|
|
- `SELECT COUNT(*)` is slow on large tables
|
|
- Response shape change — old frontend may not handle
|
|
|
|
**Docs changes needed**
|
|
|
|
- Update API documentation § Pagination
|
|
|
|
**Doc references**
|
|
|
|
- `backend/app/utils/pagination.py`
|
|
|
|
---
|
|
|
|
## [IMPORTANT] Error response schema inconsistent
|
|
|
|
**Where found**
|
|
|
|
- Different handlers return different response shapes
|
|
- Fail2Ban errors: `{ "error_code": "...", "detail": "..." }`
|
|
- Validation errors: `{ "detail": [...] }`
|
|
- Not found errors: `{ "detail": "...", "error_code": "..." }`
|
|
|
|
**Why this is needed**
|
|
|
|
Frontend must normalize multiple shapes, making error handling fragile and error-prone.
|
|
|
|
**Goal**
|
|
|
|
Unify all error responses to single schema.
|
|
|
|
**What to do**
|
|
|
|
1. Define canonical error response:
|
|
```python
|
|
class ErrorResponse(BaseModel):
|
|
error_code: str
|
|
message: str
|
|
status: int
|
|
details: dict | None = None
|
|
```
|
|
|
|
2. Update all handlers to return this format
|
|
3. Update frontend to expect unified schema
|
|
|
|
**Possible traps and issues**
|
|
|
|
- Backward compatibility with old clients
|
|
- FastAPI's built-in handlers may override custom
|
|
- Rich detail structures need accommodation
|
|
|
|
**Docs changes needed**
|
|
|
|
- Update API documentation with unified error schema
|
|
- Add error code reference table
|
|
|
|
**Doc references**
|
|
|
|
- `Docs/API.md` (error codes)
|
|
- `backend/app/main.py` (exception handlers)
|
|
|
|
---
|
|
|
|
## [IMPORTANT] Provider ordering fragility (Frontend)
|
|
|
|
**Where found**
|
|
|
|
- `frontend/src/App.tsx` — 10-level deep provider nesting
|
|
- `frontend/src/providers/PROVIDER_ORDER.md` — documents order, no compile-time enforcement
|
|
|
|
**Why this is needed**
|
|
|
|
Provider order (ThemeProvider → AppContents → FluentProvider → ...) enforced only at runtime. Accidental reorder caught only after deploy.
|
|
|
|
**Goal**
|
|
|
|
Add compile-time validation of provider ordering.
|
|
|
|
**What to do**
|
|
|
|
1. Create provider composition utility enforcing order
|
|
2. Use TypeScript discriminated unions
|
|
3. Add ESLint rule to check provider wrapping
|
|
|
|
**Possible traps and issues**
|
|
|
|
- TypeScript doesn't easily enforce ordering
|
|
- May be overkill — improve runtime error messages instead
|
|
|
|
**Docs changes needed**
|
|
|
|
- Update `Docs/Architekture.md` § 3.2 (Providers)
|
|
|
|
**Doc references**
|
|
|
|
- `Docs/Architekture.md` § 3.2 (Providers)
|
|
- `frontend/src/providers/PROVIDER_ORDER.md`
|
|
|
|
---
|
|
|
|
## [IMPORTANT] Promise cancellation not checked in .then()/.catch() chains
|
|
|
|
**Where found**
|
|
|
|
- `frontend/src/components/blocklist/BlocklistSourcesSection.tsx:84-88`
|
|
- `frontend/src/components/blocklist/BlocklistScheduleSection.tsx:49-58`
|
|
- Multiple components use this pattern
|
|
|
|
**Why this is needed**
|
|
|
|
When user navigates away, `.then()` chains don't check if cancelled. State updated on unmounted component → React warnings, memory leak, notification shows wrong context.
|
|
|
|
**Goal**
|
|
|
|
Check for cancellation in all `.then()/.catch()` chains.
|
|
|
|
**What to do**
|
|
|
|
1. Replace `.then()/.catch()` with `async/await` and cancellation check
|
|
2. Or use wrapper hook to hide logic
|
|
|
|
**Possible traps and issues**
|
|
|
|
- Checking `signal.aborted` after `await` introduces race conditions
|
|
- Better: let AbortError propagate, catch it in catch block
|
|
|
|
**Docs changes needed**
|
|
|
|
- Update `Docs/Web-Development.md` § Async Patterns
|
|
|
|
**Doc references**
|
|
|
|
- `Docs/Web-Development.md` (async patterns)
|
|
|
|
---
|
|
|
|
## [MEDIUM] Inefficient database pagination uses OFFSET
|
|
|
|
**Where found**
|
|
|
|
- `backend/app/utils/pagination.py` — uses `OFFSET (page-1) * page_size`
|
|
|
|
**Why this is needed**
|
|
|
|
OFFSET scans and discards N rows to fetch N+limit. Last page on 10M row table: 15 seconds ⚠️
|
|
|
|
**Goal**
|
|
|
|
Implement keyset pagination (cursor-based) for large result sets.
|
|
|
|
**What to do**
|
|
|
|
1. **Short-term:** Add database indexes on sort columns
|
|
2. **Long-term:** Implement cursor-based pagination using WHERE instead of OFFSET
|
|
3. Frontend sends cursor (last row ID) instead of page number
|
|
|
|
**Possible traps and issues**
|
|
|
|
- Cursor must be deterministic
|
|
- API contract changes
|
|
- Cursor format must be opaque to client
|
|
|
|
**Docs changes needed**
|
|
|
|
- Update `Docs/Backend-Development.md` § Database Performance
|
|
|
|
**Doc references**
|
|
|
|
- `Docs/Backend-Development.md` (database performance)
|
|
|
|
---
|
|
|
|
## [MEDIUM] Session secret rotation not implemented
|
|
|
|
**Where found**
|
|
|
|
- `backend/app/config.py` — single `session_secret` with no rotation support
|
|
|
|
**Why this is needed**
|
|
|
|
If secret leaks, all sessions compromised. No way to invalidate old sessions.
|
|
|
|
**Goal**
|
|
|
|
Support gradual secret rotation without forcing logout.
|
|
|
|
**What to do**
|
|
|
|
1. Store multiple secrets: current and previous
|
|
2. Accept tokens signed with either key
|
|
3. Re-sign tokens with current secret on validation
|
|
|
|
**Possible traps and issues**
|
|
|
|
- Rotation strategy must be documented
|
|
- Metrics needed to track secret usage
|
|
|
|
**Docs changes needed**
|
|
|
|
- Update `Docs/Backend-Development.md` § Session Management
|
|
|
|
**Doc references**
|
|
|
|
- `Docs/Backend-Development.md`
|
|
|
|
---
|
|
|
|
## [MEDIUM] No CORS configuration
|
|
|
|
**Where found**
|
|
|
|
- `backend/app/main.py` — no CORS middleware added
|
|
|
|
**Why this is needed**
|
|
|
|
If frontend on different origin, cross-origin requests blocked without CORS configuration.
|
|
|
|
**Goal**
|
|
|
|
Add CORS middleware with proper origin whitelisting.
|
|
|
|
**What to do**
|
|
|
|
1. Add CORS middleware with specific origin whitelist
|
|
2. Make configurable via environment variable
|
|
3. Default to localhost for development
|
|
|
|
**Possible traps and issues**
|
|
|
|
- `allow_origins=["*"]` defeats CORS security
|
|
- Credentials require specific origins, not wildcard
|
|
- Missing config silently fails in browser
|
|
|
|
**Docs changes needed**
|
|
|
|
- Update `Docs/Deployment.md` § CORS Configuration
|
|
|
|
**Doc references**
|
|
|
|
- `Docs/Deployment.md`
|
|
|
|
---
|
|
|
|
## [MEDIUM] Input validation missing for regex patterns (ReDoS)
|
|
|
|
**Where found**
|
|
|
|
- `backend/app/routers/config.py` — regex validation accepts arbitrary patterns without timeout
|
|
|
|
**Why this is needed**
|
|
|
|
Malicious regex causes catastrophic backtracking (ReDoS). Attacker sends pattern → compilation hangs → DoS.
|
|
|
|
**Goal**
|
|
|
|
Add timeout and complexity limits to regex validation.
|
|
|
|
**What to do**
|
|
|
|
1. Add timeout to regex compilation (2 seconds recommended)
|
|
2. Add length limit (reject patterns > 1000 characters)
|
|
3. Use `signal.alarm()` (Unix) or timeout library
|
|
|
|
**Possible traps and issues**
|
|
|
|
- `signal.alarm()` Unix-only
|
|
- Some valid complex regexes may timeout
|
|
- Frontend should also validate (defense in depth)
|
|
|
|
**Docs changes needed**
|
|
|
|
- Update API docs to document regex validation limits
|
|
|
|
**Doc references**
|
|
|
|
- `backend/app/routers/config.py`
|
|
|
|
---
|
|
|
|
## [MEDIUM] No structured logging to external system
|
|
|
|
**Where found**
|
|
|
|
- Logs only go to stdout/file, no external aggregation
|
|
|
|
**Why this is needed**
|
|
|
|
Can't search across instances, historical logs lost on instance recycle.
|
|
|
|
**Goal**
|
|
|
|
Ship logs to centralized logging platform.
|
|
|
|
**What to do**
|
|
|
|
1. **Short-term:** Ensure `structlog` JSON output is valid (already done)
|
|
2. **Long-term:** Ship to logging platform (ELK, Datadog, Papertrail)
|
|
|
|
**Possible traps and issues**
|
|
|
|
- External logging adds latency
|
|
- Sensitive data must not be logged
|
|
- Log volume can be massive
|
|
|
|
**Docs changes needed**
|
|
|
|
- Add `Docs/Observability.md` section on logging
|
|
|
|
**Doc references**
|
|
|
|
- `Docs/Observability.md` (new)
|
|
|
|
---
|
|
|
|
## [MEDIUM] No Application Performance Monitoring (APM)
|
|
|
|
**Where found**
|
|
|
|
- Backend: no metrics collection, latency tracking
|
|
- Frontend: no error tracking, performance metrics
|
|
- No observability into request performance
|
|
|
|
**Why this is needed**
|
|
|
|
Without metrics, blind in production: API slow? Unknown. Which endpoints fail most? Unknown.
|
|
|
|
**Goal**
|
|
|
|
Add comprehensive metrics collection and monitoring.
|
|
|
|
**What to do**
|
|
|
|
1. **Backend metrics:**
|
|
- Add Prometheus metrics: request count, latency, active requests
|
|
- Expose `/metrics` endpoint
|
|
|
|
2. **Frontend metrics:**
|
|
- Page load time, FCP, LCP using `web-vitals`
|
|
- API error rates and latencies
|
|
|
|
3. **Aggregation:**
|
|
- Prometheus + Grafana, or Datadog/NewRelic
|
|
|
|
**Possible traps and issues**
|
|
|
|
- Metrics collection has performance cost
|
|
- Cardinality explosion with tags
|
|
- PII in metrics
|
|
|
|
**Docs changes needed**
|
|
|
|
- Add `Docs/Observability.md`
|
|
|
|
**Doc references**
|
|
|
|
- `Docs/Observability.md` (new)
|
|
|
|
---
|
|
|
|
## [LOW] Frontend charts not memoized
|
|
|
|
**Where found**
|
|
|
|
- `frontend/src/components/TopCountriesPieChart.tsx`
|
|
- `frontend/src/components/TopCountriesBarChart.tsx`
|
|
|
|
**Why this is needed**
|
|
|
|
Charts re-render on every parent update, Recharts reprocesses 5000+ points.
|
|
|
|
**Goal**
|
|
|
|
Memoize chart components.
|
|
|
|
**What to do**
|
|
|
|
1. Wrap with `React.memo` with custom comparison
|
|
2. Ensure data objects are stable
|
|
|
|
**Possible traps and issues**
|
|
|
|
- Shallow comparison might not be enough
|
|
- Memoization has memory cost
|
|
|
|
**Docs changes needed**
|
|
|
|
- No documentation changes
|
|
|
|
**Doc references**
|
|
|
|
- `frontend/src/components/TopCountriesChart.tsx`
|
|
|
|
---
|
|
|
|
## [LOW] No request deduplication on frontend
|
|
|
|
**Where found**
|
|
|
|
- `frontend/src/hooks/useFetchData.ts` — each call launches new request
|
|
- User clicks "Refresh" twice → two identical requests
|
|
|
|
**Why this is needed**
|
|
|
|
Duplicates waste bandwidth, cause race conditions (response 2 arrives first, then response 1 overwrites with stale data).
|
|
|
|
**Goal**
|
|
|
|
Deduplicate identical in-flight requests.
|
|
|
|
**What to do**
|
|
|
|
1. Implement request cache
|
|
2. Clear cache entry when response received
|
|
3. Use in `useFetchData`
|
|
|
|
**Possible traps and issues**
|
|
|
|
- Cache must be cleared on data mutation
|
|
- Stale data in cache possible if not careful
|
|
|
|
**Docs changes needed**
|
|
|
|
- No documentation changes
|
|
|
|
**Doc references**
|
|
|
|
- `frontend/src/hooks/useFetchData.ts` |