Files
BanGUI/Docs/Tasks.md
Lukas 400ab1a3f1 Add security headers middleware and documentation
- Add SecurityHeadersMiddleware to backend/app/main.py
  - Implements Content-Security-Policy: default-src 'self'
  - Implements X-Frame-Options: DENY (clickjacking protection)
  - Implements X-Content-Type-Options: nosniff (MIME-sniffing protection)
  - Implements X-XSS-Protection: 1; mode=block (browser XSS filters)
- Add CSP meta tag to frontend/index.html for defense-in-depth
- Create Docs/Security.md with comprehensive security headers documentation
- Add test suite (backend/tests/test_security_headers_middleware.py) with 5 tests
  - Tests verify headers are present on success and error responses
  - Tests ensure all four security headers are correctly set
- All existing tests continue to pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 21:33:08 +02:00

734 lines
17 KiB
Markdown

## [CRITICAL] Missing security headers (CSP, X-Frame-Options, etc.)
**Where found**
- Backend does not set `Content-Security-Policy`, `X-Frame-Options`, `X-Content-Type-Options` headers
- Frontend HTML served without CSP meta tags
**Why this is needed**
Without security headers, browsers won't protect against XSS, clickjacking, MIME-sniffing, referrer leakage attacks.
**Goal**
Add security headers to all HTTP responses.
**What to do**
1. Add security headers middleware to `backend/app/main.py`:
```python
@app.middleware("http")
async def add_security_headers(request, call_next):
response = await call_next(request)
response.headers["Content-Security-Policy"] = "default-src 'self'"
response.headers["X-Frame-Options"] = "DENY"
response.headers["X-Content-Type-Options"] = "nosniff"
return response
```
2. In frontend `index.html`, add CSP meta tag
3. Test with browser DevTools Security tab
**Possible traps and issues**
- CSP `'unsafe-inline'` defeats security — avoid if possible
- CDN resources may need explicit allowlist
- Too restrictive CSP breaks functionality; too loose defeats security
**Docs changes needed**
- Add section in `Docs/Security.md` § HTTP Security Headers
**Doc references**
- `Docs/Security.md` (security headers)
---
## [CRITICAL] Background tasks lack timeout protection
**Where found**
- `backend/app/tasks/blocklist_import.py` — no timeout
- `backend/app/tasks/health_check.py` — no timeout
- All task functions lack timeout wrapper
**Why this is needed**
If task hangs (API unreachable, network partition), task runs forever. Never completes → lock never released → duplicate work, resource exhaustion.
**Goal**
Ensure all background tasks complete within bounded time or fail gracefully.
**What to do**
1. Wrap all task functions with `asyncio.wait_for(task, timeout)`:
```python
await asyncio.wait_for(blocklist_service.import_all(...), timeout=300)
```
2. Set appropriate timeouts per task:
- Blocklist import: 300s (5 min)
- Health probe: 10s
- Geo cache flush: 60s
3. Log timeout events and trigger alerts
**Possible traps and issues**
- Timeout too short → legitimate tasks killed prematurely
- Timeout too long → resource leak if many tasks hang
- Killing task mid-operation may leave inconsistent state
**Docs changes needed**
- Add section in `Docs/Backend-Development.md` § Background Tasks
**Doc references**
- `Docs/Backend-Development.md` (background tasks)
- `backend/app/tasks/` (task modules)
---
## [CRITICAL] Background tasks not idempotent
**Where found**
- `backend/app/tasks/blocklist_import.py` — bans applied without checking if already banned
- `backend/app/tasks/geo_cache_flush.py` — cache entries written without transaction
- Multi-step operations not wrapped in transaction
**Why this is needed**
If task crashes mid-execution, partial state remains. On retry: bans applied again → duplicates, cache entries written twice → corruption.
**Goal**
Make all background tasks idempotent — retrying produces same result as running once.
**What to do**
1. Use operation IDs to deduplicate:
```python
operation_id = f"import_{source.id}_{datetime.now().date().isoformat()}"
if await import_log_repo.get_by_operation_id(operation_id):
return # Already done
```
2. Use transactions for multi-step operations
3. Store operation state before execution
**Possible traps and issues**
- Idempotency keys must be unique but deterministic
- Transactions require database support
- State machine (pending → completed/failed) must be enforced
**Docs changes needed**
- Update `Docs/Backend-Development.md` § Task Idempotency
**Doc references**
- `Docs/Backend-Development.md` (task design)
- `backend/app/tasks/` (task implementations)
---
## [CRITICAL] Health check endpoint returns wrong status code
**Where found**
- `backend/app/routers/health.py` — always returns 200, even when fail2ban offline
**Why this is needed**
Docker health checks interpret 200 as "healthy". If fail2ban offline but backend returns 200, Docker thinks container healthy and doesn't restart it.
**Goal**
Return 503 Service Unavailable when fail2ban is offline.
**What to do**
1. Change health endpoint to return 503 when offline:
```python
if not server_status.online:
return JSONResponse(
status_code=503,
content={"status": "unavailable", "fail2ban": "offline"}
)
```
2. Update Docker health check to expect 503 as "unhealthy"
**Possible traps and issues**
- Returning 503 causes orchestration tools to restart container
- If fail2ban restarts frequently, health check becomes flaky
- Consider gradual degradation
**Docs changes needed**
- Update `Docker/Dockerfile.backend` health check documentation
- Update `Docs/Deployment.md` § Health Checks
**Doc references**
- `backend/app/routers/health.py`
- `Docker/Dockerfile.backend`
---
## [IMPORTANT] Database transactions lack explicit isolation
**Where found**
- `backend/app/repositories/session_repo.py:40-60` — multiple queries without `BEGIN TRANSACTION`
- Similar pattern in multi-step operations across repositories
**Why this is needed**
Without explicit boundaries, concurrent requests can race: Thread A checks if exists → not found, Thread B checks same → not found, Thread A inserts → succeeds, Thread B inserts → duplicate error or silent overwrite.
**Goal**
Wrap all multi-step operations in explicit transactions with appropriate isolation level.
**What to do**
1. Use explicit `BEGIN IMMEDIATE` transaction:
```python
await db.execute("BEGIN IMMEDIATE")
try:
await db.execute("INSERT INTO sessions ...")
await db.commit()
except Exception:
await db.rollback()
raise
```
2. Use `IMMEDIATE` mode to lock immediately for writes
3. Document transaction boundaries clearly
**Possible traps and issues**
- Nested transactions (SAVEPOINTs) may be needed
- Locks held too long cause contention
- Deadlocks possible with concurrent writers
**Docs changes needed**
- Add section in `Docs/Backend-Development.md` § Database Transactions
**Doc references**
- `Docs/Backend-Development.md` (database design)
---
## [IMPORTANT] Scheduler lock race condition
**Where found**
- `backend/app/utils/scheduler_lock.py:56-58` — heartbeat interval 10 seconds
**Why this is needed**
Current design: Process A acquires lock, heartbeat misses, lock expires, Process B acquires lock, both running simultaneously → duplicate work, data corruption.
**Goal**
Implement robust distributed locking that prevents concurrent execution.
**What to do**
**Option A (Strengthen heartbeat):**
- Reduce interval to 5s (half of timeout)
- Use database advisory locks
- Monitor heartbeat failures
**Option B (Migrate to Redis):**
- Use `redlock-py` or `aioredis`
- Simpler, more reliable than database-backed
**Current code improvements:**
- Log when heartbeat fails
- Add metric for lock contention
- Test multi-process scenario
**Possible traps and issues**
- Database locks don't scale under high contention
- Redis adds new dependency
- Clock skew breaks timestamp-based expiry
**Docs changes needed**
- Update `Docs/Deployment.md` § Scheduler Lock
- Add troubleshooting: "Blocklist import runs twice"
**Doc references**
- `Docs/Deployment.md` (scheduler)
- `backend/app/utils/scheduler_lock.py` (lock implementation)
---
## [IMPORTANT] API pagination doesn't return metadata
**Where found**
- `backend/app/routers/history.py` — returns bare list, no pagination metadata
- All paginated routers have same issue
**Why this is needed**
Frontend receives bare list, cannot determine: total results, whether more pages exist, last page number. Must guess or re-query.
**Goal**
Return pagination metadata with every paginated response.
**What to do**
1. Create response wrapper:
```python
class PaginatedResponse(BaseModel):
data: list[Item]
pagination: PaginationMetadata
```
2. Update all paginated routers to return this wrapper
3. Update frontend to use metadata for UI
**Possible traps and issues**
- `SELECT COUNT(*)` is slow on large tables
- Response shape change — old frontend may not handle
**Docs changes needed**
- Update API documentation § Pagination
**Doc references**
- `backend/app/utils/pagination.py`
---
## [IMPORTANT] Error response schema inconsistent
**Where found**
- Different handlers return different response shapes
- Fail2Ban errors: `{ "error_code": "...", "detail": "..." }`
- Validation errors: `{ "detail": [...] }`
- Not found errors: `{ "detail": "...", "error_code": "..." }`
**Why this is needed**
Frontend must normalize multiple shapes, making error handling fragile and error-prone.
**Goal**
Unify all error responses to single schema.
**What to do**
1. Define canonical error response:
```python
class ErrorResponse(BaseModel):
error_code: str
message: str
status: int
details: dict | None = None
```
2. Update all handlers to return this format
3. Update frontend to expect unified schema
**Possible traps and issues**
- Backward compatibility with old clients
- FastAPI's built-in handlers may override custom
- Rich detail structures need accommodation
**Docs changes needed**
- Update API documentation with unified error schema
- Add error code reference table
**Doc references**
- `Docs/API.md` (error codes)
- `backend/app/main.py` (exception handlers)
---
## [IMPORTANT] Provider ordering fragility (Frontend)
**Where found**
- `frontend/src/App.tsx` — 10-level deep provider nesting
- `frontend/src/providers/PROVIDER_ORDER.md` — documents order, no compile-time enforcement
**Why this is needed**
Provider order (ThemeProvider → AppContents → FluentProvider → ...) enforced only at runtime. Accidental reorder caught only after deploy.
**Goal**
Add compile-time validation of provider ordering.
**What to do**
1. Create provider composition utility enforcing order
2. Use TypeScript discriminated unions
3. Add ESLint rule to check provider wrapping
**Possible traps and issues**
- TypeScript doesn't easily enforce ordering
- May be overkill — improve runtime error messages instead
**Docs changes needed**
- Update `Docs/Architekture.md` § 3.2 (Providers)
**Doc references**
- `Docs/Architekture.md` § 3.2 (Providers)
- `frontend/src/providers/PROVIDER_ORDER.md`
---
## [IMPORTANT] Promise cancellation not checked in .then()/.catch() chains
**Where found**
- `frontend/src/components/blocklist/BlocklistSourcesSection.tsx:84-88`
- `frontend/src/components/blocklist/BlocklistScheduleSection.tsx:49-58`
- Multiple components use this pattern
**Why this is needed**
When user navigates away, `.then()` chains don't check if cancelled. State updated on unmounted component → React warnings, memory leak, notification shows wrong context.
**Goal**
Check for cancellation in all `.then()/.catch()` chains.
**What to do**
1. Replace `.then()/.catch()` with `async/await` and cancellation check
2. Or use wrapper hook to hide logic
**Possible traps and issues**
- Checking `signal.aborted` after `await` introduces race conditions
- Better: let AbortError propagate, catch it in catch block
**Docs changes needed**
- Update `Docs/Web-Development.md` § Async Patterns
**Doc references**
- `Docs/Web-Development.md` (async patterns)
---
## [MEDIUM] Inefficient database pagination uses OFFSET
**Where found**
- `backend/app/utils/pagination.py` — uses `OFFSET (page-1) * page_size`
**Why this is needed**
OFFSET scans and discards N rows to fetch N+limit. Last page on 10M row table: 15 seconds ⚠️
**Goal**
Implement keyset pagination (cursor-based) for large result sets.
**What to do**
1. **Short-term:** Add database indexes on sort columns
2. **Long-term:** Implement cursor-based pagination using WHERE instead of OFFSET
3. Frontend sends cursor (last row ID) instead of page number
**Possible traps and issues**
- Cursor must be deterministic
- API contract changes
- Cursor format must be opaque to client
**Docs changes needed**
- Update `Docs/Backend-Development.md` § Database Performance
**Doc references**
- `Docs/Backend-Development.md` (database performance)
---
## [MEDIUM] Session secret rotation not implemented
**Where found**
- `backend/app/config.py` — single `session_secret` with no rotation support
**Why this is needed**
If secret leaks, all sessions compromised. No way to invalidate old sessions.
**Goal**
Support gradual secret rotation without forcing logout.
**What to do**
1. Store multiple secrets: current and previous
2. Accept tokens signed with either key
3. Re-sign tokens with current secret on validation
**Possible traps and issues**
- Rotation strategy must be documented
- Metrics needed to track secret usage
**Docs changes needed**
- Update `Docs/Backend-Development.md` § Session Management
**Doc references**
- `Docs/Backend-Development.md`
---
## [MEDIUM] No CORS configuration
**Where found**
- `backend/app/main.py` — no CORS middleware added
**Why this is needed**
If frontend on different origin, cross-origin requests blocked without CORS configuration.
**Goal**
Add CORS middleware with proper origin whitelisting.
**What to do**
1. Add CORS middleware with specific origin whitelist
2. Make configurable via environment variable
3. Default to localhost for development
**Possible traps and issues**
- `allow_origins=["*"]` defeats CORS security
- Credentials require specific origins, not wildcard
- Missing config silently fails in browser
**Docs changes needed**
- Update `Docs/Deployment.md` § CORS Configuration
**Doc references**
- `Docs/Deployment.md`
---
## [MEDIUM] Input validation missing for regex patterns (ReDoS)
**Where found**
- `backend/app/routers/config.py` — regex validation accepts arbitrary patterns without timeout
**Why this is needed**
Malicious regex causes catastrophic backtracking (ReDoS). Attacker sends pattern → compilation hangs → DoS.
**Goal**
Add timeout and complexity limits to regex validation.
**What to do**
1. Add timeout to regex compilation (2 seconds recommended)
2. Add length limit (reject patterns > 1000 characters)
3. Use `signal.alarm()` (Unix) or timeout library
**Possible traps and issues**
- `signal.alarm()` Unix-only
- Some valid complex regexes may timeout
- Frontend should also validate (defense in depth)
**Docs changes needed**
- Update API docs to document regex validation limits
**Doc references**
- `backend/app/routers/config.py`
---
## [MEDIUM] No structured logging to external system
**Where found**
- Logs only go to stdout/file, no external aggregation
**Why this is needed**
Can't search across instances, historical logs lost on instance recycle.
**Goal**
Ship logs to centralized logging platform.
**What to do**
1. **Short-term:** Ensure `structlog` JSON output is valid (already done)
2. **Long-term:** Ship to logging platform (ELK, Datadog, Papertrail)
**Possible traps and issues**
- External logging adds latency
- Sensitive data must not be logged
- Log volume can be massive
**Docs changes needed**
- Add `Docs/Observability.md` section on logging
**Doc references**
- `Docs/Observability.md` (new)
---
## [MEDIUM] No Application Performance Monitoring (APM)
**Where found**
- Backend: no metrics collection, latency tracking
- Frontend: no error tracking, performance metrics
- No observability into request performance
**Why this is needed**
Without metrics, blind in production: API slow? Unknown. Which endpoints fail most? Unknown.
**Goal**
Add comprehensive metrics collection and monitoring.
**What to do**
1. **Backend metrics:**
- Add Prometheus metrics: request count, latency, active requests
- Expose `/metrics` endpoint
2. **Frontend metrics:**
- Page load time, FCP, LCP using `web-vitals`
- API error rates and latencies
3. **Aggregation:**
- Prometheus + Grafana, or Datadog/NewRelic
**Possible traps and issues**
- Metrics collection has performance cost
- Cardinality explosion with tags
- PII in metrics
**Docs changes needed**
- Add `Docs/Observability.md`
**Doc references**
- `Docs/Observability.md` (new)
---
## [LOW] Frontend charts not memoized
**Where found**
- `frontend/src/components/TopCountriesPieChart.tsx`
- `frontend/src/components/TopCountriesBarChart.tsx`
**Why this is needed**
Charts re-render on every parent update, Recharts reprocesses 5000+ points.
**Goal**
Memoize chart components.
**What to do**
1. Wrap with `React.memo` with custom comparison
2. Ensure data objects are stable
**Possible traps and issues**
- Shallow comparison might not be enough
- Memoization has memory cost
**Docs changes needed**
- No documentation changes
**Doc references**
- `frontend/src/components/TopCountriesChart.tsx`
---
## [LOW] No request deduplication on frontend
**Where found**
- `frontend/src/hooks/useFetchData.ts` — each call launches new request
- User clicks "Refresh" twice → two identical requests
**Why this is needed**
Duplicates waste bandwidth, cause race conditions (response 2 arrives first, then response 1 overwrites with stale data).
**Goal**
Deduplicate identical in-flight requests.
**What to do**
1. Implement request cache
2. Clear cache entry when response received
3. Use in `useFetchData`
**Possible traps and issues**
- Cache must be cleared on data mutation
- Stale data in cache possible if not careful
**Docs changes needed**
- No documentation changes
**Doc references**
- `frontend/src/hooks/useFetchData.ts`