- Add global rate limiter utility with configurable limits and cleanup
- Move rate limiting logic to middleware for consistent application
- Update auth routes to use new rate limiter
- Add comprehensive tests for rate limiter functionality
- Update documentation with backend development guidelines and tasks

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
[CRITICAL] Global rate limiting missing
Where found
- backend/app/routers/auth.py - only /api/auth/login has rate limiting
- All other routers have no rate limiting
Why this is needed
Without rate limiting, attackers can spam endpoints to cause CPU spike, database overload, or network bandwidth exhaustion.
Goal
Implement global per-IP rate limiting on all endpoints.
What to do
- Add rate limiting middleware to backend/app/main.py:
      from slowapi import Limiter
      from slowapi.util import get_remote_address

      # Default limit applies to every route unless overridden per endpoint
      limiter = Limiter(key_func=get_remote_address, default_limits=["200 per minute"])
      app.state.limiter = limiter
- Apply to all routers with appropriate limits per endpoint
- Return proper HTTP 429 with Retry-After header
- Document limits in API docs
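
The limiting behaviour and the Retry-After calculation can be illustrated without slowapi. The following is a minimal, stdlib-only sketch of a fixed-window per-IP counter; the class `FixedWindowLimiter` and its parameters are hypothetical and are not part of the codebase or of slowapi.

```python
import math


class FixedWindowLimiter:
    """Hypothetical per-key fixed-window counter (illustration only)."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        # key -> (window start time, hits counted in that window)
        self._hits: dict[str, tuple[float, int]] = {}

    def check(self, key: str, now: float) -> tuple[bool, int]:
        """Return (allowed, retry_after_seconds) for one request from `key`."""
        start, count = self._hits.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # window expired: start a fresh one
        if count >= self.limit:
            # Seconds until the current window ends, for the Retry-After header
            retry_after = math.ceil(start + self.window - now)
            return False, max(retry_after, 1)
        self._hits[key] = (start, count + 1)
        return True, 0
```

The state here lives in one process, so a multi-worker deployment would still need a shared store, as noted under the traps below.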
Possible traps and issues
- Limits set too low block legitimate users
- Distributed deployments need shared limiter state (Redis-backed)
- Different endpoints may need different limits
- Trusted IPs should bypass limiting
Docs changes needed
- Add section in Docs/Backend-Development.md § Rate Limiting
- Document default limits in deployment guide
Doc references
- Docs/Backend-Development.md (rate limiting)
- backend/app/main.py (middleware setup)
[CRITICAL] Missing security headers (CSP, X-Frame-Options, etc.)
Where found
- Backend does not set Content-Security-Policy, X-Frame-Options, X-Content-Type-Options headers
- Frontend HTML served without CSP meta tags
Why this is needed
Without security headers, browsers won't protect against XSS, clickjacking, MIME-sniffing, or referrer-leakage attacks.
Goal
Add security headers to all HTTP responses.
What to do
- Add security headers middleware to backend/app/main.py:
      @app.middleware("http")
      async def add_security_headers(request, call_next):
          response = await call_next(request)
          response.headers["Content-Security-Policy"] = "default-src 'self'"
          response.headers["X-Frame-Options"] = "DENY"
          response.headers["X-Content-Type-Options"] = "nosniff"
          return response
- In frontend index.html, add CSP meta tag
- Test with browser DevTools Security tab
Possible traps and issues
- CSP 'unsafe-inline' defeats security; avoid if possible
- CDN resources may need explicit allowlist
- Too restrictive CSP breaks functionality; too loose defeats security
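
One way to keep the policy explicit about allowlisted sources is to build the header value from a mapping. This is only a sketch: `build_csp` is a hypothetical helper, and `https://cdn.example.com` in the usage line is a placeholder origin.

```python
def build_csp(directives: dict[str, list[str]]) -> str:
    """Join CSP directives into one header value, rejecting 'unsafe-inline'."""
    parts = []
    for name, sources in directives.items():
        if "'unsafe-inline'" in sources:
            raise ValueError(f"{name}: 'unsafe-inline' is not allowed")
        parts.append(name + " " + " ".join(sources))
    return "; ".join(parts)


# Usage: allow same-origin content plus images from one (placeholder) CDN.
policy = build_csp({
    "default-src": ["'self'"],
    "img-src": ["'self'", "https://cdn.example.com"],
})
```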
Docs changes needed
- Add section in Docs/Security.md § HTTP Security Headers
Doc references
- Docs/Security.md (security headers)
[CRITICAL] Background tasks lack timeout protection
Where found
- backend/app/tasks/blocklist_import.py - no timeout
- backend/app/tasks/health_check.py - no timeout
- All task functions lack timeout wrapper
Why this is needed
If task hangs (API unreachable, network partition), task runs forever. Never completes → lock never released → duplicate work, resource exhaustion.
Goal
Ensure all background tasks complete within bounded time or fail gracefully.
What to do
- Wrap all task functions with asyncio.wait_for(task, timeout):
      await asyncio.wait_for(blocklist_service.import_all(...), timeout=300)
- Set appropriate timeouts per task:
  - Blocklist import: 300s (5 min)
  - Health probe: 10s
  - Geo cache flush: 60s
- Log timeout events and trigger alerts
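
The wrap-and-log steps could be combined in one small helper. A minimal sketch, assuming a hypothetical `run_with_timeout` function; alerting is left out and only a log line is emitted.

```python
import asyncio
import logging
from collections.abc import Awaitable

logger = logging.getLogger(__name__)


async def run_with_timeout(name: str, task: Awaitable[None], timeout: float) -> bool:
    """Await `task`; return False and log an error if it exceeds `timeout` seconds."""
    try:
        await asyncio.wait_for(task, timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # wait_for has already cancelled the task when this is raised.
        logger.error("task %s timed out after %ss", name, timeout)
        return False
```

Because `asyncio.wait_for` cancels the task on timeout, the task body still has to tolerate cancellation mid-operation, which is the inconsistent-state trap listed below.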
Possible traps and issues
- Timeout too short → legitimate tasks killed prematurely
- Timeout too long → resource leak if many tasks hang
- Killing task mid-operation may leave inconsistent state
Docs changes needed
- Add section in Docs/Backend-Development.md § Background Tasks
Doc references
- Docs/Backend-Development.md (background tasks)
- backend/app/tasks/ (task modules)
[CRITICAL] Background tasks not idempotent
Where found
- backend/app/tasks/blocklist_import.py - bans applied without checking if already banned
- backend/app/tasks/geo_cache_flush.py - cache entries written without transaction
- Multi-step operations not wrapped in transaction
Why this is needed
If task crashes mid-execution, partial state remains. On retry: bans applied again → duplicates, cache entries written twice → corruption.
Goal
Make all background tasks idempotent — retrying produces same result as running once.
What to do
- Use operation IDs to deduplicate:
      operation_id = f"import_{source.id}_{datetime.now().date().isoformat()}"
      if await import_log_repo.get_by_operation_id(operation_id):
          return  # Already done
- Use transactions for multi-step operations
- Store operation state before execution
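
The operation-ID and transaction steps can be combined so that recording the operation and applying its effects commit together. The sketch below uses the stdlib `sqlite3` module; the tables `import_log` and `bans` and the function `import_once` are assumptions made for illustration, not the project's schema.

```python
import sqlite3
from datetime import date


def import_once(db: sqlite3.Connection, source_id: int, ips: list[str], day: date) -> bool:
    """Apply bans for one source at most once per day; return True if work ran."""
    operation_id = f"import_{source_id}_{day.isoformat()}"
    with db:  # one transaction: commits on success, rolls back on exception
        try:
            db.execute("INSERT INTO import_log (operation_id) VALUES (?)", (operation_id,))
        except sqlite3.IntegrityError:
            return False  # operation already recorded: skip the work
        # INSERT OR IGNORE keeps a retry from creating duplicate bans
        db.executemany("INSERT OR IGNORE INTO bans (ip) VALUES (?)", [(ip,) for ip in ips])
    return True


# Usage with an in-memory database and the assumed schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE import_log (operation_id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE bans (ip TEXT PRIMARY KEY)")
first = import_once(db, 1, ["10.0.0.1", "10.0.0.2"], date(2024, 1, 1))
second = import_once(db, 1, ["10.0.0.1", "10.0.0.2"], date(2024, 1, 1))
```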
Possible traps and issues
- Idempotency keys must be unique but deterministic
- Transactions require database support
- State machine (pending → completed/failed) must be enforced
Docs changes needed
- Update Docs/Backend-Development.md § Task Idempotency
Doc references
- Docs/Backend-Development.md (task design)
- backend/app/tasks/ (task implementations)
[CRITICAL] Health check endpoint returns wrong status code
Where found
- backend/app/routers/health.py - always returns 200, even when fail2ban offline
Why this is needed
Docker health checks interpret 200 as "healthy". If fail2ban offline but backend returns 200, Docker thinks container healthy and doesn't restart it.
Goal
Return 503 Service Unavailable when fail2ban is offline.
What to do
- Change health endpoint to return 503 when offline:
      if not server_status.online:
          return JSONResponse(
              status_code=503,
              content={"status": "unavailable", "fail2ban": "offline"},
          )
- Update Docker health check to treat 503 as "unhealthy"
Possible traps and issues
- Returning 503 causes orchestration tools to restart container
- If fail2ban restarts frequently, health check becomes flaky
- Consider gradual degradation
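
One possible form of gradual degradation is to report 503 only after several consecutive failed probes, so a brief fail2ban restart does not flip the status. A sketch, with the hypothetical class `HealthTracker` and an arbitrary default threshold:

```python
class HealthTracker:
    """Report unhealthy only after `threshold` consecutive failed probes."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0

    def record(self, online: bool) -> int:
        """Record one probe result and return the HTTP status to serve."""
        self.failures = 0 if online else self.failures + 1
        return 503 if self.failures >= self.threshold else 200
```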
Docs changes needed
- Update Docker/Dockerfile.backend health check documentation
- Update Docs/Deployment.md § Health Checks
Doc references
- backend/app/routers/health.py
- Docker/Dockerfile.backend
[IMPORTANT] Database transactions lack explicit isolation
Where found
- backend/app/repositories/session_repo.py:40-60 - multiple queries without BEGIN TRANSACTION
- Similar pattern in multi-step operations across repositories
Why this is needed
Without explicit boundaries, concurrent requests can race: Thread A checks if exists → not found, Thread B checks same → not found, Thread A inserts → succeeds, Thread B inserts → duplicate error or silent overwrite.
Goal
Wrap all multi-step operations in explicit transactions with appropriate isolation level.
What to do
- Use explicit BEGIN IMMEDIATE transaction:
      await db.execute("BEGIN IMMEDIATE")
      try:
          await db.execute("INSERT INTO sessions ...")
          await db.commit()
      except Exception:
          await db.rollback()
          raise
- Use IMMEDIATE mode to lock immediately for writes
- Document transaction boundaries clearly
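
The check-then-insert race described above disappears when the read happens after the write lock is taken. A synchronous sketch with the stdlib `sqlite3` module, assuming a hypothetical `sessions(token, username)` table and a connection opened with `isolation_level=None` so the module does not open transactions on its own:

```python
import sqlite3


def create_session(db: sqlite3.Connection, token: str, username: str) -> bool:
    """Insert a session unless the token exists; return True if inserted."""
    db.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = db.execute("SELECT 1 FROM sessions WHERE token = ?", (token,)).fetchone()
        if row is None:
            db.execute(
                "INSERT INTO sessions (token, username) VALUES (?, ?)", (token, username)
            )
        db.execute("COMMIT")
        return row is None
    except Exception:
        db.execute("ROLLBACK")
        raise


# Usage: autocommit mode so BEGIN/COMMIT are fully under our control.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("CREATE TABLE sessions (token TEXT PRIMARY KEY, username TEXT)")
```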
Possible traps and issues
- Nested transactions (SAVEPOINTs) may be needed
- Locks held too long cause contention
- Deadlocks possible with concurrent writers
Docs changes needed
- Add section in Docs/Backend-Development.md § Database Transactions
Doc references
- Docs/Backend-Development.md (database design)
[IMPORTANT] Scheduler lock race condition
Where found
- backend/app/utils/scheduler_lock.py:56-58 - heartbeat interval 10 seconds
Why this is needed
Current design: Process A acquires lock, heartbeat misses, lock expires, Process B acquires lock, both running simultaneously → duplicate work, data corruption.
Goal
Implement robust distributed locking that prevents concurrent execution.
What to do
Option A (Strengthen heartbeat):
- Reduce interval to 5s (half of timeout)
- Use database advisory locks
- Monitor heartbeat failures
Option B (Migrate to Redis):
- Use redlock-py or aioredis
- Simpler, more reliable than database-backed
Current code improvements:
- Log when heartbeat fails
- Add metric for lock contention
- Test multi-process scenario
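
For the database-backed option, acquisition and heartbeat renewal can be expressed as a single atomic upsert, so two processes cannot both see an expired lease and take it. This is a sketch under assumptions: the `scheduler_lock(name, owner, expires_at)` table and `try_acquire` are hypothetical, it relies on SQLite's upsert syntax (SQLite 3.24+), and `now` is passed in explicitly to make the behaviour testable.

```python
import sqlite3


def try_acquire(db: sqlite3.Connection, name: str, owner: str, now: float, ttl: float) -> bool:
    """Acquire or renew the lease `name`; return True if `owner` holds it afterwards."""
    db.execute(
        """
        INSERT INTO scheduler_lock (name, owner, expires_at) VALUES (?, ?, ?)
        ON CONFLICT(name) DO UPDATE
            SET owner = excluded.owner, expires_at = excluded.expires_at
            WHERE scheduler_lock.owner = excluded.owner
               OR scheduler_lock.expires_at <= ?
        """,
        (name, owner, now + ttl, now),
    )
    # Read back the holder inside the same transaction, then commit.
    holder = db.execute(
        "SELECT owner FROM scheduler_lock WHERE name = ?", (name,)
    ).fetchone()
    db.commit()
    return holder[0] == owner


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE scheduler_lock (name TEXT PRIMARY KEY, owner TEXT, expires_at REAL)")
```

Calling `try_acquire` again with the same owner acts as the heartbeat. Timestamps from different hosts remain subject to the clock-skew trap listed below.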
Possible traps and issues
- Database locks don't scale under high contention
- Redis adds new dependency
- Clock skew breaks timestamp-based expiry
Docs changes needed
- Update Docs/Deployment.md § Scheduler Lock
- Add troubleshooting: "Blocklist import runs twice"
Doc references
- Docs/Deployment.md (scheduler)
- backend/app/utils/scheduler_lock.py (lock implementation)
[IMPORTANT] API pagination doesn't return metadata
Where found
- backend/app/routers/history.py - returns bare list, no pagination metadata
- All paginated routers have same issue
Why this is needed
Frontend receives bare list, cannot determine: total results, whether more pages exist, last page number. Must guess or re-query.
Goal
Return pagination metadata with every paginated response.
What to do
- Create response wrapper:
      class PaginatedResponse(BaseModel):
          data: list[Item]
          pagination: PaginationMetadata
- Update all paginated routers to return this wrapper
- Update frontend to use metadata for UI
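
The source does not define the fields of `PaginationMetadata`. The sketch below shows one plausible shape using a stdlib dataclass rather than a Pydantic model; the field names and `build_metadata` are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class PaginationMetadata:
    page: int         # 1-based page number
    page_size: int
    total: int        # total matching rows
    total_pages: int
    has_next: bool


def build_metadata(page: int, page_size: int, total: int) -> PaginationMetadata:
    """Derive page counts from the total row count."""
    total_pages = math.ceil(total / page_size) if total else 0
    return PaginationMetadata(page, page_size, total, total_pages, page < total_pages)
```

These fields answer the three questions the frontend currently cannot: total results, whether more pages exist, and the last page number.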
Possible traps and issues
- SELECT COUNT(*) is slow on large tables
- Response shape change: old frontend may not handle it
Docs changes needed
- Update API documentation § Pagination
Doc references
- backend/app/utils/pagination.py
[IMPORTANT] Error response schema inconsistent
Where found
- Different handlers return different response shapes
- Fail2Ban errors: { "error_code": "...", "detail": "..." }
- Validation errors: { "detail": [...] }
- Not found errors: { "detail": "...", "error_code": "..." }
Why this is needed
Frontend must normalize multiple shapes, making error handling fragile and error-prone.
Goal
Unify all error responses to single schema.
What to do
- Define canonical error response:
      class ErrorResponse(BaseModel):
          error_code: str
          message: str
          status: int
          details: dict | None = None
- Update all handlers to return this format
- Update frontend to expect unified schema
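
During migration, the three legacy shapes listed under "Where found" could be mapped onto the canonical schema in one place. A stdlib sketch using a dataclass in place of the Pydantic model; `normalize_error` and the fallback codes `validation_error` and `unknown_error` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ErrorResponse:
    error_code: str
    message: str
    status: int
    details: dict[str, Any] | None = None


def normalize_error(body: dict[str, Any], status: int) -> ErrorResponse:
    """Map a legacy error body onto the canonical schema."""
    detail = body.get("detail")
    if isinstance(detail, list):  # validation errors: {"detail": [...]}
        return ErrorResponse("validation_error", "Validation failed", status, {"errors": detail})
    # Fail2Ban and not-found errors carry error_code plus a string detail
    code = body.get("error_code", "unknown_error")
    return ErrorResponse(code, str(detail or ""), status)
```

Wrapping the validation list under a `details` key is one way to accommodate the rich detail structures mentioned in the traps below.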
Possible traps and issues
- Backward compatibility with old clients
- FastAPI's built-in handlers may override custom
- Rich detail structures need accommodation
Docs changes needed
- Update API documentation with unified error schema
- Add error code reference table
Doc references
- Docs/API.md (error codes)
- backend/app/main.py (exception handlers)
[IMPORTANT] Provider ordering fragility (Frontend)
Where found
- frontend/src/App.tsx - 10-level deep provider nesting
- frontend/src/providers/PROVIDER_ORDER.md - documents order, no compile-time enforcement
Why this is needed
Provider order (ThemeProvider → AppContents → FluentProvider → ...) enforced only at runtime. Accidental reorder caught only after deploy.
Goal
Add compile-time validation of provider ordering.
What to do
- Create provider composition utility enforcing order
- Use TypeScript discriminated unions
- Add ESLint rule to check provider wrapping
Possible traps and issues
- TypeScript doesn't easily enforce ordering
- May be overkill — improve runtime error messages instead
Docs changes needed
- Update Docs/Architekture.md § 3.2 (Providers)
Doc references
- Docs/Architekture.md § 3.2 (Providers)
- frontend/src/providers/PROVIDER_ORDER.md
[IMPORTANT] Promise cancellation not checked in .then()/.catch() chains
Where found
- frontend/src/components/blocklist/BlocklistSourcesSection.tsx:84-88
- frontend/src/components/blocklist/BlocklistScheduleSection.tsx:49-58
- Multiple components use this pattern
Why this is needed
When user navigates away, .then() chains don't check if cancelled. State updated on unmounted component → React warnings, memory leak, notification shows wrong context.
Goal
Check for cancellation in all .then()/.catch() chains.
What to do
- Replace .then()/.catch() with async/await and cancellation check
- Or use wrapper hook to hide logic
Possible traps and issues
- Checking signal.aborted after await introduces race conditions
- Better: let AbortError propagate, catch it in catch block
Docs changes needed
- Update Docs/Web-Development.md § Async Patterns
Doc references
- Docs/Web-Development.md (async patterns)
[MEDIUM] Inefficient database pagination uses OFFSET
Where found
- backend/app/utils/pagination.py - uses OFFSET (page-1) * page_size
Why this is needed
OFFSET N makes the database scan and discard N rows before returning the page, so cost grows with page depth. Last page on 10M row table: 15 seconds ⚠️
Goal
Implement keyset pagination (cursor-based) for large result sets.
What to do
- Short-term: Add database indexes on sort columns
- Long-term: Implement cursor-based pagination using WHERE instead of OFFSET
- Frontend sends cursor (last row ID) instead of page number
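
The long-term step can be sketched with the stdlib `sqlite3` module: the query seeks past the last seen ID with WHERE instead of skipping rows with OFFSET. The `history(id, ip)` table and the helper names are assumptions. Base64 only hides the cursor format from the client; it is not tamper-proof.

```python
from __future__ import annotations

import base64
import sqlite3


def encode_cursor(last_id: int) -> str:
    return base64.urlsafe_b64encode(str(last_id).encode()).decode()


def decode_cursor(cursor: str | None) -> int:
    # No cursor means "start from the beginning" (IDs are assumed positive).
    return int(base64.urlsafe_b64decode(cursor).decode()) if cursor else 0


def fetch_page(
    db: sqlite3.Connection, cursor: str | None, limit: int
) -> tuple[list[tuple], str | None]:
    """Return (rows, next_cursor) using WHERE id > ? instead of OFFSET."""
    rows = db.execute(
        "SELECT id, ip FROM history WHERE id > ? ORDER BY id LIMIT ?",
        (decode_cursor(cursor), limit + 1),  # one extra row detects a next page
    ).fetchall()
    page = rows[:limit]
    next_cursor = encode_cursor(page[-1][0]) if len(rows) > limit else None
    return page, next_cursor
```

Fetching one row beyond the limit avoids a separate COUNT(*) query just to learn whether another page exists.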
Possible traps and issues
- Cursor must be deterministic
- API contract changes
- Cursor format must be opaque to client
Docs changes needed
- Update Docs/Backend-Development.md § Database Performance
Doc references
- Docs/Backend-Development.md (database performance)
[MEDIUM] Session secret rotation not implemented
Where found
- backend/app/config.py - single session_secret with no rotation support
Why this is needed
If secret leaks, all sessions compromised. No way to invalidate old sessions.
Goal
Support gradual secret rotation without forcing logout.
What to do
- Store multiple secrets: current and previous
- Accept tokens signed with either key
- Re-sign tokens with current secret on validation
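
The accept-either, re-sign-with-current flow might look as follows. This is a sketch with stdlib `hmac` only; the `payload.mac` token format and the `sign`/`verify` helpers are assumptions, not the project's session format.

```python
from __future__ import annotations

import hashlib
import hmac


def sign(payload: str, secret: bytes) -> str:
    mac = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{mac}"


def verify(token: str, current: bytes, previous: bytes | None = None) -> str | None:
    """Return a token signed with `current` if valid under either secret, else None."""
    payload, _, mac = token.rpartition(".")
    for secret in (current, previous):
        if secret is None:
            continue
        expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
        # Constant-time comparison avoids leaking how many characters matched
        if hmac.compare_digest(mac.encode(), expected.encode()):
            return token if secret is current else sign(payload, current)
    return None
```

Dropping `previous` from the configuration is what finally invalidates tokens signed with the old secret.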
Possible traps and issues
- Rotation strategy must be documented
- Metrics needed to track secret usage
Docs changes needed
- Update Docs/Backend-Development.md § Session Management
Doc references
- Docs/Backend-Development.md
[MEDIUM] No CORS configuration
Where found
- backend/app/main.py - no CORS middleware added
Why this is needed
If frontend on different origin, cross-origin requests blocked without CORS configuration.
Goal
Add CORS middleware with proper origin whitelisting.
What to do
- Add CORS middleware with specific origin whitelist
- Make configurable via environment variable
- Default to localhost for development
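
Reading the allowlist from the environment could be done as sketched below. The variable name `CORS_ORIGINS` and the default dev origin `http://localhost:5173` are assumptions; the resulting list would be passed as `allow_origins` to the CORS middleware.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

DEFAULT_ORIGINS = ["http://localhost:5173"]  # assumed dev-server origin


def load_cors_origins(env: Mapping[str, str] | None = None) -> list[str]:
    """Read a comma-separated origin allowlist from CORS_ORIGINS (hypothetical name)."""
    env = os.environ if env is None else env
    raw = env.get("CORS_ORIGINS", "")
    # Trim whitespace and trailing slashes so origins compare exactly
    origins = [o.strip().rstrip("/") for o in raw.split(",") if o.strip()]
    if "*" in origins:
        raise ValueError("wildcard origin is not allowed; list origins explicitly")
    return origins or DEFAULT_ORIGINS
```

Rejecting the wildcard at startup turns a silent browser-side failure into an explicit configuration error.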
Possible traps and issues
- allow_origins=["*"] defeats CORS security
- Credentials require specific origins, not wildcard
- Missing config silently fails in browser
Docs changes needed
- Update Docs/Deployment.md § CORS Configuration
Doc references
- Docs/Deployment.md
[MEDIUM] Input validation missing for regex patterns (ReDoS)
Where found
- backend/app/routers/config.py - regex validation accepts arbitrary patterns without timeout
Why this is needed
Malicious regex causes catastrophic backtracking (ReDoS). Attacker sends pattern → matching against input hangs → DoS.
Goal
Add timeout and complexity limits to regex validation.
What to do
- Add timeout to regex compilation and matching (2 seconds recommended)
- Add length limit (reject patterns > 1000 characters)
- Use signal.alarm() (Unix) or timeout library
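
A sketch of the length limit plus timeout, using `signal.setitimer` (the sub-second counterpart of `signal.alarm()`). It assumes a Unix host and that it runs in the main thread, since Python only allows installing signal handlers there; `is_safe_regex` and the idea of testing against a sample string are assumptions.

```python
import re
import signal

MAX_PATTERN_LENGTH = 1000


def _on_timeout(signum, frame):
    raise TimeoutError("regex evaluation timed out")


def is_safe_regex(pattern: str, sample: str, timeout: float = 2.0) -> bool:
    """Compile `pattern` and run it on `sample`; False if invalid, too long, or too slow."""
    if len(pattern) > MAX_PATTERN_LENGTH:
        return False
    previous = signal.signal(signal.SIGALRM, _on_timeout)
    signal.setitimer(signal.ITIMER_REAL, timeout)
    try:
        re.compile(pattern).search(sample)
        return True
    except (re.error, TimeoutError):
        return False
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

A pattern that passes on one sample can still backtrack badly on other input, so this check reduces the risk without removing it.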
Possible traps and issues
- signal.alarm() Unix-only
- Some valid complex regexes may time out
- Frontend should also validate (defense in depth)
Docs changes needed
- Update API docs to document regex validation limits
Doc references
- backend/app/routers/config.py
[MEDIUM] No structured logging to external system
Where found
- Logs only go to stdout/file, no external aggregation
Why this is needed
Can't search across instances, historical logs lost on instance recycle.
Goal
Ship logs to centralized logging platform.
What to do
- Short-term: Ensure structlog JSON output is valid (already done)
- Long-term: Ship to logging platform (ELK, Datadog, Papertrail)
Possible traps and issues
- External logging adds latency
- Sensitive data must not be logged
- Log volume can be massive
Docs changes needed
- Add Docs/Observability.md section on logging
Doc references
- Docs/Observability.md (new)
[MEDIUM] No Application Performance Monitoring (APM)
Where found
- Backend: no metrics collection, latency tracking
- Frontend: no error tracking, performance metrics
- No observability into request performance
Why this is needed
Without metrics, blind in production: API slow? Unknown. Which endpoints fail most? Unknown.
Goal
Add comprehensive metrics collection and monitoring.
What to do
- Backend metrics:
  - Add Prometheus metrics: request count, latency, active requests
  - Expose /metrics endpoint
- Frontend metrics:
  - Page load time, FCP, LCP using web-vitals
  - API error rates and latencies
- Aggregation:
  - Prometheus + Grafana, or Datadog/NewRelic
Possible traps and issues
- Metrics collection has performance cost
- Cardinality explosion with tags
- PII in metrics
Docs changes needed
- Add Docs/Observability.md
Doc references
- Docs/Observability.md (new)
[LOW] Frontend charts not memoized
Where found
- frontend/src/components/TopCountriesPieChart.tsx
- frontend/src/components/TopCountriesBarChart.tsx
Why this is needed
Charts re-render on every parent update, Recharts reprocesses 5000+ points.
Goal
Memoize chart components.
What to do
- Wrap with React.memo with custom comparison
- Ensure data objects are stable
Possible traps and issues
- Shallow comparison might not be enough
- Memoization has memory cost
Docs changes needed
- No documentation changes
Doc references
- frontend/src/components/TopCountriesChart.tsx
[LOW] No request deduplication on frontend
Where found
- frontend/src/hooks/useFetchData.ts - each call launches new request
- User clicks "Refresh" twice → two identical requests
Why this is needed
Duplicates waste bandwidth, cause race conditions (response 2 arrives first, then response 1 overwrites with stale data).
Goal
Deduplicate identical in-flight requests.
What to do
- Implement request cache
- Clear cache entry when response received
- Use in useFetchData
Possible traps and issues
- Cache must be cleared on data mutation
- Stale data in cache possible if not careful
Docs changes needed
- No documentation changes
Doc references
- frontend/src/hooks/useFetchData.ts