BanGUI/Docs/Tasks.md at 2db635ae199cfa35efe4dad0d6269caa15d4ce89


Lukas 2db635ae19 Fix exception handler overlap issue - add DomainError catch-all handler

**Problem:** Broad exception handlers created fragility where adding a new
DomainError subclass without explicit registration would silently fall through
to the generic exception handler, losing the specific error_code and metadata.

**Solution:**
1. Import DomainError in main.py for explicit handler registration
2. Fix type hints in exception handlers from 'Exception' to specific types
   - NotFoundError handler now typed as 'NotFoundError'
   - BadRequestError handler now typed as 'BadRequestError'
   - ConflictError handler now typed as 'ConflictError'
   - DomainError handler now typed as 'DomainError'
   - ServiceUnavailableError handler now typed as 'ServiceUnavailableError'
3. Add DomainError as an explicit catch-all handler in the registration chain
   - Positioned after specific handlers, before HTTPException
   - Any unregistered DomainError subclass now gets correct error_code + metadata
4. Document the exception handler hierarchy with detailed comments
5. Update Backend-Development.md with handler hierarchy documentation
6. Update Architekture.md section 2.2 with exception handler details
7. Fix test expectations in test_main.py to verify ErrorResponse format

**Impact:** Any new DomainError subclass now automatically gets correct HTTP 500
status, error_code, and metadata - even if developer forgets explicit handler.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-04-30 19:44:43 +02:00


[Backend] Exception handler overlap — broad handlers catching everything

Where found

backend/app/main.py:182-200 — _get_error_code() accepts any Exception and falls back to snake_case conversion
Multiple handlers (lines 329-466) accept Exception as parameter

Why this is needed

Broad exception handlers create fragility: adding a new DomainError subclass without explicitly registering a handler silently falls through, producing generic error codes instead of specific ones. The fallback chain is not explicitly documented.

Goal

Make the exception handler registration explicit and documented. Every exception type that can bubble up should have a clear path to a handler.

What to do

Audit all exception handlers and confirm they are registered with the most specific base class
Add a comment block documenting the fallback chain
Ensure every custom DomainError subclass has error_code and get_error_metadata() implemented
Add a catch-all Exception handler as the absolute last resort
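
The fallback chain can be illustrated with a small framework-free sketch. It assumes handlers are resolved by walking the exception class's method resolution order, most specific class first. `DomainError` and `NotFoundError` are named in this task; `JailLockedError`, `HANDLERS`, `resolve_handler`, and the response dictionaries are hypothetical names used only for illustration, not the code in main.py.

```python
from typing import Any, Callable


class DomainError(Exception):
    """Base class; subclasses override error_code and get_error_metadata()."""

    error_code = "domain_error"

    def get_error_metadata(self) -> dict[str, Any]:
        return {}


class NotFoundError(DomainError):
    error_code = "not_found"


class JailLockedError(DomainError):  # hypothetical subclass with no dedicated handler
    error_code = "jail_locked"

    def get_error_metadata(self) -> dict[str, Any]:
        return {"retryable": True}


Handler = Callable[[Exception], dict[str, Any]]


def handle_not_found(exc: Exception) -> dict[str, Any]:
    return {"status": 404, "error_code": "not_found", "metadata": {}}


def handle_domain_error(exc: Exception) -> dict[str, Any]:
    # Catch-all for DomainError subclasses that have no handler of their own.
    assert isinstance(exc, DomainError)
    return {"status": 500, "error_code": exc.error_code, "metadata": exc.get_error_metadata()}


def handle_unexpected(exc: Exception) -> dict[str, Any]:
    # Absolute last resort: a generic code, no internal details.
    return {"status": 500, "error_code": "internal_error", "metadata": {}}


# Registration order does not matter here; specificity is decided by the MRO walk.
HANDLERS: dict[type[Exception], Handler] = {
    NotFoundError: handle_not_found,
    DomainError: handle_domain_error,
    Exception: handle_unexpected,
}


def resolve_handler(exc: Exception) -> Handler:
    """Return the handler registered for the most specific class in the MRO."""
    for cls in type(exc).__mro__:
        if cls in HANDLERS:
            return HANDLERS[cls]
    raise LookupError("no handler registered")
```

With this layout, an unregistered subclass such as `JailLockedError` resolves to `handle_domain_error` and keeps its own error_code and metadata, while a plain `ValueError` falls through to the last-resort handler.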

Possible traps and issues

If a new DomainError subclass is added without handler registration, it silently returns wrong status code
ValueError handler may catch Pydantic ValidationError subclasses

Docs changes needed

Update Docs/Architekture.md § 2.2 (Application Entry Point) — document exception handler hierarchy
Add section in Docs/Backend-Development.md on exception taxonomy

Doc references

Docs/Architekture.md § 2.2 (Application Entry Point)
Docs/Backend-Development.md (exception conventions)

[Backend] Login rate limiter applies penalty only after limit is exceeded

Where found

backend/app/routers/auth.py:82-107 — rate limiter check happens before password verification, penalty sleep happens after

Why this is needed

The current design means attackers who stay under 5 requests/minute get no penalty at all. The asyncio.sleep only fires after the rate limit is already exceeded, significantly weakening the limiter's effectiveness.

Goal

Ensure the rate limiter blocks requests before the password check is attempted. Each wrong password should incur a progressive delay.

What to do

Remove the acquire/release pattern
Change flow so record_failure is called on every wrong password and is_allowed returns False when limit exceeded
Implement exponential backoff: penalty = min(base_delay * (2 ** failure_count), max_delay)
Consider using a token bucket rather than sliding window
Ensure is_allowed uses the failure count atomically
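
The flow above could be sketched roughly as follows. This is a minimal in-memory sketch, not the existing limiter: `is_allowed`, `record_failure`, and the penalty formula come from this task, while `LoginLimiter`, the constants, and the injectable clock are assumptions made for illustration.

```python
import time
from collections import defaultdict

BASE_DELAY = 0.25  # seconds; assumed value
MAX_DELAY = 4.0    # seconds; inside the 2-5 second ceiling suggested in this task
MAX_FAILURES = 5   # assumed limit per window
WINDOW = 60.0      # seconds


def penalty(failure_count: int, base_delay: float = BASE_DELAY, max_delay: float = MAX_DELAY) -> float:
    """penalty = min(base_delay * 2**failure_count, max_delay)."""
    return min(base_delay * (2 ** failure_count), max_delay)


class LoginLimiter:
    """In-memory sliding-window failure tracker (single worker only)."""

    def __init__(self, clock=time.monotonic) -> None:
        self._clock = clock
        self._failures: dict[str, list[float]] = defaultdict(list)

    def _recent(self, key: str) -> list[float]:
        # Drop failures that have left the window, then return the live list.
        cutoff = self._clock() - WINDOW
        self._failures[key] = [t for t in self._failures[key] if t > cutoff]
        return self._failures[key]

    def is_allowed(self, key: str) -> bool:
        """Call before the password check; False means reject immediately."""
        return len(self._recent(key)) < MAX_FAILURES

    def record_failure(self, key: str) -> float:
        """Call on every wrong password; returns the delay to apply."""
        recent = self._recent(key)
        recent.append(self._clock())
        return penalty(len(recent))
```

Because neither method awaits, each call runs without interleaving inside a single asyncio event loop, which is one way to get the atomicity mentioned above for a single worker.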

Possible traps and issues

If asyncio.sleep is called before password check, legitimate users experience latency on response
Keep maximum penalty reasonable (2-5 seconds)
record_failure counter must persist across requests for the lifetime of the process (in-memory is fine for single-worker; multi-worker needs a shared store)

Docs changes needed

Update Docs/Architekture.md § 2.2 (auth router) — reflect rate limiting behavior
Add note in Docs/Backend-Development.md about rate limiter design

Doc references

Docs/Architekture.md § 2.2 (auth router)
backend/app/routers/auth.py (login endpoint)

[Backend] Module-level imports inside dependency provider functions

Where found

backend/app/dependencies.py:151 — from app.db import open_db inside get_db()
backend/app/dependencies.py:258, 270, 282, 294, 307, 319, 332 — local imports inside each get_<repo>() provider
Similar patterns in service provider functions

Why this is needed

FastAPI calls dependency provider functions on every request. Python caches modules in sys.modules, so a local import does not re-execute the module, but each call still pays for the cache lookup and name binding. More importantly, the pattern is inconsistent: some providers have module-level imports while others have local imports, making it unclear which providers are safe to call at high frequency.

Goal

Move all repository and service imports to module level in dependencies.py.

What to do

Move all repository imports to module level near the top
Similarly move from app.services import health_service to module level
Keep from app.db import open_db as local import (only needed within get_db())
Move from app.services import auth_service to module level; if a circular dependency prevents it, keep the runtime import local and use a TYPE_CHECKING guard for the annotation-only import

Possible traps and issues

Moving imports to module level may expose hidden circular imports
Current local-import pattern was likely chosen to avoid circular deps — verify no circular dependencies before making change

Docs changes needed

Update Docs/Architekture.md § 2.3 (Dependency Wiring) — repository and service imports should be at module level

Doc references

Docs/Architekture.md § 2.3 (Dependency Wiring)
backend/app/dependencies.py (composition root)

[Backend] `get_password_hash` lives in `setup_service` but is used by `auth_service`

Where found

backend/app/services/auth_service.py:27 — imports get_password_hash from setup_service

Why this is needed

auth_service.py handles all authentication concerns and is the natural home for password hash retrieval. Having it import from setup_service creates incorrect semantic dependency and unnecessary coupling.

Goal

Move get_password_hash to auth_service.py where it belongs, or to a shared app/utils/crypto.py module.

What to do

Move the get_password_hash function body from app/services/setup_service.py to app/services/auth_service.py
Update the import in auth_service.py to use the local function
Search all of backend/app/ for usages and update imports
Remove the function from setup_service.py

Possible traps and issues

Search thoroughly for all usages before removing — may be imported in tests, other services, setup router
If used during initial setup flow, update that usage to import from auth_service instead

Docs changes needed

No documentation changes required — internal code reorganization

Doc references

backend/app/services/auth_service.py
backend/app/services/setup_service.py

[Backend] `re` module imported inside function body

Where found

backend/app/main.py:198-199 — import re inside _get_error_code()

Why this is needed

Importing inside a function is a code smell. Standard practice is to import modules at the top of the file.

Goal

Move import re to the module-level imports at the top of main.py.

What to do

Add import re to existing imports section at top of backend/app/main.py
Remove import re line from inside _get_error_code()

Possible traps and issues

None — straightforward refactoring with no behavioral change

Docs changes needed

No documentation changes needed

Doc references

backend/app/main.py

[Frontend] AuthProvider sessionStorage not synchronized across tabs

Where found

frontend/src/providers/AuthProvider.tsx — uses sessionStorage to persist isAuthenticated across page refreshes
sessionStorage is tab-scoped — not shared across browser tabs

Why this is needed

If a user logs out in Tab A, Tab B's sessionStorage still holds isAuthenticated: true. Tab B continues showing authenticated UI until next full page refresh or 401 error.

Goal

Ensure logout state is propagated across all open tabs immediately.

What to do

Add storage event listener in AuthProvider to receive logout events from other tabs, signalled through a localStorage key (sessionStorage changes are never delivered to other tabs)
When logout() is called, clear sessionStorage explicitly
Consider also using BroadcastChannel API for real-time cross-tab sync

Possible traps and issues

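storage event fires only in other documents that share the same storage area; sessionStorage is per-tab, so other tabs never receive its events (use localStorage or BroadcastChannel for the logout signal)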
StorageEvent doesn't fire if cookies are sole authentication mechanism
Very old browsers don't support StorageEvent — fallback to full-page refresh

Docs changes needed

Update Docs/Architekture.md § 3.2 (AuthProvider) — document cross-tab synchronization
Add note in Docs/Web-Development.md about authentication state management

Doc references

Docs/Architekture.md § 3.2 (Providers section)
frontend/src/providers/AuthProvider.tsx

[Frontend] usePolledData — setInterval without drift correction

Where found

frontend/src/hooks/usePolledData.ts — uses setInterval without tracking expected next poll time

Why this is needed

setInterval fires every pollInterval whether or not the previous refetch has finished. If refetch is slow (e.g., 2 seconds for a slow API) and pollInterval is 5 seconds, requests can overlap and the actual polling cadence drifts from the intended one, wasting bandwidth and CPU.

Goal

Implement drift-corrected polling that schedules the next poll only after the previous poll completes, subtracting the time the request took so the start-to-start cadence stays at pollInterval.

What to do

Replace setInterval with self-scheduling timeout
Track elapsed time between poll start and completion
Schedule next poll with compensation: delay = Math.max(0, pollInterval - elapsed)
Alternatively use use-sse or custom hook useDriftCorrectedPolling

Possible traps and issues

Cancellation check must happen both before and after await refetch() to prevent race conditions
Coordinate cancellation with AbortController already used by useFetchData
If pollInterval changes, hook must restart polling loop

Docs changes needed

Update frontend/src/hooks/README.md — document drift-corrected polling behavior

Doc references

frontend/src/hooks/usePolledData.ts
frontend/src/hooks/README.md

[Frontend] ErrorBoundary — non-standard `componentStack` property

Where found

frontend/src/components/ErrorBoundary.tsx — accesses errorInfo.componentStack
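Typing of componentStack on React.ErrorInfo differs across @types/react versions (recent versions declare it as optional and nullable)

Why this is needed

Accessing errorInfo.componentStack produces TypeScript error under strict mode when the possibly null or undefined value is passed on unchecked. Verify against the installed @types/react version before treating the property as missing from the type.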

Goal

Remove dependency on errorInfo.componentStack, use alternative for stack information.

What to do

Check what recordCritical does with componentStack parameter
Consider using standard Error.prototype.stack instead

Or cast errorInfo to any with comment explaining trade-off:

const stack = (errorInfo as any).componentStack as string | undefined;
// TODO: Remove when React types include componentStack

Handle case where componentStack is undefined

Possible traps and issues

Error.prototype.stack contains JavaScript call stack, not React component stack
Casting to any suppresses type checking — ensure it's documented
In production builds componentStack may be empty or absent

Docs changes needed

Update Docs/Architekture.md § 3.2 (Components) — note that ErrorBoundary uses React-internal error info

Doc references

frontend/src/components/ErrorBoundary.tsx
Docs/Architekture.md § 3.2 (Components section)

[Frontend] LoginPage flashes login form while session validation is in progress

Where found

frontend/src/pages/LoginPage.tsx — renders password form immediately without checking session validation state

Why this is needed

AuthProvider validates session on mount. During validation (network round-trip), LoginPage renders form. If user already has valid session, they briefly see login page before redirect — "flash" of wrong page.

Goal

Show loading state (spinner or skeleton) while session validation in progress.

What to do

Add useSessionValidation call to LoginPage to get isValidating flag
If isValidating, return <Spinner label="Checking session..." />
Alternatively, have AuthProvider expose isValidating state that LoginPage can read
Ensure redirect to dashboard happens automatically via useEffect

Possible traps and issues

useSessionValidation may not expose isValidating — check hook implementation
Ensure loading spinner is consistent with Fluent UI design language
Redirect logic must handle ?next= query parameter

Docs changes needed

No documentation changes needed — UX improvement to existing component

Doc references

frontend/src/pages/LoginPage.tsx
frontend/src/providers/AuthProvider.tsx

[CRITICAL] Backend session cache not cluster-safe

Where found

backend/app/startup.py:66-91 — verify_bangui_workers() validates BANGUI_WORKERS env var but only if explicitly set
backend/app/utils/session_cache.py — in-memory cache stored on app.state (process-local)
Multi-worker deployments use separate worker processes with separate app.state

Why this is needed

Multi-worker deployments (e.g., gunicorn -w 4) create separate worker processes. Each has its own in-memory app.state, so sessions cached in Worker A are invisible to Workers B, C, D. Request landing on different worker shows session invalid — users randomly logged out.

Goal

Ensure multi-worker deployments either prevented at startup or migrate to shared session store (Redis, PostgreSQL).

What to do

Option A (Strict single-worker):

Add runtime detection of actual worker count (not just env var checking)
Fail startup if multi-worker scenario detected
Document single-worker requirement prominently

Option B (Support multi-worker):

Migrate session cache to Redis or PostgreSQL
Update app/utils/session_cache.py to use distributed backend
Add BANGUI_REDIS_URL or similar to config

Immediate mitigation:

Document in deployment guide that BANGUI_WORKERS=1 is mandatory
Add validation that fails loudly if BANGUI_WORKERS != 1
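
The immediate mitigation might look like the sketch below. It is an illustration under assumptions, not the existing verify_bangui_workers(): `require_single_worker` and `WorkerConfigError` are hypothetical names, and treating an unset variable as 1 is an assumption.

```python
import os
from collections.abc import Mapping


class WorkerConfigError(RuntimeError):
    """Raised when the configured worker count is unsafe for the in-memory session cache."""


def require_single_worker(env: Mapping[str, str] = os.environ) -> None:
    """Fail loudly unless BANGUI_WORKERS resolves to exactly 1."""
    raw = env.get("BANGUI_WORKERS", "1")  # assumption: unset means one worker
    try:
        workers = int(raw)
    except ValueError:
        raise WorkerConfigError(f"BANGUI_WORKERS must be an integer, got {raw!r}") from None
    if workers != 1:
        raise WorkerConfigError(
            f"BANGUI_WORKERS={workers} is not supported: the session cache is "
            "process-local, so sessions would not be shared between workers"
        )
```

This only checks the environment variable. It cannot see a worker count passed on the command line, so it complements rather than replaces the runtime detection in Option A.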

Possible traps and issues

uvicorn --workers N creates N processes, each with separate app.state
Environment variable validation easily bypassed using command-line flag
Detecting actual worker count at runtime is tricky — consider os.getpid() and shared lock file

Docs changes needed

Update Docs/Deployment.md § Deployment Constraints — explicitly document single-worker requirement
Add troubleshooting entry: "Why am I randomly logged out?" → Check BANGUI_WORKERS
Update Docker/docker-compose.yml with comment explaining requirement

Doc references

Docs/Deployment.md (deployment constraints)
backend/app/startup.py (worker validation)
backend/app/utils/session_cache.py (session cache)

[CRITICAL] Frontend-Backend type generation drift

Where found

frontend/src/types/ — manually maintained Pydantic model transcriptions
backend/app/models/ — source Pydantic models
No build-time validation or code generation linking them

Why this is needed

Types manually copied from Python models. When backend model changes, frontend types not updated automatically. Frontend uses stale types, causing runtime errors, type errors at build time, or silent UI bugs.

Goal

Implement automated type synchronization from backend schema to frontend types, validated at build time.

What to do

Recommended: Generate TypeScript types from OpenAPI schema

openapi-typescript http://localhost:8000/openapi.json -o src/types/generated.ts

Setup:
- Ensure backend exposes OpenAPI schema at /openapi.json (FastAPI has this built-in)
- Add openapi-typescript to frontend/package.json devDependencies
- Generate types in pre-build step: npm run generate:types
- Import types from generated file, not hand-written
- Add CI check: fail build if generated types don't match committed types
Alternative: Use typed-rest-client or msw with type generation

Possible traps and issues

OpenAPI schema must be kept up-to-date — CI validation must enforce
Generated types may have different names than hand-written types — migration needed
Private/internal model fields should be excluded from schema

Docs changes needed

Update Docs/Web-Development.md § Type Generation — document workflow
Add pre-commit hook documentation
Update Docs/Backend-Development.md to note API changes must keep schema in sync

Doc references

Docs/Web-Development.md (type generation)
Docs/Backend-Development.md (API changes)

[CRITICAL] Docker containers lack resource limits

Where found

Docker/docker-compose.yml: no deploy.resources.limits or deploy.resources.reservations sections

Why this is needed

Without resource limits, single container can consume all host CPU, memory, disk. "Noisy neighbor" scenario where backend memory leak → uses 100% RAM → OOM kill → host unresponsive.

Goal

Set hard and soft resource limits for all containers.

What to do

Add resource limits to docker-compose.yml:

backend:
  deploy:
    resources:
      limits:
        cpus: '2'
        memory: 512M
      reservations:
        cpus: '1'
        memory: 256M

Document these limits in Docs/Deployment.md
For Kubernetes, add equivalent resources.limits and resources.requests

Possible traps and issues

Limits set too low → OOM kill or throttling
Backend may need more memory for large blocklists
Test under expected load before finalizing
Different environments may need different limits

Docs changes needed

Update Docker/docker-compose.yml with deploy sections
Add section in Docs/Deployment.md § Resource Allocation

Doc references

Docker/docker-compose.yml
Docs/Deployment.md (resource allocation)

[CRITICAL] Global rate limiting missing

Where found

backend/app/routers/auth.py — only /api/auth/login has rate limiting
All other routers have no rate limiting

Why this is needed

Without rate limiting, attackers can spam endpoints to cause CPU spike, database overload, or network bandwidth exhaustion.

Goal

Implement global per-IP rate limiting on all endpoints.

What to do

Add rate limiting middleware to backend/app/main.py:

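from slowapi import Limiter
from slowapi.middleware import SlowAPIMiddleware
from slowapi.util import get_remote_address
limiter = Limiter(key_func=get_remote_address, default_limits=["200 per minute"])
app.state.limiter = limiter
app.add_middleware(SlowAPIMiddleware)  # applies default_limits to every route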

Apply to all routers with appropriate limits per endpoint
Return proper HTTP 429 with Retry-After header
Document limits in API docs

Possible traps and issues

Limits set too low block legitimate users
Distributed deployments need shared limiter state (Redis-backed)
Different endpoints may need different limits
Trusted IPs should bypass limiting

Docs changes needed

Add section in Docs/Backend-Development.md § Rate Limiting
Document default limits in deployment guide

Doc references

Docs/Backend-Development.md (rate limiting)
backend/app/main.py (middleware setup)

[CRITICAL] Missing security headers (CSP, X-Frame-Options, etc.)

Where found

Backend does not set Content-Security-Policy, X-Frame-Options, X-Content-Type-Options headers
Frontend HTML served without CSP meta tags

Why this is needed

Without security headers, browsers won't protect against XSS, clickjacking, MIME-sniffing, referrer leakage attacks.

Goal

Add security headers to all HTTP responses.

What to do

Add security headers middleware to backend/app/main.py:

@app.middleware("http")
async def add_security_headers(request, call_next):
    response = await call_next(request)
    response.headers["Content-Security-Policy"] = "default-src 'self'"
    response.headers["X-Frame-Options"] = "DENY"
    response.headers["X-Content-Type-Options"] = "nosniff"
    return response

In frontend index.html, add CSP meta tag
Test with browser DevTools Security tab

Possible traps and issues

CSP 'unsafe-inline' defeats security — avoid if possible
CDN resources may need explicit allowlist
Too restrictive CSP breaks functionality; too loose defeats security

Docs changes needed

Add section in Docs/Security.md § HTTP Security Headers

Doc references

Docs/Security.md (security headers)

[CRITICAL] Background tasks lack timeout protection

Where found

backend/app/tasks/blocklist_import.py — no timeout
backend/app/tasks/health_check.py — no timeout
All task functions lack timeout wrapper

Why this is needed

If task hangs (API unreachable, network partition), task runs forever. Never completes → lock never released → duplicate work, resource exhaustion.

Goal

Ensure all background tasks complete within bounded time or fail gracefully.

What to do

Wrap all task functions with asyncio.wait_for(task, timeout):

await asyncio.wait_for(blocklist_service.import_all(...), timeout=300)

Set appropriate timeouts per task:
- Blocklist import: 300s (5 min)
- Health probe: 10s
- Geo cache flush: 60s
Log timeout events and trigger alerts
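
A shared wrapper is one way to apply these timeouts consistently. The sketch below uses only asyncio and logging; `run_with_timeout`, the `TASK_TIMEOUTS` keys, and the logger name are hypothetical, and the timeout values are the ones listed above.

```python
import asyncio
import logging
from collections.abc import Awaitable
from typing import Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("tasks")  # hypothetical logger name

# Per-task limits in seconds, taken from the list above.
TASK_TIMEOUTS = {
    "blocklist_import": 300.0,
    "health_probe": 10.0,
    "geo_cache_flush": 60.0,
}


async def run_with_timeout(name: str, coro: Awaitable[T], timeout: Optional[float] = None) -> Optional[T]:
    """Await coro, cancelling it and logging if it exceeds its time limit."""
    limit = TASK_TIMEOUTS[name] if timeout is None else timeout
    try:
        return await asyncio.wait_for(coro, timeout=limit)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the inner task at this point.
        log.error("task %s exceeded %.1fs timeout and was cancelled", name, limit)
        return None
```

Cancellation arrives inside the task as `asyncio.CancelledError` at its next await, so tasks that hold locks or partial state should release them in a `finally` block.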

Possible traps and issues

Timeout too short → legitimate tasks killed prematurely
Timeout too long → resource leak if many tasks hang
Killing task mid-operation may leave inconsistent state

Docs changes needed

Add section in Docs/Backend-Development.md § Background Tasks

Doc references

Docs/Backend-Development.md (background tasks)
backend/app/tasks/ (task modules)

[CRITICAL] Background tasks not idempotent

Where found

backend/app/tasks/blocklist_import.py — bans applied without checking if already banned
backend/app/tasks/geo_cache_flush.py — cache entries written without transaction
Multi-step operations not wrapped in transaction

Why this is needed

If task crashes mid-execution, partial state remains. On retry: bans applied again → duplicates, cache entries written twice → corruption.

Goal

Make all background tasks idempotent — retrying produces same result as running once.

What to do

Use operation IDs to deduplicate:

operation_id = f"import_{source.id}_{datetime.now().date().isoformat()}"
if await import_log_repo.get_by_operation_id(operation_id):
    return  # Already done

Use transactions for multi-step operations
Store operation state before execution
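
The three points above could be combined as in this sketch, which uses the standard-library sqlite3 module as a stand-in for the real repository layer. The `import_log` table, its columns, and the function names are assumptions for illustration; only the operation-ID format comes from the snippet above.

```python
import sqlite3
from datetime import date
from typing import Callable


def make_operation_id(source_id: int, day: date) -> str:
    """Deterministic key: same source and day always yield the same ID."""
    return f"import_{source_id}_{day.isoformat()}"


def init_log(db: sqlite3.Connection) -> None:
    db.execute(
        "CREATE TABLE IF NOT EXISTS import_log ("
        "operation_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )


def run_once(db: sqlite3.Connection, operation_id: str, work: Callable[[], None]) -> bool:
    """Run work() unless operation_id already completed. Returns True if work ran."""
    row = db.execute(
        "SELECT state FROM import_log WHERE operation_id = ?", (operation_id,)
    ).fetchone()
    if row is not None and row[0] == "completed":
        return False  # already done, skip

    # Record the pending state before executing, so a crash leaves a trace.
    db.execute("INSERT OR REPLACE INTO import_log VALUES (?, 'pending')", (operation_id,))
    db.commit()
    try:
        work()
    except Exception:
        db.execute("UPDATE import_log SET state = 'failed' WHERE operation_id = ?", (operation_id,))
        db.commit()
        raise
    db.execute("UPDATE import_log SET state = 'completed' WHERE operation_id = ?", (operation_id,))
    db.commit()
    return True
```

The log only prevents re-running completed operations. A retry after a crash still re-executes `work()`, so the work itself needs to tolerate partial earlier runs, for example by wrapping its writes in a transaction.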

Possible traps and issues

Idempotency keys must be unique but deterministic
Transactions require database support
State machine (pending → completed/failed) must be enforced

Docs changes needed

Update Docs/Backend-Development.md § Task Idempotency

Doc references

Docs/Backend-Development.md (task design)
backend/app/tasks/ (task implementations)

[CRITICAL] Health check endpoint returns wrong status code

Where found

backend/app/routers/health.py — always returns 200, even when fail2ban offline

Why this is needed

Docker health checks interpret 200 as "healthy". If fail2ban offline but backend returns 200, Docker thinks container healthy and doesn't restart it.

Goal

Return 503 Service Unavailable when fail2ban is offline.

What to do

Change health endpoint to return 503 when offline:

if not server_status.online:
    return JSONResponse(
        status_code=503,
        content={"status": "unavailable", "fail2ban": "offline"}
    )

Update Docker health check to expect 503 as "unhealthy"

Possible traps and issues

Returning 503 causes orchestration tools to restart container
If fail2ban restarts frequently, health check becomes flaky
Consider gradual degradation

Docs changes needed

Update Docker/Dockerfile.backend health check documentation
Update Docs/Deployment.md § Health Checks

Doc references

backend/app/routers/health.py
Docker/Dockerfile.backend

[IMPORTANT] Database transactions lack explicit isolation

Where found

backend/app/repositories/session_repo.py:40-60 — multiple queries without BEGIN TRANSACTION
Similar pattern in multi-step operations across repositories

Why this is needed

Without explicit boundaries, concurrent requests can race: Thread A checks if exists → not found, Thread B checks same → not found, Thread A inserts → succeeds, Thread B inserts → duplicate error or silent overwrite.

Goal

Wrap all multi-step operations in explicit transactions with appropriate isolation level.

What to do

Use explicit BEGIN IMMEDIATE transaction:

await db.execute("BEGIN IMMEDIATE")
try:
    await db.execute("INSERT INTO sessions ...")
    await db.commit()
except Exception:
    await db.rollback()
    raise

Use IMMEDIATE mode to lock immediately for writes
Document transaction boundaries clearly
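
To show the check-then-insert race being closed, here is a sketch using the synchronous sqlite3 module in place of the project's async driver. The `sessions` schema and both function names are assumptions for illustration.

```python
import sqlite3


def open_demo_db(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None turns off the module's implicit BEGIN,
    # so the explicit BEGIN IMMEDIATE below controls the transaction.
    db = sqlite3.connect(path, isolation_level=None)
    db.execute(
        "CREATE TABLE IF NOT EXISTS sessions ("
        "token TEXT PRIMARY KEY, username TEXT NOT NULL)"
    )
    return db


def create_session_if_absent(db: sqlite3.Connection, token: str, username: str) -> bool:
    """Insert a session unless one exists. Returns True if a row was inserted."""
    db.execute("BEGIN IMMEDIATE")  # take the write lock before the existence check
    try:
        exists = db.execute("SELECT 1 FROM sessions WHERE token = ?", (token,)).fetchone()
        if exists is None:
            db.execute(
                "INSERT INTO sessions (token, username) VALUES (?, ?)", (token, username)
            )
        db.execute("COMMIT")
        return exists is None
    except Exception:
        db.execute("ROLLBACK")
        raise
```

Because the write lock is taken before the SELECT, a second writer cannot run its own existence check until the first transaction commits or rolls back.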

Possible traps and issues

Nested transactions (SAVEPOINTs) may be needed
Locks held too long cause contention
Deadlocks possible with concurrent writers

Docs changes needed

Add section in Docs/Backend-Development.md § Database Transactions

Doc references

Docs/Backend-Development.md (database design)

[IMPORTANT] Scheduler lock race condition

Where found

backend/app/utils/scheduler_lock.py:56-58 — heartbeat interval 10 seconds

Why this is needed

Current design: Process A acquires lock, heartbeat misses, lock expires, Process B acquires lock, both running simultaneously → duplicate work, data corruption.

Goal

Implement robust distributed locking that prevents concurrent execution.

What to do

Option A (Strengthen heartbeat):

Reduce interval to 5s (half of timeout)
Use database advisory locks
Monitor heartbeat failures

Option B (Migrate to Redis):

Use redlock-py or aioredis (aioredis has since been merged into redis-py as redis.asyncio)
Simpler, more reliable than database-backed

Current code improvements:

Log when heartbeat fails
Add metric for lock contention
Test multi-process scenario
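
The failure sequence described in this task, and the rule that a worker must stop once its heartbeat fails, can be modelled with a small lease object. This is a sketch with the time passed in explicitly so the scenario is reproducible; `Lease` and its constants are hypothetical, and a real implementation would keep this state in the database and update it atomically.

```python
from dataclasses import dataclass
from typing import Optional

LEASE_TIMEOUT = 10.0      # seconds a lease stays valid without a heartbeat (assumed)
HEARTBEAT_INTERVAL = 5.0  # half the timeout, as in Option A


@dataclass
class Lease:
    owner: Optional[str] = None
    expires_at: float = 0.0

    def acquire(self, owner: str, now: float) -> bool:
        """Take the lease if it is free, expired, or already ours."""
        if self.owner is None or now >= self.expires_at or self.owner == owner:
            self.owner, self.expires_at = owner, now + LEASE_TIMEOUT
            return True
        return False

    def heartbeat(self, owner: str, now: float) -> bool:
        """Extend the lease. False means it was lost and the caller must stop working."""
        if self.owner != owner or now >= self.expires_at:
            return False
        self.expires_at = now + LEASE_TIMEOUT
        return True
```

The key point is the `False` return from `heartbeat`: a process that missed its heartbeats finds out it no longer owns the lease and can abort instead of running alongside the new owner.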

Possible traps and issues

Database locks don't scale under high contention
Redis adds new dependency
Clock skew breaks timestamp-based expiry

Docs changes needed

Update Docs/Deployment.md § Scheduler Lock
Add troubleshooting: "Blocklist import runs twice"

Doc references

Docs/Deployment.md (scheduler)
backend/app/utils/scheduler_lock.py (lock implementation)

[IMPORTANT] API pagination doesn't return metadata

Where found

backend/app/routers/history.py — returns bare list, no pagination metadata
All paginated routers have same issue

Why this is needed

Frontend receives bare list, cannot determine: total results, whether more pages exist, last page number. Must guess or re-query.

Goal

Return pagination metadata with every paginated response.

What to do

Create response wrapper:

class PaginatedResponse(BaseModel):
    data: list[Item]
    pagination: PaginationMetadata

Update all paginated routers to return this wrapper
Update frontend to use metadata for UI
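
The wrapper above leaves `PaginationMetadata` undefined. One possible shape is sketched below using standard-library dataclasses as a stand-in for the Pydantic models; the field names and the `paginate` helper are assumptions, chosen to answer the three questions the frontend cannot answer today (total results, whether more pages exist, last page number).

```python
import math
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class PaginationMetadata:
    page: int          # 1-based page number of this response
    page_size: int
    total_items: int
    total_pages: int   # also the last page number
    has_next: bool


@dataclass(frozen=True)
class PaginatedResponse(Generic[T]):
    data: list[T]
    pagination: PaginationMetadata


def paginate(items: list[T], page: int, page_size: int, total_items: int) -> PaginatedResponse[T]:
    """Wrap one page of items together with the metadata derived from the total count."""
    total_pages = math.ceil(total_items / page_size) if total_items else 0
    meta = PaginationMetadata(
        page=page,
        page_size=page_size,
        total_items=total_items,
        total_pages=total_pages,
        has_next=page < total_pages,
    )
    return PaginatedResponse(data=items, pagination=meta)
```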

Possible traps and issues

SELECT COUNT(*) is slow on large tables
Response shape change — old frontend may not handle

Docs changes needed

Update API documentation § Pagination

Doc references

backend/app/utils/pagination.py

[IMPORTANT] Error response schema inconsistent

Where found

Different handlers return different response shapes
Fail2Ban errors: { "error_code": "...", "detail": "..." }
Validation errors: { "detail": [...] }
Not found errors: { "detail": "...", "error_code": "..." }

Why this is needed

Frontend must normalize multiple shapes, making error handling fragile and error-prone.

Goal

Unify all error responses to single schema.

What to do

Define canonical error response:

class ErrorResponse(BaseModel):
    error_code: str
    message: str
    status: int
    details: dict | None = None

Update all handlers to return this format
Update frontend to expect unified schema

Possible traps and issues

Backward compatibility with old clients
FastAPI's built-in handlers may override custom
Rich detail structures need accommodation

Docs changes needed

Update API documentation with unified error schema
Add error code reference table

Doc references

Docs/API.md (error codes)
backend/app/main.py (exception handlers)

[IMPORTANT] Provider ordering fragility (Frontend)

Where found

frontend/src/App.tsx — 10-level deep provider nesting
frontend/src/providers/PROVIDER_ORDER.md — documents order, no compile-time enforcement

Why this is needed

Provider order (ThemeProvider → AppContents → FluentProvider → ...) enforced only at runtime. Accidental reorder caught only after deploy.

Goal

Add compile-time validation of provider ordering.

What to do

Create provider composition utility enforcing order
Use TypeScript discriminated unions
Add ESLint rule to check provider wrapping

Possible traps and issues

TypeScript doesn't easily enforce ordering
May be overkill — improve runtime error messages instead

Docs changes needed

Update Docs/Architekture.md § 3.2 (Providers)

Doc references

Docs/Architekture.md § 3.2 (Providers)
frontend/src/providers/PROVIDER_ORDER.md

[IMPORTANT] Promise cancellation not checked in .then()/.catch() chains

Where found

frontend/src/components/blocklist/BlocklistSourcesSection.tsx:84-88
frontend/src/components/blocklist/BlocklistScheduleSection.tsx:49-58
Multiple components use this pattern

Why this is needed

When user navigates away, .then() chains don't check if cancelled. State updated on unmounted component → React warnings, memory leak, notification shows wrong context.

Goal

Check for cancellation in all .then()/.catch() chains.

What to do

Replace .then()/.catch() with async/await and cancellation check
Or use wrapper hook to hide logic

Possible traps and issues

Checking signal.aborted after await introduces race conditions
Better: let AbortError propagate, catch it in catch block

Docs changes needed

Update Docs/Web-Development.md § Async Patterns

Doc references

Docs/Web-Development.md (async patterns)

[MEDIUM] Inefficient database pagination uses OFFSET

Where found

backend/app/utils/pagination.py — uses OFFSET (page-1) * page_size

Why this is needed

OFFSET N scans and discards N rows before returning the next page, so cost grows linearly with page depth. Last page on 10M row table: 15 seconds.

Goal

Implement keyset pagination (cursor-based) for large result sets.

What to do

Short-term: Add database indexes on sort columns
Long-term: Implement cursor-based pagination using WHERE instead of OFFSET
Frontend sends cursor (last row ID) instead of page number
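
A keyset query of this kind might look like the following sketch, written against the standard-library sqlite3 module. The `history` table, its columns, and the base64-encoded JSON cursor format are assumptions for illustration; the technique is the WHERE-based seek on the sort column.

```python
import base64
import json
import sqlite3
from typing import Optional


def encode_cursor(last_id: int) -> str:
    """Wrap the last row ID so clients treat the cursor as an opaque string."""
    return base64.urlsafe_b64encode(json.dumps({"id": last_id}).encode()).decode()


def decode_cursor(cursor: str) -> int:
    return int(json.loads(base64.urlsafe_b64decode(cursor.encode()))["id"])


def fetch_page(
    db: sqlite3.Connection, cursor: Optional[str], limit: int
) -> tuple[list[tuple], Optional[str]]:
    """Return (rows, next_cursor); next_cursor is None on the last page."""
    last_id = decode_cursor(cursor) if cursor else 0
    # Seek past the last seen ID via the primary-key index instead of OFFSET.
    # One extra row is fetched to learn whether another page exists.
    rows = db.execute(
        "SELECT id, ip FROM history WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, limit + 1),
    ).fetchall()
    has_more = len(rows) > limit
    rows = rows[:limit]
    next_cursor = encode_cursor(rows[-1][0]) if has_more else None
    return rows, next_cursor
```

Sorting on a unique column keeps the cursor deterministic. Sorting on a non-unique column such as a timestamp would need a tie-breaker, for example `(timestamp, id)`.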

Possible traps and issues

Cursor must be deterministic
API contract changes
Cursor format must be opaque to client

Docs changes needed

Update Docs/Backend-Development.md § Database Performance

Doc references

Docs/Backend-Development.md (database performance)

[MEDIUM] Session secret rotation not implemented

Where found

backend/app/config.py — single session_secret with no rotation support

Why this is needed

If secret leaks, all sessions compromised. No way to invalidate old sessions.

Goal

Support gradual secret rotation without forcing logout.

What to do

Store multiple secrets: current and previous
Accept tokens signed with either key
Re-sign tokens with current secret on validation
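
The three steps above can be sketched with the standard-library hmac module. The `payload.signature` token format and both function names are assumptions for illustration; the actual session token format in BanGUI may differ.

```python
import hashlib
import hmac
from typing import Optional


def sign(payload: str, secret: bytes) -> str:
    """Return payload with an HMAC-SHA256 signature appended."""
    mac = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{mac}"


def verify_and_refresh(token: str, current: bytes, previous: Optional[bytes]) -> Optional[str]:
    """Validate token against either secret.

    Returns a token signed with the current secret, or None if the
    signature matches neither secret.
    """
    payload, sep, mac = token.rpartition(".")
    if not sep:
        return None
    for secret in (current, previous):
        if secret is None:
            continue
        expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
        # Constant-time comparison avoids leaking how many characters matched.
        if hmac.compare_digest(mac.encode(), expected.encode()):
            return token if secret is current else sign(payload, current)
    return None
```

Once the previous secret is removed from configuration (passed as `None` here), tokens signed with it stop validating, which gives a way to invalidate old sessions after a leak.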

Possible traps and issues

Rotation strategy must be documented
Metrics needed to track secret usage

Docs changes needed

Update Docs/Backend-Development.md § Session Management

Doc references

Docs/Backend-Development.md

[MEDIUM] No CORS configuration

Where found

backend/app/main.py — no CORS middleware added

Why this is needed

If frontend on different origin, cross-origin requests blocked without CORS configuration.

Goal

Add CORS middleware with proper origin whitelisting.

What to do

Add CORS middleware with specific origin whitelist
Make configurable via environment variable
Default to localhost for development

Possible traps and issues

allow_origins=["*"] defeats CORS security
Credentials require specific origins, not wildcard
Missing config silently fails in browser

Docs changes needed

Update Docs/Deployment.md § CORS Configuration

Doc references

Docs/Deployment.md

[MEDIUM] Input validation missing for regex patterns (ReDoS)

Where found

backend/app/routers/config.py — regex validation accepts arbitrary patterns without timeout

Why this is needed

Malicious regex causes catastrophic backtracking (ReDoS). Attacker sends pattern → matching it against input hangs → DoS.

Goal

Add timeout and complexity limits to regex validation.

What to do

Add timeout to regex compilation and trial matching (2 seconds recommended)
Add length limit (reject patterns > 1000 characters)
Use signal.alarm() (Unix) or timeout library

Possible traps and issues

signal.alarm() is Unix-only and only works in the main thread
Some valid complex regexes may time out
Frontend should also validate (defense in depth)

Docs changes needed

Update API docs to document regex validation limits

Doc references

backend/app/routers/config.py

[MEDIUM] No structured logging to external system

Where found

Logs only go to stdout/file, no external aggregation

Why this is needed

Logs can't be searched across instances, and historical logs are lost when an instance is recycled.

Goal

Ship logs to centralized logging platform.

What to do

Short-term: Ensure structlog JSON output is valid (already done)
Long-term: Ship to logging platform (ELK, Datadog, Papertrail)

Possible traps and issues

External logging adds latency
Sensitive data must not be logged
Log volume can be massive

Docs changes needed

Add Docs/Observability.md section on logging

Doc references

Docs/Observability.md (new)

[MEDIUM] No Application Performance Monitoring (APM)

Where found

Backend: no metrics collection or latency tracking
Frontend: no error tracking or performance metrics
No observability into request performance

Why this is needed

Without metrics, we are blind in production: Is the API slow? Unknown. Which endpoints fail most? Unknown.

Goal

Add comprehensive metrics collection and monitoring.

What to do

Backend metrics:
- Add Prometheus metrics: request count, latency, active requests
- Expose /metrics endpoint
Frontend metrics:
- Page load time, FCP, LCP using web-vitals
- API error rates and latencies
Aggregation:
- Prometheus + Grafana, or Datadog/NewRelic
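
To make the backend part concrete, here is a stdlib-only stand-in that tracks the three quantities listed above and renders them in the Prometheus text format. It is a sketch, not a replacement for a real client library: the `RequestMetrics` class and the metric names are hypothetical, and the counters are not thread-safe.

```python
import time
from collections import Counter, defaultdict
from contextlib import contextmanager


class RequestMetrics:
    """Tracks request count, total latency, and active requests per route."""

    def __init__(self) -> None:
        self.count = Counter()
        self.seconds = defaultdict(float)
        self.active = 0

    @contextmanager
    def track(self, method: str, route: str):
        """Wrap one request; `route` should be the template, not the raw URL."""
        self.active += 1
        start = time.perf_counter()
        try:
            yield
        finally:
            self.active -= 1
            self.count[(method, route)] += 1
            self.seconds[(method, route)] += time.perf_counter() - start

    def render(self) -> str:
        """Return the metrics in Prometheus text exposition format."""
        lines = ["# TYPE http_requests_total counter"]
        for (method, route), n in sorted(self.count.items()):
            labels = f'method="{method}",route="{route}"'
            lines.append(f"http_requests_total{{{labels}}} {n}")
        lines.append("# TYPE http_request_seconds_total counter")
        for (method, route), total in sorted(self.seconds.items()):
            labels = f'method="{method}",route="{route}"'
            lines.append(f"http_request_seconds_total{{{labels}}} {total:.6f}")
        lines.append("# TYPE http_requests_active gauge")
        lines.append(f"http_requests_active {self.active}")
        return "\n".join(lines) + "\n"
```

Labelling by route template rather than by concrete path keeps the number of time series bounded, which is the cardinality concern listed under the traps.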

Possible traps and issues

Metrics collection has a performance cost
Cardinality explosion when labels take many distinct values
PII must not end up in metrics

Docs changes needed

Add Docs/Observability.md

Doc references

Docs/Observability.md (new)

[LOW] Frontend charts not memoized

Where found

frontend/src/components/TopCountriesPieChart.tsx
frontend/src/components/TopCountriesBarChart.tsx

Why this is needed

Charts re-render on every parent update, so Recharts reprocesses 5000+ points each time.

Goal

Memoize chart components.

What to do

Wrap each chart component in React.memo with a custom comparison function
Ensure the data objects passed as props are referentially stable

Possible traps and issues

Shallow comparison might not be enough
Memoization has memory cost

Docs changes needed

No documentation changes

Doc references

frontend/src/components/TopCountriesChart.tsx

[LOW] No request deduplication on frontend

Where found

frontend/src/hooks/useFetchData.ts — each call launches new request
User clicks "Refresh" twice → two identical requests

Why this is needed

Duplicate requests waste bandwidth and can race: if response 2 arrives first, response 1 then overwrites it with stale data.

Goal

Deduplicate identical in-flight requests.

What to do

Implement a cache of in-flight requests
Clear the cache entry when the response is received
Use the cache in useFetchData

Possible traps and issues

Cache must be cleared on data mutation
Stale data in cache possible if not careful

Docs changes needed

No documentation changes

Doc references

frontend/src/hooks/useFetchData.ts
