Files
BanGUI/Docs/Tasks.md
Lukas 2db635ae19 Fix exception handler overlap issue - add DomainError catch-all handler
**Problem:** Broad exception handlers created fragility where adding a new
DomainError subclass without explicit registration would silently fall through
to the generic exception handler, losing the specific error_code and metadata.

**Solution:**
1. Import DomainError in main.py for explicit handler registration
2. Fix type hints in exception handlers from 'Exception' to specific types
   - NotFoundError handler now typed as 'NotFoundError'
   - BadRequestError handler now typed as 'BadRequestError'
   - ConflictError handler now typed as 'ConflictError'
   - DomainError handler now typed as 'DomainError'
   - ServiceUnavailableError handler now typed as 'ServiceUnavailableError'
3. Add DomainError as an explicit catch-all handler in the registration chain
   - Positioned after specific handlers, before HTTPException
   - Any unregistered DomainError subclass now gets correct error_code + metadata
4. Document the exception handler hierarchy with detailed comments
5. Update Backend-Development.md with handler hierarchy documentation
6. Update Architekture.md section 2.2 with exception handler details
7. Fix test expectations in test_main.py to verify ErrorResponse format

**Impact:** Any new DomainError subclass now automatically gets correct HTTP 500
status, error_code, and metadata - even if developer forgets explicit handler.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 19:44:43 +02:00

36 KiB

[Backend] Exception handler overlap — broad handlers catching everything

Where found

  • backend/app/main.py:182-200_get_error_code() accepts any Exception and falls back to snake_case conversion
  • Multiple handlers (lines 329-466) accept Exception as parameter

Why this is needed

Broad exception handlers create fragility: adding a new DomainError subclass without explicitly registering a handler silently falls through, producing generic error codes instead of specific ones. The fallback chain is not explicitly documented.

Goal

Make the exception handler registration explicit and documented. Every exception type that can bubble up should have a clear path to a handler.

What to do

  1. Audit all exception handlers and confirm they are registered with the most specific base class
  2. Add a comment block documenting the fallback chain
  3. Ensure every custom DomainError subclass has error_code and get_error_metadata() implemented
  4. Add a catch-all Exception handler as the absolute last resort

Possible traps and issues

  • If a new DomainError subclass is added without handler registration, it silently returns wrong status code
  • ValueError handler may catch Pydantic ValidationError subclasses

Docs changes needed

  • Update Docs/Architekture.md § 2.2 (Application Entry Point) — document exception handler hierarchy
  • Add section in Docs/Backend-Development.md on exception taxonomy

Doc references

  • Docs/Architekture.md § 2.2 (Application Entry Point)
  • Docs/Backend-Development.md (exception conventions)

[Backend] Login rate limiter — penalty sleep does not block the request

Where found

  • backend/app/routers/auth.py:82-107 — rate limiter check happens before password verification, penalty sleep happens after

Why this is needed

The current design means attackers who stay under 5 requests/minute get no penalty at all. The asyncio.sleep only fires after the rate limit is already exceeded, significantly weakening the limiter's effectiveness.

Goal

Ensure the rate limiter blocks requests before the password check is attempted. Each wrong password should incur a progressive delay.

What to do

  1. Remove the acquire/release pattern
  2. Change flow so record_failure is called on every wrong password and is_allowed returns False when limit exceeded
  3. Implement exponential backoff: penalty = min(base_delay * (2 ** failure_count), max_delay)
  4. Consider using a token bucket rather than sliding window
  5. Ensure is_allowed uses the failure count atomically

Possible traps and issues

  • If asyncio.sleep is called before password check, legitimate users experience latency on response
  • Keep maximum penalty reasonable (2-5 seconds)
  • record_failure counter must be stored durably (in-memory is fine for single-worker)

Docs changes needed

  • Update Docs/Architekture.md § 2.2 (auth router) — reflect rate limiting behavior
  • Add note in Docs/Backend-Development.md about rate limiter design

Doc references

  • Docs/Architekture.md § 2.2 (auth router)
  • backend/app/routers/auth.py (login endpoint)

[Backend] Module-level imports inside dependency provider functions

Where found

  • backend/app/dependencies.py:151from app.db import open_db inside get_db()
  • backend/app/dependencies.py:258, 270, 282, 294, 307, 319, 332 — local imports inside each get_<repo>() provider
  • Similar patterns in service provider functions

Why this is needed

FastAPI calls dependency provider functions on every request. While Python caches modules, the import statement still has overhead. More importantly, the pattern is inconsistent — some providers have module-level imports while others have local imports, making it unclear which providers are safe to call at high frequency.

Goal

Move all repository and service imports to module level in dependencies.py.

What to do

  1. Move all repository imports to module level near the top
  2. Similarly move from app.services import health_service to module level
  3. Keep from app.db import open_db as local import (only needed within get_db())
  4. Move from app.services import auth_service to module level with TYPE_CHECKING guard if circular deps prevent it

Possible traps and issues

  • Moving imports to module level may expose hidden circular imports
  • Current local-import pattern was likely chosen to avoid circular deps — verify no circular dependencies before making change

Docs changes needed

  • Update Docs/Architekture.md § 2.3 (Dependency Wiring) — repository and service imports should be at module level

Doc references

  • Docs/Architekture.md § 2.3 (Dependency Wiring)
  • backend/app/dependencies.py (composition root)

[Backend] get_password_hash lives in setup_service but is used by auth_service

Where found

  • backend/app/services/auth_service.py:27 — imports get_password_hash from setup_service

Why this is needed

auth_service.py handles all authentication concerns and is the natural home for password hash retrieval. Having it import from setup_service creates incorrect semantic dependency and unnecessary coupling.

Goal

Move get_password_hash to auth_service.py where it belongs, or to a shared app/utils/crypto.py module.

What to do

  1. Move the get_password_hash function body from app/services/setup_service.py to app/services/auth_service.py
  2. Update the import in auth_service.py to use the local function
  3. Search all of backend/app/ for usages and update imports
  4. Remove the function from setup_service.py

Possible traps and issues

  • Search thoroughly for all usages before removing — may be imported in tests, other services, setup router
  • If used during initial setup flow, update that usage to import from auth_service instead

Docs changes needed

  • No documentation changes required — internal code reorganization

Doc references

  • backend/app/services/auth_service.py
  • backend/app/services/setup_service.py

[Backend] re module imported inside function body

Where found

  • backend/app/main.py:198-199import re inside _get_error_code()

Why this is needed

Importing inside a function is a code smell. Standard practice is to import modules at the top of the file.

Goal

Move import re to the module-level imports at the top of main.py.

What to do

  1. Add import re to existing imports section at top of backend/app/main.py
  2. Remove import re line from inside _get_error_code()

Possible traps and issues

  • None — straightforward refactoring with no behavioral change

Docs changes needed

  • No documentation changes needed

Doc references

  • backend/app/main.py

[Frontend] AuthProvider sessionStorage not synchronized across tabs

Where found

  • frontend/src/providers/AuthProvider.tsx — uses sessionStorage to persist isAuthenticated across page refreshes
  • sessionStorage is tab-scoped — not shared across browser tabs

Why this is needed

If a user logs out in Tab A, Tab B's sessionStorage still holds isAuthenticated: true. Tab B continues showing authenticated UI until next full page refresh or 401 error.

Goal

Ensure logout state is propagated across all open tabs immediately.

What to do

  1. Add storage event listener in AuthProvider to receive logout events from other tabs
  2. When logout() is called, clear sessionStorage explicitly
  3. Consider also using BroadcastChannel API for real-time cross-tab sync

Possible traps and issues

  • storage event only fires when sessionStorage changed in another tab
  • StorageEvent doesn't fire if cookies are sole authentication mechanism
  • Very old browsers don't support StorageEvent — fallback to full-page refresh

Docs changes needed

  • Update Docs/Architekture.md § 3.2 (AuthProvider) — document cross-tab synchronization
  • Add note in Docs/Web-Development.md about authentication state management

Doc references

  • Docs/Architekture.md § 3.2 (Providers section)
  • frontend/src/providers/AuthProvider.tsx

[Frontend] usePolledData — setInterval without drift correction

Where found

  • frontend/src/hooks/usePolledData.ts — uses setInterval without tracking expected next poll time

Why this is needed

With fixed setInterval, if refetch takes time (e.g., 2 seconds for slow API) and pollInterval is 5 seconds, the actual polling pattern accumulates drift. Effective polling interval becomes > 5 seconds, wasting bandwidth and CPU.

Goal

Implement drift-corrected polling that schedules next poll relative to completion of previous poll, not its start.

What to do

  1. Replace setInterval with self-scheduling timeout
  2. Track elapsed time between poll start and completion
  3. Schedule next poll with compensation: delay = Math.max(0, pollInterval - elapsed)
  4. Alternatively use use-sse or custom hook useDriftCorrectedPolling

Possible traps and issues

  • Cancellation check must happen both before and after await refetch() to prevent race conditions
  • Coordinate cancellation with AbortController already used by useFetchData
  • If pollInterval changes, hook must restart polling loop

Docs changes needed

  • Update frontend/src/hooks/README.md — document drift-corrected polling behavior

Doc references

  • frontend/src/hooks/usePolledData.ts
  • frontend/src/hooks/README.md

[Frontend] ErrorBoundary — non-standard componentStack property

Where found

  • frontend/src/components/ErrorBoundary.tsx — accesses errorInfo.componentStack
  • Property not in React.ErrorInfo public type definitions

Why this is needed

Accessing errorInfo.componentStack produces TypeScript error under strict mode. Property is React DevTools implementation detail, may change without notice.

Goal

Remove dependency on errorInfo.componentStack, use alternative for stack information.

What to do

  1. Check what recordCritical does with componentStack parameter
  2. Consider using standard Error.prototype.stack instead
  3. Or cast errorInfo to any with comment explaining trade-off:
    const stack = (errorInfo as any).componentStack as string | undefined;
    // TODO: Remove when React types include componentStack
    
  4. Handle case where componentStack is undefined

Possible traps and issues

  • Error.prototype.stack contains JavaScript call stack, not React component stack
  • Casting to any suppresses type checking — ensure it's documented
  • In production builds componentStack may be empty or absent

Docs changes needed

  • Update Docs/Architekture.md § 3.2 (Components) — note that ErrorBoundary uses React-internal error info

Doc references

  • frontend/src/components/ErrorBoundary.tsx
  • Docs/Architekture.md § 3.2 (Components section)

[Frontend] No loading skeleton for Login page

Where found

  • frontend/src/pages/LoginPage.tsx — renders password form immediately without checking session validation state

Why this is needed

AuthProvider validates session on mount. During validation (network round-trip), LoginPage renders form. If user already has valid session, they briefly see login page before redirect — "flash" of wrong page.

Goal

Show loading state (spinner or skeleton) while session validation in progress.

What to do

  1. Add useSessionValidation call to LoginPage to get isValidating flag
  2. If isValidating, return <Spinner label="Checking session..." />
  3. Alternatively, have AuthProvider expose isValidating state that LoginPage can read
  4. Ensure redirect to dashboard happens automatically via useEffect

Possible traps and issues

  • useSessionValidation may not expose isValidating — check hook implementation
  • Ensure loading spinner is consistent with Fluent UI design language
  • Redirect logic must handle ?next= query parameter

Docs changes needed

  • No documentation changes needed — UX improvement to existing component

Doc references

  • frontend/src/pages/LoginPage.tsx
  • frontend/src/providers/AuthProvider.tsx

[CRITICAL] Backend session cache not cluster-safe

Where found

  • backend/app/startup.py:66-91verify_bangui_workers() validates BANGUI_WORKERS env var but only if explicitly set
  • backend/app/utils/session_cache.py — in-memory cache stored on app.state (process-local)
  • Multi-worker deployments use separate worker processes with separate app.state

Why this is needed

Multi-worker deployments (e.g., gunicorn -w 4) create separate worker processes. Each has its own in-memory app.state, so sessions cached in Worker A are invisible to Workers B, C, D. Request landing on different worker shows session invalid — users randomly logged out.

Goal

Ensure multi-worker deployments either prevented at startup or migrate to shared session store (Redis, PostgreSQL).

What to do

Option A (Strict single-worker):

  • Add runtime detection of actual worker count (not just env var checking)
  • Fail startup if multi-worker scenario detected
  • Document single-worker requirement prominently

Option B (Support multi-worker):

  • Migrate session cache to Redis or PostgreSQL
  • Update app/utils/session_cache.py to use distributed backend
  • Add BANGUI_REDIS_URL or similar to config

Immediate mitigation:

  • Document in deployment guide that BANGUI_WORKERS=1 is mandatory
  • Add validation that fails loudly if BANGUI_WORKERS != 1

Possible traps and issues

  • uvicorn --workers N creates N processes, each with separate app.state
  • Environment variable validation easily bypassed using command-line flag
  • Detecting actual worker count at runtime is tricky — consider os.getpid() and shared lock file

Docs changes needed

  • Update Docs/Deployment.md § Deployment Constraints — explicitly document single-worker requirement
  • Add troubleshooting entry: "Why am I randomly logged out?" → Check BANGUI_WORKERS
  • Update Docker/docker-compose.yml with comment explaining requirement

Doc references

  • Docs/Deployment.md (deployment constraints)
  • backend/app/startup.py (worker validation)
  • backend/app/utils/session_cache.py (session cache)

[CRITICAL] Frontend-Backend type generation drift

Where found

  • frontend/src/types/ — manually maintained Pydantic model transcriptions
  • backend/app/models/ — source Pydantic models
  • No build-time validation or code generation linking them

Why this is needed

Types manually copied from Python models. When backend model changes, frontend types not updated automatically. Frontend uses stale types, causing runtime errors, type errors at build time, or silent UI bugs.

Goal

Implement automated type synchronization from backend schema to frontend types, validated at build time.

What to do

  1. Recommended: Generate TypeScript types from OpenAPI schema

    openapi-typescript http://localhost:8000/openapi.json -o src/types/generated.ts
    
  2. Setup:

    • Ensure backend exposes OpenAPI schema at /openapi.json (FastAPI has this built-in)
    • Add openapi-typescript to frontend/package.json devDependencies
    • Generate types in pre-build step: npm run generate:types
    • Import types from generated file, not hand-written
    • Add CI check: fail build if generated types don't match committed types
  3. Alternative: Use typed-rest-client or msw with type generation

Possible traps and issues

  • OpenAPI schema must be kept up-to-date — CI validation must enforce
  • Generated types may have different names than hand-written types — migration needed
  • Private/internal model fields should be excluded from schema

Docs changes needed

  • Update Docs/Web-Development.md § Type Generation — document workflow
  • Add pre-commit hook documentation
  • Update Docs/Backend-Development.md to note API changes must keep schema in sync

Doc references

  • Docs/Web-Development.md (type generation)
  • Docs/Backend-Development.md (API changes)

[CRITICAL] Docker containers lack resource limits

Where found

  • Docker/docker-compose.yml — no deploy.limits or deploy.reservations sections

Why this is needed

Without resource limits, single container can consume all host CPU, memory, disk. "Noisy neighbor" scenario where backend memory leak → uses 100% RAM → OOM kill → host unresponsive.

Goal

Set hard and soft resource limits for all containers.

What to do

  1. Add resource limits to docker-compose.yml:

    backend:
      deploy:
        limits:
          cpus: '2'
          memory: 512M
        reservations:
          cpus: '1'
          memory: 256M
    
  2. Document these limits in Docs/Deployment.md

  3. For Kubernetes, add equivalent resources.limits and resources.requests

Possible traps and issues

  • Limits set too low → OOM kill or throttling
  • Backend may need more memory for large blocklists
  • Test under expected load before finalizing
  • Different environments may need different limits

Docs changes needed

  • Update Docker/docker-compose.yml with deploy sections
  • Add section in Docs/Deployment.md § Resource Allocation

Doc references

  • Docker/docker-compose.yml
  • Docs/Deployment.md (resource allocation)

[CRITICAL] Global rate limiting missing

Where found

  • backend/app/routers/auth.py — only /api/auth/login has rate limiting
  • All other routers have no rate limiting

Why this is needed

Without rate limiting, attackers can spam endpoints to cause CPU spike, database overload, or network bandwidth exhaustion.

Goal

Implement global per-IP rate limiting on all endpoints.

What to do

  1. Add rate limiting middleware to backend/app/main.py:

    from slowapi import Limiter
    limiter = Limiter(key_func=get_remote_address, default_limits=["200 per minute"])
    app.state.limiter = limiter
    
  2. Apply to all routers with appropriate limits per endpoint

  3. Return proper HTTP 429 with Retry-After header

  4. Document limits in API docs

Possible traps and issues

  • Limits set too low block legitimate users
  • Distributed deployments need shared limiter state (Redis-backed)
  • Different endpoints may need different limits
  • Trusted IPs should bypass limiting

Docs changes needed

  • Add section in Docs/Backend-Development.md § Rate Limiting
  • Document default limits in deployment guide

Doc references

  • Docs/Backend-Development.md (rate limiting)
  • backend/app/main.py (middleware setup)

[CRITICAL] Missing security headers (CSP, X-Frame-Options, etc.)

Where found

  • Backend does not set Content-Security-Policy, X-Frame-Options, X-Content-Type-Options headers
  • Frontend HTML served without CSP meta tags

Why this is needed

Without security headers, browsers won't protect against XSS, clickjacking, MIME-sniffing, referrer leakage attacks.

Goal

Add security headers to all HTTP responses.

What to do

  1. Add security headers middleware to backend/app/main.py:

    @app.middleware("http")
    async def add_security_headers(request, call_next):
        response = await call_next(request)
        response.headers["Content-Security-Policy"] = "default-src 'self'"
        response.headers["X-Frame-Options"] = "DENY"
        response.headers["X-Content-Type-Options"] = "nosniff"
        return response
    
  2. In frontend index.html, add CSP meta tag

  3. Test with browser DevTools Security tab

Possible traps and issues

  • CSP 'unsafe-inline' defeats security — avoid if possible
  • CDN resources may need explicit allowlist
  • Too restrictive CSP breaks functionality; too loose defeats security

Docs changes needed

  • Add section in Docs/Security.md § HTTP Security Headers

Doc references

  • Docs/Security.md (security headers)

[CRITICAL] Background tasks lack timeout protection

Where found

  • backend/app/tasks/blocklist_import.py — no timeout
  • backend/app/tasks/health_check.py — no timeout
  • All task functions lack timeout wrapper

Why this is needed

If task hangs (API unreachable, network partition), task runs forever. Never completes → lock never released → duplicate work, resource exhaustion.

Goal

Ensure all background tasks complete within bounded time or fail gracefully.

What to do

  1. Wrap all task functions with asyncio.wait_for(task, timeout):

    await asyncio.wait_for(blocklist_service.import_all(...), timeout=300)
    
  2. Set appropriate timeouts per task:

    • Blocklist import: 300s (5 min)
    • Health probe: 10s
    • Geo cache flush: 60s
  3. Log timeout events and trigger alerts

Possible traps and issues

  • Timeout too short → legitimate tasks killed prematurely
  • Timeout too long → resource leak if many tasks hang
  • Killing task mid-operation may leave inconsistent state

Docs changes needed

  • Add section in Docs/Backend-Development.md § Background Tasks

Doc references

  • Docs/Backend-Development.md (background tasks)
  • backend/app/tasks/ (task modules)

[CRITICAL] Background tasks not idempotent

Where found

  • backend/app/tasks/blocklist_import.py — bans applied without checking if already banned
  • backend/app/tasks/geo_cache_flush.py — cache entries written without transaction
  • Multi-step operations not wrapped in transaction

Why this is needed

If task crashes mid-execution, partial state remains. On retry: bans applied again → duplicates, cache entries written twice → corruption.

Goal

Make all background tasks idempotent — retrying produces same result as running once.

What to do

  1. Use operation IDs to deduplicate:

    operation_id = f"import_{source.id}_{datetime.now().date().isoformat()}"
    if await import_log_repo.get_by_operation_id(operation_id):
        return  # Already done
    
  2. Use transactions for multi-step operations

  3. Store operation state before execution

Possible traps and issues

  • Idempotency keys must be unique but deterministic
  • Transactions require database support
  • State machine (pending → completed/failed) must be enforced

Docs changes needed

  • Update Docs/Backend-Development.md § Task Idempotency

Doc references

  • Docs/Backend-Development.md (task design)
  • backend/app/tasks/ (task implementations)

[CRITICAL] Health check endpoint returns wrong status code

Where found

  • backend/app/routers/health.py — always returns 200, even when fail2ban offline

Why this is needed

Docker health checks interpret 200 as "healthy". If fail2ban offline but backend returns 200, Docker thinks container healthy and doesn't restart it.

Goal

Return 503 Service Unavailable when fail2ban is offline.

What to do

  1. Change health endpoint to return 503 when offline:

    if not server_status.online:
        return JSONResponse(
            status_code=503,
            content={"status": "unavailable", "fail2ban": "offline"}
        )
    
  2. Update Docker health check to expect 503 as "unhealthy"

Possible traps and issues

  • Returning 503 causes orchestration tools to restart container
  • If fail2ban restarts frequently, health check becomes flaky
  • Consider gradual degradation

Docs changes needed

  • Update Docker/Dockerfile.backend health check documentation
  • Update Docs/Deployment.md § Health Checks

Doc references

  • backend/app/routers/health.py
  • Docker/Dockerfile.backend

[IMPORTANT] Database transactions lack explicit isolation

Where found

  • backend/app/repositories/session_repo.py:40-60 — multiple queries without BEGIN TRANSACTION
  • Similar pattern in multi-step operations across repositories

Why this is needed

Without explicit boundaries, concurrent requests can race: Thread A checks if exists → not found, Thread B checks same → not found, Thread A inserts → succeeds, Thread B inserts → duplicate error or silent overwrite.

Goal

Wrap all multi-step operations in explicit transactions with appropriate isolation level.

What to do

  1. Use explicit BEGIN IMMEDIATE transaction:

    await db.execute("BEGIN IMMEDIATE")
    try:
        await db.execute("INSERT INTO sessions ...")
        await db.commit()
    except Exception:
        await db.rollback()
        raise
    
  2. Use IMMEDIATE mode to lock immediately for writes

  3. Document transaction boundaries clearly

Possible traps and issues

  • Nested transactions (SAVEPOINTs) may be needed
  • Locks held too long cause contention
  • Deadlocks possible with concurrent writers

Docs changes needed

  • Add section in Docs/Backend-Development.md § Database Transactions

Doc references

  • Docs/Backend-Development.md (database design)

[IMPORTANT] Scheduler lock race condition

Where found

  • backend/app/utils/scheduler_lock.py:56-58 — heartbeat interval 10 seconds

Why this is needed

Current design: Process A acquires lock, heartbeat misses, lock expires, Process B acquires lock, both running simultaneously → duplicate work, data corruption.

Goal

Implement robust distributed locking that prevents concurrent execution.

What to do

Option A (Strengthen heartbeat):

  • Reduce interval to 5s (half of timeout)
  • Use database advisory locks
  • Monitor heartbeat failures

Option B (Migrate to Redis):

  • Use redlock-py or aioredis
  • Simpler, more reliable than database-backed

Current code improvements:

  • Log when heartbeat fails
  • Add metric for lock contention
  • Test multi-process scenario

Possible traps and issues

  • Database locks don't scale under high contention
  • Redis adds new dependency
  • Clock skew breaks timestamp-based expiry

Docs changes needed

  • Update Docs/Deployment.md § Scheduler Lock
  • Add troubleshooting: "Blocklist import runs twice"

Doc references

  • Docs/Deployment.md (scheduler)
  • backend/app/utils/scheduler_lock.py (lock implementation)

[IMPORTANT] API pagination doesn't return metadata

Where found

  • backend/app/routers/history.py — returns bare list, no pagination metadata
  • All paginated routers have same issue

Why this is needed

Frontend receives bare list, cannot determine: total results, whether more pages exist, last page number. Must guess or re-query.

Goal

Return pagination metadata with every paginated response.

What to do

  1. Create response wrapper:

    class PaginatedResponse(BaseModel):
        data: list[Item]
        pagination: PaginationMetadata
    
  2. Update all paginated routers to return this wrapper

  3. Update frontend to use metadata for UI

Possible traps and issues

  • SELECT COUNT(*) is slow on large tables
  • Response shape change — old frontend may not handle

Docs changes needed

  • Update API documentation § Pagination

Doc references

  • backend/app/utils/pagination.py

[IMPORTANT] Error response schema inconsistent

Where found

  • Different handlers return different response shapes
  • Fail2Ban errors: { "error_code": "...", "detail": "..." }
  • Validation errors: { "detail": [...] }
  • Not found errors: { "detail": "...", "error_code": "..." }

Why this is needed

Frontend must normalize multiple shapes, making error handling fragile and error-prone.

Goal

Unify all error responses to single schema.

What to do

  1. Define canonical error response:

    class ErrorResponse(BaseModel):
        error_code: str
        message: str
        status: int
        details: dict | None = None
    
  2. Update all handlers to return this format

  3. Update frontend to expect unified schema

Possible traps and issues

  • Backward compatibility with old clients
  • FastAPI's built-in handlers may override custom
  • Rich detail structures need accommodation

Docs changes needed

  • Update API documentation with unified error schema
  • Add error code reference table

Doc references

  • Docs/API.md (error codes)
  • backend/app/main.py (exception handlers)

[IMPORTANT] Provider ordering fragility (Frontend)

Where found

  • frontend/src/App.tsx — 10-level deep provider nesting
  • frontend/src/providers/PROVIDER_ORDER.md — documents order, no compile-time enforcement

Why this is needed

Provider order (ThemeProvider → AppContents → FluentProvider → ...) enforced only at runtime. Accidental reorder caught only after deploy.

Goal

Add compile-time validation of provider ordering.

What to do

  1. Create provider composition utility enforcing order
  2. Use TypeScript discriminated unions
  3. Add ESLint rule to check provider wrapping

Possible traps and issues

  • TypeScript doesn't easily enforce ordering
  • May be overkill — improve runtime error messages instead

Docs changes needed

  • Update Docs/Architekture.md § 3.2 (Providers)

Doc references

  • Docs/Architekture.md § 3.2 (Providers)
  • frontend/src/providers/PROVIDER_ORDER.md

[IMPORTANT] Promise cancellation not checked in .then()/.catch() chains

Where found

  • frontend/src/components/blocklist/BlocklistSourcesSection.tsx:84-88
  • frontend/src/components/blocklist/BlocklistScheduleSection.tsx:49-58
  • Multiple components use this pattern

Why this is needed

When user navigates away, .then() chains don't check if cancelled. State updated on unmounted component → React warnings, memory leak, notification shows wrong context.

Goal

Check for cancellation in all .then()/.catch() chains.

What to do

  1. Replace .then()/.catch() with async/await and cancellation check
  2. Or use wrapper hook to hide logic

Possible traps and issues

  • Checking signal.aborted after await introduces race conditions
  • Better: let AbortError propagate, catch it in catch block

Docs changes needed

  • Update Docs/Web-Development.md § Async Patterns

Doc references

  • Docs/Web-Development.md (async patterns)

[MEDIUM] Inefficient database pagination uses OFFSET

Where found

  • backend/app/utils/pagination.py — uses OFFSET (page-1) * page_size

Why this is needed

OFFSET scans and discards N rows to fetch N+limit. Last page on 10M row table: 15 seconds ⚠️

Goal

Implement keyset pagination (cursor-based) for large result sets.

What to do

  1. Short-term: Add database indexes on sort columns
  2. Long-term: Implement cursor-based pagination using WHERE instead of OFFSET
  3. Frontend sends cursor (last row ID) instead of page number

Possible traps and issues

  • Cursor must be deterministic
  • API contract changes
  • Cursor format must be opaque to client

Docs changes needed

  • Update Docs/Backend-Development.md § Database Performance

Doc references

  • Docs/Backend-Development.md (database performance)

[MEDIUM] Session secret rotation not implemented

Where found

  • backend/app/config.py — single session_secret with no rotation support

Why this is needed

If secret leaks, all sessions compromised. No way to invalidate old sessions.

Goal

Support gradual secret rotation without forcing logout.

What to do

  1. Store multiple secrets: current and previous
  2. Accept tokens signed with either key
  3. Re-sign tokens with current secret on validation

Possible traps and issues

  • Rotation strategy must be documented
  • Metrics needed to track secret usage

Docs changes needed

  • Update Docs/Backend-Development.md § Session Management

Doc references

  • Docs/Backend-Development.md

[MEDIUM] No CORS configuration

Where found

  • backend/app/main.py — no CORS middleware added

Why this is needed

If frontend on different origin, cross-origin requests blocked without CORS configuration.

Goal

Add CORS middleware with proper origin whitelisting.

What to do

  1. Add CORS middleware with specific origin whitelist
  2. Make configurable via environment variable
  3. Default to localhost for development

Possible traps and issues

  • allow_origins=["*"] defeats CORS security
  • Credentials require specific origins, not wildcard
  • Missing config silently fails in browser

Docs changes needed

  • Update Docs/Deployment.md § CORS Configuration

Doc references

  • Docs/Deployment.md

[MEDIUM] Input validation missing for regex patterns (ReDoS)

Where found

  • backend/app/routers/config.py — regex validation accepts arbitrary patterns without timeout

Why this is needed

Malicious regex causes catastrophic backtracking (ReDoS). Attacker sends pattern → compilation hangs → DoS.

Goal

Add timeout and complexity limits to regex validation.

What to do

  1. Add timeout to regex compilation (2 seconds recommended)
  2. Add length limit (reject patterns > 1000 characters)
  3. Use signal.alarm() (Unix) or timeout library

Possible traps and issues

  • signal.alarm() Unix-only
  • Some valid complex regexes may timeout
  • Frontend should also validate (defense in depth)

Docs changes needed

  • Update API docs to document regex validation limits

Doc references

  • backend/app/routers/config.py

[MEDIUM] No structured logging to external system

Where found

  • Logs only go to stdout/file, no external aggregation

Why this is needed

Can't search across instances, historical logs lost on instance recycle.

Goal

Ship logs to centralized logging platform.

What to do

  1. Short-term: Ensure structlog JSON output is valid (already done)
  2. Long-term: Ship to logging platform (ELK, Datadog, Papertrail)

Possible traps and issues

  • External logging adds latency
  • Sensitive data must not be logged
  • Log volume can be massive

Docs changes needed

  • Add Docs/Observability.md section on logging

Doc references

  • Docs/Observability.md (new)

[MEDIUM] No Application Performance Monitoring (APM)

Where found

  • Backend: no metrics collection, latency tracking
  • Frontend: no error tracking, performance metrics
  • No observability into request performance

Why this is needed

Without metrics, blind in production: API slow? Unknown. Which endpoints fail most? Unknown.

Goal

Add comprehensive metrics collection and monitoring.

What to do

  1. Backend metrics:

    • Add Prometheus metrics: request count, latency, active requests
    • Expose /metrics endpoint
  2. Frontend metrics:

    • Page load time, FCP, LCP using web-vitals
    • API error rates and latencies
  3. Aggregation:

    • Prometheus + Grafana, or Datadog/NewRelic

Possible traps and issues

  • Metrics collection has performance cost
  • Cardinality explosion with tags
  • PII in metrics

Docs changes needed

  • Add Docs/Observability.md

Doc references

  • Docs/Observability.md (new)

[LOW] Frontend charts not memoized

Where found

  • frontend/src/components/TopCountriesPieChart.tsx
  • frontend/src/components/TopCountriesBarChart.tsx

Why this is needed

Charts re-render on every parent update, Recharts reprocesses 5000+ points.

Goal

Memoize chart components.

What to do

  1. Wrap with React.memo with custom comparison
  2. Ensure data objects are stable

Possible traps and issues

  • Shallow comparison might not be enough
  • Memoization has memory cost

Docs changes needed

  • No documentation changes

Doc references

  • frontend/src/components/TopCountriesChart.tsx

[LOW] No request deduplication on frontend

Where found

  • frontend/src/hooks/useFetchData.ts — each call launches new request
  • User clicks "Refresh" twice → two identical requests

Why this is needed

Duplicates waste bandwidth, cause race conditions (response 2 arrives first, then response 1 overwrites with stale data).

Goal

Deduplicate identical in-flight requests.

What to do

  1. Implement request cache
  2. Clear cache entry when response received
  3. Use in useFetchData

Possible traps and issues

  • Cache must be cleared on data mutation
  • Stale data in cache possible if not careful

Docs changes needed

  • No documentation changes

Doc references

  • frontend/src/hooks/useFetchData.ts