Files
BanGUI/Docs/Tasks.md

19 KiB
Raw Blame History

Issue #52: MEDIUM - Error Handling Patterns Not Declared on Services

Where found:

  • backend/app/services/error_handling.py:1-65 three patterns (ABORT_ON_ERROR, RETURN_DEFAULT, PARTIAL_RESULT) defined but not annotated on callers

Why this is needed: Callers must read docstrings to know whether a service raises, returns a default, or returns partial results. A service silently changing its pattern is a breaking change that type checking and tests may not catch.

Goal: Make each service's error contract explicit and machine-checkable.

What to do:

  1. Create typed wrapper classes or protocol types for each error pattern.
  2. Annotate service return types to reflect the chosen pattern (e.g., Result[T, E] or a tagged union).
  3. Alternatively, use a decorator that records and enforces the declared pattern.

Possible traps and issues:

  • Adding a full Result type may require widespread refactoring; start with documentation-only annotations and migrate incrementally.

Docs changes needed:

  • backend/app/services/error_handling.py: update module docstring with pattern descriptions and usage examples.

Doc references:

  • backend/app/services/error_handling.py

Issue #53: MEDIUM - Pagination Contract Inconsistent (Offset vs Cursor)

Where found:

  • backend/app/models/ban.py:87-95
  • backend/app/utils/pagination.py:265-305
  • backend/app/models/response.py:125-180

Why this is needed: Some endpoints use PaginatedListResponse with a pagination object; others return page and page_size directly. Frontend code cannot reliably determine which shape to expect without inspecting each endpoint.

Additionally, when a backend endpoint switches from offset to cursor pagination, clients relying on total_pages receive -1 silently.

Goal: A single, documented pagination envelope used consistently across all list endpoints.

What to do:

  1. Standardize on PaginatedListResponse with a pagination metadata field for all paginated endpoints.
  2. Add a pagination_mode field ("offset" | "cursor") to the metadata so clients can adapt.
  3. Migrate non-conforming endpoints and add a linting check.

Possible traps and issues:

  • This is a breaking API change for existing frontend consumers; version the affected endpoints or use a feature flag.

Docs changes needed:

  • API reference: document the standard pagination envelope.

Doc references:

  • backend/app/models/ban.py inline comment about pagination fields

Issue #54: MEDIUM - Provider Composition Order Is Fragile

Where found:

  • frontend/src/App.tsx:74-223 providers manually nested with a documented-but-unenforced order

Why this is needed: A developer refactoring App.tsx can accidentally swap two providers (e.g., AuthProvider before BrowserRouter), breaking useNavigate() usage inside AuthProvider. No test or lint rule will catch this.

Goal: Make incorrect provider ordering either impossible or immediately detectable.

What to do:

  1. Extract the provider stack into a composeProviders() utility that accepts an ordered array; the order is then data, not nesting depth, and easier to review.
  2. Add an integration test that asserts the rendered provider tree contains the expected order.

Possible traps and issues:

  • composeProviders utilities can obscure stack traces; ensure display names are preserved.

Docs changes needed:

  • frontend/src/providers/PROVIDER_ORDER.md: keep updated as the canonical order reference.

Doc references:

  • frontend/src/providers/PROVIDER_ORDER.md

Issue #55: MEDIUM - Correlation ID Scope Is Module-Level (Resets on HMR)

Where found:

  • frontend/src/api/client.ts:26-39 sessionCorrelationId stored as a module variable

Why this is needed: Hot Module Replacement (HMR) re-evaluates modules, generating a new correlation ID mid-session. Distributed traces lose continuity, making it impossible to correlate logs from the same user session.

Goal: Persist the correlation ID across module re-evaluations for the lifetime of the browser tab.

What to do:

  1. Store the correlation ID in sessionStorage on first generation; read from there on subsequent module evaluations.
  2. Clear it on logout.

Possible traps and issues:

  • sessionStorage is tab-local, which is the desired scope.
  • Ensure the ID is not leaked in URLs or logs.

Docs changes needed:

  • Add comment explaining the persistence strategy in client.ts.

Doc references:

  • frontend/src/api/client.ts getSessionCorrelationId()

Issue #56: MEDIUM - No API Versioning or Deprecation Strategy

Where found:

  • All backend routers register under /api/v1/ prefix but no versioning mechanism exists

Why this is needed: Breaking backend changes immediately break all frontend clients. Without a deprecation path, there is no safe way to evolve the API.

Goal: Define and implement an API lifecycle policy.

What to do:

  1. Document the versioning strategy (URL versioning is already in place; formalize it).
  2. Add a Deprecation response header to endpoints scheduled for removal.
  3. Implement a /api/v2/ prefix for the next breaking change cycle.
  4. Add a CI check that flags new breaking changes against the OpenAPI spec.

Possible traps and issues:

  • Running two API versions simultaneously doubles maintenance surface; set a sunset date policy.

Docs changes needed:

  • Docs/: create API_VERSIONING.md documenting the lifecycle and deprecation process.

Doc references:

  • All router files under backend/app/routers/

Issue #57: MEDIUM - Health Endpoint Does Not Check Subsystems

Where found:

  • backend/app/routers/health.py

Why this is needed: A process that is running but cannot reach the fail2ban socket, database, or config directory still returns 200 OK. Load balancers and orchestrators treat it as healthy and route traffic to it, causing silent failures.

Goal: Health endpoint reflects true readiness of all critical subsystems.

What to do:

  1. Add a structured health check that tests: database connectivity, fail2ban socket accessibility, config directory read access, scheduler liveness.
  2. Return 200 only when all checks pass; return 503 with a JSON body listing failed checks otherwise.
  3. Expose a separate /health/live (process alive) and /health/ready (subsystems ready) endpoint for Kubernetes probes.

Possible traps and issues:

  • Slow health checks (e.g., DB connect timeout) can overwhelm the endpoint under load; set short timeouts per check.

Docs changes needed:

  • Docs/Deployment.md: document liveness vs readiness probe URLs.

Doc references:

  • backend/app/routers/health.py

Issue #58: MEDIUM - Abort Signal Not Propagated in Request Deduplication

Where found:

  • frontend/src/hooks/useFetchData.ts:93-113

Why this is needed: When multiple hook instances share a requestKey, they await a single in-flight promise. When one component unmounts and aborts its signal, the shared request continues and calls setData() / onSuccess() on the unmounted component, causing React "state update on unmounted component" warnings and memory leaks.

Goal: Unmounting a component that joined a deduplicated request must not receive the result.

What to do:

  1. In the deduplication await path, check the component's own abort signal before calling setData() or onSuccess().
  2. Wrap the deduplication subscriber list so each subscriber can individually opt out on abort.

Possible traps and issues:

  • If all subscribers abort before the request resolves, consider whether the underlying request should also be cancelled.

Docs changes needed:

  • frontend/src/hooks/README.md: document abort signal contract for deduplicated requests.

Doc references:

  • frontend/src/hooks/README.md

Issue #59: MEDIUM - Middleware Registration Order Not Validated at Startup

Where found:

  • backend/app/main.py:53+ middleware added via app.add_middleware() without order assertion

Why this is needed: The required order CorrelationId → CSRF → RateLimit is security-critical. A developer adding or reordering middleware silently breaks CSRF validation or produces rate-limit counters with no correlation ID attached.

Goal: Detect incorrect middleware order at startup, not at runtime under attack.

What to do:

  1. After all middleware is registered, introspect app.middleware_stack and assert the expected order.
  2. Write a unit test that instantiates the app and checks middleware ordering.

Possible traps and issues:

  • FastAPI reverses the middleware stack internally (last registered = outermost); account for this when asserting order.

Docs changes needed:

  • backend/app/main.py: add inline comment documenting the required order and why.

Doc references:

  • backend/app/middleware/ individual middleware module docstrings

Issue #60: MEDIUM - NavigationCancellationProvider Orphans Requests on Rapid Navigation

Where found:

  • frontend/src/providers/NavigationCancellationProvider.tsx
  • frontend/src/hooks/useNavigationAbortSignal.ts:42-52

Why this is needed: When a user navigates A → B → C rapidly, B's in-flight requests are not cancelled because B's signal is replaced before B's requests check it. These requests complete and may write stale data into the wrong page's state.

Goal: Every request initiated for a page is cancelled when that page is navigated away from, regardless of navigation speed.

What to do:

  1. Associate each request with the pathname that was active when it started, not the current pathname.
  2. On navigation, abort all controllers whose associated pathname no longer matches the current route.

Possible traps and issues:

  • Requests that intentionally survive navigation (e.g., background syncs) must opt out; provide an ignoreCancellation flag.

Docs changes needed:

  • frontend/src/providers/PROVIDER_ORDER.md: document the cancellation contract.

Doc references:

  • frontend/src/providers/NavigationCancellationProvider.tsx

Issue #61: MEDIUM - Pagination Offset vs Cursor Mode Indistinguishable to Frontend

Where found:

  • backend/app/utils/pagination.py:265-305
  • backend/app/models/response.py:125-180

Why this is needed: The PaginationMetadata object uses sentinel values (total=-1, total_pages=-1) for cursor mode. If a backend endpoint silently switches pagination modes, frontend code using total_pages to render page controls will display -1 with no error.

Goal: Frontend code can reliably detect which pagination mode is in use and render accordingly.

What to do:

  1. Add a mode: "offset" | "cursor" discriminator field to PaginationMetadata.
  2. Update frontend pagination components to branch on mode rather than checking for -1.

Possible traps and issues:

  • Adding a required field is a breaking change; make it optional with a default of "offset" for backward compatibility.

Docs changes needed:

  • API reference: document the mode field and its values.

Doc references:

  • backend/app/utils/pagination.py

Issue #62: MEDIUM - Blocklist URL Validation Is Async With No Rollback on Failure

Where found:

  • backend/app/services/blocklist_service.py
  • backend/app/models/blocklist.py:36-40

Why this is needed: DNS validation runs asynchronously after the model is validated. If validation fails or is slow, concurrent requests can insert duplicate or invalid blocklist sources before the validation result is checked, leaving the database in a dirty state.

Goal: Blocklist source creation is atomic: either validation passes and the row is committed, or validation fails and no row exists.

What to do:

  1. Perform DNS/URL validation inside a database transaction; roll back on failure.
  2. Add a unique constraint on the URL column to catch duplicates at the DB level.
  3. Return a conflict error (409) on duplicate URL submissions.

Possible traps and issues:

  • Async DNS lookup inside a transaction holds the transaction open longer; use a short timeout.

Docs changes needed:

  • API reference: document the 409 conflict response for duplicate URLs.

Doc references:

  • backend/app/services/blocklist_service.py

Issue #63: MEDIUM - Correlation ID Lost Across Background Task Boundaries

Where found:

  • backend/app/tasks/health_check.py:70-74
  • backend/app/utils/correlation.py

Why this is needed: Background tasks that spawn sub-tasks (e.g., health check triggering failover logic) do not propagate the correlation ID ContextVar to child asyncio tasks. Logs from child tasks appear without a correlation ID, breaking distributed tracing.

Additionally, reset_correlation_id() in the finally block clears the ID before all child tasks have logged.

Goal: Every log line emitted during a background job carries its originating correlation ID.

What to do:

  1. Use asyncio.create_task(coro, context=copy_context()) to propagate the ContextVar to child tasks.
  2. Move reset_correlation_id() to after all child tasks have completed.

Possible traps and issues:

  • copy_context() captures a snapshot; mutations in the parent after the copy won't be seen by the child (this is the desired behavior).

Docs changes needed:

  • Add inline comment in health_check.py explaining context propagation.

Doc references:

  • backend/app/utils/correlation.py

Issue #64: MEDIUM - External Logging Failure Silently Swallowed

Where found:

  • backend/app/main.py:192-213

Why this is needed: When Datadog, Papertrail, or Elasticsearch log handler initialization fails, the error is caught, logged as a warning to stdout, and the application continues. In production this means critical logs may never reach the monitoring system, and operators will not know until an incident occurs.

Goal: External logging failures are surfaced to operators at deployment time.

What to do:

  1. Promote the warning to an error and expose it via the health endpoint (Issue #57).
  2. Add a startup check: if EXTERNAL_LOG_REQUIRED=true and initialization fails, abort startup.
  3. Emit a metric/alert on initialization failure.

Possible traps and issues:

  • Making startup fail on logging issues may be too strict for some environments; make EXTERNAL_LOG_REQUIRED optional and default to false.

Docs changes needed:

  • Docs/Deployment.md: document EXTERNAL_LOG_REQUIRED and the health check for logging status.

Doc references:

  • backend/app/main.py logging initialization block

Issue #65: MEDIUM - Abort Selector Inconsistency in useFetchData

Where found:

  • frontend/src/hooks/useFetchData.ts:124-131

Why this is needed: When a request is aborted, refresh() returns the raw response without running the selector() function. In non-aborted paths the selector runs. Callers of refresh() receive different types depending on the abort state, making the return type unreliable and causing subtle state shape mismatches.

Goal: refresh() returns a consistent type regardless of abort state.

What to do:

  1. On abort, return null (or a typed sentinel) instead of the raw response, so callers can handle the aborted case explicitly.
  2. Update the TypeScript return type of refresh() to reflect the nullable result.

Possible traps and issues:

  • Existing callers that ignore the return value are unaffected; callers that use it need to handle null.

Docs changes needed:

  • frontend/src/hooks/README.md: document the null return on abort.

Doc references:

  • frontend/src/hooks/README.md

Issue #67: LOW - Default Page Size Inconsistently Applied Across Routers

Where found:

  • backend/app/routers/history.py:80-84 uses DEFAULT_PAGE_SIZE constant
  • Multiple other routers may hardcode page size values

Why this is needed: Endpoints with different default page sizes create an inconsistent API experience and make it hard to reason about server load. A client that does not pass page_size gets different result counts from different endpoints.

Goal: All paginated endpoints use the same default page size driven by a single constant.

What to do:

  1. Audit all page_size Query parameters across routers.
  2. Replace all hardcoded defaults with DEFAULT_PAGE_SIZE from constants.py.
  3. Add a linting check or unit test that asserts no hardcoded page size defaults exist in routers.

Possible traps and issues:

  • Some endpoints may intentionally use a different page size for performance reasons; document exceptions explicitly.

Docs changes needed:

  • API reference: document the default page size and how to override it.

Doc references:

  • backend/app/utils/constants.py DEFAULT_PAGE_SIZE

Issue #68: LOW - No Reserved Keyword Validation for Jail Names

Where found:

  • backend/app/models/jail.py jail name validated against alphanumeric regex only
  • backend/app/routers/jail_config.py

Why this is needed: Fail2ban uses reserved jail names and command keywords (e.g., all, status, purge). A user-created jail with a reserved name could shadow fail2ban built-in commands or produce confusing behavior when management commands are issued.

Goal: Reject jail names that conflict with fail2ban reserved words at model validation time.

What to do:

  1. Define a FAIL2BAN_RESERVED_JAIL_NAMES set in constants.py.
  2. Add a Pydantic validator on the jail name field that rejects reserved words.
  3. Return a 422 with a descriptive error message.

Possible traps and issues:

  • The reserved word list may change across fail2ban versions; source it from fail2ban documentation and version-gate if necessary.

Docs changes needed:

  • API reference: document the list of reserved jail names.

Doc references:

  • Fail2ban documentation on reserved jail identifiers

Issue #69: LOW - Jail Names Echoed in Error Messages Without Sanitization

Where found:

  • backend/app/exceptions.py:138,351 jail names interpolated directly into error strings

Why this is needed: Although Python's repr() provides basic escaping, user-supplied jail names are reflected back in error messages. If these messages are ever rendered in an HTML context (e.g., a future admin UI or email notification), they become XSS vectors. They also act as confirmation oracles when combined with timing attacks.

Goal: Error messages referencing user input are sanitized before inclusion.

What to do:

  1. Pass user-supplied values through a dedicated sanitize_for_display() helper before interpolation.
  2. Ensure the helper strips or escapes HTML special characters.
  3. For API responses, always return the original (validated) field name rather than the raw user input.

Possible traps and issues:

  • Over-escaping in JSON responses is not needed (JSON is not HTML); apply sanitization only at HTML render boundaries.

Docs changes needed:

  • CONTRIBUTING.md: document the rule that user input must not be echoed raw in messages.

Doc references:

  • backend/app/exceptions.py