Commit Graph

514 Commits

Author SHA1 Message Date
3d1a6f5538 Implement frontend and backend observability alignment
Align frontend and backend error observability with correlation IDs and
structured telemetry for distributed tracing across systems.

Backend changes:
- Add CorrelationIdMiddleware to generate/extract correlation IDs
- Include correlation_id in all ErrorResponse objects
- Store correlation ID in structlog contextvars for automatic inclusion in logs
- Add correlation ID to response headers (X-Correlation-ID)

Frontend changes:
- API client automatically generates session-scoped UUID4 and includes
  X-Correlation-ID header in all requests
- Extract correlation ID from API error responses
- Update error handlers to use telemetry with correlation IDs
- Add telemetry logging to ErrorBoundary, PageErrorBoundary, SectionErrorBoundary
- Implement redaction utilities for privacy-safe logging of sensitive data

Documentation:
- Add observability guidelines to Web-Development.md
  * Correlation ID usage patterns
  * Privacy & security best practices
  * Telemetry event structure
  * Redaction utilities for sensitive data
- Add distributed tracing architecture section to Architecture.md
  * Correlation ID flow across frontend/backend
  * Example troubleshooting scenario
  * Implementation details for future enhancements

Testing:
- Add comprehensive tests for correlation middleware
- Update error boundary tests to verify telemetry integration
- Verify TypeScript and ESLint pass with no warnings

Fixes: Issue #40 - Frontend and backend observability are not aligned

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 18:32:19 +02:00
9a43123b3a docs: Define explicit DI container strategy for backend service graph
- Add comprehensive 'Dependency Wiring and Service Composition' section to
  Architekture.md (§ 2.3) documenting:
  * The lightweight FastAPI Depends() pattern used as composition root
  * Service composition through explicit parameter passing
  * Service context dependencies pattern (SessionServiceContext, etc.)
  * Repository boundary enforcement
  * Lifecycle and scope management
  * Checklist for adding new services

- Update Backend-Development.md to reference the new Architecture section
  from the 'Dependency Layering' section

- Enhance dependencies.py module docstring with clear explanation of:
  * Composition root pattern
  * Explicit over implicit principles
  * Service context dependencies
  * Repository boundary enforcement

This resolves issue #39 by providing clear guidance on dependency wiring
without over-engineering. The pattern uses FastAPI's built-in Depends()
framework and avoids heavyweight container libraries, keeping the solution
lightweight and maintainable.

Fixes: #39

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 20:25:25 +02:00
b6631b86e4 Add database migration 5: Indexes for history_archive query performance
- Add composite index on (jail, timeofban DESC) for dashboard filtering
- Add composite index on (timeofban DESC, jail, action) for time-range queries
- Add single-column indexes on ip and action for targeted filtering
- Update schema version to 5 and document in Backend-Development.md

Indexes optimize common dashboard and API query patterns with pagination.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 20:17:58 +02:00
187cd8250d Implement database-backed scheduler lock for multi-worker safety
Enforce single-executor safety regardless of process launcher through a
robust database-backed lock mechanism that works reliably in container
orchestration environments.

Key changes:
1. Add scheduler_lock table to database schema (migration 4)
   - Singleton row (id=1) prevents concurrent execution
   - Stores PID, hostname, creation timestamp, heartbeat timestamp
   - Atomic transaction prevents race conditions

2. Create scheduler lock utility (app/utils/scheduler_lock.py)
   - acquire_scheduler_lock(): Atomically acquire or fail
   - release_scheduler_lock(): Clean up on shutdown
   - update_scheduler_lock_heartbeat(): Keep lock alive (every 10 seconds)
   - get_scheduler_lock_info(): Debug/inspect lock status
   - Stale lock detection: TTL-based (60 second expiry)

3. Reorder startup DAG stages
   - DATABASE now comes first (required for lock acquisition)
   - WORKER_MODE depends on DATABASE (performs lock check after initialization)
   - Maintains all other stage dependencies intact

4. Update startup process (app/startup.py)
   - Replace _check_single_worker_mode() with two-tier check:
     * Fast check: BANGUI_WORKERS env var (if explicitly set to >1)
     * Authoritative check: Database lock (catches misconfiguration)
   - Return startup_db from startup_shared_resources() for lock management

5. Register scheduler lock heartbeat task
   - New task: scheduler_lock_heartbeat (app/tasks/scheduler_lock_heartbeat.py)
   - Updates lock heartbeat every 10 seconds (keeps lock alive)
   - Prevents false positives from temporary load spikes

6. Add lock release to lifespan shutdown (app/main.py)
   - Release lock before closing database
   - Allows other instances to acquire during rolling deployments
   - Graceful handoff between instances

7. Comprehensive test coverage (backend/tests/test_scheduler_lock.py)
   - Lock acquisition success and failure cases
   - Stale lock cleanup on startup
   - Lock release and heartbeat updates
   - Full lifecycle: acquire → heartbeat → release

8. Update documentation (Docs/Architekture.md § 9.3)
   - Explain single-executor requirement
   - Document database-backed locking mechanism
   - Compare with alternative approaches (filesystem, env var)
   - Include troubleshooting guide
   - Container orchestration examples (Docker, Kubernetes, systemd)

Why database-backed instead of filesystem?
   - Atomicity: SQLite transactions prevent TOCTOU race windows
   - Container-safe: Works across containers with shared DB volumes
   - No NFS/SMB edge cases
   - Timestamp-based stale detection (PID reuse is unreliable)
   - More reliable in rolling deployments

Benefits:
   - Works with any process manager (uvicorn, gunicorn, etc.)
   - Handles simultaneous startup attempts correctly
   - Automatic failover on instance crash (stale lock cleanup)
   - Clear error messages with troubleshooting steps
   - No environment variable required (lock is authoritative)
   - Scales to multi-worker deployments if combined with external job store

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 20:10:53 +02:00
336242ad06 Implement visibility-aware polling to reduce background tab resource usage
- Add usePageVisibility hook to track page visibility state
- Add pauseWhenHidden option to usePolledData (defaults to false for backward compatibility)
- When enabled, polling pauses when page is hidden and resumes with immediate refresh when visible
- Refactor useBlocklistStatus to use usePolledData with pauseWhenHidden=true
- Add comprehensive tests for usePageVisibility hook
- Add polling lifecycle documentation to Web-Development.md

Fixes #36: Polling continues when tab is not visible

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 20:01:25 +02:00
0a350b3acc Optimize API client headers by method - only set Content-Type and CSRF header as needed
- Only set Content-Type header for requests with a body (POST, PUT, DELETE with body)
- Only set X-BanGUI-Request CSRF header for mutating methods (POST, PUT, DELETE, PATCH)
- GET, HEAD, OPTIONS requests no longer include unnecessary headers, reducing CORS preflights
- Update Web-Development.md to clarify conditional header behavior
- Add comprehensive tests for header behavior by HTTP method

This reduces unnecessary CORS preflight requests on GET endpoints while maintaining
CSRF protection on state-mutating requests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 19:52:17 +02:00
bc4ba703f0 Fix #34: Replace setup redirect allowlist prefix matching with explicit allowlist
- Replace fragile startswith() matching with explicit path matching
- Split allowlist into _EXACT_ALLOWED (exact paths) and _PREFIX_ALLOWED (prefixes)
- Prefix paths MUST end with '/' to prevent matching unintended paths like /api/setup-debug
- Paths correctly matched: /api/setup, /api/health, /api/docs, /api/redoc, /api/openapi.json, /api/setup/timezone
- Paths correctly blocked: /api/setup-debug, /api/setup123, /api/jails
- Add comprehensive Setup Guard Route Policy documentation to Backend-Development.md
- Update line numbers in documentation to reflect current implementation

This prevents future route additions from accidentally bypassing the setup guard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 19:45:42 +02:00
6bc440dce4 Refactor backend configuration and authentication
- Add comprehensive documentation for backend development
- Improve client IP detection with utility functions and tests
- Update auth router with better error handling
- Refactor config module with environment-based settings

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 19:39:55 +02:00
dd14ed7e7e Update Tasks.md
- Remove completed task #31 about fire-and-forget reschedule

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 19:29:49 +02:00
c2dd9f5f55 Add scheduled cleanup for rate limiter (#32)
Implement periodic cleanup of expired rate-limiter entries to prevent
unbounded memory growth during long runtimes.

Changes:
- Create rate_limiter_cleanup task that calls cleanup_expired() every 30 minutes
- Register the task in the startup DAG alongside other background jobs
- Update rate_limiter module documentation with operational notes about the
  cleanup lifecycle and memory management strategy

The cleanup is conservative and only removes IPs with no recent attempts
(all timestamps outside the rate-limit window), so active IPs are preserved.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 19:28:45 +02:00
18036d53bf Fix issue #31: Make schedule reschedule deterministic and observable
Replace fire-and-forget reschedule pattern with proper async/await:
- Changed reschedule() from fire-and-forget to awaitable async function
- Errors are now properly propagated instead of silently failing
- Added structured logging for reschedule start and completion
- Schedule updates are now deterministic and observable to callers

Changes:
- app/tasks/blocklist_import.py: Convert reschedule to async, remove asyncio.ensure_future
- tests/test_tasks/test_blocklist_import.py: Add tests for error propagation and logging
- Docs/Features.md: Document scheduling reliability guarantees

All 15 blocklist_import tests pass with 100% coverage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 19:24:55 +02:00
1302ac821f Fix non-atomic setup persistence across DB contexts (Issue #30)
Implement transactional setup with explicit state machine and crash-safety
to prevent partial commits from leaving inconsistent state.

## Changes

### Core Implementation
1. **settings_repo.py**: Add atomic batch settings write
   - New set_settings_batch() method: writes multiple settings in single
     transaction (BEGIN IMMEDIATE ... COMMIT). Either all settings persist
     or none do, preventing partial state if crash occurs mid-batch.

2. **setup_service.py**: Refactor run_setup() with transactional phases
   - Phase 0: Compute password hash early (before any DB writes) to ensure
     idempotency. Same hash is used throughout retries, preventing divergent
     hashes from bcrypt's random salt.
   - Phase 1 (Bootstrap DB transaction): Set setup_state=in_progress and
     database_path, then commit. First checkpoint for crash detection.
   - Phase 2 (Filesystem): Initialize runtime database (idempotent)
   - Phase 3 (Runtime DB transaction): Batch-write all settings atomically
   - Phase 4 (Bootstrap DB transaction): Set setup_state=complete and
     setup_completed=1. Final commit point.

3. **protocols.py**: Add set_settings_batch to SettingsRepository protocol

### Testing
- Added 6 new transactionality tests covering:
  - State machine transitions (None → in_progress → complete)
  - Password hash idempotency across retries
  - Atomic batch writes (all-or-nothing persistence)
  - Bootstrap DB state tracking
  - Database path propagation to both DBs
  - Recovery on partial failure
- All 18 tests pass (12 existing + 6 new)

### Documentation
- Updated Docs/Architekture.md with new section 6:
  - Setup state machine with state transitions
  - Transaction boundary documentation
  - Password hash idempotency rationale
  - Backward compatibility notes

## Design Decisions

### Why This Approach
- Current code already idempotent via INSERT OR REPLACE, but password
  hash non-idempotency created silent inconsistency risk
- Simpler than multi-state machine: 2 states sufficient for detection
- Maintains backward compatibility (setup_completed key still written)
- Explicit transactions make crash-safety obvious to future maintainers

### Crash Scenarios Now Handled
1. Crash after Phase 1 → detected by setup_state=in_progress on retry
2. Crash after Phase 2 → runtime DB may be partial, safe to retry
3. Crash after Phase 3 → runtime DB rolls back on next connection
4. Crash after Phase 4 → setup_completed detected, skipped

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 19:19:53 +02:00
cc4370c50d feat: Add runtime DNS-rebinding protection for blocklist HTTP connections
## Problem
The blocklist URL validation at create/update time has a TOCTOU (time-of-check-to-time-of-use) window.
An attacker can perform a DNS-rebinding attack where:
1. User adds blocklist URL pointing to attacker.com
2. At create time, attacker.com resolves to a public IP → validation passes
3. Later, when fetching, attacker.com resolves to 192.168.1.1 (internal network)
4. HTTP client connects to the private IP, potentially accessing internal services

## Solution
Add runtime destination IP validation at connection time via a custom socket factory:

- Created 'dns_validated_connector.py' with create_dns_validated_socket_factory() that validates
  all resolved IPs before socket creation
- HTTP session now uses the validated socket factory, protecting all blocklist imports globally
- Rejects connections to RFC 1918 private ranges, loopback, link-local, ULA, multicast, and
  reserved addresses (IPv4 and IPv6)
- Added comprehensive test coverage with 13 test cases

## Changes
- backend/app/services/dns_validated_connector.py: Custom socket factory with IP validation
- backend/app/startup.py: Use DNS-validated socket factory in HTTP session creation
- backend/app/utils/ip_utils.py: Updated docstring explaining runtime validation
- backend/app/services/blocklist_downloader.py: Updated module docstring
- backend/app/services/blocklist_service.py: Updated docstrings explaining two-layer protection
- backend/tests/test_services/test_dns_validated_connector.py: Test suite for socket factory
- Docs/Architekture.md: Added detailed section on DNS-rebinding protection

## Testing
- All 13 DNS validation tests pass
- All blocklist downloader tests pass (unaffected by changes)
- Linting: ruff, mypy pass with --strict
- Test coverage: 90% line coverage on dns_validated_connector.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 19:10:51 +02:00
9072117db3 ## 28) Login failure delay can enable app-layer DoS 2026-04-29 19:02:00 +02:00
1e2576af2a ## 27) Error response body shape is inconsistent 2026-04-28 22:28:02 +02:00
a2129bb9bd Pagination contract is not standardized across endpoints 2026-04-28 21:40:22 +02:00
ad21590f60 No canonical snake_case/camelCase serialization policy 2026-04-28 21:27:26 +02:00
b27765928a Standardize API response envelopes: use items for collection responses and update tests 2026-04-28 20:48:00 +02:00
1c673d600c Standardize API response envelope shapes across all endpoints
This commit standardizes how API responses are wrapped, solving issue #24.

Problem:
- Inconsistent response envelopes (jails vs items vs bans vs no wrapper)
- Frontend required multiple field name variants
- Integration bugs from branching logic
- No clear pattern for different response types

Solution:
- Created response.py with base classes: PaginatedListResponse,
  CollectionResponse, CommandResponse
- Standardized all list/collection responses to use 'items' field
- Domain-specific field names for detail and aggregation responses
- Updated all backends routers and mappers
- Updated frontend types and hooks to match

Changes:
Backend:
- backend/app/models/response.py (new): Base response models
- backend/app/models/ban.py: Updated responses to inherit from bases
- backend/app/models/jail.py: Updated JailListResponse, JailCommandResponse
- backend/app/models/config.py: Updated collection responses
- backend/app/services/jail_service.py: Updated return statements
- backend/app/mappers/ban_mappers.py: Updated 'bans' to 'items'
- backend/tests/test_mappers/test_ban_mappers.py: Updated tests

Frontend:
- frontend/src/types/jail.ts: Updated response interfaces
- frontend/src/types/config.ts: Updated response interfaces
- frontend/src/hooks/useActiveBans.ts: Updated selector
- frontend/src/hooks/useJailList.ts: Updated selector
- frontend/src/hooks/useJailConfigs.ts: Updated selector
- frontend/src/hooks/useConfigActiveStatus.ts: Updated field access
- frontend/src/hooks/useJailAdmin.ts: Updated field access

Documentation:
- Docs/Backend-Development.md: Added § 4.1 API Response Envelope Policy

The policy defines:
1. Paginated lists use PaginatedListResponse (items, total, page, page_size)
2. Non-paginated collections use CollectionResponse (items, total)
3. Detail responses use entity-specific field names (jail, status, settings)
4. Command responses use CommandResponse (message, success, optional target)
5. Aggregations use domain-specific fields (jails, countries, buckets, bans)

All responses now follow one of these patterns, reducing frontend complexity.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 10:12:55 +02:00
7ba1cf7ca2 feat: Implement global request lifecycle cancellation on route transitions
Adds a navigation-aware request cancellation mechanism that automatically
aborts all route-specific API requests when the user navigates to a
different route. This prevents silent state-update errors from responses
arriving after component unmount and conserves bandwidth by cancelling
now-irrelevant requests.

Key additions:
- NavigationCancellationContext: Context for managing route-specific signals
- NavigationCancellationProvider: Provider that detects route changes and
  aborts all signals from the previous route
- useNavigationAbortSignal hook: Allows components to subscribe to
  navigation-aware cancellation signals
- Comprehensive tests for the cancellation lifecycle
- Documentation in Web-Development.md for request lifecycle policy

The provider is placed in the app hierarchy between BrowserRouter and
AuthProvider, ensuring consistent cancellation behavior across all routes.

Long-lived background tasks (polling, session validation) can opt-out by
managing their own AbortController lifecycle.

Closes #23 from Tasks.md: No global cancellation policy on route transitions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 09:58:59 +02:00
e0a4d36fc3 Document storage key registry pattern in Web-Development
Add "Storage Key Registry" subsection explaining:
- Centralize all storage keys in utils/constants.ts
- Never hard-code storage key strings in components
- Document storage type and purpose with JSDoc
- Include code examples for correct usage

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 09:49:23 +02:00
252204ed97 Consolidate frontend storage keys into constants module
- Move magic strings from AuthProvider, MainLayout, and ThemeProvider to
  frontend/src/utils/constants.ts
- Add STORAGE_KEY_AUTHENTICATED, STORAGE_KEY_SIDEBAR_COLLAPSED, and
  STORAGE_KEY_THEME constants with JSDoc descriptions
- Update all three files to import and use centralized keys
- Prevents key drift and typo regressions across the frontend

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 09:48:28 +02:00
72c4a0ed04 fix: prevent silent auth error swallowing in fetch error utility
- Add setAuthErrorHandler() registration mechanism to utils/fetchError.ts
- Implement fallback logging when auth errors (401/403) occur without registered handler
- Update AuthProvider to register both API client and fetch error handlers
- Ensure auth errors are handled deterministically at multiple layers
- Add comprehensive tests for auth error handler registration and fallback logging
- Update Web-Development.md documentation with auth error handling contract

Fixes issue #21: Silent auth errors are now caught and logged if the handler is not
registered, preventing actionable errors from being silently swallowed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 09:45:08 +02:00
ca23858946 Add skeleton loading components for progressive UX
Implement standardized skeleton loading placeholders to reduce perceived
loading time and prevent layout shift during data fetches. These components
match actual content dimensions exactly, improving perceived responsiveness.

New skeleton components in src/components/skeletons/:
- SkeletonTable: Table/grid loading with customizable rows and cells
- SkeletonTableRow: Individual animated skeleton row
- SkeletonChart: Chart/graph loading with bars matching dimensions
- SkeletonStat: Stat card loading with label and value
- SkeletonFormField: Form input loading placeholder
- PageLoadingSkeleton: Convenience wrapper for page-level loading states

Implementation details:
- All skeletons use global 'skeleton-pulse' animation (2s cycle)
- Dimensions match real content to prevent layout shift on arrival
- Marked with aria-hidden and role=presentation for accessibility
- Theme-aware colors using Fluent UI tokens
- Respects prefers-reduced-motion setting

Updates:
- ChartStateWrapper: Uses SkeletonChart instead of spinner
- PageFeedback: Added PageLoadingSkeleton component
- App.tsx: Injects skeleton styles at startup
- Web-Design.md: Added § 8a with loading UX guidance and usage examples

All components tested (22 tests, 100% passing) and linted.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 09:40:10 +02:00
2fea513c9c docs: make provider dependency chain explicit with documentation and tests
This addresses issue #19 by making the implicit provider dependency order
explicit and order-sensitive.

Changes:
1. Created PROVIDER_ORDER.md - comprehensive documentation explaining:
   - The provider hierarchy from outermost to innermost
   - Why each provider must be at its position
   - Order-sensitive pitfalls and what would break
   - Guidelines for adding new providers in the future

2. Added provider composition tests (providerComposition.test.tsx):
   - 13 comprehensive tests validating provider order and dependencies
   - Tests verify all providers mount correctly
   - Tests check that hooks only work inside correct providers
   - Tests validate async initialization (AuthProvider, TimezoneProvider)
   - Tests verify theme persistence and notification propagation

3. Updated App.tsx with inline documentation:
   - Added detailed provider order contract in JSDoc header
   - Inline comments explaining each provider's position
   - Reference to PROVIDER_ORDER.md for detailed rationale

4. Updated Web-Development.md:
   - Added new section 5.5 'Provider Order Contract'
   - Documents provider hierarchy and rationale
   - Links to comprehensive provider documentation
   - References regression test suite

All tests pass. TypeScript compilation succeeds. Build succeeds.
The provider order is now explicit and future refactors can validate
compliance through the regression test suite.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 09:30:22 +02:00
d10145e5d6 refactor(frontend): extract shared fetch lifecycle into useFetchData base hook
Eliminates ~100 lines of duplicated code across useListData and usePolledData
by creating a composable base hook that handles:
- Abort controller lifecycle and cancellation
- Loading/error state management
- Fetch error handling
- Unmount cleanup

Changes:
- Create hooks/useFetchData.ts with base fetch lifecycle (no effects on consumers)
- Refactor useListData to compose useFetchData, returns items array by default
- Refactor usePolledData to compose useFetchData, adds polling and focus-refetch
- Add comprehensive tests for useFetchData base hook
- Document hook architecture and composition pattern in Web-Development.md

Result: Both hooks now use shared primitives, reducing maintenance burden
and ensuring consistent cancellation/error handling across all data fetches.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 09:23:34 +02:00
5166789b68 feat: Implement typed error contracts in generic hooks
Introduce discriminated FetchError union type to replace weak string error
handling in API calls and hooks. Enables actionable error diagnostics.

Changes:
- Create types/api.ts with FetchError discriminated union (api_error,
  network_error, abort_error)
- Export type guards: isAuthError, isAbortError, isNetworkError, isApiError
- Update useListData and usePolledData to expose typed FetchError instead of
  string
- Add getErrorMessage() helper to extract displayable messages from FetchError
- Add createStringErrorAdapter() for backward compatibility with string error
  state
- Update handleFetchError() to work with both FetchError and string setters
- Update all consumer hooks to expose typed errors
- Update components to use getErrorMessage() when displaying errors
- Update tests to mock FetchError instead of strings
- Add comprehensive typed error model documentation to Web-Development.md

This enables better error handling patterns:
- Check error.type to distinguish between API, network, and abort errors
- Extract status codes for specific handling (401/403 auth, 50x server errors)
- Maintain backward compatibility with existing string-based error states

All TypeScript compilation passes with no errors.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 09:13:47 +02:00
6c8e2b3423 fix(#16): Establish consistent API usage layering patterns
- Refactor useActiveBans to use useListData generic hook instead of inline state management
- Refactor useBans to use useListData generic hook for consistency
- Add comprehensive 'API Usage Layering' section to Web-Development.md documenting:
  - Tier 1: API Functions (pure wrappers around HTTP calls)
  - Tier 2: Reusable Generic Hooks (useListData, useConfigItem for common patterns)
  - Tier 3: Domain Hooks (compose Tier 2 with domain-specific logic)
  - Tier 4: Components (receive data/actions via props or context)
- Document pattern for action callbacks with automatic data refresh
- List anti-patterns to avoid for future consistency

These changes improve composability, testability, and reduce code duplication by
establishing a clear convention for data-fetching patterns across the frontend.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 08:53:36 +02:00
f169bbd39a test: fix BanUnbanForm tests with NotificationProvider wrapper
Include NotificationContainer in test setup so notification messages
appear in the DOM and can be found by tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 08:44:39 +02:00
ae34d98859 feat: centralized error notification service (issue #15)
- Create NotificationService with context provider for centralized error/success messaging
- Add NotificationContainer component to render notification stack
- Integrate NotificationProvider into App root
- Refactor BanUnbanForm to use notification service instead of local error state
- Update fetchError utility to optionally use notification callbacks
- Add comprehensive error handling guidelines to Web-Development.md
- Prevent duplicate notifications with deduplication logic
- Support auto-dismiss with configurable TTL per notification type

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 08:41:33 +02:00
da6433b2cf Improve error boundary granularity with page and section level boundaries
Implement three-level error boundary strategy:
- Top-level (app shell): catches critical failures
- Page-level: preserves navigation when page crashes
- Section-level: graceful degradation for charts/tables

Create new components:
- PageErrorBoundary: wraps page routes
- SectionErrorBoundary: wraps data-heavy sections

Enhance ErrorBoundary with customizable titles, messages, and reload behavior.

Apply page boundaries to all route handlers in App.tsx.

Apply section boundaries to:
- DashboardPage: server status, ban trend, country charts, ban list
- JailsPage: jail overview, ban/unban form, IP lookup
- MapPage: world map, ban table
- ConfigPage: configuration editor
- HistoryPage: history table, IP detail view
- BlocklistsPage: sources, schedule, import log

Update Web-Development.md with error boundary strategy documentation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 08:33:39 +02:00
42beb9cf3b refactor: Decompose ConfigPage into focused routing and component layers
Split the over-centralized ConfigPage into focused, composable layers:

1. useTabRouter hook: Encapsulates tab state management and URL synchronization
   - Maintains selected tab and active item (e.g., jail name)
   - Syncs state to location.state for deep linking and browser history
   - Supports bookmarkable URLs and back/forward navigation

2. ConfigPageContainer: Orchestrates tab navigation
   - Manages TabList and routes tab selection events
   - Conditionally renders tab content panels
   - Delegates domain-specific logic to tab components

3. ConfigPage: Focused page layout component
   - Renders page structure (header, title, description)
   - Delegates tab orchestration to ConfigPageContainer
   - No routing or tab state logic

Benefits:
- Page is now 30 lines vs 125 lines (76% reduction)
- Tab state management is reusable for other multi-tab pages
- Each tab component remains focused on domain-specific UI
- Deep linking and browser history work out of the box
- Easier to test and maintain

Added comprehensive tests:
- useTabRouter: 6 tests covering state initialization, tab selection, and deep linking
- ConfigPageContainer: 8 tests covering tab rendering and navigation
- ConfigPage: 3 tests for page structure

Updated Web-Development.md with tab orchestration pattern documentation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 08:27:36 +02:00
69a5f0ceb1 refactor: eliminate prop drilling in JailsPage with context provider
Replace multi-hop prop forwarding with a dedicated JailContext that manages
jail state and actions. This reduces coupling, simplifies the component hierarchy,
and makes the data flow more explicit.

Changes:
- Create JailContext.tsx with JailProvider and useJailContext hook
- Wrap JailsPage content with JailProvider to expose jail state
- Refactor JailOverviewSection to use useJailContext instead of props
- Remove 10 props from JailOverviewSection component signature
- Add comprehensive documentation on state ownership and prop drilling

Benefits:
- Eliminates unnecessary prop chains through intermediate components
- Makes component contracts clearer (no longer need to pass unrelated props)
- Simplifies future refactoring of jail-related functionality
- Sets a pattern for other page-scoped state management

Testing:
- TypeScript type check passes (tsc --noEmit)
- Frontend builds successfully
- Existing JailsPage tests pass with new context structure

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 08:20:29 +02:00
ace8930482 Update documentation
- Update Backend-Development.md with recent changes
- Update Tasks.md with current status

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 08:16:03 +02:00
e86ab6dad1 10) Implement explicit startup DAG for resource initialization
- Created StartupDAG class to orchestrate startup stages with explicit dependencies
- Defined 6 startup stages: WORKER_MODE → DATABASE → GEO_CACHE → HTTP_SESSION → SCHEDULER → TASKS
- Each stage has prerequisites, error handling, and rollback support
- Refactored startup_shared_resources() to use the DAG
- Added StartupContext for resource tracking and failure management
- Partial failures automatically roll back all completed resources in reverse order
- Added health checks to verify all resources initialized successfully
- Comprehensive test coverage: 15 DAG unit tests + 3 integration tests + 6 existing tests
- Documented startup DAG in Architekture.md with detailed stage descriptions and failure modes

This replaces implicit ordering with explicit dependency tracking, making lifecycle
changes safe and failure modes predictable. Hidden order dependencies no longer exist.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 08:08:05 +02:00
a273b96563 feat: Complete repository protocol coverage
- Add missing protocol methods to Fail2BanDbRepository:
  - get_ban_event_counts: Aggregate ban events per IP (used in ban_service)

- Add missing protocol methods to GeoCacheRepository:
  - delete_stale_entries: Remove old geo cache entries (used in geo_cache_cleanup)

- Add missing protocol methods to HistoryArchiveRepository:
  - purge_archived_history: Remove archived entries older than age threshold

- Add comprehensive protocol compliance tests:
  - Created test_protocol_compliance.py with 8 test classes
  - Validates all 7 repository modules fully implement their protocols
  - Prevents silent protocol drift when methods change signatures
  - Tests verify no unexpected public methods in repository modules

- Update documentation:
  - Add Repository Protocol Coverage Checklist to Backend-Development.md
  - Document procedure for adding new repositories with protocol definitions
  - List current protocol coverage (all 7 repositories, 40 total methods)

- All repositories now have 100% protocol coverage:
  - SessionRepository: 4 methods
  - SettingsRepository: 4 methods
  - BlocklistRepository: 6 methods
  - ImportLogRepository: 4 methods
  - GeoCacheRepository: 13 methods
  - HistoryArchiveRepository: 5 methods
  - Fail2BanDbRepository: 8 methods

This ensures:
- Enhanced mockability for testing
- Static contract verification
- Prevention of protocol drift
- Better IDE support and type checking

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 07:58:57 +02:00
52a4d04d92 Task 8: Standardize modeling style (TypedDict vs Pydantic)
Convert inconsistent modeling style to standardized Pydantic models for all
external-facing data structures while maintaining TypedDict compatibility where
appropriate for internal layer-private structures.

Changes:
- Converted IpLookupResult TypedDict to use IpLookupResponse Pydantic model
  in jail_service.lookup_ip() for consistency with routers
- Added GeoCacheEntry Pydantic model for geo cache repository rows
- Converted GeoCacheRow TypedDict to use GeoCacheEntry alias
- Converted ImportLogRow TypedDict to use ImportLogEntry alias
- Updated routers and services to work with Pydantic models
- Updated all tests to use Pydantic model field access (attributes)
  instead of dict subscripting

Documentation:
- Added 'Model Type Usage by Layer' section to Backend-Development.md
- Defines when TypedDict is allowed (internal structures) vs Pydantic
  (external-facing, cross-boundary data)
- Provides clear guidance on modeling conventions per layer

Benefits:
- Consistent validation and serialization behavior
- Better IDE support and type checking
- Clearer separation of concerns by layer
- Reduced maintenance cost from mixed validation approaches

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 07:53:30 +02:00
3888c5eb3f Refactor ban management with domain models and mappers
- Add ban domain model for core business logic separation
- Implement mapper pattern for DTO/domain conversions
- Update ban service with new domain-driven approach
- Refactor router endpoints to use new architecture
- Add comprehensive mapper tests
- Update documentation with architecture changes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 07:46:02 +02:00
507f153ab9 Enforce repository boundary: Remove DbDep from routers
This commit enforces the repository boundary by eliminating direct database connection
dependencies (DbDep) from all routers. Routers now depend on service context dependencies
that combine the database connection with the related repositories.

Changes:
- Add 5 service context dependencies in dependencies.py:
  * SessionServiceContext: db + session_repo
  * BlocklistServiceContext: db + blocklist_repo + import_log_repo + settings_repo
  * SettingsServiceContext: db + settings_repo
  * BanServiceContext: db + fail2ban_db_repo
  * HistoryServiceContext: db + fail2ban_db_repo + history_archive_repo

- Refactor all 9 routers (auth, bans, blocklist, config_misc, dashboard, geo,
  history, jails, setup) to use service contexts instead of DbDep.

- Update Backend-Development.md with clear examples of the new pattern and
  documentation of available service contexts.

Rationale:
- Enforces the repository boundary through the dependency system
- Makes database operations explicit and auditable
- Improves testability by allowing service contexts to be mocked
- Prevents accidental direct database access from routers

The deprecated DbDep remains available for backward compatibility with
services that have not yet been refactored, but routers can no longer import it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 07:35:23 +02:00
813cf09bed Enforce repository boundary for persistence access
- Hide raw database connections (DbDep) from routers by removing from public exports
- Maintain DbDep as deprecated export for backward compatibility
- Add _DbDep internal dependency for use by other dependencies like require_auth
- Update module docstring to explain dependency layering rules
- Add comprehensive documentation section on dependency layering to Backend-Development.md

This enforces the architectural boundary where:
- Routers depend on repository dependencies (SessionRepoDep, BlocklistRepositoryDep, etc)
- Services orchestrate operations through repositories
- Only repositories execute SQL queries

The repository boundary is now technically enforced through the dependency injection
system, making it impossible for routers to accidentally bypass repositories and
access the database directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-27 19:04:52 +02:00
afc1e44e99 Implement centralized exception handling and validation
- Add custom exception classes for structured error handling
- Implement global exception handlers in FastAPI application
- Add comprehensive request/response validation
- Create exception contract tests for validation
- Update backend development documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-27 18:52:12 +02:00
2e221f6852 Refactor: Move module-level mutable flags to JailServiceState
TASK-004: Replace module-level mutable runtime flags in service layer with
injected state holder, eliminating hidden global state and improving testability
and synchronization boundaries.

Changes:
- Create JailServiceState dataclass in app/utils/runtime_state.py to hold
  backend capability cache and synchronization lock
- Add JailServiceState as a field in RuntimeState (with default_factory)
- Remove module-level _backend_cmd_supported and _backend_cmd_lock from
  jail_service.py
- Refactor _check_backend_cmd_supported() to accept state parameter
- Inject JailServiceState into list_jails() and _fetch_jail_summary() via
  parameters
- Add get_jail_service_state() dependency provider in app/dependencies.py
- Add JailServiceStateDep type alias for router injection
- Update jails router to receive and pass state to service functions
- Update all tests to use jail_service_state fixture and pass state to functions
- Remove duplicate _MAX_PAGE_SIZE constant definition
- Document mutable state management in Backend-Development.md
- Update Architecture.md to describe JailServiceState and state nesting pattern

Benefits:
- Eliminates global mutable state and associated race conditions
- Makes state visible to callers (not hidden in module scope)
- Enables test isolation (each test gets fresh state)
- Prepares codebase for multi-worker deployments (state can be extracted to
  shared backend)
- Synchronization boundaries are now explicit (state.get_backend_cmd_lock())

Compliance:
- All tests pass (17 passed in TestListJails, TestGetJail, TestLockInitialization)
- No ruff linting errors
- Type-safe: JailServiceState properly typed with asyncio.Lock, bool | None

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-27 18:42:52 +02:00
79112c0430 Remove completed task from documentation
Task #2 about hidden cross-service coupling has been resolved.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-27 18:35:06 +02:00
e08a16c7dd Refactor: Split blocklist import flow into focused components
Extracted the monolithic import_source() function (776 lines) into focused,
testable components with clear single responsibilities:

- BlocklistDownloader: HTTP download with exponential backoff retry logic
  * Handles transient failures (429, 5xx errors, timeouts)
  * Configurable retry attempts and backoff strategy
  * 93% test coverage

- BlocklistParser: Parse and validate IP addresses
  * Extract valid IPv4/IPv6 addresses from text
  * Skip CIDRs and malformed entries gracefully
  * Separate parsing from validation concerns
  * 100% test coverage

- BanExecutor: Ban execution with error handling
  * Ban IPs via fail2ban socket
  * Stop on JailNotFoundError (jail doesn't exist)
  * Continue on JailOperationError (individual ban failures)
  * 100% test coverage

- BlocklistImportWorkflow: Thin orchestrator
  * Coordinates the download → parse → ban → log flow
  * Pre-warms geo cache with newly banned IPs
  * 96% test coverage

- blocklist_service.py: Maintains public API
  * Source CRUD (create, read, update, delete)
  * URL validation and preview functionality
  * Scheduling configuration and import triggers
  * 92% test coverage

Benefits:
* Each component is independently testable with mock dependencies
* Error handling is explicit and localized
* Components can evolve independently
* Logging is contextual and clear
* Retry and transient error handling are isolated

Testing:
* All 36 existing blocklist_service tests pass
* All 13 blocklist import task tests pass
* Added 17 comprehensive component unit tests
* Combined 96%+ coverage on new modules
* Zero type errors in new code

Documentation:
* Updated Refactoring.md with detailed architecture notes
* Added component architecture diagram to Architekture.md
* Documented ownership and responsibilities of each component

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-27 18:34:11 +02:00
3bbf413c55 refactor: Make service dependencies explicit and injectable
Remove hidden cross-service coupling by making dependencies explicit through
dependency injection while maintaining backward compatibility via lazy imports.

Key changes:
- history_service and ban_service: Removed direct module-level imports of
  fail2ban_metadata_service, added optional service parameters to functions
- Added get_fail2ban_metadata_service() provider to dependencies.py
- Updated history router to inject Fail2BanMetadataService dependency
- history_service functions now use lazy imports in fallback paths for
  backward compatibility when service is not explicitly injected
- All test patches updated to use internal _get_fail2ban_db_path() helper
- jail_config_service and jail_service already follow best practices

This pattern prevents circular imports, makes services testable via explicit
mocking, and documents service dependencies clearly.

Fixes: Instructions.md #2 - Hidden cross-service coupling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-27 18:26:08 +02:00
bc315b936b Refactor services and update documentation
- Refactor ban_service.py with improved error handling
- Refactor blocklist_service.py for better code organization
- Update geo_cache.py with performance improvements
- Update backend development guide and task documentation
- Update runner.csx script

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 20:27:04 +02:00
93021500c3 TASK-033: Remove session token from JSON response body
Fixes a critical security vulnerability where the session token was
being returned in the JSON response body of POST /api/auth/login.
This exposed the token to JavaScript, allowing malicious scripts to
steal it and bypass the HttpOnly cookie protection.

Changes:
- Backend: Remove 'token' field from LoginResponse model (auth.py)
- Backend: Update login() endpoint to return only 'expires_at'
- Frontend: Update LoginResponse type to exclude 'token' field
- Backend: Update test helper _login() to extract token from cookie
- Backend: Update test cases to verify token is NOT in response body
- Documentation: Add section 'Authentication Endpoints' in Backend-Development.md
- Documentation: Update Web-Development.md to explain HttpOnly cookie benefits

Security benefit: Session tokens are now only accessible via HttpOnly
cookies, protected from JavaScript access, XSS attacks, and malicious
third-party scripts. The frontend continues to use only the cookie for
authentication.

All auth tests pass (23 tests). Type checking and linting pass with
zero errors.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 19:38:33 +02:00
e2560f5db0 TASK-032: Implement geo_cache retention policy and cleanup
Add automatic cleanup of stale geolocation cache entries to prevent
unbounded database growth. Resolves the issue where unique IP addresses
accumulated indefinitely in the geo_cache table, degrading query performance.

## Changes

### Database Schema (Migration 3)
- Add 'last_seen' column to geo_cache table tracking last reference time
- Existing entries default to current timestamp

### Repository Layer (geo_cache_repo.py)
- Update upsert_entry() to set/refresh last_seen on insert/update
- Update upsert_neg_entry() to set/refresh last_seen on negative cache hits
- Update bulk_upsert_entries() to set/refresh last_seen in batch operations
- Add delete_stale_entries(db, cutoff_iso) -> int for purging old entries

### Background Task (geo_cache_cleanup.py)
- New APScheduler task that runs nightly (24-hour interval)
- Calculates cutoff as 90 days ago from current time (UTC)
- Deletes all entries with last_seen older than cutoff
- Logs operation results (info when deleted > 0, debug when 0 deleted)
- Configurable retention period via GEO_CACHE_RETENTION_DAYS constant

### Application Startup (startup.py)
- Register geo_cache_cleanup task in scheduler during app startup
- Placed after geo_cache_flush in task registration order

### Tests
- Add delete_stale_entries test cases covering:
  * Removal of old entries beyond cutoff
  * No deletion when all entries are recent
  * Empty table edge case
- Update existing test fixtures to include last_seen column
- Add full test suite for cleanup task registration and execution

### Documentation
- Architekture.md: Document cleanup task, update schema/diagram
- Backend-Development.md: Add retention policy documentation

## Behavior

When an IP is accessed, its last_seen is refreshed. After 90 days of no
access, an IP is purged by the nightly cleanup. On next encounter, the IP
is re-resolved from MaxMind MMDB or ip-api.com (if configured).

This is acceptable because:
1. Stale geolocation data may become inaccurate over time
2. Re-resolution cost is minimal compared to unbounded storage growth
3. Active IPs maintain fresh data through their last_seen updates

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 19:24:34 +02:00
32aad186c3 TASK-031: Enforce bcrypt 72-byte password limit
Bcrypt silently truncates passwords at 72 bytes, so passwords longer than 72
characters provide no additional security. This commit enforces the 72-byte
maximum across the authentication and setup flows.

Changes:
- Add max_length=72 to LoginRequest.password and SetupRequest.master_password
- Update field validator in SetupRequest to explicitly check max_length
- Add comprehensive tests for password length validation (6 new test cases)
- Document the 72-byte limitation in Features.md (master password options)
- Add new section 12 'Password Hashing' in Backend-Development.md explaining:
  - The bcrypt truncation behavior
  - Why the limit is enforced
  - The validation flow from frontend to backend
  - What happens when passwords exceed the limit

All existing tests pass, no regressions introduced.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 15:38:20 +02:00
1d91e24a88 TASK-030: Secure IP geolocation with MMDB-primary resolver
Make MaxMind GeoLite2-Country MMDB the primary IP resolver (local, encrypted)
and demote ip-api.com to optional fallback only (disabled by default).

Changes:
- Add geoip_allow_http_fallback config flag (default False) to Settings
- Refactor GeoCache.lookup() and lookup_batch() to try MMDB first
- Update startup.py to pass config flag and log security warning when HTTP enabled
- Update all 49 tests to reflect new MMDB-primary strategy
- Add comprehensive geoip configuration section to Backend-Development.md
- Update Architekture.md to show MMDB + optional HTTP in system dependencies
- Update .env.example with BANGUI_GEOIP_DB_PATH and HTTP fallback flag

Security impact:
- 99% of IP addresses (successful MMDB lookups) now stay local, encrypted
- HTTP-only IPs are cached for 5 minutes to minimize external calls
- Operators must explicitly enable HTTP fallback (security-conscious default)
- GDPR/CCPA compliance: no PII sent over unencrypted networks by default

Fixes TASK-030: Resolved plaintext IP transmission to ip-api.com

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 15:31:39 +02:00