54 Commits

Author SHA1 Message Date
65fe747cba feat(backend): add deprecation middleware and API versioning support
- Add deprecation middleware for warning headers on sunset endpoints
- Add jails_v2 router for API v2 migration path
- Update CI workflow with new test coverage
- Update API versioning documentation
- Remove completed tasks from Tasks.md
2026-05-04 00:03:52 +02:00
ae9313568e feat: enforce single-worker at startup
Fail with RuntimeError when WEB_CONCURRENCY or BANGUI_WORKERS > 1.

In-memory session cache, rate-limit windows, and runtime state are
process-local. Multi-worker silently causes stale limits, ghost sessions,
inconsistent status.

Skipped when TESTING=1.
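
The startup check described above could look roughly like the following. This is a minimal sketch, not the project's code: the function name `check_single_worker` and the handling of non-numeric values are assumptions; only the environment variable names, the `RuntimeError`, and the `TESTING=1` bypass come from the commit message.

```python
import os


def check_single_worker(environ=None):
    """Raise RuntimeError when more than one worker is configured.

    Hypothetical helper; the real check may live elsewhere and differ in detail.
    """
    env = os.environ if environ is None else environ
    if env.get("TESTING") == "1":
        return  # test runs skip the check
    for name in ("WEB_CONCURRENCY", "BANGUI_WORKERS"):
        raw = env.get(name, "").strip()
        if not raw:
            continue
        try:
            workers = int(raw)
        except ValueError:
            continue  # assumption: non-numeric values are ignored in this sketch
        if workers > 1:
            raise RuntimeError(
                f"{name}={workers} is not supported: session cache, rate-limit "
                "windows, and runtime state are process-local"
            )
```

Passing a plain dict instead of `os.environ` keeps the check easy to exercise in tests.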

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-03 20:33:23 +02:00
2df029f7e8 refactor(ban_service): extract _bans_by_country_load_data helper
Break up long function into focused helper. Load data logic separate from aggregation.
2026-05-03 17:00:34 +02:00
5058a50143 Refactor backend: fix geo cache cleanup, scheduler heartbeat, correlation middleware; update docs
2026-05-03 16:02:40 +02:00
b631c1c546 feat(backend): implement graceful shutdown for container stop
Graceful shutdown ensures in-flight operations complete before process exits:
- Lifespan shutdown handler drains pending tasks with 25s timeout
- Scheduler stops accepting new jobs immediately
- HTTP session, external logging, scheduler lock, DB conn closed cleanly
- 25s Python timeout leaves 5s margin before Docker's 30s SIGKILL
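
A drain step with a timeout, as listed above, might be sketched like this. The helper name `drain_tasks` is hypothetical and the real `_lifespan` handler may be structured differently; the 25-second default is taken from the commit message.

```python
import asyncio


async def drain_tasks(tasks, timeout=25.0):
    """Wait up to `timeout` seconds for tasks, then cancel whatever is left.

    Returns the (done, pending) sets reported by asyncio.wait.
    """
    tasks = set(tasks)
    if not tasks:
        return set(), set()  # asyncio.wait rejects an empty collection
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()
    if pending:
        # Let the cancelled tasks unwind before the process continues shutting down.
        await asyncio.gather(*pending, return_exceptions=True)
    return done, pending
```

Keeping the Python timeout below the container's `stop_grace_period` leaves time for the remaining cleanup steps before SIGKILL arrives.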

Files changed:
- backend/app/main.py: enhanced _lifespan shutdown with task drain
- Docker/Dockerfile.backend: documented signal handling in header
- Docker/docker-compose.yml: added stop_grace_period: 30s
- Docker/compose.prod.yml: added stop_grace_period: 30s
- Docs/Deployment.md: new Graceful Shutdown section with sequence table
- Docs/TROUBLESHOOTING.md: new Graceful Shutdown Issues section

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-02 22:47:10 +02:00
cc6dbcf3f0 feat: implement API versioning /api/v1/
- All backend routers moved to /api/v1/ prefix
- Frontend BASE_URL updated to /api/v1
- Setup redirect middleware updated to redirect to /api/v1/setup
- Health router path fixed: prefix=/api/v1/health, @router.get('')
- conftest.py: set server_status=online for test fixture
- Created Docs/API_VERSIONING.md with deprecation policy
- Updated Docs/Backend-Development.md with versioning section
- Updated Instructions.md curl examples

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-02 21:29:30 +02:00
1af67eb0ce Add Application Performance Monitoring (APM) with Prometheus metrics
- Backend: Implement Prometheus metrics collection
  - Add prometheus-client dependency
  - Create metrics utility module with HTTP request tracking counters, histograms, gauges
  - Implement MetricsMiddleware to track request latency, count, and active requests
  - Add /metrics endpoint to expose metrics in Prometheus text format
  - Normalize paths to prevent cardinality explosion (e.g., /api/{id} for UUIDs)
  - Exclude /metrics and /health from detailed tracking

- Frontend: Add web vitals and API metrics collection
  - Install web-vitals library (v4.0.0) for Core Web Vitals tracking
  - Create metrics utility module for FCP, LCP, CLS, INP, TTFB collection
  - Implement useTrackedFetch hook for automatic API call metrics (method, endpoint, status, duration)
  - Initialize web vitals tracking in App component on mount
  - Provide exportMetrics() for sending metrics to backend

- Testing:
  - Add comprehensive backend metrics tests (9 tests, 100% coverage)
  - Add comprehensive frontend metrics tests (10 tests)
  - All tests passing

- Documentation:
  - Expand Docs/Observability.md with complete APM section
  - Include metrics reference, integration examples (Prometheus, Datadog, NewRelic)
  - Add troubleshooting guide and best practices for cardinality management
  - Update Tasks.md to mark APM task as complete

Metrics exposed:
- bangui_http_requests_total: HTTP request count by method, endpoint, status
- bangui_http_request_duration_seconds: Request latency histogram
- bangui_http_active_requests: Active request gauge
- Web Vitals: CLS, FCP, INP, LCP, TTFB with ratings
- API metrics: endpoint, method, status, duration, timestamp
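
The path normalization mentioned above (collapsing UUIDs so that metric labels stay bounded) could be sketched as follows. The function names and the exact exclusion set are assumptions made for illustration; only the `{id}` placeholder and the `/metrics` and `/health` exclusions come from the commit message.

```python
import re

# Canonical 8-4-4-4-12 hexadecimal UUID form.
_UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
_EXCLUDED_PATHS = frozenset({"/metrics", "/health"})


def normalize_path(path):
    """Replace every UUID path segment with the literal '{id}'."""
    return "/".join(
        "{id}" if _UUID_RE.fullmatch(segment) else segment
        for segment in path.split("/")
    )


def should_track(path):
    """Return False for endpoints excluded from detailed tracking."""
    return path not in _EXCLUDED_PATHS
```

With this, every request for a specific resource lands in the same label value instead of creating one time series per identifier.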

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 18:33:14 +02:00
37078b742b Implement structured logging to centralized platforms (Datadog, Papertrail, ELK)
This commit adds support for shipping logs to external centralized logging platforms, addressing the MEDIUM priority task for structured logging infrastructure.

## Key Changes:

### 1. New Documentation: Docs/Observability.md
- Comprehensive guide to logging architecture and configuration
- Covers all three supported platforms (Datadog, Papertrail, Elasticsearch)
- Includes best practices, security considerations, and troubleshooting
- Documents sensitive data handling and compliance requirements

### 2. Core Implementation: app/utils/external_logging.py
- ExternalLogHandler: Abstract base class for non-blocking log delivery
- DatadogLogHandler: HTTP API integration with JSON payloads
- PapertrailLogHandler: Syslog protocol over TCP
- ElasticsearchLogHandler: Bulk API integration with NDJSON format
- Features:
  - Async buffering with configurable batch size and flush interval
  - Exponential backoff retry logic
  - Non-blocking delivery (never blocks application logic)
  - Proper error handling and internal logging
  - Lifecycle management (start/shutdown)

### 3. Configuration: app/config.py
- New Settings fields for external logging:
  - external_logging_enabled (default: False)
  - external_logging_provider (datadog/papertrail/elasticsearch)
  - external_logging_buffer_size (default: 1000)
  - external_logging_flush_interval_seconds (default: 5.0)
  - Provider-specific configuration (API keys, hosts, batch sizes)
- All fields have sensible defaults
- Full field validation and normalization

### 4. Integration: app/main.py
- Global _external_log_handler for application lifecycle
- _external_logging_processor: structlog processor for handler integration
- Updated _configure_logging(): Add handler to processor chain when enabled
- Updated _lifespan(): Initialize handler before startup, shutdown on termination

### 5. Tests: backend/tests/test_external_logging.py
- 20 comprehensive tests covering all handlers and factory
- Configuration validation tests
- All tests passing

## Design Decisions:

1. **Non-blocking Delivery**: External logging never blocks request handling.
   Failures are logged locally but don't impact application.

2. **Buffering Strategy**: In-memory buffer with configurable size prevents
   unbounded memory growth. When buffer fills, oldest logs are dropped with
   a warning.

3. **Retry Logic**: Transient failures (timeouts, 5xx errors) are retried
   with exponential backoff. Permanent failures (bad credentials) are logged
   and skipped.

4. **Disabled by Default**: External logging is opt-in via environment
   variables, maintaining backward compatibility with existing deployments.

5. **Provider Flexibility**: Support for multiple platforms allows users to
   choose based on their infrastructure (cloud-native, on-premise, etc).
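
Design decisions 2 and 3 can be illustrated with a small sketch. The class `DropOldestBuffer`, the function `backoff_delays`, and the base, factor, and cap values are all hypothetical; the commit only states that the buffer is bounded, drops the oldest entries when full, and that retries use exponential backoff.

```python
from collections import deque


class DropOldestBuffer:
    """Bounded in-memory buffer that discards the oldest record when full."""

    def __init__(self, max_size=1000):
        self._items = deque(maxlen=max_size)
        self.dropped = 0  # count of records lost to overflow

    def append(self, record):
        if len(self._items) == self._items.maxlen:
            self.dropped += 1  # deque(maxlen=...) evicts from the left on append
        self._items.append(record)

    def drain(self):
        """Return all buffered records in order and empty the buffer."""
        batch = list(self._items)
        self._items.clear()
        return batch


def backoff_delays(base=0.5, factor=2.0, attempts=4, cap=30.0):
    """Delays in seconds for successive retries, capped at `cap`."""
    return [min(cap, base * factor**attempt) for attempt in range(attempts)]
```

A real handler would presumably log a warning when `dropped` increases and sleep for each delay between delivery attempts.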

## Backward Compatibility:

- All new configuration fields have defaults
- External logging disabled by default
- No changes to existing logging behavior unless explicitly configured
- No new required dependencies

## Testing:

- All 20 new tests passing
- Existing tests unaffected (same count of passing tests)
- Configuration validation tested
- Handler creation and lifecycle management tested

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 18:25:26 +02:00
4f7316c484 Add unified RequestValidationError handler to unify error response schema
- Add RequestValidationError handler that converts Pydantic validation errors to unified ErrorResponse format
- Ensures all error responses return consistent schema: code, detail, metadata, correlation_id
- Add field_errors count and first_field location to metadata for validation errors
- Register handler in exception handler hierarchy before HTTPException handler
- Add comprehensive tests for validation error responses
- Update Backend-Development.md documentation to include correlation_id field and validation error details
- All 44 error-related tests pass (38 existing + 6 new validation tests)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 15:49:39 +02:00
400ab1a3f1 Add security headers middleware and documentation
- Add SecurityHeadersMiddleware to backend/app/main.py
  - Implements Content-Security-Policy: default-src 'self'
  - Implements X-Frame-Options: DENY (clickjacking protection)
  - Implements X-Content-Type-Options: nosniff (MIME-sniffing protection)
  - Implements X-XSS-Protection: 1; mode=block (browser XSS filters)
- Add CSP meta tag to frontend/index.html for defense-in-depth
- Create Docs/Security.md with comprehensive security headers documentation
- Add test suite (backend/tests/test_security_headers_middleware.py) with 5 tests
  - Tests verify headers are present on success and error responses
  - Tests ensure all four security headers are correctly set
- All existing tests continue to pass
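
For illustration, the four headers listed above can be attached by a plain ASGI wrapper. This is a sketch under assumptions: the class name `SecurityHeadersASGI` is hypothetical, and the project's `SecurityHeadersMiddleware` may be built on a framework base class instead.

```python
SECURITY_HEADERS = {
    "Content-Security-Policy": "default-src 'self'",
    "X-Frame-Options": "DENY",
    "X-Content-Type-Options": "nosniff",
    "X-XSS-Protection": "1; mode=block",
}


class SecurityHeadersASGI:
    """Append the security headers to every HTTP response start message."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        async def send_with_headers(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.extend(
                    (name.lower().encode("ascii"), value.encode("ascii"))
                    for name, value in SECURITY_HEADERS.items()
                )
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_headers)
```

Because the wrapper intercepts `http.response.start`, the headers are added to success and error responses alike.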

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 21:33:08 +02:00
3bd9848a08 Implement global rate limiter and refactor auth middleware
- Add global rate limiter utility with configurable limits and cleanup
- Move rate limiting logic to middleware for consistent application
- Update auth routes to use new rate limiter
- Add comprehensive tests for rate limiter functionality
- Update documentation with backend development guidelines and tasks

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 21:26:31 +02:00
ac53a56ae7 Update backend configuration and documentation
- Modified main.py with backend updates
- Updated Tasks.md documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 20:10:57 +02:00
2db635ae19 Fix exception handler overlap issue - add DomainError catch-all handler
**Problem:** Broad exception handlers created fragility where adding a new
DomainError subclass without explicit registration would silently fall through
to the generic exception handler, losing the specific error_code and metadata.

**Solution:**
1. Import DomainError in main.py for explicit handler registration
2. Fix type hints in exception handlers from 'Exception' to specific types
   - NotFoundError handler now typed as 'NotFoundError'
   - BadRequestError handler now typed as 'BadRequestError'
   - ConflictError handler now typed as 'ConflictError'
   - DomainError handler now typed as 'DomainError'
   - ServiceUnavailableError handler now typed as 'ServiceUnavailableError'
3. Add DomainError as an explicit catch-all handler in the registration chain
   - Positioned after specific handlers, before HTTPException
   - Any unregistered DomainError subclass now gets correct error_code + metadata
4. Document the exception handler hierarchy with detailed comments
5. Update Backend-Development.md with handler hierarchy documentation
6. Update Architekture.md section 2.2 with exception handler details
7. Fix test expectations in test_main.py to verify ErrorResponse format

**Impact:** Any new DomainError subclass now automatically gets the correct HTTP 500
status, error_code, and metadata, even if the developer forgets to register an explicit handler.
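
The catch-all behaviour can be pictured as a lookup along the exception's class hierarchy. The sketch below is a standalone model, not FastAPI code: the `HANDLERS` table, the `dispatch` function, the `error_code` values, and `QuotaExceededError` are hypothetical names used only to show why an unregistered subclass ends up at the `DomainError` handler rather than the generic one.

```python
class DomainError(Exception):
    error_code = "domain_error"


class NotFoundError(DomainError):
    error_code = "not_found"


class QuotaExceededError(DomainError):
    """Hypothetical new subclass with no handler of its own."""

    error_code = "quota_exceeded"


# Most specific handlers first is not required: lookup follows the MRO.
HANDLERS = {
    NotFoundError: lambda exc: (404, exc.error_code),
    DomainError: lambda exc: (500, exc.error_code),  # catch-all for the family
    Exception: lambda exc: (500, "internal_error"),  # generic fallback
}


def dispatch(exc):
    """Pick the handler registered for the closest ancestor class."""
    for cls in type(exc).__mro__:
        handler = HANDLERS.get(cls)
        if handler is not None:
            return handler(exc)
    raise exc
```

Without the `DomainError` entry, `QuotaExceededError` would fall through to the `Exception` handler and lose its specific `error_code`.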

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 19:44:43 +02:00
3d1a6f5538 Implement frontend and backend observability alignment
Align frontend and backend error observability with correlation IDs and
structured telemetry for distributed tracing across systems.

Backend changes:
- Add CorrelationIdMiddleware to generate/extract correlation IDs
- Include correlation_id in all ErrorResponse objects
- Store correlation ID in structlog contextvars for automatic inclusion in logs
- Add correlation ID to response headers (X-Correlation-ID)

Frontend changes:
- API client automatically generates session-scoped UUID4 and includes
  X-Correlation-ID header in all requests
- Extract correlation ID from API error responses
- Update error handlers to use telemetry with correlation IDs
- Add telemetry logging to ErrorBoundary, PageErrorBoundary, SectionErrorBoundary
- Implement redaction utilities for privacy-safe logging of sensitive data

Documentation:
- Add observability guidelines to Web-Development.md
  * Correlation ID usage patterns
  * Privacy & security best practices
  * Telemetry event structure
  * Redaction utilities for sensitive data
- Add distributed tracing architecture section to Architecture.md
  * Correlation ID flow across frontend/backend
  * Example troubleshooting scenario
  * Implementation details for future enhancements

Testing:
- Add comprehensive tests for correlation middleware
- Update error boundary tests to verify telemetry integration
- Verify TypeScript and ESLint pass with no warnings

Fixes: Issue #40 - Frontend and backend observability are not aligned

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 18:32:19 +02:00
187cd8250d Implement database-backed scheduler lock for multi-worker safety
Enforce single-executor safety regardless of process launcher through a
robust database-backed lock mechanism that works reliably in container
orchestration environments.

Key changes:
1. Add scheduler_lock table to database schema (migration 4)
   - Singleton row (id=1) prevents concurrent execution
   - Stores PID, hostname, creation timestamp, heartbeat timestamp
   - Atomic transaction prevents race conditions

2. Create scheduler lock utility (app/utils/scheduler_lock.py)
   - acquire_scheduler_lock(): Atomically acquire or fail
   - release_scheduler_lock(): Clean up on shutdown
   - update_scheduler_lock_heartbeat(): Keep lock alive (every 10 seconds)
   - get_scheduler_lock_info(): Debug/inspect lock status
   - Stale lock detection: TTL-based (60 second expiry)

3. Reorder startup DAG stages
   - DATABASE now comes first (required for lock acquisition)
   - WORKER_MODE depends on DATABASE (performs lock check after initialization)
   - Maintains all other stage dependencies intact

4. Update startup process (app/startup.py)
   - Replace _check_single_worker_mode() with two-tier check:
     * Fast check: BANGUI_WORKERS env var (if explicitly set to >1)
     * Authoritative check: Database lock (catches misconfiguration)
   - Return startup_db from startup_shared_resources() for lock management

5. Register scheduler lock heartbeat task
   - New task: scheduler_lock_heartbeat (app/tasks/scheduler_lock_heartbeat.py)
   - Updates lock heartbeat every 10 seconds (keeps lock alive)
   - Prevents false positives from temporary load spikes

6. Add lock release to lifespan shutdown (app/main.py)
   - Release lock before closing database
   - Allows other instances to acquire during rolling deployments
   - Graceful handoff between instances

7. Comprehensive test coverage (backend/tests/test_scheduler_lock.py)
   - Lock acquisition success and failure cases
   - Stale lock cleanup on startup
   - Lock release and heartbeat updates
   - Full lifecycle: acquire → heartbeat → release

8. Update documentation (Docs/Architekture.md § 9.3)
   - Explain single-executor requirement
   - Document database-backed locking mechanism
   - Compare with alternative approaches (filesystem, env var)
   - Include troubleshooting guide
   - Container orchestration examples (Docker, Kubernetes, systemd)

Why database-backed instead of filesystem?
   - Atomicity: SQLite transactions prevent TOCTOU race windows
   - Container-safe: Works across containers with shared DB volumes
   - No NFS/SMB edge cases
   - Timestamp-based stale detection (PID reuse is unreliable)
   - More reliable in rolling deployments

Benefits:
   - Works with any process manager (uvicorn, gunicorn, etc.)
   - Handles simultaneous startup attempts correctly
   - Automatic failover on instance crash (stale lock cleanup)
   - Clear error messages with troubleshooting steps
   - No environment variable required (lock is authoritative)
   - Scales to multi-worker deployments if combined with external job store
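
A singleton-row lock with TTL-based stale detection might be sketched as below. This uses the synchronous `sqlite3` module for brevity, whereas the project appears to use `aiosqlite`; the column names and the helper names `init_lock_table`, `try_acquire`, and `touch_heartbeat` are assumptions and are not the functions listed in the commit.

```python
import sqlite3
import time

TTL_SECONDS = 60  # stale-lock expiry stated in the commit message


def init_lock_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scheduler_lock ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), "
        "pid INTEGER NOT NULL, hostname TEXT NOT NULL, "
        "created_at REAL NOT NULL, heartbeat_at REAL NOT NULL)"
    )


def try_acquire(conn, pid, hostname, now=None):
    """Return True if this process now holds the lock, False otherwise."""
    now = time.time() if now is None else now
    try:
        with conn:  # stale cleanup and insert commit (or roll back) together
            conn.execute(
                "DELETE FROM scheduler_lock WHERE heartbeat_at < ?",
                (now - TTL_SECONDS,),
            )
            conn.execute(
                "INSERT INTO scheduler_lock "
                "(id, pid, hostname, created_at, heartbeat_at) "
                "VALUES (1, ?, ?, ?, ?)",
                (pid, hostname, now, now),
            )
    except sqlite3.IntegrityError:
        return False  # a live holder already owns the singleton row
    return True


def touch_heartbeat(conn, pid, now=None):
    """Refresh the heartbeat so the lock is not treated as stale."""
    now = time.time() if now is None else now
    with conn:
        conn.execute(
            "UPDATE scheduler_lock SET heartbeat_at = ? WHERE id = 1 AND pid = ?",
            (now, pid),
        )
```

The primary-key constraint does the mutual exclusion: a second insert fails while a fresh row exists and succeeds once the heartbeat is older than the TTL.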

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 20:10:53 +02:00
bc4ba703f0 Fix #34: Replace setup redirect allowlist prefix matching with explicit allowlist
- Replace fragile startswith() matching with explicit path matching
- Split allowlist into _EXACT_ALLOWED (exact paths) and _PREFIX_ALLOWED (prefixes)
- Prefix paths MUST end with '/' to prevent matching unintended paths like /api/setup-debug
- Paths correctly matched: /api/setup, /api/health, /api/docs, /api/redoc, /api/openapi.json, /api/setup/timezone
- Paths correctly blocked: /api/setup-debug, /api/setup123, /api/jails
- Add comprehensive Setup Guard Route Policy documentation to Backend-Development.md
- Update line numbers in documentation to reflect current implementation

This prevents future route additions from accidentally bypassing the setup guard.
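
The split between exact and prefix matching can be sketched as follows. The names `_EXACT_ALLOWED` and `_PREFIX_ALLOWED` come from the commit message; their contents here are inferred from the matched and blocked paths listed above, and `is_setup_allowed` is a hypothetical helper name.

```python
_EXACT_ALLOWED = frozenset(
    {
        "/api/setup",
        "/api/health",
        "/api/docs",
        "/api/redoc",
        "/api/openapi.json",
    }
)

# Every prefix ends with '/' so that '/api/setup-debug' cannot match.
_PREFIX_ALLOWED = ("/api/setup/",)


def is_setup_allowed(path):
    """Return True if the path may bypass the setup redirect."""
    return path in _EXACT_ALLOWED or path.startswith(_PREFIX_ALLOWED)
```

`str.startswith` accepts a tuple, so adding another prefix later only means extending `_PREFIX_ALLOWED`.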

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 19:45:42 +02:00
1e2576af2a ## 27) Error response body shape is inconsistent
2026-04-28 22:28:02 +02:00
afc1e44e99 Implement centralized exception handling and validation
- Add custom exception classes for structured error handling
- Implement global exception handlers in FastAPI application
- Add comprehensive request/response validation
- Create exception contract tests for validation
- Update backend development documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-27 18:52:12 +02:00
b9289a3b0e Fix: Remove socket path leak in fail2ban error responses
- Change _fail2ban_connection_handler() to return generic message instead of
  leaking socket path in HTTP 502 response body
- Change _fail2ban_protocol_handler() to return generic message instead of
  leaking raw exception details in HTTP 502 response body
- Full error details are still logged server-side (error=str(exc)) for debugging
- Update Backend-Development.md with error message hygiene section explaining
  the pattern: generic user-friendly messages in HTTP responses, full details
  in server logs only

Fixes TASK-029: Fail2BanConnectionError leaks socket path in HTTP error responses

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 15:21:35 +02:00
df841c21e4 TASK-026: Disable API docs in production, protect with BANGUI_ENABLE_DOCS setting
Addresses security concern where FastAPI's default behavior exposes interactive
API documentation (/docs, /redoc) without authentication, allowing attackers to
enumerate endpoints and understand API schemas.

Changes:
- Add BANGUI_ENABLE_DOCS boolean setting (default: false) to Settings
- Modify create_app() to conditionally set docs_url, redoc_url, openapi_url
- Add docs endpoints to SetupRedirectMiddleware allowlist (/api/docs, /api/redoc, /api/openapi.json)
- Set BANGUI_ENABLE_DOCS=true in Docker/compose.debug.yml for development
- Production compose files leave it unset (defaults to false, docs disabled)
- Add comprehensive tests for docs configuration
- Document the new setting in Backend-Development.md

Security Impact:
- API documentation is now disabled by default in production
- Development environments can enable docs by setting BANGUI_ENABLE_DOCS=true
- Docs endpoints are inaccessible in production without manual configuration

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 15:09:51 +02:00
c2348d7075 Refactor backend architecture and update documentation
- Add CSRF protection middleware implementation
- Update API client with improved configuration
- Enhance documentation for backend development
- Add architecture documentation updates
- Reorganize and clean up task documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 14:52:23 +02:00
ea4c7c2f85 Implement login endpoint rate limiting (TASK-007)
- Add in-memory rate limiter with per-IP deque tracking of attempt timestamps
- Limit login attempts to 5 per 60 seconds per IP, return 429 on excess
- Add Retry-After header to rate limit responses
- Implement IP extraction utility with proxy trust validation (prevent X-Forwarded-For spoofing)
- Integrate rate limiter into auth router and dependencies
- Add 10-second asyncio.sleep on failed login attempts to further slow brute-force
- Add comprehensive tests for rate limiting (9 new tests, all passing)
- Update Features.md to document login rate limiting
- Update Backend-Development.md with rate limiting conventions and design patterns
- Fix test infrastructure issues: update password to meet complexity requirements
- Fix TestValidateSession tests to use Bearer token authentication
- All tests passing: 23 auth tests + full test suite coverage
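
A sliding-window limiter with a per-IP deque, as described above, might look like this. The class name `LoginRateLimiter` and the `hit` method are assumptions; the limits (5 attempts per 60 seconds) and the idea of returning a `Retry-After` value come from the commit message.

```python
import math
import time
from collections import defaultdict, deque


class LoginRateLimiter:
    """Allow at most `max_attempts` per `window_seconds` for each client IP."""

    def __init__(self, max_attempts=5, window_seconds=60.0):
        self.max_attempts = max_attempts
        self.window = window_seconds
        self._attempts = defaultdict(deque)  # ip -> timestamps, oldest first

    def hit(self, ip, now=None):
        """Record an attempt; return (allowed, retry_after_seconds)."""
        now = time.monotonic() if now is None else now
        attempts = self._attempts[ip]
        # Drop timestamps that have left the window.
        while attempts and now - attempts[0] >= self.window:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            retry_after = math.ceil(self.window - (now - attempts[0]))
            return False, max(1, retry_after)
        attempts.append(now)
        return True, 0
```

When `hit` returns `False`, the caller would respond with HTTP 429 and put the second value in the `Retry-After` header.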

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 12:40:52 +02:00
e57d19fd76 T-05: Remove app.state mutation from _build_app_context
Move session cache initialization from per-request _build_app_context to
startup lifespan handler. The session cache type is now decided once at app
startup based on settings, making _build_app_context pure (read-only).

Changes:
- Move cache initialization logic to new _update_session_cache() in main.py
- Call _update_session_cache() during lifespan startup to initialize cache
- Remove three if/elif/elif branches mutating state.session_cache from _build_app_context
- Add cache swap logic to set_runtime_settings() in runtime_state.py to handle
  runtime settings changes (e.g., setup wizard updates)
- Keep app.state.session_cache initialization in create_app() for test compatibility

This ensures:
- _build_app_context is pure and doesn't mutate app state on each request
- Session cache configuration decisions are centralized at startup
- Settings changes during runtime (via setup wizard) also trigger cache swap
- Cache initialization logic is isolated in one place

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-25 18:23:08 +02:00
5480dce221 refactor: Remove duplicate router-level exception helpers
All routers now let domain exceptions propagate to the global handlers in main.py
instead of catching and converting them to HTTPException. This eliminates:

- Duplicate exception-to-HTTP-status mappings across 8 routers
- Duplicate helper functions (_bad_gateway, _not_found, _conflict, etc.)
- Inconsistent error response formats

Changes:
- Removed all try/except blocks from routers that catch domain exceptions
- Removed duplicate helper functions from all routers
- Added missing exception handlers to main.py for:
  * ActionNameError
  * FilterNameError
  * JailNameError
  * JailNotFoundInConfigError
  * FilterInvalidRegexError
- Removed unused imports from affected routers

All domain exceptions now propagate to the single authoritative mapping in
main.py, ensuring consistent error codes, messages, and logging across the API.

Affected routers:
- action_config.py: Removed _action_not_found, _bad_request, _not_found helpers
- bans.py: Removed try/except in ban/unban endpoints
- config_misc.py: Removed try/except blocks
- file_config.py: Removed 6 try/except blocks and _service_unavailable helper
- filter_config.py: Removed try/except blocks
- geo.py: Removed try/except in lookup_ip endpoint
- jail_config.py: Removed try/except blocks
- jails.py: Removed try/except blocks
- server.py: Removed try/except blocks

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-23 16:00:37 +02:00
04b2e2f700 Add global domain exception handlers in main.py
Register consistent HTTP error mappings for common domain exceptions and add regression tests for 404/400/500 handler behavior.
2026-04-17 16:42:18 +02:00
328f3575e2 Move Fail2Ban exceptions into central app.exceptions module
2026-04-15 10:22:48 +02:00
53cdd63b6a Add no-op session cache when session cache is disabled
Use NoOpSessionCache in backend/app/main.py and dynamically switch cache implementation in backend/app/dependencies.py so disabled cache mode remains safe while get_session_cache always returns a valid object.
2026-04-14 12:14:50 +02:00
1dfc17f4f5 Replace process-local session cache with pluggable session cache backend
2026-04-10 19:22:02 +02:00
ff92733f90 Move runtime application state into a dedicated runtime state manager
2026-04-10 19:07:35 +02:00
1fc04ed978 Extract startup resource initialization from main.py
Move lifespan startup logic into app.startup and remove local imports from app.main._lifespan. Mark startup wiring issue as done.
2026-04-07 20:48:29 +02:00
95f72018f7 Add backend lifecycle regression tests and fix lifespan cleanup
2026-04-06 20:56:57 +02:00
1a7096b276 Add environment-driven CORS settings and backend regression tests
2026-04-06 20:42:33 +02:00
89ab41cc9e Convert setup guard to startup-driven cache and update tests
2026-04-06 20:38:15 +02:00
3ccfc20c64 Harden fail2ban integration and mark task complete
2026-04-06 20:20:14 +02:00
42c030c706 Refactor backend to use request-scoped SQLite connections
2026-04-05 23:14:46 +02:00
9f05da2d4d Complete history archive support for dashboard/map data and mark task finished
Add source=archive option for dashboard endpoints and history service; update Docs/Tasks.md; include archive branch for list_bans, bans_by_country, ban_trend, bans_by_jail; tests for archive paths.
2026-03-28 12:39:47 +01:00
c9e688cc52 Refactor geo cache persistence into repository + remove raw SQL from tasks/main, update task list
2026-03-22 14:24:24 +01:00
e98fd1de93 Fix global version handling and unify app version across backend/frontend
2026-03-17 09:06:42 +01:00
57cf93b1e5 Add ensure_jail_configs startup check for required jail config files
On startup BanGUI now verifies that the four fail2ban jail config files
required by its two custom jails (manual-Jail and blocklist-import) are
present in `$fail2ban_config_dir/jail.d`.  Any missing file is created
with the correct default content; existing files are never overwritten.

Files managed:
  - manual-Jail.conf        (enabled=false template)
  - manual-Jail.local       (enabled=true override)
  - blocklist-import.conf   (enabled=false template)
  - blocklist-import.local  (enabled=true override)

The check runs in the lifespan hook immediately after logging is
configured, before the database is opened.
2026-03-16 16:26:39 +01:00
21753c4f06 Fix Stage 0 bootstrap and startup regression
Task 0.1: Create database parent directory before connecting
- main.py _lifespan now calls Path(database_path).parent.mkdir(parents=True,
  exist_ok=True) before aiosqlite.connect() so the app starts cleanly on
  a fresh Docker volume with a nested database path.

Task 0.2: SetupRedirectMiddleware redirects when db is None
- Guard now reads: if db is None or not is_setup_complete(db)
  A missing database (startup still in progress) is treated as setup not
  complete instead of silently allowing all API routes through.

Task 0.3: SetupGuard redirects to /setup on API failure
- .catch() handler now sets status to 'pending' instead of 'done'.
  A crashed backend cannot serve protected routes; conservative fallback
  is to redirect to /setup.

Task 0.4: SetupPage shows spinner while checking setup status
- Added 'checking' boolean state; full-screen Spinner is rendered until
  getSetupStatus() resolves, preventing form flash before redirect.
- Added console.warn in catch block; cleanup return added to useEffect.

Also: remove unused type: ignore[call-arg] from config.py.

Tests: 18 backend tests pass; 117 frontend tests pass.
2026-03-15 18:05:53 +01:00
ea35695221 Add better jail configuration: file CRUD, enable/disable, log paths
Task 4 (Better Jail Configuration) implementation:
- Add fail2ban_config_dir setting to app/config.py
- New file_config_service: list/view/edit/create jail.d, filter.d, action.d files
  with path-traversal prevention and 512 KB content size limit
- New file_config router: GET/PUT/POST endpoints for jail files, filter files,
  and action files; PUT .../enabled for toggle on/off
- Extend config_service with delete_log_path() and add_log_path()
- Add DELETE /api/config/jails/{name}/logpath and POST /api/config/jails/{name}/logpath
- Extend geo router with re-resolve endpoint; add geo_re_resolve background task
- Update blocklist_service with revised scheduling helpers
- Update Docker compose files with BANGUI_FAIL2BAN_CONFIG_DIR env var and
  rw volume mount for the fail2ban config directory
- Frontend: new Jail Files, Filters, Actions tabs in ConfigPage; file editor
  with accordion-per-file, editable textarea, save/create; add/delete log paths
- Frontend: types in types/config.ts; API calls in api/config.ts and api/endpoints.ts
- 63 new backend tests (test_file_config_service, test_file_config, test_geo_re_resolve)
- 6 new frontend tests in ConfigPageLogPath.test.tsx
- ruff, mypy --strict, tsc --noEmit, eslint: all clean; 617 backend tests pass
2026-03-12 20:08:33 +01:00
d931e8c6a3 Reduce per-request DB overhead (Task 4)
- Cache setup_completed flag in app.state._setup_complete_cached after
  first successful is_setup_complete() call; all subsequent API requests
  skip the DB query entirely (one-way transition, cleared on restart).
- Add in-memory session token TTL cache (10 s) in require_auth; the second
  request with the same token within the window skips session_repo.get_session.
- Call invalidate_session_cache() on logout so revoked tokens are evicted
  immediately rather than waiting for TTL expiry.
- Add clear_session_cache() for test isolation.
- 5 new tests covering the cached fast-path for both optimisations.
- 460 tests pass, 83% coverage, zero ruff/mypy warnings.
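
The session token TTL cache could be sketched as a small dictionary keyed by token. The class name `SessionTTLCache` and its method names are hypothetical (the commit names `invalidate_session_cache()` and `clear_session_cache()` as module-level helpers); only the 10-second TTL and the invalidate-on-logout behaviour are taken from the message.

```python
import time


class SessionTTLCache:
    """Cache validated sessions for a short time to skip repeated DB lookups."""

    def __init__(self, ttl_seconds=10.0):
        self.ttl = ttl_seconds
        self._entries = {}  # token -> (session, expires_at)

    def get(self, token, now=None):
        now = time.monotonic() if now is None else now
        entry = self._entries.get(token)
        if entry is None:
            return None
        session, expires_at = entry
        if now >= expires_at:
            del self._entries[token]  # expired: force a fresh DB lookup
            return None
        return session

    def put(self, token, session, now=None):
        now = time.monotonic() if now is None else now
        self._entries[token] = (session, now + self.ttl)

    def invalidate(self, token):
        """Evict one token immediately, for example on logout."""
        self._entries.pop(token, None)

    def clear(self):
        """Drop everything, for example between tests."""
        self._entries.clear()
```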
2026-03-10 19:16:00 +01:00
44a5a3d70e Fix geo cache write performance: batch commits, read-only GETs, dirty flush
- Remove per-IP db.commit() from _persist_entry() and _persist_neg_entry();
  add a single commit after the full lookup_batch() chunk loop instead.
  Reduces commits from ~5,200 to 1 per bans/by-country request.

- Remove db dependency from GET /api/dashboard/bans and
  GET /api/dashboard/bans/by-country; pass app_db=None so no SQLite
  writes occur during read-only requests.

- Add _dirty set to geo_service; _store() marks resolved IPs dirty.
  New flush_dirty(db) batch-upserts all dirty entries in one transaction.
  New geo_cache_flush APScheduler task flushes every 60 s so geo data
  is persisted without blocking requests.
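
The dirty-set approach can be illustrated with a small sketch: lookups mark entries dirty in memory, and a periodic job writes them in one transaction. The `GeoCacheSketch` class and the two-column table layout are assumptions; the synchronous `sqlite3` module stands in for the project's async database access.

```python
import sqlite3


class GeoCacheSketch:
    """In-memory geo cache that remembers which entries still need persisting."""

    def __init__(self):
        self._cache = {}     # ip -> country_code
        self._dirty = set()  # ips resolved since the last flush

    def store(self, ip, country_code):
        self._cache[ip] = country_code
        self._dirty.add(ip)

    def flush_dirty(self, conn):
        """Write all dirty entries in a single transaction; return the count."""
        if not self._dirty:
            return 0
        batch = list(self._dirty)
        rows = [(ip, self._cache[ip]) for ip in batch]
        with conn:  # one commit for the whole batch
            conn.executemany(
                "INSERT OR REPLACE INTO geo_cache (ip, country_code) VALUES (?, ?)",
                rows,
            )
        self._dirty.difference_update(batch)  # clear only after a successful commit
        return len(rows)
```

A scheduler job calling `flush_dirty` every 60 seconds would keep request handlers free of SQLite writes.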
2026-03-10 18:45:58 +01:00
12a859061c Fix missing country: neg cache, geoip2 fallback, re-resolve endpoint
- Add 5-min negative cache (_neg_cache) so failing IPs are throttled
  rather than hammering the API on every request
- Add MaxMind GeoLite2 fallback (init_geoip / _geoip_lookup) that fires
  when ip-api fails; controlled by BANGUI_GEOIP_DB_PATH env var
- Fix lookup_batch bug: failed API results were stored in positive cache
- Add _persist_neg_entry: INSERT OR IGNORE into geo_cache with NULL
  country_code so re-resolve can find historically failed IPs
- Add POST /api/geo/re-resolve: clears neg cache, batch-retries all
  geo_cache rows with country_code IS NULL, returns resolved/total count
- BanTable + MapPage: wrap the country — placeholder in a Fluent UI
  Tooltip explaining the retry behaviour
- Add geoip2>=4.8.0 dependency; geoip_db_path config setting
- Tests: add TestNegativeCache (4), TestGeoipFallback (4), TestReResolve (4)
2026-03-07 20:42:34 +01:00
ddfc8a0b02 Optimise geo lookup and aggregation for 10k+ IPs
- Add persistent geo_cache SQLite table (db.py)
- Rewrite geo_service: batch API (100 IPs/call), two-tier cache,
  no caching of failed lookups so they are retried
- Pre-warm geo cache from DB on startup (main.py lifespan)
- Rewrite bans_by_country: SQL GROUP BY ip aggregation + lookup_batch
  instead of 2000-row fetch + asyncio.gather individual calls
- Pre-warm geo cache after blocklist import (blocklist_service)
- Add 300ms debounce to useMapData hook to cancel stale requests
- Add perf benchmark asserting <2s for 10k bans
- Add seed_10k_bans.py script for manual perf testing
2026-03-07 20:28:51 +01:00
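Note on ddfc8a0b02: the "100 IPs/call" batching could be implemented with a small chunking helper like the one below. This is a sketch under assumptions: the helper name `chunked` and the `BATCH_SIZE` constant are hypothetical, and the actual HTTP call to the geo API is left out.

```python
from typing import Iterator, Sequence

BATCH_SIZE = 100  # IPs per API call, per the commit message


def chunked(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

For 10,000 unique IPs this gives 100 batch requests instead of 10,000 individual ones.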
19bb94ee47 Fix fail2ban-master path resolution for Docker container
In the Docker image, the app source is copied to /app/app/ (not
backend/app/), so parents[2] resolved to '/' instead of /app.
This left the fail2ban package absent from sys.path, causing every
pickle.loads() call on socket responses to raise:

  ModuleNotFoundError: No module named 'fail2ban'

Replace the hardcoded parents[2] with a walk-up search that iterates
over all ancestors until it finds a fail2ban-master/ sibling directory.
Works correctly in both local dev and Docker without environment-specific
path magic.
2026-03-01 20:48:59 +01:00
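Note on 19bb94ee47: the walk-up search that replaces `parents[2]` might be sketched as follows. The function name `find_sibling_dir` is hypothetical, and the sketch assumes that "sibling" means a `fail2ban-master/` directory located directly inside one of the ancestors of the starting file.

```python
from pathlib import Path
from typing import Optional


def find_sibling_dir(start: Path, name: str = "fail2ban-master") -> Optional[Path]:
    """Return the first directory called `name` found inside any ancestor of `start`."""
    for ancestor in start.resolve().parents:
        candidate = ancestor / name
        if candidate.is_dir():
            return candidate
    return None
```

A caller would typically pass `Path(__file__)` and, if a directory is found, add it to `sys.path` before importing code that unpickles fail2ban socket responses. Because the search does not depend on a fixed depth, the same code works for `backend/app/...` locally and `/app/app/...` in the container.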
1cdc97a729 Stage 11: polish, cross-cutting concerns & hardening
- 11.1 MainLayout health indicator: warning MessageBar when fail2ban offline
- 11.2 formatDate utility + TimezoneProvider + GET /api/setup/timezone
- 11.3 Responsive sidebar: auto-collapse <640px, media query listener
- 11.4 PageFeedback (PageLoading/PageError/PageEmpty), BanTable updated
- 11.5 prefers-reduced-motion: disable sidebar transition
- 11.6 WorldMap ARIA: role/tabIndex/aria-label/onKeyDown for countries
- 11.7 Health transition logging (fail2ban_came_online/went_offline)
- 11.8 Global handlers: Fail2BanConnectionError/ProtocolError -> 502
- 11.9 379 tests pass, 82% coverage, ruff+mypy+tsc+eslint clean
- Timezone endpoint: setup_service.get_timezone, 5 new tests
2026-03-01 15:59:06 +01:00
1efa0e973b Stage 10: external blocklist importer — backend + frontend
- blocklist_repo.py: CRUD for blocklist_sources table
- import_log_repo.py: add/list/get-last log entries
- blocklist_service.py: source CRUD, preview, import (download/validate/ban),
  import_all, schedule get/set/info
- blocklist_import.py: APScheduler task (hourly/daily/weekly schedule triggers)
- blocklist.py router: 9 endpoints (list/create/update/delete/preview/import/
  schedule-get+put/log)
- blocklist.py models: ScheduleFrequency (StrEnum), ScheduleConfig, ScheduleInfo,
  ImportSourceResult, ImportRunResult, PreviewResponse
- 59 new tests (18 repo + 19 service + 22 router); 374 total pass
- ruff clean, mypy clean for Stage 10 files
- types/blocklist.ts, api/blocklist.ts, hooks/useBlocklist.ts
- BlocklistsPage.tsx: source management, schedule picker, import log table
- Frontend tsc + ESLint clean
2026-03-01 15:33:24 +01:00
b8f3a1c562 Stage 9: ban history — backend service, router, frontend history page
- history.py models: HistoryBanItem, HistoryListResponse, IpTimelineEvent, IpDetailResponse
- history_service.py: list_history() with dynamic WHERE clauses (range/jail/ip
  prefix/all-time), get_ip_detail() with timeline aggregation
- history.py router: GET /api/history + GET /api/history/{ip} (404 for unknown)
- Fixed latent bug in ban_service._parse_data_json: json.loads('null') -> None
  -> AttributeError; now checks isinstance(parsed, dict) before assigning obj
- 317 tests pass (27 new), ruff + mypy clean (46 files)
- types/history.ts, api/history.ts, hooks/useHistory.ts created
- HistoryPage.tsx: filter bar (time range/jail/IP), DataGrid table,
  high-ban-count row highlighting, per-IP IpDetailView with timeline,
  pagination
- Frontend tsc + ESLint clean (0 errors/warnings)
- Tasks.md Stage 9 marked done
2026-03-01 15:09:22 +01:00
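Note on b8f3a1c562: the `json.loads('null')` bug arises because `null` is valid JSON and decodes to `None`, so a later `.get(...)` raises `AttributeError`. A guard along the lines below avoids it. The function name and the choice to return an empty dict for non-object or malformed input are assumptions; the real `_parse_data_json` may behave differently.

```python
import json
from typing import Any


def parse_data_json(raw: str) -> dict:
    """Decode `raw` and return it only if it is a JSON object, else an empty dict."""
    try:
        parsed: Any = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return {}
    # 'null', numbers, strings and arrays are valid JSON but not dicts.
    return parsed if isinstance(parsed, dict) else {}
```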
7f81f0614b Stage 7: configuration view — backend service, routers, tests, and frontend
- config_service.py: read/write jail config via asyncio.gather, global
  settings, in-process regex validation, log preview via _read_tail_lines
- server_service.py: read/write server settings, flush logs
- config router: 9 endpoints for jail/global config, regex-test,
  logpath management, log preview
- server router: GET/PUT settings, POST flush-logs
- models/config.py expanded with JailConfig, GlobalConfigUpdate,
  LogPreview* models
- 285 tests pass (68 new), ruff clean, mypy clean (44 files)
- Frontend: types/config.ts, api/config.ts, hooks/useConfig.ts,
  ConfigPage.tsx full implementation (Jails accordion editor,
  Global config, Server settings, Regex Tester with preview)
- Fixed pre-existing frontend lint: JSX.Element → React.JSX.Element
  (10 files), void/promise patterns in useServerStatus + useJails,
  no-misused-spread in client.ts, eslint.config.ts self-excluded
2026-03-01 14:37:55 +01:00
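Note on 7f81f0614b: "in-process regex validation" can be as simple as attempting to compile the pattern and reporting the error. The sketch below is an assumption about the general approach, not the project's code; the name `validate_regex` is hypothetical, and fail2ban-specific tags such as `<HOST>` are not expanded here (Python's `re` treats them as literal text).

```python
import re
from typing import Optional


def validate_regex(pattern: str) -> Optional[str]:
    """Return None if `pattern` compiles, otherwise the compiler's error message."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None
```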