- Add deprecation middleware for warning headers on sunset endpoints
- Add jails_v2 router for API v2 migration path
- Update CI workflow with new test coverage
- Update API versioning documentation
- Remove completed tasks from Tasks.md
Fail with RuntimeError when WEB_CONCURRENCY or BANGUI_WORKERS is greater than 1.
The in-memory session cache, rate-limit windows, and runtime state are
process-local, so running multiple workers silently causes stale limits,
ghost sessions, and inconsistent status.
The check is skipped when TESTING=1.
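
A minimal sketch of such a guard, assuming both variables are read straight from the environment. The function name `assert_single_worker`, the error text, and the choice to ignore non-numeric values are hypothetical and not taken from the codebase.

```python
import os
from typing import Mapping


def assert_single_worker(env: Mapping[str, str] = os.environ) -> None:
    """Raise RuntimeError if more than one worker process is configured."""
    # The guard is skipped under test, as described above.
    if env.get("TESTING") == "1":
        return
    for name in ("WEB_CONCURRENCY", "BANGUI_WORKERS"):
        raw = env.get(name, "").strip()
        # Assumption: unset or non-numeric values are treated as "one worker".
        if raw.isdigit() and int(raw) > 1:
            raise RuntimeError(
                f"{name}={raw}: in-memory state is process-local; run a single worker"
            )
```

Passing the environment as a parameter keeps the check easy to exercise without mutating `os.environ`.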
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- All backend routers moved to /api/v1/ prefix
- Frontend BASE_URL updated to /api/v1
- Setup redirect middleware updated to redirect to /api/v1/setup
- Health router path fixed: prefix=/api/v1/health, @router.get('')
- conftest.py: set server_status=online for test fixture
- Created Docs/API_VERSIONING.md with deprecation policy
- Updated Docs/Backend-Development.md with versioning section
- Updated Instructions.md curl examples
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit adds support for shipping logs to external centralized logging platforms, addressing the MEDIUM priority task for structured logging infrastructure.
## Key Changes:
### 1. New Documentation: Docs/Observability.md
- Comprehensive guide to logging architecture and configuration
- Covers all three supported platforms (Datadog, Papertrail, Elasticsearch)
- Includes best practices, security considerations, and troubleshooting
- Documents sensitive data handling and compliance requirements
### 2. Core Implementation: app/utils/external_logging.py
- ExternalLogHandler: Abstract base class for non-blocking log delivery
- DatadogLogHandler: HTTP API integration with JSON payloads
- PapertrailLogHandler: Syslog protocol over TCP
- ElasticsearchLogHandler: Bulk API integration with NDJSON format
- Features:
  - Async buffering with configurable batch size and flush interval
  - Exponential backoff retry logic
  - Non-blocking delivery (never blocks application logic)
  - Proper error handling and internal logging
  - Lifecycle management (start/shutdown)
### 3. Configuration: app/config.py
- New Settings fields for external logging:
  - external_logging_enabled (default: False)
  - external_logging_provider (datadog/papertrail/elasticsearch)
  - external_logging_buffer_size (default: 1000)
  - external_logging_flush_interval_seconds (default: 5.0)
  - Provider-specific configuration (API keys, hosts, batch sizes)
- All fields have sensible defaults
- Full field validation and normalization
### 4. Integration: app/main.py
- Global _external_log_handler for application lifecycle
- _external_logging_processor: structlog processor for handler integration
- Updated _configure_logging(): Add handler to processor chain when enabled
- Updated _lifespan(): Initialize handler before startup, shutdown on termination
### 5. Tests: backend/tests/test_external_logging.py
- 20 comprehensive tests covering all handlers and factory
- Configuration validation tests
- All tests passing
## Design Decisions:
1. **Non-blocking Delivery**: External logging never blocks request handling.
Failures are logged locally and do not affect the application.
2. **Buffering Strategy**: In-memory buffer with configurable size prevents
unbounded memory growth. When buffer fills, oldest logs are dropped with
a warning.
3. **Retry Logic**: Transient failures (timeouts, 5xx errors) are retried
with exponential backoff. Permanent failures (bad credentials) are logged
and skipped.
4. **Disabled by Default**: External logging is opt-in via environment
variables, maintaining backward compatibility with existing deployments.
5. **Provider Flexibility**: Support for multiple platforms allows users to
choose based on their infrastructure (cloud-native, on-premise, etc).
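
The buffering and retry decisions above can be sketched as follows. This is an illustration under stated assumptions, not the code in app/utils/external_logging.py: the class name `BoundedLogBuffer`, the `dropped` counter, and the backoff parameters `base` and `cap` are hypothetical.

```python
from collections import deque


class BoundedLogBuffer:
    """Fixed-capacity buffer that drops the oldest record when full."""

    def __init__(self, capacity: int = 1000) -> None:
        # deque(maxlen=...) evicts from the left once capacity is reached.
        self._records: deque = deque(maxlen=capacity)
        self.dropped = 0  # a real handler would also emit a local warning

    def append(self, record: dict) -> None:
        if len(self._records) == self._records.maxlen:
            self.dropped += 1
        self._records.append(record)

    def drain(self, batch_size: int) -> list:
        """Remove and return up to batch_size records, oldest first."""
        count = min(batch_size, len(self._records))
        return [self._records.popleft() for _ in range(count)]


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))
```

Capping the delay keeps a long outage from pushing retries arbitrarily far apart, while the bounded deque guarantees memory use stays flat even if the remote platform is unreachable.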
## Backward Compatibility:
- All new configuration fields have defaults
- External logging disabled by default
- No changes to existing logging behavior unless explicitly configured
- No new required dependencies
## Testing:
- All 20 new tests passing
- Existing tests unaffected (same count of passing tests)
- Configuration validation tested
- Handler creation and lifecycle management tested
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add SecurityHeadersMiddleware to backend/app/main.py
  - Implements Content-Security-Policy: default-src 'self'
  - Implements X-Frame-Options: DENY (clickjacking protection)
  - Implements X-Content-Type-Options: nosniff (MIME-sniffing protection)
  - Implements X-XSS-Protection: 1; mode=block (browser XSS filters)
- Add CSP meta tag to frontend/index.html for defense-in-depth
- Create Docs/Security.md with comprehensive security headers documentation
- Add test suite (backend/tests/test_security_headers_middleware.py) with 5 tests
  - Tests verify headers are present on success and error responses
  - Tests ensure all four security headers are correctly set
- All existing tests continue to pass
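
For illustration, the four headers can be attached with a dependency-free ASGI wrapper. This is a sketch only: the class name `SecurityHeadersSketch` is hypothetical, and the real SecurityHeadersMiddleware in backend/app/main.py may be built differently (for example on a framework base class).

```python
SECURITY_HEADERS = [
    (b"content-security-policy", b"default-src 'self'"),
    (b"x-frame-options", b"DENY"),
    (b"x-content-type-options", b"nosniff"),
    (b"x-xss-protection", b"1; mode=block"),
]


class SecurityHeadersSketch:
    """Plain ASGI middleware that adds the four headers to every HTTP response."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # WebSocket and lifespan traffic passes through untouched.
            await self.app(scope, receive, send)
            return

        async def send_with_headers(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                present = {name.lower() for name, _ in headers}
                # Do not duplicate a header the inner app already set.
                headers += [h for h in SECURITY_HEADERS if h[0] not in present]
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_headers)
```

Because the wrapper intercepts `http.response.start`, the headers are applied to error responses as well as successful ones, which is the behaviour the tests above check for.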
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add global rate limiter utility with configurable limits and cleanup
- Move rate limiting logic to middleware for consistent application
- Update auth routes to use new rate limiter
- Add comprehensive tests for rate limiter functionality
- Update documentation with backend development guidelines and tasks
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
**Problem:** Broad exception handlers were fragile: a new DomainError subclass
added without explicit registration would silently fall through to the generic
exception handler, losing its specific error_code and metadata.
**Solution:**
1. Import DomainError in main.py for explicit handler registration
2. Fix type hints in exception handlers from 'Exception' to specific types
   - NotFoundError handler now typed as 'NotFoundError'
   - BadRequestError handler now typed as 'BadRequestError'
   - ConflictError handler now typed as 'ConflictError'
   - DomainError handler now typed as 'DomainError'
   - ServiceUnavailableError handler now typed as 'ServiceUnavailableError'
3. Add DomainError as an explicit catch-all handler in the registration chain
   - Positioned after specific handlers, before HTTPException
   - Any unregistered DomainError subclass now gets correct error_code + metadata
4. Document the exception handler hierarchy with detailed comments
5. Update Backend-Development.md with handler hierarchy documentation
6. Update Architekture.md section 2.2 with exception handler details
7. Fix test expectations in test_main.py to verify ErrorResponse format
**Impact:** Any new DomainError subclass now automatically gets the correct HTTP 500
status, error_code, and metadata, even if the developer forgets to register an explicit handler.
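
The effect of the catch-all can be shown with a small stand-alone dispatcher. This is a sketch that assumes handlers are resolved most-specific-first by walking the exception's class hierarchy; `QuotaExceededError`, the `HANDLERS` table, and the tuple return values are hypothetical stand-ins for the real handlers and ErrorResponse objects.

```python
class DomainError(Exception):
    error_code = "domain_error"


class NotFoundError(DomainError):
    error_code = "not_found"


class QuotaExceededError(DomainError):
    """Hypothetical new subclass that nobody registered a handler for."""
    error_code = "quota_exceeded"


def handle_not_found(exc: NotFoundError):
    return 404, exc.error_code


def handle_domain(exc: DomainError):
    # Catch-all for DomainError: keeps the subclass's own error_code.
    return 500, exc.error_code


def handle_generic(exc: Exception):
    # Generic fallback: the specific error_code is lost here.
    return 500, "internal_error"


HANDLERS = {
    NotFoundError: handle_not_found,
    DomainError: handle_domain,
    Exception: handle_generic,
}


def dispatch(exc: Exception):
    """Pick the handler registered for the closest class in the MRO."""
    for cls in type(exc).__mro__:
        handler = HANDLERS.get(cls)
        if handler is not None:
            return handler(exc)
    raise exc
```

With the `DomainError` entry present, `dispatch(QuotaExceededError())` keeps `quota_exceeded`; removing that entry would send the same exception to `handle_generic`.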
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Align frontend and backend error observability with correlation IDs and
structured telemetry for distributed tracing across systems.
Backend changes:
- Add CorrelationIdMiddleware to generate/extract correlation IDs
- Include correlation_id in all ErrorResponse objects
- Store correlation ID in structlog contextvars for automatic inclusion in logs
- Add correlation ID to response headers (X-Correlation-ID)
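
A minimal sketch of the backend flow, using only the standard library. The helper names `resolve_correlation_id` and `error_body` and the response field layout are assumptions; the actual CorrelationIdMiddleware additionally stores the ID in structlog's contextvars.

```python
import contextvars
import uuid

HEADER = "x-correlation-id"

# Holds the ID for the current request context.
correlation_id_var = contextvars.ContextVar("correlation_id", default=None)


def resolve_correlation_id(request_headers: dict) -> str:
    """Reuse the caller's ID when present, otherwise generate a UUID4."""
    lowered = {key.lower(): value for key, value in request_headers.items()}
    incoming = lowered.get(HEADER, "").strip()
    cid = incoming or str(uuid.uuid4())
    correlation_id_var.set(cid)
    return cid


def error_body(error_code: str, message: str) -> dict:
    """Build an error payload that carries the current correlation ID."""
    return {
        "error_code": error_code,
        "message": message,
        "correlation_id": correlation_id_var.get(),
    }
```

Reading the ID from a context variable means code that builds error payloads does not need the request object passed down to it.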
Frontend changes:
- API client automatically generates session-scoped UUID4 and includes
X-Correlation-ID header in all requests
- Extract correlation ID from API error responses
- Update error handlers to use telemetry with correlation IDs
- Add telemetry logging to ErrorBoundary, PageErrorBoundary, SectionErrorBoundary
- Implement redaction utilities for privacy-safe logging of sensitive data
Documentation:
- Add observability guidelines to Web-Development.md
  * Correlation ID usage patterns
  * Privacy & security best practices
  * Telemetry event structure
  * Redaction utilities for sensitive data
- Add distributed tracing architecture section to Architecture.md
  * Correlation ID flow across frontend/backend
  * Example troubleshooting scenario
  * Implementation details for future enhancements
Testing:
- Add comprehensive tests for correlation middleware
- Update error boundary tests to verify telemetry integration
- Verify TypeScript and ESLint pass with no warnings
Fixes: Issue #40 - Frontend and backend observability are not aligned
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Change _fail2ban_connection_handler() to return generic message instead of
leaking socket path in HTTP 502 response body
- Change _fail2ban_protocol_handler() to return generic message instead of
leaking raw exception details in HTTP 502 response body
- Full error details are still logged server-side (error=str(exc)) for debugging
- Update Backend-Development.md with error message hygiene section explaining
the pattern: generic user-friendly messages in HTTP responses, full details
in server logs only
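
The pattern can be sketched as below. The stand-in exception constructor, the logger name, and the response wording are assumptions for illustration; only the split between a generic response body and a detailed server-side log entry comes from the change itself.

```python
import logging

logger = logging.getLogger("bangui.sketch")


class Fail2BanConnectionError(Exception):
    """Stand-in for the real exception; the constructor is hypothetical."""

    def __init__(self, socket_path: str, reason: str) -> None:
        super().__init__(f"cannot connect to {socket_path}: {reason}")
        self.socket_path = socket_path


def connection_error_response(exc: Fail2BanConnectionError):
    """Return (status, body) with a generic message; log the details."""
    # Full detail stays in the server log for debugging.
    logger.warning("fail2ban_connection_failed error=%s", str(exc))
    # The HTTP body carries no paths or raw exception text.
    return 502, {"detail": "Unable to reach the fail2ban service."}
```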
Fixes TASK-029: Fail2BanConnectionError leaks socket path in HTTP error responses
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses security concern where FastAPI's default behavior exposes interactive
API documentation (/docs, /redoc) without authentication, allowing attackers to
enumerate endpoints and understand API schemas.
Changes:
- Add BANGUI_ENABLE_DOCS boolean setting (default: false) to Settings
- Modify create_app() to conditionally set docs_url, redoc_url, openapi_url
- Add docs endpoints to SetupRedirectMiddleware allowlist (/api/docs, /api/redoc, /api/openapi.json)
- Set BANGUI_ENABLE_DOCS=true in Docker/compose.debug.yml for development
- Production compose files leave it unset (defaults to false, docs disabled)
- Add comprehensive tests for docs configuration
- Document the new setting in Backend-Development.md
Security Impact:
- API documentation is now disabled by default in production
- Development environments can enable docs by setting BANGUI_ENABLE_DOCS=true
- Docs endpoints are inaccessible in production without manual configuration
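
As a rough sketch of the conditional wiring: FastAPI disables each documentation endpoint when its URL argument is `None`, so the setting can be mapped to constructor keyword arguments. The helper names `docs_kwargs` and `parse_bool_env`, and the accepted truthy spellings, are assumptions.

```python
from typing import Dict, Optional


def parse_bool_env(value: Optional[str]) -> bool:
    """Interpret an environment string as a boolean; unset means False."""
    return (value or "").strip().lower() in {"1", "true", "yes", "on"}


def docs_kwargs(enable_docs: bool) -> Dict[str, Optional[str]]:
    """Keyword arguments for the app constructor, docs off unless enabled."""
    if not enable_docs:
        return {"docs_url": None, "redoc_url": None, "openapi_url": None}
    return {
        "docs_url": "/api/docs",
        "redoc_url": "/api/redoc",
        "openapi_url": "/api/openapi.json",
    }
```

A call such as `FastAPI(**docs_kwargs(parse_bool_env(os.environ.get("BANGUI_ENABLE_DOCS"))))` would then leave all three endpoints unregistered when the variable is unset.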
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add in-memory rate limiter with per-IP deque tracking of attempt timestamps
- Limit login attempts to 5 per 60 seconds per IP, return 429 on excess
- Add Retry-After header to rate limit responses
- Implement IP extraction utility with proxy trust validation (prevent X-Forwarded-For spoofing)
- Integrate rate limiter into auth router and dependencies
- Add 10-second asyncio.sleep on failed login attempts to further slow brute-force
- Add comprehensive tests for rate limiting (9 new tests, all passing)
- Update Features.md to document login rate limiting
- Update Backend-Development.md with rate limiting conventions and design patterns
- Fix test infrastructure issues: update password to meet complexity requirements
- Fix TestValidateSession tests to use Bearer token authentication
- All tests passing: 23 auth tests + full test suite coverage
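
A minimal sketch of the sliding-window limiter described above, with one deque of attempt timestamps per IP. The class name `LoginRateLimiter`, the `check` method, and the injectable `clock` are hypothetical; proxy-aware IP extraction and the failed-login sleep are left out.

```python
import math
import time
from collections import defaultdict, deque


class LoginRateLimiter:
    """Allow at most max_attempts per window_seconds for each IP."""

    def __init__(self, max_attempts: int = 5, window_seconds: float = 60.0,
                 clock=time.monotonic) -> None:
        self._max = max_attempts
        self._window = window_seconds
        self._clock = clock
        self._attempts = defaultdict(deque)  # ip -> attempt timestamps

    def check(self, ip: str):
        """Record an attempt; return (allowed, retry_after_seconds)."""
        now = self._clock()
        window = self._attempts[ip]
        # Discard timestamps that have aged out of the window.
        while window and now - window[0] >= self._window:
            window.popleft()
        if len(window) >= self._max:
            # The oldest attempt determines when a slot frees up.
            retry_after = math.ceil(self._window - (now - window[0]))
            return False, max(1, retry_after)
        window.append(now)
        return True, 0
```

When `check` returns `(False, n)`, the caller would respond with 429 and `Retry-After: n`. Rejected attempts are not recorded, so a blocked client does not extend its own lockout in this sketch.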
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move session cache initialization from per-request _build_app_context to
startup lifespan handler. The session cache type is now decided once at app
startup based on settings, making _build_app_context pure (read-only).
Changes:
- Move cache initialization logic to new _update_session_cache() in main.py
- Call _update_session_cache() during lifespan startup to initialize cache
- Remove three if/elif/elif branches mutating state.session_cache from _build_app_context
- Add cache swap logic to set_runtime_settings() in runtime_state.py to handle
runtime settings changes (e.g., setup wizard updates)
- Keep app.state.session_cache initialization in create_app() for test compatibility
This ensures:
- _build_app_context is pure and doesn't mutate app state on each request
- Session cache configuration decisions are centralized at startup
- Settings changes during runtime (via setup wizard) also trigger cache swap
- Cache initialization logic is isolated in one place
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use NoOpSessionCache in backend/app/main.py and switch the cache implementation dynamically in backend/app/dependencies.py. Disabled cache mode stays safe because get_session_cache always returns a valid object.
Add a source=archive option to the dashboard endpoints and history service, with an archive branch for list_bans, bans_by_country, ban_trend, and bans_by_jail. Add tests for the archive paths and update Docs/Tasks.md.
On startup BanGUI now verifies that the four fail2ban jail config files
required by its two custom jails (manual-Jail and blocklist-import) are
present in `$fail2ban_config_dir/jail.d`. Any missing file is created
with the correct default content; existing files are never overwritten.
Files managed:
- manual-Jail.conf (enabled=false template)
- manual-Jail.local (enabled=true override)
- blocklist-import.conf (enabled=false template)
- blocklist-import.local (enabled=true override)
The check runs in the lifespan hook immediately after logging is
configured, before the database is opened.
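
The create-if-missing behaviour can be sketched like this. The file contents below are placeholders rather than the real defaults, and the function name `ensure_jail_configs` and the decision to create `jail.d` itself are assumptions.

```python
from pathlib import Path

# Placeholder bodies; the real default content is defined by BanGUI.
DEFAULT_FILES = {
    "manual-Jail.conf": "[manual-Jail]\nenabled = false\n",
    "manual-Jail.local": "[manual-Jail]\nenabled = true\n",
    "blocklist-import.conf": "[blocklist-import]\nenabled = false\n",
    "blocklist-import.local": "[blocklist-import]\nenabled = true\n",
}


def ensure_jail_configs(config_dir: Path) -> list:
    """Create any missing jail.d file; return the names that were created."""
    jail_d = config_dir / "jail.d"
    jail_d.mkdir(parents=True, exist_ok=True)
    created = []
    for name, content in DEFAULT_FILES.items():
        try:
            # Mode "x" fails if the file exists, so nothing is overwritten.
            with open(jail_d / name, "x", encoding="utf-8") as handle:
                handle.write(content)
        except FileExistsError:
            continue
        created.append(name)
    return created
```

Exclusive-create mode makes the existence check and the write a single step, which avoids clobbering a file that appears between a separate `exists()` call and the write.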
Task 0.1: Create database parent directory before connecting
- main.py _lifespan now calls Path(database_path).parent.mkdir(parents=True,
exist_ok=True) before aiosqlite.connect() so the app starts cleanly on
a fresh Docker volume with a nested database path.
Task 0.2: SetupRedirectMiddleware redirects when db is None
- Guard now reads: if db is None or not is_setup_complete(db)
A missing database (startup still in progress) is treated as setup not
complete instead of silently allowing all API routes through.
Task 0.3: SetupGuard redirects to /setup on API failure
- .catch() handler now sets status to 'pending' instead of 'done'.
A crashed backend cannot serve protected routes, so the conservative fallback
is to redirect to /setup.
Task 0.4: SetupPage shows spinner while checking setup status
- Added 'checking' boolean state; full-screen Spinner is rendered until
getSetupStatus() resolves, preventing form flash before redirect.
- Added console.warn in catch block; cleanup return added to useEffect.
Also: remove unused type: ignore[call-arg] from config.py.
Tests: 18 backend tests pass; 117 frontend tests pass.
Task 4 (Better Jail Configuration) implementation:
- Add fail2ban_config_dir setting to app/config.py
- New file_config_service: list/view/edit/create jail.d, filter.d, action.d files
with path-traversal prevention and 512 KB content size limit
- New file_config router: GET/PUT/POST endpoints for jail files, filter files,
and action files; PUT .../enabled for toggle on/off
- Extend config_service with delete_log_path() and add_log_path()
- Add DELETE /api/config/jails/{name}/logpath and POST /api/config/jails/{name}/logpath
- Extend geo router with re-resolve endpoint; add geo_re_resolve background task
- Update blocklist_service with revised scheduling helpers
- Update Docker compose files with BANGUI_FAIL2BAN_CONFIG_DIR env var and
rw volume mount for the fail2ban config directory
- Frontend: new Jail Files, Filters, Actions tabs in ConfigPage; file editor
with accordion-per-file, editable textarea, save/create; add/delete log paths
- Frontend: types in types/config.ts; API calls in api/config.ts and api/endpoints.ts
- 63 new backend tests (test_file_config_service, test_file_config, test_geo_re_resolve)
- 6 new frontend tests in ConfigPageLogPath.test.tsx
- ruff, mypy --strict, tsc --noEmit, eslint: all clean; 617 backend tests pass
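
One common way to implement the path-traversal and size checks is to resolve the requested path and confirm it still sits under the expected directory. The sketch below takes that approach; the function names and the use of `ValueError` are assumptions, not the file_config_service API.

```python
from pathlib import Path

MAX_CONTENT_BYTES = 512 * 1024  # 512 KB limit from the description above


def resolve_config_file(base_dir: Path, subdir: str, filename: str) -> Path:
    """Return the resolved file path, rejecting anything outside subdir."""
    root = (base_dir / subdir).resolve()
    target = (root / filename).resolve()
    # resolve() collapses ".." and symlinks before the containment check.
    if target == root or not target.is_relative_to(root):
        raise ValueError(f"invalid config file name: {filename!r}")
    return target


def check_content_size(content: str) -> None:
    """Reject content whose UTF-8 encoding exceeds the size limit."""
    if len(content.encode("utf-8")) > MAX_CONTENT_BYTES:
        raise ValueError("content exceeds 512 KB limit")
```

`Path.is_relative_to` requires Python 3.9 or later. Measuring the encoded length rather than the character count keeps the limit meaningful for non-ASCII content.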
- Cache setup_completed flag in app.state._setup_complete_cached after
first successful is_setup_complete() call; all subsequent API requests
skip the DB query entirely (one-way transition, cleared on restart).
- Add in-memory session token TTL cache (10 s) in require_auth; the second
request with the same token within the window skips session_repo.get_session.
- Call invalidate_session_cache() on logout so revoked tokens are evicted
immediately rather than waiting for TTL expiry.
- Add clear_session_cache() for test isolation.
- 5 new tests covering the cached fast-path for both optimisations.
- 460 tests pass, 83% coverage, zero ruff/mypy warnings.
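
The session token cache can be sketched as a small TTL map. The class and method names here are hypothetical (the change itself exposes `invalidate_session_cache()` and `clear_session_cache()`), and the injectable `clock` exists only to make the expiry testable.

```python
import time


class SessionTTLCache:
    """Cache session lookups per token for a short time-to-live."""

    def __init__(self, ttl_seconds: float = 10.0, clock=time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # token -> (session, expires_at)

    def get(self, token: str):
        """Return the cached session, or None if absent or expired."""
        entry = self._entries.get(token)
        if entry is None:
            return None
        session, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[token]
            return None
        return session

    def put(self, token: str, session) -> None:
        self._entries[token] = (session, self._clock() + self._ttl)

    def invalidate(self, token: str) -> None:
        """Evict immediately, e.g. on logout."""
        self._entries.pop(token, None)

    def clear(self) -> None:
        """Drop everything, e.g. between tests."""
        self._entries.clear()
```

Without the explicit `invalidate` on logout, a revoked token would keep authenticating for up to the TTL, which is why the change evicts it eagerly.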
- Remove per-IP db.commit() from _persist_entry() and _persist_neg_entry();
add a single commit after the full lookup_batch() chunk loop instead.
Reduces commits from ~5,200 to 1 per bans/by-country request.
- Remove db dependency from GET /api/dashboard/bans and
GET /api/dashboard/bans/by-country; pass app_db=None so no SQLite
writes occur during read-only requests.
- Add _dirty set to geo_service; _store() marks resolved IPs dirty.
New flush_dirty(db) batch-upserts all dirty entries in one transaction.
New geo_cache_flush APScheduler task flushes every 60 s so geo data
is persisted without blocking requests.
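
The dirty-set pattern can be illustrated with the synchronous `sqlite3` module. This is a sketch under assumptions: the two-column `geo_cache` schema and the `store` helper are simplified stand-ins, and the real service uses an async database connection.

```python
import sqlite3

_cache = {}    # ip -> country_code (in-memory positive cache)
_dirty = set()  # IPs resolved since the last flush


def store(ip: str, country_code: str) -> None:
    """Cache a resolved IP and mark it for the next flush."""
    _cache[ip] = country_code
    _dirty.add(ip)


def flush_dirty(db: sqlite3.Connection) -> int:
    """Upsert all dirty entries in one transaction; return rows written."""
    if not _dirty:
        return 0
    batch = [(ip, _cache[ip]) for ip in _dirty]
    with db:  # commits once on success, rolls back on error
        db.executemany(
            "INSERT OR REPLACE INTO geo_cache (ip, country_code) VALUES (?, ?)",
            batch,
        )
    # Only clear what was written, so a failed flush is retried later.
    _dirty.difference_update(ip for ip, _ in batch)
    return len(batch)
```

Request handlers only touch the in-memory structures, and the scheduled task pays the write cost once per interval instead of once per IP.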
- Add 5-min negative cache (_neg_cache) so failing IPs are throttled
rather than hammering the API on every request
- Add MaxMind GeoLite2 fallback (init_geoip / _geoip_lookup) that fires
when ip-api fails; controlled by BANGUI_GEOIP_DB_PATH env var
- Fix lookup_batch bug: failed API results were stored in positive cache
- Add _persist_neg_entry: INSERT OR IGNORE into geo_cache with NULL
country_code so re-resolve can find historically failed IPs
- Add POST /api/geo/re-resolve: clears neg cache, batch-retries all
geo_cache rows with country_code IS NULL, returns resolved/total count
- BanTable + MapPage: wrap the country — placeholder in a Fluent UI
Tooltip explaining the retry behaviour
- Add geoip2>=4.8.0 dependency; geoip_db_path config setting
- Tests: add TestNegativeCache (4), TestGeoipFallback (4), TestReResolve (4)
- Add persistent geo_cache SQLite table (db.py)
- Rewrite geo_service: batch API (100 IPs/call), two-tier cache,
no caching of failed lookups so they are retried
- Pre-warm geo cache from DB on startup (main.py lifespan)
- Rewrite bans_by_country: SQL GROUP BY ip aggregation + lookup_batch
instead of 2000-row fetch + asyncio.gather individual calls
- Pre-warm geo cache after blocklist import (blocklist_service)
- Add 300ms debounce to useMapData hook to cancel stale requests
- Add perf benchmark asserting <2s for 10k bans
- Add seed_10k_bans.py script for manual perf testing
In the Docker image, the app source is copied to /app/app/ (not
backend/app/), so parents[2] resolved to '/' instead of /app.
This left the fail2ban package absent from sys.path, causing every
pickle.loads() call on socket responses to raise:
ModuleNotFoundError: No module named 'fail2ban'
Replace the hardcoded parents[2] with a walk-up search that iterates
over all ancestors until it finds a fail2ban-master/ sibling directory.
Works correctly in both local dev and Docker without environment-specific
path magic.
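
A sketch of the walk-up search, assuming it starts from the location of the calling module. The function name `find_fail2ban_master` and the `dirname` parameter are hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_fail2ban_master(start: Path,
                         dirname: str = "fail2ban-master") -> Optional[Path]:
    """Return the first `dirname` directory found under any ancestor of start."""
    # parents yields the nearest ancestor first and the filesystem root last.
    for ancestor in start.resolve().parents:
        candidate = ancestor / dirname
        if candidate.is_dir():
            return candidate
    return None
```

A caller could then do something like `sys.path.insert(0, str(found))` when `found` is not `None`. Because the loop stops at the nearest match, the same code resolves correctly whether the source lives under `backend/app/` or `/app/app/`.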