BanGUI

Author	SHA1	Message	Date
Lukas	96ce516ecf	fix(logging): resolve logging_compat keyword arg conflicts - Fix logging_compat._log() to handle extra keyword arguments properly - Update config.py, main.py, and test_bans.py for compatibility - Update Tasks.md and runner.csx	2026-05-10 15:54:00 +02:00
Lukas	7ec80fdeec	refactor(logging): replace structlog with stdlib logging compat layer - Remove structlog dependency from backend/pyproject.toml - Add app.utils.logging_compat shim for keyword-arg logging API - Add app.utils.json_formatter for JSON log output with extra fields - Update all backend modules to use logging_compat.get_logger() - Update docstrings in log_sanitizer.py and json_formatter.py - Update test comment in test_async_utils.py - Record 406 failing tests in Docs/Tasks.md for tracking	2026-05-10 13:37:54 +02:00
Lukas	7790736918	feat(jail-config): add banaction and banaction_allports to blocklist config Adds iptables-multiport and iptables-allports ban actions to the blocklist-import jail configuration and updates the corresponding test assertions.	2026-05-10 09:35:33 +02:00
Lukas	79df1aa493	backup	2026-05-10 08:48:42 +02:00
Lukas	e4c3ae718c	fix(backend): relax SSRF validation for loopback in dev, graceful metrics/regexploit fallback - ip_utils: allow loopback (127.0.0.1) in dev mode (BANGUI_LOG_LEVEL=debug) so e2e tests can reach a mock HTTP server on the host - metrics: make all operations no-ops when prometheus_client not installed - regex_validator: graceful fallback when regexploit not installed - geo_cache: use attribute access instead of dict subscript for typed rows - rate_limit: support bucket_override parameter for per-endpoint rate limits - ban_service: construct DomainActiveBan explicitly instead of model_copy Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-08 08:07:13 +02:00
Lukas	481f32bb85	backup	2026-05-05 18:47:56 +02:00
Lukas	744275d17f	backup	2026-05-04 07:20:20 +02:00
Lukas	58173bd6a9	backup	2026-05-04 07:20:16 +02:00
Lukas	0a3f9c6c16	refactor(backend): external logging metrics, required mode, health checks - Add external_logging_init_failures counter - Add external_log_required flag, raise if init fails and required - Health endpoint: add external_logging status check - Blocklist service: enrich with metadata fields, update import logic - Health check task: add runtime_state dependency, fix return typing - Metrics: add Histogram for request latencies - Frontend: align BlocklistImportLogSection props - Docs: update deployment guide, remove stale tasks Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-04 03:45:13 +02:00
Lukas	fc57c83f79	refactor: split pagination logic from response models - Extract pagination logic to separate util module - Update response models to use new pagination util - Fix pagination calculation edge cases Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-03 22:57:21 +02:00
Lukas	dafe8d61e2	feat(security): add CSRF header constants and security-headers endpoint Move X-BanGUI-Request header name/value to backend/app/utils/constants.py as single source of truth. Add GET /api/v1/config/security-headers endpoint. Update csrf middleware, frontend api client, and docs to use shared constants.	2026-05-03 22:06:43 +02:00
Lukas	1c3dff31e8	feat(rate-limiting): add per-bucket limits and startup validation - Add per-bucket rate limit config (ban, unban, import, config, jail, filter, action) - Add process-local warning at startup for multi-worker deployments - Document Redis migration path for shared state across workers - Remove Issue #42 from Tasks.md (resolved)	2026-05-03 20:53:21 +02:00
Lukas	c3cd1574dc	fix(auth): invalidate session cache on login Stale sessions from a stolen device could be reused up to the cache TTL after a legitimate user re-logs in, because login never cleared the existing cache entry. Changes: - Add invalidate_by_user(user_id) to SessionCache protocol - InMemorySessionCache maintains a user_id -> set[token] index to support O(1) invalidation of all sessions for a given user - NoOpSessionCache stub updated for API compatibility - auth_service.login() now returns the Session object alongside signed_token and expires_at - login router calls session_cache.invalidate_by_user(session.id) immediately after successful authentication Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-03 20:51:51 +02:00
Lukas	ae9313568e	feat: enforce single-worker at startup Fail with RuntimeError when WEB_CONCURRENCY or BANGUI_WORKERS > 1. In-memory session cache, rate-limit windows, and runtime state are process-local. Multi-worker silently causes stale limits, ghost sessions, inconsistent status. Skipped when TESTING=1. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-03 20:33:23 +02:00
Lukas	96525573fa	Normalise IP addresses across backend Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-03 18:19:41 +02:00
Lukas	5f0ab40816	refactor(backend): clean up models setup, improve ip utils, add adr docs - Extract ADR documents for architectural decisions (SQLite, FastAPI, React, APScheduler, Scheduler) - Refactor setup.py: improve code structure and readability - Add IP validation utilities with test coverage - Update frontend components (BanTable, HistoryPage) - Add pre-commit hooks and CONTRIBUTING.md - Add .editorconfig for consistent coding standards	2026-05-03 18:04:45 +02:00
Lukas	2f9fc8076d	refactor(backend): clean up jail service, add error handling service - Extract jail status/processing to helper functions - Add error_handling.py service for centralized error handling - Update config.py with validation and defaults - Update .env.example with all config options - Remove obsolete Tasks.md, add Service-Development.md - Minor fixes across routers and services Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-03 17:40:37 +02:00
Lukas	2df029f7e8	refactor(ban_service): extract _bans_by_country_load_data helper Break up long function into focused helper. Load data logic separate from aggregation.	2026-05-03 17:00:34 +02:00
Lukas	896751ada9	fix: handle socket close errors properly in PapertrailLogHandler - Replace contextlib.suppress with try/except + warning log - Add test for fail2ban client - Remove stale Issue #21 from Tasks.md (indexes) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-03 12:25:14 +02:00
Copilot	22db607875	Add fail2ban DB index management and socket-based path resolution - New get_fail2ban_db_path() in setup_service resolves DB path from configured socket path - New ensure_fail2ban_indexes() creates missing performance indexes on bans table - Call ensure_fail2ban_indexes on every startup before first ban query - Remove completed tasks from Docs/Tasks.md - Update Docs/PERFORMANCE.md with index findings	2026-05-03 12:17:31 +02:00
Lukas	0133489920	Update observability docs and task utilities - Add Observability.md documentation - Standardize task logging with correlation_id support - Add log_sanitizer utility for PII masking - Update Tasks.md tracking - Update geo_cache tasks and other task modules with correlation_id Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-03 11:52:09 +02:00
Lukas	881cfbdd71	fix: replace broad except Exception with specific exception types - jail_service: catch ValueError (fail2ban protocol error) instead of Exception - health.py: catch AttributeError (not OSError/TypeError) for defensive checks - ban_service: re-raise programming errors in geo lookup handlers - server_service: catch Fail2BanConnectionError, Fail2BanProtocolError, ValueError - config_writer: catch OSError instead of Exception Programming errors now bubble to global handler instead of being silently caught. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-03 00:54:44 +02:00
Lukas	0817a4cb47	fix(regex_validator): add ReDoS detection via regexploit Detect catastrophic backtracking patterns before regex compilation using regexploit library. Add ReDoSDetectedError exception and _MINIMUM_STARRINESS threshold (>=3) to catch dangerous patterns like (a+)+b. Update pyproject.toml deps, add tests for detection.	2026-05-03 00:05:33 +02:00
Lukas	cc6dbcf3f0	feat: implement API versioning /api/v1/ - All backend routers moved to /api/v1/ prefix - Frontend BASE_URL updated to /api/v1 - Setup redirect middleware updated to redirect to /api/v1/setup - Health router path fixed: prefix=/api/v1/health, @router.get('') - conftest.py: set server_status=online for test fixture - Created Docs/API_VERSIONING.md with deprecation policy - Updated Docs/Backend-Development.md with versioning section - Updated Instructions.md curl examples Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-02 21:29:30 +02:00
Lukas	0d5882b32f	Fix HIGH priority issues: unbounded queries, rate limiting, health checks Issue #3 - Unbounded Query Results (OOM): - get_all_archived_history() now uses keyset pagination with bounded max_rows (50k default) - Added 'id' field to records from get_archived_history() and get_archived_history_keyset() - Protocol signature updated with page_size, max_rows, last_ban_id params Issue #7 - Docker Health Check Fails: - Added curl to Dockerfile.backend runtime image - HEALTHCHECK now uses 'curl -f http://localhost:8000/api/health' - compose.prod.yml: increased start_period to 40s, timeout to 10s - Frontend healthcheck proxies to backend /api/health Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-01 21:47:36 +02:00
Lukas	1af67eb0ce	Add Application Performance Monitoring (APM) with Prometheus metrics - Backend: Implement Prometheus metrics collection - Add prometheus-client dependency - Create metrics utility module with HTTP request tracking counters, histograms, gauges - Implement MetricsMiddleware to track request latency, count, and active requests - Add /metrics endpoint to expose metrics in Prometheus text format - Normalize paths to prevent cardinality explosion (e.g., /api/{id} for UUIDs) - Exclude /metrics and /health from detailed tracking - Frontend: Add web vitals and API metrics collection - Install web-vitals library (v4.0.0) for Core Web Vitals tracking - Create metrics utility module for FCP, LCP, CLS, INP, TTFB collection - Implement useTrackedFetch hook for automatic API call metrics (method, endpoint, status, duration) - Initialize web vitals tracking in App component on mount - Provide exportMetrics() for sending metrics to backend - Testing: - Add comprehensive backend metrics tests (9 tests, 100% coverage) - Add comprehensive frontend metrics tests (10 tests) - All tests passing - Documentation: - Expand Docs/Observability.md with complete APM section - Include metrics reference, integration examples (Prometheus, Datadog, NewRelic) - Add troubleshooting guide and best practices for cardinality management - Update Tasks.md to mark APM task as complete Metrics exposed: - bangui_http_requests_total: HTTP request count by method, endpoint, status - bangui_http_request_duration_seconds: Request latency histogram - bangui_http_active_requests: Active request gauge - Web Vitals: CLS, FCP, INP, LCP, TTFB with ratings - API metrics: endpoint, method, status, duration, timestamp Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-01 18:33:14 +02:00
Lukas	37078b742b	Implement structured logging to centralized platforms (Datadog, Papertrail, ELK) This commit adds support for shipping logs to external centralized logging platforms, addressing the MEDIUM priority task for structured logging infrastructure. ## Key Changes: ### 1. New Documentation: Docs/Observability.md - Comprehensive guide to logging architecture and configuration - Covers all three supported platforms (Datadog, Papertrail, Elasticsearch) - Includes best practices, security considerations, and troubleshooting - Documents sensitive data handling and compliance requirements ### 2. Core Implementation: app/utils/external_logging.py - ExternalLogHandler: Abstract base class for non-blocking log delivery - DatadogLogHandler: HTTP API integration with JSON payloads - PapertrailLogHandler: Syslog protocol over TCP - ElasticsearchLogHandler: Bulk API integration with NDJSON format - Features: - Async buffering with configurable batch size and flush interval - Exponential backoff retry logic - Non-blocking delivery (never blocks application logic) - Proper error handling and internal logging - Lifecycle management (start/shutdown) ### 3. Configuration: app/config.py - New Settings fields for external logging: - external_logging_enabled (default: False) - external_logging_provider (datadog/papertrail/elasticsearch) - external_logging_buffer_size (default: 1000) - external_logging_flush_interval_seconds (default: 5.0) - Provider-specific configuration (API keys, hosts, batch sizes) - All fields have sensible defaults - Full field validation and normalization ### 4. Integration: app/main.py - Global _external_log_handler for application lifecycle - _external_logging_processor: structlog processor for handler integration - Updated _configure_logging(): Add handler to processor chain when enabled - Updated _lifespan(): Initialize handler before startup, shutdown on termination ### 5. Tests: backend/tests/test_external_logging.py - 20 comprehensive tests covering all handlers and factory - Configuration validation tests - All tests passing ## Design Decisions: 1. Non-blocking Delivery: External logging never blocks request handling. Failures are logged locally but don't impact application. 2. Buffering Strategy: In-memory buffer with configurable size prevents unbounded memory growth. When buffer fills, oldest logs are dropped with a warning. 3. Retry Logic: Transient failures (timeouts, 5xx errors) are retried with exponential backoff. Permanent failures (bad credentials) are logged and skipped. 4. Disabled by Default: External logging is opt-in via environment variables, maintaining backward compatibility with existing deployments. 5. Provider Flexibility: Support for multiple platforms allows users to choose based on their infrastructure (cloud-native, on-premise, etc). ## Backward Compatibility: - All new configuration fields have defaults - External logging disabled by default - No changes to existing logging behavior unless explicitly configured - No new required dependencies ## Testing: - All 20 new tests passing - Existing tests unaffected (same count of passing tests) - Configuration validation tested - Handler creation and lifecycle management tested Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-01 18:25:26 +02:00
Lukas	60d9c5b340	Refactor filter configuration with regex validation - Add regex validation utility for query strings - Update filter_config_service to use regex validation - Add comprehensive test coverage for regex validator - Update exception handling for validation errors - Update documentation for tasks Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-01 18:17:12 +02:00
Lukas	67b26a3ef7	Refactor pagination with cursor-based support and standardized response format - Implement cursor-based pagination in pagination.py - Update response models to standardize pagination structure - Add cursor pagination utilities for repositories - Update HistoryArchiveRepository and ImportLogRepository with new pagination - Add comprehensive tests for cursor pagination - Update documentation for backend development and task tracking Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-01 17:54:05 +02:00
Lukas	0221e423f2	Fix pagination metadata return structure and test assertions The API pagination infrastructure was already correctly implemented with: - PaginatedListResponse base model containing 'items' and 'pagination' fields - PaginationMetadata object with all required fields (page, page_size, total, total_pages, has_next_page, has_prev_page) - All services correctly calling create_pagination_metadata() However, there were two bugs preventing tests from passing: 1. IMPORT BUG: time_utils.py was importing TIME_RANGE_SECONDS from app.models.ban when it's actually defined in app.models._common. This caused import errors in tests that exercise time-range filtering. 2. TEST BUG: Test assertions were using outdated API structure, accessing .total, .page, .page_size directly on paginated responses instead of through the .pagination object. Fixed locations: - test_mappers/test_ban_mappers.py: 3 assertions updated to use .pagination.* - test_services/test_blocklist_service.py: 6 assertions updated - test_services/test_history_service.py: 14 assertions updated All paginated API endpoints now correctly return pagination metadata: - GET /api/history - GET /api/history/archive - GET /api/dashboard/bans - GET /api/jails/{name}/banned - GET /api/blocklists/log Verified with 24 passing pagination tests demonstrating correct behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-01 15:42:05 +02:00
Lukas	73021429f7	refactor: restructure API pagination metadata for better frontend usability - Create PaginationMetadata model with computed derived fields (total_pages, has_next_page, has_prev_page) - Update PaginatedListResponse to embed pagination metadata in a separate 'pagination' object - Add create_pagination_metadata() factory function in utils/pagination.py for consistent computation - Update all paginated service functions to use new structure: - history_service.list_history() - blocklist_service.get_import_logs() - jail_service.get_jail_banned_ips() - ban_mappers.map_domain_dashboard_ban_list_to_response() - Update response model docstrings with new structure examples - Update Backend-Development.md documentation with new pagination patterns - Update test fixtures to work with new response structure Response shape changes from: {"items": [...], "total": 100, "page": 1, "page_size": 50} To: {"items": [...], "pagination": {"page": 1, "page_size": 50, "total": 100, "total_pages": 2, "has_next_page": true, "has_prev_page": false}} Benefits: - Frontend receives all pagination state needed for UI controls - No need for frontend to calculate total_pages or page navigation logic - Consolidated pagination metadata reduces field sprawl - OpenAPI schema automatically reflects changes Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-30 22:24:42 +02:00
Lukas	05c3b564ae	Refactor scheduler lock implementation with heartbeat mechanism - Add heartbeat-based lock renewal in scheduler_lock_heartbeat.py - Update scheduler_lock.py with improved lock management - Add comprehensive tests for scheduler lock functionality - Update deployment and task documentation Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-30 22:10:38 +02:00
Lukas	3bd9848a08	Implement global rate limiter and refactor auth middleware - Add global rate limiter utility with configurable limits and cleanup - Move rate limiting logic to middleware for consistent application - Update auth routes to use new rate limiter - Add comprehensive tests for rate limiter functionality - Update documentation with backend development guidelines and tasks Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-30 21:26:31 +02:00
Lukas	c4ede71fa6	Fix: Enforce single-worker deployment for session cache cluster safety Addresses: Backend session cache not cluster-safe (multi-worker issue) Problem: - Session cache is process-local (InMemorySessionCache) - Multi-worker deployments (uvicorn --workers N) create separate processes - Each process has its own independent session cache - Sessions cached in Worker A are invisible to Workers B, C, D - Users randomly logged out when requests land on different workers - Also affects RuntimeState, rate limiter, and background jobs Solution (Option A - Strict single-worker enforcement): - Enhance startup validation with clearer error messages - Update error messages to explain the problem and how to fix it - Document single-worker requirement prominently in Docker configs - Update module docstrings to clarify constraints Changes: 1. app/startup.py: - Enhanced _check_single_worker_mode() error message with troubleshooting - Enhanced _stage_check_worker_mode_and_acquire_lock() error message - Removed unused import 2. app/utils/session_cache.py: - Updated module docstring to explain constraints more clearly - Added references to deployment documentation - Clarified multi-worker solution for future implementation 3. app/utils/runtime_state.py: - Updated module docstring with deployment constraint references - Aligned messaging with session_cache.py 4. Docker/Dockerfile.backend: - Added comprehensive comments about single-worker requirement - Explained impact in multi-worker deployments - Referenced deployment constraints documentation 5. Docker/docker-compose.yml, compose.prod.yml, compose.debug.yml: - Added documentation comments about BANGUI_WORKERS constraint - Explained why single-worker is required 6. backend/tests/test_startup_integration.py: - Fixed test unpacking to match function return signature (3 values, not 2) This ensures multi-worker deployments fail loudly at startup with clear guidance on what went wrong and how to fix it. The database-backed scheduler lock provides defense-in-depth for container orchestration scenarios. For future multi-worker support, implement: - Redis or database-backed session cache - Shared RuntimeState coordination - Distributed APScheduler backend Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-30 20:54:24 +02:00
Lukas	277f2a467c	Refactor rate limiting with exponential backoff strategy - Update rate limiter to use exponential backoff instead of fixed limit - Implement progressive delays for failed login attempts (0.5s, 1s, 2s, 4s, 5s max) - Update auth router documentation and endpoint docs - Refactor test suite to match new rate limiting behavior - Update backend development documentation - Clean up unused tasks documentation Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-30 19:58:09 +02:00
Lukas	187cd8250d	Implement database-backed scheduler lock for multi-worker safety Enforce single-executor safety regardless of process launcher through a robust database-backed lock mechanism that works reliably in container orchestration environments. Key changes: 1. Add scheduler_lock table to database schema (migration 4) - Singleton row (id=1) prevents concurrent execution - Stores PID, hostname, creation timestamp, heartbeat timestamp - Atomic transaction prevents race conditions 2. Create scheduler lock utility (app/utils/scheduler_lock.py) - acquire_scheduler_lock(): Atomically acquire or fail - release_scheduler_lock(): Clean up on shutdown - update_scheduler_lock_heartbeat(): Keep lock alive (every 10 seconds) - get_scheduler_lock_info(): Debug/inspect lock status - Stale lock detection: TTL-based (60 second expiry) 3. Reorder startup DAG stages - DATABASE now comes first (required for lock acquisition) - WORKER_MODE depends on DATABASE (performs lock check after initialization) - Maintains all other stage dependencies intact 4. Update startup process (app/startup.py) - Replace _check_single_worker_mode() with two-tier check: * Fast check: BANGUI_WORKERS env var (if explicitly set to >1) * Authoritative check: Database lock (catches misconfiguration) - Return startup_db from startup_shared_resources() for lock management 5. Register scheduler lock heartbeat task - New task: scheduler_lock_heartbeat (app/tasks/scheduler_lock_heartbeat.py) - Updates lock heartbeat every 10 seconds (keeps lock alive) - Prevents false positives from temporary load spikes 6. Add lock release to lifespan shutdown (app/main.py) - Release lock before closing database - Allows other instances to acquire during rolling deployments - Graceful handoff between instances 7. Comprehensive test coverage (backend/tests/test_scheduler_lock.py) - Lock acquisition success and failure cases - Stale lock cleanup on startup - Lock release and heartbeat updates - Full lifecycle: acquire → heartbeat → release 8. Update documentation (Docs/Architekture.md § 9.3) - Explain single-executor requirement - Document database-backed locking mechanism - Compare with alternative approaches (filesystem, env var) - Include troubleshooting guide - Container orchestration examples (Docker, Kubernetes, systemd) Why database-backed instead of filesystem? - Atomicity: SQLite transactions prevent TOCTOU race windows - Container-safe: Works across containers with shared DB volumes - No NFS/SMB edge cases - Timestamp-based stale detection (PID reuse is unreliable) - More reliable in rolling deployments Benefits: - Works with any process manager (uvicorn, gunicorn, etc.) - Handles simultaneous startup attempts correctly - Automatic failover on instance crash (stale lock cleanup) - Clear error messages with troubleshooting steps - No environment variable required (lock is authoritative) - Scales to multi-worker deployments if combined with external job store Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-29 20:10:53 +02:00
Lukas	6bc440dce4	Refactor backend configuration and authentication - Add comprehensive documentation for backend development - Improve client IP detection with utility functions and tests - Update auth router with better error handling - Refactor config module with environment-based settings Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-29 19:39:55 +02:00
Lukas	c2dd9f5f55	Add scheduled cleanup for rate limiter (#32) Implement periodic cleanup of expired rate-limiter entries to prevent unbounded memory growth during long runtimes. Changes: - Create rate_limiter_cleanup task that calls cleanup_expired() every 30 minutes - Register the task in the startup DAG alongside other background jobs - Update rate_limiter module documentation with operational notes about the cleanup lifecycle and memory management strategy The cleanup is conservative and only removes IPs with no recent attempts (all timestamps outside the rate-limit window), so active IPs are preserved. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-29 19:28:45 +02:00
Lukas	cc4370c50d	feat: Add runtime DNS-rebinding protection for blocklist HTTP connections ## Problem The blocklist URL validation at create/update time has a TOCTOU (time-of-check-to-time-of-use) window. An attacker can perform a DNS-rebinding attack where: 1. User adds blocklist URL pointing to attacker.com 2. At create time, attacker.com resolves to a public IP → validation passes 3. Later, when fetching, attacker.com resolves to 192.168.1.1 (internal network) 4. HTTP client connects to the private IP, potentially accessing internal services ## Solution Add runtime destination IP validation at connection time via a custom socket factory: - Created 'dns_validated_connector.py' with create_dns_validated_socket_factory() that validates all resolved IPs before socket creation - HTTP session now uses the validated socket factory, protecting all blocklist imports globally - Rejects connections to RFC 1918 private ranges, loopback, link-local, ULA, multicast, and reserved addresses (IPv4 and IPv6) - Added comprehensive test coverage with 13 test cases ## Changes - backend/app/services/dns_validated_connector.py: Custom socket factory with IP validation - backend/app/startup.py: Use DNS-validated socket factory in HTTP session creation - backend/app/utils/ip_utils.py: Updated docstring explaining runtime validation - backend/app/services/blocklist_downloader.py: Updated module docstring - backend/app/services/blocklist_service.py: Updated docstrings explaining two-layer protection - backend/tests/test_services/test_dns_validated_connector.py: Test suite for socket factory - Docs/Architekture.md: Added detailed section on DNS-rebinding protection ## Testing - All 13 DNS validation tests pass - All blocklist downloader tests pass (unaffected by changes) - Linting: ruff, mypy pass with --strict - Test coverage: 90% line coverage on dns_validated_connector.py Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-29 19:10:51 +02:00
Lukas	9072117db3	## 28) Login failure delay can enable app-layer DoS	2026-04-29 19:02:00 +02:00
Lukas	a2129bb9bd	Pagination contract is not standardized across endpoints	2026-04-28 21:40:22 +02:00
Lukas	2e221f6852	Refactor: Move module-level mutable flags to JailServiceState TASK-004: Replace module-level mutable runtime flags in service layer with injected state holder, eliminating hidden global state and improving testability and synchronization boundaries. Changes: - Create JailServiceState dataclass in app/utils/runtime_state.py to hold backend capability cache and synchronization lock - Add JailServiceState as a field in RuntimeState (with default_factory) - Remove module-level _backend_cmd_supported and _backend_cmd_lock from jail_service.py - Refactor _check_backend_cmd_supported() to accept state parameter - Inject JailServiceState into list_jails() and _fetch_jail_summary() via parameters - Add get_jail_service_state() dependency provider in app/dependencies.py - Add JailServiceStateDep type alias for router injection - Update jails router to receive and pass state to service functions - Update all tests to use jail_service_state fixture and pass state to functions - Remove duplicate _MAX_PAGE_SIZE constant definition - Document mutable state management in Backend-Development.md - Update Architecture.md to describe JailServiceState and state nesting pattern Benefits: - Eliminates global mutable state and associated race conditions - Makes state visible to callers (not hidden in module scope) - Enables test isolation (each test gets fresh state) - Prepares codebase for multi-worker deployments (state can be extracted to shared backend) - Synchronization boundaries are now explicit (state.get_backend_cmd_lock()) Compliance: - All tests pass (17 passed in TestListJails, TestGetJail, TestLockInitialization) - No ruff linting errors - Type-safe: JailServiceState properly typed with asyncio.Lock, bool | None Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-27 18:42:52 +02:00
Lukas	5d24780c63	TASK-028: Add exception logging to fire-and-forget asyncio.create_task() - Create logged_task() helper in backend/app/utils/async_utils.py to wrap fire-and-forget coroutines with exception logging - Ensures unhandled task exceptions are always logged to structlog instead of silently discarded (Python 3.11+ RuntimeWarning) - Update ban_service.py to use logged_task() for geo_cache.lookup_batch() background resolution - Add comprehensive tests for logged_task() in test_async_utils.py - Document fire-and-forget task conventions in Backend-Development.md The logged_task() wrapper catches any exception raised in a background task, logs it with full traceback context and task name, and never re-raises. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-26 15:17:30 +02:00
Lukas	d476e9d611	TASK-020: Fix log_target security vulnerability (defense in depth) Issue: - log_target accepted arbitrary paths, allowing authenticated users to write files as root via fail2ban (e.g., /etc/cron.d/bangui-pwned) - fail2ban runs as root and opens files specified in log_target Solution: 1. Model layer validation: Already existed in GlobalConfigUpdate, prevents invalid paths before reaching service 2. Service layer validation: Added defensive check in update_global_config() that validates log_target even if model validation is bypassed 3. New validation helper: Added validate_log_target() utility that accepts special values (STDOUT, STDERR, SYSLOG) or paths within allowed directories Changes: - app/utils/path_utils.py: Added validate_log_target() helper - app/services/config_service.py: Added service-layer validation before sending command to fail2ban - backend/tests: Fixed session_secret length issues in fixtures (min 32 chars) - backend/tests: Added tests for valid special log targets - Docs/Backend-Development.md: Documented log_target security requirements Test Coverage: - Model validation rejects /etc/passwd (existing test) - Model validation accepts STDOUT, STDERR, SYSLOG special values - Model validation accepts paths in allowed directories - Service layer validation tested with special values Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-26 14:23:56 +02:00
Lukas	667ab674ca	Fix SQLite LIKE wildcard escaping in IP filter queries - Add escape_like() helper to escape % and _ wildcards in LIKE queries - Update fail2ban_db_repo.get_history_page() to use escaping - Update history_archive_repo.get_archived_history() to use escaping - Add ESCAPE clause to all LIKE queries - Add comprehensive unit tests for escape_like function - Add integration tests for LIKE wildcard handling - Document LIKE escaping best practices in Backend-Development.md Fixes TASK-017: Prevent unintended LIKE matches when IP filter contains special characters like underscore or percent sign. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-26 14:07:49 +02:00
Lukas	94bdabe622	TASK-016: Validate delete_log_path query parameter with allowlist - Extract path validation logic into shared helper function in backend/app/utils/path_utils.py (validate_log_path) - Refactor AddLogPathRequest to use the helper function - Apply the same validation to DELETE /api/config/jails/{name}/logpath endpoint by validating the log_path query parameter - Return HTTP 422 with descriptive error if validation fails - Add comprehensive unit tests for path validation - Update Backend-Development.md with usage examples This prevents path-traversal attacks on the delete_log_path endpoint by ensuring all log paths are within allowlisted directories. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-26 14:04:21 +02:00
Lukas	4ab767e3d4	TASK-009: Mitigate SSRF vulnerability in blocklist URL validation - Change BlocklistSourceCreate.url from str to AnyHttpUrl (Pydantic type) - Rejects non-http schemes (file://, ftp://, etc.) at model boundary - Add is_private_ip() utility to detect RFC 1918 private ranges: - 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 (RFC 1918) - 127.0.0.0/8, ::1/128 (loopback) - 169.254.0.0/16, fe80::/10 (link-local) - IPv6 site-local, multicast, and reserved ranges - Add async validate_blocklist_url() function: - Resolves hostname via DNS using loop.run_in_executor() - Rejects if hostname resolves to private/reserved IP - Raises ValueError on validation failure - Integrate validation into service layer: - create_source() calls validate_blocklist_url() before persist - update_source() conditionally validates if url provided - Both raise ValueError on failure - Update router endpoints with error handling: - create_blocklist() and update_blocklist() catch ValueError - Return HTTP 400 Bad Request with descriptive error message - Add comprehensive test coverage (9 new SSRF tests): - file://, ftp://, localhost, 127.0.0.1, 192.168.x.x - 10.x.x.x, 172.16.x.x, 169.254.x.x (link-local) - Valid public URLs (passes validation) - All 36 service tests passing - Update documentation: - Features.md: Document URL validation constraints - Backend-Development.md: Add SSRF prevention pattern section Fixes SSRF vulnerability where authenticated users could supply file://, ftp://, or private IP URLs and the backend would fetch them. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-26 12:57:23 +02:00
Lukas	ea4c7c2f85	Implement login endpoint rate limiting (TASK-007) - Add in-memory rate limiter with per-IP deque tracking of attempt timestamps - Limit login attempts to 5 per 60 seconds per IP, return 429 on excess - Add Retry-After header to rate limit responses - Implement IP extraction utility with proxy trust validation (prevent X-Forwarded-For spoofing) - Integrate rate limiter into auth router and dependencies - Add 10-second asyncio.sleep on failed login attempts to further slow brute-force - Add comprehensive tests for rate limiting (9 new tests, all passing) - Update Features.md to document login rate limiting - Update Backend-Development.md with rate limiting conventions and design patterns - Fix test infrastructure issues: update password to meet complexity requirements - Fix TestValidateSession tests to use Bearer token authentication - All tests passing: 23 auth tests + full test suite coverage Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-26 12:40:52 +02:00
Lukas	d982fe3efc	TASK-003: Document process-local constraint for RuntimeState and SessionCache - Add comprehensive docstring to runtime_state.py explaining single-process constraint, impacts in multi-worker deployments, and solution approach - Add comprehensive docstring to session_cache.py explaining process-local cache limitation, security implications, and Redis/database alternatives - Update Architecture.md to clarify session cache is process-local and describe single-worker enforcement via TASK-002 - Update Architecture.md runtime state section with detailed explanation of per-process state and multi-worker impacts - Add Backend-Development.md section 13.7.2 documenting session cache pluggability pattern with example Redis implementation - All tests pass; linting passes; type checking has pre-existing errors This is the short-term fix for TASK-003: enforce single-worker deployment (TASK-002) and document the constraint clearly. The long-term fix (Redis backend) is deferred as a follow-up.	2026-04-26 11:43:34 +02:00
Lukas	ac2028e1c2	Fix: Consolidate divergent _since_unix implementations (T-09) Consolidate the two divergent implementations of _since_unix from ban_service.py and history_service.py into a single shared utility function in time_utils.py. Changes: - Move _since_unix to app/utils/time_utils.py with consistent time.time() approach - Move TIME_RANGE_SLACK_SECONDS constant to app/utils/constants.py - Update ban_service.py to import since_unix from time_utils - Update history_service.py to import since_unix from time_utils - Both services now use the same window boundary calculation with 60-second slack - Add comprehensive tests for the shared since_unix function - Document timestamp handling rationale in Backend-Development.md This ensures dashboard and history queries return consistent row counts for the same time range by using the same timestamp calculation and slack window across all services. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-25 18:44:59 +02:00
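
Commit 667ab674ca describes an escape_like() helper that escapes % and _ and pairs it with an ESCAPE clause. The sketch below shows one way such a helper could look. The function body, the backslash escape character, the `bans` table and the `find_ips()` wrapper are assumptions for illustration, not the repository's actual code.

```python
import sqlite3


def escape_like(value: str, escape_char: str = "\\") -> str:
    """Escape LIKE wildcards so the value is matched literally.

    Hypothetical sketch: the escape character itself is escaped first,
    then % and _ are prefixed with it.
    """
    return (
        value.replace(escape_char, escape_char * 2)
        .replace("%", escape_char + "%")
        .replace("_", escape_char + "_")
    )


def find_ips(conn: sqlite3.Connection, ip_filter: str) -> list:
    """Substring search on an assumed bans(ip) table with escaped input."""
    pattern = "%" + escape_like(ip_filter) + "%"
    # The ESCAPE clause tells SQLite which character marks a literal wildcard.
    rows = conn.execute(
        "SELECT ip FROM bans WHERE ip LIKE ? ESCAPE '\\' ORDER BY ip",
        (pattern,),
    ).fetchall()
    return [row[0] for row in rows]


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with two rows that differ only by '_' vs '.'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bans (ip TEXT)")
    conn.executemany(
        "INSERT INTO bans (ip) VALUES (?)",
        [("192.168.1.1",), ("192_168_1_1",)],
    )
    return conn
```

Without escaping, a filter such as `192_168` would match both demo rows because `_` matches any single character. With the helper, only the row containing literal underscores is returned.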

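
Commit ea4c7c2f85 mentions an in-memory rate limiter that tracks attempt timestamps in a per-IP deque, allows 5 attempts per 60 seconds, and returns a Retry-After value. A minimal sketch of that idea follows. The class name, method name and injectable clock are hypothetical, and the HTTP 429 response and proxy-aware IP extraction are left out.

```python
import math
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class SlidingWindowLimiter:
    """Hypothetical per-IP sliding-window limiter (process-local, in memory)."""

    def __init__(
        self,
        max_attempts: int = 5,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_attempts
        self._window = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._attempts: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, ip: str) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds) and record allowed attempts."""
        now = self._clock()
        attempts = self._attempts[ip]
        # Drop timestamps that have left the window.
        while attempts and now - attempts[0] >= self._window:
            attempts.popleft()
        if len(attempts) >= self._max:
            # The oldest attempt decides when the next slot frees up.
            retry_after = math.ceil(self._window - (now - attempts[0]))
            return False, max(retry_after, 1)
        attempts.append(now)
        return True, 0
```

A caller would translate `(False, n)` into a 429 response with `Retry-After: n`. Because the state lives in one process, this only works as intended under the single-worker constraint recorded in commit d982fe3efc.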

88 Commits