49 Commits

Author SHA1 Message Date
c988b4b8b6 Refactor provider composition and ESLint configuration
- Add new provider composition system with validation
- Create providerComposition.tsx for centralized provider management
- Implement providerOrderValidator.tsx to ensure correct provider order
- Add comprehensive tests for provider composition
- Create custom ESLint rules in frontend/eslint-rules/
- Update ESLint configuration
- Update architecture and tasks documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 17:33:56 +02:00
f074882f2d Update documentation and ErrorBoundary component
- Updated architecture documentation with refactoring notes
- Updated task documentation with progress
- Enhanced ErrorBoundary component for better error handling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 20:43:41 +02:00
69d32bfbe9 feat: Implement cross-tab authentication synchronization in AuthProvider
- Add BroadcastChannel API for real-time logout synchronization across tabs
- Implement storage event listener as fallback for older browsers
- When a user logs out in one tab, all other tabs immediately reflect the logout state
- Update tests to verify storage event and BroadcastChannel behavior
- Update Architecture.md to document cross-tab synchronization
- Update Web-Development.md with authentication state management notes

The provider now broadcasts logout messages to other tabs so they immediately
reflect the logout state without requiring a page refresh or additional API calls.
The implementation uses BroadcastChannel as the primary sync mechanism with
storage events as a fallback for older browsers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 20:15:26 +02:00
3d5acb756f refactor: move repository and service imports to module level in dependencies.py
Move all repository imports (session_repo, blocklist_repo, import_log_repo,
settings_repo, history_archive_repo, geo_cache_repo, fail2ban_db_repo) and
service imports (auth_service, health_service, default_fail2ban_metadata_service)
to module level in app/dependencies.py.

This eliminates the pattern of local imports inside provider functions,
which improves consistency and reduces import overhead. The statement
'from app.db import open_db' remains a local import since it is only used
within get_db().

- Verified no circular dependencies exist
- All repository and service provider functions simplified to return modules
- Updated Architekture.md § 2.3 to document the module-level import pattern
- All tests pass (28 dependency + auth tests)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 20:06:10 +02:00
277f2a467c Refactor rate limiting with exponential backoff strategy
- Update rate limiter to use exponential backoff instead of fixed limit
- Implement progressive delays for failed login attempts (0.5s, 1s, 2s, 4s, 5s max)
- Update auth router documentation and endpoint docs
- Refactor test suite to match new rate limiting behavior
- Update backend development documentation
- Clean up unused tasks documentation
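
A minimal sketch of the delay schedule listed above (0.5s, 1s, 2s, 4s, 5s max). The function name and its parameters are hypothetical and not taken from the rate limiter itself; it only assumes the delay doubles per failed attempt and is capped at 5 seconds.

```python
def login_backoff_delay(failed_attempts: int, base: float = 0.5, cap: float = 5.0) -> float:
    """Return the delay in seconds imposed after `failed_attempts` consecutive failures."""
    if failed_attempts <= 0:
        return 0.0
    # Double the base delay for each additional failure, never exceeding the cap.
    return min(base * 2 ** (failed_attempts - 1), cap)


schedule = [login_backoff_delay(n) for n in range(1, 6)]
print(schedule)  # [0.5, 1.0, 2.0, 4.0, 5.0]
```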

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 19:58:09 +02:00
2db635ae19 Fix exception handler overlap issue - add DomainError catch-all handler
**Problem:** Broad exception handlers created fragility where adding a new
DomainError subclass without explicit registration would silently fall through
to the generic exception handler, losing the specific error_code and metadata.

**Solution:**
1. Import DomainError in main.py for explicit handler registration
2. Fix type hints in exception handlers from 'Exception' to specific types
   - NotFoundError handler now typed as 'NotFoundError'
   - BadRequestError handler now typed as 'BadRequestError'
   - ConflictError handler now typed as 'ConflictError'
   - DomainError handler now typed as 'DomainError'
   - ServiceUnavailableError handler now typed as 'ServiceUnavailableError'
3. Add DomainError as an explicit catch-all handler in the registration chain
   - Positioned after specific handlers, before HTTPException
   - Any unregistered DomainError subclass now gets correct error_code + metadata
4. Document the exception handler hierarchy with detailed comments
5. Update Backend-Development.md with handler hierarchy documentation
6. Update Architekture.md section 2.2 with exception handler details
7. Fix test expectations in test_main.py to verify ErrorResponse format

**Impact:** Any new DomainError subclass now automatically gets the correct HTTP 500
status, error_code, and metadata, even if a developer forgets to register an explicit handler.
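
The fallback behaviour can be illustrated with a framework-free sketch. It models handler lookup as a walk over the exception's class hierarchy, so the most specific registered handler wins and DomainError catches the rest. The handler functions, the QuotaExceededError subclass, the error_code strings, and the 404 status are assumptions for illustration only; the real registration in main.py may differ.

```python
class DomainError(Exception):
    error_code = "domain_error"


class NotFoundError(DomainError):
    error_code = "not_found"


class QuotaExceededError(DomainError):
    """Hypothetical new subclass that nobody registered a handler for."""
    error_code = "quota_exceeded"


def handle_not_found(exc: NotFoundError) -> tuple[int, dict]:
    return 404, {"error_code": exc.error_code}


def handle_domain(exc: DomainError) -> tuple[int, dict]:
    # Catch-all for any DomainError subclass: keeps the specific error_code.
    return 500, {"error_code": exc.error_code}


def handle_generic(exc: Exception) -> tuple[int, dict]:
    # Last resort: the specific error_code is not available here.
    return 500, {"error_code": "internal_error"}


HANDLERS = {
    NotFoundError: handle_not_found,
    DomainError: handle_domain,
    Exception: handle_generic,
}


def dispatch(exc: Exception) -> tuple[int, dict]:
    """Pick the handler registered for the nearest class in the exception's MRO."""
    for cls in type(exc).__mro__:
        handler = HANDLERS.get(cls)
        if handler is not None:
            return handler(exc)
    raise exc
```

Without the DomainError entry, QuotaExceededError would fall through to handle_generic and lose its error_code, which is the fragility described in the problem statement.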

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 19:44:43 +02:00
9b4aee7f37 docs: enhance Pydantic validator constraints and mark task complete
Verified that BanGUI's codebase is fully compliant with the constraint that
Pydantic validators must not execute at import time or have side effects.

Changes:
- Architekture.md § 2.1: Added explicit 'No I/O or Side Effects' constraint
  for model validators, explaining why this prevents circular dependencies
- Backend-Development.md: Enhanced validator documentation with subsection
  on import-time execution, including wrong/correct examples
- Tasks.md: Marked '[Backend] Pydantic validators execute at import time'
  as COMPLETE with verification results and regression prevention guidance

Verification Summary:
✓ Audited 14 model files: no problematic imports or function calls
✓ Import time: 0.159s (fast, no import-time side effects)
✓ Type checking: mypy --strict passes on all models
✓ Unit tests: 17 tests pass (100%)
✓ Correct pattern in use: validation in routers/services, not models

The codebase architecture is sound: no code changes were required, only
documentation clarification to prevent future violations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 19:37:03 +02:00
100fd47c4b Refactor: Make model packages true leaf nodes - remove app-layer dependencies
Models in app/models/ are now pure data classes with no cross-layer dependencies.
This ensures the models layer remains a true leaf node in the dependency graph.

Changes:
- Create app/models/_common.py with shared types (TimeRange, bucket_count, constants)
- Move TimeRange and time-range constants from ban.py to _common.py
- Update history.py, routers, and services to import from _common.py
- Remove imports from app.config and app.utils from config.py models
- Move field validators from models to router layer:
  - Add log_target validation in config_misc router
  - Add log_path validation in jail_config router
- Update test_models.py to reflect validators moved to router layer
- Update documentation (Architekture.md, Backend-Development.md) with model layering rules
- Fix import ordering and type annotations in affected files

Model layering rule: Models may only import from:
✓ Standard library and third-party packages (Pydantic, typing)
✓ Other models in app/models/ (sibling models)
✓ app.models.response (response envelopes)
✗ app.services, app.config, app.utils, or any application layer

Validation requiring app-level state (settings, allowed directories) now happens
at the router or service layer, not in model validators.

Fixes: Models were not true leaf nodes due to circular imports and app-layer dependencies

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 19:31:11 +02:00
3d1a6f5538 Implement frontend and backend observability alignment
Align frontend and backend error observability with correlation IDs and
structured telemetry for distributed tracing across systems.

Backend changes:
- Add CorrelationIdMiddleware to generate/extract correlation IDs
- Include correlation_id in all ErrorResponse objects
- Store correlation ID in structlog contextvars for automatic inclusion in logs
- Add correlation ID to response headers (X-Correlation-ID)

Frontend changes:
- API client automatically generates session-scoped UUID4 and includes
  X-Correlation-ID header in all requests
- Extract correlation ID from API error responses
- Update error handlers to use telemetry with correlation IDs
- Add telemetry logging to ErrorBoundary, PageErrorBoundary, SectionErrorBoundary
- Implement redaction utilities for privacy-safe logging of sensitive data

Documentation:
- Add observability guidelines to Web-Development.md
  * Correlation ID usage patterns
  * Privacy & security best practices
  * Telemetry event structure
  * Redaction utilities for sensitive data
- Add distributed tracing architecture section to Architecture.md
  * Correlation ID flow across frontend/backend
  * Example troubleshooting scenario
  * Implementation details for future enhancements

Testing:
- Add comprehensive tests for correlation middleware
- Update error boundary tests to verify telemetry integration
- Verify TypeScript and ESLint pass with no warnings

Fixes: Issue #40 - Frontend and backend observability are not aligned
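
A rough sketch of the correlation ID flow on the backend side, written without FastAPI so it stays self-contained. The helper names and the response fields other than correlation_id are hypothetical; only the X-Correlation-ID header, the UUID4 format, and the use of contextvars come from the description above.

```python
from __future__ import annotations

import uuid
from contextvars import ContextVar

HEADER = "X-Correlation-ID"

# Holds the ID for the request currently being processed.
correlation_id_var: ContextVar[str | None] = ContextVar("correlation_id", default=None)


def resolve_correlation_id(request_headers: dict[str, str]) -> str:
    """Reuse the caller's correlation ID if present, otherwise generate a UUID4."""
    lowered = {key.lower(): value for key, value in request_headers.items()}
    cid = lowered.get(HEADER.lower()) or str(uuid.uuid4())
    correlation_id_var.set(cid)
    return cid


def error_response(error_code: str, message: str) -> dict[str, str | None]:
    """Build an error payload that carries the current correlation ID."""
    return {
        "error_code": error_code,
        "message": message,
        "correlation_id": correlation_id_var.get(),
    }


def response_headers() -> dict[str, str]:
    """Echo the correlation ID back to the client, if one is set."""
    cid = correlation_id_var.get()
    return {HEADER: cid} if cid else {}
```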

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 18:32:19 +02:00
9a43123b3a docs: Define explicit DI container strategy for backend service graph
- Add comprehensive 'Dependency Wiring and Service Composition' section to
  Architekture.md (§ 2.3) documenting:
  * The lightweight FastAPI Depends() pattern used as composition root
  * Service composition through explicit parameter passing
  * Service context dependencies pattern (SessionServiceContext, etc.)
  * Repository boundary enforcement
  * Lifecycle and scope management
  * Checklist for adding new services

- Update Backend-Development.md to reference the new Architecture section
  from the 'Dependency Layering' section

- Enhance dependencies.py module docstring with clear explanation of:
  * Composition root pattern
  * Explicit over implicit principles
  * Service context dependencies
  * Repository boundary enforcement

This resolves issue #39 by providing clear guidance on dependency wiring
without over-engineering. The pattern uses FastAPI's built-in Depends()
framework and avoids heavyweight container libraries, keeping the solution
lightweight and maintainable.

Fixes: #39

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 20:25:25 +02:00
187cd8250d Implement database-backed scheduler lock for multi-worker safety
Enforce single-executor safety regardless of process launcher through a
robust database-backed lock mechanism that works reliably in container
orchestration environments.

Key changes:
1. Add scheduler_lock table to database schema (migration 4)
   - Singleton row (id=1) prevents concurrent execution
   - Stores PID, hostname, creation timestamp, heartbeat timestamp
   - Atomic transaction prevents race conditions

2. Create scheduler lock utility (app/utils/scheduler_lock.py)
   - acquire_scheduler_lock(): Atomically acquire or fail
   - release_scheduler_lock(): Clean up on shutdown
   - update_scheduler_lock_heartbeat(): Keep lock alive (every 10 seconds)
   - get_scheduler_lock_info(): Debug/inspect lock status
   - Stale lock detection: TTL-based (60 second expiry)

3. Reorder startup DAG stages
   - DATABASE now comes first (required for lock acquisition)
   - WORKER_MODE depends on DATABASE (performs lock check after initialization)
   - Maintains all other stage dependencies intact

4. Update startup process (app/startup.py)
   - Replace _check_single_worker_mode() with two-tier check:
     * Fast check: BANGUI_WORKERS env var (if explicitly set to >1)
     * Authoritative check: Database lock (catches misconfiguration)
   - Return startup_db from startup_shared_resources() for lock management

5. Register scheduler lock heartbeat task
   - New task: scheduler_lock_heartbeat (app/tasks/scheduler_lock_heartbeat.py)
   - Updates lock heartbeat every 10 seconds (keeps lock alive)
   - Prevents false positives from temporary load spikes

6. Add lock release to lifespan shutdown (app/main.py)
   - Release lock before closing database
   - Allows other instances to acquire during rolling deployments
   - Graceful handoff between instances

7. Comprehensive test coverage (backend/tests/test_scheduler_lock.py)
   - Lock acquisition success and failure cases
   - Stale lock cleanup on startup
   - Lock release and heartbeat updates
   - Full lifecycle: acquire → heartbeat → release

8. Update documentation (Docs/Architekture.md § 9.3)
   - Explain single-executor requirement
   - Document database-backed locking mechanism
   - Compare with alternative approaches (filesystem, env var)
   - Include troubleshooting guide
   - Container orchestration examples (Docker, Kubernetes, systemd)

Why database-backed instead of filesystem?
   - Atomicity: SQLite transactions prevent TOCTOU race windows
   - Container-safe: Works across containers with shared DB volumes
   - No NFS/SMB edge cases
   - Timestamp-based stale detection (PID reuse is unreliable)
   - More reliable in rolling deployments

Benefits:
   - Works with any process manager (uvicorn, gunicorn, etc.)
   - Handles simultaneous startup attempts correctly
   - Automatic failover on instance crash (stale lock cleanup)
   - Clear error messages with troubleshooting steps
   - No environment variable required (lock is authoritative)
   - Scales to multi-worker deployments if combined with external job store
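
The singleton-row lock can be sketched with the standard sqlite3 module. This is an illustration under assumptions, not the code in app/utils/scheduler_lock.py: the column names, the synchronous connection, and the injectable `now` parameter are hypothetical, while the id=1 row, the 60 second TTL, and the function names follow the description above.

```python
from __future__ import annotations

import os
import socket
import sqlite3
import time

LOCK_TTL_SECONDS = 60  # stale-lock expiry described above

SCHEMA = """
CREATE TABLE IF NOT EXISTS scheduler_lock (
    id INTEGER PRIMARY KEY CHECK (id = 1),  -- singleton row
    pid INTEGER NOT NULL,
    hostname TEXT NOT NULL,
    created_at REAL NOT NULL,
    heartbeat_at REAL NOT NULL
)
"""


def acquire_scheduler_lock(db: sqlite3.Connection, now: float | None = None) -> bool:
    """Atomically take the lock; return False if a live holder exists.

    Assumes `db` was opened with isolation_level=None so that the explicit
    BEGIN/COMMIT statements below control the transaction.
    """
    now = time.time() if now is None else now
    db.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        # Drop a lock whose heartbeat is older than the TTL.
        db.execute(
            "DELETE FROM scheduler_lock WHERE heartbeat_at < ?",
            (now - LOCK_TTL_SECONDS,),
        )
        cur = db.execute(
            "INSERT OR IGNORE INTO scheduler_lock "
            "(id, pid, hostname, created_at, heartbeat_at) VALUES (1, ?, ?, ?, ?)",
            (os.getpid(), socket.gethostname(), now, now),
        )
        acquired = cur.rowcount == 1  # 0 means the row already existed
        db.execute("COMMIT")
        return acquired
    except Exception:
        db.execute("ROLLBACK")
        raise


def update_scheduler_lock_heartbeat(db: sqlite3.Connection, now: float | None = None) -> None:
    """Refresh the heartbeat so the lock is not treated as stale."""
    now = time.time() if now is None else now
    db.execute(
        "UPDATE scheduler_lock SET heartbeat_at = ? WHERE id = 1 AND pid = ?",
        (now, os.getpid()),
    )
```

Deleting the stale row and inserting the new one inside a single BEGIN IMMEDIATE transaction is what closes the check-then-act window between two instances starting at the same time.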

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 20:10:53 +02:00
1302ac821f Fix non-atomic setup persistence across DB contexts (Issue #30)
Implement transactional setup with explicit state machine and crash-safety
to prevent partial commits from leaving inconsistent state.

## Changes

### Core Implementation
1. **settings_repo.py**: Add atomic batch settings write
   - New set_settings_batch() method: writes multiple settings in single
     transaction (BEGIN IMMEDIATE ... COMMIT). Either all settings persist
     or none do, preventing partial state if crash occurs mid-batch.

2. **setup_service.py**: Refactor run_setup() with transactional phases
   - Phase 0: Compute password hash early (before any DB writes) to ensure
     idempotency. Same hash is used throughout retries, preventing divergent
     hashes from bcrypt's random salt.
   - Phase 1 (Bootstrap DB transaction): Set setup_state=in_progress and
     database_path, then commit. First checkpoint for crash detection.
   - Phase 2 (Filesystem): Initialize runtime database (idempotent)
   - Phase 3 (Runtime DB transaction): Batch-write all settings atomically
   - Phase 4 (Bootstrap DB transaction): Set setup_state=complete and
     setup_completed=1. Final commit point.

3. **protocols.py**: Add set_settings_batch to SettingsRepository protocol

### Testing
- Added 6 new transactionality tests covering:
  - State machine transitions (None → in_progress → complete)
  - Password hash idempotency across retries
  - Atomic batch writes (all-or-nothing persistence)
  - Bootstrap DB state tracking
  - Database path propagation to both DBs
  - Recovery on partial failure
- All 18 tests pass (12 existing + 6 new)

### Documentation
- Updated Docs/Architekture.md with new section 6:
  - Setup state machine with state transitions
  - Transaction boundary documentation
  - Password hash idempotency rationale
  - Backward compatibility notes

## Design Decisions

### Why This Approach
- The existing code was already idempotent via INSERT OR REPLACE, but the
  non-idempotent password hash created a silent inconsistency risk
- Simpler than multi-state machine: 2 states sufficient for detection
- Maintains backward compatibility (setup_completed key still written)
- Explicit transactions make crash-safety obvious to future maintainers

### Crash Scenarios Now Handled
1. Crash after Phase 1 → detected by setup_state=in_progress on retry
2. Crash after Phase 2 → runtime DB may be partial, safe to retry
3. Crash after Phase 3 → runtime DB rolls back on next connection
4. Crash after Phase 4 → setup_completed detected, skipped
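
A minimal sketch of the all-or-nothing batch write from Phase 3, using the standard sqlite3 module. The settings table layout and the synchronous connection are assumptions for illustration; the actual set_settings_batch() in settings_repo.py may be structured differently.

```python
import sqlite3
from typing import Optional


def set_settings_batch(db: sqlite3.Connection, settings: dict[str, Optional[str]]) -> None:
    """Write every setting in one transaction; on any error none of them persist.

    Assumes `db` was opened with isolation_level=None so the explicit
    BEGIN IMMEDIATE / COMMIT / ROLLBACK statements control the transaction.
    """
    db.execute("BEGIN IMMEDIATE")
    try:
        db.executemany(
            "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
            list(settings.items()),
        )
        db.execute("COMMIT")
    except Exception:
        # Undo any rows written before the failure.
        db.execute("ROLLBACK")
        raise
```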

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 19:19:53 +02:00
cc4370c50d feat: Add runtime DNS-rebinding protection for blocklist HTTP connections
## Problem
The blocklist URL validation at create/update time has a TOCTOU (time-of-check-to-time-of-use) window.
An attacker can perform a DNS-rebinding attack where:
1. User adds blocklist URL pointing to attacker.com
2. At create time, attacker.com resolves to a public IP → validation passes
3. Later, when fetching, attacker.com resolves to 192.168.1.1 (internal network)
4. HTTP client connects to the private IP, potentially accessing internal services

## Solution
Add runtime destination IP validation at connection time via a custom socket factory:

- Created 'dns_validated_connector.py' with create_dns_validated_socket_factory() that validates
  all resolved IPs before socket creation
- HTTP session now uses the validated socket factory, protecting all blocklist imports globally
- Rejects connections to RFC 1918 private ranges, loopback, link-local, ULA, multicast, and
  reserved addresses (IPv4 and IPv6)
- Added comprehensive test coverage with 13 test cases

## Changes
- backend/app/services/dns_validated_connector.py: Custom socket factory with IP validation
- backend/app/startup.py: Use DNS-validated socket factory in HTTP session creation
- backend/app/utils/ip_utils.py: Updated docstring explaining runtime validation
- backend/app/services/blocklist_downloader.py: Updated module docstring
- backend/app/services/blocklist_service.py: Updated docstrings explaining two-layer protection
- backend/tests/test_services/test_dns_validated_connector.py: Test suite for socket factory
- Docs/Architekture.md: Added detailed section on DNS-rebinding protection

## Testing
- All 13 DNS validation tests pass
- All blocklist downloader tests pass (unaffected by changes)
- Linting: ruff, mypy pass with --strict
- Test coverage: 90% line coverage on dns_validated_connector.py
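
The address check at the heart of the runtime validation can be sketched with the standard ipaddress module. The function names are hypothetical and this is not the contents of dns_validated_connector.py; it only shows one way to reject the address classes listed above, including IPv4-mapped IPv6 addresses.

```python
import ipaddress


def is_forbidden_destination(ip: str) -> bool:
    """Return True for private, loopback, link-local, multicast or reserved IPs."""
    addr = ipaddress.ip_address(ip)
    # Unwrap ::ffff:a.b.c.d so the embedded IPv4 address is what gets checked.
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return (
        addr.is_private  # RFC 1918 ranges and IPv6 ULA (fc00::/7)
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_multicast
        or addr.is_reserved
        or addr.is_unspecified
    )


def validate_resolved_ips(hostname: str, resolved_ips: list[str]) -> None:
    """Raise if any address the hostname resolved to is a forbidden destination."""
    blocked = [ip for ip in resolved_ips if is_forbidden_destination(ip)]
    if blocked:
        raise ValueError(f"{hostname} resolves to forbidden address(es): {blocked}")
```

Checking every resolved address at connection time, rather than only the URL at create time, is what closes the rebinding window described in the problem statement.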

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-29 19:10:51 +02:00
e86ab6dad1 10) Implement explicit startup DAG for resource initialization
- Created StartupDAG class to orchestrate startup stages with explicit dependencies
- Defined 6 startup stages: WORKER_MODE → DATABASE → GEO_CACHE → HTTP_SESSION → SCHEDULER → TASKS
- Each stage has prerequisites, error handling, and rollback support
- Refactored startup_shared_resources() to use the DAG
- Added StartupContext for resource tracking and failure management
- Partial failures automatically roll back all completed resources in reverse order
- Added health checks to verify all resources initialized successfully
- Comprehensive test coverage: 15 DAG unit tests + 3 integration tests + 6 existing tests
- Documented startup DAG in Architekture.md with detailed stage descriptions and failure modes

This replaces implicit ordering with explicit dependency tracking, making lifecycle
changes safe and failure modes predictable. Hidden order dependencies no longer exist.
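
The ordering and rollback behaviour can be sketched with graphlib from the standard library. This is an illustration under assumptions, not the StartupDAG class itself: the run_startup function and its callbacks are hypothetical, and the stage names and chain follow the list above.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Each stage maps to the set of stages it depends on.
STAGES: dict[str, set[str]] = {
    "WORKER_MODE": set(),
    "DATABASE": {"WORKER_MODE"},
    "GEO_CACHE": {"DATABASE"},
    "HTTP_SESSION": {"GEO_CACHE"},
    "SCHEDULER": {"HTTP_SESSION"},
    "TASKS": {"SCHEDULER"},
}


def run_startup(
    stages: dict[str, set[str]],
    start: Callable[[str], None],
    stop: Callable[[str], None],
) -> list[str]:
    """Start stages in dependency order; on failure, stop completed ones in reverse."""
    order = list(TopologicalSorter(stages).static_order())
    started: list[str] = []
    try:
        for name in order:
            start(name)
            started.append(name)
    except Exception:
        # Roll back only what actually started, newest first.
        for name in reversed(started):
            stop(name)
        raise
    return started
```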

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 08:08:05 +02:00
3888c5eb3f Refactor ban management with domain models and mappers
- Add ban domain model for core business logic separation
- Implement mapper pattern for DTO/domain conversions
- Update ban service with new domain-driven approach
- Refactor router endpoints to use new architecture
- Add comprehensive mapper tests
- Update documentation with architecture changes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 07:46:02 +02:00
2e221f6852 Refactor: Move module-level mutable flags to JailServiceState
TASK-004: Replace module-level mutable runtime flags in service layer with
injected state holder, eliminating hidden global state and improving testability
and synchronization boundaries.

Changes:
- Create JailServiceState dataclass in app/utils/runtime_state.py to hold
  backend capability cache and synchronization lock
- Add JailServiceState as a field in RuntimeState (with default_factory)
- Remove module-level _backend_cmd_supported and _backend_cmd_lock from
  jail_service.py
- Refactor _check_backend_cmd_supported() to accept state parameter
- Inject JailServiceState into list_jails() and _fetch_jail_summary() via
  parameters
- Add get_jail_service_state() dependency provider in app/dependencies.py
- Add JailServiceStateDep type alias for router injection
- Update jails router to receive and pass state to service functions
- Update all tests to use jail_service_state fixture and pass state to functions
- Remove duplicate _MAX_PAGE_SIZE constant definition
- Document mutable state management in Backend-Development.md
- Update Architecture.md to describe JailServiceState and state nesting pattern

Benefits:
- Eliminates global mutable state and associated race conditions
- Makes state visible to callers (not hidden in module scope)
- Enables test isolation (each test gets fresh state)
- Prepares codebase for multi-worker deployments (state can be extracted to
  shared backend)
- Synchronization boundaries are now explicit (state.get_backend_cmd_lock())

Compliance:
- All tests pass (17 passed in TestListJails, TestGetJail, TestLockInitialization)
- No ruff linting errors
- Type-safe: JailServiceState properly typed with asyncio.Lock, bool | None

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-27 18:42:52 +02:00
e08a16c7dd Refactor: Split blocklist import flow into focused components
Extracted the monolithic import_source() function (776 lines) into focused,
testable components with clear single responsibilities:

- BlocklistDownloader: HTTP download with exponential backoff retry logic
  * Handles transient failures (429, 5xx errors, timeouts)
  * Configurable retry attempts and backoff strategy
  * 93% test coverage

- BlocklistParser: Parse and validate IP addresses
  * Extract valid IPv4/IPv6 addresses from text
  * Skip CIDRs and malformed entries gracefully
  * Separate parsing from validation concerns
  * 100% test coverage

- BanExecutor: Ban execution with error handling
  * Ban IPs via fail2ban socket
  * Stop on JailNotFoundError (jail doesn't exist)
  * Continue on JailOperationError (individual ban failures)
  * 100% test coverage

- BlocklistImportWorkflow: Thin orchestrator
  * Coordinates the download → parse → ban → log flow
  * Pre-warms geo cache with newly banned IPs
  * 96% test coverage

- blocklist_service.py: Maintains public API
  * Source CRUD (create, read, update, delete)
  * URL validation and preview functionality
  * Scheduling configuration and import triggers
  * 92% test coverage

Benefits:
* Each component is independently testable with mock dependencies
* Error handling is explicit and localized
* Components can evolve independently
* Logging is contextual and clear
* Retry and transient error handling are isolated

Testing:
* All 36 existing blocklist_service tests pass
* All 13 blocklist import task tests pass
* Added 17 comprehensive component unit tests
* Combined 96%+ coverage on new modules
* Zero type errors in new code

Documentation:
* Updated Refactoring.md with detailed architecture notes
* Added component architecture diagram to Architekture.md
* Documented ownership and responsibilities of each component
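
As an illustration of the parsing step, a small sketch that keeps valid IPv4/IPv6 addresses and skips CIDRs and malformed entries. The function name is hypothetical, and stripping '#' comments is an assumption about the blocklist format rather than something stated above; BlocklistParser itself may behave differently.

```python
import ipaddress


def parse_blocklist(text: str) -> list[str]:
    """Return the valid single IP addresses found in a blocklist, one per line."""
    addresses: list[str] = []
    for raw_line in text.splitlines():
        # Assumption: anything after '#' is a comment.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        try:
            addresses.append(str(ipaddress.ip_address(line)))
        except ValueError:
            # CIDR ranges such as 10.0.0.0/8 and malformed entries land here.
            continue
    return addresses
```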

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-27 18:34:11 +02:00
e2560f5db0 TASK-032: Implement geo_cache retention policy and cleanup
Add automatic cleanup of stale geolocation cache entries to prevent
unbounded database growth. Resolves the issue where unique IP addresses
accumulated indefinitely in the geo_cache table, degrading query performance.

## Changes

### Database Schema (Migration 3)
- Add 'last_seen' column to geo_cache table tracking last reference time
- Existing entries default to current timestamp

### Repository Layer (geo_cache_repo.py)
- Update upsert_entry() to set/refresh last_seen on insert/update
- Update upsert_neg_entry() to set/refresh last_seen on negative cache hits
- Update bulk_upsert_entries() to set/refresh last_seen in batch operations
- Add delete_stale_entries(db, cutoff_iso) -> int for purging old entries

### Background Task (geo_cache_cleanup.py)
- New APScheduler task that runs nightly (24-hour interval)
- Calculates cutoff as 90 days ago from current time (UTC)
- Deletes all entries with last_seen older than cutoff
- Logs operation results (info when deleted > 0, debug when 0 deleted)
- Configurable retention period via GEO_CACHE_RETENTION_DAYS constant

### Application Startup (startup.py)
- Register geo_cache_cleanup task in scheduler during app startup
- Placed after geo_cache_flush in task registration order

### Tests
- Add delete_stale_entries test cases covering:
  * Removal of old entries beyond cutoff
  * No deletion when all entries are recent
  * Empty table edge case
- Update existing test fixtures to include last_seen column
- Add full test suite for cleanup task registration and execution

### Documentation
- Architekture.md: Document cleanup task, update schema/diagram
- Backend-Development.md: Add retention policy documentation

## Behavior

When an IP is accessed, its last_seen is refreshed. After 90 days of no
access, an IP is purged by the nightly cleanup. On next encounter, the IP
is re-resolved from MaxMind MMDB or ip-api.com (if configured).

This is acceptable because:
1. Stale geolocation data may become inaccurate over time
2. Re-resolution cost is minimal compared to unbounded storage growth
3. Active IPs maintain fresh data through their last_seen updates
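
A minimal sketch of the purge step, assuming a synchronous sqlite3 connection and last_seen stored as ISO 8601 UTC strings in one consistent format (so string comparison matches chronological order). The cutoff_for helper is hypothetical; delete_stale_entries and GEO_CACHE_RETENTION_DAYS take their names from the description above, but the bodies are illustrative.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

GEO_CACHE_RETENTION_DAYS = 90


def cutoff_for(now: datetime) -> str:
    """ISO timestamp before which entries count as stale."""
    return (now - timedelta(days=GEO_CACHE_RETENTION_DAYS)).isoformat()


def delete_stale_entries(db: sqlite3.Connection, cutoff_iso: str) -> int:
    """Delete entries whose last_seen is older than the cutoff; return the count."""
    cur = db.execute("DELETE FROM geo_cache WHERE last_seen < ?", (cutoff_iso,))
    db.commit()
    return cur.rowcount
```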

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 19:24:34 +02:00
1d91e24a88 TASK-030: Secure IP geolocation with MMDB-primary resolver
Make MaxMind GeoLite2-Country MMDB the primary IP resolver (local, encrypted)
and demote ip-api.com to optional fallback only (disabled by default).

Changes:
- Add geoip_allow_http_fallback config flag (default False) to Settings
- Refactor GeoCache.lookup() and lookup_batch() to try MMDB first
- Update startup.py to pass config flag and log security warning when HTTP enabled
- Update all 49 tests to reflect new MMDB-primary strategy
- Add comprehensive geoip configuration section to Backend-Development.md
- Update Architekture.md to show MMDB + optional HTTP in system dependencies
- Update .env.example with BANGUI_GEOIP_DB_PATH and HTTP fallback flag

Security impact:
- 99% of IP addresses (successful MMDB lookups) now stay local, encrypted
- HTTP-only IPs are cached for 5 minutes to minimize external calls
- Operators must explicitly enable HTTP fallback (security-conscious default)
- GDPR/CCPA compliance: no PII sent over unencrypted networks by default

Fixes TASK-030: Resolved plaintext IP transmission to ip-api.com

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 15:31:39 +02:00
c2348d7075 Refactor backend architecture and update documentation
- Add CSRF protection middleware implementation
- Update API client with improved configuration
- Enhance documentation for backend development
- Add architecture documentation updates
- Reorganize and clean up task documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 14:52:23 +02:00
81f009e323 TASK-022: Hash session tokens in database for security
- Store session tokens as one-way SHA256 hashes instead of plaintext
- Hash tokens on write (create_session) and on read (get_session, delete_session)
- Add migration to drop plaintext sessions table and recreate with token_hash column
- Update Session model: token field still contains raw token for signing
- Add test to verify tokens are hashed in database, not plaintext
- Update Architekture.md to document session token hashing
- Update Backend-Development.md with implementation pattern and best practices

Prevents direct session token hijacking if the database file is exposed to an attacker.
If the plaintext DB was previously readable, existing sessions are invalidated by the migration anyway.
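
A small sketch of the hash-on-write, hash-on-read pattern. The in-memory dictionary stands in for the sessions table and the SessionStore class is hypothetical; only the use of SHA-256 and the create_session/get_session/delete_session names come from the description above.

```python
import hashlib
import secrets
from typing import Optional


def hash_token(token: str) -> str:
    """One-way SHA-256 digest of a session token, hex encoded."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


class SessionStore:
    """Stand-in for the sessions table, keyed by token_hash only."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, str]] = {}

    def create_session(self, user: str) -> str:
        token = secrets.token_urlsafe(32)
        # Only the hash is stored; the raw token goes back to the caller.
        self._rows[hash_token(token)] = {"user": user}
        return token

    def get_session(self, token: str) -> Optional[dict[str, str]]:
        return self._rows.get(hash_token(token))

    def delete_session(self, token: str) -> None:
        self._rows.pop(hash_token(token), None)
```

Someone who reads the stored rows sees only digests, which cannot be presented as a session token.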

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 14:36:21 +02:00
5d9cef7760 TASK-013: Add nginx security headers (CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy)
- Added OWASP-recommended security headers to nginx server block
- CSP allows same-origin scripts and inline styles (required for Fluent UI v9)
- X-Frame-Options: DENY prevents clickjacking
- X-Content-Type-Options: nosniff prevents MIME-sniffing
- Referrer-Policy: no-referrer prevents URL leakage
- Permissions-Policy: disables geolocation, microphone, camera APIs
- HSTS commented out until HTTPS is fully configured
- All headers use 'always' directive for error responses (4xx, 5xx)
- Updated Architekture.md with security header documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 13:35:15 +02:00
a5b55d1248 Add session cleanup task and update documentation
- Implement session_cleanup task for removing expired sessions
- Add comprehensive tests for session cleanup functionality
- Update architecture and task documentation
- Integrate cleanup task into application startup

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 12:49:13 +02:00
9725714aa2 docs: document nginx routing rules to prevent SPA fallback hiding API 404s
TASK-006: Document the nginx routing configuration that ensures API requests
returning 404 from FastAPI are not intercepted by the SPA wildcard fallback
rule. This prevents development bugs from being masked by 200 responses
containing HTML instead of 404 errors.

Added section 9.2 in Architekture.md covering:
- nginx location block priority (longest-prefix matching)
- Routing configuration for /api/, /assets/, and /
- Detailed routing behavior diagrams
- Critical implementation notes to prevent regressions

The current nginx.conf is already correct:
- /api/ location has no try_files and proxies directly to backend
- /assets/ location uses try_files with =404
- / catch-all uses SPA fallback to index.html

This ensures:
✓ API typos like /api/jailss return 404, not SPA HTML
✓ Frontend routes serve SPA HTML for client-side routing
✓ Static assets properly return 404 when missing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 12:10:21 +02:00
d982fe3efc TASK-003: Document process-local constraint for RuntimeState and SessionCache
- Add comprehensive docstring to runtime_state.py explaining single-process
  constraint, impacts in multi-worker deployments, and solution approach
- Add comprehensive docstring to session_cache.py explaining process-local
  cache limitation, security implications, and Redis/database alternatives
- Update Architecture.md to clarify session cache is process-local and
  describe single-worker enforcement via TASK-002
- Update Architecture.md runtime state section with detailed explanation of
  per-process state and multi-worker impacts
- Add Backend-Development.md section 13.7.2 documenting session cache
  pluggability pattern with example Redis implementation
- All tests pass; linting passes; type checking has pre-existing errors

This is the short-term fix for TASK-003: enforce single-worker deployment
(TASK-002) and document the constraint clearly. The long-term fix (Redis
backend) is deferred as a follow-up.
2026-04-26 11:43:34 +02:00
825a67f13a Add multi-worker detection for APScheduler safety
- Add _check_single_worker_mode() to startup.py that detects and rejects
  multi-worker configurations, raising a clear RuntimeError with instructions
- Set BANGUI_WORKERS=1 as default in Dockerfile.backend
- Document single-worker requirement in compose.prod.yml
- Add 'Deployment Constraints' section to Architekture.md explaining why
  single-worker mode is required and detailing future multi-worker support
- Add '9.1 Background Tasks and Scheduler Architecture' section to
  Backend-Development.md documenting task structure and single-worker requirement
- Add comprehensive test suite (test_startup.py) covering all scenarios:
  allows single worker, rejects multi-worker, validates config format,
  and verifies informative error messages

This fix addresses TASK-002, which identified that in-process APScheduler is
unsafe in multi-worker deployments: each worker creates its own scheduler
instance, so every background job runs once per worker.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 11:39:51 +02:00
83452ffc23 Refactor backend services and jail configuration
- Refactor action_config_service, filter_config_service, jail_config_service, and jail_service
- Add jail_socket utility module for socket communication
- Update test_jail_service with new test cases
- Update architecture and task documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-25 18:34:03 +02:00
654dbdb000 T-04: Encapsulate geo_service module-level mutable state in GeoCache class
Create GeoCache class with all mutable state as instance attributes:
- _cache, _neg_cache, _dirty, _geoip_reader, _geoip_initialized, _cache_lock
- All public methods: lookup(), lookup_batch(), lookup_cached_only(), flush_dirty(), load_from_db(), clear(), etc.

Initialization & Dependency Injection:
- Instantiate GeoCache in startup.py and store on app.state.geo_cache
- Add get_geo_cache() dependency function in dependencies.py
- Inject into routes and tasks via FastAPI's dependency system

Backward Compatibility:
- Maintain module-level functions in geo_service.py as deprecated wrappers
- All old callers continue to work through _default_geo_cache instance
- Remove test-escape-hatch functions (clear_cache, clear_neg_cache moved to methods)

Background Tasks:
- Update geo_cache_flush.py and geo_re_resolve.py to receive GeoCache instance
- Tasks now operate on injected instance rather than module globals

Tests:
- Refactor test_geo_service.py with geo_cache fixture providing fresh instances
- Update patch paths to target GeoCache methods correctly
- Fix internal state assertions to access instance attributes

Documentation:
- Update Architekture.md to document GeoCache as managed stateful service
- Describe cache lifecycle (load on startup, flush periodically, re-resolve stale)
- Note process-local limitations for multi-worker deployments

Fixes violation of Single Responsibility Principle: module no longer owns both
lookup logic and cache lifecycle management. Cache is now a first-class
injectable service with transparent lifecycle.
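
A rough sketch of the encapsulation described above, assuming a synchronous
lookup and a caller-supplied `resolver` callable. Both are simplifications
introduced here; the real GeoCache method signatures, the GeoIP reader, and
the database persistence are not shown in this commit message.

```python
import threading
from typing import Callable, Optional


class GeoCache:
    """Process-local geo cache; all mutable state lives on the instance."""

    def __init__(self) -> None:
        self._cache: dict[str, str] = {}      # ip -> country code
        self._neg_cache: set[str] = set()     # ips that failed to resolve
        self._dirty: set[str] = set()         # entries not yet persisted
        self._cache_lock = threading.Lock()

    def lookup(
        self, ip: str, resolver: Callable[[str], Optional[str]]
    ) -> Optional[str]:
        """Return the cached country for ip, resolving it on a miss."""
        with self._cache_lock:
            if ip in self._cache:
                return self._cache[ip]
            if ip in self._neg_cache:
                return None
        country = resolver(ip)  # hypothetical resolver, called outside the lock
        with self._cache_lock:
            if country is None:
                self._neg_cache.add(ip)
            else:
                self._cache[ip] = country
                self._dirty.add(ip)
        return country

    def flush_dirty(self) -> dict[str, str]:
        """Return the entries changed since the last flush and mark them clean."""
        with self._cache_lock:
            pending = {ip: self._cache[ip] for ip in self._dirty}
            self._dirty.clear()
        return pending
```

Because nothing is stored at module level, a test fixture can create a fresh
GeoCache per test instead of resetting globals.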

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-23 16:18:09 +02:00
b634ce876a refactor: Extract fail2ban response utilities into shared module
Consolidate duplicate _ok(), _to_dict(), ensure_list(), and is_not_found_error()
functions from 6 service modules into a single canonical implementation at
backend/app/utils/fail2ban_response.py.

Changes:
- Create fail2ban_response.py with canonical implementations
- Remove local duplicates from: ban_service, jail_service, config_service,
  health_service, server_service, config_file_utils
- Update all imports to use shared module
- Add comprehensive docstrings and examples
- Update Architecture.md and Backend-Development.md documentation

Benefits:
- Single source of truth for response parsing logic
- Eliminates code duplication across service layer
- Improves maintainability and consistency
- Enables centralized bug fixes and improvements

Tests: All 228 service tests passing, no regressions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-23 15:11:21 +02:00
be1d66988f Document RuntimeState concurrency model and mark task 5 complete 2026-04-18 19:56:41 +02:00
db5b4cb77e Add settings and history archive repository protocols and DI support 2026-04-17 20:54:08 +02:00
5e5d7c34b2 Document task DB access and unify background task DB handling 2026-04-17 17:18:49 +02:00
cdb0c3681e Task 3: remove config_file_service facade, update direct imports and tests 2026-04-15 21:16:00 +02:00
6dc53a80b5 Mark TASK-13 complete and document fail2ban_metadata_service 2026-04-15 09:14:29 +02:00
a8f2d2d7b9 Refactor geo re-resolve endpoint into geo_service and add typed response 2026-04-15 08:56:37 +02:00
58bb769a35 Refactor history sync into history_service and update docs/tests 2026-04-14 15:09:58 +02:00
5a9d226cca Consolidate fail2ban truthy values into shared constants 2026-04-14 09:03:49 +02:00
a5674f9e4c Consolidate domain exceptions into app.exceptions
Move all shared domain exception classes to backend/app/exceptions.py and update services/routers to import the canonical exceptions. Update docs to reflect the shared exceptions source.
2026-04-13 19:35:12 +02:00
8e43ef9ad2 Fix setup_service to mark setup_complete only after successful runtime DB init 2026-04-12 20:30:22 +02:00
effcc65e1b Document process-local auth session cache semantics
Clarify that dependencies.py session cache is process-local and not cluster-safe, and document the limitation in architecture docs.
2026-04-07 20:42:31 +02:00
335f89c554 Docs: mark Task 8/9 completed and update architecture docs 2026-03-22 14:24:28 +01:00
cc235b95c6 Split config_file_service.py into three specialized service modules
Extract jail, filter, and action configuration management into separate
domain-focused service modules:

- jail_config_service.py: Jail activation, deactivation, validation, rollback
- filter_config_service.py: Filter discovery, CRUD, assignment to jails
- action_config_service.py: Action discovery, CRUD, assignment to jails

Benefits:
- Splits a monolithic 3100-line module into three focused modules
- Improves readability and maintainability per domain
- Clearer separation of concerns following single responsibility principle
- Easier to test domain-specific functionality in isolation
- Reduces coupling - each service only depends on its needed utilities

Changes:
- Create three new service modules under backend/app/services/
- Update backend/app/routers/config.py to import from new modules
- Update exception and function imports to source from appropriate service
- Update Architecture.md to reflect new service organization
- All existing tests continue to pass with new module structure

Relates to Task 4 of refactoring backlog in Docs/Tasks.md
2026-03-22 14:24:28 +01:00
ab11ece001 Add fail2ban log viewer and service health to Config page
Task 2: adds a new Log tab to the Configuration page.

Backend:
- New Pydantic models: Fail2BanLogResponse, ServiceStatusResponse
  (backend/app/models/config.py)
- New service methods in config_service.py:
    read_fail2ban_log() — queries socket for log target/level, validates the
    resolved path against a safe-prefix allowlist (/var/log) to prevent
    path traversal, then reads the tail of the file via the existing
    _read_tail_lines() helper; optional substring filter applied server-side.
    get_service_status() — delegates to health_service.probe() and appends
    log level/target from the socket.
- New endpoints in routers/config.py:
    GET /api/config/fail2ban-log?lines=200&filter=...
    GET /api/config/service-status
  Both require authentication; log endpoint returns 400 for non-file log
  targets or path-traversal attempts, 502 when fail2ban is unreachable.
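
The safe-prefix check in read_fail2ban_log() could be sketched roughly as
follows. The function name `is_safe_log_path` and the `allowed_prefixes`
parameter are hypothetical; only the /var/log allowlist comes from this
commit. Non-file log targets would need a separate check that is not shown.

```python
from pathlib import Path


def is_safe_log_path(raw_path: str, allowed_prefixes=("/var/log",)) -> bool:
    """Return True only if raw_path resolves to a file below an allowed prefix."""
    # resolve() collapses ".." segments and follows symlinks, so the
    # comparison is made on the real location rather than the raw string.
    resolved = Path(raw_path).resolve()
    for prefix in allowed_prefixes:
        base = Path(prefix).resolve()
        # Compare whole path components; a plain startswith() would also
        # accept sibling directories such as /var/logs-evil.
        if base in resolved.parents:
            return True
    return False
```

A caller would reject the request (the endpoint above returns 400) whenever
this returns False, before opening the file.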

Frontend:
- New LogTab.tsx component:
    Service Health panel (Running/Offline badge, version, jail count, bans,
    failures, log level/target, offline warning banner).
    Log viewer with color-coded lines (error=red, warning=yellow,
    debug=grey), toolbar (filter input + debounce, lines selector, manual
    refresh, auto-refresh with interval selector), truncation notice, and
    auto-scroll to bottom on data updates.
  fetchData uses Promise.allSettled so a log-read failure never hides the
  service-health panel.
- Types: Fail2BanLogResponse, ServiceStatusResponse (types/config.ts)
- API functions: fetchFail2BanLog, fetchServiceStatus (api/config.ts)
- Endpoint constants (api/endpoints.ts)
- ConfigPage.tsx: Log tab added after existing tabs

Tests:
- Backend service tests: TestReadFail2BanLog (6), TestGetServiceStatus (2)
- Backend router tests: TestGetFail2BanLog (8), TestGetServiceStatus (3)
- Frontend: LogTab.test.tsx (8 tests)

Docs:
- Features.md: Log section added under Configuration View
- Architekture.md: config.py router and config_service.py descriptions updated
- Tasks.md: Task 2 marked done
2026-03-14 12:54:03 +01:00
8d9d63b866 feat(stage-1): inactive jail discovery and activation
- Backend: config_file_service.py parses jail.conf/jail.local/jail.d/*
  following fail2ban merge order; discovers jails not running in fail2ban
- Backend: 3 new API endpoints (GET /jails/inactive, POST /jails/{name}/activate,
  POST /jails/{name}/deactivate); moved /jails/inactive before /jails/{name}
  to fix route-ordering conflict
- Frontend: ActivateJailDialog component with optional parameter overrides
- Frontend: JailsTab extended with inactive jail list and InactiveJailDetail pane
- Frontend: JailsPage JailOverviewSection shows inactive jails with toggle
- Tests: 57 service tests + 16 router tests for all new endpoints (all pass)
- Docs: Features.md, Architekture.md, Tasks.md updated; Tasks 1.1-1.5 marked done
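
The route-ordering conflict can be illustrated with a small first-match
router. This is a toy model, not FastAPI or Starlette code; `compile_route`
and `match_first` are names invented for this sketch. It relies only on the
fact that routes are tried in registration order.

```python
import re


def compile_route(template: str) -> re.Pattern:
    """Turn '/jails/{name}' into a regex with one named group per parameter."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")


def match_first(routes, path):
    """Return (template, params) for the first route that matches, else None."""
    for template in routes:
        match = compile_route(template).match(path)
        if match:
            return template, match.groupdict()
    return None


# Parameterized route registered first: it swallows the literal path.
wrong_order = ["/jails/{name}", "/jails/inactive"]
# Literal route registered first: both paths reach the intended handler.
fixed_order = ["/jails/inactive", "/jails/{name}"]

print(match_first(wrong_order, "/jails/inactive"))
print(match_first(fixed_order, "/jails/inactive"))
```

With the wrong order, a request for /jails/inactive is handled as a lookup of
a jail named "inactive", which is why the literal route has to come first.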
2026-03-13 15:44:36 +01:00
a344f1035b docs: update Features and Architecture for config list/detail redesign
- Features.md §6: describe list/detail layout with active/inactive badges,
  active-first sort, and per-item collapsible raw config editing
- Architekture.md routers: add file_config.py router entry
- Architekture.md services: add file_config_service.py and conffile_parser.py
- Architekture.md components: add ConfigListDetail, RawConfigSection,
  AutoSaveIndicator
- Architekture.md hooks: add useConfigActiveStatus, useFilterConfig,
  useActionConfig, useJailFileConfig, useAutoSave
- Architekture.md API layer: expand config.ts entry with full function list
2026-03-13 14:44:29 +01:00
fe8eefa173 Add jail distribution chart (Stage 5)
- backend: GET /api/dashboard/bans/by-jail endpoint
  - JailBanCount + BansByJailResponse Pydantic models in ban.py
  - bans_by_jail() service function with origin filter support
  - Route added to dashboard router
  - 17 new tests (7 service, 10 router); full suite 497 passed, 83% coverage

- frontend: JailDistributionChart component
  - JailBanCount / BansByJailResponse types in types/ban.ts
  - dashboardBansByJail endpoint constant in api/endpoints.ts
  - fetchBansByJail() in api/dashboard.ts
  - useJailDistribution hook in hooks/useJailDistribution.ts
  - JailDistributionChart component (horizontal bar chart, Recharts)
  - DashboardPage: full-width Jail Distribution section below Top Countries
2026-03-11 17:01:19 +01:00
cbad4ea706 Add ban management features and update documentation
- Implement ban model, service, and router endpoints in backend
- Add ban table component and dashboard integration in frontend
- Update ban-related types and API endpoints
- Add comprehensive tests for ban service and dashboard router
- Update documentation (Features, Tasks, Architecture, Web-Design)
- Clean up old fail2ban configuration files
- Update Makefile with new commands
2026-03-06 20:33:42 +01:00
c097e55222 fix: setup routing, async bcrypt, password hashing, clean command
- Add SetupGuard component: redirects to /setup if setup is not complete and
  shows a spinner while loading. All routes except /setup are now wrapped.
- SetupPage redirects to /login on mount when setup already done.
- Fix async blocking: offload bcrypt.hashpw and bcrypt.checkpw to
  run_in_executor so they never stall the asyncio event loop.
- Hash password with SHA-256 (SubtleCrypto) before transmission; add
  src/utils/crypto.ts with sha256Hex(). Backend stores bcrypt(sha256).
- Add Makefile with make up/down/restart/logs/clean targets.
- Add tests: _check_password async, concurrent bcrypt, expired session,
  login-without-setup, run_setup event-loop interleaving.
- Update Architekture.md and Features.md to reflect all changes.
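
A sketch of the executor offloading described above. To stay runnable with
the standard library only, hashlib.pbkdf2_hmac stands in for bcrypt.hashpw and
bcrypt.checkpw; the function names and the iteration count are assumptions.
The pattern of pushing the CPU-heavy call through run_in_executor is the same
either way.

```python
import asyncio
import hashlib
import hmac
import os


def _slow_hash(password_sha256_hex: str, salt: bytes) -> bytes:
    # Stand-in for bcrypt.hashpw: deliberately CPU-heavy and blocking.
    return hashlib.pbkdf2_hmac(
        "sha256", password_sha256_hex.encode(), salt, 100_000
    )


async def hash_password(password_sha256_hex: str, salt: bytes) -> bytes:
    """Hash in the default thread pool so the event loop stays responsive."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        None, _slow_hash, password_sha256_hex, salt
    )


async def check_password(
    password_sha256_hex: str, salt: bytes, expected: bytes
) -> bool:
    """Recompute the hash off the event loop and compare in constant time."""
    candidate = await hash_password(password_sha256_hex, salt)
    return hmac.compare_digest(candidate, expected)


async def demo() -> tuple:
    salt = os.urandom(16)
    # The client sends the SHA-256 hex digest, never the raw password.
    client_digest = hashlib.sha256(b"secret").hexdigest()
    stored = await hash_password(client_digest, salt)
    ok = await check_password(client_digest, salt, stored)
    wrong = hashlib.sha256(b"wrong").hexdigest()
    bad = await check_password(wrong, salt, stored)
    return ok, bad
```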
2026-03-01 19:16:49 +01:00
460d877339 instructions 2026-02-28 20:52:29 +01:00