- Create PaginationMetadata model with computed derived fields (total_pages, has_next_page, has_prev_page)
- Update PaginatedListResponse to embed pagination metadata in a separate 'pagination' object
- Add create_pagination_metadata() factory function in utils/pagination.py for consistent computation
- Update all paginated service functions to use new structure:
- history_service.list_history()
- blocklist_service.get_import_logs()
- jail_service.get_jail_banned_ips()
- ban_mappers.map_domain_dashboard_ban_list_to_response()
- Update response model docstrings with new structure examples
- Update Backend-Development.md documentation with new pagination patterns
- Update test fixtures to work with new response structure
Response shape changes from:
{"items": [...], "total": 100, "page": 1, "page_size": 50}
To:
{"items": [...], "pagination": {"page": 1, "page_size": 50, "total": 100, "total_pages": 2, "has_next_page": true, "has_prev_page": false}}
Benefits:
- Frontend receives all pagination state needed for UI controls
- No need for frontend to calculate total_pages or page navigation logic
- Consolidated pagination metadata reduces field sprawl
- OpenAPI schema automatically reflects changes
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit addresses race conditions in multi-step database operations by:
1. Wrap write operations in BEGIN IMMEDIATE ... COMMIT transactions:
- import_run_repo: create_pending, mark_completed, mark_failed
- geo_cache_repo: all upsert_*_and_commit functions
- geo_cache_repo: bulk_upsert_entries_and_neg_entries_and_commit
2. Handle concurrent write collisions gracefully:
- import_run_repo.create_pending can now raise IntegrityError
- blocklist_import_workflow catches IntegrityError and retries lookup
- Logs 'blocklist_import_lost_race' event when another request wins the race
3. Add comprehensive documentation:
- Backend-Development.md § 6.3 Database Transactions
- Explains when to use BEGIN IMMEDIATE
- Shows transaction pattern with try-except-rollback
- Documents race condition error handling pattern
The solution leverages SQLite's UNIQUE constraint for data integrity while
handling the concurrent case gracefully in application logic. This is more
efficient than using BEGIN EXCLUSIVE which would serialize all writers.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CRITICAL FIX: Background tasks (especially blocklist_import) crashed mid-execution,
leaving partial state. On retry, the same bans were applied again, causing duplicates.
Solution: Content-hash based operation tracking for blocklist imports:
- Added import_runs table (migration 6) to track operations by source + content hash
- Before banning, check if this exact content has already been imported
- If completed: skip banning (already done), optionally re-warm cache
- If new or failed: proceed with ban and mark as completed or failed
Changes:
- Database: Migration 6 adds import_runs table with operation state tracking
- Model: Added ImportRunEntry for import run records
- Repository: New import_run_repo module with CRUD operations
- Workflow: Updated blocklist_import_workflow to check operation history before banning
- Dependencies: Registered import_run_repo for dependency injection
- Tests: Added test_import_source_idempotent_on_retry and test_import_source_different_content_not_reused
- Documentation: Added Task Idempotency section to Backend-Development.md
Verification:
- All 7 import tests pass (5 existing + 2 new idempotency tests)
- Type checking: mypy --strict ✅
- Linting: ruff ✅
- No API changes, backwards compatible via automatic migration
Fixes: Background tasks not idempotent #CRITICAL
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Models in app/models/ are now pure data classes with no cross-layer dependencies.
This ensures the models layer remains a true leaf node in the dependency graph.
Changes:
- Create app/models/_common.py with shared types (TimeRange, bucket_count, constants)
- Move TimeRange and time-range constants from ban.py to _common.py
- Update history.py, routers, and services to import from _common.py
- Remove imports from app.config and app.utils from config.py models
- Move field validators from models to router layer:
- Add log_target validation in config_misc router
- Add log_path validation in jail_config router
- Update test_models.py to reflect validators moved to router layer
- Update documentation (Architekture.md, Backend-Development.md) with model layering rules
- Fix import ordering and type annotations in affected files
Model layering rule: Models may only import from:
✓ Standard library and third-party packages (Pydantic, typing)
✓ Other models in app/models/ (sibling models)
✓ app.models.response (response envelopes)
✗ app.services, app.config, app.utils, or any application layer
Validation requiring app-level state (settings, allowed directories) now happens
at the router or service layer, not in model validators.
Fixes: Models were not true leaf nodes due to circular imports and app-layer dependencies
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implement transactional setup with explicit state machine and crash-safety
to prevent partial commits from leaving inconsistent state.
## Changes
### Core Implementation
1. **settings_repo.py**: Add atomic batch settings write
- New set_settings_batch() method: writes multiple settings in single
transaction (BEGIN IMMEDIATE ... COMMIT). Either all settings persist
or none do, preventing partial state if crash occurs mid-batch.
2. **setup_service.py**: Refactor run_setup() with transactional phases
- Phase 0: Compute password hash early (before any DB writes) to ensure
idempotency. Same hash is used throughout retries, preventing divergent
hashes from bcrypt's random salt.
- Phase 1 (Bootstrap DB transaction): Set setup_state=in_progress and
database_path, then commit. First checkpoint for crash detection.
- Phase 2 (Filesystem): Initialize runtime database (idempotent)
- Phase 3 (Runtime DB transaction): Batch-write all settings atomically
- Phase 4 (Bootstrap DB transaction): Set setup_state=complete and
setup_completed=1. Final commit point.
3. **protocols.py**: Add set_settings_batch to SettingsRepository protocol
### Testing
- Added 6 new transactionality tests covering:
- State machine transitions (None → in_progress → complete)
- Password hash idempotency across retries
- Atomic batch writes (all-or-nothing persistence)
- Bootstrap DB state tracking
- Database path propagation to both DBs
- Recovery on partial failure
- All 18 tests pass (12 existing + 6 new)
### Documentation
- Updated Docs/Architekture.md with new section 6:
- Setup state machine with state transitions
- Transaction boundary documentation
- Password hash idempotency rationale
- Backward compatibility notes
## Design Decisions
### Why This Approach
- Current code already idempotent via INSERT OR REPLACE, but password
hash non-idempotency created silent inconsistency risk
- Simpler than multi-state machine: 2 states sufficient for detection
- Maintains backward compatibility (setup_completed key still written)
- Explicit transactions make crash-safety obvious to future maintainers
### Crash Scenarios Now Handled
1. Crash after Phase 1 → detected by setup_state=in_progress on retry
2. Crash after Phase 2 → runtime DB may be partial, safe to retry
3. Crash after Phase 3 → runtime DB rolls back on next connection
4. Crash after Phase 4 → setup_completed detected, skipped
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
## Problem
The blocklist URL validation at create/update time has a TOCTOU (time-of-check-to-time-of-use) window.
An attacker can perform a DNS-rebinding attack where:
1. User adds blocklist URL pointing to attacker.com
2. At create time, attacker.com resolves to a public IP → validation passes
3. Later, when fetching, attacker.com resolves to 192.168.1.1 (internal network)
4. HTTP client connects to the private IP, potentially accessing internal services
## Solution
Add runtime destination IP validation at connection time via a custom socket factory:
- Created 'dns_validated_connector.py' with create_dns_validated_socket_factory() that validates
all resolved IPs before socket creation
- HTTP session now uses the validated socket factory, protecting all blocklist imports globally
- Rejects connections to RFC 1918 private ranges, loopback, link-local, ULA, multicast, and
reserved addresses (IPv4 and IPv6)
- Added comprehensive test coverage with 13 test cases
## Changes
- backend/app/services/dns_validated_connector.py: Custom socket factory with IP validation
- backend/app/startup.py: Use DNS-validated socket factory in HTTP session creation
- backend/app/utils/ip_utils.py: Updated docstring explaining runtime validation
- backend/app/services/blocklist_downloader.py: Updated module docstring
- backend/app/services/blocklist_service.py: Updated docstrings explaining two-layer protection
- backend/tests/test_services/test_dns_validated_connector.py: Test suite for socket factory
- Docs/Architekture.md: Added detailed section on DNS-rebinding protection
## Testing
- All 13 DNS validation tests pass
- All blocklist downloader tests pass (unaffected by changes)
- Linting: ruff, mypy pass with --strict
- Test coverage: 90% line coverage on dns_validated_connector.py
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit standardizes how API responses are wrapped, solving issue #24.
Problem:
- Inconsistent response envelopes (jails vs items vs bans vs no wrapper)
- Frontend required multiple field name variants
- Integration bugs from branching logic
- No clear pattern for different response types
Solution:
- Created response.py with base classes: PaginatedListResponse,
CollectionResponse, CommandResponse
- Standardized all list/collection responses to use 'items' field
- Domain-specific field names for detail and aggregation responses
- Updated all backends routers and mappers
- Updated frontend types and hooks to match
Changes:
Backend:
- backend/app/models/response.py (new): Base response models
- backend/app/models/ban.py: Updated responses to inherit from bases
- backend/app/models/jail.py: Updated JailListResponse, JailCommandResponse
- backend/app/models/config.py: Updated collection responses
- backend/app/services/jail_service.py: Updated return statements
- backend/app/mappers/ban_mappers.py: Updated 'bans' to 'items'
- backend/tests/test_mappers/test_ban_mappers.py: Updated tests
Frontend:
- frontend/src/types/jail.ts: Updated response interfaces
- frontend/src/types/config.ts: Updated response interfaces
- frontend/src/hooks/useActiveBans.ts: Updated selector
- frontend/src/hooks/useJailList.ts: Updated selector
- frontend/src/hooks/useJailConfigs.ts: Updated selector
- frontend/src/hooks/useConfigActiveStatus.ts: Updated field access
- frontend/src/hooks/useJailAdmin.ts: Updated field access
Documentation:
- Docs/Backend-Development.md: Added § 4.1 API Response Envelope Policy
The policy defines:
1. Paginated lists use PaginatedListResponse (items, total, page, page_size)
2. Non-paginated collections use CollectionResponse (items, total)
3. Detail responses use entity-specific field names (jail, status, settings)
4. Command responses use CommandResponse (message, success, optional target)
5. Aggregations use domain-specific fields (jails, countries, buckets, bans)
All responses now follow one of these patterns, reducing frontend complexity.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Convert inconsistent modeling style to standardized Pydantic models for all
external-facing data structures while maintaining TypedDict compatibility where
appropriate for internal layer-private structures.
Changes:
- Converted IpLookupResult TypedDict to use IpLookupResponse Pydantic model
in jail_service.lookup_ip() for consistency with routers
- Added GeoCacheEntry Pydantic model for geo cache repository rows
- Converted GeoCacheRow TypedDict to use GeoCacheEntry alias
- Converted ImportLogRow TypedDict to use ImportLogEntry alias
- Updated routers and services to work with Pydantic models
- Updated all tests to use Pydantic model field access (attributes)
instead of dict subscripting
Documentation:
- Added 'Model Type Usage by Layer' section to Backend-Development.md
- Defines when TypedDict is allowed (internal structures) vs Pydantic
(external-facing, cross-boundary data)
- Provides clear guidance on modeling conventions per layer
Benefits:
- Consistent validation and serialization behavior
- Better IDE support and type checking
- Clearer separation of concerns by layer
- Reduced maintenance cost from mixed validation approaches
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add ban domain model for core business logic separation
- Implement mapper pattern for DTO/domain conversions
- Update ban service with new domain-driven approach
- Refactor router endpoints to use new architecture
- Add comprehensive mapper tests
- Update documentation with architecture changes
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TASK-004: Replace module-level mutable runtime flags in service layer with
injected state holder, eliminating hidden global state and improving testability
and synchronization boundaries.
Changes:
- Create JailServiceState dataclass in app/utils/runtime_state.py to hold
backend capability cache and synchronization lock
- Add JailServiceState as a field in RuntimeState (with default_factory)
- Remove module-level _backend_cmd_supported and _backend_cmd_lock from
jail_service.py
- Refactor _check_backend_cmd_supported() to accept state parameter
- Inject JailServiceState into list_jails() and _fetch_jail_summary() via
parameters
- Add get_jail_service_state() dependency provider in app/dependencies.py
- Add JailServiceStateDep type alias for router injection
- Update jails router to receive and pass state to service functions
- Update all tests to use jail_service_state fixture and pass state to functions
- Remove duplicate _MAX_PAGE_SIZE constant definition
- Document mutable state management in Backend-Development.md
- Update Architecture.md to describe JailServiceState and state nesting pattern
Benefits:
- Eliminates global mutable state and associated race conditions
- Makes state visible to callers (not hidden in module scope)
- Enables test isolation (each test gets fresh state)
- Prepares codebase for multi-worker deployments (state can be extracted to
shared backend)
- Synchronization boundaries are now explicit (state.get_backend_cmd_lock())
Compliance:
- All tests pass (17 passed in TestListJails, TestGetJail, TestLockInitialization)
- No ruff linting errors
- Type-safe: JailServiceState properly typed with asyncio.Lock, bool | None
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extracted the monolithic import_source() function (776 lines) into focused,
testable components with clear single responsibilities:
- BlocklistDownloader: HTTP download with exponential backoff retry logic
* Handles transient failures (429, 5xx errors, timeouts)
* Configurable retry attempts and backoff strategy
* 93% test coverage
- BlocklistParser: Parse and validate IP addresses
* Extract valid IPv4/IPv6 addresses from text
* Skip CIDRs and malformed entries gracefully
* Separate parsing from validation concerns
* 100% test coverage
- BanExecutor: Ban execution with error handling
* Ban IPs via fail2ban socket
* Stop on JailNotFoundError (jail doesn't exist)
* Continue on JailOperationError (individual ban failures)
* 100% test coverage
- BlocklistImportWorkflow: Thin orchestrator
* Coordinates the download → parse → ban → log flow
* Pre-warms geo cache with newly banned IPs
* 96% test coverage
- blocklist_service.py: Maintains public API
* Source CRUD (create, read, update, delete)
* URL validation and preview functionality
* Scheduling configuration and import triggers
* 92% test coverage
Benefits:
* Each component is independently testable with mock dependencies
* Error handling is explicit and localized
* Components can evolve independently
* Logging is contextual and clear
* Retry and transient error handling are isolated
Testing:
* All 36 existing blocklist_service tests pass
* All 13 blocklist import task tests pass
* Added 17 comprehensive component unit tests
* Combined 96%+ coverage on new modules
* Zero type errors in new code
Documentation:
* Updated Refactoring.md with detailed architecture notes
* Added component architecture diagram to Architekture.md
* Documented ownership and responsibilities of each component
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove hidden cross-service coupling by making dependencies explicit through
dependency injection while maintaining backward compatibility via lazy imports.
Key changes:
- history_service and ban_service: Removed direct module-level imports of
fail2ban_metadata_service, added optional service parameters to functions
- Added get_fail2ban_metadata_service() provider to dependencies.py
- Updated history router to inject Fail2BanMetadataService dependency
- history_service functions now use lazy imports in fallback paths for
backward compatibility when service is not explicitly injected
- All test patches updated to use internal _get_fail2ban_db_path() helper
- jail_config_service and jail_service already follow best practices
This pattern prevents circular imports, makes services testable via explicit
mocking, and documents service dependencies clearly.
Fixes: Instructions.md #2 - Hidden cross-service coupling
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make MaxMind GeoLite2-Country MMDB the primary IP resolver (local, encrypted)
and demote ip-api.com to optional fallback only (disabled by default).
Changes:
- Add geoip_allow_http_fallback config flag (default False) to Settings
- Refactor GeoCache.lookup() and lookup_batch() to try MMDB first
- Update startup.py to pass config flag and log security warning when HTTP enabled
- Update all 49 tests to reflect new MMDB-primary strategy
- Add comprehensive geoip configuration section to Backend-Development.md
- Update Architekture.md to show MMDB + optional HTTP in system dependencies
- Update .env.example with BANGUI_GEOIP_DB_PATH and HTTP fallback flag
Security impact:
- 99% of IP addresses (successful MMDB lookups) now stay local, encrypted
- HTTP-only IPs are cached for 5 minutes to minimize external calls
- Operators must explicitly enable HTTP fallback (security-conscious default)
- GDPR/CCPA compliance: no PII sent over unencrypted networks by default
Fixes TASK-030: Resolved plaintext IP transmission to ip-api.com
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Create logged_task() helper in backend/app/utils/async_utils.py to wrap
fire-and-forget coroutines with exception logging
- Ensures unhandled task exceptions are always logged to structlog instead of
silently discarded (Python 3.11+ RuntimeWarning)
- Update ban_service.py to use logged_task() for geo_cache.lookup_batch()
background resolution
- Add comprehensive tests for logged_task() in test_async_utils.py
- Document fire-and-forget task conventions in Backend-Development.md
The logged_task() wrapper catches any exception raised in a background task,
logs it with full traceback context and task name, and never re-raises.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove the early-return branch that skipped HMAC verification for unsigned tokens
- Raise ValueError if the signature separator is absent
- Update unwrap_session_token docstring to reflect mandatory signing requirement
- Add comprehensive session token signing documentation to Backend-Development.md
- Document the session token format, signing/verification pattern, and security rationale
All tokens must now carry a valid HMAC-SHA256 signature. Tokens without a
signature are rejected immediately. This removes the vulnerability where an
attacker with database access could bypass the HMAC layer by using raw tokens.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
**Issue:**
- log_target accepted arbitrary paths, allowing authenticated users to write
files as root via fail2ban (e.g., /etc/cron.d/bangui-pwned)
- fail2ban runs as root and opens files specified in log_target
**Solution:**
1. **Model layer validation:** Already existed in GlobalConfigUpdate, prevents
invalid paths before reaching service
2. **Service layer validation:** Added defensive check in update_global_config()
that validates log_target even if model validation is bypassed
3. **New validation helper:** Added validate_log_target() utility that accepts
special values (STDOUT, STDERR, SYSLOG) or paths within allowed directories
**Changes:**
- app/utils/path_utils.py: Added validate_log_target() helper
- app/services/config_service.py: Added service-layer validation before
sending command to fail2ban
- backend/tests: Fixed session_secret length issues in fixtures (min 32 chars)
- backend/tests: Added tests for valid special log targets
- Docs/Backend-Development.md: Documented log_target security requirements
**Test Coverage:**
- Model validation rejects /etc/passwd (existing test)
- Model validation accepts STDOUT, STDERR, SYSLOG special values
- Model validation accepts paths in allowed directories
- Service layer validation tested with special values
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace Path.write_text() with tempfile.NamedTemporaryFile + os.replace()
in _write_conf_file() and _create_conf_file()
- Ensures atomic operations on same filesystem (temp file in target.parent)
- Prevents config corruption if process is killed mid-write
- Follows existing pattern in jail_config_service.py
- Add proper cleanup of temp files on error with contextlib.suppress()
- Document atomic file write conventions in Backend-Development.md
This prevents fail2ban config files (especially jail.d/*.conf) from being
left in a truncated or corrupt state, which could disable active protection.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace sensitive token fragments in structured logs with:
- login(): Use session_id=session.id (database row ID) instead of token_prefix
- logout(): Use token_hash (SHA256 one-way hash, first 12 chars) instead of token_prefix
This prevents partial token material leakage into log aggregation systems while
maintaining useful session correlation via hashed tokens or database IDs.
Also updated Backend-Development.md to clarify logging conventions for
sensitive data handling.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Instead of returning a bound method (geo_cache.lookup_batch), now inject
the GeoCache instance directly into routers and services. This provides
proper runtime isolation since T-04 made GeoCache a proper object.
Changes:
- Remove get_geo_batch_lookup() dependency provider
- Add GeoCacheDep type alias for injecting GeoCache instances
- Update all routers (bans, blocklist, dashboard, jails) to use GeoCacheDep
- Update ban_service, blocklist_service, jail_service to accept GeoCache
- Update service protocols to match new signatures
- Update docstrings to reference GeoCache methods instead of module functions
All callers now call geo_cache.lookup_batch(...) directly instead of
geo_batch_lookup(...), providing real dependency injection with proper
testing isolation.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Consolidate the two divergent implementations of _since_unix from ban_service.py
and history_service.py into a single shared utility function in time_utils.py.
Changes:
- Move _since_unix to app/utils/time_utils.py with consistent time.time() approach
- Move TIME_RANGE_SLACK_SECONDS constant to app/utils/constants.py
- Update ban_service.py to import since_unix from time_utils
- Update history_service.py to import since_unix from time_utils
- Both services now use the same window boundary calculation with 60-second slack
- Add comprehensive tests for the shared since_unix function
- Document timestamp handling rationale in Backend-Development.md
This ensures dashboard and history queries return consistent row counts for the
same time range by using the same timestamp calculation and slack window across
all services.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Refactor action_config_service, filter_config_service, jail_config_service, and jail_service
- Add jail_socket utility module for socket communication
- Update test_jail_service with new test cases
- Update architecture and task documentation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Create GeoCache class with all mutable state as instance attributes:
- _cache, _neg_cache, _dirty, _geoip_reader, _geoip_initialized, _cache_lock
- All public methods: lookup(), lookup_batch(), lookup_cached_only(), flush_dirty(), load_from_db(), clear(), etc.
Initialization & Dependency Injection:
- Instantiate GeoCache in startup.py and store on app.state.geo_cache
- Add get_geo_cache() dependency function in dependencies.py
- Inject into routes and tasks via FastAPI's dependency system
Backward Compatibility:
- Maintain module-level functions in geo_service.py as deprecated wrappers
- All old callers continue to work through _default_geo_cache instance
- Remove test-escape-hatch functions (clear_cache, clear_neg_cache moved to methods)
Background Tasks:
- Update geo_cache_flush.py and geo_re_resolve.py to receive GeoCache instance
- Tasks now operate on injected instance rather than module globals
Tests:
- Refactor test_geo_service.py with geo_cache fixture providing fresh instances
- Update patch paths to target GeoCache methods correctly
- Fix internal state assertions to access instance attributes
Documentation:
- Update Architekture.md to document GeoCache as managed stateful service
- Describe cache lifecycle (load on startup, flush periodically, re-resolve stale)
- Note process-local limitations for multi-worker deployments
Fixes violation of Single Responsibility Principle: module no longer owns both
lookup logic and cache lifecycle management. Cache is now a first-class
injectable service with transparent lifecycle.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Reorganized dashboard router with improved structure
- Enhanced ban_service with better separation of concerns
- Updated history service with cleaner logic
- Improved constants and configuration handling
- Updated documentation of completed tasks
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Consolidate duplicate _ok(), _to_dict(), ensure_list(), and is_not_found_error()
functions from 6 service modules into a single canonical implementation at
backend/app/utils/fail2ban_response.py.
Changes:
- Create fail2ban_response.py with canonical implementations
- Remove local duplicates from: ban_service, jail_service, config_service,
health_service, server_service, config_file_utils
- Update all imports to use shared module
- Add comprehensive docstrings and examples
- Update Architecture.md and Backend-Development.md documentation
Benefits:
- Single source of truth for response parsing logic
- Eliminates code duplication across service layer
- Improves maintainability and consistency
- Enables centralized bug fixes and improvements
Tests: All 228 service tests passing, no regressions
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>