- Extract jail status/processing to helper functions
- Add error_handling.py service for centralized error handling
- Update config.py with validation and defaults
- Update .env.example with all config options
- Remove obsolete Tasks.md, add Service-Development.md
- Minor fixes across routers and services
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace contextlib.suppress with try/except + warning log
- Add test for fail2ban client
- Remove stale Issue #21 from Tasks.md (indexes)
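
A minimal sketch of the contextlib.suppress replacement described above. The helper name close_quietly and the OSError exception type are assumptions for illustration; the actual call sites are not listed in this message.

```python
import logging

logger = logging.getLogger(__name__)


def close_quietly(resource) -> bool:
    """Close a resource, logging a warning instead of silently swallowing errors.

    Previously (assumed shape):
        with contextlib.suppress(OSError):
            resource.close()
    """
    try:
        resource.close()
    except OSError as exc:
        # The failure is still non-fatal, but it is now visible in the logs.
        logger.warning("failed to close resource: %s", exc)
        return False
    return True
```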
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- New get_fail2ban_db_path() in setup_service resolves DB path from configured socket path
- New ensure_fail2ban_indexes() creates missing performance indexes on bans table
- Call ensure_fail2ban_indexes on every startup before first ban query
- Remove completed tasks from Docs/Tasks.md
- Update Docs/PERFORMANCE.md with index findings
- Move config loading to dedicated ConfigLoader class with validation
- Add DATABASE_MIGRATIONS.md content to TROUBLESHOOTING.md
- Add API_STATUS_CODES.md documenting all API response codes
- Update runner.csx to use new config structure
- Add check_responses.py validation script
- Update config tests for new structure
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add explicit HTTP status code documentation to every endpoint
across 15 router files. Each endpoint now declares all possible
response codes (200/201/204/400/401/404/409/429/502/503) with
descriptions so the frontend can distinguish error types.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Establishes shared type conventions to prevent runtime type mismatches
between TS frontend and Python backend. Covers snake_case JSON field
names, null vs empty string handling, timestamp formats, and validation
patterns for country codes, bans, and jail configuration.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Detect catastrophic backtracking patterns before regex compilation
using regexploit library. Add ReDoSDetectedError exception and
_MINIMUM_STARRINESS threshold (>=3) to catch dangerous patterns
like (a+)+b. Update pyproject.toml deps, add tests for detection.
Replace check-then-insert race condition with INSERT ON CONFLICT.
- upsert_pending uses RETURNING id for atomic upsert
- UNIQUE(source_id, content_hash) constraint from migration 6
- blocklist_import_workflow updated to use upsert_pending
- test_import_source_success fixed for async mock patterns
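
A rough sketch of the atomic upsert, using the standard sqlite3 module. Only the UNIQUE(source_id, content_hash) constraint and the RETURNING id clause come from this change; the remaining columns, the synchronous connection, and the function signature are assumptions. RETURNING needs SQLite 3.35 or newer.

```python
import sqlite3

# Assumed schema: only the UNIQUE constraint is taken from migration 6.
SCHEMA = """
CREATE TABLE import_runs (
    id INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL,
    content_hash TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    UNIQUE (source_id, content_hash)
)
"""


def upsert_pending(conn: sqlite3.Connection, source_id: int, content_hash: str) -> int:
    """Insert a pending run or return the existing row's id in one statement."""
    cur = conn.execute(
        """
        INSERT INTO import_runs (source_id, content_hash)
        VALUES (?, ?)
        ON CONFLICT (source_id, content_hash)
        DO UPDATE SET content_hash = excluded.content_hash
        RETURNING id
        """,
        (source_id, content_hash),
    )
    # A no-op DO UPDATE is used because DO NOTHING returns no row on conflict.
    rows = cur.fetchall()  # exhaust the cursor before committing
    conn.commit()
    return rows[0][0]
```

Because the lookup and the insert happen in a single statement, there is no window between "check" and "insert" for a concurrent request to slip into.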
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Enhance response model with additional fields and validation
- Update health and server router implementations
- Improve frontend type definitions and API integration
- Clean up documentation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- All backend routers moved to /api/v1/ prefix
- Frontend BASE_URL updated to /api/v1
- Setup redirect middleware updated to redirect to /api/v1/setup
- Health router path fixed: prefix=/api/v1/health, @router.get('')
- conftest.py: set server_status=online for test fixture
- Created Docs/API_VERSIONING.md with deprecation policy
- Updated Docs/Backend-Development.md with versioning section
- Updated Instructions.md curl examples
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add optional requestKey parameter to UseFetchDataOptions
- Implement module-level cache (inFlightRequests) to track in-flight requests
- When requestKey is provided, multiple hook instances with the same key share in-flight requests
- Prevents duplicate API calls when multiple components fetch the same data or when refreshes are triggered in rapid succession
- Cache entries are automatically cleared when response arrives (success or error)
- Maintains backward compatibility: without requestKey, behaves as before
- Adds comprehensive tests for deduplication scenarios
This reduces bandwidth waste and prevents race conditions caused by concurrent requests for identical data.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add custom comparison function to React.memo for TopCountriesPieChart
- Add custom comparison function to React.memo for TopCountriesBarChart
- Use JSON.stringify for deep equality comparison of countries and countryNames
- Prevents unnecessary re-renders when parent updates with same data
- Avoids Recharts reprocessing 5000+ data points on each parent re-render
All tests passing. No linting issues.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit adds support for shipping logs to external centralized logging platforms, addressing the MEDIUM priority task for structured logging infrastructure.
## Key Changes:
### 1. New Documentation: Docs/Observability.md
- Comprehensive guide to logging architecture and configuration
- Covers all three supported platforms (Datadog, Papertrail, Elasticsearch)
- Includes best practices, security considerations, and troubleshooting
- Documents sensitive data handling and compliance requirements
### 2. Core Implementation: app/utils/external_logging.py
- ExternalLogHandler: Abstract base class for non-blocking log delivery
- DatadogLogHandler: HTTP API integration with JSON payloads
- PapertrailLogHandler: Syslog protocol over TCP
- ElasticsearchLogHandler: Bulk API integration with NDJSON format
- Features:
  - Async buffering with configurable batch size and flush interval
  - Exponential backoff retry logic
  - Non-blocking delivery (never blocks application logic)
  - Proper error handling and internal logging
  - Lifecycle management (start/shutdown)
### 3. Configuration: app/config.py
- New Settings fields for external logging:
  - external_logging_enabled (default: False)
  - external_logging_provider (datadog/papertrail/elasticsearch)
  - external_logging_buffer_size (default: 1000)
  - external_logging_flush_interval_seconds (default: 5.0)
- Provider-specific configuration (API keys, hosts, batch sizes)
- All fields have sensible defaults
- Full field validation and normalization
### 4. Integration: app/main.py
- Global _external_log_handler for application lifecycle
- _external_logging_processor: structlog processor for handler integration
- Updated _configure_logging(): Add handler to processor chain when enabled
- Updated _lifespan(): Initialize handler before startup, shutdown on termination
### 5. Tests: backend/tests/test_external_logging.py
- 20 comprehensive tests covering all handlers and factory
- Configuration validation tests
- All tests passing
## Design Decisions:
1. **Non-blocking Delivery**: External logging never blocks request handling.
Failures are logged locally but don't impact application.
2. **Buffering Strategy**: In-memory buffer with configurable size prevents
unbounded memory growth. When buffer fills, oldest logs are dropped with
a warning.
3. **Retry Logic**: Transient failures (timeouts, 5xx errors) are retried
with exponential backoff. Permanent failures (bad credentials) are logged
and skipped.
4. **Disabled by Default**: External logging is opt-in via environment
variables, maintaining backward compatibility with existing deployments.
5. **Provider Flexibility**: Support for multiple platforms allows users to
choose based on their infrastructure (cloud-native, on-premise, etc).
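
The buffering and retry decisions above can be sketched roughly as follows. BoundedLogBuffer and backoff_delays are hypothetical names, and the base delay, factor, and cap are illustrative values rather than the ones used by the handlers.

```python
import collections


class BoundedLogBuffer:
    """In-memory buffer that drops the oldest entry once it is full."""

    def __init__(self, max_size: int = 1000):
        self._entries = collections.deque(maxlen=max_size)
        self.dropped = 0  # count of entries lost to overflow

    def append(self, entry) -> None:
        if len(self._entries) == self._entries.maxlen:
            # deque(maxlen=...) discards the oldest item on append.
            self.dropped += 1
        self._entries.append(entry)

    def drain(self, batch_size: int) -> list:
        """Remove and return up to batch_size entries, oldest first."""
        batch = []
        while self._entries and len(batch) < batch_size:
            batch.append(self._entries.popleft())
        return batch


def backoff_delays(base: float = 0.5, factor: float = 2.0,
                   max_retries: int = 4, cap: float = 30.0) -> list:
    """Delay in seconds before each retry, growing exponentially up to cap."""
    return [min(cap, base * factor ** attempt) for attempt in range(max_retries)]
```

Appending never blocks or raises, which is what keeps delivery off the request path; the dropped counter gives the handler something to report in its overflow warning.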
## Backward Compatibility:
- All new configuration fields have defaults
- External logging disabled by default
- No changes to existing logging behavior unless explicitly configured
- No new required dependencies
## Testing:
- All 20 new tests passing
- Existing tests unaffected (same count of passing tests)
- Configuration validation tested
- Handler creation and lifecycle management tested
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds support for gradual session secret rotation without forcing logout:
- Add BANGUI_SESSION_SECRET_PREVIOUS config field for rotation window
- Implement unwrap_session_token_with_rotation() to accept tokens signed with
either current or previous secret
- Update validate_session() to transparently accept old tokens during rotation
- Update logout() to accept tokens from both secrets
- Add comprehensive logging for rotation events and metrics
- Add 8 new tests covering all rotation scenarios
- Update documentation with step-by-step rotation strategy
- Update .env.example with previous secret field
Key features:
- No forced logout: old tokens continue working during rotation window
- Transparent validation: tokens signed with the previous secret are accepted automatically, and each use is logged for monitoring
- Production-safe: can rotate secrets without service interruption
- Metrics-ready: logs track token rotation for observability
Rotation workflow:
1. Generate new secret and set BANGUI_SESSION_SECRET
2. Set BANGUI_SESSION_SECRET_PREVIOUS to old secret
3. Wait for old tokens to expire (≥ session_duration_minutes)
4. Unset BANGUI_SESSION_SECRET_PREVIOUS to complete rotation
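
A minimal sketch of the dual-secret check behind this workflow. The token format (payload plus HMAC-SHA256 hex signature) and the function names sign_token and unwrap_with_rotation are assumptions for illustration; the real unwrap_session_token_with_rotation() may differ.

```python
import hashlib
import hmac


def _signature(token: str, secret: str) -> str:
    return hmac.new(secret.encode(), token.encode(), hashlib.sha256).hexdigest()


def sign_token(token: str, secret: str) -> str:
    """Assumed wire format: '<token>.<hex signature>'."""
    return f"{token}.{_signature(token, secret)}"


def unwrap_with_rotation(signed: str, current_secret: str, previous_secret=None):
    """Return (token, used_previous_secret), or None if no secret verifies."""
    token, sep, sig = signed.rpartition(".")
    if not sep:
        return None
    candidates = ((current_secret, False), (previous_secret, True))
    for secret, is_previous in candidates:
        if not secret:
            continue  # previous secret is unset outside a rotation window
        expected = _signature(token, secret)
        # Constant-time comparison on bytes avoids timing side channels.
        if hmac.compare_digest(sig.encode(), expected.encode()):
            return token, is_previous
    return None
```

The second element of the result lets the caller log when a token was validated with the previous secret, which is how the rotation window can be monitored.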
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Implement cursor-based pagination in pagination.py
- Update response models to standardize pagination structure
- Add cursor pagination utilities for repositories
- Update HistoryArchiveRepository and ImportLogRepository with new pagination
- Add comprehensive tests for cursor pagination
- Update documentation for backend development and task tracking
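
For illustration, cursor-based pagination can be sketched like this. The cursor encoding (base64 of a JSON object holding the last seen id) and the helper names are assumptions, not necessarily what pagination.py does.

```python
import base64
import json


def encode_cursor(last_id: int) -> str:
    raw = json.dumps({"id": last_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["id"]


def paginate(rows: list, limit: int, cursor=None):
    """Return (page, next_cursor) for rows sorted by ascending 'id'."""
    after_id = decode_cursor(cursor) if cursor else None
    # In a repository this filter would be a WHERE id > ? clause.
    remaining = [r for r in rows if after_id is None or r["id"] > after_id]
    page = remaining[:limit]
    has_more = len(remaining) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more and page else None
    return page, next_cursor
```

Unlike page/offset pagination, the cursor pins the position to a row id, so rows inserted or deleted between requests do not shift later pages.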
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add AbortController refs and abort signal checks to prevent race conditions
and memory leaks when components unmount or new requests are initiated.
Components fixed:
- JailsTab.tsx: validation handler with AbortController pattern
- JailInfoSection.tsx: handle function with useCallback wrapper
- RawConfigSection.tsx: fetch handler with abort checks
- ConfFilesTab.tsx: file fetch handler with abort signal verification
- IgnoreListSection.tsx: three handlers (add, remove, toggle) with callbacks
All handlers now:
1. Abort previous requests before initiating new ones
2. Create and store new AbortController instances
3. Check abort status before state updates in .then()/.catch()
4. Include cleanup effects that abort on unmount
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The API pagination infrastructure was already correctly implemented with:
- PaginatedListResponse base model containing 'items' and 'pagination' fields
- PaginationMetadata object with all required fields (page, page_size, total, total_pages, has_next_page, has_prev_page)
- All services correctly calling create_pagination_metadata()
However, there were two bugs preventing tests from passing:
1. IMPORT BUG: time_utils.py was importing TIME_RANGE_SECONDS from app.models.ban
when it's actually defined in app.models._common. This caused import errors
in tests that exercise time-range filtering.
2. TEST BUG: Test assertions were using outdated API structure, accessing
.total, .page, .page_size directly on paginated responses instead of
through the .pagination object.
Fixed locations:
- test_mappers/test_ban_mappers.py: 3 assertions updated to use .pagination.*
- test_services/test_blocklist_service.py: 6 assertions updated
- test_services/test_history_service.py: 14 assertions updated
All paginated API endpoints now correctly return pagination metadata:
- GET /api/history
- GET /api/history/archive
- GET /api/dashboard/bans
- GET /api/jails/{name}/banned
- GET /api/blocklists/log
Verified with 24 passing pagination tests demonstrating correct behavior.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Create PaginationMetadata model with computed derived fields (total_pages, has_next_page, has_prev_page)
- Update PaginatedListResponse to embed pagination metadata in a separate 'pagination' object
- Add create_pagination_metadata() factory function in utils/pagination.py for consistent computation
- Update all paginated service functions to use new structure:
  - history_service.list_history()
  - blocklist_service.get_import_logs()
  - jail_service.get_jail_banned_ips()
  - ban_mappers.map_domain_dashboard_ban_list_to_response()
- Update response model docstrings with new structure examples
- Update Backend-Development.md documentation with new pagination patterns
- Update test fixtures to work with new response structure
Response shape changes from:
{"items": [...], "total": 100, "page": 1, "page_size": 50}
To:
{"items": [...], "pagination": {"page": 1, "page_size": 50, "total": 100, "total_pages": 2, "has_next_page": true, "has_prev_page": false}}
Benefits:
- Frontend receives all pagination state needed for UI controls
- No need for frontend to calculate total_pages or page navigation logic
- Consolidated pagination metadata reduces field sprawl
- OpenAPI schema automatically reflects changes
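
The derived fields can be computed roughly as below. This is a sketch using a plain dataclass instead of the Pydantic model, and the factory signature is an assumption.

```python
import math
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PaginationMetadata:
    page: int
    page_size: int
    total: int
    total_pages: int
    has_next_page: bool
    has_prev_page: bool


def create_pagination_metadata(page: int, page_size: int, total: int) -> PaginationMetadata:
    """Derive total_pages and navigation flags from the three base values."""
    total_pages = math.ceil(total / page_size) if page_size > 0 else 0
    return PaginationMetadata(
        page=page,
        page_size=page_size,
        total=total,
        total_pages=total_pages,
        has_next_page=page < total_pages,
        has_prev_page=page > 1,
    )
```

With page=1, page_size=50, total=100 this yields the "pagination" object shown in the response shape above.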
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit addresses race conditions in multi-step database operations:
1. Wrap write operations in BEGIN IMMEDIATE ... COMMIT transactions:
   - import_run_repo: create_pending, mark_completed, mark_failed
   - geo_cache_repo: all upsert_*_and_commit functions
   - geo_cache_repo: bulk_upsert_entries_and_neg_entries_and_commit
2. Handle concurrent write collisions gracefully:
   - import_run_repo.create_pending can now raise IntegrityError
   - blocklist_import_workflow catches IntegrityError and retries lookup
   - Logs 'blocklist_import_lost_race' event when another request wins the race
3. Add comprehensive documentation:
   - Backend-Development.md § 6.3 Database Transactions
   - Explains when to use BEGIN IMMEDIATE
   - Shows transaction pattern with try-except-rollback
   - Documents race condition error handling pattern
The solution leverages SQLite's UNIQUE constraint for data integrity while
handling the concurrent case gracefully in application logic. This is less
restrictive than BEGIN EXCLUSIVE, which outside WAL mode would also block
readers for the duration of the transaction.
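
A sketch of the try-except-rollback pattern with BEGIN IMMEDIATE, written against the synchronous sqlite3 module for brevity. The helper names are hypothetical and the repositories may use a different driver.

```python
import sqlite3


def open_connection(path: str) -> sqlite3.Connection:
    # isolation_level=None disables the module's implicit BEGIN so that
    # transactions are controlled explicitly.
    return sqlite3.connect(path, isolation_level=None)


def run_in_immediate_transaction(conn: sqlite3.Connection, statements) -> None:
    """Run (sql, params) pairs atomically, taking the write lock up front."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        for sql, params in statements:
            conn.execute(sql, params)
        conn.execute("COMMIT")
    except Exception:
        # Undo every statement in the batch, then let the caller decide
        # how to react (for example, retrying the lookup on IntegrityError).
        conn.execute("ROLLBACK")
        raise
```

Taking the write lock at BEGIN rather than at the first write means a competing writer fails or waits immediately instead of midway through a multi-step operation.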
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The health check endpoint now properly indicates service unavailability:
- Returns HTTP 200 when fail2ban is online
- Returns HTTP 503 when fail2ban is offline
This allows Docker and other orchestration tools to correctly detect when
fail2ban is unreachable and automatically restart the backend container,
preventing the situation where Docker treats the container as healthy
despite fail2ban being down.
Changes:
- Update GET /api/health to return 503 on fail2ban offline
- Return appropriate JSON response bodies for each state
- Update tests to verify both online (200) and offline (503) scenarios
- Update Dockerfile HEALTHCHECK documentation
- Add Health Checks section to Deployment.md documentation
All tests pass with 100% coverage on health.py.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CRITICAL FIX: Background tasks (especially blocklist_import) crashed mid-execution,
leaving partial state. On retry, the same bans were applied again, causing duplicates.
Solution: Content-hash based operation tracking for blocklist imports:
- Added import_runs table (migration 6) to track operations by source + content hash
- Before banning, check if this exact content has already been imported
  - If completed: skip banning (already done), optionally re-warm cache
  - If new or failed: proceed with ban and mark as completed or failed
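
The check above can be sketched as follows. A dict stands in for the import_runs table, SHA-256 is an assumed hash function, and import_source here is a simplified stand-in for the workflow function.

```python
import hashlib


def content_hash(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()


def import_source(source_id: int, payload: bytes, runs: dict, apply_bans) -> str:
    """Apply bans once per (source, content); retries of completed runs are skipped.

    runs maps (source_id, content_hash) to 'completed' or 'failed'.
    """
    key = (source_id, content_hash(payload))
    if runs.get(key) == "completed":
        return "skipped"  # identical content was already imported
    try:
        apply_bans(payload)
    except Exception:
        runs[key] = "failed"  # a later retry will attempt the import again
        raise
    runs[key] = "completed"
    return "imported"
```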
Changes:
- Database: Migration 6 adds import_runs table with operation state tracking
- Model: Added ImportRunEntry for import run records
- Repository: New import_run_repo module with CRUD operations
- Workflow: Updated blocklist_import_workflow to check operation history before banning
- Dependencies: Registered import_run_repo for dependency injection
- Tests: Added test_import_source_idempotent_on_retry and test_import_source_different_content_not_reused
- Documentation: Added Task Idempotency section to Backend-Development.md
Verification:
- All 7 import tests pass (5 existing + 2 new idempotency tests)
- Type checking: mypy --strict ✅
- Linting: ruff ✅
- No API changes, backwards compatible via automatic migration
Fixes: Background tasks not idempotent #CRITICAL
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add SecurityHeadersMiddleware to backend/app/main.py
  - Implements Content-Security-Policy: default-src 'self'
  - Implements X-Frame-Options: DENY (clickjacking protection)
  - Implements X-Content-Type-Options: nosniff (MIME-sniffing protection)
  - Implements X-XSS-Protection: 1; mode=block (browser XSS filters)
- Add CSP meta tag to frontend/index.html for defense-in-depth
- Create Docs/Security.md with comprehensive security headers documentation
- Add test suite (backend/tests/test_security_headers_middleware.py) with 5 tests
  - Tests verify headers are present on success and error responses
  - Tests ensure all four security headers are correctly set
- All existing tests continue to pass
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add global rate limiter utility with configurable limits and cleanup
- Move rate limiting logic to middleware for consistent application
- Update auth routes to use new rate limiter
- Add comprehensive tests for rate limiter functionality
- Update documentation with backend development guidelines and tasks
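
One way such a limiter could look, as a sketch only. The sliding-window algorithm, the class name, and the injectable clock are assumptions; this message does not state which algorithm the utility uses.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most max_requests per key within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        hits = self._hits[key]
        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True

    def cleanup(self) -> None:
        """Drop keys with no recent activity so memory stays bounded."""
        now = self._clock()
        stale = [
            key for key, hits in self._hits.items()
            if not hits or now - hits[-1] >= self.window_seconds
        ]
        for key in stale:
            del self._hits[key]
```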
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- fail2ban: 0.5 CPU / 128M memory limit, 0.1 CPU / 64M reserved
- backend: 2.0 CPU / 512M memory limit, 1.0 CPU / 256M reserved
- frontend: 0.5 CPU / 128M memory limit, 0.25 CPU / 64M reserved
Prevents 'noisy neighbor' scenarios where one container exhausts
host CPU or memory. Limits are hard caps; reservations set aside a
minimum allocation so each service stays responsive even under load.
Fixes resource contention issue in production and staging environments.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- LoginPage now shows a loading spinner while validating the session
- Redirect to dashboard automatically once validation completes and session is valid
- Expose isValidating state through AuthProvider for components to track validation status
- Update useAuth hook to return isValidating along with isAuthenticated
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Document optional local Git pre-commit hook configuration to catch
type drift before commits. Also document Husky alternative.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add automated type synchronization from backend OpenAPI schema to frontend TypeScript types to prevent type drift and ensure runtime safety.
Changes:
- Add openapi-typescript as dev dependency
- Create npm scripts for type generation (generate:types) and validation (validate:types)
- Integrate type generation into build pipeline (runs before TypeScript compilation)
- Generate frontend/src/types/generated.ts from backend OpenAPI schema
- Add frontend/scripts/validate-types.sh for CI/CD validation
- Update Web-Development.md with type generation workflow documentation
- Update Backend-Development.md with OpenAPI schema sync requirements
Workflow:
1. Backend automatically exposes OpenAPI schema at /api/openapi.json (FastAPI built-in)
2. Frontend build runs 'npm run generate:types' to generate types from schema
3. Generated types are committed to version control
4. CI can run 'npm run validate:types' to fail builds if types drift
Fixes critical type safety issue where frontend types were manually maintained
and could become out of sync with backend Pydantic models.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses: Backend session cache not cluster-safe (multi-worker issue)
Problem:
- Session cache is process-local (InMemorySessionCache)
- Multi-worker deployments (uvicorn --workers N) create separate processes
- Each process has its own independent session cache
- Sessions cached in Worker A are invisible to Workers B, C, D
- Users randomly logged out when requests land on different workers
- Also affects RuntimeState, rate limiter, and background jobs
Solution (Option A - Strict single-worker enforcement):
- Enhance startup validation with clearer error messages
- Update error messages to explain the problem and how to fix it
- Document single-worker requirement prominently in Docker configs
- Update module docstrings to clarify constraints
Changes:
1. app/startup.py:
   - Enhanced _check_single_worker_mode() error message with troubleshooting
   - Enhanced _stage_check_worker_mode_and_acquire_lock() error message
   - Removed unused import
2. app/utils/session_cache.py:
   - Updated module docstring to explain constraints more clearly
   - Added references to deployment documentation
   - Clarified multi-worker solution for future implementation
3. app/utils/runtime_state.py:
   - Updated module docstring with deployment constraint references
   - Aligned messaging with session_cache.py
4. Docker/Dockerfile.backend:
   - Added comprehensive comments about single-worker requirement
   - Explained impact in multi-worker deployments
   - Referenced deployment constraints documentation
5. Docker/docker-compose.yml, compose.prod.yml, compose.debug.yml:
   - Added documentation comments about BANGUI_WORKERS constraint
   - Explained why single-worker is required
6. backend/tests/test_startup_integration.py:
   - Fixed test unpacking to match function return signature (3 values, not 2)
This ensures multi-worker deployments fail loudly at startup with clear
guidance on what went wrong and how to fix it. The database-backed scheduler
lock provides defense-in-depth for container orchestration scenarios.
For future multi-worker support, implement:
- Redis or database-backed session cache
- Shared RuntimeState coordination
- Distributed APScheduler backend
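
A simplified sketch of a fail-loud startup check. The function name require_single_worker, the default of "1", and the error wording are assumptions; only the BANGUI_WORKERS variable and the single-worker rule come from this change.

```python
def require_single_worker(env: dict) -> int:
    """Raise at startup unless exactly one worker process is configured."""
    raw = env.get("BANGUI_WORKERS", "1")  # assumed default when unset
    try:
        workers = int(raw)
    except ValueError:
        raise RuntimeError(f"BANGUI_WORKERS must be an integer, got {raw!r}") from None
    if workers != 1:
        raise RuntimeError(
            f"BANGUI_WORKERS={workers} is not supported: the session cache, "
            "runtime state, and rate limiter are process-local, so sessions "
            "created in one worker are invisible to the others. "
            "Set BANGUI_WORKERS=1."
        )
    return workers
```

In the application this would be called with os.environ before any worker-local state is created.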
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Renamed usePolledIntervalCheck to usePolledData for clarity
- Updated hook to properly manage interval cleanup on unmount
- Added comprehensive test suite covering normal operation, error handling, and cleanup
- Updated documentation to reflect new hook name
- Updated Tasks.md to track progress
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add BroadcastChannel API for real-time logout synchronization across tabs
- Implement storage event listener as fallback for older browsers
- When a user logs out in one tab, all other tabs immediately reflect the logout state
- Update tests to verify storage event and BroadcastChannel behavior
- Update Architecture.md to document cross-tab synchronization
- Update Web-Development.md with authentication state management notes
The provider now broadcasts logout messages to other tabs so they immediately
reflect the logout state without requiring a page refresh or additional API calls.
The implementation uses BroadcastChannel as the primary sync mechanism with
storage events as a fallback for older browsers.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>