Commit Graph

10 Commits

Author SHA1 Message Date
d13efd4e59 feat: graceful shutdown and WAL cleanup
Some checks failed
CI / Backend Tests (pull_request) Has been cancelled
CI / Lint (pull_request) Has been cancelled
CI / Type Check (pull_request) Has been cancelled
CI / Import Boundary (pull_request) Has been cancelled
CI / OpenAPI Breaking Changes (pull_request) Has been cancelled
CI / OpenAPI Baseline Commit (pull_request) Has been cancelled
- Add stop_grace_period to backend container for graceful shutdown
- Document WAL mode rationale and orphaned file cleanup in db.py
- Handle database close errors gracefully in lifespan
- Clean up orphaned WAL files during startup before opening DB
- Reorder imports and fix formatting in startup.py
2026-05-24 22:04:58 +02:00
aebe0d0236 chore(release): bump version to 0.9.19-rc.4
- Add production Docker Compose configuration

- Add check_auth.py diagnostic script for session 401 debugging
2026-05-23 21:27:52 +02:00
79df1aa493 backup 2026-05-10 08:48:42 +02:00
b631c1c546 feat(backend): implement graceful shutdown for container stop
Graceful shutdown ensures in-flight operations complete before process exits:
- Lifespan shutdown handler drains pending tasks with 25s timeout
- Scheduler stops accepting new jobs immediately
- HTTP session, external logging, scheduler lock, DB conn closed cleanly
- 25s Python timeout leaves 5s margin before Docker's 30s SIGKILL

Files changed:
- backend/app/main.py: enhanced _lifespan shutdown with task drain
- Docker/Dockerfile.backend: documented signal handling in header
- Docker/docker-compose.yml: added stop_grace_period: 30s
- Docker/compose.prod.yml: added stop_grace_period: 30s
- Docs/Deployment.md: new Graceful Shutdown section with sequence table
- Docs/TROUBLESHOOTING.md: new Graceful Shutdown Issues section

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-02 22:47:10 +02:00
0d5882b32f Fix HIGH priority issues: unbounded queries, rate limiting, health checks
Issue #3 - Unbounded Query Results (OOM):
- get_all_archived_history() now uses keyset pagination with bounded max_rows (50k default)
- Added 'id' field to records from get_archived_history() and get_archived_history_keyset()
- Protocol signature updated with page_size, max_rows, last_ban_id params

Issue #7 - Docker Health Check Fails:
- Added curl to Dockerfile.backend runtime image
- HEALTHCHECK now uses 'curl -f http://localhost:8000/api/health'
- compose.prod.yml: increased start_period to 40s, timeout to 10s
- Frontend healthcheck proxies to backend /api/health

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 21:47:36 +02:00
c4ede71fa6 Fix: Enforce single-worker deployment for session cache cluster safety
Addresses: Backend session cache not cluster-safe (multi-worker issue)

Problem:
- Session cache is process-local (InMemorySessionCache)
- Multi-worker deployments (uvicorn --workers N) create separate processes
- Each process has its own independent session cache
- Sessions cached in Worker A are invisible to Workers B, C, D
- Users randomly logged out when requests land on different workers
- Also affects RuntimeState, rate limiter, and background jobs

Solution (Option A - Strict single-worker enforcement):
- Enhance startup validation with clearer error messages
- Update error messages to explain the problem and how to fix it
- Document single-worker requirement prominently in Docker configs
- Update module docstrings to clarify constraints

Changes:
1. app/startup.py:
   - Enhanced _check_single_worker_mode() error message with troubleshooting
   - Enhanced _stage_check_worker_mode_and_acquire_lock() error message
   - Removed unused import

2. app/utils/session_cache.py:
   - Updated module docstring to explain constraints more clearly
   - Added references to deployment documentation
   - Clarified multi-worker solution for future implementation

3. app/utils/runtime_state.py:
   - Updated module docstring with deployment constraint references
   - Aligned messaging with session_cache.py

4. Docker/Dockerfile.backend:
   - Added comprehensive comments about single-worker requirement
   - Explained impact in multi-worker deployments
   - Referenced deployment constraints documentation

5. Docker/docker-compose.yml, compose.prod.yml, compose.debug.yml:
   - Added documentation comments about BANGUI_WORKERS constraint
   - Explained why single-worker is required

6. backend/tests/test_startup_integration.py:
   - Fixed test unpacking to match function return signature (3 values, not 2)

This ensures multi-worker deployments fail loudly at startup with clear
guidance on what went wrong and how to fix it. The database-backed scheduler
lock provides defense-in-depth for container orchestration scenarios.

For future multi-worker support, implement:
- Redis or database-backed session cache
- Shared RuntimeState coordination
- Distributed APScheduler backend

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 20:54:24 +02:00
825a67f13a Add multi-worker detection for APScheduler safety
- Add _check_single_worker_mode() to startup.py that detects and rejects
  multi-worker configurations, raising a clear RuntimeError with instructions
- Set BANGUI_WORKERS=1 as default in Dockerfile.backend
- Document single-worker requirement in compose.prod.yml
- Add 'Deployment Constraints' section to Architekture.md explaining why
  single-worker mode is required and detailing future multi-worker support
- Add '9.1 Background Tasks and Scheduler Architecture' section to
  Backend-Development.md documenting task structure and single-worker requirement
- Add comprehensive test suite (test_startup.py) covering all scenarios:
  allows single worker, rejects multi-worker, validates config format,
  and verifies informative error messages

This fix addresses TASK-002 which identified that in-process APScheduler is
unsafe in multi-worker deployments due to each worker creating independent
scheduler instances, causing duplicate background job execution.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 11:39:51 +02:00
f62785aaf2 Fix fail2ban runtime errors: jail not found, action locks, log noise
This commit implements fixes for three independent bugs in the fail2ban configuration and integration layer:

1. Task 1: Detect UnknownJailException and prevent silent failures
   - Added JailNotFoundError detection in jail_service.reload_all()
   - Enhanced error handling in config_file_service to catch JailNotFoundError
   - Added specific error message with logpath validation hints
   - Added rollback test for this scenario

2. Task 2: Fix iptables-allports exit code 4 (xtables lock contention)
   - Added global banaction setting in jail.conf with -w 5 lockingopt
   - Removed redundant per-jail banaction overrides from bangui-sim and blocklist-import
   - Added production compose documentation note

3. Task 3: Suppress log noise from unsupported backend/idle commands
   - Implemented capability detection to cache command support status
   - Double-check locking to minimize lock contention
   - Avoids sending unsupported get <jail> backend/idle commands
   - Returns default values without socket calls when unsupported

All changes include comprehensive tests and maintain backward compatibility.
2026-03-15 10:57:00 +01:00
ea35695221 Add better jail configuration: file CRUD, enable/disable, log paths
Task 4 (Better Jail Configuration) implementation:
- Add fail2ban_config_dir setting to app/config.py
- New file_config_service: list/view/edit/create jail.d, filter.d, action.d files
  with path-traversal prevention and 512 KB content size limit
- New file_config router: GET/PUT/POST endpoints for jail files, filter files,
  and action files; PUT .../enabled for toggle on/off
- Extend config_service with delete_log_path() and add_log_path()
- Add DELETE /api/config/jails/{name}/logpath and POST /api/config/jails/{name}/logpath
- Extend geo router with re-resolve endpoint; add geo_re_resolve background task
- Update blocklist_service with revised scheduling helpers
- Update Docker compose files with BANGUI_FAIL2BAN_CONFIG_DIR env var and
  rw volume mount for the fail2ban config directory
- Frontend: new Jail Files, Filters, Actions tabs in ConfigPage; file editor
  with accordion-per-file, editable textarea, save/create; add/delete log paths
- Frontend: types in types/config.ts; API calls in api/config.ts and api/endpoints.ts
- 63 new backend tests (test_file_config_service, test_file_config, test_geo_re_resolve)
- 6 new frontend tests in ConfigPageLogPath.test.tsx
- ruff, mypy --strict, tsc --noEmit, eslint: all clean; 617 backend tests pass
2026-03-12 20:08:33 +01:00
39ee1e2945 chore: add Docker config files and fix fail2ban bind mount path 2026-03-03 20:38:32 +01:00