Addresses: Backend session cache not cluster-safe (multi-worker issue)
Problem:
- Session cache is process-local (InMemorySessionCache)
- Multi-worker deployments (uvicorn --workers N) create separate processes
- Each process has its own independent session cache
- Sessions cached in Worker A are invisible to Workers B, C, D
- Users randomly logged out when requests land on different workers
- Also affects RuntimeState, rate limiter, and background jobs
Solution (Option A - Strict single-worker enforcement):
- Enhance startup validation with clearer error messages
- Update error messages to explain the problem and how to fix it
- Document single-worker requirement prominently in Docker configs
- Update module docstrings to clarify constraints
Changes:
1. app/startup.py:
- Enhanced _check_single_worker_mode() error message with troubleshooting
- Enhanced _stage_check_worker_mode_and_acquire_lock() error message
- Removed unused import
2. app/utils/session_cache.py:
- Updated module docstring to explain constraints more clearly
- Added references to deployment documentation
- Clarified multi-worker solution for future implementation
3. app/utils/runtime_state.py:
- Updated module docstring with deployment constraint references
- Aligned messaging with session_cache.py
4. Docker/Dockerfile.backend:
- Added comprehensive comments about single-worker requirement
- Explained impact in multi-worker deployments
- Referenced deployment constraints documentation
5. Docker/docker-compose.yml, compose.prod.yml, compose.debug.yml:
- Added documentation comments about BANGUI_WORKERS constraint
- Explained why single-worker is required
6. backend/tests/test_startup_integration.py:
- Fixed test unpacking to match function return signature (3 values, not 2)
This ensures multi-worker deployments fail loudly at startup with clear
guidance on what went wrong and how to fix it. The database-backed scheduler
lock provides defense-in-depth for container orchestration scenarios.
For future multi-worker support, implement:
- Redis or database-backed session cache
- Shared RuntimeState coordination
- Distributed APScheduler backend
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add _check_single_worker_mode() to startup.py that detects and rejects
multi-worker configurations, raising a clear RuntimeError with instructions
- Set BANGUI_WORKERS=1 as default in Dockerfile.backend
- Document single-worker requirement in compose.prod.yml
- Add 'Deployment Constraints' section to Architekture.md explaining why
single-worker mode is required and detailing future multi-worker support
- Add '9.1 Background Tasks and Scheduler Architecture' section to
Backend-Development.md documenting task structure and single-worker requirement
- Add comprehensive test suite (test_startup.py) covering all scenarios:
allows single worker, rejects multi-worker, validates config format,
and verifies informative error messages
This fix addresses TASK-002 which identified that in-process APScheduler is
unsafe in multi-worker deployments due to each worker creating independent
scheduler instances, causing duplicate background job execution.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit implements fixes for three independent bugs in the fail2ban configuration and integration layer:
1. Task 1: Detect UnknownJailException and prevent silent failures
- Added JailNotFoundError detection in jail_service.reload_all()
- Enhanced error handling in config_file_service to catch JailNotFoundError
- Added specific error message with logpath validation hints
- Added rollback test for this scenario
2. Task 2: Fix iptables-allports exit code 4 (xtables lock contention)
- Added global banaction setting in jail.conf with -w 5 lockingopt
- Removed redundant per-jail banaction overrides from bangui-sim and blocklist-import
- Added production compose documentation note
3. Task 3: Suppress log noise from unsupported backend/idle commands
- Implemented capability detection to cache command support status
- Double-check locking to minimize lock contention
- Avoids sending unsupported get <jail> backend/idle commands
- Returns default values without socket calls when unsupported
All changes include comprehensive tests and maintain backward compatibility.
Task 4 (Better Jail Configuration) implementation:
- Add fail2ban_config_dir setting to app/config.py
- New file_config_service: list/view/edit/create jail.d, filter.d, action.d files
with path-traversal prevention and 512 KB content size limit
- New file_config router: GET/PUT/POST endpoints for jail files, filter files,
and action files; PUT .../enabled for toggle on/off
- Extend config_service with delete_log_path() and add_log_path()
- Add DELETE /api/config/jails/{name}/logpath and POST /api/config/jails/{name}/logpath
- Extend geo router with re-resolve endpoint; add geo_re_resolve background task
- Update blocklist_service with revised scheduling helpers
- Update Docker compose files with BANGUI_FAIL2BAN_CONFIG_DIR env var and
rw volume mount for the fail2ban config directory
- Frontend: new Jail Files, Filters, Actions tabs in ConfigPage; file editor
with accordion-per-file, editable textarea, save/create; add/delete log paths
- Frontend: types in types/config.ts; API calls in api/config.ts and api/endpoints.ts
- 63 new backend tests (test_file_config_service, test_file_config, test_geo_re_resolve)
- 6 new frontend tests in ConfigPageLogPath.test.tsx
- ruff, mypy --strict, tsc --noEmit, eslint: all clean; 617 backend tests pass