Add multi-worker detection for APScheduler safety

- Add _check_single_worker_mode() to startup.py that detects and rejects multi-worker configurations, raising a clear RuntimeError with instructions
- Set BANGUI_WORKERS=1 as default in Dockerfile.backend
- Document single-worker requirement in compose.prod.yml
- Add 'Deployment Constraints' section to Architekture.md explaining why single-worker mode is required and detailing future multi-worker support
- Add '9.1 Background Tasks and Scheduler Architecture' section to Backend-Development.md documenting task structure and single-worker requirement
- Add comprehensive test suite (test_startup.py) covering all scenarios: allows single worker, rejects multi-worker, validates config format, and verifies informative error messages

This fix addresses TASK-002 which identified that in-process APScheduler is unsafe in multi-worker deployments due to each worker creating independent scheduler instances, causing duplicate background job execution.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@@ -282,7 +282,7 @@ The FastAPI app factory. Responsibilities:
 - Registers the **lifespan** context manager (startup: open DB, create aiohttp session, start scheduler; shutdown: close all)
 - Mounts all routers
 - Registers global exception handlers that map domain exceptions to HTTP status codes
-- Applies the setup-redirect middleware (redirects all requests to `/api/setup` when no configuration exists)
+- Applies the setup-redirect middleware (returns `423 Locked` for all API requests when no configuration exists, except for `/api/setup` and `/api/health`)
 
 ---
 
@@ -713,8 +713,8 @@ APScheduler 4.x (async mode) manages recurring background tasks.
 - All endpoints are grouped under `/api/` prefix.
 - JSON request and response bodies, validated by Pydantic models.
 - Authentication via session cookie on all endpoints except `/api/setup` and `/api/auth/login`.
-- Setup-redirect middleware: while no configuration exists, all endpoints return `303 See Other` → `/api/setup`.
-- Standard HTTP status codes: `200` success, `201` created, `204` no content, `400` bad request, `401` unauthorized, `404` not found, `422` validation error, `500` server error.
+- Setup-redirect middleware: while no configuration exists, all API endpoints (except `/api/setup` and `/api/health`) return `423 Locked` with `{"detail": "Setup not complete.", "setup_required": true}`. This ensures API consumers can detect setup as a distinct condition rather than transparently following redirects.
+- Standard HTTP status codes: `200` success, `201` created, `204` no content, `400` bad request, `401` unauthorized, `404` not found, `422` validation error, `423` locked, `500` server error.
 - Error responses follow a consistent shape: `{ "detail": "Human-readable message" }`.
 
 ### 8.2 Endpoint Groups
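
The new setup-redirect behavior in this hunk can be reduced to a pure decision function. This is a minimal sketch, not BanGUI's actual middleware: the names `setup_guard` and `EXEMPT_PATHS` are hypothetical, and exempting sub-paths of the two exempt routes as well as letting non-`/api/` paths through are assumptions made for illustration.

```python
from typing import Optional

# Routes that stay reachable before setup, per the diff above.
EXEMPT_PATHS = ("/api/setup", "/api/health")

# Response body quoted from the documentation change.
LOCKED_BODY = {"detail": "Setup not complete.", "setup_required": True}


def setup_guard(path: str, setup_complete: bool) -> Optional[tuple[int, dict]]:
    """Return (status, body) to short-circuit the request, or None to pass it on.

    Hypothetical helper: a real middleware would wrap this decision and
    build a JSON response from the returned tuple.
    """
    if setup_complete:
        return None
    # Assumption: only API routes are locked; other paths are untouched.
    if not path.startswith("/api/"):
        return None
    # Assumption: sub-paths of an exempt route are exempt as well.
    for exempt in EXEMPT_PATHS:
        if path == exempt or path.startswith(exempt + "/"):
            return None
    return 423, LOCKED_BODY
```

A client can then branch on the `setup_required` flag in the body instead of following a redirect it may not notice.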
@@ -768,6 +768,40 @@ APScheduler 4.x (async mode) manages recurring background tasks.
 
 ---
 
+## 9.2 Deployment Constraints
+
+### Single-Worker Requirement
+
+**BanGUI's background scheduler must run with exactly one uvicorn worker process.**
+
+The application uses APScheduler's `AsyncIOScheduler`, which is bound to a single asyncio event loop and cannot be safely shared across multiple worker processes. If the app is deployed with `--workers N` (where N > 1), the following failures occur:
+
+- Each worker process creates its own independent scheduler instance.
+- All background jobs execute **N times simultaneously** (once per worker).
+- Results:
+  - **Duplicate blocklist imports**: the same IP ranges are banned N times.
+  - **Duplicate history entries**: the same historical events are recorded N times.
+  - **Duplicate ban operations**: bans are executed multiple times, with potential state conflicts.
+  - **SQLite lock contention**: concurrent writes to the same database from N workers cause lock timeouts.
+
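
The duplication described in this list can be reproduced without any scheduler library. The sketch below is an illustration only: `InProcessScheduler` is a hypothetical stand-in for an in-process scheduler (it is not APScheduler's API), and worker processes are modeled as plain objects in one interpreter.

```python
from collections import Counter
from typing import Callable


class InProcessScheduler:
    """Hypothetical stand-in: each instance only knows about its own jobs."""

    def __init__(self) -> None:
        self.jobs: list[Callable[[], None]] = []

    def add_job(self, func: Callable[[], None]) -> None:
        self.jobs.append(func)

    def fire_due_jobs(self) -> None:
        # Run every registered job once, as a scheduler tick would.
        for job in self.jobs:
            job()


def simulate(workers: int) -> Counter:
    """Count how often one logical job runs when every worker schedules it."""
    executions: Counter = Counter()

    def blocklist_import() -> None:
        executions["blocklist_import"] += 1

    # Each simulated worker builds its own scheduler during startup.
    schedulers = []
    for _ in range(workers):
        scheduler = InProcessScheduler()
        scheduler.add_job(blocklist_import)
        schedulers.append(scheduler)

    # The same trigger time arrives in every worker.
    for scheduler in schedulers:
        scheduler.fire_due_jobs()
    return executions
```

With `simulate(1)` the job runs once; with `simulate(4)` the same job runs four times, which is the N-fold execution the list describes.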
+### Enforcement
+
+1. **Environment variable:** Set `BANGUI_WORKERS=1` (default in Dockerfile.backend).
+2. **Detection:** On startup, `startup_shared_resources()` validates `BANGUI_WORKERS` and raises a clear `RuntimeError` if it is not 1.
+3. **Single-process design:** The application is optimized for a single-process, high-concurrency model using asyncio. Request handling is fully async and leverages the event loop efficiently.
+
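
A startup check along the lines of step 2 might look like the following. This is a sketch under stated assumptions rather than the code in startup.py: the function name, the `environ` parameter, the error wording, and treating an unset variable as `1` are all hypothetical choices.

```python
import os
from typing import Mapping, Optional


def check_single_worker_mode(environ: Optional[Mapping[str, str]] = None) -> int:
    """Validate BANGUI_WORKERS and return it, or raise RuntimeError.

    `environ` is injectable so the check can be tested without touching
    the real process environment (an assumption of this sketch).
    """
    env = os.environ if environ is None else environ
    # Assumption: an unset variable means the default of one worker.
    raw = env.get("BANGUI_WORKERS", "1")
    try:
        workers = int(raw)
    except ValueError:
        raise RuntimeError(
            f"BANGUI_WORKERS must be an integer, got {raw!r}."
        ) from None
    if workers != 1:
        raise RuntimeError(
            f"BANGUI_WORKERS={workers} is not supported: the in-process "
            "scheduler requires exactly one worker. Set BANGUI_WORKERS=1."
        )
    return workers
```

Failing at startup, instead of logging a warning, means a misconfigured deployment never reaches the point where duplicate jobs could run.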
+### Future Multi-Worker Support
+
+To safely support multiple workers in the future:
+
+1. **External job store:** Move APScheduler from in-memory to a persistent store (e.g., a SQLAlchemy-backed job store on PostgreSQL, or a Redis-backed store).
+2. **Distributed locking:** Use a distributed lock (Redis, etcd) to ensure only one worker executes each scheduled job.
+3. **Process coordination:** Implement a coordination mechanism between worker processes so that the scheduler runs on only one designated worker.
+
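
The locking idea in step 2 can be illustrated with a claim table: every worker tries to insert a row for a given job and trigger time, and only the worker whose insert succeeds runs the job. The sketch below uses an in-memory SQLite database purely as a stand-in for a shared store such as Redis or etcd; the table name, the function names, and the single shared connection are assumptions for illustration, not a proposed design.

```python
import sqlite3


def make_claim_store() -> sqlite3.Connection:
    """Create a stand-in shared store with one claim row per (job, run time)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job_claims ("
        "job_id TEXT NOT NULL, run_at TEXT NOT NULL, "
        "PRIMARY KEY (job_id, run_at))"
    )
    return conn


def try_claim(conn: sqlite3.Connection, job_id: str, run_at: str) -> bool:
    """Return True only for the first caller that claims this job run."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO job_claims (job_id, run_at) VALUES (?, ?)",
                (job_id, run_at),
            )
        return True
    except sqlite3.IntegrityError:
        # Another worker already inserted the same primary key.
        return False


def run_tick(conn: sqlite3.Connection, workers: int, job_id: str, run_at: str) -> list[int]:
    """Simulate every worker reaching the same trigger; return who executed."""
    executed = []
    for worker_id in range(workers):
        if try_claim(conn, job_id, run_at):
            executed.append(worker_id)
    return executed
```

However many workers reach the same trigger time, only one claim succeeds, so the job body runs once per scheduled run.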
+Currently, the single-worker approach is simple, maintainable, and sufficient for BanGUI's operational requirements.
+
+---
+
 ## 10. Design Principles
 
 These principles govern all architectural decisions in BanGUI.