ADR-005: Single-Instance Scheduler Enforcement
Status
Accepted
Context
APScheduler's AsyncIOScheduler is bound to a single asyncio event loop.
Running multiple scheduler instances leads to duplicate jobs, database lock
contention, and undefined behaviour.
Decision
Enforce exactly one scheduler instance for the entire application lifecycle, using a startup gate plus a database-level lock with heartbeat-based expiry.
Mechanism
1. Startup gate: BANGUI_WORKERS=1
The Docker compose file is configured with BANGUI_WORKERS=1 and the startup DAG
validates this variable. If the variable is not set to 1, startup aborts with
a clear error message.
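A minimal sketch of such a startup check, assuming the variable is read from the process environment. The function name `validate_worker_count` and the wording of the error are illustrative, not taken from the codebase:

```python
import os
from typing import Mapping


def validate_worker_count(env: Mapping[str, str] = os.environ) -> None:
    """Abort startup unless BANGUI_WORKERS is exactly "1".

    Hypothetical helper: an unset variable is treated the same as a wrong
    value, because the ADR requires the variable to be set to 1.
    """
    raw = env.get("BANGUI_WORKERS")
    if raw is None or raw.strip() != "1":
        raise RuntimeError(
            f"BANGUI_WORKERS must be set to 1 (got {raw!r}); "
            "the scheduler supports only a single worker."
        )
```

Calling this before the scheduler is created means a misconfigured deployment stops early with a readable message.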
2. Runtime lock: scheduler_lock table
During startup, after opening the SQLite database, the application attempts:
INSERT INTO scheduler_lock (lock_name, heartbeat_at)
VALUES ('scheduler', unixepoch())
ON CONFLICT(lock_name) DO UPDATE SET heartbeat_at = unixepoch()
WHERE (unixepoch() - heartbeat_at) >= 30;
- If the statement inserts or updates a row (no lock row exists yet, or the existing heartbeat is at least 30 seconds old), this instance holds the lock and starts the scheduler.
- If the statement is a no-op (the heartbeat is recent), another instance holds the lock and startup continues without starting the scheduler.
- A background task (scheduler_lock_heartbeat) updates the heartbeat every 10 seconds. If the process crashes, the lock expires after 30 seconds, so a restarted instance can acquire it once the stale heartbeat has aged out.
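The acquire-and-heartbeat behaviour described above can be sketched with Python's standard `sqlite3` module. This is an illustration under stated assumptions, not the project's implementation: the table schema, the function names `try_acquire_lock` and `refresh_heartbeat`, and passing the timestamp in from Python (rather than calling `unixepoch()`, which needs SQLite 3.38 or newer) are all choices made for the example. The upsert takes over the lock only when the stored heartbeat is at least 30 seconds old.

```python
import sqlite3
import time
from typing import Optional

LOCK_TTL_SECONDS = 30  # lock is considered stale after this many seconds

# Assumed schema: lock_name must be unique for ON CONFLICT to apply.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS scheduler_lock ("
    "lock_name TEXT PRIMARY KEY, heartbeat_at INTEGER NOT NULL)"
)

ACQUIRE_SQL = (
    "INSERT INTO scheduler_lock (lock_name, heartbeat_at) "
    "VALUES ('scheduler', :now) "
    "ON CONFLICT(lock_name) DO UPDATE SET heartbeat_at = excluded.heartbeat_at "
    "WHERE (excluded.heartbeat_at - scheduler_lock.heartbeat_at) >= :ttl"
)


def try_acquire_lock(conn: sqlite3.Connection, now: Optional[int] = None) -> bool:
    """Return True if this caller now holds the scheduler lock."""
    now = int(time.time()) if now is None else now
    cur = conn.execute(ACQUIRE_SQL, {"now": now, "ttl": LOCK_TTL_SECONDS})
    changed = cur.rowcount  # 1 if a row was inserted or updated, 0 on a no-op
    conn.commit()
    return changed == 1


def refresh_heartbeat(conn: sqlite3.Connection, now: Optional[int] = None) -> None:
    """Called periodically by the lock holder to keep the lock fresh."""
    now = int(time.time()) if now is None else now
    conn.execute(
        "UPDATE scheduler_lock SET heartbeat_at = ? WHERE lock_name = 'scheduler'",
        (now,),
    )
    conn.commit()
```

An instance would call `try_acquire_lock` once at startup and, only if it returns `True`, start the scheduler and schedule `refresh_heartbeat` every 10 seconds.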
3. Deployment topology
| Deployment | Behaviour |
|---|---|
| Single container | Scheduler runs normally |
| Single Pod (Kubernetes) | Scheduler runs normally |
| Accidental multi-process restart | Second process does not start the scheduler; first continues |
| Intentional multi-worker | Not supported; requires external job store (future) |
Rationale
Why this approach?
- No external coordination service: No ZooKeeper, etcd, or Redis needed. The existing SQLite database is reused.
- Atomic: SQLite executes the INSERT with ON CONFLICT as a single atomic statement, so two instances cannot both acquire the lock.
- Self-healing: Lock expiry means a crashed instance automatically releases its lock. No manual cleanup required.
- Crash-safe: A heartbeat-based TTL ensures stale locks are not held indefinitely.
Consequences
- BANGUI_WORKERS must always be 1. This is documented and enforced.
- Future multi-worker deployments require migration to a persistent job store (PostgreSQL + SQLAlchemy job store, or Redis).