BanGUI/Docs/adr/ADR-005-Single-Instance-Scheduler.md
Lukas 5f0ab40816 refactor(backend): clean up models setup, improve ip utils, add adr docs
- Extract ADR documents for architectural decisions (SQLite, FastAPI, React, APScheduler, Scheduler)
- Refactor setup.py: improve code structure and readability
- Add IP validation utilities with test coverage
- Update frontend components (BanTable, HistoryPage)
- Add pre-commit hooks and CONTRIBUTING.md
- Add .editorconfig for consistent coding standards
2026-05-03 18:04:45 +02:00

ADR-005: Single-Instance Scheduler Enforcement

Status

Accepted

Context

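APScheduler's AsyncIOScheduler is bound to a single asyncio event loop, so every worker process that starts the application creates its own scheduler. Running more than one scheduler instance against the same database leads to duplicate job execution, contention on SQLite's write lock, and undefined behaviour.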

Decision

Enforce exactly one scheduler instance across the entire application lifecycle, using a startup check on the worker count and a database-level lock with a heartbeat.

Mechanism

1. Startup gate: BANGUI_WORKERS=1

The Docker Compose file sets BANGUI_WORKERS=1, and the startup DAG validates this variable. If it is not set to 1, startup aborts with a clear error message.
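The startup gate could be sketched as follows. This is a minimal illustration, not the project's actual validation step: the function name `validate_worker_count` and the exception `StartupConfigError` are hypothetical, and treating an unset variable as an error is an assumption about what "not set to 1" covers.

```python
import os
from typing import Mapping, Optional


class StartupConfigError(RuntimeError):
    """Hypothetical error type: raised when the worker count is not exactly 1."""


def validate_worker_count(env: Optional[Mapping[str, str]] = None) -> int:
    """Abort startup unless BANGUI_WORKERS is exactly 1.

    `env` defaults to the process environment; passing a mapping makes
    the check easy to exercise in tests.
    """
    env = os.environ if env is None else env
    raw = env.get("BANGUI_WORKERS")
    if raw is None:
        # Assumption: an unset variable is treated the same as a wrong value.
        raise StartupConfigError(
            "BANGUI_WORKERS is not set; it must be 1 because the scheduler "
            "supports a single instance only."
        )
    if raw.strip() != "1":
        raise StartupConfigError(
            f"BANGUI_WORKERS={raw!r} is not supported; it must be 1 because "
            "the scheduler supports a single instance only."
        )
    return 1
```

Calling the check before the database is opened keeps a misconfigured deployment from ever reaching the lock logic.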

2. Runtime lock: scheduler_lock table

During startup, after opening the SQLite database, the application attempts the following upsert (unixepoch() requires SQLite 3.38 or later):

INSERT INTO scheduler_lock (lock_name, heartbeat_at)
VALUES ('scheduler', unixepoch())
ON CONFLICT(lock_name) DO UPDATE SET heartbeat_at = unixepoch()
WHERE (unixepoch() - heartbeat_at) >= 30;
  • If the statement changes a row (the row was inserted, or a stale heartbeat was overwritten), this instance holds the lock and starts the scheduler.
  • If the statement is a no-op (the existing heartbeat is less than 30 seconds old), another instance holds the lock and startup continues without starting the scheduler.
  • A background task (scheduler_lock_heartbeat) updates the heartbeat every 10 seconds. If the process crashes, the heartbeat goes stale and the lock expires 30 seconds after the last update, so a restarted instance can acquire it once that window has passed.
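The acquire and heartbeat steps above can be sketched with Python's standard sqlite3 module. This is an illustration under stated assumptions: the table schema, the function names `try_acquire_scheduler_lock` and `refresh_heartbeat`, and the explicit `now` parameter (used instead of unixepoch() so the logic can be tested with fixed timestamps) are not taken from the codebase.

```python
import sqlite3
import time
from typing import Optional

LOCK_TTL_SECONDS = 30            # lock expiry stated in this ADR
HEARTBEAT_INTERVAL_SECONDS = 10  # heartbeat period stated in this ADR

# Assumed schema: the ADR names the table and columns but not their types.
SCHEMA_SQL = (
    "CREATE TABLE IF NOT EXISTS scheduler_lock ("
    "lock_name TEXT PRIMARY KEY, heartbeat_at INTEGER NOT NULL)"
)

# Insert the row, or take it over only when the stored heartbeat is stale.
ACQUIRE_SQL = (
    "INSERT INTO scheduler_lock (lock_name, heartbeat_at) "
    "VALUES ('scheduler', :now) "
    "ON CONFLICT(lock_name) DO UPDATE SET heartbeat_at = :now "
    "WHERE (:now - heartbeat_at) >= :ttl"
)


def try_acquire_scheduler_lock(conn: sqlite3.Connection,
                               now: Optional[int] = None) -> bool:
    """Return True if this caller now holds the lock, False otherwise."""
    now = int(time.time()) if now is None else now
    with conn:  # commit on success, roll back on error
        cur = conn.execute(ACQUIRE_SQL, {"now": now, "ttl": LOCK_TTL_SECONDS})
    # One changed row means insert or takeover; zero means the upsert was a no-op.
    return cur.rowcount == 1


def refresh_heartbeat(conn: sqlite3.Connection,
                      now: Optional[int] = None) -> None:
    """Called by the lock holder every HEARTBEAT_INTERVAL_SECONDS."""
    now = int(time.time()) if now is None else now
    with conn:
        conn.execute(
            "UPDATE scheduler_lock SET heartbeat_at = ? "
            "WHERE lock_name = 'scheduler'",
            (now,),
        )
```

In the application, `refresh_heartbeat` would presumably be driven by the scheduler_lock_heartbeat task; an async application would need to run these blocking calls through an async SQLite driver or a worker thread.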

3. Deployment topology

| Deployment | Behaviour |
|---|---|
| Single container | Scheduler runs normally |
| Single Pod (Kubernetes) | Scheduler runs normally |
| Accidental multi-process restart | Second process does not start the scheduler; first continues |
| Intentional multi-worker | Not supported; requires external job store (future) |

Rationale

Why this approach?

  • No external coordination service: No ZooKeeper, etcd, or Redis needed. The existing SQLite database is reused.
  • Atomic: The upsert is a single SQL statement, and SQLite serialises writers, so two instances cannot both acquire the lock.
  • Self-healing: A crashed instance stops refreshing its heartbeat, so its lock expires on its own. No manual cleanup is required.
  • Crash-safe: Release does not depend on shutdown hooks; the heartbeat TTL limits how long a stale lock can be held to 30 seconds.
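The atomicity claim can be checked with a small experiment: several threads, each with its own connection to the same database file, race to acquire the lock at the same timestamp. This is a sketch for illustration only; `race_for_lock`, the schema, and the use of a bound `now` parameter in place of unixepoch() are assumptions, not project code.

```python
import os
import sqlite3
import tempfile
import threading
from typing import List

LOCK_TTL_SECONDS = 30  # lock expiry stated in this ADR

# Same upsert shape as in the Mechanism section, with bound parameters.
ACQUIRE_SQL = (
    "INSERT INTO scheduler_lock (lock_name, heartbeat_at) "
    "VALUES ('scheduler', :now) "
    "ON CONFLICT(lock_name) DO UPDATE SET heartbeat_at = :now "
    "WHERE (:now - heartbeat_at) >= :ttl"
)


def race_for_lock(db_path: str, contenders: int = 8, now: int = 1000) -> List[int]:
    """Let `contenders` threads race for the lock; return the ids that won."""
    setup = sqlite3.connect(db_path)
    setup.execute(
        "CREATE TABLE IF NOT EXISTS scheduler_lock ("
        "lock_name TEXT PRIMARY KEY, heartbeat_at INTEGER NOT NULL)"
    )
    setup.commit()
    setup.close()

    barrier = threading.Barrier(contenders)
    winners: List[int] = []

    def contend(worker_id: int) -> None:
        # Each thread opens its own connection; the timeout lets a writer
        # wait for the file lock instead of failing immediately.
        conn = sqlite3.connect(db_path, timeout=30)
        try:
            barrier.wait()  # release all contenders at the same moment
            with conn:
                cur = conn.execute(
                    ACQUIRE_SQL, {"now": now, "ttl": LOCK_TTL_SECONDS}
                )
            if cur.rowcount == 1:
                winners.append(worker_id)
        finally:
            conn.close()

    threads = [threading.Thread(target=contend, args=(i,)) for i in range(contenders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(len(race_for_lock(os.path.join(tmp, "lock.db"))))
```

Because SQLite admits one writer at a time, the first upsert inserts the row and every later one sees a fresh heartbeat and becomes a no-op, so a single contender is expected to win.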

Consequences

  • BANGUI_WORKERS must always be 1. This is documented and enforced by the startup gate.
  • Future multi-worker deployments require migration to a persistent job store (PostgreSQL + SQLAlchemy job store, or Redis).