refactor(backend): clean up models setup, improve ip utils, add adr docs

- Extract ADR documents for architectural decisions (SQLite, FastAPI, React, APScheduler, Scheduler)
- Refactor setup.py: improve code structure and readability
- Add IP validation utilities with test coverage
- Update frontend components (BanTable, HistoryPage)
- Add pre-commit hooks and CONTRIBUTING.md
- Add .editorconfig for consistent coding standards
61
Docs/adr/ADR-005-Single-Instance-Scheduler.md
Normal file
# ADR-005: Single-Instance Scheduler Enforcement

## Status
Accepted

## Context
APScheduler's `AsyncIOScheduler` is bound to a single asyncio event loop, so
every worker process that starts one gets its own, uncoordinated scheduler.
Running multiple scheduler instances therefore leads to duplicate job
executions, SQLite database lock contention, and undefined behaviour.

## Decision
Enforce exactly **one scheduler instance** across the entire application lifecycle,
using a startup configuration check backed by a database-level lock.

## Mechanism

### 1. Startup gate: `BANGUI_WORKERS=1`
The Docker Compose file sets `BANGUI_WORKERS=1`, and the startup DAG validates
this variable. If it is not set to `1`, startup aborts with a clear error message.
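The startup gate can be sketched as a plain environment check. This is a minimal sketch that assumes the gate reads `BANGUI_WORKERS` from the process environment. The function `check_single_worker` and the exception `StartupConfigError` are hypothetical names and do not represent the project's actual startup-DAG node.

```python
import os
from collections.abc import Mapping


class StartupConfigError(RuntimeError):
    """Hypothetical error raised when the process must not start."""


def check_single_worker(env: Mapping[str, str] = os.environ) -> None:
    """Hypothetical startup-DAG node: abort unless BANGUI_WORKERS is exactly "1"."""
    workers = env.get("BANGUI_WORKERS")
    if workers != "1":
        # Missing, "0", "4", or any other value: refuse to start, because more
        # than one worker process would start more than one scheduler.
        raise StartupConfigError(
            f"BANGUI_WORKERS must be set to 1 (got {workers!r}); "
            "each extra worker would start its own scheduler."
        )


if __name__ == "__main__":
    check_single_worker({"BANGUI_WORKERS": "1"})  # passes silently
    try:
        check_single_worker({"BANGUI_WORKERS": "4"})
    except StartupConfigError as exc:
        print(f"startup aborted: {exc}")
```

Running a check like this as the first startup step makes a misconfigured deployment fail before the database or the scheduler is touched.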

### 2. Runtime lock: `scheduler_lock` table
During startup, after opening the SQLite database, the application tries to
acquire the lock with a single upsert:

```sql
-- Insert the lock row if it is missing; otherwise take it over only
-- when the holder's last heartbeat is at least 30 seconds old.
INSERT INTO scheduler_lock (lock_name, heartbeat_at)
VALUES ('scheduler', unixepoch())
ON CONFLICT(lock_name) DO UPDATE SET heartbeat_at = unixepoch()
WHERE (unixepoch() - scheduler_lock.heartbeat_at) >= 30;
```

- If the statement inserts or updates the row (one row changed), this instance
  holds the lock and starts the scheduler.
- If the statement changes no rows (the existing heartbeat is less than 30
  seconds old), another instance holds the lock, and startup continues without
  starting the scheduler.
- A background task (`scheduler_lock_heartbeat`) refreshes `heartbeat_at` every
  10 seconds. If the process crashes, the heartbeats stop and the lock expires
  30 seconds after the last one. A process started after that point acquires
  the lock immediately, without manual cleanup.
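The acquire-then-heartbeat cycle described in the list above can be sketched with Python's standard `sqlite3` and `asyncio` modules. This is a sketch under several assumptions:

- The `scheduler_lock` schema (a `lock_name` primary key and an integer `heartbeat_at`) is inferred from the INSERT statement.
- The helpers `try_acquire_lock`, `refresh_heartbeat`, and `heartbeat_loop` are hypothetical.
- The current time is passed in as `:now` instead of calling `unixepoch()` (which needs SQLite 3.38.0 or later), so the expiry logic can be tested with fixed timestamps.

```python
import asyncio
import sqlite3
import time

LOCK_TTL_SECONDS = 30      # lock expires this long after the last heartbeat
HEARTBEAT_INTERVAL = 10.0  # seconds between heartbeats

# Assumed schema, inferred from the ADR's INSERT statement.
SCHEMA = """
CREATE TABLE IF NOT EXISTS scheduler_lock (
    lock_name    TEXT PRIMARY KEY,
    heartbeat_at INTEGER NOT NULL
)
"""

# Take over an existing row only when its heartbeat is at least TTL seconds old.
ACQUIRE_SQL = """
INSERT INTO scheduler_lock (lock_name, heartbeat_at)
VALUES ('scheduler', :now)
ON CONFLICT(lock_name) DO UPDATE SET heartbeat_at = excluded.heartbeat_at
WHERE (:now - scheduler_lock.heartbeat_at) >= :ttl
"""


def try_acquire_lock(conn: sqlite3.Connection, now: int) -> bool:
    """Hypothetical helper: return True if this process now holds the lock."""
    with conn:  # commits the implicit transaction on success
        cur = conn.execute(ACQUIRE_SQL, {"now": now, "ttl": LOCK_TTL_SECONDS})
    # 1 row changed: lock inserted, or an expired lock taken over.
    # 0 rows changed: the WHERE clause skipped the update (lock is live).
    return cur.rowcount == 1


def refresh_heartbeat(conn: sqlite3.Connection, now: int) -> None:
    """Hypothetical helper: record a fresh heartbeat (lock holder only)."""
    with conn:
        conn.execute(
            "UPDATE scheduler_lock SET heartbeat_at = ? WHERE lock_name = 'scheduler'",
            (now,),
        )


async def heartbeat_loop(conn: sqlite3.Connection,
                         interval: float = HEARTBEAT_INTERVAL) -> None:
    """Hypothetical stand-in for the `scheduler_lock_heartbeat` task."""
    while True:
        # A short, blocking write; acceptable for a sketch on a local database.
        refresh_heartbeat(conn, int(time.time()))
        await asyncio.sleep(interval)
```

At startup, an application following this ADR would call `try_acquire_lock(conn, int(time.time()))` once. Only when that call returns `True` would it start the `AsyncIOScheduler` and launch `heartbeat_loop` with `asyncio.create_task`. Checking `rowcount` tells the two outcomes apart without a second query.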

### 3. Deployment topology
| Deployment | Behaviour |
|---|---|
| Single container | Scheduler runs normally |
| Single Pod (Kubernetes) | Scheduler runs normally |
| Accidental multi-process restart | Second process starts without the scheduler; the first keeps running it |
| Intentional multi-worker | Not supported; requires an external job store (future) |

## Rationale

### Why this approach?
- **No external coordination service:** no ZooKeeper, etcd, or Redis is needed;
  the existing SQLite database is reused.
- **Atomic:** SQLite executes the upsert as a single statement under its
  database write lock, so two processes cannot both acquire the lock.
- **Self-healing:** lock expiry means a crashed instance automatically releases
  its lock; no manual cleanup is required.
- **Crash-safe:** the heartbeat-based TTL ensures a stale lock is never held
  indefinitely.
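The self-healing behaviour and the accidental-second-process case can be replayed with separate connections to one database file, each standing in for a process. This is a sequential sketch, not a concurrency test. It assumes a `scheduler_lock` table with a `lock_name` primary key and an integer `heartbeat_at`. The helpers `try_acquire` and `replay_crash_timeline` and the fixed timestamps are hypothetical. The clock is passed in as `:now` rather than read with `unixepoch()`.

```python
import os
import sqlite3
import tempfile

LOCK_TTL_SECONDS = 30  # expiry window from the ADR

# Assumed schema: one row per named lock, heartbeat stored as Unix seconds.
SCHEMA = """
CREATE TABLE IF NOT EXISTS scheduler_lock (
    lock_name    TEXT PRIMARY KEY,
    heartbeat_at INTEGER NOT NULL
)
"""

# Upsert that takes over the row only once the heartbeat has expired.
ACQUIRE_SQL = """
INSERT INTO scheduler_lock (lock_name, heartbeat_at)
VALUES ('scheduler', :now)
ON CONFLICT(lock_name) DO UPDATE SET heartbeat_at = excluded.heartbeat_at
WHERE (:now - scheduler_lock.heartbeat_at) >= :ttl
"""


def try_acquire(db_path: str, now: int) -> bool:
    """Hypothetical helper: open one 'process' connection and try the lock."""
    conn = sqlite3.connect(db_path, timeout=5.0)  # wait up to 5 s for writers
    try:
        conn.execute(SCHEMA)
        with conn:  # commit on success
            cur = conn.execute(ACQUIRE_SQL, {"now": now, "ttl": LOCK_TTL_SECONDS})
        return cur.rowcount == 1
    finally:
        conn.close()


def replay_crash_timeline(db_path: str) -> list[tuple[str, int, bool]]:
    """Hypothetical scenario: return (process, time in s, acquired) per attempt."""
    attempts = [
        ("original", 100),   # first process starts and takes the lock
        ("duplicate", 105),  # accidental second process: lock is still live
        # The original never heartbeats again after t=100 (simulated crash).
        ("restart-1", 125),  # 25 s since last heartbeat: still held
        ("restart-2", 130),  # 30 s since last heartbeat: expired, taken over
    ]
    return [(name, t, try_acquire(db_path, t)) for name, t in attempts]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, t, acquired in replay_crash_timeline(os.path.join(tmp, "lock.db")):
            print(f"t={t}s {name}: acquired={acquired}")
```

Because SQLite serializes writers, the same upsert also settles a real race between two processes: whichever commits first refreshes the heartbeat, and the other then sees a live lock.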
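
## Consequences
- `BANGUI_WORKERS` must always be `1`. This is documented and enforced at startup.
- Future multi-worker deployments require migrating to a persistent job store
  (PostgreSQL with the SQLAlchemy job store, or Redis). APScheduler 3.x, which
  provides `AsyncIOScheduler`, does not support several schedulers sharing one
  job store, so a shared store alone would not be enough to run it in more
  than one worker.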