refactor(backend): clean up models setup, improve ip utils, add adr docs

- Extract ADR documents for architectural decisions (SQLite, FastAPI, React, APScheduler, Scheduler) - Refactor setup.py: improve code structure and readability - Add IP validation utilities with test coverage - Update frontend components (BanTable, HistoryPage) - Add pre-commit hooks and CONTRIBUTING.md - Add .editorconfig for consistent coding standards
2026-05-03 18:04:45 +02:00
parent 2f9fc8076d
commit 5f0ab40816
17 changed files with 517 additions and 48 deletions
--- a/Docs/Tasks.md
+++ b/Docs/Tasks.md
@@ -1,46 +1,3 @@
-### Issue #27: MEDIUM - Inconsistent Error Handling Patterns
-
-**Where found**: 
- `ban_service.py` - Raises exceptions
- `server_service.py` - Returns defaults silently
- `jail_service.py` - Mixed approach
-
-**Why this is needed**: 
-Different services have different error handling contracts. Callers don't know what to expect.
-
-**Goal**: 
-Establish clear error handling contract for all services.
-
-**What to do**:
-1. Document error handling patterns:
-   ```python
-   class ServiceErrorContract:
-       """
-       ABORT_ON_ERROR: Raise exception, let router handle
-       RETURN_DEFAULT: Return empty result, log warning
-       PARTIAL_RESULT: Return partial success with error list
-       """
-   ```
-2. Each service method documents which pattern it follows
-3. Routers handle errors consistently
-4. Update service protocols
-
-**Possible traps and issues**:
- Users of service must know pattern
- Switching patterns is breaking change
-
-**Docs changes needed**:
- Create service development guide with error patterns
-
-**Doc references**:
- DETAILED_FINDINGS.md - Issue "Inconsistent Error Handling"
-
---
-
-## LOWER PRIORITY ISSUES (LOW-MEDIUM)
-
---
-
 ### Issue #28: LOW-MEDIUM - Missing Pre-Commit Hooks

 **Where found**: 
--- a/Docs/adr/ADR-001-SQLite-Application-Database.md
+++ b/Docs/adr/ADR-001-SQLite-Application-Database.md
@@ -0,0 +1,45 @@
+# ADR-001: SQLite as the Application Database
+
+## Status
+Accepted
+
+## Context
+BanGUI needs a database to store application state: configuration, session records,
+blocklist sources, import logs, and ban history archives.
+
+## Decision
+Use **SQLite** (via `aiosqlite`) as BanGUI's application database, persisted to a
+volume mounted from the host or a named Docker volume.
+
+## Rationale
+
+### Why SQLite over PostgreSQL?
+- **Zero-infrastructure:** No separate DB server process, no connection pooling,
+  no credentials to manage. Ships in the Docker container with no additional
+  configuration.
+- **Fail2ban-compatible:** The fail2ban database itself is SQLite. BanGUI already
+  depends on SQLite; adding a second database engine would increase operational
+  complexity for no benefit.
+- **Single-instance deployment:** BanGUI runs as a single service with one
+  background scheduler (enforced by `BANGUI_WORKERS=1`). Horizontal scaling is not
+  a design goal.
+- **Async I/O:** `aiosqlite` provides full async support, avoiding blocking I/O in
+  the FastAPI async request handlers.
+
+### Why not PostgreSQL?
+- Requires a separate service or sidecar container.
+- Adds connection overhead (TCP, connection pools, auth).
+- Over-engineered for a single-instance web app.
+
+### Trade-offs
+- **Not suitable for multi-worker deployments.** SQLite's file-level locking means
+  only one process can write at a time. This is explicitly enforced:
+  `BANGUI_WORKERS=1` is validated at startup, and `scheduler_lock` prevents
+  duplicate schedulers in restarts.
+- For future multi-instance deployments, BanGUI would need to migrate to
+  PostgreSQL or add a distributed lock layer (Redis + job store).
+
+## Consequences
+- Application database is a single file (`bangui.db`) in the container's data volume.
+- Backup is a file copy. No `pg_dump` equivalent needed.
+- Schema migrations managed via `app/startup.py` startup DAG.
--- a/Docs/adr/ADR-002-FastAPI-Backend-Framework.md
+++ b/Docs/adr/ADR-002-FastAPI-Backend-Framework.md
@@ -0,0 +1,45 @@
+# ADR-002: FastAPI over Django
+
+## Status
+Accepted
+
+## Context
+The backend requires a Python async web framework with strong typing, validation,
+and OpenAPI support.
+
+## Decision
+Use **FastAPI** as the backend framework.
+
+## Rationale
+
+### Why FastAPI over Django?
+- **Async-first:** FastAPI is built on Starlette with native `async def` route
+  handlers. Django's ORM and request handling are synchronous, requiring thread
+  pools for I/O-bound work.
+- **Modern Python 3.12+:** FastAPI embraces modern Python idioms — type annotations,
+  structural pattern matching, dataclasses. Django maintains broad Python 3.8+
+  compatibility and shows its age.
+- **Pydantic v2 integration:** FastAPI natively uses Pydantic for request/response
+  validation. Automatic OpenAPI schema generation from Pydantic models is seamless.
+- **Dependency injection:** FastAPI's `Depends()` system provides a lightweight,
+  explicit DI pattern without a separate container library.
+- **Performance:** FastAPI + Uvicorn consistently benchmarks as one of the fastest
+  Python web frameworks, comparable to Node.js and Go for JSON APIs.
+
+### Why not Django?
+- Django's synchronous ORM creates thread-pool bottlenecks with SQLite.
+- Django's "batteries-included" philosophy is overkill for BanGUI's scope.
+  We need REST endpoints and background tasks, not a full CMS.
+- Less flexible dependency injection — Django's middleware and view system is
+  less composable than FastAPI's routing layers.
+
+### Trade-offs
+- **Smaller ecosystem:** Django has decades of third-party packages. FastAPI's
+  ecosystem is younger but covers BanGUI's needs (structlog, aiosqlite, APScheduler,
+  aiohttp) completely.
+- **No built-in admin UI:** BanGUI is its own admin UI; Django's admin is irrelevant.
+
+## Consequences
+- FastAPI routes are defined in `app/routers/`.
+- Request/response models live in `app/models/`.
+- Dependency injection via `app/dependencies.py`.
--- a/Docs/adr/ADR-003-React-Frontend-Framework.md
+++ b/Docs/adr/ADR-003-React-Frontend-Framework.md
@@ -0,0 +1,37 @@
+# ADR-003: React over Vue
+
+## Status
+Accepted
+
+## Context
+The frontend requires a component-based SPA framework with strong typing, a
+battle-tested component library, and broad ecosystem support.
+
+## Decision
+Use **React 18+ with TypeScript** as the frontend framework.
+
+## Rationale
+
+### Why React over Vue?
+- **Ecosystem maturity:** React has the largest frontend ecosystem. Libraries
+  (date pickers, data grids, rich text editors) assume React availability first.
+- **Fluent UI v9:** Microsoft's official React component library is built for React.
+  The Vue-compatible version (Fluent UI Vue) lags significantly in features and
+  maintenance.
+- **Hiring and onboarding:** React is more widely known. New contributors are
+  more likely to arrive with React experience than Vue experience.
+- **Concurrent features:** React 18's concurrent rendering (`useTransition`,
+  `useDeferredValue`) provides a foundation for performance improvements in
+  data-heavy views like the ban table.
+
+### Why not Vue?
+- Fluent UI v9 does not provide first-class Vue support.
+- Vue's composition API is well-designed, but does not outweigh the Fluent UI
+  constraint.
+- Ecosystem and hiring advantages strongly favor React for enterprise-adjacent
+  projects.
+
+## Consequences
+- Frontend is a React SPA in `frontend/src/`.
+- All components are functional components using hooks.
+- Global state via React context (`frontend/src/providers/`).
--- a/Docs/adr/ADR-004-APScheduler-Background-Scheduler.md
+++ b/Docs/adr/ADR-004-APScheduler-Background-Scheduler.md
@@ -0,0 +1,48 @@
+# ADR-004: APScheduler over Celery
+
+## Status
+Accepted
+
+## Context
+BanGUI requires background task scheduling for periodic work: geo cache flush,
+session cleanup, history sync, and blocklist imports.
+
+## Decision
+Use **APScheduler 4.x (AsyncIOScheduler)** for background scheduling.
+
+## Rationale
+
+### Why APScheduler over Celery?
+- **No infrastructure:** Celery requires a message broker (Redis or RabbitMQ).
+  APScheduler runs in-process with no broker. Given BanGUI's single-instance
+  constraint, a message queue adds unnecessary operational complexity.
+- **Async-native:** `AsyncIOScheduler` integrates directly with the asyncio event
+  loop. All BanGUI's I/O (database, HTTP, fail2ban socket) is async. APScheduler
+  jobs are `async def` functions that `await` without blocking.
+- **Simplicity:** BanGUI's job set is fixed and small. Celery's rich task routing,
+  retry policies, and distributed execution are overkill. APScheduler covers
+  cron-style scheduling with simpler semantics.
+- **Single-instance enforcement:** APScheduler's in-memory job store is a natural
+  fit when there is only one scheduler. No distributed coordination needed.
+
+### Why not Celery?
+- Celery's architecture (broker + workers + result backend) is designed for
+  distributed systems. BanGUI is explicitly single-instance.
+- Celery tasks are synchronous wrappers around async code without careful
+  handling. Native `async def` tasks require `async_task()` or explicit `run_sync`,
+  creating friction in an async-first codebase.
+- Added operational burden: Redis or RabbitMQ must be available at startup.
+
+### Trade-offs
+- **No horizontal scaling of workers:** APScheduler jobs run in the single
+  uvicorn worker process. CPU-intensive jobs would block the event loop.
+  (This is not a concern for BanGUI's I/O-bound jobs.)
+- **No built-in retry mechanism:** Failed jobs must re-raise exceptions or
+  implement retry logic manually. This is acceptable given BanGUI's job
+  idempotency guarantees.
+
+## Consequences
+- Scheduler is configured in `app/startup.py` using `AsyncIOScheduler`.
+- Jobs live in `app/tasks/`.
+- Single-worker constraint is enforced via `BANGUI_WORKERS=1` validation and
+  the `scheduler_lock` database table.
--- a/Docs/adr/ADR-005-Single-Instance-Scheduler.md
+++ b/Docs/adr/ADR-005-Single-Instance-Scheduler.md
@@ -0,0 +1,61 @@
+# ADR-005: Single-Instance Scheduler Enforcement
+
+## Status
+Accepted
+
+## Context
+APScheduler's `AsyncIOScheduler` is bound to a single asyncio event loop.
+Running multiple scheduler instances leads to duplicate jobs, database lock
+contention, and undefined behaviour.
+
+## Decision
+Enforce exactly **one scheduler instance** across the entire application lifecycle,
+using a database-level distributed lock.
+
+## Mechanism
+
+### 1. Startup gate: `BANGUI_WORKERS=1`
+The Docker compose file is configured with `BANGUI_WORKERS=1` and the startup DAG
+validates this variable. If the variable is not set to `1`, startup aborts with
+a clear error message.
+
+### 2. Runtime lock: `scheduler_lock` table
+During startup, after opening the SQLite database, the application attempts:
+
+```sql
+INSERT INTO scheduler_lock (lock_name, heartbeat_at)
+VALUES ('scheduler', unixepoch())
+ON CONFLICT(lock_name) DO UPDATE SET heartbeat_at = unixepoch()
+WHERE (unixepoch() - heartbeat_at) < 30;
+```
+
+- If the INSERT succeeds, this instance holds the lock and starts the scheduler.
+- If the INSERT is a no-op (heartbeat is recent), another instance holds the lock
+  and startup continues without starting the scheduler.
+- A background task (`scheduler_lock_heartbeat`) updates the heartbeat every 10
+  seconds. If the process crashes, the lock expires after 30 seconds, allowing
+  a restart to acquire it immediately.
+
+### 3. Deployment topology
+| Deployment | Behaviour |
+|---|---|
+| Single container | Scheduler runs normally |
+| Single Pod (Kubernetes) | Scheduler runs normally |
+| Accidental multi-process restart | Second process fails to start scheduler; first continues |
+| Intentional multi-worker | Not supported; requires external job store (future) |
+
+## Rationale
+
+### Why this approach?
+- **No external coordination service:** No ZooKeeper, etcd, or Redis needed.
+  The existing SQLite database is reused.
+- **Atomic:** SQLite's INSERT with ON CONFLICT is atomic; no race condition.
+- **Self-healing:** Lock expiry means a crashed instance automatically releases
+  its lock. No manual cleanup required.
+- **Crash-safe:** A heartbeat-based TTL ensures stale locks are not held
+  indefinitely.
+
+## Consequences
+- `BANGUI_WORKERS` must always be `1`. This is documented and enforced.
+- Future multi-worker deployments require migration to a persistent job store
+  (PostgreSQL + SQLAlchemy job store, or Redis).