Update documentation

- Update Backend-Development.md with recent changes
- Update Tasks.md with current status

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit is contained in:
2026-04-28 08:16:03 +02:00
parent e86ab6dad1
commit ace8930482
2 changed files with 199 additions and 56 deletions

View File

@@ -705,35 +705,198 @@ async def test_migration_2_is_atomic(tmp_path: Path) -> None:
---
## 7. Logging
## 7. Structured Logging Policy
- Use **structlog** for every log message.
- Bind contextual key-value pairs — never format strings manually.
- Log levels: `debug` for development detail, `info` for operational events, `warning` for recoverable issues, `error` for failures, `critical` for fatal problems.
- **Never log sensitive data** (passwords, tokens, session tokens, raw credentials, private keys).
- For session correlation without leaking token material, use a one-way hash fragment: `hashlib.sha256(token.encode()).hexdigest()[:12]`.
- Use numeric database IDs for entity correlation instead of raw identifiers: `session_id=session.id` instead of `token=session.token`.
All logging in BanGUI services and tasks must use **structlog** for consistent, queryable event tracking. This policy defines when and how to log at each level.
### 7.1 Logging Levels and When to Use Them
**INFO** — User-facing operations and state changes
- Use for significant operational events that the operator should know about.
- Examples: service startup/completion, resource creation/deletion, state transitions, successful operations with business impact.
- Never excessive — keep volume reasonable to maintain log clarity.
- Include relevant context: resource IDs, counts, configuration changes.
```python
import structlog
import hashlib
log.info("jail_activated", jail=name)
log.info("blocklist_source_created", id=new_id, name=name, url=url)
log.info("session_cleanup_ran", deleted_count=count, cutoff_time=now_iso)
```
log: structlog.stdlib.BoundLogger = structlog.get_logger()
**WARNING** — Recoverable failures, degraded functionality, unexpected but handled conditions
- Use for issues that are not fatal but indicate something is wrong or suboptimal.
- Examples: missing optional config, fallback behavior triggered, non-critical API call failed, parsing errors, missing resources that have workarounds.
- Include error details and context to enable investigation.
- Do NOT use for expected error paths (e.g., wrong password in login attempt).
async def ban_ip(ip: str, jail: str) -> None:
log.info("banning_ip", ip=ip, jail=jail)
```python
log.warning("jail_status_parse_error", jail=name, error=str(exc))
log.warning("geo_lookup_failed", ip=ip, error=type(exc).__name__)
log.warning("geoip_mmdb_not_found", path=mmdb_path)
```
**ERROR** — Fatal/unrecoverable failures within a request or task
- Use for errors that prevent an operation from completing.
- Examples: database write failed, critical resource is missing, state is corrupted.
- Include the full error context to enable debugging.
- Pair with exception handling — if you log an error, you've decided the caller should handle it or return a failure.
```python
log.error("jail_activation_rollback_restore_failed", jail=name, error=str(exc))
log.error("fail2ban_probe_parse_error", error=str(exc))
```
**EXCEPTION** — Unhandled exceptions in background tasks and scheduled jobs
- Use `log.exception()` (not `log.error()`) in catch-all exception handlers to automatically capture the full traceback.
- This level is ONLY for surprises that should never happen in production.
- Use in task callback exception handlers and top-level task runners.
```python
try:
result = await blocklist_service.import_all(...)
except Exception:
log.exception("blocklist_import_unexpected_error")
```
**DEBUG** — Low-level details for development and troubleshooting
- Use for details too verbose for normal operation (e.g., successful lookups, parsed files, state transitions, loop iterations).
- Include data samples and validation results.
- Safe to leave in the code — debug logs are not emitted by default in production.
```python
log.debug("geo_lookup_success_mmdb", ip=ip, country=result.country_code)
log.debug("action_file_parsed", name=raw.name)
log.debug("backend_cmd_supported_detected")
```
### 7.2 Event Naming Convention
All log event names must follow a consistent pattern for queryability:
**Pattern:** `{domain}_{entity_or_action}[_{result_or_detail}]`
- **Domain** — The service or feature area (e.g., `jail`, `blocklist`, `geo`, `auth`, `ban`).
- **Entity or Action** — The noun or verb describing what happened (e.g., `activated`, `created`, `failed`, `synced`).
- **Result or Detail** (optional) — Additional specificity for complex scenarios (e.g., `_restore_failed`, `_non_200`, `_no_fallback`).
**Examples (organized by domain):**
| Domain | INFO | WARNING | ERROR | DEBUG |
|--------|------|---------|-------|-------|
| **jail** | `jail_activated`, `jail_deactivated`, `jail_reloaded` | `jail_status_parse_error`, `jail_rollback_failed` | `jail_activation_rollback_restore_failed` | `jail_file_parsed`, `backend_cmd_supported_detected` |
| **geo** | `geo_cache_loaded_from_db`, `geo_flush_dirty_complete` | `geo_lookup_failed`, `geo_persist_failed` | - | `geo_lookup_success_mmdb`, `geo_cache_cleanup_ran` |
| **blocklist** | `blocklist_import_starting`, `blocklist_source_created` | `blocklist_schedule_invalid`, `blocklist_preview_failed` | - | `blocklist_ban_failed` |
| **ban** | `active_bans_fetched`, `all_ips_unbanned` | `ban_service_geo_lookup_failed` | `ban_service_geo_lookup_unexpected_error` | `ban_entry_parse_error` |
| **auth** | `bangui_login_success`, `bangui_logout` | `bangui_login_wrong_password`, `bangui_login_no_hash` | - | - |
| **config** | `filter_created`, `action_updated` | `filter_read_error`, `action_d_not_found` | - | `filter_file_parsed` |
| **setup** | `bangui_setup_started`, `bangui_setup_completed` | - | `bangui_setup_failed` | - |
**Key rules:**
- Use **snake_case** (never camelCase or PascalCase).
- Keep event names short but descriptive — aim for 24 words.
- Use consistent terminology across the codebase (e.g., always `_created`, never `_added` or `_new`).
- Prefix task/background job events with the job name (e.g., `blocklist_import_starting`, `session_cleanup_ran`).
### 7.3 Structured Context and Key-Value Pairs
Always log with **structured context** — key-value pairs that make logs queryable and analyzable.
**Essential patterns:**
```python
# Operation with count
log.info("active_bans_fetched", total=len(bans))
# Resource manipulation with ID
log.info("blocklist_source_deleted", id=source_id)
# State transition with reason or error
log.warning("geo_persist_failed", ip=ip, error=type(exc).__name__)
# Time-bounded operation
log.info("session_cleanup_ran", deleted_count=count, cutoff_time=now_iso)
# Config or feature change
log.info("blocklist_schedule_updated", frequency=config.frequency, hour=config.hour)
# Batch operation with metrics
log.info("blocklist_import_finished", total_imported=result.total_imported,
total_skipped=result.total_skipped, errors=result.errors_count)
```
**What to include:**
- Resource IDs (jail names, IP addresses, source IDs).
- Counts and metrics (rows processed, items synced, errors).
- Configuration or decision points (enabled/disabled flags, thresholds).
- Timestamps and durations for long-running operations (optional but useful).
- Error types and short error messages (use `type(exc).__name__` or `str(exc)` depending on context).
**What to NEVER log:**
- Sensitive data: passwords, tokens, session IDs, API keys, hashes.
- For session correlation without leaking token material, use a one-way hash fragment: `hashlib.sha256(token.encode()).hexdigest()[:12]`.
- Use numeric database IDs for entity correlation instead of raw identifiers: `session_id=session.id` instead of `token=session.token`.
- Full exception tracebacks in INFO/WARNING (use `log.exception()` only in catch-all handlers).
- Redundant system time (structlog adds `timestamp` automatically).
- User PII in most cases (name, email, phone, etc.) — unless directly relevant to debugging and anonymized.
### 7.4 Background Tasks and Scheduled Jobs
Every background task (APScheduler job) must follow this pattern:
1. **On startup:** Log `{job_name}_scheduled` with interval or cron expression.
2. **On execution:** Log `{job_name}_starting` (INFO) and `{job_name}_finished` or `{job_name}_ran` (INFO) with metrics.
3. **On exception:** Use `log.exception("{job_name}_unexpected_error")` in the top-level try/except.
```python
# app/tasks/blocklist_import.py
async def _run_import_with_resources(settings: Settings, http_session: ClientSession) -> None:
"""APScheduler callback that imports all enabled blocklist sources."""
log.info("blocklist_import_starting")
try:
await _execute_ban(ip, jail)
log.info("ip_banned", ip=ip, jail=jail)
except BanError as exc:
log.error("ban_failed", ip=ip, jail=jail, error=str(exc))
raise
result = await blocklist_service.import_all(...)
log.info("blocklist_import_finished",
total_imported=result.total_imported,
total_skipped=result.total_skipped,
errors=result.errors_count)
except Exception:
log.exception("blocklist_import_unexpected_error")
async def logout_session(db: aiosqlite.Connection, token: str) -> None:
# Use a one-way hash for token correlation in logs
token_hash = hashlib.sha256(token.encode()).hexdigest()[:12]
await session_repo.delete_session(db, token)
log.info("session_terminated", token_hash=token_hash)
# Register and log in the lifespan or register function:
log.info("blocklist_import_scheduled", interval_seconds=INTERVAL_SECONDS)
```
### 7.5 Service Functions and Methods
Service functions should log at entry/exit for significant operations, or when errors occur:
**Entry logging** (optional, use for complex or long-running operations):
```python
async def import_all(...) -> ImportResult:
log.debug("blocklist_import_starting", count=len(sources))
try:
...
except SomeError:
log.warning("blocklist_import_partial_failure", imported=count, error=str(exc))
```
**Exit/success logging** (log results with metrics):
```python
async def get_all_jails(socket_path: str) -> list[JailResponse]:
jails = await _fetch_jails(socket_path)
log.info("jail_list_fetched", count=len(jails))
return jails
```
**Error handling** (log with context, let caller decide how to respond):
```python
try:
result = await fetch_external_data(url)
except TimeoutError:
log.warning("external_fetch_timeout", url=url, timeout_seconds=TIMEOUT)
raise
except Exception as exc:
log.error("external_fetch_failed", url=url, error=type(exc).__name__)
raise
```
---
@@ -1588,7 +1751,7 @@ atomic_write(path, updated_content) # Atomic write, auto-cleanup on error
These principles are **non-negotiable**. Every backend contributor must internalise and apply them daily.
### 15.1 Clean Code
### 17.1 Clean Code
- Write code that **reads like well-written prose** — a new developer should understand intent without asking.
- **Meaningful names** — variables, functions, and classes must reveal their purpose. Avoid abbreviations (`cnt`, `mgr`, `tmp`) unless universally understood.
@@ -1619,7 +1782,7 @@ async def check(ip, j):
raise Exception("not found")
```
### 15.2 Separation of Concerns (SoC)
### 17.2 Separation of Concerns (SoC)
- Each module, class, and function must have a **single, well-defined responsibility**.
- **Routers** → HTTP layer only (parse requests, return responses).
@@ -1629,29 +1792,29 @@ async def check(ip, j):
- **Tasks** → scheduled background jobs.
- Never mix layers — a router must not execute SQL, and a repository must not raise `HTTPException`.
### 15.3 Single Responsibility Principle (SRP)
### 17.3 Single Responsibility Principle (SRP)
- A class or module should have **one and only one reason to change**.
- If a service handles both ban management *and* email notifications, split it into `BanService` and `NotificationService`.
### 15.4 Don't Repeat Yourself (DRY)
### 17.4 Don't Repeat Yourself (DRY)
- Extract shared logic into utility functions, base classes, or dependency providers.
- If the same block of code appears in more than one place, **refactor it** into a single source of truth.
- But don't over-abstract — premature DRY that couples unrelated features is worse than a little duplication (see **Rule of Three**: refactor when something appears a third time).
### 15.5 KISS — Keep It Simple, Stupid
### 17.5 KISS — Keep It Simple, Stupid
- Choose the simplest solution that works correctly.
- Avoid clever tricks, premature optimisation, and over-engineering.
- If a standard library function does the job, prefer it over a custom implementation.
### 15.6 YAGNI — You Aren't Gonna Need It
### 17.6 YAGNI — You Aren't Gonna Need It
- Do **not** build features, abstractions, or config options "just in case".
- Implement what is required **now**. Extend later when a real need emerges.
### 15.7 Dependency Inversion Principle (DIP)
### 17.7 Dependency Inversion Principle (DIP)
- High-level modules (services) must not depend on low-level modules (repositories) directly. Both should depend on **abstractions** (protocols / interfaces).
- Use FastAPI's `Depends()` to inject implementations — this makes swapping and testing trivial.
@@ -1940,17 +2103,17 @@ To adopt a Redis backend:
- TTL must be respected — expired entries must be removed on access.
- See `app/utils/session_cache.py` for the full Protocol definition and current implementations.
### 15.8 Composition over Inheritance
### 17.8 Composition over Inheritance
- Favour **composing** small, focused objects over deep inheritance hierarchies.
- Use mixins or protocols only when a clear "is-a" relationship exists; otherwise, pass collaborators as constructor arguments.
### 15.9 Fail Fast
### 17.9 Fail Fast
- Validate inputs as early as possible — at the API boundary with Pydantic, at service entry with assertions or domain checks.
- Raise specific exceptions immediately rather than letting bad data propagate silently.
### 15.10 Law of Demeter (Principle of Least Knowledge)
### 17.10 Law of Demeter (Principle of Least Knowledge)
- A function should only call methods on:
1. Its own object (`self`).
@@ -1958,13 +2121,13 @@ To adopt a Redis backend:
3. Objects it creates.
- Avoid long accessor chains like `request.state.db.cursor().execute(...)` — wrap them in a meaningful method.
### 15.11 Defensive Programming
### 17.11 Defensive Programming
- Never trust external input — validate and sanitise everything that crosses a boundary (HTTP request, file, socket, environment variable).
- Handle edge cases explicitly: empty lists, `None` values, negative numbers, empty strings.
- Use type narrowing and exhaustive pattern matching (`match` / `case`) to eliminate impossible states.
### 15.12 SSRF Prevention (Server-Side Request Forgery)
### 17.12 SSRF Prevention (Server-Side Request Forgery)
When user-supplied URLs are fetched by the backend, validate them before making any HTTP requests:
@@ -1989,8 +2152,6 @@ When user-supplied URLs are fetched by the backend, validate them before making
- `async validate_blocklist_url(url: AnyHttpUrl) → None`: Async DNS resolution + private IP check.
- Service layer calls `await validate_blocklist_url(url)` before persisting; router catches `ValueError` and returns 400.
---
## 18. Quick Reference — Do / Don't
| Do | Don't |
@@ -1998,7 +2159,8 @@ When user-supplied URLs are fetched by the backend, validate them before making
| Type every function, variable, return | Leave types implicit |
| Use `async def` for I/O | Use sync functions for I/O |
| Validate with Pydantic at the boundary | Pass raw dicts through the codebase |
| Log with structlog + context keys | Use `print()` or format strings in logs |
| Log with structlog + context keys (INFO/WARNING/ERROR/DEBUG) | Use `print()` or format strings in logs |
| Use `log.exception()` in catch-all handlers (captures traceback) | Use `log.error()` for exceptions; let exceptions get lost |
| Write tests for every feature | Ship untested code |
| Use `aiohttp` for HTTP calls | Use `requests` |
| Handle errors with custom exceptions | Use bare `except:` |