Update documentation

- Update Backend-Development.md with recent changes - Update Tasks.md with current status Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 08:16:03 +02:00
parent e86ab6dad1
commit ace8930482
2 changed files with 199 additions and 56 deletions
--- a/Docs/Backend-Development.md
+++ b/Docs/Backend-Development.md
@@ -705,35 +705,198 @@ async def test_migration_2_is_atomic(tmp_path: Path) -> None:

 ---

-## 7. Logging
+## 7. Structured Logging Policy

- Use **structlog** for every log message.
- Bind contextual key-value pairs — never format strings manually.
- Log levels: `debug` for development detail, `info` for operational events, `warning` for recoverable issues, `error` for failures, `critical` for fatal problems.
- **Never log sensitive data** (passwords, tokens, session tokens, raw credentials, private keys).
-  - For session correlation without leaking token material, use a one-way hash fragment: `hashlib.sha256(token.encode()).hexdigest()[:12]`.
-  - Use numeric database IDs for entity correlation instead of raw identifiers: `session_id=session.id` instead of `token=session.token`.
+All logging in BanGUI services and tasks must use **structlog** for consistent, queryable event tracking. This policy defines when and how to log at each level.
+
+### 7.1 Logging Levels and When to Use Them
+
+**INFO** — User-facing operations and state changes
+- Use for significant operational events that the operator should know about.
+- Examples: service startup/completion, resource creation/deletion, state transitions, successful operations with business impact.
+- Never excessive — keep volume reasonable to maintain log clarity.
+- Include relevant context: resource IDs, counts, configuration changes.

 ```python
-import structlog
-import hashlib
+log.info("jail_activated", jail=name)
+log.info("blocklist_source_created", id=new_id, name=name, url=url)
+log.info("session_cleanup_ran", deleted_count=count, cutoff_time=now_iso)
+```

-log: structlog.stdlib.BoundLogger = structlog.get_logger()
+**WARNING** — Recoverable failures, degraded functionality, unexpected but handled conditions
+- Use for issues that are not fatal but indicate something is wrong or suboptimal.
+- Examples: missing optional config, fallback behavior triggered, non-critical API call failed, parsing errors, missing resources that have workarounds.
+- Include error details and context to enable investigation.
+- Do NOT use for expected error paths (e.g., wrong password in login attempt).

-async def ban_ip(ip: str, jail: str) -> None:
-    log.info("banning_ip", ip=ip, jail=jail)
+```python
+log.warning("jail_status_parse_error", jail=name, error=str(exc))
+log.warning("geo_lookup_failed", ip=ip, error=type(exc).__name__)
+log.warning("geoip_mmdb_not_found", path=mmdb_path)
+```
+
+**ERROR** — Fatal/unrecoverable failures within a request or task
+- Use for errors that prevent an operation from completing.
+- Examples: database write failed, critical resource is missing, state is corrupted.
+- Include the full error context to enable debugging.
+- Pair with exception handling — if you log an error, you've decided the caller should handle it or return a failure.
+
+```python
+log.error("jail_activation_rollback_restore_failed", jail=name, error=str(exc))
+log.error("fail2ban_probe_parse_error", error=str(exc))
+```
+
+**EXCEPTION** — Unhandled exceptions in background tasks and scheduled jobs
+- Use `log.exception()` (not `log.error()`) in catch-all exception handlers to automatically capture the full traceback.
+- This level is ONLY for surprises that should never happen in production.
+- Use in task callback exception handlers and top-level task runners.
+
+```python
+try:
+    result = await blocklist_service.import_all(...)
+except Exception:
+    log.exception("blocklist_import_unexpected_error")
+```
+
+**DEBUG** — Low-level details for development and troubleshooting
+- Use for details too verbose for normal operation (e.g., successful lookups, parsed files, state transitions, loop iterations).
+- Include data samples and validation results.
+- Safe to leave in the code — debug logs are not emitted by default in production.
+
+```python
+log.debug("geo_lookup_success_mmdb", ip=ip, country=result.country_code)
+log.debug("action_file_parsed", name=raw.name)
+log.debug("backend_cmd_supported_detected")
+```
+
+### 7.2 Event Naming Convention
+
+All log event names must follow a consistent pattern for queryability:
+
+**Pattern:** `{domain}_{entity_or_action}[_{result_or_detail}]`
+
+- **Domain** — The service or feature area (e.g., `jail`, `blocklist`, `geo`, `auth`, `ban`).
+- **Entity or Action** — The noun or verb describing what happened (e.g., `activated`, `created`, `failed`, `synced`).
+- **Result or Detail** (optional) — Additional specificity for complex scenarios (e.g., `_restore_failed`, `_non_200`, `_no_fallback`).
+
+**Examples (organized by domain):**
+
+| Domain | INFO | WARNING | ERROR | DEBUG |
+|--------|------|---------|-------|-------|
+| **jail** | `jail_activated`, `jail_deactivated`, `jail_reloaded` | `jail_status_parse_error`, `jail_rollback_failed` | `jail_activation_rollback_restore_failed` | `jail_file_parsed`, `backend_cmd_supported_detected` |
+| **geo** | `geo_cache_loaded_from_db`, `geo_flush_dirty_complete` | `geo_lookup_failed`, `geo_persist_failed` | - | `geo_lookup_success_mmdb`, `geo_cache_cleanup_ran` |
+| **blocklist** | `blocklist_import_starting`, `blocklist_source_created` | `blocklist_schedule_invalid`, `blocklist_preview_failed` | - | `blocklist_ban_failed` |
+| **ban** | `active_bans_fetched`, `all_ips_unbanned` | `ban_service_geo_lookup_failed` | `ban_service_geo_lookup_unexpected_error` | `ban_entry_parse_error` |
+| **auth** | `bangui_login_success`, `bangui_logout` | `bangui_login_wrong_password`, `bangui_login_no_hash` | - | - |
+| **config** | `filter_created`, `action_updated` | `filter_read_error`, `action_d_not_found` | - | `filter_file_parsed` |
+| **setup** | `bangui_setup_started`, `bangui_setup_completed` | - | `bangui_setup_failed` | - |
+
+**Key rules:**
+- Use **snake_case** (never camelCase or PascalCase).
+- Keep event names short but descriptive — aim for 2–4 words.
+- Use consistent terminology across the codebase (e.g., always `_created`, never `_added` or `_new`).
+- Prefix task/background job events with the job name (e.g., `blocklist_import_starting`, `session_cleanup_ran`).
+
+### 7.3 Structured Context and Key-Value Pairs
+
+Always log with **structured context** — key-value pairs that make logs queryable and analyzable.
+
+**Essential patterns:**
+
+```python
+# Operation with count
+log.info("active_bans_fetched", total=len(bans))
+
+# Resource manipulation with ID
+log.info("blocklist_source_deleted", id=source_id)
+
+# State transition with reason or error
+log.warning("geo_persist_failed", ip=ip, error=type(exc).__name__)
+
+# Time-bounded operation
+log.info("session_cleanup_ran", deleted_count=count, cutoff_time=now_iso)
+
+# Config or feature change
+log.info("blocklist_schedule_updated", frequency=config.frequency, hour=config.hour)
+
+# Batch operation with metrics
+log.info("blocklist_import_finished", total_imported=result.total_imported, 
+         total_skipped=result.total_skipped, errors=result.errors_count)
+```
+
+**What to include:**
+- Resource IDs (jail names, IP addresses, source IDs).
+- Counts and metrics (rows processed, items synced, errors).
+- Configuration or decision points (enabled/disabled flags, thresholds).
+- Timestamps and durations for long-running operations (optional but useful).
+- Error types and short error messages (use `type(exc).__name__` or `str(exc)` depending on context).
+
+**What to NEVER log:**
+- Sensitive data: passwords, tokens, session IDs, API keys, hashes.
+  - For session correlation without leaking token material, use a one-way hash fragment: `hashlib.sha256(token.encode()).hexdigest()[:12]`.
+  - Use numeric database IDs for entity correlation instead of raw identifiers: `session_id=session.id` instead of `token=session.token`.
+- Full exception tracebacks in INFO/WARNING (use `log.exception()` only in catch-all handlers).
+- Redundant system time (structlog adds `timestamp` automatically).
+- User PII in most cases (name, email, phone, etc.) — unless directly relevant to debugging and anonymized.
+
+### 7.4 Background Tasks and Scheduled Jobs
+
+Every background task (APScheduler job) must follow this pattern:
+
+1. **On startup:** Log `{job_name}_scheduled` with interval or cron expression.
+2. **On execution:** Log `{job_name}_starting` (INFO) and `{job_name}_finished` or `{job_name}_ran` (INFO) with metrics.
+3. **On exception:** Use `log.exception("{job_name}_unexpected_error")` in the top-level try/except.
+
+```python
+# app/tasks/blocklist_import.py
+async def _run_import_with_resources(settings: Settings, http_session: ClientSession) -> None:
+    """APScheduler callback that imports all enabled blocklist sources."""
+    log.info("blocklist_import_starting")
    try:
-        await _execute_ban(ip, jail)
-        log.info("ip_banned", ip=ip, jail=jail)
-    except BanError as exc:
-        log.error("ban_failed", ip=ip, jail=jail, error=str(exc))
-        raise
+        result = await blocklist_service.import_all(...)
+        log.info("blocklist_import_finished",
+                 total_imported=result.total_imported,
+                 total_skipped=result.total_skipped,
+                 errors=result.errors_count)
+    except Exception:
+        log.exception("blocklist_import_unexpected_error")

-async def logout_session(db: aiosqlite.Connection, token: str) -> None:
-    # Use a one-way hash for token correlation in logs
-    token_hash = hashlib.sha256(token.encode()).hexdigest()[:12]
-    await session_repo.delete_session(db, token)
-    log.info("session_terminated", token_hash=token_hash)
+# Register and log in the lifespan or register function:
+log.info("blocklist_import_scheduled", interval_seconds=INTERVAL_SECONDS)
+```
+
+### 7.5 Service Functions and Methods
+
+Service functions should log at entry/exit for significant operations, or when errors occur:
+
+**Entry logging** (optional, use for complex or long-running operations):
+```python
+async def import_all(...) -> ImportResult:
+    log.debug("blocklist_import_starting", count=len(sources))
+    try:
+        ...
+    except SomeError:
+        log.warning("blocklist_import_partial_failure", imported=count, error=str(exc))
+```
+
+**Exit/success logging** (log results with metrics):
+```python
+async def get_all_jails(socket_path: str) -> list[JailResponse]:
+    jails = await _fetch_jails(socket_path)
+    log.info("jail_list_fetched", count=len(jails))
+    return jails
+```
+
+**Error handling** (log with context, let caller decide how to respond):
+```python
+try:
+    result = await fetch_external_data(url)
+except TimeoutError:
+    log.warning("external_fetch_timeout", url=url, timeout_seconds=TIMEOUT)
+    raise
+except Exception as exc:
+    log.error("external_fetch_failed", url=url, error=type(exc).__name__)
+    raise
 ```

 ---
@@ -1588,7 +1751,7 @@ atomic_write(path, updated_content)  # Atomic write, auto-cleanup on error

 These principles are **non-negotiable**. Every backend contributor must internalise and apply them daily.

-### 15.1 Clean Code
+### 17.1 Clean Code

 - Write code that **reads like well-written prose** — a new developer should understand intent without asking.
 - **Meaningful names** — variables, functions, and classes must reveal their purpose. Avoid abbreviations (`cnt`, `mgr`, `tmp`) unless universally understood.
@@ -1619,7 +1782,7 @@ async def check(ip, j):
        raise Exception("not found")
 ```

-### 15.2 Separation of Concerns (SoC)
+### 17.2 Separation of Concerns (SoC)

 - Each module, class, and function must have a **single, well-defined responsibility**.
 - **Routers** → HTTP layer only (parse requests, return responses).
@@ -1629,29 +1792,29 @@ async def check(ip, j):
 - **Tasks** → scheduled background jobs.
 - Never mix layers — a router must not execute SQL, and a repository must not raise `HTTPException`.

-### 15.3 Single Responsibility Principle (SRP)
+### 17.3 Single Responsibility Principle (SRP)

 - A class or module should have **one and only one reason to change**.
 - If a service handles both ban management *and* email notifications, split it into `BanService` and `NotificationService`.

-### 15.4 Don't Repeat Yourself (DRY)
+### 17.4 Don't Repeat Yourself (DRY)

 - Extract shared logic into utility functions, base classes, or dependency providers.
 - If the same block of code appears in more than one place, **refactor it** into a single source of truth.
 - But don't over-abstract — premature DRY that couples unrelated features is worse than a little duplication (see **Rule of Three**: refactor when something appears a third time).

-### 15.5 KISS — Keep It Simple, Stupid
+### 17.5 KISS — Keep It Simple, Stupid

 - Choose the simplest solution that works correctly.
 - Avoid clever tricks, premature optimisation, and over-engineering.
 - If a standard library function does the job, prefer it over a custom implementation.

-### 15.6 YAGNI — You Aren't Gonna Need It
+### 17.6 YAGNI — You Aren't Gonna Need It

 - Do **not** build features, abstractions, or config options "just in case".
 - Implement what is required **now**. Extend later when a real need emerges.

-### 15.7 Dependency Inversion Principle (DIP)
+### 17.7 Dependency Inversion Principle (DIP)

 - High-level modules (services) must not depend on low-level modules (repositories) directly. Both should depend on **abstractions** (protocols / interfaces).
 - Use FastAPI's `Depends()` to inject implementations — this makes swapping and testing trivial.
@@ -1940,17 +2103,17 @@ To adopt a Redis backend:
 - TTL must be respected — expired entries must be removed on access.
 - See `app/utils/session_cache.py` for the full Protocol definition and current implementations.

-### 15.8 Composition over Inheritance
+### 17.8 Composition over Inheritance

 - Favour **composing** small, focused objects over deep inheritance hierarchies.
 - Use mixins or protocols only when a clear "is-a" relationship exists; otherwise, pass collaborators as constructor arguments.

-### 15.9 Fail Fast
+### 17.9 Fail Fast

 - Validate inputs as early as possible — at the API boundary with Pydantic, at service entry with assertions or domain checks.
 - Raise specific exceptions immediately rather than letting bad data propagate silently.

-### 15.10 Law of Demeter (Principle of Least Knowledge)
+### 17.10 Law of Demeter (Principle of Least Knowledge)

 - A function should only call methods on:
  1. Its own object (`self`).
@@ -1958,13 +2121,13 @@ To adopt a Redis backend:
  3. Objects it creates.
 - Avoid long accessor chains like `request.state.db.cursor().execute(...)` — wrap them in a meaningful method.

-### 15.11 Defensive Programming
+### 17.11 Defensive Programming

 - Never trust external input — validate and sanitise everything that crosses a boundary (HTTP request, file, socket, environment variable).
 - Handle edge cases explicitly: empty lists, `None` values, negative numbers, empty strings.
 - Use type narrowing and exhaustive pattern matching (`match` / `case`) to eliminate impossible states.

-### 15.12 SSRF Prevention (Server-Side Request Forgery)
+### 17.12 SSRF Prevention (Server-Side Request Forgery)

 When user-supplied URLs are fetched by the backend, validate them before making any HTTP requests:

@@ -1989,8 +2152,6 @@ When user-supplied URLs are fetched by the backend, validate them before making
   - `async validate_blocklist_url(url: AnyHttpUrl) → None`: Async DNS resolution + private IP check.
   - Service layer calls `await validate_blocklist_url(url)` before persisting; router catches `ValueError` and returns 400.

---
-
 ## 18. Quick Reference — Do / Don't

 | Do | Don't |
@@ -1998,7 +2159,8 @@ When user-supplied URLs are fetched by the backend, validate them before making
 | Type every function, variable, return | Leave types implicit |
 | Use `async def` for I/O | Use sync functions for I/O |
 | Validate with Pydantic at the boundary | Pass raw dicts through the codebase |
-| Log with structlog + context keys | Use `print()` or format strings in logs |
+| Log with structlog + context keys (INFO/WARNING/ERROR/DEBUG) | Use `print()` or format strings in logs |
+| Use `log.exception()` in catch-all handlers (captures traceback) | Use `log.error()` for exceptions; let exceptions get lost |
 | Write tests for every feature | Ship untested code |
 | Use `aiohttp` for HTTP calls | Use `requests` |
 | Handle errors with custom exceptions | Use bare `except:` |