Update documentation
- Update Backend-Development.md with recent changes - Update Tasks.md with current status Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit is contained in:
@@ -705,35 +705,198 @@ async def test_migration_2_is_atomic(tmp_path: Path) -> None:
|
||||
|
||||
---
|
||||
|
||||
## 7. Logging
|
||||
## 7. Structured Logging Policy
|
||||
|
||||
- Use **structlog** for every log message.
|
||||
- Bind contextual key-value pairs — never format strings manually.
|
||||
- Log levels: `debug` for development detail, `info` for operational events, `warning` for recoverable issues, `error` for failures, `critical` for fatal problems.
|
||||
- **Never log sensitive data** (passwords, tokens, session tokens, raw credentials, private keys).
|
||||
- For session correlation without leaking token material, use a one-way hash fragment: `hashlib.sha256(token.encode()).hexdigest()[:12]`.
|
||||
- Use numeric database IDs for entity correlation instead of raw identifiers: `session_id=session.id` instead of `token=session.token`.
|
||||
All logging in BanGUI services and tasks must use **structlog** for consistent, queryable event tracking. This policy defines when and how to log at each level.
|
||||
|
||||
### 7.1 Logging Levels and When to Use Them
|
||||
|
||||
**INFO** — User-facing operations and state changes
|
||||
- Use for significant operational events that the operator should know about.
|
||||
- Examples: service startup/completion, resource creation/deletion, state transitions, successful operations with business impact.
|
||||
- Never excessive — keep volume reasonable to maintain log clarity.
|
||||
- Include relevant context: resource IDs, counts, configuration changes.
|
||||
|
||||
```python
|
||||
import structlog
|
||||
import hashlib
|
||||
log.info("jail_activated", jail=name)
|
||||
log.info("blocklist_source_created", id=new_id, name=name, url=url)
|
||||
log.info("session_cleanup_ran", deleted_count=count, cutoff_time=now_iso)
|
||||
```
|
||||
|
||||
log: structlog.stdlib.BoundLogger = structlog.get_logger()
|
||||
**WARNING** — Recoverable failures, degraded functionality, unexpected but handled conditions
|
||||
- Use for issues that are not fatal but indicate something is wrong or suboptimal.
|
||||
- Examples: missing optional config, fallback behavior triggered, non-critical API call failed, parsing errors, missing resources that have workarounds.
|
||||
- Include error details and context to enable investigation.
|
||||
- Do NOT use for expected error paths (e.g., wrong password in login attempt).
|
||||
|
||||
async def ban_ip(ip: str, jail: str) -> None:
|
||||
log.info("banning_ip", ip=ip, jail=jail)
|
||||
```python
|
||||
log.warning("jail_status_parse_error", jail=name, error=str(exc))
|
||||
log.warning("geo_lookup_failed", ip=ip, error=type(exc).__name__)
|
||||
log.warning("geoip_mmdb_not_found", path=mmdb_path)
|
||||
```
|
||||
|
||||
**ERROR** — Fatal/unrecoverable failures within a request or task
|
||||
- Use for errors that prevent an operation from completing.
|
||||
- Examples: database write failed, critical resource is missing, state is corrupted.
|
||||
- Include the full error context to enable debugging.
|
||||
- Pair with exception handling — if you log an error, you've decided the caller should handle it or return a failure.
|
||||
|
||||
```python
|
||||
log.error("jail_activation_rollback_restore_failed", jail=name, error=str(exc))
|
||||
log.error("fail2ban_probe_parse_error", error=str(exc))
|
||||
```
|
||||
|
||||
**EXCEPTION** — Unhandled exceptions in background tasks and scheduled jobs
|
||||
- Use `log.exception()` (not `log.error()`) in catch-all exception handlers to automatically capture the full traceback.
|
||||
- This level is ONLY for surprises that should never happen in production.
|
||||
- Use in task callback exception handlers and top-level task runners.
|
||||
|
||||
```python
|
||||
try:
|
||||
result = await blocklist_service.import_all(...)
|
||||
except Exception:
|
||||
log.exception("blocklist_import_unexpected_error")
|
||||
```
|
||||
|
||||
**DEBUG** — Low-level details for development and troubleshooting
|
||||
- Use for details too verbose for normal operation (e.g., successful lookups, parsed files, state transitions, loop iterations).
|
||||
- Include data samples and validation results.
|
||||
- Safe to leave in the code — debug logs are not emitted by default in production.
|
||||
|
||||
```python
|
||||
log.debug("geo_lookup_success_mmdb", ip=ip, country=result.country_code)
|
||||
log.debug("action_file_parsed", name=raw.name)
|
||||
log.debug("backend_cmd_supported_detected")
|
||||
```
|
||||
|
||||
### 7.2 Event Naming Convention
|
||||
|
||||
All log event names must follow a consistent pattern for queryability:
|
||||
|
||||
**Pattern:** `{domain}_{entity_or_action}[_{result_or_detail}]`
|
||||
|
||||
- **Domain** — The service or feature area (e.g., `jail`, `blocklist`, `geo`, `auth`, `ban`).
|
||||
- **Entity or Action** — The noun or verb describing what happened (e.g., `activated`, `created`, `failed`, `synced`).
|
||||
- **Result or Detail** (optional) — Additional specificity for complex scenarios (e.g., `_restore_failed`, `_non_200`, `_no_fallback`).
|
||||
|
||||
**Examples (organized by domain):**
|
||||
|
||||
| Domain | INFO | WARNING | ERROR | DEBUG |
|
||||
|--------|------|---------|-------|-------|
|
||||
| **jail** | `jail_activated`, `jail_deactivated`, `jail_reloaded` | `jail_status_parse_error`, `jail_rollback_failed` | `jail_activation_rollback_restore_failed` | `jail_file_parsed`, `backend_cmd_supported_detected` |
|
||||
| **geo** | `geo_cache_loaded_from_db`, `geo_flush_dirty_complete` | `geo_lookup_failed`, `geo_persist_failed` | - | `geo_lookup_success_mmdb`, `geo_cache_cleanup_ran` |
|
||||
| **blocklist** | `blocklist_import_starting`, `blocklist_source_created` | `blocklist_schedule_invalid`, `blocklist_preview_failed` | - | `blocklist_ban_failed` |
|
||||
| **ban** | `active_bans_fetched`, `all_ips_unbanned` | `ban_service_geo_lookup_failed` | `ban_service_geo_lookup_unexpected_error` | `ban_entry_parse_error` |
|
||||
| **auth** | `bangui_login_success`, `bangui_logout` | `bangui_login_wrong_password`, `bangui_login_no_hash` | - | - |
|
||||
| **config** | `filter_created`, `action_updated` | `filter_read_error`, `action_d_not_found` | - | `filter_file_parsed` |
|
||||
| **setup** | `bangui_setup_started`, `bangui_setup_completed` | - | `bangui_setup_failed` | - |
|
||||
|
||||
**Key rules:**
|
||||
- Use **snake_case** (never camelCase or PascalCase).
|
||||
- Keep event names short but descriptive — aim for 2–4 words.
|
||||
- Use consistent terminology across the codebase (e.g., always `_created`, never `_added` or `_new`).
|
||||
- Prefix task/background job events with the job name (e.g., `blocklist_import_starting`, `session_cleanup_ran`).
|
||||
|
||||
### 7.3 Structured Context and Key-Value Pairs
|
||||
|
||||
Always log with **structured context** — key-value pairs that make logs queryable and analyzable.
|
||||
|
||||
**Essential patterns:**
|
||||
|
||||
```python
|
||||
# Operation with count
|
||||
log.info("active_bans_fetched", total=len(bans))
|
||||
|
||||
# Resource manipulation with ID
|
||||
log.info("blocklist_source_deleted", id=source_id)
|
||||
|
||||
# State transition with reason or error
|
||||
log.warning("geo_persist_failed", ip=ip, error=type(exc).__name__)
|
||||
|
||||
# Time-bounded operation
|
||||
log.info("session_cleanup_ran", deleted_count=count, cutoff_time=now_iso)
|
||||
|
||||
# Config or feature change
|
||||
log.info("blocklist_schedule_updated", frequency=config.frequency, hour=config.hour)
|
||||
|
||||
# Batch operation with metrics
|
||||
log.info("blocklist_import_finished", total_imported=result.total_imported,
|
||||
total_skipped=result.total_skipped, errors=result.errors_count)
|
||||
```
|
||||
|
||||
**What to include:**
|
||||
- Resource IDs (jail names, IP addresses, source IDs).
|
||||
- Counts and metrics (rows processed, items synced, errors).
|
||||
- Configuration or decision points (enabled/disabled flags, thresholds).
|
||||
- Timestamps and durations for long-running operations (optional but useful).
|
||||
- Error types and short error messages (use `type(exc).__name__` or `str(exc)` depending on context).
|
||||
|
||||
**What to NEVER log:**
|
||||
- Sensitive data: passwords, tokens, session IDs, API keys, hashes.
|
||||
- For session correlation without leaking token material, use a one-way hash fragment: `hashlib.sha256(token.encode()).hexdigest()[:12]`.
|
||||
- Use numeric database IDs for entity correlation instead of raw identifiers: `session_id=session.id` instead of `token=session.token`.
|
||||
- Full exception tracebacks in INFO/WARNING (use `log.exception()` only in catch-all handlers).
|
||||
- Redundant system time (structlog adds `timestamp` automatically).
|
||||
- User PII in most cases (name, email, phone, etc.) — unless directly relevant to debugging and anonymized.
|
||||
|
||||
### 7.4 Background Tasks and Scheduled Jobs
|
||||
|
||||
Every background task (APScheduler job) must follow this pattern:
|
||||
|
||||
1. **On startup:** Log `{job_name}_scheduled` with interval or cron expression.
|
||||
2. **On execution:** Log `{job_name}_starting` (INFO) and `{job_name}_finished` or `{job_name}_ran` (INFO) with metrics.
|
||||
3. **On exception:** Use `log.exception("{job_name}_unexpected_error")` in the top-level try/except.
|
||||
|
||||
```python
|
||||
# app/tasks/blocklist_import.py
|
||||
async def _run_import_with_resources(settings: Settings, http_session: ClientSession) -> None:
|
||||
"""APScheduler callback that imports all enabled blocklist sources."""
|
||||
log.info("blocklist_import_starting")
|
||||
try:
|
||||
await _execute_ban(ip, jail)
|
||||
log.info("ip_banned", ip=ip, jail=jail)
|
||||
except BanError as exc:
|
||||
log.error("ban_failed", ip=ip, jail=jail, error=str(exc))
|
||||
raise
|
||||
result = await blocklist_service.import_all(...)
|
||||
log.info("blocklist_import_finished",
|
||||
total_imported=result.total_imported,
|
||||
total_skipped=result.total_skipped,
|
||||
errors=result.errors_count)
|
||||
except Exception:
|
||||
log.exception("blocklist_import_unexpected_error")
|
||||
|
||||
async def logout_session(db: aiosqlite.Connection, token: str) -> None:
|
||||
# Use a one-way hash for token correlation in logs
|
||||
token_hash = hashlib.sha256(token.encode()).hexdigest()[:12]
|
||||
await session_repo.delete_session(db, token)
|
||||
log.info("session_terminated", token_hash=token_hash)
|
||||
# Register and log in the lifespan or register function:
|
||||
log.info("blocklist_import_scheduled", interval_seconds=INTERVAL_SECONDS)
|
||||
```
|
||||
|
||||
### 7.5 Service Functions and Methods
|
||||
|
||||
Service functions should log at entry/exit for significant operations, or when errors occur:
|
||||
|
||||
**Entry logging** (optional, use for complex or long-running operations):
|
||||
```python
|
||||
async def import_all(...) -> ImportResult:
|
||||
log.debug("blocklist_import_starting", count=len(sources))
|
||||
try:
|
||||
...
|
||||
except SomeError:
|
||||
log.warning("blocklist_import_partial_failure", imported=count, error=str(exc))
|
||||
```
|
||||
|
||||
**Exit/success logging** (log results with metrics):
|
||||
```python
|
||||
async def get_all_jails(socket_path: str) -> list[JailResponse]:
|
||||
jails = await _fetch_jails(socket_path)
|
||||
log.info("jail_list_fetched", count=len(jails))
|
||||
return jails
|
||||
```
|
||||
|
||||
**Error handling** (log with context, let caller decide how to respond):
|
||||
```python
|
||||
try:
|
||||
result = await fetch_external_data(url)
|
||||
except TimeoutError:
|
||||
log.warning("external_fetch_timeout", url=url, timeout_seconds=TIMEOUT)
|
||||
raise
|
||||
except Exception as exc:
|
||||
log.error("external_fetch_failed", url=url, error=type(exc).__name__)
|
||||
raise
|
||||
```
|
||||
|
||||
---
|
||||
@@ -1588,7 +1751,7 @@ atomic_write(path, updated_content) # Atomic write, auto-cleanup on error
|
||||
|
||||
These principles are **non-negotiable**. Every backend contributor must internalise and apply them daily.
|
||||
|
||||
### 15.1 Clean Code
|
||||
### 17.1 Clean Code
|
||||
|
||||
- Write code that **reads like well-written prose** — a new developer should understand intent without asking.
|
||||
- **Meaningful names** — variables, functions, and classes must reveal their purpose. Avoid abbreviations (`cnt`, `mgr`, `tmp`) unless universally understood.
|
||||
@@ -1619,7 +1782,7 @@ async def check(ip, j):
|
||||
raise Exception("not found")
|
||||
```
|
||||
|
||||
### 15.2 Separation of Concerns (SoC)
|
||||
### 17.2 Separation of Concerns (SoC)
|
||||
|
||||
- Each module, class, and function must have a **single, well-defined responsibility**.
|
||||
- **Routers** → HTTP layer only (parse requests, return responses).
|
||||
@@ -1629,29 +1792,29 @@ async def check(ip, j):
|
||||
- **Tasks** → scheduled background jobs.
|
||||
- Never mix layers — a router must not execute SQL, and a repository must not raise `HTTPException`.
|
||||
|
||||
### 15.3 Single Responsibility Principle (SRP)
|
||||
### 17.3 Single Responsibility Principle (SRP)
|
||||
|
||||
- A class or module should have **one and only one reason to change**.
|
||||
- If a service handles both ban management *and* email notifications, split it into `BanService` and `NotificationService`.
|
||||
|
||||
### 15.4 Don't Repeat Yourself (DRY)
|
||||
### 17.4 Don't Repeat Yourself (DRY)
|
||||
|
||||
- Extract shared logic into utility functions, base classes, or dependency providers.
|
||||
- If the same block of code appears in more than one place, **refactor it** into a single source of truth.
|
||||
- But don't over-abstract — premature DRY that couples unrelated features is worse than a little duplication (see **Rule of Three**: refactor when something appears a third time).
|
||||
|
||||
### 15.5 KISS — Keep It Simple, Stupid
|
||||
### 17.5 KISS — Keep It Simple, Stupid
|
||||
|
||||
- Choose the simplest solution that works correctly.
|
||||
- Avoid clever tricks, premature optimisation, and over-engineering.
|
||||
- If a standard library function does the job, prefer it over a custom implementation.
|
||||
|
||||
### 15.6 YAGNI — You Aren't Gonna Need It
|
||||
### 17.6 YAGNI — You Aren't Gonna Need It
|
||||
|
||||
- Do **not** build features, abstractions, or config options "just in case".
|
||||
- Implement what is required **now**. Extend later when a real need emerges.
|
||||
|
||||
### 15.7 Dependency Inversion Principle (DIP)
|
||||
### 17.7 Dependency Inversion Principle (DIP)
|
||||
|
||||
- High-level modules (services) must not depend on low-level modules (repositories) directly. Both should depend on **abstractions** (protocols / interfaces).
|
||||
- Use FastAPI's `Depends()` to inject implementations — this makes swapping and testing trivial.
|
||||
@@ -1940,17 +2103,17 @@ To adopt a Redis backend:
|
||||
- TTL must be respected — expired entries must be removed on access.
|
||||
- See `app/utils/session_cache.py` for the full Protocol definition and current implementations.
|
||||
|
||||
### 15.8 Composition over Inheritance
|
||||
### 17.8 Composition over Inheritance
|
||||
|
||||
- Favour **composing** small, focused objects over deep inheritance hierarchies.
|
||||
- Use mixins or protocols only when a clear "is-a" relationship exists; otherwise, pass collaborators as constructor arguments.
|
||||
|
||||
### 15.9 Fail Fast
|
||||
### 17.9 Fail Fast
|
||||
|
||||
- Validate inputs as early as possible — at the API boundary with Pydantic, at service entry with assertions or domain checks.
|
||||
- Raise specific exceptions immediately rather than letting bad data propagate silently.
|
||||
|
||||
### 15.10 Law of Demeter (Principle of Least Knowledge)
|
||||
### 17.10 Law of Demeter (Principle of Least Knowledge)
|
||||
|
||||
- A function should only call methods on:
|
||||
1. Its own object (`self`).
|
||||
@@ -1958,13 +2121,13 @@ To adopt a Redis backend:
|
||||
3. Objects it creates.
|
||||
- Avoid long accessor chains like `request.state.db.cursor().execute(...)` — wrap them in a meaningful method.
|
||||
|
||||
### 15.11 Defensive Programming
|
||||
### 17.11 Defensive Programming
|
||||
|
||||
- Never trust external input — validate and sanitise everything that crosses a boundary (HTTP request, file, socket, environment variable).
|
||||
- Handle edge cases explicitly: empty lists, `None` values, negative numbers, empty strings.
|
||||
- Use type narrowing and exhaustive pattern matching (`match` / `case`) to eliminate impossible states.
|
||||
|
||||
### 15.12 SSRF Prevention (Server-Side Request Forgery)
|
||||
### 17.12 SSRF Prevention (Server-Side Request Forgery)
|
||||
|
||||
When user-supplied URLs are fetched by the backend, validate them before making any HTTP requests:
|
||||
|
||||
@@ -1989,8 +2152,6 @@ When user-supplied URLs are fetched by the backend, validate them before making
|
||||
- `async validate_blocklist_url(url: AnyHttpUrl) → None`: Async DNS resolution + private IP check.
|
||||
- Service layer calls `await validate_blocklist_url(url)` before persisting; router catches `ValueError` and returns 400.
|
||||
|
||||
---
|
||||
|
||||
## 18. Quick Reference — Do / Don't
|
||||
|
||||
| Do | Don't |
|
||||
@@ -1998,7 +2159,8 @@ When user-supplied URLs are fetched by the backend, validate them before making
|
||||
| Type every function, variable, return | Leave types implicit |
|
||||
| Use `async def` for I/O | Use sync functions for I/O |
|
||||
| Validate with Pydantic at the boundary | Pass raw dicts through the codebase |
|
||||
| Log with structlog + context keys | Use `print()` or format strings in logs |
|
||||
| Log with structlog + context keys (INFO/WARNING/ERROR/DEBUG) | Use `print()` or format strings in logs |
|
||||
| Use `log.exception()` in catch-all handlers (captures traceback) | Use `log.error()` for exceptions; let exceptions get lost |
|
||||
| Write tests for every feature | Ship untested code |
|
||||
| Use `aiohttp` for HTTP calls | Use `requests` |
|
||||
| Handle errors with custom exceptions | Use bare `except:` |
|
||||
|
||||
Reference in New Issue
Block a user