Update observability docs and task utilities

- Add Observability.md documentation
- Standardize task logging with correlation_id support
- Add log_sanitizer utility for PII masking
- Update Tasks.md tracking
- Update geo_cache tasks and other task modules with correlation_id

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-03 11:52:09 +02:00
parent 7b93499551
commit 0133489920
17 changed files with 582 additions and 124 deletions
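
Note on the sanitizer documented in the Observability.md hunk below: the commit page does not show the body of `sanitize_for_logging()`, so the following is only a minimal sketch of how the documented redaction table could be implemented. The function name `sanitize_sketch` and the `_REDACTIONS` list are hypothetical. The password, API key, and token expressions follow the snippet in the removed Issue #19 text; the remaining expressions are assumptions inferred from the table and may differ from the real utility.

```python
import re

# Hypothetical (pattern, replacement) pairs mirroring the documented table.
# The first three regexes follow the Issue #19 snippet; the rest are assumptions.
_REDACTIONS = [
    (re.compile(r"password[=:]\S+", re.IGNORECASE), "password=***"),
    (re.compile(r"api[_-]?key[=:]\S+", re.IGNORECASE), "api_key=***"),
    (re.compile(r"token[=:]\S+", re.IGNORECASE), "token=***"),
    (re.compile(r"Authorization:\s*Bearer\s+\S+", re.IGNORECASE), "Authorization: ***"),
    (re.compile(r"secret[=:]\S+", re.IGNORECASE), "secret=***"),
    (re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"), "*** PRIVATE KEY ***"),
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "AKIA***"),
]


def sanitize_sketch(text: str) -> str:
    """Apply each redaction in order and return the masked text."""
    for pattern, replacement in _REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(sanitize_sketch("login failed: password=Secret123 api-key=key123"))
```

A table-driven list keeps the documented patterns and the code easy to compare side by side. This sketch masks only the key header of a PEM block, as the table describes, not the key body.
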
--- a/Docs/Observability.md
+++ b/Docs/Observability.md
@@ -62,6 +62,40 @@ log.info("password_check", password=password_hash)  # Never!

 Structlog provides context variable filtering to prevent accidental logging of sensitive data. Code reviews must verify compliance with this rule.

+### Log Sanitization
+
+All external output (subprocess results, API responses, config file contents) passed to structlog **must** be sanitized first using `sanitize_for_logging()` from `app.utils.log_sanitizer`.
+
+This prevents sensitive data — passwords, API keys, tokens, private keys — from leaking into logs.
+
+```python
+from app.utils.log_sanitizer import sanitize_for_logging
+
+# ✓ Correct: Sanitize before logging
+log.error(
+    "fail2ban_start_failed",
+    command=" ".join(start_cmd_parts),
+    returncode=process.returncode,
+    stdout=sanitize_for_logging(stdout.decode("utf-8", errors="replace")),
+    stderr=sanitize_for_logging(stderr.decode("utf-8", errors="replace")),
+)
+
+# ✗ Wrong: Raw output may contain secrets
+log.error("fail2ban_start_failed", stdout=stdout_raw, stderr=stderr_raw)  # Never!
+```
+
+`sanitize_for_logging()` redacts the following patterns:
+
+| Pattern | Example match | Replacement |
+|---------|---------------|-------------|
+| `password=X` | `password=Secret123` | `password=***` |
+| `api_key=X` / `api-key=X` | `api_key=key123` | `api_key=***` |
+| `token=X` | `token=eyJhbG...` | `token=***` |
+| `Authorization: Bearer X` | `Authorization: Bearer tok...` | `Authorization: ***` |
+| `secret=X` | `secret=myvalue` | `secret=***` |
+| `-----BEGIN RSA PRIVATE KEY-----` | (key header) | `*** PRIVATE KEY ***` |
+| `AKIA...` | `AKIAIOSFODNN7EXAMPLE` | `AKIA***` |
+
 ---

 ## Structured Logging Best Practices
@@ -102,6 +136,35 @@ log.info("user_action", action="create_jail")  # Automatically includes correlat
 structlog.contextvars.clear_contextvars()
 ```

+### Background Task Correlation
+
+Background tasks (APScheduler jobs) run outside the HTTP request context.
+Use `app.utils.correlation` to propagate correlation IDs through tasks:
+
+```python
+from app.utils.correlation import get_correlation_id, reset_correlation_id, set_correlation_id
+
+async def my_background_task(correlation_id: str | None = None) -> None:
+    # Generate a new ID if not provided (scheduled tasks have no parent request)
+    if correlation_id is None:
+        import uuid
+        correlation_id = str(uuid.uuid4())
+
+    # Set the correlation ID for all logs in this task
+    token = set_correlation_id(correlation_id)
+    try:
+        log.info("task_started")  # Now includes correlation_id
+        # ... task logic ...
+    finally:
+        reset_correlation_id(token)
+
+# When scheduling, optionally pass the current correlation ID:
+# scheduler.add_job(my_background_task, kwargs={"correlation_id": get_correlation_id()})
+```
+
+Scheduled tasks (no parent request) generate a fresh UUID for each run.
+Tasks triggered by a request inherit the request's correlation ID.
+
 ### Event Naming Convention

 Use snake_case for event names, prefixed with the component or module name:
--- a/Docs/Tasks.md
+++ b/Docs/Tasks.md
@@ -1,103 +1,3 @@
-### Issue #18: MEDIUM - Configuration Validation Missing at Startup
-
-**Where found**: 
-- `backend/app/config.py` (lines 37-95)
-- `database_path` has no validation
-- `fail2ban_socket` not verified to exist
-- Hard-coded paths assumed in Docker
-
-**Why this is needed**: 
-Configuration errors not caught at startup:
-- Database path doesn't exist - fails on first DB operation (confusing error)
-- fail2ban socket wrong path - only fails when health check runs
-- Directory not writable - discovered hours after deployment
-
-**Goal**: 
-Validate all configuration at startup with clear error messages.
-
-**What to do**:
-1. Add validators to config fields:
-   ```python
-   @field_validator("database_path")
-   def validate_db_path(cls, v):
-       path = Path(v)
-       parent = path.parent
-       
-       if not parent.exists():
-           raise ValueError(
-               f"Database parent directory does not exist: {parent}\n"
-               f"Create it with: mkdir -p {parent}"
-           )
-       
-       if not os.access(parent, os.W_OK):
-           raise ValueError(
-               f"Database directory not writable: {parent}\n"
-               f"Fix with: chmod 755 {parent}"
-           )
-       
-       return v
-   ```
-2. Validate fail2ban socket exists and is readable
-3. Verify session secret is sufficiently long
-4. Check environment variables are set
-5. Provide actionable error messages
-
-**Possible traps and issues**:
-- Validation might be too strict for some deployments
-- Need to handle cases where files don't exist yet but will be created
-- Docker initialization order might delay file creation
-
-**Docs changes needed**:
-- Document required directories and permissions
-- Create setup validation troubleshooting guide
-
-**Doc references**:
-- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "5.2 Missing Configuration Validation"
-
----
-
-### Issue #19: MEDIUM - Sensitive Data Could Leak in Logs
-
-**Where found**: 
-- `backend/app/utils/fail2ban_client.py` (line 148) - Logs subprocess output without sanitization
-- Could contain passwords, API keys from config files
-
-**Why this is needed**: 
-If subprocess output contains secrets, logs become security liability and violate compliance requirements.
-
-**Goal**: 
-Sanitize logs to remove sensitive information patterns.
-
-**What to do**:
-1. Create sanitizer function:
-   ```python
-   def sanitize_for_logging(text: str) -> str:
-       # Remove passwords
-       text = re.sub(r'password[=:]\S+', 'password=***', text, flags=re.IGNORECASE)
-       # Remove API keys
-       text = re.sub(r'api[_-]?key[=:]\S+', 'api_key=***', text, flags=re.IGNORECASE)
-       # Remove tokens
-       text = re.sub(r'token[=:]\S+', 'token=***', text, flags=re.IGNORECASE)
-       return text
-   ```
-2. Apply to all subprocess output and external responses
-3. Add to logging middleware
-4. Audit existing logs for sensitive data
-
-**Possible traps and issues**:
-- Patterns might miss some sensitive data formats
-- Over-sanitization might hide helpful debug info
-- Performance cost if sanitizing large outputs
-
-**Docs changes needed**:
-- Add logging best practices guide
-- Document what's sanitized
-
-**Doc references**:
-- DETAILED_FINDINGS.md - Issue #25 "Sensitive Data in Logs"
-
----
-
 ### Issue #20: MEDIUM - No Correlation ID in Background Tasks

 **Where found**: