refactor(logging): replace structlog with stdlib logging compat layer

- Remove structlog dependency from backend/pyproject.toml
- Add app.utils.logging_compat shim for keyword-arg logging API
- Add app.utils.json_formatter for JSON log output with extra fields
- Update all backend modules to use logging_compat.get_logger()
- Update docstrings in log_sanitizer.py and json_formatter.py
- Update test comment in test_async_utils.py
- Record 406 failing tests in Docs/Tasks.md for tracking
2026-05-10 13:37:54 +02:00
parent 7790736918
commit 7ec80fdeec
81 changed files with 3013 additions and 634 deletions

@@ -1238,8 +1238,6 @@ The `setup_completed = "1"` key is still written for backward compatibility with
- **GeoCache** — `GeoCache` instance is created at startup with a configurable `allow_http_fallback` flag and stored on `app.state.geo_cache`. It implements a primary + fallback resolution strategy: (1) try local MaxMind GeoLite2-Country MMDB database (primary, encrypted, no network traffic), (2) if unavailable/no result and allowed, fall back to ip-api.com HTTP API (unencrypted, disabled by default for security). Encapsulates in-memory lookup cache, negative cache for unresolvable IPs (5-minute TTL), dirty set for persistence, and thread-safe async locking. Cache is loaded from the `geo_cache` SQLite table on startup. New resolutions are accumulated in memory and periodically flushed to the database by the `geo_cache_flush` background task. Stale entries are re-resolved by the `geo_re_resolve` task. Injected into routes and tasks via FastAPI's dependency system. See Backend-Development.md § IP Geolocation Resolution for setup and security details.
- **Runtime state** (`RuntimeState` in `app.utils.runtime_state`) — stores mutable application state: `server_status` (fail2ban online/offline), `last_activation` (jail activation tracking), `pending_recovery` (crash detection), `runtime_settings` (effective configuration), and service-specific state holders like `jail_service_state` (`JailServiceState` for jail capability detection cache). RuntimeState fields are managed through dedicated functions (e.g., `record_activation()`, `clear_pending_recovery()`) and via dependency injection to services. Service-specific state (like `JailServiceState`) is nested within `RuntimeState` to keep all mutable state in one controlled location. **⚠️ RuntimeState is process-local and only safe when BanGUI runs as a single asyncio worker.** Mutations must not span `await` points (cooperative scheduling within a single event loop is safe). In multi-worker deployments, each process has its own copy — logouts from worker A don't affect worker B's cache, health status updates are per-worker, and activation tracking is unreliable. BanGUI enforces single-worker mode (TASK-002) to prevent this issue. For future multi-worker support, replace RuntimeState with a shared coordination backend (Redis, shared memory, database). See `app/utils/runtime_state.py` module docstring for details.
- **Setup-completion flag** — once `is_setup_complete()` returns `True`, the result is stored in `app.state._setup_complete_cached`. The `SetupRedirectMiddleware` skips the DB query on all subsequent requests, removing 1 SQL query per request for the common post-setup case. The completion flag is only written after the runtime database is successfully initialized and all initial setup settings are persisted, preventing a failed setup from permanently bypassing the setup wizard.
- **Login Rate Limiting** — the `/api/auth/login` endpoint employs exponential backoff to defend against brute-force attacks. Each failed login attempt is recorded per client IP, and subsequent attempts within the backoff window return HTTP 429 Too Many Requests. The penalty grows exponentially with each consecutive failure (2s, 4s, 8s, up to 10s max), ensuring attackers face rapidly increasing delays. This is complemented by bcrypt password hashing (≈100ms per attempt), which adds computational resistance without blocking legitimate users. The backoff counter resets after 60 seconds without additional failures. The rate limiter is process-local and tracks failures in memory via `app.utils.rate_limiter.RateLimiter`, stored on `app.state.login_rate_limiter`. Client IP detection respects proxy headers (`X-Forwarded-For`, `X-Real-IP`) only from configured trusted proxies, preventing header spoofing attacks. In multi-worker deployments, each worker has independent rate limit counters; BanGUI enforces single-worker mode (TASK-002) to prevent attackers from bypassing limits by distributing requests across workers.
### 8.1 CSRF Protection
State-mutating endpoints (POST, PUT, DELETE, PATCH) that use cookie-based authentication are protected against Cross-Site Request Forgery (CSRF) attacks via a **custom header check middleware**.

@@ -1665,6 +1665,37 @@ async def get_jail(...) -> JailDetailResponse:
---
### 7.7 Third-Party Library Log Levels
Application code must use **structlog** for all logging. Third-party libraries that emit logs through Python's standard `logging` module are configured centrally in `backend/app/main.py::_configure_logging()`.
**Current overrides:**
| Library | Logger | Level | Reason |
|---------|--------|-------|--------|
| APScheduler | `apscheduler` | `WARNING` | Routine scheduler polling is too verbose at DEBUG. |
| aiosqlite | `aiosqlite` | `WARNING` | Database operation traces clutter logs. |
**Adding a new override:**
```python
# In backend/app/main.py, inside _configure_logging()
logging.getLogger("new_library").setLevel(logging.WARNING)
```
- Prefer `WARNING` over `ERROR` so legitimate warnings (e.g., connection retries) are still visible.
- Place the override immediately after `logging.basicConfig()` so it takes effect before any library initializes its own loggers.
**Disabling suppression:**
Set `BANGUI_SUPPRESS_THIRD_PARTY_LOGS=false` to allow APScheduler and aiosqlite to emit their normal DEBUG/INFO logs. This is useful when troubleshooting scheduler or database issues in development.
**Stdlib interception:**
All stdlib logs are intercepted by `structlog.stdlib.ProcessorFormatter` and rendered as JSON. Even third-party library logs therefore appear as structured JSON in `bangui.log`, not plain text.
---
## 8. Error Handling
- Define **custom exception classes** for domain errors (e.g., `JailNotFoundError`, `BanFailedError`).
@@ -2771,41 +2802,6 @@ update = GlobalConfigUpdate(log_target="/etc/passwd") # Raises ValidationError
await config_service.update_global_config(socket_path, update) # Validates again before sending to fail2ban
```
### Login Rate Limiting
The login endpoint (`POST /api/auth/login`) is protected against brute-force attacks using an in-memory exponential backoff rate limiter.
**Design:**
- Uses a `dict[str, deque[float]]` keyed by client IP, storing failed login timestamps within a time window.
- Old failures outside the time window are automatically pruned during validation checks.
- Expired IP entries are cleaned up to prevent unbounded memory growth.
**Rate Limit Rules:**
- **Exponential backoff:** Each failed login attempt incurs a progressively longer delay before the next attempt is allowed:
  - 1st failure: 1 × 2¹ = 2 seconds
  - 2nd failure: 1 × 2² = 4 seconds
  - 3rd failure: 1 × 2³ = 8 seconds
  - 4th+ failures: capped at 10 seconds (max)
- Attempts that arrive during the backoff period are rejected with **HTTP 429 Too Many Requests** and a `Retry-After` header indicating the remaining wait time.
- Each failed login also incurs a bcrypt password hash (~100ms), which adds computational resistance.
- The backoff counter resets after the rate-limit window (60 seconds by default) expires with no new failures.
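The backoff schedule listed above can be sketched as a small pure function. This is an illustration only, not the code in `app.utils.rate_limiter`: the names `backoff_delay` and `retry_after` are hypothetical, and the base of 1 second and the cap of 10 seconds are taken from the list above.

```python
def backoff_delay(consecutive_failures: int, base: float = 1.0, cap: float = 10.0) -> float:
    """Seconds a client must wait after its Nth consecutive failed login.

    Hypothetical helper: base * 2**N, capped at `cap` seconds.
    """
    if consecutive_failures <= 0:
        return 0.0
    # Clamp the exponent so very large counters cannot overflow a float.
    exponent = min(consecutive_failures, 32)
    return min(base * 2**exponent, cap)


def retry_after(consecutive_failures: int, last_failure_at: float, now: float) -> float:
    """Remaining wait in seconds, suitable for a Retry-After header (0 if allowed)."""
    remaining = last_failure_at + backoff_delay(consecutive_failures) - now
    return max(remaining, 0.0)
```

With these assumptions, the first three failures yield 2, 4, and 8 seconds, and every later failure yields the 10-second cap.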
**IP Extraction (Proxy Safety):**
- When behind nginx, the rate limiter reads the real client IP from `X-Forwarded-For` or `X-Real-IP` headers.
- Only trusts these headers when the immediate connection source is in a configured trusted proxy list.
- Prevents attackers from spoofing these headers to bypass rate limits.
- Falls back to the direct connection IP when proxy headers cannot be trusted.
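The trust rule above might look roughly like the following. This is a hedged sketch, not the actual `app.utils.client_ip.get_client_ip()`: the function name `resolve_client_ip`, its signature, the lower-cased header keys, and the choice to walk `X-Forwarded-For` from right to left are all assumptions made for illustration.

```python
import ipaddress
from collections.abc import Mapping, Sequence


def _parse(value: str):
    """Return an ip_address object, or None if the text is not a valid IP."""
    try:
        return ipaddress.ip_address(value.strip())
    except ValueError:
        return None


def resolve_client_ip(
    peer_ip: str,
    headers: Mapping[str, str],
    trusted_proxies: Sequence[str],
) -> str:
    """Hypothetical helper: pick the client IP, honouring proxy headers
    only when the direct peer is a trusted proxy."""
    networks = [ipaddress.ip_network(p, strict=False) for p in trusted_proxies]

    def is_trusted(addr) -> bool:
        return any(addr in net for net in networks)

    peer = _parse(peer_ip)
    if peer is None or not is_trusted(peer):
        # Untrusted source: ignore the headers entirely.
        return peer_ip

    # Walk X-Forwarded-For from the right; the first hop that is not one of
    # our own proxies is the closest address we did not write ourselves.
    for hop in reversed(headers.get("x-forwarded-for", "").split(",")):
        addr = _parse(hop)
        if addr is None:
            break  # empty or malformed entry: stop reading this header
        if not is_trusted(addr):
            return str(addr)

    real_ip = _parse(headers.get("x-real-ip", ""))
    if real_ip is not None:
        return str(real_ip)
    return peer_ip
```

Reading the header from the right means a client-supplied prefix such as `6.6.6.6, <real address>` cannot displace the address appended by the trusted proxy.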
**Process-Local Limitation:**
- The rate limiter is process-local (in-memory). In multi-worker deployments (e.g., Gunicorn with 4 workers), each worker maintains its own rate limit counter.
- This is acceptable because the single-worker constraint is enforced elsewhere. See [TASK-002/003 notes](Instructions.md) for details.
**Implementation:**
- Rate limiter: `app.utils.rate_limiter.RateLimiter`
- IP extraction: `app.utils.client_ip.get_client_ip()`
- Dependency: `LoginRateLimiterDep` in `app.dependencies`
### Global Rate Limiting
In addition to login-specific rate limiting, all API endpoints are protected by global per-IP rate limiting to prevent resource exhaustion, CPU spikes, and network bandwidth attacks from malicious or misconfigured clients.

@@ -98,6 +98,44 @@ log.error("fail2ban_start_failed", stdout=stdout_raw, stderr=stderr_raw) # Neve
---
## Third-Party Library Logs
BanGUI uses **structlog** for all application logs, but third-party libraries often emit plain text through Python's standard `logging` module. To maintain uniform JSON output and reduce noise, the following libraries have their log levels overridden to `WARNING`:
| Library | Logger Name | Level | Rationale |
|---------|-------------|-------|-----------|
| APScheduler | `apscheduler` | `WARNING` | Suppresses routine scheduler polling ("Looking for jobs to run", "Next wakeup is due at...") while preserving job failure warnings. |
| aiosqlite | `aiosqlite` | `WARNING` | Suppresses database operation traces and connection details while preserving connection errors. |
These overrides are applied in `backend/app/main.py::_configure_logging()` immediately after `logging.basicConfig()`.
### Disabling Suppression
Set the environment variable `BANGUI_SUPPRESS_THIRD_PARTY_LOGS=false` to allow APScheduler and aiosqlite to emit their normal DEBUG/INFO logs. This is useful when troubleshooting scheduler or database issues in development.
```bash
BANGUI_SUPPRESS_THIRD_PARTY_LOGS=false python -m uvicorn app.main:create_app
```
When suppression is disabled, the loggers inherit the application's `BANGUI_LOG_LEVEL` (e.g., `debug`).
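One way the toggle could be wired up is sketched below. It is not the body of `_configure_logging()`: the helper name `apply_third_party_overrides`, the default of `true`, and the exact string parsing are assumptions. The only behavior taken from the text is that the two loggers sit at `WARNING` when suppressed and otherwise inherit the application level.

```python
import logging
import os
from collections.abc import Mapping

NOISY_LOGGERS = ("apscheduler", "aiosqlite")


def apply_third_party_overrides(environ: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: apply or lift the WARNING overrides.

    Returns True when suppression is active.
    """
    raw = environ.get("BANGUI_SUPPRESS_THIRD_PARTY_LOGS", "true")
    suppress = raw.strip().lower() != "false"  # assumed parsing rule

    # NOTSET makes a logger defer to its parent, so the library loggers
    # pick up whatever level the root (application) logger uses.
    level = logging.WARNING if suppress else logging.NOTSET
    for name in NOISY_LOGGERS:
        logging.getLogger(name).setLevel(level)
    return suppress
```

Setting the level to `logging.NOTSET` rather than `logging.DEBUG` is what lets the loggers follow the application level instead of always emitting everything.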
### Uniform JSON Formatting
All stdlib logs — including those from third-party libraries — are intercepted by `structlog.stdlib.ProcessorFormatter` and rendered as JSON. This ensures every log line in `bangui.log` is machine-readable, regardless of its source.
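To illustrate the idea of rendering every stdlib record as one JSON line, here is a stdlib-only sketch. It deliberately does not use `structlog.stdlib.ProcessorFormatter`, and it is not the project's `app.utils.json_formatter`; the class name `JsonLineFormatter` and the output keys (`timestamp`, `level`, `logger`, `event`) are assumptions chosen for the example.

```python
import json
import logging

# Attribute names every LogRecord carries; anything else came in via `extra=`.
_STANDARD_ATTRS = frozenset(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {
    "message",
    "asctime",
}


class JsonLineFormatter(logging.Formatter):
    """Hypothetical formatter: one JSON object per log record."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname.lower(),
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Copy over extra fields attached by the caller.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        # default=str keeps non-serialisable values from raising.
        return json.dumps(payload, default=str)
```

Because the formatter is attached to handlers rather than loggers, records from third-party loggers that propagate to the root handler are rendered the same way as application records.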
### Adding New Overrides
When integrating a new library that emits verbose DEBUG logs:
```python
# In backend/app/main.py, inside _configure_logging()
logging.getLogger("new_library").setLevel(logging.WARNING)
```
Use `WARNING` as the default to still capture errors and warnings. Only use `ERROR` if the library is exceptionally noisy and its warnings are not actionable.
---
## Structured Logging Best Practices
### Log Levels

@@ -418,6 +418,65 @@ Then set it in your `.env` file or environment variables.
---
## Enabling Debug Logs for Third-Party Libraries
BanGUI suppresses verbose DEBUG logs from APScheduler and aiosqlite by default (see `Docs/Observability.md`). When troubleshooting scheduler or database issues, you can temporarily re-enable these logs.
### Quick method (environment variable)
Set `BANGUI_SUPPRESS_THIRD_PARTY_LOGS=false` and ensure `BANGUI_LOG_LEVEL=debug`:
```bash
BANGUI_SUPPRESS_THIRD_PARTY_LOGS=false \
BANGUI_LOG_LEVEL=debug \
python -m uvicorn app.main:create_app
```
This allows APScheduler and aiosqlite to inherit the application log level without editing code.
### Code method (for permanent changes)
If you need to change the level for a specific library only, edit `backend/app/main.py` inside `_configure_logging()`:
```python
logging.getLogger("apscheduler").setLevel(logging.DEBUG)
```
Restart the application. You will see scheduler polling messages such as:
- `Looking for jobs to run`
- `Next wakeup is due at ...`
- `Running job ...`
### Reverting
Remove the environment variable or code change and restart. When suppression is re-enabled, the loggers return to `WARNING` level.
---
## Plain Text Logs Still Appearing
If `bangui.log` contains plain text lines that are not JSON, a library is bypassing structlog's `ProcessorFormatter`.
**Diagnosis:**
1. Identify the logger name in the plain text line (usually at the start of the line).
2. Check whether the logger is listed in `backend/app/main.py::_configure_logging()` under the third-party overrides.
3. Verify that `structlog.stdlib.ProcessorFormatter` is attached to all handlers:
```python
for handler in handlers:
    handler.setFormatter(formatter)
```
**Common causes:**
| Cause | Fix |
|-------|-----|
| Library initializes its own handler after startup | Add `logging.getLogger("library_name").setLevel(logging.WARNING)` in `_configure_logging()`. |
| Custom handler added outside `_configure_logging()` | Ensure all handlers use `structlog.stdlib.ProcessorFormatter`. |
| Log emitted before `_configure_logging()` is called | Move logging configuration earlier in the lifespan or app factory. |
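For the first two causes in the table, a quick check is to list every handler whose formatter is not the expected type. The sketch below is a debugging aid under stated assumptions: `find_rogue_handlers` is a hypothetical helper, and the caller passes in whichever formatter class the application uses.

```python
import logging


def find_rogue_handlers(expected_formatter_type: type) -> list[tuple[str, str]]:
    """Hypothetical helper: return (logger name, handler class name) pairs
    for handlers that do not use the expected formatter class."""
    findings = []
    # The root logger is not in loggerDict, so add it explicitly.
    candidates = [logging.getLogger()]
    candidates += [
        lg
        for lg in logging.root.manager.loggerDict.values()
        if isinstance(lg, logging.Logger)  # skip PlaceHolder entries
    ]
    for logger in candidates:
        for handler in logger.handlers:
            if not isinstance(handler.formatter, expected_formatter_type):
                findings.append((logger.name, type(handler).__name__))
    return findings
```

Running this after startup, for example from a debug shell, shows which logger owns the handler that produces the plain text lines.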
---
## Getting Help
If issues persist after following this guide:
