TASK-009: Mitigate SSRF vulnerability in blocklist URL validation

- Change BlocklistSourceCreate.url from str to AnyHttpUrl (Pydantic type)
  - Rejects non-http schemes (file://, ftp://, etc.) at model boundary

- Add is_private_ip() utility to detect private and reserved ranges:
  - 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 (RFC 1918)
  - 127.0.0.0/8, ::1/128 (loopback)
  - 169.254.0.0/16, fe80::/10 (link-local)
  - IPv6 site-local, multicast, and reserved ranges

- Add async validate_blocklist_url() function:
  - Resolves hostname via DNS using loop.run_in_executor()
  - Rejects if hostname resolves to private/reserved IP
  - Raises ValueError on validation failure

- Integrate validation into service layer:
  - create_source() calls validate_blocklist_url() before persist
  - update_source() conditionally validates if url provided
  - Both raise ValueError on failure

- Update router endpoints with error handling:
  - create_blocklist() and update_blocklist() catch ValueError
  - Return HTTP 400 Bad Request with descriptive error message

- Add comprehensive test coverage (9 new SSRF tests):
  - file://, ftp://, localhost, 127.0.0.1, 192.168.x.x
  - 10.x.x.x, 172.16.x.x, 169.254.x.x (link-local)
  - Valid public URLs (pass validation)
  - All 36 service tests passing

- Update documentation:
  - Features.md: Document URL validation constraints
  - Backend-Development.md: Add SSRF prevention pattern section

Fixes SSRF vulnerability where authenticated users could supply
file://, ftp://, or private IP URLs and the backend would fetch them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 12:57:23 +02:00
parent a5b55d1248
commit 4ab767e3d4
9 changed files with 291 additions and 66 deletions


@@ -783,6 +783,31 @@ To adopt a Redis backend:
- Handle edge cases explicitly: empty lists, `None` values, negative numbers, empty strings.
- Use type narrowing and exhaustive pattern matching (`match` / `case`) to eliminate impossible states.
### 14.12 SSRF Prevention (Server-Side Request Forgery)
When user-supplied URLs are fetched by the backend, validate them before making any HTTP requests:
1. **Use Pydantic's `AnyHttpUrl` type** to restrict schemes to `http://` and `https://` only.
   - Rejects `file://`, `ftp://`, `gopher://`, and other non-http schemes at the model boundary.
2. **Validate resolved IP addresses** before fetching:
   - Parse the hostname and resolve it via DNS (using `socket.getaddrinfo()`).
   - Use the `ipaddress` module to reject private/reserved ranges. `ipaddress.ip_address().is_private` covers most of them but does not flag multicast addresses, so check `is_multicast` as well:
     - RFC 1918: `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`
     - Loopback: `127.0.0.0/8`, `::1/128`
     - Link-local: `169.254.0.0/16`, `fe80::/10`
     - IPv6 site-local, multicast, and reserved ranges.
   - Raise `ValueError` if validation fails; let the router convert it to HTTP 400.
3. **Guard against DNS rebinding**:
   - Validate DNS when a URL is created or updated (performed in the service layer, before the source is persisted).
   - For additional safety, re-validate the connection IP at HTTP client time (e.g., a custom `aiohttp.TCPConnector` can inspect the resolved address during connect).
4. **Example implementation** (see `backend/app/utils/ip_utils.py`):
   - `is_private_ip(ip_str: str) → bool`: Checks if an IP is private/reserved/loopback/link-local.
   - `async validate_blocklist_url(url: AnyHttpUrl) → None`: Async DNS resolution + private IP check.
   - Service layer calls `await validate_blocklist_url(url)` before persisting; router catches `ValueError` and returns 400.
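
The steps above can be sketched with the standard library alone. This is a minimal illustration, not the contents of `backend/app/utils/ip_utils.py`: it assumes the URL arrives as a plain `str` (the project's function takes an `AnyHttpUrl`), it repeats the scheme check that Pydantic would normally perform, and the error messages are invented for the example. Because the check runs once at validation time, it does not by itself protect against DNS rebinding between validation and fetch.

```python
import asyncio
import ipaddress
import socket
from urllib.parse import urlsplit


def is_private_ip(ip_str: str) -> bool:
    """Return True if the address is one the backend should refuse to fetch."""
    ip = ipaddress.ip_address(ip_str)
    # Judge IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) by the embedded IPv4.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast  # not covered by is_private
        or ip.is_reserved
        or ip.is_unspecified
        or (ip.version == 6 and ip.is_site_local)
    )


async def validate_blocklist_url(url: str) -> None:
    """Raise ValueError unless the URL is http(s) and resolves to public IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"Unsupported URL scheme: {parts.scheme!r}")
    host = parts.hostname
    if not host:
        raise ValueError("URL has no hostname")

    # getaddrinfo() blocks, so run it in the default executor.
    loop = asyncio.get_running_loop()
    try:
        infos = await loop.run_in_executor(None, socket.getaddrinfo, host, None)
    except socket.gaierror as exc:
        raise ValueError(f"Cannot resolve hostname {host!r}") from exc

    for info in infos:
        # info[4] is the sockaddr tuple; drop any IPv6 zone id ("%eth0").
        address = info[4][0].split("%", 1)[0]
        if is_private_ip(address):
            raise ValueError(
                f"Hostname {host!r} resolves to a private or reserved address"
            )
```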
---
## 16. Quick Reference — Do / Don't


@@ -311,6 +311,17 @@ Automated downloading and applying of external IP blocklists to block known mali
- Support for plain-text lists with one IP address per line.
- Preview the contents of a blocklist URL before enabling it (download and display a sample of entries).
#### URL Validation & Security
- **Scheme restriction:** Only `http://` and `https://` schemes are accepted. `file://`, `ftp://`, and other schemes are rejected.
- **Hostname validation:** The hostname is resolved via DNS and the resulting IP address is validated to prevent SSRF attacks:
  - Private IP ranges (`10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`) are rejected.
  - Loopback addresses (`127.0.0.0/8`, `::1`) are rejected.
  - Link-local addresses (`169.254.0.0/16`, `fe80::/10`) are rejected.
  - Reserved and multicast addresses are rejected.
- **Error handling:** If a URL fails validation (invalid scheme, unresolvable hostname, or a hostname that resolves to a private or reserved IP), the API returns a `400 Bad Request` with a descriptive error message.
- **Ports:** URLs may specify custom ports (e.g. `https://example.com:8443/list.txt`), but the hostname must still resolve to a public IP address.
### Schedule
- Configure when the blocklist import runs using a simple time-and-frequency picker (no raw cron syntax required).


@@ -1,35 +1,3 @@
## TASK-008 — `delete_expired_sessions` never scheduled — sessions table grows unbounded
**Severity:** Medium
### Where found
`backend/app/repositories/session_repo.py`: `delete_expired_sessions()` exists but is never called from any task or lifespan handler.
### Why this is needed
Expired sessions are only removed individually when that specific token is validated and found expired. The bulk cleanup function is never called. Over months of operation, the `sessions` table accumulates every session ever created and is never trimmed, increasing DB size and degrading query performance.
### Goal
Periodically purge expired sessions from the database.
### What to do
1. Create `backend/app/tasks/session_cleanup.py` following the same pattern as `geo_cache_flush.py`.
2. Schedule it as an interval job (e.g., every 6 hours) in `startup_shared_resources`.
3. The task should call `session_repo.delete_expired_sessions(db, now_iso)` and log how many rows were deleted.
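
A rough sketch of what such a task could look like, using a synchronous `sqlite3` connection to stay self-contained. Everything here is an assumption for illustration: the `sessions` table with an `expires_at` column holding UTC ISO-8601 strings, the stand-in `delete_expired_sessions()` body, and the `run_session_cleanup()` name. The real task would obtain its connection from `task_db(settings)` and follow the pattern in `geo_cache_flush.py`, neither of which is shown here.

```python
import logging
import sqlite3
from datetime import datetime, timezone

logger = logging.getLogger(__name__)


def delete_expired_sessions(conn: sqlite3.Connection, now_iso: str) -> int:
    """Hypothetical stand-in for session_repo.delete_expired_sessions().

    Assumes expires_at stores UTC ISO-8601 strings in one consistent format,
    so plain string comparison orders them chronologically.
    """
    cur = conn.execute("DELETE FROM sessions WHERE expires_at < ?", (now_iso,))
    conn.commit()
    return cur.rowcount


def run_session_cleanup(conn: sqlite3.Connection) -> int:
    """Purge expired sessions and log the count at info level."""
    now_iso = datetime.now(timezone.utc).isoformat()
    deleted = delete_expired_sessions(conn, now_iso)
    logger.info("session_cleanup: deleted %d expired sessions", deleted)
    return deleted
```

Returning the deleted row count from the task keeps it easy to assert on in tests, in addition to the `info`-level log line that administrators would see.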
### Possible traps and issues
- The task must use `task_db(settings)` (not the request-scoped `get_db`) to open its own connection.
- Log the count of deleted rows at `info` level, not `debug`, so administrators can see the cleanup is running.
### Docs changes needed
- `Architekture.md` — add `session_cleanup` to the scheduled tasks table.
- `Backend-Development.md` — background task patterns.
### Doc references
- [Architekture.md](Architekture.md) — background tasks
- [Backend-Development.md](Backend-Development.md) — scheduled tasks
---
## TASK-009 — Blocklist URL has no scheme/host validation — SSRF risk
**Severity:** High