TASK-009: Mitigate SSRF vulnerability in blocklist URL validation

- Change BlocklistSourceCreate.url from str to AnyHttpUrl (Pydantic type)
  - Rejects non-http schemes (file://, ftp://, etc.) at model boundary
- Add is_private_ip() utility to detect RFC 1918 private ranges:
  - 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 (RFC 1918)
  - 127.0.0.0/8, ::1/128 (loopback)
  - 169.254.0.0/16, fe80::/10 (link-local)
  - IPv6 site-local, multicast, and reserved ranges
- Add async validate_blocklist_url() function:
  - Resolves hostname via DNS using loop.run_in_executor()
  - Rejects if hostname resolves to private/reserved IP
  - Raises ValueError on validation failure
- Integrate validation into service layer:
  - create_source() calls validate_blocklist_url() before persist
  - update_source() conditionally validates if url provided
  - Both raise ValueError on failure
- Update router endpoints with error handling:
  - create_blocklist() and update_blocklist() catch ValueError
  - Return HTTP 400 Bad Request with descriptive error message
- Add comprehensive test coverage (9 new SSRF tests):
  - file://, ftp://, localhost, 127.0.0.1, 192.168.x.x
  - 10.x.x.x, 172.16.x.x, 169.254.x.x (link-local)
  - Valid public URLs (passes validation)
  - All 36 service tests passing
- Update documentation:
  - Features.md: Document URL validation constraints
  - Backend-Development.md: Add SSRF prevention pattern section

Fixes SSRF vulnerability where authenticated users could supply file://,
ftp://, or private IP URLs and the backend would fetch them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@@ -783,6 +783,31 @@ To adopt a Redis backend:
- Handle edge cases explicitly: empty lists, `None` values, negative numbers, empty strings.
- Use type narrowing and exhaustive pattern matching (`match` / `case`) to eliminate impossible states.

### 14.12 SSRF Prevention (Server-Side Request Forgery)

When user-supplied URLs are fetched by the backend, validate them before making any HTTP requests:

1. **Use Pydantic's `AnyHttpUrl` type** to restrict schemes to `http://` and `https://` only.
   - Rejects `file://`, `ftp://`, `gopher://`, and other non-http schemes at the model boundary.

2. **Validate resolved IP addresses** before fetching:
   - Parse the hostname and resolve it via DNS (using `socket.getaddrinfo()`), and check every address it returns, not only the first.
   - Use the `ipaddress` module to reject private/reserved ranges. `ipaddress.ip_address(...).is_private` covers the RFC 1918, loopback, and link-local ranges below; multicast and IPv6 site-local addresses need the separate `is_multicast` and `is_site_local` checks:
     - RFC 1918: `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`
     - Loopback: `127.0.0.0/8`, `::1/128`
     - Link-local: `169.254.0.0/16`, `fe80::/10`
     - IPv6 site-local, multicast, and reserved ranges.
   - Raise `ValueError` if validation fails; let the router convert it to HTTP 400.

3. **Guard against DNS rebinding**:
   - Validate DNS when the URL is created or updated (the service layer does this before persisting). On its own this does not stop rebinding, because the hostname can resolve to a different address when the blocklist is fetched later.
   - For additional safety, re-validate the connection IP at HTTP client time (e.g., a custom resolver passed to `aiohttp.TCPConnector` can inspect the resolved address before connecting).

4. **Example implementation** (see `backend/app/utils/ip_utils.py`):
   - `is_private_ip(ip_str: str) -> bool`: Checks if IP is private/reserved/loopback/link-local.
   - `async validate_blocklist_url(url: AnyHttpUrl) -> None`: Async DNS resolution + private IP check.
   - Service layer calls `await validate_blocklist_url(url)` before persisting; router catches `ValueError` and returns 400.
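The steps above can be sketched with the standard library alone. This is a minimal illustration, not the contents of `backend/app/utils/ip_utils.py`: the function names follow the list above, but the bodies are assumptions. In particular, the sketch takes a plain `str` and checks the scheme with `urllib.parse` as a stand-in for `AnyHttpUrl`, and it treats unparseable addresses as unsafe (fail closed). It only covers validation at creation time, so the connect-time re-check from step 3 is still needed against rebinding.

```python
import asyncio
import ipaddress
import socket
from urllib.parse import urlsplit


def is_private_ip(ip_str: str) -> bool:
    """Return True if ip_str is not a publicly routable unicast address."""
    try:
        ip = ipaddress.ip_address(ip_str)
    except ValueError:
        # Assumption: fail closed, so anything that is not an IP literal is unsafe.
        return True
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) so the IPv4 rules apply.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
        or getattr(ip, "is_site_local", False)  # attribute exists on IPv6 only
    )


async def validate_blocklist_url(url: str) -> None:
    """Raise ValueError unless url is http(s) and resolves only to public IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"Unsupported URL scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("URL has no hostname")
    port = parts.port or (443 if parts.scheme == "https" else 80)

    loop = asyncio.get_running_loop()
    try:
        # getaddrinfo() blocks, so run it in the default thread pool.
        infos = await loop.run_in_executor(
            None,
            lambda: socket.getaddrinfo(parts.hostname, port, type=socket.SOCK_STREAM),
        )
    except socket.gaierror as exc:
        raise ValueError(f"Cannot resolve host {parts.hostname!r}") from exc

    # Reject if any resolved address is non-public, not just the first one.
    for info in infos:
        address = info[4][0]
        if is_private_ip(address):
            raise ValueError(
                f"{parts.hostname!r} resolves to non-public address {address}"
            )
```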

---

## 16. Quick Reference — Do / Don't
@@ -311,6 +311,17 @@ Automated downloading and applying of external IP blocklists to block known mali
- Support for plain-text lists with one IP address per line.
- Preview the contents of a blocklist URL before enabling it (download and display a sample of entries).

#### URL Validation & Security

- **Scheme restriction:** Only `http://` and `https://` schemes are accepted. `file://`, `ftp://`, and other schemes are rejected.
- **Hostname validation:** The hostname is resolved via DNS and the resulting IP address is validated to prevent SSRF attacks:
  - Private IP ranges (`10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`) are rejected.
  - Loopback addresses (`127.0.0.0/8`, `::1`) are rejected.
  - Link-local addresses (`169.254.0.0/16`, `fe80::/10`) are rejected.
  - Reserved and multicast addresses are rejected.
- **Error handling:** If a URL fails validation (invalid scheme, unresolvable hostname, or resolves to a private IP), the API returns a `400 Bad Request` with a descriptive error message.
- **Ports:** URLs may specify custom ports (e.g. `https://example.com:8443/list.txt`), but the hostname must still resolve to a public IP address.
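To illustrate why a custom port does not affect the hostname check, here is a small sketch of how a URL splits into scheme, host, and port. The helper name `split_blocklist_url` and the `ALLOWED_SCHEMES` constant are hypothetical and not part of the application; the sketch only uses Python's `urllib.parse`.

```python
from urllib.parse import urlsplit

# Assumption: mirrors the scheme restriction described above.
ALLOWED_SCHEMES = {"http", "https"}


def split_blocklist_url(url: str) -> tuple[str, str, int]:
    """Return (scheme, hostname, port) or raise ValueError for a disallowed URL."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"Unsupported URL scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("URL has no hostname")
    # The port decides where the request connects; the hostname is what gets
    # resolved and checked against the private/reserved ranges.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.scheme, parts.hostname, port


# The hostname is the same with or without the custom port.
print(split_blocklist_url("https://example.com:8443/list.txt"))
print(split_blocklist_url("https://example.com/list.txt"))
```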

### Schedule

- Configure when the blocklist import runs using a simple time-and-frequency picker (no raw cron syntax required).
@@ -1,35 +1,3 @@
## TASK-008 — `delete_expired_sessions` never scheduled — sessions table grows unbounded

**Severity:** Medium

### Where found
`backend/app/repositories/session_repo.py` — `delete_expired_sessions()` exists but is never called from any task or lifespan handler.

### Why this is needed
Expired sessions are only removed individually when that specific token is validated and found expired. The bulk cleanup function is never called. Over months of operation, the `sessions` table accumulates every session ever created and is never trimmed, increasing DB size and degrading query performance.

### Goal
Periodically purge expired sessions from the database.

### What to do
1. Create `backend/app/tasks/session_cleanup.py` following the same pattern as `geo_cache_flush.py`.
2. Schedule it as an interval job (e.g., every 6 hours) in `startup_shared_resources`.
3. The task should call `session_repo.delete_expired_sessions(db, now_iso)` and log how many rows were deleted.

### Possible traps and issues
- The task must use `task_db(settings)` (not the request-scoped `get_db`) to open its own connection.
- Log the count of deleted rows at `info` level, not `debug`, so administrators can see the cleanup is running.
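A rough sketch of the cleanup step, using the standard-library `sqlite3` module so it runs on its own. Everything here is an assumption for illustration: the `sessions(token, expires_at)` schema, the synchronous connection, and the body of `delete_expired_sessions()`, which merely stands in for the real `session_repo` function. The actual task would obtain its connection from `task_db(settings)` and follow the `geo_cache_flush.py` pattern.

```python
import logging
import sqlite3
from datetime import datetime, timezone

log = logging.getLogger("session_cleanup")


def delete_expired_sessions(db: sqlite3.Connection, now_iso: str) -> int:
    """Hypothetical stand-in for session_repo.delete_expired_sessions()."""
    # Assumes expires_at holds ISO-8601 UTC timestamps in a single format,
    # so string comparison matches chronological order.
    cur = db.execute("DELETE FROM sessions WHERE expires_at <= ?", (now_iso,))
    db.commit()
    return cur.rowcount


def run_session_cleanup(db: sqlite3.Connection) -> int:
    """Purge expired sessions and log the deleted row count at info level."""
    now_iso = datetime.now(timezone.utc).isoformat()
    deleted = delete_expired_sessions(db, now_iso)
    log.info("session_cleanup: deleted %d expired session(s)", deleted)
    return deleted
```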

### Docs changes needed
- `Architekture.md` — add `session_cleanup` to the scheduled tasks table.
- `Backend-Development.md` — background task patterns.

### Doc references
- [Architekture.md](Architekture.md) — background tasks
- [Backend-Development.md](Backend-Development.md) — scheduled tasks

---

## TASK-009 — Blocklist URL has no scheme/host validation — SSRF risk

**Severity:** High