TASK-009: Mitigate SSRF vulnerability in blocklist URL validation

- Change BlocklistSourceCreate.url from str to AnyHttpUrl (Pydantic type)
  - Rejects non-http schemes (file://, ftp://, etc.) at model boundary

- Add is_private_ip() utility to detect private and reserved IP ranges:
  - 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 (RFC 1918)
  - 127.0.0.0/8, ::1/128 (loopback)
  - 169.254.0.0/16, fe80::/10 (link-local)
  - IPv6 site-local, multicast, and reserved ranges
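A minimal sketch of how such a check could look using only the
standard-library ipaddress module. The function body is an assumption for
illustration, not the code in this commit; unwrapping IPv4-mapped IPv6
addresses is an extra precaution that the list above does not mention.

```python
import ipaddress


def is_private_ip(address: str) -> bool:
    """Return True if the address is in a private, loopback, link-local,
    site-local, multicast, reserved, or unspecified range."""
    ip = ipaddress.ip_address(address)

    # Assumption: treat IPv4-mapped IPv6 (::ffff:a.b.c.d) like the embedded
    # IPv4 address so it cannot be used to sidestep the IPv4 checks.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped

    return (
        ip.is_private          # 10/8, 172.16/12, 192.168/16, fc00::/7, ...
        or ip.is_loopback      # 127/8, ::1
        or ip.is_link_local    # 169.254/16, fe80::/10
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified   # 0.0.0.0, ::
        # is_site_local exists only on IPv6Address (fec0::/10).
        or getattr(ip, "is_site_local", False)
    )
```

Invalid input raises ValueError from ipaddress.ip_address(), which matches
the error type the validation layer already uses.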

- Add async validate_blocklist_url() function:
  - Resolves hostname via DNS using loop.run_in_executor()
  - Rejects if hostname resolves to private/reserved IP
  - Raises ValueError on validation failure
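The flow described above could be sketched roughly as follows. This is an
illustrative reconstruction under stated assumptions, not the committed
code: _is_blocked_ip() is a hypothetical stand-in for is_private_ip(), and
the exact error messages are invented.

```python
import asyncio
import functools
import ipaddress
import socket
from urllib.parse import urlsplit


def _is_blocked_ip(address: str) -> bool:
    """Hypothetical stand-in for is_private_ip()."""
    ip = ipaddress.ip_address(address)
    return (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
    )


async def validate_blocklist_url(url: str) -> None:
    """Raise ValueError unless the URL is http(s) and resolves only to
    public addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {parts.scheme!r}")
    host = parts.hostname
    if not host:
        raise ValueError("URL has no hostname")

    # getaddrinfo() blocks, so run it in the default thread pool executor.
    loop = asyncio.get_running_loop()
    lookup = functools.partial(
        socket.getaddrinfo, host, None, type=socket.SOCK_STREAM
    )
    try:
        infos = await loop.run_in_executor(None, lookup)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve host {host!r}") from exc

    # Reject if ANY resolved address is non-public.
    for _family, _type, _proto, _canon, sockaddr in infos:
        address = sockaddr[0].split("%", 1)[0]  # drop an IPv6 scope id
        if _is_blocked_ip(address):
            raise ValueError(f"{host!r} resolves to a non-public address")
```

Note that a check like this runs only when the URL is validated. A hostname
can resolve to a different address when the blocklist is fetched later, so
re-checking the resolved address at fetch time may also be worth
considering.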

- Integrate validation into service layer:
  - create_source() calls validate_blocklist_url() before persist
  - update_source() conditionally validates if url provided
  - Both raise ValueError on failure

- Update router endpoints with error handling:
  - create_blocklist() and update_blocklist() catch ValueError
  - Return HTTP 400 Bad Request with descriptive error message

- Add comprehensive test coverage (9 new SSRF tests):
  - file://, ftp://, localhost, 127.0.0.1, 192.168.x.x
  - 10.x.x.x, 172.16.x.x, 169.254.x.x (link-local)
  - Valid public URLs (pass validation)
  - All 36 service tests passing

- Update documentation:
  - Features.md: Document URL validation constraints
  - Backend-Development.md: Add SSRF prevention pattern section

Fixes SSRF vulnerability where authenticated users could supply
file://, ftp://, or private IP URLs and the backend would fetch them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-26 12:57:23 +02:00
parent a5b55d1248
commit 4ab767e3d4
9 changed files with 291 additions and 66 deletions


@@ -1,35 +1,3 @@
## TASK-008 — `delete_expired_sessions` never scheduled — sessions table grows unbounded
**Severity:** Medium
### Where found
`backend/app/repositories/session_repo.py`: `delete_expired_sessions()` exists but is never called from any task or lifespan handler.
### Why this is needed
Expired sessions are only removed individually when that specific token is validated and found expired. The bulk cleanup function is never called. Over months of operation, the `sessions` table accumulates every session ever created and is never trimmed, increasing DB size and degrading query performance.
### Goal
Periodically purge expired sessions from the database.
### What to do
1. Create `backend/app/tasks/session_cleanup.py` following the same pattern as `geo_cache_flush.py`.
2. Schedule it as an interval job (e.g., every 6 hours) in `startup_shared_resources`.
3. The task should call `session_repo.delete_expired_sessions(db, now_iso)` and log how many rows were deleted.
### Possible traps and issues
- The task must use `task_db(settings)` (not the request-scoped `get_db`) to open its own connection.
- Log the count of deleted rows at `info` level, not `debug`, so administrators can see the cleanup is running.
### Docs changes needed
- `Architekture.md` — add `session_cleanup` to the scheduled tasks table.
- `Backend-Development.md` — background task patterns.
### Doc references
- [Architekture.md](Architekture.md) — background tasks
- [Backend-Development.md](Backend-Development.md) — scheduled tasks
---
## TASK-009 — Blocklist URL has no scheme/host validation — SSRF risk
**Severity:** High