Refactor rate limiting with exponential backoff strategy

- Update rate limiter to use exponential backoff instead of a fixed limit
- Implement progressive delays for failed login attempts (0.5s, 1s, 2s, 4s, 5s max)
- Update auth router documentation and endpoint docs
- Refactor test suite to match new rate limiting behavior
- Update backend development documentation
- Clean up unused tasks documentation
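
The delay schedule in the second bullet can be reproduced with a short sketch. The constant names mirror those used in the diff, but their values (base 0.5, multiplier 2, cap 5.0) are assumptions inferred from the documented sequence, and `penalty_for` is a hypothetical helper, not a function from this commit.

```python
# Assumed values, inferred from the documented 0.5s, 1s, 2s, 4s, 5s sequence.
# The real constants are defined elsewhere and are not shown in this diff.
LOGIN_PENALTY_BASE_SECONDS = 0.5
LOGIN_PENALTY_MULTIPLIER = 2
LOGIN_PENALTY_MAX_SECONDS = 5.0


def penalty_for(failure_count: int) -> float:
    """Return the delay after the Nth failure in the window (1-based)."""
    if failure_count < 1:
        return 0.0
    # The first failure costs exactly the base delay, hence the "- 1".
    raw = LOGIN_PENALTY_BASE_SECONDS * LOGIN_PENALTY_MULTIPLIER ** (failure_count - 1)
    return min(raw, LOGIN_PENALTY_MAX_SECONDS)


schedule = [penalty_for(n) for n in range(1, 7)]
print(schedule)  # [0.5, 1.0, 2.0, 4.0, 5.0, 5.0]
```

The fifth failure would compute to 8 seconds and is clamped to the 5 second cap, which is why the sequence flattens from there on.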

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 19:58:09 +02:00
parent 2db635ae19
commit 277f2a467c
6 changed files with 165 additions and 208 deletions


@@ -12,15 +12,14 @@ For programmatic API clients (non-browser), use ``POST /api/auth/token``
which returns a token in the response body for use in the ``Authorization``
header. This endpoint does not set a cookie.
Login attempts are rate-limited to 5 per minute per IP address to prevent
brute-force attacks. Requests exceeding the limit return ``429 Too Many Requests``
with a ``Retry-After`` header.
Rate limiting uses exponential backoff: each wrong password attempt incurs
a progressive delay (0.5s, 1s, 2s, 4s, 5s max) per IP address. Requests
blocked by this delay return ``429 Too Many Requests`` with a ``Retry-After``
header.
"""
from __future__ import annotations
import asyncio
import structlog
from fastapi import APIRouter, Request, Response
@@ -60,8 +59,9 @@ async def login(
On success the token is also set as an ``HttpOnly`` ``SameSite=Lax``
cookie so the browser SPA benefits from automatic credential handling.
Rate limiting: Up to 5 login attempts per minute per client IP.
Requests exceeding this limit return ``429 Too Many Requests`` with
Rate limiting: Exponential backoff on failed attempts. Each wrong password
incurs an increasing delay (0.5s, 1s, 2s, 4s, 5s max per IP address).
Requests during the penalty period return ``429 Too Many Requests`` with
a ``Retry-After`` header.
Args:
@@ -81,6 +81,7 @@ async def login(
"""
client_ip = get_client_ip(request, trusted_proxies=settings.trusted_proxies)
# Check if this IP is currently blocked by exponential backoff
if not rate_limiter.is_allowed(client_ip):
log.warning("login_rate_limit_exceeded", client_ip=client_ip)
raise RateLimitError("Too many login attempts. Please try again later.")
@@ -94,16 +95,9 @@ async def login(
session_repo=session_ctx.session_repo,
)
except ValueError as exc:
# Progressive penalty delay on wrong password to slow down brute-force
# attacks without exhausting request capacity (app-layer DoS resistance).
penalty = rate_limiter.record_failure(client_ip)
acquired = rate_limiter.acquire(client_ip)
try:
if acquired:
await asyncio.sleep(penalty)
finally:
rate_limiter.release(client_ip)
log.warning("login_failed", client_ip=client_ip, error=str(exc), penalty=penalty)
# Record this failure to increment the exponential backoff counter
rate_limiter.record_failure(client_ip)
log.warning("login_failed", client_ip=client_ip, error=str(exc))
raise AuthenticationError(str(exc)) from exc
response.set_cookie(


@@ -1,26 +1,39 @@
"""In-memory rate limiter for IP-based request throttling.
Tracks login attempts per IP address and enforces a configurable limit.
Uses a dictionary of deques (per IP) storing timestamps of recent attempts.
Implements exponential backoff for failed login attempts using failure tracking.
Each wrong password attempt increments the failure count for that IP, and subsequent
attempts are blocked for a duration that grows exponentially up to a maximum.
Uses a dictionary of deques (per IP) storing timestamps of recent failures.
Old entries are cleaned up by a background task to prevent unbounded growth.
Process-local implementation — in multi-worker setups, each worker has
independent counters. This constraint limits the blast radius of brute-force
attacks to a single worker.
The penalty strategy for failed login attempts is also managed here:
record_failure() records a failure timestamp and returns the penalty delay
to apply, enabling progressive back-off without exhausting request capacity.
**How It Works:**
Operational Notes
-----------------
1. A successful login resets the failure counter for that IP.
2. Each failed login (wrong password) calls record_failure() and increments the counter.
3. is_allowed() checks if enough time has passed since the last failure based on
the current failure count. The delay grows exponentially with each consecutive failure:
**Cleanup Lifecycle**: The rate limiter state (_attempts, _failures, _lock_counts)
grows as IPs interact with the system. To prevent unbounded memory growth during
long runtimes, a scheduled background task (rate_limiter_cleanup) calls the
cleanup_expired() method every 30 minutes. This is safe because:
- 1st failure: 0.5 second penalty
- 2nd failure: 1 second penalty (0.5 * 2^1)
- 3rd failure: 2 seconds penalty (0.5 * 2^2)
- 4th failure: 4 seconds penalty (0.5 * 2^3)
- ... up to the configured maximum (default 5 seconds)
- cleanup_expired() only removes IPs with no recent attempts (all timestamps
4. Penalties are not summed: the delay is measured from the most recent failure.
After 5 failed attempts, the client waits the capped 5 seconds once before
trying again (not 5 seconds per attempt).
**Cleanup Lifecycle**: The rate limiter state (_failures) grows as IPs interact
with the system. To prevent unbounded memory growth during long runtimes, a
scheduled background task (rate_limiter_cleanup) calls cleanup_expired() every
30 minutes. This is safe because:
- cleanup_expired() only removes IPs with no recent failures (all timestamps
outside the rate-limit window), so active IPs are never disrupted.
- The cleanup is non-blocking and logged for observability.
- Individual requests already prune old timestamps from each IP's deque during
@@ -70,48 +83,57 @@ class RateLimiter:
Args:
max_attempts: Maximum attempts allowed within the window.
(Deprecated: no longer consulted by is_allowed(); the window size
is set by window_seconds.)
window_seconds: Time window (seconds) for rate limit.
"""
self.max_attempts: int = max_attempts
self.window_seconds: int = window_seconds
self._attempts: dict[str, deque[float]] = {}
self._failures: dict[str, deque[float]] = {}
self._lock_counts: dict[str, int] = {}
def is_allowed(self, ip_address: str) -> bool:
"""Check if a request from *ip_address* is allowed.
If allowed, the current timestamp is recorded. Old entries (outside
the window) are removed before checking.
Checks if the IP has accumulated failures that would currently block
the attempt due to penalty backoff. Does NOT record a new attempt;
failures are recorded only by record_failure(), after a failed password check.
Args:
ip_address: The client IP address to rate-limit.
Returns:
``True`` if the request is allowed, ``False`` if the limit is exceeded.
``True`` if the request is allowed (past penalty period), ``False``
if currently blocked by exponential backoff.
"""
now = time()
if ip_address not in self._failures:
self._failures[ip_address] = deque()
failures = self._failures[ip_address]
cutoff = now - self.window_seconds
if ip_address not in self._attempts:
self._attempts[ip_address] = deque()
# Remove old failures outside the window
while failures and failures[0] < cutoff:
failures.popleft()
attempts = self._attempts[ip_address]
# If no recent failures, request is allowed
if not failures:
return True
# Remove old attempts outside the window
while attempts and attempts[0] < cutoff:
attempts.popleft()
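# Calculate the current penalty: how much time must pass after the last
# failure before the next attempt is allowed. The exponent is
# failure_count - 1 so the first failure costs the base delay, matching
# the documented schedule (0.5s, 1s, 2s, 4s, ...).
failure_count = len(failures)
penalty = min(
LOGIN_PENALTY_BASE_SECONDS * (LOGIN_PENALTY_MULTIPLIER ** (failure_count - 1)),
LOGIN_PENALTY_MAX_SECONDS,
)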
# Check if the limit is exceeded
if len(attempts) >= self.max_attempts:
return False
# Record this attempt
attempts.append(now)
return True
# Check if enough time has passed since the last failure
time_since_last_failure = now - failures[-1]
return time_since_last_failure >= penalty
def cleanup_expired(self) -> None:
"""Remove all IPs with no recent attempts (cleanup task).
"""Remove all IPs with no recent failures (cleanup task).
Called periodically by the background task to prevent unbounded
growth of the tracking dictionary.
@@ -120,119 +142,67 @@ class RateLimiter:
cutoff = now - self.window_seconds
ips_to_remove = []
for ip_address, attempts in self._attempts.items():
# Remove old attempts
while attempts and attempts[0] < cutoff:
attempts.popleft()
# Mark IP for removal if no attempts remain
if not attempts:
for ip_address, failures in self._failures.items():
# Remove old failures
while failures and failures[0] < cutoff:
failures.popleft()
# Mark IP for removal if no failures remain
if not failures:
ips_to_remove.append(ip_address)
for ip_address in ips_to_remove:
del self._attempts[ip_address]
del self._failures[ip_address]
if ips_to_remove:
log.debug("rate_limiter_cleanup", removed_ips=len(ips_to_remove))
def get_state(self) -> Mapping[str, int]:
"""Return a read-only view of current attempt counts per IP.
"""Return a read-only view of current failure counts per IP.
For debugging and monitoring.
Returns:
A mapping of IP addresses to their attempt counts.
A mapping of IP addresses to their failure counts.
"""
now = time()
cutoff = now - self.window_seconds
result = {}
for ip_address, attempts in self._attempts.items():
# Count non-expired attempts
count = sum(1 for ts in attempts if ts >= cutoff)
for ip_address, failures in self._failures.items():
# Count non-expired failures
count = sum(1 for ts in failures if ts >= cutoff)
if count > 0:
result[ip_address] = count
return result
def reset(self) -> None:
"""Clear all tracked attempts (for testing)."""
self._attempts.clear()
"""Clear all tracked failures (for testing)."""
self._failures.clear()
self._lock_counts.clear()
# ---------------------------------------------------------------------------
# Penalty strategy for failed login attempts
# ---------------------------------------------------------------------------
def record_failure(self, ip_address: str) -> float:
"""Record a failed login attempt and return the penalty delay in seconds.
def record_failure(self, ip_address: str) -> None:
"""Record a failed login attempt.
Tracks consecutive failures per IP. Penalty grows exponentially with
each failure, bounded by :data:`~app.utils.constants.LOGIN_PENALTY_MAX_SECONDS`,
then resets the failure counter. This provides brute-force resistance
without exhausting request capacity.
A concurrency guard (``_lock_counts``) prevents a single IP from
accumulating many concurrent penalty tasks.
Tracks failures per IP to enable exponential backoff in is_allowed().
The penalty delay is automatically calculated in is_allowed() based on
the failure count, providing transparent brute-force resistance.
Args:
ip_address: The client IP address whose login attempt failed.
Returns:
The penalty delay in seconds to apply.
"""
now = time()
if ip_address not in self._failures:
self._failures[ip_address] = deque()
if ip_address not in self._lock_counts:
self._lock_counts[ip_address] = 0
failures = self._failures[ip_address]
lock_count = self._lock_counts[ip_address]
# Reset if last failure is outside the window
cutoff = now - self.window_seconds
# Remove old failures outside the window
while failures and failures[0] < cutoff:
failures.popleft()
consecutive = len(failures)
penalty = min(
LOGIN_PENALTY_BASE_SECONDS * (LOGIN_PENALTY_MULTIPLIER ** consecutive),
LOGIN_PENALTY_MAX_SECONDS,
)
# Record this failure
failures.append(now)
# Concurrency protection: if too many concurrent sleeps are already
# running for this IP, cap the penalty to avoid thread exhaustion.
if lock_count >= 3:
penalty = min(penalty, LOGIN_PENALTY_BASE_SECONDS)
return penalty
def acquire(self, ip_address: str) -> bool:
"""Acquire a concurrency slot for a penalty task.
Args:
ip_address: The client IP address.
Returns:
``True`` if the slot was acquired, ``False`` if the IP already has
the maximum number of concurrent penalty tasks running.
"""
if ip_address not in self._lock_counts:
self._lock_counts[ip_address] = 0
if self._lock_counts[ip_address] >= 3:
return False
self._lock_counts[ip_address] += 1
return True
def release(self, ip_address: str) -> None:
"""Release a concurrency slot when a penalty task completes.
Args:
ip_address: The client IP address.
"""
if ip_address in self._lock_counts and self._lock_counts[ip_address] > 0:
self._lock_counts[ip_address] -= 1
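
Read together, the added lines reduce the limiter to two operations: record_failure() appends a timestamp, and is_allowed() compares the time since the last failure against a penalty derived from the failure count. The following is a minimal sketch of that behavior, not the code from this commit. `BackoffLimiter`, its injectable `clock` parameter, and the `retry_after` helper are hypothetical, and the constant values are assumed from the documented schedule. The exponent is the failure count minus one so that the first failure costs the base delay.

```python
from __future__ import annotations

import math
from collections import deque
from typing import Callable

# Assumed values; the real constants are not shown in the diff.
BASE, MULTIPLIER, MAX_PENALTY = 0.5, 2, 5.0


class BackoffLimiter:
    """Hypothetical stand-in for RateLimiter, reduced to failure tracking."""

    def __init__(self, window_seconds: float, clock: Callable[[], float]) -> None:
        self.window_seconds = window_seconds
        self._clock = clock  # injected so tests can control time
        self._failures: dict[str, deque[float]] = {}

    def _recent(self, ip: str) -> deque[float]:
        """Return the IP's failure timestamps after pruning expired ones."""
        failures = self._failures.setdefault(ip, deque())
        cutoff = self._clock() - self.window_seconds
        while failures and failures[0] < cutoff:
            failures.popleft()
        return failures

    def retry_after(self, ip: str) -> float:
        """Seconds left until the next attempt is allowed (0.0 if allowed)."""
        failures = self._recent(ip)
        if not failures:
            return 0.0
        penalty = min(BASE * MULTIPLIER ** (len(failures) - 1), MAX_PENALTY)
        return max(0.0, penalty - (self._clock() - failures[-1]))

    def is_allowed(self, ip: str) -> bool:
        return self.retry_after(ip) == 0.0

    def record_failure(self, ip: str) -> None:
        self._recent(ip).append(self._clock())


# Usage with a fake clock, so no real sleeping is needed.
now = [100.0]
limiter = BackoffLimiter(window_seconds=60, clock=lambda: now[0])
ip = "203.0.113.7"  # documentation-range address

limiter.record_failure(ip)                    # 1st failure: 0.5 s penalty
blocked = limiter.is_allowed(ip)              # still inside the penalty
header = math.ceil(limiter.retry_after(ip))   # whole seconds for Retry-After
now[0] += 0.5
allowed_again = limiter.is_allowed(ip)        # penalty has elapsed
```

Rounding up with `math.ceil` reflects that a `Retry-After` delay is expressed in whole seconds, so a 0.5 second penalty would be advertised as 1. Injecting the clock is a design choice for the sketch only; it lets a test advance time deterministically instead of sleeping through each penalty.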