Refactor rate limiting with exponential backoff strategy

- Update rate limiter to use exponential backoff instead of a fixed attempt limit
- Implement progressive delays for failed login attempts (0.5s, 1s, 2s, 4s, 5s max)
- Update auth router documentation and endpoint docs
- Refactor test suite to match new rate limiting behavior
- Update backend development documentation
- Clean up unused tasks documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 19:58:09 +02:00
parent 2db635ae19
commit 277f2a467c
6 changed files with 165 additions and 208 deletions
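
The delay schedule named in the commit message (0.5s, 1s, 2s, 4s, 5s max) can be sketched as a small function. This is an illustration only: the constant values below are assumptions inferred from the docstring in the diff (the real ones live in `app.utils.constants` and may differ), and `penalty_for` is a hypothetical helper, not part of the commit.

```python
# Assumed values, inferred from the docstring in this commit; the project's
# actual constants are defined in app.utils.constants and may differ.
LOGIN_PENALTY_BASE_SECONDS = 0.5
LOGIN_PENALTY_MULTIPLIER = 2
LOGIN_PENALTY_MAX_SECONDS = 5.0


def penalty_for(failure_count: int) -> float:
    """Return the delay in seconds required after `failure_count` failures.

    Hypothetical helper: the first failure maps to the base delay, and each
    further failure multiplies the delay until the maximum is reached.
    """
    if failure_count <= 0:
        return 0.0
    raw = LOGIN_PENALTY_BASE_SECONDS * LOGIN_PENALTY_MULTIPLIER ** (failure_count - 1)
    return min(raw, LOGIN_PENALTY_MAX_SECONDS)


# Delays for the 1st through 5th failure.
schedule = [penalty_for(n) for n in range(1, 6)]
```

Under these assumed constants, the fifth failure would compute 0.5 * 2^4 = 8 seconds before the cap, so the cap is what produces the final 5-second step.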

@@ -1,26 +1,39 @@
"""In-memory rate limiter for IP-based request throttling.
Tracks login attempts per IP address and enforces a configurable limit.
Uses a dictionary of deques (per IP) storing timestamps of recent attempts.
Implements exponential backoff for failed login attempts using failure tracking.
Each wrong password attempt increments the failure count for that IP, and subsequent
attempts are blocked for a duration that grows exponentially up to a maximum.
Uses a dictionary of deques (per IP) storing timestamps of recent failures.
Old entries are cleaned up by a background task to prevent unbounded growth.
Process-local implementation — in multi-worker setups, each worker has
independent counters. This constraint limits the blast radius of brute-force
attacks to a single worker.
The penalty strategy for failed login attempts is also managed here:
record_failure() records a failure timestamp and returns the penalty delay
to apply, enabling progressive back-off without exhausting request capacity.
**How It Works:**
Operational Notes
-----------------
1. A successful login resets the failure counter for that IP.
2. Each failed login (wrong password) calls record_failure() and increments the counter.
3. is_allowed() checks if enough time has passed since the last failure based on
the current failure count. The delay grows exponentially with each consecutive failure:
**Cleanup Lifecycle**: The rate limiter state (_attempts, _failures, _lock_counts)
grows as IPs interact with the system. To prevent unbounded memory growth during
long runtimes, a scheduled background task (rate_limiter_cleanup) calls the
cleanup_expired() method every 30 minutes. This is safe because:
- 1st failure: 0.5 second penalty
- 2nd failure: 1 second penalty (0.5 * 2^1)
- 3rd failure: 2 seconds penalty (0.5 * 2^2)
- 4th failure: 4 seconds penalty (0.5 * 2^3)
- ... up to the configured maximum (default 5 seconds)
- cleanup_expired() only removes IPs with no recent attempts (all timestamps
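4. The penalty is determined by the failure count and measured from the most
recent failure; delays are not summed. After 5 failed attempts the raw delay
(0.5 * 2^4 = 8 seconds) is capped at the 5-second maximum, so the IP waits
5 seconds after its last failure before trying again.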
**Cleanup Lifecycle**: The rate limiter state (_failures) grows as IPs interact
with the system. To prevent unbounded memory growth during long runtimes, a
scheduled background task (rate_limiter_cleanup) calls cleanup_expired() every
30 minutes. This is safe because:
- cleanup_expired() only removes IPs with no recent failures (all timestamps
outside the rate-limit window), so active IPs are never disrupted.
- The cleanup is non-blocking and logged for observability.
- Individual requests already prune old timestamps from each IP's deque during
@@ -70,48 +83,57 @@ class RateLimiter:
Args:
max_attempts: Maximum attempts allowed within the window.
(Deprecated: is_allowed() no longer enforces this limit.)
window_seconds: Time window (seconds) for rate limit.
"""
self.max_attempts: int = max_attempts
self.window_seconds: int = window_seconds
self._attempts: dict[str, deque[float]] = {}
self._failures: dict[str, deque[float]] = {}
self._lock_counts: dict[str, int] = {}
def is_allowed(self, ip_address: str) -> bool:
"""Check if a request from *ip_address* is allowed.
If allowed, the current timestamp is recorded. Old entries (outside
the window) are removed before checking.
Checks whether the IP has accumulated failures that currently block the
attempt due to penalty backoff. Does NOT record anything itself: failures
are recorded only by record_failure() after a failed password verification.
Args:
ip_address: The client IP address to rate-limit.
Returns:
``True`` if the request is allowed, ``False`` if the limit is exceeded.
``True`` if the request is allowed (past penalty period), ``False``
if currently blocked by exponential backoff.
"""
now = time()
if ip_address not in self._failures:
self._failures[ip_address] = deque()
failures = self._failures[ip_address]
cutoff = now - self.window_seconds
if ip_address not in self._attempts:
self._attempts[ip_address] = deque()
# Remove old failures outside the window
while failures and failures[0] < cutoff:
failures.popleft()
attempts = self._attempts[ip_address]
# If no recent failures, request is allowed
if not failures:
return True
# Remove old attempts outside the window
while attempts and attempts[0] < cutoff:
attempts.popleft()
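# Calculate the penalty: how much time must pass after the last failure
# before the next attempt is allowed. The exponent is failure_count - 1 so
# that the 1st failure maps to the base delay, matching the documented
# schedule (0.5s, 1s, 2s, 4s, ...). failure_count is >= 1 at this point.
failure_count = len(failures)
penalty = min(
LOGIN_PENALTY_BASE_SECONDS * (LOGIN_PENALTY_MULTIPLIER ** (failure_count - 1)),
LOGIN_PENALTY_MAX_SECONDS,
)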
# Check if the limit is exceeded
if len(attempts) >= self.max_attempts:
return False
# Record this attempt
attempts.append(now)
return True
# Check if enough time has passed since the last failure
time_since_last_failure = now - failures[-1]
return time_since_last_failure >= penalty
def cleanup_expired(self) -> None:
"""Remove all IPs with no recent attempts (cleanup task).
"""Remove all IPs with no recent failures (cleanup task).
Called periodically by the background task to prevent unbounded
growth of the tracking dictionary.
@@ -120,119 +142,67 @@ class RateLimiter:
cutoff = now - self.window_seconds
ips_to_remove = []
for ip_address, attempts in self._attempts.items():
# Remove old attempts
while attempts and attempts[0] < cutoff:
attempts.popleft()
# Mark IP for removal if no attempts remain
if not attempts:
for ip_address, failures in self._failures.items():
# Remove old failures
while failures and failures[0] < cutoff:
failures.popleft()
# Mark IP for removal if no failures remain
if not failures:
ips_to_remove.append(ip_address)
for ip_address in ips_to_remove:
del self._attempts[ip_address]
del self._failures[ip_address]
if ips_to_remove:
log.debug("rate_limiter_cleanup", removed_ips=len(ips_to_remove))
def get_state(self) -> Mapping[str, int]:
"""Return a read-only view of current attempt counts per IP.
"""Return a read-only view of current failure counts per IP.
For debugging and monitoring.
Returns:
A mapping of IP addresses to their attempt counts.
A mapping of IP addresses to their failure counts.
"""
now = time()
cutoff = now - self.window_seconds
result = {}
for ip_address, attempts in self._attempts.items():
# Count non-expired attempts
count = sum(1 for ts in attempts if ts >= cutoff)
for ip_address, failures in self._failures.items():
# Count non-expired failures
count = sum(1 for ts in failures if ts >= cutoff)
if count > 0:
result[ip_address] = count
return result
def reset(self) -> None:
"""Clear all tracked attempts (for testing)."""
self._attempts.clear()
"""Clear all tracked failures (for testing)."""
self._failures.clear()
self._lock_counts.clear()
# ---------------------------------------------------------------------------
# Penalty strategy for failed login attempts
# ---------------------------------------------------------------------------
def record_failure(self, ip_address: str) -> float:
"""Record a failed login attempt and return the penalty delay in seconds.
def record_failure(self, ip_address: str) -> None:
"""Record a failed login attempt.
Tracks consecutive failures per IP. Penalty grows exponentially with
each failure, bounded by :data:`~app.utils.constants.LOGIN_PENALTY_MAX_SECONDS`,
then resets the failure counter. This provides brute-force resistance
without exhausting request capacity.
A concurrency guard (``_lock_counts``) prevents a single IP from
accumulating many concurrent penalty tasks.
Tracks failures per IP to enable exponential backoff in is_allowed().
The penalty delay is automatically calculated in is_allowed() based on
the failure count, providing transparent brute-force resistance.
Args:
ip_address: The client IP address whose login attempt failed.
Returns:
The penalty delay in seconds to apply.
"""
now = time()
if ip_address not in self._failures:
self._failures[ip_address] = deque()
if ip_address not in self._lock_counts:
self._lock_counts[ip_address] = 0
failures = self._failures[ip_address]
lock_count = self._lock_counts[ip_address]
# Reset if last failure is outside the window
cutoff = now - self.window_seconds
# Remove old failures outside the window
while failures and failures[0] < cutoff:
failures.popleft()
consecutive = len(failures)
penalty = min(
LOGIN_PENALTY_BASE_SECONDS * (LOGIN_PENALTY_MULTIPLIER ** consecutive),
LOGIN_PENALTY_MAX_SECONDS,
)
# Record this failure
failures.append(now)
# Concurrency protection: if too many concurrent sleeps are already
# running for this IP, cap the penalty to avoid thread exhaustion.
if lock_count >= 3:
penalty = min(penalty, LOGIN_PENALTY_BASE_SECONDS)
return penalty
def acquire(self, ip_address: str) -> bool:
"""Acquire a concurrency slot for a penalty task.
Args:
ip_address: The client IP address.
Returns:
``True`` if the slot was acquired, ``False`` if the IP already has
the maximum number of concurrent penalty tasks running.
"""
if ip_address not in self._lock_counts:
self._lock_counts[ip_address] = 0
if self._lock_counts[ip_address] >= 3:
return False
self._lock_counts[ip_address] += 1
return True
def release(self, ip_address: str) -> None:
"""Release a concurrency slot when a penalty task completes.
Args:
ip_address: The client IP address.
"""
if ip_address in self._lock_counts and self._lock_counts[ip_address] > 0:
self._lock_counts[ip_address] -= 1
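
The interplay between `is_allowed()` and `record_failure()` after this refactor can be sketched in a self-contained form. This is a minimal sketch under stated assumptions, not the project's `RateLimiter`: the class name `BackoffLimiter`, the `record_success` method, the injectable `clock` parameter, the 900-second window, and the default delay values are all hypothetical and chosen only to make the behavior easy to exercise without real sleeps.

```python
import time
from collections import deque
from typing import Callable


class BackoffLimiter:
    """Hypothetical, simplified stand-in for RateLimiter (illustration only)."""

    def __init__(
        self,
        window_seconds: float = 900.0,  # assumed window length
        base: float = 0.5,              # assumed base delay
        multiplier: float = 2.0,        # assumed growth factor
        max_penalty: float = 5.0,       # assumed cap
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.window_seconds = window_seconds
        self.base = base
        self.multiplier = multiplier
        self.max_penalty = max_penalty
        self._clock = clock
        self._failures: dict[str, deque[float]] = {}

    def _prune(self, failures: deque[float], now: float) -> None:
        # Drop failure timestamps that have fallen out of the window.
        cutoff = now - self.window_seconds
        while failures and failures[0] < cutoff:
            failures.popleft()

    def is_allowed(self, ip: str) -> bool:
        # Read-only check: never records an attempt.
        failures = self._failures.get(ip)
        if not failures:
            return True
        now = self._clock()
        self._prune(failures, now)
        if not failures:
            return True
        penalty = min(
            self.base * self.multiplier ** (len(failures) - 1),
            self.max_penalty,
        )
        return now - failures[-1] >= penalty

    def record_failure(self, ip: str) -> None:
        # Called by the login handler after a wrong password.
        now = self._clock()
        failures = self._failures.setdefault(ip, deque())
        self._prune(failures, now)
        failures.append(now)

    def record_success(self, ip: str) -> None:
        # A successful login clears the IP's failure history.
        self._failures.pop(ip, None)


def demo() -> list[bool]:
    """Drive the limiter with a fake clock and collect is_allowed() results."""
    now = [0.0]
    limiter = BackoffLimiter(clock=lambda: now[0])
    ip = "203.0.113.7"  # documentation-range address
    results = [limiter.is_allowed(ip)]      # no failures yet
    limiter.record_failure(ip)              # 1st failure at t=0.0
    now[0] = 0.4
    results.append(limiter.is_allowed(ip))  # still inside the 0.5s delay
    now[0] = 0.5
    results.append(limiter.is_allowed(ip))  # 0.5s delay has elapsed
    limiter.record_failure(ip)              # 2nd failure at t=0.5
    now[0] = 1.0
    results.append(limiter.is_allowed(ip))  # only 0.5s of the 1s delay
    now[0] = 1.5
    results.append(limiter.is_allowed(ip))  # 1s delay has elapsed
    return results
```

Injecting the clock is a design choice for the sketch: it lets the backoff be checked deterministically in tests, which is one way a refactored test suite could avoid sleeping through real penalty delays.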