BanGUI/Docs/Tasks.md at 05c3b564ae77c2af3deb16b5a3a4e4401fcde76b

Lukas 05c3b564ae Refactor scheduler lock implementation with heartbeat mechanism

- Add heartbeat-based lock renewal in scheduler_lock_heartbeat.py
- Update scheduler_lock.py with improved lock management
- Add comprehensive tests for scheduler lock functionality
- Update deployment and task documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-04-30 22:10:38 +02:00

[IMPORTANT] Scheduler lock race condition

Where found

backend/app/utils/scheduler_lock.py:56-58 — heartbeat interval 10 seconds

Why this is needed

Current design: Process A acquires the lock, a heartbeat is missed, the lock expires, and Process B acquires it. Both processes now run simultaneously → duplicate work, data corruption.

Goal

Implement robust distributed locking that prevents concurrent execution.

What to do

Option A (Strengthen heartbeat):

Reduce interval to 5s (half of timeout)
Use database advisory locks
Monitor heartbeat failures

Option B (Migrate to Redis):

Use redlock-py or the asyncio client in redis-py (aioredis has been merged into redis-py as redis.asyncio)
Simpler, more reliable than database-backed

Current code improvements:

Log when heartbeat fails
Add metric for lock contention
Test multi-process scenario
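
A minimal sketch of the lease semantics the heartbeat needs, assuming an in-memory stand-in for the lock row. `LeaseLock`, `Lease`, and the injected `clock` are hypothetical names, not the contents of scheduler_lock.py. The point it illustrates: a heartbeat must fail once the lease has expired or changed owner, so the old holder can stop instead of running alongside the new one.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Lease:
    owner: str         # random token identifying the current holder
    expires_at: float  # deadline measured on the injected clock


class LeaseLock:
    """In-memory stand-in for the lock row (illustrative only)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._lease: Optional[Lease] = None

    def acquire(self) -> Optional[str]:
        """Return an owner token, or None while a live lease exists."""
        now = self._clock()
        if self._lease is not None and self._lease.expires_at > now:
            return None
        token = uuid.uuid4().hex
        self._lease = Lease(owner=token, expires_at=now + self._ttl)
        return token

    def heartbeat(self, token: str) -> bool:
        """Extend the lease only if the caller still owns a live lease."""
        now = self._clock()
        lease = self._lease
        if lease is None or lease.owner != token or lease.expires_at <= now:
            return False  # lock lost: the caller must stop its work
        lease.expires_at = now + self._ttl
        return True
```

In a real multi-process deployment the check-and-update would have to be one atomic statement against the shared store, and the deadline would need a single time source (for example the database server's clock) to avoid the clock skew trap listed below.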

Possible traps and issues

Database locks don't scale under high contention
Redis adds new dependency
Clock skew breaks timestamp-based expiry

Docs changes needed

Update Docs/Deployment.md § Scheduler Lock
Add troubleshooting: "Blocklist import runs twice"

Doc references

Docs/Deployment.md (scheduler)
backend/app/utils/scheduler_lock.py (lock implementation)

[IMPORTANT] API pagination doesn't return metadata

Where found

backend/app/routers/history.py — returns bare list, no pagination metadata
All paginated routers have same issue

Why this is needed

Frontend receives bare list, cannot determine: total results, whether more pages exist, last page number. Must guess or re-query.

Goal

Return pagination metadata with every paginated response.

What to do

Create response wrapper:

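from pydantic import BaseModel

# PaginationMetadata: total results, has-more flag, last page number.
class PaginatedResponse(BaseModel):
    data: list[Item]
    pagination: PaginationMetadata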

Update all paginated routers to return this wrapper
Update frontend to use metadata for UI
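
One possible shape for the metadata, sketched with a stdlib dataclass so it runs without dependencies (the real model would presumably be a pydantic `BaseModel`). The field names and the `build_pagination` helper are assumptions chosen to answer the three questions the frontend cannot answer today: total results, whether more pages exist, and the last page number.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PaginationMetadata:
    total: int        # total number of matching rows
    page: int         # current page, 1-based
    page_size: int    # rows per page
    total_pages: int  # last page number (0 when there are no rows)
    has_next: bool    # True if a later page exists


def build_pagination(total: int, page: int, page_size: int) -> PaginationMetadata:
    """Derive the metadata from the row count and the requested page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    total_pages = math.ceil(total / page_size)
    return PaginationMetadata(
        total=total,
        page=page,
        page_size=page_size,
        total_pages=total_pages,
        has_next=page < total_pages,
    )
```

For example, `build_pagination(95, 2, 10)` yields 10 total pages with `has_next` set, while page 10 of the same result set reports `has_next` as false.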

Possible traps and issues

SELECT COUNT(*) is slow on large tables
Response shape changes; an old frontend may not handle the new wrapper

Docs changes needed

Update API documentation § Pagination

Doc references

backend/app/utils/pagination.py

[IMPORTANT] Error response schema inconsistent

Where found

Different handlers return different response shapes
Fail2Ban errors: { "error_code": "...", "detail": "..." }
Validation errors: { "detail": [...] }
Not found errors: { "detail": "...", "error_code": "..." }

Why this is needed

Frontend must normalize multiple shapes, which makes error handling fragile.

Goal

Unify all error responses to single schema.

What to do

Define canonical error response:

from pydantic import BaseModel

class ErrorResponse(BaseModel):
    error_code: str
    message: str
    status: int
    details: dict | None = None  # room for rich detail structures

Update all handlers to return this format
Update frontend to expect unified schema
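
A rough sketch of mapping the shapes listed under "Where found" onto the canonical schema, again using a stdlib dataclass in place of pydantic. `to_error_response` and the fallback codes `validation_error` and `unknown_error` are hypothetical; the real code reference table would define the actual values.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ErrorResponse:
    error_code: str
    message: str
    status: int
    details: Optional[dict] = None


def to_error_response(payload: Dict[str, Any], status: int) -> Dict[str, Any]:
    """Convert a legacy error payload into the unified shape."""
    detail = payload.get("detail")
    code = payload.get("error_code")
    if isinstance(detail, list):
        # Validation-style payload: keep the item list under details.
        error = ErrorResponse(
            error_code=code or "validation_error",  # hypothetical code
            message="Request validation failed",
            status=status,
            details={"errors": detail},
        )
    else:
        error = ErrorResponse(
            error_code=code or "unknown_error",  # hypothetical code
            message="" if detail is None else str(detail),
            status=status,
        )
    return asdict(error)
```

Keeping the validation items under `details` is one way to accommodate the rich detail structures mentioned in the traps below without widening the top-level schema.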

Possible traps and issues

Backward compatibility with old clients
FastAPI's built-in handlers may override custom
Rich detail structures need accommodation

Docs changes needed

Update API documentation with unified error schema
Add error code reference table

Doc references

Docs/API.md (error codes)
backend/app/main.py (exception handlers)

[IMPORTANT] Provider ordering fragility (Frontend)

Where found

frontend/src/App.tsx — 10-level deep provider nesting
frontend/src/providers/PROVIDER_ORDER.md — documents order, no compile-time enforcement

Why this is needed

Provider order (ThemeProvider → AppContents → FluentProvider → ...) enforced only at runtime. Accidental reorder caught only after deploy.

Goal

Add compile-time validation of provider ordering.

What to do

Create provider composition utility enforcing order
Use TypeScript discriminated unions
Add ESLint rule to check provider wrapping

Possible traps and issues

TypeScript doesn't easily enforce ordering
May be overkill — improve runtime error messages instead

Docs changes needed

Update Docs/Architekture.md § 3.2 (Providers)

Doc references

Docs/Architekture.md § 3.2 (Providers)
frontend/src/providers/PROVIDER_ORDER.md

[IMPORTANT] Promise cancellation not checked in .then()/.catch() chains

Where found

frontend/src/components/blocklist/BlocklistSourcesSection.tsx:84-88
frontend/src/components/blocklist/BlocklistScheduleSection.tsx:49-58
Multiple components use this pattern

Why this is needed

When user navigates away, .then() chains don't check if cancelled. State updated on unmounted component → React warnings, memory leak, notification shows wrong context.

Goal

Check for cancellation in all .then()/.catch() chains.

What to do

Replace .then()/.catch() with async/await and cancellation check
Or use wrapper hook to hide logic

Possible traps and issues

Checking signal.aborted after await introduces race conditions
Better: let AbortError propagate, catch it in catch block

Docs changes needed

Update Docs/Web-Development.md § Async Patterns

Doc references

Docs/Web-Development.md (async patterns)

[MEDIUM] Inefficient database pagination uses OFFSET

Where found

backend/app/utils/pagination.py — uses OFFSET (page-1) * page_size

Why this is needed

OFFSET N makes the database scan and discard N rows before it returns the requested page, so cost grows with the page number. Last page on 10M row table: 15 seconds ⚠️

Goal

Implement keyset pagination (cursor-based) for large result sets.

What to do

Short-term: Add database indexes on sort columns
Long-term: Implement cursor-based pagination using WHERE instead of OFFSET
Frontend sends cursor (last row ID) instead of page number
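
A minimal sketch of the long-term approach using the stdlib `sqlite3` module. The `history` table with `id` and `ip` columns is a hypothetical schema, and `encode_cursor`, `decode_cursor`, and `fetch_page` are illustrative names. The query seeks past the last seen ID with WHERE instead of skipping rows with OFFSET, so each page costs roughly the same on an indexed sort column.

```python
import base64
import sqlite3
from typing import List, Optional, Tuple


def encode_cursor(last_id: int) -> str:
    """Wrap the last row ID so clients treat the cursor as opaque."""
    return base64.urlsafe_b64encode(str(last_id).encode()).decode()


def decode_cursor(cursor: Optional[str]) -> int:
    """Return the ID to seek past; 0 means 'start' (assumes positive IDs)."""
    if cursor is None:
        return 0
    return int(base64.urlsafe_b64decode(cursor.encode()).decode())


def fetch_page(
    conn: sqlite3.Connection, cursor: Optional[str], limit: int
) -> Tuple[List[tuple], Optional[str]]:
    """Fetch up to `limit` rows after the cursor, plus the next cursor."""
    rows = conn.execute(
        "SELECT id, ip FROM history WHERE id > ? ORDER BY id LIMIT ?",
        (decode_cursor(cursor), limit),
    ).fetchall()
    # A short page means the end was reached, so no further cursor is issued.
    next_cursor = encode_cursor(rows[-1][0]) if len(rows) == limit else None
    return rows, next_cursor
```

Ordering by a unique column keeps the cursor deterministic. Base64 only hides the format by convention; it does not stop a client from forging a cursor.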

Possible traps and issues

Cursor must be deterministic
API contract changes
Cursor format must be opaque to client

Docs changes needed

Update Docs/Backend-Development.md § Database Performance

Doc references

Docs/Backend-Development.md (database performance)

[MEDIUM] Session secret rotation not implemented

Where found

backend/app/config.py — single session_secret with no rotation support

Why this is needed

If secret leaks, all sessions compromised. No way to invalidate old sessions.

Goal

Support gradual secret rotation without forcing logout.

What to do

Store multiple secrets: current and previous
Accept tokens signed with either key
Re-sign tokens with current secret on validation
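
The three steps above can be sketched with the stdlib `hmac` module. The token format (`payload.signature`) and the names `sign` and `verify` are assumptions for illustration; the actual session format in backend/app/config.py may differ.

```python
import hashlib
import hmac
from typing import List, Optional, Tuple


def sign(payload: str, secret: bytes) -> str:
    """Append an HMAC-SHA256 signature to the payload."""
    mac = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{mac}"


def verify(token: str, secrets: List[bytes]) -> Optional[Tuple[str, bool]]:
    """Check the token against each secret; secrets[0] is the current one.

    Returns (payload, needs_resign), or None if no secret matches.
    """
    payload, separator, mac = token.rpartition(".")
    if not separator:
        return None
    for index, secret in enumerate(secrets):
        expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
        # Constant-time comparison; bytes avoid errors on non-ASCII input.
        if hmac.compare_digest(mac.encode(), expected.encode()):
            return payload, index > 0
    return None
```

When `needs_resign` is true, the caller would issue `sign(payload, secrets[0])` so the session migrates to the current secret. Dropping the previous secret from the list then invalidates any token that has not been re-signed.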

Possible traps and issues

Rotation strategy must be documented
Metrics needed to track secret usage

Docs changes needed

Update Docs/Backend-Development.md § Session Management

Doc references

Docs/Backend-Development.md

[MEDIUM] No CORS configuration

Where found

backend/app/main.py — no CORS middleware added

Why this is needed

If frontend on different origin, cross-origin requests blocked without CORS configuration.

Goal

Add CORS middleware with proper origin whitelisting.

What to do

Add CORS middleware with specific origin whitelist
Make configurable via environment variable
Default to localhost for development
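
A small sketch of the configuration side, assuming a comma-separated environment variable. The variable name `BANGUI_CORS_ORIGINS` and the default origin `http://localhost:5173` are hypothetical. It rejects the wildcard outright, which matters once credentials are enabled.

```python
import os
from typing import List, Mapping

DEFAULT_ORIGINS = ["http://localhost:5173"]  # hypothetical development origin


def load_cors_origins(
    env: Mapping[str, str] = os.environ,
    var: str = "BANGUI_CORS_ORIGINS",  # hypothetical variable name
) -> List[str]:
    """Parse a comma-separated origin whitelist from the environment."""
    raw = env.get(var, "")
    origins = [item.strip().rstrip("/") for item in raw.split(",") if item.strip()]
    if not origins:
        return list(DEFAULT_ORIGINS)
    for origin in origins:
        if origin == "*":
            raise ValueError("wildcard origin is not allowed")
        if not origin.startswith(("http://", "https://")):
            raise ValueError(f"origin must include a scheme: {origin!r}")
    return origins
```

The resulting list would be passed as `allow_origins` when registering FastAPI's `CORSMiddleware` with `app.add_middleware(...)`. Failing loudly at startup on a bad value avoids the silent browser-side failure noted in the traps.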

Possible traps and issues

allow_origins=["*"] defeats CORS security
Credentials require specific origins, not wildcard
Missing config silently fails in browser

Docs changes needed

Update Docs/Deployment.md § CORS Configuration

Doc references

Docs/Deployment.md

[MEDIUM] Input validation missing for regex patterns (ReDoS)

Where found

backend/app/routers/config.py — regex validation accepts arbitrary patterns without timeout

Why this is needed

Malicious regex causes catastrophic backtracking (ReDoS). Attacker sends pattern → matching it against input hangs → DoS.

Goal

Add timeout and complexity limits to regex validation.

What to do

Add timeout to regex compilation and trial matching (2 seconds recommended)
Add length limit (reject patterns > 1000 characters)
Use signal.alarm() (Unix) or timeout library
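
A sketch of the signal-based variant, under stated assumptions: Unix only, called from the main thread, and with a caller-supplied `probe` string to match against (a hypothetical parameter; the task does not say what input the validator uses). `validate_regex` is an illustrative name. It uses `signal.setitimer`, the sub-second sibling of `signal.alarm()`.

```python
import re
import signal

MAX_PATTERN_LENGTH = 1000    # length limit named in this task
MATCH_TIMEOUT_SECONDS = 2.0  # timeout named in this task


def _raise_timeout(signum, frame):
    raise TimeoutError("regex evaluation exceeded the time budget")


def validate_regex(
    pattern: str, probe: str, timeout: float = MATCH_TIMEOUT_SECONDS
) -> bool:
    """Return True if the pattern compiles and searches `probe` in time.

    Unix-only, and must run in the main thread (signal handlers cannot
    be installed from other threads).
    """
    if len(pattern) > MAX_PATTERN_LENGTH:
        return False
    previous = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.setitimer(signal.ITIMER_REAL, timeout)
    try:
        re.compile(pattern).search(probe)
        return True
    except (re.error, TimeoutError):
        return False
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)     # cancel the pending timer
        signal.signal(signal.SIGALRM, previous)     # restore the old handler
```

Because request handlers often run outside the main thread, a production version may need to run the check in a separate process with a timeout instead.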

Possible traps and issues

signal.alarm() is Unix-only and only works in the main thread
Some valid complex regexes may time out
Frontend should also validate (defense in depth)

Docs changes needed

Update API docs to document regex validation limits

Doc references

backend/app/routers/config.py

[MEDIUM] No structured logging to external system

Where found

Logs only go to stdout/file, no external aggregation

Why this is needed

Can't search across instances; historical logs are lost when an instance is recycled.

Goal

Ship logs to centralized logging platform.

What to do

Short-term: Ensure structlog JSON output is valid (already done)
Long-term: Ship to logging platform (ELK, Datadog, Papertrail)

Possible traps and issues

External logging adds latency
Sensitive data must not be logged
Log volume can be massive

Docs changes needed

Add Docs/Observability.md section on logging

Doc references

Docs/Observability.md (new)

[MEDIUM] No Application Performance Monitoring (APM)

Where found

Backend: no metrics collection, latency tracking
Frontend: no error tracking, performance metrics
No observability into request performance

Why this is needed

Without metrics, blind in production: API slow? Unknown. Which endpoints fail most? Unknown.

Goal

Add comprehensive metrics collection and monitoring.

What to do

Backend metrics:
- Add Prometheus metrics: request count, latency, active requests
- Expose /metrics endpoint
Frontend metrics:
- Page load time, FCP, LCP using web-vitals
- API error rates and latencies
Aggregation:
- Prometheus + Grafana, or Datadog/NewRelic

Possible traps and issues

Metrics collection has performance cost
Cardinality explosion with tags
PII in metrics

Docs changes needed

Add Docs/Observability.md

Doc references

Docs/Observability.md (new)

[LOW] Frontend charts not memoized

Where found

frontend/src/components/TopCountriesPieChart.tsx
frontend/src/components/TopCountriesBarChart.tsx

Why this is needed

Charts re-render on every parent update, so Recharts reprocesses 5000+ points each time.

Goal

Memoize chart components.

What to do

Wrap with React.memo with custom comparison
Ensure data objects are stable

Possible traps and issues

Shallow comparison might not be enough
Memoization has memory cost

Docs changes needed

No documentation changes

Doc references

frontend/src/components/TopCountriesChart.tsx

[LOW] No request deduplication on frontend

Where found

frontend/src/hooks/useFetchData.ts — each call launches new request
User clicks "Refresh" twice → two identical requests

Why this is needed

Duplicates waste bandwidth, cause race conditions (response 2 arrives first, then response 1 overwrites with stale data).

Goal

Deduplicate identical in-flight requests.

What to do

Implement request cache
Clear cache entry when response received
Use in useFetchData

Possible traps and issues

Cache must be cleared on data mutation
Stale data in cache possible if not careful

Docs changes needed

No documentation changes

Doc references

frontend/src/hooks/useFetchData.ts
