Files
BanGUI/Docs/Tasks.md
Lukas 3bd2a71367 Refactor usePolledData hook and add comprehensive tests
- Renamed usePolledIntervalCheck to usePolledData for clarity
- Updated hook to properly manage interval cleanup on unmount
- Added comprehensive test suite covering normal operation, error handling, and cleanup
- Updated documentation to reflect new hook name
- Updated Tasks.md to track progress

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-30 20:24:47 +02:00

1053 lines
28 KiB
Markdown

## [Frontend] usePolledData — setInterval without drift correction
**Where found**
- `frontend/src/hooks/usePolledData.ts` — uses `setInterval` without tracking expected next poll time
**Why this is needed**
With fixed `setInterval`, if `refetch` takes time (e.g., 2 seconds for slow API) and `pollInterval` is 5 seconds, the actual polling pattern accumulates drift. Effective polling interval becomes > 5 seconds, wasting bandwidth and CPU.
**Goal**
Implement drift-corrected polling that schedules next poll relative to **completion** of previous poll, not its start.
**What to do**
1. Replace `setInterval` with self-scheduling timeout
2. Track elapsed time between poll start and completion
3. Schedule next poll with compensation: `delay = Math.max(0, pollInterval - elapsed)`
4. Alternatively use `use-sse` or custom hook `useDriftCorrectedPolling`
**Possible traps and issues**
- Cancellation check must happen both before and after `await refetch()` to prevent race conditions
- Coordinate cancellation with `AbortController` already used by `useFetchData`
- If `pollInterval` changes, hook must restart polling loop
**Docs changes needed**
- Update `frontend/src/hooks/README.md` — document drift-corrected polling behavior
**Doc references**
- `frontend/src/hooks/usePolledData.ts`
- `frontend/src/hooks/README.md`
---
## [Frontend] ErrorBoundary — non-standard `componentStack` property
**Where found**
- `frontend/src/components/ErrorBoundary.tsx` — accesses `errorInfo.componentStack`
- Property not in `React.ErrorInfo` public type definitions
**Why this is needed**
Accessing `errorInfo.componentStack` produces TypeScript error under strict mode. Property is React DevTools implementation detail, may change without notice.
**Goal**
Remove dependency on `errorInfo.componentStack`, use alternative for stack information.
**What to do**
1. Check what `recordCritical` does with `componentStack` parameter
2. Consider using standard `Error.prototype.stack` instead
3. Or cast `errorInfo` to `any` with comment explaining trade-off:
```typescript
const stack = (errorInfo as any).componentStack as string | undefined;
// TODO: Remove when React types include componentStack
```
4. Handle case where `componentStack` is `undefined`
**Possible traps and issues**
- `Error.prototype.stack` contains JavaScript call stack, not React component stack
- Casting to `any` suppresses type checking — ensure it's documented
- In production builds `componentStack` may be empty or absent
**Docs changes needed**
- Update `Docs/Architekture.md` § 3.2 (Components) — note that `ErrorBoundary` uses React-internal error info
**Doc references**
- `frontend/src/components/ErrorBoundary.tsx`
- `Docs/Architekture.md` § 3.2 (Components section)
---
## [Frontend] No loading skeleton for Login page
**Where found**
- `frontend/src/pages/LoginPage.tsx` — renders password form immediately without checking session validation state
**Why this is needed**
`AuthProvider` validates session on mount. During validation (network round-trip), `LoginPage` renders form. If user already has valid session, they briefly see login page before redirect — "flash" of wrong page.
**Goal**
Show loading state (spinner or skeleton) while session validation in progress.
**What to do**
1. Add `useSessionValidation` call to `LoginPage` to get `isValidating` flag
2. If `isValidating`, return `<Spinner label="Checking session..." />`
3. Alternatively, have `AuthProvider` expose `isValidating` state that `LoginPage` can read
4. Ensure redirect to dashboard happens automatically via `useEffect`
**Possible traps and issues**
- `useSessionValidation` may not expose `isValidating` — check hook implementation
- Ensure loading spinner is consistent with Fluent UI design language
- Redirect logic must handle `?next=` query parameter
**Docs changes needed**
- No documentation changes needed — UX improvement to existing component
**Doc references**
- `frontend/src/pages/LoginPage.tsx`
- `frontend/src/providers/AuthProvider.tsx`
---
## [CRITICAL] Backend session cache not cluster-safe
**Where found**
- `backend/app/startup.py:66-91` — `verify_bangui_workers()` validates `BANGUI_WORKERS` env var but only if explicitly set
- `backend/app/utils/session_cache.py` — in-memory cache stored on `app.state` (process-local)
- Multi-worker deployments use separate worker processes with separate `app.state`
**Why this is needed**
Multi-worker deployments (e.g., `gunicorn -w 4`) create separate worker processes. Each has its own in-memory `app.state`, so sessions cached in Worker A are invisible to Workers B, C, D. Request landing on different worker shows session invalid — users randomly logged out.
**Goal**
Ensure multi-worker deployments either prevented at startup or migrate to shared session store (Redis, PostgreSQL).
**What to do**
**Option A (Strict single-worker):**
- Add runtime detection of actual worker count (not just env var checking)
- Fail startup if multi-worker scenario detected
- Document single-worker requirement prominently
**Option B (Support multi-worker):**
- Migrate session cache to Redis or PostgreSQL
- Update `app/utils/session_cache.py` to use distributed backend
- Add `BANGUI_REDIS_URL` or similar to config
**Immediate mitigation:**
- Document in deployment guide that `BANGUI_WORKERS=1` is mandatory
- Add validation that fails loudly if `BANGUI_WORKERS != 1`
**Possible traps and issues**
- `uvicorn --workers N` creates N processes, each with separate `app.state`
- Environment variable validation easily bypassed using command-line flag
- Detecting actual worker count at runtime is tricky — consider `os.getpid()` and shared lock file
**Docs changes needed**
- Update `Docs/Deployment.md` § Deployment Constraints — explicitly document single-worker requirement
- Add troubleshooting entry: "Why am I randomly logged out?" → Check `BANGUI_WORKERS`
- Update `Docker/docker-compose.yml` with comment explaining requirement
**Doc references**
- `Docs/Deployment.md` (deployment constraints)
- `backend/app/startup.py` (worker validation)
- `backend/app/utils/session_cache.py` (session cache)
---
## [CRITICAL] Frontend-Backend type generation drift
**Where found**
- `frontend/src/types/` — manually maintained Pydantic model transcriptions
- `backend/app/models/` — source Pydantic models
- No build-time validation or code generation linking them
**Why this is needed**
Types manually copied from Python models. When backend model changes, frontend types not updated automatically. Frontend uses stale types, causing runtime errors, type errors at build time, or silent UI bugs.
**Goal**
Implement automated type synchronization from backend schema to frontend types, validated at build time.
**What to do**
1. **Recommended:** Generate TypeScript types from OpenAPI schema
```bash
openapi-typescript http://localhost:8000/openapi.json -o src/types/generated.ts
```
2. **Setup:**
- Ensure backend exposes OpenAPI schema at `/openapi.json` (FastAPI has this built-in)
- Add `openapi-typescript` to `frontend/package.json` devDependencies
- Generate types in pre-build step: `npm run generate:types`
- Import types from generated file, not hand-written
- Add CI check: fail build if generated types don't match committed types
3. **Alternative:** Use `typed-rest-client` or `msw` with type generation
**Possible traps and issues**
- OpenAPI schema must be kept up-to-date — CI validation must enforce
- Generated types may have different names than hand-written types — migration needed
- Private/internal model fields should be excluded from schema
**Docs changes needed**
- Update `Docs/Web-Development.md` § Type Generation — document workflow
- Add pre-commit hook documentation
- Update `Docs/Backend-Development.md` to note API changes must keep schema in sync
**Doc references**
- `Docs/Web-Development.md` (type generation)
- `Docs/Backend-Development.md` (API changes)
---
## [CRITICAL] Docker containers lack resource limits
**Where found**
- `Docker/docker-compose.yml` — no `deploy.limits` or `deploy.reservations` sections
**Why this is needed**
Without resource limits, single container can consume all host CPU, memory, disk. "Noisy neighbor" scenario where backend memory leak → uses 100% RAM → OOM kill → host unresponsive.
**Goal**
Set hard and soft resource limits for all containers.
**What to do**
1. Add resource limits to `docker-compose.yml`:
```yaml
backend:
deploy:
limits:
cpus: '2'
memory: 512M
reservations:
cpus: '1'
memory: 256M
```
2. Document these limits in `Docs/Deployment.md`
3. For Kubernetes, add equivalent `resources.limits` and `resources.requests`
**Possible traps and issues**
- Limits set too low → OOM kill or throttling
- Backend may need more memory for large blocklists
- Test under expected load before finalizing
- Different environments may need different limits
**Docs changes needed**
- Update `Docker/docker-compose.yml` with `deploy` sections
- Add section in `Docs/Deployment.md` § Resource Allocation
**Doc references**
- `Docker/docker-compose.yml`
- `Docs/Deployment.md` (resource allocation)
---
## [CRITICAL] Global rate limiting missing
**Where found**
- `backend/app/routers/auth.py` — only `/api/auth/login` has rate limiting
- All other routers have no rate limiting
**Why this is needed**
Without rate limiting, attackers can spam endpoints to cause CPU spike, database overload, or network bandwidth exhaustion.
**Goal**
Implement global per-IP rate limiting on all endpoints.
**What to do**
1. Add rate limiting middleware to `backend/app/main.py`:
```python
from slowapi import Limiter
limiter = Limiter(key_func=get_remote_address, default_limits=["200 per minute"])
app.state.limiter = limiter
```
2. Apply to all routers with appropriate limits per endpoint
3. Return proper HTTP 429 with `Retry-After` header
4. Document limits in API docs
**Possible traps and issues**
- Limits set too low block legitimate users
- Distributed deployments need shared limiter state (Redis-backed)
- Different endpoints may need different limits
- Trusted IPs should bypass limiting
**Docs changes needed**
- Add section in `Docs/Backend-Development.md` § Rate Limiting
- Document default limits in deployment guide
**Doc references**
- `Docs/Backend-Development.md` (rate limiting)
- `backend/app/main.py` (middleware setup)
---
## [CRITICAL] Missing security headers (CSP, X-Frame-Options, etc.)
**Where found**
- Backend does not set `Content-Security-Policy`, `X-Frame-Options`, `X-Content-Type-Options` headers
- Frontend HTML served without CSP meta tags
**Why this is needed**
Without security headers, browsers won't protect against XSS, clickjacking, MIME-sniffing, referrer leakage attacks.
**Goal**
Add security headers to all HTTP responses.
**What to do**
1. Add security headers middleware to `backend/app/main.py`:
```python
@app.middleware("http")
async def add_security_headers(request, call_next):
response = await call_next(request)
response.headers["Content-Security-Policy"] = "default-src 'self'"
response.headers["X-Frame-Options"] = "DENY"
response.headers["X-Content-Type-Options"] = "nosniff"
return response
```
2. In frontend `index.html`, add CSP meta tag
3. Test with browser DevTools Security tab
**Possible traps and issues**
- CSP `'unsafe-inline'` defeats security — avoid if possible
- CDN resources may need explicit allowlist
- Too restrictive CSP breaks functionality; too loose defeats security
**Docs changes needed**
- Add section in `Docs/Security.md` § HTTP Security Headers
**Doc references**
- `Docs/Security.md` (security headers)
---
## [CRITICAL] Background tasks lack timeout protection
**Where found**
- `backend/app/tasks/blocklist_import.py` — no timeout
- `backend/app/tasks/health_check.py` — no timeout
- All task functions lack timeout wrapper
**Why this is needed**
If task hangs (API unreachable, network partition), task runs forever. Never completes → lock never released → duplicate work, resource exhaustion.
**Goal**
Ensure all background tasks complete within bounded time or fail gracefully.
**What to do**
1. Wrap all task functions with `asyncio.wait_for(task, timeout)`:
```python
await asyncio.wait_for(blocklist_service.import_all(...), timeout=300)
```
2. Set appropriate timeouts per task:
- Blocklist import: 300s (5 min)
- Health probe: 10s
- Geo cache flush: 60s
3. Log timeout events and trigger alerts
**Possible traps and issues**
- Timeout too short → legitimate tasks killed prematurely
- Timeout too long → resource leak if many tasks hang
- Killing task mid-operation may leave inconsistent state
**Docs changes needed**
- Add section in `Docs/Backend-Development.md` § Background Tasks
**Doc references**
- `Docs/Backend-Development.md` (background tasks)
- `backend/app/tasks/` (task modules)
---
## [CRITICAL] Background tasks not idempotent
**Where found**
- `backend/app/tasks/blocklist_import.py` — bans applied without checking if already banned
- `backend/app/tasks/geo_cache_flush.py` — cache entries written without transaction
- Multi-step operations not wrapped in transaction
**Why this is needed**
If task crashes mid-execution, partial state remains. On retry: bans applied again → duplicates, cache entries written twice → corruption.
**Goal**
Make all background tasks idempotent — retrying produces same result as running once.
**What to do**
1. Use operation IDs to deduplicate:
```python
operation_id = f"import_{source.id}_{datetime.now().date().isoformat()}"
if await import_log_repo.get_by_operation_id(operation_id):
return # Already done
```
2. Use transactions for multi-step operations
3. Store operation state before execution
**Possible traps and issues**
- Idempotency keys must be unique but deterministic
- Transactions require database support
- State machine (pending → completed/failed) must be enforced
**Docs changes needed**
- Update `Docs/Backend-Development.md` § Task Idempotency
**Doc references**
- `Docs/Backend-Development.md` (task design)
- `backend/app/tasks/` (task implementations)
---
## [CRITICAL] Health check endpoint returns wrong status code
**Where found**
- `backend/app/routers/health.py` — always returns 200, even when fail2ban offline
**Why this is needed**
Docker health checks interpret 200 as "healthy". If fail2ban offline but backend returns 200, Docker thinks container healthy and doesn't restart it.
**Goal**
Return 503 Service Unavailable when fail2ban is offline.
**What to do**
1. Change health endpoint to return 503 when offline:
```python
if not server_status.online:
return JSONResponse(
status_code=503,
content={"status": "unavailable", "fail2ban": "offline"}
)
```
2. Update Docker health check to expect 503 as "unhealthy"
**Possible traps and issues**
- Returning 503 causes orchestration tools to restart container
- If fail2ban restarts frequently, health check becomes flaky
- Consider gradual degradation
**Docs changes needed**
- Update `Docker/Dockerfile.backend` health check documentation
- Update `Docs/Deployment.md` § Health Checks
**Doc references**
- `backend/app/routers/health.py`
- `Docker/Dockerfile.backend`
---
## [IMPORTANT] Database transactions lack explicit isolation
**Where found**
- `backend/app/repositories/session_repo.py:40-60` — multiple queries without `BEGIN TRANSACTION`
- Similar pattern in multi-step operations across repositories
**Why this is needed**
Without explicit boundaries, concurrent requests can race: Thread A checks if exists → not found, Thread B checks same → not found, Thread A inserts → succeeds, Thread B inserts → duplicate error or silent overwrite.
**Goal**
Wrap all multi-step operations in explicit transactions with appropriate isolation level.
**What to do**
1. Use explicit `BEGIN IMMEDIATE` transaction:
```python
await db.execute("BEGIN IMMEDIATE")
try:
await db.execute("INSERT INTO sessions ...")
await db.commit()
except Exception:
await db.rollback()
raise
```
2. Use `IMMEDIATE` mode to lock immediately for writes
3. Document transaction boundaries clearly
**Possible traps and issues**
- Nested transactions (SAVEPOINTs) may be needed
- Locks held too long cause contention
- Deadlocks possible with concurrent writers
**Docs changes needed**
- Add section in `Docs/Backend-Development.md` § Database Transactions
**Doc references**
- `Docs/Backend-Development.md` (database design)
---
## [IMPORTANT] Scheduler lock race condition
**Where found**
- `backend/app/utils/scheduler_lock.py:56-58` — heartbeat interval 10 seconds
**Why this is needed**
Current design: Process A acquires lock, heartbeat misses, lock expires, Process B acquires lock, both running simultaneously → duplicate work, data corruption.
**Goal**
Implement robust distributed locking that prevents concurrent execution.
**What to do**
**Option A (Strengthen heartbeat):**
- Reduce interval to 5s (half of timeout)
- Use database advisory locks
- Monitor heartbeat failures
**Option B (Migrate to Redis):**
- Use `redlock-py` or `aioredis`
- Simpler, more reliable than database-backed
**Current code improvements:**
- Log when heartbeat fails
- Add metric for lock contention
- Test multi-process scenario
**Possible traps and issues**
- Database locks don't scale under high contention
- Redis adds new dependency
- Clock skew breaks timestamp-based expiry
**Docs changes needed**
- Update `Docs/Deployment.md` § Scheduler Lock
- Add troubleshooting: "Blocklist import runs twice"
**Doc references**
- `Docs/Deployment.md` (scheduler)
- `backend/app/utils/scheduler_lock.py` (lock implementation)
---
## [IMPORTANT] API pagination doesn't return metadata
**Where found**
- `backend/app/routers/history.py` — returns bare list, no pagination metadata
- All paginated routers have same issue
**Why this is needed**
Frontend receives bare list, cannot determine: total results, whether more pages exist, last page number. Must guess or re-query.
**Goal**
Return pagination metadata with every paginated response.
**What to do**
1. Create response wrapper:
```python
class PaginatedResponse(BaseModel):
data: list[Item]
pagination: PaginationMetadata
```
2. Update all paginated routers to return this wrapper
3. Update frontend to use metadata for UI
**Possible traps and issues**
- `SELECT COUNT(*)` is slow on large tables
- Response shape change — old frontend may not handle
**Docs changes needed**
- Update API documentation § Pagination
**Doc references**
- `backend/app/utils/pagination.py`
---
## [IMPORTANT] Error response schema inconsistent
**Where found**
- Different handlers return different response shapes
- Fail2Ban errors: `{ "error_code": "...", "detail": "..." }`
- Validation errors: `{ "detail": [...] }`
- Not found errors: `{ "detail": "...", "error_code": "..." }`
**Why this is needed**
Frontend must normalize multiple shapes, making error handling fragile and error-prone.
**Goal**
Unify all error responses to single schema.
**What to do**
1. Define canonical error response:
```python
class ErrorResponse(BaseModel):
error_code: str
message: str
status: int
details: dict | None = None
```
2. Update all handlers to return this format
3. Update frontend to expect unified schema
**Possible traps and issues**
- Backward compatibility with old clients
- FastAPI's built-in handlers may override custom
- Rich detail structures need accommodation
**Docs changes needed**
- Update API documentation with unified error schema
- Add error code reference table
**Doc references**
- `Docs/API.md` (error codes)
- `backend/app/main.py` (exception handlers)
---
## [IMPORTANT] Provider ordering fragility (Frontend)
**Where found**
- `frontend/src/App.tsx` — 10-level deep provider nesting
- `frontend/src/providers/PROVIDER_ORDER.md` — documents order, no compile-time enforcement
**Why this is needed**
Provider order (ThemeProvider → AppContents → FluentProvider → ...) enforced only at runtime. Accidental reorder caught only after deploy.
**Goal**
Add compile-time validation of provider ordering.
**What to do**
1. Create provider composition utility enforcing order
2. Use TypeScript discriminated unions
3. Add ESLint rule to check provider wrapping
**Possible traps and issues**
- TypeScript doesn't easily enforce ordering
- May be overkill — improve runtime error messages instead
**Docs changes needed**
- Update `Docs/Architekture.md` § 3.2 (Providers)
**Doc references**
- `Docs/Architekture.md` § 3.2 (Providers)
- `frontend/src/providers/PROVIDER_ORDER.md`
---
## [IMPORTANT] Promise cancellation not checked in .then()/.catch() chains
**Where found**
- `frontend/src/components/blocklist/BlocklistSourcesSection.tsx:84-88`
- `frontend/src/components/blocklist/BlocklistScheduleSection.tsx:49-58`
- Multiple components use this pattern
**Why this is needed**
When user navigates away, `.then()` chains don't check if cancelled. State updated on unmounted component → React warnings, memory leak, notification shows wrong context.
**Goal**
Check for cancellation in all `.then()/.catch()` chains.
**What to do**
1. Replace `.then()/.catch()` with `async/await` and cancellation check
2. Or use wrapper hook to hide logic
**Possible traps and issues**
- Checking `signal.aborted` after `await` introduces race conditions
- Better: let AbortError propagate, catch it in catch block
**Docs changes needed**
- Update `Docs/Web-Development.md` § Async Patterns
**Doc references**
- `Docs/Web-Development.md` (async patterns)
---
## [MEDIUM] Inefficient database pagination uses OFFSET
**Where found**
- `backend/app/utils/pagination.py` — uses `OFFSET (page-1) * page_size`
**Why this is needed**
OFFSET scans and discards N rows to fetch N+limit. Last page on 10M row table: 15 seconds ⚠️
**Goal**
Implement keyset pagination (cursor-based) for large result sets.
**What to do**
1. **Short-term:** Add database indexes on sort columns
2. **Long-term:** Implement cursor-based pagination using WHERE instead of OFFSET
3. Frontend sends cursor (last row ID) instead of page number
**Possible traps and issues**
- Cursor must be deterministic
- API contract changes
- Cursor format must be opaque to client
**Docs changes needed**
- Update `Docs/Backend-Development.md` § Database Performance
**Doc references**
- `Docs/Backend-Development.md` (database performance)
---
## [MEDIUM] Session secret rotation not implemented
**Where found**
- `backend/app/config.py` — single `session_secret` with no rotation support
**Why this is needed**
If secret leaks, all sessions compromised. No way to invalidate old sessions.
**Goal**
Support gradual secret rotation without forcing logout.
**What to do**
1. Store multiple secrets: current and previous
2. Accept tokens signed with either key
3. Re-sign tokens with current secret on validation
**Possible traps and issues**
- Rotation strategy must be documented
- Metrics needed to track secret usage
**Docs changes needed**
- Update `Docs/Backend-Development.md` § Session Management
**Doc references**
- `Docs/Backend-Development.md`
---
## [MEDIUM] No CORS configuration
**Where found**
- `backend/app/main.py` — no CORS middleware added
**Why this is needed**
If frontend on different origin, cross-origin requests blocked without CORS configuration.
**Goal**
Add CORS middleware with proper origin whitelisting.
**What to do**
1. Add CORS middleware with specific origin whitelist
2. Make configurable via environment variable
3. Default to localhost for development
**Possible traps and issues**
- `allow_origins=["*"]` defeats CORS security
- Credentials require specific origins, not wildcard
- Missing config silently fails in browser
**Docs changes needed**
- Update `Docs/Deployment.md` § CORS Configuration
**Doc references**
- `Docs/Deployment.md`
---
## [MEDIUM] Input validation missing for regex patterns (ReDoS)
**Where found**
- `backend/app/routers/config.py` — regex validation accepts arbitrary patterns without timeout
**Why this is needed**
Malicious regex causes catastrophic backtracking (ReDoS). Attacker sends pattern → compilation hangs → DoS.
**Goal**
Add timeout and complexity limits to regex validation.
**What to do**
1. Add timeout to regex compilation (2 seconds recommended)
2. Add length limit (reject patterns > 1000 characters)
3. Use `signal.alarm()` (Unix) or timeout library
**Possible traps and issues**
- `signal.alarm()` Unix-only
- Some valid complex regexes may timeout
- Frontend should also validate (defense in depth)
**Docs changes needed**
- Update API docs to document regex validation limits
**Doc references**
- `backend/app/routers/config.py`
---
## [MEDIUM] No structured logging to external system
**Where found**
- Logs only go to stdout/file, no external aggregation
**Why this is needed**
Can't search across instances, historical logs lost on instance recycle.
**Goal**
Ship logs to centralized logging platform.
**What to do**
1. **Short-term:** Ensure `structlog` JSON output is valid (already done)
2. **Long-term:** Ship to logging platform (ELK, Datadog, Papertrail)
**Possible traps and issues**
- External logging adds latency
- Sensitive data must not be logged
- Log volume can be massive
**Docs changes needed**
- Add `Docs/Observability.md` section on logging
**Doc references**
- `Docs/Observability.md` (new)
---
## [MEDIUM] No Application Performance Monitoring (APM)
**Where found**
- Backend: no metrics collection, latency tracking
- Frontend: no error tracking, performance metrics
- No observability into request performance
**Why this is needed**
Without metrics, blind in production: API slow? Unknown. Which endpoints fail most? Unknown.
**Goal**
Add comprehensive metrics collection and monitoring.
**What to do**
1. **Backend metrics:**
- Add Prometheus metrics: request count, latency, active requests
- Expose `/metrics` endpoint
2. **Frontend metrics:**
- Page load time, FCP, LCP using `web-vitals`
- API error rates and latencies
3. **Aggregation:**
- Prometheus + Grafana, or Datadog/NewRelic
**Possible traps and issues**
- Metrics collection has performance cost
- Cardinality explosion with tags
- PII in metrics
**Docs changes needed**
- Add `Docs/Observability.md`
**Doc references**
- `Docs/Observability.md` (new)
---
## [LOW] Frontend charts not memoized
**Where found**
- `frontend/src/components/TopCountriesPieChart.tsx`
- `frontend/src/components/TopCountriesBarChart.tsx`
**Why this is needed**
Charts re-render on every parent update, Recharts reprocesses 5000+ points.
**Goal**
Memoize chart components.
**What to do**
1. Wrap with `React.memo` with custom comparison
2. Ensure data objects are stable
**Possible traps and issues**
- Shallow comparison might not be enough
- Memoization has memory cost
**Docs changes needed**
- No documentation changes
**Doc references**
- `frontend/src/components/TopCountriesChart.tsx`
---
## [LOW] No request deduplication on frontend
**Where found**
- `frontend/src/hooks/useFetchData.ts` — each call launches new request
- User clicks "Refresh" twice → two identical requests
**Why this is needed**
Duplicates waste bandwidth, cause race conditions (response 2 arrives first, then response 1 overwrites with stale data).
**Goal**
Deduplicate identical in-flight requests.
**What to do**
1. Implement request cache
2. Clear cache entry when response received
3. Use in `useFetchData`
**Possible traps and issues**
- Cache must be cleared on data mutation
- Stale data in cache possible if not careful
**Docs changes needed**
- No documentation changes
**Doc references**
- `frontend/src/hooks/useFetchData.ts`