BanGUI/Docs/Tasks.md at 37078b742be194808988a8b3517de06ded922d26

Files

Lukas 37078b742b Implement structured logging to centralized platforms (Datadog, Papertrail, ELK)

This commit adds support for shipping logs to external centralized logging platforms, addressing the MEDIUM priority task for structured logging infrastructure.

## Key Changes:

### 1. New Documentation: Docs/Observability.md
- Comprehensive guide to logging architecture and configuration
- Covers all three supported platforms (Datadog, Papertrail, Elasticsearch)
- Includes best practices, security considerations, and troubleshooting
- Documents sensitive data handling and compliance requirements

### 2. Core Implementation: app/utils/external_logging.py
- ExternalLogHandler: Abstract base class for non-blocking log delivery
- DatadogLogHandler: HTTP API integration with JSON payloads
- PapertrailLogHandler: Syslog protocol over TCP
- ElasticsearchLogHandler: Bulk API integration with NDJSON format
- Features:
- Async buffering with configurable batch size and flush interval
- Exponential backoff retry logic
- Non-blocking delivery (never blocks application logic)
- Proper error handling and internal logging
- Lifecycle management (start/shutdown)

### 3. Configuration: app/config.py
- New Settings fields for external logging:
- external_logging_enabled (default: False)
- external_logging_provider (datadog/papertrail/elasticsearch)
- external_logging_buffer_size (default: 1000)
- external_logging_flush_interval_seconds (default: 5.0)
- Provider-specific configuration (API keys, hosts, batch sizes)
- All fields have sensible defaults
- Full field validation and normalization

### 4. Integration: app/main.py
- Global _external_log_handler for application lifecycle
- _external_logging_processor: structlog processor for handler integration
- Updated _configure_logging(): Add handler to processor chain when enabled
- Updated _lifespan(): Initialize handler before startup, shutdown on termination

### 5. Tests: backend/tests/test_external_logging.py
- 20 comprehensive tests covering all handlers and factory
- Configuration validation tests
- All tests passing

## Design Decisions:

1. **Non-blocking Delivery**: External logging never blocks request handling.
Failures are logged locally but don't impact application.

2. **Buffering Strategy**: In-memory buffer with configurable size prevents
unbounded memory growth. When buffer fills, oldest logs are dropped with
a warning.

3. **Retry Logic**: Transient failures (timeouts, 5xx errors) are retried
with exponential backoff. Permanent failures (bad credentials) are logged
and skipped.

4. **Disabled by Default**: External logging is opt-in via environment
variables, maintaining backward compatibility with existing deployments.

5. **Provider Flexibility**: Support for multiple platforms allows users to
choose based on their infrastructure (cloud-native, on-premise, etc).

## Backward Compatibility:

- All new configuration fields have defaults
- External logging disabled by default
- No changes to existing logging behavior unless explicitly configured
- No new required dependencies

## Testing:

- All 20 new tests passing
- Existing tests unaffected (same count of passing tests)
- Configuration validation tested
- Handler creation and lifecycle management tested

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-05-01 18:25:26 +02:00

3.0 KiB

Raw Blame History

[MEDIUM] No structured logging to external system

Where found

Logs only go to stdout/file, no external aggregation

Why this is needed

Can't search across instances, historical logs lost on instance recycle.

Goal

Ship logs to centralized logging platform.

What to do

Short-term: Ensure structlog JSON output is valid (already done)
Long-term: Ship to logging platform (ELK, Datadog, Papertrail)

Possible traps and issues

External logging adds latency
Sensitive data must not be logged
Log volume can be massive

Docs changes needed

Add Docs/Observability.md section on logging

Doc references

Docs/Observability.md (new)

[MEDIUM] No Application Performance Monitoring (APM)

Where found

Backend: no metrics collection, latency tracking
Frontend: no error tracking, performance metrics
No observability into request performance

Why this is needed

Without metrics, blind in production: API slow? Unknown. Which endpoints fail most? Unknown.

Goal

Add comprehensive metrics collection and monitoring.

What to do

Backend metrics:
- Add Prometheus metrics: request count, latency, active requests
- Expose /metrics endpoint
Frontend metrics:
- Page load time, FCP, LCP using web-vitals
- API error rates and latencies
Aggregation:
- Prometheus + Grafana, or Datadog/NewRelic

Possible traps and issues

Metrics collection has performance cost
Cardinality explosion with tags
PII in metrics

Docs changes needed

Add Docs/Observability.md

Doc references

Docs/Observability.md (new)

[LOW] Frontend charts not memoized

Where found

frontend/src/components/TopCountriesPieChart.tsx
frontend/src/components/TopCountriesBarChart.tsx

Why this is needed

Charts re-render on every parent update, Recharts reprocesses 5000+ points.

Goal

Memoize chart components.

What to do

Wrap with React.memo with custom comparison
Ensure data objects are stable

Possible traps and issues

Shallow comparison might not be enough
Memoization has memory cost

Docs changes needed

No documentation changes

Doc references

frontend/src/components/TopCountriesChart.tsx

[LOW] No request deduplication on frontend

Where found

frontend/src/hooks/useFetchData.ts — each call launches new request
User clicks "Refresh" twice → two identical requests

Why this is needed

Duplicates waste bandwidth, cause race conditions (response 2 arrives first, then response 1 overwrites with stale data).

Goal

Deduplicate identical in-flight requests.

What to do

Implement request cache
Clear cache entry when response received
Use in useFetchData

Possible traps and issues

Cache must be cleared on data mutation
Stale data in cache possible if not careful

Docs changes needed

No documentation changes

Doc references

frontend/src/hooks/useFetchData.ts

3.0 KiB Raw Blame History

[MEDIUM] No structured logging to external system

[MEDIUM] No Application Performance Monitoring (APM)

[LOW] Frontend charts not memoized

[LOW] No request deduplication on frontend

3.0 KiB

Raw Blame History