This commit adds support for shipping logs to external centralized logging platforms, addressing the MEDIUM priority task for structured logging infrastructure.
## Key Changes:
### 1. New Documentation: Docs/Observability.md
- Comprehensive guide to logging architecture and configuration
- Covers all three supported platforms (Datadog, Papertrail, Elasticsearch)
- Includes best practices, security considerations, and troubleshooting
- Documents sensitive data handling and compliance requirements
### 2. Core Implementation: app/utils/external_logging.py
- ExternalLogHandler: Abstract base class for non-blocking log delivery
- DatadogLogHandler: HTTP API integration with JSON payloads
- PapertrailLogHandler: Syslog protocol over TCP
- ElasticsearchLogHandler: Bulk API integration with NDJSON format
- Features:
- Async buffering with configurable batch size and flush interval
- Exponential backoff retry logic
- Non-blocking delivery (never blocks application logic)
- Proper error handling and internal logging
- Lifecycle management (start/shutdown)
### 3. Configuration: app/config.py
- New Settings fields for external logging:
- external_logging_enabled (default: False)
- external_logging_provider (datadog/papertrail/elasticsearch)
- external_logging_buffer_size (default: 1000)
- external_logging_flush_interval_seconds (default: 5.0)
- Provider-specific configuration (API keys, hosts, batch sizes)
- All fields have sensible defaults
- Full field validation and normalization
### 4. Integration: app/main.py
- Global _external_log_handler for application lifecycle
- _external_logging_processor: structlog processor for handler integration
- Updated _configure_logging(): Add handler to processor chain when enabled
- Updated _lifespan(): Initialize handler before startup, shutdown on termination
### 5. Tests: backend/tests/test_external_logging.py
- 20 comprehensive tests covering all handlers and factory
- Configuration validation tests
- All tests passing
## Design Decisions:
1. **Non-blocking Delivery**: External logging never blocks request handling.
Failures are logged locally but don't impact application.
2. **Buffering Strategy**: In-memory buffer with configurable size prevents
unbounded memory growth. When buffer fills, oldest logs are dropped with
a warning.
3. **Retry Logic**: Transient failures (timeouts, 5xx errors) are retried
with exponential backoff. Permanent failures (bad credentials) are logged
and skipped.
4. **Disabled by Default**: External logging is opt-in via environment
variables, maintaining backward compatibility with existing deployments.
5. **Provider Flexibility**: Support for multiple platforms allows users to
choose based on their infrastructure (cloud-native, on-premise, etc).
## Backward Compatibility:
- All new configuration fields have defaults
- External logging disabled by default
- No changes to existing logging behavior unless explicitly configured
- No new required dependencies
## Testing:
- All 20 new tests passing
- Existing tests unaffected (same count of passing tests)
- Configuration validation tested
- Handler creation and lifecycle management tested
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>