Implement structured logging to centralized platforms (Datadog, Papertrail, ELK)

This commit adds support for shipping logs to external centralized logging platforms, addressing the MEDIUM priority task for structured logging infrastructure.

## Key Changes:

### 1. New Documentation: Docs/Observability.md
- Comprehensive guide to logging architecture and configuration
- Covers all three supported platforms (Datadog, Papertrail, Elasticsearch)
- Includes best practices, security considerations, and troubleshooting
- Documents sensitive data handling and compliance requirements

### 2. Core Implementation: app/utils/external_logging.py
- ExternalLogHandler: Abstract base class for non-blocking log delivery
- DatadogLogHandler: HTTP API integration with JSON payloads
- PapertrailLogHandler: Syslog protocol over TCP
- ElasticsearchLogHandler: Bulk API integration with NDJSON format
- Features:
  - Async buffering with configurable batch size and flush interval
  - Exponential backoff retry logic
  - Non-blocking delivery (never blocks application logic)
  - Proper error handling and internal logging
  - Lifecycle management (start/shutdown)
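
The retry behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code in external_logging.py: the exception classes, `send_with_backoff`, and its parameters are hypothetical names, and the delays are assumed values.

```python
import time
from typing import Callable


class TransientDeliveryError(Exception):
    """Hypothetical marker for retryable failures (timeouts, 5xx responses)."""


class PermanentDeliveryError(Exception):
    """Hypothetical marker for non-retryable failures (e.g. bad credentials)."""


def send_with_backoff(
    send: Callable[[], None],
    max_attempts: int = 4,      # assumed value, not taken from the commit
    base_delay: float = 0.5,    # assumed value, in seconds
    max_delay: float = 30.0,    # assumed cap, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() until it succeeds or attempts run out; return True on success."""
    for attempt in range(max_attempts):
        try:
            send()
            return True
        except PermanentDeliveryError:
            # Retrying cannot help, so this batch is abandoned immediately.
            return False
        except TransientDeliveryError:
            if attempt == max_attempts - 1:
                break
            # The delay doubles per retry (base, 2*base, 4*base, ...) up to max_delay.
            sleep(min(max_delay, base_delay * (2 ** attempt)))
    return False
```

In a real handler this loop would run on the background delivery thread or task, so the sleeps never touch request handling. The `sleep` parameter is injectable only to make the sketch easy to test.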

### 3. Configuration: app/config.py
- New Settings fields for external logging:
  - external_logging_enabled (default: False)
  - external_logging_provider (datadog/papertrail/elasticsearch)
  - external_logging_buffer_size (default: 1000)
  - external_logging_flush_interval_seconds (default: 5.0)
  - Provider-specific configuration (API keys, hosts, batch sizes)
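- Every field has a default; provider-specific required values (Datadog API key, Papertrail host and port, Elasticsearch hosts) default to unset
- Range validation (buffer size >= 10, flush interval > 0, batch sizes >= 1, port 1-65535) and normalization of elasticsearch_hosts from a comma-separated string to a list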
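
As a rough illustration of how these settings might be supplied and checked, the stdlib-only sketch below mimics the BANGUI_ environment prefix and the elasticsearch_hosts normalization. `settings_from_env`, `missing_provider_settings`, and `REQUIRED_BY_PROVIDER` are hypothetical helpers written for this example; the real Settings class relies on pydantic-settings, and the cross-field check shown here is an assumption based on the field descriptions, not something this diff contains.

```python
ENV_PREFIX = "BANGUI_"

# Assumption: derived from the "Required when ..." notes in the field descriptions.
REQUIRED_BY_PROVIDER = {
    "datadog": ("datadog_api_key",),
    "papertrail": ("papertrail_host", "papertrail_port"),
    "elasticsearch": ("elasticsearch_hosts",),
}


def normalize_hosts(value):
    """Turn a comma-separated string (or list, or None) into a list of hosts."""
    if value is None:
        return []
    if isinstance(value, str):
        return [host.strip() for host in value.split(",") if host.strip()]
    return list(value)


def settings_from_env(env: dict) -> dict:
    """Simplified stand-in for pydantic-settings: strip the prefix, lowercase names."""
    raw = {
        key[len(ENV_PREFIX):].lower(): value
        for key, value in env.items()
        if key.startswith(ENV_PREFIX)
    }
    if "external_logging_enabled" in raw:
        raw["external_logging_enabled"] = (
            raw["external_logging_enabled"].lower() in ("1", "true", "yes", "on")
        )
    if "elasticsearch_hosts" in raw:
        raw["elasticsearch_hosts"] = normalize_hosts(raw["elasticsearch_hosts"])
    return raw


def missing_provider_settings(settings: dict) -> list:
    """Return the names of settings the selected provider still needs."""
    if not settings.get("external_logging_enabled"):
        return []
    provider = settings.get("external_logging_provider")
    if provider is None:
        return ["external_logging_provider"]
    return [name for name in REQUIRED_BY_PROVIDER[provider] if not settings.get(name)]


example_env = {
    "BANGUI_EXTERNAL_LOGGING_ENABLED": "true",
    "BANGUI_EXTERNAL_LOGGING_PROVIDER": "elasticsearch",
    "BANGUI_ELASTICSEARCH_HOSTS": "http://es1:9200, http://es2:9200",
}
settings = settings_from_env(example_env)
print(settings["elasticsearch_hosts"])
print(missing_provider_settings(settings))
```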

### 4. Integration: app/main.py
- Global _external_log_handler for application lifecycle
- _external_logging_processor: structlog processor for handler integration
- Updated _configure_logging(): Add handler to processor chain when enabled
- Updated _lifespan(): Initialize the handler before startup and shut it down on termination
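
A structlog processor is a callable that receives `(logger, method_name, event_dict)` and returns the event dict for the next processor. The sketch below shows one plausible shape for a processor that hands records to an external handler. `RecordingHandler`, `enqueue`, and `make_external_logging_processor` are hypothetical names, and the actual _external_logging_processor in app/main.py may differ.

```python
from typing import Any, Callable, MutableMapping

EventDict = MutableMapping[str, Any]


class RecordingHandler:
    """Hypothetical stand-in for the external handler; it only stores records."""

    def __init__(self) -> None:
        self.records: list = []

    def enqueue(self, record: dict) -> None:
        self.records.append(record)


def make_external_logging_processor(handler) -> Callable[[Any, str, EventDict], EventDict]:
    """Build a processor that forwards a copy of each event to the handler."""

    def processor(logger: Any, method_name: str, event_dict: EventDict) -> EventDict:
        try:
            # Copy the dict so later processors cannot mutate the queued record.
            handler.enqueue(dict(event_dict))
        except Exception:
            # Shipping logs must never break local logging.
            pass
        # Return the event unchanged so the rest of the chain still runs.
        return event_dict

    return processor
```

Such a processor would need to sit before the final renderer in the chain, because the renderer turns the event dict into a string.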

### 5. Tests: backend/tests/test_external_logging.py
- 20 comprehensive tests covering all handlers and factory
- Configuration validation tests
- All tests passing

## Design Decisions:

1. **Non-blocking Delivery**: External logging never blocks request handling.
   Delivery failures are logged locally and do not affect the application.

2. **Buffering Strategy**: An in-memory buffer with a configurable size prevents
   unbounded memory growth. When the buffer fills, the oldest logs are dropped
   with a warning.

3. **Retry Logic**: Transient failures (timeouts, 5xx errors) are retried
   with exponential backoff. Permanent failures (bad credentials) are logged
   and skipped.

4. **Disabled by Default**: External logging is opt-in via environment
   variables, maintaining backward compatibility with existing deployments.

5. **Provider Flexibility**: Support for multiple platforms lets users
   choose based on their infrastructure (cloud-native, on-premise, etc.).
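
The drop-oldest buffering described in decision 2 can be sketched with a bounded deque. This is an illustrative sketch only: `BoundedLogBuffer`, `take_batch`, and the `dropped` counter are hypothetical names, and the real handler may track and report drops differently.

```python
import threading
from collections import deque


class BoundedLogBuffer:
    """Thread-safe buffer that evicts the oldest record once max_size is reached."""

    def __init__(self, max_size: int = 1000) -> None:
        self._records = deque(maxlen=max_size)
        self._lock = threading.Lock()
        self.dropped = 0  # number of records evicted so far

    def append(self, record: dict) -> None:
        with self._lock:
            if len(self._records) == self._records.maxlen:
                # A full deque silently discards its oldest item on append,
                # so count the eviction before it happens.
                self.dropped += 1
            self._records.append(record)

    def take_batch(self, batch_size: int) -> list:
        """Remove and return up to batch_size records, oldest first."""
        with self._lock:
            count = min(batch_size, len(self._records))
            return [self._records.popleft() for _ in range(count)]
```

A delivery loop could call `take_batch` whenever the batch size is reached or the flush interval elapses, and emit a local warning whenever `dropped` increases.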

## Backward Compatibility:

- All new configuration fields have defaults
- External logging disabled by default
- No changes to existing logging behavior unless explicitly configured
- No new required dependencies

## Testing:

- All 20 new tests passing
- Existing tests unaffected (same count of passing tests)
- Configuration validation tested
- Handler creation and lifecycle management tested

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-01 18:25:26 +02:00
parent 60d9c5b340
commit 37078b742b
6 changed files with 1383 additions and 53 deletions


@@ -280,6 +280,124 @@ class Settings(BaseSettings):
            ) from e
        return value

    external_logging_enabled: bool = Field(
        default=False,
        description=(
            "Enable sending logs to an external centralized logging platform. "
            "When disabled (default), logs are written to stdout only. "
            "When enabled, set external_logging_provider and provider-specific settings."
        ),
    )

    external_logging_provider: Literal["datadog", "papertrail", "elasticsearch"] | None = Field(
        default=None,
        description=(
            "External logging platform provider. "
            "Set to 'datadog', 'papertrail', or 'elasticsearch'. "
            "Only used when external_logging_enabled is true."
        ),
    )

    external_logging_buffer_size: int = Field(
        default=1000,
        ge=10,
        description=(
            "Maximum number of log records to buffer in memory before dropping oldest logs. "
            "Prevents unbounded memory growth if the external system is temporarily unavailable."
        ),
    )

    external_logging_flush_interval_seconds: float = Field(
        default=5.0,
        gt=0.0,
        description=(
            "Maximum time in seconds to buffer logs before sending to the external system. "
            "Logs are sent earlier if the batch size is reached."
        ),
    )

    datadog_api_key: str | None = Field(
        default=None,
        description=(
            "Datadog API key for sending logs. Required when external_logging_provider is 'datadog'. "
            "Obtain from Datadog organization settings."
        ),
    )

    datadog_site: str = Field(
        default="datadoghq.com",
        description=(
            "Datadog site: 'datadoghq.com' for US or 'datadoghq.eu' for EU. "
            "Only used when external_logging_provider is 'datadog'."
        ),
    )

    datadog_batch_size: int = Field(
        default=10,
        ge=1,
        description=(
            "Number of log records to batch before sending to Datadog. "
            "Smaller batches send logs faster; larger batches are more efficient."
        ),
    )

    papertrail_host: str | None = Field(
        default=None,
        description=(
            "Papertrail host address (e.g., 'logs1.papertrailapp.com'). "
            "Required when external_logging_provider is 'papertrail'."
        ),
    )

    papertrail_port: int | None = Field(
        default=None,
        ge=1,
        le=65535,
        description=(
            "Papertrail port number. Required when external_logging_provider is 'papertrail'. "
            "Typically 12345 or in range 10000-32768."
        ),
    )

    papertrail_program_name: str = Field(
        default="bangui",
        description=(
            "Program name to include in Syslog messages sent to Papertrail. "
            "Useful for filtering logs by program in Papertrail UI."
        ),
    )

    elasticsearch_hosts: str | list[str] = Field(
        default_factory=list,
        description=(
            "Elasticsearch host addresses. Can be comma-separated string or list. "
            "Examples: 'http://elasticsearch:9200' or 'http://es1:9200,http://es2:9200'. "
            "Required when external_logging_provider is 'elasticsearch'."
        ),
    )

    elasticsearch_index_prefix: str = Field(
        default="bangui",
        description=(
            "Prefix for Elasticsearch indices where logs are stored. "
            "Final index names will be '{prefix}-{date}' or similar."
        ),
    )

    elasticsearch_batch_size: int = Field(
        default=10,
        ge=1,
        description=(
            "Number of log documents to batch before sending to Elasticsearch. "
            "Larger batches are more efficient but introduce slight latency."
        ),
    )

    @field_validator("elasticsearch_hosts", mode="before")
    @classmethod
    def _normalize_elasticsearch_hosts(cls, value: str | list[str] | None) -> list[str]:
        """Normalize elasticsearch_hosts from comma-separated string to list.

        Args:
            value: A comma-separated string or list of host URLs.

        Returns:
            A list of normalized host URLs.
        """
        if value is None or (isinstance(value, list) and len(value) == 0):
            return []
        if isinstance(value, str):
            return [host.strip() for host in value.split(",") if host.strip()]
        return value

    model_config = SettingsConfigDict(
        env_prefix="BANGUI_",
        env_file=".env",