Implement structured logging to centralized platforms (Datadog, Papertrail, ELK)

This commit adds support for shipping logs to external centralized logging platforms, addressing the MEDIUM priority task for structured logging infrastructure.

## Key Changes:

### 1. New Documentation: Docs/Observability.md
- Comprehensive guide to logging architecture and configuration
- Covers all three supported platforms (Datadog, Papertrail, Elasticsearch)
- Includes best practices, security considerations, and troubleshooting
- Documents sensitive data handling and compliance requirements

### 2. Core Implementation: app/utils/external_logging.py
- ExternalLogHandler: Abstract base class for non-blocking log delivery
- DatadogLogHandler: HTTP API integration with JSON payloads
- PapertrailLogHandler: Syslog protocol over TCP
- ElasticsearchLogHandler: Bulk API integration with NDJSON format
- Features:
  - Async buffering with configurable batch size and flush interval
  - Exponential backoff retry logic
  - Non-blocking delivery (never blocks application logic)
  - Proper error handling and internal logging
  - Lifecycle management (start/shutdown)

### 3. Configuration: app/config.py
- New Settings fields for external logging:
  - external_logging_enabled (default: False)
  - external_logging_provider (datadog/papertrail/elasticsearch)
  - external_logging_buffer_size (default: 1000)
  - external_logging_flush_interval_seconds (default: 5.0)
  - Provider-specific configuration (API keys, hosts, batch sizes)
- All fields have sensible defaults
- Full field validation and normalization

### 4. Integration: app/main.py
- Global _external_log_handler for application lifecycle
- _external_logging_processor: structlog processor for handler integration
- Updated _configure_logging(): Add handler to processor chain when enabled
- Updated _lifespan(): Initialize handler before startup, shutdown on termination

### 5. Tests: backend/tests/test_external_logging.py
- 20 comprehensive tests covering all handlers and factory
- Configuration validation tests
- All tests passing

## Design Decisions:

1. **Non-blocking Delivery**: External logging never blocks request handling. Failures are logged locally but don't impact the application.
2. **Buffering Strategy**: An in-memory buffer with configurable size prevents unbounded memory growth. When the buffer fills, the oldest logs are dropped with a warning.
3. **Retry Logic**: Transient failures (timeouts, 5xx errors) are retried with exponential backoff. Permanent failures (bad credentials) are logged and skipped.
4. **Disabled by Default**: External logging is opt-in via environment variables, maintaining backward compatibility with existing deployments.
5. **Provider Flexibility**: Support for multiple platforms allows users to choose based on their infrastructure (cloud-native, on-premise, etc.).

## Backward Compatibility:
- All new configuration fields have defaults
- External logging disabled by default
- No changes to existing logging behavior unless explicitly configured
- No new required dependencies

## Testing:
- All 20 new tests passing
- Existing tests unaffected (same count of passing tests)
- Configuration validation tested
- Handler creation and lifecycle management tested

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
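Design decisions 2 and 3 (a bounded buffer that drops the oldest records, and exponential backoff applied only to transient failures) can be illustrated with a minimal, synchronous Python sketch. It is not the code from app/utils/external_logging.py, which this excerpt does not show: BoundedLogBuffer, TransientError and send_with_backoff are hypothetical names, and the jitter factor and delay cap are assumptions.

```python
import collections
import random
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, HTTP 5xx)."""


class BoundedLogBuffer:
    """Hypothetical fixed-size buffer that drops the oldest record when full."""

    def __init__(self, max_size: int) -> None:
        # deque(maxlen=...) silently discards from the left (oldest) on overflow.
        self._records: collections.deque[dict] = collections.deque(maxlen=max_size)
        self.dropped = 0  # callers can emit a local warning when this grows

    def append(self, record: dict) -> None:
        if len(self._records) == self._records.maxlen:
            self.dropped += 1
        self._records.append(record)

    def take_batch(self, batch_size: int) -> list[dict]:
        # Pop up to batch_size records, oldest first.
        batch: list[dict] = []
        while self._records and len(batch) < batch_size:
            batch.append(self._records.popleft())
        return batch

    def __len__(self) -> int:
        return len(self._records)


def send_with_backoff(
    send: Callable[[list[dict]], None],
    batch: list[dict],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical retry loop: returns True on success, False if the batch is given up on.

    TransientError is retried with exponential backoff plus jitter; any other
    exception (e.g. bad credentials) is treated as permanent and skipped at once,
    so delivery problems never propagate to the caller.
    """
    for attempt in range(max_attempts):
        try:
            send(batch)
            return True
        except TransientError:
            if attempt == max_attempts - 1:
                return False
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out retries
        except Exception:
            return False
    return False
```

Injecting `sleep` keeps the retry loop testable; a real handler would run it off the request path (for example in a background task) so that retries never delay request handling.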
2026-05-01 18:25:26 +02:00
parent 60d9c5b340
commit 37078b742b
6 changed files with 1383 additions and 53 deletions
--- a/backend/app/config.py
+++ b/backend/app/config.py
@@ -280,6 +280,124 @@ class Settings(BaseSettings):
            ) from e
        return value

+    external_logging_enabled: bool = Field(
+        default=False,
+        description=(
+            "Enable sending logs to an external centralized logging platform. "
+            "When disabled (default), logs are written to stdout only. "
+            "When enabled, set external_logging_provider and provider-specific settings."
+        ),
+    )
+    external_logging_provider: Literal["datadog", "papertrail", "elasticsearch"] | None = Field(
+        default=None,
+        description=(
+            "External logging platform provider. "
+            "Set to 'datadog', 'papertrail', or 'elasticsearch'. "
+            "Only used when external_logging_enabled is true."
+        ),
+    )
+    external_logging_buffer_size: int = Field(
+        default=1000,
+        ge=10,
+        description=(
+            "Maximum number of log records buffered in memory; when full, the oldest are dropped. "
+            "Prevents unbounded memory growth if the external system is temporarily unavailable."
+        ),
+    )
+    external_logging_flush_interval_seconds: float = Field(
+        default=5.0,
+        gt=0.0,
+        description=(
+            "Maximum time in seconds to buffer logs before sending to the external system. "
+            "Logs are sent earlier if the batch size is reached."
+        ),
+    )
+    datadog_api_key: str | None = Field(
+        default=None,
+        description=(
+            "Datadog API key for sending logs. Required when external_logging_provider is 'datadog'. "
+            "Obtain from Datadog organization settings."
+        ),
+    )
+    datadog_site: str = Field(
+        default="datadoghq.com",
+        description=(
+            "Datadog site for your organization, e.g. 'datadoghq.com' (US1) or 'datadoghq.eu' (EU1). "
+            "Only used when external_logging_provider is 'datadog'."
+        ),
+    )
+    datadog_batch_size: int = Field(
+        default=10,
+        ge=1,
+        description=(
+            "Number of log records to batch before sending to Datadog. "
+            "Smaller batches send logs faster; larger batches are more efficient."
+        ),
+    )
+    papertrail_host: str | None = Field(
+        default=None,
+        description=(
+            "Papertrail host address (e.g., 'logs1.papertrailapp.com'). "
+            "Required when external_logging_provider is 'papertrail'."
+        ),
+    )
+    papertrail_port: int | None = Field(
+        default=None,
+        ge=1,
+        le=65535,
+        description=(
+            "Papertrail port number. Required when external_logging_provider is 'papertrail'. "
+            "Use the port assigned to your Papertrail log destination."
+        ),
+    )
+    papertrail_program_name: str = Field(
+        default="bangui",
+        description=(
+            "Program name included in syslog messages sent to Papertrail. "
+            "Useful for filtering logs by program in the Papertrail UI."
+        ),
+    )
+    elasticsearch_hosts: str | list[str] = Field(
+        default_factory=list,
+        description=(
+            "Elasticsearch host addresses. Can be comma-separated string or list. "
+            "Examples: 'http://elasticsearch:9200' or 'http://es1:9200,http://es2:9200'. "
+            "Required when external_logging_provider is 'elasticsearch'."
+        ),
+    )
+    elasticsearch_index_prefix: str = Field(
+        default="bangui",
+        description=(
+            "Prefix for Elasticsearch indices where logs are stored. "
+            "Index names are derived from this prefix (for example, '{prefix}-{date}')."
+        ),
+    )
+    elasticsearch_batch_size: int = Field(
+        default=10,
+        ge=1,
+        description=(
+            "Number of log documents to batch before sending to Elasticsearch. "
+            "Larger batches are more efficient but introduce slight latency."
+        ),
+    )
+
+    @field_validator("elasticsearch_hosts", mode="before")
+    @classmethod
+    def _normalize_elasticsearch_hosts(cls, value: str | list[str] | None) -> list[str]:
+        """Normalize elasticsearch_hosts from comma-separated string to list.
+
+        Args:
+            value: A comma-separated string or list of host URLs.
+
+        Returns:
+            A list of normalized host URLs.
+        """
+        if value is None:
+            return []
+        if isinstance(value, str):
+            value = value.split(",")
+        return [host.strip() for host in value if host.strip()]
+
    model_config = SettingsConfigDict(
        env_prefix="BANGUI_",
        env_file=".env",