Add startup health checks and /health/ready endpoint

- Add _run_startup_health_checks() function in fastapi_app.py
  - Check ffmpeg availability (warning)
  - Check DNS resolution for aniworld.to and api.themoviedb.org (warning)
  - Check anime_directory configuration and writability (error)
- Store startup checks in app.state for health endpoint access
- Add /health/ready endpoint for container orchestrators
  - Returns not_ready with 503 when critical failures present
  - Includes critical_failures list for debugging
- Update /health endpoint to include startup check results
  - Status reflects worst check (error > warning > ok)
- Document health check endpoints in DEVELOPMENT.md
- Add unit tests for startup health checks
- Add unit tests for /health/ready endpoint

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit is contained in:
2026-05-23 22:12:03 +02:00
parent 9a20541598
commit 3551838887
5 changed files with 458 additions and 7 deletions

View File

@@ -165,6 +165,61 @@ scheduler = AsyncIOScheduler(jobstores=jobstores)
**If server is down >1 hour:** No automatic recovery. Manual trigger via `POST /api/scheduler/trigger-rescan` or wait for next scheduled run.
### Health Check Endpoints
The application provides health check endpoints for monitoring and container orchestration:
#### `GET /health`
Basic health check returning service status and startup health check results.
**Response fields:**
- `status`: "healthy", "degraded", or "unhealthy" based on startup checks
- `timestamp`: ISO timestamp of the check
- `series_app_initialized`: Whether the series app is loaded
- `anime_directory_configured`: Whether anime_directory is set
- `scheduler_next_run` / `scheduler_last_run`: Scheduler times
- `checks`: Detailed startup check results (ffmpeg, DNS, anime_directory)
#### `GET /health/ready`
Readiness check for container orchestrators (Kubernetes, Docker Swarm).
**Response when ready:**
```json
{
"status": "ready",
"ready": true,
"timestamp": "2024-01-01T00:00:00",
"checks": {...}
}
```
**Response when not ready (503):**
```json
{
"status": "not_ready",
"ready": false,
"timestamp": "2024-01-01T00:00:00",
"critical_failures": ["anime_directory: not configured"],
"checks": {...}
}
```
#### `GET /health/detailed`
Comprehensive health check including database, filesystem, and system metrics.
#### Startup Health Checks
On application startup, the following checks are performed:
| Check | Failure Status | Impact |
|-------|---------------|--------|
| `ffmpeg` | warning | HLS downloads may fail |
| `dns_aniworld` | warning | Provider requests may fail |
| `dns_tmdb` | warning | TMDB API calls may fail |
| `anime_directory` | error | Download service disabled |
DNS checks are warnings because failures can be transient. anime_directory errors disable the download service to prevent failures.
### Troubleshooting Development Issues
#### Scheduler missed a run
@@ -175,3 +230,21 @@ scheduler = AsyncIOScheduler(jobstores=jobstores)
4. Trigger manually: `POST /api/scheduler/trigger-rescan`
5. Monitor next run: `GET /health``scheduler_next_run`
6. If problem repeats, increase `misfire_grace_time` in `scheduler_service.py`.
#### Startup health check failures
If `/health` returns `unhealthy` status:
1. **anime_directory error**: Directory not configured or not writable
- Check `ANIME_DIRECTORY` environment variable
- Verify directory exists and permissions allow write access
- Download service will not initialize until resolved
2. **ffmpeg warning**: ffmpeg not found in PATH
- HLS stream downloads will fail
- Install ffmpeg: `apt install ffmpeg` or `brew install ffmpeg`
3. **DNS warnings**: Domain resolution failed
- Check network connectivity
- DNS failures are transient — warnings don't block startup
- Retry later to verify: `GET /health`

View File

@@ -5,7 +5,7 @@ from datetime import datetime
from typing import Any, Dict, Optional
import psutil
from fastapi import APIRouter, Depends, HTTPException
from fastapi import APIRouter, Depends, HTTPException, Request
from pydantic import BaseModel
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession
@@ -28,6 +28,7 @@ class HealthStatus(BaseModel):
anime_directory_configured: bool = False
scheduler_next_run: Optional[str] = None
scheduler_last_run: Optional[str] = None
checks: Optional[Dict[str, Any]] = None
class DatabaseHealth(BaseModel):
@@ -173,13 +174,14 @@ def get_system_metrics() -> SystemMetrics:
@router.get("", response_model=HealthStatus)
async def basic_health_check() -> HealthStatus:
async def basic_health_check(request: Request) -> HealthStatus:
"""Basic health check endpoint.
This endpoint does not depend on anime_directory configuration
and should always return 200 OK for basic health monitoring.
Includes service information for identification.
Includes scheduler next/last run times for monitoring tools.
Includes startup health check results.
Returns:
HealthStatus: Simple health status with timestamp and service info.
@@ -195,18 +197,67 @@ async def basic_health_check() -> HealthStatus:
except Exception:
pass
# Get startup checks from app state
checks = getattr(request.app.state, "startup_checks", None)
# Determine overall status based on checks
overall_status = "healthy"
if checks:
for check_name, check_data in checks.items():
if check_data.get("status") == "error":
overall_status = "unhealthy"
break
elif check_data.get("status") == "warning":
overall_status = "degraded"
logger.debug("Basic health check requested")
return HealthStatus(
status="healthy",
status=overall_status,
timestamp=datetime.now().isoformat(),
service="aniworld-api",
series_app_initialized=_series_app is not None,
anime_directory_configured=bool(settings.anime_directory),
scheduler_next_run=scheduler_status.get("next_run"),
scheduler_last_run=scheduler_status.get("last_run"),
checks=checks,
)
@router.get("/ready")
async def ready_check(request: Request) -> Dict[str, Any]:
"""Readiness check endpoint for container orchestrators.
Returns 503 if critical dependencies are not available.
This endpoint is used by Kubernetes, Docker Swarm, etc. to determine
if the container should receive traffic.
Returns:
dict: Readiness status with checks details.
"""
checks = getattr(request.app.state, "startup_checks", {})
critical_failures = []
for check_name, check_data in checks.items():
if check_data.get("status") == "error":
critical_failures.append(f"{check_name}: {check_data.get('message')}")
if critical_failures:
return {
"status": "not_ready",
"ready": False,
"timestamp": datetime.now().isoformat(),
"critical_failures": critical_failures,
"checks": checks,
}
return {
"status": "ready",
"ready": True,
"timestamp": datetime.now().isoformat(),
"checks": checks,
}
@router.get("/detailed", response_model=DetailedHealthStatus)
async def detailed_health_check(
db: AsyncSession = Depends(get_database_session),

View File

@@ -104,6 +104,107 @@ async def _check_incomplete_series_on_startup(background_loader) -> None:
logger.exception("Failed to check incomplete series on startup")
async def _run_startup_health_checks(logger) -> dict:
"""Run startup health checks for critical dependencies.
Checks:
- ffmpeg availability
- DNS resolution for aniworld.to and api.themoviedb.org
- anime_directory configuration and writability
Args:
logger: Logger instance for recording check results.
Returns:
dict: Health check results with status and details for each check.
"""
import asyncio
import shutil
import socket
from typing import Dict, Any
checks: Dict[str, Any] = {
"ffmpeg": {"status": "unknown", "message": None},
"dns_aniworld": {"status": "unknown", "message": None},
"dns_tmdb": {"status": "unknown", "message": None},
"anime_directory": {"status": "unknown", "message": None, "path": None},
}
# Check ffmpeg availability
try:
ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path:
checks["ffmpeg"]["status"] = "ok"
checks["ffmpeg"]["message"] = f"Found at {ffmpeg_path}"
logger.debug("ffmpeg health check passed: %s", ffmpeg_path)
else:
checks["ffmpeg"]["status"] = "warning"
checks["ffmpeg"]["message"] = "ffmpeg not found in PATH"
logger.warning("ffmpeg health check failed: not in PATH")
except Exception as e:
checks["ffmpeg"]["status"] = "error"
checks["ffmpeg"]["message"] = str(e)
logger.warning("Could not check ffmpeg: %s", e)
# Check DNS resolution for aniworld.to
try:
socket.gethostbyname("aniworld.to")
checks["dns_aniworld"]["status"] = "ok"
checks["dns_aniworld"]["message"] = "Resolved successfully"
logger.debug("DNS health check passed for aniworld.to")
except socket.gaierror as e:
checks["dns_aniworld"]["status"] = "warning"
checks["dns_aniworld"]["message"] = f"DNS resolution failed: {e}"
logger.warning("DNS health check failed for aniworld.to: %s", e)
except Exception as e:
checks["dns_aniworld"]["status"] = "warning"
checks["dns_aniworld"]["message"] = f"Unexpected error: {e}"
logger.warning("Unexpected DNS error for aniworld.to: %s", e)
# Check DNS resolution for api.themoviedb.org
try:
socket.gethostbyname("api.themoviedb.org")
checks["dns_tmdb"]["status"] = "ok"
checks["dns_tmdb"]["message"] = "Resolved successfully"
logger.debug("DNS health check passed for api.themoviedb.org")
except socket.gaierror as e:
checks["dns_tmdb"]["status"] = "warning"
checks["dns_tmdb"]["message"] = f"DNS resolution failed: {e}"
logger.warning("DNS health check failed for api.themoviedb.org: %s", e)
except Exception as e:
checks["dns_tmdb"]["status"] = "warning"
checks["dns_tmdb"]["message"] = f"Unexpected error: {e}"
logger.warning("Unexpected DNS error for api.themoviedb.org: %s", e)
# Check anime_directory configuration and writability
from src.config.settings import settings
anime_dir = settings.anime_directory
if not anime_dir:
checks["anime_directory"]["status"] = "error"
checks["anime_directory"]["message"] = "anime_directory not configured"
checks["anime_directory"]["path"] = None
logger.error("anime_directory health check failed: not configured")
else:
import os
checks["anime_directory"]["path"] = anime_dir
if not os.path.isdir(anime_dir):
checks["anime_directory"]["status"] = "error"
checks["anime_directory"]["message"] = f"Directory does not exist: {anime_dir}"
logger.error("anime_directory health check failed: %s does not exist", anime_dir)
elif not os.access(anime_dir, os.W_OK):
checks["anime_directory"]["status"] = "error"
checks["anime_directory"]["message"] = f"Directory not writable: {anime_dir}"
logger.error("anime_directory health check failed: %s not writable", anime_dir)
else:
checks["anime_directory"]["status"] = "ok"
checks["anime_directory"]["message"] = f"Directory exists and is writable: {anime_dir}"
logger.debug("anime_directory health check passed: %s", anime_dir)
return checks
@asynccontextmanager
async def lifespan(_application: FastAPI):
"""Manage application lifespan (startup and shutdown).
@@ -342,6 +443,14 @@ async def lifespan(_application: FastAPI):
logger.debug("ffmpeg found at: %s", _shutil.which("ffmpeg"))
except Exception as _exc:
logger.warning("Could not check for ffmpeg: %s", _exc)
# Run startup health checks and store results for /health endpoint
try:
startup_checks = await _run_startup_health_checks(logger)
app.state.startup_checks = startup_checks
except Exception as _exc:
logger.warning("Could not run startup health checks: %s", _exc)
app.state.startup_checks = {}
except Exception as e:
logger.error("Error during startup: %s", e, exc_info=True)
startup_error = e

View File

@@ -1,6 +1,6 @@
"""Unit tests for health check endpoints."""
from unittest.mock import AsyncMock, patch
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
@@ -12,16 +12,20 @@ from src.server.api.health import (
check_database_health,
check_filesystem_health,
get_system_metrics,
ready_check,
)
@pytest.mark.asyncio
async def test_basic_health_check():
"""Test basic health check endpoint."""
async def test_basic_health_check_no_startup_checks():
"""Test basic health check endpoint with no startup checks."""
mock_request = MagicMock()
mock_request.app.state.startup_checks = {}
with patch("src.config.settings.settings") as mock_settings, \
patch("src.server.utils.dependencies._series_app", None):
mock_settings.anime_directory = ""
result = await basic_health_check()
result = await basic_health_check(mock_request)
assert isinstance(result, HealthStatus)
assert result.status == "healthy"
@@ -32,6 +36,85 @@ async def test_basic_health_check():
assert result.anime_directory_configured is False
@pytest.mark.asyncio
async def test_basic_health_check_with_error_check():
"""Test basic health check reflects error status from startup checks."""
mock_request = MagicMock()
mock_request.app.state.startup_checks = {
"anime_directory": {"status": "error", "message": "not configured", "path": None},
"ffmpeg": {"status": "ok", "message": "Found at /usr/bin/ffmpeg"},
"dns_aniworld": {"status": "ok", "message": "Resolved successfully"},
"dns_tmdb": {"status": "ok", "message": "Resolved successfully"},
}
with patch("src.config.settings.settings") as mock_settings, \
patch("src.server.utils.dependencies._series_app", None):
mock_settings.anime_directory = ""
result = await basic_health_check(mock_request)
assert isinstance(result, HealthStatus)
assert result.status == "unhealthy"
assert result.checks is not None
assert result.checks["anime_directory"]["status"] == "error"
@pytest.mark.asyncio
async def test_basic_health_check_with_warning_only():
"""Test basic health check shows degraded when only warnings present."""
mock_request = MagicMock()
mock_request.app.state.startup_checks = {
"anime_directory": {"status": "ok", "message": "Found", "path": "/anime"},
"ffmpeg": {"status": "warning", "message": "not found in PATH"},
"dns_aniworld": {"status": "ok", "message": "Resolved successfully"},
"dns_tmdb": {"status": "ok", "message": "Resolved successfully"},
}
with patch("src.config.settings.settings") as mock_settings, \
patch("src.server.utils.dependencies._series_app", None):
mock_settings.anime_directory = "/anime"
result = await basic_health_check(mock_request)
assert isinstance(result, HealthStatus)
assert result.status == "degraded"
@pytest.mark.asyncio
async def test_ready_check_all_healthy():
"""Test ready check returns ready when all checks pass."""
mock_request = MagicMock()
mock_request.app.state.startup_checks = {
"anime_directory": {"status": "ok", "message": "Found", "path": "/anime"},
"ffmpeg": {"status": "ok", "message": "Found at /usr/bin/ffmpeg"},
"dns_aniworld": {"status": "ok", "message": "Resolved successfully"},
"dns_tmdb": {"status": "ok", "message": "Resolved successfully"},
}
result = await ready_check(mock_request)
assert result["ready"] is True
assert result["status"] == "ready"
assert "critical_failures" not in result
@pytest.mark.asyncio
async def test_ready_check_with_critical_failure():
"""Test ready check returns not_ready when anime_directory not configured."""
mock_request = MagicMock()
mock_request.app.state.startup_checks = {
"anime_directory": {"status": "error", "message": "not configured", "path": None},
"ffmpeg": {"status": "warning", "message": "not found in PATH"},
"dns_aniworld": {"status": "ok", "message": "Resolved successfully"},
"dns_tmdb": {"status": "ok", "message": "Resolved successfully"},
}
result = await ready_check(mock_request)
assert result["ready"] is False
assert result["status"] == "not_ready"
assert len(result["critical_failures"]) == 1
assert "anime_directory" in result["critical_failures"][0]
@pytest.mark.asyncio
async def test_database_health_check_success():
"""Test database health check with successful connection."""

View File

@@ -0,0 +1,135 @@
"""Unit tests for startup health checks in fastapi_app.py."""
from unittest.mock import MagicMock, patch
import pytest
class TestStartupHealthChecks:
"""Test startup health check function."""
@pytest.mark.asyncio
async def test_ffmpeg_missing_sets_warning(self):
"""Test ffmpeg missing results in warning status."""
mock_logger = MagicMock()
with patch("shutil.which", return_value=None):
from src.server.fastapi_app import _run_startup_health_checks
result = await _run_startup_health_checks(mock_logger)
assert result["ffmpeg"]["status"] == "warning"
assert "not found in PATH" in result["ffmpeg"]["message"]
@pytest.mark.asyncio
async def test_ffmpeg_present_sets_ok(self):
"""Test ffmpeg present results in ok status."""
mock_logger = MagicMock()
with patch("shutil.which", return_value="/usr/bin/ffmpeg"):
from src.server.fastapi_app import _run_startup_health_checks
result = await _run_startup_health_checks(mock_logger)
assert result["ffmpeg"]["status"] == "ok"
assert "Found at" in result["ffmpeg"]["message"]
@pytest.mark.asyncio
async def test_anime_directory_not_configured_sets_error(self):
"""Test anime_directory not configured results in error status."""
mock_logger = MagicMock()
with patch("src.config.settings.settings") as mock_settings:
mock_settings.anime_directory = ""
from src.server.fastapi_app import _run_startup_health_checks
result = await _run_startup_health_checks(mock_logger)
assert result["anime_directory"]["status"] == "error"
assert result["anime_directory"]["path"] is None
assert "not configured" in result["anime_directory"]["message"]
@pytest.mark.asyncio
async def test_anime_directory_not_exists_sets_error(self):
"""Test anime_directory path not existing results in error status."""
mock_logger = MagicMock()
with patch("src.config.settings.settings") as mock_settings:
mock_settings.anime_directory = "/nonexistent/path"
with patch("os.path.isdir", return_value=False):
from src.server.fastapi_app import _run_startup_health_checks
result = await _run_startup_health_checks(mock_logger)
assert result["anime_directory"]["status"] == "error"
assert "does not exist" in result["anime_directory"]["message"]
@pytest.mark.asyncio
async def test_anime_directory_not_writable_sets_error(self):
"""Test anime_directory not writable results in error status."""
mock_logger = MagicMock()
with patch("src.config.settings.settings") as mock_settings:
mock_settings.anime_directory = "/some/path"
with patch("os.path.isdir", return_value=True):
with patch("os.access", return_value=False):
from src.server.fastapi_app import _run_startup_health_checks
result = await _run_startup_health_checks(mock_logger)
assert result["anime_directory"]["status"] == "error"
assert "not writable" in result["anime_directory"]["message"]
@pytest.mark.asyncio
async def test_anime_directory_ok_when_writable(self):
"""Test anime_directory exists and writable results in ok status."""
mock_logger = MagicMock()
with patch("src.config.settings.settings") as mock_settings:
mock_settings.anime_directory = "/valid/path"
with patch("os.path.isdir", return_value=True):
with patch("os.access", return_value=True):
from src.server.fastapi_app import _run_startup_health_checks
result = await _run_startup_health_checks(mock_logger)
assert result["anime_directory"]["status"] == "ok"
@pytest.mark.asyncio
async def test_dns_aniworld_failure_sets_warning(self):
"""Test DNS failure for aniworld.to sets warning status."""
mock_logger = MagicMock()
import socket
with patch("socket.gethostbyname", side_effect=socket.gaierror("DNS failed")):
from src.server.fastapi_app import _run_startup_health_checks
result = await _run_startup_health_checks(mock_logger)
assert result["dns_aniworld"]["status"] == "warning"
assert "DNS resolution failed" in result["dns_aniworld"]["message"]
@pytest.mark.asyncio
async def test_dns_tmdb_failure_sets_warning(self):
"""Test DNS failure for api.themoviedb.org sets warning status."""
mock_logger = MagicMock()
import socket
with patch("socket.gethostbyname", side_effect=socket.gaierror("DNS failed")):
from src.server.fastapi_app import _run_startup_health_checks
result = await _run_startup_health_checks(mock_logger)
assert result["dns_tmdb"]["status"] == "warning"
@pytest.mark.asyncio
async def test_all_checks_returned(self):
"""Test all health checks are present in result."""
mock_logger = MagicMock()
with patch("src.config.settings.settings") as mock_settings:
mock_settings.anime_directory = ""
from src.server.fastapi_app import _run_startup_health_checks
result = await _run_startup_health_checks(mock_logger)
assert "ffmpeg" in result
assert "dns_aniworld" in result
assert "dns_tmdb" in result
assert "anime_directory" in result