Add startup health checks and /health/ready endpoint

- Add _run_startup_health_checks() function in fastapi_app.py
  - Check ffmpeg availability (warning)
  - Check DNS resolution for aniworld.to and api.themoviedb.org (warning)
  - Check anime_directory configuration and writability (error)
- Store startup checks in app.state for health endpoint access
- Add /health/ready endpoint for container orchestrators
  - Returns not_ready with 503 when critical failures present
  - Includes critical_failures list for debugging
- Update /health endpoint to include startup check results
  - Status reflects worst check (error > warning > ok)
- Document health check endpoints in DEVELOPMENT.md
- Add unit tests for startup health checks
- Add unit tests for /health/ready endpoint

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-23 22:12:03 +02:00
parent 9a20541598
commit 3551838887
5 changed files with 458 additions and 7 deletions


@@ -165,6 +165,61 @@ scheduler = AsyncIOScheduler(jobstores=jobstores)
**If server is down >1 hour:** No automatic recovery. Manual trigger via `POST /api/scheduler/trigger-rescan` or wait for next scheduled run.
### Health Check Endpoints
The application provides health check endpoints for monitoring and container orchestration:
#### `GET /health`
Basic health check returning service status and startup health check results.
**Response fields:**
- `status`: "healthy", "degraded", or "unhealthy" based on startup checks
- `timestamp`: ISO timestamp of the check
- `series_app_initialized`: Whether the series app is loaded
- `anime_directory_configured`: Whether anime_directory is set
- `scheduler_next_run` / `scheduler_last_run`: Scheduler times
- `checks`: Detailed startup check results (ffmpeg, DNS, anime_directory)
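The status aggregation could look roughly like the sketch below. The function name `overall_status` and the per-check dictionary shape (`{"status": ...}`) are assumptions for illustration, as is the mapping of `warning` to "degraded" and `error` to "unhealthy"; only the ordering error > warning > ok comes from the commit message.

```python
from typing import Dict

# Severity order taken from the commit message: error > warning > ok.
_SEVERITY = {"ok": 0, "warning": 1, "error": 2}

# Assumed mapping from the worst check severity to the reported status.
_STATUS_BY_SEVERITY = {0: "healthy", 1: "degraded", 2: "unhealthy"}


def overall_status(checks: Dict[str, Dict[str, str]]) -> str:
    """Return the health status implied by the worst startup check.

    `checks` is a hypothetical mapping such as
    {"ffmpeg": {"status": "warning"}, "anime_directory": {"status": "ok"}}.
    With no checks recorded, the result is "healthy".
    """
    worst = max(
        (_SEVERITY[result["status"]] for result in checks.values()),
        default=0,
    )
    return _STATUS_BY_SEVERITY[worst]
```

Taking the maximum severity means a single `error` always dominates any number of warnings, which matches the "worst check" rule described in the commit message.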
#### `GET /health/ready`
Readiness check for container orchestrators (Kubernetes, Docker Swarm).
**Response when ready:**
```json
{
"status": "ready",
"ready": true,
"timestamp": "2024-01-01T00:00:00",
"checks": {...}
}
```
**Response when not ready (503):**
```json
{
"status": "not_ready",
"ready": false,
"timestamp": "2024-01-01T00:00:00",
"critical_failures": ["anime_directory: not configured"],
"checks": {...}
}
```
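As a rough illustration of how the two responses above might be derived from the stored startup checks, here is a minimal sketch. The helper name `readiness_response`, the per-check `message` field, and the use of a UTC timestamp are assumptions, not the actual code in `fastapi_app.py`.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Tuple


def readiness_response(checks: Dict[str, Dict[str, str]]) -> Tuple[int, Dict[str, Any]]:
    """Build an (HTTP status code, JSON body) pair for a readiness probe.

    Only checks with status "error" count as critical; warnings do not
    make the service not ready.
    """
    critical = [
        f"{name}: {result.get('message', 'failed')}"
        for name, result in checks.items()
        if result.get("status") == "error"
    ]
    body: Dict[str, Any] = {
        "status": "not_ready" if critical else "ready",
        "ready": not critical,
        # Assumption: timestamp format is ISO 8601 in UTC.
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    if critical:
        body["critical_failures"] = critical
    body["checks"] = checks
    return (503 if critical else 200), body
```

In a FastAPI handler, the pair could be returned as `JSONResponse(status_code=code, content=body)`, so that orchestrators see the 503 while humans can still read `critical_failures`.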
#### `GET /health/detailed`
Comprehensive health check including database, filesystem, and system metrics.
#### Startup Health Checks
On application startup, the following checks are performed:
| Check | Failure Status | Impact |
|-------|---------------|--------|
| `ffmpeg` | warning | HLS downloads may fail |
| `dns_aniworld` | warning | Provider requests may fail |
| `dns_tmdb` | warning | TMDB API calls may fail |
| `anime_directory` | error | Download service disabled |
DNS checks are reported as warnings because resolution failures can be transient. An `anime_directory` error disables the download service so that downloads do not fail later at runtime.
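The checks in the table could be implemented along the following lines. This is a sketch under assumptions: the function names, the result shape (`status` plus `message`), and the exact messages are hypothetical, and the real `_run_startup_health_checks()` may differ.

```python
import os
import shutil
import socket
from typing import Dict, Optional

CheckResult = Dict[str, str]


def check_ffmpeg() -> CheckResult:
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg"):
        return {"status": "ok", "message": "ffmpeg found"}
    return {"status": "warning", "message": "ffmpeg not found in PATH"}


def check_dns(host: str) -> CheckResult:
    try:
        socket.getaddrinfo(host, 443)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return {"status": "warning", "message": f"cannot resolve {host}: {exc}"}
    return {"status": "ok", "message": f"{host} resolves"}


def check_anime_directory(path: Optional[str]) -> CheckResult:
    if not path:
        return {"status": "error", "message": "not configured"}
    if not os.path.isdir(path):
        return {"status": "error", "message": "directory does not exist"}
    if not os.access(path, os.W_OK):
        return {"status": "error", "message": "not writable"}
    return {"status": "ok", "message": "writable"}


def run_startup_checks(anime_directory: Optional[str]) -> Dict[str, CheckResult]:
    """Run all checks once and return results keyed by check name."""
    return {
        "ffmpeg": check_ffmpeg(),
        "dns_aniworld": check_dns("aniworld.to"),
        "dns_tmdb": check_dns("api.themoviedb.org"),
        "anime_directory": check_anime_directory(anime_directory),
    }
```

The result could then be stored once at startup, for example as `app.state.startup_checks = run_startup_checks(...)` (attribute name assumed), so that the health endpoints read cached results instead of re-running the checks on every request.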
### Troubleshooting Development Issues
#### Scheduler missed a run
@@ -175,3 +230,21 @@ scheduler = AsyncIOScheduler(jobstores=jobstores)
4. Trigger manually: `POST /api/scheduler/trigger-rescan`
5. Monitor next run: `GET /health` → `scheduler_next_run`
6. If problem repeats, increase `misfire_grace_time` in `scheduler_service.py`.
#### Startup health check failures
If `/health` returns `degraded` or `unhealthy` status:
1. **anime_directory error**: Directory not configured or not writable
- Check `ANIME_DIRECTORY` environment variable
- Verify directory exists and permissions allow write access
- Download service will not initialize until resolved
2. **ffmpeg warning**: ffmpeg not found in PATH
- HLS stream downloads will fail
- Install ffmpeg: `apt install ffmpeg` or `brew install ffmpeg`
3. **DNS warnings**: Domain resolution failed
- Check network connectivity
- DNS failures can be transient, and warnings don't block startup
- Retry later to verify: `GET /health`
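To retry the check without polling by hand, a small script along these lines may help. It is a sketch only: the base URL `http://localhost:8000` and the helper names `fetch_ready` and `wait_until_ready` are assumptions, and it relies on `/health/ready` returning the JSON shapes documented above.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Tuple


def fetch_ready(base_url: str) -> Tuple[int, dict]:
    """Call GET /health/ready and return (status code, parsed JSON body)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health/ready", timeout=5) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # A 503 response still carries a JSON body with critical_failures.
        return err.code, json.load(err)
    # Connection errors (urllib.error.URLError) are left to propagate.


def wait_until_ready(
    fetch: Callable[[], Tuple[int, dict]],
    attempts: int = 5,
    delay: float = 2.0,
) -> bool:
    """Poll until the service reports ready or the attempts run out."""
    for attempt in range(attempts):
        status, body = fetch()
        if status == 200 and body.get("ready"):
            return True
        print("not ready:", body.get("critical_failures", []))
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Usage might look like `wait_until_ready(lambda: fetch_ready("http://localhost:8000"))`. Passing the fetch function in as a parameter keeps the polling loop testable without a running server.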