fix(providers): rotate, probe and fall back on 404

Iterate providers actually advertised on the episode page (ordered by
SUPPORTED_PROVIDERS preference) instead of always re-resolving VOE.
Each candidate is HEAD-probed before yt-dlp runs, so dead links are
skipped immediately; direct video URLs use a streaming fast path that
bypasses yt-dlp; total failure now logs the exhausted provider list.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit is contained in:
2026-05-25 14:32:10 +02:00
parent c579235af0
commit a115215416
3 changed files with 629 additions and 24 deletions

View File

@@ -308,3 +308,64 @@ If `/health` returns `unhealthy` status:
- Check network connectivity
- DNS failures are transient — warnings don't block startup
- Retry later to verify: `GET /health`
### Provider Failure Handling
Download providers (VOE, Doodstream, Vidmoly, Vidoza, SpeedFiles, Streamtape,
Luluvdo) regularly break: URLs expire, sites change their player markup, geo
blocks appear, and `yt-dlp` extractors lag behind upstream changes. The
`AniworldLoader.download()` flow is designed to fail fast and rotate.
**Rotation order**
1. The episode page is scraped for the providers AniWorld actually advertises.
2. Results are ordered by the preference in `DEFAULT_PROVIDERS`
(`provider_config.py`); providers not listed run last.
3. For each candidate the loader:
1. Calls `_check_url_alive()` — HEAD probe with GET fallback. Any 4xx
response or connection error skips the provider immediately.
2. Resolves the redirect via `_resolve_direct_link()` to obtain a direct
stream URL plus headers. Provider-specific extractors (e.g. `VOE`) are
preferred; unknown providers fall back to the embed URL so `yt-dlp` can
attempt extraction.
3. Tries `_try_direct_stream()` — straight `requests.get(stream=True)` when
`Content-Type` is `video/*` or `application/octet-stream`. This avoids
`yt-dlp` entirely for direct MP4 links.
4. Falls back to `yt-dlp` with the ffmpeg downloader for HLS streams.
4. On any failure, temp files are cleaned and the loop moves to the next
provider. When the chain is exhausted, the loader logs
`All download providers failed for S{season}E{episode} ...; tried=[...]`
to both the application log and `logs/download_errors.log`.
**Do not hardcode provider URLs.** Provider domains shift constantly (e.g.
Doodstream alternates between `dood.li`, `dood.so`, `dood.la`). Only the
referer hints in `PROVIDER_HEADERS` are persisted — discovery still happens
at runtime through AniWorld's redirect endpoint.
### Updating yt-dlp
When extractors break (typical symptoms: every provider HEAD probe succeeds
but `yt-dlp` raises `Unable to extract` or `HTTP Error 404`):
1. Check the upstream tracker first: https://github.com/yt-dlp/yt-dlp/issues
2. Upgrade in the conda environment:
```bash
conda run -n AniWorld pip install --upgrade yt-dlp
```
3. Smoke-test against a known-good episode before pinning a new floor in
`requirements.txt` (`yt-dlp>=YYYY.MM.DD`).
4. Re-run the provider test suite:
```bash
conda run -n AniWorld python -m pytest tests/unit/test_aniworld_provider.py -v
```
5. If a specific extractor is removed upstream, drop the provider from
`DEFAULT_PROVIDERS` rather than patching `yt-dlp` in tree.
### User Notification on Total Failure
`SeriesApp.download_episode()` already emits a `download_status="failed"`
WebSocket event when `loader.download()` returns `False`. Operators should
forward this to `notification_service.notify_download_failed()` so users see
a HIGH-priority alert. The loader keeps the failure detail in
`logs/download_errors.log` for post-mortem.