feat(providers): detect HTML encoding before parsing

Add chardet-based _decode_html_content() to aniworld_provider. Apply
to all BeautifulSoup parsing calls to prevent decoding warnings on
pages with mismatched encoding declarations. Falls back to utf-8
with errors='replace' when confidence < 0.7.

Also fix test_enhanced_provider HLS test signature and add HLS
pattern unit tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit is contained in:
2026-05-25 16:30:36 +02:00
parent d5e955a731
commit ca93bb740a
6 changed files with 162 additions and 7 deletions

View File

@@ -342,6 +342,35 @@ Doodstream alternates between `dood.li`, `dood.so`, `dood.la`). Only the
referer hints in `PROVIDER_HEADERS` are persisted — discovery still happens
at runtime through AniWorld's redirect endpoint.
### HLS Stream Handling
HLS (HTTP Live Streaming) manifests (`.m3u8`) require yt-dlp to use the
`ffmpeg` downloader with `--hls-use-mpegts`. Both providers configure this
automatically:
```python
ydl_opts = {
"downloader": "ffmpeg", # Use ffmpeg instead of native
"hls_use_mpegts": True, # Write transport stream (.ts) segments
}
```
**Why this matters**: Without ffmpeg, yt-dlp logs:
`"Live HLS streams are not supported by the native downloader"`
**Requirements**:
- ffmpeg must be installed and in PATH (`which ffmpeg`)
- Install: `apt install ffmpeg` (Debian/Ubuntu) or `brew install ffmpeg` (macOS)
- Startup health check (see Health Check Endpoints) verifies ffmpeg presence
**Trade-offs**:
- HLS downloads are slower than direct MP4 (reassembly of .ts segments)
- Requires more disk space during download
- May need post-processing if .ts format is not desired
**Detection**: VOE provider extracts HLS URLs via `HLS_PATTERN` regex. Other
providers let yt-dlp auto-detect from URL/content-type.
### Updating yt-dlp
When extractors break (typical symptoms: every provider HEAD probe succeeds