# Development Guide

## Document Purpose

This document provides guidance for developers working on the Aniworld project.

### What This Document Contains

- **Prerequisites**: Required software and tools
- **Environment Setup**: Step-by-step local development setup
- **Project Structure**: Source code organization explanation
- **Development Workflow**: Branch strategy, commit conventions
- **Coding Standards**: Style guide, linting, formatting
- **Running the Application**: Development server, CLI usage
- **Debugging Tips**: Common debugging approaches
- **IDE Configuration**: VS Code settings, recommended extensions
- **Contributing Guidelines**: How to submit changes
- **Code Review Process**: Review checklist and expectations

### What This Document Does NOT Contain

- Production deployment (see [DEPLOYMENT.md](DEPLOYMENT.md))
- API reference (see [API.md](API.md))
- Architecture decisions (see [ARCHITECTURE.md](ARCHITECTURE.md))
- Test writing guides (see [TESTING.md](TESTING.md))
- Security guidelines (see [SECURITY.md](SECURITY.md))

### Target Audience

- New developers joining the project
- Contributors (internal and external)
- Anyone setting up a development environment

---

## Sections to Document

1. Prerequisites
   - Python version
   - Conda environment
   - Node.js (if applicable)
   - Git
2. Getting Started
   - Clone repository
   - Set up conda environment
   - Install dependencies
   - Configuration setup
3. Project Structure Overview
4. Development Server
   - Starting FastAPI server
   - Hot reload configuration
   - Debug mode
5. CLI Development
6. Code Style
   - PEP 8 compliance
   - Type hint requirements
   - Docstring format
   - Import organization
7. Git Workflow
   - Branch naming
   - Commit message format
   - Pull request process
8. Common Development Tasks

### Adding Queue Deduplication

The download queue prevents duplicate entries at two levels: in memory and in the database. A scheduler cooldown additionally keeps auto-download from being re-triggered in quick succession.

**In-Memory Deduplication** (`src/server/services/download_service.py`):

- `_pending_by_episode` dict tracks pending episodes, keyed by `(serie_id, season, episode)`
- `_add_to_pending_queue()` updates the dict when adding items
- `add_to_queue()` checks this dict before adding episodes, including duplicates within the same batch
- `_remove_from_pending_queue()` cleans up the dict when items are removed

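
The in-memory check can be sketched as follows. This is a simplified, hypothetical stand-in: `PendingQueueSketch`, the string item IDs, and the method signatures are assumptions, not the real `DownloadService` API. It shows why recording each key as soon as it is accepted also deduplicates within a single batch.

```python
from typing import Dict, List, Tuple

EpisodeKey = Tuple[str, int, int]  # (serie_id, season, episode)


class PendingQueueSketch:
    """Hypothetical stand-in for the pending-queue bookkeeping in DownloadService."""

    def __init__(self) -> None:
        self._pending_by_episode: Dict[EpisodeKey, str] = {}

    def add_to_queue(self, serie_id: str, episodes: List[Tuple[int, int]]) -> List[str]:
        """Queue (season, episode) pairs; return IDs of the items actually added."""
        added: List[str] = []
        for season, episode in episodes:
            key = (serie_id, season, episode)
            # Each accepted key is recorded immediately, so a repeat later in
            # the same batch is rejected just like an already-pending episode.
            if key in self._pending_by_episode:
                continue
            item_id = f"{serie_id}-S{season:02d}E{episode:02d}"
            self._pending_by_episode[key] = item_id
            added.append(item_id)
        return added

    def remove_from_pending_queue(self, serie_id: str, season: int, episode: int) -> None:
        # pop() with a default keeps removal safe for keys that are not pending.
        self._pending_by_episode.pop((serie_id, season, episode), None)
```

Removing an entry from the dict is what allows the same episode to be queued again later, for example after a failed download.
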
**Database Constraint** (`src/server/models.py`):

- `DownloadQueueItem` has a unique index on `episode_id`, declared via `__table_args__`
- Prevents duplicate queue entries at the database level
- Unique constraint: `Index("ix_download_queue_episode_pending", "episode_id", unique=True)`

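
At the SQL level, that index behaves like the following sketch. It uses the standard-library `sqlite3` module instead of SQLAlchemy. The table name `download_queue` and its `id` column are assumptions for illustration; only the index name and the `episode_id` column come from the model above.

```python
import sqlite3


def make_queue_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; the real model is DownloadQueueItem in src/server/models.py.
    conn.execute(
        "CREATE TABLE download_queue (id INTEGER PRIMARY KEY, episode_id INTEGER NOT NULL)"
    )
    # SQL equivalent of Index("ix_download_queue_episode_pending", "episode_id", unique=True).
    conn.execute(
        "CREATE UNIQUE INDEX ix_download_queue_episode_pending ON download_queue (episode_id)"
    )
    return conn


def enqueue(conn: sqlite3.Connection, episode_id: int) -> bool:
    """Insert a queue row; return False when the episode is already queued."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO download_queue (episode_id) VALUES (?)", (episode_id,))
        return True
    except sqlite3.IntegrityError:
        # The unique index rejects the duplicate even if in-memory checks were bypassed.
        return False
```

With SQLAlchemy, the same violation surfaces as `sqlalchemy.exc.IntegrityError`.
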
**Scheduler Cooldown** (`src/server/services/scheduler_service.py`):

- `_last_auto_download_time` records when auto-download last ran
- A 5-minute cooldown prevents rapid re-triggers
- The cooldown is checked at the start of `_auto_download_missing()`

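
Below is a minimal sketch of the cooldown check. It assumes a monotonic clock; the real service may store a wall-clock `datetime` instead. `AutoDownloadCooldown` and `try_start()` are hypothetical names. Injecting the clock keeps the guard easy to unit-test.

```python
import time
from typing import Callable, Optional

COOLDOWN_SECONDS = 5 * 60  # the 5-minute cooldown described above


class AutoDownloadCooldown:
    """Hypothetical stand-in for the cooldown guard in the scheduler service."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last_auto_download_time: Optional[float] = None

    def try_start(self) -> bool:
        """Record a run and return True, or return False while the cooldown is active."""
        now = self._clock()
        last = self._last_auto_download_time
        if last is not None and now - last < COOLDOWN_SECONDS:
            return False  # re-triggered too soon: skip this run
        self._last_auto_download_time = now
        return True
```
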
### Episode Lifecycle

Episodes transition through states stored in the `episodes` table:

| State | `is_downloaded` | `file_path` | Description |
|-------|----------------|-------------|-------------|
| Missing | `False` | `NULL` | Episode not yet downloaded |
| Downloaded | `True` | Set | Episode exists on disk |

**State Transitions:**

1. **Missing → Downloaded**: When a download completes, `_remove_episode_from_missing_list()` calls `EpisodeService.mark_downloaded()` to set `is_downloaded=True` and populate `file_path`. The episode record is NOT deleted.

**Query Implications:**

- `get_series_with_missing_episodes()`: Filters for `is_downloaded=False` to find series with undownloaded episodes
- `get_series_with_no_episodes()`: Finds series that have `is_downloaded=False` episodes but NO `is_downloaded=True` episodes (series with nothing downloaded yet)

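
Downloaded episodes stay in the table, so the two queries differ only in how they aggregate `is_downloaded`. The sketch below shows this with `sqlite3`. The `series_id` column, the function names and the integer IDs are assumptions; the real queries go through SQLAlchemy.

```python
import sqlite3
from typing import List


def make_episodes_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: only is_downloaded and file_path come from the table above.
    conn.execute(
        "CREATE TABLE episodes (id INTEGER PRIMARY KEY, series_id INTEGER NOT NULL,"
        " is_downloaded INTEGER NOT NULL DEFAULT 0, file_path TEXT)"
    )
    return conn


def mark_downloaded(conn: sqlite3.Connection, episode_id: int, file_path: str) -> None:
    # Missing -> Downloaded is an UPDATE; the row is never deleted.
    with conn:
        conn.execute(
            "UPDATE episodes SET is_downloaded = 1, file_path = ? WHERE id = ?",
            (file_path, episode_id),
        )


def series_with_missing_episodes(conn: sqlite3.Connection) -> List[int]:
    rows = conn.execute(
        "SELECT DISTINCT series_id FROM episodes WHERE is_downloaded = 0 ORDER BY series_id"
    )
    return [r[0] for r in rows]


def series_with_nothing_downloaded(conn: sqlite3.Connection) -> List[int]:
    # Series that have episode rows, none of which is downloaded yet.
    rows = conn.execute(
        "SELECT series_id FROM episodes GROUP BY series_id"
        " HAVING MAX(is_downloaded) = 0 ORDER BY series_id"
    )
    return [r[0] for r in rows]
```
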
### Mocking the Download Queue

When testing components that use the download queue:

```python
from typing import Dict, List

import pytest

# DownloadItem and DownloadService are project classes: import them from their
# modules (the download service lives in src/server/services/download_service.py).


# Mock repository for unit tests: stores items in a dict keyed by item ID
class MockQueueRepository:
    def __init__(self) -> None:
        self._items: Dict[str, DownloadItem] = {}

    async def save_item(self, item: DownloadItem) -> DownloadItem:
        self._items[item.id] = item
        return item

    async def get_all_items(self) -> List[DownloadItem]:
        return list(self._items.values())


# Use in fixture
@pytest.fixture
def mock_queue_repository():
    return MockQueueRepository()


@pytest.fixture
def download_service(mock_anime_service, mock_queue_repository):
    # mock_anime_service is expected to be another fixture (e.g. in conftest.py)
    return DownloadService(
        anime_service=mock_anime_service,
        queue_repository=mock_queue_repository,
        max_retries=3,
    )
```

9. Troubleshooting Development Issues

### Async Context Managers for aiohttp

All `aiohttp.ClientSession` usages must be wrapped in `async with`. This includes clients that own a session, such as `TMDBClient`:

```python
# Correct: the session is closed on exit, even if the call raises
async def search_correct() -> None:
    async with TMDBClient(api_key="key") as client:
        result = await client.search_tv_show("Show")


# Wrong: the session may leak if an exception occurs
async def search_wrong() -> None:
    client = TMDBClient(api_key="key")
    result = await client.search_tv_show("Show")
    await client.close()  # Never reached if search_tv_show() raises
```

**Why:**

- `aiohttp.ClientSession` holds TCP connections that must be closed explicitly
- If an exception is raised before `close()`, the session leaks
- A context manager guarantees that `__aexit__` runs, even when an exception is raised

**Services that use aiohttp:**

- `TMDBClient`: has `__aenter__`/`__aexit__`, use `async with`
- `ImageDownloader`: has `__aenter__`/`__aexit__`, use `async with`
- `NFOService`: wraps both of the above, use `async with`

**Verification:**

- A session that was never closed makes aiohttp emit an "Unclosed client session" warning from `__del__` when it is garbage-collected
- Integration tests verify that no "Unclosed client session" errors appear in the logs

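
The guarantee comes from `__aexit__`. The sketch below mimics how a client such as `TMDBClient` can wrap its session. `FakeSession` stands in for `aiohttp.ClientSession` so the example runs without aiohttp or network access. `SketchClient`, `fetch()` and `run()` are hypothetical.

```python
import asyncio
from typing import List, Optional


class FakeSession:
    """Stand-in for aiohttp.ClientSession: only tracks whether close() ran."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class SketchClient:
    """Hypothetical client following the TMDBClient pattern (not its real code)."""

    def __init__(self) -> None:
        self.session: Optional[FakeSession] = None

    async def __aenter__(self) -> "SketchClient":
        self.session = FakeSession()  # a real client would create aiohttp.ClientSession()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs on normal exit AND when the body raises.
        if self.session is not None:
            await self.session.close()

    async def fetch(self, fail: bool) -> str:
        if fail:
            raise RuntimeError("simulated API error")
        return "ok"


async def run(fail: bool, sessions: List[FakeSession]) -> None:
    async with SketchClient() as client:
        sessions.append(client.session)
        await client.fetch(fail)
```

`__aexit__` returns `None`, so an exception from `fetch()` still propagates to the caller, but only after the session has been closed.
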
### Scheduler Persistence and Recovery

APScheduler stores jobs in `data/scheduler.db` (SQLite) so they survive process restarts:

```python
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
from apscheduler.schedulers.asyncio import AsyncIOScheduler

# Persist jobs in SQLite so they are reloaded after a restart
jobstores = {
    "default": SQLAlchemyJobStore(url="sqlite:///./data/scheduler.db"),
}
scheduler = AsyncIOScheduler(jobstores=jobstores)
```

**Grace period:** `misfire_grace_time=3600` (1 hour). If the server is down at the scheduled time and restarts within 1 hour, APScheduler still runs the missed job. With `coalesce=True`, several missed runs are collapsed into a single execution.

**Startup recovery:** On `start()`, the scheduler loads persisted jobs from the database. APScheduler then handles missed runs itself, according to the two settings above.

**Health endpoint:** `GET /health` returns `scheduler_next_run` and `scheduler_last_run` for external monitors (Uptime Kuma, Prometheus, etc.).

**If the server is down for more than 1 hour:** there is no automatic recovery. Trigger the job manually via `POST /api/scheduler/trigger-rescan` or wait for the next scheduled run.

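
The interaction of the two settings can be modelled in a few lines. This is a simplified model written for this guide, not APScheduler's implementation; the function name and signature are hypothetical. It shows why a restart 30 minutes late still runs a 03:00 job once, while a restart 2 hours late runs nothing.

```python
from datetime import datetime, timedelta
from typing import List


def runs_to_execute(
    missed: List[datetime],
    now: datetime,
    misfire_grace_time: int = 3600,
    coalesce: bool = True,
) -> List[datetime]:
    """Simplified model of how APScheduler 3.x treats missed run times.

    Illustration only: with coalesce, only the latest missed run time is kept;
    any run time older than the grace period counts as misfired and is skipped.
    """
    candidates = sorted(missed)
    if coalesce:
        candidates = candidates[-1:]
    grace = timedelta(seconds=misfire_grace_time)
    return [t for t in candidates if now - t <= grace]
```
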
### Database Session Management

`get_async_session_factory()` returns a **new AsyncSession instance** directly (not a factory). The function name is historical; callers receive the session immediately:

```python
async def example() -> None:
    db = get_async_session_factory()  # db IS the session, not a factory
    try:
        await db.execute(...)
        await db.commit()
    finally:
        await db.close()  # always release the session, even on errors
```

Do NOT call the result again with `()`. That calls the `AsyncSession` object itself and raises `TypeError: 'AsyncSession' object is not callable`.

For context manager usage, prefer `get_db_session()` (auto-commits) or `get_transactional_session()` (manual commit).

### Health Check Endpoints

The application provides health check endpoints for monitoring and container orchestration:

#### `GET /health`

Basic health check returning service status and startup health check results.

**Response fields:**

- `status`: "healthy", "degraded", or "unhealthy" based on startup checks
- `timestamp`: ISO timestamp of the check
- `series_app_initialized`: Whether the series app is loaded
- `anime_directory_configured`: Whether anime_directory is set
- `scheduler_next_run` / `scheduler_last_run`: Scheduler times
- `checks`: Detailed startup check results (ffmpeg, DNS, anime_directory)

#### `GET /health/ready`

Readiness check for container orchestrators (Kubernetes, Docker Swarm).

**Response when ready:**

```json
{
  "status": "ready",
  "ready": true,
  "timestamp": "2024-01-01T00:00:00",
  "checks": {...}
}
```

**Response when not ready (503):**

```json
{
  "status": "not_ready",
  "ready": false,
  "timestamp": "2024-01-01T00:00:00",
  "critical_failures": ["anime_directory: not configured"],
  "checks": {...}
}
```

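
The ready/not-ready decision can be sketched as a pure function over the startup check results. The check-result shape, the status strings `ok`/`warning`/`error`, and the function name are assumptions. The sketch only mirrors the severities listed under Startup Health Checks: error-level failures such as `anime_directory` make the service not ready, and warnings do not.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Tuple

# Hypothetical shape of the results: {check name: {"status": ..., "message": ...}}
CheckResults = Dict[str, Dict[str, str]]


def readiness(checks: CheckResults) -> Tuple[int, Dict[str, Any]]:
    """Return (HTTP status code, response body) for a readiness probe."""
    # Only error-level checks block readiness; warnings (ffmpeg, DNS) do not.
    critical = [
        f"{name}: {result['message']}"
        for name, result in checks.items()
        if result["status"] == "error"
    ]
    body: Dict[str, Any] = {
        "status": "not_ready" if critical else "ready",
        "ready": not critical,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "checks": checks,
    }
    if critical:
        body["critical_failures"] = critical
        return 503, body
    return 200, body
```
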
#### `GET /health/detailed`

Comprehensive health check including database, filesystem, and system metrics.

#### Startup Health Checks

On application startup, the following checks are performed:

| Check | Failure Status | Impact |
|-------|---------------|--------|
| `ffmpeg` | warning | HLS downloads may fail |
| `dns_aniworld` | warning | Provider requests may fail |
| `dns_tmdb` | warning | TMDB API calls may fail |
| `anime_directory` | error | Download service disabled |

DNS checks only produce warnings because resolution failures are often transient. An `anime_directory` error disables the download service, since downloads cannot succeed without a valid target directory.

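
The three kinds of checks can be sketched with the standard library alone. The function names, the result dictionary shape and the status strings are assumptions for illustration, not the project's implementation. The severities follow the table above.

```python
import os
import shutil
import socket
from typing import Dict, Optional


def check_ffmpeg() -> Dict[str, str]:
    # shutil.which() searches PATH the same way `which ffmpeg` does.
    if shutil.which("ffmpeg"):
        return {"status": "ok", "message": "ffmpeg found"}
    return {"status": "warning", "message": "ffmpeg not found in PATH"}


def check_dns(host: str) -> Dict[str, str]:
    try:
        socket.getaddrinfo(host, 443)
        return {"status": "ok", "message": f"{host} resolves"}
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        # Often transient, so only a warning.
        return {"status": "warning", "message": f"{host}: {exc}"}


def check_anime_directory(path: Optional[str]) -> Dict[str, str]:
    if not path:
        return {"status": "error", "message": "not configured"}
    if not os.path.isdir(path):
        return {"status": "error", "message": f"{path} does not exist"}
    if not os.access(path, os.W_OK):
        return {"status": "error", "message": f"{path} is not writable"}
    return {"status": "ok", "message": "writable"}
```

For example, `check_dns("aniworld.to")` or `check_dns("api.themoviedb.org")` would correspond to the two DNS rows. The DNS check needs network access; the other two do not.
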
### Troubleshooting Development Issues

#### Scheduler missed a run

1. Check whether the server was down at the scheduled time (03:00 UTC by default).
2. Check that `data/scheduler.db` exists; if it does not, jobs are not being persisted.
3. If the server was down for more than 1 hour, the missed job was dropped (misfire window exceeded).
4. Trigger the job manually: `POST /api/scheduler/trigger-rescan`
5. Monitor the next run: `GET /health` → `scheduler_next_run`
6. If the problem repeats, increase `misfire_grace_time` in `scheduler_service.py`.

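
To inspect the job store from Python rather than with the `sqlite3` CLI, you can read it directly. The sketch below assumes APScheduler's default `apscheduler_jobs` table, in which `next_run_time` holds a UTC Unix timestamp (NULL for paused jobs). The helper names are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Optional, Tuple

Job = Tuple[str, Optional[datetime]]


def scheduled_jobs(db_path: str = "data/scheduler.db") -> List[Job]:
    """List (job id, next run time in UTC) from the SQLAlchemy job store."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT id, next_run_time FROM apscheduler_jobs").fetchall()
    finally:
        conn.close()
    return [
        (job_id, datetime.fromtimestamp(ts, tz=timezone.utc) if ts is not None else None)
        for job_id, ts in rows
    ]


def overdue_jobs(jobs: List[Job], now: datetime) -> List[str]:
    # A next_run_time in the past means the job should already have fired.
    return [job_id for job_id, run_at in jobs if run_at is not None and run_at < now]
```
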
#### Scheduler not firing (no events at scheduled time)

If the scheduler appears to be configured but never triggers:

1. **Verify that scheduler.db contains the job:**

   ```bash
   sqlite3 data/scheduler.db "SELECT id, datetime(next_run_time, 'unixepoch') FROM apscheduler_jobs;"
   ```

   - `next_run_time` is stored as a Unix timestamp; `datetime(..., 'unixepoch')` shows it in UTC
   - `next_run_time` should be in the future
   - If it is in the past, the server was down when the job should have fired

2. **Check the application logs for scheduler startup:**

   ```bash
   grep "Scheduler service started" fastapi_app.log
   ```

   - If missing, the scheduler failed to start; look for errors logged earlier during startup
   - If present, the scheduler started successfully

3. **Verify APScheduler events in the logs:**

   ```bash
   grep "apscheduler.executors.default" fastapi_app.log
   ```

   - `Running job` = job triggered
   - `executed successfully` = job completed
   - No output = job never fired

4. **Test the manual trigger:**

   ```bash
   curl -X POST http://localhost:8000/api/scheduler/trigger-rescan -H "Authorization: Bearer <token>"
   ```

   - If the manual trigger works but the scheduled run does not, the issue is in the APScheduler configuration

5. **Check next_run_time via the health endpoint:**

   ```bash
   curl http://localhost:8000/health | jq .scheduler_next_run
   ```

   - If `null`, the job is not scheduled
   - If set, the scheduler knows when to run next

6. **Check timezone handling:**

   - APScheduler uses UTC internally
   - The `schedule_time` config (e.g. "03:00") is interpreted as UTC
   - If you expect local time, convert it to UTC before setting `schedule_time`

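
Converting a local time into the UTC value for `schedule_time` depends on the date, because of daylight saving time. Below is a minimal sketch; the function name is hypothetical.

```python
from datetime import date, datetime, time, timezone, tzinfo


def local_schedule_to_utc(schedule_time: str, tz: tzinfo, on: date) -> str:
    """Convert a local "HH:MM" schedule to the equivalent UTC "HH:MM" on a given date."""
    hour, minute = (int(part) for part in schedule_time.split(":"))
    local = datetime.combine(on, time(hour, minute), tzinfo=tz)
    return local.astimezone(timezone.utc).strftime("%H:%M")
```

With a real zone such as `zoneinfo.ZoneInfo("Europe/Berlin")`, 03:00 local time maps to 02:00 UTC in winter and to 01:00 UTC in summer. A single fixed UTC `schedule_time` therefore shifts by an hour in local terms across DST changes.
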
#### Startup health check failures

If `/health` returns `unhealthy` status:

1. **anime_directory error**: Directory not configured or not writable
   - Check the `ANIME_DIRECTORY` environment variable
   - Verify that the directory exists and that permissions allow write access
   - The download service will not initialize until this is resolved

2. **ffmpeg warning**: ffmpeg not found in PATH
   - HLS stream downloads will fail
   - Install ffmpeg: `apt install ffmpeg` or `brew install ffmpeg`

3. **DNS warnings**: Domain resolution failed
   - Check network connectivity
   - DNS failures are often transient; these warnings don't block startup
   - Retry later to verify: `GET /health`

### Provider Failure Handling

Download providers (VOE, Doodstream, Vidmoly, Vidoza, SpeedFiles, Streamtape,
Luluvdo) regularly break: URLs expire, sites change their player markup, geo
blocks appear, and `yt-dlp` extractors lag behind upstream changes. The
`AniworldLoader.download()` flow is designed to fail fast and rotate to the
next provider.

**Rotation order**

1. The episode page is scraped for the providers AniWorld actually advertises.
2. Results are ordered by the preference in `DEFAULT_PROVIDERS`
   (`provider_config.py`); providers not listed run last.
3. For each candidate the loader:
   1. Calls `_check_url_alive()`: a HEAD probe with GET fallback. Any 4xx
      response or connection error skips the provider immediately.
   2. Resolves the redirect via `_resolve_direct_link()` to obtain a direct
      stream URL plus headers. Provider-specific extractors (e.g. `VOE`) are
      preferred; unknown providers fall back to the embed URL so `yt-dlp` can
      attempt extraction.
   3. Tries `_try_direct_stream()`: a plain `requests.get(stream=True)` when
      `Content-Type` is `video/*` or `application/octet-stream`. This avoids
      `yt-dlp` entirely for direct MP4 links.
   4. Falls back to `yt-dlp` with the ffmpeg downloader for HLS streams.
4. On any failure, temp files are cleaned up and the loop moves to the next
   provider. When the chain is exhausted, the loader logs
   `All download providers failed for S{season}E{episode} ...; tried=[...]`
   to both the application log and `logs/download_errors.log`.

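
The ordering and fail-fast loop can be sketched with the network calls injected as callables. Everything here is hypothetical:

- `PREFERENCE` stands in for `DEFAULT_PROVIDERS`.
- `is_alive` stands in for `_check_url_alive()`.
- `download` stands in for the resolve, direct-stream and `yt-dlp` steps.
- `is_direct_stream()` shows one way to apply the `Content-Type` rule used by `_try_direct_stream()`.

```python
import logging
from typing import Callable, Dict, List, Optional, Sequence

logger = logging.getLogger(__name__)

# Hypothetical preference list; the real one is DEFAULT_PROVIDERS in provider_config.py.
PREFERENCE = ["VOE", "Vidmoly", "Doodstream"]


def order_providers(advertised: Sequence[str], preference: Sequence[str]) -> List[str]:
    """Listed providers first, in preference order; unlisted ones last, in page order."""
    rank = {name: i for i, name in enumerate(preference)}
    # sorted() is stable, so unlisted providers keep their advertised order.
    return sorted(advertised, key=lambda name: rank.get(name, len(rank)))


def is_direct_stream(content_type: str) -> bool:
    """True for video/* or application/octet-stream, ignoring parameters like charset."""
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime.startswith("video/") or mime == "application/octet-stream"


def download_with_rotation(
    links: Dict[str, str],
    preference: Sequence[str],
    is_alive: Callable[[str], bool],
    download: Callable[[str, str], bool],
) -> Optional[str]:
    """Try each provider in order; return the first that succeeds, or None."""
    tried: List[str] = []
    for provider in order_providers(list(links), preference):
        tried.append(provider)
        url = links[provider]
        if not is_alive(url):
            continue  # fail fast: dead link, move to the next provider
        try:
            if download(provider, url):
                return provider
        except Exception:  # a real loader would also clean up temp files here
            logger.warning("Provider %s failed", provider, exc_info=True)
    logger.error("All download providers failed; tried=%s", tried)
    return None
```

Injecting `is_alive` and `download` keeps the rotation logic testable without network access.
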
**Do not hardcode provider URLs.** Provider domains shift constantly (e.g.
Doodstream alternates between `dood.li`, `dood.so`, `dood.la`). Only the
referer hints in `PROVIDER_HEADERS` are persisted; discovery still happens
at runtime through AniWorld's redirect endpoint.

### HLS Stream Handling

HLS (HTTP Live Streaming) manifests (`.m3u8`) require yt-dlp to use the
`ffmpeg` downloader with `--hls-use-mpegts`. Both providers configure this
automatically. In yt-dlp's Python options, the CLI's `--downloader` flag
corresponds to the `external_downloader` key:

```python
ydl_opts = {
    # Use ffmpeg instead of the native downloader for HLS (m3u8) streams
    "external_downloader": {"m3u8": "ffmpeg"},
    # Use the MPEG-TS container for HLS downloads (CLI: --hls-use-mpegts)
    "hls_use_mpegts": True,
}
```

**Why this matters**: Without the ffmpeg downloader, yt-dlp logs:
`"Live HLS streams are not supported by the native downloader"`

**Requirements**:

- ffmpeg must be installed and in PATH (`which ffmpeg`)
- Install: `apt install ffmpeg` (Debian/Ubuntu) or `brew install ffmpeg` (macOS)
- The startup health check (see Health Check Endpoints) verifies that ffmpeg is present

**Trade-offs**:

- HLS downloads are slower than direct MP4 downloads (the .ts segments must be reassembled)
- They require more disk space during download
- Output may need post-processing if the .ts format is not desired

**Detection**: The VOE provider extracts HLS URLs via the `HLS_PATTERN` regex. Other
providers let yt-dlp auto-detect HLS from the URL or content type.

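
As a rough illustration of the detection step, the sketch below pulls an `.m3u8` URL out of an embed page and enables the ffmpeg options only for HLS. The regex, `find_hls_url()` and `build_ydl_opts()` are hypothetical. They are not the VOE extractor's `HLS_PATTERN` or the providers' real option builders.

```python
import re
from typing import Any, Dict, Optional

# Hypothetical pattern; the project's VOE extractor has its own HLS_PATTERN.
M3U8_URL = re.compile(r"""https?://[^\s"'<>]+\.m3u8(?:\?[^\s"'<>]*)?""")


def find_hls_url(page_html: str) -> Optional[str]:
    """Return the first .m3u8 URL found in an embed page, if any."""
    match = M3U8_URL.search(page_html)
    return match.group(0) if match else None


def build_ydl_opts(stream_url: str) -> Dict[str, Any]:
    opts: Dict[str, Any] = {"quiet": True}
    if ".m3u8" in stream_url:
        # HLS: hand segment download to ffmpeg and keep MPEG-TS output.
        opts["external_downloader"] = {"m3u8": "ffmpeg"}
        opts["hls_use_mpegts"] = True
    return opts
```

The options would then be passed as `yt_dlp.YoutubeDL(build_ydl_opts(url)).download([url])`.
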
### Updating yt-dlp

When extractors break (typical symptoms: every provider HEAD probe succeeds
but `yt-dlp` raises `Unable to extract` or `HTTP Error 404`):

1. Check the upstream tracker first: https://github.com/yt-dlp/yt-dlp/issues
2. Upgrade in the conda environment:

   ```bash
   conda run -n AniWorld pip install --upgrade yt-dlp
   ```

3. Smoke-test against a known-good episode before pinning a new floor in
   `requirements.txt` (`yt-dlp>=YYYY.MM.DD`).
4. Re-run the provider test suite:

   ```bash
   conda run -n AniWorld python -m pytest tests/unit/test_aniworld_provider.py -v
   ```

5. If a specific extractor is removed upstream, drop the provider from
   `DEFAULT_PROVIDERS` rather than patching `yt-dlp` in tree.

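
Before raising the floor in `requirements.txt`, it can help to check the installed version quickly. Below is a minimal sketch; the helper names are hypothetical. It assumes purely numeric, date-based version strings. `importlib.metadata` may report a normalized form such as `2024.8.6`, which is why the parts are compared as integers. For pre-release suffixes, `packaging.version.Version` is the more robust choice.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def parse_ytdlp_version(v: str) -> Tuple[int, ...]:
    """Turn "2024.08.06" (or "2024.8.6") into a comparable tuple of integers."""
    # Extra numeric parts (e.g. a trailing build number) compare as later releases.
    return tuple(int(part) for part in v.split("."))


def meets_floor(installed: str, floor: str) -> bool:
    return parse_ytdlp_version(installed) >= parse_ytdlp_version(floor)


def installed_ytdlp_version() -> str:
    try:
        return version("yt-dlp")  # distribution name as used with pip
    except PackageNotFoundError:
        return "0"
```
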
### User Notification on Total Failure

`SeriesApp.download_episode()` already emits a `download_status="failed"`
WebSocket event when `loader.download()` returns `False`. Operators should
forward this event to `notification_service.notify_download_failed()` so users
see a HIGH-priority alert. The loader keeps the failure details in
`logs/download_errors.log` for post-mortem analysis.

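
One way to wire this up is a small handler that filters on the event status before calling the notifier. This is a hypothetical sketch, and all of the following are assumptions:

- the event field names (`serie_name`, `season`, `episode`, `error`);
- the keyword arguments passed to `notify_download_failed()`;
- `RecordingNotificationService`.

Check the real payload and method signature before reusing it.

```python
import asyncio
from typing import Any, Dict, List


class RecordingNotificationService:
    """Stand-in for notification_service; the real method signature may differ."""

    def __init__(self) -> None:
        self.sent: List[Dict[str, Any]] = []

    async def notify_download_failed(self, **details: Any) -> None:
        self.sent.append(details)


async def forward_failed_download(
    event: Dict[str, Any], notifier: RecordingNotificationService
) -> bool:
    """Forward only failed-download events; return True if a notification was sent."""
    if event.get("download_status") != "failed":
        return False
    await notifier.notify_download_failed(
        series=event.get("serie_name"),
        season=event.get("season"),
        episode=event.get("episode"),
        error=event.get("error", "all providers failed"),
    )
    return True
```
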
## Series Storage

### Overview

Series metadata is now stored in the database (SQLAlchemy ORM).
Legacy files (`key` and `data` per folder) are deprecated but preserved
for backward compatibility.

### Architecture

- **Database**: Single source of truth for all series metadata
- **In-Memory Cache**: SeriesApp maintains a cache for performance
- **Filesystem**: Only used for episode files themselves, not metadata

### Migration

The first startup after an upgrade automatically imports any legacy
series files into the database.

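
The import can be pictured as a scan over series folders. The sketch below is an illustration only. It writes to a hypothetical `series` table with `sqlite3`, whereas the real migration goes through the SQLAlchemy models. It leaves the legacy files in place, matching the note above that they are preserved.

```python
import json
import sqlite3
from pathlib import Path


def import_legacy_series(anime_dir: Path, conn: sqlite3.Connection) -> int:
    """Import legacy per-folder `key`/`data` files; return the number of new rows.

    Hypothetical schema: series(key TEXT PRIMARY KEY, folder TEXT, data TEXT).
    Running it twice is harmless because existing keys are skipped.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS series (key TEXT PRIMARY KEY, folder TEXT, data TEXT)"
    )
    imported = 0
    for folder in sorted(p for p in anime_dir.iterdir() if p.is_dir()):
        key_file, data_file = folder / "key", folder / "data"
        if not key_file.is_file():
            continue  # not a legacy series folder
        key = key_file.read_text(encoding="utf-8").strip()
        data = json.loads(data_file.read_text(encoding="utf-8")) if data_file.is_file() else {}
        with conn:
            cur = conn.execute(
                "INSERT OR IGNORE INTO series (key, folder, data) VALUES (?, ?, ?)",
                (key, folder.name, json.dumps(data)),
            )
        imported += cur.rowcount  # 1 if inserted, 0 if the key already existed
    return imported
```
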
### Legacy Files

- `key` file: Contains the series provider key (deprecated)
- `data` file: Contains a serialized Serie JSON object (deprecated)

Both files are safe to delete after migration; they are not needed for normal operation.