# Development Guide

## Document Purpose

This document provides guidance for developers working on the Aniworld project.

### What This Document Contains

-   **Prerequisites**: Required software and tools
-   **Environment Setup**: Step-by-step local development setup
-   **Project Structure**: Source code organization explanation
-   **Development Workflow**: Branch strategy, commit conventions
-   **Coding Standards**: Style guide, linting, formatting
-   **Running the Application**: Development server, CLI usage
-   **Debugging Tips**: Common debugging approaches
-   **IDE Configuration**: VS Code settings, recommended extensions
-   **Contributing Guidelines**: How to submit changes
-   **Code Review Process**: Review checklist and expectations

### What This Document Does NOT Contain

-   Production deployment (see [DEPLOYMENT.md](DEPLOYMENT.md))
-   API reference (see [API.md](API.md))
-   Architecture decisions (see [ARCHITECTURE.md](ARCHITECTURE.md))
-   Test writing guides (see [TESTING.md](TESTING.md))
-   Security guidelines (see [SECURITY.md](SECURITY.md))

### Target Audience

-   New Developers joining the project
-   Contributors (internal and external)
-   Anyone setting up a development environment

---

## Sections to Document

1. Prerequisites
    - Python version
    - Conda environment
    - Node.js (if applicable)
    - Git
2. Getting Started
    - Clone repository
    - Setup conda environment
    - Install dependencies
    - Configuration setup
3. Project Structure Overview
4. Development Server
    - Starting FastAPI server
    - Hot reload configuration
    - Debug mode
5. CLI Development
6. Code Style
    - PEP 8 compliance
    - Type hints requirements
    - Docstring format
    - Import organization
7. Git Workflow
    - Branch naming
    - Commit message format
    - Pull request process
8. Common Development Tasks

### Adding Queue Deduplication

The download queue prevents duplicate entries at three levels:

**In-Memory Deduplication** (`src/server/services/download_service.py`):
- `_pending_by_episode` dict tracks pending episodes: key = `(serie_id, season, episode)`
- `_add_to_pending_queue()` updates the dict when adding items
- `add_to_queue()` checks this dict before adding episodes (includes batch-local dedup)
- `_remove_from_pending_queue()` cleans up the dict when items are removed
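
The bookkeeping above can be sketched in a few lines. This is a minimal illustration, not the code in `download_service.py`: the `PendingQueue` class, the `remove()` method name, and the item ID format are hypothetical, and only the `(serie_id, season, episode)` key shape comes from the description above.

```python
from typing import Dict, Iterable, List, Tuple

EpisodeKey = Tuple[str, int, int]  # (serie_id, season, episode)


class PendingQueue:
    """Toy stand-in for the pending-queue bookkeeping (hypothetical class)."""

    def __init__(self) -> None:
        # Maps an episode key to the ID of its pending queue item.
        self._pending_by_episode: Dict[EpisodeKey, str] = {}

    def add_to_queue(
        self, serie_id: str, episodes: Iterable[Tuple[int, int]]
    ) -> List[str]:
        """Queue episodes and return the IDs that were actually added."""
        added: List[str] = []
        for season, episode in episodes:
            key = (serie_id, season, episode)
            # The dict is updated inside the loop, so this single check also
            # catches duplicates within the same batch (batch-local dedup).
            if key in self._pending_by_episode:
                continue
            item_id = f"{serie_id}-S{season:02d}E{episode:02d}"  # assumed format
            self._pending_by_episode[key] = item_id
            added.append(item_id)
        return added

    def remove(self, serie_id: str, season: int, episode: int) -> None:
        """Forget an episode so it can be queued again later."""
        self._pending_by_episode.pop((serie_id, season, episode), None)
```

Keying the dict on the episode tuple keeps the duplicate check at O(1) per episode, which matters when a whole season is queued in one call.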

**Database Constraint** (`src/server/models.py`):
- `DownloadQueueItem` has a unique index on `episode_id` via `__table_args__`
- Prevents duplicate queue entries at the database level
- Unique constraint: `Index("ix_download_queue_episode_pending", "episode_id", unique=True)`
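
The effect of such a unique index can be reproduced with the standard-library `sqlite3` module. The sketch below is an assumption-laden illustration: the table name `download_queue`, its columns, and the `enqueue()` helper are hypothetical, and the project itself defines the index through SQLAlchemy rather than raw SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout; only the index name and column come from the docs.
conn.execute(
    "CREATE TABLE download_queue ("
    "id INTEGER PRIMARY KEY, episode_id INTEGER NOT NULL)"
)
conn.execute(
    "CREATE UNIQUE INDEX ix_download_queue_episode_pending "
    "ON download_queue (episode_id)"
)


def enqueue(episode_id: int) -> bool:
    """Insert a queue row; return False if the episode is already queued."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO download_queue (episode_id) VALUES (?)",
                (episode_id,),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique index rejected a second row for the same episode.
        return False
```

Callers that bypass the in-memory check still cannot create a second row, which is why the constraint is worth keeping even though the service deduplicates first.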

**Scheduler Cooldown** (`src/server/services/scheduler_service.py`):
- `_last_auto_download_time` tracks when auto-download last ran
- 5-minute cooldown prevents rapid re-triggers
- Checked at start of `_auto_download_missing()`
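
A cooldown check of this kind could look roughly as follows. The `AutoDownloadGate` class and the injectable `clock` parameter are hypothetical additions for illustration; only the 5-minute window and the `_last_auto_download_time` name come from the description above.

```python
import time
from typing import Callable, Optional

COOLDOWN_SECONDS = 5 * 60  # 5-minute cooldown


class AutoDownloadGate:
    """Decides whether an auto-download run may start (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so tests can control time
        self._last_auto_download_time: Optional[float] = None

    def should_run(self) -> bool:
        """Return True and record the time if the cooldown has elapsed."""
        now = self._clock()
        last = self._last_auto_download_time
        if last is not None and now - last < COOLDOWN_SECONDS:
            return False  # still cooling down; the timestamp is left unchanged
        self._last_auto_download_time = now
        return True
```

Using a monotonic clock avoids surprises when the wall clock is adjusted; whether the real service does this is not stated in the source.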

### Episode Lifecycle

Episodes transition through states stored in the `episodes` table:

| State | `is_downloaded` | `file_path` | Description |
|-------|----------------|-------------|-------------|
| Missing | `False` | `NULL` | Episode not yet downloaded |
| Downloaded | `True` | Set | Episode exists on disk |

**State Transitions:**
1. **Missing → Downloaded**: When download completes, `_remove_episode_from_missing_list()` calls `EpisodeService.mark_downloaded()` to set `is_downloaded=True` and populate `file_path`. The episode record is NOT deleted.

**Query Implications:**
- `get_series_with_missing_episodes()`: Filters for `is_downloaded=False` to find series with undownloaded episodes
- `get_series_with_no_episodes()`: Finds series that have `is_downloaded=False` episodes but NO `is_downloaded=True` episodes (series with nothing downloaded yet)
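
The two filters and the in-place state transition can be tried out against an in-memory SQLite table. This is a sketch under assumptions: the column layout beyond `is_downloaded` and `file_path`, the sample rows, and the free functions are hypothetical and do not mirror the project's ORM queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE episodes ("
    "series_id TEXT, season INTEGER, episode INTEGER, "
    "is_downloaded INTEGER NOT NULL DEFAULT 0, file_path TEXT)"
)
conn.executemany(
    "INSERT INTO episodes VALUES (?, ?, ?, ?, ?)",
    [
        ("a", 1, 1, 1, "/anime/a/S01E01.mp4"),  # downloaded
        ("a", 1, 2, 0, None),                   # missing
        ("b", 1, 1, 0, None),                   # missing
        ("b", 1, 2, 0, None),                   # missing
        ("c", 1, 1, 1, "/anime/c/S01E01.mp4"),  # downloaded
    ],
)


def series_with_missing_episodes() -> list:
    """Series that have at least one episode with is_downloaded = 0."""
    rows = conn.execute(
        "SELECT DISTINCT series_id FROM episodes "
        "WHERE is_downloaded = 0 ORDER BY series_id"
    )
    return [r[0] for r in rows]


def series_with_no_episodes() -> list:
    """Series whose episodes are all still missing."""
    rows = conn.execute(
        "SELECT series_id FROM episodes GROUP BY series_id "
        "HAVING SUM(is_downloaded) = 0 ORDER BY series_id"
    )
    return [r[0] for r in rows]


def mark_downloaded(series_id: str, season: int, episode: int, path: str) -> None:
    """Missing -> Downloaded: update the row in place, never delete it."""
    conn.execute(
        "UPDATE episodes SET is_downloaded = 1, file_path = ? "
        "WHERE series_id = ? AND season = ? AND episode = ?",
        (path, series_id, season, episode),
    )
```

With the sample rows, series `a` and `b` have missing episodes, and only `b` has nothing downloaded. After one episode of `b` is marked as downloaded, `b` drops out of the second result but the row count stays the same.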

### Mocking the Download Queue

When testing components that use the download queue:

```python
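from typing import Dict, List

import pytest

# DownloadItem and DownloadService are imported from the project's own modules.

# Mock repository for unit tests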
class MockQueueRepository:
    def __init__(self):
        self._items: Dict[str, DownloadItem] = {}

    async def save_item(self, item: DownloadItem) -> DownloadItem:
        self._items[item.id] = item
        return item

    async def get_all_items(self) -> List[DownloadItem]:
        return list(self._items.values())

# Use in fixture
@pytest.fixture
def mock_queue_repository():
    return MockQueueRepository()

@pytest.fixture
def download_service(mock_anime_service, mock_queue_repository):
    return DownloadService(
        anime_service=mock_anime_service,
        queue_repository=mock_queue_repository,
        max_retries=3,
    )
```

9. Troubleshooting Development Issues

### Async Context Managers for aiohttp

All `aiohttp.ClientSession` usages must be wrapped in `async with`:

```python
# Correct — session properly closed on exit
async with TMDBClient(api_key="key") as client:
    result = await client.search_tv_show("Show")

# Wrong — session may leak if exception occurs
client = TMDBClient(api_key="key")
result = await client.search_tv_show("Show")
await client.close()  # May not be called if exception raised earlier
```

**Why:**
- `aiohttp.ClientSession` holds TCP connections that must be explicitly closed
- If an exception occurs before `close()` is reached, the session leaks
- Context manager guarantees `__aexit__` runs even on exceptions
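
The guarantee can be demonstrated without any network access. The `FakeClient` below is a hypothetical stand-in that only tracks open and closed state; it is not `TMDBClient`, and its `search()` method always raises to simulate a failing request.

```python
import asyncio


class FakeClient:
    """Stand-in for an aiohttp-backed client (hypothetical, no real I/O)."""

    def __init__(self) -> None:
        self.closed = True

    async def __aenter__(self) -> "FakeClient":
        self.closed = False  # a real client would open its session here
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs on normal exit and when an exception propagates.
        await self.close()

    async def close(self) -> None:
        self.closed = True

    async def search(self, query: str) -> str:
        raise RuntimeError(f"lookup failed for {query!r}")


async def demo() -> bool:
    """Return whether the client was closed after a failing call."""
    client = FakeClient()
    try:
        async with client:
            await client.search("Show")
    except RuntimeError:
        pass  # the error still reaches the caller, after cleanup ran
    return client.closed
```

Running `asyncio.run(demo())` returns `True`: the client is closed even though `search()` raised inside the `async with` block.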

**Services that use aiohttp:**
- `TMDBClient` — has `__aenter__`/`__aexit__`, use `async with`
- `ImageDownloader` — has `__aenter__`/`__aexit__`, use `async with`
- `NFOService` — wraps both above, use `async with`

**Verification:**
- Missing context manager usage triggers `__del__` warning on garbage collection
- Integration tests verify no "Unclosed client session" errors in logs

### Scheduler Persistence and Recovery

APScheduler stores jobs in `data/scheduler.db` (SQLite) so they survive process restarts:

```python
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
from apscheduler.schedulers.asyncio import AsyncIOScheduler

jobstores = {
    "default": SQLAlchemyJobStore(url="sqlite:///./data/scheduler.db"),
}
scheduler = AsyncIOScheduler(jobstores=jobstores)
```

**Grace period:** `misfire_grace_time=3600` (1 hour). If the server is down at the scheduled time and restarts within 1 hour, APScheduler still runs the missed job. With `coalesce=True`, several missed runs of the same job collapse into a single execution.

**Startup recovery:** On `start()`, the scheduler loads persisted jobs from the database. APScheduler then decides per job whether a missed run is still inside the grace period; no application code is needed for this.

**Health endpoint:** `GET /health` returns `scheduler_next_run` and `scheduler_last_run` for external monitors (Uptime Kuma, Prometheus, etc.).

**If the server is down for more than 1 hour:** There is no automatic recovery. Trigger a rescan manually via `POST /api/scheduler/trigger-rescan` or wait for the next scheduled run.
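
The grace-period rule boils down to a single comparison. The function below is a simplified model of that decision, not APScheduler's own code; the name `missed_run_action` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

MISFIRE_GRACE = timedelta(seconds=3600)  # mirrors misfire_grace_time=3600


def missed_run_action(scheduled: datetime, restarted: datetime) -> str:
    """Return "run" if the missed job is still inside the grace window."""
    lateness = restarted - scheduled
    return "run" if lateness <= MISFIRE_GRACE else "drop"


scheduled = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
after_45_min = scheduled + timedelta(minutes=45)
after_90_min = scheduled + timedelta(minutes=90)
```

With these sample times, a restart 45 minutes late yields `"run"` and a restart 90 minutes late yields `"drop"`, which matches the behaviour described above.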

### Database Session Management

`get_async_session_factory()` returns a **new AsyncSession instance** directly (not a factory). The function name is historical — callers receive the session immediately:

```python
# Correct usage:
db = get_async_session_factory()  # db IS the session
await db.execute(...)
await db.commit()
await db.close()
```

Do NOT call the result again with `()` — that tries to call an `AsyncSession` object, causing `'AsyncSession' object is not callable`.

For context manager usage, prefer `get_db_session()` (auto-commits) or `get_transactional_session()` (manual commit).
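
For readers unfamiliar with the pattern, an auto-committing session context manager typically has the shape below. This is a generic sketch, not the project's `get_db_session()`: `FakeSession` and `auto_commit_session()` are hypothetical, and the real helper may differ in how it handles rollback.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, List


class FakeSession:
    """Records calls instead of talking to a database (hypothetical)."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    async def commit(self) -> None:
        self.calls.append("commit")

    async def rollback(self) -> None:
        self.calls.append("rollback")

    async def close(self) -> None:
        self.calls.append("close")


@asynccontextmanager
async def auto_commit_session(session: FakeSession) -> AsyncIterator[FakeSession]:
    """Commit on success, roll back on error, always close."""
    try:
        yield session
        await session.commit()
    except Exception:
        await session.rollback()
        raise
    finally:
        await session.close()


async def demo(fail: bool) -> List[str]:
    """Run one unit of work and return the recorded session calls."""
    session = FakeSession()
    try:
        async with auto_commit_session(session):
            if fail:
                raise ValueError("simulated query error")
    except ValueError:
        pass
    return session.calls
```

The `finally` branch is what the manual `await db.close()` pattern lacks: the session is closed on both the success and the failure path.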

### Health Check Endpoints

The application provides health check endpoints for monitoring and container orchestration:

#### `GET /health`
Basic health check returning service status and startup health check results.

**Response fields:**
- `status`: "healthy", "degraded", or "unhealthy" based on startup checks
- `timestamp`: ISO timestamp of the check
- `series_app_initialized`: Whether the series app is loaded
- `anime_directory_configured`: Whether anime_directory is set
- `scheduler_next_run` / `scheduler_last_run`: Scheduler times
- `checks`: Detailed startup check results (ffmpeg, DNS, anime_directory)

#### `GET /health/ready`
Readiness check for container orchestrators (Kubernetes, Docker Swarm).

**Response when ready:**
```json
{
  "status": "ready",
  "ready": true,
  "timestamp": "2024-01-01T00:00:00",
  "checks": {...}
}
```

**Response when not ready (503):**
```json
{
  "status": "not_ready",
  "ready": false,
  "timestamp": "2024-01-01T00:00:00",
  "critical_failures": ["anime_directory: not configured"],
  "checks": {...}
}
```

#### `GET /health/detailed`
Comprehensive health check including database, filesystem, and system metrics.

#### Startup Health Checks

On application startup, the following checks are performed:

| Check | Failure Status | Impact |
|-------|---------------|--------|
| `ffmpeg` | warning | HLS downloads may fail |
| `dns_aniworld` | warning | Provider requests may fail |
| `dns_tmdb` | warning | TMDB API calls may fail |
| `anime_directory` | error | Download service disabled |

DNS checks are warnings because failures can be transient. An `anime_directory` error disables the download service so that downloads are not attempted against a missing or unwritable directory.
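
One plausible way to turn the table above into a response is sketched below. The mapping of warnings to `degraded` and errors to `unhealthy` is an assumption, as is the `summarize()` helper; the real endpoint also formats `critical_failures` as descriptive strings rather than bare check names.

```python
from typing import Dict, List

# Failure severity per check, taken from the table above.
CHECK_SEVERITY: Dict[str, str] = {
    "ffmpeg": "warning",
    "dns_aniworld": "warning",
    "dns_tmdb": "warning",
    "anime_directory": "error",
}


def summarize(results: Dict[str, bool]) -> Dict[str, object]:
    """Derive overall status and readiness from per-check pass/fail flags."""
    failed: List[str] = [name for name, ok in results.items() if not ok]
    critical = [name for name in failed if CHECK_SEVERITY.get(name) == "error"]
    if critical:
        status = "unhealthy"  # assumed mapping: any error-level failure
    elif failed:
        status = "degraded"   # assumed mapping: warnings only
    else:
        status = "healthy"
    return {"status": status, "ready": not critical, "critical_failures": critical}
```

Under this model a missing ffmpeg binary degrades the service but keeps it ready, while an unconfigured `anime_directory` makes the readiness check fail.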

### Troubleshooting Development Issues

#### Scheduler missed a run

1. Server was down at scheduled time (03:00 UTC by default).
2. Check `data/scheduler.db` exists — if not, jobs are not persisted.
3. If server was down >1 hour, missed job is dropped (misfire window exceeded).
4. Trigger manually: `POST /api/scheduler/trigger-rescan`
5. Monitor next run: `GET /health` → `scheduler_next_run`
6. If problem repeats, increase `misfire_grace_time` in `scheduler_service.py`.

#### Scheduler not firing (no events at scheduled time)

If the scheduler appears configured but never triggers:

1. **Verify scheduler.db contains the job:**
   ```bash
   sqlite3 data/scheduler.db "SELECT id, next_run_time FROM apscheduler_jobs;"
   ```
   - `next_run_time` should be in the future
   - If it's in the past, the server was down when the job should have fired

2. **Check application logs for scheduler startup:**
   ```bash
   grep "Scheduler service started" fastapi_app.log
   ```
   - If missing, the scheduler failed to start — check for errors above this line
   - If present, scheduler started successfully

3. **Verify APScheduler events in logs:**
   ```bash
   grep "apscheduler.executors.default" fastapi_app.log
   ```
   - `Running job` = job triggered
   - `executed successfully` = job completed
   - No output = job never fired

4. **Test manual trigger:**
   ```bash
   curl -X POST http://localhost:8000/api/scheduler/trigger-rescan -H "Authorization: Bearer <token>"
   ```
   - If manual trigger works but cron doesn't, the issue is APScheduler configuration

5. **Check next_run_time via health endpoint:**
   ```bash
   curl http://localhost:8000/health | jq .scheduler_next_run
   ```
   - If `null`, the job is not scheduled
   - If set, the scheduler knows when to run next

6. **Check timezone handling:**
   - APScheduler uses UTC internally
   - The schedule_time config (e.g., "03:00") is interpreted as UTC
   - If you expect local time, adjust the schedule_time accordingly
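
When adjusting the schedule for a local time zone, the conversion can be done with the standard library. The helper below is a hypothetical convenience, not part of the project; it takes a fixed UTC offset and therefore ignores daylight saving time changes.

```python
from datetime import datetime, timedelta, timezone


def local_to_utc_schedule(local_hhmm: str, utc_offset_hours: float) -> str:
    """Convert a local "HH:MM" time to the UTC "HH:MM" to put in the config."""
    hour, minute = map(int, local_hhmm.split(":"))
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # The date is arbitrary; only the time of day matters for the result.
    local_dt = datetime(2024, 1, 1, hour, minute, tzinfo=local_tz)
    return local_dt.astimezone(timezone.utc).strftime("%H:%M")
```

For example, a run at 03:00 local time in a UTC+2 zone corresponds to `local_to_utc_schedule("03:00", 2)`, which gives `"01:00"`.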

#### Startup health check failures

If `/health` returns a `degraded` or `unhealthy` status, inspect the `checks` field for the failing check:

1. **anime_directory error**: Directory not configured or not writable
   - Check `ANIME_DIRECTORY` environment variable
   - Verify directory exists and permissions allow write access
   - Download service will not initialize until resolved

2. **ffmpeg warning**: ffmpeg not found in PATH
   - HLS stream downloads will fail
   - Install ffmpeg: `apt install ffmpeg` or `brew install ffmpeg`

3. **DNS warnings**: Domain resolution failed
   - Check network connectivity
   - DNS failures are often transient, so these warnings do not block startup
   - Retry later to verify: `GET /health`

### Provider Failure Handling

Download providers (VOE, Doodstream, Vidmoly, Vidoza, SpeedFiles, Streamtape,
Luluvdo) regularly break: URLs expire, sites change their player markup, geo
blocks appear, and `yt-dlp` extractors lag behind upstream changes. The
`AniworldLoader.download()` flow is designed to fail fast and rotate.

**Rotation order**

1. The episode page is scraped for the providers AniWorld actually advertises.
2. Results are ordered by the preference in `DEFAULT_PROVIDERS`
   (`provider_config.py`); providers not listed run last.
3. For each candidate the loader:
   1. Calls `_check_url_alive()` — HEAD probe with GET fallback. Any 4xx
      response or connection error skips the provider immediately.
   2. Resolves the redirect via `_resolve_direct_link()` to obtain a direct
      stream URL plus headers. Provider-specific extractors (e.g. `VOE`) are
      preferred; unknown providers fall back to the embed URL so `yt-dlp` can
      attempt extraction.
   3. Tries `_try_direct_stream()` — straight `requests.get(stream=True)` when
      `Content-Type` is `video/*` or `application/octet-stream`. This avoids
      `yt-dlp` entirely for direct MP4 links.
   4. Falls back to `yt-dlp` with the ffmpeg downloader for HLS streams.
4. On any failure, temp files are cleaned and the loop moves to the next
   provider. When the chain is exhausted, the loader logs
   `All download providers failed for S{season}E{episode} ...; tried=[...]`
   to both the application log and `logs/download_errors.log`.
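
The ordering and fallback loop can be sketched independently of the real loader. Everything below is illustrative: the contents and order of `DEFAULT_PROVIDERS` are assumed, and `order_providers()`, `download_with_rotation()`, and the `attempt` callback are hypothetical names that collapse steps 3.1 to 3.4 into a single call.

```python
from typing import Callable, List, Optional, Tuple

# Assumed preference order; the real list lives in provider_config.py.
DEFAULT_PROVIDERS: List[str] = ["VOE", "Vidmoly", "Doodstream"]


def order_providers(advertised: List[str], preferred: List[str]) -> List[str]:
    """Sort advertised providers by preference; unlisted ones go last."""
    rank = {name: index for index, name in enumerate(preferred)}
    # sorted() is stable, so unlisted providers keep their page order.
    return sorted(advertised, key=lambda name: rank.get(name, len(preferred)))


def download_with_rotation(
    advertised: List[str], attempt: Callable[[str], bool]
) -> Tuple[Optional[str], List[str]]:
    """Try providers in order; return (winner or None, providers tried)."""
    tried: List[str] = []
    for provider in order_providers(advertised, DEFAULT_PROVIDERS):
        tried.append(provider)
        try:
            if attempt(provider):
                return provider, tried
        except Exception:
            # A real loader would clean up temp files and log here.
            continue
    return None, tried
```

Returning the list of tried providers alongside the result makes it easy to produce a `tried=[...]` style log line when the chain is exhausted.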

**Do not hardcode provider URLs.** Provider domains shift constantly (e.g.
Doodstream alternates between `dood.li`, `dood.so`, `dood.la`). Only the
referer hints in `PROVIDER_HEADERS` are persisted — discovery still happens
at runtime through AniWorld's redirect endpoint.

### HLS Stream Handling

HLS (HTTP Live Streaming) manifests (`.m3u8`) require yt-dlp to use the
`ffmpeg` downloader with `--hls-use-mpegts`. Both providers configure this
automatically:

```python
ydl_opts = {
    "downloader": "ffmpeg",     # Use ffmpeg instead of native
    "hls_use_mpegts": True,     # Write transport stream (.ts) segments
}
```

**Why this matters**: Without ffmpeg, yt-dlp logs:
`"Live HLS streams are not supported by the native downloader"`

**Requirements**:
- ffmpeg must be installed and in PATH (`which ffmpeg`)
- Install: `apt install ffmpeg` (Debian/Ubuntu) or `brew install ffmpeg` (macOS)
- Startup health check (see Health Check Endpoints) verifies ffmpeg presence
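
A presence check for ffmpeg needs nothing beyond `shutil.which`. The function name and the shape of the returned dict are assumptions for illustration and may not match the project's startup check.

```python
import shutil
from typing import Dict, Optional


def check_ffmpeg() -> Dict[str, Optional[str]]:
    """Report whether an ffmpeg executable is reachable via PATH."""
    path = shutil.which("ffmpeg")  # None when ffmpeg is not on PATH
    return {"status": "ok" if path else "warning", "path": path}
```

Because a missing binary only affects HLS downloads, the sketch reports a warning rather than an error, in line with the startup check table.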

**Trade-offs**:
- HLS downloads are slower than direct MP4 (reassembly of .ts segments)
- Requires more disk space during download
- May need post-processing if .ts format is not desired

**Detection**: VOE provider extracts HLS URLs via `HLS_PATTERN` regex. Other
providers let yt-dlp auto-detect from URL/content-type.

### Updating yt-dlp

When extractors break (typical symptoms: every provider HEAD probe succeeds
but `yt-dlp` raises `Unable to extract` or `HTTP Error 404`):

1. Check the upstream tracker first: https://github.com/yt-dlp/yt-dlp/issues
2. Upgrade in the conda environment:
   ```bash
   conda run -n AniWorld pip install --upgrade yt-dlp
   ```
3. Smoke-test against a known-good episode before pinning a new floor in
   `requirements.txt` (`yt-dlp>=YYYY.MM.DD`).
4. Re-run the provider test suite:
   ```bash
   conda run -n AniWorld python -m pytest tests/unit/test_aniworld_provider.py -v
   ```
5. If a specific extractor is removed upstream, drop the provider from
   `DEFAULT_PROVIDERS` rather than patching `yt-dlp` in tree.

### User Notification on Total Failure

`SeriesApp.download_episode()` already emits a `download_status="failed"`
WebSocket event when `loader.download()` returns `False`. Operators should
forward this to `notification_service.notify_download_failed()` so users see
a HIGH-priority alert. The loader keeps the failure detail in
`logs/download_errors.log` for post-mortem.
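
Forwarding could be wired up along these lines. The event is modelled as a plain dict with `download_status` and `message` keys, and `make_failure_forwarder()` is a hypothetical helper; the actual event payload and the signature of `notify_download_failed()` are not documented here and may differ.

```python
from typing import Callable, Dict


def make_failure_forwarder(
    notify: Callable[[str], None],
) -> Callable[[Dict[str, str]], None]:
    """Build an event handler that forwards only failed downloads."""

    def on_event(event: Dict[str, str]) -> None:
        if event.get("download_status") == "failed":
            # Assumed payload key; fall back to a generic text if absent.
            notify(event.get("message", "Download failed"))

    return on_event


sent = []
handler = make_failure_forwarder(sent.append)
handler({"download_status": "failed", "message": "S01E02 failed"})
handler({"download_status": "completed"})
```

After the two calls, `sent` contains only the failure message, so successful downloads do not generate alerts.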

## Series Storage

### Overview

Series metadata is now stored in the database (SQLAlchemy ORM).
Legacy files (`key` and `data` per folder) are deprecated but preserved
for backward compatibility.

### Architecture

- **Database**: Single source of truth for all series metadata
- **In-Memory Cache**: SeriesApp maintains a cache for performance
- **Filesystem**: Only used for episode files themselves, not metadata

### Migration

The first startup after an upgrade automatically imports any legacy
series files into the database.
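
The discovery half of such an import might look like the sketch below. It is an assumption-based illustration: `collect_legacy_series()` is a hypothetical function, folders without a `key` file are skipped by choice, and writing the records to the database is left out.

```python
import json
from pathlib import Path
from typing import Dict, List


def collect_legacy_series(anime_dir: Path) -> List[Dict[str, object]]:
    """Scan series folders for legacy `key` and `data` files."""
    found: List[Dict[str, object]] = []
    for folder in sorted(p for p in anime_dir.iterdir() if p.is_dir()):
        key_file = folder / "key"
        data_file = folder / "data"
        if not key_file.is_file():
            continue  # no legacy metadata in this folder
        record: Dict[str, object] = {
            "folder": folder.name,
            "key": key_file.read_text(encoding="utf-8").strip(),
        }
        if data_file.is_file():
            # The data file holds the serialized Serie object as JSON.
            record["data"] = json.loads(data_file.read_text(encoding="utf-8"))
        found.append(record)
    return found
```

The function only reads files, so running it repeatedly is harmless; an importer built on top of it would still need to skip series that already exist in the database.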

### Legacy Files

- `key` file: Contains series provider key (deprecated)
- `data` file: Contains Serie JSON object (deprecated)

Both are safe to delete after migration; not needed for normal operation.