
Lukas bc87bee416 refactor(scheduler): drop separate scheduler.db in favour of MemoryJobStore

Scheduler used a separate SQLite file (scheduler.db) only to persist one
cron job. This was originally required because APScheduler's
SQLAlchemyJobStore is sync-only, creating an async/sync driver conflict
when accessing the same file.

The job is rebuilt from config.json on every startup regardless
(replace_existing=True), so the persisted state only served misfire
detection. Moved misfire detection into the app layer by querying
system_settings.last_scan_timestamp on startup: if the last scan is
>23h but <25h ago, an immediate rescan is triggered.

Change summary:
- Remove SQLAlchemyJobStore; use default MemoryJobStore instead
- Add _check_missed_run() that reads last_scan_timestamp from aniworld.db
- Update docs/DEVELOPMENT.md scheduler troubleshooting section
- Update the scheduler unit test that verified SQLAlchemyJobStore

2026-05-27 22:09:18 +02:00


Development Guide

Document Purpose

This document provides guidance for developers working on the Aniworld project.

What This Document Contains

Prerequisites: Required software and tools
Environment Setup: Step-by-step local development setup
Project Structure: Source code organization explanation
Development Workflow: Branch strategy, commit conventions
Coding Standards: Style guide, linting, formatting
Running the Application: Development server, CLI usage
Debugging Tips: Common debugging approaches
IDE Configuration: VS Code settings, recommended extensions
Contributing Guidelines: How to submit changes
Code Review Process: Review checklist and expectations

What This Document Does NOT Contain

Production deployment (see DEPLOYMENT.md)
API reference (see API.md)
Architecture decisions (see ARCHITECTURE.md)
Test writing guides (see TESTING.md)
Security guidelines (see SECURITY.md)

Target Audience

New Developers joining the project
Contributors (internal and external)
Anyone setting up a development environment

Sections to Document

Prerequisites
- Python version
- Conda environment
- Node.js (if applicable)
- Git
Getting Started
- Clone repository
- Setup conda environment
- Install dependencies
- Configuration setup
Project Structure Overview
Development Server
- Starting FastAPI server
- Hot reload configuration
- Debug mode
CLI Development
Code Style
- PEP 8 compliance
- Type hints requirements
- Docstring format
- Import organization
Git Workflow
- Branch naming
- Commit message format
- Pull request process
Common Development Tasks

Adding Queue Deduplication

The download queue prevents duplicate entries at two levels, with a scheduler cooldown as an additional guard against rapid re-triggers:

In-Memory Deduplication (src/server/services/download_service.py):

_pending_by_episode dict tracks pending episodes: key = (serie_id, season, episode)
_add_to_pending_queue() updates the dict when adding items
add_to_queue() checks this dict before adding episodes (includes batch-local dedup)
_remove_from_pending_queue() cleans up the dict when items are removed
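
The bookkeeping described above can be illustrated with a minimal sketch. The class name, method signatures, and item-id format below are hypothetical and only mirror the idea of a dict keyed by (serie_id, season, episode); the real DownloadService may differ.

```python
from typing import Dict, Iterable, List, Tuple

EpisodeKey = Tuple[str, int, int]  # (serie_id, season, episode)


class PendingQueueSketch:
    """Hypothetical stand-in for the dedup bookkeeping in DownloadService."""

    def __init__(self) -> None:
        # Maps an episode key to the id of its pending queue item.
        self._pending_by_episode: Dict[EpisodeKey, str] = {}

    def add_to_queue(self, serie_id: str, episodes: Iterable[Tuple[int, int]]) -> List[str]:
        """Queue episodes, skipping any that are already pending."""
        added: List[str] = []
        for season, episode in episodes:
            key = (serie_id, season, episode)
            # Covers earlier calls and repeats within this batch,
            # because the dict is updated as each item is added.
            if key in self._pending_by_episode:
                continue
            item_id = f"{serie_id}-S{season:02d}E{episode:02d}"
            self._pending_by_episode[key] = item_id
            added.append(item_id)
        return added

    def remove_from_queue(self, serie_id: str, season: int, episode: int) -> None:
        """Forget an episode so it can be queued again later."""
        self._pending_by_episode.pop((serie_id, season, episode), None)
```

Because the dict is updated inside the loop, a batch that lists the same episode twice only queues it once.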

Database Constraint (src/server/models.py):

DownloadQueueItem has a unique index on episode_id via __table_args__
Prevents duplicate queue entries at the database level
Unique constraint: Index("ix_download_queue_episode_pending", "episode_id", unique=True)
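
To show what the unique index buys at the database level, here is a small sketch using the standard-library sqlite3 module rather than the project's SQLAlchemy model. The table name download_queue and the enqueue() helper are assumptions made for illustration; only the index name comes from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE download_queue (id INTEGER PRIMARY KEY, episode_id INTEGER NOT NULL)"
)
conn.execute(
    "CREATE UNIQUE INDEX ix_download_queue_episode_pending ON download_queue (episode_id)"
)


def enqueue(episode_id: int) -> bool:
    """Insert a queue row; return False if the unique index rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO download_queue (episode_id) VALUES (?)", (episode_id,)
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Even if the in-memory check is bypassed (for example by a second process), the second insert for the same episode_id fails instead of creating a duplicate row.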

Scheduler Cooldown (src/server/services/scheduler_service.py):

_last_auto_download_time tracks when auto-download last ran
5-minute cooldown prevents rapid re-triggers
Checked at start of _auto_download_missing()
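
A cooldown of this kind can be sketched as a small gate object. CooldownGate and its injectable clock are hypothetical; the real service tracks _last_auto_download_time directly, and the handling of the exact 5-minute boundary here is an assumption.

```python
import time
from typing import Callable, Optional


class CooldownGate:
    """Hypothetical helper that allows one run per cooldown window."""

    def __init__(
        self,
        cooldown_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._cooldown = cooldown_seconds
        self._clock = clock  # injectable so tests do not need to sleep
        self._last_run: Optional[float] = None

    def try_acquire(self) -> bool:
        """Return True and record the time if the cooldown has elapsed."""
        now = self._clock()
        if self._last_run is not None and now - self._last_run < self._cooldown:
            return False
        self._last_run = now
        return True
```

A rejected call does not reset the timer, so repeated triggers cannot extend the cooldown indefinitely.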

Episode Lifecycle

Episodes transition through states stored in the episodes table:

State	`is_downloaded`	`file_path`	Description
Missing	`False`	`NULL`	Episode not yet downloaded
Downloaded	`True`	Set	Episode exists on disk

State Transitions:

Missing → Downloaded: When download completes, _remove_episode_from_missing_list() calls EpisodeService.mark_downloaded() to set is_downloaded=True and populate file_path. The episode record is NOT deleted.

Query Implications:

get_series_with_missing_episodes(): Filters for is_downloaded=False to find series with undownloaded episodes
get_series_with_no_episodes(): Finds series that have is_downloaded=False episodes but no is_downloaded=True episodes (series with nothing downloaded yet)
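
The difference between the two queries is easiest to see on a handful of rows. The following sketch uses plain Python sets instead of the real SQLAlchemy queries; EpisodeRow and both function names are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class EpisodeRow:
    serie_id: str
    is_downloaded: bool


def series_with_missing_episodes(rows: List[EpisodeRow]) -> Set[str]:
    """Series that have at least one episode not yet downloaded."""
    return {r.serie_id for r in rows if not r.is_downloaded}


def series_with_no_episodes(rows: List[EpisodeRow]) -> Set[str]:
    """Series with missing episodes and no downloaded episode at all."""
    has_downloaded = {r.serie_id for r in rows if r.is_downloaded}
    return series_with_missing_episodes(rows) - has_downloaded
```

A partially downloaded series appears only in the first result, while a series with nothing on disk appears in both.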

Mocking the Download Queue

When testing components that use the download queue:

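from typing import Dict, List

import pytest

# DownloadItem and DownloadService come from the project's own modules.
# Mock repository for unit tests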
class MockQueueRepository:
    def __init__(self):
        self._items: Dict[str, DownloadItem] = {}

    async def save_item(self, item: DownloadItem) -> DownloadItem:
        self._items[item.id] = item
        return item

    async def get_all_items(self) -> List[DownloadItem]:
        return list(self._items.values())

# Use in fixture
@pytest.fixture
def mock_queue_repository():
    return MockQueueRepository()

@pytest.fixture
def download_service(mock_anime_service, mock_queue_repository):
    return DownloadService(
        anime_service=mock_anime_service,
        queue_repository=mock_queue_repository,
        max_retries=3,
    )

Troubleshooting Development Issues

Async Context Managers for aiohttp

All aiohttp.ClientSession usages must be wrapped in async with:

# Correct — session properly closed on exit
async with TMDBClient(api_key="key") as client:
    result = await client.search_tv_show("Show")

# Wrong — session may leak if exception occurs
client = TMDBClient(api_key="key")
result = await client.search_tv_show("Show")
await client.close()  # May not be called if exception raised earlier

Why:

aiohttp.ClientSession holds TCP connections that must be explicitly closed
If an exception occurs before close() is reached, the session leaks
Context manager guarantees __aexit__ runs even on exceptions
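
The guarantee can be demonstrated without a network connection. FakeClient below is a hypothetical stand-in for a client that owns an aiohttp session; it only records whether close() ran.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for a client that owns an aiohttp session."""

    def __init__(self) -> None:
        self.closed = False

    async def __aenter__(self) -> "FakeClient":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        await self.close()  # runs on normal exit and on exceptions

    async def close(self) -> None:
        self.closed = True

    async def search(self, name: str) -> str:
        raise RuntimeError(f"lookup failed for {name}")


async def with_context_manager() -> bool:
    client = FakeClient()
    try:
        async with client:
            await client.search("Show")
    except RuntimeError:
        pass
    return client.closed


async def without_context_manager() -> bool:
    client = FakeClient()
    try:
        await client.search("Show")
        await client.close()  # skipped because search() raised
    except RuntimeError:
        pass
    return client.closed
```

Running both coroutines with asyncio.run() shows the client closed in the first case and still open in the second.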

Services that use aiohttp:

TMDBClient — has __aenter__/__aexit__, use async with
ImageDownloader — has __aenter__/__aexit__, use async with
NFOService — wraps both above, use async with

Verification:

Missing context manager usage triggers __del__ warning on garbage collection
Integration tests verify no "Unclosed client session" errors in logs

Scheduler Persistence and Recovery

The scheduler uses APScheduler's in-memory job store. Jobs are reconstructed from config.json on every startup — no separate database is needed.

# Jobs are built from config on startup — no persistence DB required
scheduler = AsyncIOScheduler()  # default MemoryJobStore
scheduler.add_job(..., replace_existing=True)

Startup misfire recovery: On start(), the scheduler checks system_settings.last_scan_timestamp in aniworld.db. If the last scan ran more than 23h but less than 25h ago, an immediate rescan is triggered. This replaces APScheduler's built-in misfire handling, which required a separate SQLite database.

Grace period: If the server was down for more than 25 hours, no automatic recovery occurs to avoid surprise rescans after long downtime.
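
The recovery window reduces to a single comparison. The sketch below is an illustration, not the project's _check_missed_run(): the function name is hypothetical, and treating a missing timestamp as "no recovery" is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_trigger_rescan(last_scan: Optional[datetime], now: datetime) -> bool:
    """Return True when the last scan is more than 23h but less than 25h old."""
    if last_scan is None:
        return False  # assumption: no recorded scan means nothing to recover
    age = now - last_scan
    return timedelta(hours=23) < age < timedelta(hours=25)
```

For example, with now fixed at 04:00 UTC, a last scan at 04:00 UTC the previous day (24h ago) triggers a rescan, while scans 22h or 26h ago do not.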

Health endpoint: GET /health returns scheduler_next_run and scheduler_last_run for external monitors (Uptime Kuma, Prometheus, etc.).

If server is down too long: Manual trigger via POST /api/scheduler/trigger-rescan or wait for next scheduled run.

Database Session Management

get_async_session_factory() returns a new AsyncSession instance directly (not a factory). The function name is historical — callers receive the session immediately:

# Correct usage:
db = get_async_session_factory()  # db IS the session
await db.execute(...)
await db.commit()
await db.close()

Do NOT call the result again with () — that tries to call an AsyncSession object, causing 'AsyncSession' object is not callable.

For context manager usage, prefer get_db_session() (auto-commits) or get_transactional_session() (manual commit).

Health Check Endpoints

The application provides health check endpoints for monitoring and container orchestration:

`GET /health`

Basic health check returning service status and startup health check results.

Response fields:

status: "healthy", "degraded", or "unhealthy" based on startup checks
timestamp: ISO timestamp of the check
series_app_initialized: Whether the series app is loaded
anime_directory_configured: Whether anime_directory is set
scheduler_next_run / scheduler_last_run: Scheduler times
checks: Detailed startup check results (ffmpeg, DNS, anime_directory)

`GET /health/ready`

Readiness check for container orchestrators (Kubernetes, Docker Swarm).

Response when ready:

{
  "status": "ready",
  "ready": true,
  "timestamp": "2024-01-01T00:00:00",
  "checks": {...}
}

Response when not ready (503):

{
  "status": "not_ready",
  "ready": false,
  "timestamp": "2024-01-01T00:00:00",
  "critical_failures": ["anime_directory: not configured"],
  "checks": {...}
}
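
As a rough sketch of how such a response could be assembled, the function below derives the body and HTTP status from the startup check results. The shape of each check entry (a dict with "status" and "message" keys) and the 200 status for the ready case are assumptions; only the field names in the body and the 503 status come from the examples above.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Tuple


def build_readiness(checks: Dict[str, Dict[str, str]]) -> Tuple[int, Dict[str, Any]]:
    """Return (http_status, body) for a readiness probe."""
    # Only checks that failed with severity "error" block readiness.
    critical = [
        f"{name}: {result.get('message', 'failed')}"
        for name, result in checks.items()
        if result.get("status") == "error"
    ]
    body: Dict[str, Any] = {
        "status": "not_ready" if critical else "ready",
        "ready": not critical,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "checks": checks,
    }
    if critical:
        body["critical_failures"] = critical
    return (503 if critical else 200), body
```

Warnings such as a missing ffmpeg binary stay visible under "checks" without making the service report as not ready.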

`GET /health/detailed`

Comprehensive health check including database, filesystem, and system metrics.

Startup Health Checks

On application startup, the following checks are performed:

Check	Failure Status	Impact
`ffmpeg`	warning	HLS downloads may fail
`dns_aniworld`	warning	Provider requests may fail
`dns_tmdb`	warning	TMDB API calls may fail
`anime_directory`	error	Download service disabled

DNS checks are warnings because failures can be transient. anime_directory errors disable the download service to prevent failures.
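
One plausible way to fold the individual results into the overall status value is sketched below. The mapping (any error gives "unhealthy", any warning gives "degraded") is an assumption based on the table above, and overall_status() is a hypothetical name.

```python
from typing import Dict


def overall_status(check_results: Dict[str, str]) -> str:
    """Map check name -> "ok" | "warning" | "error" to an overall status."""
    severities = set(check_results.values())
    if "error" in severities:
        return "unhealthy"
    if "warning" in severities:
        return "degraded"
    return "healthy"
```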

Troubleshooting Development Issues

Scheduler missed a run

Server was down at scheduled time (03:00 UTC by default).
On restart, the scheduler checks last_scan_timestamp. If the last scan ran more than 23h but less than 25h ago, it triggers a rescan immediately.
If server was down >25 hours, missed job is skipped to avoid surprise rescans.
Trigger manually: POST /api/scheduler/trigger-rescan
Monitor next run: GET /health → scheduler_next_run

Scheduler not firing (no events at scheduled time)

If the scheduler appears configured but never triggers:

Check application logs for scheduler startup:
```
grep "Scheduler service started" fastapi_app.log
```
- If missing, the scheduler failed to start — check for errors above this line
- If present, scheduler started successfully

Verify the job is registered:

grep "Scheduler started with cron trigger" fastapi_app.log

Verify APScheduler events in logs:
```
grep "apscheduler.executors.default" fastapi_app.log
```
- Running job = job triggered
- executed successfully = job completed
- No output = job never fired

Test manual trigger:

curl -X POST http://localhost:8000/api/scheduler/trigger-rescan -H "Authorization: Bearer <token>"

If the manual trigger works but the cron job does not fire, the problem most likely lies in the APScheduler configuration.

Check next_run_time via health endpoint:
```
curl http://localhost:8000/health | jq .scheduler_next_run
```
- If null, the job is not scheduled
- If set, the scheduler knows when to run next

Check timezone handling:
- APScheduler uses UTC internally
- The schedule_time config (e.g., "03:00") is interpreted as UTC
- If you expect local time, adjust the schedule_time accordingly
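
To work out which UTC value to put into schedule_time for a desired local time, a small conversion helper can be used. This is a sketch with a hypothetical function name; it takes a fixed UTC offset, so it does not follow daylight saving changes and the value may need adjusting twice a year.

```python
from datetime import datetime, timedelta, timezone


def local_to_utc_schedule(local_hhmm: str, utc_offset_hours: float) -> str:
    """Convert a local "HH:MM" time at a fixed UTC offset to "HH:MM" in UTC."""
    hour, minute = map(int, local_hhmm.split(":"))
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # The date is arbitrary; only the time of day matters for a fixed offset.
    local_dt = datetime(2000, 1, 1, hour, minute, tzinfo=local_tz)
    return local_dt.astimezone(timezone.utc).strftime("%H:%M")
```

For a server whose users are at UTC+2, a desired local run at 05:00 corresponds to a schedule_time of "03:00".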

Startup health check failures

If /health returns unhealthy status:

anime_directory error: Directory not configured or not writable
- Check ANIME_DIRECTORY environment variable
- Verify directory exists and permissions allow write access
- Download service will not initialize until resolved
ffmpeg warning: ffmpeg not found in PATH
- HLS stream downloads will fail
- Install ffmpeg: apt install ffmpeg or brew install ffmpeg
DNS warnings: Domain resolution failed
- Check network connectivity
- DNS failures are often transient, so warnings don't block startup
- Retry later to verify: GET /health

Provider Failure Handling

Download providers (VOE, Doodstream, Vidmoly, Vidoza, SpeedFiles, Streamtape, Luluvdo) regularly break: URLs expire, sites change their player markup, geo blocks appear, and yt-dlp extractors lag behind upstream changes. The AniworldLoader.download() flow is designed to fail fast and rotate.

Rotation order

The episode page is scraped for the providers AniWorld actually advertises.
Results are ordered by the preference in DEFAULT_PROVIDERS (provider_config.py); providers not listed run last.
For each candidate the loader:
1. Calls _check_url_alive() — HEAD probe with GET fallback. Any 4xx response or connection error skips the provider immediately.
2. Resolves the redirect via _resolve_direct_link() to obtain a direct stream URL plus headers. Provider-specific extractors (e.g. VOE) are preferred; unknown providers fall back to the embed URL so yt-dlp can attempt extraction.
3. Tries _try_direct_stream() — straight requests.get(stream=True) when Content-Type is video/* or application/octet-stream. This avoids yt-dlp entirely for direct MP4 links.
4. Falls back to yt-dlp with the ffmpeg downloader for HLS streams.
On any failure, temp files are cleaned and the loop moves to the next provider. When the chain is exhausted, the loader logs All download providers failed for S{season}E{episode} ...; tried=[...] to both the application log and logs/download_errors.log.
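
The ordering and fail-fast loop can be sketched independently of any real provider. In the code below, order_providers(), download_with_rotation(), and the try_provider callback are hypothetical; the callback stands in for the probe, resolve, and download steps listed above, and any preference list passed in is an example rather than the real DEFAULT_PROVIDERS.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple


def order_providers(advertised: Iterable[str], preference: Sequence[str]) -> List[str]:
    """Sort advertised providers by preference; unlisted ones go last."""
    rank = {name: i for i, name in enumerate(preference)}
    # sorted() is stable, so unlisted providers keep their scraped order.
    return sorted(advertised, key=lambda name: rank.get(name, len(preference)))


def download_with_rotation(
    advertised: Iterable[str],
    preference: Sequence[str],
    try_provider: Callable[[str], bool],
) -> Tuple[Optional[str], List[str]]:
    """Try providers in order; return (winning provider or None, tried list)."""
    tried: List[str] = []
    for name in order_providers(advertised, preference):
        tried.append(name)
        try:
            if try_provider(name):
                return name, tried
        except Exception:
            # Any failure (dead URL, extractor error, ...) rotates onward.
            continue
    return None, tried
```

The returned tried list corresponds to the tried=[...] detail that the loader logs when every provider fails.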

Do not hardcode provider URLs. Provider domains shift constantly (e.g. Doodstream alternates between dood.li, dood.so, dood.la). Only the referer hints in PROVIDER_HEADERS are persisted — discovery still happens at runtime through AniWorld's redirect endpoint.

HLS Stream Handling

HLS (HTTP Live Streaming) manifests (.m3u8) require yt-dlp to use the ffmpeg downloader with --hls-use-mpegts. Both providers configure this automatically:

ydl_opts = {
    "downloader": "ffmpeg",     # Use ffmpeg instead of native
    "hls_use_mpegts": True,     # Write transport stream (.ts) segments
}

Why this matters: Without ffmpeg, yt-dlp logs: "Live HLS streams are not supported by the native downloader"

Requirements:

ffmpeg must be installed and in PATH (which ffmpeg)
Install: apt install ffmpeg (Debian/Ubuntu) or brew install ffmpeg (macOS)
Startup health check (see Health Check Endpoints) verifies ffmpeg presence

Trade-offs:

HLS downloads are slower than direct MP4 (reassembly of .ts segments)
Requires more disk space during download
May need post-processing if .ts format is not desired

Detection: VOE provider extracts HLS URLs via HLS_PATTERN regex. Other providers let yt-dlp auto-detect from URL/content-type.
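
Regex-based detection of this kind might look like the sketch below. The pattern is hypothetical and written only for illustration; the project's actual HLS_PATTERN is not shown in this document and may differ.

```python
import re
from typing import Optional

# Hypothetical pattern; the project's actual HLS_PATTERN may differ.
HLS_URL_RE = re.compile(r"https?://[^\s'\"<>]+\.m3u8(?:\?[^\s'\"<>]*)?")


def find_hls_url(page_source: str) -> Optional[str]:
    """Return the first .m3u8 URL found in the page source, if any."""
    match = HLS_URL_RE.search(page_source)
    return match.group(0) if match else None
```

A page that only references an .mp4 file yields None, in which case the direct-stream path applies instead of the HLS path.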

Updating yt-dlp

When extractors break (typical symptoms: every provider HEAD probe succeeds but yt-dlp raises Unable to extract or HTTP Error 404):

Check the upstream tracker first: https://github.com/yt-dlp/yt-dlp/issues

Upgrade in the conda environment:

conda run -n AniWorld pip install --upgrade yt-dlp

Smoke-test against a known-good episode before pinning a new floor in requirements.txt (yt-dlp>=YYYY.MM.DD).

Re-run the provider test suite:

conda run -n AniWorld python -m pytest tests/unit/test_aniworld_provider.py -v

If a specific extractor is removed upstream, drop the provider from DEFAULT_PROVIDERS rather than patching yt-dlp in tree.

User Notification on Total Failure

SeriesApp.download_episode() already emits a download_status="failed" WebSocket event when loader.download() returns False. Operators should forward this to notification_service.notify_download_failed() so users see a HIGH-priority alert. The loader keeps the failure detail in logs/download_errors.log for post-mortem.

Series Storage

Overview

Series metadata is now stored in the database (SQLAlchemy ORM). Legacy files (key and data per folder) are deprecated but preserved for backward compatibility.

Architecture

Database: Single source of truth for all series metadata
In-Memory Cache: SeriesApp maintains a cache for performance
Filesystem: Only used for episode files themselves, not metadata

Migration

The first startup after an upgrade automatically imports any legacy series files into the database.

Legacy Files

key file: Contains series provider key (deprecated)
data file: Contains Serie JSON object (deprecated)

Both are safe to delete after migration; not needed for normal operation.
