Database Documentation

Document Purpose

This document describes the database schema, models, and data layer of the Aniworld application.


1. Database Overview

Technology

  • Database Engine: SQLite 3 (default), PostgreSQL supported
  • ORM: SQLAlchemy 2.0 with async support (aiosqlite)
  • Location: data/aniworld.db (configurable via DATABASE_URL)

Source: src/config/settings.py

Connection Configuration

# Default connection string
DATABASE_URL = "sqlite+aiosqlite:///./data/aniworld.db"

# PostgreSQL alternative
DATABASE_URL = "postgresql+asyncpg://user:pass@localhost/aniworld"

Source: src/server/database/connection.py
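
A minimal sketch of how the connection string might be resolved, assuming it is read from the DATABASE_URL environment variable and falls back to the SQLite default shown above. The helper name resolve_database_url is hypothetical and is not taken from src/config/settings.py.

```python
import os

# Default taken from the connection configuration above
DEFAULT_DATABASE_URL = "sqlite+aiosqlite:///./data/aniworld.db"


def resolve_database_url(env=None):
    """Return DATABASE_URL from the given mapping (or os.environ), else the default."""
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the SQLite default
    return env.get("DATABASE_URL") or DEFAULT_DATABASE_URL
```

Passing an explicit mapping instead of relying on os.environ keeps the helper easy to exercise in tests.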


2. Entity Relationship Diagram

+---------------------+       +-------------------+       +-------------------+       +------------------------+
| system_settings     |       |   anime_series    |       |     episodes      |       |  download_queue_item   |
+---------------------+       +-------------------+       +-------------------+       +------------------------+
| id (PK)             |       | id (PK)           |<--+   | id (PK)           |   +-->| id (PK, VARCHAR)       |
| initial_scan_...    |       | key (UNIQUE)      |   |   | series_id (FK)----+---+   | series_id (FK)---------+
| initial_nfo_scan... |       | name              |   +---|                   |       | status                 |
| initial_media_...   |       | site              |       | season            |       | priority               |
| last_scan_timestamp |       | folder            |       | episode_number    |       | season                 |
| created_at          |       | created_at        |       | title             |       | episode                |
| updated_at          |       | updated_at        |       | file_path         |       | progress_percent       |
+---------------------+       +-------------------+       | is_downloaded     |       | error_message          |
                                                          | created_at        |       | retry_count            |
                                                          | updated_at        |       | added_at               |
                                                          +-------------------+       | started_at             |
                                                                                      | completed_at           |
                                                                                      | created_at             |
                                                                                      | updated_at             |
                                                                                      +------------------------+

3. Table Schemas

3.1 system_settings

Stores application-wide system settings and initialization state.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PRIMARY KEY, AUTOINCREMENT | Internal database ID (only one row) |
| initial_scan_completed | BOOLEAN | NOT NULL, DEFAULT FALSE | Whether initial anime folder scan is complete |
| initial_nfo_scan_completed | BOOLEAN | NOT NULL, DEFAULT FALSE | Whether initial NFO scan is complete |
| initial_media_scan_completed | BOOLEAN | NOT NULL, DEFAULT FALSE | Whether initial media scan is complete |
| last_scan_timestamp | DATETIME | NULLABLE | Timestamp of last completed scan |
| created_at | DATETIME | NOT NULL, DEFAULT NOW | Record creation timestamp |
| updated_at | DATETIME | NOT NULL, ON UPDATE NOW | Last update timestamp |

Purpose:

This table tracks the initialization status of the application to ensure that expensive one-time setup operations (like scanning the entire anime directory) only run on the first startup, not on every restart.

  • Only one row exists in this table
  • The initial_scan_completed flag prevents redundant full directory scans on each startup
  • The NFO and media scan flags similarly track completion of those setup tasks

Source: src/server/database/models.py, src/server/database/system_settings_service.py
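
The single-row gating described above can be sketched with the standard-library sqlite3 module. This is an illustration under assumptions, not the project's system_settings_service: the function name run_initial_scan_once and the reduced table definition are hypothetical.

```python
import sqlite3


def run_initial_scan_once(conn, scan):
    """Run scan() only while initial_scan_completed is unset; return True if it ran."""
    row = conn.execute(
        "SELECT id, initial_scan_completed FROM system_settings LIMIT 1"
    ).fetchone()
    if row is None:
        # First startup: create the single settings row with default flags
        cur = conn.execute("INSERT INTO system_settings DEFAULT VALUES")
        row = (cur.lastrowid, 0)
    settings_id, completed = row
    if completed:
        return False
    scan()
    conn.execute(
        "UPDATE system_settings SET initial_scan_completed = 1, "
        "last_scan_timestamp = CURRENT_TIMESTAMP WHERE id = ?",
        (settings_id,),
    )
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE system_settings ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "initial_scan_completed BOOLEAN NOT NULL DEFAULT 0, "
    "last_scan_timestamp DATETIME)"
)
scans = []
first_run = run_initial_scan_once(conn, lambda: scans.append("full-scan"))
second_run = run_initial_scan_once(conn, lambda: scans.append("full-scan"))
```

The second call returns without scanning because the flag was persisted by the first one.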

3.2 anime_series

Stores anime series metadata. Corresponds to the core Serie class.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PRIMARY KEY, AUTOINCREMENT | Internal database ID |
| key | VARCHAR(255) | UNIQUE, NOT NULL, INDEX | Primary identifier - provider-assigned URL-safe key |
| name | VARCHAR(500) | NOT NULL, INDEX | Display name of the series |
| site | VARCHAR(500) | NOT NULL | Provider site URL |
| folder | VARCHAR(1000) | NOT NULL | Filesystem folder name (metadata only) |
| year | INTEGER | NULLABLE | Release year of the series |
| nfo_path | VARCHAR(1000) | NULLABLE | Path to tvshow.nfo metadata file |
| tmdb_id | INTEGER | NULLABLE, INDEX | TMDB (The Movie Database) ID for metadata |
| tvdb_id | INTEGER | NULLABLE, INDEX | TVDB (TheTVDB) ID for metadata |
| has_nfo | BOOLEAN | NOT NULL, DEFAULT FALSE | Whether tvshow.nfo exists |
| loading_status | VARCHAR(50) | NOT NULL, DEFAULT 'completed' | Status: pending, loading_episodes, loading_nfo, completed, failed |
| created_at | DATETIME | NOT NULL, DEFAULT NOW | Record creation timestamp |
| updated_at | DATETIME | NOT NULL, ON UPDATE NOW | Last update timestamp |

Identifier Convention:

  • key is the primary identifier for all operations (e.g., "attack-on-titan")
  • folder is metadata only for filesystem operations (e.g., "Attack on Titan (2013)")
  • id is used only for database relationships

EpisodeDict Mapping:

The episodeDict (season → episode numbers mapping) is stored as individual Episode records:

  • Each Episode has season and episode_number columns
  • Relationship: AnimeSeries.episodes returns all Episode records for that series

Source: src/server/database/models.py
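
As a rough illustration of the mapping above, the following sketch flattens an episodeDict into one record per episode and rebuilds it again. The function names and the plain-dict row shape are hypothetical; the real code works with Episode ORM objects.

```python
def episode_dict_to_rows(series_id, episode_dict):
    """Flatten {season: [episode numbers]} into one row dict per episode."""
    return [
        {"series_id": series_id, "season": season, "episode_number": number}
        for season, numbers in sorted(episode_dict.items())
        for number in sorted(numbers)
    ]


def rows_to_episode_dict(rows):
    """Rebuild {season: [episode numbers]} from per-episode rows."""
    result = {}
    for row in rows:
        result.setdefault(row["season"], []).append(row["episode_number"])
    return {season: sorted(numbers) for season, numbers in sorted(result.items())}
```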

3.3 episodes

Stores missing episodes that need to be downloaded. Episodes are automatically managed during scans:

  • New missing episodes are added to the database
  • Episodes that are no longer missing (files now exist) are removed from the database
  • When an episode is downloaded, it can be marked with is_downloaded=True or removed from tracking

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PRIMARY KEY, AUTOINCREMENT | Internal database ID |
| series_id | INTEGER | FOREIGN KEY, NOT NULL, INDEX | Reference to anime_series.id |
| season | INTEGER | NOT NULL | Season number (1-based) |
| episode_number | INTEGER | NOT NULL | Episode number within season |
| title | VARCHAR(500) | NULLABLE | Episode title if known |
| file_path | VARCHAR(1000) | NULLABLE | Local file path if downloaded |
| is_downloaded | BOOLEAN | NOT NULL, DEFAULT FALSE | Download status flag |
| created_at | DATETIME | NOT NULL, DEFAULT NOW | Record creation timestamp |
| updated_at | DATETIME | NOT NULL, ON UPDATE NOW | Last update timestamp |

Foreign Key:

  • series_id -> anime_series.id (ON DELETE CASCADE)

Source: src/server/database/models.py

3.4 download_queue_item

Stores download queue items with status tracking.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | VARCHAR(36) | PRIMARY KEY | UUID identifier |
| series_id | INTEGER | FOREIGN KEY, NOT NULL | Reference to anime_series.id |
| season | INTEGER | NOT NULL | Season number |
| episode | INTEGER | NOT NULL | Episode number |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'pending' | Download status |
| priority | VARCHAR(10) | NOT NULL, DEFAULT 'NORMAL' | Queue priority |
| progress_percent | FLOAT | NULLABLE | Download progress (0-100) |
| error_message | TEXT | NULLABLE | Error description if failed |
| retry_count | INTEGER | NOT NULL, DEFAULT 0 | Number of retry attempts |
| source_url | VARCHAR(2000) | NULLABLE | Download source URL |
| added_at | DATETIME | NOT NULL, DEFAULT NOW | When added to queue |
| started_at | DATETIME | NULLABLE | When download started |
| completed_at | DATETIME | NULLABLE | When download completed/failed |
| created_at | DATETIME | NOT NULL, DEFAULT NOW | Record creation timestamp |
| updated_at | DATETIME | NOT NULL, ON UPDATE NOW | Last update timestamp |

Status Values: pending, downloading, paused, completed, failed, cancelled

Priority Values: LOW, NORMAL, HIGH

Foreign Key:

  • series_id -> anime_series.id (ON DELETE CASCADE)

Source: src/server/database/models.py


4. Indexes

| Table | Index Name | Columns | Purpose |
|-------|------------|---------|---------|
| system_settings | N/A (single row) | N/A | Only one row, no indexes needed |
| anime_series | ix_anime_series_key | key | Fast lookup by primary identifier |
| anime_series | ix_anime_series_name | name | Search by name |
| episodes | ix_episodes_series_id | series_id | Join with series |
| download_queue_item | ix_download_series_id | series_id | Filter by series |
| download_queue_item | ix_download_status | status | Filter by status |

5. Model Layer

5.1 SQLAlchemy ORM Models

# src/server/database/models.py

class AnimeSeries(Base, TimestampMixin):
    __tablename__ = "anime_series"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    key: Mapped[str] = mapped_column(String(255), unique=True, index=True)
    name: Mapped[str] = mapped_column(String(500), index=True)
    site: Mapped[str] = mapped_column(String(500))
    folder: Mapped[str] = mapped_column(String(1000))

    episodes: Mapped[List["Episode"]] = relationship(
        "Episode", back_populates="series", cascade="all, delete-orphan"
    )

Source: src/server/database/models.py

5.2 Pydantic API Models

# src/server/models/download.py

class DownloadItem(BaseModel):
    id: str
    serie_id: str      # Maps to anime_series.key
    serie_folder: str  # Metadata only
    serie_name: str
    episode: EpisodeIdentifier
    status: DownloadStatus
    priority: DownloadPriority

Source: src/server/models/download.py

5.3 Model Mapping

| API Field | Database Column | Notes |
|-----------|-----------------|-------|
| serie_id | anime_series.key | Primary identifier |
| serie_folder | anime_series.folder | Metadata only |
| serie_name | anime_series.name | Display name |
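
The field mapping can be pictured as a small conversion function. The sketch below is illustrative only: SeriesRow is a hypothetical stand-in for the AnimeSeries ORM model, and to_api_fields is not a function from the codebase.

```python
from dataclasses import dataclass


@dataclass
class SeriesRow:
    """Hypothetical stand-in for the columns of an anime_series row."""
    key: str
    name: str
    folder: str


def to_api_fields(row):
    """Map database columns to the serie_* fields used by the API models."""
    return {
        "serie_id": row.key,         # anime_series.key is the primary identifier
        "serie_folder": row.folder,  # metadata only
        "serie_name": row.name,      # display name
    }
```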

6. Transaction Support

6.1 Overview

The database layer provides transaction support (a decorator, context managers, and savepoints) so that compound operations either commit together or roll back together. Any write operation can be wrapped in an explicit transaction.

Source: src/server/database/transaction.py

6.2 Transaction Utilities

| Component | Type | Description |
|-----------|------|-------------|
| @transactional | Decorator | Wraps function in transaction boundary |
| atomic() | Async context manager | Provides atomic operation block |
| atomic_sync() | Sync context manager | Sync version of atomic() |
| TransactionContext | Class | Explicit sync transaction control |
| AsyncTransactionContext | Class | Explicit async transaction control |
| TransactionManager | Class | Helper for manual transaction management |

6.3 Transaction Propagation Modes

| Mode | Behavior |
|------|----------|
| REQUIRED | Use existing transaction or create new (default) |
| REQUIRES_NEW | Always create new transaction |
| NESTED | Create savepoint within existing transaction |

6.4 Usage Examples

Using @transactional decorator:

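from sqlalchemy.ext.asyncio import AsyncSession

from src.server.database.service import AnimeSeriesService, EpisodeService
from src.server.database.transaction import transactional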

@transactional()
async def compound_operation(db: AsyncSession, data: dict):
    # All operations commit together or rollback on error
    series = await AnimeSeriesService.create(db, ...)
    episode = await EpisodeService.create(db, series_id=series.id, ...)
    return series, episode

Using atomic() context manager:

from sqlalchemy.ext.asyncio import AsyncSession

from src.server.database.transaction import atomic

async def some_function(db: AsyncSession):
    async with atomic(db) as tx:
        await operation1(db)
        await operation2(db)
        # Auto-commits on success, rolls back on exception

Using savepoints for partial rollback:

async with atomic(db) as tx:
    await outer_operation(db)

    async with tx.savepoint() as sp:
        await risky_operation(db)
        if error_condition:
            await sp.rollback()  # Only rollback nested ops

    await final_operation(db)  # Still executes

Source: src/server/database/transaction.py
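
To show what a savepoint does at the SQL level, here is a self-contained sketch using the standard-library sqlite3 module rather than the project's atomic() helper. The log table and the step names are made up for the illustration.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so that
# BEGIN / SAVEPOINT / COMMIT can be issued manually
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (step TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('outer')")

conn.execute("SAVEPOINT risky")
conn.execute("INSERT INTO log VALUES ('risky')")
conn.execute("ROLLBACK TO SAVEPOINT risky")  # undo only the nested work
conn.execute("RELEASE SAVEPOINT risky")

conn.execute("INSERT INTO log VALUES ('final')")  # outer transaction continues
conn.execute("COMMIT")

steps = [row[0] for row in conn.execute("SELECT step FROM log ORDER BY rowid")]
```

Only the work done inside the savepoint is discarded; the rows written before and after it are committed together.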

6.5 Connection Module Additions

| Function | Description |
|----------|-------------|
| get_transactional_session | Session without auto-commit for transactions |
| TransactionManager | Helper class for manual transaction control |
| is_session_in_transaction | Check if session is in active transaction |
| get_session_transaction_depth | Get nesting depth of transactions |

Source: src/server/database/connection.py


7. Repository Pattern

The QueueRepository class provides data access abstraction.

class QueueRepository:
    async def save_item(self, item: DownloadItem) -> None:
        """Save or update a download item (atomic operation)."""

    async def get_all_items(self) -> List[DownloadItem]:
        """Get all items from database."""

    async def delete_item(self, item_id: str) -> bool:
        """Delete item by ID."""

    async def clear_all(self) -> int:
        """Clear all items (atomic operation)."""

Note: Compound operations (save_item, clear_all) are wrapped in atomic() transactions.

Source: src/server/services/queue_repository.py
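
One benefit of the abstraction is that callers can be tested against an in-memory double with the same method names. The sketch below is such a double, not the real QueueRepository: QueueItem is a hypothetical, reduced stand-in for DownloadItem.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QueueItem:
    """Hypothetical, reduced stand-in for DownloadItem."""
    id: str
    status: str = "pending"


class InMemoryQueueRepository:
    """Dict-backed double exposing the same four methods as QueueRepository."""

    def __init__(self) -> None:
        self._items: Dict[str, QueueItem] = {}

    async def save_item(self, item: QueueItem) -> None:
        self._items[item.id] = item  # insert or overwrite by ID

    async def get_all_items(self) -> List[QueueItem]:
        return list(self._items.values())

    async def delete_item(self, item_id: str) -> bool:
        return self._items.pop(item_id, None) is not None

    async def clear_all(self) -> int:
        count = len(self._items)
        self._items.clear()
        return count


async def demo():
    repo = InMemoryQueueRepository()
    await repo.save_item(QueueItem("a"))
    await repo.save_item(QueueItem("a", status="downloading"))  # update, not duplicate
    await repo.save_item(QueueItem("b"))
    deleted_unknown = await repo.delete_item("missing")
    cleared = await repo.clear_all()
    return deleted_unknown, cleared


result = asyncio.run(demo())
```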


8. Database Service

The AnimeSeriesService provides async CRUD operations.

class AnimeSeriesService:
    @staticmethod
    async def create(
        db: AsyncSession,
        key: str,
        name: str,
        site: str,
        folder: str
    ) -> AnimeSeries:
        """Create a new anime series."""

    @staticmethod
    async def get_by_key(
        db: AsyncSession,
        key: str
    ) -> Optional[AnimeSeries]:
        """Get series by primary key identifier."""

Bulk Operations

Services provide bulk operations for transaction-safe batch processing:

| Service | Method | Description |
|---------|--------|-------------|
| EpisodeService | bulk_mark_downloaded | Mark multiple episodes at once |
| DownloadQueueService | bulk_delete | Delete multiple queue items |
| DownloadQueueService | clear_all | Clear entire queue |
| UserSessionService | rotate_session | Atomically revoke the old session and create a new one |
| UserSessionService | cleanup_expired | Bulk delete expired sessions |

Source: src/server/database/service.py


9. Data Integrity Rules

Validation Constraints

| Field | Rule | Error Message |
|-------|------|---------------|
| anime_series.key | Non-empty, max 255 chars | "Series key cannot be empty" |
| anime_series.name | Non-empty, max 500 chars | "Series name cannot be empty" |
| episodes.season | 0-1000 | "Season number must be non-negative" |
| episodes.episode_number | 0-10000 | "Episode number must be non-negative" |

Source: src/server/database/models.py
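
The rules in the table could be enforced by validators along these lines. This is a sketch, not the code in models.py: the function names are hypothetical, and the messages for the upper bounds are assumptions because the table only lists the empty and negative cases.

```python
def validate_series_key(key):
    """Enforce: non-empty, at most 255 characters."""
    if not key or not key.strip():
        raise ValueError("Series key cannot be empty")
    if len(key) > 255:
        # Message assumed; the documented table gives no text for this case
        raise ValueError("Series key must be at most 255 characters")
    return key


def validate_season(season):
    """Enforce: 0 <= season <= 1000."""
    if season < 0:
        raise ValueError("Season number must be non-negative")
    if season > 1000:
        # Message assumed; the documented table gives no text for this case
        raise ValueError("Season number must be at most 1000")
    return season
```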

Cascade Rules

  • Deleting an anime_series row deletes all related episodes and download_queue_item rows
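
One point worth keeping in mind with the default SQLite engine: SQLite only enforces foreign keys, including ON DELETE CASCADE, when PRAGMA foreign_keys is enabled on the connection. The sketch below demonstrates the cascade with the standard-library sqlite3 module and reduced table definitions; whether the project sets this pragma or relies on the ORM-level cascade is not stated in this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless enabled per connection
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE anime_series (
        id INTEGER PRIMARY KEY,
        key TEXT UNIQUE NOT NULL
    );
    CREATE TABLE episodes (
        id INTEGER PRIMARY KEY,
        series_id INTEGER NOT NULL
            REFERENCES anime_series(id) ON DELETE CASCADE,
        season INTEGER NOT NULL,
        episode_number INTEGER NOT NULL
    );
    """
)
conn.execute("INSERT INTO anime_series (id, key) VALUES (1, 'attack-on-titan')")
conn.executemany(
    "INSERT INTO episodes (series_id, season, episode_number) VALUES (1, 1, ?)",
    [(1,), (2,)],
)

# Deleting the parent row removes its episodes through the cascade
conn.execute("DELETE FROM anime_series WHERE id = 1")
remaining_episodes = conn.execute("SELECT COUNT(*) FROM episodes").fetchone()[0]
```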

10. Migration Strategy

Currently, SQLAlchemy's create_all() is used for schema creation.

# src/server/database/connection.py
async def init_db():
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)

For production migrations, Alembic is recommended but not yet implemented.

Source: src/server/database/connection.py


11. Common Query Patterns

Get all series with missing episodes

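from sqlalchemy import select
from sqlalchemy.orm import selectinload

result = await db.execute(
    select(AnimeSeries).options(selectinload(AnimeSeries.episodes))
)
for serie in result.scalars():
    # Episode rows that are not yet downloaded are the ones still missing
    missing = [e for e in serie.episodes if not e.is_downloaded]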

Get pending downloads ordered by priority

from sqlalchemy import case, select

items = await db.execute(
    select(DownloadQueueItem)
    .where(DownloadQueueItem.status == "pending")
    .order_by(
        case(
            (DownloadQueueItem.priority == "HIGH", 1),
            (DownloadQueueItem.priority == "NORMAL", 2),
            (DownloadQueueItem.priority == "LOW", 3),
        ),
        DownloadQueueItem.added_at
    )
)
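
The CASE expression is needed because priority is stored as text, and a plain alphabetical sort would order the values HIGH, LOW, NORMAL. The following self-contained sketch runs the equivalent SQL against an in-memory SQLite table using the standard-library sqlite3 module; the table is reduced to the columns involved and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE download_queue_item ("
    "id TEXT PRIMARY KEY, status TEXT NOT NULL, "
    "priority TEXT NOT NULL, added_at TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO download_queue_item VALUES (?, ?, ?, ?)",
    [
        ("a", "pending", "LOW", "2024-01-01 10:00:00"),
        ("b", "pending", "HIGH", "2024-01-01 12:00:00"),
        ("c", "pending", "NORMAL", "2024-01-01 11:00:00"),
        ("d", "completed", "HIGH", "2024-01-01 09:00:00"),
        ("e", "pending", "HIGH", "2024-01-01 11:30:00"),
    ],
)

# Rank priorities explicitly, then break ties by queue insertion time
ordered_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id FROM download_queue_item
        WHERE status = 'pending'
        ORDER BY CASE priority
                     WHEN 'HIGH' THEN 1
                     WHEN 'NORMAL' THEN 2
                     WHEN 'LOW' THEN 3
                 END,
                 added_at
        """
    )
]
```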

12. Series Storage: Database vs Files (Deprecated)

File-Based Storage (Deprecated since v2.0)

Prior to v2.0, series metadata was stored in two files per anime folder:

| File | Contents |
|------|----------|
| key | Series provider key (e.g., "attack-on-titan") |
| data | JSON serialization of Serie object |

File structure example:

/anime/Attack on Titan (2013)/
├── key          # Contains: attack-on-titan
├── data         # Contains: {"key": "...", "name": "...", "episodeDict": {...}}
├── Season 1/
│   └── ...

Database Storage (Current)

Since v2.0, all series metadata is stored in the anime_series table with Episode records for episode tracking. This provides:

  • ACID transactions for data consistency
  • Foreign key constraints (cascade delete)
  • Indexed queries for fast lookups
  • No filesystem dependency for metadata

Migration from Files to Database

The Serie.save_to_file() and Serie.load_from_file() methods are deprecated but still functional for backward compatibility during migration:

from src.core.entities.series import Serie

# Old file-based loading (deprecated)
serie = Serie.load_from_file("/anime/Attack on Titan (2013)/data")

# New database-based loading
from src.server.database.service import AnimeSeriesService
serie = await AnimeSeriesService.get_by_key(db, "attack-on-titan")

Removing File Dependencies

File-based storage can be removed once the following steps are complete:

  1. Schema verified: All Serie fields have corresponding DB columns
  2. Migration complete: All existing series migrated to database
  3. File cleanup: Remove key and data files (pending)

Note: The save_to_file() and load_from_file() methods will be removed in v3.0.0.


13. Series Persistence Flow

When a directory scan discovers or updates series, the scanner persists data to the database instead of writing to disk files.

Scan Flow

Scan Directory
    │
    ▼
Find MP4 Files → Extract Serie Key
    │
    ▼
Check DB for Existing Series (by key)
    │
    ├─── EXISTS ──────────────────────► Update Series Metadata
    │                                        │
    │                                        ▼
    │                                 Sync Episodes to DB
    │                                      │
    │◄──────────────────────────────────────┘
    │
    └─── NEW ───────────────────────────► Create New Series Record
                                             │
                                             ▼
                                      Create Episode Records
                                             │
                                             ▼
                                      Return to Scan Loop

Key Methods

SerieScanner._persist_serie_to_db()

  • Called after get_missing_episodes_and_season() computes episodeDict
  • Uses AnimeSeriesService.get_by_key() to check if series exists
  • If exists: calls AnimeSeriesService.update() + _sync_episodes_to_db()
  • If new: calls AnimeSeriesService.create() + creates episodes
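
The exists-or-create branch can be sketched with the standard-library sqlite3 module. This is an illustration of the lookup-by-key pattern only: persist_series is a hypothetical name, the table is reduced to two columns, and the real method goes through AnimeSeriesService.

```python
import sqlite3


def persist_series(conn, key, name):
    """Update the series if its key exists, otherwise insert it.

    Returns (series_id, created).
    """
    row = conn.execute(
        "SELECT id FROM anime_series WHERE key = ?", (key,)
    ).fetchone()
    if row is not None:
        conn.execute(
            "UPDATE anime_series SET name = ? WHERE id = ?", (name, row[0])
        )
        return row[0], False
    cur = conn.execute(
        "INSERT INTO anime_series (key, name) VALUES (?, ?)", (key, name)
    )
    return cur.lastrowid, True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE anime_series ("
    "id INTEGER PRIMARY KEY, key TEXT UNIQUE NOT NULL, name TEXT NOT NULL)"
)
first = persist_series(conn, "attack-on-titan", "Attack on Titan")
second = persist_series(conn, "attack-on-titan", "Attack on Titan (2013)")
stored_name = conn.execute("SELECT name FROM anime_series").fetchone()[0]
```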

SerieScanner._sync_episodes_to_db()

  • Gets existing episodes from DB via EpisodeService.get_by_series()
  • Compares with new episodeDict
  • Removes episodes no longer missing (unless is_downloaded=True)
  • Adds new missing episodes
  • Preserves is_downloaded=True episodes when removing missing ones

SerieList.add_to_db()

  • Used when adding a newly discovered series via the API
  • Creates filesystem folder + database record + episode records

Episode Sync Logic

# For each episode in DB but not in new episodeDict:
if episode.is_downloaded:
    # Keep - file exists, don't remove
    pass
else:
    # Remove - no longer missing
    EpisodeService.delete()

# For each episode in new episodeDict but not in DB:
# Add as new missing episode
EpisodeService.create(is_downloaded=False)
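
The same decision logic can be written as a pure function that only computes which records to delete and which to create. This is a sketch under assumptions: plan_episode_sync and its input shapes are hypothetical, and the real scanner operates on Episode records through EpisodeService.

```python
def plan_episode_sync(existing, missing):
    """Compute the changes needed to bring the DB in line with a new scan.

    existing: {(season, episode_number): is_downloaded} for rows already in the DB
    missing:  set of (season, episode_number) from the new episodeDict
    Returns (to_delete, to_create) as sorted lists of (season, episode_number).
    """
    to_delete = sorted(
        key
        for key, is_downloaded in existing.items()
        # No longer missing and not flagged as downloaded: safe to remove
        if key not in missing and not is_downloaded
    )
    to_create = sorted(key for key in missing if key not in existing)
    return to_delete, to_create
```

Separating the planning step from the database writes makes the rule that is_downloaded=True rows are preserved straightforward to unit test.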

Transaction Handling

  • DB operations use their own session with commit/rollback
  • If DB write fails, error is logged and scan continues
  • File-based save_to_file() no longer called during scan

Migration Path

  1. v2.x: Scanner writes to both DB (primary) and files (fallback)
  2. v3.0: Scanner writes only to DB, file methods removed

14. Series Persistence

Schema

AnimeSeries Table: Stores series metadata (key, name, site, folder, year)

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PRIMARY KEY | Auto-increment |
| key | VARCHAR(255) | UNIQUE, NOT NULL | Series provider key |
| name | VARCHAR(500) | NOT NULL | Display name |
| site | VARCHAR(500) | | Provider site URL |
| folder | VARCHAR(1000) | | Filesystem folder |

Episode Table: Stores per-episode metadata (season, episode_number, is_downloaded)

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PRIMARY KEY | Auto-increment |
| series_id | INTEGER | FOREIGN KEY → anime_series | Parent series |
| season | INTEGER | NOT NULL | Season number |
| episode_number | INTEGER | NOT NULL | Episode number |
| is_downloaded | BOOLEAN | DEFAULT FALSE | Download status |

Relationships

  • AnimeSeries.episodes → List of Episode objects (one-to-many)
  • Episode.series → Parent AnimeSeries (many-to-one)
  • Cascade delete: Deleting a series removes all its episodes

Queries

# Get all series with episodes
AnimeSeriesService.get_all(db, with_episodes=True)

# Get by provider key
AnimeSeriesService.get_by_key(db, key)

# Get by folder path
AnimeSeriesService.get_by_folder(db, folder)

15. Database Location

| Environment | Default Location |
|-------------|------------------|
| Development | ./data/aniworld.db |
| Production | Via DATABASE_URL environment variable |
| Testing | In-memory SQLite (sqlite+aiosqlite:///:memory:) |