Lukas 17754a86f0 Add database migration from legacy data files

- Create DataMigrationService for migrating data files to SQLite
- Add sync database methods to AnimeSeriesService
- Update SerieScanner to save to database with file fallback
- Update anime API endpoints to use database with fallback
- Add delete endpoint for anime series
- Add automatic migration on startup in fastapi_app.py lifespan
- Add 28 unit tests for migration service
- Add 14 integration tests for migration flow
- Update infrastructure.md and database README docs

Migration runs automatically on startup, legacy data files preserved.

2025-12-01 17:42:09 +01:00

Aniworld Web Application Infrastructure

conda activate AniWorld

Project Structure

src/
├── core/                  # Core application logic
│   ├── SeriesApp.py       # Main application class
│   ├── SerieScanner.py    # Directory scanner
│   ├── entities/          # Domain entities (series.py, SerieList.py)
│   ├── interfaces/        # Abstract interfaces (providers.py, callbacks.py)
│   ├── providers/         # Content providers (aniworld, streaming)
│   └── exceptions/        # Custom exceptions
├── server/                # FastAPI web application
│   ├── fastapi_app.py     # Main FastAPI application
│   ├── controllers/       # Route controllers (health, page, error)
│   ├── api/               # API routes (auth, config, anime, download, websocket)
│   ├── models/            # Pydantic models
│   ├── services/          # Business logic services
│   ├── database/          # SQLAlchemy ORM layer
│   ├── utils/             # Utilities (dependencies, templates, security)
│   └── web/               # Frontend (templates, static assets)
├── cli/                   # CLI application
data/                      # Config, database, queue state
logs/                      # Application logs
tests/                     # Test suites

Technology Stack

Layer	Technology
Backend	FastAPI, Uvicorn, SQLAlchemy, SQLite, Pydantic
Frontend	HTML5, CSS3, Vanilla JS, Bootstrap 5, HTMX
Security	JWT (python-jose), bcrypt (passlib)
Real-time	Native WebSocket

Series Identifier Convention

Throughout the codebase, three identifiers are used for anime series:

Identifier	Type	Purpose	Example
`key`	Unique, Indexed	PRIMARY - All lookups, API operations, WebSocket events	`"attack-on-titan"`
`folder`	String	Display/filesystem metadata only (never for lookups)	`"Attack on Titan (2013)"`
`id`	Primary Key	Internal database key for relationships	`1`, `42`

Key Format Requirements

Lowercase only: No uppercase letters allowed
URL-safe: Only alphanumeric characters and hyphens
Hyphen-separated: Words separated by single hyphens
No leading/trailing hyphens: Must start and end with alphanumeric
No consecutive hyphens: attack--titan is invalid

Valid examples: "attack-on-titan", "one-piece", "86-eighty-six", "re-zero"
Invalid examples: "Attack On Titan", "attack_on_titan", "attack on titan"
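The format rules above reduce to one regular expression: lowercase alphanumeric runs joined by single hyphens. The sketch below assumes a standalone helper named `is_valid_key`, which is hypothetical. The project's actual check is `validate_series_key`, and its implementation may differ.

```python
import re

# One or more lowercase alphanumeric "words" joined by single hyphens.
# This rejects uppercase, underscores, spaces, leading/trailing hyphens
# and consecutive hyphens in a single pass.
KEY_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_key(key: str) -> bool:
    """Return True if `key` satisfies the series key format rules."""
    return KEY_PATTERN.fullmatch(key) is not None
```

The sketch uses `re.fullmatch` rather than `^...$` because `$` also matches just before a trailing newline, which would let `"one-piece\n"` through.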

Migration Notes

Backward Compatibility: API endpoints accepting anime_id will check key first, then fall back to folder lookup
Deprecation: Folder-based lookups are deprecated and will be removed in a future version
New Code: Always use key for identification; folder is metadata only
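The sketch below shows the key-first, folder-second resolution order. It assumes the library is held in memory as a mapping from key to series dict. The function name `resolve_series`, the dict shape, and the use of `DeprecationWarning` are illustrative assumptions, not the project's endpoint code.

```python
import warnings
from typing import Dict, Optional


def resolve_series(anime_id: str, series_by_key: Dict[str, dict]) -> Optional[dict]:
    """Resolve an API identifier: exact key match first, then deprecated folder match."""
    # 1. Preferred path: `anime_id` is the series key.
    series = series_by_key.get(anime_id)
    if series is not None:
        return series

    # 2. Backward-compatible path: treat `anime_id` as a folder name.
    for candidate in series_by_key.values():
        if candidate.get("folder") == anime_id:
            warnings.warn(
                f"Folder-based lookup for {anime_id!r} is deprecated; "
                f"use key {candidate['key']!r} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return candidate
    return None
```

Emitting a warning on the fallback path makes any remaining folder-based callers visible before the fallback is removed.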

API Endpoints

Authentication (`/api/auth`)

POST /login - Master password authentication (returns JWT)
POST /logout - Invalidate session
GET /status - Check authentication status

Configuration (`/api/config`)

GET / - Get configuration
PUT / - Update configuration
POST /validate - Validate without applying
GET /backups - List backups
POST /backups/{name}/restore - Restore backup

Anime (`/api/anime`)

GET / - List anime with missing episodes (returns key as identifier)
GET /{anime_id} - Get anime details (accepts key or folder for backward compatibility)
POST /search - Search for anime (returns key as identifier)
POST /add - Add new series (extracts key from link URL)
POST /rescan - Trigger library rescan

Response Models:

AnimeSummary: key (primary identifier), name, site, folder (metadata), missing_episodes, link
AnimeDetail: key (primary identifier), title, folder (metadata), episodes, description

Download Queue (`/api/queue`)

GET /status - Queue status and statistics
POST /add - Add episodes to queue
DELETE /{item_id} - Remove item
POST /start | /stop | /pause | /resume - Queue control
POST /retry - Retry failed downloads
DELETE /completed - Clear completed items

Request Models:

DownloadRequest: serie_id (key, primary identifier), serie_folder (filesystem path), serie_name (display), episodes, priority

Response Models:

DownloadItem: id, serie_id (key), serie_folder (metadata), serie_name, episode, status, progress
QueueStatus: is_running, is_paused, active_downloads, pending_queue, completed_downloads, failed_downloads
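The sketch below keeps `serie_id` and `serie_folder` apart. It uses a standard-library dataclass as a dependency-free stand-in for the Pydantic model. Several details are assumptions, and the real model may validate differently:

- the class name `DownloadRequestSketch`
- the field types
- the rule that a folder-style `serie_id` raises `ValueError`

```python
import re
from dataclasses import dataclass
from typing import List

# Same key format as described under "Key Format Requirements".
_KEY_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


@dataclass
class DownloadRequestSketch:
    serie_id: str        # series key, e.g. "attack-on-titan"
    serie_folder: str    # filesystem folder (metadata only)
    serie_name: str      # display name
    episodes: List[str]  # e.g. ["S01E01", "S01E02"]
    priority: int

    def __post_init__(self) -> None:
        # Reject folder names passed where a key is expected.
        if not _KEY_RE.fullmatch(self.serie_id):
            raise ValueError(f"serie_id must be a series key, got {self.serie_id!r}")
```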

WebSocket (`/ws/connect`)

Real-time updates for downloads, scans, and queue operations.

Rooms: downloads, download_progress, scan_progress

Message Types: download_progress, download_complete, download_failed, queue_status, scan_progress, scan_complete, scan_failed

Series Identifier in Messages: All series-related WebSocket events include key as the primary identifier in their data payload:

{
    "type": "download_progress",
    "timestamp": "2025-10-17T10:30:00.000Z",
    "data": {
        "download_id": "abc123",
        "key": "attack-on-titan",
        "folder": "Attack on Titan (2013)",
        "percent": 45.2,
        "speed_mbps": 2.5,
        "eta_seconds": 180
    }
}
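The sketch below builds a payload in this shape on the server side, using only the standard library. `build_download_progress` is a hypothetical helper; the real events go through the progress and WebSocket services. Treating `folder` as optional is also an assumption, based on the callback contexts carrying an optional folder.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_download_progress(download_id: str, key: str, percent: float,
                            speed_mbps: float, eta_seconds: int,
                            folder: Optional[str] = None) -> str:
    """Serialize a download_progress event with `key` as the primary identifier."""
    data = {"download_id": download_id, "key": key}
    if folder is not None:
        data["folder"] = folder  # metadata only, never used for lookups
    data.update(percent=percent, speed_mbps=speed_mbps, eta_seconds=eta_seconds)
    # ISO-8601 UTC timestamp with millisecond precision and a "Z" suffix.
    timestamp = (datetime.now(timezone.utc)
                 .isoformat(timespec="milliseconds")
                 .replace("+00:00", "Z"))
    return json.dumps({"type": "download_progress", "timestamp": timestamp, "data": data})
```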

Database Models

Model	Purpose
AnimeSeries	Series metadata (key, name, folder, etc.)
Episode	Episodes linked to series
DownloadQueueItem	Queue items with status and progress
UserSession	JWT sessions with expiry

Mixins: TimestampMixin (created_at, updated_at), SoftDeleteMixin

AnimeSeries Identifier Fields

Field	Type	Purpose
`id`	Primary Key	Internal database key for relationships
`key`	Unique, Indexed	PRIMARY IDENTIFIER for all lookups
`folder`	String	Filesystem metadata only (not for identification)

Database Service Methods:

AnimeSeriesService.get_by_key(key) - Primary lookup method
AnimeSeriesService.get_by_id(id) - Internal lookup by database ID
AnimeSeriesService.get_all(db) - Get all series from database
AnimeSeriesService.create(db, key, name, site, folder, episode_dict) - Create new series
AnimeSeriesService.update(db, id, **kwargs) - Update existing series
AnimeSeriesService.delete(db, id) - Delete series by ID
AnimeSeriesService.upsert_sync(db, key, name, site, folder, episode_dict) - Sync upsert for scanner

No get_by_folder() method exists - folder is never used for lookups.
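The upsert-by-key semantics of `upsert_sync` can be sketched with the standard-library `sqlite3` module instead of SQLAlchemy. The table and column names, the JSON encoding of `episode_dict`, and the `upsert_by_key` helper are all hypothetical. The `ON CONFLICT ... DO UPDATE` clause needs SQLite 3.24 or newer.

```python
import json
import sqlite3

# Hypothetical minimal schema; the real AnimeSeries model has more columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS anime_series (
    id INTEGER PRIMARY KEY,
    key TEXT NOT NULL UNIQUE,
    name TEXT NOT NULL,
    site TEXT NOT NULL,
    folder TEXT NOT NULL,
    episode_dict TEXT NOT NULL
)
"""


def upsert_by_key(conn: sqlite3.Connection, key: str, name: str, site: str,
                  folder: str, episode_dict: dict) -> int:
    """Insert a series or update the existing row with the same key; return its id."""
    conn.execute(
        """
        INSERT INTO anime_series (key, name, site, folder, episode_dict)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(key) DO UPDATE SET
            name = excluded.name,
            site = excluded.site,
            folder = excluded.folder,
            episode_dict = excluded.episode_dict
        """,
        (key, name, site, folder, json.dumps(episode_dict)),
    )
    conn.commit()
    # Look the row up by key, never by folder.
    row = conn.execute("SELECT id FROM anime_series WHERE key = ?", (key,)).fetchone()
    return row[0]
```

Because the conflict target is the unique `key` column, rescanning the same series updates its row in place instead of creating a duplicate. The `folder` value can change without affecting the series' identity.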

Data Storage Migration

Background

The application has migrated from file-based storage to SQLite database storage for anime series metadata.

Previous Storage (Deprecated):

Individual data files (no extension) in each anime folder
Example: /anime-directory/Attack on Titan (2013)/data

Current Storage (Database):

SQLite database at data/aniworld.db
Managed by AnimeSeriesService using SQLAlchemy

Migration Service

The DataMigrationService handles automatic migration of legacy data files to the database:

from src.server.services.data_migration_service import DataMigrationService


# Hypothetical wrapper: `await` is only valid inside a coroutine
async def run_legacy_migration(anime_directory, db_session):
    service = DataMigrationService()

    # Check for legacy files
    files = await service.check_for_legacy_data_files(anime_directory)

    # Migrate all to database
    result = await service.migrate_all_legacy_data(anime_directory, db_session)
    print(result)  # Migration Result: 10 migrated, 2 skipped, 0 failed

    # Optional: clean up old files, keeping a backup
    await service.cleanup_migrated_files(files, backup=True)

Automatic Migration on Startup

Migration runs automatically during application startup:

Database is initialized (init_db())
Legacy data files are detected
Files are migrated to database
Results are logged (no files are deleted automatically)
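The same startup order can be expressed as a FastAPI-style lifespan context manager. To stay runnable without FastAPI, this sketch uses `contextlib.asynccontextmanager` and hypothetical stand-in coroutines that only record their call order. The real `fastapi_app.py` calls `init_db()` and `DataMigrationService`, and its error handling is not shown here.

```python
import asyncio
import logging
from contextlib import asynccontextmanager
from typing import List

logger = logging.getLogger(__name__)
events: List[str] = []  # records the startup order for illustration


async def init_db() -> None:  # stand-in for the real init_db()
    events.append("init_db")


async def detect_legacy_files(anime_directory: str) -> List[str]:  # hypothetical
    events.append("detect")
    return [f"{anime_directory}/Attack on Titan (2013)/data"]


async def migrate_legacy_files(files: List[str]) -> int:  # hypothetical
    events.append("migrate")
    return len(files)


@asynccontextmanager
async def lifespan(app, anime_directory: str = "/anime-directory"):
    await init_db()
    files = await detect_legacy_files(anime_directory)
    if files:
        migrated = await migrate_legacy_files(files)
        # Only log the outcome; legacy files are never deleted here.
        logger.info("Migrated %d legacy data files", migrated)
        events.append("log")
    yield  # the application serves requests while suspended here


async def main() -> None:
    async with lifespan(app=None):
        events.append("serving")


asyncio.run(main())
```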

Migration Result

from dataclasses import dataclass
from typing import List


@dataclass
class MigrationResult:
    total_found: int    # Total legacy files found
    migrated: int       # Successfully migrated
    failed: int         # Failed to migrate
    skipped: int        # Already in database
    errors: List[str]   # Error messages
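The sketch below classifies each legacy file into those counters. It assumes the files have already been parsed into dicts that carry a `key`; the real file format and parsing are not documented here. This variant adds defaults and a `__str__` modelled on the `print(result)` output in the Migration Service section. `tally_migration`, and the set standing in for the database, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MigrationResult:
    total_found: int = 0
    migrated: int = 0
    failed: int = 0
    skipped: int = 0
    errors: List[str] = field(default_factory=list)

    def __str__(self) -> str:
        # Assumed format, modelled on the documented example output.
        return (f"Migration Result: {self.migrated} migrated, "
                f"{self.skipped} skipped, {self.failed} failed")


def tally_migration(legacy_files: Dict[str, dict], existing_keys: Set[str]) -> MigrationResult:
    """Classify each parsed legacy file as migrated, skipped or failed."""
    result = MigrationResult(total_found=len(legacy_files))
    for path, payload in legacy_files.items():
        key = payload.get("key")
        if not key:
            result.failed += 1
            result.errors.append(f"{path}: missing key")
        elif key in existing_keys:
            result.skipped += 1       # already in database
        else:
            existing_keys.add(key)    # stand-in for the database insert
            result.migrated += 1
    return result
```

Running the tally twice against the same key set migrates nothing the second time. This matches the documented behavior of skipping series already in the database.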

Deprecation Notes

Legacy file-based storage is deprecated: do not create new data files
SerieScanner: Updated to save to database (with file fallback for CLI)
API endpoints: Now use database as primary source
CLI: Still uses file-based storage for backward compatibility

File	Purpose
`src/server/services/data_migration_service.py`	Migration service
`src/server/database/service.py`	Database CRUD operations
`src/server/database/models.py`	SQLAlchemy models
`src/core/SerieScanner.py`	Scanner with DB support

Core Services

SeriesApp (`src/core/SeriesApp.py`)

Main engine for anime series management with async support, progress callbacks, and cancellation.

Callback System (`src/core/interfaces/callbacks.py`)

ProgressCallback, ErrorCallback, CompletionCallback
Context classes include key + optional folder fields
Thread-safe CallbackManager for multiple callback registration
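The sketch below shows a minimal thread-safe callback registry with a key-carrying context. `SimpleCallbackManager`, its method names, and the `ProgressContext` fields are assumptions, not the signatures in `callbacks.py`.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ProgressContext:
    key: str                      # primary series identifier
    percent: float
    folder: Optional[str] = None  # optional metadata


ProgressFn = Callable[[ProgressContext], None]


class SimpleCallbackManager:
    """Register and fire progress callbacks safely from multiple threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._callbacks: List[ProgressFn] = []

    def register(self, callback: ProgressFn) -> None:
        with self._lock:
            self._callbacks.append(callback)

    def unregister(self, callback: ProgressFn) -> None:
        with self._lock:
            if callback in self._callbacks:
                self._callbacks.remove(callback)

    def notify(self, context: ProgressContext) -> None:
        # Copy under the lock, call outside it.
        with self._lock:
            callbacks = list(self._callbacks)
        for callback in callbacks:
            callback(context)
```

Callbacks are copied under the lock and invoked outside it. As a result, a callback that registers or unregisters another callback cannot deadlock the manager.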

Services (`src/server/services/`)

Service	Purpose
AnimeService	Series management, scans (uses SeriesApp)
DownloadService	Queue management, download execution
ScanService	Library scan operations with callbacks
ProgressService	Centralized progress tracking + WebSocket
WebSocketService	Real-time connection management
AuthService	JWT authentication, rate limiting
ConfigService	Configuration persistence with backups

Validation Utilities (`src/server/utils/validators.py`)

Provides data validation functions for ensuring data integrity across the application.

Series Key Validation

validate_series_key(key): Validates key format (lowercase alphanumeric words separated by single hyphens)
- Valid: "attack-on-titan", "one-piece", "86-eighty-six"
- Invalid: "Attack On Titan", "attack_on_titan", "attack on titan"
validate_series_key_or_folder(identifier, allow_folder=True): Backward-compatible validation
- Returns tuple (identifier, is_key) where is_key indicates if it's a valid key format
- Set allow_folder=False to require strict key format
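The sketch below illustrates the backward-compatible variant's contract. The name `validate_key_or_folder` is hypothetical. Raising `ValueError` for empty input, or for a folder name under `allow_folder=False`, is an assumed error policy.

```python
import re
from typing import Tuple

_KEY_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def validate_key_or_folder(identifier: str, allow_folder: bool = True) -> Tuple[str, bool]:
    """Return (identifier, is_key); reject non-key identifiers when allow_folder is False."""
    if not identifier:
        raise ValueError("identifier must not be empty")
    is_key = _KEY_RE.fullmatch(identifier) is not None
    if not is_key and not allow_folder:
        raise ValueError(f"{identifier!r} is not a valid series key")
    return identifier, is_key
```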

Other Validators

Function	Purpose
`validate_series_name`	Series display name validation
`validate_episode_range`	Episode range validation (1-1000)
`validate_download_quality`	Quality setting (360p-1080p, best, worst)
`validate_language`	Language codes (ger-sub, ger-dub, etc.)
`validate_anime_url`	Aniworld.to/s.to URL validation
`validate_backup_name`	Backup filename validation
`validate_config_data`	Configuration data structure validation
`sanitize_filename`	Sanitize filenames for safe filesystem use

Template Helpers (`src/server/utils/template_helpers.py`)

Provides utilities for template rendering and series data preparation.

Core Functions

Function	Purpose
`get_base_context`	Base context for all templates
`render_template`	Render template with context
`validate_template_exists`	Check if template file exists
`list_available_templates`	List all available template files

Series Context Helpers

All series helpers use key as the primary identifier:

Function	Purpose
`prepare_series_context`	Prepare series data for templates (uses `key`)
`get_series_by_key`	Find series by `key` (not `folder`)
`filter_series_by_missing_episodes`	Filter series with missing episodes

Example Usage:

from src.server.utils.template_helpers import prepare_series_context

series_data = [
    {"key": "attack-on-titan", "name": "Attack on Titan", "folder": "Attack on Titan (2013)"},
    {"key": "one-piece", "name": "One Piece", "folder": "One Piece (1999)"}
]
prepared = prepare_series_context(series_data, sort_by="name")
# Returns sorted list using 'key' as identifier
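The sketches below cover the other two helpers, using the same list-of-dicts shape. They assume `missing_episodes` maps a season to a list of episode numbers. The names `find_series_by_key` and `filter_missing` are hypothetical stand-ins for `get_series_by_key` and `filter_series_by_missing_episodes`, whose exact signatures are not documented here.

```python
from typing import List, Optional


def find_series_by_key(series: List[dict], key: str) -> Optional[dict]:
    """Return the first series whose `key` matches; `folder` is never consulted."""
    return next((item for item in series if item.get("key") == key), None)


def filter_missing(series: List[dict]) -> List[dict]:
    """Keep only series that report at least one missing episode in any season."""
    return [item for item in series if any(item.get("missing_episodes", {}).values())]
```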

Frontend

Static Files

CSS: styles.css (Fluent UI design), ux_features.css (accessibility)
JS: app.js, queue.js, websocket_client.js, accessibility modules

WebSocket Client

Native WebSocket wrapper with Socket.IO-compatible API:

const socket = io();
socket.join("download_progress");
socket.on("download_progress", (data) => {
    /* ... */
});

Authentication

JWT tokens are stored in localStorage and sent with each request as an Authorization: Bearer <token> header.

Testing

# All tests
conda run -n AniWorld python -m pytest tests/ -v

# Unit tests only
conda run -n AniWorld python -m pytest tests/unit/ -v

# API tests
conda run -n AniWorld python -m pytest tests/api/ -v

Production Notes

Current (Single-Process)

SQLite with WAL mode
In-memory WebSocket connections
File-based config and queue persistence
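Enabling WAL on a SQLite file can be sketched with the standard-library `sqlite3` module. The app itself goes through SQLAlchemy, which is not shown. The helper name `open_wal_connection` is hypothetical.

```python
import os
import sqlite3
import tempfile


def open_wal_connection(path: str) -> sqlite3.Connection:
    """Open a SQLite database and switch it to write-ahead logging."""
    conn = sqlite3.connect(path)
    # journal_mode returns the mode actually in effect; WAL requires a
    # file-backed database (":memory:" stays in "memory" mode).
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL, journal_mode is {mode!r}")
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = open_wal_connection(os.path.join(tmp, "aniworld.db"))
    print(conn.execute("PRAGMA journal_mode").fetchone()[0])  # wal
    conn.close()
```

WAL lets readers continue while a single writer commits. This suits one process serving concurrent requests.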

Multi-Process Deployment

Switch to PostgreSQL/MySQL
Move WebSocket registry to Redis
Use distributed locking for queue operations
Consider Redis for session/cache storage

Code Examples

API Usage with Key Identifier

import requests

BASE_URL = "http://localhost:8000"  # assumed Uvicorn default; adjust to your deployment
token = "<JWT returned by POST /api/auth/login>"
headers = {"Authorization": f"Bearer {token}"}

# Fetching anime list - response includes 'key' as identifier
response = requests.get(f"{BASE_URL}/api/anime", headers=headers)
anime_list = response.json()
# Each item has: key="attack-on-titan", folder="Attack on Titan (2013)", ...

# Fetching specific anime by key (preferred)
response = requests.get(f"{BASE_URL}/api/anime/attack-on-titan", headers=headers)

# Adding to download queue using key
download_request = {
    "serie_id": "attack-on-titan",  # Use key, not folder
    "serie_folder": "Attack on Titan (2013)",  # Metadata for filesystem
    "serie_name": "Attack on Titan",
    "episodes": ["S01E01", "S01E02"],
    "priority": 1
}
response = requests.post(f"{BASE_URL}/api/queue/add", json=download_request, headers=headers)

WebSocket Event Handling

// WebSocket events always include 'key' as identifier
socket.on("download_progress", (data) => {
    const key = data.key; // Primary identifier: "attack-on-titan"
    const folder = data.folder; // Metadata: "Attack on Titan (2013)"
    // updateProgressBar is an application-defined UI helper
    updateProgressBar(key, data.percent);
});
