Aniworld/docs/infrastructure.md
Lukas 684337fd0c Add data file to database sync functionality
- Add get_all_series_from_data_files() to SeriesApp
- Sync series from data files to DB on startup
- Add unit tests for new SeriesApp method
- Add integration tests for sync functionality
- Update documentation
2025-12-13 09:32:57 +01:00

441 lines
17 KiB
Markdown

# Aniworld Web Application Infrastructure
```bash
conda activate AniWorld
```
## Project Structure
```
src/
├── core/ # Core application logic
│ ├── SeriesApp.py # Main application class
│ ├── SerieScanner.py # Directory scanner
│ ├── entities/ # Domain entities (series.py, SerieList.py)
│ ├── interfaces/ # Abstract interfaces (providers.py, callbacks.py)
│ ├── providers/ # Content providers (aniworld, streaming)
│ └── exceptions/ # Custom exceptions
├── server/ # FastAPI web application
│ ├── fastapi_app.py # Main FastAPI application
│ ├── controllers/ # Route controllers (health, page, error)
│ ├── api/ # API routes (auth, config, anime, download, websocket)
│ ├── models/ # Pydantic models
│ ├── services/ # Business logic services
│ ├── database/ # SQLAlchemy ORM layer
│ ├── utils/ # Utilities (dependencies, templates, security)
│ └── web/ # Frontend (templates, static assets)
├── cli/ # CLI application
data/ # Config, database, queue state
logs/ # Application logs
tests/ # Test suites
```
## Technology Stack
| Layer | Technology |
| --------- | ---------------------------------------------- |
| Backend | FastAPI, Uvicorn, SQLAlchemy, SQLite, Pydantic |
| Frontend | HTML5, CSS3, Vanilla JS, Bootstrap 5, HTMX |
| Security | JWT (python-jose), bcrypt (passlib) |
| Real-time | Native WebSocket |
## Series Identifier Convention
Throughout the codebase, three identifiers are used for anime series:
| Identifier | Type | Purpose | Example |
| ---------- | --------------- | ----------------------------------------------------------- | -------------------------- |
| `key` | Unique, Indexed | **PRIMARY** - All lookups, API operations, WebSocket events | `"attack-on-titan"` |
| `folder` | String | Display/filesystem metadata only (never for lookups) | `"Attack on Titan (2013)"` |
| `id` | Primary Key | Internal database key for relationships | `1`, `42` |
### Key Format Requirements
- **Lowercase only**: No uppercase letters allowed
- **URL-safe**: Only alphanumeric characters and hyphens
- **Hyphen-separated**: Words separated by single hyphens
- **No leading/trailing hyphens**: Must start and end with alphanumeric
- **No consecutive hyphens**: `attack--titan` is invalid
**Valid examples**: `"attack-on-titan"`, `"one-piece"`, `"86-eighty-six"`, `"re-zero"`
**Invalid examples**: `"Attack On Titan"`, `"attack_on_titan"`, `"attack on titan"`
### Notes
- **Backward Compatibility**: API endpoints accepting `anime_id` will check `key` first, then fall back to `folder` lookup
- **New Code**: Always use `key` for identification; `folder` is metadata only
## API Endpoints
### Authentication (`/api/auth`)
- `POST /login` - Master password authentication (returns JWT)
- `POST /logout` - Invalidate session
- `GET /status` - Check authentication status
### Configuration (`/api/config`)
- `GET /` - Get configuration
- `PUT /` - Update configuration
- `POST /validate` - Validate without applying
- `GET /backups` - List backups
- `POST /backups/{name}/restore` - Restore backup
### Anime (`/api/anime`)
- `GET /` - List anime with missing episodes (returns `key` as identifier)
- `GET /{anime_id}` - Get anime details (accepts `key` or `folder` for backward compatibility)
- `POST /search` - Search for anime (returns `key` as identifier)
- `POST /add` - Add new series (extracts `key` from link URL)
- `POST /rescan` - Trigger library rescan
**Response Models:**
- `AnimeSummary`: `key` (primary identifier), `name`, `site`, `folder` (metadata), `missing_episodes`, `link`
- `AnimeDetail`: `key` (primary identifier), `title`, `folder` (metadata), `episodes`, `description`
### Download Queue (`/api/queue`)
- `GET /status` - Queue status and statistics
- `POST /add` - Add episodes to queue
- `DELETE /{item_id}` - Remove item
- `POST /start` | `/stop` | `/pause` | `/resume` - Queue control
- `POST /retry` - Retry failed downloads
- `DELETE /completed` - Clear completed items
**Request Models:**
- `DownloadRequest`: `serie_id` (key, primary identifier), `serie_folder` (filesystem path), `serie_name` (display), `episodes`, `priority`
**Response Models:**
- `DownloadItem`: `id`, `serie_id` (key), `serie_folder` (metadata), `serie_name`, `episode`, `status`, `progress`
- `QueueStatus`: `is_running`, `is_paused`, `active_downloads`, `pending_queue`, `completed_downloads`, `failed_downloads`
### WebSocket (`/ws/connect`)
Real-time updates for downloads, scans, and queue operations.
**Rooms**: `downloads`, `download_progress`, `scan_progress`
**Message Types**: `download_progress`, `download_complete`, `download_failed`, `queue_status`, `scan_progress`, `scan_complete`, `scan_failed`
**Series Identifier in Messages:**
All series-related WebSocket events include `key` as the primary identifier in their data payload:
```json
{
"type": "download_progress",
"timestamp": "2025-10-17T10:30:00.000Z",
"data": {
"download_id": "abc123",
"key": "attack-on-titan",
"folder": "Attack on Titan (2013)",
"percent": 45.2,
"speed_mbps": 2.5,
"eta_seconds": 180
}
}
```
## Database Models
| Model | Purpose |
| ----------------- | ---------------------------------------- |
| AnimeSeries | Series metadata (key, name, folder, etc) |
| Episode | Episodes linked to series |
| DownloadQueueItem | Queue items with status and progress |
| UserSession | JWT sessions with expiry |
**Mixins**: `TimestampMixin` (created_at, updated_at), `SoftDeleteMixin`
### AnimeSeries Identifier Fields
| Field | Type | Purpose |
| -------- | --------------- | ------------------------------------------------- |
| `id` | Primary Key | Internal database key for relationships |
| `key` | Unique, Indexed | **PRIMARY IDENTIFIER** for all lookups |
| `folder` | String | Filesystem metadata only (not for identification) |
**Database Service Methods:**
- `AnimeSeriesService.get_by_key(key)` - **Primary lookup method**
- `AnimeSeriesService.get_by_id(id)` - Internal lookup by database ID
- No `get_by_folder()` method exists - folder is never used for lookups
### DownloadQueueItem Fields
| Field | Type | Purpose |
| -------------- | ----------- | ----------------------------------------- |
| `id` | String (PK) | UUID for the queue item |
| `serie_id` | String | Series key for identification |
| `serie_folder` | String | Filesystem folder path |
| `serie_name` | String | Display name for the series |
| `season` | Integer | Season number |
| `episode` | Integer | Episode number |
| `status` | Enum | pending, downloading, completed, failed |
| `priority` | Enum | low, normal, high |
| `progress` | Float | Download progress percentage (0.0-100.0) |
| `error` | String | Error message if failed |
| `retry_count` | Integer | Number of retry attempts |
| `added_at` | DateTime | When item was added to queue |
| `started_at` | DateTime | When download started (nullable) |
| `completed_at` | DateTime | When download completed/failed (nullable) |
## Data Storage
### Storage Architecture
The application uses **SQLite database** as the primary storage for all application data.
| Data Type | Storage Location | Service |
| -------------- | ------------------ | --------------------------------------- |
| Anime Series | `data/aniworld.db` | `AnimeSeriesService` |
| Episodes | `data/aniworld.db` | `AnimeSeriesService` |
| Download Queue | `data/aniworld.db` | `DownloadService` via `QueueRepository` |
| User Sessions | `data/aniworld.db` | `AuthService` |
| Configuration | `data/config.json` | `ConfigService` |
### Download Queue Storage
The download queue is stored in SQLite via `QueueRepository`, which wraps `DownloadQueueService`:
```python
# QueueRepository provides async operations for queue items
repository = QueueRepository(session_factory)
# Save item to database
saved_item = await repository.save_item(download_item)
# Get pending items (ordered by priority and add time)
pending = await repository.get_pending_items()
# Update item status
await repository.update_status(item_id, DownloadStatus.COMPLETED)
# Update download progress
await repository.update_progress(item_id, progress=45.5, downloaded=450, total=1000, speed=2.5)
```
**Queue Persistence Features:**
- Queue state survives server restarts
- Items in `downloading` status are reset to `pending` on startup
- Failed items within retry limit are automatically re-queued
- Completed and failed history is preserved (with limits)
- Real-time progress updates are persisted to database
### Anime Series Database Storage
```python
# Add series to database
await AnimeSeriesService.create(db_session, series_data)
# Query series by key
series = await AnimeSeriesService.get_by_key(db_session, "attack-on-titan")
# Update series
await AnimeSeriesService.update(db_session, series_id, update_data)
```
### Legacy File Storage (Deprecated)
The legacy file-based storage is **deprecated** and will be removed in v3.0.0:
- `Serie.save_to_file()` - Deprecated, use `AnimeSeriesService.create()`
- `Serie.load_from_file()` - Deprecated, use `AnimeSeriesService.get_by_key()`
- `SerieList.add()` - Deprecated, use `SerieList.add_to_db()`
Deprecation warnings are raised when using these methods.
## Core Services
### SeriesApp (`src/core/SeriesApp.py`)
Main engine for anime series management with async support, progress callbacks, and cancellation.
**Key Methods:**
- `search(words)` - Search for anime series
- `download(serie_folder, season, episode, key, language)` - Download an episode
- `rescan()` - Rescan directory for missing episodes
- `get_all_series_from_data_files()` - Load all series from data files in the anime directory (used for database sync on startup)
### Data File to Database Sync
On application startup, the system automatically syncs series from data files to the database:
1. After `download_service.initialize()` succeeds
2. `SeriesApp.get_all_series_from_data_files()` loads all series from `data` files
3. Each series is added to the database via `SerieList.add_to_db()`
4. Existing series are skipped (no duplicates)
5. Sync continues silently even if individual series fail
This ensures that series metadata stored in filesystem data files is available in the database for the web application.
### Callback System (`src/core/interfaces/callbacks.py`)
- `ProgressCallback`, `ErrorCallback`, `CompletionCallback`
- Context classes include `key` + optional `folder` fields
- Thread-safe `CallbackManager` for multiple callback registration
### Services (`src/server/services/`)
| Service | Purpose |
| ---------------- | ----------------------------------------- |
| AnimeService | Series management, scans (uses SeriesApp) |
| DownloadService | Queue management, download execution |
| ScanService | Library scan operations with callbacks |
| ProgressService | Centralized progress tracking + WebSocket |
| WebSocketService | Real-time connection management |
| AuthService | JWT authentication, rate limiting |
| ConfigService | Configuration persistence with backups |
## Validation Utilities (`src/server/utils/validators.py`)
Provides data validation functions for ensuring data integrity across the application.
### Series Key Validation
- **`validate_series_key(key)`**: Validates key format (URL-safe, lowercase, hyphens only)
- Valid: `"attack-on-titan"`, `"one-piece"`, `"86-eighty-six"`
- Invalid: `"Attack On Titan"`, `"attack_on_titan"`, `"attack on titan"`
- **`validate_series_key_or_folder(identifier, allow_folder=True)`**: Backward-compatible validation
- Returns tuple `(identifier, is_key)` where `is_key` indicates if it's a valid key format
- Set `allow_folder=False` to require strict key format
### Other Validators
| Function | Purpose |
| --------------------------- | ------------------------------------------ |
| `validate_series_name` | Series display name validation |
| `validate_episode_range` | Episode range validation (1-1000) |
| `validate_download_quality` | Quality setting (360p-1080p, best, worst) |
| `validate_language` | Language codes (ger-sub, ger-dub, etc.) |
| `validate_anime_url` | Aniworld.to/s.to URL validation |
| `validate_backup_name` | Backup filename validation |
| `validate_config_data` | Configuration data structure validation |
| `sanitize_filename` | Sanitize filenames for safe filesystem use |
## Template Helpers (`src/server/utils/template_helpers.py`)
Provides utilities for template rendering and series data preparation.
### Core Functions
| Function | Purpose |
| -------------------------- | --------------------------------- |
| `get_base_context` | Base context for all templates |
| `render_template` | Render template with context |
| `validate_template_exists` | Check if template file exists |
| `list_available_templates` | List all available template files |
### Series Context Helpers
All series helpers use `key` as the primary identifier:
| Function | Purpose |
| ----------------------------------- | ---------------------------------------------- |
| `prepare_series_context` | Prepare series data for templates (uses `key`) |
| `get_series_by_key` | Find series by `key` (not `folder`) |
| `filter_series_by_missing_episodes` | Filter series with missing episodes |
**Example Usage:**
```python
from src.server.utils.template_helpers import prepare_series_context
series_data = [
{"key": "attack-on-titan", "name": "Attack on Titan", "folder": "Attack on Titan (2013)"},
{"key": "one-piece", "name": "One Piece", "folder": "One Piece (1999)"}
]
prepared = prepare_series_context(series_data, sort_by="name")
# Returns sorted list using 'key' as identifier
```
## Frontend
### Static Files
- CSS: `styles.css` (Fluent UI design), `ux_features.css` (accessibility)
- JS: `app.js`, `queue.js`, `websocket_client.js`, accessibility modules
### WebSocket Client
Native WebSocket wrapper with Socket.IO-compatible API:
```javascript
const socket = io();
socket.join("download_progress");
socket.on("download_progress", (data) => {
/* ... */
});
```
### Authentication
JWT tokens stored in localStorage, included as `Authorization: Bearer <token>`.
## Testing
```bash
# All tests
conda run -n AniWorld python -m pytest tests/ -v
# Unit tests only
conda run -n AniWorld python -m pytest tests/unit/ -v
# API tests
conda run -n AniWorld python -m pytest tests/api/ -v
```
## Production Notes
### Current (Single-Process)
- SQLite with WAL mode
- In-memory WebSocket connections
- File-based config and queue persistence
### Multi-Process Deployment
- Switch to PostgreSQL/MySQL
- Move WebSocket registry to Redis
- Use distributed locking for queue operations
- Consider Redis for session/cache storage
## Code Examples
### API Usage with Key Identifier
```python
# Fetching anime list - response includes 'key' as identifier
response = requests.get("/api/anime", headers={"Authorization": f"Bearer {token}"})
anime_list = response.json()
# Each item has: key="attack-on-titan", folder="Attack on Titan (2013)", ...
# Fetching specific anime by key (preferred)
response = requests.get("/api/anime/attack-on-titan", headers={"Authorization": f"Bearer {token}"})
# Adding to download queue using key
download_request = {
"serie_id": "attack-on-titan", # Use key, not folder
"serie_folder": "Attack on Titan (2013)", # Metadata for filesystem
"serie_name": "Attack on Titan",
"episodes": ["S01E01", "S01E02"],
"priority": 1
}
response = requests.post("/api/queue/add", json=download_request, headers=headers)
```
### WebSocket Event Handling
```javascript
// WebSocket events always include 'key' as identifier
socket.on("download_progress", (data) => {
const key = data.key; // Primary identifier: "attack-on-titan"
const folder = data.folder; // Metadata: "Attack on Titan (2013)"
updateProgressBar(key, data.percent);
});
```