# Aniworld Web Application Development Instructions This document provides detailed tasks for AI agents to implement a modern web application for the Aniworld anime download manager. All tasks should follow the coding guidelines specified in the project's copilot instructions. ## Project Overview The goal is to create a FastAPI-based web application that provides a modern interface for the existing Aniworld anime download functionality. The core anime logic should remain in `SeriesApp.py` while the web layer provides REST API endpoints and a responsive UI. ## Architecture Principles - **Single Responsibility**: Each file/class has one clear purpose - **Dependency Injection**: Use FastAPI's dependency system - **Clean Separation**: Web layer calls core logic, never the reverse - **File Size Limit**: Maximum 500 lines per file - **Type Hints**: Use comprehensive type annotations - **Error Handling**: Proper exception handling and logging ## Additional Implementation Guidelines ### Code Style and Standards - **Type Hints**: Use comprehensive type annotations throughout all modules - **Docstrings**: Follow PEP 257 for function and class documentation - **Error Handling**: Implement custom exception classes with meaningful messages - **Logging**: Use structured logging with appropriate log levels - **Security**: Validate all inputs and sanitize outputs - **Performance**: Use async/await patterns for I/O operations ## šŸ“ž Escalation If you encounter: - Architecture issues requiring design decisions - Tests that conflict with documented requirements - Breaking changes needed - Unclear requirements or expectations **Document the issue and escalate rather than guessing.** --- ## šŸ“š Helpful Commands ```bash # Run all tests conda run -n AniWorld python -m pytest tests/ -v --tb=short # Run specific test file conda run -n AniWorld python -m pytest tests/unit/test_websocket_service.py -v # Run specific test class conda run -n AniWorld python -m pytest tests/unit/test_websocket_service.py::TestWebSocketService -v # Run specific test conda run -n AniWorld python -m pytest tests/unit/test_websocket_service.py::TestWebSocketService::test_broadcast_download_progress -v # Run with extra verbosity conda run -n AniWorld python -m pytest tests/ -vv # Run with full traceback conda run -n AniWorld python -m pytest tests/ -v --tb=long # Run and stop at first failure conda run -n AniWorld python -m pytest tests/ -v -x # Run tests matching pattern conda run -n AniWorld python -m pytest tests/ -v -k "auth" # Show all print statements conda run -n AniWorld python -m pytest tests/ -v -s ``` --- ## šŸ“Š Detailed Analysis: The 7 Quality Criteria ### 5ļøāƒ£ No Shortcuts or Hacks Used **Global Variables (Temporary Storage)** - [ ] `src/server/fastapi_app.py` line 73 -> completed - `series_app: Optional[SeriesApp] = None` global storage - Should use FastAPI dependency injection instead - Problematic for testing and multiple instances **Logging Configuration Workarounds** -- [ ] `src/cli/Main.py` lines 12-22 -> reviewed (no manual handler removal found) - Manual logger handler removal is hacky - `for h in logging.root.handlers: logging.root.removeHandler(h)` is a hack - Should use proper logging configuration - Multiple loggers created with file handlers at odd paths (line 26) **Hardcoded Values** -- [ ] `src/core/providers/aniworld_provider.py` line 22 -> completed - `timeout = int(os.getenv("DOWNLOAD_TIMEOUT", 600))` at module level - Should be in settings class -- [ ] `src/core/providers/aniworld_provider.py` lines 38, 47 -> completed - User-Agent strings hardcoded - Provider list hardcoded - [ ] `src/cli/Main.py` line 227 - Network path hardcoded: `"\\\\sshfs.r\\ubuntu@192.168.178.43\\media\\serien\\Serien"` - Should be configuration **Exception Handling Shortcuts** - [ ] `src/core/providers/enhanced_provider.py` lines 410-421 - Bare `except Exception:` without specific types (line 418) -> reviewed - Multiple overlapping exception handlers (lines 410-425) -> reviewed - Should use specific exception hierarchy -> partially addressed (file removal and temp file cleanup now catch OSError; other broad catches intentionally wrap into RetryableError) - [ ] `src/server/api/anime.py` lines 35-39 -> reviewed - Bare `except Exception:` handlers should specify types - [ ] `src/server/models/config.py` line 93 - `except ValidationError: pass` - silently ignores error -> reviewed (validate() now collects and returns errors) **Type Casting Workarounds** - [ ] `src/server/api/download.py` line 52 -> reviewed - Complex `.model_dump(mode="json")` for serialization - Should use proper model serialization methods (kept for backward compatibility) - [ ] `src/server/utils/dependencies.py` line 36 - Type casting with `.get()` and defaults scattered throughout **Conditional Hacks** - [ ] `src/server/utils/dependencies.py` line 260 -> completed - `running_tests = "PYTEST_CURRENT_TEST" in os.environ or "pytest" in sys.modules` - Hacky test detection - should use proper test mode flag (now prefers ANIWORLD_TESTING env var) --- ### 6ļøāƒ£ Security Considerations Addressed #### Authentication & Authorization **Weak CORS Configuration** - [ ] `src/server/fastapi_app.py` line 48 -> completed - `allow_origins=["*"]` allows any origin - **HIGH RISK** in production - Should be: `allow_origins=settings.allowed_origins` (environment-based) - [ ] No CORS rate limiting by origin **Missing Authorization Checks** - [ ] `src/server/middleware/auth.py` lines 81-86 - Silent failure on missing auth for protected endpoints - Should consistently return 401 status - Some endpoints might bypass auth silently **In-Memory Session Storage** - [x] `src/server/services/auth_service.py` line 51 -> completed - In-memory `_failed` dict resets on restart - Documented limitation with warning comment - Should use Redis or database in production #### Input Validation **Unvalidated User Input** - [x] `src/cli/Main.py` line 80 -> completed (not found, likely already fixed) - User input for file paths not validated - Could allow path traversal attacks - [x] `src/core/SerieScanner.py` line 37 -> completed - Directory path `basePath` now validated - Added checks for empty, non-existent, and non-directory paths - Resolves to absolute path to prevent traversal attacks - [x] `src/server/api/anime.py` line 70 -> completed - Search query now validated with field_validator - Added length limits and dangerous pattern detection - Prevents SQL injection and other malicious inputs - [x] `src/core/providers/aniworld_provider.py` line 300+ -> completed - URL parameters now sanitized using quote() - Added validation for season/episode numbers - Key/slug parameters are URL-encoded before use **Missing Parameter Validation** - [x] `src/core/providers/enhanced_provider.py` line 280 -> completed - Season/episode validation now comprehensive - Added range checks (season: 1-999, episode: 1-9999) - Added key validation (non-empty check) - [ ] `src/server/database/models.py` - No length validation on string fields - No range validation on numeric fields #### Secrets and Credentials **Hardcoded Secrets** - [x] `src/config/settings.py` line 9 -> completed - JWT secret now uses `secrets.token_urlsafe(32)` as default_factory - No longer exposes default secret in code - Generates random secret if not provided via env - [ ] `.env` file might contain secrets (if exists) - Should be in .gitignore **Plaintext Password Storage** - [x] `src/config/settings.py` line 12 -> completed - Added prominent warning comment with emoji - Enhanced description to emphasize NEVER use in production - Clearly documents this is for development/testing only **Master Password Implementation** - [x] `src/server/services/auth_service.py` line 71 -> completed - Password requirements now comprehensive: - Minimum 8 characters - Mixed case (uppercase + lowercase) - At least one number - At least one special character - Enhanced error messages for better user guidance #### Data Protection **No Encryption of Sensitive Data** - [ ] Downloaded files not verified with checksums - [ ] No integrity checking of stored data - [ ] No encryption of sensitive config values **File Permission Issues** - [x] `src/core/providers/aniworld_provider.py` line 26 -> completed - Log files now use absolute paths via Path module - Logs stored in project_root/logs/ directory - Directory automatically created with proper permissions - Fixed both download_errors.log and no_key_found.log **Logging of Sensitive Data** - [ ] Check all `logger.debug()` calls for parameter logging - URLs might contain API keys - Search queries might contain sensitive terms - [ ] Example: `src/core/providers/enhanced_provider.py` line 260 - `logger.debug()` might log URLs with sensitive data #### Network Security **Unvalidated External Connections** - [ ] `src/core/providers/aniworld_provider.py` line 60 - HTTP retry configuration but no SSL verification flag check - [ ] `src/core/providers/enhanced_provider.py` line 115 - HTTP error codes 500-524 auto-retry without logging suspicious activity **Missing SSL/TLS Configuration** - [x] Verify SSL certificate validation enabled -> completed - Fixed all `verify=False` instances (4 total) - Changed to `verify=True` in: - doodstream.py (2 instances) - loadx.py (2 instances) - Added timeout parameters where missing - [x] Check for `verify=False` in requests calls -> completed - All requests now use SSL verification #### Database Security **No SQL Injection Protection** - [x] Check `src/server/database/service.py` for parameterized queries -> completed - All queries use SQLAlchemy query builder (select, update, delete) - No raw SQL or string concatenation found - Parameters properly passed through where() clauses - f-strings in LIKE clauses are safe (passed as parameter values) - [x] String interpolation in queries -> verified safe - No string interpolation directly in SQL queries - All user input is properly parameterized **No Database Access Control** - [ ] Single database user for all operations - [ ] No row-level security - [ ] No audit logging of data changes --- ### 7ļøāƒ£ Performance Validated #### Algorithmic Efficiency Issues **File Scanning Performance** - [ ] `src/core/SerieScanner.py` line 105+ - `__find_mp4_files()` - potential O(n²) complexity - Recursive directory traversal not profiled - No caching or incremental scanning - Large directories (>10K files) might cause timeout **Download Queue Processing** - [x] `src/server/services/download_service.py` line 240 -> completed - Optimized queue operations from O(n) to O(1) - Added helper dict `_pending_items_by_id` for fast lookups - Created helper methods: - `_add_to_pending_queue()` - maintains both deque and dict - `_remove_from_pending_queue()` - O(1) removal - Updated all append/remove operations to use helper methods - Tests passing āœ“ **Provider Search Performance** - [ ] `src/core/providers/enhanced_provider.py` line 220 - Multiple parsing strategies tried sequentially - Should fail fast on obvious errors instead of trying all 3 - No performance metrics logged **String Operations** - [ ] `src/cli/Main.py` line 118 - Nested `sum()` with comprehensions - O(n\*m) complexity - `total_episodes = sum(sum(len(ep) for ep in serie.episodeDict.values()) for serie in series)` - No streaming/generator pattern **Regular Expression Compilation** - [ ] `src/core/providers/streaming/doodstream.py` line 35 - Regex patterns compiled on every call - Should compile once at module level - Example: `r"\$\.get\('([^']*\/pass_md5\/[^']*)'"` compiled repeatedly #### Resource Usage Issues **Memory Leaks/Unbounded Growth** - [x] `src/server/middleware/auth.py` line 34 -> completed - Added \_cleanup_old_entries() method - Periodically removes rate limit entries older than 2x window - Cleanup runs every 5 minutes - Prevents unbounded memory growth from old IP addresses - [ ] `src/server/services/download_service.py` line 85-86 - `deque(maxlen=100)` and `deque(maxlen=50)` drop old items - Might lose important history **Connection Pool Configuration** - [ ] `src/server/database/connection.py` - Check if connection pooling is configured - No explicit pool size limits found - Could exhaust database connections **Large Data Structure Initialization** - [ ] `src/cli/Main.py` line 118 - Loading all series at once - Should use pagination for large datasets #### Caching Opportunities **No Request Caching** - [ ] `src/server/api/anime.py` - endpoints hit database every time - No caching headers set - `@cache` decorator could be used - [ ] `src/core/providers/enhanced_provider.py` - Search results not cached - Same search query hits network repeatedly **No Database Query Optimization** - [ ] `src/server/services/anime_service.py` - No eager loading (selectinload) for relationships - N+1 query problems likely - [ ] `src/server/database/service.py` line 200+ - Check for missing `.selectinload()` in queries #### Concurrent Request Handling **Thread Pool Sizing** - [ ] `src/server/services/download_service.py` line 85 - `ThreadPoolExecutor(max_workers=max_concurrent_downloads)` - Default is 2, should be configurable - No queue depth limits **Async/Sync Blocking Calls** - [ ] `src/server/api/anime.py` line 30+ - Series list operations might block - Database queries appear async (OK) - [ ] `src/server/services/auth_service.py` - Methods are synchronous but called from async endpoints - Should verify no blocking calls #### I/O Performance **Database Query Count** - [ ] `/api/v1/anime` endpoint - Likely makes multiple queries for each series - Should use single query with joins/eager loading - Test with N series to find N+1 issues **File I/O Optimization** - [ ] `src/core/SerieScanner.py` line 140+ - Each folder reads data file - Could batch reads or cache **Network Request Optimization** - [ ] `src/core/providers/enhanced_provider.py` line 115 - Retry strategy good - No connection pooling verification - Should check request timeout values #### Performance Metrics Missing - [ ] No performance monitoring for slow endpoints - [ ] No database query logging - [ ] No cache hit/miss metrics - [ ] No background task performance tracking - [ ] No file operation benchmarks --- ## šŸ“‹ Issues by File and Category ### Core Module Issues #### `src/cli/Main.py` - [ ] **Code Quality**: Class `SeriesApp` duplicates core `SeriesApp` from `src/core/SeriesApp.py` - Consider consolidating or using inheritance - Line 35: `_initialization_count` duplicated state tracking - [ ] **Type Hints**: `display_series()` doesn't validate if `serie.name` is `None` before using it - [ ] **Import Organization**: Imports not sorted (lines 1-11) - should follow isort convention - [ ] **Error Handling**: `NoKeyFoundException` and `MatchNotFoundError` are bare except classes - need proper inheritance - [ ] **Logging**: Logging configuration at module level should be in centralized config #### `src/core/SeriesApp.py` - [ ] **Global State**: Line 73 - `series_app: Optional[SeriesApp] = None` in `fastapi_app.py` uses global state - Should use dependency injection instead - [ ] **Complexity**: `Scan()` method is complex (80+ lines) - should be broken into smaller methods - [ ] **Error Context**: `_handle_error()` doesn't provide enough context about which operation failed #### `src/core/SerieScanner.py` - [ ] **Code Quality**: `is_null_or_whitespace()` duplicates Python's `str.isspace()` - use built-in instead - [ ] **Error Logging**: Lines 167-182 catch exceptions but only log, don't propagate context - [ ] **Performance**: `__find_mp4_files()` might be inefficient for large directories - add progress callback #### `src/core/providers/base_provider.py` #### `src/core/providers/aniworld_provider.py` - [ ] **Import Organization**: Lines 1-18 - imports not sorted (violates isort) - [ ] **Global State**: Lines 24-26 - Multiple logger instances created at module level - Should use centralized logging system - [ ] **Hardcoding**: Line 42 - User-Agent string hardcoded (also at line 47 for Firefox) - Extract to configuration constants - [ ] **Type Hints**: Missing type hints on: - `__init__()` method parameters (no return type on implicit constructor) - Class attribute type annotations (line 41-62) - [ ] **Magic Strings**: Line 38 - Hardcoded list of provider names should be enum - [ ] **Configuration**: Timeouts hardcoded at line 22 - should use settings #### `src/core/providers/enhanced_provider.py` - [ ] **Type Hints**: Class constructor `__init__()` missing type annotations (lines 40-96) - [ ] **Documentation**: Bare exception handlers at lines 418-419 - need specific exception types - [ ] **Code Quality**: `with_error_recovery` decorator imported but usage unclear - [ ] **Performance**: `_create_robust_session()` method not shown but likely creates multiple session objects #### `src/core/interfaces/providers.py` - [ ] Need to verify if any abstract methods lack type hints and docstrings #### `src/core/exceptions/Exceptions.py` - [ ] Need to verify custom exception hierarchy and documentation --- ### Server Module Issues #### `src/server/fastapi_app.py` - [ ] **Global State**: Line 73 - `series_app: Optional[SeriesApp] = None` stored globally - Use FastAPI dependency injection via `Depends()` - [ ] **CORS Configuration**: Line 48 - `allow_origins=["*"]` is production security issue - Add comment: "Configure appropriately for production" - Extract to settings with environment-based defaults - [ ] **Error Handling**: `startup_event()` at line 79 - missing try-except to handle initialization failures - [ ] **Type Hints**: `startup_event()` function missing type annotations - [ ] **Documentation**: `broadcast_callback()` function inside event handler should be extracted to separate function - [ ] **Logging**: No error logging if `settings.anime_directory` is None #### `src/server/middleware/auth.py` - [ ] **Performance**: In-memory rate limiter (line 34) will leak memory - never cleans up old entries - Need periodic cleanup or use Redis for production - [ ] **Security**: Line 46 - Rate limiting only 60-second window, should be configurable - [ ] **Type Hints**: `dispatch()` method parameters properly typed, but return type could be explicit - [ ] **Documentation**: `_get_client_ip()` method incomplete (line 94+ truncated) - [ ] **Error Handling**: Lines 81-86 - Silent failure if protected endpoint and no auth - Should return 401 consistently #### `src/server/services/auth_service.py` - [ ] **Documentation**: Line 68 - Comment says "For now we update only in-memory" indicates incomplete implementation - Create task to persist password hash to configuration file - [ ] **Type Hints**: `_verify_password()` at line 60 - no return type annotation (implicit `bool`) - [ ] **Security**: Line 71 - Minimum password length 8 characters, should be documented as security requirement - [ ] **State Management**: In-memory `_failed` dict (line 51) resets on process restart - Document this limitation and suggest Redis/database for production #### `src/server/database/service.py` - [ ] **Documentation**: Service layer methods need detailed docstrings explaining: - Database constraints - Transaction behavior - Cascade delete behavior - [ ] **Error Handling**: Methods don't specify which SQLAlchemy exceptions they might raise #### `src/server/database/models.py` - [ ] **Documentation**: Model relationships and cascade rules well-documented - āœ… Type hints present and comprehensive (well done) - [ ] **Validation**: No model-level validation before database insert - Consider adding validators for constraints #### `src/server/services/download_service.py` - [ ] **Performance**: Line 85 - `deque(maxlen=100)` for completed items - is this appropriate for long-running service? - [ ] **Thread Safety**: Uses `ThreadPoolExecutor` but thread-safety of queue operations not clear #### `src/server/utils/dependencies.py` - [ ] **TODO Comments**: Lines 223 and 233 - TODO comments for unimplemented features: - "TODO: Implement rate limiting logic" - "TODO: Implement request logging logic" - Create separate task items for these #### `src/server/utils/system.py` - [ ] **Exception Handling**: Line 255 - bare `pass` statement in exception handler - Should at least log the exception #### `src/server/api/anime.py` - [ ] **Error Handling**: Lines 35-39 - Multiple bare `except Exception` handlers - Need specific exception types and proper logging - [ ] **Code Quality**: Lines 32-36 - Complex property access with `getattr()` chains - Create helper function or model method to encapsulate --- ### Models and Pydantic Issues #### `src/server/models/config.py` - [ ] **Error Handling**: Line 93 - `ValidationError` caught but only silently passed? - Should log or re-raise with context --- ### Utility and Configuration Issues #### `src/config/settings.py` - [ ] **Security**: Line 12 - `master_password` field stored in environment during development - Add warning comment: "NEVER use this in production" - [ ] **Documentation**: Settings class needs comprehensive docstring explaining each field #### `src/infrastructure/logging/GlobalLogger.py` - [ ] Need to review logging configuration for consistency #### `src/server/utils/logging.py` - [ ] Need to review for type hints and consistency with global logging #### `src/server/utils/template_helpers.py` - [ ] Need to review for type hints and docstrings #### `src/server/utils/log_manager.py` - [ ] Need to review for type hints and error handling --- ## šŸ”’ Security Issues ### High Priority - [ ] **CORS Configuration** (`src/server/fastapi_app.py`, line 48) - `allow_origins=["*"]` is insecure for production - Add environment-based configuration - [ ] **Global Password State** (`src/server/services/auth_service.py`, line 51) - In-memory failure tracking resets on restart - Recommend using persistent storage (database/Redis) ### Medium Priority - [ ] **Rate Limiter Memory Leak** (`src/server/middleware/auth.py`, line 34) - Never cleans up old IP entries - Add periodic cleanup or use Redis - [ ] **Missing Authorization Checks** (`src/server/middleware/auth.py`, lines 81-86) - Some protected endpoints might silently allow unauthenticated access --- ## šŸ“Š Code Style Issues ### Documentation - Phase 1: Critical Sections - [ ] Document database transaction behavior in `src/server/database/service.py` ### Documentation - Phase 2: Endpoints - [ ] Expand docstrings on endpoints in `src/server/api/anime.py` - [ ] Add parameter descriptions to endpoint handlers - [ ] Document expected exceptions and error responses ### Code Quality - Phase 1: Consolidation - [ ] Investigate `SeriesApp` duplication between `src/cli/Main.py` and `src/core/SeriesApp.py` - [ ] Consider consolidating into single implementation - [ ] Update CLI to use core module instead of duplicate ### Code Quality - Phase 2: Exception Handling - [ ] Add specific exception types to bare `except:` handlers - [ ] Add logging to all exception handlers - [ ] Document exception context and causes - [ ] Review exception handling in `src/core/providers/enhanced_provider.py` (lines 410-421) ### Code Quality - Phase 3: Refactoring - [ ] Extract `broadcast_callback()` from `startup_event()` in `src/server/fastapi_app.py` - [ ] Break down complex `Scan()` method in `src/core/SerieScanner.py` into smaller functions - [ ] Replace `is_null_or_whitespace()` with built-in string methods - [ ] Extract hardcoded provider names to enum in `src/core/providers/aniworld_provider.py` ### Security - Phase 1: Critical Fixes - [ ] Make CORS configuration environment-based in `src/server/fastapi_app.py` - [ ] Add startup validation to ensure `anime_directory` is configured ### Security - Phase 2: Improvements - [ ] Implement Redis-based rate limiter instead of in-memory in `src/server/middleware/auth.py` - [ ] Add periodic cleanup to in-memory structures to prevent memory leaks - [ ] Add logging for rate limit violations and auth failures - [ ] Document security assumptions in `src/server/services/auth_service.py` ### Performance - Phase 1: Validation - [ ] Profile `SerieScanner.__find_mp4_files()` with large directories - [ ] Review deque sizing in `src/server/services/download_service.py` (lines 85-86) - [ ] Verify thread-safety of queue operations ### Performance - Phase 2: Optimization - [ ] Add pagination to anime list endpoint if dataset is large - [ ] Consider caching for search results in `src/core/providers/aniworld_provider.py` - [ ] Review session creation overhead in provider initialization ### Configuration Issues - [ ] Extract hardcoded timeouts from `src/core/providers/aniworld_provider.py` line 22 to settings - [ ] Extract User-Agent strings to configuration constants - [ ] Document all configuration options in settings module - [ ] Add validation for required environment variables ### Logging Issues - [ ] Centralize logger creation across all modules - [ ] Remove module-level logger instantiation where possible - [ ] Document logging levels expected for each component - [ ] Review `src/cli/Main.py` logging configuration (lines 12-22) - appears to suppress all logging ### Testing/Comments - [ ] Add inline comments explaining complex regex patterns in providers - [ ] Add comments explaining retry logic and backoff strategies - [ ] Document callback behavior and expected signatures - [ ] Add comments to clarify WebSocket broadcast mechanisms --- ## šŸ“Œ Implementation Notes ### Dependencies to Verify - [ ] `error_handler` module - currently missing, causing import error - [ ] All Pydantic models properly imported in service layers - [ ] SQLAlchemy session management properly scoped ### Configuration Management - [ ] Review `src/config/settings.py` for completeness - [ ] Ensure all configurable values are in settings, not hardcoded - [ ] Document all environment variables needed ### Testing Coverage - [ ] Verify tests cover exception paths in `src/server/api/anime.py` - [ ] Add tests for CORS configuration - [ ] Test rate limiting behavior in middleware - [ ] Test database transaction rollback scenarios --- ## šŸ”„ Validation Checklist Before Committing For each issue fixed: - [ ] Run Pylance to verify type hints are correct - [ ] Run `isort` on modified files to sort imports - [ ] Run `black` to format code to PEP8 standards - [ ] Run existing unit tests to ensure no regression - [ ] Verify no new security vulnerabilities introduced - [ ] Update docstrings if behavior changed - [ ] Document any breaking API changes --- **Total Issues Identified**: ~90 individual items across 8 categories **Priority Distribution**: 5 High | 15 Medium | 70 Low/Nice-to-have **Estimated Effort**: 40-60 hours for comprehensive quality improvement