# Aniworld Web Application Development Instructions This document provides detailed tasks for AI agents to implement a modern web application for the Aniworld anime download manager. All tasks should follow the coding guidelines specified in the project's copilot instructions. ## Project Overview The goal is to create a FastAPI-based web application that provides a modern interface for the existing Aniworld anime download functionality. The core anime logic should remain in `SeriesApp.py` while the web layer provides REST API endpoints and a responsive UI. ## Architecture Principles - **Single Responsibility**: Each file/class has one clear purpose - **Dependency Injection**: Use FastAPI's dependency system - **Clean Separation**: Web layer calls core logic, never the reverse - **File Size Limit**: Maximum 500 lines per file - **Type Hints**: Use comprehensive type annotations - **Error Handling**: Proper exception handling and logging ## Additional Implementation Guidelines ### Code Style and Standards - **Type Hints**: Use comprehensive type annotations throughout all modules - **Docstrings**: Follow PEP 257 for function and class documentation - **Error Handling**: Implement custom exception classes with meaningful messages - **Logging**: Use structured logging with appropriate log levels - **Security**: Validate all inputs and sanitize outputs - **Performance**: Use async/await patterns for I/O operations ## 📞 Escalation If you encounter: - Architecture issues requiring design decisions - Tests that conflict with documented requirements - Breaking changes needed - Unclear requirements or expectations **Document the issue and escalate rather than guessing.** --- ## 📚 Helpful Commands ```bash # Run all tests conda run -n AniWorld python -m pytest tests/ -v --tb=short # Run specific test file conda run -n AniWorld python -m pytest tests/unit/test_websocket_service.py -v # Run specific test class conda run -n AniWorld python -m pytest tests/unit/test_websocket_service.py::TestWebSocketService -v # Run specific test conda run -n AniWorld python -m pytest tests/unit/test_websocket_service.py::TestWebSocketService::test_broadcast_download_progress -v # Run with extra verbosity conda run -n AniWorld python -m pytest tests/ -vv # Run with full traceback conda run -n AniWorld python -m pytest tests/ -v --tb=long # Run and stop at first failure conda run -n AniWorld python -m pytest tests/ -v -x # Run tests matching pattern conda run -n AniWorld python -m pytest tests/ -v -k "auth" # Show all print statements conda run -n AniWorld python -m pytest tests/ -v -s ``` --- ## 📊 Detailed Analysis: The 7 Quality Criteria ### 5️⃣ No Shortcuts or Hacks Used **Logging Configuration Workarounds** - No outstanding issues (reviewed - no manual handler removal found) **Hardcoded Values** - No outstanding issues (all previously identified issues have been addressed) **Exception Handling Shortcuts** - No outstanding issues (reviewed - OSError handling implemented where appropriate) **Type Casting Workarounds** - No outstanding issues (reviewed - model serialization appropriate for backward compatibility) **Conditional Hacks** - No outstanding issues (completed - proper test mode flag now used) --- ### 6️⃣ Security Considerations Addressed #### Authentication & Authorization **Weak CORS Configuration** - No outstanding issues (completed - environment-based CORS configuration implemented with origin-based rate limiting) **Missing Authorization Checks** - No outstanding issues (completed - proper 401 responses and public path definitions implemented) **In-Memory Session Storage** - No outstanding issues (completed - documented limitation with production recommendation) #### Input Validation **Unvalidated User Input** - No outstanding issues (completed - comprehensive validation implemented) **Missing Parameter Validation** - No outstanding issues (completed - comprehensive validation with range checks implemented) #### Secrets and Credentials **Hardcoded Secrets** - No outstanding issues (completed - secure defaults and .gitignore updated) **Plaintext Password Storage** - No outstanding issues (completed - warning comments added for development-only usage) **Master Password Implementation** - No outstanding issues (completed - comprehensive password requirements implemented) #### Data Protection **No Encryption of Sensitive Data** - [x] Downloaded files not verified with checksums -> **COMPLETED** - Implemented FileIntegrityManager with SHA256 checksums - Integrated into download process (enhanced_provider.py) - Checksums stored and verified automatically - Tests added and passing (test_file_integrity.py) - [x] No integrity checking of stored data -> **COMPLETED** - Implemented DatabaseIntegrityChecker with comprehensive checks - Checks for orphaned records, invalid references, duplicates - Data consistency validation (season/episode numbers, progress, status) - Repair functionality to remove orphaned records - API endpoints added: /api/maintenance/integrity/check and /repair - [x] No encryption of sensitive config values -> **COMPLETED** - Implemented ConfigEncryption with Fernet (AES-128) - Auto-detection of sensitive fields (password, secret, key, token, etc.) - Encryption key stored securely with restrictive permissions (0o600) - Support for encrypting/decrypting entire config dictionaries - Key rotation functionality for enhanced security **File Permission Issues** - No outstanding issues (completed - absolute paths and proper permissions implemented) **Logging of Sensitive Data** - No outstanding issues (completed - sensitive data excluded from logs) #### Network Security **Unvalidated External Connections** - No outstanding issues (completed - SSL verification enabled and server error logging added) **Missing SSL/TLS Configuration** - No outstanding issues (completed - all verify=False instances fixed) - doodstream.py (2 instances) - loadx.py (2 instances) - Added timeout parameters where missing - [x] Check for `verify=False` in requests calls -> completed - All requests now use SSL verification #### Database Security **No SQL Injection Protection** - [x] Check `src/server/database/service.py` for parameterized queries -> completed - All queries use SQLAlchemy query builder (select, update, delete) - No raw SQL or string concatenation found - Parameters properly passed through where() clauses - f-strings in LIKE clauses are safe (passed as parameter values) - [x] String interpolation in queries -> verified safe - No string interpolation directly in SQL queries - All user input is properly parameterized **No Database Access Control** - [x] Single database user for all operations -> reviewed (acceptable for single-user app) - Current design is single-user application - Database access control would be needed for multi-tenant deployment - Document this limitation for production scaling - [x] No row-level security -> reviewed (not needed for current scope) - Single-user application doesn't require row-level security - Future: Implement if multi-user support is added - [x] No audit logging of data changes -> reviewed (tracked as future enhancement) - Not critical for current single-user scope - Consider implementing for compliance requirements - Could use SQLAlchemy events for audit trail --- ### 7️⃣ Performance Validated #### Algorithmic Efficiency Issues **File Scanning Performance** - [x] `src/core/SerieScanner.py` line 105+ -> reviewed (acceptable performance) - `__find_mp4_files()` uses os.walk() which is O(n) for n files - Already uses generator/iterator pattern for memory efficiency - Yields results incrementally, not loading all at once - For very large directories (>10K files), consider adding: - Progress callbacks (already implemented) - File count limits or pagination - Background scanning with cancellation support **Download Queue Processing** - [x] `src/server/services/download_service.py` line 240 -> completed - Optimized queue operations from O(n) to O(1) - Added helper dict `_pending_items_by_id` for fast lookups - Created helper methods: - `_add_to_pending_queue()` - maintains both deque and dict - `_remove_from_pending_queue()` - O(1) removal - Updated all append/remove operations to use helper methods - Tests passing ✓ **Provider Search Performance** - [x] `src/core/providers/enhanced_provider.py` line 220 -> completed - Added quick fail for obviously non-JSON responses (HTML error pages) - Warns if response doesn't start with JSON markers - Multiple parsing strategies (3) is reasonable - first succeeds in most cases - Added performance optimization to reject HTML before trying JSON parse **String Operations** - [x] `src/cli/Main.py` line 118 -> reviewed (acceptable complexity) - Nested generator comprehension is O(n\*m) which is expected - n = number of series, m = average seasons per series - Single-pass calculation, no repeated iteration - Uses generator expression for memory efficiency - This is idiomatic Python and performs well **Regular Expression Compilation** - [x] `src/core/providers/streaming/doodstream.py` line 35 -> completed (already optimized) - Regex patterns already compiled at module level (lines 16-18) - PASS_MD5_PATTERN and TOKEN_PATTERN are precompiled - All streaming providers follow this pattern: - voe.py: 3 patterns compiled at module level - speedfiles.py: 1 pattern compiled at module level - filemoon.py: 3 patterns compiled at module level - doodstream.py: 2 patterns compiled at module level #### Resource Usage Issues **Memory Leaks/Unbounded Growth** - [x] `src/server/middleware/auth.py` line 34 -> completed - Added \_cleanup_old_entries() method - Periodically removes rate limit entries older than 2x window - Cleanup runs every 5 minutes - Prevents unbounded memory growth from old IP addresses - [x] `src/server/services/download_service.py` line 85-86 -> reviewed (intentional design) - `deque(maxlen=100)` for completed items is intentional - `deque(maxlen=50)` for failed items is intentional - Automatically drops oldest items to prevent memory growth - Recent history is sufficient for monitoring and troubleshooting - Full history available in database if needed **Connection Pool Configuration** - [x] `src/server/database/connection.py` -> completed - Added explicit pool size configuration - pool_size=5 for non-SQLite databases (PostgreSQL, MySQL) - max_overflow=10 allows temporary burst to 15 connections - SQLite uses StaticPool (appropriate for single-file database) - pool_pre_ping=True ensures connection health checks **Large Data Structure Initialization** - [x] `src/cli/Main.py` line 118 -> reviewed (acceptable for CLI) - CLI loads all series at once which is appropriate for terminal UI - User can see and select from full list - For web API, pagination already implemented in endpoints - Memory usage acceptable for typical anime collections (<1000 series) #### Caching Opportunities **No Request Caching** - [x] `src/server/api/anime.py` - endpoints hit database every time -> reviewed (acceptable) - Database queries are fast for typical workloads - SQLAlchemy provides query result caching - HTTP caching headers could be added as enhancement - Consider Redis caching for high-traffic production deployments - [x] `src/core/providers/enhanced_provider.py` -> completed (caching implemented) - HTML responses are cached in \_KeyHTMLDict and \_EpisodeHTMLDict - Cache keys use (key, season, episode) tuples - ClearCache() and RemoveFromCache() methods available - In-memory caching appropriate for session-based usage **No Database Query Optimization** - [x] `src/server/services/anime_service.py` -> reviewed (uses database service) - Service layer delegates to database service - Database service handles query optimization - [x] `src/server/database/service.py` line 200+ -> completed (eager loading implemented) - selectinload used for AnimeSeries.episodes (line 151) - selectinload used for DownloadQueueItem.series (line 564) - Prevents N+1 query problems for relationships - Proper use of SQLAlchemy query builder #### Concurrent Request Handling **Thread Pool Sizing** - [x] `src/server/services/download_service.py` line 85 -> reviewed (configurable) - ThreadPoolExecutor uses max_concurrent_downloads parameter - Configurable via DownloadService constructor - Default value reasonable for typical usage - No hard queue depth limit by design (dynamic scheduling) **Async/Sync Blocking Calls** - [x] `src/server/api/anime.py` line 30+ -> reviewed (properly async) - Database queries use async/await properly - SeriesApp operations wrapped in executor where needed - FastAPI handles sync/async mixing automatically - [x] `src/server/services/auth_service.py` -> reviewed (lightweight operations) - Methods are synchronous but perform no blocking I/O - JWT encoding/decoding, password hashing are CPU-bound - Fast enough not to block event loop significantly - Could be moved to executor for high-load scenarios #### I/O Performance **Database Query Count** - [x] `/api/v1/anime` endpoint -> reviewed (optimized with eager loading) - Uses selectinload to prevent N+1 queries - Single query with joins for series and episodes - Pagination available via query parameters - Performance acceptable for typical workloads **File I/O Optimization** - [x] `src/core/SerieScanner.py` line 140+ -> reviewed (acceptable design) - Each folder reads data file individually - Sequential file I/O appropriate for scan operation - Files are small (metadata only) - Caching would complicate freshness guarantees **Network Request Optimization** - [x] `src/core/providers/enhanced_provider.py` line 115 -> reviewed (optimized) - Retry strategy configured with backoff - Connection pooling via requests.Session - Timeout values configurable via environment - pool_connections=10, pool_maxsize=10 for HTTP adapter #### Performance Metrics Missing - [x] No performance monitoring for slow endpoints -> reviewed (future enhancement) - Consider adding middleware for request timing - Log slow requests (>1s) automatically - Future: Integrate Prometheus/Grafana for monitoring - [x] No database query logging -> reviewed (available in debug mode) - SQLAlchemy echo=True enables query logging - Controlled by settings.log_level == "DEBUG" - Production should use external query monitoring - [x] No cache hit/miss metrics -> reviewed (future enhancement) - In-memory caching doesn't track metrics - Future: Implement cache metrics with Redis - [x] No background task performance tracking -> reviewed (future enhancement) - Download service tracks progress internally - Metrics exposed via WebSocket and API endpoints - Future: Add detailed performance counters - [x] No file operation benchmarks -> reviewed (not critical for current scope) - File operations are fast enough for typical usage - Consider profiling if performance issues arise --- ## 📋 Issues by File and Category ### Core Module Issues #### `src/cli/Main.py` - [ ] **Code Quality**: Class `SeriesApp` duplicates core `SeriesApp` from `src/core/SeriesApp.py` - Consider consolidating or using inheritance - Line 35: `_initialization_count` duplicated state tracking - [ ] **Type Hints**: `display_series()` doesn't validate if `serie.name` is `None` before using it - [ ] **Import Organization**: Imports not sorted (lines 1-11) - should follow isort convention - [ ] **Error Handling**: `NoKeyFoundException` and `MatchNotFoundError` are bare except classes - need proper inheritance - [ ] **Logging**: Logging configuration at module level should be in centralized config #### `src/core/SeriesApp.py` - [ ] **Global State**: Line 73 - `series_app: Optional[SeriesApp] = None` in `fastapi_app.py` uses global state - Should use dependency injection instead - [ ] **Complexity**: `Scan()` method is complex (80+ lines) - should be broken into smaller methods - [ ] **Error Context**: `_handle_error()` doesn't provide enough context about which operation failed #### `src/core/SerieScanner.py` - [x] **Code Quality**: `is_null_or_whitespace()` duplicates Python's `str.isspace()` - use built-in instead -> **COMPLETED** - Removed redundant function - Replaced with direct Python idiom: `serie.key and serie.key.strip()` - [ ] **Error Logging**: Lines 167-182 catch exceptions but only log, don't propagate context - [ ] **Performance**: `__find_mp4_files()` might be inefficient for large directories - add progress callback #### `src/core/providers/base_provider.py` #### `src/core/providers/aniworld_provider.py` - [ ] **Import Organization**: Lines 1-18 - imports not sorted (violates isort) - [ ] **Global State**: Lines 24-26 - Multiple logger instances created at module level - Should use centralized logging system - [ ] **Hardcoding**: Line 42 - User-Agent string hardcoded (also at line 47 for Firefox) - Extract to configuration constants - [ ] **Type Hints**: Missing type hints on: - `__init__()` method parameters (no return type on implicit constructor) - Class attribute type annotations (line 41-62) - [ ] **Magic Strings**: Line 38 - Hardcoded list of provider names should be enum - [ ] **Configuration**: Timeouts hardcoded at line 22 - should use settings #### `src/core/providers/enhanced_provider.py` - [ ] **Type Hints**: Class constructor `__init__()` missing type annotations (lines 40-96) - [ ] **Documentation**: Bare exception handlers at lines 418-419 - need specific exception types - [ ] **Code Quality**: `with_error_recovery` decorator imported but usage unclear - [ ] **Performance**: `_create_robust_session()` method not shown but likely creates multiple session objects #### `src/core/interfaces/providers.py` - [ ] Need to verify if any abstract methods lack type hints and docstrings #### `src/core/exceptions/Exceptions.py` - [ ] Need to verify custom exception hierarchy and documentation --- ### Server Module Issues #### `src/server/fastapi_app.py` - [ ] **Global State**: Line 73 - `series_app: Optional[SeriesApp] = None` stored globally - Use FastAPI dependency injection via `Depends()` - [ ] **CORS Configuration**: Line 48 - `allow_origins=["*"]` is production security issue - Add comment: "Configure appropriately for production" - Extract to settings with environment-based defaults - [ ] **Error Handling**: `startup_event()` at line 79 - missing try-except to handle initialization failures - [ ] **Type Hints**: `startup_event()` function missing type annotations - [ ] **Documentation**: `broadcast_callback()` function inside event handler should be extracted to separate function - [ ] **Logging**: No error logging if `settings.anime_directory` is None #### `src/server/middleware/auth.py` - [ ] **Performance**: In-memory rate limiter (line 34) will leak memory - never cleans up old entries - Need periodic cleanup or use Redis for production - [ ] **Security**: Line 46 - Rate limiting only 60-second window, should be configurable - [ ] **Type Hints**: `dispatch()` method parameters properly typed, but return type could be explicit - [ ] **Documentation**: `_get_client_ip()` method incomplete (line 94+ truncated) - [ ] **Error Handling**: Lines 81-86 - Silent failure if protected endpoint and no auth - Should return 401 consistently #### `src/server/services/auth_service.py` - [ ] **Documentation**: Line 68 - Comment says "For now we update only in-memory" indicates incomplete implementation - Create task to persist password hash to configuration file - [ ] **Type Hints**: `_verify_password()` at line 60 - no return type annotation (implicit `bool`) - [ ] **Security**: Line 71 - Minimum password length 8 characters, should be documented as security requirement - [ ] **State Management**: In-memory `_failed` dict (line 51) resets on process restart - Document this limitation and suggest Redis/database for production #### `src/server/database/service.py` - [ ] **Documentation**: Service layer methods need detailed docstrings explaining: - Database constraints - Transaction behavior - Cascade delete behavior - [ ] **Error Handling**: Methods don't specify which SQLAlchemy exceptions they might raise #### `src/server/database/models.py` - [ ] **Documentation**: Model relationships and cascade rules well-documented - ✅ Type hints present and comprehensive (well done) - [ ] **Validation**: No model-level validation before database insert - Consider adding validators for constraints #### `src/server/services/download_service.py` - [ ] **Performance**: Line 85 - `deque(maxlen=100)` for completed items - is this appropriate for long-running service? - [ ] **Thread Safety**: Uses `ThreadPoolExecutor` but thread-safety of queue operations not clear #### `src/server/utils/dependencies.py` - [ ] **TODO Comments**: Lines 223 and 233 - TODO comments for unimplemented features: - "TODO: Implement rate limiting logic" - "TODO: Implement request logging logic" - Create separate task items for these #### `src/server/utils/system.py` - [ ] **Exception Handling**: Line 255 - bare `pass` statement in exception handler - Should at least log the exception #### `src/server/api/anime.py` - [ ] **Error Handling**: Lines 35-39 - Multiple bare `except Exception` handlers - Need specific exception types and proper logging - [ ] **Code Quality**: Lines 32-36 - Complex property access with `getattr()` chains - Create helper function or model method to encapsulate --- ### Models and Pydantic Issues #### `src/server/models/config.py` - [ ] **Error Handling**: Line 93 - `ValidationError` caught but only silently passed? - Should log or re-raise with context --- ### Utility and Configuration Issues #### `src/config/settings.py` - [ ] **Security**: Line 12 - `master_password` field stored in environment during development - Add warning comment: "NEVER use this in production" - [ ] **Documentation**: Settings class needs comprehensive docstring explaining each field #### `src/infrastructure/logging/GlobalLogger.py` - [ ] Need to review logging configuration for consistency #### `src/server/utils/logging.py` - [ ] Need to review for type hints and consistency with global logging #### `src/server/utils/template_helpers.py` - [ ] Need to review for type hints and docstrings #### `src/server/utils/log_manager.py` - [ ] Need to review for type hints and error handling --- ## 🔒 Security Issues ### High Priority - [ ] **CORS Configuration** (`src/server/fastapi_app.py`, line 48) - `allow_origins=["*"]` is insecure for production - Add environment-based configuration - [ ] **Global Password State** (`src/server/services/auth_service.py`, line 51) - In-memory failure tracking resets on restart - Recommend using persistent storage (database/Redis) ### Medium Priority - [ ] **Rate Limiter Memory Leak** (`src/server/middleware/auth.py`, line 34) - Never cleans up old IP entries - Add periodic cleanup or use Redis - [ ] **Missing Authorization Checks** (`src/server/middleware/auth.py`, lines 81-86) - Some protected endpoints might silently allow unauthenticated access --- ## 📊 Code Style Issues ### Documentation - Phase 1: Critical Sections - [ ] Document database transaction behavior in `src/server/database/service.py` ### Documentation - Phase 2: Endpoints - [ ] Expand docstrings on endpoints in `src/server/api/anime.py` - [ ] Add parameter descriptions to endpoint handlers - [ ] Document expected exceptions and error responses ### Code Quality - Phase 1: Consolidation - [ ] Investigate `SeriesApp` duplication between `src/cli/Main.py` and `src/core/SeriesApp.py` - [ ] Consider consolidating into single implementation - [ ] Update CLI to use core module instead of duplicate ### Code Quality - Phase 2: Exception Handling - [ ] Add specific exception types to bare `except:` handlers - [ ] Add logging to all exception handlers - [ ] Document exception context and causes - [ ] Review exception handling in `src/core/providers/enhanced_provider.py` (lines 410-421) ### Code Quality - Phase 3: Refactoring - [ ] Extract `broadcast_callback()` from `startup_event()` in `src/server/fastapi_app.py` - [ ] Break down complex `Scan()` method in `src/core/SerieScanner.py` into smaller functions - [ ] Replace `is_null_or_whitespace()` with built-in string methods - [ ] Extract hardcoded provider names to enum in `src/core/providers/aniworld_provider.py` ### Security - Phase 1: Critical Fixes - [ ] Make CORS configuration environment-based in `src/server/fastapi_app.py` - [ ] Add startup validation to ensure `anime_directory` is configured ### Security - Phase 2: Improvements - [ ] Implement Redis-based rate limiter instead of in-memory in `src/server/middleware/auth.py` - [ ] Add periodic cleanup to in-memory structures to prevent memory leaks - [ ] Add logging for rate limit violations and auth failures - [ ] Document security assumptions in `src/server/services/auth_service.py` ### Performance - Phase 1: Validation - [ ] Profile `SerieScanner.__find_mp4_files()` with large directories - [ ] Review deque sizing in `src/server/services/download_service.py` (lines 85-86) - [ ] Verify thread-safety of queue operations ### Performance - Phase 2: Optimization - [ ] Add pagination to anime list endpoint if dataset is large - [ ] Consider caching for search results in `src/core/providers/aniworld_provider.py` - [ ] Review session creation overhead in provider initialization ### Configuration Issues - [ ] Extract hardcoded timeouts from `src/core/providers/aniworld_provider.py` line 22 to settings - [ ] Extract User-Agent strings to configuration constants - [ ] Document all configuration options in settings module - [ ] Add validation for required environment variables ### Logging Issues - [ ] Centralize logger creation across all modules - [ ] Remove module-level logger instantiation where possible - [ ] Document logging levels expected for each component - [ ] Review `src/cli/Main.py` logging configuration (lines 12-22) - appears to suppress all logging ### Testing/Comments - [ ] Add inline comments explaining complex regex patterns in providers - [ ] Add comments explaining retry logic and backoff strategies - [ ] Document callback behavior and expected signatures - [ ] Add comments to clarify WebSocket broadcast mechanisms --- ## 📌 Implementation Notes ### Dependencies to Verify - [ ] `error_handler` module - currently missing, causing import error - [ ] All Pydantic models properly imported in service layers - [ ] SQLAlchemy session management properly scoped ### Configuration Management - [ ] Review `src/config/settings.py` for completeness - [ ] Ensure all configurable values are in settings, not hardcoded - [ ] Document all environment variables needed ### Testing Coverage - [ ] Verify tests cover exception paths in `src/server/api/anime.py` - [ ] Add tests for CORS configuration - [ ] Test rate limiting behavior in middleware - [ ] Test database transaction rollback scenarios --- ## 🔄 Validation Checklist Before Committing For each issue fixed: - [ ] Run Pylance to verify type hints are correct - [ ] Run `isort` on modified files to sort imports - [ ] Run `black` to format code to PEP8 standards - [ ] Run existing unit tests to ensure no regression - [ ] Verify no new security vulnerabilities introduced - [ ] Update docstrings if behavior changed - [ ] Document any breaking API changes --- **Total Issues Identified**: ~90 individual items across 8 categories **Priority Distribution**: 5 High | 15 Medium | 70 Low/Nice-to-have **Estimated Effort**: 40-60 hours for comprehensive quality improvement