Aniworld Web Application Development Instructions
This document provides detailed tasks for AI agents to implement a modern web application for the Aniworld anime download manager. All tasks should follow the coding guidelines specified in the project's copilot instructions.
Project Overview
The goal is to create a FastAPI-based web application that provides a modern interface for the existing Aniworld anime download functionality. The core anime logic should remain in SeriesApp.py while the web layer provides REST API endpoints and a responsive UI.
Architecture Principles
- Single Responsibility: Each file/class has one clear purpose
- Dependency Injection: Use FastAPI's dependency system
- Clean Separation: Web layer calls core logic, never the reverse
- File Size Limit: Maximum 500 lines per file
- Type Hints: Use comprehensive type annotations
- Error Handling: Proper exception handling and logging
Additional Implementation Guidelines
Code Style and Standards
- Type Hints: Use comprehensive type annotations throughout all modules
- Docstrings: Follow PEP 257 for function and class documentation
- Error Handling: Implement custom exception classes with meaningful messages
- Logging: Use structured logging with appropriate log levels
- Security: Validate all inputs and sanitize outputs
- Performance: Use async/await patterns for I/O operations
📞 Escalation
If you encounter:
- Architecture issues requiring design decisions
- Tests that conflict with documented requirements
- Breaking changes needed
- Unclear requirements or expectations
Document the issue and escalate rather than guessing.
📚 Helpful Commands
# Run all tests
conda run -n AniWorld python -m pytest tests/ -v --tb=short
# Run specific test file
conda run -n AniWorld python -m pytest tests/unit/test_websocket_service.py -v
# Run specific test class
conda run -n AniWorld python -m pytest tests/unit/test_websocket_service.py::TestWebSocketService -v
# Run specific test
conda run -n AniWorld python -m pytest tests/unit/test_websocket_service.py::TestWebSocketService::test_broadcast_download_progress -v
# Run with extra verbosity
conda run -n AniWorld python -m pytest tests/ -vv
# Run with full traceback
conda run -n AniWorld python -m pytest tests/ -v --tb=long
# Run and stop at first failure
conda run -n AniWorld python -m pytest tests/ -v -x
# Run tests matching pattern
conda run -n AniWorld python -m pytest tests/ -v -k "auth"
# Show all print statements
conda run -n AniWorld python -m pytest tests/ -v -s
📊 Detailed Analysis: The 7 Quality Criteria
5️⃣ No Shortcuts or Hacks Used
Logging Configuration Workarounds
- No outstanding issues (reviewed - no manual handler removal found)
Hardcoded Values
- No outstanding issues (all previously identified issues have been addressed)
Exception Handling Shortcuts
- No outstanding issues (reviewed - OSError handling implemented where appropriate)
Type Casting Workarounds
- No outstanding issues (reviewed - model serialization appropriate for backward compatibility)
Conditional Hacks
- No outstanding issues (completed - proper test mode flag now used)
6️⃣ Security Considerations Addressed
Authentication & Authorization
Weak CORS Configuration
- No outstanding issues (completed - environment-based CORS configuration implemented with origin-based rate limiting)
Missing Authorization Checks
- No outstanding issues (completed - proper 401 responses and public path definitions implemented)
In-Memory Session Storage
- No outstanding issues (completed - documented limitation with production recommendation)
Input Validation
Unvalidated User Input
- No outstanding issues (completed - comprehensive validation implemented)
Missing Parameter Validation
- No outstanding issues (completed - comprehensive validation with range checks implemented)
Secrets and Credentials
Hardcoded Secrets
- No outstanding issues (completed - secure defaults and .gitignore updated)
Plaintext Password Storage
- No outstanding issues (completed - warning comments added for development-only usage)
Master Password Implementation
- No outstanding issues (completed - comprehensive password requirements implemented)
Data Protection
No Encryption of Sensitive Data
- Downloaded files not verified with checksums -> COMPLETED
- Implemented FileIntegrityManager with SHA256 checksums
- Integrated into download process (enhanced_provider.py)
- Checksums stored and verified automatically
- Tests added and passing (test_file_integrity.py)
- No integrity checking of stored data -> COMPLETED
- Implemented DatabaseIntegrityChecker with comprehensive checks
- Checks for orphaned records, invalid references, duplicates
- Data consistency validation (season/episode numbers, progress, status)
- Repair functionality to remove orphaned records
- API endpoints added: /api/maintenance/integrity/check and /repair
- No encryption of sensitive config values -> COMPLETED
- Implemented ConfigEncryption with Fernet (AES-128)
- Auto-detection of sensitive fields (password, secret, key, token, etc.)
- Encryption key stored securely with restrictive permissions (0o600)
- Support for encrypting/decrypting entire config dictionaries
- Key rotation functionality for enhanced security
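The checksum step listed above can be illustrated with a minimal sketch. It assumes nothing about the real FileIntegrityManager API, which this document does not show; the function names `sha256_of_file` and `verify_file` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Chunked reads keep memory usage flat for large video files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: Path, expected_hex: str) -> bool:
    """Compare a previously stored checksum with the file's current digest."""
    return sha256_of_file(path) == expected_hex.lower()
```

A caller would store the digest once after a download completes and call `verify_file()` before trusting the file again.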
File Permission Issues
- No outstanding issues (completed - absolute paths and proper permissions implemented)
Logging of Sensitive Data
- No outstanding issues (completed - sensitive data excluded from logs)
Network Security
Unvalidated External Connections
- No outstanding issues (completed - SSL verification enabled and server error logging added)
Missing SSL/TLS Configuration
- No outstanding issues (completed - all verify=False instances fixed)
- doodstream.py (2 instances)
- loadx.py (2 instances)
- Added timeout parameters where missing
- Check for verify=False in requests calls -> completed
- All requests now use SSL verification
Database Security
No SQL Injection Protection
- Check src/server/database/service.py for parameterized queries -> completed
- All queries use SQLAlchemy query builder (select, update, delete)
- No raw SQL or string concatenation found
- Parameters properly passed through where() clauses
- f-strings in LIKE clauses are safe (passed as parameter values)
- String interpolation in queries -> verified safe
- No string interpolation directly in SQL queries
- All user input is properly parameterized
No Database Access Control
- Single database user for all operations -> reviewed (acceptable for single-user app)
- Current design is single-user application
- Database access control would be needed for multi-tenant deployment
- Document this limitation for production scaling
- No row-level security -> reviewed (not needed for current scope)
- Single-user application doesn't require row-level security
- Future: Implement if multi-user support is added
- No audit logging of data changes -> reviewed (tracked as future enhancement)
- Not critical for current single-user scope
- Consider implementing for compliance requirements
- Could use SQLAlchemy events for audit trail
7️⃣ Performance Validated
Algorithmic Efficiency Issues
File Scanning Performance
- src/core/SerieScanner.py line 105+ -> reviewed (acceptable performance)
- __find_mp4_files() uses os.walk() which is O(n) for n files
- Already uses generator/iterator pattern for memory efficiency
- Yields results incrementally, not loading all at once
- For very large directories (>10K files), consider adding:
- Progress callbacks (already implemented)
- File count limits or pagination
- Background scanning with cancellation support
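A generator-based scan with a progress callback and an optional file limit, as discussed above, might look like the following sketch. The function name `find_mp4_files` and its parameters are illustrative assumptions; the real `__find_mp4_files()` is not shown in this document.

```python
import os
from typing import Callable, Iterator, Optional


def find_mp4_files(
    root: str,
    on_progress: Optional[Callable[[int], None]] = None,
    max_files: Optional[int] = None,
) -> Iterator[str]:
    """Yield .mp4 paths under root one at a time instead of building a list."""
    found = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".mp4"):
                continue
            found += 1
            if on_progress is not None:
                on_progress(found)  # report the running match count
            yield os.path.join(dirpath, name)
            # Hypothetical limit: stop once max_files matches were yielded.
            if max_files is not None and found >= max_files:
                return
```

Because the function is a generator, a caller can also stop iterating early, which gives a simple form of cancellation.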
Download Queue Processing
- src/server/services/download_service.py line 240 -> completed
- Optimized queue operations from O(n) to O(1)
- Added helper dict _pending_items_by_id for fast lookups
- Created helper methods:
- _add_to_pending_queue() - maintains both deque and dict
- _remove_from_pending_queue() - O(1) removal
- Updated all append/remove operations to use helper methods
- Tests passing ✓
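One way to pair a deque with an id-indexed dict so that lookup and removal stay O(1) is sketched below. This is an assumption about the general technique, not the actual download_service.py code: the class name `PendingQueue` is hypothetical, and removal here is "lazy" (the id stays in the deque and is skipped later), because `deque.remove()` itself is O(n).

```python
from collections import deque
from typing import Deque, Dict, Optional


class PendingQueue:
    """FIFO queue with O(1) lookup/removal by id (assumes ids are never re-added)."""

    def __init__(self) -> None:
        self._order: Deque[str] = deque()        # FIFO order of item ids
        self._items_by_id: Dict[str, dict] = {}  # id -> item, for fast lookups

    def add(self, item_id: str, item: dict) -> None:
        self._order.append(item_id)
        self._items_by_id[item_id] = item

    def remove(self, item_id: str) -> Optional[dict]:
        # Only the dict entry is dropped; the stale id is skipped in pop_next().
        return self._items_by_id.pop(item_id, None)

    def pop_next(self) -> Optional[dict]:
        while self._order:
            item_id = self._order.popleft()
            item = self._items_by_id.pop(item_id, None)
            if item is not None:  # None means the id was removed earlier
                return item
        return None

    def __len__(self) -> int:
        return len(self._items_by_id)
```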
Provider Search Performance
- src/core/providers/enhanced_provider.py line 220 -> completed
- Added quick fail for obviously non-JSON responses (HTML error pages)
- Warns if response doesn't start with JSON markers
- Multiple parsing strategies (3) is reasonable - first succeeds in most cases
- Added performance optimization to reject HTML before trying JSON parse
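The quick-fail idea above can be sketched as follows. The helper name `parse_json_or_none` and the logger name are hypothetical, and the real provider reportedly uses several parsing strategies that are not reproduced here.

```python
import json
import logging
from typing import Any, Optional

logger = logging.getLogger("aniworld.provider")  # hypothetical logger name


def parse_json_or_none(body: str) -> Optional[Any]:
    """Reject bodies that cannot be JSON before paying for a full parse."""
    stripped = body.lstrip()
    if not stripped or stripped[0] not in "{[":
        # Typically an HTML error page; warn and skip the parse attempt.
        logger.warning("Response does not start with a JSON marker")
        return None
    try:
        return json.loads(stripped)
    except json.JSONDecodeError:
        return None
```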
String Operations
- src/cli/Main.py line 118 -> reviewed (acceptable complexity)
- Nested generator comprehension is O(n*m) which is expected
- n = number of series, m = average seasons per series
- Single-pass calculation, no repeated iteration
- Uses generator expression for memory efficiency
- This is idiomatic Python and performs well
Regular Expression Compilation
- src/core/providers/streaming/doodstream.py line 35 -> completed (already optimized)
- Regex patterns already compiled at module level (lines 16-18)
- PASS_MD5_PATTERN and TOKEN_PATTERN are precompiled
- All streaming providers follow this pattern:
- voe.py: 3 patterns compiled at module level
- speedfiles.py: 1 pattern compiled at module level
- filemoon.py: 3 patterns compiled at module level
- doodstream.py: 2 patterns compiled at module level
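For readers unfamiliar with the pattern, module-level compilation looks roughly like this. The regular expression below is a made-up placeholder: the real PASS_MD5_PATTERN is not shown in this document, and `extract_pass_md5` is a hypothetical helper.

```python
import re
from typing import Optional

# Compiled once at import time and reused by every call.
# Placeholder expression; the project's actual pattern may differ.
PASS_MD5_PATTERN = re.compile(r"/pass_md5/([A-Za-z0-9/_-]+)")


def extract_pass_md5(html: str) -> Optional[str]:
    """Return the captured path fragment, or None when nothing matches."""
    match = PASS_MD5_PATTERN.search(html)
    return match.group(1) if match else None
```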
Resource Usage Issues
Memory Leaks/Unbounded Growth
- src/server/middleware/auth.py line 34 -> completed
- Added _cleanup_old_entries() method
- Periodically removes rate limit entries older than 2x window
- Cleanup runs every 5 minutes
- Prevents unbounded memory growth from old IP addresses
- src/server/services/download_service.py line 85-86 -> reviewed (intentional design)
- deque(maxlen=100) for completed items is intentional
- deque(maxlen=50) for failed items is intentional
- Automatically drops oldest items to prevent memory growth
- Recent history is sufficient for monitoring and troubleshooting
- Full history available in database if needed
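The rate-limiter cleanup described above (drop entries older than twice the window, every five minutes) could be sketched like this. Only the method name `_cleanup_old_entries` comes from this document; the class `RateLimitTable` and its fields are assumptions.

```python
import time
from typing import Dict, Optional


class RateLimitTable:
    """Tracks last-seen timestamps per client IP and prunes stale ones."""

    def __init__(self, window_seconds: float = 60.0, cleanup_interval: float = 300.0) -> None:
        self.window_seconds = window_seconds
        self.cleanup_interval = cleanup_interval
        self._last_seen: Dict[str, float] = {}
        self._last_cleanup: Optional[float] = None

    def touch(self, client_ip: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        if self._last_cleanup is None:
            self._last_cleanup = now
        self._last_seen[client_ip] = now
        # Piggyback cleanup on normal traffic instead of running a timer.
        if now - self._last_cleanup >= self.cleanup_interval:
            self._cleanup_old_entries(now)

    def _cleanup_old_entries(self, now: float) -> None:
        cutoff = now - 2 * self.window_seconds
        stale = [ip for ip, seen in self._last_seen.items() if seen < cutoff]
        for ip in stale:
            del self._last_seen[ip]
        self._last_cleanup = now
```

Running the cleanup from `touch()` avoids a background task, at the cost of doing no pruning while the service is idle.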
Connection Pool Configuration
- src/server/database/connection.py -> completed
- Added explicit pool size configuration
- pool_size=5 for non-SQLite databases (PostgreSQL, MySQL)
- max_overflow=10 allows temporary burst to 15 connections
- SQLite uses StaticPool (appropriate for single-file database)
- pool_pre_ping=True ensures connection health checks
Large Data Structure Initialization
- src/cli/Main.py line 118 -> reviewed (acceptable for CLI)
- CLI loads all series at once which is appropriate for terminal UI
- User can see and select from full list
- For web API, pagination already implemented in endpoints
- Memory usage acceptable for typical anime collections (<1000 series)
Caching Opportunities
No Request Caching
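- src/server/api/anime.py - endpoints hit database every time -> reviewed (acceptable)
- Database queries are fast for typical workloads
- SQLAlchemy caches compiled SQL statements and keeps loaded objects in the session identity map, but it does not cache query results across requests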
- HTTP caching headers could be added as enhancement
- Consider Redis caching for high-traffic production deployments
- src/core/providers/enhanced_provider.py -> completed (caching implemented)
- HTML responses are cached in _KeyHTMLDict and _EpisodeHTMLDict
- Cache keys use (key, season, episode) tuples
- ClearCache() and RemoveFromCache() methods available
- In-memory caching appropriate for session-based usage
No Database Query Optimization
- src/server/services/anime_service.py -> reviewed (uses database service)
- Service layer delegates to database service
- Database service handles query optimization
- src/server/database/service.py line 200+ -> completed (eager loading implemented)
- selectinload used for AnimeSeries.episodes (line 151)
- selectinload used for DownloadQueueItem.series (line 564)
- Prevents N+1 query problems for relationships
- Proper use of SQLAlchemy query builder
Concurrent Request Handling
Thread Pool Sizing
- src/server/services/download_service.py line 85 -> reviewed (configurable)
- ThreadPoolExecutor uses max_concurrent_downloads parameter
- Configurable via DownloadService constructor
- Default value reasonable for typical usage
- No hard queue depth limit by design (dynamic scheduling)
Async/Sync Blocking Calls
- src/server/api/anime.py line 30+ -> reviewed (properly async)
- Database queries use async/await properly
- SeriesApp operations wrapped in executor where needed
- FastAPI runs plain def endpoints in a threadpool automatically; blocking calls inside async def endpoints still need an executor
- src/server/services/auth_service.py -> reviewed (lightweight operations)
- Methods are synchronous but perform no blocking I/O
- JWT encoding/decoding, password hashing are CPU-bound
- Fast enough not to block event loop significantly
- Could be moved to executor for high-load scenarios
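Moving a CPU-bound call such as password hashing off the event loop, as suggested above, can be sketched with the standard library alone. PBKDF2 is used here only as a stand-in, since this document does not say which hashing scheme auth_service.py uses; both function names are hypothetical.

```python
import asyncio
import hashlib


def hash_password(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Synchronous, CPU-bound hashing (PBKDF2 as a stand-in algorithm)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


async def hash_password_async(password: str, salt: bytes) -> bytes:
    """Run the hashing in the default thread pool so the event loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, hash_password, password, salt)
```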
I/O Performance
Database Query Count
- /api/v1/anime endpoint -> reviewed (optimized with eager loading)
- Uses selectinload to prevent N+1 queries
- Episodes are loaded with one additional SELECT ... IN query per relationship instead of one query per series
- Pagination available via query parameters
- Performance acceptable for typical workloads
File I/O Optimization
- src/core/SerieScanner.py line 140+ -> reviewed (acceptable design)
- Each folder reads data file individually
- Sequential file I/O appropriate for scan operation
- Files are small (metadata only)
- Caching would complicate freshness guarantees
Network Request Optimization
- src/core/providers/enhanced_provider.py line 115 -> reviewed (optimized)
- Retry strategy configured with backoff
- Connection pooling via requests.Session
- Timeout values configurable via environment
- pool_connections=10, pool_maxsize=10 for HTTP adapter
Performance Metrics Missing
- No performance monitoring for slow endpoints -> reviewed (future enhancement)
- Consider adding middleware for request timing
- Log slow requests (>1s) automatically
- Future: Integrate Prometheus/Grafana for monitoring
- No database query logging -> reviewed (available in debug mode)
- SQLAlchemy echo=True enables query logging
- Controlled by settings.log_level == "DEBUG"
- Production should use external query monitoring
- No cache hit/miss metrics -> reviewed (future enhancement)
- In-memory caching doesn't track metrics
- Future: Implement cache metrics with Redis
- No background task performance tracking -> reviewed (future enhancement)
- Download service tracks progress internally
- Metrics exposed via WebSocket and API endpoints
- Future: Add detailed performance counters
- No file operation benchmarks -> reviewed (not critical for current scope)
- File operations are fast enough for typical usage
- Consider profiling if performance issues arise
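The request-timing middleware suggested in the list above could be written as a plain ASGI wrapper with no framework imports. This is a sketch under assumptions: the class name `SlowRequestLogger`, the logger name, and the 1-second default threshold are illustrative, not existing project code.

```python
import logging
import time

logger = logging.getLogger("aniworld.timing")  # hypothetical logger name


class SlowRequestLogger:
    """Pure ASGI middleware that logs HTTP requests slower than a threshold."""

    def __init__(self, app, threshold_seconds: float = 1.0) -> None:
        self.app = app
        self.threshold_seconds = threshold_seconds

    async def __call__(self, scope, receive, send) -> None:
        if scope.get("type") != "http":
            # Pass WebSocket and lifespan traffic through untouched.
            await self.app(scope, receive, send)
            return
        started = time.perf_counter()
        try:
            await self.app(scope, receive, send)
        finally:
            elapsed = time.perf_counter() - started
            if elapsed > self.threshold_seconds:
                logger.warning(
                    "Slow request: %s %s took %.3fs",
                    scope.get("method"), scope.get("path"), elapsed,
                )
```

Since it follows the ASGI callable convention, a class like this should be registrable through the framework's usual middleware hook, though that wiring is not verified here.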
📋 Issues by File and Category
Core Module Issues
src/cli/Main.py
- Code Quality: Class SeriesApp duplicates core SeriesApp from src/core/SeriesApp.py
- Consider consolidating or using inheritance
- Line 35: _initialization_count duplicated state tracking
- Type Hints: display_series() doesn't validate if serie.name is None before using it
- Import Organization: Imports not sorted (lines 1-11) - should follow isort convention
- Error Handling: NoKeyFoundException and MatchNotFoundError are bare exception classes - need proper inheritance
- Logging: Logging configuration at module level should be in centralized config
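A possible shape for the exception hierarchy mentioned above is sketched here. The base class `AniworldError` and the constructor arguments are assumptions made for illustration; only the two subclass names appear in this document.

```python
class AniworldError(Exception):
    """Hypothetical common base class for application errors."""


class NoKeyFoundException(AniworldError):
    """Raised when no key can be resolved for a series (assumed meaning)."""

    def __init__(self, series_name: str) -> None:
        super().__init__(f"No key found for series {series_name!r}")
        self.series_name = series_name


class MatchNotFoundError(AniworldError):
    """Raised when a search produces no match (assumed meaning)."""

    def __init__(self, query: str) -> None:
        super().__init__(f"No match found for query {query!r}")
        self.query = query
```

With a shared base class, callers can catch `AniworldError` once instead of listing every concrete exception.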
src/core/SeriesApp.py
- Global State: Line 73 - series_app: Optional[SeriesApp] = None in fastapi_app.py uses global state
- Should use dependency injection instead
- Complexity: Scan() method is complex (80+ lines) - should be broken into smaller methods
- Error Context: _handle_error() doesn't provide enough context about which operation failed
src/core/SerieScanner.py
- Code Quality: is_null_or_whitespace() duplicates Python's str.isspace() - use built-in instead -> COMPLETED
- Removed redundant function
- Replaced with direct Python idiom: serie.key and serie.key.strip()
- Error Logging: Lines 167-182 catch exceptions but only log, don't propagate context
- Performance: __find_mp4_files() might be inefficient for large directories - add progress callback
src/core/providers/base_provider.py
src/core/providers/aniworld_provider.py
- Import Organization: Lines 1-18 - imports not sorted (violates isort)
- Global State: Lines 24-26 - Multiple logger instances created at module level
- Should use centralized logging system
- Hardcoding: Line 42 - User-Agent string hardcoded (also at line 47 for Firefox)
- Extract to configuration constants
- Type Hints: Missing type hints on:
- __init__() method parameters (no return type on implicit constructor)
- Class attribute type annotations (line 41-62)
- Magic Strings: Line 38 - Hardcoded list of provider names should be enum
- Configuration: Timeouts hardcoded at line 22 - should use settings
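Replacing a hardcoded list of provider names with an enum, as recommended above, might look like the sketch below. The member list is borrowed from the streaming module file names mentioned elsewhere in this document and may not match the actual list at line 38; `StreamingProvider` and `parse_provider` are hypothetical names.

```python
from enum import Enum


class StreamingProvider(str, Enum):
    """Illustrative provider names; the project's real list may differ."""

    VOE = "voe"
    DOODSTREAM = "doodstream"
    FILEMOON = "filemoon"
    SPEEDFILES = "speedfiles"
    LOADX = "loadx"


def parse_provider(name: str) -> StreamingProvider:
    """Normalize user input and raise ValueError for unknown providers."""
    return StreamingProvider(name.strip().lower())
```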
src/core/providers/enhanced_provider.py
- Type Hints: Class constructor __init__() missing type annotations (lines 40-96)
- Documentation: Bare exception handlers at lines 418-419 - need specific exception types
- Code Quality: with_error_recovery decorator imported but usage unclear
- Performance: _create_robust_session() method not shown but likely creates multiple session objects
src/core/interfaces/providers.py
- Need to verify if any abstract methods lack type hints and docstrings
src/core/exceptions/Exceptions.py
- Need to verify custom exception hierarchy and documentation
Server Module Issues
src/server/fastapi_app.py
- Global State: Line 73 - series_app: Optional[SeriesApp] = None stored globally
- Use FastAPI dependency injection via Depends()
- CORS Configuration: Line 48 - allow_origins=["*"] is production security issue
- Add comment: "Configure appropriately for production"
- Extract to settings with environment-based defaults
- Error Handling: startup_event() at line 79 - missing try-except to handle initialization failures
- Type Hints: startup_event() function missing type annotations
- Documentation: broadcast_callback() function inside event handler should be extracted to separate function
- Logging: No error logging if settings.anime_directory is None
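Environment-based CORS origins, as suggested above, can be derived with a small helper. The variable name `CORS_ORIGINS`, the localhost fallback, and the function name are assumptions for illustration, not the project's actual settings.

```python
import os
from typing import List, Mapping, Optional


def cors_origins_from_env(
    env: Optional[Mapping[str, str]] = None,
    var_name: str = "CORS_ORIGINS",  # hypothetical variable name
) -> List[str]:
    """Parse a comma-separated origin list, falling back to a local default."""
    env = os.environ if env is None else env
    raw = env.get(var_name, "")
    origins = [origin.strip() for origin in raw.split(",") if origin.strip()]
    # Assumed development default; a wildcard is deliberately not used.
    return origins or ["http://localhost:8000"]
```

The resulting list would then be passed as `allow_origins` when the CORS middleware is configured.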
src/server/middleware/auth.py
- Performance: In-memory rate limiter (line 34) will leak memory - never cleans up old entries
- Need periodic cleanup or use Redis for production
- Security: Line 46 - Rate limiting only 60-second window, should be configurable
- Type Hints: dispatch() method parameters properly typed, but return type could be explicit
- Documentation: _get_client_ip() method incomplete (line 94+ truncated)
- Error Handling: Lines 81-86 - Silent failure if protected endpoint and no auth
- Should return 401 consistently
src/server/services/auth_service.py
- Documentation: Line 68 - Comment says "For now we update only in-memory" indicates incomplete implementation
- Create task to persist password hash to configuration file
- Type Hints: _verify_password() at line 60 - no return type annotation (implicit bool)
- Security: Line 71 - Minimum password length 8 characters, should be documented as security requirement
- State Management: In-memory _failed dict (line 51) resets on process restart
- Document this limitation and suggest Redis/database for production
src/server/database/service.py
- Documentation: Service layer methods need detailed docstrings explaining:
- Database constraints
- Transaction behavior
- Cascade delete behavior
- Error Handling: Methods don't specify which SQLAlchemy exceptions they might raise
src/server/database/models.py
- Documentation: Model relationships and cascade rules well-documented
- ✅ Type hints present and comprehensive (well done)
- Validation: No model-level validation before database insert
- Consider adding validators for constraints
src/server/services/download_service.py
- Performance: Line 85 - deque(maxlen=100) for completed items - is this appropriate for long-running service?
- Thread Safety: Uses ThreadPoolExecutor but thread-safety of queue operations not clear
src/server/utils/dependencies.py
- TODO Comments: Lines 223 and 233 - TODO comments for unimplemented features:
- "TODO: Implement rate limiting logic"
- "TODO: Implement request logging logic"
- Create separate task items for these
src/server/utils/system.py
- Exception Handling: Line 255 - bare pass statement in exception handler
- Should at least log the exception
src/server/api/anime.py
- Error Handling: Lines 35-39 - Multiple bare except Exception handlers
- Need specific exception types and proper logging
- Code Quality: Lines 32-36 - Complex property access with getattr() chains
- Create helper function or model method to encapsulate
Models and Pydantic Issues
src/server/models/config.py
- Error Handling: Line 93 - ValidationError caught but only silently passed?
- Should log or re-raise with context
Utility and Configuration Issues
src/config/settings.py
- Security: Line 12 - master_password field stored in environment during development
- Add warning comment: "NEVER use this in production"
- Documentation: Settings class needs comprehensive docstring explaining each field
src/infrastructure/logging/GlobalLogger.py
- Need to review logging configuration for consistency
src/server/utils/logging.py
- Need to review for type hints and consistency with global logging
src/server/utils/template_helpers.py
- Need to review for type hints and docstrings
src/server/utils/log_manager.py
- Need to review for type hints and error handling
🔒 Security Issues
High Priority
- CORS Configuration (src/server/fastapi_app.py, line 48)
- allow_origins=["*"] is insecure for production
- Add environment-based configuration
- Global Password State (src/server/services/auth_service.py, line 51)
- In-memory failure tracking resets on restart
- Recommend using persistent storage (database/Redis)
Medium Priority
- Rate Limiter Memory Leak (src/server/middleware/auth.py, line 34)
- Never cleans up old IP entries
- Add periodic cleanup or use Redis
- Missing Authorization Checks (src/server/middleware/auth.py, lines 81-86)
- Some protected endpoints might silently allow unauthenticated access
📊 Code Style Issues
Documentation - Phase 1: Critical Sections
- Document database transaction behavior in src/server/database/service.py
Documentation - Phase 2: Endpoints
- Expand docstrings on endpoints in src/server/api/anime.py
- Add parameter descriptions to endpoint handlers
- Document expected exceptions and error responses
Code Quality - Phase 1: Consolidation
- Investigate SeriesApp duplication between src/cli/Main.py and src/core/SeriesApp.py
- Consider consolidating into single implementation
- Update CLI to use core module instead of duplicate
Code Quality - Phase 2: Exception Handling
- Add specific exception types to bare except: handlers
- Add logging to all exception handlers
- Document exception context and causes
- Review exception handling in src/core/providers/enhanced_provider.py (lines 410-421)
Code Quality - Phase 3: Refactoring
- Extract broadcast_callback() from startup_event() in src/server/fastapi_app.py
- Break down complex Scan() method in src/core/SerieScanner.py into smaller functions
- Replace is_null_or_whitespace() with built-in string methods
- Extract hardcoded provider names to enum in src/core/providers/aniworld_provider.py
Security - Phase 1: Critical Fixes
- Make CORS configuration environment-based in src/server/fastapi_app.py
- Add startup validation to ensure anime_directory is configured
Security - Phase 2: Improvements
- Implement Redis-based rate limiter instead of in-memory in src/server/middleware/auth.py
- Add periodic cleanup to in-memory structures to prevent memory leaks
- Add logging for rate limit violations and auth failures
- Document security assumptions in src/server/services/auth_service.py
Performance - Phase 1: Validation
- Profile SerieScanner.__find_mp4_files() with large directories
- Review deque sizing in src/server/services/download_service.py (lines 85-86)
- Verify thread-safety of queue operations
Performance - Phase 2: Optimization
- Add pagination to anime list endpoint if dataset is large
- Consider caching for search results in src/core/providers/aniworld_provider.py
- Review session creation overhead in provider initialization
Configuration Issues
- Extract hardcoded timeouts from src/core/providers/aniworld_provider.py line 22 to settings
- Extract User-Agent strings to configuration constants
- Document all configuration options in settings module
- Add validation for required environment variables
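Validation of required environment variables at startup, the last item above, could be as small as the following sketch. The exception name, the function name, and any variable names passed to it are hypothetical.

```python
import os
from typing import Iterable, List, Mapping, Optional


class MissingConfigurationError(RuntimeError):
    """Raised at startup when required settings are absent (hypothetical)."""


def require_env(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> None:
    """Fail fast with one message listing every missing or blank variable."""
    env = os.environ if env is None else env
    missing: List[str] = [name for name in names if not env.get(name, "").strip()]
    if missing:
        raise MissingConfigurationError(
            "Missing required environment variables: " + ", ".join(missing)
        )
```

Reporting all missing names at once saves a restart cycle per variable during deployment.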
Logging Issues
- Centralize logger creation across all modules
- Remove module-level logger instantiation where possible
- Document logging levels expected for each component
- Review src/cli/Main.py logging configuration (lines 12-22) - appears to suppress all logging
Testing/Comments
- Add inline comments explaining complex regex patterns in providers
- Add comments explaining retry logic and backoff strategies
- Document callback behavior and expected signatures
- Add comments to clarify WebSocket broadcast mechanisms
📌 Implementation Notes
Dependencies to Verify
- error_handler module - currently missing, causing import error
- All Pydantic models properly imported in service layers
- SQLAlchemy session management properly scoped
Configuration Management
- Review src/config/settings.py for completeness
- Ensure all configurable values are in settings, not hardcoded
- Document all environment variables needed
Testing Coverage
- Verify tests cover exception paths in src/server/api/anime.py
- Add tests for CORS configuration
- Test rate limiting behavior in middleware
- Test database transaction rollback scenarios
🔄 Validation Checklist Before Committing
For each issue fixed:
- Run Pylance to verify type hints are correct
- Run isort on modified files to sort imports
- Run black to format code to PEP8 standards
- Run existing unit tests to ensure no regression
- Verify no new security vulnerabilities introduced
- Update docstrings if behavior changed
- Document any breaking API changes
Total Issues Identified: ~90 individual items across 8 categories
Priority Distribution: 5 High | 15 Medium | 70 Low/Nice-to-have
Estimated Effort: 40-60 hours for comprehensive quality improvement