Aniworld/QualityTODO.md
2025-10-22 15:54:36 +02:00

27 KiB
Raw Blame History

Quality Issues and TODO List

Aniworld Web Application Development Instructions

This document provides detailed tasks for AI agents to implement a modern web application for the Aniworld anime download manager. All tasks should follow the coding guidelines specified in the project's copilot instructions.

Project Overview

The goal is to create a FastAPI-based web application that provides a modern interface for the existing Aniworld anime download functionality. The core anime logic should remain in SeriesApp.py while the web layer provides REST API endpoints and a responsive UI.

Architecture Principles

  • Single Responsibility: Each file/class has one clear purpose
  • Dependency Injection: Use FastAPI's dependency system
  • Clean Separation: Web layer calls core logic, never the reverse
  • File Size Limit: Maximum 500 lines per file
  • Type Hints: Use comprehensive type annotations
  • Error Handling: Proper exception handling and logging

Additional Implementation Guidelines

Code Style and Standards

  • Type Hints: Use comprehensive type annotations throughout all modules
  • Docstrings: Follow PEP 257 for function and class documentation
  • Error Handling: Implement custom exception classes with meaningful messages
  • Logging: Use structured logging with appropriate log levels
  • Security: Validate all inputs and sanitize outputs
  • Performance: Use async/await patterns for I/O operations

📞 Escalation

If you encounter:

  • Architecture issues requiring design decisions
  • Tests that conflict with documented requirements
  • Breaking changes needed
  • Unclear requirements or expectations

Document the issue and escalate rather than guessing.


📚 Helpful Commands

# Run all tests
conda run -n AniWorld python -m pytest tests/ -v --tb=short

# Run specific test file
conda run -n AniWorld python -m pytest tests/unit/test_websocket_service.py -v

# Run specific test class
conda run -n AniWorld python -m pytest tests/unit/test_websocket_service.py::TestWebSocketService -v

# Run specific test
conda run -n AniWorld python -m pytest tests/unit/test_websocket_service.py::TestWebSocketService::test_broadcast_download_progress -v

# Run with extra verbosity
conda run -n AniWorld python -m pytest tests/ -vv

# Run with full traceback
conda run -n AniWorld python -m pytest tests/ -v --tb=long

# Run and stop at first failure
conda run -n AniWorld python -m pytest tests/ -v -x

# Run tests matching pattern
conda run -n AniWorld python -m pytest tests/ -v -k "auth"

# Show all print statements
conda run -n AniWorld python -m pytest tests/ -v -s

📊 Detailed Analysis: The 7 Quality Criteria

1 Code Follows PEP8 and Project Coding Standards


2 Type Hints Used Where Applicable

Missing Type Hints by Category

Abstract Base Classes (Critical)

Service Classes

API Endpoints

Dependencies and Utils

Core Classes

Invalid Type Hint Syntax


3 Clear, Self-Documenting Code Written

Missing or Inadequate Docstrings

Module-Level Docstrings

Class Docstrings

Method/Function Docstrings

Unclear Variable Names

Unclear Comments or Missing Context

  • src/server/api/download.py line 51
    • Backward compatibility comment clear but needs more detail

4 Complex Logic Commented

Complex Algorithms Without Comments

JSON/HTML Parsing Logic


5 No Shortcuts or Hacks Used

Code Smells and Shortcuts

Duplicate Code

  • src/cli/Main.py vs src/core/SeriesApp.py
    • Entire SeriesApp class duplicated (89 lines vs 590 lines)
    • CLI version is oversimplified - should use core version
    • Line 35-50 in CLI duplicates core initialization logic
  • src/core/providers/aniworld_provider.py vs src/core/providers/enhanced_provider.py
    • Headers dictionary duplicated (lines 38-50 similar)
    • Provider list duplicated (line 38 vs line 45)
    • User-Agent strings duplicated

Global Variables (Temporary Storage)

  • src/server/fastapi_app.py line 73
    • series_app: Optional[SeriesApp] = None global storage
    • Should use FastAPI dependency injection instead
    • Problematic for testing and multiple instances

Logging Configuration Workarounds

  • src/cli/Main.py lines 12-22
    • Manual logger handler removal is hacky
    • for h in logging.root.handlers: logging.root.removeHandler(h) is a hack
    • Should use proper logging configuration
    • Multiple loggers created with file handlers at odd paths (line 26)

Hardcoded Values

  • src/core/providers/aniworld_provider.py line 22
    • timeout = int(os.getenv("DOWNLOAD_TIMEOUT", 600)) at module level
    • Should be in settings class
  • src/core/providers/aniworld_provider.py lines 38, 47
    • User-Agent strings hardcoded
    • Provider list hardcoded
  • src/cli/Main.py line 227
    • Network path hardcoded: "\\\\sshfs.r\\ubuntu@192.168.178.43\\media\\serien\\Serien"
    • Should be configuration

Exception Handling Shortcuts

  • src/core/providers/enhanced_provider.py lines 410-421
    • Bare except Exception: without specific types (line 418)
    • Multiple overlapping exception handlers (lines 410-425)
    • Should use specific exception hierarchy
  • src/server/api/anime.py lines 35-39
    • Bare except Exception: handlers should specify types
  • src/server/models/config.py line 93
    • except ValidationError: pass - silently ignores error

Type Casting Workarounds

  • src/server/api/download.py line 52
    • Complex .model_dump(mode="json") for serialization
    • Should use proper model serialization methods
  • src/server/utils/dependencies.py line 36
    • Type casting with .get() and defaults scattered throughout

Conditional Hacks

  • src/server/utils/dependencies.py line 260
    • running_tests = "PYTEST_CURRENT_TEST" in os.environ or "pytest" in sys.modules
    • Hacky test detection - should use proper test mode flag

6 Security Considerations Addressed

Authentication & Authorization

Weak CORS Configuration

  • src/server/fastapi_app.py line 48
    • allow_origins=["*"] allows any origin
    • HIGH RISK in production
    • Should be: allow_origins=settings.allowed_origins (environment-based)
  • No CORS rate limiting by origin

Missing Authorization Checks

  • src/server/middleware/auth.py lines 81-86
    • Silent failure on missing auth for protected endpoints
    • Should consistently return 401 status
    • Some endpoints might bypass auth silently

In-Memory Session Storage

  • src/server/services/auth_service.py line 51
    • In-memory _failed dict resets on restart
    • Attacker can restart process to bypass rate limiting
    • Should use Redis or database
  • Line 51 comment: "For now we update only in-memory"
    • Indicates incomplete security implementation

Input Validation

Unvalidated User Input

  • src/cli/Main.py line 80
    • User input for file paths not validated
    • Could allow path traversal attacks
  • src/core/SerieScanner.py line 37
    • Directory path basePath not validated
    • Could read files outside intended directory
  • src/server/api/anime.py line 70
    • Search query not validated for injection
  • src/core/providers/aniworld_provider.py line 300+
    • URL parameters not sanitized

Missing Parameter Validation

  • src/core/providers/enhanced_provider.py line 280
    • Season/episode validation present but minimal
  • src/server/database/models.py
    • No length validation on string fields
    • No range validation on numeric fields

Secrets and Credentials

Hardcoded Secrets

  • src/config/settings.py line 9
    • jwt_secret_key: str = Field(default="your-secret-key-here", env="JWT_SECRET_KEY")
    • Default secret exposed in code
    • Should have NO default, or random default
  • .env file might contain secrets (if exists)
    • Should be in .gitignore

Plaintext Password Storage

  • src/config/settings.py line 12
    • master_password: Optional[str] stored in env (development only)
    • Should NEVER be used in production
    • Add bold warning comment

Master Password Implementation

  • src/server/services/auth_service.py line 71
    • Minimum 8 character password requirement documented
    • Should enforce stronger requirements (uppercase, numbers, symbols)

Data Protection

No Encryption of Sensitive Data

  • Downloaded files not verified with checksums
  • No integrity checking of stored data
  • No encryption of sensitive config values

File Permission Issues

  • src/core/providers/aniworld_provider.py line 26
    • Log file created with default permissions
    • Path: "../../download_errors.log" - relative path is unsafe
    • Should use absolute paths with secure permissions

Logging of Sensitive Data

  • Check all logger.debug() calls for parameter logging
    • URLs might contain API keys
    • Search queries might contain sensitive terms
  • Example: src/core/providers/enhanced_provider.py line 260
    • logger.debug() might log URLs with sensitive data

Network Security

Unvalidated External Connections

  • src/core/providers/aniworld_provider.py line 60
    • HTTP retry configuration but no SSL verification flag check
  • src/core/providers/enhanced_provider.py line 115
    • HTTP error codes 500-524 auto-retry without logging suspicious activity

Missing SSL/TLS Configuration

  • Verify SSL certificate validation enabled
  • Check for verify=False in requests calls (should be True)

Database Security

No SQL Injection Protection

  • Check src/server/database/service.py for parameterized queries
    • Should use SQLAlchemy properly (appears to be OK)
  • String interpolation in queries should not exist

No Database Access Control

  • Single database user for all operations
  • No row-level security
  • No audit logging of data changes

7 Performance Validated

Algorithmic Efficiency Issues

File Scanning Performance

  • src/core/SerieScanner.py line 105+
    • __find_mp4_files() - potential O(n²) complexity
    • Recursive directory traversal not profiled
    • No caching or incremental scanning
    • Large directories (>10K files) might cause timeout

Download Queue Processing

  • src/server/services/download_service.py line 240
    • self._pending_queue.remove(item) - O(n) operation in deque
    • Should use dict for O(1) lookup before removal
    • Line 85-86: deque maxlen limits might cause data loss

Provider Search Performance

  • src/core/providers/enhanced_provider.py line 220
    • Multiple parsing strategies tried sequentially
    • Should fail fast on obvious errors instead of trying all 3
    • No performance metrics logged

String Operations

  • src/cli/Main.py line 118
    • Nested sum() with comprehensions - O(n*m) complexity
    • total_episodes = sum(sum(len(ep) for ep in serie.episodeDict.values()) for serie in series)
    • No streaming/generator pattern

Regular Expression Compilation

  • src/core/providers/streaming/doodstream.py line 35
    • Regex patterns compiled on every call
    • Should compile once at module level
    • Example: r"\$\.get\('([^']*\/pass_md5\/[^']*)'" compiled repeatedly

Resource Usage Issues

Memory Leaks/Unbounded Growth

  • src/server/middleware/auth.py line 34
    • self._rate: Dict[str, Dict[str, float]] never cleaned
    • Old IP addresses accumulate forever
    • Solution: add timestamp-based cleanup
  • src/server/services/download_service.py line 85-86
    • deque(maxlen=100) and deque(maxlen=50) drop old items
    • Might lose important history

Connection Pool Configuration

  • src/server/database/connection.py
    • Check if connection pooling is configured
    • No explicit pool size limits found
    • Could exhaust database connections

Large Data Structure Initialization

  • src/cli/Main.py line 118
    • Loading all series at once
    • Should use pagination for large datasets

Caching Opportunities

No Request Caching

  • src/server/api/anime.py - endpoints hit database every time
    • No caching headers set
    • @cache decorator could be used
  • src/core/providers/enhanced_provider.py
    • Search results not cached
    • Same search query hits network repeatedly

No Database Query Optimization

  • src/server/services/anime_service.py
    • No eager loading (selectinload) for relationships
    • N+1 query problems likely
  • src/server/database/service.py line 200+
    • Check for missing .selectinload() in queries

Concurrent Request Handling

Thread Pool Sizing

  • src/server/services/download_service.py line 85
    • ThreadPoolExecutor(max_workers=max_concurrent_downloads)
    • Default is 2, should be configurable
    • No queue depth limits

Async/Sync Blocking Calls

  • src/server/api/anime.py line 30+
    • Series list operations might block
    • Database queries appear async (OK)
  • src/server/services/auth_service.py
    • Methods are synchronous but called from async endpoints
    • Should verify no blocking calls

I/O Performance

Database Query Count

  • /api/v1/anime endpoint
    • Likely makes multiple queries for each series
    • Should use single query with joins/eager loading
    • Test with N series to find N+1 issues

File I/O Optimization

  • src/core/SerieScanner.py line 140+
    • Each folder reads data file
    • Could batch reads or cache

Network Request Optimization

  • src/core/providers/enhanced_provider.py line 115
    • Retry strategy good
    • No connection pooling verification
    • Should check request timeout values

Performance Metrics Missing

  • No performance monitoring for slow endpoints
  • No database query logging
  • No cache hit/miss metrics
  • No background task performance tracking
  • No file operation benchmarks

📋 Issues by File and Category

Core Module Issues

src/cli/Main.py

  • Code Quality: Class SeriesApp duplicates core SeriesApp from src/core/SeriesApp.py
    • Consider consolidating or using inheritance
    • Line 35: _initialization_count duplicated state tracking
  • Type Hints: display_series() doesn't validate if serie.name is None before using it
  • Import Organization: Imports not sorted (lines 1-11) - should follow isort convention
  • Error Handling: NoKeyFoundException and MatchNotFoundError are bare except classes - need proper inheritance
  • Logging: Logging configuration at module level should be in centralized config

src/core/SeriesApp.py

  • Global State: Line 73 - series_app: Optional[SeriesApp] = None in fastapi_app.py uses global state
    • Should use dependency injection instead
  • Complexity: Scan() method is complex (80+ lines) - should be broken into smaller methods
  • Error Context: _handle_error() doesn't provide enough context about which operation failed

src/core/SerieScanner.py

  • Code Quality: is_null_or_whitespace() duplicates Python's str.isspace() - use built-in instead
  • Error Logging: Lines 167-182 catch exceptions but only log, don't propagate context
  • Performance: __find_mp4_files() might be inefficient for large directories - add progress callback

src/core/providers/base_provider.py

src/core/providers/aniworld_provider.py

  • Import Organization: Lines 1-18 - imports not sorted (violates isort)
  • Global State: Lines 24-26 - Multiple logger instances created at module level
    • Should use centralized logging system
  • Hardcoding: Line 42 - User-Agent string hardcoded (also at line 47 for Firefox)
    • Extract to configuration constants
  • Type Hints: Missing type hints on:
    • __init__() method parameters (no return type on implicit constructor)
    • Class attribute type annotations (line 41-62)
  • Magic Strings: Line 38 - Hardcoded list of provider names should be enum
  • Configuration: Timeouts hardcoded at line 22 - should use settings

src/core/providers/enhanced_provider.py

  • Type Hints: Class constructor __init__() missing type annotations (lines 40-96)
  • Documentation: Bare exception handlers at lines 418-419 - need specific exception types
  • Code Quality: with_error_recovery decorator imported but usage unclear
  • Performance: _create_robust_session() method not shown but likely creates multiple session objects

src/core/interfaces/providers.py

  • Need to verify if any abstract methods lack type hints and docstrings

src/core/exceptions/Exceptions.py

  • Need to verify custom exception hierarchy and documentation

Server Module Issues

src/server/fastapi_app.py

  • Global State: Line 73 - series_app: Optional[SeriesApp] = None stored globally
    • Use FastAPI dependency injection via Depends()
  • CORS Configuration: Line 48 - allow_origins=["*"] is production security issue
    • Add comment: "Configure appropriately for production"
    • Extract to settings with environment-based defaults
  • Error Handling: startup_event() at line 79 - missing try-except to handle initialization failures
  • Type Hints: startup_event() function missing type annotations
  • Documentation: broadcast_callback() function inside event handler should be extracted to separate function
  • Logging: No error logging if settings.anime_directory is None

src/server/middleware/auth.py

  • Performance: In-memory rate limiter (line 34) will leak memory - never cleans up old entries
    • Need periodic cleanup or use Redis for production
  • Security: Line 46 - Rate limiting only 60-second window, should be configurable
  • Type Hints: dispatch() method parameters properly typed, but return type could be explicit
  • Documentation: _get_client_ip() method incomplete (line 94+ truncated)
  • Error Handling: Lines 81-86 - Silent failure if protected endpoint and no auth
    • Should return 401 consistently

src/server/services/auth_service.py

  • Documentation: Line 68 - Comment says "For now we update only in-memory" indicates incomplete implementation
    • Create task to persist password hash to configuration file
  • Type Hints: _verify_password() at line 60 - no return type annotation (implicit bool)
  • Security: Line 71 - Minimum password length 8 characters, should be documented as security requirement
  • State Management: In-memory _failed dict (line 51) resets on process restart
    • Document this limitation and suggest Redis/database for production

src/server/database/service.py

  • Documentation: Service layer methods need detailed docstrings explaining:
    • Database constraints
    • Transaction behavior
    • Cascade delete behavior
  • Error Handling: Methods don't specify which SQLAlchemy exceptions they might raise

src/server/database/models.py

  • Documentation: Model relationships and cascade rules well-documented
    • Type hints present and comprehensive (well done)
  • Validation: No model-level validation before database insert
    • Consider adding validators for constraints

src/server/services/download_service.py

  • Performance: Line 85 - deque(maxlen=100) for completed items - is this appropriate for long-running service?
  • Thread Safety: Uses ThreadPoolExecutor but thread-safety of queue operations not clear

src/server/utils/dependencies.py

  • TODO Comments: Lines 223 and 233 - TODO comments for unimplemented features:
    • "TODO: Implement rate limiting logic"
    • "TODO: Implement request logging logic"
    • Create separate task items for these

src/server/utils/system.py

  • Exception Handling: Line 255 - bare pass statement in exception handler
    • Should at least log the exception

src/server/api/anime.py

  • Error Handling: Lines 35-39 - Multiple bare except Exception handlers
    • Need specific exception types and proper logging
  • Code Quality: Lines 32-36 - Complex property access with getattr() chains
    • Create helper function or model method to encapsulate

Models and Pydantic Issues

src/server/models/config.py

  • Error Handling: Line 93 - ValidationError caught but only silently passed?
    • Should log or re-raise with context

Utility and Configuration Issues

src/config/settings.py

  • Security: Line 12 - master_password field stored in environment during development
    • Add warning comment: "NEVER use this in production"
  • Documentation: Settings class needs comprehensive docstring explaining each field

src/infrastructure/logging/GlobalLogger.py

  • Need to review logging configuration for consistency

src/server/utils/logging.py

  • Need to review for type hints and consistency with global logging

src/server/utils/template_helpers.py

  • Need to review for type hints and docstrings

src/server/utils/log_manager.py

  • Need to review for type hints and error handling

🔒 Security Issues

High Priority

  • CORS Configuration (src/server/fastapi_app.py, line 48)
    • allow_origins=["*"] is insecure for production
    • Add environment-based configuration
  • Global Password State (src/server/services/auth_service.py, line 51)
    • In-memory failure tracking resets on restart
    • Recommend using persistent storage (database/Redis)

Medium Priority

  • Rate Limiter Memory Leak (src/server/middleware/auth.py, line 34)

    • Never cleans up old IP entries
    • Add periodic cleanup or use Redis
  • Missing Authorization Checks (src/server/middleware/auth.py, lines 81-86)

    • Some protected endpoints might silently allow unauthenticated access

📊 Code Style Issues

Documentation - Phase 1: Critical Sections

  • Document database transaction behavior in src/server/database/service.py

Documentation - Phase 2: Endpoints

  • Expand docstrings on endpoints in src/server/api/anime.py
  • Add parameter descriptions to endpoint handlers
  • Document expected exceptions and error responses

Code Quality - Phase 1: Consolidation

  • Investigate SeriesApp duplication between src/cli/Main.py and src/core/SeriesApp.py
  • Consider consolidating into single implementation
  • Update CLI to use core module instead of duplicate

Code Quality - Phase 2: Exception Handling

  • Add specific exception types to bare except: handlers
  • Add logging to all exception handlers
  • Document exception context and causes
  • Review exception handling in src/core/providers/enhanced_provider.py (lines 410-421)

Code Quality - Phase 3: Refactoring

  • Extract broadcast_callback() from startup_event() in src/server/fastapi_app.py
  • Break down complex Scan() method in src/core/SerieScanner.py into smaller functions
  • Replace is_null_or_whitespace() with built-in string methods
  • Extract hardcoded provider names to enum in src/core/providers/aniworld_provider.py

Security - Phase 1: Critical Fixes

  • Make CORS configuration environment-based in src/server/fastapi_app.py
  • Add startup validation to ensure anime_directory is configured

Security - Phase 2: Improvements

  • Implement Redis-based rate limiter instead of in-memory in src/server/middleware/auth.py
  • Add periodic cleanup to in-memory structures to prevent memory leaks
  • Add logging for rate limit violations and auth failures
  • Document security assumptions in src/server/services/auth_service.py

Performance - Phase 1: Validation

  • Profile SerieScanner.__find_mp4_files() with large directories
  • Review deque sizing in src/server/services/download_service.py (lines 85-86)
  • Verify thread-safety of queue operations

Performance - Phase 2: Optimization

  • Add pagination to anime list endpoint if dataset is large
  • Consider caching for search results in src/core/providers/aniworld_provider.py
  • Review session creation overhead in provider initialization

Configuration Issues

  • Extract hardcoded timeouts from src/core/providers/aniworld_provider.py line 22 to settings
  • Extract User-Agent strings to configuration constants
  • Document all configuration options in settings module
  • Add validation for required environment variables

Logging Issues

  • Centralize logger creation across all modules
  • Remove module-level logger instantiation where possible
  • Document logging levels expected for each component
  • Review src/cli/Main.py logging configuration (lines 12-22) - appears to suppress all logging

Testing/Comments

  • Add inline comments explaining complex regex patterns in providers
  • Add comments explaining retry logic and backoff strategies
  • Document callback behavior and expected signatures
  • Add comments to clarify WebSocket broadcast mechanisms

📌 Implementation Notes

Dependencies to Verify

  • error_handler module - currently missing, causing import error
  • All Pydantic models properly imported in service layers
  • SQLAlchemy session management properly scoped

Configuration Management

  • Review src/config/settings.py for completeness
  • Ensure all configurable values are in settings, not hardcoded
  • Document all environment variables needed

Testing Coverage

  • Verify tests cover exception paths in src/server/api/anime.py
  • Add tests for CORS configuration
  • Test rate limiting behavior in middleware
  • Test database transaction rollback scenarios

🔄 Validation Checklist Before Committing

For each issue fixed:

  • Run Pylance to verify type hints are correct
  • Run isort on modified files to sort imports
  • Run black to format code to PEP8 standards
  • Run existing unit tests to ensure no regression
  • Verify no new security vulnerabilities introduced
  • Update docstrings if behavior changed
  • Document any breaking API changes

Total Issues Identified: ~90 individual items across 8 categories Priority Distribution: 5 High | 15 Medium | 70 Low/Nice-to-have Estimated Effort: 40-60 hours for comprehensive quality improvement