Aniworld/docs/identifier_standardization_validation.md

423 lines
13 KiB
Markdown

# Series Identifier Standardization - Validation Instructions
## Overview
This document provides comprehensive instructions for AI agents to validate the **Series Identifier Standardization** change across the Aniworld codebase. The change standardizes `key` as the primary identifier for series and relegates `folder` to metadata-only status.
## Summary of the Change
| Field | Purpose | Usage |
| -------- | ------------------------------------------------------------------------------ | --------------------------------------------------------------- |
| `key` | **Primary Identifier** - Provider-assigned, URL-safe (e.g., `attack-on-titan`) | All lookups, API operations, database queries, WebSocket events |
| `folder` | **Metadata Only** - Filesystem folder name (e.g., `Attack on Titan (2013)`) | Display purposes, filesystem operations only |
| `id` | **Database Primary Key** - Internal auto-increment integer | Database relationships only |
---
## Validation Checklist
### Phase 2: Application Layer Services
**Files to validate:**
1. **`src/server/services/anime_service.py`**
- [ ] Class docstring explains `key` vs `folder` convention
- [ ] All public methods accept `key` parameter for series identification
- [ ] No methods accept `folder` as an identifier parameter
- [ ] Event handler methods document key/folder convention
- [ ] Progress tracking uses `key` in progress IDs where possible
2. **`src/server/services/download_service.py`**
- [ ] `DownloadItem` uses `serie_id` (which should be the `key`)
- [ ] `serie_folder` is documented as metadata only
- [ ] Queue operations look up series by `key` not `folder`
- [ ] Persistence format includes `serie_id` as the key identifier
3. **`src/server/services/websocket_service.py`**
- [ ] Module docstring explains key/folder convention
- [ ] Broadcast methods include `key` in message payloads
- [ ] `folder` is documented as optional/display only
- [ ] Event broadcasts use `key` as primary identifier
4. **`src/server/services/scan_service.py`**
- [ ] Scan operations use `key` for identification
- [ ] Progress events include `key` field
5. **`src/server/services/progress_service.py`**
- [ ] Progress tracking includes `key` in metadata where applicable
**Validation Commands:**
```bash
# Check service layer for folder-based lookups
grep -rn "by_folder\|folder.*=.*identifier\|folder.*lookup" src/server/services/ --include="*.py"
# Verify key is used in services
grep -rn "serie_id\|series_key\|key.*identifier" src/server/services/ --include="*.py"
```
---
### Phase 3: API Endpoints and Responses
**Files to validate:**
1. **`src/server/api/anime.py`**
- [ ] `AnimeSummary` model has `key` field with proper description
- [ ] `AnimeDetail` model has `key` field with proper description
- [ ] API docstrings explain `key` is the primary identifier
- [ ] `folder` field descriptions state "metadata only"
- [ ] Endpoint paths use `key` parameter (e.g., `/api/anime/{key}`)
- [ ] No endpoints use `folder` as path parameter for lookups
2. **`src/server/api/download.py`**
- [ ] Download endpoints use `serie_id` (key) for operations
- [ ] Request models document key/folder convention
- [ ] Response models include `key` as primary identifier
3. **`src/server/models/anime.py`**
- [ ] Module docstring explains identifier convention
- [ ] `AnimeSeriesResponse` has `key` field properly documented
- [ ] `SearchResult` has `key` field properly documented
- [ ] Field validators normalize `key` to lowercase
- [ ] `folder` fields document metadata-only purpose
4. **`src/server/models/download.py`**
- [ ] `DownloadItem` has `serie_id` documented as the key
- [ ] `serie_folder` documented as metadata only
- [ ] Field descriptions are clear about primary vs metadata
5. **`src/server/models/websocket.py`**
- [ ] Module docstring explains key/folder convention
- [ ] Message models document `key` as primary identifier
- [ ] `folder` documented as optional display metadata
**Validation Commands:**
```bash
# Check API endpoints for folder-based paths
grep -rn "folder.*Path\|/{folder}" src/server/api/ --include="*.py"
# Verify key is used in endpoints
grep -rn "/{key}\|series_key\|serie_id" src/server/api/ --include="*.py"
# Check model field descriptions
grep -rn "Field.*description.*identifier\|Field.*description.*key\|Field.*description.*folder" src/server/models/ --include="*.py"
```
---
### Phase 4: Frontend Integration
**Files to validate:**
1. **`src/server/web/static/js/app.js`**
- [ ] `selectedSeries` Set uses `key` values, not `folder`
- [ ] `seriesData` array comments indicate `key` as primary identifier
- [ ] Selection operations use `key` property
- [ ] API calls pass `key` for series identification
- [ ] WebSocket message handlers extract `key` from data
- [ ] No code uses `folder` for series lookups
2. **`src/server/web/static/js/queue.js`**
- [ ] Queue items reference series by `key` or `serie_id`
- [ ] WebSocket handlers extract `key` from messages
- [ ] UI operations use `key` for identification
- [ ] `serie_folder` used only for display
3. **`src/server/web/static/js/websocket_client.js`**
- [ ] Message handling preserves `key` field
- [ ] No transformation that loses `key` information
4. **HTML Templates** (`src/server/web/templates/`)
- [ ] Data attributes use `key` for identification (e.g., `data-key`)
- [ ] No `data-folder` used for identification purposes
- [ ] Display uses `folder` or `name` appropriately
**Validation Commands:**
```bash
# Check JavaScript for folder-based lookups
grep -rn "\.folder\s*==\|folder.*identifier\|getByFolder" src/server/web/static/js/ --include="*.js"
# Check data attributes in templates
grep -rn "data-key\|data-folder\|data-series" src/server/web/templates/ --include="*.html"
```
---
### Phase 5: Database Operations
**Files to validate:**
1. **`src/server/database/models.py`**
- [ ] `AnimeSeries` model has `key` column with unique constraint
- [ ] `key` column is indexed
- [ ] Model docstring explains identifier convention
- [ ] `folder` column docstring states "metadata only"
- [ ] Validators check `key` is not empty
- [ ] No `folder` uniqueness constraint (unless intentional)
2. **`src/server/database/service.py`**
- [ ] `AnimeSeriesService` has `get_by_key()` method
- [ ] Class docstring explains lookup convention
- [ ] No `get_by_folder()` without deprecation
- [ ] All CRUD operations use `key` for identification
- [ ] Logging uses `key` in messages
**Validation Commands:**
```bash
# Check database models
grep -rn "unique=True\|index=True" src/server/database/models.py
# Check service lookups
grep -rn "get_by_key\|get_by_folder\|filter.*key\|filter.*folder" src/server/database/service.py
```
---
### Phase 6: WebSocket Events
**Files to validate:**
1. **All WebSocket broadcast calls** should include `key` in payload:
- `download_progress` → includes `key`
- `download_complete` → includes `key`
- `download_failed` → includes `key`
- `scan_progress` → includes `key` (where applicable)
- `queue_status` → items include `key`
2. **Message format validation:**
```json
{
"type": "download_progress",
"data": {
"key": "attack-on-titan", // PRIMARY - always present
"folder": "Attack on Titan (2013)", // OPTIONAL - display only
"progress": 45.5,
...
}
}
```
**Validation Commands:**
```bash
# Check WebSocket broadcast calls
grep -rn "broadcast.*key\|send_json.*key" src/server/services/ --include="*.py"
# Check message construction
grep -rn '"key":\|"folder":' src/server/services/ --include="*.py"
```
---
### Phase 7: Test Coverage
**Test files to validate:**
1. **`tests/unit/test_serie_class.py`**
- [ ] Tests for key validation (empty, whitespace, None)
- [ ] Tests for key as primary identifier
- [ ] Tests for folder as metadata only
2. **`tests/unit/test_anime_service.py`**
- [ ] Service tests use `key` for operations
- [ ] Mock objects have proper `key` attributes
3. **`tests/unit/test_database_models.py`**
- [ ] Tests for `key` uniqueness constraint
- [ ] Tests for `key` validation
4. **`tests/unit/test_database_service.py`**
- [ ] Tests for `get_by_key()` method
- [ ] No tests for deprecated folder lookups
5. **`tests/api/test_anime_endpoints.py`**
- [ ] API tests use `key` in requests
- [ ] Mock `FakeSerie` has proper `key` attribute
- [ ] Comments explain key/folder convention
6. **`tests/unit/test_websocket_service.py`**
- [ ] WebSocket tests verify `key` in messages
- [ ] Broadcast tests include `key` in payload
**Validation Commands:**
```bash
# Run all tests
conda run -n AniWorld python -m pytest tests/ -v --tb=short
# Run specific test files
conda run -n AniWorld python -m pytest tests/unit/test_serie_class.py -v
conda run -n AniWorld python -m pytest tests/unit/test_database_models.py -v
conda run -n AniWorld python -m pytest tests/api/test_anime_endpoints.py -v
# Search tests for identifier usage
grep -rn "key.*identifier\|folder.*metadata" tests/ --include="*.py"
```
---
## Common Issues to Check
### 1. Inconsistent Naming
Look for inconsistent parameter names:
- `serie_key` vs `series_key` vs `key`
- `serie_id` should refer to `key`, not database `id`
- `serie_folder` vs `folder`
### 2. Missing Documentation
Check that ALL models, services, and APIs document:
- What `key` is and how to use it
- That `folder` is metadata only
### 3. Legacy Code Patterns
Search for deprecated patterns:
```python
# Bad - using folder for lookup
series = get_by_folder(folder_name)
# Good - using key for lookup
series = get_by_key(series_key)
```
### 4. API Response Consistency
Verify all API responses include:
- `key` field (primary identifier)
- `folder` field (optional, for display)
### 5. Frontend Data Flow
Verify the frontend:
- Stores `key` in selection sets
- Passes `key` to API calls
- Uses `folder` only for display
---
## Deprecation Warnings
The following should have deprecation warnings (for removal in v3.0.0):
1. Any `get_by_folder()` or `GetByFolder()` methods
2. Any API endpoints that accept `folder` as a lookup parameter
3. Any frontend code that uses `folder` for identification
**Example deprecation:**
```python
import warnings
def get_by_folder(self, folder: str):
"""DEPRECATED: Use get_by_key() instead."""
warnings.warn(
"get_by_folder() is deprecated, use get_by_key(). "
"Will be removed in v3.0.0",
DeprecationWarning,
stacklevel=2
)
# ... implementation
```
---
## Automated Validation Script
Run this script to perform automated checks:
```bash
#!/bin/bash
# identifier_validation.sh
echo "=== Series Identifier Standardization Validation ==="
echo ""
echo "1. Checking core entities..."
grep -rn "PRIMARY IDENTIFIER\|metadata only" src/core/entities/ --include="*.py" | head -20
echo ""
echo "2. Checking for deprecated folder lookups..."
grep -rn "get_by_folder\|GetByFolder" src/ --include="*.py"
echo ""
echo "3. Checking API models for key field..."
grep -rn 'key.*Field\|Field.*key' src/server/models/ --include="*.py" | head -20
echo ""
echo "4. Checking database models..."
grep -rn "key.*unique\|key.*index" src/server/database/models.py
echo ""
echo "5. Checking frontend key usage..."
grep -rn "selectedSeries\|\.key\|data-key" src/server/web/static/js/ --include="*.js" | head -20
echo ""
echo "6. Running tests..."
conda run -n AniWorld python -m pytest tests/unit/test_serie_class.py -v --tb=short
echo ""
echo "=== Validation Complete ==="
```
---
## Expected Results
After validation, you should confirm:
1. ✅ All core entities use `key` as primary identifier
2. ✅ All services look up series by `key`
3. ✅ All API endpoints use `key` for operations
4. ✅ All database queries use `key` for lookups
5. ✅ Frontend uses `key` for selection and API calls
6. ✅ WebSocket events include `key` in payload
7. ✅ All tests pass
8. ✅ Documentation clearly explains the convention
9. ✅ Deprecation warnings exist for legacy patterns
---
## Sign-off
Once validation is complete, update this section:
- [x] Phase 1: Core Entities - Validated by: **AI Agent** Date: **28 Nov 2025**
- [x] Phase 2: Services - Validated by: **AI Agent** Date: **28 Nov 2025**
- [ ] Phase 3: API - Validated by: **\_\_\_** Date: **\_\_\_**
- [ ] Phase 4: Frontend - Validated by: **\_\_\_** Date: **\_\_\_**
- [ ] Phase 5: Database - Validated by: **\_\_\_** Date: **\_\_\_**
- [ ] Phase 6: WebSocket - Validated by: **\_\_\_** Date: **\_\_\_**
- [ ] Phase 7: Tests - Validated by: **\_\_\_** Date: **\_\_\_**
**Final Approval:** \***\*\*\*\*\***\_\_\_\***\*\*\*\*\*** Date: **\*\***\_**\*\***