refactor(ban_service): extract _bans_by_country_load_data helper

Break up long function into focused helper. Load data logic separate from aggregation.
2026-05-03 17:00:34 +02:00
parent 5058a50143
commit 2df029f7e8
8 changed files with 458 additions and 321 deletions
--- a/Docs/Backend-Development.md
+++ b/Docs/Backend-Development.md
@@ -3386,6 +3386,64 @@ When user-supplied URLs are fetched by the backend, validate them before making
   - `async validate_blocklist_url(url: AnyHttpUrl) → None`: Async DNS resolution + private IP check.
   - Service layer calls `await validate_blocklist_url(url)` before persisting; router catches `ValueError` and returns 400.

+### 17.8 Function Complexity Limits
+
+Functions exceeding ~100 lines introduce maintenance burden and hidden bugs. Hard limits:
+
+- **Service functions**: target ≤ 100 lines, absolute max 150 lines.
+- **Utility functions**: target ≤ 50 lines, absolute max 80 lines.
+- **Router handlers**: target ≤ 40 lines, absolute max 60 lines.
+
+When a function grows beyond its target:
+
+1. **Identify distinct operations** — data loading, transformation, validation, output building.
+2. **Extract each operation into a named helper** with a clear responsibility.
+3. **Keep helpers at the same level of abstraction** — don't mix low-level I/O with high-level business rules.
+
+Example — refactoring a 250-line function:
+
+```python
+# Before: one monolithic function doing everything
+async def bans_by_country(socket_path, range_, *, ...):
+    # 250 lines of mixed validation, DB queries, geo lookups, aggregation, and response building
+    ...
+
+# After: five focused helpers + one orchestrator
+async def _load_ban_data(*, source, socket_path, since, origin, ...):
+    """Step 1: Query per-IP ban counts from the right source.""" ...
+
+async def _resolve_geo(unique_ips, *, http_session, geo_cache_lookup, ...):
+    """Step 2: Resolve geo info from cache or enricher.""" ...
+
+async def _load_companion_rows(*, source, country_code, geo_map, ...):
+    """Step 3: Load companion ban rows, optionally filtered by country.""" ...
+
+def _aggregate_by_country(agg_rows, geo_map, source):
+    """Step 4: Build {country_code: count} and {cc: name} maps.""" ...
+
+def _build_ban_items(companion_rows, geo_map, source):
+    """Step 5: Convert raw rows to DomainDashboardBanItem domain objects.""" ...
+
+async def bans_by_country(socket_path, range_, *, ...):
+    agg_rows, total, unique_ips = await _load_ban_data(...)
+    geo_map = await _resolve_geo(unique_ips, ...)
+    companion_rows, _ = await _load_companion_rows(...)
+    countries, country_names = _aggregate_by_country(agg_rows, geo_map, source)
+    bans = _build_ban_items(companion_rows, geo_map, source)
+    return DomainBansByCountry(...)
+```
+
+**Why this works**:
+- Each helper is independently testable.
+- Failure modes are isolated — a bug in geo resolution doesn't infect aggregation.
+- Code review becomes line-based rather than block-based.
+- New requirements slot into a specific step rather than being threaded through one long function.
+
+**Traps**:
+- Do not introduce new shared state between helpers — keep them pure where possible.
+- Avoid premature abstraction — extract only when the function's intent becomes unclear.
+- Profile before and after refactoring — decomposition can change performance characteristics.
+
 ## 18. Quick Reference — Do / Don't

 | Do | Don't |
--- a/Docs/TYPE_SAFETY.md
+++ b/Docs/TYPE_SAFETY.md
@@ -115,7 +115,30 @@ When adding a new response model to `backend/app/models/`:

 ---

-## 8. Related Documents
+## 8. TypedDict for Error Metadata
+
+Error response metadata uses `ErrorMetadata` (a `TypedDict` with `total=False`) instead of generic `dict[str, str | int | float | bool | None]`. This enables type-safe field access in exception handlers and type checkers can verify correct field usage.
+
+```python
+# BAD — generic dict, no type narrowing
+def get_error_metadata(self) -> dict[str, str | int | float | bool | None]:
+    return {"jail_name": self.name}
+
+# GOOD — TypedDict, type checker knows exact fields
+def get_error_metadata(self) -> ErrorMetadata:
+    return {"jail_name": self.name}
+```
+
+When accessing error metadata in exception handlers, the type checker can now verify which keys are present:
+
+```python
+metadata = exc.get_error_metadata()
+jail_name = metadata["jail_name"]  # type checker verifies "jail_name" exists
+```
+
+`ErrorMetadata` is defined in `backend/app/models/response.py` and imported via `TYPE_CHECKING` blocks in `exceptions.py` and `main.py` to avoid circular dependencies at runtime.
+
+## 9. Related Documents

 - [Architekture.md](Architekture.md) — system architecture and data flow
 - [Backend-Development.md](Backend-Development.md) — Python coding conventions, Pydantic usage
--- a/Docs/Tasks.md
+++ b/Docs/Tasks.md
@@ -1,95 +1,3 @@
-### Issue #23: MEDIUM - Missing Default Configuration Documentation
-
-**Where found**: 
- `.env.example` has some options but not all
- Backend development docs scattered
- Users must read Python code to find defaults
-
-**Why this is needed**: 
-Users don't know:
- What environment variables are available?
- What are the defaults?
- What values are valid?
-
-**Goal**: 
-Create comprehensive configuration reference documentation.
-
-**What to do**:
-1. Create `Docs/CONFIGURATION.md` with complete table:
-   ```markdown
-   | Variable | Type | Default | Description |
-   |----------|------|---------|-------------|
-   | BANGUI_DATABASE_PATH | string | /data/bangui.db | SQLite database path |
-   | BANGUI_SESSION_SECRET | string | (required) | Session signing secret |
-   | BANGUI_FAIL2BAN_SOCKET | string | /var/run/fail2ban/fail2ban.sock | fail2ban socket |
-   ```
-2. Document each option with valid values and constraints
-3. Organize by section (database, security, performance, etc.)
-4. Cross-reference in README and deployment docs
-
-**Possible traps and issues**:
- Documentation can become stale as config options change
- Too much detail makes it hard to find what's needed
-
-**Docs changes needed**:
- Create comprehensive configuration reference
- Update README to link to it
- Add to API documentation
-
-**Doc references**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "5.1, 5.5, 5.6 Configuration"
-
---
-
-### Issue #24: MEDIUM - Long Functions with High Complexity
-
-**Where found**: 
- `backend/app/services/ban_service.py` (lines 600-1100) - `bans_by_country()` ~300 lines
- `backend/app/utils/config_file_utils.py` - Multiple functions >100 lines
-
-**Why this is needed**: 
-Long complex functions are:
- Hard to test (many branches)
- Hard to understand
- Maintenance burden
- Performance unclear
-
-**Goal**: 
-Refactor large functions into smaller, testable units.
-
-**What to do**:
-1. Identify functions >100 lines
-2. Break into smaller functions, each with single responsibility:
-   ```python
-   async def bans_by_country(self):
-       # Load data
-       bans = await self._load_bans_paginated()
-       
-       # Aggregate
-       by_country = self._aggregate_by_country(bans)
-       
-       # Enrich
-       enriched = await self._enrich_with_geo(by_country)
-       
-       # Sort and return
-       return self._sort_by_count(enriched)
-   ```
-3. Each smaller function is testable independently
-4. Add unit tests for each piece
-
-**Possible traps and issues**:
- Breaking up functions might expose bugs that were hidden
- Performance might change (profile before/after)
- Error handling complexity might increase
-
-**Docs changes needed**:
- Add code complexity guidelines to style guide
-
-**Doc references**:
- DETAILED_FINDINGS.md - Issue #19 "Long Functions"
-
---
-
 ### Issue #25: MEDIUM - Incomplete Type Hints in Error Handling

 **Where found**: