refactor(backend): clean up jail service, add error handling service

- Extract jail status/processing to helper functions
- Add error_handling.py service for centralized error handling
- Update config.py with validation and defaults
- Update .env.example with all config options
- Remove obsolete Tasks.md, add Service-Development.md
- Minor fixes across routers and services

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit is contained in:
2026-05-03 17:40:37 +02:00
parent 2df029f7e8
commit 2f9fc8076d
15 changed files with 332 additions and 154 deletions

View File

@@ -1672,18 +1672,50 @@ async def get_jail(...) -> JailDetailResponse:
- Map domain exceptions to HTTP status codes via FastAPI **exception handlers** registered on the app.
- Always log errors with context before raising.
```python
class JailNotFoundError(Exception):
def __init__(self, name: str) -> None:
self.name: str = name
super().__init__(f"Jail '{name}' not found")
### Service Error Contracts
# In main.py
@app.exception_handler(JailNotFoundError)
async def jail_not_found_handler(request: Request, exc: JailNotFoundError) -> JSONResponse:
return JSONResponse(status_code=404, content={"detail": f"Jail '{exc.name}' not found"})
Each service method must document which error handling pattern it follows. This
lets callers know what to expect without reading the implementation. See
`Docs/Service-Development.md` for the full guide.
**ABORT_ON_ERROR** — Raise an exception, let the router handle it. Used for:
auth, writes, state changes, any operation where partial success is meaningless.
**RETURN_DEFAULT** — Return empty result and log warning. Never raises. Used for:
informational reads where infrastructure unavailability should not block the UI.
**PARTIAL_RESULT** — Return a result that contains both successful items and a
list of errors. Caller decides what to do with each.
```python
async def get_settings(socket_path: str) -> DomainServerSettingsResult:
"""Return current fail2ban server-level settings.
Error contract: RETURN_DEFAULT. Returns DomainServerSettingsResult with
default values if socket is unreachable. Never raises.
"""
...
async def start_jail(socket_path: str, name: str) -> None:
"""Start a stopped fail2ban jail.
Error contract: ABORT_ON_ERROR. Raises JailNotFoundError (404),
JailOperationError (409), Fail2BanConnectionError (503).
"""
...
```
```python
class ServiceErrorContract:
"""
ABORT_ON_ERROR: Raise exception, let router handle
RETURN_DEFAULT: Return empty result, log warning
PARTIAL_RESULT: Return partial success with error list
"""
```
The error contract enum and helper are in `app.services.error_handling`.
### Routers and Exception Propagation
- **Routers must NOT construct `HTTPException` for domain errors** — let domain exceptions propagate.

View File

@@ -128,6 +128,18 @@ Per-IP rate limits applied to API endpoints.
---
## Pagination & Display Limits
Configurable limits that affect API response sizes and data retention.
| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `BANGUI_MAX_PAGE_SIZE` | int | `500` | Maximum records returned per paginated API response. Individual endpoints may further limit this. Must be 110000. |
| `BANGUI_PREVIEW_MAX_LINES` | int | `100` | Maximum IP lines returned in a blocklist source preview. Must be ≥ 1. |
| `BANGUI_HISTORY_RETENTION_DAYS` | int | `90` | Number of days historical ban records are retained before archival cleanup. Must be ≥ 1. |
---
## Observability
| Variable | Type | Default | Description |

115
Docs/Service-Development.md Normal file
View File

@@ -0,0 +1,115 @@
# Service Development Guide
How to write and maintain services in BanGUI.
## Error Handling Contracts
Every service method must document which error handling pattern it follows.
This lets callers know what to expect without reading the implementation.
### The Three Patterns
```python
from app.services.error_handling import ABORT_ON_ERROR, RETURN_DEFAULT, PARTIAL_RESULT
```
**ABORT_ON_ERROR** — Raise an exception, let the router convert it to HTTP.
Used for: auth, writes, state changes, any operation where partial success is meaningless.
```python
async def start_jail(socket_path: str, name: str) -> None:
"""Start a stopped fail2ban jail.
Error contract: ABORT_ON_ERROR. Raises JailNotFoundError (404),
JailOperationError (409), Fail2BanConnectionError (503).
"""
...
```
**RETURN_DEFAULT** — Return empty result and log warning. Never raises.
Used for: informational reads (list, get) where infrastructure unavailability
should not block the UI.
```python
async def get_settings(socket_path: str) -> DomainServerSettingsResult:
"""Return current fail2ban server-level settings.
Error contract: RETURN_DEFAULT. Returns DomainServerSettingsResult
with default values if socket is unreachable. Never raises.
"""
...
```
**PARTIAL_RESULT** — Return (result, errors) tuple. Errors collected, not raised.
Used for: batch operations on collections where one item failing does not
invalidate the rest.
```python
# Not yet used in codebase; define as needed for batch operations.
```
### When to Use Which
| Operation type | Pattern |
|---------------|---------|
| Auth / session | ABORT_ON_ERROR |
| Write / state change | ABORT_ON_ERROR |
| Config updates | ABORT_ON_ERROR |
| Single-item read (jail, ban) | ABORT_ON_ERROR |
| Multi-item read (list) | RETURN_DEFAULT |
| Server settings read | RETURN_DEFAULT |
| Batch / parallel fetch | PARTIAL_RESULT |
### Changing Patterns
Switching a method's error contract is a **breaking change**. Update the docstring,
add a changelog entry, and bump the major version if this is a public API.
## Service Structure
Services live in `backend/app/services/`. They contain **no** HTTP/FastAPI concerns.
```
app/services/
ban_service.py # ban/unban, ban history queries
jail_service.py # jail lifecycle, ignore lists
server_service.py # server-level settings
geo_service.py # geolocation
...
error_handling.py # contract definitions
protocols.py # Protocol interfaces for DI
```
## Protocols
Each service has a corresponding protocol in `protocols.py` for dependency injection.
Protocol methods include the error contract in their docstring:
```python
class JailService(Protocol):
async def list_jails(self, socket_path: str) -> DomainJailList:
"""Error contract: ABORT_ON_ERROR."""
...
```
## Router Error Handling
Routers must not catch and silently swallow exceptions from services using
ABORT_ON_ERROR unless they convert to a specific HTTP response.
Let domain exceptions propagate — the global exception handlers handle them.
Exception handler registration (in `main.py`):
- `DomainError` → JSON error response
- `Fail2BanConnectionError` → HTTP 503
- `JailNotFoundError` → HTTP 404
## Logging
Log at the service layer using structlog:
```python
log.info("jail_started", jail=name)
log.warning("socket_unreachable_using_default", socket_path=socket_path)
```
Never log sensitive data (tokens, passwords, IPs in full).

View File

@@ -1,81 +1,3 @@
### Issue #25: MEDIUM - Incomplete Type Hints in Error Handling
**Where found**:
- `backend/app/main.py` (line 283)
- Error metadata uses `dict[str, str | int | list[str]]` instead of TypedDict
**Why this is needed**:
Generic types don't enable proper type narrowing in exception handlers. Code can't safely access error fields.
**Goal**:
Use TypedDict for type-safe error responses.
**What to do**:
1. Define error response types:
```python
class ErrorResponse(TypedDict):
error_id: str
timestamp: int
message: str
tracebacks: list[str]
correlation_id: str
```
2. Use in exception handlers
3. Type checker can verify correct field access
**Possible traps and issues**:
- TypedDict is Python 3.8+ only
- Need to maintain multiple error response types
**Docs changes needed**:
- Add type safety guidelines
**Doc references**:
- DETAILED_FINDINGS.md - Issue #21 "Incomplete Type Hints"
---
### Issue #26: MEDIUM - Hardcoded Constants Not Configurable
**Where found**:
- `backend/app/utils/constants.py`
- MAX_PAGE_SIZE = 1000
- BLOCKLIST_PREVIEW_MAX_LINES = 100
- HISTORY_RETENTION_DAYS = 90
**Why this is needed**:
Different deployments have different needs:
- Large deployment might want smaller pages
- User might want different preview size
- Some want longer history retention
**Goal**:
Make limits configurable via environment variables.
**What to do**:
1. Move constants to config:
```python
class Settings(BaseSettings):
max_page_size: int = Field(default=1000, env="BANGUI_MAX_PAGE_SIZE")
blocklist_preview_max_lines: int = Field(default=100, env="BANGUI_PREVIEW_MAX_LINES")
history_retention_days: int = Field(default=90, env="BANGUI_HISTORY_RETENTION")
```
2. Validate ranges (max_page_size > 0, < 10000)
3. Update .env.example with all options
4. Document in configuration guide
**Possible traps and issues**:
- Too many configuration options can be overwhelming
- Some limits have dependencies (page_size < max_records)
**Docs changes needed**:
- Add to configuration reference
**Doc references**:
- DETAILED_FINDINGS.md - Issue #20 "Hardcoded Constants"
---
### Issue #27: MEDIUM - Inconsistent Error Handling Patterns
**Where found**: