Files
BanGUI/Docs/Tasks.md
Lukas 7ad885d276 refactor: separate config service from jail config service
- Split config_service.py into config_service.py and jail_config_service.py
- Update Docs/Tasks.md, Security.md, TROUBLESHOOTING.md
2026-05-03 01:05:18 +02:00

911 lines
24 KiB
Markdown

### Issue #16: MEDIUM - Silent Failures in Error Handling (Broad Exception Handlers)
**Where found**:
- `backend/app/routers/config_misc.py` (line 54) - `except Exception:`
- `backend/app/ban_service.py` - Multiple broad exception handlers
- Silent failures hide programming errors
**Why this is needed**:
Broad `except Exception:` catches programming errors (AttributeError, KeyError) alongside legitimate errors, hiding bugs in logs.
**Goal**:
Catch only specific exceptions; let programming errors bubble up to global error handler.
**What to do**:
1. Replace broad handlers with specific exceptions:
```python
try:
config = parse_config(raw_text)
except ConfigParseError as e: # Specific
logger.error(f"Config parse failed: {e}")
except FileNotFoundError as e:
logger.error(f"Config file not found: {e}")
except Exception as e:
logger.exception("Unexpected error parsing config")
raise # Re-raise to global handler
```
2. Create domain-specific exception classes
3. Document what exceptions each function can raise
4. Update tests to verify exception handling
**Possible traps and issues**:
- Missing exception types will let errors bubble up unexpectedly
- Catching too few exceptions leads to uncaught errors
- Global exception handler needed to catch unhandled exceptions
**Docs changes needed**:
- Add exception handling guidelines to dev docs
- Create exception taxonomy document
**Doc references**:
- DETAILED_FINDINGS.md - Issue #16 "Broad Exception Handlers"
---
### Issue #17: MEDIUM - No API Response Status Code Documentation
**Where found**:
- All routers lack OpenAPI `responses={}` documentation
- Status codes for success/failure not documented
- Frontend must infer from response body
**Why this is needed**:
Frontend doesn't know:
- Is 400 a validation error or configuration error?
- Is 502 from backend or fail2ban?
- Which 503 status means setup incomplete vs fail2ban down?
**Goal**:
Document all possible status codes and response formats for each endpoint.
**What to do**:
1. Add to each router endpoint:
```python
@router.post(
"/login",
responses={
200: {"description": "Login successful", "model": LoginResponse},
400: {"description": "Invalid request format"},
401: {"description": "Invalid password"},
429: {"description": "Too many attempts, retry after 60s"},
503: {"description": "Setup not complete"},
}
)
```
2. Generate OpenAPI schema with descriptions
3. Update API docs with status code reference table
4. Validate in CI that all endpoints documented
**Possible traps and issues**:
- Documentation might become stale as code changes
- Multiple response types for single status code (must document each)
**Docs changes needed**:
- Create API reference documenting all status codes
- Add endpoint documentation template
**Doc references**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "2.3 Missing Status Code Documentation"
---
### Issue #18: MEDIUM - Configuration Validation Missing at Startup
**Where found**:
- `backend/app/config.py` (lines 37-95)
- `database_path` has no validation
- `fail2ban_socket` not verified to exist
- Hard-coded paths assumed in Docker
**Why this is needed**:
Configuration errors not caught at startup:
- Database path doesn't exist - fails on first DB operation (confusing error)
- fail2ban socket wrong path - only fails when health check runs
- Directory not writable - discovered hours after deployment
**Goal**:
Validate all configuration at startup with clear error messages.
**What to do**:
1. Add validators to config fields:
```python
@field_validator("database_path")
def validate_db_path(cls, v):
path = Path(v)
parent = path.parent
if not parent.exists():
raise ValueError(
f"Database parent directory does not exist: {parent}\n"
f"Create it with: mkdir -p {parent}"
)
if not os.access(parent, os.W_OK):
raise ValueError(
f"Database directory not writable: {parent}\n"
f"Fix with: chmod 755 {parent}"
)
return v
```
2. Validate fail2ban socket exists and is readable
3. Verify session secret is sufficiently long
4. Check environment variables are set
5. Provide actionable error messages
**Possible traps and issues**:
- Validation might be too strict for some deployments
- Need to handle cases where files don't exist yet but will be created
- Docker initialization order might delay file creation
**Docs changes needed**:
- Document required directories and permissions
- Create setup validation troubleshooting guide
**Doc references**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "5.2 Missing Configuration Validation"
---
### Issue #19: MEDIUM - Sensitive Data Could Leak in Logs
**Where found**:
- `backend/app/utils/fail2ban_client.py` (line 148) - Logs subprocess output without sanitization
- Could contain passwords, API keys from config files
**Why this is needed**:
If subprocess output contains secrets, logs become security liability and violate compliance requirements.
**Goal**:
Sanitize logs to remove sensitive information patterns.
**What to do**:
1. Create sanitizer function:
```python
def sanitize_for_logging(text: str) -> str:
# Remove passwords
text = re.sub(r'password[=:]\S+', 'password=***', text, flags=re.IGNORECASE)
# Remove API keys
text = re.sub(r'api[_-]?key[=:]\S+', 'api_key=***', text, flags=re.IGNORECASE)
# Remove tokens
text = re.sub(r'token[=:]\S+', 'token=***', text, flags=re.IGNORECASE)
return text
```
2. Apply to all subprocess output and external responses
3. Add to logging middleware
4. Audit existing logs for sensitive data
**Possible traps and issues**:
- Patterns might miss some sensitive data formats
- Over-sanitization might hide helpful debug info
- Performance cost if sanitizing large outputs
**Docs changes needed**:
- Add logging best practices guide
- Document what's sanitized
**Doc references**:
- DETAILED_FINDINGS.md - Issue #25 "Sensitive Data in Logs"
---
### Issue #20: MEDIUM - No Correlation ID in Background Tasks
**Where found**:
- All task files in `backend/app/tasks/`
- Background tasks don't propagate correlation ID
- Can't correlate task logs with triggering request
**Why this is needed**:
Troubleshooting becomes hard:
- Task fails
- Logs show task name
- Can't find what triggered it
**Goal**:
Track correlation ID through entire request lifecycle including background tasks.
**What to do**:
1. Use contextvars for correlation ID:
```python
from contextvars import ContextVar
correlation_id_var: ContextVar[str] = ContextVar('correlation_id', default='bg-task')
async def blocklist_import_task(source_id: str, correlation_id: str):
token = correlation_id_var.set(correlation_id)
try:
logger.info(f"Starting import for source {source_id}")
finally:
correlation_id_var.reset(token)
```
2. Pass correlation ID to background tasks
3. Include in structured logs
4. Create task tracking UI showing correlation ID
**Possible traps and issues**:
- Correlation ID must flow through all async contexts
- Need to pass ID when scheduling tasks
- Multiple nested tasks might have parent/child correlation IDs
**Docs changes needed**:
- Add observability guide for background tasks
- Document correlation ID format
**Doc references**:
- DETAILED_FINDINGS.md - Issue #26 "Missing Correlation ID"
---
### Issue #21: MEDIUM - Missing Database Indexes for Performance
**Where found**:
- fail2ban's `bans.timeofban` column likely has no index
- BanGUI queries that filter by timeofban + jail do full table scan
- Large datasets cause slow dashboard loads
**Why this is needed**:
Without indexes on commonly filtered columns, query performance degrades as dataset grows.
**Goal**:
Add appropriate indexes to improve query performance.
**What to do**:
1. Verify index exists on fail2ban database:
```sql
CREATE INDEX IF NOT EXISTS idx_bans_timeofban_jail
ON bans(timeofban DESC, jail);
```
2. Check BanGUI database for missing indexes
3. Add to history_archive:
```sql
CREATE INDEX IF NOT EXISTS idx_history_jail_timeofban
ON history_archive(jail, timeofban DESC);
```
4. Monitor query performance before/after
5. Add index creation to migration system
**Possible traps and issues**:
- Creating indexes on large tables locks table (need CONCURRENT on PostgreSQL)
- Wrong index type slows queries instead of helping
- Too many indexes slow writes
**Docs changes needed**:
- Document query optimization patterns
- Performance tuning guide
**Doc references**:
- DETAILED_FINDINGS.md - Issue #10 "Missing DB Index"
---
### Issue #22: MEDIUM - Socket Cleanup Errors Silently Suppressed
**Where found**:
- `backend/app/utils/fail2ban_client.py` (line 150)
- Uses `contextlib.suppress(Exception)` hiding close errors
- Resource leaks possible
**Why this is needed**:
Suppressing all errors hides real problems:
- Socket never actually closes
- Connection pool exhaustion
- File descriptors leak
**Goal**:
Log errors properly while still cleaning up resources.
**What to do**:
1. Replace suppress with logging:
```python
finally:
try:
await socket.close()
except Exception as e:
logger.warning(f"Error closing fail2ban socket: {e}", exc_info=True)
```
2. Audit other resource cleanup (database connections, HTTP sessions)
3. Write tests that verify cleanup happens even on errors
**Possible traps and issues**:
- Close might raise unexpected errors
- Logging exceptions in finally blocks can cause issues
- Resource exhaustion hard to debug if not logging
**Docs changes needed**:
- Add resource cleanup guidelines to dev guide
**Doc references**:
- DETAILED_FINDINGS.md - Issue #14 "Socket Cleanup Silent Fail"
---
### Issue #23: MEDIUM - Missing Default Configuration Documentation
**Where found**:
- `.env.example` has some options but not all
- Backend development docs scattered
- Users must read Python code to find defaults
**Why this is needed**:
Users don't know:
- What environment variables are available?
- What are the defaults?
- What values are valid?
**Goal**:
Create comprehensive configuration reference documentation.
**What to do**:
1. Create `Docs/CONFIGURATION.md` with complete table:
```markdown
| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| BANGUI_DATABASE_PATH | string | /data/bangui.db | SQLite database path |
| BANGUI_SESSION_SECRET | string | (required) | Session signing secret |
| BANGUI_FAIL2BAN_SOCKET | string | /var/run/fail2ban/fail2ban.sock | fail2ban socket |
```
2. Document each option with valid values and constraints
3. Organize by section (database, security, performance, etc.)
4. Cross-reference in README and deployment docs
**Possible traps and issues**:
- Documentation can become stale as config options change
- Too much detail makes it hard to find what's needed
**Docs changes needed**:
- Create comprehensive configuration reference
- Update README to link to it
- Add to API documentation
**Doc references**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "5.1, 5.5, 5.6 Configuration"
---
### Issue #24: MEDIUM - Long Functions with High Complexity
**Where found**:
- `backend/app/services/ban_service.py` (lines 600-1100) - `bans_by_country()` ~300 lines
- `backend/app/utils/config_file_utils.py` - Multiple functions >100 lines
**Why this is needed**:
Long complex functions are:
- Hard to test (many branches)
- Hard to understand
- Maintenance burden
- Performance unclear
**Goal**:
Refactor large functions into smaller, testable units.
**What to do**:
1. Identify functions >100 lines
2. Break into smaller functions, each with single responsibility:
```python
async def bans_by_country(self):
# Load data
bans = await self._load_bans_paginated()
# Aggregate
by_country = self._aggregate_by_country(bans)
# Enrich
enriched = await self._enrich_with_geo(by_country)
# Sort and return
return self._sort_by_count(enriched)
```
3. Each smaller function is testable independently
4. Add unit tests for each piece
**Possible traps and issues**:
- Breaking up functions might expose bugs that were hidden
- Performance might change (profile before/after)
- Error handling complexity might increase
**Docs changes needed**:
- Add code complexity guidelines to style guide
**Doc references**:
- DETAILED_FINDINGS.md - Issue #19 "Long Functions"
---
### Issue #25: MEDIUM - Incomplete Type Hints in Error Handling
**Where found**:
- `backend/app/main.py` (line 283)
- Error metadata uses `dict[str, str | int | list[str]]` instead of TypedDict
**Why this is needed**:
Generic types don't enable proper type narrowing in exception handlers. Code can't safely access error fields.
**Goal**:
Use TypedDict for type-safe error responses.
**What to do**:
1. Define error response types:
```python
class ErrorResponse(TypedDict):
error_id: str
timestamp: int
message: str
tracebacks: list[str]
correlation_id: str
```
2. Use in exception handlers
3. Type checker can verify correct field access
**Possible traps and issues**:
- TypedDict is Python 3.8+ only
- Need to maintain multiple error response types
**Docs changes needed**:
- Add type safety guidelines
**Doc references**:
- DETAILED_FINDINGS.md - Issue #21 "Incomplete Type Hints"
---
### Issue #26: MEDIUM - Hardcoded Constants Not Configurable
**Where found**:
- `backend/app/utils/constants.py`
- MAX_PAGE_SIZE = 1000
- BLOCKLIST_PREVIEW_MAX_LINES = 100
- HISTORY_RETENTION_DAYS = 90
**Why this is needed**:
Different deployments have different needs:
- Large deployment might want smaller pages
- User might want different preview size
- Some want longer history retention
**Goal**:
Make limits configurable via environment variables.
**What to do**:
1. Move constants to config:
```python
class Settings(BaseSettings):
max_page_size: int = Field(default=1000, env="BANGUI_MAX_PAGE_SIZE")
blocklist_preview_max_lines: int = Field(default=100, env="BANGUI_PREVIEW_MAX_LINES")
history_retention_days: int = Field(default=90, env="BANGUI_HISTORY_RETENTION")
```
2. Validate ranges (max_page_size > 0, < 10000)
3. Update .env.example with all options
4. Document in configuration guide
**Possible traps and issues**:
- Too many configuration options can be overwhelming
- Some limits have dependencies (page_size < max_records)
**Docs changes needed**:
- Add to configuration reference
**Doc references**:
- DETAILED_FINDINGS.md - Issue #20 "Hardcoded Constants"
---
### Issue #27: MEDIUM - Inconsistent Error Handling Patterns
**Where found**:
- `ban_service.py` - Raises exceptions
- `server_service.py` - Returns defaults silently
- `jail_service.py` - Mixed approach
**Why this is needed**:
Different services have different error handling contracts. Callers don't know what to expect.
**Goal**:
Establish clear error handling contract for all services.
**What to do**:
1. Document error handling patterns:
```python
class ServiceErrorContract:
"""
ABORT_ON_ERROR: Raise exception, let router handle
RETURN_DEFAULT: Return empty result, log warning
PARTIAL_RESULT: Return partial success with error list
"""
```
2. Each service method documents which pattern it follows
3. Routers handle errors consistently
4. Update service protocols
**Possible traps and issues**:
- Users of service must know pattern
- Switching patterns is breaking change
**Docs changes needed**:
- Create service development guide with error patterns
**Doc references**:
- DETAILED_FINDINGS.md - Issue "Inconsistent Error Handling"
---
## LOWER PRIORITY ISSUES (LOW-MEDIUM)
---
### Issue #28: LOW-MEDIUM - Missing Pre-Commit Hooks
**Where found**:
- No `.pre-commit-config.yaml`
- Docs mention husky but no `.husky/` directory
**Why this is needed**:
Without pre-commit hooks, developers commit code that fails linting/tests, slowing down CI.
**Goal**:
Enforce code quality checks before commit.
**What to do**:
1. Create `.pre-commit-config.yaml`:
```yaml
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
hooks:
- id: check-yaml
- id: end-of-file-fixer
- repo: https://github.com/astral-sh/ruff-pre-commit
hooks:
- id: ruff
- id: ruff-format
```
2. Setup husky for frontend
3. Document in CONTRIBUTING.md
**Docs changes needed**:
- Create CONTRIBUTING.md with setup instructions
**Doc references**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "12.1 Missing Pre-Commit Hooks"
---
### Issue #29: LOW-MEDIUM - Missing .editorconfig
**Where found**:
- No `.editorconfig` file
**Why this is needed**:
Different developers use different editors with different default formatting, causing inconsistent code.
**Goal**:
Enforce consistent formatting across all editors.
**What to do**:
1. Create `.editorconfig`:
```ini
root = true
[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
[*.py]
indent_style = space
indent_size = 4
[*.{js,ts,tsx,jsx}]
indent_style = space
indent_size = 2
```
2. Add editorconfig plugin to IDE guides
**Docs changes needed**:
- Add to development setup instructions
**Doc references**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "12.3 Missing .editorconfig"
---
### Issue #30: LOW-MEDIUM - IPv4-Mapped IPv6 Address Duplicates
**Where found**:
- `backend/app/utils/ip_utils.py`
- Treats "192.168.1.1" and "::ffff:192.168.1.1" as different IPs
**Why this is needed**:
Same IP can be banned twice in different formats, causing:
- Duplicate ban logs
- Geo cache duplicates
- Analytics skewed
**Goal**:
Normalize IP addresses to canonical form.
**What to do**:
1. Add normalization:
```python
def normalize_ip(ip_str: str) -> str:
ip = ipaddress.ip_address(ip_str)
# Convert IPv4-mapped IPv6 to IPv4
if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped:
return str(ip.ipv4_mapped)
return str(ip)
```
2. Apply on all IP inputs (ban, import, etc.)
3. Test with various formats
**Docs changes needed**:
- Document IP normalization
**Doc references**:
- DETAILED_FINDINGS.md - Issue #22 "IPv4-Mapped IPv6"
---
### Issue #31: LOW-MEDIUM - Weak Master Password Validation
**Where found**:
- `backend/app/models/setup.py` (line 22)
- Requires uppercase, digit, special char but no dictionary check
**Why this is needed**:
Passwords can still be weak (e.g., "Password1!" is common).
**Goal**:
Prevent common passwords.
**What to do**:
1. Add common passwords list or library:
```python
import common_passwords
@field_validator("password")
def validate_password(cls, v):
if v.lower() in common_passwords.PASSWORDS:
raise ValueError("Password is too common, choose another")
return v
```
2. Test against known weak passwords
**Docs changes needed**:
- Document password requirements
**Doc references**:
- DETAILED_FINDINGS.md - Issue #23 "Weak Password Validation"
---
### Issue #32: LOW-MEDIUM - Missing Accessibility Features
**Where found**:
- `frontend/src/components/BanTable.tsx` - No aria-label on table
- `frontend/src/pages/HistoryPage.tsx` - Button has tabIndex but no onKeyDown handler
- World map missing alt text
**Why this is needed**:
Application not usable by screen reader users or keyboard-only navigation.
**Goal**:
Improve accessibility to WCAG AA compliance.
**What to do**:
1. Add ARIA labels to major components
2. Implement keyboard navigation handlers
3. Test with screen readers
4. Check color contrast ratios
**Docs changes needed**:
- Add accessibility guidelines
**Doc references**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "11 Accessibility"
---
### Issue #33: LOW - Missing Architecture Decision Records (ADRs)
**Where found**:
- No `Docs/adr/` directory
**Why this is needed**:
New developers don't understand architectural choices, recreate debates, make wrong assumptions.
**Goal**:
Document important decisions and their rationale.
**What to do**:
1. Create `Docs/adr/` directory
2. Add ADRs for major decisions:
- Why SQLite instead of PostgreSQL?
- Why FastAPI instead of Django?
- Why React instead of Vue?
- Why APScheduler instead of Celery?
- Why single-instance scheduler?
**Docs changes needed**:
- Create ADR template and examples
**Doc references**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "8.5 Missing ADRs"
---
### Issue #34: LOW - No Troubleshooting Guide
**Where found**:
- Missing `Docs/TROUBLESHOOTING.md`
**Why this is needed**:
Users can't self-serve on common issues, create issues instead.
**Goal**:
Document common problems and solutions.
**What to do**:
1. Create `Docs/TROUBLESHOOTING.md` with:
- "502 Bad Gateway" - backend is down or not ready
- "Permission denied" - database directory not writable
- "fail2ban not found" - socket path wrong
- "Geo lookups empty" - GeoLite2 database missing
- "Rate limited (429)" - too many requests
2. Expand based on real user issues
**Docs changes needed**:
- Create comprehensive troubleshooting guide
**Doc references**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "8.3 No Troubleshooting"
---
### Issue #35: LOW - Missing Upgrade/Migration Guide
**Where found**:
- No `Docs/UPGRADING.md`
**Why this is needed**:
Users don't know how to safely upgrade without losing data.
**Goal**:
Document upgrade process and breaking changes.
**What to do**:
1. Create `Docs/UPGRADING.md` with:
- Backup procedure
- Breaking changes for each version
- Step-by-step upgrade procedure
- Rollback procedure if something goes wrong
**Docs changes needed**:
- Create upgrade guide for each major version
**Doc references**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "8.5 No Migration Guide"
---
### Issue #36: LOW - No Backup Strategy Documented
**Where found**:
- No backup procedure in deployment docs
- No automated backup in Docker Compose
**Why this is needed**:
Users don't know how to protect their data.
**Goal**:
Document and automate database backups.
**What to do**:
1. Create `Docs/BACKUP_RESTORE.md`
2. Add backup script to Docker
3. Document retention policy
4. Document restore procedure
**Docs changes needed**:
- Create backup & restore guide
**Doc references**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "10.4 No Backup Strategy"
---
### Issue #37: LOW - Missing CONTRIBUTING.md
**Where found**:
- `fail2ban-master/CONTRIBUTING.md` is from fail2ban, not BanGUI
**Why this is needed**:
Contributors don't know project guidelines.
**Goal**:
Document contribution guidelines.
**What to do**:
1. Create `CONTRIBUTING.md` with:
- Development setup
- Branch naming conventions
- PR requirements
- Code style guidelines
- Testing requirements
- PR review process
**Docs changes needed**:
- Create CONTRIBUTING.md
**Doc references**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "12.5 No CONTRIBUTING.md"
---
### Issue #38: LOW - No Test Coverage Minimum Enforced
**Where found**:
- `backend/pyproject.toml` - Coverage report generated but no minimum threshold
- CI doesn't fail on low coverage
**Why this is needed**:
Code quality can degrade as coverage drops.
**Goal**:
Enforce minimum test coverage.
**What to do**:
1. Set minimum coverage threshold in CI (e.g., 80%)
2. Fail build if coverage drops below threshold
3. Add coverage badge to README
**Docs changes needed**:
- Document testing requirements
**Doc references**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "12.6 Test Coverage Not Enforced"
---
## DOCUMENTATION GAPS (Cross-Cutting)
---
### Issue #39: DOCUMENTATION - Missing API Reference
**Files affected**: All routers
**Create**: Comprehensive API reference documenting:
- All endpoints
- Request/response formats
- Status codes
- Examples
**References**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "8.1 Missing API Documentation"
---
### Issue #40: DOCUMENTATION - Missing Deployment Best Practices
**Files affected**: `Docs/Deployment.md`, Docker configuration
**Create/Update**:
- Security best practices
- Performance tuning
- Monitoring setup
- Scaling guidelines
**References**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "6 Build & Deployment"
---
### Issue #41: DOCUMENTATION - Missing Database Schema Documentation
**Create**: Document:
- All tables and their purpose
- Relationships and constraints
- Indexes and performance notes
- Migration history
**References**:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "1 Database Design"