- Add _hits/_misses counters to GeoCache for cache hit/miss ratio tracking - Reset counters on clear() - Count hits before misses in lookup_batch() to avoid interleaving - Add synchronous prewarm() using asyncio.create_task for fire-and-forget - Add hits/misses fields to GeoCacheStatsResponse model - Add TestCacheMetrics (5 tests), TestPrewarm (3 tests), TestLargeBanList (2 tests) - Fix _make_async_db() mock: db.execute is not async, returns ctx manager - Move collections.abc to TYPE_CHECKING block (TC003) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
26 KiB
Issue #15: MEDIUM - N+1 Query Pattern in Geo Lookups
Where found:
backend/app/services/ban_service.py(lines 650-680)_enrich_bans_with_geo()calls resolver per IP in loop- 1000 bans = 1000 geo lookups
Why this is needed: Dashboard becomes slow with many bans:
- Each geo lookup hits database or HTTP API
- 1000 bans = 1000 lookups = seconds of latency
- API rate limiting exceeded if fallback HTTP enabled
Goal: Batch geo lookups and implement caching to reduce database round-trips.
What to do:
- Implement batch geo resolution:
async def enrich_bans_with_geo(bans): # Extract unique IPs unique_ips = set(ban.ip for ban in bans) # Batch lookup geo_data = await self.geo_cache.resolve_batch(unique_ips) # Attach to bans for ban in bans: ban.country = geo_data.get(ban.ip) - Use geo_cache for persistence (cache is already designed for batching)
- Add metrics for cache hit/miss ratio
- Implement pre-warming of common IPs
- Add tests with large ban lists
Possible traps and issues:
- Batch lookup might be slower than individual if dataset is small
- Cache invalidation on country name updates
- Memory usage if caching all IPs
Docs changes needed:
- Add performance optimization guide
- Document geo cache architecture
Doc references:
- DETAILED_FINDINGS.md - Issue #7 "N+1 Query Pattern"
Issue #16: MEDIUM - Silent Failures in Error Handling (Broad Exception Handlers)
Where found:
backend/app/routers/config_misc.py(line 54) -except Exception:backend/app/ban_service.py- Multiple broad exception handlers- Silent failures hide programming errors
Why this is needed:
Broad except Exception: catches programming errors (AttributeError, KeyError) alongside legitimate errors, hiding bugs in logs.
Goal: Catch only specific exceptions; let programming errors bubble up to global error handler.
What to do:
- Replace broad handlers with specific exceptions:
try: config = parse_config(raw_text) except ConfigParseError as e: # Specific logger.error(f"Config parse failed: {e}") except FileNotFoundError as e: logger.error(f"Config file not found: {e}") except Exception as e: logger.exception("Unexpected error parsing config") raise # Re-raise to global handler - Create domain-specific exception classes
- Document what exceptions each function can raise
- Update tests to verify exception handling
Possible traps and issues:
- Missing exception types will let errors bubble up unexpectedly
- Catching too few exceptions leads to uncaught errors
- Global exception handler needed to catch unhandled exceptions
Docs changes needed:
- Add exception handling guidelines to dev docs
- Create exception taxonomy document
Doc references:
- DETAILED_FINDINGS.md - Issue #16 "Broad Exception Handlers"
Issue #17: MEDIUM - No API Response Status Code Documentation
Where found:
- All routers lack OpenAPI
responses={}documentation - Status codes for success/failure not documented
- Frontend must infer from response body
Why this is needed: Frontend doesn't know:
- Is 400 a validation error or configuration error?
- Is 502 from backend or fail2ban?
- Which 503 status means setup incomplete vs fail2ban down?
Goal: Document all possible status codes and response formats for each endpoint.
What to do:
- Add to each router endpoint:
@router.post( "/login", responses={ 200: {"description": "Login successful", "model": LoginResponse}, 400: {"description": "Invalid request format"}, 401: {"description": "Invalid password"}, 429: {"description": "Too many attempts, retry after 60s"}, 503: {"description": "Setup not complete"}, } ) - Generate OpenAPI schema with descriptions
- Update API docs with status code reference table
- Validate in CI that all endpoints documented
Possible traps and issues:
- Documentation might become stale as code changes
- Multiple response types for single status code (must document each)
Docs changes needed:
- Create API reference documenting all status codes
- Add endpoint documentation template
Doc references:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "2.3 Missing Status Code Documentation"
Issue #18: MEDIUM - Configuration Validation Missing at Startup
Where found:
backend/app/config.py(lines 37-95)database_pathhas no validationfail2ban_socketnot verified to exist- Hard-coded paths assumed in Docker
Why this is needed: Configuration errors not caught at startup:
- Database path doesn't exist - fails on first DB operation (confusing error)
- fail2ban socket wrong path - only fails when health check runs
- Directory not writable - discovered hours after deployment
Goal: Validate all configuration at startup with clear error messages.
What to do:
- Add validators to config fields:
@field_validator("database_path") def validate_db_path(cls, v): path = Path(v) parent = path.parent if not parent.exists(): raise ValueError( f"Database parent directory does not exist: {parent}\n" f"Create it with: mkdir -p {parent}" ) if not os.access(parent, os.W_OK): raise ValueError( f"Database directory not writable: {parent}\n" f"Fix with: chmod 755 {parent}" ) return v - Validate fail2ban socket exists and is readable
- Verify session secret is sufficiently long
- Check environment variables are set
- Provide actionable error messages
Possible traps and issues:
- Validation might be too strict for some deployments
- Need to handle cases where files don't exist yet but will be created
- Docker initialization order might delay file creation
Docs changes needed:
- Document required directories and permissions
- Create setup validation troubleshooting guide
Doc references:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "5.2 Missing Configuration Validation"
Issue #19: MEDIUM - Sensitive Data Could Leak in Logs
Where found:
backend/app/utils/fail2ban_client.py(line 148) - Logs subprocess output without sanitization- Could contain passwords, API keys from config files
Why this is needed: If subprocess output contains secrets, logs become security liability and violate compliance requirements.
Goal: Sanitize logs to remove sensitive information patterns.
What to do:
- Create sanitizer function:
def sanitize_for_logging(text: str) -> str: # Remove passwords text = re.sub(r'password[=:]\S+', 'password=***', text, flags=re.IGNORECASE) # Remove API keys text = re.sub(r'api[_-]?key[=:]\S+', 'api_key=***', text, flags=re.IGNORECASE) # Remove tokens text = re.sub(r'token[=:]\S+', 'token=***', text, flags=re.IGNORECASE) return text - Apply to all subprocess output and external responses
- Add to logging middleware
- Audit existing logs for sensitive data
Possible traps and issues:
- Patterns might miss some sensitive data formats
- Over-sanitization might hide helpful debug info
- Performance cost if sanitizing large outputs
Docs changes needed:
- Add logging best practices guide
- Document what's sanitized
Doc references:
- DETAILED_FINDINGS.md - Issue #25 "Sensitive Data in Logs"
Issue #20: MEDIUM - No Correlation ID in Background Tasks
Where found:
- All task files in
backend/app/tasks/ - Background tasks don't propagate correlation ID
- Can't correlate task logs with triggering request
Why this is needed: Troubleshooting becomes hard:
- Task fails
- Logs show task name
- Can't find what triggered it
Goal: Track correlation ID through entire request lifecycle including background tasks.
What to do:
- Use contextvars for correlation ID:
from contextvars import ContextVar correlation_id_var: ContextVar[str] = ContextVar('correlation_id', default='bg-task') async def blocklist_import_task(source_id: str, correlation_id: str): token = correlation_id_var.set(correlation_id) try: logger.info(f"Starting import for source {source_id}") finally: correlation_id_var.reset(token) - Pass correlation ID to background tasks
- Include in structured logs
- Create task tracking UI showing correlation ID
Possible traps and issues:
- Correlation ID must flow through all async contexts
- Need to pass ID when scheduling tasks
- Multiple nested tasks might have parent/child correlation IDs
Docs changes needed:
- Add observability guide for background tasks
- Document correlation ID format
Doc references:
- DETAILED_FINDINGS.md - Issue #26 "Missing Correlation ID"
Issue #21: MEDIUM - Missing Database Indexes for Performance
Where found:
- fail2ban's
bans.timeofbancolumn likely has no index - BanGUI queries that filter by timeofban + jail do full table scan
- Large datasets cause slow dashboard loads
Why this is needed: Without indexes on commonly filtered columns, query performance degrades as dataset grows.
Goal: Add appropriate indexes to improve query performance.
What to do:
- Verify index exists on fail2ban database:
CREATE INDEX IF NOT EXISTS idx_bans_timeofban_jail ON bans(timeofban DESC, jail); - Check BanGUI database for missing indexes
- Add to history_archive:
CREATE INDEX IF NOT EXISTS idx_history_jail_timeofban ON history_archive(jail, timeofban DESC); - Monitor query performance before/after
- Add index creation to migration system
Possible traps and issues:
- Creating indexes on large tables locks table (need CONCURRENT on PostgreSQL)
- Wrong index type slows queries instead of helping
- Too many indexes slow writes
Docs changes needed:
- Document query optimization patterns
- Performance tuning guide
Doc references:
- DETAILED_FINDINGS.md - Issue #10 "Missing DB Index"
Issue #22: MEDIUM - Socket Cleanup Errors Silently Suppressed
Where found:
backend/app/utils/fail2ban_client.py(line 150)- Uses
contextlib.suppress(Exception)hiding close errors - Resource leaks possible
Why this is needed: Suppressing all errors hides real problems:
- Socket never actually closes
- Connection pool exhaustion
- File descriptors leak
Goal: Log errors properly while still cleaning up resources.
What to do:
- Replace suppress with logging:
finally: try: await socket.close() except Exception as e: logger.warning(f"Error closing fail2ban socket: {e}", exc_info=True) - Audit other resource cleanup (database connections, HTTP sessions)
- Write tests that verify cleanup happens even on errors
Possible traps and issues:
- Close might raise unexpected errors
- Logging exceptions in finally blocks can cause issues
- Resource exhaustion hard to debug if not logging
Docs changes needed:
- Add resource cleanup guidelines to dev guide
Doc references:
- DETAILED_FINDINGS.md - Issue #14 "Socket Cleanup Silent Fail"
Issue #23: MEDIUM - Missing Default Configuration Documentation
Where found:
.env.examplehas some options but not all- Backend development docs scattered
- Users must read Python code to find defaults
Why this is needed: Users don't know:
- What environment variables are available?
- What are the defaults?
- What values are valid?
Goal: Create comprehensive configuration reference documentation.
What to do:
- Create
Docs/CONFIGURATION.mdwith complete table:| Variable | Type | Default | Description | |----------|------|---------|-------------| | BANGUI_DATABASE_PATH | string | /data/bangui.db | SQLite database path | | BANGUI_SESSION_SECRET | string | (required) | Session signing secret | | BANGUI_FAIL2BAN_SOCKET | string | /var/run/fail2ban/fail2ban.sock | fail2ban socket | - Document each option with valid values and constraints
- Organize by section (database, security, performance, etc.)
- Cross-reference in README and deployment docs
Possible traps and issues:
- Documentation can become stale as config options change
- Too much detail makes it hard to find what's needed
Docs changes needed:
- Create comprehensive configuration reference
- Update README to link to it
- Add to API documentation
Doc references:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "5.1, 5.5, 5.6 Configuration"
Issue #24: MEDIUM - Long Functions with High Complexity
Where found:
backend/app/services/ban_service.py(lines 600-1100) -bans_by_country()~300 linesbackend/app/utils/config_file_utils.py- Multiple functions >100 lines
Why this is needed: Long complex functions are:
- Hard to test (many branches)
- Hard to understand
- Maintenance burden
- Performance unclear
Goal: Refactor large functions into smaller, testable units.
What to do:
- Identify functions >100 lines
- Break into smaller functions, each with single responsibility:
async def bans_by_country(self): # Load data bans = await self._load_bans_paginated() # Aggregate by_country = self._aggregate_by_country(bans) # Enrich enriched = await self._enrich_with_geo(by_country) # Sort and return return self._sort_by_count(enriched) - Each smaller function is testable independently
- Add unit tests for each piece
Possible traps and issues:
- Breaking up functions might expose bugs that were hidden
- Performance might change (profile before/after)
- Error handling complexity might increase
Docs changes needed:
- Add code complexity guidelines to style guide
Doc references:
- DETAILED_FINDINGS.md - Issue #19 "Long Functions"
Issue #25: MEDIUM - Incomplete Type Hints in Error Handling
Where found:
backend/app/main.py(line 283)- Error metadata uses
dict[str, str | int | list[str]]instead of TypedDict
Why this is needed: Generic types don't enable proper type narrowing in exception handlers. Code can't safely access error fields.
Goal: Use TypedDict for type-safe error responses.
What to do:
- Define error response types:
class ErrorResponse(TypedDict): error_id: str timestamp: int message: str tracebacks: list[str] correlation_id: str - Use in exception handlers
- Type checker can verify correct field access
Possible traps and issues:
- TypedDict is Python 3.8+ only
- Need to maintain multiple error response types
Docs changes needed:
- Add type safety guidelines
Doc references:
- DETAILED_FINDINGS.md - Issue #21 "Incomplete Type Hints"
Issue #26: MEDIUM - Hardcoded Constants Not Configurable
Where found:
backend/app/utils/constants.py- MAX_PAGE_SIZE = 1000
- BLOCKLIST_PREVIEW_MAX_LINES = 100
- HISTORY_RETENTION_DAYS = 90
Why this is needed: Different deployments have different needs:
- Large deployment might want smaller pages
- User might want different preview size
- Some want longer history retention
Goal: Make limits configurable via environment variables.
What to do:
- Move constants to config:
class Settings(BaseSettings): max_page_size: int = Field(default=1000, env="BANGUI_MAX_PAGE_SIZE") blocklist_preview_max_lines: int = Field(default=100, env="BANGUI_PREVIEW_MAX_LINES") history_retention_days: int = Field(default=90, env="BANGUI_HISTORY_RETENTION") - Validate ranges (max_page_size > 0, < 10000)
- Update .env.example with all options
- Document in configuration guide
Possible traps and issues:
- Too many configuration options can be overwhelming
- Some limits have dependencies (page_size < max_records)
Docs changes needed:
- Add to configuration reference
Doc references:
- DETAILED_FINDINGS.md - Issue #20 "Hardcoded Constants"
Issue #27: MEDIUM - Inconsistent Error Handling Patterns
Where found:
ban_service.py- Raises exceptionsserver_service.py- Returns defaults silentlyjail_service.py- Mixed approach
Why this is needed: Different services have different error handling contracts. Callers don't know what to expect.
Goal: Establish clear error handling contract for all services.
What to do:
- Document error handling patterns:
class ServiceErrorContract: """ ABORT_ON_ERROR: Raise exception, let router handle RETURN_DEFAULT: Return empty result, log warning PARTIAL_RESULT: Return partial success with error list """ - Each service method documents which pattern it follows
- Routers handle errors consistently
- Update service protocols
Possible traps and issues:
- Users of service must know pattern
- Switching patterns is breaking change
Docs changes needed:
- Create service development guide with error patterns
Doc references:
- DETAILED_FINDINGS.md - Issue "Inconsistent Error Handling"
LOWER PRIORITY ISSUES (LOW-MEDIUM)
Issue #28: LOW-MEDIUM - Missing Pre-Commit Hooks
Where found:
- No
.pre-commit-config.yaml - Docs mention husky but no
.husky/directory
Why this is needed: Without pre-commit hooks, developers commit code that fails linting/tests, slowing down CI.
Goal: Enforce code quality checks before commit.
What to do:
- Create
.pre-commit-config.yaml:repos: - repo: https://github.com/pre-commit/pre-commit-hooks hooks: - id: check-yaml - id: end-of-file-fixer - repo: https://github.com/astral-sh/ruff-pre-commit hooks: - id: ruff - id: ruff-format - Setup husky for frontend
- Document in CONTRIBUTING.md
Docs changes needed:
- Create CONTRIBUTING.md with setup instructions
Doc references:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "12.1 Missing Pre-Commit Hooks"
Issue #29: LOW-MEDIUM - Missing .editorconfig
Where found:
- No
.editorconfigfile
Why this is needed: Different developers use different editors with different default formatting, causing inconsistent code.
Goal: Enforce consistent formatting across all editors.
What to do:
- Create
.editorconfig:root = true [*] charset = utf-8 end_of_line = lf insert_final_newline = true [*.py] indent_style = space indent_size = 4 [*.{js,ts,tsx,jsx}] indent_style = space indent_size = 2 - Add editorconfig plugin to IDE guides
Docs changes needed:
- Add to development setup instructions
Doc references:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "12.3 Missing .editorconfig"
Issue #30: LOW-MEDIUM - IPv4-Mapped IPv6 Address Duplicates
Where found:
backend/app/utils/ip_utils.py- Treats "192.168.1.1" and "::ffff:192.168.1.1" as different IPs
Why this is needed: Same IP can be banned twice in different formats, causing:
- Duplicate ban logs
- Geo cache duplicates
- Analytics skewed
Goal: Normalize IP addresses to canonical form.
What to do:
- Add normalization:
def normalize_ip(ip_str: str) -> str: ip = ipaddress.ip_address(ip_str) # Convert IPv4-mapped IPv6 to IPv4 if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped: return str(ip.ipv4_mapped) return str(ip) - Apply on all IP inputs (ban, import, etc.)
- Test with various formats
Docs changes needed:
- Document IP normalization
Doc references:
- DETAILED_FINDINGS.md - Issue #22 "IPv4-Mapped IPv6"
Issue #31: LOW-MEDIUM - Weak Master Password Validation
Where found:
backend/app/models/setup.py(line 22)- Requires uppercase, digit, special char but no dictionary check
Why this is needed: Passwords can still be weak (e.g., "Password1!" is common).
Goal: Prevent common passwords.
What to do:
- Add common passwords list or library:
import common_passwords @field_validator("password") def validate_password(cls, v): if v.lower() in common_passwords.PASSWORDS: raise ValueError("Password is too common, choose another") return v - Test against known weak passwords
Docs changes needed:
- Document password requirements
Doc references:
- DETAILED_FINDINGS.md - Issue #23 "Weak Password Validation"
Issue #32: LOW-MEDIUM - Missing Accessibility Features
Where found:
frontend/src/components/BanTable.tsx- No aria-label on tablefrontend/src/pages/HistoryPage.tsx- Button has tabIndex but no onKeyDown handler- World map missing alt text
Why this is needed: Application not usable by screen reader users or keyboard-only navigation.
Goal: Improve accessibility to WCAG AA compliance.
What to do:
- Add ARIA labels to major components
- Implement keyboard navigation handlers
- Test with screen readers
- Check color contrast ratios
Docs changes needed:
- Add accessibility guidelines
Doc references:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "11 Accessibility"
Issue #33: LOW - Missing Architecture Decision Records (ADRs)
Where found:
- No
Docs/adr/directory
Why this is needed: New developers don't understand architectural choices, recreate debates, make wrong assumptions.
Goal: Document important decisions and their rationale.
What to do:
- Create
Docs/adr/directory - Add ADRs for major decisions:
- Why SQLite instead of PostgreSQL?
- Why FastAPI instead of Django?
- Why React instead of Vue?
- Why APScheduler instead of Celery?
- Why single-instance scheduler?
Docs changes needed:
- Create ADR template and examples
Doc references:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "8.5 Missing ADRs"
Issue #34: LOW - No Troubleshooting Guide
Where found:
- Missing
Docs/TROUBLESHOOTING.md
Why this is needed: Users can't self-serve on common issues, create issues instead.
Goal: Document common problems and solutions.
What to do:
- Create
Docs/TROUBLESHOOTING.mdwith:- "502 Bad Gateway" - backend is down or not ready
- "Permission denied" - database directory not writable
- "fail2ban not found" - socket path wrong
- "Geo lookups empty" - GeoLite2 database missing
- "Rate limited (429)" - too many requests
- Expand based on real user issues
Docs changes needed:
- Create comprehensive troubleshooting guide
Doc references:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "8.3 No Troubleshooting"
Issue #35: LOW - Missing Upgrade/Migration Guide
Where found:
- No
Docs/UPGRADING.md
Why this is needed: Users don't know how to safely upgrade without losing data.
Goal: Document upgrade process and breaking changes.
What to do:
- Create
Docs/UPGRADING.mdwith:- Backup procedure
- Breaking changes for each version
- Step-by-step upgrade procedure
- Rollback procedure if something goes wrong
Docs changes needed:
- Create upgrade guide for each major version
Doc references:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "8.5 No Migration Guide"
Issue #36: LOW - No Backup Strategy Documented
Where found:
- No backup procedure in deployment docs
- No automated backup in Docker Compose
Why this is needed: Users don't know how to protect their data.
Goal: Document and automate database backups.
What to do:
- Create
Docs/BACKUP_RESTORE.md - Add backup script to Docker
- Document retention policy
- Document restore procedure
Docs changes needed:
- Create backup & restore guide
Doc references:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "10.4 No Backup Strategy"
Issue #37: LOW - Missing CONTRIBUTING.md
Where found:
fail2ban-master/CONTRIBUTING.mdis from fail2ban, not BanGUI
Why this is needed: Contributors don't know project guidelines.
Goal: Document contribution guidelines.
What to do:
- Create
CONTRIBUTING.mdwith:- Development setup
- Branch naming conventions
- PR requirements
- Code style guidelines
- Testing requirements
- PR review process
Docs changes needed:
- Create CONTRIBUTING.md
Doc references:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "12.5 No CONTRIBUTING.md"
Issue #38: LOW - No Test Coverage Minimum Enforced
Where found:
backend/pyproject.toml- Coverage report generated but no minimum threshold- CI doesn't fail on low coverage
Why this is needed: Code quality can degrade as coverage drops.
Goal: Enforce minimum test coverage.
What to do:
- Set minimum coverage threshold in CI (e.g., 80%)
- Fail build if coverage drops below threshold
- Add coverage badge to README
Docs changes needed:
- Document testing requirements
Doc references:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "12.6 Test Coverage Not Enforced"
DOCUMENTATION GAPS (Cross-Cutting)
Issue #39: DOCUMENTATION - Missing API Reference
Files affected: All routers
Create: Comprehensive API reference documenting:
- All endpoints
- Request/response formats
- Status codes
- Examples
References:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "8.1 Missing API Documentation"
Issue #40: DOCUMENTATION - Missing Deployment Best Practices
Files affected: Docs/Deployment.md, Docker configuration
Create/Update:
- Security best practices
- Performance tuning
- Monitoring setup
- Scaling guidelines
References:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "6 Build & Deployment"
Issue #41: DOCUMENTATION - Missing Database Schema Documentation
Create: Document:
- All tables and their purpose
- Relationships and constraints
- Indexes and performance notes
- Migration history
References:
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "1 Database Design"