refactoring-backend #3
80
.github/workflows/ci.yml
vendored
Normal file
80
.github/workflows/ci.yml
vendored
Normal file
@@ -0,0 +1,80 @@
|
||||
name: CI
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [main]
|
||||
pull_request:
|
||||
branches: [main]
|
||||
|
||||
jobs:
|
||||
backend:
|
||||
name: Backend Tests
|
||||
runs-on: ubuntu-latest
|
||||
defaults:
|
||||
run:
|
||||
working-directory: backend
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python
|
||||
uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: "3.12"
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
python -m pip install --upgrade pip
|
||||
pip install -e ".[dev]"
|
||||
|
||||
- name: Run tests with coverage
|
||||
run: pytest --cov=app --cov-report=term-missing --cov-fail-under=80
|
||||
|
||||
- name: Upload coverage report
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: coverage-report
|
||||
path: backend/htmlcov/
|
||||
retention-days: 7
|
||||
|
||||
ruff:
|
||||
name: Lint
|
||||
runs-on: ubuntu-latest
|
||||
defaults:
|
||||
run:
|
||||
working-directory: backend
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python
|
||||
uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: "3.12"
|
||||
|
||||
- name: Install dependencies
|
||||
run: pip install ruff
|
||||
|
||||
- name: Run ruff
|
||||
run: ruff check .
|
||||
|
||||
mypy:
|
||||
name: Type Check
|
||||
runs-on: ubuntu-latest
|
||||
defaults:
|
||||
run:
|
||||
working-directory: backend
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python
|
||||
uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: "3.12"
|
||||
|
||||
- name: Install dependencies
|
||||
run: pip install -e ".[dev]"
|
||||
|
||||
- name: Run mypy
|
||||
run: mypy app
|
||||
@@ -104,10 +104,11 @@ BanGUI/
|
||||
# Backend
|
||||
cd backend && pytest --cov=app --cov-report=term-missing
|
||||
|
||||
# Frontend — run once
|
||||
cd frontend && npm test
|
||||
# Coverage threshold: 80%. Build fails if coverage drops below.
|
||||
```
|
||||
|
||||
The CI pipeline enforces the same 80% minimum coverage threshold.
|
||||
|
||||
---
|
||||
|
||||
## Stack
|
||||
|
||||
896
Docs/Tasks.md
896
Docs/Tasks.md
@@ -1,153 +1,3 @@
|
||||
### Issue #32: LOW-MEDIUM - Missing Accessibility Features
|
||||
|
||||
**Status**: ✅ Done (code already compliant; docs expanded)
|
||||
|
||||
**Code review findings**:
|
||||
- `BanTable.tsx` line 244: `aria-label="Ban records table"` ✅
|
||||
- `HistoryPage.tsx` lines 148-159: `<span role="button" tabIndex={0} onKeyDown={...}>` ✅
|
||||
- `WorldMap.tsx` lines 327/366: `role="img"` + descriptive `aria-label` on both wrapper and `<svg>` ✅
|
||||
- All zoom buttons have `aria-label` ✅
|
||||
- Country `<path>` elements have `role="button"`, `tabIndex`, `aria-label`, `aria-pressed` ✅
|
||||
|
||||
**Docs change**: Web-Development.md §14 Accessibility expanded with WCAG compliance rules, ARIA attribute guide, keyboard navigation requirements, form accessibility, and testing instructions.
|
||||
|
||||
---
|
||||
|
||||
### Issue #33: LOW - Missing Architecture Decision Records (ADRs)
|
||||
|
||||
**Where found**:
|
||||
- No `Docs/adr/` directory
|
||||
|
||||
**Why this is needed**:
|
||||
New developers don't understand architectural choices, recreate debates, make wrong assumptions.
|
||||
|
||||
**Goal**:
|
||||
Document important decisions and their rationale.
|
||||
|
||||
**What to do**:
|
||||
1. Create `Docs/adr/` directory
|
||||
2. Add ADRs for major decisions:
|
||||
- Why SQLite instead of PostgreSQL?
|
||||
- Why FastAPI instead of Django?
|
||||
- Why React instead of Vue?
|
||||
- Why APScheduler instead of Celery?
|
||||
- Why single-instance scheduler?
|
||||
|
||||
**Docs changes needed**:
|
||||
- Create ADR template and examples
|
||||
|
||||
**Doc references**:
|
||||
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "8.5 Missing ADRs"
|
||||
|
||||
---
|
||||
|
||||
### Issue #34: LOW - No Troubleshooting Guide
|
||||
|
||||
**Where found**:
|
||||
- Missing `Docs/TROUBLESHOOTING.md`
|
||||
|
||||
**Why this is needed**:
|
||||
Users can't self-serve on common issues, create issues instead.
|
||||
|
||||
**Goal**:
|
||||
Document common problems and solutions.
|
||||
|
||||
**What to do**:
|
||||
1. Create `Docs/TROUBLESHOOTING.md` with:
|
||||
- "502 Bad Gateway" - backend is down or not ready
|
||||
- "Permission denied" - database directory not writable
|
||||
- "fail2ban not found" - socket path wrong
|
||||
- "Geo lookups empty" - GeoLite2 database missing
|
||||
- "Rate limited (429)" - too many requests
|
||||
2. Expand based on real user issues
|
||||
|
||||
**Docs changes needed**:
|
||||
- Create comprehensive troubleshooting guide
|
||||
|
||||
**Doc references**:
|
||||
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "8.3 No Troubleshooting"
|
||||
|
||||
---
|
||||
|
||||
### Issue #35: LOW - Missing Upgrade/Migration Guide
|
||||
|
||||
**Where found**:
|
||||
- No `Docs/UPGRADING.md`
|
||||
|
||||
**Why this is needed**:
|
||||
Users don't know how to safely upgrade without losing data.
|
||||
|
||||
**Goal**:
|
||||
Document upgrade process and breaking changes.
|
||||
|
||||
**What to do**:
|
||||
1. Create `Docs/UPGRADING.md` with:
|
||||
- Backup procedure
|
||||
- Breaking changes for each version
|
||||
- Step-by-step upgrade procedure
|
||||
- Rollback procedure if something goes wrong
|
||||
|
||||
**Docs changes needed**:
|
||||
- Create upgrade guide for each major version
|
||||
|
||||
**Doc references**:
|
||||
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "8.5 No Migration Guide"
|
||||
|
||||
---
|
||||
|
||||
### Issue #36: LOW - No Backup Strategy Documented
|
||||
|
||||
**Where found**:
|
||||
- No backup procedure in deployment docs
|
||||
- No automated backup in Docker Compose
|
||||
|
||||
**Why this is needed**:
|
||||
Users don't know how to protect their data.
|
||||
|
||||
**Goal**:
|
||||
Document and automate database backups.
|
||||
|
||||
**What to do**:
|
||||
1. Create `Docs/BACKUP_RESTORE.md`
|
||||
2. Add backup script to Docker
|
||||
3. Document retention policy
|
||||
4. Document restore procedure
|
||||
|
||||
**Docs changes needed**:
|
||||
- Create backup & restore guide
|
||||
|
||||
**Doc references**:
|
||||
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "10.4 No Backup Strategy"
|
||||
|
||||
---
|
||||
|
||||
### Issue #37: LOW - Missing CONTRIBUTING.md
|
||||
|
||||
**Where found**:
|
||||
- `fail2ban-master/CONTRIBUTING.md` is from fail2ban, not BanGUI
|
||||
|
||||
**Why this is needed**:
|
||||
Contributors don't know project guidelines.
|
||||
|
||||
**Goal**:
|
||||
Document contribution guidelines.
|
||||
|
||||
**What to do**:
|
||||
1. Create `CONTRIBUTING.md` with:
|
||||
- Development setup
|
||||
- Branch naming conventions
|
||||
- PR requirements
|
||||
- Code style guidelines
|
||||
- Testing requirements
|
||||
- PR review process
|
||||
|
||||
**Docs changes needed**:
|
||||
- Create CONTRIBUTING.md
|
||||
|
||||
**Doc references**:
|
||||
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "12.5 No CONTRIBUTING.md"
|
||||
|
||||
---
|
||||
|
||||
### Issue #38: LOW - No Test Coverage Minimum Enforced
|
||||
|
||||
@@ -174,10 +24,6 @@ Enforce minimum test coverage.
|
||||
|
||||
---
|
||||
|
||||
## DOCUMENTATION GAPS (Cross-Cutting)
|
||||
|
||||
---
|
||||
|
||||
### Issue #39: DOCUMENTATION - Missing API Reference
|
||||
|
||||
**Files affected**: All routers
|
||||
@@ -217,4 +63,744 @@ Enforce minimum test coverage.
|
||||
- Migration history
|
||||
|
||||
**References**:
|
||||
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "1 Database Design"
|
||||
- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "1 Database Design"
|
||||
|
||||
---
|
||||
|
||||
### Issue #42: CRITICAL - Single-Worker Constraint Not Enforced at Startup
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/main.py` – `create_app()` factory has no worker-count validation
|
||||
- `backend/app/utils/runtime_state.py` – documents single-process requirement but never asserts it
|
||||
|
||||
**Why this is needed**:
|
||||
In-memory structures (session cache, RuntimeState, rate-limit windows) are process-local. Running more than one Uvicorn worker silently causes each worker to diverge on shared state, leading to stale rate limits, ghost sessions, and inconsistent server status.
|
||||
|
||||
**Goal**:
|
||||
Fail loudly at startup when a multi-worker configuration is detected, preventing silent data corruption.
|
||||
|
||||
**What to do**:
|
||||
1. On app startup, detect `WEB_CONCURRENCY` / `--workers` > 1 and raise a `RuntimeError` with a clear message.
|
||||
2. Add an explicit assertion in `create_app()` guarded by the config value.
|
||||
3. Document the single-worker requirement prominently in `Docs/Deployment.md`.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- Gunicorn passes worker count via env; Uvicorn may not set it — check both.
|
||||
- Testing frameworks may fork workers; ensure the check is skipped in test mode.
|
||||
|
||||
**Docs changes needed**:
|
||||
- `Docs/Deployment.md`: add "Single-Worker Requirement" section with rationale.
|
||||
|
||||
**Doc references**:
|
||||
- `backend/app/utils/runtime_state.py` top-of-file comment
|
||||
|
||||
---
|
||||
|
||||
### Issue #43: CRITICAL - Rate Limiting Is Process-Local
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/middleware/rate_limit.py:35-107` – global rate limiter uses an in-process sliding window
|
||||
- `backend/app/routers/bans.py:42-97` – per-endpoint rate limiting also process-local
|
||||
|
||||
**Why this is needed**:
|
||||
With N workers an attacker can send up to N × limit requests before any single worker triggers the limit, effectively multiplying the allowed request rate.
|
||||
|
||||
**Goal**:
|
||||
Either enforce single-worker (Issue #42) as a prerequisite and document the limitation, or replace the in-process store with a shared backend (e.g., Redis).
|
||||
|
||||
**What to do**:
|
||||
1. Short-term: Block multi-worker deployments (Issue #42); add a warning log on startup stating rate limiting is process-local.
|
||||
2. Long-term: Abstract the rate-limit store behind an interface so a Redis adapter can be swapped in without touching middleware logic.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- Introducing Redis adds an operational dependency; consider making it optional with a feature flag.
|
||||
- Shared counters need atomic increment semantics (use `INCR` + `EXPIRE` in Redis, not GET+SET).
|
||||
|
||||
**Docs changes needed**:
|
||||
- `Docs/Deployment.md`: document rate-limiting scope and its dependency on single-worker mode.
|
||||
|
||||
**Doc references**:
|
||||
- `backend/app/middleware/rate_limit.py` module docstring
|
||||
|
||||
---
|
||||
|
||||
### Issue #44: CRITICAL - Session Cache Not Invalidated Across Workers on Logout
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/dependencies.py:100-115` – cache is populated per process, never broadcast to siblings
|
||||
|
||||
**Why this is needed**:
|
||||
After logout the revoked session token lives in other workers' caches until TTL expires. Any request routed to a worker that still has the token cached will be accepted.
|
||||
|
||||
**Goal**:
|
||||
Ensure session revocation is immediately visible to all processes handling requests.
|
||||
|
||||
**What to do**:
|
||||
1. Short-term: Enforce single-worker (Issue #42).
|
||||
2. Long-term: Store session cache in a shared layer (Redis / database) and invalidate atomically on logout.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- Cache reads must remain fast; a synchronous DB lookup on every request defeats the purpose.
|
||||
- Consider a hybrid: cache positive results for a short TTL, never cache negative results.
|
||||
|
||||
**Docs changes needed**:
|
||||
- `Docs/Deployment.md`: document session cache behavior and invalidation guarantees.
|
||||
|
||||
**Doc references**:
|
||||
- `backend/app/config.py` – `session_cache_enabled` field description
|
||||
|
||||
---
|
||||
|
||||
### Issue #45: HIGH - Session Cache Not Invalidated on Login
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/routers/auth.py:184-195` – logout invalidates the cache entry; login does not
|
||||
|
||||
**Why this is needed**:
|
||||
A stolen token `S1` can be re-used for up to the cache TTL seconds after the legitimate user has logged out and logged back in, because the old cache entry is never cleared.
|
||||
|
||||
**Goal**:
|
||||
Invalidate any existing cache entry for a user/session on every new login.
|
||||
|
||||
**What to do**:
|
||||
1. On successful login, look up and delete any existing cache entries bound to the same user.
|
||||
2. Alternatively, scope cache keys to `(user_id, issued_at)` so old tokens are naturally distinct.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- Cache keys may currently be token-hashes only; ensure the user dimension is queryable.
|
||||
|
||||
**Docs changes needed**:
|
||||
- None beyond inline comments.
|
||||
|
||||
**Doc references**:
|
||||
- `backend/app/dependencies.py` – cache key construction
|
||||
|
||||
---
|
||||
|
||||
### Issue #46: HIGH - Fail2ban DB Opened Without Explicit Read-Only Enforcement
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/repositories/fail2ban_db_repo.py:72` – uses `?mode=ro` URI flag
|
||||
- `backend/app/utils/fail2ban_db_utils.py:46-47` – no PRAGMA to enforce read-only semantics
|
||||
|
||||
**Why this is needed**:
|
||||
The `?mode=ro` URI flag is SQLite-level but not verified after connection open. A malformed URI or library version inconsistency could silently open a writable connection, allowing accidental writes to the live fail2ban database.
|
||||
|
||||
**Goal**:
|
||||
Guarantee that no write operations are ever executed against the fail2ban database.
|
||||
|
||||
**What to do**:
|
||||
1. After opening the connection, execute `PRAGMA query_only = ON;` to enforce read-only at the connection level.
|
||||
2. Wrap the connection factory in an assertion that tries a write and expects it to fail.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- `PRAGMA query_only` is SQLite ≥ 3.8.0; verify the bundled SQLite version.
|
||||
- aiosqlite wraps the connection; ensure PRAGMA is executed before any queries.
|
||||
|
||||
**Docs changes needed**:
|
||||
- Add inline comment explaining why both URI flag and PRAGMA are used.
|
||||
|
||||
**Doc references**:
|
||||
- `backend/app/services/ban_service.py` module docstring ("read-only mode")
|
||||
|
||||
---
|
||||
|
||||
### Issue #47: HIGH - CSRF Middleware Hardcodes Cookie Name
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/middleware/csrf.py:38-39` – `_SESSION_COOKIE_NAME = "bangui_session"` literal
|
||||
- `backend/app/utils/constants.py` – `SESSION_COOKIE_NAME` constant already exists
|
||||
|
||||
**Why this is needed**:
|
||||
If the session cookie name is changed via configuration, the CSRF middleware will silently stop finding the cookie, either breaking all requests or bypassing CSRF protection depending on the failure mode.
|
||||
|
||||
**Goal**:
|
||||
Use the single source of truth (`SESSION_COOKIE_NAME` constant) everywhere the cookie name is referenced.
|
||||
|
||||
**What to do**:
|
||||
1. Replace the local literal in `csrf.py` with an import of `SESSION_COOKIE_NAME` from `constants.py`.
|
||||
2. Audit all files for other inline `"bangui_session"` occurrences and replace them.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- If the constant is loaded after middleware is instantiated, ensure import order is safe.
|
||||
|
||||
**Docs changes needed**:
|
||||
- None.
|
||||
|
||||
**Doc references**:
|
||||
- `backend/app/utils/constants.py`
|
||||
|
||||
---
|
||||
|
||||
### Issue #48: HIGH - CSRF Header Name Has No Shared Constant
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/middleware/csrf.py` – header name hardcoded
|
||||
- `frontend/src/api/client.ts:176` – same header name hardcoded independently
|
||||
|
||||
**Why this is needed**:
|
||||
The CSRF header name is duplicated across two codebases with no link between them. Changing it in one place silently breaks CSRF protection.
|
||||
|
||||
**Goal**:
|
||||
Single source of truth for the CSRF header name, consumed by both frontend and backend.
|
||||
|
||||
**What to do**:
|
||||
1. Define `CSRF_HEADER_NAME` in `backend/app/utils/constants.py`.
|
||||
2. Expose it via a public API endpoint (e.g., `/api/v1/config/constants`) or document it in a shared config file consumed by the frontend build.
|
||||
3. Reference the constant in `csrf.py` and `client.ts`.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- Exposing security-related header names via API may aid reconnaissance; weigh the trade-off.
|
||||
|
||||
**Docs changes needed**:
|
||||
- `Docs/`: add or update a "Security Headers" section.
|
||||
|
||||
**Doc references**:
|
||||
- `backend/app/middleware/csrf.py`
|
||||
|
||||
---
|
||||
|
||||
### Issue #49: HIGH - Dual Auth Error Handlers With No Deduplication Guard
|
||||
|
||||
**Where found**:
|
||||
- `frontend/src/providers/AuthProvider.tsx:50-56`
|
||||
- `frontend/src/api/client.ts:143` – `setUnauthorizedHandler`
|
||||
- `frontend/src/utils/fetchError.ts` – `setAuthErrorHandler`
|
||||
|
||||
**Why this is needed**:
|
||||
Both handlers fire independently on a 401. Without a deduplication guard, logout logic executes twice, causing a React state mutation race and potentially double-navigating to the login page.
|
||||
|
||||
**Goal**:
|
||||
Ensure auth error handling fires exactly once per 401 response.
|
||||
|
||||
**What to do**:
|
||||
1. Introduce a module-level `isLoggingOut` flag (or use a ref) that is set before the first logout dispatch and checked before the second.
|
||||
2. Alternatively, consolidate into a single handler and remove one registration.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- The "defense-in-depth" rationale for two handlers may be intentional; confirm with team before removing one.
|
||||
- The flag must be reset after the login page is reached.
|
||||
|
||||
**Docs changes needed**:
|
||||
- Add comment in `AuthProvider.tsx` explaining the deduplication guard.
|
||||
|
||||
**Doc references**:
|
||||
- `frontend/src/providers/AuthProvider.tsx` inline comment about cross-tab logout
|
||||
|
||||
---
|
||||
|
||||
### Issue #50: HIGH - Navigation Abort Signal Timing Bug
|
||||
|
||||
**Where found**:
|
||||
- `frontend/src/providers/NavigationCancellationProvider.tsx:52-67`
|
||||
|
||||
**Why this is needed**:
|
||||
The `AbortController` is recreated inside a `useEffect` that fires after render. A request initiated synchronously during the render (before the effect runs) receives the previous cycle's signal — which may already be aborted — and is cancelled immediately instead of completing.
|
||||
|
||||
**Goal**:
|
||||
Ensure the fresh `AbortController` is available before any request is initiated on a new route.
|
||||
|
||||
**What to do**:
|
||||
1. Create the new `AbortController` synchronously (outside `useEffect`) when the pathname changes, using `useMemo` or `useRef` with a pathname-keyed guard.
|
||||
2. Only abort the old controller inside the effect to avoid aborting the new signal.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- React strict mode double-invokes effects; ensure the controller is not aborted on the first invocation.
|
||||
- `useMemo` is not guaranteed to be stable; prefer `useRef` + pathname comparison.
|
||||
|
||||
**Docs changes needed**:
|
||||
- Update `frontend/src/providers/PROVIDER_ORDER.md` if the timing contract changes.
|
||||
|
||||
**Doc references**:
|
||||
- `frontend/src/providers/PROVIDER_ORDER.md`
|
||||
|
||||
---
|
||||
|
||||
### Issue #51: MEDIUM - Repository Boundary Not Type-Enforced
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/dependencies.py:20-25` – documented constraint that routers must not import `DbDep` directly
|
||||
|
||||
**Why this is needed**:
|
||||
Nothing prevents a developer from bypassing the service/repository layer and accessing the DB directly from a router, violating the layered architecture and making the codebase harder to test.
|
||||
|
||||
**Goal**:
|
||||
Make architectural violations detectable automatically.
|
||||
|
||||
**What to do**:
|
||||
1. Add a custom `flake8` or `ruff` rule (or `import-linter` boundary) that forbids `app.routers.*` from importing `app.dependencies.DbDep`.
|
||||
2. Add a CI check that runs the import boundary linter.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- `import-linter` requires defining contracts in `pyproject.toml`; coordinate with the existing linting setup.
|
||||
|
||||
**Docs changes needed**:
|
||||
- `CONTRIBUTING.md`: document the layer boundary rule and linting check.
|
||||
|
||||
**Doc references**:
|
||||
- `backend/app/dependencies.py` module docstring
|
||||
|
||||
---
|
||||
|
||||
### Issue #52: MEDIUM - Error Handling Patterns Not Declared on Services
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/services/error_handling.py:1-65` – three patterns (ABORT_ON_ERROR, RETURN_DEFAULT, PARTIAL_RESULT) defined but not annotated on callers
|
||||
|
||||
**Why this is needed**:
|
||||
Callers must read docstrings to know whether a service raises, returns a default, or returns partial results. A service silently changing its pattern is a breaking change that type checking and tests may not catch.
|
||||
|
||||
**Goal**:
|
||||
Make each service's error contract explicit and machine-checkable.
|
||||
|
||||
**What to do**:
|
||||
1. Create typed wrapper classes or protocol types for each error pattern.
|
||||
2. Annotate service return types to reflect the chosen pattern (e.g., `Result[T, E]` or a tagged union).
|
||||
3. Alternatively, use a decorator that records and enforces the declared pattern.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- Adding a full `Result` type may require widespread refactoring; start with documentation-only annotations and migrate incrementally.
|
||||
|
||||
**Docs changes needed**:
|
||||
- `backend/app/services/error_handling.py`: update module docstring with pattern descriptions and usage examples.
|
||||
|
||||
**Doc references**:
|
||||
- `backend/app/services/error_handling.py`
|
||||
|
||||
---
|
||||
|
||||
### Issue #53: MEDIUM - Pagination Contract Inconsistent (Offset vs Cursor)
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/models/ban.py:87-95`
|
||||
- `backend/app/utils/pagination.py:265-305`
|
||||
- `backend/app/models/response.py:125-180`
|
||||
|
||||
**Why this is needed**:
|
||||
Some endpoints use `PaginatedListResponse` with a `pagination` object; others return `page` and `page_size` directly. Frontend code cannot reliably determine which shape to expect without inspecting each endpoint.
|
||||
|
||||
Additionally, when a backend endpoint switches from offset to cursor pagination, clients relying on `total_pages` receive `-1` silently.
|
||||
|
||||
**Goal**:
|
||||
A single, documented pagination envelope used consistently across all list endpoints.
|
||||
|
||||
**What to do**:
|
||||
1. Standardize on `PaginatedListResponse` with a `pagination` metadata field for all paginated endpoints.
|
||||
2. Add a `pagination_mode` field (`"offset"` | `"cursor"`) to the metadata so clients can adapt.
|
||||
3. Migrate non-conforming endpoints and add a linting check.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- This is a breaking API change for existing frontend consumers; version the affected endpoints or use a feature flag.
|
||||
|
||||
**Docs changes needed**:
|
||||
- API reference: document the standard pagination envelope.
|
||||
|
||||
**Doc references**:
|
||||
- `backend/app/models/ban.py` inline comment about pagination fields
|
||||
|
||||
---
|
||||
|
||||
### Issue #54: MEDIUM - Provider Composition Order Is Fragile
|
||||
|
||||
**Where found**:
|
||||
- `frontend/src/App.tsx:74-223` – providers manually nested with a documented-but-unenforced order
|
||||
|
||||
**Why this is needed**:
|
||||
A developer refactoring `App.tsx` can accidentally swap two providers (e.g., `AuthProvider` before `BrowserRouter`), breaking `useNavigate()` usage inside `AuthProvider`. No test or lint rule will catch this.
|
||||
|
||||
**Goal**:
|
||||
Make incorrect provider ordering either impossible or immediately detectable.
|
||||
|
||||
**What to do**:
|
||||
1. Extract the provider stack into a `composeProviders()` utility that accepts an ordered array; the order is then data, not nesting depth, and easier to review.
|
||||
2. Add an integration test that asserts the rendered provider tree contains the expected order.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- `composeProviders` utilities can obscure stack traces; ensure display names are preserved.
|
||||
|
||||
**Docs changes needed**:
|
||||
- `frontend/src/providers/PROVIDER_ORDER.md`: keep updated as the canonical order reference.
|
||||
|
||||
**Doc references**:
|
||||
- `frontend/src/providers/PROVIDER_ORDER.md`
|
||||
|
||||
---
|
||||
|
||||
### Issue #55: MEDIUM - Correlation ID Scope Is Module-Level (Resets on HMR)
|
||||
|
||||
**Where found**:
|
||||
- `frontend/src/api/client.ts:26-39` – `sessionCorrelationId` stored as a module variable
|
||||
|
||||
**Why this is needed**:
|
||||
Hot Module Replacement (HMR) re-evaluates modules, generating a new correlation ID mid-session. Distributed traces lose continuity, making it impossible to correlate logs from the same user session.
|
||||
|
||||
**Goal**:
|
||||
Persist the correlation ID across module re-evaluations for the lifetime of the browser tab.
|
||||
|
||||
**What to do**:
|
||||
1. Store the correlation ID in `sessionStorage` on first generation; read from there on subsequent module evaluations.
|
||||
2. Clear it on logout.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- `sessionStorage` is tab-local, which is the desired scope.
|
||||
- Ensure the ID is not leaked in URLs or logs.
|
||||
|
||||
**Docs changes needed**:
|
||||
- Add comment explaining the persistence strategy in `client.ts`.
|
||||
|
||||
**Doc references**:
|
||||
- `frontend/src/api/client.ts` – `getSessionCorrelationId()`
|
||||
|
||||
---
|
||||
|
||||
### Issue #56: MEDIUM - No API Versioning or Deprecation Strategy
|
||||
|
||||
**Where found**:
|
||||
- All backend routers register under `/api/v1/` prefix but no versioning mechanism exists
|
||||
|
||||
**Why this is needed**:
|
||||
Breaking backend changes immediately break all frontend clients. Without a deprecation path, there is no safe way to evolve the API.
|
||||
|
||||
**Goal**:
|
||||
Define and implement an API lifecycle policy.
|
||||
|
||||
**What to do**:
|
||||
1. Document the versioning strategy (URL versioning is already in place; formalize it).
|
||||
2. Add a `Deprecation` response header to endpoints scheduled for removal.
|
||||
3. Implement a `/api/v2/` prefix for the next breaking change cycle.
|
||||
4. Add a CI check that flags new breaking changes against the OpenAPI spec.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- Running two API versions simultaneously doubles maintenance surface; set a sunset date policy.
|
||||
|
||||
**Docs changes needed**:
|
||||
- `Docs/`: create `API_VERSIONING.md` documenting the lifecycle and deprecation process.
|
||||
|
||||
**Doc references**:
|
||||
- All router files under `backend/app/routers/`
|
||||
|
||||
---
|
||||
|
||||
### Issue #57: MEDIUM - Health Endpoint Does Not Check Subsystems
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/routers/health.py`
|
||||
|
||||
**Why this is needed**:
|
||||
A process that is running but cannot reach the fail2ban socket, database, or config directory still returns `200 OK`. Load balancers and orchestrators treat it as healthy and route traffic to it, causing silent failures.
|
||||
|
||||
**Goal**:
|
||||
Health endpoint reflects true readiness of all critical subsystems.
|
||||
|
||||
**What to do**:
|
||||
1. Add a structured health check that tests: database connectivity, fail2ban socket accessibility, config directory read access, scheduler liveness.
|
||||
2. Return `200` only when all checks pass; return `503` with a JSON body listing failed checks otherwise.
|
||||
3. Expose a separate `/health/live` (process alive) and `/health/ready` (subsystems ready) endpoint for Kubernetes probes.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- Slow health checks (e.g., DB connect timeout) can overwhelm the endpoint under load; set short timeouts per check.
|
||||
|
||||
**Docs changes needed**:
|
||||
- `Docs/Deployment.md`: document liveness vs readiness probe URLs.
|
||||
|
||||
**Doc references**:
|
||||
- `backend/app/routers/health.py`
|
||||
|
||||
---
|
||||
|
||||
### Issue #58: MEDIUM - Abort Signal Not Propagated in Request Deduplication
|
||||
|
||||
**Where found**:
|
||||
- `frontend/src/hooks/useFetchData.ts:93-113`
|
||||
|
||||
**Why this is needed**:
|
||||
When multiple hook instances share a `requestKey`, they await a single in-flight promise. When one component unmounts and aborts its signal, the shared request continues and calls `setData()` / `onSuccess()` on the unmounted component, causing React "state update on unmounted component" warnings and memory leaks.
|
||||
|
||||
**Goal**:
|
||||
Unmounting a component that joined a deduplicated request must not receive the result.
|
||||
|
||||
**What to do**:
|
||||
1. In the deduplication await path, check the component's own abort signal before calling `setData()` or `onSuccess()`.
|
||||
2. Wrap the deduplication subscriber list so each subscriber can individually opt out on abort.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- If all subscribers abort before the request resolves, consider whether the underlying request should also be cancelled.
|
||||
|
||||
**Docs changes needed**:
|
||||
- `frontend/src/hooks/README.md`: document abort signal contract for deduplicated requests.
|
||||
|
||||
**Doc references**:
|
||||
- `frontend/src/hooks/README.md`
|
||||
|
||||
---
|
||||
|
||||
### Issue #59: MEDIUM - Middleware Registration Order Not Validated at Startup
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/main.py:53+` – middleware added via `app.add_middleware()` without order assertion
|
||||
|
||||
**Why this is needed**:
|
||||
The required order `CorrelationId → CSRF → RateLimit` is security-critical. A developer adding or reordering middleware silently breaks CSRF validation or produces rate-limit counters with no correlation ID attached.
|
||||
|
||||
**Goal**:
|
||||
Detect incorrect middleware order at startup, not at runtime under attack.
|
||||
|
||||
**What to do**:
|
||||
1. After all middleware is registered, introspect `app.middleware_stack` and assert the expected order.
|
||||
2. Write a unit test that instantiates the app and checks middleware ordering.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- FastAPI reverses the middleware stack internally (last registered = outermost); account for this when asserting order.
|
||||
|
||||
**Docs changes needed**:
|
||||
- `backend/app/main.py`: add inline comment documenting the required order and why.
|
||||
|
||||
**Doc references**:
|
||||
- `backend/app/middleware/` – individual middleware module docstrings
|
||||
|
||||
---
|
||||
|
||||
### Issue #60: MEDIUM - NavigationCancellationProvider Orphans Requests on Rapid Navigation
|
||||
|
||||
**Where found**:
|
||||
- `frontend/src/providers/NavigationCancellationProvider.tsx`
|
||||
- `frontend/src/hooks/useNavigationAbortSignal.ts:42-52`
|
||||
|
||||
**Why this is needed**:
|
||||
When a user navigates A → B → C rapidly, B's in-flight requests are not cancelled because B's signal is replaced before B's requests check it. These requests complete and may write stale data into the wrong page's state.
|
||||
|
||||
**Goal**:
|
||||
Every request initiated for a page is cancelled when that page is navigated away from, regardless of navigation speed.
|
||||
|
||||
**What to do**:
|
||||
1. Associate each request with the pathname that was active when it started, not the current pathname.
|
||||
2. On navigation, abort all controllers whose associated pathname no longer matches the current route.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- Requests that intentionally survive navigation (e.g., background syncs) must opt out; provide an `ignoreCancellation` flag.
|
||||
|
||||
**Docs changes needed**:
|
||||
- `frontend/src/providers/PROVIDER_ORDER.md`: document the cancellation contract.
|
||||
|
||||
**Doc references**:
|
||||
- `frontend/src/providers/NavigationCancellationProvider.tsx`
|
||||
|
||||
---
|
||||
|
||||
### Issue #61: MEDIUM - Pagination Offset vs Cursor Mode Indistinguishable to Frontend
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/utils/pagination.py:265-305`
|
||||
- `backend/app/models/response.py:125-180`
|
||||
|
||||
**Why this is needed**:
|
||||
The `PaginationMetadata` object uses sentinel values (`total=-1`, `total_pages=-1`) for cursor mode. If a backend endpoint silently switches pagination modes, frontend code using `total_pages` to render page controls will display `-1` with no error.
|
||||
|
||||
**Goal**:
|
||||
Frontend code can reliably detect which pagination mode is in use and render accordingly.
|
||||
|
||||
**What to do**:
|
||||
1. Add a `mode: "offset" | "cursor"` discriminator field to `PaginationMetadata`.
|
||||
2. Update frontend pagination components to branch on `mode` rather than checking for `-1`.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- Adding a required field is a breaking change; make it optional with a default of `"offset"` for backward compatibility.
|
||||
|
||||
**Docs changes needed**:
|
||||
- API reference: document the `mode` field and its values.
|
||||
|
||||
**Doc references**:
|
||||
- `backend/app/utils/pagination.py`
|
||||
|
||||
---
|
||||
|
||||
### Issue #62: MEDIUM - Blocklist URL Validation Is Async With No Rollback on Failure
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/services/blocklist_service.py`
|
||||
- `backend/app/models/blocklist.py:36-40`
|
||||
|
||||
**Why this is needed**:
|
||||
DNS validation runs asynchronously after the model is validated. If validation fails or is slow, concurrent requests can insert duplicate or invalid blocklist sources before the validation result is checked, leaving the database in a dirty state.
|
||||
|
||||
**Goal**:
|
||||
Blocklist source creation is atomic: either validation passes and the row is committed, or validation fails and no row exists.
|
||||
|
||||
**What to do**:
|
||||
1. Perform DNS/URL validation inside a database transaction; roll back on failure.
|
||||
2. Add a unique constraint on the URL column to catch duplicates at the DB level.
|
||||
3. Return a conflict error (409) on duplicate URL submissions.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- Async DNS lookup inside a transaction holds the transaction open longer; use a short timeout.
|
||||
|
||||
**Docs changes needed**:
|
||||
- API reference: document the 409 conflict response for duplicate URLs.
|
||||
|
||||
**Doc references**:
|
||||
- `backend/app/services/blocklist_service.py`
|
||||
|
||||
---
|
||||
|
||||
### Issue #63: MEDIUM - Correlation ID Lost Across Background Task Boundaries
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/tasks/health_check.py:70-74`
|
||||
- `backend/app/utils/correlation.py`
|
||||
|
||||
**Why this is needed**:
|
||||
Background tasks that spawn sub-tasks (e.g., health check triggering failover logic) do not propagate the correlation ID `ContextVar` to child asyncio tasks. Logs from child tasks appear without a correlation ID, breaking distributed tracing.
|
||||
|
||||
Additionally, `reset_correlation_id()` in the `finally` block clears the ID before all child tasks have logged.
|
||||
|
||||
**Goal**:
|
||||
Every log line emitted during a background job carries its originating correlation ID.
|
||||
|
||||
**What to do**:
|
||||
1. Use `asyncio.create_task(coro, context=copy_context())` to propagate the `ContextVar` to child tasks.
|
||||
2. Move `reset_correlation_id()` to after all child tasks have completed.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- `copy_context()` captures a snapshot; mutations in the parent after the copy won't be seen by the child (this is the desired behavior).
|
||||
|
||||
**Docs changes needed**:
|
||||
- Add inline comment in `health_check.py` explaining context propagation.
|
||||
|
||||
**Doc references**:
|
||||
- `backend/app/utils/correlation.py`
|
||||
|
||||
---
|
||||
|
||||
### Issue #64: MEDIUM - External Logging Failure Silently Swallowed
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/main.py:192-213`
|
||||
|
||||
**Why this is needed**:
|
||||
When Datadog, Papertrail, or Elasticsearch log handler initialization fails, the error is caught, logged as a warning to stdout, and the application continues. In production this means critical logs may never reach the monitoring system, and operators will not know until an incident occurs.
|
||||
|
||||
**Goal**:
|
||||
External logging failures are surfaced to operators at deployment time.
|
||||
|
||||
**What to do**:
|
||||
1. Promote the warning to an error and expose it via the health endpoint (Issue #57).
|
||||
2. Add a startup check: if `EXTERNAL_LOG_REQUIRED=true` and initialization fails, abort startup.
|
||||
3. Emit a metric/alert on initialization failure.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- Making startup fail on logging issues may be too strict for some environments; make `EXTERNAL_LOG_REQUIRED` optional and default to `false`.
|
||||
|
||||
**Docs changes needed**:
|
||||
- `Docs/Deployment.md`: document `EXTERNAL_LOG_REQUIRED` and the health check for logging status.
|
||||
|
||||
**Doc references**:
|
||||
- `backend/app/main.py` logging initialization block
|
||||
|
||||
---
|
||||
|
||||
### Issue #65: MEDIUM - Abort Selector Inconsistency in useFetchData
|
||||
|
||||
**Where found**:
|
||||
- `frontend/src/hooks/useFetchData.ts:124-131`
|
||||
|
||||
**Why this is needed**:
|
||||
When a request is aborted, `refresh()` returns the raw response without running the `selector()` function. In non-aborted paths the selector runs. Callers of `refresh()` receive different types depending on the abort state, making the return type unreliable and causing subtle state shape mismatches.
|
||||
|
||||
**Goal**:
|
||||
`refresh()` returns a consistent type regardless of abort state.
|
||||
|
||||
**What to do**:
|
||||
1. On abort, return `null` (or a typed sentinel) instead of the raw response, so callers can handle the aborted case explicitly.
|
||||
2. Update the TypeScript return type of `refresh()` to reflect the nullable result.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- Existing callers that ignore the return value are unaffected; callers that use it need to handle `null`.
|
||||
|
||||
**Docs changes needed**:
|
||||
- `frontend/src/hooks/README.md`: document the `null` return on abort.
|
||||
|
||||
**Doc references**:
|
||||
- `frontend/src/hooks/README.md`
|
||||
|
||||
---
|
||||
|
||||
### Issue #67: LOW - Default Page Size Inconsistently Applied Across Routers
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/routers/history.py:80-84` – uses `DEFAULT_PAGE_SIZE` constant
|
||||
- Multiple other routers – may hardcode page size values
|
||||
|
||||
**Why this is needed**:
|
||||
Endpoints with different default page sizes create an inconsistent API experience and make it hard to reason about server load. A client that does not pass `page_size` gets different result counts from different endpoints.
|
||||
|
||||
**Goal**:
|
||||
All paginated endpoints use the same default page size driven by a single constant.
|
||||
|
||||
**What to do**:
|
||||
1. Audit all `page_size` Query parameters across routers.
|
||||
2. Replace all hardcoded defaults with `DEFAULT_PAGE_SIZE` from `constants.py`.
|
||||
3. Add a linting check or unit test that asserts no hardcoded page size defaults exist in routers.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- Some endpoints may intentionally use a different page size for performance reasons; document exceptions explicitly.
|
||||
|
||||
**Docs changes needed**:
|
||||
- API reference: document the default page size and how to override it.
|
||||
|
||||
**Doc references**:
|
||||
- `backend/app/utils/constants.py` – `DEFAULT_PAGE_SIZE`
|
||||
|
||||
---
|
||||
|
||||
### Issue #68: LOW - No Reserved Keyword Validation for Jail Names
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/models/jail.py` – jail name validated against alphanumeric regex only
|
||||
- `backend/app/routers/jail_config.py`
|
||||
|
||||
**Why this is needed**:
|
||||
Fail2ban uses reserved jail names and command keywords (e.g., `all`, `status`, `purge`). A user-created jail with a reserved name could shadow fail2ban built-in commands or produce confusing behavior when management commands are issued.
|
||||
|
||||
**Goal**:
|
||||
Reject jail names that conflict with fail2ban reserved words at model validation time.
|
||||
|
||||
**What to do**:
|
||||
1. Define a `FAIL2BAN_RESERVED_JAIL_NAMES` set in `constants.py`.
|
||||
2. Add a Pydantic validator on the jail name field that rejects reserved words.
|
||||
3. Return a 422 with a descriptive error message.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- The reserved word list may change across fail2ban versions; source it from fail2ban documentation and version-gate if necessary.
|
||||
|
||||
**Docs changes needed**:
|
||||
- API reference: document the list of reserved jail names.
|
||||
|
||||
**Doc references**:
|
||||
- Fail2ban documentation on reserved jail identifiers
|
||||
|
||||
---
|
||||
|
||||
### Issue #69: LOW - Jail Names Echoed in Error Messages Without Sanitization
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/exceptions.py:138,351` – jail names interpolated directly into error strings
|
||||
|
||||
**Why this is needed**:
|
||||
Although Python's `repr()` provides basic escaping, user-supplied jail names are reflected back in error messages. If these messages are ever rendered in an HTML context (e.g., a future admin UI or email notification), they become XSS vectors. They also act as confirmation oracles when combined with timing attacks.
|
||||
|
||||
**Goal**:
|
||||
Error messages referencing user input are sanitized before inclusion.
|
||||
|
||||
**What to do**:
|
||||
1. Pass user-supplied values through a dedicated `sanitize_for_display()` helper before interpolation.
|
||||
2. Ensure the helper strips or escapes HTML special characters.
|
||||
3. For API responses, always return the original (validated) field name rather than the raw user input.
|
||||
|
||||
**Possible traps and issues**:
|
||||
- Over-escaping in JSON responses is not needed (JSON is not HTML); apply sanitization only at HTML render boundaries.
|
||||
|
||||
**Docs changes needed**:
|
||||
- `CONTRIBUTING.md`: document the rule that user input must not be echoed raw in messages.
|
||||
|
||||
**Doc references**:
|
||||
- `backend/app/exceptions.py`
|
||||
39
Docs/Testing-Requirements.md
Normal file
39
Docs/Testing-Requirements.md
Normal file
@@ -0,0 +1,39 @@
|
||||
# Testing Requirements
|
||||
|
||||
## Coverage Threshold
|
||||
|
||||
- **Minimum: 80% line coverage** for all backend code
|
||||
- Critical paths (auth, banning, scheduling, API endpoints): **100%**
|
||||
|
||||
## CI Enforcement
|
||||
|
||||
`.github/workflows/ci.yml` runs pytest with `--cov-fail-under=80`. Build fails if coverage drops below threshold.
|
||||
|
||||
## Running Tests Locally
|
||||
|
||||
```bash
|
||||
cd backend
|
||||
pytest --cov=app --cov-report=term-missing
|
||||
```
|
||||
|
||||
## Coverage Reports
|
||||
|
||||
- Terminal: `--cov-report=term-missing`
|
||||
- HTML: `--cov-report=html` (output in `htmlcov/`)
|
||||
|
||||
## Coverage Badge
|
||||
|
||||
Add to README once CI runs successfully:
|
||||
|
||||
```md
|
||||
[](https://codecov.io/gh/<owner>/BanGUI)
|
||||
```
|
||||
|
||||
Requires codecov.io integration with repository.
|
||||
|
||||
## Writing Tests
|
||||
|
||||
- Follow pattern: `test_<unit>_<scenario>_<expected>`
|
||||
- Mock external dependencies (fail2ban socket, aiohttp calls)
|
||||
- Test happy path AND error/edge cases
|
||||
- See `Docs/Backend-Development.md §9` for detailed testing guide
|
||||
Reference in New Issue
Block a user