From 1830da496d392fc1cc827ea9d10908683e989cd7 Mon Sep 17 00:00:00 2001 From: Lukas Date: Fri, 1 May 2026 20:23:54 +0200 Subject: [PATCH] backup Co-authored-by: Copilot --- Docs/Tasks.md | 1638 ++++++++++++++++++++++++++++++++++++++++++++++- Docs/runner.csx | 4 +- 2 files changed, 1621 insertions(+), 21 deletions(-) diff --git a/Docs/Tasks.md b/Docs/Tasks.md index f355e70..ae1214b 100644 --- a/Docs/Tasks.md +++ b/Docs/Tasks.md @@ -1,33 +1,1633 @@ -## [LOW] No request deduplication on frontend +### Issue #1: CRITICAL - Services Return Response Models Instead of Domain Models -**Where found** +**Where found**: +- `backend/app/services/jail_service.py` (list_jails, get_jail) +- `backend/app/services/server_service.py` (get_settings) +- `backend/app/services/config_service.py` (get_jail_config, get_global_config) +- `backend/app/services/history_service.py` (list_history, get_ip_detail) +- `backend/app/services/filter_config_service.py` (list_filters) +- `backend/app/services/blocklist_service.py` (preview_blocklist) +- `backend/app/services/health_service.py` (get_service_status) +- `backend/app/repositories/protocols.py` (service interface definitions) -- `frontend/src/hooks/useFetchData.ts` — each call launches new request -- User clicks "Refresh" twice → two identical requests +**Why this is needed**: +The documented architecture explicitly states: "Services return **domain models** (e.g., `DomainActiveBanList`) that represent pure business logic. Response models are defined in `app/models/` and used only by routers. Conversion happens at the **router boundary**." This violates the principle that "Domain logic can evolve without affecting API shape." -**Why this is needed** +**Goal**: +Refactor all services to return domain models instead of response models. This enables services to be reusable across different frontends (GraphQL, gRPC, CLI), makes testing simpler, and decouples HTTP concerns from business logic. -Duplicates waste bandwidth, cause race conditions (response 2 arrives first, then response 1 overwrites with stale data). +**What to do**: +1. Create `app/models/{domain}_domain.py` for each domain (config, jail, history, server, health) +2. Create `app/mappers/{domain}_mappers.py` with conversion functions +3. Refactor each service to return domain models +4. Update routers to use mappers at the boundary: `domain_result = await service(...); return map_domain_to_response(domain_result)` +5. Update service protocols to specify domain model returns +6. Update service tests to work with simple domain objects -**Goal** +**Possible traps and issues**: +- Services like `ban_service` already implement this correctly (with `ban_domain.py` and `ban_mappers.py`). Use this as template to avoid inconsistency. +- Frontend tests that mock services will need updates if mocking response models +- Existing code that imports response models from services will break and need refactoring +- Performance might change if mappers do additional transformations (profile and optimize) -Deduplicate identical in-flight requests. +**Docs changes needed**: +- Update `Docs/Architekture.md` section 2.2 to clarify domain vs response model distinction with clear examples +- Update service development guidelines in `Docs/Backend-Development.md` with step-by-step guide for adding new services +- Create `Docs/DOMAIN_MODELS.md` explaining the pattern and where to find examples -**What to do** +**Doc references**: +- `Docs/Architekture.md` - Section 2.2 "Module Purposes - Services" +- ARCHITECTURE_REVIEW.md - Section "CRITICAL: Services Return Response Models" +- DETAILED_FINDINGS.md - (implicit in architecture violation) -1. Implement request cache -2. Clear cache entry when response received -3. Use in `useFetchData` +--- -**Possible traps and issues** +### Issue #2: CRITICAL - Scheduler Lock Race Condition (Background Tasks Could Permanently Stop) -- Cache must be cleared on data mutation -- Stale data in cache possible if not careful +**Where found**: +- `backend/app/utils/scheduler_lock.py` (lines 114+) +- `backend/app/tasks/` (all background job registration) +- `backend/app/startup.py` (scheduler initialization) -**Docs changes needed** +**Why this is needed**: +The current scheduler lock implementation can leave an orphaned lock if the heartbeat task crashes. When this happens, the second instance cannot acquire the scheduler role, and ALL background tasks (blocklist imports, geo cache cleanup, history sync, session cleanup) stop permanently. This is a critical production issue. -- No documentation changes +**Goal**: +Implement atomic lock acquisition with timeout-based expiration, ensuring that dead processes don't block the entire scheduler system. -**Doc references** +**What to do**: +1. Add `heartbeat_timeout` to `scheduler_lock` table (default 5 minutes) +2. Update `acquire_lock()` to use `INSERT ... ON CONFLICT` with `BEGIN IMMEDIATE` transaction +3. Add logic to steal lock if previous holder's heartbeat is stale +4. Add heartbeat update with error handling (don't crash if update fails) +5. Add monitoring to detect stuck locks +6. Write tests for: + - Concurrent lock acquisition attempts + - Orphaned lock recovery + - Heartbeat update failures -- `frontend/src/hooks/useFetchData.ts` \ No newline at end of file +**Possible traps and issues**: +- Must use explicit transactions (`BEGIN IMMEDIATE`) to avoid race conditions in SQLite +- Heartbeat update timing is critical: too short = false positives, too long = stale locks +- Multiple processes might try to steal lock simultaneously - need atomic operation +- Database lock contention on scheduler_lock table during high load +- Clock skew between servers could cause lock expiration issues (use DB time, not system time) + +**Docs changes needed**: +- Add section to `Docs/Observability.md` on monitoring scheduler lock health +- Document in `Docs/Deployment.md` what to do if scheduler stops +- Add error recovery procedures to `Docs/TROUBLESHOOTING.md` + +**Doc references**: +- DETAILED_FINDINGS.md - Issue #11 "Scheduler Lock Race Condition" +- `backend/app/utils/scheduler_lock.py` - Current implementation +- `backend/app/startup.py` (lines 89-103) - Startup validation + +--- + +## HIGH PRIORITY ISSUES + +--- + +### Issue #3: HIGH - Unbounded Query Results Causing OOM (Out of Memory) + +**Where found**: +- `backend/app/repositories/history_archive_repo.py` - `get_all_archived_history()` +- `backend/app/services/ban_service.py` (lines 589-600) - `bans_by_country()` loads all unique IPs into memory +- `backend/app/services/ban_service.py` (lines 650-680) - N+1 geo lookup pattern + +**Why this is needed**: +With large deployments having millions of ban records, queries that load entire tables into memory cause: +- Memory spikes that crash the container +- Slow dashboard performance +- Database file growth without bounds + +**Goal**: +Implement pagination, streaming, and batch processing for all large queries to ensure bounded memory usage and consistent performance. + +**What to do**: +1. Refactor `get_all_archived_history()` to only be called with pagination parameters +2. Refactor `bans_by_country()` to: + - Process countries in batches + - Stream results instead of collecting all in memory + - Implement server-side aggregation in SQL instead of Python loops +3. Add `LIMIT` + `OFFSET` or cursor-based pagination to all list endpoints +4. Implement batch geo lookups instead of per-IP loops +5. Add tests with large datasets (1M+ records) to catch performance regressions + +**Possible traps and issues**: +- Changing query patterns might break sorting/filtering logic +- Pagination cursor format must be consistent across endpoints +- Memory usage must be monitored in production +- Aggregation queries might need new database indexes +- Frontend pagination UI assumes cursor format - changes will break old clients + +**Docs changes needed**: +- Add performance guidelines to `Docs/Backend-Development.md` - "Never load unbounded result sets" +- Create `Docs/PERFORMANCE.md` with query optimization patterns +- Document pagination standards in API docs + +**Doc references**: +- DETAILED_FINDINGS.md - Issues #2, #3, #4 (Unbounded queries, N+1, Large structures) +- DATABASE_API_DEPLOYMENT_ISSUES.md - Section "Database Design Issues" + +--- + +### Issue #4: HIGH - Missing Rate Limiting on Write Operations + +**Where found**: +- `backend/app/middleware/rate_limit.py` - Only applied to login endpoint +- `backend/app/routers/bans.py` - POST /api/bans/ban, POST /api/bans/unban (NO rate limit) +- `backend/app/routers/blocklist.py` - POST /api/blocklists/:id/import (NO rate limit) +- `backend/app/routers/config.py` - PUT endpoints (NO rate limit) + +**Why this is needed**: +Without rate limiting on state-mutating endpoints, an attacker can: +- Spam ban requests to exhaust fail2ban resources +- Trigger repeated blocklist imports consuming bandwidth/CPU +- Cause DoS by hammering config updates + +**Goal**: +Extend rate limiting to all write operations (POST, PUT, DELETE) with appropriate rate limits per operation type. + +**What to do**: +1. Create rate limit buckets for different operations: + - `bans:ban` - 100/minute per IP + - `bans:unban` - 100/minute per IP + - `blocklist:import` - 10/hour per IP + - `config:update` - 50/minute per IP +2. Apply rate limiting middleware to all write endpoints +3. Return 429 with `Retry-After` header when limit exceeded +4. Add metrics/monitoring for rate limit hits +5. Make rate limits configurable via environment variables + +**Possible traps and issues**: +- Rate limiting at IP level doesn't work behind proxies (need proper X-Forwarded-For handling) +- Different operations need different rate limits (can't use global limit) +- Legitimate bulk operations might hit limits unexpectedly +- Rate limit state must be persistent across process restarts (use database or Redis) +- False positives from internal monitoring scripts hammering endpoints + +**Docs changes needed**: +- Add rate limit table to API documentation +- Document in `Docs/CONFIGURATION.md` how to adjust rate limits +- Add to `Docs/TROUBLESHOOTING.md` - "Getting 429 Too Many Requests" + +**Doc references**: +- DETAILED_FINDINGS.md - Issue #5 "Missing Rate Limiting" +- `backend/app/middleware/rate_limit.py` - Current implementation + +--- + +### Issue #5: HIGH - API Has No Versioning Strategy + +**Where found**: +- All routers use `/api/` prefix with no version indicator +- `backend/app/main.py` - FastAPI app initialization +- Frontend type generation expects schema at `/api/openapi.json` +- No deprecation warnings in responses + +**Why this is needed**: +Without versioning, any breaking change in API responses immediately breaks all deployed frontend clients. There's no way to support multiple client versions. The current architecture has no migration path for API evolution. + +**Goal**: +Implement API versioning that allows: +- Supporting multiple client versions simultaneously +- Deprecating endpoints safely +- Rolling out breaking changes without downtime + +**What to do**: +1. Introduce versioning: `/api/v1/`, `/api/v2/`, etc. +2. Move all current endpoints to `/api/v1/` +3. Keep v1 stable and backward compatible +4. When breaking changes needed, create v2 endpoints +5. Add deprecation headers to old endpoints: `Deprecation: true`, `Sunset: date` +6. Update frontend to use versioned API URLs +7. Add CI check to prevent accidental breaking changes + +**Possible traps and issues**: +- Frontend needs to be updated to use versioned URLs +- Existing deployments must continue working (can't remove v1) +- Developers might accidentally break backward compatibility in v1 +- Multiple versions mean duplicate code (need proper code sharing) +- Route registering becomes more complex + +**Docs changes needed**: +- Create `Docs/API_VERSIONING.md` explaining strategy and when to bump versions +- Update API documentation to show which version each endpoint is in +- Add to `Docs/Backend-Development.md` - "Creating versioned endpoints" + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "2.2 No API Versioning Strategy" + +--- + +### Issue #6: HIGH - Inconsistent API Response Format Between Endpoints + +**Where found**: +- `backend/app/routers/bans.py` - Returns `CommandResponse(success: bool, message: str)` +- `backend/app/routers/history.py` - Returns `HistoryListResponse(bans: [], pagination: {})` +- `backend/app/routers/status.py` - Returns `dict[str, str]` with no consistent structure +- Various routers return different error formats + +**Why this is needed**: +Frontend cannot assume consistent structure, must write different parsing code for each endpoint. Type safety is compromised. Mapping backend schema to frontend types becomes fragile. + +**Goal**: +Standardize on a unified response format across all endpoints to ensure type safety and consistency. + +**What to do**: +1. Define standard response format: + ```python + @dataclass + class ApiResponse(Generic[T]): + success: bool + data: T | None + error: str | None + error_code: str | None = None + pagination: Pagination | None = None + metadata: dict[str, Any] | None = None + ``` +2. Update all routers to use this format +3. Map endpoint-specific responses into `data` field +4. Add error codes for different error types +5. Update frontend to parse this format +6. Add validation that all endpoints conform to format + +**Possible traps and issues**: +- Backward compatibility breaks (clients expecting old format fail) +- Existing frontend code expects old response format +- Performance impact of additional wrapping (measure and optimize) +- Migration burden - all routers need updates + +**Docs changes needed**: +- Update API documentation with new response format +- Add examples of standard response across different scenarios +- Create migration guide for API consumers + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "2.1 Inconsistent Response Format" + +--- + +### Issue #7: HIGH - Docker Health Check Fails in Production + +**Where found**: +- `Docker/compose.prod.yml` (lines 40-47) +- Backend health check runs Python in subprocess but image has no Python in runtime +- Frontend health check uses `wget` that may not exist + +**Why this is needed**: +Health checks always fail, container marked unhealthy, Kubernetes/orchestrators evict pod thinking service is down. Cascades to service outages. + +**Goal**: +Implement reliable health checks that actually verify service health without depending on missing tools. + +**What to do**: +1. Replace Python subprocess check with simple HTTP request: + ```yaml + healthcheck: + test: ["CMD", "curl", "-f", "http://localhost:8000/api/health"] + ``` +2. Implement comprehensive health check endpoint that verifies: + - Database is accessible + - fail2ban socket is reachable + - Background scheduler is healthy + - Cache systems are initialized +3. Return 503 if any component is unhealthy +4. Add timeout and retry parameters: + ```yaml + interval: 30s + timeout: 10s + retries: 3 + start_period: 40s + ``` +5. Test health check in both Alpine and full Python images + +**Possible traps and issues**: +- curl might not be installed in lightweight images (use `wget` as fallback or multi-tool) +- Health check runs frequently (every 30s) - must be lightweight +- False positives if timeout is too short +- Can't check fail2ban socket during initial startup (timing issue) + +**Docs changes needed**: +- Add troubleshooting section to `Docs/TROUBLESHOOTING.md` - "502 Bad Gateway errors" +- Document health check behavior in `Docs/Deployment.md` + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "6.2 Missing Health Checks in Production" +- `Docker/compose.prod.yml` - Current health check definition + +--- + +### Issue #8: HIGH - Database Migration Fragility (Rollback Missing) + +**Where found**: +- `backend/app/startup.py` (lines 105-130) - `run_migrations()` +- `backend/app/db.py` (lines 132-142) - Schema migration table +- No transaction wrapping migrations + +**Why this is needed**: +If a migration fails mid-transaction, the database schema becomes inconsistent: +- Some tables created, others not +- Next startup tries to re-apply migration +- Second attempt fails because column already exists +- Application can't start + +**Goal**: +Implement transactional migrations with automatic rollback on failure. + +**What to do**: +1. Wrap each migration in explicit transaction: + ```python + async def run_migration(name: str, migration_func): + try: + await db.execute("BEGIN EXCLUSIVE") + await migration_func() + await db.execute( + "INSERT INTO schema_migrations (name, applied_at) VALUES (?, ?)", + name, int(time.time()) + ) + await db.execute("COMMIT") + except Exception: + await db.execute("ROLLBACK") + raise + ``` +2. Add idempotent migration checks (CREATE TABLE IF NOT EXISTS) +3. Implement downtime procedure for failed migrations +4. Add migration rollback capability +5. Test migration failures and recovery + +**Possible traps and issues**: +- SQLite WAL mode can leave orphaned `.wal` files after crashes (handle cleanup) +- Long-running migrations might timeout +- Complex migrations spanning multiple steps hard to make atomic +- Rollback procedure must be well-documented and tested +- Development vs production migration paths might diverge + +**Docs changes needed**: +- Create `Docs/DATABASE_MIGRATIONS.md` explaining migration process +- Add disaster recovery section to `Docs/Deployment.md` +- Document in `Docs/TROUBLESHOOTING.md` - "Database migration failed" + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "6.3 Database Migration Issues" +- `backend/app/startup.py` - Migration runner + +--- + +### Issue #9: HIGH - No Graceful Shutdown Handling + +**Where found**: +- `Docker/Dockerfile.backend` (lines 75-76) - Entrypoint doesn't handle signals +- `backend/app/main.py` - Lifespan doesn't gracefully shutdown +- No SIGTERM handler in background tasks +- Container receives SIGKILL without cleanup opportunity + +**Why this is needed**: +Without graceful shutdown: +- In-flight ban requests don't complete +- Background jobs interrupted mid-import +- Incomplete blocklist imports leave stale data +- Database connections don't close properly + +**Goal**: +Implement graceful shutdown that allows pending operations to complete before process exits. + +**What to do**: +1. Implement lifespan context manager that handles shutdown: + ```python + @asynccontextmanager + async def lifespan(app: FastAPI): + # Startup + yield + # Shutdown: allow pending tasks + tasks = [t for t in asyncio.all_tasks() if not t.done()] + await asyncio.gather(*tasks, return_exceptions=True) + ``` +2. Set graceful_timeout in Dockerfile: `docker stop --time=30` +3. Handle SIGTERM signal to trigger shutdown +4. Drain in-flight requests before exit +5. Close database connections and scheduler cleanly +6. Add logging for shutdown events + +**Possible traps and issues**: +- Tasks might take longer than shutdown timeout (configure appropriate timeout) +- Hanging tasks won't terminate (need proper cancellation) +- Scheduler might reject new jobs during shutdown +- Race conditions between shutdown and new requests + +**Docs changes needed**: +- Add deployment best practices to `Docs/Deployment.md` +- Document graceful shutdown in `Docs/TROUBLESHOOTING.md` + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "6.4 No Graceful Shutdown" + +--- + +### Issue #10: HIGH - Database Type Inconsistency (Timestamps Mixed Across Tables) + +**Where found**: +- `backend/app/db.py` (lines 68-75) +- `import_log` table uses TEXT ISO 8601 format +- `history_archive` table uses INTEGER UNIX timestamp +- Frontend receives both formats + +**Why this is needed**: +Frontend parsing code must handle multiple formats: +- Parser might parse one format incorrectly +- Inconsistent type representation makes bugs harder to track +- Aggregation queries mixing both formats require conversions + +**Goal**: +Standardize all timestamps on UNIX timestamps (INTEGER seconds since epoch) throughout entire database. + +**What to do**: +1. Migrate `import_log.timestamp` from TEXT to INTEGER: + ```sql + ALTER TABLE import_log ADD COLUMN timestamp_unix INTEGER; + UPDATE import_log SET timestamp_unix = strftime('%s', timestamp); + ALTER TABLE import_log DROP COLUMN timestamp; + ALTER TABLE import_log RENAME COLUMN timestamp_unix TO timestamp; + ``` +2. Update all code to use UNIX timestamps +3. Add validation in repositories that timestamps are UNIX format +4. Update frontend parsing to handle UNIX timestamps +5. Write migration tests with various date formats + +**Possible traps and issues**: +- Existing data with invalid timestamps causes migration to fail +- Timezone issues if code assumes local time +- Backward compatibility breaks for old APIs +- Performance impact of migration on large tables + +**Docs changes needed**: +- Document timestamp format requirement in API docs +- Add to backend development guide +- Create migration runbook + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "1.2 Data Type Inconsistency" + +--- + +### Issue #11: HIGH - Foreign Key ON DELETE Semantics Problem (Data Loss) + +**Where found**: +- `backend/app/db.py` (lines 61-70) +- `import_log.source_id` uses `ON DELETE SET NULL` +- `import_log.source_url` remains populated +- Orphaned log records with NULL source_id but populated URL + +**Why this is needed**: +When a blocklist source is deleted: +- Import logs become orphaned and meaningless +- UI can't link logs back to source +- Data becomes inconsistent + +**Goal**: +Fix foreign key cascade strategy to maintain data integrity. + +**What to do**: +1. Decide cascade strategy: + - Option A: `ON DELETE CASCADE` - delete all logs when source deleted (data loss) + - Option B: `ON DELETE RESTRICT` - prevent source deletion if logs exist (prevent data loss) + - Option C: Keep source_id for history without URL (reference counting) +2. Recommendation: Use Option B with proper deletion workflow: + ```sql + ALTER TABLE import_log + DROP CONSTRAINT fk_source_id, + ADD CONSTRAINT fk_source_id + FOREIGN KEY(source_id) REFERENCES blocklist_sources(id) + ON DELETE RESTRICT; + ``` +3. If deleting source, first archive and delete old logs +4. Update deletion API to handle RESTRICT error +5. Document deletion procedures + +**Possible traps and issues**: +- Existing schema already has ON DELETE SET NULL - migration needed +- User tries to delete source with logs - must handle RESTRICT error +- Cascading deletes can cause unexpected data loss +- Historical logs might be valuable for audit + +**Docs changes needed**: +- Add data retention policy to `Docs/Features.md` +- Document deletion constraints in API docs +- Add to admin guide + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "1.3 Foreign Key ON DELETE" + +--- + +### Issue #12: HIGH - Race Condition in Concurrent Writes (Import Runs Duplication) + +**Where found**: +- `backend/app/repositories/import_run_repo.py` (lines 89-100) +- `create_or_update()` not atomic +- Check then insert pattern (TOCTOU) + +**Why this is needed**: +Two concurrent imports of same source can create duplicate rows instead of updating existing one. + +**Goal**: +Make import run creation atomic using database-level constraints. + +**What to do**: +1. Replace check-then-insert with INSERT ON CONFLICT: + ```python + await self.db.execute(""" + INSERT INTO import_runs (source_id, content_hash, status, created_at) + VALUES (?, ?, 'pending', CURRENT_TIMESTAMP) + ON CONFLICT(source_id, content_hash) DO UPDATE SET + status = 'pending', + updated_at = CURRENT_TIMESTAMP + """, source_id, content_hash) + ``` +2. Ensure UNIQUE(source_id, content_hash) constraint exists +3. Test concurrent import scenario +4. Handle conflict resolution properly + +**Possible traps and issues**: +- ON CONFLICT syntax varies by database (SQLite vs PostgreSQL) +- Concurrent inserts might still have race windows +- Error handling for constraint violations + +**Docs changes needed**: +- Add concurrency guidelines to development docs +- Document data consistency model + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "10.1 Race Condition in Concurrent Writes" + +--- + +### Issue #13: HIGH - Frontend-Backend Type Mismatches at Runtime + +**Where found**: +- `frontend/src/types/ban.ts` expects `country_code: string | null` +- `backend/app/models/ban.py` could return empty string `""` +- Frontend type narrowing: `if (ban.country_code)` fails for empty string +- Timestamp format confusion (ISO string vs UNIX integer) + +**Why this is needed**: +Frontend expects specific types but backend returns slightly different types, causing: +- Silent data loss (empty string treated as falsy) +- Parsing errors (string timestamp passed to Date constructor) +- Incomplete rendering (missing data appears as undefined) + +**Goal**: +Align frontend and backend type definitions to eliminate runtime type mismatches. + +**What to do**: +1. Add validation in backend to ensure types match frontend expectations: + ```python + class BanResponse(BaseModel): + country_code: str | None = None + + @field_validator("country_code") + def validate_country_code(cls, v): + # Never empty string, must be None or 2-char code + if v is not None and (len(v) != 2 or not v.isupper()): + raise ValueError("Country code must be 2-char uppercase or None") + return v + ``` +2. Standardize timestamp format (use UNIX epoch everywhere) +3. Update frontend types to match backend validation +4. Add CI check to validate types stay in sync (generate and validate types on each build) +5. Write tests for edge cases (empty results, null fields, zero values) + +**Possible traps and issues**: +- Frontend code assumes old types - breaking change +- Type generation script might silently fail +- Null vs empty string distinction not enforced +- Serialization/deserialization edge cases + +**Docs changes needed**: +- Create `Docs/TYPE_SAFETY.md` explaining shared type system +- Add to API documentation type constraints +- Document type generation process in development guide + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "4.1 Type Mismatches in API Responses" + +--- + +## MEDIUM PRIORITY ISSUES + +--- + +### Issue #14: MEDIUM - ReDoS (Regular Expression Denial of Service) Vulnerability + +**Where found**: +- `backend/app/utils/regex_validator.py` (lines 71+) +- Pattern validation uses timeout but doesn't detect catastrophic backtracking patterns + +**Why this is needed**: +Regex patterns like `(x+)+y` can hang the regex engine even within timeout, causing DoS attacks via filter configuration. + +**Goal**: +Detect known ReDoS patterns before compiling them. + +**What to do**: +1. Add regex pattern analysis library: + ```bash + pip install regexploit + ``` +2. Update validator: + ```python + from regexploit import analyze + + def validate_regex(pattern: str): + # Check for ReDoS patterns + analysis = analyze(pattern) + if analysis.has_redos: + raise ValueError(f"ReDoS pattern detected: {analysis.reason}") + + # Also do timeout check + try: + re.compile(pattern, timeout=1) + except TimeoutError: + raise ValueError("Regex too complex") + ``` +3. Test against known ReDoS patterns +4. Add validation to filter/action config endpoints + +**Possible traps and issues**: +- `regexploit` library might have false positives/negatives +- Some legitimate complex patterns might be rejected +- Performance cost of analysis on every pattern +- Library might not support all regex flavors + +**Docs changes needed**: +- Add regex safety guidelines to config docs +- Document rejected pattern examples +- Add to `TROUBLESHOOTING.md` - "Regex pattern rejected" + +**Doc references**: +- DETAILED_FINDINGS.md - Issue #6 "ReDoS Vulnerability" + +--- + +### Issue #15: MEDIUM - N+1 Query Pattern in Geo Lookups + +**Where found**: +- `backend/app/services/ban_service.py` (lines 650-680) +- `_enrich_bans_with_geo()` calls resolver per IP in loop +- 1000 bans = 1000 geo lookups + +**Why this is needed**: +Dashboard becomes slow with many bans: +- Each geo lookup hits database or HTTP API +- 1000 bans = 1000 lookups = seconds of latency +- API rate limiting exceeded if fallback HTTP enabled + +**Goal**: +Batch geo lookups and implement caching to reduce database round-trips. + +**What to do**: +1. Implement batch geo resolution: + ```python + async def enrich_bans_with_geo(bans): + # Extract unique IPs + unique_ips = set(ban.ip for ban in bans) + + # Batch lookup + geo_data = await self.geo_cache.resolve_batch(unique_ips) + + # Attach to bans + for ban in bans: + ban.country = geo_data.get(ban.ip) + ``` +2. Use geo_cache for persistence (cache is already designed for batching) +3. Add metrics for cache hit/miss ratio +4. Implement pre-warming of common IPs +5. Add tests with large ban lists + +**Possible traps and issues**: +- Batch lookup might be slower than individual if dataset is small +- Cache invalidation on country name updates +- Memory usage if caching all IPs + +**Docs changes needed**: +- Add performance optimization guide +- Document geo cache architecture + +**Doc references**: +- DETAILED_FINDINGS.md - Issue #7 "N+1 Query Pattern" + +--- + +### Issue #16: MEDIUM - Silent Failures in Error Handling (Broad Exception Handlers) + +**Where found**: +- `backend/app/routers/config_misc.py` (line 54) - `except Exception:` +- `backend/app/ban_service.py` - Multiple broad exception handlers +- Silent failures hide programming errors + +**Why this is needed**: +Broad `except Exception:` catches programming errors (AttributeError, KeyError) alongside legitimate errors, hiding bugs in logs. + +**Goal**: +Catch only specific exceptions; let programming errors bubble up to global error handler. + +**What to do**: +1. Replace broad handlers with specific exceptions: + ```python + try: + config = parse_config(raw_text) + except ConfigParseError as e: # Specific + logger.error(f"Config parse failed: {e}") + except FileNotFoundError as e: + logger.error(f"Config file not found: {e}") + except Exception as e: + logger.exception("Unexpected error parsing config") + raise # Re-raise to global handler + ``` +2. Create domain-specific exception classes +3. Document what exceptions each function can raise +4. Update tests to verify exception handling + +**Possible traps and issues**: +- Missing exception types will let errors bubble up unexpectedly +- Catching too few exceptions leads to uncaught errors +- Global exception handler needed to catch unhandled exceptions + +**Docs changes needed**: +- Add exception handling guidelines to dev docs +- Create exception taxonomy document + +**Doc references**: +- DETAILED_FINDINGS.md - Issue #16 "Broad Exception Handlers" + +--- + +### Issue #17: MEDIUM - No API Response Status Code Documentation + +**Where found**: +- All routers lack OpenAPI `responses={}` documentation +- Status codes for success/failure not documented +- Frontend must infer from response body + +**Why this is needed**: +Frontend doesn't know: +- Is 400 a validation error or configuration error? +- Is 502 from backend or fail2ban? +- Which 503 status means setup incomplete vs fail2ban down? + +**Goal**: +Document all possible status codes and response formats for each endpoint. + +**What to do**: +1. Add to each router endpoint: + ```python + @router.post( + "/login", + responses={ + 200: {"description": "Login successful", "model": LoginResponse}, + 400: {"description": "Invalid request format"}, + 401: {"description": "Invalid password"}, + 429: {"description": "Too many attempts, retry after 60s"}, + 503: {"description": "Setup not complete"}, + } + ) + ``` +2. Generate OpenAPI schema with descriptions +3. Update API docs with status code reference table +4. Validate in CI that all endpoints documented + +**Possible traps and issues**: +- Documentation might become stale as code changes +- Multiple response types for single status code (must document each) + +**Docs changes needed**: +- Create API reference documenting all status codes +- Add endpoint documentation template + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "2.3 Missing Status Code Documentation" + +--- + +### Issue #18: MEDIUM - Configuration Validation Missing at Startup + +**Where found**: +- `backend/app/config.py` (lines 37-95) +- `database_path` has no validation +- `fail2ban_socket` not verified to exist +- Hard-coded paths assumed in Docker + +**Why this is needed**: +Configuration errors not caught at startup: +- Database path doesn't exist - fails on first DB operation (confusing error) +- fail2ban socket wrong path - only fails when health check runs +- Directory not writable - discovered hours after deployment + +**Goal**: +Validate all configuration at startup with clear error messages. + +**What to do**: +1. Add validators to config fields: + ```python + @field_validator("database_path") + def validate_db_path(cls, v): + path = Path(v) + parent = path.parent + + if not parent.exists(): + raise ValueError( + f"Database parent directory does not exist: {parent}\n" + f"Create it with: mkdir -p {parent}" + ) + + if not os.access(parent, os.W_OK): + raise ValueError( + f"Database directory not writable: {parent}\n" + f"Fix with: chmod 755 {parent}" + ) + + return v + ``` +2. Validate fail2ban socket exists and is readable +3. Verify session secret is sufficiently long +4. Check environment variables are set +5. Provide actionable error messages + +**Possible traps and issues**: +- Validation might be too strict for some deployments +- Need to handle cases where files don't exist yet but will be created +- Docker initialization order might delay file creation + +**Docs changes needed**: +- Document required directories and permissions +- Create setup validation troubleshooting guide + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "5.2 Missing Configuration Validation" + +--- + +### Issue #19: MEDIUM - Sensitive Data Could Leak in Logs + +**Where found**: +- `backend/app/utils/fail2ban_client.py` (line 148) - Logs subprocess output without sanitization +- Could contain passwords, API keys from config files + +**Why this is needed**: +If subprocess output contains secrets, logs become security liability and violate compliance requirements. + +**Goal**: +Sanitize logs to remove sensitive information patterns. + +**What to do**: +1. Create sanitizer function: + ```python + def sanitize_for_logging(text: str) -> str: + # Remove passwords + text = re.sub(r'password[=:]\S+', 'password=***', text, flags=re.IGNORECASE) + # Remove API keys + text = re.sub(r'api[_-]?key[=:]\S+', 'api_key=***', text, flags=re.IGNORECASE) + # Remove tokens + text = re.sub(r'token[=:]\S+', 'token=***', text, flags=re.IGNORECASE) + return text + ``` +2. Apply to all subprocess output and external responses +3. Add to logging middleware +4. Audit existing logs for sensitive data + +**Possible traps and issues**: +- Patterns might miss some sensitive data formats +- Over-sanitization might hide helpful debug info +- Performance cost if sanitizing large outputs + +**Docs changes needed**: +- Add logging best practices guide +- Document what's sanitized + +**Doc references**: +- DETAILED_FINDINGS.md - Issue #25 "Sensitive Data in Logs" + +--- + +### Issue #20: MEDIUM - No Correlation ID in Background Tasks + +**Where found**: +- All task files in `backend/app/tasks/` +- Background tasks don't propagate correlation ID +- Can't correlate task logs with triggering request + +**Why this is needed**: +Troubleshooting becomes hard: +- Task fails +- Logs show task name +- Can't find what triggered it + +**Goal**: +Track correlation ID through entire request lifecycle including background tasks. + +**What to do**: +1. Use contextvars for correlation ID: + ```python + from contextvars import ContextVar + + correlation_id_var: ContextVar[str] = ContextVar('correlation_id', default='bg-task') + + async def blocklist_import_task(source_id: str, correlation_id: str): + token = correlation_id_var.set(correlation_id) + try: + logger.info(f"Starting import for source {source_id}") + finally: + correlation_id_var.reset(token) + ``` +2. Pass correlation ID to background tasks +3. Include in structured logs +4. Create task tracking UI showing correlation ID + +**Possible traps and issues**: +- Correlation ID must flow through all async contexts +- Need to pass ID when scheduling tasks +- Multiple nested tasks might have parent/child correlation IDs + +**Docs changes needed**: +- Add observability guide for background tasks +- Document correlation ID format + +**Doc references**: +- DETAILED_FINDINGS.md - Issue #26 "Missing Correlation ID" + +--- + +### Issue #21: MEDIUM - Missing Database Indexes for Performance + +**Where found**: +- fail2ban's `bans.timeofban` column likely has no index +- BanGUI queries that filter by timeofban + jail do full table scan +- Large datasets cause slow dashboard loads + +**Why this is needed**: +Without indexes on commonly filtered columns, query performance degrades as dataset grows. + +**Goal**: +Add appropriate indexes to improve query performance. + +**What to do**: +1. Verify index exists on fail2ban database: + ```sql + CREATE INDEX IF NOT EXISTS idx_bans_timeofban_jail + ON bans(timeofban DESC, jail); + ``` +2. Check BanGUI database for missing indexes +3. Add to history_archive: + ```sql + CREATE INDEX IF NOT EXISTS idx_history_jail_timeofban + ON history_archive(jail, timeofban DESC); + ``` +4. Monitor query performance before/after +5. Add index creation to migration system + +**Possible traps and issues**: +- Creating indexes on large tables locks table (need CONCURRENT on PostgreSQL) +- Wrong index type slows queries instead of helping +- Too many indexes slow writes + +**Docs changes needed**: +- Document query optimization patterns +- Performance tuning guide + +**Doc references**: +- DETAILED_FINDINGS.md - Issue #10 "Missing DB Index" + +--- + +### Issue #22: MEDIUM - Socket Cleanup Errors Silently Suppressed + +**Where found**: +- `backend/app/utils/fail2ban_client.py` (line 150) +- Uses `contextlib.suppress(Exception)` hiding close errors +- Resource leaks possible + +**Why this is needed**: +Suppressing all errors hides real problems: +- Socket never actually closes +- Connection pool exhaustion +- File descriptors leak + +**Goal**: +Log errors properly while still cleaning up resources. + +**What to do**: +1. Replace suppress with logging: + ```python + finally: + try: + await socket.close() + except Exception as e: + logger.warning(f"Error closing fail2ban socket: {e}", exc_info=True) + ``` +2. Audit other resource cleanup (database connections, HTTP sessions) +3. Write tests that verify cleanup happens even on errors + +**Possible traps and issues**: +- Close might raise unexpected errors +- Logging exceptions in finally blocks can cause issues +- Resource exhaustion hard to debug if not logging + +**Docs changes needed**: +- Add resource cleanup guidelines to dev guide + +**Doc references**: +- DETAILED_FINDINGS.md - Issue #14 "Socket Cleanup Silent Fail" + +--- + +### Issue #23: MEDIUM - Missing Default Configuration Documentation + +**Where found**: +- `.env.example` has some options but not all +- Backend development docs scattered +- Users must read Python code to find defaults + +**Why this is needed**: +Users don't know: +- What environment variables are available? +- What are the defaults? +- What values are valid? + +**Goal**: +Create comprehensive configuration reference documentation. + +**What to do**: +1. Create `Docs/CONFIGURATION.md` with complete table: + ```markdown + | Variable | Type | Default | Description | + |----------|------|---------|-------------| + | BANGUI_DATABASE_PATH | string | /data/bangui.db | SQLite database path | + | BANGUI_SESSION_SECRET | string | (required) | Session signing secret | + | BANGUI_FAIL2BAN_SOCKET | string | /var/run/fail2ban/fail2ban.sock | fail2ban socket | + ``` +2. Document each option with valid values and constraints +3. Organize by section (database, security, performance, etc.) +4. Cross-reference in README and deployment docs + +**Possible traps and issues**: +- Documentation can become stale as config options change +- Too much detail makes it hard to find what's needed + +**Docs changes needed**: +- Create comprehensive configuration reference +- Update README to link to it +- Add to API documentation + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "5.1, 5.5, 5.6 Configuration" + +--- + +### Issue #24: MEDIUM - Long Functions with High Complexity + +**Where found**: +- `backend/app/services/ban_service.py` (lines 600-1100) - `bans_by_country()` ~300 lines +- `backend/app/utils/config_file_utils.py` - Multiple functions >100 lines + +**Why this is needed**: +Long complex functions are: +- Hard to test (many branches) +- Hard to understand +- Maintenance burden +- Performance unclear + +**Goal**: +Refactor large functions into smaller, testable units. + +**What to do**: +1. Identify functions >100 lines +2. Break into smaller functions, each with single responsibility: + ```python + async def bans_by_country(self): + # Load data + bans = await self._load_bans_paginated() + + # Aggregate + by_country = self._aggregate_by_country(bans) + + # Enrich + enriched = await self._enrich_with_geo(by_country) + + # Sort and return + return self._sort_by_count(enriched) + ``` +3. Each smaller function is testable independently +4. Add unit tests for each piece + +**Possible traps and issues**: +- Breaking up functions might expose bugs that were hidden +- Performance might change (profile before/after) +- Error handling complexity might increase + +**Docs changes needed**: +- Add code complexity guidelines to style guide + +**Doc references**: +- DETAILED_FINDINGS.md - Issue #19 "Long Functions" + +--- + +### Issue #25: MEDIUM - Incomplete Type Hints in Error Handling + +**Where found**: +- `backend/app/main.py` (line 283) +- Error metadata uses `dict[str, str | int | list[str]]` instead of TypedDict + +**Why this is needed**: +Generic types don't enable proper type narrowing in exception handlers. Code can't safely access error fields. + +**Goal**: +Use TypedDict for type-safe error responses. + +**What to do**: +1. Define error response types: + ```python + class ErrorResponse(TypedDict): + error_id: str + timestamp: int + message: str + tracebacks: list[str] + correlation_id: str + ``` +2. Use in exception handlers +3. Type checker can verify correct field access + +**Possible traps and issues**: +- TypedDict is Python 3.8+ only +- Need to maintain multiple error response types + +**Docs changes needed**: +- Add type safety guidelines + +**Doc references**: +- DETAILED_FINDINGS.md - Issue #21 "Incomplete Type Hints" + +--- + +### Issue #26: MEDIUM - Hardcoded Constants Not Configurable + +**Where found**: +- `backend/app/utils/constants.py` +- MAX_PAGE_SIZE = 1000 +- BLOCKLIST_PREVIEW_MAX_LINES = 100 +- HISTORY_RETENTION_DAYS = 90 + +**Why this is needed**: +Different deployments have different needs: +- Large deployment might want smaller pages +- User might want different preview size +- Some want longer history retention + +**Goal**: +Make limits configurable via environment variables. + +**What to do**: +1. Move constants to config: + ```python + class Settings(BaseSettings): + max_page_size: int = Field(default=1000, env="BANGUI_MAX_PAGE_SIZE") + blocklist_preview_max_lines: int = Field(default=100, env="BANGUI_PREVIEW_MAX_LINES") + history_retention_days: int = Field(default=90, env="BANGUI_HISTORY_RETENTION") + ``` +2. Validate ranges (max_page_size > 0, < 10000) +3. Update .env.example with all options +4. Document in configuration guide + +**Possible traps and issues**: +- Too many configuration options can be overwhelming +- Some limits have dependencies (page_size < max_records) + +**Docs changes needed**: +- Add to configuration reference + +**Doc references**: +- DETAILED_FINDINGS.md - Issue #20 "Hardcoded Constants" + +--- + +### Issue #27: MEDIUM - Inconsistent Error Handling Patterns + +**Where found**: +- `ban_service.py` - Raises exceptions +- `server_service.py` - Returns defaults silently +- `jail_service.py` - Mixed approach + +**Why this is needed**: +Different services have different error handling contracts. Callers don't know what to expect. + +**Goal**: +Establish clear error handling contract for all services. + +**What to do**: +1. Document error handling patterns: + ```python + class ServiceErrorContract: + """ + ABORT_ON_ERROR: Raise exception, let router handle + RETURN_DEFAULT: Return empty result, log warning + PARTIAL_RESULT: Return partial success with error list + """ + ``` +2. Each service method documents which pattern it follows +3. Routers handle errors consistently +4. Update service protocols + +**Possible traps and issues**: +- Users of service must know pattern +- Switching patterns is breaking change + +**Docs changes needed**: +- Create service development guide with error patterns + +**Doc references**: +- DETAILED_FINDINGS.md - Issue "Inconsistent Error Handling" + +--- + +## LOWER PRIORITY ISSUES (LOW-MEDIUM) + +--- + +### Issue #28: LOW-MEDIUM - Missing Pre-Commit Hooks + +**Where found**: +- No `.pre-commit-config.yaml` +- Docs mention husky but no `.husky/` directory + +**Why this is needed**: +Without pre-commit hooks, developers commit code that fails linting/tests, slowing down CI. + +**Goal**: +Enforce code quality checks before commit. + +**What to do**: +1. Create `.pre-commit-config.yaml`: + ```yaml + repos: + - repo: https://github.com/pre-commit/pre-commit-hooks + hooks: + - id: check-yaml + - id: end-of-file-fixer + - repo: https://github.com/astral-sh/ruff-pre-commit + hooks: + - id: ruff + - id: ruff-format + ``` +2. Setup husky for frontend +3. Document in CONTRIBUTING.md + +**Docs changes needed**: +- Create CONTRIBUTING.md with setup instructions + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "12.1 Missing Pre-Commit Hooks" + +--- + +### Issue #29: LOW-MEDIUM - Missing .editorconfig + +**Where found**: +- No `.editorconfig` file + +**Why this is needed**: +Different developers use different editors with different default formatting, causing inconsistent code. + +**Goal**: +Enforce consistent formatting across all editors. + +**What to do**: +1. Create `.editorconfig`: + ```ini + root = true + + [*] + charset = utf-8 + end_of_line = lf + insert_final_newline = true + + [*.py] + indent_style = space + indent_size = 4 + + [*.{js,ts,tsx,jsx}] + indent_style = space + indent_size = 2 + ``` +2. Add editorconfig plugin to IDE guides + +**Docs changes needed**: +- Add to development setup instructions + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "12.3 Missing .editorconfig" + +--- + +### Issue #30: LOW-MEDIUM - IPv4-Mapped IPv6 Address Duplicates + +**Where found**: +- `backend/app/utils/ip_utils.py` +- Treats "192.168.1.1" and "::ffff:192.168.1.1" as different IPs + +**Why this is needed**: +Same IP can be banned twice in different formats, causing: +- Duplicate ban logs +- Geo cache duplicates +- Analytics skewed + +**Goal**: +Normalize IP addresses to canonical form. + +**What to do**: +1. Add normalization: + ```python + def normalize_ip(ip_str: str) -> str: + ip = ipaddress.ip_address(ip_str) + # Convert IPv4-mapped IPv6 to IPv4 + if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped: + return str(ip.ipv4_mapped) + return str(ip) + ``` +2. Apply on all IP inputs (ban, import, etc.) +3. Test with various formats + +**Docs changes needed**: +- Document IP normalization + +**Doc references**: +- DETAILED_FINDINGS.md - Issue #22 "IPv4-Mapped IPv6" + +--- + +### Issue #31: LOW-MEDIUM - Weak Master Password Validation + +**Where found**: +- `backend/app/models/setup.py` (line 22) +- Requires uppercase, digit, special char but no dictionary check + +**Why this is needed**: +Passwords can still be weak (e.g., "Password1!" is common). + +**Goal**: +Prevent common passwords. + +**What to do**: +1. Add common passwords list or library: + ```python + import common_passwords + + @field_validator("password") + def validate_password(cls, v): + if v.lower() in common_passwords.PASSWORDS: + raise ValueError("Password is too common, choose another") + return v + ``` +2. Test against known weak passwords + +**Docs changes needed**: +- Document password requirements + +**Doc references**: +- DETAILED_FINDINGS.md - Issue #23 "Weak Password Validation" + +--- + +### Issue #32: LOW-MEDIUM - Missing Accessibility Features + +**Where found**: +- `frontend/src/components/BanTable.tsx` - No aria-label on table +- `frontend/src/pages/HistoryPage.tsx` - Button has tabIndex but no onKeyDown handler +- World map missing alt text + +**Why this is needed**: +Application not usable by screen reader users or keyboard-only navigation. + +**Goal**: +Improve accessibility to WCAG AA compliance. + +**What to do**: +1. Add ARIA labels to major components +2. Implement keyboard navigation handlers +3. Test with screen readers +4. Check color contrast ratios + +**Docs changes needed**: +- Add accessibility guidelines + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "11 Accessibility" + +--- + +### Issue #33: LOW - Missing Architecture Decision Records (ADRs) + +**Where found**: +- No `Docs/adr/` directory + +**Why this is needed**: +New developers don't understand architectural choices, recreate debates, make wrong assumptions. + +**Goal**: +Document important decisions and their rationale. + +**What to do**: +1. Create `Docs/adr/` directory +2. Add ADRs for major decisions: + - Why SQLite instead of PostgreSQL? + - Why FastAPI instead of Django? + - Why React instead of Vue? + - Why APScheduler instead of Celery? + - Why single-instance scheduler? + +**Docs changes needed**: +- Create ADR template and examples + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "8.5 Missing ADRs" + +--- + +### Issue #34: LOW - No Troubleshooting Guide + +**Where found**: +- Missing `Docs/TROUBLESHOOTING.md` + +**Why this is needed**: +Users can't self-serve on common issues, create issues instead. + +**Goal**: +Document common problems and solutions. + +**What to do**: +1. Create `Docs/TROUBLESHOOTING.md` with: + - "502 Bad Gateway" - backend is down or not ready + - "Permission denied" - database directory not writable + - "fail2ban not found" - socket path wrong + - "Geo lookups empty" - GeoLite2 database missing + - "Rate limited (429)" - too many requests +2. Expand based on real user issues + +**Docs changes needed**: +- Create comprehensive troubleshooting guide + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "8.3 No Troubleshooting" + +--- + +### Issue #35: LOW - Missing Upgrade/Migration Guide + +**Where found**: +- No `Docs/UPGRADING.md` + +**Why this is needed**: +Users don't know how to safely upgrade without losing data. + +**Goal**: +Document upgrade process and breaking changes. + +**What to do**: +1. Create `Docs/UPGRADING.md` with: + - Backup procedure + - Breaking changes for each version + - Step-by-step upgrade procedure + - Rollback procedure if something goes wrong + +**Docs changes needed**: +- Create upgrade guide for each major version + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "8.5 No Migration Guide" + +--- + +### Issue #36: LOW - No Backup Strategy Documented + +**Where found**: +- No backup procedure in deployment docs +- No automated backup in Docker Compose + +**Why this is needed**: +Users don't know how to protect their data. + +**Goal**: +Document and automate database backups. + +**What to do**: +1. Create `Docs/BACKUP_RESTORE.md` +2. Add backup script to Docker +3. Document retention policy +4. Document restore procedure + +**Docs changes needed**: +- Create backup & restore guide + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "10.4 No Backup Strategy" + +--- + +### Issue #37: LOW - Missing CONTRIBUTING.md + +**Where found**: +- `fail2ban-master/CONTRIBUTING.md` is from fail2ban, not BanGUI + +**Why this is needed**: +Contributors don't know project guidelines. + +**Goal**: +Document contribution guidelines. + +**What to do**: +1. Create `CONTRIBUTING.md` with: + - Development setup + - Branch naming conventions + - PR requirements + - Code style guidelines + - Testing requirements + - PR review process + +**Docs changes needed**: +- Create CONTRIBUTING.md + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "12.5 No CONTRIBUTING.md" + +--- + +### Issue #38: LOW - No Test Coverage Minimum Enforced + +**Where found**: +- `backend/pyproject.toml` - Coverage report generated but no minimum threshold +- CI doesn't fail on low coverage + +**Why this is needed**: +Code quality can degrade as coverage drops. + +**Goal**: +Enforce minimum test coverage. + +**What to do**: +1. Set minimum coverage threshold in CI (e.g., 80%) +2. Fail build if coverage drops below threshold +3. Add coverage badge to README + +**Docs changes needed**: +- Document testing requirements + +**Doc references**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "12.6 Test Coverage Not Enforced" + +--- + +## DOCUMENTATION GAPS (Cross-Cutting) + +--- + +### Issue #39: DOCUMENTATION - Missing API Reference + +**Files affected**: All routers + +**Create**: Comprehensive API reference documenting: +- All endpoints +- Request/response formats +- Status codes +- Examples + +**References**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "8.1 Missing API Documentation" + +--- + +### Issue #40: DOCUMENTATION - Missing Deployment Best Practices + +**Files affected**: `Docs/Deployment.md`, Docker configuration + +**Create/Update**: +- Security best practices +- Performance tuning +- Monitoring setup +- Scaling guidelines + +**References**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "6 Build & Deployment" + +--- + +### Issue #41: DOCUMENTATION - Missing Database Schema Documentation + +**Create**: Document: +- All tables and their purpose +- Relationships and constraints +- Indexes and performance notes +- Migration history + +**References**: +- DATABASE_API_DEPLOYMENT_ISSUES.md - Issue "1 Database Design" diff --git a/Docs/runner.csx b/Docs/runner.csx index 95c9d06..c26643a 100644 --- a/Docs/runner.csx +++ b/Docs/runner.csx @@ -48,12 +48,12 @@ async Task RunCopilot(IEnumerable extraArgs, string prompt) { var output = new StringBuilder(); - var argList = new List { "--model", "auto", "--allow-all-tools" }; + var argList = new List { "launch", "copilot", "--model", "minimax-m2.7:cloud", "--yes", "--", "--allow-all-tools" }; argList.AddRange(extraArgs); argList.Add("-p"); argList.Add(prompt); - var psi = new ProcessStartInfo("copilot") + var psi = new ProcessStartInfo("ollama") { WorkingDirectory = repoRoot, RedirectStandardOutput = true,