feat: implement API versioning /api/v1/
- All backend routers moved to /api/v1/ prefix
- Frontend BASE_URL updated to /api/v1
- Setup redirect middleware updated to redirect to /api/v1/setup
- Health router path fixed: prefix=/api/v1/health, @router.get('')
- conftest.py: set server_status=online for test fixture
- Created Docs/API_VERSIONING.md with deprecation policy
- Updated Docs/Backend-Development.md with versioning section
- Updated Instructions.md curl examples
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
125
Docs/API_VERSIONING.md
Normal file
@@ -0,0 +1,125 @@
|
||||
# API Versioning Strategy
|
||||
|
||||
**Status:** Active — Current version: **v1**
|
||||
|
||||
All BanGUI API endpoints are versioned using URI path versioning (e.g., `/api/v1/`).
|
||||
This document explains when and how to version endpoints, how deprecation works, and what guarantees consumers can rely on.
|
||||
|
||||
---
|
||||
|
||||
## 1. Version Lifecycle
|
||||
|
||||
| Stage | Meaning |
|
||||
|-------|---------|
|
||||
| **Current** | Active, receiving new features and bug fixes. |
|
||||
| **Deprecated** | Still functional but marked for removal. Clients receive `Deprecation: true` and `Sunset: <date>` response headers. |
|
||||
| **Removed** | Endpoint no longer exists. Clients must migrate to a newer version. |
|
||||
|
||||
---
|
||||
|
||||
## 2. URL Structure
|
||||
|
||||
```
|
||||
/api/v{major}/<resource>/<path>
|
||||
```
|
||||
|
||||
- **v1** — current version (2026-05-02)
|
||||
- **v2** — reserved for future breaking changes
|
||||
- **Minor** and **patch** versions (v1.1, v1.2) are **not** used; only **major** version bumps indicate breaking changes
|
||||
- The OpenAPI schema is always available at `/api/openapi.json` regardless of version
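
As an illustration of the URL structure above, here is a minimal sketch that splits a request path into its major version and the remaining resource path. The helper name `split_versioned_path` is hypothetical and not part of BanGUI; it only assumes the `/api/v{major}/...` layout described in this section.

```python
import re

# Matches paths such as /api/v1/jails; only a major version number is accepted,
# so /api/v1.1/... is rejected on purpose.
_VERSION_RE = re.compile(r"^/api/v(?P<major>[1-9]\d*)(?P<rest>/.*)?$")


def split_versioned_path(path: str) -> tuple[int, str]:
    """Return (major_version, remainder) for a versioned API path.

    Raises ValueError for unversioned paths and for minor/patch style versions.
    """
    match = _VERSION_RE.match(path)
    if match is None:
        raise ValueError(f"not a versioned API path: {path!r}")
    return int(match.group("major")), match.group("rest") or "/"
```

Unversioned paths such as `/api/openapi.json` raise `ValueError` in this sketch, which matches the rule that the schema lives outside the versioned prefix.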
|
||||
|
||||
---
|
||||
|
||||
## 3. What Triggers a Version Bump
|
||||
|
||||
A new major version is required when a **breaking change** must be introduced, including:
|
||||
|
||||
- Removing or renaming a field in a response model
|
||||
- Changing the type of a request or response field
|
||||
- Removing an endpoint entirely
|
||||
- Changing authentication/authorization semantics
|
||||
- Modifying the semantics of an existing operation
|
||||
|
||||
**Non-breaking changes** (backward-compatible):
|
||||
|
||||
- Adding new optional request fields
|
||||
- Adding new response fields
|
||||
- Adding new endpoints
|
||||
- Fixing bugs that caused incorrect behavior
|
||||
|
||||
These do **not** require a version bump.
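
The rules above can be sketched as a small check over response-model fields. This is an assumption-laden illustration, not BanGUI code: `is_breaking_change` is a hypothetical helper, and a model is represented as a plain mapping from field name to type name. It covers only the removed, renamed, and retyped field rules; semantic changes still need human review.

```python
def is_breaking_change(old_fields: dict[str, str], new_fields: dict[str, str]) -> bool:
    """Compare two response-model field maps (field name -> type name).

    Returns True when a field was removed/renamed or changed type,
    False when the new model only adds fields or is unchanged.
    """
    for name, old_type in old_fields.items():
        if name not in new_fields:
            return True  # field removed or renamed
        if new_fields[name] != old_type:
            return True  # type of an existing field changed
    return False  # only additions, or no change at all
```

For example, adding a `country` field to a ban record is non-breaking, while renaming `jail` to `jail_name` would require `/api/v2/`.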
|
||||
|
||||
---
|
||||
|
||||
## 4. Deprecation Policy
|
||||
|
||||
When an endpoint is deprecated:
|
||||
|
||||
1. The endpoint **remains functional** for a minimum of **6 months** after deprecation is announced; the `Sunset` date marks the earliest removal
|
||||
2. Response headers are added:
|
||||
```
|
||||
Deprecation: true
|
||||
Sunset: <HTTP-date, as defined in RFC 8594>
|
||||
Link: <https://bangui.example.com/api/v2/...>; rel="successor-version"
|
||||
```
|
||||
3. The OpenAPI schema marks the endpoint with `deprecated: true`
|
||||
4. Documentation is updated to show the endpoint as deprecated
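
A minimal sketch of how the three response headers could be built, using only the standard library. The function name `deprecation_headers` and the example successor URL are assumptions for illustration; how the headers are attached to responses (middleware, dependency, or per-handler) is left open here.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def deprecation_headers(sunset: datetime, successor_url: str) -> dict[str, str]:
    """Build the deprecation response headers for a deprecated endpoint.

    `sunset` must be timezone-aware; it is converted to UTC and rendered
    in the "Mon, 02 Nov 2026 00:00:00 GMT" style used by HTTP dates.
    """
    if sunset.tzinfo is None:
        raise ValueError("sunset must be timezone-aware")
    sunset_utc = sunset.astimezone(timezone.utc)
    return {
        "Deprecation": "true",
        "Sunset": format_datetime(sunset_utc, usegmt=True),
        "Link": f'<{successor_url}>; rel="successor-version"',
    }
```

Requiring a timezone-aware `datetime` avoids silently publishing a sunset time in the server's local timezone.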
|
||||
|
||||
---
|
||||
|
||||
## 5. Backend Development: Adding Versioned Endpoints
|
||||
|
||||
### New endpoints
|
||||
|
||||
All new endpoints are added to the **current** version (`/api/v1/`). Prefix your router:
|
||||
|
||||
```python
|
||||
from fastapi import APIRouter
router = APIRouter(prefix="/api/v1/my-resource", tags=["My Resource"])
|
||||
```
|
||||
|
||||
### Breaking changes requiring v2
|
||||
|
||||
1. Create a new router file (e.g., `routers/my_resource_v2.py`) with the v2 prefix:
|
||||
```python
|
||||
router = APIRouter(prefix="/api/v2/my-resource", tags=["My Resource"])
|
||||
```
|
||||
2. Copy or adapt the v1 handler logic as needed
|
||||
3. Register the new router in `app/main.py`:
|
||||
```python
|
||||
app.include_router(my_resource_v2.router)
|
||||
```
|
||||
4. Mark the **old** v1 router as deprecated in the OpenAPI spec and add the deprecation response headers from section 4 to its responses
|
||||
5. Update this document to reflect the new version lifecycle
|
||||
|
||||
### Keeping routers DRY
|
||||
|
||||
If v1 and v2 share logic, extract business logic into a **service layer function** and call it from both router handlers. Routers should only contain HTTP concerns (parameters, responses, status codes).
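
The split between service layer and version-specific handlers might look like the sketch below. Everything in it is hypothetical (the `Jail` model, the in-memory `store`, and the v2 response shape); framework wiring is omitted so that the example stays runnable on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Jail:
    name: str
    banned: int


def list_jails_service(store: dict[str, int]) -> list[Jail]:
    """Business logic shared by every API version (hypothetical example)."""
    return [Jail(name, banned) for name, banned in sorted(store.items())]


def list_jails_v1(store: dict[str, int]) -> dict:
    """v1 handler: keeps the original response shape."""
    jails = list_jails_service(store)
    return {"jails": [{"name": j.name, "banned": j.banned} for j in jails]}


def list_jails_v2(store: dict[str, int]) -> dict:
    """v2 handler: an imagined breaking change that renames fields."""
    jails = list_jails_service(store)
    return {
        "items": [{"jail": j.name, "ban_count": j.banned} for j in jails],
        "total": len(jails),
    }
```

Both handlers only translate the shared result into their own response shape, so a bug fix in `list_jails_service` reaches every version at once.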
|
||||
|
||||
---
|
||||
|
||||
## 6. Frontend Development
|
||||
|
||||
The frontend always uses the current version's base URL:
|
||||
|
||||
```typescript
|
||||
const BASE_URL: string = import.meta.env.VITE_API_URL ?? "/api/v1";
|
||||
```
|
||||
|
||||
All endpoint paths in `frontend/src/api/endpoints.ts` are defined as relative paths (e.g., `/bans`, `/jails`) and are appended to `BASE_URL` at runtime.
|
||||
|
||||
---
|
||||
|
||||
## 7. OpenAPI / Documentation
|
||||
|
||||
- Swagger UI: `/api/docs`
|
||||
- ReDoc: `/api/redoc`
|
||||
- OpenAPI schema: `/api/openapi.json`
|
||||
- Docs are **not** versioned; they always reflect the **current** (latest) API version
|
||||
|
||||
---
|
||||
|
||||
## 8. Version History
|
||||
|
||||
| Version | Status | Released | Sunset Date | Notes |
|
||||
|---------|--------|---------|-------------|-------|
|
||||
| v1 | **Current** | 2026-05-02 | — | Initial versioning; all endpoints moved from `/api/` to `/api/v1/` |
|
||||
@@ -260,6 +260,50 @@ For `history_archive`, the read-heavy workload justifies these indexes because:
|
||||
|
||||
---
|
||||
|
||||
## 7.6 Never Load Unbounded Result Sets
|
||||
|
||||
**Problem:** Loading large result sets entirely into Python memory causes:
|
||||
- Memory spikes that crash containers
|
||||
- Slow dashboard performance
|
||||
- Unbounded database file growth
|
||||
|
||||
**Rule:** Never load unbounded result sets. Always use SQL aggregation or pagination.
|
||||
|
||||
**Anti-pattern and preferred alternative:**
|
||||
|
||||
```python
|
||||
# BAD — loads all rows into memory
|
||||
all_rows = await history_archive_repo.get_all_archived_history(db=db, ...)
|
||||
|
||||
# GOOD — SQL aggregation returns lightweight counts
|
||||
ip_counts = await history_archive_repo.get_ip_ban_counts(db=db, ...)
|
||||
```
|
||||
|
||||
**SQL aggregation patterns for common operations:**
|
||||
|
||||
| Operation | SQL Pattern | Repository Function |
|
||||
|-----------|-------------|---------------------|
|
||||
| Count by IP | `SELECT ip, COUNT(*) FROM bans GROUP BY ip` | `get_ip_ban_counts()` |
|
||||
| Count by jail | `SELECT jail, COUNT(*) FROM bans GROUP BY jail` | `get_jail_ban_counts()` |
|
||||
| Count by time bucket | `SELECT CAST((timeofban - ?) / ? AS INTEGER) AS bucket_idx, COUNT(*) ... GROUP BY bucket_idx` | `get_ban_counts_by_bucket()` |
|
||||
| Paginated rows | `WHERE id < ? ORDER BY id DESC LIMIT ?` | `get_archived_history_keyset()` |
|
||||
|
||||
**When to use SQL aggregation:**
|
||||
- Computing totals, counts, or aggregations for display
|
||||
- Building country/jail/geo maps from large datasets
|
||||
- Any endpoint that needs only a summary, not full row data
|
||||
|
||||
**When to use pagination:**
|
||||
- Endpoints that return individual records for display (ban lists, history)
|
||||
- Any endpoint where clients need access to specific rows
|
||||
|
||||
**Memory budgets for reference:**
|
||||
- 1M ban records ≈ 200-400 MB if fully materialized as Python dicts
|
||||
- SQL aggregation returns lightweight results: {ip, count} pairs = a few KB for same 1M records
|
||||
- Keyset pagination returns only the page size (typically 50-200 rows)
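
To make the aggregation patterns concrete, here is a minimal sketch using the standard-library `sqlite3` module. The function bodies are assumptions written for illustration and are not the actual `history_archive_repo` implementations; only the table and column names (`history_archive`, `ip`, `timeofban`) are taken from the patterns above.

```python
import sqlite3


def ip_ban_counts(conn: sqlite3.Connection, since: int) -> list[tuple[str, int]]:
    """Count bans per IP inside SQLite instead of looping over rows in Python."""
    cur = conn.execute(
        "SELECT ip, COUNT(*) AS event_count FROM history_archive "
        "WHERE timeofban >= ? GROUP BY ip ORDER BY event_count DESC, ip",
        (since,),
    )
    return cur.fetchall()


def ban_counts_by_bucket(
    conn: sqlite3.Connection, origin: int, bucket_secs: int
) -> dict[int, int]:
    """Map bucket index -> ban count for fixed-width time buckets after `origin`."""
    cur = conn.execute(
        "SELECT CAST((timeofban - ?) / ? AS INTEGER) AS bucket_idx, COUNT(*) "
        "FROM history_archive WHERE timeofban >= ? GROUP BY bucket_idx",
        (origin, bucket_secs, origin),
    )
    return dict(cur.fetchall())
```

Only one row per distinct IP or per bucket crosses from SQLite into Python, so memory use follows the number of groups rather than the number of ban records.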
|
||||
|
||||
---
|
||||
|
||||
## 3. Project Structure
|
||||
|
||||
```
|
||||
@@ -1840,12 +1884,14 @@ async def client() -> AsyncClient:
|
||||
|
||||
 @pytest.mark.asyncio
 async def test_list_jails_returns_200(client: AsyncClient) -> None:
-    response = await client.get("/api/jails/")
+    response = await client.get("/api/v1/jails/")
     assert response.status_code == 200
     data: dict = response.json()
     assert "jails" in data
|
||||
```
|
||||
|
||||
See [API_VERSIONING.md](API_VERSIONING.md) for the full versioning strategy, deprecation policy, and instructions for adding versioned endpoints.
|
||||
|
||||
---
|
||||
|
||||
## 9.1 Background Tasks and Scheduler Architecture
|
||||
|
||||
@@ -230,11 +230,11 @@ The session cookie is named `bangui_session`.
|
||||
```bash
|
||||
 # Dev master password: Hallo123!
 HASHED=$(echo -n "Hallo123!" | sha256sum | awk '{print $1}')
-TOKEN=$(curl -s -X POST http://127.0.0.1:8000/api/auth/login \
+TOKEN=$(curl -s -X POST http://127.0.0.1:8000/api/v1/auth/login \
   -H 'Content-Type: application/json' \
   -d "{\"password\":\"$HASHED\"}" \
   | python3 -c 'import sys,json; print(json.load(sys.stdin)["token"])')

 # Use token in subsequent requests:
-curl -H "Cookie: bangui_session=$TOKEN" http://127.0.0.1:8000/api/dashboard/status
+curl -H "Cookie: bangui_session=$TOKEN" http://127.0.0.1:8000/api/v1/dashboard/status
|
||||
```
|
||||
|
||||
98
Docs/PERFORMANCE.md
Normal file
@@ -0,0 +1,98 @@
|
||||
# Performance Guidelines
|
||||
|
||||
Query optimization patterns for BanGUI backend services.
|
||||
|
||||
---
|
||||
|
||||
## Never Load Unbounded Result Sets
|
||||
|
||||
Loading large result sets into Python memory causes OOM crashes, slow responses, and unbounded growth. Every query that processes large datasets must use one of the following strategies.
|
||||
|
||||
### The Problem
|
||||
|
||||
With millions of ban records:
|
||||
- Loading all rows as Python dicts → 200-400 MB+ memory spike
|
||||
- Python loop aggregation (O(n) over all rows, with interpreter overhead per row) → seconds of CPU time
|
||||
- Offset pagination on large tables → O(n) scan before returning results
|
||||
|
||||
### The Solution: SQL Aggregation
|
||||
|
||||
SQL `GROUP BY` runs inside SQLite's query engine, uses indexes where the query planner finds them useful, and returns only the aggregated result (typically a few KB).
|
||||
|
||||
```python
|
||||
# BAD: loads 1M rows into Python
|
||||
all_rows = await get_all_archived_history(db, since=since)
|
||||
agg = {}
|
||||
for row in all_rows: # O(n) Python loop
|
||||
    agg[row["ip"]] = agg.get(row["ip"], 0) + 1
|
||||
|
||||
# GOOD: SQL aggregation, returns lightweight {ip, count} pairs
|
||||
ip_counts = await get_ip_ban_counts(db, since=since)
|
||||
# [{"ip": "1.2.3.4", "event_count": 42}, ...]; size grows with distinct IPs, not with row count
|
||||
```
|
||||
|
||||
### Aggregation Reference
|
||||
|
||||
| Use Case | SQL Pattern | Repository Function |
|
||||
|----------|-------------|-------------------|
|
||||
| Ban count per IP | `SELECT ip, COUNT(*) FROM history_archive ... GROUP BY ip` | `get_ip_ban_counts()` |
|
||||
| Ban count per jail | `SELECT jail, COUNT(*) FROM history_archive ... GROUP BY jail ORDER BY COUNT(*) DESC` | `get_jail_ban_counts()` |
|
||||
| Ban count per time bucket | `SELECT CAST((timeofban - ?) / ? AS INTEGER) AS bucket_idx, COUNT(*) ... GROUP BY bucket_idx` | `get_ban_counts_by_bucket()` |
|
||||
| Paginated rows (no offset) | `WHERE id < ? ORDER BY id DESC LIMIT ?` | `get_archived_history_keyset()` |
|
||||
| Total count | `SELECT COUNT(*) FROM ...` (fast when the `WHERE` clause can use an index) | included in `get_jail_ban_counts()` return |
|
||||
|
||||
### Pagination vs Aggregation
|
||||
|
||||
Use **aggregation** when:
|
||||
- Displaying summary data (counts, totals, group-by results)
|
||||
- Building country/jail/timeline dashboards
|
||||
- You only need counts, not individual row data
|
||||
|
||||
Use **pagination** when:
|
||||
- Displaying individual records (ban list, history)
|
||||
- Clients need access to specific rows
|
||||
- Exporting or bulk operations
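
For the pagination case, a keyset ("no offset") page fetch could be sketched as follows with the standard-library `sqlite3` module. `fetch_page` is a hypothetical helper, not the real `get_archived_history_keyset()`; it assumes `id` is the integer primary key of `history_archive`.

```python
import sqlite3
from typing import Optional


def fetch_page(
    conn: sqlite3.Connection, before_id: Optional[int], page_size: int
) -> tuple[list[tuple[int, str]], Optional[int]]:
    """Return (rows, next_cursor), newest rows first.

    `before_id` is the cursor from the previous page (None for the first page).
    `next_cursor` is None when the page was not full, i.e. no more rows are expected.
    """
    if before_id is None:
        cur = conn.execute(
            "SELECT id, ip FROM history_archive ORDER BY id DESC LIMIT ?",
            (page_size,),
        )
    else:
        cur = conn.execute(
            "SELECT id, ip FROM history_archive WHERE id < ? ORDER BY id DESC LIMIT ?",
            (before_id, page_size),
        )
    rows = cur.fetchall()
    next_cursor = rows[-1][0] if len(rows) == page_size else None
    return rows, next_cursor
```

Because each page starts from `WHERE id < ?`, SQLite can seek directly to the cursor position instead of scanning and discarding skipped rows as `OFFSET` does. When the total row count is an exact multiple of the page size, this sketch returns one final empty page before the cursor becomes `None`.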
|
||||
|
||||
### Batch Geo Lookups
|
||||
|
||||
When you need geo data for many IPs, batch in a single call rather than per-IP:
|
||||
|
||||
```python
|
||||
# BAD: N sequential API calls
|
||||
for ip in unique_ips:
|
||||
    geo = await geo_service.lookup(ip)  # 45 req/min rate limit × N calls
|
||||
|
||||
# GOOD: one batch call, geo_service handles rate limiting
|
||||
geo_map, uncached = geo_cache_lookup(unique_ips) # uses in-memory cache
|
||||
if uncached:
|
||||
    asyncio.create_task(geo_cache.lookup_batch(uncached, http_session))  # fire-and-forget; keep a reference to the task so it is not garbage-collected mid-run
|
||||
```
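
The cache-first step in the batch pattern can be sketched as below. Both helpers are hypothetical stand-ins (they are not the real `geo_cache_lookup` or `lookup_batch`), and the cache is modeled as a plain dict from IP to country code.

```python
from typing import Iterable


def split_cached(
    ips: Iterable[str], cache: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Partition IPs into (geo data already cached, IPs still to resolve)."""
    geo_map: dict[str, str] = {}
    uncached: list[str] = []
    for ip in dict.fromkeys(ips):  # de-duplicate while preserving order
        if ip in cache:
            geo_map[ip] = cache[ip]
        else:
            uncached.append(ip)
    return geo_map, uncached


def chunked(items: list[str], size: int) -> list[list[str]]:
    """Split the uncached IPs into batches of at most `size` entries."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

De-duplicating before the lookup means each uncached IP is resolved at most once per request, which matters when the upstream service is rate limited.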
|
||||
|
||||
### Index Requirements
|
||||
|
||||
SQLite needs indexes on:
|
||||
- Columns used in WHERE clauses (timeofban, jail, action)
|
||||
- Columns used in GROUP BY (ip, jail, bucket index)
|
||||
- Sort columns for pagination (id)
|
||||
|
||||
Current indexes on `history_archive`:
|
||||
- `idx_history_archive_timeofban` — for time-range filtering
|
||||
- `idx_history_archive_jail_timeofban` — for jail + time filtering
|
||||
- `idx_history_archive_action_timeofban` — for action + time filtering
|
||||
- `idx_history_archive_id` — for keyset pagination
|
||||
|
||||
Before adding a new query pattern, verify it uses an existing index or add one with a benchmark test.
|
||||
|
||||
### Memory Monitoring
|
||||
|
||||
Watch for these warning signs:
|
||||
- Python RSS > 500 MB in container metrics
|
||||
- Response time > 5s for dashboard endpoints
|
||||
- Query time > 1s (time queries with `.timer on` in the `sqlite3` shell; SQLite has no `EXPLAIN ANALYZE`)
|
||||
|
||||
Use `EXPLAIN QUERY PLAN` to verify index usage:
|
||||
```sql
|
||||
EXPLAIN QUERY PLAN SELECT ip, COUNT(*) FROM history_archive WHERE timeofban >= ? GROUP BY ip;
|
||||
```
|
||||
|
||||
Expected: `USING INDEX idx_history_archive_timeofban` in the output.
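
The same check can be automated, for example in a benchmark or regression test. The sketch below uses the standard-library `sqlite3` module; `query_plan` and `uses_index` are hypothetical helper names, and the exact plan text depends on the SQLite version and table statistics.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the plan text


def uses_index(plan: list[str], index_name: str) -> bool:
    """True if any step of the plan mentions the given index."""
    return any(index_name in detail for detail in plan)
```

A test can then assert `uses_index(plan, "idx_history_archive_timeofban")` for the query above and fail when a schema change makes SQLite fall back to a full table scan.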
|
||||
@@ -86,6 +86,54 @@ ps aux | grep <pid>
|
||||
|
||||
---
|
||||
|
||||
## Rate Limiting
|
||||
|
||||
### Getting 429 Too Many Requests
|
||||
|
||||
**Symptom:** API returns HTTP 429 with `rate_limit_exceeded` error code.
|
||||
|
||||
**Cause:** You have exceeded the per-IP rate limit for a specific operation.
|
||||
|
||||
**Diagnosis:**
|
||||
1. Check the `Retry-After` header in the response — this tells you how many seconds to wait
|
||||
2. Look for the log event `*_rate_limit_exceeded` which shows the bucket and client IP
|
||||
|
||||
**Rate limit buckets:**
|
||||
| Bucket | Limit | Window | Operations |
|
||||
|--------|-------|--------|------------|
|
||||
| `bans:ban` | 100 | 1 minute | Ban IP addresses |
|
||||
| `bans:unban` | 100 | 1 minute | Unban IP addresses |
|
||||
| `blocklist:import` | 10 | 1 hour | Import blocklists |
|
||||
| `config:update` | 50 | 1 minute | Update configuration |
|
||||
| `jail:update` | 100 | 1 minute | Update jail config |
|
||||
| `jail:create` | 100 | 1 minute | Add log paths, assign filters/actions |
|
||||
| `jail:delete` | 100 | 1 minute | Remove log paths, actions |
|
||||
| `jail:activate` | 100 | 1 minute | Activate jails |
|
||||
| `jail:deactivate` | 100 | 1 minute | Deactivate jails |
|
||||
| `filter:update` | 50 | 1 minute | Update filters |
|
||||
| `filter:create` | 50 | 1 minute | Create filters |
|
||||
| `filter:delete` | 50 | 1 minute | Delete filters |
|
||||
| `action:update` | 50 | 1 minute | Update actions |
|
||||
| `action:create` | 50 | 1 minute | Create actions |
|
||||
| `action:delete` | 50 | 1 minute | Delete actions |
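
To illustrate how a per-IP, per-bucket limit with a `Retry-After` value can behave, here is a minimal in-memory fixed-window sketch. It is an assumption for illustration only and not BanGUI's actual middleware; the class name `FixedWindowLimiter` is hypothetical, and only three buckets from the table are included.

```python
import math
import time
from typing import Callable

# (limit, window in seconds) per bucket, taken from the table above.
BUCKET_LIMITS: dict[str, tuple[int, int]] = {
    "bans:ban": (100, 60),
    "blocklist:import": (10, 3600),
    "config:update": (50, 60),
}


class FixedWindowLimiter:
    """In-memory fixed-window counter keyed by (bucket, client IP)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # (bucket, client_ip) -> (window_start, request_count)
        self._state: dict[tuple[str, str], tuple[float, int]] = {}

    def check(self, bucket: str, client_ip: str) -> tuple[bool, int]:
        """Return (allowed, retry_after_seconds)."""
        limit, window = BUCKET_LIMITS[bucket]
        now = self._clock()
        start, count = self._state.get((bucket, client_ip), (now, 0))
        if now - start >= window:
            start, count = now, 0  # window expired: start a new one
        if count >= limit:
            # Seconds until the current window ends, rounded up, at least 1.
            return False, max(1, math.ceil(start + window - now))
        self._state[(bucket, client_ip)] = (start, count + 1)
        return True, 0
```

The clock is injectable so that tests can advance time without sleeping. Because the state lives in a dict, counters are lost when the process restarts.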
|
||||
|
||||
**Solution:**
|
||||
1. Wait for the `Retry-After` period before retrying
|
||||
2. If you hit the limit during legitimate bulk operations, consider batching requests
|
||||
3. For blocklist imports (10/hour), ensure automated imports are not more frequent
|
||||
|
||||
**Prevention:**
|
||||
- Monitor `*_rate_limit_exceeded` log events
|
||||
- Adjust limits via environment variables if needed (see `Docs/CONFIGURATION.md`)
|
||||
- For bulk operations, implement client-side throttling
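
On the client side, honoring `Retry-After` could look like the sketch below. It is deliberately transport-agnostic: `send` is a hypothetical callable that performs one request and returns `(status_code, headers)`, so no particular HTTP library is assumed.

```python
import time
from typing import Callable, Mapping


def call_with_retry(
    send: Callable[[], tuple[int, Mapping[str, str]]],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it stops returning 429 or attempts run out.

    Returns the last status code seen.
    """
    status = 429
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            # Retry-After is read as a number of seconds;
            # fall back to 1 second if it is missing or not numeric.
            try:
                delay = float(headers.get("Retry-After", "1"))
            except ValueError:
                delay = 1.0
            sleep(max(0.0, delay))
    return status
```

Capping the number of attempts keeps a bulk script from looping forever when a limit such as `blocklist:import` (10 per hour) has been exhausted.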
|
||||
|
||||
**Note:** If rate limiting triggers unexpectedly for legitimate use, check for:
|
||||
- Internal monitoring scripts hitting endpoints too frequently
|
||||
- Multiple users behind the same proxy IP
|
||||
- Stale rate limit state after process restart (uses in-memory tracking)
|
||||
|
||||
---
|
||||
|
||||
## General Recovery Commands
|
||||
|
||||
Clear all locks:
|
||||
|
||||
@@ -1,97 +1,3 @@
|
||||
## HIGH PRIORITY ISSUES
|
||||
|
||||
---
|
||||
|
||||
### Issue #3: HIGH - Unbounded Query Results Causing OOM (Out of Memory)
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/repositories/history_archive_repo.py` - `get_all_archived_history()`
|
||||
- `backend/app/services/ban_service.py` (lines 589-600) - `bans_by_country()` loads all unique IPs into memory
|
||||
- `backend/app/services/ban_service.py` (lines 650-680) - N+1 geo lookup pattern
|
||||
|
||||
**Why this is needed**:
|
||||
With large deployments having millions of ban records, queries that load entire tables into memory cause:
|
||||
- Memory spikes that crash the container
|
||||
- Slow dashboard performance
|
||||
- Database file growth without bounds
|
||||
|
||||
**Goal**:
|
||||
Implement pagination, streaming, and batch processing for all large queries to ensure bounded memory usage and consistent performance.
|
||||
|
||||
**What to do**:
|
||||
1. Refactor `get_all_archived_history()` to only be called with pagination parameters
|
||||
2. Refactor `bans_by_country()` to:
|
||||
- Process countries in batches
|
||||
- Stream results instead of collecting all in memory
|
||||
- Implement server-side aggregation in SQL instead of Python loops
|
||||
3. Add `LIMIT` + `OFFSET` or cursor-based pagination to all list endpoints
|
||||
4. Implement batch geo lookups instead of per-IP loops
|
||||
5. Add tests with large datasets (1M+ records) to catch performance regressions
|
||||
|
||||
**Possible traps and issues**:
|
||||
- Changing query patterns might break sorting/filtering logic
|
||||
- Pagination cursor format must be consistent across endpoints
|
||||
- Memory usage must be monitored in production
|
||||
- Aggregation queries might need new database indexes
|
||||
- Frontend pagination UI assumes cursor format - changes will break old clients
|
||||
|
||||
**Docs changes needed**:
|
||||
- Add performance guidelines to `Docs/Backend-Development.md` - "Never load unbounded result sets"
|
||||
- Create `Docs/PERFORMANCE.md` with query optimization patterns
|
||||
- Document pagination standards in API docs
|
||||
|
||||
**Doc references**:
|
||||
- DETAILED_FINDINGS.md - Issues #2, #3, #4 (Unbounded queries, N+1, Large structures)
|
||||
- DATABASE_API_DEPLOYMENT_ISSUES.md - Section "Database Design Issues"
|
||||
|
||||
---
|
||||
|
||||
### Issue #4: HIGH - Missing Rate Limiting on Write Operations
|
||||
|
||||
**Where found**:
|
||||
- `backend/app/middleware/rate_limit.py` - Only applied to login endpoint
|
||||
- `backend/app/routers/bans.py` - POST /api/bans/ban, POST /api/bans/unban (NO rate limit)
|
||||
- `backend/app/routers/blocklist.py` - POST /api/blocklists/:id/import (NO rate limit)
|
||||
- `backend/app/routers/config.py` - PUT endpoints (NO rate limit)
|
||||
|
||||
**Why this is needed**:
|
||||
Without rate limiting on state-mutating endpoints, an attacker can:
|
||||
- Spam ban requests to exhaust fail2ban resources
|
||||
- Trigger repeated blocklist imports consuming bandwidth/CPU
|
||||
- Cause DoS by hammering config updates
|
||||
|
||||
**Goal**:
|
||||
Extend rate limiting to all write operations (POST, PUT, DELETE) with appropriate rate limits per operation type.
|
||||
|
||||
**What to do**:
|
||||
1. Create rate limit buckets for different operations:
|
||||
- `bans:ban` - 100/minute per IP
|
||||
- `bans:unban` - 100/minute per IP
|
||||
- `blocklist:import` - 10/hour per IP
|
||||
- `config:update` - 50/minute per IP
|
||||
2. Apply rate limiting middleware to all write endpoints
|
||||
3. Return 429 with `Retry-After` header when limit exceeded
|
||||
4. Add metrics/monitoring for rate limit hits
|
||||
5. Make rate limits configurable via environment variables
|
||||
|
||||
**Possible traps and issues**:
|
||||
- Rate limiting at IP level doesn't work behind proxies (need proper X-Forwarded-For handling)
|
||||
- Different operations need different rate limits (can't use global limit)
|
||||
- Legitimate bulk operations might hit limits unexpectedly
|
||||
- Rate limit state must be persistent across process restarts (use database or Redis)
|
||||
- False positives from internal monitoring scripts hammering endpoints
|
||||
|
||||
**Docs changes needed**:
|
||||
- Add rate limit table to API documentation
|
||||
- Document in `Docs/CONFIGURATION.md` how to adjust rate limits
|
||||
- Add to `Docs/TROUBLESHOOTING.md` - "Getting 429 Too Many Requests"
|
||||
|
||||
**Doc references**:
|
||||
- DETAILED_FINDINGS.md - Issue #5 "Missing Rate Limiting"
|
||||
- `backend/app/middleware/rate_limit.py` - Current implementation
|
||||
|
||||
---
|
||||
|
||||
### Issue #5: HIGH - API Has No Versioning Strategy
|
||||
|
||||
**Where found**:
|
||||
|
||||