docs: Add security best practices to Deployment.md
- Secrets management via environment variables
- Container security hardening (non-root user, filesystem permissions, capabilities)
- Network security and TLS termination guidance
- Prune obsolete task tracking from Tasks.md

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@@ -437,3 +437,486 @@ See `Docs/DATABASE_MIGRATIONS.md` for full recovery procedures.
- **Development**: Run `make up` to start with the default limits.
- **Staging**: Test with realistic data volumes and monitor resource usage.
- **Production**: Adjust limits based on observed usage patterns, then commit the changes.

---

## Security Best Practices

### Secrets Management

**Never hard-code secrets.** Inject all secrets at runtime through environment variables.

| Secret | Purpose | How to obtain |
|--------|---------|---------------|
| `BANGUI_SESSION_SECRET` | Signs session cookies | `python -c 'import secrets; print(secrets.token_hex(32))'` |
| fail2ban credentials | Access to jail configuration | From the fail2ban configuration |

- Keep the source of truth in a secrets manager (e.g., Docker secrets, Kubernetes Secrets, HashiCorp Vault) and inject the values when the container starts.
- Rotate `BANGUI_SESSION_SECRET` periodically. Rotation invalidates all existing sessions, so users must log in again.
- Never log or otherwise expose session secrets.
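
The rules above can be enforced at startup. The sketch below is a hypothetical helper, not BanGUI's actual code. It reads `BANGUI_SESSION_SECRET` from the environment and falls back to a Docker secret file under `/run/secrets/`; the file name `bangui_session_secret` is an assumption. It rejects values shorter than the 64 hex characters that `secrets.token_hex(32)` produces.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical names: BanGUI itself may not read Docker secret files at all.
SECRET_ENV = "BANGUI_SESSION_SECRET"
SECRET_FILE = Path("/run/secrets/bangui_session_secret")
MIN_LENGTH = 64  # secrets.token_hex(32) yields 64 hex characters


def load_session_secret(
    env: Mapping[str, str] = os.environ,
    secret_file: Path = SECRET_FILE,
) -> str:
    """Return the session secret from the environment or a mounted secret file."""
    value: Optional[str] = env.get(SECRET_ENV)
    if not value and secret_file.is_file():
        # Docker and Kubernetes mount secrets as files; strip the trailing newline.
        value = secret_file.read_text(encoding="utf-8").strip()
    if not value:
        raise RuntimeError(f"{SECRET_ENV} is not set")
    if len(value) < MIN_LENGTH:
        # Report only the length, never the value itself.
        raise RuntimeError(f"{SECRET_ENV} is too short ({len(value)} chars, need {MIN_LENGTH})")
    return value
```

The error messages deliberately contain only the variable name and the length, in line with the rule never to log secrets.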

### Container Security Hardening

**Non-root user**: The backend runs as `bangui:bangui` (UID 1000). The frontend runs as the nginx image's default user. Running the backend without root privileges limits the damage of a container breakout.

**Filesystem permissions**:

```bash
# Data directory (SQLite DB): read/write for the bangui user only
chown 1000:1000 /data
chmod 700 /data

# Config directory (fail2ban config): readable by everyone, writable only by its owner.
# The backend reads it and writes only when applying configuration changes made in BanGUI.
chmod 755 /config
```
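
To check that a host or volume was prepared as described, you can compare the actual mode and owner with the expected values. The sketch below uses only `os.stat` and the `stat` module. The modes and UID come from the commands above; the function name is hypothetical.

```python
import os
import stat
from typing import List, Optional


def audit_directory(path: str, expected_mode: int, expected_uid: Optional[int] = None) -> List[str]:
    """Return human-readable problems with a directory's mode and owner (empty list = OK)."""
    st = os.stat(path)
    problems: List[str] = []
    if not stat.S_ISDIR(st.st_mode):
        problems.append(f"{path} is not a directory")
    mode = stat.S_IMODE(st.st_mode)  # permission bits only, e.g. 0o700
    if mode != expected_mode:
        problems.append(f"{path}: mode {oct(mode)}, expected {oct(expected_mode)}")
    if expected_uid is not None and st.st_uid != expected_uid:
        problems.append(f"{path}: owner uid {st.st_uid}, expected {expected_uid}")
    return problems
```

For example, `audit_directory("/data", 0o700, expected_uid=1000)` should return an empty list on a correctly prepared host. `audit_directory("/config", 0o755)` checks the config directory.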

**Capabilities**: The fail2ban container requires `NET_ADMIN` and `NET_RAW` to manage iptables rules and use raw sockets. The BanGUI application containers need no additional capabilities.

**No privileged mode**: BanGUI containers must not run with `--privileged`. The fail2ban container needs only the specific capabilities above, not full host access.
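
To confirm which capabilities a running container actually has, you can decode the `CapEff` bitmask from `/proc/self/status` (Linux only). The diagnostic sketch below is not part of BanGUI. It checks only the two capabilities named above, using their Linux bit numbers: `CAP_NET_ADMIN` is bit 12 and `CAP_NET_RAW` is bit 13.

```python
from typing import Set

# Bit positions from linux/capability.h
CAPABILITY_BITS = {"NET_ADMIN": 12, "NET_RAW": 13}


def decode_capabilities(cap_hex: str) -> Set[str]:
    """Return which tracked capabilities are set in a hex bitmask such as CapEff."""
    mask = int(cap_hex, 16)
    return {name for name, bit in CAPABILITY_BITS.items() if mask & (1 << bit)}


def effective_capabilities(status_path: str = "/proc/self/status") -> Set[str]:
    """Read the CapEff line of a process status file and decode it."""
    with open(status_path, encoding="ascii") as fh:
        for line in fh:
            if line.startswith("CapEff:"):
                return decode_capabilities(line.split()[1])
    raise ValueError(f"no CapEff line in {status_path}")
```

Run `effective_capabilities()` inside each container, for example via `docker exec`. The fail2ban container should report both names, and the backend (running as UID 1000) should report neither. Docker's default capability set includes `NET_RAW`, so a container that runs as root reports it unless you remove it with `cap_drop`.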

### Network Security

- **Internal network only**: All BanGUI containers communicate on `bangui-net`. Only the frontend port (default 8080) is published to the host.
- **fail2ban socket**: Mounted read-only (`ro`) from the host; the backend uses it to read fail2ban status. The `ro` flag only prevents the socket file from being replaced or removed. It does not limit which commands can be sent through the socket.
- **fail2ban config**: Mounted read-write, because BanGUI modifies jail configurations on request.
- **Restrict traffic between containers**: Use Docker network isolation to limit lateral movement:

```yaml
networks:
  bangui-net:
    driver: bridge
    # false: containers on this network can reach external hosts.
    # From the host, only services that publish `ports` (the frontend) are reachable.
    internal: false
```
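
To verify the exposure from the host, you can attempt TCP connections to the ports in question. In the usage below, 8080 is the default frontend port from the text. 8000 is the backend port used in the HAProxy example later in this document; adjust both to match your compose file. Only the frontend port should accept connections.

```python
import socket
from typing import Dict, Iterable


def reachable_ports(host: str, ports: Iterable[int], timeout: float = 1.0) -> Dict[int, bool]:
    """Map each port to True if a TCP connection to host:port succeeds."""
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, unreachable, or timed out
            results[port] = False
    return results
```

On a correctly configured host, `reachable_ports("127.0.0.1", [8080, 8000])` should return `{8080: True, 8000: False}`.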

### TLS / HTTPS

BanGUI does not terminate TLS itself. Terminate TLS at a reverse proxy or load balancer in front of it.

**Nginx (reverse proxy in front of the existing frontend container)**:

```nginx
server {
    listen 443 ssl;
    http2 on;  # nginx >= 1.25.1; on older versions use "listen 443 ssl http2;"
    server_name bangui.example.com;

    ssl_certificate     /etc/ssl/certs/bangui.crt;
    ssl_certificate_key /etc/ssl/private/bangui.key;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;

    # Proxy to the existing frontend container
    location / {
        proxy_pass http://bangui-frontend:80;
        # ...
    }
}
```

**Security headers** (already set in `nginx.conf`):

- CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy
- Uncomment the HSTS header once HTTPS is fully configured.

**HTTP to HTTPS redirect**: Add this to your TLS terminator:

```nginx
server {
    listen 80;
    server_name bangui.example.com;
    return 301 https://$host$request_uri;
}
```
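
After changing the TLS terminator, a quick external check helps confirm both the protocol floor and the redirect. The sketch below uses only the standard library. `negotiated_tls_version` performs a verified handshake that refuses anything older than TLS 1.2. `redirects_to_https` checks that plain HTTP answers with a 301 to an `https` URL. Both function names are hypothetical.

```python
import http.client
import socket
import ssl
from urllib.parse import urlsplit


def make_client_context() -> ssl.SSLContext:
    """Verifying client context that refuses anything older than TLS 1.2."""
    ctx = ssl.create_default_context()  # certificate and hostname verification on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def negotiated_tls_version(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Handshake with host:port and return the protocol version, e.g. 'TLSv1.3'."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with make_client_context().wrap_socket(sock, server_hostname=host) as tls:
            return tls.version() or ""


def redirects_to_https(host: str, port: int = 80, path: str = "/", timeout: float = 5.0) -> bool:
    """True if plain HTTP answers with a 301 whose Location uses https."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        location = resp.getheader("Location") or ""
        return resp.status == 301 and urlsplit(location).scheme == "https"
    finally:
        conn.close()
```

`negotiated_tls_version("bangui.example.com")` should return `'TLSv1.2'` or `'TLSv1.3'`. An `ssl.SSLError` means the handshake failed, for example because the proxy offers only older protocols or presents an invalid certificate.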

### Dependency Scanning

Scan base images for vulnerabilities regularly:

```bash
# Trivy (Docker/Podman compatible)
trivy image python:3.12-slim
trivy image nginx:1.27-alpine
trivy image node:22-alpine

# CI integration: exit non-zero on HIGH or CRITICAL findings
trivy image --exit-code 1 --severity HIGH,CRITICAL git.lpl-mind.de/lukas.pupkalipinski/bangui/backend:latest
```
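
If your CI needs a per-severity summary rather than a plain pass/fail exit code, Trivy can write a JSON report with `trivy image --format json --output report.json <image>`. The sketch below counts findings from such a report. It assumes the report has a top-level `Results` list whose entries may carry a `Vulnerabilities` list with a `Severity` field. Check these field names against the Trivy version you run.

```python
import json
from collections import Counter
from typing import Dict


def count_by_severity(report: dict) -> Dict[str, int]:
    """Count vulnerabilities per severity across all targets in a Trivy JSON report."""
    counts: Counter = Counter()
    for result in report.get("Results") or []:
        # "Vulnerabilities" is absent or null for targets without findings
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return dict(counts)


def should_fail(report: dict, blocking=("HIGH", "CRITICAL")) -> bool:
    """Mirror `--exit-code 1 --severity HIGH,CRITICAL`: fail on any blocking finding."""
    counts = count_by_severity(report)
    return any(counts.get(level, 0) > 0 for level in blocking)


def load_report(path: str) -> dict:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```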

Rebuild on updated base images at least quarterly, and promptly when a relevant CVE is published.

### Rate Limiting at Deployment Level

The application-level rate limiter (`BANGUI_RATE_LIMIT_*` environment variables) handles API requests. Add deployment-level protection as well:

**Nginx** (existing reverse proxy):

```nginx
# Limit concurrent connections per client IP (limit_conn_zone belongs in the http context)
limit_conn_zone $binary_remote_addr zone=conn_limit:10m;

server {
    limit_conn conn_limit 100;
}
```
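
`limit_conn` caps how many connections a single client address may hold open at the same time. Connections beyond the cap are rejected, with status 503 by default, rather than queued. The sketch below models that per-client counting in Python to make the semantics concrete. It is only an illustration; it does not reflect how nginx implements its shared-memory zone.

```python
import threading
from collections import defaultdict
from typing import DefaultDict


class ConnectionLimiter:
    """Per-client cap on concurrently open connections, modelled on nginx limit_conn."""

    def __init__(self, max_per_client: int) -> None:
        self.max_per_client = max_per_client
        self._open: DefaultDict[str, int] = defaultdict(int)
        self._lock = threading.Lock()

    def acquire(self, client_ip: str) -> bool:
        """Register a new connection; False means it should be rejected."""
        with self._lock:
            if self._open[client_ip] >= self.max_per_client:
                return False
            self._open[client_ip] += 1
            return True

    def release(self, client_ip: str) -> None:
        """Call when a connection closes, freeing a slot for that client."""
        with self._lock:
            if self._open[client_ip] > 0:
                self._open[client_ip] -= 1
            if self._open[client_ip] == 0:
                del self._open[client_ip]
```

Only open connections count, so a client that makes many short requests one after another never hits this limit. nginx's separate `limit_req` directive caps the request rate instead.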

**Fail2ban** (already running):

- BanGUI manages the fail2ban jails.
- Point additional deployment-level rate limits at infrastructure endpoints (SSH, management UIs), not at BanGUI itself.

### Audit Logging

All authentication events are logged via structlog:

| Event | Log key | Severity |
|-------|---------|----------|
| Login success | `auth_login_success` | INFO |
| Login failure | `auth_login_failure` | WARNING |
| Session created | `session_created` | INFO |
| Session destroyed | `session_destroyed` | INFO |
| Session expired | `session_expired` | INFO |

Forward these logs to a SIEM or log aggregator for security monitoring. See [Structured Logging](#structured-logging) below.
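
Before a SIEM is in place, you can extract the authentication events from the JSON log stream with a few lines of Python. The sketch below assumes one JSON object per line, with the `event` field described under Structured Logging. Any line that is not JSON is skipped.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Iterator

# Event names from the audit table above
AUTH_EVENTS = {
    "auth_login_success",
    "auth_login_failure",
    "session_created",
    "session_destroyed",
    "session_expired",
}


def auth_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed authentication events from JSON-per-line logs, skipping other lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # e.g. plain-text output from other processes
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if record.get("event") in AUTH_EVENTS:
            yield record


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Count authentication events by name."""
    return dict(Counter(r["event"] for r in auth_events(lines)))
```

For example, `summarize(open("backend.log"))` applied to a saved `docker logs` dump shows at a glance whether `auth_login_failure` dominates.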

---

## Performance Tuning

### SQLite Performance

SQLite allows only one writer at a time. Under write-heavy load (blocklist imports, history writes), writes may queue.

**WAL mode** (enabled by default; do not disable):

```sql
PRAGMA journal_mode=WAL;  -- already enabled by default
```

**Synchronous mode** for production:

```sql
PRAGMA synchronous=NORMAL;  -- balanced: neither FULL nor OFF
```

In WAL mode, `NORMAL` keeps the database free of corruption and loses no committed data if the process crashes. After a power loss or OS crash, the most recently committed transactions may be rolled back, but the database stays consistent.

**Cache size** (increase for production):

```sql
-- Page cache of 64 MiB (adjust to available RAM); a negative value is in KiB
PRAGMA cache_size=-65536;
```

**temp_store** for large sorts:

```sql
PRAGMA temp_store=MEMORY;
```
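
Only `journal_mode=WAL` is stored in the database file. `synchronous`, `cache_size`, and `temp_store` apply to a single connection and must be set every time a connection is opened. The sketch below shows one way to do that with Python's `sqlite3` module. `open_tuned` is a hypothetical helper, not BanGUI's connection code.

```python
import sqlite3

# Settings from the section above
PRAGMAS = (
    "PRAGMA journal_mode=WAL",    # persistent: recorded in the database file
    "PRAGMA synchronous=NORMAL",  # per connection
    "PRAGMA cache_size=-65536",   # per connection: 64 MiB (negative = KiB)
    "PRAGMA temp_store=MEMORY",   # per connection
)


def open_tuned(path: str) -> sqlite3.Connection:
    """Open a connection and apply the production PRAGMAs."""
    conn = sqlite3.connect(path)
    for pragma in PRAGMAS:
        conn.execute(pragma)
    return conn


def current_settings(conn: sqlite3.Connection) -> dict:
    """Read the settings back; SQLite reports synchronous and temp_store as integers."""
    return {
        name: conn.execute(f"PRAGMA {name}").fetchone()[0]
        for name in ("journal_mode", "synchronous", "cache_size", "temp_store")
    }
```

On a file-backed database, `current_settings(open_tuned(path))` returns `{'journal_mode': 'wal', 'synchronous': 1, 'cache_size': -65536, 'temp_store': 2}`. In this output, 1 means NORMAL and 2 means MEMORY.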

**Read performance**:

- Most reads are point queries by IP or jail name, which indexes handle efficiently.
- Paginate large history scans (dashboard) with `LIMIT`/`OFFSET`.
- Avoid `SELECT *` on large tables; always list the columns you need.
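
The example below applies these three guidelines together. The `ban_history` table and its columns are hypothetical stand-ins for BanGUI's schema. What matters is the index on the lookup column, the explicit column list, and `LIMIT`/`OFFSET` paging with a deterministic `ORDER BY`.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS ban_history (   -- hypothetical table
    id        INTEGER PRIMARY KEY,
    ip        TEXT NOT NULL,
    jail      TEXT NOT NULL,
    banned_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_ban_history_ip ON ban_history (ip);
"""


def bans_for_ip(conn: sqlite3.Connection, ip: str) -> List[Tuple[str, str]]:
    """Point query served by idx_ban_history_ip; selects only the needed columns."""
    return conn.execute(
        "SELECT jail, banned_at FROM ban_history WHERE ip = ? ORDER BY banned_at",
        (ip,),
    ).fetchall()


def history_page(conn: sqlite3.Connection, page: int, page_size: int = 50) -> List[Tuple[str, str, str]]:
    """One page of history, newest first; id breaks ties so pages never overlap."""
    return conn.execute(
        "SELECT ip, jail, banned_at FROM ban_history "
        "ORDER BY banned_at DESC, id DESC LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()
```

SQLite still walks past the skipped rows for an `OFFSET`, so very deep pages get progressively slower. Keyset pagination (`WHERE banned_at < ?` on the last value seen) is the usual alternative if that becomes noticeable.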

### Gzip Compression

Gzip is already enabled in `nginx.conf`. Verify that compression is effective:

```bash
curl -H "Accept-Encoding: gzip" -I http://localhost:8080/api/v1/dashboard/status
# Should show: Content-Encoding: gzip
```
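
The same check can run from Python as part of a smoke test. nginx compresses only `text/html` unless `gzip_types` lists other MIME types. If API responses come back uncompressed, check that `application/json` is included there. The dashboard endpoint from the curl example may require a session; if so, point the check at a public endpoint such as `/api/v1/health`.

```python
import urllib.request


def response_is_gzipped(url: str, timeout: float = 5.0) -> bool:
    """Request url advertising gzip support and report whether the reply is compressed."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    # urllib does not decompress, so the header reflects what the server actually sent.
    # Non-2xx replies raise urllib.error.HTTPError.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.headers.get("Content-Encoding", "").lower() == "gzip"
```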

### Backend Performance

**Startup warm-up**: Caches are cold after a start, so the first blocklist query may be slower. This is expected; subsequent requests are served from the cache.

**Memory tuning**:

```yaml
# docker-compose.yml: raise the limit if the backend is OOM-killed
services:
  backend:
    deploy:
      resources:
        limits:
          memory: 1024M  # up from 512M, for large blocklists
```

**Single worker enforced**: The session cache is process-local, so multiple workers would cause random logouts. This is intentional: scale horizontally through orchestration, not vertically with more workers.

### Frontend Performance

**Static asset caching** (already configured):

```nginx
location /assets/ {
    expires 1y;
    add_header Cache-Control "public, immutable";
}
```

**Bundle size**: The production build uses esbuild minification. Monitor the output size with:

```bash
du -sh frontend/dist/
ls -lh frontend/dist/assets/*.js
```
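
To track bundle size in CI instead of by eye, a small script can list the JavaScript chunks and flag any that exceed a budget. The 500 KiB budget below is an arbitrary example, not a project requirement. The `frontend/dist/assets` path is the one used in the commands above.

```python
from pathlib import Path
from typing import List, Tuple

BUDGET_BYTES = 500 * 1024  # example per-chunk budget; tune for your project


def js_chunks(assets_dir: str = "frontend/dist/assets") -> List[Tuple[str, int]]:
    """Return (file name, size in bytes) for each .js file, largest first."""
    files = [(p.name, p.stat().st_size) for p in Path(assets_dir).glob("*.js")]
    return sorted(files, key=lambda item: item[1], reverse=True)


def over_budget(chunks: List[Tuple[str, int]], budget: int = BUDGET_BYTES) -> List[str]:
    """Names of chunks larger than the budget."""
    return [name for name, size in chunks if size > budget]
```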

### Database Maintenance

**Periodic checkpoint** (production; monthly or after large blocklist imports):

```bash
sqlite3 /data/bangui.db "PRAGMA wal_checkpoint(FULL);"
```

**Refresh query planner statistics** (after bulk inserts or deletes):

```bash
sqlite3 /data/bangui.db "ANALYZE;"
```
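
If the `sqlite3` command-line tool is not installed in the backend image, you can run the same maintenance through Python's built-in `sqlite3` module. `wal_checkpoint` returns three numbers:

- a busy flag (1 if another connection prevented the checkpoint from completing);
- the number of pages in the WAL;
- the number of pages copied back into the database.

`run_maintenance` is a hypothetical helper name.

```python
import sqlite3
from typing import Tuple


def run_maintenance(db_path: str) -> Tuple[int, int, int]:
    """Checkpoint the WAL into the main database file, then refresh planner statistics."""
    conn = sqlite3.connect(db_path, timeout=30)  # wait up to 30 s for locks
    try:
        busy, wal_pages, checkpointed = conn.execute("PRAGMA wal_checkpoint(FULL)").fetchone()
        conn.execute("ANALYZE")
        conn.commit()
        return busy, wal_pages, checkpointed
    finally:
        conn.close()
```

A result such as `(0, 120, 120)` means the checkpoint completed. A busy flag of 1 means it should be retried when the backend is idle.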

---

## Monitoring Setup

### Health Check Endpoint

`GET /api/v1/health` is the primary monitoring target.

| Status | HTTP code | Meaning |
|--------|-----------|---------|
| `ok` | 200 | All components healthy |
| `degraded` | 200 | Some components unhealthy; investigate |
| `unavailable` | 503 | fail2ban unreachable; the container will be restarted |
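
Because `degraded` comes back with HTTP 200, a monitor that checks only the status code will miss it. The sketch below reads the response body as well. It assumes the body is a JSON object whose `status` field holds one of the values in the table; confirm the exact shape against your BanGUI version. The severity names and function names are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Tuple

# Monitoring severity per health status (see the table above)
SEVERITY = {"ok": "ok", "degraded": "warning", "unavailable": "critical"}


def classify(http_code: int, body: bytes) -> str:
    """Map an HTTP code and response body to ok / warning / critical."""
    if http_code != 200:
        return "critical"  # 503 = fail2ban unreachable; anything else is unexpected
    try:
        status = json.loads(body).get("status")
    except (ValueError, AttributeError):
        return "critical"  # unparseable body: treat as down
    return SEVERITY.get(status, "warning")  # unknown status: warn rather than ignore


def check_health(base_url: str, timeout: float = 5.0) -> Tuple[int, str]:
    """Fetch /api/v1/health and classify it; connection errors count as critical."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/v1/health", timeout=timeout) as resp:
            return resp.status, classify(resp.status, resp.read())
    except urllib.error.HTTPError as err:  # non-2xx, e.g. 503
        return err.code, classify(err.code, err.read())
    except OSError:  # refused, DNS failure, timeout
        return 0, "critical"
```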

### Structured Logging

All logs are structured JSON produced by structlog. Key fields:

| Log field | Description |
|-----------|-------------|
| `event` | Event name (e.g., `auth_login_success`) |
| `request_id` | Per-request correlation ID |
| `user_id` | Session user (if authenticated) |
| `duration_ms` | Request duration in milliseconds |
| `component` | Component name (e.g., `scheduler`, `database`) |

**Log levels**:

| Level | Use |
|-------|-----|
| DEBUG | Detailed debugging (query SQL, cache hits) |
| INFO | Operational events (startup, shutdown, login, ban action) |
| WARNING | Recoverable issues (cache miss, lock contention) |
| ERROR | Failures requiring attention (DB error, fail2ban offline) |

**Configure via environment variable**:

```bash
BANGUI_LOG_LEVEL=info  # debug, info, warning, error
```
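
A setting like this is usually validated once and mapped onto Python's `logging` levels. `resolve_log_level` below is a hypothetical sketch. It assumes `info` when the variable is unset and rejects unknown values. BanGUI's own handling may differ; for example, it might fall back silently instead of raising.

```python
import logging
import os
from typing import Mapping

# Accepted values from the comment above, mapped to stdlib logging levels
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


def resolve_log_level(env: Mapping[str, str] = os.environ) -> int:
    """Return the numeric level for BANGUI_LOG_LEVEL (assumed default: info)."""
    raw = env.get("BANGUI_LOG_LEVEL", "info").strip().lower()
    try:
        return LEVELS[raw]
    except KeyError:
        raise ValueError(f"BANGUI_LOG_LEVEL must be one of {sorted(LEVELS)}, got {raw!r}") from None
```

According to the level table, `debug` logs query SQL, so reserve it for troubleshooting.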

### Log Aggregation

**Docker Compose**: Rotate container logs locally with the `json-file` driver so they cannot fill the disk:

```yaml
services:
  backend:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
```

**External aggregators**: Forward logs with a logging driver, for example Fluentd:

```yaml
# Fluentd example
services:
  backend:
    logging:
      driver: fluentd
      options:
        fluentd-address: localhost:24224
        tag: bangui-backend
```

**ELK Stack**: Send the JSON logs directly to Logstash or ship them with Filebeat.

### Metrics to Monitor

| Metric | Source | Alert threshold |
|--------|--------|-----------------|
| Health check failures | `/api/v1/health` | 3 consecutive → container restart |
| Backend memory | `docker stats` | >450M (of 512M limit) |
| Backend CPU | `docker stats` | >80% sustained |
| Disk usage (`/data`) | `df -h` | >80% |
| fail2ban container restarts | `docker inspect` (`.RestartCount`) | >2/hour |
| Backend container restarts | `docker inspect` (`.RestartCount`) | >2/hour |
| Database file size | `ls -lh /data/bangui.db` | Growth >10MB/day indicates a problem |
| Session count | `/api/v1/sessions` | A sudden drop indicates a session cache problem |
| Blocklist import duration | Logs (`blocklist_import_completed`) | >5 minutes may indicate a performance problem |
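
`docker stats --no-stream --format '{{json .}}'` prints one JSON object per container and reports memory as a string such as `"430.2MiB / 512MiB"`. The sketch below parses that string and applies the 450M threshold from the table. It assumes the `M` in the table means MiB.

```python
import json
import re
from typing import Dict, Iterable, List

_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}
_SIZE = re.compile(r"^\s*([\d.]+)\s*([KMGT]?i?B)\s*$")

MEMORY_ALERT_BYTES = 450 * 1024**2  # ">450M" from the table, read as MiB (assumption)


def parse_size(text: str) -> int:
    """Convert a docker stats size such as '430.2MiB' to bytes."""
    match = _SIZE.match(text)
    if not match or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def memory_alerts(stats_lines: Iterable[str], threshold: int = MEMORY_ALERT_BYTES) -> List[str]:
    """Names of containers whose current memory use exceeds the threshold."""
    alerts: List[str] = []
    for line in stats_lines:
        if not line.strip():
            continue
        row: Dict[str, str] = json.loads(line)
        used = row["MemUsage"].split("/")[0]  # "430.2MiB / 512MiB" -> "430.2MiB "
        if parse_size(used) > threshold:
            alerts.append(row["Name"])
    return alerts
```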

### Uptime Monitoring

**External checks**:

- Monitor `https://your-domain.com/api/v1/health` from multiple geographic locations.
- Use a service such as Better Uptime, UptimeRobot, or Pingdom.
- Alert on HTTP 503, on HTTP 200 with a `degraded` status, and on connection timeouts.

### Alerting

**Critical (PagerDuty / immediate)**:

- Health check returns HTTP 503 for >30 seconds
- Backend OOM kill (exit code 137)
- fail2ban offline for >5 minutes

**Warning (Slack / email)**:

- Health check returns `degraded`
- Disk usage >80%
- Memory usage >450M
- Backend restarts >2/hour
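
Duration-based rules need state: a single 503 is not an incident, but 503s lasting more than 30 seconds are. The sketch below evaluates timestamped probe results against that rule. It also decodes exit code 137, which is 128 + 9: a process killed by signal 9 (SIGKILL), the signal the kernel OOM killer sends. The function names and input format are hypothetical. Most monitoring tools express the same rule as a "for" duration on an alert.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (unix timestamp, HTTP status of the health probe)
SIGKILL = 9  # POSIX signal number


def sustained_failure(samples: List[Sample], min_seconds: float = 30.0, bad_status: int = 503) -> bool:
    """True if the most recent samples have been `bad_status` for longer than min_seconds."""
    first_bad = None
    for ts, status in sorted(samples):
        if status == bad_status:
            if first_bad is None:
                first_bad = ts  # start of the current failing streak
        else:
            first_bad = None  # a healthy probe resets the streak
    if first_bad is None:
        return False
    latest = max(ts for ts, _ in samples)
    return latest - first_bad > min_seconds


def killed_by_sigkill(exit_code: int) -> bool:
    """Exit codes above 128 mean 'killed by signal (code - 128)'; 137 is SIGKILL."""
    return exit_code > 128 and exit_code - 128 == SIGKILL
```

Exit code 137 only shows that the process received SIGKILL; a manual `docker kill` produces it too. `docker inspect` reports `.State.OOMKilled` to confirm an out-of-memory kill.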

---

## Scaling Guidelines

### Horizontal Scaling

BanGUI is **designed to scale horizontally** through container orchestration, not through multiple workers:

```
┌─────────────────────────────────────────────┐
│                Load Balancer                │
│          (nginx, HAProxy, Traefik)          │
└──────────────────────┬──────────────────────┘
                       │
         ┌─────────────┼─────────────┐
         ▼             ▼             ▼
    ┌──────────┐  ┌──────────┐  ┌──────────┐
    │ Backend  │  │ Backend  │  │ Backend  │
    │ (inst 1) │  │ (inst 2) │  │ (inst 3) │
    └────┬─────┘  └────┬─────┘  └────┬─────┘
         │             │             │
         └─────────────┼─────────────┘
                       ▼
               ┌───────────────┐
               │   Scheduler   │
               │   Lock (DB)   │  ← only one instance runs jobs
               └───────┬───────┘
                       ▼
               ┌───────────────┐
               │    SQLite     │
               │  (shared fs)  │
               └───────────────┘
```

**How it works**:

- The scheduler lock ensures that only one instance runs background jobs.
- The session cache is per instance. Use sticky sessions at the load balancer, or set `BANGUI_SESSION_CACHE=redis` for shared sessions.
- SQLite lives on shared storage: a network file system (NFS, GlusterFS) or block storage (AWS EBS). SQLite's documentation warns against network file systems because their file locking is often unreliable, and WAL mode requires all connections to run on the same host. Block storage volumes are normally attached to a single host. Verify that your storage setup is safe before running instances on more than one host.
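
A database-backed scheduler lock is commonly built as a lease. One row per lock records the owner and an expiry time. The owner renews the lease, and any other instance may take it over once it has expired. The sketch below shows that pattern using SQLite's upsert, which needs SQLite 3.24 or newer. The `scheduler_lock` table, its columns, and the 30-second TTL are hypothetical; this is not BanGUI's schema.

```python
import sqlite3
import time
from typing import Optional

# Hypothetical lease table: one row per named lock
SCHEMA = """
CREATE TABLE IF NOT EXISTS scheduler_lock (
    name       TEXT PRIMARY KEY,
    owner      TEXT NOT NULL,
    expires_at REAL NOT NULL
)
"""


def try_acquire(conn: sqlite3.Connection, owner: str, name: str = "scheduler",
                ttl: float = 30.0, now: Optional[float] = None) -> bool:
    """Take or renew the lease; True if `owner` holds it afterwards."""
    now = time.time() if now is None else now
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            """
            INSERT INTO scheduler_lock (name, owner, expires_at) VALUES (?, ?, ?)
            ON CONFLICT (name) DO UPDATE
                SET owner = excluded.owner, expires_at = excluded.expires_at
                WHERE scheduler_lock.owner = excluded.owner  -- renewal by the holder
                   OR scheduler_lock.expires_at < ?          -- takeover after expiry
            """,
            (name, owner, now + ttl, now),
        )
    row = conn.execute("SELECT owner FROM scheduler_lock WHERE name = ?", (name,)).fetchone()
    return row is not None and row[0] == owner
```

Each instance would call `try_acquire` with a unique owner ID (for example, hostname plus PID) more often than once per TTL. Only the instance that gets `True` runs jobs. The expiry check compares wall-clock times, so instance clocks must be roughly in sync.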

### Stateless Design

For fully stateless scaling without sticky sessions, move the session cache to Redis:

```yaml
# docker-compose.yml
services:
  backend:
    environment:
      - BANGUI_SESSION_CACHE=redis
      - BANGUI_REDIS_URL=redis://redis:6379/0
    depends_on:
      redis:
        condition: service_healthy

  redis:
    image: docker.io/library/redis:7-alpine
    healthcheck:  # required for `condition: service_healthy`
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 5
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 256M
```

Benefits:

- Sessions are shared across all instances, so sticky sessions are not needed.
- The load balancer can distribute requests freely.
- Backend instances can be added or removed without logging users out.

Trade-offs:

- Redis is another dependency to monitor.
- Sessions survive a Redis restart only if Redis persistence is enabled.
- A Redis failure logs out all users.

### Database Scaling

SQLite does not support read replicas, so read scaling is limited.

**Read scaling** (if needed):

- Cache aggressively; BanGUI already caches blocklist data in memory.
- Add read-only views for dashboard queries.
- Consider periodic snapshot exports to a separate, read-optimized store.

**Write scaling**:

- SQLite has a single writer. WAL mode lets reads proceed during writes but does not parallelize writes.
- If write throughput becomes a bottleneck, consider:
  - Periodic batching (already used for blocklist imports)
  - Sharding by jail (a separate database per jail), which is an architectural change
  - Migration to PostgreSQL, which is a significant effort
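
Batching helps because each SQLite transaction pays a fixed commit cost, so many rows in one transaction cost far less than the same rows in single-row transactions. The sketch below inserts rows in chunks, one transaction per chunk, which also bounds how long each write lock is held. The `blocklist_entries` table, its columns, and the default chunk size are hypothetical.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, str]  # (ip, source): hypothetical columns


def chunks(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def import_batched(conn: sqlite3.Connection, rows: Iterable[Row], size: int = 1000) -> int:
    """Insert rows in transactions of `size` rows; returns the number of transactions."""
    transactions = 0
    for batch in chunks(rows, size):
        with conn:  # one commit per batch keeps each write-lock hold short
            conn.executemany(
                "INSERT OR IGNORE INTO blocklist_entries (ip, source) VALUES (?, ?)", batch
            )
        transactions += 1
    return transactions
```

`INSERT OR IGNORE` silently skips rows whose key already exists, so re-importing the same list is harmless. Between batches, other connections get a chance to write.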

### CDN for Static Assets

For large-scale deployments, serve `/assets/` from a CDN. Configure the CDN with the BanGUI frontend as its origin, and have the browser load assets from the CDN hostname (for example, by setting the asset base URL at frontend build time). If nginx merely proxied `/assets/` to the CDN, every request would still pass through the frontend container. Nginx keeps serving `/assets/` so the CDN can fill its cache on a miss:

```nginx
# Origin configuration: the CDN fetches /assets/ from here and caches it at the edge
location /assets/ {
    expires 1y;
    add_header Cache-Control "public, immutable";
}
```

Benefits:

- Reduces load on the frontend container
- Serves assets from edge locations close to users
- Reduces bandwidth costs

### Autoscaling

**Docker Swarm**: Swarm has no built-in autoscaler. Use `deploy.update_config` for rolling updates, and drive the replica count from external metrics (for example, Prometheus plus a script or third-party tool that runs `docker service scale`).

**Kubernetes**: Use a HorizontalPodAutoscaler (HPA) based on CPU and memory:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: bangui-backend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: bangui-backend
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
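
For each metric, the HPA computes `desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization)`. It then uses the largest result, clamped to `minReplicas`/`maxReplicas`. The sketch below reproduces that arithmetic, including the default 10% tolerance within which no scaling happens, so you can sanity-check the 70% CPU and 80% memory targets above. It is a simplification: it omits the controller's stabilization windows and its handling of unready pods.

```python
import math
from typing import Iterable, Tuple

TOLERANCE = 0.1  # HPA default: ignore usage/target ratios within 1.0 ± 0.1


def desired_replicas(current: int, metrics: Iterable[Tuple[float, float]],
                     min_replicas: int = 2, max_replicas: int = 10) -> int:
    """metrics: (current utilization %, target utilization %) pairs, e.g. CPU and memory."""
    proposals = []
    for usage, target in metrics:
        ratio = usage / target
        if abs(ratio - 1.0) <= TOLERANCE:
            proposals.append(current)  # close enough to target: keep the current count
        else:
            proposals.append(math.ceil(current * ratio))
    wanted = max(proposals, default=current)  # the largest proposal wins
    return max(min_replicas, min(max_replicas, wanted))
```

For example, with 3 replicas at 90% CPU against the 70% target, the HPA asks for ceil(3 × 90 / 70) = ceil(3.86) = 4 replicas, so `desired_replicas(3, [(90, 70)])` returns 4.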

### Load Balancer Configuration

**Health check**:

```haproxy
# HAProxy example
backend bangui_backend
    option httpchk GET /api/v1/health
    http-check expect status 200
```

**Sticky sessions** (if NOT using Redis): The `appsession` directive was removed in HAProxy 1.6; use cookie-based persistence instead:

```haproxy
# HAProxy: the load balancer inserts a cookie naming the chosen server
backend bangui_backend
    cookie SERVERID insert indirect nocache
    server backend1 backend1:8000 check cookie backend1
    server backend2 backend2:8000 check cookie backend2
```

**Connection limits**:

```haproxy
# Per-server limit to prevent overload
server backend1 backend:8000 maxconn 50
```

---

## Next Steps