# Deployment Guide

## Health Checks

The backend container includes a health check endpoint at `GET /api/health` that reports application and fail2ban daemon status:

- **HTTP 200** with `{"status": "ok", "fail2ban": "online"}`: the backend is healthy and fail2ban is reachable
- **HTTP 503** with `{"status": "unavailable", "fail2ban": "offline"}`: fail2ban is unreachable, so the health check fails and an orchestrator may restart the backend
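
The mapping between fail2ban reachability and the response can be sketched as a small pure function. This is a minimal illustration, not the backend's actual handler: `health_response` and its `fail2ban_reachable` argument are hypothetical names, and how the backend really probes fail2ban is not covered here.

```python
def health_response(fail2ban_reachable: bool) -> tuple[int, dict[str, str]]:
    """Return the (HTTP status, JSON body) pair for GET /api/health.

    Hypothetical helper: the real backend may structure this differently.
    """
    if fail2ban_reachable:
        return 200, {"status": "ok", "fail2ban": "online"}
    # 503 makes the health check fail while fail2ban is unreachable.
    return 503, {"status": "unavailable", "fail2ban": "offline"}
```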

**Docker Health Check:**

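The Dockerfile includes a `HEALTHCHECK` instruction that queries the endpoint. Docker judges health by the exit code of the check command, not by the HTTP status itself, so the command must exit non-zero when it receives a 503 (for example, `curl -f`). With Docker's default settings (30-second interval, 3 retries), the container is marked `unhealthy` after 3 consecutive failures, roughly 90 seconds. Standalone Docker only marks the container as unhealthy; restarting or replacing it is the job of an orchestrator such as Docker Swarm or of an external watchdog.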
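
A check command only has to turn the HTTP result into an exit code. The sketch below shows one way to do that in Python. It is an assumption about how such a probe could look, not the command the Dockerfile actually uses; the `probe` name, the URL, and the port are placeholders.

```python
import urllib.request

def probe(url: str, timeout: float = 5.0) -> int:
    """Return 0 if the endpoint answers with a 2xx status, else 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except OSError:
        # Covers HTTP errors such as 503 (urllib raises HTTPError),
        # refused connections, and timeouts.
        return 1

# As a HEALTHCHECK command, a script would end with something like:
#   raise SystemExit(probe("http://localhost:8000/api/health"))
# (port 8000 is a placeholder, not taken from this project)
```

Exit code 0 counts as healthy and 1 as unhealthy, which are the exit codes Docker's health check expects.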

**Why 503 for offline fail2ban?**

If the backend returned 200 while fail2ban was offline, Docker would treat the container as healthy and the failure would go unnoticed. Returning 503 makes the outage visible: Docker marks the container unhealthy, and orchestrators that act on health status (Docker Swarm, or Kubernetes with a liveness probe pointed at the endpoint) restart the backend container until fail2ban recovers.

---


## Resource Allocation

All containers have hard limits (max usage) and soft reservations (guaranteed allocation). This ensures:
- **Isolation**: A misbehaving container cannot crash others or the host
- **Predictability**: Reservations guarantee minimum resources even under load
- **Efficiency**: Unused reserved capacity can be borrowed by other containers

### Container Resource Limits

| Container | Limit CPU | Limit Memory | Reserved CPU | Reserved Memory | Purpose |
|-----------|-----------|--------------|--------------|-----------------|---------|
| **fail2ban** | 0.5 | 128M | 0.1 | 64M | Monitors logs, bans IPs—typically idle |
| **backend** | 2.0 | 512M | 1.0 | 256M | Core app: database, fail2ban API, config management |
| **frontend** | 0.5 | 128M | 0.25 | 64M | Nginx: serves SPA + API proxy |
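
As a quick sanity check on the table, the sketch below sums each column and verifies that no reservation exceeds its limit. The values are copied from the table; the `ALLOCATIONS` structure and the `totals` helper are illustrative names only.

```python
# (cpu_limit, mem_limit_M, cpu_reserved, mem_reserved_M) per container,
# copied from the table above.
ALLOCATIONS = {
    "fail2ban": (0.5, 128, 0.1, 64),
    "backend": (2.0, 512, 1.0, 256),
    "frontend": (0.5, 128, 0.25, 64),
}

def totals(allocations):
    """Sum each column and check that reservations fit within limits."""
    for name, (cpu_lim, mem_lim, cpu_res, mem_res) in allocations.items():
        if cpu_res > cpu_lim or mem_res > mem_lim:
            raise ValueError(f"{name}: reservation exceeds limit")
    cpu_lim, mem_lim, cpu_res, mem_res = (
        sum(column) for column in zip(*allocations.values())
    )
    return {
        "cpu_limit": cpu_lim,
        "mem_limit": mem_lim,
        "cpu_reserved": cpu_res,
        "mem_reserved": mem_res,
    }

summary = totals(ALLOCATIONS)
print(summary)
```

With the table's values, the host needs at least 1.35 CPUs and 384M of memory to honor all reservations, and the three containers together can use at most 3.0 CPUs and 768M.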

### Rationale

- **fail2ban**: Lightweight log monitoring. CPU spikes occasionally during ban processing, but memory usage stays minimal.
- **backend**: Does the heavy lifting: Python runtime, SQLite database, background jobs. It may need extra memory for large blocklists. The reservation of 1.0 CPU keeps the API responsive even when the frontend is busy.
- **frontend**: Nginx is efficient. A limit of 0.5 CPU and 128M of memory is more than sufficient for reverse proxy duties.

---

## Memory Considerations

### Backend Memory Requirements

The backend typically uses 256–512M of memory under normal load. Memory usage depends on:
- **Blocklist size**: Large blocklists (>1M entries) require more heap space
- **Cache warmth**: First query after startup may require more memory as caches fill
- **Concurrent connections**: Each active user session uses a small amount of memory

**Tuning:** If you see OOM kills in logs, increase backend limits and reservations (e.g., 1024M limit). Test under realistic load before finalizing.
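
Before raising limits, it helps to confirm that a restart really was an OOM kill. The container state reported by `docker inspect` includes `State.OOMKilled` and `State.ExitCode`; the sketch below parses that JSON output. The `oom_killed` helper and the sample document are illustrative, and the sample is hand-written rather than captured from this project.

```python
import json

def oom_killed(inspect_output: str) -> bool:
    """Return True if any container in `docker inspect` output was OOM-killed."""
    containers = json.loads(inspect_output)  # docker inspect prints a JSON array
    return any(c.get("State", {}).get("OOMKilled", False) for c in containers)

# Trimmed, hand-written sample of `docker inspect <container>` output.
sample = '[{"Name": "/backend", "State": {"OOMKilled": true, "ExitCode": 137}}]'
print(oom_killed(sample))
```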

### Frontend Memory Usage

Nginx typically uses less than 50M of memory. If the frontend shows memory pressure, check for:
- Misconfigured cache headers on static assets
- Large log volumes (nginx access logs)

---

## Docker Swarm & Kubernetes

For production deployments using orchestration platforms:

### Docker Swarm

The `deploy` sections in `Docker/docker-compose.yml` are compatible with `docker stack deploy`:

```bash
docker stack deploy -c Docker/docker-compose.yml bangui
```

Swarm respects the same `limits` and `reservations` fields.

### Kubernetes

For Kubernetes, translate resource constraints to equivalent `resources` fields in your deployment manifests:

```yaml
containers:
  - name: backend
    image: git.lpl-mind.de/lukas.pupkalipinski/bangui/backend:latest
    resources:
      limits:
        cpu: "2"
        memory: "512Mi"
      requests:
        cpu: "1"
        memory: "256Mi"
```

The fields map as follows:
- Docker `deploy.resources.limits` → Kubernetes `resources.limits`
- Docker `deploy.resources.reservations` → Kubernetes `resources.requests`
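
The translation can also be sketched in code. The example below converts a Compose-style resources mapping into the Kubernetes shape used in the manifest above. It assumes that Compose memory suffixes (`K`, `M`, `G`) are binary units and therefore correspond to `Ki`, `Mi`, `Gi`, and it handles only that subset of formats; `memory_to_k8s` and `compose_to_k8s` are hypothetical helper names.

```python
import re

def memory_to_k8s(value: str) -> str:
    """Convert a Compose memory value such as '512M' to '512Mi'."""
    match = re.fullmatch(r"(\d+)([kmg])b?", value.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unsupported memory value: {value!r}")
    number, unit = match.groups()
    return f"{number}{unit.upper()}i"

def compose_to_k8s(resources: dict) -> dict:
    """Map Compose limits/reservations to Kubernetes limits/requests."""
    key_map = {"limits": "limits", "reservations": "requests"}
    result = {}
    for compose_key, k8s_key in key_map.items():
        section = resources.get(compose_key)
        if not section:
            continue
        converted = {}
        if "cpus" in section:
            # Drop a trailing ".0": "2.0" becomes "2", "0.25" stays "0.25".
            converted["cpu"] = format(float(section["cpus"]), "g")
        if "memory" in section:
            converted["memory"] = memory_to_k8s(section["memory"])
        result[k8s_key] = converted
    return result

# Backend values from the resource table in this guide.
backend = {
    "limits": {"cpus": "2.0", "memory": "512M"},
    "reservations": {"cpus": "1.0", "memory": "256M"},
}
print(compose_to_k8s(backend))
```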

---

## Monitoring Resource Usage

### Docker Compose (Development)

```bash
docker stats
```

Shows real-time CPU and memory usage for all running containers.
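
For scripted checks, `docker stats --no-stream --format '{{json .}}'` prints one JSON object per container instead of the live table. The sketch below flags containers whose memory percentage exceeds a threshold. The 80% default is an arbitrary example value, `over_threshold` is a hypothetical helper, and the sample lines are hand-written and trimmed to the fields used here.

```python
import json

def over_threshold(stats_lines: str, mem_percent: float = 80.0) -> list[str]:
    """Return names of containers whose memory usage exceeds the threshold."""
    flagged = []
    for line in stats_lines.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        used = float(row["MemPerc"].rstrip("%"))  # e.g. "91.20%" -> 91.2
        if used > mem_percent:
            flagged.append(row["Name"])
    return flagged

# Hand-written sample in the shape of
#   docker stats --no-stream --format '{{json .}}'
sample = "\n".join([
    '{"Name": "backend", "CPUPerc": "35.10%", "MemPerc": "91.20%"}',
    '{"Name": "frontend", "CPUPerc": "0.40%", "MemPerc": "12.50%"}',
])
print(over_threshold(sample))
```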

### Production (Docker Swarm / Kubernetes)

Use a dedicated monitoring stack:
- **Docker Swarm**: Prometheus + Grafana
- **Kubernetes**: Metrics Server + dashboard or Prometheus

---

## Environment Variables

Resource limits are configured in `Docker/docker-compose.yml` and cannot be overridden via environment variables. To adjust limits:

1. Edit `Docker/docker-compose.yml`
2. Modify the `deploy.resources.limits` and `deploy.resources.reservations` sections
3. Restart containers: `make down && make up`

---

## Troubleshooting

| Issue | Symptom | Solution |
|-------|---------|----------|
| Backend OOM kills | Container exits with code 137; `docker inspect` reports `OOMKilled: true` | Increase backend `memory` limit |
| CPU throttling | CPU pinned at its limit, requests slow | Increase CPU limit or optimize code |
| Service startup timeout | Containers not becoming "healthy" | Increase reservation to guarantee capacity at startup |
| Host unresponsive | System-wide lag | Reduce container limits to prevent host starvation |
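
Exit code 137 follows the common convention of 128 plus the signal number, so it means the process received signal 9 (SIGKILL). The OOM killer is one source of that signal, but a forced kill by an operator produces the same code. The sketch below decodes such exit codes; `describe_exit_code` is an illustrative helper and assumes a POSIX host.

```python
import signal

def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the 128 + signal-number convention."""
    if code > 128:
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            # No signal with that number exists on this platform.
            return f"killed by unknown signal {code - 128}"
    return "exited normally" if code == 0 else f"exited with error status {code}"

print(describe_exit_code(137))
```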

---

## Next Steps

- **Development**: Run `make up` to start with default limits
- **Staging**: Test with realistic data volumes and monitor resource usage
- **Production**: Adjust limits based on observed usage patterns, then commit changes