10) Implement explicit startup DAG for resource initialization

- Created StartupDAG class to orchestrate startup stages with explicit dependencies
- Defined 6 startup stages: WORKER_MODE → DATABASE → GEO_CACHE → HTTP_SESSION → SCHEDULER → TASKS
- Each stage has prerequisites, error handling, and rollback support
- Refactored startup_shared_resources() to use the DAG
- Added StartupContext for resource tracking and failure management
- Partial failures automatically roll back all completed resources in reverse order
- Added health checks to verify all resources initialized successfully
- Comprehensive test coverage: 15 DAG unit tests + 3 integration tests + 6 existing tests
- Documented startup DAG in Architekture.md with detailed stage descriptions and failure modes

This replaces implicit ordering with explicit dependency tracking, making lifecycle changes safe and failure modes predictable. Hidden order dependencies no longer exist.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@@ -292,6 +292,98 @@ blocklist_service.py (Public API)
- Logging is contextual and tied to the appropriate layer
- Retry logic and transient error handling are isolated

#### Startup DAG (`app/startup_dag.py`, `app/startup.py`)

The startup process is orchestrated by an explicit **Directed Acyclic Graph (DAG)** that defines all resource initialization stages, their dependencies, health checks, and rollback strategy. This replaces implicit ordering with explicit, documented prerequisites.

**Why This Exists:**

Previously, startup resources were created in a procedural sequence without documented dependencies. If a stage was reordered or a prerequisite was missed, initialization could fail in non-obvious ways. Partial failures could leave stale resources (open database connections, HTTP sessions, running schedulers) that prevented clean rollback.

**Startup Stages (in order):**

```
1. WORKER_MODE
   └─ Validates that BANGUI_WORKERS=1 (scheduler cannot run in multiple workers)

2. DATABASE
   ├─ Prerequisite: WORKER_MODE
   ├─ Creates database directory
   ├─ Initializes database schema
   ├─ Caches setup completion state
   └─ Loads persisted runtime settings

3. GEO_CACHE
   ├─ Prerequisite: DATABASE
   ├─ Loads IP geolocation cache from database
   ├─ Counts unresolved IPs
   ├─ Initializes MaxMind GeoLite2 database
   └─ Configures HTTP fallback (if enabled)

4. HTTP_SESSION
   ├─ Prerequisite: GEO_CACHE
   ├─ Creates aiohttp.ClientSession
   └─ Configures timeouts and connection limits

5. SCHEDULER
   ├─ Prerequisite: HTTP_SESSION
   ├─ Creates APScheduler AsyncIOScheduler
   └─ Starts the scheduler

6. TASKS
   ├─ Prerequisite: SCHEDULER
   ├─ Registers health_check task (fail2ban connectivity probe)
   ├─ Registers blocklist_import task (scheduled imports)
   ├─ Registers geo_cache_cleanup task (stale entry purge)
   ├─ Registers geo_cache_flush task (periodic persistence)
   ├─ Registers geo_re_resolve task (stale record re-resolution)
   ├─ Registers history_sync task (ban history sync)
   └─ Registers session_cleanup task (expired session purge)
```
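
The prerequisite chain listed above can be expressed as data and checked mechanically. The following is a minimal sketch, not the code in `app/startup_dag.py`: the `Stage` enum, the `PREREQUISITES` mapping, and the `startup_order()` helper are hypothetical names, and the standard-library `graphlib` module stands in for whatever ordering logic StartupDAG actually uses.

```python
from enum import Enum, auto
from graphlib import TopologicalSorter


class Stage(Enum):
    """Illustrative stand-in for the StartupStage enum (values are assumed)."""
    WORKER_MODE = auto()
    DATABASE = auto()
    GEO_CACHE = auto()
    HTTP_SESSION = auto()
    SCHEDULER = auto()
    TASKS = auto()


# Hypothetical mapping: each stage -> the stages that must complete first.
PREREQUISITES = {
    Stage.WORKER_MODE: set(),
    Stage.DATABASE: {Stage.WORKER_MODE},
    Stage.GEO_CACHE: {Stage.DATABASE},
    Stage.HTTP_SESSION: {Stage.GEO_CACHE},
    Stage.SCHEDULER: {Stage.HTTP_SESSION},
    Stage.TASKS: {Stage.SCHEDULER},
}


def startup_order(prerequisites):
    """Return an order in which every stage follows its prerequisites.

    graphlib raises CycleError if the mapping contains a cycle.
    """
    return list(TopologicalSorter(prerequisites).static_order())


print(" -> ".join(stage.name for stage in startup_order(PREREQUISITES)))
```

Because the documented chain is linear, only one valid order exists, so the sketch prints the six stages in the order listed above. Declaring the graph as data means a reordering mistake or an accidental cycle surfaces as an error before any resource is created.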

**Failure Mode & Rollback:**

If any stage fails:

1. All completed stages are rolled back **in reverse order** (TASKS → SCHEDULER → HTTP_SESSION → GEO_CACHE → DATABASE → WORKER_MODE)
2. Each rollback suppresses exceptions to ensure all resources are cleaned up
3. Database connections are closed
4. HTTP sessions are closed
5. The scheduler is shut down
6. The application startup fails with a clear error message
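
The rollback behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `run_stages()`, the `(name, init, rollback)` triples, and the `demo()` helper are all hypothetical. It shows the two properties the list names: cleanup runs in reverse order, and an error inside one rollback does not stop the remaining rollbacks.

```python
import asyncio
import logging

logger = logging.getLogger("startup_sketch")


async def run_stages(stages):
    """Run (name, init, rollback) triples in order.

    If a stage fails, roll back the completed stages in reverse order,
    suppress (but log) rollback errors, then raise a RuntimeError.
    """
    completed = []  # (name, rollback) for every stage that finished
    try:
        for name, init, rollback in stages:
            await init()
            completed.append((name, rollback))
    except Exception as exc:
        for done_name, rollback in reversed(completed):
            try:
                await rollback()
            except Exception:
                logger.exception("Rollback of %s failed; continuing", done_name)
        raise RuntimeError(f"Startup failed at stage {name}: {exc}") from exc
    return [name for name, _ in completed]


async def demo():
    events = []

    def step(label, fail=False):
        async def run():
            events.append(label)
            if fail:
                raise OSError(f"{label} failed")
        return run

    stages = [
        ("DATABASE", step("init DATABASE"), step("rollback DATABASE")),
        # This rollback itself raises, to show that cleanup keeps going.
        ("GEO_CACHE", step("init GEO_CACHE"), step("rollback GEO_CACHE", fail=True)),
        ("HTTP_SESSION", step("init HTTP_SESSION", fail=True), step("rollback HTTP_SESSION")),
    ]
    try:
        await run_stages(stages)
    except RuntimeError as err:
        events.append(str(err))
    return events


for event in asyncio.run(demo()):
    print(event)
```

In the demo, HTTP_SESSION fails during initialization, so only GEO_CACHE and DATABASE are rolled back, in that order. HTTP_SESSION never completed and is therefore not rolled back.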

**Health Checks:**

After all stages complete, a final health check verifies:

- All resources have initialized successfully
- Resources pass their individual health_check() methods
- No failures occurred during any stage
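
A final check covering the three conditions above might look like the sketch below. It assumes each resource exposes a synchronous `health_check()` returning a bool, which the text does not specify; `final_health_check()` and `FakeResource` are hypothetical names used only for illustration.

```python
from typing import Protocol


class HealthChecked(Protocol):
    """Assumed shape of a resource: a synchronous health_check() -> bool."""

    def health_check(self) -> bool: ...


def final_health_check(resources, failed_stages):
    """Return a list of problems; an empty list means startup is healthy."""
    problems = [f"stage {name} failed" for name in failed_stages]
    for name, resource in resources.items():
        if resource is None:
            problems.append(f"{name} was not initialized")
        elif not resource.health_check():
            problems.append(f"{name} failed its health check")
    return problems


class FakeResource:
    """Test double; a real resource would probe a connection or session."""

    def __init__(self, healthy):
        self.healthy = healthy

    def health_check(self):
        return self.healthy


resources = {
    "database": FakeResource(True),
    "http_session": None,
    "scheduler": FakeResource(False),
}
print(final_health_check(resources, failed_stages=[]))
```

Returning the full list of problems, rather than stopping at the first one, keeps the resulting error message complete when several resources are unhealthy at once.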

**Implementation:**

- **StartupDAG**: Orchestrates the entire flow, manages prerequisites, and handles failures
- **StartupStage**: Enum defining the 6 startup stages
- **StageDependency**: Defines stage metadata (description, prerequisites, rollback policy)
- **StartupContext**: Tracks registered resources, completed stages, and failure state
- **startup_shared_resources()**: Main entry point that builds and executes the DAG
- **_stage_*()**: Functions that implement each stage's initialization logic
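
To make the division of responsibilities concrete, here is a deliberately small sketch of how stage metadata and prerequisite enforcement could fit together. `StageInfo` and `SketchDAG` are hypothetical stand-ins for StageDependency and StartupDAG; the real classes also handle rollback policy, resource tracking, and failure state, which this sketch omits.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StageInfo:
    """Hypothetical counterpart of StageDependency (rollback policy omitted)."""
    description: str
    prerequisites: frozenset = frozenset()


@dataclass
class SketchDAG:
    """Hypothetical, simplified stand-in for StartupDAG."""
    stages: dict = field(default_factory=dict)
    completed: set = field(default_factory=set)

    def register_stage(self, name, description, prerequisites=frozenset()):
        self.stages[name] = StageInfo(description, frozenset(prerequisites))

    async def execute_stage(self, name, func):
        # Refuse to run a stage until all of its prerequisites have completed.
        missing = self.stages[name].prerequisites - self.completed
        if missing:
            raise RuntimeError(f"{name} is missing prerequisites: {sorted(missing)}")
        result = await func()
        self.completed.add(name)
        return result


async def demo():
    dag = SketchDAG()
    dag.register_stage("DATABASE", "Open database")
    dag.register_stage("GEO_CACHE", "Load geo cache", prerequisites={"DATABASE"})

    async def noop():
        return "ok"

    try:
        await dag.execute_stage("GEO_CACHE", noop)  # DATABASE has not run yet
    except RuntimeError as err:
        print(err)
    await dag.execute_stage("DATABASE", noop)
    return await dag.execute_stage("GEO_CACHE", noop)  # now allowed


print(asyncio.run(demo()))
```

Checking prerequisites at execution time, rather than relying on call order, is what turns a reordering mistake into an immediate, explicit error.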

**Example Usage in Tests:**

```python
import aiohttp
import pytest

# Import path assumed from the module named in the heading above.
from app.startup_dag import StartupDAG, StartupStage


@pytest.mark.asyncio  # assumes the pytest-asyncio plugin is installed
async def test_stage_with_missing_prerequisites_fails():
    dag = StartupDAG()
    dag.register_stage(StartupStage.HTTP_SESSION, "Create HTTP session",
                       prerequisites=frozenset([StartupStage.DATABASE]))
    dag.register_stage(StartupStage.SCHEDULER, "Create scheduler")

    async def http_session_func():
        return aiohttp.ClientSession()

    # DATABASE hasn't completed, so executing HTTP_SESSION raises RuntimeError
    with pytest.raises(RuntimeError):
        await dag.execute_stage(StartupStage.HTTP_SESSION, http_session_func)
```

#### Mappers (`app/mappers/`)

The response mapping layer. Mappers convert domain models (returned by services) to response models (consumed by HTTP routers). This layer enforces the separation between business logic and API shape.