Remove 335-line task specification from Docs/Tasks.md. Add test confirming _cleanup_wal_files skips recently-modified WAL/SHM files. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
## Task: Investigate Orphaned SQLite Shared Memory Files on Startup

### Issue in Detail

The log shows repeated warnings:

```
event=orphaned_sqlite_file_removed path=/data/bangui.db-shm
```
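
The warning appears at `19:39:48` and again at `19:49:39` (after a restart). The `-shm` file is SQLite's shared-memory file for WAL mode, and SQLite normally deletes it together with the `-wal` file when the last connection closes cleanly. Finding it at startup therefore indicates an **unclean shutdown** (a crash, or SIGKILL instead of a graceful SIGTERM).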

### Why This Happens

1. **Docker stop timeout:** Docker sends SIGTERM, waits `stop_grace_period` (default 10s), then sends SIGKILL. The backend allows 25s for graceful shutdown, but if the container's `stop_grace_period` is shorter, the process is killed before cleanup completes.
2. **Missing connection close:** If the application crashes or is killed, SQLite connections are not closed, leaving the `-wal` and `-shm` files behind.
3. **`_cleanup_wal_files()` is a workaround, not a fix:** It removes stale files on the *next* startup, but the underlying cause (unclean shutdown) remains.
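
To make the second point concrete, here is a minimal sketch using Python's standard `sqlite3` module. It is not BanGUI code: the helper names `wal_sidecars` and `demo_clean_close` and the file name `demo.db` are made up for illustration. It shows that the sidecar files exist while a WAL-mode connection is open and disappear once the last connection closes cleanly, which is the step a killed process never reaches.

```python
import sqlite3
import tempfile
from pathlib import Path


def wal_sidecars(db_path: Path) -> list[str]:
    """Return which of the -wal/-shm sidecar files currently exist next to db_path."""
    return [
        suffix
        for suffix in ("-wal", "-shm")
        if Path(str(db_path) + suffix).exists()
    ]


def demo_clean_close() -> tuple[list[str], list[str]]:
    """Write to a WAL-mode database and report sidecars before and after close()."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "demo.db"  # hypothetical file name
        conn = sqlite3.connect(db_path)
        conn.execute("PRAGMA journal_mode = WAL")
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        conn.commit()
        while_open = wal_sidecars(db_path)
        # Closing the last connection lets SQLite checkpoint and delete the sidecars.
        conn.close()
        after_close = wal_sidecars(db_path)
    return while_open, after_close


if __name__ == "__main__":
    before, after = demo_clean_close()
    print("while open:", before)
    print("after close:", after)
```

If the process is killed before `conn.close()` runs, the files listed in the first result stay on disk until something removes them.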

### How to Fix It

1. **Verify Docker Compose `stop_grace_period`:** In `Docker/compose.prod.yml`, ensure the backend service has `stop_grace_period: 30s` (the 25s internal timeout plus a margin).
2. **Improve shutdown logging:** Add explicit logs when the database connection is closed during lifespan shutdown.
3. **Consider `PRAGMA journal_mode = DELETE` for single-process setups:** WAL mode is beneficial for concurrent readers, but if BanGUI runs with a single worker and single process, DELETE mode eliminates the `-wal`/`-shm` files entirely. Evaluate the tradeoff.
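
The shutdown logging in step 2 could look roughly like the sketch below. It assumes a plain `sqlite3` connection wrapped in an async context manager. The logger name `bangui.db`, the function `database_lifespan`, and the event names are all hypothetical, and the real lifespan hook in the backend may be structured differently.

```python
import asyncio
import logging
import sqlite3
from contextlib import asynccontextmanager
from typing import AsyncIterator

log = logging.getLogger("bangui.db")  # hypothetical logger name


@asynccontextmanager
async def database_lifespan(db_path: str) -> AsyncIterator[sqlite3.Connection]:
    """Open a connection for the app's lifetime and log around its close."""
    conn = sqlite3.connect(db_path)
    log.info("event=database_opened path=%s", db_path)
    try:
        yield conn
    finally:
        # Log before and after close() so a SIGKILL during shutdown is visible
        # as a "closing" event that has no matching "closed" event.
        log.info("event=database_closing path=%s", db_path)
        conn.close()
        log.info("event=database_closed path=%s", db_path)


async def main() -> None:
    async with database_lifespan(":memory:") as conn:
        conn.execute("SELECT 1")


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    asyncio.run(main())
```

Logging on both sides of `close()` makes it possible to tell from the log alone whether the grace period ran out before the connection was released.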

### Issues and Pitfalls

1. **WAL mode is required for reads concurrent with a write:** If you switch to DELETE mode, readers block writers and a writer blocks readers. This may degrade API performance under load.
2. **The `_cleanup_wal_files()` 10-second threshold:** Files modified within the last 10 seconds are skipped. If the container restarts rapidly (e.g., health check failure → restart), the files may not be cleaned up on that startup.

### Documentation References

- **`Docs/Deployment.md`:** Docker deployment configuration and graceful shutdown behavior.
- **`Docs/Architekture.md`:** Deployment constraints and process-local state.

### Tests to Write

#### 1. `test_cleanup_wal_files_removes_stale_files`

- **Setup:** Create fake `-wal` and `-shm` files with an mtime more than 10s in the past.
- **Action:** Call `_cleanup_wal_files()`.
- **Assert:** Files are removed.

#### 2. `test_cleanup_wal_files_skips_recent_files`

- **Setup:** Create fake `-wal` and `-shm` files with an mtime less than 10s in the past.
- **Action:** Call `_cleanup_wal_files()`.
- **Assert:** Files are NOT removed.