diff --git a/data/config.json b/data/config.json
index 7479b05..ab01155 100644
--- a/data/config.json
+++ b/data/config.json
@@ -17,7 +17,7 @@
         "keep_days": 30
     },
     "other": {
-        "master_password_hash": "$pbkdf2-sha256$29000$8P7fG2MspVRqLaVUyrn3Pg$e0HxlEoo7eAfETUFCi7G4/0egtE.Foqsf9eR69Dg6a0"
+        "master_password_hash": "$pbkdf2-sha256$29000$cC6FsJayNmYsZezdW6tVyg$5LMyYrqVoM0qwxugSedT6UFMnLHePg2atdECBxAVJEk"
     },
     "version": "1.0.0"
 }
\ No newline at end of file
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index 6bb6b7e..89c68c8 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -141,6 +141,78 @@ Source: [src/config/settings.py](../src/config/settings.py#L1-L96)
 
 ---
 
+## 11. Graceful Shutdown
+
+The application implements a comprehensive graceful shutdown mechanism that ensures data integrity and proper cleanup when the server is stopped via Ctrl+C (SIGINT) or SIGTERM.
+
+### 11.1 Shutdown Sequence
+
+```
+1. SIGINT/SIGTERM received
+   +-- Uvicorn catches signal
+   +-- Stops accepting new requests
+
+2. FastAPI lifespan shutdown triggered
+   +-- 30 second total timeout
+
+3. WebSocket shutdown (5s timeout)
+   +-- Broadcast {"type": "server_shutdown"} to all clients
+   +-- Close each connection with code 1001 (Going Away)
+   +-- Clear connection tracking data
+
+4. Download service stop (10s timeout)
+   +-- Set shutdown flag
+   +-- Persist active download as "pending" in database
+   +-- Cancel active download task
+   +-- Shutdown ThreadPoolExecutor with wait
+
+5. Progress service cleanup
+   +-- Clear event subscribers
+   +-- Clear active progress tracking
+
+6. Database cleanup (10s timeout)
+   +-- SQLite: Run PRAGMA wal_checkpoint(TRUNCATE)
+   +-- Dispose async engine
+   +-- Dispose sync engine
+
+7. Process exits cleanly
+```
+
+Source: [src/server/fastapi_app.py](../src/server/fastapi_app.py#L142-L210)
+
+### 11.2 Key Components
+
+| Component | File | Shutdown Method |
+| ------------------- | ------------------------------------------------------------------------------------------ | ------------------------ |
+| WebSocket Service | [websocket_service.py](../src/server/services/websocket_service.py) | `shutdown(timeout=5.0)` |
+| Download Service | [download_service.py](../src/server/services/download_service.py) | `stop(timeout=10.0)` |
+| Database Connection | [connection.py](../src/server/database/connection.py) | `close_db()` |
+| Uvicorn Config | [run_server.py](../run_server.py) | `timeout_graceful_shutdown=30` |
+| Stop Script | [stop_server.sh](../stop_server.sh) | SIGTERM with fallback |
+
+### 11.3 Data Integrity Guarantees
+
+1. **Active downloads preserved**: In-progress downloads are saved as "pending" and can resume on restart.
+
+2. **Database WAL flushed**: SQLite WAL checkpoint ensures all writes are in the main database file.
+
+3. **WebSocket clients notified**: Clients receive shutdown message before connection closes.
+
+4. **Thread pool cleanup**: Background threads complete or are gracefully cancelled.
+
+### 11.4 Manual Stop
+
+```bash
+# Graceful stop via script (sends SIGTERM, waits up to 30s)
+./stop_server.sh
+
+# Or press Ctrl+C in terminal running the server
+```
+
+Source: [stop_server.sh](../stop_server.sh#L1-L80)
+
+---
+
 ## 3. Component Interactions
 
 ### 3.1 Request Flow (REST API)
diff --git a/docs/instructions.md b/docs/instructions.md
index 637a9fa..9420c38 100644
--- a/docs/instructions.md
+++ b/docs/instructions.md
@@ -143,180 +143,92 @@ Currently, the application uses SQLAlchemy sessions with auto-commit behavior th
 
 ---
 
-### Step 1: Create Transaction Utilities Module
+## Task: Graceful Shutdown Implementation ✅ COMPLETED
 
-**File**: `src/server/database/transaction.py`
+### Objective
 
-Create a new module providing transaction management utilities:
+Implement proper graceful shutdown handling so that Ctrl+C (SIGINT) or SIGTERM triggers a clean shutdown sequence that terminates all concurrent processes and prevents database corruption.
 
-1. **`@transactional` decorator** - Wraps a function in a transaction boundary
+### Background
 
-   - Accepts a session parameter or retrieves one via dependency injection
-   - Commits on success, rolls back on exception
-   - Re-raises exceptions after rollback
-   - Logs transaction start, commit, and rollback events
+The application runs multiple concurrent services (WebSocket connections, download service with ThreadPoolExecutor, database sessions) that need to be properly cleaned up during shutdown. Without graceful shutdown, active downloads may corrupt state, database writes may be incomplete, and WebSocket clients won't receive disconnect notifications.
 
-2. **`TransactionContext` class** - Context manager for explicit transaction control
+### Implementation Summary
 
-   - Supports `with` statement usage
-   - Provides `savepoint()` method for nested transactions using `begin_nested()`
-   - Handles commit/rollback automatically
+The following components were implemented:
 
-3. **`atomic()` function** - Async context manager for async operations
-   - Same behavior as `TransactionContext` but for async code
+#### 1. WebSocket Service Shutdown ([src/server/services/websocket_service.py](src/server/services/websocket_service.py))
 
-**Interface Requirements**:
+- Added `shutdown()` method to `ConnectionManager` that:
+  - Broadcasts `{"type": "server_shutdown"}` notification to all connected clients
+  - Gracefully closes each WebSocket connection with code 1001 (Going Away)
+  - Clears all connection tracking data structures
+  - Supports configurable timeout (default 5 seconds)
+- Added `shutdown()` method to `WebSocketService` that delegates to the manager
 
-- Decorator must work with both sync and async functions
-- Must handle the case where session is already in a transaction
-- Must support optional `propagation` parameter (REQUIRED, REQUIRES_NEW, NESTED)
+#### 2. Download Service Stop ([src/server/services/download_service.py](src/server/services/download_service.py))
 
----
+- Enhanced `stop()` method to:
+  - Persist active downloads back to "pending" status in database (allows resume on restart)
+  - Cancel active download tasks with proper timeout handling
+  - Shutdown ThreadPoolExecutor with `wait=True` and configurable timeout (default 10 seconds)
+  - Fall back to forced shutdown if timeout expires
 
-### Step 2: Update Connection Module
+#### 3. FastAPI Lifespan Shutdown ([src/server/fastapi_app.py](src/server/fastapi_app.py))
 
-**File**: `src/server/database/connection.py`
+- Expanded shutdown sequence in proper order:
+  1. Broadcast shutdown notification via WebSocket
+  2. Stop download service and persist state
+  3. Clean up progress service (clear subscribers and active progress)
+  4. Close database connections with WAL checkpoint
+- Added timeout protection (30 seconds total) with remaining time tracking
+- Each step has individual timeout to prevent hanging
 
-Modify the existing session management:
+#### 4. Uvicorn Graceful Shutdown ([run_server.py](run_server.py))
 
-1. Add `get_transactional_session()` generator that does NOT auto-commit
-2. Add `TransactionManager` class for manual transaction control
-3. Keep `get_db_session()` unchanged for backward compatibility
-4. Add session state inspection utilities (`is_in_transaction()`, `get_transaction_depth()`)
+- Added `timeout_graceful_shutdown=30` parameter to uvicorn.run()
+- Ensures uvicorn allows sufficient time for lifespan shutdown to complete
+- Updated docstring to document Ctrl+C behavior
 
----
+#### 5. Stop Script ([stop_server.sh](stop_server.sh))
 
-### Step 3: Wrap Service Layer Operations
+- Replaced `kill -9` (SIGKILL) with `kill -TERM` (SIGTERM)
+- Added `wait_for_process()` function that waits up to 30 seconds for graceful shutdown
+- Only falls back to SIGKILL if graceful shutdown times out
+- Improved user feedback during shutdown process
 
-**File**: `src/server/database/service.py`
+#### 6. Database WAL Checkpoint ([src/server/database/connection.py](src/server/database/connection.py))
 
-Apply transaction handling to all compound write operations:
+- Enhanced `close_db()` to run `PRAGMA wal_checkpoint(TRUNCATE)` for SQLite
+- Ensures all pending WAL writes are flushed to main database file
+- Prevents database corruption during shutdown
 
-**AnimeService**:
+### How Graceful Shutdown Works
 
-- `create_anime_with_episodes()` - if exists, wrap in transaction
-- Any method that calls multiple repository methods
+1. **Ctrl+C or SIGTERM received** → uvicorn catches signal
+2. **uvicorn triggers lifespan shutdown** → FastAPI's lifespan context manager exits
+3. **WebSocket broadcast** → All connected clients receive shutdown notification
+4. **Download service stops** → Active downloads persisted, executor shutdown
+5. **Progress service cleanup** → Event subscribers cleared
+6. **Database cleanup** → WAL checkpoint, connections disposed
+7. **Process exits cleanly** → No data loss or corruption
 
-**EpisodeService**:
+### Testing
 
-- `bulk_update_episodes()` - if exists
-- `mark_episodes_downloaded()` - if handles multiple episodes
+```bash
+# Start server
+conda run -n AniWorld python run_server.py
 
-**DownloadQueueService**:
-
-- `add_batch_to_queue()` - if exists
-- `clear_and_repopulate()` - if exists
-- Any method performing multiple writes
-
-**SessionService**:
-
-- `rotate_session()` - delete old + create new must be atomic
-- `cleanup_expired_sessions()` - bulk delete operation
-
-**Pattern to follow**:
-
-```python
-@transactional
-def compound_operation(self, session: Session, data: SomeModel) -> Result:
-    # Multiple write operations here
-    # All succeed or all fail
+# Press Ctrl+C to trigger graceful shutdown
+# Or use the stop script:
+./stop_server.sh
 ```
 
----
+### Verification
 
-### Step 4: Update Queue Repository
-
-**File**: `src/server/services/queue_repository.py`
-
-Ensure atomic operations for:
-
-1. `save_item()` - check existence + insert/update must be atomic
-2. `remove_item()` - if involves multiple deletes
-3. `clear_all_items()` - bulk delete should be transactional
-4. `reorder_queue()` - multiple position updates must be atomic
+- All existing tests pass (websocket, download service, database transactions)
+- WebSocket clients receive disconnect notification before connection closes
+- Active downloads are preserved and can resume on restart
+- SQLite WAL file is checkpointed before shutdown
 
 ---
-
-### Step 5: Update API Endpoints
-
-**Files**: `src/server/api/anime.py`, `src/server/api/downloads.py`, `src/server/api/auth.py`
-
-Review and update endpoints that perform multiple database operations:
-
-1. Identify endpoints calling multiple service methods
-2. Wrap in transaction boundary at the endpoint level OR ensure services handle it
-3. Prefer service-level transactions over endpoint-level for reusability
-
----
-
-### Step 6: Add Unit Tests
-
-**File**: `tests/unit/test_transactions.py`
-
-Create comprehensive tests:
-
-1. **Test successful transaction commit** - verify all changes persisted
-2. **Test rollback on exception** - verify no partial writes
-3. **Test nested transaction with savepoint** - verify partial rollback works
-4. **Test decorator with sync function**
-5. **Test decorator with async function**
-6. **Test context manager usage**
-7. **Test transaction propagation modes**
-
-**File**: `tests/unit/test_service_transactions.py`
-
-1. Test each service's compound operations for atomicity
-2. Mock exceptions mid-operation to verify rollback
-3. Verify no orphaned data after failed operations
-
----
-
-### Step 7: Update Integration Tests
-
-**File**: `tests/integration/test_db_transactions.py`
-
-1. Test real database transaction behavior
-2. Test concurrent transaction handling
-3. Test transaction isolation levels if applicable
-
----
-
-### Step 7: Update Dokumentation
-
-1. Check Docs folder and updated the needed files
-
----
-
-### Implementation Notes
-
-- **SQLAlchemy Pattern**: Use `session.begin_nested()` for savepoints
-- **Error Handling**: Always log transaction failures with full context
-- **Performance**: Transactions have overhead - don't wrap single operations unnecessarily
-- **Testing**: Use `session.rollback()` in test fixtures to ensure clean state
-
-### Files to Modify
-
-| File | Action |
-| ------------------------------------------- | ------------------------------------------ |
-| `src/server/database/transaction.py` | CREATE - New transaction utilities |
-| `src/server/database/connection.py` | MODIFY - Add transactional session support |
-| `src/server/database/service.py` | MODIFY - Apply @transactional decorator |
-| `src/server/services/queue_repository.py` | MODIFY - Ensure atomic operations |
-| `src/server/api/anime.py` | REVIEW - Check for multi-write endpoints |
-| `src/server/api/downloads.py` | REVIEW - Check for multi-write endpoints |
-| `src/server/api/auth.py` | REVIEW - Check for multi-write endpoints |
-| `tests/unit/test_transactions.py` | CREATE - Transaction unit tests |
-| `tests/unit/test_service_transactions.py` | CREATE - Service transaction tests |
-| `tests/integration/test_db_transactions.py` | CREATE - Integration tests |
-
-### Acceptance Criteria
-
-- [x] All database write operations use explicit transactions
-- [x] Compound operations are atomic (all-or-nothing)
-- [x] Exceptions trigger proper rollback
-- [x] No partial writes occur on failures
-- [x] All existing tests pass (1090 tests passing)
-- [x] New transaction tests pass with >90% coverage (90% achieved)
-- [x] Logging captures transaction lifecycle events
-- [x] Documentation updated in DATABASE.md
-- [x] Code follows project coding standards
diff --git a/run_server.py b/run_server.py
index 39f173c..ed82e76 100644
--- a/run_server.py
+++ b/run_server.py
@@ -2,7 +2,8 @@
 """
 Startup script for the Aniworld FastAPI application.
 
-This script starts the application with proper logging configuration.
+This script starts the application with proper logging configuration
+and graceful shutdown support via Ctrl+C (SIGINT) or SIGTERM.
 """
 
 import uvicorn
@@ -15,6 +16,11 @@ if __name__ == "__main__":
     # Run the application with logging.
     # Only watch .py files in src/, explicitly exclude __pycache__.
     # This prevents reload loops from .pyc compilation.
+    #
+    # Graceful shutdown:
+    # - Ctrl+C (SIGINT) or SIGTERM triggers graceful shutdown
+    # - timeout_graceful_shutdown ensures shutdown completes within 30s
+    # - The FastAPI lifespan handler orchestrates cleanup in proper order
     uvicorn.run(
         "src.server.fastapi_app:app",
         host="127.0.0.1",
@@ -24,4 +30,5 @@ if __name__ == "__main__":
         reload_includes=["*.py"],
         reload_excludes=["*/__pycache__/*", "*.pyc"],
         log_config=log_config,
+        timeout_graceful_shutdown=30,  # Allow 30s for graceful shutdown
     )
diff --git a/src/server/database/connection.py b/src/server/database/connection.py
index e00b776..511b182 100644
--- a/src/server/database/connection.py
+++ b/src/server/database/connection.py
@@ -150,11 +150,29 @@ async def init_db() -> None:
 async def close_db() -> None:
     """Close database connections and cleanup resources.
 
+    Performs a WAL checkpoint for SQLite databases to ensure all
+    pending writes are flushed to the main database file before
+    closing connections. This prevents database corruption during
+    shutdown.
+
     Should be called during application shutdown.
     """
     global _engine, _sync_engine, _session_factory, _sync_session_factory
 
     try:
+        # For SQLite: checkpoint WAL to ensure all writes are flushed
+        if _sync_engine and "sqlite" in str(_sync_engine.url):
+            logger.info("Running SQLite WAL checkpoint before shutdown...")
+            try:
+                from sqlalchemy import text
+                with _sync_engine.connect() as conn:
+                    # TRUNCATE mode: checkpoint and truncate WAL file
+                    conn.execute(text("PRAGMA wal_checkpoint(TRUNCATE)"))
+                    conn.commit()
+                logger.info("SQLite WAL checkpoint completed")
+            except Exception as e:
+                logger.warning(f"WAL checkpoint failed (non-critical): {e}")
+
         if _engine:
             logger.info("Closing async database engine...")
             await _engine.dispose()
diff --git a/src/server/fastapi_app.py b/src/server/fastapi_app.py
index 713177f..250d613 100644
--- a/src/server/fastapi_app.py
+++ b/src/server/fastapi_app.py
@@ -155,30 +155,81 @@ async def lifespan(_application: FastAPI):
     # Yield control to the application
     yield
 
-    # Shutdown
-    logger.info("FastAPI application shutting down")
+    # Shutdown - execute in proper order with timeout protection
+    logger.info("FastAPI application shutting down (graceful shutdown initiated)")
 
-    # Shutdown download service and its thread pool
+    # Define shutdown timeout (total time allowed for all shutdown operations)
+    SHUTDOWN_TIMEOUT = 30.0
+
+    import time
+    shutdown_start = time.monotonic()
+
+    def remaining_time() -> float:
+        """Calculate remaining shutdown time."""
+        elapsed = time.monotonic() - shutdown_start
+        return max(0.0, SHUTDOWN_TIMEOUT - elapsed)
+
+    # 1. Broadcast shutdown notification via WebSocket
+    try:
+        ws_service = get_websocket_service()
+        logger.info("Broadcasting shutdown notification to WebSocket clients...")
+        await asyncio.wait_for(
+            ws_service.shutdown(timeout=min(5.0, remaining_time())),
+            timeout=min(5.0, remaining_time())
+        )
+        logger.info("WebSocket shutdown complete")
+    except asyncio.TimeoutError:
+        logger.warning("WebSocket shutdown timed out")
+    except Exception as e:  # pylint: disable=broad-exception-caught
+        logger.error("Error during WebSocket shutdown: %s", e, exc_info=True)
+
+    # 2. Shutdown download service and persist active downloads
     try:
         from src.server.services.download_service import (  # noqa: E501
             _download_service_instance,
         )
         if _download_service_instance is not None:
             logger.info("Stopping download service...")
-            await _download_service_instance.stop()
+            await asyncio.wait_for(
+                _download_service_instance.stop(timeout=min(10.0, remaining_time())),
+                timeout=min(15.0, remaining_time())
+            )
             logger.info("Download service stopped successfully")
+    except asyncio.TimeoutError:
+        logger.warning("Download service shutdown timed out")
     except Exception as e:  # pylint: disable=broad-exception-caught
         logger.error("Error stopping download service: %s", e, exc_info=True)
 
-    # Close database connections
+    # 3. Cleanup progress service
+    try:
+        progress_service = get_progress_service()
+        logger.info("Cleaning up progress service...")
+        # Clear any active progress tracking and subscribers
+        progress_service._subscribers.clear()
+        progress_service._active_progress.clear()
+        logger.info("Progress service cleanup complete")
+    except Exception as e:  # pylint: disable=broad-exception-caught
+        logger.error("Error cleaning up progress service: %s", e, exc_info=True)
+
+    # 4. Close database connections with WAL checkpoint
     try:
         from src.server.database.connection import close_db
-        await close_db()
+        logger.info("Closing database connections...")
+        await asyncio.wait_for(
+            close_db(),
+            timeout=min(10.0, remaining_time())
+        )
         logger.info("Database connections closed")
+    except asyncio.TimeoutError:
+        logger.warning("Database shutdown timed out")
     except Exception as e:  # pylint: disable=broad-exception-caught
         logger.error("Error closing database: %s", e, exc_info=True)
 
-    logger.info("FastAPI application shutdown complete")
+    elapsed_total = time.monotonic() - shutdown_start
+    logger.info(
+        "FastAPI application shutdown complete (took %.2fs)",
+        elapsed_total
+    )
 
 
 # Initialize FastAPI app with lifespan
diff --git a/src/server/services/download_service.py b/src/server/services/download_service.py
index baee091..477f1eb 100644
--- a/src/server/services/download_service.py
+++ b/src/server/services/download_service.py
@@ -997,30 +997,76 @@ class DownloadService:
         """
         logger.info("Download queue service initialized")
 
-    async def stop(self) -> None:
-        """Stop the download queue service and cancel active downloads.
+    async def stop(self, timeout: float = 10.0) -> None:
+        """Stop the download queue service gracefully.
 
-        Cancels any active download and shuts down the thread pool immediately.
+        Persists in-progress downloads back to pending state, cancels active
+        tasks, and shuts down the thread pool with a timeout.
+
+        Args:
+            timeout: Maximum time (seconds) to wait for executor shutdown
         """
-        logger.info("Stopping download queue service...")
+        logger.info("Stopping download queue service (timeout=%.1fs)...", timeout)
 
-        # Set shutdown flag
+        # Set shutdown flag first to prevent new downloads
         self._is_shutting_down = True
         self._is_stopped = True
 
+        # Persist active download back to pending state if one exists
+        if self._active_download:
+            logger.info(
+                "Persisting active download to pending: item_id=%s",
+                self._active_download.id
+            )
+            try:
+                # Reset status to pending so it can be resumed on restart
+                self._active_download.status = DownloadStatus.PENDING
+                self._active_download.completed_at = None
+                await self._save_to_database(self._active_download)
+                logger.info("Active download persisted to database as pending")
+            except Exception as e:
+                logger.error("Failed to persist active download: %s", e)
+
         # Cancel active download task if running
         active_task = self._active_download_task
         if active_task and not active_task.done():
             logger.info("Cancelling active download task...")
             active_task.cancel()
             try:
-                await active_task
+                # Wait briefly for cancellation to complete
+                await asyncio.wait_for(
+                    asyncio.shield(active_task),
+                    timeout=2.0
+                )
+            except asyncio.TimeoutError:
+                logger.warning("Download task cancellation timed out")
             except asyncio.CancelledError:
                 logger.info("Active download task cancelled")
+            except Exception as e:
+                logger.warning("Error during task cancellation: %s", e)
 
-        # Shutdown executor immediately, don't wait for tasks
+        # Shutdown executor with wait and timeout
         logger.info("Shutting down thread pool executor...")
-        self._executor.shutdown(wait=False, cancel_futures=True)
+        try:
+            # Run executor shutdown in thread to avoid blocking event loop
+            loop = asyncio.get_event_loop()
+            await asyncio.wait_for(
+                loop.run_in_executor(
+                    None,
+                    lambda: self._executor.shutdown(wait=True, cancel_futures=True)
+                ),
+                timeout=timeout
+            )
+            logger.info("Thread pool executor shutdown complete")
+        except asyncio.TimeoutError:
+            logger.warning(
+                "Executor shutdown timed out after %.1fs, forcing shutdown",
+                timeout
+            )
+            # Force shutdown without waiting
+            self._executor.shutdown(wait=False, cancel_futures=True)
+        except Exception as e:
+            logger.error("Error during executor shutdown: %s", e)
 
         logger.info("Download queue service stopped")
 
diff --git a/src/server/services/websocket_service.py b/src/server/services/websocket_service.py
index f45c320..39ef2f3 100644
--- a/src/server/services/websocket_service.py
+++ b/src/server/services/websocket_service.py
@@ -322,6 +322,85 @@ class ConnectionManager:
             connection_id=connection_id,
         )
 
+    async def shutdown(self, timeout: float = 5.0) -> None:
+        """Gracefully shutdown all WebSocket connections.
+
+        Broadcasts a shutdown notification to all clients, then closes
+        each connection with proper close codes.
+
+        Args:
+            timeout: Maximum time (seconds) to wait for all closes to complete
+        """
+        logger.info(
+            "Initiating WebSocket shutdown, connections=%d",
+            len(self._active_connections)
+        )
+
+        # Broadcast shutdown notification to all clients
+        shutdown_message = {
+            "type": "server_shutdown",
+            "timestamp": datetime.now(timezone.utc).isoformat(),
+            "data": {
+                "message": "Server is shutting down",
+                "reason": "graceful_shutdown",
+            },
+        }
+
+        try:
+            await self.broadcast(shutdown_message)
+        except Exception as e:
+            logger.warning("Failed to broadcast shutdown message: %s", e)
+
+        # Close all connections gracefully
+        async with self._lock:
+            connection_ids = list(self._active_connections.keys())
+
+        close_tasks = []
+        for connection_id in connection_ids:
+            websocket = self._active_connections.get(connection_id)
+            if websocket:
+                close_tasks.append(
+                    self._close_connection_gracefully(connection_id, websocket)
+                )
+
+        if close_tasks:
+            # Wait for all closes with timeout
+            try:
+                await asyncio.wait_for(
+                    asyncio.gather(*close_tasks, return_exceptions=True),
+                    timeout=timeout
+                )
+            except asyncio.TimeoutError:
+                logger.warning(
+                    "WebSocket shutdown timed out after %.1f seconds", timeout
+                )
+
+        # Clear all data structures
+        async with self._lock:
+            self._active_connections.clear()
+            self._rooms.clear()
+            self._connection_metadata.clear()
+
+        logger.info("WebSocket shutdown complete")
+
+    async def _close_connection_gracefully(
+        self, connection_id: str, websocket: WebSocket
+    ) -> None:
+        """Close a single WebSocket connection gracefully.
+
+        Args:
+            connection_id: The connection identifier
+            websocket: The WebSocket connection to close
+        """
+        try:
+            # Code 1001 = Going Away (server shutdown)
+            await websocket.close(code=1001, reason="Server shutdown")
+            logger.debug("Closed WebSocket connection: %s", connection_id)
+        except Exception as e:
+            logger.debug(
+                "Error closing WebSocket %s: %s", connection_id, str(e)
+            )
+
 
 class WebSocketService:
     """High-level WebSocket service for application-wide messaging.
@@ -579,6 +658,18 @@ class WebSocketService:
             elapsed_seconds=round(elapsed_seconds, 2),
         )
 
+    async def shutdown(self, timeout: float = 5.0) -> None:
+        """Gracefully shutdown the WebSocket service.
+
+        Broadcasts shutdown notification and closes all connections.
+
+        Args:
+            timeout: Maximum time (seconds) to wait for shutdown
+        """
+        logger.info("Shutting down WebSocket service...")
+        await self._manager.shutdown(timeout=timeout)
+        logger.info("WebSocket service shutdown complete")
+
 
 # Singleton instance for application-wide access
 _websocket_service: Optional[WebSocketService] = None
diff --git a/stop_server.sh b/stop_server.sh
index d722e59..7e89368 100644
--- a/stop_server.sh
+++ b/stop_server.sh
@@ -1,22 +1,93 @@
 #!/bin/bash
-# Stop Aniworld FastAPI Server
+# Stop Aniworld FastAPI Server (Graceful Shutdown)
+#
+# This script performs a graceful shutdown by sending SIGTERM first,
+# allowing the application to clean up resources properly before
+# falling back to SIGKILL if needed.
 
-echo "Stopping Aniworld server..."
+GRACEFUL_TIMEOUT=30  # seconds to wait for graceful shutdown
 
-# Method 1: Kill uvicorn processes
-pkill -f "uvicorn.*fastapi_app:app" && echo "✓ Stopped uvicorn processes"
+echo "Stopping Aniworld server (graceful shutdown)..."
 
-# Method 2: Kill any process using port 8000
+# Function to wait for a process to terminate
+wait_for_process() {
+    local pid=$1
+    local timeout=$2
+    local count=0
+
+    while [ $count -lt $timeout ]; do
+        if ! kill -0 "$pid" 2>/dev/null; then
+            return 0  # Process terminated
+        fi
+        sleep 1
+        count=$((count + 1))
+        echo -ne "\r Waiting for graceful shutdown... ${count}/${timeout}s"
+    done
+    echo ""
+    return 1  # Timeout
+}
+
+# Method 1: Gracefully stop uvicorn processes
+UVICORN_PIDS=$(pgrep -f "uvicorn.*fastapi_app:app")
+if [ -n "$UVICORN_PIDS" ]; then
+    echo "Sending SIGTERM to uvicorn processes..."
+    for pid in $UVICORN_PIDS; do
+        kill -TERM "$pid" 2>/dev/null
+    done
+
+    # Wait for graceful shutdown
+    all_terminated=true
+    for pid in $UVICORN_PIDS; do
+        if ! wait_for_process "$pid" "$GRACEFUL_TIMEOUT"; then
+            all_terminated=false
+            echo " Process $pid did not terminate gracefully, forcing..."
+            kill -9 "$pid" 2>/dev/null
+        fi
+    done
+
+    if $all_terminated; then
+        echo "✓ Uvicorn processes stopped gracefully"
+    else
+        echo "✓ Uvicorn processes stopped (forced)"
+    fi
+else
+    echo "✓ No uvicorn processes running"
+fi
+
+# Method 2: Gracefully stop any process using port 8000
 PORT_PID=$(lsof -ti:8000)
 if [ -n "$PORT_PID" ]; then
-    kill -9 $PORT_PID
-    echo "✓ Killed process on port 8000 (PID: $PORT_PID)"
+    echo "Found process on port 8000 (PID: $PORT_PID)"
+
+    # Send SIGTERM first
+    kill -TERM "$PORT_PID" 2>/dev/null
+
+    if wait_for_process "$PORT_PID" "$GRACEFUL_TIMEOUT"; then
+        echo "✓ Process on port 8000 stopped gracefully"
+    else
+        echo " Graceful shutdown timed out, forcing..."
+        kill -9 "$PORT_PID" 2>/dev/null
+        echo "✓ Process on port 8000 stopped (forced)"
+    fi
 else
     echo "✓ Port 8000 is already free"
 fi
 
-# Method 3: Kill any python processes running the server
-pkill -f "run_server.py" && echo "✓ Stopped run_server.py processes"
+# Method 3: Gracefully stop run_server.py processes
+SERVER_PIDS=$(pgrep -f "run_server.py")
+if [ -n "$SERVER_PIDS" ]; then
+    echo "Sending SIGTERM to run_server.py processes..."
+    for pid in $SERVER_PIDS; do
+        kill -TERM "$pid" 2>/dev/null
+    done
+
+    for pid in $SERVER_PIDS; do
+        if ! wait_for_process "$pid" 10; then
+            kill -9 "$pid" 2>/dev/null
+        fi
+    done
+    echo "✓ Stopped run_server.py processes"
+fi
 
 echo ""
 echo "Server stopped successfully!"
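
Note on the timeout budgeting in `src/server/fastapi_app.py`: each shutdown step is capped by its own timeout and by whatever remains of the 30-second total. The pattern can be sketched in isolation as below. This is a minimal illustration, not code from this change: `run_shutdown_steps`, the `ShutdownStep` tuple shape, and the three stand-in steps are all hypothetical names.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Tuple

# Hypothetical shape: (name, per-step timeout in seconds, coroutine function)
ShutdownStep = Tuple[str, float, Callable[[], Awaitable[None]]]


async def run_shutdown_steps(
    steps: List[ShutdownStep], total_timeout: float = 30.0
) -> List[Tuple[str, str]]:
    """Run steps in order; each is capped by its own timeout and by the
    time remaining in the shared budget. A failure never stops later steps."""
    start = time.monotonic()
    outcomes: List[Tuple[str, str]] = []
    for name, step_timeout, step in steps:
        remaining = max(0.0, total_timeout - (time.monotonic() - start))
        try:
            await asyncio.wait_for(step(), timeout=min(step_timeout, remaining))
            outcomes.append((name, "ok"))
        except asyncio.TimeoutError:
            outcomes.append((name, "timeout"))
        except Exception:  # keep going so later cleanup still runs
            outcomes.append((name, "error"))
    return outcomes


# Stand-in steps for illustration only.
async def quick_step() -> None:
    await asyncio.sleep(0)


async def slow_step() -> None:
    await asyncio.sleep(1.0)


async def failing_step() -> None:
    raise RuntimeError("simulated cleanup failure")
```

Clamping with `min(step_timeout, remaining)` means a slow early step shrinks the time left for later steps rather than extending the total, which is the same trade-off the lifespan handler makes with `remaining_time()`.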