To Do #287
Updated by Harikrishnan Murugan about 1 month ago
**Task Description**

Set up a production-ready FastAPI project structure using the team's boilerplate foundation, integrating Pipecat + Gemini Live for real-time voice AI capabilities, and implementing a minimal React frontend for testing. This task establishes the technical foundation for the AI Communication Coach project with proper code organization, audio capture functionality, and event-driven tool response patterns.

**This task has TWO sequential parts:**

- **Part A (Ishita)**: Production setup, backend APIs, minimal frontend
- **Part B (Prisha)**: Tool migration (scenario generation + feedback generation), static user data preparation

**IMPORTANT: NO tool integration is required in this task. Keep the AI assistant basic and generic. Tool integration will be handled separately by Prisha.**

**Technical Approach**

**Branching Strategy:**

1. Clone the boilerplate repository (https://github.com/harikrishnan-crayond/fastapi-boilerplate) into a new local folder
2. Copy relevant code from:
   - Audio Capture POC (provided as a ZIP) - reference for the Pipecat + Gemini integration approach
   - Existing ai-coach-gen-ai repository - any required existing code
3. Organize all code according to the boilerplate structure and BEST_PRACTICES.md standards
4. Remove the boilerplate's .git directory
5. Initialize with ai-coach-gen-ai as origin: `git remote add origin https://github.com/harikrishnan-crayond/ai-coach-gen-ai`
6. Create and push to a new branch: `feature/production-setup`

**Code Organization (Domain-Based - Netflix Dispatch Pattern):**

```
src/
├── sessions/          # Session management domain
│   ├── router.py      # API endpoints for session lifecycle
│   ├── schemas.py     # Pydantic request/response models
│   └── service.py     # Business logic
├── data/              # Static data
│   └── users.json     # User profiles
├── health/
│   └── router.py      # Health check endpoint
├── config.py          # Configuration management
├── exceptions.py
└── main.py
```

**Backend Implementation:**

1. **Generic AI Assistant** (NO tools yet; see the pipeline sketch after this section):
   - Pipecat + Gemini Live (model: gemini-2.5-flash-native-audio-preview-09-2025)
   - AudioBufferProcessor for turn-based audio capture (on_user_turn_audio_data, on_bot_turn_audio_data)
   - System prompt: "You are a helpful AI assistant. Keep responses concise and natural."
   - SmallWebRTC transport for peer-to-peer audio
   - **Do NOT register any tool functions - basic assistant only**

2. **6 Mandatory API Endpoints** (async with Pydantic schemas; see the router sketch after this section):
   - POST /sessions/start - Body: {user_id, level, topic?} → Returns: {session_id, webrtc_offer}
   - POST /sessions/{session_id}/stop - Stops AudioBufferProcessor recording
   - GET /sessions/{session_id}/audio-messages - Returns the captured audio array
   - POST /sessions/{session_id}/webrtc-answer - WebRTC signaling
   - POST /sessions/{session_id}/ice-candidate - ICE candidate exchange
   - GET /health - Health check

3. **Storage Structure:**

```python
active_sessions[session_id] = {
    "user_id": "user_001",
    "audio_messages": [
        {"role": "user", "audio": "base64...", "timestamp": 1234.56, "duration_ms": 3000},
        {"role": "assistant", "audio": "base64...", "timestamp": 1238.12, "duration_ms": 2500},
    ],
    ...
}
```
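As a rough illustration of item 1 above, here is a minimal sketch of the generic-assistant pipeline. This is not a definitive implementation: Pipecat import paths, constructor parameters (e.g. `enable_turn_audio`, `system_instruction`), and event-handler signatures vary between releases and should be verified against the Pipecat documentation linked under Resources. The `run_bot` signature and the `session` dict wiring are assumptions for illustration.

```python
# Hedged sketch only -- verify import paths and signatures against
# https://docs.pipecat.ai/ for the pinned Pipecat version.
import os

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor
from pipecat.services.gemini_multimodal_live.gemini import GeminiMultimodalLiveLLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport

SYSTEM_PROMPT = "You are a helpful AI assistant. Keep responses concise and natural."


async def run_bot(webrtc_connection, session: dict) -> None:
    """Run one voice session. `webrtc_connection` is the SmallWebRTC
    connection negotiated via the signaling endpoints; `session` is the
    active_sessions entry described in the Storage Structure above."""
    transport = SmallWebRTCTransport(
        webrtc_connection=webrtc_connection,
        params=TransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )

    llm = GeminiMultimodalLiveLLMService(
        api_key=os.environ["GEMINI_API_KEY"],
        model="gemini-2.5-flash-native-audio-preview-09-2025",
        system_instruction=SYSTEM_PROMPT,
        # NOTE: no tool functions registered -- basic assistant only, per this task.
    )

    # Turn-based capture of both sides of the conversation
    # (enable_turn_audio is an assumed constructor flag -- check the docs).
    audiobuffer = AudioBufferProcessor(enable_turn_audio=True)

    @audiobuffer.event_handler("on_user_turn_audio_data")
    async def on_user_turn(buffer, audio: bytes, sample_rate: int, num_channels: int):
        # Real implementation: base64-encode and record timestamp/duration_ms.
        session["audio_messages"].append({"role": "user", "audio": audio})

    @audiobuffer.event_handler("on_bot_turn_audio_data")
    async def on_bot_turn(buffer, audio: bytes, sample_rate: int, num_channels: int):
        session["audio_messages"].append({"role": "assistant", "audio": audio})

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        await audiobuffer.start_recording()

    pipeline = Pipeline([transport.input(), llm, transport.output(), audiobuffer])
    await PipelineRunner().run(PipelineTask(pipeline))
```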
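And for item 2, a sketch of the sessions router showing the required pattern: async endpoints, Pydantic schemas with Field validation, and explicit response models, per the Code Quality Requirements below. Only two of the six endpoints are shown; field names and validation rules beyond those in the endpoint spec above are assumptions.

```python
# Hedged sketch of src/sessions/router.py -- two of the six endpoints.
from uuid import uuid4

from fastapi import APIRouter
from pydantic import BaseModel, Field

router = APIRouter(prefix="/sessions", tags=["sessions"])


class StartSessionRequest(BaseModel):
    user_id: str = Field(..., min_length=1)
    level: str = Field(..., description="Difficulty level (assumed field semantics)")
    topic: str | None = Field(default=None, description="Optional conversation topic")


class StartSessionResponse(BaseModel):
    session_id: str
    webrtc_offer: dict


@router.post("/start", response_model=StartSessionResponse)
async def start_session(body: StartSessionRequest) -> StartSessionResponse:
    session_id = str(uuid4())
    # Real implementation: register the session in active_sessions, spin up
    # the Pipecat pipeline, and return the SDP offer from the transport.
    return StartSessionResponse(session_id=session_id, webrtc_offer={})


@router.post("/{session_id}/stop")
async def stop_session(session_id: str) -> dict:
    # Real implementation: stop the AudioBufferProcessor recording and
    # tear down the pipeline for this session.
    return {"status": "stopped", "session_id": session_id}
```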
**Frontend Implementation:**

Minimal React UI in the frontend/ directory for testing backend functionality:

- Start conversation button (initiates the WebRTC connection)
- End conversation button (stops the session)
- Audio message list with playback controls
- Based on the Audio Capture POC frontend structure

**Code Quality Requirements (MANDATORY):**

- Read and follow BEST_PRACTICES.md in the boilerplate root
- Run `ruff format` for code formatting
- Run `ruff check --fix` for linting
- Type hints on all functions
- Async/await for all I/O operations
- Pydantic models with Field validation
- Response models specified for all endpoints

**Acceptance Criteria**

**Part A - Ishita's Completion Criteria:**

- Production-ready boilerplate structure established with domain-based organization
- All 6 API endpoints functional and properly documented
- Generic AI assistant working (can hold a voice conversation via the frontend)
- WebRTC connection functional (SmallWebRTC transport configured)
- AudioBufferProcessor capturing both user and assistant audio correctly
- Minimal React frontend working (start/end conversation, audio playback)
- Code passes `ruff format` and `ruff check` with no errors
- .env.example created with GEMINI_API_KEY
- README.md updated (concise, required sections only)
- All dependencies installed and pyproject.toml updated
- Code pushed to the feature/production-setup branch

**Part B - Prisha's Completion Criteria** (sequential, after Ishita):

- Scenario generation tool code migrated to the new production structure
- Feedback generation tool code migrated to the new production structure (reference Issue #288 for implementation details)
- src/data/users.json created with 3 diverse user profiles (a hypothetical shape is sketched at the end of this ticket)
- **Note: Tool integration with the AI assistant will be handled separately - just migrate the tool code**

**Resources/References**

**MANDATORY Prerequisites:**

- python-fastapi-boilerplate/BEST_PRACTICES.md - code standards and patterns
- Audio Capture POC ZIP - reference implementation

**Documentation:**

- Pipecat Documentation: https://docs.pipecat.ai/
- Pipecat SmallWebRTC Transport: https://docs.pipecat.ai/server/services/transport/small-webrtc
- Pipecat React Client: https://docs.pipecat.ai/client/react/introduction
- FastAPI Documentation: https://fastapi.tiangolo.com/
- Google Gemini Python SDK: https://github.com/googleapis/python-genai

**Related Issues:**

- Issue #288 (Feedback Generation Tool): reference for feedback generation tool implementation details

**Note:** Prisha begins Part B after Ishita completes Part A and notifies the team.
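Finally, a hypothetical shape for src/data/users.json (Part B). Only `user_id` values like "user_001" appear elsewhere in this ticket; the names, fields, and profile details below are illustrative assumptions for Prisha to replace with real static data.

```json
[
  {
    "user_id": "user_001",
    "name": "Asha (placeholder)",
    "role": "software engineer",
    "experience_level": "beginner",
    "communication_goals": ["reduce filler words", "structure answers clearly"]
  },
  {
    "user_id": "user_002",
    "name": "Rahul (placeholder)",
    "role": "sales associate",
    "experience_level": "intermediate",
    "communication_goals": ["handle objections", "improve pacing"]
  },
  {
    "user_id": "user_003",
    "name": "Meera (placeholder)",
    "role": "team lead",
    "experience_level": "advanced",
    "communication_goals": ["deliver constructive feedback", "concise status updates"]
  }
]
```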