To Do #287
Updated by Harikrishnan Murugan about 1 month ago
**Task Description**

Set up a production-ready FastAPI project structure using the team's boilerplate foundation, integrating Pipecat + Gemini Live for real-time voice AI capabilities, and implementing a minimal React frontend for testing. This task establishes the technical foundation for the AI Communication Coach project with proper code organization, audio capture functionality, and event-driven tool response patterns.

**This task has TWO sequential parts:**

- **Part A (Ishita)**: Production setup, backend APIs, minimal frontend
- **Part B (Prisha)**: Tool migration (scenario generation + feedback generation), static user data preparation

**IMPORTANT: NO tool integration is required in this task. Keep the AI assistant basic and generic. Tool integration will be handled separately by Prisha.**

**Technical Approach**

**Branching Strategy:**

1. Clone the boilerplate repository (https://github.com/harikrishnan-crayond/fastapi-boilerplate) into a new local folder
2. Copy relevant code from:
   - Audio Capture POC (provided as a ZIP) - reference for the Pipecat + Gemini integration approach
   - Existing ai-coach-gen-ai repository - any required existing code
3. Organize all code according to the boilerplate structure and BEST_PRACTICES.md standards
4. Remove the boilerplate's .git directory
5. Initialize with ai-coach-gen-ai as origin: `git remote add origin https://github.com/harikrishnan-crayond/ai-coach-gen-ai`
6. Create and push to a new branch: `feature/production-setup`

**Code Organization (Domain-Based - Netflix Dispatch Pattern):**

```
src/
├── sessions/          # Session management domain
│   ├── router.py      # API endpoints for session lifecycle
│   ├── schemas.py     # Pydantic request/response models
│   └── service.py     # Business logic
├── data/              # Static data
│   └── users.json     # User profiles
├── health/
│   └── router.py      # Health check endpoint
├── config.py          # Configuration management
├── exceptions.py
└── main.py
```

**Backend Implementation:**

1. **Generic AI Assistant** (NO tools yet; see the pipeline sketch after this section):
   - Pipecat + Gemini Live (model: gemini-2.5-flash-native-audio-preview-09-2025)
   - AudioBufferProcessor for turn-based audio capture (on_user_turn_audio_data, on_bot_turn_audio_data)
   - System prompt: "You are a helpful AI assistant. Keep responses concise and natural."
   - SmallWebRTC transport for peer-to-peer audio
   - **Do NOT register any tool functions - basic assistant only**

2. **6 Mandatory API Endpoints** (async with Pydantic schemas; see the router sketch after this section):
   - POST /sessions/start - Body: {user_id, level, topic?} → Returns: {session_id, webrtc_offer}
   - POST /sessions/{session_id}/stop - Stops AudioBufferProcessor recording
   - GET /sessions/{session_id}/audio-messages - Returns the captured audio array
   - POST /sessions/{session_id}/webrtc-answer - WebRTC signaling
   - POST /sessions/{session_id}/ice-candidate - ICE candidate exchange
   - GET /health - Health check

3. **Storage Structure:**

```python
active_sessions[session_id] = {
    "user_id": "user_001",
    "audio_messages": [
        {"role": "user", "audio": "base64...", "timestamp": 1234.56, "duration_ms": 3000},
        {"role": "assistant", "audio": "base64...", "timestamp": 1238.12, "duration_ms": 2500},
    ],
    ...
}
```
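As a rough illustration of item 1 above, here is a minimal sketch of the generic-assistant pipeline. This is not a definitive implementation: Pipecat import paths, constructor parameters (e.g. `enable_turn_audio`, `system_instruction`), and event-handler signatures vary between releases and should be verified against the Pipecat documentation linked under Resources. The `run_bot` signature and the `session` dict wiring are assumptions for illustration.

```python
# Hedged sketch only -- verify import paths and signatures against
# https://docs.pipecat.ai/ for the pinned Pipecat version.
import os

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor
from pipecat.services.gemini_multimodal_live.gemini import GeminiMultimodalLiveLLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport

SYSTEM_PROMPT = "You are a helpful AI assistant. Keep responses concise and natural."


async def run_bot(webrtc_connection, session: dict) -> None:
    """Run one voice session. `webrtc_connection` is the SmallWebRTC
    connection negotiated via the signaling endpoints; `session` is the
    active_sessions entry described in the Storage Structure above."""
    transport = SmallWebRTCTransport(
        webrtc_connection=webrtc_connection,
        params=TransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )

    llm = GeminiMultimodalLiveLLMService(
        api_key=os.environ["GEMINI_API_KEY"],
        model="gemini-2.5-flash-native-audio-preview-09-2025",
        system_instruction=SYSTEM_PROMPT,
        # NOTE: no tool functions registered -- basic assistant only, per this task.
    )

    # Turn-based capture of both sides of the conversation
    # (enable_turn_audio is an assumed constructor flag -- check the docs).
    audiobuffer = AudioBufferProcessor(enable_turn_audio=True)

    @audiobuffer.event_handler("on_user_turn_audio_data")
    async def on_user_turn(buffer, audio: bytes, sample_rate: int, num_channels: int):
        # Real implementation: base64-encode and record timestamp/duration_ms.
        session["audio_messages"].append({"role": "user", "audio": audio})

    @audiobuffer.event_handler("on_bot_turn_audio_data")
    async def on_bot_turn(buffer, audio: bytes, sample_rate: int, num_channels: int):
        session["audio_messages"].append({"role": "assistant", "audio": audio})

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        await audiobuffer.start_recording()

    pipeline = Pipeline([transport.input(), llm, transport.output(), audiobuffer])
    await PipelineRunner().run(PipelineTask(pipeline))
```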
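And for item 2, a sketch of the sessions router showing the required pattern: async endpoints, Pydantic schemas with Field validation, and explicit response models, per the Code Quality Requirements below. Only two of the six endpoints are shown; field names and validation rules beyond those in the endpoint spec above are assumptions.

```python
# Hedged sketch of src/sessions/router.py -- two of the six endpoints.
from uuid import uuid4

from fastapi import APIRouter
from pydantic import BaseModel, Field

router = APIRouter(prefix="/sessions", tags=["sessions"])


class StartSessionRequest(BaseModel):
    user_id: str = Field(..., min_length=1)
    level: str = Field(..., description="Difficulty level (assumed field semantics)")
    topic: str | None = Field(default=None, description="Optional conversation topic")


class StartSessionResponse(BaseModel):
    session_id: str
    webrtc_offer: dict


@router.post("/start", response_model=StartSessionResponse)
async def start_session(body: StartSessionRequest) -> StartSessionResponse:
    session_id = str(uuid4())
    # Real implementation: register the session in active_sessions, spin up
    # the Pipecat pipeline, and return the SDP offer from the transport.
    return StartSessionResponse(session_id=session_id, webrtc_offer={})


@router.post("/{session_id}/stop")
async def stop_session(session_id: str) -> dict:
    # Real implementation: stop the AudioBufferProcessor recording and
    # tear down the pipeline for this session.
    return {"status": "stopped", "session_id": session_id}
```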
**Frontend Implementation:**

Minimal React UI in the frontend/ directory for testing backend functionality:

- Start conversation button (initiates the WebRTC connection)
- End conversation button (stops the session)
- Audio message list with playback controls
- Based on the Audio Capture POC frontend structure

**Code Quality Requirements (MANDATORY):**

- Read and follow BEST_PRACTICES.md in the boilerplate root
- Run `ruff format` for code formatting
- Run `ruff check --fix` for linting
- Type hints on all functions
- Async/await for all I/O operations
- Pydantic models with Field validation
- Response models specified for all endpoints

**Acceptance Criteria**

**Part A - Ishita's Completion Criteria:**

- Production-ready boilerplate structure established with domain-based organization
- All 6 API endpoints functional and properly documented
- Generic AI assistant working (can hold a voice conversation via the frontend)
- WebRTC connection functional (SmallWebRTC transport configured)
- AudioBufferProcessor capturing both user and assistant audio correctly
- Minimal React frontend working (start/end conversation, audio playback)
- Code passes `ruff format` and `ruff check` with no errors
- .env.example created with GEMINI_API_KEY
- README.md updated (concise, required sections only)
- All dependencies installed and pyproject.toml updated
- Code pushed to the feature/production-setup branch

**Part B - Prisha's Completion Criteria** (sequential, after Ishita):

- Scenario generation tool code migrated to the new production structure
- Feedback generation tool code migrated to the new production structure (reference Issue #288 for implementation details)
- src/data/users.json created with 3 diverse user profiles (a hypothetical shape is sketched at the end of this ticket)
- **Note: Tool integration with the AI assistant will be handled separately - just migrate the tool code**

**Resources/References**

**MANDATORY Prerequisites:**

- python-fastapi-boilerplate/BEST_PRACTICES.md - code standards and patterns
- Audio Capture POC ZIP - reference implementation

**Documentation:**

- Pipecat Documentation: https://docs.pipecat.ai/
- Pipecat SmallWebRTC Transport: https://docs.pipecat.ai/server/services/transport/small-webrtc
- Pipecat React Client: https://docs.pipecat.ai/client/react/introduction
- FastAPI Documentation: https://fastapi.tiangolo.com/
- Google Gemini Python SDK: https://github.com/googleapis/python-genai

**Related Issues:**

- Issue #288 (Feedback Generation Tool): reference for feedback generation tool implementation details

**Note:** Prisha begins Part B after Ishita completes Part A and notifies the team.
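Finally, a hypothetical shape for src/data/users.json (Part B). Only `user_id` values like "user_001" appear elsewhere in this ticket; the names, fields, and profile details below are illustrative assumptions for Prisha to replace with real static data.

```json
[
  {
    "user_id": "user_001",
    "name": "Asha (placeholder)",
    "role": "software engineer",
    "experience_level": "beginner",
    "communication_goals": ["reduce filler words", "structure answers clearly"]
  },
  {
    "user_id": "user_002",
    "name": "Rahul (placeholder)",
    "role": "sales associate",
    "experience_level": "intermediate",
    "communication_goals": ["handle objections", "improve pacing"]
  },
  {
    "user_id": "user_003",
    "name": "Meera (placeholder)",
    "role": "team lead",
    "experience_level": "advanced",
    "communication_goals": ["deliver constructive feedback", "concise status updates"]
  }
]
```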