To Do #287
Updated by Harikrishnan Murugan about 1 month ago
**Task Description**
Set up a production-ready FastAPI project structure using the team's boilerplate foundation, integrate Pipecat + Gemini Live for real-time voice AI capabilities, and implement a minimal React frontend for testing. This task establishes the technical foundation for the AI Communication Coach project, with proper code organization, audio capture functionality, and event-driven tool response patterns.
**This task has TWO sequential parts:**
- **Part A (Ishita)**: Production setup, backend APIs, minimal frontend
- **Part B (Prisha)**: Tool migration (scenario generation + feedback generation), static user data preparation
**Technical Approach**
**Branching Strategy:**
1. Clone the boilerplate repository (https://github.com/harikrishnan-crayond/fastapi-boilerplate) into a new local folder
2. Copy relevant code from:
- Audio Capture POC (provided as ZIP) - reference for Pipecat + Gemini integration approach
- The existing ai-coach-gen-ai repository - any code that needs to carry over
3. Organize all code according to boilerplate structure and BEST_PRACTICES.md standards
4. Remove the boilerplate's .git directory
5. Re-initialize the repository (git init) and add ai-coach-gen-ai as origin: git remote add origin https://github.com/harikrishnan-crayond/ai-coach-gen-ai
6. Create and push to new branch: feature/production-setup
**Code Organization (Domain-Based - Netflix Dispatch Pattern):**
```
src/
├── sessions/          # Session management domain
│   ├── router.py      # API endpoints for session lifecycle
│   ├── schemas.py     # Pydantic request/response models
│   └── service.py     # Business logic
├── data/              # Static data
│   └── users.json     # User profiles
├── health/
│   └── router.py      # Health check endpoint
├── config.py          # Configuration management
├── exceptions.py
└── main.py
```
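Since config.py owns configuration management, a minimal sketch of what it might contain is shown below, assuming pydantic-settings is the configuration mechanism (an assumption; confirm against BEST_PRACTICES.md). Only GEMINI_API_KEY is required by this task; the `environment` field is purely illustrative:
```python
# src/config.py - illustrative sketch; assumes pydantic-settings is available
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8")

    # Maps to GEMINI_API_KEY in .env (name matching is case-insensitive);
    # instantiation fails fast if the key is missing.
    gemini_api_key: str
    # Hypothetical extra field, shown only to illustrate defaults.
    environment: str = "development"


settings = Settings()
```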
**Backend Implementation:**
1. **Generic AI Assistant** (NO tools yet):
- Pipecat + Gemini Live (model: gemini-2.5-flash-native-audio-preview-09-2025)
- AudioBufferProcessor for turn-based audio capture via on_user_turn_audio_data and on_bot_turn_audio_data (see the sketch after this item)
- System prompt: "You are a helpful AI assistant. Keep responses concise and natural."
- SmallWebRTC transport for peer-to-peer audio
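The event names above come straight from this ticket; the sketch below shows one way the handlers could feed the session store defined under Storage Structure. The module path, the `enable_turn_audio` flag, and the exact handler signatures are assumptions based on recent Pipecat releases and should be verified against the Pipecat docs and the Audio Capture POC:
```python
# Illustrative wiring of turn-based audio capture into the in-memory store.
import base64
import time

from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor


def register_audio_capture(
    audiobuffer: AudioBufferProcessor, session_id: str, active_sessions: dict
) -> None:
    """Append each completed user/assistant turn to the session's message list."""

    @audiobuffer.event_handler("on_user_turn_audio_data")
    async def on_user_turn(buffer, audio: bytes, sample_rate: int, num_channels: int):
        _append(active_sessions, session_id, "user", audio, sample_rate, num_channels)

    @audiobuffer.event_handler("on_bot_turn_audio_data")
    async def on_bot_turn(buffer, audio: bytes, sample_rate: int, num_channels: int):
        _append(active_sessions, session_id, "assistant", audio, sample_rate, num_channels)


def _append(
    sessions: dict, session_id: str, role: str,
    audio: bytes, sample_rate: int, num_channels: int,
) -> None:
    # Duration assumes 16-bit PCM: bytes / (rate * channels * 2 bytes) -> seconds.
    duration_ms = len(audio) * 1000 // (sample_rate * num_channels * 2)
    sessions[session_id]["audio_messages"].append(
        {
            "role": role,
            "audio": base64.b64encode(audio).decode("ascii"),
            "timestamp": time.time(),
            "duration_ms": duration_ms,
        }
    )
```
The processor would be constructed with turn events enabled (e.g. `AudioBufferProcessor(enable_turn_audio=True)` in recent versions) and placed downstream of the transport output so it hears both sides; confirm the exact placement against the POC.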
2. **6 Mandatory API Endpoints** (async with Pydantic schemas; see the sketch after this list):
- POST /sessions/start - Body: {user_id, level, topic?} → Returns: {session_id, webrtc_offer}
- POST /sessions/{session_id}/stop - Stops AudioBufferProcessor recording
- GET /sessions/{session_id}/audio-messages - Returns captured audio array
- POST /sessions/{session_id}/webrtc-answer - WebRTC signaling
- POST /sessions/{session_id}/ice-candidate - ICE candidate exchange
- GET /health - Health check
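As a shape reference for the schemas.py/router.py split, here is a hedged sketch of the first endpoint. The schema names and the `webrtc_offer: dict` type are illustrative, not prescribed; the real offer payload comes from the SmallWebRTC transport:
```python
# src/sessions/schemas.py - illustrative models with Field validation
from pydantic import BaseModel, Field


class SessionStartRequest(BaseModel):
    user_id: str = Field(..., min_length=1)
    level: str = Field(..., description="User proficiency level")
    topic: str | None = Field(default=None, description="Optional conversation topic")


class SessionStartResponse(BaseModel):
    session_id: str
    webrtc_offer: dict  # placeholder type; actual shape comes from SmallWebRTC


# src/sessions/router.py - async endpoint with an explicit response model
from uuid import uuid4

from fastapi import APIRouter

router = APIRouter(prefix="/sessions", tags=["sessions"])


@router.post("/start", response_model=SessionStartResponse)
async def start_session(body: SessionStartRequest) -> SessionStartResponse:
    # The real implementation would create the Pipecat pipeline and return
    # the transport's WebRTC offer; an empty offer stands in here.
    return SessionStartResponse(session_id=str(uuid4()), webrtc_offer={})
```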
3. **Storage Structure:**
```python
active_sessions[session_id] = {
"user_id": "user_001",
"audio_messages": [
{"role": "user", "audio": "base64...", "timestamp": 1234.56, "duration_ms": 3000},
{"role": "assistant", "audio": "base64...", "timestamp": 1238.12, "duration_ms": 2500}
],
...
}
```
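Reading that structure back out via GET /sessions/{session_id}/audio-messages could look like the following sketch; the AudioMessage model is a hypothetical mirror of the dict entries above:
```python
# Illustrative read path for captured audio; mirrors the storage shape above.
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

active_sessions: dict[str, dict] = {}  # in the real code this lives in service.py


class AudioMessage(BaseModel):
    role: str
    audio: str  # base64-encoded audio payload
    timestamp: float
    duration_ms: int


router = APIRouter(prefix="/sessions", tags=["sessions"])


@router.get("/{session_id}/audio-messages", response_model=list[AudioMessage])
async def get_audio_messages(session_id: str) -> list[AudioMessage]:
    session = active_sessions.get(session_id)
    if session is None:
        raise HTTPException(status_code=404, detail="Unknown session")
    return [AudioMessage(**message) for message in session["audio_messages"]]
```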
**Frontend Implementation:**
Minimal React UI in frontend/ directory for testing backend functionality:
- Start conversation button (initiates WebRTC connection)
- End conversation button (stops session)
- Audio message list with playback controls
- Based on Audio Capture POC frontend structure
**Code Quality Requirements (MANDATORY):**
- Read and follow BEST_PRACTICES.md in boilerplate root
- Run ruff format for code formatting
- Run ruff check --fix for linting
- Type hints on all functions
- Async/await for all I/O operations
- Pydantic models with Field validation
- Response models specified for all endpoints
**Acceptance Criteria**
**Part A - Ishita's Completion Criteria:**
- Production-ready boilerplate structure established with domain-based organization
- All 6 API endpoints functional and properly documented
- Generic AI assistant working (can have voice conversation via frontend)
- WebRTC connection functional (SmallWebRTC transport configured)
- AudioBufferProcessor capturing both user and assistant audio correctly
- Minimal React frontend working (start/end conversation, audio playback)
- Code passes ruff format and ruff check with no errors
- .env.example created with GEMINI_API_KEY
- README.md updated (concise, required sections only)
- All dependencies installed and pyproject.toml updated
- Code pushed to feature/production-setup branch
**Part B - Prisha's Completion Criteria** (Sequential after Ishita):
- Scenario generation tool code migrated to new production structure (NOT integrated yet - code only)
- Feedback generation tool code migrated to new production structure (reference Issue #288 for implementation details)
- src/data/users.json created with 3 diverse user profiles (a hypothetical shape is sketched below)
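The profile schema is not specified in this ticket, so the shape below is purely hypothetical: it reuses only user_id (seen in the storage example) and level/topic (from the /sessions/start body), and every other field is a placeholder for Prisha to define:
```json
[
  {
    "user_id": "user_001",
    "name": "Placeholder Name",
    "level": "beginner",
    "preferred_topics": ["job interviews"]
  }
]
```
Two more profiles with varied levels and topics would complete the set of 3.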
**Resources/References**
**MANDATORY Prerequisites:**
- python-fastapi-boilerplate/BEST_PRACTICES.md - Code standards and patterns
- Audio Capture POC ZIP - Reference implementation
**Documentation:**
- Pipecat Documentation: https://docs.pipecat.ai/
- Pipecat SmallWebRTC Transport: https://docs.pipecat.ai/server/services/transport/small-webrtc
- Pipecat React Client: https://docs.pipecat.ai/client/react/introduction
- FastAPI Documentation: https://fastapi.tiangolo.com/
- Google Gemini Python SDK: https://github.com/googleapis/python-genai
**Related Issues:**
- Issue #288 (Feedback Generation Tool): Reference for feedback generation tool implementation details
**Note:** Prisha begins Part B after Ishita completes Part A and notifies team.