To Do #287
Production Setup with Boilerplate & Pipecat Integration
Description
Set up a production-ready FastAPI project structure on the team's boilerplate foundation, integrate Pipecat + Gemini Live for real-time voice AI, and implement a minimal React frontend for testing. This task establishes the technical foundation for the AI Communication Coach project: proper code organization, audio capture functionality, and event-driven tool response patterns.
This task has TWO sequential parts:
- Part A (Ishita): Production setup, backend APIs, minimal frontend
- Part B (Prisha): Tool migration (scenario generation + feedback generation), static user data preparation
IMPORTANT: NO tool integration required in this task. Keep the AI assistant basic and generic. Tool integration will be handled in a separate issue.
Technical Approach
Branching Strategy:
- Clone the boilerplate repository (https://github.com/harikrishnan-crayond/fastapi-boilerplate) into a new local folder
- Copy relevant code from:
  - the Audio Capture POC (provided as a ZIP) - the reference for the Pipecat + Gemini integration approach
  - the existing ai-coach-gen-ai repository - any required existing code
- Organize all code according to the boilerplate structure and BEST_PRACTICES.md standards
- Remove the boilerplate's .git directory
- Re-initialize git with ai-coach-gen-ai as origin: git init, then git remote add origin https://github.com/harikrishnan-crayond/ai-coach-gen-ai
- Create and push to a new branch: feature/production-setup
Code Organization (Domain-Based - Netflix Dispatch Pattern):
src/
├── sessions/ # Session management domain
│ ├── router.py # API endpoints for session lifecycle
│ ├── schemas.py # Pydantic request/response models
│ └── service.py # Business logic
├── data/ # Static data
│ └── users.json # User profiles
├── health/
│ └── router.py # Health check endpoint
├── config.py # Configuration management
├── exceptions.py
└── main.py
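
To make config.py concrete, here is a minimal sketch grounded in the tree above and the GEMINI_API_KEY requirement in the acceptance criteria. It assumes the boilerplate uses pydantic-settings; verify the actual pattern against BEST_PRACTICES.md.

```python
# src/config.py - minimal sketch assuming a pydantic-settings based config;
# the boilerplate's real convention may differ (check BEST_PRACTICES.md).
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    gemini_api_key: str  # maps to GEMINI_API_KEY in .env / .env.example


settings = Settings()
```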
Backend Implementation:
- Generic AI Assistant (NO tools) - see the pipeline sketch after this section:
- Pipecat + Gemini Live (model: gemini-2.5-flash-native-audio-preview-09-2025)
- AudioBufferProcessor for turn-based audio capture (on_user_turn_audio_data, on_bot_turn_audio_data)
- System prompt: "You are a helpful AI assistant. Keep responses concise and natural."
- SmallWebRTC transport for peer-to-peer audio
- Do NOT register any tool functions - basic assistant only
- 6 Mandatory API Endpoints (async, with Pydantic schemas; a /sessions/start sketch follows the Code Quality Requirements):
- POST /sessions/start - Body: {user_id, level, topic?} → Returns: {session_id, webrtc_offer}
- POST /sessions/{session_id}/stop - Stops AudioBufferProcessor recording
- GET /sessions/{session_id}/audio-messages - Returns captured audio array
- POST /sessions/{session_id}/webrtc-answer - WebRTC signaling
- POST /sessions/{session_id}/ice-candidate - ICE candidate exchange
- GET /health - Health check
- Storage Structure:
active_sessions[session_id] = {
    "user_id": "user_001",
    "audio_messages": [
        {"role": "user", "audio": "base64...", "timestamp": 1234.56, "duration_ms": 3000},
        {"role": "assistant", "audio": "base64...", "timestamp": 1238.12, "duration_ms": 2500}
    ],
    ...
}
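
The sketch below ties the backend bullets together: SmallWebRTC transport, Gemini Live with no tools, and AudioBufferProcessor feeding the storage structure above. It is a hedged sketch only - import paths, constructor arguments, and event-handler signatures vary across Pipecat releases, so verify everything against the Audio Capture POC and https://docs.pipecat.ai/ (the run_bot name, the enable_turn_audio flag, and the handler signatures are assumptions).

```python
# Hedged sketch of the bot pipeline wiring; verify against the Pipecat docs
# and the Audio Capture POC before relying on any of it.
import base64
import time

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor
from pipecat.services.gemini_multimodal_live.gemini import GeminiMultimodalLiveLLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport

SYSTEM_PROMPT = "You are a helpful AI assistant. Keep responses concise and natural."


async def run_bot(webrtc_connection, session: dict, gemini_api_key: str) -> None:
    """Run one generic-assistant session; `session` is its active_sessions entry."""
    transport = SmallWebRTCTransport(
        webrtc_connection=webrtc_connection,
        params=TransportParams(audio_in_enabled=True, audio_out_enabled=True),
    )
    # NO tools registered - basic, generic assistant only.
    llm = GeminiMultimodalLiveLLMService(
        api_key=gemini_api_key,
        model="gemini-2.5-flash-native-audio-preview-09-2025",
        system_instruction=SYSTEM_PROMPT,
    )
    # enable_turn_audio is assumed to gate the per-turn events used below.
    audiobuffer = AudioBufferProcessor(enable_turn_audio=True)

    def _record(role: str, audio: bytes, sample_rate: int, num_channels: int) -> None:
        session["audio_messages"].append({
            "role": role,
            "audio": base64.b64encode(audio).decode(),
            "timestamp": time.time(),
            # duration assumes 16-bit PCM (2 bytes per sample per channel)
            "duration_ms": len(audio) * 1000 // (sample_rate * num_channels * 2),
        })

    @audiobuffer.event_handler("on_user_turn_audio_data")
    async def on_user_turn(buffer, audio, sample_rate, num_channels):
        _record("user", audio, sample_rate, num_channels)

    @audiobuffer.event_handler("on_bot_turn_audio_data")
    async def on_bot_turn(buffer, audio, sample_rate, num_channels):
        _record("assistant", audio, sample_rate, num_channels)

    pipeline = Pipeline([
        transport.input(),   # user audio in over WebRTC
        llm,                 # Gemini Live speech-to-speech, no tools
        transport.output(),  # assistant audio back to the browser
        audiobuffer,         # placed after output so it sees both directions
    ])
    task = PipelineTask(pipeline)
    await audiobuffer.start_recording()
    await PipelineRunner().run(task)
```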
Frontend Implementation:
Minimal React UI in frontend/ directory for testing backend functionality:
- Start conversation button (initiates WebRTC connection)
- End conversation button (stops session)
- Audio message list with playback controls
- Based on Audio Capture POC frontend structure
Code Quality Requirements (MANDATORY):
- Read and follow BEST_PRACTICES.md in boilerplate root
- Run ruff format for code formatting
- Run ruff check --fix for linting
- Type hints on all functions
- Async/await for all I/O operations
- Pydantic models with Field validation
- Response models specified for all endpoints
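
To make the endpoint and quality requirements concrete, here is a hedged sketch of the /sessions/start contract. Only user_id, level, topic?, session_id, and webrtc_offer come from the spec; the Field constraints, the webrtc_offer shape, and the placeholder body are illustrative assumptions.

```python
# src/sessions/router.py + schemas.py excerpt - illustrative sketch only.
import uuid

from fastapi import APIRouter
from pydantic import BaseModel, Field

router = APIRouter(prefix="/sessions", tags=["sessions"])

# In-memory store matching the Storage Structure above (not production-durable).
active_sessions: dict[str, dict] = {}


class SessionStartRequest(BaseModel):
    user_id: str = Field(..., min_length=1)
    level: str = Field(..., min_length=1, description="User proficiency level")
    topic: str | None = Field(default=None, description="Optional conversation topic")


class SessionStartResponse(BaseModel):
    session_id: str
    webrtc_offer: dict  # SDP payload; exact shape depends on the SmallWebRTC setup


@router.post("/start", response_model=SessionStartResponse)
async def start_session(body: SessionStartRequest) -> SessionStartResponse:
    session_id = str(uuid.uuid4())
    active_sessions[session_id] = {"user_id": body.user_id, "audio_messages": []}
    # Real implementation: create the SmallWebRTC connection, start the bot
    # pipeline in the background, and return the actual SDP offer here.
    return SessionStartResponse(session_id=session_id, webrtc_offer={})
```

The remaining endpoints (stop, audio-messages, webrtc-answer, ice-candidate) follow the same pattern: a Pydantic request/response pair in schemas.py, business logic in service.py, and a response_model on every route.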
Acceptance Criteria
Part A - Ishita's Completion Criteria:
- Production-ready boilerplate structure established with domain-based organization
- All 6 API endpoints functional and properly documented
- Generic AI assistant working (can have voice conversation via frontend)
- WebRTC connection functional (SmallWebRTC transport configured)
- AudioBufferProcessor capturing both user and assistant audio correctly
- Minimal React frontend working (start/end conversation, audio playback)
- Code passes ruff format and ruff check with no errors
- .env.example created with GEMINI_API_KEY
- README.md updated (concise, required sections only)
- All dependencies installed and pyproject.toml updated
- Code pushed to feature/production-setup branch
Part B - Prisha's Completion Criteria (Sequential after Ishita):
- Scenario generation tool code migrated to new production structure
- Feedback generation tool code migrated to new production structure (reference Issue #288 for implementation details)
- src/data/users.json created with 3 diverse user profiles
- Note: Just migrate tool code - NO integration with AI assistant required
Resources/References
MANDATORY Prerequisites:
- python-fastapi-boilerplate/BEST_PRACTICES.md - Code standards and patterns
- Audio Capture POC ZIP - Reference implementation
Documentation:
- Pipecat Documentation: https://docs.pipecat.ai/
- Pipecat SmallWebRTC Transport: https://docs.pipecat.ai/server/services/transport/small-webrtc
- Pipecat React Client: https://docs.pipecat.ai/client/react/introduction
- FastAPI Documentation: https://fastapi.tiangolo.com/
- Google Gemini Python SDK: https://github.com/googleapis/python-genai
Related Issues:
- Issue #288 (Feedback Generation Tool): Reference for feedback generation tool implementation details
Note: Prisha begins Part B after Ishita completes Part A and notifies the team.