To Do #287
Production Setup with Boilerplate & Pipecat Integration
Description
Set up a production-ready FastAPI project structure on the team's boilerplate foundation, integrate Pipecat + Gemini Live for real-time voice AI, and implement a minimal React frontend for testing. This task establishes the technical foundation for the AI Communication Coach project: proper code organization, audio capture functionality, and event-driven tool response patterns.
This task has TWO sequential parts:
- Part A (Ishita): Production setup, backend APIs, minimal frontend
- Part B (Prisha): Tool migration (scenario generation + feedback generation), static user data preparation
IMPORTANT: NO tool integration required in this task. Keep the AI assistant basic and generic. Tool integration will be handled in a separate issue.
Technical Approach
Branching Strategy:
- Clone the boilerplate repository (https://github.com/harikrishnan-crayond/fastapi-boilerplate) into a new local folder
- Copy relevant code from:
  - the Audio Capture POC (provided as a ZIP) - the reference for the Pipecat + Gemini integration approach
  - the existing ai-coach-gen-ai repository - any required existing code
- Organize all code according to the boilerplate structure and BEST_PRACTICES.md standards
- Remove the boilerplate's .git directory
- Re-initialize git with ai-coach-gen-ai as origin: git init, then git remote add origin https://github.com/harikrishnan-crayond/ai-coach-gen-ai
- Create and push to a new branch: feature/production-setup
Code Organization (Domain-Based - Netflix Dispatch Pattern):
src/
├── sessions/ # Session management domain
│ ├── router.py # API endpoints for session lifecycle
│ ├── schemas.py # Pydantic request/response models
│ └── service.py # Business logic
├── data/ # Static data
│ └── users.json # User profiles
├── health/
│ └── router.py # Health check endpoint
├── config.py # Configuration management
├── exceptions.py
└── main.py
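
To make config.py concrete, here is a minimal sketch grounded in the tree above and the GEMINI_API_KEY requirement in the acceptance criteria. It assumes the boilerplate uses pydantic-settings; verify the actual pattern against BEST_PRACTICES.md.

```python
# src/config.py - minimal sketch assuming a pydantic-settings based config;
# the boilerplate's real convention may differ (check BEST_PRACTICES.md).
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    gemini_api_key: str  # maps to GEMINI_API_KEY in .env / .env.example


settings = Settings()
```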
Backend Implementation:
- Generic AI Assistant (NO tools) - see the pipeline sketch after this section:
- Pipecat + Gemini Live (model: gemini-2.5-flash-native-audio-preview-09-2025)
- AudioBufferProcessor for turn-based audio capture (on_user_turn_audio_data, on_bot_turn_audio_data)
- System prompt: "You are a helpful AI assistant. Keep responses concise and natural."
- SmallWebRTC transport for peer-to-peer audio
- Do NOT register any tool functions - basic assistant only
- 6 Mandatory API Endpoints (async, with Pydantic schemas; a /sessions/start sketch follows the Code Quality Requirements):
- POST /sessions/start - Body: {user_id, level, topic?} → Returns: {session_id, webrtc_offer}
- POST /sessions/{session_id}/stop - Stops AudioBufferProcessor recording
- GET /sessions/{session_id}/audio-messages - Returns captured audio array
- POST /sessions/{session_id}/webrtc-answer - WebRTC signaling
- POST /sessions/{session_id}/ice-candidate - ICE candidate exchange
- GET /health - Health check
- Storage Structure:
active_sessions[session_id] = {
    "user_id": "user_001",
    "audio_messages": [
        {"role": "user", "audio": "base64...", "timestamp": 1234.56, "duration_ms": 3000},
        {"role": "assistant", "audio": "base64...", "timestamp": 1238.12, "duration_ms": 2500}
    ],
    ...
}
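
The sketch below ties the backend bullets together: SmallWebRTC transport, Gemini Live with no tools, and AudioBufferProcessor feeding the storage structure above. It is a hedged sketch only - import paths, constructor arguments, and event-handler signatures vary across Pipecat releases, so verify everything against the Audio Capture POC and https://docs.pipecat.ai/ (the run_bot name, the enable_turn_audio flag, and the handler signatures are assumptions).

```python
# Hedged sketch of the bot pipeline wiring; verify against the Pipecat docs
# and the Audio Capture POC before relying on any of it.
import base64
import time

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor
from pipecat.services.gemini_multimodal_live.gemini import GeminiMultimodalLiveLLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport

SYSTEM_PROMPT = "You are a helpful AI assistant. Keep responses concise and natural."


async def run_bot(webrtc_connection, session: dict, gemini_api_key: str) -> None:
    """Run one generic-assistant session; `session` is its active_sessions entry."""
    transport = SmallWebRTCTransport(
        webrtc_connection=webrtc_connection,
        params=TransportParams(audio_in_enabled=True, audio_out_enabled=True),
    )
    # NO tools registered - basic, generic assistant only.
    llm = GeminiMultimodalLiveLLMService(
        api_key=gemini_api_key,
        model="gemini-2.5-flash-native-audio-preview-09-2025",
        system_instruction=SYSTEM_PROMPT,
    )
    # enable_turn_audio is assumed to gate the per-turn events used below.
    audiobuffer = AudioBufferProcessor(enable_turn_audio=True)

    def _record(role: str, audio: bytes, sample_rate: int, num_channels: int) -> None:
        session["audio_messages"].append({
            "role": role,
            "audio": base64.b64encode(audio).decode(),
            "timestamp": time.time(),
            # duration assumes 16-bit PCM (2 bytes per sample per channel)
            "duration_ms": len(audio) * 1000 // (sample_rate * num_channels * 2),
        })

    @audiobuffer.event_handler("on_user_turn_audio_data")
    async def on_user_turn(buffer, audio, sample_rate, num_channels):
        _record("user", audio, sample_rate, num_channels)

    @audiobuffer.event_handler("on_bot_turn_audio_data")
    async def on_bot_turn(buffer, audio, sample_rate, num_channels):
        _record("assistant", audio, sample_rate, num_channels)

    pipeline = Pipeline([
        transport.input(),   # user audio in over WebRTC
        llm,                 # Gemini Live speech-to-speech, no tools
        transport.output(),  # assistant audio back to the browser
        audiobuffer,         # placed after output so it sees both directions
    ])
    task = PipelineTask(pipeline)
    await audiobuffer.start_recording()
    await PipelineRunner().run(task)
```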
Frontend Implementation:
Minimal React UI in frontend/ directory for testing backend functionality:
- Start conversation button (initiates WebRTC connection)
- End conversation button (stops session)
- Audio message list with playback controls
- Based on Audio Capture POC frontend structure
Code Quality Requirements (MANDATORY):
- Read and follow BEST_PRACTICES.md in boilerplate root
- Run ruff format for code formatting
- Run ruff check --fix for linting
- Type hints on all functions
- Async/await for all I/O operations
- Pydantic models with Field validation
- Response models specified for all endpoints
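
To make the endpoint and quality requirements concrete, here is a hedged sketch of the /sessions/start contract. Only user_id, level, topic?, session_id, and webrtc_offer come from the spec; the Field constraints, the webrtc_offer shape, and the placeholder body are illustrative assumptions.

```python
# src/sessions/router.py + schemas.py excerpt - illustrative sketch only.
import uuid

from fastapi import APIRouter
from pydantic import BaseModel, Field

router = APIRouter(prefix="/sessions", tags=["sessions"])

# In-memory store matching the Storage Structure above (not production-durable).
active_sessions: dict[str, dict] = {}


class SessionStartRequest(BaseModel):
    user_id: str = Field(..., min_length=1)
    level: str = Field(..., min_length=1, description="User proficiency level")
    topic: str | None = Field(default=None, description="Optional conversation topic")


class SessionStartResponse(BaseModel):
    session_id: str
    webrtc_offer: dict  # SDP payload; exact shape depends on the SmallWebRTC setup


@router.post("/start", response_model=SessionStartResponse)
async def start_session(body: SessionStartRequest) -> SessionStartResponse:
    session_id = str(uuid.uuid4())
    active_sessions[session_id] = {"user_id": body.user_id, "audio_messages": []}
    # Real implementation: create the SmallWebRTC connection, start the bot
    # pipeline in the background, and return the actual SDP offer here.
    return SessionStartResponse(session_id=session_id, webrtc_offer={})
```

The remaining endpoints (stop, audio-messages, webrtc-answer, ice-candidate) follow the same pattern: a Pydantic request/response pair in schemas.py, business logic in service.py, and a response_model on every route.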
Acceptance Criteria
Part A - Ishita's Completion Criteria:
- Production-ready boilerplate structure established with domain-based organization
- All 6 API endpoints functional and properly documented
- Generic AI assistant working (can have voice conversation via frontend)
- WebRTC connection functional (SmallWebRTC transport configured)
- AudioBufferProcessor capturing both user and assistant audio correctly
- Minimal React frontend working (start/end conversation, audio playback)
- Code passes ruff format and ruff check with no errors
- .env.example created with GEMINI_API_KEY
- README.md updated (concise, required sections only)
- All dependencies installed and pyproject.toml updated
- Code pushed to feature/production-setup branch
Part B - Prisha's Completion Criteria (Sequential after Ishita):
- Scenario generation tool code migrated to new production structure
- Feedback generation tool code migrated to new production structure (reference Issue #288 for implementation details)
- src/data/users.json created with 3 diverse user profiles
- Note: Just migrate tool code - NO integration with AI assistant required
Resources/References
MANDATORY Prerequisites:
- python-fastapi-boilerplate/BEST_PRACTICES.md - Code standards and patterns
- Audio Capture POC ZIP - Reference implementation
Documentation:
- Pipecat Documentation: https://docs.pipecat.ai/
- Pipecat SmallWebRTC Transport: https://docs.pipecat.ai/server/services/transport/small-webrtc
- Pipecat React Client: https://docs.pipecat.ai/client/react/introduction
- FastAPI Documentation: https://fastapi.tiangolo.com/
- Google Gemini Python SDK: https://github.com/googleapis/python-genai
Related Issues:
- Issue #288 (Feedback Generation Tool): Reference for feedback generation tool implementation details
Note: Prisha begins Part B after Ishita completes Part A and notifies the team.