To Do #287

Production Setup with Boilerplate & Pipecat Integration

Added by Harikrishnan Murugan about 1 month ago. Updated about 1 month ago.

Status: Pending
Priority: High
Target version: -
Start date: 11/24/2025
Due date: 11/26/2025 (36 days late)
% Done: 0%
Estimated time:
Prioritization: P0

Description

Task Description

Set up a production-ready FastAPI project structure using the team's boilerplate foundation, integrate Pipecat + Gemini Live for real-time voice AI capabilities, and implement a minimal React frontend for testing. This task establishes the technical foundation for the AI Communication Coach project with proper code organization, audio capture functionality, and event-driven tool response patterns.

This task has TWO sequential parts:

  • Part A (Ishita): Production setup, backend APIs, minimal frontend
  • Part B (Prisha): Tool migration (scenario generation + feedback generation), static user data preparation

IMPORTANT: NO tool integration required in this task. Keep the AI assistant basic and generic. Tool integration will be handled in a separate issue.

Technical Approach

Branching Strategy:

  1. Clone the boilerplate repository (https://github.com/harikrishnan-crayond/fastapi-boilerplate) into a new local folder
  2. Copy relevant code from:
    • Audio Capture POC (provided as ZIP) - reference for Pipecat + Gemini integration approach
    • Existing ai-coach-gen-ai repository - any required existing code
  3. Organize all code according to boilerplate structure and BEST_PRACTICES.md standards
  4. Remove the boilerplate's .git directory
  5. Re-initialize the repository and set ai-coach-gen-ai as origin: git init && git remote add origin https://github.com/harikrishnan-crayond/ai-coach-gen-ai
  6. Create and push to new branch: feature/production-setup

Code Organization (Domain-Based - Netflix Dispatch Pattern):

src/
├── sessions/           # Session management domain
│   ├── router.py      # API endpoints for session lifecycle
│   ├── schemas.py     # Pydantic request/response models
│   └── service.py     # Business logic
├── data/              # Static data
│   └── users.json     # User profiles
├── health/
│   └── router.py      # Health check endpoint
├── config.py          # Configuration management
├── exceptions.py
└── main.py
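The request/response shapes that sessions/schemas.py would hold can be sketched as below. The real implementation must use Pydantic BaseModel with Field validation per the code quality requirements; stdlib dataclasses are used here only to keep the sketch dependency-free, and the class names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical shapes for sessions/schemas.py. The production code
# would use Pydantic BaseModel + Field validation instead.

@dataclass
class SessionStartRequest:
    user_id: str
    level: str
    topic: Optional[str] = None  # optional per the /sessions/start spec

@dataclass
class SessionStartResponse:
    session_id: str
    webrtc_offer: dict

@dataclass
class AudioMessage:
    role: str          # "user" or "assistant"
    audio: str         # base64-encoded audio payload
    timestamp: float
    duration_ms: int
```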

Backend Implementation:

  1. Generic AI Assistant (NO tools):

    • Pipecat + Gemini Live (model: gemini-2.5-flash-native-audio-preview-09-2025)
    • AudioBufferProcessor for turn-based audio capture (on_user_turn_audio_data, on_bot_turn_audio_data)
    • System prompt: "You are a helpful AI assistant. Keep responses concise and natural."
    • SmallWebRTC transport for peer-to-peer audio
    • Do NOT register any tool functions - basic assistant only
  2. 6 Mandatory API Endpoints (async with Pydantic schemas):

    • POST /sessions/start - Body: {user_id, level, topic?} → Returns: {session_id, webrtc_offer}
    • POST /sessions/{session_id}/stop - Stops AudioBufferProcessor recording
    • GET /sessions/{session_id}/audio-messages - Returns captured audio array
    • POST /sessions/{session_id}/webrtc-answer - WebRTC signaling
    • POST /sessions/{session_id}/ice-candidate - ICE candidate exchange
    • GET /health - Health check
  3. Storage Structure:

active_sessions[session_id] = {
    "user_id": "user_001",
    "audio_messages": [
        {"role": "user", "audio": "base64...", "timestamp": 1234.56, "duration_ms": 3000},
        {"role": "assistant", "audio": "base64...", "timestamp": 1238.12, "duration_ms": 2500}
    ],
    ...
}
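The in-memory layout above could be wrapped in a small service-layer helper along these lines. This is an illustrative sketch only: the SessionStore class and its method names are assumptions, not part of the task spec.

```python
import base64
import time
import uuid

class SessionStore:
    """Minimal in-memory store mirroring the active_sessions layout above.

    Hypothetical helper for illustration; names are not from the spec.
    """

    def __init__(self):
        self.active_sessions = {}

    def start(self, user_id: str) -> str:
        session_id = uuid.uuid4().hex
        self.active_sessions[session_id] = {
            "user_id": user_id,
            "audio_messages": [],
        }
        return session_id

    def add_audio_message(self, session_id: str, role: str,
                          raw_audio: bytes, duration_ms: int) -> None:
        # Audio is stored base64-encoded, matching the layout above.
        self.active_sessions[session_id]["audio_messages"].append({
            "role": role,
            "audio": base64.b64encode(raw_audio).decode("ascii"),
            "timestamp": time.time(),
            "duration_ms": duration_ms,
        })

    def audio_messages(self, session_id: str) -> list:
        return self.active_sessions[session_id]["audio_messages"]
```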

Frontend Implementation:

A minimal React UI in the frontend/ directory for testing backend functionality:

  • Start conversation button (initiates WebRTC connection)
  • End conversation button (stops session)
  • Audio message list with playback controls
  • Based on Audio Capture POC frontend structure

Code Quality Requirements (MANDATORY):

  • Read and follow BEST_PRACTICES.md in boilerplate root
  • Run ruff format for code formatting
  • Run ruff check --fix for linting
  • Type hints on all functions
  • Async/await for all I/O operations
  • Pydantic models with Field validation
  • Response models specified for all endpoints
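As a small illustration of the typing and async rules above (a hypothetical helper, not part of the spec; the awaited sleep stands in for real async I/O such as a database read):

```python
import asyncio

async def fetch_session_summary(session_id: str, messages: list[dict]) -> dict:
    """Illustrates the mandated style: full type hints and async/await.

    Hypothetical helper for illustration only.
    """
    await asyncio.sleep(0)  # placeholder for a real awaitable I/O call
    total_ms = sum(m["duration_ms"] for m in messages)
    return {
        "session_id": session_id,
        "message_count": len(messages),
        "total_duration_ms": total_ms,
    }
```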

Acceptance Criteria

Part A - Ishita's Completion Criteria:

  • Production-ready boilerplate structure established with domain-based organization
  • All 6 API endpoints functional and properly documented
  • Generic AI assistant working (can have voice conversation via frontend)
  • WebRTC connection functional (SmallWebRTC transport configured)
  • AudioBufferProcessor capturing both user and assistant audio correctly
  • Minimal React frontend working (start/end conversation, audio playback)
  • Code passes ruff format and ruff check with no errors
  • .env.example created with GEMINI_API_KEY
  • README.md updated (concise, required sections only)
  • All dependencies installed and pyproject.toml updated
  • Code pushed to feature/production-setup branch
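The .env.example called for above might contain just the one required key (any further settings would depend on config.py, which is not specified here):

```
# .env.example — copy to .env and fill in a real value
GEMINI_API_KEY=your-gemini-api-key-here
```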

Part B - Prisha's Completion Criteria (Sequential after Ishita):

  • Scenario generation tool code migrated to new production structure
  • Feedback generation tool code migrated to new production structure (reference Issue #288 for implementation details)
  • src/data/users.json created with 3 diverse user profiles
  • Note: Just migrate tool code - NO integration with AI assistant required
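A possible shape for src/data/users.json is sketched below. The task only specifies 3 diverse user profiles; the field names and profile details here are illustrative assumptions, not requirements.

```python
import json

# Hypothetical profile schema — the exact fields are an assumption
# for illustration; only "3 diverse user profiles" is specified.
USERS = [
    {"user_id": "user_001", "name": "Asha", "level": "beginner",
     "goals": ["small talk", "interview practice"]},
    {"user_id": "user_002", "name": "Rahul", "level": "intermediate",
     "goals": ["presentations"]},
    {"user_id": "user_003", "name": "Meera", "level": "advanced",
     "goals": ["negotiation", "public speaking"]},
]

def dump_users(path: str) -> None:
    """Write the static profiles to the given JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(USERS, f, indent=2)
```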

Resources/References

MANDATORY Prerequisites:

  • python-fastapi-boilerplate/BEST_PRACTICES.md - Code standards and patterns
  • Audio Capture POC ZIP - Reference implementation

Documentation:

Related Issues:

  • Issue #288 (Feedback Generation Tool): Reference for feedback generation tool implementation details

Note: Prisha begins Part B after Ishita completes Part A and notifies team.
