To Do #155


Pipecat Pipeline Integration & Gemini Service

Added by Harikrishnan Murugan 3 months ago. Updated 2 months ago.

Status: Closed
Priority: High
Target version: -
Start date: 10/09/2025
Due date: 10/17/2025
% Done: 100%
Estimated time:
Prioritization: P0

Description

Build a Pipecat pipeline with GeminiMultimodalLiveLLMService for real-time audio communication. Implement the SmallWebRTC transport for peer-to-peer audio streaming, and create a weather assistant demo to verify that the complete setup works correctly. This establishes the technical foundation that can later be adapted to the AI Communication Coach by changing only the system prompt and tools configuration.

IMPORTANT NOTE: Use SmallWebRTCTransport specifically for this project. Do NOT use DailyTransport, FastAPIWebSocketTransport, or any other transport implementations.

Model to Use: gemini-2.5-flash-native-audio-preview-09-2025

Technical Approach

  • Configure GeminiMultimodalLiveLLMService with model: gemini-2.5-flash-native-audio-preview-09-2025
  • Implement SmallWebRTCTransport (NOT DailyTransport or FastAPIWebSocketTransport)
  • Build Pipecat pipeline (audio input → processing → Gemini → audio output)
  • Set up session management (start/stop/reconnect)
  • Add error handling and connection recovery
  • Create weather assistant demo with function calling
  • Implement weather API tool for real-time tool calls
  • Create minimal React frontend using Pipecat React SDK
  • Implement weather UI display component for tool results
  • Test complete pipeline with voice commands
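The service and transport setup above can be sketched roughly as follows. This is a configuration sketch, not a drop-in implementation: Pipecat import paths and parameter names vary between releases, the GEMINI_API_KEY environment variable name is an assumption, and the get_weather handler is defined elsewhere.

```python
import os

# Sketch only -- Pipecat import paths change between releases.
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.services.gemini_multimodal_live.gemini import GeminiMultimodalLiveLLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport


def build_transport(webrtc_connection):
    # SmallWebRTCTransport only -- per the task note, no DailyTransport
    # or FastAPIWebSocketTransport.
    return SmallWebRTCTransport(
        webrtc_connection=webrtc_connection,
        params=TransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),  # Silero voice activity detection
        ),
    )


def build_llm(get_weather_handler, tools_schema):
    # Native audio in/out: no separate STT/TTS services in the pipeline.
    llm = GeminiMultimodalLiveLLMService(
        api_key=os.environ["GEMINI_API_KEY"],  # env var name is an assumption
        model="gemini-2.5-flash-native-audio-preview-09-2025",
        system_instruction="You are Aria, a friendly weather assistant.",
        tools=tools_schema,  # function declaration for get_weather
    )
    llm.register_function("get_weather", get_weather_handler)
    return llm
```

Adapting this to the Communication Coach later means swapping only `system_instruction` and the tools configuration, as the task description notes.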

Acceptance Criteria

  • Pipecat pipeline runs successfully
  • GeminiMultimodalLiveLLMService configured with gemini-2.5-flash-native-audio-preview-09-2025 model
  • SmallWebRTCTransport implemented for audio streaming (not other transports)
  • Audio input/output streaming works via WebRTC
  • Error handling covers disconnections
  • Session lifecycle managed properly
  • Weather assistant responds to voice queries
  • Tool calling works (get_weather function)
  • Frontend displays weather UI when tool is called
  • Complete setup can be adapted by changing system prompt and tools config
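A minimal sketch of what the get_weather tool might return to the LLM: a small helper that reduces an OpenWeatherMap-style current-weather payload to the fields the assistant needs for a spoken answer. The helper name and the exact returned fields are illustrative assumptions, not the repo's code.

```python
def summarize_weather(payload: dict) -> dict:
    """Reduce an OpenWeatherMap-style response to speech-friendly fields.

    Illustrative sketch -- field choices are an assumption, not the repo's code.
    """
    return {
        "city": payload["name"],
        "temperature_c": round(payload["main"]["temp"]),
        "conditions": payload["weather"][0]["description"],
    }


# Payload shape follows OpenWeatherMap's current-weather response.
sample = {
    "name": "London",
    "main": {"temp": 15.2},
    "weather": [{"description": "partly cloudy"}],
}
print(summarize_weather(sample))
```

The handler registered for get_weather would fetch the live payload from the API, pass it through a reducer like this, and hand the dict back to Gemini to phrase naturally.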

Resources/References

Update #1

Updated by Harikrishnan Murugan 3 months ago

  • Description updated (diff)
Update #2

Updated by Ishita Singh Faujdar 2 months ago

The system now provides:

Voice Interaction:

  • Real-time voice conversation with AI assistant
  • Automatic greeting when you connect
  • Natural language understanding and response
  • Weather information for any city worldwide

Technical Features:

  • Gemini 2.5 Flash with native audio I/O (no separate STT/TTS)
  • WebRTC peer-to-peer audio streaming (low latency)
  • Professional voice activity detection (Silero VAD)
  • Function calling (AI can fetch real weather data from APIs)
  • Multi-user session support
  • Production-grade logging

User Experience:

  • Real-time audio level visualization
  • Clear conversation state indicators
  • Automatic speech detection (knows when you're done speaking)
  • Silence detection with helpful prompts
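The silence-detection-with-prompts behavior can be approximated with a small asyncio watchdog. This is an illustrative sketch under stated assumptions -- the class, method, and callback names here are invented, not taken from the shipped code:

```python
import asyncio


class SilenceWatchdog:
    """Fires on_silence if no speech arrives within `timeout` seconds.

    Illustrative sketch only -- names are invented, not from the repo.
    """

    def __init__(self, timeout: float, on_silence):
        self.timeout = timeout
        self.on_silence = on_silence
        self._timer = None

    def speech_detected(self):
        """Call on every VAD speech event: restarts the countdown."""
        if self._timer is not None:
            self._timer.cancel()
        self._timer = asyncio.create_task(self._countdown())

    async def _countdown(self):
        try:
            await asyncio.sleep(self.timeout)
            await self.on_silence()  # e.g. bot says "Are you still there?"
        except asyncio.CancelledError:
            pass  # user spoke again before the timeout
```

Each Silero VAD speech event resets the timer; if the user stays quiet past the timeout, the assistant can issue a helpful prompt instead of sitting in silence.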

Documentation

Complete documentation is available in docs/:

  • Task completion report
  • Installation guide
  • Architecture documentation
  • All fixes documented with code examples
  • Guide for switching to the Communication Coach

Note: The Zip file is attached below. Switch to the weather-app branch.
To run

  1. Backend:

python -m venv venv
venv\Scripts\activate  # Windows
pip install -r requirements.txt
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

  2. Configure:

Create a .env file with your Gemini API key and OpenWeatherMap API key (do this before starting uvicorn).

  3. Frontend:

cd frontend
npm install
npm run dev

Open http://localhost:3000
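If you want to see what the .env step amounts to, here is a minimal loader sketch (python-dotenv does this more robustly). The variable names GEMINI_API_KEY and OPENWEATHER_API_KEY are assumptions -- use whatever names app.main actually reads:

```python
import os


def load_env(text: str) -> dict:
    """Parse KEY=VALUE lines from a .env file, skipping blanks and # comments.

    Minimal sketch; the real app likely uses python-dotenv. The key names
    below (GEMINI_API_KEY, OPENWEATHER_API_KEY) are assumptions.
    """
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def apply_env(env: dict) -> None:
    """Export parsed values so the backend can read them via os.environ."""
    os.environ.update(env)
```

Example: `load_env("GEMINI_API_KEY=abc123\n")` yields `{"GEMINI_API_KEY": "abc123"}`, which `apply_env` then exports for the server process.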

Expected Experience

  • Click "Start Voice Session" button
  • Allow microphone access when prompted
  • Hear AI greet you: "Hello! I'm Aria, your weather assistant. I can help you check the weather in any city around the world..."
  • Speak: "What's the weather in London?" or "How's the weather in Delhi right now?"
  • See your microphone level at the bottom (green bar = good audio level)
  • Wait ~2-3 seconds after speaking (AI detects you're done)
  • Hear AI respond with current weather: "Currently in London, it's 15 degrees Celsius with partly cloudy skies..."
  • Continue conversation naturally - ask about other cities!

Visual Feedback:

  • Screen shows "🔊 AI is responding..." when AI speaks
  • "✅ Listening - Speak now" when waiting for you
  • "🎤 Recording your voice..." when you speak
  • Audio level bar turns green when you're loud enough

Pipeline Flow

User Speech → Microphone → WebRTC → SmallWebRTC Transport
    ↓
Voice Activity Detection (Silero VAD)
    ↓
Context Aggregator (tracks user messages)
    ↓
Gemini Multimodal Live LLM
  - Understands speech
  - Calls functions if needed (get_weather)
  - Generates natural response
  - Converts to speech
    ↓
Transport Output → WebRTC → Speaker
    ↓
Context Aggregator (stores AI responses)
    ↓
User Hears AI Response 🔊
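The flow above maps onto a Pipecat Pipeline roughly like this. Treat it as a sketch: import paths vary across Pipecat versions, and `transport` and `llm` are assumed to come from the service/transport setup described in this ticket.

```python
# Sketch only -- Pipecat import paths change between releases.
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext

# `transport` (SmallWebRTC) and `llm` (Gemini Multimodal Live) are assumed
# to be built elsewhere, as outlined in the Technical Approach.
context = OpenAILLMContext(
    [{"role": "user", "content": "Greet the user as Aria, the weather assistant."}]
)
context_aggregator = llm.create_context_aggregator(context)

# Processor order mirrors the diagram above; VAD runs inside the transport.
pipeline = Pipeline([
    transport.input(),               # mic audio in via SmallWebRTC
    context_aggregator.user(),       # tracks user messages
    llm,                             # Gemini: speech in, tool calls, speech out
    transport.output(),              # audio out via WebRTC to the speaker
    context_aggregator.assistant(),  # stores AI responses
])

task = PipelineTask(pipeline)
# await PipelineRunner().run(task)   # inside the session's async entry point
```

Because Gemini handles speech natively, no separate STT or TTS processors appear between the transport and the LLM.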
Update #3

Updated by Harikrishnan Murugan 2 months ago

  • Status changed from Pending to Closed
  • % Done changed from 0 to 100