To Do #155
Pipecat Pipeline Integration & Gemini Service
Status: Closed · 100% done
Description
Build Pipecat pipeline with GeminiMultimodalLiveLLMService for real-time audio communication. Implement SmallWebRTC transport for peer-to-peer audio streaming and create a weather assistant demo to verify the complete setup works correctly. This establishes the technical foundation that can later be adapted to the AI Communication Coach by changing the system prompt and tools configuration.
IMPORTANT NOTE: Use SmallWebRTCTransport specifically for this project. Do NOT use DailyTransport, FastAPIWebSocketTransport, or any other transport implementations.
Model to Use: gemini-2.5-flash-native-audio-preview-09-2025
Technical Approach
- Configure GeminiMultimodalLiveLLMService with model: gemini-2.5-flash-native-audio-preview-09-2025
- Implement SmallWebRTCTransport (NOT DailyTransport or FastAPIWebSocketTransport)
- Build Pipecat pipeline (audio input → processing → Gemini → audio output)
- Set up session management (start/stop/reconnect)
- Add error handling and connection recovery
- Create weather assistant demo with function calling
- Implement weather API tool for real-time tool calls
- Create minimal React frontend using Pipecat React SDK
- Implement weather UI display component for tool results
- Test complete pipeline with voice commands
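The service and transport configuration described above might be sketched as follows. This is a sketch, not the task's actual code: the import paths and constructor parameters follow the Pipecat docs linked under Resources and should be verified against the installed Pipecat version.

```python
# Sketch of the service and transport setup (assumed Pipecat import paths
# and constructor parameters -- verify against your installed version).
import os

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.services.gemini_multimodal_live.gemini import (
    GeminiMultimodalLiveLLMService,
)
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport

# Gemini service with the native-audio model required by this task.
llm = GeminiMultimodalLiveLLMService(
    api_key=os.environ["GEMINI_API_KEY"],  # env var name is our convention
    model="gemini-2.5-flash-native-audio-preview-09-2025",
    system_instruction="You are Aria, a friendly weather assistant.",
)

# SmallWebRTCTransport only -- no DailyTransport / FastAPIWebSocketTransport.
def make_transport(webrtc_connection):
    return SmallWebRTCTransport(
        webrtc_connection=webrtc_connection,  # per-client connection object
        params=TransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),  # Silero voice activity detection
        ),
    )
```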
Acceptance Criteria
- Pipecat pipeline runs successfully
- GeminiMultimodalLiveLLMService configured with gemini-2.5-flash-native-audio-preview-09-2025 model
- SmallWebRTCTransport implemented for audio streaming (not other transports)
- Audio input/output streaming works via WebRTC
- Error handling covers disconnections
- Session lifecycle managed properly
- Weather assistant responds to voice queries
- Tool calling works (get_weather function)
- Frontend displays weather UI when tool is called
- Complete setup can be adapted by changing system prompt and tools config
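The get_weather criterion can be sketched with a pure formatting helper plus a fetch wrapper. The response shape is OpenWeatherMap's standard current-weather JSON; the commented registration snippet follows the Pipecat function-calling guide, and its handler signature varies by Pipecat version, so treat it as an assumption.

```python
# Sketch of the weather tool. format_weather() is pure and easy to test; the
# fetch and registration parts depend on OpenWeatherMap and Pipecat.
import json
import urllib.parse
import urllib.request

OWM_URL = "https://api.openweathermap.org/data/2.5/weather"

def format_weather(payload: dict) -> str:
    """Turn an OpenWeatherMap current-weather response into a sentence."""
    city = payload["name"]
    temp = payload["main"]["temp"]
    description = payload["weather"][0]["description"]
    return f"Currently in {city}, it's {temp:.0f} degrees Celsius with {description}."

def fetch_weather(city: str, api_key: str) -> str:
    """Call the current-weather endpoint (metric units) and format the result."""
    query = urllib.parse.urlencode({"q": city, "units": "metric", "appid": api_key})
    with urllib.request.urlopen(f"{OWM_URL}?{query}") as resp:
        return format_weather(json.load(resp))

# Registering the tool with the LLM service might look like this
# (handler signature varies by Pipecat version -- check the docs):
#
#   async def get_weather(params):
#       report = fetch_weather(params.arguments["city"], OWM_API_KEY)
#       await params.result_callback({"report": report})
#
#   llm.register_function("get_weather", get_weather)
```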
Resources/References
- Pipecat Gemini Multimodal Live Guide: https://docs.pipecat.ai/guides/features/gemini-multimodal-live
- Pipecat SmallWebRTC Example: https://github.com/pipecat-ai/pipecat-examples/tree/main/p2p-webrtc
- SmallWebRTCTransport Documentation: https://docs.pipecat.ai/server/services/transport/small-webrtc#smallwebrtctransport
- Pipecat React Client SDK: https://docs.pipecat.ai/client/react/introduction
- Pipecat Pipeline Documentation: https://docs.pipecat.ai/server/introduction
Updated by Ishita Singh Faujdar 2 months ago
The system now provides:
Voice Interaction:
- Real-time voice conversation with AI assistant
- Automatic greeting when you connect
- Natural language understanding and response
- Weather information for any city worldwide
Technical Features:
- Gemini 2.5 Flash with native audio I/O (no separate STT/TTS)
- WebRTC peer-to-peer audio streaming (low latency)
- Professional voice activity detection (Silero VAD)
- Function calling (AI can fetch real weather data from APIs)
- Multi-user session support
- Production-grade logging
User Experience:
- Real-time audio level visualization
- Clear conversation state indicators
- Automatic speech detection (knows when you're done speaking)
- Silence detection with helpful prompts
Documentation
Complete documentation is available in docs/:
- Task completion report
- Installation guide
- Architecture documentation
- All fixes documented with code examples
- Guide for switching to Communication Coach
Note: The zip file is attached below. Switch to the weather-app branch.
To run
- Backend:
  python -m venv venv
  venv\Scripts\activate  # Windows
  pip install -r requirements.txt
  uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
- Configure: create a .env file with your Gemini API key and your OpenWeatherMap API key
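The configure step might look like this; the variable names are illustrative and must match whatever names the backend code actually reads.

```
# .env (variable names are illustrative -- match what the backend reads)
GEMINI_API_KEY=your-gemini-api-key
OPENWEATHERMAP_API_KEY=your-openweathermap-api-key
```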
- Frontend:
  cd frontend
  npm install
  npm run dev
Expected Experience
- Click "Start Voice Session" button
- Allow microphone access when prompted
- Hear AI greet you: "Hello! I'm Aria, your weather assistant. I can help you check the weather in any city around the world..."
- Speak: "What's the weather in London?" or "How's the weather in Delhi right now?"
- See your microphone level at the bottom (green bar = good audio level)
- Wait ~2-3 seconds after speaking (AI detects you're done)
- Hear AI respond with current weather: "Currently in London, it's 15 degrees Celsius with partly cloudy skies..."
- Continue conversation naturally - ask about other cities!
Visual Feedback:
- Screen shows "🔊 AI is responding..." when AI speaks
- "✅ Listening - Speak now" when waiting for you
- "🎤 Recording your voice..." when you speak
- Audio level bar turns green when you're loud enough
Pipeline Flow
User Speech → Microphone → WebRTC → SmallWebRTC Transport
↓
Voice Activity Detection (Silero VAD)
↓
Context Aggregator (tracks user messages)
↓
Gemini Multimodal Live LLM
- Understands speech
- Calls functions if needed (get_weather)
- Generates natural response
- Converts to speech
↓
Transport Output → WebRTC → Speaker
↓
Context Aggregator (stores AI responses)
↓
User Hears AI Response 🔊
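The flow above maps onto a Pipecat Pipeline roughly as sketched below. Processor names and import paths are taken from the Pipecat pipeline docs linked under Resources and should be checked against the installed version; `transport`, `llm`, and `context_aggregator` are assumed to be built as described earlier in this task.

```python
# Sketch of the Pipeline mirroring the flow above (assumed Pipecat APIs --
# verify import paths and signatures against the linked docs).
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask

async def run_bot(transport, llm, context_aggregator):
    pipeline = Pipeline([
        transport.input(),              # audio in from WebRTC (Silero VAD applied here)
        context_aggregator.user(),      # tracks user messages
        llm,                            # Gemini Multimodal Live: speech in, speech out
        transport.output(),             # audio out over WebRTC to the speaker
        context_aggregator.assistant(), # stores AI responses
    ])
    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
    await PipelineRunner().run(task)
```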
Updated by Harikrishnan Murugan 2 months ago
- Status changed from Pending to Closed
- % Done changed from 0 to 100