To Do #288
Feedback Generation Tool for AI Communication Coach
Status: open (0%)
Description
Task Description
Build a Pipecat-compatible tool function that generates comprehensive communication feedback by analyzing complete audio-to-audio conversations. The tool analyzes practice-session conversations stored as audio messages and returns a detailed performance assessment across four dimensions: fluency, pronunciation, grammar, and vocabulary.
The tool follows the same Pipecat tool function pattern as the Scenario Generation Tool, making it callable by the AI Communication Coach when feedback is needed.
This work will be done in the existing repository (https://github.com/harikrishnan-crayond/ai-coach-gen-ai) and will be migrated to the production setup once Part A is complete in Issue #287.
Technical Approach
Tool Pattern: Follow Pipecat tool function format (same pattern as Scenario Generation Tool):
- Tool name: feedback_generation
- Input: session_id (string)
- Processing: Retrieve audio messages and scenario details from stored sessions, build a conversation array with inline base64 audio, and send it to Gemini with thinking_budget
- Output: Comprehensive feedback JSON matching Speech Analysis POC structure
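The tool pattern above can be sketched as a plain async function plus a JSON-schema-style description. This is a minimal sketch under stated assumptions, not the Scenario Generation Tool's actual code: FEEDBACK_TOOL_SCHEMA, load_session, and analyze_conversation are hypothetical names, and the Pipecat registration step is omitted because its exact form should follow Issue #274.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical schema describing the tool to the LLM (JSON-schema style).
FEEDBACK_TOOL_SCHEMA = {
    "name": "feedback_generation",
    "description": "Generate communication feedback for a completed practice session.",
    "parameters": {
        "type": "object",
        "properties": {"session_id": {"type": "string"}},
        "required": ["session_id"],
    },
}

# Hypothetical dependencies, injected so the tool can be tested without Gemini.
Loader = Callable[[str], dict]
Analyzer = Callable[[list, dict], Awaitable[dict]]


async def feedback_generation(
    session_id: str,
    load_session: Loader,
    analyze_conversation: Analyzer,
) -> dict[str, Any]:
    """Retrieve the stored session, then delegate analysis to the model call."""
    session = load_session(session_id)
    return await analyze_conversation(
        session["audio_messages"], session["scenario_details"]
    )
```

Injecting the loader and analyzer keeps session storage and the Gemini call swappable, which may help when the tool is later migrated to the production setup.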
Model & Configuration:
- Use google-genai package with gemini-2.5-flash-native-audio-preview-09-2025 model
- Include thinking_budget parameter in balanced mode (reference Speech Analysis POC approach)
- Support inline base64 audio format: {"inline_data": {"mime_type": "audio/pcm", "data": audio_base64}}
Message Construction:
messages = [
{"role": "system", "content": "Analyze communication performance across fluency, pronunciation, grammar, and vocabulary. Return feedback as JSON."},
{"role": "user", "content": [{"inline_data": {"mime_type": "audio/pcm", "data": user_audio_base64}}]},
{"role": "assistant", "content": [{"inline_data": {"mime_type": "audio/pcm", "data": bot_audio_base64}}]},
# ... alternating user/assistant audio messages
{"role": "user", "content": "Above are the audio-audio conversation that happened for the given scenario. Now generate the feedback as JSON."}
]
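The message list above can be assembled programmatically. The following is a minimal sketch, assuming each stored turn is available as a (role, audio_base64) pair in conversation order; build_messages and the two prompt constants are hypothetical names introduced for illustration.

```python
SYSTEM_PROMPT = (
    "Analyze communication performance across fluency, pronunciation, "
    "grammar, and vocabulary. Return feedback as JSON."
)
FINAL_PROMPT = (
    "Above is the audio-to-audio conversation that took place for the given "
    "scenario. Now generate the feedback as JSON."
)


def build_messages(turns: list[tuple[str, str]]) -> list[dict]:
    """Build the conversation array from (role, audio_base64) pairs."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for role, audio_base64 in turns:
        if role not in ("user", "assistant"):
            raise ValueError(f"Unexpected role: {role!r}")
        messages.append({
            "role": role,
            "content": [
                {"inline_data": {"mime_type": "audio/pcm", "data": audio_base64}}
            ],
        })
    # The closing text message asks the model for the JSON feedback.
    messages.append({"role": "user", "content": FINAL_PROMPT})
    return messages
```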
Static Test Data Setup:
- Store test sessions with session IDs and corresponding audio messages
- Use Audio Capture POC to generate test conversation audio by updating system prompt for minimal conversations
- Create at least 3 diverse conversation samples for testing
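One possible shape for the static test data is an in-memory dictionary keyed by session ID. This is a sketch only: the session ID, field layout, and scenario text below are placeholders invented for illustration, and the empty audio strings stand in for base64 audio captured with the Audio Capture POC.

```python
# Placeholder test sessions; real entries would hold captured base64 audio.
TEST_SESSIONS = {
    "test-session-001": {
        "scenario_details": {"title": "placeholder scenario"},
        "audio_messages": [
            {"role": "user", "audio_base64": ""},
            {"role": "assistant", "audio_base64": ""},
        ],
    },
}


def load_session(session_id: str) -> dict:
    """Return the stored session, or raise KeyError with a readable message."""
    try:
        return TEST_SESSIONS[session_id]
    except KeyError:
        raise KeyError(f"Unknown session_id: {session_id}") from None
```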
Acceptance Criteria
- Tool Function Implementation:
  - Create feedback_generation Pipecat-compatible tool function
  - Accepts session_id as input parameter
  - Retrieves audio_messages from stored session data
  - Retrieves scenario_details from stored session data
- Gemini API Integration:
  - Builds conversation array with inline base64 audio format
  - Appends final user message requesting JSON feedback
  - Calls google-genai package with gemini-2.5-flash-native-audio-preview-09-2025 model
  - Includes thinking_budget parameter in balanced mode
- Output JSON Structure (matches Speech Analysis POC format):
  {
    "overall_scores": {"fluency": 8.0, "pronunciation": 7.0, "grammar": 7.5, "vocabulary": 8.0},
    "fluency_analysis": {"speech_length": "2 minutes 30 seconds", "speaking_rate": "145 words per minute", "pause_fillers": ["um", "uh"], "long_pauses": 3},
    "pronunciation": {"mispronounced_words": [{"word": "schedule", "accuracy_score": 65}]},
    "grammar_analysis": {"errors": [{"error": "I doesn't", "correction": "I don't"}]},
    "vocabulary_analysis": {"suggestions": [{"word": "good", "better_alternative": "excellent"}]}
  }
- Testing (following Speech Analysis POC testing pattern):
  - Test with maximum 3 conversations using Audio Capture POC
  - Update system prompt in POC to generate minimal conversations for test audio
  - Store test sessions with session IDs and corresponding audio messages
  - Verify JSON output structure matches Speech Analysis POC fields and format exactly
Success Indicators:
- Tool function successfully processes session_id and returns valid feedback JSON
- All 5 sections present in output (overall_scores, fluency_analysis, pronunciation, grammar_analysis, vocabulary_analysis)
- Pipecat-compatible tool ready for integration
- Testing complete with 3 diverse conversation samples
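The structural checks in the success indicators could be automated along these lines. This is a minimal sketch: validate_feedback is a hypothetical helper, and it only checks that the five sections and four score dimensions are present, since the ticket does not state score ranges.

```python
import json

REQUIRED_SECTIONS = (
    "overall_scores",
    "fluency_analysis",
    "pronunciation",
    "grammar_analysis",
    "vocabulary_analysis",
)
SCORE_DIMENSIONS = ("fluency", "pronunciation", "grammar", "vocabulary")


def validate_feedback(raw_json: str) -> dict:
    """Parse the model output and check that all required sections exist."""
    feedback = json.loads(raw_json)
    if not isinstance(feedback, dict):
        raise ValueError("Feedback must be a JSON object")
    missing = [s for s in REQUIRED_SECTIONS if s not in feedback]
    if missing:
        raise ValueError(f"Missing sections: {missing}")
    scores = feedback["overall_scores"]
    if not isinstance(scores, dict):
        raise ValueError("overall_scores must be a JSON object")
    for dimension in SCORE_DIMENSIONS:
        if not isinstance(scores.get(dimension), (int, float)):
            raise ValueError(f"Missing or non-numeric score: {dimension}")
    return feedback
```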
Resources/References
- Issue #274 (Scenario Generation Tool): Reference for Pipecat tool function pattern and structure
- Issue #156 (Speech Analysis POC): Reference for exact feedback JSON structure, thinking_budget configuration, and testing approach
- Audio Capture POC ZIP: Use to generate test conversation audio messages
- Google Generative AI Python Package: https://github.com/googleapis/python-genai (inline audio format documentation)
- Repository: https://github.com/harikrishnan-crayond/ai-coach-gen-ai (can work in main or separate branch)
Updated by Harikrishnan Murugan about 1 month ago
- Tracker changed from Bug to To Do