To Do #288: Feedback Generation Tool for AI Communication Coach - AI Communication Coach - Redmine

Added by Harikrishnan Murugan 3 months ago. Updated about 2 months ago.

Status: Closed
Priority: High
Assignee: Prisha S
Target version:
Start date: 11/24/2025
Due date: 11/26/2025
% Done:
Estimated time:
Prioritization:

Description

Task Description

Build a Pipecat-compatible tool function that generates comprehensive communication feedback by analyzing complete audio-to-audio conversations. The tool analyzes practice-session conversations stored as audio messages and returns a detailed performance assessment across four dimensions: fluency, pronunciation, grammar, and vocabulary.

The tool follows the same Pipecat tool function pattern as the Scenario Generation Tool, making it callable by the AI Communication Coach when feedback is needed.

This work will be done in the existing repository (https://github.com/harikrishnan-crayond/ai-coach-gen-ai) and will be migrated to the production setup later once Part A is complete in Issue #287.

Technical Approach

Tool Pattern: Follow Pipecat tool function format (same pattern as Scenario Generation Tool):

Tool name: feedback_generation
Input: session_id (string)
Processing: Retrieve audio messages + scenario details from stored sessions, build conversation array with inline base64 audio, send to Gemini with thinking_budget
Output: Comprehensive feedback JSON matching Speech Analysis POC structure
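
The input, processing, and output steps listed above could be wired together roughly as follows. This is a minimal sketch in plain Python that deliberately avoids Pipecat's registration API; the schema dict layout, the `SESSIONS` store, and the `analyze` callback are hypothetical placeholders, not taken from the Scenario Generation Tool.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical function description in the common JSON-schema style;
# the project's actual Pipecat registration may look different.
FEEDBACK_TOOL_SCHEMA = {
    "name": "feedback_generation",
    "description": "Generate communication feedback for a practice session.",
    "parameters": {
        "type": "object",
        "properties": {"session_id": {"type": "string"}},
        "required": ["session_id"],
    },
}

# Hypothetical in-memory session store keyed by session_id.
SESSIONS: dict[str, dict[str, Any]] = {}


async def feedback_generation(
    session_id: str,
    analyze: Callable[[list, dict], Awaitable[dict]],
) -> dict:
    """Look up a stored session and delegate the analysis to `analyze`."""
    session = SESSIONS.get(session_id)
    if session is None:
        # Return an error payload instead of raising, so the caller
        # always receives a JSON-serializable result.
        return {"error": f"unknown session_id: {session_id}"}
    return await analyze(session["audio_messages"], session["scenario_details"])
```

Passing the analysis step in as a callback keeps the lookup logic testable without a Gemini call; in the real tool that callback would be the model request.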

Model & Configuration:

Use google-genai package with gemini-2.5-flash-native-audio-preview-09-2025 model
Include thinking_budget parameter in balanced mode (reference Speech Analysis POC approach)
Support inline base64 audio format: {"inline_data": {"mime_type": "audio/pcm", "data": audio_base64}}
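
For illustration, the inline audio format above can be produced from raw bytes with the standard library alone. The helper name `audio_part` is hypothetical, and the sketch assumes the stored audio is available as raw PCM bytes.

```python
import base64


def audio_part(pcm_bytes: bytes, mime_type: str = "audio/pcm") -> dict:
    """Wrap raw audio bytes in the inline_data shape described above."""
    audio_base64 = base64.b64encode(pcm_bytes).decode("ascii")
    return {"inline_data": {"mime_type": mime_type, "data": audio_base64}}


# Four placeholder bytes stand in for a real audio buffer.
part = audio_part(b"\x00\x01\x02\x03")
```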

Message Construction:

messages = [
  {"role": "system", "content": "Analyze communication performance across fluency, pronunciation, grammar, and vocabulary. Return feedback as JSON."},
  {"role": "user", "content": [{"inline_data": {"mime_type": "audio/pcm", "data": user_audio_base64}}]},
  {"role": "assistant", "content": [{"inline_data": {"mime_type": "audio/pcm", "data": bot_audio_base64}}]},
  # ... alternating user/assistant audio messages
  {"role": "user", "content": "Above is the audio-to-audio conversation that took place for the given scenario. Now generate the feedback as JSON."}
]
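
The listing above can be generated from stored turns with a small builder. This is a sketch under assumptions: the per-turn dict shape (`role`, `audio_base64`) and the function name `build_messages` are hypothetical, and the final request text is paraphrased from the listing.

```python
SYSTEM_PROMPT = (
    "Analyze communication performance across fluency, pronunciation, "
    "grammar, and vocabulary. Return feedback as JSON."
)
FINAL_REQUEST = (
    "Above is the audio-to-audio conversation that took place for the "
    "given scenario. Now generate the feedback as JSON."
)


def build_messages(turns: list[dict]) -> list[dict]:
    """Build the message array from ordered turns.

    Each turn is assumed to look like
    {"role": "user" | "assistant", "audio_base64": "<base64 string>"}.
    """
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for turn in turns:
        if turn["role"] not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {turn['role']!r}")
        messages.append({
            "role": turn["role"],
            "content": [{
                "inline_data": {
                    "mime_type": "audio/pcm",
                    "data": turn["audio_base64"],
                }
            }],
        })
    # The closing text message asks the model for the JSON feedback.
    messages.append({"role": "user", "content": FINAL_REQUEST})
    return messages
```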

Static Test Data Setup:

Store test sessions with session IDs and corresponding audio messages
Use Audio Capture POC to generate test conversation audio by updating system prompt for minimal conversations
Create at least 3 diverse conversation samples for testing
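
One possible way to keep static test sessions is a JSON file per session ID. The sketch below assumes that layout; the function names, file naming, and the placeholder session contents are all hypothetical and not part of the Audio Capture POC.

```python
import json
import tempfile
from pathlib import Path


def save_session(root: Path, session_id: str,
                 audio_messages: list, scenario_details: dict) -> Path:
    """Write one test session to <root>/<session_id>.json."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{session_id}.json"
    payload = {
        "session_id": session_id,
        "audio_messages": audio_messages,
        "scenario_details": scenario_details,
    }
    path.write_text(json.dumps(payload), encoding="utf-8")
    return path


def load_session(root: Path, session_id: str) -> dict:
    """Read a stored test session; raise KeyError if it does not exist."""
    path = root / f"{session_id}.json"
    if not path.exists():
        raise KeyError(session_id)
    return json.loads(path.read_text(encoding="utf-8"))


# Usage with placeholder data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    save_session(
        root,
        "test-session-1",
        [{"role": "user", "audio_base64": "AAECAw=="}],
        {"title": "placeholder scenario"},
    )
    loaded = load_session(root, "test-session-1")
```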

Acceptance Criteria

Tool Function Implementation:
- Create feedback_generation Pipecat-compatible tool function
- Accepts session_id as input parameter
- Retrieves audio_messages from stored session data
- Retrieves scenario_details from stored session data

Gemini API Integration:
- Builds conversation array with inline base64 audio format
- Appends final user message requesting JSON feedback
- Calls google-genai package with gemini-2.5-flash-native-audio-preview-09-2025 model
- Includes thinking_budget parameter in balanced mode
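
The Gemini API's content format uses the roles `user` and `model` and takes the system instruction separately, so a message array written with `system`/`assistant` roles likely needs a conversion step before the call. The sketch below shows only that conversion in plain Python; the function name `to_gemini_contents` is hypothetical, and how the result and the thinking_budget value are handed to google-genai is left to the Speech Analysis POC configuration.

```python
def to_gemini_contents(messages: list[dict]) -> tuple[str, list[dict]]:
    """Split messages into (system_instruction, contents).

    Text content becomes a single {"text": ...} part; list content
    (the inline_data audio parts) is passed through unchanged.
    """
    system_parts: list[str] = []
    contents: list[dict] = []
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "system":
            system_parts.append(content)
            continue
        parts = [{"text": content}] if isinstance(content, str) else list(content)
        contents.append({
            # Gemini names the assistant side "model".
            "role": "model" if role == "assistant" else "user",
            "parts": parts,
        })
    return "\n".join(system_parts), contents
```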

Output JSON Structure (matches Speech Analysis POC format):

{
  "overall_scores": {"fluency": 8.0, "pronunciation": 7.0, "grammar": 7.5, "vocabulary": 8.0},
  "fluency_analysis": {"speech_length": "2 minutes 30 seconds", "speaking_rate": "145 words per minute", "pause_fillers": ["um", "uh"], "long_pauses": 3},
  "pronunciation": {"mispronounced_words": [{"word": "schedule", "accuracy_score": 65}]},
  "grammar_analysis": {"errors": [{"error": "I doesn't", "correction": "I don't"}]},
  "vocabulary_analysis": {"suggestions": [{"word": "good", "better_alternative": "excellent"}]}
}
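
A lightweight check of the structure above might look like the following. It is a sketch only: it verifies that the five top-level sections exist and that the four overall scores are numeric, and it assumes nothing about the score scale, which this issue does not state. The function name `validate_feedback` is hypothetical.

```python
import json

REQUIRED_SECTIONS = (
    "overall_scores",
    "fluency_analysis",
    "pronunciation",
    "grammar_analysis",
    "vocabulary_analysis",
)
SCORE_KEYS = ("fluency", "pronunciation", "grammar", "vocabulary")


def validate_feedback(raw: str) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = [
        f"missing section: {name}"
        for name in REQUIRED_SECTIONS
        if name not in data
    ]
    scores = data.get("overall_scores")
    if not isinstance(scores, dict):
        scores = {}
    for key in SCORE_KEYS:
        if not isinstance(scores.get(key), (int, float)):
            problems.append(f"overall_scores.{key} is missing or not a number")
    return problems
```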

Testing (following Speech Analysis POC testing pattern):
- Test with 3 conversations generated using the Audio Capture POC
- Update system prompt in POC to generate minimal conversations for test audio
- Store test sessions with session IDs and corresponding audio messages
- Verify JSON output structure matches Speech Analysis POC fields and format exactly

Success Indicators:

Tool function successfully processes session_id and returns valid feedback JSON
All 5 sections present in output (overall_scores, fluency_analysis, pronunciation, grammar_analysis, vocabulary_analysis)
Pipecat-compatible tool ready for integration
Testing complete with 3 diverse conversation samples

Resources/References

Issue #274 (Scenario Generation Tool): Reference for Pipecat tool function pattern and structure
Issue #156 (Speech Analysis POC): Reference for exact feedback JSON structure, thinking_budget configuration, and testing approach
Audio Capture POC ZIP: Use to generate test conversation audio messages
Google Generative AI Python Package: https://github.com/googleapis/python-genai (inline audio format documentation)
Repository: https://github.com/harikrishnan-crayond/ai-coach-gen-ai (can work in main or separate branch)