To Do #288

open

Feedback Generation Tool for AI Communication Coach

Added by Harikrishnan Murugan about 1 month ago. Updated about 1 month ago.

Status:
Pending
Priority:
High
Assignee:
Target version:
-
Start date:
11/24/2025
Due date:
11/26/2025 (36 days late)
% Done:

0%

Estimated time:
Prioritization:
P0

Description

Task Description

Build a Pipecat-compatible tool function that generates comprehensive communication feedback by analyzing complete audio-to-audio conversations. The tool analyzes practice-session conversations stored as audio messages and returns a detailed performance assessment across four dimensions: fluency, pronunciation, grammar, and vocabulary.

The tool follows the same Pipecat tool function pattern as the Scenario Generation Tool, making it callable by the AI Communication Coach when feedback is needed.

This work will be done in the existing repository (https://github.com/harikrishnan-crayond/ai-coach-gen-ai) and migrated to the production setup once Part A, tracked in Issue #287, is complete.

Technical Approach

Tool Pattern: Follow Pipecat tool function format (same pattern as Scenario Generation Tool):

  • Tool name: feedback_generation
  • Input: session_id (string)
  • Processing: Retrieve audio messages + scenario details from stored sessions, build conversation array with inline base64 audio, send to Gemini with thinking_budget
  • Output: Comprehensive feedback JSON matching Speech Analysis POC structure
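A minimal sketch of the tool flow listed above (session_id in, feedback JSON out). It assumes the handler receives a single params object exposing `arguments` and `result_callback`, as Pipecat's function-call handlers do in recent releases; verify this against the Pipecat version in the repository. `SESSIONS`, `analyze_session`, and `run_tool` are hypothetical names, and the Gemini call is replaced by a placeholder so the sketch runs on its own.

```python
import asyncio
from types import SimpleNamespace

# Hypothetical in-memory store; the ticket does not say where sessions live.
SESSIONS = {
    "demo-session": {
        "scenario_details": {"title": "Job interview practice"},
        "audio_messages": [
            {"role": "user", "audio_base64": "AAAA"},
            {"role": "assistant", "audio_base64": "AAAA"},
        ],
    }
}


async def analyze_session(scenario_details, audio_messages):
    """Placeholder for the Gemini call; returns an empty feedback skeleton."""
    return {
        "overall_scores": {},
        "fluency_analysis": {},
        "pronunciation": {},
        "grammar_analysis": {},
        "vocabulary_analysis": {},
    }


async def feedback_generation(params):
    """Tool handler: look up the session, analyze it, report the result."""
    session_id = params.arguments.get("session_id")
    session = SESSIONS.get(session_id)
    if session is None:
        # Report the failure through the callback instead of raising.
        await params.result_callback({"error": f"unknown session_id: {session_id}"})
        return
    feedback = await analyze_session(
        session["scenario_details"], session["audio_messages"]
    )
    await params.result_callback(feedback)


def run_tool(session_id):
    """Drive the handler outside Pipecat with a stand-in params object."""
    results = []

    async def capture(result):
        results.append(result)

    params = SimpleNamespace(
        arguments={"session_id": session_id}, result_callback=capture
    )
    asyncio.run(feedback_generation(params))
    return results[0]
```

Returning an error payload through the callback, rather than raising, is one possible design choice: it lets the coach tell the user that feedback is unavailable for that session.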

Model & Configuration:

  • Use google-genai package with gemini-2.5-flash-native-audio-preview-09-2025 model
  • Include thinking_budget parameter in balanced mode (reference Speech Analysis POC approach)
  • Support inline base64 audio format: {"inline_data": {"mime_type": "audio/pcm", "data": audio_base64}}
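The inline audio format above can be produced with the standard library alone. This is a small sketch; `audio_part` is a hypothetical helper name, and the default MIME type is copied from the ticket without checking whether the model needs extra parameters such as a sample rate.

```python
import base64


def audio_part(pcm_bytes, mime_type="audio/pcm"):
    """Wrap raw audio bytes in the inline_data shape described above."""
    encoded = base64.b64encode(pcm_bytes).decode("ascii")
    return {"inline_data": {"mime_type": mime_type, "data": encoded}}


# Example: four bytes of stand-in audio data.
part = audio_part(b"\x00\x01\x02\x03")
```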

Message Construction:

messages = [
  {"role": "system", "content": "Analyze communication performance across fluency, pronunciation, grammar, and vocabulary. Return feedback as JSON."},
  {"role": "user", "content": [{"inline_data": {"mime_type": "audio/pcm", "data": user_audio_base64}}]},
  {"role": "assistant", "content": [{"inline_data": {"mime_type": "audio/pcm", "data": bot_audio_base64}}]},
  # ... alternating user/assistant audio messages
  {"role": "user", "content": "Above is the audio-to-audio conversation that took place for the given scenario. Now generate the feedback as JSON."}
]
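The array above could be assembled from stored turns roughly as follows. This is a sketch, not the repository's implementation: `build_messages` and the `audio_base64` field name are hypothetical, and because the ticket does not show where the scenario details go, appending them to the system message is an assumption. The role names follow the ticket; they may need mapping to whatever the google-genai client expects.

```python
import json

SYSTEM_PROMPT = (
    "Analyze communication performance across fluency, pronunciation, "
    "grammar, and vocabulary. Return feedback as JSON."
)
FINAL_REQUEST = (
    "Above is the audio-to-audio conversation that took place for the "
    "given scenario. Now generate the feedback as JSON."
)


def build_messages(audio_messages, scenario_details=None):
    """Build the conversation array: system, audio turns, final request."""
    system_content = SYSTEM_PROMPT
    if scenario_details is not None:
        # Assumption: scenario details ride along in the system message.
        system_content += "\n\nScenario details: " + json.dumps(scenario_details)
    messages = [{"role": "system", "content": system_content}]
    for turn in audio_messages:
        if turn["role"] not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {turn['role']}")
        messages.append({
            "role": turn["role"],
            "content": [{
                "inline_data": {
                    "mime_type": "audio/pcm",
                    "data": turn["audio_base64"],
                }
            }],
        })
    # The closing text message asks for the JSON feedback.
    messages.append({"role": "user", "content": FINAL_REQUEST})
    return messages
```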

Static Test Data Setup:

  • Store test sessions with session IDs and corresponding audio messages
  • Use the Audio Capture POC to generate test conversation audio, updating its system prompt so that it produces minimal conversations
  • Create at least 3 diverse conversation samples for testing
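One way to hold the static test data is a JSON file per session. The sketch below assumes that layout; `save_session`, `load_session`, the file naming, and the field names are all hypothetical, since the ticket does not specify a storage format.

```python
import json
import tempfile
from pathlib import Path


def save_session(root, session_id, scenario_details, audio_messages):
    """Write one test session to <root>/<session_id>.json."""
    path = Path(root) / f"{session_id}.json"
    payload = {
        "session_id": session_id,
        "scenario_details": scenario_details,
        "audio_messages": audio_messages,
    }
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


def load_session(root, session_id):
    """Read a stored session; raise KeyError if the ID is unknown."""
    path = Path(root) / f"{session_id}.json"
    if not path.exists():
        raise KeyError(session_id)
    return json.loads(path.read_text(encoding="utf-8"))


# Usage: round-trip one stand-in session through a temporary directory.
with tempfile.TemporaryDirectory() as root:
    save_session(
        root,
        "session-001",
        {"title": "Ordering at a restaurant"},
        [{"role": "user", "audio_base64": "AAAA"}],
    )
    loaded = load_session(root, "session-001")
```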

Acceptance Criteria

  1. Tool Function Implementation:

    • Create feedback_generation Pipecat-compatible tool function
    • Accepts session_id as input parameter
    • Retrieves audio_messages from stored session data
    • Retrieves scenario_details from stored session data
  2. Gemini API Integration:

    • Builds conversation array with inline base64 audio format
    • Appends final user message requesting JSON feedback
    • Calls google-genai package with gemini-2.5-flash-native-audio-preview-09-2025 model
    • Includes thinking_budget parameter in balanced mode
  3. Output JSON Structure (matches Speech Analysis POC format):

    {
      "overall_scores": {"fluency": 8.0, "pronunciation": 7.0, "grammar": 7.5, "vocabulary": 8.0},
      "fluency_analysis": {"speech_length": "2 minutes 30 seconds", "speaking_rate": "145 words per minute", "pause_fillers": ["um", "uh"], "long_pauses": 3},
      "pronunciation": {"mispronounced_words": [{"word": "schedule", "accuracy_score": 65}]},
      "grammar_analysis": {"errors": [{"error": "I doesn't", "correction": "I don't"}]},
      "vocabulary_analysis": {"suggestions": [{"word": "good", "better_alternative": "excellent"}]}
    }
    
  4. Testing (following Speech Analysis POC testing pattern):

    • Test with 3 conversations generated with the Audio Capture POC
    • Update system prompt in POC to generate minimal conversations for test audio
    • Store test sessions with session IDs and corresponding audio messages
    • Verify JSON output structure matches Speech Analysis POC fields and format exactly
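Before the output structure can be verified, the model's text reply has to be parsed into a dictionary. The sketch below assumes the reply is either bare JSON or JSON wrapped in a Markdown code fence, which models sometimes emit; `parse_feedback_json` is a hypothetical helper name.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built here to keep this block readable
FENCED_JSON = re.compile(
    r"^" + FENCE + r"(?:json)?\s*\n(.*?)\n" + FENCE + r"\s*$", re.DOTALL
)


def parse_feedback_json(text):
    """Parse a model reply into a dict, tolerating a surrounding code fence."""
    stripped = text.strip()
    match = FENCED_JSON.match(stripped)
    if match:
        stripped = match.group(1)
    data = json.loads(stripped)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("feedback must be a JSON object")
    return data
```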

Success Indicators:

  • Tool function successfully processes session_id and returns valid feedback JSON
  • All 5 sections present in output (overall_scores, fluency_analysis, pronunciation, grammar_analysis, vocabulary_analysis)
  • Pipecat-compatible tool ready for integration
  • Testing complete with 3 diverse conversation samples
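The "all 5 sections present" indicator can be checked mechanically. The following sketch validates only the top-level sections and the four numeric scores; `validate_feedback` is a hypothetical name, and the check is deliberately looser than the exact field-by-field match the acceptance criteria ask for.

```python
REQUIRED_SECTIONS = (
    "overall_scores",
    "fluency_analysis",
    "pronunciation",
    "grammar_analysis",
    "vocabulary_analysis",
)
SCORE_KEYS = ("fluency", "pronunciation", "grammar", "vocabulary")


def validate_feedback(feedback):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for section in REQUIRED_SECTIONS:
        if not isinstance(feedback.get(section), dict):
            problems.append(f"missing or non-object section: {section}")
    scores = feedback.get("overall_scores")
    if isinstance(scores, dict):
        for key in SCORE_KEYS:
            value = scores.get(key)
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"overall_scores.{key} must be a number")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to report every mismatch for a test conversation in a single run.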

Resources/References

Actions #1

Updated by Harikrishnan Murugan about 1 month ago

  • Tracker changed from Bug to To Do