Feature #289


AI Communication Coach Service

Added by Harikrishnan Murugan about 1 month ago. Updated about 1 month ago.

Status: Pending
Priority: High
Assignee:
Target version: -
Start date: 11/24/2025
Due date: 11/28/2025 (34 days late)
% Done: 0%
Estimated time:
Prioritization: P0

Description

Task Description

Build the AI Communication Coach service that helps users practice English communication skills through realistic conversation scenarios. The coach generates appropriate practice scenarios, conducts natural conversations, tracks duration, and provides comprehensive feedback on fluency, pronunciation, grammar, and vocabulary.

The coach uses two integrated tools (scenario generation and feedback generation) to deliver the complete practice experience. Work is split between system prompt design (Prisha) and technical integration (Ishita), with both collaborating on end-to-end testing.

Repository: https://github.com/harikrishnan-crayond/ai-coach-gen-ai
Branching: Check out from feature/production-setup and create a new branch for this work

Complete User Journey

User starts practice session:

  1. User selects their profile (user_id), proficiency level (beginner/intermediate/advanced), and optional topic of interest
  2. Backend initializes AI Communication Coach with user context
  3. First-time users: Coach welcomes user and informs them about generating a practice scenario
    Returning users: Coach directly informs about generating next scenario (skip welcome)
  4. Loading state shown: "Generating your practice scenario..."
  5. Scenario appears on screen with title, description, and duration (e.g., 8 minutes)

Conversation practice:
6. User and coach engage in natural conversation based on the scenario
7. Coach tracks elapsed time and plans to wrap up gracefully within ±30 seconds of target duration
8. All audio (user and coach) is captured throughout the session

Feedback delivery:
9. When duration completes, coach calls feedback_generation tool
10. Tool immediately stops recording (audiobuffer.stop_recording()) to save compute resources
11. Loading state shown: "Analyzing your performance..."
12. Tool generates comprehensive feedback while user waits
13. User sees detailed analysis displayed as text: overall scores, fluency metrics, mispronounced words, grammar corrections, vocabulary suggestions
14. User can choose to practice the same scenario again or start a new one

Why the session stops during feedback generation: recording stops to free compute resources (the WebRTC connection), since feedback generation takes time and the user reviews the feedback before starting the next session.

System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                     USER STARTS SESSION                          │
│  Frontend: User selects user_id, level, topic (optional)        │
│  POST /sessions/start {user_id, level, topic?}                  │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│              AI COMMUNICATION COACH INITIALIZES                  │
│                                                                   │
│  • Load user profile from users.json (industry, role)            │
│  • Check if first-time user (for welcome message)                │
│  • System Prompt with user context                               │
│  • Register Tools: scenario_generation, feedback_generation      │
│  • AudioBufferProcessor ready for audio capture                  │
│  • WebRTC connection established                                 │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│               SCENARIO GENERATION (Tool Call)                    │
│                                                                   │
│  First-time: Welcome + inform about scenario generation          │
│  Returning: Directly inform about next scenario generation       │
│                                                                   │
│  Tool start event → Frontend shows: "Generating scenario..."     │
│  Coach → scenario_generation(user_id, topic?, personalization)   │
│  Returns: {title, description, duration}                         │
│  Update: active_sessions[session_id]["scenario_details"]         │
│  Sent to frontend via rtvi.send_server_message()                │
│  Store scenario_start_timestamp                                  │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│           CONVERSATION PRACTICE (Live Audio via WebRTC)          │
│                                                                   │
│  • Scenario displayed, conversation auto-starts                  │
│  • Coach conducts natural conversation based on scenario         │
│  • Time tracking: Inject periodic time updates via               │
│    LLMMessagesAppendFrame ("[Time remaining: X.X minutes]")      │
│  • Coach plans wrap-up at duration ±30 seconds                   │
│  • AudioBufferProcessor captures user + coach audio              │
│  • Continuous update: active_sessions[session_id]["audio_msgs"] │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│            FEEDBACK GENERATION (Tool Call)                       │
│                                                                   │
│  Tool start event → Frontend shows: "Analyzing performance..."   │
│  FIRST: audiobuffer.stop_recording() (save compute)              │
│  Coach → feedback_generation(session_id)                         │
│  Tool retrieves: audio_messages + scenario_details from store    │
│  Analyzes with Gemini                                            │
│  Returns: {overall_scores, fluency_analysis, pronunciation,      │
│            grammar_analysis, vocabulary_analysis}                │
│  Update: active_sessions[session_id]["feedback"]                 │
│  Sent to frontend via rtvi.send_server_message()                │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│             FEEDBACK DISPLAYED TO USER (Text Only)               │
│  User reviews performance analysis in text format                │
│  • Overall scores                                                │
│  • Fluency metrics                                               │
│  • Mispronounced words (text list)                               │
│  • Grammar corrections                                           │
│  • Vocabulary suggestions                                        │
│                                                                   │
│  Options: "Practice Same Scenario" or "New Scenario"             │
└─────────────────────────────────────────────────────────────────┘

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
                    IN-MEMORY STORAGE STRUCTURE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Python FastAPI global dictionary:

active_sessions = {}

active_sessions[session_id] = {
    "user_id": "user_001",
    "scenario_details": {
        "input": {
            "user_id": "user_001",
            "industry": "technology",
            "role": "software engineer",
            "level": "intermediate",
            "topic": "client meeting",
            "personalization_note": ""
        },
        "output": {
            "title": "Client Meeting Simulation",
            "description": "Practice professional communication...",
            "duration": 8
        }
    },
    "audio_messages": [
        {"role": "user", "audio": "base64...", "timestamp": 1234.56, "duration_ms": 3000},
        {"role": "assistant", "audio": "base64...", "timestamp": 1238.12, "duration_ms": 2500}
    ],
    "feedback": {
        "overall_scores": {"fluency": 8.0, "pronunciation": 7.0, ...},
        "fluency_analysis": {...},
        "pronunciation": {...},
        "grammar_analysis": {...},
        "vocabulary_analysis": {...}
    },
    "scenario_start_timestamp": 1732467234.56,
    ...
}

Storage updates during session:
- scenario_generation tool → Updates scenario_details
- AudioBufferProcessor → Continuously appends to audio_messages
- feedback_generation tool → Updates feedback

Technical Approach

Branching Strategy:

  1. Check out the feature/production-setup branch
  2. Create new branch for AI Communication Coach integration work
  3. Push changes to new branch

System Prompt Design (Prisha):

  • Create AI Communication Coach system message with user context (industry, role, level)
  • Include first-time user detection logic (check if user has previous sessions)
  • Define coach behavior:
    • First-time: Welcome user + inform about scenario generation (user-friendly, no technical details)
    • Returning: Directly inform about next scenario generation
  • Define coach responsibilities: generate scenarios, conduct natural conversations, track duration, call feedback_generation tool
  • Document available tools and their usage patterns
  • Single system message at initialization
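The prompt-construction logic above can be sketched as a small helper. This is an illustrative draft only (the function name and exact wording are assumptions, not the final prompt Prisha will write):

```python
# Hypothetical sketch of the system-prompt builder; the helper name,
# wording, and user dict keys are illustrative assumptions.
def build_coach_system_prompt(user: dict, is_first_time: bool) -> str:
    """Compose the single system message sent at initialization."""
    greeting_rule = (
        "Welcome the user warmly, then let them know you are generating "
        "a practice scenario."
        if is_first_time
        else "Skip the welcome; directly say you are generating the next scenario."
    )
    return (
        "You are an AI Communication Coach helping the user practice English.\n"
        f"User context: industry={user['industry']}, role={user['role']}, "
        f"level={user['level']}.\n"
        f"{greeting_rule}\n"
        "Responsibilities: generate scenarios via the scenario_generation tool, "
        "conduct a natural conversation, track duration, and call the "
        "feedback_generation tool when the scenario duration completes.\n"
        "Never mention tool names or technical details to the user."
    )
```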

Tool Registration & Loading States (Ishita):

  • Register scenario_generation tool with Pipecat
  • Register feedback_generation tool with Pipecat
  • Implement tool start events for loading states:
    • scenario_generation starts → Send event to frontend: "Generating your practice scenario..."
    • feedback_generation starts → Send event to frontend: "Analyzing your performance..."
  • Implement event-driven tool responses using rtvi.send_server_message()
  • Both tools return results via params.result_callback() for LLM context

Temporal Context Implementation (Ishita):

  • Implement time updates using LLMMessagesAppendFrame
  • Inject periodic time remaining messages as user role
  • Calculate: remaining_time = scenario_duration - (current_time - scenario_start_time)
  • Enable AI coach to plan graceful wrap-up at duration ±30 seconds
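The remaining-time calculation and message format above can be sketched as a pure function (the frame injection itself goes through Pipecat's LLMMessagesAppendFrame and is not shown here):

```python
# Sketch of the periodic time-update message, using the format from the
# spec: "[Time remaining: X.X minutes]". Timestamps are epoch seconds.
def time_remaining_message(scenario_duration_min: float,
                           scenario_start_ts: float,
                           now_ts: float) -> str:
    elapsed_min = (now_ts - scenario_start_ts) / 60.0
    # Clamp at zero so the coach never sees negative time remaining.
    remaining = max(scenario_duration_min - elapsed_min, 0.0)
    return f"[Time remaining: {remaining:.1f} minutes]"
```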

In-Memory Storage Management (Ishita):

  • Implement global active_sessions = {} dictionary
  • Structure per session: user_id, scenario_details (input + output), audio_messages, feedback, timestamps
  • scenario_generation tool updates: active_sessions[session_id]["scenario_details"]
  • AudioBufferProcessor continuously updates: active_sessions[session_id]["audio_messages"]
  • feedback_generation tool updates: active_sessions[session_id]["feedback"]
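The three writers above can be sketched as small helpers around the global dictionary (helper names are illustrative, not prescribed by the spec):

```python
# Minimal sketch of the in-memory session store and its writers.
active_sessions: dict = {}

def init_session(session_id: str, user_id: str) -> None:
    active_sessions[session_id] = {
        "user_id": user_id,
        "scenario_details": None,
        "audio_messages": [],
        "feedback": None,
        "scenario_start_timestamp": None,
    }

def store_scenario(session_id: str, inputs: dict, output: dict,
                   start_ts: float) -> None:
    # Called by the scenario_generation tool.
    s = active_sessions[session_id]
    s["scenario_details"] = {"input": inputs, "output": output}
    s["scenario_start_timestamp"] = start_ts

def append_audio(session_id: str, role: str, audio_b64: str,
                 timestamp: float, duration_ms: int) -> None:
    # Called continuously by the AudioBufferProcessor callback.
    active_sessions[session_id]["audio_messages"].append(
        {"role": role, "audio": audio_b64,
         "timestamp": timestamp, "duration_ms": duration_ms})
```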

Feedback Generation Tool Logic (Reference Issue #288):

  • FIRST action when tool called: audiobuffer.stop_recording() (saves compute)
  • Retrieve audio_messages and scenario_details from active_sessions storage
  • Generate feedback with Gemini
  • Update storage with feedback results
  • Send to frontend

Session Management (Ishita):

  • Track user session history to determine first-time vs returning user
  • Store flag in active_sessions for coach behavior differentiation
  • Implement session stop logic inside the feedback_generation tool (not triggered manually by the coach)
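The first-time detection above can be sketched as a one-line check (this assumes session history is kept as a mapping from user_id to past session IDs, which is an illustrative choice):

```python
# Sketch of first-time detection; the history structure is an assumption.
def is_first_time_user(user_id: str, session_history: dict) -> bool:
    """True when the user has no recorded previous sessions."""
    return len(session_history.get(user_id, [])) == 0
```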

Acceptance Criteria

Complete End-to-End Flow Working:

  • User selects user_id, level, topic → AI coach detects first-time vs returning user
  • First-time users: Coach welcomes and informs about scenario generation (user-friendly language)
  • Returning users: Coach directly informs about next scenario generation (no welcome)
  • Loading state displayed: "Generating your practice scenario..."
  • Scenario displayed with title, description, duration → Conversation auto-starts
  • AI coach conducts natural conversation based on scenario context
  • Time tracking enables coach to plan wrap-up at duration ±30 seconds
  • Coach calls feedback_generation tool
  • Tool stops recording immediately, loading state displayed: "Analyzing your performance..."
  • Comprehensive feedback displayed as text only: overall scores, fluency analysis, pronunciation errors (text list), grammar corrections, vocabulary suggestions
  • User can choose to practice same scenario again or start new scenario
  • Event-driven tool responses working (no polling required)
  • All 3 static users tested successfully with different scenarios (both first-time and returning behavior)
  • Audio messages captured correctly throughout all sessions

System Prompt (Prisha):

  • System message includes user context (industry, role, level)
  • First-time user detection logic included
  • Defines coach behavior for first-time vs returning users
  • Coach communication is user-friendly without technical details
  • Lists available tools with usage patterns
  • Single system message at initialization

Tool Integration & Loading States (Ishita):

  • Both tools registered with Pipecat
  • Tool start events implemented for both tools with appropriate loading messages
  • Event-driven responses via rtvi.send_server_message()
  • Results also returned via params.result_callback() for LLM context
  • Frontend receives tool results without polling
  • Loading states displayed correctly during tool execution

Temporal Context (Ishita):

  • LLMMessagesAppendFrame injects time updates
  • Time remaining format: "[Time remaining: X.X minutes]"
  • Updates sent periodically during conversation
  • AI coach receives temporal context for wrap-up planning

In-Memory Storage (Ishita):

  • Global active_sessions dictionary implemented
  • Storage structure matches specification (user_id, scenario_details with input/output, audio_messages, feedback)
  • scenario_generation tool updates storage correctly
  • AudioBufferProcessor updates storage continuously
  • feedback_generation tool retrieves and updates storage correctly

Session Management (Ishita):

  • User session history tracked correctly
  • First-time vs returning user flag working
  • feedback_generation tool stops recording first (audiobuffer.stop_recording())
  • Session stop saves compute resources as intended
  • Text-only feedback display working

Success Indicators:

  • Complete user journey works seamlessly from session start to feedback display
  • Coach behavior adapts appropriately for first-time vs returning users
  • AI coach communication is natural and user-friendly
  • Loading states provide clear feedback during tool execution
  • Duration tracking enables proper conversation pacing
  • feedback_generation tool handles session stop correctly
  • In-memory storage structure working correctly across all tool calls
  • Feedback displayed as text format successfully
  • System ready for user testing with 3 static profiles

Resources/References

#1

Updated by Harikrishnan Murugan about 1 month ago

  • Tracker changed from Bug to Feature