Timeline Entry

Each element under timeline contains multimodal signals for a single utterance.
utterance_id
string
Unique identifier for the utterance
speaker
string
Speaker identifier from diarization
t_start
number
Start time in seconds
t_end
number
End time in seconds
text
string
Transcribed text from speech recognition
asr_confidence
number
ASR confidence score (0-1)
prosody
object
Acoustic metrics including:
  • Pitch/energy statistics
  • Pauses
  • Spectral features
  • Commitment/passion markers
  • Lexical markers
  • Hesitation flags
prosody_desc
string
Natural-language descriptor (e.g., “COMMITTED, PASSIONATE, emphatic”)
emotion
object
Full emotion analysis including:
  • Emotion scores (joy, sadness, etc.)
  • PAD scores (valence, arousal, dominance)
  • Dominant emotion
  • Confidence
sentiment
object
Structured sentiment analysis
polarity
number
Sentiment polarity (-1 to 1)
subjectivity
number
Subjectivity score (0-1)
cognitive
object
Cognitive analysis including:
  • Engagement score
  • Cognitive load level
  • Cognitive signals
commitment_score
number
Commitment score derived from prosody
commitment_level
string
Commitment level classification
passion_score
number
Passion score derived from prosody
passion_level
string
Passion level classification
conviction_score
number
Conviction score derived from prosody and acoustic reasoning
lexical_markers
object
Lexical analysis including:
  • Hedges
  • Fillers
  • Questions
  • Disfluency rate
  • Self-repair flags
acoustic_reasoning
string
Generated narrative explaining how acoustic cues map to cognition and emotion
confidence
number
Average confidence across all agents (0-1)
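Example (illustrative values; the nested object keys shown under prosody, emotion, cognitive, and lexical_markers are hypothetical and may differ from actual service output):
{
  "utterance_id": "utt_0042",
  "speaker": "S1",
  "t_start": 12.4,
  "t_end": 15.9,
  "text": "We are absolutely committed to shipping this quarter.",
  "asr_confidence": 0.94,
  "prosody": { "pitch_mean": 182.5, "energy_mean": 0.61, "pause_count": 1 },
  "prosody_desc": "COMMITTED, PASSIONATE, emphatic",
  "emotion": {
    "scores": { "joy": 0.42, "sadness": 0.03 },
    "pad": { "valence": 0.55, "arousal": 0.62, "dominance": 0.48 },
    "dominant_emotion": "joy",
    "confidence": 0.81
  },
  "sentiment": { "label": "positive" },
  "polarity": 0.6,
  "subjectivity": 0.7,
  "cognitive": { "engagement_score": 0.78, "cognitive_load": "moderate", "signals": ["fluent_delivery"] },
  "commitment_score": 0.83,
  "commitment_level": "high",
  "passion_score": 0.76,
  "passion_level": "high",
  "conviction_score": 0.8,
  "lexical_markers": { "hedges": [], "fillers": ["uh"], "questions": 0, "disfluency_rate": 0.05, "self_repair": false },
  "acoustic_reasoning": "Rising pitch and sustained energy across the utterance suggest strong commitment.",
  "confidence": 0.85
}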

ConversationSummary Schema (Reference)

Fields:
overview
string
Executive-level synopsis
key_insights
string[]
Bullet-ready highlights (timestamp references encouraged)
notable_patterns
string[]
Trends across the conversation
recommendations
string[]
Actionable next steps
what_went_well
string[] | null
Populated when the user explicitly asks "what went well"
what_went_wrong
string[] | null
Populated on "what went wrong" requests
speaker_dynamics
[{speaker, description}] | null
Characterization per speaker
conversation_quality
string
One of "excellent", "good", "fair", "poor"; defaults to "good"
When contexts such as "call out what went well" are provided, those optional arrays will contain 3-5 evidence-backed bullets.
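Example (illustrative values only):
{
  "overview": "A 30-minute discovery call covering pricing and onboarding.",
  "key_insights": ["Buyer raised budget concerns around the midpoint"],
  "notable_patterns": ["Engagement rose after the demo"],
  "recommendations": ["Send a tailored pricing breakdown"],
  "what_went_well": null,
  "what_went_wrong": null,
  "speaker_dynamics": [
    { "speaker": "S1", "description": "Led the agenda at a steady pace" }
  ],
  "conversation_quality": "good"
}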

Insights

Aggregated insights from the fusion service.
overall_sentiment
object
Sentiment statistics:
  • mean: Average sentiment score
  • std: Standard deviation
  • trend: Sentiment trend (increasing/decreasing/stable)
dominant_emotions
array
Top three dominant emotions (strings)
engagement_pattern
string
Engagement pattern: increasing, decreasing, or stable
key_moments
array
Curated subset of detected moments (MomentEvent objects)
speaker_profiles
object
Profiles for each speaker
conversation_dynamics
object
Analysis of conversation dynamics
cognitive_profile
object
Overall cognitive profile
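Example (illustrative values; speaker_profiles, conversation_dynamics, and cognitive_profile are shown empty because their internal keys are not specified here):
{
  "overall_sentiment": { "mean": 0.42, "std": 0.18, "trend": "increasing" },
  "dominant_emotions": ["joy", "neutral", "surprise"],
  "engagement_pattern": "stable",
  "key_moments": [],
  "speaker_profiles": {},
  "conversation_dynamics": {},
  "cognitive_profile": {}
}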

Transition Event

Affective state transition detection.
transition_id
string
Unique transition identifier
at_ms
integer
Timestamp in milliseconds
from_state
object
Previous affective state
to_state
object
New affective state
evidence_utterance_ids
array
Array of utterance IDs that contributed to this transition
drivers
object
Valence/arousal/engagement deltas that triggered the transition
confidence
number
Confidence score (0-1)
stability
string
Stability indicator
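Example (illustrative values; the keys inside from_state, to_state, and drivers are hypothetical):
{
  "transition_id": "tr_003",
  "at_ms": 125400,
  "from_state": { "valence": 0.1, "arousal": 0.3 },
  "to_state": { "valence": 0.5, "arousal": 0.6 },
  "evidence_utterance_ids": ["utt_0041", "utt_0042"],
  "drivers": { "valence_delta": 0.4, "arousal_delta": 0.3, "engagement_delta": 0.1 },
  "confidence": 0.77,
  "stability": "stable"
}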

Moment Event

Key moment detection.
moment_id
string
Unique moment identifier
category
string
Moment category: objection, cta_offered, cta_accepted, cta_rejected, topic_shift
speaker_id
string
Speaker identifier
start_ms
integer
Start timestamp in milliseconds
end_ms
integer
End timestamp in milliseconds
utterance_ids
array
Array of associated utterance IDs
summary
string
Summary of the moment
labels
array
Additional labels
confidence
number
Confidence score (0-1)
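Example (illustrative values only):
{
  "moment_id": "mom_007",
  "category": "objection",
  "speaker_id": "S2",
  "start_ms": 184000,
  "end_ms": 191500,
  "utterance_ids": ["utt_0063", "utt_0064"],
  "summary": "Buyer pushed back on the annual contract term.",
  "labels": ["pricing"],
  "confidence": 0.82
}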

Conversation Segmentation

Segmentation of conversation into phases.
phases
array
Array of ConversationPhase objects
total_phases
integer
Total number of phases
segmentation_rationale
string
Explanation of segmentation approach
narrative_arc
string
Narrative arc description
key_transitions
array
Array of key transitions (TransitionEvent objects)

Conversation Phase

Individual phase within a segmented conversation.
phase_id
string
Unique phase identifier
phase_name
string
Name of the phase
start_timestamp
number
Start timestamp in seconds
end_timestamp
number
End timestamp in seconds
start_utterance_id
string
First utterance ID in this phase
end_utterance_id
string
Last utterance ID in this phase
utterance_count
integer
Number of utterances in this phase
theme
string
Theme of the phase
summary
string
Summary of the phase
emotional_tone
string
Emotional tone of the phase
key_moments
array
Key moments within this phase (MomentEvent objects)
transition_signal
object
Transition signal that led to this phase (TransitionEvent)
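Example (illustrative values; key_moments and transition_signal are left empty/null for brevity):
{
  "phase_id": "phase_2",
  "phase_name": "Needs discovery",
  "start_timestamp": 120.0,
  "end_timestamp": 480.5,
  "start_utterance_id": "utt_0018",
  "end_utterance_id": "utt_0054",
  "utterance_count": 37,
  "theme": "Understanding requirements",
  "summary": "The rep probed current workflow pain points.",
  "emotional_tone": "curious",
  "key_moments": [],
  "transition_signal": null
}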

Error Response

Standard error response format.
detail
string
Error message describing what went wrong
Example:
{
  "detail": "Invalid or missing API key"
}

Live Session Response

Response from starting a live session.
session_id
string
Unique session identifier
expires_at
string
ISO 8601 timestamp when session expires
status
string
Current session status
next_chunk_seq
integer
Next expected chunk sequence number (zero-based)
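Example (illustrative values; the status string shown is hypothetical):
{
  "session_id": "sess_9f2c1a",
  "expires_at": "2025-01-01T12:00:00Z",
  "status": "active",
  "next_chunk_seq": 0
}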

Metrics Response

Observability metrics for dashboards.
events
object
Event counts by type:
  • status: Number of status events
  • final_transcript: Number of final transcript events
  • emotion: Number of emotion events
  • Additional event types as key-value pairs
agent_latency_ms
object
Average agent latency per utterance type (milliseconds)
Example:
{
  "events": {
    "status": 45,
    "final_transcript": 12,
    "emotion": 12
  },
  "agent_latency_ms": {
    "utterance": 215.34
  }
}