Emobot vs voice-only monitoring.

Vocal biomarkers are a legitimate and well-researched signal for depression. The question is not whether voice works but whether one signal is enough to build a clinical monitoring program on.

What voice-only monitoring is

Voice-only tools analyze speech, prosody, and energy from recorded audio or calls to estimate mood and depression severity. The vocal biomarker literature is solid, and these tools can produce useful estimates from a single modality.

Where voice-only monitoring is genuinely good

  • Vocal biomarkers are well studied and genuinely informative for affect.
  • A voice sample can be quick to capture in some workflows.
  • Useful where a single, lightweight signal is all that is needed.

Where it falls short

  • A single modality is fragile: it degrades when the patient speaks little, or when recording conditions are noisy or inconsistent.
  • Many voice tools require the patient to actively record or take a call, which reintroduces burden and drop-off.
  • A single channel offers less corroboration than several signals agreeing or disagreeing.

Side by side

Feature | Emobot | Voice-only
Signals used | Face + voice + activity + behavior | Voice only
Patient action required | None after setup | Often: record / take a call
Robustness | Multiple signals corroborate | Single point of failure
Frequency | Continuous | When voice is captured
Validation | r=0.89 vs MADRS, 11 studies | Varies by tool

Where Emobot is different

Four signals, not one

Facial expression, vocal biomarkers, actigraphy, and digital behavior are fused, so the score does not collapse when any single channel is sparse.
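
To make the idea of a score that survives a sparse channel concrete, here is a minimal sketch of one common fusion approach: a weighted mean that skips missing channels and renormalizes the remaining weights. This is an illustration only, not Emobot's actual method. The function name, channel names, weights, and 0-to-1 score scale are all assumptions made for the example.

```python
from typing import Mapping, Optional

# Hypothetical per-channel weights, chosen only for illustration.
WEIGHTS = {"face": 0.3, "voice": 0.3, "activity": 0.2, "behavior": 0.2}


def fuse_scores(
    scores: Mapping[str, Optional[float]],
    weights: Mapping[str, float] = WEIGHTS,
) -> Optional[float]:
    """Weighted mean over the channels that reported a score.

    A channel whose score is None (sparse or missing data) is skipped and
    the remaining weights are renormalized, so a silent channel does not
    drag the fused score toward zero.
    """
    available = {
        name: score
        for name, score in scores.items()
        if score is not None and name in weights
    }
    total_weight = sum(weights[name] for name in available)
    if total_weight == 0:
        return None  # no usable channel at all
    weighted_sum = sum(weights[name] * score for name, score in available.items())
    return weighted_sum / total_weight


# Example: the voice channel is missing because the patient barely spoke.
fused = fuse_scores({"face": 0.6, "voice": None, "activity": 0.4, "behavior": 0.5})
```

In this sketch the fused value is still computed from the three remaining channels. A voice-only pipeline in the same situation would have nothing to report.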

Truly passive

No recordings to make and no calls to take. Monitoring continues whether or not the patient speaks much.

Privacy by design

Facial analysis runs on-device, and voice is processed in a short, ephemeral window and then discarded, leaving only a numerical score.

Frequently asked questions

Do vocal biomarkers work for depression monitoring?

Yes. Vocal biomarkers are a well-researched and informative signal. The limitation of voice-only tools is their reliance on a single modality, which is more fragile and often still requires the patient to actively provide a sample.

How is Emobot different from voice-only tools?

Emobot fuses four passive signals (face, voice, activity, digital behavior) rather than relying on voice alone, so it is more robust and requires no action from the patient after a 3-minute setup.

Is multimodal monitoring more accurate than voice-only?

Multiple corroborating signals are generally more robust than one. Emobot's fused multimodal score correlates with MADRS at r=0.89 across 11 clinical studies.

See the difference on a real patient case.

A 30-minute demo shows exactly what continuous, passive, multimodal monitoring looks like in your dashboard.