CHIM Speech-to-Text Services

This page documents the speech-to-text services that HerikaServer actively uses for CHIM. The grouping follows the quickstart flow: recommended services first, then the other supported services.

Recommended

Deepgram

Hosted speech-to-text and one of the two recommended baseline choices. The current schema supports language selection and a model picker that includes the newer Deepgram models.

  • Best for: recommended hosted microphone transcription.
  • Needs: Deepgram API key.
  • Current model examples: nova-3, nova-2, whisper-medium.
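As a rough illustration of how the API key, model, and language settings above come together, the sketch below builds a request for Deepgram's pre-recorded transcription endpoint. It is not HerikaServer's connector code. The function name and its defaults are hypothetical, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode

# Deepgram's pre-recorded audio endpoint.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_deepgram_request(api_key, model="nova-3", language="en",
                           content_type="audio/wav"):
    """Return (url, headers) for a transcription request.

    Hypothetical helper for illustration; the audio bytes would be
    sent as the raw POST body by whatever HTTP client you use.
    """
    if not api_key:
        raise ValueError("a Deepgram API key is required")
    # Model and language are passed as query parameters.
    query = urlencode({"model": model, "language": language})
    headers = {
        # Deepgram expects the key with a "Token" prefix.
        "Authorization": f"Token {api_key}",
        "Content-Type": content_type,
    }
    return f"{DEEPGRAM_LISTEN_URL}?{query}", headers
```

Switching between the model examples listed above only changes the `model` query parameter; the key and audio payload stay the same.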

Parakeet

The other recommended speech-to-text option in HerikaServer. It is listed alongside Deepgram as a first-class recommended service in the speech-to-text connector grouping.

  • Best for: recommended alternative speech-to-text path.
  • Current schema focus: language selection.
  • Good fit when you want the current CHIM-recommended non-Deepgram route.

Other Services

OpenAI Whisper

Hosted OpenAI Whisper speech-to-text. The current schema supports language selection and an optional translate-to-English mode.

OpenAI Speech-to-Text
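The difference between plain transcription and the translate-to-English option can be sketched as a choice between OpenAI's two audio endpoints. This is a minimal sketch, not HerikaServer's implementation. The helper name is hypothetical, and it assumes the `whisper-1` model.

```python
OPENAI_AUDIO_BASE = "https://api.openai.com/v1/audio"


def build_whisper_request(language=None, translate_to_english=False):
    """Return (url, form_fields) for a Whisper request.

    Hypothetical helper for illustration; the audio itself would be
    attached separately as the multipart "file" field.
    """
    fields = {"model": "whisper-1"}
    if translate_to_english:
        # The translations endpoint always outputs English and takes
        # no language field, so the language setting is ignored here.
        return f"{OPENAI_AUDIO_BASE}/translations", fields
    if language:
        # Transcription accepts an optional ISO-639-1 hint such as "de".
        fields["language"] = language
    return f"{OPENAI_AUDIO_BASE}/transcriptions", fields
```

In this sketch the translate option takes priority over the language setting, since a translation request always produces English text.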

Local Whisper

The DwemerDistro-installed local Whisper endpoint. This is the route to use if you want your speech recognition to stay on your own machine instead of using a hosted speech-to-text provider.

Gemini

Google Gemini speech-to-text plus emotion detection. The current schema supports Google AI API credentials, language selection, and current Gemini model choices.

Google AI Studio

Azure

Azure speech-to-text with language and profanity handling controls. Use this if Azure is already your preferred speech stack.

Azure Speech-to-Text
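To show what the language and profanity controls correspond to, the sketch below builds a request for Azure's short-audio speech recognition REST endpoint. Whether HerikaServer calls this exact endpoint is an assumption. The helper name, the default values, and the fixed WAV content type are hypothetical.

```python
from urllib.parse import urlencode

# Profanity handling modes accepted by the Azure short-audio REST API.
PROFANITY_MODES = ("masked", "removed", "raw")


def build_azure_stt_request(region, subscription_key,
                            language="en-US", profanity="masked"):
    """Return (url, headers) for a short-audio recognition request.

    Hypothetical helper for illustration; the request is built but
    not sent, and the audio would go in the POST body.
    """
    if profanity not in PROFANITY_MODES:
        raise ValueError(f"profanity must be one of {PROFANITY_MODES}")
    query = urlencode({"language": language, "profanity": profanity})
    url = (f"https://{region}.stt.speech.microsoft.com"
           f"/speech/recognition/conversation/cognitiveservices/v1?{query}")
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        # Assumes 16 kHz PCM WAV input.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return url, headers
```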

Inworld

Inworld speech-to-text with provider/model identifiers and BCP-47 language codes. This is the speech-to-text path to use if you are already building around Inworld services.
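Because this service expects BCP-47 language codes, it can help to normalize user input such as `en_us` before saving it. The helper below is a simplified sketch that applies the conventional BCP-47 casing (lowercase language, title-case script, uppercase region). It is hypothetical, is not part of HerikaServer or Inworld, and does not check tags against the language subtag registry.

```python
def normalize_language_tag(tag):
    """Normalize a language tag to conventional BCP-47 casing.

    Simplified: requires a 2-3 letter primary language subtag and
    ignores special cases such as extension singletons.
    """
    parts = tag.strip().replace("_", "-").split("-")
    primary = parts[0]
    if not (primary.isascii() and primary.isalpha() and 2 <= len(primary) <= 3):
        raise ValueError(f"invalid primary language subtag in {tag!r}")
    out = [primary.lower()]
    for sub in parts[1:]:
        if not (sub.isascii() and sub.isalnum() and 1 <= len(sub) <= 8):
            raise ValueError(f"invalid subtag {sub!r} in {tag!r}")
        if len(sub) == 4 and sub.isalpha():
            out.append(sub.title())   # script, e.g. "Hant"
        elif len(sub) == 2 and sub.isalpha():
            out.append(sub.upper())   # region, e.g. "US"
        else:
            out.append(sub.lower())   # numeric regions, variants
    return "-".join(out)
```

For example, `normalize_language_tag("en_us")` returns `"en-US"`.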