CHIM Speech-to-Text Services

This page lists the speech-to-text (STT) services that HerikaServer currently supports for CHIM. The services are grouped as in the quickstart: the recommended service first, then the other supported services.

Other Services

OpenAI Whisper

OpenAI's hosted Whisper speech-to-text. The current schema supports selecting the input language and, optionally, translating the transcribed speech into English.
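
This page does not show the exact request HerikaServer sends. As a sketch, assuming the translate option maps onto OpenAI's public audio API, a translate flag would select the /v1/audio/translations endpoint (which always returns English) instead of /v1/audio/transcriptions. The function build_whisper_request, the WhisperRequest type and the option names are hypothetical, and the audio upload itself is omitted.

```python
from dataclasses import dataclass, field
from typing import Optional

# Public OpenAI audio API base URL.
OPENAI_AUDIO_BASE = "https://api.openai.com/v1/audio"


@dataclass
class WhisperRequest:
    url: str
    # Multipart form fields; the audio itself would go in a separate "file" field.
    form: dict = field(default_factory=dict)


def build_whisper_request(language: Optional[str], translate: bool,
                          model: str = "whisper-1") -> WhisperRequest:
    """Hypothetical mapping from CHIM-style STT options to OpenAI endpoints."""
    if translate:
        # /translations always returns English text, so no language field is sent.
        return WhisperRequest(f"{OPENAI_AUDIO_BASE}/translations", {"model": model})
    form = {"model": model}
    if language:
        # ISO-639-1 code of the spoken language, e.g. "de"; an optional hint.
        form["language"] = language
    return WhisperRequest(f"{OPENAI_AUDIO_BASE}/transcriptions", form)
```

For example, build_whisper_request("de", translate=False) targets /transcriptions with a language hint, while translate=True targets /translations. Keeping the endpoint choice in one function lets the mapping be checked without sending any audio.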

Local Whisper

A local Whisper endpoint installed by DwemerDistro. Use this option if you want speech recognition to run on your own machine rather than on a hosted speech-to-text provider.

Gemini

Google Gemini speech-to-text with emotion detection. The current schema takes Google AI API credentials and supports language selection and a choice of the current Gemini models.

Azure

Azure speech-to-text, with controls for the recognition language and for profanity handling. Choose this option if you already use Azure for speech services.
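
HerikaServer's Azure client is not shown on this page. Assuming it calls Azure's speech-to-text REST API for short audio, the language and profanity controls correspond to that API's language and profanity query parameters, where profanity is one of masked, removed or raw. The helper azure_stt_url and the example region are hypothetical; authentication headers and the audio body are omitted.

```python
from urllib.parse import urlencode

# Profanity modes accepted by Azure's speech-to-text REST API for short audio.
PROFANITY_MODES = {"masked", "removed", "raw"}


def azure_stt_url(region: str, language: str, profanity: str = "masked") -> str:
    """Build a short-audio recognition URL (hypothetical helper, not CHIM code)."""
    if profanity not in PROFANITY_MODES:
        raise ValueError(f"profanity must be one of {sorted(PROFANITY_MODES)}")
    query = urlencode({"language": language, "profanity": profanity})
    return (f"https://{region}.stt.speech.microsoft.com"
            f"/speech/recognition/conversation/cognitiveservices/v1?{query}")
```

Validating the profanity mode before building the URL surfaces a misconfigured value immediately instead of as a failed recognition request.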

Inworld

Inworld speech-to-text, configured with provider and model identifiers and a BCP-47 language code (for example, en-US). Choose this option if you are already building around Inworld services.
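
This page does not describe how Inworld validates language codes. To illustrate the BCP-47 format, the sketch below checks the common language[-Script][-REGION] subset and applies conventional casing: lowercase language, title-case script, uppercase region. The helper normalize_bcp47 is hypothetical and simplified. Full BCP-47 (RFC 5646) also allows variants, extensions and private-use subtags, which this sketch rejects.

```python
import re

# Common subset of BCP-47: language[-Script][-REGION]. Variants, extensions
# and private-use subtags from full RFC 5646 are intentionally not handled.
_SIMPLE_TAG = re.compile(r"^([a-z]{2,3})(?:-([a-z]{4}))?(?:-([a-z]{2}|\d{3}))?$")


def normalize_bcp47(tag: str) -> str:
    """Validate a simple BCP-47 tag and return it with conventional casing."""
    # Accept POSIX-style "en_US" as a convenience by converting "_" to "-".
    m = _SIMPLE_TAG.match(tag.strip().replace("_", "-").lower())
    if not m:
        raise ValueError(f"not a simple BCP-47 tag: {tag!r}")
    lang, script, region = m.groups()
    parts = [lang]
    if script:
        parts.append(script.title())   # e.g. "hant" -> "Hant"
    if region:
        parts.append(region.upper())   # e.g. "tw" -> "TW"; "419" is unchanged
    return "-".join(parts)
```

For example, "en-us" normalizes to "en-US" and "zh-hant-tw" to "zh-Hant-TW", while a plain language name such as "english" is rejected.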