CHIM Text-to-Speech Services
This page documents the text-to-speech services that HerikaServer currently uses for CHIM. The sections below follow the quickstart flow: recommended services first, then the other valid services.
Recommended
PocketTTS
CPU-oriented local text-to-speech using the standard CHIM voice cache workflow. This is one of the main first-run recommendations.
- Best for: easiest local baseline.
Chatterbox
Local text-to-speech service that uses the same general voice cache and management flow as XTTS and PocketTTS, but is aimed at more expressive output.
- Best for: expressive local speech.
CHIM XTTS
The dedicated XTTS path in CHIM. This is the service to use when you specifically want the CHIM XTTS workflow, uploaded voice samples, and the familiar clone/cache flow.
- Best for: XTTS-specific voice cloning workflow.
- Driver id: xtts-fastapi.
- Supports paralinguistic tag prompts in the current server config.
Inworld
Hosted text-to-speech with language, model, temperature, and speed controls. It is a good option if you want a cloud voice service instead of running speech locally.
- Best for: hosted voice generation with cloning support.
- Strengths: voice cloning flow, multi-language support, model selection.
- Needs: Inworld workspace and API credentials.
Cartesia
Hosted text-to-speech with current language and model options, including the newer sonic line in the server schema.
- Best for: higher-end hosted voice path with strong language support.
- Strengths: multiple models, speed control, automatic voice sync from the CHIM voice cache.
- Needs: Cartesia API key.
Other Services
MeloTTS
Local text-to-speech that maps closely onto Skyrim voice types. This is useful when you want a local setup organized around voice types instead of the XTTS-style cloned voice flow.
Mimic3
Local HTTP text-to-speech service with direct voice, rate, and volume controls. Useful if you want simple local synthesis without using the XTTS-family flow.
Piper Text-to-Speech
Local Piper endpoint integration. Good for self-hosted offline speech if you are comfortable downloading and managing voice models manually.
Kokoro
Local Kokoro endpoint integration. It is a valid text-to-speech driver in the current connector system and uses a local HTTP endpoint.
KoboldCPP Text-to-Speech
Local text-to-speech path exposed through the KoboldCPP extra text-to-speech endpoint. Use it only if your local KoboldCPP stack is already set up for speech.
Zonos
Zonos Gradio endpoint integration for users who want that specific self-hosted model family. In the current schema it supports a wide language list and voice ids in the style of the CHIM voice cache.
xVASynth
Legacy-friendly local Skyrim voice workflow tied to xVASynth models. Use this only if you already know you want the xVASynth route.
Azure
Hosted Azure speech synthesis with mood/style support, voice selection, and prosody controls. This is still one of the more configurable cloud voice options in the CHIM stack.
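The mood/style and prosody controls described above correspond to elements of Azure's SSML markup. As a rough sketch, and not CHIM's actual connector code, the Python below builds such an SSML document. The helper name `build_azure_ssml`, the default voice, and the style and rate values are assumptions chosen for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def build_azure_ssml(text, voice="en-US-JennyNeural", style="cheerful",
                     rate="+0%", pitch="+0%", lang="en-US"):
    """Return an SSML string combining a speaking style with prosody settings.

    Hypothetical helper for illustration only; the real CHIM connector
    may assemble its request differently.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f'xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'                      # voice selection
        f'<mstts:express-as style={quoteattr(style)}>'          # mood/style
        f'<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>'  # prosody
        f'{escape(text)}'                                       # escape &, <, >
        '</prosody></mstts:express-as></voice></speak>'
    )


ssml = build_azure_ssml("Well met, traveler.", style="friendly", rate="-5%")
print(ssml)
```

Which styles are available depends on the selected voice, so a style that works for one voice may be ignored by another.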
ElevenLabs
Hosted ElevenLabs speech with model, stability, similarity, style, speed, and optional v3 audio tag controls.
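To make the listed controls concrete, here is a minimal sketch of how they might be grouped into a request body. It assumes the field names used by the public ElevenLabs text-to-speech API (`model_id` and a `voice_settings` object); the helper name, the default values, and the `eleven_v3` model id in the usage line are illustrative assumptions, not values taken from the CHIM server schema.

```python
import json


def build_elevenlabs_payload(text, model_id="eleven_multilingual_v2",
                             stability=0.5, similarity_boost=0.75,
                             style=0.0, speed=1.0):
    """Group the model and voice controls into one JSON-serializable dict.

    Hypothetical helper; the defaults are placeholders, not CHIM defaults.
    """
    return {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
            "style": style,
            "speed": speed,
        },
    }


# With a v3 model, audio tags are written inline in the text (assumed tag).
payload = build_elevenlabs_payload("[whispers] Stay close.", model_id="eleven_v3")
print(json.dumps(payload, indent=2))
```

The sketch stops at building the body, so it runs without an API key or network access.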
OpenAI Text-to-Speech
Hosted OpenAI speech synthesis using the current audio speech endpoint. Useful if you want a direct OpenAI voice path instead of a separate cloud text-to-speech provider.
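For orientation, the sketch below shows the general shape of a request to the OpenAI audio speech endpoint. It is not CHIM's connector code: the helper name `build_speech_request`, the placeholder API key, and the default model and voice are assumptions for illustration. The request is only constructed here, not sent.

```python
import json
import urllib.request

OPENAI_SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, api_key, model="tts-1", voice="alloy"):
    """Build (but do not send) a POST request for the audio speech endpoint.

    Hypothetical helper for illustration; model and voice are example values.
    """
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        OPENAI_SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_speech_request("Well met, traveler.", api_key="sk-placeholder")
print(req.get_method(), req.full_url)
# Sending it with urllib.request.urlopen(req) would return the audio bytes,
# which requires a real API key and network access.
```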
Deepgram
Hosted Deepgram text-to-speech. It is still a valid current CHIM service, but it is more of an alternative pick than a main quickstart choice.


