STOBE Text-to-Speech Services

STOBE currently supports five text-to-speech options: three local voices and two hosted voice services. Start with a local voice, and choose a hosted service if you want cloud speech instead.

Recommended Local Options

PocketTTS

PocketTTS is the easiest local starting point. Use it if you want STOBE talking without setting up a more involved voice workflow.

Chatterbox

Chatterbox is a local voice option with a more expressive feel. Use it when you want local speech that sounds more expressive than the most basic setup.

STOBE XTTS

STOBE XTTS is the local voice-cloning option. Pick it if you want to work with cloned or more customized voices.

Hosted Services

Cartesia

Cartesia is a cloud text-to-speech option. Use it if you want a hosted voice service instead of running speech locally on your own machine.

Inworld

Inworld is STOBE's other hosted speech option. Use it if you want a cloud provider and prefer the Inworld voice platform.

How STOBE Uses Text-to-Speech

STOBE's local text-to-speech options are PocketTTS, XTTS, and Chatterbox. In the current server setup, those local providers normally point at the same local speech endpoint.

The hosted options are Cartesia and Inworld. Those use cloud accounts instead of a local speech server, so you configure them with their service credentials inside the STOBE server tools.
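The split between local and hosted providers can be pictured with a small Python sketch. This is only an illustration, not STOBE's actual configuration code: the endpoint URL, the lowercase provider keys, and the names `TTSTarget` and `resolve_target` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder: the real local speech endpoint is not stated here.
LOCAL_TTS_ENDPOINT = "http://127.0.0.1:5000/tts"

# Provider keys are illustrative spellings of the five options.
LOCAL_PROVIDERS = {"pockettts", "chatterbox", "xtts"}
HOSTED_PROVIDERS = {"cartesia", "inworld"}


@dataclass(frozen=True)
class TTSTarget:
    provider: str
    endpoint: Optional[str]    # set only for local providers
    needs_credentials: bool    # True only for hosted providers


def resolve_target(provider: str) -> TTSTarget:
    """Map a provider name to where its speech requests would go."""
    name = provider.strip().lower()
    if name in LOCAL_PROVIDERS:
        # All local providers share the same local speech endpoint.
        return TTSTarget(name, LOCAL_TTS_ENDPOINT, False)
    if name in HOSTED_PROVIDERS:
        # Hosted providers use a cloud account, so no local endpoint applies.
        return TTSTarget(name, None, True)
    raise ValueError(f"unknown TTS provider: {provider!r}")
```

In this sketch, switching between PocketTTS, Chatterbox, and XTTS never changes the endpoint, while picking Cartesia or Inworld signals that service credentials are required.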

STOBE also keeps fallback male and female voice choices for cases where an NPC does not have a more specific voice mapping yet.
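The fallback behavior can be sketched in Python as a lookup that prefers an NPC-specific voice and otherwise uses the male or female default. The voice IDs, the NPC names, and the `pick_voice` function are hypothetical and only illustrate the idea; STOBE's real voice mapping may be stored and resolved differently.

```python
from typing import Dict

# Hypothetical voice IDs standing in for STOBE's fallback voices.
FALLBACK_VOICES = {"male": "fallback_male", "female": "fallback_female"}


def pick_voice(npc_name: str, gender: str, voice_map: Dict[str, str]) -> str:
    """Return the NPC's mapped voice, or the fallback voice for its gender."""
    specific = voice_map.get(npc_name)
    if specific:
        return specific
    key = gender.strip().lower()
    if key not in FALLBACK_VOICES:
        raise ValueError(f"no fallback voice for gender: {gender!r}")
    return FALLBACK_VOICES[key]
```

For example, `pick_voice("Guard", "male", {})` returns the male fallback because the guard has no entry in the mapping, while an NPC with an entry always gets its own voice.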