STOBE Text-to-Speech Services
STOBE currently supports five text-to-speech options: three local voices (PocketTTS, Chatterbox, and XTTS) and two hosted services (Cartesia and Inworld). Start with a local voice, and choose a hosted service if you would rather use cloud speech.
Recommended Local Options
PocketTTS
PocketTTS is the easiest local starting point. Use it if you want STOBE talking without setting up a more involved voice workflow.
Chatterbox
Chatterbox is a local voice option with a more expressive feel. Use it when you want local speech that goes beyond the most basic setup.
STOBE XTTS
XTTS is STOBE's voice-cloning option. Pick it when you want to work with cloned or more customized voices.
Hosted Services
Cartesia
Cartesia is a cloud text-to-speech option. Use it if you want a hosted voice service instead of running speech locally on your own machine.
Inworld
Inworld is STOBE's other hosted speech option. Use it if you want a cloud provider and prefer the Inworld voice platform.
How STOBE Uses Text-to-Speech
STOBE's local text-to-speech options are PocketTTS, XTTS, and Chatterbox. In the current server setup, those local providers normally point at the same local speech endpoint.
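As a rough illustration of what "pointing at the same local speech endpoint" could look like, here is a minimal Python sketch. The URL, the provider keys, and the `resolve_endpoint` helper are all hypothetical placeholders, not STOBE's actual configuration or API.

```python
from typing import Dict, Optional

# Hypothetical placeholder URL; STOBE's real endpoint may differ.
LOCAL_TTS_ENDPOINT = "http://127.0.0.1:8000/tts"

# Provider keys are assumed spellings for illustration only.
LOCAL_PROVIDERS = ("pockettts", "xtts", "chatterbox")
HOSTED_PROVIDERS = ("cartesia", "inworld")


def resolve_endpoint(provider: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Return the endpoint for a local provider.

    Every local provider falls back to the shared endpoint unless an
    override is supplied for it.
    """
    key = provider.lower()
    if key in HOSTED_PROVIDERS:
        raise ValueError(f"{provider} is a hosted service and has no local endpoint")
    if key not in LOCAL_PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    return (overrides or {}).get(key, LOCAL_TTS_ENDPOINT)
```

With no overrides, `resolve_endpoint("PocketTTS")` and `resolve_endpoint("Chatterbox")` return the same address, which mirrors the shared-endpoint behavior described above.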
The hosted options are Cartesia and Inworld. Those use cloud accounts instead of a local speech server, so you configure them with their service credentials inside the STOBE server tools.
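The difference between local and hosted setup can be sketched as a simple credentials check. This is only an assumption-based example: the `api_key` field name, the settings layout, and the `missing_credentials` helper are invented for illustration and do not describe how the STOBE server tools actually store credentials.

```python
# Provider keys are assumed spellings for illustration only.
HOSTED_PROVIDERS = {"cartesia", "inworld"}

# Hypothetical credential field; the real services may require different ones.
REQUIRED_FIELDS = ("api_key",)


def missing_credentials(provider: str, settings: dict) -> list:
    """Return the required credential fields that are empty for a provider.

    Local providers return an empty list because, in this sketch, they
    need no cloud account.
    """
    key = provider.lower()
    if key not in HOSTED_PROVIDERS:
        return []
    provider_settings = settings.get(key, {})
    return [field for field in REQUIRED_FIELDS if not provider_settings.get(field)]
```

A check like this would report a missing field for Cartesia or Inworld until credentials are entered, and report nothing for PocketTTS, XTTS, or Chatterbox.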
STOBE also keeps fallback male and female voice choices for cases where an NPC does not have a more specific voice mapping yet.
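The fallback behavior can be pictured as a lookup with a default. The sketch below assumes a plain dictionary of NPC-to-voice mappings; the voice IDs, the `pick_voice` helper, and the NPC names are hypothetical and stand in for whatever STOBE uses internally.

```python
# Placeholder voice IDs; STOBE's real fallback voices are not named in this document.
FALLBACK_VOICES = {"male": "fallback_male", "female": "fallback_female"}


def pick_voice(npc_name: str, gender: str, voice_map: dict) -> str:
    """Return the NPC's specific voice, or the fallback voice for its gender."""
    specific = voice_map.get(npc_name)
    if specific:
        return specific
    key = gender.lower()
    if key not in FALLBACK_VOICES:
        raise ValueError(f"no fallback voice for gender: {gender}")
    return FALLBACK_VOICES[key]


# Example: one NPC has a specific mapping, the other does not.
voice_map = {"Innkeeper": "innkeeper_voice"}
mapped = pick_voice("Innkeeper", "female", voice_map)   # uses the specific mapping
unmapped = pick_voice("Guard", "male", voice_map)       # uses the male fallback
```

The specific mapping always wins; the fallback is only consulted when an NPC has no entry yet.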


