CHIM Image-to-Text

Image-to-text is the vision side of CHIM. In normal use, this mainly means Soulgaze: CHIM looks at a screenshot, reads what is on screen, and turns that into useful context for your game.

How Soulgaze Works

When you use Soulgaze, CHIM takes a screenshot and sends it to your chosen image-reading service. It also passes along helpful context like who is visible and where you are, so the result is more useful than a plain screenshot description.

The Soulgaze wheel gives you four main options:

Soulgaze: the standard screenshot read. Use this when you want CHIM to understand the current scene.
NPC Photo Zoom: takes a tighter portrait-style shot of the current NPC.
NPC Photo: takes a normal NPC photo without the zoomed framing.
Just Upload: uploads the image without the full Soulgaze presentation flow.

Finished images are saved into your Soulgaze gallery, so you can keep using the system as part of your broader roleplay setup.

Soulgaze Notes

Soulgaze works best when the screen clearly shows what you want CHIM to notice. If you turn on Soulgaze HD Mode, CHIM uses a higher-quality capture path for cleaner image reading, but it is heavier and VR users will usually want it off.

Recommended Image-to-Text Service

OpenRouter is the main recommended option. It is the easiest general-purpose choice if you want Soulgaze working without setting up your own local image service.

OpenRouter

Other Supported Image-to-Text Services

OpenAI is direct OpenAI image reading. It is a good fit if you already use OpenAI for the rest of your setup.

OpenAI

Google OpenAI uses Google's image-reading models. It is a good fit if you prefer Google's model family.

Google OpenAI

Custom is for people running their own image-reading service or using a special provider that is not one of the built-in defaults.

llama.cpp is a local self-hosted option for people who want image reading on their own machine.

llama.cpp