
Google has announced a preview rollout of Gemini 3.1 Flash TTS.
The new text-to-speech model focuses on improved voice quality, expressiveness, and developer control.
According to Google, the updated system produces more natural-sounding speech and supports over 70 languages.
It also introduces better handling of multi-speaker conversations and more detailed control over tone, pacing, and speaking style.

The model also introduces “audio tags,” which let developers adjust how speech is delivered by adding simple instructions inside the text. Tags can change emotion, speed, or emphasis during speech generation.
Google says the model has performed strongly in third-party evaluations, including human preference tests on the Artificial Analysis TTS leaderboard.
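
To give a rough idea of how inline audio tags could fit into a script, here is a minimal Python sketch. It only builds the text a developer might send to a TTS model and does not call any Google API. The square-bracket tag form, the tag names (`excited`, `whispering`, `slow`), and the helper `with_audio_tags` are assumptions made for illustration; the announcement does not spell out the actual syntax, so check Google's documentation before relying on it.

```python
def with_audio_tags(text: str, *tags: str) -> str:
    """Prefix a line of text with inline delivery tags.

    NOTE: the "[tag]" bracket form is a hypothetical stand-in,
    not the confirmed Gemini 3.1 Flash TTS syntax.
    """
    # Drop empty or whitespace-only tag names.
    cleaned = [t.strip() for t in tags if t.strip()]
    prefix = " ".join(f"[{t}]" for t in cleaned)
    return f"{prefix} {text}" if prefix else text


# Build a short script in which each sentence carries its own delivery hints.
script = " ".join([
    with_audio_tags("Welcome back to the show.", "excited"),
    with_audio_tags("Here is a secret.", "whispering", "slow"),
])
print(script)
```

Keeping the tags in a helper like this means that only one function needs to change if the real syntax turns out to differ.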

All audio generated by the system includes a hidden watermark using SynthID, which is designed to identify AI-generated audio.
The Gemini 3.1 Flash TTS model is rolling out in preview through the Gemini API and Google AI Studio, and is available to enterprises via Vertex AI. Google is also integrating it into Workspace tools such as Google Vids.

ManilaShaker is a tech media outlet that produces insightful and helpful content for our local and growing international audience. Our goal is to create a premier Philippine digital consumer electronics resource that provides the most objective reviews and comparisons globally.