WellSaid Labs, a leading artificial intelligence (AI) voice company, unveiled a new technology today that allows users to direct the performance of AI voices in a more natural, nuanced way. The technology, called HINTS (Highly Intuitive Naturally Tailored Speech), lets content creators shape AI voices by adding contextual annotations, such as tempo or loudness adjustments, much as a movie director guides an actor.
“We have long heard from our customers that they would like to have more direction in shaping our AI’s vocal outputs,” Michael Petrochuk, co-founder and CTO of WellSaid Labs, said in an exclusive interview with VentureBeat. “We wanted to develop a system that is intuitive and natural, that allows our model to predict natural performances based on the users’ production context, so that creatives can better see their artistic vision through.”
Meeting creative needs with AI
Unlike current methods of controlling AI voices through rigid markup languages or prompts, HINTS allows for fine-grained and interpolable adjustments. For example, users can adjust the tempo of a specific passage by exactly 0.7x or raise its loudness by 5 dB, and the AI voice responds naturally. Because the model is context-aware, annotations can be nested and layered across long scripts.
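To make those numbers concrete, here is a minimal Python sketch of how nested annotations might combine. It is an illustration only, not WellSaid's format or API: the `Annotation` class, its fields, and the `resolve` helper are hypothetical, and the rule that nested tempo values multiply while decibel values add is an assumption. The one standard piece is the decibel conversion, under which a 5 dB increase corresponds to roughly a 1.78x amplitude ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical span-level direction (not WellSaid's actual format)."""
    start: int                 # index of the first word covered (inclusive)
    end: int                   # index one past the last word covered
    tempo: float = 1.0         # relative tempo multiplier
    loudness_db: float = 0.0   # relative loudness change in decibels


def db_to_amplitude_ratio(db: float) -> float:
    """Standard conversion from a decibel change to an amplitude ratio."""
    return 10 ** (db / 20)


def resolve(word_index: int, annotations: list[Annotation]) -> tuple[float, float]:
    """Combine every annotation covering a word.

    Assumption: nested tempo multipliers multiply and dB changes add.
    """
    tempo = 1.0
    loudness_db = 0.0
    for a in annotations:
        if a.start <= word_index < a.end:
            tempo *= a.tempo
            loudness_db += a.loudness_db
    return tempo, loudness_db


# An outer annotation sets tempo for words 0-9; an inner one adds 5 dB to words 2-4.
script_annotations = [
    Annotation(start=0, end=10, tempo=0.7),
    Annotation(start=2, end=5, loudness_db=5.0),
]
```

With these annotations, `resolve(3, script_annotations)` picks up both the outer tempo and the inner loudness change, while `resolve(7, script_annotations)` picks up only the tempo. According to the company, the model predicts a natural performance from such directions rather than applying them as simple signal-level edits, so the sketch models only the bookkeeping, not the synthesis.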
“Because it is using actual (consensually obtained) human data to make its final audio outputs, its annotated verbalizations are just as ‘realistic’ as unannotated outputs,” Petrochuk told VentureBeat. “Interestingly, we discovered in this research that not only is the model able to model a single dataset effectively, it can be generalized even further and use performances from multiple speakers to inform its use of prosody. We were floored when we first heard this, and it seriously highlights what’s to come with further research.”
Expanding creative possibilities
HINTS addresses a longstanding need for more customizable and director-focused AI voice tools. The new architecture could unlock creative possibilities for voice-based content across audiobooks, training narrations, marketing videos, and more. Early evaluation shows improvements in accuracy and naturalness.
The research also emphasizes responsible and ethical AI practices. “We have been committed to ethical innovation from the start,” said Petrochuk. WellSaid obtains explicit consent from voice donors, protects privacy, and moderates content to prevent misuse or deception.
With vocal AI increasingly embedded in consumer tech and entertainment, HINTS demonstrates how the technology can become an empathetic storytelling medium, not just a vocal machine. While limitations remain compared with working with human talent, tools like HINTS bring us one step closer to truly expressive synthetic voices.