Imagine a customer support voicebot that speaks in your founder’s voice, a virtual coach that sounds like you, or an interactive character in a game that talks with a familiar tone. Personalized AI voice assistants built with realistic AI voice generators are transforming how we build more human-centered AI experiences for business, creative, or accessibility-focused applications.
In this post, we’ll walk you through how to build a real-time AI voicebot that uses your own cloned voice, powered by Cartesia’s advanced Text-to-Speech (TTS) technology and LiveKit Agents, a framework for deploying conversational voice agents.
The Role of a Voicebot’s Voice
Voice AI applications are typically composed of a cascade of three different AI models:
- A Speech Recognition Model – Converts the user’s spoken input into text in real time.
- A Natural Language Understanding and Response Generation Model – Interprets the user’s intent and generates an appropriate response.
- A Text-to-Speech (TTS) Model – Transforms the generated text into spoken audio to reply to the user.
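The cascade above can be sketched in a few lines of Python. This is only an illustrative sketch, not how LiveKit Agents implements it: the `VoicePipeline` class and the three `fake_*` functions are hypothetical stand-ins, and "audio" is represented as plain UTF-8 bytes so the example runs without any speech service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """One conversational turn: speech in, speech out."""

    transcribe: Callable[[bytes], str]  # speech recognition (STT)
    respond: Callable[[str], str]       # understanding + response generation (LLM)
    synthesize: Callable[[str], bytes]  # text-to-speech (TTS)

    def handle_turn(self, audio_in: bytes) -> bytes:
        # Each stage feeds the next, which is why this is called a cascade.
        text_in = self.transcribe(audio_in)
        text_out = self.respond(text_in)
        return self.synthesize(text_out)


# Hypothetical stand-ins: here "audio" is just UTF-8 encoded text.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_llm, fake_tts)
reply = pipeline.handle_turn(b"hello")
print(reply)
```

In a real voicebot each stage streams partial results to the next instead of waiting for a full turn, which is the kind of orchestration a framework such as LiveKit Agents takes care of.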
While all three models are essential, the Text-to-Speech (TTS) model plays a leading role in shaping the user experience: it’s the first thing users hear, and it defines how human or robotic your voicebot feels. The more lifelike and expressive the voice, the more likely users are to feel engaged and confident interacting with the bot.
Cartesia is a popular Text-to-Speech service. Their flagship model, Sonic, is built on a State Space Model architecture designed for low-latency, high-performance speech synthesis. This allows it to generate expressive, realistic speech in real time—ideal for interactive applications like voicebots.
Among their offerings is the ability to clone real voices that you can use in your voicebot, giving users a more familiar and human-sounding voice to interact with.
Prerequisites
To build a voicebot with your own voice like the one shown in this post, you need:
- A Cartesia account
- A Cartesia API Key
- API Keys for the Speech-to-Text model and Large Language Model of your choice, from those available in LiveKit Agents. We chose to use Deepgram Enterprise Voice AI and OpenAI gpt-4o-mini.
- A set of keys for LiveKit Cloud
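Before going further, it can help to confirm that all of these credentials are available to your agent. The sketch below assumes they are supplied as environment variables under the names conventionally used by the LiveKit, Deepgram, OpenAI, and Cartesia integrations; treat the names as assumptions and adjust them to your setup. The `missing_env_vars` helper is hypothetical.

```python
import os
from typing import Mapping

# Assumed variable names; change them if your setup differs.
REQUIRED_ENV_VARS = (
    "CARTESIA_API_KEY",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or blank in `env`."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing credentials:", ", ".join(missing))
    else:
        print("All credentials found.")
```

Running a check like this before starting the agent turns a confusing authentication failure mid-call into a clear message at startup.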
Cloning Your Voice
To make your voicebot sound like you, you’ll first need to clone your voice using Cartesia’s Instant Clone feature.
Get started by signing in to your Cartesia account and heading to the Instant Clone page. You’ll need to provide a sample of your voice to clone it: you can either record yourself directly in the web interface or upload a file containing a recording.
To ensure your cloned voice sounds natural and aligned with your voicebot’s tone, we recommend reading a script that resembles what you expect your bot to say and using the tone you expect the bot to have.
When possible, use a high-quality microphone and record in a quiet space. If this is not possible, you can check the Reduce Background Noise box. However, be aware that this may reduce the similarity of the clone to the source clip.
Next, set a name and a language for the voice and click on “Clone”. After the process is finished, note the ID of the newly created voice. You will need it when configuring the TTS service.
Configuring the TTS Service Using LiveKit Agents
Now it’s time to create our voicebot!
A great way to do this is by leveraging the LiveKit Agents framework, which simplifies the process of connecting the user to the bot. It handles the underlying complexity using the power of LiveKit Cloud—or your own self-hosted LiveKit deployment.
For this post, we use the code examples from the Voice AI quickstart, but configure the TTS service in LiveKit Agents to use the ID of the new voice.
In the AgentSession configuration, we configure tts as follows:
from livekit.agents import AgentSession
from livekit.plugins import cartesia

session = AgentSession(
    tts=cartesia.TTS(
        model="sonic-2",
        voice="<the-id-of-your-voice>",  # the ID noted after cloning
    ),
    # ... llm, stt, etc.
)
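If you would rather not hard-code the voice ID, one option is to read it from the environment and fail early when it is missing. The following is a minimal sketch under assumptions: the `CARTESIA_VOICE_ID` and `CARTESIA_TTS_MODEL` variable names and the `cartesia_tts_settings` helper are hypothetical, not part of LiveKit Agents or Cartesia.

```python
import os
from typing import Mapping

DEFAULT_MODEL = "sonic-2"  # the model used in this post


def cartesia_tts_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build keyword arguments for the TTS configuration from `env`.

    CARTESIA_VOICE_ID and CARTESIA_TTS_MODEL are hypothetical variable
    names chosen for this sketch.
    """
    voice_id = env.get("CARTESIA_VOICE_ID", "").strip()
    # Reject an empty value or a leftover "<...>" placeholder.
    if not voice_id or voice_id.startswith("<"):
        raise ValueError("Set CARTESIA_VOICE_ID to the ID of your cloned voice")
    model = env.get("CARTESIA_TTS_MODEL", "").strip() or DEFAULT_MODEL
    return {"model": model, "voice": voice_id}
```

The returned dictionary mirrors the `model` and `voice` arguments shown above, so in agent.py it could be passed as `cartesia.TTS(**cartesia_tts_settings())`.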
Next, follow the steps in the quickstart and run the agent.py file like this:
python agent.py dev
Then, open the LiveKit Agents Playground and select the project where you created the set of keys. Now you can talk to yourself without looking too crazy!
Voicebots with More Personalized Interactions
Cloning your voice for use in a real-time AI application unlocks a new level of personalized, engaging voicebot experiences. Whether you’re building a custom AI assistant, an interactive marketing tool, or an emotionally intelligent support bot, using your own voice adds a layer of authenticity that scripted voices can’t match.
As Voice AI technology continues to advance, platforms like Cartesia and LiveKit Agents make it easier than ever to deliver human-sounding, real-time voice interactions. These tools help bridge the gap between AI and human communication—creating more natural, immersive conversations with bots.
At WebRTC.ventures, we specialize in creating real-time, AI-powered voice applications that combine cutting-edge tools like Cartesia voice cloning and LiveKit Agents. We can help you deliver a voice-first experience that’s as unique as your business.
Contact WebRTC.ventures today to bring your voicebot idea to life. Let’s Make It Live!