The era of clunky, keypad-driven IVR systems that have long frustrated callers is ending. The future of Interactive Voice Response (IVR) is conversational, and it’s ready for prime time: according to Deepgram’s State of Voice AI 2025 report, 84% of business leaders are increasing their Voice AI budgets this year.

Our expert team at WebRTC.ventures built a Smart IVR Agent that serves as a blueprint for this future: a voice-first IVR system powered by LiveKit’s Voice AI technology, along with Deepgram for transcription, OpenAI for language understanding, and Cartesia for text-to-speech.

A Personalized Call Journey, from the First “Hello”

Picture calling a service and being recognized instantly. That’s the standard we aimed for. Here’s how a typical call with our Smart IVR agent unfolds:

  1. Instant Identification. When a call comes in over a traditional phone line, our system immediately looks up the caller by phone number.
  2. Warm Welcome for Returning Users. If you’ve called before, the agent greets you by name. No need to repeat your life story. It immediately asks how it can help, for example, “Welcome back, Jane! What department are you calling for today?”
  3. Seamless Onboarding for New Users. First-time caller? The agent politely gathers your information in a natural conversation.
  4. Intelligent Routing. The agent asks what you are calling about. Based on the response, it performs a seamless SIP transfer to the correct department’s line. If requested, it can also transfer the call directly to a human agent.
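
The four steps above can be sketched as two small decision functions. This is only an illustration of the flow, not the project's code: the `KNOWN_CALLERS` dictionary, the `DEPARTMENTS` set, the `Action` type, and the onboarding prompt are all hypothetical, and in the real agent the language model, not keyword matching, interprets what the caller asks for.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical caller directory keyed by phone number; a real system
# would query a database instead of an in-memory dict.
KNOWN_CALLERS = {"+15550100": "Jane"}

# Hypothetical departments the agent can route to.
DEPARTMENTS = {"billing", "support", "sales"}


@dataclass
class Action:
    kind: str          # "greet", "onboard", "transfer", or "clarify"
    detail: str = ""   # what to say, or where to transfer


def first_turn(phone_number: str) -> Action:
    """Steps 1-3: identify the caller and pick the opening line."""
    name: Optional[str] = KNOWN_CALLERS.get(phone_number)
    if name is not None:
        return Action(
            "greet",
            f"Welcome back, {name}! What department are you calling for today?",
        )
    return Action("onboard", "Welcome! May I have your name to get started?")


def route(requested: str) -> Action:
    """Step 4: map the caller's request to a transfer target."""
    wanted = requested.strip().lower()
    if wanted in ("human", "agent", "operator"):
        return Action("transfer", "human_agent")
    if wanted in DEPARTMENTS:
        return Action("transfer", wanted)
    # Unrecognized request: ask again rather than guess.
    return Action("clarify", "Sorry, which department do you need?")
```

A returning caller gets the `greet` action, an unknown number gets `onboard`, and `route(" Billing ")` resolves to a transfer to `billing`.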

This entire process is smooth, quick, and feels less like interacting with a machine and more like talking to a capable assistant.

Under the Hood: How the Smart IVR System Works

Our IVR Agent system is built on a modern, flexible tech stack, with the LiveKit Agent SDK at its core. The brain of our operation is a Python-based agent that orchestrates the entire conversational flow. 

LiveKit Agents don’t lock you into a specific provider, so you can change the STT, LLM, and TTS to match the needs of your use case. 

Here’s how it works and the models we chose for this project:

  • Speech-to-Text (STT): We used Deepgram’s Nova-3 model to transcribe the user’s speech in real-time.
  • Language Model (LLM): The transcribed text is sent to OpenAI’s GPT-4o-mini, which understands the user’s intent and decides on the next action.
  • Text-to-Speech (TTS): The agent’s response is generated by Cartesia’s Sonic model, creating a natural and human-like voice.
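
To illustrate why the three stages are easy to swap, here is a minimal, provider-agnostic sketch of one conversational turn. It does not use the LiveKit Agents API; `VoicePipeline` and the `fake_*` stages are hypothetical stand-ins, and a real pipeline streams audio rather than processing one blocking call per turn.

```python
from dataclasses import dataclass
from typing import Callable

# Each stage is just a callable, so any provider can be dropped in.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoicePipeline:
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def run_turn(self, caller_audio: bytes) -> bytes:
        """One conversational turn: audio in, audio out."""
        transcript = self.stt(caller_audio)
        reply_text = self.llm(transcript)
        return self.tts(reply_text)


# Stand-in stages for illustration only; real ones would call a
# transcription, chat-completion, and speech-synthesis service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
```

Because the pipeline depends only on the three call signatures, replacing one provider means passing a different callable, with no change to `run_turn`.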

The real power lies in how the LiveKit Agent connects these pieces. Using Function Tools, our agent can interact with our FastAPI backend to perform actions like looking up a user in the PostgreSQL database, registering a new user, or initiating a call transfer. This makes the agent not just a conversationalist, but a fully integrated component of our application logic.
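
The general idea behind function tools can be sketched in plain Python: the model emits a tool name plus arguments, and the agent dispatches that request to ordinary application code. The registry, the `tool` decorator, the JSON shape, and the in-memory `USERS` store below are assumptions made for illustration; they are not LiveKit's API, and the real tools would call the FastAPI backend and PostgreSQL instead.

```python
import json
from typing import Callable, Dict

# In-memory stand-in for the user table (hypothetical).
USERS: Dict[str, dict] = {}

# Registry of callable tools, keyed by function name.
TOOLS: Dict[str, Callable[..., dict]] = {}


def tool(fn):
    """Register a function so the model can request it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def lookup_user(phone_number: str) -> dict:
    user = USERS.get(phone_number)
    return {"found": user is not None, "user": user}


@tool
def register_user(phone_number: str, name: str) -> dict:
    USERS[phone_number] = {"name": name}
    return {"registered": True}


@tool
def transfer_call(department: str) -> dict:
    # A real implementation would ask the telephony layer to transfer.
    return {"transferring_to": department}


def dispatch(tool_call_json: str) -> dict:
    """Run the tool requested by the model and return its result."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return {"error": f"unknown tool: {call['name']}"}
    return fn(**call.get("arguments", {}))
```

The result of `dispatch` would be fed back to the model so it can decide what to say or do next.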

Smart IVR system architecture - enabling AI-powered call intelligence.

Why LiveKit is a Game-Changer for Voice AI

Building a conversational IVR presents unique challenges, and this is where LiveKit truly shines.

  1. Low Latency is King. In a real conversation, timing is everything. LiveKit is built from the ground up for real-time communication, ensuring that the back-and-forth between the user and the AI agent is fluid and natural, minimizing the latency that plagues so many voice bots.
  2. Unmatched Flexibility and Control. You have complete control to plug in any STT, LLM, or TTS service you choose. Want to switch from OpenAI to a fine-tuned open-source model? No problem. This flexibility allows you to build the best possible experience without being tied to a single vendor’s ecosystem.
  3. Lower Costs, No Black Boxes. By giving you the freedom to choose your components, LiveKit allows you to manage costs effectively. You can select more affordable models or even self-hosted solutions, avoiding the opaque pricing of all-in-one platforms.
  4. Seamless Telephony Integration. Our project proves how easy it is to bridge the gap between traditional telephony and modern VoIP. LiveKit’s SIP integration allows us to receive calls from the public telephone network (via Twilio) and transfer them back out. This is critical for any real-world IVR solution.
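
As a small illustration of the outbound side, the sketch below turns a department name into a `tel:` URI that a SIP transfer request could use as its destination. The department numbers are made up, and whether a given telephony layer accepts a `tel:` URI (rather than a `sip:` address) is an assumption to check against its documentation.

```python
import re

# Hypothetical department lines; real numbers would come from configuration.
DEPARTMENT_LINES = {
    "billing": "+15550101",
    "support": "+15550102",
}

# E.164: a plus sign, then up to 15 digits, the first of which is not 0.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")


def transfer_target(department: str) -> str:
    """Return a tel: URI for the department's line.

    Raises KeyError for an unknown department and ValueError if the
    configured number is not in E.164 format.
    """
    number = DEPARTMENT_LINES[department]
    if not E164.match(number):
        raise ValueError(f"not an E.164 number: {number}")
    return f"tel:{number}"
```

Validating the number before attempting the transfer keeps a configuration mistake from surfacing as a failed call mid-conversation.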

Conversational IVR is the Future

Traditional IVR systems have served their purpose. Today’s callers expect conversational experiences that respect their time and intent.

Our AI-Powered IVR Agent shows that:

  • Conversational Voice AI can work in production today
  • It integrates easily with existing telephony systems
  • It delivers a smoother, more human experience

How WebRTC.ventures Can Help

At WebRTC.ventures, we build production-ready Voice AI systems tailored to your needs. Our team can help you:

  • Build custom conversational IVR systems with caller recognition
  • Enable intelligent call routing using real-time AI
  • Integrate with your CRM and backend systems
  • Deploy SIP telephony bridges to modernize your infrastructure
  • Support multi-language voice experiences for global audiences

Contact the WebRTC.ventures team to discuss how we can transform your customer service experience with LiveKit’s Voice AI technology.

WebRTC.ventures is a LiveKit development partner.
