A voice bot, also known as a voice assistant, is a type of bot that combines automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS) technology to interact with users by voice. Voice bots respond to spoken requests and provide information or perform tasks in a conversational manner.

One key difference between voice bots and regular chatbots is the way they interact with users. Voice bots use speech recognition to understand spoken requests and reply with synthesized speech, while chatbots interpret and reply to written text. As a result, a voice bot can offer an experience that feels closer to a natural conversation.

We recently worked with a client who was transitioning from a traditional live agent to a voice bot to handle customer service for their teletherapy application.

Why use a voice bot for your WebRTC application? 

  1. Uninterrupted service. The bot will be attending calls 24/7.
  2. User satisfaction. Callers are connected to the bot immediately instead of waiting for an operator to become available. The bot can answer frequently asked questions, and the caller can easily be transferred to an agent when needed.
  3. Convenience. No need to have operators available all day. Also, the operator can work on other tasks instead of being busy handling multiple calls every day and answering repetitive questions.
  4. Personalization. Train the bot to respond to specific needs.
  5. Accessibility. Users can ask the voice bot questions and receive immediate responses or access relevant information from services hands-free.
  6. Multi-lingual. The bot can interact in multiple languages.
  7. Highly trainable. Add more functionality as you need it.

Our Client’s Previous Implementation

Callers were distributed across multiple countries and regions. With hundreds of potential callers, keeping track of who each caller needed to reach could become complicated. Generally, several operators were on duty taking calls. An operator talked to the caller and asked who they wanted to reach, then performed a warm transfer to the appropriate agent. If the agent was not available, the operator sent the caller to voicemail to leave a message.
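
The routing decision in that flow is simple enough to sketch in a few lines. The Python below is only an illustration, not our client's system: the `Agent` class, the `route_call` function, the sample directory, and the "ask_again" outcome for an unknown name are all assumptions we made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    available: bool


def route_call(requested_name: str, directory: dict[str, Agent]) -> str:
    """Return the action an operator (or a bot) would take for this caller."""
    agent = directory.get(requested_name.strip().lower())
    if agent is None:
        # Hypothetical outcome: the name is unknown, so ask the caller again.
        return "ask_again"
    if agent.available:
        # Warm transfer: the caller is introduced before being handed over.
        return f"warm_transfer:{agent.name}"
    # The agent exists but cannot take the call right now.
    return f"voicemail:{agent.name}"


# Hypothetical directory keyed by lowercase name.
directory = {
    "dana": Agent("Dana", available=True),
    "lee": Agent("Lee", available=False),
}

print(route_call("Dana", directory))
print(route_call("lee ", directory))
```

Keeping the decision in one small function makes it easy to reuse once a bot, rather than a person, is the one asking who the caller wants to reach.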

Voice Bot Implementation

Our client trained the voice bot on frequently asked questions and their answers, as well as on common requests such as asking to be connected to an agent by providing a name or some other identification. They are also adding a language selector so that callers can ask questions and receive answers in other languages.

Initially the bot was trained as a chatbot, but our client wanted to offer the same capability, and more, to callers.
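
To make the idea of "questions and requests" concrete, here is a deliberately naive sketch of how a transcribed utterance might be mapped to an intent. A trained bot would use a real language understanding model. The keyword matching, the `handle_utterance` function, the regular expression, and the placeholder FAQ answers below are all our own assumptions for illustration.

```python
import re

# Placeholder FAQ content, invented for this example.
FAQ_ANSWERS = {
    "hours": "Our support line is open around the clock.",
    "reset password": "You can reset your password from the login screen.",
}

# Matches phrases like "connect me to Dana" or "speak with Lee".
TRANSFER_PATTERN = re.compile(
    r"\b(?:connect me to|speak (?:to|with)|talk (?:to|with))\s+"
    r"(?P<name>[a-z][a-z' -]*)",
    re.IGNORECASE,
)


def handle_utterance(text: str) -> dict:
    """Classify a transcribed utterance as a transfer, an FAQ, or a fallback."""
    match = TRANSFER_PATTERN.search(text)
    if match:
        agent = match.group("name").strip().title()
        return {"intent": "transfer", "agent": agent}
    lowered = text.lower()
    for keyword, answer in FAQ_ANSWERS.items():
        if keyword in lowered:
            return {"intent": "faq", "answer": answer}
    # Nothing matched: the bot should ask the caller to rephrase.
    return {"intent": "fallback"}


print(handle_utterance("I'd like to speak with Lee, please"))
print(handle_utterance("What are your hours?"))
```

A language selector would sit in front of a function like this and choose which set of patterns and answers to use.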

High Level Flow Example

Diagram Legend:
TTS: Text-to-Speech. Technology that converts written text into spoken words.
ASR: Automatic Speech Recognition (speech-to-text). Technology that converts spoken words into written text.
SLU: Spoken Language Understanding. A subfield of natural language processing (NLP) that focuses on understanding and extracting meaning from spoken language.
LLM: Large Language Model. A language model, trained on large amounts of text, that predicts the likelihood of a sequence of words or phrases and can use those predictions to generate responses.

One approach that might be helpful in standardizing your solution is to leverage existing protocols built for managing media in voice applications. MRCP (Media Resource Control Protocol) is used for managing media resources in voice and speech applications. It enables communication between application servers and media servers for tasks such as speech recognition and synthesis. 
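
For a feel of what MRCP traffic looks like, MRCPv2 (RFC 6787) messages are plain text: a start line carrying the protocol version, the total message length, a method name, and a request ID, followed by headers and an optional body. The sketch below builds a SPEAK request by hand. The `build_mrcp_request` helper is our own invention, and the channel identifier is a placeholder, since in a real deployment it is assigned during session setup. Treat this as an illustration of the framing, not a working client.

```python
def build_mrcp_request(
    method: str,
    request_id: int,
    channel_id: str,
    body: str,
    content_type: str = "text/plain",
) -> bytes:
    """Frame an MRCPv2 request: start line, headers, blank line, body."""
    body_bytes = body.encode("utf-8")
    headers = (
        f"Channel-Identifier: {channel_id}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body_bytes)}\r\n"
        "\r\n"
    ).encode("ascii")
    # The message length in the start line counts the whole message,
    # including the start line itself, so iterate until it stops changing.
    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} {method} {request_id}\r\n".encode("ascii")
        total = len(start_line) + len(headers) + len(body_bytes)
        if total == length:
            return start_line + headers + body_bytes
        length = total


# Placeholder channel identifier for a speech synthesizer resource.
message = build_mrcp_request(
    "SPEAK", 1, "32AECB23433802@speechsynth", "Hello, how can I help you?"
)
print(message.decode("utf-8"))
```

In practice you would rely on an existing MRCP client library or a media server with MRCP support rather than framing messages yourself.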

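Once that media connection is established, the media server can forward the caller's audio to an ASR service, pass the resulting text to an SLU or LLM service that produces a response, and run that response through TTS so the resulting audio can be forwarded back to the IP-PBX.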
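
Stripped of the networking, a single conversational turn in this flow is a chain of three steps. The sketch below shows that shape with stand-in functions. `run_turn` and the `fake_*` services are hypothetical names of ours, and the fake services simply pass text around as bytes so the example runs without any speech provider.

```python
from typing import Callable


def run_turn(
    caller_audio: bytes,
    asr: Callable[[bytes], str],
    understand: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """One conversational turn: caller audio in, bot audio out."""
    transcript = asr(caller_audio)       # speech to text
    reply_text = understand(transcript)  # SLU or LLM decides what to say
    return tts(reply_text)               # text back to speech


# Stand-in services so the sketch runs without real providers.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_understand(transcript: str) -> str:
    if "agent" in transcript.lower():
        return "Connecting you to an agent."
    return "Sorry, could you repeat that?"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


reply_audio = run_turn(b"I need an agent", fake_asr, fake_understand, fake_tts)
print(reply_audio)
```

Passing the three services in as parameters keeps the turn logic the same whether they are reached through MRCP or called directly.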


If you prefer something more custom, you can also join the call directly as an additional participant, capture the media, and forward the RTP to the third-party service of your choice, or even to a custom bot service built in-house.
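
If you go the custom route, your service will at some point need to pull the audio payload out of each RTP packet before handing it to a speech service. The sketch below parses the fixed RTP header defined in RFC 3550. The `RtpPacket` class and `parse_rtp` function are names we chose for illustration, and a production service would normally lean on an existing media library instead.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpPacket:
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    marker: bool
    payload: bytes


def parse_rtp(packet: bytes) -> RtpPacket:
    """Split an RTP packet into its header fields and media payload."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    has_padding = bool(first & 0x20)
    has_extension = bool(first & 0x10)
    csrc_count = first & 0x0F
    # Skip the optional list of contributing sources (4 bytes each).
    offset = 12 + 4 * csrc_count
    if has_extension:
        # Extension header: 16-bit profile id, 16-bit length in 32-bit words.
        _, ext_words = struct.unpack("!HH", packet[offset:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet)
    if has_padding:
        # The last byte holds the number of padding bytes to drop.
        end -= packet[-1]
    return RtpPacket(
        payload_type=second & 0x7F,
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
        marker=bool(second & 0x80),
        payload=packet[offset:end],
    )


# A hand-built packet: version 2, payload type 0 (PCMU), 160 payload bytes.
sample = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234) + b"\xff" * 160
parsed = parse_rtp(sample)
print(parsed.payload_type, parsed.sequence_number, len(parsed.payload))
```

The sequence number and timestamp are what let the receiving side reorder packets and rebuild a continuous audio stream before sending it to ASR.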


Artificial Intelligence (AI) is moving out of science fiction stories and into something we use every day. Voice bots are just one example.

Here at WebRTC.ventures, we’re combining WebRTC and AI to create more intelligent and personalized live video applications that offer a competitive edge to businesses large and small. Contact us today and let’s take your application to the next level!
