Should Contact Centers Develop Custom AI Solutions for LLM Features?

Contact centers are constantly seeking ways to enhance customer experience and operational efficiency. Large Language Models (LLMs) have emerged as powerful tools capable of transforming how businesses interact with their customers. From streamlining support processes to personalizing interactions, LLMs offer a wealth of possibilities for contact centers looking to stay ahead of the curve. But do they need to develop custom AI solutions to take advantage of the benefits of LLMs?

In this post, we cover the challenges of leveraging commercial LLMs and take a look at the potential of building custom AI solutions powered by open-source LLMs like Meta Llama or Google’s Gemma Open Models for call center applications. We’ll explore the compelling reasons behind this approach, the architectural considerations, and the deployment options available.

We will also introduce Conectara, a “middle ground” approach: a cloud contact center solution that offers all the LLM-based features you need to improve the experience of your customers, without the heavy development costs and ongoing maintenance, and with enough safeguards to ensure that the data is well protected.

The Importance of LLMs for Contact Centers

Contact centers serve millions of customers daily with a wide variety of services that include technical support and product or service inquiries. Customer service agents often deal with complex troubleshooting procedures that require them to navigate extensive knowledge bases in order to effectively resolve customer issues.

Each of these interactions provides valuable information that can be used both to evaluate the agent and to make data-driven decisions to improve processes and increase customer satisfaction. But effectively managing the high volume of data is not an easy task, and in some cases is not even humanly possible!

In a previous post, we talked about how Generative AI and LLMs greatly enhance customer experience by providing agents and managers not only with the tools to effectively handle customer inquiries, but also with the means to efficiently process the information generated by each interaction to ensure compliance and improve quality.

Features that aid in this work include:

  • Call summarizations
  • Sentiment analysis
  • Compliance analysis
  • In-call agent assistance

Why Build a Custom AI Solution?

GPT-4 and Claude 3 are great examples of commercial LLMs that contact centers can use to add AI features to their systems. These are hosted by private vendors that make the LLM available through a convenient Application Programming Interface (API). Imagine being able to summarize calls or instantly analyze customer interactions for hidden trends and post-call insights by just writing a couple lines of code!
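
To give a sense of how little code that can take, here is a minimal sketch using the OpenAI Python client; the model name, prompt, and transcript are illustrative placeholders rather than a production setup:

```python
# Minimal sketch: summarizing a call transcript via a commercial LLM API.
# Assumes OPENAI_API_KEY is set in the environment; the transcript is a placeholder.
from openai import OpenAI

client = OpenAI()

transcript = "Agent: Thanks for calling support... Customer: My router keeps dropping the connection..."

response = client.chat.completions.create(
    model="gpt-4",  # or any other commercial model available through the API
    messages=[
        {"role": "system", "content": "Summarize the following support call in three bullet points."},
        {"role": "user", "content": transcript},
    ],
)
print(response.choices[0].message.content)
```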

However, all the information about the call will also be sent to these vendors.

For those uncomfortable with or unable to share that information, it makes sense to leverage an open-source LLM like Llama 3, Phi 3, or Gemma 2, which can be hosted independently or through a trusted partner, and build a custom AI system around it.

What Does a Custom AI Solution for a Contact Center Look Like?

A custom AI system that supports LLM-based features for contact centers should have:

  • Low latency: Features like In-Call Agent Assistance require interacting with the AI system in real time or near real time; otherwise, the reply could come too late and lead to a bad user experience.
  • Access to a custom knowledge base: In the same vein, the AI system should be proficient in the product or service it supports in order to provide accurate and relevant suggestions.
  • High availability and reliability: It’s crucial that the system runs on top of production-ready tools designed to guarantee high availability and reliability.
  • Security and privacy considerations: Customer interactions usually involve sensitive information such as identification numbers, addresses and contact details. It’s important that any AI system that processes such data is secured properly and follows strict privacy guidelines.

In terms of functionality, custom AI solutions should offer the following capabilities:

  • Knowledge ingestion and retrieval
  • User queries & response generation

Let’s explore these in detail.

Knowledge Ingestion and Retrieval

The first step is to provide a way to ingest the required knowledge into the system; that knowledge is later retrieved to provide the enhanced context that enables the LLM to generate an appropriate response. This technique is known as Retrieval-Augmented Generation (RAG).

This is a two-step process, as follows:

  • Transform the information into a numeric representation, known as an embedding, using a separate AI model known as an embedding model.
  • Store the embeddings in a special type of database known as a vector database.

The process is depicted in the picture below.

Depiction of a Knowledge Ingestion Service that creates embeddings from multiple sources and stores them in a vector database
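
As a minimal sketch of those two steps, the snippet below assumes the open-source sentence-transformers library as the embedding model and Chroma as the vector database; any equivalent embedding model and vector store would work the same way:

```python
# Minimal sketch of knowledge ingestion, assuming sentence-transformers
# for embeddings and Chroma as the vector database.
import chromadb
from sentence_transformers import SentenceTransformer

documents = [
    "To reset the router, hold the reset button for 10 seconds.",
    "Refunds are processed within 5 business days.",
]

# Step 1: transform the information into embeddings using an embedding model.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(documents).tolist()

# Step 2: store the embeddings in a vector database.
client = chromadb.Client()
collection = client.create_collection("contact_center_kb")
collection.add(
    ids=[f"doc-{i}" for i in range(len(documents))],
    documents=documents,
    embeddings=embeddings,
)
```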

User Queries & Response Generation

Next is the interface for querying the LLM. Such an interface builds a prompt that combines the user query with relevant context retrieved from the custom knowledge base and passes it to the LLM, as shown below.

Depiction of a Query & Response Generation Service that takes user queries from a customer interaction and sends them, along with an enhanced context, to an LLM.
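
Continuing the sketch from the ingestion example (and reusing its embedder and collection), the query side could look like the following; the endpoint and model name are placeholders for whichever OpenAI-compatible, self-hosted LLM you end up running, with deployment options covered in the next section:

```python
# Minimal sketch of retrieval plus prompt construction, reusing the
# embedder and collection objects from the ingestion sketch above.
from openai import OpenAI

def answer(query: str) -> str:
    # Retrieve the most relevant knowledge for this query.
    query_embedding = embedder.encode([query]).tolist()
    results = collection.query(query_embeddings=query_embedding, n_results=2)
    context = "\n".join(results["documents"][0])

    # Build a prompt that combines the enhanced context with the user query.
    prompt = (
        "Answer the agent's question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

    # Placeholder endpoint for a self-hosted, OpenAI-compatible LLM server.
    llm = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
    response = llm.chat.completions.create(
        model="llama3",  # whichever open-source model the server exposes
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(answer("How long do refunds take?"))
```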

Alternatives for Running Open Source LLMs

One of the most challenging parts is how to run the LLM effectively. Like other machine learning models, LLMs require a considerable amount of resources, mostly in the form of Random Access Memory (RAM) and Graphics Processing Unit (GPU) capacity, leading to the need for specific hardware.
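
As a rough illustration, here is a back-of-the-envelope estimate of the memory needed just to hold a model's weights; real deployments also need headroom for activations, the KV cache, and the serving stack:

```python
# Back-of-the-envelope estimate of the memory required to hold model weights.
def weight_memory_gb(parameters_billion: float, bytes_per_param: float = 2.0) -> float:
    return parameters_billion * 1e9 * bytes_per_param / 1024**3

print(f"8B model @ FP16:   ~{weight_memory_gb(8):.0f} GB")       # ~15 GB
print(f"70B model @ FP16:  ~{weight_memory_gb(70):.0f} GB")      # ~130 GB
print(f"8B model @ 4-bit:  ~{weight_memory_gb(8, 0.5):.0f} GB")  # ~4 GB
```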

You also need a way to deploy and run your chosen LLM. There are a couple of ways to do this, as explored below.

NVIDIA NIM

NVIDIA Inference Microservices (NIM) makes it easy to deploy AI models at scale. It offers optimized, cloud-native microservices powered by a wide range of AI models that include LLMs and domain-specific models. Such microservices can be deployed across different platforms, including cloud, on-premise data centers and GPU workstations.

The microservices are distributed as Docker containers that can be deployed in production-ready environments like Kubernetes. Each exposes an industry-standard API that allows developers to integrate the AI models into their applications with just a few lines of code, using existing libraries. NIM also implements optimizations on top of the models that improve performance and reduce latency.
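
As a sketch of what that integration might look like, the snippet below calls a NIM container through its OpenAI-compatible API using the standard OpenAI Python client; the host, port, and model name depend on the specific deployment:

```python
# Minimal sketch: calling a locally deployed NIM container through its
# OpenAI-compatible API. Host, port, and model name are deployment-specific.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # model served by the NIM container
    messages=[{"role": "user", "content": "Summarize this call: ..."}],
)
print(response.choices[0].message.content)
```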

Tools For Running LLMs Locally

Tools like llama.cpp and Ollama (which runs on top of the former) allow developers to run LLMs locally with ease. They provide a simple interface to pull and manage a wide range of LLMs, and they can also run as a system service that exposes web endpoints.

These tools also support the OpenAI API (for Ollama, this is experimental), which allows them to integrate with existing solutions that work with commercial LLMs like GPT-4 without having to rewrite the whole integration.
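
For instance, the commercial-API snippet shown earlier could be pointed at a local Ollama service by changing little more than the base URL; this sketch assumes the model has already been pulled with ollama pull llama3:

```python
# Minimal sketch: reusing the OpenAI client against a local Ollama service.
# Assumes Ollama is running and the model was pulled with: ollama pull llama3
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # key is ignored by Ollama

response = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Classify the sentiment of this call transcript: ..."}],
)
print(response.choices[0].message.content)
```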

However, as these tools were originally developed to run models on consumer devices (i.e., developer workstations), they might require additional work to be used as the backbone of production applications.

Cloud Managed Services

Some vendors offer the ability to run open-source LLMs as a managed service.

Amazon Bedrock is a great example. It is a managed Generative AI platform with support for various foundation models like Llama 3 and Mistral Large. Bedrock provides a convenient way to experiment with multiple GenAI models and to build production-ready applications without having to worry about provisioning the underlying infrastructure.
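
A minimal sketch of invoking an open-source model on Bedrock through the AWS SDK for Python (boto3) and the Converse API might look like the following; the model ID and region are illustrative and depend on what is enabled in your account:

```python
# Minimal sketch: invoking an open-source model on Amazon Bedrock.
# Model ID and region are illustrative; check what is enabled in your account.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="meta.llama3-8b-instruct-v1:0",
    messages=[{"role": "user", "content": [{"text": "Summarize this call: ..."}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```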

On top of that, Amazon Bedrock provides a comprehensive set of tools to protect your data, helping you comply with common standards including ISO, SOC, and CSA STAR Level 2; it is also HIPAA eligible and can be used in compliance with GDPR.

The Goldilocks Principle

It’s undeniable that leveraging LLMs for adding AI features to your call center solution brings important benefits and makes life easier for agents, managers, and ultimately, for customers who gain an enhanced experience.

Building a custom AI solution to support these features allows contact centers to control the way their applications interact with the LLM, but there is the increased complexity of managing the required resources. Fortunately, tools like NVIDIA NIM, Ollama and llama.cpp mentioned above make it easier to manage and run open-source LLMs. Cloud offerings such as Amazon Bedrock provide a managed option to achieve the same, while also giving tools to retain control over the data.

There are circumstances where a custom approach is the right choice, and also times when a commercial service will do just fine. There is also a third option that may be ‘just right!’

Enter Conectara, which offers all the LLM-based features you need to improve the experience of your customers without the heavy development costs and ongoing maintenance. Our solution for modernizing contact and call center operations builds on top of Amazon Connect, which employs encryption at rest and in transit, along with granular access controls and secure storage options. This ensures that your data is protected by robust security measures.

Our team at WebRTC.ventures is a pioneer in custom and semi-custom real-time communication, contact center, and AI solutions. Explore conectara.ventures today and see how we make LLM-powered interactions effortless. Or, reach out to us for custom development work.
