At TADSummit in Paris on October 19, 2023, I had the pleasure of moderating a panel discussion on the role of AI in Video and Voice communication applications. In this blog post, I'll summarize our discussion and conclude with a few of the key insights I drew from it.
If you want to see a more general overview of the conference and the TADSummit experience, check out my video update from TADSummit in Paris. You can also watch all of the TADSummit presentations for free on the TADSummit YouTube channel.
Video
We live streamed the panel to the WebRTC.ventures webinar series, WebRTC Live. You can watch the complete panel discussion here:
Introductory Remarks and Panelist Introductions
I opened the session by noting that we all already understand that AI has the potential to dramatically impact our industry and the way we interact with our customers and users. From audio transcriptions to AI assistants during a call, the possibilities seem limitless right now!
What is the role of AI in video and voice applications, and what concerns come with its capabilities? These are the questions that brought together a variety of industry players for this discussion. I was very happy to assemble an excellent collection of individuals, all in person in Paris!
I’m the founder and CEO of WebRTC.ventures. If you’re reading this blog post, you probably already know that we are a design and development agency focused on building live video applications. We are increasingly also integrating those applications with conversational AI.
After briefly introducing the panel topic, I asked each panelist to introduce themselves and briefly mention how they're using AI today across their wide range of experience.
Our Panelists
Paul Sweeney is the Chief Strategy Officer and Co-Founder of Webio, a conversation technology company based in Ireland and focused on credit collection in the European market. For debt collection agencies or banks, they provide a very focused service to help them use conversational AI to efficiently contact customers to help them establish payment plans. Due to the financial nature of their work, they have a strong commitment to privacy and security.
Lorenzo Miniero is the Chairman of Meetecho and author of the open source Janus WebRTC Server. Lorenzo noted that Meetecho, which provides consultancy and development around Janus, has recently hired an AI lead. They have already incorporated AI for things like transcriptions, and they look to expand that type of capability with their newest hire.
Romain Valleux is the DevRel & Partnership Manager at Apizee (pronounced "App E-Z"), a CPaaS for video-based customer service, built on WebRTC and communications technologies more broadly. They are very focused on customer engagement solutions and have been in business for a decade.
Pieter Luitjens is Co-Founder and CTO at Private AI, a Canadian company that provides data redaction for AI services globally. Pieter noted that if you've used Deepgram or AssemblyAI, then you've probably used their service as well. Pieter's background is in machine learning and large-scale deployment.
Paula Osés is an AI Engineer at Noumena, a computer vision firm based in Barcelona. They use computer vision to understand how humans interact with different spaces such as offices, public transportation, and public spaces.
AI in Video and Voice Application Use Cases
Lorenzo began the use case discussion by talking about what Janus users are doing with AI. Several started a few years ago, with the main use case being transcription. Transcription is really the building block of many other AI uses in communication applications. Janus users have also built sentiment analysis and identity verification solutions, and the role of the Meetecho team has been to facilitate those applications with their server for video and audio. Janus has a very flexible architecture which allows users to come up with creative AI implementations, such as one customer who used real time video manipulation to turn participants into live Van Gogh paintings.
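To make the "building block" idea concrete, here is a minimal sketch (not Meetecho's or any customer's actual implementation) of how a recorded audio segment from a call might be transcribed and then passed to a downstream feature like sentiment analysis. The model choices and file name are illustrative assumptions, using the open source openai-whisper and transformers packages.

```python
# A minimal sketch: transcribe a recorded call segment, then run sentiment
# analysis on the transcript. Assumes the openai-whisper and transformers
# packages are installed; the file name is hypothetical.
import whisper
from transformers import pipeline

# Load a small speech-to-text model and transcribe an audio segment
# recorded from the call (e.g., a per-participant WAV file).
stt_model = whisper.load_model("base")
transcript = stt_model.transcribe("call_segment.wav")["text"]

# The transcript becomes the input for downstream AI features,
# such as sentiment analysis of what was said.
sentiment = pipeline("sentiment-analysis")
print(transcript)
print(sentiment(transcript[:512]))  # truncate long transcripts for the classifier
```

The same transcript could just as easily feed identity verification, summarization, or any of the other use cases Lorenzo mentioned.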
Romain talked about use cases they have seen in the Apizee customer base, noting a couple of fields where their customers are using AI. For example, in product support calls with video enabled, AI is used to analyze the video stream and match it against a database of existing product photos, helping to identify what is or isn't working for the customer.
Apizee also has a client who has used AI and remote video to monitor a factory production line and look for defects as the product is being built. I noted how these are great examples of AI being put to practical use beyond virtual backgrounds and funny hats.
Paul talked about how Webio uses AI in their text messaging platform for credit collections, which is a very targeted application. The biggest problem with credit collection conversations is that people don't engage with resolving their credit issues, and they get into more trouble as their problems pile up. Webio has seen that SMS messaging is actually a very good engagement modality for credit conversations because customers can respond at their own pace, which takes some of the emotion out of what can be a stressful conversation. Customers can take time to consider their options and then respond. Credit collections is an interesting context where richer real-time conversation technology like voice would not be helpful, because it forces the conversation to happen at a faster pace.
With text-based conversational AI, you can do better parsing of the message to understand the customer's context and current situation, and respond appropriately. For instance, if a customer says they are at the hospital right now and cannot talk, the bot can understand that and respond empathetically. Strong conversational design is very important and will lead to better outcomes when you later present payment plan options.
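As a purely hypothetical sketch of that kind of message parsing (this is not Webio's implementation), an inbound SMS could be run through a zero-shot classifier to detect hardship before any payment-plan prompt is sent. The labels and canned replies below are illustrative assumptions.

```python
# Hypothetical sketch: classify an inbound SMS so the bot can respond
# empathetically to hardship instead of pushing payment options.
from transformers import pipeline

classifier = pipeline("zero-shot-classification")

message = "I'm at the hospital right now, I can't deal with this today."
labels = ["personal hardship", "ready to discuss payment", "dispute", "general question"]

result = classifier(message, candidate_labels=labels)
top_intent = result["labels"][0]  # labels come back sorted by score

# Route to an empathetic holding response rather than a payment-plan prompt.
if top_intent == "personal hardship":
    reply = "I'm sorry to hear that. We can pick this up whenever you're ready."
else:
    reply = "Thanks for your message. Would you like to look at payment plan options?"
print(reply)
```

The asynchronous nature of SMS is what makes this practical: the system has time to classify the message and choose a careful response.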
Paula talked about a video AI project at Noumena where they installed cameras at different intersections in Barcelona to help analyze traffic patterns and better inform transportation design. This allowed them to find a better balance between vehicles and pedestrians. For more specifics about Paula's work on this project, she was a guest on a past episode of WebRTC Live, "How Computer Vision is Changing the Game for Video Data Analytics".
To wrap up the use cases discussion, Pieter talked about their work at Private AI. Their data redaction work is primarily around Automatic Speech Recognition (ASR), and they work with chatbots a lot as well. I asked Pieter if there are use cases where AI should not be used because the privacy concerns are too sensitive. Because of the power of modern data redaction solutions, Pieter didn't feel there was any situation where AI cannot be used. You just need to be sensitive to what information needs to be redacted and have a good plan in place to do so. Even in areas like patient-doctor conversations, AI can be used, as long as the right controls and data redaction techniques are in place to maintain regulatory compliance.
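To illustrate the redaction idea in its simplest form, here is a deliberately naive sketch that strips obvious identifiers from a transcript before it leaves your system. Production services like Private AI use ML-based entity detection (names, addresses, health data, and so on) rather than regular expressions; the patterns and example text below are illustrative only.

```python
# Simplified sketch of redaction before AI processing: replace obvious
# identifiers with placeholders so the raw values never reach an external
# AI service. Real redaction services use ML-based entity detection and
# catch far more than these toy patterns (note the name below survives).
import re

REDACTION_PATTERNS = {
    "[EMAIL]": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "[PHONE]": r"\+?\d[\d\s().-]{7,}\d",
    "[SSN]":   r"\b\d{3}-\d{2}-\d{4}\b",
}

def redact(text: str) -> str:
    for placeholder, pattern in REDACTION_PATTERNS.items():
        text = re.sub(pattern, placeholder, text)
    return text

transcript = "Patient John can be reached at 555-123-4567 or john@example.com."
print(redact(transcript))
# -> "Patient John can be reached at [PHONE] or [EMAIL]."
```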
Architecture for AI Applications
Pieter's comments on privacy in AI applications and upcoming regulations provided a segue to a discussion on the architecture of AI applications. Regulations vary by region, and since this panel discussion was in Paris, Pieter specifically called out the importance of GDPR and the EU's upcoming AI Act. Pieter discussed how privacy concerns, especially in edge-based applications, are the number one reason why companies are hesitant to adopt AI-based applications. For instance, many consumers are turning away from the concept of Microsoft's Copilot application that runs on your computer because they are concerned that everything they do is being pulled into an AI system controlled by Microsoft.
Paula talked about the architecture of their large-scale traffic analysis application. Because of the large amounts of data being processed, the solution was built around highly centralized processing of that data, performed offline rather than in real time. They had to install cameras around the municipality and comply with local regulations on handling public recordings, which did not allow them to stream the video over a connected network. As a result, they had to record at set dates and times and then recover the recordings for centralized processing in their office.
This choice required purchasing a lot of GPUs and was not scalable, so they are looking at more edge computing-based options for the future. Ideally, they will run object detection algorithms directly on the cameras, so that only positions and data about objects like cars are stored in the central database, instead of full images.
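A rough sketch of that edge approach might look like the following: object detection runs on the camera or an attached device, and only lightweight metadata is forwarded for central storage. The model and record format are my own illustrative assumptions (using the ultralytics YOLO package here), not Noumena's actual design.

```python
# Illustrative sketch of edge processing: detect objects on the device and
# keep only classes, confidences, and positions, never the raw frames.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # a small model suitable for constrained hardware

def frame_to_metadata(frame):
    """Return the objects in one frame as lightweight records."""
    detections = model(frame)[0]
    records = []
    for box in detections.boxes:
        records.append({
            "label": detections.names[int(box.cls)],
            "confidence": float(box.conf),
            "bbox": [float(v) for v in box.xyxy[0]],  # x1, y1, x2, y2
        })
    return records

# Only these small records would be uploaded to the central database,
# cutting bandwidth, storage, GPU load, and privacy exposure compared
# to shipping full video.
```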
Romain talked about the sustainability of AI-based applications. Because of their reliance on large amounts of processing power, concerns are being raised about the carbon footprint of AI, similar to those that have been raised about blockchain applications. Romain talked about finding a balance with edge processing, which may reduce the amount of data transferred and stored centrally. He noted that 2-3% of total global carbon emissions come from the digital economy, a large portion of which is video streaming. These are all very intensive solutions, so we should be judicious in how they are used. Discarded mobile phones also have a major environmental impact, so designing systems that can run on older mobile devices helps sustainability as well, since users wouldn't have to upgrade their devices as frequently.
Again, designing your application to include only useful features helps reduce both the financial and environmental costs of implementing AI. Measuring the usefulness of features after you deploy them is also beneficial, Romain noted, since you can architect your application to make it easy to remove a feature if users don't find it valuable.
Pause for Audience Questions
Before moving on to our final segment with the panelists on privacy and security, I opened up the microphone to questions from TADSummit attendees. One attendee commented that edge computing is just an extension of cloud computing and asked the panel what they see as the biggest challenges to moving more real-world workloads out to the constrained space of the edge. What are the biggest issues with truly distributed systems, or does it not matter if we see the cloud as very high performing with near-zero latency?
Romain noted that power is the number one constraint for distributing AI. If you have to include the amount of processing power necessary for AI on an edge device, how much will that affect battery life on your mobile or edge device?
Paul talked about the importance of looking at the entire system – don't think only about the drones in a system, for example; think systematically about what you are building. If the system is solar powered, as in a drone-based AI application for vineyard monitoring that Paul learned about, then you have effectively broken out of the power constraint.
Pieter talked about the limits of LLMs as they stand today. In many cases, it's simply not possible to go to the edge, because you can't run them on a single processor. People are sometimes misled into thinking, "I'll just download an open source LLaMA LLM to my computer and use that." But you still need a lot of computing power to do anything useful with it, and that can easily add up to six figures annually. It's also important to recognize how much optimization work has been done by companies like OpenAI, and that these LLMs are built to run across thousands of GPUs, so it's hard to duplicate this in your own environment.
In response to another attendee question, Paul also noted how important it is that users always know up front if they are talking with an AI. This is a matter of trust with the user. In some cases, Paul explained, users are actually more comfortable disclosing information to an AI because they won’t feel judged in the way they worry they may be by a human agent. When they are less self-conscious, they can disclose more of the information necessary to find a resolution to their problem.
Privacy and Security Around AI in Communication Apps
We didn’t have time to get to more detailed questions on privacy and security around AI in communication applications, but Pieter noted in his opening comments that it’s possible to manage confidential data like patient information, and so AI can even be used in sensitive areas like healthcare applications. One of the most important considerations for consumer data regulations and privacy is to anonymize any data that might be used in the training of a model.
This is an area where utilizing commercial services can be helpful. A hosted LLM service will already handle many issues for you, similar to how AWS manages many security issues for you on their cloud. But just like cloud servers, you must still work with it in a secure way so that you don’t introduce vulnerabilities through poor configuration or careless data management procedures.
If you want to learn more about building privacy and security into video and communication applications, check out our episode of WebRTC Live this Wednesday, November 8! My guest will be Robert Strobl from Digital Samba. Robert will talk with us about issues ranging from GDPR compliance to End-to-End Encryption (E2EE).
Panelist Conclusions
To wrap up, each panelist shared a bit about their next moves or predictions for the future.
Paula said the next big step for their public traffic analysis is to better analyze the large amounts of data they collect in a way that still meets the privacy constraints of each region where they conduct studies.
Pieter mentioned the importance of optimizing for your use case – general purpose LLMs often don't work well in enterprise applications unless they've been trained and optimized for the case at hand. When using a general purpose public LLM, remember that any information you give it about your business will likely pop out as part of answers it provides to other users. So be careful what you share! Romain also stressed respecting the chain of custody of user data.
Lorenzo shared his excitement about the possibilities of AI in communication applications. Even real time transcription was not possible a short time ago. Things are rapidly becoming more accessible to smaller companies and projects!
Paul Sweeney ended by predicting that every company will eventually have their own custom LLM for their use case.
My Thanks and Takeaways
Many thanks to our excellent set of knowledgeable panelists for their time. I really enjoyed the discussion and gained a lot of information and new perspectives on AI in Voice and Video Applications.
A few key insights that I’ve drawn:
1. Consider the Context
Does the addition of AI actually add value to the customer and the business? Don't just add AI features for the tech cool factor; that will distract from the usefulness of your app.
2. Minimize the Impact and Cost
Building on the previous point, don't add AI to your application just for the "wow" factor. Not only can it distract from more useful features, AI requires a lot of computing power – which means additional costs in server usage, as well as an additional burden on the power grid, which contributes to climate change. These are further financial and environmental reasons why building unnecessary AI features is wasteful.
3. Consider Privacy and Check with Legal
Be careful what information you share with LLMs, and how much user data you expose to them, because this data may be used in other ways that violate privacy or corporate confidentiality.
4. Optimize for your use case
A general purpose LLM may not provide the most useful outputs, so look for ways that you can optimize, customize, or (confidentially and carefully) share relevant data with the LLM so it can provide more useful answers.
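One lightweight way to do that, short of fine-tuning, is to ground a hosted general purpose LLM with a small amount of (already redacted) domain context in the prompt. The sketch below is illustrative only: the model name, context text, and question are assumptions, and it requires the openai package and an API key.

```python
# Illustrative sketch: give a general purpose hosted LLM some pre-redacted
# domain context so its answers are specific to your use case.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

domain_context = (
    "Product: Acme Router X200. Common fixes: power-cycle, firmware update, "
    "factory reset via the pinhole button."  # hypothetical, pre-redacted context
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system",
         "content": f"You are a support agent. Answer using only this context:\n{domain_context}"},
        {"role": "user",
         "content": "My router keeps dropping the connection. What should I try?"},
    ],
)
print(response.choices[0].message.content)
```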
5. AI will be staying in the Cloud
At least for now, the compute resources of LLMs are too large for most edge computing architectures. Depending on your use case, you may be able to do some analysis on the edge to minimize the data that needs to be analyzed and stored in a cloud repository, but it’s unlikely that you can build a complete AI solution on the edge.
6. Leverage experts
LLMs are complex to maintain, as our panelists all touched on in different ways. In a separate presentation at TADSummit, Paul Sweeney talked about "ML Ops," the extensive DevOps work that is needed around AI solutions. The more you can use established services and Infrastructure-as-a-Service (IaaS), the less you will have to manage yourself. You can also look at services like Private AI to handle specific tasks for you, such as redaction of sensitive data.
When you're ready to integrate an AI service into your communications application, you should leverage experts there too. Our team at WebRTC.ventures has deep expertise in communications protocols like WebRTC, as well as experience integrating WebRTC with various AI and ML services. Would you like to learn more and explore ways to build AI into your video or voice communications application? Contact us today!