WebRTC Live 91

On the May 15, 2024 episode of WebRTC Live, Arin welcomed industry guests Jim Rand and James Meier of Synervoz to focus on unique audio use cases for WebRTC. Synervoz is an innovation and software development studio focused on audio, entertainment, and online collaboration. They have worked on sound technology with many well-known companies such as Bose, Meta, Amazon, and Unity.

They discussed their Switchboard SDK for audio processing and explored audio processing use cases such as watch parties, gaming, and karaoke / music collaboration. In addition, they caught up on the use of AI in audio processing pipelines.

Bonus Content

  • A brief interview with our CTO Alberto Gonzalez about his experience at JanusCon last month.
  • Our regular monthly industry chat with Tsahi Levent-Levi. This month, Arin and Tsahi discuss Machine Learning in audio processing, covering a few of the common ways it’s used in WebRTC calls.

Key Insights and Episode Highlights below!

Watch Episode 91!

Key Insights

Effective audio processing plays a big part in ensuring smooth live interactions in WebRTC apps. For instance, features like noise reduction or echo cancellation can drastically improve the quality of the user experience. Jim says, “As far as where WebRTC fits in, it’s probably half the projects or something that have a communication component where it’s like, hey, it’s a voice or a video chat coming together with music or a game or a watch party or some other media source. And there’s sort of more complex audio pipelines that need to be built into that WebRTC application.”

Audio ducking improves understandability in WebRTC apps. Audio ducking is a technique used to ensure that important sounds, such as voiceovers or announcements, are clearly heard over background music or other audio tracks. James explains, “Audio ducking is a tool that audio engineers are very familiar with. Basically, you have a key input and a main input, which is probably playing music, so whenever there’s a signal on the key input, the music will turn down. So it basically makes space inside of that music to signal for something else. A lot of disco would use ducking to really bring the bass drum forward. But it turns out that it’s also very helpful for understandability and communication when you’re putting content and human voice together.”
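The key/main behavior James describes is classic sidechain ducking: an envelope follower tracks the level of the key input (the voice), and that envelope drives a gain reduction on the main input (the music). A minimal sketch of the idea, in Python with NumPy (a hypothetical illustration of the technique, not Synervoz's implementation; the `depth`, `attack`, and `release` parameters are assumptions for the example):

```python
import numpy as np

def duck(main, key, depth=0.8, attack=0.01, release=0.2, sample_rate=48000):
    """Sidechain ducking: turn `main` down whenever `key` carries signal.

    Hypothetical sketch of the technique, not the Switchboard SDK API.
    depth: maximum gain reduction (0..1); attack/release in seconds.
    """
    # One-pole smoothing coefficients derived from attack/release times.
    atk = np.exp(-1.0 / (attack * sample_rate))
    rel = np.exp(-1.0 / (release * sample_rate))

    env = 0.0
    out = np.empty_like(main, dtype=float)
    for i, (m, k) in enumerate(zip(main, key)):
        level = abs(k)
        # Envelope follower: fast attack, slow release.
        coeff = atk if level > env else rel
        env = coeff * env + (1.0 - coeff) * level
        # Reduce the music by up to `depth` while the key input is active.
        gain = 1.0 - depth * min(env, 1.0)
        out[i] = m * gain
    return out
```

The smoothed envelope is what keeps the music from pumping abruptly: the gain eases down when the voice starts and eases back up when it stops.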

AI offers many opportunities for audio processing. AI is making waves across all industries, and it holds a major place in audio processing. Jim explains some of the biggest developments and opportunities in this niche. He says, “There’s a lot of individual nodes that will be AI-based. So whether that’s stem separation where you might feed music in and feed different stems out, or you might strip vocals out for a karaoke app, it might be generative music, it could be, again, just noise echo suppression related stuff, effects like voice changers. So a lot of different nodes and we’re experimenting with a lot of them right now.”

Episode Highlights

Latency is a top priority to ensure clear communication

Low latency in communication apps can improve the user experience by reducing delays and enabling natural conversations. That’s why latency is one of the most vital factors for maintaining efficient communication, especially in real-time interactions such as video calls.

James explains, “Latency is very important for us because the ability to communicate with other humans is one of the main things that we want to make sure is baked in and utilizable on everything we do. So, we care about latency. We make sure it’s measured. We make sure it’s something that’s observable inside of our systems. And then we take on the models that we have a sense of, okay, we know where we can fix this and either minimize latency, process it correctly, get our pipelines, perform it in the way we need to, or fix it upstream of it.”

Switchboard makes it easier to create great audio experiences

Switchboard, an audio platform developed by Synervoz, provides an easy way to integrate advanced audio features without tons of custom coding. With Switchboard, developers can quickly try out different tools like noise suppression or voice modulation, making it simpler to create great audio experiences for all kinds of apps.

Jim explains, “We basically developed a C++ audio engine, and then there’s OS-specific libraries that handle all of that for you, and you can very quickly audition different … it might be an AI-based noise suppression library … it might be a voice changer … it might be auto-tune. So there’s a lot of different things that you can start to mix and match very, very easily when you’re using Switchboard.”
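The "mix and match" workflow Jim describes boils down to composing independent processing nodes into a chain. A minimal sketch of that pattern (hypothetical, generic code; this is not the Switchboard SDK's actual API, and the node names are made up for illustration):

```python
import numpy as np

def gain(db):
    """A node that scales the signal by a decibel amount."""
    factor = 10 ** (db / 20)
    return lambda buf: buf * factor

def noise_gate(threshold=0.05):
    """A crude noise suppressor: zero out samples below the threshold."""
    return lambda buf: np.where(np.abs(buf) > threshold, buf, 0.0)

def pipeline(*nodes):
    """Compose nodes left to right into one buffer-to-buffer processor."""
    def process(buf):
        for node in nodes:
            buf = node(buf)
        return buf
    return process

# Swapping a node in or out is a one-line change to the chain.
chain = pipeline(noise_gate(0.05), gain(-6.0))
```

The appeal of this architecture is exactly what the quote suggests: auditioning a different noise suppressor or adding a voice changer means replacing one node in the chain, not rewriting the audio path.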

AI helps augment human capabilities, but doesn’t replace them

The rapid development of AI tools is exciting, yet much of their potential remains unexplored. While AI can drastically augment human abilities, the critical question remains: can it ever fully replace them? James shares his thoughts, “I’m definitely excited with AI as a tool, as another tool to put inside of the producer or the musician or the mastering engineer’s toolbox to make the best art that they can. […] My challenge is if anybody’s actually bookmarked and listened to one of these things 10 times, please let me know. I’m kind of using that as a ruler, have we gotten far enough?”


Up Next! WebRTC Live Episode 92

with Pion WebRTC maintainer and WebRTC for the Curious author, Sean DuBois

Tuesday, June 18 at 12:30 pm Eastern

Register
