CommCon 2024, CommCon’s first US event, was held at Cloudflare’s San Francisco office this June. I was happy to be able to attend and finally see CommCon in person for the first time! The rapport among the friendly community of internet media and telephony developers, along with the excellent talks, exceeded my expectations and left me excited for future events.
Here’s a roundup of the presentations.
MOQing the World at Low Latencies
The conference started with a bang as Ali Begen spoke about Low-Latency DASH (LL-DASH) and Media over QUIC Transport (MOQT, from the Media over QUIC effort, abbreviated MOQ or MoQ). Ali presented results from testing and comparing LL-DASH and MOQT, showing which use cases each one fits for a given target latency.
It was fascinating to see how well MOQ fits live streaming at high scale, with a focus on Content Delivery Network (CDN) integration. MOQ has many other interesting aspects, from low-level details such as how beneficial prioritization can be, to higher-level concepts such as the pub/sub model.
It was also interesting to follow the IETF’s work toward standardizing MOQ. I look forward to seeing more about MOQ and QUIC in general as more web services move to HTTP/3.
Automating Your WebRTC Test Calls
Dominik Ridjic from Sipfront demonstrated software they built for developing automated tests for VoIP platforms. Automating tests for WebRTC, and VoIP in general, is difficult, so teams tend to rely on manual testing even in areas that are time-consuming and monotonous. Those areas are much better served by automated tests, which leaves more time and bandwidth for manual testing where it makes more sense.
The test setup integrates many well-known testing services such as Selenium, device clouds like BrowserStack, and call analytics. The SIP test setup, which is controlled via the sipfront-app and uses Kamailio and rtpengine to connect to and test VoIP systems, is very impressive, and it is satisfying to see these components work together. The service covers a wide variety of load testing strategies, and being able to use different types of clients and build custom scenarios can add a large amount of stability to a system.
Replacing WebRTC with MOQ
Next came another look into the world of QUIC, and specifically Media over QUIC, this time from a slightly more disruptive position. Luke Curley from Discord gave a very interesting talk about his work in the IETF working group for MOQ.
Luke spoke about some of the reasons why working with WebRTC is difficult and limiting for applications outside of real-time conferencing (which is what WebRTC is designed for), and also touched on WebCodecs and WebTransport.
The WebCodecs API provides low-level access to encoders and decoders, exchanging individual video frames and chunks of audio, which can make media processing more efficient and enables custom rendering and editing with JavaScript in the browser. WebCodecs can be used alongside WebRTC to provide enhancements, but doing so requires re-implementing some of the pieces that WebRTC normally provides.
WebTransport APIs provide low-latency bidirectional communication between client and server. WebTransport uses QUIC, which runs over UDP rather than TCP, and offers both streams and datagrams; it is a step in the direction of MOQ. Luke broke down how MOQ can perform as well as WebRTC currently does, mostly thanks to the benefits of QUIC. Looking at how MOQ works with prioritization was also very interesting.
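To make the prioritization idea a little more concrete, here is a minimal Python sketch of a sender that drains its most important media first when the link cannot carry everything. This is not MOQ's wire protocol or anything Luke showed; the class names, the "lower number means more important" convention, and the byte budget are all assumptions for illustration.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class QueuedObject:
    priority: int                       # assumed convention: lower = more important
    seq: int                            # tie-breaker: keeps enqueue order within a priority
    track: str = field(compare=False)   # hypothetical track name, e.g. "audio"
    size: int = field(compare=False)    # payload size in bytes


class PrioritySender:
    """Toy sender that picks what to transmit under a byte budget."""

    def __init__(self):
        self._heap = []
        self._counter = count()

    def enqueue(self, track, priority, size):
        item = QueuedObject(priority, next(self._counter), track, size)
        heapq.heappush(self._heap, item)

    def drain(self, budget_bytes):
        """Send objects in priority order until the next one no longer fits."""
        sent = []
        while self._heap and self._heap[0].size <= budget_bytes:
            obj = heapq.heappop(self._heap)
            budget_bytes -= obj.size
            sent.append(obj.track)
        return sent


sender = PrioritySender()
sender.enqueue("video", priority=2, size=1200)
sender.enqueue("audio", priority=0, size=160)
sender.enqueue("video", priority=2, size=1200)
sender.enqueue("audio", priority=0, size=160)

# With room for only 1600 bytes, both audio objects go out before any video.
sent_tracks = sender.drain(budget_bytes=1600)
print(sent_tracks)
```

The point of the sketch is only the ordering: under congestion, audio keeps flowing while lower-priority video waits in the queue.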
The MOQ standard is still early, and it will likely be a few years before it is finalized and published. But there are interesting goals being worked toward, such as CDN integration, handling generic payloads, and supporting any latency (DVR/VOD to live).
Cloudflare Calls + NACK Handling
Renan Dincer from Cloudflare (who was a great guest on WebRTC Live last year, discussing Anycast Routing for WebRTC) talked about the recently released Cloudflare Calls, an SFU service with a cascading architecture designed for scale and global availability. Renan described the “NACK shield” concept: packets are cached on edge SFUs so that when a client sends a NACK, the retransmission can be served from the SFU closest to that client. It was especially interesting to see this across Cloudflare’s distributed network, which shows that NACKs are more common on some countries’ networks than others, particularly when comparing first-mile and last-mile packet reliability. It’s fun to think of this as a good way to find ISP misconfigurations.
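As I understood the “NACK shield” idea, it can be sketched roughly like this in Python: an edge node keeps a small cache of recent packets and answers NACKs locally when it can. This is my own simplified illustration, not Cloudflare's implementation; `EdgePacketCache`, its capacity, and the payload format are hypothetical.

```python
from collections import OrderedDict


class EdgePacketCache:
    """Keeps the most recent packets on an edge node, keyed by sequence number."""

    def __init__(self, capacity=512):
        self.capacity = capacity
        self._packets = OrderedDict()

    def store(self, seq, payload):
        self._packets[seq] = payload
        self._packets.move_to_end(seq)
        while len(self._packets) > self.capacity:
            self._packets.popitem(last=False)  # evict the oldest packet

    def handle_nack(self, seqs):
        """Split NACKed sequence numbers into local hits and upstream misses."""
        hits = {s: self._packets[s] for s in seqs if s in self._packets}
        misses = [s for s in seqs if s not in self._packets]
        return hits, misses


cache = EdgePacketCache(capacity=3)
for seq in range(100, 105):
    cache.store(seq, f"packet-{seq}".encode())

# The client reports three lost packets; 101 was already evicted from the cache.
hits, misses = cache.handle_nack([101, 103, 104])
print(sorted(hits), misses)
```

Hits are retransmitted straight from the edge, and only the misses would need to travel further upstream.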
Renan also served as a friendly welcome wagon during the conference, showing us around the Cloudflare office and also providing valuable insights into the Cloudflare APIs that we used during the hackathon portion of the event. Thanks for helping me debug, Renan!
How to crack under 500ms latency with AI voicebots?
An enlightening seminar by Nikhil Gupta of Vapi Voice AI was the first talk of the conference bringing AI into the world of WebRTC. Nikhil talked about achieving sub-500ms latency, which is very important for fluid voice interactions. By leveraging cutting-edge large language models (LLMs), advanced speech-to-text and text-to-speech systems, and sophisticated techniques for handling interruptions and backchanneling, his team managed to reduce latency from 2500ms to under 500ms.
Nikhil spoke about some interesting optimizations which included efficient model architectures, real-time audio streaming, and the use of contextual cues. He emphasized running models on client devices and maintaining stateful connections to enhance performance. The presentation also highlighted the shared KV cache among GPUs and Nvidia’s leading role in this technological advancement.
Protecting WebRTC and SIP with APIBAN
Fred Posner (another past WebRTC Live guest for the episode Using Kamailio to Connect WebRTC to SIP and PSTN) introduced APIBAN, a free service powered by a global honeypot network that protects systems from unwanted traffic. APIBAN offers an effective solution for blocking malicious IPs. And best of all, it is free to use!
Fred demonstrated how to get a list of bad IPs and block them using iptables, emphasizing the ease and efficacy of the process. He also shared intriguing insights into the patterns of IP addresses flagged by APIBAN. This is such a nice service, and it is so easy to integrate that including it in a Kamailio deployment feels like a no-brainer.
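The blocking step can be sketched in a few lines of Python. This assumes the list of banned addresses has already been fetched (the APIBAN request itself is not shown, and I am not reproducing Fred's script); the helper name and the sample addresses, which come from documentation-only ranges, are hypothetical.

```python
import ipaddress


def iptables_drop_rules(addresses, chain="INPUT"):
    """Return one iptables command per valid IPv4 address, skipping bad entries and duplicates."""
    rules = []
    seen = set()
    for raw in addresses:
        try:
            ip = ipaddress.IPv4Address(raw.strip())
        except ValueError:
            continue  # ignore anything that is not a well-formed IPv4 address
        if ip in seen:
            continue
        seen.add(ip)
        # -I inserts at the top of the chain, -s matches the source, -j DROP discards it
        rules.append(f"iptables -I {chain} -s {ip} -j DROP")
    return rules


# Hypothetical input standing in for a fetched ban list
banned = ["192.0.2.10", "198.51.100.7", "not-an-ip", "192.0.2.10"]
rules = iptables_drop_rules(banned)
for rule in rules:
    print(rule)
```

The sketch only prints the commands rather than running them, which makes it easy to review them first. For long lists, an ipset is a common alternative to one rule per address.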
A novel approach to load testing of WebRTC media in a cloud contact centre world
Geoff Willshire of Genesys addressed the complexities of testing, and particularly load testing, that WebRTC introduces in traditional VoIP contact centers. He also spoke about issues that come with relying on browsers for call signaling control.
Geoff showcased a novel approach to high-scale load testing of WebRTC media that bypasses the need for hundreds of servers, overcoming the limited scalability of real browsers. This method allows for effective load testing without the significant resource investment typically required.
One of the highlights was Geoff’s compelling story about building the load testing tool with Selenium, specifically tailored for testing contact center solutions. His presentation provided practical insights and showcased advanced techniques for anyone dealing with WebRTC media in the context of cloud contact centers.
Building Jellyfish Media Server in Elixir
Przemyslaw Roznawski from Software Mansion delivered an engaging presentation about building a Media Server with the Elixir programming language and frameworks.
The Jellyfish Media Server is built on the Membrane Framework, a multimedia processing framework written in Elixir. It was exciting to see yet another WebRTC integration, this time with Elixir. He explained how RTCEngine, a Selective Forwarding Unit (SFU), serves as the foundation but lacks extensibility.
He introduced FishJam Media Server, an open-source, general-purpose media server using the Phoenix Framework, Elixir’s go-to web framework. It provides a REST and WebSocket API, making it easy to set up and use. Przemyslaw detailed the main server concepts—rooms, peers, and components—emphasizing the flexibility in controlling media streams. His presentation offered valuable insights into innovative WebRTC solutions using Elixir.
Przemyslaw also discussed the RTC.ON conference this November in Poland, for which WebRTC.ventures is a sponsor!
Voice and Conversational AI In Production
Kwindla Hultman Kramer, the CEO and Co-Founder of WebRTC.ventures partner Daily, discussed the potential of Large Language Models (LLMs) for creating multi-turn conversations that are useful, interesting, and fun. The talk provided an overview of combining WebRTC with LLMs for seamless voice-to-voice interactions.
Kwindla introduced Pipecat, an open-source framework available at git.new/ai, designed for multimodal conversational AI. He emphasized the use of Python due to its popularity in the AI community. The framework includes a pipeline to facilitate transport, format data for LLM processing, and feed the results back to voice, ensuring smooth integration and performance.
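The pipeline shape described here (audio in, text to the LLM, audio back out) can be pictured with a toy Python sketch. To be clear, this does not use Pipecat's actual API; the `Frame` class and the three stub stages are hypothetical stand-ins for real transport, speech-to-text, LLM, and text-to-speech services.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    kind: str   # "audio" or "text"
    data: str   # stand-in payload; real audio would be bytes


def fake_speech_to_text(frame):
    # Stub: pretend the audio payload carries its own transcript
    if frame.kind != "audio":
        raise ValueError("speech-to-text expects an audio frame")
    return Frame("text", frame.data.replace("audio:", "", 1))


def fake_llm(frame):
    # Stub: a real stage would send the text to a language model
    return Frame("text", f"You said: {frame.data}")


def fake_text_to_speech(frame):
    # Stub: a real stage would synthesize audio from the reply text
    return Frame("audio", f"audio:{frame.data}")


def run_pipeline(stages, frame):
    """Pass a frame through each stage in order and return the final frame."""
    for stage in stages:
        frame = stage(frame)
    return frame


reply = run_pipeline(
    [fake_speech_to_text, fake_llm, fake_text_to_speech],
    Frame("audio", "audio:hello"),
)
print(reply)
```

Each stage only needs to agree on the frame format, which is what makes it easy to swap one service for another.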
Kwindla highlighted Pipecat’s aspiration to integrate with new WebRTC platforms, showcasing the potential for innovative applications in voice and conversational AI. Kwindla’s presentation offered valuable insights and practical tools for anyone interested in advancing voice-to-voice interaction technologies.
It is interesting to see how much has changed (and how much has not) since Kwindla’s Fireside Chat on the Future of WebRTC episode of WebRTC Live back in 202.
WebRTC Track Mixing for Streaming & Recording
Wojciech Barczynski, another representative of Software Mansion, delivered a great presentation about WebRTC Track Mixing. He discussed the challenges posed by WebRTC’s peer-to-peer design when it comes to recording video conferences or streaming them to a larger audience. To address these challenges, multiple media streams need to be combined, which led to the creation of an open-source media server called LiveCompositor.
Wojciech explored some of the mainstream solutions for mixing live streams, such as headless Chrome, GStreamer, and FFmpeg, which led to the motivation behind developing LiveCompositor. He spoke about how LiveCompositor aims to provide a seamless solution for real-time media stream mixing, overcoming issues faced with headless browsers, GStreamer, and FFmpeg. Wojciech also used LiveCompositor to integrate with Cloudflare Calls for the hackathon, so we got a cool early look into it then as well.
Wojciech also demonstrated how to record WebRTC calls using LiveCompositor, showcasing one of their projects as an example. Wojciech’s presentation touched on valuable insights and practical solutions for anyone dealing with WebRTC streaming and recording challenges. This is what I love to see at CommCon!
Standardized Signaling: Today, Future and Opportunity
Sean DuBois delivered an insightful presentation on standardizing signaling in WebRTC, which he segued nicely into a talk about WebRTC in general and where it stands today. Sean discussed how the WHIP and WHEP protocols have revolutionized WebRTC, enabling new possibilities and integrations with tools like FFmpeg, GStreamer, and OBS.
A notable use case Sean mentioned was the ease of bridging video systems or transitioning from one provider to another, made possible by WHIP and WHEP. One of the unexpected yet exciting integrations facilitated by these protocols is the ability to connect two OBS instances directly.
Sean emphasized that despite the progress, WebRTC is still in its early days, with vast opportunities for contribution to open-source projects. He illustrated how protocols like WHIP and WHEP simplify integration and lead to adoption by services and developers.
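Part of why WHIP is so easy to adopt is that the signaling boils down to a single HTTP exchange: the client POSTs an SDP offer and gets an SDP answer back. Here is a minimal Python sketch that builds such a request without sending it. The endpoint URL, token, and the placeholder SDP are assumptions for illustration, and a real offer would come from a WebRTC stack.

```python
import urllib.request


def build_whip_request(endpoint_url, sdp_offer, bearer_token=None):
    """Build (but do not send) the HTTP POST that starts a WHIP session."""
    headers = {"Content-Type": "application/sdp"}
    if bearer_token:
        headers["Authorization"] = f"Bearer {bearer_token}"
    return urllib.request.Request(
        endpoint_url,
        data=sdp_offer.encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Placeholder SDP header only; not a complete WebRTC offer
offer = "v=0\r\no=- 0 0 IN IP4 127.0.0.1\r\ns=-\r\nt=0 0\r\n"

# Hypothetical endpoint and token
request = build_whip_request(
    "https://whip.example.com/endpoint", offer, bearer_token="example-token"
)
print(request.get_method(), request.full_url)

# On success, a WHIP server replies "201 Created" with the SDP answer in the body
# and a Location header naming the session resource; an HTTP DELETE on that
# resource ends the session.
```

Because the whole handshake is plain HTTP, any tool that can produce an SDP offer and make a POST request can publish to a WHIP endpoint.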
The direction of WebRTC today is becoming broader yet more specialized. Users have different use cases, from high-quality screen sharing to embedded WebRTC. AI and WebRTC are also a strong pair and will show up in many services, both existing and new. Sean stressed that there are lots of new challenges and great opportunities with WebRTC!
Sean was Arin’s guest on WebRTC Live just last month discussing many of these same topics: Next Gen Interactive Broadcasting with WebRTC & OBS.
Thank you!
I can’t thank Dan Jenkins and his group of organizers, the presenters, and my fellow WebRTC attendees/enthusiasts enough for making my time at CommCon so engaging and enjoyable.
Here’s to many more CommCons in the future!
Since I have been referring to so many of the presenters’ appearances on WebRTC Live, I should also mention Dan’s great episode from earlier this year: Where Does WebRTC Fit in the State of Broadcasting?
Next up: Stay tuned for the roundup of talks from the London event, attended by my WebRTC.ventures colleague Alfred Gonzalez.