When I attended the CommCon London conference earlier this year, one of the speakers, Michał Śledź, invited us to join the RTC.ON conference. It was organized by his company, Software Mansion, and would take place in Krakow on September 11-13, 2024.

A few days after returning home, I sent a proposal to be a speaker at the conference. It was accepted! 

A couple of months later, I arrived in Krakow!

Here is my summary of the talks I saw at the conference. Most were about WebRTC, but there were others about AI, Media over QUIC (MoQ), and integrations of Elixir WebRTC and Membrane. The videos are available on Software Mansion’s RTC.ON 2024 YouTube playlist.

Me at the conference!

Day 0

The first day consisted of a speakers’ evening at the Software Mansion headquarters, where we had dinner with most of the speakers and some of the Software Mansion employees. It was a great opportunity to meet everyone and talk about all kinds of topics.

I also finally had the pleasure of meeting Arin Sime, CEO and Founder of WebRTC.ventures, in person! You can read his post about the conference and watch the WebRTC Live episode that he broadcasted live from the event, where he interviewed me and a few other presenters.

Arin Sime, CEO and Founder of WebRTC.ventures, interviewing Violina Popova of ClipMyHorse.TV for WebRTC Live.

Day 1

This was the first day of talks. It started in the morning at the organizers’ office. They welcomed us with breakfast, and then Maciej Rys, the chairman of the conference, gave an opening speech and introduced the first speaker, Chad Hart (RingCentral), whom you may know from webrtcHacks.

WebRTC Developer Dynamics – An Open Source Analysis

In this presentation, Chad talked about past and current trends in WebRTC and related technologies, like MoQ, WHIP, WHEP, and WebCodecs, analyzing data he collected from Stack Overflow threads, GitHub repositories, and other sources. Spoiler alert: it seems WebRTC is doing more than fine!

The next speaker was Lorenzo Miniero from Meetecho (Janus).

WebRTC and QUIC: how hard can it be?

Lorenzo dived into the QUIC, RoQ (RTP over QUIC), and MoQ (Media over QUIC) protocols and explained the process he went through to test RoQ and MoQ. He worked on several proof-of-concept projects, starting from a basic approach, then using RTP over QUIC (RoQ), and finally sending media over QUIC (MoQ). He had a Janus server receiving traffic from a RoQ client and sending it to a WebRTC client, and the other way around, i.e. receiving traffic from a WebRTC client and sending it to a RoQ client. Then he did the same with a MoQ client instead. It was a really interesting topic given the rise of QUIC and MoQ, and not the only talk about them at this conference.

Lorenzo and his team also organized JanusCon in Napoli earlier this year, which our CTO Alberto Gonzalez had the chance to attend. You can check out his post on that event.

After that, Wojciech Jasiński (Software Mansion) gave a talk.

On challenges and considerations for real time AI processing

Wojciech showed some of the AI projects they have worked on (like video inpainting) and explained some basics of computer vision. He showed the potential and use cases for real-time video processing using AI, the challenges, and some of the solutions they came up with while working on their latest project. A really interesting topic if you are into AI!

What happens when AI starts grokking streaming audio directly

Rob Pickering (aplisay) explained some of the considerations in real-time audio processing when integrating it with AI tools, for example for conversational bots. Then he talked about their project aplisay, which allows you to integrate a large language model (LLM) of your choice (15 models supported) into your telephony flow, and showed the architecture along with some of the challenges and solutions.

WebRTC and Spatial Computing on Apple Vision Pro

Damien Stolarz (Evercast LLC) explained the techniques they used to build an app that streams stereoscopic and 360-degree video over WebRTC via Janus and visualizes it on the Apple Vision Pro. It can display streams in three ways: planar projection, spatial video streaming, and 3D object streaming. He also showed some of the challenges they had to overcome. He finished with a demo in which he wore the Apple Vision Pro and mirrored what he saw on the screen, explaining how it worked and showing different use cases. The app is used for remote editing of films and TV shows, and some of their clients are major film productions.

The next speaker was Dan Jenkins (Nimble Ape). 

Taking ICEPerf.com to the next level

Dan’s talk was about the new app they developed, ICEPerf.com, which lets you analyze the performance of STUN and TURN servers. It was a continuation of the talk he gave at CommCon London about ICEPerf.com, with some new additions since then. He explained how it works and some issues they had. Then he compared metrics (candidate latency, time to connected state, and overall latency for STUN and TURN, max TURN throughput, and API response times) across service providers like Twilio, Cloudflare, Xirsys, Google (STUN), Metered, and ExpressTURN. Cloudflare performed really well on all the metrics. Real-time data for those providers is available at ICEPerf.com, so you can go and check it out, and they are planning to add new features. The project is open source, so you can download it and run the tests against any STUN/TURN server you want, even your own!
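To give an idea of what a tool like this measures, here is a minimal browser sketch, not ICEPerf’s actual code, that times how long a TURN server takes to produce a relay candidate using the standard WebRTC API; the server URL and credentials are placeholders.

```typescript
// A minimal sketch, not ICEPerf's implementation: time how long a TURN
// server takes to yield a relay candidate via the standard WebRTC API.
// The URL and credentials passed in are placeholders you must replace.
async function timeToRelayCandidate(
  urls: string,
  username: string,
  credential: string
): Promise<number> {
  const pc = new RTCPeerConnection({
    iceServers: [{ urls, username, credential }],
    iceTransportPolicy: "relay", // gather TURN (relay) candidates only
  });
  pc.createDataChannel("probe"); // gives ICE something to negotiate
  const started = performance.now();

  const elapsed = new Promise<number>((resolve, reject) => {
    pc.onicecandidate = (event) => {
      if (event.candidate?.type === "relay") {
        resolve(performance.now() - started);
      }
    };
    setTimeout(() => reject(new Error("no relay candidate within 5s")), 5000);
  });

  try {
    await pc.setLocalDescription(await pc.createOffer());
    return await elapsed;
  } finally {
    pc.close();
  }
}

// Usage: timeToRelayCandidate("turn:turn.example.com:3478", "user", "secret")
//   .then((ms) => console.log(`time to relay candidate: ${ms.toFixed(0)} ms`));
```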

Improving DX and adoption of Membrane Framework

Mateusz Front (Software Mansion) explained that Membrane is a media framework focused on live streaming, pipeline-based (like GStreamer) and written in Elixir. Some of its use cases are broadcasting, streaming to an AI tool, and CCTV. He showed some examples, explaining how to build the pipelines and how they work, and displaying the results. The learning curve is a bit steep, so they decided to create Boombox, a library on top of Membrane that is easier to use. He also ran some examples to show how it works, including an integration with an AI tool called “Not hotdog” (it detects whether the video contains a hotdog). They are working on adding more features, like WHIP and WHEP, and support for more codecs.

React and WebRTC – Real-time communication on mobile

Perttu Lähteenlahti (Noice) talked about the importance of real-time communication on mobile. He explained why React Native is a great choice for it, recommended using Expo (a React Native framework) and the react-native-webrtc library, and showed the similarities between the web and mobile libraries. Then he explained some things to consider when building WebRTC apps for mobile and for other devices as well (like tvOS). He finished by showing some of the limitations.
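To illustrate those similarities, here is a minimal sketch of my own (not from the talk) using react-native-webrtc, whose recent versions mirror the browser’s track-based API; it assumes the library is installed and camera/microphone permissions are granted.

```typescript
// Minimal react-native-webrtc sketch: the calls mirror the browser API.
// Assumes the library is installed and permissions have been granted.
import { RTCPeerConnection, mediaDevices } from "react-native-webrtc";

async function startCall(): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
  });

  // Same getUserMedia shape as on the web.
  const stream = await mediaDevices.getUserMedia({ audio: true, video: true });
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));

  await pc.setLocalDescription(await pc.createOffer({}));
  // Send pc.localDescription to the remote peer over your own signaling
  // channel, then apply the answer with pc.setRemoteDescription(...).
  return pc;
}
```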

Exploring LLMs and GenAI: A Hands-On Guide to Building Your Own RAG Model for Document Interaction

Paula Osés (IAG) started with an introduction to LLMs and RAG (Retrieval-Augmented Generation) and explained the architecture. Then she jumped into a hands-on session about building your own RAG system, listing the tools she used (LangChain, Groq, Qdrant, and Python) and explaining the process step by step: document preparation, embeddings, vector database, and querying. She also showed the LLM configuration with the prompt, the model, and other values. Finally, she ran a demo showing the capabilities of RAG. Basically, you get an LLM that can answer questions about specific data you feed into it, for example a manual from your company or the laws of a country. You just need to set the source documents of your RAG, and you will get replies grounded in them, which greatly reduces incorrect responses (called hallucinations). This talk gave me ideas and motivation to build a RAG project myself.
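To make the flow concrete, here is a minimal sketch of the retrieval step, not the exact LangChain/Qdrant setup from the talk; `embed` is a hypothetical stand-in for whatever embedding model you use.

```typescript
// A minimal sketch of RAG retrieval. `embed` is a hypothetical stand-in
// for any embedding model that maps text to a vector.
declare function embed(text: string): Promise<number[]>;

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// 1. Document preparation: split source documents into chunks, embed each.
async function indexChunks(chunks: string[]) {
  return Promise.all(
    chunks.map(async (text) => ({ text, vector: await embed(text) }))
  );
}

// 2. Querying: embed the question, retrieve the most similar chunks, and
//    paste them into the prompt so the LLM answers from your data.
async function buildPrompt(
  question: string,
  index: { text: string; vector: number[] }[],
  topK = 3
): Promise<string> {
  const queryVector = await embed(question);
  const context = index
    .map((c) => ({ ...c, score: cosineSimilarity(queryVector, c.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((c) => c.text)
    .join("\n---\n");
  return `Answer using only this context:\n${context}\n\nQuestion: ${question}`;
}
```

In a real system the in-memory index would be a vector database such as Qdrant, but the retrieve-then-prompt shape stays the same.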

To close the first day of talks, we had Zafer Cesur, Co-founder and CTO of Algora.

How we built Algora.TV (live streaming for developers) using Membrane and Elixir

Zafer explained how they created a live streaming platform (like Twitch) but for developers. It is written in Elixir, and they used Membrane to build it. He started by showing us the live application, commented on some of the reasons they chose Elixir and Membrane, and showed some examples of how they built the pipelines. Then he explained which services they use for media delivery (Tigris and fly.io), the features they have implemented, like multistreaming and in-video ads, and some issues they ran into.

After all the talks, some of us went out for dinner together; thanks to Michał Śledź for organizing it. It was fun to have more time to get to know everyone.

Day 2

The second (and last) day of the conference was on Friday and started the same way as the first: a welcome breakfast, and then the talks began.

The first speaker of the day was Piotr Skalski (Roboflow).

Everything you wanted to know about VLMs but were afraid to ask

Piotr introduced VLMs (Vision-Language Models), which can process and generate both visual and textual information. He talked about the paradigm shift that happened after OpenAI released the CLIP model and how that was a breakthrough in the computer vision field. CLIP uses an image encoder and a text encoder: if an image and a text have similar semantic meaning (e.g., a picture of a man holding a dog and the text “a man holding a dog”), their embeddings (the encoded image and the encoded text) will be similar too, and the more they differ semantically, the more the embeddings differ. He listed some open-source VLMs and showed the power of VLMs (using Florence-2) with different examples in which the model generated captions for a picture, found different objects, and recognized text, among other tasks. A really great talk for those, like myself, who are interested in AI.
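As a concrete illustration of that shared embedding space, here is a sketch of CLIP-style zero-shot classification; `embedImage` and `embedText` are hypothetical stand-ins for the real encoders.

```typescript
// CLIP-style zero-shot classification sketch. embedImage/embedText are
// hypothetical stand-ins for the real encoders. CLIP embeddings are
// typically L2-normalized, so a dot product acts as cosine similarity.
declare function embedImage(imagePath: string): Promise<number[]>;
declare function embedText(text: string): Promise<number[]>;

const dot = (a: number[], b: number[]) =>
  a.reduce((sum, value, i) => sum + value * b[i], 0);

async function classify(imagePath: string, labels: string[]): Promise<string> {
  const imageVector = await embedImage(imagePath);
  const scores = await Promise.all(
    labels.map(async (label) => ({
      label,
      score: dot(imageVector, await embedText(`a photo of a ${label}`)),
    }))
  );
  // The label whose text embedding sits closest to the image embedding wins.
  return scores.sort((a, b) => b.score - a.score)[0].label;
}
```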

Real-Time Video Streaming with WebSockets in React Native 

Violina Popova (ClipMyHorse.TV) showed us their application, what it is used for, and some of its features. She compared HTTP and WebSockets and explained why WebSockets were the right solution for them. Then she explained how WebSockets work and how to use them in React Native, showing some code from their application.
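For reference, here is a minimal sketch of the standard WebSocket API, which React Native supports out of the box; the URL and message shapes are placeholders, not ClipMyHorse.TV’s code.

```typescript
// Minimal WebSocket client sketch; React Native ships the standard API.
// The URL and message payloads below are placeholders.
const socket = new WebSocket("wss://example.com/stream");

socket.onopen = () => {
  // Unlike HTTP polling, one persistent connection carries all updates.
  socket.send(JSON.stringify({ type: "subscribe", channel: "stream-42" }));
};

socket.onmessage = (event) => {
  const update = JSON.parse(event.data);
  console.log("Server pushed:", update);
};

socket.onerror = (error) => console.warn("WebSocket error:", error);
socket.onclose = () => console.log("Connection closed");
```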

Jitsi Videobridge: the state of the art SFU that powers Jitsi Meet

Jitsi Videobridge is an open-source SFU, already 10 years old. Boris Grozev (8×8) explained the history behind it and its architecture, and also showed its usability, features, and performance. If you are interested in building your own WebRTC application, you may want to take a look at this open-source media server. It is the second one we saw during the conference, after Janus on the first day (in Lorenzo’s MoQ talk).

Building a Low-Latency Voice Assistant leveraging Elixir and Membrane: Insights and Challenges

Enzo Piacenza (Telnyx) walked us through their project to build a low-latency voice assistant and integrate it into their current app, using Membrane to build it. They started with the most basic approach, which had a lacking response time of 6-10 seconds, and worked up to the final product, which responds in 900-1000 milliseconds. He explained how they optimized it by splitting the pipeline into its different processes (like speech-to-text, LLM invocation, and text-to-speech) and optimizing them one by one. They built it entirely from open-source projects. This was a really interesting talk for me, since one of my most recent projects (and my talk) was related to it, and it gave me ideas I could use in other projects.
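One common way to cut latency in such a pipeline, which may or may not match Telnyx’s exact approach, is to stream partial LLM output and start text-to-speech on each complete sentence instead of waiting for the full reply; here is a sketch with hypothetical `llmStream` and `synthesize` stand-ins.

```typescript
// A common STT -> LLM -> TTS latency optimization (not necessarily the
// approach from the talk): flush each complete sentence of streaming LLM
// output to TTS immediately. llmStream/synthesize are hypothetical.
declare function llmStream(prompt: string): AsyncIterable<string>; // token chunks
declare function synthesize(sentence: string): Promise<void>; // plays audio

async function respond(prompt: string): Promise<void> {
  let buffer = "";
  for await (const chunk of llmStream(prompt)) {
    buffer += chunk;
    // Flush every complete sentence to TTS as soon as it appears.
    const match = buffer.match(/^(.*?[.!?])\s*(.*)$/s);
    if (match) {
      const [, sentence, rest] = match;
      buffer = rest;
      await synthesize(sentence); // audio starts long before the LLM finishes
    }
  }
  if (buffer.trim()) await synthesize(buffer); // flush the tail
}
```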

Stop Fighting Hydra – Replacing Headless Chromium

Wojciech Barczyński (Software Mansion) explained that most current approaches to real-time audio and video processing (e.g., video call recording) rely on headless Chromium, FFmpeg, or GStreamer. Then he talked about why and how they took a different approach when building their LiveCompositor. It is a media server that receives input streams (currently RTP or MP4) and composes them into output streams, and you configure it with HTTP requests to the server. He showed some examples of how you can build your video composition with the different HTTP API requests, defining the layout dynamically, for example by adding animations. They used React to keep track of all the state changes. This is still a work in progress, and they are planning to add more to it.
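To illustrate the pattern of plain HTTP calls driving a composition, here is a hypothetical sketch; the endpoint name and JSON shape are made up for illustration and do not reflect LiveCompositor’s actual API.

```typescript
// Hypothetical sketch of the request/response pattern described above.
// NOT LiveCompositor's real API: the endpoint and JSON shape are invented.
// The point is that layout changes are just HTTP calls to the media server.
async function updateLayout(compositorUrl: string): Promise<void> {
  const response = await fetch(`${compositorUrl}/update-scene`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      // Hypothetical scene: two inputs side by side, with a transition.
      layout: "side-by-side",
      inputs: ["input_1", "input_2"],
      transition: { type: "slide", durationMs: 500 },
    }),
  });
  if (!response.ok) {
    throw new Error(`Compositor rejected update: ${response.status}`);
  }
}
```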

Elixir WebRTC – batteries included WebRTC implementation for Elixir ecosystem

After lunch, Michał Śledź (Software Mansion) talked about some of the issues and considerations in building a WebRTC library for Elixir. They decided to add demo applications to show what it can do (e.g., integration with AI, broadcasting, and video conferencing) and to shorten the learning curve for developers who start building with it. He listed some of its features: observability (server-side webrtc-internals), simulcast, tutorials on WebRTC and Elixir WebRTC (including implementation examples in JavaScript and their counterparts in Elixir WebRTC), WHIP/WHEP, and examples, among others. He showed server-side rendering and how simple it is to create a video player in the browser that sets up a peer connection on both the backend and the frontend (still a work in progress), and also how you can get a dashboard with the webrtc-internals. He finished by explaining their roadmap of upcoming features.
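On the browser side, that player pattern looks roughly like this minimal sketch; `sendToBackend` is a hypothetical stand-in for whatever signaling transport carries the offer and answer to the Elixir backend.

```typescript
// Browser side of the player pattern: a receive-only peer connection whose
// incoming track feeds a <video> element. sendToBackend is a hypothetical
// stand-in for your signaling transport to the Elixir backend.
declare function sendToBackend(
  offer: RTCSessionDescriptionInit
): Promise<RTCSessionDescriptionInit>;

async function playRemoteVideo(video: HTMLVideoElement): Promise<void> {
  const pc = new RTCPeerConnection();
  pc.addTransceiver("video", { direction: "recvonly" });

  pc.ontrack = (event) => {
    video.srcObject = new MediaStream([event.track]); // attach incoming video
  };

  await pc.setLocalDescription(await pc.createOffer());
  const answer = await sendToBackend(pc.localDescription!);
  await pc.setRemoteDescription(answer);
}
```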

The next talk was mine! Alfred Gonzalez Trastoy from WebRTC.ventures.

Boosting Inclusivity: Closed Captioning & Translations in WebRTC

I explained some key concepts of speech recognition and translation, the options you have (third-party and open-source) for speech-to-text and translation services, and what to consider when choosing a service. Then I dived into a real-world scenario in which I used a third-party speech service to add closed captions and translations to live meetings, and mentioned some of the challenges I ran into and the solutions I found while implementing it. I ended with a small demo in which I switched between speaking English and Spanish while live transcription and translation ran, with the screen showing the results in English, Spanish, and Polish. The benefit of adding real-time closed captions and translations is that you increase the inclusivity of your WebRTC application.
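As a taste of what live transcription looks like in the browser, here is a sketch using the Web Speech API; note this is an illustration, not the third-party speech service I used in the talk.

```typescript
// Illustrative sketch using the browser's Web Speech API (not the
// third-party service from the talk). It prints interim (live) and final
// transcripts, which are the raw material for captions.
const SpeechRecognitionCtor =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

const recognition = new SpeechRecognitionCtor();
recognition.lang = "en-US"; // switch to "es-ES" for Spanish
recognition.continuous = true; // keep listening across utterances
recognition.interimResults = true; // emit partial hypotheses while speaking

recognition.onresult = (event: any) => {
  const result = event.results[event.results.length - 1];
  const caption = result[0].transcript;
  // Interim captions update in place; final ones can be sent for translation.
  console.log(result.isFinal ? `FINAL: ${caption}` : `…${caption}`);
};

recognition.start();
```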

WebRTC: The Kubernetes way

Mate Nagy (L7mp Technologies) was up next. He started by listing some of the challenges of building and scaling WebRTC applications on Kubernetes. He gave a brief explanation of their project STUNner, which rethinks TURN functionality by treating it like an ingress gateway, making it easier to integrate into Kubernetes deployments. He showed some of the common needs their customers have when they try to use Kubernetes for WebRTC applications, grouping the problems into networking, load balancing, and scalability. He showed a graph of the most popular cloud providers and media servers their clients use, along with some unusual use cases. He finished by listing some cases where you should not consider using Kubernetes.

The last speaker of the conference was Ali C. Begen (Professor at Ozyegin University). 

DASH and Media-over-QUIC Transport Face-Off: Performance Showdown at Low Latency

Ali is one of the minds behind the new and upcoming Media-over-QUIC Transport (MOQT) protocol. He gave an introduction to DASH and HLS, and then to MOQT. He compared WebRTC to the others, showed where each protocol fits (from higher to lower latency), and explained where they are trying to fit MOQT: ultra-low latency, around one second, between LL-DASH and WebRTC. He explained the benefits of MoQ and the things they are working on. A really interesting talk, and I’m looking forward to what comes next.

Sailing Away

After all the talks, we had a few hours to rest, and then we had dinner and drinks on a boat restaurant on the Vistula river, which runs through Krakow. It was a nice place to expand your network and talk with everyone about the conference, projects, and life’s mysteries.

It was a great event: I met awesome and interesting people, heard all the talks, and explored and picked up new ideas. It was also my first time speaking at an in-person conference, and the experience was great. I hope I can join the RTC.ON conference next year as well. If so, I will make sure I have time to visit beautiful Krakow and its surroundings!
