WebRTC Media Servers

The WebRTC ecosystem is vast and can be intimidating for newcomers. When I first tried to understand WebRTC, I remember coming across an incredible number of acronyms. This article provides a guide to WebRTC media servers and a few open-source options, such as Kurento, Janus, Jitsi, and more. I will also aim to lower the technical barrier to understanding WebRTC’s business value.

2020 Update: WebRTC has become a preferred technology for sending low-latency video, voice, and data. While it was originally designed for direct communication between peers (browsers), today we’re building much more complex and scalable applications that would be limited by bandwidth and CPU if we followed the direct peer-to-peer approach. For this reason, WebRTC media servers have become necessary in real-time applications to overcome these limitations.

What is a WebRTC Server?

Since the early days of WebRTC, one of the main selling points of the tech was that it allowed peer-to-peer (browser-to-browser) communication with little intervention from a server, which is usually used only for signaling. This is why the concept of a WebRTC media server may seem counterintuitive.

Below, I’ll try to illustrate why media servers are useful, what type of features they normally offer, and which open-source alternatives the user has at their disposal.

Multiple participants in a video call

Fig. 1: WebRTC conference with mesh architecture

While it’s possible to hold video calls with multiple participants using peer-to-peer communication (Fig. 1, mesh architecture), it stops being practical as the number of participants increases. This is because each peer must send its video/audio stream to every other participant while also receiving a video/audio stream from each of them. In a call with n participants, that means n - 1 outgoing and n - 1 incoming streams per peer.

In practice, even under optimal network conditions, a mesh video call doesn’t work well beyond five participants. This is where a media server comes in handy as it helps reduce the number of streams a client needs to send, usually to one, and can even reduce the number of streams a client needs to receive, depending on the media server’s capabilities.
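To make the difference concrete, here is a minimal sketch of the arithmetic. It only counts streams per client. The function names and the per-stream bitrate are my own assumptions for illustration, not measurements or part of any WebRTC API.

```python
from typing import Tuple

# Hypothetical per-stream video bitrate, chosen only for illustration.
ASSUMED_KBPS_PER_STREAM = 1_000


def mesh_streams(participants: int) -> Tuple[int, int]:
    """Return (uplink, downlink) stream counts for one client in a full mesh."""
    if participants < 1:
        raise ValueError("a call needs at least one participant")
    others = participants - 1
    # One upload and one download per other participant.
    return others, others


def sfu_streams(participants: int) -> Tuple[int, int]:
    """Return (uplink, downlink) stream counts for one client behind a relay."""
    if participants < 1:
        raise ValueError("a call needs at least one participant")
    others = participants - 1
    # One upload to the server (none if alone), one download per other participant.
    return min(1, others), others


for n in (2, 5, 10):
    mesh_up, _ = mesh_streams(n)
    sfu_up, _ = sfu_streams(n)
    print(
        f"{n} participants: mesh uplink {mesh_up * ASSUMED_KBPS_PER_STREAM} kbps, "
        f"relay uplink {sfu_up * ASSUMED_KBPS_PER_STREAM} kbps"
    )
```

The uplink is usually the scarcer resource on consumer connections, which is why moving from n - 1 uploads to a single upload makes such a difference. The sketch ignores encoding cost, which also grows with every extra outgoing stream in a mesh.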

When a media server acts as this kind of media relay, it is usually called a selective forwarding unit (SFU). Its main purpose is to forward media streams between clients.

There’s also the multipoint control unit (MCU), a term for a media server that not only forwards the media streams that go through it but can also operate on them. An example of this is mixing all video or audio streams into a single one.
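The contrast between forwarding and mixing can be shown with a toy model. This is only a sketch under heavy assumptions: "streams" are short lists of integers standing in for audio samples, and the function names are hypothetical, not the API of any real media server.

```python
from typing import Dict, List


def sfu_forward(streams: Dict[str, List[int]]) -> Dict[str, Dict[str, List[int]]]:
    """Forwarding: each receiver gets every other participant's stream, untouched."""
    return {
        receiver: {
            sender: samples
            for sender, samples in streams.items()
            if sender != receiver
        }
        for receiver in streams
    }


def mcu_mix(streams: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Mixing: each receiver gets one stream, the sum of everyone else's samples."""
    mixed = {}
    for receiver in streams:
        others = [s for sender, s in streams.items() if sender != receiver]
        # zip(*others) walks the streams sample by sample; sum() mixes them.
        mixed[receiver] = [sum(frame) for frame in zip(*others)]
    return mixed


# Toy "audio": three participants, three integer samples each.
call = {"alice": [1, 2, 3], "bob": [10, 20, 30], "carol": [100, 200, 300]}

forwarded = sfu_forward(call)
mixed = mcu_mix(call)

print(len(forwarded["alice"]))  # number of separate streams alice receives when forwarding
print(mixed["alice"])           # the single mixed stream alice receives
```

The trade-off is visible even in the toy: forwarding keeps the server cheap but hands each client several streams to decode, while mixing hands each client a single stream at the cost of the server doing the work. In a real MCU that work includes decoding and re-encoding the media.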

Video recording

One of the main benefits of having all video streams go through a media server (cluster) is that the media can be recorded and stored for any purpose. This would be difficult to do on a mesh architecture — if it is possible at all.

Integration with other communication technologies

Another advantage of using a media server is communicating with systems beyond what web technologies allow, such as the PSTN via SIP trunking or streaming through RTMP to services that support it, like Facebook Live and YouTube Live Streaming.

You can see this in one of our previous blog posts, in which Kurento Media Server is used to connect a video call between a browser and a SIP phone.

Processing media streams

Some media servers allow processing of the video and audio streams at a very low level, such as running computer vision models on the video or sending the audio stream to a speech recognition engine, such as Google Speech. These are features that take WebRTC to another level. In my opinion, they allow for richer and more innovative real-time interactions that can add a lot of value to an otherwise normal communication platform.

We’ve also discussed this subject before. In this blog post, we demonstrated how to use Kurento Media Server to build a live streaming application with real-time image detection.

Which OSS media server options are available?

As mentioned before, the WebRTC ecosystem is vast, and there are quite a few open-source options on the market. These are some of the most mature and popular ones:

Jitsi Platform

Jitsi is not just a WebRTC media server. It has a whole platform built around it! The Jitsi family of products includes Jitsi Videobridge (media relay, SFU), Jitsi Meet (conference web client), Jicofo (Jitsi Conference Focus), Jigasi (Jitsi Gateway to SIP), Jitsi SIP Phone, and others. The most appealing feature of the Jitsi platform is that it includes everything a communication platform needs to get up and running in a few hours. It implements its own signaling using Jingle (XMPP) and offers a fully featured web interface. It also provides SDKs for web, Android, iOS, React Native, and Electron apps. Sadly, however, one of the biggest pain points is implementing media recording, as there’s no solid, easy-to-use solution.

Kurento Media Server

Kurento is one of the most versatile solutions out there. Not only is it a media server, but it’s also a toolkit for building one. It introduces the concept of a media workflow (a media pipeline, in Kurento’s terms), in which you define, in code, how and where the media flows. This allows a WebRTC developer to compose and integrate interesting features, such as computer vision (e.g., QR code recognition, face detection), real-time media modification, and interop with RTP (VoIP) services. Kurento can also be configured to function as an SFU, an MCU, or both in a single instance. Updated in 2020: A few years ago, the Kurento team started OpenVidu, a platform that makes Kurento’s functionality easier to use from a higher-level client in your web or mobile applications.
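The idea of defining the media flow in code can be illustrated with a toy pipeline. To be clear, this is not Kurento’s API: the Element class, its methods, and the string "frames" are all hypothetical stand-ins I made up to show the concept of connecting elements into a flow.

```python
from typing import Callable, List

Frame = str  # stand-in for a real video frame


class Element:
    """A node in the pipeline: transforms a frame and passes it downstream."""

    def __init__(self, name: str, transform: Callable[[Frame], Frame]) -> None:
        self.name = name
        self.transform = transform
        self.sinks: List["Element"] = []
        self.frames: List[Frame] = []  # frames this element has produced so far

    def connect(self, sink: "Element") -> "Element":
        """Route this element's output into `sink`; returns `sink` for chaining."""
        self.sinks.append(sink)
        return sink

    def push(self, frame: Frame) -> None:
        """Transform the frame, keep a copy, and hand it to every connected sink."""
        out = self.transform(frame)
        self.frames.append(out)
        for sink in self.sinks:
            sink.push(out)


# Build: incoming stream -> overlay filter -> viewer, plus a recorder branch.
incoming = Element("incoming", lambda f: f)
overlay = Element("overlay", lambda f: f + "+overlay")
viewer = Element("viewer", lambda f: f)
recorder = Element("recorder", lambda f: f)

incoming.connect(overlay).connect(viewer)
incoming.connect(recorder)  # the unfiltered stream is also recorded

incoming.push("frame1")
print(viewer.frames)    # ['frame1+overlay']
print(recorder.frames)  # ['frame1']
```

The point of the sketch is that the topology lives in ordinary code: adding a recording branch or a filter is one more connect call rather than a change to the server itself.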

Janus WebRTC Gateway

While its description doesn’t mention “media server” anywhere, Janus can easily be set up as an SFU. One of its most notable features is its plugin architecture, which allows you to augment the service’s core capabilities.

mediasoup

This is a relatively new and interesting media server. What makes it different from the rest is that it’s designed to be a library (for Node.js), allowing it to be integrated into bigger applications.

More options added in 2020

There are other popular platforms that weren’t originally developed to be WebRTC media servers but have WebRTC media server capabilities.

Asterisk

Asterisk is an open-source framework for building communications applications. Asterisk turns an ordinary computer into a communications server and powers IP PBX systems, VoIP gateways, conference servers, and other custom solutions. Although it is mostly used in telephony applications, it also supports WebRTC and is frequently used in conjunction with JsSIP or SIP.js (SIP over WebSockets JavaScript libraries) to connect web apps with the telephony network.

FreeSWITCH

FreeSWITCH is a Software Defined Telecom Stack enabling the digital transformation from proprietary telecom switches to a versatile software implementation that runs on any commodity hardware. Similar to Asterisk, FreeSWITCH’s core functionality is in the telephony field, but it also supports WebRTC and has built-in modules for handling video conferencing. With modules such as Verto, it’s possible to establish WebRTC video calls between web clients and SIP clients.

Pion

Pion is an interesting new stack for Web Real-Time Communications. Pion is written in Go and lets developers use the pieces of the WebRTC stack like small Lego bricks. Although it isn’t a WebRTC media server, you can build one with the functions Pion exposes.

Final thoughts

We hope that this article helped demystify the concept of WebRTC media servers, explain the features they offer, and provide a few open-source options that are available to you.

Contact us to build your WebRTC app!

Would your business benefit from a WebRTC real-time video and audio chat-based application? Are you ready to discuss how you can incorporate a real-time communications solution into your business? We have an experienced team ready and happy to help you out. Contact us today!
