Real-time communication for the web has changed substantially since its inception in 2010. While WebRTC has become the preferred technology for sending low-latency video, voice, and data, it was originally designed for direct one-to-one communication between peers (browsers). Today, we are building far more complex and scalable applications that would be limited by bandwidth and CPU if we followed the direct peer-to-peer approach. For that reason, WebRTC media servers have become a must-have in real-time applications to ease those limitations.
What is a WebRTC media server?
As its name indicates, a WebRTC media server is a server optimized to efficiently receive and send media. One of WebRTC’s main selling points was that it allowed peer-to-peer communication with little intervention from a server, which is usually needed only for signaling. However, in some situations, the peer-to-peer approach is not recommended or even possible. In this blog post, we will discuss why that is and which open source media server solutions are available.
Multiple participants in a video call
While it’s possible to hold video calls with multiple participants using peer-to-peer communication, it stops being practical as the number of participants increases. This is because each peer must send its video/audio stream to every other participant while also receiving a video/audio stream from each of them, so in a call with N participants every client handles N-1 outgoing and N-1 incoming streams. This is where a media server comes in handy: it reduces the number of streams a client needs to send (usually to one) and, depending on the media server’s capabilities, can even reduce the number of streams a client needs to receive.
When a media server acts as this kind of media relay, it is usually called an SFU, or Selective Forwarding Unit. Its main purpose is to forward media streams between clients. There is also the MCU, or Multipoint Control Unit (sometimes called a multipoint conferencing unit). This term refers to a media server that not only forwards the media streams that go through it but can also operate on them. An example of this is mixing all video or audio streams into a single one.
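To make the scaling argument concrete, here is a minimal sketch that counts the streams each client handles in a full mesh versus a call relayed through a media server. The function names and the 1,500 kbps per-stream bitrate used in the usage note are illustrative assumptions, not measurements, and the model ignores features such as simulcast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamCount:
    upstream: int    # streams each client sends
    downstream: int  # streams each client receives


def _others(participants: int) -> int:
    """Number of remote participants; a call needs at least two people."""
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    return participants - 1


def mesh_streams(participants: int) -> StreamCount:
    """Full mesh: each client sends to, and receives from, every other client."""
    others = _others(participants)
    return StreamCount(upstream=others, downstream=others)


def relay_streams(participants: int) -> StreamCount:
    """Media server relay: one upload per client; the server fans it out."""
    others = _others(participants)
    return StreamCount(upstream=1, downstream=others)


def upload_kbps(count: StreamCount, bitrate_kbps: int) -> int:
    """Total upload bandwidth per client for an assumed per-stream bitrate."""
    return count.upstream * bitrate_kbps
```

For a 10-person call at an assumed 1,500 kbps per stream, upload_kbps(mesh_streams(10), 1500) gives 13,500 kbps per client, while upload_kbps(relay_streams(10), 1500) gives 1,500 kbps. The download side is unchanged in this simple relay model, which is why some servers go further and reduce the number of received streams as well.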
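The difference between the two roles can be sketched with a toy model. This is only an illustration under simplifying assumptions: sfu_forward and mcu_mix are hypothetical names, packets are opaque bytes, and audio frames are plain lists of 16-bit PCM samples. A real MCU also decodes and re-encodes media and typically leaves each listener's own audio out of the mix.

```python
from typing import Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


def sfu_forward(sender: str, packet: bytes, participants: List[str]) -> Dict[str, bytes]:
    """SFU behavior: relay the sender's packet, untouched, to everyone else."""
    return {name: packet for name in participants if name != sender}


def mcu_mix(frames: Dict[str, List[int]]) -> List[int]:
    """MCU behavior: combine every participant's audio frame into one frame.

    Samples at the same position are summed and clipped to the int16 range.
    """
    if not frames:
        return []
    lengths = {len(frame) for frame in frames.values()}
    if len(lengths) != 1:
        raise ValueError("all frames must have the same number of samples")
    summed = (sum(samples) for samples in zip(*frames.values()))
    return [max(INT16_MIN, min(INT16_MAX, s)) for s in summed]
```

With three participants, sfu_forward produces two outgoing copies of each packet (one per receiver), whereas mcu_mix produces a single frame no matter how many participants contribute. That is the trade-off in a nutshell: the SFU keeps server CPU low at the cost of more downstream streams, and the MCU spends server CPU to give each client one stream.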
Integration with other communication technologies
Another advantage of using a media server is communicating with systems beyond what web technologies allow. An example is the PSTN via SIP trunking. Another example is streaming through RTMP to services that support it, like Facebook Live and YouTube Live Streaming. In a separate blog post, we go through the development of a simple video conferencing application interacting with SIP clients.
Processing of media streams
Some media servers allow the processing of video and audio streams at a very low level, like running computer vision models on the video or sending the audio stream to a speech recognition engine.
We’ve discussed this topic before. In an earlier blog post, we demonstrated how to build a live streaming application capable of real-time image detection.
Which OSS WebRTC media server options are available?
These are some of the most mature and popular ones:
Jitsi is not just a WebRTC media server. It has a whole platform built around it! The Jitsi family of products includes Jitsi Videobridge (media relay, SFU), Jitsi Meet (conference web client), Jigasi (Jitsi Gateway to SIP), and others. The most appealing feature of the Jitsi platform is that it includes everything needed to get a communication platform up and running in a few hours. It implements its own signaling using Jingle (XMPP) and comes with a fully featured web interface. It also provides SDKs for Web, Android, iOS, React Native, and Electron apps.
Kurento is not only a media server but also a toolkit to build one. The main advantage of Kurento is its versatility. It introduces the concept of a media pipeline, which defines in code how and where the media flows. This allows a WebRTC developer to compose and integrate interesting features, such as computer vision (face detection, QR code recognition), real-time media modification, and interop with RTP (VoIP) services. Kurento can also be configured to function as an SFU, an MCU, or both in a single instance. A few years ago, the Kurento team started OpenVidu, a platform that makes Kurento’s functionality easier to use from a higher-level client in your web or mobile applications.
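The idea of declaring in code how media flows between processing steps can be sketched with a toy model. This is not Kurento’s client API: MediaElement, Collector, and demo are hypothetical names, and frames are plain dictionaries standing in for real media.

```python
from typing import Callable, Dict, List, Optional

Frame = Dict[str, object]  # hypothetical stand-in for a media frame


class MediaElement:
    """A processing step that transforms frames and passes them downstream."""

    def __init__(self, name: str, process: Optional[Callable[[Frame], Frame]] = None):
        self.name = name
        self._process = process or (lambda frame: frame)
        self._sinks: List["MediaElement"] = []

    def connect(self, sink: "MediaElement") -> "MediaElement":
        """Route this element's output into sink; returns sink for chaining."""
        self._sinks.append(sink)
        return sink

    def push(self, frame: Frame) -> None:
        """Process one frame and hand the result to every connected sink."""
        out = self._process(frame)
        for sink in self._sinks:
            sink.push(out)


class Collector(MediaElement):
    """Terminal element that stores what it receives (e.g. a viewer or recorder)."""

    def __init__(self, name: str):
        super().__init__(name)
        self.received: List[Frame] = []

    def push(self, frame: Frame) -> None:
        self.received.append(frame)


def demo() -> Dict[str, List[Frame]]:
    """Incoming media goes through an overlay filter to a viewer, and raw to a recorder."""
    source = MediaElement("webrtc_in")
    overlay = MediaElement("overlay", lambda f: {**f, "overlay": "hat"})
    viewer, recorder = Collector("viewer"), Collector("recorder")
    source.connect(overlay).connect(viewer)
    source.connect(recorder)
    source.push({"seq": 1})
    return {"viewer": viewer.received, "recorder": recorder.received}
```

The point of the design is that the topology lives in the connect calls: adding a recorder or swapping the filter changes a line of wiring rather than the elements themselves.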
Other OSS alternatives with WebRTC media server capabilities
There are some other open source WebRTC media servers that are fairly popular, too:
This is a relatively new and interesting media server. What makes it different is that it’s designed to be a library (for Node), allowing for integration with bigger applications.
There are other popular platforms that weren’t originally developed to be WebRTC media servers but have WebRTC media server capabilities:
FreeSWITCH is a Software Defined Telecom Stack enabling the digital transformation from proprietary telecom switches to a versatile software implementation that runs on any commodity hardware. Similar to Asterisk, FreeSWITCH’s core functionality is in the telephony field, but it also supports WebRTC and has built-in modules for handling video conferencing. With modules such as Verto, it’s possible to establish WebRTC video calls between web clients and SIP clients.
Pion is an interesting new stack for Web Real-Time Communications. Pion is written in Go and lets developers use the WebRTC stack as small, Lego-like building blocks. Although it isn’t a WebRTC media server itself, you can build one with the functions Pion exposes. This WebRTC SFU is an example.
There are many options for WebRTC media servers, and all of them have pros and cons. When choosing an open source WebRTC media server, we should gather the user requirements to choose the best fit for the use case.
Contact us to help you with your WebRTC media servers and build your real-time applications!
Note: This is an update to our original blog post from 2017.