WebRTC allows two users to communicate in a peer-to-peer fashion, with no media server in the path (a signaling server is still needed to set up the call). This is an optimal approach in terms of cost and quality: media flows directly between the peers without any additional hops, and with no media infrastructure involved, there’s less cost and less complexity.

When we want to add more users or more advanced capabilities, such as recording or simulcast, we need to put a server in the middle. This is where media servers come in.

The main challenge for a media server is to scale beyond a single server while keeping media latency low. It’s important to find an optimal path between the endpoints in a conference.

One-to-One vs. Multiparty

In a one-to-one video conference, the ICE protocol is in charge of several things: (1) selecting the best route between the peers, (2) establishing a direct connection, and (3) relying on a TURN server when a direct connection is not possible. In this scenario, the ICE protocol does a pretty good job.
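One way to see how ICE ranks routes is the candidate priority formula recommended by RFC 8445, which combines a type preference, a local preference, and the component ID. The sketch below uses the RFC’s recommended type preferences and assumes a single network interface (maximum local preference) and a single component; real ICE agents may choose other values, and a route is only used after its connectivity checks succeed.

```python
# Recommended type preferences from RFC 8445: host candidates are preferred,
# relayed (TURN) candidates are the last resort.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535, component_id: int = 1) -> int:
    """ICE candidate priority per the RFC 8445 formula:
    2^24 * type preference + 2^8 * local preference + (256 - component ID)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


# Rank candidate types from most to least preferred.
ranked = sorted(TYPE_PREFERENCE, key=candidate_priority, reverse=True)
print(ranked)                      # ['host', 'prflx', 'srflx', 'relay']
print(candidate_priority("host"))  # 2130706431
```

Because relayed candidates get the lowest type preference, a TURN route is typically selected only when no higher-priority pair passes its connectivity checks.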

However, when there are more than two users in a video conference, things get more complicated. ICE alone is not sufficient to provide a robust and scalable platform, because in a peer-to-peer mesh every participant has to send its media to every other participant. So we use a media server.

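Many CPaaS providers rely on media servers that implement a Selective Forwarding Unit (SFU) mechanism. An SFU receives each participant’s media and forwards it to the other participants without mixing it, which provides a good balance between cost and performance.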
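To see why a mesh stops scaling while an SFU does not, it helps to count streams per client. This is a rough sketch that assumes every participant sends one video stream to each peer in a mesh, and exactly one stream (no simulcast layers) to an SFU, which forwards it to everyone else.

```python
def mesh_streams(n: int) -> tuple[int, int]:
    """(sent, received) video streams per client in a full mesh of n participants."""
    return n - 1, n - 1


def sfu_streams(n: int) -> tuple[int, int]:
    """(sent, received) video streams per client when an SFU forwards media.
    Assumes one published stream per client (no simulcast)."""
    return 1, n - 1


for n in (2, 4, 8):
    print(n, "mesh:", mesh_streams(n), "sfu:", sfu_streams(n))
```

With two users both models are identical, which is why plain peer-to-peer works well one-to-one. With eight users, each client in a mesh must send seven streams, while with an SFU it sends one and the server handles the fan-out.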

Geolocation in the Cloud

Cloud vendors, such as AWS, make it easy to provision sets of servers in multiple regions and provide services that allow users in those regions to connect to the server closest to them. While this behavior was originally intended for classic web applications, video conferencing can also benefit from it, because real-time communication applications are very sensitive to network conditions such as throughput, delay, and packet loss.
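From the client’s point of view, “connect to the closest server” can be sketched as picking the region with the lowest measured round-trip time. This assumes the client has already measured an RTT to each region (for example, by timing a small request); the region names are used only as labels and the numbers are hypothetical. In practice this choice is often made by the provider’s routing services rather than in application code.

```python
def closest_region(rtt_ms: dict[str, float]) -> str:
    """Return the region with the lowest measured round-trip time (milliseconds)."""
    if not rtt_ms:
        raise ValueError("no region measurements available")
    return min(rtt_ms, key=lambda region: rtt_ms[region])


# Hypothetical RTTs measured from a client in Europe.
measurements = {"us-east-1": 85.0, "eu-west-1": 22.0, "ap-southeast-1": 190.0}
print(closest_region(measurements))  # eu-west-1
```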

With the growth of WebRTC, conferencing applications are often implemented as web applications. This makes it easier to reuse technologies for geolocation.

Single Server vs. Distributed Conferences

Two popular approaches for hosting video conferences are single server and distributed conferences.

The single server approach means that each conference is hosted on a single server to which all of its clients connect. The capacity of that server determines the maximum room size and the number of users it can support.
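To make the capacity limit concrete: if every participant publishes one stream and receives everyone else’s, an SFU forwards n(n-1) streams for a room of n users, so the forwarding load grows quadratically. The sketch below assumes the server’s capacity can be summarized as a single, hypothetical number of forwarded streams; real limits also depend on bitrate, CPU, and network bandwidth.

```python
def sfu_forwarded_streams(n: int) -> int:
    """Streams an SFU sends out when each of n participants publishes one
    stream and receives every other participant's stream."""
    return n * (n - 1)


def max_room_size(stream_capacity: int) -> int:
    """Largest room size whose forwarded-stream count fits within stream_capacity."""
    n = 1
    while sfu_forwarded_streams(n + 1) <= stream_capacity:
        n += 1
    return n


# Hypothetical server able to forward 1000 streams at once.
print(max_room_size(1000))  # 32
```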

In a geolocated infrastructure, the location of the first user usually determines the location of the server where the conference is hosted. 
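The “first user decides” rule can be sketched as a small registry that pins a room to the region of its first participant. The class and its methods are hypothetical, shown only to illustrate the assignment logic.

```python
class ConferenceDirectory:
    """Hypothetical registry mapping each room to the region that hosts it."""

    def __init__(self) -> None:
        self._room_region: dict[str, str] = {}

    def join(self, room_id: str, user_region: str) -> str:
        """Return the region hosting room_id; the first participant's region wins."""
        # setdefault stores user_region only if the room has no region yet.
        return self._room_region.setdefault(room_id, user_region)


directory = ConferenceDirectory()
print(directory.join("standup", "eu-west-1"))  # eu-west-1 (first user picks the region)
print(directory.join("standup", "us-east-1"))  # eu-west-1 (later users follow the room)
```

The second user connects from us-east-1 but is still served from eu-west-1, so every later participant inherits the first user’s location.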

This works best when all users are in the same region. When users are spread across regions, those far from the hosting server experience higher latency and are more exposed to packet loss. In these cases, the next approach may be a better fit.

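Distributed conferences allow you to host a single conference across multiple media servers by cascading them. Each extra server adds a hop, which can increase the end-to-end round-trip time, but because every user connects to a nearby server, the latency on each user’s own connection is reduced.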

This improves the scalability of the infrastructure but increases its complexity and related costs. A proper study should be performed to determine whether the additional complexity, cost, and RTT of extra servers are justified by the benefits gained during a conference.
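A first step in such a study can be a simple additive delay model: sum the one-way delay of every hop and add a fixed forwarding cost per media server. All numbers below are hypothetical, chosen only to illustrate the trade-off; a real study would use measured values and also account for packet loss and jitter.

```python
def path_delay_ms(hops_ms: list[float], forwarding_ms: float) -> float:
    """One-way media delay: network delay of every hop plus a fixed forwarding
    cost for each media server on the path (servers = hops - 1)."""
    servers = len(hops_ms) - 1
    return sum(hops_ms) + servers * forwarding_ms


# Hypothetical one-way delays for Alice (Europe) sending to Bob (US East).
single = [15.0, 45.0]         # Alice -> EU server -> Bob
cascaded = [15.0, 40.0, 8.0]  # Alice -> EU server -> US server -> Bob
print(path_delay_ms(single, forwarding_ms=2.0))    # 62.0
print(path_delay_ms(cascaded, forwarding_ms=2.0))  # 67.0
# Longest user-to-server leg in each setup.
print(max(single[0], single[-1]), max(cascaded[0], cascaded[-1]))  # 45.0 15.0
```

In this example, cascading adds 5 ms end to end, but Bob’s own leg to his server drops from 45 ms to 8 ms, and the longest user-to-server leg drops from 45 ms to 15 ms.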


Media servers are undoubtedly important for WebRTC, especially for video conferences with more than two users. In these cases, a geolocated scaling mechanism can improve call quality by letting users in different regions connect to nearby servers, which reduces network issues. Just be sure to weigh both the benefits and the costs when deciding between single server and distributed conferences.
