Session Initiation Protocol (SIP) and WebRTC are both essential technologies in the field of real-time communications, particularly for voice and video over IP. While they serve complementary roles, they operate differently and have distinct functionalities.

In this post, we explore how to architect the integration of WebRTC and SIP using Multipoint Control Unit (MCU) and Selective Forwarding Unit (SFU), enabling businesses to modernize their communication infrastructure without abandoning existing systems.

Understanding the Relationship Between SIP and WebRTC

SIP is a signaling protocol used primarily for initiating, maintaining, and terminating real-time communication sessions that involve multimedia elements such as voice, video, and instant messaging. Its main responsibility is to manage the negotiation that takes place before the conversation starts, which includes:

  • User Location: Identifying where an endpoint is located.
  • User Availability: Determining if the intended recipient can accept a call.
  • User Capabilities: Assessing what media types can be used in the session.
  • Session Setup: Establishing the parameters of the communication.

SIP does not handle the transmission of media itself; instead, it sets up the connection through which media can flow, typically using protocols like Real-time Transport Protocol (RTP) for actual media transfer.
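To make the split between signaling and media concrete, here is a minimal sketch of a SIP INVITE whose body is an SDP offer for a single audio stream. The addresses, port, and the `build_invite` helper are illustrative assumptions, not part of any particular SIP stack; a real user agent would also handle authentication, retransmissions, and dialog state.

```python
import uuid


def build_invite(caller: str, callee: str, local_ip: str, rtp_port: int) -> str:
    """Build a minimal SIP INVITE carrying an SDP offer for one PCMU audio stream."""
    # SDP body: describes the media session that RTP will carry later.
    sdp_lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {local_ip}",
        "s=-",
        f"c=IN IP4 {local_ip}",
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP 0",  # payload type 0 = PCMU (G.711 mu-law)
        "a=rtpmap:0 PCMU/8000",
    ]
    body = "\r\n".join(sdp_lines) + "\r\n"

    # SIP part: only signaling, no media. Branch values must start with z9hG4bK.
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    user = caller.split("@")[0]
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:5060;branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{callee}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{local_ip}:5060>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + body


if __name__ == "__main__":
    # Hypothetical endpoints using documentation-only addresses.
    print(build_invite("alice@example.com", "bob@example.com", "192.0.2.10", 40000))
```

Note that the INVITE only announces where the caller expects to receive RTP (the `c=` and `m=` lines); the audio itself never travels inside a SIP message.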

WebRTC, on the other hand, is a set of APIs and protocols that enable real-time communication directly between web browsers and other devices without requiring plugins. It allows audio, video, and data to be shared directly through web applications. Key features of WebRTC include:

  • Peer-to-Peer Communication: Facilitates direct connections between users’ devices.
  • Encryption of Data: Encrypts all media and data in transit; encryption is mandatory in WebRTC and cannot be switched off.
  • Media Handling: Manages audio and video streams effectively.
  • Data Channels: Supports arbitrary data transfer alongside media streams.

WebRTC is designed to simplify the integration of real-time communication into web applications by providing a comprehensive framework that integrates with any kind of signaling mechanism, and takes care of both handling and securing media transmission.

How SIP and WebRTC Work Together

WebRTC can use SIP as one of its signaling methods to establish connections. This means that when a WebRTC application needs to set up a call or session, it can either send SIP messages itself or have its own signaling messages translated into something a SIP endpoint understands, and vice versa, to negotiate the connection parameters.
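As a rough illustration of the "translation" idea, the sketch below maps a made-up JSON signaling vocabulary (the kind a WebRTC application might send over its own channel) onto SIP start lines. The message types `offer`, `answer`, and `hangup`, the `to` field, and the function name are all assumptions for this example; real gateways work with complete SIP messages and dialog state.

```python
import json

# Hypothetical mapping from an app-defined JSON vocabulary to SIP.
JSON_TO_SIP = {
    "offer": "INVITE",   # start a session; the SDP offer goes in the body
    "answer": "200 OK",  # accept a session; the SDP answer goes in the body
    "hangup": "BYE",     # end an established session
}


def to_sip_start_line(raw: str) -> str:
    """Translate one JSON signaling message into a SIP request or status line."""
    msg = json.loads(raw)
    kind = msg["type"]
    if kind not in JSON_TO_SIP:
        raise ValueError(f"no SIP equivalent for message type {kind!r}")
    sip = JSON_TO_SIP[kind]
    if sip[0].isdigit():
        # Responses start with the protocol version followed by the status.
        return f"SIP/2.0 {sip}"
    # Requests start with the method, then the target URI, then the version.
    return f"{sip} sip:{msg['to']} SIP/2.0"


if __name__ == "__main__":
    print(to_sip_start_line('{"type": "offer", "to": "bob@example.com"}'))
    print(to_sip_start_line('{"type": "answer"}'))
```

Translating the signaling is only part of the job. WebRTC media requires ICE and DTLS-SRTP, while many SIP endpoints expect plain RTP, so a gateway or media server usually has to bridge the media path as well.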

This allows the many organizations that already use SIP for their VoIP systems to integrate with WebRTC applications, enhancing their communication capabilities without completely overhauling their existing infrastructure.

This integration enables use cases ranging from dialing in and out of video conferences to complex scenarios where users on legacy telephony systems interact with others in modern web applications.

But what does this integration entail, and how can it be architected efficiently? To answer that, let’s briefly discuss the MCU and SFU approaches.

Enter MCU and SFU

When discussing video conferencing architectures, Multipoint Control Unit (MCU) and Selective Forwarding Unit (SFU) are two prominent models that serve different purposes and have distinct operational characteristics.

Both are types of media server that sit between the participants of a video or audio conference and process their media streams.

MCU

First came MCU, which combines all incoming streams into a single mixed stream before sending it out to participants. Each participant sends one stream to the MCU and receives one combined stream back, which simplifies client-side processing but increases server-side resource demands.

This process involves significant CPU usage on the server side due to the mixing of streams, which can lead to higher costs for server infrastructure. MCUs may also introduce higher latency because of the time taken to mix and encode streams before sending them out.

However, this approach reduces the processing burden on client devices since they only need to decode a single stream. This makes it a perfect fit for audio interactions on low-end devices, which is a common case for many VoIP telephony systems using SIP.
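To give a feel for what "mixing" means on the server, here is a minimal sketch of audio mixing over already decoded 16-bit PCM frames. It assumes every frame has the same length and sample rate; a real MCU also decodes each incoming stream first, re-encodes the result, and often excludes a participant's own audio from the mix they receive. The `mix_frames` name is an illustrative choice.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(frames: Sequence[Sequence[int]]) -> List[int]:
    """Mix equally long frames of 16-bit PCM samples into a single frame."""
    if not frames:
        return []
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must have the same number of samples")

    mixed = []
    for samples in zip(*frames):
        total = sum(samples)
        # Clamp to the 16-bit range so loud inputs saturate instead of wrapping.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


if __name__ == "__main__":
    alice = [1000, -2000, 30000]
    bob = [500, 500, 30000]
    print(mix_frames([alice, bob]))
```

The per-sample work here is trivial; the CPU cost mentioned above comes mostly from decoding every input and encoding the output, which this sketch leaves out.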

SFU

An SFU also receives multiple media streams from participants but does not mix them. Instead, it forwards each participant’s stream to others as-is. This allows for efficient bandwidth usage since each participant sends their stream only once but receives multiple streams from other participants.

This approach requires less CPU power on the server because it only forwards streams without processing them. On top of that, SFUs can offer more flexibility in terms of stream quality adjustments based on individual participant conditions (e.g., adaptive bitrate streaming).

However, clients must handle multiple streams, which can be resource-intensive if there are many participants. 
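The trade-off between the two models can be put into numbers. The sketch below counts streams for a conference of `n` participants where everyone publishes one stream. It assumes the MCU produces a single shared mix that is encoded once, and that the SFU forwards every stream to every other participant; both are simplifications, and the names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamLoad:
    client_up: int       # streams each client sends
    client_down: int     # streams each client receives and decodes
    server_in: int       # streams arriving at the media server
    server_out: int      # streams leaving the media server
    server_encodes: int  # encodes the server performs


def mcu_load(n: int) -> StreamLoad:
    """Each client sends one stream and receives one mixed stream."""
    # Assumption: one shared mix, encoded once and sent to all n clients.
    return StreamLoad(client_up=1, client_down=1,
                      server_in=n, server_out=n, server_encodes=1)


def sfu_load(n: int) -> StreamLoad:
    """Each client sends one stream and receives everyone else's stream as-is."""
    return StreamLoad(client_up=1, client_down=n - 1,
                      server_in=n, server_out=n * (n - 1), server_encodes=0)


if __name__ == "__main__":
    for n in (2, 4, 10):
        print(n, mcu_load(n), sfu_load(n))
```

With ten participants, each SFU client decodes nine streams while each MCU client decodes one, which is why the MCU side suits low-end and telephony endpoints.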

WebRTC applications usually rely on SFUs for their media processing and transporting capabilities.
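One common way an SFU adjusts quality per participant is simulcast: the publisher sends several encodings of the same video, and the SFU forwards the one that fits each receiver. The sketch below shows only the selection step. The layer names and bitrates are illustrative assumptions, and real SFUs rely on bandwidth estimation rather than a fixed number.

```python
from typing import Dict, Optional

# Illustrative layer bitrates in kbps; real values depend on codec and settings.
LAYERS_KBPS: Dict[str, int] = {"low": 150, "medium": 500, "high": 1500}


def pick_layer(available_kbps: int,
               layers: Dict[str, int] = LAYERS_KBPS) -> Optional[str]:
    """Return the highest-bitrate layer that fits the receiver's bandwidth."""
    fitting = {name: rate for name, rate in layers.items() if rate <= available_kbps}
    if not fitting:
        return None  # nothing fits; the SFU could pause video for this receiver
    return max(fitting, key=lambda name: fitting[name])


if __name__ == "__main__":
    for bandwidth in (100, 600, 5000):
        print(bandwidth, pick_layer(bandwidth))
```

Because the SFU only chooses which existing encoding to forward, it can adapt per receiver without transcoding.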

SIP Integration Using MCU & SFU

WebRTC offers powerful, browser-based real-time communication, while SIP remains the backbone of many telephony systems. Bridging these two worlds can be complex, but it’s essential for businesses looking to modernize their communication infrastructure without abandoning existing systems.

By leveraging both MCU and SFU architectures, we can create a system that efficiently handles WebRTC clients while seamlessly integrating with SIP-based systems. 

This approach consists of the following: 

  1. Configure WebRTC clients to connect directly to an SFU for video conferencing, and implement the functionality for clients to join video rooms, publish streams, and subscribe to others as usual.
  2. Set up SIP clients to connect to an MCU and configure call routing to direct SIP calls into the appropriate conference rooms within it.
  3. Run a dispatcher/signaling application that orchestrates the integration between the SFU and the MCU. This application communicates with both via an appropriate protocol such as HTTP or WebSockets.
  4. When a SIP client joins, use the dispatcher/signaling application to make the MCU subscribe to relevant WebRTC streams in the SFU, and mix these for delivery to SIP clients.
  5. Publish the mixed stream from the MCU back to the SFU for WebRTC clients.
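The orchestration in steps 3 to 5 can be sketched with in-memory stand-ins. `FakeSfu`, `FakeMcu`, `Dispatcher`, and the `mcu-mix` stream id are hypothetical names for illustration; in a real deployment the dispatcher would call each media server's own control API over HTTP or WebSockets instead of mutating Python sets.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FakeSfu:
    """Stand-in for an SFU room: tracks the ids of published streams."""
    published: Set[str] = field(default_factory=set)

    def publish(self, stream_id: str) -> None:
        self.published.add(stream_id)


@dataclass
class FakeMcu:
    """Stand-in for an MCU conference: tracks SIP callers and mixer inputs."""
    sip_callers: Set[str] = field(default_factory=set)
    inputs: Set[str] = field(default_factory=set)

    def subscribe(self, stream_id: str) -> None:
        self.inputs.add(stream_id)


class Dispatcher:
    MIX_ID = "mcu-mix"  # hypothetical id for the MCU's mixed stream

    def __init__(self, sfu: FakeSfu, mcu: FakeMcu) -> None:
        self.sfu = sfu
        self.mcu = mcu

    def on_webrtc_publish(self, stream_id: str) -> None:
        self.sfu.publish(stream_id)
        # If SIP callers are already present, keep their mix up to date.
        if self.mcu.sip_callers:
            self.mcu.subscribe(stream_id)

    def on_sip_join(self, caller: str) -> None:
        self.mcu.sip_callers.add(caller)
        # Step 4: the MCU subscribes to the WebRTC streams in the SFU.
        # The mix itself is excluded so it is never fed back into the mixer.
        for stream_id in sorted(self.sfu.published - {self.MIX_ID}):
            self.mcu.subscribe(stream_id)
        # Step 5: the MCU's mixed stream is published back to the SFU.
        self.sfu.publish(self.MIX_ID)


if __name__ == "__main__":
    sfu, mcu = FakeSfu(), FakeMcu()
    dispatcher = Dispatcher(sfu, mcu)
    dispatcher.on_webrtc_publish("alice-cam")
    dispatcher.on_sip_join("sip:bob@example.com")
    print(sorted(mcu.inputs), sorted(sfu.published))
```

Keeping this logic in a separate dispatcher means neither media server needs to know about the other; each only sees ordinary subscribe and publish requests.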

This whole flow is depicted in the figure below:

SIP, WebRTC, MCU & SFU: A Harmonious Union for Modern Real-Time Communications

Integrating WebRTC with SIP using both MCU and SFU architectures presents a convenient solution for businesses looking to modernize their communication infrastructure. By understanding the strengths of each approach, you can build a system that efficiently handles real-time communication between legacy SIP clients and modern WebRTC applications.

Ready to Revolutionize Your Communication Infrastructure?

If you’re interested in exploring the potential of SIP and WebRTC integration, consider partnering with experts who have hands-on experience in architecting scalable solutions.

The WebRTC.ventures team is dedicated to guiding businesses like yours through this complex process, from assessment to implementation. We’ll help you choose the right architecture (MCU or SFU) for your use case and develop a tailored solution that meets your unique needs. Contact us today to discuss how we can help you unlock the full potential of SIP and WebRTC integration. Let’s make it live!
