Real-time communication has become an essential part of the way we interact with each other over the internet. From video calls and live streaming to interactive gaming and instant messaging, we are constantly relying on reliable and instantaneous exchanges of information. Enter WebRTC: a transformative technology standard that enables real-time communication directly through web browsers without needing additional software.
In this post, we’ll delve into the essence of WebRTC, exploring its standards, technological framework, and the growing developer ecosystem that supports its growth and innovation.
What Is RTC, and Why Do We Want It on the Web?
WebRTC stands for Web Real-Time Communication. Its goal is to provide web browsers and any kind of software application with real-time communication capabilities.
Real-Time Communication (RTC) refers to the exchange of information over the network without perceived delay. This information usually involves video and audio, but it applies to any kind of data. The most common applications of RTC are video conferencing, interactive live streaming, file sharing, instant messaging, and cloud gaming.
To understand the importance of WebRTC we need to go back in time, to the mid-to-late 2000s to be precise, when real-time communication involved installing dedicated applications that were often not available on all platforms and were in some cases limited to a narrow set of users.
Even browser-based solutions, despite browsers being as ubiquitous across devices then as they are today, required installing additional licensed or proprietary plugins that often contained security issues or bugs that disrupted the experience.
These limitations highlighted the need for a common framework that would allow developers to build powerful and secure RTC applications that run everywhere and don’t require users to install additional software.
These circumstances prompted Google, after acquiring Global IP Solutions, to release WebRTC, an open-source project for browser-based real-time communication, in 2011.
WebRTC as a Standard
On the heels of the arrival of WebRTC came the effort of standardization by the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF). The W3C defines the Application Programming Interfaces (APIs) that developers use to build applications, while the IETF defines the network protocols that enable exchanging data over the network.
It took ten years! It wasn’t until early 2021 that the WebRTC specification transitioned from a Proposed Recommendation to a Recommendation. This status gave WebRTC the endorsement of the W3C and marked a huge step forward in ensuring interoperability between different browsers and platforms.
The WebRTC standard describes key components such as the details for establishing peer-to-peer (p2p) connections, the means for handling media content, and the specifications for direct data exchange. It ensures secure communication through encryption and incorporates mechanisms for managing permissions and user consent.
The specification also lists the relevant protocols defined by the IETF, such as RFC 8825, “Overview: Real-Time Protocols for Browser-Based Applications,” and RFC 8826, “Security Considerations for WebRTC.”
Intentionally omitted from the specification is a defined mechanism for the pre-connection negotiation process, known as Signaling. This is a critical step, as it allows peers to exchange the information needed to establish communication: each client’s IP address and port pairs (known as ICE candidates), the codecs to use, and encryption details.
Instead, the implementation of a Signaling mechanism falls to the developer. Common solutions for Signaling include WebSockets, SIP, and message brokers.
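As a sketch of what a developer-defined signaling layer might look like, the helpers below build and validate a minimal JSON message envelope, assuming a WebSocket transport. The message types mirror the offer/answer/candidate steps of the negotiation; the field names are illustrative, not part of any standard.

```javascript
// Build a signaling message to send over the chosen transport (e.g. a WebSocket).
// "payload" carries an SDP offer/answer or an ICE candidate.
function makeSignal(type, roomId, payload) {
  const allowed = ["offer", "answer", "candidate"];
  if (!allowed.includes(type)) {
    throw new Error(`unknown signal type: ${type}`);
  }
  return JSON.stringify({ type, roomId, payload });
}

// Parse and minimally validate an incoming signaling message.
function parseSignal(raw) {
  const msg = JSON.parse(raw);
  if (!msg.type || !msg.roomId) {
    throw new Error("malformed signal");
  }
  return msg;
}

// In a browser, a client would send its offer roughly like this:
// ws.send(makeSignal("offer", "room-42", pc.localDescription));
```

Because the standard leaves this layer open, the envelope shape, the room concept, and the transport are all design choices of the application.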
WebRTC as a Technology
Alongside the standard is the open-source implementation, known as libWebRTC, that powers most modern browsers and backs the JavaScript API developers use to build their applications. We cover this and other WebRTC implementations that take such capabilities outside of the browser in a previous post: “Native WebRTC Development: A Guide to libWebRTC and Alternatives”.
This API includes components like getUserMedia to access the camera and microphone, getDisplayMedia to capture content from the device screen, RTCPeerConnection to establish calls, and RTCDataChannel for bidirectional data transfer.
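To make the shape of these calls concrete, here is a small sketch. The browser-only calls appear in comments since they only run in a browser; the constraints-building helper and its name are illustrative, not part of the API.

```javascript
// Illustrative helper that builds the constraints object passed to
// navigator.mediaDevices.getUserMedia(). Defaults request 720p video
// and echo-cancelled audio.
function buildMediaConstraints({ withVideo = true, withAudio = true } = {}) {
  return {
    video: withVideo
      ? { width: { ideal: 1280 }, height: { ideal: 720 } }
      : false,
    audio: withAudio ? { echoCancellation: true } : false,
  };
}

// In a browser, the pieces fit together roughly like this:
// const stream = await navigator.mediaDevices.getUserMedia(buildMediaConstraints());
// const pc = new RTCPeerConnection();
// stream.getTracks().forEach((track) => pc.addTrack(track, stream));
// const channel = pc.createDataChannel("chat"); // RTCDataChannel for arbitrary data
```

The constraints object is the main lever for trading quality against bandwidth before a single packet is sent.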
It’s worth noting that building a real-time communication application using WebRTC involves much more than just calling components from the API. Additional work includes implementing Signaling, configuring Network Address Translation (NAT) traversal, and handling media processing and routing.
WebRTC Developer Ecosystem
While it’s possible to build WebRTC applications entirely with in-house tools, the common practice is to leverage third-party tools and libraries, which allow developers to speed up development while relying on proven solutions.
On top of that, such tools often provide their own client-side abstraction of the API components. This abstraction facilitates interaction with the specific solution and adapts the APIs to more recent web development technologies such as React or Vue.
This set of external solutions lays the foundations of a whole ecosystem around WebRTC. The ecosystem is mainly composed of the following:
Client-Side Libraries
As mentioned before, some third-party libraries abstract components from the API and offer a simpler or tailored approach for implementing such features. Client-side libraries are usually specific to the application stack: for instance, React Native applications use the react-native-webrtc library, and web applications connecting to Amazon Chime SDK use amazon-chime-sdk-js.
Signaling Libraries
These are a subset of client-side libraries whose goal is to integrate WebRTC applications with a signaling mechanism. Some examples are SIP.js and socket.io which provide integration with the Session Initiation Protocol (SIP) and a Socket.IO server, respectively.
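Conceptually, the server side of such a signaling setup just forwards each message to the other peers in the same room. The in-memory relay below sketches that idea; it is not real Socket.IO or SIP.js code, and the class and method names are illustrative.

```javascript
// Minimal in-memory signaling relay: peers join a room and every message a
// peer sends is delivered to all other peers in that room.
class SignalingRelay {
  constructor() {
    this.rooms = new Map(); // roomId -> Map(peerId -> onMessage callback)
  }

  join(roomId, peerId, onMessage) {
    if (!this.rooms.has(roomId)) {
      this.rooms.set(roomId, new Map());
    }
    this.rooms.get(roomId).set(peerId, onMessage);
  }

  relay(roomId, fromPeerId, message) {
    const peers = this.rooms.get(roomId) ?? new Map();
    for (const [peerId, onMessage] of peers) {
      if (peerId !== fromPeerId) {
        onMessage(fromPeerId, message); // fan out to everyone else in the room
      }
    }
  }
}
```

In a real deployment the callbacks would be WebSocket or Socket.IO connections, but the routing logic is essentially the same.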
Media Servers
These provide a server-side WebRTC interface that allows the implementation of Multipoint Control Unit (MCU) and Selective Forwarding Unit (SFU) architectures. These enable more advanced features such as multi-party calls, server-side recording, bridging to other telephony systems, or applying specific processing to media streams. Examples are Janus, mediasoup, and LiveKit.
ICE Servers
These provide tools that help WebRTC applications traverse NAT by supplying peers with ICE candidates, and by relaying media traffic when a direct connection is not possible. A typical example is Coturn.
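Wiring ICE servers into an application amounts to listing them in the configuration passed to the peer connection. The example below shows the shape of that object as it would be handed to new RTCPeerConnection(config) in a browser; the hostnames and credentials are placeholders, and a Coturn deployment would supply real ones.

```javascript
// RTCConfiguration listing a STUN server (for candidate discovery) and a
// TURN server (for relaying when direct connection fails).
// Hostnames, username, and credential below are placeholders.
const config = {
  iceServers: [
    { urls: "stun:stun.example.com:3478" },
    {
      urls: "turn:turn.example.com:3478",
      username: "webrtc-user",
      credential: "secret",
    },
  ],
  iceTransportPolicy: "all", // "relay" would force all traffic through TURN
};

// In a browser: const pc = new RTCPeerConnection(config);
```

Setting iceTransportPolicy to "relay" is a common way to test that the TURN server actually works, since it disables direct candidates entirely.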
WebRTC Architecture
Building a WebRTC application involves combining multiple solutions and tools, each designed to perform a specific task. These components, based on the WebRTC standard and technology, work together to ultimately enable communication between the peers. Such a process is depicted below.
Provisioning an architecture like the one depicted above can become a significant challenge for companies lacking the proper expertise, especially when running at scale. For these cases, there is a simpler approach known as Communication Platform as a Service (CPaaS).
In this approach, instead of having to provision a WebRTC infrastructure yourself and deal with the complexities mentioned above, your application connects to the provider’s platform through a simple API. Examples of such providers are the Amazon Chime SDK and Daily.
Challenges and Trends
More than a decade has passed since WebRTC was first announced. The way we interact with each other over the internet has evolved considerably since then, and WebRTC has been able to adapt to most of these changes. However, it has failed to provide a well-suited solution for all use cases.
One typical example is live streaming. While WebRTC provides a low-latency approach that enables interactive experiences, it hasn’t been able to match the reach of streaming protocols like HLS and RTMP, which, despite adding some latency, support considerably larger audiences.
Also, real-time communication applications nowadays go beyond just video conferencing, spanning everything from cloud gaming to remote desktop applications. Such use cases usually don’t require all the features of WebRTC, but developers often end up having to implement the whole package just to get capabilities like FlexFEC and adaptive bitrate.
Both scenarios have led to the development of new trends and solutions related to real-time communication capabilities for the web.
One is the “unbundling” of WebRTC into smaller pieces:
- WebTransport for sending media over the network
- WebCodecs for encoding/decoding media
- WebAssembly as the glue, managing packet loss, retransmission, echo cancellation, etc.
Other new developments, more focused on providing an optimal approach to live streaming use cases, are Media over QUIC (MoQ), and WHIP & WHEP.
And there is also Artificial Intelligence (AI), which, although not directly related to WebRTC, is becoming a must-have across various industries and products. Real-time communication is no exception, and AI is increasingly being integrated to enhance the functionality and user experience of products such as video conferencing applications and contact center solutions.
Common applications of AI in WebRTC range from enhancing media compression and processing behind the scenes to enabling in-call features such as background removal, transcription, and sentiment analysis.
WebRTC: Revolutionizing Communication Over the Internet
WebRTC is not a one-size-fits-all solution. Fortunately, new trends and technologies are emerging to fill the gaps. Our team can help guide you through the decision, and work with you to build the real-time communication solution that best fits your project.
Ready to harness the power of real-time communication for your next project? Contact us today and let’s make it live!