Over the last few months, many live video applications, products, and platforms have experienced huge growth in usage. Because of the coronavirus outbreak, niche applications quickly became mainstream. While these applications are keeping us connected and helping us work from home, this transition has also exposed some major weaknesses.

We know that security and privacy are important topics for companies. Users and businesses care about these topics and ask questions about them. Is my WebRTC application secure? What’s the deal with Zoom’s privacy and security issues? Do those issues exist in my WebRTC application?

If you’re curious about the security risks of using Zoom, you may want to read this great technical post. To make the platform scalable and easy to use, Zoom made some security sacrifices. Combined with weak encryption, those sacrifices hurt the company financially and forced it to apologize for its security lapses.

While using WebRTC doesn’t guarantee that you’re 100% secure, you won’t run into common issues like weak encryption. WebRTC is a public standard and an open framework, so it gets far more scrutiny than many proprietary protocols. Many companies use it because it’s based on proven standards, was written by industry experts, and has been used in commercial products for years.

Intro to WebRTC Security

Mobile apps undergo a review process and are explicitly installed by the user. Web applications, on the other hand, don’t require any installation, and it’s the browser’s job to enable access to the internet while providing adequate security protections for users. The browser is the portal through which the user accesses all WebRTC applications and video/audio content. WebRTC always requires the user to give explicit permission to use their camera and microphone with a new web application.

There are two core protocols defined by the IETF for providing WebRTC security: SRTP for media traffic and DTLS-SRTP for key negotiation. WebRTC-compatible endpoints use the AES cipher with 128-bit keys to encrypt audio and video and HMAC-SHA1 to verify data integrity.

To exchange media, we first need to establish the connection through a discovery and negotiation process called signaling. WebRTC does not define this process, so, as in other types of web applications, we need to implement authentication, authorization, and end-to-end (E2E) encryption ourselves.

One-to-one

Figure 1: Flow diagram of a WebRTC peer-to-peer media exchange with its security mechanisms

As shown in Figure 1, the minimal one-to-one approach lets us use WebRTC as it was designed: media flows peer-to-peer (P2P) and is encrypted end-to-end. This is the ideal scenario, and it gets more complicated once we need to support multiparty calls with media servers in between.

Multiparty

Figure 2: Flow diagram of a WebRTC multiparty media exchange through a media server with its security mechanisms

For simplicity, the image above only shows the media exchange. The signaling part would be implemented in the same way as a one-to-one communication. As shown in Figure 2, we would keep using the DTLS-SRTP framework. In this case, however, we have an intermediate participant, the media server, which would decrypt and re-encrypt the media. Obviously, that’s not great if you don’t trust the media server.

One of the main security challenges left with live video today is end-to-end encryption

The closest you can get to end-to-end encryption at large scale is what most CPaaS providers do: media streams are temporarily decrypted within the cloud servers and then immediately re-encrypted before being sent over the internet to the subscribing client. This decryption is necessary for managing group calls, other types of media exchange, intelligent quality control, and session recording.

Many companies are already working on an E2E encryption solution for their media servers. The idea is that, rather than trying to tweak the existing DTLS-SRTP implementation, conferences can simply add an additional layer of E2E protection on top of the existing one. Basically, the clients would encrypt media frames (using the Insertable Streams API), and only the participants at the other end would be able to decrypt that media. You can find some interesting information about this here.
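To give a feel for what "encrypting media frames" means, here is a minimal TypeScript sketch of a per-frame transform. It is a toy under loud assumptions: XOR stands in for a real cipher such as AES-GCM and must never be used to protect real media, and the function name `transformFrame` and the `clearBytes` parameter are ours, not part of any API. The point it illustrates is that the first few bytes of each frame can stay in the clear so a media server can still route the frame, while the rest of the payload is unreadable without the key. In a browser, a function like this would run on every encoded frame inside the transform that Insertable Streams exposes; that wiring is omitted here.

```typescript
// Toy per-frame transform. XOR is a stand-in for a real cipher (e.g. AES-GCM);
// it is NOT secure and is used here only to show the shape of the operation.
function transformFrame(
  frame: Uint8Array,   // encoded media frame payload
  key: Uint8Array,     // shared secret known only to the participants (hypothetical)
  clearBytes: number   // leading bytes left untouched so a media server can route the frame
): Uint8Array {
  if (key.length === 0) {
    throw new Error("key must not be empty");
  }
  const out = new Uint8Array(frame.length);
  for (let i = 0; i < frame.length; i++) {
    out[i] =
      i < clearBytes
        ? frame[i]                                       // header bytes stay readable
        : frame[i] ^ key[(i - clearBytes) % key.length]; // payload bytes are scrambled
  }
  return out;
}

// Because XOR is its own inverse, the receiver applies the same function to decrypt.
const demoKey = new Uint8Array([1, 2, 3]);
const demoFrame = new Uint8Array([10, 20, 30, 40, 50]);
const protectedFrame = transformFrame(demoFrame, demoKey, 2);
const recoveredFrame = transformFrame(protectedFrame, demoKey, 2);
```

The media server in the middle only ever sees something like `protectedFrame`, which is the view the "Middlebox" represents in the demo described below.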

Figure 3: E2EE with insertable streams demo from webrtc-samples where Middlebox represents what the media server would see. Insertable Streams is not supported by default in Chrome yet, so you might need to enable that in chrome://flags in Canary.

Interactive Connectivity Establishment (ICE)

WebRTC applications must collect ICE candidates as part of the process of connecting with other clients. ICE is the IP address discovery process: the web app contacts its configured STUN and TURN servers and asks them for IP addresses. Those addresses are used to connect with the other client directly or, if that’s not possible, to relay UDP packets containing application data between clients through the TURN server. As long as DTLS is implemented and used properly, using a TURN relay will not weaken WebRTC security.
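To make the different kinds of candidates concrete, here is a minimal TypeScript sketch that parses a candidate line and reports whether it is a local address (`host`), an address learned from a STUN server (`srflx`), or a TURN relay address (`relay`). The function name `parseIceCandidate` and the sample addresses are ours, chosen for illustration; the field order follows the candidate attribute layout from the ICE specification, and the sketch ignores optional extensions after the type.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

interface ParsedCandidate {
  foundation: string;
  component: number;
  transport: string;
  priority: number;
  address: string;
  port: number;
  type: CandidateType;
}

// Parses "candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ..."
// Returns null when the line does not look like an ICE candidate.
function parseIceCandidate(line: string): ParsedCandidate | null {
  const pattern =
    /^(?:a=)?candidate:(\S+) (\d+) (\S+) (\d+) (\S+) (\d+) typ (host|srflx|prflx|relay)\b/;
  const m = pattern.exec(line.trim());
  if (!m) {
    return null;
  }
  return {
    foundation: m[1],
    component: Number(m[2]),
    transport: m[3].toLowerCase(),
    priority: Number(m[4]),
    address: m[5],
    port: Number(m[6]),
    type: m[7] as CandidateType,
  };
}

// Example: a server-reflexive candidate, i.e. the public address reported by a STUN server.
const sample = parseIceCandidate(
  "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 10.0.0.5 rport 46154"
);
```

A helper like this is handy when debugging connectivity: if a call only ever produces `relay` candidates, all media is going through the TURN server, which is still encrypted but adds latency and cost.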

WebRTC IP address leaks. Because WebRTC gathers the machine’s local IP address as an ICE candidate, that address is visible to the web application, which can send it anywhere through the signaling channel. This used to be a privacy problem that some sites exploited, but today browsers hide local addresses behind mDNS hostnames, which prevents most of those leaks.
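As a rough illustration of what mDNS changes, the TypeScript sketch below checks whether a host candidate carries a literal local IP address or an mDNS name ending in `.local`. The helper names (`hostCandidateAddress`, `isMdnsName`, `leaksLocalIp`) and the sample values are hypothetical, and the check is deliberately simple: it only looks at the address field of `host` candidates.

```typescript
// Returns the address of a host candidate, or null for other candidate types
// and for lines that do not look like ICE candidates.
function hostCandidateAddress(line: string): string | null {
  // Expected fields: candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type>
  const parts = line.trim().replace(/^a=/, "").split(/\s+/);
  if (parts.length < 8 || !parts[0].startsWith("candidate:") || parts[6] !== "typ") {
    return null;
  }
  return parts[7] === "host" ? parts[4] : null;
}

// mDNS candidates replace the local IP with a generated name ending in ".local".
function isMdnsName(address: string): boolean {
  return address.toLowerCase().endsWith(".local");
}

// True when a host candidate exposes a literal local IP address.
function leaksLocalIp(line: string): boolean {
  const address = hostCandidateAddress(line);
  return address !== null && !isMdnsName(address);
}

const exposed = leaksLocalIp("candidate:1 1 udp 2113937151 192.168.1.10 54321 typ host");
const hidden = leaksLocalIp(
  "candidate:1 1 udp 2113937151 3f2504e0-4f89-41d3-9a0c-0305e82c3301.local 54321 typ host"
);
```

Here `exposed` is `true` and `hidden` is `false`: the second candidate tells the remote side how to reach the peer on the local network without revealing the peer’s private IP address to the web page.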

SIP Legacy Integration

This specific case is more challenging because many VoIP systems either don’t encrypt media at all or only support SDES encryption. As a result, WebRTC applications are often forced to accept that weaker setup on the VoIP leg of the media exchange.
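One practical way to see which situation you are in is to inspect the SDP offered by the VoIP side. With SDES, the SRTP keys travel inside the SDP in `a=crypto` lines, while DTLS-SRTP advertises a certificate fingerprint in `a=fingerprint` lines. The TypeScript sketch below reports which of the two a given SDP contains. The function name `detectKeyExchange` and the sample SDP fragments are hypothetical and heavily trimmed, and the check is only a first-pass heuristic, not a full SDP parser.

```typescript
interface KeyExchangeInfo {
  dtlsSrtp: boolean; // SDP carries a DTLS certificate fingerprint
  sdes: boolean;     // SDP carries SRTP keys inline (a=crypto)
}

// Scans an SDP blob for the attributes that signal DTLS-SRTP or SDES.
function detectKeyExchange(sdp: string): KeyExchangeInfo {
  const lines = sdp.split(/\r?\n/).map((line) => line.trim());
  return {
    dtlsSrtp: lines.some((line) => line.startsWith("a=fingerprint:")),
    sdes: lines.some((line) => line.startsWith("a=crypto:")),
  };
}

// Trimmed, made-up SDP fragments for illustration only.
const webrtcStyleSdp = [
  "v=0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=fingerprint:sha-256 AB:CD:EF:01",
  "a=setup:actpass",
].join("\r\n");

const legacySipSdp = [
  "v=0",
  "m=audio 49170 RTP/SAVP 0",
  "a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:PLACEHOLDER_KEY_MATERIAL",
].join("\r\n");

const webrtcLeg = detectKeyExchange(webrtcStyleSdp); // { dtlsSrtp: true, sdes: false }
const sipLeg = detectKeyExchange(legacySipSdp);      // { dtlsSrtp: false, sdes: true }
```

A gateway between the two worlds could use a check like this to log or flag calls where the VoIP leg offers only SDES, or no key exchange at all, so you at least know which calls run over the weaker leg.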

Please note: For simplicity, I wrote about web-based applications in this blog post. However, most of the concepts described here also apply to WebRTC iOS and Android applications.

References: Security Considerations for WebRTC; Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal
