How to Build a Custom Integration to External Telephony for your CPaaS-based WebRTC App

The seamless browser-to-browser real-time audio and video communication that WebRTC enables is supported by a complex infrastructure. Signaling, NAT traversal and codec optimization can be difficult to maintain for a development team unfamiliar with the intricacies of the WebRTC stack. CPaaS providers such as the Amazon Chime SDK, Daily and Vonage encapsulate this complexity behind easy-to-use Application Programming Interfaces (APIs) that you can integrate into your application with minimal effort.

Facilitating communication between a WebRTC application and the Public Switched Telephone Network (PSTN) or external Voice over Internet Protocol (VoIP) systems is a common requirement across many industries. The Session Initiation Protocol (SIP) allows for this interoperability between different communication systems, but it requires a bridge or gateway server that “translates” communication between WebRTC and SIP entities.

Many CPaaS providers offer this functionality and take care of setting up and managing gateway servers through an API, just as they do with the rest of their real-time communication features. For most cases, this is enough. However, there are some situations where you need more than just an API to manage the communication correctly. For these, a custom SIP integration is required.

In this post, we begin by looking at why you might need to build a custom SIP integration to external telephony systems in a CPaaS-based WebRTC application. We then walk through the architecture and show how to build it using the WebCodecs API and two open source tools: Puppeteer and the Janus WebRTC Server.

When You Need a Custom SIP Integration

Using a CPaaS can be incredibly convenient, which is one of the many reasons you might choose one for your application. Yet that convenience comes at the cost of reduced control over the underlying infrastructure. SIP integration is a case in point: the only option is whatever the provider’s API exposes.

Some situations call for more than that API offers. For example:

  • You want to support media that the CPaaS implementation doesn’t, e.g., it only supports voice but you also want video communication.
  • You’re connecting to a legacy VoIP system that doesn’t support current codecs, so you need to transcode the media yourself.
  • You’re connecting to a VoIP system that sends the video of multiple call participants in a single video track, but you want to manage them separately in your application.

In short, any scenario that requires your application to perform additional steps that are not provided by your CPaaS provider of choice is a perfect scenario for implementing a custom SIP integration.

Custom SIP Integration Architecture

At the heart of a custom SIP integration architecture is an extra “participant” in the call. This participant is not a human user but a software entity running in a “headless browser” (a web browser without a graphical interface) controlled by an automation tool such as Puppeteer or Selenium.

This participant uses the CPaaS provider’s APIs to join the call in the same way a regular user does, which gives it access to the audio and video streams of all participants. With this access, it can use specialized browser functionality, such as the WebCodecs API, to process the media streams directly, and it can also send media into the call.

Depiction of the regular and the extra participant joining the call.

Similarly, the extra participant connects to the WebRTC-SIP gateway. Once connected, it can send and receive audio and video streams to and from both systems, allowing participants in a web conference to communicate with those on external SIP systems without perceptible barriers.

Depiction of the extra participant connecting to WebRTC-SIP Gateway.
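
As a rough sketch of the relay role described above, the extra participant can be modeled as two “legs” that forward media to each other. The leg interface (onRemoteTrack, publish, receive) and the createLeg and bridgeLegs helpers are hypothetical names used only for illustration; in a real page, each leg would wrap your CPaaS SDK or the connection to the gateway.

```javascript
// Hypothetical in-memory "leg": stands in for either the CPaaS call or the
// WebRTC-SIP gateway connection. Real legs would wrap SDK or RTCPeerConnection calls.
function createLeg(name) {
  const listeners = [];
  const published = [];
  return {
    name,
    published,
    // register a callback for media arriving from this side
    onRemoteTrack(callback) { listeners.push(callback); },
    // send media out to this side
    publish(track) { published.push(track); },
    // helper for this sketch: simulate media arriving from this side
    receive(track) { listeners.forEach((callback) => callback(track)); },
  };
}

// Forward everything received on one leg to the other, optionally transforming it
// on the way (this is where transcoding or cropping would be plugged in).
function bridgeLegs(callLeg, gatewayLeg, transform = (track) => track) {
  callLeg.onRemoteTrack((track) => gatewayLeg.publish(transform(track)));
  gatewayLeg.onRemoteTrack((track) => callLeg.publish(transform(track)));
}

const callLeg = createLeg('cpaas-call');
const gatewayLeg = createLeg('sip-gateway');
bridgeLegs(callLeg, gatewayLeg);

// Strings stand in for media tracks in this sketch.
callLeg.receive('audio-from-web-user');
gatewayLeg.receive('audio-from-sip-user');
console.log(gatewayLeg.published, callLeg.published);
```

The optional transform argument marks the spot where media could be adapted before it crosses from one side to the other.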

Now that you know the components that make up a custom SIP integration, let’s take a look at how you can implement these in your application.

Running A Headless Browser Using Puppeteer

The first step is to give life to the extra participant. To do so, you can use Puppeteer. Puppeteer is a Node.js library that operates a headless Chrome browser instance, allowing you to programmatically mimic real user actions like navigating web pages and filling and submitting forms, as well as more advanced ones like capturing the state of a web page as a screenshot or a PDF.

Installing Puppeteer in a Node.js application is as easy as follows:

# using npm
npm install puppeteer
# using yarn
yarn add puppeteer

Then, you can write a simple snippet like the one below to join a call in your application. Note that this example doesn’t cover authentication, which should be performed the same as you do with a regular user.

import puppeteer from 'puppeteer';

(async () => {
  // launch the browser and join a video call with id abc123
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // the URL is empty here; set it to the page for call abc123 in your application
  await page.goto('');

  /* add your own logic here */
})();

When the call finishes, you can simply close the browser from within the same async function by running:

await browser.close();
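
A headless participant has nobody to click through camera and microphone prompts, so the launch call usually needs a few Chrome command-line switches. The sketch below builds an options object for puppeteer.launch(); buildLaunchOptions and its fakeMedia parameter are hypothetical names, and the switches should be checked against the Chrome version you run.

```javascript
// Build options for puppeteer.launch() suited to an unattended WebRTC participant.
// The entries in `args` are Chrome command-line switches.
function buildLaunchOptions({ fakeMedia = true } = {}) {
  const args = [
    // auto-accept camera/microphone permission prompts
    '--use-fake-ui-for-media-stream',
    // allow audio/video elements to play without a user gesture
    '--autoplay-policy=no-user-gesture-required',
  ];
  if (fakeMedia) {
    // replace real capture devices with Chrome's synthetic test source
    args.push('--use-fake-device-for-media-stream');
  }
  return { headless: true, args };
}

const options = buildLaunchOptions();
console.log(options.args.join(' '));
// Usage (inside an async function, with puppeteer imported):
//   const browser = await puppeteer.launch(options);
```

Keeping the options in a plain function makes it easy to switch the synthetic devices off once the participant sends media it receives from the SIP side instead.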

Building a Bridge between WebRTC and SIP Using Janus WebRTC Server

Next, you need to set up the bridge between your WebRTC application and the external SIP system. The Janus WebRTC Server provides a SIP plugin that allows WebRTC clients to connect to a SIP server just like a regular SIP client would. It manages most of the SIP events transparently while exposing the relevant ones to WebRTC clients so they can make and accept calls to and from other clients.

To enable communication between WebRTC and SIP clients, the Janus server offers several transports, such as HTTP and WebSockets, over which the headless browser can connect to it. Once the connection is established, the browser interacts with the plugin through JSON messages that perform tasks such as registering with the SIP server and placing or accepting calls.

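For instance, to call a user “goofy” on a SIP server listening on port 5060, a message like the following could be used. The uri value, which is empty here, takes the SIP URI of the callee, in the form sip:goofy@<server>:5060:

{
   "request" : "call",
   "uri" : ""
}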

You can learn more about how this works in our previous blog post here: Thinking about the Janus Gateway to Build a WebRTC to SIP, READ THIS!
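
To make the message flow described above more concrete, here is a minimal sketch of the JSON envelopes a client sends to Janus before and around a SIP plugin request: create a session, attach the SIP plugin, then wrap the plugin request in a message. The helper names, the host sip.example.org, and the ids 1234 and 5678 are placeholders chosen for this example; real ids come from the server’s responses.

```javascript
// Transaction ids let the client match Janus responses to its requests.
let transactionCounter = 0;
function newTransaction() {
  transactionCounter += 1;
  return `tx-${transactionCounter}`;
}

// Step 1: create a Janus session.
function createSession() {
  return { janus: 'create', transaction: newTransaction() };
}

// Step 2: attach the SIP plugin to that session.
function attachSipPlugin(sessionId) {
  return {
    janus: 'attach',
    plugin: 'janus.plugin.sip',
    session_id: sessionId,
    transaction: newTransaction(),
  };
}

// Step 3: wrap a plugin request (such as "call") for a given plugin handle.
// `jsep` carries the SDP offer or answer when the request needs one.
function sipRequest(sessionId, handleId, body, jsep) {
  const message = {
    janus: 'message',
    session_id: sessionId,
    handle_id: handleId,
    transaction: newTransaction(),
    body,
  };
  if (jsep) message.jsep = jsep;
  return message;
}

// Build a SIP URI such as sip:goofy@sip.example.org:5060.
function sipUri(user, host, port = 5060) {
  return `sip:${user}@${host}:${port}`;
}

const create = createSession();
const attach = attachSipPlugin(1234); // 1234: placeholder session id
const call = sipRequest(1234, 5678, {  // 5678: placeholder handle id
  request: 'call',
  uri: sipUri('goofy', 'sip.example.org'), // placeholder host
});
console.log(JSON.stringify(call, null, 2));
```

In practice, these objects would be serialized and sent over one of the Janus transports, the SIP plugin would expect a register request before calls are placed, and the call request would normally be accompanied by an SDP offer in jsep. Check the Janus documentation for the exact fields your version expects.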

Manipulating the Streams Between Worlds

Finally, you need a way to manipulate the media before sending it to the clients. The WebCodecs API provides low-level access to the individual frames of a video stream and to chunks of audio. This makes it possible to apply many types of transformations to the media, such as transcoding, cropping and resizing.
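
Cropping, for example, is what you would use when a VoIP system sends several participants in a single video track and you want to handle them separately. The sketch below assumes the composite video is a regular grid; tileRects and cropFrame are hypothetical helper names. The rectangle math runs anywhere, while cropFrame only works in a browser that implements WebCodecs.

```javascript
// Split a composite frame of `width` x `height` pixels into a grid of
// cols x rows rectangles, one per participant tile (row by row).
function tileRects(width, height, cols, rows) {
  const tileWidth = Math.floor(width / cols);
  const tileHeight = Math.floor(height / rows);
  const rects = [];
  for (let row = 0; row < rows; row++) {
    for (let col = 0; col < cols; col++) {
      rects.push({
        x: col * tileWidth,
        y: row * tileHeight,
        width: tileWidth,
        height: tileHeight,
      });
    }
  }
  return rects;
}

// Browser-only: create a new VideoFrame that shows just one tile of `frame`.
// The caller is responsible for calling close() on both frames.
function cropFrame(frame, rect) {
  return new VideoFrame(frame, { visibleRect: rect });
}

const rects = tileRects(1280, 720, 2, 2);
console.log(rects[3]); // { x: 640, y: 360, width: 640, height: 360 }
```

Some pixel formats require the crop offsets to be even numbers, so it is worth validating the rectangles against the frames you actually receive.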

Let’s revisit one of the above-mentioned scenarios for a custom SIP integration, and assume that your WebRTC application encodes video with the VP9 codec but you’re connecting to a legacy system that only supports H.264. Below is a simple example of how you can leverage the WebCodecs API to re-encode the video stream as H.264 before sending it:

// defining a function for handling encoded video chunks
function handleEncodedChunk(chunk) {
  // Placeholder for handling encoded chunks,
  //   e.g., sending them to Janus server
  console.log('Encoded chunk:', chunk);
}

// a function for encoding using H.264
async function encode(stream) {
  // creating the video encoder
  const videoEncoder = new VideoEncoder({
    output: handleEncodedChunk,
    error: (e) => console.error('VideoEncoder error:', e),
  });

  // configuring the encoder
  videoEncoder.configure({
    codec: 'avc1.42E01E', // H.264 baseline profile
    width: 640,
    height: 480,
    bitrate: 2_000_000, // 2 Mbps
    framerate: 30,
  });

  // Prepare to feed input frames to the encoder
  // (MediaStreamTrackProcessor is available in Chromium-based browsers)
  const videoTrack = stream.getVideoTracks()[0];
  const videoProcessor = new MediaStreamTrackProcessor({track: videoTrack});
  const reader = videoProcessor.readable.getReader();

  // Read and encode frames until the track ends
  while (true) {
    const {done, value} = await reader.read();
    if (done) {
      break;
    }
    // Assuming `value` is a VideoFrame
    videoEncoder.encode(value);
    value.close(); // Close the VideoFrame when done
  }

  // Emit any pending chunks and release the encoder
  await videoEncoder.flush();
  videoEncoder.close();
}
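
Before configuring an encoder, it can help to check that the browser actually supports the target configuration. The following sketch shows how the codec string used above is put together and how support could be checked with VideoEncoder.isConfigSupported(); h264CodecString and canEncode are hypothetical helper names, and the numeric values simply reproduce the string from the example.

```javascript
// Build an "avc1.PPCCLL" codec string from the H.264 profile_idc,
// constraint flags and level_idc, each written as two hex digits.
function h264CodecString(profileIdc, constraintFlags, levelIdc) {
  const hex = (n) => n.toString(16).toUpperCase().padStart(2, '0');
  return `avc1.${hex(profileIdc)}${hex(constraintFlags)}${hex(levelIdc)}`;
}

// Ask the browser whether it can encode with this configuration.
// Resolves to false in environments that do not implement WebCodecs.
async function canEncode(config) {
  if (typeof VideoEncoder === 'undefined') return false;
  const { supported } = await VideoEncoder.isConfigSupported(config);
  return Boolean(supported);
}

const config = {
  // profile_idc 66 (Baseline), constraint flags 0xE0, level_idc 30 (level 3.0)
  codec: h264CodecString(66, 0xe0, 30),
  width: 640,
  height: 480,
  bitrate: 2_000_000,
  framerate: 30,
};
console.log(config.codec); // avc1.42E01E
canEncode(config).then((ok) => console.log('H.264 encoding supported:', ok));
```

If the check fails, the participant could fall back to a lower resolution or a different profile before joining the SIP leg.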

Take Control of Your WebRTC Application Even When Using CPaaS

Having control over how your real-time communication application communicates with external telephony systems is possible, even when using a CPaaS provider! With Puppeteer you can run and manage a headless browser that joins the session and delivers the media streams to both WebRTC and SIP clients. Janus WebRTC Server allows you to build the bridge between the two worlds. And the WebCodecs API provides the customization capabilities you need to tailor the media streams for each medium they are sent to.

If you’re looking into integrating your WebRTC application with PSTN and VoIP systems, our team has you covered! Contact us today to learn more about how we can help you. Let’s make it live!
