In a previous post, we showed you how to perform background removal using the Insertable Streams for MediaStreamTrack API and the VideoFrame interface of the WebCodecs API, as an alternative to HTML Canvas elements. Having just scratched the surface of what a complete live video processing pipeline looks like, today we will expand on this functionality by adding the ability to manipulate video streams in order to overlay text and images (even a QR code!) in a broadcasting application.

Stay tuned for future posts where we will explore real-world implementations of these features.

Prerequisites

The code for this application is available on GitHub. Code along as you read if you like, or go straight to the finished version. You’ll need at least Node.js 16. (Installation instructions, if needed.)
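
If you’re not sure which version you have installed, you can check it from a terminal:

# check the installed Node.js version (should print v16 or higher)
node --version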

You can download the code using the Git CLI as shown below. (Installation instructions, if needed.)

# Download code from the repo
git clone https://github.com/WebRTCventures/live-video-processing-demo
cd live-video-processing-demo
# If you plan to code along, switch to code-along branch
git checkout code-along

If you don’t want to install Git, you can also download the code using the zip version available in the repo. If you plan to code along, be sure to download the zip file from the code-along branch.

Once you have downloaded the code, make sure to install all the required dependencies. To do so, open a terminal window, navigate to the project folder and run the npm install command as shown below:

# navigate to project folder
cd path/to/live-video-processing-demo
# install dependencies
npm install

Running The Base Application

First, let’s get familiar with the base peer-to-peer broadcasting application we will work with. This is the same as the one in the code-along branch. To run it, open a terminal application and, from the project folder, run the server.js file using Node, as shown below:

# run the application
node server

Next, open http://localhost:3000/presenter.html in your favorite web browser. Then, open a couple more browser tabs and navigate to http://localhost:3000/viewer.html. You’ll see a user broadcasting their video stream and a couple of other users viewing it.

In addition, all users can chat using the controls on the right.

The base broadcast application showing the presenter’s view.

Creating a Live Video Processing Pipeline

Now that you have the application running, we’re ready to create our live video processing pipeline. To do so, let’s create a new file track-utils.js under the public folder. There, add a function that will create a processed video track.

This function will be similar to the one described in our background removal post, with a pair of MediaStreamTrackProcessor and MediaStreamTrackGenerator objects that will be used as the input and output of the pipeline, and also a TransformStream object that will act as the transformer. 

The TransformStream object receives a transform function (where all the magic happens) as a parameter, while our createProcessedTrack function receives both that transform function and the video track to be processed. Finally, we wire everything together and return the MediaStreamTrackGenerator object, which contains the processed video track. The code of the function is shown below:

// public/track-utils.js

// function that creates a processed track
// it receives a track and a transform function
function createProcessedTrack({ track, transform }) {
  // create MediaStreamTrackProcessor and MediaStreamTrackGenerator objects
  const trackProcessor = new MediaStreamTrackProcessor({ track });
  const trackGenerator = new MediaStreamTrackGenerator({ kind: track.kind });

  // create the transformer object passing the transform function
  const transformer = new TransformStream({ transform });

  // connect everything together
  trackProcessor.readable
    .pipeThrough(transformer)
    .pipeTo(trackGenerator.writable);

  // returning the resulting track
  return trackGenerator;
}
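
One thing worth keeping in mind: MediaStreamTrackProcessor and MediaStreamTrackGenerator are, at the time of writing, only supported in Chromium-based browsers such as Chrome and Edge. The demo doesn’t guard against this, but a minimal feature-detection check (purely illustrative, not part of the repo’s code) could look like this:

// public/track-utils.js (optional, illustrative only)
// warn early if the browser doesn't expose the required APIs
if (!('MediaStreamTrackProcessor' in window) || !('MediaStreamTrackGenerator' in window)) {
  console.warn('Insertable Streams for MediaStreamTrack is not supported in this browser');
}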

Now, we need to create a transform function. This function receives two parameters: a video frame and a controller. It processes each video frame, applying the desired manipulations, and then adds it to the controller’s queue to create a new processed track.

The function’s content will be different based on the desired effect, and we aim to have the flexibility to change it dynamically. To simplify this process, we will use a “transform function factory” that will create a custom transform function each time.

Let’s start with a very minimal transform function where we simply pass the current video frame to the controller without any modifications. We will use this one to "clear" any other processing we apply.

// public/track-utils.js
...
// our "clean" transform factory
function cleanStream() {
  // it returns the actual transform function
  return function transform(frame, controller) {
    // for now, let's queue the current video frame
    controller.enqueue(frame);
  }
}

The next step is to include this pipeline in the presenter’s view. To do so, start by adding the newly created script to presenter.html. Then, let’s create a variable that stores the selected transform function, and change the handler for getUserMedia so it processes the video track before adding it to the video element.

The trick is to pass the video track acquired from getUserMedia along with the transform function we get from the transform function factory to our createProcessedTrack. We achieve this by using an anonymous function that calls whatever transform function is stored in transformFn.

The code for this is shown below:

<!-- public/presenter.html -->
...
  <script src="/socket.io/socket.io.js"></script>
  <!-- adding the newly created script -->
  <script src="track-utils.js"></script>
  <script src="main.js"></script>
  <script>
    const localVideo = document.getElementById("localVideo");
    // a variable to store the selected transformation
    let transformFn;

    navigator.mediaDevices
      .getUserMedia({ audio: false, video: true })
      .then((stream) => {
        /* use the stream */
        // create a transform function and assign it to transformFn variable
        transformFn = cleanStream();

        // start the video processing pipeline
        const pTrack = createProcessedTrack({
          track: stream.getVideoTracks()[0],
          transform: (frame, controller) => transformFn(frame, controller)
        });

        // add the processed track to video element in a new stream
        localVideo.srcObject = new MediaStream([pTrack]);
        styleVideos();
        startSignaling('p');
      })
...

Try refreshing the presenter view (or open it once again if you have closed it). While it seems like nothing happens, our video processing pipeline is working behind the scenes!
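
If you want to convince yourself that frames really are flowing through the pipeline, you can temporarily run something like this from the browser console on the presenter page (purely illustrative, not part of the demo); it swaps in a pass-through transform that counts frames:

// count processed frames from the browser console (illustrative only)
let frameCount = 0;
transformFn = (frame, controller) => {
  frameCount++;
  if (frameCount % 100 === 0) console.log(`processed ${frameCount} frames`);
  // pass the frame through untouched
  controller.enqueue(frame);
};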

Adding Text

To make it more interesting, let’s add a way for a presenter to display text announcements over the video. To do so, we will create a new transform function factory that sets up the necessary components and provides the required transform function.

This function will combine the video stream with a moving text overlay, and render the final result in an OffscreenCanvas, which is then used to create a new video frame using WebCodecs API’s VideoFrame interface.

// public/track-utils.js
...
// a customizable transform function factory for adding text
// let's add some default values
function showText({
  text,
  txtInitialX,
  txtColor = 'white',
  txtFontSize = '48px',
  txtFont = 'serif',
  textSpeed = 2,
  bgColor = '#08b9a6',
  bgPadding = 10,
  position = 'top'
}) {
  // an OffscreenCanvas for drawing the video frame and text
  const canvas = new OffscreenCanvas(1, 1);
  const ctx = canvas.getContext('2d');
  
  // some values for text size and x position in the canvas
  const intTxtFontSize = parseInt(txtFontSize);
  let x = txtInitialX;

  // the transform function
  return function transform(frame, controller) {
    // set canvas size same as the video frame
    const width = frame.displayWidth;
    const height = frame.displayHeight;
    canvas.width = width;
    canvas.height = height;

    // determine position of the text based on the params
    const bgHeight = intTxtFontSize + bgPadding;
    const bgPositionY = position === 'bottom'
      ? height - (intTxtFontSize + bgPadding + 5)
      : 5;
    const txtPositionY = position === 'bottom'
      ? height - (Math.floor(bgPadding / 2) + 10)
      : 5 + intTxtFontSize;

    // let's draw!
    ctx.clearRect(0, 0, width, height);
    ctx.drawImage(frame, 0, 0, width, height);
    ctx.font = txtFontSize + ' ' + txtFont;
    ctx.fillStyle = bgColor;
    ctx.fillRect(0, bgPositionY, width, bgHeight)
    ctx.fillStyle = txtColor;
    ctx.fillText(text, x, txtPositionY);

    // move the x position of the text
    x -= textSpeed;
    // restart the position after it leaves the screen
    if (x <= (0 - 100 - text.length * 20)) {
      x = width
    }

    // create a new frame based on the content of the canvas
    const newFrame = new VideoFrame(canvas, { timestamp: frame.timestamp });

    // close the current frame
    frame.close();
    // enqueue the new one
    controller.enqueue(newFrame);
  }
}

The next step is to call this new transform function factory from the UI and update the value of the transformFn variable so the pipeline processes the video track accordingly. Additionally, we’ll include an extra text box for users to input their announcements and a button to display them.

While we’re at it, let’s also add a way for the user to clear the message and get back to the original stream. Or rather, to apply the "clean" transformation we added before.

<!-- public/presenter.html -->
...
<div id="chat" class="chat">
  <!-- adding the new UI elements -->
  <input id="announcement" type="text" placeholder="Make an announcement..." />  
  <button id="announcementBtn">Announce</button>
  <button id="clsAnnouncementBtn">Clear</button>
  ...
</div>
...
<script>
  ...
  let transformFn;
  // get references from html elements
  const announcement = document.getElementById('announcement');
  const announcementBtn = document.getElementById('announcementBtn');
  const clsAnnouncementBtn = document.getElementById('clsAnnouncementBtn');

  ...

  // an event for adding the announcement
  announcementBtn.addEventListener('click', () => {
    // update transformFn with transform function from showText factory
    // use current video element stream width to calculate initial x position
    transformFn = showText({
      text: announcement.value,
      txtInitialX: localVideo.srcObject.getVideoTracks()[0].getSettings().width
    });
  });

  // an event for "cleaning" the stream
  clsAnnouncementBtn.addEventListener('click', () => {
    transformFn = cleanStream();
    announcement.value = '';
  });
</script>

It’s time to see our first two transform functions in action! Open the application once again and now try to send an announcement. You should see something like the image below.

The broadcast application now with the ability to add text on top of the video stream.

You should also be able to clear the video stream, which is the same as changing to a transform function that doesn’t perform any processing whatsoever.

Adding Overlay Images

Let’s introduce a new feature to our pipeline: the ability to overlay images onto the video stream. To achieve this, we’ll create another transform function factory. Here, we’ll leverage the File API to read an image provided by the user, and then utilize an OffscreenCanvas to combine all the elements.

As before, we will use WebCodecs API’s VideoFrame interface for generating video frames for the processed track.

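The showImage transform function factory follows the same pattern as showText. A minimal sketch is shown below; note that the createImageBitmap-based decoding and the imgWidth, imgHeight, positionX and positionY parameters (and their default values) are illustrative assumptions rather than the repo’s exact implementation:

// public/track-utils.js
...
// a customizable transform function factory for overlaying an image
// (sketch: parameter names and defaults are illustrative)
function showImage({
  image,
  imgWidth = 200,
  imgHeight = 200,
  positionX = 10,
  positionY = 10
}) {
  // an OffscreenCanvas for drawing the video frame and the overlay image
  const canvas = new OffscreenCanvas(1, 1);
  const ctx = canvas.getContext('2d');

  // decode the File object provided by the user into an ImageBitmap
  let bitmap;
  createImageBitmap(image).then((bmp) => { bitmap = bmp; });

  // the transform function
  return function transform(frame, controller) {
    // set canvas size same as the video frame
    const width = frame.displayWidth;
    const height = frame.displayHeight;
    canvas.width = width;
    canvas.height = height;

    // draw the current video frame and, once decoded, the overlay image
    ctx.clearRect(0, 0, width, height);
    ctx.drawImage(frame, 0, 0, width, height);
    if (bitmap) {
      ctx.drawImage(bitmap, positionX, positionY, imgWidth, imgHeight);
    }

    // create a new frame based on the content of the canvas
    const newFrame = new VideoFrame(canvas, { timestamp: frame.timestamp });

    // close the current frame and enqueue the new one
    frame.close();
    controller.enqueue(newFrame);
  }
}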

Next, let’s add an input element where the user can add the image. We will also need an event listener that creates a new transform function after an image has been added.

<!-- public/presenter.html -->
...
<div id="chat" class="chat">
  <input type="file" id="image" placeholder="Add an image..." accept="image/*" />
  <input id="announcement" type="text" placeholder="Make an announcement..." />
...
<script>
  ...
  const image = document.getElementById('image');
  let transformFn;
  ...
  
  image.addEventListener('change', () => {
    transformFn = showImage({image: image.files[0]});
    announcement.value = '';
  });
...

Open the application once again and try adding an image. You should see something similar to the image below:

The broadcast application now has the ability to show an image on top of the video stream.

Adding QR Codes

This new superpower feels great, doesn’t it? What other kind of live video processing can you think of? What about QR codes? These are a convenient way to share information with others and we can use our pipeline to easily add them to our application.

First, we need to generate a QR code. Let’s add the QRCode.js QR code generator library to the code. This is as simple as adding the following line to the presenter.html file.

<script src="https://cdnjs.cloudflare.com/ajax/libs/qrcodejs/1.0.0/qrcode.min.js">
</script>

Then, let’s create a transform function factory. It will be similar to the one we created for adding images, but instead of reading an image, we will generate a new QR code using the third-party library we just added. The code for the transform function factory is shown below:

// public/track-utils.js
...
// a customizable transform function factory for adding QR codes
// let's add some default values
function showQr({
  text,
  qrWidth = 256,
  qrHeight = 256,
  colorDark = '#000000',
  colorLight = '#FFFFFF',
  positionX = 10,
  positionY = 10
}) {
  // a canvas for bundling everything together
  const canvas = new OffscreenCanvas(1, 2);
  const ctx = canvas.getContext('2d');
  
  // a div element for hosting the QR code
  const qrDiv = document.createElement('div');
  
  // generating a new QR code on the qrDiv element
  new QRCode(qrDiv, {
    text,
    width: qrWidth,
    height: qrHeight,
    colorDark,
    colorLight
  });

  // the transform function
  return function transform(frame, controller) {
    // setting canvas size same as the video frame
    const width = frame.displayWidth;
    const height = frame.displayHeight;
    canvas.width = width;
    canvas.height = height;

    // drawing the current video frame and QR code
    ctx.clearRect(0, 0, width, height);
    ctx.drawImage(frame, 0, 0, width, height);
    ctx.drawImage(
      qrDiv.querySelector('canvas'), 
      positionX, 
      positionY, 
      qrWidth, 
      qrHeight
    );

    // get the current video frame timestamp before closing it
    const timestamp = frame.timestamp;
    // close the current video frame
    frame.close();

    // create a new video frame based on the content of the canvas
    const newFrame = new VideoFrame(canvas, { timestamp });
    // enqueue the new video frame
    controller.enqueue(newFrame);
  }
}

For the UI, let’s reuse the announcement text input and add an additional button that generates a QR code from the contents of that input. Finally, let’s add an event listener for this new button to update the selected transformation:

<!-- public/presenter.html -->
...
<input id="announcement" type="text" placeholder="Make an anoouncement..." />
<button id="announcementBtn">Announce</button>
<button id="qrBtn">Make QR</button>
<button id="clsAnnouncementBtn">Clear</button>
...
<script>
  ...
  const qrBtn = document.getElementById('qrBtn');
  let transformFn;
  ...
  qrBtn.addEventListener('click', () => {
    transformFn = showQr({ text: announcement.value });
    image.value = '';
  });
...

Let’s open the application once again and see our video showing a QR code.

The broadcasting application shows a QR code on top of the video frame.

Conclusion

The Insertable Streams for MediaStreamTrack and WebCodecs APIs provide an easy and straightforward approach for implementing live video processing pipelines for real-time communication applications. This empowers you to incorporate powerful capabilities like adding text, overlay images, and even QR codes that greatly enhance the value of your application. Keep an eye out for a future post where we’ll explore a real-world implementation of these features. Stay tuned for more exciting updates!

If you’re interested in implementing live video processing in real-time communication applications with WebRTC, you can always leverage the experts on WebRTC applications. Contact us and let’s make it live!
