While most people know Zoom as a video conferencing app, the Zoom Video SDK opens up the powerful infrastructure behind it to developers who want to embed custom video experiences directly into their own web applications. Behind the scenes, Video SDK v2 now leverages WebRTC, the industry-standard protocol for real-time communication in the browser, ensuring lower latency and broad cross-platform compatibility.

If you’re looking to create a live video experience—such as internal collaboration tools, telehealth platforms, educational apps, or virtual events—the Zoom Video SDK for Web is a great starting point, especially for teams already using Zoom.

In this post, we’ll walk you through the foundational steps to integrate WebRTC-powered video into your web app using the Zoom Video SDK. You’ll learn how to:

  • Set up your Zoom Developer account
  • Generate the credentials needed to authenticate your app
  • Enable and configure WebRTC video sessions for seamless in-browser communication
  • Build custom user interfaces

Whether you’re new to real-time video application development or just exploring Zoom’s SDK options, this post will help you hit the ground running.

Creating a Developer Account on the Zoom Platform

The first thing you need is a Developer account on the Zoom platform. Head over to the Plans & Pricing for Developers page and select your preferred plan. You can get started for free with the Video SDK Pay As You Go option: a credit card is required, but you get a generous quota of 10,000 free minutes each month.

If you already have a Zoom Workplace account, you can add the Video SDK in Plans & Billing, under Plan Management.

In both cases, you need to get your SDK credentials to build your application. To retrieve these, while in your profile, click on Build App, and then scroll down until you see your keys.

Now you’re ready to build your application! If you want to follow along with the examples below, check our zoom-videosdk-demo repository on GitHub.

Authorizing Requests & Enabling WebRTC

Once you have your SDK credentials, the next step is to create a backend service that authorizes your users to create and join sessions in the Zoom platform. To do so, you need to create JSON Web Tokens (JWTs) and sign them with the SDK secret using the HMAC SHA256 algorithm.

The token requires a header containing an “alg” key set to “HS256” and “typ” set to “JWT”, along with a payload. For the payload, the Zoom platform expects multiple values, including the following:

  • app_key: the Video SDK key.
  • role_type: the type of user making the request.
    • 1 for the session’s host & co-host.
    • 0 for participants.
  • tpc: the session name.
  • iat: the token’s timestamp.
  • user_identity: identifier for the user in the session.
  • session_key: if set by the host, all the participants should provide the same key to join the session.
  • geo_regions: data center connection preferences for the user joining the session.
  • video_webrtc_mode: tells Zoom to enable WebRTC for video streams
    • 1 for enabling it
    • 0 for disabling it
  • audio_webrtc_mode: tells Zoom to enable WebRTC for audio streams
    • 1 for enabling it
    • 0 for disabling it

Note: Zoom’s WebRTC implementation is not yet supported across all browsers and operating systems. Please refer to Zoom’s Browser Support page for a list of supported platforms. Where WebRTC is unsupported, the SDK falls back to WebAssembly.

You can see the complete list of available keys at the Authorize page of Video SDK documentation.
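
To make the token structure concrete, here is a minimal sketch that builds and signs a token by hand with Node's built-in crypto module instead of a JWT library. The helper name signVideoSdkToken and its parameters are hypothetical, and only a subset of the payload keys listed above is included. Treat it as an illustration of the header.payload.signature layout, not as a production implementation.

```javascript
// Sketch only: hand-rolled HS256 JWT using Node's built-in crypto module.
const { createHmac } = require('node:crypto');

// base64url-encode any JSON-serializable value
const encode = (value) =>
  Buffer.from(JSON.stringify(value)).toString('base64url');

// Hypothetical helper: builds and signs a Video SDK token.
function signVideoSdkToken({
  sdkKey,
  sdkSecret,
  sessionName,
  userId,
  isHost = false,
  now = Date.now(),
}) {
  const iat = Math.floor(now / 1000);

  const header = { alg: 'HS256', typ: 'JWT' };
  const payload = {
    app_key: sdkKey,
    role_type: isHost ? 1 : 0, // 1 = host or co-host, 0 = participant
    tpc: sessionName,
    version: 1,
    iat,
    exp: iat + 60 * 60 * 2, // two hours, matching the Express route in this post
    user_identity: userId,
    video_webrtc_mode: 1,
    audio_webrtc_mode: 1,
  };

  // The signature covers "<header>.<payload>", both base64url-encoded
  const signingInput = `${encode(header)}.${encode(payload)}`;
  const signature = createHmac('sha256', sdkSecret)
    .update(signingInput)
    .digest('base64url');

  return `${signingInput}.${signature}`;
}
```

A library takes care of the encoding for you, but the result has the same three dot-separated parts.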

With all this information you can use a cryptography library such as jsrsasign to create JWT tokens for your clients. 

For example, here’s how you would create an /auth route for authorizing hosts, using express: 

import express from 'express';
import { KJUR } from 'jsrsasign';
import 'dotenv/config';

const app = express();
const PORT = process.env.PORT || 3000;

app.use(express.json());

app.post('/auth', (req, res) => {
  // get parameters from clients
  //   for prod applications, make sure to validate/sanitize properly 
  const { sessionName, userId, sessionKey } = req.body;

  if (!sessionName || !userId || !sessionKey) {
    return res.status(400).json({ error: 'Missing required fields' });
  }

  const iat = Math.floor(Date.now() / 1000);
  const exp = iat + 60 * 60 * 2

  // create the header for the token
  const oHeader = { alg: 'HS256', typ: 'JWT' }

  // create payload with desired settings
  const oPayload = {
    app_key: process.env.ZOOM_VIDEO_SDK_KEY,
    role_type: 1,
    tpc: sessionName,
    version: 1,
    iat,
    exp,
    user_identity: userId,
    session_key: sessionKey,
    geo_regions: 'US',
    video_webrtc_mode: 1, // enable webrtc for video when available
    audio_webrtc_mode: 1, // enable webrtc for audio when available
  }

  const sHeader = JSON.stringify(oHeader);
  const sPayload = JSON.stringify(oPayload);
  
  // sign the token using Video SDK secret
  const token = KJUR.jws.JWS.sign(
    'HS256', 
    sHeader, 
    sPayload, 
    process.env.ZOOM_VIDEO_SDK_SECRET
  );

  res.status(200).json({
    success: true,
    token
  });
});

app.listen(PORT, () => {
  console.log(`Server listening on port ${PORT}`);
});
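
The route above only checks that the fields are present. As the comment in the handler suggests, a production service would validate more strictly. The sketch below shows one possible shape for that check. validateAuthRequest is a hypothetical helper, and the length limits are assumptions that should be confirmed against the Authorize page of the Video SDK documentation.

```javascript
// Assumed maximum lengths; confirm them against Zoom's documentation.
const MAX_LENGTHS = { sessionName: 200, userId: 35, sessionKey: 36 };

// Hypothetical helper: returns { ok, errors } for an /auth request body.
function validateAuthRequest(body) {
  const input = body && typeof body === 'object' ? body : {};
  const errors = [];

  for (const [field, maxLength] of Object.entries(MAX_LENGTHS)) {
    const value = input[field];

    if (value === undefined || value === null || value === '') {
      errors.push(`${field} is required`);
    } else if (typeof value !== 'string') {
      errors.push(`${field} must be a string`);
    } else if (value.length > maxLength) {
      errors.push(`${field} must be at most ${maxLength} characters`);
    }
  }

  return { ok: errors.length === 0, errors };
}
```

Inside the route, you could call it before building the payload and respond with a 400 status and the errors array when ok is false.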

With the authorization route in place you need to make your clients consume it. For instance, in a React application you can use the useEffect hook and the fetch function to retrieve a token when the component is mounted.

import React, { useEffect, useState } from 'react';

function MyComponent() {
  const [token, setToken] = useState(null);

  useEffect(() => {
    const fetchToken = async () => {
      try {
        const response = await fetch('/auth', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({ 
            sessionName: 'MySession', // Example data
            userId: 'user123',      // Example data
            sessionKey: 'secretKey',    // Example data
          })
        });

        if (response.ok) {
          const data = await response.json();
          setToken(data.token); // Assuming the token is in data.token
        } else {
          console.error('Failed to fetch token');
        }
      } catch (error) {
        console.error('Error fetching token:', error);
      }
    };

    fetchToken();
  }, []); // Empty dependency array ensures this runs only once on mount

  return (
    <div>
      {token ? (
        /* render UI for the session here */
        <p>Ready to join the session.</p>
      ) : (
        <p>Fetching token...</p>
      )}
    </div>
  );
}

export default MyComponent;
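
Because the token carries an exp claim (two hours after iat in the route above), a long-lived page may end up holding a stale token. One possible approach, sketched below, is to read exp on the client and request a new token shortly before it lapses. decodeJwtPayload and isTokenExpired are hypothetical helpers, and the 30-second safety margin is an arbitrary assumption. The decoding does not verify the signature, so it is only suitable for scheduling a refresh.

```javascript
// Decode the payload (middle segment) of a JWT WITHOUT verifying it.
// Note: atob yields a binary string, so non-ASCII claim values would need
// extra decoding; numeric claims such as exp are unaffected.
function decodeJwtPayload(token) {
  const segment = String(token).split('.')[1];
  if (!segment) {
    throw new Error('Not a JWT');
  }
  // convert base64url to standard base64, then restore padding
  const base64 = segment.replace(/-/g, '+').replace(/_/g, '/');
  const padded = base64.padEnd(Math.ceil(base64.length / 4) * 4, '=');
  return JSON.parse(atob(padded));
}

// True when the token is expired or within skewSeconds of expiring.
function isTokenExpired(token, nowMs = Date.now(), skewSeconds = 30) {
  const { exp } = decodeJwtPayload(token);
  if (typeof exp !== 'number') {
    return true; // no usable exp claim: treat as expired and refetch
  }
  return exp - skewSeconds <= Math.floor(nowMs / 1000);
}
```

In the component, you might call isTokenExpired(token) before joining and run fetchToken again when it returns true.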

Quickly add video conferencing using the Zoom UI Toolkit

Once you have your backend authenticator, and your clients are getting their tokens from it, it’s time to join some sessions in the Zoom platform.

One quick way to do this is by leveraging prebuilt components from UI Toolkit. This approach allows you to instantly add a core set of Video SDK features in your app through a ready-made video chat user interface.

This ready-to-use user interface includes features like previewing media devices, sharing video, audio, and screen streams, chat, and call settings. These features can be rendered as individual components or as a combined composite.

Let’s walk through an example of using UI Toolkit in a React application. 

First install the @zoom/videosdk-ui-toolkit library.

npm install @zoom/videosdk-ui-toolkit --save

Next, in your application import the library and create a configuration object including the authorization token, session and user names, and a session key if set. You’ll use this to allow the client to join the session.

import uitoolkit from "@zoom/videosdk-ui-toolkit";
import "@zoom/videosdk-ui-toolkit/dist/videosdk-ui-toolkit.css";
...
function MyComponent() {
  ...
  const config = {
    videoSDKJWT: "your_jwt_token",
    sessionName: "MySessionName",
    userName: "MyUser",
    sessionPasscode: "MySessionKey",
    scenario: {
      mode: uitoolkit.ScenarioMode
        ? uitoolkit.ScenarioMode.DEFAULT
        : 0
    },
    features: [
      'video', 
      'audio', 
      'chat', 
      'share', 
      'users', 
      'settings'
    ]
  };
}
...
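
The hard-coded values above are placeholders. A small helper along these lines could assemble the configuration from the token your /auth route returns. buildToolkitConfig is a hypothetical name, it only uses the configuration keys shown above, and it leaves out the scenario setting for brevity.

```javascript
// Feature names taken from the configuration example above
const DEFAULT_FEATURES = ['video', 'audio', 'chat', 'share', 'users', 'settings'];

// Hypothetical helper: builds a UI Toolkit config from runtime values.
function buildToolkitConfig({ token, sessionName, userName, sessionPasscode, features }) {
  if (!token) {
    throw new Error('A signed JWT is required before joining');
  }
  if (!sessionName || !userName) {
    throw new Error('sessionName and userName are required');
  }

  const config = {
    videoSDKJWT: token,
    sessionName, // should match the tpc value used to sign the token
    userName,
    features: features ?? [...DEFAULT_FEATURES],
  };

  // only include the passcode when one was provided
  if (sessionPasscode) {
    config.sessionPasscode = sessionPasscode;
  }

  return config;
}
```

With the earlier component, this might be called as buildToolkitConfig({ token, sessionName: 'MySession', userName: 'MyUser' }) once the token state is set.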

Then, define an HTML container for the UI Toolkit composite user interface or for individual components. You’ll pass a reference to this element to uitoolkit’s joinSession method, along with the configuration object, to join the session.

Also define the logic for handling the end of the session. Typically, you’ll want to listen for the onSessionClosed and onSessionDestroyed events, and call uitoolkit.destroy() in the latter. You can also end sessions programmatically by calling uitoolkit.closeSession(sessionContainerRef.current).

...
function MyComponent() {
  ...
  // get the reference of the container for UI Toolkit
  const sessionContainerRef = useRef<HTMLDivElement>(null);

  function joinSession() {
    // join the session by passing reference of ui container and config object
    uitoolkit.joinSession(sessionContainerRef.current, config);

    // handle the end of the session
    uitoolkit.onSessionClosed(() => {
      console.log('Session closed');
    });
    uitoolkit.onSessionDestroyed(() => {
      uitoolkit.destroy();
    });
  }
  ...
  // render ui container
  return (
    ...
    <div id="sessionContainer" ref={sessionContainerRef}></div>
    <button onClick={joinSession}>Join Session</button>
  )
}
...
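
If the Join button can be clicked more than once, it may help to keep the join and cleanup logic in one place. The sketch below is one way to do that. createSessionController is a hypothetical helper that receives the toolkit object as a parameter (so it can be exercised with a stub) and only calls the uitoolkit methods mentioned above.

```javascript
// Hypothetical helper: wraps join/leave so repeated clicks are ignored.
function createSessionController(toolkit, container, config) {
  let joined = false;

  return {
    // returns true if a join was started, false if already joined
    join() {
      if (joined) return false;

      toolkit.joinSession(container, config);
      toolkit.onSessionClosed(() => {
        joined = false;
      });
      toolkit.onSessionDestroyed(() => {
        toolkit.destroy();
        joined = false;
      });

      joined = true;
      return true;
    },

    // returns true if a session was closed, false if none was active
    leave() {
      if (!joined) return false;
      toolkit.closeSession(container);
      joined = false;
      return true;
    },

    isJoined() {
      return joined;
    },
  };
}
```

In the component above, this could be created with createSessionController(uitoolkit, sessionContainerRef.current, config) and its join method used as the button's click handler.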

If you wish to render each component individually, define a wrapper element instead and add the components you want inside it. You can then use the showControlsComponent and hideControlsComponent methods to show or hide the controls component. Note that some components, such as the controls component, are required.

...
function MyComponent() {
  ...
  function joinSession() {
    ...
    // show controls component when joining the session
    uitoolkit.showControlsComponent(controlsContainerRef.current)
  }
  ...
  return (
    ...
    <div id="sessionContainer" ref={sessionContainerRef}>
      <div id="controlsContainer" ref={controlsContainerRef}></div>
      {/* ... rest of the components */}
    </div>
  );
}

Figure: Default look of the Zoom UI Toolkit in a React application.

Building custom user interfaces with the Zoom Video SDK

You can also use the Zoom Video SDK to build custom user interfaces. This approach consists of the following steps:

  1. Create a Video SDK client.
  2. Define a UI element for rendering media streams in the page, and also for managing session status.
  3. Initialize the client and set an event handler for the peer-video-state-change event. This allows the application to handle remote stream changes.
  4. Join the session.
  5. Retrieve the Stream interface and start local audio & video streams.
  6. Render such streams in the page.

Let’s go through each of these steps using a React application as an example. Before anything else, install the @zoom/videosdk library.

npm install @zoom/videosdk --save

Next, let’s import the library in our component and create a Video SDK client.

...
import ZoomVideo, { VideoQuality } from '@zoom/videosdk';
...
function MyComponent() {
  ...
  const client = useRef(ZoomVideo.createClient());
  ...
}

Now let’s define a UI element for rendering media streams. Same as before, use the useRef hook to get a reference to it in code. While we’re at it let’s also define buttons for joining and leaving the session.

...
function MyComponent(){
  ...
  const [inSession, setInSession] = useState(false);
  const videoContainerRef = useRef(null);
  ...
  return (
    ...
    {!inSession ? (
      <button onClick={joinSession}>Join Session</button>
    ) : (
      <button onClick={leaveSession}>Leave Session</button>
    )}
    <video-player-container ref={videoContainerRef}></video-player-container>
    ...
  );
}

Let’s now define the joinSession function we referenced above. Start by initializing the client and setting an event handler for the peer-video-state-change event. The handler receives a payload with two properties, action and userId: action tells us whether to show or hide a video, and userId identifies the user whose video it is.

...
const joinSession = async () => {
  // initialize the client
  await client.current.init('en-US', 'Global', { patchJsMedia: true });
  // set event handler for remote streams changes
  client.current.on('peer-video-state-change', renderVideo);
}

const renderVideo = async (payload) => {
  // get the stream object. More on this later
  const stream = client.current.getMediaStream();

  // check the action
  if (payload.action === "Start") {
    // show video of the userId defined in payload
    const userVideo = await stream.attachVideo(
      payload.userId,
      VideoQuality.Video_360P
    );
    videoContainerRef.current.appendChild(userVideo);
  } else if (payload.action === "Stop") {
    // hide video of the userId defined in payload
    stream.detachVideo(payload.userId);
  }
}
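
The handler above appends a new element every time it sees a Start action and does not remove the element from the container on Stop. One possible refinement, sketched below, tracks which users are already rendered so each user gets at most one element, and that element is removed when their video stops. createVideoRenderer is a hypothetical helper. It assumes attachVideo resolves to a DOM element, as the code above does, and takes the stream getter and container as parameters so it can be tried with stubs.

```javascript
// Hypothetical helper: returns a handler for peer-video-state-change
// that keeps at most one rendered element per user.
function createVideoRenderer(getStream, container, quality) {
  const rendered = new Map(); // userId -> attached element

  return async function renderVideo({ action, userId }) {
    const stream = getStream();

    if (action === 'Start') {
      if (rendered.has(userId)) return; // already on the page

      const element = await stream.attachVideo(userId, quality);
      container.appendChild(element);
      rendered.set(userId, element);
    } else if (action === 'Stop') {
      const element = rendered.get(userId);
      if (!element) return; // nothing to remove

      await stream.detachVideo(userId);
      element.remove();
      rendered.delete(userId);
    }
  };
}
```

In the component, this might be wired up as createVideoRenderer(() => client.current.getMediaStream(), videoContainerRef.current, VideoQuality.Video_360P). The sketch does not guard against two Start events for the same user arriving before the first attachVideo call resolves.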

With the event handler defined let’s continue with the rest of the joinSession function. We want to join the session passing the session name, authorization token, user name and session password. These should match the ones we used to build the token.

After joining the session, we start the local audio and video using an instance of the Stream interface which provides methods for managing the behavior of media streams. Then we can render such media in the page using the renderVideo function we defined previously.

...
const joinSession = async () => {
  ...
  // join the session and update session status
  await client.current.join("MySessionName", "your_jwt_token", "MyUser", "MySessionKey");
  setInSession(true);

  // retrieve an instance of the Stream interface
  const stream = client.current.getMediaStream();

  // start local audio and video capture
  await stream.startAudio();
  await stream.startVideo();

  // render video in the page
  await renderVideo({ 
    action: 'Start', 
    userId: client.current.getCurrentUserInfo().userId 
  });
}
...

Figure: A React application with a custom UI joining a session on an Android device.
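
The Leave Session button shown earlier calls a leaveSession function that the snippets above do not define. A minimal sketch of what it might look like follows. endSession is a hypothetical helper that takes the client, the event handler, and the state setter as parameters. It relies on the SDK's client.off, stream.stopVideo, stream.stopAudio, and client.leave methods, so check the Video SDK reference for their exact behavior in your version.

```javascript
// Hypothetical helper: stops local media, leaves the session,
// and updates the UI state.
async function endSession(client, onVideoChange, setInSession) {
  // stop reacting to remote video changes
  client.off('peer-video-state-change', onVideoChange);

  const stream = client.getMediaStream();
  try {
    await stream.stopVideo();
    await stream.stopAudio();
  } catch (error) {
    // stopping can fail if capture never started; still leave the session
    console.warn('Could not stop local media cleanly:', error);
  }

  await client.leave();
  setInSession(false);
}
```

In the component, leaveSession could then be defined as () => endSession(client.current, renderVideo, setInSession).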

Wrapping Up

In this guide, you’ve learned the essential steps to integrate WebRTC-powered video using the Zoom Video SDK for Web into your applications. From creating a developer account and generating JWT tokens, to quickly implementing video conferencing with the UI Toolkit and building custom user interfaces, you now have the foundation to harness Zoom’s powerful video platform and the real-time communication capabilities of WebRTC.

If you’re ready to bring Zoom’s industry-leading video conferencing to your app but want more than just basic integration, WebRTC.ventures is here to help you architect scalable, reliable real-time communication systems tailored to your business needs. From optimizing performance and ensuring security, to integrating complex workflows and providing ongoing support, our team works with you to deliver a polished, future-proof solution.

Let’s make it live! Contact us today and let’s unlock the full potential of the Zoom Video SDK for your web application. 
