Real-time communication is quickly becoming a must-have feature in many types of applications. From customer service to telehealth, video conferencing is rapidly being integrated into different industries’ workflows. If you’re reading this, chances are that this trend has caught your attention. You may be asking yourself: How do I build a video conference app?
Building this kind of application can be complex without proper guidance. You may end up with an app that “kills” your business by costing twice its value, or with one that barely satisfies your requirements and lacks the features you need.
In this article, we’ll provide some guidelines that will aid you in building a successful video conference application with the help of our core building block, WebRTC.
Step 1: Select a Platform (or not)
The first thing to consider when building a video conference application is the target platform. This will determine both the tools needed to build the app and the budget you’ll work with.
On which devices do you want your application to run? Smartphones? Tablets? Laptops or desktop computers? All of the above?
When you use WebRTC, you don’t need to commit to a single platform because you can support all of them. However, you should select the approach that best matches your requirements and the resources you have at hand.
Available options are web-based and native applications.
Initially, you may want to develop a web-based application. WebRTC APIs are already included in all major browsers, so any device that has one of them installed can be used to access your app.
Many popular video conference applications offer a web-based application as the default on desktop. Although this generally works for desktops and laptops and should also work well on mobile, you may want something more optimized for mobile devices.
In this case, a native application may make more sense. Additionally, if you want to support legacy devices which don’t support WebRTC APIs, native is the way to go.
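A web client can check at runtime whether the browser exposes the WebRTC APIs, for example to point users on unsupported devices toward a native app. The following is a minimal sketch, not a complete compatibility test. The `BrowserLike` interface and the `supportsWebRTC` function are hypothetical names introduced for illustration; only `RTCPeerConnection` and `navigator.mediaDevices.getUserMedia` are real browser APIs.

```typescript
// Hypothetical minimal shape of the browser globals this check inspects.
interface BrowserLike {
  RTCPeerConnection?: unknown;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

// Returns true when both core WebRTC entry points are present.
// This only detects that the APIs exist; it does not prove that a call
// will succeed (permissions, devices, and network conditions still matter).
function supportsWebRTC(env: BrowserLike): boolean {
  const hasPeerConnection = typeof env.RTCPeerConnection === "function";
  const hasGetUserMedia =
    typeof env.navigator?.mediaDevices?.getUserMedia === "function";
  return hasPeerConnection && hasGetUserMedia;
}

// In a browser, the global object would be passed in, for example:
//   supportsWebRTC(window as unknown as BrowserLike)
```

Passing the environment in as a parameter keeps the check easy to exercise outside a browser.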
For native applications, take into account which mobile platform you wish to support.
According to WebRTC.org, Android and iOS are officially supported through their respective programming languages: Java and Swift/Objective-C. In order to support both operating systems, you may need to build a different application for each. Alternatively, you can use a framework like React Native, which allows you to build native apps that run on both Android and iOS devices.
Progressive Web Apps (PWAs) are a technology that’s gaining popularity. A PWA is a web application that can be installed and used much like a native one, so it combines advantages of both. It’s worth a look when planning how to build your application.
Many popular video conferencing applications like Hangouts or appear.in offer a web-based application for desktop and a native application for mobile devices. Think about what suits your needs best. Adopt a strategy that allows you to progressively support your desired target platform(s).
Step 2: Define Your Features
After selecting your application’s platform, you need to define the features your app will offer. Depending on your business requirements, you’ll want to prioritize some features and focus your effort and budget on them.
For example, being able to use filters or funny icons during a call is a nice touch, but it might not be suitable for a technical support or customer service app. Instead, the ability to record calls can be a useful feature, especially for quality analysis or regulatory compliance.
Here’s a short list of the most common real-time features offered by popular video conference apps.
Pre-call video preview
Popular apps like Google Hangouts allow users to check their camera before joining a call. They also provide the option to disable the video camera.
Text chat is a must in most video conference applications. It provides an additional, written channel for users to communicate during a call.
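In a web app, a pre-call preview usually starts with a `getUserMedia` request. The sketch below only builds the constraints object for that request from the user’s camera and microphone toggles. The interface, the function names, and the 1280x720 target resolution are assumptions made for illustration.

```typescript
// Local mirror of the fields of the browser's MediaStreamConstraints used here.
interface PreviewConstraints {
  audio: boolean;
  video:
    | false
    | { width: { ideal: number }; height: { ideal: number }; facingMode: "user" };
}

// Builds constraints from the user's toggles. The resolution is an
// illustrative choice, expressed as "ideal" so the browser may pick another.
function buildPreviewConstraints(cameraOn: boolean, micOn: boolean): PreviewConstraints {
  return {
    audio: micOn,
    video: cameraOn
      ? { width: { ideal: 1280 }, height: { ideal: 720 }, facingMode: "user" }
      : false,
  };
}

// getUserMedia rejects when neither audio nor video is requested,
// so the request should be skipped in that case.
function needsMediaRequest(constraints: PreviewConstraints): boolean {
  return constraints.audio || constraints.video !== false;
}

// In a browser, the result would be used roughly like this:
//   const stream = await navigator.mediaDevices.getUserMedia(constraints);
//   videoElement.srcObject = stream;
```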
Exchanging files is a convenient feature. In a telehealth app, this allows a patient to send test results and medical records to their physician. Please note that you must take into account the measures needed to securely store and transport these files, especially for a telehealth application.
Allowing more than two users to join a call is complex because it involves not only the application but also the underlying infrastructure that hosts it. Having a clear idea of the number of users you want your application to support in a call is the key to choosing an appropriate strategy later.
Sharing screens during a call is helpful in areas like remote technical support, where a specialist can guide users when they’re having trouble by giving instructions based on what the user sees.
Many apps allow call recording for different purposes. If you want your application to record calls, consider the type of storage needed, the format of the recordings, and the security measures necessary to prevent unauthorized users from accessing them.
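One way to settle the recording format on the client is to probe which container and codec combinations the browser can record. The sketch below assumes client-side recording with the browser’s `MediaRecorder` API, which is only one possible approach. The candidate list and the function name are illustrative assumptions.

```typescript
// Illustrative preference order; adjust it to the formats your storage
// and playback pipeline can handle.
const RECORDING_MIME_CANDIDATES: string[] = [
  "video/webm;codecs=vp9,opus",
  "video/webm;codecs=vp8,opus",
  "video/webm",
  "video/mp4",
];

// Returns the first candidate the environment reports as supported,
// or undefined when none is. The support check is passed in so the
// selection logic does not depend on browser globals.
function pickRecordingMimeType(
  isSupported: (mimeType: string) => boolean,
  candidates: string[] = RECORDING_MIME_CANDIDATES
): string | undefined {
  for (const candidate of candidates) {
    if (isSupported(candidate)) {
      return candidate;
    }
  }
  return undefined;
}

// In a browser, usage would look roughly like this:
//   const mimeType = pickRecordingMimeType((t) => MediaRecorder.isTypeSupported(t));
//   const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
```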
Filters and icons
Popular social media applications allow users to add filters or icons to media streams during a call. This can be fun and can serve different purposes, but its practicality depends on your business and use case.
A whiteboard is a valuable feature for educational applications, in which one user teaches something to others. It gives the teacher a tool to visually express ideas to students.
Live streaming is another popular feature thanks to social media! It allows a user to stream video and audio to other users in real-time. This can be really helpful in disaster control applications, where a rescue worker with access to disaster zones can give feedback about a situation to government or rescue organizations in real-time so that they can respond appropriately.
Step 3: Know the Stack
Now that you have a good idea of the platform where your app will run and the features it will offer, it’s time to determine how you’ll build it.
From a technical point of view, WebRTC is a group of standards exposed through APIs that can be used to gain access to media devices and establish a peer-to-peer connection with other clients. These APIs, in conjunction with a signaling process and other elements, are used to initiate video/audio calls between two or more users.
The signaling process is not defined as part of the technology. Developers are free to use any of the well-known signaling protocols, such as SIP and XMPP, or implement their own solution using a full-duplex communication technology like WebSockets.
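To make the idea of a custom signaling solution more concrete, here is a minimal in-memory sketch of a relay that forwards offers, answers, and ICE candidates between peers. The message format and the `SignalingRelay` class are hypothetical, since WebRTC does not prescribe either. On a real server, each `deliver` callback would typically wrap the `send` call of a WebSocket connection.

```typescript
// Hypothetical message format; WebRTC itself does not define one.
type SignalMessage =
  | { type: "offer"; from: string; to: string; sdp: string }
  | { type: "answer"; from: string; to: string; sdp: string }
  | { type: "candidate"; from: string; to: string; candidate: string };

type Deliver = (message: SignalMessage) => void;

class SignalingRelay {
  private peers = new Map<string, Deliver>();

  // Registers a peer and the callback used to reach it.
  join(peerId: string, deliver: Deliver): void {
    this.peers.set(peerId, deliver);
  }

  // Removes a peer, for example when its connection closes.
  leave(peerId: string): void {
    this.peers.delete(peerId);
  }

  // Forwards a message to its addressee.
  // Returns false when the target peer is not connected.
  relay(message: SignalMessage): boolean {
    const deliver = this.peers.get(message.to);
    if (!deliver) {
      return false;
    }
    deliver(message);
    return true;
  }
}
```

The relay never inspects the SDP or candidate payloads. It only routes them, which is all the signaling layer needs to do before the peers connect to each other.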
There are two ways to develop and run a video conference application using WebRTC: On-premise and using a CPaaS (Communication-Platform-as-a-Service) provider.
On-premise means that you’re responsible for both developing the application and managing the required server infrastructure.
On the other hand, using a CPaaS means that you only take care of developing the app. You pay a monthly fee to a provider to maintain the infrastructure.
This leaves us with three strategies to develop a video conference application:
- On-premise: Peer-to-peer approach
- On-premise: Media server approach
- Using a Communication-Platform-as-a-Service (CPaaS) Provider
Let’s briefly discuss these options below.
On-Premise: Peer-to-Peer Approach
WebRTC is peer-to-peer by nature. This means that most of the time, there will be no intermediaries in a WebRTC call. The communication will be direct, browser-to-browser or device-to-device. Coupled with the fact that it encrypts the media transport by default, this makes WebRTC a secure solution for real-time communication.
Client devices typically reside behind NAT configurations and/or firewall restrictions, making it difficult and sometimes impossible to establish a direct connection between them. To overcome this, a STUN server helps each client discover its public address so that a peer-to-peer connection can be established, and a TURN server relays media to the other user when a direct connection is not possible.
When building a web conference application on-premise using this approach, you’re in charge of building the application, setting up the signaling layer (whether an in-house solution or something like SIP or XMPP), and running the STUN/TURN servers.
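On the client side, STUN and TURN servers are listed in the configuration passed to `RTCPeerConnection`. The sketch below builds such a configuration object. The host names `stun.example.com` and `turn.example.com` are placeholders, and the function and interface names are hypothetical. Only the `iceServers` structure mirrors the browser’s `RTCConfiguration`.

```typescript
// Local mirror of the RTCConfiguration fields used in this sketch.
interface IceServer {
  urls: string;
  username?: string;
  credential?: string;
}

interface IceConfig {
  iceServers: IceServer[];
}

interface TurnCredentials {
  username: string;
  credential: string;
}

// Placeholder addresses; replace them with your own servers.
const STUN_URL = "stun:stun.example.com:3478";
const TURN_URL = "turn:turn.example.com:3478";

// Always lists the STUN server. The TURN server is added only when
// credentials are available, since TURN servers require authentication.
function buildIceConfig(turn?: TurnCredentials): IceConfig {
  const iceServers: IceServer[] = [{ urls: STUN_URL }];
  if (turn) {
    iceServers.push({
      urls: TURN_URL,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

// In a browser: const pc = new RTCPeerConnection(buildIceConfig(credentials));
```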
The main advantage of this approach is that you have full control over the performance of your application. The downside is that you need to provide and maintain your infrastructure.
Remember that due to the peer-to-peer nature of WebRTC, some features, like call recording, stream manipulation, and multi-party calls, are not easy to implement without placing an additional burden on the client applications. This can lead to call failures in some circumstances. Because of this, you may want to add a media server to take on that extra work.
On-Premise: Media Server Approach
A media server sits in the middle of the call participants and sends and receives streams to and from them. This gives you a central point for manipulating media streams, allowing you to add advanced features like recording, simulcast, and multi-party calls.
When building an on-premise web conference application using the media server approach, besides building the application and adding the signaling layer and STUN/TURN servers, you need to add the actual media server and configure it accordingly. Some popular open-source options for media servers are Kurento, Jitsi, and Janus.
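To see why a media server helps with multi-party calls, it can be useful to count streams. The sketch below compares a full peer-to-peer mesh with a media server that forwards each participant’s stream to the others. It assumes one stream per participant and a forwarding-style server, which is only one of several media server designs. The type and function names are illustrative.

```typescript
interface StreamLoad {
  uplinksPerClient: number;   // streams each client sends
  downlinksPerClient: number; // streams each client receives
  totalConnections: number;   // peer connections in the whole call
}

// Full mesh: every client connects directly to every other client.
function meshLoad(participants: number): StreamLoad {
  const others = Math.max(participants - 1, 0);
  return {
    uplinksPerClient: others,
    downlinksPerClient: others,
    totalConnections: (participants * others) / 2,
  };
}

// Forwarding media server: each client uploads once to the server
// and receives one stream per other participant from it.
function forwardingServerLoad(participants: number): StreamLoad {
  const others = Math.max(participants - 1, 0);
  return {
    uplinksPerClient: participants > 0 ? 1 : 0,
    downlinksPerClient: others,
    totalConnections: participants,
  };
}
```

Under these assumptions, each client in a four-person mesh uploads three streams over six connections in total, while with a forwarding server each client uploads a single stream over one connection of its own. The upload side is where a mesh tends to strain client devices first.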
Using a Communication-Platform-as-a-Service (CPaaS) Provider
This is the simplest way to build a video conference app, as it frees you from having to provision and maintain your own infrastructure, thereby allowing you to focus on creating your application.
Note that when you use a CPaaS, you have little to no control over the infrastructure. You should also take into account the monthly fee that you’ll be charged for its use.
Some popular CPaaS providers are TokBox, Vidyo, and Twilio. If you want to know more, check out our comparison of CPaaS providers post.
It’s possible to develop a video conference application efficiently. We hope this blog post has given you the insights you need to accomplish this. Focusing on the platform and features you really need and adopting the right strategy for your needs will give you an application that increases your business value without exhausting yourself or your budget along the way.
Bonus: Learn to Build with Us!
As bonus content, we offer a complete course on video conference application development. In this course, we define each of the required components of a WebRTC call and provide working examples of the three strategies discussed above.
If you’d like to leave the building to us, we’re happy to help! We have an experienced team ready to build a video conference application for your business. Contact us today!