Building recording into your WebRTC video or audio application should never be an afterthought. When clients come to us looking to develop a WebRTC video or audio application, one of the first questions we ask is, “Do you need recording?”

If the answer is yes, the next question is always, “what for?” And then, “how much recording?” It is essential to think about these things up front. The answers will be an important driver in the architecture you use to create your app (CPaaS, open source, or native), as well as whether to handle the recordings as composite or individual streams. More on all of this below.

You can watch this and other tips from our engineering team as part of our WebRTC Tips YouTube video series. Or, read on. 

Some Use Cases for Adding Recording to Your WebRTC Application

  • Recording a meeting
  • Customer service or quality assurance, whether for training purposes or record keeping
  • Producing a webinar or event

Considerations When Adding Recording to Your WebRTC Application

Recording costs

If you are using a CPaaS, there will usually be an additional fee for recording. If you record at a large scale, this fee could become prohibitive.
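To see how such a fee grows with usage, here is a minimal back-of-the-envelope sketch. The function name, the flat per-minute pricing model, and the example rate are all assumptions made for illustration; they are not any vendor's actual pricing.

```python
def monthly_recording_fee(sessions_per_month: int,
                          avg_minutes_per_session: float,
                          fee_per_recorded_minute: float) -> float:
    """Estimate a monthly recording bill under a flat per-minute fee.

    Assumption: the provider charges per recorded minute. Real pricing
    models vary (per participant, per composed minute, tiered, and so on).
    """
    recorded_minutes = sessions_per_month * avg_minutes_per_session
    return recorded_minutes * fee_per_recorded_minute


# Hypothetical rate of 0.01 per recorded minute, chosen only to show scaling.
small_app_fee = monthly_recording_fee(100, 30, 0.01)
large_app_fee = monthly_recording_fee(100_000, 30, 0.01)
```

With the same hypothetical rate, the bill grows linearly with the number of recorded sessions, which is why a fee that is negligible for a small app can dominate costs at scale.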

Storage/CPU costs

Even if you are using an open source architecture, cost is still a consideration. Recording may add some processing burden to your media servers and therefore affect how well they scale under heavy load. Post-recording processing (if necessary) can create an extra burden as well if it runs on the media server. You will most likely want to store the recordings somewhere other than your media server (Amazon S3 is typical), so the longer-term cost of keeping the files around needs to be considered too.

Recording layouts

Depending on the architecture you choose, you may have only limited flexibility in your recording layout.

And, the layout options in the recording may not be the same as in the application itself. Issues arise when you have a meeting tool with multiple people on the screen, screen sharing, and more. 

Recording quality

Do you need full HD-quality recording or can you make do with something smaller? Also, how long will you be keeping the recordings? These all affect your storage cost.
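The link between quality, retention, and storage can be sketched with simple arithmetic. The bitrates below are hypothetical placeholders, not measured values for any codec or media server, and the function names are illustrative only.

```python
def recording_size_gb(bitrate_kbps: float, duration_minutes: float) -> float:
    """Approximate size of one recording in gigabytes (10^9 bytes)."""
    total_bits = bitrate_kbps * 1000 * duration_minutes * 60
    return total_bits / 8 / 1e9


def stored_gb(recordings_per_month: int, size_gb: float,
              retention_months: int) -> float:
    """Storage held once the retention window is full (steady state)."""
    return recordings_per_month * size_gb * retention_months


# Assumed average bitrates for a one-hour recording (placeholders only).
hd_hour_gb = recording_size_gb(2500, 60)   # 1.125 GB
sd_hour_gb = recording_size_gb(800, 60)    # 0.36 GB

# 1,000 one-hour recordings per month, kept for 12 months.
hd_total_gb = stored_gb(1000, hd_hour_gb, 12)
sd_total_gb = stored_gb(1000, sd_hour_gb, 12)
```

Under these assumptions, dropping the bitrate or shortening the retention period reduces the stored volume proportionally.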

Security of recordings

Most recordings, such as a corporate meeting, inherently have private content. You want to make sure it isn’t accidentally shared with anyone publicly. So, where are you going to put those recordings? And what type of security are you going to put around them?

Cannot use E2EE

If you are doing true end-to-end encryption (E2EE) of video and audio between each participant, you probably cannot record it. This is because the recording is typically done on a media server, which breaks the encryption chain since it sits in between the clients talking to each other. Any recording would have to be done on the client end devices and then uploaded somewhere, which is not efficient.

When discussing true E2EE in WebRTC, you will end up learning about insertable streams. This is a relatively new concept, and you can learn more in WebRTC Live #51: NV – The Next Version of the WebRTC Standard with Bernard Aboba. Keep in mind that insertable streams work by encrypting and decrypting your media on the client devices, outside of the WebRTC connection. An application’s media servers can still forward the encrypted media, but they cannot decode it, so they cannot produce a usable recording of an E2EE call. That’s why E2EE and recording don’t mix well.

Ways to Do the Recording in Your WebRTC Application

There are two options for doing the actual recording: through composite or individual streams. Let’s look at both in light of the considerations laid out above.

Option 1: Composite Stream

Example Recording Scenario: Composite Stream

In this scenario, the recording is done on a WebRTC media server. The output will be a single media file with all of the different streams in that one file. 

The composite stream is nice and simple, with little work to do. All you need to worry about is where you are going to store the recording and the security around it. 

The first drawback (there is always a drawback, isn’t there?) is that you don’t have much control over the layout. The composite recording may not look the same as what you had in the meeting tool itself. If the recording is only available as a grid, for example, there is going to be an issue if there is screen sharing. You need to figure out how to tell the media server that one stream should be shown prominently. Also, if the layout changes during the conversation, that may or may not translate to the recording.
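The difference between a plain grid and a layout that features a screen share can be illustrated with a small sketch. This is not the API of any particular media server; the function, the 3/4-width split for the featured stream, and the stream IDs are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height in pixels


def composite_layout(stream_ids: List[str], width: int, height: int,
                     featured: Optional[str] = None) -> Dict[str, Rect]:
    """Place each stream on a composite canvas.

    With no featured stream, tiles form an even grid. With a featured
    stream (for example a screen share), it takes the left 3/4 of the
    canvas and the other streams stack in a column on the right.
    """
    if not stream_ids:
        return {}
    if featured in stream_ids and len(stream_ids) > 1:
        others = [s for s in stream_ids if s != featured]
        main_w = width * 3 // 4
        tile_h = height // len(others)
        layout = {featured: (0, 0, main_w, height)}
        for i, sid in enumerate(others):
            layout[sid] = (main_w, i * tile_h, width - main_w, tile_h)
        return layout
    cols = math.ceil(math.sqrt(len(stream_ids)))
    rows = math.ceil(len(stream_ids) / cols)
    tile_w, tile_h = width // cols, height // rows
    return {sid: ((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
            for i, sid in enumerate(stream_ids)}


grid = composite_layout(["a", "b", "c", "d"], 1280, 720)
shared = composite_layout(["a", "b", "screen"], 1280, 720, featured="screen")
```

If your composite recorder only supports the grid case, the screen share ends up the same size as every camera tile, which is exactly the problem described above.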

The second drawback is that this work is usually done on the same media server that is handling all of the live stream processing. This added burden means that a single instance of that media server can handle fewer conversations if it is also recording all of them.

Option 2: Individual Streams

Alternate Recording Scenario: Individual Streams

The alternative is recording as individual streams. This means the media server writes a separate file for each participant’s stream. It may even write separate video and audio files for each participant.

Individual streams give you a lot of flexibility in your recording. You can change layouts throughout the length of the call. You can take the screen share and make sure it is big. You can do post-processing on the files (and might be required to depending on the output file type the media server creates). And more. 

Another benefit is that it is less work on the media server because it is not trying to combine the files in real time. It is just writing them. Your media server can now handle more conversations in parallel. However, you will likely need other media servers in your architecture that will process the files and then store them securely. 

The downside? It is going to take more work on your part to use or play back the recordings. How will you be playing those media files back to the user? If it is four different files, how do you make sure they are time-synched? What if they are different lengths because one participant was not on the call for the entire length of time? 
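One common way to approach the synchronization questions is to place every file on a shared timeline and pad the gaps. The sketch below assumes the media server records a start timestamp for each file against a common clock; the `Track` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Track:
    participant: str
    start_ms: int      # assumed: time the first frame was written, shared clock
    duration_ms: int   # length of the recorded file


def align_tracks(tracks: List[Track]) -> Dict[str, Dict[str, int]]:
    """Compute the padding each track needs to span the whole call.

    pad_before_ms covers a late join; pad_after_ms covers an early leave.
    """
    if not tracks:
        return {}
    timeline_start = min(t.start_ms for t in tracks)
    timeline_end = max(t.start_ms + t.duration_ms for t in tracks)
    plan = {}
    for t in tracks:
        plan[t.participant] = {
            "pad_before_ms": t.start_ms - timeline_start,
            "pad_after_ms": timeline_end - (t.start_ms + t.duration_ms),
        }
    return plan


# Bob joins 15 seconds late and leaves 15 seconds before the call ends.
plan = align_tracks([
    Track("alice", start_ms=1_000, duration_ms=60_000),
    Track("bob", start_ms=16_000, duration_ms=30_000),
])
```

A post-processing step could then fill the padded intervals with silence or a blank frame before combining or playing back the files, so that all tracks have the same length and stay in step.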

The flexibility to handle each stream differently might be beneficial, or even essential, for your use case. In that case, the extra work may be unavoidable. Also note that this post-processing work will mean a small delay before the recording is available.

Have I made my case for making recording decisions in advance?

As you can see, this is an incredibly important factor to consider up front. 

First, whether to record at all may change your architectural decision: whether to go open source, and even which CPaaS to choose based on the flexibility it offers in recording layouts.

Second, how frequently you want to record, the purpose of that recording, and what you want to do with it afterward will help determine whether you should go with composite or individual streams.

Of course, this doesn’t mean that you can’t add recording after the fact! It may be easy or hard to do based on your structure, but we’re here to help. Whether you would like us to assess the ease of building recording into your current app or build a new one, contact us today.
