Real-time video communication applications face unique scalability challenges that can make or break the user experience. When thousands of users simultaneously join virtual classrooms, video conferences, or other streaming video experiences, traditional autoscaling approaches often fall short.

The key to managing predictable traffic spikes in WebRTC applications lies in proactive scaling strategies that anticipate demand before it hits your infrastructure. Scheduled scaling ensures your infrastructure is ready ahead of demand, avoiding the latency issues that can disrupt real-time communication.

Our team at WebRTC.ventures has developed proven strategies to support thousands of concurrent users in virtual classrooms by giving your cloud infrastructure a strategic head start on scalability. As an AWS Partner Network member with deep AWS expertise, we use a combination of AWS services to handle predictable load patterns in real-time video applications. I’ll use our implementation for Mathnasium’s online learning platform to demonstrate how these strategies work in practice.

An EdTech Platform with Thousands of Users Around the Globe

Mathnasium is a franchise of after-school supplemental math programs with over 1,000 locations worldwide. Post-pandemic, they quickly realized that e-learning was here to stay. We were brought in to build a customized virtual solution tailored to their unique instruction model.

From the start, they made clear that their load spikes would be predictable, given their scheduled lessons around the world. Students log into the online education system at specific times of day, with classes starting every hour for 12 hours straight. This means the application must absorb a new spike every 60 minutes.

With this in mind, we designed the system to autoscale as workloads increase, and we validated the design with load testing. The challenge with load testing applications that involve live video calls is that you need to simulate real users. We achieved this using Loadero. You can read more about how we executed load testing with simulated users in virtual classrooms in this post: The Transformative Impact of Automated Load Testing: A WebRTC EdTech Case Study.

AWS ECS and Aurora Serverless Architecture for Real-Time Video Applications

Our design distributed the workload among several Amazon ECS tasks that run both the frontend and backend servers of our application. We chose Amazon Aurora for our database to take advantage of its built-in autoscaling and high availability in locations outside the US. We also wanted to provide the client with an optimal cost structure by reducing hardware costs when the system was idle.

With our design in place and ECS task autoscaling based on CPU and memory usage, the application was able to support twice as many users as our client reported during peak hours. This had been a requirement since the beginning of the project, and we were confident that we had all bases covered.
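
For reference, a reactive target-tracking policy of this kind can be defined through Application Auto Scaling. The sketch below is illustrative rather than our client’s actual configuration; the cluster and service names are the same placeholders used in the samples later in this post, and the 70% CPU target is an assumed value:

# Target-tracking policy: keep average CPU at ~70%. Application Auto
# Scaling adds or removes tasks between the registered min/max capacity.
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
    },
    "ScaleOutCooldown": 60,
    "ScaleInCooldown": 120
  }'

An analogous policy using the ECSServiceAverageMemoryUtilization metric covers the memory dimension.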

Beta Testing Results: Identifying Scalability Bottlenecks

Our client had learned many lessons from their necessarily hasty move to online instruction when the pandemic hit. This time, they wanted the smoothest rollout possible, both to ensure the robustness of the application and to facilitate tool adoption by instructors and students.

Beta tests were planned with a specific set of users that represented the real-life workload. During these tests, we discovered that at certain login rates, some students were left stuck in the lobby, and students who managed to log in successfully noticed glitches in their audio and video. The cause was the time it took to provision new servers and computational resources, which could not keep pace with the rate at which users were logging into the platform.

Implementing AWS Scheduled Autoscaling for Predictable Traffic Patterns

While we had built and tested the system to withstand 2x the expected load using ECS and Aurora autoscaling, we also understood that we needed to pre-provision more tasks in advance to prepare the system for the upcoming waves of users. This is because autoscaling services need some cold-start time to ramp up additional servers.

First, we fine-tuned the number of ECS tasks and Aurora capacity units by reconfiguring our load tests to reproduce the ramp-up rate we had observed in production during the beta tests. We then contrasted the results of our baseline load tests with load tests run at several resource magnitudes (i.e., with 4, 6, and 8 backend servers).

Once we found the sweet spot for how many ECS tasks and Aurora ACUs we needed in anticipation of the workload waves, we raised the minimum resources of our cloud services one hour before the first classroom of the day. This scheduled change was triggered by a simple Amazon CloudWatch operation. To optimize costs, another CloudWatch trigger then set the minimums back to the original configuration, so AWS would start to drain connections and dispose of unused resources. Because the client uses the system during off-peak hours for administrative tasks, we don’t turn off all cloud services, but rather scale them down. The sample code below shows both sides of this schedule.

With this scheduled scaling, we prevented users from experiencing issues and glitches while logging into their classrooms, and our client could guarantee to their customers that the product delivered the expected quality, even during peak hours.

Sample Code

While we cannot disclose our client’s code, below are some examples of how you could schedule scaling in your own system using AWS.

If you want to manually schedule an ECS service to scale to a certain number of tasks at a specific time, you need to increase the minimum capacity of that service at the desired time.

To scale down, you set that minimum capacity back to its original value. In this sample script, we increase the minimum capacity from 2 to 4 tasks at 4 PM EST (21:00 UTC), and then back down to 2 tasks at midnight EST. This ensures that AWS provisions at least 4 tasks (in our case, servers) at the time they are needed:

# 1. Register the scalable target (if not already registered)
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 2 \
  --max-capacity 4

# 2. Schedule to scale UP to 4 tasks at 4 PM EST (21:00 UTC)
aws application-autoscaling put-scheduled-action \
  --service-namespace ecs \
  --scheduled-action-name scale-up-to-4 \
  --resource-id service/my-cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --schedule "cron(0 21 * * ? *)" \
  --scalable-target-action MinCapacity=4,MaxCapacity=4

# 3. Schedule to scale DOWN to 2 tasks at 12 AM EST (5:00 UTC)
aws application-autoscaling put-scheduled-action \
  --service-namespace ecs \
  --scheduled-action-name scale-down-to-2 \
  --resource-id service/my-cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --schedule "cron(0 5 * * ? *)" \
  --scalable-target-action MinCapacity=2,MaxCapacity=2
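
You can verify that both scheduled actions were registered with:

aws application-autoscaling describe-scheduled-actions \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service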

Increasing Resources on Aurora in a Scheduled Fashion

If you are using Aurora Serverless v2, you can increase (or decrease) the cluster’s Aurora Capacity Units (ACUs). Each ACU is a combination of CPU and memory, and the minimum and maximum ACU settings define the range within which your cluster can autoscale. Adjusting these settings on a schedule works alongside the autoscaling policies that Aurora already provides. Remember, our goal was to remove the cold-start time when students and teachers were logging into the platform.
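
Under the hood, the scheduled automation below comes down to a single ModifyDBCluster API call. If you want to try the change manually first, here is a minimal AWS CLI sketch (the cluster name and ACU values are placeholders):

# One-off change: raise the ACU range ahead of a known spike
aws rds modify-db-cluster \
  --db-cluster-identifier my-aurora-cluster \
  --serverless-v2-scaling-configuration MinCapacity=4,MaxCapacity=8 \
  --apply-immediately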

This sample was generated using ChatGPT and shows how you can use scheduled EventBridge (CloudWatch Events) rules to trigger modifications to your Aurora cluster using Terraform:

Prerequisites

  1. You’ll need an IAM role with sufficient permissions to run the following automations (a minimal sketch is included below).
  2. We’ll create an AWS System Manager (SSM) document that contains the automation that we want to run.
  3. Then we’ll generate the Terraform script that will run the SSM document with the appropriate permissions granted by the IAM role.
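
IAM Role

The Terraform below references an aws_iam_role.eventbridge_ssm_role resource. Here is a minimal sketch of such a role; in a real deployment you would scope the Resource entries down to your specific cluster and document ARNs:

resource "aws_iam_role" "eventbridge_ssm_role" {
  name = "eventbridge-ssm-automation-role"

  # Both EventBridge (to start the automation) and SSM (to run it)
  # must be able to assume this role
  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect    = "Allow",
      Principal = { Service = ["events.amazonaws.com", "ssm.amazonaws.com"] },
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy" "scale_aurora_permissions" {
  name = "scale-aurora-permissions"
  role = aws_iam_role.eventbridge_ssm_role.id

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect   = "Allow",
        Action   = ["rds:ModifyDBCluster", "rds:DescribeDBClusters"],
        Resource = "*"
      },
      {
        Effect   = "Allow",
        Action   = "ssm:StartAutomationExecution",
        Resource = "*"
      },
      {
        # EventBridge passes this same role to the automation
        Effect   = "Allow",
        Action   = "iam:PassRole",
        Resource = "*"
      }
    ]
  })
}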

SSM Document

resource "aws_ssm_document" "scale_aurora" {
  name          = "ScaleAuroraServerlessV2"
  document_type = "Automation"

  content = jsonencode({
    schemaVersion = "0.3",
    description   = "Scale Aurora Serverless v2 min/max capacity",
    assumeRole    = "{{ AutomationAssumeRole }}",
    parameters    = {
      DBClusterIdentifier = {
        type        = "String",
        description = "The Aurora DB cluster ID"
      },
      MinCapacity = {
        type        = "String",
        description = "Minimum ACU"
      },
      MaxCapacity = {
        type        = "String",
        description = "Maximum ACU"
      },
      # Declared so the assumeRole reference above resolves
      AutomationAssumeRole = {
        type        = "String",
        description = "IAM role the automation assumes to call RDS"
      }
    },
    mainSteps = [
      {
        name = "scaleCluster",
        action = "aws:executeAwsApi",
        inputs = {
          Service = "rds",
          Api     = "ModifyDBCluster",
          DBClusterIdentifier = "{{ DBClusterIdentifier }}",
          ServerlessV2ScalingConfiguration = {
            MinCapacity = "{{ MinCapacity }}",
            MaxCapacity = "{{ MaxCapacity }}",
          },
          ApplyImmediately = true
        }
      }
    ]
  })
}

Using EventBridge to Scale Up Aurora

Using Amazon EventBridge, you can trigger the automation described in an SSM document on a schedule, as follows.

resource "aws_cloudwatch_event_rule" "scale_up" {
  name                = "scale-up-aurora-v2"
  schedule_expression = "cron(0 21 * * ? *)"
}

resource "aws_cloudwatch_event_target" "scale_up_target" {
  rule      = aws_cloudwatch_event_rule.scale_up.name
  target_id = "ScaleUpAuroraTarget"
  arn       = aws_ssm_document.scale_aurora.arn
  role_arn  = aws_iam_role.eventbridge_ssm_role.arn

  input = jsonencode({
    DocumentName         = aws_ssm_document.scale_aurora.name,
    Parameters           = {
      DBClusterIdentifier = "my-aurora-cluster",
      MinCapacity         = "4",
      MaxCapacity         = "8"
    },
    AutomationAssumeRole = aws_iam_role.eventbridge_ssm_role.arn
  })
}

To scale back down, you create another EventBridge rule and target, with MinCapacity restored to its original value and the cron expression set to the off-peak time. A sketch mirroring the resources above (the off-peak ACU values are illustrative):
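
resource "aws_cloudwatch_event_rule" "scale_down" {
  name                = "scale-down-aurora-v2"
  schedule_expression = "cron(0 5 * * ? *)" # midnight EST
}

resource "aws_cloudwatch_event_target" "scale_down_target" {
  rule      = aws_cloudwatch_event_rule.scale_down.name
  target_id = "ScaleDownAuroraTarget"
  arn       = replace(aws_ssm_document.scale_aurora.arn, "document/", "automation-definition/")
  role_arn  = aws_iam_role.eventbridge_ssm_role.arn

  input = jsonencode({
    DBClusterIdentifier  = ["my-aurora-cluster"],
    MinCapacity          = ["0.5"], # hypothetical off-peak baseline
    MaxCapacity          = ["8"],
    AutomationAssumeRole = [aws_iam_role.eventbridge_ssm_role.arn]
  })
}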

Next Steps: Implementing AWS Scheduled Autoscaling for Your WebRTC Application

Real-time video applications require consistent performance during peak usage periods. The difference between a successful video platform and one that users abandon often comes down to maintaining seamless performance when demand spikes. AWS scheduled autoscaling provides a proactive approach to handle predictable traffic patterns without degrading user experience during critical moments.

Key takeaways for implementing scheduled autoscaling:

  • Analyze your traffic patterns to identify predictable peaks
  • Implement load testing that simulates real user behavior
  • Configure scheduled scaling 15-30 minutes before expected traffic spikes
  • Monitor and adjust scaling parameters based on actual usage data
  • Combine scheduled scaling with reactive policies for comprehensive coverage

The combination of scheduled and reactive autoscaling strategies ensures your WebRTC infrastructure can handle both predictable and unexpected load patterns. By proactively scaling resources before demand hits, you eliminate the cold-start delays that can disrupt real-time communication experiences.

When implementing these strategies, remember that the specific timing and resource allocation will depend on your application’s unique usage patterns. Start with conservative estimates and refine based on monitoring data to find the optimal balance between performance and cost efficiency.

If you’re ready to build a production-ready, reliable, and scalable WebRTC system that handles traffic spikes seamlessly, contact our team of AWS-certified experts at WebRTC.ventures.

WebRTC.ventures is a member of the AWS Partner Network.
