Peermetrics is WebRTC.ventures’ open-source WebRTC monitoring stack. Earlier this year, a client pushed it to a scale that stress-tested assumptions we could not easily test in development: thousands of video conferences a day, totaling over a million events. That traffic surfaced things that only show up at volume.

The result: Peermetrics SDK v2.8, @peermetrics/webrtc-stats v5.9.0, and a round of API and dashboard changes focused on server-side aggregation, Redis caching, cache pre-warming, and drilldown correctness. We also took the opportunity to make the Jitsi and LiveKit SDK integrations more practical.

These are not flashy releases, and that is partly the point. The work is mostly about production hardening.

Production WebRTC debugging

Production WebRTC debugging isn’t one tool or one metric. Depending on which layer you’re investigating, the toolbox includes webrtc-internals, Wireshark, Playwright, or Homer — ideally alongside Peermetrics for session-level monitoring and historical context. It has to be built in early, not retrofitted after support tickets start arriving.

We covered this from two angles in recent WebRTC Live episodes: #111: Improving End-to-End Quality with WebRTC Observability and #112: How Experienced Teams Debug and Monitor WebRTC in Production. Both are worth a watch!

A WebRTC monitoring funnel from WebRTC Live #112. In production, “the call failed” is rarely one event; teams need visibility into each step from join click to ICE connection, track publishing, media reception, and sustained stability.

Hardening WebRTC Monitoring for Production Scale

Over the last few months, we fixed issues in Peermetrics that only appear when monitoring code is exercised repeatedly, across real SDKs, under real production load.

WebRTC Capture Lifecycle Bugs

One important fix was in @peermetrics/webrtc-stats. We fixed a getUserMedia wrapping recursion issue that had been the root cause of “Maximum call stack size exceeded” reports in production for customers running many sessions per day in the same tab. The old approach could create a wrapper chain that eventually referenced itself. The fix moved to a single module-level wrapper with a subscriber registry: the browser API is wrapped once per page, each WebRTCStats instance subscribes to it, and the native function is restored only when the last subscriber goes away.
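To make that concrete, here is a minimal sketch of the single-wrap pattern in TypeScript. It is not the library’s actual code; the subscriber type and function names are invented for illustration, but the lifecycle matches the description above: wrap once per page, register subscribers, and restore the native function only when the last subscriber unsubscribes.

```typescript
// Minimal sketch of the single-wrap pattern (not the actual library code).
type GumSubscriber = (event: { type: 'request' | 'stream' | 'error'; detail?: unknown }) => void

const subscribers = new Set<GumSubscriber>()
let nativeGetUserMedia: typeof navigator.mediaDevices.getUserMedia | null = null

function installWrapper() {
  // Wrap the browser API exactly once per page, no matter how many
  // monitoring instances are created.
  if (nativeGetUserMedia) return
  nativeGetUserMedia = navigator.mediaDevices.getUserMedia.bind(navigator.mediaDevices)

  navigator.mediaDevices.getUserMedia = async (constraints?: MediaStreamConstraints) => {
    subscribers.forEach((s) => s({ type: 'request', detail: constraints }))
    try {
      const stream = await nativeGetUserMedia!(constraints)
      subscribers.forEach((s) => s({ type: 'stream', detail: stream }))
      return stream
    } catch (err) {
      subscribers.forEach((s) => s({ type: 'error', detail: err }))
      throw err
    }
  }
}

export function subscribe(fn: GumSubscriber): () => void {
  installWrapper()
  subscribers.add(fn)
  return () => {
    subscribers.delete(fn)
    // Restore the native function only when the last subscriber goes away,
    // so a later install never wraps an already-wrapped function.
    if (subscribers.size === 0 && nativeGetUserMedia) {
      navigator.mediaDevices.getUserMedia = nativeGetUserMedia
      nativeGetUserMedia = null
    }
  }
}
```

Because the native function is captured exactly once and the wrapper is guarded by the registry, creating and destroying many WebRTCStats instances in the same tab no longer builds a chain of wrappers that can recurse into itself.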

That detail matters because monitoring code often lives close to browser APIs with unpredictable lifecycles. If capture instrumentation is installed and removed incorrectly, the monitoring layer becomes part of the failure path.

We also added wrapGetDisplayMedia support with the same single-wrap semantics as wrapGetUserMedia, including timeline events for request, stream, and error flows. For applications where screen sharing is operationally important, those events need to be visible alongside camera and microphone behavior.
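Enabling both wrappers from an application might look roughly like the sketch below. The option and event names are assumptions based on the features described above, so check the @peermetrics/webrtc-stats README for the exact constructor signature.

```typescript
import { WebRTCStats } from '@peermetrics/webrtc-stats'

// Hypothetical usage sketch: option and event names are assumptions based on
// the features described above; consult the library README for the exact API.
const stats = new WebRTCStats({
  getStatsInterval: 5000,
  wrapGetUserMedia: true,      // camera/microphone capture events
  wrapGetDisplayMedia: true,   // screen-share request/stream/error events
})

stats.on('timeline', (event) => {
  // Request, stream, and error events for both capture paths surface here,
  // so screen-share behavior is visible alongside camera and microphone activity.
  console.log(event)
})
```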

WebRTC Stats Interpretation Bugs

We also fixed several issues in the stats interpretation layer.

  • computeRate now handles counter regressions, such as ICE restarts or stats resets, by returning “null” instead of producing misleading values. 
  • PacketRate was corrected, too. It had been treated like bitrate-style data, but packet counters should be reported as packets per second, as sketched below.
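As a rough illustration of the intended semantics rather than the library’s exact code, a rate helper along these lines returns null when a counter goes backwards and reports packet counters in packets per second instead of bits per second:

```typescript
// Illustrative sketch of the corrected rate semantics, not the library's exact code.
interface Sample {
  timestamp: number // milliseconds
  bytes?: number    // cumulative byte counter from getStats()
  packets?: number  // cumulative packet counter from getStats()
}

// Bits per second between two samples; null on counter regression
// (e.g. after an ICE restart or a stats reset) or a non-positive interval.
function computeBitrate(prev: Sample, curr: Sample): number | null {
  if (prev.bytes === undefined || curr.bytes === undefined) return null
  const dt = (curr.timestamp - prev.timestamp) / 1000
  if (dt <= 0 || curr.bytes < prev.bytes) return null
  return ((curr.bytes - prev.bytes) * 8) / dt
}

// Packets per second: a plain count rate, not a bitrate-style value.
function computePacketRate(prev: Sample, curr: Sample): number | null {
  if (prev.packets === undefined || curr.packets === undefined) return null
  const dt = (curr.timestamp - prev.timestamp) / 1000
  if (dt <= 0 || curr.packets < prev.packets) return null
  return (curr.packets - prev.packets) / dt
}
```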

These are low-level fixes, but low-level interpretation bugs propagate upward. If rate calculations are wrong, every chart, alert, and debugging session built on top of them becomes suspect.

Jitsi and LiveKit Integration Fixes

Peermetrics SDK v2.8 work focused heavily on making SDK integrations more practical in real applications.

  • For Jitsi, we added Jitsi Meet SDK support with automatic WebRTC detection, improved event handling, clarified transport labeling, deduped peer connections, and supported multiple Jitsi peer connections with distinct connection IDs.
  • For LiveKit, we improved connection detection and naming, prioritized wrap-based RTCPeerConnection capture, enriched direction labels using SDK metadata, improved diagnostics, and tightened teardown behavior.

Server-Side Aggregation for WebRTC Dashboards

For a single conference, an engineer can often inspect participant-level detail and reason through what happened. At thousands of conferences and more than a million events per day, that workflow breaks down. The system has to summarize aggressively, but without flattening away the details engineers need when they drill into a bad session.

That was the real challenge in the recent Peermetrics API and dashboard work.

Peermetrics Monitoring Flow: data is captured in the browser, enriched through the SDK and webrtc-stats, aggregated through the API and cache layer, and surfaced in dashboard summaries that still drill down to conferences, sessions, connections, and issues.
  • We moved expensive dashboard aggregation out of the frontend and into dedicated API summary endpoints. The goal was not just faster charts, but to make the first query cheap enough for a high-level operational view while preserving a path back to the underlying conferences, sessions, connections, and issues.
  • We also added Redis caching and pre-warming for common dashboard views. That work exposed a subtle bug: the dashboard’s default “last 30 days” filter included millisecond-level timestamps, so repeated page loads produced different cache keys and missed Redis even when the user was effectively asking for the same dashboard. Normalizing timestamps before generating the cache key made the warm-cache path useful in real usage (up to 70x faster loads); a sketch of that normalization follows below.
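A minimal sketch of the cache-key normalization, with an assumed key format and hour-level truncation (the real granularity may differ), looks like this: truncating the rolling time window before hashing means two page loads made seconds apart produce the same key and hit the warm cache.

```typescript
import { createHash } from 'node:crypto'

// Sketch of cache-key normalization; the key format and truncation granularity
// are assumptions, not Peermetrics' actual implementation.
interface DashboardQuery {
  appId: string
  from: Date // e.g. "last 30 days", computed at request time
  to: Date
}

// Truncate timestamps to the hour so that repeated requests for "the last
// 30 days" map to the same key instead of differing at millisecond precision
// and always missing Redis.
function truncateToHour(d: Date): number {
  const HOUR = 60 * 60 * 1000
  return Math.floor(d.getTime() / HOUR) * HOUR
}

function dashboardCacheKey(q: DashboardQuery): string {
  const normalized = `${q.appId}:${truncateToHour(q.from)}:${truncateToHour(q.to)}`
  return 'dashboard:summary:' + createHash('sha256').update(normalized).digest('hex')
}
```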

The other fixes were semantic, not just performance-related. Filtering conferences by issue code could return the same conference multiple times when multiple issues shared the same code, which broke pagination and counts in drilldowns. The getUserMedia summary endpoint was also hardened so unexpected legacy JSON shapes would not crash the whole summary request.
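The shape of the drilldown problem is a join that yields one row per matching issue. A hedged sketch, with invented row and field names, shows why that breaks counts and pagination:

```typescript
// Invented row shape for illustration: one row per (conference, issue) pair,
// as returned by a join that filters on issue code.
interface ConferenceIssueRow {
  conferenceId: string
  issueCode: string
  startedAt: string
}

// If a conference logged the same issue code several times, the join returns
// it several times, which inflates counts and breaks pagination. Deduplicate
// by conferenceId before counting or slicing into pages.
function dedupeConferences(rows: ConferenceIssueRow[]): ConferenceIssueRow[] {
  const seen = new Set<string>()
  return rows.filter((row) => {
    if (seen.has(row.conferenceId)) return false
    seen.add(row.conferenceId)
    return true
  })
}
```

In practice the deduplication belongs in the query itself (for example, a distinct selection on the conference id), but the sketch shows why counts and page boundaries drift when it is missing.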

The lesson was not simply “the backend needs to scale.” That is obvious. The useful lesson was that observability data has two competing access patterns: broad aggregation for situational awareness, and precise drilldown for root-cause analysis. Peermetrics had to improve both paths.

A Quick Peermetrics Walkthrough

For a practical look at Peermetrics in action, this walkthrough shows Peermetrics monitoring Amazon IVS Real-Time:

Peermetrics is improving because it is being exercised by real production workloads

I want to thank the clients who have been willing to use Peermetrics in serious environments and work through these improvements with us. Real monitoring products are not perfected in isolation. They get better when production usage exposes edge cases, and when clients stay engaged long enough for those edge cases to turn into fixes.

The recent SDK, stats, API, and dashboard work came from the same place: usage exposed edge cases, and those edge cases forced better engineering. We fixed capture lifecycle bugs, corrected stats math, added Jitsi support, hardened the LiveKit integration, moved aggregation server-side, added caching and pre-warming, and tightened the path from summary views back to the underlying data.

This is also the kind of work WebRTC.ventures does with client teams: integrating, customizing, scaling, and maintaining real-time communication applications in production. Contact us if your team needs help with a WebRTC application that has to work reliably beyond the demo.

Need help with a non-realtime app? Check out our parent company, AgilityFeat!
