Zoom Developer Summit 2025: RTMS, Vision-Based RAG, Secure CX & Next-Gen Dev Tools

Since Zoom adopted WebRTC, we’ve been closely monitoring their developer platform evolution. Zoom’s WebRTC-powered Video SDK is a powerful addition to the CPaaS landscape, offering rapid integration, robust performance, and a wide array of features for custom video solutions. At their 2025 Developer Summit, Zoom unveiled significant platform upgrades focused on real-time AI, multimodal streaming, and enhanced customer experience capabilities.

Here are a few highlights from the sessions I was able to attend:

RTMS: Real-Time Media Streams Are Here

Zoom launched RTMS (Real-Time Media Streams), a developer-friendly way to get live, structured audio, video, screen share, chat, and transcript data via WebSocket. No bots, no virtual cameras.

Built for:

  • AI assistants
  • Live transcription
  • Real-time coaching
  • Meeting analytics
  • Privacy-compliant recording
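
To make the "structured data over WebSocket" idea concrete, here is a minimal sketch of how a consumer might route incoming messages by media type. The JSON envelope (`type`, `participant`, `payload`) and the `StreamRouter` class are assumptions for illustration only; they are not Zoom's actual RTMS wire format or SDK.

```python
import json
from collections import defaultdict
from typing import Callable

# Hypothetical envelope: {"type": "...", "participant": "...", "payload": ...}
Handler = Callable[[dict], None]


class StreamRouter:
    """Routes decoded messages to the handlers registered for their media type."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)
        self.unhandled = 0  # messages that arrived with no registered handler

    def on(self, media_type: str, handler: Handler) -> None:
        self._handlers[media_type].append(handler)

    def dispatch(self, raw: str) -> None:
        message = json.loads(raw)
        handlers = self._handlers.get(message.get("type"), [])
        if not handlers:
            self.unhandled += 1
            return
        for handler in handlers:
            handler(message)


transcript_lines: list[str] = []
router = StreamRouter()
router.on(
    "transcript",
    lambda m: transcript_lines.append(f'{m["participant"]}: {m["payload"]}'),
)

# In a real app these strings would come from the WebSocket connection.
router.dispatch('{"type": "transcript", "participant": "alice", "payload": "Hello"}')
router.dispatch('{"type": "chat", "participant": "bob", "payload": "hi"}')
```

Keeping the routing separate from the socket handling means each use case in the list above can register only the media types it needs.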

A standout demo featured AssemblyAI leveraging Zoom RTMS to power a live sales coaching assistant, combining streaming transcription with LLM reasoning (Claude). This enables real-time feedback and insights during customer interactions, without intrusive bots cluttering the participant list.
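
One plausible way to pair streaming transcription with an LLM is to keep a rolling window of recent utterances and turn it into a prompt. The sketch below assumes that design; the `Utterance` and `RollingTranscript` names, the 60-second window, and the prompt wording are all hypothetical, and no LLM is actually called. It is not how the AssemblyAI demo was implemented.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Utterance:
    timestamp: float  # seconds since the call started
    speaker: str
    text: str


class RollingTranscript:
    """Keeps only the utterances from the last `window_seconds` of the call."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        self.window_seconds = window_seconds
        self._items: deque[Utterance] = deque()

    def add(self, utterance: Utterance) -> None:
        self._items.append(utterance)
        # Drop anything older than the window, measured from the newest utterance.
        cutoff = utterance.timestamp - self.window_seconds
        while self._items and self._items[0].timestamp < cutoff:
            self._items.popleft()

    def as_prompt(self) -> str:
        lines = [f"{u.speaker}: {u.text}" for u in self._items]
        return (
            "Recent conversation:\n"
            + "\n".join(lines)
            + "\n\nSuggest one coaching tip for the sales rep."
        )


window = RollingTranscript(window_seconds=60)
window.add(Utterance(0.0, "rep", "Thanks for joining."))
window.add(Utterance(30.0, "customer", "Pricing is my main concern."))
window.add(Utterance(75.0, "rep", "Let me walk through the tiers."))
prompt = window.as_prompt()  # would be sent to the LLM of your choice
```

Bounding the window keeps prompt size, and therefore latency and cost, roughly constant however long the call runs.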

RTMS positions Zoom as a direct competitor to CPaaS solutions like Twilio’s media relay, but with a privacy-first, frictionless integration model.

Vision-Based RAG: Understanding What’s Shown in Meetings

Yahia Salman showcased “ZoneOut,” a real-time assistant that answers questions based on both what’s said and what’s shown in meetings using Vision-Language Models (VLMs) + RTMS.

How it works:

  • Samples video frames every ~20–30s
  • Indexes visuals with CoLa/Poly embeddings
  • Captures and embeds audio transcripts
  • Passes the retrieved results to an LLM (OpenAI or Claude) to answer context-aware queries

For example, if you missed 5 minutes of class, ZoneOut tells you what the teacher said and wrote on screen.
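
The steps above can be sketched roughly as follows. This is my own simplified illustration, not ZoneOut's code: the `Entry` structure, the 25-second default interval, and the toy two-dimensional vectors are assumptions, and real embeddings would come from a vision-language or text embedding model.

```python
import math
from dataclasses import dataclass


@dataclass
class Entry:
    timestamp: float     # seconds into the meeting
    kind: str            # "frame" or "transcript"
    content: str         # caption or transcript text (stand-in for raw data)
    vector: list[float]  # embedding produced by a model of your choice


def sample_timestamps(frame_times: list[float], interval: float = 25.0) -> list[float]:
    """Keep the first frame, then each frame at least `interval` s after the last kept one."""
    kept: list[float] = []
    for t in frame_times:
        if not kept or t - kept[-1] >= interval:
            kept.append(t)
    return kept


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(index: list[Entry], query_vector: list[float],
             start: float, end: float, k: int = 3) -> list[Entry]:
    """Top-k entries inside [start, end], ranked by cosine similarity to the query."""
    candidates = [e for e in index if start <= e.timestamp <= end]
    ranked = sorted(candidates, key=lambda e: cosine(e.vector, query_vector), reverse=True)
    return ranked[:k]


index = [
    Entry(300.0, "frame", "whiteboard: derivative of x^2 is 2x", [0.9, 0.1]),
    Entry(310.0, "transcript", "So the slope doubles as x grows.", [0.8, 0.3]),
    Entry(900.0, "frame", "slide: homework due Friday", [0.1, 0.9]),
]

# "What did I miss between minute 5 and minute 10?"
hits = retrieve(index, query_vector=[1.0, 0.0], start=280.0, end=600.0, k=2)
context = "\n".join(f"[{e.kind} @ {e.timestamp:.0f}s] {e.content}" for e in hits)
kept = sample_timestamps([0.0, 10.0, 20.0, 25.0, 40.0, 50.0, 75.0])
```

The `context` string is what would be handed to the LLM alongside the user's question. Storing frames and transcripts in one timestamped index is what lets a single query pull both what was said and what was shown.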

This vision-based RAG approach unlocks new value in edtech, corporate training, and any workflow where visual context is as critical as spoken dialogue.

Yahia will be a guest on the July 16 episode of WebRTC Live: Why Vision Language Models Deserve a Closer Look. Join us!

Secure CX: Voice Authentication in Zoom Contact Center

Zoom’s Developer Advocate Engineer, Rehema Armorer, demoed a smart CRM app using voice-based identity verification via the Zoom Contact Center SDK. Instead of answering security questions, users speak a passphrase. Their voice is compared in real time, and agents get conditional access to sensitive info.

Key tools used:

  • Smart Embed v3 for softphone UI
  • Zoom Flows + Events to trigger audio capture
  • Future plans: leverage RTMS for higher-fidelity voice matching

This approach not only enhances customer experience but also lays the groundwork for secure, AI-driven workflows across industries.
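
As a rough illustration of the "compare the voice, then grant conditional access" flow, the sketch below compares a spoken sample against an enrolled voiceprint and masks sensitive CRM fields until the match succeeds. Everything here is an assumption on my part: the embedding vectors, the 0.8 threshold, the field names, and the helper functions are hypothetical and are not part of the Zoom Contact Center SDK.

```python
import math

# Hypothetical CRM fields an agent should only see after verification.
SENSITIVE_FIELDS = {"account_number", "billing_address"}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_voice_match(enrolled: list[float], sample: list[float],
                   threshold: float = 0.8) -> bool:
    """True when the sample is close enough to the enrolled voiceprint."""
    return cosine_similarity(enrolled, sample) >= threshold


def agent_view(record: dict[str, str], verified: bool) -> dict[str, str]:
    """Mask sensitive fields unless the caller has been verified."""
    return {
        key: (value if verified or key not in SENSITIVE_FIELDS else "***")
        for key, value in record.items()
    }


enrolled = [0.2, 0.9, 0.4]         # stored when the customer enrolled
caller = [0.25, 0.85, 0.45]        # embedding of the spoken passphrase
impostor = [0.9, 0.1, -0.3]

record = {"name": "Dana", "account_number": "12345678", "billing_address": "1 Main St"}
verified_view = agent_view(record, is_voice_match(enrolled, caller))
unverified_view = agent_view(record, is_voice_match(enrolled, impostor))
```

A production system would rely on a dedicated speaker-verification model and a threshold tuned on real data, but the gating logic on the agent side can stay this simple.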

Dev-First Tools: Rivet, Connect, Workflows

Zoom is doubling down on developer enablement with a suite of new tools:

  • Zoom Rivet: A comprehensive toolkit for integrating server-side apps, handling authentication, webhooks, and API calls with minimal boilerplate. Rivet abstracts away infrastructure, letting developers focus on business logic and rapid prototyping.
  • Zoom Connect: No-code/low-code automation platform for building workflow automations—think triggers, actions, and integrations across Zoom Workplace and third-party tools.
  • BYOI & GenAI: Bring Your Own Index lets enterprises power AI assistants with proprietary knowledge bases, enabling custom copilots and domain-specific automation.
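
To give a sense of the boilerplate a toolkit like Rivet is meant to absorb, here is a hand-rolled sketch of webhook signature verification using only Python's standard library. The `v0:{timestamp}:{body}` message layout and the `v0=` prefix follow my reading of Zoom's webhook documentation, so treat them as assumptions and confirm against the current docs. The function names are my own, and this is not Rivet's API.

```python
import hashlib
import hmac


def expected_signature(secret_token: str, timestamp: str, raw_body: str) -> str:
    """Compute the signature we expect for a given timestamp and raw request body."""
    message = f"v0:{timestamp}:{raw_body}"
    digest = hmac.new(secret_token.encode(), message.encode(), hashlib.sha256).hexdigest()
    return f"v0={digest}"


def is_authentic(secret_token: str, timestamp: str, raw_body: str,
                 received_signature: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    expected = expected_signature(secret_token, timestamp, raw_body)
    return hmac.compare_digest(expected.encode(), received_signature.encode())


secret = "example-secret-token"  # placeholder, not a real credential
timestamp = "1700000000"
body = '{"event":"meeting.started"}'

signature = expected_signature(secret, timestamp, body)
ok = is_authentic(secret, timestamp, body, signature)
tampered = is_authentic(secret, timestamp, '{"event":"meeting.ended"}', signature)
```

Verifying against the raw request body, rather than a re-serialized copy, avoids mismatches caused by whitespace or key-order differences.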

Workflow Automation now empowers non-technical users to automate routine tasks, like onboarding, approvals, and notifications, directly within Zoom Workplace, using drag-and-drop builders and prebuilt templates.

Final Take

Zoom is no longer just a meeting platform; it’s now a full-fledged CPaaS, expanding rapidly into the real-time AI space. Whether you’re building intelligent assistants, secure customer flows, or advanced analytics, the new Zoom developer platform offers the primitives and tools to create differentiated, high-value experiences.

Zoom now gives you:

  • Per-participant structured media (RTMS)
  • Embeddable UIs (Smart Embed v3)
  • Vision + LLM integration paths
  • API-first tools for automation

At WebRTC.ventures, we specialize in building and integrating with platforms like Zoom. If you’re looking to leverage these next-gen capabilities for your project, our team can help you architect, build, and deploy solutions that take full advantage of Zoom’s evolving ecosystem.
