Since Zoom adopted WebRTC, we’ve been closely monitoring the evolution of its developer platform. Zoom’s WebRTC-powered Video SDK is a powerful addition to the CPaaS landscape, offering rapid integration, robust performance, and a wide array of features for custom video solutions. At its 2025 Developer Summit, Zoom unveiled significant platform upgrades focused on real-time AI, multimodal streaming, and enhanced customer experience capabilities.
Here are a few highlights from the sessions I was able to attend:
RTMS: Real-Time Media Streams Are Here
Zoom launched RTMS (Real-Time Media Streams), a developer-friendly way to get live, structured audio, video, screen share, chat, and transcript data via WebSocket. No bots, no virtual cameras.
Built for:
- AI assistants
- Live transcription
- Real-time coaching
- Meeting analytics
- Privacy-compliant recording
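To make the WebSocket model more concrete, here is a minimal sketch of the consumer side: parse each incoming JSON message and route it into a per-participant, per-media-type buffer. The message shape (`type`, `user_id`, `timestamp`, `data`) and the `MediaRouter` class are hypothetical simplifications for illustration, not Zoom’s actual RTMS wire format, which also involves its own signaling handshake; check the RTMS documentation for the real schema.

```python
import json
from collections import defaultdict

# Hypothetical message types; the real RTMS protocol defines its own schema.
SUPPORTED_TYPES = {"audio", "video", "screen_share", "chat", "transcript"}


class MediaRouter:
    """Routes structured media messages into per-participant buffers."""

    def __init__(self) -> None:
        # buffers[user_id][media_type] -> list of (timestamp, data) tuples
        self.buffers = defaultdict(lambda: defaultdict(list))
        self.ignored = 0

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        media_type = msg.get("type")
        if media_type not in SUPPORTED_TYPES:
            # Keep-alives and unknown message types are counted and skipped.
            self.ignored += 1
            return
        self.buffers[msg["user_id"]][media_type].append((msg["timestamp"], msg["data"]))

    def transcript_for(self, user_id: str) -> str:
        """Join a participant's transcript segments in timestamp order."""
        segments = sorted(self.buffers[user_id]["transcript"])
        return " ".join(text for _, text in segments)


if __name__ == "__main__":
    router = MediaRouter()
    incoming = [
        '{"type": "transcript", "user_id": "alice", "timestamp": 2, "data": "pricing?"}',
        '{"type": "transcript", "user_id": "alice", "timestamp": 1, "data": "What about"}',
        '{"type": "chat", "user_id": "bob", "timestamp": 1, "data": "link sent"}',
        '{"type": "keep_alive"}',
    ]
    for raw in incoming:
        router.handle(raw)
    print(router.transcript_for("alice"))  # What about pricing?
    print(router.ignored)  # 1
```

Because every message already carries a participant ID, per-speaker analytics fall out of simple bookkeeping rather than speaker diarization.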
A standout demo featured AssemblyAI leveraging Zoom RTMS to power a live sales coaching assistant, combining streaming transcription with LLM reasoning (Claude). This enables real-time feedback and insights during customer interactions, without intrusive bots cluttering the participant list.
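The demo’s coaching loop pairs a streaming transcript with periodic LLM calls. A minimal sketch of the glue between the two, under our own assumptions: keep a rolling window of recent final transcript segments and, at most every few seconds, build a coaching prompt from it. The `CoachingWindow` class, the 60-second window, the 15-second interval, and the prompt wording are all hypothetical, and the actual AssemblyAI and Claude calls are deliberately left out.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    speaker: str
    text: str
    end_time: float  # seconds since the meeting started


class CoachingWindow:
    """Keeps recent transcript segments and decides when to ask an LLM for feedback."""

    def __init__(self, window_s: float = 60.0, interval_s: float = 15.0) -> None:
        self.window_s = window_s
        self.interval_s = interval_s
        self.segments: deque = deque()
        self.last_prompt_at = float("-inf")

    def add(self, seg: Segment) -> Optional[str]:
        """Add a final transcript segment; return a prompt when one is due, else None."""
        self.segments.append(seg)
        # Drop segments that ended before the start of the rolling window.
        while self.segments and self.segments[0].end_time < seg.end_time - self.window_s:
            self.segments.popleft()
        # Rate-limit prompts so the LLM is not called on every utterance.
        if seg.end_time - self.last_prompt_at < self.interval_s:
            return None
        self.last_prompt_at = seg.end_time
        lines = "\n".join(f"{s.speaker}: {s.text}" for s in self.segments)
        return (
            "You are a sales coach. Based on this recent conversation, "
            "give the rep one short, actionable tip.\n\n" + lines
        )
```

The returned string would then be sent to whichever LLM you use; rate-limiting on transcript time rather than wall-clock time keeps the sketch deterministic and easy to test.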
RTMS positions Zoom as a direct competitor to CPaaS solutions like Twilio’s media relay, but with a privacy-first, frictionless integration model.
Vision-Based RAG: Understanding What’s Shown in Meetings
Yahia Salman showcased “ZoneOut,” a real-time assistant that uses Vision-Language Models (VLMs) and RTMS to answer questions based on both what’s said and what’s shown in meetings.
How it works:
- Samples video frames every ~20–30s
- Indexes visuals with ColPali embeddings
- Captures and embeds audio transcripts
- Combines the retrieved results with an OpenAI (or Claude) model to answer context-aware queries. For example, if you missed five minutes of class, ZoneOut can tell you what the teacher said and what they wrote on screen.
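The pipeline above can be sketched as a time-stamped index that holds both sampled frames and transcript chunks, queried by similarity within a time range (“what did I miss between minute 10 and minute 15?”). This is an illustration under stated assumptions, not ZoneOut’s code: the toy bag-of-words `embed` function is a stand-in for a real VLM embedding such as ColPali, and each frame is represented here by its OCR or caption text. `sample_times`, `MeetingIndex`, and the 25-second default are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def sample_times(duration_s: float, every_s: float = 25.0) -> list:
    """Timestamps at which to grab a video frame (roughly every 20-30 s)."""
    n = int(duration_s // every_s) + 1
    return [i * every_s for i in range(n)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real VLM or text embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Chunk:
    t: float         # seconds into the meeting
    modality: str    # "frame" or "transcript"
    content: str     # OCR/caption text for frames, spoken text for transcripts
    vector: Counter


class MeetingIndex:
    def __init__(self) -> None:
        self.chunks: list = []

    def add(self, t: float, modality: str, content: str) -> None:
        self.chunks.append(Chunk(t, modality, content, embed(content)))

    def search(self, query: str, start: float = 0.0, end: float = math.inf, k: int = 3) -> list:
        """Top-k chunks within [start, end], ranked by similarity to the query."""
        q = embed(query)
        in_range = [c for c in self.chunks if start <= c.t <= end]
        return sorted(in_range, key=lambda c: cosine(q, c.vector), reverse=True)[:k]
```

In a full system, the top-ranked chunks (with their timestamps and modality) would be placed into the LLM prompt so the answer can cite both what was said and what was on screen.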
This vision-based RAG approach unlocks new value in edtech, corporate training, and any workflow where visual context is as critical as spoken dialogue.
Yahia will be a guest on the July 16 episode of WebRTC Live: Why Vision Language Models Deserve a Closer Look. Join us!
Secure CX: Voice Authentication in Zoom Contact Center
Zoom Developer Advocate Engineer Rehema Armorer demoed a smart CRM app that uses voice-based identity verification via the Zoom Contact Center SDK. Instead of answering security questions, users speak a passphrase; their voice is verified in real time, and agents are granted conditional access to sensitive information.
Key tools used:
- Smart Embed v3 for softphone UI
- Zoom Flows + Events to trigger audio capture
- Future plans: leverage RTMS for higher-fidelity voice matching
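The session did not go into the matching algorithm itself, so the following is only a sketch of one common pattern, under our own assumptions: a speaker-embedding model (not shown) has already turned the enrolled passphrase and the live one into fixed-length vectors, and access to sensitive CRM fields is gated on their cosine similarity. `MATCH_THRESHOLD`, `SENSITIVE_FIELDS`, and `visible_fields` are hypothetical names, not part of the Contact Center SDK.

```python
import math
from typing import Sequence

# Hypothetical threshold; real systems tune this on enrollment data to trade
# false accepts against false rejects.
MATCH_THRESHOLD = 0.80

# Hypothetical CRM fields that require a verified caller.
SENSITIVE_FIELDS = {"account_number", "date_of_birth", "balance"}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def visible_fields(record: dict, enrolled: Sequence[float], live: Sequence[float]) -> dict:
    """Return the CRM fields an agent may see, masking sensitive ones unless the voice matches."""
    verified = cosine_similarity(enrolled, live) >= MATCH_THRESHOLD
    return {
        key: value if verified or key not in SENSITIVE_FIELDS else "***"
        for key, value in record.items()
    }
```

Masking individual fields, rather than blocking the whole record, lets the agent keep helping a caller who fails verification with non-sensitive requests.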
This approach not only enhances customer experience but also lays the groundwork for secure, AI-driven workflows across industries.
Dev-First Tools: Rivet, Connect, Workflows
Zoom is doubling down on developer enablement with a suite of new tools:
- Zoom Rivet: A comprehensive toolkit for integrating server-side apps, handling authentication, webhooks, and API calls with minimal boilerplate. Rivet abstracts away infrastructure, letting developers focus on business logic and rapid prototyping.
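To show the kind of boilerplate Rivet is meant to remove, here is a hand-written sketch of Zoom webhook verification, based on our reading of Zoom’s webhook documentation: the `x-zm-signature` header is compared against an HMAC-SHA256 of `v0:{timestamp}:{body}` (timestamp from `x-zm-request-timestamp`), and the `endpoint.url_validation` challenge is answered with an HMAC of the `plainToken`. This is not Rivet’s API, and the exact details should be checked against Zoom’s current docs.

```python
import hashlib
import hmac
import json
from typing import Optional


def _hmac_hex(secret: str, message: str) -> str:
    return hmac.new(secret.encode(), message.encode(), hashlib.sha256).hexdigest()


def verify_signature(secret: str, raw_body: str, timestamp: str, signature: str) -> bool:
    """Check the x-zm-signature header against HMAC-SHA256 of 'v0:{timestamp}:{body}'."""
    expected = "v0=" + _hmac_hex(secret, f"v0:{timestamp}:{raw_body}")
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)


def url_validation_response(secret: str, raw_body: str) -> Optional[dict]:
    """Answer the endpoint.url_validation challenge, or return None for other events."""
    event = json.loads(raw_body)
    if event.get("event") != "endpoint.url_validation":
        return None
    plain = event["payload"]["plainToken"]
    return {"plainToken": plain, "encryptedToken": _hmac_hex(secret, plain)}
```

A production handler would typically also reject stale timestamps to limit replay attacks, and must hash the raw request body exactly as received rather than a re-serialized copy.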
- Zoom Connect: A no-code/low-code platform for building workflow automations (think triggers, actions, and integrations) across Zoom Workplace and third-party tools.
- BYOI & GenAI: Bring Your Own Index lets enterprises power AI assistants with proprietary knowledge bases, enabling custom copilots and domain-specific automation.
Workflow Automation now empowers non-technical users to automate routine tasks, like onboarding, approvals, and notifications, directly within Zoom Workplace, using drag-and-drop builders and prebuilt templates.
Final Take
Zoom is no longer just a meeting platform; it’s now a full-fledged CPaaS that is expanding rapidly into real-time AI. Whether you’re building intelligent assistants, secure customer flows, or advanced analytics, the new Zoom developer platform offers the primitives and tools to create differentiated, high-value experiences. Zoom now gives you:
- Per-participant structured media (RTMS)
- Embeddable UIs (Smart Embed v3)
- Vision + LLM integration paths
- API-first tools for automation
At WebRTC.ventures, we specialize in building and integrating with platforms like Zoom. If you’re looking to leverage these next-gen capabilities for your project, our team can help you architect, build, and deploy solutions that take full advantage of Zoom’s evolving ecosystem.
Further Reading: