Over the last few years, Voice AI agents have moved quickly from experimentation into production. Early adoption centered on customer support, basic IVR modernization, sales automation, meeting summaries, and general-purpose voice assistants. These early use cases were low-stakes enough to tolerate imperfection.

That is changing. Real-time Voice AI agents are now being deployed in regulated and mission-critical environments such as telecom platforms, telehealth systems, emergency response workflows, and financial infrastructure. In those settings, the hard part is no longer just model quality. It is systems architecture: reliability, security, observability, scalability, and compliance.

Choosing the right AI orchestration framework is one of the most important decisions in that architecture. We have worked with many frameworks here at WebRTC.ventures, but in this article we focus on four widely used options we have deployed in production: Amazon Bedrock Agents and Amazon Bedrock AgentCore, Google Vertex AI and ADK, LiveKit Agents, and Daily’s Pipecat Flows.

The Four Production AI Frameworks at a Glance

| | Bedrock / AgentCore | Vertex AI / ADK | LiveKit | Pipecat |
|---|---|---|---|---|
| Realtime Voice Native | Yes (recently added) | Partial (WebSocket-based only) | Yes | Yes |
| Fully Open Source | No | No | Yes | Yes |
| IAM / Governance | Strong | Strong | Moderate | Moderate |
| Media Handling | Partial; not a full RTC platform | Partial; requires additional stack | Built-in | Built-in |
| Best Fit | AWS-standardized enterprise | Google Cloud enterprise, multimodal agents | WebRTC-first voice and video | Custom WebRTC / telephony / multimodal agents |

Amazon Bedrock Agents and AgentCore

AWS offers Bedrock Agents as a structured orchestration layer on top of foundation models. With AgentCore, you get additional flexibility around infrastructure, framework, and model choice. AgentCore’s bidirectional streaming handles basic call-center-style interactions well.

An example of a production-grade multi-account AWS architecture where model access, workload execution, and network routing are isolated and policy-enforced through VPC Lattice and IAM. In this model, an agent cannot “decide” to execute something outside its allowed scope; the infrastructure simply does not expose the route.

Key characteristics of Amazon Bedrock Agents and AgentCore:

  • Tight IAM integration
  • Structured action schemas
  • Policy-driven tool execution
  • Guardrails configurable at the model boundary
  • Recently added support for the low-latency Nova 2 Sonic model and bidirectional WebRTC streaming
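To make the “structured action schemas” and “policy-driven tool execution” points concrete, here is a minimal, framework-agnostic sketch of that pattern. The action names and schema shape below are illustrative assumptions, not Bedrock’s actual API; in Bedrock, the schema would be an action group definition and the execution target a Lambda function.

```python
# Illustrative sketch: an agent can only execute actions that are
# declared in its schema, with the parameters the schema requires.
ACTION_SCHEMA = {
    "lookup_order": {
        "required": ["order_id"],
        "description": "Read-only order lookup",
    },
    # Note: no "refund_order" entry -- that route simply is not exposed.
}

def execute_action(name: str, params: dict) -> dict:
    """Run only actions declared in the schema, with their required parameters."""
    spec = ACTION_SCHEMA.get(name)
    if spec is None:
        raise PermissionError(f"Action '{name}' is not exposed to this agent")
    missing = [p for p in spec["required"] if p not in params]
    if missing:
        raise ValueError(f"Missing required parameters: {missing}")
    # In a real deployment, this is where the tool or Lambda would be invoked.
    return {"action": name, "params": params, "status": "executed"}

print(execute_action("lookup_order", {"order_id": "A-123"})["status"])  # prints "executed"
```

The key property is the same one the multi-account architecture enforces at the network layer: an out-of-scope action fails not because the model chose well, but because the route does not exist.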

Bedrock Agents are particularly strong in environments already standardized on AWS. A recent example using AWS for AI in regulated industries is Visa’s intelligent commerce initiative using Bedrock AgentCore to enable agent-driven commerce flows with strict policy enforcement. (See AWS blog: Introducing Visa Intelligent Commerce on AWS)

For telephony-grade session control, multi-leg bridging, media fan-out, or translation legs, Bedrock pairs well with a dedicated RTC stack rather than handling media natively.

We have used Bedrock in voice and non-voice projects where IAM integration and enterprise compliance were top priorities. One voice AI example is Conectara, a telephony platform with customizable AI agents built on Amazon Connect and Bedrock.

Amazon Bedrock Agents and AgentCore are best for: Enterprise teams already on AWS, compliance-heavy workflows, complex policy enforcement.

Google Vertex AI and ADK

Google Cloud provides two primary components for agent systems. Vertex AI handles end-to-end ML lifecycle management, model hosting, and governance. The Agent Development Kit (Google ADK) provides structured orchestration, tool and function calling, workflow control, and model selection across Gemini and third-party models including Anthropic’s Claude.

For real-time voice systems, Vertex and ADK serve as the AI and orchestration layer while a dedicated media stack handles latency-sensitive audio operations.

Unified AI and Orchestration Layers

This three-plane separation is particularly effective in regulated environments:

  • Media plane: Streaming audio, barge-in handling, session state via SIP or WebRTC. For basic call-center flows, Google Dialogflow integrates naturally here.
  • Agent plane: Workflow orchestration, tool invocation, guardrails, escalation logic. Implemented using ADK with a single session identifier propagated across all agents for full observability.
  • Governance plane: Identity and access, tool authorization, network and data boundaries, retention windows, audit log routing.
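The single propagated session identifier mentioned in the agent plane is what ties the three planes together for audit purposes. A minimal sketch of the idea, with hypothetical event and field names (nothing here is ADK or Vertex API):

```python
import uuid

def handle_call(audio_event: dict) -> list[dict]:
    """Attach one session id at the media plane and carry it through every plane."""
    session_id = audio_event.get("session_id") or str(uuid.uuid4())
    audit: list[dict] = []

    def log(plane: str, event: str, **fields) -> None:
        # Every audit record, regardless of plane, carries the same session id.
        audit.append({"session_id": session_id, "plane": plane, "event": event, **fields})

    log("media", "call_started", transport="webrtc")
    log("agent", "tool_invoked", tool="schedule_appointment")
    log("governance", "authz_check", decision="allow")
    return audit

trail = handle_call({"transport": "webrtc"})
```

Because every record shares one identifier, a single query against the audit log routing target reconstructs the full cross-plane history of a call.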

Google Vertex AI and ADK are best for: Multi-model environments, teams already on Google Cloud, workflows requiring auditable execution across governance structures.

LiveKit Agent Workflows

LiveKit started as a real-time media infrastructure platform. Its Agents layer builds agent workflows on top of open-source, media-first infrastructure, which is a key distinction from Bedrock and Vertex.

In this basic diagram, users talk to a LiveKit Server via WebRTC. LiveKit streams audio to a LiveKit Agent, which uses the Amazon Bedrock Nova 2 Sonic speech-to-speech model and can call external tools/APIs or MCP servers. The agent’s responses stream back to the user through LiveKit/WebRTC. Many other models are supported; see the “Large language models (LLM) overview” page in the LiveKit documentation.

Because LiveKit is WebRTC-first and designed for low-latency real-time applications, it reduces the integration burden for production voice systems. Session state, streaming audio, barge-in handling, latency control, and media observability come from the RTC layer rather than being retrofitted onto a general-purpose agent engine.

LiveKit Agent Workflow Overview

On top of that media foundation, LiveKit’s Agents framework recently added structured workflow capabilities such as explicit tool definitions, controlled execution paths, workflow graphs, and deterministic branching. In this model, the LLM is one component inside an explicit execution system: agents own session control, tasks encapsulate discrete steps, and tools define the side-effect surface with clear inputs and outputs. The result is more bounded and auditable behavior inside a live voice runtime.
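The “deterministic branching” idea above can be sketched in a few lines. This is a framework-agnostic illustration of the pattern, not the livekit-agents API: tasks are explicit nodes, each declares a closed set of branch labels, and the LLM step can only select among them.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Task:
    name: str
    run: Callable[[str], str]            # returns a branch label
    branches: dict[str, Optional[str]]   # label -> next task name (a closed set)

def classify_intent(utterance: str) -> str:
    # Stand-in for the LLM step: its output is forced into a closed label set.
    return "billing" if "bill" in utterance else "other"

TASKS = {
    "triage": Task("triage", classify_intent,
                   {"billing": "billing_flow", "other": "handoff"}),
    "billing_flow": Task("billing_flow", lambda _: "done", {"done": None}),
    "handoff": Task("handoff", lambda _: "done", {"done": None}),
}

def run_workflow(utterance: str) -> list[str]:
    path, current = [], "triage"
    while current is not None:
        task = TASKS[current]
        path.append(task.name)
        label = task.run(utterance)
        if label not in task.branches:   # anything off-graph is rejected
            raise ValueError(f"Undeclared branch '{label}' from '{task.name}'")
        current = task.branches[label]
    return path
```

Every execution path is a walk through a declared graph, which is what makes the runtime’s behavior bounded and auditable even though an LLM sits inside it.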

LiveKit Agent Workflows is best for: WebRTC-first architectures, teams that want voice-native performance without building a separate media stack, latency-sensitive production deployments.

Pipecat Flows

Pipecat, the open-source project created by Daily, has quickly become one of the most important frameworks in the real-time voice and video AI ecosystem. Its Pipecat Flows framework brings structured flow control to flexible, composable voice pipelines.

This diagram shows a real-time Voice AI agent built on Pipecat. A user talks through a WebRTC app using Daily for transport; Pipecat runs voice activity detection and Amazon Transcribe for speech-to-text, manages dialog flow and in-memory state, optionally calls external tools/APIs, uses Amazon Bedrock for the model response, and Amazon Polly to speak the reply back in the call.

Pipecat Flows offers explicit flow definitions, state-driven conversation paths, structured actions at transitions, tool invocation constraints, and clear separation between LLM reasoning and execution. That shift toward deterministic behavior reflects a broader production reality: some regulated voice agents require predictable, auditable execution paths, not just capable ones.
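A rough sketch of what “explicit flow definitions” and “tool invocation constraints” look like in practice. The config shape below is loosely modeled on Pipecat Flows’ node-based configuration, but the keys, node names, and class are illustrative assumptions, not the library’s API:

```python
# Illustrative node-based flow: each node declares which tools it exposes
# and which transition events it accepts.
FLOW = {
    "initial_node": "greeting",
    "nodes": {
        "greeting": {"tools": ["collect_name"],       "next": {"name_collected": "verify"}},
        "verify":   {"tools": ["check_identity"],     "next": {"verified": "service"}},
        "service":  {"tools": ["get_balance", "end"], "next": {}},
    },
}

class FlowSession:
    """Per-call state machine: the current node constrains which tools can run."""
    def __init__(self, flow: dict):
        self.flow = flow
        self.node = flow["initial_node"]

    def call_tool(self, tool: str) -> str:
        allowed = self.flow["nodes"][self.node]["tools"]
        if tool not in allowed:          # enforced outside the LLM's reasoning
            raise PermissionError(f"'{tool}' is not available in node '{self.node}'")
        return f"{tool} ok"              # real code would invoke the tool here

    def transition(self, event: str) -> None:
        self.node = self.flow["nodes"][self.node]["next"][event]
```

This is the separation between LLM reasoning and execution in miniature: the model can propose a tool call, but only the current node’s declared surface can actually run, which yields the predictable, auditable paths regulated deployments require.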

For telephony, web-based voice agents, and edge or IoT deployments, Pipecat’s open, composable runtime is especially compelling. Teams can own the full voice pipeline, deploy in hybrid or edge topologies, and keep workflows bounded through Flows.

Pipecat Flows is best for: Custom and self-hosted deployments, telephony and web-based voice agents, edge/IoT scenarios, and teams that want vendor-neutral building blocks with fine-grained media pipeline control.

How to Choose a Voice AI Agent Production Framework

For telecom, emergency response, and healthcare systems, latency, media control, and infrastructure determinism are architectural requirements. The decision comes down to where your governance needs and your media needs are centered.

  • Choose Bedrock or Vertex when complex governance, IAM-aligned enforcement, and enterprise compliance dominate your requirements. Both typically require a separate media stack for production voice.
  • Choose LiveKit or Pipecat when you need a voice-native RTC pipeline with structured agent workflows built in. LiveKit is the stronger choice for WebRTC-first cloud deployments that require a media server; Pipecat for custom or existing media infrastructure.
  • Choose a hybrid when you want cloud enterprise guardrails and managed models alongside a dedicated media-first real-time stack. Many production regulated deployments land here.

Every production voice deployment has constraints that no framework comparison can fully capture: regulatory requirements, existing infrastructure, team expertise, latency budgets, and more. Tell us what you’re building, and we’ll help you figure out where to start! Reach out to the WebRTC.ventures team today.
