Voice AI Conversation Records: Why vCons Belong in Your Production Architecture

Voice AI systems generate more than recordings and transcripts. Every production interaction produces a web of artifacts across multiple systems: call-setup metadata, ASR output, LLM responses, tool calls, CRM updates, escalation events, and compliance-relevant signals like caller identity verification. Most Voice AI architectures store some of these. Few store all of them in a way that survives an audit, a customer dispute, or a platform migration.

That gap has a name and an emerging solution. The vCon standard, currently under active development at the IETF, defines a structured, portable container for conversation data. It gives Voice AI and video communications teams a format that can hold the full record of an interaction, not just the transcript, and carry it across systems, trust boundaries, and time.

Voice AI Conversation Records: What Production Systems Actually Capture

Consider a specific scenario: A financial services customer disputes an automated outbound call. Your compliance team needs to answer several questions: What was said? Who initiated the call? Was the caller identity authenticated? What attestation level did STIR/SHAKEN assign? Was the call escalated? What tools or data sources did the AI reference during the interaction?

A transcript answers the first question. It does not answer the rest.

This is the core records problem in production Voice AI. The conversation itself is only one layer. A complete record also needs to capture:

Who participated, including identity verification signals, not just names or phone numbers
How the call was established, including SIP signaling metadata and STIR/SHAKEN verification results
What systems processed the interaction, including ASR providers, LLM orchestration layers, and tool calls
What analysis was generated, including summaries, classifications, sentiment scores, and AI outputs
What happened operationally, including escalations, hold events, transfers, and CRM updates
The audit chain, including timestamps, system identifiers, and signed or encrypted record versions

When these artifacts live in separate systems (an SBC log here, a transcript in an ASR vendor dashboard, a summary in a CRM, a STIR/SHAKEN result in a carrier trace), reconstructing the full record after the fact becomes a manual, error-prone process. In regulated industries, that is a governance problem, not just an operational inconvenience.

What Is vCon? The IETF Conversation Container Standard

A vCon is a JSON-based conversation container developed under the IETF Virtualized Conversations working group. It is best understood as the conversation equivalent of a vCard: a portable, structured format designed to carry conversation data across systems without losing context.

The core vCon format defines several object types:

Parties: who participated in the conversation, with support for identity parameters including STIR/SHAKEN data
Dialog: the actual conversation content, with references to audio, video, or text
Attachments: related artifacts including SIP messages, certificate chains, verification reports, and supporting files
Analysis: transcripts, summaries, classifications, sentiment, and AI-generated outputs
Metadata: timing, identifiers, system context, and audit information

The design principle is separation of concerns. Audio is not the same artifact as a transcript. A transcript is not the same artifact as an identity verification result. A SIP trace is not the same artifact as a compliance report. vCon gives each of these a defined place in one container, rather than forcing everything into a product-specific schema or scattering it across vendor storage.
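As a rough illustration of that separation, here is a minimal Python sketch of a vCon-shaped record. The top-level keys follow the object types listed above; the field names inside each object, the example URL, and all values are illustrative assumptions rather than the normative schema, so check the current IETF draft before relying on any of them.

```python
import json
import uuid
from datetime import datetime, timezone


def build_record(caller: str, agent: str) -> dict:
    """Assemble a vCon-shaped record with one place per artifact type.

    Field names inside each object are illustrative assumptions.
    """
    now = datetime.now(timezone.utc).isoformat()
    return {
        "uuid": str(uuid.uuid4()),
        "created_at": now,
        "parties": [
            {"tel": caller, "role": "customer"},
            {"name": agent, "role": "ai_agent"},
        ],
        "dialog": [
            # The recording is referenced by URL here, not embedded.
            {"type": "recording", "start": now, "parties": [0, 1],
             "url": "https://example.com/media/call.wav"},
        ],
        "attachments": [],
        "analysis": [
            # The transcript points back at the dialog entry it was derived from.
            {"type": "transcript", "dialog": 0, "body": "Hello, how can I help?"},
        ],
    }


record = build_record("+15550100", "support-bot")
print(json.dumps(record, indent=2))
```

The point of the sketch is the shape: the transcript lives under analysis and refers to a dialog entry by index, so the audio and the text derived from it stay distinct artifacts inside one container.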

One architectural detail matters for production pipelines: vCons are designed to evolve over time. Different components of the record can be produced by different systems at different stages. Signed versions become immutable. When additional content needs to be added later, a new vCon references the earlier signed version rather than overwriting it. That versioning model fits real production pipelines, where telephony infrastructure, ASR, LLM orchestration, and post-call analysis all run in separate services on different timelines.
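The append-without-overwrite idea can be sketched in a few lines of Python. This is only an approximation of the model described above: the "previous_version" key and the use of a SHA-256 content hash are hypothetical choices made for illustration, and the draft defines its own mechanism and field names for referencing an earlier signed vCon.

```python
import copy
import hashlib
import json
import uuid


def content_hash(record: dict) -> str:
    """SHA-256 over a canonical (sorted-key, compact) JSON encoding."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_version(previous: dict, new_analysis: dict) -> dict:
    """Return a new record that points at `previous` instead of mutating it.

    "previous_version" is a hypothetical key used only for illustration.
    """
    successor = copy.deepcopy(previous)
    successor["uuid"] = str(uuid.uuid4())
    successor["previous_version"] = {
        "uuid": previous["uuid"],
        "sha256": content_hash(previous),
    }
    successor.setdefault("analysis", []).append(new_analysis)
    return successor


v1 = {"uuid": str(uuid.uuid4()),
      "analysis": [{"type": "transcript", "body": "Hello."}]}
v1_hash = content_hash(v1)

# A post-call service adds a summary later, producing a second version.
v2 = append_version(v1, {"type": "summary", "body": "Caller asked about a fee."})
```

Because the successor is built from a deep copy, the earlier version is left untouched, and the stored hash lets a later reader confirm that the version being referenced is the one they hold.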

Why Conversational Data Gets Fragmented Across Systems

Modern Voice AI deployments are not single systems. A typical production pipeline involves:

SIP or PSTN infrastructure for call origination and routing
WebRTC media paths for browser or app-based voice
ASR services for speech-to-text
LLM orchestration for reasoning, response generation, and tool use
TTS services for voice synthesis
Tool calls to external APIs, databases, or CRMs
Human escalation paths and agent handoff logic
Analytics, QA, and evaluation pipelines
Compliance and retention workflows

Each of these layers generates data about the conversation. Without a common record format, that data accumulates in vendor dashboards, application databases, log aggregators, media storage buckets, and temporary observability pipelines. Much of it has a short retention window by default.

The downstream effect is that basic production questions become difficult to answer reliably:

What exactly happened in this interaction, end to end?
What did the AI say, and what data or tools influenced that response?
Was the caller identity verified, and at what attestation level?
Was the interaction escalated, and when?
Can we reconstruct the full record six months from now?
Can a different system consume this record without losing context?

vCon addresses this by providing a single container that can hold or reference all of these artifacts in a defined, portable structure. The pipeline complexity does not disappear, but the record of what happened becomes coherent rather than fragmented.
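One practical consequence of a single container is that the questions above can be checked mechanically. The Python sketch below assumes a vCon-shaped dictionary and reports which questions a given record can answer. The type labels it looks for ("transcript", "tool_calls", "stir_shaken_report", "escalation_event") are hypothetical names chosen for this example, not values defined by the standard.

```python
def record_coverage(record: dict) -> dict:
    """Report which audit questions a record can answer, by artifact presence.

    The type labels checked here are hypothetical, chosen for illustration.
    """
    analysis_types = {a.get("type") for a in record.get("analysis", [])}
    attachment_types = {a.get("type") for a in record.get("attachments", [])}
    return {
        "what_was_said": "transcript" in analysis_types,
        "tools_used": "tool_calls" in attachment_types,
        "identity_verified": "stir_shaken_report" in attachment_types,
        "escalation_logged": "escalation_event" in attachment_types,
    }


# A transcript-only record answers the first question and nothing else.
transcript_only = {"analysis": [{"type": "transcript", "body": "Hello."}]}
coverage = record_coverage(transcript_only)
missing = [question for question, ok in coverage.items() if not ok]
print(missing)
```

A check like this could run at the end of a pipeline or before archival, so gaps are found while the source systems still hold the missing data.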

vCon SIP Signaling and STIR/SHAKEN: What the New IETF Draft Adds

A new IETF Internet-Draft published in April 2026 extends vCon specifically for SIP signaling and STIR/SHAKEN data. This extension is worth attention for any team operating telephony infrastructure.

The problem it addresses is specific: most Voice AI systems capture what was said but discard the evidence about how the call was established. SIP signaling data, including the Call-ID, INVITE and response metadata, and STIR/SHAKEN verification results, typically lives in SBC logs or carrier traces with short retention windows. Once that data ages out, the transcript becomes the only surviving record of the call.

For regulated or high-risk workflows, that matters. STIR/SHAKEN has been required in the IP portions of U.S. voice networks since June 2021. Caller authentication data is now part of mainstream telephony infrastructure. For Voice AI, this is especially relevant to outbound customer interactions, collections, financial services, healthcare, or other workflows where caller identity, consent, traceback, or call legitimacy may later be questioned.

The SIP signaling extension distributes call-setup data across existing vCon objects:

Party objects can carry sip_contact, sip_user_agent, and sip_display_name
Dialog objects can carry sip_call_id and related fields
Attachment objects can store SIP messages, certificate chains, and STIR/SHAKEN verification reports

Record Type	What You Preserve	What It Means for Audits
Voice AI without vCon + SIP	Audio, transcript, summary, agent output	SIP Call-ID, INVITE metadata, PASSporT, attestation level, certificate chain, and verification result are lost
Voice AI with vCon + SIP	Audio, transcript, analysis, plus SIP and STIR/SHAKEN evidence in one portable container	Less fragmented evidence, simpler audit reconstruction
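To make the distribution of call-setup data concrete, here is a small Python sketch that copies SIP details into a vCon-shaped dictionary. The sip_contact, sip_user_agent, sip_display_name, and sip_call_id field names come from the extension as described above. Everything else is an assumption for illustration: the helper function, the shape of the input dictionary, the "sip_message" attachment label, and the sample values.

```python
import copy


def add_sip_evidence(record: dict, party: int, dialog: int, sip: dict) -> dict:
    """Return a copy of `record` with SIP call-setup data attached.

    The attachment layout and "sip_message" label are assumptions.
    """
    updated = copy.deepcopy(record)
    updated["parties"][party].update({
        "sip_contact": sip["contact"],
        "sip_user_agent": sip["user_agent"],
        "sip_display_name": sip["display_name"],
    })
    updated["dialog"][dialog]["sip_call_id"] = sip["call_id"]
    # Keep the raw signaling message so it outlives SBC log retention.
    updated.setdefault("attachments", []).append({
        "type": "sip_message",
        "party": party,
        "body": sip["raw_invite"],
    })
    return updated


base = {"parties": [{"tel": "+15550100"}], "dialog": [{"type": "recording"}]}
sip_data = {
    "contact": "sip:alice@192.0.2.10:5060",
    "user_agent": "ExampleSBC/1.0",
    "display_name": "Alice",
    "call_id": "a84b4c76e66710@pc33.example.com",
    "raw_invite": "INVITE sip:bob@example.com SIP/2.0",
}
enriched = add_sip_evidence(base, party=0, dialog=0, sip=sip_data)
```

In a real deployment the input would come from the SBC or SIP stack at call setup, which is the moment this evidence is cheapest to capture.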

Voice AI Compliance Architecture: What the Record Layer Needs

vCon defines the record format. It still needs to sit inside a production-grade architecture. Teams building toward this should also plan for:

Object storage for media files with defined retention and cleanup policies
Managed databases for application and session state
Authentication, authorization, and tenant-aware access controls
Signed or encrypted vCon records for tamper-evident audit trails
Retention policies that satisfy regulatory requirements by vertical
AI evaluation pipelines that can consume structured vCon data
Monitoring across media, AI, and application layers

The record format and the surrounding architecture are separate problems. vCon makes the record portable and structured. The architecture around it determines whether that record is actually preserved, secured, and accessible when it matters.
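For the tamper-evidence item in particular, a short Python sketch shows the underlying idea. It uses an HMAC from the standard library as a simplified stand-in for a real signature scheme. It is not the signing format the vCon drafts specify, and the function names, the sealed-record layout, and the demo key are all assumptions made for this example.

```python
import hashlib
import hmac
import json


def seal(record: dict, key: bytes) -> dict:
    """Serialize the record canonically and attach an HMAC-SHA256 tag."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "hmac_sha256": tag}


def verify(sealed: dict, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(
        key, sealed["payload"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, sealed["hmac_sha256"])


demo_key = b"demo-key-not-for-production"
sealed = seal({"uuid": "example", "analysis": [{"type": "summary"}]}, demo_key)
intact = verify(sealed, demo_key)

# Any edit to the stored payload invalidates the tag.
tampered = dict(sealed, payload=sealed["payload"].replace("summary", "edited"))
still_valid = verify(tampered, demo_key)
```

A shared-secret HMAC only proves integrity to parties that hold the key. Records that cross trust boundaries generally call for asymmetric signatures and managed keys, which is part of what the surrounding architecture has to provide.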

Working With WebRTC.ventures on Production Voice AI Architecture

WebRTC.ventures builds production Voice AI systems for regulated and high-volume customer workflows across telehealth, fintech, contact center, and CPaaS platforms. Our work includes real-time media architecture, SIP and WebRTC integration, LLM orchestration, observability design, compliance-aware record keeping, and long-term production support.

The record layer is part of how we design systems from the start, not something we add after a compliance review surfaces a gap. If your team is moving a Voice AI system from prototype to production and wants architecture that holds up under audit, dispute resolution, and long-term operations, we would like to hear about what you are building!
