Voice agents are moving out of demos and into production systems that handle customer requests, account actions, healthcare questions, financial details, and support escalations. That changes the security model.
A voice agent is listening, deciding, acting, and speaking in real time. That is why voice agent security cannot rely only on prompts or post-call transcript review. Sensitive data guardrails need to run inside the live pipeline: before sensitive data reaches the model, before tools are executed, and before unsafe responses are spoken back to the user.
To make this pattern more concrete, we built a reference implementation of a voice AI pipeline. It uses LiveKit, Whisper, OpenAI, and Kokoro for the core agent and media stack, and GLiGuard, a small 300M-parameter classifier, to detect PII, PHI, PCI, and other risks inside the live pipeline.
In this post, we walk through a reference architecture for secure realtime voice AI, show how guardrails are enforced inside a live LiveKit pipeline, and cover the layered security model that production systems need.
Reference architecture for secure realtime voice AI

The diagram illustrates the secure realtime voice AI pipeline from the moment a user speaks to the moment the system responds.
Audio enters through a browser, phone call, or other client and is transported through the realtime communications layer. The speech is transcribed into text, analyzed for sensitive information and policy risks, and then filtered through guardrails before any content is provided to the AI model. The model can invoke tools through an authorization layer, and its responses are checked by output guardrails before being converted back into speech and delivered to the user.
Each stage is a control point. In our reference implementation, the important architectural point is that PII and risk classification happens before the agent receives the full user input, not after the conversation is already complete.
The system should decide:
- what data enters model context
- what needs to be redacted or blocked
- what tools the agent can request
- what should be logged for audit and review
- when to escalate to a human
That is a much stronger model than hoping one prompt can contain the entire risk surface.
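One way to make those decisions explicit is to have the guardrail stage return a small structured decision for each turn rather than a bare pass/fail flag. The sketch below is illustrative only: the `TurnDecision` type, the category names, the tool name, and the threshold are all assumptions made for this example and are not taken from the reference implementation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import FrozenSet, Optional


class Action(Enum):
    ALLOW = "allow"
    REDACT = "redact"
    BLOCK = "block"
    ESCALATE = "escalate"


@dataclass
class TurnDecision:
    action: Action
    llm_text: str                               # what may enter model context
    allowed_tools: FrozenSet[str] = frozenset()  # tools the agent may request
    audit: dict = field(default_factory=dict)    # metadata only, no raw text


# Hypothetical policy tables; real categories depend on the classifier in use.
BLOCK_CATEGORIES = {"pci", "credentials"}
ESCALATE_CATEGORIES = {"self_harm"}


def decide(text: str, category: str, score: float,
           redacted: Optional[str] = None, threshold: float = 0.5) -> TurnDecision:
    """Map one classifier result to a per-turn policy decision."""
    audit = {"category": category, "score": round(score, 3)}
    if category == "safe" or score < threshold:
        return TurnDecision(Action.ALLOW, text, frozenset({"lookup_order"}), audit)
    if category in ESCALATE_CATEGORIES:
        return TurnDecision(Action.ESCALATE, "", frozenset(), audit)
    if category in BLOCK_CATEGORIES or not redacted:
        # No safe rewrite is available, so nothing from this turn reaches the model.
        return TurnDecision(Action.BLOCK, "", frozenset(), audit)
    return TurnDecision(Action.REDACT, redacted, frozenset(), audit)
```

Keeping the decision in one object means the model input, the tool allowlist, and the audit record all come from the same policy evaluation instead of being scattered across the pipeline.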
Enforcing voice agent guardrails inside the live voice pipeline
In the reference implementation, the guardrail layer is split into two parts: a local GLiGuard classification adapter and a LiveKit agent hook that screens each completed user turn before the LLM sees it.
The local adapter exposes a simple /classify endpoint and returns a score, category, reason, model name, and optional raw model output.
A simplified version of the classification endpoint looks like this:
    @app.post("/classify", response_model=ClassifyResponse)
    def classify(payload: ClassifyRequest) -> ClassifyResponse:
        result = get_model().classify_text(
            payload.text,
            TASKS,
            threshold=THRESHOLD,
            include_confidence=True,
            include_spans=False,
        )

        # The moderation result may be a dict, a bare label string, or missing.
        moderation = result.get("moderation") if isinstance(result, dict) else None
        label = "safe"
        confidence = 0.0
        if isinstance(moderation, dict):
            label = str(moderation.get("label") or "safe")
            confidence = float(moderation.get("confidence") or 0.0)
        elif isinstance(moderation, str):
            label = moderation

        # Unknown labels fall back to "safe"; unsafe scores are clamped to [0, 1].
        label = label if label in CATEGORIES else "safe"
        score = 0.03 if label == "safe" else max(0.0, min(1.0, confidence))

        return ClassifyResponse(
            score=score,
            categories=[label],
            reason=(
                "GLiGuard classified text as safe"
                if label == "safe"
                else f"GLiGuard classified text as {label}"
            ),
            model=MODEL_ID,
            raw=result,
        )
The voice agent calls the guardrail pipeline when a user turn is completed. In this context, a “completed turn” means the user has stopped speaking and voice activity detection has decided that the audio segment is ready to process. At that point, speech has been transcribed into text, but the text has not yet been handed to the LLM.
Because this check runs in the live turn-taking loop, the classifier needs to be fast enough to avoid adding noticeable delay before the agent responds.
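One common way to keep the check inside a latency budget is to wrap the classifier call in a timeout and decide in advance what happens when the budget is exceeded. The following is a minimal sketch under stated assumptions: `classify_remote` is a stand-in for the HTTP call to the local `/classify` adapter, and the 150 ms budget and the fail-closed default are illustrative choices, not values from the reference implementation.

```python
import asyncio
import time

LATENCY_BUDGET_S = 0.150  # assumed budget; tune against measured turn latency


async def classify_remote(text: str) -> dict:
    """Stand-in for POSTing the transcript to the /classify adapter."""
    await asyncio.sleep(0.01)  # simulated inference time
    return {"score": 0.03, "categories": ["safe"]}


async def screen_with_budget(text: str, classify=classify_remote,
                             budget_s: float = LATENCY_BUDGET_S) -> dict:
    """Run the classifier under a deadline and record how long it took."""
    started = time.perf_counter()
    try:
        result = await asyncio.wait_for(classify(text), timeout=budget_s)
        timed_out = False
    except asyncio.TimeoutError:
        # Fail closed: treat a slow classifier as a blocked turn rather than
        # letting unscreened text through to the model.
        result = {"score": 1.0, "categories": ["timeout"]}
        timed_out = True
    result["latency_ms"] = round((time.perf_counter() - started) * 1000, 1)
    result["timed_out"] = timed_out
    return result
```

Whether to fail closed or fail open on a timeout is a policy decision. Failing closed protects sensitive workflows at the cost of an occasional re-prompt, and recording `latency_ms` per turn makes it possible to see how often the budget is actually hit.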
If the turn is safe, the worker replaces the original message with the screened or redacted version before the LLM sees it. If the turn is blocked, the raw sensitive text is not passed to the model.
    async def on_user_turn_completed(self, turn_ctx, new_message):
        text = _message_text(new_message).strip()
        if not text:
            return

        # Screen the transcript before the LLM sees it. room_name is assumed
        # to be defined in the enclosing scope (not shown in this excerpt).
        result = await _screen_text(text, room_name, "livekit-user")
        moderation = result.get("moderation") or {}
        llm_safe = bool(moderation.get("llmSafe"))
        redacted_text = str(
            result.get("llmInput")
            or moderation.get("redactedText")
            or ""
        ).strip()

        # Safe or redactable turn: hand the model the screened text.
        if llm_safe or redacted_text:
            _set_message_text(new_message, redacted_text or text)
            return

        # Blocked turn: replace the raw text with a safe instruction.
        reason = str(
            result.get("blockedReason")
            or moderation.get("reason")
            or "privacy guardrail blocked this turn"
        )
        _set_message_text(
            new_message,
            (
                "The user shared sensitive details that were removed by "
                f"the privacy guardrail. Reason: {reason}. "
                "Ask for a version without sensitive personal data."
            ),
        )
This is the core input-side security pattern: classify the transcript before model handoff, replace raw user text with safe or redacted text, and convert blocked sensitive content into a safe instruction instead of sending it directly to the LLM.
In many production workflows, the system may still need to collect sensitive data. The safer pattern is to capture that data through a controlled application flow, validate it, and persist it in a secure database with the right access controls, encryption, retention policy, and audit trail. The LLM should receive only the minimum context it needs, such as “payment method collected” or “identity verification completed,” rather than the raw card number, SSN, medical identifier, or credential.
Why deployment boundaries matter in Voice AI security
For many teams, the question is how much control they need over sensitive data and where it is allowed to flow.
Some organizations cannot send raw audio, transcripts, embeddings, or logs to external services without tight constraints. They need clear trust boundaries around where data flows, where policies are enforced, and where records are retained. Self-hosted and on-premise components help with data residency, compliance, private network access, retention policy, and auditability.
That does not mean every part of the stack must be self-hosted. In practice, hybrid deployments are common. The important part is that teams control the boundary and keep sensitive enforcement close to the realtime path.
For example, a team may decide to use a cloud-hosted LLM while keeping the media layer, transcript filtering, redaction, and audit storage inside its own infrastructure. Another team may self-host the classifier and TTS components but rely on a managed STT provider. The right boundary depends on the sensitivity of the data, the regulatory environment, and the operational maturity of the team.
For teams in healthcare, financial services, or other regulated industries where HIPAA, PCI-DSS, or GDPR requirements govern how patient and customer data is handled, having clear control over where enforcement happens and where data flows is often a hard requirement.
Realtime PII, PHI, and PCI screening before LLM handoff
This architecture is getting easier to deploy because smaller classification models can now run closer to the live path. GLiNER-style PII models, GLiNER Guard-style classifiers, and tools like Microsoft Presidio give teams more options for local detection, redaction, and policy decisions before text is passed downstream. GLiNER (Generalist and Lightweight Named Entity Recognition) is an open-source model architecture that can detect arbitrary entity types without task-specific fine-tuning, making it practical for PII detection across varied input without the overhead of a large model.
These tools can identify names, addresses, phone numbers, emails, account numbers, payment details, healthcare identifiers, and other sensitive spans before they are passed downstream. They become even more useful when paired with deterministic recognizers for patterns like SSNs, credit card numbers, account formats, and medical record IDs.
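A deterministic recognizer of this kind can be as simple as a regular expression plus a checksum. The sketch below is one possible shape, not the reference implementation: the patterns, the `[SSN]` and `[CARD]` placeholders, and the `redact_patterns` helper are assumptions for illustration, and the patterns only cover digit-formatted text.

```python
import re

# US SSN in the common 3-2-4 dashed format.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or dashes.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_patterns(text: str) -> str:
    """Replace SSNs and Luhn-valid card numbers with placeholders."""
    def _card(match: "re.Match[str]") -> str:
        digits = re.sub(r"\D", "", match.group())
        # Only redact digit runs that pass the checksum, to limit false positives.
        return "[CARD]" if luhn_valid(digits) else match.group()

    text = SSN_RE.sub("[SSN]", text)
    return CARD_RE.sub(_card, text)
```

Transcripts from speech-to-text may spell numbers out as words or split them across turns, so a pattern layer like this complements the model-based detector rather than replacing it.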
The goal is not perfect redaction but measurable risk reduction with known precision and recall tradeoffs. For realtime voice AI systems, guardrails need to be accurate enough to reduce risk and fast enough to preserve a natural conversation.
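To make those tradeoffs measurable, a team can score the guardrail against a labeled sample of transcripts and track latency percentiles alongside it. The helpers below are a minimal sketch; the function names and the nearest-rank percentile method are choices made for this example, and the labeled data would have to come from your own evaluation set.

```python
from typing import Iterable, List, Tuple


def precision_recall(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Compute precision and recall from (predicted, actual) sensitivity flags."""
    tp = fp = fn = 0
    for predicted, actual in pairs:
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1  # over-redaction: hurts the conversation
        elif actual and not predicted:
            fn += 1  # missed sensitive data: hurts security
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def percentile(latencies_ms: List[float], pct: int = 95) -> float:
    """Nearest-rank percentile of per-turn guardrail latencies."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct/100 * n
    return ordered[max(rank, 1) - 1]
```

Low recall means sensitive data is reaching the model, while low precision means the agent is interrupting users unnecessarily. Reporting both next to a tail latency figure keeps the security and conversation-quality goals visible together.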
A layered Voice AI security model for production systems
In practice, secure voice AI systems need multiple layers. Provider-level safety features are useful, but they should not be the only control point.
- Input guardrails inspect live transcripts before model context is built. This includes checks for PII, PHI, PCI data, prompt injection, abuse, prohibited content, and workflow-specific policies. For sensitive workflows, this should happen before text is sent to the LLM provider.
- Context guardrails limit what the model receives. The agent should not automatically get raw transcripts, full customer records, payment details, medical identifiers, credentials, or long-term memory. It should receive only the minimum context needed for the next step.
- Provider guardrails add safety around the LLM call. Depending on the provider, this may include moderation, PII detection, jailbreak detection, URL filtering, off-topic checks, output checks, and tool-call guardrails. These are useful, but they are only one layer.
- Tool guardrails separate reasoning from execution. The model can request an action, but application policy should decide whether it is allowed. Refunds, account changes, payment operations, medical guidance, record exports, and identity updates need explicit authorization outside the model.
- Output guardrails check what the agent is about to say before it is spoken. In realtime voice, unsafe content can otherwise become audio immediately. Output checks can block, replace, interrupt, or escalate the response.
- Audit guardrails record redactions, blocked requests, tool calls, policy decisions, escalation reasons, and latency without unnecessarily storing raw sensitive data.
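The tool and audit layers in particular lend themselves to a short sketch. The example below assumes a hypothetical policy table, role names, and tool names; none of them come from the reference implementation. It shows the model's tool request being authorized by application code, with an audit record that captures the decision but not the argument values.

```python
import hashlib
import json
import time
from typing import Dict, List

# Hypothetical policy: which caller roles may trigger which tools, plus limits.
TOOL_POLICY: Dict[str, dict] = {
    "lookup_order": {"roles": {"anonymous", "verified"}},
    "issue_refund": {"roles": {"verified"}, "max_amount": 100.0},
}

AUDIT_LOG: List[str] = []  # stand-in for an append-only audit store


def audit(event: str, session_id: str, **fields) -> None:
    """Append a JSON audit record keyed by a hashed session identifier."""
    record = {
        "ts": time.time(),
        "event": event,
        "session": hashlib.sha256(session_id.encode()).hexdigest()[:12],
        **fields,
    }
    AUDIT_LOG.append(json.dumps(record, sort_keys=True))


def authorize_tool_call(session_id: str, role: str, tool: str, args: dict) -> bool:
    """Decide in application code whether a model-requested tool call may run."""
    policy = TOOL_POLICY.get(tool)
    if policy is None:
        allowed, reason = False, "unknown_tool"
    elif role not in policy["roles"]:
        allowed, reason = False, "role_not_permitted"
    elif args.get("amount", 0) > policy.get("max_amount", float("inf")):
        allowed, reason = False, "amount_over_limit"
    else:
        allowed, reason = True, "ok"
    # Log argument names only, never their values.
    audit("tool_call", session_id, tool=tool, allowed=allowed,
          reason=reason, arg_names=sorted(args))
    return allowed
```

Because the check runs outside the model, a prompt injection that convinces the LLM to request a refund still has to pass the same role and amount limits as any other request.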
Building secure voice AI for production
The safest voice AI systems are defined by controlled realtime pipelines where media, transcripts, models, tools, policies, and audit logs are governed together.
WebRTC sits at the center of the enforcement architecture: media, agent logic, guardrails, tools, and compliance requirements all flow through it. At WebRTC.ventures, we design and build secure realtime voice AI systems where those pieces work together from the start. Talk to our team.

