The choice between conversation-based and turn-based Voice AI agent patterns is a strategic business decision, not just a technical detail. Beyond what your agent will say, you must decide how it will run. This architectural choice defines how your voicebot will scale, what it will cost to operate, and how your customers will experience it.

There are two primary patterns for deploying a Voice AI agent:

  • Conversation-based (Isolated Process): A stateful, dedicated “concierge” for each user, staying with them for the entire call
  • Turn-based (Shared Process): A stateless, highly efficient “operator” that handles requests from all users one turn at a time

Choosing the right pattern is a strategic decision that aligns your technology with your core business goals, whether those goals prioritize ultimate performance or massive scale. In this post, we’ll explore the architecture, advantages, and trade-offs of each approach to help you make the right choice.

Understanding the Two Voice AI Agent Patterns

Let’s examine the architecture, advantages, and disadvantages of each approach, along with a working example of each.

What is Conversation-based Voice AI Architecture? (Isolated Process Pattern)

The conversation-based pattern is a stateful, long-running process. Think of it as a dedicated server or container that is provisioned the moment a user connects, and lives for the entire duration of that single user’s session.

Why Use a Conversation-based Voice AI Agent Pattern

  • Effortless Context: Because the process is dedicated to one user, the session’s context (who the user is, what they just said, the history of the chat) can be held directly in memory. This is extremely fast and simplifies development.
  • No Context-retrieval Latency Overhead: Since the agent doesn’t need to make an external database call to “remember” what was just said, this instant recall of short-term context enables a more fluid conversational flow.
  • Good for Complexity: This pattern is ideal for complex, multi-turn dialogues where maintaining deep, instant context is critical to the agent’s function.
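To make these bullets concrete, here is a minimal, framework-free sketch (the class name and the placeholder reply are ours, not part of any library): each caller gets a dedicated object, and the conversation history lives in ordinary process memory for the entire call, so recalling it costs no database round-trip.

```python
class SessionAgent:
    """One dedicated instance per caller, alive for the whole session."""

    def __init__(self, user_id: str):
        self.user_id = user_id
        self.history = []  # held in RAM: no external call needed to "remember"

    def handle_turn(self, text: str) -> str:
        self.history.append({"role": "user", "content": text})
        reply = f"(reply to: {text})"  # placeholder for the real LLM call
        self.history.append({"role": "assistant", "content": reply})
        return reply

# One process (or container) would own exactly one of these:
agent = SessionAgent("user-123")
agent.handle_turn("Hi there")
assert len(agent.history) == 2  # full context, instantly available in-process
```

The trade-off shown here is exactly the scaling challenge below: this object only exists as long as its process does, so every concurrent caller needs a live process holding one.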

What to Watch Out For When Using a Conversation-based Voice AI Agent Pattern

  • Scaling Challenge: The scaling model is 1:1. If you have 10,000 concurrent users, you must run 10,000 concurrent processes.
  • Cost Inefficiency: You are paying for the compute resources for that entire process, even during moments when the user is silent.
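A rough back-of-envelope illustration of that cost difference, using made-up numbers (real pricing varies by vendor, instance type, and billing granularity): a dedicated process is billed for the whole call, while a shared pool is billed only for the seconds a turn is actually being processed.

```python
# Illustrative assumptions, not vendor pricing:
calls_per_day = 10_000
avg_call_seconds = 300          # 5-minute calls
active_turn_fraction = 0.3      # most of a call is silence or listening

# Conversation-based: one process billed for the full call duration
dedicated_compute_seconds = calls_per_day * avg_call_seconds      # 3_000_000

# Turn-based: shared pool billed only while turns are processed
shared_compute_seconds = dedicated_compute_seconds * active_turn_fraction  # 900_000.0
```

Under these assumptions the shared pool buys roughly a 70% reduction in billed compute, which is the whole appeal of the turn-based pattern at scale.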

This pattern is common with frameworks like Pipecat or LiveKit Agents. It can be implemented by running a dedicated container per session in a cluster manager like Amazon ECS or Kubernetes.

See a Conversation-based Pattern in Action

This simplified Python example shows how to configure an agent’s components, such as the Speech-to-Text (STT), Large Language Model (LLM), and Text-to-Speech (TTS) models, using the LiveKit Agents framework. The framework takes care of orchestrating requests to each model provider and of handling the audio streams to and from the LiveKit session.

"""LiveKit voice agent entry point."""

from dotenv import load_dotenv
from livekit import agents
from livekit.agents import AgentSession, Agent
from livekit.plugins import silero
from agent_logic import SYSTEM_PROMPT

load_dotenv()

async def entrypoint(ctx: agents.JobContext):
    """Main agent entrypoint - called for each session."""
    
    # Define Agent parameters
    session = AgentSession(
        stt="deepgram/nova-3:en", # STT using Deepgram Nova 3
        llm="openai/gpt-4o-mini", # LLM using OpenAI GPT-4o Mini
        tts="rime/mistv2:courtney", # TTS using Rime MistV2 
        vad=silero.VAD.load(), # VAD using Silero
    )
    
    # Connect to LiveKit session and set main instructions
    await session.start(
        room=ctx.room,
        agent=Agent(instructions=SYSTEM_PROMPT),
    )
    
    # Generate a first greeting
    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )

# Start the agent process
if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
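The agent_logic module imported at the top of the entry point is not shown here; a hypothetical minimal version could be nothing more than the prompt string (the wording below is ours, not from the post):

```python
"""agent_logic.py: a hypothetical minimal module providing the system prompt."""

SYSTEM_PROMPT = (
    "You are a friendly voice assistant. "
    "Keep answers short and conversational, since they will be spoken aloud."
)
```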

Conversation-based Voice AI Agent Pattern Demo:

What is Turn-based Voice AI Architecture? (Shared Process Pattern)

The turn-based pattern is a stateless, short-running process. Think of a serverless function (like AWS Lambda) or a shared pool of containers. This process handles just one “turn” (a single user’s sentence) and then either disappears or moves on to serve a different user.

This turn-based approach typically relies on a voice pipeline managed by an external provider, such as Amazon Connect, Twilio ConversationRelay, or Layercode.

Why Use a Turn-based Voice AI Agent Pattern

  • Massive, Elastic Scale: Processes are shared, not dedicated. This makes it incredibly cost-effective and easy to handle huge, unpredictable spikes in traffic.
  • High Resource Efficiency: If a serverless model like AWS Lambda is used, you only pay for compute time when a user is actively speaking and a turn is being processed.

What to Watch Out For When Using a Turn-based Voice AI Agent Pattern

  • The Context Hurdle: This is the critical challenge. Since the process is stateless, it has no memory of the user. Context must be stored externally (e.g., in a high-speed database like Redis or DynamoDB) and fetched on every single turn.
  • Context-Retrieval Latency Overhead: This external database call to fetch context and write it back adds latency to every turn. This can make the agent feel slower or less responsive.
  • Engineering Complexity: Developers must be extremely careful to load the correct user’s context for every turn and prevent data “bleed” between different user sessions.
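To make the context hurdle concrete, here is a minimal sketch of get_context and save_context helpers like the ones the webhook example below relies on. A plain in-process dict stands in for the external store so the interface is easy to see; in production this would be a call to Redis, DynamoDB, or similar. Keying every read and write by conversation_id is what prevents context “bleed” between sessions.

```python
import json

# Stand-in for the external store (Redis, DynamoDB, ...) used in production.
_store: dict[str, str] = {}

def save_context(conversation_id: str, history: list) -> None:
    """Persist the conversation history, keyed strictly by conversation_id."""
    _store[f"ctx:{conversation_id}"] = json.dumps(history)

def get_context(conversation_id: str) -> list:
    """Fetch this conversation's history; a new conversation starts empty."""
    raw = _store.get(f"ctx:{conversation_id}")
    return json.loads(raw) if raw else []
```

Note that every turn pays for one read and at least one write here; with a real network-attached store, that round-trip is the context-retrieval latency overhead described above.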

See a Turn-based Pattern in Action

This simplified example shows a webhook handling requests from the managed media pipeline running on Layercode. Notice how it fetches context from an external database (using the get_context function) at the start of each turn, and writes it back at the end (using the save_context function).

@app.post("/webhook")
async def agent_webhook(request: Request):
    # more code ...
    
    # get the request body
    body = await request.body()
    payload = json.loads(body)
    
    # more code ...
    
    # retrieve context identifiers from the request body
    conversation_id = payload.get('conversation_id')
    turn_id = payload.get('turn_id')
    event_type = payload.get('type')
    
    # function for generating responses
    async def generate():
        if event_type == 'session.start':
            # set the initial context
            save_context(conversation_id, [])
            print(f"  → Initialized conversation {conversation_id[:8]} (external storage)")
            
            # if it's the first turn, start with a greeting
            yield f'data: {json.dumps({"type": "response.tts", "content": "Hello! How can I help you today?", "turn_id": turn_id})}\n\n'
            yield f'data: {json.dumps({"type": "response.end", "turn_id": turn_id})}\n\n'
            
        elif event_type == 'message':
            # handle further turns
            text = payload.get('text', '')
            
            # more code ...
            
            # get conversation context based on conversation_id
            print("  → Loading context from storage...")
            history = get_context(conversation_id)
            print(f"  → User: {text}")
            
            # more code ...
            
            # store the user message immediately with its turn_id
            history.append({"role": "user", "turn_id": turn_id, "content": text})
            
            # more code ...
            
            # update conversation context
            save_context(conversation_id, history)
            
            messages = [{"role": "system", "content": SYSTEM_PROMPT}]
            # filter out turn_id for the LLM (it only needs role/content)
            messages.extend([{"role": m["role"], "content": m["content"]}
                             for m in history if m["content"]])
            
            print(f"  → Streaming LLM response with {len(history)} messages in context...")
            
            # generate response
            response = await client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages,
                stream=True
            )
            
            full_response = ""
            async for chunk in response:
                if chunk.choices[0].delta.content:
                    content = chunk.choices[0].delta.content
                    full_response += content
                    yield f'data: {json.dumps({"type": "response.tts", "content": content, "turn_id": turn_id})}\n\n'
            
            print(f"  → Agent: {full_response}")
            
            # add the agent's response to the conversation context
            history.append({"role": "assistant", "turn_id": turn_id, "content": full_response})
            
            print("  → Saving context to storage...")
            save_context(conversation_id, history)
            
            yield f'data: {json.dumps({"type": "response.end", "turn_id": turn_id})}\n\n'
            
        # more code ...
    
    # stream partial responses back to the media pipeline
    return StreamingResponse(generate(), media_type="text/event-stream")

Turn-based Voice AI Agent Pattern Demo:

Voice AI Agent Patterns Comparison: Key Differences

This table provides a simple breakdown of the two patterns.

| Feature | Conversation-based (Isolated) | Turn-based (Shared) |
|---|---|---|
| State | Stateful (In-Memory) | Stateless (Externalized) |
| Process Lifetime | Long-running (Full Session) | Short-running (Single Turn) |
| Scalability | 1-to-1 (Processes = Users) | N-to-M (Shared Pool) |
| Cost Model | Pay-per-session-time | Pay-per-turn/request |
| Context Mgt. | Simple | Complex |
| Best For… | Complex Dialogues, Max Performance | High Volume, Cost-Efficiency |

How to Choose the Right Voice AI Agent Pattern for Your Business

The right choice depends entirely on your business requirements.

Choose a Conversation-based AI Agent Pattern if…

  • Your primary goal is the most natural, lowest-latency user experience possible.
  • Your application involves complex, long-running interactions (e.g., a virtual therapist, a detailed sales consultation, an advanced co-pilot).
  • You want full control over the media pipeline.
  • You have a relatively predictable number of concurrent users and a budget to support dedicated processes.

Choose a Turn-based AI Agent Pattern if…

  • Your primary goal is massive scale and cost-efficiency.
  • You anticipate huge, spiky, and unpredictable traffic patterns (e.g., a marketing event or viral promotion).
  • Your media pipeline (session management plus the speech-to-text and text-to-speech models) is offloaded to an external service such as Layercode, Amazon Connect, or Twilio ConversationRelay.
  • Your interactions are more transactional (e.g., “check balance,” “book a flight,” “what’s the weather?”).

Making the Right Voice AI Architecture Decision

Your first decision is the most important. The choice between Conversation-based and Turn-based is a strategic business decision, not just a minor technical detail. It defines how your application will scale, what it will cost to operate, and how your customers will experience it.

Make a deliberate choice for your tech stack. The stateful (performance-first) and stateless (scale-first) models have fundamentally different cost and customer experience profiles.

Before you write a single line of code, let’s discuss your use case. Contact our expert AI Integration team at WebRTC.ventures for an architectural review to ensure your Voice AI agent is built on the right foundation for your specific business goals from day one.
