LLMs alone can’t “act.” They generate text. The key to success, and to avoiding the fate of the 80% of AI projects that never leave the prototype stage, is moving beyond conversation to orchestration: integrating LLM reasoning with automation frameworks, enabling explainable outcomes and human oversight, adding guardrails, and solving real pain points.

Joining us on WebRTC Live episode 109 to demonstrate how AI agents can be orchestrated to take meaningful action within production-grade environments were Alberto González and Mariana López, CTO and COO of WebRTC.ventures and AgilityFeat. Drawing from the companies’ AI integration work in real-time and asynchronous applications, they showed us how to identify high-value opportunities for agentic automation, integrate AI agents into existing systems, and apply design patterns for agentic workflows, including the single-agent chatbot, RAG agent, multi-agent coordination, human-in-the-loop, and event-driven agent patterns. They also outlined strategies for maintaining and scaling sustainable AI operations.

The episode also features Arin Sime and Tsahi Levent-Levi’s Monthly WebRTC Industry Chat. This month, they discussed the importance of monitoring user-side CPU issues. Watch on YouTube.

Watch Episode 109: Agentic Workflows That Work in Production

Episode highlights and key insights below.

Key Insights

Successful AI starts with the problem, not the model. Many teams fail by assuming LLMs can solve everything. But effective AI systems are built by first identifying the real problem, then choosing the right tools for each task.

Mariana explains, “I think that that’s where a lot of us fail in thinking LLMs can solve everything and not thinking, ‘Well, what is the real pain point here? What are we trying to solve? And then let’s find the perfect tool for each of the pain points that we have and combine them.’ And I think that that’s where it is really important that you have a technologist that can help you and that can help build the solution for you, because they will know, they’ll say, ‘Okay, this is a perfect use case for AI, we can build this in,’ but then you also have to think about, how do I add guardrails, how do I make sure that it’s not approving all of my loans, that it’s escalating when it has to, that it’s not hallucinating?”
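To make that concrete, here is a minimal human-in-the-loop sketch in Python. All the names and thresholds (LoanDecision, MAX_AUTO_APPROVE_AMOUNT, route_decision) are hypothetical; the point is the pattern Mariana describes: the model only proposes a decision, and deterministic code decides whether to act on it or escalate to a person.

```python
from dataclasses import dataclass

@dataclass
class LoanDecision:
    applicant_id: str
    approve: bool
    amount: float
    confidence: float  # model's self-reported confidence, 0..1

# Hypothetical policy thresholds; set these from business rules, not the model.
MAX_AUTO_APPROVE_AMOUNT = 10_000
MIN_CONFIDENCE = 0.9

def route_decision(decision: LoanDecision) -> str:
    """Deterministic guardrail: the LLM proposes, this code disposes."""
    if not decision.approve:
        return "escalate"  # never auto-deny; a human reviews rejections
    if decision.amount > MAX_AUTO_APPROVE_AMOUNT:
        return "escalate"  # large loans always need human sign-off
    if decision.confidence < MIN_CONFIDENCE:
        return "escalate"  # low confidence goes to human-in-the-loop review
    return "auto_approve"

print(route_decision(LoanDecision("a-123", True, 50_000, 0.95)))  # -> escalate
```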

There is no “best” model, only the right model for your use case. Model quality is not universal; it’s contextual. Choosing the right model, therefore, isn’t about picking a winner from a leaderboard, but about evaluating what best supports your specific goals.

Alberto explains, “Quality is a bit subjective. No matter who you ask, everyone will have their preferred option. I think it varies. Also, there are all these tests out there that evaluate different points of view on how these models perform. Some are very good at code. We know others are for maths or for conversation, others are more polite, you name it, there are all these tests. So I think it’s a bit about finding the one that fits what you want to build, that fits also the cost, of course, of what you want to do, and in terms of latency. So I would say it depends, as a consultant, that’s the typical answer, but I would evaluate for the specific use case. See what latency you want; sometimes more accurate LLMs are slower. So we don’t have a preferred option that we think goes well with everything.”
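One way to act on that advice is a small, use-case-specific evaluation harness instead of a public leaderboard. The sketch below is hypothetical throughout, with a stub standing in for the real model API, and it measures exactly the two trade-offs Alberto mentions: accuracy on your own prompts and latency.

```python
import time

def toy_model(prompt: str) -> str:
    """Stand-in for a real provider call; swap in each candidate model here."""
    time.sleep(0.05)  # simulate network latency
    return "Paris"

# Prompts and expected answers drawn from *your* application, not a benchmark.
TEST_SET = [
    ("What is the capital of France?", "Paris"),
    ("Which city hosts the Eiffel Tower?", "Paris"),
]

def evaluate(model, test_set):
    correct, latencies = 0, []
    for prompt, expected in test_set:
        start = time.perf_counter()
        answer = model(prompt)
        latencies.append(time.perf_counter() - start)
        correct += int(expected.lower() in answer.lower())
    return {
        "accuracy": correct / len(test_set),
        "median_latency_s": sorted(latencies)[len(latencies) // 2],
    }

print(evaluate(toy_model, TEST_SET))  # compare these numbers across models
```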

Designing AI safely requires prioritizing guardrails, privacy, and security from the start. Without proper safeguards, personally identifiable information (PII) can travel across multiple third-party systems, creating serious privacy and security risks. 

As Alberto explains, “I can talk from the WebRTC perspective or the real-time perspective. The danger is that you are sending whatever is being said by the user directly to a model, so it’s obviously a bit dangerous that PII could be captured by a third-party model, then sent to another element that’s another third party. So it’s PII going everywhere, and obviously, someone could even hack that model provider and get access to it, which would be horrible. So you want to make sure you block that from the beginning.”
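A common first line of defense is redacting PII before any text leaves your infrastructure. The Python sketch below is a minimal illustration, assuming simple regex-based detection; production systems typically layer a dedicated PII-detection service on top of something like this.

```python
import re

# Hypothetical patterns covering a few common PII shapes; extend as needed.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace detected PII with placeholders before the text reaches
    any third-party model, so downstream systems never see the raw data."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Reach me at +1 (555) 123-4567 or jane.doe@example.com"))
# -> "Reach me at [PHONE] or [EMAIL]"
```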

Episode Highlights

Production-ready AI requires system design, not just smart models

Everyone’s building AI features. Very few are building AI systems that actually work in production. 

So, how do you move from the text response of an LLM to an agentic workflow that can take action in the real world? Mariana explains, “I think when you do that, you’ll have to think about what are the strengths of the LLM generating text and how can you add that into a workflow that has knowledge bases, that can perform actions, give it the right tooling. But also, when you do that, when you give it more power than just generating words, how do you also enable guardrails, observability, and make sure that you test everything in a production environment? So it’s essentially embedding something that can generate, that can create, into an existing workflow, and giving it just the right amount of context and just the right amount of tools.”
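As a sketch of that “right amount of tools” idea, here is a minimal tool-calling loop in Python. Everything in it is hypothetical (the call_llm stub, the lookup_order tool, the JSON plan format); the pattern is what matters: the model only reasons, explicitly registered code performs the actions, and every step is logged for observability.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

def lookup_order(order_id: str) -> str:
    """Stand-in for a call into your existing order system."""
    return f"Order {order_id}: shipped"

# Allow-list: the agent can only invoke tools you explicitly register.
TOOLS = {"lookup_order": lookup_order}

def call_llm(user_message: str) -> str:
    """Stub for the real model call; assume the model is prompted to
    answer with JSON like {"tool": "...", "args": {...}}."""
    return json.dumps({"tool": "lookup_order", "args": {"order_id": "42"}})

def run_agent(user_message: str) -> str:
    plan = json.loads(call_llm(user_message))          # reasoning stage
    tool = TOOLS.get(plan.get("tool"))
    if tool is None:                                   # guardrail: fail closed
        log.warning("Blocked unregistered tool: %r", plan.get("tool"))
        return "Sorry, I can't help with that."
    log.info("Running %s(%s)", plan["tool"], plan["args"])  # observability
    return tool(**plan["args"])                        # execution stage

print(run_agent("Where is my order 42?"))
```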

Hallucinations are a system design problem, not a prompting problem

Most teams try to solve AI hallucinations at the prompt level: more instructions, more constraints, more clever wording. But in production systems, that approach breaks down quickly. 

As Alberto explains in this episode, the real solution isn’t better prompting, it’s better system design: “There isn’t a hundred percent solution out there, but we treat hallucination today like a control problem and not a prompt problem. It means it’s not just about telling the LLM, ‘Please, please, please, don’t hallucinate’ or ‘Please, please, please never answer this or that.’ This is never going to work because, as probably everyone knows, they sometimes fail, they sometimes hallucinate. So what you do then is have this separation: a reasoning stage, which is the LLM, and an execution stage, which is just built by code. So you basically will have maybe a decision table, maybe a specific piece of software. You don’t need to have a table; it could be just a flag, just some software, some policy system. There can be many ways to do that, but basically, some way to rely on something else to make decisions.”
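Here is a minimal sketch of that reasoning/execution split, with hypothetical names throughout: the LLM only classifies the request into a closed label set, a plain decision table owned by code maps labels to actions, and any label outside the set fails closed instead of executing a hallucinated action.

```python
# Execution stage: a fixed decision table owned by code, not by the model.
DECISION_TABLE = {
    "refund":     lambda req: f"refund issued for order {req['order_id']}",
    "reschedule": lambda req: f"appointment moved to {req['new_time']}",
    "escalate":   lambda req: "ticket routed to a human agent",
}

def classify_intent(message: str) -> str:
    """Reasoning stage: stub for the LLM call, which is prompted to return
    exactly one label from DECISION_TABLE's keys."""
    return "refund"

def handle(message: str, request: dict) -> str:
    intent = classify_intent(message)
    if intent not in DECISION_TABLE:   # hallucinated label -> fail closed
        intent = "escalate"
    return DECISION_TABLE[intent](request)

print(handle("I want my money back", {"order_id": "42"}))
# -> "refund issued for order 42"
```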

Most AI projects fail because they don’t solve real user pain

Chasing the latest AI trends often results in impressive demos but weak, unusable products. As Mariana explains in this episode, successful AI implementations start by addressing real user friction, not by adding technology for its own sake.

She says, “There’s always this race for new tools or new technology, and everybody rushed to create the AI chatbot. And we have to stop and think, ‘Well, is this really going to make our users’ journey better? Or are we just building for the sake of building?’ And you have to think about where our pain points are, what are the friction points that our users are really feeling. And when you have real pain points or things that you haven’t been able to solve, then you can start thinking, ‘Well, is AI the right technology here?’”


Up Next! WebRTC Live #110

Everything You Need to Know About TURN Servers

TURN servers remain one of the most common points of confusion in WebRTC applications. That’s why we’ve assembled a panel of experts to cover everything you need to know.

Wednesday, February 25, 2026 at 10:00 am Eastern (note early time!)

Register for WebRTC Live 110
