Blog
Real-World Agent Fabric Challenges (And How We Actually Solved Them)
- November 12, 2025
- Ajay Konda
By Ajay Konda, Co-Founder & Chief Growth Officer at Prowess Software Services
You know what nobody tells you about implementing Agent Fabric? The documentation makes it sound straightforward. The demos look clean. The presentations are polished.
Then you actually start implementing it, and boom – you hit problems that weren’t in any of those carefully prepared materials.
I’ve been architecting Agent Fabric implementations for about a year now. Worked with maybe twenty different organizations across healthcare, finance, retail, manufacturing. And honestly? Every single one hit unexpected issues. Different issues, sure, but nobody sailed through this without some bumps.
The thing is, most of these problems are solvable. You just need to know they’re coming and have a plan for dealing with them.
So let me walk you through the real challenges we’ve seen – not the theoretical ones from vendor presentations, but the actual messy problems that happen when you’re implementing this stuff in a real organization with real legacy systems and real organizational politics.
Challenge 1: Nobody Actually Knows What Agents They Have
This sounds basic, right? Before you can govern your agents, you need to know what agents exist.
Except here’s what actually happens. You ask your development teams “what agents are we running?” And they’ll tell you about the official ones. The ones in the project plans and architecture diagrams.
Then you start digging. And you find agents nobody mentioned. Marketing has this chatbot they spun up using some SaaS tool. Finance has an automation thing processing invoices that someone built during a hackathon three years ago. Customer service is piloting an AI assistant that wasn’t approved but is somehow handling 20% of their ticket volume.
Challenge 2: Existing Agents Weren't Built For Governance
Here’s a problem that hit almost every client. They’ve got agents running in production. These agents work. Maybe not perfectly, but they’re providing business value.
Then you try to bring them under Agent Fabric governance, and… they don’t fit the model. They weren’t designed with governance in mind. They make assumptions about how they’ll be deployed, how they’ll authenticate, what data they can access.
The Healthcare Example That Taught Us A Lesson
Healthcare client had five agents handling patient interactions. Built over eighteen months by different teams. All working fine independently.
We tried to implement Flex Gateway policies. Three of the five agents immediately broke.
Why? They were hardcoded to authenticate directly against the patient database. No concept of going through a gateway. The agents expected direct database credentials. Our policies required agent-to-system communication to go through the gateway with proper authorization tokens.
We had two options:
- Rebuild the agents from scratch (expensive, time-consuming)
- Create an adapter layer that translated between how the agents worked and how governance expected them to work
Guess which one we chose?
The adapter pattern saved us. We built a lightweight translation layer that sat between the agents and Flex Gateway. The agents still authenticated the old way, but the adapter translated that into proper token-based auth that the gateway understood.
Not ideal architecturally. Kind of hacky. But it let us bring existing agents under governance without a complete rewrite.
Over time, as those agents got updated for other reasons, we gradually refactored them to work natively with Agent Fabric. But the adapter let us get governance in place immediately rather than waiting months for rebuilds.
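A stripped-down version of that adapter pattern looks roughly like this. To be clear, this is a sketch, not the client's actual code: the class, method, and header names are all hypothetical, and the real adapter handled more than authentication.

```python
class LegacyAuthAdapter:
    """Translation layer (sketch): accepts the legacy agent's direct
    database-credential auth and exchanges it for the token-based
    auth the gateway expects. All names here are hypothetical."""

    def __init__(self, gateway_client, token_service):
        self._gateway = gateway_client
        self._tokens = token_service

    def forward(self, agent_request):
        # The agent still sends what it always sent: db credentials
        creds = agent_request.pop("db_credentials")
        # The adapter swaps them for a proper authorization token
        token = self._tokens.exchange_for_token(creds)
        agent_request["headers"] = {"Authorization": f"Bearer {token}"}
        return self._gateway.send(agent_request)
```

The agents never knew anything changed; the gateway only ever saw well-formed tokens.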
Lesson learned: Perfect is the enemy of good. Sometimes you need transitional architecture to bridge from where you are to where you want to be.
Challenge 3: Performance Impact Nobody Expected
Agent Fabric adds layers. Layers add latency. That’s just physics.
Registry lookups, policy evaluations, routing decisions, monitoring overhead – all of that takes time. Usually not much time. But when you’ve got agents making hundreds of calls per second, milliseconds add up.
The Manufacturing Disaster (That We Fixed)
Manufacturing client had agents coordinating their production line. Time-sensitive stuff – agents needed to make decisions in under 100 milliseconds or the line would back up.
We implemented Agent Broker for orchestration. Testing looked good. We deployed to production.
Within an hour, the production line started missing deadlines. Agent response times had gone from 60ms to 180ms. Not a huge increase in absolute terms, but enough to break their SLAs.
Management was furious. Wanted to roll back everything. Threatened to cancel the whole project.
We spent that weekend troubleshooting. Found three problems:
Problem 1: Agent Broker was doing synchronous policy evaluations. Every routing decision waited for policy checks to complete. We switched to asynchronous policy evaluation with cached results. Cut 40ms right there.
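The pattern here is stale-while-revalidate: serve the last known policy verdict immediately and refresh it off the hot path. A minimal sketch (not MuleSoft's implementation; the class name and 5-second TTL are assumptions):

```python
import threading
import time

class CachedPolicyEvaluator:
    """Stale-while-revalidate policy checks (illustrative sketch).

    Routing uses the cached verdict immediately; a background thread
    refreshes stale entries so the hot path rarely blocks on a slow
    policy-service call."""

    def __init__(self, evaluate_fn, ttl=5.0):
        self._evaluate = evaluate_fn      # slow call to the policy service
        self._ttl = ttl
        self._cache = {}                  # key -> (verdict, timestamp)
        self._lock = threading.Lock()

    def check(self, agent_id, action):
        key = (agent_id, action)
        now = time.monotonic()
        with self._lock:
            entry = self._cache.get(key)
        if entry and now - entry[1] < self._ttl:
            return entry[0]               # fresh cached verdict
        if entry:
            # Stale: return the old verdict, refresh in the background
            threading.Thread(target=self._refresh, args=(key,), daemon=True).start()
            return entry[0]
        # First call for this key: must evaluate synchronously once
        return self._refresh(key)

    def _refresh(self, key):
        verdict = self._evaluate(*key)
        with self._lock:
            self._cache[key] = (verdict, time.monotonic())
        return verdict
```

The trade-off is that a revoked permission can linger for up to one TTL, which is why the TTL has to be chosen with the security team, not just for speed.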
Problem 2: Agent Visualizer was logging everything in real-time. Every single interaction. Writing to the monitoring database was creating I/O bottlenecks. We moved to sampling – log 1 in every 100 routine interactions, but always log errors and anomalies. Saved another 30ms.
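The sampling logic is simple enough to sketch. Names are hypothetical, and the real implementation wrote to their monitoring pipeline rather than a Python logger:

```python
import logging

class SamplingLogger:
    """Log every error/anomaly, but only ~1 in N routine interactions
    (illustrative sketch)."""

    def __init__(self, logger, sample_rate=100):
        self._logger = logger
        self._rate = sample_rate
        self._count = 0

    def log_interaction(self, message, is_anomaly=False):
        """Returns True if the interaction was actually logged."""
        self._count += 1
        if is_anomaly:
            self._logger.warning(message)   # anomalies are always logged
            return True
        if self._count % self._rate == 0:   # 1-in-N routine sample
            self._logger.info(message)
            return True
        return False
```

A counter-based sample is deterministic and cheap; random sampling works just as well if you care about avoiding aliasing with periodic traffic patterns.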
Problem 3: Agent discovery was happening on every call. Agent Broker would query the registry to find the right agent, even for agents it had just talked to seconds earlier. We implemented intelligent caching with a 30-second TTL. Shaved off the final 20ms.
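The discovery cache is a classic TTL cache. A sketch, with `lookup_fn` standing in for the actual registry client:

```python
import time

class RegistryCache:
    """TTL cache in front of registry lookups (illustrative sketch).

    A 30-second TTL means a moved or re-registered agent is picked up
    within half a minute, while repeat lookups cost nothing."""

    def __init__(self, lookup_fn, ttl=30.0):
        self._lookup = lookup_fn
        self._ttl = ttl
        self._entries = {}   # agent name -> (endpoint, fetched_at)

    def resolve(self, name):
        hit = self._entries.get(name)
        if hit and time.monotonic() - hit[1] < self._ttl:
            return hit[0]                    # cache hit, no registry call
        endpoint = self._lookup(name)        # miss or expired: go to registry
        self._entries[name] = (endpoint, time.monotonic())
        return endpoint
```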
After those changes, we were actually 10ms faster than before Agent Fabric. The routing optimizations more than made up for the governance overhead.
Key lesson: Performance test with production-like load before you deploy. And build in monitoring from day one so you can see where bottlenecks are.
Challenge 4: Organizational Politics Are Real
This one’s messy because it’s not technical. But honestly? It’s killed more Agent Fabric projects than any technical challenge.
Different teams built different agents. Each team has invested time, effort, political capital in their agents. Now you’re telling them their agents need to be governed, cataloged, maybe consolidated with similar agents from other teams.
People get defensive. Territorial. “My agent is special. It can’t work under these governance policies. You don’t understand our requirements.”
The Financial Services Political Minefield
Financial services client had three different teams who’d each built fraud detection agents. Each team was convinced their approach was superior.
When we suggested consolidating to one agent (or at least coordinating them through Agent Broker), all three teams pushed back hard.
Risk team: “Our agent uses proprietary algorithms we can’t share with other teams.”
Operations team: “Our agent has to respond in real-time, can’t wait for orchestration overhead.”
Compliance team: “Our agent follows regulatory requirements the others ignore.”
All of these concerns were… not entirely baseless. But they also weren’t insurmountable. What we really had was a turf war disguised as technical requirements.
What actually worked: We didn’t fight it head-on.
Instead, we implemented Agent Fabric for a completely different use case first – their customer onboarding process. Different teams, no territorial issues. Got that working smoothly, proved the value, built organizational credibility.
Then six months later, we came back to fraud detection. By then:
- People had seen Agent Fabric working elsewhere
- We had metrics showing the benefits
- Leadership was bought in
- The political landscape had shifted
We still didn’t force consolidation. But we got all three fraud agents registered, properly governed, and coordinating through Agent Broker. They each kept their specialized algorithms. But now they shared information and could hand off suspicious cases to whichever agent was best suited.
Not the elegant single-agent solution we’d originally envisioned. But way better than three completely siloed agents.
Key lesson: Sometimes you need to go around obstacles rather than through them. Pick your battles. Build credibility with wins in less contentious areas first.
Challenge 5: The Legacy Integration Nightmare
Most organizations aren’t greenfield. You’re not starting fresh. You’ve got existing integration architecture – ESBs, point-to-point connections, custom middleware, probably some stuff that predates everyone currently working there.
Agent Fabric needs to work alongside all that. And documentation doesn’t really cover “here’s how to integrate with that ESB you implemented in 2012 that’s held together with duct tape and prayers.”
The Retail Client With The Frankenstein Integration Layer
Retail client had agents that needed to talk to their order management system. Sounds simple.
Except their OMS was behind an ancient ESB. Which connected to a legacy AS/400. Through a custom Java adapter. That hadn’t been updated since 2015. And nobody who wrote it still worked there.
We needed agents to place orders. Which meant going through Agent Fabric governance. Which meant routing through Flex Gateway. Which meant… somehow getting Flex Gateway to talk to that ESB.
First attempt: Failed spectacularly. The ESB couldn’t handle the authentication tokens Flex Gateway was sending. It expected ancient WS-Security headers that nothing has used since Obama’s first term.
Second attempt: We tried building an adapter. Worked in testing. Fell over in production because the ESB had undocumented rate limits that the adapter exceeded.
Third attempt: Worked. Here’s what we did:
Built a translation service that sat between Agent Fabric and the ESB. The service:
- Spoke modern API protocols toward Agent Fabric
- Spoke ancient ESB protocols toward the legacy system
- Implemented intelligent queuing to respect the ESB’s rate limits
- Added monitoring so we could see when things broke
- Cached responses where possible to reduce load
It was ugly. The code made me sad. But it worked.
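The queuing piece, in rough sketch form. The rate limit, class name, and `send_fn` are assumptions; the real service also handled the protocol translation and retries:

```python
import time
from collections import deque

class RateLimitedBridge:
    """Queues modern API calls and drains them at the legacy system's
    pace (illustrative sketch), so the ESB's undocumented rate limits
    are never exceeded."""

    def __init__(self, send_fn, max_per_sec=5):
        self._send = send_fn
        self._interval = 1.0 / max_per_sec   # minimum gap between sends
        self._queue = deque()
        self._last_sent = 0.0

    def submit(self, request):
        self._queue.append(request)

    def drain(self):
        """Send queued requests, sleeping as needed to respect the limit."""
        results = []
        while self._queue:
            wait = self._interval - (time.monotonic() - self._last_sent)
            if wait > 0:
                time.sleep(wait)
            results.append(self._send(self._queue.popleft()))
            self._last_sent = time.monotonic()
        return results
```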
And here’s the thing: That translation service became reusable. When we onboarded more agents that needed to talk to legacy systems, we just extended the same service.
Six months later, that “ugly hack” was processing 40% of their agent-to-legacy-system traffic and had become critical infrastructure.
Key lesson: Don’t let legacy integration stop you. Build bridges. They don’t have to be pretty, they just have to work. And sometimes ugly bridges become permanent infrastructure.
Challenge 6: Security Team Freakouts
Security teams – rightfully – freak out about AI agents. Autonomous systems making decisions, accessing data, interacting with customers? That’s scary from a security perspective.
When you implement Agent Fabric, you’re making the security team’s concerns visible in a way they weren’t before. Suddenly they can see in Agent Visualizer all the things agents are doing. And they don’t like what they see.
The Insurance Company Security Crisis
Insurance company implemented Agent Visualizer. Within a week, their CISO called an emergency meeting.
Turned out, their customer service agents were accessing way more customer data than anyone realized. Not maliciously. Just because nobody had ever properly scoped what data those agents actually needed.
One agent was pulling entire customer profiles (including SSN, financial info, medical history) when all it actually needed was name and policy number. It had been doing this for eight months. Nobody noticed because there was no visibility.
Now the CISO could see it in Agent Visualizer. And he was Not Happy.
The project almost got shut down. CISO wanted to pause everything until we did a complete security review.
What saved us: We turned it into a win.
“Look,” we said, “this problem existed before Agent Fabric. You just couldn’t see it. Now you can see it. And more importantly, we can fix it.”
We spent two weeks doing an intensive data access review:
- Documented what data each agent actually needed
- Created least-privilege policies in Flex Gateway
- Implemented those policies
- Monitored the results in Agent Visualizer
That customer service agent? Went from accessing 45 fields to accessing 6. Reduced data exposure by over 85%. And it still worked fine for its intended purpose.
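Conceptually, the least-privilege fix amounts to a per-agent field allow-list enforced at the gateway. A sketch with illustrative field names (not their real schema):

```python
# Per-agent field allow-list (field and agent names are illustrative)
AGENT_FIELD_ALLOWLIST = {
    "customer-service-agent": {
        "name", "policy_number", "policy_status",
        "contact_email", "open_claims", "renewal_date",
    },
}

def filter_profile(agent_id, profile):
    """Return only the fields this agent is allowed to see.
    Unknown agents get nothing (deny by default)."""
    allowed = AGENT_FIELD_ALLOWLIST.get(agent_id, set())
    return {k: v for k, v in profile.items() if k in allowed}
```

Deny-by-default matters here: an agent that isn't in the list sees zero fields until someone explicitly scopes it.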
We did this for all twenty-three agents they had running. Massive improvement in security posture.
The CISO went from wanting to shut down the project to becoming its biggest champion. He presented the results to the board as a security win.
Key lesson: Visibility can be scary at first, but it’s better than ignorance. Use Agent Fabric’s observability as a security feature, not just a bug. Make the security team your allies by helping them solve problems they didn’t even know they had.
Challenge 7: The "It Works In Dev" Syndrome
Testing Agent Fabric in a dev environment is one thing. Running it in production with real load, real users, real consequences? Completely different.
We’ve had multiple implementations that worked perfectly in testing and fell apart in production. Usually for reasons that seem obvious in hindsight but weren’t caught during testing.
The Healthcare Deployment That Went Sideways
Healthcare client tested their Agent Fabric implementation thoroughly. Three months of testing. Everything looked great. We deployed to production on a Tuesday morning.
By Tuesday afternoon, patient intake was backing up. Agents were timing out. Calls were getting dropped.
What happened? In production, they had 50x more concurrent users than they’d tested with. The registry couldn’t handle that many simultaneous lookups. Agent Broker’s routing queues filled up. Everything started failing in a cascading way.
We spent a very stressful 18 hours:
- Scaling up registry infrastructure (more instances, better caching)
- Increasing Agent Broker queue sizes
- Implementing circuit breakers so failures didn’t cascade
- Adding load balancing we should’ve had from the start
Got it stable by Wednesday evening. But it was a rough 36 hours.
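The circuit breaker was the piece that stopped failures from cascading. A minimal sketch, with thresholds and names made up for illustration:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker (illustrative sketch): after `threshold`
    consecutive failures, calls fail fast for `cooldown` seconds
    instead of piling onto a struggling downstream service."""

    def __init__(self, call_fn, threshold=3, cooldown=30.0):
        self._call = call_fn
        self._threshold = threshold
        self._cooldown = cooldown
        self._failures = 0
        self._opened_at = None   # set when the circuit opens

    def call(self, *args):
        if self._opened_at is not None:
            if time.monotonic() - self._opened_at < self._cooldown:
                raise RuntimeError("circuit open: failing fast")
            self._opened_at = None    # cooldown over, allow a retry
            self._failures = 0
        try:
            result = self._call(*args)
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = time.monotonic()
            raise
        self._failures = 0            # any success resets the count
        return result
```

Failing fast sounds counterintuitive until you watch a queue of timed-out retries take down every service upstream of the one that was actually broken.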
What we learned: Load testing is not optional. And realistic load testing – with production-like patterns, not just “run a bunch of requests” – is essential.
Now we use a phased rollout approach:
- Week 1: 10% of traffic through Agent Fabric
- Week 2: 25% of traffic
- Week 3: 50% of traffic
- Week 4: 100% of traffic
At each phase, we monitor like crazy. If we see problems, we can roll back before everything’s on fire. It takes longer. But way less stressful than the alternative.
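The traffic split itself can be as simple as deterministic hashing. A sketch (in practice this logic lives in your gateway or load balancer, not application code):

```python
import hashlib

def routes_to_fabric(request_id, rollout_percent):
    """Deterministically route a stable percentage of traffic through
    Agent Fabric (illustrative sketch). Hashing a stable id keeps each
    caller on the same path for the whole phase, so you're comparing
    consistent populations rather than randomly flapping requests."""
    digest = hashlib.sha256(request_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100   # stable bucket in 0..99
    return bucket < rollout_percent
```

Bumping `rollout_percent` from 10 to 25 to 50 to 100 only ever adds callers to the Fabric path; nobody who was already on it gets moved back.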
Challenge 8: Documentation Drift
Here’s something nobody warns you about. You implement Agent Fabric. You document everything beautifully. Registry entries are complete. Policies are documented. Runbooks are written.
Six months later, half of it’s out of date.
Agents got updated but registry entries weren’t. Policies changed but documentation didn’t. New team members joined and nobody told them about Agent Fabric governance requirements.
Documentation drift is real, and it’s insidious because it happens gradually.
The Manufacturing Client’s Documentation Problem
Manufacturing client had great documentation at implementation. Really thorough. Gold star.
A year later, we came back for an expansion project. Tried to use their registry to understand their agent architecture.
Couldn’t. Registry was hopelessly out of date. Agents listed that no longer existed. New agents not listed at all. Descriptions that bore no resemblance to what the agents actually did.
We had to do the audit all over again. From scratch.
The solution that actually stuck:
Made documentation part of the deployment pipeline. You literally could not deploy an agent update without updating the registry. CI/CD pipeline checked:
- Is this agent registered?
- Does the registry entry match the current code?
- Are the documented capabilities accurate?
- Is the ownership info current?
If any of those checks failed, deployment failed.
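A pipeline gate like that can be sketched as a function that diffs the registry entry against the deployment manifest. Field names here are illustrative, not the real registry schema:

```python
def check_registry_entry(registry_entry, deployed_manifest):
    """CI/CD gate (illustrative sketch): compare the registry entry
    against the agent's deployment manifest and return a list of
    blocking errors. An empty list means the deploy may proceed."""
    if registry_entry is None:
        return ["agent is not registered"]

    errors = []
    if registry_entry.get("version") != deployed_manifest.get("version"):
        errors.append("registry version does not match deployed code")

    declared = set(deployed_manifest.get("capabilities", []))
    documented = set(registry_entry.get("capabilities", []))
    if declared != documented:
        errors.append("documented capabilities are out of date")

    if not registry_entry.get("owner"):
        errors.append("ownership info is missing")

    return errors   # non-empty list -> fail the deployment
```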
Annoying for developers at first. But after a few weeks, it became habit. And documentation stayed current because keeping it current was easier than fighting the deployment pipeline.
Also implemented quarterly registry reviews. Someone from each team spent an hour verifying their agents’ registry entries were accurate. Made it part of team leads’ objectives so it actually happened.
Not sexy. Not exciting. But it worked.
What Actually Prevents These Problems
After dealing with all these challenges across multiple clients, some patterns emerged for preventing issues before they happen:
Start Small, Win Quick: Don't try to govern every agent on day one. Pick one workflow. Make it work perfectly. Build credibility. Then expand.
Automate Everything Possible: If it requires human discipline to maintain, it will fail. Build automation. Make the right thing the easy thing.
Build Monitoring From Day One: You can't fix what you can't see. Agent Visualizer isn't optional. It's essential for catching problems before they become disasters.
Get Security Involved Early: Don't surprise the security team. Make them partners from the beginning. They'll help you, not fight you.
Plan For Legacy: You will need to integrate with systems that weren't designed for modern governance. Budget time and resources for translation layers.
Document The Weird Stuff: Don't just document the happy path. Document the edge cases, the workarounds, the "we did this weird thing because…" explanations.
Test With Production Load: Dev testing is necessary but not sufficient. Load test with realistic patterns before you deploy.
Create Feedback Loops: Regular retrospectives. Quarterly reviews. Continuous improvement. Agent Fabric isn't a project you finish, it's infrastructure you maintain.
The Honest Truth
Implementing Agent Fabric is hard. Not technically impossible, but definitely challenging. You will hit unexpected problems. Things will break. People will be frustrated.
But here’s the thing – it’s still worth doing. Every organization that’s gotten through the rough spots has said the same thing: “Glad we did this when we did, rather than waiting until we had even more agents and even more problems.”
The challenges are real. But they’re solvable. And the alternative – letting agents proliferate without governance – is way worse.
Just go in with eyes open. Expect problems. Build slack into your timeline. Don’t promise your CEO that everything will be perfect by end of quarter.
And when you hit challenges? Reach out to people who’ve been there. We’ve probably seen your problem before and can save you some pain.
That’s kind of the point of writing this, honestly. So the next person implementing Agent Fabric can skip some of the mistakes we made and learn from our expensive lessons.
Good luck out there.
About the Author
Ajay Konda is the Co-founder and Chief Growth Officer at Prowess Software Services. He leads the company’s innovation in MuleSoft Agent Fabric, helping enterprises build governed, orchestrated, and observable agent ecosystems. Ajay is passionate about solving enterprise-scale challenges through Agent Fabric, turning fragmented AI initiatives into connected, compliant, and high-performing agent networks. Reach out at ajay.konda@prowesssoft.com