What We Learned Building 54 AI Agents (And Why Most of Them Almost Didn't Work)
The real story of building an AI team — including the failures, the rewrites, and the moment everything clicked.
When people hear that Waymaker has 54 AI agents, they usually assume we built them all at once with some grand architecture document. The truth is messier, more interesting, and a lot more instructive.
The first agent was Cameron — the AI cofounder. Version one was one prompt, one API call, and it mostly just repeated itself. It would tell you your idea was great no matter what you described. It would forget your name between messages. It had the strategic depth of a fortune cookie. I remember showing it to a friend who said, "So it's ChatGPT with a name?" He wasn't wrong.
Today Cameron has 48 tools, 18 slash commands, 20 built-in skills, persistent memory across sessions, voice input and output, dream mode for subconscious ideation, and the ability to delegate to any of the other 53 agents. Getting from point A to point B nearly broke us three separate times. Here's what we learned along the way — not the polished version, but the real one.
Lesson 1: Specialists Beat Generalists Every Time
Our first instinct was the obvious one: make Cameron do everything. One mega-prompt. Thousands of tokens of instructions covering market research, code generation, copywriting, financial modeling, competitive analysis, project management, and emotional coaching. The result was an agent that did all of those things poorly. It hallucinated market data. It wrote code with confident syntax errors. Its marketing copy read like a compliance document. And the longer the conversation went, the worse it got, because the context window was drowning in instructions it couldn't prioritize.
The breakthrough came when I stopped thinking about AI architecture and started thinking about org charts. Real companies don't hire one person to do everything. They hire specialists and organize them into departments. So that's what we did. We split the agents into seven departments — exactly like a real company: Product (vision, roadmap, specs), Engineering (frontend, backend, code review, QA), Marketing (content, SEO, social, email), Sales (CRM, lead scoring, outreach), Executive (strategy, finance, legal, pricing), Operations (analytics, integrations, automation), and Coaching (focus, energy, motivation, founder wellness).
Each agent has ONE job. The Market Radar agent doesn't write code. The Frontend Developer agent doesn't do competitive analysis. The Finance agent doesn't write blog posts. This sounds obvious in retrospect, but it was genuinely counterintuitive at the time — we kept thinking a "smarter" model would make a generalist work. It didn't. Constraints made every single agent dramatically better. A focused agent with a tight system prompt and three relevant tools will outperform a generalist with a PhD-length prompt and access to everything, every single time.
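To make the contrast concrete, here's a rough sketch of what "one job, a tight prompt, a few tools" looks like in code. The dataclass shape, the department assignments, and the system prompts below are illustrative stand-ins rather than our actual implementation; the tool names are borrowed from the registry described in the next lesson.

```python
# Illustrative sketch only: the dataclass shape, department assignments, and
# prompts are stand-ins, not Waymaker's actual code. The point is the pattern --
# one narrow job, a short system prompt, and only the tools that job needs.
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    name: str                      # human-readable agent name
    department: str                # one of the seven departments
    system_prompt: str             # tight, single-purpose instructions
    tools: list[str] = field(default_factory=list)  # only what this job needs

SPECIALISTS = [
    AgentSpec(
        name="Market Radar",
        department="Product",
        system_prompt="You track market and competitor signals. Nothing else.",
        tools=["get_product_info", "search_memories"],
    ),
    AgentSpec(
        name="Frontend Developer",
        department="Engineering",
        system_prompt="You write and review frontend code. Nothing else.",
        tools=["execute_code", "manage_files"],
    ),
    AgentSpec(
        name="Finance",
        department="Executive",
        system_prompt="You analyze revenue, costs, and runway. Nothing else.",
        tools=["get_user_context", "get_product_info"],
    ),
]
```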
Lesson 2: Tools Beat Prompts
For the first few months, our agents were basically elaborate prompt chains. "Pretend you have access to the user's product data." "Imagine you can search the market." "Act as if you can query the database." The results were creative fiction. The agents would invent plausible-sounding market statistics, fabricate competitor features, and generate code that referenced APIs that didn't exist. We were asking language models to hallucinate competently, and then acting surprised when they hallucinated.
The second breakthrough was giving agents real tools instead of asking them to imagine outputs. We built a tool registry with 28 tools — search_memories, get_user_context, get_product_info, execute_code, manage_files, call_agents_parallel, and more. The shift from "pretend you have access to the database" to "here's a function that queries the database and returns real results" was night and day. Agents stopped making things up because they didn't have to.
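If that sounds abstract, here's a minimal sketch of the idea. The decorator shape and the stubbed function bodies are assumptions for illustration; only the tool names match the real registry.

```python
# Minimal sketch of a tool registry. The decorator shape and stubbed bodies are
# illustrative assumptions; only the tool names match the real registry.
# The key idea: each tool is a real function returning real data, with a
# description the LLM can use to decide when to call it.
TOOLS: dict[str, dict] = {}

def tool(name: str, description: str):
    """Register a function as a callable tool, described for the LLM."""
    def decorator(fn):
        TOOLS[name] = {"fn": fn, "description": description}
        return fn
    return decorator

@tool("get_product_info", "Fetch the user's product details from the database.")
def get_product_info(user_id: str) -> dict:
    # A real implementation queries the database; this stub just shows the shape.
    return {"name": "Example SaaS", "stage": "beta", "signups_last_week": 12}

@tool("search_memories", "Search prior conversation summaries for relevant context.")
def search_memories(user_id: str, query: str) -> list[dict]:
    # A real implementation hits a memory index; stubbed here for illustration.
    return [{"summary": "Founder was worried about churn last week.", "score": 0.9}]
```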
The architecture is straightforward, but it took us three iterations to get right. Every agent extends BaseAgent, which provides a call_llm_with_tools() loop. The agent sends its prompt to the LLM, the LLM decides which tools to call (if any), the tools execute and return real data, and the loop continues until the agent has what it needs. All of this streams back to the user in real time via Server-Sent Events, so you can watch the agent think, act, and respond. No spinners. No "please wait 30 seconds." You see it working, tool call by tool call. That transparency turned out to matter more than we expected — users trust agents they can watch.
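For the curious, here's the rough shape of that loop. The method name call_llm_with_tools() is the real one; the client interface, the event dictionaries, and the simplified history handling are hedged stand-ins, not the production code.

```python
# Simplified sketch of the BaseAgent loop. call_llm_with_tools() is the real
# method name; the client interface, event format, and history handling are
# illustrative assumptions, not the production code.
import json

class BaseAgent:
    def __init__(self, llm_client, tools: dict, system_prompt: str):
        self.llm = llm_client            # any chat-completions-style client
        self.tools = tools               # name -> {"fn": ..., "description": ...}
        self.system_prompt = system_prompt

    def call_llm_with_tools(self, messages: list[dict]):
        """Ask the LLM, execute any tools it requests, feed the results back,
        and yield events the caller streams to the user (e.g. over SSE)."""
        history = [{"role": "system", "content": self.system_prompt}, *messages]
        while True:
            response = self.llm.chat(history, tools=self._tool_schemas())
            if not response.tool_calls:             # no tools requested: final answer
                yield {"type": "answer", "content": response.content}
                return
            for call in response.tool_calls:
                yield {"type": "tool_call", "name": call.name}   # user watches it work
                result = self.tools[call.name]["fn"](**call.arguments)
                yield {"type": "tool_result", "name": call.name}
                history.append({"role": "tool", "name": call.name,
                                "content": json.dumps(result)})

    def _tool_schemas(self) -> list[dict]:
        return [{"name": name, "description": spec["description"]}
                for name, spec in self.tools.items()]
```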
Lesson 3: Memory Changes Everything
Early Cameron had the memory of a goldfish. Every conversation started from zero. Users would explain their product, their market, their goals, their constraints — and then the next day they'd have to do it all over again. It was like working with a brilliant colleague who got amnesia every night. The advice was good in isolation but disconnected from everything that came before. Users got frustrated fast, and honestly, so did we.
The third breakthrough was persistent memory. We built three layers of it. First, cross-session memory: when a conversation hits 40 messages, Cameron automatically compacts it into a summary and stores it. Next session, those prior summaries get injected into the LLM context, so Cameron remembers what you talked about last week, what decisions you made, what you were struggling with. Second, the Brain context service: Cameron doesn't just remember conversations — it has awareness of your actual business data. Your leads, your campaigns, your financials, your operational metrics. It knows you got 12 new signups yesterday and your email open rate dropped 15% this week, and it connects those dots without you having to tell it. Third, proactive nudges: 15 detectors that run every 5 minutes, scanning for drift, procrastination, missed opportunities, and potential problems. Cameron doesn't wait for you to ask — it notices you haven't updated your roadmap in two weeks and asks if you're stuck.
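Here's a rough sketch of the first layer, the auto-compaction, plus the context assembly that happens when a new session starts. The 40-message threshold is real; the summarizer prompt, the memory store, and the Brain interface are illustrative placeholders.

```python
# Sketch of cross-session memory. The 40-message threshold matches what we
# ship; the summarizer prompt, memory_store, and brain interfaces are
# illustrative placeholders, not the real service boundaries.
COMPACTION_THRESHOLD = 40

def maybe_compact(session, llm, memory_store):
    """Once a session passes the threshold, summarize it and store the summary
    so future sessions can inject it back into the LLM context."""
    if len(session.messages) < COMPACTION_THRESHOLD:
        return
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in session.messages)
    summary = llm.complete(
        "Summarize this founder conversation: decisions made, open problems, "
        "and anything worth remembering next session.\n\n" + transcript
    )
    memory_store.save(user_id=session.user_id, summary=summary)

def build_session_context(user_id, memory_store, brain):
    """What gets injected at the start of a new session: prior summaries plus
    live business data from the Brain context service."""
    return {
        "memories": memory_store.recent(user_id, limit=5),
        "business": brain.snapshot(user_id),   # signups, open rates, pipeline, etc.
    }
```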
Memory transformed Cameron from a chatbot into something that actually feels like a cofounder. The difference between an AI that says "tell me about your product" and one that says "last time we talked, you were worried about your churn rate — I looked at this week's numbers and I have some thoughts" is the difference between a stranger and a partner. That's not a feature. That's the entire product.
Lesson 4: The SmartRouter Was the Hardest Part
Here's a problem that sounds trivial and absolutely isn't: when a user says something, which of the 54 agents should handle it? "Help me write a landing page" — is that the Marketing Copywriter, the Frontend Developer, or the Product Strategist? "My revenue is flat" — is that Finance, Marketing, Sales, or Cameron doing a coaching session? "Review this code and tell me if it's ready to ship" — Code Review agent or QA agent? Every ambiguous request is a routing decision, and bad routing means the user gets a confused response from the wrong specialist.
The SmartRouter went through four complete rewrites. The first version was keyword matching — embarrassingly bad. The second was a classifier — better but brittle. The third was an LLM call with the full agent registry in the prompt — accurate but slow. The final version uses selective tools: the router has access to search_memories, get_user_context, get_product_info, and get_missions, so it can understand the user's full context before deciding who should respond. It knows that when a SaaS founder asks about "conversion," they probably mean the Sales agent, but when a course creator uses the same word, they probably mean the Marketing agent.
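Stripped down, the final version looks something like this. The four tool names are the ones the router really has; the prompt wording, the context shapes, and the guard at the end are illustrative assumptions.

```python
# Sketch of context-aware routing. The four tool names are the ones the router
# actually uses; the prompt wording, context shapes, and the guard at the end
# are illustrative assumptions.
import json

def route(message: str, user_id: str, llm, tools, agent_registry: list[str]) -> str:
    """Gather the user's context with a few cheap tool calls, then let the
    LLM pick exactly one agent instead of keyword-matching the message."""
    context = {
        "memories": tools["search_memories"]["fn"](user_id=user_id, query=message),
        "user": tools["get_user_context"]["fn"](user_id=user_id),
        "product": tools["get_product_info"]["fn"](user_id=user_id),
        "missions": tools["get_missions"]["fn"](user_id=user_id),
    }
    prompt = (
        "You are a router. Given the user's message and context, pick exactly "
        "one agent from this list and answer with only its name:\n"
        f"{', '.join(agent_registry)}\n\n"
        f"Context: {json.dumps(context, default=str)}\n"
        f"Message: {message}\n"
    )
    choice = llm.complete(prompt).strip()
    # Guard against a malformed model answer, not a missing agent: every
    # registered agent is fully implemented.
    return choice if choice in agent_registry else "Cameron"
```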
One thing I'm proud of: the agent fallback list is now empty. All 43 registered backend agents are fully implemented. No stubs. No "coming soon" placeholders. No fallback to a generic response. Every agent does its actual job with real tools. That took over a year to achieve, and there were long stretches where half the agents were essentially wrappers around a generic prompt. Getting to zero fallbacks was a grind, but it means every routing decision lands on an agent that can actually deliver.
Lesson 5: Ship Ugly, Fix Live
The first version of every single agent was embarrassing. The Marketing Copywriter agent wrote copy that sounded like a corporate press release from 2008. The Code Review agent missed obvious bugs and flagged working code as broken. The Finance agent hallucinated revenue projections with decimal-point precision and absolute confidence. The Competitive Analysis agent would sometimes compare your product to companies that didn't exist. I'm not exaggerating. These were bad.
But we shipped them anyway. Every single one. Because an agent that exists in production, getting real user feedback, getting real edge cases thrown at it, improving week over week — that agent is infinitely more valuable than a perfect agent sitting in a development branch that nobody uses. The agents that exist today have been through 10 to 20 revision cycles each, trained on real user interactions, real failures, and real feedback. The Marketing Copywriter now writes copy that founders actually use on their landing pages. The Code Review agent catches architectural issues that junior developers miss. The Finance agent connects to real revenue data instead of imagining it.
This is the hardest lesson for perfectionists, and I am one. The instinct is to polish in private and reveal when it's ready. But "ready" never comes for AI agents because the edge cases are infinite. You have to let users find the edges. You have to let the agent fail in public. You have to be comfortable with "this is 60% of what it should be, but it's live and learning." Every week it gets better. That only happens if you ship.
The Numbers Today
54 agents across seven departments. 28 tools in the shared registry, plus 48 tools, 18 slash commands, and 20 built-in skills on Cameron alone. Cross-session memory with auto-compaction at 40 messages. 15 proactive detectors running every 5 minutes. Zero agent fallbacks. Every agent fully implemented.
The Lesson That Keeps Repeating
If you're building AI agents — whether it's 1 or 100 — the lesson we keep learning is the same: start specific, give them real tools, give them memory, and ship before you're comfortable. Don't build a generalist and hope it figures things out. Don't ask an LLM to pretend it has capabilities — give it actual functions that return real data. Don't treat each conversation as isolated — build memory so the agent gets smarter over time. And don't wait until it's perfect, because it never will be.
The agent that works today — imperfectly, with rough edges, with failure modes you haven't discovered yet — is infinitely more valuable than the perfect agent that never ships.
That's how 54 agents got built. Not all at once. Not perfectly. But relentlessly — one lesson, one failure, one rewrite at a time. And we're not done. We never will be. That's the point.