54 AI Agents Inside Aurelius: Our Claude Code Framework

Most AI coding tools are generalists. You open a chat window, describe what you want, and a single model does its best to be a frontend developer, a backend engineer, a designer, a tester, and a copywriter all at once. That works fine for a quick script or a throwaway prototype. It falls apart the moment you try to ship real product work for a client, because real product work isn't one job. It's a dozen jobs, each with its own standards, and a generalist quietly cuts corners on most of them.

So we built something different. It's called Aurelius, named after Marcus Aurelius, and it's the framework that sits behind how PMDS actually ships apps. Aurelius is a Claude Code-integrated React framework where 54 specialized AI agents, each one auto-selected based on the task at hand, handle engineering, design, testing, marketing, and ops. Instead of one model pretending to be good at everything, you get the right specialist for each step, working inside a pipeline that refuses to skip the boring-but-critical parts.

I build websites and apps for small businesses. That's the lens I see everything through. A small-business client doesn't care how clever your AI setup is. They care whether the thing works, looks like their brand, loads fast, and doesn't break next month. Aurelius is how we get there consistently, and this post is a walk through what's actually inside it.

What Aurelius Actually Is

At its core, Aurelius is an AI development framework that plugs directly into Claude Code. It's multi-framework on the output side, so it can target Next.js or Vite depending on the project, and everything it produces is TypeScript with Tailwind CSS. The headline feature is that it turns a design into working, tested code automatically, rather than leaving you to hand-translate a Figma file into components and hope you got the spacing right.

The capabilities that matter most are easy to list: 54 specialized agents, 20 reusable skills, a 10-phase design-to-React pipeline, and app-type awareness so it knows whether it's building a standard web app, a Chrome extension, or a PWA and adjusts accordingly. On top of that sits a full testing stack: Vitest and React Testing Library for unit and component tests, Playwright for end-to-end flows, Storybook for component documentation, and pixel-diff QA to catch visual regressions a human would miss. None of that is decoration. Each piece is there because skipping it is how client projects go sideways.
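To make app-type awareness concrete, here is a minimal sketch of how a test plan could branch on the kind of app being built. The type names, the `planE2E` function, and the specific checks are all hypothetical illustrations, not taken from the Aurelius source.

```typescript
// Hypothetical names throughout: none of this is Aurelius's actual API.
type AppType = "web-app" | "chrome-extension" | "pwa";

interface E2EPlan {
  appType: AppType;
  checks: string[];
}

// Checks that apply no matter what is being built.
const baseChecks: string[] = [
  "page renders without console errors",
  "primary user flow completes",
];

// Extra checks that only make sense for one kind of app (assumed examples).
const extraChecks: Record<AppType, string[]> = {
  "web-app": ["client-side routing survives a reload"],
  "chrome-extension": ["popup opens", "content script injects on a matching page"],
  "pwa": ["service worker registers", "app shell loads offline"],
};

// Combine the shared checks with the ones specific to the app type.
function planE2E(appType: AppType): E2EPlan {
  return { appType, checks: [...baseChecks, ...extraChecks[appType]] };
}
```

Because `extraChecks` is typed as `Record<AppType, string[]>`, adding a new app type to the union forces a matching entry at compile time, which is one way to keep an app type from shipping without its own checks.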

The 54 AI Agents for App Development, by Category

The heart of Aurelius is its roster of 54 agents, organized into nine categories. I won't list all 54 here, but walking through the categories with a few standouts gives you the shape of it. The point isn't the headcount. It's that each agent has a narrow job and does it well, instead of one model spreading itself thin across all of them.

Engineering Agents (12)

This is the largest group, and it covers the actual building. The frontend-developer and backend-architect agents handle the obvious split between UI and data. The rapid-prototyper exists for when you need a working shell fast. But the more interesting ones are the specialists: test-writer-fixer writes and repairs tests, error-boundary-architect makes sure failures are caught and handled instead of crashing the whole app, and migration-specialist deals with the messy work of moving schemas and data. There's also an i18n-engineer for internationalization, an animation-optimizer for smooth motion that doesn't tank performance, and a bundle-analyzer that keeps your JavaScript payload from quietly ballooning. That last one matters more than people think for the kind of fast, lightweight sites I aim for.
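As a rough illustration of what "auto-selected based on the task" can mean, the sketch below routes a task description to an agent by keyword overlap. The agent names come from this post, but the keyword lists and the scoring rule are assumptions made for the example; the actual selection logic in Aurelius may work quite differently.

```typescript
// Illustrative only: keyword lists and scoring are assumptions, not Aurelius internals.
interface Agent {
  name: string;
  keywords: string[];
}

const engineeringAgents: Agent[] = [
  { name: "test-writer-fixer", keywords: ["test", "coverage", "failing"] },
  { name: "migration-specialist", keywords: ["migration", "schema"] },
  { name: "i18n-engineer", keywords: ["translation", "locale", "i18n"] },
  { name: "bundle-analyzer", keywords: ["bundle", "payload"] },
];

// Pick the agent with the most keywords present in the task description.
// Returns undefined when nothing matches, so the caller can fall back.
function selectAgent(task: string, agents: Agent[]): Agent | undefined {
  const text = task.toLowerCase();
  let best: Agent | undefined;
  let bestScore = 0;
  for (const agent of agents) {
    const score = agent.keywords.filter((k) => text.includes(k)).length;
    if (score > bestScore) {
      best = agent;
      bestScore = score;
    }
  }
  return best;
}
```

Under these assumptions, a task like "Fix the failing test for the cart" matches two keywords for test-writer-fixer and none for the others, so that agent is chosen.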

Design and Design-to-Code Agents (5 + 7)

The design group includes a ui-designer for layout and visual decisions, a ux-researcher for thinking through how people actually use a thing, and a brand-guardian whose entire job is making sure the output stays on-brand. Then there's a separate design-to-code group of seven agents built around converting designs into components. The figma-react-converter is the flagship, but there's also a canva-react-converter plus converters targeting Astro, Vue, Svelte, and React Native, and an asset-cataloger that organizes images and icons pulled out of a design. This is the bridge between a designer's file and shipped code, and it's the part that usually eats the most manual hours when done by hand.

Testing, Marketing, and Ops Agents

The remaining categories round out the lifecycle. Testing agents own the Vitest, Playwright, and pixel-diff work so coverage isn't an afterthought. The marketing and content agents handle the words: landing-page copy, app store listings, and the kind of clear, no-fluff writing that small-business sites live or die on. And the ops agents deal with the unglamorous infrastructure and project-coordination work that keeps a build moving. Together with engineering and design, these nine categories add up to the full 54, which is enough specialists to cover an entire product from first sketch to launch without a generalist having to fake any of it.

How They Work Together: The 10-Phase Figma-to-React Pipeline

Agents on their own are just a toolbox. What makes Aurelius useful is how they're orchestrated. The signature workflow is the Figma-to-React pipeline, and it runs from a single command: /build-from-figma <url>. You point it at a design, and ten numbered phases (0 through 9), plus a responsive pass slotted in as 8.5, run in order, each handing off to the next.

Phase 0 — Token sync and drift check: pull the design tokens (colors, spacing, type) and detect any drift from what's already in the codebase.
Phase 1 — Intake and build-spec: read the design and produce a concrete build specification.
Phase 2 — Token lock: freeze the design tokens so nothing downstream can quietly hardcode a value.
Phase 3 — TDD gate: write failing tests first. This is mandatory, not optional.
Phase 4 — Build to pass tests: write the actual components until those tests go green.
Phase 5 — Visual pixel-diff loop: compare the rendered result against the design with a 2% threshold, iterating up to five times.
Phase 6 — End-to-end testing: run Playwright flows that are aware of the app type being built.
Phase 7 — Cross-browser checks: verify it holds up across browsers.
Phase 8 — Quality gate: enforce test coverage, TypeScript correctness, a clean build, token compliance, and Lighthouse scores.
Phase 8.5 — Responsive pass: confirm it works across screen sizes.
Phase 9 — Report: produce a summary of what was built and how it scored.
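The ordering in the list above can be sketched as a small runner that executes phases in sequence and stops at the first failure. The phase ids and names follow the list; the `Phase` interface, the `runPipeline` function, and the stub phases are assumptions about how such a pipeline could be wired, not Aurelius code. The sketch is synchronous for brevity, whereas real phases would almost certainly be asynchronous.

```typescript
// Hypothetical wiring: only the phase ids and names come from the post.
interface PhaseResult {
  ok: boolean;
  detail: string;
}

interface Phase {
  id: string;
  name: string;
  run: () => PhaseResult;
}

interface PipelineReport {
  completed: string[]; // ids of phases that passed, in order
  failedAt?: string;   // id of the first failing phase, if any
}

// Run phases strictly in order and stop at the first failure, so a later
// phase (such as the build) never runs when an earlier gate did not pass.
function runPipeline(phases: Phase[]): PipelineReport {
  const completed: string[] = [];
  for (const phase of phases) {
    const result = phase.run();
    if (!result.ok) {
      return { completed, failedAt: phase.id };
    }
    completed.push(phase.id);
  }
  return { completed };
}

// Stub phases for demonstration; real phases would call tools and agents.
const pass = (detail: string) => (): PhaseResult => ({ ok: true, detail });
const fail = (detail: string) => (): PhaseResult => ({ ok: false, detail });

const demoPhases: Phase[] = [
  { id: "0", name: "Token sync and drift check", run: pass("no drift") },
  { id: "1", name: "Intake and build-spec", run: pass("spec written") },
  { id: "2", name: "Token lock", run: pass("tokens frozen") },
  { id: "3", name: "TDD gate", run: fail("no failing tests were written") },
  { id: "4", name: "Build to pass tests", run: pass("not reached in this demo") },
];

const demoReport = runPipeline(demoPhases);
```

In this demo the TDD gate fails, so `demoReport.failedAt` is `"3"` and the build phase never runs, which is the behavior a mandatory gate implies.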

There's also a Canva variant of the pipeline for designs that start in Canva instead of Figma, which is handy because plenty of small-business clients hand me Canva files rather than polished Figma boards. The important thing is the order: tests before code, tokens locked before building, and a quality gate that the work has to clear before it's considered done.
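One way to picture what a token lock protects against is a check that flags hex color literals in component source and points at the token that should have been used. This is a deliberately simple sketch: the token names, the `findHardcodedColors` function, and the regex approach are assumptions for illustration, and a real compliance check would more likely parse styles than pattern-match text.

```typescript
// Assumed token names and values; frozen so nothing can mutate them at runtime.
const lockedTokens: Readonly<Record<string, string>> = Object.freeze({
  "color.brand": "#1d4ed8",
  "color.ink": "#111827",
});

// Matches 6- or 3-digit hex color literals such as #1D4ED8 or #fff.
const HEX_COLOR = /#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b/g;

interface Violation {
  literal: string;               // the hardcoded value found in the source
  suggestedToken: string | null; // matching token name, if one exists
}

// Return every hex literal in the source text. In a token-locked build,
// components should reference tokens, so each hit counts as a violation.
function findHardcodedColors(
  source: string,
  tokens: Readonly<Record<string, string>>,
): Violation[] {
  const literals: string[] = source.match(HEX_COLOR) ?? [];
  return literals.map((literal) => {
    const name = Object.keys(tokens).find(
      (key) => tokens[key].toLowerCase() === literal.toLowerCase(),
    );
    return { literal, suggestedToken: name ?? null };
  });
}
```

Run against a snippet containing `bg-[#1d4ed8]` and `color: "#fff"`, this reports both literals: the first with `color.brand` as the suggested token, the second with no suggestion because no locked token has that value.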

Why Specialization Beats One Big Generalist

It's fair to ask why any of this is better than just asking a capable model to build the whole thing. The answer is in the guardrails that specialization makes possible. Enforced TDD means tests get written first, every time, instead of being skipped under time pressure the way they always are when a generalist is in a hurry. Locked design tokens mean no hardcoded hex values or random margins creeping in, so the design stays consistent across the whole app. Pixel-diff QA means visual correctness is measured against the design at a 2% threshold instead of being eyeballed and waved through. And app-type-aware end-to-end testing means the Playwright tests for a Chrome extension are different from the ones for a standard web app, because those things genuinely behave differently.
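The pixel-diff guardrail can be sketched as a bounded loop: render, measure, and retry until the mismatch is within the threshold or the attempts run out. The 2% threshold and the five-iteration cap come from the pipeline description; everything else is an assumption for illustration, including the reading of "2%" as the share of differing pixels, the per-channel tolerance, and the hypothetical `render` and `refine` hooks.

```typescript
const DIFF_THRESHOLD = 0.02; // 2%, read here as the share of differing pixels (assumption)
const MAX_ITERATIONS = 5;    // iteration cap from the pipeline description

// Fraction of pixels that differ between two same-sized RGBA buffers.
// A pixel differs if any channel is off by more than `tolerance`.
function diffRatio(a: Uint8Array, b: Uint8Array, tolerance = 8): number {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("images must be same-sized RGBA buffers");
  }
  const pixels = a.length / 4;
  if (pixels === 0) return 0;
  let differing = 0;
  for (let p = 0; p < pixels; p++) {
    for (let c = 0; c < 4; c++) {
      const i = p * 4 + c;
      if (Math.abs(a[i] - b[i]) > tolerance) {
        differing++;
        break; // count each pixel at most once
      }
    }
  }
  return differing / pixels;
}

interface DiffLoopResult {
  passed: boolean;
  iterations: number;
  ratio: number;
}

// `render` produces the current screenshot; `refine` is a hypothetical hook
// that adjusts the implementation before the next attempt.
function pixelDiffLoop(
  design: Uint8Array,
  render: () => Uint8Array,
  refine: (ratio: number) => void,
): DiffLoopResult {
  let ratio = 1;
  for (let iteration = 1; iteration <= MAX_ITERATIONS; iteration++) {
    ratio = diffRatio(design, render());
    if (ratio <= DIFF_THRESHOLD) {
      return { passed: true, iterations: iteration, ratio };
    }
    if (iteration < MAX_ITERATIONS) refine(ratio);
  }
  return { passed: false, iterations: MAX_ITERATIONS, ratio };
}
```

The cap matters as much as the threshold: a loop that never converges returns a failing result after five attempts instead of iterating indefinitely, which leaves the decision to a later gate or a human.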

For client work, that translates directly into quality and consistency. The same standards apply whether I'm building a brochure site for a local service business or something more involved. The framework doesn't get tired, doesn't cut the corner nobody will notice until launch day, and doesn't forget to check the mobile layout. That reliability is the whole point.

Proof and Lineage: Battle-Tested on Real Products

None of this is theoretical. The Chrome-extension side of the pipeline was hardened against a real product: AI SEO Copilot, which we rebuilt from a Webflow app into a free, open-source Chrome extension. I wrote about that rebuild in "Why We're Bringing AI SEO Copilot to Chrome (And What It Took to Build It)", and you can install the result from the Chrome Web Store listing. Shipping a real extension is exactly how the app-type-aware parts of Aurelius got tested in anger rather than in a demo.

Aurelius also has siblings. Flavian is our framework for WordPress work, and it grew out of the same instinct toward specialized agents that I described in my post on the two-agent workflow I use for WordPress. Nerva handles API and backend work, and Claudius is the chat widget framework, the same one powering the assistant on this very site. Each one applies the same philosophy to a different domain. And when the job is a full-stack product rather than a marketing site, the lessons carry over directly from builds like the Bridleway marketplace case study, where careful, tested architecture mattered far more than raw speed.

The design-to-code agents earn their keep on the fiddly real-world problems too. Anyone who has fought with platform quirks knows the work isn't just generating components; it's handling the edge cases, like the kind of API limitation I documented in my Webflow Designer API alt text workaround. Agents that specialize are simply better at carrying that kind of context than one model juggling everything at once.

Where This Fits at PMDS

Aurelius is the engine behind how PMDS ships. It's not a product I'm selling you; it's the framework that lets a one-person studio deliver work that holds up to the standards a much larger team would apply. If you're a developer who wants to look under the hood, the code is on the Aurelius GitHub repo.

And if you're a small-business owner who just wants a fast, well-built site or app without worrying about any of this, that's the easier path. You get the benefit of the framework without ever having to think about it. Get in touch and let's talk about what you're trying to build. I work with small businesses in Baltimore, across Maryland, and remotely nationwide.

Paul Mulligan

Related Articles

Why We're Bringing AI SEO Copilot to Chrome (And What It Took to Build It)

Two AI Agents for WordPress: My Dev Workflow

Building Bridleway: A Next.js Horse Marketplace with Escrow, Auction Aggregation, and Verified Sellers
