Build effective and cost-efficient AI agents

There's a moment every builder hits — you've got access to powerful LLMs, a shiny new agentic framework, and the energy to automate everything. So you start building. Agents everywhere. Tools calling tools. Memory layers on memory layers.

And then your cloud bill shows up.

I've been there. And honestly, watching companies like Microsoft and Uber publicly wrestle with runaway AI infrastructure costs was a wake-up call for me too. These aren't small teams fumbling around — these are organizations with some of the best AI engineers on the planet. And they're still figuring out how to make agentic systems not just work, but work efficiently.

So here's what I've personally been learning and unlearning.

1. Match the Model to the Job

This one took me longer to internalize than I'd like to admit: you don't need Claude or GPT-4 for every single step in your pipeline.

A well-prompted DeepSeek model or a capable flash-class model can often handle classification, extraction, structured formatting, or simple reasoning nearly as well as a frontier model, at a fraction of the cost. The moment I stopped treating LLM selection as a one-size-fits-all decision and started matching model capability to task complexity, my costs dropped noticeably.
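To make "match the model to the job" concrete, a pipeline can keep a single routing table from task type to model tier. This is a minimal sketch. The task categories, tier names, and model IDs (`small-fast-model`, `frontier-model`) are hypothetical placeholders, not real model names.

```typescript
// Task types a pipeline might contain (illustrative, not exhaustive).
type TaskKind =
  | "classification"
  | "extraction"
  | "formatting"
  | "simple-reasoning"
  | "multi-step-planning"
  | "open-ended-generation";

type Tier = "small" | "frontier";

// Hypothetical placeholder IDs: replace with real models from your provider.
const MODEL_BY_TIER: Record<Tier, string> = {
  small: "small-fast-model",
  frontier: "frontier-model",
};

// Narrow, well-specified tasks go to the cheap tier; only steps that need
// deeper reasoning pay for the expensive one.
const TIER_BY_TASK: Record<TaskKind, Tier> = {
  classification: "small",
  extraction: "small",
  formatting: "small",
  "simple-reasoning": "small",
  "multi-step-planning": "frontier",
  "open-ended-generation": "frontier",
};

function pickModel(task: TaskKind): string {
  return MODEL_BY_TIER[TIER_BY_TASK[task]];
}

console.log(pickModel("extraction")); // "small-fast-model"
```

Keeping the mapping in one place makes it easy to move a step to the cheaper tier, evaluate the output, and move it back only if quality actually drops.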

And here's the thing — as models have gotten better, I think many of us have quietly let our prompt engineering get lazier. That's a mistake. Strong prompting still matters enormously, especially when you're trying to squeeze real performance out of a lighter model. Don't outsource your thinking to the most expensive model available when a well-crafted prompt to a cheaper one does the job.
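As an illustration of what strong prompting for a lighter model can look like, here is a hypothetical prompt builder that spells out the task, explicit rules, the exact output format, and a worked example instead of leaving the model to infer them. The structure and field names are assumptions, not a standard format.

```typescript
interface Example {
  input: string;
  output: string;
}

interface PromptSpec {
  task: string;         // one-sentence description of the job
  rules: string[];      // explicit constraints, numbered in the prompt
  outputFormat: string; // exact shape of the expected answer
  examples: Example[];  // few-shot demonstrations
}

// Assemble a structured prompt: spell everything out rather than relying on
// the model to infer the task, constraints, or output shape.
function buildPrompt(spec: PromptSpec, input: string): string {
  const rules = spec.rules.map((r, i) => `${i + 1}. ${r}`).join("\n");
  const shots = spec.examples
    .map((e) => `Input: ${e.input}\nOutput: ${e.output}`)
    .join("\n\n");
  return [
    `Task: ${spec.task}`,
    `Rules:\n${rules}`,
    `Output format: ${spec.outputFormat}`,
    `Examples:\n${shots}`,
    `Input: ${input}\nOutput:`,
  ].join("\n\n");
}

const prompt = buildPrompt(
  {
    task: "Classify the sentiment of a product review.",
    rules: ["Answer with a single word.", "If unsure, answer neutral."],
    outputFormat: "positive | negative | neutral",
    examples: [{ input: "Works perfectly.", output: "positive" }],
  },
  "Broke after two days."
);
console.log(prompt);
```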

2. Context Engineering is Everything

I've seen agents fail not because of bad models or broken logic — but because they were drowning in noise.

Dumping your entire knowledge base into context is not engineering; it's hoping. What you actually want is a clean, well-structured knowledge layer where the agent can ask for exactly what it needs and receive back only that, nothing more.
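Here is a minimal sketch of an "ask for exactly what you need" knowledge layer. It assumes an in-memory list of topic-tagged snippets and naive keyword scoring; a real system would more likely use a search index or embeddings. The part that matters is the interface: the agent asks a specific question within a topic and gets back at most `k` short snippets under a size budget, never the whole corpus. All names here are hypothetical.

```typescript
interface Snippet {
  id: string;
  topic: string;
  text: string;
}

interface KnowledgeQuery {
  topic: string;    // narrow the search space first
  question: string; // what the agent actually needs to know
  k: number;        // maximum number of snippets to return
  maxChars: number; // hard budget on returned text
}

// Naive scoring (an assumption for illustration): count distinct question
// words longer than two characters that appear in the snippet.
function score(text: string, question: string): number {
  const words = question
    .toLowerCase()
    .split(/\W+/)
    .filter((w, i, all) => w.length > 2 && all.indexOf(w) === i);
  const haystack = text.toLowerCase();
  return words.filter((w) => haystack.includes(w)).length;
}

// Return at most k relevant snippets within the character budget.
function retrieve(kb: Snippet[], q: KnowledgeQuery): Snippet[] {
  const ranked = kb
    .filter((s) => s.topic === q.topic)
    .map((s) => ({ s, relevance: score(s.text, q.question) }))
    .filter((r) => r.relevance > 0)
    .sort((a, b) => b.relevance - a.relevance)
    .slice(0, q.k);

  const out: Snippet[] = [];
  let used = 0;
  for (const { s } of ranked) {
    if (used + s.text.length > q.maxChars) break; // stop before exceeding the budget
    out.push(s);
    used += s.text.length;
  }
  return out;
}
```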

This applies to tools too. Every tool in your agentic system should be designed with surgical precision: clear inputs, scoped outputs, no data dumps. When an agent calls a tool and gets back a wall of loosely structured JSON, it has to spend tokens just making sense of it. That's waste you're paying for.
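Here is a sketch of a scoped tool, assuming a hypothetical order-lookup backend (`RawOrder`, `getOrderStatus`, and the injected `lookup` function are illustrative names). The tool takes one explicit input and returns a small, typed result instead of forwarding the raw record into the agent's context.

```typescript
// Raw record shape from a hypothetical upstream order API.
interface RawOrder {
  id: string;
  status: string;
  createdAt: string;
  customer: { id: string; name: string; email: string; address: string };
  items: { sku: string; qty: number; price: number; warehouseMeta: unknown }[];
  internalAuditLog: unknown[];
}

// The only fields the agent needs to answer order-status questions.
interface OrderStatus {
  orderId: string;
  status: string;
  itemCount: number;
  totalCents: number;
}

// Clear input, scoped output: no audit log, no customer PII, and no
// warehouse metadata reach the model's context.
function getOrderStatus(
  input: { orderId: string },
  lookup: (id: string) => RawOrder | undefined
): OrderStatus | { error: string } {
  const order = lookup(input.orderId);
  if (!order) return { error: `Order ${input.orderId} not found` };
  return {
    orderId: order.id,
    status: order.status,
    itemCount: order.items.reduce((n, item) => n + item.qty, 0),
    // Convert dollar prices to integer cents to avoid floating-point drift.
    totalCents: order.items.reduce((sum, item) => sum + Math.round(item.price * 100) * item.qty, 0),
  };
}
```

The agent sees four fields it can reason over directly, and a short error object when the lookup fails, rather than a nested record it has to spend tokens parsing.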

Good context engineering is the difference between an agent that reasons cleanly and one that hallucinates under the weight of its own context window.

3. Not Everything Needs an Agent

This one is hard to say out loud in certain circles, but: most workflows don't need a full agent.

Agents are expensive by design. They carry instructions to handle many scenarios, they manage tools, they maintain memory, they loop until they reach a goal. All of that burns tokens — on every single run.
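To see why the loop gets expensive, here is a rough estimator. It assumes that each iteration of a typical agent loop re-sends the system prompt, every tool schema, and the transcript so far, with no prompt caching. A fixed workflow, by contrast, sends only each call's own prompt. The function names and all numbers are illustrative, not measurements.

```typescript
interface AgentLoopShape {
  systemPromptTokens: number; // instructions covering "many scenarios"
  toolSchemaTokens: number;   // definitions for every registered tool
  turnTokens: number;         // tokens added to the transcript per iteration
  iterations: number;         // loop steps until the goal is reached
}

// Rough input-token estimate for one agent run: every iteration re-sends the
// fixed overhead plus the transcript accumulated so far.
function agentInputTokens(a: AgentLoopShape): number {
  let total = 0;
  for (let i = 0; i < a.iterations; i++) {
    const transcript = i * a.turnTokens; // history accumulated before this call
    total += a.systemPromptTokens + a.toolSchemaTokens + transcript;
  }
  return total;
}

// A fixed workflow: each call sends only its own short prompt.
function workflowInputTokens(promptTokensPerCall: number[]): number {
  return promptTokensPerCall.reduce((sum, t) => sum + t, 0);
}

const perRun = agentInputTokens({ systemPromptTokens: 2000, toolSchemaTokens: 1500, turnTokens: 500, iterations: 4 });
const workflow = workflowInputTokens([300, 300, 400]);
console.log(perRun, workflow); // 17000 1000
```

With these illustrative inputs, one agent run sends 17,000 input tokens against 1,000 for a three-call workflow. The gap widens with every extra iteration, because the transcript term grows each time.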

Before reaching for an agent, ask yourself: can I solve this with a deterministic workflow, a few LLM calls with structured output, and some conditional logic? More often than not, the answer is yes. A clean workflow with targeted LLM calls — one to extract, one to transform, one to summarize — is faster, cheaper, more debuggable, and honestly, easier to trust in production.
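The "targeted LLM calls plus conditional logic" idea can be sketched as a plain function with two narrow model calls and an ordinary conditional in between. The ticket-routing scenario and all names are hypothetical. The model call is injected as a synchronous `llm` function so the control flow stays visible; a real SDK call would typically be async and awaited.

```typescript
// Injected model call (hypothetical). A real SDK call would usually be async.
type LLM = (prompt: string) => string;

interface Ticket {
  body: string;
}

interface WorkflowResult {
  route: "refund-team" | "support-queue";
  summary: string;
}

function runTicketWorkflow(ticket: Ticket, llm: LLM): WorkflowResult {
  // Call 1: a narrow extraction with a constrained answer space.
  const intent = llm(
    `Classify this ticket's intent as exactly one word, refund or other:\n\n${ticket.body}`
  )
    .trim()
    .toLowerCase();

  // Plain conditional logic picks the branch; any unexpected answer falls
  // through to the general queue.
  const route = intent === "refund" ? "refund-team" : "support-queue";

  // Call 2: a narrow summarization for whoever receives the ticket.
  const summary = llm(`Summarize this ticket in one sentence:\n\n${ticket.body}`).trim();

  return { route, summary };
}
```

Each step can be tested on its own by swapping in a fake `llm`, which is much of what makes this shape easier to debug and trust than an open-ended agent loop.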

Build agents where genuine autonomy and adaptive reasoning are required. Not because it sounds impressive on a slide.

4. LLMs Are a Tool, Not the Default

This is the underlying principle behind all of the above.

Just because a language model is accessible doesn't mean it's always the right tool. Regex handles well-defined pattern matching faster, cheaper, and more predictably than an LLM. A database query retrieves structured data more reliably than prompting a model to "find" something. Deterministic logic is more trustworthy for branching decisions than an if/else reasoned out by an LLM.
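The regex point can be sketched as "deterministic first, model as fallback." The `ORD-` order-ID format and the injected `llm` function are assumptions for illustration.

```typescript
// Injected model call (hypothetical), used only when the cheap path fails.
type LLM = (prompt: string) => string;

// Assumed ID format for illustration: "ORD-" followed by 5 to 8 digits.
const ORDER_ID = /\bORD-\d{5,8}\b/;

function findOrderId(message: string, llm: LLM): { id: string | null; usedLLM: boolean } {
  // Deterministic path: free, fast, and predictable.
  const direct = message.match(ORDER_ID);
  if (direct) return { id: direct[0], usedLLM: false };

  // Fallback: let the model normalize fuzzy phrasings such as "order 12345",
  // then validate its answer with the same regex instead of trusting it blindly.
  const answer = llm(`Reply with the order ID in the form ORD-<digits>, or NONE:\n\n${message}`);
  const checked = answer.match(ORDER_ID);
  return { id: checked ? checked[0] : null, usedLLM: true };
}
```

In the common case the model is never called; when it is, its output still has to pass the same deterministic check.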

The best AI systems I've seen aren't the ones that use AI the most — they're the ones that use AI precisely, reserving it for the steps where natural language understanding, generation, or fuzzy reasoning actually adds something that code alone can't.

5. Build Small, Focused Sub-Agents — Then Orchestrate Them Well

Here's something I've started doing that changed how my systems perform and what they cost: instead of building one big general-purpose agent with a massive system prompt, a long list of tools, and instructions to handle everything, I build small, purpose-built sub-agents, each designed for one specific outcome.

Each sub-agent gets its own tightly engineered prompt, only the tools it actually needs, and clear input/output contracts. A research sub-agent that only knows how to search and summarize. A data extraction sub-agent that only reads and structures. An action sub-agent that only writes or calls APIs. Focused. Contained. Predictable.
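Here is a minimal sketch of the sub-agent contract that assumes no particular framework. `SubAgent`, `Tool`, and `useTool` are hypothetical names, not Mastra's API. Each sub-agent carries its own short instructions, an explicit tool allowlist, and typed input and output. The `run` body is a deterministic stand-in for whatever model-driven loop you would actually use.

```typescript
interface Tool {
  name: string;
  run: (input: string) => string;
}

interface SubAgent<I, O> {
  name: string;
  instructions: string; // short, job-specific system prompt
  tools: Tool[];        // only what this job needs
  run: (input: I) => O; // typed input/output contract
}

// Guard: a sub-agent can only invoke tools it was explicitly given.
function useTool<I, O>(agent: SubAgent<I, O>, toolName: string, input: string): string {
  const tool = agent.tools.find((t) => t.name === toolName);
  if (!tool) throw new Error(`${agent.name} is not allowed to use tool "${toolName}"`);
  return tool.run(input);
}

interface ResearchInput {
  question: string;
}

interface ResearchOutput {
  findings: string[];
}

function makeResearchAgent(search: Tool, summarize: Tool): SubAgent<ResearchInput, ResearchOutput> {
  const agent: SubAgent<ResearchInput, ResearchOutput> = {
    name: "research",
    instructions: "Find sources for the question and summarize each in one line. Never write data or call external APIs.",
    tools: [search, summarize],
    // Stand-in for a model-driven loop: search, then summarize each hit.
    run: ({ question }) => {
      const hits = useTool(agent, "search", question).split("\n").filter((h) => h.length > 0);
      return { findings: hits.map((h) => useTool(agent, "summarize", h)) };
    },
  };
  return agent;
}
```

Because `useTool` refuses anything outside the allowlist, a research sub-agent that tries to send an email fails loudly instead of quietly acting outside its scope.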

Then you build a clean orchestration layer that routes work to the right sub-agent based on what's needed at each step. This is where the real gains come from. The orchestrator doesn't need to be expensive either; it just needs to be smart about routing. The payoff: fewer tokens wasted on irrelevant instructions, fewer hallucinations from overloaded context, and better debuggability, because each unit has a single responsibility.
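The orchestration layer can start as little more than a routing table. In this minimal sketch the routing is a deterministic lookup on a hypothetical `kind` field. If routing needs judgment, that lookup could be replaced by a call to a small, inexpensive model behind the same interface. `Orchestrator`, `Step`, and the handlers are illustrative names, not Mastra's API or any other framework's.

```typescript
type StepKind = "research" | "extract" | "act";

interface Step {
  kind: StepKind; // set by the caller or by a previous step
  payload: string;
}

// Each handler stands in for one focused sub-agent.
type Handler = (payload: string) => string;

// The orchestrator only routes work and records what happened; it holds no
// domain instructions of its own, so it stays small and cheap.
class Orchestrator {
  private readonly handlers: Record<StepKind, Handler>;
  private readonly trace: string[] = [];

  constructor(handlers: Record<StepKind, Handler>) {
    this.handlers = handlers;
  }

  run(steps: Step[]): string[] {
    return steps.map((step) => {
      const result = this.handlers[step.kind](step.payload);
      // One trace line per unit of work makes failures easy to localize.
      this.trace.push(`${step.kind}: ${step.payload} -> ${result}`);
      return result;
    });
  }

  getTrace(): string[] {
    return this.trace.slice();
  }
}
```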

For production-grade systems, I've been using Mastra — an open-source TypeScript framework that makes this pattern genuinely easy to build. It handles agent orchestration, tool definitions, memory, and step-based workflows with a clean developer experience that doesn't get in your way. It's solid, it scales, and it's built for the real world — not just demos.

The mental model shift is simple: stop thinking "one smart agent that handles everything" and start thinking "a well-designed team of focused agents, coordinated by a clear orchestration layer." That's how you build systems that are both powerful and efficient.

The Real Goal

We're not here to build AI agents. We're here to build systems that create real value by automating real problems.

Sometimes that's a fully autonomous multi-agent system. Often, it's a workflow with two LLM calls and a well-placed conditional. The measure of good engineering isn't how much AI you used — it's how much value you delivered, at what cost, with how much reliability.

That's the shift I keep coming back to. Build for outcomes, not optics.
