Building an Autonomous Cold Email Agent That Runs Lead Generation End to End
A walkthrough of how to wire up a self-operating outbound agent that provisions infrastructure, sources prospects, writes copy, launches campaigns, and handles replies without you babysitting it.
Most "AI cold email" builds floating around right now are dressed-up Zapier flows with a GPT call bolted on the side. They send a templated first touch, maybe classify a reply, and call it autonomous. That's not an agent. That's a macro.
What I want to walk through here is different: a single agent that behaves like an operator. It buys and monitors the sending infrastructure, pulls a lead list against an ICP you describe in plain English, writes the copy, builds the campaign inside a real sending platform, watches every reply come back in, and reports on what's degrading so it can rotate sending assets before deliverability craters. One human in the loop for approvals, everything else on rails.
The build below uses Claude Code as the construction environment and Smartlead as the sending engine underneath. You can swap pieces, but the architecture is the part worth stealing.
The five jobs an outbound agent actually has to do
Before touching any code, it helps to be precise about what the agent is replacing. A real outbound operator does five distinct jobs, and the agent needs to own all of them or it's just another script.
Infrastructure. Provision domains and inboxes, run warm-up, watch deliverability signals, and swap out anything that's tanking.
Data sourcing. Translate a written ICP into an actual list of contacts with verified emails.
Copywriting. Produce the initial message and the follow-up sequence, grounded in the product's actual value prop.
Campaign assembly. Push the list, the copy, the schedule, and the sender rotation into the platform and launch.
Reply handling. Read every incoming response, classify it, and respond to the interested ones without spamming auto-replies at out-of-office bounces.
If you can't point at where in your system each of those five jobs lives, you don't have an agent yet. You have a workflow.
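One way to make that test concrete is to give the agent one method per job and check which ones are actually implemented. This is a minimal sketch. The method names and signatures are hypothetical and only illustrate the five-job split.

```python
from typing import Protocol


class OutboundAgent(Protocol):
    """One method per operator job. Names are hypothetical, for illustration only."""

    def manage_infrastructure(self) -> None: ...
    def source_leads(self, icp: str, limit: int) -> list[dict]: ...
    def write_copy(self, brief: dict) -> list[str]: ...
    def assemble_campaign(self, leads: list[dict], copy: list[str]) -> str: ...
    def handle_reply(self, reply_text: str) -> str: ...


# The same five jobs, in the order described above.
JOBS = ("manage_infrastructure", "source_leads", "write_copy",
        "assemble_campaign", "handle_reply")


def missing_jobs(agent: object) -> list[str]:
    """Return the operator jobs an implementation does not cover yet."""
    return [job for job in JOBS if not callable(getattr(agent, job, None))]
```

If `missing_jobs` returns anything, that job still lives with a human or a separate script.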
Why a single API surface matters for lead generation
There's a temptation when you build something like this to stitch together six vendors: one for domains, one for warm-up, one for data, one for sending, one for reply parsing, one for reporting. It looks impressive on a diagram. It's a nightmare to operate.
The shortcut is to anchor the whole agent to one platform that already exposes most of those capabilities through a single API key, and only reach outside when you have to. In this build, Smartlead carries the weight. It owns the sending infrastructure, the warm-up monitoring, the campaign object, the reply webhooks, and through endpoints like search-contact, find-email, and smart-senders/place-order, it can also feed the agent prospects and even purchase additional sending assets.
That single-engine choice collapses the integration surface from "six vendors talking to each other" to "one API key, many endpoints." The agent becomes the brain. The platform is the body.
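In code, "one API key, many endpoints" can be as small as a single client object that every job goes through. The sketch below assumes Smartlead's public REST pattern of a `https://server.smartlead.ai/api/v1` base URL with the key passed as an `api_key` query parameter; treat that as an assumption to verify against the current API docs. The class name and the paths you pass in are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: base URL and api_key query parameter follow Smartlead's public API
# docs at the time of writing; verify before relying on them.
BASE_URL = "https://server.smartlead.ai/api/v1"


class PlatformClient:
    """Hypothetical wrapper: every capability goes through this one key."""

    def __init__(self, api_key: str, base_url: str = BASE_URL) -> None:
        self.api_key = api_key
        self.base_url = base_url.rstrip("/")

    def url(self, path: str, **params: str) -> str:
        # The key rides along on every request; extra params follow it.
        query = urllib.parse.urlencode({"api_key": self.api_key, **params})
        return f"{self.base_url}/{path.lstrip('/')}?{query}"

    def get(self, path: str, **params: str) -> object:
        with urllib.request.urlopen(self.url(path, **params), timeout=30) as resp:
            return json.load(resp)

    def post(self, path: str, payload: dict) -> object:
        req = urllib.request.Request(
            self.url(path),
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)
```

The design point is that adding a capability means adding a path, not a vendor, a second credential, and a second failure mode.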
Setting up the build environment without overcomplicating it
The tooling stack is short:
Claude Code for writing and orchestrating the agent. The Max plan handles the kind of long planning sessions this build requires.
VS Code as the editor. Install the Claude Code extension and sign in.
WhisperFlow or any decent voice-to-text. When you're prompting Claude through a multi-component spec, dictating it is dramatically faster than typing it, and the output prompts tend to be more complete because you stop self-editing mid-sentence.
fly.io for hosting. Once the agent is built, it can't live on your laptop.
GitHub for the codebase, with API keys excluded from the repo.
Start Claude Code in plan mode before writing a line of code. Plan mode forces the model to map the entire architecture, surface gaps, and ask clarifying questions before generating anything. This is the single biggest difference between a build that ships and a build that turns into a tangled mess by hour three.
The brief-and-agent split
Here's the design decision that makes the whole thing reusable.
The agent itself should know nothing about your product. It should know how to run outbound. The product knowledge, the offer, the ICP, the guarantees, the discount structure, the positioning, all of that lives in a separate object called the brief.
When you want to run outbound for a different company next month, you don't rebuild the agent. You swap the brief. The agent loads it, learns the offer, and starts operating against the new context. That separation is what turns a one-off build into something you can actually run across multiple campaigns or, if you're an agency, multiple clients.
Make this split explicit when you prompt the planner. Otherwise the model will happily hard-code your product details into the agent's logic and you'll regret it the first time you try to point it at anything else.
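A minimal sketch of the split, assuming the brief is a plain data object and the agent only ever reads product facts from it. The field names and the `system_prompt` method are hypothetical; the point is that nothing product-specific appears in the `Agent` class itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Brief:
    """Everything product-specific lives here. Field names are hypothetical."""
    company: str
    offer: str
    icp: str
    guarantees: list[str] = field(default_factory=list)
    discounts: str = ""
    positioning: str = ""


class Agent:
    """Knows how to run outbound; knows nothing about any particular product."""

    def __init__(self, brief: Brief) -> None:
        self.brief = brief

    def system_prompt(self) -> str:
        # Product context is rendered from the brief at runtime, never hard-coded.
        b = self.brief
        lines = [f"You run outbound for {b.company}.", f"Offer: {b.offer}", f"ICP: {b.icp}"]
        if b.guarantees:
            lines.append("Guarantees: " + "; ".join(b.guarantees))
        if b.discounts:
            lines.append(f"Discounts: {b.discounts}")
        if b.positioning:
            lines.append(f"Positioning: {b.positioning}")
        return "\n".join(lines)
```

Running outbound for a second company is then `Agent(other_brief)`, with no change to the agent code.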
What the agent can and can't do on day one
When you run plan mode against this spec, Claude will come back with an honest list of what's possible through public APIs and what isn't. A few things to expect:
Campaign creation, warm-up stats, and reply handling are fully covered. These are the core endpoints, and they behave predictably.
Domain and inbox provisioning is gated. The smart-senders/place-order endpoint exists, but access is granted through support rather than self-serve. For a v1, the cleanest move is to buy domains elsewhere, upload them, and let the agent monitor and flag rather than purchase.
Lead sourcing through the platform's own prospect database works once you confirm the right endpoints (search-contact and find-email) with support. You can also pair it with a dedicated data provider if your ICP is niche, for example local businesses or ecommerce, where specialized databases outperform general ones.
Don't fight the gated pieces in v1. Get the autonomous loop running with what's available, then layer in provisioning later.
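One low-effort way to encode that is a set of capability flags the agent checks before acting, so a gated endpoint degrades to "flag a human" instead of failing. This is a sketch under the assumption that you maintain these flags yourself; the flag names and action strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """v1 feature flags (hypothetical); flip one once support grants access."""
    create_campaigns: bool = True
    read_warmup_stats: bool = True
    handle_replies: bool = True
    purchase_senders: bool = False      # smart-senders/place-order is support-gated
    platform_lead_search: bool = False  # search-contact / find-email: confirm first


def provisioning_action(caps: Capabilities, inboxes_needed: int) -> str:
    """Decide what the agent does when it needs more sending assets."""
    if inboxes_needed <= 0:
        return "none"
    if caps.purchase_senders:
        return f"purchase {inboxes_needed} inboxes"
    # Gated in v1: monitor and flag, let a human buy and upload.
    return f"flag: {inboxes_needed} inboxes needed, buy and upload manually"
```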
Wiring up the deliverability loop
This is the part most "AI outbound" demos skip, and it's the part that determines whether the system is still working three months from now.
Every sending asset degrades. Inboxes that were landing in the primary tab last month start hitting promotions. Domains pick up complaints. If nothing is watching, your reply rate quietly falls off a cliff and you blame the copy.
The agent needs a scheduled job that does three things on a fixed cadence (monthly works for most senders, biweekly if you're sending at volume):
Pull deliverability stats for every active inbox and domain.
Identify the bottom performers, say the worst ten.
Rotate those out of active campaigns and swap in assets that have finished warm-up.
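The rotation step above can be sketched as a pure function over inbox records, which keeps it easy to test before it touches a live account. Assumptions, all hypothetical: each inbox carries a single 0-100 deliverability score you compute from the platform's stats, the job only retires as many inboxes as it has warmed spares for (so sending capacity never drops), and retired inboxes go back into warm-up.

```python
from dataclasses import dataclass


@dataclass
class Inbox:
    address: str
    score: float       # hypothetical 0-100 deliverability score
    warmup_done: bool
    active: bool


def rotate(inboxes: list[Inbox], n: int = 10) -> tuple[list[str], list[str]]:
    """Retire up to n of the worst active inboxes and activate warmed spares.

    Returns (retired, activated) address lists for the Slack report.
    """
    active = sorted((i for i in inboxes if i.active), key=lambda i: i.score)
    spares = sorted((i for i in inboxes if not i.active and i.warmup_done),
                    key=lambda i: i.score, reverse=True)
    # Only retire as many as we can replace, so capacity stays constant.
    count = min(n, len(active), len(spares))
    retired, activated = active[:count], spares[:count]
    for i in retired:
        i.active = False
        i.warmup_done = False  # assumption: retired assets go back into warm-up
    for i in activated:
        i.active = True
    return [i.address for i in retired], [i.address for i in activated]
```

Run it from whatever scheduler the host provides on the monthly or biweekly cadence, then post the two returned lists to Slack.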
The report goes to Slack so a human can glance at it, but the rotation itself is automatic. Tired of worrying about deliverability? Check out Slicey.ai's Inboxes. Pairing managed inbox infrastructure with an agent that rotates assets on schedule is what keeps a cold email program from quietly dying.
How reply handling should actually behave
The naive version of reply handling is: webhook fires on any reply, agent generates a response, agent sends it. Don't do this.
The agent should classify first, then act. Positive and interested replies get a drafted response that either books a meeting or sends the requested information. Out-of-office gets ignored. Unsubscribe requests get logged and the contact gets suppressed across all campaigns, not just the one they replied to. Negative replies get logged for sequence analysis but don't get an apology email that re-engages someone who already said no.
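The classify-then-act rule is easiest to enforce when the two steps are separate functions and the action function never sends mail itself. In the sketch below, `classify_stub` is a keyword placeholder standing in for the LLM classifier call, and the intent names and action strings are hypothetical. Unclear replies fall through to a human.

```python
from enum import Enum


class Intent(Enum):
    INTERESTED = "interested"
    OUT_OF_OFFICE = "out_of_office"
    UNSUBSCRIBE = "unsubscribe"
    NEGATIVE = "negative"
    UNCLEAR = "unclear"


def classify_stub(text: str) -> Intent:
    """Placeholder for the LLM classifier; keyword rules are illustration only."""
    t = text.lower()
    if "out of office" in t or "auto-reply" in t:
        return Intent.OUT_OF_OFFICE
    if "unsubscribe" in t or "remove me" in t:
        return Intent.UNSUBSCRIBE
    if "not interested" in t:  # checked before "interested" on purpose
        return Intent.NEGATIVE
    if "interested" in t or "more information" in t:
        return Intent.INTERESTED
    return Intent.UNCLEAR


def route(intent: Intent, email: str, suppression: set[str], log: list[str]) -> str:
    """Map a classified reply to an action. Nothing here sends mail directly."""
    if intent is Intent.INTERESTED:
        return "draft_response"  # book a meeting or send the requested info
    if intent is Intent.OUT_OF_OFFICE:
        return "ignore"
    if intent is Intent.UNSUBSCRIBE:
        suppression.add(email.lower())  # one global list, checked by every campaign
        log.append(f"unsubscribe:{email}")
        return "suppress"
    if intent is Intent.NEGATIVE:
        log.append(f"negative:{email}")  # kept for sequence analysis, no apology email
        return "log_only"
    return "escalate"  # unclear replies go to the human
```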
Wire this up by registering a webhook in the platform's settings pointing at your deployed agent's reply endpoint, with the event type set to email reply at the account level. When a reply comes in, the platform pings the webhook, the webhook hands the message to the agent's classifier, and the classifier decides whether to draft, ignore, or escalate.
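A minimal receiving end for that webhook, using only the standard library. The payload keys (`event_type`, `lead_email`, `reply_text`) and the `EMAIL_REPLY` value are assumptions; capture one real delivery from the platform and adjust the names to match. The parsing lives in a plain function so it can be tested without a running server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def handle_reply_event(body: bytes, classify: Callable[[str], str]) -> dict:
    """Parse one webhook delivery and hand the reply text to the classifier.

    Assumption: field names and the "EMAIL_REPLY" value are placeholders.
    """
    try:
        event = json.loads(body or b"{}")
    except ValueError:  # covers JSONDecodeError and bad encodings
        return {"status": "bad_request"}
    if not isinstance(event, dict) or event.get("event_type") != "EMAIL_REPLY":
        return {"status": "ignored"}
    decision = classify(event.get("reply_text", ""))
    return {"status": "ok", "lead": event.get("lead_email"), "decision": decision}


def make_handler(classify: Callable[[str], str]) -> type:
    class ReplyWebhook(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length") or 0)
            result = handle_reply_event(self.rfile.read(length), classify)
            payload = json.dumps(result).encode()
            self.send_response(400 if result["status"] == "bad_request" else 200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)
    return ReplyWebhook


def serve(classify: Callable[[str], str], port: int = 8080) -> None:
    """Blocking; call from your entrypoint, e.g. serve(my_classifier)."""
    HTTPServer(("0.0.0.0", port), make_handler(classify)).serve_forever()
```

This classifies inside the request for simplicity. If your classifier call is slow, acknowledge the webhook immediately and push the work onto a queue instead, so the platform does not time out and retry.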
In testing, send yourself a campaign, reply with something like "I'm interested, can you send more information," and watch the agent draft and send a contextual response inside a few minutes. If you see the round trip working end to end on a real inbox, the loop is closed.
Deploying without turning it into a DevOps project
Local builds are fine for development. They're useless for an autonomous system because the second your laptop sleeps, the agent stops responding to webhooks.
fly.io is a reasonable default host for this kind of small, always-on service. It's cheap, it handles secrets, and Claude Code can run the deploy commands directly once you've created the account and dropped in a credit card. You don't need to know the deployment commands by heart. Ask the model what stage you're on and what you need to do next, and have it execute the commands it can run on its own.
For secrets, you need at minimum:
The Smartlead API key (Settings, API Key Management inside the platform)
The Anthropic API key for the agent's reasoning calls (Console, API Keys under organization settings)
The webhook URL the platform will hit when replies come in
Keep all of these out of the repo. Store them as deployment secrets. When you push to GitHub, double-check that nothing API-key-shaped slipped into a config file.
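A small sketch of both habits: read secrets only from the environment (where fly.io deployment secrets end up, e.g. via `fly secrets set`) and fail at startup if any are missing, plus a rough scan for key-shaped strings before you push. The variable names are hypothetical, and the patterns are heuristics that catch common mistakes, not every key format.

```python
import os
import re
from collections.abc import Mapping

# Hypothetical variable names; set them as deployment secrets,
# e.g. `fly secrets set SMARTLEAD_API_KEY=...`, never in a committed file.
REQUIRED = ("SMARTLEAD_API_KEY", "ANTHROPIC_API_KEY", "REPLY_WEBHOOK_URL")

# Heuristic patterns only.
KEY_PATTERNS = (
    re.compile(r"sk-ant-[A-Za-z0-9_\-]{20,}"),  # Anthropic-style key prefix
    re.compile(r"api_key\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]", re.IGNORECASE),
)


def missing_secrets(env: Mapping[str, str]) -> list[str]:
    return [name for name in REQUIRED if not env.get(name)]


def load_secrets(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Fail fast at startup instead of on the first webhook."""
    missing = missing_secrets(env)
    if missing:
        raise RuntimeError("missing secrets: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}


def find_key_shaped(text: str) -> list[str]:
    """Return substrings that look like API keys; run over config files before pushing."""
    return [m.group(0) for p in KEY_PATTERNS for m in p.finditer(text)]
```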
For human-in-the-loop control, the cleanest v1 is a command-line chat interface that lets you talk to the deployed agent from your terminal. Slack integration is nice but adds a meaningful amount of configuration work. Ship the CLI first, add Slack when the agent is stable.
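The CLI can be very thin. This sketch assumes, hypothetically, that the deployed agent exposes a POST chat endpoint that accepts `{"message": ...}` and returns `{"reply": ...}`; adjust `http_sender` to whatever your agent actually exposes. The loop itself takes its transport and output as parameters so it can be tested offline.

```python
import json
import urllib.request
from typing import Callable, Iterable


def http_sender(url: str) -> Callable[[str], str]:
    """Hypothetical transport: POST {"message": ...}, expect {"reply": ...} back."""
    def send(message: str) -> str:
        req = urllib.request.Request(
            url,
            data=json.dumps({"message": message}).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=120) as resp:
            return json.load(resp)["reply"]
    return send


def chat(send: Callable[[str], str], lines: Iterable[str],
         emit: Callable[[str], None] = print) -> int:
    """Forward each input line to the agent and print its answer.

    Skips blank lines, stops at 'exit' or 'quit', returns the number of turns.
    """
    turns = 0
    for line in lines:
        message = line.strip()
        if message in {"exit", "quit"}:
            break
        if not message:
            continue
        emit(send(message))
        turns += 1
    return turns
```

Usage, with a placeholder URL: `chat(http_sender("https://<your-app>.fly.dev/chat"), sys.stdin)`.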
Running the first real lead generation test
Once the agent is deployed, the test that actually matters isn't a unit test. It's this prompt: tell the agent to study the product brief, define the ICP across role, title, company size, and industry, pull a sample list of ten contacts with verified emails, write a first message and follow-up sequence grounded in the offer, build the campaign in the platform, and stop short of launching.
If the agent comes back with a campaign object that contains real contacts, three copy variations, a sensible send schedule, and an inbox assignment, the system works. You inspect it, tweak what needs tweaking, and click launch yourself.
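Those acceptance criteria can also be checked mechanically before you open the campaign yourself. The dictionary keys below describe a hypothetical shape for the agent's campaign object, not the platform's schema; an empty result means the draft is ready for human review.

```python
def review_campaign(campaign: dict, expected_contacts: int = 10) -> list[str]:
    """Return problems that should block launch; [] means ready for human review.

    Keys ("contacts", "copy_variants", "schedule", "inboxes", "status") are
    a hypothetical shape for the agent's output.
    """
    problems = []
    contacts = campaign.get("contacts", [])
    if len(contacts) < expected_contacts:
        problems.append(f"only {len(contacts)} of {expected_contacts} contacts")
    if any(not c.get("email_verified") for c in contacts):
        problems.append("unverified emails present")
    if len(campaign.get("copy_variants", [])) < 3:
        problems.append("fewer than 3 copy variations")
    if not campaign.get("schedule"):
        problems.append("no send schedule")
    if not campaign.get("inboxes"):
        problems.append("no inbox assigned")
    if campaign.get("status") == "launched":
        problems.append("already launched; v1 should stop before launch")
    return problems
```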
This is the right level of autonomy for v1. The agent does the eight hours of operator work. You spend ten minutes approving the output. Over time, as you tighten the brief and improve the copy training, you can move more of the launch decision into the agent itself.
Where to push it next
The skeleton above is the floor, not the ceiling. The obvious upgrades from here:
Better copywriting training. Feed the agent your best-performing historical sequences as examples. Generic AI copy converts at generic AI rates.
Smarter reply classification. Move beyond positive/negative/OOO into intent buckets that map to specific draft templates.
Self-service provisioning. Once you have support access to the inbox purchase endpoint, let the agent buy and warm up new sending assets without human approval below a spend threshold.
Cross-campaign learning. Have the agent compare reply rates across ICP slices and recommend tightening or broadening the targeting.
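For the self-service provisioning upgrade, the spend threshold can start as a single check the agent runs before any purchase call. This sketch assumes, as a design choice rather than anything the platform enforces, that the threshold applies to month-to-date spend rather than per order, so many small purchases cannot add up past it silently. All names and prices are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurchaseRequest:
    item: str         # e.g. "inbox" or "domain"
    quantity: int
    unit_cost: float  # hypothetical price, in your billing currency


def needs_approval(req: PurchaseRequest, spent_this_month: float, threshold: float) -> bool:
    """True when this purchase would push month-to-date spend past the threshold."""
    if req.quantity <= 0:
        raise ValueError("quantity must be positive")
    return spent_this_month + req.quantity * req.unit_cost > threshold
```

Anything that returns `True` goes to the human via the CLI or Slack; everything else the agent can buy and queue for warm-up on its own.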
None of those require rebuilding the foundation. They're additions to a system that already runs.
The point of all of this isn't that AI replaces the outbound operator. It's that the operator stops spending their week on infrastructure babysitting, list pulling, and replying to "sounds interesting, tell me more" emails, and starts spending it on the parts of lead generation that actually compound: positioning, offer design, and the conversations that turn into revenue.