Building Your First Agent with the Claude API

This lesson walks through building a working agent end to end. We will build a small reconnaissance agent that can search the web and summarise findings on a target. You will see every component of the ReAct loop wired up with real code, and you will see the failure modes you have to handle in production. By the end you will have something that runs and a clear map of where the sharp edges live.

We are using Python for clarity. The Anthropic SDKs follow a similar shape across languages, so the architecture transfers directly to TypeScript.

The skeleton

Every agent has the same shape. Here is the entire control flow in under eighty lines:

import anthropic
from dataclasses import dataclass, field
 
client = anthropic.Anthropic()
 
MAX_ITERATIONS = 10
MODEL = "claude-opus-4-5"
SYSTEM_PROMPT = """You are a reconnaissance assistant. You help a security
researcher gather public information about a target. Use the provided tools
to search the web and read pages. Be thorough but concise. When you have
enough information to answer the user, respond with your findings.
 
Do not make assumptions about the target. Only state what you have verified
through tool calls.
"""
 
@dataclass
class AgentState:
    messages: list = field(default_factory=list)
    iteration: int = 0
    tool_call_count: int = 0
 
def run_agent(user_query: str, tools: list, tool_handlers: dict) -> str:
    state = AgentState()
    state.messages.append({"role": "user", "content": user_query})
 
    while state.iteration < MAX_ITERATIONS:
        state.iteration += 1
 
        response = client.messages.create(
            model=MODEL,
            max_tokens=4096,
            system=SYSTEM_PROMPT,
            tools=tools,
            messages=state.messages,
        )
 
        # Append the assistant turn to history so the model sees its own moves
        state.messages.append({"role": "assistant", "content": response.content})
 
        if response.stop_reason == "end_turn":
            # Model is done. Extract the final text.
            text_blocks = [b for b in response.content if b.type == "text"]
            return "".join(b.text for b in text_blocks)
 
        if response.stop_reason == "tool_use":
            tool_results = []
            for block in response.content:
                if block.type != "tool_use":
                    continue
                state.tool_call_count += 1
                result = dispatch_tool(block, tool_handlers)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
            state.messages.append({"role": "user", "content": tool_results})
            continue
 
        # Any other stop reason is unexpected. Stop cleanly.
        return f"[agent halted: {response.stop_reason}]"
 
    return "[agent halted: max iterations reached]"
 
 
def dispatch_tool(block, handlers: dict) -> str:
    handler = handlers.get(block.name)
    if not handler:
        return f"[error: unknown tool {block.name}]"
    try:
        return handler(**block.input)
    except Exception as e:
        # Always return a string to the model. Never crash the loop.
        return f"[error: {type(e).__name__}: {e}]"

Read that twice. The model produces messages. Some messages are final answers (stop_reason == "end_turn"). Some are tool calls (stop_reason == "tool_use"). Tool calls get executed, results get appended to the message list, and the loop continues. There is no magic here, only a state machine.
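To make the state machine concrete, here is a sketch of what state.messages could look like after one tool round trip, written as plain dicts (the SDK hands back content block objects, but the shape is the same). The id and text values are invented, and check_history is a hypothetical helper, not part of the SDK. It verifies the pairing the loop relies on: every tool_use block is answered by a tool_result with the same id in the very next message.

```python
# Illustrative history after one tool round trip. The id and the text values
# are invented; real ids are generated by the API.
history = [
    {"role": "user", "content": "Find recent public CVEs affecting nginx."},
    {"role": "assistant", "content": [
        {"type": "text", "text": "I will search first."},
        {"type": "tool_use", "id": "toolu_example_1", "name": "search_web",
         "input": {"query": "nginx CVE 2025", "limit": 5}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_example_1",
         "content": "result text goes here"},
    ]},
]


def check_history(messages: list) -> list:
    """Return ids of tool_use blocks that have no matching tool_result
    in the message that immediately follows (hypothetical helper)."""
    unanswered = []
    for i, msg in enumerate(messages):
        if msg["role"] != "assistant" or isinstance(msg["content"], str):
            continue
        wanted = [b["id"] for b in msg["content"] if b["type"] == "tool_use"]
        nxt = messages[i + 1]["content"] if i + 1 < len(messages) else []
        if isinstance(nxt, str):
            nxt = []
        answered = {b["tool_use_id"] for b in nxt if b["type"] == "tool_result"}
        unanswered.extend(t for t in wanted if t not in answered)
    return unanswered
```

Running a check like this on a saved history is a quick way to spot a loop that dropped a tool result before the next model call.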

Defining tools

Tools are dictionaries with a name, a description, and an input schema. The description is what the model uses to decide when to call the tool. Be specific.

TOOLS = [
    {
        "name": "search_web",
        "description": (
            "Search the public web using DuckDuckGo. Returns the top results "
            "with title, url, and snippet. Use this for general information "
            "gathering. Do not use this for accessing internal company "
            "systems."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query. Be specific.",
                },
                "limit": {
                    "type": "integer",
                    "default": 5,
                    "minimum": 1,
                    "maximum": 10,
                },
            },
            "required": ["query"],
        },
    },
    {
        "name": "fetch_page",
        "description": (
            "Fetch a public URL and return the text content. Use this when "
            "search results have a promising URL you want to read in full. "
            "Returns up to 8000 characters of text."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "Full HTTP or HTTPS URL.",
                },
            },
            "required": ["url"],
        },
    },
]

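Notice the constraints: the schema declares limit between 1 and 10 and marks url as required. The model usually respects these, which makes the schema a cheap first layer of defense. It is guidance rather than a hard guarantee, though: a confused or manipulated model can still emit limit=10000 or omit the URL, so re-check anything that matters in the handler.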
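The same bounds can be re-applied on the handler side. A minimal sketch, assuming a hypothetical validate_input helper that understands only the schema keywords used in TOOLS (required, type, minimum, maximum); a fuller implementation would more likely use a JSON Schema validator library.

```python
# Hypothetical helper: checks only the keywords used in this lesson's schemas.
_TYPES = {"string": str, "integer": int}


def validate_input(schema: dict, args: dict) -> list:
    """Return a list of human-readable problems; empty means the input is OK."""
    problems = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required field '{name}'")
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            problems.append(f"unexpected field '{name}'")
            continue
        expected = _TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly
        if expected and (not isinstance(value, expected) or isinstance(value, bool)):
            problems.append(f"'{name}' must be of type {spec['type']}")
            continue
        if "minimum" in spec and value < spec["minimum"]:
            problems.append(f"'{name}' must be >= {spec['minimum']}")
        if "maximum" in spec and value > spec["maximum"]:
            problems.append(f"'{name}' must be <= {spec['maximum']}")
    return problems


# Same shape as the search_web input schema, trimmed for the example
search_schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer", "minimum": 1, "maximum": 10},
    },
    "required": ["query"],
}
```

Returning the problem list to the model as the tool result gives it enough detail to correct the call on its next turn.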

The handler functions:

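import ipaddress
import socket
import requests
from urllib.parse import urlparse

def handle_search_web(query: str, limit: int = 5) -> str:
    # Real implementation would call a search API. This is illustrative and
    # uses the DuckDuckGo Instant Answer endpoint, which returns sparse results.
    limit = max(1, min(int(limit), 10))  # re-apply the schema bounds
    resp = requests.get(
        "https://api.duckduckgo.com/",
        params={"q": query, "format": "json"},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    results = data.get("Results", [])[:limit]
    return "\n".join(f"{r.get('Text', '')} - {r.get('FirstURL', '')}" for r in results)

def is_public_host(hostname: str) -> bool:
    # Resolve the name and require every resolved address to be globally routable
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False
    addrs = [info[4][0].split("%")[0] for info in infos]
    return bool(addrs) and all(ipaddress.ip_address(a).is_global for a in addrs)

def handle_fetch_page(url: str) -> str:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return "[error: only http and https URLs are allowed]"
    # Block private/internal addresses (basic SSRF protection). Check the
    # resolved IPs, not substrings of the URL.
    if not is_public_host(parsed.hostname):
        return "[error: private addresses not allowed]"
    # Do not follow redirects blindly: a public page can redirect to an internal one
    resp = requests.get(url, timeout=15, allow_redirects=False,
                        headers={"User-Agent": "ReconAgent/1.0"})
    if resp.is_redirect:
        return f"[redirect: {resp.headers.get('Location')}]"
    resp.raise_for_status()
    return resp.text[:8000]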
 
TOOL_HANDLERS = {
    "search_web": handle_search_web,
    "fetch_page": handle_fetch_page,
}

The handlers are normal Python functions. They take typed arguments, do their work, and return a string. Note the SSRF guard in fetch_page: the model will try to fetch http://localhost/ or http://169.254.169.254/latest/meta-data/ if an attacker can influence the URL. You stop that at the handler level, not in the prompt.

Running it

if __name__ == "__main__":
    result = run_agent(
        "Find recent public CVEs affecting nginx in 2025 and summarise the top 3.",
        tools=TOOLS,
        tool_handlers=TOOL_HANDLERS,
    )
    print(result)

When you run this, the agent will typically call search_web once or twice, possibly call fetch_page on a promising result, then synthesise a summary. Every turn is recorded in state.messages. Because state is local to run_agent, return or persist it if you want to replay the entire trajectory after the fact.
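Replaying a trajectory means getting the history into a serialisable form. A minimal sketch, assuming run_agent were changed to return or expose state.messages: the hypothetical dump_trajectory helper converts each content block to plain data, calling model_dump() when the block is an SDK object (on the assumption that SDK response types are Pydantic models) and passing dicts through unchanged.

```python
import json


def to_plain(block):
    """Convert one content block to JSON-serialisable data."""
    if hasattr(block, "model_dump"):   # SDK object (assumed Pydantic model)
        return block.model_dump()
    return block                       # already a dict


def dump_trajectory(messages: list) -> str:
    """Serialise a message history as one JSON document per line."""
    lines = []
    for msg in messages:
        content = msg["content"]
        if isinstance(content, list):
            content = [to_plain(b) for b in content]
        lines.append(json.dumps({"role": msg["role"], "content": content}))
    return "\n".join(lines)


# Stand-in history made of plain dicts
sample = [
    {"role": "user", "content": "q"},
    {"role": "assistant", "content": [{"type": "text", "text": "hi"}]},
]
dumped = dump_trajectory(sample)
```

One record per line keeps the file appendable, so a long-running agent can write each turn as it happens.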

The failure modes you have to handle

Five things will go wrong, in roughly this order of frequency.

1. Tool errors. The web is unreliable. APIs rate-limit you, pages return 500s, JSON parsers choke on bad data. Every tool call must have its exceptions caught, either in the handler or, as dispatch_tool does here, in the dispatcher, and the error must go back to the model as a string. Never let an exception kill the loop. The model is surprisingly good at recovering from a string like [error: rate limit; retry after 30s]. It will usually choose a different action.

2. Token overflow. Long tool results balloon the context. A page fetch returning 50KB of HTML gets appended verbatim, and every later iteration costs more because the full history is resent on each call. Truncate tool outputs at the handler level (notice the [:8000] slice). For long-running agents, add a summarisation step that compresses old turns when the context gets large.

3. Infinite loops. The model decides it does not have enough information, calls another search, gets more results, decides it still does not have enough, and so on. Without MAX_ITERATIONS you are at the mercy of the model's patience. Ten iterations is a reasonable starting bound. Adjust based on the task.

4. Stochastic refusals. Sometimes the model decides the request looks suspicious and refuses to use tools. It will produce a polite "I cannot help with that" and stop with end_turn. For legitimate use, you can usually tighten the system prompt to clarify intent. For testing, run the same prompt several times before concluding a capability is unavailable, since sampling varies from run to run.

5. Malformed tool calls. Rare with frontier models, but a less capable model can produce a tool call that fails schema validation or passes weird types. Do not rely on the SDK to reject these: the loop above hands block.input to your handler as is. Wrap parameter binding in a try/except, as dispatch_tool does by calling the handler inside its try block, and return an informative error to the model so it can correct.
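Several of these failure modes can be handled in one place. The sketch below is one possible hardened dispatcher, not the lesson's dispatch_tool; the names safe_dispatch and MAX_RESULT_CHARS are invented here. It binds arguments with inspect.signature so a malformed call produces a readable error, catches handler exceptions, truncates long output, and sets the is_error flag on the tool_result block so the model can tell a failure from data.

```python
import inspect

MAX_RESULT_CHARS = 8000  # assumed budget, matching the fetch_page slice


def safe_dispatch(name: str, tool_use_id: str, tool_input: dict, handlers: dict) -> dict:
    """Run one tool call and always return a tool_result block."""
    def result(text: str, is_error: bool = False) -> dict:
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": text[:MAX_RESULT_CHARS],
            "is_error": is_error,
        }

    handler = handlers.get(name)
    if handler is None:
        return result(f"unknown tool {name}", is_error=True)
    try:
        # Raises TypeError on missing or unexpected arguments
        bound = inspect.signature(handler).bind(**tool_input)
    except TypeError as e:
        return result(f"bad arguments for {name}: {e}", is_error=True)
    try:
        return result(str(handler(*bound.args, **bound.kwargs)))
    except Exception as e:
        return result(f"{type(e).__name__}: {e}", is_error=True)


# Toy handlers used only to exercise the dispatcher
def echo(query: str, limit: int = 5) -> str:
    return f"{query}:{limit}"


def flaky(url: str) -> str:
    raise TimeoutError("upstream timed out")


handlers = {"echo": echo, "flaky": flaky}
```

In the loop, this would be called as safe_dispatch(block.name, block.id, block.input, tool_handlers) and the returned dict appended to tool_results directly.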

Streaming for production UX

The blocking example above is fine for scripts and CLIs. For interactive UIs, you want streaming so the user sees output as it is generated. The Anthropic SDK supports it:

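def stream_turn(messages: list):
    # Generator: yields text chunks for the UI, then returns the full message
    with client.messages.stream(
        model=MODEL,
        max_tokens=4096,
        system=SYSTEM_PROMPT,
        tools=TOOLS,
        messages=messages,
    ) as stream:
        for event in stream:
            # event types: message_start, content_block_start,
            # content_block_delta, content_block_stop, message_delta, message_stop
            # Only text deltas carry .text; tool-call arguments arrive as
            # input_json_delta events and are skipped here.
            if event.type == "content_block_delta" and event.delta.type == "text_delta":
                yield event.delta.text  # send to UI
        final_message = stream.get_final_message()
    # Callers append final_message.content to history and check stop_reason,
    # exactly as in run_agent
    return final_message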

Streaming complicates error handling because you can fail partway through. Your UI has to handle a half-streamed response gracefully. For a first agent, build the blocking version, get it working, then add streaming as a wrapper.
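The partial-failure case can be isolated from the SDK. A minimal sketch, assuming the text chunks arrive from any iterator (for example, a generator wrapping the stream above): the hypothetical relay_stream helper forwards each chunk to the UI and, if the source fails midway, reports what was already shown instead of raising.

```python
from typing import Callable, Iterable, Tuple


def relay_stream(chunks: Iterable[str], send: Callable[[str], None]) -> Tuple[str, bool]:
    """Forward chunks to the UI. Returns (text_shown_so_far, completed)."""
    shown = []
    try:
        for chunk in chunks:
            send(chunk)
            shown.append(chunk)
    except Exception as e:
        # Tell the user the response was cut off rather than leaving it dangling
        send(f"\n[response interrupted: {type(e).__name__}]")
        return "".join(shown), False
    return "".join(shown), True


def broken_source():
    # Stand-in for a stream that drops after two chunks
    yield "Found "
    yield "two CVEs"
    raise ConnectionError("stream dropped")


ui = []  # stand-in for the UI: collects everything that was sent
text, completed = relay_stream(broken_source(), ui.append)
```

When completed is False, the caller can decide whether to retry the turn. Keeping the partial text out of the message history avoids feeding the model half a response.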

Security posture from day one

Bake in two security defaults before you ever ship.

First, never let user input influence tool parameters without validation. The user query goes into the user message slot. The model decides what to put in the tool call. But the model can be tricked. So every tool handler validates its own inputs at the function level. The schema is layer one. The handler check is layer two.

Second, every tool call gets logged with the user ID and the full input. A successful injection attack will produce a tool call that does not match what the user asked for. The audit log is what lets you detect that after the fact and improve your defenses.
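A minimal sketch of such an audit record, assuming JSON Lines written to any file-like object. The field names and the log_tool_call helper are illustrative, not a standard format.

```python
import io
import json
import time


def log_tool_call(sink, user_id: str, tool_name: str, tool_input: dict) -> dict:
    """Append one audit record per tool call (hypothetical format)."""
    record = {
        "ts": time.time(),        # wall-clock time of the call
        "user_id": user_id,
        "tool": tool_name,
        "input": tool_input,      # full input, so injected calls are visible later
    }
    sink.write(json.dumps(record, sort_keys=True) + "\n")
    return record


sink = io.StringIO()  # stand-in for an append-only log file
log_tool_call(sink, "user-123", "fetch_page", {"url": "https://example.com/"})
logged = json.loads(sink.getvalue().splitlines()[0])
```

Writing the record before the handler runs means a call that crashes or hangs still leaves a trace.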

The lab linked at the top of this lesson has the full code, a Dockerfile, and exercises. Run it, then break it. The next lesson covers memory, which is where the architecture gets harder.