
Build an Autonomous Research Agent with Claude: Web Search, PDF Downloads, and Conversational Memory

Learn to build a fully autonomous AI research agent using the Anthropic Messages API—with web search via Tavily, PDF downloading, multi-turn memory, and a Streamlit UI. Step-by-step from zero to production.

March 14, 2026
20 min read
Tags: Claude, Anthropic, AI Agents, Tool Use, Tavily, Streamlit, Python, Autonomous Agent

Last week I spent three hours researching "how the body digests protein": I read ten papers, downloaded PDFs, took notes, and synthesized the findings. This week I built an autonomous agent that does the same thing in five minutes, without my involvement. Here's how it works and how you can build it yourself.

What the agent does:
  • Searches the web for research papers (Tavily API)
  • Downloads PDFs automatically
  • Reads and analyzes documents
  • Answers follow-up questions with memory
  • Runs a Streamlit web UI
What you'll learn:
  • The correct Anthropic Messages API + Tool Use pattern
  • How to implement conversational memory manually
  • How to build the agentic loop (tool → observe → act → repeat)
  • How to add a Streamlit UI with PDF viewer

Clearing Up a Common Misconception

Before writing a single line of code, let's address the most common mistake in "Claude Agent" tutorials.

❌ This Does NOT Exist
# This code will throw AttributeError
agent = Anthropic().agents.create(
  name="Research Agent",
  memory=True
)
session = agent.create_session()
response = session.query("How...")
The Anthropic Python SDK has NO agents attribute, no create_session(), and no query(). This is a fictional API.
✅ The Real API
# This is what actually works
client = Anthropic()

response = client.messages.create(
  model="claude-sonnet-4-6",
  max_tokens=4096,
  system="You are a research assistant",
  tools=[web_search_tool],
  messages=conversation_history
)
The real pattern: client.messages.create() with a manually managed conversation_history list and a tools array.

The key insight: There is no magic "agent" object. Autonomy comes from a loop you write yourself—calling messages.create(), reading the response, executing tools, feeding results back, and repeating until Claude stops requesting tools.
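That loop fits in about twenty lines. Here is a minimal sketch of its control flow using a stubbed client, so you can run it without an API key. FakeClient, FakeResponse, and FakeBlock are illustrative stand-ins for the SDK's objects (and the fake flattens client.messages.create to client.create for brevity); only the run_agent logic mirrors the real pattern.

```python
import json

# Stand-ins for the real SDK objects — illustrative, not part of the anthropic package.
class FakeBlock:
    def __init__(self, type, **kw):
        self.type = type
        self.__dict__.update(kw)

class FakeResponse:
    def __init__(self, stop_reason, content):
        self.stop_reason, self.content = stop_reason, content

class FakeClient:
    """First call requests a tool; second call returns text."""
    def __init__(self):
        self.calls = 0
    def create(self, messages):
        self.calls += 1
        if self.calls == 1:
            return FakeResponse("tool_use", [FakeBlock(
                "tool_use", id="t1", name="web_search", input={"query": "protein"})])
        return FakeResponse("end_turn", [FakeBlock("text", text="Done: protein digestion...")])

def run_agent(client, tools, history):
    """The agentic loop: call, execute tools, feed results back, repeat."""
    while True:
        response = client.create(messages=history)
        if response.stop_reason != "tool_use":
            return next(b.text for b in response.content if b.type == "text")
        # Record the assistant turn (including tool_use blocks), then the results.
        history.append({"role": "assistant", "content": response.content})
        results = [{"type": "tool_result", "tool_use_id": b.id,
                    "content": json.dumps(tools[b.name](**b.input))}
                   for b in response.content if b.type == "tool_use"]
        history.append({"role": "user", "content": results})

history = [{"role": "user", "content": "Research protein digestion"}]
tools = {"web_search": lambda query: {"results": [f"paper about {query}"]}}
print(run_agent(FakeClient(), tools, history))  # → Done: protein digestion...
```

After the run, history holds three entries: the original request, the assistant's tool_use turn, and the tool results. That growth pattern is exactly what the real loop produces.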


Architecture: The Four Components

┌───────────────────────────────────────────────────────────────────┐
│                     THE RESEARCH AGENT                            │
│                                                                   │
│  ┌─────────────────┐         ┌──────────────────────────────┐    │
│  │  SYSTEM PROMPT  │         │     CONVERSATION HISTORY     │    │
│  │                 │         │                              │    │
│  │ "You are an     │         │  [{role: "user",             │    │
│  │  expert         │         │    content: "Research..."},  │    │
│  │  research       │         │   {role: "assistant",        │    │
│  │  assistant..."  │         │   content: [tool_use_block]},│    │
│  └─────────────────┘         │   {role: "user",             │    │
│                              │    content: [tool_result]},  │    │
│  ┌─────────────────┐         │   {role: "assistant",        │    │
│  │  TOOLS          │         │    content: "Based on..."}]  │    │
│  │                 │         └──────────────────────────────┘    │
│  │ • web_search    │                                             │
│  │ • download_pdf  │              ↕ Passed to every API call     │
│  │ • read_file     │                                             │
│  └─────────────────┘                                             │
│                                                                   │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                  THE AGENTIC LOOP                          │  │
│  │                                                            │  │
│  │  messages.create() → stop_reason == "tool_use"?           │  │
│  │       YES → execute tool → add result → call again        │  │
│  │       NO  → extract text response → return to user        │  │
│  └────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────┘

Step 1: Basic Setup and Stateless Queries

Installation

pip install anthropic tavily-python requests streamlit python-dotenv

Create a .env file (add to .gitignore):

ANTHROPIC_API_KEY=sk-ant-...
TAVILY_API_KEY=tvly-...

Your First Working Claude Query

# step1_basic.py
from anthropic import Anthropic
from dotenv import load_dotenv
import os

load_dotenv()

client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

def query_claude(prompt: str, system: str = "You are a helpful research assistant.") -> str:
    """Single stateless query — no memory between calls."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system=system,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

if __name__ == "__main__":
    result = query_claude("How does the human body digest protein? Explain the key enzymes involved.")
    print(result)

This works, but has a critical limitation: every call is independent. Ask a follow-up and Claude has no idea what you were discussing. That's where memory comes in.


Step 2: Conversational Memory

Memory in Claude is not a setting—it's a list you maintain. Every turn, you append both the user message and the assistant's response to conversation_history, then pass the entire list on the next API call.

# step2_memory.py
from anthropic import Anthropic
from dotenv import load_dotenv
import os

load_dotenv()
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

SYSTEM_PROMPT = """You are an expert research assistant specializing in biology and biochemistry.
When answering questions, be thorough and scientific. Reference previous context in the conversation."""

class ResearchAgent:
    """Research agent with multi-turn conversational memory."""

    def __init__(self, system: str = SYSTEM_PROMPT):
        self.system = system
        self.conversation_history: list[dict] = []

    def chat(self, user_message: str) -> str:
        """Send a message and maintain conversation history."""
        # Add user message to history
        self.conversation_history.append({
            "role": "user",
            "content": user_message
        })

        # Call API with full history
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            system=self.system,
            messages=self.conversation_history  # Full history every time
        )

        assistant_text = response.content[0].text

        # Add assistant response to history
        self.conversation_history.append({
            "role": "assistant",
            "content": assistant_text
        })

        return assistant_text

    def clear(self):
        """Start a new conversation."""
        self.conversation_history = []

    def token_count(self) -> int:
        """Rough estimate of tokens in history."""
        total_chars = sum(
            len(str(msg["content"])) for msg in self.conversation_history
        )
        return total_chars // 4  # ~4 chars per token


if __name__ == "__main__":
    agent = ResearchAgent()

    # Turn 1
    r1 = agent.chat("How does the body digest protein?")
    print(f"Turn 1:\n{r1}\n{'='*60}\n")

    # Turn 2 — Claude remembers the context from Turn 1
    r2 = agent.chat("What specific enzymes break down the peptide bonds you mentioned?")
    print(f"Turn 2:\n{r2}\n{'='*60}\n")

    # Turn 3 — Even deeper follow-up
    r3 = agent.chat("What happens if someone has a protease deficiency?")
    print(f"Turn 3:\n{r3}")
Why this works
On Turn 2, you're sending the full history: [user: "How does..."], [assistant: "Protein digestion involves..."], [user: "What specific enzymes..."]. Claude sees everything, so "the enzymes you mentioned" makes perfect sense.
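Concretely, after two chat() calls the list is just alternating role dicts (contents abbreviated here):

```python
# What conversation_history holds after two turns — a plain Python list.
conversation_history = [
    {"role": "user",      "content": "How does the body digest protein?"},
    {"role": "assistant", "content": "Protein digestion involves pepsin..."},
    {"role": "user",      "content": "What specific enzymes break down the peptide bonds you mentioned?"},
    {"role": "assistant", "content": "The key proteases are pepsin, trypsin..."},
]

# Turns alternate between "user" and "assistant", starting with "user".
roles = [m["role"] for m in conversation_history]
assert roles == ["user", "assistant"] * 2
```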

Step 3: Tool Use — Web Search and PDF Download

Now we give the agent real capabilities. Claude's tool use works via function calling: you define tool schemas in JSON Schema format, Claude decides when to call them, and you execute the actual code.

Define the Tools

# tools.py
from tavily import TavilyClient
import requests
import os
import json
from dotenv import load_dotenv

load_dotenv()
tavily = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))

# --- TOOL SCHEMAS (what Claude sees) ---

WEB_SEARCH_TOOL = {
    "name": "web_search",
    "description": (
        "Search the web for research papers and scientific articles. "
        "Use this when you need current information, recent studies, or "
        "specific papers on a topic. Prefer academic sources."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search query — be specific, include key terms"
            },
            "search_depth": {
                "type": "string",
                "enum": ["basic", "advanced"],
                "description": "'advanced' returns more academic results"
            },
            "max_results": {
                "type": "integer",
                "description": "Number of results (1-10, default 5)"
            }
        },
        "required": ["query"]
    }
}

DOWNLOAD_PDF_TOOL = {
    "name": "download_pdf",
    "description": (
        "Download a PDF from a URL and save it locally. "
        "Use this when you find a PDF link in search results that contains "
        "relevant research you want to analyze in depth."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "url": {
                "type": "string",
                "description": "Direct URL to the PDF file"
            },
            "topic": {
                "type": "string",
                "description": "Research topic (used for folder organization)"
            }
        },
        "required": ["url", "topic"]
    }
}

# --- TOOL IMPLEMENTATIONS (what your code actually runs) ---

def execute_web_search(query: str, search_depth: str = "advanced", max_results: int = 5) -> dict:
    """Execute web search using Tavily API."""
    try:
        results = tavily.search(
            query=query,
            search_depth=search_depth,
            max_results=max_results,
            include_answer=True,  # Get Tavily's synthesized answer too
        )
        return {
            "success": True,
            "answer": results.get("answer", ""),
            "results": [
                {
                    "title": r.get("title", ""),
                    "url": r.get("url", ""),
                    "content": r.get("content", "")[:500],  # First 500 chars
                    "score": r.get("score", 0),
                }
                for r in results.get("results", [])
            ]
        }
    except Exception as e:
        return {"success": False, "error": str(e)}


def execute_download_pdf(url: str, topic: str) -> dict:
    """Download PDF from URL and save locally."""
    try:
        # Create topic-specific folder
        safe_topic = "".join(c for c in topic if c.isalnum() or c in " -_").strip()
        folder = os.path.join("papers", safe_topic[:50])
        os.makedirs(folder, exist_ok=True)

        # Derive filename from URL
        filename = url.split("/")[-1].split("?")[0]
        if not filename.endswith(".pdf"):
            filename += ".pdf"
        filepath = os.path.join(folder, filename)

        # Download with timeout and size limit (50MB)
        response = requests.get(url, timeout=30, stream=True)
        response.raise_for_status()

        content_type = response.headers.get("content-type", "")
        if "pdf" not in content_type and not url.endswith(".pdf"):
            return {"success": False, "error": f"URL does not appear to be a PDF (content-type: {content_type})"}

        size = 0
        too_large = False
        with open(filepath, "wb") as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
                size += len(chunk)
                if size > 50 * 1024 * 1024:
                    too_large = True
                    break  # stop writing; clean up below
        if too_large:
            os.remove(filepath)  # don't leave a truncated PDF on disk
            return {"success": False, "error": "File too large (>50MB)"}

        return {
            "success": True,
            "filepath": filepath,
            "size_kb": size // 1024,
            "message": f"PDF saved to {filepath}"
        }
    except requests.exceptions.Timeout:
        return {"success": False, "error": "Download timed out after 30s"}
    except Exception as e:
        return {"success": False, "error": str(e)}


# Dispatch table — maps tool name to implementation
TOOL_REGISTRY = {
    "web_search": execute_web_search,
    "download_pdf": execute_download_pdf,
}

ALL_TOOLS = [WEB_SEARCH_TOOL, DOWNLOAD_PDF_TOOL]
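The path logic inside execute_download_pdf is worth checking in isolation, since a bad folder name or filename fails silently. Here it is extracted into two small helpers (safe_folder and pdf_filename are my names for this sketch, not part of tools.py) so you can test it without any network access:

```python
def safe_folder(topic: str) -> str:
    """Strip characters that are unsafe in folder names, as execute_download_pdf does."""
    safe_topic = "".join(c for c in topic if c.isalnum() or c in " -_").strip()
    return safe_topic[:50]

def pdf_filename(url: str) -> str:
    """Derive a .pdf filename from a URL, dropping any query string."""
    filename = url.split("/")[-1].split("?")[0]
    if not filename.endswith(".pdf"):
        filename += ".pdf"
    return filename

print(safe_folder("protein digestion: enzymes"))                    # → protein digestion enzymes
print(pdf_filename("https://example.org/papers/digest.pdf?dl=1"))   # → digest.pdf
```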

The Agentic Loop

This is the core of autonomous behavior. The loop continues as long as Claude's stop_reason is "tool_use":

# step3_agent_with_tools.py
from anthropic import Anthropic
from tools import ALL_TOOLS, TOOL_REGISTRY
from dotenv import load_dotenv
import os
import json

load_dotenv()
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

RESEARCH_SYSTEM = """You are an autonomous research assistant with access to web search and PDF downloading.

When given a research topic:
1. Search for recent, authoritative papers and articles
2. Download any PDFs that look especially relevant
3. Synthesize what you find into a clear, well-cited summary
4. Be specific about where each piece of information comes from

Work autonomously — use your tools proactively without waiting to be asked."""

class AutonomousResearchAgent:
    """Research agent with tool use and agentic loop."""

    def __init__(self):
        self.conversation_history: list[dict] = []
        self.tools_used: list[str] = []

    def _execute_tool(self, tool_name: str, tool_input: dict) -> str:
        """Execute a tool and return its result as JSON string."""
        if tool_name not in TOOL_REGISTRY:
            return json.dumps({"error": f"Unknown tool: {tool_name}"})

        print(f"  🔧 [{tool_name}] {json.dumps(tool_input)[:100]}...")
        result = TOOL_REGISTRY[tool_name](**tool_input)
        self.tools_used.append(tool_name)
        return json.dumps(result)

    def research(self, topic: str) -> str:
        """
        Run autonomous research on a topic.
        The agentic loop continues until Claude stops requesting tools.
        """
        self.conversation_history.append({
            "role": "user",
            "content": f"Research this topic thoroughly: {topic}"
        })

        print(f"\n🔬 Researching: {topic}")

        iteration = 0
        max_iterations = 10  # Safety limit to prevent runaway loops

        while iteration < max_iterations:
            iteration += 1

            response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=4096,
                system=RESEARCH_SYSTEM,
                tools=ALL_TOOLS,
                messages=self.conversation_history
            )

            if response.stop_reason == "end_turn":
                # Claude is done — extract the text response
                final_text = next(
                    (block.text for block in response.content if hasattr(block, "text")),
                    "Research complete."
                )
                self.conversation_history.append({
                    "role": "assistant",
                    "content": final_text
                })
                return final_text

            elif response.stop_reason == "tool_use":
                # Claude wants to call one or more tools
                # First, record the full assistant message (including tool_use blocks)
                self.conversation_history.append({
                    "role": "assistant",
                    "content": response.content  # Keep as list of content blocks
                })

                # Execute all tool calls in this response
                tool_results = []
                for block in response.content:
                    if block.type == "tool_use":
                        tool_result = self._execute_tool(block.name, block.input)
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": tool_result
                        })

                # Add all tool results in a single user message
                self.conversation_history.append({
                    "role": "user",
                    "content": tool_results
                })

            else:
                # Unexpected stop reason
                break

        return "Research loop exceeded maximum iterations. Partial results may be available."

    def follow_up(self, question: str) -> str:
        """Ask a follow-up question after initial research."""
        self.conversation_history.append({
            "role": "user",
            "content": question
        })

        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            system=RESEARCH_SYSTEM,
            tools=ALL_TOOLS,
            messages=self.conversation_history
        )

        # Handle tool use in follow-ups too (bounded, like the main loop)
        iterations = 0
        while response.stop_reason == "tool_use" and iterations < 10:
            iterations += 1
            self.conversation_history.append({
                "role": "assistant",
                "content": response.content
            })
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = self._execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })
            self.conversation_history.append({"role": "user", "content": tool_results})
            response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=2048,
                system=RESEARCH_SYSTEM,
                tools=ALL_TOOLS,
                messages=self.conversation_history
            )

        answer = next(
            (block.text for block in response.content if hasattr(block, "text")),
            "No answer."
        )
        self.conversation_history.append({"role": "assistant", "content": answer})
        return answer


if __name__ == "__main__":
    agent = AutonomousResearchAgent()

    # Autonomous research
    summary = agent.research("protein digestion enzymes and proteolysis mechanisms")
    print(f"\n📊 RESEARCH SUMMARY:\n{summary}")
    print(f"\nTools used: {agent.tools_used}")

    # Conversational follow-up (agent remembers everything above)
    answer = agent.follow_up("What's the clinical relevance of protease inhibitors?")
    print(f"\n💬 FOLLOW-UP:\n{answer}")

Step 4: Memory Compression for Long Sessions

As research sessions grow, conversation_history can hit Claude's context limit. Compress old turns into a summary while keeping recent turns verbatim:

# memory.py
from anthropic import Anthropic
from dotenv import load_dotenv
import os

load_dotenv()
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

def compress_history(
    history: list[dict],
    keep_recent: int = 6,
    max_summary_tokens: int = 800
) -> list[dict]:
    """
    Compress old conversation turns into a summary.
    Keeps the most recent `keep_recent` turns intact for full context.
    """
    if len(history) <= keep_recent:
        return history  # Nothing to compress

    old_turns = history[:-keep_recent]
    recent_turns = history[-keep_recent:]

    # Build a text representation of old turns for summarization
    old_text = "\n".join(
        f"{msg['role'].upper()}: {str(msg['content'])[:200]}"
        for msg in old_turns
        if isinstance(msg.get("content"), str)
    )

    # Summarize with Claude
    summary_response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # Fast, cheap model for summaries
        max_tokens=max_summary_tokens,
        messages=[{
            "role": "user",
            "content": f"Summarize this research conversation concisely, preserving key facts and findings:\n\n{old_text}"
        }]
    )
    summary = summary_response.content[0].text

    # Replace old turns with a single summary system context
    compressed = [
        {
            "role": "user",
            "content": f"[Previous research context summary]\n{summary}"
        },
        {
            "role": "assistant",
            "content": "Understood. I have the previous research context. Please continue."
        }
    ] + recent_turns

    return compressed
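The slicing at the heart of compress_history is easy to sanity-check without calling the API. On a toy four-turn history with keep_recent=2, everything except the last two turns becomes eligible for summarization:

```python
# Which turns get summarized vs. kept verbatim — no API call needed to verify.
history = [
    {"role": "user", "content": "turn 1"},
    {"role": "assistant", "content": "turn 2"},
    {"role": "user", "content": "turn 3"},
    {"role": "assistant", "content": "turn 4"},
]
keep_recent = 2

old_turns = history[:-keep_recent]     # turns 1-2 → summarized
recent_turns = history[-keep_recent:]  # turns 3-4 → kept verbatim

print([m["content"] for m in old_turns])     # → ['turn 1', 'turn 2']
print([m["content"] for m in recent_turns])  # → ['turn 3', 'turn 4']
```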

Step 5: Streamlit Web Interface

# app.py
import streamlit as st
from step3_agent_with_tools import AutonomousResearchAgent
import os
import glob

st.set_page_config(
    page_title="Research Agent",
    page_icon="🔬",
    layout="wide"
)

st.title("🔬 Autonomous Research Agent")
st.caption("Powered by Claude + Tavily Search")

# Initialize agent in session state (persists across reruns)
if "agent" not in st.session_state:
    st.session_state.agent = AutonomousResearchAgent()
if "chat_display" not in st.session_state:
    st.session_state.chat_display = []  # [(role, message), ...]

# ── Layout ──────────────────────────────────────────────
col_chat, col_files = st.columns([2, 1])

with col_chat:
    st.subheader("Research Chat")

    # Show conversation history
    for role, message in st.session_state.chat_display:
        with st.chat_message(role):
            st.markdown(message)

    # Input area
    if topic := st.chat_input("Enter a research topic or follow-up question..."):
        st.session_state.chat_display.append(("user", topic))
        with st.chat_message("user"):
            st.markdown(topic)

        with st.chat_message("assistant"):
            with st.spinner("Researching... (this may take 30–60 seconds)"):
                if len(st.session_state.chat_display) == 1:
                    # First message — run full autonomous research
                    result = st.session_state.agent.research(topic)
                else:
                    # Follow-up — use conversational memory
                    result = st.session_state.agent.follow_up(topic)

            st.markdown(result)
            st.session_state.chat_display.append(("assistant", result))

        if st.session_state.agent.tools_used:
            st.caption(f"Tools used: {', '.join(set(st.session_state.agent.tools_used))}")

with col_files:
    st.subheader("Downloaded Papers")

    # Show all downloaded PDFs
    pdf_files = glob.glob("papers/**/*.pdf", recursive=True)

    if pdf_files:
        selected = st.selectbox(
            "Select paper to view",
            pdf_files,
            format_func=lambda p: os.path.basename(p)
        )
        if selected:
            with open(selected, "rb") as f:
                st.download_button(
                    label=f"⬇️ Download {os.path.basename(selected)}",
                    data=f,
                    file_name=os.path.basename(selected),
                    mime="application/pdf"
                )
            st.caption(f"Saved to: {selected}")
    else:
        st.info("No PDFs downloaded yet. Research a topic to find papers.")

    # New research button
    if st.button("🔄 Start New Research"):
        st.session_state.agent = AutonomousResearchAgent()
        st.session_state.chat_display = []
        st.rerun()

Run with:

streamlit run app.py

The ReAct Pattern: Structured Reasoning

For complex research tasks, you can prompt Claude to explicitly reason before acting using the ReAct (Reasoning + Acting) pattern:

REACT_SYSTEM = """You are a research agent that thinks before acting.

For each step, structure your internal reasoning as:
THOUGHT: [What do I need to find? What's my next step?]
ACTION: [Which tool to use and why]
OBSERVATION: [What did I find?]
... (repeat)
CONCLUSION: [Final synthesis]

This explicit reasoning makes your research more thorough and traceable."""

When Claude receives this prompt, it naturally produces structured reasoning traces that you can parse, log, and display to users—making the agent's decision-making transparent and auditable.
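If you want to log or display those traces, a simple regex parser is enough. The trace below is a fabricated example matching the format REACT_SYSTEM asks for, not real model output:

```python
import re

# A made-up ReAct trace in the format requested by the system prompt above.
trace = """THOUGHT: I need recent papers on protein digestion.
ACTION: web_search("protein digestion enzymes 2025")
OBSERVATION: Found 5 papers; two look directly relevant.
CONCLUSION: Pepsin and trypsin dominate gastric and intestinal proteolysis."""

# Extract (label, text) pairs, one per line.
steps = re.findall(
    r"^(THOUGHT|ACTION|OBSERVATION|CONCLUSION):\s*(.+)$",
    trace,
    flags=re.MULTILINE,
)

for label, text in steps:
    print(f"{label:<12} {text}")
```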


Advanced: Parallel Tool Calls

Claude can request multiple tool calls in a single response. The agentic loop already handles this—when response.content contains multiple tool_use blocks, the code processes all of them:

# Multiple tool calls in one response
tool_results = []
for block in response.content:
    if block.type == "tool_use":
        # Execute each tool (could be parallelized with threading)
        tool_result = self._execute_tool(block.name, block.input)
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": tool_result
        })

# All results sent back in one message
self.conversation_history.append({
    "role": "user",
    "content": tool_results  # List with multiple tool_result blocks
})

For I/O-bound tools like web search and PDF download, use concurrent.futures.ThreadPoolExecutor to execute them in parallel, reducing wait time from N×latency to ~1×latency.
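A sketch of that parallelization, with a dummy slow_tool standing in for the real Tavily and requests calls (the sleep simulates network latency; the call list is illustrative data):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_tool(name, **kwargs):
    """Dummy tool: pretends each call costs 0.2s of network I/O."""
    time.sleep(0.2)
    return {"tool": name, "input": kwargs}

# Tool calls Claude requested in one response (illustrative).
calls = [
    ("web_search", {"query": "protein digestion"}),
    ("web_search", {"query": "proteolysis mechanisms"}),
    ("download_pdf", {"url": "https://example.org/paper.pdf", "topic": "digestion"}),
]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(calls)) as pool:
    futures = [pool.submit(slow_tool, name, **args) for name, args in calls]
    results = [f.result() for f in futures]  # preserves request order
elapsed = time.perf_counter() - start

print(f"{len(results)} tools in {elapsed:.2f}s")  # ~0.2s instead of ~0.6s
```

Order matters: the results list preserves the request order, so you can zip it back against the tool_use blocks when building the tool_result message.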


Security and Best Practices

🔐 API Key Safety
  • Always use python-dotenv with .env files
  • Add .env to .gitignore immediately
  • In production: use AWS Secrets Manager or similar
  • Rotate API keys if they're accidentally committed
🛡️ Tool Safety
  • Validate tool inputs before executing them
  • Limit file download sizes (implemented: 50MB cap)
  • Sanitize filenames from URLs before saving
  • Add max_iterations to prevent infinite agentic loops
💰 Cost Control
  • Monitor tokens with response.usage
  • Use claude-haiku-4-5 for summarization tasks
  • Cache search results to avoid repeat API calls
  • Set reasonable max_tokens limits per call
🔁 Memory Management
  • Check response.usage.input_tokens on each call
  • Compress history when approaching context limit
  • claude-sonnet-4-6 supports 200K token context
  • Tool results can be large — truncate if needed

Performance Benchmarks

Configuration                       Response Time   Typical Use
Stateless query (no tools)          2–5 sec         Quick factual Q&A
Conversational (with history)       3–8 sec         Multi-turn research dialogue
Tool use (1 web search)             8–15 sec        Web-grounded research
Full autonomous loop (3–5 tools)    30–90 sec       End-to-end research task
PDF download + analysis             45–120 sec      Document-heavy research

Research task comparison:

Approach                            Time        Papers Found   Human Effort
Manual research (Google Scholar)    3+ hours    10–20          High
Agent (single topic)                5–15 min    5–10           Minimal
Agent (with follow-ups)             15–30 min   10–20          Low

Common Mistakes

  • Using the fictional agents.create() API → AttributeError at runtime. Fix: use client.messages.create() with conversation history.
  • Not adding tool_use blocks to history → Claude loses track of what it called. Fix: append response.content (not just the text) whenever stop_reason == "tool_use".
  • Sending tool results as role: "assistant" → API validation error. Fix: tool results go in a role: "user" message as type: "tool_result" blocks.
  • No max_iterations limit → infinite loop and runaway API costs. Fix: add a while iteration < max_iterations guard.
  • Not handling stop_reason == "tool_use" → agent stops after one tool call. Fix: loop until stop_reason == "end_turn".
  • Hardcoding API keys → security breach. Fix: use os.getenv() with .env files.
  • Passing every tool on every call → irrelevant tools confuse Claude. Fix: only include tools relevant to the task.

Key Takeaways

What to remember from this article:
1. There is no "Agent SDK" — use the Messages API. The Anthropic Python SDK uses client.messages.create(). Any tutorial using agents.create() is fictional. Autonomy comes from your agentic loop, not a magic API.
2. Memory is a list you manage. Append every user message and every assistant response (including raw content blocks, not just text) to conversation_history and pass it on every API call.
3. The agentic loop pattern is simple. While stop_reason == "tool_use": execute tools, add results as role: "user" with type: "tool_result", call the API again. Exit when stop_reason == "end_turn".
4. Build incrementally. Stateless query → add conversation_history → add tools → add agentic loop → add UI. Each step is independently testable. Jumping straight to autonomy makes bugs nearly impossible to diagnose.
5. Add safety limits from the start. max_iterations, download size caps, input validation, and token monitoring prevent runaway costs and unexpected failures in production.

Mohamed Hamed

20 years building production systems — the last several deep in AI integration, LLMs, and full-stack architecture. I write what I've actually built and broken. If this was useful, the next one goes to LinkedIn first.
