Last week I spent 3 hours researching "how the body digests protein." I read 10 papers, downloaded PDFs, took notes, and synthesized findings. This week I built an autonomous agent that does exactly the same thing—in 5 minutes, without my involvement. Here's exactly how it works and how you can build it.
The agent:
- Searches the web for research papers (Tavily API)
- Downloads PDFs automatically
- Reads and analyzes documents
- Answers follow-up questions with memory
- Runs a Streamlit web UI
What you'll learn:
- The correct Anthropic Messages API + Tool Use pattern
- How to implement conversational memory manually
- How to build the agentic loop (tool → observe → act → repeat)
- How to add a Streamlit UI with PDF viewer
Clearing Up a Common Misconception
Before writing a single line of code, let's address the most common mistake in "Claude Agent" tutorials.
It usually looks something like this:

agent = Anthropic().agents.create(
name="Research Agent",
memory=True
)
session = agent.create_session()
response = session.query("How...")
The `anthropic` SDK has no `agents` attribute, no `create_session()`, and no `query()`. This is a fictional API. The real pattern looks like this:

client = Anthropic()
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=4096,
system="You are a research assistant",
tools=[web_search_tool],
messages=conversation_history
)
Everything goes through `client.messages.create()` with a manually managed `conversation_history` list and a `tools` array.

The key insight: there is no magic "agent" object. Autonomy comes from a loop you write yourself: calling `messages.create()`, reading the response, executing tools, feeding results back, and repeating until Claude stops requesting tools.
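That loop can be sketched without any API calls. In this minimal, self-contained sketch, `fake_model` is a purely hypothetical stand-in for `client.messages.create()` so the control flow can be seen in isolation:

```python
# Minimal agentic loop skeleton. `fake_model` is a stand-in for
# client.messages.create() so the loop mechanics can run offline.

def fake_model(messages):
    """Request one tool call, then finish on the next turn."""
    made_tool_call = any(isinstance(m["content"], list) for m in messages)
    if made_tool_call:
        return {"stop_reason": "end_turn", "content": "Done: 2 + 2 = 4"}
    return {
        "stop_reason": "tool_use",
        "content": [{"type": "tool_use", "id": "t1",
                     "name": "add", "input": {"a": 2, "b": 2}}],
    }

def run_agent(user_prompt):
    tools = {"add": lambda a, b: a + b}
    history = [{"role": "user", "content": user_prompt}]
    while True:
        response = fake_model(history)
        if response["stop_reason"] != "tool_use":
            return response["content"]          # Final text answer
        # Record the assistant turn, including its tool_use blocks
        history.append({"role": "assistant", "content": response["content"]})
        results = [
            {"type": "tool_result", "tool_use_id": b["id"],
             "content": str(tools[b["name"]](**b["input"]))}
            for b in response["content"] if b["type"] == "tool_use"
        ]
        # Feed tool results back as the next user turn
        history.append({"role": "user", "content": results})

print(run_agent("What is 2 + 2?"))  # → Done: 2 + 2 = 4
```

The real agent later in this article follows exactly this shape; only the model call and the tools are real.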
Architecture: The Four Components
┌───────────────────────────────────────────────────────────────────┐
│ THE RESEARCH AGENT │
│ │
│ ┌─────────────────┐ ┌──────────────────────────────┐ │
│ │ SYSTEM PROMPT │ │ CONVERSATION HISTORY │ │
│ │ │ │ │ │
│ │ "You are an │ │ [{role: "user", │ │
│ │ expert │ │ content: "Research..."}, │ │
│ │ research │ │ {role: "assistant", │ │
│ │ assistant..." │ │ content: [tool_use_block]},│ │
│ └─────────────────┘ │ {role: "user", │ │
│ │ content: [tool_result]}, │ │
│ ┌─────────────────┐ │ {role: "assistant", │ │
│ │ TOOLS │ │ content: "Based on..."}] │ │
│ │ │ └──────────────────────────────┘ │
│ │ • web_search │ │
│ │ • download_pdf │ ↕ Passed to every API call │
│ │ • read_file │ │
│ └─────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ THE AGENTIC LOOP │ │
│ │ │ │
│ │ messages.create() → stop_reason == "tool_use"? │ │
│ │ YES → execute tool → add result → call again │ │
│ │ NO → extract text response → return to user │ │
│ └────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────┘
Step 1: Basic Setup and Stateless Queries
Installation
pip install anthropic tavily-python requests streamlit python-dotenv
Create a .env file (add to .gitignore):
ANTHROPIC_API_KEY=sk-ant-...
TAVILY_API_KEY=tvly-...
Your First Working Claude Query
# step1_basic.py
from anthropic import Anthropic
from dotenv import load_dotenv
import os
load_dotenv()
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
def query_claude(prompt: str, system: str = "You are a helpful research assistant.") -> str:
"""Single stateless query — no memory between calls."""
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
system=system,
messages=[{"role": "user", "content": prompt}]
)
return response.content[0].text
if __name__ == "__main__":
result = query_claude("How does the human body digest protein? Explain the key enzymes involved.")
print(result)
This works, but has a critical limitation: every call is independent. Ask a follow-up and Claude has no idea what you were discussing. That's where memory comes in.
Step 2: Conversational Memory
Memory in Claude is not a setting—it's a list you maintain. Every turn, you append both the user message and the assistant's response to conversation_history, then pass the entire list on the next API call.
# step2_memory.py
from anthropic import Anthropic
from dotenv import load_dotenv
import os
load_dotenv()
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
SYSTEM_PROMPT = """You are an expert research assistant specializing in biology and biochemistry.
When answering questions, be thorough and scientific. Reference previous context in the conversation."""
class ResearchAgent:
"""Research agent with multi-turn conversational memory."""
def __init__(self, system: str = SYSTEM_PROMPT):
self.system = system
self.conversation_history: list[dict] = []
def chat(self, user_message: str) -> str:
"""Send a message and maintain conversation history."""
# Add user message to history
self.conversation_history.append({
"role": "user",
"content": user_message
})
# Call API with full history
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
system=self.system,
messages=self.conversation_history # Full history every time
)
assistant_text = response.content[0].text
# Add assistant response to history
self.conversation_history.append({
"role": "assistant",
"content": assistant_text
})
return assistant_text
def clear(self):
"""Start a new conversation."""
self.conversation_history = []
def token_count(self) -> int:
"""Rough estimate of tokens in history."""
total_chars = sum(
len(str(msg["content"])) for msg in self.conversation_history
)
return total_chars // 4 # ~4 chars per token
if __name__ == "__main__":
agent = ResearchAgent()
# Turn 1
r1 = agent.chat("How does the body digest protein?")
print(f"Turn 1:\n{r1}\n{'='*60}\n")
# Turn 2 — Claude remembers the context from Turn 1
r2 = agent.chat("What specific enzymes break down the peptide bonds you mentioned?")
print(f"Turn 2:\n{r2}\n{'='*60}\n")
# Turn 3 — Even deeper follow-up
r3 = agent.chat("What happens if someone has a protease deficiency?")
print(f"Turn 3:\n{r3}")
Step 3: Tool Use — Web Search and PDF Download
Now we give the agent real capabilities. Claude's tool use works via function calling: you define tool schemas in JSON Schema format, Claude decides when to call them, and you execute the actual code.
Define the Tools
# tools.py
from tavily import TavilyClient
import requests
import os
import json
from dotenv import load_dotenv
load_dotenv()
tavily = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))
# --- TOOL SCHEMAS (what Claude sees) ---
WEB_SEARCH_TOOL = {
"name": "web_search",
"description": (
"Search the web for research papers and scientific articles. "
"Use this when you need current information, recent studies, or "
"specific papers on a topic. Prefer academic sources."
),
"input_schema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query — be specific, include key terms"
},
"search_depth": {
"type": "string",
"enum": ["basic", "advanced"],
"description": "'advanced' returns more academic results"
},
"max_results": {
"type": "integer",
"description": "Number of results (1-10, default 5)"
}
},
"required": ["query"]
}
}
DOWNLOAD_PDF_TOOL = {
"name": "download_pdf",
"description": (
"Download a PDF from a URL and save it locally. "
"Use this when you find a PDF link in search results that contains "
"relevant research you want to analyze in depth."
),
"input_schema": {
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "Direct URL to the PDF file"
},
"topic": {
"type": "string",
"description": "Research topic (used for folder organization)"
}
},
"required": ["url", "topic"]
}
}
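Since tool inputs come from the model, it's worth validating them against the schema before execution. A minimal sketch of a required-key and type checker, written against schemas shaped like the ones above (the helper name and `TYPE_MAP` are illustrative, not part of any SDK):

```python
# Minimal tool-input validator: checks required keys and primitive types
# against an input_schema like the ones defined above.
TYPE_MAP = {"string": str, "integer": int, "number": (int, float),
            "boolean": bool, "object": dict, "array": list}

def validate_tool_input(schema: dict, tool_input: dict) -> list[str]:
    """Return a list of validation errors (empty list means valid)."""
    errors = []
    props = schema["input_schema"]["properties"]
    for key in schema["input_schema"].get("required", []):
        if key not in tool_input:
            errors.append(f"missing required field: {key}")
    for key, value in tool_input.items():
        if key not in props:
            errors.append(f"unexpected field: {key}")
        else:
            expected = TYPE_MAP.get(props[key]["type"])
            if expected and not isinstance(value, expected):
                errors.append(f"{key}: expected {props[key]['type']}")
    return errors

# Example against a schema shaped like WEB_SEARCH_TOOL:
schema = {"input_schema": {
    "type": "object",
    "properties": {"query": {"type": "string"},
                   "max_results": {"type": "integer"}},
    "required": ["query"],
}}
assert validate_tool_input(schema, {"query": "proteolysis"}) == []
assert validate_tool_input(schema, {"max_results": 5}) == ["missing required field: query"]
```

On a validation failure, return the error list as the `tool_result` content so Claude can correct its call instead of crashing the loop.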
# --- TOOL IMPLEMENTATIONS (what your code actually runs) ---
def execute_web_search(query: str, search_depth: str = "advanced", max_results: int = 5) -> dict:
"""Execute web search using Tavily API."""
try:
results = tavily.search(
query=query,
search_depth=search_depth,
max_results=max_results,
include_answer=True, # Get Tavily's synthesized answer too
)
return {
"success": True,
"answer": results.get("answer", ""),
"results": [
{
"title": r.get("title", ""),
"url": r.get("url", ""),
"content": r.get("content", "")[:500], # First 500 chars
"score": r.get("score", 0),
}
for r in results.get("results", [])
]
}
except Exception as e:
return {"success": False, "error": str(e)}
def execute_download_pdf(url: str, topic: str) -> dict:
"""Download PDF from URL and save locally."""
try:
# Create topic-specific folder
safe_topic = "".join(c for c in topic if c.isalnum() or c in " -_").strip()
folder = os.path.join("papers", safe_topic[:50])
os.makedirs(folder, exist_ok=True)
# Derive filename from URL
filename = url.split("/")[-1].split("?")[0]
if not filename.endswith(".pdf"):
filename += ".pdf"
filepath = os.path.join(folder, filename)
# Download with timeout and size limit (50MB)
response = requests.get(url, timeout=30, stream=True)
response.raise_for_status()
content_type = response.headers.get("content-type", "")
if "pdf" not in content_type and not url.endswith(".pdf"):
return {"success": False, "error": f"URL does not appear to be a PDF (content-type: {content_type})"}
        # Stream to disk; abort and clean up if the 50MB cap is exceeded
        size = 0
        with open(filepath, "wb") as f:
            for chunk in response.iter_content(chunk_size=8192):
                size += len(chunk)
                if size > 50 * 1024 * 1024:
                    break
                f.write(chunk)
        if size > 50 * 1024 * 1024:
            os.remove(filepath)  # Don't leave a partial file behind
            return {"success": False, "error": "File too large (>50MB)"}
return {
"success": True,
"filepath": filepath,
"size_kb": size // 1024,
"message": f"PDF saved to {filepath}"
}
except requests.exceptions.Timeout:
return {"success": False, "error": "Download timed out after 30s"}
except Exception as e:
return {"success": False, "error": str(e)}
# Dispatch table — maps tool name to implementation
TOOL_REGISTRY = {
"web_search": execute_web_search,
"download_pdf": execute_download_pdf,
}
ALL_TOOLS = [WEB_SEARCH_TOOL, DOWNLOAD_PDF_TOOL]
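The folder and filename logic in `execute_download_pdf` can be checked in isolation. A self-contained mirror of that logic (the function name here is illustrative):

```python
import os

def derive_pdf_path(url: str, topic: str) -> str:
    """Mirror of the folder/filename derivation in execute_download_pdf."""
    # Keep only filesystem-safe characters in the topic name
    safe_topic = "".join(c for c in topic if c.isalnum() or c in " -_").strip()
    folder = os.path.join("papers", safe_topic[:50])
    # Strip any query string and force a .pdf extension
    filename = url.split("/")[-1].split("?")[0]
    if not filename.endswith(".pdf"):
        filename += ".pdf"
    return os.path.join(folder, filename)

print(derive_pdf_path(
    "https://example.org/papers/proteolysis.pdf?download=1",
    "protein digestion: enzymes",
))
# On POSIX: papers/protein digestion enzymes/proteolysis.pdf
```

Note that the query string is dropped before the extension check, so `?download=1`-style URLs still produce clean filenames.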
The Agentic Loop
This is the core of autonomous behavior. The loop continues as long as Claude's stop_reason is "tool_use":
# step3_agent_with_tools.py
from anthropic import Anthropic
from tools import ALL_TOOLS, TOOL_REGISTRY
from dotenv import load_dotenv
import os
import json
load_dotenv()
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
RESEARCH_SYSTEM = """You are an autonomous research assistant with access to web search and PDF downloading.
When given a research topic:
1. Search for recent, authoritative papers and articles
2. Download any PDFs that look especially relevant
3. Synthesize what you find into a clear, well-cited summary
4. Be specific about where each piece of information comes from
Work autonomously — use your tools proactively without waiting to be asked."""
class AutonomousResearchAgent:
"""Research agent with tool use and agentic loop."""
def __init__(self):
self.conversation_history: list[dict] = []
self.tools_used: list[str] = []
def _execute_tool(self, tool_name: str, tool_input: dict) -> str:
"""Execute a tool and return its result as JSON string."""
if tool_name not in TOOL_REGISTRY:
return json.dumps({"error": f"Unknown tool: {tool_name}"})
print(f" 🔧 [{tool_name}] {json.dumps(tool_input)[:100]}...")
result = TOOL_REGISTRY[tool_name](**tool_input)
self.tools_used.append(tool_name)
return json.dumps(result)
def research(self, topic: str) -> str:
"""
Run autonomous research on a topic.
The agentic loop continues until Claude stops requesting tools.
"""
self.conversation_history.append({
"role": "user",
"content": f"Research this topic thoroughly: {topic}"
})
print(f"\n🔬 Researching: {topic}")
iteration = 0
max_iterations = 10 # Safety limit to prevent runaway loops
while iteration < max_iterations:
iteration += 1
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=4096,
system=RESEARCH_SYSTEM,
tools=ALL_TOOLS,
messages=self.conversation_history
)
if response.stop_reason == "end_turn":
# Claude is done — extract the text response
final_text = next(
(block.text for block in response.content if hasattr(block, "text")),
"Research complete."
)
self.conversation_history.append({
"role": "assistant",
"content": final_text
})
return final_text
elif response.stop_reason == "tool_use":
# Claude wants to call one or more tools
# First, record the full assistant message (including tool_use blocks)
self.conversation_history.append({
"role": "assistant",
"content": response.content # Keep as list of content blocks
})
# Execute all tool calls in this response
tool_results = []
for block in response.content:
if block.type == "tool_use":
tool_result = self._execute_tool(block.name, block.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": tool_result
})
# Add all tool results in a single user message
self.conversation_history.append({
"role": "user",
"content": tool_results
})
else:
# Unexpected stop reason
break
return "Research loop exceeded maximum iterations. Partial results may be available."
def follow_up(self, question: str) -> str:
"""Ask a follow-up question after initial research."""
self.conversation_history.append({
"role": "user",
"content": question
})
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
system=RESEARCH_SYSTEM,
tools=ALL_TOOLS,
messages=self.conversation_history
)
# Handle tool use in follow-ups too
while response.stop_reason == "tool_use":
self.conversation_history.append({
"role": "assistant",
"content": response.content
})
tool_results = []
for block in response.content:
if block.type == "tool_use":
result = self._execute_tool(block.name, block.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": result
})
self.conversation_history.append({"role": "user", "content": tool_results})
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
system=RESEARCH_SYSTEM,
tools=ALL_TOOLS,
messages=self.conversation_history
)
answer = next(
(block.text for block in response.content if hasattr(block, "text")),
"No answer."
)
self.conversation_history.append({"role": "assistant", "content": answer})
return answer
if __name__ == "__main__":
agent = AutonomousResearchAgent()
# Autonomous research
summary = agent.research("protein digestion enzymes and proteolysis mechanisms")
print(f"\n📊 RESEARCH SUMMARY:\n{summary}")
print(f"\nTools used: {agent.tools_used}")
# Conversational follow-up (agent remembers everything above)
answer = agent.follow_up("What's the clinical relevance of protease inhibitors?")
print(f"\n💬 FOLLOW-UP:\n{answer}")
Step 4: Memory Compression for Long Sessions
As research sessions grow, conversation_history can hit Claude's context limit. Compress old turns into a summary while keeping recent turns verbatim:
# memory.py
from anthropic import Anthropic
from dotenv import load_dotenv
import os

load_dotenv()
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
def compress_history(
history: list[dict],
keep_recent: int = 6,
max_summary_tokens: int = 800
) -> list[dict]:
"""
Compress old conversation turns into a summary.
Keeps the most recent `keep_recent` turns intact for full context.
"""
if len(history) <= keep_recent:
return history # Nothing to compress
old_turns = history[:-keep_recent]
recent_turns = history[-keep_recent:]
# Build a text representation of old turns for summarization
old_text = "\n".join(
f"{msg['role'].upper()}: {str(msg['content'])[:200]}"
for msg in old_turns
if isinstance(msg.get("content"), str)
)
# Summarize with Claude
summary_response = client.messages.create(
model="claude-haiku-4-5-20251001", # Fast, cheap model for summaries
max_tokens=max_summary_tokens,
messages=[{
"role": "user",
"content": f"Summarize this research conversation concisely, preserving key facts and findings:\n\n{old_text}"
}]
)
summary = summary_response.content[0].text
# Replace old turns with a single summary system context
compressed = [
{
"role": "user",
"content": f"[Previous research context summary]\n{summary}"
},
{
"role": "assistant",
"content": "Understood. I have the previous research context. Please continue."
}
] + recent_turns
return compressed
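The split itself is easy to sanity-check without an API call. With assumed turn counts:

```python
# Sanity check of the keep-recent split used by compress_history:
# with 10 turns and keep_recent=6, 4 old turns get summarized and
# the 6 most recent stay verbatim.
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
    for i in range(10)
]
keep_recent = 6
old_turns, recent_turns = history[:-keep_recent], history[-keep_recent:]

assert len(old_turns) == 4
assert recent_turns[0]["content"] == "turn 4"

# The compressed history is one summary pair plus the recent turns
compressed_len = 2 + len(recent_turns)
print(compressed_len)  # → 8
```

Ten turns shrink to eight messages here, but the real saving comes from the summary replacing long message bodies, not from the message count.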
Step 5: Streamlit Web Interface
# app.py
import streamlit as st
from step3_agent_with_tools import AutonomousResearchAgent
import os
import glob
st.set_page_config(
page_title="Research Agent",
page_icon="🔬",
layout="wide"
)
st.title("🔬 Autonomous Research Agent")
st.caption("Powered by Claude + Tavily Search")
# Initialize agent in session state (persists across reruns)
if "agent" not in st.session_state:
st.session_state.agent = AutonomousResearchAgent()
if "chat_display" not in st.session_state:
st.session_state.chat_display = [] # [(role, message), ...]
# ── Layout ──────────────────────────────────────────────
col_chat, col_files = st.columns([2, 1])
with col_chat:
st.subheader("Research Chat")
# Show conversation history
for role, message in st.session_state.chat_display:
with st.chat_message(role):
st.markdown(message)
# Input area
if topic := st.chat_input("Enter a research topic or follow-up question..."):
st.session_state.chat_display.append(("user", topic))
with st.chat_message("user"):
st.markdown(topic)
with st.chat_message("assistant"):
with st.spinner("Researching... (this may take 30–60 seconds)"):
if len(st.session_state.chat_display) == 1:
# First message — run full autonomous research
result = st.session_state.agent.research(topic)
else:
# Follow-up — use conversational memory
result = st.session_state.agent.follow_up(topic)
st.markdown(result)
st.session_state.chat_display.append(("assistant", result))
if st.session_state.agent.tools_used:
st.caption(f"Tools used: {', '.join(set(st.session_state.agent.tools_used))}")
with col_files:
st.subheader("Downloaded Papers")
# Show all downloaded PDFs
pdf_files = glob.glob("papers/**/*.pdf", recursive=True)
if pdf_files:
selected = st.selectbox(
"Select paper to view",
pdf_files,
format_func=lambda p: os.path.basename(p)
)
if selected:
with open(selected, "rb") as f:
st.download_button(
label=f"⬇️ Download {os.path.basename(selected)}",
data=f,
file_name=os.path.basename(selected),
mime="application/pdf"
)
st.caption(f"Saved to: {selected}")
else:
st.info("No PDFs downloaded yet. Research a topic to find papers.")
# New research button
if st.button("🔄 Start New Research"):
st.session_state.agent = AutonomousResearchAgent()
st.session_state.chat_display = []
st.rerun()
Run with:
streamlit run app.py
The ReAct Pattern: Structured Reasoning
For complex research tasks, you can prompt Claude to explicitly reason before acting using the ReAct (Reasoning + Acting) pattern:
REACT_SYSTEM = """You are a research agent that thinks before acting.
For each step, structure your internal reasoning as:
THOUGHT: [What do I need to find? What's my next step?]
ACTION: [Which tool to use and why]
OBSERVATION: [What did I find?]
... (repeat)
CONCLUSION: [Final synthesis]
This explicit reasoning makes your research more thorough and traceable."""
When Claude receives this prompt, it naturally produces structured reasoning traces that you can parse, log, and display to users—making the agent's decision-making transparent and auditable.
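Because the trace format is line-oriented, parsing it for logging or display is a one-liner with `re`. A sketch, assuming the labels shown above (the sample trace is illustrative):

```python
import re

def parse_react_trace(text: str) -> list[tuple[str, str]]:
    """Split a ReAct-style trace into (label, body) steps."""
    pattern = r"^(THOUGHT|ACTION|OBSERVATION|CONCLUSION):\s*(.*)$"
    return re.findall(pattern, text, flags=re.MULTILINE)

trace = """THOUGHT: I need recent papers on proteolysis.
ACTION: web_search("proteolysis mechanisms review")
OBSERVATION: Found 5 results, two are PDFs.
CONCLUSION: Proteolysis proceeds via pepsin, trypsin, chymotrypsin."""

steps = parse_react_trace(trace)
assert [label for label, _ in steps] == [
    "THOUGHT", "ACTION", "OBSERVATION", "CONCLUSION"
]
assert steps[1][1].startswith("web_search")
```

In the Streamlit UI, each parsed step could be rendered as its own expander so users can follow the agent's reasoning.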
Advanced: Parallel Tool Calls
Claude can request multiple tool calls in a single response. The agentic loop already handles this—when response.content contains multiple tool_use blocks, the code processes all of them:
# Multiple tool calls in one response
tool_results = []
for block in response.content:
if block.type == "tool_use":
# Execute each tool (could be parallelized with threading)
tool_result = self._execute_tool(block.name, block.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": tool_result
})
# All results sent back in one message
self.conversation_history.append({
"role": "user",
"content": tool_results # List with multiple tool_result blocks
})
For I/O-bound tools like web search and PDF download, use concurrent.futures.ThreadPoolExecutor to execute them in parallel, reducing wait time from N×latency to ~1×latency.
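A self-contained sketch of that speedup, with `slow_tool` standing in for a real network-bound tool implementation:

```python
# Parallel execution of independent tool calls with a thread pool.
# The sleeps stand in for the network latency of real tools.
import time
from concurrent.futures import ThreadPoolExecutor

def slow_tool(name: str, delay: float) -> str:
    time.sleep(delay)          # Simulated I/O wait
    return f"{name}: done"

calls = [("web_search", 0.2), ("download_pdf", 0.2), ("web_search", 0.2)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(calls)) as pool:
    # pool.map preserves input order, so results line up with calls
    results = list(pool.map(lambda c: slow_tool(*c), calls))
elapsed = time.perf_counter() - start

print(results)
# Three 0.2s calls finish in ~0.2s total instead of ~0.6s sequentially
assert elapsed < 0.5
```

Order preservation matters: each `tool_result` must carry the `tool_use_id` of the request that produced it, and `pool.map` keeps results in the same order as the inputs.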
Security and Best Practices
- Always use `python-dotenv` with `.env` files
- Add `.env` to `.gitignore` immediately
- In production: use AWS Secrets Manager or similar
- Rotate API keys if they're accidentally committed
- Validate tool inputs before executing them
- Limit file download sizes (implemented: 50MB cap)
- Sanitize filenames from URLs before saving
- Add `max_iterations` to prevent infinite agentic loops
- Monitor tokens with `response.usage`
- Use `claude-haiku-4-5` for summarization tasks
- Cache search results to avoid repeat API calls
- Set reasonable `max_tokens` limits per call
- Check `response.usage.input_tokens` on each call
- Compress history when approaching context limit
- `claude-sonnet-4-6` supports 200K token context
- Tool results can be large; truncate if needed
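One concrete guardrail from that list: truncating oversized tool results before they enter the history. A small helper (the name and default limit are illustrative):

```python
def truncate_tool_result(text: str, max_chars: int = 4000) -> str:
    """Cap a tool result's size before adding it to conversation history."""
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return text[:max_chars] + f"\n[... truncated {omitted} chars ...]"

short = truncate_tool_result("ok")
long_result = truncate_tool_result("x" * 10_000, max_chars=100)

assert short == "ok"
assert long_result.startswith("x" * 100)
assert "truncated 9900 chars" in long_result
```

Calling this inside `_execute_tool` before `json.dumps` keeps a single huge PDF extraction from eating the whole context window.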
Key Takeaways
- The real API is `client.messages.create()`. Any tutorial using `agents.create()` is fictional. Autonomy comes from your agentic loop, not a magic API.
- Memory is a list you maintain: append every message (full `content` blocks, not just text) to `conversation_history` and pass it on every API call.
- The agentic loop: while `stop_reason == "tool_use"`, execute the tools, add results as a `role: "user"` message with `type: "tool_result"` blocks, and call the API again. Exit when `stop_reason == "end_turn"`.
- Guardrails matter: `max_iterations`, download size caps, input validation, and token monitoring prevent runaway costs and unexpected failures in production.