A chatbot responds to your question. An agent decides what to do next.
I typed: 'Find me a place to hike today.' The agent detected my location, checked the weather, found 8 national parks, analyzed trails, and gave me 3 ranked recommendations — in 20 seconds. All running offline on my laptop.
An agent has tools, a planning loop, and the ability to reflect on whether it accomplished its goal. It doesn't just generate text; it executes actions across external systems.
Chatbots vs. Agents: The Fundamental Difference
Architectural Comparison
Chatbot:
- User sends a message
- Model generates text
- Done
- Limitation: No real-world interaction.
Agent:
- User states a goal
- Agent plans and calls tools
- Agent reflects and retries
- Agent synthesizes and responds
- Benefit: Can query APIs and take actions.
The 4-Step Agentic Loop
Every AI agent follows the same fundamental loop. Understanding this loop means you can build any agent, from a personal assistant to an autonomous developer.
The Universal Agentic Loop
1. Understand: Parse intent into structured tasks. Extract constraints and success criteria from the user's messy natural language.
2. Plan: Decide the sequence of actions: "First I need location, then weather, then I'll search for open parks nearby."
3. Act: Execute Python functions that interact with the real world. This is where the LLM becomes a 'Tool User.'
4. Reflect: Evaluate quality: "Is this weather too dangerous for hiking? Did the search return zero results? Do I need to pivot my plan?"
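The four steps can be sketched without any model at all. The following is a minimal illustration, not the article's implementation: every function is a hypothetical stand-in (`understand`, `plan`, `act`, and `reflect` return canned data where a real agent would call the LLM or a tool), and the `Task` shape is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Structured form of the user's request (hypothetical shape)."""
    activity: str
    when: str
    max_results: int = 3


def understand(goal: str) -> Task:
    # Stand-in for an LLM call that extracts intent and constraints.
    text = goal.lower()
    activity = "hike" if "hike" in text else "unknown"
    when = "today" if "today" in text else "unspecified"
    return Task(activity=activity, when=when)


def plan(task: Task) -> list[str]:
    # Stand-in for LLM planning: an ordered list of tool names.
    return ["get_location", "get_weather", "get_parks"]


def act(step: str, context: dict) -> dict:
    # Stand-in for real tool calls; returns canned data only.
    canned = {
        "get_location": {"state": "CO"},
        "get_weather": {"conditions": "Clear"},
        "get_parks": {"parks": ["Park A", "Park B"]},
    }
    return canned[step]


def reflect(step: str, result: dict) -> bool:
    # Return False when the remaining plan should be abandoned.
    if step == "get_weather" and result["conditions"] == "Thunderstorms":
        return False
    if step == "get_parks" and not result["parks"]:
        return False
    return True


def run(goal: str) -> dict:
    task = understand(goal)              # 1. Understand
    context: dict = {"task": task}
    for step in plan(task):              # 2. Plan
        result = act(step, context)      # 3. Act
        context[step] = result
        if not reflect(step, result):    # 4. Reflect
            context["aborted_at"] = step
            break
    return context
```

Calling `run("Find me a place to hike today.")` walks all three planned steps with the canned data; changing the canned weather to "Thunderstorms" would stop the run after the weather step and record `aborted_at`.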
Setting Up Your Local LLM Environment
We'll use Ollama to run models entirely on your machine. Your prompts never leave your network, and the model keeps working even when you're on a plane, although tools that call online APIs (such as a weather lookup) still need a connection.
Prerequisites:
- Ollama: Install from ollama.com.
- Python: 3.10+ recommended.
- RAM: 8GB for 8B models, 16GB+ for 14B+ models.
Model options:
- llama3.2:3b (4GB RAM) → Fast, perfect for simple agents and low-latency loops.
- llama3.1:8b (8GB RAM) → Balanced, great instruction following and multi-step reasoning.
- mistral:7b (8GB RAM) → Excellent for tool use and producing structured JSON outputs.
- qwen2.5:14b (16GB RAM) → Complex reasoning and high-fidelity plan generation.
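As a rough aid for choosing among these models, the RAM figures in the list can be turned into a small lookup. This is only a sketch: `pick_model` and `MODEL_TIERS` are hypothetical names, the thresholds are copied from the list above rather than measured, and where two models share a tier (8GB) the choice of `llama3.1:8b` is arbitrary.

```python
# (minimum RAM in GB, model tag), largest first. Values come from the list
# above; mistral:7b shares the 8GB tier and could be swapped in.
MODEL_TIERS = [
    (16, "qwen2.5:14b"),
    (8, "llama3.1:8b"),
    (4, "llama3.2:3b"),
]


def pick_model(ram_gb: float) -> str:
    """Return the largest listed model whose RAM guideline fits ram_gb."""
    for min_ram, tag in MODEL_TIERS:
        if ram_gb >= min_ram:
            return tag
    raise ValueError(f"{ram_gb}GB is below the smallest listed guideline (4GB)")
```

The returned tag is what you would pass as the `model` argument when calling Ollama.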
The Implementation: Building the Orchestrator
The "Agentic Loop" is essentially a while loop that continues until the model decides it has the final answer.
# orchestrator.py
import json

import ollama

from tools import get_location, get_weather, get_parks


def run_agent(goal: str):
    messages = [
        {"role": "system", "content": "You are a local hiking agent. Use tools to find info. Goal: Give a ranked list of 3 trails."},
        {"role": "user", "content": goal}
    ]
    tools = [get_location, get_weather, get_parks]

    # ── THE AGENTIC LOOP ──────────────────────────────────────────
    while True:
        response = ollama.chat(
            model='llama3.1:8b',
            messages=messages,
            tools=tools
        )
        # Add assistant message to history
        messages.append(response['message'])

        # No tool calls means the model has produced its final answer
        if not response['message'].get('tool_calls'):
            return response['message']['content']

        # Execute tools and add results to history
        for tool_call in response['message']['tool_calls']:
            tool_name = tool_call['function']['name']
            args = tool_call['function']['arguments']
            print(f"🛠️ Executing: {tool_name}({args})")
            result = execute_tool(tool_name, args)
            messages.append({
                'role': 'tool',
                'content': json.dumps(result),
                'name': tool_name
            })


def execute_tool(name, args):
    # Map the tool name chosen by the model to the matching Python function
    if name == "get_location":
        return get_location()
    if name == "get_weather":
        return get_weather(args['lat'], args['lon'])
    if name == "get_parks":
        return get_parks(args['state'])
    return {"error": "Tool not found"}
The LLM doesn't see your Python code. It sees a schema derived from each function's name, type hints, and docstring. Write your docstrings like documentation for a junior engineer: "Use this when X happens, it returns Y."
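To make that advice concrete, here is a sketch of what one tool might look like. The article does not show `tools.py`, so the body of `get_weather` below is a placeholder that returns canned data, and `describe_tool` is a hypothetical helper that only approximates the kind of information a tool-calling library can read from a function; it is not the schema Ollama actually builds.

```python
import inspect


def get_weather(lat: float, lon: float) -> dict:
    """Get current weather conditions for a coordinate.

    Use this after get_location and before recommending any outdoor
    activity, so unsafe conditions can be caught early.

    Args:
        lat: Latitude in decimal degrees.
        lon: Longitude in decimal degrees.

    Returns:
        A dict with 'conditions' (e.g. 'Clear') and 'temp_c'.
    """
    # Placeholder body: a real tool would call a weather API here.
    return {"conditions": "Clear", "temp_c": 18.0}


def describe_tool(fn) -> dict:
    """Collect the parts of a function a model-facing schema is built from."""
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "parameters": {
            name: param.annotation.__name__
            for name, param in sig.parameters.items()
        },
        "description": inspect.getdoc(fn),
    }
```

Running `describe_tool(get_weather)` shows that the function body never appears: only the name, the typed parameters, and the docstring text are available to describe the tool.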
Why Reflection Is the Most Important Concept
Reflection separates a useful agent from a brittle script. If the weather tool returns "Thunderstorms," a script might blindly continue to "Find Parks." An agent will Reflect and say: "Wait, it's dangerous to hike today. I should warn the user and suggest indoor alternatives instead."
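The thunderstorm case can also be backed by a deterministic guard that runs alongside the model's own reflection. The sketch below is an assumption-laden illustration: `UNSAFE_CONDITIONS`, `reflect_on_weather`, and the `suggest_indoor_alternatives` step name are hypothetical, and the weather dict is assumed to carry a `conditions` string.

```python
# Hypothetical set of conditions treated as unsafe; tune for your own use case.
UNSAFE_CONDITIONS = {"Thunderstorms", "Blizzard", "Extreme Heat"}


def reflect_on_weather(weather: dict) -> dict:
    """Decide whether the plan should continue after the weather lookup."""
    conditions = weather.get("conditions", "Unknown")
    if conditions in UNSAFE_CONDITIONS:
        # Pivot: skip the park search and warn the user instead.
        return {
            "proceed": False,
            "next_step": "suggest_indoor_alternatives",
            "note": f"{conditions} reported; hiking is not recommended today.",
        }
    return {"proceed": True, "next_step": "get_parks", "note": ""}
```

A hard-coded check like this does not replace the model's judgment, but it gives the loop a predictable fallback when a small local model fails to notice a dangerous result.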
Three Types of Reflection
"Should we proceed?" — Deciding if conditions are safe before executing the next plan step.
"Is this enough?" — If the search returns generic data, the agent can decide to search again with better terms.
"Which is best?" — Ranking 10 trails based on the user's skill level and current daylight hours.
Making Your Agent Robust: The Retry Pattern
Local models can sometimes emit malformed JSON or hallucinate tool names. You need a "Correction Loop."
# Simplified Retry Pattern
import ollama


def safe_chat(messages, retries=3):
    for i in range(retries):
        try:
            return ollama.chat(model='...', messages=messages)
        except Exception as e:
            print(f"⚠️ Retry {i+1}: {e}")
            # Feed the error back so the model can correct itself on the next attempt
            messages.append({"role": "user", "content": f"Fix the error: {e}"})
    raise RuntimeError("Max retries exceeded")
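Exceptions are only one failure mode; a hallucinated tool name usually arrives as a perfectly valid response. One possible extension, sketched here under assumptions, validates the requested tool names before anything is executed. `chat_with_correction` and `KNOWN_TOOLS` are hypothetical names, and the `chat` argument is any callable that takes the message history and returns the assistant message as a dict (for example, a thin wrapper around your Ollama call), which also keeps the function testable without a running model.

```python
from typing import Callable

# Names of the tools the orchestrator can actually execute.
KNOWN_TOOLS = {"get_location", "get_weather", "get_parks"}


def chat_with_correction(
    chat: Callable[[list[dict]], dict],
    messages: list[dict],
    retries: int = 3,
) -> dict:
    """Ask the model again until every requested tool name is known."""
    history = list(messages)  # copy so the caller's list is not mutated
    for _ in range(retries):
        reply = chat(history)
        calls = reply.get("tool_calls") or []
        unknown = [
            call["function"]["name"]
            for call in calls
            if call["function"]["name"] not in KNOWN_TOOLS
        ]
        if not unknown:
            return reply
        # Tell the model exactly what went wrong and what is available.
        history.append({
            "role": "user",
            "content": (
                f"Unknown tool(s): {unknown}. "
                f"Available tools: {sorted(KNOWN_TOOLS)}. Try again."
            ),
        })
    raise RuntimeError(f"No valid tool call after {retries} attempts")
```

Naming the available tools in the correction message tends to be more useful to a small model than a bare "try again", since it narrows the choice to valid options.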
Performance Benchmarks: M3 Pro (18GB RAM)
- Location Detection: < 1 second
- Weather Lookup: ~2 seconds
- Planning/Reflection: 4–6 seconds
- Final Synthesis: 8–12 seconds
- TOTAL RUNTIME: ~15–20 seconds
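To collect a per-stage breakdown like this on your own hardware, a small timing helper is enough. This is a sketch using only the standard library; `timed` is a hypothetical helper, and the `time.sleep` call stands in for a real stage such as a weather lookup.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, timings: dict):
    """Record the wall-clock duration of the enclosed block under label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


timings: dict = {}
with timed("weather_lookup", timings):
    time.sleep(0.01)  # stand-in for the real tool call

total = sum(timings.values())
for label, seconds in timings.items():
    print(f"{label}: {seconds:.2f}s")
print(f"total: {total:.2f}s")
```

Wrapping each stage of the loop in its own `timed` block makes it easy to see whether the model calls or the tools dominate the runtime.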
Key Takeaways
Understand → Plan → Act → Reflect. This pattern powers everything from simple task runners to autonomous coding agents.
Without reflection, you have a script. Reflection allows the agent to self-correct, handle failures, and pivot based on environment changes.
Models like Llama 3.1 (8B) and Qwen 2.5 (7B) are fast and intelligent enough for reliable agentic loops on modern laptops.