Agents -- what even are they
Some rambling thoughts about nomenclature and what is going on with AI agents -- 6/12/2023
What is an Agent?
The talk of the AI town these last few months has been singular: “agents.” Some public examples from the last 6 months:
LangChain Agents (“Some applications require not just a predetermined chain of calls to LLMs/other tools, but potentially an unknown chain that depends on the user’s input”)
AutoGPT (/BabyAGI) (“This program, driven by GPT-4, chains together LLM "thoughts", to autonomously achieve whatever goal you set.”)
Voyager (“Use LLMs as a reasoning engine to play Minecraft”)
"Agent" has become an overloaded term; people use it to refer to way too many different things.
Let's break down what people actually mean when they say it.
A Taxonomy of Agents
Here’s my attempt at categorizing the work that people are labeling as “agents”:
AGI
Examples: AutoGPT, BabyAGI
Explanation: The flashiest of the "agents" are (as far as I can tell) trying to solve general intelligence. They are essentially programs of the structure "specify any task, get a solution."
Diagnosis: Unsurprisingly, these don’t work yet. We haven’t solved AGI! If these projects end up working… we can probably safely ignore all of the rest of my writing 🙂
What I would call this (instead of agent): AGI
LLMs that prompt themselves
Examples: LangChain Agents, Smol AI, a lot of the noise on Twitter
Explanation: The broadest definition basically says that "if an LLM is coming up with prompts, then it's an agent." This accounts for a lot of the "agent" noise: people have started using the word to describe complex chains of prompts that aren't known until runtime.
Diagnosis: This is simply a natural evolution of LLM applications.
What I would call this (instead of agent): Nondeterministic prompt chains
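To make "nondeterministic prompt chain" concrete, here's a minimal sketch. Everything below is hypothetical: `fake_llm` is a hard-coded stand-in for a real model call, not any actual API. The point is the shape, where the model's own output decides which prompt runs next, so the chain isn't known before runtime.

```python
def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; a real model would generate this text.
    if "classify" in prompt:
        return "math"
    if "math" in prompt:
        return "4"
    return "unknown"

def run_chain(question: str) -> str:
    # Step 1: ask the model to route the question.
    category = fake_llm(f"classify this question: {question}")
    # Step 2: the model's answer chooses the next prompt at runtime,
    # which is what makes the chain "nondeterministic" from the
    # programmer's point of view.
    follow_up = f"answer this {category} question: {question}"
    return fake_llm(follow_up)

print(run_chain("what is 2 + 2?"))
```

With a real model, the routing step could fan out into arbitrarily many branches, which is exactly the "unknown chain that depends on the user's input" from the LangChain description above.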
LLMs that use tools
Examples: ChatGPT Plugins, LangChain Agents, Fixie Agents, Gorilla
Explanation: The definition here is that if you directly integrate the input or output of an LLM with some form of API, you have an agent.
Diagnosis: Another natural evolution of LLM applications.
What I would call this (instead of agent): Tools / skills / plugins
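The tool-use pattern is also easy to sketch. Again, this is an illustrative sketch, not any library's real API: `fake_llm` stands in for a model that has been prompted to emit a structured tool call, and `get_weather` is a toy tool.

```python
import json

def get_weather(city: str) -> str:
    # A toy "tool" -- in practice this would hit a real weather API.
    return f"sunny in {city}"

TOOLS = {"get_weather": get_weather}

def fake_llm(prompt: str) -> str:
    # A real model would decide which tool to call and with what
    # arguments; hard-coded here to keep the sketch self-contained.
    return json.dumps({"tool": "get_weather", "args": {"city": "Paris"}})

def run_with_tools(user_input: str) -> str:
    # Parse the model's structured output and dispatch to the tool.
    call = json.loads(fake_llm(user_input))
    tool = TOOLS[call["tool"]]
    return tool(**call["args"])

print(run_with_tools("what's the weather in Paris?"))
```

The essence is just that: model output is parsed into a function call, the program executes it, and (in a fuller version) the result is fed back into the next prompt.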
Software with intuition
Examples: Voyager
Explanation: LLMs seem to carry a "world model", and we can use that model to give programs something like intuition.
Diagnosis: The most exciting development in software right now. Software that has a baseline understanding of how the world works will change how our world works in the coming years.
What I would call this (instead of agent): Agent. I think this is the underlying building block that makes software “agential.”
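One way to picture "intuition in software": use a model's judgment as a heuristic inside an otherwise ordinary program, the way Voyager uses an LLM to decide what's sensible to do next in Minecraft. This is a hypothetical sketch; `fake_llm_score` stands in for asking a real model to rate how reasonable an action is.

```python
def fake_llm_score(action: str) -> float:
    # Stand-in for model "intuition": a real LLM would rate the action
    # from its world knowledge; hard-coded here for the sketch.
    scores = {"mine wood": 0.9, "fight dragon unarmed": 0.1}
    return scores.get(action, 0.5)

def pick_action(candidates: list[str]) -> str:
    # Classic software structure (pick the max), with the model's
    # world knowledge supplying the scoring heuristic.
    return max(candidates, key=fake_llm_score)

print(pick_action(["fight dragon unarmed", "mine wood"]))  # mine wood
```

The loop, the data structures, and the control flow are ordinary code; only the heuristic is "agential", which is why this feels like a building block rather than a whole new kind of program.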
My Thoughts
I mostly ignore the AGI agents. I think they’re important fundamental work! But I don’t think we’re that close to AGI, and I think OpenAI and Anthropic are likely further along than OSS.
Nondeterministic prompt chains + LLMs that use tools are new programming paradigms that demand new developer tools (and best practices).
Software with intuition is a new way of conceiving what is possible that will lead to a wide swath of products that will disrupt how we interact with computers.
WE NEED A BETTER VOCABULARY THAN JUST THE WORD AGENT
Happy Monday, all!