Teach Your Agents What You Know (Part 1)

20 Dec 2025

Part 2: using Socratic to improve the success rate of the tau-bench airline agent by 10-17%.

TL;DR: Vertical AI agents often struggle because domain knowledge is tacit and difficult to capture in static system prompts or by retrieving from raw documents. This post proposes treating agents as students: human domain experts teach the agent through iterative, interactive chats while the agent distills domain rules, definitions, and heuristics into a continuously improving knowledge base. I’ve implemented this workflow in an open-source prototype, Socratic.

3-min Video Demo

The Context Bottleneck

It’s widely known that the key to building effective agents is giving them the right context, especially for vertical agents: agents that specialize in a specific domain. To reliably perform highly specialized tasks, human experts use intuitions, best practices, and heuristics built over accumulated experience. However, transferring this expert knowledge from humans to agents is not easy. This knowledge is often fragmented and unstructured. More importantly, domain expertise requires a deep understanding of principles, heuristics, and edge cases beyond memorizing facts. So how can we transfer such knowledge from humans to agents?

Existing Approaches

Two common approaches for transferring knowledge are prompt engineering and information retrieval.

1. Prompt Engineering (aka. Write the System Prompt by Hand)

In this approach, a domain expert crafts a detailed system prompt that encodes the relevant policies and procedures. A canonical example is an airline customer support agent whose system prompt includes airline rules for cancellations and refunds (e.g., tau-bench).
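For illustration, a fragment of such a system prompt might look like the following. These rules are hypothetical, written in the spirit of tau-bench’s airline policy rather than quoted from it:

```text
You are an airline customer support agent. Follow these policies exactly:
- Cancellations: free within 24 hours of booking; after that, apply fare rules.
- Refunds: refundable fares only; basic economy fares are non-refundable.
- Never issue a refund without first verifying the booking reference.
```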

This can work, but it has two costs. First, it requires someone who is an expert in both the domain and LLM prompt engineering. Such a person is often hard to find, since the domain may have nothing to do with LLMs (e.g., legal, health care, construction). Second, writing system prompts demands that experts enumerate their knowledge up front. But human expertise is often tacit and difficult to articulate (“We can know more than we can tell”). We don’t naturally operate by listing all our rules and heuristics; we apply them in response to specific situations.

2. Information Retrieval (aka. Dump the Documents)

In this approach, the agent retrieves from a given set of documents and reasons over the information it finds. Practically, this looks like: a human provides a set of potentially relevant artifacts (design docs, code, meeting notes, emails). At runtime, the user asks the agent a question. The agent searches over the artifact corpus, retrieves relevant parts, and synthesizes an answer.
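The retrieval flow can be sketched as follows. Everything here is an illustrative stand-in: a real system would use embedding-based search over the corpus and an LLM to synthesize the answer from the retrieved parts.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(corpus: dict[str, str], query: str, k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda name: len(query_words & tokenize(corpus[name])),
        reverse=True,
    )
    return ranked[:k]

# A toy artifact corpus of the kind a human might provide up front.
corpus = {
    "refund_policy.md": "Refunds are allowed within 24 hours of booking.",
    "meeting_notes.txt": "Discussed the Q3 roadmap and hiring plans.",
}
print(retrieve(corpus, "When are refunds allowed?"))  # → ['refund_policy.md']
```

The sketch works for fact lookup precisely because the answer sits verbatim in one document; nothing in the loop interprets or applies the policy.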

This is a good fit for “facts lookup” tasks, where the answer can be found directly somewhere in the source documents. It’s less reliable when the domain knowledge is complex, because the hardest part is not finding information, but correctly understanding and applying domain knowledge.

An analogy: it’s like onboarding a junior engineer by handing them a folder of documents and saying “good luck.” The most critical part of knowledge transfer, synthesizing this “raw” information into usable knowledge, is pushed entirely onto the learner. This is the “document dump” approach: dump your docs and hope the agent works.

Transferring Knowledge Through Teaching

Humans transfer knowledge to each other all the time, commonly through teaching: “the practice implemented by a teacher aimed at transmitting skills to a learner” (Wikipedia). Could this be a viable way to transfer knowledge to agents? The human expert acts as the teacher and the agent as the student.

I’ll focus on two properties that make teaching effective:

  1. Teaching is interactive. The teacher doesn’t just hand over material; the learner asks questions, and the teacher corrects misconceptions as they surface.
  2. Teaching is iterative. Understanding is built up over repeated sessions, with each session refining what came before.

These two properties motivate the design of a ‘student agent’ system that facilitates this teaching process, which I’ll describe next.

A Teacher-Student System Prototype

To test this idea, I prototyped a system called Socratic (open source, Apache license). A video demo is shown at the beginning of this blog.

The inputs to the system are source documents. In practice, these can be any artifacts that contain domain knowledge: design docs, code, meeting notes, daily logs, past chat transcripts, etc. These are the same materials you would previously have dropped into a RAG pipeline or document dump. Teaching happens through chat sessions between the human user and the student agent. The agent maintains and updates a plain-text knowledge base based on information received from the human teacher.

The output of this process is the knowledge base, which captures the distilled rules, definitions, and strategies that emerged during the teaching process. Practically, it can be exported as AGENTS.md, uploaded into a chat UI, stored alongside a codebase, and so on.

As a concrete example, I used Socratic to build a knowledge base about Socratic itself: the problem it tries to solve, the design decisions I made, and how those decisions evolved over time. The source documents include the Socratic source code, my design logs, and brainstorming conversations with ChatGPT. The resulting knowledge bases are available here.

Following our insight that effective teaching is interactive, Socratic implements two methods to initiate knowledge transfer:

  1. Teacher-initiated teaching. The human user instructs the agent on a topic to learn. E.g., “Let’s look at how Socratic stores the knowledge base.” The agent studies the relevant documents and proposes updates to the knowledge base. The human reviews, corrects, and approves these updates.
  2. Student-initiated learning. The agent studies the existing knowledge base and source documents to identify inconsistencies, gaps, or ambiguities and generates questions for the human. This process starts without a specific human instruction. E.g., Agent: “The current knowledge base mentions that we assume X, but a source document seems to assume Y. Please clarify which one is correct.”

Socratic is naturally iterative. The student agent updates the knowledge base with each chat session, and as the knowledge base evolves, so does the agent’s understanding of the domain.
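As a sketch (hypothetical code, not Socratic’s actual internals), the iteration can be pictured as folding each session’s distilled lessons into the plain-text knowledge base, so understanding accumulates across sessions:

```python
# Each chat session distills what was taught into entries appended to the
# knowledge base; the next session starts from the updated version.

def run_session(kb: list[str], lessons: list[str]) -> list[str]:
    """Fold one session's distilled lessons into the knowledge base."""
    for lesson in lessons:
        if lesson not in kb:  # skip rules the agent already knows
            kb.append(lesson)
    return kb

kb: list[str] = []
kb = run_session(kb, ["Refunds require a verified booking reference."])
kb = run_session(kb, ["Refunds require a verified booking reference.",
                      "Basic economy fares are non-refundable."])
print(len(kb))  # → 2: the repeated lesson is not duplicated
```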

Vision: Teaching as the New Training

The key question is: how much can we improve an agent’s practical competence through repeated teaching? Ideally, the more we teach an agent, the “better” the knowledge base becomes, and the more reliably the agent can act within its domain.

This framing also highlights an important question: what does it mean to be a good teacher for an agent? If teaching becomes a core workflow, then “agent education” might become a real skill: deciding what to teach first, picking the right examples, probing for misconceptions.

Of course, while this vision is exciting, it is unclear how much performance we can squeeze out of existing agents through teaching. In Part 2 of this blog series, I will share a concrete use case of Socratic, optimizing an airline customer service agent, and evaluate its effectiveness.