Testing

The pkg/testutil package provides a scriptable provider, a fake memory store, and assertion helpers, so you can unit-test any code built on top of chainforge without calling a real LLM.

import "github.com/lioarce01/chainforge/pkg/testutil"

MockProvider

Script deterministic LLM responses in your tests:

p := testutil.NewMockProvider(
    testutil.EndTurnResponse("Hello, how can I help?"),          // turn 1
    testutil.ToolUseResponse(core.ToolCall{                      // turn 2 — calls a tool
        Name:  "search",
        Input: `{"query":"Go concurrency"}`,
    }),
    testutil.EndTurnResponse("Here is what I found: ..."),       // turn 3 — final answer
)

agent, _ := chainforge.NewAgent(
    chainforge.WithProvider(p),
    chainforge.WithModel("mock"),
    chainforge.WithTools(mySearchTool),
)

result, err := agent.Run(ctx, "s1", "Tell me about Go concurrency")

Responses are returned in the order they were scripted. Once the script is exhausted, the last response repeats, so a single EndTurnResponse covers any number of calls.
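
The replay rule is easy to get wrong by one call, so here is a minimal, self-contained sketch of it. The `script` type and its `next` method are hypothetical stand-ins written for this page, not chainforge's implementation; they only model "in order, then repeat the last response".

```go
package main

import "fmt"

// script is a hypothetical stand-in for a scripted provider's response queue.
// It is not part of chainforge; it only models the replay rule described above.
type script struct {
	responses []string
	calls     int
}

// next returns the responses in order and keeps returning the last one
// once the script is exhausted. An empty script yields "".
func (s *script) next() string {
	if len(s.responses) == 0 {
		return ""
	}
	i := s.calls
	if i >= len(s.responses) {
		i = len(s.responses) - 1 // clamp to the final scripted response
	}
	s.calls++
	return s.responses[i]
}

func main() {
	s := &script{responses: []string{"first", "final"}}
	for i := 0; i < 4; i++ {
		fmt.Println(s.next()) // "first" once, then "final" for every later call
	}
}
```

In a test, this means a script only has to be as long as the interesting part of the conversation; any extra provider calls see the final response again.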

Response builders

Builder	Description
`EndTurnResponse(text)`	A text response that ends the agent loop (`stop_reason: end_turn`).
`ToolUseResponse(calls...)`	A tool-call response (`stop_reason: tool_use`).
`ErrorResponse(err)`	A provider error — the agent surfaces this as a `Run` error.

Inspection

// How many times was the provider called?
p.CallCount()

// What did the last request look like?
req := p.LastRequest()
fmt.Println(req.Messages)

// Full call history
for _, call := range p.Calls() {
    fmt.Println(call.Request.Model)
}

// Reset for the next test case
p.Reset()
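
Resetting matters most when one mock is shared across several test cases, because counts from an earlier case would otherwise leak into the next. The sketch below shows that pattern with a hypothetical `recorder` type; its method names are illustrative and are not chainforge's MockProvider API.

```go
package main

import (
	"fmt"
	"sync"
)

// recorder is a hypothetical call log, not chainforge's MockProvider.
// The mutex guards calls in case the code under test uses several goroutines.
type recorder struct {
	mu    sync.Mutex
	calls []string
}

// record appends one prompt to the call history.
func (r *recorder) record(prompt string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.calls = append(r.calls, prompt)
}

// callCount returns how many calls were recorded since the last reset.
func (r *recorder) callCount() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.calls)
}

// last returns the most recent prompt, or false if nothing was recorded.
func (r *recorder) last() (string, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.calls) == 0 {
		return "", false
	}
	return r.calls[len(r.calls)-1], true
}

// reset clears the history so the next test case starts clean.
func (r *recorder) reset() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.calls = nil
}

func main() {
	r := &recorder{}
	cases := [][]string{{"a", "b"}, {"c"}}
	for _, prompts := range cases {
		r.reset() // without this, the second case would report 3 calls
		for _, p := range prompts {
			r.record(p)
		}
		fmt.Println(r.callCount())
	}
}
```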

MapMemoryStore

A simple in-memory store that records operations for inspection:

mem := testutil.NewMapMemory()

agent, _ := chainforge.NewAgent(
    chainforge.WithProvider(p),
    chainforge.WithModel("mock"),
    chainforge.WithMemory(mem),
)

agent.Run(ctx, "session-1", "Hello")

// Inspect what was stored
testutil.AssertSessionContains(t, mem, "session-1", "Hello")
testutil.AssertSessionLen(t, mem, "session-1", 2) // user + assistant

fmt.Println(mem.AppendCount()) // total Append calls
fmt.Println(mem.SessionIDs())  // all active session IDs
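
To make "records operations for inspection" concrete, here is a rough sketch of a recording store. The `mapStore` and `message` types and their methods are assumptions made for illustration; chainforge's MapMemoryStore may be structured differently.

```go
package main

import (
	"fmt"
	"sort"
)

// message is a hypothetical chat message, used only in this sketch.
type message struct {
	Role    string
	Content string
}

// mapStore is a hypothetical recording store, not chainforge's MapMemoryStore.
// It keeps messages per session and counts how often add was called.
type mapStore struct {
	sessions map[string][]message
	addCount int
}

func newMapStore() *mapStore {
	return &mapStore{sessions: make(map[string][]message)}
}

// add stores the messages under sessionID and counts the operation.
func (m *mapStore) add(sessionID string, msgs ...message) {
	m.sessions[sessionID] = append(m.sessions[sessionID], msgs...)
	m.addCount++
}

// sessionIDs returns the known session IDs in sorted order, so that
// assertions do not depend on Go's randomized map iteration.
func (m *mapStore) sessionIDs() []string {
	ids := make([]string, 0, len(m.sessions))
	for id := range m.sessions {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	m := newMapStore()
	m.add("session-1", message{Role: "user", Content: "Hello"})
	m.add("session-1", message{Role: "assistant", Content: "Hi there"})
	fmt.Println(len(m.sessions["session-1"]), m.addCount, m.sessionIDs())
}
```

Counting operations separately from stored messages lets a test distinguish "two messages written in one call" from "two calls with one message each".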

Assertions

Drop-in assertion helpers that work with *testing.T or any other testing.TB:

// Verify the provider was called exactly twice
testutil.AssertCallCount(t, p, 2)

// Verify the last request contained a specific message
testutil.AssertLastRequestContains(t, p, core.RoleUser, "Go concurrency")

// Verify a session contains a specific string
testutil.AssertSessionContains(t, mem, "session-1", "expected text")

// Verify a session has an exact number of messages
testutil.AssertSessionLen(t, mem, "session-1", 4)
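
If you need a project-specific assertion in the same style, the usual Go pattern is a function that takes testing.TB and calls t.Helper(). The sketch below is an illustration under that assumption: `assertContains` and `spyTB` are hypothetical names, not part of testutil.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// assertContains is a hypothetical helper in the style of the assertions above.
// Taking testing.TB lets it accept both *testing.T and *testing.B.
func assertContains(t testing.TB, messages []string, want string) {
	t.Helper() // report failures at the caller's line, not inside the helper
	for _, m := range messages {
		if strings.Contains(m, want) {
			return
		}
	}
	t.Errorf("no message contains %q (checked %d messages)", want, len(messages))
}

// spyTB records failures so the helper can be exercised outside a real test run.
// Embedding testing.TB satisfies the interface's unexported method; only the
// methods overridden below may be called, because the embedded value is nil.
type spyTB struct {
	testing.TB
	failures []string
}

func (s *spyTB) Helper() {}

func (s *spyTB) Errorf(format string, args ...any) {
	s.failures = append(s.failures, fmt.Sprintf(format, args...))
}

func main() {
	spy := &spyTB{}
	msgs := []string{"Hello", "Hi there"}
	assertContains(spy, msgs, "Hello")   // passes, records nothing
	assertContains(spy, msgs, "Goodbye") // fails, records one message
	fmt.Println(len(spy.failures), "failure recorded")
}
```

In an actual test file you would pass the `t` from your test function directly; the spy exists only so the failure path can be demonstrated without failing a real test.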

Debug handler

During development, attach PrettyPrintDebugHandler to see a turn-by-turn transcript of the agent loop:

agent, _ := chainforge.NewAgent(
    chainforge.WithProvider(p),
    chainforge.WithModel("claude-sonnet-4-6"),
    chainforge.WithDebugHandler(chainforge.PrettyPrintDebugHandler(os.Stderr)),
)

Output:

[iter 0] → LLM  (3 messages)
[iter 0] ← LLM  stop=tool_use  ""
[iter 0] ⚙  tool=search  input={"query":"test"}
[iter 0] ✓  tool=search  result=result
[iter 1] → LLM  (5 messages)
[iter 1] ← LLM  stop=end_turn  "Done."

You can also write a custom handler for test assertions:

var toolsCalled []string
agent, _ := chainforge.NewAgent(
    chainforge.WithProvider(p),
    chainforge.WithModel("mock"),
    chainforge.WithDebugHandler(func(ctx context.Context, ev chainforge.DebugEvent) {
        if ev.Kind == chainforge.DebugToolCall {
            toolsCalled = append(toolsCalled, ev.ToolCall.Name)
        }
    }),
)

agent.Run(ctx, "s1", "search for something")

if len(toolsCalled) == 0 || toolsCalled[0] != "search" {
    t.Errorf("expected search tool, got %v", toolsCalled)
}

AgentTrace — rich run assertions

AgentTrace records every LLM call and tool invocation during a run, making test assertions read like specifications.

import "github.com/lioarce01/chainforge/pkg/testutil"

func TestMyAgent(t *testing.T) {
    p := testutil.NewMockProvider(
        testutil.ToolUseResponse(core.ToolCall{Name: "search", Input: `{"query":"Go"}`}),
        testutil.EndTurnResponse("Go is great."),
    )

    tr := &testutil.AgentTrace{}

    agent, _ := chainforge.NewAgent(
        chainforge.WithProvider(p),
        chainforge.WithModel("mock"),
        chainforge.WithTools(searchTool),
        chainforge.WithDebugHandler(testutil.TraceHandler(tr)),
    )

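    // The assertions below read from the trace, so the returned text is not needed.
    if _, err := agent.Run(context.Background(), "s1", "Tell me about Go"); err != nil {
        t.Fatal(err)
    }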

    tr.AssertNoError(t)
    tr.AssertIterations(t, 2)          // tool call iteration + final answer
    tr.AssertToolCalled(t, "search")   // verify the tool was used
    tr.AssertFinalText(t, "Go is great.")
}

AgentTrace assertion methods:

Method	Description
`AssertIterations(t, n)`	Fails if the agent did not run exactly `n` iterations.
`AssertToolCalled(t, name)`	Fails if the named tool was not called.
`AssertToolNotCalled(t, name)`	Fails if the named tool was called.
`AssertFinalText(t, want)`	Fails if the final text is not `want`.
`AssertNoError(t)`	Fails if the run returned an error.

For access to the raw data (messages, responses, tool outputs), inspect tr.Iterations directly.
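
The idea behind a trace is a handler that folds debug events into per-iteration records, which assertions then query. The following is only a sketch of that idea: every type, field, and function in it is hypothetical, and chainforge's DebugEvent and AgentTrace may be shaped differently.

```go
package main

import "fmt"

// All types here are hypothetical stand-ins for illustration only.
type eventKind int

const (
	llmResponse eventKind = iota
	toolCall
)

type event struct {
	Kind      eventKind
	Iteration int    // zero-based loop iteration
	Tool      string // set for toolCall events
	Text      string // set for llmResponse events
}

type iteration struct {
	Tools []string
	Text  string
}

type trace struct {
	Iterations []iteration
}

// handler returns a callback that folds events into per-iteration records.
func handler(tr *trace) func(event) {
	return func(ev event) {
		if ev.Iteration < 0 {
			return // ignore malformed events
		}
		for len(tr.Iterations) <= ev.Iteration {
			tr.Iterations = append(tr.Iterations, iteration{})
		}
		it := &tr.Iterations[ev.Iteration]
		switch ev.Kind {
		case toolCall:
			it.Tools = append(it.Tools, ev.Tool)
		case llmResponse:
			it.Text = ev.Text
		}
	}
}

// toolCalled reports whether any iteration invoked the named tool.
func (tr *trace) toolCalled(name string) bool {
	for _, it := range tr.Iterations {
		for _, t := range it.Tools {
			if t == name {
				return true
			}
		}
	}
	return false
}

// finalText returns the text of the last iteration, or "" for an empty trace.
func (tr *trace) finalText() string {
	if len(tr.Iterations) == 0 {
		return ""
	}
	return tr.Iterations[len(tr.Iterations)-1].Text
}

func main() {
	tr := &trace{}
	h := handler(tr)
	h(event{Kind: toolCall, Iteration: 0, Tool: "search"})
	h(event{Kind: llmResponse, Iteration: 0})
	h(event{Kind: llmResponse, Iteration: 1, Text: "Go is great."})
	fmt.Println(len(tr.Iterations), tr.toolCalled("search"), tr.finalText())
}
```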

RunStreamCollect — streaming in one line

Use RunStreamCollect when you want streaming for display but still need the final result:

text, usage, err := agent.RunStreamCollect(ctx, "session-1", "Hello",
    func(delta string) { fmt.Print(delta) }) // called for each text chunk

// onDelta is nil-safe: with nil, the call behaves like Run but also returns usage.
// Plain assignment is used because text, usage, and err are already declared above.
text, usage, err = agent.RunStreamCollect(ctx, "session-1", "Hello", nil)
fmt.Printf("tokens: %d in / %d out\n", usage.InputTokens, usage.OutputTokens)
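
The "stream for display, collect for the result" behaviour, including the nil-safe callback, can be sketched in a few lines. The `collect` and `stream` functions below are hypothetical and use a plain channel of strings as the stream, which is an assumption; they are not how RunStreamCollect is implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// collect is a hypothetical helper, not chainforge's RunStreamCollect.
// It drains a channel of text chunks, forwards each one to onDelta when a
// callback is supplied, and returns the concatenated text.
func collect(chunks <-chan string, onDelta func(string)) string {
	var b strings.Builder
	for c := range chunks {
		if onDelta != nil { // nil-safe: skip the callback when none is given
			onDelta(c)
		}
		b.WriteString(c)
	}
	return b.String()
}

// stream sends the parts on a buffered channel and closes it,
// standing in for a provider's streaming response.
func stream(parts ...string) <-chan string {
	ch := make(chan string, len(parts))
	for _, p := range parts {
		ch <- p
	}
	close(ch)
	return ch
}

func main() {
	// Print each chunk as it arrives and still get the full text at the end.
	text := collect(stream("Hel", "lo"), func(d string) { fmt.Print(d) })
	fmt.Println()

	// With a nil callback the result is the same; nothing is printed on the way.
	fmt.Println(text == collect(stream("Hel", "lo"), nil))
}
```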

Example: testing a tool-calling agent

func TestMyAgent_CallsSearchTool(t *testing.T) {
    var searchCalled bool
    searchTool, _ := tools.Func("search", "Search the web",
        tools.NewSchema().AddString("query", "query", true).MustBuild(),
        func(ctx context.Context, input string) (string, error) {
            searchCalled = true
            return "result", nil
        },
    )

    p := testutil.NewMockProvider(
        testutil.ToolUseResponse(core.ToolCall{Name: "search", Input: `{"query":"test"}`}),
        testutil.EndTurnResponse("Done."),
    )

    agent, err := chainforge.NewAgent(
        chainforge.WithProvider(p),
        chainforge.WithModel("mock"),
        chainforge.WithTools(searchTool),
    )
    if err != nil {
        t.Fatal(err)
    }

    result, err := agent.Run(context.Background(), "s1", "Search for test")
    if err != nil {
        t.Fatal(err)
    }

    testutil.AssertCallCount(t, p, 2) // tool call + final answer
    if !searchCalled {
        t.Error("expected search tool to be called")
    }
    if result != "Done." {
        t.Errorf("got %q, want %q", result, "Done.")
    }
}
