The pkg/testutil package provides a scriptable provider, a fake memory store, and assertion helpers so you can unit test any code built on top of chainforge — without spinning up a real LLM.
import "github.com/lioarce01/chainforge/pkg/testutil"

MockProvider

Script deterministic LLM responses in your tests:
p := testutil.NewMockProvider(
    testutil.EndTurnResponse("Hello, how can I help?"),          // turn 1
    testutil.ToolUseResponse(core.ToolCall{                      // turn 2 — calls a tool
        Name:  "search",
        Input: `{"query":"Go concurrency"}`,
    }),
    testutil.EndTurnResponse("Here is what I found: ..."),       // turn 3 — final answer
)

agent, _ := chainforge.NewAgent(
    chainforge.WithProvider(p),
    chainforge.WithModel("mock"),
    chainforge.WithTools(mySearchTool),
)

result, err := agent.Run(ctx, "s1", "Tell me about Go concurrency")
Responses are returned in order. Once exhausted, the last response repeats — so a single EndTurnResponse covers any number of calls.
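The replay rule described above can be illustrated with a small standalone sketch. The `scriptedProvider` type and its `Next` method are hypothetical names invented for this example and are not part of chainforge; the sketch only models the "in order, then repeat the last response" behavior and assumes nothing about how MockProvider is implemented.

```go
package main

import "fmt"

// scriptedProvider is a hypothetical stand-in for a scripted mock: it replays
// canned responses in order and repeats the last one once the script runs out.
type scriptedProvider struct {
	responses []string
	calls     int
}

// Next returns the response for the current call and advances the script.
func (p *scriptedProvider) Next() string {
	if len(p.responses) == 0 {
		return "" // nothing scripted
	}
	i := p.calls
	if i >= len(p.responses) {
		i = len(p.responses) - 1 // script exhausted: repeat the last response
	}
	p.calls++
	return p.responses[i]
}

func main() {
	p := &scriptedProvider{responses: []string{"first", "final"}}
	for i := 0; i < 4; i++ {
		fmt.Println(p.Next())
	}
	// Prints, one per line: first, final, final, final
}
```

This is why a script with a single final response can serve a test that makes an unknown number of provider calls.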

Response builders

| Builder | Description |
| --- | --- |
| EndTurnResponse(text) | A text response that ends the agent loop (stop_reason: end_turn). |
| ToolUseResponse(calls...) | A tool-call response (stop_reason: tool_use). |
| ErrorResponse(err) | A provider error; the agent surfaces this as a Run error. |

Inspection

// How many times was the provider called?
p.CallCount()

// What did the last request look like?
req := p.LastRequest()
fmt.Println(req.Messages)

// Full call history
for _, call := range p.Calls() {
    fmt.Println(call.Request.Model)
}

// Reset for the next test case
p.Reset()
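
For readers curious how this kind of inspection can work, here is a minimal standalone sketch of call recording. The `recorder` and `request` types are hypothetical and exist only for this illustration; they do not reflect the actual MockProvider internals.

```go
package main

import "fmt"

// request is a hypothetical, simplified request shape for this sketch.
type request struct {
	Model    string
	Messages []string
}

// recorder keeps every request it sees so tests can inspect them afterwards.
type recorder struct {
	calls []request
}

// record appends one request to the call history.
func (r *recorder) record(req request) { r.calls = append(r.calls, req) }

// callCount reports how many requests were recorded.
func (r *recorder) callCount() int { return len(r.calls) }

// lastRequest returns the most recent request, or false if there is none.
func (r *recorder) lastRequest() (request, bool) {
	if len(r.calls) == 0 {
		return request{}, false
	}
	return r.calls[len(r.calls)-1], true
}

// reset clears the history so the next test case starts clean.
func (r *recorder) reset() { r.calls = nil }

func main() {
	r := &recorder{}
	r.record(request{Model: "mock", Messages: []string{"Hello"}})
	r.record(request{Model: "mock", Messages: []string{"Hello", "Hi!", "Thanks"}})

	last, _ := r.lastRequest()
	fmt.Println(r.callCount(), len(last.Messages)) // prints: 2 3

	r.reset()
	fmt.Println(r.callCount()) // prints: 0
}
```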

MapMemoryStore

A simple in-memory store that records operations for inspection:
mem := testutil.NewMapMemory()

agent, _ := chainforge.NewAgent(
    chainforge.WithProvider(p),
    chainforge.WithModel("mock"),
    chainforge.WithMemory(mem),
)

agent.Run(ctx, "session-1", "Hello")

// Inspect what was stored
testutil.AssertSessionContains(t, mem, "session-1", "Hello")
testutil.AssertSessionLen(t, mem, "session-1", 2) // user + assistant

fmt.Println(mem.AppendCount()) // total Append calls
fmt.Println(mem.SessionIDs())  // all active session IDs
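
The idea of a store that both holds messages and records operations can be sketched in a few lines. Everything below (`recordingStore`, `message`, and the method names) is hypothetical and stdlib-only; it is an assumption-laden illustration of the pattern, not the MapMemoryStore source.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// message is a hypothetical, simplified chat message.
type message struct {
	Role    string
	Content string
}

// recordingStore keeps messages per session and counts Append calls.
type recordingStore struct {
	mu       sync.Mutex
	sessions map[string][]message
	appends  int
}

func newRecordingStore() *recordingStore {
	return &recordingStore{sessions: make(map[string][]message)}
}

// Append stores msgs under sessionID and counts the call.
func (s *recordingStore) Append(sessionID string, msgs ...message) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.sessions[sessionID] = append(s.sessions[sessionID], msgs...)
	s.appends++
}

// Len returns the number of messages stored for sessionID.
func (s *recordingStore) Len(sessionID string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.sessions[sessionID])
}

// AppendCount returns the total number of Append calls.
func (s *recordingStore) AppendCount() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.appends
}

// SessionIDs returns all known session IDs in sorted order.
func (s *recordingStore) SessionIDs() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	ids := make([]string, 0, len(s.sessions))
	for id := range s.sessions {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	s := newRecordingStore()
	s.Append("session-1", message{Role: "user", Content: "Hello"})
	s.Append("session-1", message{Role: "assistant", Content: "Hi!"})
	fmt.Println(s.Len("session-1"), s.AppendCount(), s.SessionIDs())
	// Prints: 2 2 [session-1]
}
```

The mutex is a design choice for this sketch so the store stays safe if a test drives several goroutines at once.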

Assertions

Drop-in assertion helpers that accept a testing.TB, so they work with *testing.T as well as *testing.B:
// Verify the provider was called exactly twice
testutil.AssertCallCount(t, p, 2)

// Verify the last request contained a specific message
testutil.AssertLastRequestContains(t, p, core.RoleUser, "Go concurrency")

// Verify a session contains a specific string
testutil.AssertSessionContains(t, mem, "session-1", "expected text")

// Verify a session has an exact number of messages
testutil.AssertSessionLen(t, mem, "session-1", 4)
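
If you need an assertion that testutil does not provide, the same style is straightforward to write yourself. The sketch below is an assumption-based illustration: `assertContains`, `reporter`, and `fakeReporter` are hypothetical names, and `reporter` is just the small subset of testing.TB that the helper needs (which *testing.T satisfies).

```go
package main

import (
	"fmt"
	"strings"
)

// reporter is the subset of testing.TB used by this sketch.
// *testing.T and *testing.B both satisfy it.
type reporter interface {
	Helper()
	Errorf(format string, args ...any)
}

// assertContains fails the test if no message contains want.
func assertContains(t reporter, messages []string, want string) {
	t.Helper() // report failures at the caller's line, not inside this helper
	for _, m := range messages {
		if strings.Contains(m, want) {
			return
		}
	}
	t.Errorf("no message contains %q (checked %d messages)", want, len(messages))
}

// fakeReporter records failures so the helper itself can be exercised
// outside of "go test".
type fakeReporter struct{ failures []string }

func (f *fakeReporter) Helper() {}

func (f *fakeReporter) Errorf(format string, args ...any) {
	f.failures = append(f.failures, fmt.Sprintf(format, args...))
}

func main() {
	r := &fakeReporter{}
	assertContains(r, []string{"Hello", "Hi there"}, "Hello") // passes
	assertContains(r, []string{"Hello"}, "Goodbye")           // records one failure
	fmt.Println(len(r.failures)) // prints: 1
}
```

Calling `t.Helper()` first is what makes a failure point at the test line that invoked the helper.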

Debug handler

During development, attach PrettyPrintDebugHandler to see a turn-by-turn transcript of the agent loop:
agent, _ := chainforge.NewAgent(
    chainforge.WithProvider(p),
    chainforge.WithModel("claude-sonnet-4-6"),
    chainforge.WithDebugHandler(chainforge.PrettyPrintDebugHandler(os.Stderr)),
)
Output:
[iter 0] → LLM  (3 messages)
[iter 0] ← LLM  stop=tool_use  ""
[iter 0] ⚙  tool=search  input={"query":"test"}
[iter 0] ✓  tool=search  result=result
[iter 1] → LLM  (5 messages)
[iter 1] ← LLM  stop=end_turn  "Done."
You can also write a custom handler for test assertions:
var toolsCalled []string
agent, _ := chainforge.NewAgent(
    chainforge.WithProvider(p),
    chainforge.WithModel("mock"),
    chainforge.WithDebugHandler(func(ctx context.Context, ev chainforge.DebugEvent) {
        if ev.Kind == chainforge.DebugToolCall {
            toolsCalled = append(toolsCalled, ev.ToolCall.Name)
        }
    }),
)

agent.Run(ctx, "s1", "search for something")

if len(toolsCalled) == 0 || toolsCalled[0] != "search" {
    t.Errorf("expected search tool, got %v", toolsCalled)
}

AgentTrace — rich run assertions

AgentTrace records every LLM call and tool invocation during a run, making test assertions read like specifications.
import "github.com/lioarce01/chainforge/pkg/testutil"

func TestMyAgent(t *testing.T) {
    p := testutil.NewMockProvider(
        testutil.ToolUseResponse(core.ToolCall{Name: "search", Input: `{"query":"Go"}`}),
        testutil.EndTurnResponse("Go is great."),
    )

    tr := &testutil.AgentTrace{}

    agent, _ := chainforge.NewAgent(
        chainforge.WithProvider(p),
        chainforge.WithModel("mock"),
        chainforge.WithTools(searchTool),
        chainforge.WithDebugHandler(testutil.TraceHandler(tr)),
    )

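    // Discard the result explicitly: an unused local variable does not compile in Go.
    if _, err := agent.Run(context.Background(), "s1", "Tell me about Go"); err != nil {
        t.Fatal(err)
    }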

    tr.AssertNoError(t)
    tr.AssertIterations(t, 2)          // tool call iteration + final answer
    tr.AssertToolCalled(t, "search")   // verify the tool was used
    tr.AssertFinalText(t, "Go is great.")
}
AgentTrace assertion methods:
| Method | Description |
| --- | --- |
| AssertIterations(t, n) | Fails if the agent did not run exactly n iterations. |
| AssertToolCalled(t, name) | Fails if the named tool was not called. |
| AssertToolNotCalled(t, name) | Fails if the named tool was called. |
| AssertFinalText(t, want) | Fails if the final text is not want. |
| AssertNoError(t) | Fails if the run returned an error. |
For access to the raw data (messages, responses, tool outputs), inspect tr.Iterations directly.
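
As a rough mental model of how a trace can be built from debug events, consider the standalone sketch below. The `event`, `eventKind`, and `trace` types are hypothetical and heavily simplified; they are assumptions made for illustration and do not describe the real AgentTrace or DebugEvent structures.

```go
package main

import "fmt"

// eventKind is a hypothetical, reduced set of debug event kinds.
type eventKind int

const (
	llmResponse eventKind = iota
	toolCall
)

// event is a hypothetical, simplified debug event.
type event struct {
	Kind      eventKind
	Iteration int    // zero-based loop iteration
	ToolName  string // set for toolCall events
	Text      string // set for llmResponse events
}

// trace accumulates what happened during a run.
type trace struct {
	iterations int
	tools      []string
	finalText  string
}

// handle folds one event into the trace.
func (tr *trace) handle(ev event) {
	switch ev.Kind {
	case llmResponse:
		tr.iterations = ev.Iteration + 1 // iterations are zero-based
		tr.finalText = ev.Text           // the last response wins
	case toolCall:
		tr.tools = append(tr.tools, ev.ToolName)
	}
}

// toolCalled reports whether the named tool was invoked at least once.
func (tr *trace) toolCalled(name string) bool {
	for _, t := range tr.tools {
		if t == name {
			return true
		}
	}
	return false
}

func main() {
	tr := &trace{}
	events := []event{
		{Kind: llmResponse, Iteration: 0},
		{Kind: toolCall, Iteration: 0, ToolName: "search"},
		{Kind: llmResponse, Iteration: 1, Text: "Done."},
	}
	for _, ev := range events {
		tr.handle(ev)
	}
	fmt.Println(tr.iterations, tr.toolCalled("search"), tr.finalText)
	// Prints: 2 true Done.
}
```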

RunStreamCollect — streaming in one line

Use RunStreamCollect when you want streaming for display but still need the final result:
text, usage, err := agent.RunStreamCollect(ctx, "session-1", "Hello",
    func(delta string) { fmt.Print(delta) }) // called for each text chunk

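// Passing nil for onDelta is safe; the call then behaves like Run but also returns usage.
// Plain assignment is used here because text, usage, and err are already declared above.
text, usage, err = agent.RunStreamCollect(ctx, "session-1", "Hello", nil)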
fmt.Printf("tokens: %d in / %d out\n", usage.InputTokens, usage.OutputTokens)
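
The "stream for display, collect for the result" pattern can be shown in isolation. The `collect` and `stream` functions below are hypothetical helpers written for this sketch with only the standard library; they illustrate the nil-safe callback idea and assume nothing about how RunStreamCollect is implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// collect drains deltas, forwards each chunk to onDelta when it is non-nil,
// and returns the concatenated text.
func collect(deltas <-chan string, onDelta func(string)) string {
	var sb strings.Builder
	for d := range deltas {
		if onDelta != nil {
			onDelta(d) // live display hook; skipped when nil
		}
		sb.WriteString(d)
	}
	return sb.String()
}

// stream is a hypothetical producer that emits the given chunks and closes.
func stream(chunks ...string) <-chan string {
	ch := make(chan string, len(chunks))
	for _, c := range chunks {
		ch <- c
	}
	close(ch)
	return ch
}

func main() {
	// With a callback: chunks are printed as they arrive.
	text := collect(stream("Hel", "lo"), func(d string) { fmt.Print(d) })
	fmt.Println()

	// With nil: nothing is printed during streaming, but the result is the same.
	fmt.Println(text == collect(stream("Hel", "lo"), nil)) // prints: true
}
```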

Example: testing a tool-calling agent

func TestMyAgent_CallsSearchTool(t *testing.T) {
    var searchCalled bool
    searchTool, _ := tools.Func("search", "Search the web",
        tools.NewSchema().AddString("query", "query", true).MustBuild(),
        func(ctx context.Context, input string) (string, error) {
            searchCalled = true
            return "result", nil
        },
    )

    p := testutil.NewMockProvider(
        testutil.ToolUseResponse(core.ToolCall{Name: "search", Input: `{"query":"test"}`}),
        testutil.EndTurnResponse("Done."),
    )

    agent, err := chainforge.NewAgent(
        chainforge.WithProvider(p),
        chainforge.WithModel("mock"),
        chainforge.WithTools(searchTool),
    )
    if err != nil {
        t.Fatal(err)
    }

    result, err := agent.Run(context.Background(), "s1", "Search for test")
    if err != nil {
        t.Fatal(err)
    }

    testutil.AssertCallCount(t, p, 2) // tool call + final answer
    if !searchCalled {
        t.Error("expected search tool to be called")
    }
    if result != "Done." {
        t.Errorf("got %q, want %q", result, "Done.")
    }
}