chainforge ships micro-benchmarks that measure framework overhead (memory lookups, channel scheduling, message marshalling) with a zero-latency mock provider, plus an end-to-end CLI for real provider throughput.

Running micro-benchmarks

# All benchmarks, all packages
go test -bench=. -benchmem ./tests/bench/

# Single benchmark with custom iterations
go test -bench=BenchmarkAgentRun -benchmem -count=3 ./tests/bench/

# Race detector (catches data races under concurrent load)
go test -bench=BenchmarkAgentConcurrent -race ./tests/bench/

What each benchmark measures

Benchmark                           What it isolates
BenchmarkAgentRun                   Full agent.Run() overhead at zero LLM latency
BenchmarkAgentRunStream             agent.RunStream() channel drain overhead
BenchmarkAgentRunWithTool           Tool dispatch + ordering cost
BenchmarkAgentConcurrent            Goroutine scheduling under parallel load
BenchmarkAgentSharedSession         History load cost as session grows
BenchmarkStreamChunkSizes           SSE chunk granularity vs drain speed
BenchmarkInMemoryGetGrowingHistory  In-memory store Get as history grows

Results (AMD Ryzen 7 7800X3D, 16 threads)

Agent loop

Benchmark                       ns/op   B/op    allocs/op
AgentRun                        3,582   1,324   17
AgentRunWithTool                3,677   1,482   19
AgentConcurrent (8 goroutines)  6,307   5,269   15
AgentRunStream                  10,074  10,297  33
Tool dispatch adds ~100 ns over a plain run. Concurrent sessions scale linearly — no lock contention between independent sessions.

Memory store

Benchmark                    ns/op   B/op    allocs/op
InMemoryAppend               213     459     0
InMemoryConcurrentSessions   292     491     1
InMemoryGet (10 messages)    251     896     1
InMemoryGet (100 messages)   2,520   9,472   1
InMemoryGet (1000 messages)  26,485  90,112  1
Append amortizes to zero allocations per operation. Get allocates exactly one slice regardless of history length; its ns/op and B/op grow with the number of messages copied.

Streaming

Benchmark                        ns/op   B/op    allocs/op
StreamConcurrent                 3,251   14,082  31
StreamDrain (1 KB, 64 B chunks)  11,282  14,737  33
StreamChunkSizes/chunk=256       9,166   14,640  25
StreamChunkSizes/chunk=1024      5,748   7,580   18
Larger chunks mean fewer sends per stream, which cuts both allocations and drain time. Concurrent draining outperforms sequential due to goroutine scheduling overlap.

Interpreting results

  • ns/op — nanoseconds per operation (lower is better)
  • B/op — bytes allocated per op (lower = less GC pressure)
  • allocs/op — number of heap allocations per op
All micro-benchmarks use pkg/benchutil.MockProvider at zero latency. Real LLM calls add 200ms–5s per turn depending on provider and model.

Qdrant benchmarks

Qdrant benchmarks require a running Qdrant instance and are skipped automatically (including in CI) when QDRANT_URL is not set:
QDRANT_URL=localhost:6334 go test -bench=BenchmarkQdrant -benchmem ./tests/bench/

End-to-end CLI benchmark

The cmd/bench binary measures real provider throughput:
go build -o chainforge-bench ./cmd/bench/

# Mock provider — framework overhead only
./chainforge-bench --mock --requests 100 --concurrency 4

# Real provider
ANTHROPIC_API_KEY=sk-ant-... \
./chainforge-bench \
  --config config.yaml \
  --requests 20 \
  --concurrency 2 \
  --warmup 5
Output:
chainforge benchmark
  provider    : anthropic
  model       : claude-sonnet-4-6
  concurrency : 2
  requests    : 20 (+ 5 warmup)

Warming up (5 requests)...
Running benchmark (20 requests, concurrency 2)...

Results:
  requests : 20
  p50      : 3.86s
  p95      : 5.80s
  p99      : 5.80s
  mean     : 3.88s
  errors   : 0
  total    : 22.3s
  rps      : 0.90

MockProvider API

For custom benchmarks, pkg/benchutil.MockProvider supports:
p := benchutil.NewMockProvider(benchutil.LargeResponseText(512))
p.Latency = 50 * time.Millisecond // simulate network RTT
p.ChunkSize = 64                  // bytes per stream chunk
p.InjectError = errors.New("rate limited") // test error paths