Running micro-benchmarks
What each benchmark measures
| Benchmark | What it isolates |
|---|---|
| `BenchmarkAgentRun` | Full `agent.Run()` overhead at zero LLM latency |
| `BenchmarkAgentRunStream` | `agent.RunStream()` channel-drain overhead |
| `BenchmarkAgentRunWithTool` | Tool dispatch and ordering cost |
| `BenchmarkAgentConcurrent` | Goroutine scheduling under parallel load |
| `BenchmarkAgentSharedSession` | History-load cost as the session grows |
| `BenchmarkStreamChunkSizes` | SSE chunk granularity vs. drain speed |
| `BenchmarkInMemoryGetGrowingHistory` | In-memory store `Get` cost as history grows |
Results (AMD Ryzen 7 7800X3D, 16 threads)
Agent loop
| Benchmark | ns/op | B/op | allocs/op |
|---|---|---|---|
| AgentRun | 3,582 | 1,324 | 17 |
| AgentRunWithTool | 3,677 | 1,482 | 19 |
| AgentConcurrent (8 goroutines) | 6,307 | 5,269 | 15 |
| AgentRunStream | 10,074 | 10,297 | 33 |
Memory store
| Benchmark | ns/op | B/op | allocs/op |
|---|---|---|---|
| InMemoryAppend | 213 | 459 | 0 |
| InMemoryConcurrentSessions | 292 | 491 | 1 |
| InMemoryGet (10 messages) | 251 | 896 | 1 |
| InMemoryGet (100 messages) | 2,520 | 9,472 | 1 |
| InMemoryGet (1000 messages) | 26,485 | 90,112 | 1 |
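The results show one allocation per `Get`, with ns/op and B/op growing roughly linearly: 251 ns and 896 B at 10 messages, 26,485 ns and 90,112 B at 1000. That pattern is consistent with a `Get` that copies the session history into a fresh slice. Below is a minimal sketch of such a store. The `Message` and `memStore` types are hypothetical, not the project's real ones:

```go
package main

import "sync"

// Message is a hypothetical chat message; the real store's type may differ.
type Message struct {
	Role    string
	Content string
}

// memStore is a hypothetical in-memory session store guarded by an RWMutex.
type memStore struct {
	mu       sync.RWMutex
	sessions map[string][]Message
}

func newMemStore() *memStore {
	return &memStore{sessions: make(map[string][]Message)}
}

// Append adds one message; append amortizes slice growth, so most calls
// do not allocate.
func (s *memStore) Append(session string, m Message) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.sessions[session] = append(s.sessions[session], m)
}

// Get returns a copy of the history: one allocation whose size, and whose
// copy time, grow linearly with the number of stored messages.
func (s *memStore) Get(session string) []Message {
	s.mu.RLock()
	defer s.mu.RUnlock()
	h := s.sessions[session]
	out := make([]Message, len(h))
	copy(out, h)
	return out
}
```

Returning a copy costs one allocation per call. In exchange, callers can read or modify the result without holding the lock and without affecting the stored history.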
Streaming
| Benchmark | ns/op | B/op | allocs/op |
|---|---|---|---|
| StreamConcurrent | 3,251 | 14,082 | 31 |
| StreamDrain (1 KB, 64 B chunks) | 11,282 | 14,737 | 33 |
| StreamChunkSizes/chunk=256 | 9,166 | 14,640 | 25 |
| StreamChunkSizes/chunk=1024 | 5,748 | 7,580 | 18 |
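As chunk size grows, ns/op falls. This suggests drain cost depends mostly on the number of chunks, not the number of bytes. A 1 KB payload split into 64 B chunks needs 16 channel sends and receives, while 1024 B chunks need only one. Below is a minimal sketch of a chunked producer and a draining consumer. It uses a plain channel as a hypothetical stand-in for the SSE stream; both function names are illustrative:

```go
package main

// splitIntoChunks is a hypothetical producer that sends payload over a
// channel in chunkSize pieces. Chunks are sub-slices of payload, so no bytes
// are copied; consumers must treat them as read-only.
func splitIntoChunks(payload []byte, chunkSize int) <-chan []byte {
	if chunkSize <= 0 {
		panic("chunkSize must be positive")
	}
	ch := make(chan []byte)
	go func() {
		defer close(ch)
		for start := 0; start < len(payload); start += chunkSize {
			end := start + chunkSize
			if end > len(payload) {
				end = len(payload)
			}
			ch <- payload[start:end]
		}
	}()
	return ch
}

// drain consumes every chunk, returning how many receives it took and how
// many bytes arrived in total.
func drain(ch <-chan []byte) (chunks, total int) {
	for c := range ch {
		chunks++
		total += len(c)
	}
	return chunks, total
}
```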
Interpreting results
- `ns/op`: nanoseconds per operation (lower is better); in the agent benchmarks, one operation is one agent loop iteration
- `B/op`: bytes allocated per operation (lower means less GC pressure)
- `allocs/op`: number of heap allocations per operation
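These three columns can also be read programmatically. `testing.Benchmark` runs a benchmark function outside `go test`, and the `testing.BenchmarkResult` it returns exposes `NsPerOp`, `AllocedBytesPerOp`, and `AllocsPerOp`. Here is a small sketch; `BenchmarkAlloc1KB` and `columns` are illustrative names, not part of this project:

```go
package main

import "testing"

// buf is a package-level sink so the allocation below cannot be optimized away.
var buf []byte

// BenchmarkAlloc1KB allocates one 1 KiB slice per iteration, so it should
// report about 1024 B/op and 1 allocs/op.
func BenchmarkAlloc1KB(b *testing.B) {
	b.ReportAllocs() // report B/op and allocs/op even without -benchmem
	for i := 0; i < b.N; i++ {
		buf = make([]byte, 1024)
	}
}

// columns runs f through testing.Benchmark, which works outside `go test`,
// and returns the three metrics shown in the results tables.
func columns(f func(b *testing.B)) (nsPerOp, bytesPerOp, allocsPerOp int64) {
	r := testing.Benchmark(f)
	return r.NsPerOp(), r.AllocedBytesPerOp(), r.AllocsPerOp()
}
```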
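The benchmarks run against `pkg/benchutil.MockProvider` at zero latency, so the numbers above isolate framework overhead. Real LLM calls add 200 ms–5 s per turn depending on provider and model, which dwarfs the ~3.6 µs per-iteration overhead measured for AgentRun.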
Qdrant benchmarks
Qdrant benchmarks require a running Qdrant instance and are skipped automatically in CI when `QDRANT_URL` is not set.
End-to-end CLI benchmark
The `cmd/bench` binary measures real provider throughput.
MockProvider API
For custom benchmarks, `pkg/benchutil.MockProvider` supports: