chainforge provides two observability layers: structured logging (stdlib log/slog, zero deps) and OpenTelemetry tracing (OTLP gRPC export). Both are opt-in — pkg/core remains dependency-free.

Quick setup (agent options)

The simplest way to enable observability is via agent options. Logging needs only the stdlib log/slog; tracing additionally uses the optional otel middleware:
import (
    "log/slog"
    chainforge "github.com/lioarce01/chainforge"
    cfgotel "github.com/lioarce01/chainforge/pkg/middleware/otel"
)

// Optional: initialise a real tracer provider before NewAgent.
_, shutdown, _ := cfgotel.InitTracerProvider(ctx, "localhost:4317", "my-service", "1.0.0")
defer shutdown(ctx)

agent, _ := chainforge.NewAgent(
    chainforge.WithProvider(p),
    chainforge.WithModel("claude-sonnet-4-6"),
    chainforge.WithLogging(slog.Default()),  // structured logs per call
    chainforge.WithTracing(),                // OTel spans per call
)
  • WithLogging and WithTracing wrap the provider transparently — no changes to how you call Run or RunStream.
  • WithTracing() uses the global OTel tracer. If InitTracerProvider was not called, spans are created against a no-op tracer and silently discarded.
  • Both options can be stacked in any order. Recommended: WithLogging before WithTracing so log events nest inside the span.

Logging middleware (advanced)

For finer-grained control — or to wrap a memory store — use pkg/middleware/logging directly:
import "github.com/lioarce01/chainforge/pkg/middleware/logging"

// Wrap provider — emits debug/info/error events per call.
loggedProvider := logging.NewLoggedProvider(rawProvider, logger)

// Wrap memory store.
loggedMemory := logging.NewLoggedMemoryStore(rawStore, logger)

agent, _ := chainforge.NewAgent(
    chainforge.WithProvider(loggedProvider),
    chainforge.WithMemory(loggedMemory),
)

Log events emitted

Event | Level | Fields
provider.Chat: start | Debug | provider, model, messages
provider.Chat: done | Info | provider, model, duration, stop_reason, input_tokens, output_tokens
provider.Chat: error | Error | provider, model, duration, error
provider.ChatStream: start | Debug | provider, model, messages
provider.ChatStream: done | Info | provider, model, duration, stop_reason, text_bytes
provider.ChatStream: stream error | Error | provider, duration, error
memory.Get: done | Debug | session, messages, duration
memory.Append: done | Debug | session, count, duration
memory.Clear: done | Info | session, duration

OpenTelemetry tracing

pkg/middleware/otel wraps providers and memory with OTel spans. Requires an OTLP-compatible backend (Jaeger, Grafana Tempo, Honeycomb, etc.).

Setup

import (
    "go.opentelemetry.io/otel/trace/noop"
    cfgotel "github.com/lioarce01/chainforge/pkg/middleware/otel"
)

// In production: initialise a real tracer provider.
_, shutdown, err := cfgotel.InitTracerProvider(
    ctx,
    "localhost:4317",   // OTLP gRPC endpoint
    "my-service",
    "1.0.0",
)
if err != nil {
    panic(err) // collector unreachable, bad endpoint, etc.
}
defer shutdown(ctx)

tracer := cfgotel.Tracer()

// In tests: use the noop tracer (no real backend needed).
tracer = noop.NewTracerProvider().Tracer("")

// Wrap at startup.
tracedProvider := cfgotel.NewTracedProvider(rawProvider, tracer)
tracedMemory   := cfgotel.NewTracedMemoryStore(rawStore, tracer)

agent, _ := chainforge.NewAgent(
    chainforge.WithProvider(tracedProvider),
    chainforge.WithMemory(tracedMemory),
)

Spans and attributes

Span name | Attributes
chainforge.provider.chat | provider, model, messages, session_id (auto), stop_reason, input_tokens, output_tokens
chainforge.provider.chat_stream | provider, model, messages, session_id (auto), stop_reason, input_tokens, output_tokens
chainforge.memory.get | session_id, message_count
chainforge.memory.append | session_id, message_count
chainforge.memory.clear | session_id
The session_id attribute is injected automatically by the agent loop before every provider call — no user action required.

Custom span attributes

Use WithTraceAttributes to append arbitrary attributes to every span. The function receives the call context, allowing extraction of request-scoped values:
import (
    "go.opentelemetry.io/otel/attribute"
    chainforge "github.com/lioarce01/chainforge"
)

agent, _ := chainforge.NewAgent(
    chainforge.WithProvider(p),
    chainforge.WithModel("claude-sonnet-4-6"),
    chainforge.WithTracing(),
    chainforge.WithTraceAttributes(func(ctx context.Context) []attribute.KeyValue {
        return []attribute.KeyValue{
            attribute.String("user_id",  userIDFromCtx(ctx)),
            attribute.String("tenant",   tenantFromCtx(ctx)),
            attribute.String("session",  chainforge.SessionIDFromContext(ctx)),
        }
    }),
)
WithTraceAttributes has no effect when WithTracing is not set.

Streaming span lifecycle

chainforge.provider.chat_stream ends after the last event is drained, not after the channel is opened. This captures true end-to-end streaming latency including tool dispatch time.

Composing logging + tracing (advanced)

rawProvider    := anthropicProvider
loggedProvider := logging.NewLoggedProvider(rawProvider, logger)
tracedProvider := cfgotel.NewTracedProvider(loggedProvider, tracer)

// tracedProvider → loggedProvider → rawProvider
// Both wrappers are active; order determines span/log nesting.
Or equivalently via options (recommended):
chainforge.WithLogging(logger),  // logging wraps rawProvider
chainforge.WithTracing(),        // tracing wraps the logged provider

Prometheus metrics

pkg/middleware/metrics records three metric families for every provider call.

Setup

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/lioarce01/chainforge/pkg/middleware/metrics"
)

// Register on any prometheus.Registerer (use NewRegistry() in tests)
mp, err := metrics.New(provider, prometheus.DefaultRegisterer)
// or panic on registration error:
mp = metrics.MustNew(provider, prometheus.DefaultRegisterer)

agent, _ := chainforge.NewAgent(chainforge.WithProvider(mp), chainforge.WithModel(model))
Via ProviderBuilder:
p := chainforge.NewProviderBuilder(provider).
    WithMetrics(prometheus.DefaultRegisterer).
    Build()

Metrics emitted

Metric | Type | Labels | Notes
chainforge_provider_requests_total | Counter | provider, status (ok|error) | incremented after each call completes
chainforge_provider_request_duration_seconds | Histogram | provider | latency in seconds; for streams, covers open → channel close
chainforge_provider_tokens_total | Counter | provider, token_type (input|output) | from Usage in the response or the Done stream event
For ChatStream, all three metrics are recorded after the channel is fully drained, not at stream-open time.

Per-tool metrics

Wrap individual tools to record per-tool latency and call counts:
import "github.com/lioarce01/chainforge/pkg/middleware/metrics"

reg := prometheus.NewRegistry() // or prometheus.DefaultRegisterer

// Share metric vectors across all tools in one agent.
toolReg, err := metrics.NewToolRegistry(reg)
if err != nil { ... }

agent, _ := chainforge.NewAgent(
    chainforge.WithProvider(p),
    chainforge.WithModel("claude-sonnet-4-6"),
    chainforge.WithTools(
        toolReg.Wrap(myTool),
        toolReg.Wrap(anotherTool),
    ),
)
Metric | Type | Labels
chainforge_tool_calls_total | Counter | tool, status (ok|error)
chainforge_tool_duration_seconds | Histogram | tool
Useful PromQL queries:
# Error rate per tool
rate(chainforge_tool_calls_total{status="error"}[5m])
  / rate(chainforge_tool_calls_total[5m])

# p99 tool latency
histogram_quantile(0.99, rate(chainforge_tool_duration_seconds_bucket[5m]))

Server logs

The HTTP server emits JSON logs for every request:
{
  "time": "2026-03-17T12:00:00Z",
  "level": "INFO",
  "msg": "http request",
  "method": "POST",
  "path": "/v1/chat",
  "status": 200,
  "duration": "1.234s",
  "remote_addr": "10.0.0.1:54321"
}

Local tracing with Jaeger

# Start the full stack
cd deploy/
ANTHROPIC_API_KEY=sk-ant-... docker-compose up -d

# Enable OTel in config.yaml
echo "otel_enabled: true" >> config.yaml

# View traces at http://localhost:16686