Spans and Traces

OpenTelemetry spans for distributed tracing and observability in the Inkeep Agent Framework

Overview

The Inkeep Agent Framework uses OpenTelemetry for distributed tracing and observability. Spans provide detailed visibility into the execution flow of agents, context resolution, tool execution, and other framework operations.

Getting Started with Spans

1. Import Required Dependencies

import { SpanStatusCode, type Span } from "@opentelemetry/api";
import { getTracer, setSpanWithError } from "../tracer";

2. Get the Tracer

// Use the centralized tracer utility
const tracer = getTracer("your-service-name");

Creating and Using Spans

return tracer.startActiveSpan(
  "context.resolve",
  {
    attributes: {
      "context.config_id": contextConfig.id,
      "context.trigger_event": options.triggerEvent,
    },
  },
  async (span: Span) => {
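    try {
      // Your operation logic here
      return result;
    } catch (error) {
      // Use setSpanWithError for consistent error handling
      setSpanWithError(span, error);
      throw error;
    } finally {
      // The OpenTelemetry API's startActiveSpan does not end the span for you,
      // so end it on both the success and the error path
      span.end();
    }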
  }
);

Setting Span Attributes

Basic Attributes

span.setAttributes({
  "user.id": userId,
  "request.method": "POST",
});

Adding Events to Spans

Recording Important Milestones

// Add events for significant operations
span.addEvent("context.fetch_started", {
  definitionId: definition.id,
  url: definition.fetchConfig.url,
});

Error Events

span.addEvent("error.validation_failed", {
  definitionId: definition.id,
  error_type: "json_schema_validation",
  error_details: errorMessage,
});

Error Handling and Status

Using setSpanWithError Utility

The framework provides a convenient setSpanWithError utility function that handles error recording and status setting:

try {
  // Your operation
} catch (error) {
  // Use the setSpanWithError utility for consistent error handling
  setSpanWithError(span, error);
  throw error;
}
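
For readers curious what a helper like this usually does, here is a minimal sketch. It is not the framework's actual implementation. It assumes the helper records the exception and sets the span status to ERROR, which is the common OpenTelemetry pattern. `SpanLike`, `STATUS_ERROR`, and `setSpanWithErrorSketch` are hypothetical names, and the local interface mirrors only the two `Span` methods used so the snippet runs without dependencies.

```typescript
// Minimal structural stand-in for the two OpenTelemetry Span methods used here.
// Hypothetical type: real code would import `Span` from "@opentelemetry/api".
interface SpanLike {
  recordException(exception: Error | string): void;
  setStatus(status: { code: number; message?: string }): void;
}

// Numeric value of SpanStatusCode.ERROR in @opentelemetry/api.
const STATUS_ERROR = 2;

// Sketch of an error helper: normalize the thrown value, record it,
// and mark the span as failed. The real utility may do more or less.
function setSpanWithErrorSketch(span: SpanLike, error: unknown): void {
  const err = error instanceof Error ? error : new Error(String(error));
  span.recordException(err);
  span.setStatus({ code: STATUS_ERROR, message: err.message });
}

// Usage with an in-memory fake span that captures the calls made on it.
const recordedCalls: string[] = [];
const fakeSpan: SpanLike = {
  recordException: (e) => {
    recordedCalls.push(`exception:${e instanceof Error ? e.message : e}`);
  },
  setStatus: (s) => {
    recordedCalls.push(`status:${s.code}:${s.message ?? ""}`);
  },
};
setSpanWithErrorSketch(fakeSpan, "boom");
```

Normalizing non-Error values matters because a `catch` clause can receive anything, including strings.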

Cache Telemetry Attributes

LLM generation spans carry cache telemetry emitted by the gateway cost middleware. These attributes are defined in packages/agents-core/src/constants/otel-attributes.ts and consumed by the Manage UI trace timeline and cost dashboard.

Outcome attributes (OTel-GenAI semconv v1.41.1 aligned)

| Attribute | Type | Description |
| --- | --- | --- |
| gen_ai.usage.cache_read.input_tokens | number | Tokens served from the provider cache on this call |
| gen_ai.usage.cache_creation.input_tokens | number | Tokens written to the provider cache on this call |

gen_ai.usage.input_tokens (the existing total) is cache-inclusive and unchanged. no_cache_input_tokens is derivable as input_tokens − cache_read − cache_creation and is not emitted.
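
The derivation above can be sketched as a small consumer-side helper. The `CacheUsage` shape and the function name `noCacheInputTokens` are hypothetical; only the arithmetic comes from the text.

```typescript
// Hypothetical consumer-side view of the three token attributes.
interface CacheUsage {
  inputTokens: number; // gen_ai.usage.input_tokens (cache-inclusive total)
  cacheReadTokens: number; // gen_ai.usage.cache_read.input_tokens
  cacheCreationTokens: number; // gen_ai.usage.cache_creation.input_tokens
}

// Tokens that were neither read from nor written to the provider cache.
function noCacheInputTokens(usage: CacheUsage): number {
  return usage.inputTokens - usage.cacheReadTokens - usage.cacheCreationTokens;
}

// 1200 total - 900 read from cache - 100 written to cache = 200 uncached
const uncachedTokens = noCacheInputTokens({
  inputTokens: 1200,
  cacheReadTokens: 900,
  cacheCreationTokens: 100,
});
```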

Intent attributes (Inkeep cache.intent.* namespace)

| Attribute | Type | Description |
| --- | --- | --- |
| cache.intent.marker_count | int (0–4) | Number of cache markers attached to the request prefix |
| cache.intent.prefix_signature | string | 10-character SHA-256 hex prefix of the cacheable prefix (system prompts and tool definitions), used to distinguish MISS-expected (prefix changed) from MISS-regression (stable prefix, no read) |
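
As an illustration of how a 10-character signature of this kind could be computed, the sketch below hashes a serialized prefix with Node's built-in `crypto` module. How the framework actually serializes system prompts and tool definitions is not stated here, so the JSON serialization and the function name `prefixSignatureSketch` are assumptions.

```typescript
import { createHash } from "node:crypto";

// Assumption: the cacheable prefix is serialized as JSON before hashing.
// The framework's real serialization may differ.
function prefixSignatureSketch(
  systemPrompts: string[],
  toolDefinitions: unknown[],
): string {
  const serialized = JSON.stringify({ systemPrompts, toolDefinitions });
  // Keep the first 10 hex characters of the SHA-256 digest.
  return createHash("sha256").update(serialized).digest("hex").slice(0, 10);
}

const tools = [{ name: "search" }];
const firstTurnSignature = prefixSignatureSketch(["You are a helpful agent."], tools);
const sameTurnSignature = prefixSignatureSketch(["You are a helpful agent."], tools);
const changedSignature = prefixSignatureSketch(["You are a terse agent."], tools);
```

An identical prefix yields an identical signature, and any change to the prompts or tools yields a different one, which is what lets a consumer tell an expected miss from a regression.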

Cache state (consumer-derived, not a span attribute)

cache_state is not emitted as a span attribute — it is computed by consumers (Manage UI, the pnpm cache-debug CLI) from the four attributes above plus prior-turn signature comparison:

| State | Derivation |
| --- | --- |
| HIT | marker_count > 0 and cache_read > 0 |
| MISS-expected | marker_count > 0, cache_read = 0, prefix signature differs from previous call |
| MISS-regression | marker_count > 0, cache_read = 0, prefix signature identical to previous call |
| NOT-ATTEMPTED | marker_count = 0 |
| NOT-SUPPORTED-BY-PROVIDER | Provider metadata absent or provider does not support marker-based caching |
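
The table can be read as a small decision function. The sketch below is one possible consumer-side derivation, not the Manage UI's or the CLI's actual code. The `CacheSpanInfo` shape, the `providerSupportsCaching` flag, the order in which the rules are checked, and treating a missing previous signature as MISS-expected are all assumptions.

```typescript
type CacheState =
  | "HIT"
  | "MISS-expected"
  | "MISS-regression"
  | "NOT-ATTEMPTED"
  | "NOT-SUPPORTED-BY-PROVIDER";

// Hypothetical consumer-side view of one LLM generation span.
interface CacheSpanInfo {
  providerSupportsCaching: boolean; // false when provider metadata is absent or caching is unsupported
  markerCount: number; // cache.intent.marker_count
  cacheReadTokens: number; // gen_ai.usage.cache_read.input_tokens
  prefixSignature: string; // cache.intent.prefix_signature
}

// Assumed precedence: provider support first, then marker count, then reads.
function deriveCacheState(
  current: CacheSpanInfo,
  previousSignature?: string,
): CacheState {
  if (!current.providerSupportsCaching) return "NOT-SUPPORTED-BY-PROVIDER";
  if (current.markerCount === 0) return "NOT-ATTEMPTED";
  if (current.cacheReadTokens > 0) return "HIT";
  // No cache read: the prefix signature decides whether the miss was expected.
  return current.prefixSignature === previousSignature
    ? "MISS-regression"
    : "MISS-expected";
}

const stableMiss = deriveCacheState(
  { providerSupportsCaching: true, markerCount: 2, cacheReadTokens: 0, prefixSignature: "a1b2c3d4e5" },
  "a1b2c3d4e5",
);
```

Here `stableMiss` evaluates to `"MISS-regression"`, because markers were attached, nothing was read, and the prefix did not change.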

When adding new span attributes, treat USAGE_COST_AGGREGATION_ORDER in signoz-stats.ts as append-only: add new entries at the end and never reorder existing ones, because the consumer side uses positional indexing.
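
To see why reordering breaks a positional consumer, consider the sketch below. The entries in these arrays are hypothetical stand-ins: the real contents of USAGE_COST_AGGREGATION_ORDER live in signoz-stats.ts and are not reproduced here. `buildRow` and `INPUT_TOKENS_INDEX` are likewise illustrative names.

```typescript
// Hypothetical stand-in for the aggregation order.
const ORDER_V1: readonly string[] = ["input_tokens", "output_tokens"];

// Safe change: the new entry goes at the end, so existing positions stay valid.
const ORDER_APPENDED: readonly string[] = [...ORDER_V1, "cache_read_tokens"];

// Unsafe change: inserting at the front shifts every existing position.
const ORDER_REORDERED: readonly string[] = ["cache_read_tokens", ...ORDER_V1];

// Producer side: emit aggregated values in the declared order.
function buildRow(order: readonly string[], totals: Record<string, number>): number[] {
  return order.map((key) => totals[key] ?? 0);
}

const totals = { input_tokens: 1200, output_tokens: 300, cache_read_tokens: 900 };

// Consumer side: reads by position, written against ORDER_V1.
const INPUT_TOKENS_INDEX = 0;

const valueAfterAppend = buildRow(ORDER_APPENDED, totals)[INPUT_TOKENS_INDEX]; // still input_tokens
const valueAfterReorder = buildRow(ORDER_REORDERED, totals)[INPUT_TOKENS_INDEX]; // now cache_read_tokens
```

After the append, the consumer still reads 1200 input tokens at index 0. After the reorder, it silently reads 900, the cache-read count, with no error to flag the mistake.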

Best Practices

1. Consistent Naming Convention

The span naming convention follows a hierarchical structure that mirrors your code organization.

// Format: 'class.function'
// Use descriptive span names that follow a hierarchical structure

// Agent operations
"agent.generate";
"agent.tool_execute";
"agent.transfer";

// Context operations
"context.resolve";
"context.fetch";

Naming Rules

  1. Class First: Start with the class/module name (e.g., agent, context, tool)
  2. Function Second: Follow with the specific function/method (e.g., generate, resolve)
  3. Use Underscores: For multi-word functions, use underscores (e.g., tool_execute, cache_lookup)
  4. Consistent Casing: Use lowercase with underscores for consistency
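
The naming rules above can be checked mechanically. The following sketch is a hypothetical lint-style helper, not part of the framework. It assumes span names have exactly two segments, `class.function`, each in lowercase with underscores between words.

```typescript
// One segment: lowercase words separated by single underscores, e.g. "tool_execute".
const SEGMENT = "[a-z][a-z0-9]*(?:_[a-z0-9]+)*";

// Full span name: "<class>.<function>".
const SPAN_NAME_PATTERN = new RegExp(`^${SEGMENT}\\.${SEGMENT}$`);

// Hypothetical helper: returns true when a name follows the convention.
function isValidSpanName(name: string): boolean {
  return SPAN_NAME_PATTERN.test(name);
}

const spanNameChecks = {
  "agent.tool_execute": isValidSpanName("agent.tool_execute"), // follows the rules
  "context.resolve": isValidSpanName("context.resolve"), // follows the rules
  "Agent.generate": isValidSpanName("Agent.generate"), // uppercase class
  "agent.toolExecute": isValidSpanName("agent.toolExecute"), // camelCase function
  "agent": isValidSpanName("agent"), // missing function segment
};
```

A check like this could run in a unit test over the span names used in a module, so naming drift is caught before it reaches the trace backend.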

Configuration and Setup

Environment Variables

# OpenTelemetry configuration
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4317

Instrumentation Setup

The framework automatically sets up OpenTelemetry instrumentation in src/instrumentation.ts.

Examples in the Codebase

Agent Operations

See src/agents/Agent.ts for span usage in agent generation and tool execution.

Example from Agent.ts:

// Class: Agent
// Function: generate
return tracer.startActiveSpan(
  "agent.generate",
  {
    attributes: {
      "agent.id": this.id,
      "agent.name": this.name,
    },
  },
  async (span: Span) => {
    // ... implementation
  }
);

Summary

Spans provide detailed observability into your Inkeep Agent Framework operations. To get the most from them, follow these patterns:

  1. Use getTracer() for consistent tracing
  2. Use consistent span names in the class.function format
  3. Set meaningful attributes for searchability
  4. Handle errors with setSpanWithError so failures are recorded consistently
  5. Use startActiveSpan so the span is active in context for the duration of the callback, and end it when the work completes

Following these patterns gives you comprehensive visibility into your agent operations, making debugging and performance optimization easier.