Using SigNoz for Observability

Complete guide to using SigNoz for observability, monitoring, and tracing in the Inkeep Agent Framework

SigNoz is a full-stack observability platform that provides distributed tracing so that you can track requests across multiple agents and services.

Quick Start

  1. Start the Complete Stack:
# From the root directory of the agent framework
docker compose up -d

This single command starts the entire observability and integration stack:

Agent Framework Components:

  • OTEL Collector (ports 14317-14318)
  • Jaeger UI (port 16686)

SigNoz Components:

  • SigNoz UI (port 3080)
  • SigNoz OTEL Collector (ports 4317-4318)
  • ClickHouse Database
  • Zookeeper
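
To confirm that the containers are actually listening after `docker compose up -d`, a small TCP probe can help. The following is a minimal sketch, not part of the framework: the port numbers come from the list above, while the host (`localhost`), the helper names (`isPortOpen`, `checkStack`), and the assumption that every listed port is published on the host are illustrative.

```typescript
import * as net from "node:net";

// Assumption: the stack runs locally via Docker Compose.
const HOST = "localhost";

interface StackPort {
  name: string;
  port: number;
}

// Ports as listed in this guide. Whether each one is published on the
// host depends on the compose file, which is not shown here.
const STACK_PORTS: StackPort[] = [
  { name: "OTEL Collector", port: 14317 },
  { name: "OTEL Collector", port: 14318 },
  { name: "Jaeger UI", port: 16686 },
  { name: "SigNoz UI", port: 3080 },
  { name: "SigNoz OTEL Collector", port: 4317 },
  { name: "SigNoz OTEL Collector", port: 4318 },
];

// Resolves to true if a TCP connection succeeds within timeoutMs.
function isPortOpen(host: string, port: number, timeoutMs = 1000): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    const finish = (open: boolean) => {
      socket.destroy();
      resolve(open);
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true));
    socket.once("timeout", () => finish(false));
    socket.once("error", () => finish(false));
  });
}

// Probes every port in turn and reports which ones accepted a connection.
async function checkStack(): Promise<Map<string, boolean>> {
  const results = new Map<string, boolean>();
  for (const { name, port } of STACK_PORTS) {
    results.set(`${name} :${port}`, await isPortOpen(HOST, port));
  }
  return results;
}
```

Calling `checkStack().then(console.log)` prints one entry per port. An open port only shows that something is listening, not that the service behind it is healthy.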

Architecture

The observability setup uses a dual-routing OTEL Collector:

Application → OTEL Collector → Jaeger → Jaeger UI (http://localhost:16686)
                             → SigNoz → SigNoz UI (http://localhost:3080)

Application Configuration

Configure your agent framework to send traces to the OTEL Collector:

// Recommended: Use OTEL Collector for dual routing to Jaeger + SigNoz
process.env.OTEL_EXPORTER_OTLP_ENDPOINT = "http://localhost:14318/v1/traces";
process.env.OTEL_SERVICE_NAME = "inkeep-agents";

// Alternative endpoints:
// OTLP HTTP: http://localhost:14318/v1/traces (default)
// OTLP gRPC: http://localhost:14317 (gRPC endpoints take no /v1/traces path)
// Direct to Jaeger: http://localhost:24318 (Jaeger only)
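
OpenTelemetry tooling differs in whether it expects a base URL or a full traces URL, and this guide does not show how the framework reads the variable. The sketch below illustrates one defensive approach: normalize whatever is configured into a single OTLP HTTP traces URL. The helpers `toTracesUrl` and `resolveTracesUrl` are hypothetical and do not belong to the framework or the OpenTelemetry SDK. The default is the collector HTTP port from this guide.

```typescript
const TRACES_PATH = "/v1/traces";

// Collector HTTP port from this guide, used when nothing is configured.
const DEFAULT_ENDPOINT = "http://localhost:14318";

// Appends /v1/traces unless the endpoint already ends with it.
// Simplification: query strings and fragments are not handled.
function toTracesUrl(endpoint: string): string {
  const trimmed = endpoint.replace(/\/+$/, ""); // drop trailing slashes
  return trimmed.endsWith(TRACES_PATH) ? trimmed : trimmed + TRACES_PATH;
}

// Takes the environment as a parameter so it can be exercised without
// touching process.env.
function resolveTracesUrl(env: Record<string, string | undefined>): string {
  return toTracesUrl(env["OTEL_EXPORTER_OTLP_ENDPOINT"] ?? DEFAULT_ENDPOINT);
}
```

In an application, `resolveTracesUrl(process.env)` yields the same URL whether the variable holds `http://localhost:14318` or `http://localhost:14318/v1/traces`. This applies to OTLP over HTTP only.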

Accessing the UI

Once the stack is running, access SigNoz at: http://localhost:3080

You can also access the Jaeger UI at http://localhost:16686.

Stopping Services

# Stop the entire stack
docker compose down

Using SigNoz UI

The Traces page provides detailed request tracing.

Viewing Traces

The SigNoz traces interface provides comprehensive visibility into your agent operations:

[Screenshot: SigNoz Traces Explorer showing inkeep-chat service traces with timestamp, service name, operation name, duration, HTTP method, and response status code columns]

The traces explorer shows:

  • Timestamp: When each span occurred
  • Service Name: The service that generated the span (e.g., inkeep-chat)
  • Operation Name: Specific operations like ai.generateObject, tls.connect, ai.toolCall
  • Duration: How long each operation took (in milliseconds)
  • HTTP Method: For HTTP operations, shows the method (POST, GET, etc.)
  • Response Status Code: HTTP status codes (200, 404, etc.)

Key features of the traces view:

  • Filtering Options: Use the left sidebar to filter by duration, deployment environment, service name, and more

  • Time Range Selection: Choose from preset ranges or custom time periods

  • Multiple Views: Switch between List View, Traces, Time Series, and Table View

  • Real-time Updates: Traces refresh automatically to show new data

  • Trace List: Browse all traces with filtering options

  • Trace Details: Drill down into individual traces

  • Span Timeline: See the execution flow across agents

Filtering Traces

# Filter by service
service_name = "inkeep-chat"

# Filter by operation
operation = "agent.generate"

# Filter by status
status = "error"

# Filter by duration
duration > 1000ms

# Filter by custom attributes
agent.id = "customer-support-agent"
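
To make the meaning of these filters concrete, here is a small in-memory sketch that applies the same conditions to span-like records. It is not SigNoz's query engine or API. The `SpanRecord` and `SpanFilter` types, the `matches` function, and the sample data (including the `billing-agent` id) are hypothetical. All conditions in a filter must hold for a span to match.

```typescript
interface SpanRecord {
  serviceName: string;
  operation: string;
  status: "ok" | "error";
  durationMs: number;
  attributes: Record<string, string>;
}

interface SpanFilter {
  serviceName?: string;
  operation?: string;
  status?: "ok" | "error";
  minDurationMs?: number; // exclusive bound, mirrors "duration > 1000ms"
  attributes?: Record<string, string>;
}

// Returns true only if the span satisfies every condition set on the filter.
function matches(span: SpanRecord, filter: SpanFilter): boolean {
  if (filter.serviceName !== undefined && span.serviceName !== filter.serviceName) return false;
  if (filter.operation !== undefined && span.operation !== filter.operation) return false;
  if (filter.status !== undefined && span.status !== filter.status) return false;
  if (filter.minDurationMs !== undefined && !(span.durationMs > filter.minDurationMs)) return false;
  const wanted = filter.attributes ?? {};
  for (const key of Object.keys(wanted)) {
    if (span.attributes[key] !== wanted[key]) return false;
  }
  return true;
}

// Hypothetical sample data for illustration only.
const sampleSpans: SpanRecord[] = [
  { serviceName: "inkeep-chat", operation: "agent.generate", status: "ok", durationMs: 1500, attributes: { "agent.id": "customer-support-agent" } },
  { serviceName: "inkeep-chat", operation: "ai.toolCall", status: "error", durationMs: 200, attributes: { "agent.id": "customer-support-agent" } },
  { serviceName: "inkeep-chat", operation: "agent.generate", status: "ok", durationMs: 300, attributes: { "agent.id": "billing-agent" } },
];

// Slow spans (over 1000 ms) from one agent.
const slowSupportSpans = sampleSpans.filter((span) =>
  matches(span, { minDurationMs: 1000, attributes: { "agent.id": "customer-support-agent" } })
);

// All failed spans for the service.
const failedChatSpans = sampleSpans.filter((span) =>
  matches(span, { serviceName: "inkeep-chat", status: "error" })
);
```

Combining conditions this way narrows the result set, much as stacking several filters in the sidebar does.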

Analyzing Individual Traces

When you click on a specific trace from the list, you'll see the detailed trace view with a flamegraph visualization:

[Screenshot: SigNoz Trace Details showing flamegraph visualization with span hierarchy, timing information, and detailed span attributes]

Flamegraph Visualization:

  • Horizontal Bars: Each bar represents a span (operation) in your trace
  • Bar Width: Proportional to the duration of the operation
  • Color Coding:
    • Blue bars: Successful operations
    • Red bars: Operations with errors

Key Information Displayed:

  • Total Spans: Total number of operations in this trace (e.g., 122)
  • Error Spans: Number of spans that encountered errors (e.g., 19)
  • Trace Duration: Total time for the entire trace (e.g., 5.2 mins)
  • Timestamp: When the trace occurred
  • Service: The primary service (e.g., inkeep-chat)
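
The summary figures above can be derived from the spans themselves. The sketch below shows one plausible way to compute them, assuming trace duration is measured from the earliest span start to the latest span end. The `SpanTiming` shape and `summarizeTrace` function are hypothetical and not taken from SigNoz.

```typescript
interface SpanTiming {
  name: string;
  startMs: number; // start time in ms, relative to any fixed origin
  durationMs: number;
  hasError: boolean;
}

interface TraceSummary {
  totalSpans: number;
  errorSpans: number;
  traceDurationMs: number;
}

// Counts spans and errors, and measures the time from the earliest
// span start to the latest span end.
function summarizeTrace(spans: SpanTiming[]): TraceSummary {
  if (spans.length === 0) {
    return { totalSpans: 0, errorSpans: 0, traceDurationMs: 0 };
  }
  let earliestStart = Infinity;
  let latestEnd = -Infinity;
  let errorSpans = 0;
  for (const span of spans) {
    earliestStart = Math.min(earliestStart, span.startMs);
    latestEnd = Math.max(latestEnd, span.startMs + span.durationMs);
    if (span.hasError) errorSpans++;
  }
  return {
    totalSpans: spans.length,
    errorSpans,
    traceDurationMs: latestEnd - earliestStart,
  };
}

// Hypothetical three-span trace.
const summaryDemo = summarizeTrace([
  { name: "agent.generate", startMs: 0, durationMs: 900, hasError: false },
  { name: "ai.toolCall", startMs: 100, durationMs: 300, hasError: true },
  { name: "tls.connect", startMs: 450, durationMs: 50, hasError: false },
]);
```

Here `summaryDemo` reports 3 total spans, 1 error span, and a trace duration of 900 ms, because the root span covers the whole interval.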

Span Details Panel (Right Side):

  • Span Name & ID: Operation name and unique identifier
  • Timing: Start time and duration
  • Service & Kind: Which service and span type (Server, Client, etc.)
  • Status: Success/error status code
  • Attributes, Events & Links: Additional span metadata

How to Use This View:

  1. Identify Bottlenecks: Look for the widest bars in the flamegraph; these are the longest-running operations
  2. Find Errors: Red bars indicate operations that failed; click one to see its error details
  3. Understand Flow: Follow the vertical hierarchy to see how operations call each other
  4. Analyze Performance: Use the timeline to see which operations run in parallel vs. sequentially
  5. Drill Down: Click on any span to see detailed attributes, events, and error information
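
The same checks can be expressed over exported span data. The following sketch mirrors steps 1, 2, and 4: it ranks spans by duration, collects failed spans, and tests whether two spans overlapped in time. The `TimelineSpan` shape, the function names, and the sample values are hypothetical. The code also assumes you already have the span data in hand, since how to export it from SigNoz is not covered here.

```typescript
interface TimelineSpan {
  name: string;
  startMs: number;
  durationMs: number;
  hasError: boolean;
}

// Step 1: the n longest spans, i.e. the widest bars in the flamegraph.
function slowestSpans(spans: TimelineSpan[], n: number): TimelineSpan[] {
  return spans.slice().sort((a, b) => b.durationMs - a.durationMs).slice(0, n);
}

// Step 2: spans that would be drawn in red.
function failedSpans(spans: TimelineSpan[]): TimelineSpan[] {
  return spans.filter((span) => span.hasError);
}

// Step 4: two spans overlap in time if each starts before the other ends.
// A parent also overlaps its children, so this is most useful for siblings.
function ranInParallel(a: TimelineSpan, b: TimelineSpan): boolean {
  return a.startMs < b.startMs + b.durationMs && b.startMs < a.startMs + a.durationMs;
}

// Hypothetical trace: one root span with three children.
const rootSpan: TimelineSpan = { name: "agent.generate", startMs: 0, durationMs: 900, hasError: false };
const toolCall: TimelineSpan = { name: "ai.toolCall", startMs: 100, durationMs: 300, hasError: true };
const generateObject: TimelineSpan = { name: "ai.generateObject", startMs: 120, durationMs: 250, hasError: false };
const tlsConnect: TimelineSpan = { name: "tls.connect", startMs: 450, durationMs: 50, hasError: false };
const timeline: TimelineSpan[] = [rootSpan, toolCall, generateObject, tlsConnect];

const topTwo = slowestSpans(timeline, 2).map((span) => span.name);
const failedNames = failedSpans(timeline).map((span) => span.name);
const siblingsOverlap = ranInParallel(toolCall, generateObject);
const sequentialPair = ranInParallel(toolCall, tlsConnect);
```

In this sample, `ai.toolCall` and `ai.generateObject` overlap and so ran in parallel, while `tls.connect` started only after `ai.toolCall` had finished.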