Using Langfuse for LLM Observability

Complete guide to using Langfuse for LLM observability, tracing, and analytics in the Inkeep Agent Framework

Langfuse is an open-source LLM engineering platform that provides specialized observability for AI applications, including token usage tracking, model performance analytics, and detailed LLM interaction tracing.

Quick Start

1. Setup Langfuse Account

First, create a Langfuse account and get your API keys:

  1. Sign up at Langfuse Cloud (https://cloud.langfuse.com)
  2. Create a new project in your Langfuse dashboard
  3. Get your API keys from the project settings:
    • Public Key: pk-lf-xxxxxxxxxx
    • Secret Key: sk-lf-xxxxxxxxxx

2. Configure Langfuse

To integrate Langfuse with your Inkeep Agent Framework instrumentation, you need to modify your instrumentation file to include the Langfuse span processor.

You'll do this in two steps: set the Langfuse environment variables, then replace the default setup with a custom NodeSDK configuration.

Set your environment variables:

LANGFUSE_PUBLIC_KEY=pk-lf-xxxxxxxxxx
LANGFUSE_SECRET_KEY=sk-lf-xxxxxxxxxx
LANGFUSE_BASE_URL=https://us.cloud.langfuse.com

Update your instrumentation file:

import { 
  defaultSpanProcessors, 
  defaultContextManager, 
  defaultResource, 
  defaultTextMapPropagator, 
  defaultInstrumentations 
} from "@inkeep/agents-run-api/instrumentation";
import { NodeSDK } from "@opentelemetry/sdk-node";
import { LangfuseSpanProcessor } from "@langfuse/otel";

export const defaultSDK = new NodeSDK({
  resource: defaultResource,
  contextManager: defaultContextManager,
  textMapPropagator: defaultTextMapPropagator,
  spanProcessors: [...defaultSpanProcessors, new LangfuseSpanProcessor()],
  instrumentations: defaultInstrumentations,
});

defaultSDK.start();

What This Configuration Does

  • Preserves all default instrumentation: Uses the same resource, context manager, propagator, and instrumentations as the default setup
  • Adds Langfuse span processor: Extends the default span processors with Langfuse's processor for specialized LLM observability
  • Maintains compatibility: Your existing traces will continue to work while adding Langfuse-specific features
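
If you want Langfuse to be active only when credentials are configured, one option is to attach its span processor conditionally. This is a minimal sketch rather than part of the default setup; it uses only the imports shown above and checks the LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY environment variables:

import {
  defaultSpanProcessors,
  defaultContextManager,
  defaultResource,
  defaultTextMapPropagator,
  defaultInstrumentations
} from "@inkeep/agents-run-api/instrumentation";
import { NodeSDK } from "@opentelemetry/sdk-node";
import { LangfuseSpanProcessor } from "@langfuse/otel";

// Only attach the Langfuse processor when credentials are present,
// so environments without keys keep the default tracing behavior.
const langfuseEnabled =
  !!process.env.LANGFUSE_PUBLIC_KEY && !!process.env.LANGFUSE_SECRET_KEY;

export const defaultSDK = new NodeSDK({
  resource: defaultResource,
  contextManager: defaultContextManager,
  textMapPropagator: defaultTextMapPropagator,
  spanProcessors: langfuseEnabled
    ? [...defaultSpanProcessors, new LangfuseSpanProcessor()]
    : [...defaultSpanProcessors],
  instrumentations: defaultInstrumentations,
});

defaultSDK.start();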

Dataset Setup and Execution

Use the Inkeep Agent Cookbook repository, which provides ready-to-use scripts for creating and running Langfuse dataset evaluations programmatically.

1. Clone the Agent Cookbook Repository

git clone https://github.com/inkeep/agent-cookbook.git
cd agent-cookbook/evals/langfuse-dataset-example

Set up environment variables in a .env file:

# Langfuse configuration (required for both scripts)
LANGFUSE_PUBLIC_KEY=your_langfuse_public_key
LANGFUSE_SECRET_KEY=your_langfuse_secret_key
LANGFUSE_BASE_URL=https://cloud.langfuse.com

# Chat API configuration (for dataset runner)
INKEEP_AGENTS_RUN_API_KEY=your_api_key
INKEEP_AGENTS_RUN_API_URL=your_chat_api_base_url

# Execution context (for dataset runner)
INKEEP_TENANT_ID=your_tenant_id
INKEEP_PROJECT_ID=your_project_id
INKEEP_GRAPH_ID=your_graph_id

2. Initialize Dataset with Sample Data

Run the basic Langfuse example to initialize a dataset with sample user messages:

pnpm run langfuse-init-example

This script will:

  • Connect to your Langfuse project
  • Create a new dataset called "inkeep-weather-example-dataset" with sample dataset items
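
If you want to see roughly what this involves, the sketch below creates the same dataset and a couple of items with the Langfuse TypeScript SDK. It is an illustration of the flow, not the cookbook script itself, and the sample messages are made up:

import { Langfuse } from "langfuse";

const langfuse = new Langfuse({
  publicKey: process.env.LANGFUSE_PUBLIC_KEY,
  secretKey: process.env.LANGFUSE_SECRET_KEY,
  baseUrl: process.env.LANGFUSE_BASE_URL,
});

async function initDataset() {
  // Create the dataset used by the cookbook example
  await langfuse.createDataset({ name: "inkeep-weather-example-dataset" });

  // Add a few sample user messages as dataset items (illustrative inputs only)
  const sampleMessages = [
    "What's the weather in San Francisco tomorrow?",
    "Will it rain in London this weekend?",
  ];
  for (const message of sampleMessages) {
    await langfuse.createDatasetItem({
      datasetName: "inkeep-weather-example-dataset",
      input: message,
    });
  }

  // Flush pending events before the script exits
  await langfuse.flushAsync();
}

initDataset();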

3. Run Dataset Items to Generate Traces

Run dataset items to generate traces that can be evaluated:

pnpm run langfuse-run-dataset

This script will:

  • Read items from your Langfuse dataset
  • Execute each item against your weather graph
  • Generate the data needed for evaluation
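
Conceptually, the runner fetches the dataset, executes each item against the graph, and links the resulting trace to a named dataset run so evaluators can score it later. The sketch below shows that flow with the Langfuse TypeScript SDK; the runChatAgainstGraph helper, its endpoint path, and its payload shape are placeholders for however you call your graph, not the cookbook's actual implementation. In practice the traces may come from the instrumented Run API rather than a manual langfuse.trace call; the manual trace here just keeps the sketch self-contained:

import { Langfuse } from "langfuse";

const langfuse = new Langfuse({
  publicKey: process.env.LANGFUSE_PUBLIC_KEY,
  secretKey: process.env.LANGFUSE_SECRET_KEY,
  baseUrl: process.env.LANGFUSE_BASE_URL,
});

// Placeholder chat call: the endpoint path and payload shape are assumptions,
// not the real Run API contract — swap in your own client code.
async function runChatAgainstGraph(message: string): Promise<string> {
  const response = await fetch(`${process.env.INKEEP_AGENTS_RUN_API_URL}/chat`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.INKEEP_AGENTS_RUN_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      tenantId: process.env.INKEEP_TENANT_ID,
      projectId: process.env.INKEEP_PROJECT_ID,
      graphId: process.env.INKEEP_GRAPH_ID,
      message,
    }),
  });
  return await response.text();
}

async function runDataset() {
  const dataset = await langfuse.getDataset("inkeep-weather-example-dataset");
  const runName = `weather-graph-run-${new Date().toISOString()}`;

  for (const item of dataset.items) {
    const message = item.input as string;

    // Record a trace for this item, run the graph, and attach the output
    const trace = langfuse.trace({ name: "weather-dataset-item", input: message });
    const output = await runChatAgainstGraph(message);
    trace.update({ output });

    // Link the trace to the dataset item under a named run
    await item.link(trace, runName);
  }

  await langfuse.flushAsync();
}

runDataset();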

Running LLM Evaluations in Langfuse Dashboard

Langfuse provides a powerful web interface for running LLM evaluations without writing code. You can create datasets, set up evaluators, and run evaluations directly in the dashboard.

Accessing the Evaluation Features

  1. Log into your Langfuse dashboard: https://cloud.langfuse.com
  2. Navigate to your project where your agent traces are being collected
  3. Click "Evaluations" in the left sidebar
  4. Click "Set up evaluator" to begin creating evaluations

Setting Up LLM-as-a-Judge Evaluators

Set Up Default Evaluation Model

Before creating evaluators, you need to configure a default LLM connection for evaluations:

(Screenshot: Langfuse LLM Connection setup showing OpenAI provider configuration with API key field and advanced settings)

Setting up the LLM Connection:

  1. Navigate to "Evaluator Library" in your Langfuse dashboard
  2. Click "Set up" next to "Default Evaluation Model"
  3. Configure the LLM connection:
    • LLM Adapter: Select your preferred provider
    • Provider Name: Give it a descriptive name (e.g., "openai")
    • API Key: Enter your OpenAI API key (stored encrypted)
    • Advanced Settings: Configure the base URL and model parameters if needed
  4. Click "Create connection" to save
  1. Go to "Evaluations""Running Evaluators"
  2. Click "Set up evaluator" button
  3. You'll see two main steps: "1. Select Evaluator" and "2. Run Evaluator"

Choose Your Evaluator Type

You have two main options:

Option A: Langfuse Managed Evaluators

Langfuse provides a comprehensive catalog of pre-built evaluators.

To use a managed evaluator:

  1. Browse the evaluator list and find one that matches your needs
  2. Click on the evaluator to see its description and criteria
  3. Click "Use Selected Evaluator" button

Customizing Managed Evaluators for Dataset Runs

Once you've selected a managed evaluator, you can edit it to target your dataset runs. This is particularly useful for evaluating agent performance against known test cases.

Example: Customizing a Helpfulness Evaluator

  1. Select the "Helpfulness" evaluator from the managed list
  2. Under "Target", select dataset runs
  3. Configure variable mapping:
    • {{input}} → Object: Trace, Object Variable: Input
    • {{generation}} → Object: Trace, Object Variable: Output

Option B: Create Custom Evaluator

  1. Click "+ Create Custom Evaluator" button

  2. Fill in evaluator details:

    • Name: Choose a descriptive name (e.g., "weather_tool_used")
    • Description: Explain what this evaluator measures
    • Model: Select evaluation model
    • Prompt: Configure a custom prompt

Example: Creating a Weather Tool Evaluator

  1. Prompt:

     You are an expert evaluator for an AI agent system.
     Your task is to rate the correctness of tool usage on a scale from 0.0 to 1.0.

     Instructions:

     If the user’s question is not weather-related and the tool used is not get_weather_forecast, return 1.0.

     If the user’s question is not weather-related and the tool is get_weather_forecast, return 0.0.

     If the user’s question is weather-related, return 1.0 only if the tool used is get_weather_forecast; otherwise return 0.0.

     Input:
     User Question: {{input}}
     Tool Used: {{tool_used}}

  2. Configure variable mapping:
    • {{input}} → Object: Trace, Object Variable: Input
    • {{tool_used}} → Object: Span, Object Name: weather-forecaster.ai.toolCall, Object Variable: Metadata, JsonPath: $.attributes["ai.toolCall.name"]

(Screenshot: Langfuse helpfulness evaluator setup screen showing evaluator configuration with variable mapping and trace targeting options)

Enable and Monitor

  1. Click "Enable Evaluator" to start automatic evaluation
  2. Monitor evaluation progress in the dashboard
  3. View evaluation results as they complete