Using Langfuse for LLM Observability

Complete guide to using Langfuse for LLM observability, tracing, and analytics in the Inkeep Agent Framework

Langfuse is an open-source LLM engineering platform that provides specialized observability for AI applications, including token usage tracking, model performance analytics, and detailed LLM interaction tracing.

Quick Start

1. Setup Langfuse Account

First, create a Langfuse account and get your API keys:

  1. Sign up at Langfuse Cloud
  2. Create a new project in your Langfuse dashboard
  3. Get your API keys from the project settings:
    • Public Key: pk-lf-xxxxxxxxxx
    • Secret Key: sk-lf-xxxxxxxxxx

2. Configure OpenTelemetry Collector

Add Langfuse as an exporter to your OTEL collector configuration:

# otel-collector-config.yaml
exporters:
  # Export to Langfuse
  otlphttp/langfuse:
    endpoint: "https://us.cloud.langfuse.com/api/public/otel" # US region
    headers:
      Authorization: "Basic <BASE64_ENCODED_CREDENTIALS>"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/langfuse]

3. Generate Authentication Credentials

Langfuse requires Basic Authentication with base64-encoded credentials:

# Generate base64 encoded credentials
echo -n "pk-lf-YOUR_PUBLIC_KEY:sk-lf-YOUR_SECRET_KEY" | base64
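
In a Node.js/TypeScript process you can produce the same value with Buffer. This is a minimal sketch; the LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY environment variable names are assumptions, not something the framework requires:

// encode-credentials.ts (illustrative helper)
// Builds the Basic auth value expected by the Langfuse OTLP endpoint.
const publicKey = process.env.LANGFUSE_PUBLIC_KEY; // pk-lf-...
const secretKey = process.env.LANGFUSE_SECRET_KEY; // sk-lf-...

const encoded = Buffer.from(`${publicKey}:${secretKey}`).toString("base64");
console.log(`Authorization: Basic ${encoded}`);

Paste the resulting value into the Authorization header of the otlphttp/langfuse exporter.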

4. Complete Configuration Example

Here's a complete OTEL collector configuration with Langfuse integration:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 100ms
    send_batch_size: 1
    send_batch_max_size: 10

  attributes:
    actions:
      - key: http.request.header.authorization
        value: "[REDACTED]"
        action: update

exporters:
  # Export to Jaeger
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

  # Export to SigNoz OTEL collector
  otlp/signoz:
    endpoint: signoz-otel-collector:4317
    tls:
      insecure: true

  # Export to Langfuse
  otlphttp/langfuse:
    endpoint: "https://us.cloud.langfuse.com/api/public/otel"
    headers:
      Authorization: "Basic <BASE64_ENCODED_CREDENTIALS>"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes, batch]
      exporters: [otlp/jaeger, otlp/signoz, otlphttp/langfuse]
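
The receivers above listen on 4317 (gRPC) and 4318 (HTTP). For reference, here is a minimal sketch of an application-side OpenTelemetry setup that exports traces to this collector, assuming the @opentelemetry/sdk-node and @opentelemetry/exporter-trace-otlp-http packages; the Inkeep Agent Framework may already wire this up for you, in which case pointing the standard OTEL_EXPORTER_OTLP_ENDPOINT variable at the collector may be all you need:

// instrumentation.ts (illustrative file name)
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

const sdk = new NodeSDK({
  serviceName: "inkeep-agents", // assumed service name, purely for illustration
  // Send traces to the collector's HTTP receiver (0.0.0.0:4318 above).
  traceExporter: new OTLPTraceExporter({
    url: "http://localhost:4318/v1/traces",
  }),
});

sdk.start();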

5. Start the Services

# From the root directory of the agent framework
cd deploy/docker
docker compose up -d

Architecture

The Langfuse integration works alongside your existing observability stack:

Application → OTEL Collector → Jaeger → Jaeger UI (http://localhost:16686)
                             → SigNoz → SigNoz UI (http://localhost:3080)
                             → Langfuse → Langfuse Dashboard

Running LLM Evaluations in Langfuse Dashboard

Langfuse provides a powerful web interface for running LLM evaluations without writing code. You can create datasets, set up evaluators, and run evaluations directly in the dashboard.
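
If you prefer to manage evaluation datasets from code rather than the dashboard, the Langfuse TypeScript SDK exposes dataset helpers. A minimal sketch, assuming the langfuse npm package; the dataset name and item contents are illustrative:

import { Langfuse } from "langfuse";

const langfuse = new Langfuse({
  publicKey: process.env.LANGFUSE_PUBLIC_KEY,
  secretKey: process.env.LANGFUSE_SECRET_KEY,
  baseUrl: "https://us.cloud.langfuse.com", // match the region used in the collector config
});

// Create a dataset and add one item with an expected answer to evaluate against.
await langfuse.createDataset({ name: "support-qa" });
await langfuse.createDatasetItem({
  datasetName: "support-qa",
  input: { question: "How do I reset my password?" },
  expectedOutput: "Walk the user through the password reset flow.",
});

await langfuse.flushAsync();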

Accessing the Evaluation Features

  1. Log into your Langfuse dashboard: https://cloud.langfuse.com
  2. Navigate to your project where your agent traces are being collected
  3. Click "Evaluations" in the left sidebar
  4. Click "Set up evaluator" to begin creating evaluations

Setting Up LLM-as-a-Judge Evaluators

Set Up Default Evaluation Model

Before creating evaluators, you need to configure a default LLM connection for evaluations:

Langfuse LLM Connection setup showing OpenAI provider configuration with API key field and advanced settings

Setting up the LLM Connection:

  1. Navigate to "Evaluator Library" in your Langfuse dashboard
  2. Click "Set up" next to "Default Evaluation Model"
  3. Configure the LLM connection:
    • LLM Adapter: Select your preferred provider
    • Provider Name: Give it a descriptive name (e.g., "openai")
    • API Key: Enter your OpenAI API key (stored encrypted)
    • Advanced Settings: Configure the base URL and model parameters if needed
  4. Click "Create connection" to save

Important Notes:

  • The default model is used by all managed evaluators
  • You can change it at any time; existing evaluators will use the new model
  • The model must support structured output

Create a New Evaluator

  1. Go to "Evaluations" → "Running Evaluators"
  2. Click the "Set up evaluator" button
  3. You'll see two main steps: "1. Select Evaluator" and "2. Run Evaluator"

Choose Your Evaluator Type

You have two main options:

Option A: Use a Managed Evaluator

Langfuse provides a comprehensive catalog of pre-built evaluators, including Quality & Accuracy Evaluators, RAG-Specific Evaluators, and Advanced Evaluators.

To use a managed evaluator:

  1. Browse the evaluator list and find one that matches your needs
  2. Click on the evaluator to see its description and criteria
  3. Click the "Use Selected Evaluator" button

Option B: Create Custom Evaluator

Best for: Specific evaluation needs not covered by managed evaluators

  1. Click "+ Create Custom Evaluator" button

  2. Fill in evaluator details:

    • Name: Choose a descriptive name (e.g., "agent-accuracy")
    • Description: Explain what this evaluator measures
    • Model: Select the evaluation model (GPT-4 recommended)
    • Temperature: Set to a low value (e.g., 0.1) for consistent scoring
  3. Create evaluation prompt with variable placeholders:

    • {{input}} - The original user input/question
    • {{output}} - The AI agent's response
    • {{expected_output}} - Expected or correct answer
    • {{context}} - Additional context provided to the agent

Example Custom Evaluator Prompt:

You are an expert evaluator for an AI customer support agent.

Evaluate the helpfulness of this response on a scale of 0-1:

User Question: {{input}}
Agent Response: {{output}}
Expected Response: {{expected_output}}

Consider:
- Does it directly answer the question?
- Is the information accurate and complete?
- Is the tone appropriate for customer support?
- Does it provide actionable next steps?

Provide only a numeric score between 0.0 and 1.0.

Configure Evaluation Scope

Langfuse evaluator configuration screen showing target data selection, trace filtering, sampling settings, and delay configuration

This screen shows the evaluation configuration interface where you can:

Generated Score Name:

  • Score Name: Conciseness (automatically filled based on selected evaluator)
  • This will be the name of the score that appears in your traces

Evaluator Runs On:

  • New traces: Automatically evaluate incoming traces
  • Existing traces: Backfill evaluation for historical data

Target Filter:

  • Add filter: Create rules to target specific traces by name, tags, user ID, and similar fields (see the sketch after this list)
  • Preview: Shows sample traces that match your filters from the last 24 hours
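
The name, tags, and user ID that these filters match come from the traces your application emits. Below is a minimal sketch of setting them with the Langfuse TypeScript SDK, assuming the langfuse npm package; if you only export through the OTEL collector, the equivalent fields are derived from your span attributes instead:

import { Langfuse } from "langfuse";

const langfuse = new Langfuse(); // reads the LANGFUSE_* environment variables

// name, userId, and tags are the fields that Target Filter rules can match on.
const trace = langfuse.trace({
  name: "support-agent-run", // illustrative trace name
  userId: "user-123",
  tags: ["production", "support-agent"],
  input: { question: "How do I reset my password?" },
});

trace.update({ output: "Here is how to reset your password..." });

await langfuse.flushAsync();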

Enable and Monitor

  1. Click "Enable Evaluator" to start automatic evaluation
  2. Monitor evaluation progress in the dashboard
  3. View evaluation results as they complete
  4. Track evaluation costs and adjust sampling if needed