Using Langfuse for LLM Observability
Complete guide to using Langfuse for LLM observability, tracing, and analytics in the Inkeep Agent Framework
Langfuse is an open-source LLM engineering platform that provides specialized observability for AI applications, including token usage tracking, model performance analytics, and detailed LLM interaction tracing.
Quick Start
1. Set Up a Langfuse Account
First, create a Langfuse account and get your API keys:
- Sign up at Langfuse Cloud
- Create a new project in your Langfuse dashboard
- Get your API keys from the project settings:
  - Public Key: pk-lf-xxxxxxxxxx
  - Secret Key: sk-lf-xxxxxxxxxx
2. Configure OpenTelemetry Collector
Add Langfuse as an exporter to your OTEL collector configuration:
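A minimal sketch of the exporter block. It assumes Langfuse Cloud's OTLP/HTTP endpoint at cloud.langfuse.com and an environment variable LANGFUSE_AUTH holding the base64 credentials generated in step 3; adjust the base URL for the US region or a self-hosted instance:

```yaml
exporters:
  otlphttp/langfuse:
    # Langfuse OTLP/HTTP endpoint (shown here for Langfuse Cloud EU;
    # use your region's or self-hosted base URL)
    endpoint: "https://cloud.langfuse.com/api/public/otel"
    headers:
      # Basic auth: base64("<public key>:<secret key>"), see step 3
      Authorization: "Basic ${env:LANGFUSE_AUTH}"
```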
3. Generate Authentication Credentials
Langfuse requires Basic Authentication with base64-encoded credentials:
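For example, on macOS or Linux you can generate the value by joining the public and secret keys with a colon and base64-encoding the result (substitute your own keys):

```bash
# -n avoids a trailing newline that would corrupt the encoded value
export LANGFUSE_AUTH=$(echo -n "pk-lf-xxxxxxxxxx:sk-lf-xxxxxxxxxx" | base64)
```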
4. Complete Configuration Example
Here's a complete OTEL collector configuration with Langfuse integration:
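The sketch below assumes a standard OTLP receiver and batch processor and exports traces only to Langfuse; if you already export to another backend, add that exporter alongside otlphttp/langfuse in the traces pipeline:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  otlphttp/langfuse:
    endpoint: "https://cloud.langfuse.com/api/public/otel"
    headers:
      Authorization: "Basic ${env:LANGFUSE_AUTH}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/langfuse]
```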
5. Start the Services
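How you start the collector depends on your deployment. If it runs via Docker Compose (the service name otel-collector below is illustrative), export the credentials the config references and restart the collector so the new exporter takes effect:

```bash
export LANGFUSE_AUTH=$(echo -n "pk-lf-xxxxxxxxxx:sk-lf-xxxxxxxxxx" | base64)
docker compose up -d otel-collector
```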
Architecture
The Langfuse integration works alongside your existing observability stack:
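At a high level, the assumed data flow looks like this (a sketch, not an exact topology):

```
Inkeep Agents ──OTLP──▶ OpenTelemetry Collector ──▶ Langfuse (LLM traces, token usage, evals)
                                                └──▶ Your existing tracing backend(s)
```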
Running LLM Evaluations in the Langfuse Dashboard
Langfuse provides a powerful web interface for running LLM evaluations without writing code. You can create datasets, set up evaluators, and run evaluations directly in the dashboard.
Accessing the Evaluation Features
- Log into your Langfuse dashboard: https://cloud.langfuse.com
- Navigate to your project where your agent traces are being collected
- Click "Evaluations" in the left sidebar
- Click "Set up evaluator" to begin creating evaluations
Setting Up LLM-as-a-Judge Evaluators
Set Up Default Evaluation Model
Before creating evaluators, you need to configure a default LLM connection for evaluations:

Setting up the LLM Connection:
- Navigate to "Evaluator Library" in your Langfuse dashboard
- Click "Set up" next to "Default Evaluation Model"
- Configure the LLM connection:
  - LLM Adapter: Select your preferred provider
  - Provider Name: Give it a descriptive name (e.g., "openai")
  - API Key: Enter the API key for your provider (stored encrypted)
  - Advanced Settings: Configure the base URL and model parameters if needed
- Click "Create connection" to save
Important Notes:
- The default model is used by all managed evaluators
- You can change it at any time; existing evaluators will use the new model
- The model must support structured output
Navigate to Evaluator Setup
- Go to "Evaluations" → "Running Evaluators"
- Click "Set up evaluator" button
- You'll see two main steps: "1. Select Evaluator" and "2. Run Evaluator"
Choose Your Evaluator Type
You have two main options:
Option A: Langfuse Managed Evaluators (Recommended)
Langfuse provides a comprehensive catalog of pre-built evaluators, including Quality & Accuracy Evaluators, RAG-Specific Evaluators, and Advanced Evaluators.
To use a managed evaluator:
- Browse the evaluator list and find one that matches your needs
- Click on the evaluator to see its description and criteria
- Click "Use Selected Evaluator" button
Option B: Create Custom Evaluator
Best for: Specific evaluation needs not covered by managed evaluators
- Click the "+ Create Custom Evaluator" button
- Fill in the evaluator details:
  - Name: Choose a descriptive name (e.g., "agent-accuracy")
  - Description: Explain what this evaluator measures
  - Model: Select the evaluation model (GPT-4 recommended)
  - Temperature: Set to a low value (0.1) for consistent scoring
- Create the evaluation prompt with variable placeholders:
  - {{input}}: The original user input/question
  - {{output}}: The AI agent's response
  - {{expected_output}}: The expected or correct answer
  - {{context}}: Additional context provided to the agent
Example Custom Evaluator Prompt:
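An illustrative prompt for the "agent-accuracy" evaluator named above; adjust the criteria and scoring scale to your use case:

```
You are evaluating the accuracy of an AI agent's response.

Question: {{input}}
Agent response: {{output}}
Expected answer: {{expected_output}}
Additional context: {{context}}

Score the response from 0 to 1, where 1 means the response is fully accurate
and consistent with the expected answer, and 0 means it is incorrect or
contradicts it. Briefly explain your reasoning, then give the score.
```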
Configure Evaluation Scope

This screen shows the evaluation configuration interface where you can:
Generated Score Name:
- Score Name: Conciseness (automatically filled based on the selected evaluator). This will be the name of the score that appears in your traces.
Evaluator Runs On:
- New traces: Automatically evaluate incoming traces
- Existing traces: Backfill evaluation for historical data
Target Filter:
- Add filter: Create rules to target specific traces (by name, tags, user ID, etc.)
- Preview: Shows sample traces that match your filters from the last 24 hours
Enable and Monitor
- Click "Enable Evaluator" to start automatic evaluation
- Monitor evaluation progress in the dashboard
- View evaluation results as they complete
- Track evaluation costs and adjust sampling if needed