Evaluations

Manage evaluators programmatically with the TypeScript SDK

The TypeScript SDK provides an EvaluationClient that talks to the Evaluations API, so you can manage evaluators and evaluation suite configs, trigger batch evaluations, and read results, all from code.

For full endpoint details and request/response shapes, see the Evaluations API reference.

Setup: create a client

Create an evaluation client with your tenant ID, project ID, API base URL, and optional API key.

import { EvaluationClient } from "@inkeep/agents-sdk";

const client = new EvaluationClient({
  tenantId: process.env.INKEEP_TENANT_ID!,
  projectId: process.env.INKEEP_PROJECT_ID!,
  apiUrl: "https://api.inkeep.com",
  apiKey: process.env.INKEEP_API_KEY,
});

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| tenantId | string | Yes | Your tenant (organization) ID |
| projectId | string | Yes | Your project ID |
| apiUrl | string | Yes | API base URL (e.g. https://api.inkeep.com or your self-hosted URL) |
| apiKey | string | No | Bearer token for authenticated requests. Omit for unauthenticated or custom auth. |
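
If you run a self-hosted deployment, point apiUrl at your own instance; as noted in the table above, apiKey can be omitted when requests are unauthenticated or auth is handled elsewhere. A minimal sketch, where the URL is a placeholder for your deployment:

import { EvaluationClient } from "@inkeep/agents-sdk";

// Self-hosted, unauthenticated setup (the URL is a placeholder for your own deployment)
const selfHostedClient = new EvaluationClient({
  tenantId: process.env.INKEEP_TENANT_ID!,
  projectId: process.env.INKEEP_PROJECT_ID!,
  apiUrl: "https://agents.internal.example.com",
  // apiKey omitted: unauthenticated or custom auth
});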

Use client in the examples below (e.g. client.createEvaluator(...)).

Evaluators

Evaluators define how to score agent outputs (e.g. with a prompt and model, plus optional pass criteria).

Creating an evaluator

Pass an object with name, description, prompt, schema (JSON schema for the evaluator output), and model (model identifier and optional provider options). Optionally include passCriteria to define pass/fail conditions on the schema fields.

const evaluator = await client.createEvaluator({
  name: "Helpfulness",
  description: "Scores how helpful the agent response is (0-1)",
  prompt: `You are an expert evaluator. Score how helpful the assistant's response is to the user on a scale of 0.0 to 1.0.
Consider clarity, relevance, and completeness. Respond with a JSON object with a "score" field.`,
  schema: {
    type: "object",
    properties: {
      score: { type: "number", description: "Helpfulness score from 0 to 1" },
    },
    required: ["score"],
  },
  model: {
    model: "gpt-4o-mini",
    providerOptions: {},
  },
  passCriteria: {
    operator: "and",
    conditions: [{ field: "score", operator: ">=", value: 0.8 }],
  },
});
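
The returned object is the created evaluator record. Assuming it includes an id field (see the Evaluations API reference for the exact response shape), you can feed it straight into a suite config:

// Sketch: reuse the new evaluator's ID in a suite config
// (assumes the create response includes an `id` field)
const suite = await client.createEvaluationSuiteConfig({
  evaluatorIds: [evaluator.id],
  sampleRate: 0.25, // evaluate 25% of matching conversations
});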

Evaluation suite configs

Suite configs group evaluators together with optional agent filters and a sample rate. Continuous tests (evaluation run configs) use them to decide which conversations to evaluate automatically.

Creating an evaluation suite config

Pass evaluatorIds (required, at least one) and optionally sampleRate (0–1) and filters (e.g. agentIds to restrict which agents’ conversations are evaluated). The suite can then be attached to a continuous test (evaluation run config).

const suiteConfig = await client.createEvaluationSuiteConfig({
  evaluatorIds: ["eval-helpfulness", "eval-accuracy"],
  sampleRate: 0.1,
  filters: {
    agentIds: ["agent-support-bot"],
  },
});

| Option | Type | Required | Description |
| --- | --- | --- | --- |
| evaluatorIds | string[] | Yes | At least one evaluator ID to run in this suite |
| sampleRate | number | No | Fraction of matching conversations to evaluate (0–1). Omit to evaluate all. |
| filters | object | No | Restrict which conversations are in scope, e.g. { agentIds: ["agent-id"] } |

Batch evaluation

Trigger a one-off batch evaluation over conversations, optionally filtered by conversation IDs or date range:

const result = await client.triggerBatchEvaluation({
  evaluatorIds: ["eval-1", "eval-2"],
  name: "Weekly quality check",
  dateRange: {
    startDate: "2025-02-01",
    endDate: "2025-02-07",
  },
});
// result: { message, evaluationJobConfigId, evaluatorIds }

| Option | Type | Required | Description |
| --- | --- | --- | --- |
| evaluatorIds | string[] | Yes | IDs of evaluators to run |
| name | string | No | Name for the job (defaults to a timestamped name) |
| conversationIds | string[] | No | Limit to these conversations |
| dateRange | object with startDate and endDate (YYYY-MM-DD) | No | Limit to conversations in this date range |
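
To re-score a specific set of conversations instead of a date range, pass conversationIds from the table above (the IDs and name here are placeholders):

// Batch evaluation limited to specific conversations (IDs are placeholders)
const targeted = await client.triggerBatchEvaluation({
  evaluatorIds: ["eval-helpfulness"],
  name: "Escalated tickets re-check",
  conversationIds: ["conv-123", "conv-456"],
});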

To list results by job or run config, use the Evaluations API (e.g. get evaluation results by job config ID or by run config ID).
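
For instance, once a batch job finishes, its results can be fetched with the evaluationJobConfigId returned by triggerBatchEvaluation. The sketch below uses a plain HTTP call with a placeholder path; check the Evaluations API reference for the actual endpoint and query parameters:

// Hypothetical sketch: fetch results for a batch job.
// The path below is a placeholder; see the Evaluations API reference for the real endpoint.
const response = await fetch(
  `https://api.inkeep.com/evaluations/results?evaluationJobConfigId=${result.evaluationJobConfigId}`,
  { headers: { Authorization: `Bearer ${process.env.INKEEP_API_KEY}` } }
);
const evaluationResults = await response.json();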