Inkeep’s Chat Completion API endpoints make it easy to develop chatbot or copilot experiences powered by your own knowledge base.

Because these endpoints are compatible with OpenAI’s Chat Completion API schema, you can use them with most LLM application frameworks, libraries, or SDKs with zero code changes.

For example, you can build a customer-facing support chatbot, a copilot, or an auto-reply workflow.

Check out our examples.

Available modes

We offer various modes tailored for different use cases.

  • inkeep-qa: Provides sensible defaults for customer-facing support bot scenarios.
  • inkeep-context: A fully “passthrough” mode that injects Inkeep’s RAG context into calls to base models. Best when you need custom tool (function) calls or full prompting control for custom LLM applications, agents, or workflows.

Using the API

To use the API with an OpenAI-compatible client:

  1. Set the baseUrl to https://api.inkeep.com/v1
  2. Specify the mode by setting model to a value in the format {inkeep-mode}-{model}

If you’d like to let Inkeep manage model selection and experimentation for you, use:

  • inkeep-qa-expert
  • inkeep-context-expert

If you’d like to pin the service to a preferred model, use one of:

  • inkeep-qa-sonnet-3-5
  • inkeep-qa-gpt-4o
  • inkeep-qa-gpt-4-turbo
  • inkeep-context-sonnet-3-5
  • inkeep-context-gpt-4-turbo
  • inkeep-context-gpt-4o

All modes use the OpenAI chat completion API format.
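
For example, with the official OpenAI Node SDK, the two setup steps look like this (a minimal sketch; the INKEEP_API_KEY environment variable name and the example question are illustrative assumptions):

import OpenAI from "openai";

// Step 1: point an OpenAI-compatible client at Inkeep's API
const client = new OpenAI({
    baseURL: "https://api.inkeep.com/v1",
    apiKey: process.env.INKEEP_API_KEY, // assumed env var holding your Inkeep API key
});

async function main() {
    // Step 2: pick a mode and model via the `model` field
    const completion = await client.chat.completions.create({
        model: "inkeep-qa-expert",
        messages: [{ role: "user", content: "How do I install the SDK?" }],
    });
    console.log(completion.choices[0].message.content);
}

main();

Because only baseURL and the model value change, any OpenAI-compatible framework or SDK can be pointed at Inkeep the same way.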

Question Answer (QA) Mode

The qa mode is tailored for customer-facing scenarios like support chatbots or auto-reply workflows.

This mode comes with Inkeep’s sensible built-in behavior for:

  • format of citations
  • tone
  • techniques to reduce hallucinations
  • keeping answers scoped to your product and service

The QA models are a good fit when you want to develop your own chat UX or automations without worrying about how to prompt or orchestrate a model.

The qa mode responds with an AI assistant message (content) and two additional pieces of information, links and aiAnnotations, which are delivered as tool calls.

  • content: The conversational answer from the AI assistant, as in any normal OpenAI chat completion response. This is what you would display as the AI assistant’s answer in a traditional chat UI.
  • provideLinks: A list of links (sources) the AI assistant used to generate the response. You can use this to display citations for an answer.
  • provideAIAnnotations: Labels for the response, such as answerConfidence, which indicates how confident the AI assistant is in its response. You can use this to, for example, show an answer to an end user only when the AI assistant is confident.
inkeep-qa-tools-schema.ts
import { z } from "zod";

/* provideLinks tool schema */

const InkeepRecordTypes = z.enum([
    'documentation',
    'site',
    'discourse_post',
    'github_issue',
    'github_discussion',
    'stackoverflow_question',
    'discord_forum_post',
    'discord_message',
    'custom_question_answer',
]);

const LinkType = z.union([
    InkeepRecordTypes,
    z.string() // catch all
]);

const LinkSchema = z.object({
    label: z.string().nullish(), // the value of the footnote, e.g. `1`
    url: z.string(),
    title: z.string().nullish(),
    description: z.string().nullish(),
    type: LinkType.nullish(),
    breadcrumbs: z.array(z.string()).nullish(),
}).passthrough();

export const LinksSchema = z.array(LinkSchema).nullish();

export const LinksToolSchema = z.object({
    links: LinksSchema,
}); 

/* provideAIAnnotations tool schema */

const KnownAnswerConfidence = z.enum([
    'very_confident',
    'somewhat_confident',
    'not_confident',
    'no_sources',
    'other',
]);

const AnswerConfidence = z.union([KnownAnswerConfidence, z.string()]).nullish(); // evolvable

const AIAnnotationsToolSchema = z.object({
    answerConfidence: AnswerConfidence,
}).passthrough();

export const ProvideAIAnnotationsToolSchema = z.object({
    aiAnnotations: AIAnnotationsToolSchema,
});
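
Putting the pieces together, here is one way to consume a QA-mode response with the OpenAI Node SDK and the schemas above. This is a hedged sketch: it assumes links and aiAnnotations arrive as standard OpenAI tool calls named provideLinks and provideAIAnnotations with JSON-encoded arguments, and that INKEEP_API_KEY is your key’s environment variable.

import OpenAI from "openai";
import { LinksToolSchema, ProvideAIAnnotationsToolSchema } from "./inkeep-qa-tools-schema";

const client = new OpenAI({
    baseURL: "https://api.inkeep.com/v1",
    apiKey: process.env.INKEEP_API_KEY, // assumed env var
});

async function ask(question: string) {
    const completion = await client.chat.completions.create({
        model: "inkeep-qa-expert",
        messages: [{ role: "user", content: question }],
    });

    const message = completion.choices[0].message;
    console.log(message.content); // the conversational answer

    // links and aiAnnotations are delivered as tool calls with JSON arguments
    for (const toolCall of message.tool_calls ?? []) {
        const args = JSON.parse(toolCall.function.arguments);

        if (toolCall.function.name === "provideLinks") {
            const { links } = LinksToolSchema.parse(args);
            links?.forEach((link) => console.log(`[${link.label ?? "-"}] ${link.url}`));
        }

        if (toolCall.function.name === "provideAIAnnotations") {
            const { aiAnnotations } = ProvideAIAnnotationsToolSchema.parse(args);
            // e.g. only surface the answer to end users when confidence is high
            if (aiAnnotations.answerConfidence === "not_confident") {
                console.warn("Low-confidence answer; consider a fallback.");
            }
        }
    }
}

ask("How do I configure SSO?");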

Context Mode

The inkeep-context mode works like a “passthrough” proxy that injects Inkeep’s RAG context into calls to an underlying model from Anthropic or OpenAI. These endpoints are fully compatible with all OpenAI chat completion functionality, including tool calling, JSON mode, and image inputs.

This is great for when you’re developing a custom AI agent, LLM application, or workflow and need full control over outputs and custom tool calls, but would like a managed RAG system.

This mode is unopinionated and provides a high degree of flexibility; it therefore requires the same level of prompting, testing, and experimentation as any LLM application.
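
As a sketch of what that looks like in practice, the call below uses context mode with your own system prompt and a custom tool definition. The createSupportTicket tool, the Acme prompt, and the INKEEP_API_KEY variable are hypothetical, purely for illustration; only the baseURL and model value come from Inkeep.

import OpenAI from "openai";

const client = new OpenAI({
    baseURL: "https://api.inkeep.com/v1",
    apiKey: process.env.INKEEP_API_KEY, // assumed env var
});

async function run() {
    const completion = await client.chat.completions.create({
        model: "inkeep-context-expert",
        // Context mode leaves prompting entirely to you
        messages: [
            { role: "system", content: "You are a support agent for Acme. Escalate when you cannot resolve an issue." },
            { role: "user", content: "My deploys keep failing with a 403." },
        ],
        // A hypothetical custom tool; context mode passes it through to the base model
        tools: [
            {
                type: "function",
                function: {
                    name: "createSupportTicket", // illustrative, not part of Inkeep's API
                    description: "Escalate the conversation to a human agent",
                    parameters: {
                        type: "object",
                        properties: {
                            summary: { type: "string", description: "One-line issue summary" },
                        },
                        required: ["summary"],
                    },
                },
            },
        ],
    });

    // Inspect whether the model answered directly or called your custom tool
    const message = completion.choices[0].message;
    if (message.tool_calls?.length) {
        const toolCall = message.tool_calls[0];
        console.log("Tool call:", toolCall.function.name, toolCall.function.arguments);
    } else {
        console.log(message.content);
    }
}

run();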