# Chat Completions API
Inkeep’s Chat Completion API endpoints make it easy to develop chatbot or copilot experiences powered by your own knowledge base.
Because these endpoints are compatible with OpenAI’s Chat Completion API format, you can use them with most LLM application frameworks, libraries, or SDKs with zero code changes.
For example, you can build:
- A ChatGPT-like UI using the Next.js Chatbot template
- A product-specific copilot using the Vercel AI SDK or LangChain
- An automation for replying to customer emails, or a plugin for your support platform, using OpenAI’s TypeScript SDK
Check out our examples.
## Available modes
We offer various modes tailored for different use cases.
| Mode Name | Description |
| --- | --- |
| `inkeep-qa` | Provides sensible defaults for customer-facing support bot scenarios. |
| `inkeep-context` | A fully “passthrough” mode that injects Inkeep’s RAG context into calls to base models. Best when you need custom tool (function) calls or full prompting control for custom LLM applications, agents, or workflows. |
## Using the API
To use with an OpenAI-compatible client, simply:
- Customize the `baseUrl` to `https://api.inkeep.com/v1`
- Specify the mode by setting `model` to a value in the format of `{inkeep-mode}-{model}`
If you’d like to let Inkeep manage which model is used, set `model` to one of:
- `inkeep-qa-expert`
- `inkeep-context-expert`

If you’d like to pin the service to a preferred LLM model, use one of:
- `inkeep-qa-sonnet-3-5` (recommended)
- `inkeep-qa-gpt-4o`
- `inkeep-qa-gpt-4-turbo`
- `inkeep-context-sonnet-3-5` (recommended)
- `inkeep-context-gpt-4-turbo`
- `inkeep-context-gpt-4o`
All modes use the OpenAI chat completion API format.
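As an illustration, here is a minimal sketch using OpenAI’s TypeScript SDK. The API key environment variable and the question are placeholder assumptions; any of the `model` values listed above works the same way:

```typescript
import OpenAI from "openai";

// Point the standard OpenAI client at Inkeep's OpenAI-compatible endpoint.
// INKEEP_API_KEY is a placeholder for wherever you store your Inkeep API key.
const client = new OpenAI({
  baseURL: "https://api.inkeep.com/v1",
  apiKey: process.env.INKEEP_API_KEY,
});

const response = await client.chat.completions.create({
  model: "inkeep-qa-expert", // or any mode-model pair listed above
  messages: [{ role: "user", content: "How do I get started?" }],
});

console.log(response.choices[0].message.content);
```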
## Question Answer Mode
The `qa` mode is specifically tailored for customer-facing responses like support chatbots or auto-reply workflows.
This mode comes with Inkeep’s sensible built-in behavior for:
- format of citations
- tone
- techniques to reduce hallucinations
- keeping answers scoped to your product and service
The QA mode is best suited for when you want to develop your own chat UX or automations without worrying about how to prompt or use a model.
The `qa` mode responds with an AI assistant message (`content`) and two additional pieces of information: `links` and `aiAnnotations` (provided as tools).
| Component | Description |
| --- | --- |
| `content` | Contains the conversational answer of the AI assistant, as in any normal OpenAI chat completion response. This is what you would display as the AI assistant’s answer in a traditional chat UI. |
| `provideLinks` | Provides a list of links (sources) used by the AI assistant to generate a response. You can use this to display citations for an answer. |
| `provideAIAnnotations` | Provides labels for the response, like `answerConfidence`, which indicates how confident the AI assistant is in its response. You can use this to, for example, conditionally show answers to an end user only if the AI assistant is confident. |
```typescript
import { z } from "zod";
/* provideLinks tool schema */
const InkeepRecordTypes = z.enum([
'documentation',
'site',
'discourse_post',
'github_issue',
'github_discussion',
'stackoverflow_question',
'discord_forum_post',
'discord_message',
'custom_question_answer',
]);
const LinkType = z.union([
InkeepRecordTypes,
z.string() // catch all
]);
const LinkSchema = z.object({
label: z.string().nullish(), // the value of the footnote, e.g. `1`
url: z.string(),
title: z.string().nullish(),
description: z.string().nullish(),
type: LinkType.nullish(),
breadcrumbs: z.array(z.string()).nullish(),
}).passthrough();
export const LinksSchema = z.array(LinkSchema).nullish();
export const LinksToolSchema = z.object({
links: LinksSchema,
});
/* provideAIAnnotations tool schema */
const KnownAnswerConfidence = z.enum([
'very_confident',
'somewhat_confident',
'not_confident',
'no_sources',
'other',
]);
const AnswerConfidence = z.union([KnownAnswerConfidence, z.string()]).nullish(); // evolvable
const AIAnnotationsToolSchema = z.object({
answerConfidence: AnswerConfidence,
}).passthrough();
export const ProvideAIAnnotationsToolSchema = z.object({
aiAnnotations: AIAnnotationsToolSchema,
});
```
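To show how these fit together, here is a sketch of reading the tool calls off a QA-mode response with OpenAI’s TypeScript SDK, reusing the `client` from the earlier example and the two schemas above. The confidence gate at the end is an example policy, not part of the API:

```typescript
const completion = await client.chat.completions.create({
  model: "inkeep-qa-expert",
  messages: [{ role: "user", content: "How do I rotate my API key?" }],
});

const message = completion.choices[0].message;

let links;
let answerConfidence;
for (const toolCall of message.tool_calls ?? []) {
  if (toolCall.type !== "function") continue;
  const args = JSON.parse(toolCall.function.arguments);
  if (toolCall.function.name === "provideLinks") {
    // Citations you can render as footnotes under the answer
    links = LinksToolSchema.parse(args).links;
  } else if (toolCall.function.name === "provideAIAnnotations") {
    answerConfidence =
      ProvideAIAnnotationsToolSchema.parse(args).aiAnnotations.answerConfidence;
  }
}

// Example policy: only surface the answer when the assistant is confident.
if (answerConfidence === "very_confident" || answerConfidence === "somewhat_confident") {
  console.log(message.content);
  console.log(links);
}
```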
## Context Mode
The `inkeep-context` mode works like a “passthrough” proxy that injects Inkeep’s RAG context into calls to an underlying model from Anthropic or OpenAI. These endpoints are fully compatible with all chat completion endpoint functionality, like tool calling, JSON mode, and image inputs.
This is great for when you’re developing a custom AI agent, LLM application, or workflow and require full control of outputs and custom tool calls, but would like a managed RAG system.
This mode is unopinionated and provides a high degree of flexibility; it therefore requires the same level of prompting, testing, and experimentation as any LLM application.
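For instance, a context-mode request might combine your own system prompt with a custom tool. Here is a sketch using the same OpenAI-compatible client as above; the prompts and the `createSupportTicket` tool are hypothetical, purely for illustration:

```typescript
const completion = await client.chat.completions.create({
  model: "inkeep-context-gpt-4o",
  messages: [
    {
      role: "system",
      // Your own prompt; context mode leaves prompting entirely up to you.
      content:
        "You are a support triage agent. Answer from the retrieved product context, and open a ticket if the docs don't resolve the issue.",
    },
    {
      role: "user",
      content: "Customer reports SSO logins failing since upgrading to 2.3.",
    },
  ],
  tools: [
    {
      type: "function",
      function: {
        name: "createSupportTicket", // hypothetical tool, for illustration only
        description:
          "Open a ticket when the issue can't be resolved from documentation",
        parameters: {
          type: "object",
          properties: {
            summary: { type: "string" },
            priority: { type: "string", enum: ["low", "medium", "high"] },
          },
          required: ["summary", "priority"],
        },
      },
    },
  ],
});

// The model may answer directly or call your tool, as with any chat completion.
console.log(completion.choices[0].message);
```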