Inkeep AI API Overview

Inkeep's AI API endpoints make it easy to develop chatbot or copilot experiences powered by your own knowledge base.

Because these endpoints are compatible with OpenAI's Chat Completion API format, so you can use most LLM application frameworks, libraries, or SDKs with zero code changes.

For example, you can build a:

ChatGPT-like UI using the Next.js Chatbot template
Product-specific copilot using the Vercel AI SDK or LangChain
An automation for replying to customer emails or as a plugin to your support platform using OpenAI's TypeScript SDK.

Check out our examples for quickstarts for the various APIs.

Available Models

We offer various APIs tailored for different use cases, we use the model concept to differentiate between the different scenarios and functionality.

Model (API) Name	Description
`inkeep-qa`	Provides sensible defaults for customer-facing support assistant or "question-answer" scenarios. Useful for auto-reply automations or custom chat experiences.
`inkeep-context`	A flexible "passthrough" proxy that injects Inkeep's RAG context into calls to underlying models from Anthropic or OpenAI. Fully compatible with all chat completion endpoint functionality including tool/function calling, JSON mode, and image inputs. Ideal for custom AI agents, LLM applications, or workflows requiring full output control while benefiting from a managed RAG system. Provides high flexibility but requires standard LLM application prompting and experimentation.
`inkeep-base`	A fast version of the `inkeep-context` mode that skips RAG and instead only injects a general overview of your product into calls to base models. Best for when you want the flexibility of the `inkeep-context` mode, but don't need the full comprehensive context.
`inkeep-rag`	Provides structured RAG chunks directly from your knowledge base. Returns chunks with URLs, excerpts from the original documents, and other source information. Ideal for custom implementations where you need direct access to the retrieved knowledge base content. You'd typically pass these chunks ('documents') to an LLM of your choice for any open-ended purpose.

Get an API key

Log in to the Inkeep Dashboard
Navigate to the Projects section and select your project
Open the Assistants tab
Click Create assistant and choose API from the options
Enter a Name for your new API assistant.
Click on Create
A generated API key will appear that you can use to authenticate API requests.

Using the API

To use with an OpenAI-compatible client, simply:

Customize the baseURL to https://api.inkeep.com/v1 and use an Inkeep apiKey
Specify the mode by setting model to a value in the format of {inkeep-mode}-{model}

If you'd like to let Inkeep manage which model is used

inkeep-qa-expert
inkeep-context-expert
inkeep-base-expert
inkeep-rag

If you'd like to pin the service against a preferred LLM model for the qa or context APIs, you can do so like this:

inkeep-qa-sonnet-4
inkeep-qa-gpt-4o
inkeep-qa-gpt-4-turbo
inkeep-context-sonnet-4
inkeep-context-gpt-4-turbo
inkeep-context-gpt-4o

All modes use the OpenAI chat completion API format. You can use any OpenAI compatible SDK in any language to make requests to the API. If you don't need streaming, then a regular HTTP request works fine. If you use streaming, we recommend using one of the SDKs.

Follow the guides for the mode you want to use for details on the expected response schema and usage patterns.

Limitations

Rate Limits

Per-IP throttling is enabled.
Need a higher rate? Email support@inkeep.com.

Chat Limits

30 messages per chat session
(1 user message + 1 assistant response = 2 “messages”).

Message Size

Direction	Recommended tokens	Notes
Input	≤ 100	Short fragments, single sentences, or brief paragraphs yield the best results.
Output	≤ 1 000	Longer outputs are allowed but may be less optimal.

Have a use case that is outside these recommended limits? Reach out to support@inkeep.com.