The challenges

In our journey, we’ve talked to hundreds of companies that are eager to use generative AI to provide better self-help experiences for their users. Many of them had experimented with building their own LLM Q&A apps but often didn’t ship, because they felt the quality and reliability weren’t there. Below are some of the common challenges they shared with us and our approach to solving them.

Ingesting content from many sources

Knowledge about technical products often lives in many places: documentation, GitHub, Discourse forums, Slack and Discord communities, blogs, StackOverflow, support systems, and more. Smartly ingesting all of this content, and keeping it up to date over time, quickly becomes a full-time job for a team of data engineers.

Inkeep addresses this by:

  • Automatically ingesting content from common public and private content sources with out-of-the-box integrations.

  • Frequently re-crawling your sources to find differences and keep your knowledge base up to date.

  • Providing APIs and on-demand triggers to ingest content at any time.
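
To make the re-crawling idea concrete, here is a minimal sketch of one common way to detect differences between crawls: store a hash of each page and compare it with a hash of the freshly crawled text. This is an illustration under our own assumptions, not a description of Inkeep's pipeline; the function names `fingerprint` and `diff_crawl` and the URL-to-text data shape are hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return a stable hash of a page's whitespace-normalized text."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def diff_crawl(previous: dict[str, str], crawled: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored fingerprints with a fresh crawl.

    previous: url -> fingerprint saved by the last crawl (hypothetical store)
    crawled:  url -> page text from the current crawl
    """
    current = {url: fingerprint(text) for url, text in crawled.items()}
    return {
        "added": sorted(u for u in current if u not in previous),
        "changed": sorted(
            u for u in current if u in previous and current[u] != previous[u]
        ),
        "removed": sorted(u for u in previous if u not in current),
    }
```

Only the pages listed under "added" and "changed" would need to be re-chunked and re-embedded, which keeps frequent re-crawls cheap. Normalizing whitespace before hashing avoids flagging pages whose formatting changed but whose text did not.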

Finding the most relevant content

Retrieval-augmented generation (RAG) is the best way to use LLMs to answer questions regarding domain-specific content. At a high level, it involves taking a user question, finding the most relevant content, and feeding that content to an LLM.

RAG relies on finding the relevant documents and “chunks” within those documents needed to answer user questions. The problem is, popular ways of doing retrieval - like slicing up all content into n-character chunks - are often arbitrary and ineffective.
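
As a rough illustration of why n-character chunking is arbitrary, the sketch below compares it with a simple paragraph-aware alternative. Both functions are hypothetical examples written for this post, not the chunking strategies Inkeep uses.

```python
def chunk_fixed(text: str, n: int) -> list[str]:
    """Naive chunking: cut every n characters, even mid-word or mid-sentence."""
    return [text[i:i + n] for i in range(0, len(text), n)]


def chunk_paragraphs(text: str, max_chars: int) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than split.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

With a short three-paragraph document, `chunk_fixed` will typically end a chunk in the middle of a word, while `chunk_paragraphs` keeps each instruction intact. Real content sources need more than this (headings, code blocks, threaded replies), which is where source-specific strategies come in.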

Retrieval becomes even more challenging as the number of documents and sources increases. More content means higher coverage of potential user questions, but it also means more noise and the need for a precise retrieval system.

Our retrieval and neural search engines address this by using:

  • Custom embedding and chunking strategies for each content source. The most effective embedding and chunking strategy for a Slack conversation is very different from one for a “How-to” article.

  • Neural search that combines semantic and keyword search to balance vector similarity and keyword matching.

  • Tailoring of the embedding space to your specific organization and content. Out-of-the-box embedding models don’t account for what we call the “semantic space” of your company and your products. For example, “Retrieval system” is much closer in semantic meaning to “Feature” for Inkeep than for other companies.

  • Accounting for time, author, source type, and other metadata that’s important for prioritizing trustworthy content.

  • …and more.
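
To illustrate the idea of balancing vector similarity and keyword matching, here is a minimal sketch of a hybrid score as a weighted sum of the two signals. The weighting scheme, the `alpha` parameter, the whitespace tokenizer, and the document shape are all assumptions made for this example; production systems typically use a learned embedding model and a keyword scorer such as BM25, and this is not Inkeep's actual ranking function.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(query: str, text: str) -> float:
    """Fraction of query terms that appear in the text (toy keyword score)."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def hybrid_rank(query: str, query_vec: list[float], docs: list[dict], alpha: float = 0.5) -> list[str]:
    """Rank docs by alpha * semantic score + (1 - alpha) * keyword score.

    Each doc is a dict with hypothetical keys "id", "text", and "vec".
    """
    scored = []
    for doc in docs:
        semantic = cosine(query_vec, doc["vec"])
        lexical = keyword_overlap(query, doc["text"])
        scored.append((alpha * semantic + (1 - alpha) * lexical, doc["id"]))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]
```

Setting `alpha` closer to 1 favors semantic similarity, and closer to 0 favors exact term matches such as product names or error codes. Metadata signals like recency or source type could be added as further weighted terms in the same sum.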

If you’re curious about our technical approach, join our newsletter where we share product updates and engineering deep-dives.

Minimizing hallucinations

Conversational large language models are trained to provide satisfying answers to users. Unfortunately, this makes them prone to providing answers that are unsubstantiated, i.e. “hallucinating”. Dealing with hallucinations is notoriously difficult and a common blocker for many companies.

Here are some of the key ways in which we minimize hallucinations with our grounded-answer system:

  1. Retrieving the right content - When models are not given content that helps them answer a question, they are more likely to hallucinate. That’s in part why we focus so heavily on our search and retrieval engines.

  2. Providing citations - Citations give end users an easy way to learn more and verify answers. We also use citations in our automated evaluation system to detect when model answers are drifting from source material.

  3. Staying on topic - We’ve implemented a variety of protections to keep model answers on topic. For example, the bot won’t answer questions unrelated to your company and will guard against giving answers that create a poor perception of your product.

  4. Rapidly experimenting at scale - We continuously test and evaluate our entire retrieval and LLM stack against both historical and new user questions. This allows us to identify and adopt new techniques while monitoring for regressions.
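
As a simple illustration of how citations can feed an automated check, the sketch below flags sentences in an answer that carry no citation and citations that point to a source that was never retrieved. It assumes a hypothetical answer format in which citations appear as bracketed numbers such as [1] before the sentence's closing punctuation; it is not a description of Inkeep's evaluation system.

```python
import re

# Assumed citation format: a bracketed 1-based source index, e.g. "[2]".
CITATION = re.compile(r"\[(\d+)\]")


def check_citations(answer: str, num_sources: int) -> dict:
    """Report uncited sentences and citations outside 1..num_sources."""
    sentences = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()
    ]
    uncited = []
    invalid = set()
    for sentence in sentences:
        ids = [int(m) for m in CITATION.findall(sentence)]
        if not ids:
            uncited.append(sentence)
        invalid.update(i for i in ids if not 1 <= i <= num_sources)
    return {"uncited": uncited, "invalid": sorted(invalid)}
```

A check like this only verifies that citations exist and resolve to retrieved sources. Deciding whether a cited passage actually supports the sentence requires a further comparison against the source text.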

Incorporating feedback

Even with best-in-class retrieval and grounded-answer systems, feedback loops are the key to continuous improvement of model performance over time.

Our platform has built-in mechanisms for this, including:

  • Thumbs up/down feedback from end-users

  • “Edit answer” feature for administrators

  • Custom FAQs

We also provide usage, topical, and sentiment analysis on all user questions. Product and content teams often use these insights to prioritize content creation and product improvements that address root causes of user questions.

Production-ready service

To launch something confidently to end-users, it’s essential to have:

  • Highly available, geo-distributed, low-latency search and chat services

  • API and UX monitoring

  • Continuous evaluation of search and chat results

Our platform already handles this at scale and answers tens of thousands of questions per month.

The Team

Our team is made up of MIT engineers and app developers passionate about machine learning, data engineering, developer products, and great user experiences. We’re excited to solve the challenges in this space and help companies provide the best devex possible.

We’re also fortunate to have the backing of reputable investors, including Y Combinator and Khosla Ventures.

Our founding team includes: