In our journey, we’ve talked to hundreds of companies that want to use generative AI to provide better self-help experiences for their users. Many of them have experimented with building their own LLM Q&A apps but often didn’t ship because they felt the quality and reliability weren’t there. Below, we cover some of the common challenges they described to us and our approach to solving them.
Ingesting content from many sources
Knowledge about technical products often lives in many places: documentation, GitHub, Discourse forums, Slack and Discord communities, blogs, StackOverflow, support systems, and more. Ingesting all of this content, and keeping it up to date over time, quickly becomes a full-time job for a team of data engineers.
Inkeep addresses this by:
Automatically ingesting content from common public and private content sources with out-of-the-box integrations.
Frequently re-crawling your sources to find differences and keep your knowledge base up to date.
Providing APIs and on-demand triggers to ingest content at any time.
Finding the most relevant content
Retrieval-augmented generation (RAG) is the best way to use LLMs to answer questions about domain-specific content. At a high level, it involves taking a user question, finding the most relevant content, and feeding both to an LLM.
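To make the three steps concrete, here is a minimal sketch in Python. It is an illustration only, not Inkeep's implementation: the word-overlap scoring, the function names (`tokenize`, `retrieve`, `build_prompt`), and the sample documents are all assumptions, and the final call to an LLM is left out.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the question.

    Word overlap is a stand-in (assumption) for a real retrieval engine.
    """
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list[str], k: int = 2) -> str:
    """Assemble the retrieved context and the question into one prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Hypothetical knowledge base for illustration.
docs = [
    "To rotate an API key, open Settings and click Regenerate.",
    "Our office is closed on public holidays.",
    "API keys are scoped per project.",
]
prompt = build_prompt("How do I rotate my API key?", docs, k=2)
print(prompt)  # this string would then be sent to the LLM of your choice
```

The unrelated "office" document never reaches the prompt, which is the point of the retrieval step: the model only sees content likely to help it answer.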
RAG relies on finding the relevant documents, and the “chunks” within those documents, needed to answer user questions. The problem is that popular retrieval approaches, like slicing all content into n-character chunks, are often arbitrary and ineffective.
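A small sketch shows why fixed-size slicing is arbitrary. The function name `chunk_by_chars` and the sample text are hypothetical, chosen only to illustrate the naive approach described above.

```python
def chunk_by_chars(text: str, n: int) -> list[str]:
    """Split text into consecutive pieces of at most n characters."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [text[i:i + n] for i in range(0, len(text), n)]


text = "Install the CLI. Then run the login command."
chunks = chunk_by_chars(text, 20)
for chunk in chunks:
    print(repr(chunk))
# 'Install the CLI. The'
# 'n run the login comm'
# 'and.'
```

The boundaries fall in the middle of "Then" and "command", so no single chunk contains the full second instruction. A retriever that matches on "login command" would find only fragments.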
Retrieval becomes even more challenging as the number of documents and sources increases. More content means higher coverage of potential user questions, but it also means more noise and a greater need for a precise retrieval system.
Our retrieval and neural search engines address this by using:
Custom embedding and chunking strategies for each content source we support. The most effective embedding and chunking strategy for a Slack conversation is very different from one for a “How-to” article.
Hybrid search that combines vector and keyword search to balance semantic similarity and keyword matching.
Tailoring of the embedding space to your specific organization and content. Out-of-the-box embedding models don’t account for what we call the “semantic space” of your company and your products. For example, “Retrieval system” is much closer in semantic meaning to “Feature” for Inkeep than for other companies.
Accounting for time, author, source type, and other metadata that’s important for prioritizing trustworthy content.
…and (much) more.
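As one illustration of how vector and keyword results can be combined, the sketch below uses reciprocal rank fusion, a common general-purpose technique. This is an assumption for the sake of example and not a description of Inkeep's engine; the document IDs and the two input rankings are made up, and in practice they would come from a vector index and a keyword index.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both searches rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties break alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a vector search and a keyword search.
vector_ranking = ["faq", "guide", "changelog"]
keyword_ranking = ["guide", "api-ref", "faq"]

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(fused)  # ['guide', 'faq', 'api-ref', 'changelog']
```

Rank-based fusion avoids having to put cosine similarities and keyword scores on the same scale. Metadata signals such as recency or source type could be folded in as additional rankings or as score multipliers.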
If you’re curious about our technical approach, join our newsletter where we share product updates and engineering deep-dives.
Conversational large language models are trained to provide satisfying answers to users. Unfortunately, this makes them prone to providing answers that are unsubstantiated, i.e. “hallucinating”. Dealing with hallucinations is notoriously difficult and a common blocker for many companies.
Here are some of the key ways in which we minimize hallucinations with our grounded-answer system:
Retrieving the right content - When models are not given content that helps them answer a question, they are more likely to hallucinate. That’s in part why we focus so heavily on our search and retrieval engines.
Providing citations - Citations help ground model answers and give end-users an easy way to learn more and verify answers. We also use citations in our automated evaluation system to detect when model answers are drifting from source material.
Staying on topic - We’ve implemented a variety of protections to keep model answers on topic. For example, the bot won’t answer questions unrelated to your company and will guard against giving answers that create a poor perception of your product.
Rapidly experimenting at scale - We continuously test and evaluate our entire retrieval and LLM-stack against both historical and new user questions. This allows us to identify and adopt new techniques while monitoring for regressions.
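To give a feel for how citations can feed an automated check, here is a deliberately simple sketch. It assumes answers cite sources with numeric markers like `[1]`, which is an assumption for illustration and not necessarily the format Inkeep uses; the function name `check_citations` is likewise hypothetical. A production evaluator would also compare the cited text against the claim itself.

```python
import re


def check_citations(answer: str, num_sources: int) -> dict[str, list]:
    """Flag invalid citation markers and sentences that cite nothing."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    invalid: list[int] = []
    uncited: list[str] = []
    for sentence in sentences:
        cited = [int(m) for m in re.findall(r"\[(\d+)\]", sentence)]
        if not cited:
            uncited.append(sentence)
        # A marker outside 1..num_sources points at a source that was never retrieved.
        invalid.extend(c for c in cited if not 1 <= c <= num_sources)
    return {"invalid": invalid, "uncited": uncited}


answer = (
    "Rotate keys in Settings [1]. "
    "Keys expire after 90 days [3]. "
    "Contact sales for pricing."
)
report = check_citations(answer, num_sources=2)
print(report)
# {'invalid': [3], 'uncited': ['Contact sales for pricing.']}
```

Sentences with no citation, or with a citation to a source that was not retrieved, are candidates for closer review because nothing ties them back to source material.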
Even with best-in-class retrieval and grounded-answer systems, feedback loops are the key to continuously improving model performance over time.
Our platform has built-in mechanisms for this, including:
Thumbs up/down feedback from end-users
“Edit answer” feature for administrators
We also provide usage, topical, and sentiment analysis on all user questions. Product and content teams often use these insights to prioritize content creation and product improvements that address root causes of user questions.
To launch something confidently to end-users, it’s essential to have:
Highly available, geo-distributed, low-latency search and chat services
API and UX monitoring
Continuous evaluation of search and chat results
Our platform already handles this at scale and answers tens of thousands of questions per month.
Our team is made up of MIT engineers and app developers passionate about machine learning, data engineering, developer products, and great user experiences. We’re excited to solve the challenges in this space and help companies provide the best devex possible.
Our founding team includes: