In recent years, the field of natural language processing (NLP) has witnessed unprecedented innovation, driven by significant advances in Generative AI models. Yet modern text generation models still cannot access up-to-date information or, more generally, information that is not part of their training data. Retrieval-Augmented Generation (RAG) was introduced to address this gap. However, RAG has shown significant weaknesses when it comes to contextual precision and personalization. In this blog post, we share our progress in building a more precise knowledge management and retrieval platform based on Contextual Retrieval-Augmented Generation (CRAG), a more advanced architecture.
Classic Retrieval-Augmented Generation
At its core, classic Retrieval-Augmented Generation represents a departure from traditional generative models, which rely solely on patterns learned from training data. Instead, RAG leverages external knowledge sources, such as large-scale text corpora or structured databases, to augment the generation process. By dynamically retrieving and incorporating relevant information from these repositories, RAG-equipped models can generate text that is grounded in real-world facts and context, allowing organizations to incorporate internal information into AI models.
While RAG was originally proposed as a model trained end-to-end, today's implementations typically deviate from this by chaining pre-trained, fixed models into a pipeline. During the indexing phase, the text corpus is split document by document into chunks of text (usually around 1,000 characters each). These chunks are then encoded into vector representations: either sparse vectors using techniques like TF-IDF (Term Frequency-Inverse Document Frequency), or dense vectors using embedding models based on the same transformer architecture that powers Large Language Models like ChatGPT. These vectors capture the content of each chunk, allowing for not only comparison but also retrieval.
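To make the indexing step concrete, here is a minimal sketch in Python. It assumes the sentence-transformers library and a simple fixed-size chunking strategy; the model name and chunk size are illustrative, not prescriptive.

```python
from sentence_transformers import SentenceTransformer

# Illustrative embedding model; any transformer-based embedder works similarly.
model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk_document(text: str, chunk_size: int = 1000) -> list[str]:
    """Split a document into fixed-size character chunks (~1,000 chars each)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def index_corpus(documents: list[str]):
    """Chunk each document and encode the chunks into dense vectors."""
    chunks = [c for doc in documents for c in chunk_document(doc)]
    embeddings = model.encode(chunks, normalize_embeddings=True)
    return chunks, embeddings
```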
During the retrieval phase, the input prompt (e.g. “Who is the CEO?”) is encoded into the same vector space, allowing the most relevant document chunks to be retrieved based on their semantic similarity to the prompt. Once the relevant chunks are retrieved, they are passed to the generative model, which incorporates this retrieved information along with the input prompt to generate the final output.
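Continuing the sketch above, retrieval could look as follows. It reuses the `model`, `chunks`, and `embeddings` names from the indexing sketch, and `top_k` is an illustrative default.

```python
import numpy as np

def retrieve(prompt: str, chunks: list[str], embeddings: np.ndarray,
             top_k: int = 5) -> list[str]:
    """Return the top_k chunks most semantically similar to the prompt."""
    query_vec = model.encode([prompt], normalize_embeddings=True)[0]
    # With normalized vectors, the dot product equals cosine similarity.
    scores = embeddings @ query_vec
    best = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in best]

context = "\n\n".join(retrieve("Who is the CEO?", chunks, embeddings))
final_prompt = (f"Answer using the context below.\n\nContext:\n{context}\n\n"
                f"Question: Who is the CEO?")
# final_prompt is then passed to the generative model of choice.
```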
State-of-the-art open-source models like Llama 3 and Mixtral 8x7B, or commercially available counterparts like OpenAI's GPT-4 or Anthropic's Claude 3, reason well over the information presented to them while generating an answer to the user prompt. Presenting wrong, outdated, or irrelevant information to the model raises the chance of hallucination or misbehavior. In a RAG pipeline it is therefore crucial not only to retrieve all relevant information (high recall) but also to eliminate noise (high precision) before presenting the context to the LLM.
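As a small illustration of this tradeoff, retrieved chunks can be capped and thresholded before being handed to the LLM. The helper below is hypothetical and its cutoff values are illustrative assumptions, not tuned parameters.

```python
def filter_context(scored_chunks: list[tuple[str, float]],
                   top_k: int = 5, min_score: float = 0.3) -> list[str]:
    """Keep at most top_k chunks and drop those below a similarity cutoff.

    Raising min_score trades recall for precision; lowering it does the reverse.
    """
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, score in ranked[:top_k] if score >= min_score]
```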
Path towards Contextual RAG
The larger and more complex an organization grows, the more outside factors influence which information is relevant to a given prompt. Consider an employee in Europe versus one located in South-East Asia: the prompt “What is the holiday guideline?” needs to yield different results depending on the context of each person (or user). Classic RAG architectures do optimize the retrieval process through hybrid search (sparse combined with dense vectors), HyDE (generating a hypothetical answer to the question and retrieving based on that answer instead of the prompt), or more advanced methods such as fine-tuning embedding models or merging the embedding and text generation models (GRIT). Yet most solutions lack context about the user, resulting in low recall in the information sent to the text generation model.
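To illustrate HyDE specifically, here is a minimal sketch assuming the official openai client and reusing the embedding `model` from the indexing sketch; the chat model name is illustrative.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def hyde_query_vector(prompt: str):
    """HyDE: embed a hypothetical answer instead of the question itself."""
    completion = client.chat.completions.create(
        model="gpt-4o",  # illustrative; any capable chat model works
        messages=[{"role": "user",
                   "content": f"Write a short passage that plausibly answers: {prompt}"}],
    )
    hypothetical_answer = completion.choices[0].message.content
    # Retrieval then proceeds with this vector instead of the prompt's.
    return model.encode([hypothetical_answer], normalize_embeddings=True)[0]
```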
To solve this challenge, we present Contextual Retrieval-Augmented Generation (CRAG).
To optimize recall, CRAG improves two parts of the RAG chain:
- Indexing: During the indexing phase, CRAG persists not only sparse and dense vectors of the relevant chunks but also contextual metadata in the form of a content profile, represented by another dense vector.
- Retrieval: Alongside the user prompt, a user profile, encoded in the same vector space as the content profiles, is sent to the document store. In addition to the hybrid search scores (TF-IDF and semantic similarity), a profile score is returned, promoting results that are not only the best match content-wise but also the most relevant to the current user's profile (see the sketch after this list).
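Here is a minimal sketch of how the three signals could be fused at query time. The linear combination and its weights are illustrative assumptions; in practice, a document store would typically perform this fusion server-side.

```python
import numpy as np

def crag_score(sparse_scores: np.ndarray,
               dense_scores: np.ndarray,
               profile_scores: np.ndarray,
               w_sparse: float = 0.3,
               w_dense: float = 0.5,
               w_profile: float = 0.2) -> np.ndarray:
    """Fuse the three per-chunk signals into a single ranking score.

    profile_scores holds the similarity between the user profile vector
    and each chunk's content profile vector. The weights are illustrative
    assumptions and would be tuned per deployment.
    """
    return w_sparse * sparse_scores + w_dense * dense_scores + w_profile * profile_scores

# Rank chunks by the fused score, highest first:
# ranking = np.argsort(crag_score(sparse, dense, profile))[::-1]
```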
Content profiles serve as rich representations of the organizational context of document chunks, capturing not only the textual content but also the broader thematic relevance and domain-specific nuances of the organization. Furthermore, content profiles can be fine-tuned for every organization, allowing them to cater to organization-specific characteristics and preferences. By encoding this contextual metadata into dense vectors, CRAG enables more precise retrieval of relevant chunks based on their semantic similarity to both the user prompt and the user profile. User profiles, in turn, encapsulate the preferences, interests, and characteristics of individual users, allowing the system to tailor retrieval results to personalized relevance criteria.
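One plausible way to construct such profiles is to serialize contextual metadata into text and embed it with the same model used for the chunks; the helper and attribute keys below are hypothetical.

```python
def encode_profile(attributes: dict[str, str]):
    """Encode a content or user profile into the shared dense vector space.

    Reuses the embedding model from the indexing sketch; the attribute
    keys below are purely illustrative examples of contextual metadata.
    """
    profile_text = "; ".join(f"{k}: {v}" for k, v in attributes.items())
    return model.encode([profile_text], normalize_embeddings=True)[0]

content_profile = encode_profile({"department": "HR", "region": "Europe",
                                  "topic": "holiday policy"})
user_profile = encode_profile({"role": "engineer", "region": "Europe",
                               "office": "Berlin"})
# With normalized vectors, their dot product yields the profile score
# used in the fused ranking sketched above.
```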
Together, these profiles form the backbone of CRAG's retrieval optimization strategy, enabling the system to deliver highly relevant and personalized content recommendations that optimize recall. By leveraging the rich semantic representations encoded within content and user profiles, CRAG empowers organizations to extract maximum value from their knowledge repositories, driving enhanced decision-making and productivity across diverse use cases and domains.
Build on CRAG with us
Existing attempts to combine generative AI with corporate data have yielded poor results. We believe it takes an advanced approach like CRAG to unleash the full potential of generative AI in the enterprise. We are thrilled about the results we’re already seeing with CRAG and can’t wait to bring it to more leading enterprises.
Thousands of users are already experiencing the benefits of an AI-powered knowledge platform like Zive today, and more are joining every day.
Ready to supercharge your employees with AI? Sign up for a free demo and experience the results yourself.