Intro 0s
- This session covers using a repository as the source for Retrieval Augmented Generation (RAG), how that appears in Copilot Chat, and the lessons learned along the way 10s.
- RAG translates into project awareness in Copilot Chat, which each platform (VS Code, Visual Studio, and JetBrains) achieves through a different method 34s.
- The methods include the "at workspace" (@workspace) participant and intent detection in VS Code, hashtags in Visual Studio, and implicit intent detection in JetBrains 40s.
- The feature is currently being rolled out in JetBrains, and users of the JetBrains extension version 1.5.26 or later may have seen it 56s.
- The presentation will focus on the implementation details for JetBrains, but the general concepts are applicable to all implementations 1m2s.
- The speaker, Kimin Miguel, is a senior software engineer working on Copilot for IDEs, focusing on project context for JetBrains chat and on prompt crafting for code completions 1m22s.
- Kimin Miguel is based in France and enjoys taking care of plants and playing Final Fantasy 14 in their free time 1m32s.
Agenda 1m41s
- The agenda breaks down how project context works in JetBrains and explains the key steps behind it (local indexing, local search, and the final reranking step) before turning to the constraints behind each step 1m44s.
Project Context in JB Chat 2m0s
- Project context is a feature that helps large language models provide more accurate and relevant responses by enriching the prompt with code snippets from the user's codebase, allowing the model to reference symbols and constructs defined in the codebase and provide more helpful answers 2m29s.
- The goal of project context is to ground the model with factual data, reducing the likelihood of generic or misleading responses 2m43s.
- Project context is surfaced differently in each IDE: through the "at workspace" (@workspace) participant in VS Code, hashtags in Visual Studio, and implicit intent detection in JetBrains 3m1s.
- The team initially planned to expose project context in JetBrains through an explicit "at project" (@project) command, but low usage numbers led them to switch to implicit intent detection 3m11s.
- At the time of the talk, about 11.8% of chat answers in JetBrains include project context 3m36s.
- The demo shows project context in a live scenario: the user asks a question, and the model answers with references to relevant code snippets 4m11s.
- Project context can be enabled or disabled through a checkbox in the chat panel; users who do not have it enabled yet will receive it after the Universe update 4m17s.
- The answer lists references to relevant code snippets from the user's codebase; clicking one opens the corresponding snippet in the code editor 4m50s.
- When project context is enabled, the model takes a little longer to respond, and an info message tells the user that it is collecting relevant project context 5m18s.
How does project context work 5m35s
- Local project context is built using three main building blocks: local indexing, the first ranking pass, and the second ranking pass, which work together to provide relevant snippets from a project to answer a user's question 5m37s.
- The local indexing step starts when a project is opened in the IDE and the extension is activated; it indexes the project in the background without blocking, tokenizing files with Microsoft's byte pair encoding (BPE) tokenizer and splitting each file into chunks of 500 tokens 6m48s.
- When a user asks a question, the query is processed and the result is used to retrieve a broad set of candidate snippets from the local index through the first ranking pass, which ranks them with the BM25 scoring function 6m5s.
- The top snippets from the first ranking pass (roughly 50; 47 in the current implementation) are then refined in a second ranking pass, which uses the Text Embedding 3 model to compute embeddings (vector representations) of the snippets and the user query and compares them using cosine similarity 7m52s.
- The top five snippets from the second ranking pass are included in the prompt and used to generate the answer, which also shows references to the original files and selection ranges 8m26s.
- The local indexing process is done on startup and is a non-blocking process that runs in the background, allowing the user to interact with the project while the indexing is being completed 7m11s.
- The keywords related to the user's question are used to find the most similar snippets in the local index, which are then ranked and selected for inclusion in the prompt 7m32s.
- Together, keyword search (BM25) and embedding-based reranking narrow the whole project down to a few relevant snippets, grounding the answer in the user's own code 6m39s.
What constraints are we operating under? 8m41s
- The key takeaway from the session is the need to balance engineering and science when implementing a solution, with the goal of achieving a snappy experience that provides relevant information backed by data 8m51s.
- To achieve this balance, it is essential to iterate, experiment, and ultimately compromise between engineering and science 9m8s.
- The following sections walk through the constraints and key decisions at each step, starting with local indexing 9m19s.
Local Indexing 9m19s
- Local indexing does not persist the index, meaning that when a project is closed, everything gets released, and when the project is reopened, everything is reindexed 9m19s.
- Indexing needs to be done in the background to avoid blocking operations, which can be achieved through multi-threading 9m50s.
- One fixed bug involved a file containing 10,000 nested arrays, which took about a minute to index 10m3s.
- Files listed in .gitignore or excluded in the IDE are not indexed, and node_modules directories and Python virtual environments are skipped as well 10m29s.
- A cap is placed on the number of files and code chunks stored in memory to prevent indexing and ranking from taking too long 10m47s.
- If a project has more than 10,000 files, only partial project context can be offered, and an info message is displayed when project context is invoked on large projects 11m9s.
- The time it takes to index a project depends on the number of files, with smaller repos taking around 5 seconds, medium-sized repos taking around 50 seconds, and large repos taking over 2 minutes 11m34s.
- Keeping everything in memory makes it possible to listen to file system notifications and quickly reindex changed files, so the index reflects the latest state of the codebase 12m7s.
- The chunking strategy is fixed-size chunking with 500 tokens; alternatives such as semantic chunking could give better results, although they may come with a performance hit 12m47s.
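The indexing steps above can be sketched as a small in-memory index keyed by file path. This is a minimal Python sketch under stated assumptions, not the extension's implementation: a whitespace split stands in for the BPE tokenizer, and the excluded directory names and the `is_ignored` callback (standing in for .gitignore and IDE exclusions) are hypothetical. Only the 500-token chunk size and the 10,000-file cap come from the talk.

```python
import os
from dataclasses import dataclass, field

CHUNK_TOKENS = 500   # fixed chunk size from the talk
MAX_FILES = 10_000   # file cap from the talk; beyond it, context is partial
# Hypothetical exclusion list standing in for node_modules / Python environments.
EXCLUDED_DIRS = {"node_modules", ".venv", "venv", ".git"}


def tokenize(text: str) -> list[str]:
    # Assumption: whitespace split as a stand-in for the BPE tokenizer.
    return text.split()


@dataclass
class Chunk:
    path: str
    start_token: int
    tokens: list[str]


@dataclass
class InMemoryIndex:
    chunks_by_path: dict[str, list[Chunk]] = field(default_factory=dict)

    def index_file(self, path: str, text: str) -> None:
        """(Re)index one file: replace its old chunks with fixed-size chunks."""
        tokens = tokenize(text)
        self.chunks_by_path[path] = [
            Chunk(path, start, tokens[start:start + CHUNK_TOKENS])
            for start in range(0, len(tokens), CHUNK_TOKENS)
        ]

    def remove_file(self, path: str) -> None:
        """Drop a deleted file's chunks; nothing is persisted to disk."""
        self.chunks_by_path.pop(path, None)

    def all_chunks(self) -> list[Chunk]:
        return [c for chunks in self.chunks_by_path.values() for c in chunks]


def collect_files(root: str, is_ignored=lambda path: False) -> tuple[list[str], bool]:
    """Walk the project, skipping excluded paths; return (files, truncated)."""
    files: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_ignored(path):
                continue
            if len(files) >= MAX_FILES:
                return files, True  # cap reached: only partial project context
            files.append(path)
    return files, False
```

Because chunks are keyed by path, a file-system notification can simply call `index_file` again for a changed file (or `remove_file` for a deleted one) without touching the rest of the index, which is the freshness benefit of the in-memory design.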
First ranking pass 13m31s
- The first ranking pass involves extracting keywords from the user question and using the BM25 scoring function to score documents, in this case code snippets, against those keywords 13m36s.
- BM25 scores a document by how often each keyword appears in it (normalized by document length) and by how rare the keyword is across all documents, i.e., how few documents contain it 13m55s.
- Two key questions were how to extract keywords from a natural language question and how to make it fast, with the solution being to leverage an LLM (Large Language Model) to do the work 14m14s.
- The LLM used is GPT-3.5 Turbo or GPT-4o mini, hosted on Azure OpenAI, which extracts the most relevant keywords along with synonyms and variations 14m29s.
- Testing showed that GPT-4o mini returned more generic results than GPT-3.5 Turbo, highlighting the need for A/B experimentation when swapping models 14m52s.
- The keyword and synonym request is independent of the repository size and takes between 600 milliseconds and 1.5 seconds, depending on the query complexity 15m14s.
- The processing time of the BM25 scoring logic scales with the number of indexed chunks, with search times ranging from 900 milliseconds for a small repository to 10 seconds for a large one 15m36s.
- To optimize performance, only a subset of snippets is passed to the second ranking step, with the current approach selecting the top 47 highest-scored snippets 16m29s.
- The fixed snippet count may not work for larger repositories, and experimenting with scaling the number of snippets to pass to the second ranking pass is being considered 16m48s.
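The first ranking pass can be illustrated with a plain BM25 scorer over tokenized chunks. In this sketch the keyword list is assumed to have already come back from the LLM keyword/synonym request, tokens are assumed to be pre-normalized, and `k1 = 1.2`, `b = 0.75` are the common textbook defaults rather than values from the talk; only the cutoff of 47 snippets comes from the source.

```python
import math
from collections import Counter


def bm25_top_k(keywords: list[str], docs: list[list[str]], k: int = 47,
               k1: float = 1.2, b: float = 0.75) -> list[int]:
    """Score each tokenized chunk against the keywords with BM25 and return
    the indices of the k best chunks that match at least one keyword."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n_docs
    term_counts = [Counter(d) for d in docs]
    # Document frequency: number of chunks containing each (deduplicated) keyword.
    df = {q: sum(1 for tc in term_counts if q in tc) for q in set(keywords)}
    scored = []
    for i, (doc, tc) in enumerate(zip(docs, term_counts)):
        score = 0.0
        for q, n_q in df.items():
            f = tc[q]  # Counter returns 0 for missing keys
            if f == 0:
                continue
            # IDF: rarer keywords weigh more; the +1 keeps it positive.
            idf = math.log((n_docs - n_q + 0.5) / (n_q + 0.5) + 1)
            # Term frequency saturates via k1 and is length-normalized via b.
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        scored.append((score, i))
    # Highest score first; ties keep index order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for score, i in scored[:k] if score > 0]
```

For example, with chunks `[["login", "user", "token"], ["render", "button"], ["token", "token", "refresh"]]` and keywords `["token", "login"]`, the first chunk ranks above the third because it matches the rarer keyword. Note that every query loops over every chunk, which is why search time grows with the number of indexed chunks.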
Second ranking pass and final prompt 17m2s
- The 47 snippets and the user query are sent to the Text Embedding 3 model hosted on Azure OpenAI, which returns a vector for each string 17m3s.
- The vectorized snippets are then compared to the vectorized user query using cosine similarity, and the top five results are selected 17m19s.
- There are two levers that can be adjusted in this process: the amount of data sent to the embedding model and the vector size 17m23s.
- Increasing the amount of data sent to the model increases the time it takes to process, with 48 strings taking around 4.5 seconds 17m38s.
- The vector size can also be adjusted, with the default size being 1,536 dimensions, but reducing the dimensions can result in an accuracy cost 18m3s.
- A compromise was reached by using 1,024 dimensions, which speeds up the process to around 4-4.5 seconds without significantly impacting the ranking output 19m1s.
- The snippets behind the top five vectors are kept and included in the prompt, with some exceptions, such as files under content exclusion policies 19m18s.
- References to the files and code snippets included in the prompt are displayed to the user 19m51s.
- The process involves some waiting time, and an info message is displayed to inform users that responses may take a little longer 20m19s.
- Users can opt out of project context, or add file references directly to the prompt if they know which files are useful 20m27s.
- A file picker can be used to select files and add them directly to the context that will be sent as part of the prompt 20m54s.
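The second pass can be sketched as cosine-similarity reranking over precomputed embeddings. The sketch assumes the vectors for the candidate snippets and the query have already been returned by the embedding model (no API call is made here). It reduces them to 1,024 dimensions by truncating and re-normalizing client-side; that is an assumption, since the text-embedding-3 models also accept a `dimensions` request parameter and the talk does not say which route is used. The `Snippet` type, the `is_excluded` callback, and the reference format are hypothetical.

```python
import math
from dataclasses import dataclass

EMBED_DIMS = 1024  # reduced dimension count from the talk
TOP_N = 5          # number of snippets kept for the prompt


@dataclass
class Snippet:
    path: str
    start_line: int
    end_line: int
    text: str
    vector: list[float]


def shorten(vec: list[float], dims: int = EMBED_DIMS) -> list[float]:
    """Keep the first `dims` components and L2-normalize them (assumption)."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rerank(query_vec, snippets, is_excluded=lambda path: False,
           top_n: int = TOP_N, dims: int = EMBED_DIMS) -> list[Snippet]:
    """Drop excluded files, then keep the snippets most similar to the query."""
    q = shorten(query_vec, dims)
    candidates = [s for s in snippets if not is_excluded(s.path)]
    candidates.sort(key=lambda s: cosine(q, shorten(s.vector, dims)), reverse=True)
    return candidates[:top_n]


def build_context(snippets: list[Snippet]) -> tuple[str, list[str]]:
    """Format snippets for the prompt and return the references shown to the user."""
    parts, refs = [], []
    for s in snippets:
        ref = f"{s.path}:{s.start_line}-{s.end_line}"
        parts.append(f"# {ref}\n{s.text}")
        refs.append(ref)
    return "\n\n".join(parts), refs
```

Filtering excluded files before taking the top five keeps the prompt at five snippets whenever enough candidates remain; whether the extension filters before or after ranking is not stated in the talk.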
Summary 21m43s
- Building blocks for project context include local indexing, local search, and reranking using embeddings, which are essential components for creating a functional system 21m46s.
- A balance between engineering and science is necessary: the two are in a healthy tension and feed into each other, so every change must be weighed for its impact on both sides 21m54s.
- Having a baseline to compare against is crucial, because not every change is a good change; the baseline is what makes evaluation and improvement possible 22m8s.
- The needs and goals of the users should not be forgotten, as the ultimate objective is to create a helpful and successful product that enables users to accomplish more during their day 22m16s.