YouTube video summary

Sam Partee on Retrieval Augmented Generation (RAG)

09 Feb 2024 · 2 min summary · From InfoQ

Redis Vector Database

Sam Partee, a principal applied AI engineer at Redis, discussed Redis's vector database offering, its integration with various frameworks, and customer use cases at the QCon San Francisco conference.
Redis is particularly suitable for use cases that require real-time processing, such as long-term memory for large language models and semantic caching.
Redis provides two algorithms for vector search: K-nearest neighbors (KNN) brute force search and hierarchical navigable small world (HNSW) approximate nearest neighbors search.
Redis supports both hashes and JSON documents for storing data.
Vector searches in Redis can be either plain vector searches or range queries.
Hybrid searches, or filtered searches, combine vector search with other types of search features like text search, tag filters, geographic search, and polygon search.
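
The query types listed above can be illustrated with a minimal, in-memory sketch. It does not use Redis or its client API: the `DOCS` records, the `tag` field, and the function names are all hypothetical, and brute-force cosine distance stands in for whichever index and metric a real deployment would use.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical records: each has an id, a tag for filtering, and a vector.
DOCS = [
    {"id": "doc:1", "tag": "redis", "vector": [1.0, 0.0, 0.0]},
    {"id": "doc:2", "tag": "redis", "vector": [0.9, 0.1, 0.0]},
    {"id": "doc:3", "tag": "other", "vector": [0.8, 0.2, 0.0]},
    {"id": "doc:4", "tag": "other", "vector": [0.0, 1.0, 0.0]},
]


def knn_search(query, docs, k):
    """Brute-force KNN: score every document, keep the k closest."""
    ranked = sorted(docs, key=lambda d: cosine_distance(query, d["vector"]))
    return [d["id"] for d in ranked[:k]]


def range_search(query, docs, radius):
    """Range query: keep every document within `radius` of the query."""
    return [
        d["id"] for d in docs
        if cosine_distance(query, d["vector"]) <= radius
    ]


def hybrid_search(query, docs, k, tag):
    """Filtered search: apply the tag filter first, then run KNN on the rest."""
    filtered = [d for d in docs if d["tag"] == tag]
    return knn_search(query, filtered, k)


if __name__ == "__main__":
    q = [1.0, 0.0, 0.0]
    print(knn_search(q, DOCS, k=2))
    print(range_search(q, DOCS, radius=0.05))
    print(hybrid_search(q, DOCS, k=2, tag="other"))
```

A KNN query always returns exactly k results, however distant they are, while a range query returns however many documents fall inside the radius. HNSW trades some of the exactness of the brute-force scan shown here for speed on large collections.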

Semantic Search Techniques

There are two main approaches to representing documents in vector space: using an LLM to summarize the entire document, or splitting the document into sentences and using vector search to find the relevant sentence and its surrounding context.
The speaker advocates for trying various techniques, including traditional machine learning methods, to find the best approach for semantic search.
Using sentence-by-sentence embeddings with large language models (LLMs) may not provide enough uniqueness for every sentence, especially if the query contains a lot of semantic information.
The speaker suggests using LLMs to create paper summaries, as they can pack more information than random sections of a paper.
A technique called "hypothetical document embeddings" (HyDE) is discussed, which involves using a hallucinated answer from an LLM to search for the right context or answer in a database.
The speaker emphasizes the effectiveness of using a generated review to search hotel reviews, as it returns more relevant results than searching for specific features or amenities.
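
The HyDE idea can be sketched in a few lines. This is only an illustration under stated assumptions: `generate_hypothetical_review` is a hypothetical stand-in that returns a canned string where a real system would call an LLM, `embed` is a toy bag-of-words embedding rather than a real embedding model, and the `REVIEWS` data is made up for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(text, documents, k=1):
    """Rank documents by similarity to the embedding of `text`."""
    query_vec = embed(text)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:k]


def generate_hypothetical_review(query):
    """Hypothetical stand-in for an LLM call: returns a canned fake review."""
    return "Quiet room with a clean, comfortable bed and friendly staff."


def hyde_search(query, documents, k=1):
    """HyDE: embed a generated answer instead of the raw user query."""
    hypothetical = generate_hypothetical_review(query)
    return search(hypothetical, documents, k)


# Made-up review data for illustration.
REVIEWS = [
    "Loud street noise all night and the staff were rude.",
    "Quiet room, clean sheets, comfortable bed, and very friendly staff.",
    "The pool was closed and breakfast was cold.",
]

if __name__ == "__main__":
    question = "Where can I get a good night's sleep?"
    print("raw query:", search(question, REVIEWS))
    print("HyDE:     ", hyde_search(question, REVIEWS))
```

In this toy setup the raw question matches the noisy-hotel review because both mention "night", while the generated review looks like the documents being searched and so lands on the relevant one. The generated text does not need to be factually correct; it only needs to resemble a good answer.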

Challenges in On-Premise AI Training

On-premise AI training has a high barrier to adoption compared to using APIs like OpenAI's.
The barrier to entry for Triton's HPS API is much higher than for OpenAI's API.
Acquiring data center GPUs, particularly those with CUDA capabilities, is challenging due to high demand.
AMD chips are an alternative to CUDA-enabled GPUs, but CUDA still dominates the AI industry.
Cloud platforms like Google Cloud, Lambda, and Hugging Face also face GPU shortages.
