YouTube video summary

Stanford CS329H: Machine Learning from Human Preferences I Guest Lecture: Joseph Jay Williams

Artificial intelligence

22 Nov 2024 · 27 min summary · From Stanford Online

Introduction and Goals

The goal of the talk is to provide a shareable message about human-computer interaction and experimentation, allowing the audience to explain how experimentation can be used in various behavior change contexts and potentially contribute to helping someone win a Nobel Prize for experimentation 52s.
The speaker's research can be found on their website, and they have a mailing list that people can join to stay updated on lab meetings and special meetings 41s.
The speaker's objectives for the talk include providing concrete information that the audience can share with others and inspiring the audience to take action or contribute to the field of experimentation 54s.
The speaker's program aims to coach behavior change by transforming everyday technology touch points into intelligent interventions, using an approach that involves adaptive experimentation tools, personalized and contextualized interventions, and experimentation to accelerate science and practice 2m13s.

Behavior Change and Intelligent Interventions

The speaker's lab has worked on various projects related to behavior change, including papers presented by Ananya Bhattacharjee and others 2m47s.
The science of behavior change can help solve many human problems, and the speaker invites the audience to imagine what behaviors they wish they could start or stop doing 3m3s.
Behavior change is a crucial aspect of various areas, including education and health, and can be influenced by technology to help people make better decisions and change their behavior for good 3m45s.
The vision for 2034 is to have on-demand intelligent coaches that measurably help people think through many problems and change their behavior 4m11s.
The AdapComp framework has been used as a foundation from 2012 to 2024 to design intelligent interventions, which are tools that change behavior and learn over time 4m22s.
Intelligent interventions can be in the form of text messages, app messages, or explanations of concepts, and they try to come up with the best thing to give a person at a specific point in time 4m31s.
To build intelligent interventions, multiple disciplines need to be integrated, and continuous learning is essential 4m47s.
The AdapComp framework takes components of everyday experience and makes them into micro-labs for intelligent intervention, where ideas can be tested and improved over time 5m17s.
Examples of intelligent interventions include text messages, such as a daily message that is crowdsourced and tested to figure out which one is best for a person 5m30s.
Emails can also be turned into intelligent coaches by generating multiple versions and testing what works when, such as getting students to start their homework early 6m4s.
Any everyday experience, such as explanations on a website, can be turned into an intelligent intervention, and it should be able to learn and adapt over time 6m31s.

A/B Testing and Machine Learning

Machine learning from human preferences can be applied to various aspects of life, turning everything into an intelligent intervention that tests out different versions and changes over time 7m54s.
A/B testing can be used to improve learning management systems (LMS) by testing different versions of explanations, messages, and answers to figure out what works best for students 7m37s.
However, A/B testing in LMS is not being utilized to its full potential, with many opportunities for improvement, such as testing 200 times more versions than currently being done 7m52s.
One challenge in implementing A/B testing is achieving sufficient statistical power, especially in small exchanges where the impact on metrics like click rates may be very small 8m15s.
To overcome this challenge, collaborations can be built to conduct A/B testing at scale, such as in education and mental health settings 9m2s.
Another approach is qualitative A/B testing, which involves gathering feedback and thinking through different options, even with limited data, such as when sending an email to only one person 9m24s.
A paper on ABScribe by Mohi Reza explores qualitative A/B testing and its potential applications 9m17s.
A/B testing can also be applied in a more thought-experimental way, using tools like ABScribe to create different versions of an email and thinking through how they would come together 9m43s.

Qualitative A/B Testing and Thought Experiments

The concept of A/B testing can be applied qualitatively, rather than just quantitatively, and a new name for this approach is being sought, with suggestions including "qualitative A/B comparisons" and "super A/B comparisons" 10m36s.
Intelligent intervention involves a process that can work even without quantitative data, consisting of two parts: generating options and exploiting the evaluation of these options, which can be achieved through adaptive experiments that integrate reinforcement learning and human judgment 10m56s.
To generate options, one can use language models (LLMs) and human input to create alternative versions, and then explore the space of options to come up with better ideas 11m10s.
Exploiting the evaluation of options can be done using adaptive experiments that give people what looks better, integrating reinforcement learning and human judgment 11m30s.
A concrete example of this process is assigning probabilities to different versions of an email, such as 70% to one version and 30% to another, and then giving each version to the corresponding percentage of people to gather more data 11m51s.
This approach allows for rethinking the world to not just give 100% or 0% of something, but to give things proportional to the probability of having the best outcome 13m47s.
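The 70%/30% example above can be sketched in a few lines. This is only an illustration: the version names, the `assign_version` helper, and the fixed seed are assumptions made for the sketch, not part of the talk.

```python
import random
from collections import Counter

def assign_version(probabilities, rng):
    """Pick one version at random, with chance proportional to its probability of being best."""
    versions = list(probabilities)
    weights = [probabilities[v] for v in versions]
    return rng.choices(versions, weights=weights, k=1)[0]

# Hypothetical beliefs: 70% that email A is best, 30% that email B is best.
beliefs = {"email_A": 0.7, "email_B": 0.3}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
counts = Counter(assign_version(beliefs, rng) for _ in range(1000))
share_a = counts["email_A"] / 1000  # close to 0.7, but not exactly 0.7
```

Each recipient still gets exactly one version; only the long-run shares follow the stated probabilities, so the less favored version keeps producing data.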

Probabilistic Models and Human Judgment

Assigning items or actions proportionally to the probability of being the best requires defining what "best" means and deciding on the complexity of the model used to weigh different versions against each other 14m7s.
Building a probabilistic model in the background is necessary to estimate the belief of the student and plan subsequent actions, but there are many ways to do this and many different ways to plan subsequent actions 14m33s.
The process of figuring out relevant components of a model and setting probabilities for A/B testing is complex and depends on various factors, with no one-size-fits-all solution 15m8s.
One possible model for setting probabilities in a two-arm case with binary outcomes will be presented, but it has its limitations 15m18s.
An alternative approach is to use a crowdsourced mixture of experts, where a group of people, such as marketing experts, provide their opinions and update their beliefs based on the data 15m33s.
There is a need for more research on getting human beings to set probabilities, and statistical models can be used, but more innovative work is required 15m58s.
The goal is to make A/B testing accessible to 8 billion people, who may not run formal A/B tests but can still use tools to randomize and test different versions of emails or messages 16m11s.
The intention is to rethink the way decisions are made and bring elements of testing together to help think about signals that indicate what might be better or worse 17m23s.
The goal is to encourage randomization, even if it's not a 50/50 split, and to build the habit of A/B testing, which is progress towards making better decisions 17m57s.
The problem of choosing probabilities is a problem to solve, and worrying about it gives confidence that a solution can be found 18m12s.
Research has been done on models that can be used for A/B testing, but there is still a lot of room for improvement and innovation 18m21s.
Quantitative methods are being used to estimate probabilities, but this approach may not be enough to help people make better decisions, and crowdsourced approaches can be more effective in this regard 18m27s.
The quality of work will be improved with crowdsourced approaches, and one way to see this is by using models that tell us what our probabilities are 18m49s.

ABScribe and Thought A/B Testing

An example of an experiment is ABScribe, a tool that enables users to create and test different versions of a message, such as an email, to see which one is more effective 19m21s.
ABScribe allows users to select pieces of the message and make them an A/B test, and then use a language model to generate different versions of the message 20m10s.
The tool can be used to experiment with different versions of a message and see which one works better for a specific audience, such as someone who is very scientific or someone who believes in the benefits of yoga 20m30s.
The goal of ABScribe is to help users create more effective messages by testing different versions and seeing which one performs better 20m53s.
The tool can be used in various contexts, such as writing an email to students before a test, and can help users think through different versions of a message and get feedback from a language model 21m10s.
The approach used in ABScribe is called thought A/B testing or thought A/B comparison, which involves defining options, thinking through different versions, getting feedback from a language model, and traversing the options 21m25s.
The future of this approach is promising, and researchers can build on this work to publish high-quality papers in this area 21m47s.

Study on Stress and Test Performance

A study was conducted to test the effectiveness of a three-minute message in improving test performance, with the message being that stress can actually help improve performance on a test, not hurt it 21m52s.
The study involved six different elements, including varying the text, showing people instructions, a video of the speaker, and asking participants to explain what they thought they'd learn 22m36s.
The core message was tested in many ways, and the results showed that the message on its own, without any additional information, resulted in an average grade of 76% 23m7s.
When the message was elaborated on, such as explaining how stress can help put more resources into the task and make the person pay more attention, the results showed a significant improvement, with an average grade of around 80% 23m38s.
The study involved running six different experiments in a couple of months, which would have taken a couple of years to run with traditional research methods 24m2s.
The results of the study suggest that the message could be effective in improving test performance, but it's not a guarantee, and the probability of it working in a particular population is not 100% 24m29s.
To increase the chances of the message being effective, it's suggested to use crowdsourcing to come up with different versions of the message and then test them using A/B testing and machine learning models 24m36s.
The data from the study could be used to calculate the probability of the message replicating in a particular setting, and this probability could be used to inform decisions about whether to use the message in a particular context 25m7s.
Instead of just using statistical models to predict the effectiveness of the message, it's suggested to give the data to someone who is teaching a class and let them decide whether to use the message based on their own judgment 25m30s.
A class was conducted to demonstrate how quantitative data can be combined with human judgment to make decisions, with the results showing a 76% chance of a particular message being the best to show students 25m34s.

Combining Quantitative Data and Human Judgment

The process involves taking quantitative data, computing probabilities based on a model, and then giving that data to a human for judgment, allowing for the combination of human judgment with model probabilities 25m49s.
This approach may seem subjective, but it can be a good way to connect technical tools with human behavior and decision-making 26m23s.
The approach does not rely on individual estimates of probabilities, but rather on combining different beliefs and models, such as Bayesian combinations of beliefs 26m54s.
There are different tools that can be used in this process, including statistical models, crowdsourcing opinions from people, and combining mathematical concepts with human behavior 27m25s.
The goal is to take mathematical concepts and apply them to real-world problems, rather than just focusing on pure math or theoretical models 28m3s.
This approach requires a combination of technical skills and human intuition, with the goal of designing experiences and interventions that can help people solve problems 28m9s.

Crowdsourcing and Human Feedback

The idea of "high-dimensional intuition" is mentioned, which refers to the ability to navigate complex and messy problems, and to know which problems are worth working on 28m17s.
Crowdsourcing opinions from people is a well-defined process that can be used to combine mathematical concepts with human behavior, and can be reproduced and changed as needed 28m50s.
Human feedback is essential in machine learning, and people are generally bad at quantifying probabilities, making it challenging to work with human preferences 29m25s.
The importance of using probabilities as signals in human feedback is questioned, and alternative signals or lower-effort methods are considered 29m47s.

Adaptive A/B Testing and Reinforcement Learning

Deviating from uniform random sampling can increase power in certain cases, and quantitative adaptive A/B testing is a valuable approach 30m23s.
A/B testing is a fundamental form of reinforcement learning, and Thompson sampling is a method that samples probabilities to make decisions 31m6s.
A simple form of A/B testing is the beta bandit model, which involves two arms (e.g., explanation A vs. B) and a reward outcome (1 or 0) 31m17s.
In the beta bandit model, a policy learning algorithm like Thompson sampling assigns probabilities to each arm, and a beta distribution is used to model the probability of a positive outcome 31m50s.

Beta Bandit Model and Thompson Sampling

A beta distribution with parameters 1 and 1 (beta 1,1) is a uniform distribution, indicating no prior knowledge or data 32m17s.
Using a beta 1,1 distribution as a prior means that all possible probabilities are equally likely, and it's like having no data or one success and one failure 32m21s.
A beta distribution can be used to represent the probability that people will think something is helpful, with each reward (1 or 0) being a draw whose success probability is the parameter described by that beta distribution, and it is simple and interpretable 32m54s.
For example, if five people are given arm one and three of them like it, while two do not, the beta distribution becomes beta 1+3, 1+2, which is skewed a bit higher, indicating a higher probability that people think it's helpful 33m19s.
If five people are given arm two and all of them like it, the beta distribution becomes beta 1+5, 1+0, which is skewed even higher, indicating a high probability that people like it, but still with some uncertainty 33m55s.
Having more observations, such as beta 60 and 10 versus beta 6 and 1, results in a more peaked distribution, indicating more confidence in the probability, even if the mean is the same 34m22s.
The mean of a beta distribution can be calculated, for example, the mean of beta 60 and 10 is 6/7, indicating a high probability that people will like something 34m35s.
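The updates described above (start from beta 1,1, then add successes and failures) can be written out directly. The helper names below are assumptions for this sketch; the formulas for the mean and variance of a beta distribution are standard.

```python
def update_beta(prior, successes, failures):
    """Add observed successes and failures to the two parameters of a beta prior."""
    a, b = prior
    return (a + successes, b + failures)

def beta_mean(a, b):
    """Mean of Beta(a, b)."""
    return a / (a + b)

def beta_variance(a, b):
    """Variance of Beta(a, b); smaller means a more peaked distribution."""
    total = a + b
    return (a * b) / (total ** 2 * (total + 1))

uniform_prior = (1, 1)  # beta 1,1: no data yet

# Arm one: five people, three like it and two do not.
arm_one = update_beta(uniform_prior, successes=3, failures=2)
# Arm two: five people, all five like it.
arm_two = update_beta(uniform_prior, successes=5, failures=0)

# Same mean, different confidence: beta 60,10 versus beta 6,1.
same_mean = (beta_mean(60, 10), beta_mean(6, 1))
spreads = (beta_variance(60, 10), beta_variance(6, 1))
```

The two means in `same_mean` are both 6/7, while the first entry of `spreads` is the smaller one, which is the "more peaked" distribution mentioned above.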
Thompson sampling assigns arms in proportion to the probability that each is best: it samples from the beta distributions and assigns the arm with the highest sampled value 35m48s.
For example, if the beta distribution for arm A is beta 6 and 1, and for arm B is beta 3 and 3, Thompson sampling would sample from each distribution and assign the arm with the higher sampled value, which in this case would usually be arm A 36m0s.
Thompson sampling works by sampling from the beta distributions and comparing the sampled values, and it makes sense because it takes into account the uncertainty of the probabilities 36m35s.
To determine the probability of one arm being better than another in an adaptive experiment, one can sample from the beta distributions 10,000 times and compute the probability empirically, which can result in a probability of 80% or 70% that one arm is better than the other 37m5s.
The inference in adaptive experiments is not trivial, and there are many mistakes people make when using these experiments without realizing it 37m32s.
Sampling works by choosing the arm that gives the highest reward under certain parameters, and one can compute the probability empirically by repeating the process 10,000 times 36m44s.
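A minimal sketch of both ideas, assuming the two-arm beta model above: one Thompson sampling step (draw once from each posterior and assign the arm with the larger draw), and the empirical probability that one arm is better, computed from 10,000 paired draws. The function names and the seed are assumptions for illustration.

```python
import random

def thompson_choose(posteriors, rng):
    """One Thompson sampling step: draw from each arm's beta posterior, keep the largest draw."""
    draws = {arm: rng.betavariate(a, b) for arm, (a, b) in posteriors.items()}
    return max(draws, key=draws.get)

def prob_first_is_better(post_a, post_b, rng, n_draws=10_000):
    """Monte Carlo estimate of P(arm A's success rate > arm B's success rate)."""
    wins = sum(
        rng.betavariate(*post_a) > rng.betavariate(*post_b)
        for _ in range(n_draws)
    )
    return wins / n_draws

rng = random.Random(0)
posteriors = {"A": (6, 1), "B": (3, 3)}  # the example posteriors from the text

next_arm = thompson_choose(posteriors, rng)
p_a_better = prob_first_is_better(posteriors["A"], posteriors["B"], rng)
```

Over many participants, the share of people assigned to A by `thompson_choose` approaches `p_a_better`, which is why Thompson sampling is described as assigning in proportion to the probability of being best.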
The online model can be used to determine the probability of one arm being better than another, and it is possible to formalize human judgment and manual assignment using beta distributions 37m46s.
If a person has a prior belief about the probability of one arm being better than another, it is possible to back out their prior into beta distributions, which can be represented as fictional successes and failures 38m11s.
For example, if someone thinks there is a 70% chance that arm A is better than arm B, they might represent this as 8 successes and 2 failures in arm A, and 5 successes and 5 failures in arm B 38m22s.
There is a need for research on how people express probabilities that one arm is better than another and how to map these probabilities to beta distributions 38m56s.
It is possible to help people formalize their beliefs about whether one arm is better than another by providing interactive explorers or tools that help them understand how their intuitions can be mapped to beta distributions 39m12s.
It is also possible to weight people's approaches differently, for example, by giving more weight to the estimates of someone who is considered an expert 39m37s.
For instance, if someone thinks that an expert's estimates are 10 times more valuable than their own, they can weight the expert's successes and failures 10 times more than their own 40m12s.
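One way to read the weighting idea is to treat each person's belief about an arm as fictional successes and failures, multiply those counts by a weight, and add them to a uniform beta 1,1 prior. The linear weighting rule, the `pooled_beta` helper, and the specific counts below are assumptions for this sketch; the talk notes that how to map beliefs to beta distributions is still an open research question.

```python
def pooled_beta(opinions):
    """Combine weighted fictional counts into one beta distribution.

    opinions: list of (weight, fictional_successes, fictional_failures).
    Starts from a uniform Beta(1, 1) prior.
    """
    a, b = 1.0, 1.0
    for weight, successes, failures in opinions:
        a += weight * successes
        b += weight * failures
    return a, b

# Hypothetical beliefs about one arm.
own_belief = (1.0, 8, 2)      # my own view: 8 successes, 2 failures, weight 1
expert_belief = (10.0, 5, 5)  # expert's view: 5 successes, 5 failures, weight 10

a, b = pooled_beta([own_belief, expert_belief])
pooled_mean = a / (a + b)  # pulled strongly toward the expert's 50%
```

With these numbers the pooled distribution is beta 59, 53, so the combined belief sits much closer to the expert's estimate than to the individual's.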

Combining Expertise and Opinions

A method of combining expertise and opinions involves assigning 50/50 using human judgment or a model, then evaluating the results with both quantitative and qualitative metrics to determine the probability of one option being better than the other 40m21s.
This approach, called constant experimenting, involves continuously adding new ideas and updating probabilities over time 41m2s.
The method requires a lot of trust and effort from users, and it's possible for individuals to try to skew the results if they're very confident about something 41m22s.
A paper titled "Eliciting beliefs from people to get better decision making even when some parties want to sway the vote" investigates ways of gathering feedback from different people to come to a group consensus 41m42s.
The paper examines policies for combining input from different people and investigates weighting schemes, showing that some schemes can suffer from bias if someone tries to sway the majority 41m55s.
The "HiPPO" approach, where decisions are made by the highest paid person's opinion, can be problematic, and a paper on this topic could explore ways to defend against this bias 42m32s.

Language Models and Personalized Interventions

Using language models to quickly refine priors and simulate human behavior is a potential area of research, with a paper by Rob Willer at the university exploring the use of language models to simulate human behavior 43m45s.
Writing a paper on using language models to help with decision-making could be a quick, easy, and highly impactful project 44m1s.
There are numerous potential research papers that can be written on the topic of using large language models (LLMs) to simulate human behavior and provide personalized interventions, with at least 200 possible papers in the next 10 years 44m11s.
Personalized and contextualized interventions are an area of research that involves using LLMs to provide tailored advice and support to individuals in specific situations 44m35s.
Ana's paper on "Small Steps SMS" is an example of a personalized intervention that sends daily messages to help users manage stress and be happier 44m46s.
Another paper by Ana discusses the use of design prompts for self-reflection, allowing users to ask themselves questions to manage stressful situations 44m53s.
A paper on "Stories and Messages" presents a system that provides users with stories about others who are going through similar experiences, and allows users to abstract the lessons to their own setting 45m8s.
A recent paper explores the use of LLMs as a thought partner to help users adapt stories to their own context 45m20s.
Another paper discusses the use of LLMs as a thought partner to help users overcome procrastination, using prompt engineering and interface design to facilitate effective messaging 45m33s.
The "Teny Spark" paper or demo presents an interface that allows users to type in their situation and receive a message from an LLM, with options to customize the tone and style of the response 46m40s.
The interface allows users to edit the prompt and generate multiple messages, and can be used to provide support for a range of situations, including procrastination and stress management 47m13s.
The development of effective interfaces for LLMs is crucial for unlocking their potential value, and can make a significant difference in the user experience 47m31s.
The business value of AI can be demonstrated through the development of use cases that show its impact, such as personalized interventions and support systems 47m51s.

Adaptive Experimentation and Statistical Minorities

Adaptive experimentation using psychology and Human-Computer Interaction (HCI) can help discover the best intervention for a statistical minority, which may be the opposite of what's best on average 48m1s.
A graph is presented showing the outcome of an experiment where students receive messages to generate questions about what they're learning in an online course, with two different messages (blue and red) having different effects on students with higher and lower accuracy 48m31s.
The data shows that on average, the red message is more effective in getting students to generate questions, but when broken down by accuracy, the blue message is more effective for students with lower accuracy, who are a statistical minority 49m1s.
This means that taking just the average would actually discriminate against the statistical minority, and using the red message for everyone would hurt the students who would benefit more from the blue message 49m35s.
The blue message is better for 20% of the data (students with lower accuracy), while the red message is better for 80% of the data (students with higher accuracy), but the blue message is substantially better for the lower accuracy group 50m15s.
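The arithmetic behind "best on average" versus "best for the minority" can be checked with a small example. The 80%/20% group shares come from the summary; the question-generation rates below are invented purely for illustration and are not the study's data.

```python
# Share of students in each group (from the summary above).
group_share = {"higher_accuracy": 0.8, "lower_accuracy": 0.2}

# Hypothetical rates of generating a question, invented for this sketch.
rates = {
    "red": {"higher_accuracy": 0.50, "lower_accuracy": 0.20},
    "blue": {"higher_accuracy": 0.40, "lower_accuracy": 0.45},
}

def overall_rate(message):
    """Population-wide rate: each group's rate weighted by that group's share."""
    return sum(group_share[g] * rates[message][g] for g in group_share)

def best_message(group):
    """Message with the highest rate inside one group."""
    return max(rates, key=lambda m: rates[m][group])

best_on_average = max(rates, key=overall_rate)
best_for_minority = best_message("lower_accuracy")
```

With these made-up numbers red wins on the weighted average while blue wins clearly in the lower accuracy group, so choosing by the average alone would give the minority the worse message.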
Not running experiments and relying on intuition or qualitative data would make this problem bigger, as it would lead to missing the opportunity to discover the subset of students who benefit from the blue message 50m41s.
Using contextual bandits, an experiment can be designed to assign the messages separately for the low accuracy and higher accuracy groups, allowing for adaptation over time to help the higher accuracy group and collect more data for the low accuracy group 51m15s.
This approach would allow for a 70% chance that the red message is better than the blue message for the higher accuracy group, while keeping a 50/50 split for the low accuracy group to collect more data 51m32s.
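A very simple form of contextual bandit keeps a separate set of beta posteriors for each group and runs Thompson sampling within the group. The sketch below is an assumption-laden illustration: the class name, the true rates, and the 20% group share used in the simulation are hypothetical, and unlike the talk's description it lets both groups adapt rather than holding the low accuracy group at 50/50.

```python
import random

class GroupedThompsonSampler:
    """Keeps one beta posterior per (group, arm), each starting from beta 1,1."""

    def __init__(self, groups, arms, rng):
        self.rng = rng
        self.posteriors = {g: {arm: [1, 1] for arm in arms} for g in groups}
        self.pulls = {g: {arm: 0 for arm in arms} for g in groups}

    def choose(self, group):
        """Thompson sampling using only the posteriors of the participant's group."""
        draws = {
            arm: self.rng.betavariate(a, b)
            for arm, (a, b) in self.posteriors[group].items()
        }
        return max(draws, key=draws.get)

    def update(self, group, arm, reward):
        """Record a 1/0 outcome for the chosen arm within the group."""
        self.posteriors[group][arm][0 if reward else 1] += 1
        self.pulls[group][arm] += 1

# Hypothetical true rates, invented for this simulation.
TRUE_RATES = {
    "higher_accuracy": {"red": 0.50, "blue": 0.40},
    "lower_accuracy": {"red": 0.20, "blue": 0.45},
}

rng = random.Random(0)
sampler = GroupedThompsonSampler(TRUE_RATES, ["red", "blue"], rng)
for _ in range(2000):
    group = "lower_accuracy" if rng.random() < 0.2 else "higher_accuracy"
    arm = sampler.choose(group)
    reward = rng.random() < TRUE_RATES[group][arm]
    sampler.update(group, arm, reward)
```

Because each group has its own posteriors, evidence from the majority never overrides what the data say about the minority.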
This approach to experimentation allows for continuous data collection over time, adapting to discover what's better for subgroups, and potentially leading to better outcomes for people in smaller groups, with a 20% increase in statistical power to detect effects 51m55s.
This method can help identify scenarios where option B is better than option A for most people, but might be hurting a minority, and can increase the statistical power to detect this difference 52m11s.
In a traditional experiment, outcomes for minorities can be uniform, with 36% of people generating a question, but this approach can lead to better learning outcomes and increased statistical power to detect the best option for minorities 52m53s.
The statistical power to detect the best option for minorities increased from 87% to 89%, which may not seem like a lot, but can translate to 50 fewer samples needed to be equally confident 53m6s.
The approach can also reduce the number of people needed in an experiment, with even five fewer people being a good outcome 53m18s.

Epsilon Thompson Sampling and Algorithm-Induced Test Statistic

The algorithm used is not Thompson sampling, which was previously thought to be effective, but rather Epsilon Thompson sampling, which involves doing uniform random sampling Epsilon of the time and Thompson sampling the rest of the time 54m0s.
Epsilon Thompson sampling can be thought of as doing some amount of traditional experimentation to reduce the biases of Thompson sampling and guard against changes in the world or missing minority populations 54m32s.
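The rule "uniform random epsilon of the time, Thompson sampling the rest of the time" fits in one function. The function name, the example posteriors, and the choice of epsilon = 0.2 are assumptions for this sketch.

```python
import random

def epsilon_thompson_choose(posteriors, epsilon, rng):
    """With probability epsilon pick an arm uniformly at random; otherwise do Thompson sampling."""
    arms = list(posteriors)
    if rng.random() < epsilon:
        return rng.choice(arms)  # traditional uniform random assignment
    draws = {arm: rng.betavariate(a, b) for arm, (a, b) in posteriors.items()}
    return max(draws, key=draws.get)  # Thompson sampling step

rng = random.Random(0)
# Hypothetical posteriors in which arm A already looks far better than arm B.
posteriors = {"A": (60, 10), "B": (6, 30)}

picks = [epsilon_thompson_choose(posteriors, 0.2, rng) for _ in range(5000)]
share_b = picks.count("B") / len(picks)  # B still gets roughly epsilon / 2 of participants
```

Even when Thompson sampling alone would almost never pick B, the uniform part keeps a steady trickle of data flowing to it, which is the guard against a changing world or a missed minority population.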
The analysis of data from Epsilon Thompson sampling can be done using a test statistic, which is a more effective method than debiasing techniques 55m30s.
When comparing two options, A and B, using Thompson sampling or epsilon Thompson sampling, a Z test can be used to determine if there is a significant difference between the two options, but this assumes uniform random sampling, which is not the case with these algorithms 55m51s.
The algorithm-induced test statistic takes the Z test statistic and regenerates it using the algorithm to collect the data, resulting in a different distribution for the test statistic and a different cutoff value for the hypothesis test 56m21s.
Using this approach, the false positive rate can be fixed, whereas Thompson sampling can push the false positive rate to 15% compared to 5% with uniform random sampling, and the power of the test is not bad, with epsilon Thompson sampling giving a power of 66% 56m50s.
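One way to picture the algorithm-induced test statistic is to simulate the same adaptive algorithm many times under "no difference", compute the Z statistic each time, and read the cutoff off that simulated distribution instead of using 1.96. The sketch below is a simplified illustration under stated assumptions: both arms have a true rate of 0.5 under the null, plain Thompson sampling collects the data, an undefined Z is set to 0, and the sample size and number of simulations are kept small so it runs quickly.

```python
import math
import random

def wald_z(successes, pulls):
    """Wald Z statistic for arm A minus arm B; returns 0.0 when it is undefined."""
    if pulls["A"] == 0 or pulls["B"] == 0:
        return 0.0
    p_a = successes["A"] / pulls["A"]
    p_b = successes["B"] / pulls["B"]
    se = math.sqrt(p_a * (1 - p_a) / pulls["A"] + p_b * (1 - p_b) / pulls["B"])
    return (p_a - p_b) / se if se > 0 else 0.0

def run_thompson_experiment(true_rates, n_participants, rng):
    """Collect data with Thompson sampling, then return the Wald Z statistic."""
    successes = {"A": 0, "B": 0}
    pulls = {"A": 0, "B": 0}
    for _ in range(n_participants):
        draws = {
            arm: rng.betavariate(1 + successes[arm], 1 + pulls[arm] - successes[arm])
            for arm in pulls
        }
        arm = max(draws, key=draws.get)
        pulls[arm] += 1
        successes[arm] += int(rng.random() < true_rates[arm])
    return wald_z(successes, pulls)

def algorithm_induced_cutoff(n_participants, rng, n_sims=400, alpha=0.05, null_rate=0.5):
    """Cutoff for |Z| taken from null experiments run with the same algorithm."""
    null_rates = {"A": null_rate, "B": null_rate}
    null_stats = sorted(
        abs(run_thompson_experiment(null_rates, n_participants, rng))
        for _ in range(n_sims)
    )
    index = min(n_sims - 1, math.ceil((1 - alpha) * n_sims) - 1)
    return null_stats[index]

rng = random.Random(0)
cutoff = algorithm_induced_cutoff(n_participants=100, rng=rng)
```

A real analysis would use far more simulations and the actual experiment's sample size; the point here is only that the cutoff is derived from how the data were collected rather than assumed.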
The key takeaway is that when running experiments adaptively, epsilon Thompson sampling or a version called TS-PostDiff can be used, and then the algorithm-induced test statistic can be applied to get better results 57m22s.
Traditional Thompson sampling has a specific tradeoff between false positives and other errors, but modifying the algorithm can result in other tradeoffs that might be more practical for a study, giving good inference 57m43s.
Thompson sampling is not necessarily on the Pareto frontier, and changing the algorithm can give uniformly better, more statistically sensitive results 58m3s.
The issues with Thompson sampling have been characterized, and using epsilon Thompson sampling, ideally adaptive epsilon Thompson sampling, will give more statistically sensitive results, but the way statistical tests are done should also be modified using algorithm-tuned analysis 58m19s.
The Z test is used to compare means by subtracting the sample mean of people who said they like explanation A from B, and then getting the standard error of the difference, which largely depends on the sample sizes 59m7s.
The paper shows how Thompson sampling affects false positive rates compared to uniform random, and other tests were done to show that approaches such as the t-test and inverse propensity weighting also cause issues 59m31s.
The algorithm-induced test has a power of 17% and controls the false positive rate at 5%, but this power is not great; a false positive means concluding that a difference exists when it does not 59m44s.
This occurs due to sampling issues, where a random low mean is observed, and sampling is stopped, leading to a false conclusion of a difference when there is none 1h0m11s.
The vertical axis represents a sample estimate of the mean, and in reality, both arms have a mean of 0.5, but due to sampling, one arm appears to have a lower mean, leading to a false positive 1h0m26s.
The adaptive algorithm stops sampling from the arm that appears worse, resulting in a false conclusion of a difference, even with 800 participants 1h0m40s.
This issue also explains why there is lower power in experiments, as the sample size is not evenly distributed between arms, leading to a lack of confidence in the results 1h1m26s.
Even when there is no difference in means, the assignment probability can be 80% or higher 53% of the time, indicating a high probability of one arm being better when there is no actual difference 1h1m55s.
Empirical exploration of this issue is necessary, and using an algorithm-tuned analysis can help control the false positive rate to 5% and increase power 1h2m50s.
The algorithm allows for better behavior in experiments, with lower false positive rates and better power, especially when there is a small difference between arms 1h2m40s.

Uniform Random Sampling and Hypothesis Testing

Under uniform random sampling, the Wald Z test is distributed in a way that allows for hypothesis testing, and the probability of getting a test statistic of 1.96 or bigger can be calculated to determine the alpha level 1h3m12s.
Power is the probability of getting a test statistic bigger than 1.96 when there is an arm difference of 0.1, and in this case, power is 80% with a sample size of 785 1h3m41s.
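The 80% figure can be roughly reproduced with the usual normal approximation for a two-sided Z test under equal allocation. The arm rates of 0.5 and 0.6 and the split of about 785 participants into 392 per arm are assumptions for this sketch; the summary only states the arm difference and the total sample size.

```python
import math
from statistics import NormalDist

def wald_power(p_a, p_b, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided Z test comparing two proportions, equal arm sizes."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = math.sqrt(p_a * (1 - p_a) / n_per_arm + p_b * (1 - p_b) / n_per_arm)
    shift = abs(p_a - p_b) / se  # expected size of the Z statistic
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail

# Assumed scenario: arm difference of 0.1, about 785 participants split evenly.
power = wald_power(0.5, 0.6, n_per_arm=392)

# With no true difference the same formula returns the false positive rate, alpha.
false_positive_rate = wald_power(0.5, 0.5, n_per_arm=392)
```

This calculation only holds for uniform random assignment; under Thompson sampling the arm sizes are unequal and data dependent, which is why the simulated, algorithm-induced approach is needed there.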
Thompson sampling (TS) can give extreme values of the test statistic that do not reflect reality, and it has lower power because it reduces the test statistic when there is a difference 1h3m58s.
TS can't tell apart when there's a difference versus when there's not, reducing the signal from the two distributions 1h4m28s.
The algorithm-induced test statistic can be adjusted to set the cut-off values to be higher, giving a 5% false positive rate, but this results in lower power, at 17% 1h4m52s.
The problem is that the data are not discriminating, and TS or adaptive epsilon Thompson sampling drives the distributions apart 1h5m21s.
Statistically sensitive algorithms are needed, and a paper is being worked on to investigate hybrid combinations of reward-maximizing bandits like Thompson sampling with traditional uniform random 1h5m40s.

TS-PostDiff and Adaptive Epsilon Thompson Sampling

Epsilon Thompson sampling has a fixed probability, but it's not clear what epsilon value to use, and it's not the statistician's business to decide 1h6m3s.
A clever idea is to use the posterior probability of the arm difference being less than a certain value (e.g., 0.1) to determine epsilon, which is called TS-PostDiff 1h6m56s.
TS-PostDiff says that if the probability of the arm difference being smaller than the specified value is X, then that X is epsilon, and this adaptive epsilon Thompson sampling adjusts epsilon based on the posterior probability 1h7m31s.
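A hedged sketch of this adaptive rule for two arms: draw once from each posterior, and if the draws differ by less than the threshold, assign uniformly at random; otherwise take a Thompson sampling step. Doing this makes the chance of uniform assignment equal to the posterior probability of a small difference. The function names, the fresh draws used for the Thompson step, and the example posteriors are assumptions, not a statement of the published algorithm's exact details.

```python
import random

def postdiff_choose(posteriors, threshold, rng):
    """Uniform random when sampled arm difference is small, Thompson sampling otherwise."""
    arms = list(posteriors)  # assumes exactly two arms
    first = {arm: rng.betavariate(*posteriors[arm]) for arm in arms}
    if abs(first[arms[0]] - first[arms[1]]) < threshold:
        return rng.choice(arms)
    # Assumption: fresh draws are taken for the Thompson sampling step.
    second = {arm: rng.betavariate(*posteriors[arm]) for arm in arms}
    return max(second, key=second.get)

def implied_epsilon(posteriors, threshold, rng, n_draws=10_000):
    """Monte Carlo estimate of the posterior probability that the arm difference is below threshold."""
    arms = list(posteriors)
    small = sum(
        abs(rng.betavariate(*posteriors[arms[0]]) - rng.betavariate(*posteriors[arms[1]])) < threshold
        for _ in range(n_draws)
    )
    return small / n_draws

rng = random.Random(0)
# Hypothetical posteriors: arms that look alike versus arms that look very different.
similar = {"A": (50, 50), "B": (50, 50)}
different = {"A": (90, 10), "B": (10, 90)}

eps_similar = implied_epsilon(similar, 0.1, rng)      # large: mostly uniform random
eps_different = implied_epsilon(different, 0.1, rng)  # near zero: mostly Thompson sampling
```

The effect is that the experiment behaves like a traditional one while the arms are hard to tell apart and shifts toward reward maximization only once the data suggest a difference worth exploiting.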
The algorithm provides a trade-off across different sample sizes and arm differences or effect sizes, and it is believed to beat existing algorithms because it adapts epsilon, rather than using a fixed uniform random sampling rate such as 10% regardless of the sample size 1h8m2s.
Using adaptive epsilon Thompson sampling (TS-PostDiff) results in a better trade-off between reward, power, and false positive rate, with a fixed false positive rate of 5% 1h8m31s.
The algorithm's power is comparable to other methods, with TS-PostDiff having a power of 82%, uniform random having a power of 87%, and Thompson sampling having a power of 66% 1h8m50s.
The use of TS-PostDiff and the algorithm-tuned test allows for a good trade-off between reward, inference, and false positive rate, making it suitable for use in experiments 1h9m3s.

Future Research and Vision

The development of more statistically sensitive algorithms, such as adaptive epsilon Thompson sampling and algorithm-tuned analyses, is a potential area of future research 1h9m22s.
The vision of experimentation and coaching involves the use of intelligent interventions, such as adaptive experimentation tools like ABScribe, and the use of TS-PostDiff and algorithm-tuned tests 1h10m11s.
The goal of this research is to accelerate science and impact practice through the development of more effective experimentation and coaching methods 1h10m23s.
