YouTube video summary

Stanford CS236: Deep Generative Models I 2023 I Lecture 13 - Score Based Models

Artificial intelligence

06 May 2024 · 4 min summary · From Stanford Online

Score-based models

Score-based models, also known as diffusion models, are a state-of-the-art class of generative models for continuous data modalities like images, videos, speech, and audio.
Unlike likelihood-based models that work with probability density functions, score-based models focus on the gradient of the log density, known as the score function.
Score-based models offer an alternative interpretation of probability distributions by representing them as vector fields or gradients, which can be computationally advantageous.
Score-based models address the challenge of normalization by modeling data using the score instead of the density, allowing for more flexible parameterizations without strict normalization constraints.
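The point about normalization can be checked numerically. The sketch below is a toy illustration, not taken from the lecture: it uses a hypothetical unnormalized 1-D Gaussian (`log_unnormalized`, with an arbitrary additive constant `log_scale`) and a finite-difference `score` helper, and shows that the score is the same whatever constant multiplies the density.

```python
import math

def log_unnormalized(x, mu=1.0, sigma=2.0, log_scale=0.0):
    """Log of an unnormalized Gaussian density; log_scale adds an arbitrary constant."""
    return log_scale - 0.5 * ((x - mu) / sigma) ** 2

def score(log_density, x, h=1e-5):
    """Central finite-difference estimate of d/dx log_density(x)."""
    return (log_density(x + h) - log_density(x - h)) / (2.0 * h)

x = 0.3
# Same density shape, two different (unknown to the model) normalization constants.
s_a = score(lambda t: log_unnormalized(t, log_scale=0.0), x)
s_b = score(lambda t: log_unnormalized(t, log_scale=7.5), x)
s_exact = -(x - 1.0) / 2.0 ** 2   # analytic score of N(1, 2^2)

print(s_a, s_b, s_exact)
```

Because the constant disappears under differentiation, a model that outputs scores directly never needs to compute it.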

Score matching

Score matching is a technique used to train energy-based models by fitting the model's score function to match the score function of the data distribution.
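The Fisher divergence, which measures the difference between two probability distributions as the expected squared distance between their scores, can be rewritten (using integration by parts) so that it depends only on the model's score and its derivatives, enabling efficient optimization without computing the partition function or knowing the data score.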
Score matching can be applied to a wide range of model families beyond energy-based models, as long as the gradient of the log density with respect to the input can be computed.
Score matching directly models the gradients (scores) rather than the likelihood, and it does not involve a normalization constant or latent variables.
The gradient of the log density is called the "score" in the literature (compare the Fisher score in statistics), hence the name "score matching."
Score matching aims to estimate the gradient of the log data density in order to model the data.
The Fisher divergence is used to measure the difference between the true and estimated vector fields of gradients.
Minimizing the Fisher divergence as a function of theta is a reasonable learning objective.
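As a rough illustration of the rewritten objective, the sketch below assumes a 1-D Gaussian model N(mu, var), for which the score is s(x) = -(x - mu) / var and the per-sample score matching term is 0.5 * s(x)^2 + ds/dx. The model family, the synthetic data, and the name `score_matching_loss` are assumptions made for this example only. For this family the minimizer works out to the sample mean and sample variance, which the code checks against nearby parameter values.

```python
import random

def score_matching_loss(data, mu, var):
    """Empirical score matching objective for a 1-D Gaussian model N(mu, var).

    Per-sample term: 0.5 * s(x)^2 + ds/dx, with s(x) = -(x - mu) / var.
    No normalization constant and no data score appear anywhere.
    """
    total = 0.0
    for x in data:
        s = -(x - mu) / var
        ds_dx = -1.0 / var
        total += 0.5 * s * s + ds_dx
    return total / len(data)

rng = random.Random(0)
data = [rng.gauss(2.0, 1.5) for _ in range(5000)]   # synthetic data, true variance 2.25

# Closed-form minimizer for this model family: sample mean and sample variance.
mu_hat = sum(data) / len(data)
var_hat = sum((x - mu_hat) ** 2 for x in data) / len(data)

best = score_matching_loss(data, mu_hat, var_hat)
print(mu_hat, var_hat, best)
```

For richer models the derivative term becomes the trace of the Jacobian of the score, which is the expensive part that the later variants try to avoid.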

Denoising score matching

Denoising score matching addresses the computational cost of score matching by estimating the score of the data distribution after it has been perturbed with noise.
Denoising score matching is computationally much cheaper, and it approximates the score of the clean data well when the noise level is relatively small.
The speaker introduces a method to approximate the score of the data density perturbed with noise, denoted q_σ.
This approximation is achieved by replacing the Fisher divergence between the model and the data with the Fisher divergence between the model and the noise-perturbed data density.
The key idea is that when the noise level σ is small, the noise-perturbed data density q_σ is close to the original data density, so the estimated scores are similar.
The resulting algorithm involves sampling data points, adding Gaussian noise, and estimating the denoising score matching loss based on the mini-batch.
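The mini-batch procedure can be sketched as follows. This is a toy example under stated assumptions: the data are 1-D standard normal samples, the score model is the hypothetical one-parameter family s(x) = -theta * x, and the names `dsm_loss`, `theta_hat`, and `theta_true` are illustrative. With Gaussian noise the regression target is -(x_noisy - x) / sigma^2, and for this model the loss can be minimized in closed form by least squares.

```python
import random

def dsm_loss(theta, clean, noisy, sigma):
    """Mini-batch denoising score matching loss for the score model s(x) = -theta * x.

    The regression target is the score of the Gaussian noise kernel,
    d/dx_noisy log q(x_noisy | x) = -(x_noisy - x) / sigma^2.
    """
    total = 0.0
    for x, x_noisy in zip(clean, noisy):
        target = -(x_noisy - x) / sigma ** 2
        total += 0.5 * (-theta * x_noisy - target) ** 2
    return total / len(clean)

rng = random.Random(0)
sigma = 0.5
clean = [rng.gauss(0.0, 1.0) for _ in range(20000)]        # sample data points
noisy = [x + sigma * rng.gauss(0.0, 1.0) for x in clean]   # add Gaussian noise

# Least-squares minimizer of dsm_loss for this one-parameter model.
numerator = sum(xn * (xn - x) for x, xn in zip(clean, noisy)) / sigma ** 2
denominator = sum(xn * xn for xn in noisy)
theta_hat = numerator / denominator

# The perturbed density is N(0, 1 + sigma^2), whose score is -x / (1 + sigma^2).
theta_true = 1.0 / (1.0 + sigma ** 2)
print(theta_hat, theta_true)
```

The fitted parameter recovers the score of the perturbed density q_σ rather than of the clean data, which is why a small sigma is needed for the two to be close.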

Denoising score matching (continued)

Denoising score matching is a technique for training score-based generative models.
It involves adding noise to data points and training a model to estimate the noise that was added.
The goal is to minimize the loss between the estimated noise and the actual noise.
This approach is scalable and easier to implement than directly modeling the distribution of clean data.
Minimizing the denoising objective is equivalent, up to a constant that does not depend on the model parameters, to minimizing the Fisher divergence to the noise-perturbed data density.
The optimal denoising strategy involves following the gradient of the log density of the noise-perturbed data.
The technique is applicable to various noise distributions as long as the gradient of the log noise density can be computed.
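The link between noise prediction, scores, and denoising can be made concrete for Gaussian noise. In the sketch below, which assumes 1-D standard normal data and a hypothetical linear noise predictor eps_hat(x_noisy) = w * x_noisy, the predicted noise is converted to a score via s = -eps_hat / sigma, and a denoised estimate is obtained by a step of size sigma^2 along that score (the relation usually called Tweedie's formula). All names are illustrative.

```python
import random

rng = random.Random(1)
sigma = 0.8
n = 20000
clean = [rng.gauss(0.0, 1.0) for _ in range(n)]
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
noisy = [x + sigma * z for x, z in zip(clean, noise)]

# Least-squares fit of a linear noise predictor eps_hat(x_noisy) = w * x_noisy.
w = sum(z * xn for z, xn in zip(noise, noisy)) / sum(xn * xn for xn in noisy)

def score_from_noise_predictor(x_noisy):
    """Convert predicted noise into a score estimate: s = -eps_hat / sigma."""
    return -(w * x_noisy) / sigma

def denoise(x_noisy):
    """Move along the estimated score, scaled by sigma^2, to estimate the clean point."""
    return x_noisy + sigma ** 2 * score_from_noise_predictor(x_noisy)

x_test = 1.5
score_exact = -x_test / (1.0 + sigma ** 2)      # score of q_sigma = N(0, 1 + sigma^2)
posterior_mean = x_test / (1.0 + sigma ** 2)    # E[x | x_noisy] in this Gaussian toy case
print(score_from_noise_predictor(x_test), score_exact)
print(denoise(x_test), posterior_mean)
```

In this toy case the learned noise predictor, the score of the perturbed density, and the best denoiser are three views of the same quantity.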

Sliced score matching

Random projections can be used to efficiently approximate the original score matching loss.
Sliced Fisher divergence is a variant of the Fisher divergence that involves Jacobian-vector products, which can be computed efficiently using backpropagation.
The projection operation takes the dot product of the data and model scores with a random direction, so only their components along that direction are compared.
Biasing the projections towards certain directions does not seem to make a significant difference in practice.
The cost of sliced score matching is roughly constant with respect to the data dimension, and it performs similarly to exact score matching.
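A small numerical sketch of the random-projection idea follows. It assumes a hypothetical 3-D score model (`score_model`) and random sign vectors as projection directions, and it substitutes a central finite difference for the Jacobian-vector product that automatic differentiation would normally supply. Averaged over directions, the projected term v^T J v approaches the trace of the Jacobian, and the sliced per-sample term approaches the exact score matching term.

```python
import random

def score_model(x):
    """Hypothetical nonlinear score model in d dimensions (illustrative only)."""
    d = len(x)
    return [-x[i] ** 3 - 0.5 * x[(i + 1) % d] for i in range(d)]

def projected_derivative(x, v, h=1e-4):
    """Approximate v^T J(x) v, where J is the Jacobian of score_model.

    A central finite difference along v stands in for the Jacobian-vector
    product that automatic differentiation would provide.
    """
    plus = score_model([xi + h * vi for xi, vi in zip(x, v)])
    minus = score_model([xi - h * vi for xi, vi in zip(x, v)])
    return sum(vi * (p - m) / (2.0 * h) for vi, p, m in zip(v, plus, minus))

def sliced_term(x, v):
    """Per-sample sliced score matching term: v^T J v + 0.5 * (v^T s)^2."""
    s = score_model(x)
    proj = sum(vi * si for vi, si in zip(v, s))
    return projected_derivative(x, v) + 0.5 * proj ** 2

rng = random.Random(0)
x = [0.5, -1.0, 2.0]
n = 20000
directions = [[rng.choice((-1.0, 1.0)) for _ in x] for _ in range(n)]

trace_estimate = sum(projected_derivative(x, v) for v in directions) / n
trace_exact = sum(-3.0 * xi ** 2 for xi in x)   # sum of the Jacobian's diagonal

sliced_estimate = sum(sliced_term(x, v) for v in directions) / n
exact_term = trace_exact + 0.5 * sum(si ** 2 for si in score_model(x))
print(trace_estimate, trace_exact)
print(sliced_estimate, exact_term)
```

Each direction needs only one directional derivative, regardless of the dimension, whereas the exact trace needs one derivative per coordinate. In practice one or a few directions per data point are used rather than thousands; the large `n` here is only to make the agreement visible.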

Inference in diffusion models

Inference (sampling) in diffusion models can be done by following the gradient of the log probability density or by using Markov chain Monte Carlo (MCMC) methods.
Langevin dynamics sampling is a method for generating samples from a density using the estimated gradient: each step moves along the score and adds Gaussian noise.
Langevin dynamics sampling is a valid MCMC procedure in the limit of small step sizes and an infinite number of steps.
Real-world data tends to lie on low-dimensional manifolds, which can cause problems for Langevin dynamics sampling because the score is poorly estimated away from the data.
Diffusion models provide a way to fix this problem by estimating the scores more accurately all over the space, which gives the sampler better guidance.
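A minimal sketch of Langevin dynamics sampling is shown below. It assumes the target is a 1-D Gaussian N(3, 2^2) whose exact score is known, so `score` is a stand-in for a trained score model. The step size, number of steps, and number of chains are arbitrary choices for the example. The update rule is x <- x + step * score(x) + sqrt(2 * step) * z with z drawn from a standard normal.

```python
import math
import random

def score(x, mu=3.0, sigma=2.0):
    """Exact score of N(mu, sigma^2); a trained score model would replace this."""
    return -(x - mu) / sigma ** 2

def langevin_sample(rng, n_steps=300, step=0.1, x0=0.0):
    """Unadjusted Langevin dynamics: x <- x + step * score(x) + sqrt(2 * step) * z."""
    x = x0
    for _ in range(n_steps):
        x = x + step * score(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)
samples = [langevin_sample(rng) for _ in range(2000)]   # 2000 independent chains
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)
```

With a finite step size the chain's stationary variance is slightly larger than the target's, which reflects the statement above that the procedure is exact only in the limit of small steps and many iterations.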
