YouTube video summary

Stanford CS236: Deep Generative Models I 2023 I Lecture 12 - Energy Based Models

Artificial intelligence · 06 May 2024 · 2 min summary · From Stanford Online

Energy-Based Models (EBMs)

  • EBMs represent the probability distribution of the data through an energy function: the density is proportional to exp(-E(x)), so lower energy means higher probability.
  • Sampling from EBMs is challenging, especially in high dimensions, which makes sampling-based training computationally expensive.
  • This motivates alternative training methods for EBMs that do not require sampling from the model during training.
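
As a rough illustration of why normalization is the bottleneck, here is a minimal one-dimensional sketch. The quadratic energy, the integration range, and the function names are assumptions chosen for illustration, not something taken from the lecture.

```python
import math

def energy(x, mu=0.0, sigma=1.0):
    """Hypothetical quadratic energy: lower energy means higher probability."""
    return 0.5 * ((x - mu) / sigma) ** 2

def partition_function(energy_fn, lo=-10.0, hi=10.0, n=20001):
    """Approximate Z = integral of exp(-E(x)) dx with the trapezoidal rule."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end points
        total += weight * math.exp(-energy_fn(x))
    return total * h

Z = partition_function(energy)

def density(x):
    """Normalized EBM density p(x) = exp(-E(x)) / Z."""
    return math.exp(-energy(x)) / Z

def grid_evaluations(points_per_dim, dims):
    """Energy evaluations a grid-based estimate of Z would need in `dims` dimensions."""
    return points_per_dim ** dims
```

In one dimension the grid is cheap, but `grid_evaluations(100, 10)` is already 10^20 energy evaluations, which is why Z is treated as intractable for realistic data.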

Score Matching

  • The score function, the gradient of the log-density with respect to the input, gives an alternative view of a distribution: it describes the density through its gradients rather than through the likelihood values themselves.
  • The Fisher divergence between two probability densities can be used as a loss function for training EBMs.
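  • The Fisher divergence is the expected squared norm, under the data distribution, of the difference between the gradient of the log data density and the gradient of the log model density, both taken with respect to the input.
  • The data score is unknown, but integration by parts removes it, leaving a loss function that can be evaluated and optimized as a function of the model parameters.
  • The loss encourages the data points to be local maxima of the model's log-density: it pushes the model score toward zero at the data points and the trace of the Hessian toward negative values, which gives a good fit of the model to the data.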
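
The score matching loss can be sketched on a toy problem. The example below assumes a one-dimensional Gaussian-shaped model, whose score and score derivative are available in closed form, and uses the integration-by-parts form of the objective, the average of 0.5 * s(x)^2 + s'(x) over the data. The model family, the synthetic data, and all names are illustrative assumptions.

```python
import random

def score(x, mu, var):
    """d/dx log p(x) for a model with log p(x) = -(x - mu)^2 / (2 var) + const."""
    return -(x - mu) / var

def score_derivative(x, mu, var):
    """d/dx of the score; in 1D this is the whole 'trace of the Hessian' term."""
    return -1.0 / var

def score_matching_loss(data, mu, var):
    """Average of 0.5 * s(x)^2 + s'(x); the normalizing constant never appears."""
    total = 0.0
    for x in data:
        total += 0.5 * score(x, mu, var) ** 2 + score_derivative(x, mu, var)
    return total / len(data)

random.seed(0)
data = [random.gauss(2.0, 1.5) for _ in range(5000)]  # synthetic data (assumption)

mean = sum(data) / len(data)
var = sum((x - mean) ** 2 for x in data) / len(data)

loss_fit = score_matching_loss(data, mean, var)            # parameters matching the data
loss_shifted = score_matching_loss(data, mean + 1.0, var)  # wrong mean
loss_wide = score_matching_loss(data, mean, 2.0 * var)     # wrong variance
```

For this model family the loss is minimized exactly at the sample mean and variance, so `loss_fit` is lower than both perturbed alternatives. With a neural energy function the two derivative terms would come from automatic differentiation instead of closed forms.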

Contrastive Learning

  • An alternative training method for EBMs involves contrasting data to samples from a noise distribution rather than directly to samples from the model.
  • When the discriminator is parameterized in terms of an energy-based model, training it to optimality forces the energy-based model to match the data distribution.
  • Contrastive learning with EBMs involves distinguishing between real data and fake samples generated from a fixed noise distribution.
  • The noise distribution should be close to the data distribution for effective learning.
  • Sampling during inference is not necessary as the trained model can be used as an energy-based model.
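
The argument that the optimal discriminator forces the model to match the data can be written out in a few lines. The notation below (E_\theta for the energy, Z for a learnable normalizer, p_n for the noise density) is an assumption for illustration, and the Bayes-optimal form assumes equal numbers of data and noise samples.

```latex
% Assumed notation: E_\theta energy, Z learnable normalizer, p_n fixed noise density.
\begin{align*}
p_\theta(x) &= \frac{\exp(-E_\theta(x))}{Z}, \\
D_\theta(x) &= \frac{p_\theta(x)}{p_\theta(x) + p_n(x)}, \\
D^*(x) &= \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_n(x)}
  \quad \text{(Bayes-optimal classifier)}, \\
D_\theta(x) = D^*(x) &\iff p_\theta(x)\, p_n(x) = p_{\text{data}}(x)\, p_n(x)
  \iff p_\theta(x) = p_{\text{data}}(x) \quad \text{wherever } p_n(x) > 0.
\end{align*}
```

The last line also shows why the noise distribution needs to cover the data: where p_n(x) = 0, the discriminator places no constraint on the model.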

Noise Contrastive Estimation (NCE)

  • NCE is similar to a GAN in that it uses a binary cross-entropy loss and is a likelihood-free method.
  • Unlike a GAN, NCE does not involve a minimax optimization and is more stable to train.
  • NCE requires the ability to evaluate the likelihood of the contrastive (noise) samples, while a GAN only requires the ability to sample from the generator.
  • In NCE, the discriminator is trained to distinguish between real and noise samples, and the energy function derived from the discriminator defines an energy-based model.
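
A minimal sketch of NCE training on a toy problem is shown below. It assumes one-dimensional data, a fixed Gaussian noise distribution, a model with energy 0.5 * (x - mu)^2, and a learned scalar `c` standing in for log Z. The gradients are written by hand and plain gradient ascent is used, so every name and hyperparameter here is an illustrative assumption rather than the lecture's setup.

```python
import math
import random

NOISE_STD = 2.0  # fixed noise distribution N(0, NOISE_STD^2) (assumption)

def log_noise(x):
    """Exact log-density of the noise; NCE must be able to evaluate this."""
    return -0.5 * (x / NOISE_STD) ** 2 - math.log(NOISE_STD) - 0.5 * math.log(2 * math.pi)

def log_model(x, mu, c):
    """EBM log-density with energy 0.5*(x - mu)^2; c is a learned stand-in for log Z."""
    return -0.5 * (x - mu) ** 2 - c

def discriminator(x, mu, c):
    """P(x is data) = p_model / (p_model + p_noise), computed as a sigmoid of the log-ratio."""
    return 1.0 / (1.0 + math.exp(log_noise(x) - log_model(x, mu, c)))

def nce_fit(data, noise, steps=400, lr=0.3):
    """Gradient ascent on mean log D(data) + mean log(1 - D(noise)); no minimax, no model samples."""
    mu, c = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        g_mu = g_c = 0.0
        for x in data:    # gradient of log D(x)
            w = 1.0 - discriminator(x, mu, c)
            g_mu += w * (x - mu)
            g_c -= w
        for x in noise:   # gradient of log(1 - D(x))
            w = discriminator(x, mu, c)
            g_mu -= w * (x - mu)
            g_c += w
        mu += lr * g_mu / n
        c += lr * g_c / n
    return mu, c

random.seed(0)
data = [random.gauss(1.0, 1.0) for _ in range(1000)]         # synthetic "real" data
noise = [random.gauss(0.0, NOISE_STD) for _ in range(1000)]  # contrastive samples
mu_hat, c_hat = nce_fit(data, noise)
```

In this setup the true values are mu = 1 and log Z = 0.5 * log(2 * pi), so `mu_hat` and `c_hat` should land near them. The normalizer is learned as an ordinary parameter, and the only samples drawn during training come from the noise distribution.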

Flow Contrastive Estimation (FCE)

  • FCE is a variant of NCE where the noise distribution is defined by a normalizing flow model.
  • The flow model is trained adversarially to confuse the discriminator, making the classification problem harder and the noise distribution closer to the data distribution.
  • FCE provides both an energy-based model and a flow model, with the choice of which to use depending on the specific task.
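
One common way to write the FCE objective is as a value function shared by the two models. The notation below is an assumption for illustration (p_\theta for the energy-based model, q_\phi for the flow), with equal numbers of data and flow samples.

```latex
% Assumed notation: p_\theta is the EBM density, q_\phi the normalizing flow density.
\begin{align*}
V(\theta, \phi) &= \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log \frac{p_\theta(x)}{p_\theta(x) + q_\phi(x)}\right]
  + \mathbb{E}_{x \sim q_\phi}\!\left[\log \frac{q_\phi(x)}{p_\theta(x) + q_\phi(x)}\right], \\
&\min_{\phi} \, \max_{\theta} \; V(\theta, \phi).
\end{align*}
```

With \phi held fixed, maximizing over \theta is ordinary NCE with the flow as the noise distribution. The outer minimization over \phi is what makes the classification problem harder, and it works only because a flow provides both exact sampling and exact density evaluation.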