A short derivation of the drift and diffusion coefficients for diffusion models.
\(\begin{equation} \label{eq:ideal_sde} \rmd \bfX_t = \underbrace{\vphantom{\sqrt{\frac{\rmd \sigma_t^2}{\rmd t}}}\frac{\rmd \log \alpha_t}{\rmd t}}_{f(t)} \bfX_t \; \rmd t + \underbrace{\sqrt{\frac{\rmd \sigma_t^2}{\rmd t} - 2\sigma_t^2 \frac{\rmd \log \alpha_t}{\rmd t}}}_{g(t)}\; \rmd \bfW_t \end{equation}\)
As I was writing up my Ph.D. thesis
This is a very convenient property for diffusion models, as it implies that sampling the forward process at any time $t$ reduces to the simple closed form \(\begin{equation} \bfx_t = \alpha_t \bfx_0 + \sigma_t \boldsymbol \epsilon, \qquad \boldsymbol \epsilon \sim \mathcal{N}(\boldsymbol 0, \boldsymbol I), \end{equation}\) which enables the flexible use of simulation-free training techniques
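The following minimal Python/NumPy sketch makes the closed form concrete by showing simulation-free forward sampling. The schedule $\alpha_t = \cos(\pi t/2)$, $\sigma_t = \sin(\pi t/2)$ and the helper name `sample_forward` are our own assumptions for illustration. They are not choices made in this post.

```python
import numpy as np

# Hypothetical schedule, assumed for illustration only (not from the post):
# alpha_t = cos(pi t / 2) and sigma_t = sin(pi t / 2) on t in [0, 1).
def alpha(t):
    return np.cos(0.5 * np.pi * t)

def sigma(t):
    return np.sin(0.5 * np.pi * t)

def sample_forward(x0, t, rng):
    """Draw x_t = alpha_t x0 + sigma_t eps with eps ~ N(0, I) in a single step."""
    eps = rng.standard_normal(np.shape(x0))
    return alpha(t) * x0 + sigma(t) * eps

rng = np.random.default_rng(0)
x0 = np.array([1.0, -2.0, 0.5])
x_t = sample_forward(x0, 0.3, rng)  # no SDE integration needed
```

Because $\bfx_t$ is available in closed form at any $t$, a training loop built this way can draw a fresh $t$ for each example without ever integrating the SDE.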
Goal. In this blog post I walk through a derivation of these coefficients, starting with the ideal transition kernel, and then deriving the corresponding SDE which produces this transition kernel. My hope is that this is helpful to other researchers diving into the maths behind diffusion models.
Before diving into the derivation, we cover some useful maths on the transition kernel. Consider a general $d$-dimensional Itô SDE driven by a standard $d'$-dimensional Brownian motion $\{\bfW_t : 0 \leq t \leq T\}$, \(\begin{equation} \label{eq:sde} \rmd \bfX_t = \bsf(t, \bfX_t)\; \rmd t + \bsg(t, \bfX_t)\; \rmd \bfW_t. \end{equation}\)
For the reader unfamiliar with SDEs, one can think of the first term in \eqref{eq:sde}, $\bsf(t, \bfX_t)\; \rmd t$, as a standard ordinary differential equation (ODE) governed by the drift coefficient $\bsf: [0,T] \times \R^d \to \R^d$. The second term is built from the diffusion coefficient $\bsg: [0,T] \times \R^d \to \R^{d\times d'}$ and the infinitesimal $\rmd \bfW_t$. It can be thought of as another differential equation that is driven by the noisy signal $\bfW_t$. Many technical details are required for Itô integration to be well-defined, and we elide them in this post. For further details we recommend Peter Holderrieth’s excellent blog post.
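The Euler-Maruyama scheme, the simplest discretization of \eqref{eq:sde}, makes the ODE-plus-noise intuition concrete. In it, the drift contributes a forward-Euler step and the diffusion contributes a Gaussian increment with variance $\Delta t$. The following Python/NumPy sketch is an illustration only, and `euler_maruyama` is a hypothetical helper name. This post does not use the scheme.

```python
import numpy as np

def euler_maruyama(f, g, x0, t0, t1, n_steps, rng):
    """Approximate one path of dX_t = f(t, X_t) dt + g(t, X_t) dW_t.

    f(t, x) must return a (d,) drift vector and g(t, x) a (d, d') diffusion
    matrix; dW is a d'-dimensional Brownian increment with covariance dt * I.
    """
    x = np.array(x0, dtype=float)
    dt = (t1 - t0) / n_steps
    for k in range(n_steps):
        t = t0 + k * dt
        G = g(t, x)
        dW = np.sqrt(dt) * rng.standard_normal(G.shape[1])
        # The drift term is a forward-Euler ODE step; the diffusion term adds noise.
        x = x + f(t, x) * dt + G @ dW
    return x

rng = np.random.default_rng(0)
# With g = 0 the scheme is plain forward Euler on the ODE dx/dt = -x.
x1 = euler_maruyama(lambda t, x: -x, lambda t, x: np.zeros((1, 1)),
                    [1.0], 0.0, 1.0, 1000, rng)
```

Setting $\bsg \equiv \boldsymbol 0$ recovers an ordinary ODE solver. This is one way to see the first term of \eqref{eq:sde} as a standard ODE.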
To understand the transition kernel $q_{t|0}(\bfx_t | \bfx_0)$ of this SDE, it is helpful to understand the dynamics of $\ex[\bfX_t]$ and $\var[\bfX_t]$. In other words, we would like to find functions $\boldsymbol \mu: [0,T] \times \R^d \to \R^d$ and $\boldsymbol \Sigma: [0,T] \times \R^d \to \R^{d\times d}$ such that \(\begin{align} \rmd \ex[\bfX_t] &= \boldsymbol\mu(t, \bfX_t) \; \rmd t,\\ \rmd \var[\bfX_t] &= \boldsymbol\Sigma(t, \bfX_t) \; \rmd t. \end{align}\) Written in this form, it seems natural to apply the chain rule from calculus, since \eqref{eq:sde} gives us an expression for $\rmd \bfX_t$.
Unlike in traditional calculus, the chain rule for Itô calculus is given by the famous Itô’s lemma (or Itô’s formula), and has a second-order correction term which can be thought of as accounting for the complexities of integration against rough stochastic signals.
Consider the Itô SDE in \eqref{eq:sde}. Then, for a sufficiently smooth function $\phi: [0,T] \times \R^d \to \R^{d''}$, we can write \(\begin{equation} \label{eq:itolemma} \begin{aligned} \rmd \phi(t, \bfX_t) &= \bigg(\frac{\partial}{\partial t}\phi(t, \bfX_t) + \innerprod{\nabla_\bfx \phi(t, \bfX_t)}{\bsf(t, \bfX_t)}\\ &\qquad + \frac 12 \innerprod{\nabla_\bfx^2 \phi(t, \bfX_t)}{\bsg(t, \bfX_t)\bsg(t, \bfX_t)^\top}_F\bigg)\;\rmd t\\ & + \innerprod{\nabla_\bfx \phi(t, \bfX_t)}{\bsg(t, \bfX_t)\;\rmd \bfW_t}, \end{aligned} \end{equation}\) where $\innerprod{\cdot}{\cdot}_F$ is the Frobenius inner product.
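A quick numerical check shows why the second-order term cannot be dropped. This Python/NumPy sketch assumes the simplest case, $\bfX_t = \bfW_t$ in one dimension (so $\bsf = 0$ and $\bsg = 1$), together with $\phi(x) = x^2$. For this choice, \eqref{eq:itolemma} reads $\rmd (W_t^2) = \rmd t + 2W_t\;\rmd W_t$. The variable names are our own.

```python
import numpy as np

# X_t = W_t (f = 0, g = 1, d = 1) and phi(x) = x^2.
# Ito's lemma: d(W_t^2) = 1 dt + 2 W_t dW_t, where "1 dt" is the
# (1/2) * phi''(x) * g^2 = (1/2) * 2 * 1 second-order correction.
rng = np.random.default_rng(0)
T, n_steps = 1.0, 100_000
dt = T / n_steps
dW = np.sqrt(dt) * rng.standard_normal(n_steps)
W = np.concatenate([[0.0], np.cumsum(dW)])   # W at the grid points, W_0 = 0

# Left-endpoint (Ito) Riemann sum for int_0^T 2 W_t dW_t.
stochastic_term = np.sum(2.0 * W[:-1] * dW)

lhs = W[-1] ** 2 - W[0] ** 2                  # phi(W_T) - phi(W_0)
naive_chain_rule = stochastic_term            # ordinary calculus: d(W^2) = 2 W dW
ito_formula = stochastic_term + T             # adds int_0^T 1 dt
```

The gap between `lhs` and `naive_chain_rule` is the quadratic variation $\sum_k (\Delta W_k)^2 \approx T$. This gap does not shrink as the grid is refined, and it is exactly the term the ordinary chain rule misses.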
The Itô integral in the last term of \eqref{eq:itolemma} has zero expectation (under standard integrability conditions). Hence, for a sufficiently smooth $\phi$, we can take the expectation of both sides and formally divide by $\rmd t$ to find \(\begin{equation} \begin{aligned} \frac{\rmd \ex[\phi(t, \bfX_t)]}{\rmd t} &= \ex\left[\frac{\partial \phi}{\partial t}\right] + \ex\left[\innerprod{\nabla_\bfx \phi(t, \bfX_t)}{\bsf(t, \bfX_t)}\right]\\ &+ \frac 12\ex\left[\innerprod{\nabla_\bfx^2\phi(t,\bfX_t)}{\bsg(t, \bfX_t)\bsg(t, \bfX_t)^\top}_F\right]. \end{aligned} \end{equation}\) Now let $\phi$ denote the identity map $(t, \bfx) \mapsto \bfx$, for which $\partial_t \phi = \boldsymbol 0$, $\nabla_\bfx \phi = \boldsymbol I$, and $\nabla_\bfx^2 \phi = \boldsymbol 0$. Then we arrive at the rather elegant ODE \(\begin{equation} \label{eq:mean} \frac{\rmd \ex[\bfX_t]}{\rmd t} = \ex[\bsf(t, \bfX_t)]. \end{equation}\) Recall that the covariance matrix can be defined as \(\begin{equation} \var[\bfX_t] = \ex[(\bfX_t - \ex[\bfX_t])(\bfX_t - \ex[\bfX_t])^\top]. \end{equation}\) We apply the same recipe to $\phi(t, \bfx) = \bfx\bfx^\top$ and subtract the dynamics of $\ex[\bfX_t]\ex[\bfX_t]^\top$ obtained from \eqref{eq:mean}. A little algebra then gives \(\begin{equation} \label{eq:var} \begin{aligned} \frac{\rmd \var [\bfX_t]}{\rmd t} &= \ex[\bsf(t, \bfX_t)(\bfX_t - \ex[\bfX_t])^\top]\\ &+ \ex[(\bfX_t - \ex[\bfX_t])\bsf(t, \bfX_t)^\top]\\ &+ \ex[\bsg(t, \bfX_t)\bsg(t, \bfX_t)^\top]. \end{aligned} \end{equation}\) For more details on deriving these equations for the mean and covariance of Itô processes
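The sketch below is a sanity check of \eqref{eq:mean} and \eqref{eq:var}. It takes a hypothetical scalar Ornstein-Uhlenbeck process $\rmd X_t = -\theta X_t\;\rmd t + b\;\rmd W_t$, where $\theta$, $b$, and $x_0$ are arbitrary choices of ours. It integrates the two moment ODEs with forward Euler. It then compares the result with a Monte Carlo ensemble simulated by Euler-Maruyama and with the closed-form solutions.

```python
import numpy as np

# Hypothetical scalar OU process: dX_t = -theta X_t dt + b dW_t, X_0 = x0.
# Here E[f] = -theta m and E[f (X - m)] = -theta v, so the moment ODEs read
#   dm/dt = -theta m,    dv/dt = -2 theta v + b^2.
theta, b, x0 = 2.0, 0.5, 1.5
T, n_steps, n_paths = 1.0, 1000, 50_000
dt = T / n_steps
rng = np.random.default_rng(0)

m, v = x0, 0.0                    # moment ODE state; v(0) = 0 for a fixed start
X = np.full(n_paths, x0)          # Monte Carlo ensemble for comparison
for _ in range(n_steps):
    # Forward Euler on the moment ODEs (right-hand side uses the old m, v).
    m, v = m - theta * m * dt, v + (-2.0 * theta * v + b**2) * dt
    # Euler-Maruyama step applied to every path at once.
    X = X - theta * X * dt + b * np.sqrt(dt) * rng.standard_normal(n_paths)

m_exact = x0 * np.exp(-theta * T)
v_exact = b**2 / (2 * theta) * (1 - np.exp(-2 * theta * T))
```

Because the drift is linear, $\ex[f(t, X_t)]$ depends only on the mean, so the moment equations close. For a nonlinear drift they generally would not.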
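In the context of diffusion models, we often work within the much simpler framework of affine coefficients, i.e., \(\begin{equation} \label{eq:linear_ito} \rmd \bfX_t = f(t)\bfX_t\; \rmd t + g(t)\; \rmd \bfW_t, \end{equation}\) where $f, g: [0,T] \to \R$ are scalar functions of time and $\bfW_t$ is a standard $d$-dimensional Brownian motion (i.e., $d' = d$).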
Given this SDE, we will derive the drift and diffusion coefficients that yield the desired transition kernel in \eqref{eq:tran_kernel}. That is, we will spend the rest of this blog post proving the following proposition.
Given the linear Itô SDE in \eqref{eq:linear_ito}, a strictly monotonically decreasing smooth function $\alpha_t \in \mathcal C^\infty([0,T];\R_{>0})$, a strictly monotonically increasing smooth function $\sigma_t \in \mathcal C^\infty([0,T]; \R_{\geq 0})$, with boundary conditions $\alpha_0 = 1$ and $\sigma_0 = 0$; and a desired transition kernel of the form \(\begin{equation} \label{eq:tran_kernel} q_{t|0}(\bfx_t|\bfx_0) = \mathcal N(\bfx_t; \alpha_t\bfx_0, \sigma_t^2 \boldsymbol I), \end{equation}\) the drift and the diffusion coefficients for the linear SDE are \(\begin{align} f(t) &= \frac{\rmd \log \alpha_t}{\rmd t},\\ g^2(t) &= \frac{\rmd \sigma_t^2}{\rmd t} - 2\sigma_t^2 \frac{\rmd \log \alpha_t}{\rmd t}. \end{align}\)
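The proposition can be evaluated numerically for any schedule. The Python/NumPy sketch below does so with central finite differences. The schedule $\alpha_t = \cos(\pi t/2)$, $\sigma_t = \sin(\pi t/2)$ (restricted to $t < 1$ so that $\alpha_t > 0$) and the helper `drift_and_diffusion` are assumptions for illustration. For this schedule the proposition gives $f(t) = -\tfrac{\pi}{2}\tan(\pi t/2)$ and $g^2(t) = \pi\tan(\pi t/2)$, which the finite differences should reproduce.

```python
import numpy as np

# Hypothetical schedule, assumed for illustration: alpha_t = cos(pi t / 2),
# sigma_t = sin(pi t / 2). We only evaluate t < 1, where alpha_t > 0.
def alpha(t):
    return np.cos(0.5 * np.pi * t)

def sigma(t):
    return np.sin(0.5 * np.pi * t)

def drift_and_diffusion(alpha_fn, sigma_fn, t, h=1e-5):
    """Evaluate f(t) and g(t) from the proposition with central differences."""
    dlog_alpha = (np.log(alpha_fn(t + h)) - np.log(alpha_fn(t - h))) / (2 * h)
    dsigma2 = (sigma_fn(t + h) ** 2 - sigma_fn(t - h) ** 2) / (2 * h)
    f = dlog_alpha
    g2 = dsigma2 - 2.0 * sigma_fn(t) ** 2 * dlog_alpha
    # g^2 >= 0 for decreasing alpha and increasing sigma; clip tiny round-off.
    return f, np.sqrt(np.maximum(g2, 0.0))

ts = np.array([0.2, 0.5, 0.8])
f, g = drift_and_diffusion(alpha, sigma, ts)
```

Consider any variance-preserving schedule ($\alpha_t^2 + \sigma_t^2 = 1$). Substituting $\rmd\sigma_t^2/\rmd t = -2\alpha_t^2 f(t)$ collapses the formula to $g^2(t) = -2f(t)$, and this schedule is one such example.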
Remark. The SDE in \eqref{eq:linear_ito}, started from a fixed $\bfx_0$, describes a Gaussian process. Its transition kernel is therefore entirely determined by the mean vector and covariance matrix, whose dynamics are given by \eqref{eq:mean} and \eqref{eq:var}.
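The following Python/NumPy sketch shows the proposition and the remark at work. It simulates \eqref{eq:linear_ito} with the coefficients the proposition prescribes for the same hypothetical schedule, $\alpha_t = \cos(\pi t/2)$ and $\sigma_t = \sin(\pi t/2)$. It then compares the empirical mean and variance at $T = 0.8$ with $\alpha_T x_0$ and $\sigma_T^2$. The function names, $x_0$, and the step and path counts are our own choices. Euler-Maruyama introduces a small $O(\Delta t)$ bias.

```python
import numpy as np

# Same hypothetical schedule: alpha_t = cos(pi t / 2), sigma_t = sin(pi t / 2).
# The proposition then gives f(t) = -(pi/2) tan(pi t / 2) and
# g(t)^2 = pi tan(pi t / 2); we stop at T = 0.8 so both stay finite.
def f(t):
    return -0.5 * np.pi * np.tan(0.5 * np.pi * t)

def g(t):
    return np.sqrt(np.pi * np.tan(0.5 * np.pi * t))

def simulate_linear_sde(x0, T, n_steps, n_paths, rng):
    """Euler-Maruyama for dX_t = f(t) X_t dt + g(t) dW_t, all paths starting at x0."""
    dt = T / n_steps
    X = np.full(n_paths, float(x0))
    for k in range(n_steps):
        t = k * dt
        X = X + f(t) * X * dt + g(t) * np.sqrt(dt) * rng.standard_normal(n_paths)
    return X

rng = np.random.default_rng(0)
x0, T = 2.0, 0.8
X_T = simulate_linear_sde(x0, T, n_steps=2000, n_paths=40_000, rng=rng)
target_mean = np.cos(0.5 * np.pi * T) * x0     # alpha_T x0
target_var = np.sin(0.5 * np.pi * T) ** 2      # sigma_T^2
```

Each Euler-Maruyama step is affine in the previous state plus Gaussian noise. The simulated $X_T$ is therefore exactly Gaussian too, mirroring the remark.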
We start by deriving the drift coefficient. Let $\boldsymbol \mu(t) = \ex[\bfX_t]$. Then by \eqref{eq:mean} we have the ODE \(\begin{equation} \frac{\rmd \boldsymbol\mu}{\rmd t}(t) = f(t)\boldsymbol \mu(t), \end{equation}\) with initial condition $\boldsymbol \mu(0) = \bfx_0$. Multiplying both sides by the integrating factor $\exp\left(-\int_0^t f(\tau)\;\rmd \tau\right)$ and integrating, we find the mean vector \(\begin{equation} \boldsymbol \mu(t) = \bfx_0 e^{\int_0^t f(\tau)\;\rmd \tau}. \end{equation}\) From our definition of the transition kernel we know that $\boldsymbol \mu(t) = \alpha_t \bfx_0$ for every $\bfx_0$. We can therefore derive $f(t)$ in terms of the schedule $\alpha_t$: \(\begin{align} \alpha_t \bfx_0 &= \bfx_0 e^{\int_0^t f(\tau)\;\rmd \tau},\nonumber\\ \alpha_t &= e^{\int_0^t f(\tau)\;\rmd \tau},\nonumber\\ \log \alpha_t &= \int_0^t f(\tau)\;\rmd \tau,\nonumber\\ \frac{\rmd \log \alpha_t}{\rmd t} &= f(t). \end{align}\)
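The integrating-factor solution can be checked by solving the mean ODE numerically. This Python/NumPy sketch integrates $\rmd\boldsymbol\mu/\rmd t = f(t)\boldsymbol\mu$ with classical fourth-order Runge-Kutta. It uses $f(t) = \rmd\log\alpha_t/\rmd t$ for the hypothetical schedule $\alpha_t = \cos(\pi t/2)$. The names and step count are our own choices.

```python
import numpy as np

# Hypothetical schedule: alpha_t = cos(pi t / 2), so
# f(t) = d log(alpha_t) / dt = -(pi/2) tan(pi t / 2) on t in [0, 1).
def alpha(t):
    return np.cos(0.5 * np.pi * t)

def f(t):
    return -0.5 * np.pi * np.tan(0.5 * np.pi * t)

def solve_mean_ode(x0, T, n_steps):
    """Integrate d mu/dt = f(t) mu from mu(0) = x0 with classical RK4."""
    h = T / n_steps
    mu = np.array(x0, dtype=float)
    for k in range(n_steps):
        t = k * h
        k1 = f(t) * mu
        k2 = f(t + h / 2) * (mu + h / 2 * k1)
        k3 = f(t + h / 2) * (mu + h / 2 * k2)
        k4 = f(t + h) * (mu + h * k3)
        mu = mu + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return mu

x0 = np.array([1.0, -3.0])
mu_T = solve_mean_ode(x0, T=0.8, n_steps=400)
```

The result should match $\alpha_T\bfx_0$ componentwise for any $\bfx_0$. This is exactly the identity used to read off $f(t)$.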
Next, we turn towards finding an expression for $g(t)$. For convenience, let $\boldsymbol \Sigma(t) = \var[\bfX_t]$. Since $\ex[\boldsymbol \mu(t)(\bfX_t - \boldsymbol \mu(t))^\top] = \boldsymbol 0$, we can perform the following simplification: \(\begin{align} \ex[f(t)\bfX_t(\bfX_t - \boldsymbol \mu(t))^\top] &= f(t)\ex[\bfX_t(\bfX_t - \boldsymbol \mu(t))^\top],\nonumber\\ &= f(t)\boldsymbol \Sigma(t). \end{align}\) The same holds for $\ex[(\bfX_t - \boldsymbol \mu(t))f(t)\bfX_t^\top]$ mutatis mutandis. Likewise, \(\begin{equation} \ex[g(t)\boldsymbol I g(t) \boldsymbol I] = g^2(t) \boldsymbol I. \end{equation}\) Then, from \eqref{eq:var}, the dynamics of the covariance matrix are described by \(\begin{equation} \frac{\rmd \boldsymbol \Sigma}{\rmd t}(t) = 2f(t)\boldsymbol \Sigma(t) + g^2(t) \boldsymbol I. \end{equation}\) From the boundary conditions we have $\boldsymbol \Sigma(0) = \boldsymbol 0$. Using the method of integrating factors again, we find a closed-form expression for $\boldsymbol \Sigma(t)$: \(\begin{equation} \boldsymbol \Sigma(t) = e^{2\int_0^t f(\tau)\;\rmd \tau} \int_0^t e^{-2\int_0^\tau f(u)\;\rmd u} g^2(\tau)\boldsymbol I\; \rmd \tau. \end{equation}\) Next, by definition of the desired transition kernel, we require that $\boldsymbol \Sigma(t) = \sigma_t^2 \boldsymbol I$. By our derivation of the drift, $e^{\int_0^t f(\tau)\;\rmd \tau} = \alpha_t / \alpha_0$. Substituting both facts into the previous equation yields \(\begin{align} \sigma_t^2 \boldsymbol I &= \frac{\alpha_t^2}{\alpha_0^2} \int_0^t \frac{\alpha_0^2}{\alpha_\tau^2}g^2(\tau)\boldsymbol I\; \rmd \tau,\nonumber\\ \frac{\sigma_t^2}{\alpha_t^2} \boldsymbol I &= \int_0^t \frac{g^2(\tau)}{\alpha_\tau^2}\boldsymbol I\; \rmd \tau. \end{align}\) Then with a little algebra and using Newton’s notation
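For completeness, here is one way to finish the last step. This is a sketch, and we write $\dot\alpha_t$ for $\rmd \alpha_t / \rmd t$ as our reading of the Newton's-notation step. Differentiate both sides of the previous identity with respect to $t$, then multiply through by $\alpha_t^2$:

\(\begin{align} \frac{\rmd}{\rmd t}\left(\frac{\sigma_t^2}{\alpha_t^2}\right) &= \frac{g^2(t)}{\alpha_t^2},\nonumber\\ \frac{1}{\alpha_t^2}\frac{\rmd \sigma_t^2}{\rmd t} - \frac{2\sigma_t^2 \dot\alpha_t}{\alpha_t^3} &= \frac{g^2(t)}{\alpha_t^2},\nonumber\\ g^2(t) &= \frac{\rmd \sigma_t^2}{\rmd t} - 2\sigma_t^2\frac{\dot\alpha_t}{\alpha_t} = \frac{\rmd \sigma_t^2}{\rmd t} - 2\sigma_t^2 \frac{\rmd \log \alpha_t}{\rmd t}. \end{align}\)

Here $\alpha_t$ is positive and decreasing while $\sigma_t$ is nonnegative and increasing, so both terms on the right are nonnegative. Hence $g(t)$ is the real square root appearing in \eqref{eq:ideal_sde}.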
With a little more work one can easily show the result of Kingma et al. (
The general transition kernel $q_{t|s}(\bfx_t|\bfx_s)$ for $s < t$ of the Itô SDE described in Proposition 2 is \(\begin{equation} q_{t|s}(\bfx_t|\bfx_s) = \mathcal N\left(\bfx_t; \frac{\alpha_t}{\alpha_s}\bfx_s, \left(\sigma_t^2 - \frac{\alpha_t^2}{\alpha_s^2}\sigma_s^2\right) \boldsymbol I\right). \end{equation}\)
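We leave the proof as an exercise for the reader, as it follows straightforwardly from our derivations for Proposition 2 with a simple change in the initial conditions: start from $\boldsymbol \mu(s) = \bfx_s$ and $\boldsymbol \Sigma(s) = \boldsymbol 0$, and integrate from $s$ to $t$ instead of from $0$.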
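The general kernel is easy to check by sampling. The Python/NumPy sketch below draws $\bfx_s \sim q_{s|0}$ and then $\bfx_t \sim q_{t|s}$, using $\alpha_{t|s} = \alpha_t/\alpha_s$ and $\sigma_{t|s}^2 = \sigma_t^2 - \alpha_{t|s}^2\sigma_s^2$. The two hops should reproduce $q_{t|0}$. The schedule $\alpha_t = \cos(\pi t/2)$, $\sigma_t = \sin(\pi t/2)$ and all function names are assumed for illustration.

```python
import numpy as np

# Hypothetical schedule (an assumption): alpha_t = cos(pi t / 2), sigma_t = sin(pi t / 2).
def alpha(t):
    return np.cos(0.5 * np.pi * t)

def sigma(t):
    return np.sin(0.5 * np.pi * t)

def transition_params(s, t):
    """Mean scale alpha_{t|s} and variance sigma_{t|s}^2 of q_{t|s} for s < t."""
    a_ts = alpha(t) / alpha(s)
    var_ts = sigma(t) ** 2 - a_ts ** 2 * sigma(s) ** 2
    return a_ts, var_ts

def sample_two_hops(x0, s, t, n, rng):
    """Draw x_s ~ q_{s|0}(. | x0), then x_t ~ q_{t|s}(. | x_s)."""
    x_s = alpha(s) * x0 + sigma(s) * rng.standard_normal(n)
    a_ts, var_ts = transition_params(s, t)
    return a_ts * x_s + np.sqrt(var_ts) * rng.standard_normal(n)

rng = np.random.default_rng(0)
x0, s, t = 1.0, 0.3, 0.7
x_t = sample_two_hops(x0, s, t, 100_000, rng)
```

Marginalizing $\bfx_s$ out of $q_{t|s}\,q_{s|0}$ must return $q_{t|0}$ (the Chapman-Kolmogorov consistency). This holds precisely when $\alpha_{t|s}\alpha_s = \alpha_t$ and $\alpha_{t|s}^2\sigma_s^2 + \sigma_{t|s}^2 = \sigma_t^2$, and both identities hold exactly for the parameters above.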
In this blog post we presented a brief derivation for the commonly used drift and diffusion coefficients for diffusion models, starting with our desired transition kernel and then working backwards to find the resulting SDE.
Blasingame, Zander W. (May 2025). Deriving the Drift and Diffusion Coefficients for Diffusion Models. https://zblasingame.github.io.
or as a BibTeX entry:
@article{blasingame2025deriving-the-drift-and-diffusion-coefficients-for-diffusion-models,
title = {Deriving the Drift and Diffusion Coefficients for Diffusion Models},
author = {Blasingame, Zander W.},
year = {2025},
month = {May},
url = {https://zblasingame.github.io/blog/2025/noise-schedules/}
}