IJCB 2024 Spotlight

Greedy‑DiM: Greedy Algorithms for Unreasonably Effective Face Morphs

Existing Diffusion Morphs (DiM) treat the diffusion model as a black box — blend two latents at $t=T$, run the reverse ODE, accept whatever falls out. In this work we propose Greedy‑DiM, a family of morphing attacks that opens the box: at every step of the probability‑flow ODE we greedily optimize the noise prediction $\boldsymbol\epsilon$ against an identity heuristic. We find that the proposed algorithm is unreasonably effective — 100% MMPMR on ArcFace, AdaFace, and ElasticFace on SYN‑MAD 2022, the first representation-based attack to consistently outperform landmark methods, and with fewer network evaluations than vanilla DiM.

Clarkson University
Greedy-DiM* algorithm overview — animated Outer loop · PF‑ODE schedule, N steps from t=T to t=0 one tick of the ribbon = one DDIM step; the diagram below shows the work done at a single tick tN noise — t = T image — t = 0 A · inputs bona fide refs a b x0(a), x0(b) current latent xt n(ab) 1 NFE per outer step score network εθε INNER LOOP · K STEPS · PROPOSED gradient descent on the noise prediction k = 0 / K ℒ* = 0.000 heuristic falls as ε converges running noise ε(k) Tweedie estimate x0 = (x − σε) / α heuristic ℒ*ID d(vab, va) + d(vab, vb) + ⋅ update ε ℒ* ε ← ε − η ∇ backprop thru x0 repeat K times — keep the ε that minimizes ℒ*ID pick argmin D · outer DDIM step DDIM step √ᾱt n−1x0 + √(1 − ᾱt n−1) ε* next latent xt n−1 green marks the proposed changes, after Figure 1 of the paper. The score network is called once per outer step; the inner loop only touches the noise prediction ε.
outer step 1 / 10 · inner step 0 / 6
Figure 1.Overview of a single step of the Greedy-DiM* algorithm. Reading left-to-right: the current latent $\boldsymbol x_{t_n}^{(ab)}$ and conditional $\boldsymbol z_{ab}$ enter the score network $\boldsymbol\epsilon_\theta$ once per outer step; the resulting noise prediction $\boldsymbol\epsilon$ then enters the inner loop (green, proposed). The inner loop computes the Tweedie estimate $\hat{\boldsymbol x}_0 = (\boldsymbol x_{t_n} - \sigma_{t_n}\boldsymbol\epsilon)/\alpha_{t_n}$, evaluates the identity heuristic $\mathcal{L}_{ID}^*$ against the bona fide references $\boldsymbol x_0^{(a)}, \boldsymbol x_0^{(b)}$, takes a gradient step on $\boldsymbol\epsilon$, and repeats for $K$ iterations whilst tracking the running argmin $\boldsymbol\epsilon^*$. Finally the DDIM step $\Phi$ uses $\boldsymbol\epsilon^*$ to produce $\boldsymbol x_{t_{n-1}}$ and the cursor advances along the schedule. The inner loop touches only $\boldsymbol\epsilon$—the U-Net is evaluated just once per outer step. Proposed changes highlighted in green, after Figure 1 of the paper.
01 — Introduction

Existing DiM attacks leave the diffusion model as a black box

A face morphing attack creates a single image which contains the biometric information of multiple identities, such that a face recognition (FR) system registers a false acceptance with any of the contributing identities. Diffusion Morphs (DiM) are the current state-of-the-art for representation‑based morphing attacks: the two bona fide images are encoded into the twin latents of a Diffusion Autoencoder, blended, and decoded by solving the probability-flow ODE. Whilst DiM attains state-of-the-art performance, none of the existing work on DiM has leveraged the iterative nature of the diffusion model — the DiM model is treated as a black box, no differently than one would a GAN or a VAE.

Specifically, vanilla DiM fixes a blend of $w=0.5$, runs the reverse ODE once, and accepts whatever falls out. Morph‑PIPE proposes an extension: a batch of 21 morphs is generated at different blend values, each run to completion, and the candidate which minimizes an ArcFace identity loss is selected. In both cases the $N$ intermediate noise predictions along the diffusion trajectory go entirely unused.

However, a diffusion model is iterative. The probability-flow ODE is solved over $N$ timesteps, and at each timestep $t_n$ the noise prediction $\boldsymbol\epsilon_\theta(\boldsymbol x_t, \boldsymbol z, t_n)$ determines the next state — giving us $N$ knobs to turn, rather than one. We propose to use a greedy strategy to solve this optimization problem by locally choosing the optimal solution at each timestep in the sampling process. We call this family of algorithms Greedy-DiM.

Reframing the problem

The standard question is "which blend $w \in [0,1]$ produces the best morph?": a 1D search on a fixed trajectory between $\boldsymbol x_0^{(a)}$ and $\boldsymbol x_0^{(b)}$. We instead ask "at every step of the ODE solver, which $\boldsymbol\epsilon$ minimizes an identity heuristic?": an $N$-step search over the entire noise space, with the FR system's gradient guiding every move. The former search space has Lebesgue measure zero; the latter has full measure. We prove both statements.

We propose two variants in this family: Greedy-DiM-S, a discrete search over noise blends at each timestep, and Greedy-DiM*, continuous gradient descent on the noise prediction itself. Both run at fewer NFEs than vanilla DiM, and Greedy-DiM* attains a 100% MMPMR against ArcFace, AdaFace, and ElasticFace on SYN-MAD 2022. To the best of our knowledge, Greedy-DiM is the first representation-based morphing attack to consistently outperform landmark-based morphing algorithms.

02 — Greedy-DiM

A greedy strategy on the iterative sampling process

Diffusion models sample by solving the probability-flow ODE from $t=T$ down to $t=0$ via a numerical solver $\Phi$. At every timestep $t_n$, the noise prediction model $\boldsymbol\epsilon_\theta(\boldsymbol x_t,\boldsymbol z,t)$ admits an immediate clean-image estimate via Tweedie's formula — i.e., the noise predictor can be restructured into an image predictor, well before the solver has finished. The key idea of Greedy-DiM is to exploit this fact.

$\boldsymbol x_0$‑prediction (Tweedie identity) $$\hat{\boldsymbol x}_0(\boldsymbol x_t, \boldsymbol z, t) \;=\; \frac{\boldsymbol x_t \;-\; \sigma_t \,\boldsymbol\epsilon_\theta(\boldsymbol x_t,\boldsymbol z,t)}{\alpha_t}.$$

Intuitively, the noise predictor $\boldsymbol\epsilon_\theta$ can be restructured into an image predictor $\hat{\boldsymbol x}_0$, and so an identity heuristic $\mathcal{H}(\hat{\boldsymbol x}_0,\boldsymbol x_0^{(a)},\boldsymbol x_0^{(b)})$ can be evaluated at any timestep. The recipe is then straightforward: at each $t_n$, evaluate $\mathcal H$ on the current clean estimate, take a gradient step on $\boldsymbol\epsilon$ to lower it, feed the optimized noise back into the solver, and proceed. We propose to repeat this at every step, steering the morph trajectory toward an image that fools the FR system rather than passively decoding it from a fixed average.

The identity heuristic

For our heuristic function $\mathcal H$ we adopt the identity loss popularized by MIPGAN, which drives the morph close to both sources in the embedding space of an FR system $F$, whilst penalizing any asymmetry in the similarity to $a$ and $b$ to prevent the optimizer from collapsing onto a single identity.

Identity heuristic (Eq. 6 in the paper) $$\mathcal{L}_{ID}\;=\;d(v_{ab},v_a)+d(v_{ab},v_b),\qquad \mathcal{L}_{diff}\;=\;\bigl|\,d(v_{ab},v_a)-d(v_{ab},v_b)\,\bigr|,\qquad \mathcal{H}\;=\;\mathcal{L}_{ID}^{*}\;=\;\mathcal{L}_{ID}+\mathcal{L}_{diff}.$$

Here $F$ is an ArcFace network used solely to compute the heuristic during synthesis — it never sees the held-out evaluation systems.

Greedy‑DiM* — one step of the inner loop

  1. PredictRun the diffusion network once: $\boldsymbol\epsilon \gets \boldsymbol\epsilon_\theta(\boldsymbol x_{t_n}^{(ab)},\boldsymbol z_{ab},t_n)$. Snapshot $\boldsymbol\epsilon^*\!\gets\!\boldsymbol\epsilon$ as the current best.
  2. OptimizeFor $i=1,\dots,n_{opt}$: form $\hat{\boldsymbol x}_0$ from $\boldsymbol\epsilon$, evaluate $\mathcal{H}$, keep $\boldsymbol\epsilon^*$ if it improved, then take a gradient step $\boldsymbol\epsilon \gets \boldsymbol\epsilon - \eta\,\nabla_{\!\boldsymbol\epsilon}\mathcal{H}$.
  3. Step the solverApply the DDIM update with the optimized noise: $\boldsymbol x_{t_{n-1}}^{(ab)} \gets \Phi(\boldsymbol x_{t_n}^{(ab)},\boldsymbol\epsilon^*,t_n)$. Move on to $t_{n-1}$.

Crucially, the inner optimization touches only $\boldsymbol\epsilon$ and the heuristic — no additional forward passes through the diffusion U-Net are required. Consequently, Greedy‑DiM* runs at 270 NFEs in total (20 sampling + 250 encoding) — fewer than vanilla DiM (350) and a fraction of Morph-PIPE's 2350. The cost is in cheap heuristic gradient steps, not extra U-Net calls.

Greedy search and greedy optimization

We propose two algorithms in this family. Greedy-DiM-S, the search variant, runs the noise predictor twice per step — once conditioned on $\boldsymbol z_a$, once on $\boldsymbol z_b$ — and searches over $B=21$ slerp blends of the two predictions for the $w$ which minimizes $\mathcal H$.

Greedy-DiM* drops the two-trajectory framing entirely. Rather than searching for an optimal blend of two noise predictions, we propose to directly optimize the noise prediction itself: a single $\boldsymbol\epsilon$ per step, updated by gradient descent in $\mathbb R^d$. The search space at each step is the full noise space rather than a 1D blend manifold. The greedy optimization strategy provides a significant uplift in performance over the greedy search strategy whilst retaining the same number of NFE; we posit that this is in part due to the enhanced size of the search space, allowing for a more optimal solution to be found. Section 5 makes this precise.

03 — Experiments

Greedy-DiM* is unreasonably effective

A morphing attack is multi-objective: the attack must fool every recognizer whilst keeping NFE low. The radar below plots all nine attacks against three FR systems (MMPMR at $\mathrm{FMR}=0.1\%$ against AdaFace, ArcFace, ElasticFace) and a compute-efficiency axis ($1-\mathrm{NFE}/2350$). The MMPMR axes are scaled from 65% to 100% in order to make the separation between attacks legible.

We observe that Greedy-DiM* (deep blue fill) is the only attack which saturates all three FR axes, i.e., achieves 100% MMPMR on every tested FR system, whilst remaining cheaper on compute than vanilla DiM-A. This is achieved at only 270 NFEs — fewer than DiM-A, and an order of magnitude below Morph-PIPE's 2350.

AdaFace MMPMR 65% → 100% ArcFace 65% → 100% ElasticFace MMPMR 65% → 100% Compute 1 − NFE/2350
FaceMorpher (landmark) Webmorph (landmark) OpenCV (landmark) MIPGAN-II (NFE=150) Morph-PIPE (NFE=2350) DiM-A (NFE=350) Fast-DiM-ode (NFE=150) Greedy-DiM-S (NFE=350) Greedy-DiM* (NFE=270)

Morphing Attack Potential

Whilst MMPMR averages over identities, the Morphing Attack Potential metric (MAP) reports the fraction of morphs which simultaneously fool $c$ FR systems — the appropriate metric for deployments in which a single morph might be presented to multiple recognizers. We observe that Greedy-DiM* attains 100% in every column.

$\geq$ 3 FRs all three fooled $\geq$ 2 FRs any two fooled $\geq$ 1 FR at least one fooled Compute 1 − NFE/2350
FaceMorpher (landmark) Webmorph (landmark) OpenCV (landmark) MIPGAN-II (GAN) DiM-A (NFE=350) Fast-DiM-ode (NFE=150) Morph-PIPE (NFE=2350) Greedy-DiM-S (NFE=350) Greedy-DiM* (NFE=270)

Two FRLL identity pairs through five DiM-family morphing pipelines. Greedy-DiM* sits in the dead-center column — the single image each result in this section refers to — with the four baselines flanking it.

Identity A
DiM-A
baseline
Fast-DiM
fewer NFEs
Greedy-DiM*
ours
Morph-PIPE
21-blend search
Greedy-DiM-S
1D greedy
Identity B
Pair 1 — FRLL subjects 066 & 087
Bona fide identity A (FRLL subject 066)Bona fide
DiM-A morph of FRLL subjects 066 and 087DiM-A
Fast-DiM morph of FRLL subjects 066 and 087Fast-DiM
Greedy-DiM* morph of FRLL subjects 066 and 087Greedy-DiM*
Morph-PIPE morph of FRLL subjects 066 and 087Morph-PIPE
Greedy-DiM-S morph of FRLL subjects 066 and 087Greedy-DiM-S
Bona fide identity B (FRLL subject 087)Bona fide
Pair 2 — FRLL subjects 096 & 137
Bona fide identity A (FRLL subject 096)Bona fide
DiM-A morph of FRLL subjects 096 and 137DiM-A
Fast-DiM morph of FRLL subjects 096 and 137Fast-DiM
Greedy-DiM* morph of FRLL subjects 096 and 137Greedy-DiM*
Morph-PIPE morph of FRLL subjects 096 and 137Morph-PIPE
Greedy-DiM-S morph of FRLL subjects 096 and 137Greedy-DiM-S
Bona fide identity B (FRLL subject 137)Bona fide
Figure 5. Two FRLL identity pairs through five DiM-family morphing pipelines, with Greedy-DiM* (ours) in the center column. The four flanking baselines — DiM-A, Fast-DiM, Morph-PIPE, Greedy-DiM-S — all leave subtle ghosting around eyes and jawlines despite the diffusion backbone; Morph-PIPE's 21-blend search nudges sharpness up but cannot escape its measure-zero candidate set, and Greedy-DiM-S sharpens identity along a 1D slerp curve. Greedy-DiM*, optimizing over the full $\boldsymbol\epsilon$-space, lands a morph that ArcFace, AdaFace, and ElasticFace all accept as a match to both contributors at $\mathrm{FMR}=0.1\%$. Images adapted from Blasingame & Liu, IJCB 2024 (Figure 2 of the paper).
04 — Detectability Study

How does Greedy-DiM fare against morph attack detectors?

Fooling every recognizer is necessary but not sufficient. A deployed morph must also survive a morph attack detector (MAD) — a separately trained classifier deployed at the point of capture — whose job is to flag the morph as a forgery. Optimizing for MMPMR alone solves an easier problem than the one defenders actually face. Following the protocol of prior work, we train three SE-ResNeXt101 detectors, each on a different attack family — OpenCV (landmark), MIPGAN-II (GAN), DiM-A (diffusion) — and evaluate all eleven attacks against each.

EER OpenCV-MAD MACER @ 0.1% OpenCV-MAD MACER @ 1.0% OpenCV-MAD MACER @ 5.0% OpenCV-MAD EER MIPGAN-II-MAD MACER @ 0.1% MIPGAN-II-MAD MACER @ 1.0% MIPGAN-II-MAD MACER @ 5.0% MIPGAN-II-MAD EER DiM-A-MAD MACER @ 0.1% DiM-A-MAD MACER @ 1.0% DiM-A-MAD MACER @ 5.0% DiM-A-MAD
FaceMorpher (landmark) Webmorph (landmark) OpenCV (landmark) MIPGAN-I (GAN) MIPGAN-II (GAN) DiM-A DiM-C Fast-DiM Fast-DiM-ode Morph-PIPE Greedy-DiM-S (ours) Greedy-DiM* (ours)
Discussion

Under the OpenCV-trained MAD (top arc), Greedy-DiM* draws the smallest DiM polygon. We posit that the iterative gradient descent in $\boldsymbol\epsilon$-space injects high-frequency content into the latent which is helpful in fooling the FR system but introduces a fingerprint a landmark-trained MAD readily picks up. Under the DiM-A-trained MAD (left arc) the pattern reverses: Greedy-DiM* is the only DiM-family attack which retains a meaningful EER — roughly 3$\times$ further from the DiM-A decision boundary than the next best variant. In general, we observe that in order for the MAD algorithm to consistently detect a morphing attack it needs to be trained on a similar generation process. The practical implication for deployment is that defenders must train their MAD with diffusion morphs in the loop and keep the training set current with the attack frontier.

05 — Theoretical Analysis

Greed is, in fact, good

A greedy algorithm chooses a locally optimal solution at each timestep, but it is not automatically guaranteed that this locally optimal solution is globally optimal — famously, the greedy solution to the traveling salesman problem can yield the worst possible route. Motivated by our strong empirical results, we present two theorems which justify the use of a greedy strategy for the iterative sampling process of diffusion models. Under the standard assumptions of DDIM and the probability-flow ODE, the locally optimal $\boldsymbol\epsilon^*_{t_n}$ chosen by Greedy-DiM* at every step is globally optimal; moreover, the parameter space we optimize over has full Lebesgue measure, whilst Morph-PIPE's and Greedy-DiM-S's discrete subspaces have measure zero. Both statements admit short proofs in the paper; we sketch the geometry here.

Theorem 1 — the locally optimal step is globally optimal

Let $\boldsymbol\epsilon^*_{t_n}$ denote the noise prediction that minimizes the heuristic $\mathcal{H}$ at timestep $t_n$ — locally, looking only one step ahead. We claim that the DDIM update with $\boldsymbol\epsilon^*_{t_n}$ produces an $\boldsymbol x_{t_{n-1}}$ that, after the remaining solver steps, decodes to the morph $\hat{\boldsymbol x}_0^{*}$ which also globally minimizes $\mathcal H$ over all sampling trajectories starting from $\boldsymbol x_{t_n}$. The argument is short:

DDIM step (Eq. 5 of paper) $$\boldsymbol x_{t_{n-1}} \;=\; \sqrt{\bar\alpha_{t_{n-1}}}\,\hat{\boldsymbol x}_0(\boldsymbol\epsilon_{t_n}) \;+\; \sqrt{1-\bar\alpha_{t_{n-1}}}\,\boldsymbol\epsilon_{t_n}.$$

$\hat{\boldsymbol x}_0(\boldsymbol\epsilon_{t_n})$ is the model's clean-image estimate — what one would obtain by “jumping to the end” from the current noise level. Conditioned on $\boldsymbol x_{t_n}$, the entire trajectory from $t_n$ down to $0$ is determined by the choice of $\boldsymbol\epsilon_{t_n}$ — the rest of the schedule is fixed coefficients. Consequently, if $\boldsymbol\epsilon^*_{t_n}$ minimizes $\mathcal H(\hat{\boldsymbol x}_0)$, then the sampler's final output inherits that minimum. Intuitively, if the prediction is optimal at time $t$, then the same prediction at time $s < t$ remains optimal.

Theorem 2 — the prior search spaces have measure zero

The natural next question is where Greedy-DiM* searches. The optimization variable is the noise prediction $\boldsymbol\epsilon \in \mathbb R^d$ at the active timestep, with $d$ the latent dimension. Compare this to Morph-PIPE, which searches over a 21-element discrete set of fixed blends $\{w_i\} \subset [0,1]$, and to Greedy-DiM-S, which searches over a continuous 1D curve in $\mathbb R^d$ parameterized by a single blend coefficient. Whilst these earlier methods enumerate finely, they enumerate along a one-dimensional skeleton; Greedy-DiM* optimizes over the ambient space itself.

phase

Let $\mathcal S^*$, $\mathcal S_S$, and $\mathcal S_P$ denote the three search spaces and let $\mu$ be the data distribution from which the diffusion model implicitly samples. The paper establishes:

Theorem 2 — greed always wins $$\mathbb P_{\mu}(\boldsymbol\epsilon^* \in \mathcal S^*) = 1, \qquad \mathbb P_{\mu}(\boldsymbol\epsilon^* \in \mathcal S_S) = 0, \qquad \mathbb P_{\mu}(\boldsymbol\epsilon^* \in \mathcal S_P) = 0.$$

That is, under any continuous data distribution, the optimal noise $\boldsymbol\epsilon^*$ lies in Greedy-DiM*'s search space almost surely, and lies in Morph-PIPE's or Greedy-DiM-S's search space with probability zero. Consequently, Morph-PIPE plateaus around 92–96% MMPMR regardless of how many candidates it enumerates; the optimum simply does not lie in its candidate set.

Combining the two theorems

The locally greedy choice is globally optimal (Theorem 1) and the space we search contains the optimum almost surely (Theorem 2). Consequently, Greedy-DiM* is provably a stronger algorithm than Morph-PIPE in a sense which does not depend on dataset, recogniser, or hyperparameters; the empirical 100% MMPMR follows. Moreover, the compute cost is also lower — 270 NFEs versus Morph-PIPE's 2350.

For the full step-by-step pictorial of the algorithm, see the animated overview at the top of the page.

06 — Conclusions

Conclusions

Prior representation‑based diffusion morphs leave the reverse ODE as a black box; the latents are blended at $t=T$ and the solver is run to completion. Morph‑PIPE partially addresses this by searching a discrete grid of 21 blend weights, however, its parameter space has Lebesgue measure zero in the noise manifold and its compute budget runs an order of magnitude above vanilla DiM. We propose to exploit the iterative structure of the probability‑flow ODE directly, performing a greedy optimisation of an identity heuristic at every step. The resulting algorithm, Greedy‑DiM, is unreasonably effective; it consistently outperforms landmark‑based morphs against modern FR systems whilst running at a fraction of the compute of the previous representation‑based state of the art.

Contributions

References