Existing Diffusion Morphs (DiM) treat the diffusion model as a black box — blend two latents at $t=T$, run the reverse ODE, accept whatever falls out. In this work we propose Greedy‑DiM, a family of morphing attacks that opens the box: at every step of the probability‑flow ODE we greedily optimize the noise prediction $\boldsymbol\epsilon$ against an identity heuristic. We find that the proposed algorithm is unreasonably effective — 100% MMPMR on ArcFace, AdaFace, and ElasticFace on SYN‑MAD 2022, the first representation-based attack to consistently outperform landmark methods, and with fewer network evaluations than vanilla DiM.
A face morphing attack creates a single image which contains the biometric information of multiple identities, such that a face recognition (FR) system registers a false acceptance with any of the contributing identities. Diffusion Morphs (DiM) are the current state-of-the-art for representation‑based morphing attacks: the two bona fide images are encoded into the twin latents of a Diffusion Autoencoder, blended, and decoded by solving the probability-flow ODE. Whilst DiM attains state-of-the-art performance, none of the existing work on DiM has leveraged the iterative nature of the diffusion model — the DiM model is treated as a black box, no differently than one would a GAN or a VAE.
Specifically, vanilla DiM fixes a blend of $w=0.5$, runs the reverse ODE once, and accepts whatever falls out. Morph‑PIPE proposes an extension: a batch of 21 morphs is generated at different blend values, each run to completion, and the candidate which minimizes an ArcFace identity loss is selected. In both cases the $N$ intermediate noise predictions along the diffusion trajectory go entirely unused.
However, a diffusion model is iterative. The probability-flow ODE is solved over $N$ timesteps, and at each timestep $t_n$ the noise prediction $\boldsymbol\epsilon_\theta(\boldsymbol x_t, \boldsymbol z, t_n)$ determines the next state — giving us $N$ knobs to turn, rather than one. We propose to use a greedy strategy to solve this optimization problem by locally choosing the optimal solution at each timestep in the sampling process. We call this family of algorithms Greedy-DiM.
The standard question is "which blend $w \in [0,1]$ produces the best morph?": a 1D search on a fixed trajectory between $\boldsymbol x_0^{(a)}$ and $\boldsymbol x_0^{(b)}$. We instead ask "at every step of the ODE solver, which $\boldsymbol\epsilon$ minimizes an identity heuristic?": an $N$-step search over the entire noise space, with the FR system's gradient guiding every move. The former search space has Lebesgue measure zero; the latter has full measure. We prove both statements.
We propose two variants in this family: Greedy-DiM-S, a discrete search over noise blends at each timestep, and Greedy-DiM*, continuous gradient descent on the noise prediction itself. Both run at fewer NFEs than vanilla DiM, and Greedy-DiM* attains a 100% MMPMR against ArcFace, AdaFace, and ElasticFace on SYN-MAD 2022. To the best of our knowledge, Greedy-DiM is the first representation-based morphing attack to consistently outperform landmark-based morphing algorithms.
Diffusion models sample by solving the probability-flow ODE from $t=T$ down to $t=0$ via a numerical solver $\Phi$. At every timestep $t_n$, the noise prediction model $\boldsymbol\epsilon_\theta(\boldsymbol x_t,\boldsymbol z,t)$ admits an immediate clean-image estimate via Tweedie's formula — i.e., the noise predictor can be restructured into an image predictor, well before the solver has finished. The key idea of Greedy-DiM is to exploit this fact.
Intuitively, the noise predictor $\boldsymbol\epsilon_\theta$ can be restructured into an image predictor $\hat{\boldsymbol x}_0$, and so an identity heuristic $\mathcal{H}(\hat{\boldsymbol x}_0,\boldsymbol x_0^{(a)},\boldsymbol x_0^{(b)})$ can be evaluated at any timestep. The recipe is then straightforward: at each $t_n$, evaluate $\mathcal H$ on the current clean estimate, take a gradient step on $\boldsymbol\epsilon$ to lower it, feed the optimized noise back into the solver, and proceed. We propose to repeat this at every step, steering the morph trajectory toward an image that fools the FR system rather than passively decoding it from a fixed average.
For our heuristic function $\mathcal H$ we adopt the identity loss popularized by MIPGAN, which drives the morph close to both sources in the embedding space of an FR system $F$, whilst penalizing any asymmetry in the similarity to $a$ and $b$ to prevent the optimizer from collapsing onto a single identity.
Here $F$ is an ArcFace network used solely to compute the heuristic during synthesis — it never sees the held-out evaluation systems.
Crucially, the inner optimization touches only $\boldsymbol\epsilon$ and the heuristic — no additional forward passes through the diffusion U-Net are required. Consequently, Greedy‑DiM* runs at 270 NFEs in total (20 sampling + 250 encoding) — fewer than vanilla DiM (350) and a fraction of Morph-PIPE's 2350. The cost is in cheap heuristic gradient steps, not extra U-Net calls.
We propose two algorithms in this family. Greedy-DiM-S, the search variant, runs the noise predictor twice per step — once conditioned on $\boldsymbol z_a$, once on $\boldsymbol z_b$ — and searches over $B=21$ slerp blends of the two predictions for the $w$ which minimizes $\mathcal H$.
Greedy-DiM* drops the two-trajectory framing entirely. Rather than searching for an optimal blend of two noise predictions, we propose to directly optimize the noise prediction itself: a single $\boldsymbol\epsilon$ per step, updated by gradient descent in $\mathbb R^d$. The search space at each step is the full noise space rather than a 1D blend manifold. The greedy optimization strategy provides a significant uplift in performance over the greedy search strategy whilst retaining the same number of NFE; we posit that this is in part due to the enhanced size of the search space, allowing for a more optimal solution to be found. Section 5 makes this precise.
A morphing attack is multi-objective: the attack must fool every recognizer whilst keeping NFE low. The radar below plots all nine attacks against three FR systems (MMPMR at $\mathrm{FMR}=0.1\%$ against AdaFace, ArcFace, ElasticFace) and a compute-efficiency axis ($1-\mathrm{NFE}/2350$). The MMPMR axes are scaled from 65% to 100% in order to make the separation between attacks legible.
We observe that Greedy-DiM* (deep blue fill) is the only attack which saturates all three FR axes, i.e., achieves 100% MMPMR on every tested FR system, whilst remaining cheaper on compute than vanilla DiM-A. This is achieved at only 270 NFEs — fewer than DiM-A, and an order of magnitude below Morph-PIPE's 2350.
| Attack | NFE ↓ | AdaFace ↑ | ArcFace ↑ | ElasticFace ↑ |
|---|---|---|---|---|
| FaceMorpher (landmark) | — | 89.78 | 87.73 | 89.57 |
| Webmorph (landmark) | — | 97.96 | 96.93 | 98.36 |
| OpenCV (landmark) | — | 94.48 | 92.43 | 94.27 |
| MIPGAN-II (GAN) | 150 | 70.55 | 72.19 | 65.24 |
| Morph-PIPE | 2350 | 95.91 | 92.84 | 95.50 |
| DiM-A | 350 | 92.23 | 90.18 | 93.05 |
| Fast-DiM-ode | 150 | 91.82 | 88.75 | 91.21 |
| Greedy-DiM-S | 350 | 95.71 | 93.87 | 95.30 |
| Greedy-DiM* (ours) | 270 | 100 | 100 | 100 |
Numbers from Table 4 of Greedy-DiM (Blasingame & Liu, IJCB 2024). Landmark methods carry no diffusion compute (NFE listed as —); MIPGAN‑II at 150. To the best of our knowledge, Greedy-DiM* is the first representation-based attack to consistently outperform landmark methods.
Whilst MMPMR averages over identities, the Morphing Attack Potential metric (MAP) reports the fraction of morphs which simultaneously fool $c$ FR systems — the appropriate metric for deployments in which a single morph might be presented to multiple recognizers. We observe that Greedy-DiM* attains 100% in every column.
| Attack | NFE | $\geq 1$ FR | $\geq 2$ FRs | $\geq 3$ FRs |
|---|---|---|---|---|
| FaceMorpher (landmark) | — | 92.23 | 89.57 | 85.28 |
| Webmorph (landmark) | — | 98.77 | 98.36 | 96.11 |
| OpenCV (landmark) | — | 97.55 | 93.87 | 89.78 |
| MIPGAN-II (GAN) | — | 80.37 | 69.73 | 57.87 |
| DiM-A | 350 | 96.93 | 92.43 | 86.09 |
| Fast-DiM-ode | 150 | 95.91 | 91.21 | 84.66 |
| Morph-PIPE | 2350 | 98.16 | 95.71 | 90.39 |
| Greedy-DiM-S (ours) | 350 | 97.34 | 95.71 | 91.82 |
| Greedy-DiM* (ours) | 270 | 100 | 100 | 100 |
$\mathrm{MAP}[1,c]$ at $\mathrm{FMR}=0.1\%$ on SYN-MAD 2022 / FRLL. Greedy-DiM* is the only attack — landmark or representation-based — that simultaneously fools all three modern FR systems on every test pair. Landmark and GAN baselines plot at NFE=0 (saturating the compute axis); diffusion methods plot at their actual NFE.
Two FRLL identity pairs through five DiM-family morphing pipelines. Greedy-DiM* sits in the dead-center column — the single image each result in this section refers to — with the four baselines flanking it.
Bona fide
DiM-A
Fast-DiM
Greedy-DiM*
Morph-PIPE
Greedy-DiM-S
Bona fide
Bona fide
DiM-A
Fast-DiM
Greedy-DiM*
Morph-PIPE
Greedy-DiM-S
Bona fideFooling every recognizer is necessary but not sufficient. A deployed morph must also survive a morph attack detector (MAD) — a separately trained classifier deployed at the point of capture — whose job is to flag the morph as a forgery. Optimizing for MMPMR alone solves an easier problem than the one defenders actually face. Following the protocol of prior work, we train three SE-ResNeXt101 detectors, each on a different attack family — OpenCV (landmark), MIPGAN-II (GAN), DiM-A (diffusion) — and evaluate all eleven attacks against each.
| Attack | OpenCV-MAD | MIPGAN-II-MAD | DiM-A-MAD | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EER | 0.1% | 1.0% | 5.0% | EER | 0.1% | 1.0% | 5.0% | EER | 0.1% | 1.0% | 5.0% | |
| FaceMorpher (landmark) | 31.86 | 99.02 | 97.06 | 80.39 | 82.84 | 100 | 100 | 99.51 | 44.61 | 99.02 | 97.06 | 87.75 |
| Webmorph (landmark) | 0.49 | 0.98 | 0.49 | 0.49 | 51.47 | 100 | 99.51 | 93.14 | 36.27 | 98.04 | 90.69 | 74.02 |
| OpenCV (landmark) | 0.98 | 2.45 | 0.49 | 0.49 | 50.49 | 99.51 | 95.10 | 90.69 | 39.71 | 99.02 | 85.29 | 73.04 |
| MIPGAN-I (GAN) | 24.02 | 83.82 | 70.10 | 49.02 | 8.33 | 38.24 | 24.51 | 14.22 | 44.61 | 99.51 | 95.10 | 85.78 |
| MIPGAN-II (GAN) | 26.47 | 73.53 | 59.80 | 45.59 | 3.92 | 34.31 | 8.82 | 2.94 | 52.94 | 99.02 | 98.04 | 90.69 |
| DiM-A | 82.84 | 100 | 100 | 100 | 54.90 | 99.51 | 99.51 | 93.63 | 0.98 | 2.94 | 0.98 | 0 |
| DiM-C | 63.73 | 99.51 | 99.02 | 98.53 | 43.63 | 99.51 | 93.63 | 85.29 | 5.39 | 42.65 | 19.61 | 6.37 |
| Fast-DiM | 75.00 | 100 | 100 | 100 | 50.00 | 99.51 | 99.51 | 94.61 | 2.94 | 12.25 | 5.88 | 1.96 |
| Fast-DiM-ode | 75.49 | 100 | 100 | 100 | 56.37 | 99.51 | 99.51 | 98.53 | 1.96 | 7.35 | 2.94 | 0.49 |
| Morph-PIPE | 82.35 | 100 | 100 | 100 | 54.90 | 99.51 | 99.51 | 93.63 | 0.98 | 5.39 | 0.98 | 0 |
| Greedy-DiM-S (ours) | 52.94 | 100 | 100 | 98.53 | 36.76 | 100 | 92.65 | 79.41 | 6.37 | 63.73 | 14.22 | 6.86 |
| Greedy-DiM* (ours) | 37.75 | 99.02 | 93.63 | 85.29 | 34.31 | 96.57 | 93.14 | 75.00 | 17.16 | 85.78 | 56.37 | 42.65 |
EER (Equal Error Rate) and MACER @ BPCER=$\{0.1, 1.0, 5.0\}\%$ for three SE-ResNeXt101 morph-attack detectors trained on (left) OpenCV, (middle) MIPGAN-II, (right) DiM-A. Higher = harder for the defender to catch. Greedy-DiM* is the only DiM-family attack that survives a DiM-A-trained detector with non-trivial margin (17.16% EER vs $\le$6.37% for the next best). Reproduced from Table 7 of Blasingame & Liu, IJCB 2024.
Under the OpenCV-trained MAD (top arc), Greedy-DiM* draws the smallest DiM polygon. We posit that the iterative gradient descent in $\boldsymbol\epsilon$-space injects high-frequency content into the latent which is helpful in fooling the FR system but introduces a fingerprint a landmark-trained MAD readily picks up. Under the DiM-A-trained MAD (left arc) the pattern reverses: Greedy-DiM* is the only DiM-family attack which retains a meaningful EER — roughly 3$\times$ further from the DiM-A decision boundary than the next best variant. In general, we observe that in order for the MAD algorithm to consistently detect a morphing attack it needs to be trained on a similar generation process. The practical implication for deployment is that defenders must train their MAD with diffusion morphs in the loop and keep the training set current with the attack frontier.
A greedy algorithm chooses a locally optimal solution at each timestep, but it is not automatically guaranteed that this locally optimal solution is globally optimal — famously, the greedy solution to the traveling salesman problem can yield the worst possible route. Motivated by our strong empirical results, we present two theorems which justify the use of a greedy strategy for the iterative sampling process of diffusion models. Under the standard assumptions of DDIM and the probability-flow ODE, the locally optimal $\boldsymbol\epsilon^*_{t_n}$ chosen by Greedy-DiM* at every step is globally optimal; moreover, the parameter space we optimize over has full Lebesgue measure, whilst Morph-PIPE's and Greedy-DiM-S's discrete subspaces have measure zero. Both statements admit short proofs in the paper; we sketch the geometry here.
Let $\boldsymbol\epsilon^*_{t_n}$ denote the noise prediction that minimizes the heuristic $\mathcal{H}$ at timestep $t_n$ — locally, looking only one step ahead. We claim that the DDIM update with $\boldsymbol\epsilon^*_{t_n}$ produces an $\boldsymbol x_{t_{n-1}}$ that, after the remaining solver steps, decodes to the morph $\hat{\boldsymbol x}_0^{*}$ which also globally minimizes $\mathcal H$ over all sampling trajectories starting from $\boldsymbol x_{t_n}$. The argument is short:
$\hat{\boldsymbol x}_0(\boldsymbol\epsilon_{t_n})$ is the model's clean-image estimate — what one would obtain by “jumping to the end” from the current noise level. Conditioned on $\boldsymbol x_{t_n}$, the entire trajectory from $t_n$ down to $0$ is determined by the choice of $\boldsymbol\epsilon_{t_n}$ — the rest of the schedule is fixed coefficients. Consequently, if $\boldsymbol\epsilon^*_{t_n}$ minimizes $\mathcal H(\hat{\boldsymbol x}_0)$, then the sampler's final output inherits that minimum. Intuitively, if the prediction is optimal at time $t$, then the same prediction at time $s < t$ remains optimal.
The natural next question is where Greedy-DiM* searches. The optimization variable is the noise prediction $\boldsymbol\epsilon \in \mathbb R^d$ at the active timestep, with $d$ the latent dimension. Compare this to Morph-PIPE, which searches over a 21-element discrete set of fixed blends $\{w_i\} \subset [0,1]$, and to Greedy-DiM-S, which searches over a continuous 1D curve in $\mathbb R^d$ parameterized by a single blend coefficient. Whilst these earlier methods enumerate finely, they enumerate along a one-dimensional skeleton; Greedy-DiM* optimizes over the ambient space itself.
Let $\mathcal S^*$, $\mathcal S_S$, and $\mathcal S_P$ denote the three search spaces and let $\mu$ be the data distribution from which the diffusion model implicitly samples. The paper establishes:
That is, under any continuous data distribution, the optimal noise $\boldsymbol\epsilon^*$ lies in Greedy-DiM*'s search space almost surely, and lies in Morph-PIPE's or Greedy-DiM-S's search space with probability zero. Consequently, Morph-PIPE plateaus around 92–96% MMPMR regardless of how many candidates it enumerates; the optimum simply does not lie in its candidate set.
The locally greedy choice is globally optimal (Theorem 1) and the space we search contains the optimum almost surely (Theorem 2). Consequently, Greedy-DiM* is provably a stronger algorithm than Morph-PIPE in a sense which does not depend on dataset, recogniser, or hyperparameters; the empirical 100% MMPMR follows. Moreover, the compute cost is also lower — 270 NFEs versus Morph-PIPE's 2350.
For the full step-by-step pictorial of the algorithm, see the animated overview at the top of the page.
Prior representation‑based diffusion morphs leave the reverse ODE as a black box; the latents are blended at $t=T$ and the solver is run to completion. Morph‑PIPE partially addresses this by searching a discrete grid of 21 blend weights, however, its parameter space has Lebesgue measure zero in the noise manifold and its compute budget runs an order of magnitude above vanilla DiM. We propose to exploit the iterative structure of the probability‑flow ODE directly, performing a greedy optimisation of an identity heuristic at every step. The resulting algorithm, Greedy‑DiM, is unreasonably effective; it consistently outperforms landmark‑based morphs against modern FR systems whilst running at a fraction of the compute of the previous representation‑based state of the art.