Graph diffusion models, dominant in graph generative modeling, remain underexplored for graph-to-graph translation tasks like chemical reaction prediction.
We demonstrate that standard permutation equivariant denoisers face fundamental limitations in these tasks due to their inability to break symmetries in noisy inputs.
To address this, we propose DiffAlign: we align the denoiser by exposing the node mapping between the input and target graphs, which lets it break these symmetries.
Diffusion models have been successfully applied to a wide range of tasks, including molecular generation ([Vignac et al., 2023](https://arxiv.org/abs/2303.00537), [Hoogeboom et al., 2022](https://arxiv.org/abs/2203.17003)). Their most notable advantages over other generative models are the ability to generate diverse samples and to condition on additional information. We want to bring these benefits to graph translation: the task of predicting a graph structure given another graph structure.
Diffusion models in this context start from a noisy graph (e.g. one made of empty nodes and no edges) and an input (or source) graph. The denoising process iteratively transforms the noisy graph into a target graph that represents a transformation of the input graph. We show the denoising process in the context of retrosynthesis (predicting precursors for a given target molecule) below, and we define the diffusion process formally in the next section.
Consider a database of $N_\mathrm{obs}$ graphs $\mathcal{D} = \{ (\X_n, \Y_n, \P^{\Y\to\X}_n) \}_{n=1}^{N_\mathrm{obs}}$, where $\X_n$ is the target graph, $\Y_n$ the input graph, and $\P^{\Y\to\X}_n$ a matrix defining the node mapping between the two graphs. The graph translation task is: given that the data is sampled from an unknown distribution $p(\X,\Y,\P^{\Y\to\X})$, predict valid targets $\X \sim p(\X\mid\Y)$ for a given input $\Y$. We define the general diffusion process via a forward process \begin{equation}\textstyle \label{eq:forward-graph-diffusion} q(\X_{t+1} \mid \X_t) = \prod^{N_X}_{i=1} q(\X^{\N,i}_{t+1} \mid \X^{\N,i}_t) \prod_{i,j=1}^{N_X} q(\X^{E,ij}_{t+1} \mid \X^{E,ij}_t), \end{equation} which diffuses the target graph (the reactants, in retrosynthesis) into noise, and a reverse process \begin{equation}\textstyle \label{eq:diff-backward} p_{\theta}(\X_{t-1} \mid \X_t, \Y) = \prod_{i=1}^{N_X} p_{\theta}(\X^{\N,i}_{t-1} \mid \X_t, \Y) \prod_{i,j=1}^{N_X} p_{\theta}(\X^{E,ij}_{t-1} \mid \X_t, \Y), \end{equation} which defines our generative model. We use the neural network to predict the ground-truth labels from noised samples, meaning that the network outputs a distribution $\tilde p_{\theta}\big(\X_0 \mid \X_t, \Y\big)$. The reverse process is then parameterized by \begin{equation}\textstyle \label{eq:conditional-node-backward} p_{\theta}(\X_{t-1} \mid \X_t, \Y) = \sum_{\X_0} q\big(\X_{t-1} \mid \X_t, \X_0 \big) \, \tilde p_{\theta}\big(\X_0 \mid \X_t, \Y\big). \end{equation} It is common [CITE] to choose $\tilde p_{\theta}\big(\X_0 \mid \X_t, \Y\big)$ to be a permutation equivariant model. Next we show how this limits the performance of the model.
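To make the notation concrete, here is a minimal sketch of one forward noising step and one $\X_0$-parameterized reverse step for the node labels (edge labels are handled analogously). It assumes a simple uniform transition kernel and hypothetical `denoiser` and `q_posterior` helpers; it illustrates the equations above rather than reproducing our exact implementation.

```python
import torch
import torch.nn.functional as F

def forward_step(x_t, alpha_t, num_classes):
    """One step of q(X_{t+1} | X_t) for node labels: keep each label with
    probability alpha_t, otherwise resample it uniformly at random."""
    keep = torch.rand(x_t.shape) < alpha_t                   # x_t: (N,) integer node labels
    uniform = torch.randint(0, num_classes, x_t.shape)
    return torch.where(keep, x_t, uniform)

def reverse_step(x_t, y, denoiser, q_posterior):
    """p_theta(X_{t-1} | X_t, Y) = sum_{X_0} q(X_{t-1} | X_t, X_0) * p_theta(X_0 | X_t, Y).
    `denoiser(x_t, y)` returns per-node logits over X_0; `q_posterior(x_t)` returns, per node,
    the table q(X_{t-1} = k | X_t = x_t[i], X_0 = c), shape (N, num_classes, num_classes)."""
    p_x0 = F.softmax(denoiser(x_t, y), dim=-1)                # \tilde p_theta(X_0 | X_t, Y), shape (N, C)
    post = q_posterior(x_t)                                   # shape (N, C, C)
    p_prev = torch.einsum('nc,nck->nk', p_x0, post)           # marginalize over X_0
    return torch.multinomial(p_prev, 1).squeeze(-1)           # sample X_{t-1} for every node
```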
It has been noted in the literature [CITE] that permutation equivariant functions struggle to map from a self-symmetrical input space to a less self-symmetrical output space. To confirm this observation, we study the task of translating a "ring" molecular structure (self-symmetrical) into a "tail" molecular structure (less self-symmetrical).
Trioxane (highly self-symmetrical)
3-chloropropan-1-ol (less self-symmetrical)
We train two models to translate between the two structures. The first model is an equivariant GNN (a GraphTransformer) and the second is a non-equivariant GNN (obtained by adding unique positional encodings to the input features of each node and edge). Both models are trained with the same hyperparameters (same number of layers, hidden dimensions, etc.). We show the samples generated by the two models below, and you can run the models yourself in this notebook.
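The non-equivariant variant only changes the inputs to the network; a minimal sketch of the idea for node features (function and argument names are ours, not the notebook's):

```python
import torch

def add_unique_positional_encodings(node_features, pe_dim=16, seed=0):
    """Break permutation equivariance by concatenating a distinct (here random)
    positional encoding to each node's features, so that no two nodes look identical."""
    n = node_features.shape[0]
    gen = torch.Generator().manual_seed(seed)
    pe = torch.randn(n, pe_dim, generator=gen)      # one unique code per node index
    return torch.cat([node_features, pe], dim=-1)   # the GNN itself is left unchanged
```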
Diffusion models iteratively remove noise from a noisy input to transform it into a target output. However, the noisy input is likely to be self-symmetrical (e.g. an empty graph), and the model then struggles to produce a non-symmetrical output. To showcase this effect, we design a simple experiment where the goal is to copy a simple grid structure (left), starting the denoising from an empty, self-symmetrical grid (right).
With few diffusion steps (e.g. 20), we can see that the model struggles to copy the input grid, a task that should be trivial for a deep neural network.
However, if we run the diffusion process for more steps, the model starts to produce outputs that increasingly resemble the target grid. You can explore the effect of the number of diffusion steps on the model's performance in this notebook.
It turns out that an equivariant model (and by extension the denoiser in the diffusion process), caught between the need to break symmetries and the requirement to remain equivariant, will eventually learn a distribution equal to the marginal distribution of the labels in the training data. Formally, if the noisy input $\X_t$ is invariant to node permutations (e.g. an empty graph) and no correspondence to $\Y$ is provided, equivariance forces $\tilde p_{\theta}(\X_0^{\N,i} \mid \X_t, \Y) = \tilde p_{\theta}(\X_0^{\N,j} \mid \X_t, \Y)$ for all nodes $i,j$, and the training loss is then minimized by predicting this shared, marginal label distribution at every node.
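The collapse is easy to verify on a toy permutation-equivariant layer: if every node of the noisy graph looks identical, the layer is forced to output identical predictions for all nodes. A minimal sketch (a DeepSets-style layer chosen purely for illustration, not our denoiser architecture):

```python
import torch
import torch.nn as nn

class EquivariantLayer(nn.Module):
    """h_i' = Linear([h_i, mean_j h_j]): permutation equivariant by construction."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(2 * dim, dim)

    def forward(self, h):                                   # h: (num_nodes, dim)
        pooled = h.mean(dim=0, keepdim=True).expand_as(h)
        return self.lin(torch.cat([h, pooled], dim=-1))

h = torch.zeros(5, 8)                                       # a fully symmetric "empty graph" state
out = EquivariantLayer(8)(h)
print(torch.allclose(out, out[0].expand_as(out)))           # True: every node gets the same prediction
```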
So how can we help the model break self-symmetries while maintaining equivariance?
We can use node identifiers to mark which node in the input graph corresponds to which node in the noisy target graph, giving the denoiser the information it needs to break these symmetries.
In the next section, we explore ways to pass this information to the neural network.
How can we tell a denoiser that specific nodes are paired? We explore three methods: aligned input features, aligned positional encodings, and a skip connection from the input graph (the variants compared in the table below). The methods can be combined to strengthen the alignment signal, and we show that an aligned equivariant denoiser remains equivariant with respect to the non-paired nodes in the generated graph.
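As an illustration of the positional-encoding variant, here is a hedged sketch of how paired nodes can be given identical encodings using the node mapping $\P^{\Y\to\X}$ (names and shapes are assumptions, not our exact code):

```python
import torch

def aligned_positional_encodings(P_y_to_x, pe_dim=16, seed=0):
    """Give every input-graph node a (here random) positional encoding and copy the same
    encoding to its mapped node in the noisy target graph, so the denoiser can tell which
    target node corresponds to which input node. Unmapped target nodes get a zero encoding."""
    n_y, n_x = P_y_to_x.shape                       # P[i, j] = 1 if input node i maps to target node j
    gen = torch.Generator().manual_seed(seed)
    pe_y = torch.randn(n_y, pe_dim, generator=gen)  # encodings for the input graph Y
    pe_x = P_y_to_x.T.float() @ pe_y                # paired nodes in X_t inherit the same encoding
    return pe_y, pe_x                               # concatenated to the node features of Y and X_t
```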
We show that our method achieves a SOTA-matching $54.7\%$ top-1 accuracy, compared to $4.1\%$ without alignment.
Top-$k$ accuracy (%) and estimated mean reciprocal rank (MRR):

$$
\begin{array}{lccccc}
\hline
\textbf{Method} & \mathbf{k=1} \uparrow & \mathbf{k=3} \uparrow & \mathbf{k=5} \uparrow & \mathbf{k=10} \uparrow & \widehat{\mathrm{\textbf{MRR}}} \uparrow \\
\hline
\text{Unaligned} & 4.1 & 6.5 & 7.8 & 9.8 & 0.056 \\
\text{DiffAlign-input} & 44.1 & 65.9 & 72.2 & 78.7 & 0.554 \\
\text{DiffAlign-PE} & 49.0 & 70.7 & 76.6 & \mathbf{81.8} & 0.601 \\
\text{DiffAlign-PE+skip} & \mathbf{54.7} & \mathbf{73.3} & \mathbf{77.8} & 81.1 & \mathbf{0.639} \\
\hline
\end{array}
$$

The models' performance persists even with few sampling steps, indicating that the alignment is effective.
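For reference, the top-$k$ accuracy and (estimated) mean reciprocal rank in the table can be computed from ranked candidate lists as in this simple sketch (function names are ours):

```python
def topk_accuracy(ranked_predictions, ground_truth, k):
    """Fraction of targets whose ground-truth reactant set appears among the top-k candidates.
    `ranked_predictions` is a list of ranked candidate lists, one per test target."""
    hits = sum(gt in preds[:k] for preds, gt in zip(ranked_predictions, ground_truth))
    return hits / len(ground_truth)

def mean_reciprocal_rank(ranked_predictions, ground_truth):
    """Average of 1/rank of the ground truth (0 if it is not among the candidates)."""
    ranks = [1.0 / (preds.index(gt) + 1) if gt in preds else 0.0
             for preds, gt in zip(ranked_predictions, ground_truth)]
    return sum(ranks) / len(ranks)
```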
Unlocking the potential of diffusion models in graph-to-graph translation enables a range of downstream applications. We show how they can benefit our main target application: retrosynthesis.
Thanks to inference guidance, we can use our denoiser to generate reactants with desired properties, e.g. synthesizability. This is especially useful for multi-step retrosynthesis, where the output of a single-step model is chained through a search until available starting materials are reached.
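One simple way to approximate such guidance is to resample candidate next states according to a property score at each reverse step. This is a crude sketch of that idea, not the exact guidance procedure we use, with hypothetical `reverse_step_sample` and `property_score` helpers:

```python
import torch

def guided_reverse_step(x_t, y, t, reverse_step_sample, property_score,
                        num_candidates=8, temperature=1.0):
    """Crude approximation of property guidance: draw several candidate next states from the
    unguided reverse kernel and resample one with probability proportional to exp(score / T)."""
    candidates = [reverse_step_sample(x_t, y, t) for _ in range(num_candidates)]
    scores = torch.tensor([property_score(c) for c in candidates])   # e.g. a synthesizability estimate
    weights = torch.softmax(scores / temperature, dim=0)
    idx = torch.multinomial(weights, 1).item()
    return candidates[idx]
```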
We can also apply inpainting to generate one of the reactants conditional on the other reactants and the product, or to generate a reactant with a specific structure.
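A minimal sketch of the inpainting idea, in the spirit of RePaint-style clamping: at every reverse step the known part of the graph is overwritten with its ground-truth labels noised to the current level (helper names are assumptions, not our exact implementation):

```python
import torch

def inpaint_sample(x_T, known_labels, known_mask, y, reverse_step, noise_to_level, num_steps):
    """Generate a graph while keeping a known subgraph fixed: at every reverse step the known
    nodes are overwritten with their ground-truth labels noised to the current level t."""
    x_t = x_T
    for t in reversed(range(1, num_steps + 1)):
        x_known_t = noise_to_level(known_labels, t)        # q(X_t | X_0) applied to the fixed part
        x_t = torch.where(known_mask, x_known_t, x_t)      # clamp the known nodes
        x_t = reverse_step(x_t, y, t)                      # denoise, conditioned on Y
    return torch.where(known_mask, known_labels, x_t)      # keep the known part exact at the end
```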
@inproceedings{DiffAlign2025Laabid,
title={Equivariant Denoisers Cannot Copy Graphs: Align your Graph Diffusion Models},
author={Laabid, Najwa and Rissanen, Severi and Heinonen, Markus and Solin, Arno and Garg, Vikas},
booktitle={International Conference on Learning Representations (ICLR)},
year={2025},
url={https://openreview.net/forum?id=onIro14tHv}}