Implementation of GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation (ICLR 2022).

Overview

GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation

License: MIT

[OpenReview] [arXiv] [Code]

The official implementation of GeoDiff: A Geometric Diffusion Model for Molecular Conformation Generation (ICLR 2022 Oral Presentation [54/3391]).

cover

Environments

Install via Conda (Recommended)

# Clone the environment
conda env create -f env.yml
# Activate the environment
conda activate geodiff
# Install PyG
conda install pytorch-geometric=1.7.2=py37_torch_1.8.0_cu102 -c rusty1s -c conda-forge

Dataset

Offical Dataset

The offical raw GEOM dataset is avaiable [here].

Preprocessed dataset

We provide the preprocessed datasets (GEOM) in this [google drive folder]. After downleading the dataset, it should be put into the folder path as specified in the dataset variable of config files ./configs/*.yml.

Prepare your own GEOM dataset from scratch (optional)

You can also download origianl GEOM full dataset and prepare your own data split. A guide is available at previous work ConfGF's [github page].

Training

All hyper-parameters and training details are provided in config files (./configs/*.yml), and free feel to tune these parameters.

You can train the model with the following commands:

# Default settings
python train.py ./config/qm9_default.yml
python train.py ./config/drugs_default.yml
# An ablation setting with fewer timesteps, as described in Appendix D.2.
python train.py ./config/drugs_1k_default.yml

The model checkpoints, configuration yaml file as well as training log will be saved into a directory specified by --logdir in train.py.

Generation

We provide the checkpoints of two trained models, i.e., qm9_default and drugs_default in the [google drive folder]. Note that, please put the checkpoints *.pt into paths like ${log}/${model}/checkpoints/, and also put corresponding configuration file *.yml into the upper level directory ${log}/${model}/.

Attention: if you want to use pretrained models, please use the code at the pretrain branch, which is the vanilla codebase for reproducing the results with our pretrained models. We recently notice some issue of the codebase and update it, making the main branch not compatible well with the previous checkpoints.

You can generate conformations for entire or part of test sets by:

python test.py ${log}/${model}/checkpoints/${iter}.pt \
    --start_idx 800 --end_idx 1000

Here start_idx and end_idx indicate the range of the test set that we want to use. All hyper-parameters related to sampling can be set in test.py files. Specifically, for testing qm9 model, you could add the additional arg --w_global 0.3, which empirically shows slightly better results.

Conformations of some drug-like molecules generated by GeoDiff are provided below.

Evaluation

After generating conformations following the obove commands, the results of all benchmark tasks can be calculated based on the generated data.

Task 1. Conformation Generation

The COV and MAT scores on the GEOM datasets can be calculated using the following commands:

python eval_covmat.py ${log}/${model}/${sample}/sample_all.pkl

Task 2. Property Prediction

For the property prediction, we use a small split of qm9 different from the Conformation Generation task. This split is also provided in the [google drive folder]. Generating conformations and evaluate mean absolute errors (MAR) metric on this split can be done by the following commands:

python ${log}/${model}/checkpoints/${iter}.pt --num_confs 50 \
      --start_idx 0 --test_set data/GEOM/QM9/qm9_property.pkl
python eval_prop.py --generated ${log}/${model}/${sample}/sample_all.pkl

Visualizing molecules with PyMol

Here we also provide a guideline for visualizing molecules with PyMol. The guideline is borrowed from previous work ConfGF's [github page].

Start Setup

  1. pymol -R
  2. Display - Background - White
  3. Display - Color Space - CMYK
  4. Display - Quality - Maximal Quality
  5. Display Grid
    1. by object: use set grid_slot, int, mol_name to put the molecule into the corresponding slot
    2. by state: align all conformations in a single slot
    3. by object-state: align all conformations and put them in separate slots. (grid_slot dont work!)
  6. Setting - Line and Sticks - Ball and Stick on - Ball and Stick ratio: 1.5
  7. Setting - Line and Sticks - Stick radius: 0.2 - Stick Hydrogen Scale: 1.0

Show Molecule

  1. To show molecules

    1. hide everything
    2. show sticks
  2. To align molecules: align name1, name2

  3. Convert RDKit mol to Pymol

    from rdkit.Chem import PyMol
    v= PyMol.MolViewer()
    rdmol = Chem.MolFromSmiles('C')
    v.ShowMol(rdmol, name='mol')
    v.SaveFile('mol.pkl')

Citation

Please consider citing the our paper if you find it helpful. Thank you!

@inproceedings{
xu2022geodiff,
title={GeoDiff: A Geometric Diffusion Model for Molecular Conformation Generation},
author={Minkai Xu and Lantao Yu and Yang Song and Chence Shi and Stefano Ermon and Jian Tang},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=PzcvxEMzvQC}
}

Acknowledgement

This repo is built upon the previous work ConfGF's [codebase]. Thanks Chence and Shitong!

Contact

If you have any question, please contact me at [email protected] or [email protected].

Known issues

  1. The current codebase is not compatible with more recent torch-geometric versions.
  2. The current processed dataset (with PyD data object) is not compatible with more recent torch-geometric versions.
Owner
Minkai Xu
Research [email protected]. Previous:
Minkai Xu
Facial recognition project

Facial recognition project documentation Project introduction This project is developed by linuxu. It is a face model recognition project developed ba

Jefferson 2 Dec 04, 2022
Official PyTorch Implementation of paper "NeLF: Neural Light-transport Field for Single Portrait View Synthesis and Relighting", EGSR 2021.

NeLF: Neural Light-transport Field for Single Portrait View Synthesis and Relighting Official PyTorch Implementation of paper "NeLF: Neural Light-tran

Ken Lin 38 Dec 26, 2022
PyTorch implementation of "Dataset Knowledge Transfer for Class-Incremental Learning Without Memory" (WACV2022)

Dataset Knowledge Transfer for Class-Incremental Learning Without Memory [Paper] [Slides] Summary Introduction Installation Reproducing results Citati

Habib Slim 5 Dec 05, 2022
[arXiv] What-If Motion Prediction for Autonomous Driving ❓🚗💨

WIMP - What If Motion Predictor Reference PyTorch Implementation for What If Motion Prediction [PDF] [Dynamic Visualizations] Setup Requirements The W

William Qi 96 Dec 29, 2022
Code release of paper Improving neural implicit surfaces geometry with patch warping

NeuralWarp: Improving neural implicit surfaces geometry with patch warping Project page | Paper Code release of paper Improving neural implicit surfac

François Darmon 167 Dec 30, 2022
Official code for UnICORNN (ICML 2021)

UnICORNN (Undamped Independent Controlled Oscillatory RNN) [ICML 2021] This repository contains the implementation to reproduce the numerical experime

Konstantin Rusch 21 Dec 22, 2022
Datasets for new state-of-the-art challenge in disentanglement learning

High resolution disentanglement datasets This repository contains the Falcor3D and Isaac3D datasets, which present a state-of-the-art challenge for co

NVIDIA Research Projects 37 May 26, 2022
TorchX: A PyTorch Extension Library for More Efficient Deep Learning

TorchX TorchX: A PyTorch Extension Library for More Efficient Deep Learning. @misc{torchx, author = {Ansheng You and Changxu Wang}, title = {T

Donny You 8 May 28, 2022
Churn-Prediction-Project - In this project, a churn prediction model is developed for a private bank as a term project for Data Mining class.

Churn-Prediction-Project In this project, a churn prediction model is developed for a private bank as a term project for Data Mining class. Project in

1 Jan 03, 2022
Code implementation from my Medium blog post: [Transformers from Scratch in PyTorch]

transformer-from-scratch Code for my Medium blog post: Transformers from Scratch in PyTorch Note: This Transformer code does not include masked attent

Frank Odom 27 Dec 21, 2022
PyTorch code for our paper "Attention in Attention Network for Image Super-Resolution"

Under construction... Attention in Attention Network for Image Super-Resolution (A2N) This repository is an PyTorch implementation of the paper "Atten

Haoyu Chen 71 Dec 30, 2022
CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer

CSAW-M This repository contains code for CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer. Source code for tr

Yue Liu 7 Oct 11, 2022
A Light in the Dark: Deep Learning Practices for Industrial Computer Vision

A Light in the Dark: Deep Learning Practices for Industrial Computer Vision This is the repository for our Paper/Contribution to the WI2022 in Nürnber

Maximilian Harl 6 Jan 17, 2022
Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper

Ponder(ing) Transformer Implementation of a Transformer that learns to adapt the number of computational steps it takes depending on the difficulty of

Phil Wang 65 Oct 04, 2022
CarND-LaneLines-P1 - Lane Finding Project for Self-Driving Car ND

Finding Lane Lines on the Road Overview When we drive, we use our eyes to decide where to go. The lines on the road that show us where the lanes are a

Udacity 769 Dec 27, 2022
Using a Seq2Seq RNN architecture via TensorFlow to predict future Bitcoin prices

Recurrent Bitcoin Network A Data Science Thesis Project About This repository contains the source code for implementing Bitcoin price prediciton using

Frizu 6 Sep 08, 2022
Pytorch implementation of PTNet for high-resolution and longitudinal infant MRI synthesis

Pyramid Transformer Net (PTNet) Project | Paper Pytorch implementation of PTNet for high-resolution and longitudinal infant MRI synthesis. PTNet: A Hi

Xuzhe Johnny Zhang 6 Jun 08, 2022
Source code of CIKM2021 Long Paper "PSSL: Self-supervised Learning for Personalized Search with Contrastive Sampling".

PSSL Source code of CIKM2021 Long Paper "PSSL: Self-supervised Learning for Personalized Search with Contrastive Sampling". It consists of the pre-tra

2 Dec 21, 2021
An official source code for paper Deep Graph Clustering via Dual Correlation Reduction, accepted by AAAI 2022

Dual Correlation Reduction Network An official source code for paper Deep Graph Clustering via Dual Correlation Reduction, accepted by AAAI 2022. Any

yueliu1999 109 Dec 23, 2022
Lolviz - A simple Python data-structure visualization tool for lists of lists, lists, dictionaries; primarily for use in Jupyter notebooks / presentations

lolviz By Terence Parr. See Explained.ai for more stuff. A very nice looking javascript lolviz port with improvements by Adnan M.Sagar. A simple Pytho

Terence Parr 785 Dec 30, 2022