Code for the paper: Graph Neural Networks for Predicting Chemical Reaction Performance
- First install Anaconda.
- Create a conda environment with
conda create --name rxntorch python=3.6
- Then, activate the new conda environment with
conda activate rxntorch
- Install RDKit
conda install -c rdkit rdkit
- Install a CUDA-enabled version of PyTorch with
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
- Install scikit-learn with
conda install scikit-learn
- Clone this repository to your local machine.
- From the repository root, install the remaining Python dependencies with:
pip install -r requirements.txt
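Before moving on, it can help to confirm that the packages installed above are visible from the active environment. The snippet below is a minimal sketch (not part of this repository). It assumes the import names `rdkit`, `torch` and `sklearn` (scikit-learn's import name). It uses `importlib.util.find_spec` so nothing is actually imported during the check.

```python
import importlib.util


def check_environment(modules=("rdkit", "torch", "sklearn")):
    """Report which packages Python can locate in the active environment.

    find_spec only locates a module; it does not import it, so this check
    is cheap and does not fail if a package is missing.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    status = check_environment()
    for name, ok in status.items():
        print(f"{name:10s} {'found' if ok else 'MISSING'}")
    # Only import torch (to query CUDA) if it is actually installed.
    if status.get("torch"):
        import torch
        print("CUDA available:", torch.cuda.is_available())
```

Run it inside the `rxntorch` environment. If `torch` is found but CUDA is reported as unavailable, the CPU-only PyTorch build was probably installed.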
- Download the data from the Google Drive links below and put it in ./data/<data_name>/raw/
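A quick way to catch a misplaced download is to check the expected folder before running any script. This is a hypothetical helper, not code from the repository. It assumes `<data_name>` is the same short name later passed to `-dn` (su, dy, az) and that the raw files sit directly inside the `raw` folder.

```python
from pathlib import Path


def raw_data_dir(data_name, root="./data"):
    """Return the expected raw-data directory, e.g. ./data/su/raw."""
    return Path(root) / data_name / "raw"


def check_raw_data(data_name, root="./data"):
    """Return the raw file names, or raise if the folder is missing or empty."""
    raw_dir = raw_data_dir(data_name, root)
    if not raw_dir.is_dir():
        raise FileNotFoundError(f"Expected downloaded data in {raw_dir}")
    files = sorted(p.name for p in raw_dir.iterdir())
    if not files:
        raise FileNotFoundError(f"{raw_dir} exists but is empty")
    return files
```

For example, `check_raw_data("su")` lists the files found in `./data/su/raw/`.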
- Prepare the domain features (chemical properties) by running
python 01_prepare_data.py --dataset_name <data_name> --use_rdkit_feats <rdkit or no_rdkit> --test_ratio <test_ratio>
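To illustrate what `--test_ratio` controls, here is a generic random train/test split in which each split set number gets its own reproducible seed. This is a hedged sketch of the idea only. How `01_prepare_data.py` actually shuffles, seeds and numbers its split sets is not documented here, so `make_split` and `base_seed` are hypothetical.

```python
import random


def make_split(n_samples, test_ratio, split_set_num, base_seed=0):
    """Randomly split sample indices into train and test index lists.

    Hypothetical scheme: each split set number uses its own seed, so
    different split sets differ but each one is reproducible.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    rng = random.Random(base_seed + split_set_num)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = int(round(n_samples * test_ratio))
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return train_idx, test_idx
```

With `test_ratio=0.3` and 100 reactions, each split set holds out 30 reactions for testing and trains on the remaining 70.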
- If you are using RDKit features (--use_rdkit_feats rdkit), you can perform feature selection with:
python 02_train_rf.py --dataset_name <data_name>
Without RDKit features, feature selection is not needed because the feature set is already small.
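The script name suggests that features are ranked with a random forest. Assuming that, the final selection step can be as simple as keeping the highest-scoring features. The helper below is a hypothetical sketch of that step, not the logic of `02_train_rf.py`. It accepts any sequence of importance scores, such as the `feature_importances_` attribute of a fitted scikit-learn `RandomForestRegressor`.

```python
def select_top_features(feature_names, importances, k=None, threshold=None):
    """Return feature names ranked by importance, optionally filtered.

    importances: one non-negative score per feature. Keep features whose
    score is >= threshold (if given), then keep at most k of them (if given).
    Ties keep their original order because sorted() is stable.
    """
    if len(feature_names) != len(importances):
        raise ValueError("need one importance score per feature")
    ranked = sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        ranked = [pair for pair in ranked if pair[1] >= threshold]
    if k is not None:
        ranked = ranked[:k]
    return [name for name, _ in ranked]
```

Choosing between `k` and `threshold` is a design decision. A fixed `k` gives a predictable input size for the model, while a threshold adapts to how concentrated the importances are.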
- To train the model, run:
python train_yield.py
Important arguments:
-p: Dataset path
-dn: Dataset name (su, dy, az)
-op: Output path
-mv: Model version
--split_set_num: Which split set to use (the split sets are generated by 01_prepare_data.py)
--use_domain: Which chemical (domain) features to use, if any. Options: rdkit, no_rdkit, no_domain
--epochs: Number of epochs
--seed: Random seed
--layers: Number of layers
--hidden: Hidden size for all layers
--lr_decay: Learning rate decay
--batch_size: Size of mini-batch
--dropout_rate: Dropout rate
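As a reference for how these options fit together, the options above could be declared with the standard-library `argparse` module roughly as follows. This is a hypothetical sketch: the flag strings come from the list above, but the `dest` names, types and default values are placeholders and not the ones used by `train_yield.py`.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented train_yield.py options.

    Flag strings follow the README; dest names, types and defaults are
    placeholder assumptions.
    """
    parser = argparse.ArgumentParser(description="Train a GNN reaction-yield model")
    parser.add_argument("-p", dest="dataset_path", required=True, help="Dataset path")
    parser.add_argument("-dn", dest="dataset_name", choices=["su", "dy", "az"], required=True)
    parser.add_argument("-op", dest="output_path", default="./output", help="Output path")
    parser.add_argument("-mv", dest="model_version", default="v1", help="Model version")
    parser.add_argument("--split_set_num", type=int, default=0)
    parser.add_argument("--use_domain", choices=["rdkit", "no_rdkit", "no_domain"], default="no_domain")
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--layers", type=int, default=3)
    parser.add_argument("--hidden", type=int, default=128)
    parser.add_argument("--lr_decay", type=float, default=0.95)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--dropout_rate", type=float, default=0.1)
    return parser
```

Using `choices` for `-dn` and `--use_domain` makes `argparse` reject misspelled dataset or feature names before any training starts.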
- To generate model predictions and visualize the activations of the GNN, run:
05_load_model.ipynb
If you are using chemical (domain) features, you need the JSON file containing the features. Otherwise, SMILES strings alone are sufficient.
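The two input modes described above can be sketched as follows. The layout of the features file is an ASSUMPTION made only for this illustration: a JSON object that maps each SMILES string to a list of numbers. The real file may be organized differently, and `load_inputs` is a hypothetical helper.

```python
import json


def load_inputs(smiles_list, features_path=None):
    """Pair each SMILES string with its domain-feature vector, if any.

    ASSUMPTION: the JSON file maps SMILES strings to lists of numbers.
    With no features file, each SMILES is paired with None (SMILES-only mode).
    """
    if features_path is None:
        return [(smi, None) for smi in smiles_list]
    with open(features_path) as fh:
        features = json.load(fh)
    missing = [smi for smi in smiles_list if smi not in features]
    if missing:
        raise KeyError(f"no domain features for {len(missing)} SMILES, e.g. {missing[0]}")
    return [(smi, features[smi]) for smi in smiles_list]
```

Failing loudly on missing SMILES avoids silently mixing featurized and non-featurized reactions in the same prediction run.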
- To plot the training curves and get the average performance, run:
04_plots.ipynb
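Averaging performance usually means summarizing one score per training run, for example runs with different `--seed` or `--split_set_num` values. The sketch below assumes those per-run test scores (such as R²) have already been collected into a list. It is an illustration, not the notebook's code.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of per-run test scores."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate a spread")
    return statistics.mean(scores), statistics.stdev(scores)
```

For example, scores of 0.8, 0.9 and 1.0 give a mean of about 0.9 and a sample standard deviation of about 0.1. Reporting the spread alongside the mean shows how sensitive the model is to the seed or data split.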