This repository contains the code and datasets for the manuscript "".
-
DownloadLinkForData.txt file
- open this file and download the data from the link it contains
- the download contains the knowledge graph (KG), embeddings, uniprot2genename, and reactome2pathwayname files
-
2 JupyterLab notebook files
- one for measuring the accuracy
- one for predictions
-
gp2v.yml file
- this can be used to create the conda environment
-
graphpattern2vec folder
- this folder holds the objects/functions of graphpattern2vec
-
Readme.md file
- You're reading it right now
-
model folder
- will hold temporary files
From pip:
Todo update copy paste for badges
git clone https://github.com/gravelCompBio/GraphPattern2vec.git
cd GraphPattern2vec/
Also download the data/ folder and put it in the GraphPattern2vec-main folder (unzip it and make sure the unzipped folder is still named "data").
The yml file is in the GraphPattern2vec folder and is named "gp2v.yml".
conda env create -f gp2v.yml -n gp2v
conda activate gp2v
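Before launching the notebooks, a quick check that the expected entries sit next to each other can save a failed run. The snippet below is a minimal sketch: the helper name `missing_entries` is hypothetical, and it only assumes the `data` folder and `gp2v.yml` file named in this README.

```python
from pathlib import Path


def missing_entries(repo_root, expected=("data",)):
    """Return the names in `expected` that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [name for name in expected if not (root / name).exists()]


# Run from inside the repository folder; "." is the current directory.
missing = missing_entries(".", expected=("data", "gp2v.yml"))
if missing:
    print("Missing from this folder:", ", ".join(missing))
else:
    print("data/ and gp2v.yml found.")
```

If `data` is reported as missing, the unzipped folder was probably renamed or placed one level too deep.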
- Double-check that you downloaded the data/ folder (see the download repository section)
- Navigate into the GraphPattern2vec-main folder in the terminal and run JupyterLab
jupyter lab
- for measuring an accurate ROC
or
- for generating predictions
- ROC in this file does not represent the overall accuracy
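For readers unfamiliar with the metric, the area under the ROC curve can be computed directly from labels and prediction scores. This is a generic sketch of the standard pairwise (Mann-Whitney) formulation, not the code used in the notebooks; the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    in which the positive example receives the higher score (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The value depends entirely on which negative examples are scored, which is one reason an ROC computed on a prediction run may not reflect overall accuracy.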
Before running the link prediction in either notebook, generate your own embeddings !!!!!! PLEASE READ THIS SECTION !!!!!!
-
Double-check that you downloaded all of the knowledge graph and embedding files from the "Downloading this repository" section
-
After you perform the random walk sections of the code in either notebook, you can use the embedding files we provided. If you wish to generate your own embeddings, you can run metapath2vec on the random walk output to generate new embedding files (see the section below for metapath2vec instructions).
-
If you are happy with the existing embedding files, you can run the link prediction sections of the code in either notebook
We use Change2vec++ to generate embeddings. This is where the code is from.
We used these parameters for change2vec
- Size 256
- Window 7
- Negative 5
- MinCount 5
- Threads 32
- PlusPlus 1
Example of how we run it in the terminal
Todo update actual example
./metapath2vec
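Until the actual example is filled in, the parameters listed above can be assembled into a command line as sketched below. This is an illustration, not the exact command used for the manuscript: the flag names are assumed to follow the word2vec-style interface of the public metapath2vec code, and `walks.txt` and `embeddings` are hypothetical file names. Check the usage text printed by your own build before relying on it.

```python
import shlex


def build_metapath2vec_command(walk_file, output_prefix, binary="./metapath2vec"):
    """Assemble an argument list from the parameters listed in this README.

    Flag names are an assumption (word2vec-style); verify them against the
    help text of the binary you compiled.
    """
    return [
        binary,
        "-train", walk_file,       # random-walk file produced by the notebook
        "-output", output_prefix,  # where the embedding files are written
        "-pp", "1",                # PlusPlus 1
        "-size", "256",            # Size 256
        "-window", "7",            # Window 7
        "-negative", "5",          # Negative 5
        "-min-count", "5",         # MinCount 5
        "-threads", "32",          # Threads 32
    ]


# Hypothetical input and output names, for illustration only.
cmd = build_metapath2vec_command("walks.txt", "embeddings")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` to launch the binary from Python instead of typing the command by hand.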
See the paper for a better explanation
Dong, Y., Chawla, N. V., & Swami, A. (2017, August). metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 135-144).
------ TO DO fill out the rest of the documentation
This section describes using the graphpattern2vec model to produce link predictions with a knowledge graph.
To use the model immediately with the example data provided, see the example code in the graphpattern2vec_process-multithread-Edited2022 notebook. The notebook can be viewed using JupyterLab, which is included in our environment. You can launch it by running jupyter lab.
Note if not using the provided example embeddings:
Run all cells in the notebook up to the Link Prediction section.
The embeddings used for predictions are generated from the output of graphpattern2vec (nodes collected through the modified walk algorithm), which is then manually converted with metapath2vec; this step is done outside of the notebook.
After supplying the embedding file generated outside of the notebook, run the remaining cells from the Link Prediction section onward.
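To inspect an embedding file generated outside the notebook before supplying it, a small reader like the one below can help. It is a sketch that assumes the word2vec-style text format (a header line with the vocabulary size and dimension, then one token and its vector per line); the notebook may load the file differently, and `load_text_embeddings` is a hypothetical helper name.

```python
def load_text_embeddings(path):
    """Read a word2vec-style text embedding file into a dict.

    Assumed layout: first line "<vocab_size> <dim>", then one line per node
    with the node name followed by `dim` floats.
    """
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        header = handle.readline().split()
        dim = int(header[1])
        for line in handle:
            parts = line.split()
            if len(parts) != dim + 1:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim
```

Checking that the node count and dimension (256 with the parameters listed above) match expectations is a cheap way to catch a truncated or mismatched file.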