PORCELAN: Integrating representation learning, permutation, and optimization to detect lineage-related gene expression patterns
This repository contains code for the paper "Integrating representation learning, permutation, and optimization to detect lineage-related gene expression patterns". We developed Permutation, Optimization, and Representation learning-based single Cell gene Expression and Lineage ANalysis (PORCELAN) to identify lineage-informative genes or subtrees in which lineage and expression are tightly coupled. The repository is organized as follows:
data/ contains Jupyter notebooks for downloading, simulating, and pre-processing the datasets used in the paper's results. For convenience, we also provide pre-processed data files in data/preprocessed. See data/README.md for further details.
figure_notebooks/ contains Jupyter notebooks to reproduce the paper's main and supplemental figures. Most results can be reproduced in a few seconds or minutes, but for convenience we also provide the data files for the results displayed in the figures in results/. See figure_notebooks/README.md for further details.
tutorial/ contains a Jupyter notebook that walks the user through applying all components of PORCELAN to an example tumor. See tutorial/README.md for further details.
Python:
This repository was developed using Python 3.8. You can use Conda to create a virtual environment for a specific Python version. Additional required packages are listed in requirements.txt
and can be installed using the following command:
pip install -r requirements.txt
Installing dependencies can take anywhere from a few minutes to an hour, depending on how many packages need to be downloaded rather than reused from cached versions.
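Once installation finishes, a quick check can confirm that every package listed in requirements.txt is present in the active environment. The snippet below is a minimal sketch and is not part of this repository: the helper name missing_requirements is hypothetical, it compares distribution names only (version pins, extras, and environment markers are ignored), and it skips pip options such as -r or -e rather than following them. It relies on importlib.metadata, which is in the standard library from Python 3.8 onward.

```python
import importlib.metadata
import os
import re
import sys
from typing import List


def missing_requirements(path: str = "requirements.txt") -> List[str]:
    """Return the names of requirements listed in `path` that are not installed.

    Hypothetical helper: only the distribution name on each line is checked;
    version specifiers, extras, and environment markers are ignored.
    """
    missing = []
    with open(path) as fh:
        for raw in fh:
            line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
            if not line or line.startswith("-"):  # skip blank lines and pip options (-r, -e, ...)
                continue
            # The distribution name ends at the first extras bracket, version
            # operator, environment marker, URL marker, or whitespace character.
            name = re.split(r"[\[<>=!~;@\s]", line, maxsplit=1)[0]
            try:
                importlib.metadata.version(name)
            except importlib.metadata.PackageNotFoundError:
                missing.append(name)
    return missing


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if (major, minor) != (3, 8):
        print("Note: running Python %d.%d; the repository was developed with 3.8." % (major, minor))
    if not os.path.exists("requirements.txt"):
        print("requirements.txt not found; run this from the repository root.")
    else:
        missing = missing_requirements()
        print("All requirements installed." if not missing else "Missing: " + ", ".join(missing))
```

Saved under a name of your choice (for example check_env.py, a hypothetical file name) and run from the repository root, it prints either a confirmation or the names of packages that still need to be installed.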
R:
R is only needed to simulate lineage-resolved gene expression data with TedSim (see the TedSim installation instructions).
Operating system and hardware:
We tested this code on a machine running Rocky Linux 8.8 (Green Obsidian) and equipped with an NVIDIA RTX A6000 GPU.