*Last updated: Aug 20, 2020*
FunGAP is freely available for academic use. For commercial use or licensing of FunGAP, please contact In-Geol Choi (email: igchoi (at) korea.ac.kr). Please cite the following reference.
Reference: Byoungnam Min, Igor V Grigoriev, In-Geol Choi, FunGAP: Fungal Genome Annotation Pipeline using evidence-based gene model evaluation (2017), Bioinformatics, Volume 33, Issue 18, Pages 2936–2937, https://doi.org/10.1093/bioinformatics/btx353
Please don't hesitate to post on Issues or contact me ([email protected]) for help. These steps were tested on a freshly installed Ubuntu 18.04 LTS.
Using Docker is the most reliable and robust way to install FunGAP. Please follow the Docker installation instructions.
Although we recommend using Docker, it is not available in some environments (e.g., HPC clusters). In that case, please use the following instructions for a conda-based FunGAP installation.
- Hisat2 v2.2.0
- Trinity v2.11.0
- RepeatModeler v2.0.1
- Maker v2.31.10
- GeneMark-ES/ET v4.59_lic
- Augustus v3.3.3
- Braker v2.1.5
- BUSCO v4.1.2
- Pfam_scan v1.6
- BLAST v2.9.0+
- Samtools v1.10
- Bamtools v2.5.1
- Pfam release 33.1
Download and install Anaconda3 (we assume that you install it in $HOME/anaconda3).
cd $HOME
wget https://repo.anaconda.com/archive/Anaconda3-2020.07-Linux-x86_64.sh
bash Anaconda3-2020.07-Linux-x86_64.sh
echo ". $HOME/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc
source $HOME/.bashrc
which conda # It should be $HOME/anaconda3/condabin/conda
Set up the channels.
# Add two channels
conda config --add channels bioconda
conda config --add channels conda-forge
# Check the channels
conda config --show channels
# channels:
# - conda-forge
# - bioconda
# - defaults
# Remove channels if you have unnecessary channels
conda config --remove channels bioconda/label/cf201901
conda config --remove channels conda-forge/label/cf201901
conda update conda
conda create -y -n fungap
conda activate fungap
conda install braker2=2.1.5 trinity=2.11.0 repeatmodeler=2.0.1 hisat2=2.2.0 pfam_scan=1.6 busco=4.1.2
pip install biopython==1.77 bcbio-gff markdown2 matplotlib
cpanm YAML Hash::Merge Logger::Simple Parallel::ForkManager MCE::Mutex Thread::Queue threads
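After the installs above, it can be worth confirming that the expected executables are actually on the PATH of the activated environment. The following is a minimal sketch; `missing_tools` is a hypothetical helper (not part of FunGAP), and the command names checked are only the ones that appear elsewhere in this guide.

```shell
# missing_tools is a hypothetical helper, not part of FunGAP.
# It prints the names of any commands that cannot be found on PATH.
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space before printing
    echo "${missing# }"
}

# Run inside the activated fungap environment.
# An empty line means every command was found.
missing_tools braker.pl RepeatMasker rmblastn hmmpress diamond
```

If a name is printed, re-check the corresponding `conda install` step before continuing.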
Because Maker is incompatible with the other dependencies (it requires Python 2), we create a separate environment and install Maker in it.
conda deactivate
conda create -n maker -c bioconda maker=2.31.10
Download FunGAP by cloning the GitHub repository. Here we install FunGAP in your $HOME directory, but you are free to change the location. $FUNGAP_DIR is going to be your FunGAP installation directory.
cd $HOME # or wherever you want
git clone https://github.com/CompSynBioLab-KoreaUniv/FunGAP.git
export FUNGAP_DIR=$(realpath FunGAP/)
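Note that `export` only sets `FUNGAP_DIR` for the current shell session, while the remaining steps keep referring to it. One possible way to keep it across sessions is to append the export line to `~/.bashrc`. The sketch below uses a hypothetical helper, `append_once`, which is not part of FunGAP; it adds a line only if that exact line is not already in the file.

```shell
# append_once is a hypothetical helper, not part of FunGAP.
# It appends a line to a file only if that exact line is not already there,
# so running it repeatedly does not create duplicates.
append_once() {
    line=$1
    file=$2
    touch "$file"
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (uncomment to apply): record the current value of FUNGAP_DIR
# append_once "export FUNGAP_DIR=$FUNGAP_DIR" "$HOME/.bashrc"
```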
Download the Pfam databases from ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release into your $FUNGAP_DIR/db directory.
mkdir -p $FUNGAP_DIR/db/pfam
cd $FUNGAP_DIR/db/pfam
wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz
wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz
gunzip Pfam-A.hmm.gz
gunzip Pfam-A.hmm.dat.gz
conda activate fungap
hmmpress Pfam-A.hmm # HMMER package (would be automatically installed in the above Anaconda step)
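`hmmpress` writes four binary index files next to the HMM file (`.h3f`, `.h3i`, `.h3m`, `.h3p`). A quick way to confirm that the download and pressing steps completed is to check that all of them exist and are non-empty. This is a minimal sketch; `pfam_db_ready` is a hypothetical helper, not part of FunGAP or HMMER.

```shell
# pfam_db_ready is a hypothetical helper, not part of FunGAP or HMMER.
# It succeeds only if Pfam-A.hmm, Pfam-A.hmm.dat and the four hmmpress
# index files are all present and non-empty in the given directory.
pfam_db_ready() {
    dir=$1
    for f in Pfam-A.hmm Pfam-A.hmm.dat \
             Pfam-A.hmm.h3f Pfam-A.hmm.h3i Pfam-A.hmm.h3m Pfam-A.hmm.h3p; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            return 1
        fi
    done
    return 0
}

# Example:
# pfam_db_ready "$FUNGAP_DIR/db/pfam" && echo "Pfam database looks complete"
```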
Go to the site below and download GeneMark-ES/ET: http://topaz.gatech.edu/GeneMark/license_download.cgi. Don't forget to download the key, too.
mkdir $FUNGAP_DIR/external/
mv gmes_linux_64.tar.gz gm_key_64.gz $FUNGAP_DIR/external/ # Move your downloaded files to this directory
cd $FUNGAP_DIR/external/
tar -zxvf gmes_linux_64.tar.gz
gunzip gm_key_64.gz
cp gm_key_64 ~/.gm_key
GeneMark is hardcoded to use /usr/bin/perl instead of the conda-installed perl. You can change this by running the change_path_in_perl_scripts.pl script.
cd $FUNGAP_DIR/external/gmes_linux_64/
perl change_path_in_perl_scripts.pl "/usr/bin/env perl"
cd $FUNGAP_DIR/external/gmes_linux_64/
./gmes_petap.pl
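If `gmes_petap.pl` still fails to find Perl modules, one thing to check is whether any script kept the hardcoded interpreter. The sketch below assumes that `change_path_in_perl_scripts.pl` rewrites the first (shebang) line of each Perl script; `list_hardcoded_perl` is a hypothetical helper, not part of GeneMark or FunGAP, and it only inspects `*.pl` files directly inside the given directory.

```shell
# list_hardcoded_perl is a hypothetical helper, not part of GeneMark.
# It prints every *.pl file in a directory whose first line still starts
# with the hardcoded "#!/usr/bin/perl" shebang (with or without options).
list_hardcoded_perl() {
    dir=$1
    for f in "$dir"/*.pl; do
        [ -f "$f" ] || continue
        first=$(head -n 1 "$f")
        case $first in
            '#!/usr/bin/perl'*) echo "$f" ;;
        esac
    done
}

# Example: no output means no top-level script uses the hardcoded path
# list_hardcoded_perl "$FUNGAP_DIR/external/gmes_linux_64"
```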
conda activate fungap
cd $(dirname $(which RepeatMasker))/../share/RepeatMasker
# ./configure downloads required databases
echo -e "\n2\n$(dirname $(which rmblastn))\n\n5\n" > tmp && ./configure < tmp
# It should look like this
ls $(dirname $(which RepeatMasker))/../share/RepeatMasker/Libraries
# Artefacts.embl Dfam.hmm RepeatAnnotationData.pm RepeatMasker.lib.nin RepeatPeps.lib RepeatPeps.lib.psq
# CONS-Dfam_3.0 README.meta RepeatMasker.lib RepeatMasker.lib.nsq RepeatPeps.lib.phr RepeatPeps.readme
# Dfam.embl RMRBMeta.embl RepeatMasker.lib.nhr RepeatMaskerLib.embl RepeatPeps.lib.pin taxonomy.dat
The set_dependencies.py script allows users to set and test (with the --help command) all the dependencies. If this script runs without any issue, you are ready to run FunGAP!
cd $FUNGAP_DIR
conda activate maker
export MAKER_DIR=$(dirname $(which maker))
echo $MAKER_DIR # /home/ubuntu/anaconda3/envs/maker/bin
conda activate fungap
./set_dependencies.py \
--pfam_db_path db/pfam/ \
--genemark_path external/gmes_linux_64/ \
--maker_path ${MAKER_DIR}
You have to fix this bug in Braker's filterGenesIn_mRNAname.pl script; otherwise, you will encounter this error.
ERROR: Number of good genes is 0, so the parameters cannot be optimized. Recomended are at least 300 genes
WARNING: Number of good genes is low (0). Recomended are at least 300 genes
conda activate fungap
cd $(dirname $(which braker.pl))
vim filterGenesIn_mRNAname.pl
Go to line 38, and add a "?" character.
From
if ( $_ =~ m/transcript_id \"(.*)\"/ ) {
to
if ( $_ =~ m/transcript_id \"(.*?)\"/ ) {
The conda-installed diamond doesn't work at the moment, so replace it with a newer release.
conda activate fungap
which diamond # It should look like */conda/fungap/bin/diamond
cp $(which diamond) $(which diamond).backup
wget https://github.com/bbuchfink/diamond/releases/download/v2.0.0/diamond-linux64.tar.gz
tar -xf diamond-linux64.tar.gz
mv diamond $(dirname $(which diamond))
To test FunGAP, you can download the yeast (Saccharomyces cerevisiae) genome assembly (FASTA) and RNA-seq reads (two FASTQ files) from NCBI.
# Download RNA-seq reads using SRA toolkit (https://ncbi.github.io/sra-tools/install_config.html)
# Parameter -X indicates that we only need <int> pairs from the dataset.
fastq-dump -X 1000000 -I --split-files SRR1198667
# Download assembly
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64/GCF_000146045.2_R64_genomic.fna.gz
gunzip GCF_000146045.2_R64_genomic.fna.gz
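Before starting a long run, it may help to confirm that both FASTQ files were written completely and contain the same number of reads. The sketch below assumes the usual four-lines-per-record FASTQ layout; `count_fastq_reads` is a hypothetical helper, not part of FunGAP or the SRA toolkit.

```shell
# count_fastq_reads is a hypothetical helper, not part of FunGAP.
# It assumes each FASTQ record spans exactly four lines, so
# number of reads = number of lines / 4.
count_fastq_reads() {
    lines=$(wc -l < "$1" | tr -d ' ')
    echo $(( lines / 4 ))
}

# Example: with -X 1000000, both mates should report the same count,
# at most 1000000
# count_fastq_reads SRR1198667_1.fastq
# count_fastq_reads SRR1198667_2.fastq
```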
$FUNGAP_DIR/download_sister_orgs.py \
--taxon "Saccharomyces cerevisiae" \
--email_address <your_email_address> \
--num_sisters 1
zcat sister_orgs/*faa.gz > prot_db.faa
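A simple check on the concatenated protein database is to count its FASTA header lines; zero would suggest that the sister proteome download did not produce any sequences. `count_fasta_records` below is a hypothetical helper, not part of FunGAP.

```shell
# count_fasta_records is a hypothetical helper, not part of FunGAP.
# It counts lines starting with ">" (one per sequence in a FASTA file).
# grep exits non-zero when the count is 0, so "|| true" keeps the
# function usable under "set -e".
count_fasta_records() {
    grep -c '^>' "$1" || true
}

# Example:
# count_fasta_records prot_db.faa
```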
$FUNGAP_DIR/get_augustus_species.py \
--genus_name "Saccharomyces" \
--email_address <your_email_address>
- saccharomyces_cerevisiae_S288C
$FUNGAP_DIR/fungap.py \
--genome_assembly GCF_000146045.2_R64_genomic.fna \
--trans_read_1 SRR1198667_1.fastq \
--trans_read_2 SRR1198667_2.fastq \
--augustus_species saccharomyces_cerevisiae_S288C \
--busco_dataset ascomycota_odb10 \
--sister_proteome prot_db.faa \
--num_cores 8
It took about 9 hours on a dual Intel(R) Xeon(R) CPU E5-2670 v3 machine with 40 CPU cores.