Benchmark

Datasets and results are described at http://shenwei356.github.io/seqkit/benchmark

The benchmark needs to be run on a Linux-like operating system.

Install software

Software

seqkit. (Go). Version v0.3.1.1.
fasta_utilities. (Perl). Version 3dcc0bc. Lots of dependencies to install.
fastx_toolkit. (Perl). Version 0.0.13. Can't handle multi-line FASTA files.
seqmagick. (Python). Version 0.6.1
seqtk. (C). Version 1.1-r92-dirty.

A Python script, memusg, was used to measure the running time and peak memory usage of each process.
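Before running anything, it can help to confirm that the tools listed above are on your PATH. The following is a minimal sketch, not part of the benchmark scripts. The function name check_tools is hypothetical, and the executable names in the usage note are assumptions about how each package is installed on your system.

```shell
#!/bin/sh
# Report which of the given commands are missing from PATH.
# Prints one "missing: NAME" line per absent command and
# returns non-zero if at least one command is missing.
check_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_tools seqkit seqtk seqmagick memusg` checks four of the tools. fasta_utilities and fastx_toolkit install several separate executables, so pass whichever of their commands the benchmark scripts call.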

Attention: fasta_utilities uses the Perl module Term-ProgressBar, which causes it to fail when run through the benchmark script run_benchmark_00_all.pl. To work around this, edit the source of ProgressBar.pm (on my system the path is /usr/share/perl5/vendor_perl/Term/ProgressBar.pm) and add the line below after line 535:

$config{bar_width} = 1 if $config{bar_width} < 1;

The edited code is

} else {
  $config{bar_width}  = $target;
  $config{bar_width} = 1 if $config{bar_width} < 1;   # new line
  die "configured bar_width $config{bar_width} < 1"
  if $config{bar_width} < 1;
}
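The line number 535 may differ between versions of Term-ProgressBar, so the edit can also be anchored on the assignment line itself. The sketch below is one possible way to do that with awk and is an assumption, not the author's method. The function name patch_progressbar is hypothetical. It keeps a backup next to the file and does nothing if the line is already present.

```shell
#!/bin/sh
# Insert the clamp line right after the "$config{bar_width} = $target;"
# assignment in FILE. A backup is kept at FILE.bak. If the clamp line
# is already present, the file is left untouched.
patch_progressbar() {
    file=$1
    if grep -qF 'bar_width} = 1 if' "$file"; then
        return 0
    fi
    cp "$file" "$file.bak" || return 1
    awk '
        { print }
        index($0, "$config{bar_width}") && index($0, "= $target;") {
            print "  $config{bar_width} = 1 if $config{bar_width} < 1;   # new line"
        }
    ' "$file.bak" > "$file"
}
```

Editing a file under /usr/share usually requires root privileges, so you may prefer to try `patch_progressbar` on a copy first and compare the result with the edited code shown above.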

Clone this repository

git clone https://github.com/shenwei356/seqkit
cd seqkit/benchmark

Data preparation

The datasets are described at http://shenwei356.github.io/seqkit/benchmark/#datasets

Alternatively, download all test data as seqkit-benchmark-data.tar.gz (2.2G), uncompress it, and move the files into the directory seqkit/benchmark:

wget http://app.shenwei.me/data/seqkit/seqkit-benchmark-data.tar.gz
tar -zxvf seqkit-benchmark-data.tar.gz
mv seqkit-benchmark-data/* seqkit/benchmark

Run tests

A Perl script, run.pl, runs the tests automatically and generates the data for plotting.

$ perl run.pl -h
Usage:

1. Run all tests:

perl run.pl run_benchmark*.sh --outfile benchmark.5test.tsv

2. Run one test:

perl run.pl run_benchmark_04_remove_duplicated_seqs_by_name.sh -o benchmark.rmdup.tsv

3. Custom repeate times:

perl run.pl -n 3 run_benchmark_04_remove_duplicated_seqs_by_name.sh -o benchmark.rmdup.tsv

To compare performance between the different software tools, run:

./run.pl run_benchmark*.sh -n 3 -o benchmark.5tests.tsv

To test performance of other functions in seqkit, run:

./run.pl run_test*.sh -n 1 -o benchmark.seqkit.tsv

Plot results

The R libraries dplyr, ggplot2, scales, ggthemes, and ggrepel are required.

Plot the results of the five tests:

./plot.R -i benchmark.5tests.tsv

Plot the results of the tests of other functions in seqkit:

./plot.R -i benchmark.seqkit.tsv --width 5 --height 3

./plot.R -i benchmark.5tests.tsv --width 8 --height 3 --lx 0.75 --ly 0.3
