
Installation

Kutluhan Incekara edited this page Nov 20, 2023 · 8 revisions

Cloud installation (TERRA)

If you don't have an account on Terra, you must first set up a Google Cloud and Terra account. You can find more information here.

If you are already a Terra user, you can find C-BIRD at this Dockstore link. You can add C-BIRD to your workflows by clicking the Terra button on the top right. Please check the Required files section below for other requirements.

Local Installation

Dependencies

WDL workflows need a workflow engine and a containerization solution to run. You must install one of the workflow engines and one of the container runtimes below on a Unix-based operating system. Please note that C-BIRD was tested with Cromwell and uDocker on our local HPC, and running instructions are given for Cromwell only.

Workflow engines:

Container runtimes:

Notes: Cromwell requires a Java 11 runtime environment. Miniwdl requires Python 3.6 or later. uDocker is an alternative container runtime that can execute Docker containers without root privileges.
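Before installing a workflow engine, it can help to see which prerequisites are already present. The following is a minimal sketch, not part of C-BIRD: the helper name `have_cmd` is hypothetical, and the executable names `docker` and `udocker` are assumptions about how the container runtimes are installed on your system.

```shell
#!/usr/bin/env bash
# have_cmd NAME: print "found" if NAME is on PATH, otherwise "missing".
# It only reports; it never aborts the script.
have_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found"
    else
        echo "missing"
    fi
}

# Prerequisites of the workflow engines named in the notes above.
echo "java (needed by Cromwell):   $(have_cmd java)"
echo "python3 (needed by miniwdl): $(have_cmd python3)"

# Only one container runtime is needed. The executable names are assumptions.
for runtime in docker udocker; do
    echo "container runtime ${runtime}: $(have_cmd "${runtime}")"
done

# Print the installed versions so they can be compared with the
# requirements above (Java 11 for Cromwell, Python 3.6 for miniwdl).
if [ "$(have_cmd java)" = "found" ]; then
    java -version 2>&1 | head -n 1 || true
fi
if [ "$(have_cmd python3)" = "found" ]; then
    python3 --version || true
fi
```

A "missing" line for an engine or runtime you do not plan to use can be ignored; only one of each is required.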

Getting C-BIRD

You can get C-BIRD via git.

git clone https://github.com/Kincekara/C-BIRD.git

Required files

👉 For a quick start, you can collect the required files from here (except the Kraken2/Bracken database). Terra users can upload the required files into their workspace and use them as workspace data. (Edit: Those files are not needed for version 1.3.0 and above)

1. Kraken2/Bracken database: C-BIRD uses Kraken2 and Bracken to profile the reads. Please use a precompiled RefSeq database from Ben Langmead's page. Those packages contain a Kraken2 database along with prebuilt Bracken database files. If you want to use your own database, you need to add the Bracken database files to your compressed database file. The standard-8 package is recommended for an efficient runtime.

2. BUSCO database: Download bacteria_odb10 from the BUSCO page.

3. Mash Sketch: C-BIRD uses a curated Mash sketch for identification that you can download here. Please do not try to use your own sketch. If the target organism is not part of the sketch, you won't see a predicted organism in the results. However, you can still see the top taxon in Bracken's results.

4. NCBI's AMRFinderPlus database: Because the database structure changed in newer versions of AMRFinderPlus, C-BIRD cannot use a database later than 2022-10-11.2. This will be fixed in the next version. The database must be provided as a compressed archive (tar.gz).

5. PlasmidFinder database: The database must be provided as a compressed archive (tar.gz). The latest database can be found here. You may need KMA to index the files.

6. NCBI's genome statistics: The latest genome size statistics can be found here. Please use an uncompressed text file as input.

7. Adapters fasta (Optional): You can specify adapter sequences to be trimmed. If C-BIRD cannot find this file, it activates adapter auto-detection for paired-end (PE) reads.

8. Target genes fasta (Optional): If you are looking for particular genes or protein sequences, you can provide them as an input to C-BIRD. This file should contain FASTA-formatted protein sequence(s).
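Items 4 and 5 above ask for tar.gz archives, and item 6 asks for an uncompressed text file. The following is a rough sketch of how such inputs could be prepared with standard tools. The function names `pack_db` and `list_db` and all paths and file names are hypothetical. This page does not state whether C-BIRD expects a top-level directory inside each archive, so compare the listing with the files from the quick-start link before relying on this layout.

```shell
#!/usr/bin/env bash
# pack_db DIR OUT.tar.gz
# Bundle a database directory into a gzip-compressed tar archive.
pack_db() {
    local db_dir="$1" out="$2"
    if [ ! -d "${db_dir}" ]; then
        echo "error: ${db_dir} is not a directory" >&2
        return 1
    fi
    # -C switches to the parent directory first, so the archive stores
    # only the directory's base name rather than its full path.
    tar -czf "${out}" -C "$(dirname "${db_dir}")" "$(basename "${db_dir}")"
}

# list_db ARCHIVE.tar.gz
# Show what an archive contains before passing it to the workflow.
list_db() {
    tar -tzf "$1"
}

# Example calls (all paths and file names are placeholders; adjust and uncomment):
# pack_db ./amrfinder_db/2022-10-11.2 amrfinder_db.tar.gz
# pack_db ./plasmidfinder_db plasmidfinder_db.tar.gz
# list_db plasmidfinder_db.tar.gz
#
# Item 6 expects plain text, so decompress a gzipped statistics file first:
# gzip -dc genome_stats.txt.gz > genome_stats.txt
```

Listing the archive before use is a cheap way to catch an archive that was built from the wrong directory level.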
