title: "GFF3toEMBL: Preparing annotated assemblies for submission to EMBL"
tags:
  - bioinformatics
  - annotation
  - EMBL
authors:
  - name: Andrew J. Page
    orcid: 0000-0001-6919-6062
    affiliation: Pathogen Informatics, Wellcome Trust Sanger Institute
  - name: Sascha Steinbiss
    orcid: 0000-0002-2151-0574
    affiliation: Pathogen Informatics, Wellcome Trust Sanger Institute
  - name: Ben Taylor
    affiliation: Pathogen Informatics, Wellcome Trust Sanger Institute
  - name: Jacqueline A. Keane
    orcid: 0000-0002-2021-1863
    affiliation: Pathogen Informatics, Wellcome Trust Sanger Institute
date: 19 Sept 2016
bibliography: paper.bib

Summary

An essential part of open, reproducible research in genomics is the deposition of annotated de novo assembled genomes in public archives such as EMBL/GenBank [@BLAXTER2016]. The interfaces provided by the major archives do not allow data to be submitted easily on a large scale without substantial prior knowledge on the part of the submitter. This has led to a situation where fewer than 15% of all sequenced bacteria have corresponding public assemblies. We address this by providing GFF3toEMBL, which takes the output of the most commonly used annotation tool, PROKKA [@SEEMANN2014], and converts it to a format suitable for submission to EMBL. It is robust, fast, memory efficient and well tested, and has been used to submit more than 30% of all annotated genomes in EMBL/GenBank [@PAGE2016]. It is a small but essential missing step in making genomic research more open and reproducible.
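
To give a feel for the kind of conversion described above, the following is a minimal Python sketch of how a single GFF3 feature line could be rewritten as EMBL feature-table ("FT") lines. It is not taken from GFF3toEMBL and makes no claim about how that tool is implemented. The function name `gff3_to_embl_feature`, the `KEPT_QUALIFIERS` whitelist and the example record are all hypothetical, and a real submission needs much more than this (entry header lines, a source feature, sequence data and validation).

```python
from urllib.parse import unquote

# Hypothetical whitelist: GFF3 attributes carried over as EMBL qualifiers.
KEPT_QUALIFIERS = ("locus_tag", "gene", "product")


def gff3_to_embl_feature(line: str) -> list[str]:
    """Rewrite one GFF3 feature line as EMBL feature-table lines (sketch)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("a GFF3 feature line has nine tab-separated columns")
    _seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols

    # EMBL locations are written start..end; reverse-strand features are
    # wrapped in complement(...).
    location = f"{int(start)}..{int(end)}"
    if strand == "-":
        location = f"complement({location})"

    # Feature key starts in column 6 and the location in column 22.
    out = ["FT   " + ftype.ljust(16) + location]

    # GFF3 attributes are ';'-separated key=value pairs with URL-style escapes.
    for pair in filter(None, attrs.split(";")):
        key, _, value = pair.partition("=")
        if key in KEPT_QUALIFIERS:
            out.append("FT" + " " * 19 + f'/{key}="{unquote(value)}"')
    return out


# Illustrative record only, not real annotation output.
example = (
    "contig1\tProdigal:2.6\tCDS\t10\t120\t.\t-\t0\t"
    "ID=ABC_00001;locus_tag=ABC_00001;product=hypothetical protein"
)
for ft_line in gff3_to_embl_feature(example):
    print(ft_line)
```

The sketch keeps only a small set of qualifiers to stay short. Which attributes map to which EMBL qualifiers in practice is a design decision that the archive's validation rules largely dictate.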

References