Re: Reading GTF file #3

Open
mdhe1248 opened this issue Aug 10, 2018 · 6 comments

Comments

@mdhe1248

Background

I am trying to load a GTF file so that I can compute the coverage of my sequencing reads on each gene. However, reading it fails with errors, and I cannot access the file's contents. How can I open a GTF file?
By the way, my GTF file was downloaded from Gencode and contains mouse genome annotations.

Current Behavior

I tried to read my GTF file by passing the file name to the reader constructor, but I got an error message.

using GenomicFeatures
GFF3.Reader( "gencode.vM18.annotation.gtf")

ERROR: MethodError: Cannot `convert` an object of type String to an object of type GenomicFeatures.GFF3.Reader
This may have arisen from a call to the constructor GenomicFeatures.GFF3.Reader(...), since type constructors fall back to convert methods.

I then tried `open`, and this time I didn't get any error message.

reader = open(GFF3.Reader, "gencode.vM18.annotation.gtf")

GenomicFeatures.GFF3.Reader(BioCore.Ragel.State{BufferedStreams.BufferedInputStream{IOStream}}(BufferedStreams.BufferedInputStream{IOStream}(<128.0 KiB buffer, 100% filled, data immobilized>), -27, 6, false), false, Symbol[:feature], false, GenomicFeatures.GFF3.Record[], 0, 5)

Then I called IntervalCollection on the reader, but got an error:

features = IntervalCollection(reader)

ERROR: GenomicFeatures.GFF3.Reader file format error on line 6 ~>"; gene_t"
Stacktrace:
 [1] _read!(::GenomicFeatures.GFF3.Reader, ::BioCore.Ragel.State{BufferedStreams.BufferedInputStream{IOStream}}, ::GenomicFeatures.GFF3.Record) at /home/donghoon/.julia/v0.6/BioCore/src/ReaderHelper.jl:164
 [2] read! at /home/donghoon/.julia/v0.6/BioCore/src/ReaderHelper.jl:134 [inlined]
 [3] tryread!(::GenomicFeatures.GFF3.Reader, ::GenomicFeatures.GFF3.Record) at /home/donghoon/.julia/v0.6/BioCore/src/Ragel.jl:241                                     
 [4] start(::GenomicFeatures.GFF3.Reader) at /home/donghoon/.julia/v0.6/BioCore/src/Ragel.jl:258                                                                       
 [5] _collect(::Type{GenomicFeatures.Interval{GenomicFeatures.GFF3.Record}}, ::GenomicFeatures.GFF3.Reader, ::Base.SizeUnknown) at ./array.jl:394                      
 [6] GenomicFeatures.IntervalCollection(::GenomicFeatures.GFF3.Reader) at /home/donghoon/.julia/v0.6/GenomicFeatures/src/gff3/reader.jl:73     

Your Environment

  • Package Version used: 0.2.1
  • Julia Version used: 0.6.4
  • Operating System and version (desktop or mobile): Ubuntu 16.04.5
  • Link to your project:
@TransGirlCodes
Member

Hi @mdhe1248, can you provide us with the file you are trying to read? This is a file format error thrown by the reader. It could mean several things: the file's formatting may be non-standard, there may be a bug in the reader, or the GFF3 grammar we use to generate the reader may be too strict.

@mdhe1248
Author

mdhe1248 commented Aug 11, 2018

Hi @benjward ,

GTF is a bit different from GFF3, which may be why I get the errors above. GENCODE combines manual annotations from the HAVANA group with automatic annotations from Ensembl. https://en.wikipedia.org/wiki/GENCODE

This is the output of head gencode.vM18.annotation.gtf in the terminal.

$ head gencode.vM18.annotation.gtf

##description: evidence-based annotation of the mouse genome (GRCm38), version M18 (Ensembl 93)
##provider: GENCODE
##contact: [email protected]
##format: gtf
##date: 2018-07-02
chr1    HAVANA  gene    3073253 3074322 .       +       .       gene_id "ENSMUSG00000102693.1"; gene_type "TEC"; gene_name "RP23-271O17.1"; level 2; havana_gene "OTTMUSG00000049935.1";
chr1    HAVANA  transcript      3073253 3074322 .       +       .       gene_id "ENSMUSG00000102693.1"; transcript_id "ENSMUST00000193812.1"; gene_type "TEC"; gene_name "RP23-271O17.1"; transcript_type "TEC"; transcript_name "RP23-271O17.1-001"; level 2; transcript_support_level "NA"; tag "basic"; havana_gene "OTTMUSG00000049935.1"; havana_transcript "OTTMUST00000127109.1";
chr1    HAVANA  exon    3073253 3074322 .       +       .       gene_id "ENSMUSG00000102693.1"; transcript_id "ENSMUST00000193812.1"; gene_type "TEC"; gene_name "RP23-271O17.1"; transcript_type "TEC"; transcript_name "RP23-271O17.1-001"; exon_number 1; exon_id "ENSMUSE00001343744.1"; level 2; transcript_support_level "NA"; tag "basic"; havana_gene "OTTMUSG00000049935.1"; havana_transcript "OTTMUST00000127109.1";
chr1    ENSEMBL gene    3102016 3102125 .       +       .       gene_id "ENSMUSG00000064842.1"; gene_type "snRNA"; gene_name "Gm26206"; level 3;
chr1    ENSEMBL transcript      3102016 3102125 .       +       .       gene_id "ENSMUSG00000064842.1"; transcript_id "ENSMUST00000082908.1"; gene_type "snRNA"; gene_name "Gm26206"; transcript_type "snRNA"; transcript_name "Gm26206-201"; level 3; transcript_support_level "NA"; tag "basic";

GTF is also widely used and is closer to GFF2 than to GFF3. Perhaps it would make sense to add a GFF2.Reader or GTF.Reader, or a reader that recognizes the different file extensions. I ran the same analysis in R and it has been running for several hours, so I hope Julia can speed it up.

I downloaded Comprehensive gene annotation from https://www.gencodegenes.org/mouse_releases/current.html
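
The `head` output above shows why a GFF3 reader rejects this file: column 9 holds space-separated `key "value";` pairs, whereas GFF3 expects `key=value` pairs. The reported error position (`~>"; gene_t"`) is likely the first `; ` separator on the first feature line. Until a GTF reader exists, a minimal sketch of a hand-rolled parser could look like the following. It uses only Base Julia and assumes Julia 1.x syntax (not the 0.6.4 used in this report); the names `GTFRecord`, `parse_gtf_attributes`, `parse_gtf_line`, and `read_gtf` are hypothetical and are not part of GenomicFeatures.jl.

```julia
# Hypothetical record type for one GTF feature line (not a BioJulia type).
struct GTFRecord
    seqid::String
    source::String
    feature::String
    start::Int
    stop::Int
    score::String
    strand::Char
    frame::String
    attributes::Vector{Pair{String,String}}
end

# Parse column 9, e.g. `gene_id "X"; level 2;`, into ordered key => value pairs.
# Values may be quoted strings or bare tokens; surrounding quotes are removed.
function parse_gtf_attributes(text::AbstractString)
    attrs = Pair{String,String}[]
    for m in eachmatch(r"([^\s;]+)\s+(\"[^\"]*\"|[^;\s]+)", text)
        key = String(m.captures[1])
        value = String(strip(m.captures[2], '"'))
        push!(attrs, key => value)
    end
    return attrs
end

# Split one feature line into its nine tab-separated columns.
function parse_gtf_line(line::AbstractString)
    fields = split(chomp(line), '\t')
    length(fields) == 9 ||
        error("expected 9 tab-separated columns, got $(length(fields))")
    return GTFRecord(fields[1], fields[2], fields[3],
                     parse(Int, fields[4]), parse(Int, fields[5]),
                     fields[6], fields[7][1], fields[8],
                     parse_gtf_attributes(fields[9]))
end

# Read every feature line of a file, skipping blank lines and `#` headers.
function read_gtf(path::AbstractString)
    records = GTFRecord[]
    for line in eachline(path)
        (isempty(line) || startswith(line, "#")) && continue
        push!(records, parse_gtf_line(line))
    end
    return records
end

# Usage on the first feature line shown above (columns are tab-separated).
line = "chr1\tHAVANA\tgene\t3073253\t3074322\t.\t+\t.\t" *
       "gene_id \"ENSMUSG00000102693.1\"; gene_type \"TEC\"; " *
       "gene_name \"RP23-271O17.1\"; level 2; " *
       "havana_gene \"OTTMUSG00000049935.1\";"
rec = parse_gtf_line(line)
```

This keeps the whole file in memory and does no validation beyond the column count, so it is a starting point for inspection rather than a replacement for a generated parser.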

@TransGirlCodes
Member

TransGirlCodes commented Aug 11, 2018

Hi @mdhe1248, I'm surprised that we don't already have a GTF parser!
I'll look at making one today at the JuliaCon hackathon.

@mdhe1248
Author

mdhe1248 commented Aug 11, 2018

Oh, my PI must be there, probably talking about the debugger. :)

Thank you for your help.

@TransGirlCodes
Member

Is your PI Tim Holy? In the meantime, if you can, I'd suggest converting the GTF file to GFF3 before using it with BioJulia, while a proper GFF2 parser is worked out.
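
As a rough illustration of the suggested conversion, the sketch below rewrites only column 9, from GTF style (`key "value";`) to GFF3 style (`key=value;`), and joins repeated keys such as `tag` with commas. It assumes Julia 1.x and uses only Base; `gtf_to_gff3_line` and `gtf_to_gff3` are hypothetical helper names, not functions from any BioJulia package. It does not percent-encode reserved characters and does not add the `ID`/`Parent` attributes that GFF3 uses to express the gene/transcript/exon hierarchy, and whether `GFF3.Reader` accepts the result has not been verified here.

```julia
# Hypothetical helper: convert one GTF feature line to a GFF3-style line.
function gtf_to_gff3_line(line::AbstractString)
    fields = split(chomp(line), '\t')
    length(fields) == 9 || error("expected 9 tab-separated columns")

    # Collect attribute values per key, remembering first-seen key order.
    keys_in_order = String[]
    values = Dict{String,Vector{String}}()
    for m in eachmatch(r"([^\s;]+)\s+(\"[^\"]*\"|[^;\s]+)", fields[9])
        key = String(m.captures[1])
        value = String(strip(m.captures[2], '"'))
        haskey(values, key) || push!(keys_in_order, key)
        push!(get!(values, key, String[]), value)
    end

    # GFF3 attributes: key=value pairs separated by ';', multiple values by ','.
    attrs = join((k * "=" * join(values[k], ",") for k in keys_in_order), ";")
    return string(join(fields[1:8], '\t'), '\t', attrs)
end

# Hypothetical helper: convert a whole file, dropping the original `#` headers
# and writing the GFF3 version directive instead.
function gtf_to_gff3(input::AbstractString, output::AbstractString)
    open(output, "w") do out
        println(out, "##gff-version 3")
        for line in eachline(input)
            (isempty(line) || startswith(line, "#")) && continue
            println(out, gtf_to_gff3_line(line))
        end
    end
end

# Usage on one of the ENSEMBL gene lines from the file (tab-separated columns).
gtf_line = "chr1\tENSEMBL\tgene\t3102016\t3102125\t.\t+\t.\t" *
           "gene_id \"ENSMUSG00000064842.1\"; gene_type \"snRNA\"; " *
           "gene_name \"Gm26206\"; level 3;"
converted = gtf_to_gff3_line(gtf_line)
```

Dedicated conversion tools handle the hierarchy and escaping rules properly, so this is only useful for a quick test of whether the attribute syntax is what trips the reader.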

@CiaranOMara CiaranOMara transferred this issue from BioJulia/GenomicFeatures.jl May 6, 2020
@jonathanBieler
Contributor


3 participants