metadata.csv
bucket_id,name,prefix,path,value
BUCKET_ID,42bp_link_github,,,https://github.com/omgenomics/bio-data-zoo
BUCKET_ID,42bp_tooltip,bam/bad,truncated.bam,BAM file is truncated
BUCKET_ID,42bp_tooltip,bam/bad,bai_older_than_data.bam,BAI index is older than BAM by 1s (often a data transfer timing issue)
BUCKET_ID,42bp_tooltip,bam/bad,read_name_longer_than_254.sam,SAM file stores a read name longer than 254 characters
BUCKET_ID,42bp_tooltip,bam/good,basic_unsorted.bam,BAM file is not sorted by mapping position
BUCKET_ID,42bp_tooltip,bam/good,compressed.sam.gz,SAM file compressed with bgzip
BUCKET_ID,42bp_tooltip,bam/good,indexed_bai.bam,BAM file with BAI index
BUCKET_ID,42bp_tooltip,bam/good,indexed_csi.bam,BAM file with CSI index
BUCKET_ID,42bp_tooltip,bam/good,indexed_csi.sam.gz,SAM file with CSI index
BUCKET_ID,42bp_tooltip,bam/good,indexed_tbi.sam.gz,SAM file with TBI index
BUCKET_ID,42bp_tooltip,bam/good,no_mapped_reads.bam,BAM file with no mapping information
BUCKET_ID,42bp_tooltip,bed/bad,spaces.bed,BED file with spaces instead of tabs
BUCKET_ID,42bp_tooltip,bed/bad,negative_coords.bed,BED file with negative coordinates
BUCKET_ID,42bp_tooltip,bed/bad,start_greater_than_end_coords.bed,BED file with invalid range where start > end
BUCKET_ID,42bp_tooltip,bed/bad,non_integer_coords.bed,BED file with floating point coordinates instead of integers
BUCKET_ID,42bp_tooltip,bed/good,compressed.bed.gz,BED file compressed with bgzip
BUCKET_ID,42bp_tooltip,bed/good,indexed_csi.bed.gz,BED file with CSI index
BUCKET_ID,42bp_tooltip,bed/good,indexed_tbi.bed.gz,BED file with TBI index
BUCKET_ID,42bp_tooltip,bed/good,unsorted.bed,BED file is not sorted by start position
BUCKET_ID,42bp_tooltip,fasta/good,basic_aligned.fa,FASTA output by MSA tool
BUCKET_ID,42bp_tooltip,fasta/good,compressed.fa.gz,FASTA compressed with bgzip
BUCKET_ID,42bp_tooltip,fasta/good,duplicate_sequence_names.fa,FASTA with duplicate sequence names
BUCKET_ID,42bp_tooltip,fasta/good,empty_lines.fa,FASTA with empty lines between sequences
BUCKET_ID,42bp_tooltip,fasta/good,multiline.fa,FASTA with sequences split across multiple lines
BUCKET_ID,42bp_tooltip,fasta/good,name_contains_spaces.fa,FASTA with spaces in sequence name
BUCKET_ID,42bp_tooltip,fastq/bad,quality_mismatch.fastq,FASTQ where 2nd read has len(sequence) != len(quality)
BUCKET_ID,42bp_tooltip,fastq/bad,truncated_clean.fastq,FASTQ where 3rd read is truncated right after the sequence
BUCKET_ID,42bp_tooltip,fastq/bad,truncated_halfway.fastq,FASTQ where 2nd read is truncated half-way through the sequence
BUCKET_ID,42bp_tooltip,fastq/good,compressed.fastq.gz,FASTQ compressed with bgzip
BUCKET_ID,42bp_tooltip,fastq/good,duplicate_+.fastq,FASTQ where + line shows read name
BUCKET_ID,42bp_tooltip,fastq/good,interleaved.fastq,FASTQ where R1/R2 are interleaved
BUCKET_ID,42bp_tooltip,fastq/good,multiline.fastq,FASTQ file where sequence/quality are multi-line (please don't do this)
BUCKET_ID,42bp_tooltip,fastq/good,[email protected],FASTQ file where quality starts with @ (trips up simple FASTQ parsers)
BUCKET_ID,42bp_tooltip,vcf/bad,missing_info_field.vcf,"VCF uses field ""AN"" which is not defined in the header"
BUCKET_ID,42bp_tooltip,vcf/good,basic_multisample.bcf,BCF with 1200+ samples
BUCKET_ID,42bp_tooltip,vcf/good,basic_multisample.vcf,VCF with 1200+ samples
BUCKET_ID,42bp_tooltip,vcf/good,compressed.vcf.gz,VCF compressed with bgzip
BUCKET_ID,42bp_tooltip,vcf/good,indexed.bcf,BCF indexed with CSI (TBI not supported for BCF)
BUCKET_ID,42bp_tooltip,vcf/good,indexed_csi.vcf.gz,VCF indexed with CSI
BUCKET_ID,42bp_tooltip,vcf/good,indexed_tbi.vcf.gz,VCF indexed with TBI