
5. Outputs

Jared Johnson edited this page Oct 14, 2024 · 2 revisions

All Outputs

📦${outdir}
┣ 📂${timestamp}
┃ ┣ 📂${taxa}
┃ ┃ ┣ 📂${cluster}
┃ ┃ ┃ ┣ 📂alns
┃ ┃ ┃ ┃ ┗ 📜${timestamp}-${taxa}-${cluster}.{gubbins,snippy}.aln
┃ ┃ ┃ ┣ 📂dists
┃ ┃ ┃ ┃ ┗ 📜${timestamp}-${taxa}-${cluster}-{accessory,core-snps}_.{poppunk,gubbins,snippy}-{long,wide}.csv
┃ ┃ ┃ ┣ 📂figures
┃ ┃ ┃ ┃ ┣ 📜${timestamp}-${taxa}-${cluster}-core-snps_{ML,NJ}.{gubbins,snippy}.microreact
┃ ┃ ┃ ┃ ┣ 📜${timestamp}-${taxa}-${cluster}-core-snps_{ML,NJ}.{gubbins,snippy}.jpg
┃ ┃ ┃ ┃ ┗ 📜${timestamp}-${taxa}-${cluster}-{accessory,core-snps}_dist.{poppunk,gubbins,snippy}.jpg
┃ ┃ ┃ ┣ 📂snippy
┃ ┃ ┃ ┃ ┗ 📜${timestamp}-${species}-${sample}.tar.gz
┃ ┃ ┃ ┣ 📂stats
┃ ┃ ┃ ┃ ┗ 📜${timestamp}-${species}-${sample}.{gubbins,snippy}.stats
┃ ┃ ┃ ┗ 📂trees
┃ ┃ ┃   ┗ 📜${timestamp}-${taxa}-${cluster}-core-snps_{ML,NJ}.{gubbins,snippy}.nwk
┃ ┃ ┣ 📂poppunk
┃ ┃ ┃ ┣ 📜${timestamp}-${species}-pp_core_NJ.nwk
┃ ┃ ┃ ┣ 📜${timestamp}-${species}-pp_microreact_clusters.csv
┃ ┃ ┃ ┣ 📜${timestamp}-${species}-pp_perplexity20.0_accessory_mandrake.dot
┃ ┃ ┃ ┣ 📜${timestamp}-${species}-pp.microreact
┃ ┃ ┃ ┣ 📜${timestamp}-${species}-pp-core-acc-dist.txt.gz
┃ ┃ ┃ ┣ 📜${timestamp}-${species}-pp-jaccard-dist.txt.gz
┃ ┃ ┃ ┗ 📜${timestamp}-${species}-pp-clusters.csv
┃ ┃ ┣ 📂other
┃ ┃ ┃ ┣ 📜multiqc_report.html
┃ ┃ ┃ ┗ 📜software_versions.yml
┃ ┃ ┗ 📜${timestamp}-summary.tsv
┃ ┗ 📜${timestamp}-db-info.csv
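
As a rough illustration of the layout above, the following Python sketch assembles a few of the per-cluster paths. The helper name `cluster_outputs` and the example values are hypothetical and not part of BigBacter; the path patterns are copied from the tree, with one option chosen from each brace group.

```python
from pathlib import Path


def cluster_outputs(outdir, timestamp, taxa, cluster, tool="gubbins", tree="ML"):
    """Build a few per-cluster output paths following the tree above.

    `tool` picks one of {gubbins,snippy}; `tree` picks one of {ML,NJ}.
    This helper is illustrative only and is not shipped with BigBacter.
    """
    base = Path(outdir) / timestamp / taxa / cluster
    prefix = f"{timestamp}-{taxa}-{cluster}"
    return {
        "alignment": base / "alns" / f"{prefix}.{tool}.aln",
        "tree": base / "trees" / f"{prefix}-core-snps_{tree}.{tool}.nwk",
        "tree_figure": base / "figures" / f"{prefix}-core-snps_{tree}.{tool}.jpg",
    }


# Example values are made up for demonstration.
paths = cluster_outputs("results", "1697300000", "Escherichia_coli", "0000000001")
for name, path in paths.items():
    print(name, path)
```

Passing `tool="snippy"` or `tree="NJ"` selects the other variants listed in the tree.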

Run Summary

Results are summarized for all samples included in the analysis. This includes both new samples and those that were stored in your BigBacter database. Subsets of this summary are also saved for each species-cluster.

${timestamp}-summary.tsv
${timestamp}-${species}-${cluster}-summary.csv

Column descriptions are provided below:

| Column Name | Description |
| --- | --- |
| ID | Sample ID (_T[0-9]+ denotes replicate number) |
| STATUS | Sample status (NEW or OLD) |
| QUAL | Sample quality status (PASS or FAIL), based on the values in PER_GENFRAC, PER_HET, and PER_LOWCOV |
| RUN_ID | Run timestamp |
| TAXA | Sample taxonomy (same as supplied in the samplesheet via taxa) |
| CLUSTER | Assigned PopPUNK cluster |
| ISO_IN_CLUSTER | Total number of isolates in the assigned PopPUNK cluster |
| ISO_PASS_QC | Number of isolates in the assigned PopPUNK cluster that passed QC |
| MEAN_SNP_DIST_SNIPPY | Mean pairwise SNP distance between the sample and all other isolates in the assigned cluster that passed QC, including the reference (without recombination masked) |
| MIN_SNP_DIST_SNIPPY | Minimum pairwise SNP distance between the sample and all other isolates in the assigned cluster that passed QC, including the reference (without recombination masked) |
| MAX_SNP_DIST_SNIPPY | Maximum pairwise SNP distance between the sample and all other isolates in the assigned cluster that passed QC, including the reference (without recombination masked) |
| STRONG_LINKAGE_SNIPPY | List of samples in the cluster with pairwise SNP differences <= --strong_link_cutoff (without recombination masked) |
| INTER_LINKAGE_SNIPPY | List of samples in the cluster with pairwise SNP differences between --strong_link_cutoff and --inter_link_cutoff (without recombination masked) |
| MEAN_SNP_DIST_GUBBINS | Mean pairwise SNP distance between the sample and all other isolates in the assigned cluster that passed QC, including the reference (with recombination masked) |
| MIN_SNP_DIST_GUBBINS | Minimum pairwise SNP distance between the sample and all other isolates in the assigned cluster that passed QC, including the reference (with recombination masked) |
| MAX_SNP_DIST_GUBBINS | Maximum pairwise SNP distance between the sample and all other isolates in the assigned cluster that passed QC, including the reference (with recombination masked) |
| STRONG_LINKAGE_GUBBINS | List of samples in the cluster with pairwise SNP differences <= --strong_link_cutoff (with recombination masked) |
| INTER_LINKAGE_GUBBINS | List of samples in the cluster with pairwise SNP differences between --strong_link_cutoff and --inter_link_cutoff (with recombination masked) |
| LENGTH | Length of the reference genome (bp) |
| ALIGNED | Reference positions covered by the sample (bp) |
| UNALIGNED | Reference positions not covered by the sample (bp) |
| RECOMB | Number of sites in recombinant regions, including variant and invariant sites (bp) |
| VARIANT | Number of variant sites detected in the sample, as compared to the reference (bp) |
| HET | Number of heterogeneous sites (bp) |
| MASKED | Number of masked sites (bp) |
| LOWCOV | Number of low coverage sites (bp) |
| PER_GENFRAC | Percentage of the reference genome covered by the sample |
| PER_LOWCOV | Percentage of the reference genome with low coverage |
| PER_HET | Percentage of the reference genome with heterogeneous sites |
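
The columns described above can be filtered with a few lines of Python. This is a minimal sketch that assumes the summary is tab-delimited with a header row matching the column names; the sample rows are made up, and the helper name `new_passing_samples` is hypothetical.

```python
import csv
import io

# Tiny made-up excerpt containing a few of the columns described above.
example_tsv = (
    "ID\tSTATUS\tQUAL\tCLUSTER\tPER_GENFRAC\n"
    "sampleA\tNEW\tPASS\t1\t98.2\n"
    "sampleB\tOLD\tFAIL\t1\t71.5\n"
    "sampleC\tNEW\tPASS\t2\t97.9\n"
)


def new_passing_samples(handle):
    """Return IDs of samples with STATUS == NEW and QUAL == PASS."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row["ID"]
        for row in reader
        if row["STATUS"] == "NEW" and row["QUAL"] == "PASS"
    ]


print(new_passing_samples(io.StringIO(example_tsv)))
```

To run it on a real summary, pass an open file handle for the `${timestamp}-summary.tsv` file in place of the in-memory example.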

PopPUNK Results (📂poppunk/)

PopPUNK Visualizations

Visualizations of the PopPUNK database are generated via poppunk_visualise with the --microreact option. You can view the .microreact file at https://microreact.org/upload.

${timestamp}-${species}-pp_core_NJ.nwk
${timestamp}-${species}-pp_microreact_clusters.csv
${timestamp}-${species}-pp_perplexity20.0_accessory_mandrake.dot
${timestamp}-${species}-pp.microreact

PopPUNK Clusters

Cluster assignments for all samples in the PopPUNK database. This file should contain the same information as ${timestamp}-${species}-pp_microreact_clusters.csv.

${timestamp}-${species}-pp-clusters.csv

PopPUNK Distances

Pairwise core/accessory distances and Jaccard distances (same as Mash) produced from the PopPUNK database using sketchlib query.

${timestamp}-${species}-pp-core-acc-dist.txt.gz
${timestamp}-${species}-pp-jaccard-dist.txt.gz

Alignment Files (📂alns/)

Whole genome (snippy) or core genome (gubbins) alignment files.

${timestamp}-${taxa}-${cluster}.{gubbins,snippy}.aln

Pairwise Distances (📂dists/)

Pairwise distance files in long and wide formats. Files are provided for core SNP distances (Snippy and Gubbins) and for accessory distances (PopPUNK).

${timestamp}-${taxa}-${cluster}-{accessory,core-snps}_.{poppunk,gubbins,snippy}-{long,wide}.csv
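
To show how the long and wide formats relate, the sketch below pivots long-format records (one row per sample pair) into a symmetric matrix. The column names `ID1`, `ID2`, and `DIST` are assumptions made for this example and may not match the headers in the actual files; the values are invented.

```python
import csv
import io

# Made-up long-format records; the column names ID1, ID2, and DIST are
# assumptions for this sketch and may differ from the real files.
example_long = "ID1,ID2,DIST\nA,B,3\nA,C,12\nB,C,10\n"


def long_to_wide(handle, id1="ID1", id2="ID2", value="DIST"):
    """Pivot long-format pairwise distances into a symmetric nested dict."""
    wide = {}
    for row in csv.DictReader(handle):
        a, b, d = row[id1], row[id2], float(row[value])
        wide.setdefault(a, {})[b] = d
        wide.setdefault(b, {})[a] = d
    # Fill the diagonal with zeros so every sample has a full row.
    for sample in wide:
        wide[sample][sample] = 0.0
    return wide


matrix = long_to_wide(io.StringIO(example_long))
print(matrix["A"]["C"])
```

The column-name parameters can be changed to match whatever headers the long-format file actually uses.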

Figures (📂figures/)

Static image files (.jpg) and Microreact files of pairwise distance matrices and/or phylogenetic trees.

📜${timestamp}-${taxa}-${cluster}-core-snps_{ML,NJ}.{gubbins,snippy}.jpg
📜${timestamp}-${taxa}-${cluster}-{accessory,core-snps}_dist.{poppunk,gubbins,snippy}.jpg

Note

Static images: new samples are shown in bold, black text. Historic samples are shown in grey, plain text. Asterisks on nodes indicate bootstrap values < 70%.

Individual SNP Files (📂snippy/)

Tar archives containing the per-sample files needed to perform SNP analysis using snippy-core.

${timestamp}-${species}-${sample}.tar.gz
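
Before feeding an archive to downstream tools, it can help to check what it contains. The sketch below lists the members of a .tar.gz file with Python's standard `tarfile` module. It builds a throwaway archive so that it runs anywhere; the member name is invented and says nothing about the real archive contents.

```python
import tarfile
import tempfile
from pathlib import Path


def list_members(archive_path):
    """Return the member names stored in a .tar.gz archive."""
    with tarfile.open(archive_path, mode="r:gz") as tar:
        return tar.getnames()


# Build a throwaway archive so the sketch runs anywhere; the member name
# below is invented and does not reflect the real archive contents.
with tempfile.TemporaryDirectory() as tmp:
    member = Path(tmp) / "example.txt"
    member.write_text("placeholder\n")
    archive = Path(tmp) / "demo.tar.gz"
    with tarfile.open(archive, mode="w:gz") as tar:
        tar.add(member, arcname="example.txt")
    names = list_members(archive)

print(names)
```

For a real run, call `list_members` with the path to one of the files in the snippy directory.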

Analysis Statistics (📂stats/)

Sample statistics for snippy and gubbins.

${timestamp}-${species}-${sample}.{gubbins,snippy}.stats

Phylogenetic Trees (📂trees/)

Maximum likelihood or neighbor joining trees generated from Snippy or Gubbins core SNP alignments.

${timestamp}-${taxa}-${cluster}-core-snps_{ML,NJ}.{gubbins,snippy}.nwk
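
The .nwk files are plain-text Newick trees, so simple checks need no extra software. As a minimal sketch, the function below pulls the tip (sample) labels out of a Newick string using only the standard library. It assumes unquoted labels without spaces or special characters; the example tree is made up, and a dedicated phylogenetics library is a better choice for anything beyond a quick look.

```python
import re


def tip_labels(newick):
    """Extract leaf names from a Newick string with unquoted labels."""
    # A leaf label directly follows "(" or ","; internal node labels
    # (such as bootstrap values) follow ")" and are therefore skipped.
    return re.findall(r"(?<=[(,])\s*([^\s(),:;]+)", newick)


# Made-up tree with branch lengths and one internal support value.
example = "((sampleA:0.1,sampleB:0.2)95:0.3,sampleC:0.4);"
print(tip_labels(example))
```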

Database Info

Snapshot of your BigBacter database. This is only generated if --db_info is true.

${timestamp}-db-info.csv