This is part of a series of my learning notes on the microbiome. This part covers statistics and data analysis concepts, tutorials and resources.
In microbiome data analysis, taxonomic or functional profiling, for example, identifies microbes, genes, or pathways in a sample.
How certain are we in the identification?
How do we know the tool identified it accurately?
There is almost always some uncertainty in any data and in any analysis. How do we understand that uncertainty, and how do we measure how much of it there is?
💡 **The answer is statistics.**
So what forms the basis to understand statistics?
🤔 Some basic terminology can help you understand this better - View Jargons
Data analysis in microbiome studies helps us understand whether the data are counts or compositional, how many zeroes the data contain (sparsity), and how much variation there is within samples of the same population and between populations (e.g. health vs disease).
What is the chance that a change in the abundance of this microbe is associated with disease? How sure are we, in quantified terms, that this microbe or gene differs between conditions (health vs disease)?
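One way to put a number on "how sure" is a permutation test, sketched below under some assumptions: the abundance values are invented, the function name `permutation_p_value` is hypothetical, and the test statistic is simply the absolute difference in group means. The idea is to shuffle the health/disease labels many times and count how often a shuffled split looks at least as different as the real one.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values breaks any real link between value and label.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Made-up relative abundances of one microbe in two conditions.
healthy = [0.30, 0.28, 0.35, 0.32, 0.31]
disease = [0.10, 0.12, 0.08, 0.15, 0.11]

p = permutation_p_value(healthy, disease)
print(f"estimated p-value: {p:.4f}")
```

A small p-value says that a difference this large would rarely appear if the labels had nothing to do with abundance. It does not by itself show that the microbe causes or is caused by the disease.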
Likewise, most conclusions we draw from biological data are stated in terms of probability.
We know chance plays a role, but does the evidence let us conclude something about the data? Can we test the hypothesis we have in mind?
More content coming soon….