Statistics

Abundance statistics of metabarcoding data

Normalisation

Do not remove singletons before calculating chao1 predicted diversity. This process relies on knowing the number of singleton and doubleton OTU’s/ASV’s. This also means that is there are many spurious singleton/ doubleton OTU’s (due to sequencing error etc.), then the chao estimates will be poor.

To rarefy or to not rarefy

Rarefying data is a method of normalising datasets so that they can be directly compared without biases. The main bias being avoided in metabarcoding is read depth - where some samples have many more sequences than others the diversity they will uncover will surely be different purely because of these extra reads.

The procedure followed is normally

  • choose a minimum number of reads that must be present in a sample

  • remove samples with fewer reads than this chosen threshold

  • randomly Subsample all samples until they have the same number of reads

In phyloseq use rarefy_even_depth.

The major drawback of this approach is loss of data. “Despite its current popularity in microbiome analyses rarefying biological count data is statistically inadmissible because it requires the omission of available valid data.” (ref)

Additionally,

Benchmarking microbiome transformations favors experimental quantitative approaches to address compositionality and sampling depth biases

Alpha Diversity

Alpha diversity

Beta Diversity

Beta diversity

Bray Curtis distance

Differential Abundance

Analysis of microbial compositions: a review of normalization and differential abundance analysis

— Author: Nicola Coyle 25/01/2022