
Above are the proportions of the species in the exact same environment across the two time points. Here, it appears that everything is changing, which isn't the case in the original environment. Also, there are multiple hypotheses that could explain the change of proportions. For instance, it is a valid hypothesis that species 2-10 all halved. In fact, there are infinitely many hypotheses that explain this change of proportions, making the differential abundance problem impossibly difficult.

You cannot hope to solve the differential abundance problem without making some assumptions. DESeq2 assumes that counts follow a Negative Binomial distribution, and metagenomeSeq assumes a Zero-Inflated Gaussian distribution.

ANCOM does make its own set of assumptions, which sets it a little apart from conventional differential abundance tools. There are two things that make this tool very strong:

1. It makes no assumption about independence between features. When you think about how OTUs are sampled in the first place, this makes sense. All OTUs are sampled without replacement from some environmental sample, so the sampling process itself imposes dependence between the OTUs, and this isn't even accounting for all of the ecological dependencies present. If you'd like a starting point for reading about statistical sampling, I'd check out the Multinomial distribution. With this reasoning, it doesn't really make much sense to apply univariate tests (e.g. t-test, ANOVA, Mann-Whitney) directly on OTU tables to identify differentially abundant species, since they implicitly assume independence between the OTUs.
2. It makes few assumptions about the distributions of the OTUs. Due to the highly dependent nature of our data sets, it's actually quite difficult to formulate meaningful distributions.

ANCOM's strategy is to run tests on the log-ratios between pairs of OTUs rather than on the OTUs themselves. There are a few clear-cut advantages to this approach. First, log-ratios are ideal for detecting large relative changes. Think about it: the changes in the most abundant OTUs will drown out the changes in the less abundant OTUs. An OTU that makes up 0.01% of the reads and then grows to 0.05% of the reads has actually increased 5-fold, which is huge! But if you only look at the absolute difference in proportions, it's just 0.04 percentage points, which may seem insignificant. Second, log-ratios are ideal for normalizing for sequencing depth: scaling every count in a sample by the same factor leaves the ratio between any two OTUs unchanged.
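The proportion argument at the start of this section can be reproduced in a few lines of Python. This is a minimal sketch under assumed numbers. It uses a hypothetical community of 10 species with 100 cells each, in which only species 1 doubles. These values are illustrative and are not the data behind the figure. `closure` is a helper defined here, not a library call.

```python
import numpy as np

def closure(x):
    """Rescale absolute abundances into proportions that sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

# Hypothetical community: 10 species, each with 100 cells at time 1.
time1 = np.full(10, 100.0)

# Hypothesis A: only species 1 doubles by time 2.
time2 = time1.copy()
time2[0] *= 2

# Hypothesis B: species 1 is unchanged and species 2-10 all halve.
alt_time2 = time1.copy()
alt_time2[1:] /= 2

p1, p2, p_alt = closure(time1), closure(time2), closure(alt_time2)

print(np.round(p1, 3))         # every species sits at 0.1
print(np.round(p2, 3))         # species 1 rises, and species 2-10 all appear to fall
print(np.allclose(p2, p_alt))  # True: both hypotheses give the same proportions

# Any rescaling of hypothesis A (e.g. 3x every species) also gives the same
# proportions, which is why infinitely many hypotheses fit the data.
print(np.allclose(closure(3 * time2), p2))  # True
```

Only absolute abundances can tell these hypotheses apart, and sequencing data do not provide them.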
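A small simulation can check the claim that sampling alone makes OTUs dependent. This sketch treats the Multinomial distribution, which the text recommends as a starting point, as an approximation of the sampling process. The proportions for 5 OTUs and the depth of 1,000 reads are hypothetical. The OTUs never interact, yet their counts come out negatively correlated.

```python
import numpy as np

rng = np.random.default_rng(0)
depth = 1000                                # reads per sample (hypothetical)
p = np.array([0.4, 0.3, 0.15, 0.1, 0.05])   # true proportions (hypothetical)

counts = rng.multinomial(depth, p, size=5000)  # shape (5000, 5)

# Every sample sums to `depth`, so a read given to one OTU cannot go to
# another: counts of OTU 1 and OTU 2 are negatively correlated.
r = np.corrcoef(counts[:, 0], counts[:, 1])[0, 1]
print(round(r, 2))

# Theoretical multinomial correlation: -sqrt(p_i p_j / ((1 - p_i)(1 - p_j)))
r_theory = -np.sqrt(p[0] * p[1] / ((1 - p[0]) * (1 - p[1])))
print(round(r_theory, 2))
```

The simulated value should land close to the theoretical one. The multinomial covariance between two OTUs is -n * p_i * p_j, which is negative for every pair.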
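Two points about log-ratios can be written out directly: the arithmetic behind the 0.01% to 0.05% example, and the reason log-ratios cancel out sequencing depth. The two count vectors below are hypothetical samples of the same community at different depths. `pairwise_log_ratios` is a helper defined for this sketch.

```python
import numpy as np

# Worked numbers from the text: an OTU grows from 0.01% to 0.05% of reads.
before, after = 0.0001, 0.0005
print(after - before)          # absolute change, about 0.0004 (0.04 percentage points)
print(after / before)          # fold change, about 5
print(np.log(after / before))  # log fold change, about ln(5) = 1.609

def pairwise_log_ratios(counts):
    """Matrix whose (i, j) entry is log(x_i / x_j) for one sample."""
    logc = np.log(counts)
    return logc[:, None] - logc[None, :]

# Hypothetical samples: same composition, but the second has 10x the reads.
shallow = np.array([500, 300, 150, 50])
deep = shallow * 10

# Raw counts differ, but every log-ratio is identical because the depth
# factor cancels: log(10 x_i) - log(10 x_j) = log(x_i) - log(x_j).
print(np.allclose(pairwise_log_ratios(shallow), pairwise_log_ratios(deep)))  # True
```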
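To make "run tests on the log-ratios between pairs of OTUs" concrete, here is a heavily simplified sketch of the idea. For each OTU, it tests every log-ratio that OTU forms with another OTU and counts how many of those tests reject. The ANCOM paper calls this count W. The sketch makes several assumptions:

- A permutation test on group means stands in for whatever test you would actually choose.
- A pseudocount of 1 handles zeros.
- There is no multiple-testing correction and no cutoff on W, although the published method uses both.

`ancom_w` and `perm_test_pvalue` are hypothetical helper names, not scikit-bio's API. For real analyses, use an established implementation such as `ancom` in `skbio.stats.composition`.

```python
import numpy as np

def perm_test_pvalue(a, b, rng, n_perm=999):
    """Two-sided permutation test on the difference in group means."""
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        if abs(shuffled[:n_a].mean() - shuffled[n_a:].mean()) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

def ancom_w(table, groups, alpha=0.05, pseudocount=1.0, seed=0):
    """Hypothetical helper: for each OTU i, count the OTUs j for which
    log(x_i / x_j) differs significantly between the two groups."""
    rng = np.random.default_rng(seed)
    logx = np.log(table + pseudocount)  # pseudocount avoids log(0)
    n_otus = table.shape[1]
    in_g0 = groups == 0
    W = np.zeros(n_otus, dtype=int)
    for i in range(n_otus):
        for j in range(i + 1, n_otus):
            lr = logx[:, i] - logx[:, j]  # one log-ratio per sample
            if perm_test_pvalue(lr[in_g0], lr[~in_g0], rng) < alpha:
                # log(x_i/x_j) = -log(x_j/x_i), so one test counts for both OTUs
                W[i] += 1
                W[j] += 1
    return W

# Hypothetical data: 6 OTUs, 20 samples per group, 5,000 reads per sample.
# Only OTU 0 truly changes (its absolute abundance is 4x higher in group 1).
rng = np.random.default_rng(42)
base = np.array([30, 25, 20, 10, 10, 5], dtype=float)
shifted = base.copy()
shifted[0] *= 4
table = np.vstack([
    rng.multinomial(5000, base / base.sum(), size=20),
    rng.multinomial(5000, shifted / shifted.sum(), size=20),
])
groups = np.array([0] * 20 + [1] * 20)

W = ancom_w(table, groups)
print(W)  # OTU 0 should have the largest W
```

OTU 0 comes out significant against every other OTU. Each of the other OTUs is typically significant only against OTU 0. This is how the count separates the one changing OTU from the rest, even though all of their proportions shifted.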