Wednesday, July 16, 2014

Learning statistics: Principal Component Analysis


We decided to apply Principal Component Analysis (PCA) in the upcoming multi-sample BAM QC analysis in Qualimap.

The idea is simple: we compare two or more BAM files based on a number of metrics, including mean coverage, GC content, insert size, mapping quality and others. Since many comparison features are available, PCA can reduce them to a few principal components, and a biplot can then show how different the samples are from each other.
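Here is a minimal sketch of the idea in Python with NumPy. It assumes each BAM file has already been reduced to one row of QC metrics. The function name `pca`, the chosen metric columns and all numbers are hypothetical placeholders, not real Qualimap output, and this is not necessarily how Qualimap implements it. The metrics are standardized first because they have very different units (coverage, percent, base pairs, MAPQ). Without this step the metric with the largest numeric range would dominate the components.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD of the standardized data matrix (hypothetical helper).

    X: shape (n_samples, n_metrics), one row per BAM file.
    Returns (scores, loadings, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    # Center and scale each metric column to unit variance.
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid dividing by zero for constant metrics
    Z = (X - mean) / std
    # Thin SVD: Z = U @ diag(S) @ Vt; rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates
    loadings = Vt[:n_components].T                  # metric directions
    var = S ** 2
    explained = var[:n_components] / var.sum()
    return scores, loadings, explained

# Hypothetical QC table: mean coverage, GC %, median insert size, mean MAPQ.
metrics = ["coverage", "gc", "insert_size", "mapq"]
samples = ["s1", "s2", "s3", "s4"]
X = [
    [30.1, 41.2, 350.0, 55.0],
    [28.7, 40.8, 345.0, 54.2],
    [45.3, 47.5, 290.0, 48.1],
    [31.0, 41.0, 352.0, 55.3],
]

scores, loadings, explained = pca(X, n_components=2)
for name, (pc1, pc2) in zip(samples, scores):
    print(f"{name}: PC1={pc1:+.2f} PC2={pc2:+.2f}")
for name, (l1, l2) in zip(metrics, loadings):
    print(f"{name}: loading PC1={l1:+.2f} PC2={l2:+.2f}")
print("explained variance ratio:", np.round(explained, 3))
```

A biplot draws the `scores` as points (one per BAM file) and the `loadings` as arrows (one per metric) on the same PC1/PC2 axes, so a sample that lies far from the others in the direction of an arrow stands out on that metric. The plotting itself is left out here to keep the sketch dependency-free apart from NumPy.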

Here is a small collection of tutorials on PCA that I went through to get more confident with the topic.

1) An intuitive explanation of PCA
Nice overview of the topic, focusing on understanding the application of PCA.

2) A tutorial on principal component analysis
A practical tutorial with a very low entry barrier that lets you try PCA on a small example.

3) Using R for Multivariate Analysis
Great tutorial on multivariate analysis using R. Highly recommended.