Tuesday, May 6, 2014

Learning statistics: Simpson's paradox

Although I took a couple of nice courses at university, I still don't feel very confident when going through the statistical analysis part of a genomics paper. Apparently, it is not enough to know algorithms and biology to become a bioinformatician :) After realizing (finally!) that statistics is super important for science, and especially for data analysis, I started taking some MOOCs (this, this and this, for example) to refresh and improve my knowledge. As a result, I am learning a lot of cool new things now! :)

Today I came across something interesting in the Exploratory Data Analysis course: Simpson's paradox.

Somehow I had never heard about it before, or just did not pay attention to it. The idea is the following: the main trend or correlation seen within distinct groups of a dataset can differ from, or even be the opposite of, the trend in the dataset as a whole. Such an effect is usually introduced by some unrecognized confounding factor in the data.
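To make the idea concrete, here is a minimal sketch in Python. The numbers are made-up toy data (not from the course or any real study), and the `pearson` helper and the group names are just my own illustration. Within each group y goes down as x goes up, but once the groups are pooled the correlation turns positive, because the hidden "group" variable shifts both x and y.

```python
from math import sqrt


def pearson(points):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    return cov / sqrt(var_x * var_y)


# Made-up toy data: two groups, each with a perfectly decreasing trend.
# Group B simply sits at higher x and higher y than group A.
group_a = [(1, 6), (2, 5), (3, 4), (4, 3)]
group_b = [(6, 11), (7, 10), (8, 9), (9, 8)]

# Ignoring the group label means analysing all points together.
pooled = group_a + group_b

r_a = pearson(group_a)
r_b = pearson(group_b)
r_pooled = pearson(pooled)

print(f"group A: r = {r_a:.2f}")      # group A: r = -1.00
print(f"group B: r = {r_b:.2f}")      # group B: r = -1.00
print(f"pooled:  r = {r_pooled:.2f}")  # pooled:  r = 0.67
```

Here the group label plays the role of the confounding factor: if you do not know about it and only look at the pooled points, you would conclude the opposite of what is true inside every group.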

A nice visualization and explanation can be found here.
