Saturday, March 22, 2014

Why Shouldn't One Use Bivariate Correlations for Variable Selection?

In applied statistics, what typically happens is that a researcher sits down with their statistical software of choice and computes the correlation between the response variable and each of the candidate predictors. From there, they toss out any predictor whose correlation with the response is either low or not statistically significant. The concern is that the marginal (bivariate) correlation between the response and a predictor can be almost zero, or non-significant, even when that predictor is an important element in the data-generating pathway. This can happen, for instance, when the predictor's effect only becomes visible once another, correlated predictor is accounted for. Read more about why we shouldn't be using bivariate correlations for variable selection.
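To make the concern concrete, here is a small simulation sketch in Python with NumPy. It is not taken from the original analysis: the coefficients (1 and -2), the correlation of 0.5 between the two predictors, the sample size, and the variable names are all assumptions chosen so that the first predictor has roughly zero bivariate correlation with the response while still carrying a nonzero coefficient in the model that generated the data.

```python
import numpy as np

# Illustrative assumptions (not from the post): two standard-normal
# predictors with correlation rho = 0.5, and a response generated as
# y = 1*x1 - 2*x2 + noise.
rng = np.random.default_rng(0)
n = 100_000
rho = 0.5
cov = np.array([[1.0, rho], [rho, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
x1, x2 = X[:, 0], X[:, 1]
y = 1.0 * x1 - 2.0 * x2 + rng.normal(size=n)

# Bivariate screening step: correlation of y with x1 alone.
# In theory Cov(y, x1) = 1*Var(x1) - 2*Cov(x1, x2) = 1 - 2*0.5 = 0.
r_y_x1 = np.corrcoef(y, x1)[0, 1]

# Multiple regression with both predictors (ordinary least squares).
design = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

print(f"corr(y, x1)             = {r_y_x1:.3f}")
print(f"OLS coefficient for x1  = {coef[1]:.3f}")
print(f"OLS coefficient for x2  = {coef[2]:.3f}")
```

Under these assumptions, a correlation screen would discard x1, yet the regression that includes both predictors recovers a coefficient for x1 close to the value used to generate the data.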
