Friday, June 28, 2013

A picture is worth a thousand words

Different graphs and visualizations can be very useful for understanding various aspects of the data. Here is an example: Analyses of the Best Undergraduate Business Schools of 2013

Split violin plots

Violin plots are useful for comparing distributions. When data are grouped by a factor with two levels (e.g. males and females), you can split the violins in half to see the difference between groups.
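Each half of a split violin is a kernel density estimate (KDE) of one group, mirrored around a shared centre line. The minimal sketch below assumes a Gaussian kernel with a hand-picked bandwidth and invents the data (hypothetical heights in cm). It only computes the signed half-widths you would draw and does not plot anything. Plotting libraries can do the whole job for you; for example, seaborn's `violinplot` accepts `split=True` together with a two-level `hue`.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` with a Gaussian kernel."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(y):
        return norm * sum(math.exp(-0.5 * ((y - s) / bandwidth) ** 2) for s in samples)

    return density

def split_violin_halves(group_a, group_b, grid, bandwidth):
    """Signed half-widths of one split violin at each grid value.

    The left half (negative widths) is the KDE of group_a and the right half
    (positive widths) is the KDE of group_b, both measured from the centre line x = 0.
    """
    dens_a = gaussian_kde(group_a, bandwidth)
    dens_b = gaussian_kde(group_b, bandwidth)
    left = [-dens_a(y) for y in grid]
    right = [dens_b(y) for y in grid]
    return left, right

# Hypothetical data for a factor with two levels.
females = [158, 160, 162, 165, 170]
males = [172, 175, 178, 180, 185]
grid = [140 + 0.5 * i for i in range(161)]  # 140 cm to 220 cm in 0.5 cm steps
left, right = split_violin_halves(females, males, grid, bandwidth=4.0)
```

With a plotting library, `left` and `right` would be filled against `grid` on either side of the same axis position, so the two groups can be compared at a glance.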

Is There Any Point to the 12 Times Table?

Why exactly do we use times tables at all? If learning the tables up to 10 is good, then learning them up to 12 should be better: the typical error if you know up to your 10 times table is 9.4%, but if you know up to your 12 times table, it is only 8.2%.

Tuesday, June 25, 2013

Income inequality, real and personal

In a different take on the income inequality issue, the Economic Policy Institute, in collaboration with Periscopic, created an interactive website:

This website brings clarity to the national dialogue on wage and income inequality, using interactive tools and videos to tell the story of how we arrived at the state of inequality we find today and what can be done to reverse course and ensure workers get their fair share.

Monday, June 24, 2013

Wednesday, June 19, 2013

Can you correct a 300-year-old error?

A challenge: can you correct something that Jacob Bernoulli got wrong? It stayed wrong for nearly 300 years until our author, Professor Antony Edwards, spotted it and corrected it. It is a simple exercise in schoolboy probability: Problem XVII in Part III of Bernoulli’s book, Ars Conjectandi. For those who would like to try their hand, here is the problem:

Here are some interesting articles about Ars Conjectandi:

Monday, June 17, 2013

On “Geek” Versus “Nerd”

Do you know the difference between a nerd and a geek? As J.R. Firth noted, “You shall know a word by the company it keeps.” Here is an analysis based on pointwise mutual information (PMI):
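PMI measures how much more often a word and a context occur together than they would if they were independent: PMI(w, c) = log2(p(w, c) / (p(w) p(c))). The small Python sketch below estimates every probability from raw (word, context) pair counts. The co-occurrence data are invented for illustration and are not taken from the linked analysis, which may also smooth or filter its counts differently.

```python
import math
from collections import Counter

def pmi_scores(pairs):
    """PMI(w, c) = log2(p(w, c) / (p(w) * p(c))) for every observed (word, context) pair.

    All probabilities are maximum-likelihood estimates from the pair counts themselves.
    """
    n = len(pairs)
    joint = Counter(pairs)
    word_counts = Counter(w for w, _ in pairs)
    context_counts = Counter(c for _, c in pairs)
    return {
        (w, c): math.log2((k / n) / ((word_counts[w] / n) * (context_counts[c] / n)))
        for (w, c), k in joint.items()
    }

# Hypothetical co-occurrence data, for illustration only.
pairs = ([("geek", "tech")] * 3 + [("geek", "culture")]
         + [("nerd", "math")] * 3 + [("nerd", "tech")])
scores = pmi_scores(pairs)
for (word, context), score in sorted(scores.items()):
    print(f"{word:5} {context:8} {score:+.3f}")
```

A positive score means the pair co-occurs more often than independence predicts, and a negative one means less often. Ranking each word's contexts by PMI gives a rough picture of "the company it keeps".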

Thursday, June 13, 2013

Understanding Multicollinearity

Roughly speaking, multicollinearity occurs when two or more regressors are highly correlated. Most of us know what it means, how to detect it, and how to cope with it, but not why it causes trouble. From Wikipedia: “In this situation [multicollinearity] the coefficient estimates may change erratically in response to small changes in the model or the data.” The Wikipedia entry goes on to discuss detection, implications, and remedies. Here is an attempt to provide the intuition:
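One way to see the erratic behaviour is to make one regressor an almost exact copy of another. The sketch below is in Python and rests on assumptions of our own: simulated data from a hypothetical `simulate` helper with true model y = 1 + 2·x1 + 3·x2 + noise, and a hand-rolled two-regressor least-squares fit rather than any particular library. Because x1 ≈ x2, the data pin down b1 + b2 quite precisely but say almost nothing about how that total is split between the two. The determinant of the normal equations approaches zero, and the variance inflation factor VIF = 1 / (1 − r²) becomes very large.

```python
import random

def ols_two_regressors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the centred normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # shrinks toward 0 as x1 and x2 become collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

def vif_two_regressors(x1, x2):
    """Variance inflation factor 1 / (1 - r^2); with two regressors it is the same for both."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    return 1.0 / (1.0 - s12 ** 2 / (s11 * s22))

def simulate(seed, n=100):
    """Hypothetical data: x2 is x1 plus tiny noise, and y = 1 + 2*x1 + 3*x2 + noise."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [a + rng.gauss(0, 0.01) for a in x1]
    y = [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    return x1, x2, y

for seed in (1, 2):
    x1, x2, y = simulate(seed)
    b0, b1, b2 = ols_two_regressors(x1, x2, y)
    print(f"seed={seed}: b1={b1:+.2f} b2={b2:+.2f} b1+b2={b1 + b2:.2f} "
          f"VIF={vif_two_regressors(x1, x2):.0f}")
```

Changing only the random seed (a "small change in the data") typically moves b1 and b2 far from 2 and 3 and far from each other's previous values, while b1 + b2 stays close to 5.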

Top 100 R packages for 2013

Statistical analysis of R packages using R:

Monday, June 10, 2013

There Was a Time before Mathematica…

Here is Stephen Wolfram's own account of how Mathematica was developed:

How to Do Statistical Research

The strategy “develop a theory/model/method, then seek an application” is a bad way to start on a statistical research problem. If you look for an application only after developing a theory, you don’t ask, “What is a reasonable way to answer this question, given this data, in this context?” Instead, you ask, “Can I answer the question with this data, in this context, with my theory, model, or method?” Who, then, considers whether a different (perhaps simpler) answer would have been better? Here is Terry Speed discussing this issue:

At what sample size do correlations stabilize?

Maybe you have encountered this situation: you run a large-scale study over the internet, and out of curiosity, you frequently check the correlation between two variables.

The experience is usually frustrating: at small sample sizes, correlations go up and down, change sign, and move from “significant” to “non-significant” and back. So at what sample size do correlations stabilize?
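The wobble is easy to reproduce. The Python sketch below adds simulated participants one at a time from a bivariate normal with an assumed true correlation of 0.3 and records Pearson's r at a few checkpoints. It also prints an approximate 95% confidence interval from Fisher's z-transformation, z = atanh(r) ± 1.96 / √(n − 3), to show how slowly the uncertainty shrinks. The true correlation, the checkpoints, and the seed are arbitrary choices, and this is not necessarily how the linked article answers the question.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fisher_interval(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation via Fisher's z-transform."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

def running_correlations(rho, checkpoints, seed=0):
    """Add simulated participants one at a time and record r at each checkpoint.

    Pairs come from a bivariate normal with true correlation `rho`.
    """
    rng = random.Random(seed)
    wanted = set(checkpoints)
    xs, ys, results = [], [], {}
    for n in range(1, max(wanted) + 1):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
        if n in wanted:
            results[n] = pearson_r(xs, ys)
    return results

for n, r in running_correlations(rho=0.3, checkpoints=[20, 50, 100, 250, 1000]).items():
    lo, hi = fisher_interval(r, n)
    print(f"n={n:5d}  r={r:+.3f}  95% CI [{lo:+.3f}, {hi:+.3f}]")
```

Running it with a few different seeds shows how far r can stray from 0.3 at n = 20 or 50, and how the interval narrows only in proportion to 1/√n.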

Thursday, June 6, 2013

SAS Dominates Analytics Job Market; R up 42%

Here is an article giving interesting insights into the popularity of data analysis software:

A new Sudoku Solver in R

Sudoku is nowadays probably the most widespread puzzle game in the world. As such, it has an interesting variety of solving techniques, not just with paper and pencil but also with computers. In R, there is even a package dedicated exclusively to Sudoku. Here is an article discussing how to develop an algorithm for solving Sudoku:
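For comparison, the simplest computer technique is plain depth-first backtracking: find an empty cell, try each digit that does not clash with its row, column, or 3×3 box, recurse, and undo the choice if the recursion gets stuck. The sketch below is in Python rather than R, does not use the R package mentioned above, and may well differ from the algorithm in the linked article. The `parse` helper and the puzzle string are our own illustration.

```python
def parse(puzzle):
    """Turn an 81-character string ('.' or '0' for blanks) into a 9x9 list of ints."""
    digits = [0 if ch in ".0" else int(ch) for ch in puzzle if ch in ".0123456789"]
    return [digits[i * 9:(i + 1) * 9] for i in range(9)]

def can_place(grid, row, col, value):
    """True if `value` does not already appear in the row, column or 3x3 box."""
    if value in grid[row]:
        return False
    if any(grid[r][col] == value for r in range(9)):
        return False
    br, bc = 3 * (row // 3), 3 * (col // 3)
    return all(grid[r][c] != value for r in range(br, br + 3) for c in range(bc, bc + 3))

def solve(grid):
    """Fill `grid` in place by depth-first backtracking; return True on success."""
    for row in range(9):
        for col in range(9):
            if grid[row][col] == 0:
                for value in range(1, 10):
                    if can_place(grid, row, col, value):
                        grid[row][col] = value
                        if solve(grid):
                            return True
                        grid[row][col] = 0  # undo and try the next value
                return False  # no digit fits this cell: backtrack
    return True  # no empty cells left

puzzle = ("53..7...." "6..195..." ".98....6." "8...6...3" "4..8.3..1"
          "7...2...6" ".6....28." "...419..5" "....8..79")
grid = parse(puzzle)
solved = solve(grid)
for row in grid:
    print(" ".join(map(str, row)))
```

This brute-force search is fast on easy puzzles. For hard ones, a common refinement is to pick the empty cell with the fewest legal candidates first, which prunes the search tree dramatically.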