Friday, March 28, 2014

Why use R?

Why should R be preferred over other statistical software? Read the answer in the words of someone who has used both a proprietary statistical programming language and the open-source alternative extensively.

Add new colors to your R-charts

If you are a big fan of Wes Anderson's movies and love the quirky characters and stories, the distinctive cinematography, and the unique visual style, you can bring some of that style to your own R charts with these Wes Anderson-inspired palettes.

Wednesday, March 26, 2014

Statistics reveal a prescription drug epidemic

After the tragically early death of actor Philip Seymour Hoffman last month, Carlos Grajales finds that the statistics reveal a prescription drug epidemic in the US. Can you believe that in 2010 drug overdoses caused more deaths than motor vehicle traffic crashes?

Tuesday, March 25, 2014

Overlapping Clusters

Aren't we all used to seeing the well-separated clusters displayed in textbooks and papers? But that doesn't happen in reality. So, what should one do in such cases? Read about how to deal with such situations at:

Handling Character data in R

In today's data-centric world, a statistician can't escape from text data. It's not a very difficult task, if we start in time. So, let's learn about handling character data in R with this free e-book:

Saturday, March 22, 2014

About Normality and Testing for Normality

It is often said that with small sample sizes everything looks normal, because normality tests are very sensitive to what goes on in the extreme tails, and a small sample contains few tail observations. In other words, if we have enough data to fail a normality test, we always will, because our real-world data won’t be clean enough. If we don’t have enough data to reliably fail a normality test, then there’s no point in performing the test, and we have to rely on the fat pencil test or our own understanding of the underlying processes. Read the detailed reasoning at:
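The sample-size effect described above can be sketched with a small simulation. This is only an illustration under stated assumptions: the post does not name a particular test, so the Jarque-Bera statistic is used here because it needs nothing beyond the standard library, and the mildly skewed data (a normal variable plus half an exponential one) is made up. The chi-square p-value is an asymptotic approximation and is known to be rough for small samples.

```python
import math
import random


def jarque_bera(sample):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(sample)
    mean = sum(sample) / n
    dev = [x - mean for x in sample]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    stat = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # For a chi-square distribution with 2 degrees of freedom,
    # P(X > stat) = exp(-stat / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


rng = random.Random(2014)  # fixed seed so the run is repeatable


def mildly_skewed(n):
    """Hypothetical 'real-world' data: normal with a slight right skew."""
    return [rng.gauss(0.0, 1.0) + 0.5 * rng.expovariate(1.0) for _ in range(n)]


small_stat, small_p = jarque_bera(mildly_skewed(30))
large_stat, large_p = jarque_bera(mildly_skewed(20000))

print(f"n = 30:    JB = {small_stat:.2f}, p = {small_p:.3f}")
print(f"n = 20000: JB = {large_stat:.2f}, p = {large_p:.3g}")
```

The same slight departure from normality is present in both samples. With 20,000 observations the test rejects decisively, while with 30 observations it will often fail to reject, which is the pattern the paragraph above describes.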

Why shouldn't one use Bivariate Correlations for Variable Selection?

In applied statistics, what typically happens is that a researcher sits down with their statistical software of choice and computes the correlation between the response variable and each of the possible predictors. From here, they toss out potential predictors that either have low correlation or for which the correlation is not significant. The concern is that the marginal correlation between the response and a predictor can be almost zero or non-significant even though that predictor is an important element in the data-generating pathway. Read more about why we shouldn't be using bivariate correlations for variable selection.
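A minimal sketch of the concern, using simulated data: the variables `x1`, `x2`, and `y` and the coefficients below are made up for illustration and do not come from the linked article. Here `x1` enters the response with a true coefficient of 1, yet its marginal correlation with the response is close to zero because a second, negatively related predictor cancels it out.

```python
import random


def mean(values):
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def ols_two_predictors(x1, x2, y):
    """Least-squares slopes for y ~ x1 + x2 via the centered normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * r for a, r in zip(c1, cy))
    s2y = sum(b * r for b, r in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


rng = random.Random(35)  # fixed seed so the run is repeatable
n = 5000
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [-a + rng.gauss(0.0, 1.0) for a in x1]  # negatively related to x1
y = [a + b + 0.5 * rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]  # true slopes: 1 and 1

r_x1 = pearson(x1, y)
b1, b2 = ols_two_predictors(x1, x2, y)

print(f"marginal correlation of x1 with y: {r_x1:.3f}")
print(f"regression slopes: x1 = {b1:.3f}, x2 = {b2:.3f}")
```

Screening on the bivariate correlation alone would discard `x1`, even though the joint regression recovers a slope near 1 for it.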

Friday, March 21, 2014

Teaching for the Modern Generation

Dr. Rajeeva Karandikar speaks about how teaching should be transformed for the modern generation, which is an instant generation: the Facebook/WhatsApp/Twitter generation, the generation for which sending email is too slow. Teachers, and all those who aspire to become one, should not miss it.

Thursday, March 20, 2014

80/20 Rule of Statistical Methods Development

Developing statistical methods is hard and often frustrating work. One of the underappreciated rules in statistical methods development is the 80/20 rule. The basic idea is that the first reasonable thing you can do to a set of data often gets you 80% of the way to the optimal solution. Everything after that is working on getting the last 20%. The hard part of deciding whether to create a new method is judging whether that 20% is worth it. This is obviously application specific. Here is an interesting discussion about the 80/20 rule of statistical methods development.

The Improbability Principle

Here are the video and slides from David Hand's lecture on the subject of his new book 'The Improbability Principle'.

It is about extraordinarily improbable events. It’s about events which are so unlikely that we wouldn’t expect to see them during our entire lifetimes - or even the lifetime of the human race or the universe itself. And it’s about why, despite all that, we do see such events; and more, it’s about why we see them again and again.

Secrets of Teaching R: An Interview with Bob Muenchen

It is of interest to see what makes R so popular, yet ‘quirky’ to learn. To get some insight from a real pro, here is an interview with Bob Muenchen. Bob is the author of 'R for SAS and SPSS Users'. He is also the creator of a popular web site devoted to analyzing trends in analytics software and helping people learn the R language.

Google Drive in R

Want to retrieve all direct links to your Google Documents? R can help you out. Check out the details at:

Bayesian First Aid

Bayesian First Aid is an attempt at implementing reasonable Bayesian alternatives to the classical hypothesis tests in R. Here are a few of them:
Here are a few more introductory articles:

Thursday, March 13, 2014

Magical Wolfram Language

Examples of what can be done with the knowledge-based Wolfram Language.
They range from blurring faces in an image to hiding secret messages in images to making a you-centric world map. Do check out the complete list!

Mathematical Character Curves

Check out how various shapes are represented through mathematical equations and inequalities.
We're glad to see that people have been enjoying our mathematical character curves!

Check out how you can play with your favorite cartoon characters using Wolfram Mathematica

Wednesday, March 12, 2014

A Hack to Create Matrices in R, Matlab style!!

The Matlab syntax for creating matrices is pretty and convenient. Its R counterpart is functional but not as pretty, and its default is to fill in the values column-wise. Using meta-programming we can hack together a function that allows us to create matrices in a similar way as in Matlab. Read more at:

Thursday, March 6, 2014

The Magical Mind of Persi Diaconis

When Diaconis first came to Stanford, he planned to keep his magic background a secret from his academic colleagues, fearing they wouldn't take seriously a man of hocus-pocus who did research on card shuffling.
Then he stumbled upon a book that described an experiment by the French mathematician Paul Lévy, analyzing the phenomenon known as perfect shuffling - in which a standard deck of cards is carefully shuffled eight times and ends up returning precisely to its starting arrangement. Diaconis says: "I thought, If Paul Lévy can study perfect shuffling, I can say I study perfect shuffling. So I wrote up my work on perfect shuffling, and it got on the front page of The New York Times."
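The eight-shuffle claim is easy to check with a short sketch. It assumes that "perfect shuffle" means an out-shuffle (the deck is cut exactly in half and the halves are interleaved so that the original top card stays on top); the excerpt above does not say which variant is meant, and the function names are made up for this illustration.

```python
def out_shuffle(deck):
    """Cut the deck exactly in half and interleave, keeping the top card on top."""
    half = len(deck) // 2
    top, bottom = deck[:half], deck[half:]
    shuffled = []
    for a, b in zip(top, bottom):
        shuffled.extend((a, b))
    return shuffled


def shuffles_to_restore(n_cards):
    """Count the perfect out-shuffles needed to return to the starting order."""
    original = list(range(n_cards))
    deck = out_shuffle(original)
    count = 1
    while deck != original:
        deck = out_shuffle(deck)
        count += 1
    return count


restore_52 = shuffles_to_restore(52)
print(restore_52)  # prints 8
```

An out-shuffle moves the card at position i to position 2i mod 51 (the bottom card stays put), and 2^8 = 256 = 5 × 51 + 1, which is why eight shuffles bring every card home.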

Forecasting weekly data

What would you do if the seasonal period is rather long and non-integer? For example, with weekly data, ARIMA models do not tend to give good results. The simplest approach in such a situation is a regression with ARIMA errors. Here is an example using weekly data on US finished motor gasoline products supplied (in thousands of barrels per day) from February 1991 to May 2005.
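The summary above does not say which regressors go into the regression. One option that copes with a long, non-integer period is a set of Fourier terms (sine and cosine pairs); treating that as the choice here is an assumption, and the function name, the number of harmonics, and the series length below are all hypothetical. The sketch only builds the regressor matrix; fitting the ARIMA error model is left to a forecasting library.

```python
import math


def fourier_terms(n_obs, period, n_harmonics):
    """Build sine/cosine regressors for a seasonal pattern with any period.

    Returns n_obs rows, each holding [sin_1, cos_1, ..., sin_K, cos_K]
    for time index t = 1..n_obs, where K = n_harmonics.
    """
    rows = []
    for t in range(1, n_obs + 1):
        row = []
        for k in range(1, n_harmonics + 1):
            angle = 2.0 * math.pi * k * t / period
            row.extend((math.sin(angle), math.cos(angle)))
        rows.append(row)
    return rows


# Average number of weeks in a year: about 52.18, which is not an integer.
WEEKS_PER_YEAR = 365.25 / 7

# Two years of weekly observations, three harmonics (both chosen arbitrarily).
regressors = fourier_terms(n_obs=104, period=WEEKS_PER_YEAR, n_harmonics=3)

print(len(regressors), len(regressors[0]))  # rows, columns
print([round(v, 3) for v in regressors[0]])
```

Because the period enters only through the angle, nothing requires it to be a whole number, which is what makes this kind of regressor convenient for weekly data.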

Wednesday, March 5, 2014

Beauty is the First Test

"Beauty is the first test; there is no permanent place in the world for ugly mathematics."
- G. H. Hardy

Why Mathematics Is Beautiful and Why It Matters: here is a Huffington Post article.

No need for SPSS – Now beautiful output in R as well

Many social scientists don't want to move to R because it doesn't give a simple table view like the SPSS output window. The articles below discuss ways to put the results of certain statistics in HTML tables in R. These tables can be saved to disk or, even better for quick inspection, shown in a web browser or viewer pane, and then R output will be at least as beautiful as the SPSS output.

Tuesday, March 4, 2014

Photoshop via Clustering

"Do not believe anything: what artists really do is to hang around all day."
- Paco de Lucia
It seems clustering is the new way to Photoshop: one gets different variations with different numbers of clusters.
PS: Don't miss the video link at the end.

Oldies but Goldies: Some Classical Books on Statistical Graphics

The article below highlights some interesting things about three classical books on statistical graphics. The books are old but still relevant. Together they give a sense of the development of exploratory graphics in general and of the graphics system in R specifically, as all three books were written at Bell Labs, where the S language was developed.

Monday, March 3, 2014

Movies and Statistics

It’s Oscars season again, so why shouldn't statisticians enjoy this movie fever?

Here is some number crunching with IMDb data, using R.

Some tools for predicting Academy Awards.