Thursday, January 30, 2014

Comparing multiple (g)lm in one graph

It is already possible to compare multiple models as table output. Here the author has built a function that plots several (g)lm objects in a single ggplot graph:

Recurrent events analysis, not so straightforward!

Heart failure hospitalizations are associated with an increased risk of cardiovascular death, so if an individual dies during follow-up, this isn't necessarily independent of the event process of interest. Dependent censoring needs to be accounted for in any analysis, which renders standard methods unsuitable. Here is some discussion of the alternative approaches:

History through the president’s words

Studying the presidents' choice of words over time provides glimpses of change in American politics. Check out the different tabs. For example, the Foreign policy tab gives a very clear picture of how relations with various countries evolved.

Free books on statistical learning

Here are some books related to statistical learning, freely available online:

Story Competition

People first started talking about the Normal Distribution nearly 300 years ago. The scientific community used their understanding of the Normal Curve to model and give meaning to the results of their experiments. Today, we owe much of our modern technology and modern world to the discoveries made possible by the Normal Curve. So, what would the world be like if the Normal Curve had never been discovered? Submit your story for a chance at $3500 in cash prizes!

Wednesday, January 29, 2014

Charts That Don’t Start at Zero

A statistician throws light on how improper use of statistical tools can lead to misleading conclusions:

Tuesday, January 28, 2014

Interview with Inventor of S and R

John Chambers (creator of the S programming language and core member of the R project) recounts the history of S and R in the following interview:

John Chambers talks about his involvement in the birth of the S language in 1976, and how it evolved over the years to become the inspiration for the R language.

Monday, January 27, 2014

Public transit times in major cities

Here is an interesting visualization.

You can select the time of day and the day of the week, and get a realistic estimate of how long it takes to get from point A to point B. There is also an interesting comparison option, which lets you choose two locations to see which one will get you somewhere else faster.

Musings on Random Walk

"A drunk man will find his way home, but a drunk bird may get lost forever."
- Shizuo Kakutani
Want to know why? Read at:

R Tricks for Kids

Here is an article from 'Teaching Statistics' that describes simulation models of real-world phenomena, which can be used to engage middle-school students with probability. Links to R instructional material and easy-to-use code are provided to facilitate implementation in the classroom.

Friday, January 24, 2014

An interview with Sir David Cox

Sir David Cox is arguably one of the world’s leading living statisticians. He has made pioneering and important contributions to numerous areas of statistics and applied probability over the years, of which perhaps the best known is the proportional hazards model, which is widely used in the analysis of survival data. In this interview, he says, “I would like to think of myself as a scientist, who happens largely to specialise in the use of statistics.”

Read the complete interview at:

Wednesday, January 22, 2014

Does 1+2+3… really equal -1/12?

A recent Numberphile video claims that the sum of all the positive integers is -1/12. Bothered by that, Evelyn Lamb talks about what it means to assign a value to an infinite series and explains different ways of doing this.

A century of passenger air travel

Kiln and the Guardian explored the 100-year history of passenger air travel. The interactive kicks off with a map that uses live flight data from FlightStats to show all flights currently in the air. Be sure to click through all the tabs. They're worth the watch and listen, with a combination of narration, interactive charts, and old photos.

Tuesday, January 21, 2014

Solving water resource problems using Statistics

In an exclusive interview, Dr. Upmanu Lall, Director of the Columbia Water Center, discusses how he uses Statistics and an understanding of climate, agriculture, commerce, engineering, technology, and politics to solve some of the world’s most pressing water problems:

Sunday, January 19, 2014

Not Missing at Random

Not Missing at Random (NMAR) describes data whose missingness depends on the unobserved values themselves, so the gaps are not random.
Here is an interesting example of NMAR data, with the message that one shouldn't feel sad and low after reading on Facebook about the abnormally flattering lives of one's friends.

The Music Timeline

The Music Timeline shows genres of music waxing and waning using a stacked area chart. Each stripe on the graph represents a genre; the thickness of the stripe tells you roughly the popularity of music released in a given year in that genre.

An Interview with Larry Wasserman

Larry Wasserman is a Professor in the Department of Statistics and the Machine Learning Department at Carnegie Mellon University. His research interests include nonparametric inference, machine learning, statistical topology and astrostatistics. Here is a link to an interview in which he talks about statistics and his career in the field.

R is the most-used tool

O'Reilly has just published the results of the Data Scientist Salary Survey, based on data collected from attendees of the O'Reilly Strata conferences in 2012 and 2013. Each respondent listed the tools they used in both data and non-data roles. R topped the list of statistical software, beating SAS, SPSS, Excel and others.

Thursday, January 16, 2014

Competitions to celebrate 175th Anniversary of ASA

The American Statistical Association is celebrating its 175th anniversary. You may celebrate with them by doing any of the following:
  • Entering ASA's Got Talent, the ASA's unique talent competition
  • Looking for clues in Amstat News and playing ASA's Trivia Challenge
  • Sending in your design for the ASA's official 175th anniversary T-shirt

Submit your entries before 30th April 2014. More details can be found at: 

Wednesday, January 15, 2014

Timeline of Statistics

Check out this concise yet detailed "timeline of statistics" published by Significance magazine to celebrate its 10th anniversary.

Regression with Gradient Descent

Here is an overview of the gradient descent algorithm, which offers some intuition on why the algorithm works and where it comes from, and provides examples of implementing it for ordinary least squares and logistic regression in R:
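The idea can be sketched in a few lines. This is a minimal illustration for ordinary least squares with one predictor, written in Python rather than the R used by the linked post; the function name, learning rate, step count and toy data are all assumptions made for the example, not taken from the post.

```python
def gradient_descent_ols(xs, ys, learning_rate=0.05, n_steps=5000):
    """Fit y = intercept + slope * x by minimising the mean squared error."""
    intercept, slope = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Residuals of the current fit
        residuals = [intercept + slope * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error
        grad_intercept = 2.0 / n * sum(residuals)
        grad_slope = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        # Step against the gradient
        intercept -= learning_rate * grad_intercept
        slope -= learning_rate * grad_slope
    return intercept, slope


# Toy data lying exactly on y = 1 + 2x (hypothetical values)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
intercept, slope = gradient_descent_ols(xs, ys)
print(intercept, slope)
```

The learning rate has to be small enough for the updates to shrink the error rather than overshoot; the value here was chosen to suit this toy data and would need tuning on other data.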

Lexical distance between European languages

So why is English still considered a Germanic language and not a Latinate one? How do you measure the proximity in linguistic families? Read more at:

Tuesday, January 14, 2014

n vs n-1

People keep on wondering “Why is the denominator in the sample mean n, but the denominator for the sample variance is n−1?” All of us have had to answer this question at some time in our careers, either for our students or for ourselves. How do you answer it, and how helpful is your answer? Do you feel obliged to introduce distinctions such as populations vs samples, description vs inference, parameters vs statistics, Greek vs Roman letters? Or more advanced concepts, such as degrees of freedom, dimensions of subspaces, unbiasedness or maximum likelihood? Read more at:
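One concrete way to see the unbiasedness argument is to enumerate every possible sample from a tiny population and average the two candidate variance formulas. The sketch below is only an illustration, assuming a hypothetical population of four values and samples of size 2 drawn with replacement; it is not taken from the linked discussion.

```python
from fractions import Fraction
from itertools import product


def variance(values, denominator):
    """Sum of squared deviations from the mean, divided by `denominator`."""
    mean = Fraction(sum(values), len(values))
    return sum((v - mean) ** 2 for v in values) / denominator


population = [1, 2, 3, 4]   # hypothetical population
n = 2                       # sample size

# True population variance (divide by the population size)
pop_var = variance(population, len(population))

# Every equally likely ordered sample of size n, drawn with replacement
samples = list(product(population, repeat=n))

# Average of each estimator over all possible samples
avg_div_n = sum(variance(s, n) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print(pop_var, avg_div_n, avg_div_n_minus_1)
```

Averaged over all samples, dividing by n−1 recovers the population variance exactly, while dividing by n falls short by a factor of (n−1)/n.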

How Much Time to Conceive?

One of the most important questions that people ask when they make the decision to have a child is: how long is it going to take us to get pregnant? The probabilities mentioned by doctors provide an answer to this question. But these probabilities are estimates at best (albeit, no doubt, educated estimates!) and are associated with some not insignificant uncertainties. Here is an approach to judge how important the value of the monthly probability is in determining the time to conception, using basic probability distributions and R visualisations:
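As a rough sketch of the kind of calculation involved, assume the monthly probability p is constant and that months are independent, so the waiting time follows a geometric distribution. That modelling assumption, the function names and the values of p below are illustrative choices, not figures from the linked post.

```python
def prob_conceived_within(months, monthly_prob):
    """P(conception within `months`) under a constant, independent monthly probability."""
    return 1.0 - (1.0 - monthly_prob) ** months


def expected_months(monthly_prob):
    """Mean of the geometric waiting time, in months."""
    return 1.0 / monthly_prob


# Illustrative monthly probabilities (assumed values)
for p in (0.10, 0.20, 0.30):
    within_year = prob_conceived_within(12, p)
    print(f"p={p:.2f}: P(within 12 months)={within_year:.3f}, "
          f"expected months={expected_months(p):.1f}")
```

Under this simple model, modest changes in p shift the expected waiting time considerably, which is why uncertainty in the monthly probability matters.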

Monday, January 13, 2014

Hidden History

A modern statistician needs to appreciate the historical roots of the profession, argues Terry Speed:

So look to your statistical roots!

Friday, January 10, 2014

Are you saving too much?

The only hard-and-fast rule for how much retirement income you will need is that there is no hard-and-fast rule. New research shows that many retirees can live well on less than the amount suggested by the financial industry, but others rack up higher expenses through travel, expensive hobbies or medical costs that can't be avoided. Read more at:

Thursday, January 9, 2014

From spreadsheet thinking to R thinking

Switching from spreadsheets to R means overcoming some inertia. Here is a post to help with that:

Statistics and The War

We all agree that wars are terrible and to be avoided to the greatest extent possible, yet it is hard not to concede that wars can bring scientific, technological, industrial, cultural, political, even economic benefits. This is one of the many paradoxes of war. Statistics is no exception. Not only was there extremely rapid development of some areas of statistics, especially industrial statistics, but also a large proportion of the leaders in our subject in the 40 years following World War 2 met it for the first time during the War. Most of them would not have become statisticians but for the War.

Wednesday, January 8, 2014

Paul Erdős, The Maverick Genius

One of the finest minds in the history of mathematics, Erdős chose as his epitaph the self-deprecating Hungarian phrase “Finally I am becoming stupider no more.” Read more about him at:

Friday, January 3, 2014

Bodily maps of emotions

Emotions are often felt in the body, and somatosensory feedback has been proposed to trigger conscious emotional experiences. As one would expect, the body looks like it shuts down with depression, and it lights up with happiness, but it's the subtle differences that are most interesting. Read how statistical classifiers were used to distinguish emotion-specific activation maps accurately:

Thursday, January 2, 2014

Generalized linear models for predicting rates

We often need to build a predictive model that estimates rates. A simple example is estimating default rates of mortgages or credit cards. One could try linear regression, but specialized tools often do much better. Here is a discussion of how to do such things in R:

Experience the thrill of touching real data

The story of one man's efforts to re-analyse the stats behind a BBC report on bowel cancer is a heartwarmingly nerdy one:

Wednesday, January 1, 2014

Parallelisation may not always be better than sequential processing

Parallelisation incurs some overhead: information needs to be distributed over the nodes, and the result from each node needs to be collected and aggregated into the resulting object. This overhead is one of the main reasons why in certain cases parallel processing takes longer than sequential processing. Read more at:

Animation of the Construction of a Confidence Interval

The confidence interval is one of the trickier statistical concepts. One way of explaining a confidence interval is as the set of null-hypothesis values that the corresponding significance test would not reject. It turns out that it is not easy to make a nice explanatory animation of this either, but that’s what has been tried here:
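The "region of non-rejected null hypotheses" view can be checked numerically. The sketch below is an illustration only: it assumes a two-sided z-test for a mean with known standard deviation, and the sample mean, sigma, sample size and grid of candidate null values are made up. It scans the candidates, keeps those the test does not reject, and compares the result with the textbook interval.

```python
from math import sqrt
from statistics import NormalDist


def z_test_rejects(sample_mean, null_mean, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mean == null_mean, with known sigma."""
    z = (sample_mean - null_mean) / (sigma / sqrt(n))
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


# Hypothetical summary statistics
sample_mean, sigma, n = 10.0, 2.0, 25

# Candidate null values from 8.000 to 12.000 in steps of 0.001
grid = [i / 1000 for i in range(8000, 12001)]
not_rejected = [m for m in grid if not z_test_rejects(sample_mean, m, sigma, n)]
lower, upper = min(not_rejected), max(not_rejected)

# Textbook 95% interval: mean +/- z * sigma / sqrt(n)
half_width = NormalDist().inv_cdf(0.975) * sigma / sqrt(n)
print(lower, upper)
print(sample_mean - half_width, sample_mean + half_width)
```

The two pairs of endpoints agree up to the resolution of the grid, which is the duality that the animation tries to convey.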