Big Data: When Can You Act on Correlation?

The key is to know when correlation is enough, and what to do when it is not

David Ritter, director at BCG, has explored the world of big data and when companies should take action based on observed correlations in the data.

His ideas matter for business: if correlation is enough, companies do not have to know what causes customers to act. It may be enough to know which things tend to happen together.

For example, many large supermarkets already understand that women who buy certain kinds of food are more likely to be pregnant. Digital Life reported that in 2012:

… news broke of how data analytics by Target in the US enabled it to identify which customers were pregnant – and even what trimester they were in. It famously sent coupons for baby products to a teenage girl whose father, unaware she was expecting, angrily confronted a Minneapolis store manager.

Ritter notes that the key question when looking at correlation in the data is “Can I take action on the basis of a correlation finding?”

And his answer:

The answer to that question is “It depends”—primarily on two factors:

  1. Confidence That the Correlation Will Reliably Recur in the Future. The higher that confidence level, the more reasonable it is to take action in response.
  2. The Tradeoff Between the Risk and Reward of Acting. If the risk of acting and being wrong is extremely high, [then] … acting on even a strong correlation may be a mistake.

The first factor—the confidence that the correlation will recur—is in turn a function of two things: the frequency with which the correlation has historically occurred (the more often events occur together in real life, the more likely it is that they are connected) and the understanding around what is causing that statistical finding. This second element—what we call “clarity of causality”—stems from the fact that the fewer possible explanations there are for a correlation, the higher the likelihood that the two events are in fact linked. Considering frequency and clarity together yields a more reliable gauge of the overall confidence in the finding than evaluating only one or the other in isolation.
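As a rough illustration of how the two factors might combine, here is a minimal Python sketch. It is not Ritter's method: the 0-to-1 scores, the choice to multiply frequency by clarity, the expected-value rule, and the names confidence and should_act are all assumptions made for this example.

```python
def confidence(frequency: float, clarity: float) -> float:
    """Combine frequency and clarity of causality into one score in [0, 1].

    Multiplying the two is an assumption for illustration only; the article
    says both matter but gives no formula.
    """
    for name, value in (("frequency", frequency), ("clarity", clarity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return frequency * clarity


def should_act(frequency: float, clarity: float,
               reward_if_right: float, cost_if_wrong: float) -> bool:
    """Act only if the expected payoff is positive (hypothetical rule)."""
    p = confidence(frequency, clarity)
    expected_value = p * reward_if_right - (1.0 - p) * cost_if_wrong
    return expected_value > 0


# Same strong correlation, very different downside if wrong.
low_risk = should_act(0.9, 0.9, reward_if_right=10, cost_if_wrong=5)
high_risk = should_act(0.9, 0.9, reward_if_right=10, cost_if_wrong=1000)
```

With these made-up numbers, the same strong correlation justifies action when being wrong is cheap, but not when being wrong is very costly, which is the tradeoff described in the second factor.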

When working with big data, sometimes correlation is enough. But other times, understanding the cause is vital. The key is to know when correlation is enough—and what to do when it is not.

To read the full article by David Ritter, visit the BCG website.