Data-Driven Decisions Start with These 4 Questions

Data is often called the new oil, and it is bringing unique advantages to the business world. Used properly, data can open up enormous possibilities, but getting the correct answers depends on asking the right questions. Read this blog post to learn how asking new questions of data can help optimize operations and open up new possibilities.


Data has become central to how we run our businesses today. In fact, the global market intelligence firm International Data Corporation (IDC) projects spending on data and analytics to reach $274.3 billion by 2022. However, much of that money is not being spent wisely. Gartner analyst Nick Heudecker has estimated that as many as 85% of big data projects fail.

A big part of the problem is that numbers that show up on a computer screen take on a special air of authority. Once data is pulled in through massive databases and analyzed with complex analytics software, we rarely ask where it came from, how it’s been modified, or whether it’s fit for its intended purpose.

The truth is that to get useful answers from data, we can’t just take it at face value. We need to learn how to ask thoughtful questions. In particular, we need to know how it was sourced, what models were used to analyze it, and what was left out. Most of all, we need to go beyond using data simply to optimize operations and leverage it to imagine new possibilities.

We can start by asking:

How was the data sourced?

Data, it’s been said, is the plural of anecdote. Records of real-world events, such as transactions and diagnostics, are stored in massive server farms. Yet few bother to ask where the data came from, even though the care with which it is gathered can vary widely. In fact, a Gartner study recently found that firms lose an average of $15 million per year due to poor data quality.

Often data is subject to human error, such as when poorly paid and unmotivated retail clerks perform inventory checks. However, even when the data collection process is automated, there are significant sources of error, such as intermittent power outages in cellphone towers or mistakes in the clearing process for financial transactions.

Data that is of poor quality or used in the wrong context can be worse than no data at all. In fact, one study found that 65% of a retailer’s inventory data was inaccurate. Another concern, which has become increasingly important since the EU passed its stringent GDPR data standards, is whether proper consent was obtained when the data was collected.

So don’t just assume the data you have is accurate and of good quality. You have to ask where it was sourced from and how it’s been maintained. Increasingly, we need to audit our data transactions with as much care as we do our financial transactions.
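One practical way to start asking where data came from is a spot-check audit: compare what the database says against a physical count, and trace each mismatch back to how the record was captured. The sketch below is a minimal illustration, not a prescribed method; the `InventoryRecord` fields, the `audit` function, and the sample records are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class InventoryRecord:
    # Hypothetical schema for illustration; real inventory systems will differ.
    sku: str
    recorded_qty: int       # quantity according to the database
    counted_qty: int        # quantity found in a physical spot check
    source: Optional[str]   # how the record was captured, or None if unknown


def audit(records: list) -> dict:
    """Compare recorded quantities with a spot check and trace errors to their source."""
    if not records:
        return {"records": 0, "inaccurate_share": 0.0,
                "errors_by_source": {}, "unknown_source": 0}
    mismatched = [r for r in records if r.recorded_qty != r.counted_qty]
    # Grouping errors by capture channel shows whether one process is the problem.
    errors_by_source = Counter(r.source or "unknown" for r in mismatched)
    return {
        "records": len(records),
        "inaccurate_share": len(mismatched) / len(records),
        "errors_by_source": dict(errors_by_source),
        "unknown_source": sum(r.source is None for r in records),
    }


sample = [
    InventoryRecord("A1", 10, 10, "scanner"),
    InventoryRecord("B2", 5, 3, "manual"),
    InventoryRecord("C3", 0, 2, None),
    InventoryRecord("D4", 7, 7, "scanner"),
]
report = audit(sample)
print(report)
```

In this made-up sample, half the records disagree with the physical count, and one of the bad records has no known source at all, which is exactly the kind of gap an audit should surface before anyone builds a model on the data.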

How was it analyzed?

Even if data is accurate and well maintained, the quality of analytic models can vary widely. Often models are pulled together from open-source platforms, such as GitHub, and repurposed for a particular task. Before long, everybody forgets where they came from or how they evaluate a particular data set.

Lapses like these are more common than you’d think and can cause serious damage. Consider the case of two prominent economists who published a working paper that warned that U.S. debt was approaching a critical level. Their work caused a political firestorm but, as it turned out, they had made a simple Excel error that caused them to overstate the effect that debt had on GDP.

As models become more sophisticated and incorporate more sources, we’re also seeing bigger problems with how models are trained. One of the most common errors is overfitting, in which a model fits the quirks of its training data so closely that it fails on new data; the more variables you use to build a model, the greater that risk becomes. In some cases, excess data can also lead to data leakage, in which information from the testing data seeps into the training data, making a model look more accurate than it really is.
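A common way leakage creeps in is when records from the same customer end up in both the training and the testing sets, so the model is effectively tested on people it has already seen. The sketch below shows one widely used safeguard, a grouped split that keeps each customer’s records on one side only, plus a simple overlap check. The function names, the `customer_id` key, and the data are hypothetical, chosen only for illustration.

```python
import random


def split_by_key(rows, key, test_fraction=0.2, seed=0):
    """Assign each value of `key` wholly to train or test, never to both."""
    # Sort first so the shuffle is reproducible even for string keys.
    keys = sorted({row[key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(keys)
    n_test = max(1, int(len(keys) * test_fraction))
    test_keys = set(keys[:n_test])
    train = [row for row in rows if row[key] not in test_keys]
    test = [row for row in rows if row[key] in test_keys]
    return train, test


def leaked_keys(train, test, key):
    """Return key values present in both sets, a common symptom of leakage."""
    return {row[key] for row in train} & {row[key] for row in test}


# Hypothetical data: 50 transactions from 10 customers.
rows = [{"customer_id": i % 10, "amount": i} for i in range(50)]

# Naive split by position: every customer ends up on both sides.
naive_train, naive_test = rows[:40], rows[40:]
print(len(leaked_keys(naive_train, naive_test, "customer_id")))  # prints 10

# Grouped split: each customer's rows stay together.
train, test = split_by_key(rows, "customer_id")
print(len(leaked_keys(train, test, "customer_id")))  # prints 0
```

The naive split leaks all ten customers into the test set, while the grouped split leaks none. Checking for this kind of overlap is cheap and catches one of the most common ways a model’s reported accuracy overstates how it will perform in the real world.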

These types of errors can plague even the most sophisticated firms. Amazon and Google, just to name two of the most prominent cases, have recently had highly publicized scandals related to model bias. As we do with data, we need to constantly be asking hard questions of our models. Are they suited to the purpose we’re using them for? Are they taking the right factors into account? Does the output truly reflect what’s going on in the real world?

What doesn’t the data tell us?

Data models, just like humans, tend to base judgments on the information that is most available. Sometimes, the data you don’t have can affect your decision making as much as the data you do have. We commonly associate this type of availability bias with human decisions, but often human designers pass it on to automated systems.

For instance, in the financial industry, those who have extensive credit histories can access credit much more easily than those who don’t. The latter, often referred to as “thin-file” clients, can find it difficult to buy a car, rent an apartment, or get a credit card. (One of us, Greg, experienced this problem personally when he returned to the U.S. after 15 years overseas.)

Yet a thin file doesn’t necessarily indicate a poor credit risk. Firms often end up turning away potentially profitable customers simply because they lack data on them. Experian recently began to address this problem with its Boost program, which allows consumers to raise their scores by giving them credit for things like regular telecom and utility payments. To date, millions have signed up.

So it’s important to ask hard questions about what your data model might be missing. If you are managing what you measure, you need to ensure that what you are measuring reflects the real world, not just the data that’s easiest to collect.
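A simple way to ask what your data is missing is to count how many people a rule simply cannot assess, and how many of them have some other record you are not using. The sketch below assumes a hypothetical applicant record and an arbitrary thin-file cutoff of three accounts; it is not how Experian or any credit bureau actually scores applicants.

```python
from dataclasses import dataclass

# Assumed cutoff for illustration only; not an industry or Experian standard.
THIN_FILE_THRESHOLD = 3


@dataclass
class Applicant:
    # Hypothetical fields for illustration only.
    name: str
    credit_accounts: int       # traditional credit accounts on file
    on_time_bill_months: int   # months of on-time utility/telecom payments, 0 if none recorded


def coverage_report(applicants):
    """Count applicants that a credit-history-only rule cannot assess."""
    thin = [a for a in applicants if a.credit_accounts < THIN_FILE_THRESHOLD]
    # Thin-file applicants who nonetheless have some other payment record.
    with_alt = [a for a in thin if a.on_time_bill_months > 0]
    return {
        "applicants": len(applicants),
        "thin_file": len(thin),
        "thin_file_with_alternative_data": len(with_alt),
    }


applicants = [
    Applicant("Ana", 5, 24),
    Applicant("Ben", 0, 36),
    Applicant("Chen", 1, 0),
    Applicant("Dee", 2, 12),
]
result = coverage_report(applicants)
print(result)
```

In this toy example, three of four applicants fall below the cutoff, yet two of those three have years of on-time bill payments. A report like this does not decide who is creditworthy; it just makes the blind spot visible so that someone can ask whether the missing data matters.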

How can we use data to redesign products and business models?

Over the past decade, we’ve learned how data can help us run our businesses more efficiently. Using data intelligently allows us to automate processes, predict when our machines need maintenance, and serve our customers better. It’s data that enables Amazon to offer same-day shipping.

Data can also become an important part of the product itself. To take one famous example, Netflix has long used smart data analytics to create better programming for less money. This has given the company an important edge over rivals like Disney and WarnerMedia.

Yet where it gets really exciting is when you can use data to completely re-imagine your business. At Experian, where Eric works, the company has been able to leverage the cloud to shift from delivering only processed data, in the form of credit reports, to a service that offers customers real-time access to the more granular data on which the reports are based. That may seem like a subtle shift, but it’s become one of the fastest-growing parts of Experian’s business.

It’s been said that data is the new oil, but it’s far more valuable than that. We need to start treating data as more than a passive asset class. If used wisely, it can offer a true competitive edge and take a business in completely new directions. To achieve that, however, you can’t start by merely looking for answers. You have to learn how to ask new questions.

SOURCE: Haller, E.; Satell, G. (11 February 2020) "Data-Driven Decisions Start with These 4 Questions" (Web Blog Post). Retrieved from https://hbr.org/2020/02/data-driven-decisions-start-with-these-4-questions