Is your data really oil?

[with Ajay Agrawal and Avi Goldfarb, originally published in HBR Online under the title “Is your company’s data actually valuable in the AI era?”, 17 Jan 2018. Their book, Prediction Machines, is coming out in April 2018].

AI is coming. That is what we heard throughout 2017 and will likely continue to hear throughout this year. For established businesses that are not Google or Facebook, a natural question to ask is: What have we got that is going to allow us to survive this transition?

In our experience, when business leaders ask this with respect to AI, the answer they are given is “data.” This view is confirmed by the business press. There are hundreds of articles claiming that “data is the new oil” — by which they mean it is a fuel that will drive the AI economy.

If that is the case, then your company can consider itself lucky. You collected all this data, and then it turned out you were sitting on an oil reserve when AI happened to show up. But when you have that sort of luck, it is probably a good idea to ask “Are we really that lucky?”

The “data is oil” analogy does have some truth to it. Like internal combustion engines with oil, AI needs data to run. AI takes in raw data and converts it into something useful for decision making. Want to know the weather tomorrow? Let’s use data on past weather. Want to know yogurt sales next week? Let’s use data on past yogurt sales. AIs are prediction machines driven by data.

But does AI need your data? There is a tendency these days to see all data as potentially valuable for AI, but that isn’t really the case. Yes, data, like oil, is used day-to-day to operate your prediction machine. But the data you are sitting on now is likely not that data. Instead, the data you have now, which your company accumulated over time, is the type of data used to build the prediction machine — not operate it.

The data you have now is training data. You use that data as input to train an algorithm. And you use that algorithm to generate predictions to inform actions.

So, yes, that does mean your data is valuable. But it does not mean your business can survive the storm. Once your data is used to train a prediction machine, it is devalued. It is not useful anymore for that sort of prediction. And there are only so many predictions your data will be useful for. To continue the oil analogy, data can be burned. It is somewhat lost after use. Scientists know this. They spend years collecting data, but once it has produced research findings, it sits unused in a file drawer or on a backup disk. Your business may be sitting on an oil well, but it’s finite. It doesn’t guarantee you more in the AI economy than perhaps a more favorable liquidation value.

Even to the extent that your data could be valuable, your ability to capture that value may be limited. How many other sources of comparable data exist? If you are one of many yogurt vendors, then your database containing the past 10 years of yogurt sales and related data (price, temperature, sales of related products like ice cream) will have less market value than if you are the only owner of that type of data. In other words, just as with oil, the greater the number of other suppliers of your type of data, the less value you can capture from your training data. The value of your training data is further influenced by the value generated through enhanced prediction accuracy. Your training data is more valuable if enhanced prediction accuracy can increase yogurt sales by $100 million rather than only $10 million.

Moreover, the ongoing value of data usually comes from the actions you take in your day-to-day business — the new data you accrue each day. New data allows you to operate your prediction machine after it is trained. It also enables you to improve your prediction machine through learning. While 10 years of data on past yogurt sales is valuable for training an AI model to predict future yogurt sales, the actual predictions used to manage the supply chain require operational data on an ongoing basis. And this is the important point for today’s incumbent companies.
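To make the distinction concrete, here is a minimal sketch in Python. It is only an illustration: the numbers are made up, the single temperature input is an assumption, and a simple least-squares line stands in for whatever model a real business would use. The historical data is used once, to fit the model. After that, every prediction needs a fresh operational input.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical training data: past daily temperatures (Celsius)
# and yogurt units sold on those days. Used once, to build the model.
past_temps = [10.0, 15.0, 20.0, 25.0, 30.0]
past_sales = [120.0, 150.0, 180.0, 210.0, 240.0]

intercept, slope = fit_line(past_temps, past_sales)


def predict_sales(todays_temp):
    """Operating the model requires a new input every time it is used."""
    return intercept + slope * todays_temp


# Hypothetical operational data: today's forecast temperature.
print(predict_sales(22.0))
```

Once `fit_line` has run, the historical lists add nothing further to this particular prediction. What the model needs from then on is the next day's input, and only the business that keeps operating keeps producing it.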

An AI startup that acquires a trove of data on past yogurt sales can train an AI model to predict future sales. But it can’t actually use that model to make decisions unless it also obtains ongoing operational data to feed the model and learn from. Unlike startups, large enterprises generate operational data every day. That’s an asset. The more operations, the more data. Furthermore, the owner of the operation can actually make use of the prediction. It can use the prediction to enhance its future operation.

In the AI economy, the value of your accumulated data is limited to a one-time benefit from training your AI model. And the value of training data is, like oil or any other input, influenced by the overall supply — it’s less valuable when more people have it. In contrast, the value of your ongoing operational data is not limited to a one-time benefit, but rather provides a perpetual benefit for operating and further enhancing your prediction machine. So, despite all the talk about data being the new oil, your accumulated historical data isn’t the thing. However, it may be the thing that gets you to the thing. Its value for your future business prospects is low. But if you can find ways to generate a new, ongoing data stream that delivers a performance advantage in terms of your AI’s predictive power, that will give you sustainable leverage when AI arrives.
