Behind the Buzz of Behavioral Data

What is the economic value of online behavioral data? It is a seemingly simple question.

Define the terms. Behavioral data refers to information produced as a result of actions, typically commercial behavior using a range of devices connected to the Internet, such as a PC, tablet, or smartphone. Behavioral data tracks the sites visited, the apps downloaded, or the games played.

More to the point, behavioral data is not static data. Static data refers to a person’s slowly changing features, such as education, income, occupation, or (typically) residential location. Behavioral data has value because it provides information above and beyond what static data can provide.

Behavioral data has crept into economic life through the efforts of numerous websites that track visitors with cookies, among many other methods. For example, if a man named Sam visits the Porsche website, the visit might get recorded in a database about Sam’s online behavior. What is the value of knowing that Sam visited that site? The data could be informative about what type of car Sam prefers to buy, namely, Porsche automobiles and expensive sports cars. The data permits an analyst to divide people into segments on the basis of preferences: in this case, those who shop for sports cars (as opposed to those who do not).

So, what is wrong with that? Issues are starting to arise because the segmenting is just too good. This requires an explanation.

Additional information

Behavioral data contributes to two valuable activities—risk assessment and targeted advertising. These overlap, and their differences cause trouble.

First consider targeted marketing, taking the example of Sam visiting the Porsche website. Before behavioral data, marketing campaigns aimed at Sam used static data. Static data already reveals plenty about Sam. Long ago, somebody figured out how to collect data from a state’s Department of Motor Vehicles, which keeps good records on auto registrations. From those data alone, many auto companies already knew Sam’s driving record and general preference for a sports car, or something else like the one Sam drove.

If someone had cared to look, additional mailing lists also indicated related tendencies about Sam, including his age, gender, income, and marital status. A host of other databases could reveal even more about him, like his preference for magazines, whether he owns a gun, and so on. Those might indicate whether Sam was a good candidate for a sporty sedan or not, for example.

In other words, the incremental value of behavioral data does not arise in a vacuum. Static data also allowed somebody to estimate these probabilities. Behavioral data has to improve on this.

Does it? Yes, if it helps targeting, namely, estimating the probability of an event. Said another way, there is no point in showing an auto ad to someone who never had any likelihood of buying the product. The purpose of an ad is to improve the odds, and that requires showing the right ad to the right person.
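To make the idea of incremental value concrete, here is a minimal sketch in Python. It assumes a toy table of records in which each person has one static trait (high income or not), one behavioral trait (visited a sports-car site or not), and a purchase outcome. The counts are invented purely for illustration, and the names RECORDS and purchase_rate are hypothetical. The sketch compares the purchase rate estimated from the static trait alone with the rate estimated once the behavioral trait is added.

```python
from collections import defaultdict

# Hypothetical records: (high_income, visited_sports_car_site, bought_sports_car).
# All counts are invented for illustration; they are not real data.
RECORDS = (
    [(True, True, True)] * 8 + [(True, True, False)] * 12
    + [(True, False, True)] * 2 + [(True, False, False)] * 78
    + [(False, True, True)] * 1 + [(False, True, False)] * 19
    + [(False, False, False)] * 80
)


def purchase_rate(records, key):
    """Return the share of buyers within each segment defined by key(record)."""
    buyers = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        segment = key(rec)
        totals[segment] += 1
        buyers[segment] += int(rec[2])
    return {segment: buyers[segment] / totals[segment] for segment in totals}


# Segments built from static data only (income).
static_only = purchase_rate(RECORDS, key=lambda r: r[0])

# Segments built from static data plus behavior (income and site visit).
static_plus_behavior = purchase_rate(RECORDS, key=lambda r: (r[0], r[1]))

for segment, rate in sorted(static_plus_behavior.items()):
    print(segment, round(rate, 3))
```

In this made-up table, income alone puts every high-income person in one segment with a single estimated rate, while adding the site visit splits that segment into a small group with a much higher rate and a large group with a much lower one. That split is the incremental value the text describes; how large it is in practice depends entirely on the real data.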

Static data already allowed a marketer to estimate the odds. In other words, Porsche dealers already had a pretty good idea about which neighborhoods to target if they wanted to let potential buyers know about the sale of a few models, an event which might get somebody to buy a Porsche instead of another sports car. Historical experience also informs a dealer’s estimate of likely conversion rates in each neighborhood.

But behavioral data allows for something on top of more precise targeting. Online targeting allows for rapid response. That is because the marketing effort can catch the reader while they are performing online activity. That can catch someone while shopping online at home or using a smartphone in a retail outlet.

In the old days, static data only went so far. Advertisers had to buy time on local television, or pay for newspaper inserts. These were not very targeted and would show the ads to far more viewers than necessary.

Targeting a surfer potentially works much better if it separates surfers into many groups and lets the car dealer avoid showing ads to anybody who has zero chance of making a purchase. In other words, the Porsche dealer may look for high-income targets, but not all high-income households. Online behavior can help identify customers who never purchase Porsches and never will—avoiding young adults with less income, say, or older adults with no demonstrable taste for speed, or minivan owners with too many children to buy a two-seater.


So far this all sounds inoffensive. Targeted advertising saves money by not serving an ad to everyone. This is usually inoffensive, because, in principle, the dealer is not refusing to serve any group (in the unlikely event they show up). Refusing to deal is illegal, but refusing to inform with an ad is not.

A slight rephrasing should make this situation sound more problematic. Targeted advertising is one of the few legal forms of discrimination.

That brings us to the issues with risk assessment. In many of its mathematical formats, risk assessment is not much different than targeted advertising. Indeed, sometimes the algebra is identical. Risk assessment is all about dividing people into groups and assigning risks to each group. Sometimes that borders on illegal discrimination.

Think of it this way. Auto insurers want to catalogue people into risky drivers and less risky drivers, and then divide by distinct degrees of risk (for example, few speeding tickets). It helps them estimate premiums for different types of car owners. Not only is insurance for Porsches expensive due to their high cost of repair, but the insurance should be more expensive if the drivers of Porsches tend to get in more accidents. Premiums reflect those factors.
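The claim that the algebra can be identical is easy to illustrate. The sketch below is only an assumption about one common form such models take, a logistic score; the feature names, weights, ad threshold, and premium formula are all hypothetical and are not drawn from any actual advertiser or insurer. The point is that a single scoring function serves both purposes, and only the weights and the downstream decision differ.

```python
import math


def score(features, weights, bias):
    """Logistic score: estimated probability of an event given the features."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical profile built from behavioral and static data about one person.
sam = {"visited_sports_car_site": 1.0, "speeding_tickets": 2.0}

# Hypothetical weights: one set for "will buy", one for "will have an accident".
PURCHASE_WEIGHTS = {"visited_sports_car_site": 2.0, "speeding_tickets": 0.1}
ACCIDENT_WEIGHTS = {"visited_sports_car_site": 0.5, "speeding_tickets": 0.8}

# Same function, two uses.
p_buy = score(sam, PURCHASE_WEIGHTS, bias=-3.0)
p_accident = score(sam, ACCIDENT_WEIGHTS, bias=-4.0)

# Targeting decision: show the ad only above a hypothetical threshold.
show_ad = p_buy > 0.10

# Pricing decision: scale a hypothetical base premium by estimated risk.
premium = 800.0 * (1.0 + 5.0 * p_accident)

print(round(p_buy, 3), show_ad, round(p_accident, 3), round(premium, 2))
```

Note that the same behavioral feature, the site visit, feeds both scores. In one it decides whether Sam sees an ad; in the other it moves the price he pays.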

In this sense, behavioral data is valuable. Its value arises from better predictive models for risk, which tailor risk-related products to the variance in human behavior instead of merely the variance in static categories of human traits.

Make no mistake about its value. Quite a few important economic transactions are sensitive to the risk assessment attached to someone. It is so for just about anything involving credit—for example, credit cards, mortgages, auto loans, and even a seller’s credit for a furniture purchase.

That points toward the danger, too. Modern societies have never been comfortable with too much tailoring of any of these risk-assessment products. To be blunt, it has become acceptable to discriminate on the basis of smoking and drinking, but not race, gender, religion, or many other factors.

That should give anybody some pause. Behavioral data can become functionally equivalent to racial and religious discrimination, or to discrimination on such matters as sexual orientation, sports and hobbies, and tastes for products and services. Online behavioral data often reveals quite a bit, after all.


Society is also uncomfortable with behavioral data because it can contain mistakes. Standard commercial processes can accentuate the mistakes if they include no mechanism for corrections.

To illustrate the issues, consider credit fraud, which happens with alarming frequency today. If someone steals Sam’s credit card and makes a range of illegal purchases, Sam must make sure the fraud does not stay in his records, ruining his credit score. Fortunately, there are regulations that deal with such events and compel firms to fix them, and Sam can learn quickly who to call.

Now consider something remarkably common, such as an identity mistake during registration at an online site. What if Sam’s actions get mistaken for another person’s because he and another person share the same first and last name? No law governs a firm’s obligations. It can be much more difficult to clear up, and there are many stories of these types of issues beginning to multiply at modern data brokers.

Commercial practices can make matters worse. In particular, one data broker might sell records to another, or sell its predictive model to another, as is common when the two do not provide competing services. Through a variety of resale mechanisms, once a mistake finds its way into one database, it may find its way into many others.

How do users find out if a mistake played a role in an important aspect of a transaction, such as its price? That can be a challenge. And what right does a user have to forbid a firm from using information? That is an open question in many countries today.

Think about this for 10 minutes, and the questions get more vexing. What online behavior can vendors put into the databases of brokers? In many countries, the answer is “everything,” including YouTube videos watched, a cooking site featuring high-cholesterol dishes, and pornography.


To summarize, behavioral data’s value over static data arises from the finer segmenting it permits. Issues are starting to arise, however, because behavioral data is becoming too good. Similarity between the mathematics of targeting and risk assessment raises the stakes. One person’s target can be another’s risk assessment.

It’s a conundrum: a seemingly inoffensive target for marketing can easily become an offensive and costly insurance premium. Are you confident we are in good hands?

Copyright held by IEEE. To view the original, see here.
