Predictably inaccurate: Big data inaccuracy, prevalence, and pitfalls

Do our digital breadcrumbs truly portray who we really are? In the mad rush to collect big data, are companies overlooking the fact that not all of what they have might be relevant? John Lucker, Susan Hogan, and Trevor Bischoff share insights on targeted messaging, data integrity, and much more.

When a matchmaker says, “Tanya, I’ve got someone perfect for you,” you’re not going to run off and marry that person. You’re going to do your research. You’re going to spend time with that person and verify that what they are saying is correct. 


TRANSCRIPT

TANYA OTT: First impressions aren’t always what they appear. It’s true for relationships—and for big data. 

I’m Tanya Ott, and this is the Press Room, Deloitte University Press’s podcast on the issues and ideas that you and your business should be thinking about. And one of those issues is data—how you get it and what you do with it.

I’m gonna start [with] a little personal [story] here. I live with my very own data geek. My husband Jason is a data scientist and whenever I’m interviewing someone for the podcast and they cite research, he always peppers me with questions.  

JASON FULMORE: What’s the methodology? How did they collect their data? Was it self-report or something else? What’s the sample size? [What are the] limitations?

TANYA OTT: Honestly, it can be maddening at times because I don’t always know the answers. But here’s the thing—many businesses spend a load of money buying data about their potential customers, and those businesses also don’t know the answers to those questions.

JOHN LUCKER: When I came up with [the] idea, it was largely as a result of being a consumer.

TANYA OTT: John Lucker is the global advanced analytics market leader at Deloitte and Touche LLP.

JOHN LUCKER: My role is to help identify opportunities to leverage data and analytics to solve tough business problems.

TANYA OTT: John says despite all the buzz about big data, as a consumer he didn’t feel like companies were interacting with him in particularly personalized or meaningful ways.

JOHN LUCKER: I still feel like largely I get blanket emails that everybody else gets. I feel like I’m touched in the same way that the Sunday newspaper supplements touch me—you know, everybody gets the same thing. And marketers seem to hold on to what is called a “spray-and-pray” technique, which is just send out a lot of offers and hope that you get 0.2 percent more responses than you got the last time you sent them out.

TANYA OTT: Can you point to a particular instance in your life where you got something and you thought, “Wow! This is so off the mark. How could they possibly profile me in this way with some kind of data?”

JOHN LUCKER: You know, many of us have experienced this. Perhaps we’re helping our parents out with some of their bills or things like that. And then they pass away, and you still [get] their mail years later. Or maybe a particular company consistently spells your name wrong.

These are the types of things that I’ve noticed and I know many of my friends have noticed. And I’ve talked casually with people in the business world, and we just kind of scratch our heads about how this data could be so problematic.

TANYA OTT: You’re really savvy at this stuff, but for people for whom this is not their expertise, can you explain how this whole ecosystem of data collection works? I mean, some data is collected by individual companies and other data, they buy from brokers?

JOHN LUCKER: I’ll try. The ecosystem is, in fact, a bit of a mystery to most, and I would suspect that those who know it best are the ones directly involved in the data-broker industry. It’s considered to be a largely proprietary industry, and many data brokers keep it pretty close in terms of how the data flows: what they gather as an individual company versus what they license and resell from others. It’s very difficult to figure that out.

What I’ve learned in working with and talking to some data brokers is that they specialize in certain things. They tend to create niches or specialties that they’re known for. And then they round out the kind of holistic personal [picture] of the individuals or the companies that they have in their data by gathering data through licensing agreements and reseller agreements from other data brokers. So, in essence, data broker A and data broker B may only gather 5, 10, [or] 15 percent of all the data they sell, and then they just get it from other companies to create their data products and services.

TANYA OTT: If I were to use an analogy from the computing sphere, it’s almost like something pings all over the place and you can’t quite track back where the origin of the data is.

JOHN LUCKER: That’s correct. And that’s one of the big problems companies who purchase data from data brokers need to think about when they use this data for important business purposes.

TANYA OTT: Here’s the thing. A lot of consumers, just like you and me, are convinced that companies have really robust data profiles on us. After all, they can track what we buy online. They can scan public records for information on our houses and our jobs. They can track our movement on social media. Many people are actually kind of creeped out when they realize just how extensive their trail of digital breadcrumbs is.

But John Lucker wanted to know just how accurate all of that data is, so he tapped two of his colleagues at Deloitte—Susan Hogan ...

SUSAN HOGAN: I’m on the behavioral economics and management team. We explore a lot of the behavioral insights—that’s the “why” behind behavior and cognitive biases that come into play.

TANYA OTT: ... and Trevor Bischoff.

TREVOR BISCHOFF: I’m deeply involved with data analytics, predictive modeling, and regulatory guidance for financial firms.

TANYA OTT: And together, Trevor, Susan, and John created a survey ...

JOHN LUCKER: And we sent it to a tranche of people inside our firm who typically would represent a highly desirable base of customers that most companies would salivate to have.

TANYA OTT: Those people voluntarily and anonymously responded to the survey by going out to a particular data-broker service, retrieving their records, and then reporting back to John, Susan, and Trevor about the accuracy and timeliness of their data. And what they found was surprising. A significant amount of basic demographic data about them was wrong and sometimes quite wrong.

TREVOR BISCHOFF: We found that the highest performing category was home data, in which only 41 percent of respondents reported that data was 0 to 50 percent correct. The worst performing category was economic data. Eighty-four percent of respondents reported that their data was 0 to 50 percent correct.

JOHN LUCKER: Statistically, that means that for most of the data fields that were not accurate, the accuracy was barely a coin toss. It really makes you wonder, and frankly makes you a little bit concerned, about what might happen with that data.

TANYA OTT: Trevor says even some pretty basic data was wrong.

TREVOR BISCHOFF: How many rooms or bathrooms are in their house; the estimated value of their home; their estimated annual income. If you have someone’s address, there are Internet sites that you might be able to pull data from. Whereas economic data can be a little bit more difficult to access, because that information isn’t freely available.

TANYA OTT: And some people are just simply left out of the data collection process.

JOHN LUCKER: One of the things that we found in our study was that people who were born outside the United States, but have been living in the United States for a reasonable period of time—working, living, and consuming here—still had a surprisingly low data presence and [low] accuracy in the data that was studied. This implies that there is a timeliness and data-flow issue in the ecosystem, and because of it, it can also create a natural bias and skew in any analytics that might be done, because these people are either missing or [their data is] incomplete. I think that’s kind of a classic case study of what can happen and why data brokers probably need to study these things more carefully.

TANYA OTT: Why is there so much inaccurate data?

SUSAN HOGAN: It could be that the data was collected in a way that was not ideal. People may have been subject to “demand effects”—they may have felt their response had an evaluative component. If you asked me what I had for breakfast and I actually had, you know, something that is not acceptable …

TANYA OTT: Like chocolate cake?

SUSAN HOGAN: Chocolate cake! Or a certain product, alcohol or a cigarette or something like that for breakfast, I may not have shared that! So anything where you’re asked something in a setting and you think maybe your answer isn’t what they want. Demand effects can also [mean] that, as respondents, we want to be pleasing. We want to give the answer that we think people want to hear, just to help out the person asking. There’s a lot of reasons why we may, as individuals, not give a truthful answer.

TANYA OTT: She says another problem could be the data gatherers don’t ask the right questions or they make inferences from the data that aren’t correct.

SUSAN HOGAN: We use the example of someone who buys a magazine on hang gliding or has a subscription to a magazine on hang gliding. An insurer may think this person engages in risky behavior, but really it may be the case that they are a photographer and they just like the beautiful scenery. The same with a cigar collector. Some people who collect cigars may smoke only occasionally. That (collecting cigars) could send a signal that may or may not be true about behavior.

One thing we saw was that [assumptions made regarding] political party [affiliation] were often wrong. People may look to certain demographics about a person—their geography, their occupation—and make inferences that way, but that may be incorrect as well.

TANYA OTT: I think your comment about assuming someone’s political affiliation is really interesting. I live in a world with a lot of journalists, and on social media they’re going to follow or “like” everybody across the spectrum, even some really fringe sorts of organizations, and when that data is collected, it could look like they’re particularly interested in one kind of political realm versus another.

And a lot of journalists are often misidentified or mistargeted with ads and news stories and other things like that on social media because of that. It’s a very interesting world.

SUSAN HOGAN: This gets into some of the behavioral considerations which my group looks into. The boomerang effect—this is the idea of targeting or messaging or trying to persuade, and having the exact opposite effect of what you thought you were going to have. So if I don’t know anything about Tanya and I start out the relationship, I’m going to keep it very light and get to know [her] in a reciprocal way.

We’re going to exchange information like you would on a first date, and it’s going to grow over time and I’m going to learn more. If I think I came into the situation understanding your political affiliation, I may start with a comment, and if I’ve gotten it wrong, that’s the end of this relationship. I may have set you off, and it will inhibit our ability to move forward—so you can sometimes do more damage than good by thinking you know something, rather than letting it follow that natural progression.

I think with big data what happens is sometimes people think, “Oh, now that I know all this information, I can start as though I’m on the third date.” And you always want to start with the first date. You can use that information to inform or give you an idea about this person, but you shouldn’t jump so far ahead to the intimate relationship until you’ve had time to develop that on your own.

TANYA OTT: You guys have done a lot of looking at case studies on this, and I’m wondering, with the boomerang effect, have you seen an example or two that you think is really illustrative of how a company [engaged in] targeted messaging or advertising or marketing at a consumer or a consumer group had that backfire because they just didn’t have the data right?

SUSAN HOGAN: We have some humorous examples. We have a 20-something targeted by AARP, which you know just makes people laugh. And then we have some examples [where] they’re using the wrong information. Another behavioral consideration that comes in often with the boomerang effect is this idea of psychological reactance, or [threats to] behavioral freedom: even if you got your information on me right, it’s just too creepy, it’s just too personal, and it’s just too invasive, and that may just make me want to pull away.

It’s nice if I take time to look at your LinkedIn profile and find out we have commonalities [in] where we went to school, some common friends, or some common places [where] we [have] lived. But when it gets so invasive or so personal, I think, “Gosh, that stuff I only tell my inner circle.” Even if it is right, it has the effect of making me want to put distance between us.

TANYA OTT: Obviously this has got to be a big issue for businesses because they’re investing in this data. They hope to use the data in a strategic way. And yet, the data may not be any good. So what strategies can companies employ to help ensure their data is accurate and to safeguard themselves against the problems that you outlined?

TREVOR BISCHOFF: Know where the data is coming from. What are the sources? Are these government public records or is this third-party data collected on the Internet through browsers?

JOHN LUCKER: I also think that customers of data brokers should insist on some type of warranty on the data that they’re buying. I’m not sure if that warranty would be offered or honored, but it seems that if in fact there is an expectation that this data be timely and accurate—and I assume the brokers are representing this data [to] be timely and accurate as well—[then] that type of warranty should be an important component of the purchase.

I also think that businesses should be very involved and proactive in how they use this data. They should take time to explore the data and do some independent tests themselves to validate its accuracy. They should also build into their strategies the expectation that the data isn’t going to be perfect; they shouldn’t plan on it being entirely correct. They should plan their business actions and strategies around the reality that blindly acting on data, with the assumption that it is accurate, is going to create problems for them.
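As a rough illustration of the kind of independent spot check Lucker describes, the sketch below compares a sample of purchased broker records against internally verified records for the same customers and reports a match rate per field. Everything in it (the field names, the records, and the 10 percent tolerance) is a hypothetical example, not data from the study.

```python
# Minimal sketch of an independent accuracy spot check on purchased data.
# All field names, records, and the 10% tolerance are hypothetical examples.

broker_records = {
    "cust_001": {"annual_income": 85000, "home_value": 310000, "bedrooms": 4},
    "cust_002": {"annual_income": 60000, "home_value": 220000, "bedrooms": 3},
}

# Internally verified values for the same customers (e.g., from application data).
verified_records = {
    "cust_001": {"annual_income": 62000, "home_value": 305000, "bedrooms": 4},
    "cust_002": {"annual_income": 61000, "home_value": 450000, "bedrooms": 3},
}

def within_tolerance(broker_value, true_value, tolerance=0.10):
    """Treat a numeric field as 'correct' if it is within 10% of the verified value."""
    if true_value == 0:
        return broker_value == 0
    return abs(broker_value - true_value) / abs(true_value) <= tolerance

def field_match_rates(broker, verified):
    """Return the share of customers whose broker value matches the verified value, per field."""
    rates = {}
    fields = next(iter(verified.values())).keys()
    for field in fields:
        matches = sum(
            within_tolerance(broker[cust_id][field], verified[cust_id][field])
            for cust_id in verified
        )
        rates[field] = matches / len(verified)
    return rates

print(field_match_rates(broker_records, verified_records))
# e.g. {'annual_income': 0.5, 'home_value': 0.5, 'bedrooms': 1.0}
```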

TANYA OTT: This is only going to get more complicated as computing power multiplies and [the] ability to capture and store data from more sources increases. John Lucker, Susan Hogan, and Trevor Bischoff share additional thoughts on how your business can protect its data insights in their article “Predictably inaccurate: Big data inaccuracy prevalence, pitfalls, and prescriptions.” That’s in the newest issue of Deloitte Review, publishing July 24.

I’m Tanya Ott for the Press Room. You can tweet us at @du_press—we’d love to hear from you!

This podcast is provided by Deloitte and is intended to provide general information only. This podcast is not intended to constitute advice or services of any kind. For additional information about Deloitte, go to Deloitte.com/about.