When Digital Dust Is Gathered, Constellation May Be Muddled

That constellation of information known as Big Data can be a sight to behold.

Adam Frank of NPR’s 13.7 blog explains Big Data as “the ability to understand (and control) a seemingly chaotic world on levels never before imagined.”

Big Data is like gathering digital dust, says New Yorker tech blogger Gary Marcus. “It’s a very valuable tool,” he says, “but it’s rarely the whole solution by itself.”

This new information era is brought to you by a few different factors, says Chris Barnatt, who teaches computing at Nottingham University in England and runs ExplainingComputers.com.

“We’ve got more powerful processors. The cost of memory gets less. The cost of storage gets less. And also, a lot more data is being created,” he tells NPR’s Jacki Lyden on weekends on All Things Considered. “Everything from social media sites, to buying things online with e-commerce, to simulations going on in companies’ medical facilities. Everyone is creating more and more information.”

Marcus says “Big Data” versus just “data” is really a matter of magnitude. But he says that’s important because quantity can actually make a difference in the story the data tells.

Here’s the catch: While Big Data can uncover correlations between data points, it doesn’t reveal causation. Sometimes, that doesn’t really matter, but other times, it might — in ways we’re not always aware of.

For instance, the city of Boston developed a smartphone app called Street Bump to track potholes. “It passively collects GPS data and accelerometer data so it can report when you drive over potholes,” Kate Crawford, a researcher for Microsoft Research and visiting professor at MIT, tells Lyden.

The thing is, not everyone bouncing around on bad roads has a smartphone — or the app, for that matter — to help gather the data.

“And indeed, this maps very closely around how rich a particular neighborhood is and what the ages of the people who live in that neighborhood [are],” Crawford says.

She says the ethics get tricky with that sort of skewed data. The city is trying to address the discrepancy by working with academics. But Boston isn’t the only city taking on such projects. The Wall Street Journal reports on an initiative in New Jersey to manage traffic.

To summarize the tension around Big Data, New York Times reporter Steve Lohr quotes Albert Einstein: “Not everything that counts can be counted, and not everything that can be counted counts.”

Lohr, who writes about Big Data and privacy, tells Lyden he’s more concerned about what the data gets wrong than how much is revealed about us.

“The real danger for most of us is discrimination by statistical inference,” he says.

Remember when you searched for “deep fryer” online for your cooking class? Well, that action could now be associated with unhealthy behavior, and the nuance gets lost.

It’s not just online tracking consumers should be wary of, Lohr says. Credit cards, for example, store a lot of information that you might not think of. His advice:

“Put the health club membership on the credit card, but your visits to the liquor store should be in cash because those things will follow you in ways you don’t know now.”

Copyright 2013 NPR. To see more, visit http://www.npr.org/.