Facebook Doesn’t Tell Users Everything It Really Knows About Them

by Julia Angwin, Terry Parris Jr. and Surya Mattu
ProPublica, Dec. 27, 2016

Facebook has long let users see all sorts of things the site knows about them, like whether they enjoy soccer, have recently moved, or like Melania Trump.

But the tech giant gives users little indication that it buys far more sensitive data about them, including their income, the types of restaurants they frequent and even how many credit cards are in their wallets.

Since September, ProPublica has been encouraging Facebook users to share the categories of interest that the site has assigned to them. Users showed us everything from “Pretending to Text in Awkward Situations” to “Breastfeeding in Public.” In total, we collected more than 52,000 unique attributes that Facebook has used to classify users.

Facebook’s page explaining “what influences the ads you see” says the company gets the information about its users “from a few different sources.”

What the page doesn’t say is that those sources include detailed dossiers obtained from commercial data brokers about users’ offline lives. Nor does Facebook show users any of the often remarkably detailed information it gets from those brokers.

“They are not being honest,” said Jeffrey Chester, executive director of the Center for Digital Democracy. “Facebook is bundling a dozen different data companies to target an individual customer, and an individual should have access to that bundle as well.”

When asked this week about the lack of disclosure, Facebook responded that users can discern the use of third-party data if they know where to look. Each time an ad using such data appears, Facebook says, users can click a button on the ad revealing that fact. Even so, users cannot see what specific information about their lives is being used.

The company said it does not disclose the use of third-party data on its general page about ad targeting because the data is widely available and was not collected by Facebook.

“Our approach to controls for third-party categories is somewhat different than our approach for Facebook-specific categories,” said Steve Satterfield, a Facebook manager of privacy and public policy. “This is because the data providers we work with generally make their categories available across many different ad platforms, not just on Facebook.”

Satterfield said users who don’t want that information to be available to Facebook should contact the data brokers directly. He said users can visit a page in Facebook’s help center, which provides links to the opt-outs for six data brokers that sell personal data to Facebook.

Limiting commercial data brokers’ distribution of your personal information is no simple matter. For instance, opting out of Oracle’s Datalogix, which provides about 350 types of data to Facebook according to our analysis, requires “sending a written request, along with a copy of government-issued identification” by postal mail to Oracle’s chief privacy officer.

Users can ask data brokers to show them the information stored about them. But that can also be complicated. One data broker used by Facebook, Acxiom, requires people to send the last four digits of their Social Security number to obtain their data. Facebook changes its providers from time to time, so members would have to regularly visit the help center page to protect their privacy.

One of us actually tried to do what Facebook suggests. While writing a book about privacy in 2013, reporter Julia Angwin tried to opt out from as many data brokers as she could. Of the 92 brokers she identified that accepted opt-outs, 65 of them required her to submit a form of identification such as a driver’s license. In the end, she could not remove her data from the majority of providers.

ProPublica’s experiment to gather Facebook’s ad categories from readers was part of our Black Box series, which explores the power of algorithms in our lives. Facebook uses algorithms not only to determine the news and advertisements that it displays to users, but also to categorize its users in tens of thousands of micro-targetable groups.

Our crowd-sourced data showed us that Facebook’s categories range from innocuous groupings of people who like Southern food to sensitive categories such as “Ethnic Affinity,” which classifies people based on their affinity for African-Americans, Hispanics and other ethnic groups. Advertisers can target ads toward a group, or exclude a particular group from seeing their ads.

Last month, after ProPublica bought a Facebook ad in its housing categories that excluded African-Americans, Hispanics and Asian-Americans, the company said it would build an automated system to help it spot ads that illegally discriminate.

Facebook has been working with data brokers since 2012, when it signed a deal with Datalogix. This prompted Chester, the privacy advocate at the Center for Digital Democracy, to file a complaint with the Federal Trade Commission alleging that Facebook had violated a consent decree with the agency on privacy issues. The FTC has never publicly responded to that complaint, and Facebook subsequently signed deals with five other data brokers.

To find out exactly what type of data Facebook buys from brokers, we downloaded a list of 29,000 categories that the site provides to ad buyers. Nearly 600 of the categories were described as being provided by third-party data brokers. (Most categories were described as being generated by users clicking on pages or ads on Facebook.)

The categories from commercial data brokers were largely financial, such as “total liquid investible assets $1-$24,999,” “People in households that have an estimated household income of between $100K and $125K,” or even “Individuals that are frequent transactor at lower cost department or dollar stores.”

We compared the data broker categories with the crowd-sourced list of what Facebook tells users about themselves. We found none of the data broker information among the tens of thousands of “interests” that Facebook showed users.
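The comparison itself amounts to simple set logic: one list of broker-supplied ad categories is checked against the crowd-sourced list of user-facing “interests.” The sketch below illustrates that step in rough terms; the file names and column labels are hypothetical stand-ins, not ProPublica’s actual data files.

    # Minimal sketch of the cross-referencing step described above.
    # File names and column labels are hypothetical stand-ins.
    import csv

    def load_column(path, column):
        """Read one column of a CSV file into a set of normalized strings."""
        with open(path, newline="", encoding="utf-8") as f:
            return {row[column].strip().lower() for row in csv.DictReader(f)}

    # Ad-buying categories attributed to third-party data brokers (~600 of 29,000).
    broker_categories = load_column("broker_categories.csv", "category")

    # "Interests" Facebook displayed to users, gathered via the crowd-sourcing tool.
    user_facing_interests = load_column("crowdsourced_interests.csv", "interest")

    # Any overlap would mean broker-sourced categories appear in what users see;
    # an empty intersection matches the finding described above.
    overlap = broker_categories & user_facing_interests
    print(f"{len(overlap)} of {len(broker_categories)} broker categories appear in user-facing interests")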

Our tool also allowed users to react to the categories they were placed in as being “wrong,” “creepy” or “spot on.” The category that received the most votes for “wrong” was “Farmville slots.” The category that got the most votes for “creepy” was “Away from family.” And the category that was rated most “spot on” was “NPR.”

Clarification, Jan. 4, 2017: We’ve added details about what Facebook tells users regarding third-party data. Specifically, each time an ad appears using such information, Facebook says, users can click a button on the ad revealing the use of third-party data.
