Chester Report: Is Big Data Watching You?

According to a new report, Big Data is Watching: Growing Digital Data Surveillance of Consumers by ISPs and Other Leading Video Providers, by Jeff Chester of the Center for Digital Democracy, the Internet is out to get you. The report claims that ISPs, major Internet content providers, television networks, and Google are all amassing gigantic troves of information about everything you watch on TV or over Internet streaming services and everything you read on web sites. Chester says this is a threat to consumers because, well, it’s kinda like surveillance and maybe you thought nobody knew what kind of web sites and TV shows you like. More specifically:

In addition to threats to privacy, there are practices that use data that can discriminate or harm vulnerable consumers, which should also be addressed by the FCC—such as the targeting of low-income households for loans through the use of video, the role of ethnic/racial data used in a digital profile, and how data about or involving children and adolescents are used for digital marketing purposes.

The Internet Changes TV Viewing

What this comes down to in a technical sense is the fact that video distribution over the Internet is fundamentally different from video over the air or over the cable network. TV broadcasters don’t know who is watching the shows they send over the air, but OTA TV doesn’t make financial sense without advertising. So broadcasters sell ads for OTA shows that are targeted at the audiences they believe their shows attract. This knowledge is incomplete, but traditional TV advertising was a lucrative business despite (or even because of) incomplete knowledge about who’s watching. But TV advertising is annoying to consumers, hence the appeal of devices like TiVo that permit us to skip entire blocks of ads with the push of a button.

In a more perfect world, advertisers would only show us ads that are relevant to our interests. While “interest-based advertising” isn’t possible on OTA TV, radio, or print, it’s eminently possible and even practical when the Internet becomes the vehicle for video presentation. This is because the Internet is fundamentally an “end-to-end” system in which each viewer sees a unique copy of the program. While OTA TV has to show the same ads to each viewer, Internet TV makes it possible for each viewer to see a personalized ad that may not be seen by anyone else.
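To make the contrast concrete, here is a minimal sketch of the two delivery models. Everything in it is hypothetical and invented for illustration: the `Viewer` class, the tiny `ADS` inventory, and the interest labels are assumptions, not a description of how any real ad network works.

```python
from dataclasses import dataclass

# Hypothetical ad inventory keyed by interest (illustrative only).
ADS = {"boating": "Boat shoes ad", "cars": "Crossover car ad"}
BLANKET_AD = "Muffler repair shop ad"


@dataclass
class Viewer:
    name: str
    interests: frozenset = frozenset()  # assumed to be known to the distributor


def broadcast_ad(viewers):
    """OTA model: one stream, so every viewer gets the same ad."""
    return {v.name: BLANKET_AD for v in viewers}


def personalized_ad(viewers):
    """Internet model: each viewer gets a separate stream, so the ad can differ."""
    result = {}
    for v in viewers:
        # Pick the first matching interest alphabetically; fall back to the blanket ad.
        matches = sorted(i for i in v.interests if i in ADS)
        result[v.name] = ADS[matches[0]] if matches else BLANKET_AD
    return result


viewers = [
    Viewer("alice", frozenset({"boating"})),
    Viewer("bob", frozenset({"cars"})),
    Viewer("carol"),  # nothing known about this viewer
]
print(broadcast_ad(viewers))
print(personalized_ad(viewers))
```

The point of the sketch is only that the personalized path needs a per-viewer input (the `interests` field), while the broadcast path needs none.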

By itself, this should not be a matter of concern. As long as it’s the case that ads are the revenue source that funds free TV, we’re going to see ads. If we all start using TiVos and pressing the green “skip ads” button all the time, ads will become something like product placements that are built into the shows we watch or around them. The only alternative is some sort of subscription fee that covers programming and distribution costs. It’s just dollars and cents.

Personalized ads are more valuable to advertisers than blanket ads that everyone sees. Personalized ads are more relevant to the elusive match between what we’re looking to buy and what the advertisers are looking to sell. Because they’re more relevant and more valuable, there’s a good chance that the day will come when we tolerate them better than we do today’s blanket ads for the muffler repair shop, the sporty car, or the boring beer.

How Google Sells Ads

But personalizing ads means the video distributor and the ad network that sells ads to advertisers have to know who’s watching and what kinds of things are interesting to each viewer. This isn’t always easy to determine. It’s easy for Google and other Internet search providers to know that people who enter specific search requests are looking to make purchases. If you type something like “2017 connected crossover car reviews,” it’s likely that you’re looking to buy a car that gets along with your phone. Similarly, if you search for “Sperry Gold Cup ASV 2-eye boat shoe” you probably want some boating shoes.

Google uses the search terms you enter to sell ads on the results page. When I do the car search above, I get ads from Mazda and Toyota, ten organic (actual) search results, and then three more ads; the shoe search produces similar results, with a couple of image results interspersed. There’s nothing controversial about this, at least so far. It starts to get spooky when I go about my business – maybe making a purchase and maybe not – and then start to see ads for shoes and crossover cars as I read web pages. Scooting over to the New York Times to read about Apple TV, I see an ad for the Nissan Pathfinder, an SUV that’s close to the crossover category.

So yes, Google remembers and profits from the memory. Having read the article about Apple TV, I see that the Times has recommended some articles for me based on their knowledge of the articles I’ve read and possibly of the comments I’ve left on them. Again, there’s nothing controversial about the Times recommending stories to me; I’m glad they do and only wish their recommendations were better.

I can’t keep my searches private from Google and I can’t keep my article history secret from the New York Times, and I don’t want to. So while they’re both encroaching on my privacy in some sense, I’m fine with it because I see some value in the transactions that are taking place: Google is providing me with search results and the Times is providing me with news. I’m also OK with both of them selling information about me to other advertisers because it means I see more relevant ads, and hopefully fewer ads than I would otherwise see. This is only a problem for me when I see ads for things I’ve already bought, which happens a lot more than it should.

Back to the Chester Paper

The Chester report is hard to read because it has more text in the footnotes than in the body, there is very low information density in both, and the report is presented backwards. It starts with a very brief set of cautionary notices about the scary things that are happening on the Internet that violate our privacy rights and ends with a series of case studies of various firms in the advertising space. In order, these firms are:

  1. AT&T
  2. Cablevision
  3. Charter
  4. Comcast (including NBC Universal)
  5. Cox Communications
  6. Dish
  7. Time Warner Cable
  8. Verizon
  9. Disney/ABC
  10. News Corp (FOX)
  11. Turner Broadcasting
  12. Viacom/CBS
  13. Google

So it’s ISPs, content creators, and Google. No Facebook, no Amazon, no Twitter, no Netflix. These exclusions are odd, obviously, since the excluded firms have lots of information about our likes and dislikes, our purchases, and our TV viewing habits. It appears that the author may just have an agenda.

ISP Phobia

Each of the case studies highlights advertising claims and employment ads for the services the non-Google companies are running or creating; naturally, they’re all a bit exaggerated. Chester seems to be most concerned about the targeting services with connections to ISPs because he believes, incorrectly, that ISPs have more information about us than they probably do:

Phone and cable ISPs are an especially significant and growing threat to our privacy because—as the key providers of our Internet and device connections—they have in-depth access to information about what we do online (page 3).

ISPs don’t know what we search for, so their knowledge of our immediate desires to purchase products and services is limited. And as we’ve explained, a great deal of the information that passes between us and our ISP is encrypted, so they don’t even know what we’re reading and watching in many cases. The non-Google firms are trying to exploit Big Data – which is really a lot of very small, context-free pieces of information – with a lot of analysis to determine the information that traditional advertisers care about, such as demographics, income, and buying habits.

The concern about ISPs is buttressed by business connections ISPs are making with advertising networks, known as Data Management Platforms, to target users for personalized ads:

“Using a predictive modeling algorithm developed by AT&T Labs,” along with other information, AT&T promises to deliver an “advertiser’s target audience when and where they are most likely watching content.” It also incorporates a consumer’s mobile device data, including “what wireless device they are using, what operating system they are using for their device, how large a data plan they have, and when their contract expires.” TV Blueprint “gives advertisers working with AT&T the ability to reach people based on factors like device, operating system, whether or not they’re heavy data users or the status of their carrier contract,” using “sophisticated second-by-second set top box data” and other information. AT&T pulls data “from millions of set-top boxes” and analyzes what a consumer views (such as on unaffiliated pay-cable networks), and uses these data to target consumers based on their viewing profile.

But this is more about the delivery of ads than about the collection of personal information. Consumers can watch TV shows on many different devices, but information about the devices simply helps the ad network deliver the ad; it doesn’t help target the ad. A similar pattern occurs when Chester discusses the other ISPs, and it’s not so much about the ISPs’ role as providers of Internet services as about their role as transmitters of video streams. While the cable company or DVR manufacturer may know something about the TV shows we watch, their information is no greater than that of TiVo, which functions not only as a set top box replacement but also as a gateway to Netflix, Hulu, Amazon, and the cable company’s on-demand services.

The discussion of the TV producers Disney, News Corp, Turner, and Viacom is no better. In the near future, we may stream their programming direct from their web sites, but for the time being the cable companies are a firewall between the consumer and the program for linear TV if not for on-demand.

It’s no exaggeration to say that the data collection, analysis, and delivery networks being built by the ISPs and content creators are attempts to close the gap between the broad and thin information they have about our viewing and shopping habits and the narrow and deep information that Google, Facebook, Netflix, and Amazon already have about us.

Much Ado About Very Damn Little

Despite Chester’s scary language about surveillance, privacy, discrimination, and influence, there’s very little, if anything, going on inside the ISPs and the content creators that isn’t already being done by Google and the other firms providing services at the edge of the Internet. Google and friends are encrypting communication with end users specifically to prevent other firms from gaining access to searches, purchases, shows watched, and social interactions because these things are vital to their business interests.

ISPs and content creators are both looking for ways to increase revenues, and so are the middlemen with well-established advertising businesses. Whether this is scary, creepy, acceptable, or simply the way the Internet works is a question of policy and politics. But at least you know what Jeff Chester is up to.