FCC Confused About Privacy

As we’ve explained in previous posts, the FCC is often very confused about the way the Internet works. The agency has claimed that the Domain Name System (DNS) routes information across the Internet, for example, when it really does little more than translate domain names into IP addresses. The routing of packets is handled by the Internet Protocol (IP) itself, with help from the Border Gateway Protocol (BGP), which allows IP routers to build maps of the Internet.
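To make the distinction concrete, here is a minimal Python sketch of what a DNS lookup amounts to from an application's point of view: a name goes in, a list of IP addresses comes out, and nothing in the exchange decides how packets will travel. The helper name `resolve` is made up for this illustration; `socket.getaddrinfo` is the standard library call that asks the system resolver.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IP addresses the resolver reports.

    `resolve` is a hypothetical helper for this post. It only translates a
    name into addresses; it does not route or carry any traffic.
    """
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry ends with a socket address tuple whose first item is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "localhost" keeps the demo independent of any external network.
    print(resolve("localhost"))
```

Once the application has an address, the lookup's job is done. Getting packets to that address is left to IP routers and the maps they build with BGP.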

FCC Chairman Wheeler’s ISP privacy proposal makes similar errors, and adds a new logical error that wasn’t quite so evident in his “open Internet” rulemaking.

The factual error is the claim that consumers are powerless to hide their Internet activities from the presumptively untrustworthy ISPs. The proposal falsely claims that consumers necessarily share web surfing details with ISPs:

“Even when data is encrypted, broadband providers can still see the websites that a customer visits, how often they visit them, and the amount of time they spend on each website.”

This claim is partially true for consumers who simply visit sites like Google.com that are encrypted by default with TLS, but it’s not at all true for consumers who use VPNs. It’s only partially true because the information that TLS exposes to ISPs is limited to IP addresses and flows to and from those addresses. This information is a lot less useful than Wheeler imagines because web pages are composed of page elements that have their own IP addresses and data flows. This is easy to confirm by looking at the ads that accompany typical web pages.

Number of TCP Connections per Web Page. Source: HTTP Archive.

These ads are served up by ad servers associated with ad networks, and there can be cases in which more actual packets come to the user from ads than from the substance of the web pages he or she visits. This is especially true for the annoying video ads that accompany so many web pages these days, especially pages devoted to sports and news. The typical web page contains 54 image requests today, which account for 1.5 MB of data.

The HTML of the web page itself averages a mere 67 KB by comparison. All told, the typical web page entails 40 different IP addresses today, and all the ISP can do with all that information is guess which parts are important and which are merely ads. It would require an enormous amount of computation to figure out what the user is doing at these IP addresses and which, if any, are actually interesting to the user.
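A rough Python sketch shows what this looks like from the ISP's side. The flow records below are invented for illustration (the addresses come from ranges reserved for documentation, and only the 67 KB HTML figure is taken from the numbers above); the point is that once traffic is totaled per destination address, nothing in the result says which address carried the article and which carried the ads.

```python
from collections import Counter

# Hypothetical flow records for one page load: (destination IP, bytes received).
# Only the 67 KB HTML size comes from the post; everything else is made up.
FLOWS = [
    ("192.0.2.10", 67_000),     # happens to be the page's HTML
    ("198.51.100.7", 400_000),  # happens to be a video ad
    ("198.51.100.7", 350_000),  # same ad server, second connection
    ("203.0.113.25", 600_000),  # happens to be images
    ("203.0.113.99", 150_000),  # happens to be an ad-network script
]


def bytes_per_address(flows):
    """Total the bytes seen per destination IP, as an on-path observer could."""
    totals = Counter()
    for ip, nbytes in flows:
        totals[ip] += nbytes
    return totals


def share_of_total(totals, ip):
    """Fraction of all observed bytes that went to one address."""
    return totals[ip] / sum(totals.values())


if __name__ == "__main__":
    totals = bytes_per_address(FLOWS)
    for ip, nbytes in totals.most_common():
        print(f"{ip:15} {nbytes:>9,} bytes  {share_of_total(totals, ip):.1%}")
```

In this made-up load the address that served the page itself accounts for under 5 percent of the bytes, and the comments labeling each flow are exactly the information the observer does not have.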

As a practical matter, converting the raw information that ISPs can harvest from web requests made by users who aren’t using VPNs into anything meaningful about those users is a very difficult task. The information is both highly random and very fragmented. So Wheeler’s proposal assumes a status quo that doesn’t really exist.

This is par for the course in the public commentary on the privacy issue from outside the FCC, too. Professor Nick Feamster has written letters to the FCC, as well as blog posts, claiming that ISPs are in a position to see “much more user traffic from many more devices than other parties in the Internet ecosystem…”

This is factually incorrect in two respects and misleading in a third. The fact that ISPs may be “in a position” to see a lot of traffic doesn’t mean they actually do anything with the traffic they can see, or even that seeing the traffic is the same as understanding it. In fact, it’s a lot harder to extract any user preference information from a raw data stream, even when it’s unencrypted, than it is to extract useful data on the other side of the web.

Internet Vantage Points

Let’s look at a little diagram of the Internet’s vantage points where web surfing is concerned. Users enter URLs in their web browsers, which pass web page requests to the TCP element in their operating system (Windows or macOS). The operating system passes them on to the home router, which sends them up to the headend of the cable modem network (or its equivalent in non-cable networks). The ISP network connects to the web server, either directly or through a transit network. If the web server is commercial in any way, it relies on a series of advertising networks to record the user’s visit and offer up suggestions on the ads the user is going to see.

The ownership of many of these vantage points varies. Consumers use multiple web browsers, operating systems, and home routers. Web servers use a variety of transit networks and ad servers. The fact that home routers can come from either ISPs or the consumer’s retail choice is the largest omission in the Feamster analysis. He assumes the home router is the primary vantage point for the prying ISP, which completely falls apart when users buy their own routers and simply use the ISP equipment to carry packets without awareness of which user in the house is looking at the given web page.
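One way to see why the ISP loses track of individual users is to sketch what a consumer-owned router typically does to outbound traffic. The Python below assumes the router performs ordinary network address translation (NAT), which the analysis above does not spell out; the class name `HomeRouterNat` and all addresses are hypothetical.

```python
import itertools


class HomeRouterNat:
    """Toy model of a home router's NAT table (an assumption for illustration).

    Every device behind the router is rewritten to the same public address,
    so an observer upstream sees one household, not individual users.
    """

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self._table = {}  # (private_ip, private_port) -> public port

    def translate(self, private_ip, private_port):
        """Return the (public_ip, public_port) pair the ISP would observe."""
        key = (private_ip, private_port)
        if key not in self._table:
            self._table[key] = next(self._ports)
        return (self.public_ip, self._table[key])


if __name__ == "__main__":
    nat = HomeRouterNat("203.0.113.5")
    print(nat.translate("192.168.1.10", 51000))  # one person's laptop
    print(nat.translate("192.168.1.11", 51000))  # another person's tablet
```

Both devices leave the house with the same source address and differ only in a port number the router picked. The table that ties a port back to a device lives in the router, which in this scenario belongs to the consumer.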

Both Feamster and the FCC make the gigantic leap from the rather limited fact that ISPs know the IP addresses of all the non-VPN packets they carry to the conclusion that they can actually do something with this information. There is a strangely inconsistent treatment of what ISPs and consumers are able to do versus what they actually do. When discussing consumers, the tendency of privacy advocates and the FCC is to disregard the protections of personal privacy afforded to those who exercise opt-out rights and use VPNs because most consumers don’t care very much about privacy. If the tools are available to consumers but consumers don’t use them, it doesn’t matter what they tell pollsters because their behavior says they don’t care about Internet privacy.

But when discussing ISPs, the balance of “can” and “do” swings in the opposite direction. Even though ISPs are capable of adding surveillance code to home routers, there’s no evidence that they actually do. When I made home routers for ISPs I didn’t get a single request to track web site visits. If I had, it would have been an enormous project that would have doubled or tripled the price my company charged our ISP customers. And we had control of the code in the home router, not the ISPs.

So what difference does it make that the ISPs could track user Internet behavior if they don’t?

As to the claim that ISPs are in a position to see much more user traffic than other parties in the Internet ecosystem, this certainly isn’t true in relation to advertising networks. For encrypted web sites, only the web servers and ad networks know which pages the user is visiting. This information is much more valuable than simply having a collection of dozens of IP addresses for each web page the user visits and no way to make any sense of them.
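The split can be sketched in a few lines of Python. The function below is a simplified model of the distinction drawn here, not a full account of everything on the wire: for an HTTPS URL, the network in the middle has to know where to deliver the connection, while the path and query string, which identify the actual page, travel inside the encrypted session and are read only at the endpoints. The function name `split_views` and the example URL are made up for this illustration.

```python
from urllib.parse import urlsplit


def split_views(url: str) -> dict:
    """Split a URL into what delivery needs and what only the endpoints read.

    A simplified, hypothetical model: the hostname stands in for the IP
    address it resolves to, and the port is the scheme's default if the
    URL does not give one.
    """
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return {
        "needed_for_delivery": {"destination": parts.hostname, "port": port},
        "read_by_endpoints": {"path": parts.path, "query": parts.query},
    }


if __name__ == "__main__":
    views = split_views("https://www.example.com/sports/article?id=42")
    for label, fields in views.items():
        print(label, fields)
```

The `read_by_endpoints` half is what tells a web server or an embedded ad network which page was requested, and it is the half that never appears in a list of destination addresses.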

So the Wheeler and Feamster analyses seem to express a concern that ISPs may someday develop a capability to parse user activities that rivals, and possibly even surpasses, the capabilities that ad networks already have today. In effect, the FCC’s privacy inquiry compares the possibility of the ISPs becoming serious rivals to the ad networks with the reality that ad networks have more information than any other party in the Internet space regarding user web activity.

I’m not sure that would be such a bad thing. But even if it is, shouldn’t we be talking about ISP data collection practices versus those of ad networks if we want to have a coherent policy dialog? The false claims, misdirection, and cherry-picking about who knows what about whom are preventing this discussion from taking place, and that’s unfortunate.