CDT’s Diagram Muddies the Waters

The week before last, I wrote about the factual errors in a letter to the FCC signed by a collection of advocacy groups with an interest in consumer privacy. Briefly, the letter grossly overstated the amount of personal information available to ISPs from Internet communications and understated the ease with which consumers can shield communications from the (prying?) eyes of ISPs. The reality is that many Internet “edge services” such as Amazon and Google can and do encrypt communications with their users and customers, and consumers are free to use Virtual Private Networks to cloak communications that would otherwise be unencrypted.

One of the signatories is the Center for Democracy and Technology (CDT), a group that generally has a better grasp of Internet technology than the other signatories, Free Press and Public Knowledge in particular. CDT employs actual technologists, which can’t be said for the others. But CDT is mainly a group of lawyers, so we have to read their technical analysis with that context in mind.

Two of the CDT lawyers, Alex Bradshaw and Stan Adams, have written a blog post on the issues facing the FCC as it seeks to apply its new Section 222 authority to the Internet. Most of the post reads like this:

Sections 222(c) and (d) control how, and under what conditions, carriers may use, share, or disclose “Customer Proprietary Network Information” (CPNI), the definition of which includes information related to the “quantity, technical configuration, type, destination, location, and amount of use of a telecommunications system…and information contained in bills.”

This probably means that ISPs must obtain customer consent before sharing information about Web surfing habits with ad networks, or it would mean that if the ISPs were on a level playing field with Amazon and Google. The lawyers will sort that out.

The blog post is accompanied by a memo that attempts to decompose Internet services into component parts. The memo adds nothing to the analysis in the CDT blog, and actually detracts from it by containing the same kinds of nagging errors Peter Swire found in the FCC letter. The memo argues that application-level encryption leaves protocol headers unencrypted, and seems to argue that unencrypted protocol headers (for TCP and IP) provide ISPs with important information about what the customer is doing. As the memo puts it:

Long-term monitoring of packet headers traveling to and from IP and MAC addresses can reveal patterns and associations that paint a picture of what kinds of information customers are sending and receiving over the Internet, and when, where, and how they do so. For instance, by looking at packet size, packet streams, and IP addresses, a network operator could infer that you are streaming a movie from a particular content provider. A network operator could begin to develop a comprehensive profile of your broadband usage patterns, or even your personal habits, like when you sleep, work, watch movies and send email.

While this is at least partially true, it has very little significance. The encrypted packets that pass between Google search and a Google user simply expose the fact that an interaction takes place, not the identity of the user or the terms the user is looking for. It could be anyone in the house, even a visitor, and the search can be about anything. Only Google knows for sure who submitted the search request or what it was about.

Streaming a movie from Netflix provides no more information than a Google search, since the stream is encrypted with TLS 1.2. The ISP can find out that someone in the house initiated a Netflix streaming session, but it does not know which person or which movie. Again, the specific identity of the user and the content is known only to Netflix.

I verified this by examining a Netflix session with Wireshark, a network analyzer. This is what the capture looks like. The lines labeled “TLSv1.2” mean the connection is encrypted, which is confirmed by the fact that the TCP payloads contain gibberish.

Netflix capture
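To make the point concrete, here is a rough Python sketch of what an analyzer can read from an encrypted TLS record without any keys. The five-byte record header layout (content type, protocol version, length) comes from the TLS specification; the sample record and its “ciphertext” bytes are made up for illustration and are not taken from my capture.

```python
import struct

# Values defined by the TLS record layer.
CONTENT_TYPES = {20: "change_cipher_spec", 21: "alert",
                 22: "handshake", 23: "application_data"}
VERSIONS = {(3, 1): "TLS 1.0", (3, 2): "TLS 1.1", (3, 3): "TLS 1.2"}

def parse_tls_record_header(data: bytes) -> dict:
    """Read the 5-byte cleartext header that precedes every TLS record."""
    if len(data) < 5:
        raise ValueError("a TLS record header is 5 bytes")
    content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
    return {
        "content_type": CONTENT_TYPES.get(content_type, "unknown"),
        "version": VERSIONS.get((major, minor), "unknown"),
        "length": length,
    }

# Hypothetical record: invented bytes stand in for the encrypted payload.
ciphertext = bytes.fromhex("9f3c1a77e2b40d58")
record = struct.pack("!BBBH", 23, 3, 3, len(ciphertext)) + ciphertext

info = parse_tls_record_header(record)
print(info)           # header fields an observer can read
print(record[5:])     # the payload itself is opaque without the session keys
```

The header tells an observer that application data is flowing and how much of it there is. It says nothing about which movie is inside.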

So the memo and its accompanying diagram are a lot of smoke and mirrors. ISPs have very limited insight into what individual subscribers are doing across the Internet. Packets are encrypted, and protocol headers provide very little information because everyone in a household shares a common IP address (the one that belongs to the router) and because individual users do not share MAC addresses with the ISP. Hence, CDT’s initial claim that ISPs can monitor IP and MAC addresses (MAC addresses being the addresses of Ethernet and Wi-Fi interfaces) is false.

We access the Internet from our home routers, which contain a function known as a Network Address Translator that replaces the private IP addresses of the devices within the home with a common, global IP address. Because the router is a separate network hop, the frames it sends upstream carry the MAC address of the router port that connects to the DSL or cable modem, not the MAC addresses of the devices behind it. To get information about who is doing what, you need to be on the other end of the Internet, where Google and Netflix sit.
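The address sharing described above can be sketched in a few lines of Python. This is a simplified model of port-based address translation, not the code of any real router; the `Packet` and `Nat` names, the starting port number, and all of the addresses (drawn from private and documentation ranges) are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

@dataclass
class Nat:
    """Toy translator: many private sources share one public address."""
    public_ip: str
    next_port: int = 40000                      # arbitrary starting port
    table: dict = field(default_factory=dict)   # (private ip, port) -> public port

    def outbound(self, pkt: Packet) -> Packet:
        key = (pkt.src_ip, pkt.src_port)
        if key not in self.table:
            # First packet of a flow: allocate a public port for it.
            self.table[key] = self.next_port
            self.next_port += 1
        # Rewrite the source so the private address never leaves the home.
        return Packet(self.public_ip, self.table[key], pkt.dst_ip, pkt.dst_port)

nat = Nat(public_ip="203.0.113.7")
laptop = Packet("192.168.1.20", 51000, "198.51.100.10", 443)
phone = Packet("192.168.1.31", 51000, "198.51.100.10", 443)

seen_by_isp = [nat.outbound(laptop), nat.outbound(phone)]
for pkt in seen_by_isp:
    print(pkt.src_ip, pkt.src_port, "->", pkt.dst_ip, pkt.dst_port)
```

Both flows leave the house with the same source address and differ only in a port number the router picked, which is why the headers alone do not say which person or device started the session.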

CDT’s description of packets, protocols, layers, and encryption is nice, but they should look at some actual data flows before jumping to conclusions. The post does raise an interesting question, however: Assuming that ISPs and edge services should be on a level playing field with respect to privacy, if it’s troubling that ISPs might be able to decode such patterns as “your broadband usage patterns, or even your personal habits, like when you sleep, work, watch movies and send email,” is it equally troubling that Google, Amazon, and Netflix might be able to profile user information that happens to be at least that personal?

Is it troubling that Amazon knows I use Tom’s toothpaste, Google knows I’m taking an overseas trip this week, and Netflix knows I like to watch cowboy cop shows like “Longmire”? It doesn’t trouble me a great deal that anyone knows these things because I’ve just disclosed them, maybe even truthfully.

Finally, CDT’s decomposition of Internet data formats is cloudy because it omits the most important element of Internet use, the stream. Packets, protocols, layers, and headers are much less important than the thing they enable, which is streams of data between users and other users or between users and services. Information streams are the most important element of Internet interaction, and any analysis of the Internet that fails to mention them isn’t very useful. Streams will be the subject of a post to come shortly.