Thursday, January 26, 2017

Policy-based Privacy is Over


Yesterday, President Donald Trump issued an executive order to enhance "Public Safety in the Interior of the United States".

Of interest here is section 14:
Sec. 14.  Privacy Act.  Agencies shall, to the extent consistent with applicable law, ensure that their privacy policies exclude persons who are not United States citizens or lawful permanent residents from the protections of the Privacy Act regarding personally identifiable information.  
What this means is that the executive branch, including websites, libraries and information systems may not use privacy policies to protect users other than US citizens and green card holders. Since websites, libraries and information systems typically don't keep track of user citizen status, this makes it very difficult to have any privacy policy at all.

Note that this executive order does not apply to the Library of Congress, an organ of the legislative branch of the US government. Nevertheless, it demonstrates the vulnerability of policy-based privacy. Who's to say that Congress won't enact the same restrictions for the legislative branch? Who's to say that Congress won't enact the same restrictions on any website. library or information system that operates in multiple states?

Lawyering privacy won't work any more. Librarianing privacy won't work any more. We need to rely on engineers to build privacy into our websites, libraries and information systems. This is possible. Engineers have tools such as strong cryptography that allow privacy to be built into systems without compromising functionality. It's not that engineers are immune from privacy-breaking mandates, but it's orders of magnitude more difficult to outlaw privacy engineering than it is to invalidate privacy policies. A system that doesn't record what a user does can't produce user activity records. Some facts are not alternativable. Math trumps Trump.

Friday, January 13, 2017

Google's "Crypto-Cookies" are tracking Chrome users

Ordinary HTTP cookies are used in many ways to make the internet work. Cookies help websites remember their users. A common use of cookies is for authentication: when you log into a website, the reason you stay logged is because of a cookie that contains your authentication info. Every request you make to the website includes this cookie; the website then knows to grant you access.

But there's a problem: someone might steal your cookies and hijack your login. This is particularly easy for thieves if your communication with the website isn't encrypted with HTTPS. To address the risk of cookie theft, the security engineers of the internet have been working on ways to protect these cookies with strong encryption. In this article, I'll call these "crypto-cookies", a term not used by the folks developing them. The Chrome user interface calls them Channel IDs.


Development of secure "crypto-cookies" has not been a straight path. A first approach, called "Origin Bound Certificates" has been abandoned. A second approach "TLS Channel IDs" has been implemented, then superseded by a third approach, "TLS Token Binding" (nicknamed "TokBind"). If you use the Chrome web browser, your connections to Google web services take advantage of TokBind for most, if not all, Google services.

This is excellent for security, but might not be so good for privacy; 3rd party content is the culprit. It turns out that Google has not limited crypto-cookie deployment to services like GMail and Youtube that have log-ins. Google hosts many popular utilities that don't get tracked by conventional cookies. Font libraries such as Google Fonts, javascript libraries such as jQuery, and app frameworks such as Angular, are all hosted on Google servers. Many websites load these resources from Google for convenience and fast load times.  In addition, Google utility scripts such as Analytics and Tag Manager are delivered from separate domains so that users are only tracked across websites if so configured.  But with Google Chrome (and Microsoft's Edge Browser), every user that visits any website using Google Analytics, Google Tag Manager, Google Fonts, JQuery, Angular, etc. are subject to tracking across websites by Google. According to Princeton's OpenWMP project, more than half of all websites embed content hosted on Google servers.
Top 3rd-party content hosts. From Princeton's OpenWMP.
Note that most of the hosts labeled "Non-Tracking Content"
are at this time subject to "crypto-cookie" tracking.


While using 3rd party content hosted by Google was always problematic for privacy-sensitive sites, the impact on privacy was blunted by two factors – cacheing and statelessness. If a website loads fonts from fonts.gstatic.com, or style files from fonts.googleapis.com, the files are cached by the browser and only loaded once per day. Before the rollout of crypto-cookies, Google had no way to connect one request for a font file with the next – the request was stateless; the domains never set cookies. In fact, Google says:
Use of Google Fonts is unauthenticated. No cookies are sent by website visitors to the Google Fonts API. Requests to the Google Fonts API are made to resource-specific domains, such as fonts.googleapis.com or fonts.gstatic.com, so that your requests for fonts are separate from and do not contain any credentials you send to google.com while using other Google services that are authenticated, such as Gmail. 
But if you use Chrome, your requests for these font files are no longer stateless. Google can follow you from one website to the next, without using conventional tracking cookies.

There's worse. Crypto-cookies aren't yet recognized by privacy plugins like Privacy Badger, so you can be tracked even though you're trying not to be. The TokBind RFC also includes a feature called "Referred Token Binding" which is meant to allow federated authentication (so you can sign into one site and be recognized by another). In the hands of the advertising industry, this will get used for sharing of the crypto-cookie across domains.

To be fair, there's nothing in the crypto-cookie technology itself that makes the privacy situation any different from the status quo. But as the tracking mechanism moves into the web security layer, control of tracking is moved away from application layers. It's entirely possible that the parts of Google running services like gstatic.com and googleapis.com have not realized that their infrastructure has started tracking users. If so, we'll eventually see the tracking turned off.  It's also possible that this is all part of Google's evil master plan for better advertising, but I'm guessing it's just a deployment mistake.

So far, not many companies have deployed crypto-cookie technology on the server-side. In addition to Google and Microsoft, I find a few advertising companies that are using it.  Chrome and Edge are the only client side implementations I know of.

For now, web developers who are concerned about user privacy can no longer ignore the risks of embedding third party content. Web users concerned about being tracked might want to use Firefox for a while.

Notes:

  1. This blog is hosted on a Google service, so assume you're being watched. Hi Google!
  2. OS X Chrome saves the crypto-cookies in an SQLite file at "~/Library/Application Support/Google/Chrome/Default/Origin Bound Certs". 
  3. I've filed bug reports/issues for Google Fonts, Google Chrome, and Privacy Badger. 
  4. Dirk Balfanz, one of the engineers behind TokBind has a really good website that explains the ins and outs of what I call crypto-cookies.