reCAPTCHA: I’m Not a Robot But I’m Not Sure About You

Google’s reCAPTCHA v3 is a major advance in website security and user convenience. Unlike the previous versions of reCAPTCHA that forced human website readers to run through tests to prove we’re not bots, v3 uses a combination of data and machine learning to automate the decision.

While the automatic decision is going to be cheered by users (who really wants to click on parts of an image before leaving a blog comment?), the chattering class is asking questions about privacy.

It’s reasonable to ask, but there’s not much to see here that’s not already routine across the web. How long this lasts is anyone’s guess, but it won’t be forever.

The Robot Menace

First, let’s recall why there’s such a thing as reCAPTCHA to begin with. Google search rewards websites with lots of inbound links with high rankings. (Note: it’s more complicated than that, but this is the starting point.)

Scammers who are selling counterfeit goods on the Internet wants their sites to rank high because traffic equals money. The easiest way to create a lot of inbound links to a bogus site is to exploit website comments and user reviews with robotic comments with embedded links.

We’ve all seen such comments, and those of us who blog spend inordinate amounts of time getting rid of them. Bot comments are the Internet’s parasites, and we’ve had to develop a number of medicines to control them. ReCAPTCHA is one such medicine.

How ReCAPTCHA v3 Works

Google returns a “humanity score” from 0 to 1 based on the user’s interactions with the website. The company explains the reCAPTCHA v3 process on its developer page:

reCAPTCHA v3 returns a score for each request without user friction. The score is based on interactions with your site and enables you to take an appropriate action for your site…reCAPTCHA learns by seeing real traffic on your site. For this reason, scores in a staging environment or soon after implementing may differ from production. As reCAPTCHA v3 doesn’t ever interrupt the user flow, you can first run reCAPTCHA without taking action and then decide on thresholds by looking at your traffic in the admin console. By default, you can use a threshold of 0.5. [emphasis added]

When you enable reCAPTCHA v3 on your blog, you allow Google to see real traffic on your site. This probably sounds scary. In fact, one of the web’s more trollish sites, Fast Company, did a whole piece on how scary this may be. The TL;DR is that reCAPTCHA v3 enhances security at the expense of privacy.

But…This is the Way the Web’s Business Model Works

Most websites – especially the big ones – are already tracked by Google and a host of other companies. Install the Ghostery plugin and you can easily see how many trackers are embedded in the websites you visit.

The most common Google trackers are Google Analytics and DoubleClick. Most websites that use reCAPTCHA v3 already use these other trackers, so they’re not sending new information to Google.

What they are doing is getting some benefit from the info they already send. ReCAPTCHA v3 is only scary if you don’t understand what’s been going on forever: the web is funded by advertising, and web advertising is controlled by a small number of companies that collect and monetize web behavior.

The Policy Question

Google is privy to more information about your use of the web than any other company. Not only does the company see your searches, it reads the email you send to Gmail users, and sees most of the web pages you visit in full, unencrypted form.

Google appears to have an insatiable appetite for even more information. This is perfectly understandable because all companies want to grow and the only path to growth that has ever worked for Google is services that process information.

Google’s foray into the ISP business was a failure because the ISP sector does not generally depend on the collection and resale of information. But there is one element of this sector that is purely an information processing function, DNS. Domain lookups are the creamy essence of the limited browsing information that’s available to ISPs.

Seriously, I Promise!

Google’s Appetite for Personal Information

What Google Collects

Because its DNS hasn’t been successful, Google has developed a method of taking DNS provider choice away from users by embedding it in browsers. But Google promises it won’t use DNS lookups for advertising purposes.

As long as this is the case, privacy nuts needn’t worry about Google’s new DNS variation, DNS over HTTPS (DoH.) And the company makes the same promise for reCAPTCHA v3.

Historically, the value of Google’s privacy promises hasn’t been very durable. A recent analysis by the New York Times Privacy Project, Google’s 4,000-Word Privacy Policy Is a Secret History of the Internet, shows just how dramatically these data collection practices have changed since Google published its first privacy policy in 1999.

Conclusion

We can believe Google will not resell or otherwise monetize reCAPTHA v3 data today, but we should also believe it will monetize it in the future. Google is not a charitable enterprise, and essentially all of its income is made by monetizing personal information.

If you’re concerned about allowing Google too much access to your browsing information there’s not much you can do beyond refusing to use the Chrome browser. Ironically, the best choices for privacy-enhanced browsing are built on the Chromium open source project’s rendering engine, the same on used by Chrome.

But the better browsers don’t report nearly as much information back to Google. The best examples are Brave and a new version of Microsoft’s Edge browser that’s current in beta. Brave has some problems, but the new Edge browser is solid.