Data harvesting, Facebook, and the “10 year challenge”.

To be fair, I haven’t seen much concern expressed on the theme of this tweet,

Has anyone considered that Facebook's “How Hard Did Aging Hit You” Challenge is just a way to refine facial recognition technology? What better way to get people to offer up a ton of comparative visual data?
— Greg Britton (@gmbritton) January 13, 2019

but with over 5 000 retweets (at the time of writing), it certainly seems that some people are convinced, and possibly concerned. I’m not, however, inclined to think that the “10 year challenge” is a sinister ploy by Facebook to harvest this freely volunteered data in order to improve its facial recognition technology.

First, those of us who post photographs of ourselves, and of others, on Facebook are already giving Zuckerberg a day-by-day update, often geotagged, and usually with embedded EXIF data recording the date and time each photo was taken. So Facebook can already analyse how users’ faces have changed over any timescale it likes, with a dataset that goes back to its 2004 launch.

A 10-year comparison isn’t obviously more valuable than a 5-year (or 15-year) one. The point is that Facebook can make any of these comparisons with the data it already has, and, at least for a 5- or 10-year comparison, that dataset would already be massive whether or not “challenges” like this take off.

Second, Facebook doesn’t need to own your photographs to use them: as its Terms of Service tell you, posting them grants it a broad licence to do so, so there is no legal obstacle to this kind of analysis:

Specifically, when you share, post or upload content that is covered by intellectual property rights (e.g. photos or videos) on or in connection with our Products, you grant us a non-exclusive, transferable, sub-licensable, royalty-free and worldwide licence to host, use, distribute, modify, run, copy, publicly perform or display, translate and create derivative works of your content (consistent with your privacy and application settings).

Here’s a Facebook thread from someone who has domain-specific knowledge, and whom you might find more persuasive than me. He says: “As a machine learning researcher let me just say that paired photos associated with the same name with a positive ID and the time interval is the gold standard data set to do this.”

Again, though, true as this may be (and it certainly sounds plausible), the 10-year challenge will surely add only a small fraction of data to an already vast set of photographs. That makes it difficult to see it as a unique concern, rather than as just another example of how much data we freely share with Facebook and other services.

So, if you are concerned about our data footprint, fearing that we might one day end up with something like ~~Charlie Brooker’s~~ China’s “social credit score”, where playing too many video games can lower your rating, which in turn affects things like the interest rate you pay, that’s one thing. But you don’t demonstrate a consistent commitment to that concern by treating this “challenge” as something uniquely interesting (or even particularly new).

It’s easy to be concerned or outraged by one instance of a potential problem when it bubbles to the surface by trending on Twitter or appearing all over your Facebook feed. But if our concern lasts only a few hours or days, and we don’t see this as one fairly typical instance among many, it looks a lot more like “clicktivism” than genuine concern.

Yes, I think there are legitimate concerns about the potential misuse of all the data we (sometimes naively) share. But there are also clear benefits to these datasets. As this fairly balanced piece in Wired points out, the data can be used for things like tracking down lost children, especially if we can model what they might look like after being missing for a couple of years.

It could also be used to track suspected or known criminals or, once the technology is sufficiently refined, to predict impending health issues: for example, by detecting unusual patches of colour developing on your face, and thereby diagnosing melanoma while it’s still treatable.

And yes, of course it can also be used to track you. But most people I’ve heard assert that this is a problem don’t go on to explain why this kind of tracking is inherently worse than the tracking we happily sign up for when our insurance company offers us a discount on our health premiums for signing a document attesting that we don’t smoke, when we wear smartwatches, or when we leave GPS enabled on our smartphones.

In summary, there might well be causes for concern in data harvesting. But when we already allow so much of it, some argument needs to be made for why any given instance is especially problematic. If that case isn’t made, it’s difficult to see why something like this should add to our panopticon-paranoia.

By Jacques Rousseau