There are 150 child sexual abuse laws around the world. Now, metadata is making it easier for countries to work together.

Each day, a team of analysts in the UK faces a seemingly endless mountain of horrors. The team of 21, who work at the Internet Watch Foundation’s office in Cambridgeshire, spend hours trawling through images and videos containing child sexual abuse. Each time they find a photo or piece of footage, it must be assessed and labeled. Last year alone, the team identified 153,383 web pages with links to child sexual abuse imagery. This creates a vast database that can then be shared internationally in an attempt to stem the flow of abuse. The problem? Different countries have different ways of categorizing images and videos.

Until now, analysts at the UK-based child protection charity have checked whether the material they find falls into one of three categories: A, B, or C. These groupings are based on the UK’s laws and sentencing guidelines for child sexual abuse and broadly set out types of abuse. Category A, the most severe classification, covers the worst crimes against children. These classifications are then used to help determine how long someone convicted of a crime should be sentenced. But other countries use different classifications.

Now the IWF believes a data breakthrough could remove some of these differences. The group has rebuilt its hashing software, dubbed Intelligrade, to automatically match up images and videos to the rules and laws of Australia, Canada, New Zealand, the US, and the UK, also known as the Five Eyes countries. The change should mean less duplication of analytical work and make it easier for tech companies to prioritize the most serious images and videos of abuse first.

“We believe that we are better able to share data so that it can be used in meaningful ways by more people, rather than all of us just working in our own little silos,” says Chris Hughes, the director of the IWF’s reporting hotline. “Currently, when we share data it is very difficult to get any meaningful comparisons against the data because they simply don’t mesh correctly.”

Countries place different weightings on images based on what happens in them and the age of the children involved. Some countries classify images based on whether children are prepubescent or pubescent as well as the crime that is taking place. The UK’s most serious category, A, includes penetrative sexual activity, bestiality, and sadism. It doesn’t necessarily include acts of masturbation, Hughes says, whereas in the US these fall into a higher category. “At the moment, the US requesting IWF category A images would be missing out on that level of content,” Hughes says.

All the photos and videos the IWF looks at are given a hash, essentially a digital fingerprint derived from the file, that’s shared with tech companies and law enforcement agencies around the world. These hashes are used to detect known abuse content and block it from being uploaded to the web again. The hashing system has had a substantial impact on the spread of child sexual abuse material online, but the IWF’s latest tool adds significant new information to each hash.
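The basic idea of a hash list can be illustrated with a minimal sketch. It assumes exact matching on SHA-256 digests. The `HashBlocklist` class and the placeholder byte strings are hypothetical and do not represent the IWF’s actual software or data format. Real deployments commonly also rely on perceptual hashes, such as Microsoft’s PhotoDNA, which can match images that have been resized or re-encoded. An exact hash like the one below cannot do that.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of raw file bytes."""
    return hashlib.sha256(data).hexdigest()


class HashBlocklist:
    """Hypothetical list of known-content hashes (exact matching only)."""

    def __init__(self, known_hashes):
        # Normalize to lowercase hex so lookups ignore letter case.
        self._known = {h.lower() for h in known_hashes}

    def is_known(self, data: bytes) -> bool:
        """True only if the upload is byte-for-byte identical to a listed file."""
        return sha256_hex(data) in self._known


# Usage with placeholder bytes standing in for real files.
listed = sha256_hex(b"placeholder-file-1")
blocklist = HashBlocklist([listed])
print(blocklist.is_known(b"placeholder-file-1"))  # True
print(blocklist.is_known(b"placeholder-file-2"))  # False
```

Changing even a single byte produces a completely different digest, so exact hashes only catch identical copies. This is one reason exact hash lists are typically paired with perceptual hashing.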

The IWF’s secret weapon is metadata. This is data about data: it can describe the what, who, how, and when of an image’s contents. Metadata is a powerful tool for investigators, as it allows them to spot patterns in people’s actions and analyze them for trends. Among the biggest proponents of metadata are spies, who say it can be more revealing than the content of people’s messages.

The IWF has ramped up the amount of metadata it creates for each image and video it adds to its hash list, Hughes says. Each new image or video it looks at is being assessed in more detail than ever before. As well as determining which of the UK’s three categories the content falls into, its analysts are now adding up to 20 different pieces of information to their reports. These fields match what is needed to determine an image’s classification in the other Five Eyes countries. The charity’s policy staff compared each country’s laws and worked out what metadata is needed. “We decided to provide a high level of granularity about describing the age, a high level of granularity in terms of depicting what’s taking place in the image, and also confirming gender,” Hughes says.

Improvements in abuse-detection technologies and more thorough processes at technology companies mean that more sexual abuse content is being found than ever before, although some companies are better at this than others. Last year the nonprofit National Center for Missing and Exploited Children received 21.4 million reports of abuse content from technology companies, which are required by US law to report what they find. That was more than in any other year on record, and the reports contained 65.4 million images, videos, and other files.

Despite the increase in reporting of child abuse material, one of the big challenges is the variety of reporting processes and standards around the world. These differences in approach make it difficult to gather a full picture of the true scale of child sexual abuse online. A 2018 legal review from the US-based nonprofit the International Centre for Missing and Exploited Children found many inconsistencies. The review claims 118 countries have “sufficient” child sexual-abuse material laws, 62 have laws that are insufficient, and 16 countries don’t have any. Some countries with poor laws don’t define child sexual abuse, others don’t look at how technology is used in crimes, and some don’t criminalize the possession of abuse content.

Separately, European Union–funded research conducted by the international policing group Interpol and ECPAT International, a network of civil society organizations, found that there are “substantial challenges” in comparing information about child sexual abuse content, and that this hampers efforts to find the victims. “This situation is complicated by the use of different categorization approaches in ascribing victim characteristics and experiences of victimization, which prohibit meaningful comparison between studies,” the February 2018 report says.

The IWF hopes its Intelligrade system will help address some of these problems. “It almost reduces the need to create one law around the world that exists for child sexual abuse,” says Emma Hardy, the IWF’s director of communications. Previous academic research has recommended that countries harmonize their laws against child sexual abuse, although this is a logistical and political challenge. “The technology is filling the big gaps of legal harmonization,” Hardy says. The IWF is now researching further countries whose laws its tool could map images against; 20 countries are on a long list.

A spokesperson for Google, which receives data from the IWF, says the increased granularity in the data should prove useful. “This new system will help this fight by making it easier for companies—large and small—to know what hashes are in IWF’s Intelligrade and how they correspond to imagery that is illegal under different and complex legal regimes,” the spokesperson says. They add that the “additional metadata” can help in the fight against child sexual abuse online. “Having a clear mapping of the classification across jurisdictions will help NGOs, industry, and lawmakers identify differences in policies and regulation and hopefully result in better legislative outcomes,” the spokesperson says.

But beyond trying to close some of the legal gaps, Hughes says adding more metadata to the IWF analysts’ work will help everyone understand the types of abuse that are happening and fight back against them. This is worth the extra time it will take IWF staff to review images, he says.

By recording details of the sexual abuse seen in photos and videos, analysts will be able to evaluate more clearly the types of abuse they are seeing and determine whether criminal behavior is changing. The IWF will be able to know how many instances of specific types of abuse are happening and the broad age groups of victims. It will also be able to tell which types of abuse are most commonly shared on which websites. Intelligrade is also being used to pull in and store the file names of child sexual-abuse content, which can be used to understand the coded language child abusers use to talk to each other.

Adding extra data to images also means that machine-learning systems can be trained to detect different types of abuse more successfully. A database with more labels means images can be better understood by AI. The IWF is currently in the process of classifying millions of category A and B images it has access to from the UK government’s Child Abuse Image Database. “The plan is to be able to train classifiers not just to say whether an image is criminal, but to fine-tune that classification to be able to say whether it contains a certain criminal act,” Hughes says.

The approach will allow the IWF to scour the web in search of new images and videos of child sexual abuse. “That’s when we can start to reap the benefits for our own use,” Hughes says. “It means that we can better target content.”

This story originally appeared on WIRED UK. 


This post was originally published on Wired Top News