The Myth of Facial Recognition Bias

Zurah Shaker
Nov 28, 2022
Updated: Apr 11

By Clearview AI BLOG

Since 2018, there has been a persistent myth that facial recognition technology (FRT) is inaccurate and, worse, racially and demographically biased. Activists have attacked the technology on this basis. However, the technology has improved dramatically and is now more accurate and advanced than the human eye.

According to the National Institute of Standards and Technology (NIST), which tests over 650 algorithms for accuracy, there are now over 100 algorithms that can match a photo out of a lineup of over 12 million photos, over 99% of the time.

WHAT IS NIST, & WHY IS IT THE BEST-KNOWN TEST FOR FRT?

NIST is the world’s foremost expert in the independent evaluation of facial recognition algorithms for accuracy in verification and identification use cases. NIST’s Face Recognition Vendor Test (FRVT) accepts any algorithm submission, ranging from reputable vendors and government-developed systems to experimental products. Even our geopolitical adversaries, such as China and Russia, submit their technology for testing by NIST. As of November 2022, over 650 algorithms have been evaluated in total.

There are two tests in particular: the NIST FRVT 1:1 and the NIST FRVT 1:N. The 1:1 test scores each algorithm on verification: given two facial photos, it evaluates how accurately the algorithm determines whether they show the same person. Results are broken down by photo type: mugshots, visa photos, border photos, and the most difficult category, wild photos. Wild photos are photos of faces taken at all types of angles and under different lighting conditions.

The NIST FRVT 1:1 test also requires 1:1 matching with diverse demographics, genders, and ethnicities.
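
To make the 1:1 case concrete, here is a minimal sketch of verification. It assumes each photo has already been converted to a numeric embedding and that two embeddings are compared by cosine similarity against a cutoff. The function names, the toy 3-dimensional vectors, and the 0.8 threshold are illustrative assumptions, not NIST's test harness or any vendor's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(embedding_a, embedding_b, threshold=0.8):
    """1:1 verification: True if the pair is judged to be the same person.

    The threshold value is a hypothetical choice for this sketch.
    """
    return cosine_similarity(embedding_a, embedding_b) >= threshold


# Toy embeddings (assumed); real systems use far more dimensions.
photo_1 = [0.9, 0.1, 0.4]
photo_2 = [0.85, 0.15, 0.45]  # same person, slightly different photo
photo_3 = [0.1, 0.9, 0.2]     # different person

print(verify(photo_1, photo_2))  # True
print(verify(photo_1, photo_3))  # False
```

A 1:1 evaluation repeats this decision over many labeled pairs and counts how often the answer is wrong in each direction.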

The NIST FRVT 1:N is a significantly harder test. Like the 1:1 test, it measures the accuracy of each algorithm across the same categories of photos. However, instead of measuring how accurately an algorithm compares two photos, it tests how accurately the algorithm finds the matching photo in a gallery of millions of other photos.

The top 100 algorithms in the NIST FRVT 1:N test, in investigation mode, have over 99% accuracy for picking a photo out of a mugshot lineup of 12 million photos. This shows how advanced and phenomenally accurate the technology is in 2022, and how much better it is than the human eye.
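
The 1:N search described above can be sketched in the same way. This is a toy illustration under the assumption that a search compares one probe embedding against every gallery embedding and returns a ranked candidate list for review. The gallery contents, identity labels, and `search_gallery` helper are hypothetical, and a three-entry gallery stands in for the millions of photos used in the real test.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search_gallery(probe, gallery, top_k=3):
    """1:N identification: rank gallery identities by similarity to the probe."""
    scored = [
        (cosine_similarity(probe, embedding), identity)
        for identity, embedding in gallery.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(identity, score) for score, identity in scored[:top_k]]


# Hypothetical gallery of enrolled embeddings.
gallery = {
    "person_a": [0.9, 0.1, 0.4],
    "person_b": [0.1, 0.9, 0.2],
    "person_c": [0.3, 0.3, 0.9],
}
probe = [0.85, 0.15, 0.45]  # new photo of person_a

candidates = search_gallery(probe, gallery, top_k=2)
for identity, score in candidates:
    print(identity, round(score, 3))
```

In this sketch the correct identity appears first in the list. The difficulty of 1:N grows with gallery size, because every additional enrolled face is another chance for a look-alike to outscore the true match.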

According to the Innocence Project, about 70% of known wrongful convictions involve eyewitness misidentification [1]. In these cases, not only is an innocent person victimized by being wrongfully convicted of a crime, but the real perpetrator remains at large and may victimize again. Accurate FRT can help prevent these tragedies.

As you can see, the NIST FRVT test is very rigorous. Other tests of FRT, such as those done by the ACLU and the MIT Media Lab, have methodological flaws, yet they are unfortunately still cited as sources.

THE ACLU’S TEST OF RACIAL BIAS IN 2018

An often-cited study about racial bias in FRT is one conducted by the ACLU of Northern California in 2018. From a scientific point of view, flaws with this test include:

The test is not reproducible. The ACLU did not and has not shared its code, sample data, or methodology for any kind of reproduction or additional testing.
The test is not exhaustive. The ACLU used only the Amazon Rekognition FRT algorithm, which is not tested by NIST, to power its test. No other algorithm was used.
The test is skewed. The test set the sensitivity/accuracy “threshold” parameter of the Amazon Rekognition algorithm at 80%, as opposed to the threshold of 95% or higher that Amazon recommends for law enforcement use. The lower setting yields more false positives.
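
The effect of the threshold setting can be illustrated with a small sketch. It assumes a set of similarity scores for comparisons between different people (impostor pairs) and counts how many would be reported as matches at each cutoff. The scores below are invented for illustration and are not data from the ACLU test or from Amazon Rekognition.

```python
def count_false_positives(impostor_scores, threshold):
    """Count impostor comparisons that would be reported as matches."""
    return sum(1 for score in impostor_scores if score >= threshold)


# Hypothetical similarity scores for pairs of photos of different people.
impostor_scores = [0.42, 0.55, 0.61, 0.78, 0.81, 0.83, 0.88, 0.91, 0.96]

at_80 = count_false_positives(impostor_scores, 0.80)
at_95 = count_false_positives(impostor_scores, 0.95)

print("false positives at 0.80:", at_80)  # 5
print("false positives at 0.95:", at_95)  # 1
```

Raising the cutoff can only keep the false positive count the same or lower it, which is why the choice of threshold matters so much when interpreting a result like this one.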

Unfortunately, this is one of the most cited studies on the demographic bias of FRT, even though it has many methodological flaws and lacks the scientific rigor of the NIST tests.

GENDER SHADES STUDY IN 2018 & 2019 BY THE MIT MEDIA LAB

Gender Shades, a study led by MIT Media Lab researchers, found that gender classification systems sold by IBM, Microsoft, and Face++ had an error rate as much as 34.4 percentage points higher for darker-skinned females than lighter-skinned males.

Below are the methodological flaws we identified with this Gender Shades study:

The test does not measure facial recognition. Instead, it measured “gender classification,” which is not a category used by law enforcement in the investigative process.
The test is limited and uses old technology. The algorithms used in this study such as IBM and Microsoft’s technology are not representative of modern-day facial recognition algorithms.
The test is skewed. The study treats all gender predictions the same, regardless of confidence scores, which skews the results to show more errors.

Because this misleading study is used to conflate facial recognition with gender recognition, some people have called for a ban on FRT.

The NIST FRVT test, by contrast, measures actual facial recognition accuracy and performance in the 1:1 test. The 1:1 test includes a much more rigorous breakdown of accuracy by gender and demographics, across over 650 algorithms, as opposed to a handful in the Gender Shades study.

THIRD-PARTY CITATIONS OF NIST AS EVIDENCE OF RACIAL BIAS ARE MISLEADING

Even though NIST has demonstrated that the top algorithms perform remarkably better than the human eye and exhibit undetectable racial bias, news articles repeat the myth of racial bias in FRT and incorrectly cite NIST as a source. Below is a quote from a Scientific American article titled “How NIST Tested Facial Recognition Algorithms for Racial Bias,” published in December 2019:

Along with other findings, NIST’s tests revealed that many of these algorithms were 10 to 100 times more likely to inaccurately identify a photograph of a black or East Asian face, compared with a white one. In searching a database to find a given face, most of them picked incorrect images among black women at significantly higher rates than they did among other demographics.

However, the cited NIST report from December 2019 contains far more detail than the Scientific American article shows:

The accuracy of algorithms used in this report has been documented in recent FRVT evaluation reports. These show a wide range in accuracy across algorithm developers, with the most accurate algorithms producing many fewer errors than lower-performing variants. More accurate algorithms produce fewer errors, and will be expected therefore to have smaller demographic differentials.

What articles like the one above from Scientific American fail to disclose is how minimal the demographic differentials are in the top algorithms as ranked by NIST. Poor algorithms with large demographic differentials are cherry-picked as evidence of bias to convey the author's own biased message.
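
One way to make a statement such as “10 to 100 times more likely” concrete is to express a demographic differential as a ratio of false match rates between groups. The sketch below does that for invented scores. The group labels, scores, threshold, and helper names are assumptions for illustration, not figures or methodology taken from the NIST report.

```python
def false_match_rate(impostor_scores, threshold):
    """Fraction of impostor comparisons scoring at or above the threshold."""
    matches = sum(1 for score in impostor_scores if score >= threshold)
    return matches / len(impostor_scores)


def demographic_differential(scores_by_group, threshold):
    """Return per-group false match rates and the highest-to-lowest ratio."""
    rates = {
        group: false_match_rate(scores, threshold)
        for group, scores in scores_by_group.items()
    }
    lowest = min(rates.values())
    highest = max(rates.values())
    ratio = highest / lowest if lowest > 0 else float("inf")
    return rates, ratio


# Hypothetical impostor scores for two demographic groups.
scores_by_group = {
    "group_x": [0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.6, 0.7, 0.7, 0.9],
    "group_y": [0.2, 0.3, 0.4, 0.5, 0.6, 0.6, 0.7, 0.7, 0.9, 0.9],
}

rates, ratio = demographic_differential(scores_by_group, threshold=0.85)
print(rates)
print("differential ratio:", ratio)
```

A ratio close to 1 means the groups are treated almost identically at that threshold. As the NIST passage quoted above notes, more accurate algorithms produce fewer errors overall and would therefore be expected to show smaller differentials.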

NIST testing shows that Clearview AI's facial recognition algorithm does not indicate any racial bias, and to this date, there are no known instances where Clearview AI's technology has resulted in a wrongful arrest.

In the NIST 1:1 FRVT that evaluates demographic accuracy, Clearview AI’s algorithm consistently achieved greater than 99% accuracy across all demographics.

WHAT NIST ACTUALLY SAID ON THE TOPIC OF DEMOGRAPHIC BIAS

In 2020, the Director of the Information Technology Laboratory for NIST, Dr. Charles Romine, testified before the U.S. House Committee on Homeland Security that NIST saw “undetectable” bias in the highest-performing algorithms, further noting that it did not see a “statistical level of significance” related to bias in these top-performing algorithms:

“In the highest performing algorithms for one-to-many matches, the highest performing algorithms, we saw undetectable, the bias, the demographic differentials that we were, that we were measuring, we say are undetectable in the report.” — Charles Romine, NIST Information Technology Laboratory Director Facial Recognition and Biometric Technology, C-SPAN (Feb. 6, 2020)

Indeed, NIST’s October 2021 evaluation of Clearview AI’s facial recognition algorithm found 99% accuracy for all demographics, highlighting the dependability and accuracy of advanced algorithms.

MACHINE LEARNING HAS REVOLUTIONIZED FACIAL RECOGNITION ACCURACY

Older FRT algorithms relied on outdated methods, such as manual measurements. Neural networks and other modern artificial intelligence algorithms, by contrast, are trained on millions of sample images to increase accuracy. The more example photos an algorithm sees, the better it can learn and cluster information, resulting in a more accurate algorithm and match. Thus, training on example photos from every demographic helps decrease racial bias and improves overall accuracy.

Since 2018, facial recognition algorithms have performed with over 99% accuracy for identifying a photo out of a gallery of 1 million photos. Previously, FRT was accurate for tagging your friends on Facebook (1 out of 1,000) and unlocking your iPhone X (1:1 matching), but there were no breakthroughs for the 1 out of 1 million search until 2018.

Clearview AI's algorithm has been trained on a diverse dataset of all ethnicities to prevent racial bias. We are always improving the dataset, and we have one of the largest training datasets in the world, resulting in a highly accurate algorithm across all demographics.

According to Patrick Grother, a computer scientist at NIST, on the effectiveness of neural networks:

“The test shows a wholesale uptake by the industry of convolutional neural networks, which didn’t exist five years ago. About 25 developers have algorithms that outperform the most accurate one we reported in 2014.” — Patrick Grother, Computer Scientist at NIST Author of NIST Interagency Report (NISTIR) 8238, 11/26/2018

At Clearview AI, we are constantly amazed at how much machine learning and artificial intelligence have improved our algorithm in such a short period of time. Even though algorithms have far surpassed the accuracy of the human eye, we will continue to see improvements beyond what is currently imaginable. Hopefully those who are misinformed about FRT will soon see this as well.

THE FUTURE OF FACIAL RECOGNITION

Now that bias-free facial recognition algorithms with high degrees of accuracy exist, and public perception is recognizing that reality, the remaining debate is how this technology should be deployed and regulated.

It is important to know the facts and the science when discussing life-changing topics like the use of FRT in law enforcement, potential legislation, or regulation. Clearview AI believes that regulation is essential for powerful technology like FRT, and that all the facts about the accuracy of the technology must be known before any judgments or decisions are made regarding its use.

In the last 3 years of Clearview AI being deployed in the field, there have been no known wrongful arrests due to the use of our technology. Yet, we have seen many positive use cases with the technology such as rescuing children from child exploitation, solving financial crimes, and keeping our communities safe. Furthermore, we regularly hear stories from our law enforcement customers about how effective this life-saving technology is.