How We Store and Search 30 Billion Faces

By Terence Liu

Overview of a vector database: search and add flows.

Our team of dedicated engineers has been working around the clock to enhance our product’s capabilities, and we have learned a lot along the way. We look forward to sharing some of the challenges and insights with a wider audience. This article is the first in a series of Engineering write-ups. We hope you find it interesting!


The Clearview AI platform has evolved significantly over the past few years, with our database growing from a few million face images to an astounding 30 billion today. As the database has grown, so has the complexity of efficiently processing, storing, and searching this massive amount of data. Our team has embraced this challenge, developing innovative solutions to tackle these technical issues at the deca-billion scale.


The process of generating an embedding vector for a given face.

Face search is a complex process that involves several steps. First, a face image is uploaded as a probe. The image is then preprocessed, which includes alignment and cropping. Next, the preprocessed image is turned into a numeric vector through embedding extraction. We do this using neural networks, which perform significantly better than conventional methods. Finally, the probe’s embedding vector is compared against relevant parts of the database, which returns a set of potential matches, if any are present, ranked by their similarity to the input face. Based on similarity thresholds and a human-in-the-loop investigative process, the results can then be used to assist in identifying an individual or verifying their identity.
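
To make the final comparison step concrete, here is a minimal sketch of ranking stored embeddings by similarity to a probe and applying a threshold. It is an illustration only, not our production code: it assumes cosine similarity as the metric, scans every stored vector instead of using a vector index, and uses tiny 3-dimensional vectors. The names `cosine_similarity`, `search`, `gallery`, `threshold`, and `top_k`, as well as the threshold values, are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def search(
    probe: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # assumed value, chosen only for this example
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to top_k (id, similarity) pairs at or above the threshold,
    ordered from most to least similar. This is a brute-force linear scan."""
    scored = (
        (face_id, cosine_similarity(probe, embedding))
        for face_id, embedding in gallery.items()
    )
    matches = [(face_id, score) for face_id, score in scored if score >= threshold]
    matches.sort(key=lambda match: match[1], reverse=True)
    return matches[:top_k]


# Toy data: real embeddings have far more dimensions.
gallery = {
    "face_a": [0.9, 0.1, 0.0],
    "face_b": [0.0, 1.0, 0.0],
    "face_c": [0.7, 0.7, 0.1],
}
probe = [1.0, 0.0, 0.0]

results = search(probe, gallery, threshold=0.5, top_k=2)
for face_id, score in results:
    print(f"{face_id}: {score:.4f}")
```

In this toy example, `face_a` ranks first and `face_c` second, while `face_b` falls below the threshold and is not returned. A linear scan like this costs one comparison per stored vector for every query, which is the reason a database of this size compares the probe only against relevant parts of the data.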