Authors: Junjie Qi, Ramil Bakhshyiev, Matthijs Douze, Gergely Szilvasy, Honghao Qiu, Vishal Gandhi, Meta
In this poster we describe recent accomplishments and improvements in Faiss (Facebook AI Similarity Search), an industry-leading open-source vector search and clustering library developed by Meta over the past decade. The library was recently enhanced through a collaboration with NVIDIA to integrate state-of-the-art GPU-accelerated algorithms from the cuVS library, yielding significant performance improvements: benchmarks on large-scale datasets show up to 12.3x faster index builds and 8.1x lower search latency.
RaBitQ, now supported in Faiss, is a novel binary quantization approach that accelerates vector search by performing scalar quantization on transformed query vectors, achieving faster searches than traditional asymmetric binary quantization methods. Experimental results show it matches or beats certain configurations of OPQ, PQFS, and LSQ on the speed-versus-accuracy trade-off, although the current implementation lacks SIMD optimizations that could further improve performance.
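The asymmetric scheme can be sketched conceptually in NumPy: database vectors are randomly rotated and reduced to one sign bit per dimension, while the query is only scalar-quantized, and distances are estimated between the two. This is a simplified illustration of the idea, not the Faiss RaBitQ implementation; all names and parameters here are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 1000
xb = rng.standard_normal((n, d)).astype("float32")  # database vectors
xq = rng.standard_normal(d).astype("float32")       # one query vector

# Random orthonormal rotation applied to both sides, mimicking
# RaBitQ's randomized preprocessing (hypothetical stand-in).
P, _ = np.linalg.qr(rng.standard_normal((d, d)))
xb_r = xb @ P
xq_r = xq @ P

# Database side: 1 bit per dimension (sign codes), plus the stored norm.
norms = np.linalg.norm(xb_r, axis=1)
codes = np.sign(xb_r)  # +-1 values; packed into bits in a real index

# Query side: scalar-quantize the rotated query (8 levels here),
# rather than binarizing it -- this is the asymmetric part.
lo, hi = xq_r.min(), xq_r.max()
levels = np.round((xq_r - lo) / (hi - lo) * 7)
xq_hat = levels / 7 * (hi - lo) + lo

# Estimated inner product: sign code dot quantized query, rescaled.
scores = (codes @ xq_hat) * (norms / np.sqrt(d))
approx_top = np.argsort(-scores)[:10]

# Exact inner-product top-10 for comparison; the overlap between the
# two rankings indicates the estimator's quality on this toy data.
exact_top = np.argsort(-(xb @ xq))[:10]
```

A production implementation would pack the sign codes into bit vectors and evaluate many codes per instruction, which is where the missing SIMD optimizations noted above would pay off.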
The Offline Vector KNN Search workflow reached a significant milestone by processing 1 trillion embeddings (synthetic vectors), leveraging Meta's internal infrastructure and the Faiss library to demonstrate the feasibility of vector search at this scale, while also highlighting opportunities for future work on approximate search, multi-GPU processing, and optimized indexing techniques.
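The core operation being sharded across machines in such a workflow is an exact (flat) k-nearest-neighbor search. A minimal NumPy sketch of that kernel, assuming L2 distance (function name and shapes are illustrative, not the workflow's actual API):

```python
import numpy as np

def brute_force_knn(xq, xb, k):
    """Exact k-nearest-neighbor search by squared L2 distance.
    Each shard of an offline workflow runs this on its slice of xb,
    and partial results are merged by distance afterwards."""
    # ||q - b||^2 = ||q||^2 - 2 q.b + ||b||^2, computed via one matmul.
    d2 = (xq ** 2).sum(1, keepdims=True) - 2 * xq @ xb.T + (xb ** 2).sum(1)
    # argpartition finds the k smallest without a full sort...
    idx = np.argpartition(d2, k, axis=1)[:, :k]
    # ...then only the k candidates per query are sorted.
    order = np.take_along_axis(d2, idx, axis=1).argsort(1)
    return np.take_along_axis(idx, order, axis=1)

rng = np.random.default_rng(0)
xb = rng.standard_normal((10_000, 32)).astype("float32")  # database
xq = rng.standard_normal((5, 32)).astype("float32")       # queries
I = brute_force_knn(xq, xb, 4)  # indices of the 4 nearest neighbors
```

At trillion-vector scale the same math runs as batched GPU matrix multiplications over sharded data, which is why multi-GPU processing is a natural next step.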
A prototype for distributed clustering on GPUs achieved more than a 10x latency reduction compared to CPUs: training 2 million centroids on a dataset of 22 million 768-dimensional vectors took just 1 hour on 16 GPUs, versus 11 hours on 71 high-memory CPU hosts, while maintaining near-identical quality on cluster-size and embedding-to-centroid-distance metrics.
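The algorithm being distributed is Lloyd's k-means, whose assignment step is itself a brute-force nearest-centroid search and dominates the cost at 2 million centroids. A minimal single-machine sketch, assuming standard Lloyd iterations (this is a conceptual illustration, not the prototype's code):

```python
import numpy as np

def kmeans(x, k, iters=10, seed=0):
    """Minimal Lloyd's k-means. In the distributed prototype the
    assignment step (nearest centroid per point) is what gets sharded
    across GPUs; the centroid update is a cheap reduction."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), k, replace=False)]  # random init
    for _ in range(iters):
        # Assignment: squared distance from every point to every centroid.
        d2 = ((x[:, None, :] - centroids[None]) ** 2).sum(-1)
        assign = d2.argmin(1)
        # Update: mean of each cluster (empty clusters keep old centroid).
        for j in range(k):
            pts = x[assign == j]
            if len(pts):
                centroids[j] = pts.mean(0)
    return centroids, assign

rng = np.random.default_rng(1)
x = rng.standard_normal((2000, 16)).astype("float32")
centroids, assign = kmeans(x, 8)
```

At the reported scale (22M points x 2M centroids x 768 dims) the assignment step is a massive matrix-multiplication workload, which explains why GPUs deliver the 10x speedup while leaving the resulting cluster quality essentially unchanged.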