July 15, 2025
Cohere Embed 4: Reducing Memory Footprint with No Loss in Search Quality
Discover how Cohere Embed 4 enables a 3x reduction in embedding size with no loss in search quality. Learn how to cut memory use and costs in Elasticsearch or OpenSearch with real-world hybrid search results.

Shrinking your vector embeddings can save memory and money – but will it hurt your search quality? In this post, we test Cohere’s new Embed 4 model, which promises state-of-the-art efficiency even at lower dimensions. Here’s how we cut our embedding size by 3x in a real-world search system, with no loss in retrieval performance.

On April 15, 2025, Cohere introduced a new embedding model named Embed 4, claimed to “deliver state-of-the-art accuracy and efficiency”. The model was trained with Matryoshka Representation Learning, a technique designed to produce embeddings that can be truncated to smaller dimensions without a significant loss in performance. This is a crucial property for reducing the memory footprint of vector search in platforms such as Elasticsearch and OpenSearch.
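To make the idea concrete, here is a minimal sketch of how a Matryoshka-style embedding is typically shrunk: keep only the leading components and re-normalize to unit length. The dimension values below are illustrative assumptions, not Embed 4's exact configuration, and the random vector stands in for a real model output.

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` components and re-normalize to unit length,
    the usual way Matryoshka-trained embeddings are reduced in size."""
    shrunk = vec[:dims]
    return shrunk / np.linalg.norm(shrunk)

# Stand-in for a full-size embedding (1536 dims here is an assumption).
rng = np.random.default_rng(0)
full = rng.normal(size=1536).astype(np.float32)

# A 3x reduction: 1536 -> 512 dimensions.
small = truncate_embedding(full, 512)
print(small.shape)  # (512,)
```

Because the truncated vector is re-normalized, cosine similarity can still be computed on the smaller vectors directly.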

In this post, we put this claim to the test on real-world data to find out whether shrinking the embedding size with Cohere’s new model does in fact preserve retrieval performance.

Background