Amazon OpenSearch Service has provided vector database capabilities since 2019, enabling efficient vector similarity search through specialized k-nearest neighbor (k-NN) indexes. This functionality supports use cases such as semantic search, Retrieval Augmented Generation (RAG) with large language models (LLMs), and rich media search. With the explosion of AI capabilities and the increasing creation of generative AI applications, customers are seeking vector databases with rich feature sets.
OpenSearch Service also offers a multi-tiered storage solution in the form of the UltraWarm and Cold tiers. UltraWarm provides cost-effective storage for less active data while keeping it queryable, though with higher latency compared to hot storage. The Cold tier offers even lower-cost archival storage for detached indexes that can be reattached when needed. Moving data to UltraWarm makes it immutable, which aligns well with use cases where data updates are infrequent, such as log analytics.
Until now, the UltraWarm and Cold storage tiers couldn't store k-NN indexes. As customers adopt OpenSearch Service for vector use cases, we've observed that they face high costs because memory and storage become bottlenecks for their workloads.
To provide similar cost-saving economics for larger datasets, we now support k-NN indexes in both the UltraWarm and Cold tiers. This enables you to save costs, especially for workloads where:
- A significant portion of your vector data is accessed less frequently (for example, historical product catalogs, archived content embeddings, or older document repositories)
- You need isolation between frequently and infrequently accessed workloads, so that indexes that can move to the warm tier don't force you to scale the hot tier just to avoid interference
In this post, we discuss this new capability and its use cases, and provide a cost-benefit analysis in different scenarios.
New capability: K-NN indexes in UltraWarm and Cold tiers
You can now enable the UltraWarm and Cold tiers for your k-NN indexes on OpenSearch Service version 2.17 and later. This feature is available for both new domains and existing domains upgraded to version 2.17. k-NN indexes created on OpenSearch Service version 2.x or later are eligible for migration to the warm and cold tiers, regardless of which engine they use (Faiss, NMSLIB, or Lucene).
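For reference, a k-NN index is a standard OpenSearch index with the knn setting enabled and one or more knn_vector fields. The following is a minimal sketch using the opensearch-py Python client; the domain endpoint, credentials, index name, and field name are placeholders:

```python
from opensearchpy import OpenSearch

# Placeholder endpoint and credentials for an OpenSearch Service domain.
client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("user", "password"),
    use_ssl=True,
)

# Create a k-NN index whose data can later be tiered to UltraWarm or Cold.
client.indices.create(
    index="product-embeddings-2025-01",
    body={
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "embedding": {
                    "type": "knn_vector",
                    "dimension": 768,
                    "method": {
                        "name": "hnsw",     # HNSW graph-based ANN
                        "engine": "faiss",  # or "nmslib" / "lucene"
                        "space_type": "l2",
                    },
                }
            }
        },
    },
)
```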
Use cases
This multi-tiered approach to k-NN vector search benefits the following use cases:
- Long-term semantic search – Maintain searchability on years of historical text data for legal, research, or compliance purposes
- Evolving AI models – Store embeddings from multiple versions of AI models, allowing comparisons and backward compatibility without the cost of keeping all data in hot storage
- Large-scale image and video similarity – Build extensive libraries of visual content that can be searched efficiently, even as the dataset grows beyond the practical limits of hot storage
- Ecommerce product recommendations – Store and search through vast product catalogs, moving less popular or seasonal items to cheaper tiers while maintaining search capabilities
Let's explore real-world scenarios to illustrate the potential cost benefits of using k-NN indexes with the UltraWarm and Cold storage tiers. We use us-east-1 as the representative AWS Region for these scenarios.
Scenario 1: Balancing hot and warm storage for mixed workloads
Let's say you have 100 million vectors of 768 dimensions (around 330 GB of vector data) spread across 20 Lucene engine indexes of 5 million vectors each (roughly 16.5 GB per index), of which 50% (about 10 indexes, or 165 GB) is queried infrequently.
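As a back-of-the-envelope check, raw float32 vectors occupy 4 bytes per dimension; the slightly higher figures quoted in this scenario account for index structures and metadata stored alongside the raw floats:

```python
# Rough sizing for Scenario 1, counting only the raw float32 vector data.
num_vectors = 100_000_000
dimension = 768
bytes_per_float32 = 4

raw_gb = num_vectors * dimension * bytes_per_float32 / 10**9
print(f"Raw vector data: {raw_gb:.0f} GB")              # ~307 GB
print(f"Per index (5M vectors): {raw_gb / 20:.1f} GB")  # ~15.4 GB
```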
Domain setup without UltraWarm support
In this approach, you prioritize maximum performance by keeping all of the data in hot storage, providing the fastest possible query responses for the vectors. You deploy a cluster with 6x r6gd.4xlarge.search instances.
The cost for this setup comes to $7,550 per month, with a data instance cost of $6,700.
Although this provides top-tier performance for the queries, it might be over-provisioned given the mixed access patterns of your data.
Cost-saving strategy: UltraWarm domain setup
In this approach, you align your storage strategy with the observed access patterns, optimizing for both performance and cost. The hot tier continues to provide optimal performance for frequently accessed data, while less critical data moves to UltraWarm storage.
UltraWarm queries experience higher latency compared to hot storage, but this trade-off is often acceptable for less frequently accessed data. Additionally, because UltraWarm data becomes immutable, this strategy works best for stable datasets that don't require updates.
You keep the frequently accessed 50% of data (roughly 165 GB) in hot storage, allowing you to reduce your hot tier to 3x r6gd.4xlarge.search instances. For the less frequently accessed 50% of data (roughly 165 GB), you introduce 2x ultrawarm1.medium.search instances as UltraWarm nodes. This tier offers a cost-effective solution for data that doesn't require the absolute fastest access times.
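Moving an index to UltraWarm is a single call to the migration API. The following is a minimal sketch, reusing the placeholder endpoint from the earlier example and a hypothetical index name:

```python
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    use_ssl=True,
)

# Migrate a less frequently accessed index to UltraWarm.
client.transport.perform_request(
    "POST", "/_ultrawarm/migration/product-embeddings-2023-01/_warm"
)

# Poll the migration status until it completes.
status = client.transport.perform_request(
    "GET", "/_ultrawarm/migration/product-embeddings-2023-01/_status"
)
print(status)

# If the data becomes hot again (for example, it needs updates), move it back:
# client.transport.perform_request(
#     "POST", "/_ultrawarm/migration/product-embeddings-2023-01/_hot"
# )
```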
By tiering your data based on access patterns, you significantly reduce your hot tier footprint while introducing a small warm tier for less critical data. This strategy allows you to maintain high performance for frequent queries while optimizing costs for the entire system.
The hot tier continues to provide optimal performance for the majority of queries targeting frequently accessed data. For the warm tier, you see an increase in latency for queries on less frequently accessed data, but this is mitigated by effective caching on the UltraWarm nodes. Overall, the system maintains high availability and fault tolerance.
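Queries require no application changes after migration: the same k-NN search syntax works whether an index lives in hot or warm storage (cold indexes must be moved back to warm before they can be searched). The following is a sketch with a placeholder query vector, reusing the same placeholder client setup:

```python
# Search a k-NN index; the request is identical for hot and warm indexes.
results = client.search(
    index="product-embeddings-2023-01",
    body={
        "size": 10,
        "query": {
            "knn": {
                "embedding": {
                    "vector": [0.1] * 768,  # stand-in for a real 768-dim embedding
                    "k": 10,
                }
            }
        },
    },
)
for hit in results["hits"]["hits"]:
    print(hit["_id"], hit["_score"])
```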
This balanced approach reduces your monthly cost to $5,350, with $3,350 for the hot tier and $350 for the warm tier, reducing the monthly costs by roughly 29% overall.
Scenario 2: Managing a growing vector database with access-based patterns
Imagine your system processes and indexes vast amounts of content (text, images, and videos), generating vector embeddings using the Lucene engine for advanced content recommendation and similarity search. As your content library grows, you’ve observed clear access patterns where newer or popular content is queried frequently while older or less popular content sees decreased activity but still needs to be searchable.
To effectively use tiered storage in OpenSearch Service, consider organizing your data into separate indexes based on expected query patterns. This index-level organization is important because data migration between tiers happens at the index level, allowing you to move specific indexes to cost-effective storage tiers as their access patterns change.
Your current dataset consists of 150 GB of vector data, growing by 50 GB monthly as new content is added. The data access patterns show:
- About 30% of your content receives 70% of the queries, typically newer or popular items
- Another 30% sees moderate query volume
- The remaining 40% is accessed infrequently but must remain searchable for completeness and occasional deep analysis
Given these characteristics, let’s explore a single-tiered and multi-tiered approach to managing this growing dataset efficiently.
Single-tiered configuration
For a single-tiered configuration, as the dataset expands, the vector data grows to around 400 GB over 6 months, all stored in the hot (default) tier. With r6gd.8xlarge.search instances, you need around 3 data nodes.
The overall monthly cost for the domain under a single-tiered setup would be around $8,050, with a data instance cost of around $6,700.
Multi-tiered configuration
To optimize performance and cost, you implement a multi-tiered storage strategy using Index State Management (ISM) policies to automate the movement of indexes between tiers as access patterns evolve (a sample policy sketch follows this list):
- Hot tier – Stores frequently accessed indices for fastest access
- Warm tier – Houses moderately accessed indices with higher latency
- Cold tier – Archives rarely accessed indices for cost-effective long-term retention
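The following is a hedged sketch of such a policy: hot for 30 days, then UltraWarm, then cold storage after 90 days. The policy ID, index pattern, age thresholds, and timestamp field are illustrative, and the exact parameters of the warm_migration and cold_migration actions should be verified against the Amazon OpenSearch Service documentation for your version:

```python
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    use_ssl=True,
)

policy = {
    "policy": {
        "description": "Tier vector indexes by age",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [],
                "transitions": [
                    {"state_name": "warm", "conditions": {"min_index_age": "30d"}}
                ],
            },
            {
                "name": "warm",
                "actions": [{"warm_migration": {}}],
                "transitions": [
                    {"state_name": "cold", "conditions": {"min_index_age": "90d"}}
                ],
            },
            {
                "name": "cold",
                # timestamp_field is a placeholder for your documents' time field.
                "actions": [{"cold_migration": {"timestamp_field": "@timestamp"}}],
                "transitions": [],
            },
        ],
        # Automatically attach the policy to new indexes matching this pattern.
        "ism_template": {"index_patterns": ["content-embeddings-*"]},
    }
}

client.transport.perform_request(
    "PUT", "/_plugins/_ism/policies/vector-tiering", body=policy
)
```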
For the data distribution, you start with a total of 150 GB with a monthly growth of 50 GB. The following is the projected data distribution when the data reaches 400 GB at around the 6 month mark:
- Hot tier – Approximately 100 GB (most frequently queried content) on 1x r6gd.8xlarge.search
- Warm tier – Approximately 100 GB (moderately accessed content) on 2x ultrawarm1.medium.search
- Cold tier – Approximately 200 GB (rarely accessed content)
Under the multi-tiered setup, the cost for the vector data domain totals $3,880 per month, including $2,330 for data nodes, $350 for UltraWarm nodes, and about $5 for cold storage.
You see compute savings as the hot tier footprint shrinks by around 66% (from three data nodes to one), and the overall cost of the multi-tiered domain is roughly 50% lower than the single-tiered one.
Scenario 3: Large-scale disk-based vector search with UltraWarm
Let’s consider a system managing 1 billion vectors of 768 dimensions distributed across 100 indexes of 10 million vectors each. The system predominantly uses disk-based vector search with 32x FAISS quantization for cost optimization, and about 70% of queries target 30% of the data, making it an ideal candidate for tiered storage.
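Disk-based vector search is configured in the index mapping. The following is a minimal sketch of one of the 100 indexes (index and field names are placeholders), using the on_disk mode and 32x compression level introduced in OpenSearch 2.17:

```python
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    use_ssl=True,
)

# One of the 100 indexes, using disk-based search with 32x quantization.
client.indices.create(
    index="vectors-shard-001",
    body={
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "embedding": {
                    "type": "knn_vector",
                    "dimension": 768,
                    "space_type": "l2",
                    "mode": "on_disk",           # disk-based vector search
                    "compression_level": "32x",  # Faiss-based quantization
                }
            }
        },
    },
)
```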
Domain setup without UltraWarm support
In this approach, using disk-based vector search to handle the large-scale data, you deploy a cluster with 4x r6gd.4xlarge.search instances. This setup provides adequate storage capacity while optimizing memory usage through disk-based search.
The cost for this setup comes to $6,500 per month, with a data instance cost of $4,470.
Cost-saving strategy: UltraWarm domain setup
In this approach, you align your storage strategy with the observed query patterns, similar to Scenario 1.
You keep the frequently accessed 30% of data in hot storage on 1x r6gd.4xlarge.search instance. For the less frequently accessed 70% of data, you use 2x ultrawarm1.medium.search instances.
You use disk-based vector search in both storage tiers to optimize memory usage. This balanced approach brings your monthly cost down to $3,270, with $1,120 for the hot tier and $400 for the warm tier, a reduction of roughly 50% overall.
Get started with UltraWarm and Cold storage
To take advantage of k-NN indexes in UltraWarm and Cold tiers, make sure that your domain is running OpenSearch Service 2.17 or later. For instructions to migrate k-NN indexes across storage tiers, refer to UltraWarm storage for Amazon OpenSearch Service.
Consider the following best practices for multi-tiered vector search:
- Analyze your query patterns to optimize data placement across tiers
- Use Index State Management (ISM) to manage the data lifecycle across tiers transparently
- Monitor cache hit rates using the k-NN stats API, and adjust tiering and node sizing as needed (see the sketch after this list)
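For the last point, the following is a sketch of pulling cache hit and miss counters from the k-NN stats API; the client setup is the same placeholder as in the earlier examples:

```python
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    use_ssl=True,
)

# Filter the k-NN stats to the cache counters relevant to tiering decisions.
stats = client.transport.perform_request(
    "GET", "/_plugins/_knn/stats/hit_count,miss_count,graph_memory_usage"
)
print(stats)
```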
Summary
The introduction of k-NN vector search capabilities in the UltraWarm and Cold tiers for OpenSearch Service marks a significant step forward in providing cost-effective, scalable solutions for vector search workloads. This feature allows you to balance performance and cost by keeping frequently accessed data in hot storage for the lowest latency, while moving less active data to UltraWarm for cost savings. Although UltraWarm storage introduces some performance trade-offs and makes data immutable, these characteristics often align well with real-world access patterns where older data sees fewer queries and updates.
We encourage you to evaluate your current vector search workloads and consider how this multi-tier approach could benefit your use cases. As AI and machine learning continue to evolve, we remain committed to enhancing our services to meet your growing needs.
Stay tuned for future updates as we continue to innovate and expand the capabilities of vector search in OpenSearch Service.
About the Authors
Kunal Kotwani is a software engineer at Amazon Web Services, focusing on OpenSearch core and vector search technologies. His major contributions include developing storage optimization solutions for both local and remote storage systems that help customers run their search workloads more cost-effectively.
Navneet Verma is a senior software engineer at AWS OpenSearch. His primary interests include machine learning, search engines, and improving search relevancy. Outside of work, he enjoys playing badminton.
Sorabh Hamirwasia is a senior software engineer at AWS working on the OpenSearch Project. His primary interests include building cost-optimized and performant distributed systems.