Amazon OpenSearch Service and Firehose: Rolling over indices automatically

Optimizing an OpenSearch cluster for time-series data largely comes down to getting the shard size and shard count right. When data is ingested through Amazon Data Firehose, this can be achieved with rollover indices. Read on for a guide on how to set this up.

When deciding how to organize data into indices in OpenSearch, the first consideration is what kind of data will be indexed into them. For example, data with different schemas should go into separate indices (or separate index patterns).

The next step is to consider how much data will be indexed into each index. This needs to be optimized at a lower level: the shards, which are the data units that indices are composed of. Very large shards are slower to search, relocate and recover, and can leave the cluster unbalanced, while a large number of small shards adds per-shard overhead and reduces performance.
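To make the trade-off concrete, here is a minimal sketch of the arithmetic behind choosing a shard count. The 30 GB target and the 200 GB index size are illustrative assumptions, not recommendations from this article, and the function names are hypothetical.

```python
import math


def primary_shard_count(index_size_gb: float, target_shard_gb: float) -> int:
    """Smallest number of primary shards that keeps each shard at or below the target size."""
    if index_size_gb < 0 or target_shard_gb <= 0:
        raise ValueError("index size must be >= 0 and target shard size must be > 0")
    # An index always has at least one primary shard.
    return max(1, math.ceil(index_size_gb / target_shard_gb))


def shard_size_gb(index_size_gb: float, shards: int) -> float:
    """Average primary shard size, assuming documents are spread evenly across shards."""
    return index_size_gb / shards


if __name__ == "__main__":
    size_gb, target_gb = 200, 30  # assumed values for illustration only
    shards = primary_shard_count(size_gb, target_gb)
    print(f"{shards} primary shards of about {shard_size_gb(size_gb, shards):.1f} GB each")
```

With a fixed index size the calculation is easy. The difficulty with time-series data is that the size of an index is not known in advance, which is the problem the rest of this article deals with.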

For time-series data, the shard size and count are usually managed using data streams, as explained in this article. However, that is not possible when documents are ingested through Amazon Data Firehose. Data streams require indexing requests to set the “op_type” parameter to “create”, which amounts to a guarantee that the data is append-only. Because Firehose sends its requests with an op_type of “index”, we need a different solution.
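To show what the op_type difference looks like at the request level, here is a small sketch that builds a `_bulk` request body with either action. It only illustrates the bulk format. The exact payload Firehose sends is not shown in this article, and the target name `logs` and the helper `bulk_body` are hypothetical.

```python
import json


def bulk_body(action: str, target: str, docs: list[dict]) -> str:
    """Build a newline-delimited _bulk body using the given action for every document."""
    if action not in ("index", "create"):
        raise ValueError("action must be 'index' or 'create'")
    lines = []
    for doc in docs:
        # Action line first, then the document source on the following line.
        lines.append(json.dumps({action: {"_index": target}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [{"@timestamp": "2024-01-01T00:00:00Z", "message": "hello"}]
    print(bulk_body("create", "logs", docs))  # the only action a data stream accepts
    print(bulk_body("index", "logs", docs))   # may overwrite an existing document with the same ID
```

The two bodies differ only in the action name, but a data stream accepts the first form and rejects the second.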

Fortunately, there’s an alternative method that offers similar benefits without the op_type limitation: rollover indices. This approach uses an alias to manage a sequence of indices. Data is written only to the most recent index (the write index), while all indices under the alias can be read from. When the write index reaches a configured size, it’s rolled over: a new index is created and becomes the write target, keeping shard sizes consistent and optimal.
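The mechanics can be sketched with a toy in-memory model. This is not how OpenSearch is implemented. It uses a document count as a stand-in for index size, and the class `RolloverAlias` is hypothetical. The one real behaviour it mirrors is the naming rule: when an index name ends in a dash and a number, rollover increments the number and zero-pads it to six digits.

```python
import re


def next_rollover_name(index_name: str) -> str:
    """Increment the numeric suffix, e.g. logs-000001 -> logs-000002."""
    match = re.fullmatch(r"(.+)-(\d+)", index_name)
    if match is None:
        raise ValueError("index name must end with '-' and a number, e.g. logs-000001")
    prefix, number = match.groups()
    return f"{prefix}-{int(number) + 1:06d}"


class RolloverAlias:
    """Toy model: one alias, many indices, exactly one write index."""

    def __init__(self, alias: str, first_index: str, max_docs: int):
        self.alias = alias
        self.max_docs = max_docs            # stand-in for a size threshold
        self.indices = {first_index: []}    # dict keeps creation order
        self.write_index = first_index

    def write(self, doc: dict) -> None:
        # Writes through the alias always land in the current write index.
        self.indices[self.write_index].append(doc)

    def maybe_rollover(self) -> bool:
        # The lifecycle policy checks the condition; older indices stay readable.
        if len(self.indices[self.write_index]) < self.max_docs:
            return False
        new_index = next_rollover_name(self.write_index)
        self.indices[new_index] = []
        self.write_index = new_index
        return True

    def search(self) -> list:
        # Reads through the alias see every index behind it.
        return [doc for docs in self.indices.values() for doc in docs]


if __name__ == "__main__":
    logs = RolloverAlias("logs", "logs-000001", max_docs=2)
    for n in range(5):
        logs.write({"n": n})
        logs.maybe_rollover()
    print(list(logs.indices), "write index:", logs.write_index)
```

In a real cluster the condition is evaluated periodically rather than after every write, so an index can grow somewhat past the threshold before it is rolled over.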

To achieve this, we need to:

Create a lifecycle policy
Create the first index and the alias
Configure Firehose to write to the alias

Examples for each of those steps follow. They apply to both self-managed OpenSearch and Amazon OpenSearch Service. For Elasticsearch, the same basic idea applies, but the policy uses a different syntax (Index Lifecycle Management, or ILM).
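As a compact overview of the three steps, the sketch below builds the request bodies involved and prints them as REST calls. The alias `logs`, the index name `logs-000001`, the policy ID `logs_rollover` and the 30 GB threshold are all hypothetical values chosen for illustration. The Firehose part is only a fragment of the OpenSearch destination configuration, with the required role, domain and S3 backup settings omitted.

```python
import json

# Hypothetical names and thresholds, chosen only for this sketch.
ALIAS = "logs"
FIRST_INDEX = "logs-000001"   # ends in "-<number>" so rollover can increment it
INDEX_PATTERN = "logs-*"
POLICY_ID = "logs_rollover"


def ism_policy(min_size: str = "30gb") -> dict:
    """Step 1: an ISM policy with a single state that rolls the index over."""
    return {
        "policy": {
            "description": "Roll over the write index once it reaches min_size",
            "default_state": "rollover",
            "states": [
                {
                    "name": "rollover",
                    "actions": [{"rollover": {"min_size": min_size}}],
                    "transitions": [],
                }
            ],
            # Attach the policy automatically to new indices matching the pattern.
            "ism_template": {"index_patterns": [INDEX_PATTERN], "priority": 100},
        }
    }


def index_template() -> dict:
    """Tell ISM which alias to move when an index matching the pattern rolls over."""
    return {
        "index_patterns": [INDEX_PATTERN],
        "template": {
            "settings": {"plugins.index_state_management.rollover_alias": ALIAS}
        },
    }


def first_index() -> dict:
    """Step 2: the first index, with the alias pointing at it as the write index."""
    return {"aliases": {ALIAS: {"is_write_index": True}}}


def firehose_destination_fragment() -> dict:
    """Step 3: write to the alias and disable Firehose's own index rotation."""
    return {"IndexName": ALIAS, "IndexRotationPeriod": "NoRotation"}


# Order matters: the policy and template must exist before the first index is created.
REQUESTS = [
    ("PUT", f"_plugins/_ism/policies/{POLICY_ID}", ism_policy()),
    ("PUT", f"_index_template/{POLICY_ID}", index_template()),
    ("PUT", FIRST_INDEX, first_index()),
]

if __name__ == "__main__":
    for method, path, body in REQUESTS:
        print(f"{method} {path}\n{json.dumps(body, indent=2)}\n")
    print("Firehose destination fragment:")
    print(json.dumps(firehose_destination_fragment(), indent=2))
```

Disabling rotation on the Firehose side matters because a rotation period would make Firehose append a timestamp to the index name, so it would no longer write to the alias.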

Rehan
