Retrieval Augmented Generation (RAG) is a well-known approach to building generative AI applications. RAG combines large language models (LLMs) with retrieval of external knowledge and is increasingly popular for improving the accuracy and personalization of AI applications. It retrieves relevant information from external sources, augments the input with this data, and generates responses grounded in both. This approach reduces hallucinations, improves factual accuracy, and allows for up-to-date, efficient, and explainable AI systems. RAG’s ability to overcome the limitations of standalone language models has made it applicable to a broad range of AI use cases.
Amazon OpenSearch Service is a versatile search and analytics service. It can perform security analytics, search data, analyze logs, and handle many other tasks. It can also work with vector data through its k-nearest neighbors (k-NN) plugin, which makes it useful for more complex search strategies. Because of this capability, OpenSearch Service can serve as a knowledge base for generative AI applications that integrate language generation with search results.
By preserving context over several exchanges, honing responses, and providing a more seamless user experience, conversational search enhances RAG. It helps with complex information needs, resolves ambiguities, and manages multi-turn reasoning. Conversational search provides a more natural and personalized interaction, yielding more accurate and pertinent results, even though standard RAG performs well for single queries.
In this post, we explore conversational search, its architecture, and various ways to implement it.
Solution overview
Let’s walk through the solution to build conversational search. The following diagram illustrates the solution architecture.
Conversational search is built with the OpenSearch agents and tools feature. Agents orchestrate a variety of machine learning (ML) tasks to build sophisticated AI applications, and each agent uses a number of tools, each intended for a particular function. To use agents and tools, you need OpenSearch version 2.13 or later.
Prerequisites
To implement this solution, you need an AWS account. If you don’t have one, you can create an account. You also need an OpenSearch Service domain with OpenSearch version 2.13 or later. You can use an existing domain or create a new domain.
To use the Amazon Titan Text Embedding and Anthropic Claude V1 models in Amazon Bedrock, you need to enable access to these foundation models (FMs). For instructions, refer to Add or remove access to Amazon Bedrock foundation models.
Configure IAM permissions
Complete the following steps to set up an AWS Identity and Access Management (IAM) role and user with appropriate permissions:
- Create an IAM role with the following policy that will allow the OpenSearch Service domain to invoke the Amazon Bedrock API:
{ "Version": "2012-10-17", "Statement": [ { "Sid": "Statement1", "Effect": "Allow", "Action": [ "bedrock:InvokeAgent", "bedrock:InvokeModel" ], "Resource": [ "arn:aws:bedrock:${Region}::foundation-model/amazon.titan-embed-text-v1", "arn:aws:bedrock: ${Region}::foundation-model/anthropic.claude-instant-v1" ] } ] }
Update the Resource section to reflect the AWS Region and the models you use.
- Add opensearchservice.amazonaws.com as a trusted entity (a sample trust policy is shown after these steps).
- Make a note of the IAM role Amazon Resource Name (ARN).
- Assign the preceding policy to the IAM user that will create a connector.
- Create a passRole policy and assign it to the IAM user that will create the connector using Python:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::${AccountId}:role/OpenSearchBedrock"
    }
  ]
}
- Map the IAM role you created to the OpenSearch Service domain role (for example, by adding it as a backend role through fine-grained access control in OpenSearch Dashboards).
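For reference, the trust relationship on the role can look like the following minimal sketch, which allows OpenSearch Service to assume the role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "opensearchservice.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}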
Establish a connection to the Amazon Bedrock model using the MLCommons plugin
An embedding model transforms input data, such as words or images, into numerical vectors in a continuous space in order to identify patterns and relationships. Similar objects are grouped together, making it easier for AI systems to understand and respond to complex user queries.
Semantic search concentrates on the purpose and meaning of a query. OpenSearch stores data in a vector index for retrieval and transforms it into dense vectors (lists of numbers) using text embedding models. We are using amazon.titan-embed-text-v1 hosted on Amazon Bedrock, but you will need to evaluate and choose the right model for your use case. The amazon.titan-embed-text-v1 model maps sentences and paragraphs to a 1,536-dimensional dense vector space and is optimized for the task of semantic search.
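To get a feel for what the embedding model returns, you can invoke it directly with boto3 before connecting it to OpenSearch. The following is a minimal sketch; the sample text is illustrative, and it assumes model access is already enabled in Amazon Bedrock:

import json
import boto3

# Call the Titan text embedding model directly and inspect the vector it returns
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
response = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    body=json.dumps({"inputText": "Sachin Tendulkar scored 15,921 runs in Test cricket."}),
    contentType="application/json",
    accept="application/json",
)
embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # 1,536-dimensional vector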
Complete the following steps to establish a connection to the Amazon Bedrock model using the MLCommons plugin:
- Establish a connection by using the Python client with the connection blueprint.
- Modify the values of the host and region parameters in the provided code block. For this example, we’re running the program in Visual Studio Code with Python version 3.9.6, but newer versions should also work.
- For the role ARN, use the ARN you created earlier, and run the following script using the credentials of the IAM user you created:
import boto3
import requests
from requests_aws4auth import AWS4Auth

host = "https://search-test.us-east-1.es.amazonaws.com/"
region = 'us-east-1'
service = "es"
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

path = "_plugins/_ml/connectors/_create"
url = host + path

payload = {
    "name": "Amazon Bedrock Connector: embedding",
    "description": "The connector to bedrock Titan embedding model",
    "version": 1,
    "protocol": "aws_sigv4",
    "parameters": {
        "region": "us-east-1",
        "service_name": "bedrock",
        "model": "amazon.titan-embed-text-v1"
    },
    "credential": {
        "roleArn": "arn:aws:iam::accountId:role/opensearch_bedrock_external"
    },
    "actions": [
        {
            "action_type": "predict",
            "method": "POST",
            "url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/${parameters.model}/invoke",
            "headers": {
                "content-type": "application/json",
                "x-amz-content-sha256": "required"
            },
            "request_body": "{ \"inputText\": \"${parameters.inputText}\" }",
            "pre_process_function": "connector.pre_process.bedrock.embedding",
            "post_process_function": "connector.post_process.bedrock.embedding"
        }
    ]
}

headers = {"Content-Type": "application/json"}
r = requests.post(url, auth=awsauth, json=payload, headers=headers, timeout=15)
print(r.status_code)
print(r.text)

- Run the Python program. This will return connector_id.

python3 connect_bedrocktitanembedding.py
200
{"connector_id":"nbBe65EByVCe3QrFhrQ2"}
- Create a model group against which this model will be registered in the OpenSearch Service domain:
POST /_plugins/_ml/model_groups/_register {
  "name": "embedding_model_group",
  "description": "A model group for bedrock embedding models"
}
You get the following output:
{ "model_group_id": "1rBv65EByVCe3QrFXL6O", "status": "CREATED" }
- Register a model using connector_id and model_group_id:

POST /_plugins/_ml/models/_register {
  "name": "titan_text_embedding_bedrock",
  "function_name": "remote",
  "model_group_id": "1rBv65EByVCe3QrFXL6O",
  "description": "test model",
  "connector_id": "nbBe65EByVCe3QrFhrQ2",
  "interface": {}
}
You get the following output:
{
"task_id": "2LB265EByVCe3QrFAb6R",
"status": "CREATED",
"model_id": "2bB265EByVCe3QrFAb60"
}
- Deploy a model using the model ID:
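For example, with the model_id returned in the previous step, the standard ML Commons deploy call looks like the following (substitute your own model ID):

POST /_plugins/_ml/models/2bB265EByVCe3QrFAb60/_deploy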
You get the following output:
{
"task_id": "bLB665EByVCe3QrF-slA",
"task_type": "DEPLOY_MODEL",
"status": "COMPLETED"
}
Now the model is deployed, and you can see it in OpenSearch Dashboards on the OpenSearch Plugins page.
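You can also check the deployment from the API; for example, the get model API (using the same model ID) reports the model state:

GET /_plugins/_ml/models/2bB265EByVCe3QrFAb60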
Create an ingestion pipeline for data indexing
Use the following code to create an ingestion pipeline for data indexing. The pipeline connects to the embedding model, generates an embedding for each incoming document, and stores it in the index.
PUT /_ingest/pipeline/cricket_data_pipeline {
"description": "batting score summary embedding pipeline",
"processors": [
{
"text_embedding": {
"model_id": "GQOsUJEByVCe3QrFfUNq",
"field_map": {
"cricket_score": "cricket_score_embedding"
}
}
}
]
}
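Optionally, you can verify that the pipeline generates embeddings before indexing real data by simulating it with a sample document (the text below is illustrative):

POST /_ingest/pipeline/cricket_data_pipeline/_simulate {
  "docs": [
    {
      "_source": {
        "cricket_score": "A short batting summary used only to test the pipeline"
      }
    }
  ]
}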
Create an index for storing data
Create an index for storing data (for this example, the cricket achievements of batsmen). The index stores the raw summary text along with its 1,536-dimension embedding, which is generated by the ingest pipeline created in the previous step.
PUT cricket_data {
"mappings": {
"properties": {
"cricket_score": {
"type": "text"
},
"cricket_score_embedding": {
"type": "knn_vector",
"dimension": 1536,
"space_type": "l2",
"method": {
"name": "hnsw",
"engine": "faiss"
}
}
}
},
"settings": {
"index": {
"knn": "true"
}
}
}
Ingest sample data
Use the following code to ingest the sample data for four batsmen:
POST _bulk?pipeline=cricket_data_pipeline
{"index": {"_index": "cricket_data"}}
{"cricket_score": "Sachin Tendulkar, often hailed as the 'God of Cricket,' amassed an extraordinary batting record throughout his 24-year international career. In Test cricket, he played 200 matches, scoring a staggering 15,921 runs at an average of 53.78, including 51 centuries and 68 half-centuries, with a highest score of 248 not out. His One Day International (ODI) career was equally impressive, spanning 463 matches where he scored 18,426 runs at an average of 44.83, notching up 49 centuries and 96 half-centuries, with a top score of 200 not out – the first double century in ODI history. Although he played just one T20 International, scoring 10 runs, his overall batting statistics across formats solidified his status as one of cricket's all-time greats, setting numerous records that stand to this day."}
{"index": {"_index": "cricket_data"}}
{"cricket_score": "Virat Kohli, widely regarded as one of the finest batsmen of his generation, has amassed impressive statistics across all formats of international cricket. As of April 2024, in Test cricket, he has scored over 8,000 runs with an average exceeding 50, including numerous centuries. His One Day International (ODI) record is particularly stellar, with more than 12,000 runs at an average well above 50, featuring over 40 centuries. In T20 Internationals, Kohli has maintained a high average and scored over 3,000 runs. Known for his exceptional ability to chase down targets in limited-overs cricket, Kohli has consistently ranked among the top batsmen in ICC rankings and has broken several batting records throughout his career, cementing his status as a modern cricket legend."}
{"index": {"_index": "cricket_data"}}
{"cricket_score": "Adam Gilchrist, the legendary Australian wicketkeeper-batsman, had an exceptional batting record across formats during his international career from 1996 to 2008. In Test cricket, Gilchrist scored 5,570 runs in 96 matches at an impressive average of 47.60, including 17 centuries and 26 half-centuries, with a highest score of 204 not out. His One Day International (ODI) record was equally remarkable, amassing 9,619 runs in 287 matches at an average of 35.89, with 16 centuries and 55 half-centuries, and a top score of 172. Gilchrist's aggressive batting style and ability to change the course of a game quickly made him one of the most feared batsmen of his era. Although his T20 International career was brief, his overall batting statistics, combined with his wicketkeeping skills, established him as one of cricket's greatest wicketkeeper-batsmen."}
{"index": {"_index": "cricket_data"}}
{"cricket_score": "Brian Lara, the legendary West Indian batsman, had an extraordinary batting record in international cricket during his career from 1990 to 2007. In Test cricket, Lara amassed 11,953 runs in 131 matches at an impressive average of 52.88, including 34 centuries and 48 half-centuries. He holds the record for the highest individual score in a Test innings with 400 not out, as well as the highest first-class score of 501 not out. In One Day Internationals (ODIs), Lara scored 10,405 runs in 299 matches at an average of 40.48, with 19 centuries and 63 half-centuries. His highest ODI score was 169. Known for his elegant batting style and ability to play long innings, Lara's exceptional performances, particularly in Test cricket, cemented his status as one of the greatest batsmen in the history of the game."}
Deploy the LLM for response generation
Complete the following steps to deploy the LLM for response generation. Modify the values of host, region, and roleArn in the provided code block.
- Create a connector by running the following Python program. Run the script using the credentials of the IAM user created earlier.
import boto3
import requests
from requests_aws4auth import AWS4Auth

host = "https://search-test.us-east-1.es.amazonaws.com/"
region = 'us-east-1'
service = "es"
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

path = "_plugins/_ml/connectors/_create"
url = host + path

payload = {
    "name": "BedRock Claude instant-v1 Connector",
    "description": "The connector to BedRock service for claude model",
    "version": 1,
    "protocol": "aws_sigv4",
    "parameters": {
        "region": "us-east-1",
        "service_name": "bedrock",
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens_to_sample": 8000,
        "temperature": 0.0001,
        "response_filter": "$.completion"
    },
    "credential": {
        "roleArn": "arn:aws:iam::accountId:role/opensearch_bedrock_external"
    },
    "actions": [
        {
            "action_type": "predict",
            "method": "POST",
            "url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/anthropic.claude-instant-v1/invoke",
            "headers": {
                "content-type": "application/json",
                "x-amz-content-sha256": "required"
            },
            "request_body": "{\"prompt\":\"${parameters.prompt}\", \"max_tokens_to_sample\":${parameters.max_tokens_to_sample}, \"temperature\":${parameters.temperature}, \"anthropic_version\":\"${parameters.anthropic_version}\" }"
        }
    ]
}

headers = {"Content-Type": "application/json"}
r = requests.post(url, auth=awsauth, json=payload, headers=headers, timeout=15)
print(r.status_code)
print(r.text)
If it runs successfully, it returns a connector_id and a 200 response code:
200
{"connector_id":"LhLSZ5MBLD0avmh1El6Q"}
- Create a model group for this model:
POST /_plugins/_ml/model_groups/_register {
  "name": "claude_model_group",
  "description": "This is an example description"
}
This will return model_group_id; make a note of it:
{
"model_group_id": "LxLTZ5MBLD0avmh1wV4L",
"status": "CREATED"
}
- Register a model using connector_id and model_group_id:

POST /_plugins/_ml/models/_register {
  "name": "anthropic.claude-v1",
  "function_name": "remote",
  "model_group_id": "LxLTZ5MBLD0avmh1wV4L",
  "description": "LLM model",
  "connector_id": "LhLSZ5MBLD0avmh1El6Q",
  "interface": {}
}
It returns model_id and task_id:
{
"task_id": "YvbVZ5MBtVAPFbeA7ou7",
"status": "CREATED",
"model_id": "Y_bVZ5MBtVAPFbeA7ovb"
}
- Finally, deploy the model using an API:
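For example, with the model_id returned by the register call, the standard ML Commons deploy call looks like the following:

POST /_plugins/_ml/models/Y_bVZ5MBtVAPFbeA7ovb/_deploy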
The status shows as COMPLETED, which means the model is successfully deployed.
{
"task_id": "efbvZ5MBtVAPFbeA7otB",
"task_type": "DEPLOY_MODEL",
"status": "COMPLETED"
}
Create an agent in OpenSearch Service
An agent orchestrates and runs ML models and tools. A tool performs a set of specific tasks. For this post, we use the following tools:
- VectorDBTool – The agent uses this tool to retrieve OpenSearch documents relevant to the user question.
- MLModelTool – This tool generates user responses based on prompts and OpenSearch documents.

Use the embedding model_id in VectorDBTool and the LLM model_id in MLModelTool:
POST /_plugins/_ml/agents/_register {
"name": "cricket score data analysis agent",
"type": "conversational_flow",
"description": "This is a demo agent for cricket data analysis",
"app_type": "rag",
"memory": {
"type": "conversation_index"
},
"tools": [
{
"type": "VectorDBTool",
"name": "cricket_knowledge_base",
"parameters": {
"model_id": "2bB265EByVCe3QrFAb60",
"index": "cricket_data",
"embedding_field": "cricket_score_embedding",
"source_field": [
"cricket_score"
],
"input": "${parameters.question}"
}
},
{
"type": "MLModelTool",
"name": "bedrock_claude_model",
"description": "A general tool to answer any question",
"parameters": {
"model_id": "gbcfIpEByVCe3QrFClUp",
"prompt": "\n\nHuman:You are a professional data analysist. You will always answer question based on the given context first. If the answer is not directly shown in the context, you will analyze the data and find the answer. If you don't know the answer, just say don't know. \n\nContext:\n${parameters.cricket_knowledge_base.output:-}\n\n${parameters.chat_history:-}\n\nHuman:${parameters.question}\n\nAssistant:"
}
}
]
}
This returns an agent ID; take note of it and substitute it for <agent_id> in the subsequent API calls.
Query the index
We have batting scores of four batsmen in the index. For the first query, let’s specify the player name:
POST /_plugins/_ml/agents/<agent_id>/_execute {
  "parameters": {
    "question": "What is the batting score of Sachin Tendulkar?"
  }
}
Based on context and available information, it returns the batting score of Sachin Tendulkar. Note the memory_id from the response; you will need it for subsequent questions in the next steps.
We can ask a follow-up question. This time, we don’t specify the player name and expect it to answer based on the earlier question:
POST /_plugins/_ml/agents/<agent_id>/_execute {
  "parameters": {
    "question": "How many T20 international matches did he play?",
    "next_action": "then compare with Virat Kohli's score",
    "memory_id": "so-vAJMByVCe3QrFYO7j",
    "message_history_limit": 5,
    "prompt": "\n\nHuman:You are a professional data analyst. You will always answer questions based on the given context first. If the answer is not directly shown in the context, you will analyze the data and find the answer. If you don't know the answer, just say you don't know. \n\nContext:\n${parameters.cricket_knowledge_base.output:-}\n\n${parameters.chat_history:-}\n\nHuman:always learn useful information from chat history\nHuman:${parameters.question}, ${parameters.next_action}\n\nAssistant:"
  }
}
In the preceding API, we use the following parameters:
- question and next_action – We pass the user question along with the next action, which asks the model to compare Sachin's score with Virat's score.
- memory_id – This is the memory assigned to this conversation. Use the same memory_id for subsequent questions.
- prompt – This is the prompt you give to the LLM. It includes the user's question and the next action. The LLM should answer only using the data indexed in OpenSearch and must not invent any information. This way, you prevent hallucination.
Refer to ML Model tool for more details about setting up these parameters, and to the GitHub repo for remote inference blueprints.
The tool stores the conversation history of the questions and answers in the OpenSearch index, which is used to refine answers by asking follow-up questions.
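To inspect what has been stored for a conversation, you can use the ML Commons memory APIs; for example, with the memory_id returned by the agent (this assumes OpenSearch 2.12 or later, where the memory APIs are available):

GET /_plugins/_ml/memory/so-vAJMByVCe3QrFYO7j/messages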
In real-world scenarios, you can map memory_id against the user's profile to preserve the context and isolate the user's conversation history.
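As a minimal sketch of that idea, assuming a simple in-memory mapping (in production you would persist this in your user profile store), the application can look up and reuse the same memory_id per user:

from typing import Dict, Optional

# Hypothetical helper: keep one OpenSearch memory per application user
user_memory_map: Dict[str, str] = {}

def get_memory_id(user_id: str) -> Optional[str]:
    """Return the user's memory_id, or None if this is their first conversation."""
    return user_memory_map.get(user_id)

def save_memory_id(user_id: str, memory_id: str) -> None:
    """Store the memory_id returned by the agent's first _execute response."""
    user_memory_map[user_id] = memory_id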
We have demonstrated how to create a conversational search application using the built-in features of OpenSearch Service.
Clean up
To avoid incurring future charges, delete the resources created while building this solution, such as the OpenSearch Service domain and the IAM role and user.
Conclusion
In this post, we demonstrated how to use OpenSearch agents and tools to create a RAG pipeline with conversational search. This configuration manages the entire flow: it integrates with ML models, vectorizes questions, and interacts with LLMs using augmented prompts. This approach lets you quickly develop production-ready AI assistants without having to start from scratch.
If you’re building a RAG pipeline with conversational history to let users ask follow-up questions for more refined answers, give it a try and share your feedback or questions in the comments!
About the author
Bharav Patel is a Specialist Solution Architect, Analytics at Amazon Web Services. He primarily works on Amazon OpenSearch Service and helps customers with key concepts and design principles of running OpenSearch workloads on the cloud. Bharav likes to explore new places and try out different cuisines.