add sagemaker embedding model tutorial

Signed-off-by: Yaliang Wu <[email protected]>
ylwu-amzn · Jan 25, 2024 · bfe0be3 · bfe0be3
1 parent bba0f08
commit bfe0be3
Showing 4 changed files with 424 additions and 6 deletions.
diff --git a/docs/remote_inference_blueprints/sagemaker_connector_blueprint.md b/docs/remote_inference_blueprints/sagemaker_connector_blueprint.md
@@ -1,5 +1,28 @@
 ### Sagemaker connector blueprint example for embedding:
 
+For more details, see https://opensearch.org/docs/latest/ml-commons-plugin/remote-models/blueprints/
+
+Make sure the input to your SageMaker model follows this format, so that the [default pre-process function](https://opensearch.org/docs/latest/ml-commons-plugin/remote-models/blueprints/#preprocessing-function) can work:
+```
+["hello world", "how are you"]
+```
+and that its output follows this format, so that the [default post-process function](https://opensearch.org/docs/latest/ml-commons-plugin/remote-models/blueprints/#post-processing-function) can work:
+```
+[
+  [
+    -0.048237994,
+    -0.07612697,
+    ...
+  ],
+  [
+    0.32621247,
+    0.02328475,
+    ...
+  ]
+]
+```
+
+Then you can create a connector for the SageMaker embedding model using the default pre- and post-process functions:
 ```json
 POST /_plugins/_ml/connectors/_create
 {
@@ -23,8 +46,10 @@ POST /_plugins/_ml/connectors/_create
       "headers": {
         "content-type": "application/json"
       },
-      "url": "<PLEASE ADD YOUR Sagemaker MODEL ENDPOINT URL>",
-      "request_body": "<PLEASE ADD YOUR REQUEST BODY. Example: ${parameters.inputs}>"
+      "url": "<PLEASE ADD YOUR Sagemaker MODEL INFERENCE ENDPOINT URL>",
+      "request_body": "${parameters.inputs}",
+      "pre_process_function": "connector.pre_process.default.embedding",
+      "post_process_function": "connector.post_process.default.embedding"
     }
   ]
 }
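
Review note: the format requirements added in the hunk above can be checked locally before the connector is created. The following is a minimal sketch and not part of the commit. The helper names `is_valid_embedding_input` and `is_valid_embedding_output` are hypothetical, and the checks that the input list is non-empty and that the model returns one vector per input string are assumptions drawn from the two-sentence example in the blueprint.

```python
import json
from numbers import Real


def is_valid_embedding_input(payload: str) -> bool:
    """Return True if payload is a non-empty JSON array of strings."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(data, list)
        and len(data) > 0
        and all(isinstance(item, str) for item in data)
    )


def is_valid_embedding_output(payload: str, expected_count: int) -> bool:
    """Return True if payload is a JSON array of `expected_count` numeric vectors.

    Assumption: one non-empty vector per input string, as in the blueprint example.
    """
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, list) or len(data) != expected_count:
        return False
    return all(
        isinstance(vec, list)
        and len(vec) > 0
        # bool is a subclass of int in Python, so exclude it explicitly
        and all(isinstance(v, Real) and not isinstance(v, bool) for v in vec)
        for vec in data
    )
```

For example, `is_valid_embedding_input('["hello world", "how are you"]')` accepts the input shown in the blueprint, while a wrapped body such as `{"inputs": [...]}` is rejected and would likely need a custom pre-process function instead.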

diff --git a/docs/tutorials/aws/semantic_search_with_bedrock_titan_embedding_model.md b/docs/tutorials/aws/semantic_search_with_bedrock_titan_embedding_model.md
@@ -1,5 +1,9 @@
 # Topic
 
+> The easiest way to set up the Bedrock Titan embedding model on your Amazon OpenSearch cluster is to use [AWS CloudFormation](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/cfn-template.html).
+
+> This tutorial explains the detailed steps for configuring everything manually.
+
 This doc introduces how to build semantic search in Amazon managed OpenSearch with [Bedrock Titan embedding model](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html).
 If you are not using Amazon OpenSearch, you can refer to [bedrock_connector_titan_embedding_blueprint](https://github.com/opensearch-project/ml-commons/blob/2.x/docs/remote_inference_blueprints/bedrock_connector_titan_embedding_blueprint.md).
 
@@ -13,7 +17,7 @@ Go to AWS OpenSearch console UI and create OpenSearch domain.
 
 Copy the domain ARN which will be used in later steps.
 
-## 1. Create IAM role to access Bedrock model
+## 1. Create IAM role to invoke Bedrock model
 To invoke Bedrock model, we need to create an IAM role with proper permission.
 This IAM role will be configured in connector. Connector will use this role to invoke Bedrock model.
 
@@ -60,7 +64,7 @@ Generate a new IAM role specifically for signing your create connector request.
 
 
 Create IAM role `my_create_bedrock_connector_role` with 
-- Custom trust policy, `your_iam_user_arn` is the IAM user which will run `aws sts assume-role` in step 3.1
+- Custom trust policy. Note: `your_iam_user_arn` is the ARN of the IAM user that will run `aws sts assume-role` in step 3.1
 ```
 {
     "Version": "2012-10-17",
@@ -149,7 +153,7 @@ payload = {
   "version": 1,
   "protocol": "aws_sigv4",
   "parameters": {
-    "region": "your_amazon_opensearch_domain_region",
+    "region": "your_bedrock_model_region",
     "service_name": "bedrock"
   },
   "credential": {

diff --git a/docs/tutorials/aws/semantic_search_with_cohere_embedding_model.md b/docs/tutorials/aws/semantic_search_with_cohere_embedding_model.md
@@ -1,5 +1,9 @@
 # Topic
 
+> The easiest way to set up the Bedrock Titan embedding model on your Amazon OpenSearch cluster is to use [AWS CloudFormation](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/cfn-template.html).
+
+> This tutorial explains the detailed steps for configuring everything manually. You can connect to other services in a similar way.
+
 This doc introduces how to build semantic search in Amazon managed OpenSearch with [Cohere embedding model](https://docs.cohere.com/reference/embed).
 If you are not using Amazon OpenSearch, you can refer to [cohere_v3_connector_embedding_blueprint](https://github.com/opensearch-project/ml-commons/blob/2.x/docs/remote_inference_blueprints/cohere_v3_connector_embedding_blueprint.md) and [OpenSearch semantic search](https://opensearch.org/docs/latest/search-plugins/semantic-search/).
 
@@ -72,7 +76,7 @@ Generate a new IAM role specifically for signing your create connector request.
 
 
 Create IAM role `my_create_connector_role` with 
-- Custom trust policy, `your_iam_user_arn` is the IAM user which will run `aws sts assume-role` in step 4.1
+- Custom trust policy. Note: `your_iam_user_arn` is the ARN of the IAM user that will run `aws sts assume-role` in step 4.1
 ```
 {
     "Version": "2012-10-17",