Semantic search using Cohere Embed on Amazon Bedrock

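This tutorial shows you how to implement semantic search in Amazon OpenSearch Service using the Cohere Embed model on Amazon Bedrock. For more information, see Semantic search.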

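If you are using Python, you can create a Cohere connector and test the model using the opensearch-py-ml client CLI. The CLI automates many configuration steps, which speeds up setup and reduces the chance of errors. For more information about using the CLI, see the CLI documentation.

If you are using self-managed OpenSearch instead of Amazon OpenSearch Service, create a connector to the model on Amazon Bedrock using the blueprint. For more information about creating a connector, see Connectors.

The easiest way to set up an embedding model in Amazon OpenSearch Service is to use AWS CloudFormation. Alternatively, you can set up an embedding model using the AIConnectorHelper notebook.

Amazon Bedrock has a quota limit. For more information about increasing this limit, see Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock.

Replace the placeholders beginning with the prefix your_ with your own values.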
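Because the same placeholders appear in several requests, it can help to check that none were missed before sending a request. The following is a minimal Python sketch, assuming placeholders follow the your_ naming used in this tutorial; the function name find_placeholders is hypothetical. It lists any placeholders that remain in a request body or script.

```python
import re

# Matches placeholders such as your_connector_id or your_iam_role_arn_created_in_step2.1.
# The optional ".<digits>" suffix covers step numbers like "step2.1" without swallowing
# host names such as "your_bedrock_model_region.amazonaws.com".
PLACEHOLDER = re.compile(r"\byour_\w+(?:\.\d+)?")


def find_placeholders(text: str) -> list[str]:
    """Return the distinct your_ placeholders left in text, in order of first appearance."""
    seen: list[str] = []
    for match in PLACEHOLDER.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


if __name__ == "__main__":
    body = '{"credential": {"roleArn": "your_iam_role_arn_created_in_step1"}}'
    print(find_placeholders(body))
```

Run it on each request body before sending it. An empty list means that every placeholder this pattern recognizes has been replaced.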

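Prerequisite: Create an OpenSearch cluster

Go to the Amazon OpenSearch Service console and create an OpenSearch domain.

Note the domain Amazon Resource Name (ARN); you'll use it in subsequent steps.

Step 1: Create an IAM role to invoke the model on Amazon Bedrock

To invoke the model on Amazon Bedrock, you must create an AWS Identity and Access Management (IAM) role with appropriate permissions. The connector will use this role to invoke the model.

Go to the IAM console, create a new IAM role named my_invoke_bedrock_cohere_role, and add the following trust policy and permissions:

Custom trust policy: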

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "es.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "bedrock:InvokeModel"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:bedrock:*::foundation-model/cohere.embed-english-v3"
        }
    ]
}

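If you need a model that supports multiple languages, you can use the cohere.embed-multilingual-v3 model instead. In that case, replace cohere.embed-english-v3 with cohere.embed-multilingual-v3 both in the Resource ARN of the permissions policy and in the connector URL in Step 3.2.

Note the role ARN; you'll use it in subsequent steps.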
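If you script the IAM setup, you can generate both policy documents from Step 1 from the model ID, so switching to cohere.embed-multilingual-v3 becomes a one-argument change. The following minimal sketch only builds the JSON documents. The function name build_invoke_role_policies is hypothetical, and creating the role itself (for example, in the IAM console) is not shown.

```python
import json


def build_invoke_role_policies(model_id: str = "cohere.embed-english-v3") -> tuple[dict, dict]:
    """Return (trust_policy, permissions_policy) for the role that the connector uses to call Bedrock."""
    # Lets Amazon OpenSearch Service (es.amazonaws.com) assume the role.
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "es.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    # Allows invoking only the chosen foundation model, in any Region.
    permissions_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": ["bedrock:InvokeModel"],
                "Effect": "Allow",
                "Resource": f"arn:aws:bedrock:*::foundation-model/{model_id}",
            }
        ],
    }
    return trust_policy, permissions_policy


if __name__ == "__main__":
    _, permissions = build_invoke_role_policies("cohere.embed-multilingual-v3")
    print(json.dumps(permissions, indent=4))
```

Paste the printed documents into the IAM console. Calling the function with its default argument reproduces the policies shown above.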

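Step 2: Configure an IAM role in Amazon OpenSearch Service

Follow these steps to configure an IAM role in Amazon OpenSearch Service.

Step 2.1: Create an IAM role for signing connector requests

Create a new IAM role used specifically for signing your Create Connector API request.

Create an IAM role named my_create_bedrock_cohere_connector_role with the following trust policy and permissions:

Custom trust policy: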

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "your_iam_user_arn"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

You'll use the your_iam_user_arn IAM user to assume the role in Step 3.1.

Permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "your_iam_role_arn_created_in_step1"
        },
        {
            "Effect": "Allow",
            "Action": "es:ESHttpPost",
            "Resource": "your_opensearch_domain_arn"
        }
    ]
}

Note this role ARN; you'll use it in subsequent steps.
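A common mistake in Step 2.1 is pasting a role name, or the wrong kind of ARN, into one of the three placeholders. The following minimal sketch checks the values against the standard ARN layout (arn:partition:service:region:account-id:resource). The function names and the expected resource prefixes (user/, role/, domain/) are assumptions based on the roles used in this tutorial.

```python
def parse_arn(arn: str) -> dict[str, str]:
    """Split arn:partition:service:region:account-id:resource into named parts."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    keys = ("prefix", "partition", "service", "region", "account", "resource")
    return dict(zip(keys, parts))


def check_connector_role_inputs(user_arn: str, invoke_role_arn: str, domain_arn: str) -> list[str]:
    """Return a list of problems with the ARNs used in Step 2.1 (empty if none are found)."""
    # (label, ARN, expected service, expected resource prefix)
    expectations = [
        ("IAM user ARN", user_arn, "iam", "user/"),
        ("Step 1 role ARN", invoke_role_arn, "iam", "role/"),
        ("OpenSearch domain ARN", domain_arn, "es", "domain/"),
    ]
    problems = []
    for label, arn, service, prefix in expectations:
        try:
            parts = parse_arn(arn)
        except ValueError as err:
            problems.append(f"{label}: {err}")
            continue
        if parts["service"] != service or not parts["resource"].startswith(prefix):
            problems.append(
                f"{label}: expected service {service!r} and a resource starting with {prefix!r}, got {arn!r}"
            )
    return problems
```

An empty result only means that the values are shaped correctly. It does not confirm that the user, role, or domain exists.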

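Step 2.2: Map a backend role

Follow these steps to map a backend role:

1. Log in to OpenSearch Dashboards and select Security on the top menu.
2. Select Roles, and then select the ml_full_access role.
3. On the ml_full_access role details page, select Mapped users, and then select Manage mapping.
4. Enter the ARN of the IAM role created in Step 2.1 in the Backend roles field, as shown in the following image.
5. Select Map.

The IAM role is now successfully configured in your OpenSearch cluster.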
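If you prefer an API call to the Dashboards steps, the Security plugin exposes role mappings at /_plugins/_security/api/rolesmapping/<role>. A PUT request to this path replaces the whole mapping, so existing entries should be carried over. The following minimal sketch only builds the PUT body from the inner object returned by a GET request to the same path. It assumes that fine-grained access control is enabled and that you call the API as a user permitted to manage role mappings. The function name merged_role_mapping is hypothetical.

```python
ROLE = "ml_full_access"
# Security plugin REST API path for this role's mapping (GET reads it, PUT replaces it).
MAPPING_PATH = f"/_plugins/_security/api/rolesmapping/{ROLE}"


def merged_role_mapping(existing: dict, backend_role_arn: str) -> dict:
    """Build a PUT body that adds backend_role_arn while keeping the current mapping.

    `existing` is the object under the role name in the GET response, for example
    {"backend_roles": [...], "hosts": [...], "users": [...]}.
    """
    backend_roles = list(existing.get("backend_roles", []))
    if backend_role_arn not in backend_roles:
        backend_roles.append(backend_role_arn)
    return {
        "backend_roles": backend_roles,
        "hosts": list(existing.get("hosts", [])),
        "users": list(existing.get("users", [])),
    }
```

Preserving users and hosts avoids accidentally unmapping users, such as the master user, that were already mapped to ml_full_access.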

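Step 3: Create a connector

Follow these steps to create a connector for the model. For more information about creating connectors, see Connectors.

Step 3.1: Get temporary credentials

Use the credentials of the IAM user specified in Step 2.1 to assume the role:

aws sts assume-role --role-arn your_iam_role_arn_created_in_step2.1 --role-session-name your_session_name

Copy the temporary credentials from the response and configure them in ~/.aws/credentials: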

[default]
aws_access_key_id=your_access_key_of_role_created_in_step2.1
aws_secret_access_key=your_secret_key_of_role_created_in_step2.1
aws_session_token=your_session_token_of_role_created_in_step2.1
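The assume-role response nests the three values under Credentials (AccessKeyId, SecretAccessKey, and SessionToken), alongside an Expiration timestamp after which they stop working. The following minimal sketch renders that JSON as a credentials-file profile using only the standard library. The function name is hypothetical, and the sketch returns text instead of writing ~/.aws/credentials so that it cannot overwrite other profiles.

```python
import configparser
import io
import json


def credentials_file_from_assume_role(assume_role_json: str, profile: str = "default") -> str:
    """Render the Credentials block of `aws sts assume-role` output as a credentials-file profile."""
    creds = json.loads(assume_role_json)["Credentials"]
    config = configparser.ConfigParser()
    config[profile] = {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
    out = io.StringIO()
    config.write(out)
    return out.getvalue()
```

Because the credentials expire, rerun the assume-role command and regenerate the profile if Step 3.2 fails with an expired-token error.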

Step 3.2: Create a connector

Run the following Python code using the temporary credentials configured in ~/.aws/credentials:

import boto3
import requests
from requests_aws4auth import AWS4Auth

# Domain endpoint, including the https:// scheme and without a trailing slash
host = 'your_amazon_opensearch_domain_endpoint'
region = 'your_amazon_opensearch_domain_region'
service = 'es'

# Sign requests with the temporary credentials configured in ~/.aws/credentials (Step 3.1)
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

path = '/_plugins/_ml/connectors/_create'
url = host + path

payload = {
  "name": "Amazon Bedrock Cohere Connector: embedding v3",
  "description": "The connector to Bedrock Cohere embedding model",
  "version": 1,
  "protocol": "aws_sigv4",
  "parameters": {
    "region": "your_bedrock_model_region",
    "service_name": "bedrock",
    "input_type":"search_document",
    "truncate": "END"
  },
  "credential": {
    "roleArn": "your_iam_role_arn_created_in_step1"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://bedrock-runtime.your_bedrock_model_region.amazonaws.com/model/cohere.embed-english-v3/invoke",
      "headers": {
        "content-type": "application/json",
        "x-amz-content-sha256": "required"
      },
      "request_body": "{ \"texts\": ${parameters.texts}, \"truncate\": \"${parameters.truncate}\", \"input_type\": \"${parameters.input_type}\" }",
      "pre_process_function": "connector.pre_process.cohere.embedding",
      "post_process_function": "connector.post_process.cohere.embedding"
    }
  ]
}

headers = {"Content-Type": "application/json"}

# Send the signed Create Connector request; the response body contains the connector ID
r = requests.post(url, auth=awsauth, json=payload, headers=headers)
print(r.text)

For more information, see the Cohere blueprint.

The script outputs the connector ID:

{"connector_id":"1p0u8o0BWbTmLN9F2Y7m"}

Note the connector ID; you'll use it in the next step.
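The connector's request_body is a template: each ${parameters.<name>} is filled from the connector parameters or from the predict request. The following sketch approximates that substitution to show the JSON body sent to Bedrock for one input text. The helper render_request_body is hypothetical and is not the ML Commons implementation. It assumes that texts arrives as a list, which is roughly what the connector.pre_process.cohere.embedding function provides, and that lists are inserted as JSON arrays.

```python
import json
import re

# The request_body template from the connector payload above.
REQUEST_BODY = (
    '{ "texts": ${parameters.texts}, "truncate": "${parameters.truncate}", '
    '"input_type": "${parameters.input_type}" }'
)


def render_request_body(template: str, parameters: dict) -> str:
    """Replace ${parameters.<name>} placeholders; a missing parameter raises KeyError."""
    def substitute(match: re.Match) -> str:
        value = parameters[match.group(1)]
        # Lists (such as texts) become JSON arrays; strings are inserted as-is
        # because the template already wraps them in quotes.
        return json.dumps(value) if isinstance(value, list) else str(value)

    return re.sub(r"\$\{parameters\.(\w+)\}", substitute, template)
```

This shows why input_type and truncate appear in the connector parameters: without them, the body sent to Bedrock would be missing those fields.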

Step 4: Create and test the model

Create a model group:

 POST /_plugins/_ml/model_groups/_register
 {
     "name": "Bedrock_embedding_model",
     "description": "Test model group for bedrock embedding model"
 }

The response contains the model group ID:

 {
   "model_group_id": "050q8o0BWbTmLN9Foo4f",
   "status": "CREATED"
 }

Register the model:

 POST /_plugins/_ml/models/_register
 {
   "name": "Bedrock Cohere embedding model v3",
   "function_name": "remote",
   "description": "test embedding model",
   "model_group_id": "050q8o0BWbTmLN9Foo4f",
   "connector_id": "1p0u8o0BWbTmLN9F2Y7m"
 }

The response contains the model ID:

 {
   "task_id": "TRUr8o0BTaDH9c7tSRfx",
   "status": "CREATED",
   "model_id": "VRUu8o0BTaDH9c7t9xet"
 }

Deploy the model:

 POST /_plugins/_ml/models/VRUu8o0BTaDH9c7t9xet/_deploy

The response contains the task ID for the deploy operation:

 {
   "task_id": "1J0r8o0BWbTmLN9FjY6I",
   "task_type": "DEPLOY_MODEL",
   "status": "COMPLETED"
 }
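The following minimal Python sketch chains the Step 4 requests. If the deploy response reports a status other than COMPLETED, it polls the ML Commons Get Task API (GET /_plugins/_ml/tasks/<task_id>) until the task's state is final. It assumes a hypothetical send(method, path, body) function that returns the parsed JSON response, such as a thin wrapper around the signed requests calls from Step 3.2. It also assumes that model registration returns model_id directly, as in the response above.

```python
import time
from typing import Callable, Optional

# Hypothetical transport: send(method, path, body) returns the parsed JSON response.
Send = Callable[[str, str, Optional[dict]], dict]


def register_and_deploy(send: Send, connector_id: str,
                        poll_seconds: float = 2.0, max_polls: int = 30) -> str:
    """Create a model group, register the remote model, deploy it, and return the model ID."""
    group = send("POST", "/_plugins/_ml/model_groups/_register", {
        "name": "Bedrock_embedding_model",
        "description": "Test model group for bedrock embedding model",
    })
    model = send("POST", "/_plugins/_ml/models/_register", {
        "name": "Bedrock Cohere embedding model v3",
        "function_name": "remote",
        "description": "test embedding model",
        "model_group_id": group["model_group_id"],
        "connector_id": connector_id,
    })
    model_id = model["model_id"]

    deploy = send("POST", f"/_plugins/_ml/models/{model_id}/_deploy", None)
    status, polls = deploy.get("status"), 0
    # Poll the Get Task API until the deploy task reaches a final state.
    while status != "COMPLETED":
        if status == "FAILED":
            raise RuntimeError(f"deployment of {model_id} failed")
        if polls >= max_polls:
            raise TimeoutError(f"deployment of {model_id} still {status} after {polls} polls")
        time.sleep(poll_seconds)
        task = send("GET", f"/_plugins/_ml/tasks/{deploy['task_id']}", None)
        status, polls = task.get("state"), polls + 1
    return model_id
```

Injecting send keeps the request sequence separate from authentication, so the same function works with any signed HTTP client.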

Test the model:

 POST /_plugins/_ml/models/VRUu8o0BTaDH9c7t9xet/_predict
 {
   "parameters": {
     "texts": ["hello world"]
   }
 }

The response contains the embeddings generated by the model:

 {
   "inference_results": [
     {
       "output": [
         {
           "name": "sentence_embedding",
           "data_type": "FLOAT32",
           "shape": [
             1024
           ],
           "data": [
             -0.02973938,
             -0.023651123,
             -0.06021118,
             ...]
         }
       ],
       "status_code": 200
     }
   ]
 }
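The vector length reported in shape must match the dimension of the knn_vector field created in Step 5.2 (1024 here). The following minimal sketch extracts the vectors from a _predict response and checks their length. It assumes that each output named sentence_embedding holds one vector, as in the response above. The function name extract_embeddings is hypothetical.

```python
def extract_embeddings(predict_response: dict, expected_dim: int = 1024) -> list[list[float]]:
    """Collect every sentence_embedding vector from a _predict response and check its length."""
    vectors = []
    for result in predict_response["inference_results"]:
        for output in result["output"]:
            if output.get("name") != "sentence_embedding":
                continue
            vector = output["data"]
            # The vector length must equal the "dimension" of the knn_vector field in Step 5.2.
            if len(vector) != expected_dim:
                raise ValueError(f"expected {expected_dim} values, got {len(vector)}")
            vectors.append(vector)
    return vectors
```

A length mismatch at this stage usually means that the index mapping and the model disagree. It is cheaper to catch that here than during ingestion.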

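Step 5: Configure semantic search

Follow these steps to configure semantic search.

Step 5.1: Create an ingest pipeline

First, create an ingest pipeline that uses the model on Amazon Bedrock to create embeddings from the input text: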

PUT /_ingest/pipeline/my_bedrock_cohere_embedding_pipeline
{
    "description": "text embedding pipeline",
    "processors": [
        {
            "text_embedding": {
                "model_id": "your_bedrock_embedding_model_id_created_in_step4",
                "field_map": {
                    "text": "text_knn"
                }
            }
        }
    ]
}
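Conceptually, the text_embedding processor reads each source field named in field_map, sends the text to the model, and writes the returned vector to the target field. The following sketch models only that mapping step for top-level string fields. The real processor runs inside OpenSearch and calls the model by model_id. The names apply_text_embedding and embed are hypothetical, and embed stands in for the model call.

```python
from typing import Callable

# Stand-in for the model call: takes a list of texts, returns one vector per text.
Embed = Callable[[list[str]], list[list[float]]]


def apply_text_embedding(doc: dict, field_map: dict[str, str], embed: Embed) -> dict:
    """Return a copy of doc with one embedding field added per mapped text field."""
    enriched = dict(doc)
    # Only string fields that are present in the document are embedded.
    sources = [src for src in field_map if isinstance(doc.get(src), str)]
    if not sources:
        return enriched
    vectors = embed([doc[src] for src in sources])
    for src, vector in zip(sources, vectors):
        enriched[field_map[src]] = vector
    return enriched
```

With field_map {"text": "text_knn"}, the document ingested in Step 5.3 keeps its text field and gains a text_knn vector. This is why the index mapping in Step 5.2 declares only text_knn.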

Step 5.2: Create a vector index

Next, create a vector index for storing the input text and the generated embeddings:

PUT my_index
{
  "settings": {
    "index": {
      "knn.space_type": "cosinesimil",
      "default_pipeline": "my_bedrock_cohere_embedding_pipeline",
      "knn": "true"
    }
  },
  "mappings": {
    "properties": {
      "text_knn": {
        "type": "knn_vector",
        "dimension": 1024
      }
    }
  }
}
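The cosinesimil space type compares vectors by the angle between them, so only direction matters, not magnitude. The following sketch computes cosine similarity and uses it for an exact (brute-force) top-k search. The index itself typically uses an approximate k-NN method and converts distances into scores internally; that conversion is not shown. The function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; both must have the same dimension."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def top_k(query: list[float], docs: dict[str, list[float]], k: int) -> list[str]:
    """Return the IDs of the k documents most similar to query (exact search)."""
    ranked = sorted(docs, key=lambda doc_id: cosine_similarity(query, docs[doc_id]), reverse=True)
    return ranked[:k]
```

The dimension check mirrors the constraint in the mapping: every vector stored in text_knn, and every query vector, must have exactly 1024 values.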

Step 5.3: Ingest data

Ingest a sample document into the index:

POST /my_index/_doc/1000001
{
    "text": "hello world."
}
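To ingest more than one document, you can send them in a single _bulk request, which expects newline-delimited JSON (an action line followed by a source line, ending with a newline). Because my_index sets default_pipeline, each document still passes through the embedding pipeline. The following minimal sketch only builds the request body. The function name bulk_body is hypothetical, and sending it (POST /_bulk with Content-Type application/x-ndjson) is left to your client.

```python
import json


def bulk_body(index: str, texts: dict[str, str]) -> str:
    """Build an NDJSON _bulk body that indexes each (id, text) pair into index."""
    lines = []
    for doc_id, text in texts.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"text": text}))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

Each document triggers a model call during ingestion, so large bulk requests count toward the Amazon Bedrock quota mentioned at the beginning of this tutorial.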

Step 5.4: Search the index

Run a vector search to retrieve documents from the vector index:

POST /my_index/_search
{
  "query": {
    "neural": {
      "text_knn": {
        "query_text": "hello",
        "model_id": "your_embedding_model_id_created_in_step4",
        "k": 100
      }
    }
  },
  "size": 1,
  "_source": ["text"]
}
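In the neural query, k controls how many nearest neighbors the k-NN search retrieves, while size limits how many hits are returned. The query text is embedded with the given model_id at search time. The following minimal sketch builds the same query body programmatically and reads (score, text) pairs from a search response. The function names are hypothetical, and the field names text_knn and text come from this tutorial.

```python
def build_neural_query(query_text: str, model_id: str, k: int = 100, size: int = 1) -> dict:
    """Build the neural query used in this step."""
    return {
        "query": {
            "neural": {
                "text_knn": {"query_text": query_text, "model_id": model_id, "k": k}
            }
        },
        "size": size,
        "_source": ["text"],
    }


def hit_texts(search_response: dict) -> list[tuple[float, str]]:
    """Return (score, text) pairs in the order the hits were returned."""
    return [(hit["_score"], hit["_source"]["text"]) for hit in search_response["hits"]["hits"]]
```

Hits are sorted by score in descending order by default, so the first pair is the closest match.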
