
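Reranking search results by a field

Starting with OpenSearch 2.18, you can rerank search results by a field. This is useful when your documents contain a field that is particularly important or when you want to rerank search results from an externally hosted model. For more information, see Reranking search results by field.

This tutorial explains how to use the Cohere Rerank model to rerank search results by a field in self-managed OpenSearch and in Amazon OpenSearch Service.

Replace the placeholders that start with the prefix your_ with your own values.

Step 1 (self-managed OpenSearch): Create a connector

To create a connector, send the following request: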

POST /_plugins/_ml/connectors/_create
{
    "name": "cohere-rerank",
    "description": "The connector to Cohere reranker model",
    "version": "1",
    "protocol": "http",
    "credential": {
        "cohere_key": "your_cohere_api_key"
    },
    "parameters": {
        "model": "rerank-english-v3.0",
        "return_documents": true
    },
    "actions": [
        {
            "action_type": "predict",
            "method": "POST",
            "url": "https://api.cohere.ai/v1/rerank",
            "headers": {
                "Authorization": "Bearer ${credential.cohere_key}"
            },
            "request_body": "{ \"documents\": ${parameters.documents}, \"query\": \"${parameters.query}\", \"model\": \"${parameters.model}\", \"top_n\": ${parameters.top_n},  \"return_documents\": ${parameters.return_documents} }"
        }
    ]
}

The response contains the connector ID:

{"connector_id":"qp2QP40BWbTmLN9Fpo40"}

Note the connector ID; you'll use it in the following steps. Then go to step 2.
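If you prefer to generate the connector request from a script, the following is a minimal sketch that builds the same request body in Python. The function name build_connector_payload is an illustrative assumption and is not part of any OpenSearch client; the printed JSON can be sent to POST /_plugins/_ml/connectors/_create with any HTTP client.

```python
import json


def build_connector_payload(api_key: str, model: str = "rerank-english-v3.0") -> dict:
    """Build the create-connector request body shown in this step.

    build_connector_payload is a hypothetical helper used only for illustration.
    """
    # The ${...} placeholders are resolved by OpenSearch at predict time,
    # so they must reach the cluster unchanged.
    request_body = (
        '{ "documents": ${parameters.documents}, '
        '"query": "${parameters.query}", '
        '"model": "${parameters.model}", '
        '"top_n": ${parameters.top_n}, '
        '"return_documents": ${parameters.return_documents} }'
    )
    return {
        "name": "cohere-rerank",
        "description": "The connector to Cohere reranker model",
        "version": "1",
        "protocol": "http",
        "credential": {"cohere_key": api_key},
        "parameters": {"model": model, "return_documents": True},
        "actions": [
            {
                "action_type": "predict",
                "method": "POST",
                "url": "https://api.cohere.ai/v1/rerank",
                "headers": {"Authorization": "Bearer ${credential.cohere_key}"},
                "request_body": request_body,
            }
        ],
    }


# Print the body so it can be inspected before it is sent to the cluster.
print(json.dumps(build_connector_payload("your_cohere_api_key"), indent=2))
```

Keeping the payload in a function makes it easy to reuse the same body for the Amazon OpenSearch Service variant, where only the credential block and the Authorization header differ.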

Step 1 (Amazon OpenSearch Service): Create a connector

Follow these steps to create a connector using Amazon OpenSearch Service.

Prerequisite: Create an OpenSearch cluster

Go to the Amazon OpenSearch Service console and create an OpenSearch domain.

Note the domain's Amazon Resource Name (ARN) and URL; you'll use them in the following steps.

Step 1.1: Store the API key in AWS Secrets Manager

Store your Cohere API key in AWS Secrets Manager:

  1. Open AWS Secrets Manager.
  2. Select Store a new secret.
  3. Select Other type of secret.
  4. Create a key-value pair with my_cohere_key as the key and your Cohere API key as the value.
  5. Name your secret my_test_cohere_secret.

Note the secret ARN; you'll use it in the following steps.
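The console steps above can also be scripted. The following is a hedged sketch, assuming boto3 is installed and AWS credentials with Secrets Manager permissions are configured; the helper names build_secret_string and create_secret are illustrative and are not part of the tutorial.

```python
import json


def build_secret_string(cohere_api_key: str, key_name: str = "my_cohere_key") -> str:
    """Return the key-value pair from step 4 of the list, serialized as JSON."""
    return json.dumps({key_name: cohere_api_key})


def create_secret(secret_name: str, secret_string: str) -> str:
    """Create the secret and return its ARN.

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported lazily so build_secret_string works without boto3

    client = boto3.client("secretsmanager")
    response = client.create_secret(Name=secret_name, SecretString=secret_string)
    return response["ARN"]


# Example: inspect the secret value that would be stored.
print(build_secret_string("your_cohere_api_key"))
```

Calling create_secret("my_test_cohere_secret", build_secret_string("your_cohere_api_key")) would return the secret ARN that later steps refer to.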

步骤 1.2:创建 IAM 角色

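To use the secret created in step 1.1, you must create an AWS Identity and Access Management (IAM) role with read permissions for the secret. This IAM role will be configured in the connector and will allow the connector to read the secret.

Go to the IAM console, create a new IAM role named my_cohere_secret_role, and add the following trust policy and permissions:

  • Custom trust policy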
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "es.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

  • Permissions
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret"
            ],
            "Effect": "Allow",
            "Resource": "your_secret_arn_created_in_step1"
        }
    ]
}

Note the role ARN; you'll use it in the following steps.
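As a convenience, the two policy documents for my_cohere_secret_role can be generated from the secret ARN rather than edited by hand. This is a minimal sketch; the function names and the sample ARN are hypothetical and serve only as an illustration.

```python
import json


def build_trust_policy(service: str = "es.amazonaws.com") -> dict:
    """Trust policy that lets the OpenSearch service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_secret_read_policy(secret_arn: str) -> dict:
    """Permissions policy that grants read access to a single secret."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": [
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:DescribeSecret",
                ],
                "Effect": "Allow",
                "Resource": secret_arn,
            }
        ],
    }


# Hypothetical ARN, shown only to demonstrate the output shape.
example_arn = "arn:aws:secretsmanager:us-east-1:123456789012:secret:my_test_cohere_secret-AbCdEf"
print(json.dumps(build_trust_policy(), indent=4))
print(json.dumps(build_secret_read_policy(example_arn), indent=4))
```

Scoping Resource to the single secret ARN, as the tutorial does, keeps the role from reading any other secret in the account.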

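Step 1.3: Configure an IAM role in Amazon OpenSearch Service

Follow these steps to configure an IAM role in Amazon OpenSearch Service.

Step 1.3.1: Create an IAM role for signing connector requests

Generate a new IAM role specifically for signing your create connector API request.

Create an IAM role named my_create_cohere_connector_role and add the following trust policy and permissions:

  • Custom trust policy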
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "your_iam_user_arn"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

You'll use the your_iam_user_arn IAM user to assume the role in step 1.4.1.

  • Permissions
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "your_iam_role_arn_created_in_step2"
        },
        {
            "Effect": "Allow",
            "Action": "es:ESHttpPost",
            "Resource": "your_opensearch_domain_arn_created_in_step0"
        }
    ]
}

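Note this role ARN; you'll use it in the following steps.

Step 1.3.2: Map a backend role

Follow these steps to map a backend role:

  1. Log in to OpenSearch Dashboards and select Security from the top menu.
  2. Select Roles, and then select the ml_full_access role.
  3. On the ml_full_access role details page, select Mapped users, and then select Manage mapping.
  4. In the Backend roles field, enter the ARN of the IAM role created in step 1.3.1.
  5. Select Map.

The IAM role is now successfully configured in your OpenSearch cluster.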
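The mapping steps above use OpenSearch Dashboards. As an alternative, the Security plugin also exposes a role mapping REST API. The sketch below only builds the request; the helper name build_role_mapping_request is hypothetical, and the sketch assumes that you send the request as a user who is allowed to manage security settings. Note that a PUT request replaces the existing mapping for the role, so include any backend roles or users that are already mapped.

```python
import json


def build_role_mapping_request(role_name: str, backend_role_arns: list) -> tuple:
    """Return (method, path, body) for mapping backend roles to a security role.

    build_role_mapping_request is a hypothetical helper used for illustration.
    """
    path = f"/_plugins/_security/api/rolesmapping/{role_name}"
    # PUT overwrites the whole mapping, so pass every backend role to keep.
    body = {"backend_roles": list(backend_role_arns)}
    return "PUT", path, body


method, path, body = build_role_mapping_request(
    "ml_full_access",
    ["your_iam_role_arn_created_in_step3.1"],
)
print(method, path)
print(json.dumps(body, indent=2))
```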

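Step 1.4: Create a connector

Follow these steps to create a connector for the model. For more information about creating a connector, see Connectors.

Step 1.4.1: Get temporary credentials

Use the credentials of the IAM user specified in step 1.3.1 to assume the role: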

aws sts assume-role --role-arn your_iam_role_arn_created_in_step3.1 --role-session-name your_session_name

Copy the temporary credentials from the response and configure them in ~/.aws/credentials:

[default]
AWS_ACCESS_KEY_ID=your_access_key_of_role_created_in_step3.1
AWS_SECRET_ACCESS_KEY=your_secret_key_of_role_created_in_step3.1
AWS_SESSION_TOKEN=your_session_token_of_role_created_in_step3.1
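Copying the three values by hand is error prone, so the following sketch formats the Credentials object from the assume-role JSON response as a credentials profile. The function name credentials_profile is an illustrative assumption; the sketch writes the key names in lowercase, which is the form used in AWS documentation.

```python
def credentials_profile(assume_role_response: dict, profile: str = "default") -> str:
    """Format the Credentials block of an sts assume-role response as a profile."""
    creds = assume_role_response["Credentials"]
    lines = [
        f"[{profile}]",
        f"aws_access_key_id={creds['AccessKeyId']}",
        f"aws_secret_access_key={creds['SecretAccessKey']}",
        f"aws_session_token={creds['SessionToken']}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values only; a real response comes from `aws sts assume-role`.
sample_response = {
    "Credentials": {
        "AccessKeyId": "example_access_key",
        "SecretAccessKey": "example_secret_key",
        "SessionToken": "example_session_token",
        "Expiration": "2030-01-01T00:00:00Z",
    }
}
print(credentials_profile(sample_response))
```

Temporary credentials expire at the time given in the Expiration field, so the profile must be regenerated after that point.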

Step 1.4.2: Create a connector

Run the following Python code using the temporary credentials configured in ~/.aws/credentials:

import boto3
import requests 
from requests_aws4auth import AWS4Auth

host = 'your_amazon_opensearch_domain_endpoint_created_in_step0'
region = 'your_amazon_opensearch_domain_region'
service = 'es'

credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

path = '/_plugins/_ml/connectors/_create'
url = host + path

payload = {
    "name": "cohere-rerank",
    "description": "The connector to Cohere reranker model",
    "version": "1",
    "protocol": "http",
    "credential": {
        "secretArn": "your_secret_arn_created_in_step1",
        "roleArn": "your_iam_role_arn_created_in_step2"
    },
    "parameters": {
        "model": "rerank-english-v3.0",
        "return_documents": True  # Python boolean; requests serializes it as JSON true
    },
    "actions": [
        {
            "action_type": "predict",
            "method": "POST",
            "url": "https://api.cohere.ai/v1/rerank",
            "headers": {
                "Authorization": "Bearer ${credential.secretArn.my_cohere_key}"
            },
            "request_body": "{ \"documents\": ${parameters.documents}, \"query\": \"${parameters.query}\", \"model\": \"${parameters.model}\", \"top_n\": ${parameters.top_n}, \"return_documents\": ${parameters.return_documents} }"
        }
    ]
}

headers = {"Content-Type": "application/json"}

r = requests.post(url, auth=awsauth, json=payload, headers=headers)
print(r.text)

The script outputs the connector ID:

{"connector_id":"qp2QP40BWbTmLN9Fpo40"}

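Note the connector ID; you'll use it in the next step.

Step 2: Register the Cohere Rerank model

After successfully creating a connector using either the self-managed OpenSearch or the Amazon OpenSearch Service method, you can register the Cohere Rerank model.

Use the connector ID from step 1 to create a model: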

POST /_plugins/_ml/models/_register?deploy=true
{
    "name": "cohere rerank model",
    "function_name": "remote",
    "description": "test rerank model",
    "connector_id": "your_connector_id"
}

Note the model ID in the response; you'll use it in the following steps.

Step 3: Test the model

To test the model, send the following request:

POST /_plugins/_ml/models/your_model_id/_predict
{
  "parameters": {
    "top_n": 100,
    "query": "What day is it?",
    "documents": ["Monday", "Tuesday", "apples"]
  }
}

The response contains the matching documents:

{
	"inference_results": [
		{
			"output": [
				{
					"name": "response",
					"dataAsMap": {
						"id": "e15a3922-3d89-4adc-96cf-9b85a619fb66",
						"results": [
							{
								"document": {
									"text": "Monday"
								},
								"index": 0.0,
								"relevance_score": 0.21076629
							},
							{
								"document": {
									"text": "Tuesday"
								},
								"index": 1.0,
								"relevance_score": 0.13206616
							},
							{
								"document": {
									"text": "apples"
								},
								"index": 2.0,
								"relevance_score": 1.0804956E-4
							}
						],
						"meta": {
							"api_version": {
								"version": "1"
							},
							"billed_units": {
								"search_units": 1.0
							}
						}
					}
				}
			],
			"status_code": 200
		}
	]
}

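The reranking model assigns a score to each document. Next, you'll create a search pipeline that calls the Cohere model and reranks the search results based on their relevance scores.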
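To make the structure of the predict response concrete, the following sketch extracts each document and its score from a trimmed copy of the response above and orders them by relevance_score. The function name ranked_documents is an illustrative assumption.

```python
def ranked_documents(predict_response: dict) -> list:
    """Return (text, relevance_score) pairs sorted from most to least relevant."""
    results = predict_response["inference_results"][0]["output"][0]["dataAsMap"]["results"]
    ordered = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(r["document"]["text"], r["relevance_score"]) for r in ordered]


# Trimmed copy of the predict response shown above.
sample_response = {
    "inference_results": [
        {
            "output": [
                {
                    "name": "response",
                    "dataAsMap": {
                        "results": [
                            {"document": {"text": "Monday"}, "index": 0.0, "relevance_score": 0.21076629},
                            {"document": {"text": "Tuesday"}, "index": 1.0, "relevance_score": 0.13206616},
                            {"document": {"text": "apples"}, "index": 2.0, "relevance_score": 1.0804956e-4},
                        ]
                    },
                }
            ],
            "status_code": 200,
        }
    ]
}

for text, score in ranked_documents(sample_response):
    print(f"{score:.6f}  {text}")
```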

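Step 3: Rerank the search results

Follow these steps to rerank the search results.

Step 3.1: Create an index

To create an index and ingest sample documents, send the following bulk request: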

POST _bulk
{ "index": { "_index": "nyc_facts", "_id": 1 } }
{ "fact_title": "Population of New York", "fact_description": "New York City has an estimated population of over 8.3 million people as of 2023, making it the most populous city in the United States." }
{ "index": { "_index": "nyc_facts", "_id": 2 } }
{ "fact_title": "Statue of Liberty", "fact_description": "The Statue of Liberty, a symbol of freedom, was gifted to the United States by France in 1886 and stands on Liberty Island in New York Harbor." }
{ "index": { "_index": "nyc_facts", "_id": 3 } }
{ "fact_title": "New York City is a Global Financial Hub", "fact_description": "New York City is home to the New York Stock Exchange (NYSE) and Wall Street, which are central to the global finance industry." }
{ "index": { "_index": "nyc_facts", "_id": 4 } }
{ "fact_title": "Broadway", "fact_description": "Broadway is a major thoroughfare in New York City known for its theaters. It's also considered the birthplace of modern American theater and musicals." }
{ "index": { "_index": "nyc_facts", "_id": 5 } }
{ "fact_title": "Central Park", "fact_description": "Central Park, located in Manhattan, spans 843 acres and is one of the most visited urban parks in the world, offering green spaces, lakes, and recreational areas." }
{ "index": { "_index": "nyc_facts", "_id": 6 } }
{ "fact_title": "Empire State Building", "fact_description": "The Empire State Building, completed in 1931, is an iconic Art Deco skyscraper that was the tallest building in the world until 1970." }
{ "index": { "_index": "nyc_facts", "_id": 7 } }
{ "fact_title": "Times Square", "fact_description": "Times Square, often called 'The Cross-roads of the World,' is known for its bright lights, Broadway theaters, and New Year's Eve ball drop." }
{ "index": { "_index": "nyc_facts", "_id": 8 } }
{ "fact_title": "Brooklyn Bridge", "fact_description": "The Brooklyn Bridge, completed in 1883, connects Manhattan and Brooklyn and was the first suspension bridge to use steel in its construction." }
{ "index": { "_index": "nyc_facts", "_id": 9 } }
{ "fact_title": "New York City Public Library", "fact_description": "The New York Public Library, founded in 1895, has over 50 million items in its collections and serves as a major cultural and educational resource." }
{ "index": { "_index": "nyc_facts", "_id": 10 } }
{ "fact_title": "New York's Chinatown", "fact_description": "New York's Chinatown, one of the largest in the world, is known for its vibrant culture, food, and history. It plays a key role in the city's Chinese community." }

Step 3.2: Create a reranking pipeline

To create a reranking pipeline, send the following request:

PUT /_search/pipeline/cohere_pipeline
{
  "response_processors": [
    {
      "ml_inference": {
        "model_id": "your_model_id",
        "input_map": {
          "documents": "fact_description",
          "query": "_request.ext.query_context.query_text",
          "top_n": "_request.ext.query_context.top_n"
        },
        "output_map": {
          "relevance_score": "results[*].relevance_score",
          "result_document": "results[*].document.text"
        },
        "full_response_path": false,
        "ignore_missing": false,
        "ignore_failure": false,
        "one_to_one": false,
        "override": false,
        "model_config": {}
      }
    },
    {
      "rerank": {
        "by_field": {
          "target_field": "relevance_score",
          "remove_target_field": false,
          "keep_previous_score": false,
          "ignore_failure": false
        }
      }
    }
  ]
}
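The second processor in the pipeline reorders the hits by the value that the first processor wrote into each document. The following Python sketch is a simplified model of that by_field behavior, not the actual OpenSearch implementation: it sets each hit's _score to the value of the target field and sorts in descending order. The function name rerank_by_field is hypothetical.

```python
def rerank_by_field(hits: list, target_field: str, remove_target_field: bool = False) -> list:
    """Simplified model of reranking by a field; the input hits are not modified."""
    reranked = []
    for hit in hits:
        source = dict(hit["_source"])  # copy so the caller's data stays intact
        score = float(source[target_field])
        if remove_target_field:
            del source[target_field]
        reranked.append({**hit, "_score": score, "_source": source})
    # Highest target-field value first.
    reranked.sort(key=lambda h: h["_score"], reverse=True)
    return reranked


# Made-up hits used only to demonstrate the ordering.
example_hits = [
    {"_id": "a", "_score": 1.0, "_source": {"relevance_score": 0.1}},
    {"_id": "b", "_score": 1.0, "_source": {"relevance_score": 0.9}},
]
for hit in rerank_by_field(example_hits, "relevance_score"):
    print(hit["_id"], hit["_score"])
```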

Step 3.3: Test the pipeline

To test the pipeline, send a query related to the indexed documents and set top_n to a value greater than or equal to size:

GET nyc_facts/_search?search_pipeline=cohere_pipeline
{
  "query": {
    "match_all": {}
  },
  "size": 5,
  "ext": {
    "rerank": {
      "query_context": {
        "query_text": "Where do people go to see a show?",
        "top_n" : "10"
      }
    }
  }
}

The response contains the reranked documents:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10,
      "relation": "eq"
    },
    "max_score": 0.34986588,
    "hits": [
      {
        "_index": "nyc_facts",
        "_id": "_7a76b04b5016c71c",
        "_score": 0.34986588,
        "_source": {
          "result_document": "Broadway is a major thoroughfare in New York City known for its theaters. It's also considered the birthplace of modern American theater and musicals.",
          "fact_title": "Times Square",
          "fact_description": "Times Square, often called 'The Cross-roads of the World,' is known for its bright lights, Broadway theaters, and New Year's Eve ball drop.",
          "relevance_score": 0.34986588
        }
      },
      {
        "_index": "nyc_facts",
        "_id": "_00c26e453971ed68",
        "_score": 0.1066906,
        "_source": {
          "result_document": "Times Square, often called 'The Cross-roads of the World,' is known for its bright lights, Broadway theaters, and New Year's Eve ball drop.",
          "fact_title": "New York City Public Library",
          "fact_description": "The New York Public Library, founded in 1895, has over 50 million items in its collections and serves as a major cultural and educational resource.",
          "relevance_score": 0.1066906
        }
      },
      {
        "_index": "nyc_facts",
        "_id": "_d03d3610a5a5bd82",
        "_score": 0.00019563535,
        "_source": {
          "result_document": "The New York Public Library, founded in 1895, has over 50 million items in its collections and serves as a major cultural and educational resource.",
          "fact_title": "Broadway",
          "fact_description": "Broadway is a major thoroughfare in New York City known for its theaters. It's also considered the birthplace of modern American theater and musicals.",
          "relevance_score": 0.00019563535
        }
      },
      {
        "_index": "nyc_facts",
        "_id": "_9284bae64eab7f63",
        "_score": 0.000019988918,
        "_source": {
          "result_document": "The Statue of Liberty, a symbol of freedom, was gifted to the United States by France in 1886 and stands on Liberty Island in New York Harbor.",
          "fact_title": "Brooklyn Bridge",
          "fact_description": "The Brooklyn Bridge, completed in 1883, connects Manhattan and Brooklyn and was the first suspension bridge to use steel in its construction.",
          "relevance_score": 0.000019988918
        }
      },
      {
        "_index": "nyc_facts",
        "_id": "_7aa6f2934f47911b",
        "_score": 0.0000104515475,
        "_source": {
          "result_document": "The Brooklyn Bridge, completed in 1883, connects Manhattan and Brooklyn and was the first suspension bridge to use steel in its construction.",
          "fact_title": "Statue of Liberty",
          "fact_description": "The Statue of Liberty, a symbol of freedom, was gifted to the United States by France in 1886 and stands on Liberty Island in New York Harbor.",
          "relevance_score": 0.0000104515475
        }
      }
    ]
  },
  "profile": {
    "shards": []
  }
}

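When evaluating the reranked results, focus on the result_document field and its corresponding relevance_score. The fact_description field shows the original document text and does not reflect the reranked order.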
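The following sketch ties the last step together: it builds the search request body while enforcing the rule that top_n must be greater than or equal to size, and it reads result_document and relevance_score from each hit of a response. Both function names are illustrative assumptions, and the sample response is a shortened copy of the one above.

```python
def build_search_body(query_text: str, size: int, top_n: int) -> dict:
    """Build the request body for the reranking search in step 3.3."""
    if top_n < size:
        raise ValueError("top_n must be greater than or equal to size")
    return {
        "query": {"match_all": {}},
        "size": size,
        "ext": {
            "rerank": {
                # top_n is passed as a string, as in the request above.
                "query_context": {"query_text": query_text, "top_n": str(top_n)}
            }
        },
    }


def summarize_hits(search_response: dict) -> list:
    """Return (result_document, relevance_score) for each hit, in response order."""
    return [
        (hit["_source"]["result_document"], hit["_source"]["relevance_score"])
        for hit in search_response["hits"]["hits"]
    ]


# Shortened copy of the first hit from the response above.
sample_response = {
    "hits": {
        "hits": [
            {
                "_score": 0.34986588,
                "_source": {
                    "result_document": "Broadway is a major thoroughfare in New York City known for its theaters.",
                    "fact_title": "Times Square",
                    "relevance_score": 0.34986588,
                },
            }
        ]
    }
}

body = build_search_body("Where do people go to see a show?", size=5, top_n=10)
for document, score in summarize_hits(sample_response):
    print(score, document)
```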