
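Reranking search results using Amazon Bedrock models

A reranking pipeline can rerank search results by assigning each document in the results a relevance score with respect to the search query. The relevance score is calculated by a cross-encoder model.

This tutorial shows you how to use the Amazon Bedrock Rerank API to rerank search results using a model hosted on Amazon Bedrock.

Replace the placeholders beginning with the prefix your_ with your own values.

Prerequisite: Test the model on Amazon Bedrock

Before using your model, test it on Amazon Bedrock. For supported reranker models, see Supported Regions and models for reranking in Amazon Bedrock. For model IDs, see Supported foundation models in Amazon Bedrock. To perform a reranking test, use the following code: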

import json

import boto3

# Region in which the Amazon Bedrock reranker model is available
bedrock_region = "your_bedrock_model_region_like_us-west-2"
bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime", region_name=bedrock_region)

# ID of the reranker model to invoke
model_id = "amazon.rerank-v1:0"

response = bedrock_agent_runtime_client.rerank(
    queries=[
        {
            "textQuery": {
                "text": "What is the capital city of America?",
            },
            "type": "TEXT"
        }
    ],
    rerankingConfiguration={
        "bedrockRerankingConfiguration": {
            "modelConfiguration": {
                "modelArn": f"arn:aws:bedrock:{bedrock_region}::foundation-model/{model_id}"
            },
        },
        "type": "BEDROCK_RERANKING_MODEL"
    },
    sources=[
        {
            "inlineDocumentSource": {
                "textDocument": {
                    "text": "Carson City is the capital city of the American state of Nevada.",
                },
                "type": "TEXT"
            },
            "type": "INLINE"
        },
        {
            "inlineDocumentSource": {
                "textDocument": {
                    "text": "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
                },
                "type": "TEXT"
            },
            "type": "INLINE"
        },
        {
            "inlineDocumentSource": {
                "textDocument": {
                    "text": "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.",
                },
                "type": "TEXT"
            },
            "type": "INLINE"
        },
        {
            "inlineDocumentSource": {
                "textDocument": {
                    "text": "Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."
                },
                "type": "TEXT"
            },
            "type": "INLINE"
        },
    ]
)

results = response["results"]
print(json.dumps(results, indent=2))

The reranked results are ordered from the highest score to the lowest:

[
  {
    "index": 2,
    "relevanceScore": 0.7711548805236816
  },
  {
    "index": 0,
    "relevanceScore": 0.0025114635936915874
  },
  {
    "index": 1,
    "relevanceScore": 2.4876489987946115e-05
  },
  {
    "index": 3,
    "relevanceScore": 6.339210358419223e-06
  }
]

To sort the results by index, use the following code:

print(json.dumps(sorted(results, key=lambda x: x["index"]), indent=2))

The following are the results sorted by index:

[
  {
    "index": 0,
    "relevanceScore": 0.0025114635936915874
  },
  {
    "index": 1,
    "relevanceScore": 2.4876489987946115e-05
  },
  {
    "index": 2,
    "relevanceScore": 0.7711548805236816
  },
  {
    "index": 3,
    "relevanceScore": 6.339210358419223e-06
  }
]
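Each index value refers to the position of a document in the sources list sent to the Rerank API, so the scores can be joined back to the document text. The following is a minimal sketch of that lookup; the helper name attach_documents and the shortened document strings are hypothetical and used only for illustration:

```python
from typing import Dict, List, Tuple


def attach_documents(results: List[Dict], documents: List[str]) -> List[Tuple[float, str]]:
    """Pair each rerank result with the document text it refers to.

    Each result's "index" is the position of the document in the
    list of sources that was sent to the Rerank API.
    """
    return [(item["relevanceScore"], documents[item["index"]]) for item in results]


# Shortened, hypothetical stand-ins for the four documents used in this tutorial.
documents = [
    "Carson City is the capital city of the American state of Nevada.",
    "Its capital is Saipan.",
    "Washington, D.C. is the capital of the United States.",
    "Capital punishment has existed in the United States.",
]

# Same shape as the Rerank API output shown above (highest score first).
results = [
    {"index": 2, "relevanceScore": 0.7711548805236816},
    {"index": 0, "relevanceScore": 0.0025114635936915874},
    {"index": 1, "relevanceScore": 2.4876489987946115e-05},
    {"index": 3, "relevanceScore": 6.339210358419223e-06},
]

ranked = attach_documents(results, documents)
for score, text in ranked:
    print(f"{score:.6f}  {text}")
```

Because the API output is already ordered by score, the first pair in ranked is the most relevant document.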

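Step 1: Create a connector and register the model

To create a connector and register the model, use the following steps.

Step 1.1: Create a connector for the model

First, create a connector for the model.

If you are using self-managed OpenSearch, supply your AWS credentials: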

POST /_plugins/_ml/connectors/_create
{
  "name": "Amazon Bedrock Rerank API",
  "description": "Test connector for Amazon Bedrock Rerank API",
  "version": 1,
  "protocol": "aws_sigv4",
  "credential": {
    "access_key": "your_access_key",
    "secret_key": "your_secret_key",
    "session_token": "your_session_token"
  },
  "parameters": {
    "service_name": "bedrock",
    "endpoint": "bedrock-agent-runtime",
    "region": "your_bedrock_model_region_like_us-west-2",
    "api_name": "rerank",
    "model_id": "amazon.rerank-v1:0"
  },
  "actions": [
    {
      "action_type": "PREDICT",
      "method": "POST",
      "url": "https://${parameters.endpoint}.${parameters.region}.amazonaws.com/${parameters.api_name}",
      "headers": {
        "x-amz-content-sha256": "required",
        "content-type": "application/json"
      },
      "pre_process_function": "connector.pre_process.bedrock.rerank",
      "request_body": """
        {
          "queries": ${parameters.queries},
          "rerankingConfiguration": {
            "bedrockRerankingConfiguration": {
              "modelConfiguration": {
                "modelArn": "arn:aws:bedrock:${parameters.region}::foundation-model/${parameters.model_id}"
              }
            },
            "type": "BEDROCK_RERANKING_MODEL"
          },
          "sources": ${parameters.sources}
        }
      """,
      "post_process_function": "connector.post_process.bedrock.rerank"
    }
  ]
}

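If you are using Amazon OpenSearch Service, you can provide an AWS Identity and Access Management (IAM) role Amazon Resource Name (ARN) that allows access to Amazon Bedrock. For more information, see the AWS documentation. Use the following request to create a connector: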

POST /_plugins/_ml/connectors/_create
{
  "name": "Amazon Bedrock Rerank API",
  "description": "Test connector for Amazon Bedrock Rerank API",
  "version": 1,
  "protocol": "aws_sigv4",
  "credential": {
    "roleArn": "your_role_arn_which_allows_access_to_bedrock_agent_runtime_rerank_api"
  },
  "parameters": {
    "service_name": "bedrock",
    "endpoint": "bedrock-agent-runtime",
    "region": "your_bedrock_model_region_like_us-west-2",
    "api_name": "rerank",
    "model_id": "amazon.rerank-v1:0"
  },
  "actions": [
    {
      "action_type": "PREDICT",
      "method": "POST",
      "url": "https://${parameters.endpoint}.${parameters.region}.amazonaws.com/${parameters.api_name}",
      "headers": {
        "x-amz-content-sha256": "required",
        "content-type": "application/json"
      },
      "pre_process_function": "connector.pre_process.bedrock.rerank",
      "request_body": """
        {
          "queries": ${parameters.queries},
          "rerankingConfiguration": {
            "bedrockRerankingConfiguration": {
              "modelConfiguration": {
                "modelArn": "arn:aws:bedrock:${parameters.region}::foundation-model/${parameters.model_id}"
              }
            },
            "type": "BEDROCK_RERANKING_MODEL"
          },
          "sources": ${parameters.sources}
        }
      """,
      "post_process_function": "connector.post_process.bedrock.rerank"
    }
  ]
}
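Both connector definitions build the request URL and the model ARN from ${parameters.<name>} placeholders. The following sketch shows what those templates expand to for a sample Region. It assumes plain textual substitution, which is an illustration only and not a description of how the connector framework is implemented; the function name render_template and the Region value us-west-2 are hypothetical:

```python
import re
from typing import Dict

# Matches placeholders of the form ${parameters.<name>}
PLACEHOLDER = re.compile(r"\$\{parameters\.([A-Za-z0-9_]+)\}")


def render_template(template: str, parameters: Dict[str, str]) -> str:
    """Replace every ${parameters.<name>} placeholder with its value.

    Raises KeyError if the template references an undefined parameter.
    """
    return PLACEHOLDER.sub(lambda match: parameters[match.group(1)], template)


# Values copied from the connector "parameters" block; the Region is a sample value.
parameters = {
    "service_name": "bedrock",
    "endpoint": "bedrock-agent-runtime",
    "region": "us-west-2",
    "api_name": "rerank",
    "model_id": "amazon.rerank-v1:0",
}

url = render_template(
    "https://${parameters.endpoint}.${parameters.region}.amazonaws.com/${parameters.api_name}",
    parameters,
)
model_arn = render_template(
    "arn:aws:bedrock:${parameters.region}::foundation-model/${parameters.model_id}",
    parameters,
)

print(url)
print(model_arn)
```

Expanding the templates by hand in this way can help confirm that the endpoint and ARN point to the intended Region before you create the connector.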

Step 1.2: Register and deploy the model

Use the connector ID from the response to register and deploy the model:

POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "Amazon Bedrock Rerank API",
  "function_name": "remote",
  "description": "test Amazon Bedrock Rerank API",
  "connector_id": "your_connector_id"
}

Note the model ID in the response; you'll use it in the following steps.

Step 1.3: Test the model

Test the model by using the Predict API:

POST _plugins/_ml/_predict/text_similarity/your_model_id
{
  "query_text": "What is the capital city of America?",
  "text_docs": [
    "Carson City is the capital city of the American state of Nevada.",
    "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
    "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.",
    "Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."
  ]
}

Alternatively, you can test the model using the following query. This query bypasses the pre_process_function and calls the Rerank API directly:

POST _plugins/_ml/models/your_model_id/_predict
{
  "parameters": {
    "queries": [
      {
        "textQuery": {
            "text": "What is the capital city of America?"
        },
        "type": "TEXT"
      }
    ],
    "sources": [
        {
            "inlineDocumentSource": {
                "textDocument": {
                    "text": "Carson City is the capital city of the American state of Nevada."
                },
                "type": "TEXT"
            },
            "type": "INLINE"
        },
        {
            "inlineDocumentSource": {
                "textDocument": {
                    "text": "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan."
                },
                "type": "TEXT"
            },
            "type": "INLINE"
        },
        {
            "inlineDocumentSource": {
                "textDocument": {
                    "text": "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district."
                },
                "type": "TEXT"
            },
            "type": "INLINE"
        },
        {
            "inlineDocumentSource": {
                "textDocument": {
                    "text": "Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."
                },
                "type": "TEXT"
            },
            "type": "INLINE"
        }
    ]
  }
}

The connector's pre_process_function transforms the input into the format required by the Predict API parameters.
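The relationship between the two request bodies above can be sketched in Python. This is an illustrative analogue of the transformation, not the connector's actual implementation; the function name to_rerank_parameters is hypothetical:

```python
from typing import Dict, List


def to_rerank_parameters(query_text: str, text_docs: List[str]) -> Dict:
    """Convert a text_similarity-style input into Rerank API parameters."""
    # The single query string becomes a one-element list of TEXT queries.
    queries = [{"textQuery": {"text": query_text}, "type": "TEXT"}]
    # Each document string becomes an inline TEXT source, in the same order.
    sources = [
        {
            "inlineDocumentSource": {
                "textDocument": {"text": doc},
                "type": "TEXT",
            },
            "type": "INLINE",
        }
        for doc in text_docs
    ]
    return {"parameters": {"queries": queries, "sources": sources}}


body = to_rerank_parameters(
    "What is the capital city of America?",
    [
        "Carson City is the capital city of the American state of Nevada.",
        "Washington, D.C. is the capital of the United States.",
    ],
)
print(body["parameters"]["sources"][1]["inlineDocumentSource"]["textDocument"]["text"])
```

The document order is preserved, which is what allows the index values in the Rerank API output to be mapped back to the input documents.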

By default, the Amazon Bedrock Rerank API output is formatted as follows:

[
  {
    "index": 2,
    "relevanceScore": 0.7711548724998493
  },
  {
    "index": 0,
    "relevanceScore": 0.0025114635138098534
  },
  {
    "index": 1,
    "relevanceScore": 2.4876490010363496e-05
  },
  {
    "index": 3,
    "relevanceScore": 6.339210403977635e-06
  }
]

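The connector's post_process_function transforms the model's output into a format that the Reranker processor can interpret and orders the results by index.

The response contains four similarity outputs. For each similarity output, the data array contains the relevance score of the corresponding document with respect to the query. The similarity outputs are provided in the order of the input documents; the first similarity result pertains to the first document: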

{
  "inference_results": [
    {
      "output": [
        {
          "name": "similarity",
          "data_type": "FLOAT32",
          "shape": [
            1
          ],
          "data": [
            0.0025114636
          ]
        },
        {
          "name": "similarity",
          "data_type": "FLOAT32",
          "shape": [
            1
          ],
          "data": [
            2.487649e-05
          ]
        },
        {
          "name": "similarity",
          "data_type": "FLOAT32",
          "shape": [
            1
          ],
          "data": [
            0.7711549
          ]
        },
        {
          "name": "similarity",
          "data_type": "FLOAT32",
          "shape": [
            1
          ],
          "data": [
            6.3392104e-06
          ]
        }
      ],
      "status_code": 200
    }
  ]
}
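The reordering from score order to index order can be sketched as follows. This is a simplified Python analogue of the transformation, not the connector's actual post-processing code; the function name to_similarity_outputs is hypothetical, and the sketch keeps the scores as Python floats rather than converting them to 32-bit precision:

```python
from typing import Dict, List


def to_similarity_outputs(rerank_results: List[Dict]) -> List[Dict]:
    """Reorder Rerank API results by input index and wrap each score."""
    ordered = sorted(rerank_results, key=lambda item: item["index"])
    return [
        {
            "name": "similarity",
            "data_type": "FLOAT32",
            "shape": [1],
            "data": [item["relevanceScore"]],
        }
        for item in ordered
    ]


# Rerank API output in its default order (highest score first).
rerank_results = [
    {"index": 2, "relevanceScore": 0.7711548724998493},
    {"index": 0, "relevanceScore": 0.0025114635138098534},
    {"index": 1, "relevanceScore": 2.4876490010363496e-05},
    {"index": 3, "relevanceScore": 6.339210403977635e-06},
]

outputs = to_similarity_outputs(rerank_results)
for position, output in enumerate(outputs):
    print(position, output["data"][0])
```

After the reordering, the output at position 2 carries the highest score, matching the third input document.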

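Step 2: Create a reranking pipeline

To create a reranking pipeline, use the following steps.

Step 2.1: Ingest test data

Use the following request to ingest data into your index: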

POST _bulk
{ "index": { "_index": "my-test-data" } }
{ "passage_text" : "Carson City is the capital city of the American state of Nevada." }
{ "index": { "_index": "my-test-data" } }
{ "passage_text" : "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan." }
{ "index": { "_index": "my-test-data" } }
{ "passage_text" : "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district." }
{ "index": { "_index": "my-test-data" } }
{ "passage_text" : "Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states." }

Step 2.2: Create a reranking pipeline

Create a reranking pipeline using the Amazon Bedrock reranking model:

PUT /_search/pipeline/rerank_pipeline_bedrock
{
    "description": "Pipeline for reranking with Bedrock rerank model",
    "response_processors": [
        {
            "rerank": {
                "ml_opensearch": {
                    "model_id": "your_model_id_created_in_step1"
                },
                "context": {
                    "document_fields": ["passage_text"]
                }
            }
        }
    ]
}

If you provide multiple field names in document_fields, the values of all fields are first concatenated, and then reranking is performed.
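As a rough illustration of concatenating field values before reranking, consider the following sketch. The separator (a single space), the function name build_rerank_text, and the title field are all assumptions made for this example; the tutorial does not state how the values are joined:

```python
from typing import Any, Dict, List


def build_rerank_text(source: Dict[str, Any], document_fields: List[str], separator: str = " ") -> str:
    """Join the values of the listed fields into one string for reranking.

    The separator is an assumption made for this sketch.
    """
    values = [str(source[field]) for field in document_fields if field in source]
    return separator.join(values)


# Hypothetical document: the title field is not part of the tutorial's test data.
doc = {"title": "Washington, D.C.", "passage_text": "It is a federal district."}

combined = build_rerank_text(doc, ["title", "passage_text"])
print(combined)
```

The combined string, rather than any single field, is what would be scored against the query text.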

Step 2.3: Test the reranking

First, test the query without using the reranking pipeline:

POST my-test-data/_search
{
  "query": {
    "match": {
      "passage_text": "What is the capital city of America?"
    }
  },
  "highlight": {
    "pre_tags": ["<strong>"],
    "post_tags": ["</strong>"],
    "fields": {"passage_text": {}}
  },
  "_source": false,
  "fields": ["passage_text"]
}

The first document in the response is "Carson City is the capital city of the American state of Nevada.", which is incorrect:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 2.5045562,
    "hits": [
      {
        "_index": "my-test-data",
        "_id": "1",
        "_score": 2.5045562,
        "fields": {
          "passage_text": [
            "Carson City is the capital city of the American state of Nevada."
          ]
        },
        "highlight": {
          "passage_text": [
            "Carson <strong>City</strong> <strong>is</strong> <strong>the</strong> <strong>capital</strong> <strong>city</strong> <strong>of</strong> <strong>the</strong> American state <strong>of</strong> Nevada."
          ]
        }
      },
      {
        "_index": "my-test-data",
        "_id": "2",
        "_score": 0.5807494,
        "fields": {
          "passage_text": [
            "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan."
          ]
        },
        "highlight": {
          "passage_text": [
            "<strong>The</strong> Commonwealth <strong>of</strong> <strong>the</strong> Northern Mariana Islands <strong>is</strong> a group <strong>of</strong> islands in <strong>the</strong> Pacific Ocean.",
            "Its <strong>capital</strong> <strong>is</strong> Saipan."
          ]
        }
      },
      {
        "_index": "my-test-data",
        "_id": "3",
        "_score": 0.5261191,
        "fields": {
          "passage_text": [
            "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district."
          ]
        },
        "highlight": {
          "passage_text": [
            "(also known as simply Washington or D.C., and officially as <strong>the</strong> District <strong>of</strong> Columbia) <strong>is</strong> <strong>the</strong> <strong>capital</strong>",
            "<strong>of</strong> <strong>the</strong> United States.",
            "It <strong>is</strong> a federal district."
          ]
        }
      },
      {
        "_index": "my-test-data",
        "_id": "4",
        "_score": 0.5083029,
        "fields": {
          "passage_text": [
            "Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."
          ]
        },
        "highlight": {
          "passage_text": [
            "<strong>Capital</strong> punishment (<strong>the</strong> death penalty) has existed in <strong>the</strong> United States since beforethe United States",
            "As <strong>of</strong> 2017, <strong>capital</strong> punishment <strong>is</strong> legal in 30 <strong>of</strong> <strong>the</strong> 50 states."
          ]
        }
      }
    ]
  }
}

Next, test the query using the reranking pipeline:

POST my-test-data/_search?search_pipeline=rerank_pipeline_bedrock
{
  "query": {
    "match": {
      "passage_text": "What is the capital city of America?"
    }
  },
  "ext": {
    "rerank": {
      "query_context": {
         "query_text": "What is the capital city of America?"
      }
    }
  },
  "highlight": {
    "pre_tags": ["<strong>"],
    "post_tags": ["</strong>"],
    "fields": {"passage_text": {}}
  },
  "_source": false,
  "fields": ["passage_text"]
}

The first document in the response is "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.", which is correct:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 0.7711549,
    "hits": [
      {
        "_index": "my-test-data",
        "_id": "3",
        "_score": 0.7711549,
        "fields": {
          "passage_text": [
            "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district."
          ]
        },
        "highlight": {
          "passage_text": [
            "(also known as simply Washington or D.C., and officially as <strong>the</strong> District <strong>of</strong> Columbia) <strong>is</strong> <strong>the</strong> <strong>capital</strong>",
            "<strong>of</strong> <strong>the</strong> United States.",
            "It <strong>is</strong> a federal district."
          ]
        }
      },
      {
        "_index": "my-test-data",
        "_id": "1",
        "_score": 0.0025114636,
        "fields": {
          "passage_text": [
            "Carson City is the capital city of the American state of Nevada."
          ]
        },
        "highlight": {
          "passage_text": [
            "Carson <strong>City</strong> <strong>is</strong> <strong>the</strong> <strong>capital</strong> <strong>city</strong> <strong>of</strong> <strong>the</strong> American state <strong>of</strong> Nevada."
          ]
        }
      },
      {
        "_index": "my-test-data",
        "_id": "2",
        "_score": 2.487649e-05,
        "fields": {
          "passage_text": [
            "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan."
          ]
        },
        "highlight": {
          "passage_text": [
            "<strong>The</strong> Commonwealth <strong>of</strong> <strong>the</strong> Northern Mariana Islands <strong>is</strong> a group <strong>of</strong> islands in <strong>the</strong> Pacific Ocean.",
            "Its <strong>capital</strong> <strong>is</strong> Saipan."
          ]
        }
      },
      {
        "_index": "my-test-data",
        "_id": "4",
        "_score": 6.3392104e-06,
        "fields": {
          "passage_text": [
            "Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."
          ]
        },
        "highlight": {
          "passage_text": [
            "<strong>Capital</strong> punishment (<strong>the</strong> death penalty) has existed in <strong>the</strong> United States since beforethe United States",
            "As <strong>of</strong> 2017, <strong>capital</strong> punishment <strong>is</strong> legal in 30 <strong>of</strong> <strong>the</strong> 50 states."
          ]
        }
      }
    ]
  },
  "profile": {
    "shards": []
  }
}

You can reuse the same query by specifying query_text_path instead of query_text:

POST my-test-data/_search?search_pipeline=rerank_pipeline_bedrock
{
  "query": {
    "match": {
      "passage_text": "What is the capital city of America?"
    }
  },
  "ext": {
    "rerank": {
      "query_context": {
         "query_text_path": "query.match.passage_text.query"
      }
    }
  },
  "highlight": {
    "pre_tags": ["<strong>"],
    "post_tags": ["</strong>"],
    "fields": {"passage_text": {}}
  },
  "_source": false,
  "fields": ["passage_text"]
}
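The value of query_text_path is a dot-separated path into the search request. The following sketch shows how such a path can be walked in Python. Because the path ends in .query, the sketch writes the match clause in its expanded form, in which the field maps to an object containing a query key; treating the path as resolved against that expanded form is an assumption of this example, and the function name resolve_path is hypothetical:

```python
from typing import Any, Dict


def resolve_path(body: Dict[str, Any], path: str) -> Any:
    """Follow a dot-separated path through nested dictionaries.

    Raises KeyError if any segment of the path is missing.
    """
    current: Any = body
    for key in path.split("."):
        current = current[key]
    return current


# Match query written in expanded form (assumption; see the note above).
search_request = {
    "query": {
        "match": {
            "passage_text": {"query": "What is the capital city of America?"}
        }
    }
}

query_text = resolve_path(search_request, "query.match.passage_text.query")
print(query_text)
```

Pointing to the query text in this way avoids repeating the same string in both the query clause and the ext.rerank section.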