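Explain

Introduced 1.0

Wondering why a specific document ranks higher (or lower) for a particular query? You can use the explain API to get an explanation of how the relevance score (_score) is calculated for each result.

OpenSearch uses a probabilistic ranking framework called Okapi BM25 to calculate relevance scores. Okapi BM25 is based on the original TF/IDF framework used by Apache Lucene.

The explain API is an expensive operation in terms of both resources and time. On production clusters, we recommend using it sparingly and only for troubleshooting.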

Endpoints

GET <index>/_explain/<id>
POST <index>/_explain/<id>

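Path parameters

Parameter	Type	Description	Required
`<index>`	String	The name of the index. You can specify only one index.	Yes
`<id>`	String	The unique identifier of the document to explain.	Yes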

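Query parameters

You must specify the index and document ID. All other parameters are optional.

Parameter	Type	Description	Required
`analyzer`	String	The analyzer to use for the `q` query string. Only valid when `q` is used.	No
`analyze_wildcard`	Boolean	Whether to analyze wildcard and prefix queries in the `q` string. Only valid when `q` is used. Default is `false`.	No
`default_operator`	String	The default Boolean operator (`AND` or `OR`) to use for the `q` query string. Only valid when `q` is used. Default is `OR`.	No
`df`	String	The default field to search if no field is specified in the `q` string. Only valid when `q` is used.	No
`lenient`	Boolean	Specifies whether OpenSearch should ignore format-based query failures (for example, querying a text field for an integer). Default is `false`.	No
`preference`	String	Specifies a preference for which shard to retrieve results from. Available options are `_local`, which tells the operation to retrieve results from a locally allocated shard replica, and a custom string value assigned to a specific shard replica. By default, OpenSearch executes the explain operation on a random shard.	No
`q`	String	A query string in Lucene syntax. When used, you can configure query behavior using the `analyzer`, `analyze_wildcard`, `default_operator`, `df`, and `stored_fields` parameters.	No
`stored_fields`	String	A comma-separated list of stored fields to return. If omitted, only `_source` is returned.	No
`routing`	String	A value used to route the operation to a specific shard.	No
`_source`	String	Whether to include the `_source` field in the response body. Default is `true`.	No
`_source_excludes`	String	A comma-separated list of source fields to exclude from the query response.	No
`_source_includes`	String	A comma-separated list of source fields to include in the query response.	No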
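
The query parameters above are passed in the URL. As a minimal sketch of how they combine, the following Python snippet assembles an explain path that uses `q` together with `df` and `default_operator`. The helper name `build_explain_path` is hypothetical and not part of any OpenSearch client; the index and document ID are borrowed from the sample data used elsewhere on this page.

```python
from urllib.parse import urlencode


def build_explain_path(index, doc_id, **params):
    """Return the explain request path, with any supplied query parameters.

    Hypothetical helper for illustration only; not an OpenSearch client API.
    """
    path = f"/{index}/_explain/{doc_id}"
    # Skip parameters that were not supplied so the URL stays minimal.
    supplied = {k: v for k, v in params.items() if v is not None}
    # Render Python booleans as lowercase "true"/"false" in the query string.
    encoded = {
        k: str(v).lower() if isinstance(v, bool) else v
        for k, v in supplied.items()
    }
    return f"{path}?{urlencode(encoded)}" if encoded else path


path = build_explain_path(
    "opensearch_dashboards_sample_data_ecommerce",
    "EVz1Q3sBgg5eWQP6RSte",
    q="Mary",
    df="customer_first_name",
    default_operator="AND",
)
print(path)
```

Because `df` and `default_operator` only take effect when `q` is present, a helper like this is mainly useful for quick checks that do not need a full request body.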

Example requests

To see the explain output for all results, set the explain flag to true either in the URL or in the body of the request.

POST opensearch_dashboards_sample_data_ecommerce/_search?explain=true
{
  "query": {
    "match": {
      "customer_first_name": "Mary"
    }
  }
}

More often, you want the output for a single document. In that case, specify the document ID in the URL.

POST opensearch_dashboards_sample_data_ecommerce/_explain/EVz1Q3sBgg5eWQP6RSte
{
  "query": {
    "match": {
      "customer_first_name": "Mary"
    }
  }
}
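
For readers who call the API from a script rather than a console, the single-document request above could be prepared roughly as follows. This is a sketch using only the Python standard library; the function name `make_explain_request` is hypothetical, and the base URL `http://localhost:9200` assumes a local cluster without security enabled, which will not match most real deployments.

```python
import json
from urllib.request import Request


def make_explain_request(base_url, index, doc_id, field, text):
    """Build (but do not send) a POST _explain request for a match query.

    Hypothetical helper for illustration only.
    """
    body = {"query": {"match": {field: text}}}
    return Request(
        url=f"{base_url}/{index}/_explain/{doc_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = make_explain_request(
    "http://localhost:9200",  # assumption: local, unsecured cluster
    "opensearch_dashboards_sample_data_ecommerce",
    "EVz1Q3sBgg5eWQP6RSte",
    "customer_first_name",
    "Mary",
)
# To send it, pass `req` to urllib.request.urlopen() and parse the
# response body with json.load().
```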

Example response

{
  "_index" : "opensearch_dashboards_sample_data_ecommerce",
  "_id" : "EVz1Q3sBgg5eWQP6RSte",
  "matched" : true,
  "explanation" : {
    "value" : 3.5671005,
    "description" : "weight(customer_first_name:mary in 1) [PerFieldSimilarity], result of:",
    "details" : [
      {
        "value" : 3.5671005,
        "description" : "score(freq=1.0), computed as boost * idf * tf from:",
        "details" : [
          {
            "value" : 2.2,
            "description" : "boost",
            "details" : [ ]
          },
          {
            "value" : 3.4100041,
            "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
            "details" : [
              {
                "value" : 154,
                "description" : "n, number of documents containing term",
                "details" : [ ]
              },
              {
                "value" : 4675,
                "description" : "N, total number of documents with field",
                "details" : [ ]
              }
            ]
          },
          {
            "value" : 0.47548598,
            "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
            "details" : [
              {
                "value" : 1.0,
                "description" : "freq, occurrences of term within document",
                "details" : [ ]
              },
              {
                "value" : 1.2,
                "description" : "k1, term saturation parameter",
                "details" : [ ]
              },
              {
                "value" : 0.75,
                "description" : "b, length normalization parameter",
                "details" : [ ]
              },
              {
                "value" : 1.0,
                "description" : "dl, length of field",
                "details" : [ ]
              },
              {
                "value" : 1.1206417,
                "description" : "avgdl, average length of field",
                "details" : [ ]
              }
            ]
          }
        ]
      }
    ]
  }
}
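
The numbers in this response can be checked by hand. The sketch below re-evaluates the two formulas quoted in the `description` strings (`idf` and `tf`) with the values from the `details` arrays, then multiplies them by the boost. The function names are illustrative only. Lucene computes these values in single precision, so the results should agree with the response only to several decimal places.

```python
import math


def bm25_idf(n, N):
    """idf = log(1 + (N - n + 0.5) / (n + 0.5)), as quoted in the response."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def bm25_tf(freq, k1, b, dl, avgdl):
    """tf = freq / (freq + k1 * (1 - b + b * dl / avgdl)), as quoted in the response."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


# Values copied from the "details" arrays in the example response.
idf = bm25_idf(n=154, N=4675)
tf = bm25_tf(freq=1.0, k1=1.2, b=0.75, dl=1.0, avgdl=1.1206417)
boost = 2.2

score = boost * idf * tf
print(f"idf={idf:.4f} tf={tf:.4f} score={score:.4f}")
```

The product comes out at roughly 3.567, in line with the top-level `value` of 3.5671005.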

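Response body fields

Field	Description
`matched`	Indicates whether the document matches the query.
`explanation`	The `explanation` object has three properties: `value`, `description`, and `details`. `value` shows the result of the calculation, `description` explains what type of calculation was performed, and `details` shows any subcalculations that were performed.
Term frequency (`tf`)	The number of times the term appears in a field of a given document. The more often the term occurs, the higher the relevance score.
Inverse document frequency (`idf`)	How often the term appears within the index (across all documents). The more often the term occurs, the lower the relevance score.
Field normalization factor (`fieldNorm`)	The length of the field. OpenSearch assigns a higher relevance score to a term that appears in a relatively short field.

The tf, idf, and fieldNorm values are calculated and stored at index time, when a document is added or updated. These values can contain some (typically small) inaccuracies because they are based on summing the samples returned from each shard.

Individual queries include other factors for calculating the relevance score, such as term proximity, fuzziness, and so on.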
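
Because each `explanation` node nests further nodes under `details`, the whole tree can be flattened with a short recursive walk. The following is a minimal sketch; `walk_explanation` is a hypothetical helper, and `sample` is a trimmed-down dictionary shaped like the example response rather than real output from a cluster.

```python
def walk_explanation(node, depth=0):
    """Yield (depth, value, description) for a node and all nested details."""
    yield depth, node["value"], node["description"]
    for child in node.get("details", []):
        yield from walk_explanation(child, depth + 1)


# Trimmed-down sample shaped like the "explanation" object in the response.
sample = {
    "value": 3.5671005,
    "description": "weight(customer_first_name:mary in 1) [PerFieldSimilarity], result of:",
    "details": [
        {
            "value": 3.5671005,
            "description": "score(freq=1.0), computed as boost * idf * tf from:",
            "details": [
                {"value": 2.2, "description": "boost", "details": []},
            ],
        },
    ],
}

rows = list(walk_explanation(sample))
for depth, value, description in rows:
    # Indent each line by its nesting depth to mirror the tree structure.
    print("  " * depth + f"{value}  {description}")
```

An indented listing like this is often easier to scan than the raw JSON when comparing the score breakdown of two documents.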
