Term-level and full-text queries compared

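You can use both term-level and full-text queries to search text, but term-level queries are usually used to search structured data, whereas full-text queries are used for full-text search. The main difference between the two is that term-level queries search documents for an exact specified term, whereas full-text queries analyze the query string. The following table summarizes the differences between term-level and full-text queries.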

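	Term-level queries	Full-text queries
Description	Term-level queries answer which documents match a query.	Full-text queries answer how well the documents match a query.
Analyzer	The search term isn't analyzed. This means that a term query searches for your search term as it is.	The search term is analyzed by the same analyzer that was used for the specific document field at index time. This means that your search term goes through the same analysis process as the document's field.
Relevance	Term-level queries simply return the documents that match, without sorting them by relevance score. They still calculate a relevance score, but this score is the same for all returned documents.	Full-text queries calculate a relevance score for each match and sort the results in decreasing order of relevance.
Use case	Use term-level queries when you want to match exact values, such as numbers, dates, or tags, and don't need the matches sorted by relevance.	Use full-text queries to match text fields and sort by relevance after taking into account factors like casing and stemming variants.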

OpenSearch uses the BM25 ranking algorithm to calculate relevance scores. To learn more, see Okapi BM25.
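To give a feel for what a BM25 relevance score rewards, here is a minimal sketch of the textbook Okapi BM25 formula. It is an illustration only and is not claimed to reproduce OpenSearch's scores: the function name `bm25_score`, the tiny pre-tokenized corpus, and the parameter values `k1=1.2` and `b=0.75` (commonly used defaults) are assumptions made for this example.

```python
import math

# Tiny pre-tokenized corpus, used only for illustration.
corpus = [
    ["to", "be", "or", "not", "to", "be"],
    ["ay", "madam", "it", "is", "common"],
    ["not", "so", "my", "lord"],
]

def bm25_score(query_tokens, doc_tokens, corpus, k1=1.2, b=0.75):
    """Textbook Okapi BM25 score of one document for a tokenized query."""
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    score = 0.0
    for term in set(query_tokens):
        # Document frequency: how many documents contain the term.
        df = sum(1 for doc in corpus if term in doc)
        if df == 0:
            continue
        # Rarer terms get a higher inverse document frequency.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Term frequency, dampened by k1 and normalized by document length.
        tf = doc_tokens.count(term)
        norm = tf + k1 * (1 - b + b * len(doc_tokens) / avg_len)
        score += idf * tf * (k1 + 1) / norm
    return score

for doc in corpus:
    print(doc, round(bm25_score(["not", "be"], doc, corpus), 3))
```

In this sketch, a document that contains none of the query tokens scores 0, and of two documents that contain a token equally often, the shorter one scores higher.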

Should I use a full-text or a term-level query?

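To clarify the difference between full-text and term-level queries, consider the following two examples that search for a specific text phrase. The complete works of Shakespeare are indexed in an OpenSearch cluster.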

Example: Phrase search

In this example, you'll search the complete works of Shakespeare for the phrase "To be, or not to be" in the text_entry field.

First, use a term-level query for this search:

GET shakespeare/_search
{
  "query": {
    "term": {
      "text_entry": "To be, or not to be"
    }
  }
}

The response contains no matches, indicated by zero hits:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

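This is because the term "To be, or not to be" is searched literally in the inverted index, where only the analyzed values of text fields are stored. Term-level queries aren't suited for searching analyzed text fields because they often yield unexpected results. When working with text data, use term-level queries only for fields mapped as keyword.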
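The following sketch illustrates why the literal lookup finds nothing. It builds a toy inverted index in Python; the `analyze` function is a simplified stand-in (lowercase, split on non-alphanumeric characters) and not OpenSearch's actual analyzer, and `term_lookup` is a hypothetical helper that mimics looking up an unanalyzed term.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: lowercase, then split on anything that is not a-z or 0-9."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]

# Two lines taken from the example responses on this page.
docs = {
    "34229": "To be, or not to be: that is the question:",
    "32709": "Ay, madam, it is common.",
}

# The inverted index maps each analyzed token to the documents containing it.
inverted_index = defaultdict(set)
for doc_id, text in docs.items():
    for token in analyze(text):
        inverted_index[token].add(doc_id)

def term_lookup(value):
    """Look up the value exactly as given, without analyzing it."""
    return inverted_index.get(value, set())

print(sorted(inverted_index))              # only single lowercase tokens are stored
print(term_lookup("To be, or not to be"))  # the whole phrase is not a key in the index
print(term_lookup("be"))                   # a single analyzed token is
```

The index holds tokens such as `to`, `be`, and `not`, so the unanalyzed phrase never appears as a key, which mirrors the zero hits in the response above.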

Now search for the same phrase using a full-text query:

GET shakespeare/_search
{
  "query": {
    "match": {
      "text_entry": "To be, or not to be"
    }
  }
}

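The search query "To be, or not to be" is analyzed and tokenized into an array of tokens, just like the text_entry field of the documents. The full-text query takes an intersection of tokens between the search query and the text_entry fields of all the documents and then sorts the results by relevance score: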

{
  "took" : 19,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 17.419369,
    "hits" : [
      {
        "_index" : "shakespeare",
        "_id" : "34229",
        "_score" : 17.419369,
        "_source" : {
          "type" : "line",
          "line_id" : 34230,
          "play_name" : "Hamlet",
          "speech_number" : 19,
          "line_number" : "3.1.64",
          "speaker" : "HAMLET",
          "text_entry" : "To be, or not to be: that is the question:"
        }
      },
      {
        "_index" : "shakespeare",
        "_id" : "109930",
        "_score" : 14.883024,
        "_source" : {
          "type" : "line",
          "line_id" : 109931,
          "play_name" : "A Winters Tale",
          "speech_number" : 23,
          "line_number" : "4.4.153",
          "speaker" : "PERDITA",
          "text_entry" : "Not like a corse; or if, not to be buried,"
        }
      },
      {
        "_index" : "shakespeare",
        "_id" : "103117",
        "_score" : 14.782743,
        "_source" : {
          "type" : "line",
          "line_id" : 103118,
          "play_name" : "Twelfth Night",
          "speech_number" : 53,
          "line_number" : "1.3.95",
          "speaker" : "SIR ANDREW",
          "text_entry" : "will not be seen; or if she be, its four to one"
        }
      }
    ]
  }
}
...
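The matching step described above can be approximated with a short Python sketch. It assumes a toy analyzer and a deliberately naive score (the number of times the query's tokens occur in a line) in place of BM25, so it shows only the idea of analyzing the query, keeping documents that share tokens with it, and ranking them. The names `analyze` and `toy_match` are hypothetical.

```python
import re
from collections import Counter

def analyze(text):
    """Toy analyzer: lowercase, then split on anything that is not a-z or 0-9."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]

# Lines taken from the example responses on this page.
lines = {
    "34229": "To be, or not to be: that is the question:",
    "109930": "Not like a corse; or if, not to be buried,",
    "32709": "Ay, madam, it is common.",
}

def toy_match(query):
    """Return IDs of lines sharing at least one token with the query, best first."""
    query_tokens = set(analyze(query))
    scored = []
    for doc_id, text in lines.items():
        counts = Counter(analyze(text))
        # Naive score: total occurrences of the query's tokens in this line.
        score = sum(counts[token] for token in query_tokens)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties are broken by document ID.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

print(toy_match("To be, or not to be"))
```

Even with this crude score, the Hamlet line ranks ahead of the line from A Winters Tale, and the line that shares no tokens with the query is left out.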

For a list of all full-text queries, see Full-text queries.

Example: Exact term search

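If you want to search for an exact term like "HAMLET" in the speaker field and don't need the results sorted by relevance score, a term-level query is more efficient: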

GET shakespeare/_search
{
  "query": {
    "term": {
      "speaker": "HAMLET"
    }
  }
}

The response contains document matches:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1582,
      "relation" : "eq"
    },
    "max_score" : 4.2540946,
    "hits" : [
      {
        "_index" : "shakespeare",
        "_id" : "32700",
        "_score" : 4.2540946,
        "_source" : {
          "type" : "line",
          "line_id" : 32701,
          "play_name" : "Hamlet",
          "speech_number" : 9,
          "line_number" : "1.2.66",
          "speaker" : "HAMLET",
          "text_entry" : "[Aside]  A little more than kin, and less than kind."
        }
      },
      {
        "_index" : "shakespeare",
        "_id" : "32702",
        "_score" : 4.2540946,
        "_source" : {
          "type" : "line",
          "line_id" : 32703,
          "play_name" : "Hamlet",
          "speech_number" : 11,
          "line_number" : "1.2.68",
          "speaker" : "HAMLET",
          "text_entry" : "Not so, my lord; I am too much i' the sun."
        }
      },
      {
        "_index" : "shakespeare",
        "_id" : "32709",
        "_score" : 4.2540946,
        "_source" : {
          "type" : "line",
          "line_id" : 32710,
          "play_name" : "Hamlet",
          "speech_number" : 13,
          "line_number" : "1.2.75",
          "speaker" : "HAMLET",
          "text_entry" : "Ay, madam, it is common."
        }
      }
    ]
  }
}
...

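Term-level queries provide exact matches. So if you search for "Hamlet", you don't receive any matches: speaker is a keyword field, so the value "HAMLET" is stored in OpenSearch literally and not in an analyzed form. The search term is also searched literally. So to get a match on this field, you need to enter exactly the same characters.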
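As a rough illustration of this exact-match behavior, the sketch below models a keyword field as a Python dictionary keyed by the stored value, with no analysis on either side. The helper `term_query` is hypothetical, and the document IDs and speakers are taken from the example responses on this page.

```python
from collections import defaultdict

# speaker values stored exactly as written, with no lowercasing or tokenizing.
speakers = {
    "32700": "HAMLET",
    "32702": "HAMLET",
    "32709": "HAMLET",
    "109930": "PERDITA",
}

keyword_index = defaultdict(set)
for doc_id, speaker in speakers.items():
    keyword_index[speaker].add(doc_id)

def term_query(value):
    """Return IDs whose stored value equals the search value character for character."""
    return sorted(keyword_index.get(value, set()))

print(term_query("HAMLET"))  # matches: same characters, same case
print(term_query("Hamlet"))  # no matches: the case differs
```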
