
Index phrases

The index_phrases mapping parameter determines whether a field’s text is additionally processed to generate phrase tokens. When enabled, the system creates extra tokens representing sequences of exactly two consecutive words (bigrams). This can significantly improve the performance and accuracy of phrase queries. However, it also increases the index size and the time needed to index documents.
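To make the idea of two-word phrase tokens concrete, here is a minimal Python sketch. It assumes a simplified analyzer that only lowercases and splits on whitespace. The function name bigrams is hypothetical, and the real analysis chain used by the search engine may differ.

```python
def bigrams(text: str) -> list[str]:
    """Return two-word tokens built from consecutive words.

    Assumption: tokens are produced by lowercasing and splitting on
    whitespace, which only approximates a real text analyzer.
    """
    tokens = text.lower().split()
    # Pair every token with its immediate successor.
    return [f"{first} {second}" for first, second in zip(tokens, tokens[1:])]


if __name__ == "__main__":
    for token in bigrams("The slow green turtle swims past the whale"):
        print(token)
```

A sentence of n words yields n - 1 such tokens, which illustrates why enabling the parameter increases index size.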

By default, index_phrases is set to false to keep the index leaner and document ingestion faster.

Enabling index phrases on a field

The following example creates an index named blog in which the content field is configured with index_phrases:

PUT /blog
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "index_phrases": true
      }
    }
  }
}

Index a document using the following request:

PUT /blog/_doc/1
{
  "content": "The slow green turtle swims past the whale"
}

Run a match_phrase query using the following search request:

POST /blog/_search
{
  "query": {
    "match_phrase": {
      "content": "slow green"
    }
  }
}

The query returns the stored document:

{
  "took": 25,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "blog",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "content": "The slow green turtle swims past the whale"
        }
      }
    ]
  }
}

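Although the same hit is returned when the index_phrases mapping parameter is not provided, using this parameter ensures that the query does the following:

  • Uses the .index_phrases field internally.
  • Matches pre-tokenized bigrams, such as “slow green”, “green turtle”, or “turtle swims”.
  • Bypasses position lookups, which makes it faster, especially at scale.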
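
The difference between the two lookup strategies can be sketched in Python. This is a toy in-memory model under stated assumptions, not the engine's implementation: the tokenizer is a plain lowercase whitespace split, and all function names and the second sample document are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace split stands in for a real analyzer.
    return text.lower().split()


def build_positional_index(docs):
    """Map term -> doc ID -> list of positions (the default phrase path)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, token in enumerate(tokenize(text)):
            index[token][doc_id].append(position)
    return index


def build_bigram_index(docs):
    """Map each two-word token -> set of doc IDs (the index_phrases idea)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        for first, second in zip(tokens, tokens[1:]):
            index[f"{first} {second}"].add(doc_id)
    return index


def phrase_by_positions(index, first, second):
    """Find docs where `second` occurs directly after `first`."""
    hits = set()
    for doc_id, positions in index.get(first, {}).items():
        second_positions = set(index.get(second, {}).get(doc_id, []))
        # Position check: this per-document work is what bigrams avoid.
        if any(pos + 1 in second_positions for pos in positions):
            hits.add(doc_id)
    return hits


def phrase_by_bigram(index, first, second):
    """A two-word phrase becomes a single term lookup."""
    return set(index.get(f"{first} {second}", set()))


docs = {
    "1": "The slow green turtle swims past the whale",
    "2": "A green and slow river",  # hypothetical non-matching document
}
positional = build_positional_index(docs)
bigram = build_bigram_index(docs)

if __name__ == "__main__":
    print(phrase_by_positions(positional, "slow", "green"))
    print(phrase_by_bigram(bigram, "slow", "green"))
```

Both functions return the same documents. The bigram version trades a larger index for a single term lookup with no per-document position comparison.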