Index phrases
The index_phrases mapping parameter determines whether a field's text is additionally processed to generate phrase tokens. When enabled, the system creates extra tokens representing sequences of exactly two consecutive words (bigrams). This can significantly improve the performance and accuracy of phrase queries. However, it also increases the index size and the time needed to index documents.
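To make the bigram idea concrete, the following Python sketch shows which two-word tokens would be derived from a sentence. It is only an illustration: the `analyze` and `bigram_tokens` helpers are hypothetical names, and the regular-expression tokenizer is a simplified stand-in for the field's real analyzer, not the engine's implementation.

```python
import re


def analyze(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a simplified stand-in for the field's analyzer.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def bigram_tokens(text):
    """Return one phrase token for every pair of consecutive words."""
    words = analyze(text)
    return [f"{first} {second}" for first, second in zip(words, words[1:])]


tokens = bigram_tokens("The slow green turtle swims past the whale")
print(tokens)
# ['the slow', 'slow green', 'green turtle', 'turtle swims',
#  'swims past', 'past the', 'the whale']
```

An eight-word sentence yields seven extra tokens, which is why enabling the parameter increases both index size and indexing time.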
By default, index_phrases is set to false to keep the index leaner and document ingestion faster.
Enabling index phrases on a field
The following example creates an index named blog in which the content field is configured with index_phrases:
PUT /blog
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "index_phrases": true
      }
    }
  }
}
Index a document using the following request:
PUT /blog/_doc/1
{
  "content": "The slow green turtle swims past the whale"
}
Run a match_phrase query using the following search request:
POST /blog/_search
{
  "query": {
    "match_phrase": {
      "content": "slow green"
    }
  }
}
The query returns the stored document:
{
  "took": 25,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "blog",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "content": "The slow green turtle swims past the whale"
        }
      }
    ]
  }
}
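Although the same hit is returned when the index_phrases mapping parameter is not provided, using this parameter ensures that the query does the following:
- Uses the .index_phrases field internally.
- Matches pre-tokenized bigrams such as "slow green", "green turtle", or "turtle swims".
- Bypasses position lookups, which makes it faster, especially at scale.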
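The difference between the two execution paths can be sketched with a toy inverted index in Python. This is a conceptual model under stated assumptions, not the engine's actual code: `build_indexes`, `phrase_via_positions`, and `phrase_via_bigrams` are hypothetical helpers, the tokenizer is a plain lowercase whitespace split, and the second document is invented for contrast.

```python
from collections import defaultdict


def build_indexes(docs):
    """Build a positional term index and a bigram index for the given documents."""
    positions = defaultdict(lambda: defaultdict(list))  # term -> doc_id -> [positions]
    phrases = defaultdict(set)                          # "w1 w2" -> {doc_id}
    for doc_id, text in docs.items():
        words = text.lower().split()  # assumption: simplified analyzer
        for pos, word in enumerate(words):
            positions[word][doc_id].append(pos)
        for first, second in zip(words, words[1:]):
            phrases[f"{first} {second}"].add(doc_id)
    return positions, phrases


def phrase_via_positions(positions, first, second):
    """Without phrase tokens: look up both terms, then compare their positions."""
    hits = set()
    for doc_id, first_positions in positions.get(first, {}).items():
        second_positions = set(positions.get(second, {}).get(doc_id, []))
        if any(pos + 1 in second_positions for pos in first_positions):
            hits.add(doc_id)
    return hits


def phrase_via_bigrams(phrases, first, second):
    """With phrase tokens: a single term lookup, with no position comparison."""
    return set(phrases.get(f"{first} {second}", set()))


docs = {
    1: "The slow green turtle swims past the whale",
    2: "green is slow to fade",  # hypothetical second document
}
positions, phrases = build_indexes(docs)
print(phrase_via_positions(positions, "slow", "green"))  # {1}
print(phrase_via_bigrams(phrases, "slow", "green"))      # {1}
```

Both paths return the same document, which mirrors the statement above that the hit does not change. The bigram path simply replaces the per-document position comparison with one dictionary lookup, at the cost of storing the extra `phrases` structure.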