Using inner hits in hybrid queries
Introduced 3.0
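When you run a hybrid search, you can retrieve the matching nested objects or child documents by including an inner_hits clause in your search request. This information lets you explore the specific parts of a document that match the query.
To learn more about how inner_hits works, see Retrieve inner hits.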
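During hybrid query execution, documents are scored and retrieved as follows:
- Each subquery selects parent documents based on the relevance of their inner hits.
- The parent documents selected by all subqueries are merged, and their scores are normalized to produce a hybrid score.
- For each parent document, the relevant inner_hits are retrieved from the shards and included in the final response.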
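Hybrid queries handle inner hits differently than traditional queries when determining the final search results:
- In a **traditional query**, the final ranking of parent documents is determined directly by the inner_hits scores.
- In a **hybrid query**, the final ranking is determined by the **hybrid score** (the normalized combination of all subquery scores). However, parent documents are still fetched from the shards based on the relevance of their inner_hits.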
The inner_hits section of the response shows the raw scores before normalization. Parent documents show the final hybrid score.
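The relationship between raw subquery scores and the hybrid score can be sketched in a few lines of Python. This is a simplified illustration using textbook min-max scaling and a plain arithmetic mean, not the processor's actual implementation: the function names and the raw scores are hypothetical, and the handling of a single or constant score (mapped to 1.0) and of unmatched documents (counted as 0.0) are assumptions. The sample response later on this page shows a small nonzero score for the lowest-ranked document, so the real processor evidently treats the minimum differently.

```python
def min_max(scores):
    """Scale one subquery's parent-document scores into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: a lone (or constant) score maps to 1.0.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def arithmetic_mean(normalized):
    """Average each document's normalized scores across all subqueries.

    Assumption: a document that a subquery did not match contributes 0.0.
    """
    doc_ids = set().union(*normalized)
    return {
        doc_id: sum(scores.get(doc_id, 0.0) for scores in normalized) / len(normalized)
        for doc_id in doc_ids
    }


# Hypothetical raw scores per subquery, keyed by parent document ID.
user_scores = {"1": 0.44, "2": 0.32}
location_scores = {"1": 0.45}

hybrid = arithmetic_mean([min_max(user_scores), min_max(location_scores)])
ranking = sorted(hybrid, key=hybrid.get, reverse=True)
print(ranking, hybrid)
```

The raw values (0.44, 0.32, 0.45) correspond to what inner_hits would report, while the values in hybrid correspond to the parent documents' final scores.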
Example
The following example demonstrates using inner_hits in a hybrid query.
Step 1: Create an index
Create an index with two nested fields (user and location):
PUT /my-nlp-index
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"user": {
"type": "nested",
"properties": {
"name": {
"type": "text"
},
"age": {
"type": "integer"
}
}
},
"location": {
"type": "nested",
"properties": {
"city": {
"type": "text"
},
"state": {
"type": "text"
}
}
}
}
}
}
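Step 2: Create a search pipeline
Configure a search pipeline with a normalization-processor that uses the min_max normalization technique and the arithmetic_mean combination technique: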
PUT /_search/pipeline/nlp-search-pipeline
{
"description": "Post processor for hybrid search",
"phase_results_processors": [
{
"normalization-processor": {
"normalization": {
"technique": "min_max"
},
"combination": {
"technique": "arithmetic_mean",
"parameters": {}
}
}
}
]
}
Step 3: Ingest documents into the index
To ingest documents into the index created in Step 1, send the following request:
POST /my-nlp-index/_bulk
{"index": {"_index": "my-nlp-index", "_id": "1"}}
{"user":[{"name":"John Alder","age":35},{"name":"Sammy","age":34},{"name":"Mike","age":32},{"name":"Maples","age":30}],"location":[{"city":"Amsterdam","state":"Netherlands"},{"city":"Udaipur","state":"Rajasthan"},{"city":"Naples","state":"Italy"}]}
{"index": {"_index": "my-nlp-index", "_id": "2"}}
{"user":[{"name":"John Wick","age":46},{"name":"John Snow","age":40},{"name":"Sansa Stark","age":22},{"name":"Arya Stark","age":20}],"location":[{"city":"Tromso","state":"Norway"},{"city":"Los Angeles","state":"California"},{"city":"London","state":"UK"}]}
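If the documents are held in application code rather than typed by hand, the bulk body can be generated. The following is a minimal sketch using only the Python standard library. The helper name bulk_body is hypothetical, the two documents are shortened versions of the ones above, and the explicit _id values are a choice made here so that the IDs in later responses are predictable.

```python
import json


def bulk_body(index, docs_by_id):
    """Build a newline-delimited bulk body: one action line, then one document line."""
    lines = []
    for doc_id, doc in docs_by_id.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


# Shortened versions of the sample documents, for illustration only.
documents = {
    "1": {
        "user": [{"name": "John Alder", "age": 35}],
        "location": [{"city": "Udaipur", "state": "Rajasthan"}],
    },
    "2": {
        "user": [{"name": "John Wick", "age": 46}],
        "location": [{"city": "Tromso", "state": "Norway"}],
    },
}

body = bulk_body("my-nlp-index", documents)
print(body, end="")
```

The resulting string can be sent as the body of the POST /my-nlp-index/_bulk request with any HTTP client.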
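Step 4: Search the index using hybrid search and fetch inner hits
The following request runs a hybrid query that searches for matches in two nested fields, user and location. It combines the results from each field into a single ranked list of parent documents while using inner_hits to retrieve the matching nested objects: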
GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
{
"query": {
"hybrid": {
"queries": [
{
"nested": {
"path": "user",
"query": {
"match": {
"user.name": "John"
}
},
"score_mode": "sum",
"inner_hits": {}
}
},
{
"nested": {
"path": "location",
"query": {
"match": {
"location.city": "Udaipur"
}
},
"inner_hits": {}
}
}
]
}
}
}
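The response includes the matching parent documents along with the relevant inner_hits for the user and location nested fields. Each inner hit shows which nested object matched and its raw, pre-normalization score, which indicates how strongly it contributed to the overall hybrid score: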
...
{
"hits": [
{
"_index": "my-nlp-index",
"_id": "1",
"_score": 1.0,
"inner_hits": {
"location": {
"hits": {
"max_score": 0.44583148,
"hits": [
{
"_nested": {
"field": "location",
"offset": 1
},
"_score": 0.44583148,
"_source": {
"city": "Udaipur",
"state": "Rajasthan"
}
}
]
}
},
"user": {
"hits": {
"max_score": 0.4394061,
"hits": [
{
"_nested": {
"field": "user",
"offset": 0
},
"_score": 0.4394061,
"_source": {
"name": "John Alder",
"age": 35
}
}
]
}
}
}
// Additional details omitted for brevity
},
{
"_index": "my-nlp-index",
"_id": "2",
"_score": 5.0E-4,
"inner_hits": {
"user": {
"hits": {
"max_score": 0.31506687,
"hits": [
{
"_nested": {
"field": "user",
"offset": 0
},
"_score": 0.31506687,
"_source": {
"name": "John Wick",
"age": 46
}
},
{
"_nested": {
"field": "user",
"offset": 1
},
"_score": 0.31506687,
"_source": {
"name": "John Snow",
"age": 40
}
}
]
}
}
// Additional details omitted for brevity
}
}
]
// Additional details omitted for brevity
}
...
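On the client side, it is often convenient to flatten such a response into one record per parent document. The sketch below assumes the response has already been parsed into Python dictionaries and that the list of parent hits has been extracted from it. The helper name inner_hits_by_parent and the trimmed sample data are illustrative only.

```python
def inner_hits_by_parent(hits):
    """Map each parent _id to its hybrid score and its inner hits, grouped by name."""
    summary = {}
    for hit in hits:
        per_name = {}
        for name, block in hit.get("inner_hits", {}).items():
            per_name[name] = [
                (inner["_nested"]["offset"], inner["_score"], inner["_source"])
                for inner in block["hits"]["hits"]
            ]
        summary[hit["_id"]] = {"hybrid_score": hit["_score"], "inner_hits": per_name}
    return summary


# Trimmed copy of the first parent hit from the response above.
sample_hits = [
    {
        "_id": "1",
        "_score": 1.0,
        "inner_hits": {
            "location": {
                "hits": {
                    "hits": [
                        {
                            "_nested": {"field": "location", "offset": 1},
                            "_score": 0.44583148,
                            "_source": {"city": "Udaipur", "state": "Rajasthan"},
                        }
                    ]
                }
            }
        },
    }
]

summary = inner_hits_by_parent(sample_hits)
print(summary["1"])
```

Keeping the parent's hybrid score next to the raw inner hit scores makes it easy to see that the two are on different scales.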
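Using the explain parameter
To understand how inner hits contribute to the hybrid score, you can enable explanation. The response will then contain detailed scoring information. For more information about using explain with hybrid queries, see Hybrid search explain.
explain is an expensive operation in terms of both resources and time. On production clusters, we recommend using it sparingly and only for troubleshooting.
First, add the hybrid_score_explanation processor to the search pipeline you created in Step 2: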
PUT /_search/pipeline/nlp-search-pipeline
{
"description": "Post processor for hybrid search",
"phase_results_processors": [
{
"normalization-processor": {
"normalization": {
"technique": "min_max"
},
"combination": {
"technique": "arithmetic_mean"
}
}
}
],
"response_processors": [
{
"hybrid_score_explanation": {}
}
]
}
Then run the same query as in Step 4, adding the explain parameter to the search request:
GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline&explain=true
{
"query": {
"hybrid": {
"queries": [
{
"nested": {
"path": "user",
"query": {
"match": {
"user.name": "John"
}
},
"score_mode": "sum",
"inner_hits": {}
}
},
{
"nested": {
"path": "location",
"query": {
"match": {
"location.city": "Udaipur"
}
},
"inner_hits": {}
}
}
]
}
}
}
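The response contains an _explanation object with detailed scoring information. The nested details array describes the score mode used, the number of child documents that contributed to the parent document's score, and how the scores were normalized and combined: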
{
...
"_explanation": {
"value": 1.0,
"description": "arithmetic_mean combination of:",
"details": [
{
"value": 1.0,
"description": "min_max normalization of:",
"details": [
{
"value": 0.4458314776420593,
"description": "combined score of:",
"details": [
{
"value": 0.4394061,
"description": "Score based on 1 child docs in range from 0 to 6, using score mode Avg",
"details": [
{
"value": 0.4394061,
"description": "weight(user.name:john in 0) [PerFieldSimilarity], result of:"
// Additional details omitted for brevity
}
]
},
{
"value": 0.44583148,
"description": "Score based on 1 child docs in range from 0 to 6, using score mode Avg",
"details": [
{
"value": 0.44583148,
"description": "weight(location.city:udaipur in 5) [PerFieldSimilarity], result of:"
// Additional details omitted for brevity
}
]
}
]
}
]
}
]
}
}
...
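Because the _explanation object is a tree of value, description, and details entries, it can be easier to read once flattened into indented lines. The following sketch walks such a tree recursively. The helper name flatten_explanation is hypothetical, and the sample is a trimmed copy of the top three levels shown above.

```python
def flatten_explanation(node, depth=0):
    """Return (depth, value, description) rows for a node and all of its details."""
    rows = [(depth, node["value"], node["description"])]
    for child in node.get("details", []):
        rows.extend(flatten_explanation(child, depth + 1))
    return rows


# Trimmed copy of the explanation above (top three levels only).
explanation = {
    "value": 1.0,
    "description": "arithmetic_mean combination of:",
    "details": [
        {
            "value": 1.0,
            "description": "min_max normalization of:",
            "details": [
                {"value": 0.4458314776420593, "description": "combined score of:"}
            ],
        }
    ],
}

rows = flatten_explanation(explanation)
for depth, value, description in rows:
    print("  " * depth + f"{value} {description}")
```

Reading from the top down, each indentation level corresponds to one stage: combination, then normalization, then the raw subquery scores.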
Sorting inner hits
To apply sorting, add a sort clause to the inner_hits clause. For example, to sort by user.age, specify that sort condition in the inner_hits clause:
GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
{
"query": {
"hybrid": {
"queries": [
{
"nested": {
"path": "user",
"query": {
"match": {
"user.name": "John"
}
},
"score_mode": "sum",
"inner_hits": {
"sort": [
{
"user.age": {
"order": "desc"
}
}
]
}
}
},
{
"nested": {
"path": "location",
"query": {
"match": {
"location.city": "Udaipur"
}
},
"inner_hits": {}
}
}
]
}
}
}
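In the response, the inner hits for user are sorted by age in descending order rather than by relevance, which is why the _score field is null (scores are not computed when a custom sort is applied):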
...
"user": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "my-nlp-index",
"_id": "2",
"_nested": {
"field": "user",
"offset": 0
},
"_score": null,
"_source": {
"name": "John Wick",
"age": 46
},
"sort": [
46
]
},
{
"_index": "my-nlp-index",
"_id": "2",
"_nested": {
"field": "user",
"offset": 1
},
"_score": null,
"_source": {
"name": "John Snow",
"age": 40
},
"sort": [
40
]
}
]
}
}
...
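When several variants of the same request are needed (with and without sorting, for example), the body can be assembled programmatically instead of edited by hand. The following is a minimal sketch. The helper names nested_subquery and hybrid_request are hypothetical and are not part of any OpenSearch client. They only build the same JSON structure as the request above.

```python
import json


def nested_subquery(path, query, inner_hits=None, score_mode=None):
    """Build one nested subquery; inner_hits defaults to an empty clause."""
    clause = {"path": path, "query": query, "inner_hits": inner_hits or {}}
    if score_mode is not None:
        clause["score_mode"] = score_mode
    return {"nested": clause}


def hybrid_request(*subqueries):
    """Wrap the subqueries in a hybrid query body."""
    return {"query": {"hybrid": {"queries": list(subqueries)}}}


request = hybrid_request(
    nested_subquery(
        "user",
        {"match": {"user.name": "John"}},
        inner_hits={"sort": [{"user.age": {"order": "desc"}}]},
        score_mode="sum",
    ),
    nested_subquery("location", {"match": {"location.city": "Udaipur"}}),
)
print(json.dumps(request, indent=2))
```

Only the inner_hits argument changes between the sorted and unsorted variants, which keeps the two subqueries themselves identical across requests.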
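Paginating inner hits
To paginate inner hit results, specify the from parameter (the starting position) and the size parameter (the number of results) in the inner_hits clause. The following example request retrieves only the third and fourth nested objects from the user field by setting from to 2 (skipping the first two) and size to 2 (returning two results):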
GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
{
"query": {
"hybrid": {
"queries": [
{
"nested": {
"path": "user",
"query": {
"match_all": {}
},
"inner_hits": {
"from": 2,
"size": 2
}
}
},
{
"nested": {
"path": "location",
"query": {
"match": {
"location.city": "Udaipur"
}
},
"inner_hits": {}
}
}
]
}
}
}
The response contains the user field inner hits starting at offset 2:
...
"user": {
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "my-nlp-index",
"_id": "1",
"_nested": {
"field": "user",
"offset": 2
},
"_score": 1.0,
"_source": {
"name": "Mike",
"age": 32
}
},
{
"_index": "my-nlp-index",
"_id": "1",
"_nested": {
"field": "user",
"offset": 3
},
"_score": 1.0,
"_source": {
"name": "Maples",
"age": 30
}
}
]
}
}
...
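The effect of from and size on one parent's list of matching nested objects is the same as taking a slice of that list. The sketch below mimics this locally for the four user objects of document 1. The helper name paginate is hypothetical, and the default values (from 0, size 3) are assumed here rather than taken from this page. Because the match_all query gives every nested object the same score, the objects are assumed to keep their stored order.

```python
def paginate(matches, from_=0, size=3):
    """Skip the first from_ matches, then return at most size matches."""
    return matches[from_:from_ + size]


# The user objects of document 1, in stored order.
users = [
    {"name": "John Alder", "age": 35},
    {"name": "Sammy", "age": 34},
    {"name": "Mike", "age": 32},
    {"name": "Maples", "age": 30},
]

third_and_fourth = paginate(users, from_=2, size=2)
print([user["name"] for user in third_and_fourth])
```

Note that total.value in the response still reports all four matches. Only the returned hits are limited by from and size.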
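Defining a custom name for the inner_hits field
To differentiate between multiple inner hits in a single query, you can define custom names for inner hits in the search response. For example, you can give the inner hits for the user field the custom name coordinates, as follows: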
GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
{
"query": {
"hybrid": {
"queries": [
{
"nested": {
"path": "user",
"query": {
"match_all": {}
},
"inner_hits": {
"name": "coordinates"
}
}
},
{
"nested": {
"path": "location",
"query": {
"match": {
"location.city": "Udaipur"
}
},
"inner_hits": {}
}
}
]
}
}
}
In the response, the inner hits for the user field appear under the custom name coordinates:
...
"inner_hits": {
"coordinates": {
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "my-nlp-index",
"_id": "1",
"_nested": {
"field": "user",
"offset": 0
},
"_score": 1.0,
"_source": {
"name": "John Alder",
"age": 35
}
}
]
}
},
"location": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.44583148,
"hits": [
{
"_index": "my-nlp-index",
"_id": "1",
"_nested": {
"field": "location",
"offset": 1
},
"_score": 0.44583148,
"_source": {
"city": "Udaipur",
"state": "Rajasthan"
}
}
]
}
}
}
...