Collapsing hybrid query results
Introduced in version 3.1
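The collapse parameter lets you group results by a field and return only the highest-scoring document for each unique field value. This is useful when you want to avoid duplicates in search results. The field you collapse on must be of the keyword type or a numeric type. The number of results returned is still limited by the size parameter in the query.
The collapse parameter is compatible with other hybrid query search options, such as sorting, explain, and pagination, and uses their standard syntax.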
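When using collapse in a hybrid query, note the following considerations:
- Inner hits are not supported.
- Performance may be affected when processing large result sets.
- Aggregations run on the pre-collapse results, not on the final output.
- Pagination behavior changes: because collapse reduces the total number of results, it affects how results are distributed across pages. To retrieve more results, consider increasing the pagination depth.
- Results may differ from those returned by the collapse response processor, which applies the collapse logic after the query is executed.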
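The grouping rule described above can be modeled in a few lines of Python. This is a simplified, client-side sketch and not the engine's implementation: the function name collapse_hits and the scores are assumptions made for illustration.

```python
from typing import Any


def collapse_hits(hits: list[dict[str, Any]], field: str, size: int = 10) -> list[dict[str, Any]]:
    """Keep the highest-scoring hit for each unique value of `field`.

    Simplified model of collapsing; not the engine's implementation.
    """
    best: dict[Any, dict[str, Any]] = {}
    for hit in hits:
        key = hit["_source"][field]
        # Replace the stored hit only when the new one scores strictly higher.
        if key not in best or hit["_score"] > best[key]["_score"]:
            best[key] = hit
    # Order the surviving hits by score, highest first, then apply `size`.
    ranked = sorted(best.values(), key=lambda h: h["_score"], reverse=True)
    return ranked[:size]


# Hypothetical scored hits (scores are made up for illustration).
hits = [
    {"_score": 1.0, "_source": {"item": "Chocolate Cake", "price": 15}},
    {"_score": 1.0, "_source": {"item": "Chocolate Cake", "price": 18}},
    {"_score": 0.5, "_source": {"item": "Vanilla Cake", "price": 12}},
    {"_score": 0.5, "_source": {"item": "Vanilla Cake", "price": 16}},
]

collapsed = collapse_hits(hits, field="item")
# Aggregation-style statistics are computed on the pre-collapse hits.
pre_collapse_count = len(hits)
```

In this model, four matching documents collapse to two hits, while a count taken before collapsing still sees all four.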
Example
The following example demonstrates how to collapse hybrid query results.
Create an index
PUT /bakery-items
{
"mappings": {
"properties": {
"item": {
"type": "keyword"
},
"category": {
"type": "keyword"
},
"price": {
"type": "float"
},
"baked_date": {
"type": "date"
}
}
}
}
Ingest documents into the index
POST /bakery-items/_bulk
{ "index": {} }
{ "item": "Chocolate Cake", "category": "cakes", "price": 15, "baked_date": "2023-07-01T00:00:00Z" }
{ "index": {} }
{ "item": "Chocolate Cake", "category": "cakes", "price": 18, "baked_date": "2023-07-04T00:00:00Z" }
{ "index": {} }
{ "item": "Vanilla Cake", "category": "cakes", "price": 12, "baked_date": "2023-07-02T00:00:00Z" }
{ "index": {} }
{ "item": "Vanilla Cake", "category": "cakes", "price": 16, "baked_date": "2023-07-03T00:00:00Z" }
{ "index": {} }
{ "item": "Vanilla Cake", "category": "cakes", "price": 17, "baked_date": "2023-07-09T00:00:00Z" }
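If you generate the bulk request body from application code, it has to be newline-delimited JSON: one action line followed by one source line per document, ending with a newline. The following Python sketch builds such a body. The helper name build_bulk_body is hypothetical, and sending the body to the cluster is left out.

```python
import json


def build_bulk_body(docs: list[dict]) -> str:
    """Serialize documents into the newline-delimited JSON that _bulk expects."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # action line
        lines.append(json.dumps(doc))            # document source line
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


# Two of the documents from the request above.
docs = [
    {"item": "Chocolate Cake", "category": "cakes", "price": 15, "baked_date": "2023-07-01T00:00:00Z"},
    {"item": "Vanilla Cake", "category": "cakes", "price": 12, "baked_date": "2023-07-02T00:00:00Z"},
]
bulk_body = build_bulk_body(docs)
```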
Create a search pipeline. This example uses the min_max normalization technique.
PUT /_search/pipeline/norm-pipeline
{
"description": "Normalization processor for hybrid search",
"phase_results_processors": [
{
"normalization-processor": {
"normalization": {
"technique": "min_max"
},
"combination": {
"technique": "arithmetic_mean"
}
}
}
]
}
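The pipeline above names two techniques, min_max and arithmetic_mean. As a rough illustration only, the following Python sketch applies the textbook forms of both to made-up scores. The function names and input scores are hypothetical, and the actual normalization processor may treat edge cases (such as equal scores or documents that match only one subquery) differently.

```python
def min_max(scores: list[float]) -> list[float]:
    """Textbook min-max scaling to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal; map them all to 1.0 (an assumption for this sketch).
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def arithmetic_mean(per_query_scores: list[list[float]]) -> list[float]:
    """Average each document's normalized scores across subqueries."""
    n_queries = len(per_query_scores)
    return [sum(doc_scores) / n_queries for doc_scores in zip(*per_query_scores)]


# Hypothetical raw scores for three documents from two subqueries.
first_subquery = [2.0, 4.0, 6.0]
second_subquery = [1.0, 1.0, 3.0]

combined = arithmetic_mean([min_max(first_subquery), min_max(second_subquery)])
```

Normalizing first puts both subqueries on the same scale, so neither dominates the combined score merely because its raw scores are larger.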
Search the index, grouping the search results by the item field.
GET /bakery-items/_search?search_pipeline=norm-pipeline
{
"query": {
"hybrid": {
"queries": [
{
"match": {
"item": "Chocolate Cake"
}
},
{
"bool": {
"must": {
"match": {
"category": "cakes"
}
}
}
}
]
}
},
"collapse": {
"field": "item"
}
}
The response returns the collapsed search results.
"hits": {
"total": {
"value": 5,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "bakery-items",
"_id": "wBRPZZcB49c_2-1rYmO7",
"_score": 1.0,
"_source": {
"item": "Chocolate Cake",
"category": "cakes",
"price": 15,
"baked_date": "2023-07-01T00:00:00Z"
},
"fields": {
"item": [
"Chocolate Cake"
]
}
},
{
"_index": "bakery-items",
"_id": "whRPZZcB49c_2-1rYmO7",
"_score": 0.5005,
"_source": {
"item": "Vanilla Cake",
"category": "cakes",
"price": 12,
"baked_date": "2023-07-02T00:00:00Z"
},
"fields": {
"item": [
"Vanilla Cake"
]
}
}
]
}
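In the response above, each hit carries its collapse key under fields, and hits.total.value is 5 even though only two hits are returned. A client could read these values as in the following Python sketch. The helper name collapsed_groups is hypothetical, and the response excerpt is abbreviated and wrapped in braces so that it parses as JSON.

```python
import json

# Abbreviated excerpt of the response above, wrapped in braces to form valid JSON.
response_text = """
{
  "hits": {
    "total": {"value": 5, "relation": "eq"},
    "max_score": 1.0,
    "hits": [
      {"_id": "wBRPZZcB49c_2-1rYmO7", "_score": 1.0,
       "fields": {"item": ["Chocolate Cake"]}},
      {"_id": "whRPZZcB49c_2-1rYmO7", "_score": 0.5005,
       "fields": {"item": ["Vanilla Cake"]}}
    ]
  }
}
"""


def collapsed_groups(response: dict, field: str) -> list[tuple[str, float]]:
    """Return (collapse key, score) for each hit in a collapsed response."""
    return [
        (hit["fields"][field][0], hit["_score"])
        for hit in response["hits"]["hits"]
    ]


response = json.loads(response_text)
groups = collapsed_groups(response, "item")
total_matches = response["hits"]["total"]["value"]
```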
Collapse and sort results
To collapse and sort hybrid query results, provide the collapse and sort parameters in the query.
GET /bakery-items/_search?search_pipeline=norm-pipeline
{
"query": {
"hybrid": {
"queries": [
{
"match": {
"item": "Chocolate Cake"
}
},
{
"bool": {
"must": {
"match": {
"category": "cakes"
}
}
}
}
]
}
},
"collapse": {
"field": "item"
},
"sort": "price"
}
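For more information about sorting in hybrid queries, see Using sorting with a hybrid query.
In the response, the documents are sorted by price, lowest first.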
"hits": {
"total": {
"value": 5,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "bakery-items",
"_id": "whRPZZcB49c_2-1rYmO7",
"_score": null,
"_source": {
"item": "Vanilla Cake",
"category": "cakes",
"price": 12,
"baked_date": "2023-07-02T00:00:00Z"
},
"fields": {
"item": [
"Vanilla Cake"
]
},
"sort": [
12.0
]
},
{
"_index": "bakery-items",
"_id": "wBRPZZcB49c_2-1rYmO7",
"_score": null,
"_source": {
"item": "Chocolate Cake",
"category": "cakes",
"price": 15,
"baked_date": "2023-07-01T00:00:00Z"
},
"fields": {
"item": [
"Chocolate Cake"
]
},
"sort": [
15.0
]
}
]
}
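The response above is consistent with a simple mental model: sort the documents by price, then keep the first document in each item group. The following Python sketch reproduces that outcome for the five documents ingested earlier. It is a simplified model under that assumption, and collapse_sorted is a hypothetical name.

```python
def collapse_sorted(docs: list[dict], collapse_field: str, sort_field: str) -> list[dict]:
    """Sort ascending by `sort_field`, then keep the first document per group."""
    seen = set()
    result = []
    for doc in sorted(docs, key=lambda d: d[sort_field]):
        key = doc[collapse_field]
        if key not in seen:  # first occurrence is the lowest-sorted one in its group
            seen.add(key)
            result.append(doc)
    return result


# The five documents ingested into bakery-items earlier in this section.
docs = [
    {"item": "Chocolate Cake", "price": 15},
    {"item": "Chocolate Cake", "price": 18},
    {"item": "Vanilla Cake", "price": 12},
    {"item": "Vanilla Cake", "price": 16},
    {"item": "Vanilla Cake", "price": 17},
]

top_per_item = collapse_sorted(docs, "item", "price")
```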
Collapse and explain
You can provide the explain query parameter when collapsing search results.
GET /bakery-items/_search?search_pipeline=norm-pipeline&explain=true
{
"query": {
"hybrid": {
"queries": [
{
"match": {
"item": "Chocolate Cake"
}
},
{
"bool": {
"must": {
"match": {
"category": "cakes"
}
}
}
}
]
}
},
"collapse": {
"field": "item"
}
}
The response contains detailed information about the scoring process for each search result.
"hits": {
"total": {
"value": 5,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_shard": "[bakery-items][0]",
"_node": "Jlu8P9EaQCy3C1BxaFMa_g",
"_index": "bakery-items",
"_id": "3ZILepcBheX09_dPt8TD",
"_score": 1.0,
"_source": {
"item": "Chocolate Cake",
"category": "cakes",
"price": 15,
"baked_date": "2023-07-01T00:00:00Z"
},
"fields": {
"item": [
"Chocolate Cake"
]
},
"_explanation": {
"value": 1.0,
"description": "combined score of:",
"details": [
{
"value": 1.0,
"description": "ConstantScore(item:Chocolate Cake)",
"details": []
},
{
"value": 1.0,
"description": "ConstantScore(category:cakes)",
"details": []
}
]
}
},
{
"_shard": "[bakery-items][0]",
"_node": "Jlu8P9EaQCy3C1BxaFMa_g",
"_index": "bakery-items",
"_id": "35ILepcBheX09_dPt8TD",
"_score": 0.5005,
"_source": {
"item": "Vanilla Cake",
"category": "cakes",
"price": 12,
"baked_date": "2023-07-02T00:00:00Z"
},
"fields": {
"item": [
"Vanilla Cake"
]
},
"_explanation": {
"value": 1.0,
"description": "combined score of:",
"details": [
{
"value": 0.0,
"description": "ConstantScore(item:Chocolate Cake) doesn't match id 2",
"details": []
},
{
"value": 1.0,
"description": "ConstantScore(category:cakes)",
"details": []
}
]
}
}
]
}
For more information about using explain in hybrid queries, see Hybrid search explain.
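Each _explanation object in the response is a small tree of value, description, and details entries. One way to read it is to walk the tree and print one line per node, as in the following Python sketch. The function name walk_explanation is hypothetical, and the input is the _explanation object of the first hit shown above.

```python
from typing import Iterator


def walk_explanation(node: dict, depth: int = 0) -> Iterator[tuple[int, float, str]]:
    """Yield (depth, value, description) for every node in an _explanation tree."""
    yield depth, node["value"], node["description"]
    for child in node.get("details", []):
        yield from walk_explanation(child, depth + 1)


# The _explanation object of the first hit in the response above.
explanation = {
    "value": 1.0,
    "description": "combined score of:",
    "details": [
        {"value": 1.0, "description": "ConstantScore(item:Chocolate Cake)", "details": []},
        {"value": 1.0, "description": "ConstantScore(category:cakes)", "details": []},
    ],
}

rows = list(walk_explanation(explanation))
for depth, value, description in rows:
    # Indent child nodes to show the tree structure.
    print("  " * depth + f"{value} {description}")
```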
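Collapse and paginate
You can paginate collapsed results by providing the from and size parameters. For more information about pagination in hybrid queries, see Paginating hybrid query results. For more information about from and size, see The from and size parameters.
For this example, create the following index: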
PUT /bakery-items-pagination
{
"settings": {
"index.number_of_shards": 3
},
"mappings": {
"properties": {
"item": {
"type": "keyword"
},
"category": {
"type": "keyword"
},
"price": {
"type": "float"
},
"baked_date": {
"type": "date"
}
}
}
}
Ingest the following documents into the index:
POST /bakery-items-pagination/_bulk
{ "index": {} }
{ "item": "Chocolate Cake", "category": "cakes", "price": 15, "baked_date": "2023-07-01T00:00:00Z" }
{ "index": {} }
{ "item": "Chocolate Cake", "category": "cakes", "price": 18, "baked_date": "2023-07-02T00:00:00Z" }
{ "index": {} }
{ "item": "Vanilla Cake", "category": "cakes", "price": 12, "baked_date": "2023-07-02T00:00:00Z" }
{ "index": {} }
{ "item": "Vanilla Cake", "category": "cakes", "price": 11, "baked_date": "2023-07-04T00:00:00Z" }
{ "index": {} }
{ "item": "Ice Cream Cake", "category": "cakes", "price": 23, "baked_date": "2023-07-09T00:00:00Z" }
{ "index": {} }
{ "item": "Ice Cream Cake", "category": "cakes", "price": 22, "baked_date": "2023-07-10T00:00:00Z" }
{ "index": {} }
{ "item": "Carrot Cake", "category": "cakes", "price": 24, "baked_date": "2023-07-09T00:00:00Z" }
{ "index": {} }
{ "item": "Carrot Cake", "category": "cakes", "price": 26, "baked_date": "2023-07-21T00:00:00Z" }
{ "index": {} }
{ "item": "Red Velvet Cake", "category": "cakes", "price": 25, "baked_date": "2023-07-09T00:00:00Z" }
{ "index": {} }
{ "item": "Red Velvet Cake", "category": "cakes", "price": 29, "baked_date": "2023-07-30T00:00:00Z" }
{ "index": {} }
{ "item": "Cheesecake", "category": "cakes", "price": 27, "baked_date": "2023-07-09T00:00:00Z" }
{ "index": {} }
{ "item": "Cheesecake", "category": "cakes", "price": 34, "baked_date": "2023-07-21T00:00:00Z" }
{ "index": {} }
{ "item": "Coffee Cake", "category": "cakes", "price": 42, "baked_date": "2023-07-09T00:00:00Z" }
{ "index": {} }
{ "item": "Coffee Cake", "category": "cakes", "price": 41, "baked_date": "2023-07-05T00:00:00Z" }
{ "index": {} }
{ "item": "Coconut Cake", "category": "cakes", "price": 23, "baked_date": "2023-07-09T00:00:00Z" }
{ "index": {} }
{ "item": "Coconut Cake", "category": "cakes", "price": 32, "baked_date": "2023-07-12T00:00:00Z" }
// Additional documents omitted for brevity
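Run a hybrid query and paginate the results by specifying the from and size parameters. In the following example, the query requests two results starting from the sixth position (from: 5, size: 2). The pagination depth is set so that each shard returns at most 10 documents. After the results are retrieved, the collapse parameter is applied to group them by the item field.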
GET /bakery-items-pagination/_search?search_pipeline=norm-pipeline
{
"query": {
"hybrid": {
"pagination_depth": 10,
"queries": [
{
"match": {
"item": "Chocolate Cake"
}
},
{
"bool": {
"must": {
"match": {
"category": "cakes"
}
}
}
}
]
}
},
"from": 5,
"size": 2,
"collapse": {
"field": "item"
}
}
The response contains the requested page of collapsed results.
"hits": {
"total": {
"value": 70,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "bakery-items-pagination",
"_id": "gDayepcBIkxlgFKYda0p",
"_score": 0.5005,
"_source": {
"item": "Red Velvet Cake",
"category": "cakes",
"price": 29,
"baked_date": "2023-07-30T00:00:00Z"
},
"fields": {
"item": [
"Red Velvet Cake"
]
}
},
{
"_index": "bakery-items-pagination",
"_id": "aTayepcBIkxlgFKYca15",
"_score": 0.5005,
"_source": {
"item": "Vanilla Cake",
"category": "cakes",
"price": 12,
"baked_date": "2023-07-02T00:00:00Z"
},
"fields": {
"item": [
"Vanilla Cake"
]
}
}
]
}
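To see why collapsing changes what lands on each page, consider the following Python sketch. It assumes a simplified order of operations (collapse the ranked list first, then apply from and size), which follows the description in this section but is not confirmed engine behavior. The function name paginate_collapsed and the item names are hypothetical.

```python
def paginate_collapsed(ranked_docs: list[dict], field: str, from_: int, size: int) -> list[dict]:
    """Collapse a ranked list on `field`, then return the requested page."""
    seen = set()
    collapsed = []
    for doc in ranked_docs:  # ranked_docs is assumed to be ordered best-first
        if doc[field] not in seen:
            seen.add(doc[field])
            collapsed.append(doc)
    return collapsed[from_:from_ + size]


# Hypothetical ranked list: 8 distinct items with two documents each (16 documents).
ranked = [{"item": f"item-{i}"} for i in range(8) for _ in range(2)]

page = paginate_collapsed(ranked, "item", from_=5, size=2)
beyond = paginate_collapsed(ranked, "item", from_=8, size=2)
```

In this model, 16 documents collapse to 8 groups, so an offset of 8 already returns an empty page. That is the effect that increasing the pagination depth is meant to offset.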