Sort results
Sorting allows your users to sort results in a way that's most meaningful to them.
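By default, full-text queries sort results by the relevance score. You can choose to sort the results by any field value in either ascending or descending order by setting the order parameter to asc or desc.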
For example, to sort results by descending order of the line_id value, use the following query:
GET shakespeare/_search
{
"query": {
"term": {
"play_name": {
"value": "Henry IV"
}
}
},
"sort": [
{
"line_id": {
"order": "desc"
}
}
]
}
The results are sorted by line_id in descending order:
{
"took" : 24,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3205,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "shakespeare",
"_id" : "3204",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3205,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "",
"speaker" : "KING HENRY IV",
"text_entry" : "Exeunt"
},
"sort" : [
3205
]
},
{
"_index" : "shakespeare",
"_id" : "3203",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3204,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "5.5.45",
"speaker" : "KING HENRY IV",
"text_entry" : "Let us not leave till all our own be won."
},
"sort" : [
3204
]
},
{
"_index" : "shakespeare",
"_id" : "3202",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3203,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "5.5.44",
"speaker" : "KING HENRY IV",
"text_entry" : "And since this business so fair is done,"
},
"sort" : [
3203
]
},
{
"_index" : "shakespeare",
"_id" : "3201",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3202,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "5.5.43",
"speaker" : "KING HENRY IV",
"text_entry" : "Meeting the cheque of such another day:"
},
"sort" : [
3202
]
},
{
"_index" : "shakespeare",
"_id" : "3200",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3201,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "5.5.42",
"speaker" : "KING HENRY IV",
"text_entry" : "Rebellion in this land shall lose his sway,"
},
"sort" : [
3201
]
},
{
"_index" : "shakespeare",
"_id" : "3199",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3200,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "5.5.41",
"speaker" : "KING HENRY IV",
"text_entry" : "To fight with Glendower and the Earl of March."
},
"sort" : [
3200
]
},
{
"_index" : "shakespeare",
"_id" : "3198",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3199,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "5.5.40",
"speaker" : "KING HENRY IV",
"text_entry" : "Myself and you, son Harry, will towards Wales,"
},
"sort" : [
3199
]
},
{
"_index" : "shakespeare",
"_id" : "3197",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3198,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "5.5.39",
"speaker" : "KING HENRY IV",
"text_entry" : "Who, as we hear, are busily in arms:"
},
"sort" : [
3198
]
},
{
"_index" : "shakespeare",
"_id" : "3196",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3197,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "5.5.38",
"speaker" : "KING HENRY IV",
"text_entry" : "To meet Northumberland and the prelate Scroop,"
},
"sort" : [
3197
]
},
{
"_index" : "shakespeare",
"_id" : "3195",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3196,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "5.5.37",
"speaker" : "KING HENRY IV",
"text_entry" : "Towards York shall bend you with your dearest speed,"
},
"sort" : [
3196
]
}
]
}
}
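The sort parameter is an array, so you can specify multiple field values in order of priority.
If two documents have the same line_id value, OpenSearch uses speech_number as the second sort option: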
GET shakespeare/_search
{
"query": {
"term": {
"play_name": {
"value": "Henry IV"
}
}
},
"sort": [
{
"line_id": {
"order": "desc"
}
},
{
"speech_number": {
"order": "desc"
}
}
]
}
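You can continue to sort by any number of field values to get the results in the order you need. The field doesn't have to be numeric; you can also sort by date or timestamp fields: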
"sort": [
{
"date": {
"order": "desc"
}
}
]
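The priority rule for the sort array can be mimicked locally. The following Python sketch uses hypothetical documents and is not OpenSearch code: it compares documents on the first sort key and consults the second key only to break ties.

```python
# Hypothetical documents; only the fields used for sorting are shown.
docs = [
    {"_id": "a", "line_id": 10, "speech_number": 1},
    {"_id": "b", "line_id": 12, "speech_number": 3},
    {"_id": "c", "line_id": 12, "speech_number": 7},
    {"_id": "d", "line_id": 11, "speech_number": 9},
]


def sort_docs(docs, sort_spec):
    """Sort by (field, order) pairs listed from highest to lowest priority.

    Python's sort is stable, so sorting by the lowest-priority key first and
    the highest-priority key last gives the same result as comparing the keys
    in priority order.
    """
    result = list(docs)
    for field, order in reversed(sort_spec):
        result.sort(key=lambda d: d[field], reverse=(order == "desc"))
    return result


ordered = sort_docs(docs, [("line_id", "desc"), ("speech_number", "desc")])
print([d["_id"] for d in ordered])  # ['c', 'b', 'd', 'a']
```

Documents b and c share line_id 12, so speech_number alone decides their relative order.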
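An analyzed text field cannot be used to sort documents because the inverted index contains only the individual tokenized terms, not the entire string. So, for example, you cannot sort by play_name.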
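To work around this limitation, you can use a raw version of the text field mapped as a keyword type. In the following example, play_name.keyword is not analyzed, and you have a copy of the full original string for sorting purposes: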
GET shakespeare/_search
{
"query": {
"term": {
"play_name": {
"value": "Henry IV"
}
}
},
"sort": [
{
"play_name.keyword": {
"order": "desc"
}
}
]
}
The results are sorted by the play_name.keyword field in descending (reverse alphabetical) order.
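To see why the keyword copy is needed, compare what each field type keeps. In this Python sketch, toy_analyze is a hypothetical, simplified stand-in for an analyzer (lowercase, then split into words); real analyzers are configurable. The titles other than Henry IV are illustrative.

```python
import re


def toy_analyze(text):
    """Simplified analyzer stand-in: lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


titles = ["Henry IV", "Hamlet", "Henry VI Part 1", "As You Like It"]

# An analyzed text field indexes separate terms; the original string is not kept as one unit.
terms = {title: toy_analyze(title) for title in titles}
print(terms["Henry IV"])  # ['henry', 'iv']

# A keyword field stores the whole string, so it can be compared as a single value.
print(sorted(titles, reverse=True))
# ['Henry VI Part 1', 'Henry IV', 'Hamlet', 'As You Like It']
```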
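Use sort with the search_after parameter for more efficient scrolling. The results start with the document that comes after the sort values you specify in the search_after array.
Make sure you have the same number of values in the search_after array as in the sort array, in the same order. In this example, you request results starting with the document that comes after line_id = 3202 and speech_number = 8: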
GET shakespeare/_search
{
"query": {
"term": {
"play_name": {
"value": "Henry IV"
}
}
},
"sort": [
{
"line_id": {
"order": "desc"
}
},
{
"speech_number": {
"order": "desc"
}
}
],
"search_after": [
"3202",
"8"
]
}
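Conceptually, search_after compares each document's sort values with the values you pass and keeps only the documents that sort strictly after them. This local Python sketch uses hypothetical data and helper names and simulates the ordering rather than calling OpenSearch. It also shows the usual paging pattern of passing the last hit's sort values into the next request.

```python
# Hypothetical hits, mirroring the Henry IV example: line_id 3205 down to 3196.
docs = [{"line_id": i, "speech_number": 8} for i in range(3205, 3195, -1)]


def sort_key(doc):
    # Negating both values makes an ascending tuple comparison match
    # the "line_id desc, speech_number desc" sort array.
    return (-doc["line_id"], -doc["speech_number"])


def search_after(docs, after_values, size):
    """Return up to `size` docs that sort strictly after `after_values`.

    `after_values` holds one value per sort key, in the same order as the sort array.
    """
    after = sort_key({"line_id": after_values[0], "speech_number": after_values[1]})
    ordered = sorted(docs, key=sort_key)
    return [d for d in ordered if sort_key(d) > after][:size]


page = search_after(docs, [3202, 8], size=3)
print([d["line_id"] for d in page])  # [3201, 3200, 3199]

# To fetch the next page, pass the sort values of the last hit on the current page.
last = page[-1]
next_page = search_after(docs, [last["line_id"], last["speech_number"]], size=3)
print([d["line_id"] for d in next_page])  # [3198, 3197, 3196]
```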
Sort mode
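The sort mode applies when sorting by array or multi-valued fields. It specifies which array value is chosen to sort the document. For numeric fields that contain an array of numbers, you can sort by the avg, sum, or median mode. To sort by the minimum or maximum value, use the min or max mode, which work for both numeric and string data types.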
The default mode is min for ascending sort order and max for descending sort order.
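The following local Python sketch shows the reduction each mode performs, using the same grades as the example below. It mirrors the behavior described here and is not OpenSearch code; MODES and sort_value are hypothetical names.

```python
from statistics import median

# How each sort mode reduces a multi-valued field to a single sort value.
MODES = {
    "min": min,
    "max": max,
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "median": median,
}


def sort_value(values, order, mode=None):
    """Pick the sort value for one document's array field."""
    if mode is None:
        # Defaults: min for ascending order, max for descending order.
        mode = "min" if order == "asc" else "max"
    return MODES[mode](values)


grades = {"John Doe": [70, 90], "Mary Major": [80, 100]}

ranked = sorted(grades, key=lambda name: sort_value(grades[name], "desc", "avg"), reverse=True)
print(ranked)  # ['Mary Major', 'John Doe']
print({name: sort_value(g, "desc", "avg") for name, g in grades.items()})
# {'John Doe': 80.0, 'Mary Major': 90.0}
```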
The following example demonstrates sorting by an array field using a sort mode.
Consider an index that holds student grades. Index two documents into the index:
PUT students/_doc/1
{
"name": "John Doe",
"grades": [70, 90]
}
PUT students/_doc/2
{
"name": "Mary Major",
"grades": [80, 100]
}
Sort all students by highest grade average using the avg mode:
GET students/_search
{
"query" : {
"match_all": {}
},
"sort" : [
{"grades" : {"order" : "desc", "mode" : "avg"}}
]
}
The response contains the students sorted by their average grade in descending order:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "students",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "Mary Major",
"grades" : [
80,
100
]
},
"sort" : [
90
]
},
{
"_index" : "students",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "John Doe",
"grades" : [
70,
90
]
},
"sort" : [
80
]
}
]
}
}
Sorting nested objects
When sorting nested objects, provide the path parameter specifying the path to the field on which to sort.
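For example, in the students index, map the first_sem field as nested (creating an index that already exists returns an error, so delete the students index from the previous example first):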
PUT students
{
"mappings" : {
"properties": {
"first_sem": {
"type" : "nested"
}
}
}
}
Index two documents with nested fields:
PUT students/_doc/1
{
"name": "John Doe",
"first_sem" : {
"grades": [70, 90]
}
}
PUT students/_doc/2
{
"name": "Mary Major",
"first_sem": {
"grades": [80, 100]
}
}
When sorting by grade average, provide the path to the nested field:
GET students/_search
{
"query" : {
"match_all": {}
},
"sort" : [
{"first_sem.grades": {
"order" : "desc",
"mode" : "avg",
"nested": {
"path": "first_sem"
}
}
}
]
}
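Conceptually, path tells the sort which nested objects to read. The following Python sketch models this locally, under the assumption that the grades values from all nested objects of a document are pooled before mode is applied. Document 3, with two nested objects, and the helper name nested_sort_value are hypothetical.

```python
def nested_sort_value(doc, path, field, mode="avg"):
    """Pool `field` values from every nested object under `path`, then apply `mode`."""
    objects = doc.get(path, [])
    if isinstance(objects, dict):  # a single nested object, as in the documents above
        objects = [objects]
    values = [v for obj in objects for v in obj.get(field, [])]
    if not values:
        return None  # no values: the document is handled as missing
    reducers = {"min": min, "max": max, "sum": sum, "avg": lambda vs: sum(vs) / len(vs)}
    return reducers[mode](values)


docs = {
    "1": {"name": "John Doe", "first_sem": {"grades": [70, 90]}},
    "2": {"name": "Mary Major", "first_sem": {"grades": [80, 100]}},
    # Hypothetical document with two nested objects under first_sem.
    "3": {"name": "Example Student", "first_sem": [{"grades": [60]}, {"grades": [100, 95]}]},
}

ranked = sorted(docs, key=lambda i: nested_sort_value(docs[i], "first_sem", "grades"), reverse=True)
print(ranked)  # ['2', '3', '1']
```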
Handling missing values
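The missing parameter specifies how missing values are handled. The built-in valid values are _last (list the documents with the missing value last) and _first (list the documents with the missing value first). The default value is _last. You can also specify a custom value to be used as the sort value for documents that are missing the field.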
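The three choices can be modeled in a few lines of Python. This is a local sketch that uses the same two documents as the example below, not OpenSearch's implementation; the custom values 95 and 50 are arbitrary.

```python
def sort_with_missing(docs, field, order="asc", missing="_last"):
    """Order docs by `field`, placing docs without it according to `missing`."""
    descending = order == "desc"
    if missing not in ("_first", "_last"):
        # Custom value: use it as the sort value of every doc that lacks the field.
        return sorted(docs, key=lambda d: d.get(field, missing), reverse=descending)
    present = sorted((d for d in docs if field in d), key=lambda d: d[field], reverse=descending)
    absent = [d for d in docs if field not in d]
    return absent + present if missing == "_first" else present + absent


def names(docs):
    return [d["name"] for d in docs]


docs = [{"name": "John Doe", "average": 80}, {"name": "Mary Major"}]
print(names(sort_with_missing(docs, "average", "desc", "_first")))  # ['Mary Major', 'John Doe']
print(names(sort_with_missing(docs, "average", "desc")))            # ['John Doe', 'Mary Major']
print(names(sort_with_missing(docs, "average", "desc", 95)))        # ['Mary Major', 'John Doe']
print(names(sort_with_missing(docs, "average", "desc", 50)))        # ['John Doe', 'Mary Major']
```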
For example, you can index one document with an average field and another document without an average field:
PUT students/_doc/1
{
"name": "John Doe",
"average": 80
}
PUT students/_doc/2
{
"name": "Mary Major"
}
Sort the documents, placing the document with the missing field first:
GET students/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"average": {
"order": "desc",
"missing": "_first"
}
}
]
}
The response lists document 2 first:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "students",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "Mary Major"
},
"sort" : [
9223372036854775807
]
},
{
"_index" : "students",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "John Doe",
"average" : 80
},
"sort" : [
80
]
}
]
}
}
Ignoring unmapped fields
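If a field is not mapped, a search request that sorts by this field fails by default. To avoid this, you can use the unmapped_type parameter, which signals to OpenSearch to ignore the field. For example, if you set unmapped_type to long, the field is treated as if it were mapped as type long. Additionally, all documents in an index in which the field is unmapped are treated as having no value in this field, so they are not sorted by a value but placed according to the missing-value handling.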
For example, consider two indexes. Index a document that contains an average field in the first index:
PUT students/_doc/1
{
"name": "John Doe",
"average": 80
}
Index a document that does not contain an average field in the second index:
PUT students_no_map/_doc/2
{
"name": "Mary Major"
}
Search for all documents in both indexes and sort them by the average field:
GET students*/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"average": {
"order": "desc"
}
}
]
}
By default, the second index produces an error because the average field is not mapped:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 1,
"skipped" : 0,
"failed" : 1,
"failures" : [
{
"shard" : 0,
"index" : "students_no_map",
"node" : "cam9NWqVSV-jUIkQ3tRubw",
"reason" : {
"type" : "query_shard_exception",
"reason" : "No mapping found for [average] in order to sort on",
"index" : "students_no_map",
"index_uuid" : "JgfRkypKSUSpyU-ZXr9kKA"
}
}
]
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "students",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "John Doe",
"average" : 80
},
"sort" : [
80
]
}
]
}
}
You can specify the unmapped_type parameter so that the unmapped field is ignored:
GET students*/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"average": {
"order": "desc",
"unmapped_type": "long"
}
}
]
}
The response contains both documents:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "students",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "John Doe",
"average" : 80
},
"sort" : [
80
]
},
{
"_index" : "students_no_map",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "Mary Major"
},
"sort" : [
-9223372036854775808
]
}
]
}
}
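The large numbers in the sort arrays of the missing-value and unmapped_type responses are the limits of a 64-bit signed long. Judging from those two responses, a document with no value receives whichever extreme puts it where missing asks: in descending order, _first produced 9223372036854775807 and the default _last produced -9223372036854775808. The following Python sketch encodes that rule for long fields; the helper name is hypothetical, and the ascending branch is an inference by symmetry rather than something shown in these responses.

```python
LONG_MAX = 2**63 - 1  # 9223372036854775807
LONG_MIN = -(2**63)   # -9223372036854775808


def missing_sentinel(order, missing="_last"):
    """Sort value that places a document with no value first or last (long fields)."""
    goes_first = missing == "_first"
    if order == "desc":
        # Largest values come first in descending order.
        return LONG_MAX if goes_first else LONG_MIN
    # Smallest values come first in ascending order (inferred by symmetry).
    return LONG_MIN if goes_first else LONG_MAX


print(missing_sentinel("desc", "_first"))  # 9223372036854775807, as for document 2 in the missing example
print(missing_sentinel("desc"))            # -9223372036854775808, as in the unmapped_type example
```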
Tracking scores
By default, scores are not computed when sorting on a field. You can set track_scores to true to compute and track scores:
GET students/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"average": {
"order": "desc"
}
}
],
"track_scores": true
}
Sorting by geo distance
You can sort documents by _geo_distance. The following parameters are supported.
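Parameter | Description |
---|---|
distance_type | Specifies the method of computing the distance. Valid values are arc and plane. The plane method is faster but less accurate for long distances or for distances close to the poles. Default is arc. |
mode | Specifies how to handle a field with several geopoints. By default, documents are sorted by the shortest distance when the sort order is ascending and by the longest distance when the sort order is descending. Valid values are min, max, median, and avg. |
unit | Specifies the unit used to compute sort values. Default is meters (m). |
ignore_unmapped | Specifies how to treat an unmapped field. Set ignore_unmapped to true to ignore unmapped fields. Default is false (produce an error when encountering an unmapped field). |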
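The _geo_distance parameter does not support missing_values. The distance is always considered to be infinity when a document does not contain the field used for computing distance.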
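For example, index two documents with geopoints (this assumes that the point field in testindex1 is mapped as geo_point; dynamic mapping maps a bare array of numbers as a numeric field, not a geopoint):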
PUT testindex1/_doc/1
{
"point": [74.00, 40.71]
}
PUT testindex1/_doc/2
{
"point": [73.77, -69.63]
}
Search for all documents and sort them by their distance from the provided point:
GET testindex1/_search
{
"sort": [
{
"_geo_distance": {
"point": [59, -54],
"order": "asc",
"unit": "km",
"distance_type": "arc",
"mode": "min",
"ignore_unmapped": true
}
}
],
"query": {
"match_all": {}
}
}
The response contains the sorted documents:
{
"took" : 864,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "testindex1",
"_id" : "2",
"_score" : null,
"_source" : {
"point" : [
73.77,
-69.63
]
},
"sort" : [
1891.2667493895767
]
},
{
"_index" : "testindex1",
"_id" : "1",
"_score" : null,
"_source" : {
"point" : [
74.0,
40.71
]
},
"sort" : [
10628.402240213345
]
}
]
}
}
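The sort values in this response can be approximated locally. The Python sketch below computes arc distance with the haversine formula and plane distance with an equirectangular (flat-earth) approximation. The Earth radius of 6371.0088 km and the exact plane formula are assumptions, not taken from this page. Array-format geopoints are ordered [lon, lat].

```python
import math

EARTH_MEAN_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def arc_km(lon1, lat1, lon2, lat2):
    """Great-circle distance computed with the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_MEAN_RADIUS_KM * math.asin(math.sqrt(a))


def plane_km(lon1, lat1, lon2, lat2):
    """Flat-earth approximation: scale the longitude gap by cos(mean latitude)."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return math.hypot(x, y) * EARTH_MEAN_RADIUS_KM


origin = [59, -54]  # [lon, lat], as in the query
points = {"1": [74.00, 40.71], "2": [73.77, -69.63]}

arc = {doc_id: arc_km(*origin, *p) for doc_id, p in points.items()}
plane = {doc_id: plane_km(*origin, *p) for doc_id, p in points.items()}
print(sorted(arc, key=arc.get))  # ['2', '1'], the same order as the response
print(arc)
print(plane)
```

With these inputs, arc_km gives about 1891 km for document 2 and about 10628 km for document 1, close to the sort values in the response.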
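You can provide coordinates in any format supported by the geopoint field type. For a description of all formats, see the geopoint field type documentation.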
To pass multiple geopoints to _geo_distance, use an array:
GET testindex1/_search
{
"sort": [
{
"_geo_distance": {
"point": [[59, -54], [60, -53]],
"order": "asc",
"unit": "km",
"distance_type": "arc",
"mode": "min",
"ignore_unmapped": true
}
}
],
"query": {
"match_all": {}
}
}
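For each document, the sorting distance is calculated as the minimum, maximum, median, or average (as specified by mode) of the distances from all points provided in the search to all points in the document.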
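A local Python sketch of this rule, including the infinity rule for documents without the field. The helper names and the Earth radius are assumptions; the query points match the request above, and the document point is document 2 from the earlier example.

```python
import math
from statistics import median

EARTH_MEAN_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(p, q):
    """Arc distance between two [lon, lat] points."""
    (lon1, lat1), (lon2, lat2) = p, q
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_MEAN_RADIUS_KM * math.asin(math.sqrt(a))


REDUCERS = {"min": min, "max": max, "median": median, "avg": lambda ds: sum(ds) / len(ds)}


def geo_sort_value(query_points, doc_points, mode="min"):
    """Reduce all query-point-to-document-point distances with `mode`."""
    if not doc_points:
        # A document without the geopoint field sorts as if infinitely far away.
        return math.inf
    distances = [haversine_km(q, d) for q in query_points for d in doc_points]
    return REDUCERS[mode](distances)


query = [[59, -54], [60, -53]]
doc_2 = [[73.77, -69.63]]
print(geo_sort_value(query, doc_2, "min"), geo_sort_value(query, doc_2, "max"))
print(geo_sort_value(query, []))  # inf
```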
Performance considerations
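Sorted field values are loaded into memory for sorting. Therefore, to minimize overhead, we recommend mapping numeric types to the smallest acceptable types, such as short, integer, and float. Sort fields of string types should not be analyzed or tokenized.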