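Autocomplete functionality
Autocomplete shows suggestions to users while they type.
For example, if a user types "pop", OpenSearch provides suggestions like "popcorn" or "popsicles". These suggestions anticipate the user's intent and lead them to a likely search term more quickly.
OpenSearch lets you design autocomplete that updates with each keystroke, provides a few relevant suggestions, and tolerates typos.
Implement autocomplete using one of the following methods: prefix matching, edge n-gram matching, search as you type, or the completion suggester.
Prefix matching happens at query time, whereas the other three methods happen at index time. All methods are described in the following sections.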
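Prefix matching
Prefix matching finds documents that contain a term beginning with the last term of the query string.
For example, assume that a user types "qui" into a search UI. To autocomplete this phrase, use the match_phrase_prefix query to search for all text_entry field values that contain a word starting with "qui":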
GET shakespeare/_search
{
"query": {
"match_phrase_prefix": {
"text_entry": {
"query": "qui",
"slop": 3
}
}
}
}
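To make the word order and relative positions flexible, specify a slop value. To learn about the slop option, see Slop.
Prefix matching doesn't require any special mappings. It works with your data as is. However, it is a fairly resource-intensive operation. A prefix of a could match hundreds of thousands of terms, which is not useful to your users. To limit the impact of prefix expansion, set max_expansions to a reasonable number: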
GET shakespeare/_search
{
"query": {
"match_phrase_prefix": {
"text_entry": {
"query": "qui",
"slop": 3,
"max_expansions": 10
}
}
}
}
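The max_expansions option sets the maximum number of terms to which the query can expand. For a match_phrase_prefix query, the last term of the query string is treated as a prefix and expanded into matching indexed terms, so a lower value bounds the work done per query at the risk of missing some matches.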
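To make the effect of max_expansions concrete, the following Python sketch expands a prefix against a small sorted term list and stops at the cap. It is only an illustration under simplifying assumptions: the term list and the expand_prefix helper are hypothetical, and the real term enumeration inside the search engine is more involved.

```python
from bisect import bisect_left
from typing import List


def expand_prefix(sorted_terms: List[str], prefix: str, max_expansions: int) -> List[str]:
    """Return up to max_expansions terms from sorted_terms that start with prefix."""
    start = bisect_left(sorted_terms, prefix)  # first term >= prefix
    expansions: List[str] = []
    for term in sorted_terms[start:]:
        # Stop when the terms no longer share the prefix or the cap is reached.
        if not term.startswith(prefix) or len(expansions) == max_expansions:
            break
        expansions.append(term)
    return expansions


# Hypothetical term dictionary; a real index holds far more terms.
terms = sorted(["queen", "quick", "quiet", "quill", "quit", "quote"])
print(expand_prefix(terms, "qui", 2))   # ['quick', 'quiet']
print(expand_prefix(terms, "qui", 10))  # ['quick', 'quiet', 'quill', 'quit']
```

With the cap set to 2, documents that contain only "quill" or "quit" would not be found, which is the trade-off a low max_expansions value makes.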
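The ease of implementing query-time autocomplete comes at the cost of performance. When implementing this feature at scale, we recommend an index-time solution. With an index-time solution, indexing might be slower, but that is a price you pay once rather than on every query. The edge n-gram, search-as-you-type, and completion suggester methods are all index-time solutions.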
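Edge n-gram matching
During indexing, a word is split into n-grams (sequences of n characters) to support faster lookup of partial search terms.
If you n-gram the word "quick", the results depend on the value of n.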
n | Type | n-gram |
---|---|---|
1 | Unigram | [ q , u , i , c , k ] |
2 | Bigram | [ qu , ui , ic , ck ] |
3 | Trigram | [ qui , uic , ick ] |
4 | Four-gram | [ quic , uick ] |
5 | Five-gram | [ quick ] |
Autocomplete needs only the beginning n-grams of a search phrase, so OpenSearch uses a special type of n-gram called an edge n-gram.
Edge n-gramming the word "quick" produces the following:
q
qu
qui
quic
quick
This follows the same sequence that the user types.
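The difference between plain n-grams and edge n-grams can be reproduced in a few lines of Python. This is a minimal sketch for illustration only: the function names are hypothetical, and the min_gram and max_gram parameters simply mirror the filter settings used in this section.

```python
from typing import List


def char_ngrams(word: str, n: int) -> List[str]:
    """Return every contiguous substring of length n in word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def edge_ngrams(word: str, min_gram: int = 1, max_gram: int = 20) -> List[str]:
    """Return the prefixes of word whose length is between min_gram and max_gram."""
    longest = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, longest + 1)]


print(char_ngrams("quick", 3))  # ['qui', 'uic', 'ick']
print(edge_ngrams("quick"))     # ['q', 'qu', 'qui', 'quic', 'quick']
```

Because every edge n-gram is anchored at the start of the word, a lookup for a partial input such as "qui" becomes an exact term match against the index.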
To configure a field to use edge n-grams, create an autocomplete analyzer with an edge_ngram filter:
PUT shakespeare
{
"mappings": {
"properties": {
"text_entry": {
"type": "text",
"analyzer": "autocomplete"
}
}
},
"settings": {
"analysis": {
"filter": {
"edge_ngram_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"edge_ngram_filter"
]
}
}
}
}
}
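This example creates the index and instantiates the edge n-gram filter and analyzer.
The edge_ngram_filter produces edge n-grams with a minimum length of 1 (a single letter) and a maximum length of 20, so it can offer suggestions for words of up to 20 letters.
The autocomplete analyzer tokenizes a string into individual terms, lowercases the terms, and then uses the edge_ngram_filter to produce edge n-grams for each term.
Use the analyze operation to test this analyzer: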
POST shakespeare/_analyze
{
"analyzer": "autocomplete",
"text": "quick"
}
It returns the edge n-grams as tokens:
q
qu
qui
quic
quick
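Use the standard analyzer at search time. Otherwise, the search query is itself split into edge n-grams, and you get results for everything that matches q, qu, or qui. This is one of the few cases in which you use different analyzers at index time and at query time.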
GET shakespeare/_search
{
"query": {
"match": {
"text_entry": {
"query": "qui",
"analyzer": "standard"
}
}
}
}
The response contains the matching documents:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 533,
"relation": "eq"
},
"max_score": 9.712725,
"hits": [
{
"_index": "shakespeare",
"_id": "22006",
"_score": 9.712725,
"_source": {
"type": "line",
"line_id": 22007,
"play_name": "Antony and Cleopatra",
"speech_number": 12,
"line_number": "5.2.44",
"speaker": "CLEOPATRA",
"text_entry": "Quick, quick, good hands."
}
},
{
"_index": "shakespeare",
"_id": "54665",
"_score": 9.712725,
"_source": {
"type": "line",
"line_id": 54666,
"play_name": "Loves Labours Lost",
"speech_number": 21,
"line_number": "5.1.52",
"speaker": "HOLOFERNES",
"text_entry": "Quis, quis, thou consonant?"
}
}
...
]
}
}
Alternatively, specify the search_analyzer in the mapping itself:
"mappings": {
"properties": {
"text_entry": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
}
}
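To see why the search-time analyzer matters, the following Python sketch imitates the two analyzers with simple string operations and compares the results for the input "qui". It is a rough approximation rather than the actual analysis chain: the helper functions are hypothetical, and the second sample line is made up for contrast.

```python
import re
from typing import List


def standard_tokens(text: str) -> List[str]:
    """Rough stand-in for the standard tokenizer followed by the lowercase filter."""
    return re.findall(r"\w+", text.lower())


def autocomplete_tokens(text: str, max_gram: int = 20) -> List[str]:
    """Edge n-grams (length 1 to max_gram) of every word, like the autocomplete analyzer."""
    return [
        word[:n]
        for word in standard_tokens(text)
        for n in range(1, min(len(word), max_gram) + 1)
    ]


def search(docs: List[str], query_tokens: List[str]) -> List[str]:
    """Return docs whose indexed edge n-grams contain any of the query tokens."""
    return [doc for doc in docs if set(autocomplete_tokens(doc)) & set(query_tokens)]


docs = ["Quick, quick, good hands.", "Long live the queen"]  # second line is made up

print(search(docs, standard_tokens("qui")))      # only the "Quick" line
print(search(docs, autocomplete_tokens("qui")))  # both lines: "q" and "qu" also match
```

Analyzing the query with the autocomplete analyzer turns "qui" into q, qu, and qui, so any document containing a word that starts with "q" becomes a match.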
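Completion suggester
The completion suggester accepts a list of suggestions and builds them into a finite-state transducer (FST), an optimized data structure that is essentially a graph. This data structure lives in memory and is optimized for fast prefix lookups. To learn more about FSTs, see Wikipedia.
As the user types, the completion suggester moves through the FST graph one character at a time along a matching path. After it runs out of user input, it examines the remaining possible endings to produce a list of suggestions.
The completion suggester makes your autocomplete solution as efficient as possible and gives you explicit control over its suggestions.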
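The character-by-character traversal can be pictured with a much simpler structure than an FST. The following Python sketch uses a plain trie, which is a stand-in chosen for readability and not the structure the suggester actually builds. The Completer class, its methods, and the weight-based ordering are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.entries: List[Tuple[int, str]] = []  # (weight, suggestion) ending at this node


class Completer:
    """Toy prefix completer backed by a trie (not an FST)."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, suggestion: str, weight: int = 1) -> None:
        node = self.root
        for ch in suggestion.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.entries.append((weight, suggestion))

    def suggest(self, prefix: str, size: int = 5) -> List[str]:
        # Walk the trie one character at a time along the matching path.
        node = self.root
        for ch in prefix.lower():
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Input is exhausted: collect every ending reachable from this node.
        found: List[Tuple[int, str]] = []
        stack = [node]
        while stack:
            current = stack.pop()
            found.extend(current.entries)
            stack.extend(current.children.values())
        # Highest weight first, ties broken by the suggestion text.
        found.sort(key=lambda entry: (-entry[0], entry[1]))
        return [text for _, text in found[:size]]


completer = Completer()
completer.add("To be a party in this injury.")
completer.add("To name the bigger light, and how the less,")
completer.add("To NESTOR")
completer.add("To be, or not to be: that is the question:", weight=10)

print(completer.suggest("To be", size=1))
print(completer.suggest("To n", size=3))
```

A lookup costs roughly the length of the typed prefix plus the size of the subtree below it, which is why prefix structures of this kind suit per-keystroke suggestions.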
Use a dedicated field type called completion, which stores the FST-like data structures in the index:
PUT shakespeare
{
"mappings": {
"properties": {
"text_entry": {
"type": "completion"
}
}
}
}
To get suggestions, use the search endpoint with the suggest parameter:
GET shakespeare/_search
{
"suggest": {
"autocomplete": {
"prefix": "To be",
"completion": {
"field": "text_entry"
}
}
}
}
The phrase "to be" is prefix matched against the FST of the text_entry field:
{
"took" : 29,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"suggest" : {
"autocomplete" : [
{
"text" : "To be",
"offset" : 0,
"length" : 5,
"options" : [
{
"text" : "To be a comrade with the wolf and owl,--",
"_index" : "shakespeare",
"_id" : "50652",
"_score" : 1.0,
"_source" : {
"type" : "line",
"line_id" : 50653,
"play_name" : "King Lear",
"speech_number" : 68,
"line_number" : "2.4.230",
"speaker" : "KING LEAR",
"text_entry" : "To be a comrade with the wolf and owl,--"
}
},
{
"text" : "To be a make-peace shall become my age:",
"_index" : "shakespeare",
"_id" : "78566",
"_score" : 1.0,
"_source" : {
"type" : "line",
"line_id" : 78567,
"play_name" : "Richard II",
"speech_number" : 20,
"line_number" : "1.1.160",
"speaker" : "JOHN OF GAUNT",
"text_entry" : "To be a make-peace shall become my age:"
}
},
{
"text" : "To be a party in this injury.",
"_index" : "shakespeare",
"_id" : "75259",
"_score" : 1.0,
"_source" : {
"type" : "line",
"line_id" : 75260,
"play_name" : "Othello",
"speech_number" : 57,
"line_number" : "5.1.93",
"speaker" : "IAGO",
"text_entry" : "To be a party in this injury."
}
},
{
"text" : "To be a preparation gainst the Polack;",
"_index" : "shakespeare",
"_id" : "33591",
"_score" : 1.0,
"_source" : {
"type" : "line",
"line_id" : 33592,
"play_name" : "Hamlet",
"speech_number" : 17,
"line_number" : "2.2.67",
"speaker" : "VOLTIMAND",
"text_entry" : "To be a preparation gainst the Polack;"
}
},
{
"text" : "To be a public spectacle to all:",
"_index" : "shakespeare",
"_id" : "3709",
"_score" : 1.0,
"_source" : {
"type" : "line",
"line_id" : 3710,
"play_name" : "Henry VI Part 1",
"speech_number" : 6,
"line_number" : "1.4.41",
"speaker" : "TALBOT",
"text_entry" : "To be a public spectacle to all:"
}
}
]
}
]
}
}
To specify the number of suggestions to return, use the size parameter:
GET shakespeare/_search
{
"suggest": {
"autocomplete": {
"prefix": "To n",
"completion": {
"field": "text_entry",
"size": 3
}
}
}
}
At most three documents are returned:
{
"took" : 4109,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"suggest" : {
"autocomplete" : [
{
"text" : "To n",
"offset" : 0,
"length" : 4,
"options" : [
{
"text" : "To NESTOR",
"_index" : "shakespeare",
"_id" : "99707",
"_score" : 1.0,
"_source" : {
"type" : "line",
"line_id" : 99708,
"play_name" : "Troilus and Cressida",
"speech_number" : 3,
"line_number" : "",
"speaker" : "ULYSSES",
"text_entry" : "To NESTOR"
}
},
{
"text" : "To name the bigger light, and how the less,",
"_index" : "shakespeare",
"_id" : "91884",
"_score" : 1.0,
"_source" : {
"type" : "line",
"line_id" : 91885,
"play_name" : "The Tempest",
"speech_number" : 91,
"line_number" : "1.2.394",
"speaker" : "CALIBAN",
"text_entry" : "To name the bigger light, and how the less,"
}
},
{
"text" : "To nature none more bound; his training such,",
"_index" : "shakespeare",
"_id" : "40510",
"_score" : 1.0,
"_source" : {
"type" : "line",
"line_id" : 40511,
"play_name" : "Henry VIII",
"speech_number" : 18,
"line_number" : "1.2.126",
"speaker" : "KING HENRY VIII",
"text_entry" : "To nature none more bound; his training such,"
}
}
]
}
]
}
}
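The suggest parameter finds suggestions using only prefix matching. For example, the document "To be, or not to be" is not part of the results. If you want specific documents returned as suggestions, you can manually add curated suggestions and assign weights to prioritize them.
Index a document with input suggestions and assign a weight: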
PUT shakespeare/_doc/1?refresh=true
{
"text_entry": {
"input": [
"To n", "To be, or not to be: that is the question:"
],
"weight": 10
}
}
Perform the same search:
GET shakespeare/_search
{
"suggest": {
"autocomplete": {
"prefix": "To n",
"completion": {
"field": "text_entry",
"size": 3
}
}
}
}
The indexed document now appears as the first result:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"suggest" : {
"autocomplete" : [
{
"text" : "To n",
"offset" : 0,
"length" : 4,
"options" : [
{
"text" : "To n",
"_index" : "shakespeare",
"_id" : "1",
"_score" : 10.0,
"_source" : {
"text_entry" : {
"input" : [
"To n",
"To be, or not to be: that is the question:"
],
"weight" : 10
}
}
},
{
"text" : "To NESTOR",
"_index" : "shakespeare",
"_id" : "99707",
"_score" : 1.0,
"_source" : {
"type" : "line",
"line_id" : 99708,
"play_name" : "Troilus and Cressida",
"speech_number" : 3,
"line_number" : "",
"speaker" : "ULYSSES",
"text_entry" : "To NESTOR"
}
},
{
"text" : "To name the bigger light, and how the less,",
"_index" : "shakespeare",
"_id" : "91884",
"_score" : 1.0,
"_source" : {
"type" : "line",
"line_id" : 91885,
"play_name" : "The Tempest",
"speech_number" : 91,
"line_number" : "1.2.394",
"speaker" : "CALIBAN",
"text_entry" : "To name the bigger light, and how the less,"
}
}
]
}
]
}
}
You can also allow for misspellings in the query by specifying the fuzzy parameter:
GET shakespeare/_search
{
"suggest": {
"autocomplete": {
"prefix": "rosenkrantz",
"completion": {
"field": "text_entry",
"size": 3,
"fuzzy" : {
"fuzziness" : "AUTO"
}
}
}
}
}
The result matches the correct spelling:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"suggest" : {
"autocomplete" : [
{
"text" : "rosenkrantz",
"offset" : 0,
"length" : 11,
"options" : [
{
"text" : "ROSENCRANTZ:",
"_index" : "shakespeare",
"_id" : "35196",
"_score" : 5.0,
"_source" : {
"type" : "line",
"line_id" : 35197,
"play_name" : "Hamlet",
"speech_number" : 2,
"line_number" : "4.2.1",
"speaker" : "HAMLET",
"text_entry" : "ROSENCRANTZ:"
}
}
]
}
]
}
}
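The idea behind fuzzy prefix matching can be sketched in Python with an edit-distance check. This is an approximation under stated assumptions: it uses plain Levenshtein distance, it assumes that AUTO allows 0 edits for terms of 1 to 2 characters, 1 edit for 3 to 5 characters, and 2 edits for longer terms, and it ignores the other fuzzy options. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def auto_fuzziness(term: str) -> int:
    """Edits allowed under the assumed AUTO thresholds."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_prefix_match(query: str, candidate: str) -> bool:
    """True if some prefix of candidate is within the allowed edit distance of query."""
    q, c = query.lower(), candidate.lower()
    allowed = auto_fuzziness(q)
    shortest = max(0, len(q) - allowed)
    longest = min(len(c), len(q) + allowed)
    return any(levenshtein(q, c[:n]) <= allowed for n in range(shortest, longest + 1))


print(fuzzy_prefix_match("rosenkrantz", "ROSENCRANTZ:"))   # True: one substitution (k -> c)
print(fuzzy_prefix_match("rosenkrantz", "GUILDENSTERN:"))  # False
```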
You can also express the prefix as a regular expression by using the regex parameter in place of prefix:
GET shakespeare/_search
{
"suggest": {
"autocomplete": {
"regex": "rosen.*",
"completion": {
"field": "text_entry",
"size": 3
}
}
}
}
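For more information, see the completion field type documentation.
Search as you type
OpenSearch has a dedicated search_as_you_type field type that is optimized for search-as-you-type functionality and can match terms using both prefix and infix completion. The search_as_you_type field does not require you to set up a custom analyzer or index suggestions beforehand.
First, map the field as search_as_you_type: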
PUT shakespeare
{
"mappings": {
"properties": {
"text_entry": {
"type": "search_as_you_type"
}
}
}
}
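After you index a document, OpenSearch automatically creates and stores its n-grams and edge n-grams. For example, consider the string that is the question. First, it is split into terms using the standard analyzer, and the terms are stored in the text_entry field: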
[
"that",
"is",
"the",
"question"
]
In addition to these terms, the following 2-grams for this field are stored in the text_entry._2gram field:
[
"that is",
"is the",
"the question"
]
The following 3-grams for this field are stored in the text_entry._3gram field:
[
"that is the",
"is the question"
]
Finally, after an edge n-gram token filter is applied, the resulting tokens are stored in the text_entry._index_prefix field:
[
"t",
"th",
"tha",
"that",
...
]
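The word-level 2-grams and 3-grams listed above can be reproduced with a short Python sketch. It is an approximation for illustration: the regular expression only roughly imitates the standard analyzer, the helper names are hypothetical, and the _index_prefix subfield is left out.

```python
import re
from typing import Dict, List


def shingles(terms: List[str], n: int) -> List[str]:
    """Return every run of n consecutive terms joined by a space."""
    return [" ".join(terms[i:i + n]) for i in range(len(terms) - n + 1)]


def subfields(text: str, max_shingle_size: int = 3) -> Dict[str, List[str]]:
    """Build the term list and the _2gram.._Ngram lists for one text_entry value."""
    terms = re.findall(r"\w+", text.lower())
    fields = {"text_entry": terms}
    for n in range(2, max_shingle_size + 1):
        fields[f"text_entry._{n}gram"] = shingles(terms, n)
    return fields


for name, values in subfields("that is the question").items():
    print(name, values)
```

Because these subfields keep adjacent words together, a query that hits the 2-gram or 3-gram subfields matches words in their original order.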
You can then match terms in any order using the bool_prefix type of the multi_match query:
GET shakespeare/_search
{
"query": {
"multi_match": {
"query": "uncle what",
"type": "bool_prefix",
"fields": [
"text_entry",
"text_entry._2gram",
"text_entry._3gram"
]
}
},
"size": 3
}
Documents in which the words appear in the same order as in the query are ranked higher in the results:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4759,
"relation" : "eq"
},
"max_score" : 10.437667,
"hits" : [
{
"_index" : "shakespeare",
"_id" : "2817",
"_score" : 10.437667,
"_source" : {
"type" : "line",
"line_id" : 2818,
"play_name" : "Henry IV",
"speech_number" : 5,
"line_number" : "5.2.31",
"speaker" : "HOTSPUR",
"text_entry" : "Uncle, what news?"
}
},
{
"_index" : "shakespeare",
"_id" : "37085",
"_score" : 9.437667,
"_source" : {
"type" : "line",
"line_id" : 37086,
"play_name" : "Henry V",
"speech_number" : 26,
"line_number" : "1.2.262",
"speaker" : "KING HENRY V",
"text_entry" : "What treasure, uncle?"
}
},
{
"_index" : "shakespeare",
"_id" : "79274",
"_score" : 9.358302,
"_source" : {
"type" : "line",
"line_id" : 79275,
"play_name" : "Richard II",
"speech_number" : 29,
"line_number" : "2.1.187",
"speaker" : "KING RICHARD II",
"text_entry" : "Why, uncle, whats the matter?"
}
}
]
}
}
To match terms in order, use a match_phrase_prefix query:
GET shakespeare/_search
{
"query": {
"match_phrase_prefix": {
"text_entry": "uncle wha"
}
},
"size": 3
}
The response contains documents that match the prefix:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 6,
"relation" : "eq"
},
"max_score" : 16.37664,
"hits" : [
{
"_index" : "shakespeare",
"_id" : "2817",
"_score" : 16.37664,
"_source" : {
"type" : "line",
"line_id" : 2818,
"play_name" : "Henry IV",
"speech_number" : 5,
"line_number" : "5.2.31",
"speaker" : "HOTSPUR",
"text_entry" : "Uncle, what news?"
}
},
{
"_index" : "shakespeare",
"_id" : "6789",
"_score" : 16.37664,
"_source" : {
"type" : "line",
"line_id" : 6790,
"play_name" : "Henry VI Part 2",
"speech_number" : 60,
"line_number" : "1.3.202",
"speaker" : "KING HENRY VI",
"text_entry" : "Uncle, what shall we say to this in law?"
}
},
{
"_index" : "shakespeare",
"_id" : "7877",
"_score" : 16.37664,
"_source" : {
"type" : "line",
"line_id" : 7878,
"play_name" : "Henry VI Part 2",
"speech_number" : 13,
"line_number" : "3.2.28",
"speaker" : "KING HENRY VI",
"text_entry" : "Where is our uncle? whats the matter, Suffolk?"
}
}
]
}
}
Finally, to match the last term exactly rather than as a prefix, use a match_phrase query:
GET shakespeare/_search
{
"query": {
"match_phrase": {
"text_entry": "uncle what"
}
},
"size": 5
}
The response contains exact matches:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 14.437452,
"hits" : [
{
"_index" : "shakespeare",
"_id" : "2817",
"_score" : 14.437452,
"_source" : {
"type" : "line",
"line_id" : 2818,
"play_name" : "Henry IV",
"speech_number" : 5,
"line_number" : "5.2.31",
"speaker" : "HOTSPUR",
"text_entry" : "Uncle, what news?"
}
},
{
"_index" : "shakespeare",
"_id" : "6789",
"_score" : 9.461917,
"_source" : {
"type" : "line",
"line_id" : 6790,
"play_name" : "Henry VI Part 2",
"speech_number" : 60,
"line_number" : "1.3.202",
"speaker" : "KING HENRY VI",
"text_entry" : "Uncle, what shall we say to this in law?"
}
},
{
"_index" : "shakespeare",
"_id" : "100955",
"_score" : 8.947967,
"_source" : {
"type" : "line",
"line_id" : 100956,
"play_name" : "Troilus and Cressida",
"speech_number" : 28,
"line_number" : "3.2.98",
"speaker" : "CRESSIDA",
"text_entry" : "Well, uncle, what folly I commit, I dedicate to you."
}
}
]
}
}
If you modify the text in the previous match_phrase query and omit the last letter, none of the documents in the previous response are returned:
GET shakespeare/_search
{
"query": {
"match_phrase": {
"text_entry": "uncle wha"
}
}
}
The result is empty:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
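The three query styles in this section differ only in how they treat word order and the last term. The following Python sketch contrasts them on tokenized text. It is a simplified model rather than the engine's implementation: scoring and slop are ignored, the bool_prefix variant is assumed to combine its clauses with OR, and all function names are hypothetical.

```python
import re
from typing import List


def tokens(text: str) -> List[str]:
    """Lowercased word tokens, a rough stand-in for the standard analyzer."""
    return re.findall(r"\w+", text.lower())


def bool_prefix_match(query: str, text: str) -> bool:
    """Any order: earlier terms match whole words, the last term matches as a prefix."""
    q, d = tokens(query), tokens(text)
    if not q:
        return False
    *whole, last = q
    return any(t in d for t in whole) or any(w.startswith(last) for w in d)


def phrase_prefix_match(query: str, text: str) -> bool:
    """In order and adjacent: the last term may be a prefix of the final word."""
    q, d = tokens(query), tokens(text)
    if not q:
        return False
    for start in range(len(d) - len(q) + 1):
        window = d[start:start + len(q)]
        if window[:-1] == q[:-1] and window[-1].startswith(q[-1]):
            return True
    return False


def phrase_match(query: str, text: str) -> bool:
    """In order and adjacent: every term, including the last, must match exactly."""
    q, d = tokens(query), tokens(text)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


line = "Uncle, what news?"
print(bool_prefix_match("uncle wha", "What treasure, uncle?"))  # True: order is ignored
print(phrase_prefix_match("uncle wha", line))                   # True: "wha" is a prefix of "what"
print(phrase_match("uncle wha", line))                          # False: "wha" is not a whole word
print(phrase_match("uncle what", line))                         # True
```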
For more information, see the search_as_you_type field type documentation.