自动完成功能

自动完成功能在用户输入时显示建议。

例如，如果用户输入“pop”，OpenSearch 会提供“popcorn”或“popsicles”等建议。这些建议预判了用户的意图，并引导他们更快地找到可能的搜索词。

OpenSearch 允许您设计自动完成功能，使其能够随每次按键更新、提供一些相关的建议并容忍拼写错误。

使用以下方法之一实现自动完成

前缀匹配
边缘 n-gram 匹配
边输入边搜索
完成建议器

前缀匹配发生在查询时，而其他三种方法发生在索引时。所有方法都将在以下部分中描述。

前缀匹配

前缀匹配查找与查询字符串中最后一个词匹配的文档。

例如，假设用户在搜索 UI 中输入“qui”。要自动完成此短语，请使用 match_phrase_prefix 查询来搜索所有以“qui”为前缀的 text_entry 字段值。

GET shakespeare/_search
{
  "query": {
    "match_phrase_prefix": {
      "text_entry": {
        "query": "qui",
        "slop": 3
      }
    }
  }
}

为了使词序和相对位置灵活，请指定一个 slop 值。要了解 slop 选项，请参见 Slop。

前缀匹配不需要任何特殊映射。它直接与您的数据配合使用。然而，这是一项相当耗费资源的操作。前缀为 a 可能会匹配数十万个词，这对您的用户没有用。为了限制前缀扩展的影响，请将 max_expansions 设置为一个合理的数字。

GET shakespeare/_search
{
  "query": {
    "match_phrase_prefix": {
      "text_entry": {
        "query": "qui",
        "slop": 3,
        "max_expansions": 10
      }
    }
  }
}

查询可以扩展到的最大词数。查询将搜索词“扩展”到与 fuzziness 中指定的距离内的匹配词。

实现查询时自动完成的简易性是以性能为代价的。在大规模实现此功能时，我们建议使用索引时解决方案。使用索引时解决方案，您可能会遇到较慢的索引速度，但这只是您一次性付出的代价，而不是每次查询都付出的代价。边缘 n-gram、即时搜索（search-as-you-type）和完成建议器方法都是索引时解决方案。

边缘 n-gram 匹配

在索引期间，边缘 n-gram 将一个词拆分为 n 个字符的序列，以支持更快地查找部分搜索词。

如果您对词“quick”进行 n-gram 处理，结果将取决于 n 的值。

n	类型	n-gram
1	Unigram	[ `q`, `u`, `i`, `c`, `k` ]
2	Bigram	[ `qu`, `ui`, `ic`, `ck` ]
3	Trigram	[ `qui`, `uic`, `ick` ]
4	Four-gram	[ `quic`, `uick` ]
5	Five-gram	[ `quick` ]

自动完成只需要搜索短语的起始 n-gram，因此 OpenSearch 使用一种特殊类型的 n-gram，称为 边缘 n-gram。

对词“quick”进行边缘 n-gram 处理会得到以下结果

q
qu
qui
quic
quick

这遵循了用户输入的相同序列。

要配置字段以使用边缘 n-gram，请创建一个带有 edge_ngram 过滤器的自动完成分析器。

PUT shakespeare
{
  "mappings": {
    "properties": {
      "text_entry": {
        "type": "text",
        "analyzer": "autocomplete"
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "edge_ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "edge_ngram_filter"
          ]
        }
      }
    }
  }
}

此示例创建索引并实例化边缘 n-gram 过滤器和分析器。

edge_ngram_filter 生成的边缘 n-gram 最小长度为 1（单个字母），最大长度为 20。因此，它为最多 20 个字母的词提供建议。

autocomplete 分析器将字符串分词为单独的词，将词转换为小写，然后使用 edge_ngram_filter 为每个词生成边缘 n-gram。

使用 analyze 操作测试此分析器

POST shakespeare/_analyze
{
  "analyzer": "autocomplete",
  "text": "quick"
}

它以词元的形式返回边缘 n-gram

q
qu
qui
quic
quick

在搜索时使用 standard 分析器。否则，搜索查询会拆分为边缘 n-gram，您将获得与 q、u 和 i 匹配的所有结果。这是少数几种在索引时和查询时使用不同分析器的情况之一。

GET shakespeare/_search
{
  "query": {
    "match": {
      "text_entry": {
        "query": "qui",
        "analyzer": "standard"
      }
    }
  }
}

响应包含匹配文档：

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 533,
      "relation": "eq"
    },
    "max_score": 9.712725,
    "hits": [
      {
        "_index": "shakespeare",
        "_id": "22006",
        "_score": 9.712725,
        "_source": {
          "type": "line",
          "line_id": 22007,
          "play_name": "Antony and Cleopatra",
          "speech_number": 12,
          "line_number": "5.2.44",
          "speaker": "CLEOPATRA",
          "text_entry": "Quick, quick, good hands."
        }
      },
      {
        "_index": "shakespeare",
        "_id": "54665",
        "_score": 9.712725,
        "_source": {
          "type": "line",
          "line_id": 54666,
          "play_name": "Loves Labours Lost",
          "speech_number": 21,
          "line_number": "5.1.52",
          "speaker": "HOLOFERNES",
          "text_entry": "Quis, quis, thou consonant?"
        }
      }
      ...
    ]
  }
}

或者，在映射本身中指定 search_analyzer

"mappings": {
  "properties": {
    "text_entry": {
      "type": "text",
      "analyzer": "autocomplete",
      "search_analyzer": "standard"
    }
  }
}

完成建议器

完成建议器接受一个建议列表，并将其构建为有限状态换能器（FST），这是一种优化的数据结构，本质上是一个图。此数据结构驻留在内存中，并针对快速前缀查找进行了优化。要了解更多关于 FST 的信息，请参见维基百科。

当用户输入时，完成建议器沿着匹配路径逐个字符地遍历 FST 图。在用户输入耗尽后，它会检查剩余的结尾以生成建议列表。

完成建议器使您的自动完成解决方案尽可能高效，并让您能够对其建议进行明确控制。

使用名为 completion 的专用字段类型，它将 FST 类似的数据结构存储在索引中。

PUT shakespeare
{
  "mappings": {
    "properties": {
      "text_entry": {
        "type": "completion"
      }
    }
  }
}

要获取建议，请使用带有 suggest 参数的 search 端点。

GET shakespeare/_search
{
  "suggest": {
    "autocomplete": {
      "prefix": "To be",
      "completion": {
        "field": "text_entry"
      }
    }
  }
}

短语“to be”与 text_entry 字段的 FST 进行前缀匹配。

{
  "took" : 29,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "autocomplete" : [
      {
        "text" : "To be",
        "offset" : 0,
        "length" : 5,
        "options" : [
          {
            "text" : "To be a comrade with the wolf and owl,--",
            "_index" : "shakespeare",
            "_id" : "50652",
            "_score" : 1.0,
            "_source" : {
              "type" : "line",
              "line_id" : 50653,
              "play_name" : "King Lear",
              "speech_number" : 68,
              "line_number" : "2.4.230",
              "speaker" : "KING LEAR",
              "text_entry" : "To be a comrade with the wolf and owl,--"
            }
          },
          {
            "text" : "To be a make-peace shall become my age:",
            "_index" : "shakespeare",
            "_id" : "78566",
            "_score" : 1.0,
            "_source" : {
              "type" : "line",
              "line_id" : 78567,
              "play_name" : "Richard II",
              "speech_number" : 20,
              "line_number" : "1.1.160",
              "speaker" : "JOHN OF GAUNT",
              "text_entry" : "To be a make-peace shall become my age:"
            }
          },
          {
            "text" : "To be a party in this injury.",
            "_index" : "shakespeare",
            "_id" : "75259",
            "_score" : 1.0,
            "_source" : {
              "type" : "line",
              "line_id" : 75260,
              "play_name" : "Othello",
              "speech_number" : 57,
              "line_number" : "5.1.93",
              "speaker" : "IAGO",
              "text_entry" : "To be a party in this injury."
            }
          },
          {
            "text" : "To be a preparation gainst the Polack;",
            "_index" : "shakespeare",
            "_id" : "33591",
            "_score" : 1.0,
            "_source" : {
              "type" : "line",
              "line_id" : 33592,
              "play_name" : "Hamlet",
              "speech_number" : 17,
              "line_number" : "2.2.67",
              "speaker" : "VOLTIMAND",
              "text_entry" : "To be a preparation gainst the Polack;"
            }
          },
          {
            "text" : "To be a public spectacle to all:",
            "_index" : "shakespeare",
            "_id" : "3709",
            "_score" : 1.0,
            "_source" : {
              "type" : "line",
              "line_id" : 3710,
              "play_name" : "Henry VI Part 1",
              "speech_number" : 6,
              "line_number" : "1.4.41",
              "speaker" : "TALBOT",
              "text_entry" : "To be a public spectacle to all:"
            }
          }
        ]
      }
    ]
  }
}

要指定要返回的建议数量，请使用 size 参数。

GET shakespeare/_search
{
  "suggest": {
    "autocomplete": {
      "prefix": "To n",
      "completion": {
        "field": "text_entry",
        "size": 3
      }
    }
  }
}

最多返回三个文档。

{
  "took" : 4109,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "autocomplete" : [
      {
        "text" : "To n",
        "offset" : 0,
        "length" : 4,
        "options" : [
          {
            "text" : "To NESTOR",
            "_index" : "shakespeare",
            "_id" : "99707",
            "_score" : 1.0,
            "_source" : {
              "type" : "line",
              "line_id" : 99708,
              "play_name" : "Troilus and Cressida",
              "speech_number" : 3,
              "line_number" : "",
              "speaker" : "ULYSSES",
              "text_entry" : "To NESTOR"
            }
          },
          {
            "text" : "To name the bigger light, and how the less,",
            "_index" : "shakespeare",
            "_id" : "91884",
            "_score" : 1.0,
            "_source" : {
              "type" : "line",
              "line_id" : 91885,
              "play_name" : "The Tempest",
              "speech_number" : 91,
              "line_number" : "1.2.394",
              "speaker" : "CALIBAN",
              "text_entry" : "To name the bigger light, and how the less,"
            }
          },
          {
            "text" : "To nature none more bound; his training such,",
            "_index" : "shakespeare",
            "_id" : "40510",
            "_score" : 1.0,
            "_source" : {
              "type" : "line",
              "line_id" : 40511,
              "play_name" : "Henry VIII",
              "speech_number" : 18,
              "line_number" : "1.2.126",
              "speaker" : "KING HENRY VIII",
              "text_entry" : "To nature none more bound; his training such,"
            }
          }
        ]
      }
    ]
  }
}

suggest 参数仅使用前缀匹配来查找建议。例如，文档“To be, or not to be”不属于结果。如果您希望返回特定文档作为建议，可以手动添加精心策划的建议并添加权重以优先显示您的建议。

索引包含输入建议的文档并分配权重

PUT shakespeare/_doc/1?refresh=true
{
  "text_entry": {
    "input": [
      "To n", "To be, or not to be: that is the question:"
    ],
    "weight": 10
  }
}

执行相同的搜索

GET shakespeare/_search
{
  "suggest": {
    "autocomplete": {
      "prefix": "To n",
      "completion": {
        "field": "text_entry",
        "size": 3
      }
    }
  }
}

您将看到索引文档作为第一个结果。

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "autocomplete" : [
      {
        "text" : "To n",
        "offset" : 0,
        "length" : 4,
        "options" : [
          {
            "text" : "To n",
            "_index" : "shakespeare",
            "_id" : "1",
            "_score" : 10.0,
            "_source" : {
              "text_entry" : {
                "input" : [
                  "To n",
                  "To be, or not to be: that is the question:"
                ],
                "weight" : 10
              }
            }
          },
          {
            "text" : "To NESTOR",
            "_index" : "shakespeare",
            "_id" : "99707",
            "_score" : 1.0,
            "_source" : {
              "type" : "line",
              "line_id" : 99708,
              "play_name" : "Troilus and Cressida",
              "speech_number" : 3,
              "line_number" : "",
              "speaker" : "ULYSSES",
              "text_entry" : "To NESTOR"
            }
          },
          {
            "text" : "To name the bigger light, and how the less,",
            "_index" : "shakespeare",
            "_id" : "91884",
            "_score" : 1.0,
            "_source" : {
              "type" : "line",
              "line_id" : 91885,
              "play_name" : "The Tempest",
              "speech_number" : 91,
              "line_number" : "1.2.394",
              "speaker" : "CALIBAN",
              "text_entry" : "To name the bigger light, and how the less,"
            }
          }
        ]
      }
    ]
  }
}

您还可以通过指定 fuzzy 参数来允许查询中的拼写错误。

GET shakespeare/_search
{
  "suggest": {
    "autocomplete": {
      "prefix": "rosenkrantz",
      "completion": {
        "field": "text_entry",
        "size": 3,
        "fuzzy" : {
            "fuzziness" : "AUTO"
        }
      }
    }
  }
}

结果与正确的拼写匹配。

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "autocomplete" : [
      {
        "text" : "rosenkrantz",
        "offset" : 0,
        "length" : 11,
        "options" : [
          {
            "text" : "ROSENCRANTZ:",
            "_index" : "shakespeare",
            "_id" : "35196",
            "_score" : 5.0,
            "_source" : {
              "type" : "line",
              "line_id" : 35197,
              "play_name" : "Hamlet",
              "speech_number" : 2,
              "line_number" : "4.2.1",
              "speaker" : "HAMLET",
              "text_entry" : "ROSENCRANTZ:"
            }
          }
        ]
      }
    ]
  }
}

您可以使用正则表达式来定义完成建议器查询的前缀。

GET shakespeare/_search
{
  "suggest": {
    "autocomplete": {
      "prefix": "rosen*",
      "completion": {
        "field": "text_entry",
        "size": 3
      }
    }
  }
}

欲了解更多信息，请参阅 completion 字段类型文档。

边输入边搜索

OpenSearch 有一个专用的 search_as_you_type 字段类型，它针对即时搜索功能进行了优化，可以使用前缀和中缀完成来匹配词。 search_as_you_type 字段不需要您预先设置自定义分析器或索引建议。

首先，将字段映射为 search_as_you_type

PUT shakespeare
{
  "mappings": {
    "properties": {
      "text_entry": {
        "type": "search_as_you_type"
      }
    }
  }
}

索引文档后，OpenSearch 会自动创建并存储其 n-gram 和边缘 n-gram。例如，考虑字符串 that is the question。首先，它使用标准分析器拆分为词，这些词存储在 text_entry 字段中。

[
    "that",
    "is",
    "the",
    "question"
]

除了存储这些词之外，此字段的以下 2-gram 存储在 text_entry._2gram 字段中：

[
    "that is",
    "is the",
    "the question"
]

此字段的以下 3-gram 存储在 text_entry._3gram 字段中：

[
    "that is the",
    "is the question"
]

最后，在应用边缘 n-gram 词元过滤器后，结果词元存储在 text_entry._index_prefix 字段中。

[
    "t", 
    "th", 
    "tha", 
    "that", 
    ...
]

然后，您可以使用 multi-match 查询的 bool_prefix 类型以任意顺序匹配词。

GET shakespeare/_search
{
  "query": {
    "multi_match": {
      "query": "uncle what",
      "type": "bool_prefix",
      "fields": [
        "text_entry",
        "text_entry._2gram",
        "text_entry._3gram"
      ]
    }
  },
  "size": 3
}

词语在查询中以相同顺序出现的文档在结果中排名更高。

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4759,
      "relation" : "eq"
    },
    "max_score" : 10.437667,
    "hits" : [
      {
        "_index" : "shakespeare",
        "_id" : "2817",
        "_score" : 10.437667,
        "_source" : {
          "type" : "line",
          "line_id" : 2818,
          "play_name" : "Henry IV",
          "speech_number" : 5,
          "line_number" : "5.2.31",
          "speaker" : "HOTSPUR",
          "text_entry" : "Uncle, what news?"
        }
      },
      {
        "_index" : "shakespeare",
        "_id" : "37085",
        "_score" : 9.437667,
        "_source" : {
          "type" : "line",
          "line_id" : 37086,
          "play_name" : "Henry V",
          "speech_number" : 26,
          "line_number" : "1.2.262",
          "speaker" : "KING HENRY V",
          "text_entry" : "What treasure, uncle?"
        }
      },
      {
        "_index" : "shakespeare",
        "_id" : "79274",
        "_score" : 9.358302,
        "_source" : {
          "type" : "line",
          "line_id" : 79275,
          "play_name" : "Richard II",
          "speech_number" : 29,
          "line_number" : "2.1.187",
          "speaker" : "KING RICHARD II",
          "text_entry" : "Why, uncle, whats the matter?"
        }
      }
    ]
  }
}

要按顺序匹配词语，您可以使用 match_phrase_prefix 查询。

GET shakespeare/_search
{
  "query": {
    "match_phrase_prefix": {
      "text_entry": "uncle wha"
    }
  },
  "size": 3
}

响应包含与前缀匹配的文档。

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : 16.37664,
    "hits" : [
      {
        "_index" : "shakespeare",
        "_id" : "2817",
        "_score" : 16.37664,
        "_source" : {
          "type" : "line",
          "line_id" : 2818,
          "play_name" : "Henry IV",
          "speech_number" : 5,
          "line_number" : "5.2.31",
          "speaker" : "HOTSPUR",
          "text_entry" : "Uncle, what news?"
        }
      },
      {
        "_index" : "shakespeare",
        "_id" : "6789",
        "_score" : 16.37664,
        "_source" : {
          "type" : "line",
          "line_id" : 6790,
          "play_name" : "Henry VI Part 2",
          "speech_number" : 60,
          "line_number" : "1.3.202",
          "speaker" : "KING HENRY VI",
          "text_entry" : "Uncle, what shall we say to this in law?"
        }
      },
      {
        "_index" : "shakespeare",
        "_id" : "7877",
        "_score" : 16.37664,
        "_source" : {
          "type" : "line",
          "line_id" : 7878,
          "play_name" : "Henry VI Part 2",
          "speech_number" : 13,
          "line_number" : "3.2.28",
          "speaker" : "KING HENRY VI",
          "text_entry" : "Where is our uncle? whats the matter, Suffolk?"
        }
      }
    ]
  }
}

最后，要精确匹配最后一个词而不是作为前缀，您可以使用 match_phrase 查询。

GET shakespeare/_search
{
  "query": {
    "match_phrase": {
      "text_entry": "uncle what"
    }
  },
  "size": 5
}

响应包含精确匹配。

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 14.437452,
    "hits" : [
      {
        "_index" : "shakespeare",
        "_id" : "2817",
        "_score" : 14.437452,
        "_source" : {
          "type" : "line",
          "line_id" : 2818,
          "play_name" : "Henry IV",
          "speech_number" : 5,
          "line_number" : "5.2.31",
          "speaker" : "HOTSPUR",
          "text_entry" : "Uncle, what news?"
        }
      },
      {
        "_index" : "shakespeare",
        "_id" : "6789",
        "_score" : 9.461917,
        "_source" : {
          "type" : "line",
          "line_id" : 6790,
          "play_name" : "Henry VI Part 2",
          "speech_number" : 60,
          "line_number" : "1.3.202",
          "speaker" : "KING HENRY VI",
          "text_entry" : "Uncle, what shall we say to this in law?"
        }
      },
      {
        "_index" : "shakespeare",
        "_id" : "100955",
        "_score" : 8.947967,
        "_source" : {
          "type" : "line",
          "line_id" : 100956,
          "play_name" : "Troilus and Cressida",
          "speech_number" : 28,
          "line_number" : "3.2.98",
          "speaker" : "CRESSIDA",
          "text_entry" : "Well, uncle, what folly I commit, I dedicate to you."
        }
      }
    ]
  }
}

如果您修改上一个 match_phrase 查询中的文本并省略最后一个字母，则上一个响应中的任何文档都不会返回。

GET shakespeare/_search
{
  "query": {
    "match_phrase": {
      "text_entry": "uncle wha"
    }
  }
}

结果为空。

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}