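Highlight query matches
Highlighting emphasizes the search terms in your results so that query matches stand out.
To highlight the search terms, add a highlight parameter outside of the query block: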
GET shakespeare/_search
{
"query": {
"match": {
"text_entry": "life"
}
},
"size": 3,
"highlight": {
"fields": {
"text_entry": {}
}
}
}
Each document in the results contains a highlight object that shows the search term wrapped in em tags:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 805,
"relation" : "eq"
},
"max_score" : 7.450247,
"hits" : [
{
"_index" : "shakespeare",
"_id" : "33765",
"_score" : 7.450247,
"_source" : {
"type" : "line",
"line_id" : 33766,
"play_name" : "Hamlet",
"speech_number" : 60,
"line_number" : "2.2.233",
"speaker" : "HAMLET",
"text_entry" : "my life, except my life."
},
"highlight" : {
"text_entry" : [
"my <em>life</em>, except my <em>life</em>."
]
}
},
{
"_index" : "shakespeare",
"_id" : "51877",
"_score" : 6.873042,
"_source" : {
"type" : "line",
"line_id" : 51878,
"play_name" : "King Lear",
"speech_number" : 18,
"line_number" : "4.6.52",
"speaker" : "EDGAR",
"text_entry" : "The treasury of life, when life itself"
},
"highlight" : {
"text_entry" : [
"The treasury of <em>life</em>, when <em>life</em> itself"
]
}
},
{
"_index" : "shakespeare",
"_id" : "39245",
"_score" : 6.6167283,
"_source" : {
"type" : "line",
"line_id" : 39246,
"play_name" : "Henry V",
"speech_number" : 7,
"line_number" : "4.7.31",
"speaker" : "FLUELLEN",
"text_entry" : "mark Alexanders life well, Harry of Monmouths life"
},
"highlight" : {
"text_entry" : [
"mark Alexanders <em>life</em> well, Harry of Monmouths <em>life</em>"
]
}
}
]
}
}
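Client code usually needs only the highlighted fragments from such a response. The following is a minimal Python sketch, assuming the response has already been parsed into a dictionary. The function name collect_highlights and the trimmed sample_response are illustrative and are not part of any OpenSearch client.

```python
from typing import Dict, List


def collect_highlights(response: dict, field: str) -> Dict[str, List[str]]:
    """Map each hit's _id to its highlighted fragments for one field."""
    fragments_by_id: Dict[str, List[str]] = {}
    for hit in response.get("hits", {}).get("hits", []):
        # Fall back to an empty list when a hit carries no highlight for the field.
        fragments = hit.get("highlight", {}).get(field, [])
        if fragments:
            fragments_by_id[hit["_id"]] = fragments
    return fragments_by_id


# Trimmed copy of the response above, reduced to the keys the function reads.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "33765",
                "highlight": {
                    "text_entry": ["my <em>life</em>, except my <em>life</em>."]
                },
            },
            {"_id": "no-highlight-example"},  # hypothetical hit without highlights
        ]
    }
}

print(collect_highlights(sample_response, "text_entry"))
```

Hits without a highlight entry for the requested field are skipped rather than reported with an empty list.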
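Highlighting works on the actual field contents. OpenSearch retrieves these contents from the stored field (a field whose store mapping parameter is set to true) or, if the field is not stored, from the _source field. You can force the retrieval of field contents from the _source field by setting the force_source parameter to true.
The highlight parameter highlights the original terms even when the search itself uses synonyms or stemming.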
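Methods of obtaining offsets
To highlight the search terms, the highlighter needs the start and end character offsets of each term. The offsets mark the term's position in the original text. The highlighter can obtain the offsets from the following sources:
- Postings: When documents are indexed, OpenSearch creates an inverted search index, the core data structure used to search for documents. Postings represent the inverted search index and store the mapping of each analyzed term to the list of documents in which it occurs. If you set the index_options parameter to offsets when mapping a text field, OpenSearch adds each term's start and end character offsets to the inverted index. During highlighting, the highlighter reruns the original query directly on the postings to locate each term. Thus, storing offsets makes highlighting more efficient for large fields because it does not require reanalyzing the text. Storing term offsets requires additional disk space, but less than storing term vectors.
- Term vectors: If you set the term_vector parameter to with_positions_offsets when mapping a text field, the highlighter obtains the offsets from the stored term vectors. Storing term vectors requires more disk space than storing offsets in the postings.
- Text reanalysis: In the absence of both postings and term vectors, the highlighter reanalyzes the text in order to highlight it. For every document and every field that needs highlighting, the highlighter creates a small in-memory index and reruns the original query through Lucene's query execution planner to access low-level match information for the current document. Reanalyzing the text works well in most use cases. However, for large fields, this method consumes more memory and time.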
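Whatever the source of the offsets, the highlighter uses them the same way. It cuts the original text at each start and end position and inserts tags around the term. The following is a minimal Python sketch of that step only, assuming the offsets are already known and do not overlap. The function name wrap_offsets is illustrative, and the code is not how Lucene implements highlighting.

```python
from typing import List, Tuple


def wrap_offsets(text: str, offsets: List[Tuple[int, int]],
                 pre_tag: str = "<em>", post_tag: str = "</em>") -> str:
    """Wrap each [start, end) character span of text in the given tags."""
    pieces = []
    cursor = 0
    for start, end in sorted(offsets):
        pieces.append(text[cursor:start])  # unhighlighted text before the term
        pieces.append(pre_tag + text[start:end] + post_tag)
        cursor = end
    pieces.append(text[cursor:])  # unhighlighted text after the last term
    return "".join(pieces)


line = "my life, except my life."
# Start and end character offsets of the two occurrences of "life".
life_offsets = [(3, 7), (19, 23)]
print(wrap_offsets(line, life_offsets))
```

Because the offsets refer to the original text, the highlighted fragment keeps the original wording even when the indexed term was stemmed or lowercased.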
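Highlighter types
OpenSearch supports four highlighter implementations: plain, unified, fvh (Fast Vector Highlighter), and semantic.
The following table lists the methods of obtaining the offsets for each highlighter.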
Highlighter | Method of obtaining offsets |
---|---|
unified | Term vectors if term_vector is set to with_positions_offsets, postings if index_options is set to offsets, text reanalysis otherwise. |
fvh | Term vectors. |
plain | Text reanalysis. |
semantic | Model inference. |
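The unified row of the table amounts to a three-step fallback. The following Python sketch restates that rule for a single field mapping. The function name unified_offset_source is illustrative, and the code describes the decision in the table, not the highlighter's internals.

```python
def unified_offset_source(field_mapping: dict) -> str:
    """Return the offset source the table lists for the unified highlighter."""
    if field_mapping.get("term_vector") == "with_positions_offsets":
        return "term vectors"
    if field_mapping.get("index_options") == "offsets":
        return "postings"
    # Neither term vectors nor postings offsets are available.
    return "text reanalysis"


print(unified_offset_source({"type": "text", "index_options": "offsets"}))
print(unified_offset_source({"type": "text"}))
```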
Setting the highlighter type
To set the highlighter type, specify it in the type field:
GET shakespeare/_search
{
"query": {
"match": {
"text_entry": "life"
}
},
"highlight": {
"fields": {
"text_entry": { "type": "plain"}
}
}
}
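The unified highlighter
The unified highlighter is based on the Lucene Unified Highlighter and is the default highlighter for OpenSearch. It divides the text into sentences, treats those sentences as individual documents, and scores them by similarity using the BM25 algorithm. The unified highlighter supports both exact phrase and multi-term highlighting, including fuzzy, prefix, and regex. If you're using complex queries to highlight multiple fields in multiple documents, we recommend using the unified highlighter on postings or term_vector fields.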
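The fvh highlighter
The fvh highlighter is based on the Lucene Fast Vector Highlighter. To use this highlighter, you need to store term vectors with positions and offsets, which increases the index size. The fvh highlighter can combine matched terms from multiple fields into one result. It can also assign weights to matches depending on their positions; thus, when highlighting a query that boosts phrase matches over term matches, you can sort phrase matches above term matches. Additionally, you can configure the fvh highlighter to select the boundaries at which the returned text fragments are split, and you can highlight multiple words with different tags.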
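The plain highlighter
The plain highlighter is based on the standard Lucene highlighter. It requires the highlighted fields to be stored either individually or in the _source field. The plain highlighter mirrors the query matching logic, in particular word importance and positions in phrase queries. It works for most use cases but may be slow for large fields because it has to reanalyze the text to be highlighted.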
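The semantic highlighter
The semantic highlighter uses machine learning (ML) models to identify and highlight the most semantically relevant sentences or passages in a text field, based on the meaning of the query. This goes beyond the traditional lexical matching offered by the other highlighters. It does not rely on offsets from postings or term vectors but instead uses a deployed ML model (specified by model_id) to perform inference on the field content. This approach lets you highlight contextually relevant text even when the exact terms don't match the query. Highlighting is performed at the sentence level.
Before using the semantic highlighter, you must configure and deploy a sentence highlighting model. For more information about using ML models in OpenSearch, see Integrating ML models. For information about OpenSearch-provided sentence highlighting models, see Semantic sentence highlighting models.
To use the semantic highlighter, you must specify a model_id in the highlight.options object. The model determines which parts of the text are semantically similar to the query.
For a step-by-step guide, see the semantic highlighting tutorial.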
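Highlighting options
The following table describes the highlighting options you can specify on a global or field level. Field-level settings override global settings.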
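Option | Description |
---|---|
type | Specifies the highlighter to use. Valid values are unified, fvh, plain, and semantic. Default is unified. |
fields | Specifies the text fields to search and highlight. Supports wildcard expressions. If you use wildcards, only text and keyword fields are highlighted. For example, you can set fields to my_field* to include all text and keyword fields that start with the prefix my_field. |
force_source | Specifies that field values for highlighting should be obtained from the _source field rather than from stored field values. Default is false. |
require_field_match | Specifies whether to highlight only fields that contain a search query match. Default is true. To highlight all fields, set this option to false. |
pre_tags | Specifies the HTML start tags for the highlighted text as an array of strings. |
post_tags | Specifies the HTML end tags for the highlighted text as an array of strings. |
tags_schema | If you set this option to styled, OpenSearch uses the built-in tag schema. In this schema, the pre_tags are <em class="hlt1">, <em class="hlt2">, <em class="hlt3">, <em class="hlt4">, <em class="hlt5">, <em class="hlt6">, <em class="hlt7">, <em class="hlt8">, <em class="hlt9">, and <em class="hlt10">, and the post_tags is </em>. |
boundary_chars | All boundary characters combined in a string. Default is ".,!? \t\n". |
boundary_scanner | Valid only for the unified and fvh highlighters. Specifies whether to split the highlighted fragments into sentences, words, or characters. Valid values are the following: - sentence: Splits highlighted fragments at sentence boundaries, as defined by the BreakIterator. You can specify the BreakIterator's locale in the boundary_scanner_locale option. - word: Splits highlighted fragments at word boundaries, as defined by the BreakIterator. You can specify the BreakIterator's locale in the boundary_scanner_locale option. - chars: Splits highlighted fragments at any character listed in boundary_chars. Valid only for the fvh highlighter. |
boundary_scanner_locale | Provides a locale for the boundary_scanner. Valid values are language tags (for example, "en-US"). Default is Locale.ROOT. |
boundary_max_scan | Controls how far to scan for boundary characters when the boundary_scanner parameter for the fvh highlighter is set to chars. Default is 20. |
encoder | Specifies whether the highlighted fragment should be HTML encoded before it is returned. Valid values are default (no encoding) or html (first escape the HTML text and then insert the highlighting tags). For example, if the field text is <h3>Hamlet</h3> and the encoder is set to html, the highlighted text is "&lt;h3&gt;<em>Hamlet</em>&lt;/h3&gt;". |
fragmenter | Specifies how to split text into highlighted fragments. Valid only for the plain highlighter. Valid values are the following: - span (default): Splits text into fragments of the same size but tries not to split text between highlighted terms. - simple: Splits text into fragments of the same size. |
fragment_offset | Specifies the character offset from which you want to start highlighting. Valid only for the fvh highlighter. |
fragment_size | The size of a highlighted fragment, specified as the number of characters. If number_of_fragments is set to 0, fragment_size is ignored. Default is 100. |
number_of_fragments | The maximum number of returned fragments. If number_of_fragments is set to 0, OpenSearch returns the highlighted contents of the entire field. Default is 5. |
order | The sort order for the highlighted fragments. Set order to score to sort fragments by relevance. Each highlighter uses a different algorithm to compute relevance scores. Default is none. |
highlight_query | Specifies that matches for a query other than the search query should be highlighted. The highlight_query option is useful when you use a faster query to get document matches and a slower query (for example, rescore_query) to refine the results. We recommend including the search query as part of the highlight_query. |
matched_fields | Combines matches from different fields to highlight one field. The most common use case for this functionality is highlighting text that is analyzed in different ways and kept in multi-fields. If you use fvh, all fields in the matched_fields list must have the term_vector mapping parameter set to with_positions_offsets. The field in which the matches are combined is the only loaded field, so it is beneficial to set its store option to yes. Valid only for the fvh and unified highlighters. |
no_match_size | Specifies the number of characters, starting from the beginning of the field, to return if there are no matching fragments to highlight. Default is 0. |
phrase_limit | The number of matching phrases in a document that are considered. Limits the number of phrases the fvh highlighter analyzes in order to avoid consuming a lot of memory. If matched_fields is used, phrase_limit specifies the number of phrases for each matched field. A higher phrase_limit increases query time and memory consumption. Valid only for the fvh highlighter. Default is 256. |
max_analyzer_offset | Specifies the maximum number of characters to be analyzed by a highlight request. The remaining text is not processed. If the text to be highlighted exceeds this offset, an empty highlight is returned. The maximum number of characters that a highlight request analyzes is defined by index.highlight.max_analyzed_offset. When this limit is reached, an error is returned. Set max_analyzer_offset to a value lower than index.highlight.max_analyzed_offset to avoid the error. |
options | A global object containing highlighter-specific options. |
options.model_id | The ID of the deployed ML model to use for highlighting. Required. Valid only for the semantic highlighter. |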
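The unified highlighter's sentence scanner splits sentences larger than fragment_size at the first word boundary after fragment_size is reached. To return whole sentences without splitting them, set fragment_size to 0.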
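The rule that field-level settings override global settings can be pictured as a dictionary merge. The following Python sketch computes the options that would apply to one field, assuming the fields option is given as an object keyed by field name. The function name effective_options and the option values are illustrative. OpenSearch performs this resolution internally, so the code only models the documented rule.

```python
def effective_options(highlight: dict, field: str) -> dict:
    """Merge global highlight options with one field's overrides."""
    # Treat every top-level key except "fields" as a global option.
    merged = {key: value for key, value in highlight.items() if key != "fields"}
    # Field-level settings win over global settings with the same name.
    merged.update(highlight.get("fields", {}).get(field, {}))
    return merged


highlight = {
    "type": "unified",
    "number_of_fragments": 3,
    "fields": {
        "text_entry": {"type": "plain", "fragment_size": 50},
        "play_name": {},
    },
}

print(effective_options(highlight, "text_entry"))
print(effective_options(highlight, "play_name"))
```

In this sketch, text_entry is highlighted with the plain highlighter while play_name falls back to the global unified setting.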
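Changing the highlighting tags
Design your application code to parse the results from the highlight object and perform an action on the search terms, such as changing their color, bolding, or italicizing them.
To change the default em tags, specify the new tags in the pre_tags and post_tags parameters: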
GET shakespeare/_search
{
"query": {
"match": {
"play_name": "Henry IV"
}
},
"size": 3,
"highlight": {
"pre_tags": [
"<strong>"
],
"post_tags": [
"</strong>"
],
"fields": {
"play_name": {}
}
}
}
The play name is highlighted by the new tags in the response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3205,
"relation" : "eq"
},
"max_score" : 3.548232,
"hits" : [
{
"_index" : "shakespeare",
"_id" : "0",
"_score" : 3.548232,
"_source" : {
"type" : "act",
"line_id" : 1,
"play_name" : "Henry IV",
"speech_number" : "",
"line_number" : "",
"speaker" : "",
"text_entry" : "ACT I"
},
"highlight" : {
"play_name" : [
"<strong>Henry IV</strong>"
]
}
},
{
"_index" : "shakespeare",
"_id" : "1",
"_score" : 3.548232,
"_source" : {
"type" : "scene",
"line_id" : 2,
"play_name" : "Henry IV",
"speech_number" : "",
"line_number" : "",
"speaker" : "",
"text_entry" : "SCENE I. London. The palace."
},
"highlight" : {
"play_name" : [
"<strong>Henry IV</strong>"
]
}
},
{
"_index" : "shakespeare",
"_id" : "2",
"_score" : 3.548232,
"_source" : {
"type" : "line",
"line_id" : 3,
"play_name" : "Henry IV",
"speech_number" : "",
"line_number" : "",
"speaker" : "",
"text_entry" : "Enter KING HENRY, LORD JOHN OF LANCASTER, the EARL of WESTMORELAND, SIR WALTER BLUNT, and others"
},
"highlight" : {
"play_name" : [
"<strong>Henry IV</strong>"
]
}
}
]
}
}
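Application code that does not render HTML can translate the tags into its own markup after the response arrives. The following is a minimal Python sketch that swaps the tags in one fragment for ANSI bold escape codes. The function name restyle is illustrative, and the plain string replacement assumes that the tag strings do not also occur in the field text itself.

```python
def restyle(fragment: str, pre_tag: str, post_tag: str,
            new_pre: str, new_post: str) -> str:
    """Replace the highlighting tags in a fragment with other markers."""
    return fragment.replace(pre_tag, new_pre).replace(post_tag, new_post)


# ANSI escape codes that switch bold text on and off in most terminals.
BOLD, RESET = "\033[1m", "\033[0m"

fragment = "<strong>Henry IV</strong>"
print(restyle(fragment, "<strong>", "</strong>", BOLD, RESET))
```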
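Specifying a highlight query
By default, OpenSearch considers only the search query for highlighting. If you use a fast query to get document matches and a slower query such as rescore_query to refine the results, it is useful to highlight the refined results. You can do this by adding a highlight_query: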
GET shakespeare/_search
{
"query": {
"match": {
"text_entry": {
"query": "thats my name"
}
}
},
"rescore": {
"window_size": 20,
"query": {
"rescore_query": {
"match_phrase": {
"text_entry": {
"query": "thats my name",
"slop": 1
}
}
},
"rescore_query_weight": 5
}
},
"_source": false,
"highlight": {
"order": "score",
"fields": {
"text_entry": {
"highlight_query": {
"bool": {
"must": {
"match": {
"text_entry": {
"query": "thats my name"
}
}
},
"should": {
"match_phrase": {
"text_entry": {
"query": "that is my name",
"slop": 1,
"boost": 10.0
}
}
},
"minimum_should_match": 0
}
}
}
}
}
}
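The highlight_query in this request follows a reusable pattern: the search terms go into a required match clause, and a boosted match_phrase clause is added as optional. The following Python sketch builds the same structure from parameters. The function name phrase_boosted_highlight_query is illustrative, and the default slop and boost simply repeat the values used in the request above.

```python
def phrase_boosted_highlight_query(field: str, terms: str, phrase: str,
                                   slop: int = 1, boost: float = 10.0) -> dict:
    """Build a bool query that requires the terms and boosts the phrase."""
    return {
        "bool": {
            # The original search terms must match.
            "must": {"match": {field: {"query": terms}}},
            # The phrase is optional but weighted more heavily when present.
            "should": {
                "match_phrase": {
                    field: {"query": phrase, "slop": slop, "boost": boost}
                }
            },
            "minimum_should_match": 0,
        }
    }


highlight_query = phrase_boosted_highlight_query(
    "text_entry", "thats my name", "that is my name"
)
print(highlight_query)
```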
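Combining matches from different fields to highlight one field
You can combine matches from different fields to highlight one field using the fvh highlighter. The most common use case for this functionality is highlighting text that is analyzed in different ways and kept in multi-fields. All fields in the matched_fields list must have the term_vector mapping parameter set to with_positions_offsets. The field in which the matches are combined is the only loaded field, so it is beneficial to set its store option to yes.
Example
Create a mapping for the shakespeare index in which the text_entry field is analyzed with the standard analyzer and has an english subfield that is analyzed with the english analyzer: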
PUT shakespeare
{
"mappings" : {
"properties" : {
"text_entry" : {
"type" : "text",
"term_vector": "with_positions_offsets",
"fields": {
"english": {
"type": "text",
"analyzer": "english",
"term_vector": "with_positions_offsets"
}
}
}
}
}
}
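Before sending an fvh request with matched_fields, it can help to confirm that every listed field stores term vectors with positions and offsets. The following Python sketch checks this against the properties section of a mapping. The function names find_field and fields_missing_term_vectors are illustrative, and the sketch assumes the mapping has already been loaded into a dictionary.

```python
from typing import List

REQUIRED = "with_positions_offsets"


def find_field(properties: dict, path: str) -> dict:
    """Resolve a dotted path such as text_entry.english to its mapping."""
    head, *rest = path.split(".")
    mapping = properties.get(head, {})
    for name in rest:
        # Multi-fields live under "fields"; object subfields under "properties".
        mapping = (mapping.get("fields", {}).get(name)
                   or mapping.get("properties", {}).get(name, {}))
    return mapping


def fields_missing_term_vectors(properties: dict,
                                matched_fields: List[str]) -> List[str]:
    """List the matched fields that lack the required term_vector setting."""
    return [path for path in matched_fields
            if find_field(properties, path).get("term_vector") != REQUIRED]


# The properties section of the mapping shown above.
shakespeare_properties = {
    "text_entry": {
        "type": "text",
        "term_vector": "with_positions_offsets",
        "fields": {
            "english": {
                "type": "text",
                "analyzer": "english",
                "term_vector": "with_positions_offsets",
            }
        },
    }
}

print(fields_missing_term_vectors(
    shakespeare_properties, ["text_entry", "text_entry.english"]))
```

An empty list means that every field in matched_fields satisfies the fvh requirement.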
The standard analyzer splits the text_entry field into individual words. You can confirm this by using the analyze API operation:
GET shakespeare/_analyze
{
"text": "bragging of thine",
"field": "text_entry"
}
The response contains the original string split on white space:
{
"tokens" : [
{
"token" : "bragging",
"start_offset" : 0,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "of",
"start_offset" : 9,
"end_offset" : 11,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "thine",
"start_offset" : 12,
"end_offset" : 17,
"type" : "<ALPHANUM>",
"position" : 2
}
]
}
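The english analyzer not only splits the string into words but also stems the tokens and removes stop words. You can confirm this by using the analyze API operation with the text_entry.english field: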
GET shakespeare/_analyze
{
"text": "bragging of thine",
"field": "text_entry.english"
}
The response contains the stemmed words, with the stop word "of" removed:
{
"tokens" : [
{
"token" : "brag",
"start_offset" : 0,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "thine",
"start_offset" : 12,
"end_offset" : 17,
"type" : "<ALPHANUM>",
"position" : 2
}
]
}
To search for all forms of the word bragging, use the following query:
GET shakespeare/_search
{
"query": {
"query_string": {
"query": "text_entry.english:bragging",
"fields": [
"text_entry"
]
}
},
"highlight": {
"order": "score",
"fields": {
"text_entry": {
"matched_fields": [
"text_entry",
"text_entry.english"
],
"type": "fvh"
}
}
}
}
The response highlights all versions of the word "bragging" in the text_entry field:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 26,
"relation" : "eq"
},
"max_score" : 10.153671,
"hits" : [
{
"_index" : "shakespeare",
"_id" : "56666",
"_score" : 10.153671,
"_source" : {
"type" : "line",
"line_id" : 56667,
"play_name" : "macbeth",
"speech_number" : 34,
"line_number" : "2.3.118",
"speaker" : "MACBETH",
"text_entry" : "Is left this vault to brag of."
},
"highlight" : {
"text_entry" : [
"Is left this vault to <em>brag</em> of."
]
}
},
{
"_index" : "shakespeare",
"_id" : "71445",
"_score" : 9.284528,
"_source" : {
"type" : "line",
"line_id" : 71446,
"play_name" : "Much Ado about nothing",
"speech_number" : 18,
"line_number" : "5.1.65",
"speaker" : "LEONATO",
"text_entry" : "As under privilege of age to brag"
},
"highlight" : {
"text_entry" : [
"As under privilege of age to <em>brag</em>"
]
}
},
{
"_index" : "shakespeare",
"_id" : "86782",
"_score" : 9.284528,
"_source" : {
"type" : "line",
"line_id" : 86783,
"play_name" : "Romeo and Juliet",
"speech_number" : 8,
"line_number" : "2.6.31",
"speaker" : "JULIET",
"text_entry" : "Brags of his substance, not of ornament:"
},
"highlight" : {
"text_entry" : [
"<em>Brags</em> of his substance, not of ornament:"
]
}
},
{
"_index" : "shakespeare",
"_id" : "44531",
"_score" : 8.552448,
"_source" : {
"type" : "line",
"line_id" : 44532,
"play_name" : "King John",
"speech_number" : 15,
"line_number" : "3.1.124",
"speaker" : "CONSTANCE",
"text_entry" : "A ramping fool, to brag and stamp and swear"
},
"highlight" : {
"text_entry" : [
"A ramping fool, to <em>brag</em> and stamp and swear"
]
}
},
{
"_index" : "shakespeare",
"_id" : "63208",
"_score" : 8.552448,
"_source" : {
"type" : "line",
"line_id" : 63209,
"play_name" : "Merchant of Venice",
"speech_number" : 11,
"line_number" : "3.4.79",
"speaker" : "PORTIA",
"text_entry" : "A thousand raw tricks of these bragging Jacks,"
},
"highlight" : {
"text_entry" : [
"A thousand raw tricks of these <em>bragging</em> Jacks,"
]
}
},
{
"_index" : "shakespeare",
"_id" : "73026",
"_score" : 8.552448,
"_source" : {
"type" : "line",
"line_id" : 73027,
"play_name" : "Othello",
"speech_number" : 75,
"line_number" : "2.1.242",
"speaker" : "IAGO",
"text_entry" : "but for bragging and telling her fantastical lies:"
},
"highlight" : {
"text_entry" : [
"but for <em>bragging</em> and telling her fantastical lies:"
]
}
},
{
"_index" : "shakespeare",
"_id" : "85974",
"_score" : 8.552448,
"_source" : {
"type" : "line",
"line_id" : 85975,
"play_name" : "Romeo and Juliet",
"speech_number" : 20,
"line_number" : "1.5.70",
"speaker" : "CAPULET",
"text_entry" : "And, to say truth, Verona brags of him"
},
"highlight" : {
"text_entry" : [
"And, to say truth, Verona <em>brags</em> of him"
]
}
},
{
"_index" : "shakespeare",
"_id" : "96800",
"_score" : 8.552448,
"_source" : {
"type" : "line",
"line_id" : 96801,
"play_name" : "Titus Andronicus",
"speech_number" : 60,
"line_number" : "1.1.311",
"speaker" : "SATURNINUS",
"text_entry" : "Agree these deeds with that proud brag of thine,"
},
"highlight" : {
"text_entry" : [
"Agree these deeds with that proud <em>brag</em> of thine,"
]
}
},
{
"_index" : "shakespeare",
"_id" : "18189",
"_score" : 7.9273787,
"_source" : {
"type" : "line",
"line_id" : 18190,
"play_name" : "As you like it",
"speech_number" : 12,
"line_number" : "5.2.30",
"speaker" : "ROSALIND",
"text_entry" : "and Caesars thrasonical brag of I came, saw, and"
},
"highlight" : {
"text_entry" : [
"and Caesars thrasonical <em>brag</em> of I came, saw, and"
]
}
},
{
"_index" : "shakespeare",
"_id" : "32054",
"_score" : 7.9273787,
"_source" : {
"type" : "line",
"line_id" : 32055,
"play_name" : "Cymbeline",
"speech_number" : 52,
"line_number" : "5.5.211",
"speaker" : "IACHIMO",
"text_entry" : "And then a mind put int, either our brags"
},
"highlight" : {
"text_entry" : [
"And then a mind put int, either our <em>brags</em>"
]
}
}
]
}
}
To score the original form of the word "bragging" higher, you can boost the text_entry field:
GET shakespeare/_search
{
"query": {
"query_string": {
"query": "bragging",
"fields": [
"text_entry^5",
"text_entry.english"
]
}
},
"highlight": {
"order": "score",
"fields": {
"text_entry": {
"matched_fields": [
"text_entry",
"text_entry.english"
],
"type": "fvh"
}
}
}
}
The response lists documents that contain the word "bragging" first:
{
"took" : 17,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 26,
"relation" : "eq"
},
"max_score" : 49.746853,
"hits" : [
{
"_index" : "shakespeare",
"_id" : "45739",
"_score" : 49.746853,
"_source" : {
"type" : "line",
"line_id" : 45740,
"play_name" : "King John",
"speech_number" : 10,
"line_number" : "5.1.51",
"speaker" : "BASTARD",
"text_entry" : "Of bragging horror: so shall inferior eyes,"
},
"highlight" : {
"text_entry" : [
"Of <em>bragging</em> horror: so shall inferior eyes,"
]
}
},
{
"_index" : "shakespeare",
"_id" : "63208",
"_score" : 47.077244,
"_source" : {
"type" : "line",
"line_id" : 63209,
"play_name" : "Merchant of Venice",
"speech_number" : 11,
"line_number" : "3.4.79",
"speaker" : "PORTIA",
"text_entry" : "A thousand raw tricks of these bragging Jacks,"
},
"highlight" : {
"text_entry" : [
"A thousand raw tricks of these <em>bragging</em> Jacks,"
]
}
},
{
"_index" : "shakespeare",
"_id" : "68474",
"_score" : 47.077244,
"_source" : {
"type" : "line",
"line_id" : 68475,
"play_name" : "A Midsummer nights dream",
"speech_number" : 101,
"line_number" : "3.2.427",
"speaker" : "PUCK",
"text_entry" : "Thou coward, art thou bragging to the stars,"
},
"highlight" : {
"text_entry" : [
"Thou coward, art thou <em>bragging</em> to the stars,"
]
}
},
{
"_index" : "shakespeare",
"_id" : "73026",
"_score" : 47.077244,
"_source" : {
"type" : "line",
"line_id" : 73027,
"play_name" : "Othello",
"speech_number" : 75,
"line_number" : "2.1.242",
"speaker" : "IAGO",
"text_entry" : "but for bragging and telling her fantastical lies:"
},
"highlight" : {
"text_entry" : [
"but for <em>bragging</em> and telling her fantastical lies:"
]
}
},
{
"_index" : "shakespeare",
"_id" : "39816",
"_score" : 44.679565,
"_source" : {
"type" : "line",
"line_id" : 39817,
"play_name" : "Henry V",
"speech_number" : 28,
"line_number" : "5.2.138",
"speaker" : "KING HENRY V",
"text_entry" : "armour on my back, under the correction of bragging"
},
"highlight" : {
"text_entry" : [
"armour on my back, under the correction of <em>bragging</em>"
]
}
},
{
"_index" : "shakespeare",
"_id" : "63200",
"_score" : 44.679565,
"_source" : {
"type" : "line",
"line_id" : 63201,
"play_name" : "Merchant of Venice",
"speech_number" : 11,
"line_number" : "3.4.71",
"speaker" : "PORTIA",
"text_entry" : "Like a fine bragging youth, and tell quaint lies,"
},
"highlight" : {
"text_entry" : [
"Like a fine <em>bragging</em> youth, and tell quaint lies,"
]
}
},
{
"_index" : "shakespeare",
"_id" : "56666",
"_score" : 10.153671,
"_source" : {
"type" : "line",
"line_id" : 56667,
"play_name" : "macbeth",
"speech_number" : 34,
"line_number" : "2.3.118",
"speaker" : "MACBETH",
"text_entry" : "Is left this vault to brag of."
},
"highlight" : {
"text_entry" : [
"Is left this vault to <em>brag</em> of."
]
}
},
{
"_index" : "shakespeare",
"_id" : "71445",
"_score" : 9.284528,
"_source" : {
"type" : "line",
"line_id" : 71446,
"play_name" : "Much Ado about nothing",
"speech_number" : 18,
"line_number" : "5.1.65",
"speaker" : "LEONATO",
"text_entry" : "As under privilege of age to brag"
},
"highlight" : {
"text_entry" : [
"As under privilege of age to <em>brag</em>"
]
}
},
{
"_index" : "shakespeare",
"_id" : "86782",
"_score" : 9.284528,
"_source" : {
"type" : "line",
"line_id" : 86783,
"play_name" : "Romeo and Juliet",
"speech_number" : 8,
"line_number" : "2.6.31",
"speaker" : "JULIET",
"text_entry" : "Brags of his substance, not of ornament:"
},
"highlight" : {
"text_entry" : [
"<em>Brags</em> of his substance, not of ornament:"
]
}
},
{
"_index" : "shakespeare",
"_id" : "44531",
"_score" : 8.552448,
"_source" : {
"type" : "line",
"line_id" : 44532,
"play_name" : "King John",
"speech_number" : 15,
"line_number" : "3.1.124",
"speaker" : "CONSTANCE",
"text_entry" : "A ramping fool, to brag and stamp and swear"
},
"highlight" : {
"text_entry" : [
"A ramping fool, to <em>brag</em> and stamp and swear"
]
}
}
]
}
}
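Using the semantic highlighter
The semantic highlighter uses the specified ML model to find passages in the text that are semantically relevant to the search query, even if there are no exact keyword matches. Highlighting occurs at the sentence level.
To use the semantic highlighter, set the type to semantic in the fields object and provide the model_id of the deployed sentence transformer or question-answering model in the global highlight.options object.
The following example uses a neural query to find documents related to "treatments for neurodegenerative diseases" and then applies semantic highlighting using the model specified in highlight.options.model_id: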
POST neural-search-index/_search
{
"_source": {
"excludes": ["text_embedding"]
},
"query": {
"neural": {
"text_embedding": {
"query_text": "treatments for neurodegenerative diseases",
"model_id": "your-text-embedding-model-id",
"k": 5
}
}
},
"highlight": {
"fields": {
"text": {
"type": "semantic"
}
},
"options": {
"model_id": "your-sentence-model-id"
}
}
}
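The response includes a highlight object for each hit, in which the most semantically relevant sentence is emphasized using em tags. Note that the model IDs in the request are placeholders: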
{
"took": 628,
"timed_out": false,
"_shards": { ... },
"hits": {
"total": { "value": 5, "relation": "eq" },
"max_score": 0.4841726,
"hits": [
{
"_index": "neural-search-index",
"_id": "srL7G5YBmDiZSe-G2pDc",
"_score": 0.4841726,
"_source": {
"text": "Alzheimer's disease is a progressive neurodegenerative disorder characterized by accumulation of amyloid-beta plaques and neurofibrillary tangles in the brain. Early symptoms include short-term memory impairment, followed by language difficulties, disorientation, and behavioral changes. While traditional treatments such as cholinesterase inhibitors and memantine provide modest symptomatic relief, they do not alter disease progression. Recent clinical trials investigating monoclonal antibodies targeting amyloid-beta, including aducanumab, lecanemab, and donanemab, have shown promise in reducing plaque burden and slowing cognitive decline. Early diagnosis using biomarkers such as cerebrospinal fluid analysis and PET imaging may facilitate timely intervention and improved outcomes."
},
"highlight": {
"text": [
"Alzheimer's disease is a progressive neurodegenerative disorder ... <em>Recent clinical trials investigating monoclonal antibodies targeting amyloid-beta, including aducanumab, lecanemab, and donanemab, have shown promise in reducing plaque burden and slowing cognitive decline.</em> Early diagnosis using biomarkers ..."
]
}
},
// ... other hits with highlighted sentences ...
]
}
}
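The highlighted fragment in the example response is truncated for brevity. The semantic highlighter returns the full sentence containing the most relevant passage.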
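Because the semantic highlighter requires a model_id in the global options object, client code can enforce that requirement before sending a request. The following Python sketch builds the highlight part of the request body. The function name semantic_highlight is illustrative, and the model ID is the same placeholder used in the example above.

```python
def semantic_highlight(field: str, model_id: str) -> dict:
    """Build a highlight object that applies the semantic highlighter to a field."""
    if not model_id:
        # The semantic highlighter cannot run without a deployed model.
        raise ValueError("the semantic highlighter requires options.model_id")
    return {
        "fields": {field: {"type": "semantic"}},
        "options": {"model_id": model_id},
    }


print(semantic_highlight("text", "your-sentence-model-id"))
```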
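Query limitations
Note the following limitations:
- When extracting terms to highlight, highlighters don't reflect the Boolean logic of a query. Therefore, for some complex Boolean queries, such as nested Boolean queries and queries using minimum_should_match, OpenSearch may highlight terms that don't correspond to query matches.
- The fvh highlighter does not support span queries.
- The semantic highlighter requires a deployed ML model specified by model_id in highlight.options. It does not use traditional offset methods (postings, term vectors) and instead relies solely on model inference.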