Flatten graph token filter
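The flatten_graph token filter handles the complex token relationships that arise when a graph structure produces multiple tokens at the same position. Some token filters, such as synonym_graph and word_delimiter_graph, generate multi-position tokens, that is, tokens that overlap or span multiple positions. These token graphs are useful for search queries but are not directly supported during indexing. The flatten_graph token filter resolves multi-position tokens into a linear sequence of tokens. Flattening the graph ensures compatibility with the indexing process.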
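Token graph flattening is a lossy process. Avoid using the flatten_graph filter whenever possible. Instead, apply graph token filters only in search analyzers, which removes the need for the flatten_graph filter.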
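As a rough sketch of the search-analyzer-only approach described above, the following request applies a graph token filter at search time and indexes the field with the built-in standard analyzer, so no flatten_graph filter is required. The index name search_only_index, the field name title, the analyzer name my_search_analyzer, the filter name my_synonym_graph, and the synonym rule are all hypothetical and chosen only for illustration.

```json
PUT /search_only_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_search_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_graph"
          ]
        }
      },
      "filter": {
        "my_synonym_graph": {
          "type": "synonym_graph",
          "synonyms": [
            "laptop, notebook computer"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "my_search_analyzer"
      }
    }
  }
}
```

In this arrangement the token graph is only built when a query is analyzed, so the index-time token stream stays linear and nothing has to be flattened.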
Example
The following example request creates a new index named test_index and configures an analyzer with a flatten_graph filter:
PUT /test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_index_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "my_custom_filter",
            "flatten_graph"
          ]
        }
      },
      "filter": {
        "my_custom_filter": {
          "type": "word_delimiter_graph",
          "catenate_all": true
        }
      }
    }
  }
}
Generated tokens
Use the following request to examine the tokens generated using the analyzer:
POST /test_index/_analyze
{
  "analyzer": "my_index_analyzer",
  "text": "OpenSearch helped many employers"
}
The response contains the generated tokens:
{
  "tokens": [
    {
      "token": "OpenSearch",
      "start_offset": 0,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 0,
      "positionLength": 2
    },
    {
      "token": "Open",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "Search",
      "start_offset": 4,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "helped",
      "start_offset": 11,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "many",
      "start_offset": 18,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "employers",
      "start_offset": 23,
      "end_offset": 32,
      "type": "<ALPHANUM>",
      "position": 4
    }
  ]
}
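To get a feel for what the flatten_graph filter changes, one option is to analyze the same text with the graph filter alone and compare the two responses. The request below is a sketch that uses the analyze API with an inline filter definition; it assumes the inline word_delimiter_graph settings mirror my_custom_filter from the index above and deliberately omits flatten_graph.

```json
POST /_analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "word_delimiter_graph",
      "catenate_all": true
    }
  ],
  "text": "OpenSearch helped many employers"
}
```

Comparing the position and positionLength values in this response with those returned by my_index_analyzer should show where, if anywhere, flattening altered the token stream for this input.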