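Classic token filter
The classic token filter is designed to work together with the classic tokenizer. It applies the following common transformations to tokens, which help with text analysis and search: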
- Removes possessive endings such as 's. For example, John's becomes John.
- Removes periods from acronyms. For example, D.A.R.P.A. becomes DARPA.
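The two transformations can be approximated in a few lines of Python. This is only an illustrative sketch, not the filter's actual implementation: the function name classic_filter is hypothetical, and the sketch assumes that the classic tokenizer marks possessives with the type <APOSTROPHE> and acronyms with the type <ACRONYM>, and that the filter acts only on tokens of those types.

```python
def classic_filter(token: str, token_type: str) -> str:
    """Approximate the classic token filter for a single token.

    Assumption: token_type is the type string assigned by the classic
    tokenizer, and only <APOSTROPHE> and <ACRONYM> tokens are modified.
    """
    # Strip a trailing possessive ('s or 'S) from apostrophe-typed tokens.
    if token_type == "<APOSTROPHE>" and token[-2:] in ("'s", "'S"):
        return token[:-2]
    # Drop every period from acronym-typed tokens.
    if token_type == "<ACRONYM>":
        return token.replace(".", "")
    # All other tokens pass through unchanged.
    return token


print(classic_filter("John's", "<APOSTROPHE>"))   # John
print(classic_filter("D.A.R.P.A.", "<ACRONYM>"))  # DARPA
print(classic_filter("excellent", "<ALPHANUM>"))  # excellent
```

Because the sketch keys off the token type, it also suggests why the filter is normally paired with the classic tokenizer: a tokenizer that does not assign these types would leave the filter with nothing to act on.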
Example
The following example request creates a new index named custom_classic_filter and configures an analyzer that uses the classic filter:
PUT /custom_classic_filter
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_classic": {
          "type": "custom",
          "tokenizer": "classic",
          "filter": ["classic"]
        }
      }
    }
  }
}
Generated tokens
Use the following request to examine the tokens generated by the analyzer:
POST /custom_classic_filter/_analyze
{
  "analyzer": "custom_classic",
  "text": "John's co-operate was excellent."
}
The response contains the generated tokens:
{
  "tokens": [
    {
      "token": "John",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<APOSTROPHE>",
      "position": 0
    },
    {
      "token": "co",
      "start_offset": 7,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "operate",
      "start_offset": 10,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "was",
      "start_offset": 18,
      "end_offset": 21,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "excellent",
      "start_offset": 22,
      "end_offset": 31,
      "type": "<ALPHANUM>",
      "position": 4
    }
  ]
}
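Note that the first token is John, yet its offsets (0 to 6) still cover the full source text John's: the filter changes the token text but leaves the offsets pointing at the original input. The short Python sketch below makes this visible by slicing the input text with each token's offsets. The token list is copied by hand from the response above, with the type and position fields omitted for brevity; the variable names are illustrative only.

```python
text = "John's co-operate was excellent."

# Tokens copied from the _analyze response (type and position omitted).
tokens = [
    {"token": "John", "start_offset": 0, "end_offset": 6},
    {"token": "co", "start_offset": 7, "end_offset": 9},
    {"token": "operate", "start_offset": 10, "end_offset": 17},
    {"token": "was", "start_offset": 18, "end_offset": 21},
    {"token": "excellent", "start_offset": 22, "end_offset": 31},
]

# Pair each slice of the original text with the token emitted for it.
pairs = [(text[t["start_offset"]:t["end_offset"]], t["token"]) for t in tokens]

for source, term in pairs:
    print(f"{source!r} -> {term!r}")
```

Only the first pair differs between source text and token, which matches the single token in the response whose type is <APOSTROPHE>.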