Whitespace analyzer
The whitespace (`whitespace`) analyzer breaks text into tokens only at whitespace characters (for example, spaces and tabs). It applies no transformations such as lowercasing or stop-word removal, so the original case of the text is preserved and punctuation remains part of the tokens.
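To illustrate this behavior, here is a minimal Python sketch of whitespace tokenization (an approximation for illustration only, not the analyzer's actual implementation): text is split on runs of whitespace, with no other changes.

```python
import re

def whitespace_tokenize(text):
    # Split on runs of whitespace; case is untouched and
    # punctuation stays attached to its token.
    return re.findall(r"\S+", text)

tokens = whitespace_tokenize("The QUICK brown fox!")
# "The" and "QUICK" keep their case; "fox!" keeps its punctuation.
```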
Example
Use the following command to create an index named `my_whitespace_index` that uses the `whitespace` analyzer:
PUT /my_whitespace_index
{
"mappings": {
"properties": {
"my_field": {
"type": "text",
"analyzer": "whitespace"
}
}
}
}
Configuring a custom analyzer
Use the following command to create an index with a custom analyzer equivalent to the `whitespace` analyzer with a `lowercase` token filter added:
PUT /my_custom_whitespace_index
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_whitespace_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"properties": {
"my_field": {
"type": "text",
"analyzer": "my_custom_whitespace_analyzer"
}
}
}
}
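Conceptually, a custom analyzer is a tokenizer followed by a chain of token filters. The following Python sketch (a simplified illustration, not the engine's actual pipeline) mimics the configuration above: whitespace tokenization followed by a lowercase filter.

```python
import re

def analyze(text, token_filters=()):
    # Tokenizer step: split on whitespace.
    tokens = re.findall(r"\S+", text)
    # Filter chain: apply each token filter to every token in order.
    for token_filter in token_filters:
        tokens = [token_filter(token) for token in tokens]
    return tokens

# Equivalent of the custom analyzer: whitespace tokenizer + lowercase filter.
analyze("The SLOW turtle", token_filters=[str.lower])
```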
Generated tokens
Use the following request to examine the tokens generated by the analyzer:
POST /my_custom_whitespace_index/_analyze
{
"analyzer": "my_custom_whitespace_analyzer",
"text": "The SLOW turtle swims away! 123"
}
The response contains the generated tokens:
{
"tokens": [
{"token": "the","start_offset": 0,"end_offset": 3,"type": "word","position": 0},
{"token": "slow","start_offset": 4,"end_offset": 8,"type": "word","position": 1},
{"token": "turtle","start_offset": 9,"end_offset": 15,"type": "word","position": 2},
{"token": "swims","start_offset": 16,"end_offset": 21,"type": "word","position": 3},
{"token": "away!","start_offset": 22,"end_offset": 27,"type": "word","position": 4},
{"token": "123","start_offset": 28,"end_offset": 31,"type": "word","position": 5}
]
}
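The `start_offset` and `end_offset` values in the response refer to character positions in the original input, while `position` is the token's ordinal index. A small Python sketch (an approximation for illustration, not the analyzer itself) reproduces these fields for the example text:

```python
import re

def analyze_with_offsets(text):
    # Emulate the custom analyzer's output: whitespace tokens,
    # lowercased, with character offsets and ordinal positions.
    tokens = []
    for position, match in enumerate(re.finditer(r"\S+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "type": "word",
            "position": position,
        })
    return tokens

result = analyze_with_offsets("The SLOW turtle swims away! 123")
```

Note that `away!` spans offsets 22 to 27 because the punctuation is part of the token, matching the response above.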