
Trim token filter

The trim token filter removes leading and trailing whitespace characters from tokens.
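As a rough illustration of the per-token effect, the following Python sketch applies `str.strip()` to each token. It is not OpenSearch code, and the function name `trim_tokens` is hypothetical; Python's notion of whitespace is assumed to be close to, but not necessarily identical to, the one OpenSearch uses.

```python
def trim_tokens(tokens):
    """Strip leading and trailing whitespace from each token (hypothetical helper).

    Assumption: Python's str.strip() whitespace set approximates the one
    OpenSearch uses; the two may differ for some Unicode characters.
    """
    return [token.strip() for token in tokens]


print(trim_tokens([" OpenSearch ", "  is ", "powerful\t"]))
# ['OpenSearch', 'is', 'powerful']
```

Inner whitespace is left untouched; only the characters at the start and end of each token are removed.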

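Many popular tokenizers, such as the standard, keyword, and whitespace tokenizers, automatically strip leading and trailing whitespace characters during tokenization. When using these tokenizers, you don't need to configure an additional trim token filter.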
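The difference is easiest to see by comparing two ways of splitting the same text. In this Python sketch, `str.split()` with no argument roughly stands in for whitespace-based tokenization, and `str.split(",")` stands in for splitting on a comma only; neither is the actual OpenSearch tokenizer implementation.

```python
text = " OpenSearch ,  is ,   powerful  "

# Splitting on runs of whitespace (roughly what a whitespace tokenizer does)
# never leaves spaces at the edges of a token.
whitespace_tokens = text.split()

# Splitting only on commas keeps the spaces that surround each comma.
comma_tokens = text.split(",")

print(whitespace_tokens)  # ['OpenSearch', ',', 'is', ',', 'powerful']
print(comma_tokens)       # [' OpenSearch ', '  is ', '   powerful  ']
```

Only the comma-based split produces tokens that would benefit from a trim filter.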

Example

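The following example request creates a new index named my_pattern_trim_index and configures an analyzer with a trim filter and a pattern tokenizer. The pattern tokenizer does not remove leading and trailing whitespace characters: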

PUT /my_pattern_trim_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_trim_filter": {
          "type": "trim"
        }
      },
      "tokenizer": {
        "my_pattern_tokenizer": {
          "type": "pattern",
          "pattern": ","
        }
      },
      "analyzer": {
        "my_pattern_trim_analyzer": {
          "type": "custom",
          "tokenizer": "my_pattern_tokenizer",
          "filter": [
            "lowercase",
            "my_trim_filter"
          ]
        }
      }
    }
  }
}
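To see what this configuration does to a string, the following Python sketch simulates the analyzer chain: split on the regular expression `,`, then apply the filters in the order listed (`lowercase`, then `my_trim_filter`). The function name is hypothetical, and the sketch assumes that, like the pattern tokenizer in splitting mode, zero-length pieces between matches are dropped.

```python
import re


def my_pattern_trim_analyzer(text):
    """Hypothetical Python simulation of the custom analyzer defined above."""
    # Tokenizer: split on the "pattern" setting (","), skipping zero-length pieces.
    tokens = [piece for piece in re.split(",", text) if piece]
    # Filter 1: lowercase.
    tokens = [token.lower() for token in tokens]
    # Filter 2: trim leading and trailing whitespace.
    tokens = [token.strip() for token in tokens]
    return tokens


print(my_pattern_trim_analyzer(" OpenSearch ,  is ,   powerful  "))
# ['opensearch', 'is', 'powerful']
```

For this input, running lowercase before trim or after it gives the same result; the order simply mirrors the "filter" array in the index settings.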

Generated tokens

Use the following request to examine the tokens generated using the analyzer:

GET /my_pattern_trim_index/_analyze
{
  "analyzer": "my_pattern_trim_analyzer",
  "text": " OpenSearch ,  is ,   powerful  "
}
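If you prefer to send the same request from a script, the following Python sketch builds it with the standard library only. The base URL `http://localhost:9200` (no TLS, no authentication) and the helper name `build_analyze_request` are assumptions for illustration; adjust them for your cluster.

```python
import json
import urllib.request

# Hypothetical endpoint: a local cluster without TLS or authentication.
BASE_URL = "http://localhost:9200"


def build_analyze_request(index, analyzer, text):
    """Build (but do not send) a GET <index>/_analyze request with a JSON body."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


req = build_analyze_request(
    "my_pattern_trim_index", "my_pattern_trim_analyzer", " OpenSearch ,  is ,   powerful  "
)
# Sending it requires a running cluster:
# with urllib.request.urlopen(req) as resp:
#     tokens = json.load(resp)["tokens"]
```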

The response contains the generated tokens:

{
  "tokens": [
    {
      "token": "opensearch",
      "start_offset": 0,
      "end_offset": 12,
      "type": "word",
      "position": 0
    },
    {
      "token": "is",
      "start_offset": 13,
      "end_offset": 18,
      "type": "word",
      "position": 1
    },
    {
      "token": "powerful",
      "start_offset": 19,
      "end_offset": 32,
      "type": "word",
      "position": 2
    }
  ]
}
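Note that start_offset and end_offset in the response still span the surrounding whitespace (for example, 0 to 12 for opensearch). The offsets come from the tokenizer; the lowercase and trim filters change only the token text. The following Python sketch reproduces these offsets by splitting on `,` and recording where each piece starts and ends. It is a hypothetical simulation, not the OpenSearch implementation.

```python
import re


def pattern_tokens_with_offsets(text, pattern=","):
    """Return (raw_token, start_offset, end_offset) for the pieces between matches.

    Hypothetical simulation of a pattern tokenizer in splitting mode:
    zero-length pieces are skipped, and offsets refer to the original text.
    """
    results = []
    index = 0
    for match in re.finditer(pattern, text):
        if match.start() > index:
            results.append((text[index:match.start()], index, match.start()))
        index = match.end()
    if index < len(text):
        results.append((text[index:], index, len(text)))
    return results


text = " OpenSearch ,  is ,   powerful  "
for raw, start, end in pattern_tokens_with_offsets(text):
    # lowercase + trim change the token text, but the offsets are unchanged.
    print(raw.lower().strip(), start, end)
# opensearch 0 12
# is 13 18
# powerful 19 32
```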