
Minimum should match

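The minimum_should_match parameter can be used for full-text search and specifies the minimum number of terms a document must match to be returned in search results.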

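The following example requires a document to match at least two out of three search terms in order to be returned as a search result: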

GET /shakespeare/_search
{
  "query": {
    "match": {
      "text_entry": {
        "query": "prince king star",
        "minimum_should_match": "2"
      }
    }
  }
}

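In this example, the query has three optional clauses that are combined with an OR, so the document must match either prince and king, or prince and star, or king and star.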
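The counting rule behind this example can be sketched in Python. This is a minimal illustration, not how OpenSearch evaluates the query: the helper name matches_minimum is hypothetical, the sample sentences are made up, and lowercasing plus whitespace splitting stands in for real text analysis.

```python
def matches_minimum(text: str, query: str, minimum: int) -> bool:
    """Return True if `text` contains at least `minimum` distinct query terms."""
    # Assumption: naive tokenization (lowercase + whitespace split).
    doc_terms = set(text.lower().split())
    query_terms = set(query.lower().split())
    return len(doc_terms & query_terms) >= minimum


# Hypothetical sample sentences, checked against the query "prince king star".
print(matches_minimum("the king and the prince", "prince king star", 2))  # True
print(matches_minimum("a bright star", "prince king star", 2))  # False
```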

Valid values

You can specify the minimum_should_match parameter as one of the following values.

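| Value type | Example | Description |
| --- | --- | --- |
| Non-negative integer | 2 | A document must match this number of optional clauses. |
| Negative integer | -1 | A document must match the total number of optional clauses minus this number. |
| Non-negative percentage | 70% | A document must match this percentage of the total number of optional clauses. The number of clauses to match is rounded down to the nearest integer. |
| Negative percentage | -30% | A document can have this percentage of the total number of optional clauses that do not match. The number of clauses a document is allowed to not match is rounded down to the nearest integer. |
| Combination | 2<75% | An expression in the n<p% format. If the number of optional clauses is less than or equal to n, the document must match all optional clauses. If the number of optional clauses is greater than n, the document must match p percent of the optional clauses. |
| Multiple combinations | 3<-1 5<50% | More than one combination, separated by spaces. Each condition applies when the number of optional clauses is greater than the number to the left of the < sign. In this example, if there are three or fewer optional clauses, the document must match all of them. If there are four or five optional clauses, the document must match all but one of them. If there are six or more optional clauses, the document must match 50% of them. |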

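Let n be the number of optional clauses a document must match. When n is calculated as a percentage, if n is less than 1, then 1 is used. If n is greater than the number of optional clauses, then the number of optional clauses is used.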
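To make the value types above concrete, the following Python sketch resolves a minimum_should_match value into the number of optional clauses a document must match. It follows only the rules as stated in this section and is not the engine's implementation. The function names required_clauses and _resolve_simple are hypothetical, percentages are assumed to be whole numbers, and clamping negative results to 0 is an assumption that the text does not cover.

```python
def _resolve_simple(value: str, optional_count: int) -> int:
    """Resolve one integer or percentage value (no '<' condition)."""
    value = value.strip()
    if value.endswith("%"):
        percent = int(value[:-1])  # assumption: whole-number percentages only
        if percent < 0:
            # Negative percentage: this share of clauses may be missing (rounded down).
            allowed_missing = optional_count * -percent // 100
            required = optional_count - allowed_missing
        else:
            # Non-negative percentage: share of clauses to match (rounded down).
            required = optional_count * percent // 100
        # As stated above: a percentage result below 1 becomes 1.
        return max(required, 1)
    number = int(value)
    # Negative integer: total number of optional clauses minus that number.
    return optional_count + number if number < 0 else number


def required_clauses(spec: str, optional_count: int) -> int:
    """Return how many optional clauses must match for the given spec."""
    required = optional_count  # default when no condition applies: match all
    if "<" in spec:
        for condition in spec.split():
            threshold, value = condition.split("<", 1)
            if optional_count <= int(threshold):
                break  # this and later conditions do not apply
            required = _resolve_simple(value, optional_count)
    else:
        required = _resolve_simple(spec, optional_count)
    # Never require more clauses than exist; the lower bound of 0 is an assumption.
    return max(0, min(required, optional_count))


print(required_clauses("3<-1 5<50%", 3))  # 3
print(required_clauses("3<-1 5<50%", 5))  # 4
print(required_clauses("3<-1 5<50%", 6))  # 3
```

Integer arithmetic (multiply, then floor-divide by 100) is used instead of floating point so that the rounding down is exact.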

Using the parameter in Boolean queries

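A Boolean query lists optional clauses in the should clause and required clauses in the must clause. Optionally, it can contain a filter clause to filter the results.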

Consider an example index containing the following five documents:

PUT testindex/_doc/1
{
  "text": "one OpenSearch"
}

PUT testindex/_doc/2
{
  "text": "one two OpenSearch"
}

PUT testindex/_doc/3
{
  "text": "one two three OpenSearch"
}

PUT testindex/_doc/4
{
  "text": "one two three four OpenSearch"
}

PUT testindex/_doc/5
{
  "text": "OpenSearch"
}

The following query contains four optional clauses:

GET testindex/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "text": "OpenSearch"
          }
        }
      ], 
      "should": [
        {
          "match": {
            "text": "one"
          }
        },
        {
          "match": {
            "text": "two"
          }
        },
        {
          "match": {
            "text": "three"
          }
        },
        {
          "match": {
            "text": "four"
          }
        }
      ],
      "minimum_should_match": "80%"
    }
  }
}

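Because minimum_should_match is specified as 80%, the number of optional clauses to match is calculated as 4 · 0.8 = 3.2 and then rounded down to 3. Therefore, the results contain documents that match at least three clauses: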

{
  "took": 40,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 2.494999,
    "hits": [
      {
        "_index": "testindex",
        "_id": "4",
        "_score": 2.494999,
        "_source": {
          "text": "one two three four OpenSearch"
        }
      },
      {
        "_index": "testindex",
        "_id": "3",
        "_score": 1.5744598,
        "_source": {
          "text": "one two three OpenSearch"
        }
      }
    ]
  }
}
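The selection in this response can be reproduced with a small Python sketch over the five sample documents. It is an approximation under stated assumptions: the helper search is hypothetical, tokens are produced by lowercasing and whitespace splitting, and no relevance scores are computed, so the IDs come back in index order rather than ranked by score.

```python
DOCS = {
    "1": "one OpenSearch",
    "2": "one two OpenSearch",
    "3": "one two three OpenSearch",
    "4": "one two three four OpenSearch",
    "5": "OpenSearch",
}
SHOULD_TERMS = ["one", "two", "three", "four"]


def search(docs, must_term, should_terms, percent):
    """Return IDs of documents that satisfy the must term and the percentage."""
    # 4 * 80 // 100 = 3, which is 3.2 rounded down.
    required = len(should_terms) * percent // 100
    hits = []
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # assumption: naive tokenization
        if must_term.lower() not in tokens:
            continue
        matched = sum(term in tokens for term in should_terms)
        if matched >= required:
            hits.append(doc_id)
    return hits


print(search(DOCS, "OpenSearch", SHOULD_TERMS, 80))  # ['3', '4']
```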

Now specify minimum_should_match as -20%:

GET testindex/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "text": "OpenSearch"
          }
        }
      ], 
      "should": [
        {
          "match": {
            "text": "one"
          }
        },
        {
          "match": {
            "text": "two"
          }
        },
        {
          "match": {
            "text": "three"
          }
        },
        {
          "match": {
            "text": "four"
          }
        }
      ],
      "minimum_should_match": "-20%"
    }
  }
}

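The number of non-matching optional clauses that a document can have is calculated as 4 · 0.2 = 0.8 and rounded down to 0. Thus, the results contain only the one document that matches all optional clauses: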

{
  "took": 41,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 2.494999,
    "hits": [
      {
        "_index": "testindex",
        "_id": "4",
        "_score": 2.494999,
        "_source": {
          "text": "one two three four OpenSearch"
        }
      }
    ]
  }
}

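Note that specifying a positive percentage (80%) and a negative percentage (-20%) did not produce the same number of optional clauses a document must match because, in both cases, the result was rounded down. If the number of optional clauses were, for example, 5, then both 80% and -20% would produce the same number of optional clauses a document must match (4).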
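The rounding difference is easy to check numerically. The sketch below uses a hypothetical helper, required_from_percent, and compares 80% with -20% for four and five optional clauses using exact integer arithmetic.

```python
def required_from_percent(optional_count: int, percent: int) -> int:
    """Clauses that must match for a whole-number percentage (rounded down)."""
    if percent < 0:
        # Negative: round down the number of clauses allowed to be missing.
        return optional_count - optional_count * -percent // 100
    # Non-negative: round down the number of clauses that must match.
    return optional_count * percent // 100


for count in (4, 5):
    print(count, required_from_percent(count, 80), required_from_percent(count, -20))
# 4 3 4
# 5 4 4
```

With four clauses, the two forms round in opposite directions relative to the required count (3 versus 4). With five clauses, no rounding occurs and both give 4.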

Default minimum_should_match

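If a query contains a must or filter clause, the default minimum_should_match value is 0. For example, the following query searches for documents that match OpenSearch and 0 optional should clauses: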

GET testindex/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "text": "OpenSearch"
          }
        }
      ], 
      "should": [
        {
          "match": {
            "text": "one"
          }
        },
        {
          "match": {
            "text": "two"
          }
        },
        {
          "match": {
            "text": "three"
          }
        },
        {
          "match": {
            "text": "four"
          }
        }
      ]
    }
  }
}

This query returns all five documents in the index:

{
  "took": 34,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 2.494999,
    "hits": [
      {
        "_index": "testindex",
        "_id": "4",
        "_score": 2.494999,
        "_source": {
          "text": "one two three four OpenSearch"
        }
      },
      {
        "_index": "testindex",
        "_id": "3",
        "_score": 1.5744598,
        "_source": {
          "text": "one two three OpenSearch"
        }
      },
      {
        "_index": "testindex",
        "_id": "2",
        "_score": 0.91368985,
        "_source": {
          "text": "one two OpenSearch"
        }
      },
      {
        "_index": "testindex",
        "_id": "1",
        "_score": 0.4338556,
        "_source": {
          "text": "one OpenSearch"
        }
      },
      {
        "_index": "testindex",
        "_id": "5",
        "_score": 0.11964063,
        "_source": {
          "text": "OpenSearch"
        }
      }
    ]
  }
}

However, if you omit the must clause, then the query searches for documents that match at least one optional should clause:

GET testindex/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "text": "one"
          }
        },
        {
          "match": {
            "text": "two"
          }
        },
        {
          "match": {
            "text": "three"
          }
        },
        {
          "match": {
            "text": "four"
          }
        }
      ]
    }
  }
}

The results contain only the four documents that match at least one of the optional clauses:

{
  "took": 19,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 2.426633,
    "hits": [
      {
        "_index": "testindex",
        "_id": "4",
        "_score": 2.426633,
        "_source": {
          "text": "one two three four OpenSearch"
        }
      },
      {
        "_index": "testindex",
        "_id": "3",
        "_score": 1.4978898,
        "_source": {
          "text": "one two three OpenSearch"
        }
      },
      {
        "_index": "testindex",
        "_id": "2",
        "_score": 0.8266785,
        "_source": {
          "text": "one two OpenSearch"
        }
      },
      {
        "_index": "testindex",
        "_id": "1",
        "_score": 0.3331056,
        "_source": {
          "text": "one OpenSearch"
        }
      }
    ]
  }
}
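The default behavior described in this section can be summarized in a short Python sketch. The names default_minimum_should_match and matching_ids are hypothetical, only a single must term is modeled (no filter clause), tokenization is naive, and scoring is ignored.

```python
DOCS = {
    "1": "one OpenSearch",
    "2": "one two OpenSearch",
    "3": "one two three OpenSearch",
    "4": "one two three four OpenSearch",
    "5": "OpenSearch",
}
SHOULD_TERMS = ["one", "two", "three", "four"]


def default_minimum_should_match(has_must_or_filter: bool) -> int:
    """Default described above: 0 with a must or filter clause, otherwise 1."""
    return 0 if has_must_or_filter else 1


def matching_ids(docs, should_terms, must_term=None):
    """Return IDs of documents that match when minimum_should_match is omitted."""
    required = default_minimum_should_match(must_term is not None)
    hits = []
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # assumption: naive tokenization
        if must_term is not None and must_term.lower() not in tokens:
            continue
        if sum(term in tokens for term in should_terms) >= required:
            hits.append(doc_id)
    return hits


print(matching_ids(DOCS, SHOULD_TERMS, must_term="OpenSearch"))  # ['1', '2', '3', '4', '5']
print(matching_ids(DOCS, SHOULD_TERMS))  # ['1', '2', '3', '4']
```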