Rank feature

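Use the rank_feature query to boost document scores based on numeric values in the document, such as relevance scores, popularity, or freshness. This query is ideal when you want to fine-tune relevance ranking using numeric features. Unlike full-text queries, rank_feature considers only numeric signals, so it is most effective when combined with compound queries such as bool.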

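For the rank_feature query to work, the target field must be mapped as the rank_feature field type. This mapping enables internally optimized scoring, which makes boosting fast and efficient.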

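The score impact depends on the field value and on the optional saturation, log, or sigmoid function. These functions are applied dynamically at query time to compute the final document score. They do not change or store any values in the document itself.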

Parameters

The rank_feature query supports the following parameters.

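Parameter	Data type	Required/Optional	Description
`field`	String	Required	A `rank_feature` or `rank_features` field used to score documents.
`boost`	Float	Optional	A multiplier applied to the score. Default is `1.0`. Values between 0 and 1 reduce the score; values greater than 1 amplify it.
`saturation`	Object	Optional	Applies a saturation function to the feature value. The score grows with the value but levels off beyond the `pivot`. This is the default function if no other function is provided. Only one of `saturation`, `log`, or `sigmoid` can be used at a time.
`log`	Object	Optional	Applies a logarithmic scoring function based on the field value. Best suited to fields with a wide range of values. Only one of `saturation`, `log`, or `sigmoid` can be used at a time.
`sigmoid`	Object	Optional	Applies a sigmoid (S-shaped) curve, controlled by `pivot` and `exponent`, to shape the score. Only one of `saturation`, `log`, or `sigmoid` can be used at a time.
`positive_score_impact`	Boolean	Optional	When set to `false`, lower values score higher. Useful for features such as price, where smaller values are better. Defined in the field mapping, not in the query. Default is `true`.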
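The rules in the table can be checked before a request is sent. The following Python sketch uses a hypothetical helper, `validate_rank_feature_clause`, that inspects a rank_feature clause written as a Python dictionary: it requires `field`, allows at most one of `saturation`, `log`, or `sigmoid`, rejects `positive_score_impact` (a mapping setting), and reports which function will be used. It is a client-side illustration only; OpenSearch performs its own validation when it parses the query.

```python
# Hypothetical client-side check mirroring the parameter table; OpenSearch
# performs its own validation when it parses the query.
SCORE_FUNCTIONS = ("saturation", "log", "sigmoid")


def validate_rank_feature_clause(clause: dict) -> str:
    """Return the scoring function a rank_feature clause uses, or raise ValueError."""
    if "field" not in clause:
        raise ValueError("'field' is required")
    chosen = [name for name in SCORE_FUNCTIONS if name in clause]
    if len(chosen) > 1:
        raise ValueError(f"only one of {', '.join(SCORE_FUNCTIONS)} may be set, got {chosen}")
    if "positive_score_impact" in clause:
        raise ValueError("positive_score_impact belongs in the field mapping, not the query")
    # With no explicit function, saturation is used by default.
    return chosen[0] if chosen else "saturation"


print(validate_rank_feature_clause({"field": "popularity"}))  # saturation
print(validate_rank_feature_clause({"field": "popularity", "log": {"scaling_factor": 2}}))  # log
```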

Examples

The following examples demonstrate how to define and use a rank_feature field to influence document scoring.

Create an index with a rank feature field

Define an index with a rank_feature field that represents a signal such as popularity:

PUT /products
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "popularity": { "type": "rank_feature" }
    }
  }
}

Index sample documents

Add sample products with different popularity values:

POST /products/_bulk
{ "index": { "_id": 1 } }
{ "title": "Wireless Earbuds", "popularity": 1 }
{ "index": { "_id": 2 } }
{ "title": "Bluetooth Speaker", "popularity": 10 }
{ "index": { "_id": 3 } }
{ "title": "Portable Charger", "popularity": 25 }
{ "index": { "_id": 4 } }
{ "title": "Smartwatch", "popularity": 50 }
{ "index": { "_id": 5 } }
{ "title": "Noise Cancelling Headphones", "popularity": 100 }
{ "index": { "_id": 6 } }
{ "title": "Gaming Laptop", "popularity": 250 }
{ "index": { "_id": 7 } }
{ "title": "4K Monitor", "popularity": 500 }

Basic rank feature query

You can use rank_feature to boost results based on the popularity score:

POST /products/_search
{
  "query": {
    "rank_feature": {
      "field": "popularity"
    }
  }
}

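This query does not filter on any condition. Instead, it scores every document that contains a popularity value according to that value: the higher the value, the higher the score, as the response shows: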

{
  ...
  "hits": {
    "total": {
      "value": 7,
      "relation": "eq"
    },
    "max_score": 0.9252834,
    "hits": [
      {
        "_index": "products",
        "_id": "7",
        "_score": 0.9252834,
        "_source": {
          "title": "4K Monitor",
          "popularity": 500
        }
      },
      {
        "_index": "products",
        "_id": "6",
        "_score": 0.86095566,
        "_source": {
          "title": "Gaming Laptop",
          "popularity": 250
        }
      },
      {
        "_index": "products",
        "_id": "5",
        "_score": 0.71237755,
        "_source": {
          "title": "Noise Cancelling Headphones",
          "popularity": 100
        }
      },
      {
        "_index": "products",
        "_id": "4",
        "_score": 0.5532503,
        "_source": {
          "title": "Smartwatch",
          "popularity": 50
        }
      },
      {
        "_index": "products",
        "_id": "3",
        "_score": 0.38240916,
        "_source": {
          "title": "Portable Charger",
          "popularity": 25
        }
      },
      {
        "_index": "products",
        "_id": "2",
        "_score": 0.19851118,
        "_source": {
          "title": "Bluetooth Speaker",
          "popularity": 10
        }
      },
      {
        "_index": "products",
        "_id": "1",
        "_score": 0.024169207,
        "_source": {
          "title": "Wireless Earbuds",
          "popularity": 1
        }
      }
    ]
  }
}
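These scores come from the default saturation function, score = S / (S + pivot), with a pivot that OpenSearch derives from the indexed values. As a rough check, the following Python sketch assumes the default pivot is the exact geometric mean of the seven sample values (about 39.7). OpenSearch computes its own approximation of that mean, so the sketch reproduces the returned scores only to within about 0.01, not exactly.

```python
import math

popularity = [1, 10, 25, 50, 100, 250, 500]

# Assumption: the default pivot is approximated by the geometric mean of the
# indexed values. OpenSearch computes its own approximation internally.
pivot = math.exp(sum(math.log(v) for v in popularity) / len(popularity))


def saturation(value: float, pivot: float) -> float:
    """Default rank_feature scoring: S / (S + pivot)."""
    return value / (value + pivot)


print(f"estimated pivot: {pivot:.2f}")
for v in sorted(popularity, reverse=True):
    print(f"{v:>4}  {saturation(v, pivot):.4f}")
```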

Combine with full-text search

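To filter for relevant results and boost them by popularity, use the following request. This query ranks all documents that match "headphones" and boosts those with higher popularity: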

POST /products/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": "headphones"
        }
      },
      "should": {
        "rank_feature": {
          "field": "popularity"
        }
      }
    }
  }
}
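To see how the two clauses combine, the following Python sketch models a bool query with one must clause and one should clause, assuming standard bool behavior: a document must match the must clause to be returned, should clauses are optional when a must clause is present, and the scores of all matching clauses are added together. The function name and the input scores are hypothetical; real per-clause scores come from OpenSearch (for example, by setting "explain": true in the search request).

```python
from typing import Optional


def bool_score(must_score: Optional[float], should_score: Optional[float]) -> Optional[float]:
    """Approximate score of a bool query with one must and one should clause.

    None means the clause did not match the document.
    """
    if must_score is None:
        return None  # must clause failed: the document is not returned at all
    # With a must clause present, the should clause only adds to the score.
    return must_score + (should_score or 0.0)


# Hypothetical scores: a text match plus a popularity boost.
print(bool_score(1.5, 0.7))   # matches both clauses
print(bool_score(1.5, None))  # matches "headphones" but has no rank feature score
print(bool_score(None, 0.9))  # popular, but does not match "headphones"
```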

Boost parameter

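The boost parameter lets you adjust the score contribution of the rank_feature clause. This is especially useful in compound queries such as bool, where you can control how strongly a numeric field (such as popularity, freshness, or a relevance score) influences the final document ranking.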

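In the following example, the bool query matches documents whose title contains the word "headphones" and boosts more popular results using a rank_feature clause with a boost value of 2.0. This doubles the contribution of the rank_feature score to each document's total score: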

POST /products/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": "headphones"
        }
      },
      "should": {
        "rank_feature": {
          "field": "popularity",
          "boost": 2.0
        }
      }
    }
  }
}
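The same request body can be built programmatically. The following Python sketch uses a hypothetical helper, `popularity_boosted_query`, that returns the bool query above as a dictionary with a configurable boost, and then shows how a boost of 2.0 scales the should clause's contribution. The 0.71237755 figure is the default-saturation score returned for Noise Cancelling Headphones in the basic query response and is used here only for illustration. Sending the request is left out.

```python
import json


def popularity_boosted_query(text: str, boost: float = 1.0) -> dict:
    """Hypothetical helper: full-text match on title plus a boosted popularity signal."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"title": text}},
                "should": {"rank_feature": {"field": "popularity", "boost": boost}},
            }
        }
    }


query = popularity_boosted_query("headphones", boost=2.0)
print(json.dumps(query, indent=2))

# boost multiplies the rank_feature clause's score before bool adds it in.
rank_feature_score = 0.71237755  # doc 5, default saturation, from the basic query response
print(2.0 * rank_feature_score)
```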

Configuring the scoring function

By default, the rank_feature query uses the saturation function with a pivot value derived from the field. You can explicitly set the function to saturation, log, or sigmoid.

Saturation function

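In the rank_feature query, the saturation function is the default scoring method. It assigns higher scores to documents with larger feature values, but the score increases more gradually once values exceed the specified pivot. This is useful when you want very large values to yield diminishing returns, for example, to boost popularity without over-rewarding extremely high numbers. The score is calculated as follows, where S is the value of the rank_feature field:

score = S / (S + pivot)

The resulting score is always between 0 and 1. If no pivot is provided, the approximate geometric mean of all rank_feature values in the index is used.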
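The formula is simple enough to evaluate by hand. The following Python sketch implements S / (S + pivot) directly (the function name is illustrative, not part of any OpenSearch client) and applies it to the sample popularity values with a pivot of 50, which is the configuration used in the next example.

```python
def saturation_score(value: float, pivot: float) -> float:
    """S / (S + pivot): exactly 0.5 at the pivot, approaching 1 as S grows."""
    if value <= 0 or pivot <= 0:
        raise ValueError("value and pivot must be positive")
    return value / (value + pivot)


for popularity in [1, 10, 25, 50, 100, 250, 500]:
    print(popularity, round(saturation_score(popularity, 50), 7))
```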

The following example uses saturation with a pivot value of 50:

POST /products/_search
{
  "query": {
    "rank_feature": {
      "field": "popularity",
      "saturation": {
        "pivot": 50
      }
    }
  }
}

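The pivot defines the point at which score growth slows: a document whose value equals the pivot scores exactly 0.5 (Smartwatch, with a popularity of 50). Values above the pivot still increase the score, but with diminishing returns, as the returned hits show: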

{
  ...
  "hits": {
    "total": {
      "value": 7,
      "relation": "eq"
    },
    "max_score": 0.9090909,
    "hits": [
      {
        "_index": "products",
        "_id": "7",
        "_score": 0.9090909,
        "_source": {
          "title": "4K Monitor",
          "popularity": 500
        }
      },
      {
        "_index": "products",
        "_id": "6",
        "_score": 0.8333333,
        "_source": {
          "title": "Gaming Laptop",
          "popularity": 250
        }
      },
      {
        "_index": "products",
        "_id": "5",
        "_score": 0.6666666,
        "_source": {
          "title": "Noise Cancelling Headphones",
          "popularity": 100
        }
      },
      {
        "_index": "products",
        "_id": "4",
        "_score": 0.5,
        "_source": {
          "title": "Smartwatch",
          "popularity": 50
        }
      },
      {
        "_index": "products",
        "_id": "3",
        "_score": 0.3333333,
        "_source": {
          "title": "Portable Charger",
          "popularity": 25
        }
      },
      {
        "_index": "products",
        "_id": "2",
        "_score": 0.16666669,
        "_source": {
          "title": "Bluetooth Speaker",
          "popularity": 10
        }
      },
      {
        "_index": "products",
        "_id": "1",
        "_score": 0.019607842,
        "_source": {
          "title": "Wireless Earbuds",
          "popularity": 1
        }
      }
    ]
  }
}

Log function

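The log function is useful when the rank_feature field spans a wide range of values. It applies a logarithmic scale to the score, which reduces the impact of extremely high values and helps normalize scoring across a broad value distribution. This is especially useful when small differences between low values should matter more than large differences between high values. The score is calculated as log(scaling_factor + S), where S is the value of the rank_feature field and log is the natural logarithm. The following example uses a scaling_factor of 2: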
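As a quick check of the formula, the following Python sketch computes ln(scaling_factor + S) with `math.log`, which is the natural logarithm. With a scaling_factor of 2 it reproduces the scores in the response that follows the request below (for example, ln(502) ≈ 6.2186 for a popularity of 500). The function name is illustrative only.

```python
import math


def log_score(value: float, scaling_factor: float) -> float:
    """ln(scaling_factor + S); math.log with one argument is the natural logarithm."""
    return math.log(scaling_factor + value)


for popularity in [1, 10, 25, 50, 100, 250, 500]:
    print(popularity, round(log_score(popularity, 2), 7))
```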

POST /products/_search
{
  "query": {
    "rank_feature": {
      "field": "popularity",
      "log": {
        "scaling_factor": 2
      }
    }
  }
}

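In the sample dataset, popularity ranges from 1 to 500. The log function compresses the score contribution of large values such as 250 and 500 while still giving documents with values of 10 or 25 meaningful scores. By contrast, with the saturation function, documents well above the pivot all converge toward the same maximum score of 1: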
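The contrast can be checked numerically. The following Python sketch reuses the saturation formula S / (S + pivot) and the log formula ln(scaling_factor + S) with the settings from these examples (pivot 50, scaling_factor 2) and prints how much the score grows each time popularity doubles. Because the two functions produce scores on different scales, compare the trend within each function: the saturation gain shrinks once values pass the pivot, while the log gain stays close to ln 2 ≈ 0.69.

```python
import math


def saturation_score(value: float, pivot: float) -> float:
    return value / (value + pivot)


def log_score(value: float, scaling_factor: float) -> float:
    return math.log(scaling_factor + value)


# Score gained each time popularity doubles.
gains = {}
for low, high in [(25, 50), (50, 100), (250, 500)]:
    sat_gain = saturation_score(high, 50) - saturation_score(low, 50)
    log_gain = log_score(high, 2) - log_score(low, 2)
    gains[(low, high)] = (sat_gain, log_gain)
    print(f"{low:>3} -> {high:>3}: saturation +{sat_gain:.4f}, log +{log_gain:.4f}")
```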

{
  ...
  "hits": {
    "total": {
      "value": 7,
      "relation": "eq"
    },
    "max_score": 6.2186003,
    "hits": [
      {
        "_index": "products",
        "_id": "7",
        "_score": 6.2186003,
        "_source": {
          "title": "4K Monitor",
          "popularity": 500
        }
      },
      {
        "_index": "products",
        "_id": "6",
        "_score": 5.529429,
        "_source": {
          "title": "Gaming Laptop",
          "popularity": 250
        }
      },
      {
        "_index": "products",
        "_id": "5",
        "_score": 4.624973,
        "_source": {
          "title": "Noise Cancelling Headphones",
          "popularity": 100
        }
      },
      {
        "_index": "products",
        "_id": "4",
        "_score": 3.9512436,
        "_source": {
          "title": "Smartwatch",
          "popularity": 50
        }
      },
      {
        "_index": "products",
        "_id": "3",
        "_score": 3.295837,
        "_source": {
          "title": "Portable Charger",
          "popularity": 25
        }
      },
      {
        "_index": "products",
        "_id": "2",
        "_score": 2.4849067,
        "_source": {
          "title": "Bluetooth Speaker",
          "popularity": 10
        }
      },
      {
        "_index": "products",
        "_id": "1",
        "_score": 1.0986123,
        "_source": {
          "title": "Wireless Earbuds",
          "popularity": 1
        }
      }
    ]
  }
}

Sigmoid function

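The sigmoid function provides a smooth S-shaped scoring curve, which is especially useful when you want to control the steepness and the midpoint of the score's impact. The score is calculated as S^exponent / (S^exponent + pivot^exponent), where S is the value of the rank_feature field. The following example uses the sigmoid function configured with a pivot and an exponent. The pivot defines the value at which the score is 0.5. The exponent controls the steepness of the curve: higher values produce a sharper transition around the pivot, and lower values produce a more gradual one: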
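The following Python sketch implements S^exponent / (S^exponent + pivot^exponent) (the function name is illustrative) and compares two exponents around a pivot of 50. With an exponent of 0.5, as in the request below, values of 25 and 100 land fairly close to 0.5; with an exponent of 2.0, the same values are pushed much further apart, which is the sharper transition described above.

```python
def sigmoid_score(value: float, pivot: float, exponent: float) -> float:
    """S^exponent / (S^exponent + pivot^exponent); equals 0.5 when S == pivot."""
    s = value ** exponent
    return s / (s + pivot ** exponent)


for exponent in (0.5, 2.0):
    scores = [round(sigmoid_score(v, 50, exponent), 4) for v in (25, 50, 100)]
    print(f"exponent={exponent}: {scores}")
# exponent=0.5: [0.4142, 0.5, 0.5858]
# exponent=2.0: [0.2, 0.5, 0.8]
```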

POST /products/_search
{
  "query": {
    "rank_feature": {
      "field": "popularity",
      "sigmoid": {
        "pivot": 50,
        "exponent": 0.5
      }
    }
  }
}

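With the sigmoid function, scores rise most quickly around the pivot (50 in this example), where the score is exactly 0.5. Toward both the high and the low extremes, the curve flattens, so further changes in value have a progressively smaller effect on the score: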

{
  ...
  "hits": {
    "total": {
      "value": 7,
      "relation": "eq"
    },
    "max_score": 0.7597469,
    "hits": [
      {
        "_index": "products",
        "_id": "7",
        "_score": 0.7597469,
        "_source": {
          "title": "4K Monitor",
          "popularity": 500
        }
      },
      {
        "_index": "products",
        "_id": "6",
        "_score": 0.690983,
        "_source": {
          "title": "Gaming Laptop",
          "popularity": 250
        }
      },
      {
        "_index": "products",
        "_id": "5",
        "_score": 0.58578646,
        "_source": {
          "title": "Noise Cancelling Headphones",
          "popularity": 100
        }
      },
      {
        "_index": "products",
        "_id": "4",
        "_score": 0.5,
        "_source": {
          "title": "Smartwatch",
          "popularity": 50
        }
      },
      {
        "_index": "products",
        "_id": "3",
        "_score": 0.41421357,
        "_source": {
          "title": "Portable Charger",
          "popularity": 25
        }
      },
      {
        "_index": "products",
        "_id": "2",
        "_score": 0.309017,
        "_source": {
          "title": "Bluetooth Speaker",
          "popularity": 10
        }
      },
      {
        "_index": "products",
        "_id": "1",
        "_score": 0.12389934,
        "_source": {
          "title": "Wireless Earbuds",
          "popularity": 1
        }
      }
    ]
  }
}

Inverting the score impact

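By default, higher values produce higher scores. If you want lower values to produce higher scores (for example, when lower prices are more relevant), set positive_score_impact to false when creating the index: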

PUT /products_new
{
  "mappings": {
    "properties": {
      "popularity": {
        "type": "rank_feature",
        "positive_score_impact": false
      }
    }
  }
}
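For the saturation function, inverting the score impact flips the curve. The following Python sketch assumes the inverted form is pivot / (S + pivot), as described in the Elasticsearch rank_feature query reference, and that OpenSearch behaves the same way; treat it as an illustration rather than a specification. Under that assumption, a value equal to the pivot still scores 0.5, and smaller values (such as lower prices) score higher.

```python
def saturation_score(value: float, pivot: float, positive_score_impact: bool = True) -> float:
    if positive_score_impact:
        return value / (value + pivot)  # grows as the value grows
    # Assumed inverted form: pivot / (S + pivot), which shrinks as the value grows.
    return pivot / (value + pivot)


for value in (10, 50, 100):
    print(value, round(saturation_score(value, 50, positive_score_impact=False), 4))
# 10 0.8333
# 50 0.5
# 100 0.3333
```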
