
Rank feature

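Use the rank_feature query to boost document scores based on numeric values in the document, such as relevance scores, popularity, or freshness. This query is well suited to fine-tuning relevance ranking with numeric features. Unlike full-text queries, rank_feature considers only numeric signals; it is most effective when combined with other queries in a compound query such as bool.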

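The rank_feature query requires the target field to be mapped as a rank_feature field type. This mapping enables internally optimized scoring, which makes boosting fast and efficient.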

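The score impact depends on the field value and the optional saturation, log, or sigmoid function. These functions are applied dynamically at query time to compute the final document score; they do not alter or store any values in the document itself.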

Parameters

The rank_feature query supports the following parameters.

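| Parameter | Data type | Required/Optional | Description |
| --- | --- | --- | --- |
| field | String | Required | A rank_feature or rank_features field used to score documents. |
| boost | Float | Optional | A multiplier applied to the score. Default is 1.0. Values between 0 and 1 reduce the score; values greater than 1 amplify it. |
| saturation | Object | Optional | Applies a saturation function to the feature value. The boost grows with the value but levels off beyond the pivot. This is the default function if no other function is provided. Only one of saturation, log, or sigmoid can be used at a time. |
| log | Object | Optional | Uses a logarithmic scoring function based on the field value. Best suited to values that span a wide range. Only one of saturation, log, or sigmoid can be used at a time. |
| sigmoid | Object | Optional | Applies a sigmoid (S-shaped) curve, controlled by pivot and exponent, to the score impact. Only one of saturation, log, or sigmoid can be used at a time. |
| positive_score_impact | Boolean | Optional | When set to false, lower values score higher. Useful for features such as price, where smaller values are better. Defined in the field mapping, not in the query. Default is true. |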

Examples

The following examples demonstrate how to define and use a rank_feature field to influence document scoring.

Create an index with a rank feature field

Define an index with a rank_feature field that represents a signal such as popularity:

PUT /products
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "popularity": { "type": "rank_feature" }
    }
  }
}

Index example documents

Add sample products with varying popularity values:

POST /products/_bulk
{ "index": { "_id": 1 } }
{ "title": "Wireless Earbuds", "popularity": 1 }
{ "index": { "_id": 2 } }
{ "title": "Bluetooth Speaker", "popularity": 10 }
{ "index": { "_id": 3 } }
{ "title": "Portable Charger", "popularity": 25 }
{ "index": { "_id": 4 } }
{ "title": "Smartwatch", "popularity": 50 }
{ "index": { "_id": 5 } }
{ "title": "Noise Cancelling Headphones", "popularity": 100 }
{ "index": { "_id": 6 } }
{ "title": "Gaming Laptop", "popularity": 250 }
{ "index": { "_id": 7 } }
{ "title": "4K Monitor", "popularity": 500 }

Basic rank feature query

You can use rank_feature to boost results based on the popularity score:

POST /products/_search
{
  "query": {
    "rank_feature": {
      "field": "popularity"
    }
  }
}

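This query does not filter documents on its own. Instead, it scores every document based on the value of popularity. Higher values produce higher scores: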

{
  ...
  "hits": {
    "total": {
      "value": 7,
      "relation": "eq"
    },
    "max_score": 0.9252834,
    "hits": [
      {
        "_index": "products",
        "_id": "7",
        "_score": 0.9252834,
        "_source": {
          "title": "4K Monitor",
          "popularity": 500
        }
      },
      {
        "_index": "products",
        "_id": "6",
        "_score": 0.86095566,
        "_source": {
          "title": "Gaming Laptop",
          "popularity": 250
        }
      },
      {
        "_index": "products",
        "_id": "5",
        "_score": 0.71237755,
        "_source": {
          "title": "Noise Cancelling Headphones",
          "popularity": 100
        }
      },
      {
        "_index": "products",
        "_id": "4",
        "_score": 0.5532503,
        "_source": {
          "title": "Smartwatch",
          "popularity": 50
        }
      },
      {
        "_index": "products",
        "_id": "3",
        "_score": 0.38240916,
        "_source": {
          "title": "Portable Charger",
          "popularity": 25
        }
      },
      {
        "_index": "products",
        "_id": "2",
        "_score": 0.19851118,
        "_source": {
          "title": "Bluetooth Speaker",
          "popularity": 10
        }
      },
      {
        "_index": "products",
        "_id": "1",
        "_score": 0.024169207,
        "_source": {
          "title": "Wireless Earbuds",
          "popularity": 1
        }
      }
    ]
  }
}
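The scores above can be used to work out which pivot the query applied by default. The following Python sketch assumes the default saturation formula described in the saturation function section of this page, score = value / (value + pivot), and solves it for pivot. The helper name implied_pivot is hypothetical and is not part of any OpenSearch API.

```python
import math

# (popularity, _score) pairs copied from the response above.
observed = [
    (500, 0.9252834),
    (250, 0.86095566),
    (100, 0.71237755),
    (50, 0.5532503),
    (25, 0.38240916),
    (10, 0.19851118),
    (1, 0.024169207),
]


def implied_pivot(value, score):
    """Solve score = value / (value + pivot) for pivot (hypothetical helper)."""
    return value * (1.0 - score) / score


pivots = [implied_pivot(v, s) for v, s in observed]

# Exact geometric mean of the seven popularity values, for comparison.
values = [v for v, _ in observed]
geometric_mean = math.exp(sum(math.log(v) for v in values) / len(values))

for (v, s), p in zip(observed, pivots):
    print(f"popularity={v:>3}  score={s:.7f}  implied pivot={p:.3f}")
print(f"exact geometric mean: {geometric_mean:.3f}")
```

Every hit implies a pivot of roughly 40.4, close to but not identical to the exact geometric mean of the seven values (about 39.7). This is consistent with the default pivot being an approximation of the geometric mean.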

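To filter for relevant results and boost them based on popularity, use the following request. This query ranks all documents matching "headphones" and boosts those with higher popularity: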

POST /products/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": "headphones"
        }
      },
      "should": {
        "rank_feature": {
          "field": "popularity"
        }
      }
    }
  }
}

Boost parameter

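The boost parameter lets you scale the score contribution of the rank_feature clause. This is especially useful in compound queries such as bool, where you can control how much a numeric field (such as popularity, freshness, or a relevance score) influences the final document ranking.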

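In the following example, the bool query matches documents whose title contains the term "headphones" and uses a rank_feature clause with a boost value of 2.0 to promote more popular results. This doubles the contribution of the rank_feature score to the document's total score: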

POST /products/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": "headphones"
        }
      },
      "should": {
        "rank_feature": {
          "field": "popularity",
          "boost": 2.0
        }
      }
    }
  }
}
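The effect of boost on the combined score can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the engine's implementation: it assumes the bool query adds the must clause score to the should clause score, that the rank_feature clause uses the saturation formula value / (value + pivot), and that boost multiplies only the rank_feature part. TEXT_SCORE and PIVOT are hypothetical numbers chosen for the example.

```python
def saturation(value, pivot):
    """Saturation score: value / (value + pivot)."""
    return value / (value + pivot)


def bool_score(text_score, feature_value, pivot, boost=1.0):
    """Sum a must clause score and a boosted rank_feature should clause score."""
    return text_score + boost * saturation(feature_value, pivot)


TEXT_SCORE = 1.2  # hypothetical score of the match clause
PIVOT = 50        # hypothetical pivot; the real default is derived from the index

for boost in (0.5, 1.0, 2.0):
    total = bool_score(TEXT_SCORE, feature_value=100, pivot=PIVOT, boost=boost)
    feature_part = total - TEXT_SCORE
    print(f"boost={boost}: rank_feature part={feature_part:.4f}, total={total:.4f}")
```

Only the rank_feature part scales with boost; the text match score is unchanged, so a larger boost shifts the ranking toward popularity without affecting which documents match.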

Configure the scoring function

By default, the rank_feature query uses the saturation function with a pivot value derived from the field. You can explicitly set the function to saturation, log, or sigmoid.

Saturation function

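The saturation function is the default scoring method in rank_feature queries. It assigns higher scores to documents with larger feature values, but the score grows more slowly once the value exceeds the specified pivot. This is useful when you want diminishing returns for very large values, for example, boosting popularity without over-rewarding extremely high numbers. The score is computed as value / (value + pivot), where value is the rank_feature field value. The resulting score is always between 0 and 1. If pivot is not provided, an approximate geometric mean of all rank_feature values in the index is used.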

The following example uses saturation with a pivot value of 50:

POST /products/_search
{
  "query": {
    "rank_feature": {
      "field": "popularity",
      "saturation": {
        "pivot": 50
      }
    }
  }
}

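The pivot defines the point at which score growth slows down. Values above the pivot still increase the score but with diminishing returns, as shown in the returned hits: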

{
  ...
  "hits": {
    "total": {
      "value": 7,
      "relation": "eq"
    },
    "max_score": 0.9090909,
    "hits": [
      {
        "_index": "products",
        "_id": "7",
        "_score": 0.9090909,
        "_source": {
          "title": "4K Monitor",
          "popularity": 500
        }
      },
      {
        "_index": "products",
        "_id": "6",
        "_score": 0.8333333,
        "_source": {
          "title": "Gaming Laptop",
          "popularity": 250
        }
      },
      {
        "_index": "products",
        "_id": "5",
        "_score": 0.6666666,
        "_source": {
          "title": "Noise Cancelling Headphones",
          "popularity": 100
        }
      },
      {
        "_index": "products",
        "_id": "4",
        "_score": 0.5,
        "_source": {
          "title": "Smartwatch",
          "popularity": 50
        }
      },
      {
        "_index": "products",
        "_id": "3",
        "_score": 0.3333333,
        "_source": {
          "title": "Portable Charger",
          "popularity": 25
        }
      },
      {
        "_index": "products",
        "_id": "2",
        "_score": 0.16666669,
        "_source": {
          "title": "Bluetooth Speaker",
          "popularity": 10
        }
      },
      {
        "_index": "products",
        "_id": "1",
        "_score": 0.019607842,
        "_source": {
          "title": "Wireless Earbuds",
          "popularity": 1
        }
      }
    ]
  }
}
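The scores in this response can be reproduced directly from the saturation formula. The following is a minimal Python sketch; the function name saturation is chosen for illustration and is not an OpenSearch API.

```python
def saturation(value, pivot):
    """Saturation score: value / (value + pivot)."""
    return value / (value + pivot)


PIVOT = 50
popularity_values = [500, 250, 100, 50, 25, 10, 1]

scores = {v: saturation(v, PIVOT) for v in popularity_values}
for v, s in scores.items():
    print(f"popularity={v:>3}  score={s:.7f}")
```

A value equal to the pivot scores exactly 0.5, and doubling the value from 250 to 500 adds far less to the score than doubling it from 25 to 50.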

Log function

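The log function is useful when the rank_feature field contains a wide range of values. It applies a logarithmic scale to the score, which reduces the effect of extremely high values and helps normalize scoring across wide value distributions. This is especially useful when small differences between low values should matter more than large differences between high values. The score is computed as log(scaling_factor + value), where value is the rank_feature field value and log is the natural logarithm. The following example uses a scaling_factor of 2: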

POST /products/_search
{
  "query": {
    "rank_feature": {
      "field": "popularity",
      "log": {
        "scaling_factor": 2
      }
    }
  }
}

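In the example dataset, the popularity field ranges from 1 to 500. The log function compresses the score contribution of large values such as 250 and 500 while still giving documents with values such as 10 or 25 a meaningful score. In contrast, with the saturation function, documents above the pivot would quickly approach the same maximum score: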

{
  ...
  "hits": {
    "total": {
      "value": 7,
      "relation": "eq"
    },
    "max_score": 6.2186003,
    "hits": [
      {
        "_index": "products",
        "_id": "7",
        "_score": 6.2186003,
        "_source": {
          "title": "4K Monitor",
          "popularity": 500
        }
      },
      {
        "_index": "products",
        "_id": "6",
        "_score": 5.529429,
        "_source": {
          "title": "Gaming Laptop",
          "popularity": 250
        }
      },
      {
        "_index": "products",
        "_id": "5",
        "_score": 4.624973,
        "_source": {
          "title": "Noise Cancelling Headphones",
          "popularity": 100
        }
      },
      {
        "_index": "products",
        "_id": "4",
        "_score": 3.9512436,
        "_source": {
          "title": "Smartwatch",
          "popularity": 50
        }
      },
      {
        "_index": "products",
        "_id": "3",
        "_score": 3.295837,
        "_source": {
          "title": "Portable Charger",
          "popularity": 25
        }
      },
      {
        "_index": "products",
        "_id": "2",
        "_score": 2.4849067,
        "_source": {
          "title": "Bluetooth Speaker",
          "popularity": 10
        }
      },
      {
        "_index": "products",
        "_id": "1",
        "_score": 1.0986123,
        "_source": {
          "title": "Wireless Earbuds",
          "popularity": 1
        }
      }
    ]
  }
}
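The returned scores match the natural logarithm of scaling_factor plus the field value. The following Python sketch reproduces them; the function name log_score is hypothetical and used only for illustration.

```python
import math


def log_score(value, scaling_factor):
    """Log score: natural logarithm of (scaling_factor + value)."""
    return math.log(scaling_factor + value)


SCALING_FACTOR = 2
popularity_values = [500, 250, 100, 50, 25, 10, 1]

log_scores = {v: log_score(v, SCALING_FACTOR) for v in popularity_values}
for v, s in log_scores.items():
    print(f"popularity={v:>3}  score={s:.6f}")
```

Unlike saturation scores, these scores are not bounded by 1, which matters when the rank_feature clause is combined with other clauses in a bool query.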

Sigmoid function

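The sigmoid function provides a smooth, S-shaped scoring curve, which is especially useful when you want to control the steepness and midpoint of the score impact. The score is computed as value^exponent / (value^exponent + pivot^exponent), where value is the rank_feature field value. The following example uses a sigmoid function configured with a pivot and an exponent. The pivot defines the value at which the score is 0.5. The exponent controls how steep the curve is: higher values produce a sharper transition around the pivot: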

POST /products/_search
{
  "query": {
    "rank_feature": {
      "field": "popularity",
      "sigmoid": {
        "pivot": 50,
        "exponent": 0.5
      }
    }
  }
}

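The sigmoid function smoothly boosts scores around the pivot (50 in this example), giving moderate preference to values near the pivot while flattening out at both the high and low extremes: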

{
  ...
  "hits": {
    "total": {
      "value": 7,
      "relation": "eq"
    },
    "max_score": 0.7597469,
    "hits": [
      {
        "_index": "products",
        "_id": "7",
        "_score": 0.7597469,
        "_source": {
          "title": "4K Monitor",
          "popularity": 500
        }
      },
      {
        "_index": "products",
        "_id": "6",
        "_score": 0.690983,
        "_source": {
          "title": "Gaming Laptop",
          "popularity": 250
        }
      },
      {
        "_index": "products",
        "_id": "5",
        "_score": 0.58578646,
        "_source": {
          "title": "Noise Cancelling Headphones",
          "popularity": 100
        }
      },
      {
        "_index": "products",
        "_id": "4",
        "_score": 0.5,
        "_source": {
          "title": "Smartwatch",
          "popularity": 50
        }
      },
      {
        "_index": "products",
        "_id": "3",
        "_score": 0.41421357,
        "_source": {
          "title": "Portable Charger",
          "popularity": 25
        }
      },
      {
        "_index": "products",
        "_id": "2",
        "_score": 0.309017,
        "_source": {
          "title": "Bluetooth Speaker",
          "popularity": 10
        }
      },
      {
        "_index": "products",
        "_id": "1",
        "_score": 0.12389934,
        "_source": {
          "title": "Wireless Earbuds",
          "popularity": 1
        }
      }
    ]
  }
}
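The scores in this response follow from the sigmoid formula with pivot 50 and exponent 0.5. The Python sketch below reproduces them and, for comparison, evaluates the same values with an exponent of 2. The function name sigmoid_score and the comparison exponent are illustrative choices, not part of the original example.

```python
def sigmoid_score(value, pivot, exponent):
    """Sigmoid score: value^exponent / (value^exponent + pivot^exponent)."""
    powered = value ** exponent
    return powered / (powered + pivot ** exponent)


PIVOT = 50
popularity_values = [500, 250, 100, 50, 25, 10, 1]

# Exponent used in the request above.
gentle = {v: sigmoid_score(v, PIVOT, 0.5) for v in popularity_values}
# A larger exponent moves scores away from 0.5 faster on either side of the pivot.
steep = {v: sigmoid_score(v, PIVOT, 2) for v in popularity_values}

for v in popularity_values:
    print(f"popularity={v:>3}  exponent=0.5: {gentle[v]:.7f}  exponent=2: {steep[v]:.7f}")
```

With either exponent, a value equal to the pivot scores 0.5; the exponent changes only how quickly scores move toward 0 or 1 on either side of it.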

Invert the score impact

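By default, higher values produce higher scores. If you want lower values to produce higher scores (for example, when lower prices are more relevant), set positive_score_impact to false in the field mapping when you create the index: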

PUT /products_new
{
  "mappings": {
    "properties": {
      "popularity": {
        "type": "rank_feature",
        "positive_score_impact": false
      }
    }
  }
}
