
Extended stats aggregations

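The extended_stats aggregation is a more comprehensive version of the stats aggregation. In addition to the basic statistical measures provided by stats, extended_stats calculates the following: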

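  • Sum of squares
  • Variance
  • Population variance
  • Sampling variance
  • Standard deviation
  • Population standard deviation
  • Sampling standard deviation
  • Standard deviation bounds
    • Upper
    • Lower
    • Upper population
    • Lower population
    • Upper sampling
    • Lower sampling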

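The variance and std_deviation values are population statistics: they always equal variance_population and std_deviation_population, respectively.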
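To illustrate how these measures relate, the following Python sketch computes them from a list of raw values. It uses the textbook relations (population variance = sum of squares / n - mean², sampling variance = population variance × n / (n - 1)); the function name and toy values are assumptions for illustration, not OpenSearch code, and OpenSearch's internal summation may differ in the last digits.

```python
import math


def extended_stats(values):
    """Hypothetical helper: extended_stats-style measures for a list of numbers.

    Assumes len(values) >= 2 so that the sampling divisor n - 1 is non-zero.
    """
    n = len(values)
    total = sum(values)
    sum_of_squares = sum(v * v for v in values)
    avg = total / n
    # Population variance: mean of the squares minus the square of the mean.
    variance_population = sum_of_squares / n - avg * avg
    # Sampling variance applies Bessel's correction (divide by n - 1 instead of n).
    variance_sampling = variance_population * n / (n - 1)
    return {
        "count": n,
        "avg": avg,
        "sum_of_squares": sum_of_squares,
        # "variance" and "std_deviation" report the population values.
        "variance": variance_population,
        "variance_population": variance_population,
        "variance_sampling": variance_sampling,
        "std_deviation": math.sqrt(variance_population),
        "std_deviation_population": math.sqrt(variance_population),
        "std_deviation_sampling": math.sqrt(variance_sampling),
    }


stats = extended_stats([2, 4, 4, 4, 5, 5, 7, 9])
print(stats["variance"], stats["variance_sampling"])
```

For these toy values the population variance is 4 and the population standard deviation is 2, while the sampling variance is slightly larger (32 / 7) because it divides by 7 instead of 8.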

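The std_deviation_bounds object defines a range that extends the specified number of standard deviations (two by default) above and below the mean. This object is always included in the output but is meaningful only for normally distributed data. Before interpreting these values, verify that your dataset follows a normal distribution.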
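Each bound is the mean plus or minus sigma standard deviations, computed once with the population standard deviation (upper, lower, and their _population variants) and once with the sampling standard deviation. For normally distributed data, roughly 95% of values fall within two standard deviations of the mean, which is why the bounds are only meaningful for such data. A minimal Python sketch, with a hypothetical function name and toy inputs:

```python
import math


def std_deviation_bounds(avg, std_population, std_sampling, sigma=2.0):
    """Hypothetical helper: avg +/- sigma * std, keyed like std_deviation_bounds."""
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    return {
        # "upper"/"lower" use the population standard deviation, like their _population twins.
        "upper": avg + sigma * std_population,
        "lower": avg - sigma * std_population,
        "upper_population": avg + sigma * std_population,
        "lower_population": avg - sigma * std_population,
        # The _sampling variants use the sample (n - 1) standard deviation instead.
        "upper_sampling": avg + sigma * std_sampling,
        "lower_sampling": avg - sigma * std_sampling,
    }


# Toy inputs: the values [2, 4, 4, 4, 5, 5, 7, 9] have mean 5,
# population standard deviation 2, and sampling standard deviation sqrt(32 / 7).
bounds = std_deviation_bounds(avg=5.0, std_population=2.0, std_sampling=math.sqrt(32 / 7))
print(bounds["lower"], bounds["upper"])
```

With the default sigma of 2, the population bounds for these toy values are 1 and 9; the sampling bounds are slightly wider.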

Parameters

The extended_stats aggregation takes the following parameters:

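| Parameter | Required/Optional | Data type | Description |
|---|---|---|---|
| field | Required | String | The name of the field for which extended stats are returned. |
| sigma | Optional | Double (non-negative) | The number of standard deviations above and below the mean used to calculate the std_deviation_bounds interval. Default is 2. |
| missing | Optional | Numeric | The value to assign to missing instances of the field. If not provided, documents with missing values are omitted from the extended stats. |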
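The parameters map directly onto keys of the extended_stats object in the request body. The following Python sketch builds such a body as a dictionary; build_extended_stats_agg is a hypothetical helper, not part of any OpenSearch client, and it only checks the non-negative sigma constraint listed in the table.

```python
import json


def build_extended_stats_agg(field, sigma=None, missing=None):
    """Hypothetical helper: build an extended_stats aggregation body."""
    params = {"field": field}
    if sigma is not None:
        # sigma must be non-negative; when omitted, the default of 2 applies.
        if sigma < 0:
            raise ValueError("sigma must be non-negative")
        params["sigma"] = sigma
    if missing is not None:
        # Value substituted for documents that lack the field (0 is a valid value).
        params["missing"] = missing
    return {"extended_stats": params}


body = {
    "size": 0,  # return only aggregation results, no search hits
    "aggs": {"extended_stats_gpa": build_extended_stats_agg("gpa", missing=0)},
}
print(json.dumps(body, indent=2))
```

Checking `is not None` rather than truthiness keeps a legitimate missing value of 0 in the body.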

Example

The following example request returns extended stats for taxful_total_price in the OpenSearch Dashboards sample ecommerce data:

GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "extended_stats_taxful_total_price": {
      "extended_stats": {
        "field": "taxful_total_price"
      }
    }
  }
}

Example response

The response contains extended stats for taxful_total_price:

...
"aggregations" : {
  "extended_stats_taxful_total_price" : {
    "count" : 4675,
    "min" : 6.98828125,
    "max" : 2250.0,
    "avg" : 75.05542864304813,
    "sum" : 350884.12890625,
    "sum_of_squares" : 3.9367749294174194E7,
    "variance" : 2787.59157113862,
    "variance_population" : 2787.59157113862,
    "variance_sampling" : 2788.187974983536,
    "std_deviation" : 52.79764740155209,
    "std_deviation_population" : 52.79764740155209,
    "std_deviation_sampling" : 52.80329511482722,
    "std_deviation_bounds" : {
      "upper" : 180.6507234461523,
      "lower" : -30.53986616005605,
      "upper_population" : 180.6507234461523,
      "lower_population" : -30.53986616005605,
      "upper_sampling" : 180.66201887270256,
      "lower_sampling" : -30.551161586606312
    }
  }
 }
}
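As a consistency check, the variance, standard deviation, and bound fields in this response can be recomputed from count, avg, and sum_of_squares. The Python sketch below copies those three values from the response and applies population variance = sum_of_squares / count - avg^2; it only shows that the numbers agree and does not claim to be how OpenSearch computes them internally.

```python
import math

# Values copied from the example response above.
count = 4675
avg = 75.05542864304813
sum_of_squares = 3.9367749294174194e7

variance_population = sum_of_squares / count - avg ** 2
# Sampling variance rescales by n / (n - 1).
variance_sampling = variance_population * count / (count - 1)
std_population = math.sqrt(variance_population)
# Default sigma of 2 for std_deviation_bounds.
upper = avg + 2 * std_population
lower = avg - 2 * std_population

print(variance_population, variance_sampling, std_population, upper, lower)
```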

Defining bounds

You can define the number of standard deviations used to calculate the std_deviation_bounds interval by setting the sigma parameter to any non-negative value.

Example: Defining bounds

Set the number of standard deviations used for std_deviation_bounds to 3:

GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "extended_stats_taxful_total_price": {
      "extended_stats": {
        "field": "taxful_total_price",
        "sigma": 3
      }
    }
  }
}

This changes the standard deviation bounds:

{
...
  "aggregations": {
...
      "std_deviation_bounds": {
        "upper": 233.44837084770438,
        "lower": -83.33751356160813,
        "upper_population": 233.44837084770438,
        "lower_population": -83.33751356160813,
        "upper_sampling": 233.46531398752978,
        "lower_sampling": -83.35445670143353
      }
    }
  }
}
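Because sigma changes only the multiplier and not the mean or the standard deviations, these bounds can be reproduced from the avg, std_deviation_population, and std_deviation_sampling values in the first example response. A short Python check using those copied values:

```python
# avg and standard deviations copied from the first example response (default sigma).
avg = 75.05542864304813
std_population = 52.79764740155209
std_sampling = 52.80329511482722
sigma = 3

upper_population = avg + sigma * std_population
lower_population = avg - sigma * std_population
upper_sampling = avg + sigma * std_sampling
lower_sampling = avg - sigma * std_sampling

print(upper_population, lower_population)
print(upper_sampling, lower_sampling)
```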

Missing values

You can assign a value to missing instances of the aggregated field. For more information, see Missing aggregations.

Prepare an example index by ingesting the following documents:

POST _bulk
{ "create": { "_index": "students", "_id": "1" } }
{ "name": "John Doe", "gpa": 3.89, "grad_year": 2022}
{ "create": { "_index": "students", "_id": "2" } }
{ "name": "Jonathan Powers", "grad_year": 2025 }
{ "create": { "_index": "students", "_id": "3" } }
{ "name": "Jane Doe", "gpa": 3.52, "grad_year": 2024 }
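The _bulk body above is newline-delimited JSON: each document is preceded by an action line, and the body ends with a newline. If you generate the same documents programmatically, a Python sketch such as the following produces an equivalent body string (how you send it to the cluster is not shown and is up to your client):

```python
import json

# Documents from the _bulk example above; the second one has no "gpa" field.
students = [
    ("1", {"name": "John Doe", "gpa": 3.89, "grad_year": 2022}),
    ("2", {"name": "Jonathan Powers", "grad_year": 2025}),
    ("3", {"name": "Jane Doe", "gpa": 3.52, "grad_year": 2024}),
]

lines = []
for doc_id, source in students:
    # Action line naming the target index and document ID.
    lines.append(json.dumps({"create": {"_index": "students", "_id": doc_id}}))
    # Document source on the following line.
    lines.append(json.dumps(source))

# Newline-delimited JSON, terminated by a final newline.
bulk_body = "\n".join(lines) + "\n"
print(bulk_body)
```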

Example: Replacing a missing value

Compute extended_stats, replacing the missing gpa field with 0:

GET students/_search
{
  "size": 0,
  "aggs": {
    "extended_stats_gpa": {
      "extended_stats": {
        "field": "gpa",
        "missing": 0
      }
    }
  }
}

In the response, all missing values of gpa are replaced with 0:

...
  "aggregations": {
    "extended_stats_gpa": {
      "count": 3,
      "min": 0,
      "max": 3.890000104904175,
      "avg": 2.4700000286102295,
      "sum": 7.4100000858306885,
      "sum_of_squares": 27.522500681877148,
      "variance": 3.0732667526245145,
      "variance_population": 3.0732667526245145,
      "variance_sampling": 4.609900128936772,
      "std_deviation": 1.7530735160353415,
      "std_deviation_population": 1.7530735160353415,
      "std_deviation_sampling": 2.147067797936705,
      "std_deviation_bounds": {
        "upper": 5.976147060680912,
        "lower": -1.0361470034604534,
        "upper_population": 5.976147060680912,
        "lower_population": -1.0361470034604534,
        "upper_sampling": 6.7641356244836395,
        "lower_sampling": -1.8241355672631805
      }
    }
  }
}

Example: Ignoring a missing value

Compute extended_stats without assigning the missing parameter:

GET students/_search
{
  "size": 0,
  "aggs": {
    "extended_stats_gpa": {
      "extended_stats": {
        "field": "gpa"
      }
    }
  }
}

OpenSearch calculates the extended stats, omitting any documents that contain a missing field value (the default behavior):

...
  "aggregations": {
    "extended_stats_gpa": {
      "count": 2,
      "min": 3.5199999809265137,
      "max": 3.890000104904175,
      "avg": 3.7050000429153442,
      "sum": 7.4100000858306885,
      "sum_of_squares": 27.522500681877148,
      "variance": 0.03422502293587115,
      "variance_population": 0.03422502293587115,
      "variance_sampling": 0.0684500458717423,
      "std_deviation": 0.18500006198883057,
      "std_deviation_population": 0.18500006198883057,
      "std_deviation_sampling": 0.2616295967044675,
      "std_deviation_bounds": {
        "upper": 4.075000166893005,
        "lower": 3.334999918937683,
        "upper_population": 4.075000166893005,
        "lower_population": 3.334999918937683,
        "upper_sampling": 4.228259236324279,
        "lower_sampling": 3.1817408495064092
      }
    }
  }
}

The document with the missing gpa value was omitted from this calculation. Note that count is now 2 instead of 3.
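To see why the two responses differ, the following Python sketch reproduces count, avg, and population variance for both cases from the three example documents. It assumes the gpa field was dynamically mapped as a 32-bit float (suggested by values such as 3.890000104904175 in the responses) and emulates that rounding with struct; the helper names are hypothetical, and OpenSearch's own summation may differ in the last digits.

```python
import math
import struct


def as_float32(x):
    # Round-trip through a 32-bit float to mimic values stored in a float field (assumption).
    return struct.unpack("f", struct.pack("f", x))[0]


def gpa_stats(gpas, missing=None):
    """Hypothetical sketch: count, avg, and population variance; None marks a missing gpa."""
    if missing is None:
        # Default behavior: skip documents that lack the field.
        values = [g for g in gpas if g is not None]
    else:
        # Substitute the missing value for documents that lack the field.
        values = [missing if g is None else g for g in gpas]
    values = [as_float32(v) for v in values]
    n = len(values)
    avg = sum(values) / n
    variance = sum(v * v for v in values) / n - avg * avg
    return {"count": n, "avg": avg, "variance": variance, "std_deviation": math.sqrt(variance)}


gpas = [3.89, None, 3.52]  # John Doe, Jonathan Powers (no gpa), Jane Doe
with_zero = gpa_stats(gpas, missing=0)
without = gpa_stats(gpas)
print(with_zero)
print(without)
```

Substituting 0 adds a third, far-from-the-mean value, which lowers avg and inflates the variance; omitting the document leaves only the two real GPAs.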
