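Matrix stats aggregation
The matrix_stats aggregation is a multi-value metric aggregation that generates covariance statistics for two or more fields in matrix form.
The matrix_stats aggregation does not support scripting.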
Parameters
The matrix_stats aggregation takes the following parameters.
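Parameter | Required/Optional | Data type | Description |
---|---|---|---|
fields | Required | Array of strings | The fields for which matrix statistics are computed. |
missing | Optional | Object | The values to use in place of missing values, keyed by field name. By default, missing values are ignored. See Missing values. |
mode | Optional | String | The value to use as the sample from a multi-valued or array field. Allowed values are avg, min, max, sum, and median. Default is avg. |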
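As a rough illustration of the mode parameter, the following Python sketch collapses a multi-valued field into one sample per document using each allowed mode. It is not OpenSearch code: the reduce_multi_value helper and the REDUCERS table are hypothetical names, and the input values are the class_grades array from the Missing values example.

```python
from statistics import mean, median

# Hypothetical lookup table: one reducer per documented mode name.
REDUCERS = {
    "avg": mean,
    "min": min,
    "max": max,
    "sum": sum,
    "median": median,
}


def reduce_multi_value(values, mode="avg"):
    """Collapse one document's multi-valued field into a single sample."""
    return REDUCERS[mode](values)


class_grades = [3.0, 3.9, 4.0]

# One sample per mode for the same array of values.
samples = {mode: reduce_multi_value(class_grades, mode) for mode in REDUCERS}
for mode, value in samples.items():
    print(mode, value)
```

With the default avg mode, the array [3.0, 3.9, 4.0] contributes a single sample of about 3.63 to the statistics.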
Example
The following example returns statistics for the taxful_total_price and products.base_price fields in the OpenSearch Dashboards e-commerce sample data:
GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "matrix_stats_taxful_total_price": {
      "matrix_stats": {
        "fields": ["taxful_total_price", "products.base_price"]
      }
    }
  }
}
The response contains the aggregated results:
{
  "took": 250,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4675,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "matrix_stats_taxful_total_price": {
      "doc_count": 4675,
      "fields": [
        {
          "name": "products.base_price",
          "count": 4675,
          "mean": 34.99423943014724,
          "variance": 360.5035285833702,
          "skewness": 5.530161335032689,
          "kurtosis": 131.1630632404217,
          "covariance": {
            "products.base_price": 360.5035285833702,
            "taxful_total_price": 846.6489362233169
          },
          "correlation": {
            "products.base_price": 1,
            "taxful_total_price": 0.8444765264325269
          }
        },
        {
          "name": "taxful_total_price",
          "count": 4675,
          "mean": 75.05542864304839,
          "variance": 2788.1879749835425,
          "skewness": 15.812149139923994,
          "kurtosis": 619.1235507385886,
          "covariance": {
            "products.base_price": 846.6489362233169,
            "taxful_total_price": 2788.1879749835425
          },
          "correlation": {
            "products.base_price": 0.8444765264325269,
            "taxful_total_price": 1
          }
        }
      ]
    }
  }
}
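The following table describes the response fields.
Statistic | Description |
---|---|
count | The number of documents sampled for the aggregation. |
mean | The average value of the field, computed from the sample. |
variance | The average squared deviation from the mean; a measure of how spread out the data is. |
skewness | A measure of the asymmetry of the distribution about the mean. See Skewness. |
kurtosis | A measure of the tail heaviness of the distribution. As the tails become lighter, kurtosis decreases. Kurtosis and skewness are used to assess whether a population is likely to be normally distributed. See Kurtosis. |
covariance | A measure of the joint variability of two fields. A positive value means that their values tend to move in the same direction. |
correlation | The normalized covariance; a measure of the strength of the relationship between two fields. Possible values range from -1 to 1, inclusive, indicating perfect negative to perfect positive linear correlation. A value of 0 means that there is no discernible linear relationship between the variables. |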
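To make the relationship between covariance and correlation concrete, the following Python sketch recomputes the correlation from the variance and covariance values in the example response. The formula is the standard Pearson definition (covariance divided by the product of the standard deviations); the variable names are illustrative only.

```python
import math

# Values copied from the example response for the e-commerce sample data.
var_base_price = 360.5035285833702      # variance of products.base_price
var_taxful_total = 2788.1879749835425   # variance of taxful_total_price
cov = 846.6489362233169                 # covariance of the two fields

# Correlation is the covariance normalized by both standard deviations.
correlation = cov / math.sqrt(var_base_price * var_taxful_total)
print(correlation)
```

The result should come out very close to the 0.8444765264325269 reported in the response. The same formula explains why each field's correlation with itself is 1: the covariance of a field with itself is its variance.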
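Missing values
To define how missing values are handled, use the missing parameter. By default, missing values are ignored.
For example, create an index in which document 1 is missing the gpa and class_grades fields: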
POST _bulk
{ "create": { "_index": "students", "_id": "1" } }
{ "name": "John Doe" }
{ "create": { "_index": "students", "_id": "2" } }
{ "name": "Jonathan Powers", "gpa": 3.85, "class_grades": [3.0, 3.9, 4.0] }
{ "create": { "_index": "students", "_id": "3" } }
{ "name": "Jane Doe", "gpa": 3.52, "class_grades": [3.2, 2.1, 3.8] }
First, run a matrix_stats aggregation without providing a missing parameter:
GET students/_search
{
  "size": 0,
  "aggs": {
    "matrix_stats_taxful_total_price": {
      "matrix_stats": {
        "fields": [
          "gpa",
          "class_grades"
        ],
        "mode": "avg"
      }
    }
  }
}
OpenSearch ignores missing values when calculating the matrix statistics, so only documents 2 and 3 are sampled:
{
  "took": 5,
  "timed_out": false,
  "terminated_early": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "matrix_stats_taxful_total_price": {
      "doc_count": 2,
      "fields": [
        {
          "name": "gpa",
          "count": 2,
          "mean": 3.684999942779541,
          "variance": 0.05444997482300096,
          "skewness": 0,
          "kurtosis": 1,
          "covariance": {
            "gpa": 0.05444997482300096,
            "class_grades": 0.09899998760223136
          },
          "correlation": {
            "gpa": 1,
            "class_grades": 0.9999999999999991
          }
        },
        {
          "name": "class_grades",
          "count": 2,
          "mean": 3.333333333333333,
          "variance": 0.1800000381469746,
          "skewness": 0,
          "kurtosis": 1,
          "covariance": {
            "gpa": 0.09899998760223136,
            "class_grades": 0.1800000381469746
          },
          "correlation": {
            "gpa": 0.9999999999999991,
            "class_grades": 1
          }
        }
      ]
    }
  }
}
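The numbers in this response can be approximately reproduced by hand. The following Python sketch is an illustration, not the OpenSearch implementation: the formulas are an assumption that happens to match the output above (variance and covariance divided by n - 1, skewness and kurtosis computed as ratios of central moments), and the matrix_stats function name is hypothetical.

```python
import math


def matrix_stats(xs, ys):
    """Statistics for xs, plus its covariance and correlation with ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    # Assumed: sample variance and covariance (divide by n - 1).
    var_x = sum(d * d for d in dx) / (n - 1)
    var_y = sum(d * d for d in dy) / (n - 1)
    cov = sum(a * b for a, b in zip(dx, dy)) / (n - 1)

    # Assumed: skewness and kurtosis from central moments (divide by n).
    m2 = sum(d ** 2 for d in dx) / n
    m3 = sum(d ** 3 for d in dx) / n
    m4 = sum(d ** 4 for d in dx) / n

    return {
        "mean": mean_x,
        "variance": var_x,
        "covariance": cov,
        "correlation": cov / math.sqrt(var_x * var_y),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


# Documents 2 and 3; document 1 has no values and is left out.
gpa = [3.85, 3.52]
# mode "avg": each class_grades array becomes its per-document average.
class_grades = [sum(g) / len(g) for g in ([3.0, 3.9, 4.0], [3.2, 2.1, 3.8])]

stats = matrix_stats(gpa, class_grades)
for name, value in stats.items():
    print(name, value)
```

The results agree with the response to several decimal places. The remaining differences in the trailing digits are likely because the indexed values are stored as single-precision floats, while this sketch uses double precision throughout.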
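To set missing fields to 0, provide the missing parameter as a key-value map. Even though class_grades is an array field, the matrix_stats aggregation flattens multi-valued numeric fields into a per-document average, so you must supply a single number as the missing value: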
GET students/_search
{
  "size": 0,
  "aggs": {
    "matrix_stats_taxful_total_price": {
      "matrix_stats": {
        "fields": ["gpa", "class_grades"],
        "mode": "avg",
        "missing": {
          "gpa": 0,
          "class_grades": 0
        }
      }
    }
  }
}
OpenSearch substitutes 0 for any missing gpa or class_grades values when calculating the matrix statistics, so all three documents are sampled:
{
  "took": 23,
  "timed_out": false,
  "terminated_early": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "matrix_stats_taxful_total_price": {
      "doc_count": 3,
      "fields": [
        {
          "name": "gpa",
          "count": 3,
          "mean": 2.456666628519694,
          "variance": 4.55363318017324,
          "skewness": -0.688130006360758,
          "kurtosis": 1.5,
          "covariance": {
            "gpa": 4.55363318017324,
            "class_grades": 4.143944374667273
          },
          "correlation": {
            "gpa": 1,
            "class_grades": 0.9970184390038257
          }
        },
        {
          "name": "class_grades",
          "count": 3,
          "mean": 2.2222222222222223,
          "variance": 3.793703722777191,
          "skewness": -0.6323693521730989,
          "kurtosis": 1.5000000000000002,
          "covariance": {
            "gpa": 4.143944374667273,
            "class_grades": 3.793703722777191
          },
          "correlation": {
            "gpa": 0.9970184390038257,
            "class_grades": 1
          }
        }
      ]
    }
  }
}
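The effect of the missing parameter on the sample can be sketched in a few lines of Python. This is an illustration under stated assumptions, not OpenSearch code: the sample and collect helpers are hypothetical, and the rule that a document is skipped when any requested field has no value and no substitute is an assumption that is consistent with the doc_count values in the two responses.

```python
docs = [
    {"name": "John Doe"},
    {"name": "Jonathan Powers", "gpa": 3.85, "class_grades": [3.0, 3.9, 4.0]},
    {"name": "Jane Doe", "gpa": 3.52, "class_grades": [3.2, 2.1, 3.8]},
]


def sample(doc, field, missing):
    """Return one numeric sample for a field, or None if nothing is available."""
    value = doc.get(field, missing.get(field))
    if value is None:
        return None
    if isinstance(value, list):
        return sum(value) / len(value)  # mode "avg"
    return value


def collect(docs, fields, missing=None):
    """Build one row of samples per document that has a value for every field."""
    missing = missing or {}
    rows = []
    for doc in docs:
        row = [sample(doc, field, missing) for field in fields]
        if None not in row:  # assumed: skip documents with an unfilled field
            rows.append(row)
    return rows


fields = ["gpa", "class_grades"]
ignored = collect(docs, fields)
filled = collect(docs, fields, missing={"gpa": 0, "class_grades": 0})

mean_gpa_ignored = sum(row[0] for row in ignored) / len(ignored)
mean_gpa_filled = sum(row[0] for row in filled) / len(filled)
print(len(ignored), mean_gpa_ignored)
print(len(filled), mean_gpa_filled)
```

Without substitutes, two documents are sampled and the mean gpa is about 3.685. With 0 substituted for the missing values, three documents are sampled and the mean drops to about 2.457, in line with the responses above.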