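Matrix stats aggregation

The matrix_stats aggregation is a multi-value metric aggregation that generates covariance statistics for two or more fields in matrix form.

The matrix_stats aggregation does not support scripting.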

Parameters

The matrix_stats aggregation accepts the following parameters.

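Parameter | Required/Optional | Data type | Description
fields | Required | Array of strings | The fields for which matrix statistics are computed.
missing | Optional | Object | The value to use in place of missing values. By default, missing values are ignored. See Missing values.
mode | Optional | String | The value to use as a sample from a multi-valued or array field. Valid values are avg, min, max, sum, and median. Default is avg.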
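
To make the mode parameter concrete, the following Python sketch shows how a multi-valued field could be reduced to one sample per document. The collapse helper and the REDUCERS table are hypothetical names used only for illustration, and the median of an even number of values is assumed to be the average of the two middle values, which this page does not specify.

```python
from statistics import mean, median

# Hypothetical lookup table: one reducer per documented mode value.
REDUCERS = {
    "avg": mean,
    "min": min,
    "max": max,
    "sum": sum,
    "median": median,
}

def collapse(values, mode="avg"):
    """Reduce the values of a multi-valued field to a single sample."""
    if mode not in REDUCERS:
        raise ValueError(f"unsupported mode: {mode}")
    return REDUCERS[mode](values)

# Illustrative array field holding three numeric values.
grades = [3.0, 3.9, 4.0]
for mode in REDUCERS:
    print(mode, collapse(grades, mode))
```

Whichever mode is chosen, each document contributes exactly one number per field to the statistics.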

Example

The following example returns statistics for the taxful_total_price and products.base_price fields in the OpenSearch Dashboards e-commerce sample data:

GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "matrix_stats_taxful_total_price": {
      "matrix_stats": {
        "fields": ["taxful_total_price", "products.base_price"]
      }
    }
  }
}
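
If you assemble such requests programmatically, a small builder can keep the optional parameters tidy. The following Python sketch only constructs the request body as a dictionary. The matrix_stats_body function is a hypothetical helper, not part of any OpenSearch client, and sending the body to a cluster is left out because it depends on your setup.

```python
import json

def matrix_stats_body(name, fields, mode=None, missing=None):
    """Build a search body that contains a single matrix_stats aggregation."""
    agg = {"fields": list(fields)}
    # Optional parameters are added only when the caller supplies them.
    if mode is not None:
        agg["mode"] = mode
    if missing is not None:
        agg["missing"] = dict(missing)
    # "size": 0 suppresses the search hits so that only aggregations return.
    return {"size": 0, "aggs": {name: {"matrix_stats": agg}}}

body = matrix_stats_body(
    "matrix_stats_taxful_total_price",
    ["taxful_total_price", "products.base_price"],
)
print(json.dumps(body, indent=2))
```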

The response contains the aggregation results:

{
  "took": 250,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4675,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "matrix_stats_taxful_total_price": {
      "doc_count": 4675,
      "fields": [
        {
          "name": "products.base_price",
          "count": 4675,
          "mean": 34.99423943014724,
          "variance": 360.5035285833702,
          "skewness": 5.530161335032689,
          "kurtosis": 131.1630632404217,
          "covariance": {
            "products.base_price": 360.5035285833702,
            "taxful_total_price": 846.6489362233169
          },
          "correlation": {
            "products.base_price": 1,
            "taxful_total_price": 0.8444765264325269
          }
        },
        {
          "name": "taxful_total_price",
          "count": 4675,
          "mean": 75.05542864304839,
          "variance": 2788.1879749835425,
          "skewness": 15.812149139923994,
          "kurtosis": 619.1235507385886,
          "covariance": {
            "products.base_price": 846.6489362233169,
            "taxful_total_price": 2788.1879749835425
          },
          "correlation": {
            "products.base_price": 0.8444765264325269,
            "taxful_total_price": 1
          }
        }
      ]
    }
  }
}
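
Each entry in the fields array holds one row of the covariance and correlation matrices. As a minimal sketch, the following Python code collects the correlation rows into a nested dictionary. The correlation_matrix helper is a hypothetical name, and the response literal is a trimmed copy of the response above that keeps only the keys the helper reads.

```python
def correlation_matrix(response, agg_name):
    """Return {row_field: {column_field: correlation}} for one aggregation."""
    entries = response["aggregations"][agg_name]["fields"]
    return {entry["name"]: dict(entry["correlation"]) for entry in entries}

# Trimmed copy of the response shown above.
response = {
    "aggregations": {
        "matrix_stats_taxful_total_price": {
            "doc_count": 4675,
            "fields": [
                {
                    "name": "products.base_price",
                    "correlation": {
                        "products.base_price": 1,
                        "taxful_total_price": 0.8444765264325269,
                    },
                },
                {
                    "name": "taxful_total_price",
                    "correlation": {
                        "products.base_price": 0.8444765264325269,
                        "taxful_total_price": 1,
                    },
                },
            ],
        }
    }
}

matrix = correlation_matrix(response, "matrix_stats_taxful_total_price")
names = sorted(matrix)
for row in names:
    # Print each row in a fixed column order.
    print(row, [matrix[row][col] for col in names])
```

The matrix is symmetric, so the value at row A, column B equals the value at row B, column A.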

The following table describes the response fields.

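Statistic | Description
count | The number of documents sampled for the aggregation.
mean | The average value of the field, computed from the sample.
variance | A measure of how spread out the data is, computed from the squared deviations of the samples from the mean.
skewness | A measure of the asymmetry of the distribution about the mean. See Skewness.
kurtosis | A measure of the tail heaviness of the distribution. As the tails become lighter, kurtosis decreases. Kurtosis and skewness are used to determine whether a population is likely to be normally distributed. See Kurtosis.
covariance | A measure of the joint variability between two fields. A positive value means that their values move in the same direction.
correlation | The normalized covariance, a measure of the strength of the relationship between two fields. Possible values range from -1 to 1, inclusive, indicating perfect negative to perfect positive linear correlation. A value of 0 indicates no discernible linear relationship between the variables.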
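
The statistics in the table can be sketched in a few lines of Python. The divisors used here (n - 1 for variance and covariance, n for the central moments behind skewness and kurtosis) are assumptions inferred from the sample responses on this page, not taken from the OpenSearch source code. The pair_stats and central_moment names are hypothetical. The input reuses the students data from the Missing values section with missing values replaced by 0, so the printed numbers can be compared with that response.

```python
import math

def central_moment(deviations, order):
    """Population central moment: the mean of the deviations to a power."""
    return sum(d ** order for d in deviations) / len(deviations)

def pair_stats(x, y):
    """Statistics of x, plus its covariance and correlation with y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    # Assumption: variance and covariance use the sample divisor n - 1.
    var_x = sum(d * d for d in dx) / (n - 1)
    var_y = sum(d * d for d in dy) / (n - 1)
    cov = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    # Assumption: skewness and kurtosis use population moments (divisor n).
    m2 = central_moment(dx, 2)
    return {
        "count": n,
        "mean": mean_x,
        "variance": var_x,
        "skewness": central_moment(dx, 3) / m2 ** 1.5,
        "kurtosis": central_moment(dx, 4) / m2 ** 2,
        "covariance": cov,
        # Correlation is the covariance normalized by both standard deviations.
        "correlation": cov / math.sqrt(var_x * var_y),
    }

gpa = [0.0, 3.85, 3.52]
# Per-document averages of class_grades, with 0 for the missing document.
class_grades_avg = [0.0, 10.9 / 3, 9.1 / 3]
for key, value in pair_stats(gpa, class_grades_avg).items():
    print(key, round(value, 4))
```

Small differences in the trailing digits are expected if the index stores the values with lower floating-point precision than Python uses.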

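Missing values

To define how missing values are handled, use the missing parameter. By default, missing values are ignored.

For example, create an index in which document 1 is missing the gpa and class_grades fields: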

POST _bulk
{ "create": { "_index": "students", "_id": "1" } }
{ "name": "John Doe" } 
{ "create": { "_index": "students", "_id": "2" } }
{ "name": "Jonathan Powers", "gpa": 3.85, "class_grades": [3.0, 3.9, 4.0] } 
{ "create": { "_index": "students", "_id": "3" } }
{ "name": "Jane Doe", "gpa": 3.52, "class_grades": [3.2, 2.1, 3.8] }

First, run the matrix_stats aggregation without providing a missing parameter:

GET students/_search
{
  "size": 0,
  "aggs": {
    "matrix_stats_taxful_total_price": {
      "matrix_stats": {
        "fields": [
          "gpa",
          "class_grades"
        ],
        "mode": "avg"
      }
    }
  }
}

OpenSearch ignores missing values when computing the matrix statistics:

{
  "took": 5,
  "timed_out": false,
  "terminated_early": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "matrix_stats_taxful_total_price": {
      "doc_count": 2,
      "fields": [
        {
          "name": "gpa",
          "count": 2,
          "mean": 3.684999942779541,
          "variance": 0.05444997482300096,
          "skewness": 0,
          "kurtosis": 1,
          "covariance": {
            "gpa": 0.05444997482300096,
            "class_grades": 0.09899998760223136
          },
          "correlation": {
            "gpa": 1,
            "class_grades": 0.9999999999999991
          }
        },
        {
          "name": "class_grades",
          "count": 2,
          "mean": 3.333333333333333,
          "variance": 0.1800000381469746,
          "skewness": 0,
          "kurtosis": 1,
          "covariance": {
            "gpa": 0.09899998760223136,
            "class_grades": 0.1800000381469746
          },
          "correlation": {
            "gpa": 0.9999999999999991,
            "class_grades": 1
          }
        }
      ]
    }
  }
}

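To set the missing fields to 0, provide the missing parameter as a key-value map. Even though class_grades is an array field, the matrix_stats aggregation collapses each multi-valued numeric field into a single value per document (the average, with the avg mode used here), so you must supply a single number as the missing value: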

GET students/_search
{
  "size": 0,
  "aggs": {
    "matrix_stats_taxful_total_price": {
      "matrix_stats": {
        "fields": ["gpa", "class_grades"],
        "mode": "avg",
        "missing": {
          "gpa": 0,
          "class_grades": 0
        }
      }
    }
  }
}

OpenSearch substitutes 0 for any missing gpa or class_grades values when computing the matrix statistics:

{
  "took": 23,
  "timed_out": false,
  "terminated_early": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "matrix_stats_taxful_total_price": {
      "doc_count": 3,
      "fields": [
        {
          "name": "gpa",
          "count": 3,
          "mean": 2.456666628519694,
          "variance": 4.55363318017324,
          "skewness": -0.688130006360758,
          "kurtosis": 1.5,
          "covariance": {
            "gpa": 4.55363318017324,
            "class_grades": 4.143944374667273
          },
          "correlation": {
            "gpa": 1,
            "class_grades": 0.9970184390038257
          }
        },
        {
          "name": "class_grades",
          "count": 3,
          "mean": 2.2222222222222223,
          "variance": 3.793703722777191,
          "skewness": -0.6323693521730989,
          "kurtosis": 1.5000000000000002,
          "covariance": {
            "gpa": 4.143944374667273,
            "class_grades": 3.793703722777191
          },
          "correlation": {
            "gpa": 0.9970184390038257,
            "class_grades": 1
          }
        }
      ]
    }
  }
}
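
The difference between the two responses comes down to which documents contribute a sample. The following Python sketch mimics that selection for the three students documents. It assumes that a document lacking any listed field is skipped entirely unless a substitute is supplied, which is consistent with the doc_count values above but is not spelled out on this page. The sample_rows and summarize names are hypothetical.

```python
from statistics import mean

DOCS = [
    {"name": "John Doe"},
    {"name": "Jonathan Powers", "gpa": 3.85, "class_grades": [3.0, 3.9, 4.0]},
    {"name": "Jane Doe", "gpa": 3.52, "class_grades": [3.2, 2.1, 3.8]},
]

def sample_rows(docs, fields, missing=None):
    """Return one row of per-field samples for each document that counts."""
    missing = missing or {}
    rows = []
    for doc in docs:
        row = []
        for field in fields:
            value = doc.get(field, missing.get(field))
            if value is None:
                # Assumption: no value and no substitute skips the document.
                row = None
                break
            # Collapse array values to their average, as "mode": "avg" does.
            row.append(mean(value) if isinstance(value, list) else value)
        if row is not None:
            rows.append(row)
    return rows

def summarize(rows):
    """Return (doc_count, per-field means) for the prepared rows."""
    columns = list(zip(*rows))
    return len(rows), [mean(column) for column in columns]

fields = ["gpa", "class_grades"]
print(summarize(sample_rows(DOCS, fields)))
print(summarize(sample_rows(DOCS, fields, missing={"gpa": 0, "class_grades": 0})))
```

Without substitutes, two documents are counted and the means are close to 3.685 and 3.333. With 0 as the substitute, all three documents are counted and the means drop to roughly 2.457 and 2.222, in line with the two responses.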