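Composite aggregations

The composite aggregation creates buckets based on one or more document fields or sources. A composite aggregation creates one bucket for each combination of individual source values. By default, combinations that are missing a value in one or more of the individual fields are omitted from the results.

Each source has one of four aggregation types:

The terms type groups by unique (usually String) values.
The histogram type groups numeric values into buckets of a specified width.
The date_histogram type groups dates or times into ranges of a specified width.
The geotile_grid type groups geo points into a grid of a specified resolution.

The composite aggregation works by combining its source keys into buckets. The resulting buckets are ordered both across and within sources:

Across sources: Buckets are nested in the order in which the sources are listed in the aggregation request.
Within sources: The order of values in each source determines the bucket order for that source. Depending on the source type, the order can be alphabetical, numeric, date-time, or geotile order.

Consider the following fields in an index of marathon participants: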

{... "city": "Albuquerque", "place": "Bronze" ...}
{... "city": "Boston", "place": "Silver" ...}
{... "city": "Chicago", "place": "Bronze" ...}
{... "city": "Albuquerque", "place": "Gold" ...}
{... "city": "Chicago", "place": "Silver" ...}
{... "city": "Boston", "place": "Bronze" ...}
{... "city": "Chicago", "place": "Gold" ...}

Suppose the request specifies the sources as follows:

    ...
    "sources": [
        { "marathon_city": { "terms": { "field": "city" }}},
        { "participant_medal": { "terms": { "field": "place" }}}
    ],
    ...

You must assign each source a unique key name.

The resulting composite aggregation contains the following ordered buckets:

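{ "marathon_city": "Albuquerque", "participant_medal": "Bronze" }
{ "marathon_city": "Albuquerque", "participant_medal": "Gold" }
{ "marathon_city": "Boston", "participant_medal": "Bronze" }
{ "marathon_city": "Boston", "participant_medal": "Silver" }
{ "marathon_city": "Chicago", "participant_medal": "Bronze" }
{ "marathon_city": "Chicago", "participant_medal": "Gold" }
{ "marathon_city": "Chicago", "participant_medal": "Silver" }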

Note that the values of both the city and place fields are sorted alphabetically.
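To make the ordering rules concrete, the following Python sketch models how a composite aggregation could group the seven marathon documents implied by the bucket list above. It illustrates the ordering behavior only and is not OpenSearch's implementation; the `composite_buckets` function and the in-memory document list are hypothetical.

```python
from collections import Counter

# The seven marathon documents implied by the bucket list above (other fields omitted).
docs = [
    {"city": "Albuquerque", "place": "Bronze"},
    {"city": "Boston", "place": "Silver"},
    {"city": "Chicago", "place": "Bronze"},
    {"city": "Albuquerque", "place": "Gold"},
    {"city": "Chicago", "place": "Silver"},
    {"city": "Boston", "place": "Bronze"},
    {"city": "Chicago", "place": "Gold"},
]

# Each source: (unique key name, document field it reads), in request order.
sources = [("marathon_city", "city"), ("participant_medal", "place")]


def composite_buckets(docs, sources):
    """Return (key, doc_count) pairs ordered across and within sources."""
    counts = Counter()
    for doc in docs:
        values = tuple(doc.get(field) for _, field in sources)
        if None in values:
            continue  # default behavior: omit combinations with a missing value
        counts[values] += 1
    # Tuples compare element by element: first source, then the next, and so on.
    return [
        ({name: value for (name, _), value in zip(sources, values)}, counts[values])
        for values in sorted(counts)
    ]


for key, doc_count in composite_buckets(docs, sources):
    print(key, doc_count)
```

Because Python compares tuples element by element, sorting the tuple of source values reproduces the across-source nesting, and comparing each element reproduces the within-source order.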

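Parameters

The composite aggregation takes the following parameters:

Parameter	Required/Optional	Data type	Description
`sources`	Required	Array	An array of source objects. Valid types are `terms`, `histogram`, `date_histogram`, and `geotile_grid`.
`size`	Optional	Numeric	The number of `composite` buckets to return in the results. Default is `10`. See Paginating composite results.
`after`	Optional	Object	The key, taken from the `after_key` of a previous response, after which to continue returning paginated `composite` buckets. See Paginating composite results.
`order`	Optional	String	For each source, whether to sort values in ascending or descending order. Valid values are `asc` and `desc`. Default is `asc`.
`missing_bucket`	Optional	Boolean	For each source, whether to include documents with missing values. Default is `false`. If set to `true`, OpenSearch includes these documents and returns `null` as the field's key. Null values sort first in ascending order.

For aggregation-specific parameters, see the corresponding aggregation documentation.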
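The `order` and `missing_bucket` rules can be sketched as a comparator over composite keys. In this hypothetical Python sketch, `None` stands in for the `null` key that `missing_bucket` produces and sorts below every other value, so it comes first in ascending order. The sketch reverses the whole comparison for a `desc` source, which means a `null` key would sort last in such a source; that detail is an assumption and is not stated on this page.

```python
from functools import cmp_to_key


def compare_values(a, b):
    """Compare two source values; None (a missing_bucket null) sorts lowest."""
    if a is None or b is None:
        return (a is not None) - (b is not None)
    return (a > b) - (a < b)


def key_comparator(orders):
    """Build a comparator for composite keys, given one "asc"/"desc" per source."""
    def compare(key_a, key_b):
        for a, b, order in zip(key_a, key_b, orders):
            result = compare_values(a, b)
            if result != 0:
                # Assumption: desc simply reverses the comparison, nulls included.
                return result if order == "asc" else -result
        return 0
    return compare


# Hypothetical (city, place) keys; None marks a document with no place value.
keys = [("Boston", "Bronze"), ("Albuquerque", None), ("Albuquerque", "Gold"), ("Boston", "Silver")]
print(sorted(keys, key=cmp_to_key(key_comparator(["asc", "asc"]))))
print(sorted(keys, key=cmp_to_key(key_comparator(["desc", "asc"]))))
```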

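Terms

Use the terms source to aggregate string or Boolean data. For more information, see Terms aggregation.

You can use the terms source to create composite buckets for any type of data. However, because a terms source creates a bucket for every unique value, you will typically want to use a histogram source for numeric data instead.

The following example request returns the first 4 composite buckets for day of the week and customer gender in the OpenSearch Dashboards sample e-commerce data: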

GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "composite_buckets": {
      "composite": {
        "sources": [
          { "day": { "terms": { "field": "day_of_week" }}},
          { "gender": { "terms": { "field": "customer_gender" }}}
        ],
        "size": 4
      }
    }
  }
}

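Because the data set in this example contains data for every combination, the aggregation produces one bucket for each combination of gender and day of the week, for a total of 14 buckets (7 days × 2 genders).

Because the request specifies a size of 4, the response contains the first four composite buckets. Because the sources are of the terms type, the buckets are sorted in ascending alphabetical order both across and within sources: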

{
  "took": 51,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4675,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "composite_buckets": {
      "after_key": {
        "day": "Monday",
        "gender": "MALE"
      },
      "buckets": [
        {
          "key": {
            "day": "Friday",
            "gender": "FEMALE"
          },
          "doc_count": 399
        },
        {
          "key": {
            "day": "Friday",
            "gender": "MALE"
          },
          "doc_count": 371
        },
        {
          "key": {
            "day": "Monday",
            "gender": "FEMALE"
          },
          "doc_count": 320
        },
        {
          "key": {
            "day": "Monday",
            "gender": "MALE"
          },
          "doc_count": 259
        }
      ]
    }
  }
}

You can use the after_key returned in the response to view more results. See the example in the next section.

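Histogram

Use the histogram source to create a composite aggregation of numeric data. For more information, see Histogram aggregation.

For a histogram source, the value used in each composite bucket key is the lowest value of that key's histogram interval. Each source histogram interval contains the values in the range [lower_bound, lower_bound + interval). For an ascending-order source, the key of the first interval is the lower bound of the interval that contains the lowest value in the source field.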
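The interval arithmetic fits in a one-line function. This Python sketch assumes the plain floor-based rounding implied by the [lower_bound, lower_bound + interval) definition above and ignores any other histogram options; `histogram_key` and the price list are hypothetical.

```python
import math


def histogram_key(value: float, interval: float) -> float:
    """Return the lower bound of the [lower, lower + interval) bucket holding value."""
    return math.floor(value / interval) * interval


# Hypothetical base unit prices; the smallest one determines the first key.
prices = [11.99, 24.0, 74.99, 150.0]
print(sorted({histogram_key(p, 50) for p in prices}))  # [0, 50, 150]
```

The first key is 0 even though no price in the list is 0: the key is the lower bound of the interval that contains the smallest value, not the smallest value itself.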

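The following example request returns the first 6 composite buckets for product quantity and base unit price in the OpenSearch Dashboards sample e-commerce data, with bucket widths of 1 and 50, respectively: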

GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "composite_buckets": {
      "composite": {
        "sources": [
          { "quantity": { "histogram": { "field": "products.quantity", "interval": 1 }}},
          { "unit_price": { "histogram": { "field": "products.base_unit_price", "interval": 50 }}}
        ],
        "size": 6
      }
    }
  }
}

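The aggregation returns the bucket keys and document counts for the first 6 buckets of the two histogram sources. As in the terms example, the buckets are ordered both across and within source fields. In this case, however, the order is numeric and is based on the inclusive lower bound of each histogram interval: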

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4675,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "composite_buckets": {
      "after_key": {
        "quantity": 2,
        "unit_price": 150
      },
      "buckets": [
        {
          "key": {
            "quantity": 1,
            "unit_price": 0
          },
          "doc_count": 17691
        },
        {
          "key": {
            "quantity": 1,
            "unit_price": 50
          },
          "doc_count": 5014
        },
        {
          "key": {
            "quantity": 1,
            "unit_price": 100
          },
          "doc_count": 482
        },
        {
          "key": {
            "quantity": 1,
            "unit_price": 150
          },
          "doc_count": 148
        },
        {
          "key": {
            "quantity": 1,
            "unit_price": 200
          },
          "doc_count": 32
        },
        {
          "key": {
            "quantity": 2,
            "unit_price": 150
          },
          "doc_count": 4
        }
      ]
    }
  }
}

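The bucket key for each field is the lower bound of that field's interval. For example, the unit_price key of the first composite bucket is 0.

To retrieve the next 6 buckets, supply the after parameter with the after_key object from the response, as follows: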

GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "composite_buckets": {
      "composite": {
        "sources": [
          { "quantity": { "histogram": { "field": "products.quantity", "interval": 1 }}},
          { "unit_price": { "histogram": { "field": "products.base_unit_price", "interval": 50 }}}
        ],
        "size": 6,
        "after": {
            "quantity": 2,
            "unit_price": 150
        }
      }
    }
  }
}

Only two buckets remain:

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4675,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "composite_buckets": {
      "after_key": {
        "quantity": 2,
        "unit_price": 500
      },
      "buckets": [
        {
          "key": {
            "quantity": 2,
            "unit_price": 200
          },
          "doc_count": 8
        },
        {
          "key": {
            "quantity": 2,
            "unit_price": 500
          },
          "doc_count": 4
        }
      ]
    }
  }
}

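Date histogram

To create a composite aggregation of date ranges, use the date_histogram source. For more information, see Date histogram aggregation.

OpenSearch represents dates, including date_histogram bucket keys, as long integers that count the milliseconds since the Unix epoch. You can format the date output using the format parameter. Formatting does not change the order of the keys.

OpenSearch stores date-times in UTC. You can use the time_zone parameter to display results in a different time zone.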
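Because date keys are epoch milliseconds, formatting only changes how a key is rendered, not the number used for sorting. The following Python sketch illustrates this with the standard library and also renders the same instant in a second time zone. `format_key` is a hypothetical helper, and Python's `strftime` patterns (such as `%Y-%m-%d`) differ from the `yyyy-MM-dd` style patterns that OpenSearch's format parameter accepts.

```python
from datetime import datetime, timedelta, timezone


def format_key(epoch_millis: int, fmt: str = "%Y-%m-%d", tz: timezone = timezone.utc) -> str:
    """Render an epoch-millisecond key as text in the given time zone."""
    return datetime.fromtimestamp(epoch_millis / 1000, tz=tz).strftime(fmt)


# Two hypothetical keys: 2025-02-21T00:00Z and 2025-02-20T23:30Z.
late = int(datetime(2025, 2, 21, tzinfo=timezone.utc).timestamp() * 1000)
early = int(datetime(2025, 2, 20, 23, 30, tzinfo=timezone.utc).timestamp() * 1000)

# Sorting happens on the numbers; formatting is applied afterward.
plus_one = timezone(timedelta(hours=1))
for key in sorted([late, early]):
    print(key, format_key(key), format_key(key, "%Y-%m-%d %H:%M", plus_one))
```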

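The following example request returns the first 4 composite buckets for the creation year and order date of each sold product in the OpenSearch Dashboards sample e-commerce data, with bucket widths of 1 year and 1 day, respectively: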

GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "composite_buckets": {
      "composite": {
        "sources": [
          { "product_creation_date": { "date_histogram": { "field": "products.created_on", "calendar_interval": "1y", "format": "yyyy" }}},
          { "order_date": { "date_histogram": { "field": "order_date", "calendar_interval": "1d", "format": "yyyy-MM-dd" }}}
        ],
        "size": 4
      }
    }
  }
}

The aggregation returns formatted date-based bucket keys and counts. For date_histogram composite aggregations, the fields are ordered by date:

{
  "took": 21,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4675,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "composite_buckets": {
      "after_key": {
        "product_creation_date": "2016",
        "order_date": "2025-02-23"
      },
      "buckets": [
        {
          "key": {
            "product_creation_date": "2016",
            "order_date": "2025-02-20"
          },
          "doc_count": 146
        },
        {
          "key": {
            "product_creation_date": "2016",
            "order_date": "2025-02-21"
          },
          "doc_count": 153
        },
        {
          "key": {
            "product_creation_date": "2016",
            "order_date": "2025-02-22"
          },
          "doc_count": 143
        },
        {
          "key": {
            "product_creation_date": "2016",
            "order_date": "2025-02-23"
          },
          "doc_count": 140
        }
      ]
    }
  }
}

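Geotile grid

Use the geotile_grid source to aggregate geo_point values into buckets that represent map tiles. As with other composite aggregation sources, by default the results include only buckets that contain data. For more information, see Geotile grid aggregation.

Each cell corresponds to a map tile. Cell labels use the {zoom}/{x}/{y} format.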
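The {zoom}/{x}/{y} labels follow the usual Web Mercator map-tile scheme. The following Python sketch computes a label from a latitude and longitude, assuming that OpenSearch uses that standard tiling formula; `geotile_key` is a hypothetical helper, and OpenSearch's handling of edge cases may differ.

```python
import math


def geotile_key(lat: float, lon: float, zoom: int) -> str:
    """Return a "{zoom}/{x}/{y}" label using standard Web Mercator tiling."""
    tiles = 2 ** zoom
    # Web Mercator is undefined at the poles, so clamp to its latitude limit.
    lat = max(min(lat, 85.05112878), -85.05112878)
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * tiles)
    y = math.floor(
        (1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * tiles
    )
    # Points exactly on the east or south edge belong to the last tile.
    x = min(max(x, 0), tiles - 1)
    y = min(max(y, 0), tiles - 1)
    return f"{zoom}/{x}/{y}"


print(geotile_key(40.7, -74.0, 8))
```

For example, geotile_key(40.7, -74.0, 8) returns "8/75/96", the same label as one of the tiles in the example response below.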

The following example request returns up to the first 6 tiles that contain locations from the geoip.location field, at a precision of 8:

GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "composite_buckets": {
      "composite": {
        "sources": [
          { "tile": { "geotile_grid": { "field": "geoip.location", "precision": 8 } } }
        ],
        "size": 6
      }
    }
  }
}

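The aggregation returns the geotiles and their document counts. The response contains only four buckets, fewer than the requested size of 6, which indicates that no other tiles contain data: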

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4675,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "composite_buckets": {
      "after_key": {
        "tile": "8/122/104"
      },
      "buckets": [
        {
          "key": {
            "tile": "8/43/102"
          },
          "doc_count": 310
        },
        {
          "key": {
            "tile": "8/75/96"
          },
          "doc_count": 896
        },
        {
          "key": {
            "tile": "8/75/124"
          },
          "doc_count": 178
        },
        {
          "key": {
            "tile": "8/122/104"
          },
          "doc_count": 408
        }
      ]
    }
  }
}

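Combining sources

You can combine two or more sources of different types.

The following example request returns buckets composed of three different source types: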

GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "composite_buckets": {
      "composite": {
        "sources": [
          { "order_date": { "date_histogram": { "field": "order_date", "calendar_interval": "1M", "format": "yyyy-MM" }}},
          { "gender": { "terms": { "field": "customer_gender" }}},          
          { "unit_price": { "histogram": { "field": "products.base_unit_price", "interval": 200 }}}
        ],
        "size": 10
      }
    }
  }
}

The aggregation returns composite buckets of mixed types, along with their document counts:

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4675,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "composite_buckets": {
      "after_key": {
        "order_date": "2025-03",
        "gender": "MALE",
        "unit_price": 200
      },
      "buckets": [
        {
          "key": {
            "order_date": "2025-02",
            "gender": "FEMALE",
            "unit_price": 0
          },
          "doc_count": 1517
        },
        {
          "key": {
            "order_date": "2025-02",
            "gender": "MALE",
            "unit_price": 0
          },
          "doc_count": 1369
        },
        {
          "key": {
            "order_date": "2025-02",
            "gender": "MALE",
            "unit_price": 200
          },
          "doc_count": 6
        },
        {
          "key": {
            "order_date": "2025-02",
            "gender": "MALE",
            "unit_price": 400
          },
          "doc_count": 1
        },
        {
          "key": {
            "order_date": "2025-03",
            "gender": "FEMALE",
            "unit_price": 0
          },
          "doc_count": 3656
        },
        {
          "key": {
            "order_date": "2025-03",
            "gender": "FEMALE",
            "unit_price": 200
          },
          "doc_count": 1
        },
        {
          "key": {
            "order_date": "2025-03",
            "gender": "MALE",
            "unit_price": 0
          },
          "doc_count": 3530
        },
        {
          "key": {
            "order_date": "2025-03",
            "gender": "MALE",
            "unit_price": 200
          },
          "doc_count": 7
        }
      ]
    }
  }
}

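Subaggregations

Composite aggregations are most useful when combined with subaggregations, which reveal information about the documents in the composite buckets.

The following example request compares the average spending by gender for each day of the week in the OpenSearch Dashboards sample e-commerce data: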

GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "composite_buckets": {
      "composite": {
        "sources": [
          { "weekday": { "terms": { "field": "day_of_week" }}},
          { "gender": { "terms": { "field": "customer_gender" }}}          
        ],
        "size": 6
      },
      "aggs": {
        "avg_spend": {
          "avg": { "field": "taxful_total_price" }
        }
      }
    }
  }
}

The aggregation returns the average taxful_total_price for the first 6 buckets:

{
  "took": 30,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4675,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "composite_buckets": {
      "after_key": {
        "weekday": "Saturday",
        "gender": "MALE"
      },
      "buckets": [
        {
          "key": {
            "weekday": "Friday",
            "gender": "FEMALE"
          },
          "doc_count": 399,
          "avg_spend": {
            "value": 71.7733395989975
          }
        },
        {
          "key": {
            "weekday": "Friday",
            "gender": "MALE"
          },
          "doc_count": 371,
          "avg_spend": {
            "value": 79.72514108827494
          }
        },
        {
          "key": {
            "weekday": "Monday",
            "gender": "FEMALE"
          },
          "doc_count": 320,
          "avg_spend": {
            "value": 72.1588623046875
          }
        },
        {
          "key": {
            "weekday": "Monday",
            "gender": "MALE"
          },
          "doc_count": 259,
          "avg_spend": {
            "value": 86.1754946911197
          }
        },
        {
          "key": {
            "weekday": "Saturday",
            "gender": "FEMALE"
          },
          "doc_count": 365,
          "avg_spend": {
            "value": 73.53236301369863
          }
        },
        {
          "key": {
            "weekday": "Saturday",
            "gender": "MALE"
          },
          "doc_count": 371,
          "avg_spend": {
            "value": 72.78092360175202
          }
        }
      ]
    }
  }
}
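Conceptually, a subaggregation runs once per composite bucket, over the documents in that bucket. The following Python sketch mimics the avg subaggregation above on a few hypothetical orders; the `avg_by_bucket` function and the sample data are illustrative only. As in the sketch, doc_count counts every document in a bucket, while the average covers only the documents that have a value for the field.

```python
from collections import defaultdict


def avg_by_bucket(docs, key_fields, value_field):
    """Return {key: (doc_count, average)} with keys in ascending order."""
    doc_counts = defaultdict(int)
    sums = defaultdict(float)
    valued = defaultdict(int)
    for doc in docs:
        key = tuple(doc.get(field) for field in key_fields)
        if None in key:
            continue  # omitted by default, as when missing_bucket is false
        doc_counts[key] += 1
        value = doc.get(value_field)
        if value is not None:
            sums[key] += value
            valued[key] += 1
    return {
        key: (doc_counts[key], sums[key] / valued[key] if valued[key] else None)
        for key in sorted(doc_counts)
    }


# Hypothetical orders shaped like the sample e-commerce documents.
orders = [
    {"day_of_week": "Monday", "customer_gender": "MALE", "taxful_total_price": 80.0},
    {"day_of_week": "Friday", "customer_gender": "FEMALE", "taxful_total_price": 50.0},
    {"day_of_week": "Friday", "customer_gender": "FEMALE", "taxful_total_price": 70.0},
    {"day_of_week": "Friday", "customer_gender": "FEMALE"},
]
print(avg_by_bucket(orders, ["day_of_week", "customer_gender"], "taxful_total_price"))
```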

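Paginating composite results

If a request produces more than size buckets, only size buckets are returned. In this case, the results include an after_key object that marks the position at which the next page of buckets starts. To retrieve the next size buckets, send the request again and provide the after_key value in the after parameter. For an example, see the requests in Histogram.

Always use after_key, rather than a copy of the last bucket's key, to continue paginating the response. The two are sometimes different.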
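The after semantics can be modeled on an already sorted list of composite keys: each page starts strictly after the previous after_key, and a client stops when a page comes back empty. This Python sketch uses the bucket keys from the Histogram example as data; `next_page` and `paginate` are hypothetical helpers, not client library functions. In this simplified model the after_key always equals the last key on the page; with real responses, always reuse the after_key that OpenSearch returns.

```python
import bisect


def next_page(sorted_keys, size, after=None):
    """Return up to `size` keys that sort strictly after `after`, plus an after_key."""
    start = 0 if after is None else bisect.bisect_right(sorted_keys, after)
    page = sorted_keys[start:start + size]
    return page, (page[-1] if page else None)


def paginate(sorted_keys, size):
    """Yield pages until a request returns no buckets."""
    after = None
    while True:
        page, after_key = next_page(sorted_keys, size, after)
        if not page:
            return
        yield page
        after = after_key


# (quantity, unit_price) keys taken from the Histogram example responses.
keys = [(1, 0), (1, 50), (1, 100), (1, 150), (1, 200), (2, 150), (2, 200), (2, 500)]
for page in paginate(keys, 6):
    print(page)
```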

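Improving performance with index sorting

To speed up composite aggregations on large data sets, you can sort the index by the same fields, in the same order, as the aggregation sources. When index.sort.field and index.sort.order match the source fields and order used in the composite aggregation, OpenSearch can return results more efficiently and use less memory. Although index sorting adds a small amount of overhead at indexing time, it can significantly improve query performance for composite aggregations.

The following example request sets the sort fields and sort order for the my-sorted-index index: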

PUT /my-sorted-index
{
  "settings": {
    "index": {
      "sort.field": ["customer_id", "timestamp"],
      "sort.order": ["asc", "desc"]
    }
  },
  "mappings": {
    "properties": {
      "customer_id": {
        "type": "keyword"
      },
      "timestamp": {
        "type": "date"
      },
      "price": {
        "type": "double"
      }
    }
  }
}

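The following request creates a composite aggregation on the my-sorted-index index. Because the index is sorted by customer_id in ascending order and by timestamp in descending order, and the aggregation sources match that sort order, this query can run faster and with less memory pressure.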

GET /my-sorted-index/_search
{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "size": 1000,
        "sources": [
          { "customer": { "terms": { "field": "customer_id", "order": "asc" } } },
          { "time": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } }
        ]
      }
    }
  }
}
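One way to see why a matching index sort helps: if documents are read in the same order as the composite sources, buckets are completed one after another, so a scan can stop as soon as a new key appears after size buckets have been filled. The following Python sketch illustrates that idea on hypothetical data. It is a conceptual model, not a description of OpenSearch internals, and `first_buckets` is a hypothetical helper.

```python
def first_buckets(sorted_docs, key_fields, size):
    """Collect the first `size` buckets from docs already in composite-key order.

    Returns the (key, doc_count) pairs and the number of documents consumed.
    """
    buckets = []
    consumed = 0
    for doc in sorted_docs:
        key = tuple(doc[field] for field in key_fields)
        if buckets and buckets[-1][0] == key:
            buckets[-1][1] += 1  # same key as the previous doc: same bucket
        elif len(buckets) == size:
            break  # a new key means the first `size` buckets are complete
        else:
            buckets.append([key, 1])
        consumed += 1
    return [tuple(bucket) for bucket in buckets], consumed


# Hypothetical docs in index-sort order: customer_id asc, then day desc
# ("day" stands in for the timestamp rounded to a 1d bucket).
docs = [
    {"customer_id": "a", "day": "2025-02-22"},
    {"customer_id": "a", "day": "2025-02-22"},
    {"customer_id": "a", "day": "2025-02-21"},
    {"customer_id": "b", "day": "2025-02-23"},
    {"customer_id": "c", "day": "2025-02-20"},
]
print(first_buckets(docs, ["customer_id", "day"], size=2))
```

Only three of the five documents are read before the first two buckets are known to be complete; without the matching sort, every document would have to be examined.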
