Anatomy of a workload

All workloads contain the following files and directories:

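workload.json: Contains all of the workload settings.
index.json: Contains the document mappings and parameters as well as the index settings.
files.txt: Contains the data corpora file names.
_test-procedures: Contains the test procedures. Most workloads contain only one default test procedure, which is configured in default.json.
_operations: Contains all of the operations used in the test procedures.
workload.py: Adds more dynamic functionality to the test.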
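As a quick sanity check on the layout listed above, the following minimal Python sketch reports which of those entries are absent from a workload directory. The function name `missing_workload_entries` and the example path are hypothetical; the sketch only checks that the entries exist, not that their contents are valid.

```python
from pathlib import Path

# Entries named in the list above (files and directories alike).
EXPECTED_ENTRIES = [
    "workload.json",
    "index.json",
    "files.txt",
    "_test-procedures",
    "_operations",
    "workload.py",
]


def missing_workload_entries(workload_dir):
    """Return the expected entries that do not exist under workload_dir."""
    root = Path(workload_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    # "./my-workload" is a placeholder path used only for illustration.
    for name in missing_workload_entries("./my-workload"):
        print(f"missing: {name}")
```

An empty result means every entry from the list is present.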

workload.json

The following example workload shows all of the essential elements needed to create a workload.json file. You can run this workload in your own benchmark configuration to understand how all of the elements work together. The document-count and uncompressed-bytes values are specific to the example corpus; fetch both values for your own corpus from the command line.

{
  "description": "Tutorial benchmark for OpenSearch Benchmark",
  "indices": [
    {
      "name": "movies",
      "body": "index.json"
    }
  ],
  "corpora": [
    {
      "name": "movies",
      "documents": [
        {
          "source-file": "movies-documents.json",
          "document-count": 11658903,
          "uncompressed-bytes": 1544799789
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": {
        "operation-type": "create-index"
      }
    },
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "green"
        },
        "retry-until-success": true
      }
    },
    {
      "operation": {
        "operation-type": "bulk",
        "bulk-size": 5000
      },
      "warmup-time-period": 120,
      "clients": 8
    },
    {
      "operation": {
        "name": "query-match-all",
        "operation-type": "search",
        "body": {
          "query": {
            "match_all": {}
          }
        }
      },
      "iterations": 1000,
      "target-throughput": 100
    }
  ]
}

A workload usually includes the following elements:

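indices: Defines the relevant indexes and index templates used by the workload.
corpora: Defines all document corpora used by the workload.
schedule: Defines the operations and the order in which they run inline. Alternatively, you can use operations to group operations and the test_procedures parameter to specify the order of operations.
operations: Optional. Describes which operations are available for the workload and how they are parameterized.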
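To see how these elements fit together, the following minimal Python sketch loads a trimmed-down workload definition and lists what each top-level element declares. The helper `summarize_workload` is hypothetical and is not part of OpenSearch Benchmark; the embedded JSON is a shortened copy of the tutorial example.

```python
import json


def summarize_workload(workload):
    """Collect the names declared by the indices, corpora, and schedule elements."""
    scheduled = []
    for task in workload.get("schedule", []):
        op = task["operation"]
        if isinstance(op, dict):
            # Inline operation: prefer its name, fall back to its type.
            scheduled.append(op.get("name", op["operation-type"]))
        else:
            # Operation referenced by name.
            scheduled.append(op)
    return {
        "indices": [index["name"] for index in workload.get("indices", [])],
        "corpora": [corpus["name"] for corpus in workload.get("corpora", [])],
        "schedule": scheduled,
    }


workload = json.loads("""
{
  "indices": [{"name": "movies", "body": "index.json"}],
  "corpora": [{"name": "movies", "documents": [{"source-file": "movies-documents.json"}]}],
  "schedule": [
    {"operation": {"operation-type": "create-index"}},
    {"operation": {"operation-type": "bulk", "bulk-size": 5000}, "clients": 8},
    {"operation": {"name": "query-match-all", "operation-type": "search"}}
  ]
}
""")

summary = summarize_workload(workload)
print(summary)
```

The schedule entries are listed in the order in which they are defined, which is also the order in which they run.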

Indices

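To create an index, specify its name. To add definitions to the index, use the body option and point it to the JSON file containing the index definitions. For more information, see the indices reference.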

Corpora

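The corpora element requires the name of the index containing the document corpus (for example, movies) and a list of parameters that define the document corpus. This list includes the following parameters: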

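source-file: The name of the file that contains the workload's documents. When using OpenSearch Benchmark locally, documents are contained in a JSON file. When providing a base_url, use a compressed file format: .zip, .bz2, .zst, .gz, .tar, .tar.gz, .tgz, or .tar.bz2. The compressed file must contain one JSON file containing the name.
document-count: The number of documents in the source-file, which determines which clients index which parts of the document corpus. Each of the N clients is assigned 1/N of the document corpus to ingest into the test cluster. When using a source that contains documents with a parent/child relationship, specify the number of parent documents.
uncompressed-bytes: The size, in bytes, of the source file after decompression, indicating how much disk space the decompressed source file needs.
compressed-bytes: The size, in bytes, of the source file before decompression. This can help you assess the amount of time needed for the cluster to ingest documents.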
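The corpus parameters above can be derived from the source file itself. The following Python sketch is one way to do so, assuming the source file holds one JSON document per line. `corpus_stats` and `client_ranges` are hypothetical helpers; in particular, `client_ranges` only illustrates the "each of N clients gets 1/N of the corpus" idea and is not OpenSearch Benchmark's own partitioning code.

```python
import os


def corpus_stats(source_file):
    """Count non-empty lines (documents) and measure the file size in bytes."""
    with open(source_file, "rb") as f:
        document_count = sum(1 for line in f if line.strip())
    return {
        "document-count": document_count,
        "uncompressed-bytes": os.path.getsize(source_file),
    }


def client_ranges(document_count, clients):
    """Split [0, document_count) into one contiguous half-open range per client."""
    base, extra = divmod(document_count, clients)
    ranges = []
    start = 0
    for i in range(clients):
        # The first `extra` clients take one additional document each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # Ten documents shared among three clients.
    print(client_ranges(10, 3))
```

For a compressed corpus, running `os.path.getsize` on the archive gives the compressed-bytes value in the same way.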

Operations

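The operations element lists the OpenSearch API operations performed by the workload. For example, you can list an operation named create-index that creates an index in the benchmark cluster to which OpenSearch Benchmark can write documents. Operations are usually listed inside of the schedule element.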

Schedule

The schedule element contains a list of operations that are run in a specified order, as shown in the following JSON example:

  "schedule": [
    {
      "operation": {
        "operation-type": "create-index"
      }
    },
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "green"
        },
        "retry-until-success": true
      }
    },
    {
      "operation": {
        "operation-type": "bulk",
        "bulk-size": 5000
      },
      "warmup-time-period": 120,
      "clients": 8
    },
    {
      "operation": {
        "name": "query-match-all",
        "operation-type": "search",
        "body": {
          "query": {
            "match_all": {}
          }
        }
      },
      "iterations": 1000,
      "target-throughput": 100
    }
  ]

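According to this schedule, the operations run in the following order:

1. The create-index operation creates an index. The index remains empty until the bulk operation adds documents containing the benchmark data.
2. The cluster-health operation assesses the health of the cluster before the workload runs. In the JSON example, the workload waits until the cluster's health status is green.
3. The bulk operation runs the bulk API to index 5,000 documents per bulk request.
   - Before benchmarking, the workload waits until the specified warmup-time-period passes. In the JSON example, the warmup period is 120 seconds.
   - The clients field defines the number of clients that run the bulk indexing operation simultaneously, in this example, eight.
4. The search operation runs a match_all query to match all documents after they have been indexed by the bulk API using the specified clients.
   - The iterations field defines the number of times each client runs the search operation. The benchmark report automatically adjusts the percentile numbers based on this number. To generate a precise percentile, the benchmark needs to run at least 1,000 iterations.
   - The target-throughput field defines the number of requests per second performed across all clients. This setting can help reduce benchmark latency. For example, a target-throughput of 100 requests per second divided among 8 clients means that each client issues 12.5 requests per second. For more information about how target throughput is defined in OpenSearch Benchmark, see Target throughput.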
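The throughput arithmetic can be sketched in a few lines of Python. This assumes, as the division example implies, that target-throughput is shared across all clients and that each client runs the full number of iterations. Both function names are hypothetical, and the duration is a lower bound that ignores warmup and request latency.

```python
def per_client_rate(target_throughput, clients=1):
    """Requests per second each client issues when the target is shared evenly."""
    return target_throughput / clients


def min_task_seconds(iterations, target_throughput, clients=1):
    """Shortest time a throttled task can take if every client runs `iterations` requests."""
    return iterations / per_client_rate(target_throughput, clients)


if __name__ == "__main__":
    # 100 requests per second shared by 8 clients.
    print(per_client_rate(100, 8))
    # The search task from the example schedule: 1,000 iterations, one client.
    print(min_task_seconds(1000, 100))
```

Estimating the duration this way is useful when choosing iterations: a low target-throughput combined with many iterations can make a single task dominate the total run time.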

index.json

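The index.json file defines the data mappings, indexing parameters, and index settings for workload documents during create-index operations.

When OpenSearch Benchmark creates an index for the workload, it uses the index settings and mappings template in the index.json file. The mappings in the index.json file are based on the mappings of a single document from the workload's corpus, which is stored in the files.txt file. The following is an example of the index.json file for the nyc_taxis workload. You can customize fields such as number_of_shards, number_of_replicas, query_cache_enabled, and requests_cache_enabled.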

{
  "settings": {
    "index.number_of_shards": {{number_of_shards | default(1)}},
    "index.number_of_replicas": {{number_of_replicas | default(0)}},
    "index.queries.cache.enabled": {{query_cache_enabled | default(false) | tojson}},
    "index.requests.cache.enable": {{requests_cache_enabled | default(false) | tojson}}
  },
  "mappings": {
    "_source": {
      "enabled": {{ source_enabled | default(true) | tojson }}
    },
    "properties": {
      "surcharge": {
        "scaling_factor": 100,
        "type": "scaled_float"
      },
      "dropoff_datetime": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },
      "trip_type": {
        "type": "keyword"
      },
      "mta_tax": {
        "scaling_factor": 100,
        "type": "scaled_float"
      },
      "rate_code_id": {
        "type": "keyword"
      },
      "passenger_count": {
        "type": "integer"
      },
      "pickup_datetime": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },
      "tolls_amount": {
        "scaling_factor": 100,
        "type": "scaled_float"
      },
      "tip_amount": {
        "type": "half_float"
      },
      "payment_type": {
        "type": "keyword"
      },
      "extra": {
        "scaling_factor": 100,
        "type": "scaled_float"
      },
      "vendor_id": {
        "type": "keyword"
      },
      "store_and_fwd_flag": {
        "type": "keyword"
      },
      "improvement_surcharge": {
        "scaling_factor": 100,
        "type": "scaled_float"
      },
      "fare_amount": {
        "scaling_factor": 100,
        "type": "scaled_float"
      },
      "ehail_fee": {
        "scaling_factor": 100,
        "type": "scaled_float"
      },
      "cab_color": {
        "type": "keyword"
      },
      "dropoff_location": {
        "type": "geo_point"
      },
      "vendor_name": {
        "type": "text"
      },
      "total_amount": {
        "scaling_factor": 100,
        "type": "scaled_float"
      },
      "trip_distance": {%- if trip_distance_mapping is defined %} {{ trip_distance_mapping | tojson }} {%- else %} {
        "scaling_factor": 100,
        "type": "scaled_float"
      }{%- endif %},
      "pickup_location": {
        "type": "geo_point"
      }
    },
    "dynamic": "strict"
  }
}
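The `default(...)` filters in the settings block mean that each value falls back to a built-in default unless a workload parameter overrides it. The following Python sketch mimics that behavior for the four settings only. It is a hand-written stand-in for the template rendering, not OpenSearch Benchmark code, and `resolve_index_settings` is a hypothetical name.

```python
import json

# Defaults copied from the default(...) filters in the template above.
SETTING_DEFAULTS = {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "query_cache_enabled": False,
    "requests_cache_enabled": False,
}


def resolve_index_settings(workload_params):
    """Apply supplied parameters over the defaults and map them to setting keys."""
    values = {**SETTING_DEFAULTS, **workload_params}
    return {
        "index.number_of_shards": values["number_of_shards"],
        "index.number_of_replicas": values["number_of_replicas"],
        "index.queries.cache.enabled": values["query_cache_enabled"],
        "index.requests.cache.enable": values["requests_cache_enabled"],
    }


if __name__ == "__main__":
    # Override only the shard count; everything else keeps its default.
    print(json.dumps(resolve_index_settings({"number_of_shards": 3}), indent=2))
```

Parameters that the template does not reference are simply ignored by this sketch.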

files.txt

The files.txt file lists the files that store the workload data, which are typically stored as compressed JSON files.

_operations and _test-procedures

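To make the workload more human-readable, _operations and _test-procedures are separated into two directories.

The _operations directory contains a default.json file that lists all of the supported operations that the test procedure can use. Some workloads, such as nyc_taxis, contain an additional .json file that lists feature-specific operations, such as snapshot operations. The following JSON example shows a list of operations from the nyc_taxis workload: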

    {
      "name": "index",
      "operation-type": "bulk",
      "bulk-size": {{bulk_size | default(10000)}},
      "ingest-percentage": {{ingest_percentage | default(100)}}
    },
    {
      "name": "update",
      "operation-type": "bulk",
      "bulk-size": {{bulk_size | default(10000)}},
      "ingest-percentage": {{ingest_percentage | default(100)}},
      "conflicts": "{{conflicts | default('random')}}",
      "on-conflict": "{{on_conflict | default('update')}}",
      "conflict-probability": {{conflict_probability | default(25)}},
      "recency": {{recency | default(0)}}
    },
    {
      "name": "wait-until-merges-finish",
      "operation-type": "index-stats",
      "index": "_all",
      "condition": {
        "path": "_all.total.merges.current",
        "expected-value": 0
      },
      "retry-until-success": true,
      "include-in-reporting": false
    },
    {
      "name": "default",
      "operation-type": "search",
      "body": {
        "query": {
          "match_all": {}
        }
      }
    },
    {
      "name": "range",
      "operation-type": "search",
      "body": {
        "query": {
          "range": {
            "total_amount": {
              "gte": 5,
              "lt": 15
            }
          }
        }
      }
    },
    {
      "name": "distance_amount_agg",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "bool": {
            "filter": {
              "range": {
                "trip_distance": {
                  "lt": 50,
                  "gte": 0
                }
              }
            }
          }
        },
        "aggs": {
          "distance_histo": {
            "histogram": {
              "field": "trip_distance",
              "interval": 1
            },
            "aggs": {
              "total_amount_stats": {
                "stats": {
                  "field": "total_amount"
                }
              }
            }
          }
        }
      }
    },
    {
      "name": "autohisto_agg",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
            "dropoff_datetime": {
              "gte": "01/01/2015",
              "lte": "21/01/2015",
              "format": "dd/MM/yyyy"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "auto_date_histogram": {
              "field": "dropoff_datetime",
              "buckets": 20
            }
          }
        }
      }
    },
    {
      "name": "date_histogram_agg",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
              "dropoff_datetime": {
              "gte": "01/01/2015",
              "lte": "21/01/2015",
              "format": "dd/MM/yyyy"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "date_histogram": {
              "field": "dropoff_datetime",
              "calendar_interval": "day"
            }
          }
        }
      }
    },
    {
      "name": "date_histogram_calendar_interval",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
            "dropoff_datetime": {
              "gte": "2015-01-01 00:00:00",
              "lt": "2016-01-01 00:00:00"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "date_histogram": {
              "field": "dropoff_datetime",
              "calendar_interval": "month"
            }
          }
        }
      }
    },
    {
      "name": "date_histogram_calendar_interval_with_tz",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
            "dropoff_datetime": {
              "gte": "2015-01-01 00:00:00",
              "lt": "2016-01-01 00:00:00"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "date_histogram": {
              "field": "dropoff_datetime",
              "calendar_interval": "month",
              "time_zone": "America/New_York"
            }
          }
        }
      }
    },
    {
      "name": "date_histogram_fixed_interval",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
            "dropoff_datetime": {
              "gte": "2015-01-01 00:00:00",
              "lt": "2016-01-01 00:00:00"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "date_histogram": {
              "field": "dropoff_datetime",
              "fixed_interval": "60d"
            }
          }
        }
      }
    },
    {
      "name": "date_histogram_fixed_interval_with_tz",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
            "dropoff_datetime": {
              "gte": "2015-01-01 00:00:00",
              "lt": "2016-01-01 00:00:00"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "date_histogram": {
              "field": "dropoff_datetime",
              "fixed_interval": "60d",
              "time_zone": "America/New_York"
            }
          }
        }
      }
    },
    {
      "name": "date_histogram_fixed_interval_with_metrics",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
            "dropoff_datetime": {
              "gte": "2015-01-01 00:00:00",
              "lt": "2016-01-01 00:00:00"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "date_histogram": {
              "field": "dropoff_datetime",
              "fixed_interval": "60d"
            },
            "aggs": {
              "total_amount": { "stats": { "field": "total_amount" } },
              "tip_amount": { "stats": { "field": "tip_amount" } },
              "trip_distance": { "stats": { "field": "trip_distance" } }
            }
          }
        }
      }
    },
    {
      "name": "auto_date_histogram",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
            "dropoff_datetime": {
              "gte": "2015-01-01 00:00:00",
              "lt": "2016-01-01 00:00:00"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "auto_date_histogram": {
              "field": "dropoff_datetime",
              "buckets": "12"
            }
          }
        }
      }
    },
    {
      "name": "auto_date_histogram_with_tz",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
            "dropoff_datetime": {
              "gte": "2015-01-01 00:00:00",
              "lt": "2016-01-01 00:00:00"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "auto_date_histogram": {
              "field": "dropoff_datetime",
              "buckets": "13",
              "time_zone": "America/New_York"
            }
          }
        }
      }
    },
    {
      "name": "auto_date_histogram_with_metrics",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
            "dropoff_datetime": {
              "gte": "2015-01-01 00:00:00",
              "lt": "2016-01-01 00:00:00"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "auto_date_histogram": {
              "field": "dropoff_datetime",
              "buckets": "12"
            },
            "aggs": {
              "total_amount": { "stats": { "field": "total_amount" } },
              "tip_amount": { "stats": { "field": "tip_amount" } },
              "trip_distance": { "stats": { "field": "trip_distance" } }
            }
          }
        }
      }
    },
    {
      "name": "desc_sort_tip_amount",
      "operation-type": "search",
      "index": "nyc_taxis",
      "body": {
        "query": {
          "match_all": {}
        },
        "sort" : [
          {"tip_amount" : "desc"}
        ]
      }
    },
    {
      "name": "asc_sort_tip_amount",
      "operation-type": "search",
      "index": "nyc_taxis",
      "body": {
        "query": {
          "match_all": {}
        },
        "sort" : [
          {"tip_amount" : "asc"}
        ]
      }
    }

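The _test-procedures directory contains a default.json file that sets the order of operations performed by the workload. Similar to the _operations directory, the _test-procedures directory can also contain feature-specific test procedures, such as searchable_snapshots.json for nyc_taxis. The following example shows the searchable snapshots test procedure for nyc_taxis: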

 {
      "name": "searchable-snapshot",
      "description": "Measuring performance for Searchable Snapshot feature. Based on the default test procedure 'append-no-conflicts'.",
      "schedule": [
        {
          "operation": "delete-index"
        },
        {
          "operation": {
            "operation-type": "create-index",
            "settings": {%- if index_settings is defined %} {{ index_settings | tojson }} {%- else %}{
              "index.codec": "best_compression",
              "index.refresh_interval": "30s",
              "index.translog.flush_threshold_size": "4g"
            }{%- endif %}
          }
        },
        {
          "name": "check-cluster-health",
          "operation": {
            "operation-type": "cluster-health",
            "index": "nyc_taxis",
            "request-params": {
              "wait_for_status": "{{ cluster_health | default('green') }}",
              "wait_for_no_relocating_shards": "true"
            },
            "retry-until-success": true
          }
        },
        {
          "operation": "index",
          "warmup-time-period": 240,
          "clients": {{ bulk_indexing_clients | default(8) }},
          "ignore-response-error-level": "{{ error_level | default('non-fatal') }}"
        },
        {
          "name": "refresh-after-index",
          "operation": "refresh"
        },
        {
          "operation": {
            "operation-type": "force-merge",
            "request-timeout": 7200
            {%- if force_merge_max_num_segments is defined %},
            "max-num-segments": {{ force_merge_max_num_segments | tojson }}
            {%- endif %}
          }
        },
        {
          "name": "refresh-after-force-merge",
          "operation": "refresh"
        },
        {
          "operation": "wait-until-merges-finish"
        },
        {
          "operation": "create-snapshot-repository"
        },
        {
          "operation": "delete-snapshot"
        },
        {
          "operation": "create-snapshot"
        },
        {
          "operation": "wait-for-snapshot-creation"
        },
        {
          "operation": {
            "name": "delete-local-index",
            "operation-type": "delete-index"
          }
        },
        {
          "operation": "restore-snapshot"
        },
        {
          "operation": "default",
          "warmup-iterations": 50,
          "iterations": 100
          {%- if not target_throughput %}
          ,"target-throughput": 3
          {%- elif target_throughput is string and target_throughput.lower() == 'none' %}
          {%- else %}
          ,"target-throughput": {{ target_throughput | tojson }}
          {%- endif %}
          {%-if search_clients is defined and search_clients %}
          ,"clients": {{ search_clients | tojson}}
          {%- endif %}
        },
        {
          "operation": "range",
          "warmup-iterations": 50,
          "iterations": 100
          {%- if not target_throughput %}
          ,"target-throughput": 0.7
          {%- elif target_throughput is string and target_throughput.lower() == 'none' %}
          {%- else %}
          ,"target-throughput": {{ target_throughput | tojson }}
          {%- endif %}
          {%-if search_clients is defined and search_clients %}
          ,"clients": {{ search_clients | tojson}}
          {%- endif %}
        },
        {
          "operation": "distance_amount_agg",
          "warmup-iterations": 50,
          "iterations": 50
          {%- if not target_throughput %}
          ,"target-throughput": 2
          {%- elif target_throughput is string and target_throughput.lower() == 'none' %}
          {%- else %}
          ,"target-throughput": {{ target_throughput | tojson }}
          {%- endif %}
          {%-if search_clients is defined and search_clients %}
          ,"clients": {{ search_clients | tojson}}
          {%- endif %}
        },
        {
          "operation": "autohisto_agg",
          "warmup-iterations": 50,
          "iterations": 100
          {%- if not target_throughput %}
          ,"target-throughput": 1.5
          {%- elif target_throughput is string and target_throughput.lower() == 'none' %}
          {%- else %}
          ,"target-throughput": {{ target_throughput | tojson }}
          {%- endif %}
          {%-if search_clients is defined and search_clients %}
          ,"clients": {{ search_clients | tojson}}
          {%- endif %}
        },
        {
          "operation": "date_histogram_agg",
          "warmup-iterations": 50,
          "iterations": 100
          {%- if not target_throughput %}
          ,"target-throughput": 1.5
          {%- elif target_throughput is string and target_throughput.lower() == 'none' %}
          {%- else %}
          ,"target-throughput": {{ target_throughput | tojson }}
          {%- endif %}
          {%-if search_clients is defined and search_clients %}
          ,"clients": {{ search_clients | tojson}}
          {%- endif %}
        }
      ]
    }
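The test procedure refers to most operations by name (for example, "index" or "range") rather than defining them inline. The following Python sketch shows one way such references could be matched against a list of operation definitions. `resolve_schedule` is a hypothetical helper, not part of OpenSearch Benchmark. Names that are not found are returned separately, because they may be defined in another operations file or handled differently by the tool.

```python
def resolve_schedule(schedule, operations):
    """Replace operation names in a schedule with their full definitions.

    Returns (resolved_tasks, unresolved_names).
    """
    by_name = {op["name"]: op for op in operations}
    resolved = []
    unresolved = []
    for task in schedule:
        op = task["operation"]
        if isinstance(op, str):
            if op in by_name:
                # Copy the task so the input schedule is left untouched.
                task = {**task, "operation": by_name[op]}
            else:
                unresolved.append(op)
        resolved.append(task)
    return resolved, unresolved


if __name__ == "__main__":
    operations = [
        {"name": "default", "operation-type": "search",
         "body": {"query": {"match_all": {}}}},
    ]
    schedule = [
        {"operation": "delete-index"},
        {"operation": "default", "warmup-iterations": 50, "iterations": 100},
    ]
    tasks, unknown = resolve_schedule(schedule, operations)
    print(tasks[1]["operation"]["operation-type"])
    print(unknown)
```

Task-level settings such as warmup-iterations and iterations stay on the task, while the operation definition supplies the request itself.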

Next steps

Now that you're familiar with the anatomy of a workload, see the criteria for choosing a workload.
