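Anatomy of a workload
All workloads contain the following files and directories:
- workload.json: Contains all of the workload settings.
- index.json: Contains the document mappings and parameters as well as the index settings.
- files.txt: Contains the data corpora file names.
- _test-procedures: Most workloads contain only one default test procedure, which is configured in default.json.
- _operations: Contains all of the operations used in test procedures.
- workload.py: Adds more dynamic functionality to the test.
workload.json
The following example workload shows all of the essential elements needed to create a workload.json file. You can run this workload in your own benchmark configuration to understand how all of the elements work together. The document-count and uncompressed-bytes values describe this example's corpus; for your own data file, obtain both values from the command line: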
{
"description": "Tutorial benchmark for OpenSearch Benchmark",
"indices": [
{
"name": "movies",
"body": "index.json"
}
],
"corpora": [
{
"name": "movies",
"documents": [
{
"source-file": "movies-documents.json",
"document-count": 11658903,
"uncompressed-bytes": 1544799789
}
]
}
],
"schedule": [
{
"operation": {
"operation-type": "create-index"
}
},
{
"operation": {
"operation-type": "cluster-health",
"request-params": {
"wait_for_status": "green"
},
"retry-until-success": true
}
},
{
"operation": {
"operation-type": "bulk",
"bulk-size": 5000
},
"warmup-time-period": 120,
"clients": 8
},
{
"operation": {
"name": "query-match-all",
"operation-type": "search",
"body": {
"query": {
"match_all": {}
}
}
},
"iterations": 1000,
"target-throughput": 100
}
]
}
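A workload usually includes the following elements:
- indices: Defines the relevant indexes and index templates used for the workload.
- corpora: Defines all document corpora used for the workload.
- schedule: Defines operations and the order in which the operations run inline. Alternatively, you can use operations to group operations and the test_procedures parameter to specify the order of operations.
- operations: Optional. Describes which operations are available for the workload and how they are parameterized.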
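As a quick way to see how these elements fit together, the following minimal Python sketch parses a trimmed copy of the tutorial workload, checks that the indices, corpora, and schedule elements are present, and lists the scheduled operations in order. The function name summarize_workload and the choice of which elements to check are assumptions made for illustration; this is not how OpenSearch Benchmark itself validates a workload.

```python
import json

# A trimmed copy of the tutorial workload from this page.
WORKLOAD_JSON = """
{
  "description": "Tutorial benchmark for OpenSearch Benchmark",
  "indices": [{"name": "movies", "body": "index.json"}],
  "corpora": [{"name": "movies", "documents": [
    {"source-file": "movies-documents.json",
     "document-count": 11658903,
     "uncompressed-bytes": 1544799789}]}],
  "schedule": [
    {"operation": {"operation-type": "create-index"}},
    {"operation": {"operation-type": "cluster-health",
                   "request-params": {"wait_for_status": "green"},
                   "retry-until-success": true}},
    {"operation": {"operation-type": "bulk", "bulk-size": 5000},
     "warmup-time-period": 120, "clients": 8},
    {"operation": {"name": "query-match-all", "operation-type": "search",
                   "body": {"query": {"match_all": {}}}},
     "iterations": 1000, "target-throughput": 100}
  ]
}
"""


def summarize_workload(text):
    """Return (missing top-level elements, operation labels in schedule order).

    Hypothetical helper: the set of elements checked here is an assumption.
    """
    workload = json.loads(text)
    missing = [key for key in ("indices", "corpora", "schedule") if key not in workload]
    labels = []
    for task in workload.get("schedule", []):
        op = task["operation"]
        if isinstance(op, dict):
            # Inline operation: prefer its name, fall back to its type.
            labels.append(op.get("name", op["operation-type"]))
        else:
            # A plain string refers to an operation defined elsewhere.
            labels.append(op)
    return missing, labels


missing, labels = summarize_workload(WORKLOAD_JSON)
print("missing elements:", missing)
print("schedule order:", labels)
```

Running the sketch against your own workload.json text is a cheap sanity check before starting a benchmark, because a JSON syntax error surfaces immediately in json.loads.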
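Indices
To create an index, specify its name. To add definitions to the index, use the body option and point it to the JSON file containing the index definitions. For more information, see Indices.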
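Corpora
The corpora element requires the name of the index containing the document corpus (for example, movies) and a list of parameters that define the document corpus. This list includes the following parameters:
- source-file: The name of the file that contains the workload's corresponding documents. When using OpenSearch Benchmark locally, documents are contained in a JSON file. When providing a base_url, use a compressed file format: .zip, .bz2, .zst, .gz, .tar, .tar.gz, .tgz, or .tar.bz2. The compressed file must contain one JSON file containing the name.
- document-count: The number of documents in the source-file, which determines which client indexes correlate to which parts of the document corpus. Each of the N clients is assigned one Nth of the document corpus to ingest into the test cluster. When using a source that contains documents with a parent/child relationship, specify the number of parent documents.
- uncompressed-bytes: The size, in bytes, of the source file after decompression, indicating how much disk space the decompressed source file needs.
- compressed-bytes: The size, in bytes, of the source file before decompression. This can help you assess the amount of time needed for the cluster to ingest documents.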
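Operations
The operations element lists the OpenSearch API operations performed by the workload. For example, you can list an operation named create-index, which creates an index in the benchmark cluster to which OpenSearch Benchmark can write documents. Operations are usually listed inside the schedule element.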
Schedule
The schedule element contains a list of operations that run in the specified order, as shown in the following JSON example:
"schedule": [
{
"operation": {
"operation-type": "create-index"
}
},
{
"operation": {
"operation-type": "cluster-health",
"request-params": {
"wait_for_status": "green"
},
"retry-until-success": true
}
},
{
"operation": {
"operation-type": "bulk",
"bulk-size": 5000
},
"warmup-time-period": 120,
"clients": 8
},
{
"operation": {
"name": "query-match-all",
"operation-type": "search",
"body": {
"query": {
"match_all": {}
}
}
},
"iterations": 1000,
"target-throughput": 100
}
]
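According to this schedule, the operations run in the following order:
1. The create-index operation creates an index. The index remains empty until the bulk operation adds documents with benchmarked data.
2. The cluster-health operation assesses the cluster's health before running the workload. In the JSON example, the workload waits until the cluster's health status is green.
3. The bulk operation runs the bulk API to index documents in batches of 5000, as set by bulk-size.
   - Before benchmarking, the workload waits until the specified warmup-time-period passes. In the JSON example, the warmup period is 120 seconds.
   - The clients field defines the number of clients that run the bulk indexing operation concurrently, in this example, eight.
4. The search operation runs a match_all query to match all documents after they have been indexed by the bulk API using the specified clients.
   - The iterations field defines the number of times each client runs the search operation. The benchmark report automatically adjusts the percentile numbers based on this number. To generate a precise percentile, the benchmark needs to run at least 1,000 iterations.
   - The target-throughput field defines the total number of requests per second for the operation, shared across all of its clients. When set, this setting can help reduce benchmark latency. For example, a target-throughput of 100 requests per second divided across 8 clients means that each client issues 12.5 requests per second. For more information about how target throughput is defined in OpenSearch Benchmark, see Target throughput.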
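The throughput arithmetic can be checked with a few lines of Python. The sketch below assumes that the target throughput is a total that is split evenly across clients, and that every client runs the full number of iterations at its share of the rate. The function names are hypothetical, and the duration is a rough lower bound that ignores warmup and any time the cluster needs beyond the schedule.

```python
def per_client_rate(target_throughput, clients):
    """Requests per second issued by each client (assumes an even split)."""
    return target_throughput / clients


def seconds_between_requests(target_throughput, clients):
    """Gap between two consecutive requests from the same client."""
    return clients / target_throughput


def scheduled_duration(iterations, target_throughput, clients):
    """Seconds one client needs to issue `iterations` requests at its rate."""
    return iterations / per_client_rate(target_throughput, clients)


rate = per_client_rate(100, 8)
gap = seconds_between_requests(100, 8)
duration = scheduled_duration(1000, 100, 8)

print(rate)      # 12.5 requests per second per client
print(gap)       # 0.08 seconds between requests from one client
print(duration)  # 80.0 seconds for 1,000 iterations per client
```

With a single client, the same 1,000 iterations at 100 requests per second would be scheduled over 10 seconds, which is a useful check when choosing iterations and target-throughput together.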
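index.json
The index.json file defines the data mappings, indexing parameters, and index settings for workload documents during create-index operations.
When OpenSearch Benchmark creates an index for the workload, it uses the index settings and mappings template in the index.json file. Mappings in the index.json file are based on the mappings of a single document from the workload's corpus, whose data files are listed in the files.txt file. The following is an example of the index.json file for the nyc_taxis workload. You can customize fields such as number_of_shards, number_of_replicas, query_cache_enabled, and requests_cache_enabled: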
{
"settings": {
"index.number_of_shards": {{number_of_shards | default(1)}},
"index.number_of_replicas": {{number_of_replicas | default(0)}},
"index.queries.cache.enabled": {{query_cache_enabled | default(false) | tojson}},
"index.requests.cache.enable": {{requests_cache_enabled | default(false) | tojson}}
},
"mappings": {
"_source": {
"enabled": {{ source_enabled | default(true) | tojson }}
},
"properties": {
"surcharge": {
"scaling_factor": 100,
"type": "scaled_float"
},
"dropoff_datetime": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"trip_type": {
"type": "keyword"
},
"mta_tax": {
"scaling_factor": 100,
"type": "scaled_float"
},
"rate_code_id": {
"type": "keyword"
},
"passenger_count": {
"type": "integer"
},
"pickup_datetime": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"tolls_amount": {
"scaling_factor": 100,
"type": "scaled_float"
},
"tip_amount": {
"type": "half_float"
},
"payment_type": {
"type": "keyword"
},
"extra": {
"scaling_factor": 100,
"type": "scaled_float"
},
"vendor_id": {
"type": "keyword"
},
"store_and_fwd_flag": {
"type": "keyword"
},
"improvement_surcharge": {
"scaling_factor": 100,
"type": "scaled_float"
},
"fare_amount": {
"scaling_factor": 100,
"type": "scaled_float"
},
"ehail_fee": {
"scaling_factor": 100,
"type": "scaled_float"
},
"cab_color": {
"type": "keyword"
},
"dropoff_location": {
"type": "geo_point"
},
"vendor_name": {
"type": "text"
},
"total_amount": {
"scaling_factor": 100,
"type": "scaled_float"
},
"trip_distance": {%- if trip_distance_mapping is defined %} {{ trip_distance_mapping | tojson }} {%- else %} {
"scaling_factor": 100,
"type": "scaled_float"
}{%- endif %},
"pickup_location": {
"type": "geo_point"
}
},
"dynamic": "strict"
}
}
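Each customizable field in the template falls back to the value given in its default(...) expression when no value is supplied. The following Python sketch emulates that fallback for the four settings with a plain dictionary merge. It is not Jinja2 and not OpenSearch Benchmark code; the resolve_settings helper and the TEMPLATE_DEFAULTS table are assumptions introduced only to show which value ends up in the rendered settings.

```python
# Defaults copied from the default(...) expressions in the index.json example.
TEMPLATE_DEFAULTS = {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "query_cache_enabled": False,
    "requests_cache_enabled": False,
}


def resolve_settings(workload_params=None):
    """Return the index settings, using a default wherever no value is supplied."""
    params = {**TEMPLATE_DEFAULTS, **(workload_params or {})}
    return {
        "index.number_of_shards": params["number_of_shards"],
        "index.number_of_replicas": params["number_of_replicas"],
        "index.queries.cache.enabled": params["query_cache_enabled"],
        "index.requests.cache.enable": params["requests_cache_enabled"],
    }


# No overrides: every setting takes its template default.
print(resolve_settings())

# Overriding one parameter leaves the other defaults untouched.
print(resolve_settings({"number_of_shards": 3}))
```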
files.txt
The files.txt file lists the files that store the workload data, which are typically stored in a zipped JSON file.
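_operations and _test-procedures
To make the workload more human-readable, _operations and _test-procedures are separated into two directories.
The _operations directory contains a default.json file that lists all of the supported operations that the test procedure can use. Some workloads, such as nyc_taxis, contain an additional .json file that lists feature-specific operations, such as snapshot operations. The following JSON example shows a list of operations from the nyc_taxis workload: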
{
"name": "index",
"operation-type": "bulk",
"bulk-size": {{bulk_size | default(10000)}},
"ingest-percentage": {{ingest_percentage | default(100)}}
},
{
"name": "update",
"operation-type": "bulk",
"bulk-size": {{bulk_size | default(10000)}},
"ingest-percentage": {{ingest_percentage | default(100)}},
"conflicts": "{{conflicts | default('random')}}",
"on-conflict": "{{on_conflict | default('update')}}",
"conflict-probability": {{conflict_probability | default(25)}},
"recency": {{recency | default(0)}}
},
{
"name": "wait-until-merges-finish",
"operation-type": "index-stats",
"index": "_all",
"condition": {
"path": "_all.total.merges.current",
"expected-value": 0
},
"retry-until-success": true,
"include-in-reporting": false
},
{
"name": "default",
"operation-type": "search",
"body": {
"query": {
"match_all": {}
}
}
},
{
"name": "range",
"operation-type": "search",
"body": {
"query": {
"range": {
"total_amount": {
"gte": 5,
"lt": 15
}
}
}
}
},
{
"name": "distance_amount_agg",
"operation-type": "search",
"body": {
"size": 0,
"query": {
"bool": {
"filter": {
"range": {
"trip_distance": {
"lt": 50,
"gte": 0
}
}
}
}
},
"aggs": {
"distance_histo": {
"histogram": {
"field": "trip_distance",
"interval": 1
},
"aggs": {
"total_amount_stats": {
"stats": {
"field": "total_amount"
}
}
}
}
}
}
},
{
"name": "autohisto_agg",
"operation-type": "search",
"body": {
"size": 0,
"query": {
"range": {
"dropoff_datetime": {
"gte": "01/01/2015",
"lte": "21/01/2015",
"format": "dd/MM/yyyy"
}
}
},
"aggs": {
"dropoffs_over_time": {
"auto_date_histogram": {
"field": "dropoff_datetime",
"buckets": 20
}
}
}
}
},
{
"name": "date_histogram_agg",
"operation-type": "search",
"body": {
"size": 0,
"query": {
"range": {
"dropoff_datetime": {
"gte": "01/01/2015",
"lte": "21/01/2015",
"format": "dd/MM/yyyy"
}
}
},
"aggs": {
"dropoffs_over_time": {
"date_histogram": {
"field": "dropoff_datetime",
"calendar_interval": "day"
}
}
}
}
},
{
"name": "date_histogram_calendar_interval",
"operation-type": "search",
"body": {
"size": 0,
"query": {
"range": {
"dropoff_datetime": {
"gte": "2015-01-01 00:00:00",
"lt": "2016-01-01 00:00:00"
}
}
},
"aggs": {
"dropoffs_over_time": {
"date_histogram": {
"field": "dropoff_datetime",
"calendar_interval": "month"
}
}
}
}
},
{
"name": "date_histogram_calendar_interval_with_tz",
"operation-type": "search",
"body": {
"size": 0,
"query": {
"range": {
"dropoff_datetime": {
"gte": "2015-01-01 00:00:00",
"lt": "2016-01-01 00:00:00"
}
}
},
"aggs": {
"dropoffs_over_time": {
"date_histogram": {
"field": "dropoff_datetime",
"calendar_interval": "month",
"time_zone": "America/New_York"
}
}
}
}
},
{
"name": "date_histogram_fixed_interval",
"operation-type": "search",
"body": {
"size": 0,
"query": {
"range": {
"dropoff_datetime": {
"gte": "2015-01-01 00:00:00",
"lt": "2016-01-01 00:00:00"
}
}
},
"aggs": {
"dropoffs_over_time": {
"date_histogram": {
"field": "dropoff_datetime",
"fixed_interval": "60d"
}
}
}
}
},
{
"name": "date_histogram_fixed_interval_with_tz",
"operation-type": "search",
"body": {
"size": 0,
"query": {
"range": {
"dropoff_datetime": {
"gte": "2015-01-01 00:00:00",
"lt": "2016-01-01 00:00:00"
}
}
},
"aggs": {
"dropoffs_over_time": {
"date_histogram": {
"field": "dropoff_datetime",
"fixed_interval": "60d",
"time_zone": "America/New_York"
}
}
}
}
},
{
"name": "date_histogram_fixed_interval_with_metrics",
"operation-type": "search",
"body": {
"size": 0,
"query": {
"range": {
"dropoff_datetime": {
"gte": "2015-01-01 00:00:00",
"lt": "2016-01-01 00:00:00"
}
}
},
"aggs": {
"dropoffs_over_time": {
"date_histogram": {
"field": "dropoff_datetime",
"fixed_interval": "60d"
},
"aggs": {
"total_amount": { "stats": { "field": "total_amount" } },
"tip_amount": { "stats": { "field": "tip_amount" } },
"trip_distance": { "stats": { "field": "trip_distance" } }
}
}
}
}
},
{
"name": "auto_date_histogram",
"operation-type": "search",
"body": {
"size": 0,
"query": {
"range": {
"dropoff_datetime": {
"gte": "2015-01-01 00:00:00",
"lt": "2016-01-01 00:00:00"
}
}
},
"aggs": {
"dropoffs_over_time": {
"auto_date_histogram": {
"field": "dropoff_datetime",
"buckets": "12"
}
}
}
}
},
{
"name": "auto_date_histogram_with_tz",
"operation-type": "search",
"body": {
"size": 0,
"query": {
"range": {
"dropoff_datetime": {
"gte": "2015-01-01 00:00:00",
"lt": "2016-01-01 00:00:00"
}
}
},
"aggs": {
"dropoffs_over_time": {
"auto_date_histogram": {
"field": "dropoff_datetime",
"buckets": "13",
"time_zone": "America/New_York"
}
}
}
}
},
{
"name": "auto_date_histogram_with_metrics",
"operation-type": "search",
"body": {
"size": 0,
"query": {
"range": {
"dropoff_datetime": {
"gte": "2015-01-01 00:00:00",
"lt": "2016-01-01 00:00:00"
}
}
},
"aggs": {
"dropoffs_over_time": {
"auto_date_histogram": {
"field": "dropoff_datetime",
"buckets": "12"
},
"aggs": {
"total_amount": { "stats": { "field": "total_amount" } },
"tip_amount": { "stats": { "field": "tip_amount" } },
"trip_distance": { "stats": { "field": "trip_distance" } }
}
}
}
}
},
{
"name": "desc_sort_tip_amount",
"operation-type": "search",
"index": "nyc_taxis",
"body": {
"query": {
"match_all": {}
},
"sort" : [
{"tip_amount" : "desc"}
]
}
},
{
"name": "asc_sort_tip_amount",
"operation-type": "search",
"index": "nyc_taxis",
"body": {
"query": {
"match_all": {}
},
"sort" : [
{"tip_amount" : "asc"}
]
}
}
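The _test-procedures directory contains a default.json file that sets the order of operations performed by the workload. Similar to the _operations directory, the _test-procedures directory can also contain feature-specific test procedures, such as searchable_snapshots.json for nyc_taxis. The following example shows the searchable snapshots test procedure for nyc_taxis: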
{
"name": "searchable-snapshot",
"description": "Measuring performance for Searchable Snapshot feature. Based on the default test procedure 'append-no-conflicts'.",
"schedule": [
{
"operation": "delete-index"
},
{
"operation": {
"operation-type": "create-index",
"settings": {%- if index_settings is defined %} {{ index_settings | tojson }} {%- else %}{
"index.codec": "best_compression",
"index.refresh_interval": "30s",
"index.translog.flush_threshold_size": "4g"
}{%- endif %}
}
},
{
"name": "check-cluster-health",
"operation": {
"operation-type": "cluster-health",
"index": "nyc_taxis",
"request-params": {
"wait_for_status": "{{ cluster_health | default('green') }}",
"wait_for_no_relocating_shards": "true"
},
"retry-until-success": true
}
},
{
"operation": "index",
"warmup-time-period": 240,
"clients": {{ bulk_indexing_clients | default(8) }},
"ignore-response-error-level": "{{ error_level | default('non-fatal') }}"
},
{
"name": "refresh-after-index",
"operation": "refresh"
},
{
"operation": {
"operation-type": "force-merge",
"request-timeout": 7200
{%- if force_merge_max_num_segments is defined %},
"max-num-segments": {{ force_merge_max_num_segments | tojson }}
{%- endif %}
}
},
{
"name": "refresh-after-force-merge",
"operation": "refresh"
},
{
"operation": "wait-until-merges-finish"
},
{
"operation": "create-snapshot-repository"
},
{
"operation": "delete-snapshot"
},
{
"operation": "create-snapshot"
},
{
"operation": "wait-for-snapshot-creation"
},
{
"operation": {
"name": "delete-local-index",
"operation-type": "delete-index"
}
},
{
"operation": "restore-snapshot"
},
{
"operation": "default",
"warmup-iterations": 50,
"iterations": 100
{%- if not target_throughput %}
,"target-throughput": 3
{%- elif target_throughput is string and target_throughput.lower() == 'none' %}
{%- else %}
,"target-throughput": {{ target_throughput | tojson }}
{%- endif %}
{%-if search_clients is defined and search_clients %}
,"clients": {{ search_clients | tojson}}
{%- endif %}
},
{
"operation": "range",
"warmup-iterations": 50,
"iterations": 100
{%- if not target_throughput %}
,"target-throughput": 0.7
{%- elif target_throughput is string and target_throughput.lower() == 'none' %}
{%- else %}
,"target-throughput": {{ target_throughput | tojson }}
{%- endif %}
{%-if search_clients is defined and search_clients %}
,"clients": {{ search_clients | tojson}}
{%- endif %}
},
{
"operation": "distance_amount_agg",
"warmup-iterations": 50,
"iterations": 50
{%- if not target_throughput %}
,"target-throughput": 2
{%- elif target_throughput is string and target_throughput.lower() == 'none' %}
{%- else %}
,"target-throughput": {{ target_throughput | tojson }}
{%- endif %}
{%-if search_clients is defined and search_clients %}
,"clients": {{ search_clients | tojson}}
{%- endif %}
},
{
"operation": "autohisto_agg",
"warmup-iterations": 50,
"iterations": 100
{%- if not target_throughput %}
,"target-throughput": 1.5
{%- elif target_throughput is string and target_throughput.lower() == 'none' %}
{%- else %}
,"target-throughput": {{ target_throughput | tojson }}
{%- endif %}
{%-if search_clients is defined and search_clients %}
,"clients": {{ search_clients | tojson}}
{%- endif %}
},
{
"operation": "date_histogram_agg",
"warmup-iterations": 50,
"iterations": 100
{%- if not target_throughput %}
,"target-throughput": 1.5
{%- elif target_throughput is string and target_throughput.lower() == 'none' %}
{%- else %}
,"target-throughput": {{ target_throughput | tojson }}
{%- endif %}
{%-if search_clients is defined and search_clients %}
,"clients": {{ search_clients | tojson}}
{%- endif %}
}
]
}
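In the test procedure above, most schedule entries refer to an operation by name (for example, "operation": "range"), while others define the operation inline. The following Python sketch shows one way such references could be resolved against a list of named operations. The abbreviated OPERATIONS and SCHEDULE data and the resolve_schedule helper are hypothetical, and the lookup by name is an assumption about the mechanism rather than OpenSearch Benchmark's actual implementation.

```python
# Abbreviated, illustrative copies of entries from the examples on this page.
OPERATIONS = [
    {"name": "index", "operation-type": "bulk", "bulk-size": 10000},
    {"name": "default", "operation-type": "search",
     "body": {"query": {"match_all": {}}}},
    {"name": "range", "operation-type": "search",
     "body": {"query": {"range": {"total_amount": {"gte": 5, "lt": 15}}}}},
]

SCHEDULE = [
    {"operation": {"operation-type": "create-index"}},
    {"operation": "index", "warmup-time-period": 240, "clients": 8},
    {"operation": "default", "warmup-iterations": 50, "iterations": 100},
    {"operation": "range", "warmup-iterations": 50, "iterations": 100},
]


def resolve_schedule(schedule, operations):
    """Replace string references in the schedule with the named operation."""
    by_name = {op["name"]: op for op in operations}
    resolved = []
    for task in schedule:
        op = task["operation"]
        if isinstance(op, str):
            if op not in by_name:
                raise KeyError(f"schedule references unknown operation {op!r}")
            op = by_name[op]
        # Keep task-level settings (clients, iterations, ...) next to the operation.
        resolved.append({**task, "operation": op})
    return resolved


resolved_types = [task["operation"]["operation-type"]
                  for task in resolve_schedule(SCHEDULE, OPERATIONS)]
print(resolved_types)  # ['create-index', 'bulk', 'search', 'search']
```

Keeping operation definitions separate from the schedule in this way lets several test procedures reuse the same operation with different task-level settings such as clients or iterations.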
Next steps
Now that you have familiarized yourself with the anatomy of a workload, see the criteria for Choosing a workload.