Link Search Menu Expand Document Documentation Menu

工作负载的构成

所有工作负载都包含以下文件和目录

  • workload.json:包含所有工作负载设置。
  • index.json:包含文档映射和参数以及索引设置。
  • files.txt:包含数据语料库文件名。
  • _test-procedures:大多数工作负载只包含一个默认测试过程,该过程在 default.json 中配置。
  • _operations:包含测试过程中使用的所有操作。
  • workload.py:为测试增加了更多动态功能。

workload.json

以下工作负载示例展示了创建 workload.json 文件所需的所有基本元素。您可以在自己的基准配置中运行此工作负载,以了解所有元素如何协同工作。

{
  "description": "Tutorial benchmark for OpenSearch Benchmark",
  "indices": [
    {
      "name": "movies",
      "body": "index.json"
    }
  ],
  "corpora": [
    {
      "name": "movies",
      "documents": [
        {
          "source-file": "movies-documents.json",
          "document-count": 11658903, # Fetch document count from command line
          "uncompressed-bytes": 1544799789 # Fetch uncompressed bytes from command line
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": {
        "operation-type": "create-index"
      }
    },
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "green"
        },
        "retry-until-success": true
      }
    },
    {
      "operation": {
        "operation-type": "bulk",
        "bulk-size": 5000
      },
      "warmup-time-period": 120,
      "clients": 8
    },
    {
      "operation": {
        "name": "query-match-all",
        "operation-type": "search",
        "body": {
          "query": {
            "match_all": {}
          }
        }
      },
      "iterations": 1000,
      "target-throughput": 100
    }
  ]
}

工作负载通常包括以下元素

  • indices:定义工作负载使用的相关索引和索引模板。
  • corpora:定义工作负载使用的所有文档语料库。
  • schedule:定义操作以及操作的内联运行顺序。或者,您可以使用 operations 对操作进行分组,并使用 test_procedures 参数指定操作顺序。
  • operations可选。描述工作负载有哪些可用操作以及如何对它们进行参数化。

索引

要创建索引,请指定其 name。要向索引添加定义,请使用 body 选项并将其指向包含索引定义的 JSON 文件。有关更多信息,请参阅 索引

语料库

corpora 元素需要包含文档语料库的索引名称(例如 movies)以及定义文档语料库的参数列表。此列表包括以下参数

  • source-file:包含工作负载相应文档的文件名。在本地使用 OpenSearch Benchmark 时,文档包含在 JSON 文件中。提供 base_url 时,请使用压缩文件格式:.zip.bz2.zst.gz.tar.tar.gz.tgz.tar.bz2。压缩文件必须包含一个包含名称的 JSON 文件。
  • document-countsource-file 中的文档数量,它决定了哪些客户端索引与文档语料库的哪些部分相关联。每个 N 客户端被分配文档语料库的 N 分之一,以摄取到测试集群中。当使用包含具有父/子关系的文档的源时,请指定父文档的数量。
  • uncompressed-bytes:解压后源文件的大小(以字节为单位),表示解压后的源文件需要多少磁盘空间。
  • compressed-bytes:解压前源文件的大小(以字节为单位)。这可以帮助您评估集群摄取文档所需的时间。

操作

operations 元素列出了工作负载执行的 OpenSearch API 操作。例如,您可以列出一个名为 create-index 的操作,该操作在基准测试集群中创建一个索引,OpenSearch Benchmark 可以向其中写入文档。操作通常列在 schedule 元素内部。

计划

schedule 元素包含一个按指定顺序运行的操作列表,如以下 JSON 示例所示

  "schedule": [
    {
      "operation": {
        "operation-type": "create-index"
      }
    },
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "green"
        },
        "retry-until-success": true
      }
    },
    {
      "operation": {
        "operation-type": "bulk",
        "bulk-size": 5000
      },
      "warmup-time-period": 120,
      "clients": 8
    },
    {
      "operation": {
        "name": "query-match-all",
        "operation-type": "search",
        "body": {
          "query": {
            "match_all": {}
          }
        }
      },
      "iterations": 1000,
      "target-throughput": 100
    }
  ]
}

根据此 schedule,操作将按以下顺序运行

  1. create-index 操作创建一个索引。索引保持为空,直到 bulk 操作添加带有基准测试数据的文档。
  2. cluster-health 操作在运行工作负载之前评估集群的健康状况。在 JSON 示例中,工作负载会一直等待,直到集群的健康状态为 green
    • bulk 操作运行 bulk API,同时索引 5000 个文档。
    • 在基准测试之前,工作负载会等待指定的 warmup-time-period 过去。在 JSON 示例中,预热期为 120 秒。
  3. clients 字段定义了将同时运行批量索引操作的客户端数量,在此示例中为八个。
  4. search 操作运行 match_all 查询,以匹配所有文档在通过指定的客户端由 bulk API 索引后。
    • iterations 字段定义了每个客户端运行 search 操作的次数。基准测试报告会根据此数字自动调整百分位数。要生成精确的百分位数,基准测试需要至少运行 1,000 次迭代。
    • target-throughput 字段定义了每个客户端每秒执行的请求数量。此设置可以帮助降低基准测试延迟。例如,target-throughput 为 100 个请求除以 8 个客户端,意味着每个客户端每秒将发出 12 个请求。有关如何在 OpenSearch Benchmark 中定义目标吞吐量的更多信息,请参阅 目标吞吐量

index.json

index.json 文件定义了在 create-index 操作期间工作负载文档的数据映射、索引参数和索引设置。

当 OpenSearch Benchmark 为工作负载创建索引时,它使用 index.json 文件中的索引设置和映射模板。index.json 文件中的映射基于工作负载语料库中单个文档的映射,该文档存储在 files.txt 文件中。以下是 nyc_taxis 工作负载的 index.json 文件示例。您可以自定义字段,例如 number_of_shardsnumber_of_replicasquery_cache_enabledrequests_cache_enabled

{
  "settings": {
    "index.number_of_shards": {{number_of_shards | default(1)}},
    "index.number_of_replicas": {{number_of_replicas | default(0)}},
    "index.queries.cache.enabled": {{query_cache_enabled | default(false) | tojson}},
    "index.requests.cache.enable": {{requests_cache_enabled | default(false) | tojson}}
  },
  "mappings": {
    "_source": {
      "enabled": {{ source_enabled | default(true) | tojson }}
    },
    "properties": {
      "surcharge": {
        "scaling_factor": 100,
        "type": "scaled_float"
      },
      "dropoff_datetime": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },
      "trip_type": {
        "type": "keyword"
      },
      "mta_tax": {
        "scaling_factor": 100,
        "type": "scaled_float"
      },
      "rate_code_id": {
        "type": "keyword"
      },
      "passenger_count": {
        "type": "integer"
      },
      "pickup_datetime": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },
      "tolls_amount": {
        "scaling_factor": 100,
        "type": "scaled_float"
      },
      "tip_amount": {
        "type": "half_float"
      },
      "payment_type": {
        "type": "keyword"
      },
      "extra": {
        "scaling_factor": 100,
        "type": "scaled_float"
      },
      "vendor_id": {
        "type": "keyword"
      },
      "store_and_fwd_flag": {
        "type": "keyword"
      },
      "improvement_surcharge": {
        "scaling_factor": 100,
        "type": "scaled_float"
      },
      "fare_amount": {
        "scaling_factor": 100,
        "type": "scaled_float"
      },
      "ehail_fee": {
        "scaling_factor": 100,
        "type": "scaled_float"
      },
      "cab_color": {
        "type": "keyword"
      },
      "dropoff_location": {
        "type": "geo_point"
      },
      "vendor_name": {
        "type": "text"
      },
      "total_amount": {
        "scaling_factor": 100,
        "type": "scaled_float"
      },
      "trip_distance": {%- if trip_distance_mapping is defined %} {{ trip_distance_mapping | tojson }} {%- else %} {
        "scaling_factor": 100,
        "type": "scaled_float"
      }{%- endif %},
      "pickup_location": {
        "type": "geo_point"
      }
    },
    "dynamic": "strict"
  }
}

files.txt

files.txt 文件列出了存储工作负载数据的文件,这些文件通常存储在压缩的 JSON 文件中。

_operations 和 _test-procedures

为了使工作负载更易于阅读,_operations_test-procedures 被分离到两个目录中。

_operations 目录包含一个 default.json 文件,该文件列出了测试过程可以使用的所有受支持操作。一些工作负载(例如 nyc_taxis)包含一个额外的 .json 文件,该文件列出了特定于功能的操作,例如 snapshot 操作。以下 JSON 示例显示了 nyc_taxis 工作负载中的操作列表

    {
      "name": "index",
      "operation-type": "bulk",
      "bulk-size": {{bulk_size | default(10000)}},
      "ingest-percentage": {{ingest_percentage | default(100)}}
    },
    {
      "name": "update",
      "operation-type": "bulk",
      "bulk-size": {{bulk_size | default(10000)}},
      "ingest-percentage": {{ingest_percentage | default(100)}},
      "conflicts": "{{conflicts | default('random')}}",
      "on-conflict": "{{on_conflict | default('update')}}",
      "conflict-probability": {{conflict_probability | default(25)}},
      "recency": {{recency | default(0)}}
    },
    {
      "name": "wait-until-merges-finish",
      "operation-type": "index-stats",
      "index": "_all",
      "condition": {
        "path": "_all.total.merges.current",
        "expected-value": 0
      },
      "retry-until-success": true,
      "include-in-reporting": false
    },
    {
      "name": "default",
      "operation-type": "search",
      "body": {
        "query": {
          "match_all": {}
        }
      }
    },
    {
      "name": "range",
      "operation-type": "search",
      "body": {
        "query": {
          "range": {
            "total_amount": {
              "gte": 5,
              "lt": 15
            }
          }
        }
      }
    },
    {
      "name": "distance_amount_agg",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "bool": {
            "filter": {
              "range": {
                "trip_distance": {
                  "lt": 50,
                  "gte": 0
                }
              }
            }
          }
        },
        "aggs": {
          "distance_histo": {
            "histogram": {
              "field": "trip_distance",
              "interval": 1
            },
            "aggs": {
              "total_amount_stats": {
                "stats": {
                  "field": "total_amount"
                }
              }
            }
          }
        }
      }
    },
    {
      "name": "autohisto_agg",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
            "dropoff_datetime": {
              "gte": "01/01/2015",
              "lte": "21/01/2015",
              "format": "dd/MM/yyyy"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "auto_date_histogram": {
              "field": "dropoff_datetime",
              "buckets": 20
            }
          }
        }
      }
    },
    {
      "name": "date_histogram_agg",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
              "dropoff_datetime": {
              "gte": "01/01/2015",
              "lte": "21/01/2015",
              "format": "dd/MM/yyyy"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "date_histogram": {
              "field": "dropoff_datetime",
              "calendar_interval": "day"
            }
          }
        }
      }
    },
    {
      "name": "date_histogram_calendar_interval",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
            "dropoff_datetime": {
              "gte": "2015-01-01 00:00:00",
              "lt": "2016-01-01 00:00:00"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "date_histogram": {
              "field": "dropoff_datetime",
              "calendar_interval": "month"
            }
          }
        }
      }
    },
    {
      "name": "date_histogram_calendar_interval_with_tz",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
            "dropoff_datetime": {
              "gte": "2015-01-01 00:00:00",
              "lt": "2016-01-01 00:00:00"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "date_histogram": {
              "field": "dropoff_datetime",
              "calendar_interval": "month",
              "time_zone": "America/New_York"
            }
          }
        }
      }
    },
    {
      "name": "date_histogram_fixed_interval",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
            "dropoff_datetime": {
              "gte": "2015-01-01 00:00:00",
              "lt": "2016-01-01 00:00:00"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "date_histogram": {
              "field": "dropoff_datetime",
              "fixed_interval": "60d"
            }
          }
        }
      }
    },
    {
      "name": "date_histogram_fixed_interval_with_tz",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
            "dropoff_datetime": {
              "gte": "2015-01-01 00:00:00",
              "lt": "2016-01-01 00:00:00"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "date_histogram": {
              "field": "dropoff_datetime",
              "fixed_interval": "60d",
              "time_zone": "America/New_York"
            }
          }
        }
      }
    },
    {
      "name": "date_histogram_fixed_interval_with_metrics",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
            "dropoff_datetime": {
              "gte": "2015-01-01 00:00:00",
              "lt": "2016-01-01 00:00:00"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "date_histogram": {
              "field": "dropoff_datetime",
              "fixed_interval": "60d"
            },
            "aggs": {
              "total_amount": { "stats": { "field": "total_amount" } },
              "tip_amount": { "stats": { "field": "tip_amount" } },
              "trip_distance": { "stats": { "field": "trip_distance" } }
            }
          }
        }
      }
    },
    {
      "name": "auto_date_histogram",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
            "dropoff_datetime": {
              "gte": "2015-01-01 00:00:00",
              "lt": "2016-01-01 00:00:00"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "auto_date_histogram": {
              "field": "dropoff_datetime",
              "buckets": "12"
            }
          }
        }
      }
    },
    {
      "name": "auto_date_histogram_with_tz",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
            "dropoff_datetime": {
              "gte": "2015-01-01 00:00:00",
              "lt": "2016-01-01 00:00:00"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "auto_date_histogram": {
              "field": "dropoff_datetime",
              "buckets": "13",
              "time_zone": "America/New_York"
            }
          }
        }
      }
    },
    {
      "name": "auto_date_histogram_with_metrics",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "range": {
            "dropoff_datetime": {
              "gte": "2015-01-01 00:00:00",
              "lt": "2016-01-01 00:00:00"
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "auto_date_histogram": {
              "field": "dropoff_datetime",
              "buckets": "12"
            },
            "aggs": {
              "total_amount": { "stats": { "field": "total_amount" } },
              "tip_amount": { "stats": { "field": "tip_amount" } },
              "trip_distance": { "stats": { "field": "trip_distance" } }
            }
          }
        }
      }
    },
    {
      "name": "desc_sort_tip_amount",
      "operation-type": "search",
      "index": "nyc_taxis",
      "body": {
        "query": {
          "match_all": {}
        },
        "sort" : [
          {"tip_amount" : "desc"}
        ]
      }
    },
    {
      "name": "asc_sort_tip_amount",
      "operation-type": "search",
      "index": "nyc_taxis",
      "body": {
        "query": {
          "match_all": {}
        },
        "sort" : [
          {"tip_amount" : "asc"}
        ]
      }
    }

_test-procedures 目录包含一个 default.json 文件,该文件设置了工作负载执行的操作顺序。与 _operations 目录类似,_test-procedures 目录还可以包含特定于功能的测试过程,例如 nyc_taxissearchable_snapshots.json。以下示例显示了 nyc_taxis 的可搜索快照测试过程

 {
      "name": "searchable-snapshot",
      "description": "Measuring performance for Searchable Snapshot feature. Based on the default test procedure 'append-no-conflicts'.",
      "schedule": [
        {
          "operation": "delete-index"
        },
        {
          "operation": {
            "operation-type": "create-index",
            "settings": {%- if index_settings is defined %} {{ index_settings | tojson }} {%- else %}{
              "index.codec": "best_compression",
              "index.refresh_interval": "30s",
              "index.translog.flush_threshold_size": "4g"
            }{%- endif %}
          }
        },
        {
          "name": "check-cluster-health",
          "operation": {
            "operation-type": "cluster-health",
            "index": "nyc_taxis",
            "request-params": {
              "wait_for_status": "{{ cluster_health | default('green') }}",
              "wait_for_no_relocating_shards": "true"
            },
            "retry-until-success": true
          }
        },
        {
          "operation": "index",
          "warmup-time-period": 240,
          "clients": {{ bulk_indexing_clients | default(8) }},
          "ignore-response-error-level": "{{ error_level | default('non-fatal') }}"
        },
        {
          "name": "refresh-after-index",
          "operation": "refresh"
        },
        {
          "operation": {
            "operation-type": "force-merge",
            "request-timeout": 7200
            {%- if force_merge_max_num_segments is defined %},
            "max-num-segments": {{ force_merge_max_num_segments | tojson }}
            {%- endif %}
          }
        },
        {
          "name": "refresh-after-force-merge",
          "operation": "refresh"
        },
        {
          "operation": "wait-until-merges-finish"
        },
        {
          "operation": "create-snapshot-repository"
        },
        {
          "operation": "delete-snapshot"
        },
        {
          "operation": "create-snapshot"
        },
        {
          "operation": "wait-for-snapshot-creation"
        },
        {
          "operation": {
            "name": "delete-local-index",
            "operation-type": "delete-index"
          }
        },
        {
          "operation": "restore-snapshot"
        },
        {
          "operation": "default",
          "warmup-iterations": 50,
          "iterations": 100
          {%- if not target_throughput %}
          ,"target-throughput": 3
          {%- elif target_throughput is string and target_throughput.lower() == 'none' %}
          {%- else %}
          ,"target-throughput": {{ target_throughput | tojson }}
          {%- endif %}
          {%-if search_clients is defined and search_clients %}
          ,"clients": {{ search_clients | tojson}}
          {%- endif %}
        },
        {
          "operation": "range",
          "warmup-iterations": 50,
          "iterations": 100
          {%- if not target_throughput %}
          ,"target-throughput": 0.7
          {%- elif target_throughput is string and target_throughput.lower() == 'none' %}
          {%- else %}
          ,"target-throughput": {{ target_throughput | tojson }}
          {%- endif %}
          {%-if search_clients is defined and search_clients %}
          ,"clients": {{ search_clients | tojson}}
          {%- endif %}
        },
        {
          "operation": "distance_amount_agg",
          "warmup-iterations": 50,
          "iterations": 50
          {%- if not target_throughput %}
          ,"target-throughput": 2
          {%- elif target_throughput is string and target_throughput.lower() == 'none' %}
          {%- else %}
          ,"target-throughput": {{ target_throughput | tojson }}
          {%- endif %}
          {%-if search_clients is defined and search_clients %}
          ,"clients": {{ search_clients | tojson}}
          {%- endif %}
        },
        {
          "operation": "autohisto_agg",
          "warmup-iterations": 50,
          "iterations": 100
          {%- if not target_throughput %}
          ,"target-throughput": 1.5
          {%- elif target_throughput is string and target_throughput.lower() == 'none' %}
          {%- else %}
          ,"target-throughput": {{ target_throughput | tojson }}
          {%- endif %}
          {%-if search_clients is defined and search_clients %}
          ,"clients": {{ search_clients | tojson}}
          {%- endif %}
        },
        {
          "operation": "date_histogram_agg",
          "warmup-iterations": 50,
          "iterations": 100
          {%- if not target_throughput %}
          ,"target-throughput": 1.5
          {%- elif target_throughput is string and target_throughput.lower() == 'none' %}
          {%- else %}
          ,"target-throughput": {{ target_throughput | tojson }}
          {%- endif %}
          {%-if search_clients is defined and search_clients %}
          ,"clients": {{ search_clients | tojson}}
          {%- endif %}
        }
      ]
    }

后续步骤

现在您已经熟悉了工作负载的剖析,请参阅 选择工作负载 的标准。

剩余 350 字符

有问题?

想贡献?