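Creating custom workloads

OpenSearch Benchmark includes a set of workloads that you can use to benchmark data from your cluster. If you want a workload tailored to your own data, you can create a custom workload, either from an existing cluster or without one, as described in the following sections.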

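Creating a workload from an existing cluster

If you already have an OpenSearch cluster with indexed data, use the following steps to create a custom workload for your cluster.

Prerequisites

Before creating a custom OpenSearch Benchmark workload, make sure you meet the following prerequisites:

An OpenSearch cluster with an index that contains 1000 or more documents. If your cluster's index does not contain at least 1000 documents, the workload can still run tests; however, you cannot run the workload using --test-mode.
You must have the correct permissions to access your OpenSearch cluster. For more information about cluster permissions, see Permissions.

Customizing the workload

To begin creating a custom OpenSearch Benchmark workload, use the opensearch-benchmark create-workload command: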

opensearch-benchmark create-workload \
--workload="<WORKLOAD NAME>" \
--target-hosts="<CLUSTER ENDPOINT>" \
--client-options="basic_auth_user:'<USERNAME>',basic_auth_password:'<PASSWORD>'" \
--indices="<INDEXES TO GENERATE WORKLOAD FROM>" \
--output-path="<LOCAL DIRECTORY PATH TO STORE WORKLOAD>"

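Replace the following options in the preceding example with information specific to your existing cluster:

--workload: A custom name for your custom workload.
--target-hosts: A comma-separated list of host:port pairs for the cluster from which OpenSearch Benchmark extracts data.
--client-options: The basic authentication client options that OpenSearch Benchmark uses to access the cluster.
--indices: One or more indexes in your OpenSearch cluster that contain data.
--output-path: The directory in which OpenSearch Benchmark creates the workload and its configuration files.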
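
When the command is launched from a script rather than typed by hand, the options above can be assembled as an argument list. The following is a minimal Python sketch and is not part of OpenSearch Benchmark: the helper name build_create_workload_command is hypothetical, and joining several indexes with commas is an assumption modeled on the host list format.

```python
import shlex


def build_create_workload_command(workload, target_hosts, indices, output_path,
                                  username=None, password=None):
    """Assemble the create-workload argument list from the options described above."""
    command = [
        "opensearch-benchmark",
        "create-workload",
        f"--workload={workload}",
        # host:port pairs are comma-separated, as described for --target-hosts.
        f"--target-hosts={','.join(target_hosts)}",
        # Assumption: several indexes are passed as one comma-separated value.
        f"--indices={','.join(indices)}",
        f"--output-path={output_path}",
    ]
    if username is not None and password is not None:
        # Same value the shell example passes inside double quotes.
        command.append(
            f"--client-options=basic_auth_user:'{username}',"
            f"basic_auth_password:'{password}'"
        )
    return command


if __name__ == "__main__":
    cmd = build_create_workload_command(
        "movies", ["localhost:9200"], ["movies-info"], "./workloads",
        username="admin", password="example-password",
    )
    # Print a shell-safe rendering; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Passing the list to subprocess.run avoids a second layer of shell quoting around the client options.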

The following example response creates a workload named movies from a cluster with an index named movies-info. The movies-info index contains over 2,000 documents.

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/

[INFO] You did not provide an explicit timeout in the client options. Assuming default of 10 seconds.
[INFO] Connected to OpenSearch cluster [380d8fd64dd85b5f77c0ad81b0799e1e] version [1.1.0].

Extracting documents for index [movies] for test mode...      1000/1000 docs [100.0% done]
Extracting documents for index [movies]...                    2000/2000 docs [100.0% done]

[INFO] Workload movies has been created. Run it with: opensearch-benchmark --workload-path=/Users/hoangia/Desktop/workloads/movies

-------------------------------
[INFO] SUCCESS (took 2 seconds)
-------------------------------

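As part of workload creation, OpenSearch Benchmark generates the following files. You can access them in the directory specified by the --output-path option:

workload.json: Contains the general workload specifications.
<index>.json: Contains the mappings and settings for the extracted index.
<index>-documents.json: Contains the source of every document from the extracted index. Any sources suffixed with -1k contain only a fraction of the workload's document corpus and are used only when running the workload in test mode.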
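
A quick way to confirm that the files listed above were written is to check the output directory for them. The sketch below is illustrative only: the function names are hypothetical, the -1k file name follows the <index>-documents-1k.json pattern used for test mode later on this page, and a real run may produce additional files that this check ignores.

```python
from pathlib import Path


def expected_workload_files(indices):
    """List the file names described above for the given index names."""
    names = ["workload.json"]
    for index in indices:
        names.append(f"{index}.json")
        names.append(f"{index}-documents.json")
        # Test-mode corpus; assumed to exist only for indexes with 1000+ documents.
        names.append(f"{index}-documents-1k.json")
    return names


def missing_workload_files(workload_dir, indices):
    """Return the expected file names that are absent from workload_dir."""
    root = Path(workload_dir)
    return [name for name in expected_workload_files(indices)
            if not (root / name).exists()]


if __name__ == "__main__":
    # Hypothetical directory name; replace it with your --output-path value.
    print(missing_workload_files("./workloads/movies", ["movies"]))
```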

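By default, OpenSearch Benchmark does not include references to generated queries. Because you have the best understanding of your data, we recommend adding a query to workload.json that matches your index's specifications. Use the following match_all query as an example of a query added to your workload: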

{
  "operation": {
    "name": "query-match-all",
    "operation-type": "search",
    "body": {
      "query": {
        "match_all": {}
      }
    }
  },
  "clients": 8,
  "warmup-iterations": 1000,
  "iterations": 1000,
  "target-throughput": 100
}
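
If you prefer to add the query programmatically, the entry above can be appended to a workload's schedule. The following is a minimal sketch under stated assumptions: add_search_operation is a hypothetical helper, it works on an already parsed dictionary (a workload.json file may contain template expressions that a plain JSON parser rejects), and the default task settings simply mirror the example above.

```python
import copy


def add_search_operation(workload, name, query, clients=8,
                         warmup_iterations=1000, iterations=1000,
                         target_throughput=100):
    """Return a copy of the workload with a search task appended to its schedule."""
    updated = copy.deepcopy(workload)  # leave the caller's dictionary untouched
    task = {
        "operation": {
            "name": name,
            "operation-type": "search",
            "body": {"query": query},
        },
        "clients": clients,
        "warmup-iterations": warmup_iterations,
        "iterations": iterations,
        "target-throughput": target_throughput,
    }
    updated.setdefault("schedule", []).append(task)
    return updated


if __name__ == "__main__":
    workload = {"version": 2, "schedule": []}
    print(add_search_operation(workload, "query-match-all", {"match_all": {}}))
```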

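Creating a workload without an existing cluster

If you want to create a custom workload but do not have an existing OpenSearch cluster with indexed data, you can create the workload by building the workload source files directly. All you need is data that can be exported into JSON format.

To build a workload from source files, create a directory for your workload and perform the following steps:

Build a <index>-documents.json file that contains rows of documents that make up the workload's document corpus. This file stores all the data to be ingested into the cluster and queried. The following example shows the first few rows of a movies-documents.json file that contains rows of documents about famous movies: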

  # First few rows of movies-documents.json
  {"title": "Back to the Future", "director": "Robert Zemeckis", "revenue": "$212,259,762 USD", "rating": "8.5 out of 10", "image_url": "https://imdb.com/images/32"}
  {"title": "Avengers: Endgame", "director": "Anthony and Joe Russo", "revenue": "$2,800,000,000 USD", "rating": "8.4 out of 10", "image_url": "https://imdb.com/images/2"}
  {"title": "The Grand Budapest Hotel", "director": "Wes Anderson", "revenue": "$173,000,000 USD", "rating": "8.1 out of 10", "image_url": "https://imdb.com/images/65"}
  {"title": "The Godfather: Part II", "director": "Francis Ford Coppola", "revenue": "$48,000,000 USD", "rating": "9 out of 10", "image_url": "https://imdb.com/images/7"}
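
Each row of the documents file is one complete JSON object on its own line. The comment line in the listing above labels the excerpt and presumably should not appear in a real file. As a rough sketch of how such a file could be produced from data already loaded in memory, the hypothetical helpers below serialize a list of dictionaries one per line; how you load your own data is left open.

```python
import json


def to_json_lines(documents):
    """Serialize documents as one JSON object per line."""
    return "".join(json.dumps(doc, ensure_ascii=False) + "\n" for doc in documents)


def write_documents_file(path, documents):
    """Write the serialized documents to path, for example movies-documents.json."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_json_lines(documents))


if __name__ == "__main__":
    movies = [
        {"title": "Back to the Future", "director": "Robert Zemeckis"},
        {"title": "Avengers: Endgame", "director": "Anthony and Joe Russo"},
    ]
    print(to_json_lines(movies), end="")
```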

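In the same directory, build an index.json file. The workload uses this file as a reference for the data mappings and index settings of the documents contained in <index>-documents.json. The following example creates mappings and settings specific to the movies-documents.json data from the previous step: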

{
  "settings": {
    "index.number_of_replicas": 0
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "title": {
        "type": "text"
      },
      "director": {
        "type": "text"
      },
      "revenue": {
        "type": "text"
      },
      "rating": {
        "type": "text"
      },
      "image_url": {
        "type": "text"
      }
    }
  }
}
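
Because the example mapping sets dynamic to strict, OpenSearch rejects any document that contains a field not listed under properties. Checking the corpus against the mapping before a run can catch such fields early. The following is a small sketch, not an official tool: unmapped_fields is a hypothetical helper, and it only compares top-level field names.

```python
def unmapped_fields(index_body, documents):
    """Return top-level document fields that are absent from the mapping's properties."""
    mapped = set(index_body.get("mappings", {}).get("properties", {}))
    unmapped = set()
    for doc in documents:
        # Collect every key the mapping does not declare.
        unmapped.update(set(doc) - mapped)
    return sorted(unmapped)


if __name__ == "__main__":
    index_body = {
        "mappings": {
            "dynamic": "strict",
            "properties": {"title": {"type": "text"}, "director": {"type": "text"}},
        }
    }
    docs = [{"title": "Back to the Future", "director": "Robert Zemeckis", "year": 1985}]
    print(unmapped_fields(index_body, docs))
```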

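Next, build a workload.json file that provides a high-level overview of your workload and determines how the workload runs benchmark tests. The workload.json file contains the following sections:
- indices: Defines the name of the index to be created in your OpenSearch cluster, using the mappings from the workload's index.json file created in the previous step.
- corpora: Defines the corpora and the source file, including:
  - document-count: The number of documents in <index>-documents.json. To get an accurate number of documents, run wc -l <index>-documents.json.
  - uncompressed-bytes: The size, in bytes, of the uncompressed <index>-documents.json file. To get an accurate number of bytes, run stat -f %z <index>-documents.json on macOS or stat -c %s <index>-documents.json on GNU/Linux. Alternatively, run ls -lrt | grep <index>-documents.json.
- schedule: Defines the workload's sequence of operations and available test procedures.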
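
The two corpus values can also be computed in one step instead of running wc and stat separately. The following is a minimal Python sketch; corpus_stats is a hypothetical helper name, and it assumes one document per line.

```python
import os
import tempfile


def corpus_stats(documents_path):
    """Return the document-count and uncompressed-bytes values for a documents file."""
    with open(documents_path, "rb") as handle:
        # One document per line. A final line without a trailing newline is
        # still counted here, whereas `wc -l` counts newline characters only.
        document_count = sum(1 for _ in handle)
    return {
        "document-count": document_count,
        "uncompressed-bytes": os.path.getsize(documents_path),
    }


if __name__ == "__main__":
    # Demonstrate on a throwaway two-document file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "movies-documents.json")
        with open(path, "wb") as handle:
            handle.write(b'{"title": "Back to the Future"}\n'
                         b'{"title": "Avengers: Endgame"}\n')
        print(corpus_stats(path))
```

The returned keys match the field names used in the corpora section, so the dictionary can be copied into workload.json directly.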

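The following example workload.json file provides the entry point for the movies workload. The indices section creates an index called movies. The corpora section refers to the source file created in step one, movies-documents.json, and provides the document count and the number of uncompressed bytes. Lastly, the schedule section defines a few operations that the workload performs when invoked, including:

- Deleting any current index named movies.
- Creating an index named movies based on the data in movies-documents.json and the mappings in index.json.
- Verifying that the cluster is healthy and can ingest the new index.
- Ingesting the data corpora from workload.json into the cluster.
- Querying the results.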

{
  "version": 2,
  "description": "Tutorial benchmark for OpenSearch Benchmark",
  "indices": [
    {
      "name": "movies",
      "body": "index.json"
    }
  ],
  "corpora": [
    {
      "name": "movies",
      "documents": [
        {
          "source-file": "movies-documents.json",
          "document-count": 11658903,
          "uncompressed-bytes": 1544799789
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": {
        "operation-type": "delete-index"
      }
    },
    {
      "operation": {
        "operation-type": "create-index"
      }
    },
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "green"
        },
        "retry-until-success": true
      }
    },
    {
      "operation": {
        "operation-type": "bulk",
        "bulk-size": 5000
      },
      "warmup-time-period": 120,
      "clients": 8
    },
    {
      "operation": {
        "operation-type": "force-merge"
      }
    },
    {
      "operation": {
        "name": "query-match-all",
        "operation-type": "search",
        "body": {
          "query": {
            "match_all": {}
          }
        }
      },
      "clients": 8,
      "warmup-iterations": 1000,
      "iterations": 1000,
      "target-throughput": 100
    }
  ]
}
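
To double-check the order in which a schedule will run, it can help to print just the operation names. The sketch below is an illustration rather than an OpenSearch Benchmark feature: schedule_summary is a hypothetical helper, and it assumes the workload has already been parsed into a dictionary from plain JSON without comments or template expressions.

```python
import json


def schedule_summary(workload):
    """Return the name (or operation type) of each task in the schedule, in order."""
    summary = []
    for task in workload.get("schedule", []):
        operation = task["operation"]
        if isinstance(operation, dict):
            # Inline operation, as in this tutorial: prefer its name if it has one.
            summary.append(operation.get("name", operation["operation-type"]))
        else:
            # Assumption: a plain string refers to an operation defined elsewhere.
            summary.append(operation)
    return summary


if __name__ == "__main__":
    workload = json.loads("""
    {
      "schedule": [
        {"operation": {"operation-type": "delete-index"}},
        {"operation": {"operation-type": "create-index"}},
        {"operation": {"name": "query-match-all", "operation-type": "search"}}
      ]
    }
    """)
    print(schedule_summary(workload))
```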

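Once you have created all of the workload files, verify that the workload is functional by running a test. To verify the workload, run the following command, replacing --workload-path with the path to your workload directory: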

opensearch-benchmark list workloads --workload-path=</path/to/workload/>

Invoking your custom workload

Use the opensearch-benchmark execute-test command to invoke your new workload and run a benchmark test against your OpenSearch cluster, as shown in the following example. Replace --workload-path with the path to your custom workload, --target-hosts with the host:port pairs for your cluster, and --client-options with any authorization options required to access the cluster:

opensearch-benchmark execute-test \
--pipeline="benchmark-only" \
--workload-path="<PATH OUTPUTTED IN THE OUTPUT OF THE CREATE-WORKLOAD COMMAND>" \
--target-hosts="<CLUSTER ENDPOINT>" \
--client-options="basic_auth_user:'<USERNAME>',basic_auth_password:'<PASSWORD>'"

Results from the test appear in the directory set by the --output-path option in workloads.json.

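Advanced options

You can enhance your custom workload's functionality with the following advanced options.

Test mode

If you want to run the test in test mode to make sure that your workload operates as intended, add the --test-mode option to the execute-test command. Test mode ingests only the first 1000 documents from each index provided and runs query operations against them.

To use test mode, create a <index>-documents-1k.json file that contains the first 1000 documents from <index>-documents.json using the following command: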

head -n 1000 <index>-documents.json > <index>-documents-1k.json
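
On systems where head is not available, the same file can be produced with a short script. The following is a sketch of an equivalent in Python; write_test_mode_corpus is a hypothetical helper name, and like head it assumes one document per line.

```python
import os
import tempfile
from itertools import islice


def write_test_mode_corpus(source_path, target_path, limit=1000):
    """Copy the first `limit` lines of source_path to target_path; return the count."""
    with open(source_path, "rb") as source, open(target_path, "wb") as target:
        # islice stops after `limit` lines without reading the rest of the file.
        lines = list(islice(source, limit))
        target.writelines(lines)
    return len(lines)


if __name__ == "__main__":
    # Demonstrate on a throwaway corpus of 1500 small documents.
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "movies-documents.json")
        target = os.path.join(tmp, "movies-documents-1k.json")
        with open(source, "wb") as handle:
            handle.writelines(b'{"n": %d}\n' % i for i in range(1500))
        print(write_test_mode_corpus(source, target))
```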

Then, run opensearch-benchmark execute-test with the --test-mode option. Test mode runs a quick version of the workload test:

opensearch-benchmark execute-test \
--pipeline="benchmark-only" \
--workload-path="<PATH OUTPUTTED IN THE OUTPUT OF THE CREATE-WORKLOAD COMMAND>" \
--target-hosts="<CLUSTER ENDPOINT>" \
--client-options="basic_auth_user:'<USERNAME>',basic_auth_password:'<PASSWORD>'" \
--test-mode

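Adding variance to test procedures

After using your custom workload several times, you might want to use the same workload but perform the workload's operations in a different order. Instead of creating a new workload or reorganizing the procedures directly, you can provide test procedures to vary workload operations.

To add variance to your workload operations, go to your workload.json file and replace the schedule section with a test_procedures array, as shown in the following example. Each item in the array contains the following:

name: The name of the test procedure.
default: When set to true, OpenSearch Benchmark defaults to the test procedure specified as default in the workload if no other test procedures are specified.
schedule: All the operations the test procedure will run.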

"test_procedures": [
    {
      "name": "index-and-query",
      "default": true,
      "schedule": [
        {
          "operation": {
            "operation-type": "delete-index"
          }
        },
        {
          "operation": {
            "operation-type": "create-index"
          }
        },
        {
          "operation": {
            "operation-type": "cluster-health",
            "request-params": {
              "wait_for_status": "green"
            },
            "retry-until-success": true
          }
        },
        {
          "operation": {
            "operation-type": "bulk",
            "bulk-size": 5000
          },
          "warmup-time-period": 120,
          "clients": 8
        },
        {
          "operation": {
            "operation-type": "force-merge"
          }
        },
        {
          "operation": {
            "name": "query-match-all",
            "operation-type": "search",
            "body": {
              "query": {
                "match_all": {}
              }
            }
          },
          "clients": 8,
          "warmup-iterations": 1000,
          "iterations": 1000,
          "target-throughput": 100
        }
      ]
    }
  ]
}
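
The replacement described above is mechanical: the existing schedule moves, unchanged, into a named entry of the test_procedures array. The following is a minimal sketch of that transformation on a parsed workload dictionary; wrap_schedule_in_test_procedure is a hypothetical helper and not part of OpenSearch Benchmark.

```python
import copy


def wrap_schedule_in_test_procedure(workload, name, default=True):
    """Return a copy of the workload whose schedule is moved into a test procedure."""
    updated = copy.deepcopy(workload)  # keep the original dictionary intact
    schedule = updated.pop("schedule")
    updated.setdefault("test_procedures", []).append({
        "name": name,
        "default": default,
        "schedule": schedule,
    })
    return updated


if __name__ == "__main__":
    workload = {
        "version": 2,
        "schedule": [
            {"operation": {"operation-type": "delete-index"}},
            {"operation": {"operation-type": "create-index"}},
        ],
    }
    print(wrap_schedule_in_test_procedure(workload, "index-and-query"))
```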

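Separate operations and test procedures

If you want to make your workload.json file more readable, you can separate your operations and test procedures into different directories and reference the path to each in workload.json. To separate operations and procedures, perform the following steps:

Add all test procedures to a single file. You can give the file any name. Because the movies workload in the preceding section contains an index task and queries, this step names the test procedures file index-and-query.json.
Add all operations to a file named operations.json.

Reference the new files in workload.json by adding the following syntax, replacing parts with the relative path to each file, as shown in the following example: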

 "operations": [
     {{ benchmark.collect(parts="operations/*.json") }}
 ]
 # Reference test procedure files in workload.json
 "test_procedures": [
     {{ benchmark.collect(parts="test_procedures/*.json") }}
 ]

Next steps

For more information about configuring OpenSearch Benchmark, see Configuring OpenSearch Benchmark.
To see a list of prepackaged workloads for OpenSearch Benchmark, see the opensearch-benchmark-workloads repository.
