
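Performance Analyzer

Performance Analyzer is a plugin that includes an agent and a REST API, which let you query numerous cluster performance metrics, including aggregations of those metrics.

The Performance Analyzer plugin is installed by default in OpenSearch 2.0 and later. If you want to disable Performance Analyzer in OpenSearch 2.0 or later, see Disable Performance Analyzer.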

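Prerequisites

Before using Performance Analyzer with OpenSearch, review the following prerequisites.

Storage

Performance Analyzer uses /dev/shm for temporary storage. During heavy cluster workloads, Performance Analyzer can use up to 1 GB of space.

However, Docker's default size for /dev/shm is 64 MB. To change this value, use the docker run --shm-size 1gb flag or a similar setting in Docker Compose.

If you're not using Docker, check the size of /dev/shm using df -h. The default value should be adequate, but if you need to change its size, add the following line to /etc/fstab: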

tmpfs /dev/shm tmpfs defaults,noexec,nosuid,size=1G 0 0

Then remount the file system:

mount -o remount /dev/shm
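
As a rough check of the storage prerequisite, the following Python sketch compares the size of /dev/shm with the 1 GB that Performance Analyzer may use under heavy load. The function name `has_enough_shm` and the reading of "1 GB" as 1 GiB are assumptions made for illustration; they are not part of the plugin.

```python
import os
import shutil

# Assumption: treat the documented "1 GB" as 1 GiB (1024**3 bytes).
REQUIRED_BYTES = 1024 ** 3


def has_enough_shm(total_bytes, required_bytes=REQUIRED_BYTES):
    """Return True if a file system of total_bytes can hold required_bytes."""
    return total_bytes >= required_bytes


if __name__ == "__main__":
    path = "/dev/shm"
    if os.path.isdir(path):
        # shutil.disk_usage reports the total size of the mounted file system.
        total = shutil.disk_usage(path).total
        print(f"{path}: {total / 1024 ** 2:.0f} MiB, enough: {has_enough_shm(total)}")
    else:
        print(f"{path} is not available on this system")
```

With Docker's default 64 MB /dev/shm the check fails, which is the case the --shm-size flag is meant to address.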

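Security

Performance Analyzer supports encryption in transit for requests. It does not currently support client or server authentication for requests. To enable encryption in transit, edit performance-analyzer.properties in your $OPENSEARCH_HOME directory: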

vi $OPENSEARCH_HOME/config/opensearch-performance-analyzer/performance-analyzer.properties

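Change the following lines to configure encryption in transit. Note that certificate-file-path must be a certificate for the server, not a root certificate authority (CA).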

https-enabled = true

#Setup the correct path for certificates
certificate-file-path = specify_path

private-key-file-path = specify_path
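
Before restarting the agent, it can help to confirm that all three settings were actually changed. The following Python sketch parses the properties text and reports what is still missing. The helper names `parse_properties` and `tls_problems` and the path /path/to/server.pem are hypothetical, and the parser only handles simple `key = value` lines rather than the full Java properties syntax.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def tls_problems(props):
    """Return a list of problems with the encryption-in-transit settings."""
    problems = []
    if props.get("https-enabled", "false").lower() != "true":
        problems.append("https-enabled is not set to true")
    for key in ("certificate-file-path", "private-key-file-path"):
        # 'specify_path' is the placeholder value shown in the example lines.
        if props.get(key, "specify_path") == "specify_path":
            problems.append(f"{key} is not set")
    return problems


# Hypothetical file contents: the private key path was left as the placeholder.
SAMPLE = """
https-enabled = true
#Setup the correct path for certificates
certificate-file-path = /path/to/server.pem
private-key-file-path = specify_path
"""

if __name__ == "__main__":
    print(tls_problems(parse_properties(SAMPLE)))
```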

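Install Performance Analyzer

The Performance Analyzer plugin is included in the Docker and tarball installations, but you can also install the plugin manually.

To install the Performance Analyzer plugin manually, download the plugin from Maven and install it using the standard plugin installation process. Performance Analyzer runs on each node in a cluster.

To start the Performance Analyzer root cause analysis (RCA) agent on a tarball installation, run the following command: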

OPENSEARCH_HOME=~/opensearch-2.2.1 OPENSEARCH_JAVA_HOME=~/opensearch-2.2.1/jdk OPENSEARCH_PATH_CONF=~/opensearch-2.2.1/bin ./performance-analyzer-agent-cli

The following command enables the Performance Analyzer plugin:

curl -XPOST localhost:9200/_plugins/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}'
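
The same request can be issued from a script. The following Python sketch builds the equivalent POST request with the standard library. The function name `build_config_request` is hypothetical, and the default base URL assumes an unsecured local node, as in the curl command.

```python
import json
import urllib.request


def build_config_request(enabled, base_url="http://localhost:9200"):
    """Build (but do not send) the POST request that toggles the plugin."""
    url = f"{base_url}/_plugins/_performanceanalyzer/cluster/config"
    body = json.dumps({"enabled": enabled}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_config_request(True)
    print(request.get_method(), request.full_url, request.data.decode())
    # To send it against a running cluster:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode())
```

Passing `False` instead of `True` produces the request body used to disable the plugin.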

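Disable Performance Analyzer

If you prefer to save memory and run your local instance of OpenSearch with the Performance Analyzer plugin disabled, perform the following steps: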

  1. Before disabling Performance Analyzer, stop any currently running RCA agent actions by using the following command:
  curl -XPOST localhost:9200/_plugins/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": false}'
  2. Shut down the Performance Analyzer RCA agent by running the following command:
  kill $(ps aux | grep -i 'PerformanceAnalyzerApp' | grep -v grep | awk '{print $2}')
  3. Disable the Performance Analyzer plugin by running the following command:
  curl -XPOST localhost:9200/_plugins/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": false}'
  4. Uninstall the Performance Analyzer plugin by running the following command:
  bin/opensearch-plugin remove opensearch-performance-analyzer
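
The kill command in the agent shutdown step relies on a ps, grep, and awk pipeline to find the agent's process ID. The following Python sketch reproduces that filtering on captured ps output so that you can inspect the matches before killing anything. The function name `find_agent_pids` is hypothetical, and the sample output is a simplified, made-up stand-in for real `ps aux` output.

```python
def find_agent_pids(ps_output, pattern="PerformanceAnalyzerApp"):
    """Return PIDs of processes whose 'ps aux' line mentions pattern.

    Mirrors: ps aux | grep -i PATTERN | grep -v grep | awk '{print $2}'
    """
    pids = []
    for line in ps_output.splitlines():
        lowered = line.lower()
        # Case-insensitive match, excluding the grep process itself.
        if pattern.lower() not in lowered or "grep" in lowered:
            continue
        fields = line.split()
        # The PID is the second whitespace-separated column.
        if len(fields) > 1 and fields[1].isdigit():
            pids.append(int(fields[1]))
    return pids


# Hypothetical, simplified 'ps aux' output, for illustration only.
SAMPLE_PS = """\
USER   PID %CPU %MEM COMMAND
user  4242  1.0  2.5 java -cp lib/* org.opensearch.performanceanalyzer.PerformanceAnalyzerApp
user  4300  0.0  0.0 grep -i PerformanceAnalyzerApp
user  4310  0.3  9.1 java org.opensearch.bootstrap.OpenSearch
"""

if __name__ == "__main__":
    print(find_agent_pids(SAMPLE_PS))
```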

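Configure Performance Analyzer

To configure the Performance Analyzer plugin, edit the performance-analyzer.properties configuration file in the config/opensearch-performance-analyzer/ directory. Make sure to uncomment the line #webservice-bind-host and set it to 0.0.0.0. You can refer to the following example configuration.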

# ======================== OpenSearch Performance Analyzer plugin config =========================

# NOTE: this is an example for Linux. Please modify the config accordingly if you are using it under other OS.

# WebService bind host; default to all interfaces
webservice-bind-host = 0.0.0.0

# Metrics data location
metrics-location = /dev/shm/performanceanalyzer/

# Metrics deletion interval (minutes) for metrics data.
# Interval should be between 1 to 60.
metrics-deletion-interval = 1

# If set to true, the system cleans up the files behind it. So at any point, we should expect only 2
# metrics-db-file-prefix-path files. If set to false, no files are cleaned up. This can be useful, if you are archiving
# the files and wouldn't like for them to be cleaned up.
cleanup-metrics-db-files = true

# WebService exposed by App's port
webservice-listener-port = 9600

# Metric DB File Prefix Path location
metrics-db-file-prefix-path = /tmp/metricsdb_

https-enabled = false

#Setup the correct path for certificates
#certificate-file-path = specify_path

#private-key-file-path = specify_path

# Plugin Stats Metadata file name, expected to be in the same location
plugin-stats-metadata = plugin-stats-metadata

# Agent Stats Metadata file name, expected to be in the same location
agent-stats-metadata = agent-stats-metadata

To start the Performance Analyzer RCA agent, run the following command:

OPENSEARCH_HOME=~/opensearch-2.2.1 OPENSEARCH_JAVA_HOME=~/opensearch-2.2.1/jdk OPENSEARCH_PATH_CONF=~/opensearch-2.2.1/bin ./performance-analyzer-agent-cli

Enable Performance Analyzer for RPM/YUM installations

If you installed OpenSearch from an RPM distribution, you can start and stop Performance Analyzer using systemctl:

# Start OpenSearch Performance Analyzer
sudo systemctl start opensearch-performance-analyzer.service
# Stop OpenSearch Performance Analyzer
sudo systemctl stop opensearch-performance-analyzer.service

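Example API query and response

The following is an example Performance Analyzer API query. It returns the unit of measurement for each performance metric available for your OpenSearch cluster.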

GET localhost:9600/_plugins/_performanceanalyzer/metrics/units

The following is an example response:

{"Disk_Utilization":"%","Cache_Request_Hit":"count", 
"Refresh_Time":"ms","ThreadPool_QueueLatency":"count",
"Merge_Time":"ms","ClusterApplierService_Latency":"ms",
"PublishClusterState_Latency":"ms",
"Cache_Request_Size":"B","LeaderCheck_Failure":"count",
"ThreadPool_QueueSize":"count","Sched_Runtime":"s/ctxswitch","Disk_ServiceRate":"MB/s","Heap_AllocRate":"B/s","Indexing_Pressure_Current_Limits":"B",
"Sched_Waittime":"s/ctxswitch","ShardBulkDocs":"count",
"Thread_Blocked_Time":"s/event","VersionMap_Memory":"B",
"Master_Task_Queue_Time":"ms","IO_TotThroughput":"B/s",
"Indexing_Pressure_Current_Bytes":"B",
"Indexing_Pressure_Last_Successful_Timestamp":"ms",
"Net_PacketRate6":"packets/s","Cache_Query_Hit":"count",
"IO_ReadSyscallRate":"count/s","Net_PacketRate4":"packets/s","Cache_Request_Miss":"count",
"ThreadPool_RejectedReqs":"count","Net_TCP_TxQ":"segments/flow","Master_Task_Run_Time":"ms",
"IO_WriteSyscallRate":"count/s","IO_WriteThroughput":"B/s",
"Refresh_Event":"count","Flush_Time":"ms","Heap_Init":"B",
"Indexing_Pressure_Rejection_Count":"count",
"CPU_Utilization":"cores","Cache_Query_Size":"B",
"Merge_Event":"count","Cache_FieldData_Eviction":"count",
"IO_TotalSyscallRate":"count/s","Net_Throughput":"B/s",
"Paging_RSS":"pages",
"AdmissionControl_ThresholdValue":"count",
"Indexing_Pressure_Average_Window_Throughput":"count/s",
"Cache_MaxSize":"B","IndexWriter_Memory":"B",
"Net_TCP_SSThresh":"B/flow","IO_ReadThroughput":"B/s",
"LeaderCheck_Latency":"ms","FollowerCheck_Failure":"count",
"HTTP_RequestDocs":"count","Net_TCP_Lost":"segments/flow",
"GC_Collection_Event":"count","Sched_CtxRate":"count/s",
"AdmissionControl_RejectionCount":"count","Heap_Max":"B",
"ClusterApplierService_Failure":"count",
"PublishClusterState_Failure":"count",
"Merge_CurrentEvent":"count","Indexing_Buffer":"B",
"Bitset_Memory":"B","Net_PacketDropRate4":"packets/s",
"Heap_Committed":"B","Net_PacketDropRate6":"packets/s",
"Thread_Blocked_Event":"count","GC_Collection_Time":"ms",
"Cache_Query_Miss":"count","Latency":"ms",
"Shard_State":"count","Thread_Waited_Event":"count",
"CB_ConfiguredSize":"B","ThreadPool_QueueCapacity":"count",
"CB_TrippedEvents":"count","Disk_WaitTime":"ms",
"Data_RetryingPendingTasksCount":"count",
"AdmissionControl_CurrentValue":"count",
"Flush_Event":"count","Net_TCP_RxQ":"segments/flow",
"Shard_Size_In_Bytes":"B","Thread_Waited_Time":"s/event",
"HTTP_TotalRequests":"count",
"ThreadPool_ActiveThreads":"count",
"Paging_MinfltRate":"count/s","Net_TCP_SendCWND":"B/flow",
"Cache_Request_Eviction":"count","Segments_Total":"count",
"FollowerCheck_Latency":"ms","Heap_Used":"B",
"Master_ThrottledPendingTasksCount":"count",
"CB_EstimatedSize":"B","Indexing_ThrottleTime":"ms",
"Master_PendingQueueSize":"count",
"Cache_FieldData_Size":"B","Paging_MajfltRate":"count/s",
"ThreadPool_TotalThreads":"count","ShardEvents":"count",
"Net_TCP_NumFlows":"count","Election_Term":"count"}
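
Because the response is a flat JSON object that maps metric names to units, it is easy to post-process. The following Python sketch groups metric names by unit. The function name `metrics_by_unit` is hypothetical, and the sample is a short excerpt of the response above rather than the full body.

```python
import json
from collections import defaultdict


def metrics_by_unit(units_json):
    """Group metric names by unit from a metrics/units response body."""
    grouped = defaultdict(list)
    for metric, unit in json.loads(units_json).items():
        grouped[unit].append(metric)
    return {unit: sorted(names) for unit, names in grouped.items()}


# A small excerpt of the example response.
SAMPLE = (
    '{"Disk_Utilization":"%","Refresh_Time":"ms","Merge_Time":"ms",'
    '"Heap_Used":"B","Heap_Max":"B"}'
)

if __name__ == "__main__":
    for unit, names in sorted(metrics_by_unit(SAMPLE).items()):
        print(f"{unit}: {', '.join(names)}")
```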

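Root cause analysis

The root cause analysis (RCA) framework uses the information provided by Performance Analyzer to inform administrators about the root cause of performance and availability issues that their clusters might be experiencing.

Enable the RCA framework

To enable the RCA framework, run the following command: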

curl -XPOST https://:9200/_plugins/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}'

If you encounter the curl: (52) Empty reply from server response, run the following command to enable RCA:

curl -XPOST https://:9200/_plugins/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' -u 'admin:<custom-admin-password>' -k
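
The second command adds basic authentication (-u) and skips certificate verification (-k). The following Python sketch shows how those two options could be expressed with the standard library. The function names, the host opensearch.example, and the credentials are placeholders chosen for illustration. Disabling certificate checks is only appropriate for testing.

```python
import base64
import json
import ssl
import urllib.request


def build_rca_request(base_url, user, password, enabled=True):
    """Build (but do not send) an authenticated request that toggles RCA."""
    url = f"{base_url}/_plugins/_performanceanalyzer/rca/cluster/config"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps({"enabled": enabled}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",  # equivalent of curl -u
        },
        method="POST",
    )


def insecure_context():
    """TLS context that skips certificate checks, like curl -k. Testing only."""
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before verify_mode
    context.verify_mode = ssl.CERT_NONE
    return context


if __name__ == "__main__":
    # The host and credentials below are placeholders, not real values.
    request = build_rca_request(
        "https://opensearch.example:9200", "admin", "<custom-admin-password>"
    )
    print(request.get_method(), request.full_url)
    # To send it against a running cluster:
    # with urllib.request.urlopen(request, context=insecure_context()) as response:
    #     print(response.read().decode())
```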

Example API query and response

To request all available RCAs, run the following command:

GET localhost:9600/_plugins/_performanceanalyzer/rca

To request a specific RCA, run the following command:

GET localhost:9600/_plugins/_performanceanalyzer/rca?name=HighHeapUsageClusterRCA

The following is an example response:

{
  "HighHeapUsageClusterRCA": [{
    "RCA_name": "HighHeapUsageClusterRCA",
    "state": "unhealthy",
    "timestamp": 1587426650942,
    "HotClusterSummary": [{
      "number_of_nodes": 2,
      "number_of_unhealthy_nodes": 1,
      "HotNodeSummary": [{
        "host_address": "192.168.144.2",
        "node_id": "JtlEoRowSI6iNpzpjlbp_Q",
        "HotResourceSummary": [{
          "resource_type": "old gen",
          "threshold": 0.65,
          "value": 0.81827232588145373,
          "avg": NaN,
          "max": NaN,
          "min": NaN,
          "unit_type": "heap usage in percentage",
          "time_period_seconds": 600,
          "TopConsumerSummary": [{
              "name": "CACHE_FIELDDATA_SIZE",
              "value": 590702564
            },
            {
              "name": "CACHE_REQUEST_SIZE",
              "value": 28375
            },
            {
              "name": "CACHE_QUERY_SIZE",
              "value": 12687
            }
          ]
        }]
      }]
    }]
  }]
}
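
The response nests cluster, node, and resource summaries several levels deep. The following Python sketch walks that structure and returns one row per hot resource, including the largest consumer. The function name `summarize_rca` is hypothetical, and the sample is a trimmed copy of the response above that has already been decoded into Python objects.

```python
def summarize_rca(response, rca_name):
    """Return (host, resource, value, threshold, top consumer) per hot resource."""
    rows = []
    for rca in response.get(rca_name, []):
        for cluster in rca.get("HotClusterSummary", []):
            for node in cluster.get("HotNodeSummary", []):
                for resource in node.get("HotResourceSummary", []):
                    consumers = resource.get("TopConsumerSummary", [])
                    # Pick the consumer with the largest reported value, if any.
                    top = max(consumers, key=lambda c: c["value"])["name"] if consumers else None
                    rows.append((
                        node["host_address"],
                        resource["resource_type"],
                        round(resource["value"], 2),
                        resource["threshold"],
                        top,
                    ))
    return rows


# Trimmed copy of the example response, as decoded Python objects.
SAMPLE = {
    "HighHeapUsageClusterRCA": [{
        "RCA_name": "HighHeapUsageClusterRCA",
        "state": "unhealthy",
        "HotClusterSummary": [{
            "HotNodeSummary": [{
                "host_address": "192.168.144.2",
                "HotResourceSummary": [{
                    "resource_type": "old gen",
                    "threshold": 0.65,
                    "value": 0.81827232588145373,
                    "TopConsumerSummary": [
                        {"name": "CACHE_FIELDDATA_SIZE", "value": 590702564},
                        {"name": "CACHE_REQUEST_SIZE", "value": 28375},
                        {"name": "CACHE_QUERY_SIZE", "value": 12687},
                    ],
                }],
            }],
        }],
    }],
}

if __name__ == "__main__":
    for row in summarize_rca(SAMPLE, "HighHeapUsageClusterRCA"):
        print(row)
```

In this example, old gen heap usage (about 0.82) exceeds the 0.65 threshold, and the field data cache is the largest consumer.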
