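Performance Analyzer
Performance Analyzer is a plugin that contains an agent and a REST API. Together they let you query numerous cluster performance metrics, including aggregations of those metrics.
The Performance Analyzer plugin is installed by default in OpenSearch 2.0 and later. If you want to disable Performance Analyzer in OpenSearch 2.0 or later, see Disable Performance Analyzer.
Prerequisites
Before using Performance Analyzer with OpenSearch, review the following prerequisites.
Storage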
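Performance Analyzer uses /dev/shm for temporary storage. During heavy cluster workloads, Performance Analyzer can use up to 1 GB of space.
Docker, however, sets the default /dev/shm size to 64 MB. To change this value, use the docker run --shm-size 1gb flag or a similar setting in Docker Compose.
If you're not using Docker, check the size of /dev/shm using df -h. The default value should be sufficient, but if you need to change its size, add the following line to /etc/fstab: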
tmpfs /dev/shm tmpfs defaults,noexec,nosuid,size=1G 0 0
Then remount the file system:
mount -o remount /dev/shm
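As a quick check, the following POSIX shell sketch converts a size string to bytes and tests it against 1 GiB. It accepts the forms used above, for example 64m, 1gb, or the size=1G mount option. It assumes that "1 GB" means 1 GiB (1073741824 bytes). The functions size_to_bytes, fstab_shm_size, and shm_size_ok are hypothetical helpers and are not part of Performance Analyzer.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Performance Analyzer). The ~1 GB that
# Performance Analyzer may use is assumed here to mean 1 GiB.

# size_to_bytes SIZE: print SIZE in bytes. Accepts a plain byte count or a
# k/m/g suffix with an optional trailing "b" (64m, 1G, 1gb).
size_to_bytes() {
  local s num unit
  s=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  s=${s%b}                      # 1gb -> 1g
  num=${s%[kmg]}                # numeric part
  unit=${s#"$num"}              # suffix, possibly empty
  case $num in
    ''|*[!0-9]*) return 1 ;;    # reject empty or non-numeric input
  esac
  case $unit in
    k) echo $((num * 1024)) ;;
    m) echo $((num * 1024 * 1024)) ;;
    g) echo $((num * 1024 * 1024 * 1024)) ;;
    '') echo "$num" ;;
  esac
}

# fstab_shm_size LINE: print the size= option from a tmpfs line in /etc/fstab.
fstab_shm_size() {
  printf '%s\n' "$1" | awk '{print $4}' | tr ',' '\n' | sed -n 's/^size=//p'
}

# shm_size_ok SIZE: succeed if SIZE is at least 1 GiB.
shm_size_ok() {
  local bytes
  bytes=$(size_to_bytes "$1") || return 1
  [ "$bytes" -ge $((1024 * 1024 * 1024)) ]
}
```

For example, shm_size_ok "$(fstab_shm_size 'tmpfs /dev/shm tmpfs defaults,noexec,nosuid,size=1G 0 0')" succeeds. By contrast, shm_size_ok 64m (Docker's default) fails. With GNU coreutils, df -B1 --output=size /dev/shm | tail -n 1 | tr -d ' ' prints the currently mounted size in bytes, and size_to_bytes accepts that plain number.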
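Security
Performance Analyzer supports encryption in transit for requests. It currently does not support client or server authentication for requests. To enable encryption in transit, edit performance-analyzer.properties in your $OPENSEARCH_HOME directory: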
vi $OPENSEARCH_HOME/config/opensearch-performance-analyzer/performance-analyzer.properties
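Change the following lines to configure encryption in transit. Note that certificate-file-path must point to a server certificate, not a root certificate authority (CA).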
https-enabled = true
#Setup the correct path for certificates
certificate-file-path = specify_path
private-key-file-path = specify_path
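You can sanity-check these settings before restarting. The following sketch uses hypothetical helpers that are not shipped with the plugin. It reads the last uncommented value for a key and confirms that https-enabled is true. It also checks that both paths point to readable files, so the specify_path placeholder fails. Finally, it uses the openssl CLI to detect a CA certificate, which certificate-file-path must not be. It assumes the simple "key = value" layout shown above.

```shell
#!/bin/sh
# Hypothetical pre-flight checks for the TLS settings in
# performance-analyzer.properties (not part of Performance Analyzer).

# prop_value FILE KEY: print the value of the last uncommented "KEY = value" line.
prop_value() {
  sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1 |
    sed 's/[[:space:]]*$//'
}

# check_pa_tls FILE: succeed only if HTTPS is enabled and both the
# certificate and the private key paths point to readable files.
check_pa_tls() {
  local file="$1" cert key
  [ "$(prop_value "$file" https-enabled)" = "true" ] ||
    { echo "https-enabled is not true" >&2; return 1; }
  cert=$(prop_value "$file" certificate-file-path)
  key=$(prop_value "$file" private-key-file-path)
  [ -r "$cert" ] || { echo "certificate not readable: $cert" >&2; return 1; }
  [ -r "$key" ] || { echo "private key not readable: $key" >&2; return 1; }
}

# cert_is_ca CERT: succeed if CERT is a CA certificate (CA:TRUE).
# certificate-file-path must NOT point to such a certificate. Requires openssl.
cert_is_ca() {
  openssl x509 -in "$1" -noout -text | grep -q 'CA:TRUE'
}
```

For example, you can run check_pa_tls "$OPENSEARCH_HOME/config/opensearch-performance-analyzer/performance-analyzer.properties" after editing the file. Running cert_is_ca on the configured certificate should then fail.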
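Install Performance Analyzer
The Performance Analyzer plugin is included in the Docker and tarball installations, but you can also install the plugin manually.
To install the Performance Analyzer plugin manually, download it from Maven and install it using the standard plugin installation process. Performance Analyzer runs on every node in the cluster.
To start the Performance Analyzer root cause analysis (RCA) agent on a tarball installation, run the following command: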
OPENSEARCH_HOME=~/opensearch-2.2.1 OPENSEARCH_JAVA_HOME=~/opensearch-2.2.1/jdk OPENSEARCH_PATH_CONF=~/opensearch-2.2.1/bin ./performance-analyzer-agent-cli
The following command enables the Performance Analyzer plugin:
curl -XPOST localhost:9200/_plugins/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}'
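The plugin toggle above and the RCA toggle used elsewhere in this section differ only in their path. The plugin uses /_plugins/_performanceanalyzer/cluster/config, and RCA uses /_plugins/_performanceanalyzer/rca/cluster/config. The following sketch wraps both endpoints in one function. pa_config_url, pa_toggle, and the PA_HOST variable are hypothetical names. PA_HOST defaults to localhost:9200, as in the commands on this page.

```shell
#!/bin/sh
# Hypothetical wrapper around the two documented toggle endpoints.
# PA_HOST is an assumed variable; it defaults to localhost:9200.

# pa_config_url plugin|rca: print the config URL for the given component.
pa_config_url() {
  local base="${PA_HOST:-localhost:9200}/_plugins/_performanceanalyzer"
  case $1 in
    plugin) echo "$base/cluster/config" ;;
    rca) echo "$base/rca/cluster/config" ;;
    *) echo "unknown component: $1" >&2; return 1 ;;
  esac
}

# pa_toggle plugin|rca true|false: POST {"enabled": true|false} to the endpoint.
pa_toggle() {
  local url
  case $2 in
    true|false) ;;
    *) echo "state must be true or false" >&2; return 1 ;;
  esac
  url=$(pa_config_url "$1") || return 1
  curl -XPOST "$url" -H 'Content-Type: application/json' -d "{\"enabled\": $2}"
}
```

For example, pa_toggle plugin true sends the same request as the command above. pa_toggle rca false sends the request that stops RCA agent activity.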
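Disable Performance Analyzer
If you want to save memory and run your local OpenSearch instance with the Performance Analyzer plugin disabled, perform the following steps:
- Before disabling Performance Analyzer, stop any currently running RCA agent activity by using the following command: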
curl -XPOST localhost:9200/_plugins/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": false}'
- Shut down the Performance Analyzer RCA agent by running the following command:
kill $(ps aux | grep -i 'PerformanceAnalyzerApp' | grep -v grep | awk '{print $2}')
- Disable the Performance Analyzer plugin by running the following command:
curl -XPOST localhost:9200/_plugins/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": false}'
- Uninstall the Performance Analyzer plugin by running the following command:
bin/opensearch-plugin remove opensearch-performance-analyzer
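The four steps above can be run in order from a single script. In the following sketch, setting DRY_RUN=1 prints each command instead of executing it, so you can review the sequence first. The function names and the PA_URL variable are hypothetical. Step 4 assumes you run the script from the OpenSearch home directory.

```shell
#!/bin/sh
# Hypothetical script that runs the documented disable steps in order.
# DRY_RUN=1 prints the commands instead of running them.
PA_URL=${PA_URL:-localhost:9200/_plugins/_performanceanalyzer}

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

disable_performance_analyzer() {
  # 1. Stop any running RCA agent activity.
  run curl -XPOST "$PA_URL/rca/cluster/config" -H 'Content-Type: application/json' -d '{"enabled": false}'
  # 2. Shut down the RCA agent process (same pipeline as the documented command).
  run sh -c "kill \$(ps aux | grep -i 'PerformanceAnalyzerApp' | grep -v grep | awk '{print \$2}')"
  # 3. Disable the plugin.
  run curl -XPOST "$PA_URL/cluster/config" -H 'Content-Type: application/json' -d '{"enabled": false}'
  # 4. Uninstall the plugin (relative path: run from the OpenSearch home directory).
  run bin/opensearch-plugin remove opensearch-performance-analyzer
}
```

Running DRY_RUN=1 disable_performance_analyzer prints the four commands in order. Running disable_performance_analyzer without DRY_RUN executes them.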
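Configure Performance Analyzer
To configure the Performance Analyzer plugin, edit the performance-analyzer.properties configuration file in the config/opensearch-performance-analyzer/ directory. Make sure to uncomment the #webservice-bind-host line and set it to 0.0.0.0. You can use the following example configuration as a reference: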
# ======================== OpenSearch Performance Analyzer plugin config =========================
# NOTE: this is an example for Linux. Please modify the config accordingly if you are using it under other OS.
# WebService bind host; default to all interfaces
webservice-bind-host = 0.0.0.0
# Metrics data location
metrics-location = /dev/shm/performanceanalyzer/
# Metrics deletion interval (minutes) for metrics data.
# Interval should be between 1 to 60.
metrics-deletion-interval = 1
# If set to true, the system cleans up the files behind it. So at any point, we should expect only 2
# metrics-db-file-prefix-path files. If set to false, no files are cleaned up. This can be useful, if you are archiving
# the files and wouldn't like for them to be cleaned up.
cleanup-metrics-db-files = true
# WebService exposed by App's port
webservice-listener-port = 9600
# Metric DB File Prefix Path location
metrics-db-file-prefix-path = /tmp/metricsdb_
https-enabled = false
#Setup the correct path for certificates
#certificate-file-path = specify_path
#private-key-file-path = specify_path
# Plugin Stats Metadata file name, expected to be in the same location
plugin-stats-metadata = plugin-stats-metadata
# Agent Stats Metadata file name, expected to be in the same location
agent-stats-metadata = agent-stats-metadata
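The following sketch checks the two constraints stated above. webservice-bind-host must be uncommented, and metrics-deletion-interval must be between 1 and 60. If both checks pass, it reports the bind address, port, and scheme. The helper names are hypothetical. The parser assumes the simple "key = value" format of the example file and ignores lines that start with #.

```shell
#!/bin/sh
# Hypothetical sanity check for performance-analyzer.properties
# (not part of the plugin).

# prop_value FILE KEY: print the value of the last uncommented "KEY = value" line.
prop_value() {
  sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1 |
    sed 's/[[:space:]]*$//'
}

check_pa_config() {
  local file="$1" host interval port scheme=http
  host=$(prop_value "$file" webservice-bind-host)
  [ -n "$host" ] ||
    { echo "webservice-bind-host is missing or commented out" >&2; return 1; }
  interval=$(prop_value "$file" metrics-deletion-interval)
  case $interval in
    ''|*[!0-9]*) echo "metrics-deletion-interval is not a whole number" >&2; return 1 ;;
  esac
  if [ "$interval" -lt 1 ] || [ "$interval" -gt 60 ]; then
    echo "metrics-deletion-interval must be between 1 and 60" >&2
    return 1
  fi
  port=$(prop_value "$file" webservice-listener-port)
  [ "$(prop_value "$file" https-enabled)" = true ] && scheme=https
  # 9600 is the port used in the example configuration above.
  echo "Performance Analyzer listens on $host:${port:-9600} ($scheme)"
}
```

Applied to the example configuration above, check_pa_config should report 0.0.0.0:9600 over http.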
To start the Performance Analyzer RCA agent, run the following command:
OPENSEARCH_HOME=~/opensearch-2.2.1 OPENSEARCH_JAVA_HOME=~/opensearch-2.2.1/jdk OPENSEARCH_PATH_CONF=~/opensearch-2.2.1/bin ./performance-analyzer-agent-cli
Enable Performance Analyzer for RPM/YUM installations
If you installed OpenSearch from an RPM distribution, you can start and stop Performance Analyzer with systemctl:
# Start OpenSearch Performance Analyzer
sudo systemctl start opensearch-performance-analyzer.service
# Stop OpenSearch Performance Analyzer
sudo systemctl stop opensearch-performance-analyzer.service
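Example API query and response
The following is an example Performance Analyzer API query. It returns the performance metrics that Performance Analyzer reports for your OpenSearch cluster, together with the unit of each metric: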
GET localhost:9600/_plugins/_performanceanalyzer/metrics/units
The following is an example response:
{"Disk_Utilization":"%","Cache_Request_Hit":"count",
"Refresh_Time":"ms","ThreadPool_QueueLatency":"count",
"Merge_Time":"ms","ClusterApplierService_Latency":"ms",
"PublishClusterState_Latency":"ms",
"Cache_Request_Size":"B","LeaderCheck_Failure":"count",
"ThreadPool_QueueSize":"count","Sched_Runtime":"s/ctxswitch","Disk_ServiceRate":"MB/s","Heap_AllocRate":"B/s","Indexing_Pressure_Current_Limits":"B",
"Sched_Waittime":"s/ctxswitch","ShardBulkDocs":"count",
"Thread_Blocked_Time":"s/event","VersionMap_Memory":"B",
"Master_Task_Queue_Time":"ms","IO_TotThroughput":"B/s",
"Indexing_Pressure_Current_Bytes":"B",
"Indexing_Pressure_Last_Successful_Timestamp":"ms",
"Net_PacketRate6":"packets/s","Cache_Query_Hit":"count",
"IO_ReadSyscallRate":"count/s","Net_PacketRate4":"packets/s","Cache_Request_Miss":"count",
"ThreadPool_RejectedReqs":"count","Net_TCP_TxQ":"segments/flow","Master_Task_Run_Time":"ms",
"IO_WriteSyscallRate":"count/s","IO_WriteThroughput":"B/s",
"Refresh_Event":"count","Flush_Time":"ms","Heap_Init":"B",
"Indexing_Pressure_Rejection_Count":"count",
"CPU_Utilization":"cores","Cache_Query_Size":"B",
"Merge_Event":"count","Cache_FieldData_Eviction":"count",
"IO_TotalSyscallRate":"count/s","Net_Throughput":"B/s",
"Paging_RSS":"pages",
"AdmissionControl_ThresholdValue":"count",
"Indexing_Pressure_Average_Window_Throughput":"count/s",
"Cache_MaxSize":"B","IndexWriter_Memory":"B",
"Net_TCP_SSThresh":"B/flow","IO_ReadThroughput":"B/s",
"LeaderCheck_Latency":"ms","FollowerCheck_Failure":"count",
"HTTP_RequestDocs":"count","Net_TCP_Lost":"segments/flow",
"GC_Collection_Event":"count","Sched_CtxRate":"count/s",
"AdmissionControl_RejectionCount":"count","Heap_Max":"B",
"ClusterApplierService_Failure":"count",
"PublishClusterState_Failure":"count",
"Merge_CurrentEvent":"count","Indexing_Buffer":"B",
"Bitset_Memory":"B","Net_PacketDropRate4":"packets/s",
"Heap_Committed":"B","Net_PacketDropRate6":"packets/s",
"Thread_Blocked_Event":"count","GC_Collection_Time":"ms",
"Cache_Query_Miss":"count","Latency":"ms",
"Shard_State":"count","Thread_Waited_Event":"count",
"CB_ConfiguredSize":"B","ThreadPool_QueueCapacity":"count",
"CB_TrippedEvents":"count","Disk_WaitTime":"ms",
"Data_RetryingPendingTasksCount":"count",
"AdmissionControl_CurrentValue":"count",
"Flush_Event":"count","Net_TCP_RxQ":"segments/flow",
"Shard_Size_In_Bytes":"B","Thread_Waited_Time":"s/event",
"HTTP_TotalRequests":"count",
"ThreadPool_ActiveThreads":"count",
"Paging_MinfltRate":"count/s","Net_TCP_SendCWND":"B/flow",
"Cache_Request_Eviction":"count","Segments_Total":"count",
"FollowerCheck_Latency":"ms","Heap_Used":"B",
"Master_ThrottledPendingTasksCount":"count",
"CB_EstimatedSize":"B","Indexing_ThrottleTime":"ms",
"Master_PendingQueueSize":"count",
"Cache_FieldData_Size":"B","Paging_MajfltRate":"count/s",
"ThreadPool_TotalThreads":"count","ShardEvents":"count",
"Net_TCP_NumFlows":"count","Election_Term":"count"}
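The response is a flat object that maps each metric name to its unit, so a small text filter is enough to find the metrics that use a given unit. This sketch uses grep and cut rather than a JSON parser, so it needs no extra dependencies. metrics_with_unit is a hypothetical helper. It assumes unit strings contain no regular expression metacharacters, which holds for the units shown above.

```shell
#!/bin/sh
# Hypothetical filter: read a metrics/units response on stdin and print the
# names of metrics whose unit exactly equals $1 (for example ms, B, or count/s).
# Assumes the flat {"Name":"unit",...} shape of the example response.
metrics_with_unit() {
  grep -o "\"[A-Za-z0-9_]*\":\"$1\"" | cut -d'"' -f2
}
```

For example, curl -s localhost:9600/_plugins/_performanceanalyzer/metrics/units | metrics_with_unit ms lists the metrics reported in milliseconds, such as Refresh_Time and Merge_Time.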
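Root cause analysis
The root cause analysis (RCA) framework uses the information from Performance Analyzer to inform administrators of the root cause of performance and availability issues that their clusters might be experiencing.
Enable the RCA framework
To enable the RCA framework, run the following command: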
curl -XPOST http://localhost:9200/_plugins/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}'
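If you receive the curl: (52) Empty reply from server response, the REST API most likely accepts only HTTPS requests, for example because the Security plugin is enabled. In that case, run the following command to enable RCA over HTTPS with admin credentials: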
curl -XPOST https://localhost:9200/_plugins/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' -u 'admin:<custom-admin-password>' -k
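The two enable commands can be combined into one step. The following POSIX shell sketch first sends the request over plain HTTP to localhost:9200. Only if curl exits with code 52 (curl's exit code for "Empty reply from server") does it retry over HTTPS with the admin credentials. The enable_rca function and the OPENSEARCH_HOST and OPENSEARCH_ADMIN_PASSWORD variables are hypothetical names. The -k flag is kept from the documented command, but it disables certificate verification.

```shell
#!/bin/sh
# Hypothetical sketch: enable RCA over HTTP, falling back to HTTPS with
# credentials when curl exits with code 52 (empty reply from server).
# OPENSEARCH_HOST and OPENSEARCH_ADMIN_PASSWORD are assumed variable names.
enable_rca() {
  local host="${OPENSEARCH_HOST:-localhost:9200}"
  local path=/_plugins/_performanceanalyzer/rca/cluster/config
  local body='{"enabled": true}'
  local rc=0
  curl -sS -XPOST "http://$host$path" -H 'Content-Type: application/json' -d "$body" || rc=$?
  if [ "$rc" -eq 52 ]; then
    # -k skips TLS certificate verification, as in the documented command;
    # prefer --cacert with your CA bundle where possible.
    curl -sS -XPOST "https://$host$path" -H 'Content-Type: application/json' -d "$body" \
      -u "admin:${OPENSEARCH_ADMIN_PASSWORD:?set OPENSEARCH_ADMIN_PASSWORD}" -k
    return
  fi
  return "$rc"
}
```

The HTTPS retry happens only for exit code 52. Any other curl failure is returned unchanged, so you can still see the original error.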
Example API query and response
To request all available RCAs, run the following command:
GET localhost:9600/_plugins/_performanceanalyzer/rca
To request a specific RCA, run the following command:
GET localhost:9600/_plugins/_performanceanalyzer/rca?name=HighHeapUsageClusterRCA
The following is an example response:
{
"HighHeapUsageClusterRCA": [{
"RCA_name": "HighHeapUsageClusterRCA",
"state": "unhealthy",
"timestamp": 1587426650942,
"HotClusterSummary": [{
"number_of_nodes": 2,
"number_of_unhealthy_nodes": 1,
"HotNodeSummary": [{
"host_address": "192.168.144.2",
"node_id": "JtlEoRowSI6iNpzpjlbp_Q",
"HotResourceSummary": [{
"resource_type": "old gen",
"threshold": 0.65,
"value": 0.81827232588145373,
"avg": NaN,
"max": NaN,
"min": NaN,
"unit_type": "heap usage in percentage",
"time_period_seconds": 600,
"TopConsumerSummary": [{
"name": "CACHE_FIELDDATA_SIZE",
"value": 590702564
},
{
"name": "CACHE_REQUEST_SIZE",
"value": 28375
},
{
"name": "CACHE_QUERY_SIZE",
"value": 12687
}
]
}]
}]
}]
}]
}
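The response above includes bare NaN values, which are not valid JSON, so strict JSON parsers may reject it. The following sketch extracts the RCA state and the host addresses with grep, sed, and cut instead. rca_state and rca_hot_hosts are hypothetical helpers that assume the field layout shown above.

```shell
#!/bin/sh
# Hypothetical helpers for an RCA response read on stdin. They use text
# matching because the response may contain NaN, which is not valid JSON.

# rca_state: print the first "state" value (for example, unhealthy).
rca_state() {
  grep -o '"state": *"[a-z]*"' | head -n 1 | sed 's/.*"\([a-z]*\)"$/\1/'
}

# rca_hot_hosts: print every host_address that appears in the response.
rca_hot_hosts() {
  grep -o '"host_address": *"[^"]*"' | cut -d'"' -f4
}
```

For example, save the output of curl -s 'localhost:9600/_plugins/_performanceanalyzer/rca?name=HighHeapUsageClusterRCA' in a variable and pipe it into rca_state. For the sample response above, it prints unhealthy, and rca_hot_hosts prints 192.168.144.2.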
Related links
Further documentation on using Performance Analyzer and RCA is available at the following links: