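Data summary
This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or to leave feedback, join the discussion on the OpenSearch forum.
The OpenSearch Dashboards Assistant data summary feature uses large language models (LLMs) to help you generate summaries of data stored in OpenSearch indexes. It provides an efficient way to gain insights from large datasets, making the information contained in your OpenSearch indexes easier to understand and act on.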
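Configuration
To configure the data summary feature, use the following steps.
Prerequisite
Before using the data summary feature, enable query enhancements in OpenSearch Dashboards as follows:
- On the top menu bar, go to Management > Dashboards Management.
- In the left navigation pane, select Advanced settings.
- On the settings page, toggle Enable query enhancements to On.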
Step 1: Enable the data summary feature
To enable the data summary feature, configure the following opensearch_dashboards.yml setting:
queryEnhancements.queryAssist.summary.enabled: true
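Step 2: Create a data summary agent
To orchestrate data summarization, create a data summary agent. To create the agent, send a POST /_plugins/_flow_framework/workflow?provision=true request and provide the agent template as the payload:
Request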
POST /_plugins/_flow_framework/workflow?provision=true
{
"name": "Query Assist Agent",
"description": "Create a Query Assist Agent using Claude on BedRock",
"use_case": "REGISTER_AGENT",
"version": {
"template": "1.0.0",
"compatibility": ["2.13.0", "3.0.0"]
},
"workflows": {
"provision": {
"user_params": {},
"nodes": [
{
"id": "create_claude_connector",
"type": "create_connector",
"previous_node_inputs": {},
"user_inputs": {
"version": "1",
"name": "Claude instant runtime Connector",
"protocol": "aws_sigv4",
"description": "The connector to BedRock service for Claude model",
"actions": [
{
"headers": {
"x-amz-content-sha256": "required",
"content-type": "application/json"
},
"method": "POST",
"request_body": "{\"prompt\":\"${parameters.prompt}\", \"max_tokens_to_sample\":${parameters.max_tokens_to_sample}, \"temperature\":${parameters.temperature}, \"anthropic_version\":\"${parameters.anthropic_version}\" }",
"action_type": "predict",
"url": "https://bedrock-runtime.us-west-2.amazonaws.com/model/anthropic.claude-instant-v1/invoke"
}
],
"credential": {
"access_key": "<YOUR_ACCESS_KEY>",
"secret_key": "<YOUR_SECRET_KEY>",
"session_token": "<YOUR_SESSION_TOKEN>"
},
"parameters": {
"region": "us-west-2",
"endpoint": "bedrock-runtime.us-west-2.amazonaws.com",
"content_type": "application/json",
"auth": "Sig_V4",
"max_tokens_to_sample": "8000",
"service_name": "bedrock",
"temperature": "0.0001",
"response_filter": "$.completion",
"anthropic_version": "bedrock-2023-05-31"
}
}
},
{
"id": "register_claude_model",
"type": "register_remote_model",
"previous_node_inputs": {
"create_claude_connector": "connector_id"
},
"user_inputs": {
"description": "Claude model",
"deploy": true,
"name": "claude-instant"
}
},
{
"id": "create_query_assist_data_summary_ml_model_tool",
"type": "create_tool",
"previous_node_inputs": {
"register_claude_model": "model_id"
},
"user_inputs": {
"parameters": {
"prompt": "Human: You are an assistant that helps to summarize the data and provide data insights.\nThe data are queried from OpenSearch index through user's question which was translated into PPL query.\nHere is a sample PPL query: `source=<index> | where <field> = <value>`.\nNow you are given ${parameters.sample_count} sample data out of ${parameters.total_count} total data.\nThe user's question is `${parameters.question}`, the translated PPL query is `${parameters.ppl}` and sample data are:\n```\n${parameters.sample_data}\n```\nCould you help provide a summary of the sample data and provide some useful insights with precise wording and in plain text format, do not use markdown format.\nYou don't need to echo my requirements in response.\n\nAssistant:"
},
"name": "MLModelTool",
"type": "MLModelTool"
}
},
{
"id": "create_query_assist_data_summary_agent",
"type": "register_agent",
"previous_node_inputs": {
"create_query_assist_data_summary_ml_model_tool": "tools"
},
"user_inputs": {
"parameters": {},
"type": "flow",
"name": "Query Assist Data Summary Agent",
"description": "this is an query assist data summary agent"
}
}
]
}
}
}
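The prompt in the template above contains `${parameters.<name>}` placeholders that are filled from the parameters passed when the agent runs. To preview roughly what text the LLM receives, the following minimal sketch emulates that substitution locally. It is an illustration only: the `render_prompt` helper is hypothetical, and the assumption that unknown placeholders are left untouched is made here for convenience, not taken from the ML Commons implementation.

```python
import re
from typing import Dict

# Matches placeholders of the form ${parameters.<name>}.
PLACEHOLDER = re.compile(r"\$\{parameters\.(\w+)\}")


def render_prompt(template: str, parameters: Dict[str, object]) -> str:
    """Fill ${parameters.<name>} placeholders; leave unknown names untouched."""
    def substitute(match):
        name = match.group(1)
        if name in parameters:
            return str(parameters[name])
        return match.group(0)  # assumption: unresolved placeholders stay as-is

    return PLACEHOLDER.sub(substitute, template)


# A shortened excerpt of the prompt from the agent template.
template = (
    "Now you are given ${parameters.sample_count} sample data "
    "out of ${parameters.total_count} total data.\n"
    "The user's question is `${parameters.question}`."
)

rendered = render_prompt(
    template,
    {
        "sample_count": 1,
        "total_count": 383,
        "question": "Are there any errors in my logs?",
    },
)
print(rendered)
```

Rendering the prompt locally can help you check that every placeholder in a customized prompt has a matching parameter before you provision the workflow.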
For sample agent templates, see Flow Framework sample templates. Note the ID of the agent that the workflow creates; you'll use it in the next step.
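The provisioning request returns a workflow ID rather than the agent ID itself. The following minimal sketch extracts the agent ID from a workflow status response. It assumes the response contains a `resources_created` list whose entries carry `resource_type` and `resource_id` fields, as returned by the Flow Framework Get Workflow Status API (`GET /_plugins/_flow_framework/workflow/<workflow_id>/_status`). The `find_agent_id` helper and every ID in the sample response are hypothetical.

```python
from typing import Any, Dict, Optional


def find_agent_id(status: Dict[str, Any]) -> Optional[str]:
    """Return the first agent ID listed in a workflow status response, or None."""
    for resource in status.get("resources_created", []):
        if resource.get("resource_type") == "agent_id":
            return resource.get("resource_id")
    return None


# Hypothetical status response; all IDs are placeholders, not real values.
sample_status = {
    "workflow_id": "example-workflow-id",
    "state": "COMPLETED",
    "resources_created": [
        {
            "workflow_step_id": "create_claude_connector",
            "resource_type": "connector_id",
            "resource_id": "example-connector-id",
        },
        {
            "workflow_step_id": "register_claude_model",
            "resource_type": "model_id",
            "resource_id": "example-model-id",
        },
        {
            "workflow_step_id": "create_query_assist_data_summary_agent",
            "resource_type": "agent_id",
            "resource_id": "example-agent-id",
        },
    ],
}

print(find_agent_id(sample_status))
```

If the helper returns `None`, the workflow may still be provisioning or may have failed, so check the `state` field of the status response before retrying.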
Step 3: Create a root agent
Next, create a root agent for the data summary agent created in the previous step:
POST /.plugins-ml-config/_doc/os_data2summary
{
"type": "os_root_agent",
"configuration": {
"agent_id": "<DATA_SUMMARY_AGENT_ID>"
}
}
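This example demonstrates a system index. In security-enabled domains, only super admins have permission to execute this code. For information about making super admin calls, see System indexes. For access permissions, contact your system administrator.
Step 4: Test the agent
You can verify that the data summary agent was created successfully by calling the agent with an example payload: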
POST /_plugins/_ml/agents/<DATA_SUMMARY_AGENT_ID>/_execute
{
"parameters": {
"sample_data":"'[{\"_index\":\"90943e30-9a47-11e8-b64d-95841ca0b247\",\"_source\":{\"referer\":\"http://twitter.com/success/gemini-9a\",\"request\":\"/beats/metricbeat/metricbeat-6.3.2-amd64.deb\",\"agent\":\"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)\",\"extension\":\"deb\",\"memory\":null,\"ip\":\"239.67.210.53\",\"index\":\"opensearch_dashboards_sample_data_logs\",\"message\":\"239.67.210.53 - - [2018-08-30T15:29:01.686Z] \\\"GET /beats/metricbeat/metricbeat-6.3.2-amd64.deb HTTP/1.1\\\" 404 2633 \\\"-\\\" \\\"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)\\\"\",\"url\":\"https://artifacts.opensearch.org/downloads/beats/metricbeat/metricbeat-6.3.2-amd64.deb\",\"tags\":\"success\",\"geo\":{\"srcdest\":\"CN:PL\",\"src\":\"CN\",\"coordinates\":{\"lat\":44.91167028,\"lon\":-108.4455092},\"dest\":\"PL\"},\"utc_time\":\"2024-09-05 15:29:01.686\",\"bytes\":2633,\"machine\":{\"os\":\"win xp\",\"ram\":21474836480},\"response\":\"404\",\"clientip\":\"239.67.210.53\",\"host\":\"artifacts.opensearch.org\",\"event\":{\"dataset\":\"sample_web_logs\"},\"phpmemory\":null,\"timestamp\":\"2024-09-05 15:29:01.686\"}}]'",
"sample_count":1,
"total_count":383,
"question":"Are there any errors in my logs?",
"ppl":"source=opensearch_dashboards_sample_data_logs| where QUERY_STRING(['response'], '4* OR 5*')"}
}
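Writing the escaped `sample_data` string by hand is error prone. The following minimal sketch builds the request body for the agent `_execute` call from a list of documents, serializing them with `json.dumps` and deriving `sample_count` from the list length. The `build_agent_payload` helper is hypothetical. The example above wraps the serialized array in single quotes; this sketch omits them on the assumption that they are not required.

```python
import json
from typing import Any, Dict, List, Optional


def build_agent_payload(
    hits: List[Dict[str, Any]],
    total_count: Optional[int] = None,
    question: Optional[str] = None,
    ppl: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble the body for POST /_plugins/_ml/agents/<agent_id>/_execute."""
    parameters: Dict[str, Any] = {
        # The agent expects the sample documents as a single JSON string.
        "sample_data": json.dumps(hits, separators=(",", ":")),
        "sample_count": len(hits),
    }
    # Include the remaining parameters only when the caller supplies them.
    if total_count is not None:
        parameters["total_count"] = total_count
    if question is not None:
        parameters["question"] = question
    if ppl is not None:
        parameters["ppl"] = ppl
    return {"parameters": parameters}


hits = [
    {
        "_index": "opensearch_dashboards_sample_data_logs",
        "_source": {"response": "404", "bytes": 2633},
    }
]
body = build_agent_payload(
    hits,
    total_count=383,
    question="Are there any errors in my logs?",
)
print(json.dumps(body, indent=2))
```

Letting `json.dumps` handle the escaping keeps nested quotation marks, such as those in the `message` field of the sample log entry, valid inside the outer JSON string.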
Generating a data summary
You can generate a data summary by calling the /api/assistant/data2summary API endpoint. The sample_count, total_count, question, and ppl parameters are optional:
POST /api/assistant/data2summary
{
"sample_data":"'[{\"_index\":\"90943e30-9a47-11e8-b64d-95841ca0b247\",\"_source\":{\"referer\":\"http://twitter.com/success/gemini-9a\",\"request\":\"/beats/metricbeat/metricbeat-6.3.2-amd64.deb\",\"agent\":\"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)\",\"extension\":\"deb\",\"memory\":null,\"ip\":\"239.67.210.53\",\"index\":\"opensearch_dashboards_sample_data_logs\",\"message\":\"239.67.210.53 - - [2018-08-30T15:29:01.686Z] \\\"GET /beats/metricbeat/metricbeat-6.3.2-amd64.deb HTTP/1.1\\\" 404 2633 \\\"-\\\" \\\"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)\\\"\",\"url\":\"https://artifacts.opensearch.org/downloads/beats/metricbeat/metricbeat-6.3.2-amd64.deb\",\"tags\":\"success\",\"geo\":{\"srcdest\":\"CN:PL\",\"src\":\"CN\",\"coordinates\":{\"lat\":44.91167028,\"lon\":-108.4455092},\"dest\":\"PL\"},\"utc_time\":\"2024-09-05 15:29:01.686\",\"bytes\":2633,\"machine\":{\"os\":\"win xp\",\"ram\":21474836480},\"response\":\"404\",\"clientip\":\"239.67.210.53\",\"host\":\"artifacts.opensearch.org\",\"event\":{\"dataset\":\"sample_web_logs\"},\"phpmemory\":null,\"timestamp\":\"2024-09-05 15:29:01.686\"}}]'",
"sample_count":1,
"total_count":383,
"question":"Are there any errors in my logs?",
"ppl":"source=opensearch_dashboards_sample_data_logs| where QUERY_STRING(['response'], '4* OR 5*')"
}
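The following table describes the Assistant data summary API parameters.
Parameter | Required/Optional | Description |
---|---|---|
sample_data | Required | A sample of the data returned by the specified query, used as input for the summary. |
question | Optional | The user's natural language question about the data, used to guide summary generation. |
ppl | Optional | The Piped Processing Language (PPL) query used to retrieve the data; in query assistance, the LLM generates this query from the user's natural language question. |
sample_count | Optional | The number of entries included in sample_data. |
total_count | Optional | The total number of entries in the full query result set. |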
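The parameter rules in the table can be checked on the client side before a request is sent. The following minimal sketch validates a request body and prepares, without sending, a POST request to the data summary endpoint using Python's standard library. The `build_data2summary_request` helper, the `http://localhost:5601` address, and the rejection of unknown parameters are assumptions made for illustration. The `osd-xsrf` header is included on the assumption that your OpenSearch Dashboards instance requires it for write requests, and authentication is omitted.

```python
import json
import urllib.request
from typing import Any, Dict

REQUIRED_FIELD = "sample_data"
OPTIONAL_FIELDS = ("sample_count", "total_count", "question", "ppl")


def build_data2summary_request(
    base_url: str, body: Dict[str, Any]
) -> urllib.request.Request:
    """Validate the body and prepare (but do not send) the POST request."""
    if not body.get(REQUIRED_FIELD):
        raise ValueError("sample_data is required")
    unknown = set(body) - {REQUIRED_FIELD, *OPTIONAL_FIELDS}
    if unknown:
        # Assumption: reject unexpected keys early to catch typos.
        raise ValueError(f"unexpected parameters: {sorted(unknown)}")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/assistant/data2summary",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "osd-xsrf": "true"},
        method="POST",
    )


request = build_data2summary_request(
    "http://localhost:5601",  # assumed local OpenSearch Dashboards address
    {
        "sample_data": json.dumps([{"response": "404"}]),
        "question": "Are there any errors in my logs?",
    },
)
print(request.get_method(), request.full_url)
```

To send the prepared request, pass it to `urllib.request.urlopen` after adding whatever credentials your deployment requires.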
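Viewing data summaries in OpenSearch Dashboards
To view data summaries in OpenSearch Dashboards, use the following steps:
- On the top menu bar, go to OpenSearch Dashboards > Discover.
- From the query language dropdown list, select PPL. You will see the generated data summary after the query text, as shown in the following image.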