Asynchronous batch ingestion

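Deprecated 3.0

This feature is deprecated. For similar functionality, use OpenSearch Data Prepper. If you would like this feature to be reinstated, create an issue in the ML Commons repository.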

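Use the Asynchronous Batch Ingestion API to ingest data into your OpenSearch cluster from files on remote file servers, such as Amazon Simple Storage Service (Amazon S3) or OpenAI. For detailed configuration steps, see Asynchronous batch ingestion.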

Endpoints

POST /_plugins/_ml/_batch_ingestion

Request body fields

The following table lists the available request fields.

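Field	Data type	Required/Optional	Description
`index_name`	String	Required	The index name.
`field_map`	Object	Required	Maps fields from the source file to specific fields in the OpenSearch index for ingestion.
`ingest_fields`	Array	Optional	Lists the fields from the source file that should be ingested directly into the OpenSearch index without any additional mapping.
`credential`	Object	Required	Contains the authentication information for accessing external data sources, such as Amazon S3 or OpenAI.
`data_source`	Object	Required	Specifies the type and location of the external files from which the data is ingested.
`data_source.type`	String	Required	Specifies the type of the external data source. Valid values are `s3` and `openAI`.
`data_source.source`	Array	Required	Specifies one or more file locations from which the data is ingested. For `s3`, specify the file path to the Amazon S3 bucket (for example, `["s3://offlinebatch/output/sagemaker_batch.json.out"]`). For `openAI`, specify the file IDs of the input or output files (for example, `["file-<your output file id>", "file-<your input file id>", "file-<your other file>"]`).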
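
The required and optional fields in the table can be checked on the client before a request is sent. The following is a minimal Python sketch under that assumption. The helper name `build_batch_ingest_body` is hypothetical and is not part of OpenSearch or ML Commons. It only assembles a dictionary that mirrors the table and rejects a `data_source.type` other than `s3` or `openAI`.

```python
import json

# Valid values for data_source.type, as listed in the request fields table.
VALID_SOURCE_TYPES = {"s3", "openAI"}


def build_batch_ingest_body(index_name, field_map, credential,
                            source_type, sources, ingest_fields=None):
    """Assemble a request body (hypothetical helper), checking required fields."""
    if not index_name:
        raise ValueError("index_name is required")
    if not field_map:
        raise ValueError("field_map is required")
    if not credential:
        raise ValueError("credential is required")
    if source_type not in VALID_SOURCE_TYPES:
        raise ValueError(
            f"data_source.type must be one of {sorted(VALID_SOURCE_TYPES)}"
        )
    if not sources:
        raise ValueError("data_source.source needs at least one file location")

    body = {
        "index_name": index_name,
        "field_map": field_map,
        "credential": credential,
        "data_source": {"type": source_type, "source": list(sources)},
    }
    if ingest_fields:  # optional field, left out when not provided
        body["ingest_fields"] = list(ingest_fields)
    return body


body = build_batch_ingest_body(
    index_name="my-nlp-index",
    field_map={"chapter": "$.content[0]", "_id": "$.id"},
    credential={
        "region": "us-east-1",
        "access_key": "<your access key>",
        "secret_key": "<your secret key>",
        "session_token": "<your session token>",
    },
    source_type="s3",
    sources=["s3://offlinebatch/output/sagemaker_batch.json.out"],
)
print(json.dumps(body, indent=2))
```

The server performs its own validation, so a check like this only catches obvious omissions early.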

Example request: Ingesting a single file

POST /_plugins/_ml/_batch_ingestion
{
  "index_name": "my-nlp-index",
  "field_map": {
    "chapter": "$.content[0]",
    "title": "$.content[1]",
    "chapter_embedding": "$.SageMakerOutput[0]",
    "title_embedding": "$.SageMakerOutput[1]",
    "_id": "$.id"
  },
  "ingest_fields": ["$.id"],
  "credential": {
    "region": "us-east-1",
    "access_key": "<your access key>",
    "secret_key": "<your secret key>",
    "session_token": "<your session token>"
  },
  "data_source": {
    "type": "s3",
    "source": ["s3://offlinebatch/output/sagemaker_batch.json.out"]
  }
}
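
The same single-file request could be sent from a script rather than a REST console. The sketch below uses only the Python standard library and builds the request without sending it. The base URL `https://localhost:9200` and the function name `make_batch_ingest_request` are assumptions for illustration. Authentication and TLS settings depend on your cluster and are not shown.

```python
import json
import urllib.request

# Assumed cluster address, for illustration only.
BASE_URL = "https://localhost:9200"
ENDPOINT = "/_plugins/_ml/_batch_ingestion"


def make_batch_ingest_request(body, base_url=BASE_URL):
    """Build (but do not send) a POST request carrying the JSON body."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = {
    "index_name": "my-nlp-index",
    "field_map": {
        "chapter": "$.content[0]",
        "title": "$.content[1]",
        "chapter_embedding": "$.SageMakerOutput[0]",
        "title_embedding": "$.SageMakerOutput[1]",
        "_id": "$.id",
    },
    "ingest_fields": ["$.id"],
    "credential": {
        "region": "us-east-1",
        "access_key": "<your access key>",
        "secret_key": "<your secret key>",
        "session_token": "<your session token>",
    },
    "data_source": {
        "type": "s3",
        "source": ["s3://offlinebatch/output/sagemaker_batch.json.out"],
    },
}

request = make_batch_ingest_request(body)
print(request.get_method(), request.full_url)

# To send it (requires a reachable cluster and suitable credentials):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```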

Example request: Ingesting multiple files

POST /_plugins/_ml/_batch_ingestion
{
  "index_name": "my-nlp-index-openai",
  "field_map": {
    "question": "source[1].$.body.input[0]",
    "answer": "source[1].$.body.input[1]",
    "question_embedding":"source[0].$.response.body.data[0].embedding",
    "answer_embedding":"source[0].$.response.body.data[1].embedding",
    "_id": ["source[0].$.custom_id", "source[1].$.custom_id"]
  },
  "ingest_fields": ["source[2].$.custom_field1", "source[2].$.custom_field2"],
  "credential": {
    "openAI_key": "<your openAI key>"
  },
  "data_source": {
    "type": "openAI",
    "source": ["file-<your output file id>", "file-<your input file id>", "file-<your other file>"]
  }
}
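
In the multi-file request, each mapping carries a `source[n]` prefix. Judging from the example, `n` appears to be the position of a file in the `data_source.source` array: the output file at index 0 supplies `response.body` values, and the input file at index 1 supplies `body.input` values. The sketch below splits such a reference into its file index and JSONPath part under that assumed reading. The function `split_source_ref` is hypothetical and does not reflect how ML Commons parses the field internally.

```python
import re

# Matches references such as "source[1].$.body.input[0]".
_SOURCE_PREFIX = re.compile(r"^source\[(\d+)\]\.(\$.*)$")


def split_source_ref(ref):
    """Return (file_index, json_path); file_index is None without a prefix."""
    match = _SOURCE_PREFIX.match(ref)
    if match:
        return int(match.group(1)), match.group(2)
    return None, ref


sources = [
    "file-<your output file id>",
    "file-<your input file id>",
    "file-<your other file>",
]
field_map = {
    "question": "source[1].$.body.input[0]",
    "question_embedding": "source[0].$.response.body.data[0].embedding",
}

# Show which file each mapped field would be read from.
for field, ref in field_map.items():
    index, path = split_source_ref(ref)
    print(f"{field}: {path} from {sources[index]}")
```

Single-file references such as `$.id` have no prefix, which is why the helper returns `None` for the index in that case.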

Example response

{
  "task_id": "cbsPlpEBMHcagzGbOQOx",
  "task_type": "BATCH_INGEST",
  "status": "CREATED"
}
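
The response returns a `task_id` with a status of `CREATED`, which suggests that ingestion continues in the background after the call returns. A client would typically keep the task ID and check on it later. The sketch below parses the response and derives a lookup path, assuming the ML Commons Get Task API at `/_plugins/_ml/tasks/<task_id>` is used for that check. The helper name `task_lookup_path` is hypothetical.

```python
import json

# The example response from this page, as it might arrive over HTTP.
response_text = """
{
  "task_id": "cbsPlpEBMHcagzGbOQOx",
  "task_type": "BATCH_INGEST",
  "status": "CREATED"
}
"""


def task_lookup_path(response):
    """Return the assumed Get Task path for a batch ingestion response."""
    if response.get("task_type") != "BATCH_INGEST":
        raise ValueError("not a batch ingestion task response")
    return f"/_plugins/_ml/tasks/{response['task_id']}"


response = json.loads(response_text)
path = task_lookup_path(response)
print(response["status"], path)
```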
