
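Generating embeddings for arrays of objects

This tutorial shows you how to generate embeddings for arrays of objects. For more information, see Generating embeddings automatically.

Replace the placeholders beginning with the prefix your_ with your own values.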

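Step 1: Register an embedding model

In this tutorial, you will use the Amazon Titan Text Embeddings model hosted on Amazon Bedrock.

First, follow the Amazon Bedrock Titan blueprint example to register and deploy the model.

Test the model, providing the model ID: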

POST /_plugins/_ml/models/your_embedding_model_id/_predict
{
    "parameters": {
        "inputText": "hello world"
    }
}

The response contains the inference results:

{
  "inference_results": [
    {
      "output": [
        {
          "name": "sentence_embedding",
          "data_type": "FLOAT32",
          "shape": [ 1536 ],
          "data": [0.7265625, -0.0703125, 0.34765625, ...]
        }
      ],
      "status_code": 200
    }
  ]
}
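
As a quick sanity check, the length of the returned vector should match the dimension you configure for the knn_vector field later in this tutorial (1536). The following Python sketch pulls the vector out of a response shaped like the one above. The helper name extract_embedding is hypothetical, and the sample response uses a shortened placeholder vector rather than real model output.

```python
from typing import Any, Dict, List


def extract_embedding(response: Dict[str, Any]) -> List[float]:
    """Return the first sentence_embedding vector from a Predict API response."""
    for result in response.get("inference_results", []):
        for output in result.get("output", []):
            if output.get("name") == "sentence_embedding":
                data = output["data"]
                # Cross-check the declared shape against the actual vector length.
                declared = output.get("shape", [len(data)])[0]
                if declared != len(data):
                    raise ValueError(
                        f"shape declares {declared} values but data has {len(data)}"
                    )
                return data
    raise KeyError("no sentence_embedding output found in response")


# Placeholder response with a 3-value vector (assumption: shortened for
# illustration; the model in this tutorial returns 1536 values).
sample_response = {
    "inference_results": [
        {
            "output": [
                {
                    "name": "sentence_embedding",
                    "data_type": "FLOAT32",
                    "shape": [3],
                    "data": [0.7265625, -0.0703125, 0.34765625],
                }
            ],
            "status_code": 200,
        }
    ]
}

embedding = extract_embedding(sample_response)
print(len(embedding))
```

With a real response, compare len(embedding) against the dimension used in the index mapping before ingesting any data.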

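Step 2: Create an ingest pipeline

Follow the next set of steps to create an ingest pipeline for generating embeddings.

Step 2.1: Create a vector index

First, create a vector index: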

PUT my_books
{
  "settings" : {
      "index.knn" : "true",
      "default_pipeline": "bedrock_embedding_pipeline"
  },
  "mappings": {
    "properties": {
      "books": {
        "type": "nested",
        "properties": {
          "title_embedding": {
            "type": "knn_vector",
            "dimension": 1536
          },
          "title": {
            "type": "text"
          },
          "description": {
            "type": "text"
          }
        }
      }
    }
  }
}
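
The dimension of title_embedding has to match the length of the vectors the model returns (1536 here). If you create the index from a script, one way to keep the two values in sync is to build the request body from a single constant. This is a minimal sketch: build_books_index_body is a hypothetical helper, and sending the body to the cluster is left to whatever HTTP client you use.

```python
import json
from typing import Any, Dict


def build_books_index_body(dimension: int, pipeline: str) -> Dict[str, Any]:
    """Build the body for PUT my_books with a nested knn_vector field."""
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    return {
        "settings": {
            "index.knn": "true",
            "default_pipeline": pipeline,
        },
        "mappings": {
            "properties": {
                "books": {
                    "type": "nested",
                    "properties": {
                        "title_embedding": {
                            "type": "knn_vector",
                            "dimension": dimension,
                        },
                        "title": {"type": "text"},
                        "description": {"type": "text"},
                    },
                }
            }
        },
    }


body = build_books_index_body(1536, "bedrock_embedding_pipeline")
print(json.dumps(body, indent=2))
```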

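Step 2.2: Create an ingest pipeline

Then create an ingest pipeline that generates an embedding for each element of the array.

This pipeline contains one processor:

  • text_embedding processor: Converts the value of each books.title field to an embedding and stores it in the title_embedding field of the same object.

To create the pipeline, send the following request: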

PUT _ingest/pipeline/bedrock_embedding_pipeline
{
  "processors": [
    {
      "text_embedding": {
        "model_id": "your_embedding_model_id",
        "field_map": {
          "books.title": "title_embedding"
        }
      }
    }
  ]
}

Step 2.3: Simulate the pipeline

First, you'll test the pipeline on an array that contains two book objects, both with a title field:

POST _ingest/pipeline/bedrock_embedding_pipeline/_simulate
{
  "docs": [
    {
      "_index": "my_books",
      "_id": "1",
      "_source": {
        "books": [
          {
            "title": "first book",
            "description": "This is first book"
          },
          {
            "title": "second book",
            "description": "This is second book"
          }
        ]
      }
    }
  ]
}

The response contains the generated embeddings for both objects in their title_embedding fields:

{
  "docs": [
    {
      "doc": {
        "_index": "my_books",
        "_id": "1",
        "_source": {
          "books": [
            {
              "title": "first book",
              "title_embedding": [-1.1015625, 0.65234375, 0.7578125, ...],
              "description": "This is first book"
            },
            {
              "title": "second book",
              "title_embedding": [-0.65234375, 0.21679688, 0.7265625, ...],
              "description": "This is second book"
            }
          ]
        },
        "_ingest": {
          "_value": null,
          "timestamp": "2024-05-28T16:16:50.538929413Z"
        }
      }
    }
  ]
}

Next, you'll test the pipeline on an array that contains two book objects, one with a title field and one without:

POST _ingest/pipeline/bedrock_embedding_pipeline/_simulate
{
  "docs": [
    {
      "_index": "my_books",
      "_id": "1",
      "_source": {
        "books": [
          {
            "title": "first book",
            "description": "This is first book"
          },
          {
            "description": "This is second book"
          }
        ]
      }
    }
  ]
}

The response contains a generated embedding only for the object that has a title field:

{
  "docs": [
    {
      "doc": {
        "_index": "my_books",
        "_id": "1",
        "_source": {
          "books": [
            {
              "title": "first book",
              "title_embedding": [-1.1015625, 0.65234375, 0.7578125, ...],
              "description": "This is first book"
            },
            {
              "description": "This is second book"
            }
          ]
        },
        "_ingest": {
          "_value": null,
          "timestamp": "2024-05-28T16:19:03.942644042Z"
        }
      }
    }
  ]
}
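
The two simulations suggest the following behavior: each element of books that has a title gets a title_embedding, and elements without a title pass through unchanged. The Python sketch below imitates that behavior locally so you can reason about it without a cluster. It is not how the text_embedding processor is implemented. fake_embed is a made-up stand-in for the model call and returns meaningless numbers.

```python
import copy
from typing import Any, Callable, Dict, List


def fake_embed(text: str) -> List[float]:
    """Hypothetical stand-in for the embedding model; values are not meaningful."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def embed_titles(
    source: Dict[str, Any],
    embed: Callable[[str], List[float]] = fake_embed,
) -> Dict[str, Any]:
    """Return a copy of source where every book with a title gains title_embedding."""
    result = copy.deepcopy(source)
    for book in result.get("books", []):
        title = book.get("title")
        if title:  # elements without a title are left untouched
            book["title_embedding"] = embed(title)
    return result


doc = {
    "books": [
        {"title": "first book", "description": "This is first book"},
        {"description": "This is second book"},
    ]
}

processed = embed_titles(doc)
for book in processed["books"]:
    # Prints each description and whether an embedding was added.
    print(book["description"], "->", "title_embedding" in book)
```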

Step 2.4: Test data ingestion

Ingest one document:

PUT my_books/_doc/1
{
  "books": [
    {
      "title": "first book",
      "description": "This is first book"
    },
    {
      "title": "second book",
      "description": "This is second book"
    }
  ]
}

Get the document:

GET my_books/_doc/1

The response contains the generated embeddings:

{
  "_index": "my_books",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "books": [
      {
        "description": "This is first book",
        "title": "first book",
        "title_embedding": [-1.1015625, 0.65234375, 0.7578125, ...]
      },
      {
        "description": "This is second book",
        "title": "second book",
        "title_embedding": [-0.65234375, 0.21679688, 0.7265625, ...]
      }
    ]
  }
}
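
If you verify ingestion from a script, a small check on the Get Document response can flag books that ended up without an embedding or with a vector of unexpected length. The following is a sketch under assumptions: books_without_embedding is a hypothetical helper, and the sample response uses 3-value placeholder vectors instead of the 1536 values the model returns.

```python
from typing import Any, Dict, List


def books_without_embedding(get_response: Dict[str, Any], dimension: int) -> List[int]:
    """Return positions of books whose title_embedding is absent or has the wrong length."""
    books = get_response.get("_source", {}).get("books", [])
    return [
        position
        for position, book in enumerate(books)
        if len(book.get("title_embedding", [])) != dimension
    ]


# Shortened placeholder response (assumption: 3-value vectors for illustration).
sample = {
    "_index": "my_books",
    "_id": "1",
    "found": True,
    "_source": {
        "books": [
            {
                "title": "first book",
                "title_embedding": [-1.1015625, 0.65234375, 0.7578125],
            },
            {"description": "This is fourth book"},
        ]
    },
}

print(books_without_embedding(sample, dimension=3))
```

A book without a title is expected to appear in the result, so treat the output as a list to review rather than as a list of errors.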

You can also ingest several documents in bulk and test the generated embeddings by calling the Get Document API:

POST _bulk
{ "index" : { "_index" : "my_books" } }
{ "books" : [{"title": "first book", "description": "This is first book"}, {"title": "second book", "description": "This is second book"}] }
{ "index" : { "_index" : "my_books" } }
{ "books" : [{"title": "third book", "description": "This is third book"}, {"description": "This is fourth book"}] }
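
When the documents come from a program rather than a console, the bulk body has to be assembled as newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end. The Python sketch below builds such a body for the two documents above. build_bulk_body is a hypothetical helper, and it only produces the string; posting it to _bulk is left to your HTTP client.

```python
import json
from typing import Any, Dict, Iterable


def build_bulk_body(index: str, docs: Iterable[Dict[str, Any]]) -> str:
    """Serialize documents as NDJSON index actions for POST _bulk."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # Every line, including the last one, is terminated by a newline.
    return "".join(line + "\n" for line in lines)


docs = [
    {
        "books": [
            {"title": "first book", "description": "This is first book"},
            {"title": "second book", "description": "This is second book"},
        ]
    },
    {
        "books": [
            {"title": "third book", "description": "This is third book"},
            {"description": "This is fourth book"},
        ]
    },
]

bulk_body = build_bulk_body("my_books", docs)
print(bulk_body, end="")
```

Because the action lines do not set _id, the document IDs are generated by the cluster; read them from the bulk response before calling the Get Document API.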