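Generating embeddings for arrays of objects
This tutorial shows you how to generate embeddings for arrays of objects. For more information, see Auto-generating embeddings.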
Replace the placeholders beginning with the prefix your_ with your own values.
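Step 1: Register an embedding model
For this tutorial, you'll use the Amazon Titan Text Embeddings model hosted on Amazon Bedrock.
First, follow the Amazon Bedrock Titan blueprint example to register and deploy the model.
Test the model, providing the model ID: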
POST /_plugins/_ml/models/your_embedding_model_id/_predict
{
  "parameters": {
    "inputText": "hello world"
  }
}
The response contains the inference results:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "sentence_embedding",
          "data_type": "FLOAT32",
          "shape": [ 1536 ],
          "data": [0.7265625, -0.0703125, 0.34765625, ...]
        }
      ],
      "status_code": 200
    }
  ]
}
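The part of this response you need later is the number of values in data (reported by shape), because it must equal the dimension of the vector field you create in step 2.1. The following minimal Python sketch assumes you have already parsed the response body into a dict; the helper name extract_embedding and the shortened three-value sample are hypothetical.

```python
from typing import Any


def extract_embedding(response: dict[str, Any]) -> list[float]:
    """Return the vector from a _predict response shaped like the one above."""
    outputs = response["inference_results"][0]["output"]
    # Look the output up by name instead of relying on its position.
    for output in outputs:
        if output.get("name") == "sentence_embedding":
            data = output["data"]
            expected = output["shape"][0]
            # The vector length must match the declared shape.
            if len(data) != expected:
                raise ValueError(f"expected {expected} values, got {len(data)}")
            return data
    raise KeyError("no sentence_embedding output in response")


# Hypothetical, shortened response: 3 dimensions instead of 1536.
sample = {
    "inference_results": [
        {
            "output": [
                {
                    "name": "sentence_embedding",
                    "data_type": "FLOAT32",
                    "shape": [3],
                    "data": [0.7265625, -0.0703125, 0.34765625],
                }
            ],
            "status_code": 200,
        }
    ]
}
print(len(extract_embedding(sample)))  # prints 3
```

For the real Titan response, the length returned here is 1536, which is the value used for dimension in the index mapping.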
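Step 2: Create an ingest pipeline
Follow the next set of steps to create an ingest pipeline for generating embeddings.
Step 2.1: Create a vector index
First, create a vector index. The dimension of the title_embedding field (1536) must match the size of the vectors returned by the model in step 1: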
PUT my_books
{
  "settings": {
    "index.knn": "true",
    "default_pipeline": "bedrock_embedding_foreach_pipeline"
  },
  "mappings": {
    "properties": {
      "books": {
        "type": "nested",
        "properties": {
          "title_embedding": {
            "type": "knn_vector",
            "dimension": 1536
          },
          "title": {
            "type": "text"
          },
          "description": {
            "type": "text"
          }
        }
      }
    }
  }
}
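If you create this index from a script, building the body from parameters keeps the vector dimension and the default pipeline name in one place, which makes a mismatch with the model output harder to introduce. A minimal Python sketch; build_index_body is a hypothetical helper, and the pipeline name is passed in rather than assumed.

```python
import json
from typing import Any


def build_index_body(dimension: int, pipeline: str) -> dict[str, Any]:
    """Build the my_books index body; dimension must equal the model's vector size."""
    return {
        "settings": {
            "index.knn": "true",
            # Documents indexed without an explicit pipeline go through this one.
            "default_pipeline": pipeline,
        },
        "mappings": {
            "properties": {
                "books": {
                    # nested indexes each array element as its own hidden document,
                    # so a title stays paired with its own embedding.
                    "type": "nested",
                    "properties": {
                        "title_embedding": {"type": "knn_vector", "dimension": dimension},
                        "title": {"type": "text"},
                        "description": {"type": "text"},
                    },
                }
            }
        },
    }


body = build_index_body(1536, "bedrock_embedding_foreach_pipeline")
print(json.dumps(body, indent=2))
```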
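Step 2.2: Create an ingest pipeline
First, create a child ingest pipeline that generates an embedding for a single element of the array.
This pipeline contains three processors:
set processor: The text_embedding processor cannot read the _ingest._value.title field directly, so you must copy its value into a temporary field that does not yet exist in the document.
text_embedding processor: Converts the value of the temporary field into an embedding.
remove processor: Removes the temporary field.
To create such a pipeline, send the following request:
PUT _ingest/pipeline/bedrock_embedding_pipeline
{
  "processors": [
    {
      "set": {
        "field": "title_tmp",
        "value": "{{_ingest._value.title}}"
      }
    },
    {
      "text_embedding": {
        "model_id": "your_embedding_model_id",
        "field_map": {
          "title_tmp": "_ingest._value.title_embedding"
        }
      }
    },
    {
      "remove": {
        "field": "title_tmp"
      }
    }
  ]
}
Then create an ingest pipeline with a foreach processor that applies bedrock_embedding_pipeline to each element of the books array. Setting ignore_failure to true means that a failure in the foreach processor does not stop the document from being ingested:
PUT _ingest/pipeline/bedrock_embedding_foreach_pipeline
{
  "description": "Test nested embeddings",
  "processors": [
    {
      "foreach": {
        "field": "books",
        "processor": {
          "pipeline": {
            "name": "bedrock_embedding_pipeline"
          }
        },
        "ignore_failure": true
      }
    }
  ]
}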
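To see how the processors cooperate on an array, the following Python sketch imitates them locally on a plain dict: a foreach-style loop walks the books array, and for each element the set, text_embedding, and remove steps run in order. It is an illustration, not OpenSearch's implementation. fake_embed is a hypothetical toy embedder, and treating an element without a title as "leave it unchanged" is an assumption chosen to mirror the simulation results in step 2.3.

```python
import copy
from typing import Any, Callable

# Hypothetical stand-in for the Bedrock model: any function from text to a vector.
Embedder = Callable[[str], list[float]]


def fake_embed(text: str) -> list[float]:
    """Toy embedding for local experiments (not what Titan returns)."""
    return [float(len(text)), float(text.count(" "))]


def run_foreach_pipeline(source: dict[str, Any], embed: Embedder) -> dict[str, Any]:
    """Local approximation of the foreach pipeline applied to one _source."""
    doc = copy.deepcopy(source)  # leave the caller's dict untouched
    # foreach: visit each element of books; `value` plays the role of _ingest._value.
    for value in doc.get("books", []):
        if "title" not in value:
            # Assumption: elements without a title are left unchanged,
            # matching the second simulation in step 2.3.
            continue
        doc["title_tmp"] = value["title"]                   # set processor
        value["title_embedding"] = embed(doc["title_tmp"])  # text_embedding processor
        del doc["title_tmp"]                                # remove processor
    return doc


result = run_foreach_pipeline(
    {"books": [{"title": "first book", "description": "This is first book"},
               {"description": "This is second book"}]},
    fake_embed,
)
print(result)
```

The temporary field lives at the top level of the document while each element is processed, which is why the remove step is needed to keep it out of the indexed _source.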
Step 2.3: Simulate the pipeline
First, test the pipeline on an array containing two book objects, both of which have a title field:
POST _ingest/pipeline/bedrock_embedding_foreach_pipeline/_simulate
{
  "docs": [
    {
      "_index": "my_books",
      "_id": "1",
      "_source": {
        "books": [
          {
            "title": "first book",
            "description": "This is first book"
          },
          {
            "title": "second book",
            "description": "This is second book"
          }
        ]
      }
    }
  ]
}
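If you want to try many different book arrays, generating the _simulate body programmatically saves retyping the wrapper around each _source. A minimal Python sketch; build_simulate_body is a hypothetical helper, the sequential IDs are an arbitrary choice, and sending the body is left to whatever HTTP client you use.

```python
import json
from typing import Any


def build_simulate_body(index: str, sources: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap each _source in the docs array accepted by the _simulate endpoint."""
    return {
        "docs": [
            # Sequential string IDs ("1", "2", ...) are an arbitrary choice for testing.
            {"_index": index, "_id": str(i), "_source": source}
            for i, source in enumerate(sources, start=1)
        ]
    }


body = build_simulate_body("my_books", [
    {"books": [{"title": "first book", "description": "This is first book"},
               {"title": "second book", "description": "This is second book"}]},
])
print(json.dumps(body, indent=2))
```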
The response contains the generated embeddings in the title_embedding field of both objects:
{
  "docs": [
    {
      "doc": {
        "_index": "my_books",
        "_id": "1",
        "_source": {
          "books": [
            {
              "title": "first book",
              "title_embedding": [-1.1015625, 0.65234375, 0.7578125, ...],
              "description": "This is first book"
            },
            {
              "title": "second book",
              "title_embedding": [-0.65234375, 0.21679688, 0.7265625, ...],
              "description": "This is second book"
            }
          ]
        },
        "_ingest": {
          "_value": null,
          "timestamp": "2024-05-28T16:16:50.538929413Z"
        }
      }
    }
  ]
}
Next, test the pipeline on an array containing two book objects, one with a title field and one without:
POST _ingest/pipeline/bedrock_embedding_foreach_pipeline/_simulate
{
  "docs": [
    {
      "_index": "my_books",
      "_id": "1",
      "_source": {
        "books": [
          {
            "title": "first book",
            "description": "This is first book"
          },
          {
            "description": "This is second book"
          }
        ]
      }
    }
  ]
}
The response contains a generated embedding only for the object that has a title field:
{
  "docs": [
    {
      "doc": {
        "_index": "my_books",
        "_id": "1",
        "_source": {
          "books": [
            {
              "title": "first book",
              "title_embedding": [-1.1015625, 0.65234375, 0.7578125, ...],
              "description": "This is first book"
            },
            {
              "description": "This is second book"
            }
          ]
        },
        "_ingest": {
          "_value": null,
          "timestamp": "2024-05-28T16:19:03.942644042Z"
        }
      }
    }
  ]
}
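When the arrays get longer, checking simulation output by eye becomes tedious. The following Python sketch assumes you have parsed the _simulate response into a dict and that every entry in docs contains a doc key (that is, the simulation succeeded); it lists every book that has a title but no title_embedding. books_missing_embeddings is a hypothetical helper name.

```python
from typing import Any


def books_missing_embeddings(response: dict[str, Any]) -> list[tuple[int, int]]:
    """Return (doc position, book position) for each titled book lacking title_embedding."""
    missing = []
    for d, entry in enumerate(response["docs"]):
        # Assumes a successful simulation, so every entry has a "doc" key.
        books = entry["doc"]["_source"].get("books", [])
        for b, book in enumerate(books):
            if "title" in book and "title_embedding" not in book:
                missing.append((d, b))
    return missing


# Shortened copy of the response above (vector cut to three values).
response = {
    "docs": [{"doc": {"_index": "my_books", "_id": "1", "_source": {"books": [
        {"title": "first book", "title_embedding": [-1.1015625, 0.65234375, 0.7578125],
         "description": "This is first book"},
        {"description": "This is second book"},
    ]}}}]
}
print(books_missing_embeddings(response))  # prints []
```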
Step 2.4: Test data ingestion
Ingest one document:
PUT my_books/_doc/1
{
  "books": [
    {
      "title": "first book",
      "description": "This is first book"
    },
    {
      "title": "second book",
      "description": "This is second book"
    }
  ]
}
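If you drive ingestion from a script rather than a console, the same request can be prepared with Python's standard library. A minimal sketch, assuming a cluster reachable at the hypothetical address http://localhost:9200 without authentication; build_index_request is a hypothetical helper that only prepares the request. Because my_books defines a default pipeline, the request does not need a pipeline parameter.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

# Hypothetical cluster address; change it and add authentication for your cluster.
BASE_URL = "http://localhost:9200"


def build_index_request(index: str, doc_id: str, source: dict[str, Any]) -> urllib.request.Request:
    """Prepare, but do not send, PUT <index>/_doc/<doc_id> with a JSON body."""
    url = f"{BASE_URL}/{urllib.parse.quote(index)}/_doc/{urllib.parse.quote(doc_id, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(source).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_index_request("my_books", "1", {"books": [
    {"title": "first book", "description": "This is first book"},
    {"title": "second book", "description": "This is second book"},
]})
# To actually send it against a running cluster:
#     with urllib.request.urlopen(req) as resp:
#         print(json.load(resp))
```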
Get the document:
GET my_books/_doc/1
The response contains the generated embeddings:
{
  "_index": "my_books",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "books": [
      {
        "description": "This is first book",
        "title": "first book",
        "title_embedding": [-1.1015625, 0.65234375, 0.7578125, ...]
      },
      {
        "description": "This is second book",
        "title": "second book",
        "title_embedding": [-0.65234375, 0.21679688, 0.7265625, ...]
      }
    ]
  }
}
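You can also ingest several documents in bulk and then check the generated embeddings by calling the Get Document API. Because the following request does not specify document IDs, use the IDs returned in the Bulk API response: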
POST _bulk
{ "index" : { "_index" : "my_books" } }
{ "books" : [{"title": "first book", "description": "This is first book"}, {"title": "second book", "description": "This is second book"}] }
{ "index" : { "_index" : "my_books" } }
{ "books" : [{"title": "third book", "description": "This is third book"}, {"description": "This is fourth book"}] }
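The Bulk API body is newline-delimited JSON: each action line and each source line must occupy exactly one line, and the body must end with a newline. A minimal Python sketch that builds such a body for my_books; build_bulk_body is a hypothetical helper name, and, as in the request above, no _id is set so that OpenSearch generates the IDs.

```python
import json
from typing import Any


def build_bulk_body(index: str, sources: list[dict[str, Any]]) -> str:
    """Return an NDJSON body for POST _bulk: one action line plus one source line per document."""
    lines = []
    for source in sources:
        # json.dumps without indent keeps each object on a single line.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(source))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("my_books", [
    {"books": [{"title": "first book", "description": "This is first book"},
               {"title": "second book", "description": "This is second book"}]},
    {"books": [{"title": "third book", "description": "This is third book"},
               {"description": "This is fourth book"}]},
])
print(body, end="")
```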