Optimizing vector search using Cohere compressed embeddings
This tutorial shows you how to optimize vector search using Cohere compressed embeddings. These embeddings enable more efficient storage and faster retrieval of vector representations, making them ideal for large-scale search applications.
This tutorial is compatible with OpenSearch 2.17 and later, except for Using a template query and a search pipeline in Step 4: Search the index, which requires version 2.19 or later.
This tutorial uses the Cohere Embed Multilingual v3 model on Amazon Bedrock. For more information about using Cohere compressed embeddings on Amazon Bedrock, see this blog post.
In this tutorial, you'll use several OpenSearch components, including connectors, ML inference processors, ingest pipelines, search pipelines, and vector indexes.
Replace the placeholders prefixed with your_ with your own values.
Step 1: Configure an embedding model
Follow these steps to create a connector to Amazon Bedrock in order to access the Cohere Embed model.
Step 1.1: Create a connector
Use this blueprint to create a connector for the embedding model. For more information about creating connectors, see Connectors.
Because you'll be using the ML inference processor in this tutorial, you don't need to specify preprocessing or postprocessing functions in the connector.
To create the connector, send the following request. The "embedding_types": ["int8"] parameter requests 8-bit integer quantized embeddings from the Cohere model. This setting compresses 32-bit float embeddings into 8-bit integers, reducing storage requirements and speeding up computation; for example, a 1,024-dimensional embedding shrinks from 4,096 bytes as 32-bit floats to 1,024 bytes as 8-bit integers. The precision trade-off is usually negligible for search tasks. These quantized embeddings are compatible with OpenSearch k-NN indexes, which support byte vectors:
POST _plugins/_ml/connectors/_create
{
"name": "Amazon Bedrock Connector: Cohere embed-multilingual-v3",
"description": "Test connector for Amazon Bedrock Cohere embed-multilingual-v3",
"version": 1,
"protocol": "aws_sigv4",
"credential": {
"access_key": "your_aws_access_key",
"secret_key": "your_aws_secret_key",
"session_token": "your_aws_session_token"
},
"parameters": {
"region": "your_aws_region",
"service_name": "bedrock",
"truncate": "END",
"input_type": "search_document",
"model": "cohere.embed-multilingual-v3",
"embedding_types": ["int8"]
},
"actions": [
{
"action_type": "predict",
"method": "POST",
"headers": {
"x-amz-content-sha256": "required",
"content-type": "application/json"
},
"url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/${parameters.model}/invoke",
"request_body": "{ \"texts\": ${parameters.texts}, \"truncate\": \"${parameters.truncate}\", \"input_type\": \"${parameters.input_type}\", \"embedding_types\": ${parameters.embedding_types} }"
}
]
}
For more information about the model parameters, see the Cohere documentation and the Amazon Bedrock documentation.
The response contains the connector ID:
{
"connector_id": "AOP0OZUB3JwAtE25PST0"
}
Note the connector ID; you'll use it in the next step.
Step 1.2: Register the model
Next, register the model using the connector created in the previous step. The interface parameter is optional. If the model doesn't require a specific interface configuration, set this parameter to an empty object: "interface": {}
POST _plugins/_ml/models/_register?deploy=true
{
"name": "Bedrock Cohere embed-multilingual-v3",
"version": "1.0",
"function_name": "remote",
"description": "Bedrock Cohere embed-multilingual-v3",
"connector_id": "AOP0OZUB3JwAtE25PST0",
"interface": {
"input": "{\n \"type\": \"object\",\n \"properties\": {\n \"parameters\": {\n \"type\": \"object\",\n \"properties\": {\n \"texts\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"string\"\n }\n },\n \"embedding_types\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"string\",\n \"enum\": [\"float\", \"int8\", \"uint8\", \"binary\", \"ubinary\"]\n }\n },\n \"truncate\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"string\",\n \"enum\": [\"NONE\", \"START\", \"END\"]\n }\n },\n \"input_type\": {\n \"type\": \"string\",\n \"enum\": [\"search_document\", \"search_query\", \"classification\", \"clustering\"]\n }\n },\n \"required\": [\"texts\"]\n }\n },\n \"required\": [\"parameters\"]\n}",
"output": "{\n \"type\": \"object\",\n \"properties\": {\n \"inference_results\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"object\",\n \"properties\": {\n \"output\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"object\",\n \"properties\": {\n \"name\": {\n \"type\": \"string\"\n },\n \"dataAsMap\": {\n \"type\": \"object\",\n \"properties\": {\n \"id\": {\n \"type\": \"string\",\n \"format\": \"uuid\"\n },\n \"texts\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"string\"\n }\n },\n \"embeddings\": {\n \"type\": \"object\",\n \"properties\": {\n \"binary\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"number\"\n }\n }\n },\n \"float\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"number\"\n }\n }\n },\n \"int8\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"number\"\n }\n }\n },\n \"ubinary\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"number\"\n }\n }\n },\n \"uint8\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"number\"\n }\n }\n }\n }\n },\n \"response_type\": {\n \"type\": \"string\"\n }\n },\n \"required\": [\"embeddings\"]\n }\n },\n \"required\": [\"name\", \"dataAsMap\"]\n }\n },\n \"status_code\": {\n \"type\": \"integer\"\n }\n },\n \"required\": [\"output\", \"status_code\"]\n }\n }\n },\n \"required\": [\"inference_results\"]\n}"
}
}
For more information, see the model interface documentation.
The response contains the model ID:
{
"task_id": "COP0OZUB3JwAtE25yiQr",
"status": "CREATED",
"model_id": "t64OPpUBX2k07okSZc2n"
}
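Model registration and deployment run asynchronously as a task. If you want to verify that deployment completed, you can check the task status using the returned task ID (a standard ML Commons Tasks API call):
GET _plugins/_ml/tasks/COP0OZUB3JwAtE25yiQr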
To test the model, send the following request:
POST _plugins/_ml/models/t64OPpUBX2k07okSZc2n/_predict
{
"parameters": {
"texts": ["Say this is a test"],
"embedding_types": [ "int8" ]
}
}
The response contains the generated embeddings:
{
"inference_results": [
{
"output": [
{
"name": "response",
"dataAsMap": {
"id": "db07a08c-283d-4da5-b0c5-a9a54ef35d01",
"texts": [
"Say this is a test"
],
"embeddings": {
"int8": [
[
-26.0,
31.0,
...
]
]
},
"response_type": "embeddings_by_type"
}
}
],
"status_code": 200
}
]
}
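Because embedding_types is defined as an array in both the connector and the model interface, you can request more than one embedding type in a single call. The following sketch assumes that the Bedrock Cohere endpoint returns one embeddings entry per requested type, as the Cohere Embed API does:
POST _plugins/_ml/models/t64OPpUBX2k07okSZc2n/_predict
{
  "parameters": {
    "texts": ["Say this is a test"],
    "embedding_types": [ "int8", "binary" ]
  }
}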
Step 2: Create an ingest pipeline
An ingest pipeline lets you process documents before they are indexed. In this case, you'll use one to generate embeddings for the title and description fields in your data.
There are two ways to set up the pipeline:
- Invoke the model separately for title and description: This option sends a separate request for each field, generating independent embeddings.
- Invoke the model once with title and description combined: This option sends both fields together in a single request, generating an embedding for each field in one model invocation.
Option 1: Invoke the model separately for title and description
PUT _ingest/pipeline/ml_inference_pipeline_cohere
{
"processors": [
{
"ml_inference": {
"tag": "ml_inference",
"description": "This processor is going to run ml inference during ingest request",
"model_id": "t64OPpUBX2k07okSZc2n",
"input_map": [
{
"texts": "$..title"
},
{
"texts": "$..description"
}
],
"output_map": [
{
"title_embedding": "embeddings.int8[0]"
},
{
"description_embedding": "embeddings.int8[0]"
}
],
"model_config": {
"embedding_types": ["int8"]
},
"ignore_failure": false
}
}
]
}
Option 2: Invoke the model once with title and description combined
PUT _ingest/pipeline/ml_inference_pipeline_cohere
{
"description": "Concatenate title and description fields",
"processors": [
{
"set": {
"field": "title_desc_tmp",
"value": [
"",
""
]
}
},
{
"ml_inference": {
"tag": "ml_inference",
"description": "This processor is going to run ml inference during ingest request",
"model_id": "t64OPpUBX2k07okSZc2n",
"input_map": [
{
"texts": "title_desc_tmp"
}
],
"output_map": [
{
"title_embedding": "embeddings.int8[0]",
"description_embedding": "embeddings.int8[1]"
}
],
"model_config": {
"embedding_types": ["int8"]
},
"ignore_failure": true
}
},
{
"remove": {
"field": "title_desc_tmp"
}
}
]
}
Test the pipeline by sending the following simulate request:
POST _ingest/pipeline/ml_inference_pipeline_cohere/_simulate
{
"docs": [
{
"_index": "books",
"_id": "1",
"_source": {
"title": "The Great Gatsby",
"author": "F. Scott Fitzgerald",
"description": "A novel of decadence and excess in the Jazz Age, exploring themes of wealth, love, and the American Dream.",
"publication_year": 1925,
"genre": "Classic Fiction"
}
}
]
}
The response contains the generated embeddings:
{
"docs": [
{
"doc": {
"_index": "books",
"_id": "1",
"_source": {
"publication_year": 1925,
"author": "F. Scott Fitzgerald",
"genre": "Classic Fiction",
"description": "A novel of decadence and excess in the Jazz Age, exploring themes of wealth, love, and the American Dream.",
"title": "The Great Gatsby",
"title_embedding": [
18,
33,
...
],
"description_embedding": [
-21,
-14,
...
]
},
"_ingest": {
"timestamp": "2025-02-25T09:11:32.192125042Z"
}
}
}
]
}
Step 3: Create a vector index and ingest data
Next, create a vector index:
PUT books
{
"settings": {
"index": {
"default_pipeline": "ml_inference_pipeline_cohere",
"knn": true,
"knn.algo_param.ef_search": 100
}
},
"mappings": {
"properties": {
"title_embedding": {
"type": "knn_vector",
"dimension": 1024,
"data_type": "byte",
"space_type": "l2",
"method": {
"name": "hnsw",
"engine": "lucene",
"parameters": {
"ef_construction": 100,
"m": 16
}
}
},
"description_embedding": {
"type": "knn_vector",
"dimension": 1024,
"data_type": "byte",
"space_type": "l2",
"method": {
"name": "hnsw",
"engine": "lucene",
"parameters": {
"ef_construction": 100,
"m": 16
}
}
}
}
}
}
Ingest test data into the index:
POST _bulk
{"index":{"_index":"books"}}
{"title":"The Great Gatsby","author":"F. Scott Fitzgerald","description":"A novel of decadence and excess in the Jazz Age, exploring themes of wealth, love, and the American Dream.","publication_year":1925,"genre":"Classic Fiction"}
{"index":{"_index":"books"}}
{"title":"To Kill a Mockingbird","author":"Harper Lee","description":"A powerful story of racial injustice and loss of innocence in the American South during the Great Depression.","publication_year":1960,"genre":"Literary Fiction"}
{"index":{"_index":"books"}}
{"title":"Pride and Prejudice","author":"Jane Austen","description":"A romantic novel of manners that follows the character development of Elizabeth Bennet as she learns about the repercussions of hasty judgments and comes to appreciate the difference between superficial goodness and actual goodness.","publication_year":1813,"genre":"Romance"}
Step 4: Search the index
You can run a vector search on the index in either of the following ways.
Using a template query and a search pipeline
First, create a search pipeline:
PUT _search/pipeline/ml_inference_pipeline_cohere_search
{
"request_processors": [
{
"ml_inference": {
"model_id": "t64OPpUBX2k07okSZc2n",
"input_map": [
{
"texts": "$..ext.ml_inference.text"
}
],
"output_map": [
{
"ext.ml_inference.vector": "embeddings.int8[0]"
}
],
"model_config": {
"input_type": "search_query",
"embedding_types": ["int8"]
}
}
}
]
}
Next, run a search using a template query:
GET books/_search?search_pipeline=ml_inference_pipeline_cohere_search&verbose_pipeline=false
{
"query": {
"template": {
"knn": {
"description_embedding": {
"vector": "${ext.ml_inference.vector}",
"k": 10
}
}
}
},
"ext": {
"ml_inference": {
"text": "American Dream"
}
},
"_source": {
"excludes": [
"title_embedding", "description_embedding"
]
},
"size": 2
}
To view the input and output of each search processor, add &verbose_pipeline=true to the request. This is useful for debugging and for understanding how the search pipeline modifies the query. For more information, see Debugging search pipelines.
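For example, the previous template query can be rerun with verbose output enabled; only the URL parameter changes:
GET books/_search?search_pipeline=ml_inference_pipeline_cohere_search&verbose_pipeline=true
{
  "query": {
    "template": {
      "knn": {
        "description_embedding": {
          "vector": "${ext.ml_inference.vector}",
          "k": 10
        }
      }
    }
  },
  "ext": {
    "ml_inference": {
      "text": "American Dream"
    }
  },
  "size": 2
}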
Rewriting the query in a search pipeline
Create another search pipeline that rewrites the query:
PUT _search/pipeline/ml_inference_pipeline_cohere_search2
{
"request_processors": [
{
"ml_inference": {
"model_id": "t64OPpUBX2k07okSZc2n",
"input_map": [
{
"texts": "$..match.description.query"
}
],
"output_map": [
{
"query_vector": "embeddings.int8[0]"
}
],
"model_config": {
"input_type": "search_query",
"embedding_types": ["int8"]
},
"query_template": """
{
"query": {
"knn": {
"description_embedding": {
"vector": ${query_vector},
"k": 10
}
}
},
"_source": {
"excludes": [
"title_embedding",
"description_embedding"
]
},
"size": 2
}
"""
}
}
]
}
Now run a vector search using this pipeline:
GET books/_search?search_pipeline=ml_inference_pipeline_cohere_search2
{
"query": {
"match": {
"description": "American Dream"
}
}
}
The response contains the matching documents:
{
"took": 96,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 7.271585e-7,
"hits": [
{
"_index": "books",
"_id": "U640PJUBX2k07okSEMwy",
"_score": 7.271585e-7,
"_source": {
"publication_year": 1925,
"author": "F. Scott Fitzgerald",
"genre": "Classic Fiction",
"description": "A novel of decadence and excess in the Jazz Age, exploring themes of wealth, love, and the American Dream.",
"title": "The Great Gatsby"
}
},
{
"_index": "books",
"_id": "VK40PJUBX2k07okSEMwy",
"_score": 6.773544e-7,
"_source": {
"publication_year": 1960,
"author": "Harper Lee",
"genre": "Literary Fiction",
"description": "A powerful story of racial injustice and loss of innocence in the American South during the Great Depression.",
"title": "To Kill a Mockingbird"
}
}
]
}
}
Step 5 (Optional): Use binary embeddings
In this section, you'll extend the setup to support binary embeddings, which offer even more efficient storage and faster retrieval. Binary embeddings can significantly reduce storage requirements and improve search speed, making them ideal for large-scale applications.
You don't need to modify the connector or the model; you only need to update the vector index, the ingest pipeline, and the search pipeline.
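The model interface registered in Step 1.2 already lists binary as a valid embedding type, so you can confirm that the existing model returns binary embeddings with a quick Predict call:
POST _plugins/_ml/models/t64OPpUBX2k07okSZc2n/_predict
{
  "parameters": {
    "texts": ["Say this is a test"],
    "embedding_types": [ "binary" ]
  }
}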
Step 5.1: Create an ingest pipeline
Create a new ingest pipeline named ml_inference_pipeline_cohere_binary using the same configuration as in Step 2, replacing all occurrences of int8 with binary.
Option 1: Invoke the model separately for title and description
PUT _ingest/pipeline/ml_inference_pipeline_cohere_binary
{
"processors": [
{
"ml_inference": {
"tag": "ml_inference",
"description": "This processor is going to run ml inference during ingest request",
"model_id": "t64OPpUBX2k07okSZc2n",
"input_map": [
{
"texts": "$..title"
},
{
"texts": "$..description"
}
],
"output_map": [
{
"title_embedding": "embeddings.binary[0]"
},
{
"description_embedding": "embeddings.binary[0]"
}
],
"model_config": {
"embedding_types": ["binary"]
},
"ignore_failure": false
}
}
]
}
Option 2: Invoke the model once with title and description combined
PUT _ingest/pipeline/ml_inference_pipeline_cohere_binary
{
"description": "Concatenate title and description fields",
"processors": [
{
"set": {
"field": "title_desc_tmp",
"value": [
"",
""
]
}
},
{
"ml_inference": {
"tag": "ml_inference",
"description": "This processor is going to run ml inference during ingest request",
"model_id": "t64OPpUBX2k07okSZc2n",
"input_map": [
{
"texts": "title_desc_tmp"
}
],
"output_map": [
{
"title_embedding": "embeddings.binary[0]",
"description_embedding": "embeddings.binary[1]"
}
],
"model_config": {
"embedding_types": ["binary"]
},
"ignore_failure": true
}
},
{
"remove": {
"field": "title_desc_tmp"
}
}
]
}
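As in Step 2, you can optionally test the binary pipeline with a simulate request before creating the index, reusing the same sample document:
POST _ingest/pipeline/ml_inference_pipeline_cohere_binary/_simulate
{
  "docs": [
    {
      "_index": "books_binary_embedding",
      "_id": "1",
      "_source": {
        "title": "The Great Gatsby",
        "author": "F. Scott Fitzgerald",
        "description": "A novel of decadence and excess in the Jazz Age, exploring themes of wealth, love, and the American Dream.",
        "publication_year": 1925,
        "genre": "Classic Fiction"
      }
    }
  ]
}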
Step 5.2: Create a vector index and ingest data
Create a new vector index containing binary vector fields:
PUT books_binary_embedding
{
"settings": {
"index": {
"default_pipeline": "ml_inference_pipeline_cohere_binary",
"knn": true
}
},
"mappings": {
"properties": {
"title_embedding": {
"type": "knn_vector",
"dimension": 1024,
"data_type": "binary",
"space_type": "hamming",
"method": {
"name": "hnsw",
"engine": "faiss"
}
},
"description_embedding": {
"type": "knn_vector",
"dimension": 1024,
"data_type": "binary",
"space_type": "hamming",
"method": {
"name": "hnsw",
"engine": "faiss"
}
}
}
}
}
Ingest test data into the index:
POST _bulk
{"index":{"_index":"books_binary_embedding"}}
{"title":"The Great Gatsby","author":"F. Scott Fitzgerald","description":"A novel of decadence and excess in the Jazz Age, exploring themes of wealth, love, and the American Dream.","publication_year":1925,"genre":"Classic Fiction"}
{"index":{"_index":"books_binary_embedding"}}
{"title":"To Kill a Mockingbird","author":"Harper Lee","description":"A powerful story of racial injustice and loss of innocence in the American South during the Great Depression.","publication_year":1960,"genre":"Literary Fiction"}
{"index":{"_index":"books_binary_embedding"}}
{"title":"Pride and Prejudice","author":"Jane Austen","description":"A romantic novel of manners that follows the character development of Elizabeth Bennet as she learns about the repercussions of hasty judgments and comes to appreciate the difference between superficial goodness and actual goodness.","publication_year":1813,"genre":"Romance"}
Step 5.3: Create a search pipeline
Create a new search pipeline named ml_inference_pipeline_cohere_search_binary using the same configuration as in Step 4, making the following changes:
- Change embeddings.int8[0] to embeddings.binary[0].
- Change "embedding_types": ["int8"] to "embedding_types": ["binary"].
Using a template query and a search pipeline
First, create a search pipeline:
PUT _search/pipeline/ml_inference_pipeline_cohere_search_binary
{
"request_processors": [
{
"ml_inference": {
"model_id": "t64OPpUBX2k07okSZc2n",
"input_map": [
{
"texts": "$..ext.ml_inference.text"
}
],
"output_map": [
{
"ext.ml_inference.vector": "embeddings.binary[0]"
}
],
"model_config": {
"input_type": "search_query",
"embedding_types": ["binary"]
}
}
}
]
}
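You can then run a template query against the binary index using this pipeline. The request mirrors the one in Step 4, with only the index and pipeline names changed:
GET books_binary_embedding/_search?search_pipeline=ml_inference_pipeline_cohere_search_binary
{
  "query": {
    "template": {
      "knn": {
        "description_embedding": {
          "vector": "${ext.ml_inference.vector}",
          "k": 10
        }
      }
    }
  },
  "ext": {
    "ml_inference": {
      "text": "American Dream"
    }
  },
  "_source": {
    "excludes": [
      "title_embedding", "description_embedding"
    ]
  },
  "size": 2
}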
Rewriting the query in a search pipeline
Create another search pipeline that rewrites the query:
PUT _search/pipeline/ml_inference_pipeline_cohere_search_binary2
{
"request_processors": [
{
"ml_inference": {
"model_id": "t64OPpUBX2k07okSZc2n",
"input_map": [
{
"texts": "$..match.description.query"
}
],
"output_map": [
{
"query_vector": "embeddings.binary[0]"
}
],
"model_config": {
"input_type": "search_query",
"embedding_types": ["binary"]
},
"query_template": """
{
"query": {
"knn": {
"description_embedding": {
"vector": ${query_vector},
"k": 10
}
}
},
"_source": {
"excludes": [
"title_embedding",
"description_embedding"
]
},
"size": 2
}
"""
}
}
]
}
You can then run a vector search using these search pipelines as described in Step 4.
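For example, the rewritten-query search from Step 4 adapts to the binary index as follows; only the index and pipeline names change:
GET books_binary_embedding/_search?search_pipeline=ml_inference_pipeline_cohere_search_binary2
{
  "query": {
    "match": {
      "description": "American Dream"
    }
  }
}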