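Retrieve specific fields
When you run a basic search in OpenSearch, each hit by default returns, in its _source object, the original JSON object that was indexed. This can send a large amount of data over the network, increasing latency and cost. There are several ways to limit the response to only the information you need.
Disabling _source
You can set _source to false in a search request to exclude the _source field from the response.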
GET /index1/_search
{
"_source": false,
"query": {
"match_all": {}
}
}
Because no fields were selected in the preceding search, each retrieved hit contains only its _index, _id, and _score.
{
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index1",
"_id" : "41",
"_score" : 1.0
},
{
"_index" : "index1",
"_id" : "51",
"_score" : 1.0
}
]
}
}
You can also disable _source in the index mapping by using the following configuration:
"mappings": {
"_source": {
"enabled": false
}
}
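If _source is disabled in the index mapping, searching with docvalue fields and searching with stored fields become especially useful.
Specifying the fields to retrieve
You can list the fields you want to retrieve in the fields parameter. Wildcard patterns are also accepted.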
GET /index1/_search
{
"_source": false,
"fields": ["age", "nam*"],
"query": {
"match_all": {}
}
}
The response contains the name and age fields.
{
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index1",
"_id" : "41",
"_score" : 1.0,
"fields" : {
"name" : [
"John Doe"
],
"age" : [
30
]
}
},
{
"_index" : "index1",
"_id" : "51",
"_score" : 1.0,
"fields" : {
"name" : [
"Jane Smith"
],
"age" : [
25
]
}
}
]
}
}
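Every value returned under fields is wrapped in an array, even when the field holds a single value. The following Python sketch shows one way a client could flatten such hits into plain rows. The helper name unwrap_fields is hypothetical, and the sample hits are copied from the response above rather than fetched from a cluster.

```python
from typing import Any


def unwrap_fields(hit: dict[str, Any]) -> dict[str, Any]:
    """Collapse single-element `fields` arrays into scalars; keep other arrays as-is."""
    unwrapped = {}
    for name, values in hit.get("fields", {}).items():
        unwrapped[name] = values[0] if len(values) == 1 else values
    return unwrapped


# Sample hits copied from the response above (not fetched from a live cluster).
hits = [
    {"_index": "index1", "_id": "41", "_score": 1.0,
     "fields": {"name": ["John Doe"], "age": [30]}},
    {"_index": "index1", "_id": "51", "_score": 1.0,
     "fields": {"name": ["Jane Smith"], "age": [25]}},
]

# One flat row per hit, keyed by document ID.
rows = [{"_id": hit["_id"], **unwrap_fields(hit)} for hit in hits]
print(rows)
```

Multi-valued fields are left as lists, so callers can still tell a single value from several.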
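Extracting fields with a custom format
You can also use object notation to apply a custom format to selected fields.
Suppose you have the following document: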
{
"_index": "my_index",
"_type": "_doc",
"_id": "1",
"_source": {
"title": "Document 1",
"date": "2023-07-04T12:34:56Z"
}
}
You can then query it using the fields parameter and a custom format:
GET /my_index/_search
{
"query": {
"match_all": {}
},
"fields": [
{
"field": "date",
"format": "yyyy-MM-dd"
}
],
"_source": false
}
In addition, you can use multi-fields and field aliases in the fields parameter because it consults both the document _source and the index mappings.
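To see what the yyyy-MM-dd pattern in the custom format request should produce for the sample document, the following Python sketch applies the equivalent formatting locally. It only approximates the server-side behavior: the strftime pattern %Y-%m-%d is assumed to correspond to yyyy-MM-dd, and no request is sent to OpenSearch.

```python
from datetime import datetime, timezone

# The date value stored in the sample document's _source.
raw = "2023-07-04T12:34:56Z"

# Parse the ISO 8601 timestamp and mark it explicitly as UTC.
parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

# %Y-%m-%d is the Python analogue (an assumption) of the yyyy-MM-dd pattern.
formatted = parsed.strftime("%Y-%m-%d")
print(formatted)
```

Under that assumption, the date field in the search response would be expected to carry the day-level value only, with the time portion dropped.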
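Searching with docvalue_fields
To retrieve specific fields from the index, you can also use the docvalue_fields parameter. This parameter works slightly differently from the fields parameter: it retrieves information from doc values rather than from the _source field, which is more efficient for fields that are not analyzed, such as keyword, date, and numeric fields. Doc values use a columnar storage format optimized for efficient sorting and aggregations, storing values on disk in a way that is easy to read. When you use docvalue_fields, OpenSearch reads the values directly from this optimized storage format. It is useful for retrieving the values of fields that are used primarily for sorting, aggregations, and scripts.
The following example demonstrates how to use the docvalue_fields parameter.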
- Create an index with the following mapping:
PUT /my_index
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "author": { "type": "keyword" },
      "publication_date": { "type": "date" },
      "price": { "type": "double" }
    }
  }
}
- Index the following documents into the newly created index:
POST /my_index/_doc/1
{
  "title": "OpenSearch Basics",
  "author": "John Doe",
  "publication_date": "2021-01-01",
  "price": 29.99
}
POST /my_index/_doc/2
{
  "title": "Advanced OpenSearch",
  "author": "Jane Smith",
  "publication_date": "2022-01-01",
  "price": 39.99
}
- Use docvalue_fields to retrieve only the author and publication_date fields:
POST /my_index/_search
{
  "_source": false,
  "docvalue_fields": ["author", "publication_date"],
  "query": {
    "match_all": {}
  }
}
The response contains the author and publication_date fields.
{
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "my_index",
"_id": "1",
"_score": 1.0,
"fields": {
"author": ["John Doe"],
"publication_date": ["2021-01-01T00:00:00.000Z"]
}
},
{
"_index": "my_index",
"_id": "2",
"_score": 1.0,
"fields": {
"author": ["Jane Smith"],
"publication_date": ["2022-01-01T00:00:00.000Z"]
}
}
]
}
}
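Using docvalue_fields with nested objects
In OpenSearch, if you want to retrieve doc values for nested objects, you cannot use the docvalue_fields parameter directly because it returns an empty array. Instead, use the inner_hits parameter with its own docvalue_fields property, as shown in the following example.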
- Define the index mapping:
PUT /my_index
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "author": { "type": "keyword" },
      "comments": {
        "type": "nested",
        "properties": {
          "username": { "type": "keyword" },
          "content": { "type": "text" },
          "created_at": { "type": "date" }
        }
      }
    }
  }
}
- Index your data:
POST /my_index/_doc/1
{
  "title": "OpenSearch Basics",
  "author": "John Doe",
  "comments": [
    {
      "username": "alice",
      "content": "Great article!",
      "created_at": "2023-01-01T12:00:00Z"
    },
    {
      "username": "bob",
      "content": "Very informative.",
      "created_at": "2023-01-02T12:00:00Z"
    }
  ]
}
- Run a search using inner_hits and docvalue_fields:
POST /my_index/_search
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "match_all": {}
      },
      "inner_hits": {
        "docvalue_fields": ["username", "created_at"]
      }
    }
  }
}
The following is the expected response:
{
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "my_index",
"_id": "1",
"_score": 1.0,
"_source": {
"title": "OpenSearch Basics",
"author": "John Doe",
"comments": [
{
"username": "alice",
"content": "Great article!",
"created_at": "2023-01-01T12:00:00Z"
},
{
"username": "bob",
"content": "Very informative.",
"created_at": "2023-01-02T12:00:00Z"
}
]
},
"inner_hits": {
"comments": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "my_index",
"_id": "1",
"_nested": {
"field": "comments",
"offset": 0
},
"fields": {
"username": ["alice"],
"created_at": ["2023-01-01T12:00:00.000Z"]
}
},
{
"_index": "my_index",
"_id": "1",
"_nested": {
"field": "comments",
"offset": 1
},
"fields": {
"username": ["bob"],
"created_at": ["2023-01-02T12:00:00.000Z"]
}
}
]
}
}
}
}
]
}
}
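Searching with stored_fields
By default, OpenSearch stores the entire document in the _source field and uses it to return document contents in search results. However, you might also want to store certain fields separately for more efficient retrieval. You can use stored_fields to store and retrieve specific document fields separately from the _source field.
Unlike _source, stored fields must be explicitly enabled in the mapping of each field that you want to store separately. This can be useful if you often need to retrieve only a small subset of fields and want to avoid retrieving the entire _source field. The following example demonstrates how to use the stored_fields parameter.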
- Create an index with the following mapping:
PUT /my_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "store": true // Store the title field separately
      },
      "author": {
        "type": "keyword",
        "store": true // Store the author field separately
      },
      "publication_date": { "type": "date" },
      "price": { "type": "double" }
    }
  }
}
- Index your data:
POST /my_index/_doc/1
{
  "title": "OpenSearch Basics",
  "author": "John Doe",
  "publication_date": "2022-01-01",
  "price": 29.99
}
POST /my_index/_doc/2
{
  "title": "Advanced OpenSearch",
  "author": "Jane Smith",
  "publication_date": "2023-01-01",
  "price": 39.99
}
- Run a search using stored_fields:
POST /my_index/_search
{
  "_source": false,
  "stored_fields": ["title", "author"],
  "query": {
    "match_all": {}
  }
}
The following is the expected response:
{
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "my_index",
"_id": "1",
"_score": 1.0,
"fields": {
"title": ["OpenSearch Basics"],
"author": ["John Doe"]
}
},
{
"_index": "my_index",
"_id": "2",
"_score": 1.0,
"fields": {
"title": ["Advanced OpenSearch"],
"author": ["Jane Smith"]
}
}
]
}
}
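You can disable stored_fields completely by setting stored_fields to _none_.
Searching stored_fields with nested objects
In OpenSearch, if you want to retrieve stored_fields for nested objects, you cannot use the stored_fields parameter directly because no data is returned. Instead, use the inner_hits parameter with its own stored_fields property, as shown in the following example.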
- Create an index with the following mapping:
PUT /my_index
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "author": { "type": "keyword" },
      "comments": {
        "type": "nested",
        "properties": {
          "username": { "type": "keyword", "store": true },
          "content": { "type": "text", "store": true },
          "created_at": { "type": "date", "store": true }
        }
      }
    }
  }
}
- Index your data:
POST /my_index/_doc/1
{
  "title": "OpenSearch Basics",
  "author": "John Doe",
  "comments": [
    {
      "username": "alice",
      "content": "Great article!",
      "created_at": "2023-01-01T12:00:00Z"
    },
    {
      "username": "bob",
      "content": "Very informative.",
      "created_at": "2023-01-02T12:00:00Z"
    }
  ]
}
- Run a search using inner_hits and stored_fields:
POST /my_index/_search
{
  "_source": false,
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "match_all": {}
      },
      "inner_hits": {
        "stored_fields": ["comments.username", "comments.content", "comments.created_at"]
      }
    }
  }
}
The following is the expected response:
{
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "my_index",
"_id": "1",
"_score": 1.0,
"inner_hits": {
"comments": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "my_index",
"_id": "1",
"_nested": {
"field": "comments",
"offset": 0
},
"fields": {
"comments.username": ["alice"],
"comments.content": ["Great article!"],
"comments.created_at": ["2023-01-01T12:00:00.000Z"]
}
},
{
"_index": "my_index",
"_id": "1",
"_nested": {
"field": "comments",
"offset": 1
},
"fields": {
"comments.username": ["bob"],
"comments.content": ["Very informative."],
"comments.created_at": ["2023-01-02T12:00:00.000Z"]
}
}
]
}
}
}
}
]
}
}
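The nested response above places each comment several levels deep, under inner_hits, then the nested path, then hits. As an illustration only, the following Python sketch flattens that structure into one row per nested comment. The helper name nested_rows is hypothetical, and the response dictionary is a trimmed copy of the response shown above, not live output.

```python
from typing import Any


def nested_rows(response: dict[str, Any], path: str) -> list[dict[str, Any]]:
    """Return one flat row per nested inner hit found under `path`."""
    prefix = path + "."
    rows = []
    for hit in response["hits"]["hits"]:
        inner_hits = (
            hit.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits", [])
        )
        for inner in inner_hits:
            row = {"parent_id": hit["_id"], "offset": inner["_nested"]["offset"]}
            for name, values in inner.get("fields", {}).items():
                # Drop the "comments." prefix so the row uses short field names.
                short = name[len(prefix):] if name.startswith(prefix) else name
                row[short] = values[0] if len(values) == 1 else values
            rows.append(row)
    return rows


# Trimmed copy of the response above.
response = {"hits": {"hits": [{
    "_index": "my_index", "_id": "1", "_score": 1.0,
    "inner_hits": {"comments": {"hits": {"hits": [
        {"_id": "1", "_nested": {"field": "comments", "offset": 0},
         "fields": {"comments.username": ["alice"],
                    "comments.content": ["Great article!"],
                    "comments.created_at": ["2023-01-01T12:00:00.000Z"]}},
        {"_id": "1", "_nested": {"field": "comments", "offset": 1},
         "fields": {"comments.username": ["bob"],
                    "comments.content": ["Very informative."],
                    "comments.created_at": ["2023-01-02T12:00:00.000Z"]}},
    ]}}},
}]}}

rows = nested_rows(response, "comments")
for row in rows:
    print(row)
```

The offset value identifies the position of each comment within the parent document's comments array, so it is kept in every row.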
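Using source filtering
Source filtering is a way to control which parts of the _source field are included in the search response. Including only the necessary fields in the response helps reduce the amount of data transferred over the network and improves performance.
You can include or exclude specific fields from the _source field in the search response by using full field names or simple wildcard patterns. The following example demonstrates how to include specific fields.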
- Index your data:
PUT /my_index/_doc/1
{
  "title": "OpenSearch Basics",
  "author": "John Doe",
  "publication_date": "2021-01-01",
  "price": 29.99
}
- Run a search using source filtering:
POST /my_index/_search
{
  "_source": ["title", "author"],
  "query": {
    "match_all": {}
  }
}
The following is the expected response:
{
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "my_index",
"_id": "1",
"_score": 1.0,
"_source": {
"title": "OpenSearch Basics",
"author": "John Doe"
}
}
]
}
}
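Excluding fields with source filtering
You can exclude fields by using the excludes parameter in the search request, as shown in the following example: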
POST /my_index/_search
{
"_source": {
"excludes": ["price"]
},
"query": {
"match_all": {}
}
}
The following is the expected response:
{
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "my_index",
"_id": "1",
"_score": 1.0,
"_source": {
"title": "OpenSearch Basics",
"author": "John Doe",
"publication_date": "2021-01-01"
}
}
]
}
}
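Including and excluding fields in the same search
In some cases, both the includes and excludes parameters may be necessary. The following example demonstrates how to include and exclude fields in the same search.
Consider a products index containing the following document: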
{
"product_id": "123",
"name": "Smartphone",
"category": "Electronics",
"price": 699.99,
"description": "A powerful smartphone with a sleek design.",
"reviews": [
{
"user": "john_doe",
"rating": 5,
"comment": "Great phone!",
"date": "2023-01-01"
},
{
"user": "jane_doe",
"rating": 4,
"comment": "Good value for money.",
"date": "2023-02-15"
}
],
"supplier": {
"name": "TechCorp",
"contact_email": "support@techcorp.com",
"address": {
"street": "123 Tech St",
"city": "Techville",
"zipcode": "12345"
}
},
"inventory": {
"stock": 50,
"warehouse_location": "A1"
}
}
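To search this index while including only the name, price, reviews, and supplier fields in the response, and excluding the contact_email field in the supplier object and the comment field in the reviews object, run the following search: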
GET /products/_search
{
"_source": {
"includes": ["name", "price", "reviews.*", "supplier.*"],
"excludes": ["reviews.comment", "supplier.contact_email"]
},
"query": {
"match": {
"category": "Electronics"
}
}
}
The following is the expected response:
{
"hits": {
"hits": [
{
"_source": {
"name": "Smartphone",
"price": 699.99,
"reviews": [
{
"user": "john_doe",
"rating": 5,
"date": "2023-01-01"
},
{
"user": "jane_doe",
"rating": 4,
"date": "2023-02-15"
}
],
"supplier": {
"name": "TechCorp",
"address": {
"street": "123 Tech St",
"city": "Techville",
"zipcode": "12345"
}
}
}
}
]
}
}
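To make the interaction between includes and excludes concrete, the following Python sketch applies a simplified model of source filtering to the sample product document. It is not OpenSearch's implementation: the filter_source and _matches helpers are hypothetical, patterns are matched against dotted leaf paths with fnmatch, and excludes are assumed to take precedence over includes.

```python
from fnmatch import fnmatchcase
from typing import Any


def _matches(path: str, patterns: list[str]) -> bool:
    """True if a dotted path matches a pattern or lies beneath a named object."""
    return any(fnmatchcase(path, p) or path.startswith(p + ".") for p in patterns)


def filter_source(source: dict[str, Any], includes: list[str],
                  excludes: list[str], prefix: str = "") -> dict[str, Any]:
    """Keep leaves that match includes (if any are given) and no excludes."""
    result: dict[str, Any] = {}
    for key, value in source.items():
        path = prefix + key
        if isinstance(value, dict):
            nested = filter_source(value, includes, excludes, path + ".")
            if nested:
                result[key] = nested
        elif isinstance(value, list) and all(isinstance(v, dict) for v in value):
            items = [filter_source(v, includes, excludes, path + ".") for v in value]
            items = [item for item in items if item]
            if items:
                result[key] = items
        elif (not includes or _matches(path, includes)) and not _matches(path, excludes):
            result[key] = value
    return result


# The sample product document from above.
product = {
    "product_id": "123", "name": "Smartphone", "category": "Electronics",
    "price": 699.99, "description": "A powerful smartphone with a sleek design.",
    "reviews": [
        {"user": "john_doe", "rating": 5, "comment": "Great phone!", "date": "2023-01-01"},
        {"user": "jane_doe", "rating": 4, "comment": "Good value for money.", "date": "2023-02-15"},
    ],
    "supplier": {
        "name": "TechCorp", "contact_email": "support@techcorp.com",
        "address": {"street": "123 Tech St", "city": "Techville", "zipcode": "12345"},
    },
    "inventory": {"stock": 50, "warehouse_location": "A1"},
}

filtered = filter_source(
    product,
    includes=["name", "price", "reviews.*", "supplier.*"],
    excludes=["reviews.comment", "supplier.contact_email"],
)
print(filtered)
```

In this model the supplier.* pattern also keeps the fields nested under supplier.address, and objects left with no fields, such as inventory, are dropped. Both match the expected response above.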
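Using script fields
The script_fields parameter lets you include custom fields in search results whose values are computed by a script. This is useful for calculating values dynamically based on document data. You can also retrieve derived fields using a similar approach. For more information, see Retrieving fields.
Suppose you have a products index in which each product document contains price and discount_percentage fields. You can use the script_fields parameter to include a custom field named discounted_price in the search results, with a script calculating its value from the price and discount_percentage fields.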
- Index the data:
PUT /products/_doc/123
{
  "product_id": "123",
  "name": "Smartphone",
  "price": 699.99,
  "discount_percentage": 10,
  "category": "Electronics",
  "description": "A powerful smartphone with a sleek design."
}
- Use the script_fields parameter to include a custom field named discounted_price in the search results. A script calculates this field from the price and discount_percentage fields:
GET /products/_search
{
  "_source": ["product_id", "name", "price", "discount_percentage"],
  "query": {
    "match": {
      "category": "Electronics"
    }
  },
  "script_fields": {
    "discounted_price": {
      "script": {
        "lang": "painless",
        "source": "doc[\"price\"].value * (1 - doc[\"discount_percentage\"].value / 100.0)"
      }
    }
  }
}
You should receive the following response:
{
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "products",
"_id": "123",
"_score": 1.0,
"_source": {
"product_id": "123",
"name": "Smartphone",
"price": 699.99,
"discount_percentage": 10
},
"fields": {
"discounted_price": [629.991]
}
}
]
}
}
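The arithmetic behind discounted_price can be checked outside OpenSearch. The following Python sketch reproduces it for the sample product and also shows why the divisor matters when discount_percentage is stored as an integer: assuming Painless follows Java-style integer division, dividing 10 by the integer 100 would truncate to 0 and leave the price unchanged. Python's // operator stands in for that truncating division here.

```python
price = 699.99
discount_percentage = 10  # an integer, as in the indexed sample document

# Floating-point division keeps the fractional discount rate (0.1).
discounted = price * (1 - discount_percentage / 100.0)

# Truncating integer division turns 10 / 100 into 0, so no discount is applied.
truncated = price * (1 - discount_percentage // 100)

print(round(discounted, 3))  # 629.991
print(truncated)             # 699.99
```

Dividing by a floating-point literal such as 100.0 in the script avoids the truncation regardless of how the discount field is mapped.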