Link Search Menu Expand Document Documentation Menu

子搜索

这是一个实验性功能,不建议在生产环境中使用。有关该功能进展的更新或如果您想提供反馈,请加入 OpenSearch 论坛的讨论。

子搜索(也称为子查询)允许您在一个查询中使用另一个查询的结果。OpenSearch 管道处理语言 (PPL) 支持四种类型的子搜索命令

前三个子搜索命令(inexistsscalar)是您可以在 where 命令(where <布尔表达式>)和搜索过滤器(search source=* <布尔表达式>)中使用的表达式。relation 子搜索命令是一个可在 join 操作中使用的语句。

in

一个 in 子搜索允许您检查某个字段的值是否存在于另一个查询的结果中。当您希望根据来自另一个索引或查询的数据过滤结果时,这非常有用。

语法

where <field> [not] in [ search source=... | ... | ... ]

用法

source = outer | where a in [ source = inner | fields b ]
source = outer | where (a) in [ source = inner | fields b ]
source = outer | where (a,b,c) in [ source = inner | fields d,e,f ]
source = outer | where a not in [ source = inner | fields b ]
source = outer | where (a) not in [ source = inner | fields b ]
source = outer | where (a,b,c) not in [ source = inner | fields d,e,f ]
source = outer a in [ source = inner | fields b ]
source = outer a not in [ source = inner | fields b ]
source = outer | where a in [ source = inner1 | where b not in [ source = inner2 | fields c ] | fields b ] // nested
source = table1 | inner join left = l right = r on l.a = r.a AND r.a in [ source = inner | fields d ] | fields l.a, r.a, b, c //as join filter

exists

一个 exists 子搜索检查子搜索查询是否返回任何结果。这对于关联子查询特别有用,在关联子查询中您希望检查相关记录的存在性。

语法

where [not] exists [ search source=... | ... | ... ]

用法

以下示例演示了实现 exists 子搜索的不同方法,从简单的聚合比较到复杂的嵌套计算。

它们是基于以下假设创建的

  • ab 是外部表的字段。
  • cd 是内部表的字段。
  • ef 是嵌套表的字段。

关联

在以下示例中,内部查询引用了外部查询的字段(例如当 a = c 时),从而在查询之间创建了依赖关系。子搜索对外部查询中的每一行评估一次

source = outer | where exists [ source = inner | where a = c ]
source = outer | where not exists [ source = inner | where a = c ]
source = outer | where exists [ source = inner | where a = c and b = d ]
source = outer | where not exists [ source = inner | where a = c and b = d ]
source = outer exists [ source = inner | where a = c ]
source = outer not exists [ source = inner | where a = c ]
source = table as t1 exists [ source = table as t2 | where t1.a = t2.a ]

不关联

在以下示例中,子搜索独立于外部查询。内部查询不引用外部查询中的任何字段,因此无论外部查询中有多少行,它都只评估一次

source = outer | where exists [ source = inner | where c > 10 ]
source = outer | where not exists [ source = inner | where c > 10 ]

嵌套

以下示例演示了如何在一个子搜索中嵌套另一个子搜索,从而创建多层次的查询复杂性。这种方法对于需要来自不同数据源的多个条件的复杂过滤场景非常有用

source = outer | where exists [ source = inner1 | where a = c and exists [ source = nested | where c = e ] ]
source = outer | where exists [ source = inner1 | where a = c | where exists [ source = nested | where c = e ] ]

scalar

一个 scalar 子搜索返回一个单值,您可以在比较或计算中使用它。当您需要将字段与另一个查询的聚合值进行比较时,这非常有用。

语法

where <field> = [ search source=... | ... | ... ]

用法

以下示例演示了实现 scalar 子搜索的不同方法,从简单的聚合比较到复杂的嵌套计算。

不关联

在以下示例中,scalar 子搜索独立于外部查询。这些子搜索检索一个可用于计算或比较的单值

source = outer | eval m = [ source = inner | stats max(c) ] | fields m, a
source = outer | eval m = [ source = inner | stats max(c) ] + b | fields m, a
source = outer | where a > [ source = inner | stats min(c) ] | fields a
source = outer a > [ source = inner | stats min(c) ] | fields a

关联

在以下示例中,scalar 子搜索引用了外部查询的字段,从而创建了一个依赖关系,其中内部查询结果取决于外部查询的每一行

source = outer | eval m = [ source = inner | where outer.b = inner.d | stats max(c) ] | fields m, a
source = outer | eval m = [ source = inner | where b = d | stats max(c) ] | fields m, a
source = outer | eval m = [ source = inner | where outer.b > inner.d | stats max(c) ] | fields m, a
source = outer | where a = [ source = inner | where outer.b = inner.d | stats max(c) ]
source = outer | where a = [ source = inner | where b = d | stats max(c) ]
source = outer | where [ source = inner | where outer.b = inner.d OR inner.d = 1 | stats count() ] > 0 | fields a
source = outer a = [ source = inner | where b = d | stats max(c) ]
source = outer [ source = inner | where outer.b = inner.d OR inner.d = 1 | stats count() ] > 0 | fields a

嵌套

以下示例演示了如何嵌套多个 scalar 子搜索以创建复杂比较或在一个子搜索中使用另一个子搜索的结果

source = outer | where a = [ source = inner | stats max(c) | sort c ] OR b = [ source = inner | where c = 1 | stats min(d) | sort d ]
source = outer | where a = [ source = inner | where c =  [ source = nested | stats max(e) by f | sort f ] | stats max(d) by c | sort c | head 1 ]

relation

一个 relation 子搜索允许您将查询结果作为数据集用于连接操作。当您需要与过滤或转换后的数据集进行连接,而不是直接与静态索引进行连接时,这非常有用。

语法

join on <condition> [ search source=... | ... | ... ] [as alias]

用法

以下示例演示了如何在连接操作中使用 relation 子搜索。第一个示例展示了如何与过滤后的数据集进行连接,而第二个示例展示了如何在另一个查询中嵌套 relation 子搜索

source = table1 | join left = l right = r on condition [ source = table2 | where d > 10 | head 5 ] //subquery in join right side
source = [ source = table1 | join left = l right = r [ source = table2 | where d > 10 | head 5 ] | stats count(a) by b ] as outer | head 1

示例

以下示例演示了不同子搜索类型在查询场景中如何协同工作,例如多级查询或嵌套多种子搜索类型。

复杂查询示例

以下示例演示了如何在复杂查询中组合不同类型的子搜索。

示例 1:包含 inscalar 子搜索的查询

以下查询同时使用 inscalar 子搜索,以查找来自加拿大的供应商,这些供应商提供的零件名称以“forest”开头,并且可用数量大于 1994 年总订购数量的一半

source = supplier
| join ON s_nationkey = n_nationkey nation
| where n_name = 'CANADA'
   and s_suppkey in [ /* in subsearch */
     source = partsupp
     | where ps_partkey in [ /* nested in subsearch */
         source = part
         | where like(p_name, 'forest%')
         | fields p_partkey
       ]
       and ps_availqty > [ /* scalar subsearch */
         source = lineitem
         | where l_partkey = ps_partkey
           and l_suppkey = ps_suppkey
           and l_shipdate >= date('1994-01-01')
           and l_shipdate < date_add(date('1994-01-01'), interval 1 year)
         | stats sum(l_quantity) as sum_l_quantity
         | eval half_sum_l_quantity = 0.5 * sum_l_quantity
         | fields half_sum_l_quantity
       ]
     | fields ps_suppkey

示例 2:包含 relationscalarexists 子搜索的查询

以下查询使用 relationscalarexists 子搜索,以查找来自特定国家代码、账户余额高于平均水平且未下任何订单的客户

source = [  /* relation subsearch */
  source = customer
    | where substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17')
      and c_acctbal > [ /* scalar subsearch */
          source = customer
          | where c_acctbal > 0.00
            and substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17')
          | stats avg(c_acctbal)
        ]
      and not exists [ /* correlated exists subsearch */
          source = orders
          | where o_custkey = c_custkey
        ]
    | eval cntrycode = substring(c_phone, 1, 2)
    | fields cntrycode, c_acctbal
  ] as custsale
| stats count() as numcust, sum(c_acctbal) as totacctbal by cntrycode
| sort cntrycode

限制

PPL 子搜索仅在 plugins.calcite.enabled 设置为 true 时才有效。

剩余 350 字符

有问题?

想做贡献?