Link Search Menu Expand Document Documentation Menu

复杂查询

除了简单的 SFW(SELECT-FROM-WHERE)查询外,SQL 插件还支持子查询、连接、联合和减法等复杂查询。这些查询在一个以上的 OpenSearch 索引上操作。要了解这些查询在后台如何执行,请使用 explain 操作。

连接

OpenSearch SQL 支持内连接、交叉连接和左外连接。

约束

连接有许多约束

  1. 您只能连接两个索引。
  2. 您必须为索引使用别名(例如,people p)。
  3. 在 ON 子句中,您只能使用 AND 条件。
  4. 在 WHERE 语句中,不要组合包含多个索引的树。例如,以下语句有效:

    WHERE (a.type1 > 3 OR a.type1 < 0) AND (b.type2 > 4 OR b.type2 < -1)
    

    以下语句无效:

    WHERE (a.type1 > 3 OR b.type2 < 0) AND (a.type1 > 4 OR b.type2 < -1)
    
  5. 您不能对结果使用 GROUP BY 或 ORDER BY。
  6. 不支持带 OFFSET 的 LIMIT(例如 LIMIT 25 OFFSET 25)。

描述

JOIN 子句使用每个索引共有的值组合一个或多个索引的列。

语法

规则 tableSource

tableSource

规则 joinPart

joinPart

示例 1:内连接

内连接根据您的连接谓词组合两个索引的列,创建新的结果集。它迭代这两个索引并比较每个文档以找到满足连接谓词的文档。您可以选择在 JOIN 子句之前加上 INNER 关键字。

连接谓词由 ON 子句指定。

SQL 查询

SELECT
  a.account_number, a.firstname, a.lastname,
  e.id, e.name
FROM accounts a
JOIN employees_nested e
 ON a.account_number = e.id

解释

explain 输出很复杂,因为 JOIN 子句与在单独的查询规划器框架中执行的两个 OpenSearch DSL 查询相关联。您可以通过检查 Physical PlanLogical Plan 对象来解释它。

{
  "Physical Plan" : {
    "Project [ columns=[a.account_number, a.firstname, a.lastname, e.name, e.id] ]" : {
      "Top [ count=200 ]" : {
        "BlockHashJoin[ conditions=( a.account_number = e.id ), type=JOIN, blockSize=[FixedBlockSize with size=10000] ]" : {
          "Scroll [ employees_nested as e, pageSize=10000 ]" : {
            "request" : {
              "size" : 200,
              "from" : 0,
              "_source" : {
                "excludes" : [ ],
                "includes" : [
                  "id",
                  "name"
                ]
              }
            }
          },
          "Scroll [ accounts as a, pageSize=10000 ]" : {
            "request" : {
              "size" : 200,
              "from" : 0,
              "_source" : {
                "excludes" : [ ],
                "includes" : [
                  "account_number",
                  "firstname",
                  "lastname"
                ]
              }
            }
          },
          "useTermsFilterOptimization" : false
        }
      }
    }
  },
  "description" : "Hash Join algorithm builds hash table based on result of first query, and then probes hash table to find matched rows for each row returned by second query",
  "Logical Plan" : {
    "Project [ columns=[a.account_number, a.firstname, a.lastname, e.name, e.id] ]" : {
      "Top [ count=200 ]" : {
        "Join [ conditions=( a.account_number = e.id ) type=JOIN ]" : {
          "Group" : [
            {
              "Project [ columns=[a.account_number, a.firstname, a.lastname] ]" : {
                "TableScan" : {
                  "tableAlias" : "a",
                  "tableName" : "accounts"
                }
              }
            },
            {
              "Project [ columns=[e.name, e.id] ]" : {
                "TableScan" : {
                  "tableAlias" : "e",
                  "tableName" : "employees_nested"
                }
              }
            }
          ]
        }
      }
    }
  }
}

结果集

a.account_number a.firstname a.lastname e.id e.name
6 Hattie Bond 6 Jane Smith

示例 2:交叉连接

交叉连接,也称为笛卡尔连接,将第一个索引中的每个文档与第二个索引中的每个文档组合。结果集是两个索引文档的笛卡尔积。此操作类似于没有指定连接条件的 ON 子句的内连接。

对大型或中型索引执行交叉连接存在风险。它可能会触发断路器,终止查询以避免内存不足。

SQL 查询

SELECT
  a.account_number, a.firstname, a.lastname,
  e.id, e.name
FROM accounts a
JOIN employees_nested e

结果集

a.account_number a.firstname a.lastname e.id e.name
1 Amber Duke 3 Bob Smith
1 Amber Duke 4 Susan Smith
1 Amber Duke 6 Jane Smith
6 Hattie Bond 3 Bob Smith
6 Hattie Bond 4 Susan Smith
6 Hattie Bond 6 Jane Smith
13 Nanette Bates 3 Bob Smith
13 Nanette Bates 4 Susan Smith
13 Nanette Bates 6 Jane Smith
18 Dale Adams 3 Bob Smith
18 Dale Adams 4 Susan Smith
18 Dale Adams 6 Jane Smith

示例 3:左外连接

如果第一个索引不满足连接谓词,则使用左外连接保留其行。OUTER 关键字是可选的。

SQL 查询

SELECT
  a.account_number, a.firstname, a.lastname,
  e.id, e.name
FROM accounts a
LEFT JOIN employees_nested e
 ON a.account_number = e.id

结果集

a.account_number a.firstname a.lastname e.id e.name
1 Amber Duke null null
6 Hattie Bond 6 Jane Smith
13 Nanette Bates null null
18 Dale Adams null null

子查询

子查询是用于另一个语句内并用括号括起来的完整 SELECT 语句。从解释输出中,您可以看到某些子查询实际上被转换为等效的连接查询以执行。

示例 1:表子查询

SQL 查询

SELECT a1.firstname, a1.lastname, a1.balance
FROM accounts a1
WHERE a1.account_number IN (
  SELECT a2.account_number
  FROM accounts a2
  WHERE a2.balance > 10000
)

解释

{
  "Physical Plan" : {
    "Project [ columns=[a1.balance, a1.firstname, a1.lastname] ]" : {
      "Top [ count=200 ]" : {
        "BlockHashJoin[ conditions=( a1.account_number = a2.account_number ), type=JOIN, blockSize=[FixedBlockSize with size=10000] ]" : {
          "Scroll [ accounts as a2, pageSize=10000 ]" : {
            "request" : {
              "size" : 200,
              "query" : {
                "bool" : {
                  "filter" : [
                    {
                      "bool" : {
                        "adjust_pure_negative" : true,
                        "must" : [
                          {
                            "bool" : {
                              "adjust_pure_negative" : true,
                              "must" : [
                                {
                                  "bool" : {
                                    "adjust_pure_negative" : true,
                                    "must_not" : [
                                      {
                                        "bool" : {
                                          "adjust_pure_negative" : true,
                                          "must_not" : [
                                            {
                                              "exists" : {
                                                "field" : "account_number",
                                                "boost" : 1
                                              }
                                            }
                                          ],
                                          "boost" : 1
                                        }
                                      }
                                    ],
                                    "boost" : 1
                                  }
                                },
                                {
                                  "range" : {
                                    "balance" : {
                                      "include_lower" : false,
                                      "include_upper" : true,
                                      "from" : 10000,
                                      "boost" : 1,
                                      "to" : null
                                    }
                                  }
                                }
                              ],
                              "boost" : 1
                            }
                          }
                        ],
                        "boost" : 1
                      }
                    }
                  ],
                  "adjust_pure_negative" : true,
                  "boost" : 1
                }
              },
              "from" : 0
            }
          },
          "Scroll [ accounts as a1, pageSize=10000 ]" : {
            "request" : {
              "size" : 200,
              "from" : 0,
              "_source" : {
                "excludes" : [ ],
                "includes" : [
                  "firstname",
                  "lastname",
                  "balance",
                  "account_number"
                ]
              }
            }
          },
          "useTermsFilterOptimization" : false
        }
      }
    }
  },
  "description" : "Hash Join algorithm builds hash table based on result of first query, and then probes hash table to find matched rows for each row returned by second query",
  "Logical Plan" : {
    "Project [ columns=[a1.balance, a1.firstname, a1.lastname] ]" : {
      "Top [ count=200 ]" : {
        "Join [ conditions=( a1.account_number = a2.account_number ) type=JOIN ]" : {
          "Group" : [
            {
              "Project [ columns=[a1.balance, a1.firstname, a1.lastname, a1.account_number] ]" : {
                "TableScan" : {
                  "tableAlias" : "a1",
                  "tableName" : "accounts"
                }
              }
            },
            {
              "Project [ columns=[a2.account_number] ]" : {
                "Filter [ conditions=[AND ( AND account_number ISN null, AND balance GT 10000 ) ] ]" : {
                  "TableScan" : {
                    "tableAlias" : "a2",
                    "tableName" : "accounts"
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}

结果集

a1.firstname a1.lastname a1.balance
Amber Duke 39225
Nanette Bates 32838

示例 2:From 子查询

SQL 查询

SELECT a.f, a.l, a.a
FROM (
  SELECT firstname AS f, lastname AS l, age AS a
  FROM accounts
  WHERE age > 30
) AS a

解释

{
  "from" : 0,
  "size" : 200,
  "query" : {
    "bool" : {
      "filter" : [
        {
          "bool" : {
            "must" : [
              {
                "range" : {
                  "age" : {
                    "from" : 30,
                    "to" : null,
                    "include_lower" : false,
                    "include_upper" : true,
                    "boost" : 1.0
                  }
                }
              }
            ],
            "adjust_pure_negative" : true,
            "boost" : 1.0
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "_source" : {
    "includes" : [
      "firstname",
      "lastname",
      "age"
    ],
    "excludes" : [ ]
  }
}

结果集

f l a
Amber Duke 32
Dale Adams 33
Hattie Bond 36
剩余 350 字符

有问题?

想贡献?