MYSQL 优化器预处理阶段对子查询的优化

旃麒雅 · 发表于 2016-10-18 08:19:31

最近看了下MYSQL优化器预处理阶段对子查询的优化, 同时还有些疑问发出来向大家请教下。
sql_resolver.cc

JOIN::prepare() 预处理阶段
1) remove_redundant_subquery_clauses()方法去除子查询中冗余语句
MySQL 中的子查询类型：enum subs_type {UNKNOWN_SUBS, SINGLEROW_SUBS, EXISTS_SUBS, IN_SUBS, ALL_SUBS, ANY_SUBS};
MySQL 会对EXISTS_SUBS, IN_SUBS, ALL_SUBS, ANY_SUBS 这四种类型的子查询过滤冗余语句，规则如下：
a）如果有order by 直接去掉, 因为这里排序是没有意义的例如：
select * from t1 where name in ( select name from t2 where t2.col = xxx order by name) -->
select * from t1 where name in ( select name from t2 where t2.col = xxx )
b) 如果有distinct 直接去掉,其实这里有个疑问：
  如果子查询有100W条记录，去重后只剩下100条记录了这样比较100条记录，否则 id > all 的话需要比较100w条记录呢，这是否说明distinct还是有效果的呢？
例如：
select * from t1 where id > all ( select distinct(name) from t2 where t2.col = xxx ) -->
select * from t1 where id > all ( select name from t2 where t2.col = xxx )
c) 如果子查询有group by且不包含聚合函数和having的条件直接去掉。
为何要加上两个非条件呢?
聚合函数说明需要输出统计信息所以group by是有用的，having使用过滤分组的同样这样的group by也是有用的,所以这这种情况的group by需要保留。
例如：
select * from t1 where id in ( select id from t2 where t2.col = xxx group by t2.name) -- >
select * from t1 where id in ( select id from t2 where t2.col = xxx )

2) resovle_subquery()
a)转化子查询到半连接。
优点：子查询中全表扫描外层表的每条记录和子查询做join(5.6之前），半连接（上拉/扁平化）将子查询的表上拉到外面做join操作
这样可以充分使用join的优化策略。
注意下面的这个例子还有个对外表去重操作，至于如何去重取决于优化器的半连接优化策略。
1. 重复剔除:join结果到临时表但是在join列上加上主键，这样临时表中外层表就不会有重复的记录了
2. 松散扫描：子查询结果有序且子查询的join列有索引，则对子查询做分组且join每组的第一条记录
   3. 第一次匹配：外层表join到子查询的第一条符合记录就退出这条记录的join操作，接着做外层表的下一条记录join子查询
4. 索引式物化：子查询物化到临时表，（MySQL自动为临时表创建索引），可以把临时表当成内表利用临时的索引来做join。
5. 扫描式物化：子查询物化到临时表，但是子查询小结果集的临时表被当成了外表，所以做全表扫描。
例如：
select t1.* from t1 where t1.xxx in ( select t2.xxx from t2 ) -->
select t1.* from t1 join t2 on t1.xxx = t2.xxx
前置条件：
     /*
      Check if we're in subquery that is a candidate for flattening into a
      semi-join (which is done in flatten_subqueries()). The requirements are:
      1. Subquery predicate is an IN/=ANY subquery predicate
      2. Subquery is a single SELECT (not a UNION)
      3. Subquery does not have GROUP BY
      4. Subquery does not use aggregate functions or HAVING
      5. Subquery predicate is at the AND-top-level of ON/WHERE clause
          a) select xxx from t1 where t1.xxx in ()
          b) select t1.xx from t1 join t2 on t2.xx2 in ( )
      6. We are not in a subquery of a single table UPDATE/DELETE that
      doesn't have a JOIN (TODO: We should handle this at some
      point by switching to multi-table UPDATE/DELETE)
      rm 单表的UPDATE/DELETE的子查询是不行，因为UPDATE/DELETE没有join一说
      7. We're not in a confluent table-less subquery, like "SELECT 1".
      rm 无表子句不处理
      8. No execution method was already chosen (by a prepared statement)
      9. Parent select is not a confluent table-less select
      rm 父表不能是无表查询如 select 1
      10. Neither parent nor child select have STRAIGHT_JOIN option.
      */

b）物化标识子查询
将非相关子查询保存到临时表中或者缓存中。
c) 执行in 向 exists转换（优点：这样exists就可以使用半连接的优化策略了）
例子：
outer_expr IN (SELECT inner_expr FROM … WHERE subquery_where) -->
EXISTS (SELECT 1 FROM … WHERE subquery_where AND outer_expr=inner_expr)
但是还有两种情况需要考虑outer_expr为null的情况和inner_expr为null的情况，这两点也是我比较疑惑的地方
例如，有如下表a1：
http://onexin.iyunv.com/source/plugin/onexin_bigdata/https://askdba.alibaba-inc.com/services/getTfsFile.do?filePath=TB1YrKxKXXXXXbNXXXXXXXXXXXX.jpg
  我执行select * from a1 where id in ( select df from a1) -->无记录
select * from a1 as t1 where exists ( select 1 from a1 t2 where (t1.id = t2.df or t2.df is null ) ) --> 三条记录
结果竟然不一样！这是为什么呢？
d）<op> ALL/ANY/SOME 向MIN MAX转换 OP为大于或者小于操作,例子会将子查询转化成下面两种其中的一种。
例子:（oe 外表的表达式， ie 内表的表达式， cmp为 "> <")
  oe $cmp$ (SELECT ie FROM ... WHERE subq_where ... HAVING subq_having) -->
  oe $cmp$ (SELECT MAX(...) ) // handled by Item_singlerow_subselect
  oe $cmp$ <max>(SELECT ...) // handled by Item_maxmin_subselect
前置条件：
      /*
      If this is an ALL/ANY single-value subquery predicate, try to rewrite
      it with a MIN/MAX subquery.

      E.g. SELECT * FROM t1 WHERE b > ANY (SELECT a FROM t2) can be rewritten
      with SELECT * FROM t1 WHERE b > (SELECT MIN(a) FROM t2).

      A predicate may be transformed to use a MIN/MAX subquery if it:
      1. has a greater than/less than comparison operator, and
      2. is not correlated with the outer query, and
      rm 非相关子查询（与外表无关） select t1.xx from t1 where t1.xxx > any ( select xxx from t2 where t2.xx = 123 )
      3. UNKNOWN results are treated as FALSE, or can never be generated, and
      rm 未知结果被当成false处理，或者永远不会生成未知结果
      */
e) 使用值替代标量子查询
（标量子查询：select (subquery) from table)其中subquery只能为单列和单值。

账号		自动登录	找回密码
密码			立即注册

大疆运维招人啦，

C++ :try 语句块和异常处理

C++的多态

Red Hat RHCE 8 (EX294) Cert Guide

Java/C++ 区别：看完这一篇，就够用！

别再用过时库了！这 13 个顶级 C++ 库才是

c++ size_t 和 int 的区别

[经验分享] MYSQL 优化器预处理阶段对子查询的优化

浏览过的版块

扫码加入运维网微信交流群