Allow correlated IN subqueries to be transformed into joins (Andy Fan, Tom Lane)
即允许相关的 IN 子查询转换为 JOIN,部分场景下可以大幅度提升查询性能,作者是国内大佬,给大佬点赞!👍🏻
子查询的分类
与子查询类似的是子连接,出现在 FROM 关键字后的子句是子查询语句,出现在 WHERE/ON 等约束条件中或投影中的子句是子连接语句,不过大多数时候,二者并不会分的那么细。
其中,子查询又可以细分为相关子查询和非相关子查询,顾名思义,就是子查询是否和主表有”关联”,相关子查询会导致外层表每获得一个元组,子查询就需要重新执行一次;而非相关子查询是指在子查询语句是独立的,和外层的表没有直接的关联,子查询可以单独执行一次,外层表可以重复利用子查询的执行结果。这一段话听起来和标量子查询很类似,不过标量子查询返回的是单个值,通常出现在 SELECT 列表中,对于标量子查询的改写就那么一个套路,使用 LEFT JOIN,如果关联字段是主外键的关系,那么就使用 INNER JOIN 进行改写。关于标量子查询的案例,可以参照 DB Killer?原来是标量子查询!
postgres=# EXPLAIN ANALYZE SELECT* FROM tb1 WHERE id IN ( SELECT id FROM tb2 WHERE tb2.c1 = tb1.c1 ); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- MergeJoin (cost=1.55..2.08rows=2 width=12) (actual time=0.067..0.078rows=1 loops=1) Merge Cond: (tb1.id = tb2.id) JoinFilter: (tb1.c1 = tb2.c1) Rows Removed byJoinFilter: 9 -> Index Scan using tb1_pkey on tb1 (cost=0.29..328.29rows=10000 width=12) (actual time=0.024..0.029rows=11 loops=1) -> Sort (cost=1.27..1.29rows=10 width=8) (actual time=0.029..0.031rows=10 loops=1) Sort Key: tb2.id Sort Method: quicksort Memory: 25kB -> Seq Scan on tb2 (cost=0.00..1.10rows=10 width=8) (actual time=0.012..0.016rows=10 loops=1) Planning Time: 0.498 ms Execution Time: 0.169 ms (11rows)
postgres=# select version(); version ------------------------------------------------------------------------------------------------------------ PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.520150623 (Red Hat 4.8.5-44), 64-bit (1row)
postgres=# EXPLAIN ANALYZE SELECT* FROM tb1 WHERE id IN ( SELECT id FROM tb2 WHERE tb2.c1 = tb1.c1 ); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Gather Motion 6:1 (slice1; segments: 6) (cost=5.25..175.91rows=14 width=12) (actual time=1.470..1.543rows=1 loops=1) -> Hash Semi Join (cost=5.25..175.91rows=3 width=12) (actual time=0.454..1.080rows=1 loops=1) Hash Cond: ((tb1.c1 = tb2.c1) AND (tb1.id = tb2.id)) Extra Text: (seg0) Hash chain length 1.0 avg, 1 max, using3of524288 buckets. -> Seq Scan on tb1 (cost=0.00..118.00rows=1667 width=12) (actual time=0.011..0.098rows=1723 loops=1) -> Hash (cost=5.10..5.10rows=2 width=8) (actual time=0.011..0.011rows=3 loops=1) -> Seq Scan on tb2 (cost=0.00..5.10rows=2 width=8) (actual time=0.005..0.005rows=3 loops=1) Planning time: 0.288 ms (slice0) Executor memory: 59K bytes. (slice1) Executor memory: 4172K bytes avg x 6 workers, 4172K bytes max (seg0). Work_mem: 1K bytes max. Memory used: 128000kB Optimizer: Postgres query optimizer Execution time: 1.965 ms (13rows)
postgres=# select version(); version ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- PostgreSQL 9.4.26 (Greenplum Database 6.26.4 build commit:bcbaaead795b417d18644f36f334827f0815cf37) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Mar 8202400:39:37 (1row)