深入浅出 effective_cache_size

数据库原理

Word count: 2kReading time: 9 min

 2024/02/19 

前言

关于 effective_cache_size 参数许久之前也曾翻译过一篇文章，最近在复核 14 internal 的时候，又有了一些新的理解与感触。

何如

众所周知，在 PostgreSQL 的双缓存架构下，其对于自己的 shared_buffers 有着”绝对”掌控权，但是对于文件系统缓存一无所知，加之 cache 几乎无处不在，除了文件系统，存储有、RAID 卡有、磁盘可能也有，因此优化器还要考虑到这些缓存的存在

We also use a rough estimate “effective_cache_size” of the number of disk pages in Postgres + OS-level disk cache. (We can’t simply use NBuffers for this purpose because that would ignore the effects of the kernel’s disk cache.)

effective_cache_size 便是做这个的，简而言之，此参数用于告诉优化器有多少缓存可以用于查询，包括操作系统的缓存，按照一般经验，推荐设置为 RAM * 0.7

Sets the planner’s assumption about the effective size of the disk cache that is available to a single query.

然而官网的解释比较粗糙，其详细原理并没有过多阐述，只是给出了如何调整的建议，越高越可能使用索引，越低则倾向于使用顺序扫描。

This is factored into estimates of the cost of using an index; a higher value makes it more likely index scans will be used, a lower value makes it more likely sequential scans will be used

更高的数值会使得索引扫描更可能被使用，更低的数值会使得顺序扫描更可能被使用。

让我们看看代码中的注解，深入了解其原理。代码位于 costsize.c 中

/*
 * index_pages_fetched
 *	  Estimate the number of pages actually fetched after accounting for
 *	  cache effects.
 *
 * We use an approximation proposed by Mackert and Lohman, "Index Scans
 * Using a Finite LRU Buffer: A Validated I/O Model", ACM Transactions
 * on Database Systems, Vol. 14, No. 3, September 1989, Pages 401-424.
 * The Mackert and Lohman approximation is that the number of pages
 * fetched is
 *	PF =
 *		min(2TNs/(2T+Ns), T)			when T <= b
 *		2TNs/(2T+Ns)					when T > b and Ns <= 2Tb/(2T-b)
 *		b + (Ns - 2Tb/(2T-b))*(T-b)/T	when T > b and Ns > 2Tb/(2T-b)
 * where
 *		T = # pages in table
 *		N = # tuples in table
 *		s = selectivity = fraction of table to be scanned
 *		b = # buffer pages available (we include kernel space here)
 *
 * We assume that effective_cache_size is the total number of buffer pages
 * available for the whole query, and pro-rate that space across all the
 * tables in the query and the index currently under consideration.  (This
 * ignores space needed for other indexes used by the query, but since we
 * don't know which indexes will get used, we can't estimate that very well;
 * and in any case counting all the tables may well be an overestimate, since
 * depending on the join plan not all the tables may be scanned concurrently.)
 *
 * The product Ns is the number of tuples fetched; we pass in that
 * product rather than calculating it here.  "pages" is the number of pages
 * in the object under consideration (either an index or a table).
 * "index_pages" is the amount to add to the total table space, which was
 * computed for us by make_one_rel.
 *
 * Caller is expected to have ensured that tuples_fetched is greater than zero
 * and rounded to integer (see clamp_row_est).  The result will likewise be
 * greater than zero and integral.
 */
double
index_pages_fetched(double tuples_fetched, BlockNumber pages,
					double index_pages, PlannerInfo *root)
{
  ...

看起来很生硬，代码看得我云里雾里。简而言之，计算采用了 Mackert 和 Lohman 提出的模型，这个方法旨在估算有限 LRU 缓冲区下索引扫描的 I/O 成本。

该公式基于以下参数：

T：表中的页面数
N：表中的元组数
s：约束条件的选择率
b：表上的页面已经在缓存中的数量

index_pages_fetched 通过这个公式估算，在考虑到缓存效应后，实际需要获取的页面数量。这种方法帮助优化器在考虑不同查询执行计划时，更准确地估算 I/O 成本，从而选择最佳的查询执行策略。

假设当前查询中涉及的所有表的总页面是 total_pages，effective_cache_size 表示当前查询中涉及的所有表已经加载到缓存的总页面数，那么 (细节可以参照查询优化深度探索一书)：

分析

代码过于晦涩，让我们看个实际例子：

postgres=# CREATE TABLE t_ordered AS SELECT id, ceil(random() * 100000000) AS r FROM generate_series(1, 1000000) AS id;
SELECT 1000000
postgres=# analyze t_ordered ;
ANALYZE
postgres=# select attname,correlation from pg_stats where tablename = 't_ordered';
 attname |  correlation  
---------+---------------
 id      |             1
 r       | -0.0018580947
(2 rows)

t_ordered 表的 id 列相关性为 1，表示完美关联，元组在磁盘上的物理顺序与索引中 TIDS 的逻辑顺序完美相关，那么每个页面只会被访问一次：Index Scan 节点将顺序地从一个页面跳到另一个页面，逐个读取元组。

而 r 列为 0.0018，越接近 0 的值表示数据越杂乱无章，Index Scan 节点必须在页面之间跳转，而不是顺序读取；在最坏的情况下，访问的页面数量会达到获取的元组数量。

让我们粗略计算下总成本是如何计算出来的，记住前提是 id 列的相关性为 1

postgres=# create index on t_ordered(id);
CREATE INDEX
postgres=# create index on t_ordered(r);
CREATE INDEX
postgres=# analyze t_ordered ;
ANALYZE
postgres=# set enable_bitmapscan to off;
SET
postgres=# set enable_seqscan to off;
SET
postgres=# explain select * from t_ordered where id < 100000;
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 Index Scan using t_ordered_id_idx on t_ordered  (cost=0.42..3351.27 rows=98677 width=12)
   Index Cond: (id < 100000)
(2 rows)

优化器预估返回 98677 条，那么选择率为 0.098677

postgres=# select 98677/reltuples as selectivity from pg_class where relname = 't_ordered';
 selectivity 
-------------
    0.098677
(1 row)

索引扫描的成本包括：

CPU cost of searching B+-tree，扫描 B 树的成本
CPU cost of scanning index tuples in leaf pages，在叶子页搜索匹配元组的成本
I/O cost of leaf pages，叶子页面的 I/O 成本
I/O cost of heap pages，堆页面的 I/O 成本
CPU cost of scanning heap tuples，扫描堆元组的成本

具体细节可以参考 https://www.interdb.jp/pg/pgsql03/02.html，此处粗略计算一下：

postgres=# WITH costs(idx_cost, tbl_cost) AS (
  SELECT
  (
    SELECT round(
      current_setting('random_page_cost')::real * pages +
      current_setting('cpu_index_tuple_cost')::real * tuples +
      current_setting('cpu_operator_cost')::real * tuples
    )
    FROM (
      SELECT relpages * 0.098677 AS pages, reltuples * 0.098677 AS tuples
      FROM pg_class WHERE relname = 't_ordered_id_idx'
    ) c
  ),
  (
    SELECT round(
      current_setting('seq_page_cost')::real * pages +
      current_setting('cpu_tuple_cost')::real * tuples
    )
    FROM (
      SELECT relpages *0.098677 AS pages, reltuples * 0.098677 AS tuples
      FROM pg_class WHERE relname = 't_ordered'
    ) c
  )
)
SELECT idx_cost, tbl_cost, idx_cost + tbl_cost AS total
FROM costs;
 idx_cost | tbl_cost | total 
----------+----------+-------
     1824 |     1524 |  3348
(1 row)

我们计算出来的结果为 3348，和优化器算出来的 3351 接近，当然这是一个近似值。现在，让我们基于随机列的查询 (返回大致相同的行数，使选择率大致相同)：

postgres=# explain select * from t_ordered where r < 10000000;
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 Index Scan using t_ordered_r_idx on t_ordered  (cost=0.42..24555.56 rows=98071 width=12)
   Index Cond: (r < '10000000'::double precision)
(2 rows)

计算方式类似，不过此处是 random 列，相关性并非 1

postgres=# WITH costs(idx_cost, tbl_cost) AS (
  SELECT
  (
    SELECT round(
      current_setting('random_page_cost')::real * pages +
      current_setting('cpu_index_tuple_cost')::real * tuples +
      current_setting('cpu_operator_cost')::real * tuples
    )
    FROM (
      SELECT relpages * 0.098071 AS pages, reltuples * 0.098071 AS tuples
      FROM pg_class WHERE relname = 't_ordered_r_idx'
    ) c
  ),
  (
    SELECT round(
      current_setting('random_page_cost')::real * tuples +
      current_setting('cpu_tuple_cost')::real * tuples
    )
    FROM (
      SELECT relpages * 0.098071 AS pages, reltuples * 0.098071 AS tuples
      FROM pg_class WHERE relname = 't_ordered'
    ) c
  )
)
SELECT idx_cost, tbl_cost, idx_cost + tbl_cost AS total
FROM costs;
 idx_cost | tbl_cost | total  
----------+----------+--------
     1812 |   393265 | 395077
(1 row)

可以看到这种方式计算出来的值和优化器算出来的相差极大，原因不难理解，优化器考虑了缓存，默认的 effective_cache_size 为 4GB。

假如我们将其设为最小值，这样计算出来的值 394988.27 就和我们计算出来的值十分接近了。

postgres=# set effective_cache_size to '8kB';
SET
postgres=# explain select * from t_ordered where r < 10000000;
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Index Scan using t_ordered_r_idx on t_ordered  (cost=0.42..394988.27 rows=98071 width=12)
   Index Cond: (r < '10000000'::double precision)
(2 rows)

小结

小结一下

effective_cache_size 用于告诉优化器有多少缓存可以用于查询，并非真正分配这么多，仅仅是一个参考值，当评估叶子页面的 I/O 成本时，优化器会考虑到 effective_cache_size，越大，索引扫描的成本就越低，也就越倾向于使用索引扫描
而 correlation 相关性决定了元组在磁盘上的物理顺序和逻辑顺序，也决定了堆页面的 I/O 成本，能否顺序访问的概率

参考

https://www.interdb.jp/pg/pgsql03/02.html

https://www.cybertec-postgresql.com/en/effective_cache_size-a-practical-example/

https://www.cybertec-postgresql.com/en/effective_cache_size-what-it-means-in-postgresql/

Next Post

从 Greenplum 扩容谈起
Previous Post

PostgreSQL 17 新特性前瞻

CATALOG

1. 前言
2. 何如
3. 分析
4. 小结
5. 参考