性能退化？隐藏于Index‑Only Scan背后的实现细节

数据库原理

Word count: 1.5kReading time: 6 min

 2025/07/28 

前言

关于 vacuum full，我们都知晓其作用，收缩空间、移除死元组和冻结元组，那么正常来说，查询性能肯定是可以得到提升的，比如缓存效率，那么什么情况下，vauum full 了之后，性能反而会退化呢？这个问题来自老虎刘大师，周末抽空研究了一下。

PG 的 vacuum full 会把 allvisible 清零，需要再执行一次 vacuum 才有值。

复现

首先理解一下 allvisible 的作用，顾名思义 —— 全部可见，index only scan 的代价估算取决于堆页中全部可见的比例，这样的信息存在于 pg_class.relallvisible 字段中，另外，index only scan 的代价估算与常规的索引扫描代价估算不同：其与表访问相关的 I/O 成本是根据未出现在可见性映射中的页面比例来计算的，至于元组处理的成本估算是相同的。

	/*
	 * We use relallvisible as-is, rather than scaling it up like we do for
	 * the pages and tuples counts, on the theory that any pages added since
	 * the last VACUUM are most likely not marked all-visible.  But costsize.c
	 * wants it converted to a fraction.
	 */
	if (relallvisible == 0 || curpages <= 0)
		*allvisfrac = 0;
	else if ((double) relallvisible >= curpages)
		*allvisfrac = 1;
	else
		*allvisfrac = (double) relallvisible / curpages;
}

比如当前总页面数是 1W，8K 是 allvisible 的，那么 allvisfrac 就是 0.8。然后在后续处理中，会进一步进行折扣

1
2
3

if (indexonly)                       /* costsize.c */
    pages_fetched = ceil(pages_fetched *
                         (1.0 - baserel->allvisfrac));

当 allvisfrac = 0.8 时，随机堆页数直接砍成 20 %。
若 allvisfrac = 1.0，则 pages_fetched  为 0，即纯索引 IO，没有堆访问。

因此，根据当前的代价模型，当 allvisfrac 升高时，pages_fetched  下降 → index only scan 总成本下降 → 相较于 index scan 更可能被选中。若 allvisfrac ≈ 0，IOS 基本退化为普通 Index Scan。铺垫了这么多，让我们看个栗子：

postgres=# create table t3(id int);
CREATE TABLE
postgres=# create index on t3(id);
CREATE INDEX
postgres=# insert into t3 values(generate_series(1,10000));
INSERT 0 10000
postgres=# vacuum analyze t3;
VACUUM
postgres=# select relname,reltuples,relallvisible from pg_class where relname = 't3';
 relname | reltuples | relallvisible 
---------+-----------+---------------
 t3      |     10000 |            45
(1 row)

postgres=# explain select id from t3 where id = 99;
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Index Only Scan using t3_id_idx on t3  (cost=0.29..4.30 rows=1 width=4)
   Index Cond: (id = 99)
(2 rows)

再看看 vacuum full 了之后的行为，代价直接变为 8.30，不难理解，多了一次随机 IO (random_page_cost)

postgres=# vacuum full t3;
VACUUM
postgres=# explain select id from t3 where id = 99;
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Index Only Scan using t3_id_idx on t3  (cost=0.29..8.30 rows=1 width=4)
   Index Cond: (id = 99)
(2 rows)

postgres=# explain analyze select id from t3 where id = 99;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Index Only Scan using t3_id_idx on t3  (cost=0.29..8.30 rows=1 width=4) (actual time=0.089..0.091 rows=1 loops=1)
   Index Cond: (id = 99)
   Heap Fetches: 1
 Planning Time: 0.161 ms
 Execution Time: 0.117 ms
(5 rows)

postgres=# select relname,reltuples,relallvisible from pg_class where relname = 't3';
 relname | reltuples | relallvisible 
---------+-----------+---------------
 t3      |     10000 |             0
(1 row)

postgres=# select count(*) from pg_visibility_map('t3') where all_visible = 't';
 count 
-------
     0
(1 row)

可以很清晰的看到，relallvisible = 0，并且 Heap Fetches: 1，说明回表扫描了一个数据块，**说明 index only scan 由于 vacuum full 性能退化为普通的 index scan。**那么，这是一个 BUG 吗？光轮现象，将其归纳为 BUG 也并无道理，其实这个问题官方一直都留意到了，最早的邮件可以追溯到 2013 年 https://www.postgresql.org/message-id/20130906001437.GA29264%40momjian.us，迄今为止，还一直在 TODO list 里面

Allow VACUUM FULL and CLUSTER to update the visibility map

多个补丁 (2013、2019、2021、2022 年) 都在邮件列表上讨论，但由于各种原因，始终未合并到主分支中。官方把它归类为”值得做但尚未完成的优化项”，并保持在公开 TODO；因此更准确的说法是缺少一项优化，而不是可报告的缺陷，因为功能的正确性和一致性是没有问题的。按照大管家 BRUCE 的说法：

If updates and deletes happen, there will eventually be sufficient reason for autovacuum to vacuum the table and set proper VM bits..

运行 VACUUM FULL 的典型场景是”回收空间”或”解除膨胀”，它需要 8 级锁，通常是运维窗口里的一次性操作；几乎总会跟着跑 ANALYZE 或很快触发 autovacuum。缺少 VM 位只会在这段短暂时间里让某些查询退化为 Index Scan，收益与实现复杂度相比不够诱人。因为当前 vacuum full (heapam_relation_copy_for_cluster) 和设置 vm 的函数 (visibilitymap_set) 可以理解为两套”逻辑”，重写路径会完全绕过 shared buffers，而 visibilitymap_set() 之类 API 又是按 Buffer 设计的，所以官方的说法是：要么重写 VM 代码，要么把 rewrite 路径改成走缓冲区，两条都很重，可能得不偿失。

The main problem of that attempt was usage of existing heap_page_is_all_visible() and visibilitymap_set() functions. These functions work through the buffer manager, while heap rewriting is made bypass buffer manager

说白了，vacuum full 不常触发 + 通常会带有 vacuum analyze 的操作，其次只是会短时间内的性能有一定的劣化，再者为此实现置位代价太大。

小结

VACUUM FULL 不更新 Visibility Map 是历史遗留的实现细节/缺失优化，而非逻辑错误；这一项早已被列入官方 TODO，最早可追溯到 2013 年，未来某个版本可能会补上。若现阶段，你发现 IOS 的退化对于当前业务影响较为严重，可在 FULL 后再跑一次普通 VACUUM 或等待 autovacuum 的自动触发。

这个案例也充分说明了社区的宗旨是把稳健性放在首位，宁缺勿滥，即使 Index‑Only Scan 失效会带来可见的性能损失，核心开发者仍然坚持“只要实现路径有潜在一致性风险，就暂缓合并”。与其仓促加入一段难以验证的“离线 VM 写入”代码，社区更愿意保持现状，确保不会破坏数据可靠性。其次，渐进式演进，而非一次到位，VACUUM FULL 不更新 VM 从最早被发现、加入 TODO，到多轮 PoC 补丁不断迭代，社区一直在用“拆小步、持续改进”的方式推进，而不是逼迫单次提交解决所有边角。