DBA 表示惊呆了，流复制也不会同步序列？

流复制

Word count: 1.4kReading time: 6 min

 2024/09/10 

前言

这两天，一位学员问了这样一个问题：”主备的序列不会进行同步吗？”，逻辑复制场景下，我们知道序列并不会进行同步，而是直接复制的SQL语句，并没有真正的在订阅侧再次调用序列。

那么流复制呢？**流复制的现象就比较诡异，在一定时间内，也不会进行复制，看似会打破流复制的复制行为，实则是序列的独特实现方式导致！**让我们一探究竟。

复现

首先搭建一个异步主从

postgres=# create table t2(id serial,info text);
CREATE TABLE
postgres=# insert into t2(info) values('hello');
INSERT 0 1
postgres=# insert into t2(info) values('world');
INSERT 0 1
postgres=# select * from t2_id_seq ;
 last_value | log_cnt | is_called 
------------+---------+-----------
          2 |      31 | t
(1 row)

然后让我们在备库上瞅瞅

postgres=# select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 t
(1 row)

postgres=# select * from t2_id_seq ;
 last_value | log_cnt | is_called 
------------+---------+-----------
         33 |       0 | t
(1 row)

确实如这位读者所说，在备库该序列还是一个”初始状态”，并且在备库是不能调用 nextval 手动推进序列值的

1 2	postgres=# select nextval('t2_id_seq'); ERROR: cannot execute nextval() in a read-only transaction

说明这种行为是”写行为”。那么为什么会这样？既然是写行为，如果是一个强同步备库的话，备库挂了，序列还能正常调用吗？让我们验证一下：

postgres=# select sync_state,pid from pg_stat_replication;
 sync_state |  pid  
------------+-------
 sync       | 23631
(1 row)

然后关闭强同步备库

postgres=# begin;
BEGIN
postgres=*# insert into t2(info) values('world');
INSERT 0 1
postgres=*# commit ;
^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
COMMIT

postgres=# begin;
BEGIN
postgres=*# select nextval('t2_id_seq');
 nextval 
---------
       4
(1 row)

postgres=*# rollback ;
ROLLBACK
postgres=# begin;
BEGIN
postgres=*# select nextval('t2_id_seq');
 nextval 
---------
       5
(1 row)

postgres=*# commit ;
COMMIT

奇怪的是，与刚刚的结论有点不符，虽然是写行为，但是 COMMIT 的时候，居然没有阻塞。

其实这和序列的 WAL 日志原理有关，我在之前文章中也提过，

/*
 * We don't want to log each fetching of a value from a sequence,
 * so we pre-log a few fetches in advance. In the event of
 * crash we can lose (skip over) as many values as we pre-logged.
 */
#define SEQ_LOG_VALS	32

...

		/*
		 * We don't log the current state of the tuple, but rather the state
		 * as it would appear after "log" more fetches.  This lets us skip
		 * that many future WAL records, at the cost that we lose those
		 * sequence values if we crash.
		 */
		XLogBeginInsert();
		XLogRegisterBuffer(0, buf, REGBUF_WILL_INIT);

		/* set values that will be saved in xlog */
		seq->last_value = next;
		seq->is_called = true;
		seq->log_cnt = 0;

序列和其他数据库对象一样，发生了变更需要记录到WAL中，但是假如每变化一次都记录一次，会对性能产生冲击，所以预先记录了一些值，也就是说，序列每 32 个值记录一次 WAL。

那么当调用了 32 次之后，理论上再下一次调用便会卡住，让我们验证一下：

postgres=# select * from t2_id_seq ;
 last_value | log_cnt | is_called 
------------+---------+-----------
         68 |       4 | t
(1 row)

postgres=# select nextval('t2_id_seq');
 nextval 
---------
      69
(1 row)

postgres=# select nextval('t2_id_seq');
 nextval 
---------
      70
(1 row)

postgres=# select * from t2_id_seq ;
 last_value | log_cnt | is_called 
------------+---------+-----------
         70 |       2 | t
(1 row)

postgres=# select nextval('t2_id_seq');
 nextval 
---------
      71
(1 row)

postgres=# select nextval('t2_id_seq');
 nextval 
---------
      72
(1 row)

postgres=# select nextval('t2_id_seq'); 		---此处卡住
---夯住

^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.

这种机制主要为了提升性能，但是一旦数据库崩溃了，就会丢失部分值，丢失了 log_cnt 数量的序列。当然这个并没有什么大的危害，这是可能会对序列造成浪费，形成空洞。

[postgres@mypg ~]$ psql
psql (17devel)
Type "help" for help.

postgres=# select * from myseq ;
 last_value | log_cnt | is_called 
------------+---------+-----------
        231 |       0 | t
(1 row)

postgres=# select nextval('myseq');
 nextval 
---------
     232
(1 row)

postgres=# \q
[postgres@mypg ~]$ pg_ctl -D 17data restart -mi
waiting for server to shut down.... done
server stopped
waiting for server to start....2024-09-10 21:21:12.521 CST [23943] LOG:  redirecting log output to logging collector process
2024-09-10 21:21:12.521 CST [23943] HINT:  Future log output will appear in directory "log".
 done
server started
[postgres@mypg ~]$ psql
psql (17devel)
Type "help" for help.

postgres=# select * from myseq ;
 last_value | log_cnt | is_called 
------------+---------+-----------
        264 |       0 | t
(1 row)

然后下一次调用又会记录一次 WAL。

另外一个值得注意的地方是，每次 checkpoint 之后第一次调用 nextval，也会重置 log_cnt 为 32。

postgres=# checkpoint ;
CHECKPOINT
postgres=# select nextval('t2_id_seq');
^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
postgres=# checkpoint ;
CHECKPOINT
postgres=# select nextval('t2_id_seq');
^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.

这样的话，每次就需要记录一次 WAL，自然就需要阻塞等待了。

小结

序列为了提升性能，每 32 个值才会记录一次 WAL，这种行为会”违反”流复制下的复制行为，提交无需等待强同步备库 (因为实际上在 log_cnt 到达 32 之前压根就没有记录 WAL)，因此，备库的序列值也会和主库不一样，主库每到达一次 log_cnt，就记录一次 WAL，然后备库的序列值就加 32，不断循环往复。再加上 cache 机制，提前将序列值缓存到内存中，如果主库有大量会话同时使用，更会加剧序列的空洞。

如果发生了切换，或者 crash 了，都会产生一些空洞，如果在某些场景中，需要严格依赖连续的值，可以考虑用 setval 重置一下，可以参照 https://stackoverflow.com/questions/244243/how-to-reset-postgres-primary-key-sequence-when-it-falls-out-of-sync

参考

https://www.postgresql.org/message-id/712cad46-a9c8-1389-aef8-faf0203c9be9@enterprisedb.com

https://stackoverflow.com/questions/244243/how-to-reset-postgres-primary-key-sequence-when-it-falls-out-of-sync

Next Post

How to help others？
Previous Post

还以为死锁只在单机模式下才有？那你 OUT 了！

CATALOG

1. 前言
2. 复现
3. 小结
4. 参考