自省 自行 自醒

DBA 表示惊呆了,流复制也不会同步序列?

Word count: 1.4kReading time: 6 min
2024/09/10
loading

前言

这两天,一位学员问了这样一个问题:”主备的序列不会进行同步吗?”,逻辑复制场景下,我们知道序列并不会进行同步,而是直接复制的SQL语句,并没有真正的在订阅侧再次调用序列。

那么流复制呢?**流复制的现象就比较诡异,在一定时间内,也不会进行复制,看似会打破流复制的复制行为,实则是序列的独特实现方式导致!**让我们一探究竟。

复现

首先搭建一个异步主从

1
2
3
4
5
6
7
8
9
10
11
postgres=# create table t2(id serial,info text);
CREATE TABLE
postgres=# insert into t2(info) values('hello');
INSERT 0 1
postgres=# insert into t2(info) values('world');
INSERT 0 1
postgres=# select * from t2_id_seq ;
last_value | log_cnt | is_called
------------+---------+-----------
2 | 31 | t
(1 row)

然后让我们在备库上瞅瞅

1
2
3
4
5
6
7
8
9
10
11
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)

postgres=# select * from t2_id_seq ;
last_value | log_cnt | is_called
------------+---------+-----------
33 | 0 | t
(1 row)

确实如这位读者所说,在备库该序列还是一个”初始状态”,并且在备库是不能调用 nextval 手动推进序列值的

1
2
postgres=# select nextval('t2_id_seq');
ERROR: cannot execute nextval() in a read-only transaction

说明这种行为是”写行为”。那么为什么会这样?既然是写行为,如果是一个强同步备库的话,备库挂了,序列还能正常调用吗?让我们验证一下:

1
2
3
4
5
postgres=# select sync_state,pid from pg_stat_replication;
sync_state | pid
------------+-------
sync | 23631
(1 row)

然后关闭强同步备库

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
postgres=# begin;
BEGIN
postgres=*# insert into t2(info) values('world');
INSERT 0 1
postgres=*# commit ;
^CCancel request sent
WARNING: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not have been replicated to the standby.
COMMIT

postgres=# begin;
BEGIN
postgres=*# select nextval('t2_id_seq');
nextval
---------
4
(1 row)

postgres=*# rollback ;
ROLLBACK
postgres=# begin;
BEGIN
postgres=*# select nextval('t2_id_seq');
nextval
---------
5
(1 row)

postgres=*# commit ;
COMMIT

奇怪的是,与刚刚的结论有点不符,虽然是写行为,但是 COMMIT 的时候,居然没有阻塞。

其实这和序列的 WAL 日志原理有关,我在之前文章中也提过,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
/*
* We don't want to log each fetching of a value from a sequence,
* so we pre-log a few fetches in advance. In the event of
* crash we can lose (skip over) as many values as we pre-logged.
*/
#define SEQ_LOG_VALS 32

...

/*
* We don't log the current state of the tuple, but rather the state
* as it would appear after "log" more fetches. This lets us skip
* that many future WAL records, at the cost that we lose those
* sequence values if we crash.
*/
XLogBeginInsert();
XLogRegisterBuffer(0, buf, REGBUF_WILL_INIT);

/* set values that will be saved in xlog */
seq->last_value = next;
seq->is_called = true;
seq->log_cnt = 0;

序列和其他数据库对象一样,发生了变更需要记录到WAL中,但是假如每变化一次都记录一次,会对性能产生冲击,所以预先记录了一些值,也就是说,序列每 32 个值记录一次 WAL。

那么当调用了 32 次之后,理论上再下一次调用便会卡住,让我们验证一下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
postgres=# select * from t2_id_seq ;
last_value | log_cnt | is_called
------------+---------+-----------
68 | 4 | t
(1 row)

postgres=# select nextval('t2_id_seq');
nextval
---------
69
(1 row)

postgres=# select nextval('t2_id_seq');
nextval
---------
70
(1 row)

postgres=# select * from t2_id_seq ;
last_value | log_cnt | is_called
------------+---------+-----------
70 | 2 | t
(1 row)

postgres=# select nextval('t2_id_seq');
nextval
---------
71
(1 row)

postgres=# select nextval('t2_id_seq');
nextval
---------
72
(1 row)

postgres=# select nextval('t2_id_seq'); ---此处卡住
---夯住

^CCancel request sent
WARNING: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not have been replicated to the standby.

这种机制主要为了提升性能,但是一旦数据库崩溃了,就会丢失部分值,丢失了 log_cnt 数量的序列。当然这个并没有什么大的危害,这是可能会对序列造成浪费,形成空洞。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
[postgres@mypg ~]$ psql
psql (17devel)
Type "help" for help.

postgres=# select * from myseq ;
last_value | log_cnt | is_called
------------+---------+-----------
231 | 0 | t
(1 row)

postgres=# select nextval('myseq');
nextval
---------
232
(1 row)

postgres=# \q
[postgres@mypg ~]$ pg_ctl -D 17data restart -mi
waiting for server to shut down.... done
server stopped
waiting for server to start....2024-09-10 21:21:12.521 CST [23943] LOG: redirecting log output to logging collector process
2024-09-10 21:21:12.521 CST [23943] HINT: Future log output will appear in directory "log".
done
server started
[postgres@mypg ~]$ psql
psql (17devel)
Type "help" for help.

postgres=# select * from myseq ;
last_value | log_cnt | is_called
------------+---------+-----------
264 | 0 | t
(1 row)

然后下一次调用又会记录一次 WAL。

另外一个值得注意的地方是,每次 checkpoint 之后第一次调用 nextval,也会重置 log_cnt 为 32。

1
2
3
4
5
6
7
8
9
10
11
12
postgres=# checkpoint ;
CHECKPOINT
postgres=# select nextval('t2_id_seq');
^CCancel request sent
WARNING: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not have been replicated to the standby.
postgres=# checkpoint ;
CHECKPOINT
postgres=# select nextval('t2_id_seq');
^CCancel request sent
WARNING: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not have been replicated to the standby.

这样的话,每次就需要记录一次 WAL,自然就需要阻塞等待了。

小结

序列为了提升性能,每 32 个值才会记录一次 WAL,这种行为会”违反”流复制下的复制行为,提交无需等待强同步备库 (因为实际上在 log_cnt 到达 32 之前压根就没有记录 WAL),因此,备库的序列值也会和主库不一样,主库每到达一次 log_cnt,就记录一次 WAL,然后备库的序列值就加 32,不断循环往复。再加上 cache 机制,提前将序列值缓存到内存中,如果主库有大量会话同时使用,更会加剧序列的空洞。

如果发生了切换,或者 crash 了,都会产生一些空洞,如果在某些场景中,需要严格依赖连续的值,可以考虑用 setval 重置一下,可以参照 https://stackoverflow.com/questions/244243/how-to-reset-postgres-primary-key-sequence-when-it-falls-out-of-sync

参考

https://www.postgresql.org/message-id/712cad46-a9c8-1389-aef8-faf0203c9be9@enterprisedb.com

https://stackoverflow.com/questions/244243/how-to-reset-postgres-primary-key-sequence-when-it-falls-out-of-sync

CATALOG
  1. 1. 前言
  2. 2. 复现
  3. 3. 小结
  4. 4. 参考