PostgreSQL学徒

关于表结构变更你必须知道的事

Word count: 6.1kReading time: 31 min
2022/03/17
loading

前言

线上同事又双叒叕找过来了,涉及到一个大表更改字段类型,这个表接近2TB,数星星属月亮,历经9小时都没有修改成功,无奈只能回退。这是为何?殊不知,在PostgreSQL中,对表结构进行变更也是有不小的学问在里面的。

更改字段

首先是本例中的更改字段类型,这个需求也最常见。看看官网对这块的说明

Adding a column with a volatile DEFAULT or changing the type of an existing column will require the entire table and its indexes to be rewritten. As an exception, when changing the type of an existing column, if the USING clause does not change the column contents and the old type is either binary coercible to the new type or an unconstrained domain over the new type, a table rewrite is not needed; but any indexes on the affected columns must still be rebuilt. Table and/or index rebuilds may take a significant amount of time for a large table; and will temporarily require as much as double the disk space.

Adding a CHECK or NOT NULL constraint requires scanning the table to verify that existing rows meet the constraint, but does not require a table rewrite.

Similarly, when attaching a new partition it may be scanned to verify that existing rows meet the partition constraint.

修改现有列的类型会导致表被重写。作为例外,如果USING子句不更改列的内容并且旧类型与新类型二进制兼容(binary coercible),则不需要重写表。但是索引仍然需要重建。

什么是binary coercible?翻看官网,原来也可以被称作”binary compatible”,即二进制兼容

Two types that are binary coercible both ways are also referred to as binary compatible

比如varchar和text,就可以称作为二进制兼容。PostgreSQL内置了一些转换函数

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
postgres=# \dC+  
List of casts
Source type | Target type | Function | Implicit? | Description
-----------------------------+-----------------------------+--------------------+---------------+-------------
"char" | character | bpchar | in assignment |
"char" | character varying | text | in assignment |
"char" | integer | int4 | no |
"char" | text | text | yes |
abstime | date | date | in assignment |
abstime | integer | (binary coercible) | no |
abstime | time without time zone | time | in assignment |

................................

timestamp without time zone | timestamp with time zone | timestamptz | yes |
timestamp without time zone | timestamp without time zone | timestamp | yes |
xml | character | (binary coercible) | in assignment |
xml | character varying | (binary coercible) | in assignment |
xml | text | (binary coercible) | in assignment |
(246 rows)

看一下\dC+元命令的定义

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
postgres=# \dC+
********* QUERY **********
SELECT pg_catalog.format_type(castsource, NULL) AS "Source type",
pg_catalog.format_type(casttarget, NULL) AS "Target type",
CASE WHEN c.castmethod = 'b' THEN '(binary coercible)'
WHEN c.castmethod = 'i' THEN '(with inout)'
ELSE p.proname
END AS "Function",
CASE WHEN c.castcontext = 'e' THEN 'no'
WHEN c.castcontext = 'a' THEN 'in assignment'
ELSE 'yes'
END AS "Implicit?",
d.description AS "Description"
FROM pg_catalog.pg_cast c LEFT JOIN pg_catalog.pg_proc p
ON c.castfunc = p.oid
LEFT JOIN pg_catalog.pg_type ts
ON c.castsource = ts.oid
LEFT JOIN pg_catalog.pg_namespace ns
ON ns.oid = ts.typnamespace
LEFT JOIN pg_catalog.pg_type tt
ON c.casttarget = tt.oid
LEFT JOIN pg_catalog.pg_namespace nt
ON nt.oid = tt.typnamespace
LEFT JOIN pg_catalog.pg_description d
ON d.classoid = c.tableoid AND d.objoid = c.oid AND d.objsubid = 0
WHERE ( (true AND pg_catalog.pg_type_is_visible(ts.oid)
) OR (true AND pg_catalog.pg_type_is_visible(tt.oid)
) )
ORDER BY 1, 2;

可以看到,第三列看到的(binary coercible)根据pg_cast系统表里面的castmethod字段而来,假如castmethod为b,也就是binary-coercible,f表明使用castfunc中指定的函数。 i表明使用输入/输出函数。 b表明该类型是二进制兼容的,因此不需要转换。

castmethod:Indicates how the cast is performed. f means that the function specified in the castfunc field is used. i means that the input/output functions are used. b means that the types are binary-coercible, thus no conversion is required.

而Implicit列,分为三种情况:

  • assignment: pg_cast.castcontext = a,表示在赋值给目标列时隐式调用, 和显式调用一样。也就是说在赋值时自动对类型进行转换。例如字段类型为text,而输入的类型为int,可以自动转换

    If the cast is marked AS ASSIGNMENT then it can be invoked implicitly when assigning a value to a column of the target data type. For example, supposing that foo.f1 is a column of type text, then:

    INSERT INTO foo (f1) VALUES (42);

    will be allowed if the cast from type integer to type text is marked AS ASSIGNMENT, otherwise not. (We generally use the term assignment cast to describe this kind of cast.)

  • no:pg_cast.castcontext = e,表示仅能作为一个显式转换(使用CAST::语法)。

  • yes:pg_cast.castcontext = i,表示在表达式中隐式调用,和其他转换一样。

简单测一下,比如int转化为boolean,castcontext为e,表示不会隐式调用。

1
2
3
4
5
6
7
8
9
10
11
12
13
postgres=# select castsource::regtype,casttarget::regtype,castfunc::regproc,castcontext,castmethod from pg_cast where castsource= 'int'::regtype and casttarget = 'boolean'::regtype;
castsource | casttarget | castfunc | castcontext | castmethod
------------+------------+-----------------+-------------+------------
integer | boolean | pg_catalog.bool | e | f
(1 row)

postgres=# create table cas_test(id int, c1 boolean);
CREATE TABLE
postgres=# insert into cas_test values (1, int '1');
ERROR: column "c1" is of type boolean but expression is of type integer
LINE 1: insert into cas_test values (1, int '1');
^
HINT: You will need to rewrite or cast the expression.

所以,需要我们使用CAST的方式强转

1
2
3
4
5
6
7
8
postgres=# select castsource::regtype,casttarget::regtype,castfunc::regproc,castcontext,castmethod from pg_cast where castsource= 'int'::regtype and casttarget = 'boolean'::regtype;
castsource | casttarget | castfunc | castcontext | castmethod
------------+------------+-----------------+-------------+------------
integer | boolean | pg_catalog.bool | e | f
(1 row)

postgres=# insert into cas_test values (1, cast ((int '1') as boolean));
INSERT 0 1

当然,你可以修改一下系统表,让其自动转化即可

1
2
3
4
5
6
7
8
9
10
11
12
13
postgres=# begin;
BEGIN
postgres=*# update pg_cast set castcontext = 'a' where castsource= 'int'::regtype and casttarget = 'boolean'::regtype;
UPDATE 1
postgres=*# insert into cas_test values (1, int '1');
INSERT 0 1
postgres=*# rollback ;
ROLLBACK
postgres=# insert into cas_test values (1, int '1');
ERROR: column "c1" is of type boolean but expression is of type integer
LINE 1: insert into cas_test values (1, int '1');
^
HINT: You will need to rewrite or cast the expression.

这样就不会报错了。不过并不是所有的转换都有显示,只包括那些不能从某些普通规则推导出的转换。

It should be noted that pg_cast does not represent every type conversion that the system knows how to perform; only those that cannot be deduced from some generic rule. For example, casting between a domain and its base type is not explicitly represented in pg_cast. Another important exception is that “automatic I/O conversion casts”, those performed using a data type’s own I/O functions to convert to or from text or other string types, are not explicitly represented in pg_cast.

比如int到text的转换,内置转化里就没有

1
2
3
4
5
6
7
8
9
postgres=# select castsource::regtype,casttarget::regtype,castfunc::regproc,castcontext,castmethod from pg_cast where castsource= 'int'::regtype and casttarget = 'text'::regtype;
castsource | casttarget | castfunc | castcontext | castmethod
------------+------------+----------+-------------+------------
(0 rows)

postgres=# create table test1(info text);
CREATE TABLE
postgres=# insert into test1 values(int '1');
INSERT 0 1

那看看timestamp到text

1
2
3
4
5
6
7
8
9
postgres=# select castsource::regtype,casttarget::regtype,castfunc::regproc,castcontext,castmethod from pg_cast where castsource= 'timestamp'::regtype and casttarget = 'text'::regtype;
castsource | casttarget | castfunc | castcontext | castmethod
------------+------------+----------+-------------+------------
(0 rows)

postgres=# create table test3(info text);
CREATE TABLE
postgres=# insert into test3 values(timestamp '2020-01-01');
INSERT 0 1

text到timestamp

1
2
3
4
5
6
7
postgres=# create table test4(info timestamp);
CREATE TABLE
postgres=# insert into test4 values(text '2020-01-01');
ERROR: column "info" is of type timestamp without time zone but expression is of type text
LINE 1: insert into test4 values(text '2020-01-01');
^
HINT: You will need to rewrite or cast the expression.

那么我们可以自定义一个CAST来实现,这里我们使用内置的IO函数

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
postgres=# \h create cast
Command: CREATE CAST
Description: define a new cast
Syntax:
CREATE CAST (source_type AS target_type)
WITH FUNCTION function_name [ (argument_type [, ...]) ]
[ AS ASSIGNMENT | AS IMPLICIT ]

CREATE CAST (source_type AS target_type)
WITHOUT FUNCTION
[ AS ASSIGNMENT | AS IMPLICIT ]

CREATE CAST (source_type AS target_type)
WITH INOUT
[ AS ASSIGNMENT | AS IMPLICIT ]

URL: https://www.postgresql.org/docs/14/sql-createcast.html

postgres=# create cast (text as timestamp) with inout as assignment;
CREATE CAST

postgres=# select castsource::regtype,casttarget::regtype,castfunc::regproc,castcontext,castmethod from pg_cast where castsource= 'text'::regtype and casttarget = 'timestamp'::regtype;
castsource | casttarget | castfunc | castcontext | castmethod
------------+-----------------------------+----------+-------------+------------
text | timestamp without time zone | - | a | i
(1 row)

postgres=# insert into test4 values(text '2020-01-01');
INSERT 0 1

postgres=# drop cast (text as timestamp);
DROP CAST
postgres=# insert into test4 values(text '2020-01-01');
ERROR: column "info" is of type timestamp without time zone but expression is of type text
LINE 1: insert into test4 values(text '2020-01-01');
^
HINT: You will need to rewrite or cast the expression.

You can define a cast as an I/O conversion cast by using the WITH INOUT syntax. An I/O conversion cast is performed by invoking the output function of the source data type, and passing the resulting string to the input function of the target data type. In many common cases, this feature avoids the need to write a separate cast function for conversion. An I/O conversion cast acts the same as a regular function-based cast; only the implementation is different.

数据类型都有一个内置的IO转换函数,比如text、int等

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
postgres=# \df textin 
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+--------+------------------+---------------------+------
pg_catalog | textin | text | cstring | func
(1 row)

postgres=# \df textout
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+---------+------------------+---------------------+------
pg_catalog | textout | cstring | text | func
(1 row)

postgres=# \df int4in
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+--------+------------------+---------------------+------
pg_catalog | int4in | integer | cstring | func
(1 row)

postgres=# \df timestamp_in
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+--------------+-----------------------------+-----------------------+------
pg_catalog | timestamp_in | timestamp without time zone | cstring, oid, integer | func
(1 row)

postgres=# \df varcharin
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+-----------+-------------------+-----------------------+------
pg_catalog | varcharin | character varying | cstring, oid, integer | func
(1 row)

textin用于转化为内部的表示格式,而textout用于将内部格式转化为想要的格式

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
/*
* textin - converts "..." to internal representation
*/
Datum
textin(PG_FUNCTION_ARGS)
{
char *inputText = PG_GETARG_CSTRING(0);

PG_RETURN_TEXT_P(cstring_to_text(inputText));
}

/*
* textout - converts internal representation to "..."
*/
Datum
textout(PG_FUNCTION_ARGS)
{
Datum txt = PG_GETARG_DATUM(0);

PG_RETURN_CSTRING(TextDatumGetCString(txt));
}

表是否要重写

有了这些铺垫之后,我们来简单测一下常见的场景,哪些情况会涉及到表的重写?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
postgres=# create table test(id int,info varchar(10),t_time timestamp);
CREATE TABLE
postgres=# select pg_relation_filepath('test');
pg_relation_filepath
----------------------
base/13890/23388
(1 row)

postgres=# begin;
BEGIN
postgres=*# alter table test alter column id type bigint;
ALTER TABLE
postgres=*# select pg_relation_filepath('test');
pg_relation_filepath
----------------------
base/13890/23394
(1 row)

postgres=*# alter table test alter column info type varchar(20);
ALTER TABLE
postgres=*# select pg_relation_filepath('test');
pg_relation_filepath
----------------------
base/13890/23394
(1 row)

postgres=*# alter table test alter column t_time type date;
ALTER TABLE
postgres=*# select pg_relation_filepath('test');
pg_relation_filepath
----------------------
base/13890/23397
(1 row)

...

以上省略了若干,通过实验可以发现

  • varchar(x) to varchar(y) when y>=x,不会发生重写
  • varchar(x) to varchar or text,不会发生重写
  • int to bigint,需要重写

减少表的重写是在9.2里面的一个新特性。https://www.postgresql.org/docs/9.2/release-9-2.html#AEN116571

Increasing the length limit for a varchar or varbit column, or removing the limit altogether, no longer requires a table rewrite. Similarly, increasing the allowable precision of a numeric column, or changing a column from constrained numeric to unconstrained numeric, no longer requires a table rewrite. Table rewrites are also avoided in similar cases involving the interval, timestamp, and timestamptz types.

下面做一个简单汇总整理:

  • varchar(x) to varchar(y) when y>=x. It works too if going from varchar(x) to varchar or text (no size limitation)
  • numeric(x,z) to numeric(y,z) when y>=x, or to numeric without specifier
  • varbit(x) to varbit(y) when y>=x, or to varbit without specifier
  • timestamp(x) to timestamp(y) when y>=x or timestamp without specifier
  • timestamptz(x) to timestamptz(y) when y>=x or timestamptz without specifier
  • interval(x) to interval(y) when y>=x or interval without specifier
  • timestamp to text、varchar、varchar(n),char(n),需要重写
  • timestamp(x) to text、varchar、varchar(n)、char(n),n>=x,需要重写
  • text to char、char(x)、varchar(n),需要重写
  • text to varchar,不需要重写
  • numeric(x) to numeric(y),y>=x,不需要重写
  • numeric(x) to numeric,不需要重写
  • numeric(x,y) to numeric,不需要重写

简而言之,就是字段长度或者是精度标度由小变大,可以不需要rewrite。但是注意对于int4到int8这种转化,还是需要重写的,因为底层存储不一样。

那么假如是无法转换的类型我们该怎么做呢?

1
2
3
4
5
postgres=# begin;
BEGIN
postgres=*# alter table test alter column id type timestamp;
ERROR: column "id" cannot be cast automatically to type timestamp without time zone
HINT: You might need to specify "USING id::timestamp without time zone".

可以使用using子句,不过这种方式本质上就是使用旧值重新计算了一次,所以也会发生重写,本身int转化为timestamp也没什么意义。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
postgres=# begin;
BEGIN
postgres=# select pg_relation_filepath('test');
pg_relation_filepath
----------------------
base/13890/23438
(1 row)

postgres=*# alter table test alter column id type date using (timestamp with time zone 'epoch' + id * interval '1 second');
ALTER TABLE
postgres=*# select pg_relation_filepath('test');
pg_relation_filepath
----------------------
base/13890/23441
(1 row)

或者使用前面的知识,创建一个CAST

1
2
3
4
5
6
7
8
postgres=# create cast (int as timestamp) with inout as assignment;
CREATE CAST
postgres=# begin;
BEGIN
postgres=*# alter table test alter column id type timestamp;
ALTER TABLE
postgres=*# rollback ;
ROLLBACK

不过这种不适用于表里已有数据的情况

1
2
3
4
postgres=# insert into test values(1);
INSERT 0 1
postgres=# alter table test alter column id type timestamp;
ERROR: invalid input syntax for type timestamp: "1"

那么我们可以怎么做呢?使用UDF来搞

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
postgres=# create or replace function cast_int_to_timestamp(int) returns timestamp as $$  
select timestamp with time zone 'epoch' + $1 * interval '1 second'
$$ language sql strict ;
CREATE FUNCTION
postgres=# create cast (int as timestamp) with function cast_int_to_timestamp as assignment;
CREATE CAST

postgres=# begin;
BEGIN
postgres=*# alter table test alter column id type timestamp;
ALTER TABLE
postgres=*# select * from test;
id
---------------------
1970-01-01 08:00:01
(1 row)

postgres=*# rollback ;
ROLLBACK
postgres=# select * from test;
id
----
1
(1 row)

以text转化为timestamp为例

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
postgres=# \d test1
Table "public.test1"
Column | Type | Collation | Nullable | Default
--------+------+-----------+----------+---------
info | text | | |

postgres=# insert into test1 values('2022-03-16 00:00:00');
INSERT 0 1
postgres=# select * from test1;
info
---------------------
2022-03-16 00:00:00
(1 row)

postgres=# begin;
BEGIN
postgres=*# select pg_relation_filepath('test1');
pg_relation_filepath
----------------------
base/13890/23444
(1 row)

postgres=*# alter table test1 alter COLUMN info type timestamp;
ERROR: column "info" cannot be cast automatically to type timestamp without time zone
HINT: You might need to specify "USING info::timestamp without time zone".

postgres=# alter table test1 alter COLUMN info type timestamp using (info::timestamp);
ALTER TABLE
postgres=# select pg_relation_filepath('test1');
pg_relation_filepath
----------------------
base/13890/23583
(1 row)

在线修改

http://v0.pigsty.cc/zh/blog/2020/01/30/%E5%9C%A8%E7%BA%BF%E4%BF%AE%E6%94%B9pg%E5%AD%97%E6%AE%B5%E7%B1%BB%E5%9E%8B/

索引是否要重写

另外,官网上有一个BUG!各位可以去抓紧提BUG了,说不定15的致谢名单里面就有你😁在官网上写到,不管如何,索引都会要重写,但是实测并不会。

but any indexes on the affected columns must still be rebuilt. Table and/or index rebuilds may take a significant amount of time for a large table; and will temporarily require as much as double the disk space.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
postgres=# create table test(id numeric(5,3),info varchar(10));
CREATE TABLE
postgres=# create index myidx on test(id);
CREATE INDEX
postgres=# create index myidx2 on test(info);
CREATE INDEX
postgres=# select pg_relation_filepath('test'),pg_relation_filepath('myidx'),pg_relation_filepath('myidx2');
pg_relation_filepath | pg_relation_filepath | pg_relation_filepath
----------------------+----------------------+----------------------
base/13890/23877 | base/13890/23880 | base/13890/23881
(1 row)

postgres=# set client_min_messages = debug5;
DEBUG: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
LOG: duration: 0.078 ms statement: set client_min_messages = debug5;
SET
postgres=# begin;
DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
LOG: duration: 0.073 ms statement: begin;
BEGIN
postgres=*# alter table test alter column id type numeric(6,3);
LOG: duration: 1.014 ms statement: alter table test alter column id type numeric(6,3);
ALTER TABLE
postgres=*# select pg_relation_filepath('test'),pg_relation_filepath('myidx'),pg_relation_filepath('myidx2');
LOG: duration: 0.271 ms statement: select pg_relation_filepath('test'),pg_relation_filepath('myidx'),pg_relation_filepath('myidx2');
pg_relation_filepath | pg_relation_filepath | pg_relation_filepath
----------------------+----------------------+----------------------
base/13890/23877 | base/13890/23880 | base/13890/23881
(1 row)

可以看到,索引并没有重写。只有表发生了重写,索引才会重写。而下面这种情况的日志打印十分清晰:DEBUG: building index “myidx2” on table “test” serially,索引发生了重写。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
postgres=# begin;
DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
LOG: duration: 0.049 ms statement: begin;
BEGIN
postgres=*# alter table test alter column id type int;
DEBUG: rewriting table "test"
DEBUG: building index "myidx2" on table "test" serially
DEBUG: index "myidx2" can safely use deduplication
DEBUG: building index "myidx" on table "test" serially
DEBUG: index "myidx" can safely use deduplication
DEBUG: drop auto-cascades to type pg_temp_23877
DEBUG: drop auto-cascades to type pg_temp_23877[]
LOG: duration: 2.567 ms statement: alter table test alter column id type int;
ALTER TABLE
postgres=*# select pg_relation_filepath('test'),pg_relation_filepath('myidx'),pg_relation_filepath('myidx2');
LOG: duration: 0.249 ms statement: select pg_relation_filepath('test'),pg_relation_filepath('myidx'),pg_relation_filepath('myidx2');
pg_relation_filepath | pg_relation_filepath | pg_relation_filepath
----------------------+----------------------+----------------------
base/13890/23885 | base/13890/23889 | base/13890/23888
(1 row)

修改系统表

修改系统表就不推荐了,比较危险。

这里之前朋友”乡下人”分享了一个有趣的case,引用一下:

1
2
3
4
postgres=# create table t(id numeric(5));
CREATE TABLE
postgres=# alter table t alter COLUMN id type numeric(8,2);
ALTER TABLE

尝试修改元数据来修改字段类型

1
2
3
4
5
6
7
8
9
10
postgres=# SELECT atttypmod, attname
FROM pg_attribute
WHERE 1 = 1
AND attrelid = 't'::regclass
AND attname ='id'
ORDER BY atttypmod;
atttypmod | attname
-----------+---------
524294 | id
(1 row)

这里atttypmod有个公式:atttypmod = precision * 65,536 + scale + 4,524294 = 8*65,536+2+4

如果我们要修改为numeric(10,3),那我们计算一下:atttypmod = precision * 65,536 + scale + 4 = 655360 + 3 + 4 = 655367

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
postgres=# UPDATE pg_attribute
SET atttypmod = 655367
WHERE attrelid = 't'::regclass
AND attname = 'id';
UPDATE 1

postgres=# \d t
Table "public.t"
Column | Type | Collation | Nullable | Default
--------+---------------+-----------+----------+---------
id | numeric(10,3) | | |

postgres=# select * from t;
id
----------
88888.00 ---这里为.00
(1 row)

重新插入一条数据

1
2
3
4
5
6
7
8
postgres=# insert into t values(1);
INSERT 0 1
postgres=# select * from t;
id
----------
88888.00 ---老的.00
1.0000 ---新的.0000
(2 rows)

如果我们修改了元数据,老的数据的精度是不会变化的。

另外,这里还涉及到一个问题,各位可能发现,对于numeric的类型在pg_attribute里面是看不到精度的

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
postgres=# select * from pg_attribute where attrelid = 't'::regclass and attname = 'id';
-[ RECORD 1 ]--+-------
attrelid | 23903
attname | id
atttypid | 1700
attstattarget | -1
attlen | -1
attnum | 1
attndims | 0
attcacheoff | -1
atttypmod | 655367
attbyval | f
attalign | i
attstorage | m
attcompression |
attnotnull | f
atthasdef | f
atthasmissing | f
attidentity |
attgenerated |
attisdropped | f
attislocal | t
attinhcount | 0
attcollation | 0
attacl |
attoptions |
attfdwoptions |
attmissingval |

不过难不倒stackoverflow上面的大神,可以使用如下SQL查看numeric的精度

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
postgres=# SELECT
CASE atttypid
WHEN 21 /*int2*/ THEN 16
WHEN 23 /*int4*/ THEN 32
WHEN 20 /*int8*/ THEN 64
WHEN 1700 /*numeric*/ THEN
CASE WHEN atttypmod = -1
THEN null
ELSE ((atttypmod - 4) >> 16) & 65535 -- calculate the precision
END
WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
ELSE null
END AS numeric_precision,
CASE
WHEN atttypid IN (21, 23, 20) THEN 0
WHEN atttypid IN (1700) THEN
CASE
WHEN atttypmod = -1 THEN null
ELSE (atttypmod - 4) & 65535 -- calculate the scale
END
ELSE null
END AS numeric_scale,
*
FROM
pg_attribute
where attrelid = 't'::regclass and attname = 'id';
-[ RECORD 1 ]-----+-------
numeric_precision | 10
numeric_scale | 3
attrelid | 23903
attname | id
atttypid | 1700
attstattarget | -1
attlen | -1
attnum | 1
attndims | 0
attcacheoff | -1
atttypmod | 655367
attbyval | f
attalign | i
attstorage | m
attcompression |
attnotnull | f
atthasdef | f
atthasmissing | f
attidentity |
attgenerated |
attisdropped | f
attislocal | t
attinhcount | 0
attcollation | 0
attacl |
attoptions |
attfdwoptions |
attmissingval |

添加字段

下面就是添加字段了,当需要给表添加一列的时候,需要考虑如下几种情况:

v10

在10版本中,新增不含有默认值的字段不需要重写表,只需要更新系统表,因此DDL可以瞬间执行。如果新增的字段带默认值,则需要重写表,相当于vacuum full,表越大,执行时间越长。

  • ALTER TABLE table_name ADD COLUMN flag text; ——不会重写
  • ALTER TABLE table_name ADD COLUMN flag text DEFAULT 'default values'; ——需要重写

v11

When a column is added with ADD COLUMN and a non-volatile DEFAULT is specified, the default is evaluated at the time of the statement and the result stored in the table’s metadata. That value will be used for the column for all existing rows. If no DEFAULT is specified, NULL is used. In neither case is a rewrite of the table required.

Adding a column with a volatile DEFAULT or changing the type of an existing column will require the entire table and its indexes to be rewritten.

在11版本中,新增带默认值的字段可以不用重写表,通过pg_attribute中的atthasmissing和attmissingval来标识,不过也需要分情况:

  • 新增的默认值假如是一个常量,不需要重写,比如alter table test add column info text default ‘hello’ not null;
  • 新增的默认值假如是stable或者immutable类型的函数,不需要重写表,比如alter table test add column t_time timestamp default now() not null;
  • 新增的默认值假如是volatile类型的函数,需要重写表,比如alter table test add column id int default random() not null;

所以假如是11以前的版本或者是volatile类型的函数,需要进行拆解,减少锁的粒度,比如需要给某个表添加带有默认值的列,需要拆分:

  • alter table test add column info text; ——短暂的AccessExclusiveLock,8级锁
  • alter table test alter column info set default 'Y'; ——短暂的AccessExclusiveLock,8级锁
  • update test set info = 'Y' where id between ... ——分段提交,RowExclusiveLock,3级锁

总之满足基本的加锁原则即可:

  • 够用即可:使用满足条件的锁中最弱的锁模式
  • 越快越好:如果可能,可以用(长时间的弱锁 + 短时间的强锁)替换长时间的强锁
  • 递增获取:遵循2PL原则申请锁;越晚使用激进锁策略越好;在真正需要时再获取。
  • 相同顺序:获取锁尽量以一致的顺序获取,从而减小死锁的几率

注意:添加序列会导致表的重写。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
postgres=# select pg_relation_filepath('t3');
pg_relation_filepath
----------------------
base/13287/16491
(1 row)

postgres=# insert into t3 select n,'test' from generate_series(1,10000000) as n;
INSERT 0 10000000
postgres=# \timing on
Timing is on.
postgres=# alter table t3 add column num serial;
ALTER TABLE
Time: 43466.880 ms (00:43.467)
postgres=# select pg_relation_filepath('t3');
pg_relation_filepath
----------------------
base/13287/16497
(1 row)

删除字段

那删除字段呢?删除字段在PostgreSQL是很快的,不过缺点就是空间无法立刻回收。

The DROP COLUMN form does not physically remove the column, but simply makes it invisible to SQL operations. Subsequent insert and update operations in the table will store a null value for the column. Thus, dropping a column is quick but it will not immediately reduce the on-disk size of your table, as the space occupied by the dropped column is not reclaimed. The space will be reclaimed over time as existing rows are updated.

可以看到,删除字段只是将系统表中设为不可见,........pg.dropped.2........,秒删

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
postgres=# \dt+ test
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+------+-------+----------+-------------+---------------+-------+-------------
public | test | table | postgres | permanent | heap | 42 MB |
(1 row)

postgres=# alter table test drop column info;
ALTER TABLE
postgres=# \dt+ test
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+------+-------+----------+-------------+---------------+-------+-------------
public | test | table | postgres | permanent | heap | 42 MB |
(1 row)

postgres=# select * from pg_attribute where attrelid = 'test'::regclass and attname not in ('tableoid','cmax','xmax','cmin','xmin','ctid','id');
attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attalign | attstorage | attcompression |
attnotnull | atthasdef | atthasmissing | attidentity | attgenerated | attisdropped | attislocal | attinhcount | attcollation | attacl | attoptions | attfdwoptions | attmissingval
----------+------------------------------+----------+---------------+--------+--------+----------+-------------+-----------+----------+----------+------------+----------------+------------+-----------+---------------+-------------+--------------+--------------+------------+-------------+--------------+--------+------------+---------------+---------------
23630 | ........pg.dropped.2........ | 0 | 0 | -1 | 2 | 0 | -1 | -1 | f | i | x | |
f | f | f | | | t | t | 0 | 100 | | | |
(1 row)

所以,表的大小并没有缩减。随着更新,空间会被回收,直接vacuum full或者cluster也可以。相反,这方面Oracle、MySQL对于大表删除字段会比较费力。

回顾

最后,让我们回到生产中的这个案例,生产中有一个分区表,其中id列是serial,按照create_date范围分区,目前该分区表已经接近2TB,由于serial最大是4字节,很快就要用完。并且更加骚的是,针对每个子表,id列是主键,只能考虑将serial改为bigserial,也就是

  • alter table xxx alter column id type bigint;
  • alter sequence xxx as bigint;

但是由于表太大,int改为bigint,有了前面的知识铺垫,表是需要重写的!所以第一次执行了9个小时都没有执行完,十分费力。最终考虑了两个方案:

  • 新增一列为bigint,然后批量更新,而增量数据采用触发器进行捕获,最后对表进行rename,因为数据没有更新和删除,所以触发器逻辑相对还好
  • 类似于时序数据,只插入不删除,所以也可以采用detach的方式,将不常用的分区先detach,更改完成之后,再attach回来

不过后来,我在喝水的间隙,突然灵光一现!

**主键是在不同的子分区创建的,而不是在父表上创建的!**因为假如是父表的话,必须要求主键列包含分区键

Unique constraints (and hence primary keys) on partitioned tables must include all the partition key columns. This limitation exists because the individual indexes making up the constraint can only directly enforce uniqueness within their own partitions; therefore, the partition structure itself must guarantee that there are not duplicates in different partitions.

分区表的唯一约束(也就是主键)必须包括所有的分区键列。存在这个限制是因为构成约束的各个索引只能在它们自己的分区中直接执行唯一性;因此,分区结构本身必须保证在不同的分区中不存在重复。

所以,每个子表的主键只能限制各个子分区,而这个表的场景是按照create_date进行分区,类似于时序数据,只有插入,时间戳数据一直增长,那么完全可以直接设置为cycle即可。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
postgres=# create table test(id serial,id2 int,info text) partition by range(id2);
CREATE TABLE
postgres=# create table test_part_1 partition of test for values from (1) to (10);
CREATE TABLE
postgres=# create table test_part_2 partition of test for values from (10) to (20);
CREATE TABLE
postgres=# create table test_part_3 partition of test for values from (20) to (30);
CREATE TABLE
postgres=# create table test_part_4 partition of test for values from (30) to (40);
CREATE TABLE
postgres=# create table test_part_5 partition of test for values from (40) to (50);
CREATE TABLE
postgres=# create table test_part_default partition of test default ;
CREATE TABLE

然后针对各个子表创建主键

1
2
3
4
5
6
7
8
9
10
11
12
13
14
postgres=# alter table test_part_1 add primary key(id);
ALTER TABLE
postgres=# alter table test_part_2 add primary key(id);
ALTER TABLE
postgres=# alter table test_part_3 add primary key(id);
ALTER TABLE
postgres=# alter table test_part_4 add primary key(id);
ALTER TABLE
postgres=# alter table test_part_5 add primary key(id);
ALTER TABLE
postgres=# alter sequence test_id_seq maxvalue 15;
ALTER SEQUENCE
postgres=# alter sequence test_id_seq cycle ;
ALTER SEQUENCE

插入一下数据

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
postgres=# insert into test(id2,info) select n,'test' from generate_series(1,20) as n;
INSERT 0 20
postgres=# select * from test_part_1;
id | id2 | info
----+-----+------
1 | 1 | test
2 | 2 | test
3 | 3 | test
4 | 4 | test
5 | 5 | test
6 | 6 | test
7 | 7 | test
8 | 8 | test
9 | 9 | test
(9 rows)

postgres=# select * from test_part_2;
id | id2 | info
----+-----+------
10 | 10 | test
11 | 11 | test
12 | 12 | test
13 | 13 | test
14 | 14 | test
15 | 15 | test
1 | 16 | test
2 | 17 | test
3 | 18 | test
4 | 19 | test
(10 rows)

可以看到,不同的子分区,数据是可以重复的。我简直是太机智了!

小结

原来在PostgreSQL中,修改表结构也是一门大学问!各位可以去提BUG了,说不定在15的release的致谢名单里面,就能看到你的名字了。🤣🤣。

参考

https://www.postgresql.org/docs/current/sql-altertable.html

https://www.postgresql.org/docs/current/catalog-pg-cast.html

https://github.com/digoal/blog/blob/master/201801/20180131_01.md

https://github.com/digoal/blog/blob/master/201710/20171013_03.md

https://github.com/digoal/blog/blob/master/201805/20180517_04.md

https://stackoverflow.com/questions/3350148/where-are-numeric-precision-and-scale-for-a-field-found-in-the-pg-catalog-tables

https://stackoverflow.com/questions/52376045/why-does-atttypmod-differ-from-character-maximum-length

https://www.depesz.com/2011/03/06/waiting-for-9-1-rewrite-less-changing-types-of-column/

https://www.postgresql.org/message-id/flat/CAMp9%3DExXtH0NeF%2BLTsNrew_oXycAJTNVKbRYnqgoEAT01t%3D67A%40mail.gmail.com

https://dba.stackexchange.com/questions/248985/changing-a-column-datatype-from-int-to-bigint

https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/postgres-tips-how-to-convert-2-billion-rows-to-bigint-with-citus/ba-p/1490128

https://stackoverflow.com/questions/48693373/does-alter-column-type-varcharn-rewrite-the-table-in-postgres-9-6

https://leopard.in.ua/2016/09/20/safe-and-unsafe-operations-postgresql#.YjKqsXpBzZT

https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.2#Reduce_ALTER_TABLE_rewrites

https://dba.stackexchange.com/questions/241431/change-column-datatype-without-rewrite-when-data-remains-unchanged

CATALOG
  1. 1. 前言
  2. 2. 更改字段
    1. 2.1. 表是否要重写
      1. 2.1.1. 在线修改
    2. 2.2. 索引是否要重写
    3. 2.3. 修改系统表
  3. 3. 添加字段
    1. 3.1. v10
    2. 3.2. v11
  4. 4. 删除字段
  5. 5. 回顾
  6. 6. 小结
  7. 7. 参考