PostgreSQL并发删除插入同一条记录时的奇怪现象及分析

wfkjxy · 发表于 2016-12-21 09:56:20

用sysbench 0.4.12对PostgreSQL进行压测时遇到了主键约束违反的错误。
然后发现原因在于PostgreSQL在并发执行依次更新删除插入同一条记录的事务时(比如就像下面的事务)，就可能会报主键约束违反的错误。

begin;
update tb1 set c=2 where id=1;
delete from tb1 where id=1;
insert into tb1 values(1,2);
commit;

完整的再现方法是这样的：
1. 建表

create table tb1(id int primary key,c int);
insert into tb1 values(1,1);

2. 准备SQL脚本
test.sql:

begin;
update tb1 set c=2 where id=1
delete from tb1 where id=1;
insert into tb1 values(1,2);
commit;

3. 执行测试

[postgres(at)localhost ~]$ pgbench -n -f test.sql -c 2 -j 2 -t 2
client 1 aborted in state 3: ERROR: duplicate key value violates unique
constraint "tb1_pkey"
DETAIL: Key (id)=(1) already exists.
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 2
number of transactions per client: 2
number of transactions actually processed: 2/4
latency average: 0.000 ms
tps = 130.047467 (including connections establishing)
tps = 225.060485 (excluding connections establishing)

4. 查看日志
事先已经配置PostgreSQL打印所有SQL

[postgres(at)localhost ~]$ cat pg95data/pg_log/postgresql-2015-10-25_141648.log
2015-10-25 14:16:48.144 EDT 57177 0 LOG: database system was shut down at 2015-10-25 14:16:47 EDT
2015-10-25 14:16:48.146 EDT 57177 0 LOG: MultiXact member wraparound protections are now enabled
2015-10-25 14:16:48.149 EDT 57175 0 LOG: database system is ready to accept connections
2015-10-25 14:16:48.150 EDT 57181 0 LOG: autovacuum launcher started
2015-10-25 14:16:57.960 EDT 57184 0 LOG: connection received: host=[local]
2015-10-25 14:16:57.961 EDT 57184 0 LOG: connection authorized: user=postgres database=postgres
2015-10-25 14:16:57.971 EDT 57186 0 LOG: connection received: host=[local]
2015-10-25 14:16:57.971 EDT 57187 0 LOG: connection received: host=[local]
2015-10-25 14:16:57.972 EDT 57186 0 LOG: connection authorized: user=postgres database=postgres
2015-10-25 14:16:57.972 EDT 57187 0 LOG: connection authorized: user=postgres database=postgres
2015-10-25 14:16:57.975 EDT 57186 0 LOG: statement: begin;
2015-10-25 14:16:57.975 EDT 57186 0 LOG: statement: update tb1 set c=2 where id=1
2015-10-25 14:16:57.975 EDT 57187 0 LOG: statement: begin;
2015-10-25 14:16:57.976 EDT 57187 0 LOG: statement: update tb1 set c=2 where id=1
2015-10-25 14:16:57.978 EDT 57186 39682 LOG: statement: delete from tb1 where id=1;
2015-10-25 14:16:57.979 EDT 57186 39682 LOG: statement: insert into tb1 values(1,2);
2015-10-25 14:16:57.979 EDT 57186 39682 LOG: statement: commit;
2015-10-25 14:16:57.980 EDT 57186 0 LOG: statement: begin;
2015-10-25 14:16:57.981 EDT 57186 0 LOG: statement: update tb1 set c=2 where id=1
2015-10-25 14:16:57.981 EDT 57187 39683 LOG: statement: delete from tb1 where id=1;
2015-10-25 14:16:57.981 EDT 57186 39684 LOG: statement: delete from tb1 where id=1;
2015-10-25 14:16:57.981 EDT 57186 39684 LOG: statement: insert into tb1 values(1,2);
2015-10-25 14:16:57.981 EDT 57186 39684 LOG: statement: commit;
2015-10-25 14:16:57.983 EDT 57187 39683 LOG: statement: insert into tb1 values(1,2);
2015-10-25 14:16:57.983 EDT 57187 39683 ERROR: duplicate key value violates unique constraint "tb1_pkey"
2015-10-25 14:16:57.983 EDT 57187 39683 DETAIL: Key (id)=(1) already exists.
2015-10-25 14:16:57.983 EDT 57187 39683 STATEMENT: insert into tb1 values(1,2);

分析这段日志，发现和我的认识不符，我一直认为事务里的第一条UPDATE会获得一个行锁，没有得到锁的事务会等到得到锁的事务提交后把锁释放，这样的话之后的操作就变成了串行操作，不会出现冲突。

于是，我把这个问题作为BUG提交到社区的Bug邮件列表。
http://www.postgresql.org/message-id/20151025110136.3017.39398@wrigleys.postgresql.org

结果社区不认为这是Bug，而与PG实现MVCC的机制有关。并且手册中确实也有说明。虽然UPDATE仍然是阻塞的，在持有行锁的那个事务提交后，读已提交隔离级别下，被解除阻塞的事务会再次进行更新操作。但是这次更新操作可能会看到不一致的数据快照。

From http://www.postgresql.org/docs/current/static/transaction-iso.html > Because of the above rule, it is possible for an updating command to see an
> inconsistent snapshot: it can see the effects of concurrent updating
> commands on the same rows it is trying to update, but it does not see
> effects of those commands on other rows in the database. This behavior
> makes Read Committed mode unsuitable for commands that involve complex
> search conditions;

然而,不光是UPDATE，SELECT ... FOR UPDATE也可能看到不一致的快照，实验如下:

1. SQL脚本
test10.sql:

begin;
select * from tb1 where id=1 for update;
update tb1 set c=2 where id=1;
delete from tb1 where id=1;
insert into tb1 values(1,2);
commit;
begin;
select * from tb1 where id=1 for update;
update tb1 set c=2 where id=1;
delete from tb1 where id=1;
insert into tb1 values(1,2);
commit;
...

以上内容重复多次

2. 执行测试

[postgres@localhost ~]$ psql -f test10.sql >b1.log 2>&1 &
[postgres@localhost ~]$ psql -f test10.sql >b2.log 2>&1 &

3. 查看日志
b1.log:

...
BEGIN
id | c
----+---
(0 rows)
UPDATE 0
DELETE 0
psql:test10.sql:29: ERROR: duplicate key value violates unique constraint "tb1_pkey"
DETAIL: Key (id)=(1) already exists.
ROLLBACK
...

从日志可以看出，“select * from tb1 where id=1 for update”看到了一个不一致的状态。这不就是“脏读”吗！

解释
那么，怎么解释这个事情？
PostgreSQL的处理逻辑是这样的（手册也有说明）：
两个事务并发更新同一条记录时会导致一个事务被锁住，持有锁的事务提交后，被解除阻塞的事务的隔离级别如果是“读已提交”则对更新对象行再次进行where条件评估，如果仍然满足原来的where条件这执行更新否则不更新。

需要注意的是，where条件的再评估是针对初始检索筛选出的行而不是对整个表重新执行检索，所以如果这期间有insert过来的新行也满足where条件，或者某个被更新的行从原来不满足where条件变成了满足where条件，是不会被处理的。
另外，被insert的行总被认为是新行，哪怕它的主键和之前刚刚删除的一行相同（我之前没有意识到这一点，所以老在纠结）。

关于这个问题的详细解释，如下
参考PG的MVCC实现原理，逻辑上的行由1个或多个行版本(tuple)构成，这些tuple通过内部的t_ctid指向最新版本的tuple。像下面这样.
开始时，逻辑行上只有1个tuple，它的t_ctid指向自己(0,1) 。

postgres=# begin;
BEGIN
postgres=# select * from tb1;
id | c
----+---
1 | 2
(1 row)
postgres=# SELECT * FROM heap_page_items(get_raw_page('tb1', 0));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------
1 | 8160 | 1 | 32 | 148302 | 0 | 0 | (0,1) | 2 | 11008 | 24 | |
(1 row)

UPDATE后，出现了2个tuple，第2个tuple是新版，所以这两个tuple的t_ctid都指向(0,2) 。

postgres=# update tb1 set c=2 where id=1;
UPDATE 1
postgres=# SELECT * FROM heap_page_items(get_raw_page('tb1', 0));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------
1 | 8160 | 1 | 32 | 148302 | 148304 | 0 | (0,2) | 16386 | 8960 | 24 | |
2 | 8128 | 1 | 32 | 148304 | 0 | 0 | (0,2) | 32770 | 10240 | 24 | |
(2 rows)

DELETE后，最新的tuple的t_xmax被标记上了删除它的事务的事务ID。

postgres=# delete from tb1 where id=1;
DELETE 1
postgres=# SELECT * FROM heap_page_items(get_raw_page('tb1', 0));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------
1 | 8160 | 1 | 32 | 148302 | 148304 | 0 | (0,2) | 16386 | 8960 | 24 | |
2 | 8128 | 1 | 32 | 148304 | 148304 | 0 | (0,2) | 40962 | 8224 | 24 | |
(2 rows)

到目前为止，UPDATE和DELETE都作用的tuple都可以用t_ctid串起来，我们可以姑且称之为“tuple链”。但是INSERT操作是不一样的，它开始了一个新的tuple链。

postgres=# insert into tb1 values(1,2);
INSERT 0 1
postgres=# SELECT * FROM heap_page_items(get_raw_page('tb1', 0));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------
1 | 8160 | 1 | 32 | 148302 | 148304 | 0 | (0,2) | 16386 | 8960 | 24 | |
2 | 8128 | 1 | 32 | 148304 | 148304 | 0 | (0,2) | 40962 | 8224 | 24 | |
3 | 8096 | 1 | 32 | 148304 | 0 | 2 | (0,3) | 2 | 2048 | 24 | |
(3 rows)

在我们的例子中，由于并发更新而被阻塞的UPDATE操作在阻塞解除后，根据它操作对象的tuple（第一个tuple）的t_ctid找到这个tuple链上最新的tuple，即第2个tuple（它没有在整个表上再次执行检索，所以他没有发现第3个tuple）。由于第2个tuple已经被删除了，所以UPDATE的影响行数是0。这个事务中后面的DELETE遇到了和UPDATE同样的问题，开始时它没有抢过其他并发事务，等其他并发事务执行完了，它同样没有看到新插入的行，所以DELETE的影响行数也是0。因为这个原因，执行INSERT操作时，表中已经有了相同key的记录了，作为报主键约束违反的错误。
关于PostgreSQL MVCC的原理，可以参考 http://blog.iyunv.com/uid-20726500-id-4040024.html。

分析到这里，我们可以知道，即使没有第一个update，就像下面这样，问题也能再现。

[postgres@localhost ~]$ cat test0.sql
begin;
delete from tb1 where id=1;
insert into tb1 values(1,2);
commit;
[postgres@localhost ~]$ pgbench -n -f test0.sql -c 2 -j 2 -t 1
client 0 aborted in state 2: ERROR: duplicate key value violates unique constraint "tb1_pkey"
DETAIL: Key (id)=(1) already exists.
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 2
number of transactions per client: 1
number of transactions actually processed: 1/2
latency average: 0.000 ms
tps = 94.393053 (including connections establishing)
tps = 223.788743 (excluding connections establishing)

所以关键在于，在一个读已提交事务中，delete + insert同一个key就可能出现问题。

如果没有主键会发生什么？
如果没有主键，你会发现事情会变得更糟。
先把主键去掉

drop table tb1;
create table tb1(id int,c int);
insert into tb1 values(1,1);

再次测试，没有报错。

[postgres@localhost ~]$ pgbench -n -f test0.sql -c 2 -j 2 -t 1
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 2
number of transactions per client: 1
number of transactions actually processed: 2/2
latency average: 0.000 ms
tps = 169.047418 (including connections establishing)
tps = 338.094836 (excluding connections establishing)

但你会发现插入了2条记录。

postgres=# select * from tb1;
id | c
----+---
1 | 2
1 | 2
(2 rows)

最后
这个问题的危害还是有限的。
1，首先如果持有锁的事务回滚了，不会出现任何问题，它看到状态还是来自一个已提交的事务（只不过这个状态不是最终状态），所以不能算是“脏读”。
2，其次只有更新操作（包括select ... for update）可能会看到不一致状态，只读操作不会。
3，在同一个事务中，先后删除再插入同一个key也没有什么意义，也就测试程序可能会这么干（如果你的应用也是这么写的，请改掉！）。

如果非要在一个事务中删除再插入同一个key(或者遇到其它的更新操作会看到不一致状态的场景)，可以把隔离级别调高到可重复读或可串行化。但是调高以后，你会发现错误消息变成了“并发冲突”。但这个变化是有意义的，“并发冲突”代表一个可以重试的错误，应用捕获到这个错误后可以尝试再次执行，而“主键冲突”的错误没有这层含义。那么，看上去这个问题仅仅成了一个错误消息不当的问题了（实际上当然不是这么简单）。

账号		自动登录	找回密码
密码			立即注册

大疆运维招人啦，

Red Hat RHCE 8 (EX294) Cert Guide

c++ size_t 和 int 的区别

HERE 使用 AWS EF 和 JFrog Artifactory 打

C++ 指针大全：从基础到进阶，一篇快速上手

wirelessnetview好用的无线分析工具

亿图图示专家(EDraw Max) V7.9 中文破解版

[经验分享] PostgreSQL并发删除插入同一条记录时的奇怪现象及分析

浏览过的版块

扫码加入运维网微信交流群