[pgcluster: 473] リカバリー中に更新系クエリを発行するとエラーになる

kazunari takahashi kazunari.takahashi @ ctc-g.co.jp
2004年 8月 9日 (月) 20:14:16 JST


高橋です.

修正版で、検証してみました.
シーケンスの同期に関しては修正されておりました.
大変ありがとうございます.

新たに不具合を発見してしまいました.

####################################
現象
####################################

・リカバリー処理中に、ロードバランサー経由でinsert文を発行すると、
  エラーになりリカバリーされない.
  #insert文を発行しなければ正常にリカバリーされます.

####################################
#環境
####################################
pgcluster-1.0.7av12

pglb × 1 (solaris8 sparc)
clusterDB ×3 (solaris8 sparc) host名:serverA , serverB , serverC
rgrp × 1 (solaris8 sparc)

####################################
#手順
####################################

-------------------------
1.テーブル作成
-------------------------

test=> create TABLE test(id serial , name varchar(10));
NOTICE:  CREATE TABLE will create implicit sequence 'test_id_seq' for SERIAL column 'test.id'
CREATE TABLE

-------------------------
2.データをloadbalancer経由でインサート
-------------------------

test=> insert into test(name) values('a');
INSERT 33366 1

3clusterDBとも正常にインサートされる.

----------------------------
3.serverCのpostmasterを落とす
-----------------------------

test=> insert into test(name) values('b');
サーバA,Bには正常にインサート
test=> select * from test ;
 id | name
----+------
  1 | a
  2 | b

----------------------------
3.更にインサート
----------------------------
test=> insert into test(name) values('b');

#pgreplicate.sts,pglb.stsに以下のメッセージ

Thu Aug  5 21:19:29 2004  port(5432) host:serverC error

----------------------------
4.serverCのpostmasterを起動
----------------------------
$ pg_ctl start -D /usr/local/pgsql/data -o "-i -R"
Start in recovery mode!
Please wait until a data synchronization finishes from Master DB...
postmaster successfully started

----------------------------
5.更にインサートを1件
----------------------------

test=> insert into test(name) values('c');
「OK!  The data synchronization with Master DB was finished.」の
メッセージが出てから上記SQLをloadblancerから1回実行

※この時、しばらくプロンプトが帰ってこない.

serverA,serverB:
test=> select * from test ;
 id | name
----+------
  1 | a
  2 | b
  3 | b
  4 | c
(4 rows)

serverC:
test=> select * from test ;
 id | name
----+------
  1 | a
  2 | b
  3 | b
(3 rows)

----------------------------
6.リカバリー終了後にインサートを1件
----------------------------

test=> insert into test(name) values('d');
serverA,serverB:
test=> select * from test ;
 id | name
----+------
  1 | a
  2 | b
  3 | b
  4 | c
  5 | d
(5 rows)

serverC:
test=> select * from test ;
 id | name
----+------
  1 | a
  2 | b
  3 | b
(3 rows)


----------------------------------------------------------
DEBUGログ
----------------------------------------------------------
DEBUG(pgrecovery_loop): recovery accept port 7778

DEBUG(read_packet): receive packet no:1

DEBUG(first_setup_recovery): 1st setup target serverC

DEBUG(first_setup_recovery): 1st setup port 5432

DEBUG(first_setup_recovery): check another recovery process

DEBUG(PGRsem_unlock): add recovery target to host table

DEBUG(send_recovery_packet): set RECOVERY_PGDATA_REQ packet data

DEBUG(PGRsend_replicate_packet_to_server): host(serverA) : port(5432)
DEBUG(getTransactionTbl): not found in transaction tbl host  db:template1 pid:42
6
DEBUG(getTransactionTbl): not found in getTransactionTbl
DEBUG(getTransactionTbl): not found in transaction tbl host  db:template1 pid:42
6
DEBUG(pgr_createConn): PQsetdbLogin host[serverA] port[5432] db[template1] user[postgres]
DEBUG(replicate_loop): wait replicate

DEBUG(pgr_createConn): PQsetdbLogin ok!!
DEBUG(insertTransactionTbl): db:template1 port:5432 user:postgres host:serverA
 query:VACUUM
DEBUG(insertTransactionTbl): sync_command(SELECT PGR_SYSTEM_COMMAND_FUNCTION(3,0
,0,2) )
DEBUG(insertTransactionTbl): PQexec send :VACUUM
DEBUG(insertTransactionTbl): PQexec end
DEBUG(getTransactionTbl): hit !! transaction tbl host serverA db:template1 pid
:426
DEBUG(getTransactionTbl): send packet to master serverA recoveryPort 7779

DEBUG(PGRsem_unlock): wait answer from master server

DEBUG(read_packet_from_master): wait

DEBUG(read_packet): get answer from master

DEBUG(send_recovery_packet): 1st master serverA - 5432
DEBUG(send_recovery_packet): 1st target serverC - 5432
DEBUG(replicate_loop): wait replicate

DEBUG(read_packet): receive packet no:5

DEBUG(PGRsend_replicate_packet_to_server): host(serverA) : port(5432)
DEBUG(getTransactionTbl): not found in transaction tbl host  db:template1 pid:42
6
DEBUG(getTransactionTbl): not found in getTransactionTbl
DEBUG(getTransactionTbl): not found in transaction tbl host  db:template1 pid:42
6
DEBUG(pgr_createConn): PQsetdbLogin host[serverA] port[5432] db[template1] user[postgres]
DEBUG(pgr_createConn): PQsetdbLogin ok!!
DEBUG(insertTransactionTbl): db:template1 port:5432 user:postgres host:serverA
 query:VACUUM FULL
DEBUG(insertTransactionTbl): sync_command(SELECT PGR_SYSTEM_COMMAND_FUNCTION(3,0,0,2) )
DEBUG(replicate_loop): wait replicate

DEBUG(insertTransactionTbl): PQexec send :VACUUM FULL
DEBUG(insertTransactionTbl): PQexec end
DEBUG(getTransactionTbl): hit !! transaction tbl host serverA db:template1 pid
:426
DEBUG(read_packet_from_master): wait

DEBUG(send_recovery_packet): 2nd master serverA - 5432
DEBUG(send_recovery_packet): 2nd target serverC - 5432
DEBUG(send_recovery_packet): second_setup_recovery end :1
DEBUG(replicate_loop): wait replicate

DEBUG(pgrecovery_loop): recovery accept port 7778

DEBUG(read_packet): receive packet no:9

DEBUG(read_packet): last master serverA - 5432
DEBUG(read_packet): last target serverC - 5432
DEBUG(PGRsem_unlock): replicate_loop selected

DEBUG(PGRread_packet): query :: insert into test (name) values('c')

DEBUG(PGRset_queue): msgsnd header rtn:0
DEBUG(PGRset_queue): msgsnd query[insert into test (name) values('c')] rtn:0
DEBUG(PGRreplicate_packet_send): cmdSts=Q

DEBUG(PGRreplicate_packet_send): cmdType=I

DEBUG(PGRreplicate_packet_send): port=5432

DEBUG(PGRreplicate_packet_send): pid=29795

DEBUG(PGRreplicate_packet_send): from_host=serverA

DEBUG(PGRreplicate_packet_send): dbName=test

DEBUG(PGRreplicate_packet_send): userName=test

DEBUG(PGRreplicate_packet_send): recieve sec=1092047364

DEBUG(PGRreplicate_packet_send): recieve usec=53901

DEBUG(PGRreplicate_packet_send): query_size=35

DEBUG(PGRreplicate_packet_send): query=insert into test (name) values('c')

DEBUG(PGRreplicate_packet_send): useFlag[2]
DEBUG(PGRget_master): serverA skipped

DEBUG(is_need_response): sem_lock[1]
DEBUG(PGRreplicate_packet_send_each_server): except:5432 @ serverA host:5432 @ serverB

DEBUG(PGRreplicate_packet_send_each_server): send replicate to:serverB

DEBUG(PGRsend_replicate_packet_to_server): host(serverB) : port(5432)
DEBUG(getTransactionTbl): hit !! transaction tbl host serverB db:test pid:2979
5
DEBUG(getTransactionTbl): db:test port:5432 user:test host:serverB query:insert into test (name) values('c')
DEBUG(getTransactionTbl): sync_command(SELECT PGR_SYSTEM_COMMAND_FUNCTION(3,1092047364,53901,2) )
DEBUG(getTransactionTbl): PQexec send :insert into test (name) values('c')
DEBUG(getTransactionTbl): PQexec end
DEBUG(delete_template): sem_lock[2]
DEBUG(PGRsem_lock): sem_unlock[1]
DEBUG(delete_template): PGRreplicate_packet_send end
DEBUG(PGRreturn_result): PGRreturn_result[6]
DEBUG(PGRreturn_result): [128 / 128] send
DEBUG(PGRreturn_result): 128 send
DEBUG(PGRreturn_result): status of PGRreturn_result[0]
DEBUG(PGRreturn_result): sem_unlock[2]
DEBUG(PGRsem_unlock): replicate_loop selected

DEBUG(replicate_loop): wait replicate

DEBUG(PGRsend_queue): master serverA - 5432

DEBUG(PGRget_HostTbl): target serverC - 5432

DEBUG(PGRget_HostTbl): msgrcv[insert into test (name) values('c')]
DEBUG(PGRsend_replicate_packet_to_server): host(serverA) : port(5432)
DEBUG(getTransactionTbl): hit !! transaction tbl host serverA db:test pid:2979
5
DEBUG(getTransactionTbl): db:test port:5432 user:test host:serverA query:inser
t into test (name) values('c')
DEBUG(getTransactionTbl): sync_command(SELECT PGR_SYSTEM_COMMAND_FUNCTION(3,1092
047364,53901,2) )
ERROR(getTransactionTbl): PGRsend_replicate_packet_to_server to master error
DEBUG(getTransactionTbl): send_queue return status -1

ERROR(getTransactionTbl): PGRsend_queue failed
DEBUG(replicate_loop): wait replicate

DEBUG(replicate_loop): wait replicate

DEBUG(PGRread_packet): replicate_loop selected

DEBUG(PGRread_packet): query :: insert into test (name) values('d')

DEBUG(PGRreplicate_packet_send): cmdSts=Q

DEBUG(PGRreplicate_packet_send): cmdType=I

DEBUG(PGRreplicate_packet_send): port=5432

DEBUG(PGRreplicate_packet_send): pid=29795

DEBUG(PGRreplicate_packet_send): from_host=serverA

DEBUG(PGRreplicate_packet_send): dbName=test

DEBUG(PGRreplicate_packet_send): userName=test

DEBUG(PGRreplicate_packet_send): recieve sec=1092047456

DEBUG(PGRreplicate_packet_send): recieve usec=624724

DEBUG(PGRreplicate_packet_send): query_size=35

DEBUG(PGRreplicate_packet_send): query=insert into test (name) values('d')

DEBUG(PGRreplicate_packet_send): useFlag[2]
DEBUG(PGRis_same_host): 5432 @ serverA return trigger
DEBUG(is_need_sync_time): sem_lock[1]
DEBUG(PGRreturn_result): PGRreturn_result[3,1092047456,624724,2]
DEBUG(PGRreturn_result): [128 / 128] send
DEBUG(PGRreturn_result): 128 send
DEBUG(PGRreturn_result): wait for answer
DEBUG(PGRread_packet): answer[PGR_QUERY_DONE_NOTICE_CMD]
DEBUG(PGRread_packet): QUERY DONE
DEBUG(PGRread_packet): status of PGRreturn_result[0]
DEBUG(PGRread_packet): sem_lock[2]
DEBUG(PGRsem_lock): sem_unlock[1]
DEBUG(PGRreplicate_packet_send_each_server): except:5432 @ serverA host:5432 @ serverB

DEBUG(PGRreplicate_packet_send_each_server): send replicate to:serverB

DEBUG(PGRsend_replicate_packet_to_server): host(serverB) : port(5432)
DEBUG(getTransactionTbl): hit !! transaction tbl host serverB db:test pid:2979
5
DEBUG(getTransactionTbl): db:test port:5432 user:test host:serverB query:insert into test (name) values('d')
DEBUG(getTransactionTbl): sync_command(SELECT PGR_SYSTEM_COMMAND_FUNCTION(3,1092047456,624724,2) )
DEBUG(getTransactionTbl): PQexec send :insert into test (name) values('d')
DEBUG(getTransactionTbl): PQexec end
DEBUG(delete_template): sem_lock[3]
DEBUG(PGRsem_lock): sem_unlock[2]
DEBUG(delete_template): PGRreplicate_packet_send end
DEBUG(PGRreturn_result): PGRreturn_result[6]
DEBUG(PGRreturn_result): [128 / 128] send
DEBUG(PGRreturn_result): 128 send
DEBUG(PGRreturn_result): status of PGRreturn_result[0]
DEBUG(delete_template): sem_unlock[3]
DEBUG(replicate_loop): wait replicate




pgcluster メーリングリストの案内