[pgcluster: 789] デッドロック発生後、無応答となる
小林 園子
sono-kobayashi @ hitachisoft.jp
2005年 6月 17日 (金) 12:11:01 JST
はじめまして。小林と申します。
PGCluster 1.1.0cフルパッケージ版をインストールして
高負荷時の動きを確認していたところ、
デッドロック発生後にロック解除をずっと待った状態となり、
無応答となってしまいました。
PGCluster 1.1.1a、PGCluster1.3.0cでも
同様に無応答となります。
クラスタDBを1台のみとした場合は、
デッドロック発生後、一方のトランザクションをロールバックします。
必ず発生しているわけではなく、
数回はロック解除を待機せずにロールバックするのですが、
ほとんどが無応答となります。
原因や解決方法がわかる方がいらっしゃいましたら
ご教授いただけないでしょうか?
また、同様な現象が発生している方はいらっしゃいますでしょうか?
よろしくお願いいたします。
<使用バージョン>
PGCluster1.1.0c
<OS>
Red Hat Enterprise Linux ES 3
<マシン構成>
ロードバランサ+レプリケーションサーバ 1台
クラスタDB 2台
<再現方法>
同一のテーブルに対し、以下の2つのUPDATE文を同時に実行する
1.update t_reservation_management set cdate=current_timestamp;
2.update t_reservation_management set
reservation_start_date='xxxxxx',reservation_last_date='xxxxx' where meeting_id ='';
互いのUPDATE文は、異なるカラムを更新している。
1はpsqlより手動でSQLを実行する。
2はJDBCで60回繰り返し実行する。1回のループごとにコネクションは解放する。
--------------------------------------------------------------
<ログ>
■レプリケーションサーバ
DEBUG:sem_lock[1]
DEBUG:send_sync_data():sync_command(SELECT PGR_SYSTEM_COMMAND_FUNCTION(3,1118969809,93361,33,1) )
DEBUG:PGRreturn_result():PGRreturn_result[]
DEBUG:PGRreturn_result():wait for answer
DEBUG:answer[PGR_QUERY_DONE_NOTICE_CMD]
DEBUG:read_answer():QUERY DONE
DEBUG:PGRreturn_result():status of PGRreturn_result[0]
DEBUG:sem_unlock[1]
DEBUG:replicate_loop():replicate_loop selected
DEBUG:PGRreplicate_packet_send():cmdSts=T
DEBUG:PGRreplicate_packet_send():cmdType=U
DEBUG:PGRreplicate_packet_send():rlog=0
DEBUG:PGRreplicate_packet_send():request_id=489
DEBUG:PGRreplicate_packet_send():replicate_id=0
DEBUG:PGRreplicate_packet_send():port=5432
DEBUG:PGRreplicate_packet_send():pid=3449
DEBUG:PGRreplicate_packet_send():from_host=clusterdb2.pgcluster.jp
DEBUG:PGRreplicate_packet_send():dbName=mm_meet
DEBUG:PGRreplicate_packet_send():userName=postgre
DEBUG:PGRreplicate_packet_send():recieve sec=1118969809
DEBUG:PGRreplicate_packet_send():recieve usec=95226
DEBUG:PGRreplicate_packet_send():query_size=147
DEBUG:PGRreplicate_packet_send():query=update t_reservation_management set
reservation_start_date='2005-06-17 09:42',reservation_last_date='2005-06-17 21:42' where meeting_id
='BHCA0091'
DEBUG:sem_lock[1]
DEBUG:send_sync_data():sync_command(SELECT PGR_SYSTEM_COMMAND_FUNCTION(3,1118969809,95226,33,1) )
DEBUG:PGRreturn_result():PGRreturn_result[]
DEBUG:PGRreturn_result():wait for answer
DEBUG:answer[PGR_QUERY_DONE_NOTICE_CMD]
DEBUG:read_answer():QUERY DONE
DEBUG:PGRreturn_result():status of PGRreturn_result[0]
DEBUG:sem_unlock[1]
DEBUG:replicate_loop():replicate_loop selected
DEBUG:PGRreplicate_packet_send():cmdSts=Q
DEBUG:PGRreplicate_packet_send():cmdType=U
DEBUG:PGRreplicate_packet_send():rlog=0
DEBUG:PGRreplicate_packet_send():request_id=4
DEBUG:PGRreplicate_packet_send():replicate_id=0
DEBUG:PGRreplicate_packet_send():port=5432
DEBUG:PGRreplicate_packet_send():pid=3591
DEBUG:PGRreplicate_packet_send():from_host=clusterdb1.pgcluster.jp
DEBUG:PGRreplicate_packet_send():dbName=mm_meet
DEBUG:PGRreplicate_packet_send():userName=postgre
DEBUG:PGRreplicate_packet_send():recieve sec=1118969809
DEBUG:PGRreplicate_packet_send():recieve usec=98656
DEBUG:PGRreplicate_packet_send():query_size=59
DEBUG:PGRreplicate_packet_send():query=update t_reservation_management set cdate=current_timestamp
DEBUG:sem_lock[1]
DEBUG:PGRreturn_result():PGRreturn_result[3,1118969809,98656,34,1]
DEBUG:PGRreturn_result():wait for answer
DEBUG:send_sync_data():sync_command(SELECT PGR_SYSTEM_COMMAND_FUNCTION(3,1118969809,98656,34,1) )
DEBUG:answer[PGR_LOCK_CONFLICT_NOTICE_CMD]
DEBUG:read_answer():LOCK CONFLICT !!
DEBUG:PGRreturn_result():status of PGRreturn_result[2]
DEBUG:replicate_loop():replicate_loop selected
DEBUG:PGRreplicate_packet_send():cmdSts=T
DEBUG:PGRreplicate_packet_send():cmdType=U
DEBUG:PGRreplicate_packet_send():rlog=0
DEBUG:PGRreplicate_packet_send():request_id=490
DEBUG:PGRreplicate_packet_send():replicate_id=0
DEBUG:PGRreplicate_packet_send():port=5432
DEBUG:PGRreplicate_packet_send():pid=3449
DEBUG:PGRreplicate_packet_send():from_host=clusterdb2.pgcluster.jp
DEBUG:PGRreplicate_packet_send():dbName=mm_meet
DEBUG:PGRreplicate_packet_send():userName=postgre
DEBUG:PGRreplicate_packet_send():recieve sec=1118969809
DEBUG:PGRreplicate_packet_send():recieve usec=102684
DEBUG:PGRreplicate_packet_send():query_size=147
DEBUG:PGRreplicate_packet_send():query=update t_reservation_management set
reservation_start_date='2005-06-17 09:42',reservation_last_date='2005-06-17 21:42' where meeting_id
='BHCA0101'
DEBUG:sem_lock[1]
DEBUG:replicate_loop():replicate_loop selected
DEBUG:replicate_loop():replicate_loop selected
DEBUG:PGRreplicate_packet_send():cmdSts=T
DEBUG:PGRreplicate_packet_send():cmdType=U
DEBUG:PGRreplicate_packet_send():rlog=3
DEBUG:PGRreplicate_packet_send():request_id=490
DEBUG:PGRreplicate_packet_send():replicate_id=0
DEBUG:PGRreplicate_packet_send():port=5432
DEBUG:PGRreplicate_packet_send():pid=3449
DEBUG:PGRreplicate_packet_send():from_host=clusterdb2.pgcluster.jp
DEBUG:PGRreplicate_packet_send():dbName=mm_meet
DEBUG:PGRreplicate_packet_send():userName=postgre
DEBUG:PGRreplicate_packet_send():recieve sec=1118969899
DEBUG:PGRreplicate_packet_send():recieve usec=112435
DEBUG:PGRreplicate_packet_send():query_size=147
DEBUG:PGRreplicate_packet_send():query=update t_reservation_management set
reservation_start_date='2005-06-17 09:42',reservation_last_date='2005-06-17 21:42' where meeting_id
='BHCA0101'
DEBUG:sem_lock[1]
DEBUG:replicate_loop():replicate_loop selected
DEBUG:replicate_loop():replicate_loop selected
DEBUG:PGRreplicate_packet_send():cmdSts=T
DEBUG:PGRreplicate_packet_send():cmdType=U
DEBUG:PGRreplicate_packet_send():rlog=3
DEBUG:PGRreplicate_packet_send():request_id=490
DEBUG:PGRreplicate_packet_send():replicate_id=0
DEBUG:PGRreplicate_packet_send():port=5432
DEBUG:PGRreplicate_packet_send():pid=3449
DEBUG:PGRreplicate_packet_send():from_host=clusterdb2.pgcluster.jp
DEBUG:PGRreplicate_packet_send():dbName=mm_meet
DEBUG:PGRreplicate_packet_send():userName=postgre
DEBUG:PGRreplicate_packet_send():recieve sec=1118969989
DEBUG:PGRreplicate_packet_send():recieve usec=121566
DEBUG:PGRreplicate_packet_send():query_size=147
DEBUG:PGRreplicate_packet_send():query=update t_reservation_management set
reservation_start_date='2005-06-17 09:42',reservation_last_date='2005-06-17 21:42' where meeting_id
='BHCA0101'
DEBUG:sem_lock[1]
DEBUG:replicate_loop():replicate_loop selected
DEBUG:PGRreplicate_packet_send():cmdSts=Q
・・・
■クラスタDB1(マスタDB)
LOG: XX000: could not create IPv6 socket: Address family not supported by protocol
LOCATION: StreamServerPort, pqcomm.c:334
LOG: 00000: database system was shut down at 2005-06-17 11:12:02 JST
LOCATION: StartupXLOG, xlog.c:2596
LOG: 00000: checkpoint record is at 0/12A2314
LOCATION: StartupXLOG, xlog.c:2628
LOG: 00000: redo record is at 0/12A2314; undo record is at 0/0; shutdown TRUE
LOCATION: StartupXLOG, xlog.c:2653
LOG: 00000: next transaction ID: 2032; next OID: 33527
LOCATION: StartupXLOG, xlog.c:2656
LOG: 00000: database system is ready
LOCATION: StartupXLOG, xlog.c:2946
ERROR: 57014: canceling query due to user request
LOCATION: ProcessInterrupts, postgres.c:2011
LOG: 08006: could not receive data from client: Connection reset by peer
LOCATION: pq_recvbuf, pqcomm.c:701
LOG: 08P01: unexpected EOF on client connection
LOCATION: SocketBackend, postgres.c:284
LOG: 00000: received smart shutdown request
LOCATION: pmdie, postmaster.c:1750
FATAL: 57P01: terminating connection due to administrator command
LOCATION: ProcessInterrupts, postgres.c:2002
FATAL: 57P01: terminating connection due to administrator command
LOCATION: ProcessInterrupts, postgres.c:2002
LOG: 00000: shutting down
LOCATION: ShutdownXLOG, xlog.c:3090
LOG: 00000: database system is shut down
LOCATION: ShutdownXLOG, xlog.c:3104
■クラスタDB2
LOG: XX000: could not create IPv6 socket: Address family not supported by protocol
LOCATION: StreamServerPort, pqcomm.c:334
LOG: 00000: database system was shut down at 2005-06-17 10:09:53 JST
LOCATION: StartupXLOG, xlog.c:2596
LOG: 00000: checkpoint record is at 0/D00E58
LOCATION: StartupXLOG, xlog.c:2628
LOG: 00000: redo record is at 0/D00E58; undo record is at 0/0; shutdown TRUE
LOCATION: StartupXLOG, xlog.c:2653
LOG: 00000: next transaction ID: 1011; next OID: 25449
LOCATION: StartupXLOG, xlog.c:2656
LOG: 00000: database system is ready
LOCATION: StartupXLOG, xlog.c:2946
LOG: 00000: received smart shutdown request
LOCATION: pmdie, postmaster.c:1750
FATAL: 57P01: terminating connection due to administrator command
LOCATION: ProcessInterrupts, postgres.c:2002
LOG: 00000: server process (PID 3530) was terminated by signal 11
LOCATION: LogChildExit, postmaster.c:2132
LOG: 00000: terminating any other active server processes
LOCATION: CleanupProc, postmaster.c:2053
WARNING: 57P02: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
LOCATION: quickdie, postgres.c:1843
LOG: 00000: all server processes terminated; reinitializing
LOCATION: reaper, postmaster.c:1965
LOG: 00000: database system was interrupted at 2005-06-17 11:12:09 JST
LOCATION: StartupXLOG, xlog.c:2610
LOG: 00000: checkpoint record is at 0/D00E58
LOCATION: StartupXLOG, xlog.c:2628
LOG: 00000: redo record is at 0/D00E58; undo record is at 0/0; shutdown TRUE
LOCATION: StartupXLOG, xlog.c:2653
LOG: 00000: next transaction ID: 1011; next OID: 25449
LOCATION: StartupXLOG, xlog.c:2656
LOG: 00000: database system was not properly shut down; automatic recovery in progress
LOCATION: StartupXLOG, xlog.c:2705
LOG: 00000: redo starts at 0/D00E98
LOCATION: StartupXLOG, xlog.c:2733
LOG: 00000: record with zero length at 0/F4F414
LOCATION: ReadRecord, xlog.c:1900
LOG: 00000: redo done at 0/F4F3F0
LOCATION: StartupXLOG, xlog.c:2765
LOG: 00000: database system is ready
LOCATION: StartupXLOG, xlog.c:2946
LOG: 00000: shutting down
LOCATION: ShutdownXLOG, xlog.c:3090
LOG: 00000: database system is shut down
LOCATION: ShutdownXLOG, xlog.c:3104
<設定ファイル>
■cluster.conf
#============================================================
# Cluster DB Server configuration file
#------------------------------------------------------------
# file: cluster.conf
#------------------------------------------------------------
# This file controls:
# o which hosts & port are replication server
# o which port use for replication request to replication server
# o which command use for recovery function
#============================================================
#------------------------------------------------------------
# set Replication Server information
# o Host_Name : hostname
# o Port : connection for postmaster
# o Recovery_Port : connection for recovery process
#------------------------------------------------------------
<Replicate_Server_Info>
<Host_Name> pgcluster.pgcluster.jp </Host_Name>
<Port> 8001 </Port>
<Recovery_Port> 8101 </Recovery_Port>
<LifeCheck_Port> 8201 </LifeCheck_Port>
</Replicate_Server_Info>
#<Replicate_Server_Info>
# <Host_Name> replicate2.postgres.jp </Host_Name>
# <Port> 8002 </Port>
# <Recovery_Port> 8102 </Recovery_Port>
# <LifeCheck_Port> 8202 </LifeCheck_Port>
#</Replicate_Server_Info>
#<Replicate_Server_Info>
# <Host_Name> replicate3.postgres.jp </Host_Name>
# <Port> 8003 </Port>
# <Recovery_Port> 8103 </Recovery_Port>
# <LifeCheck_Port> 8203 </LifeCheck_Port>
#</Replicate_Server_Info>
#-------------------------------------------------------------
# set Cluster DB Server information
# o Recovery_Port : connection for recovery
# o Rsync_Path : path of rsync command
# o Rsync_Option : file transfer option for rsync
# o When_Stand_Alone : When all replication servers fell,
# you can set up two kinds of permittion,
# "real_only" or "read_write".
#-------------------------------------------------------------
<Recovery_Port> 7101 </Recovery_Port>
<LifeCheck_Port> 7201 </LifeCheck_Port>
<Rsync_Path> /usr/bin/rsync </Rsync_Path>
<Rsync_Option> ssh -2 </Rsync_Option>
<When_Stand_Alone> read_only </When_Stand_Alone>
<Status_Log_File> /usr/local/pgsql/log/cluster.sts </Status_Log_File>
<Error_Log_File> /usr/local/pgsql/log/cluster.log </Error_Log_File>
#-------------------------------------------------------------
# set partitional replicate control information
# set DB name and Table name to stop reprication
# o DB_Name : DB name
# o Table_Name : table name
#-------------------------------------------------------------
#<Not_Replicate_Info>
# <DB_Name> test_db </DB_Name>
# <Table_Name> log_table </Table_Name>
#</Not_Replicate_Info>
■pglb.conf
#============================================================
# Load Balance Server configuration file
#-------------------------------------------------------------
# file: pglb.conf
#-------------------------------------------------------------
# This file controls:
# o which hosts are db cluster server
# o which port use connect to db cluster server
# o how many connections are allowed on each DB server
#============================================================
#-------------------------------------------------------------
# set cluster DB server information
# o Host_Name : hostname
# o Port : connection for postmaster
# o Max_Connection : maximun number of connection to postmaster
#-------------------------------------------------------------
<Cluster_Server_Info>
<Host_Name> clusterdb1.pgcluster.jp</Host_Name>
<Port> 5432 </Port>
<Max_Connect> 100 </Max_Connect>
</Cluster_Server_Info>
<Cluster_Server_Info>
<Host_Name> clusterdb2.pgcluster.jp </Host_Name>
<Port> 5432 </Port>
<Max_Connect> 100 </Max_Connect>
</Cluster_Server_Info>
#<Cluster_Server_Info>
# <Host_Name> post3.postgres.jp </Host_Name>
# <Port> 5432 </Port>
# <Max_Connect> 32 </Max_Connect>
#</Cluster_Server_Info>
#-------------------------------------------------------------
# set Load Balance server information
# o Backend_Socket_Dir : Unix domain socket path for the backend
# o Receive_Port : connection from client
# o Recovery_Port : connection for recovery process
# o LifeCheck_Port : connection for life check process
# o Max_Cluster_Num : maximun number of cluster DB servers
# o Use_Connection_Pooling : use connection pool [yes/no]
# o Max_Pool_Each_Server : number of pool connections/DB server
#-------------------------------------------------------------
<Backend_Socket_Dir> /tmp </Backend_Socket_Dir>
<Receive_Port> 5432 </Receive_Port>
<Recovery_Port> 6101 </Recovery_Port>
<LifeCheck_Port> 6201 </LifeCheck_Port>
<Max_Cluster_Num> 128 </Max_Cluster_Num>
<Use_Connection_Pooling> yes </Use_Connection_Pooling>
■pgreplicate.conf
#=============================================================
# PGReplicate configuration file
# for PGCluster-1.1.0a
#-------------------------------------------------------------
# file: pgreplicate.conf
#-------------------------------------------------------------
# This file controls:
# o which hosts & port are cluster server
# o which port use for replication request from cluster server
#=============================================================
#
#-------------------------------------------------------------
# A setup of Cluster DB(s)
#
# o Host_Name : The host name of Cluster DB.
# -- please write a host name by FQDN.
# -- do not write IP address.
# o Port : The connection port with postmaster.
# o Recovery_Port : The connection port at the time of
# a recovery sequence .
# o LifeCheck_Port : connection for life check process
#-------------------------------------------------------------
<Cluster_Server_Info>
<Host_Name> clusterdb1.pgcluster.jp </Host_Name>
<Port> 5432 </Port>
<Recovery_Port> 7101 </Recovery_Port>
<LifeCheck_Port> 7201 </LifeCheck_Port>
</Cluster_Server_Info>
<Cluster_Server_Info>
<Host_Name> clusterdb2.pgcluster.jp </Host_Name>
<Port> 5432 </Port>
<Recovery_Port> 7101 </Recovery_Port>
<LifeCheck_Port> 7201 </LifeCheck_Port>
</Cluster_Server_Info>
#<Cluster_Server_Info>
# <Host_Name> cluster3.postgres.jp </Host_Name>
# <Port> 5432 </Port>
# <Recovery_Port> 7101 </Recovery_Port>
# <LifeCheck_Port> 7201 </LifeCheck_Port>
#</Cluster_Server_Info>
#
#-------------------------------------------------------------
# A setup of Load Balance Server
#
# o Host_Name : The host name of a load balance server.
# -- please write a host name by FQDN.
# -- do not write IP address.
# o Recovery_Port : The connection port at the time of
# a recovery sequence .
# o LifeCheck_Port : connection for life check process
#-------------------------------------------------------------
<LoadBalance_Server_Info>
<Host_Name> pgcluster.pgcluster.jp </Host_Name>
<Recovery_Port> 6101 </Recovery_Port>
<LifeCheck_Port> 6201 </LifeCheck_Port>
</LoadBalance_Server_Info>
#------------------------------------------------------------
# A setup of the upper replication server for cascade connection.
#
# o Host_Name : The host name of Cluster DB.
# -- please write a host name by FQDN.
# -- do not write IP address.
# o Port : The connection port with postmaster.
# o Recovery_Port : The connection port at the time of
# a recovery sequence .
# o LifeCheck_Port : connection for life check process
#------------------------------------------------------------
#<Replicate_Server_Info>
# <Host_Name> upper_replicate.postgres.jp </Host_Name>
# <Port> 8001 </Port>
# <Recovery_Port> 8101 </Recovery_Port>
# <LifeCheck_Port> 8201 </LifeCheck_Port>
#</Replicate_Server_Info>
#
#-------------------------------------------------------------
# A setup of a replication server
#
# o Status_Log_File : logging file of cluster db's status
# o Error_Log_File : logging file of error and warning
# o Replicate_Port : connection for reprication
# o Recovery_Port : connection for recovery
# o LifeCheck_Port : connection for life check process
# o Response_mode : timing which returns a response
# normal -- return result of DB which received the query
# reliable -- return result after waiting for response of
# all Cluster DBs.
# o Use_Replication_Log : When this server hangs up without
# being replicated to the end,
# a remote server continues the
# replication using this log.
# yes -- use replication log
# no -- not use replication log
# o Reserved_Connections : The number of reserved connections
# from this replication server
# to each cluster dbs.
# (default is 1).
#-------------------------------------------------------------
<Status_Log_File> /usr/local/pgsql/log/pgreplicate.sts </Status_Log_File>
<Error_Log_File> /usr/local/pgsql/log/pgreplicate.log </Error_Log_File>
<Replication_Port> 8001 </Replication_Port>
<Recovery_Port> 8101 </Recovery_Port>
<LifeCheck_Port> 8201 </LifeCheck_Port>
以上です。
pgcluster メーリングリストの案内