[pgcluster: 887] pgbenchで固まってしまいます
shinozaki @ cobalt.co.jp
shinozaki @ cobalt.co.jp
2006年 7月 13日 (木) 17:55:25 JST
前にもレプリケートで失敗する件でお世話になりました、
篠崎と申します。
その時は、hostsファイルが原因だったのですが、
今回は、ちょっと現象が違うので何かわかればと思って、
メールさせていただきました。
今回もレプリケート関連といえば、レプリケート関連なのですが、
pgbenchでコネクション数を増やすとレプリケーションの処理が固まってしまうのです。
だいたい10ぐらいになるとそのままフリーズしてしまいます。
症状ですが、デバッグ文を出力した所、以下の2パターンで固まることがわかりました。
使用したモジュールは、PGCluster-1.5.0rc7
OSは RHEL4
カーネルは、2.6.9-11になります。
CPUはXeonの3G(HT−OFF)
メモリー2Gになります。
パターン1
2006-07-13 16:51:02 [13430] DEBUG:sem_lock [1] req
2006-07-13 16:51:02 [13431] DEBUG:cmdSts=T
2006-07-13 16:51:02 [13431] DEBUG:cmdType=I
2006-07-13 16:51:02 [13431] DEBUG:rlog=0
2006-07-13 16:51:02 [13431] DEBUG:port=5432
2006-07-13 16:51:02 [13431] DEBUG:pid=5917
2006-07-13 16:51:02 [13431] DEBUG:from_host=192.168.0.52
2006-07-13 16:51:02 [13431] DEBUG:dbName=bench
2006-07-13 16:51:02 [13431] DEBUG:userName=postgres
2006-07-13 16:51:02 [13431] DEBUG:recieve sec=1152777062
2006-07-13 16:51:02 [13431] DEBUG:recieve usec=469697
2006-07-13 16:51:02 [13431] DEBUG:query_size=95
2006-07-13 16:51:02 [13431] DEBUG:request_id=6
2006-07-13 16:51:02 [13431] DEBUG:replicate_id=0
2006-07-13 16:51:02 [13431] DEBUG:query=INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (5, 1, 54163, 6573, CURRENT_TIMESTAMP)
2006-07-13 16:51:02 [13431] DEBUG:sem_lock [1] req
2006-07-13 16:51:02 [13428] DEBUG:read_answer():QUERY DONE
2006-07-13 16:51:02 [13428] DEBUG:end thread_send_source()
2006-07-13 16:51:02 [13428] DEBUG:sem_unlock[1]
2006-07-13 16:51:02 [13430] DEBUG:sem_lock [1] got it
2006-07-13 16:51:02 [13430] DEBUG:PGRreplicate_packet_send():checking host bngdb01 for creating threads
2006-07-13 16:51:02 [13430] DEBUG:[0] is not same host
2006-07-13 16:51:02 [13430] DEBUG:PGRreplicate_packet_send():checking host bngdb02 for creating threads
2006-07-13 16:51:02 [13430] DEBUG:[1] is same host
2006-07-13 16:51:02 [13326] DEBUG:start thread_send_cluster()
2006-07-13 16:51:02 [13326] DEBUG:send_replicate_packet_to_server():sync_command(SELECT PGR_SYSTEM_COMMAND_FUNCTION(8,140,0,1) )
2006-07-13 16:51:02 [13428] DEBUG:cmdSts=O
2006-07-13 16:51:02 [13428] DEBUG:cmdType=x
2006-07-13 16:51:02 [13428] DEBUG:rlog=0
2006-07-13 16:51:02 [13428] DEBUG:port=5432
2006-07-13 16:51:02 [13428] DEBUG:pid=6489
2006-07-13 16:51:02 [13428] DEBUG:from_host=192.168.0.51
2006-07-13 16:51:02 [13428] DEBUG:dbName=bench
2006-07-13 16:51:02 [13428] DEBUG:userName=postgres
2006-07-13 16:51:02 [13428] DEBUG:recieve sec=1152777062
2006-07-13 16:51:02 [13428] DEBUG:recieve usec=474077
2006-07-13 16:51:02 [13428] DEBUG:query_size=21
2006-07-13 16:51:02 [13428] DEBUG:request_id=0
2006-07-13 16:51:02 [13428] DEBUG:replicate_id=1797
2006-07-13 16:51:02 [13428] DEBUG:query=PGR_CLOSE_CONNECTION
2006-07-13 16:51:02 [13428] DEBUG:sem_lock [1] req
2006-07-13 16:51:02 [13433] DEBUG:cmdSts=T
2006-07-13 16:51:02 [13433] DEBUG:cmdType=I
2006-07-13 16:51:02 [13433] DEBUG:rlog=0
2006-07-13 16:51:02 [13433] DEBUG:port=5432
2006-07-13 16:51:02 [13433] DEBUG:pid=6486
2006-07-13 16:51:02 [13433] DEBUG:from_host=192.168.0.51
2006-07-13 16:51:02 [13433] DEBUG:dbName=bench
2006-07-13 16:51:02 [13433] DEBUG:userName=postgres
2006-07-13 16:51:02 [13433] DEBUG:recieve sec=1152777062
2006-07-13 16:51:02 [13433] DEBUG:recieve usec=475208
2006-07-13 16:51:02 [13433] DEBUG:query_size=96
2006-07-13 16:51:02 [13433] DEBUG:request_id=6
2006-07-13 16:51:02 [13433] DEBUG:replicate_id=0
2006-07-13 16:51:02 [13326] DEBUG:send_replicate_packet_to_server():sync_command returns SYSTEM_COMMAND
2006-07-13 16:51:02 [13433] DEBUG:query=INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (10, 1, 21569, 3724, CURRENT_TIMESTAMP)
2006-07-13 16:51:02 [13326] DEBUG:send_replicate_packet_to_server():execute query(UPDATE branches SET bbalance = bbalance + 9676 WHERE bid = 1)
2006-07-13 16:51:02 [13433] DEBUG:sem_lock [1] req
2006-07-13 16:51:02 [13326] DEBUG:send_replicate_packet_to_server():PQexec status error
2006-07-13 16:51:02 [13326] DEBUG:thread_send_cluster():return value from send_replicate_packet_to_server() is -1
2006-07-13 16:51:02 [13326] DEBUG:thread_send_cluster():pthread_exit[0]
2006-07-13 16:51:02 [13430] DEBUG:read_answer():LOCK CONFLICT !!
2006-07-13 16:51:02 [13430] DEBUG:end thread_send_source()
2006-07-13 16:51:02 [13430] DEBUG:sem_unlock[1]
2006-07-13 16:51:02 [13430] DEBUG:PGRdo_replicate():PGRreplicate_packet_send returns 0
2006-07-13 16:51:02 [13431] DEBUG:sem_lock [1] got it
2006-07-13 16:51:02 [13431] DEBUG:PGRreplicate_packet_send():checking host bngdb01 for creating threads
2006-07-13 16:51:02 [13431] DEBUG:[0] is not same host
2006-07-13 16:51:02 [13431] DEBUG:PGRreplicate_packet_send():checking host bngdb02 for creating threads
2006-07-13 16:51:02 [13431] DEBUG:[1] is same host
2006-07-13 16:51:02 [13326] DEBUG:start thread_send_cluster()
2006-07-13 16:51:02 [13326] DEBUG:send_replicate_packet_to_server():sync_command(SELECT PGR_SYSTEM_COMMAND_FUNCTION(8,141,0,1) )
ここで固まってしまします。
パターン2
2006-07-13 16:53:57 [13868] DEBUG:cmdSts=O
2006-07-13 16:53:57 [13868] DEBUG:cmdType=x
2006-07-13 16:53:57 [13868] DEBUG:rlog=0
2006-07-13 16:53:57 [13868] DEBUG:port=5432
2006-07-13 16:53:57 [13868] DEBUG:pid=5996
2006-07-13 16:53:57 [13868] DEBUG:from_host=192.168.0.52
2006-07-13 16:53:57 [13868] DEBUG:dbName=bench
2006-07-13 16:53:57 [13868] DEBUG:userName=postgres
2006-07-13 16:53:57 [13868] DEBUG:recieve sec=1152777237
2006-07-13 16:53:57 [13868] DEBUG:recieve usec=427657
2006-07-13 16:53:57 [13868] DEBUG:query_size=21
2006-07-13 16:53:57 [13868] DEBUG:request_id=0
2006-07-13 16:53:57 [13868] DEBUG:replicate_id=1839
2006-07-13 16:53:57 [13868] DEBUG:query=PGR_CLOSE_CONNECTION
2006-07-13 16:53:57 [13868] DEBUG:sem_lock [1] req
2006-07-13 16:53:57 [13871] DEBUG:cmdSts=T
2006-07-13 16:53:57 [13871] DEBUG:cmdType=I
2006-07-13 16:53:57 [13871] DEBUG:rlog=0
2006-07-13 16:53:57 [13871] DEBUG:port=5432
2006-07-13 16:53:57 [13871] DEBUG:pid=5997
2006-07-13 16:53:57 [13871] DEBUG:from_host=192.168.0.52
2006-07-13 16:53:57 [13871] DEBUG:dbName=bench
2006-07-13 16:53:57 [13871] DEBUG:userName=postgres
2006-07-13 16:53:57 [13871] DEBUG:recieve sec=1152777237
2006-07-13 16:53:57 [13871] DEBUG:recieve usec=428019
2006-07-13 16:53:57 [13871] DEBUG:query_size=95
2006-07-13 16:53:57 [13871] DEBUG:request_id=6
2006-07-13 16:53:57 [13871] DEBUG:replicate_id=0
2006-07-13 16:53:57 [13871] DEBUG:query=INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (7, 1, 84668, 7836, CURRENT_TIMESTAMP)
2006-07-13 16:53:57 [13871] DEBUG:sem_lock [1] req
2006-07-13 16:53:57 [13870] DEBUG:cmdSts=T
2006-07-13 16:53:57 [13870] DEBUG:cmdType=U
2006-07-13 16:53:57 [13870] DEBUG:rlog=0
2006-07-13 16:53:57 [13870] DEBUG:port=5432
2006-07-13 16:53:57 [13870] DEBUG:pid=5998
2006-07-13 16:53:57 [13870] DEBUG:from_host=192.168.0.52
2006-07-13 16:53:57 [13870] DEBUG:dbName=bench
2006-07-13 16:53:57 [13870] DEBUG:userName=postgres
2006-07-13 16:53:57 [13870] DEBUG:recieve sec=1152777237
2006-07-13 16:53:57 [13870] DEBUG:recieve usec=429271
2006-07-13 16:53:57 [13870] DEBUG:query_size=60
2006-07-13 16:53:57 [13870] DEBUG:request_id=5
2006-07-13 16:53:57 [13870] DEBUG:replicate_id=0
2006-07-13 16:53:57 [13870] DEBUG:query=UPDATE branches SET bbalance = bbalance + 9604 WHERE bid = 1
2006-07-13 16:53:57 [13870] DEBUG:sem_lock [1] req
ここで固まってしまいます。
何かわかることがありましたら、
よろしくお願いいたします。
以下設定ファイル
<pglb.conf>
#============================================================
# Load Balance Server configuration file
#-------------------------------------------------------------
# file: pglb.conf
#-------------------------------------------------------------
# This file controls:
# o which hosts are db cluster server
# o which port use connect to db cluster server
# o how many connections are allowed on each DB server
#============================================================
#-------------------------------------------------------------
# set cluster DB server information
# o Host_Name : hostname
# o Port : connection for postmaster
# o Max_Connection : maximun number of connection to postmaster
#-------------------------------------------------------------
#<Cluster_Server_Info>
# <Host_Name> master.postgres.jp </Host_Name>
# <Port> 5432 </Port>
# <Max_Connect> 0 </Max_Connect>
#</Cluster_Server_Info>
#<Cluster_Server_Info>
# <Host_Name> post2.postgres.jp </Host_Name>
# <Port> 5432 </Port>
# <Max_Connect> 32 </Max_Connect>
#</Cluster_Server_Info>
#<Cluster_Server_Info>
# <Host_Name> post3.postgres.jp </Host_Name>
# <Port> 5432 </Port>
# <Max_Connect> 32 </Max_Connect>
#</Cluster_Server_Info>
<Cluster_Server_Info>
<Host_Name> bngdb01 </Host_Name>
<Port> 5432 </Port>
<Max_Connect> 512 </Max_Connect>
</Cluster_Server_Info>
<Cluster_Server_Info>
<Host_Name> bngdb02 </Host_Name>
<Port> 5432 </Port>
<Max_Connect> 512 </Max_Connect>
</Cluster_Server_Info>
#-------------------------------------------------------------
# set Load Balance server information
# o Backend_Socket_Dir : Unix domain socket path for the backend
# o Receive_Port : connection from client
# o Recovery_Port : connection for recovery process
# o Max_Cluster_Num : maximun number of cluster DB servers
# o Use_Connection_Pooling : use connection pool [yes/no]
# o Max_Pool_Each_Server : number of pool connections/DB server
#-------------------------------------------------------------
<Backend_Socket_Dir> /tmp </Backend_Socket_Dir>
<Receive_Port> 5433 </Receive_Port>
<Recovery_Port> 6001 </Recovery_Port>
<Max_Cluster_Num> 128 </Max_Cluster_Num>
<Use_Connection_Pooling> no </Use_Connection_Pooling>
<cluster.conf>
#============================================================
# Cluster DB Server configuration file
#------------------------------------------------------------
# file: cluster.conf
#------------------------------------------------------------
# This file controls:
# o which hosts & port are replication server
# o which port use for replication request to replication server
# o which command use for recovery function
#============================================================
#------------------------------------------------------------
# set Replication Server information
# o Host_Name : hostname
# o Port : connection for postmaster
# o Recovery_Port : connection for recovery process
#------------------------------------------------------------
<Replicate_Server_Info>
<Host_Name> bngdbrep01 </Host_Name>
<Port> 8001 </Port>
<Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>
#<Replicate_Server_Info>
# <Host_Name> replicate2.postgres.jp </Host_Name>
# <Port> 8002 </Port>
# <Recovery_Port> 8102 </Recovery_Port>
#</Replicate_Server_Info>
#<Replicate_Server_Info>
# <Host_Name> replicate3.postgres.jp </Host_Name>
# <Port> 8003 </Port>
# <Recovery_Port> 8103 </Recovery_Port>
#</Replicate_Server_Info>
#-------------------------------------------------------------
# set Cluster DB Server information
# o Recovery_Port : connection for recovery
# o Rsync_Path : path of rsync command
# o Rsync_Option : file transfer option for rsync
# o Rsync_Compress : use compression option for rsync
# [yes/no]. default : yes
# o When_Stand_Alone : When all replication servers fell,
# you can set up two kinds of permittion,
# "real_only" or "read_write".
#-------------------------------------------------------------
<Recovery_Port> 7001 </Recovery_Port>
<Rsync_Path> /usr/bin/rsync </Rsync_Path>
<Rsync_Option> ssh -1 </Rsync_Option>
<Rsync_Compress> yes </Rsync_Compress>
<When_Stand_Alone> read_only </When_Stand_Alone>
#-------------------------------------------------------------
# set partitional replicate control information
# set DB name and Table name to stop reprication
# o DB_Name : DB name
# o Table_Name : table name
#-------------------------------------------------------------
#<Not_Replicate_Info>
# <DB_Name> test_db </DB_Name>
# <Table_Name> log_table </Table_Name>
#</Not_Replicate_Info>
<pgreplicate.conf>
#=============================================================
# PGReplicate configuration file
#-------------------------------------------------------------
# file: pgreplicate.conf
#-------------------------------------------------------------
# This file controls:
# o which hosts & port are cluster server
# o which port use for replication request from cluster server
#=============================================================
#
#-------------------------------------------------------------
# A setup of Cluster DB(s)
#
# o Host_Name : The host name of Cluster DB.
# -- please write a host name by FQDN.
# -- do not write IP address.
# o Port : The connection port with postmaster.
# o Recovery_Port : The connection port at the time of
# a recovery sequence .
#-------------------------------------------------------------
#<Cluster_Server_Info>
# <Host_Name> master.postgres.jp </Host_Name>
# <Port> 5432 </Port>
# <Recovery_Port> 7001 </Recovery_Port>
#</Cluster_Server_Info>
#<Cluster_Server_Info>
# <Host_Name> cluster2.postgres.jp </Host_Name>
# <Port> 5432 </Port>
# <Recovery_Port> 7001 </Recovery_Port>
#</Cluster_Server_Info>
#<Cluster_Server_Info>
# <Host_Name> cluster3.postgres.jp </Host_Name>
# <Port> 5432 </Port>
# <Recovery_Port> 7001 </Recovery_Port>
#</Cluster_Server_Info>
<Cluster_Server_Info>
<Host_Name> bngdb01 </Host_Name>
<Port> 5432 </Port>
<Recovery_Port> 7001 </Recovery_Port>
</Cluster_Server_Info>
<Cluster_Server_Info>
<Host_Name> bngdb02 </Host_Name>
<Port> 5432 </Port>
<Recovery_Port> 7001 </Recovery_Port>
</Cluster_Server_Info>
#
#-------------------------------------------------------------
# A setup of Load Balance Server
#
# o Host_Name : The host name of a load balance server.
# -- please write a host name by FQDN.
# -- do not write IP address.
# o Recovery_Port : The connection port at the time of
# a recovery sequence .
#-------------------------------------------------------------
<LoadBalance_Server_Info>
<Host_Name> bngdblb01 </Host_Name>
<Recovery_Port> 6001 </Recovery_Port>
</LoadBalance_Server_Info>
#
#------------------------------------------------------------
# A setup of the upper replication server for cascade connection.
#
# o Host_Name : The host name of Cluster DB.
# -- please write a host name by FQDN.
# -- do not write IP address.
# o Port : The connection port with postmaster.
# o Recovery_Port : The connection port at the time of
# a recovery sequence .
#------------------------------------------------------------
#<Replicate_Server_Info>
# <Host_Name> bngdbrep01 </Host_Name>
# <Port> 8002 </Port>
# <Recovery_Port> 8102 </Recovery_Port>
#</Replicate_Server_Info>
#
#-------------------------------------------------------------
# A setup of a replication server
#
# o Replicate_Port : connection for replication
# o Recovery_Port : connection for recovery
# o Response_mode : timing which returns a response
# normal -- return result of DB which received the query
# reliable -- return result after waiting for response of
# all Cluster DBs.
# o Use_Replication_Log : use replication log
# [yes/no]. default : no
# o RLOG_Port : connection for replication log
#-------------------------------------------------------------
<Replication_Port> 8001 </Replication_Port>
<Recovery_Port> 8101 </Recovery_Port>
#<Response_Mode> normal </Response_Mode>
<Response_Mode> reliable </Response_Mode>
<Error_Log_File> /dbdata/reperr.log </Error_Log_File>
<Use_Replication_Log> no </Use_Replication_Log>
<RLOG_Port> 8301 </RLOG_Port>
pgcluster メーリングリストの案内