[pgcluster: 887] pgbenchで固まってしまいます

shinozaki @ cobalt.co.jp shinozaki @ cobalt.co.jp
2006年 7月 13日 (木) 17:55:25 JST


前にもレプリケートで失敗する件でお世話になりました、
篠崎と申します。

その時は、hostsファイルが原因だったのですが、
今回は、ちょっと現象が違うので何かわかればと思って、
メールさせていただきました。

今回もレプリケート関連といえば、レプリケート関連なのですが、
pgbenchでコネクション数を増やすとレプリケーションの処理が固まってしまうのです。
だいたい10ぐらいになるとそのままフリーズしてしまいます。

症状ですが、デバッグ文を出力した所、以下の2パターンで固まることがわかりました。

使用したモジュールは、PGCluster-1.5.0rc7
OSは RHEL4
カーネルは、2.6.9-11になります。

CPUはXeonの3G(HT−OFF)
メモリー2Gになります。

パターン1

2006-07-13 16:51:02 [13430] DEBUG:sem_lock [1] req
2006-07-13 16:51:02 [13431] DEBUG:cmdSts=T
2006-07-13 16:51:02 [13431] DEBUG:cmdType=I
2006-07-13 16:51:02 [13431] DEBUG:rlog=0
2006-07-13 16:51:02 [13431] DEBUG:port=5432
2006-07-13 16:51:02 [13431] DEBUG:pid=5917
2006-07-13 16:51:02 [13431] DEBUG:from_host=192.168.0.52
2006-07-13 16:51:02 [13431] DEBUG:dbName=bench
2006-07-13 16:51:02 [13431] DEBUG:userName=postgres
2006-07-13 16:51:02 [13431] DEBUG:recieve sec=1152777062
2006-07-13 16:51:02 [13431] DEBUG:recieve usec=469697
2006-07-13 16:51:02 [13431] DEBUG:query_size=95
2006-07-13 16:51:02 [13431] DEBUG:request_id=6
2006-07-13 16:51:02 [13431] DEBUG:replicate_id=0
2006-07-13 16:51:02 [13431] DEBUG:query=INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (5, 1, 54163, 6573, CURRENT_TIMESTAMP)
2006-07-13 16:51:02 [13431] DEBUG:sem_lock [1] req
2006-07-13 16:51:02 [13428] DEBUG:read_answer():QUERY DONE
2006-07-13 16:51:02 [13428] DEBUG:end thread_send_source()
2006-07-13 16:51:02 [13428] DEBUG:sem_unlock[1]
2006-07-13 16:51:02 [13430] DEBUG:sem_lock [1] got it
2006-07-13 16:51:02 [13430] DEBUG:PGRreplicate_packet_send():checking host bngdb01 for creating threads
2006-07-13 16:51:02 [13430] DEBUG:[0] is not same host
2006-07-13 16:51:02 [13430] DEBUG:PGRreplicate_packet_send():checking host bngdb02 for creating threads
2006-07-13 16:51:02 [13430] DEBUG:[1] is same host
2006-07-13 16:51:02 [13326] DEBUG:start thread_send_cluster()
2006-07-13 16:51:02 [13326] DEBUG:send_replicate_packet_to_server():sync_command(SELECT PGR_SYSTEM_COMMAND_FUNCTION(8,140,0,1) )
2006-07-13 16:51:02 [13428] DEBUG:cmdSts=O
2006-07-13 16:51:02 [13428] DEBUG:cmdType=x
2006-07-13 16:51:02 [13428] DEBUG:rlog=0
2006-07-13 16:51:02 [13428] DEBUG:port=5432
2006-07-13 16:51:02 [13428] DEBUG:pid=6489
2006-07-13 16:51:02 [13428] DEBUG:from_host=192.168.0.51
2006-07-13 16:51:02 [13428] DEBUG:dbName=bench
2006-07-13 16:51:02 [13428] DEBUG:userName=postgres
2006-07-13 16:51:02 [13428] DEBUG:recieve sec=1152777062
2006-07-13 16:51:02 [13428] DEBUG:recieve usec=474077
2006-07-13 16:51:02 [13428] DEBUG:query_size=21
2006-07-13 16:51:02 [13428] DEBUG:request_id=0
2006-07-13 16:51:02 [13428] DEBUG:replicate_id=1797
2006-07-13 16:51:02 [13428] DEBUG:query=PGR_CLOSE_CONNECTION
2006-07-13 16:51:02 [13428] DEBUG:sem_lock [1] req
2006-07-13 16:51:02 [13433] DEBUG:cmdSts=T
2006-07-13 16:51:02 [13433] DEBUG:cmdType=I
2006-07-13 16:51:02 [13433] DEBUG:rlog=0
2006-07-13 16:51:02 [13433] DEBUG:port=5432
2006-07-13 16:51:02 [13433] DEBUG:pid=6486
2006-07-13 16:51:02 [13433] DEBUG:from_host=192.168.0.51
2006-07-13 16:51:02 [13433] DEBUG:dbName=bench
2006-07-13 16:51:02 [13433] DEBUG:userName=postgres
2006-07-13 16:51:02 [13433] DEBUG:recieve sec=1152777062
2006-07-13 16:51:02 [13433] DEBUG:recieve usec=475208
2006-07-13 16:51:02 [13433] DEBUG:query_size=96
2006-07-13 16:51:02 [13433] DEBUG:request_id=6
2006-07-13 16:51:02 [13433] DEBUG:replicate_id=0
2006-07-13 16:51:02 [13326] DEBUG:send_replicate_packet_to_server():sync_command returns SYSTEM_COMMAND
2006-07-13 16:51:02 [13433] DEBUG:query=INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (10, 1, 21569, 3724, CURRENT_TIMESTAMP)
2006-07-13 16:51:02 [13326] DEBUG:send_replicate_packet_to_server():execute query(UPDATE branches SET bbalance = bbalance + 9676 WHERE bid = 1)
2006-07-13 16:51:02 [13433] DEBUG:sem_lock [1] req
2006-07-13 16:51:02 [13326] DEBUG:send_replicate_packet_to_server():PQexec status error
2006-07-13 16:51:02 [13326] DEBUG:thread_send_cluster():return value from send_replicate_packet_to_server() is -1
2006-07-13 16:51:02 [13326] DEBUG:thread_send_cluster():pthread_exit[0]
2006-07-13 16:51:02 [13430] DEBUG:read_answer():LOCK CONFLICT !!
2006-07-13 16:51:02 [13430] DEBUG:end thread_send_source()
2006-07-13 16:51:02 [13430] DEBUG:sem_unlock[1]
2006-07-13 16:51:02 [13430] DEBUG:PGRdo_replicate():PGRreplicate_packet_send returns 0
2006-07-13 16:51:02 [13431] DEBUG:sem_lock [1] got it
2006-07-13 16:51:02 [13431] DEBUG:PGRreplicate_packet_send():checking host bngdb01 for creating threads
2006-07-13 16:51:02 [13431] DEBUG:[0] is not same host
2006-07-13 16:51:02 [13431] DEBUG:PGRreplicate_packet_send():checking host bngdb02 for creating threads
2006-07-13 16:51:02 [13431] DEBUG:[1] is same host
2006-07-13 16:51:02 [13326] DEBUG:start thread_send_cluster()
2006-07-13 16:51:02 [13326] DEBUG:send_replicate_packet_to_server():sync_command(SELECT PGR_SYSTEM_COMMAND_FUNCTION(8,141,0,1) )

ここで固まってしまします。


パターン2

2006-07-13 16:53:57 [13868] DEBUG:cmdSts=O
2006-07-13 16:53:57 [13868] DEBUG:cmdType=x
2006-07-13 16:53:57 [13868] DEBUG:rlog=0
2006-07-13 16:53:57 [13868] DEBUG:port=5432
2006-07-13 16:53:57 [13868] DEBUG:pid=5996
2006-07-13 16:53:57 [13868] DEBUG:from_host=192.168.0.52
2006-07-13 16:53:57 [13868] DEBUG:dbName=bench
2006-07-13 16:53:57 [13868] DEBUG:userName=postgres
2006-07-13 16:53:57 [13868] DEBUG:recieve sec=1152777237
2006-07-13 16:53:57 [13868] DEBUG:recieve usec=427657
2006-07-13 16:53:57 [13868] DEBUG:query_size=21
2006-07-13 16:53:57 [13868] DEBUG:request_id=0
2006-07-13 16:53:57 [13868] DEBUG:replicate_id=1839
2006-07-13 16:53:57 [13868] DEBUG:query=PGR_CLOSE_CONNECTION
2006-07-13 16:53:57 [13868] DEBUG:sem_lock [1] req
2006-07-13 16:53:57 [13871] DEBUG:cmdSts=T
2006-07-13 16:53:57 [13871] DEBUG:cmdType=I
2006-07-13 16:53:57 [13871] DEBUG:rlog=0
2006-07-13 16:53:57 [13871] DEBUG:port=5432
2006-07-13 16:53:57 [13871] DEBUG:pid=5997
2006-07-13 16:53:57 [13871] DEBUG:from_host=192.168.0.52
2006-07-13 16:53:57 [13871] DEBUG:dbName=bench
2006-07-13 16:53:57 [13871] DEBUG:userName=postgres
2006-07-13 16:53:57 [13871] DEBUG:recieve sec=1152777237
2006-07-13 16:53:57 [13871] DEBUG:recieve usec=428019
2006-07-13 16:53:57 [13871] DEBUG:query_size=95
2006-07-13 16:53:57 [13871] DEBUG:request_id=6
2006-07-13 16:53:57 [13871] DEBUG:replicate_id=0
2006-07-13 16:53:57 [13871] DEBUG:query=INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (7, 1, 84668, 7836, CURRENT_TIMESTAMP)
2006-07-13 16:53:57 [13871] DEBUG:sem_lock [1] req
2006-07-13 16:53:57 [13870] DEBUG:cmdSts=T
2006-07-13 16:53:57 [13870] DEBUG:cmdType=U
2006-07-13 16:53:57 [13870] DEBUG:rlog=0
2006-07-13 16:53:57 [13870] DEBUG:port=5432
2006-07-13 16:53:57 [13870] DEBUG:pid=5998
2006-07-13 16:53:57 [13870] DEBUG:from_host=192.168.0.52
2006-07-13 16:53:57 [13870] DEBUG:dbName=bench
2006-07-13 16:53:57 [13870] DEBUG:userName=postgres
2006-07-13 16:53:57 [13870] DEBUG:recieve sec=1152777237
2006-07-13 16:53:57 [13870] DEBUG:recieve usec=429271
2006-07-13 16:53:57 [13870] DEBUG:query_size=60
2006-07-13 16:53:57 [13870] DEBUG:request_id=5
2006-07-13 16:53:57 [13870] DEBUG:replicate_id=0
2006-07-13 16:53:57 [13870] DEBUG:query=UPDATE branches SET bbalance = bbalance + 9604 WHERE bid = 1
2006-07-13 16:53:57 [13870] DEBUG:sem_lock [1] req

ここで固まってしまいます。

何かわかることがありましたら、
よろしくお願いいたします。

以下設定ファイル

<pglb.conf>
#============================================================
#          Load Balance Server configuration file
#-------------------------------------------------------------
# file: pglb.conf
#-------------------------------------------------------------
# This file controls:
#       o which hosts are db cluster server
#       o which port  use connect to db cluster server
#       o how many connections are allowed on each DB server
#============================================================
#-------------------------------------------------------------
# set cluster DB server information
#               o Host_Name : hostname
#               o Port : connection for postmaster
#               o Max_Connection : maximun number of connection to postmaster
#-------------------------------------------------------------
#<Cluster_Server_Info>
#    <Host_Name>   master.postgres.jp  </Host_Name>
#    <Port>        5432                </Port>
#    <Max_Connect> 0                   </Max_Connect>
#</Cluster_Server_Info>
#<Cluster_Server_Info>
#    <Host_Name>   post2.postgres.jp   </Host_Name>
#    <Port>        5432                </Port>
#    <Max_Connect> 32                  </Max_Connect>
#</Cluster_Server_Info>
#<Cluster_Server_Info>
#    <Host_Name>   post3.postgres.jp   </Host_Name>
#    <Port>        5432                </Port>
#    <Max_Connect> 32                  </Max_Connect>
#</Cluster_Server_Info>
<Cluster_Server_Info>
    <Host_Name>   bngdb01             </Host_Name>
    <Port>        5432                </Port>
    <Max_Connect> 512                 </Max_Connect>
</Cluster_Server_Info>
<Cluster_Server_Info>
    <Host_Name>   bngdb02             </Host_Name>
    <Port>        5432                </Port>
    <Max_Connect> 512                 </Max_Connect>
</Cluster_Server_Info>

#-------------------------------------------------------------
# set Load Balance server information
#               o Backend_Socket_Dir : Unix domain socket path for the backend
#               o Receive_Port : connection from client
#               o Recovery_Port : connection for recovery process
#               o Max_Cluster_Num : maximun number of cluster DB servers
#               o Use_Connection_Pooling : use connection pool [yes/no]
#               o Max_Pool_Each_Server : number of pool connections/DB server
#-------------------------------------------------------------
<Backend_Socket_Dir>    /tmp     </Backend_Socket_Dir>
<Receive_Port>          5433     </Receive_Port>
<Recovery_Port>         6001     </Recovery_Port>
<Max_Cluster_Num>        128     </Max_Cluster_Num>
<Use_Connection_Pooling>  no     </Use_Connection_Pooling>




<cluster.conf>
#============================================================
#          Cluster DB Server configuration file
#------------------------------------------------------------
# file: cluster.conf
#------------------------------------------------------------
# This file controls:
#       o which hosts & port are replication server
#       o which port use for replication request to replication server
#       o which command use for recovery function
#============================================================
#------------------------------------------------------------
# set Replication Server information
#               o Host_Name : hostname
#               o Port : connection for postmaster
#               o Recovery_Port : connection for recovery process
#------------------------------------------------------------
<Replicate_Server_Info>
        <Host_Name> bngdbrep01 </Host_Name>
        <Port> 8001 </Port>
        <Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>
#<Replicate_Server_Info>
#       <Host_Name> replicate2.postgres.jp </Host_Name>
#       <Port> 8002 </Port>
#       <Recovery_Port> 8102 </Recovery_Port>
#</Replicate_Server_Info>
#<Replicate_Server_Info>
#       <Host_Name> replicate3.postgres.jp </Host_Name>
#       <Port> 8003 </Port>
#       <Recovery_Port> 8103 </Recovery_Port>
#</Replicate_Server_Info>
#-------------------------------------------------------------
# set Cluster DB Server information
#               o Recovery_Port : connection for recovery
#               o Rsync_Path : path of rsync command
#               o Rsync_Option : file transfer option for rsync
#               o Rsync_Compress : use compression option for rsync
#                                  [yes/no]. default : yes
#       o When_Stand_Alone : When all replication servers fell,
#                            you can set up two kinds of permittion,
#                            "real_only" or "read_write".
#-------------------------------------------------------------
<Recovery_Port> 7001 </Recovery_Port>
<Rsync_Path> /usr/bin/rsync </Rsync_Path>
<Rsync_Option> ssh -1 </Rsync_Option>
<Rsync_Compress> yes </Rsync_Compress>
<When_Stand_Alone> read_only  </When_Stand_Alone>
#-------------------------------------------------------------
# set partitional replicate control information
#     set DB name and Table name to stop reprication
#       o DB_Name : DB name
#       o Table_Name : table name
#-------------------------------------------------------------
#<Not_Replicate_Info>
#       <DB_Name>     test_db      </DB_Name>
#       <Table_Name>  log_table    </Table_Name>
#</Not_Replicate_Info>



<pgreplicate.conf>
#=============================================================
#  PGReplicate configuration file
#-------------------------------------------------------------
# file: pgreplicate.conf
#-------------------------------------------------------------
# This file controls:
#       o which hosts & port are cluster server
#       o which port use for replication request from cluster server
#=============================================================
#
#-------------------------------------------------------------
# A setup of Cluster DB(s)
#
#        o Host_Name : The host name of Cluster DB.
#                      -- please write a host name by FQDN.
#                      -- do not write IP address.
#        o Port : The connection port with postmaster.
#        o Recovery_Port : The connection port at the time of
#                          a recovery sequence .
#-------------------------------------------------------------
#<Cluster_Server_Info>
#    <Host_Name>           master.postgres.jp  </Host_Name>
#    <Port>                5432                </Port>
#    <Recovery_Port>       7001                </Recovery_Port>
#</Cluster_Server_Info>
#<Cluster_Server_Info>
#    <Host_Name>           cluster2.postgres.jp </Host_Name>
#    <Port>                5432                 </Port>
#    <Recovery_Port>       7001                 </Recovery_Port>
#</Cluster_Server_Info>
#<Cluster_Server_Info>
#    <Host_Name>           cluster3.postgres.jp </Host_Name>
#    <Port>                5432                 </Port>
#    <Recovery_Port>       7001                 </Recovery_Port>
#</Cluster_Server_Info>

<Cluster_Server_Info>
    <Host_Name>           bngdb01              </Host_Name>
    <Port>                5432                 </Port>
    <Recovery_Port>       7001                 </Recovery_Port>
</Cluster_Server_Info>
<Cluster_Server_Info>
    <Host_Name>           bngdb02              </Host_Name>
    <Port>                5432                 </Port>
    <Recovery_Port>       7001                 </Recovery_Port>
</Cluster_Server_Info>

#
#-------------------------------------------------------------
# A setup of Load Balance Server
#
#        o Host_Name : The host name of a load balance server.
#                      -- please write a host name by FQDN.
#                      -- do not write IP address.
#        o Recovery_Port : The connection port at the time of
#                          a recovery sequence .
#-------------------------------------------------------------
<LoadBalance_Server_Info>
    <Host_Name>       bngdblb01                   </Host_Name>
    <Recovery_Port>   6001                      </Recovery_Port>
</LoadBalance_Server_Info>
#
#------------------------------------------------------------
# A setup of the upper replication server for cascade connection.
#
#        o Host_Name : The host name of Cluster DB.
#                      -- please write a host name by FQDN.
#                      -- do not write IP address.
#        o Port : The connection port with postmaster.
#        o Recovery_Port : The connection port at the time of
#                          a recovery sequence .
#------------------------------------------------------------
#<Replicate_Server_Info>
#    <Host_Name>       bngdbrep01                  </Host_Name>
#    <Port>            8002                        </Port>
#    <Recovery_Port>   8102                        </Recovery_Port>
#</Replicate_Server_Info>
#
#-------------------------------------------------------------
# A setup of a replication server
#
#               o Replicate_Port : connection for replication
#               o Recovery_Port : connection for recovery
#               o Response_mode : timing which returns a response
#                 normal   -- return result of DB which received the query
#                 reliable -- return result after waiting for response of
#                      all Cluster DBs.
#               o Use_Replication_Log : use replication log
#                                       [yes/no]. default : no
#               o RLOG_Port : connection for replication log
#-------------------------------------------------------------
<Replication_Port>    8001            </Replication_Port>
<Recovery_Port>       8101            </Recovery_Port>
#<Response_Mode>       normal          </Response_Mode>
<Response_Mode>       reliable          </Response_Mode>
<Error_Log_File>        /dbdata/reperr.log      </Error_Log_File>
<Use_Replication_Log> no              </Use_Replication_Log>
<RLOG_Port>           8301            </RLOG_Port>



pgcluster メーリングリストの案内