[pgcluster: 953] Re: ロードバランサが固まります
shinozaki @ cobalt.co.jp
shinozaki @ cobalt.co.jp
2007年 1月 15日 (月) 17:34:56 JST
篠崎です。
ロードバランサが固まる件ですが、
自己解決しました。
原因ですが、
ロードバランサとレプリケータが搭載されているサーバに
NICを2枚刺ししていたのが問題でした。
NICを1枚だけ使用するように設定ファイルを書き直した結果
ロードバランサのロックが発生しなくなりました。
お騒がせいたしました。
色々お世話になりありがとうございました。
shinozaki @ cobalt.co.jp さんは書きました:
>三谷さん。
>
>返信メールありがとうございます。
>返事が遅くなってしまってすいません。
>
>その後、三谷さんの指摘を受けて、
>PGClusterのバージョンを1.5.0.rc13に変更
>ライブラリパスの設定、設定ファイルを最新の物に変更
>等を行ってみたのですが、
>どうしても、ロードバランサのロック状態がかわりませんでした。
>
>ロードバランサのロック現象は、毎回確実に20〜30分で起こせますので、
>実際に動作を行いnetstatでパラメータの取得を行いました。
>内部的な情報も含まれますので、別途結果を添付いたしました。
>
>度々お世話になり申し訳ないのですが、
>よろしくお願いいたします。
>
>a.mitani @ sra-europe.com さんは書きました:
>>篠崎さん,こんにちは.
>>
>>ロードバランサがロックしたときのnetstatの状態が分かりましたら教えてください.
>>あと,ロードバランサのホスト名にlocalhostを指定されているようですが,ここを
>>ホスト名にしてみてください.
>>それから,設定ファイルの内容がPGClusterのバージョンと合っていないようですの
>>で,念のため,他のバージョンのライブラリなどが混在していないか確認してみてく
>>ださい.
>>なお,1.5.0の最新版はrc12ですので,こちらもお試しください.
>>
>>三谷@オランダ
>>
>>>
>>> 今回は、また別の件でメールさせていただきます。
>>>
>>> システムの安定度を確認するために、
>>> PHPからselectとinsertをトランザクションを張らずに1回呼ぶプログラムを
>>> 外部から(http経由で)毎秒40回程度回すベンチを流してみたのですが、
>>> だいたい20分から30分程度でロードバランサが固まってしまいます。
>>>
>>> ロードバランサをリスタートすると復活するのですが、
>>> 書き込みが間に合ってない感じでしょうか?
>>>
>>> なにか対策等ありましたら、ご教授ください。
>>> よろしくお願いいたします。
>>>
>>> PGCluster-1.5.0rc10
>>>
>>> pglb.conf
>>>
>>> <Cluster_Server_Info>
>>> <Host_Name> db01 </Host_Name>
>>> <Port> 5432 </Port>
>>> <Max_Connect> 128 </Max_Connect>
>>> </Cluster_Server_Info>
>>> <Cluster_Server_Info>
>>> <Host_Name> db02 </Host_Name>
>>> <Port> 5432 </Port>
>>> <Max_Connect> 128 </Max_Connect>
>>> </Cluster_Server_Info>
>>> <Cluster_Server_Info>
>>> <Host_Name> db03 </Host_Name>
>>> <Port> 5432 </Port>
>>> <Max_Connect> 128 </Max_Connect>
>>> </Cluster_Server_Info>
>>>
>>> <Backend_Socket_Dir> /tmp </Backend_Socket_Dir>
>>> <Receive_Port> 5432 </Receive_Port>
>>> <Recovery_Port> 6001 </Recovery_Port>
>>> <Max_Cluster_Num> 4 </Max_Cluster_Num>
>>> <Use_Connection_Pooling> yes </Use_Connection_Pooling>
>>>
>>> pgreplicate.conf
>>>
>>> <Cluster_Server_Info>
>>> <Host_Name> db01 </Host_Name>
>>> <Port> 5432 </Port>
>>> <Recovery_Port> 7001 </Recovery_Port>
>>> </Cluster_Server_Info>
>>> <Cluster_Server_Info>
>>> <Host_Name> db02 </Host_Name>
>>> <Port> 5432 </Port>
>>> <Recovery_Port> 7001 </Recovery_Port>
>>> </Cluster_Server_Info>
>>> <Cluster_Server_Info>
>>> <Host_Name> db03 </Host_Name>
>>> <Port> 5432 </Port>
>>> <Recovery_Port> 7001 </Recovery_Port>
>>> </Cluster_Server_Info>
>>>
>>> <LoadBalance_Server_Info>
>>> <Host_Name> localhost </Host_Name>
>>> <Recovery_Port> 6001 </Recovery_Port>
>>> </LoadBalance_Server_Info>
>>>
>>> <LoadBalance_Server_Info>
>>> <Host_Name> lb02 </Host_Name>
>>> <Recovery_Port> 6001 </Recovery_Port>
>>> </LoadBalance_Server_Info>
>>>
>>> <Replication_Port> 8001 </Replication_Port>
>>> <Recovery_Port> 8101 </Recovery_Port>
>>> <Response_Mode> normal </Response_Mode>
>>> <Status_Log_File> /dbdata/pgreplicate.sts </Status_Log_File>
>>> <Error_Log_File> /dbdata/pgreplicate.log </Error_Log_File>
>>> <Use_Replication_Log> yes </Use_Replication_Log>
>>> <RLOG_Port> 8301 </RLOG_Port>
>>>
>>> 止まった時あたりの
>>> ロードバランサログ以下のログが繰り返されます。
>>>
>>> 2006-12-26 11:52:51 [29260] DEBUG:pool_process_query():read kind from
>>> backend C
>>> 2006-12-26 11:52:51 [29260] DEBUG:pool_process_query():read kind from
>>> backend pending data Z len: 5 po: 11
>>> 2006-12-26 11:52:51 [29260] DEBUG:ReadyForQuery(): message length: 5
>>> 2006-12-26 11:52:51 [29260] DEBUG:ReadyForQuery(): transaction state: I
>>> 2006-12-26 11:52:51 [29260]
>>> DEBUG:pool_connection_pool_timer():pool_connection_pool_timer: called
>>> 2006-12-26 11:52:51 [29222] DEBUG:pool_process_query():read kind from
>>> backend C
>>> 2006-12-26 11:52:51 [29222] DEBUG:pool_process_query():read kind from
>>> backend pending data Z len: 5 po: 11
>>> 2006-12-26 11:52:51 [29222] DEBUG:ReadyForQuery(): message length: 5
>>> 2006-12-26 11:52:51 [29222] DEBUG:ReadyForQuery(): transaction state: I
>>> 2006-12-26 11:52:51 [29222]
>>> DEBUG:pool_connection_pool_timer():pool_connection_pool_timer: called
>>> 2006-12-26 11:52:51 [29343] DEBUG:pool_process_query():read kind from
>>> backend C
>>> 2006-12-26 11:52:51 [29343] DEBUG:pool_process_query():read kind from
>>> backend pending data Z len: 5 po: 11
>>> 2006-12-26 11:52:51 [29343] DEBUG:ReadyForQuery(): message length: 5
>>> 2006-12-26 11:52:51 [29343] DEBUG:ReadyForQuery(): transaction state: I
>>> 2006-12-26 11:52:51 [29343]
>>> DEBUG:pool_connection_pool_timer():pool_connection_pool_timer: called
>>> 2006-12-26 11:52:51 [29391] DEBUG:pool_process_query():read kind from
>>> backend C
>>> 2006-12-26 11:52:51 [29391] DEBUG:pool_process_query():read kind from
>>> backend pending data Z len: 5 po: 11
>>> 2006-12-26 11:52:51 [29391] DEBUG:ReadyForQuery(): message length: 5
>>> 2006-12-26 11:52:51 [29391] DEBUG:ReadyForQuery(): transaction state: I
>>> 2006-12-26 11:52:51 [29391]
>>> DEBUG:pool_connection_pool_timer():pool_connection_pool_timer: called
>>> 2006-12-26 11:52:51 [29470] DEBUG:pool_process_query():read kind from
>>> backend C
>>> 2006-12-26 11:52:51 [29470] DEBUG:pool_process_query():read kind from
>>> backend pending data Z len: 5 po: 11
>>> 2006-12-26 11:52:51 [29470] DEBUG:ReadyForQuery(): message length: 5
>>> 2006-12-26 11:52:51 [29470] DEBUG:ReadyForQuery(): transaction state: I
>>> 2006-12-26 11:52:51 [29470]
>>> DEBUG:pool_connection_pool_timer():pool_connection_pool_timer: called
>>> 2006-12-26 11:52:51 [29471] DEBUG:pool_process_query():read kind from
>>> backend C
>>> 2006-12-26 11:52:51 [29471] DEBUG:pool_process_query():read kind from
>>> backend pending data Z len: 5 po: 11
>>> 2006-12-26 11:52:51 [29471] DEBUG:ReadyForQuery(): message length: 5
>>> 2006-12-26 11:52:51 [29471] DEBUG:ReadyForQuery(): transaction state: I
>>> 2006-12-26 11:52:51 [29471]
>>> DEBUG:pool_connection_pool_timer():pool_connection_pool_timer: called
>>>
>>>
>>>
>>> レプリログ以下のログ51分台が止まった時、ロードバランサを停止した時に53分
>>> のログが一気に出力されました。
>>>
>>> 2006-12-26 11:52:51 [29746] DEBUG:same host
>>> 2006-12-26 11:52:51 [30169] DEBUG:PGRread_packet():PG_read_query returns
>>> PGR_QUERY_DONE_NOTICE_CMD.
>>> 2006-12-26 11:52:51 [29184] DEBUG:start thread_send_cluster()
>>> 2006-12-26 11:52:51 [29184]
>>> DEBUG:send_replicate_packet_to_server():sync_command(SELECT
>>> PGR_SYSTEM_COMMAND_FUNCTION(8,9574,0,1) )
>>> 2006-12-26 11:52:51 [29184] DEBUG:start thread_send_cluster()
>>> 2006-12-26 11:52:51 [29184]
>>> DEBUG:send_replicate_packet_to_server():sync_command(SELECT
>>> PGR_SYSTEM_COMMAND_FUNCTION(8,9574,0,1) )
>>> 2006-12-26 11:52:51 [29184]
>>> DEBUG:send_replicate_packet_to_server():sync_command returns
>>> SYSTEM_COMMAND
>>> 2006-12-26 11:52:51 [29184]
>>> DEBUG:send_replicate_packet_to_server():sync_command returns
>>> SYSTEM_COMMAND
>>> 2006-12-26 11:52:51 [29184] DEBUG:send_replicate_packet_to_server():PQexec
>>> send :RESET ALL
>>> 2006-12-26 11:52:51 [29184] DEBUG:send_replicate_packet_to_server():PQexec
>>> returns :RESET
>>> 2006-12-26 11:52:51 [29184] DEBUG:thread_send_cluster():return value from
>>> send_replicate_packet_to_server() is 0
>>> 2006-12-26 11:52:51 [29184] DEBUG:thread_send_cluster():pthread_exit[0]
>>> 2006-12-26 11:52:51 [29184] DEBUG:send_replicate_packet_to_server():PQexec
>>> send :RESET ALL
>>> 2006-12-26 11:52:51 [29184] DEBUG:send_replicate_packet_to_server():PQexec
>>> returns :RESET
>>> 2006-12-26 11:52:51 [29184] DEBUG:thread_send_cluster():return value from
>>> send_replicate_packet_to_server() is 0
>>> 2006-12-26 11:52:51 [29184] DEBUG:thread_send_cluster():pthread_exit[1]
>>> 2006-12-26 11:52:51 [29746]
>>> DEBUG:PGRreturn_result():PGRreturn_result[8,9574,0,1]
>>> 2006-12-26 11:52:51 [29746] DEBUG:PGRreturn_result():128 send
>>> 2006-12-26 11:52:51 [29746] DEBUG:end thread_send_source()
>>> 2006-12-26 11:52:51 [29746] DEBUG:sem_unlock[1]
>>> 2006-12-26 11:52:51 [29738] DEBUG:sem_lock [1] got it
>>> 2006-12-26 11:52:51 [29738] DEBUG:same host
>>> 2006-12-26 11:52:51 [29184] DEBUG:start thread_send_cluster()
>>> 2006-12-26 11:52:51 [29184]
>>> DEBUG:send_replicate_packet_to_server():sync_command(SELECT
>>> PGR_SYSTEM_COMMAND_FUNCTION(8,9575,0,1) )
>>> 2006-12-26 11:52:51 [29184] DEBUG:start thread_send_cluster()
>>> 2006-12-26 11:52:51 [29184]
>>> DEBUG:send_replicate_packet_to_server():sync_command(SELECT
>>> PGR_SYSTEM_COMMAND_FUNCTION(8,9575,0,1) )
>>> 2006-12-26 11:52:51 [29184]
>>> DEBUG:send_replicate_packet_to_server():sync_command returns
>>> SYSTEM_COMMAND
>>> 2006-12-26 11:52:51 [29184]
>>> DEBUG:send_replicate_packet_to_server():sync_command returns
>>> SYSTEM_COMMAND
>>> 2006-12-26 11:52:51 [29184] DEBUG:send_replicate_packet_to_server():PQexec
>>> send :RESET ALL
>>> 2006-12-26 11:52:51 [29184] DEBUG:send_replicate_packet_to_server():PQexec
>>> returns :RESET
>>> 2006-12-26 11:52:51 [29184] DEBUG:thread_send_cluster():return value from
>>> send_replicate_packet_to_server() is 0
>>> 2006-12-26 11:52:51 [29184] DEBUG:thread_send_cluster():pthread_exit[0]
>>> 2006-12-26 11:52:51 [29184] DEBUG:send_replicate_packet_to_server():PQexec
>>> send :RESET ALL
>>> 2006-12-26 11:52:51 [29184] DEBUG:send_replicate_packet_to_server():PQexec
>>> returns :RESET
>>> 2006-12-26 11:52:51 [29184] DEBUG:thread_send_cluster():return value from
>>> send_replicate_packet_to_server() is 0
>>> 2006-12-26 11:52:51 [29184] DEBUG:thread_send_cluster():pthread_exit[1]
>>> 2006-12-26 11:52:51 [29738]
>>> DEBUG:PGRreturn_result():PGRreturn_result[8,9575,0,1]
>>> 2006-12-26 11:52:51 [29738] DEBUG:PGRreturn_result():128 send
>>> 2006-12-26 11:52:51 [29738] DEBUG:end thread_send_source()
>>> 2006-12-26 11:52:51 [29738] DEBUG:sem_unlock[1]
>>> 2006-12-26 11:52:51 [29738] DEBUG:PGRread_packet():PG_read_query returns
>>> PGR_QUERY_DONE_NOTICE_CMD.
>>> 2006-12-26 11:52:51 [29746] DEBUG:PGRread_packet():PG_read_query returns
>>> PGR_QUERY_DONE_NOTICE_CMD.
>>> 2006-12-26 11:58:53 [29673] DEBUG:PGRread_packet():PG_read_query returns .
>>> 2006-12-26 11:58:53 [29717] DEBUG:PGRread_packet():PG_read_query returns .
>>> 2006-12-26 11:58:53 [30007] DEBUG:PGRread_packet():PG_read_query returns .
>>> 2006-12-26 11:58:53 [29697] DEBUG:PGRread_packet():PG_read_query returns .
>>> 2006-12-26 11:58:53 [30010] DEBUG:PGRread_packet():PG_read_query returns .
>>> 2006-12-26 11:58:53 [29704] DEBUG:PGRread_packet():PG_read_query returns .
>>> 2006-12-26 11:58:53 [30021] DEBUG:PGRread_packet():PG_read_query returns .
>>> 2006-12-26 11:58:53 [29720] DEBUG:PGRread_packet():PG_read_query returns .
>>> 2006-12-26 11:58:53 [30032] DEBUG:PGRread_packet():PG_read_query returns .
>>> 2006-12-26 11:58:53 [29185] DEBUG:PGRread_packet():PG_read_query returns .
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog():got result:cmdSys='S'
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
>>> 2006-12-26 11:58:53 [29185] DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog(): SYS_CALL process done
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog():process result done:cmdSys='S'
>>> 2006-12-26 11:58:53 [29185] DEBUG:PGRread_packet():PG_read_query returns .
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog():got result:cmdSys='S'
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
>>> 2006-12-26 11:58:53 [29185] DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog(): SYS_CALL process done
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog():process result done:cmdSys='S'
>>> 2006-12-26 11:58:53 [29185] DEBUG:PGRread_packet():PG_read_query returns .
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog():got result:cmdSys='S'
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
>>> 2006-12-26 11:58:53 [29185] DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog(): SYS_CALL process done
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog():process result done:cmdSys='S'
>>> 2006-12-26 11:58:53 [29185] DEBUG:PGRread_packet():PG_read_query returns .
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog():got result:cmdSys='S'
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
>>> 2006-12-26 11:58:53 [29185] DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog(): SYS_CALL process done
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog():process result done:cmdSys='S'
>>> 2006-12-26 11:58:53 [29185] DEBUG:PGRread_packet():PG_read_query returns .
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog():got result:cmdSys='S'
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
>>> 2006-12-26 11:58:53 [29185] DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog(): SYS_CALL process done
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog():process result done:cmdSys='S'
>>> 2006-12-26 11:58:53 [29185] DEBUG:PGRread_packet():PG_read_query returns .
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog():got result:cmdSys='S'
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
>>> 2006-12-26 11:58:53 [29185] DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog(): SYS_CALL process done
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog():process result done:cmdSys='S'
>>> 2006-12-26 11:58:53 [29185] DEBUG:PGRread_packet():PG_read_query returns .
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog():got result:cmdSys='S'
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
>>> 2006-12-26 11:58:53 [29185] DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog(): SYS_CALL process done
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog():process result done:cmdSys='S'
>>> 2006-12-26 11:58:53 [29185] DEBUG:PGRread_packet():PG_read_query returns .
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog():got result:cmdSys='S'
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
>>> 2006-12-26 11:58:53 [29185] DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog(): SYS_CALL process done
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog():process result done:cmdSys='S'
>>> 2006-12-26 11:58:53 [29185] DEBUG:PGRread_packet():PG_read_query returns .
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog():got result:cmdSys='S'
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
>>> 2006-12-26 11:58:53 [29185] DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog(): SYS_CALL process done
>>> 2006-12-26 11:58:53 [29185] DEBUG:do_rlog():process result done:cmdSys='S'
>>> 2006-12-26 11:58:53 [29673] DEBUG:PGRreturn_result():PGRreturn_result[5,0]
>>> 2006-12-26 11:58:53 [29717] DEBUG:PGRreturn_result():PGRreturn_result[5,0]
>>> 2006-12-26 11:58:53 [29717] DEBUG:PGRreturn_result():128 send
>>> 2006-12-26 11:58:53 [29717] DEBUG:replicate_loop():session closed
>>> 2006-12-26 11:58:53 [29717] DEBUG:PGRdo_replicate():query ::
>>> PGR_CLOSE_CONNECTION
>>>
>>
>
pgcluster メーリングリストの案内