[pgcluster: 899] Cluster DB recovery stalls partway through
青木俊憲
aoki @ shonai.co.jp
Thu Jul 20 19:22:43 JST 2006
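Hello, my name is Aoki.
I am currently evaluating pgcluster-1.5.0rc7 in the environment described below,
but recovery of a cluster DB stalls partway through.
Specifically, the startup output gets as far as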
Start in recovery mode!
Please wait until a data synchronization finishes from Master DB...
1st recovery step of [global] directory...OK
1st recovery step of [base] directory...OK
1st recovery step of [pg_clog] directory...OK
1st recovery step of [pg_xlog] directory...OK
1st sync_table_space OK
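and then stays stuck at that point indefinitely (it sat there all night).
The trace messages do not show anything that looks like an error either,
so I could not work out the cause and am writing to ask for help.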
Cluster DB 1
Xeon 3.0 GHz × 2, 4 GB memory
RHEL3.3 2.4.20.ELsmp
560 GB disk mounted on /var/pgsql/data via iSCSI
Hostname sssi-ddb01
IP 192.168.35.151
Cluster DB 2 & replication server
Xeon 3.0 GHz × 2, 4 GB memory
RHEL3.3 2.4.20.ELsmp
560 GB disk mounted on /var/pgsql/data via iSCSI
Hostname sssi-ddb02
IP 192.168.35.152
(No load balancer)
cluster.conf
<Replicate_Server_Info>
<Host_Name> sssi-ddb02 </Host_Name>
<Port> 8001 </Port>
<Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>
<Recovery_Port> 7001 </Recovery_Port>
<Rsync_Path> /usr/bin/rsync </Rsync_Path>
<Rsync_Option> ssh -1 </Rsync_Option>
<Rsync_Compress> yes </Rsync_Compress>
<When_Stand_Alone> read_write </When_Stand_Alone>
pgreplicate.conf
<Cluster_Server_Info>
<Host_Name> sssi-ddb01 </Host_Name>
<Port> 5432 </Port>
<Recovery_Port> 7001 </Recovery_Port>
</Cluster_Server_Info>
<Cluster_Server_Info>
<Host_Name> sssi-ddb02 </Host_Name>
<Port> 5432 </Port>
<Recovery_Port> 7001 </Recovery_Port>
</Cluster_Server_Info>
<Replication_Port> 8001 </Replication_Port>
<Recovery_Port> 8101 </Recovery_Port>
<Response_Mode> normal </Response_Mode>
<Use_Replication_Log> no </Use_Replication_Log>
<RLOG_Port> 8301 </RLOG_Port>
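The port settings in the two files refer to each other, so a quick cross-check can rule out a simple mismatch. The following is only a minimal sketch: the helper names (`blocks`, `value`, `top_level`, `check`) are made up for illustration, the pairing of settings is an assumption about how the files relate, and it assumes the same cluster.conf is used on both cluster DB hosts. The fragments are parsed with regular expressions because they are not a single well-formed XML document.

```python
import re

# Fragments copied from the configuration shown above.
CLUSTER_CONF = """
<Replicate_Server_Info>
<Host_Name> sssi-ddb02 </Host_Name>
<Port> 8001 </Port>
<Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>
<Recovery_Port> 7001 </Recovery_Port>
"""

PGREPLICATE_CONF = """
<Cluster_Server_Info>
<Host_Name> sssi-ddb01 </Host_Name>
<Port> 5432 </Port>
<Recovery_Port> 7001 </Recovery_Port>
</Cluster_Server_Info>
<Cluster_Server_Info>
<Host_Name> sssi-ddb02 </Host_Name>
<Port> 5432 </Port>
<Recovery_Port> 7001 </Recovery_Port>
</Cluster_Server_Info>
<Replication_Port> 8001 </Replication_Port>
<Recovery_Port> 8101 </Recovery_Port>
"""


def blocks(text, tag):
    """Return the inner text of every <tag>...</tag> block (hypothetical helper)."""
    return re.findall(rf"<{tag}>(.*?)</{tag}>", text, flags=re.S)


def value(text, tag):
    """Return the stripped value of the first <tag> in text, or None."""
    found = blocks(text, tag)
    return found[0].strip() if found else None


def top_level(text, container):
    """Drop all <container> blocks so that only top-level settings remain."""
    return re.sub(rf"<{container}>.*?</{container}>", "", text, flags=re.S)


def check(cluster_conf, pgreplicate_conf):
    """Return a list of mismatches between the two files (empty if none)."""
    problems = []
    rep_info = blocks(cluster_conf, "Replicate_Server_Info")[0]
    rep_top = top_level(pgreplicate_conf, "Cluster_Server_Info")
    db_top = top_level(cluster_conf, "Replicate_Server_Info")

    # Assumed pairing: what the cluster DB expects vs. what pgreplicate listens on.
    if value(rep_info, "Port") != value(rep_top, "Replication_Port"):
        problems.append("replication port mismatch")
    if value(rep_info, "Recovery_Port") != value(rep_top, "Recovery_Port"):
        problems.append("replication server recovery port mismatch")

    # Assumed pairing: what pgreplicate expects vs. the cluster DB's own recovery port.
    for db in blocks(pgreplicate_conf, "Cluster_Server_Info"):
        if value(db, "Recovery_Port") != value(db_top, "Recovery_Port"):
            problems.append(
                f"cluster DB recovery port mismatch for {value(db, 'Host_Name')}"
            )
    return problems


print(check(CLUSTER_CONF, PGREPLICATE_CONF))
```

For the settings quoted above the check finds no mismatch, so under these assumptions the port numbers themselves look consistent.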
pgreplicate trace messages
2006-07-20 18:25:54 [8013] DEBUG:pgrecovery_loop():recovery accept port 8101
2006-07-20 18:25:54 [8013] DEBUG:read_packet():receive packet
2006-07-20 18:25:54 [8013] DEBUG:no = 1
2006-07-20 18:25:54 [8013] DEBUG:max_connect = 100
2006-07-20 18:25:54 [8013] DEBUG:port = 5432
2006-07-20 18:25:54 [8013] DEBUG:recoveryPort = 7001
2006-07-20 18:25:54 [8013] DEBUG:hostName = sssi-ddb01
2006-07-20 18:25:54 [8013] DEBUG:pg_data = /var/pgsql/data
2006-07-20 18:25:54 [8013] DEBUG:pgrecovery_loop():receive packet no:1
2006-07-20 18:25:54 [8013] DEBUG:pgrecovery_loop():1st master - 0
2006-07-20 18:25:54 [8013] DEBUG:pgrecovery_loop():1st target - 0
2006-07-20 18:25:54 [8013] DEBUG:first_setup_recovery():1st setup target sssi-ddb01
2006-07-20 18:25:54 [8013] DEBUG:first_setup_recovery():1st setup port 5432
2006-07-20 18:25:54 [8013] DEBUG:first_setup_recovery():add recovery target to host table
2006-07-20 18:25:54 [8013] DEBUG:first_setup_recovery():set RECOVERY_PGDATA_REQ packet data
2006-07-20 18:25:54 [8013] DEBUG:PGRsend_replicate_packet_to_server():host(10.50.35.152) : port(5432)
2006-07-20 18:25:54 [8013] DEBUG:PGRsend_replicate_packet_to_server():set new Transaction
2006-07-20 18:25:54 [8013] DEBUG:pgr_createConn():PQsetdbLogin host[192.168.35.152] port[5432] db[template1] user[postgres]
2006-07-20 18:25:54 [8013] DEBUG:pgr_createConn():PQsetdbLogin ok!!
2006-07-20 18:25:54 [8013] DEBUG:PGRsend_replicate_packet_to_server():connect db:template1 port:5432 user:postgres host:192.168.35.152
2006-07-20 18:25:54 [8013] DEBUG:send_replicate_packet_to_server():sync_command(SELECT PGR_SYSTEM_COMMAND_FUNCTION(3,0,0,0,1,1) )
2006-07-20 18:25:54 [8013] DEBUG:send_replicate_packet_to_server():sync_command returns
2006-07-20 18:25:54 [8013] DEBUG:send_replicate_packet_to_server():sync_command(SELECT PGR_SYSTEM_COMMAND_FUNCTION(8,0,0,1) )
2006-07-20 18:25:54 [8013] DEBUG:send_replicate_packet_to_server():sync_command returns SYSTEM_COMMAND
2006-07-20 18:25:54 [8013] DEBUG:send_replicate_packet_to_server():execute query(VACUUM)
2006-07-20 18:25:54 [8013] DEBUG:send_replicate_packet_to_server():PQexec returns :VACUUM
2006-07-20 18:25:54 [8013] DEBUG:deleteTransactionTbl():
2006-07-20 18:25:54 [8013] DEBUG:first_setup_recovery():send packet to master sssi-ddb02 recoveryPort 7001
2006-07-20 18:25:54 [8013] DEBUG:first_setup_recovery():wait answer from master server
2006-07-20 18:25:54 [8013] DEBUG:read_packet():receive packet
2006-07-20 18:25:54 [8013] DEBUG:no = 3
2006-07-20 18:25:54 [8013] DEBUG:max_connect = 100
2006-07-20 18:25:54 [8013] DEBUG:port = 5432
2006-07-20 18:25:54 [8013] DEBUG:recoveryPort = 7001
2006-07-20 18:25:54 [8013] DEBUG:hostName = sssi-ddb02
2006-07-20 18:25:54 [8013] DEBUG:pg_data = /var/pgsql/data
2006-07-20 18:25:54 [8013] DEBUG:first_setup_recovery():get answer from master:no[3]
2006-07-20 18:25:54 [8013] DEBUG:pgrecovery_loop():first_setup_recovery end:0
2006-07-20 18:39:10 [8061] DEBUG:cmdSts=O
2006-07-20 18:39:10 [8061] DEBUG:cmdType=x
2006-07-20 18:39:10 [8061] DEBUG:rlog=0
2006-07-20 18:39:10 [8061] DEBUG:port=5432
2006-07-20 18:39:10 [8061] DEBUG:pid=8060
2006-07-20 18:39:10 [8061] DEBUG:from_host=192.168.35.152
2006-07-20 18:39:10 [8061] DEBUG:dbName=template1
2006-07-20 18:39:10 [8061] DEBUG:userName=postgres
2006-07-20 18:39:10 [8061] DEBUG:recieve sec=1153388350
2006-07-20 18:39:10 [8061] DEBUG:recieve usec=988134
2006-07-20 18:39:10 [8061] DEBUG:query_size=21
2006-07-20 18:39:10 [8061] DEBUG:request_id=0
2006-07-20 18:39:10 [8061] DEBUG:replicate_id=0
2006-07-20 18:39:10 [8061] DEBUG:query=PGR_CLOSE_CONNECTION
2006-07-20 18:39:10 [8061] DEBUG:sem_lock [1] req
2006-07-20 18:39:10 [8061] DEBUG:sem_lock [1] got it
2006-07-20 18:39:10 [8061] DEBUG:PGRreplicate_packet_send():checking host sssi-ddb01 for creating threads
2006-07-20 18:39:10 [8061] DEBUG:PGRreplicate_packet_send():checking host sssi-ddb02 for creating threads
2006-07-20 18:39:10 [8061] DEBUG:[0] is same host
2006-07-20 18:39:10 [8061] DEBUG:sem_unlock[1]
2006-07-20 18:39:10 [8061] DEBUG:PGRdo_replicate():PGRreplicate_packet_send returns 0
2006-07-20 18:39:10 [8061] DEBUG:replicate_loop():session closed
2006-07-20 18:39:10 [8061] DEBUG:replicate_loop():replicate loop exit
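Since the trace contains no explicit error, one way to narrow down where things stop is to look for the longest silence between consecutive trace entries. The sketch below is illustrative only: the regular expression is inferred from the lines quoted above rather than from any documented log format, the function names are hypothetical, and wrapped continuation lines (those not starting with a timestamp) are simply skipped.

```python
import re
from datetime import datetime

# Pattern inferred from the trace above: "<date> <time> [<pid>] DEBUG:<message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\d+)\] DEBUG:(.*)$"
)


def parse(lines):
    """Yield (timestamp, pid, message) for each line that starts a trace entry."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            yield ts, int(m.group(2)), m.group(3)


def largest_gap(entries):
    """Return (seconds, before, after) for the longest silence, or None."""
    best = None
    for prev, cur in zip(entries, entries[1:]):
        gap = (cur[0] - prev[0]).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, prev, cur)
    return best


# A few lines taken verbatim from the trace above.
SAMPLE = [
    "2006-07-20 18:25:54 [8013] DEBUG:read_packet():receive packet",
    "2006-07-20 18:25:54 [8013] DEBUG:pgrecovery_loop():first_setup_recovery end:0",
    "2006-07-20 18:39:10 [8061] DEBUG:cmdSts=O",
    "2006-07-20 18:39:10 [8061] DEBUG:query=PGR_CLOSE_CONNECTION",
]

entries = list(parse(SAMPLE))
gap, before, after = largest_gap(entries)
print(f"{gap:.0f}s of silence after: {before[2]}")
print(f"next entry comes from pid {after[1]}: {after[2]}")
```

On these sample lines the longest silence is the 796 seconds following `first_setup_recovery end:0`, and the next entry comes from a different process (pid 8061). That only locates the quiet period in the pgreplicate trace; it does not by itself say what the cluster DB was waiting for.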
That is all. Thank you in advance for your help.