[pgcluster: 899] Cluster DB recovery stops partway through

青木俊憲 aoki @ shonai.co.jp
Thu Jul 20 19:22:43 JST 2006


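Hello, my name is Aoki.

I am currently evaluating pgcluster-1.5.0rc7 in the environment described
below, but a cluster DB hangs during recovery.
Specifically, the output gets as far as: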

Start in recovery mode!
Please wait until a data synchronization finishes from Master DB...
1st recovery step of [global] directory...OK
1st recovery step of [base] directory...OK
1st recovery step of [pg_clog] directory...OK
1st recovery step of [pg_xlog] directory...OK
1st sync_table_space OK

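and then stays stopped at that point indefinitely (all night, in my case).
I also checked the trace messages and found nothing that looks like an error,
so I cannot determine the cause and am writing to the list.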


Cluster DB 1
Xeon 3.0 GHz × 2, 4 GB memory
RHEL3.3 2.4.20.ELsmp
560 GB disk mounted at /var/pgsql/data via iSCSI
Hostname sssi-ddb01
IP 192.168.35.151

Cluster DB 2 & replication server
Xeon 3.0 GHz × 2, 4 GB memory
RHEL3.3 2.4.20.ELsmp
560 GB disk mounted at /var/pgsql/data via iSCSI
Hostname sssi-ddb02
IP 192.168.35.152

(No load balancer)

cluster.conf
<Replicate_Server_Info>
        <Host_Name> sssi-ddb02 </Host_Name>
        <Port> 8001 </Port>
        <Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>

<Recovery_Port> 7001 </Recovery_Port>
<Rsync_Path> /usr/bin/rsync </Rsync_Path>
<Rsync_Option> ssh -1 </Rsync_Option>
<Rsync_Compress> yes </Rsync_Compress>
<When_Stand_Alone> read_write  </When_Stand_Alone>

pgreplicate.conf
<Cluster_Server_Info>
    <Host_Name>           sssi-ddb01  </Host_Name>
    <Port>                5432                </Port>
    <Recovery_Port>       7001                </Recovery_Port>
</Cluster_Server_Info>
<Cluster_Server_Info>
    <Host_Name>           sssi-ddb02 </Host_Name>
    <Port>                5432                 </Port>
    <Recovery_Port>       7001                 </Recovery_Port>
</Cluster_Server_Info>

<Replication_Port>    8001            </Replication_Port>
<Recovery_Port>       8101            </Recovery_Port>
<Response_Mode>       normal          </Response_Mode>
<Use_Replication_Log> no              </Use_Replication_Log>
<RLOG_Port>           8301            </RLOG_Port>
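
The two files above have to agree on port numbers. The following is a minimal sketch, not part of pgcluster, that cross-checks them. It assumes (this is an assumption, not something stated in the post) that `Port` and `Recovery_Port` inside `Replicate_Server_Info` should equal the top-level `Replication_Port` and `Recovery_Port` in pgreplicate.conf, and that each `Cluster_Server_Info` `Recovery_Port` should equal the top-level `Recovery_Port` in cluster.conf, which only makes sense if every cluster DB uses the same cluster.conf. The function names are hypothetical.

```python
import re

# Matches one "<Tag> value </Tag>" setting; the files are not rooted XML,
# so a regular expression is used instead of an XML parser.
SETTING_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.S)


def parse_blocks(text, block_tag):
    """Return one {tag: value} dict per <block_tag>...</block_tag> section."""
    bodies = re.findall(rf"<{block_tag}>(.*?)</{block_tag}>", text, re.S)
    return [{k: v.strip() for k, v in SETTING_RE.findall(b)} for b in bodies]


def top_level(text, block_tag):
    """Return {tag: value} for settings outside <block_tag> sections."""
    stripped = re.sub(rf"<{block_tag}>.*?</{block_tag}>", "", text, flags=re.S)
    return {k: v.strip() for k, v in SETTING_RE.findall(stripped)}


def check_ports(cluster_conf, pgreplicate_conf):
    """Return a list of human-readable mismatch descriptions (empty if none)."""
    problems = []
    db_top = top_level(cluster_conf, "Replicate_Server_Info")
    rep_top = top_level(pgreplicate_conf, "Cluster_Server_Info")

    # Assumed pairing: cluster.conf's view of the replication server
    # versus what pgreplicate.conf says it listens on.
    for rs in parse_blocks(cluster_conf, "Replicate_Server_Info"):
        host = rs.get("Host_Name")
        if rs.get("Port") != rep_top.get("Replication_Port"):
            problems.append(f"replication port mismatch for {host}")
        if rs.get("Recovery_Port") != rep_top.get("Recovery_Port"):
            problems.append(f"replicator recovery port mismatch for {host}")

    # Assumed pairing: pgreplicate.conf's view of each cluster DB
    # versus the recovery port the cluster DB itself is configured with.
    for cs in parse_blocks(pgreplicate_conf, "Cluster_Server_Info"):
        if cs.get("Recovery_Port") != db_top.get("Recovery_Port"):
            problems.append(
                f"cluster DB recovery port mismatch for {cs.get('Host_Name')}"
            )
    return problems
```

Under those assumptions the values posted above are consistent (8001/8001, 8101/8101, 7001/7001), so `check_ports` would return an empty list for them.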

pgreplicate trace messages
2006-07-20 18:25:54 [8013] DEBUG:pgrecovery_loop():recovery accept port 8101
2006-07-20 18:25:54 [8013] DEBUG:read_packet():receive packet
2006-07-20 18:25:54 [8013] DEBUG:no = 1
2006-07-20 18:25:54 [8013] DEBUG:max_connect = 100
2006-07-20 18:25:54 [8013] DEBUG:port = 5432
2006-07-20 18:25:54 [8013] DEBUG:recoveryPort = 7001
2006-07-20 18:25:54 [8013] DEBUG:hostName = sssi-ddb01
2006-07-20 18:25:54 [8013] DEBUG:pg_data = /var/pgsql/data
2006-07-20 18:25:54 [8013] DEBUG:pgrecovery_loop():receive packet no:1
2006-07-20 18:25:54 [8013] DEBUG:pgrecovery_loop():1st master  - 0
2006-07-20 18:25:54 [8013] DEBUG:pgrecovery_loop():1st target  - 0
2006-07-20 18:25:54 [8013] DEBUG:first_setup_recovery():1st setup target sssi-ddb01
2006-07-20 18:25:54 [8013] DEBUG:first_setup_recovery():1st setup port 5432
2006-07-20 18:25:54 [8013] DEBUG:first_setup_recovery():add recovery target to host table
2006-07-20 18:25:54 [8013] DEBUG:first_setup_recovery():set RECOVERY_PGDATA_REQ packet data
2006-07-20 18:25:54 [8013] DEBUG:PGRsend_replicate_packet_to_server():host(10.50.35.152) : port(5432)
2006-07-20 18:25:54 [8013] DEBUG:PGRsend_replicate_packet_to_server():set new Transaction
2006-07-20 18:25:54 [8013] DEBUG:pgr_createConn():PQsetdbLogin host[192.168.35.152] port[5432] db[template1] user[postgres]
2006-07-20 18:25:54 [8013] DEBUG:pgr_createConn():PQsetdbLogin ok!!
2006-07-20 18:25:54 [8013] DEBUG:PGRsend_replicate_packet_to_server():connect db:template1 port:5432 user:postgres host:192.168.35.152
2006-07-20 18:25:54 [8013] DEBUG:send_replicate_packet_to_server():sync_command(SELECT PGR_SYSTEM_COMMAND_FUNCTION(3,0,0,0,1,1) )
2006-07-20 18:25:54 [8013] DEBUG:send_replicate_packet_to_server():sync_command returns
2006-07-20 18:25:54 [8013] DEBUG:send_replicate_packet_to_server():sync_command(SELECT PGR_SYSTEM_COMMAND_FUNCTION(8,0,0,1) )
2006-07-20 18:25:54 [8013] DEBUG:send_replicate_packet_to_server():sync_command returns SYSTEM_COMMAND
2006-07-20 18:25:54 [8013] DEBUG:send_replicate_packet_to_server():execute query(VACUUM)
2006-07-20 18:25:54 [8013] DEBUG:send_replicate_packet_to_server():PQexec returns :VACUUM
2006-07-20 18:25:54 [8013] DEBUG:deleteTransactionTbl():
2006-07-20 18:25:54 [8013] DEBUG:first_setup_recovery():send packet to master sssi-ddb02 recoveryPort 7001
2006-07-20 18:25:54 [8013] DEBUG:first_setup_recovery():wait answer from master server
2006-07-20 18:25:54 [8013] DEBUG:read_packet():receive packet
2006-07-20 18:25:54 [8013] DEBUG:no = 3
2006-07-20 18:25:54 [8013] DEBUG:max_connect = 100
2006-07-20 18:25:54 [8013] DEBUG:port = 5432
2006-07-20 18:25:54 [8013] DEBUG:recoveryPort = 7001
2006-07-20 18:25:54 [8013] DEBUG:hostName = sssi-ddb02
2006-07-20 18:25:54 [8013] DEBUG:pg_data = /var/pgsql/data
2006-07-20 18:25:54 [8013] DEBUG:first_setup_recovery():get answer from master:no[3]
2006-07-20 18:25:54 [8013] DEBUG:pgrecovery_loop():first_setup_recovery end:0
2006-07-20 18:39:10 [8061] DEBUG:cmdSts=O
2006-07-20 18:39:10 [8061] DEBUG:cmdType=x
2006-07-20 18:39:10 [8061] DEBUG:rlog=0
2006-07-20 18:39:10 [8061] DEBUG:port=5432
2006-07-20 18:39:10 [8061] DEBUG:pid=8060
2006-07-20 18:39:10 [8061] DEBUG:from_host=192.168.35.152
2006-07-20 18:39:10 [8061] DEBUG:dbName=template1
2006-07-20 18:39:10 [8061] DEBUG:userName=postgres
2006-07-20 18:39:10 [8061] DEBUG:recieve sec=1153388350
2006-07-20 18:39:10 [8061] DEBUG:recieve usec=988134
2006-07-20 18:39:10 [8061] DEBUG:query_size=21
2006-07-20 18:39:10 [8061] DEBUG:request_id=0
2006-07-20 18:39:10 [8061] DEBUG:replicate_id=0
2006-07-20 18:39:10 [8061] DEBUG:query=PGR_CLOSE_CONNECTION
2006-07-20 18:39:10 [8061] DEBUG:sem_lock [1] req
2006-07-20 18:39:10 [8061] DEBUG:sem_lock [1] got it
2006-07-20 18:39:10 [8061] DEBUG:PGRreplicate_packet_send():checking host sssi-ddb01 for creating threads
2006-07-20 18:39:10 [8061] DEBUG:PGRreplicate_packet_send():checking host sssi-ddb02 for creating threads
2006-07-20 18:39:10 [8061] DEBUG:[0] is same host
2006-07-20 18:39:10 [8061] DEBUG:sem_unlock[1]
2006-07-20 18:39:10 [8061] DEBUG:PGRdo_replicate():PGRreplicate_packet_send returns 0
2006-07-20 18:39:10 [8061] DEBUG:replicate_loop():session closed
2006-07-20 18:39:10 [8061] DEBUG:replicate_loop():replicate loop exit
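
When reading a long trace like the one above, it can help to locate the points where logging goes quiet. The following is a minimal sketch, not part of pgcluster, that parses lines in the `timestamp [pid] LEVEL:message` layout shown above and reports gaps between consecutive entries. The function names and the 60-second default threshold are hypothetical choices. Lines without a timestamp prefix are treated as hard-wrapped continuations of the previous entry.

```python
import re
from datetime import datetime

# Prefix of a trace line as it appears in the post, for example:
# "2006-07-20 18:25:54 [8013] DEBUG:read_packet():receive packet"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\d+)\] (\w+):(.*)$"
)


def parse_trace(lines):
    """Return a list of (timestamp, pid, message) tuples.

    A non-empty line that lacks the timestamp prefix is appended, without
    a separator, to the message of the entry before it.
    """
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINE_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            entries.append([stamp, int(match.group(2)), match.group(4)])
        elif entries and line.strip():
            entries[-1][2] += line
    return [tuple(entry) for entry in entries]


def find_gaps(entries, threshold_seconds=60):
    """Return (previous_entry, next_entry, seconds) for each gap of at
    least threshold_seconds between consecutive entries."""
    gaps = []
    for prev, cur in zip(entries, entries[1:]):
        seconds = (cur[0] - prev[0]).total_seconds()
        if seconds >= threshold_seconds:
            gaps.append((prev, cur, seconds))
    return gaps
```

Applied to the trace above, this would flag the 796-second gap between the last entry from pid 8013 at 18:25:54 and the first entry from pid 8061 at 18:39:10.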

Thank you in advance for any help.



