[pgcluster: 750] Re: Recovery fails

164 164 @ 7250.org
Wed Mar 16 00:20:22 JST 2005


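Hello, this is my first post, and I apologize for its length.
I am still learning, so I may be off the mark, but I seem to be seeing a
similar situation, so I am reporting it.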

Environment:
OS:                   Fedora Core 3 (kernel 2.6.10-1.770_FC3)
PGCluster:            1.1.0c, 1.3.0c
CPU:                  Pentium 4 HT
Replication servers:  dbrp1, dbrp2
Cluster DBs:          db1, db2
Load balancers:       dblb1, dblb2

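At first I was using kernel 2.6.10-1.766_FC3smp.
With that kernel, no error messages appeared when starting in recovery mode.
Later, the problem from "[pgcluster: 714] pgreplicate hangs" reproduced, so
I switched to the kernel listed above.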

After the change, I confirmed that the "[pgcluster: 714] pgreplicate hangs"
problem no longer occurs, but error messages now appear when starting in
recovery mode.
I did not change any PGCluster settings in the meantime.

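Since then I have reinstalled 1.1.0c, installed 1.3.0c, and so on, but in
every case the system became unstable once it had been started in recovery
mode. Under load it stops partway through, and pgreplicate processes keep
multiplying without limit.
With 1.3.0c, starting in recovery mode prints no error messages and the
server appears to start normally, but postmaster.pid is never created even
though the processes are visible in ps. In that state I can neither stop
nor start the server, so I end up killing the processes.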
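The state described above (postmaster processes alive, no postmaster.pid) can be told apart from a normal or stopped server by checking two things: whether the PID file exists and whether something is already listening on the port. A minimal sketch, assuming the server accepts connections on 127.0.0.1 at port 5432 as in the logs below; `diagnose` and `port_in_use` are hypothetical helper names, not part of PGCluster or pg_ctl.

```python
import os
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP connection to host:port succeeds (something is listening)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        return s.connect_ex((host, port)) == 0

def diagnose(data_dir, port=5432):
    """Classify the postmaster state from the PID file and the listening port."""
    has_pid = os.path.exists(os.path.join(data_dir, "postmaster.pid"))
    listening = port_in_use(port)
    if has_pid and listening:
        return "running"
    if listening:
        # The symptom reported here: "pg_ctl stop" cannot find postmaster.pid,
        # yet "pg_ctl start" fails because the port is already bound.
        return "orphaned"
    if has_pid:
        # PID file left behind with nothing listening on the port.
        return "stale-pid"
    return "stopped"
```

For example, `diagnose("/usr/local/pgsql/data")` returning `"orphaned"` would match the situation where the leftover postmaster processes had to be killed by hand.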

The rsync version is 2.6.3, and manual rsync tests succeed.
Checking with diff, everything except the files named in the error messages
was in sync.
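The diff check above can be sketched in Python with the standard `filecmp` module, listing every path that differs between the master and target data directories or exists on only one side. This assumes both trees are reachable from one host (for example a copy of the target's data directory) and that the master is quiescent while comparing; `out_of_sync` is a hypothetical name.

```python
import filecmp
import os

def out_of_sync(master_dir, target_dir, rel=""):
    """List paths (relative to the roots) that differ or exist on only one side.

    Common files are compared byte by byte (shallow=False), because a shallow
    comparison only looks at file type, size and mtime, which rsync can preserve.
    """
    left = os.path.join(master_dir, rel)
    right = os.path.join(target_dir, rel)
    cmp = filecmp.dircmp(left, right)
    _, mismatch, errors = filecmp.cmpfiles(left, right, cmp.common_files, shallow=False)
    names = cmp.left_only + cmp.right_only + cmp.common_funny + mismatch + errors
    diffs = [os.path.join(rel, name) for name in names]
    for sub in cmp.common_dirs:
        # Recurse into subdirectories present on both sides (base/, pg_xlog/, ...).
        diffs.extend(out_of_sync(master_dir, target_dir, os.path.join(rel, sub)))
    return sorted(diffs)
```

An empty result means the two trees match; otherwise the returned paths should correspond to the files rsync reported errors for.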

Yamamoto-san, regarding your post on pgfoundry: I am also on kernel 2.6.
Would 2.4 or similar be a better choice?


----------------------------------------
[root @ db2 data]# su - postgres -c "pg_ctl start -D data -o '-R'"
postmaster successfully started
[root @ db2 data]# Start in recovery mode!
Please wait until a data synchronization finishes from Master DB...
1st recovery step of [global] directory...OK
1st recovery step of [base] directory...OK
1st recovery step of [pg_clog] directory...OK
1st recovery step of [pg_xlog] directory...OK
2nd recovery step of [global] directory...OK
2nd recovery step of [base] directory...OK
2nd recovery step of [pg_clog] directory...OK
2nd recovery step of [pg_xlog] directory...OK
rsync: stat "/usr/local/pgsql/data/base/17142/.100636.0h3qIu" failed: No such file or directory (2)
rsync: rename "/usr/local/pgsql/data/base/17142/.100636.0h3qIu" -> "base/17142/100636": No such file or directory (2)
rsync: stat "/usr/local/pgsql/data/pg_xlog/.0000000000000006.EJDMcf" failed: No such file or directory (2)
rsync: rename "/usr/local/pgsql/data/pg_xlog/.0000000000000006.EJDMcf" -> "pg_xlog/0000000000000006": No such file or directory (2)
rsync error: some files could not be transferred (code 23) at main.c(1146)
rsync error: some files could not be transferred (code 23) at main.c(1146)

[root @ db2 data]# ps ax
  PID TTY      STAT   TIME COMMAND
~~
 5537 pts/0    S      0:00 postgres: stats buffer process
 5538 pts/0    S      0:00 postgres: stats collector process
 5540 pts/0    S      0:00 /usr/local/pgcluster-1.1.0c/bin/postmaster -R -D da
 5541 pts/0    S      0:00 /usr/local/pgcluster-1.1.0c/bin/postmaster -R -D da
 5542 pts/0    S      0:00 /usr/local/pgcluster-1.1.0c/bin/postmaster -R -D da
 5614 pts/0    R+     0:00 ps -ax

[root @ db2 data]# su - postgres -c "pg_ctl stop -D data "
pg_ctl: could not find data/postmaster.pid
Is postmaster running?

[root @ db2 data]# su - postgres -c "pg_ctl start -D data "
postmaster successfully started
[root @ db2 data]# LOG:  could not bind IPv6 socket: Address already in use
HINT:  Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
LOG:  could not bind IPv4 socket: Address already in use
HINT:  Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
FATAL:  could not create TCP/IP listen socket

[root @ db2 data]# kill 5537 5538 5540 5541 5542
[root @ db2 data]# su - postgres -c "pg_ctl start -D data "
postmaster successfully started
[root @ db2 data]# su - postgres -c "psql test_db"
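To see exactly which files the recovery's rsync failed to put in place, the `rename` lines in the rsync messages above can be extracted mechanically. A minimal sketch, assuming each message sits on one unwrapped line (the mail wrapping must be undone first); `failed_renames` is a hypothetical name, and the regular expression is written against the message format shown here, not against any documented rsync output specification.

```python
import re

# Matches e.g.:
# rsync: rename "/path/.name.XXXXXX" -> "rel/name": No such file or directory (2)
RENAME_RE = re.compile(r'rsync: rename "([^"]+)" -> "([^"]+)": (.+)')

def failed_renames(log_text):
    """Return (destination, reason) pairs for temp files rsync could not rename."""
    results = []
    for line in log_text.splitlines():
        m = RENAME_RE.search(line)
        if m:
            results.append((m.group(2), m.group(3).strip()))
    return results
```

On the output above this would yield `base/17142/100636` and `pg_xlog/0000000000000006`, the files to re-check with diff after recovery.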


pgreplicate output during that time:
 -----------------------------------------------------------------
[root @ dbrp1 ~]# su - postgres -c "pgreplicate -nv -D /usr/local/pgcluster-1.1.0c/etc/"
DEBUG:Use Replication Log. Start PGR_RLog_Main()
addr.sun_path[/usr/local/pgcluster-1.1.0c/etc//.s.PGRLOG.8301]
Replicateion_Log->RLog_Sock_Path[/usr/local/pgcluster-1.1.0c/etc//.s.PGRLOG.8301]
DEBUG:replicate_main():replicate main 8001 port bind OK
DEBUG:PGRreplicate_packet_send():cmdSts=N
DEBUG:PGRreplicate_packet_send():cmdType=
DEBUG:PGRreplicate_packet_send():rlog=0
DEBUG:PGRreplicate_packet_send():request_id=0
DEBUG:PGRreplicate_packet_send():replicate_id=0
DEBUG:PGRreplicate_packet_send():port=0
DEBUG:PGRreplicate_packet_send():pid=0
DEBUG:PGRreplicate_packet_send():from_host=dbrp1
DEBUG:PGRreplicate_packet_send():dbName=template1
DEBUG:PGRreplicate_packet_send():userName=postgres
DEBUG:PGRreplicate_packet_send():recieve sec=0
DEBUG:PGRreplicate_packet_send():recieve usec=0
DEBUG:PGRreplicate_packet_send():query_size=72
DEBUG:PGRreplicate_packet_send():query=SELECT PGR_SYSTEM_COMMAND_FUNCTION(1,'dbrp1',8001,8101,8201)
DEBUG:sem_lock[1]
DEBUG:pgr_createConn():PQsetdbLogin host[db1] port[5432] db[template1] user[postgres]
DEBUG:pgr_createConn():PQsetdbLogin host[db2] port[5432] db[template1] user[postgres]
ERROR:pgr_createConn():PQsetdbLogin failed. close socket
ERROR:pgr_createConn():PQsetdbLogin failed. close socket
ERROR:pgr_createConn():PQsetdbLogin failed. close socket
ERROR:pgr_createConn():PQsetdbLogin failed. close socket
ERROR:pgr_createConn():PQsetdbLogin failed. close socket
ERROR:pgr_createConn():PQsetdbLogin failed. close socket
ERROR:pgr_createConn():PQsetdbLogin failed. close socket
ERROR:pgr_createConn():PQsetdbLogin failed. close socket
ERROR:pgr_createConn():PQsetdbLogin failed. close socket
ERROR:pgr_createConn():PQsetdbLogin failed. close socket
ERROR:pgr_createConn():PQsetdbLogin  timeout
ERROR:setTransactionTbl():New Transaction but pgr_createConn5432 @ db1 failed
DEBUG:deleteTransactionTbl(): getTransactionTbl failed
ERROR:pgr_createConn():PQsetdbLogin  timeout
ERROR:setTransactionTbl():New Transaction but pgr_createConn5432 @ db2 failed
DEBUG:deleteTransactionTbl(): getTransactionTbl failed
DEBUG:sem_unlock[1]
DEBUG:replicate_loop():replicate_loop selected
DEBUG:Cascade_Inf->upper is NULL
DEBUG:PGRsend_cascade():PGRsend_cascade sock[6]
DEBUG:PGRsend_cascade():send[] size[572]
DEBUG:PGRget_lower_cascade():lower cascade search[8001]@[dbrp2] use[2]
DEBUG:PGRget_lower_cascade():find lower cascade
DEBUG:PGRnotice_replication_server(): can not connect server[dbrp2]
DEBUG:replicate_loop():replicate_loop selected
DEBUG:replicate_loop(): PGRread_packet failed query[(null)] cmdSys[]
DEBUG:replicate_loop():session closed
DEBUG:replicate_loop():replicate loop exit
DEBUG:pgrecovery_loop():receive packet no:1
DEBUG:first_setup_recovery():1st setup target db2
DEBUG:first_setup_recovery():1st setup port 5432
ERROR:send_recovery_packet():send() failed. (Socket operation on non-socket)
ERROR:send_recovery_packet():send() failed. (Bad file descriptor)
ERROR:send_recovery_packet():send() failed. (Bad file descriptor)
ERROR:send_recovery_packet():send() failed. (Bad file descriptor)
ERROR:send_recovery_packet():send() failed. (Bad file descriptor)
ERROR:send_packet():send failed and PGR_Create_Socket_Connect failed
ERROR:send_recovery_packet():send() failed. (Bad file descriptor)
ERROR:send_recovery_packet():send() failed. (Bad file descriptor)
ERROR:send_recovery_packet():send() failed. (Bad file descriptor)
ERROR:send_recovery_packet():send() failed. (Bad file descriptor)
ERROR:send_recovery_packet():send() failed. (Bad file descriptor)
ERROR:send_packet():send failed and PGR_Create_Socket_Connect failed
ERROR:send_recovery_packet():send() failed. (Bad file descriptor)
ERROR:send_recovery_packet():send() failed. (Bad file descriptor)
ERROR:send_recovery_packet():send() failed. (Bad file descriptor)
ERROR:send_recovery_packet():send() failed. (Bad file descriptor)
ERROR:send_recovery_packet():send() failed. (Bad file descriptor)
ERROR:send_packet():send failed and PGR_Create_Socket_Connect failed
DEBUG:pgr_createConn():PQsetdbLogin host[db1] port[5432] db[template1] user[postgres]
DEBUG:pgr_createConn():PQsetdbLogin ok
DEBUG:send_sync_data():sync_command(SELECT PGR_SYSTEM_COMMAND_FUNCTION(3,0,0,0,1) )
ERROR:send_packet():PGR_Create_Socket_Connect failed
ERROR:send_packet():PGR_Create_Socket_Connect failed
ERROR:send_packet():PGR_Create_Socket_Connect failed
DEBUG:pgrecovery_loop():1st master db1 - 5432
DEBUG:pgrecovery_loop():1st target db2 - 5432
DEBUG:pgrecovery_loop():receive packet no:5
DEBUG:send_sync_data():sync_command(SELECT PGR_SYSTEM_COMMAND_FUNCTION(3,0,0,0,1) )
DEBUG:pgrecovery_loop():2nd master db1 - 5432
DEBUG:pgrecovery_loop():2nd target db2 - 5432
DEBUG:pgrecovery_loop():second_setup_recovery end :1
DEBUG:pgrecovery_loop():receive packet no:9
DEBUG:pgrecovery_loop():last master db1 - 5432
DEBUG:pgrecovery_loop():last target db2 - 5432
DEBUG:PGRsend_queue():master db1 - 5432
DEBUG:PGRsend_queue():target db2 - 5432
ERROR:PGRget_recovery_queue_file_for_read():could not open recovery queue file as /usr/local/pgcluster-1.1.0c/etc//.pgr_recovery.1. reason: No such file or directory
DEBUG:pgrecovery_loop():PGRsend_queue ok
ERROR:send_packet():PGR_Create_Socket_Connect failed
ERROR:send_packet():PGR_Create_Socket_Connect failed
ERROR:send_packet():PGR_Create_Socket_Connect failed
ERROR:send_packet():PGR_Create_Socket_Connect failed
ERROR:send_packet():PGR_Create_Socket_Connect failed
ERROR:send_packet():PGR_Create_Socket_Connect failed
DEBUG:replicate_loop():replicate_loop selected
DEBUG:replicate_loop(): PGRread_packet failed query[(null)] cmdSys[]
DEBUG:replicate_loop():session closed
DEBUG:replicate_loop():replicate loop exit
DEBUG:replicate_loop():replicate_loop selected
DEBUG:replicate_loop(): PGRread_packet failed query[(null)] cmdSys[]
DEBUG:replicate_loop():session closed
DEBUG:replicate_loop():replicate loop exit
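The pgreplicate log above is long and repetitive; tallying the ERROR lines by function and message makes it easier to see which failures dominate (here, connection setup to db1/db2 and socket sends during recovery). A minimal sketch, assuming the log is saved with wrapped lines rejoined; `summarize_errors` and the file name `pgreplicate.log` are hypothetical, and the pattern only follows the `ERROR:function():message` layout visible in this output.

```python
import re
from collections import Counter

# pgreplicate lines look like: ERROR:send_packet():PGR_Create_Socket_Connect failed
ERROR_RE = re.compile(r'^ERROR:(\w+)\(\):(.*)$')

def summarize_errors(log_text):
    """Count ERROR lines keyed by (function name, message)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = ERROR_RE.match(line.strip())
        if m:
            counts[(m.group(1), m.group(2).strip())] += 1
    return counts
```

Usage: `for (func, msg), n in summarize_errors(open("pgreplicate.log").read()).most_common(): print(n, func, msg)`.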

That's all.


