[pgcluster: 343] PGCluster startup order

宮澤 誠 miyazawa1017 @ yahoo.co.jp
Thu Jun 10 11:33:12 JST 2004


Hello, and thank you for your continued support.
My name is Miyazawa. I am currently evaluating PGCluster on two Solaris 8 (OS 5.8) machines.

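I noticed the following while trying to use recovery.
With 1.0.7RC1, as before, I could start unit 1 in the order replication server (hereafter RP) → cluster DB (hereafter CL),
then issue the recovery command on unit 2, and the two units synchronized.
With 1.0.7RC2, unless I start unit 1 in the order CL → RP,
the RP does not recognize the CL, logs "master table is null" and "master may be down",
and does not synchronize.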
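
One way to enforce the CL → RP order in a startup script is to wait until the CL's port accepts connections before launching the RP. The following is only a minimal sketch under assumptions: `wait_for_port` is a hypothetical helper, not part of PGCluster, and a successful TCP connect does not guarantee that the RP's PQsetdbLogin call will succeed.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds before `timeout` seconds pass.
    This is a hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # Not listening yet (refused, unreachable, or timed out).
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For the setup in this post, a wrapper script could call `wait_for_port("test1.adtest.co.jp", 5432)` and start the RP only when it returns True.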

Details follow.

[Development environment]
Hardware: Solaris 8 (sparc, OS 5.8), 2 machines
Version: 1.0.7RC2
Server configuration:
System 0: RP (master) + CL (master)
System 1: RP + CL
Load balancer: not installed

[Procedure]
(1) Start the RP and CL on system 0.
(2) Wait one minute.
(3) Issue the recovery command on system 1.

[Questions]
Is the CL → RP startup order an intentional change in the specification?
Is it safe to keep using PGCluster this way?

[Logs]
(1) Logs when recovery was run after starting in the order RP → CL
- RP log (unit 1)

DEBUG(init_server_tbl): /usr/local/pgsql/etc/pgreplicate.log open ok
DEBUG(init_server_tbl): /usr/local/pgsql/etc/pgreplicate.sts open ok
DEBUG(init_server_tbl): PGR_Get_Conf_Data ok
DEBUG(init_server_tbl): LoadBalanceTbl allocate ok
DEBUG(init_server_tbl): CascadeTbl shmget ok
DEBUG(init_server_tbl): CascadeTbl shmat ok
DEBUG(init_server_tbl): CascadeInf shmget ok
DEBUG(init_server_tbl): CascadeInf shmat ok
DEBUG(init_server_tbl): CommitLog shmget ok
DEBUG(init_server_tbl): Commit_Log_Tbl shmat ok
DEBUG(init_server_tbl): Conf data read ok
DEBUG(init_server_tbl): HostTbl shmget ok
DEBUG(init_server_tbl): HostTbl shmat ok
DEBUG(write_log_file): LockWaitTbl shmget ok
DEBUG(write_log_file): LockWaitTbl shmat ok
DEBUG(PGRrecovery_main): PGRrecovery_main bind port 7778
DEBUG(replicate_main): replicate main 8777 port bind OK
DEBUG(PGRreplicate_packet_send): cmdSts=N
DEBUG(PGRreplicate_packet_send): cmdType=
DEBUG(PGRreplicate_packet_send): port=0
DEBUG(PGRreplicate_packet_send): pid=0
DEBUG(PGRreplicate_packet_send): except_host=
DEBUG(PGRreplicate_packet_send): from_host=test1
DEBUG(PGRreplicate_packet_send): dbName=template1
DEBUG(PGRreplicate_packet_send): userName=postgres
DEBUG(PGRreplicate_packet_send): recieve sec=0
DEBUG(PGRreplicate_packet_send): recieve usec=0
DEBUG(PGRreplicate_packet_send): query_size=56
DEBUG(PGRreplicate_packet_send): query=SELECT PGR_SYSTEM_COMMAND_FUNCTION(1,'test1',8777,7778)
DEBUG(PGRreplicate_packet_send): useFlag[2]
DEBUG(PGRis_same_host): not same host:
DEBUG(is_need_response): sem_lock[1]
DEBUG(PGRreplicate_packet_send_each_server): except:0@ host:5432 @ test1.adtest.co.jp
DEBUG(PGRreplicate_packet_send_each_server): send replicate to:test1.adtest.co.jp
DEBUG(PGRsend_replicate_packet_to_server): host(test1.adtest.co.jp) : port(5432)
DEBUG(getTransactionTbl): not found in getTransactionTbl
DEBUG(pgr_createConn): PQsetdbLogin host[test1.adtest.co.jp] port[5432] db[template1] user[postgres]
ERROR(pgr_createConn): PQsetdbLogin failed. close socket!!
ERROR(write_log_file): PQsetdbLogin failed. close socket!!
ERROR(write_log_file): PQsetdbLogin failed. close socket!!
ERROR(write_log_file): PQsetdbLogin failed. close socket!!
ERROR(write_log_file): PQsetdbLogin failed. close socket!!
ERROR(write_log_file): dbPersistLogin  timeout
ERROR(write_log_file): New Transaction but pgr_createConn failed
ERROR(write_log_file): setTransactionTbl failed
DEBUG(write_log_file): sem_lock[2]
DEBUG(PGRsem_lock): sem_unlock[1]
DEBUG(PGRis_same_host): not same host:
DEBUG(PGRreplicate_packet_send_each_server): except:0@ host:5432 @ test2.adtest.co.jp
DEBUG(PGRreplicate_packet_send_each_server): send replicate to:test2.adtest.co.jp
DEBUG(PGRsend_replicate_packet_to_server): host(test2.adtest.co.jp) : port(5432)
DEBUG(getTransactionTbl): not found in getTransactionTbl
DEBUG(pgr_createConn): PQsetdbLogin host[test2.adtest.co.jp] port[5432] db[template1] user[postgres]
ERROR(pgr_createConn): PQsetdbLogin failed. close socket!!
ERROR(write_log_file): PQsetdbLogin failed. close socket!!
ERROR(write_log_file): PQsetdbLogin failed. close socket!!
ERROR(write_log_file): PQsetdbLogin failed. close socket!!
ERROR(write_log_file): PQsetdbLogin failed. close socket!!
ERROR(write_log_file): dbPersistLogin  timeout
ERROR(write_log_file): New Transaction but pgr_createConn failed
ERROR(write_log_file): setTransactionTbl failed
DEBUG(write_log_file): sem_lock[3]
DEBUG(PGRsem_lock): sem_unlock[2]
DEBUG(getTransactionTbl): sem_unlock[3]
DEBUG(PGRsem_unlock): PGRreplicate_packet_send end
DEBUG(PGRsem_unlock): wait replicate
DEBUG(PGRsem_unlock): wait replicate
DEBUG(PGRsem_unlock): wait replicate
DEBUG(pgrecovery_loop): recovery accept port 7778
DEBUG(read_packet): receive packet
DEBUG(read_packet): no = 1
DEBUG(read_packet): max_connect = 100
DEBUG(read_packet): port = 5432
DEBUG(read_packet): recoveryPort = 7779
DEBUG(read_packet): hostName = test2
DEBUG(read_packet): pg_data = /data/pgdb
DEBUG(read_packet): receive packet no:1
DEBUG(first_setup_recovery): 1st setup target test2
DEBUG(first_setup_recovery): 1st setup port 5432
DEBUG(first_setup_recovery): check another recovery process
DEBUG(PGRsem_unlock): add recovery target to host table
DEBUG(PGRsend_load_balance_packet): set RECOVERY_PGDATA_REQ packet data
ERROR(PGRget_master): get master info error , master may be down
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DEBUG(PGRsem_unlock): 1st master  - 0
DEBUG(PGRsem_unlock): 1st target test2 - 5432
DEBUG(pgrecovery_loop): recovery accept port 7778
DEBUG(read_packet): receive packet
DEBUG(read_packet): no = 200
DEBUG(read_packet): max_connect = 100
DEBUG(read_packet): port = 5432
DEBUG(read_packet): recoveryPort = 7779
DEBUG(read_packet): hostName = test2
DEBUG(read_packet): pg_data = /data/pgdb
DEBUG(read_packet): receive packet no:200
DEBUG(read_packet): recovery error accept. top queueing and initiarse recovery status
DEBUG(PGRsend_queue): master  - 0
ERROR(PGRget_HostTbl): master table is null
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DEBUG(PGRsem_unlock): wait replicate

- RP sts log (unit 1)
Thu Jun 10 11:07:43 2004  port(5432) host:test1.adtest.co.jp start use
Thu Jun 10 11:07:43 2004  port(5432) host:test2.adtest.co.jp start use
Thu Jun 10 11:07:43 2004  cascade(test1) port(8777) start use
Thu Jun 10 11:07:43 2004  cascade(test1) port(8777) become top
Thu Jun 10 11:07:43 2004  port(5432) host:test1.adtest.co.jp error
Thu Jun 10 11:07:43 2004  port(5432) host:test2.adtest.co.jp error
Thu Jun 10 11:10:23 2004  port(5432) host:test2 initialize  ← after issuing the recovery command
Thu Jun 10 11:10:23 2004  port(5432) host:test2.adtest.co.jp initialize

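Because of the errors above, synchronization failed and the CL failed to start (confirmed from the log output on unit 2 and by checking the DB).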
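
The two underlined errors can be picked out of an RP log mechanically. The following is a rough sketch that assumes every log line has the `LEVEL(function): message` shape seen above. `parse_rp_log` and `master_lookup_failed` are hypothetical names, and the two message fragments are taken only from this post, not from the PGCluster source.

```python
import re
from typing import Iterable, List, NamedTuple

# Matches lines such as "ERROR(PGRget_master): get master info error , ..."
LINE_RE = re.compile(r"^(?P<level>[A-Z]+)\((?P<func>\w+)\):\s*(?P<msg>.*)$")

# Message fragments observed in the failing run in this post.
MASTER_LOST = ("master may be down", "master table is null")


class LogEntry(NamedTuple):
    level: str
    func: str
    msg: str


def parse_rp_log(lines: Iterable[str]) -> List[LogEntry]:
    """Parse RP log lines, skipping anything that does not match the pattern."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            entries.append(LogEntry(m["level"], m["func"], m["msg"]))
    return entries


def master_lookup_failed(entries: Iterable[LogEntry]) -> bool:
    """True if any ERROR entry contains one of the master-lookup messages."""
    return any(
        e.level == "ERROR" and any(s in e.msg for s in MASTER_LOST)
        for e in entries
    )
```

Run against the RP → CL log above, `master_lookup_failed(parse_rp_log(open(path)))` would be expected to return True, where `path` is wherever the log was saved.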

(2) Logs when recovery was run after starting in the order CL → RP
- RP log
DEBUG(init_server_tbl): /usr/local/pgsql/etc/pgreplicate.log open ok
DEBUG(init_server_tbl): /usr/local/pgsql/etc/pgreplicate.sts open ok
DEBUG(init_server_tbl): PGR_Get_Conf_Data ok
DEBUG(init_server_tbl): LoadBalanceTbl allocate ok
DEBUG(init_server_tbl): CascadeTbl shmget ok
DEBUG(init_server_tbl): CascadeTbl shmat ok
DEBUG(init_server_tbl): CascadeInf shmget ok
DEBUG(init_server_tbl): CascadeInf shmat ok
DEBUG(init_server_tbl): CommitLog shmget ok
DEBUG(init_server_tbl): Commit_Log_Tbl shmat ok
DEBUG(init_server_tbl): Conf data read ok
DEBUG(init_server_tbl): HostTbl shmget ok
DEBUG(init_server_tbl): HostTbl shmat ok
DEBUG(write_log_file): LockWaitTbl shmget ok
DEBUG(write_log_file): LockWaitTbl shmat ok
DEBUG(PGRrecovery_main): PGRrecovery_main bind port 7778
DEBUG(replicate_main): replicate main 8777 port bind OK
DEBUG(PGRreplicate_packet_send): cmdSts=N
DEBUG(PGRreplicate_packet_send): cmdType=
DEBUG(PGRreplicate_packet_send): port=0
DEBUG(PGRreplicate_packet_send): pid=0
DEBUG(PGRreplicate_packet_send): except_host=
DEBUG(PGRreplicate_packet_send): from_host=test1
DEBUG(PGRreplicate_packet_send): dbName=template1
DEBUG(PGRreplicate_packet_send): userName=postgres
DEBUG(PGRreplicate_packet_send): recieve sec=0
DEBUG(PGRreplicate_packet_send): recieve usec=0
DEBUG(PGRreplicate_packet_send): query_size=56
DEBUG(PGRreplicate_packet_send): query=SELECT PGR_SYSTEM_COMMAND_FUNCTION(1,'test1',8777,7778)
DEBUG(PGRreplicate_packet_send): useFlag[2]
DEBUG(PGRis_same_host): not same host:
DEBUG(is_need_response): sem_lock[1]
DEBUG(PGRreplicate_packet_send_each_server): except:0@ host:5432 @ test1.adtest.co.jp
DEBUG(PGRreplicate_packet_send_each_server): send replicate to:test1.adtest.co.jp
DEBUG(PGRsend_replicate_packet_to_server): host(test1.adtest.co.jp) : port(5432)
DEBUG(getTransactionTbl): not found in getTransactionTbl
DEBUG(pgr_createConn): PQsetdbLogin host[test1.adtest.co.jp] port[5432] db[template1] user[postgres]
DEBUG(pgr_createConn): PQsetdbLogin ok!!
DEBUG(insertTransactionTbl): db:template1 port:5432 user:postgres host:test1.adtest.co.jp query:SELECT PGR_SYSTEM_COMMAND_FUNCTION(1,'test1',8777,7778)
DEBUG(insertTransactionTbl): sem_lock[2]
DEBUG(PGRsem_lock): sem_unlock[1]
DEBUG(getTransactionTbl): hit !! transaction tbl host test1.adtest.co.jp db:template1 pid:0
DEBUG(PGRis_same_host): not same host:
DEBUG(PGRreplicate_packet_send_each_server): except:0@ host:5432 @ test2.adtest.co.jp
DEBUG(PGRreplicate_packet_send_each_server): send replicate to:test2.adtest.co.jp
DEBUG(PGRsend_replicate_packet_to_server): host(test2.adtest.co.jp) : port(5432)
DEBUG(getTransactionTbl): not found in getTransactionTbl
DEBUG(pgr_createConn): PQsetdbLogin host[test2.adtest.co.jp] port[5432] db[template1] user[postgres]
ERROR(pgr_createConn): PQsetdbLogin failed. close socket!!
ERROR(write_log_file): PQsetdbLogin failed. close socket!!
ERROR(write_log_file): PQsetdbLogin failed. close socket!!
ERROR(write_log_file): PQsetdbLogin failed. close socket!!
ERROR(write_log_file): PQsetdbLogin failed. close socket!!
ERROR(write_log_file): dbPersistLogin  timeout
ERROR(write_log_file): New Transaction but pgr_createConn failed
ERROR(write_log_file): setTransactionTbl failed
DEBUG(write_log_file): sem_lock[3]
DEBUG(PGRsem_lock): sem_unlock[2]
DEBUG(getTransactionTbl): sem_unlock[3]
DEBUG(PGRsem_unlock): PGRreplicate_packet_send end
DEBUG(PGRsem_unlock): wait replicate
DEBUG(PGRsem_unlock): wait replicate
DEBUG(PGRsem_unlock): wait replicate
DEBUG(pgrecovery_loop): recovery accept port 7778
DEBUG(read_packet): receive packet
DEBUG(read_packet): no = 1
DEBUG(read_packet): max_connect = 100
DEBUG(read_packet): port = 5432
DEBUG(read_packet): recoveryPort = 7779
DEBUG(read_packet): hostName = test2
DEBUG(read_packet): pg_data = /data/pgdb
DEBUG(read_packet): receive packet no:1
DEBUG(first_setup_recovery): 1st setup target test2
DEBUG(first_setup_recovery): 1st setup port 5432
DEBUG(first_setup_recovery): check another recovery process
DEBUG(PGRsem_unlock): add recovery target to host table
DEBUG(PGRsend_load_balance_packet): set RECOVERY_PGDATA_REQ packet data
DEBUG(PGRget_master): send packet to master test1.adtest.co.jp recoveryPort 7779
DEBUG(PGRsem_unlock): wait answer from master server
DEBUG(read_packet_from_master): wait
DEBUG(read_packet): receive packet
DEBUG(read_packet): no = 3
DEBUG(read_packet): max_connect = 100
DEBUG(read_packet): port = 5432
DEBUG(read_packet): recoveryPort = 7779
DEBUG(read_packet): hostName = test1
DEBUG(read_packet): pg_data = /usr/local/pgsql/data
DEBUG(read_packet): get answer from master
DEBUG(send_recovery_packet): 1st master test1.adtest.co.jp - 5432
DEBUG(send_recovery_packet): 1st target test2 - 5432
DEBUG(PGRsem_unlock): wait replicate
DEBUG(read_packet): receive packet
DEBUG(read_packet): no = 5
DEBUG(read_packet): max_connect = 100
DEBUG(read_packet): port = 5432
DEBUG(read_packet): recoveryPort = 7779
DEBUG(read_packet): hostName = test1.adtest.co.jp
DEBUG(read_packet): pg_data = /usr/local/pgsql/data
DEBUG(read_packet): receive packet no:5
DEBUG(read_packet_from_master): wait
DEBUG(read_packet): receive packet
DEBUG(read_packet): no = 7
DEBUG(read_packet): max_connect = 100
DEBUG(read_packet): port = 5432
DEBUG(read_packet): recoveryPort = 7779
DEBUG(read_packet): hostName = test1
DEBUG(read_packet): pg_data = /usr/local/pgsql/data
DEBUG(send_recovery_packet): 2nd master test1.adtest.co.jp - 5432
DEBUG(send_recovery_packet): 2nd target test2 - 5432
DEBUG(send_recovery_packet): second_setup_recovery end :1
DEBUG(pgrecovery_loop): recovery accept port 7778
DEBUG(read_packet): receive packet
DEBUG(read_packet): no = 9
DEBUG(read_packet): max_connect = 100
DEBUG(read_packet): port = 5432
DEBUG(read_packet): recoveryPort = 7779
DEBUG(read_packet): hostName = test2
DEBUG(read_packet): pg_data = /data/pgdb
DEBUG(read_packet): receive packet no:9
DEBUG(read_packet): last master test1.adtest.co.jp - 5432
DEBUG(read_packet): last target test2 - 5432
DEBUG(PGRsend_queue): master test1.adtest.co.jp - 5432
DEBUG(PGRget_HostTbl): target test2 - 5432
DEBUG(PGRget_HostTbl): send_queue return status 0
DEBUG(PGRget_HostTbl): PGRsend_queue ok
DEBUG(PGRsem_unlock): wait replicate

- RP sts log
Thu Jun 10 11:15:19 2004  port(5432) host:test1.adtest.co.jp start use
Thu Jun 10 11:15:19 2004  port(5432) host:test2.adtest.co.jp start use
Thu Jun 10 11:15:19 2004  cascade(test1) port(8777) start use
Thu Jun 10 11:15:19 2004  cascade(test1) port(8777) become top
Thu Jun 10 11:15:19 2004  port(5432) host:test2.adtest.co.jp error
Thu Jun 10 11:17:29 2004  port(5432) host:test2 initialize  ← after issuing the recovery command
Thu Jun 10 11:17:29 2004  port(5432) host:test2.adtest.co.jp initialize
Thu Jun 10 11:18:34 2004  port(5432) host:test2 start use
Thu Jun 10 11:18:34 2004  port(5432) host:test2.adtest.co.jp start use

Synchronization and replication were confirmed.
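
The difference between the two runs is also visible in the sts logs: in run (1) the hosts end in `error` or `initialize`, while in run (2) they all end in `start use`. A small sketch for extracting the last recorded state per host follows. `last_host_status` is a hypothetical helper, and the three status strings are only those that appear in this post; other values may exist.

```python
import re
from typing import Dict, Iterable

# Host lines look like:
#   "Thu Jun 10 11:15:19 2004  port(5432) host:test2.adtest.co.jp error"
# Cascade lines ("cascade(test1) port(8777) ...") have no "host:" field
# after the port, so they do not match and are ignored.
STS_RE = re.compile(
    r"port\((?P<port>\d+)\)\s+host:(?P<host>\S+)\s+"
    r"(?P<status>start use|error|initialize)"
)


def last_host_status(lines: Iterable[str]) -> Dict[str, str]:
    """Return the most recent status recorded for each host name."""
    status: Dict[str, str] = {}
    for line in lines:
        m = STS_RE.search(line)
        if m:
            status[m["host"]] = m["status"]  # later lines overwrite earlier ones
    return status
```

A host whose final status is not `start use` would then be a candidate for closer inspection in the RP log.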

That is all. Thank you in advance for your help.
