[pgcluster: 254] レプリケーションがうまく動作しません。
miyazawa makoto
miyazawa_pgcluster_mailling @ yahoo.co.jp
2004年 4月 19日 (月) 18:46:47 JST
現在、Solaris8(OS5.8)2台でPGClusterの検証を行っておりま
す、宮澤と申します。
さて、ご相談させていただきたい現象なのですが
レプリケーションサーバとクラスタDBの連動がうまくできてい
ません。
いろいろ試してみたのですが、どうにもわからず皆様のお知恵
をお借りできたらと
思って投稿させていただております。
長文で申し訳ありませんが、よろしくお願いいたします。
以下に環境等の情報を挙げます。
------------------------------------------------------
実機環境:Solaris8(sparc、OS5.8) 2台
サーバ構成:
PGClusterバージョン:PGReplicate version [1.0.6cv9]
0系…レプリケーションサーバ+クラスタサーバ(マスタ)
1系…クラスタサーバ(スレーブ)
ロードバランサは未インストール、未設定
(1)インストール方法
PGClusterの設定方法、rsyncの設定は
http://www.csra.co.jp/~mitani/jpug/pgcluster/stable/index.html
の設定方法を参照。
お互いのサーバでrsyncを実行し、ファイルが転送されること
を確認。
(2)設定ファイル
設定ファイルは以下のように記述
0系
pgreplicate.conf
クラスタサーバ名を追記(/etc/hostsで解決できるホス
ト名、0系をマスタ、1系をスレーブとした)
それ以外はdefault
<Cluster_Server_Info>
<Host_Name> test1 </Host_Name>
<Port> 5432 </Port>
<Recovery_Port> 7779 </Recovery_Port>
</Cluster_Server_Info>
<Cluster_Server_Info>
<Host_Name> test2 </Host_Name>
<Port> 5432 </Port>
<Recovery_Port> 7779 </Recovery_Port>
</Cluster_Server_Info>
cluster.conf
レプリケーションサーバを追記(/etc/hostsで解決でき
るホスト名。0系のホスト名を指定)
それ以外はdefault
<Replicate_Server_Info>
<Host_Name> test1 </Host_Name>
<Port> 8777 </Port>
<Recovery_Port> 7778 </Recovery_Port>
</Replicate_Server_Info>
〜
<Recovery_Port> 7779 </Recovery_Port>
<Rsync_Path> /usr/local/bin/rsync </Rsync_Path>
<Rsync_Option> ssh -1 </Rsync_Option>
<When_Stand_Alone> read_write </When_Stand_Alone>
pg_hba.conf
ネットワークアドレス指定で同一セグメント内からのア
クセスを許可
# TYPE DATABASE USER IP-ADDRESS
IP-MASK METHOD
local all all
trust
host all all 127.0.0.1
255.255.255.255 trust
host all all xx.xx.xx.0
255.255.255.0 trust
1系
cluster.conf
レプリケーションサーバを追記(/etc/hostsで解決でき
るホスト名。0系のホスト名を指定)
それ以外はdefault
<Replicate_Server_Info>
<Host_Name> test1 </Host_Name>
<Port> 8777 </Port>
<Recovery_Port> 7778 </Recovery_Port>
</Replicate_Server_Info>
〜
<Recovery_Port> 7779 </Recovery_Port>
<Rsync_Path> /usr/local/bin/rsync </Rsync_Path>
<Rsync_Option> ssh -1 </Rsync_Option>
<When_Stand_Alone> read_write </When_Stand_Alone>
pg_hba.conf
ネットワークアドレス指定で同一セグメント内からのア
クセスを許可
# TYPE DATABASE USER IP-ADDRESS
IP-MASK METHOD
local all all
trust
host all all 127.0.0.1
255.255.255.255 trust
host all all xx.xx.xx.0
255.255.255.0 trust
(3)動作確認手順
(1) 1系のクラスタDBを以下のコマンドで起動した。
/usr/local/pgsql/bin/pg_ctl start -D
/usr/local/pgsql/data -o "-i"
(2) 0系のレプリケーションサーバを以下のコマンドで起動
した。
/usr/local/pgsql/bin/pgreplicate -D
/usr/local/pgsql/etc -nv
この時点での、レプリケーションのログは以下の通り。
$ /usr/local/pgsql/bin/pgreplicate -D
/usr/local/pgsql/etc -nv
DEBUG(init_server_tbl):
/usr/local/pgsql/etc/pgreplicate.log open ok
DEBUG(init_server_tbl): PGR_Get_Conf_Data ok
DEBUG(init_server_tbl): LoadBalanceTbl allocate ok
DEBUG(init_server_tbl): Conf data read ok
DEBUG(init_server_tbl): HostTbl shmget ok
DEBUG(init_server_tbl): HostTbl shmat ok
DEBUG(pgr_set_log): LockWaitTbl shmget ok
DEBUG(pgr_set_log): LockWaitTbl shmat ok
DEBUG(PGRrecovery_main): PGRrecovery_main bind port
7778
DEBUG(PGRrecovery_main): wait recovery
DEBUG(replicate_main): replicate main 8777 port bind
OK
DEBUG(replicate_packet_send): cmdSts=N
DEBUG(replicate_packet_send): cmdType=
DEBUG(replicate_packet_send): port=0
DEBUG(replicate_packet_send): pid=0
DEBUG(replicate_packet_send): except_host=
DEBUG(replicate_packet_send): from_host=test1
DEBUG(replicate_packet_send): dbName=template1
DEBUG(replicate_packet_send): userName=postgres
DEBUG(replicate_packet_send): recieve sec=0
DEBUG(replicate_packet_send): recieve usec=0
DEBUG(replicate_packet_send): query_size=54
DEBUG(replicate_packet_send): query=SELECT
PGR_SYSTEM_COMMAND_FUNCTION(1,test1,8777,7778)
DEBUG(replicate_packet_send): useFlag[2]
DEBUG(PGRis_same_host): not same host:
DEBUG(is_need_response): same_host[0] mode[1]
current[0]
DEBUG(is_need_response): sem_lock[1]
DEBUG(replicate_packet_send_each_server): except:0@
host:5432 @ test1
DEBUG(replicate_packet_send_each_server): send
replicate to:test1
DEBUG(PGRsend_replicate_packet_to_server): host(test1)
: port(5432)
DEBUG(getDBServerTbl): search host is (test1)
DEBUG(get_ip_by_name): not found
DEBUG(setDBServerTbl): host:test1 dbName:template1
DEBUG(getDBServerTbl): search host is (test1)
DEBUG(get_ip_by_name): not found
ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
ERROR(pgr_createConn): dbPersistLogin timeout
ERROR(pgr_createConn): pgr_createConn failed
ERROR(pgr_createConn): setDBServerTbl failed
DEBUG(pgr_createConn): sem_unlock[1]
DEBUG(getDBServerTbl): search host is (test1)
DEBUG(get_ip_by_name): not found
DEBUG(PGRis_same_host): not same host:
DEBUG(is_need_response): same_host[0] mode[1]
current[1]
DEBUG(is_need_response): sem_lock[2]
DEBUG(replicate_packet_send_each_server): except:0@
host:5432 @ test2
DEBUG(replicate_packet_send_each_server): send
replicate to:test2
DEBUG(PGRsend_replicate_packet_to_server): host(test2)
: port(5432)
DEBUG(getDBServerTbl): search host is (test2)
DEBUG(get_ip_by_name): not found
DEBUG(setDBServerTbl): host:test2 dbName:template1
DEBUG(getDBServerTbl): search host is (test2)
DEBUG(get_ip_by_name): not found
DEBUG(pgr_createConn): PQsetdbLogin ok!!
DEBUG(get_ip_by_name): send_replicate_packet_to_server
query=SELECT
PGR_SYSTEM_COMMAND_FUNCTION(1,test1,8777,7778)
DEBUG(get_ip_by_name): db:template1 port:5432
user:postgres host:test2 query:SELECT
PGR_SYSTEM_COMMAND_FUNCTION(1,test1,8777,7778)
ERROR(get_ip_by_name): PQexec error
DEBUG(get_ip_by_name): sem_unlock[2]
DEBUG(getDBServerTbl): search host is (test2)
DEBUG(get_ip_by_name): search
host(1112880650):port(5432):db(template1)
DEBUG(get_ip_by_name): found
DEBUG(get_ip_by_name): replicate_packet_send end
DEBUG(get_ip_by_name): wait replicate
(3) 0系のクラスタDBを以下のコマンドで起動した。
/usr/local/pgsql/bin/pg_ctl start -D
/usr/local/pgsql/data -o "-i"
(4) この状態で以下を確認
0系
/usr/local/pgsql/bin/psql -h test2 -l
List of databases
Name | Owner | Encoding
-----------+----------+----------
template0 | postgres | EUC_JP
template1 | postgres | EUC_JP
(2 rows)
1系
/usr/local/pgsql/bin/psql -h test1 -l
List of databases
Name | Owner | Encoding
-----------+----------+----------
template0 | postgres | EUC_JP
template1 | postgres | EUC_JP
(2 rows)
(5) 0系で以下を実行
/usr/local/pgsql/bin/psql -U postgres template1
すると、レプリケーションログで以下のメッセージを
出力
DEBUG(replicate_loop): replicate_loop selected
DEBUG(read_packet): query size=5
DEBUG(read_packet): read[5] query[BEGIN]
DEBUG(read_packet): query :: BEGIN
DEBUG(replicate_packet_send): cmdSts=T
DEBUG(replicate_packet_send): cmdType=B
DEBUG(replicate_packet_send): port=5432
DEBUG(replicate_packet_send): pid=5371
DEBUG(replicate_packet_send): except_host=test1
DEBUG(replicate_packet_send): from_host=test1
DEBUG(replicate_packet_send): dbName=template1
DEBUG(replicate_packet_send): userName=postgres
DEBUG(replicate_packet_send): recieve sec=1082365158
DEBUG(replicate_packet_send): recieve usec=534535
DEBUG(replicate_packet_send): query_size=5
DEBUG(replicate_packet_send): query=BEGIN
DEBUG(replicate_packet_send): useFlag[2]
DEBUG(get_ip_by_name): same host:5432 @ 4155360a -
5432 @ 4155360a
DEBUG(get_ip_by_name): 5432 @ test1 return trigger
DEBUG(PGRis_need_sync_time): sem_lock[1]
DEBUG(return_result): 128[3,1082365158,534535]
DEBUG(return_result):
return_result[3,1082365158,534535]
DEBUG(return_result): 128 send
DEBUG(return_result): wait for answer
DEBUG(return_result): read_answer selected
DEBUG(read_packet): query size=25
DEBUG(read_packet): read[25]
query[PGR_QUERY_DONE_NOTICE_CMD]
DEBUG(read_packet): answer[PGR_QUERY_DONE_NOTICE_CMD]
DEBUG(read_packet): QUERY DONE
DEBUG(read_packet): status of return_result[0]
DEBUG(read_packet): sem_unlock[1]
DEBUG(get_ip_by_name): not same host:5432 @ 4155360a -
5432 @ 4255360a
DEBUG(get_ip_by_name): sem_lock[2]
DEBUG(replicate_packet_send_each_server):
except:5432 @ test1 host:5432 @ test2
DEBUG(replicate_packet_send_each_server): send
replicate to:test2
DEBUG(PGRsend_replicate_packet_to_server): host(test2)
: port(5432)
DEBUG(get_ip_by_name): not found in transaction tbl
host test1 db:template1 pid:5371
DEBUG(get_ip_by_name): not found in getTransactionTbl
DEBUG(get_ip_by_name): not found in transaction tbl
host test1 db:template1 pid:5371
DEBUG(replicate_loop): wait replicate
DEBUG(pgr_createConn): PQsetdbLogin ok!!
DEBUG(insertTransactionTbl): db:template1 port:5432
user:postgres host:test2 query:BEGIN
DEBUG(insertTransactionTbl): sync_command(SELECT
PGR_SYSTEM_COMMAND_FUNCTION(3,1082365158,534535) )
DEBUG(insertTransactionTbl): PQexec send :BEGIN
DEBUG(insertTransactionTbl): PQstatus(0)
DEBUG(child_wait): replicate main: selected
DEBUG(replicate_loop): replicate_loop selected
DEBUG(read_packet): query size=54
DEBUG(read_packet): read[54] query[SELECT
PGR_SYSTEM_COMMAND_FUNCTION(2,test1,8777,7778)]
DEBUG(read_packet): query :: SELECT
PGR_SYSTEM_COMMAND_FUNCTION(2,test1,8777,7778)
DEBUG(replicate_packet_send): cmdSts=N
DEBUG(replicate_packet_send): cmdType=
DEBUG(replicate_packet_send): port=4194305
DEBUG(replicate_packet_send): pid=1
DEBUG(replicate_packet_send): except_host=
DEBUG(replicate_packet_send): from_host=
DEBUG(replicate_packet_send): dbName=
DEBUG(replicate_packet_send): userName=
DEBUG(replicate_packet_send): recieve sec=1082365158
DEBUG(replicate_packet_send): recieve usec=785650
DEBUG(replicate_packet_send): query_size=54
DEBUG(replicate_packet_send): query=SELECT
PGR_SYSTEM_COMMAND_FUNCTION(2,test1,8777,7778)
DEBUG(replicate_packet_send): useFlag[2]
DEBUG(PGRis_same_host): not same host:
DEBUG(is_need_response): same_host[0] mode[1]
current[0]
DEBUG(is_need_response): sem_lock[1]
DEBUG(replicate_packet_send_each_server):
except:4194305@ host:5432 @ test1
DEBUG(replicate_packet_send_each_server): send
replicate to:test1
DEBUG(PGRsend_replicate_packet_to_server): host(test1)
: port(5432)
DEBUG(getDBServerTbl): search host is (test1)
DEBUG(get_ip_by_name): not found
DEBUG(setDBServerTbl): host:test1 dbName:
DEBUG(getDBServerTbl): search host is (test1)
DEBUG(get_ip_by_name): not found
ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
DEBUG(replicate_loop): wait replicate
ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
ERROR(pgr_createConn): dbPersistLogin timeout
ERROR(pgr_createConn): pgr_createConn failed
ERROR(pgr_createConn): setDBServerTbl failed
DEBUG(pgr_createConn): sem_unlock[1]
DEBUG(PGRis_same_host): not same host:
DEBUG(is_need_response): same_host[0] mode[1]
current[1]
DEBUG(is_need_response): sem_lock[2]
0系のクラスタサーバ起動を行ったTeratermに以下のメッ
セージが表示される。
FATAL: no PostgreSQL user name specified in startup
packet
FATAL: no PostgreSQL user name specified in startup
packet
FATAL: no PostgreSQL user name specified in startup
packet
FATAL: no PostgreSQL user name specified in startup
packet
FATAL: no PostgreSQL user name specified in startup
packet
FATAL: no PostgreSQL user name specified in startup
packet
しばらくコマンドを受け付けない。
また、/usr/local/pgsql/data 配下にcoreファイルを出力
1系のクラスタサーバ起動を行ったTeratermに以下のメッ
セージ
$ LOG: pq_recvbuf: unexpected EOF on client
connection
(6)0系のクラスタサーバ起動を行ったTeratermで以下のコ
マンドを実行。
template1=# select * from shinamono;
hinmei | nedan
--------------+-------
rentalserver | 3000
(1 row)
#insert into shinamono values('only-shar',30000);
(しばらく無反応)
template1=# select * from shinamono;
hinmei | nedan
--------------+-------
rentalserver | 3000
only-shar | 30000
(2 rows)
(7) 1系で以下を実行
/usr/local/pgsql/bin/psql -U postgres template1
(しばらく無反応)
すると、レプリケーションログで以下のメッセージを
出力
DEBUG(replicate_loop): replicate main: selected
DEBUG(replicate_loop): replicate_loop selected
DEBUG(read_packet): query size=5
DEBUG(read_packet): read[5] query[BEGIN]
DEBUG(read_packet): query :: BEGIN
DEBUG(replicate_packet_send): cmdSts=T
DEBUG(replicate_packet_send): cmdType=B
DEBUG(replicate_packet_send): port=5432
DEBUG(replicate_packet_send): pid=762
DEBUG(replicate_packet_send): except_host=test2
DEBUG(replicate_packet_send): from_host=test2
DEBUG(replicate_packet_send): dbName=template1
DEBUG(replicate_packet_send): userName=postgres
DEBUG(replicate_packet_send): recieve sec=1082366639
DEBUG(replicate_packet_send): recieve usec=855403
DEBUG(replicate_packet_send): query_size=5
DEBUG(replicate_packet_send): query=BEGIN
DEBUG(replicate_packet_send): useFlag[2]
DEBUG(get_ip_by_name): not same host:5432 @ 4255360a -
5432 @ 4155360a
DEBUG(is_need_response): same_host[0] mode[1]
current[0]
DEBUG(is_need_response): sem_lock[1]
DEBUG(replicate_packet_send_each_server):
except:5432 @ test2 host:5432 @ test1
DEBUG(replicate_packet_send_each_server): send
replicate to:test1
DEBUG(PGRsend_replicate_packet_to_server): host(test1)
: port(5432)
DEBUG(get_ip_by_name): not found in transaction tbl
host test2 db:template1 pid:762
DEBUG(get_ip_by_name): not found in getTransactionTbl
DEBUG(get_ip_by_name): not found in transaction tbl
host test2 db:template1 pid:762
DEBUG(pgr_createConn): PQsetdbLogin ok!!
DEBUG(insertTransactionTbl): db:template1 port:5432
user:postgres host:test1 query:BEGIN
DEBUG(insertTransactionTbl): sync_command(SELECT
PGR_SYSTEM_COMMAND_FUNCTION(3,1082366639,855403) )
DEBUG(insertTransactionTbl): PQexec send :BEGIN
DEBUG(insertTransactionTbl): PQstatus(0)
DEBUG(replicate_loop): wait replicate
DEBUG(replicate_loop): replicate main: selected
DEBUG(replicate_loop): replicate_loop selected
DEBUG(read_packet): query size=54
DEBUG(read_packet): read[54] query[SELECT
PGR_SYSTEM_COMMAND_FUNCTION(2,test1,8777,7779)]
DEBUG(read_packet): query :: SELECT
PGR_SYSTEM_COMMAND_FUNCTION(2,test1,8777,7779)
DEBUG(replicate_packet_send): cmdSts=N
DEBUG(replicate_packet_send): cmdType=
DEBUG(replicate_packet_send): port=4194305
DEBUG(replicate_packet_send): pid=1
DEBUG(replicate_packet_send): except_host=
DEBUG(replicate_packet_send): from_host=
DEBUG(replicate_packet_send): dbName=
DEBUG(replicate_packet_send): userName=
DEBUG(replicate_packet_send): recieve sec=1082366640
DEBUG(replicate_packet_send): recieve usec=114786
DEBUG(replicate_packet_send): query_size=54
DEBUG(replicate_packet_send): query=SELECT
PGR_SYSTEM_COMMAND_FUNCTION(2,test1,8777,7779)
DEBUG(replicate_packet_send): useFlag[2]
DEBUG(PGRis_same_host): not same host:
DEBUG(is_need_response): same_host[0] mode[1]
current[0]
DEBUG(is_need_response): sem_lock[1]
DEBUG(replicate_loop): wait replicate
0系のクラスタサーバ起動を行ったTeratermに以下のメッ
セージ
$ LOG: pq_recvbuf: unexpected EOF on client
connection
(8)1系のクラスタサーバ起動を行ったTeratermで以下のコ
マンドを実行。
template1=# select * from shinamono;
hinmei | nedan
--------------+-------
rentalserver | 3000
(1 row)
それでお聞きしたいことですが、以下の通りです。
(1)0系でinsertしたレコードが1系でレプリケーションされな
い根本的な原因はなんでしょうか?
(2)動作確認手順(2)、(5)でのログなのですが
DEBUG(getDBServerTbl): search host is (test1)
DEBUG(get_ip_by_name): not found
DEBUG(setDBServerTbl): host:test1 dbName:template1
DEBUG(getDBServerTbl): search host is (test1)
DEBUG(get_ip_by_name): not found
ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
ERROR(pgr_createConn): dbPersistLogin timeout
ERROR(pgr_createConn): pgr_createConn failed
ERROR(pgr_createConn): setDBServerTbl failed
DEBUG(pgr_createConn): sem_unlock[1]
DEBUG(getDBServerTbl): search host is (test1)
DEBUG(get_ip_by_name): not found
とありますように、どうもホストの検索で失敗しているよ
うに見えます。
/etc/hostsに自分、相手のホスト名を追記するだけで問題
ないとおもっていたのですが
なにか設定する必要があるのでしょうか?
(3)動作確認手順(5),(7)でエラーメッセージ
LOG: pq_recvbuf: unexpected EOF on client connection
が表示される原因がよくわかりません。
なんでもかんでも、聞いているようで恐縮なのですが
なにでもかまわないのでよろしくお願い致します。
以上
__________________________________________________
Do You Yahoo!?
http://bb.yahoo.co.jp/
pgcluster メーリングリストの案内