[pgcluster: 340] 1.0.7RC2が上手く動いてくれません。

Kouji Hamada hamada @ hamlock.jp
2004年 6月 9日 (水) 19:15:23 JST


浜田と申します。長い文章になってしまってすみません。
今まで1.0.6bフルパッケージ版を使用していて正常稼動していたのですが
LOCK管理に問題があるという事だったので、今回から最新版の1.0.7RC2フ
ルパッケージ版をインストールして動かそうとしてるのですが、psql -l
を入力して結果が返ってくるまで10分程かかったりと、上手く動いてく
れません。


【構成及び設定】

クラスタサーバ:2台
レプリケーションサーバ:1台
PGClusterバージョン:1.0.7RC2フルパッケージ版
OSバージョン:RedHat7.3(3台ともOSインストールからやりました)

IPアドレスは以下のように振っています。
192.168.0.210 test0.aaa.jp(レプリケーションサーバ)
192.168.0.211 test1.aaa.jp(クラスタサーバ(マスタDB))
192.168.0.212 test2.aaa.jp(クラスタサーバ(2号機))


クラスタサーバ2台

/usr/local/pgsql/data/cluster.confの内容は以下の通りです。
クラスタサーバ2台同じ設定内容です。

<Replicate_Server_Info>
        <Host_Name> test0.aaa.jp </Host_Name>
        <Port> 8777 </Port>
        <Recovery_Port> 7778 </Recovery_Port>
</Replicate_Server_Info>
#<Replicate_Server_Info>
#       <Host_Name> replicate2.postgres.jp </Host_Name>
#       <Port> 8777 </Port>
#       <Recovery_Port> 7778 </Recovery_Port>
#</Replicate_Server_Info>
#<Replicate_Server_Info>
#       <Host_Name> replicate3.postgres.jp </Host_Name>
#       <Port> 8777 </Port>
#       <Recovery_Port> 7778 </Recovery_Port>
#</Replicate_Server_Info>
<Recovery_Port> 7779 </Recovery_Port>
<Rsync_Path> /usr/bin/rsync </Rsync_Path>
<Rsync_Option> ssh -1 </Rsync_Option>
<When_Stand_Alone> read_only  </When_Stand_Alone>
<Not_Replicate_Info>
        <DB_Name>     test_db      </DB_Name>
        <Table_Name>  log_table    </Table_Name>
</Not_Replicate_Info>


/usr/local/pgsql/data/pg_hda.confの内容は以下の通りです。
クラスタサーバ2台同じ設定内容です。
local   all         all                                             trust
host    all         all         127.0.0.1         255.255.255.255   trust
host    all         all         192.168.0.0       255.255.255.0     trust


/usr/local/pgsql/data/postgresql.confの内容は以下の通りです。
クラスタサーバ2台同じ設定内容です。
tcpip_socket = true
silent_mode = true


レプリケーションサーバの設定


/usr/local/pgsql/data/pgreplicate.confの内容は以下の通りです。
<Cluster_Server_Info>
    <Host_Name>   test1.aaa.jp  </Host_Name>
    <Port>        5432                </Port>
    <Recovery_Port>       7779        </Recovery_Port>
</Cluster_Server_Info>
<Cluster_Server_Info>
    <Host_Name>   test2.aaa.jp  </Host_Name>
    <Port>        5432                </Port>
    <Recovery_Port>       7779        </Recovery_Port>
</Cluster_Server_Info>
#<Cluster_Server_Info>
#    <Host_Name>   cluster3.postgres.jp   </Host_Name>
#    <Port>        5432                </Port>
#    <Recovery_Port>       7779       </Recovery_Port>
#</Cluster_Server_Info>
#<LoadBalance_Server_Info>
#       <Host_Name>   loadbalancer.postgres.jp  </Host_Name>
#       <Recovery_Port>       7780            </Recovery_Port>
#</LoadBalance_Server_Info>
#<Replicate_Server_Info>
#       <Host_Name> test0.aaa.jp </Host_Name>
#       <Port> 8777 </Port>
#       <Recovery_Port> 7778 </Recovery_Port>
#</Replicate_Server_Info>
<Replication_Port>    8777            </Replication_Port>
<Recovery_Port>       7778            </Recovery_Port>
<Response_Mode>       normal          </Response_Mode>


/usr/local/pgsql/data/pg_hda.confの内容は変更していません。
local   all         all                                             trust
host    all         all         127.0.0.1         255.255.255.255   trust
host    all         all         192.168.0.0       255.255.255.0     trust


/usr/local/pgsql/data/postgresql.confの内容は以下の通りです。
tcpip_socket = true
silent_mode = true


rsyncの鍵もサーバー間で交換済みですパスワード無しで相互にファイル交換
出来るか動作確認済みなのでFQDN形式による名前の解決は出来てるように見えます。
(rsync コマンドを叩いてテストしました。)


【発生した現象1】
1.0.6bの頃と同じようにレプリケーションサーバー、クラスタサーバ(マスタDB)
クラスタサーバ(2号機)の順に立ち上げると、レプリケーションサーバーの
pgreplicate.stsにエラーが出ます。
Tue Jun  9 14:11:46 2004  port(5432) host:test1.aaa.jp start use
Tue Jun  9 14:11:46 2004  port(5432) host:test2.aaa.jp start use
Tue Jun  9 14:11:46 2004  cascade(test0.aaa.jp) port(8777) start use
Tue Jun  9 14:11:46 2004  cascade(test0.aaa.jp) port(8777) become top
Tue Jun  9 14:11:46 2004  port(5432) host:test1.aaa.jp error
Tue Jun  9 14:11:46 2004  port(5432) host:test2.aaa.jp error
1.0.7RC2は立ち上げ方が違うんでしょうか。それとも設定方法に誤りがあるので
しょうか。


【発生した現象2】
【発生した現象1】のエラーを回避する為に、クラスタサーバ(マスタDB)
クラスタサーバ(2号機)、レプリケーションサーバーの順に立ち上げます。
レプリケーションサーバーのpgreplicate.stsもOKです。
Wed Jun  9 14:45:01 2004  port(5432) host:test1.aaa.jp start use
Wed Jun  9 14:45:01 2004  port(5432) host:test2.aaa.jp start use
Wed Jun  9 14:45:01 2004  cascade(test0.aaa.jp) port(8777) start use
Wed Jun  9 14:45:01 2004  cascade(test0.aaa.jp) port(8777) become top


マスターDBのコンソールでpsql -lと入力して結果が返ってくるまで10分程
掛かります。クラスタ2台とも同じ動作をします。

        List of databases
   Name    |  Owner   | Encoding
-----------+----------+----------
 template0 | postgres | UNICODE
 template1 | postgres | UNICODE
(2 rows)


レプリケーションサーバーのpgreplicate.logです。

Wed Jun  9 14:15:31 2004  PQsetdbLogin failed. close socket!!
Wed Jun  9 14:15:31 2004  PQsetdbLogin failed. close socket!!
Wed Jun  9 14:15:31 2004  PQsetdbLogin failed. close socket!!
Wed Jun  9 14:15:31 2004  PQsetdbLogin failed. close socket!!
Wed Jun  9 14:15:31 2004  PQsetdbLogin failed. close socket!!
Wed Jun  9 14:15:31 2004  dbPersistLogin  timeout
Wed Jun  9 14:15:31 2004  New Transaction but pgr_createConn failed
Wed Jun  9 14:15:31 2004  setTransactionTbl failed
Wed Jun  9 14:15:31 2004  PQsetdbLogin failed. close socket!!
Wed Jun  9 14:15:31 2004  PQsetdbLogin failed. close socket!!
Wed Jun  9 14:15:31 2004  PQsetdbLogin failed. close socket!!
Wed Jun  9 14:15:31 2004  PQsetdbLogin failed. close socket!!
Wed Jun  9 14:15:31 2004  PQsetdbLogin failed. close socket!!
Wed Jun  9 14:15:31 2004  dbPersistLogin  timeout
Wed Jun  9 14:15:31 2004  New Transaction but pgr_createConn failed
Wed Jun  9 14:15:31 2004  setTransactionTbl failed
Wed Jun  9 14:25:58 2004  session closed
Wed Jun  9 15:15:01 2004  session closed
Wed Jun  9 15:25:01 2004  session closed
Wed Jun  9 15:35:01 2004  session closed
Wed Jun  9 15:45:00 2004  session closed
Wed Jun  9 15:45:01 2004  session closed
Wed Jun  9 15:55:02 2004  session closed
Wed Jun  9 16:05:02 2004  session closed
Wed Jun  9 17:26:40 2004  session closed
Wed Jun  9 17:36:40 2004  session closed
Wed Jun  9 17:36:40 2004  session closed
Wed Jun  9 17:46:40 2004  session closed


[postgres @ test0 data]$ pgreplicate -D /usr/local/pgsql/data -W /usr/local/pgsql/
data -l -n -v
DEBUG(init_server_tbl): /usr/local/pgsql/data/pgreplicate.log open ok

DEBUG(init_server_tbl): /usr/local/pgsql/data/pgreplicate.sts open ok

DEBUG(init_server_tbl): PGR_Get_Conf_Data ok
DEBUG(init_server_tbl): LoadBalanceTbl allocate ok
DEBUG(init_server_tbl): CascadeTbl shmget ok
DEBUG(init_server_tbl): CascadeTbl shmat ok
DEBUG(init_server_tbl): CascadeInf shmget ok
DEBUG(init_server_tbl): CascadeInf shmat ok
DEBUG(init_server_tbl): CommitLog shmget ok
DEBUG(init_server_tbl): Commit_Log_Tbl shmat ok
DEBUG(init_server_tbl): Conf data read ok
DEBUG(init_server_tbl): HostTbl shmget ok
DEBUG(init_server_tbl): HostTbl shmat ok
DEBUG(write_log_file): LockWaitTbl shmget ok
DEBUG(write_log_file): LockWaitTbl shmat ok
DEBUG(replicate_main): replicate main 8777 port bind OK

DEBUG(PGRreplicate_packet_send): cmdSts=N

DEBUG(PGRreplicate_packet_send): cmdType=

DEBUG(PGRreplicate_packet_send): port=0

DEBUG(PGRreplicate_packet_send): pid=0

DEBUG(PGRreplicate_packet_send): except_host=

DEBUG(PGRreplicate_packet_send): from_host=test0.aaa.jp

DEBUG(PGRreplicate_packet_send): dbName=template1

DEBUG(PGRreplicate_packet_send): userName=postgres

DEBUG(PGRreplicate_packet_send): recieve sec=0

DEBUG(PGRreplicate_packet_send): recieve usec=0

DEBUG(PGRreplicate_packet_send): query_size=62

DEBUG(PGRreplicate_packet_send): query=SELECT PGR_SYSTEM_COMMAND_FUNCTION(1,'tes
t0.aaa.jp',8777,7778)

DEBUG(PGRreplicate_packet_send): useFlag[2]
DEBUG(PGRis_same_host): not same host:
DEBUG(is_need_response): sem_lock[1]
DEBUG(PGRreplicate_packet_send_each_server): except:0@ host:5432 @ test1.aaa.jp

DEBUG(PGRreplicate_packet_send_each_server): send replicate to:test1.aaa.jp

DEBUG(PGRsend_replicate_packet_to_server): host(test1.aaa.jp) : port(5432)
DEBUG(getTransactionTbl): not found in getTransactionTbl
DEBUG(pgr_createConn): PQsetdbLogin host[test1.aaa.jp] port[5432] db[template1]
user[postgres]
DEBUG(PGRrecovery_main): PGRrecovery_main bind port 7778
DEBUG(pgr_createConn): PQsetdbLogin ok!!
DEBUG(insertTransactionTbl): db:template1 port:5432 user:postgres host:test1.aaa
.jp query:SELECT PGR_SYSTEM_COMMAND_FUNCTION(1,'test0.aaa.jp',8777,7778)
DEBUG(insertTransactionTbl): sem_lock[2]
DEBUG(PGRsem_lock): sem_unlock[1]
DEBUG(getTransactionTbl): hit !! transaction tbl host test1.aaa.jp db:template1
pid:0
DEBUG(PGRis_same_host): not same host:
DEBUG(PGRreplicate_packet_send_each_server): except:0@ host:5432 @ test2.aaa.jp

DEBUG(PGRreplicate_packet_send_each_server): send replicate to:test2.aaa.jp

DEBUG(PGRsend_replicate_packet_to_server): host(test2.aaa.jp) : port(5432)
DEBUG(getTransactionTbl): not found in getTransactionTbl
DEBUG(pgr_createConn): PQsetdbLogin host[test2.aaa.jp] port[5432] db[template1]
user[postgres]
DEBUG(pgr_createConn): PQsetdbLogin ok!!
DEBUG(insertTransactionTbl): db:template1 port:5432 user:postgres host:test2.aaa
.jp query:SELECT PGR_SYSTEM_COMMAND_FUNCTION(1,'test0.aaa.jp',8777,7778)
DEBUG(insertTransactionTbl): sem_lock[3]
DEBUG(PGRsem_lock): sem_unlock[2]
DEBUG(getTransactionTbl): hit !! transaction tbl host test2.aaa.jp db:template1
pid:0
DEBUG(getTransactionTbl): sem_unlock[3]
DEBUG(PGRsem_unlock): PGRreplicate_packet_send end
DEBUG(PGRsem_unlock): wait replicate

DEBUG(PGRsem_unlock): replicate main: selected

DEBUG(replicate_loop): wait replicate

DEBUG(replicate_loop): replicate main: selected

DEBUG(replicate_loop): wait replicate

DEBUG(replicate_loop): replicate main: selected

DEBUG(replicate_loop): wait replicate

DEBUG(replicate_loop): replicate_loop selected

DEBUG(PGRread_packet): query size=31
DEBUG(PGRread_packet): read[31] query[SET client_encoding = 'UNICODE']
DEBUG(PGRread_packet): query :: SET client_encoding = 'UNICODE'

DEBUG(PGRreplicate_packet_send): cmdSts=Q

DEBUG(PGRreplicate_packet_send): cmdType=T

DEBUG(PGRreplicate_packet_send): port=5432

DEBUG(PGRreplicate_packet_send): pid=2481

DEBUG(PGRreplicate_packet_send): except_host=test1.aaa.jp

DEBUG(PGRreplicate_packet_send): from_host=test1.aaa.jp

DEBUG(PGRreplicate_packet_send): dbName=template1

DEBUG(PGRreplicate_packet_send): userName=postgres

DEBUG(PGRreplicate_packet_send): recieve sec=1086769000

DEBUG(PGRreplicate_packet_send): recieve usec=305849

DEBUG(PGRreplicate_packet_send): query_size=31

DEBUG(PGRreplicate_packet_send): query=SET client_encoding = 'UNICODE'

DEBUG(PGRreplicate_packet_send): useFlag[2]
DEBUG(PGRis_same_host): 5432 @ test1.aaa.jp return trigger
DEBUG(is_need_sync_time): sem_lock[1]
DEBUG(PGRreturn_result): PGRreturn_result[]
DEBUG(PGRreturn_result): 128 send
DEBUG(PGRreturn_result): status of PGRreturn_result[0]
DEBUG(PGRreturn_result): sem_lock[2]
DEBUG(PGRsem_lock): sem_unlock[1]
DEBUG(PGRreplicate_packet_send_each_server): except:5432 @ test1.aaa.jp host:5432@
test2.aaa.jp

DEBUG(PGRreplicate_packet_send_each_server): send replicate to:test2.aaa.jp

DEBUG(PGRsend_replicate_packet_to_server): host(test2.aaa.jp) : port(5432)
DEBUG(getTransactionTbl): not found in getTransactionTbl
DEBUG(pgr_createConn): PQsetdbLogin host[test2.aaa.jp] port[5432] db[template1]
user[postgres]
DEBUG(replicate_loop): replicate_loop selected

DEBUG(PGRread_packet): query size=31
DEBUG(PGRread_packet): read[31] query[SET client_encoding = 'UNICODE']
DEBUG(PGRread_packet): query :: SET client_encoding = 'UNICODE'

DEBUG(PGRreplicate_packet_send): cmdSts=Q

DEBUG(PGRreplicate_packet_send): cmdType=T

DEBUG(PGRreplicate_packet_send): port=5432

DEBUG(PGRreplicate_packet_send): pid=2483

DEBUG(PGRreplicate_packet_send): except_host=test1.aaa.jp

DEBUG(PGRreplicate_packet_send): from_host=test1.aaa.jp

DEBUG(PGRreplicate_packet_send): dbName=template1

DEBUG(PGRreplicate_packet_send): userName=postgres

DEBUG(PGRreplicate_packet_send): recieve sec=1086769000

DEBUG(PGRreplicate_packet_send): recieve usec=311010

DEBUG(PGRreplicate_packet_send): query_size=31

DEBUG(PGRreplicate_packet_send): query=SET client_encoding = 'UNICODE'

DEBUG(PGRreplicate_packet_send): useFlag[2]
DEBUG(PGRis_same_host): 5432 @ test1.aaa.jp return trigger
DEBUG(is_need_sync_time): sem_lock[1]
DEBUG(PGRreturn_result): PGRreturn_result[]
DEBUG(PGRreturn_result): 128 send
DEBUG(PGRreturn_result): status of PGRreturn_result[0]
DEBUG(PGRreturn_result): sem_lock[2]
DEBUG(replicate_loop): replicate_loop selected

DEBUG(PGRread_packet): query size=31
DEBUG(PGRread_packet): read[31] query[SET client_encoding = 'UNICODE']
DEBUG(PGRread_packet): query :: SET client_encoding = 'UNICODE'

DEBUG(PGRreplicate_packet_send): cmdSts=Q

DEBUG(PGRreplicate_packet_send): cmdType=T

DEBUG(PGRreplicate_packet_send): port=5432

DEBUG(PGRreplicate_packet_send): pid=2405

DEBUG(PGRreplicate_packet_send): except_host=test2.aaa.jp

DEBUG(PGRreplicate_packet_send): from_host=test2.aaa.jp

DEBUG(PGRreplicate_packet_send): dbName=template1

DEBUG(PGRreplicate_packet_send): userName=postgres

DEBUG(PGRreplicate_packet_send): recieve sec=1086769000

DEBUG(PGRreplicate_packet_send): recieve usec=313834

DEBUG(PGRreplicate_packet_send): query_size=31

DEBUG(PGRreplicate_packet_send): query=SET client_encoding = 'UNICODE'

DEBUG(PGRreplicate_packet_send): useFlag[2]
DEBUG(is_need_response): sem_lock[1]
DEBUG(replicate_loop): replicate main: selected

DEBUG(replicate_loop): wait replicate

DEBUG(replicate_loop): replicate_loop selected

DEBUG(PGRread_packet): query size=31
DEBUG(PGRread_packet): read[31] query[SET client_encoding = 'UNICODE']
DEBUG(PGRread_packet): query :: SET client_encoding = 'UNICODE'

DEBUG(PGRreplicate_packet_send): cmdSts=Q

DEBUG(PGRreplicate_packet_send): cmdType=T

DEBUG(PGRreplicate_packet_send): port=5432

DEBUG(PGRreplicate_packet_send): pid=2415

DEBUG(PGRreplicate_packet_send): except_host=test2.aaa.jp

DEBUG(PGRreplicate_packet_send): from_host=test2.aaa.jp

DEBUG(PGRreplicate_packet_send): dbName=template1

DEBUG(PGRreplicate_packet_send): userName=postgres

DEBUG(PGRreplicate_packet_send): recieve sec=1086769000

DEBUG(PGRreplicate_packet_send): recieve usec=321690

DEBUG(PGRreplicate_packet_send): query_size=31

DEBUG(PGRreplicate_packet_send): query=SET client_encoding = 'UNICODE'

DEBUG(PGRreplicate_packet_send): useFlag[2]
DEBUG(is_need_response): sem_lock[1]
DEBUG(replicate_loop): wait replicate

DEBUG(replicate_loop): wait replicate

DEBUG(replicate_loop): wait replicate

DEBUG(replicate_loop): wait replicate

DEBUG(replicate_loop): wait replicate

DEBUG(replicate_loop): wait replicate

DEBUG(replicate_loop): wait replicate

DEBUG(replicate_loop): wait replicate

DEBUG(replicate_loop): wait replicate

DEBUG(pgr_createConn): PQsetdbLogin ok!!
DEBUG(insertTransactionTbl): db:template1 port:5432 user:postgres host:test2.aaa
.jp query:SET client_encoding = 'UNICODE'
DEBUG(insertTransactionTbl): sync_command(SELECT PGR_SYSTEM_COMMAND_FUNCTION(3,1
086769000,305849) )
DEBUG(insertTransactionTbl): PQexec send :SET client_encoding = 'UNICODE'
DEBUG(insertTransactionTbl): PQexec end
DEBUG(getTransactionTbl): hit !! transaction tbl host test2.aaa.jp db:template1
pid:2481
DEBUG(getTransactionTbl): sem_lock[3]
DEBUG(PGRsem_lock): sem_unlock[2]
DEBUG(getTransactionTbl): sem_unlock[3]
DEBUG(PGRsem_unlock): PGRreplicate_packet_send end
DEBUG(PGRsem_unlock): replicate_loop selected

DEBUG(PGRread_packet): connection closed
ERROR(PGRread_packet): session closed
DEBUG(PGRsem_lock): sem_unlock[1]
DEBUG(PGRreplicate_packet_send_each_server): except:5432 @ test1.aaa.jp host:5432@
test2.aaa.jp

DEBUG(PGRreplicate_packet_send_each_server): send replicate to:test2.aaa.jp

DEBUG(PGRsend_replicate_packet_to_server): host(test2.aaa.jp) : port(5432)
DEBUG(getTransactionTbl): not found in getTransactionTbl
DEBUG(pgr_createConn): PQsetdbLogin host[test2.aaa.jp] port[5432] db[template1]
user[postgres]
DEBUG(child_wait): replicate main: selected

DEBUG(PGRreplicate_packet_send_each_server): except:5432 @ test2.aaa.jp host:5432@
test1.aaa.jp

DEBUG(PGRreplicate_packet_send_each_server): send replicate to:test1.aaa.jp

DEBUG(PGRsend_replicate_packet_to_server): host(test1.aaa.jp) : port(5432)
DEBUG(getTransactionTbl): not found in getTransactionTbl
DEBUG(pgr_createConn): PQsetdbLogin host[test1.aaa.jp] port[5432] db[template1]
user[postgres]
DEBUG(replicate_loop): wait replicate

DEBUG(replicate_loop): replicate main: selected

DEBUG(replicate_loop): wait replicate

DEBUG(replicate_loop): replicate_loop selected

この後も同じようなログがずっと繰り返し続いています。
この時点でも十分長いですが、これ以下は意図的にカットしました。



現時点ではpostgresqlのチューニング等一切行なっておりません。
各種設定ファイルは1.0.6bと同じような感じで設定しました。
1.0.7RC2になって新たに設定する項目などありましたら教えて下さい。

よろしくお願いします。





pgcluster メーリングリストの案内