[pgcluster: 977] Re: In failover testing, pglb goes down or loses track of (?) the clusters

Wed, 4 Apr 2007 23:03:20 JST

Hello,

The load balancer does not support SSL.
Are you, by any chance, connecting over SSL?
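
For reference, the quoted pglb log contains a startup packet reported as "Protocol Major: 1234 Minor: 5679", followed by "SSLRequest: sent N". In the PostgreSQL frontend/backend protocol the SSLRequest code is 80877103, which is 1234 in the upper 16 bits and 5679 in the lower 16 bits, so that entry looks like a client asking for SSL, being refused, and retrying with protocol 3.0. Below is a minimal Python sketch of that decoding. The function names are made up for illustration and are not part of pglb or libpq.

```python
import struct

# SSLRequest code from the PostgreSQL protocol: 1234 in the high 16 bits,
# 5679 in the low 16 bits.
SSL_REQUEST_CODE = (1234 << 16) | 5679  # 80877103


def split_code(code):
    """Split a 32-bit startup code into (major, minor) 16-bit halves."""
    return code >> 16, code & 0xFFFF


def classify_startup_code(first_four_bytes):
    """Name the request type from the 4 bytes that follow the length field."""
    (code,) = struct.unpack("!I", first_four_bytes)  # network byte order
    major, minor = split_code(code)
    if code == SSL_REQUEST_CODE:
        return "SSLRequest"
    return "StartupMessage protocol %d.%d" % (major, minor)
```

A libpq client built with OpenSSL support normally tries SSL first, which would be consistent with what the log shows. Whether this has anything to do with the failover symptoms is not established by the log alone.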

Mitani @ the Netherlands


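> Good evening. This is my first post; my name is Nakano.
> We would like to use PGCluster in our in-house environment and are
> currently testing it, but in the cluster failover tests the load
> balancer does not behave as expected.
> I suspect our configuration is wrong. Could anyone give me some advice?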
>
> Environment)
> OS: RedHat Enterprise Linux WS4
> PGCluster Ver.:1.5.0rc16
>
> Options used at configure time)
> $ configure --enable-thread-safety --enable-nls=ja
> --enable-multibyte=UNICODE --with-perl --with-python --with-tcl
> --with-openssl
>
> Three hosts are used: one runs pglb, one runs pgreplicate, and the
> remaining one runs two cluster servers.
>    hostA             hostB            hostC
> ------------   ----------------   -------------
> |          |   |              |   |           |
> |          |   | ------------ |   |           |
> |          |---| |postmaster| |---|           |
> |   pglb   |   | ------------ |   |pgreplicate|
> |(port5432)|   |  (port5440)  |   |           |
> |          |   |              |   |           |
> |          |   | ------------ |   |           |
> |          |---| |postmaster| |---|           |
> |          |   | ------------ |   |           |
> |          |   |  (port5441)  |   |           |
> |          |   |              |   |           |
> ------------   ----------------   -------------
>
> Startup procedure)
> Start in this order: the two clusters -> replicator -> balancer.
>   hostB% pg_ctl start -D /usr/local/pgsql/data
>   hostB% pg_ctl start -D /usr/local/pgsql/data2
>   hostC$ pgreplicate -D /usr/local/pgsql/etc -l
>   hostA# pglb -D /usr/local/pgsql/etc -n -v -l
>
> Connect to pglb on hostA with psql and confirm that reads and updates
> work normally.
>
>
> Symptom 1)
> With connection pooling turned on in pglb, if I bring down the cluster
> server listed first in pglb.conf/pgreplicate.conf (5440@hostB), pglb on
> hostA goes down a few minutes later.
> pglb log (debug mode):
> 2007-04-03 21:08:49 [27163] DEBUG:PGRset_status_on_cluster_tbl():host:hostB port:5440 max:2 use:0 status1
> 2007-04-03 21:08:49 [27163] DEBUG:PGRset_status_on_cluster_tbl():host:hostB port:5441 max:2 use:0 status1
> 2007-04-03 21:08:49 [27163] DEBUG:init_pglb():Child_Tbl size is[144]
> 2007-04-03 21:08:49 [27163] DEBUG:PGRcreate_child():create child [5440@hostB]
> 2007-04-03 21:08:49 [27163] DEBUG:PGRcreate_child():create child [5440@hostB]
> 2007-04-03 21:08:49 [27163] DEBUG:PGRcreate_child():create child [5441@hostB]
> 2007-04-03 21:08:49 [27163] DEBUG:PGRcreate_child():create child [5441@hostB]
>         --- at this point, ran "pg_ctl stop -D /usr/local/pgsql/data -m i" against 5440@hostB
> 2007-04-03 21:09:56 [27164] DEBUG:set_recovery():received no:101
> 2007-04-03 21:09:56 [27164] DEBUG:PGRset_status_on_cluster_tbl():host:hp8193 port:5440 max:2 use:1 status99
> 2007-04-03 21:09:56 [27163] ERROR:scan_cluster_by_pid():pid:27164 not found in child table
> 2007-04-03 21:09:56 [27163] ERROR:scan_cluster_by_pid():pid:27166 not found in child table
> ---- pglb goes down here.
> pgreplicate DEBUG log at that time:
> 2007-04-03 21:09:56 [9807] ERROR:PGRcreateConn():Retry. h_errno is 0,reason is 'could not connect to server: Connection refused
>         Is the server running on host "172.xx.xx.xxx" and accepting
>         TCP/IP connections on port 5440?
> '
> 2007-04-03 21:09:56 [9807] ERROR:PGRcreateConn():Retry. h_errno is 0,reason is 'could not connect to server: Connection refused
>         Is the server running on host "172.xx.xx.xxx" and accepting
>         TCP/IP connections on port 5440?
> '
> 2007-04-03 21:09:56 [9807] DEBUG:PGRsend_load_balance_packet():host[hostA] port[6001]
>
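> After restarting pglb, starting 5440@hostB in recovery mode
> synchronizes it via rsync, after which normal operation is possible.
> When the second cluster (5441@hostB) is brought down, the balancer does
> not go down.
> I would like the balancer to stay up when the first cluster goes down,
> so that operation continues on the second cluster.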
>
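When comparing runs, it can help to pull the per-cluster status entries out of the pglb debug log mechanically. The following is a rough Python sketch that extracts the fields of `PGRset_status_on_cluster_tbl()` entries, tolerating the mail line wraps and leading ">" quote marks. The names `ClusterStatus` and `parse_status_lines` are hypothetical, and the meaning of the numeric status values (1, 98, 99) is not stated in the post, so the sketch only reports them.

```python
import re
from typing import NamedTuple


class ClusterStatus(NamedTuple):
    time: str
    pid: int
    host: str
    port: int
    max_conn: int
    in_use: int
    status: int


STATUS_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\d+)\] "
    r"DEBUG:PGRset_status_on_cluster_tbl\(\):host:(\S+) port:(\d+) "
    r"max:(\d+) use:(\d+) status(\d+)"
)


def parse_status_lines(log_text):
    """Return one ClusterStatus per status entry found in log_text."""
    # Drop mail quote marks and collapse wraps into single spaces so that
    # entries broken across several lines still match the pattern.
    flat = " ".join(
        part
        for line in log_text.splitlines()
        for part in line.lstrip("> ").split()
    )
    return [
        ClusterStatus(m[1], int(m[2]), m[3], int(m[4]),
                      int(m[5]), int(m[6]), int(m[7]))
        for m in STATUS_RE.finditer(flat)
    ]
```

Feeding it the pglb log above yields one record per status change, which makes it easier to see which process reported which status for which host and port.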
>
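> Symptom 2)
> With connection pooling turned off in pglb, if I bring down the cluster
> server listed first in pglb.conf/pgreplicate.conf (5440@hostB), as in
> symptom 1), and then connect to pglb with psql, the following error
> appears and operation cannot continue.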
> psql: server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> Running pglb in debug mode shows that, even after it has recognized
> that the cluster server went down, it always seems to access only the
> same cluster (5440@hostB).
> 2007-04-03 20:10:28 [7957] DEBUG:PGRscan_cluster:0 ClusterDB can be used
> 2007-04-03 20:10:28 [7957] DEBUG:PGRscan_cluster:hostB [5440],useFlag->2 max->2 use_num->0
> 2007-04-03 20:10:28 [8061] DEBUG:PGRdo_child():I am 8061
> 2007-04-03 20:10:28 [8061] DEBUG:do_accept():I am 8061 accept fd 6
> 2007-04-03 20:10:28 [8061] DEBUG:read_startup_packet():Protocol Major: 1234 Minor: 5679 database:  user:
> 2007-04-03 20:10:28 [8061] DEBUG:PGRdo_child():SSLRequest: sent N; retry startup
> 2007-04-04 20:10:28 [8061] DEBUG:read_startup_packet():Protocol Major: 3 Minor: 0 database: master user: postgres
> 2007-04-03 20:10:28 [8061] ERROR:connect_inet_domain_socket(): connect() failed: Connection refused
> 2007-04-03 20:10:28 [8061] DEBUG:PGRset_status_on_cluster_tbl():host:hostB port:5440 max:2 use:2 status98
> 2007-04-03 20:10:40 [7957] ERROR:load_balance_main():all clusters were dead.
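> Looking closely at the log, messages such as
> "0 ClusterDB can be used" and "all clusters were dead."
> suggest that pglb mistakenly believes all clusters are down.
> (I confirmed that the second cluster server is alive by connecting to
> it directly with psql.)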
>
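> Suspecting that running multiple clusters on the same host was the
> cause, I set up a new host, hostD, and moved the second cluster there,
> but both cases still show the same symptoms.
> Using short host names (no domain) or FQDNs in the conf files makes no
> difference either.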
>
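The manual psql check described above can also be scripted when repeating the failover test. This is a small Python sketch that only tests whether each backend port accepts a TCP connection; it does not speak the PostgreSQL protocol, so it is a weaker check than connecting with psql. The helper name `is_accepting` is an assumption for illustration; the host names and ports in the usage note are the ones from the post.

```python
import socket


def is_accepting(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For the setup in this thread one would call, for example, `is_accepting("hostB", 5440)` and `is_accepting("hostB", 5441)` right after stopping the first cluster, and compare the result with what pglb reports in its log.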
>
> pglb.conf on hostA ---
> <Cluster_Server_Info>
>     <Host_Name>   hostB </Host_Name>
>     <Port>        5440  </Port>
>     <Max_Connect> 2     </Max_Connect>
> </Cluster_Server_Info>
> <Cluster_Server_Info>
>     <Host_Name>   hostB </Host_Name>
>     <Port>        5441  </Port>
>     <Max_Connect> 2     </Max_Connect>
> </Cluster_Server_Info>
> <Host_Name>   hostA  </Host_Name>
> <Backend_Socket_Dir>  /tmp </Backend_Socket_Dir>
> <Receive_Port>        5432 </Receive_Port>
> <Recovery_Port>       6001 </Recovery_Port>
> <Max_Cluster_Num>     3    </Max_Cluster_Num>
> <Use_Connection_Pooling> no </Use_Connection_Pooling>  # set to yes for symptom 1
> <LifeCheck_Timeout>      3s </LifeCheck_Timeout>
> <LifeCheck_Interval>    15s </LifeCheck_Interval>
> <Log_File_Info>
>         <File_Name> /usr/local/pgsql/log/pglb.log </File_Name>
>         <File_Size> 1M </File_Size>
>         <Rotate> 3 </Rotate>
> </Log_File_Info>
>
>
> cluster.conf for 5440@hostB -----
> <Replicate_Server_Info>
>         <Host_Name> hostC </Host_Name>
>         <Port> 8001 </Port>
>         <Recovery_Port> 8101 </Recovery_Port>
> </Replicate_Server_Info>
> <Host_Name> hostB </Host_Name>
> <Recovery_Port> 7040 </Recovery_Port>
> <Rsync_Path> /usr/bin/rsync </Rsync_Path>
> <Rsync_Option> ssh -1 </Rsync_Option>
> <Rsync_Compress> yes </Rsync_Compress>
> <Pg_Dump_Path> /usr/local/pgsql/bin/pg_dump </Pg_Dump_Path>
> <When_Stand_Alone> read_only </When_Stand_Alone>
> <Replication_Timeout>   1min </Replication_Timeout>
> <LifeCheck_Timeout> 3s </LifeCheck_Timeout>
> <LifeCheck_Interval> 11s </LifeCheck_Interval>
>
>
> cluster.conf for 5441@hostB -----
> <Replicate_Server_Info>
>         <Host_Name> hostC </Host_Name>
>         <Port> 8001 </Port>
>         <Recovery_Port> 8101 </Recovery_Port>
> </Replicate_Server_Info>
> <Host_Name> hostB </Host_Name>
> <Recovery_Port> 7040 </Recovery_Port>
> <Rsync_Path> /usr/bin/rsync </Rsync_Path>
> <Rsync_Option> ssh -1 </Rsync_Option>
> <Rsync_Compress> yes </Rsync_Compress>
> <Pg_Dump_Path> /usr/local/pgsql/bin/pg_dump </Pg_Dump_Path>
> <When_Stand_Alone> read_only </When_Stand_Alone>
> <Replication_Timeout>   1min </Replication_Timeout>
> <LifeCheck_Timeout> 3s </LifeCheck_Timeout>
> <LifeCheck_Interval> 11s </LifeCheck_Interval>
>
>
> pgreplicate.conf on hostC ---
> <Cluster_Server_Info>
>     <Host_Name>     hostB </Host_Name>
>     <Port>          5440  </Port>
>     <Recovery_Port> 7040  </Recovery_Port>
> </Cluster_Server_Info>
> <Cluster_Server_Info>
>     <Host_Name>     hostB </Host_Name>
>     <Port>          5441  </Port>
>     <Recovery_Port> 7041  </Recovery_Port>
> </Cluster_Server_Info>
> <LoadBalance_Server_Info>
>         <Host_Name>     hostA </Host_Name>
>         <Recovery_Port> 6001  </Recovery_Port>
> </LoadBalance_Server_Info>
> <Host_Name> hostC </Host_Name>
> <Replication_Port>              8001            </Replication_Port>
> <Recovery_Port>                 8101            </Recovery_Port>
> <RLOG_Port>                     8301            </RLOG_Port>
> <Response_Mode>                 normal          </Response_Mode>
> <Use_Replication_Log>   no                      </Use_Replication_Log>
> <Replication_Timeout>   1min                    </Replication_Timeout>
> <LifeCheck_Timeout>     3s                      </LifeCheck_Timeout>
> <LifeCheck_Interval>    15s                     </LifeCheck_Interval>
> <Log_File_Info>
>         <File_Name> /usr/local/pgsql/log/pgreplicate.log </File_Name>
>         <File_Size> 1M </File_Size>
>         <Rotate> 3 </Rotate>
> </Log_File_Info>
>
> Sorry for the trouble, and thank you in advance.
>
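One detail in the quoted files may be worth double-checking: both cluster.conf files list Recovery_Port 7040, while pgreplicate.conf lists 7040 for port 5440 and 7041 for port 5441. This may simply be a copy-and-paste slip in the mail, and nothing here shows that it causes the symptoms. A hedged Python sketch of such a cross-check follows. The helper names are hypothetical, the files are treated as plain tag/value text rather than real XML, and the sample strings are excerpts from the post.

```python
import re


def tag_value(text, tag):
    """Return the first value of <tag> value </tag> in text, or None."""
    m = re.search(r"<%s>\s*(\S+)\s*</%s>" % (tag, tag), text)
    return m.group(1) if m else None


def top_level(text):
    """Remove nested <..._Info> sections, leaving top-level settings."""
    return re.sub(r"<(\w+_Info)>.*?</\1>", "", text, flags=re.S)


def replicator_recovery_ports(pgreplicate_conf):
    """Map (host, port) -> Recovery_Port as listed in pgreplicate.conf."""
    ports = {}
    sections = re.findall(
        r"<Cluster_Server_Info>(.*?)</Cluster_Server_Info>",
        pgreplicate_conf, flags=re.S)
    for sec in sections:
        key = (tag_value(sec, "Host_Name"), int(tag_value(sec, "Port")))
        ports[key] = int(tag_value(sec, "Recovery_Port"))
    return ports


def check_cluster(host, port, cluster_conf, expected):
    """Return (matches, port in cluster.conf, port expected by replicator)."""
    own = int(tag_value(top_level(cluster_conf), "Recovery_Port"))
    want = expected[(host, port)]
    return own == want, own, want


PGREPLICATE_CONF = """
<Cluster_Server_Info>
    <Host_Name> hostB </Host_Name> <Port> 5440 </Port>
    <Recovery_Port> 7040 </Recovery_Port>
</Cluster_Server_Info>
<Cluster_Server_Info>
    <Host_Name> hostB </Host_Name> <Port> 5441 </Port>
    <Recovery_Port> 7041 </Recovery_Port>
</Cluster_Server_Info>
"""

CLUSTER_5441_CONF = """
<Replicate_Server_Info>
    <Host_Name> hostC </Host_Name> <Port> 8001 </Port>
    <Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>
<Host_Name> hostB </Host_Name>
<Recovery_Port> 7040 </Recovery_Port>
"""

expected = replicator_recovery_ports(PGREPLICATE_CONF)
print(check_cluster("hostB", 5441, CLUSTER_5441_CONF, expected))
```

The backend port is passed in by the caller because cluster.conf as quoted does not contain it. With the excerpts above, the check reports a mismatch for 5441@hostB (7040 in cluster.conf, 7041 in pgreplicate.conf).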