[pgcluster: 493] Re: Cannot recover with av13

Thu, 12 Aug 2004 13:46:29 JST

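Dear Mr. Mitani,
Dear all,

Thank you for your continued support.
This is Aoshima.

Thank you for incorporating the changes into av13.
I also ran several tests, and the following situation
occurred during replication, so I am reporting it here.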

[Environment]
Load balancer:      1
Cluster DB:         3
Replication server: 1

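[Symptom]
When a client connects through the load balancer, a stopped cluster DB
is detected, and the connection is made to another cluster DB instead,
the INSERT sent by that client keeps being replicated.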

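[Conditions and steps]
(1) Stop one cluster DB (the second one).
(2) Connect to the load balancer (LB) from clients 1 and 2.

    Client 1: connected to cluster DB 1.
    Client 2: tries to connect to cluster DB 2, detects the error,
              and is reconnected to cluster DB 3.

(3) Send an INSERT from client 2.
    Client 2 receives no response.
    (When the INSERT is sent from client 1, it works normally.)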

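(4) The replication server log showed the following.
    1) The INSERT statement that client 2 sent to cluster DB 3
       is replicated repeatedly.

    2) The log shows the replication server replicating the query it
       received from cluster DB 3 back to cluster DB 3 as well.
       (After "except <cluster DB 3 host name>" is displayed,
        the log shows "send replicate to <cluster DB 3>".)

       Only queries from the client that was reconnected
       appear to be replicated repeatedly.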
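
The behavior one would expect from the "except" log line in 2) can be
sketched as follows. This is only a minimal Python sketch, not PGCluster's
actual code: the function name replication_targets and the host names
db1 to db3 are hypothetical, and it assumes that "except <host>" means the
host that originated the query is skipped when the query is forwarded.

```python
def replication_targets(cluster_hosts, origin_host):
    """Return the hosts a query should be forwarded to, skipping its origin."""
    return [host for host in cluster_hosts if host != origin_host]


# Hypothetical host names; db2 is the cluster DB stopped in step (1),
# so only db1 and db3 are running.
running_hosts = ["db1", "db3"]

# Client 2 was reconnected to db3, so db3 is the origin of the INSERT.
targets = replication_targets(running_hosts, origin_host="db3")
print(targets)  # ['db1']
```

Under that assumption only cluster DB 1 should receive the INSERT, whereas
the log in 2) shows cluster DB 3 among the targets as well.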

That is all.


On Thu, 12 Aug 2004 00:01:55 +0900
kazunari takahashi <kazunari.takahashi ＠ ctc-g.co.jp> wrote:

> This is Takahashi.
> 
> I tried av13 right away.
> The following problem occurred, so I am reporting it.
> 
> ####################################
> Symptom
> ####################################
> 
> - Normal recovery terminates with an error
> 
> ####################################
> #Environment
> ####################################
> pgcluster-1.0.7av13
> 
> clusterDB × 3 (solaris8 sparc), host names: serverA, serverB, serverC
> rgrp × 1 (solaris8 sparc), host name: pgrp
> 
> ####################################
> #Steps
> ####################################
> 
> 1. Start the replication server
> 
> 2. Start serverA
> 
> 3. Start serverB in recovery mode
> 
> % pg_ctl start -D /usr/local/pgsql/data -o "-i -R"
> postmaster successfully started
> Start in recovery mode!
> Please wait until a data synchronization finishes from Master DB...
> /usr/local/pgsql/bin/postmaster: sorry, recovery failed.
> 
> 
> 
> -----------------------------------------------------------------------------------
> replicate.conf
> -----------------------------------------------------------------------------------
> pgrp% more pgreplicate.conf
> #=============================================================
> #  PGReplicate configuration file
> #-------------------------------------------------------------
> # file: pgreplicate.conf
> #-------------------------------------------------------------
> # This file controls:
> #       o which hosts & port are cluster server
> #       o which port use for replication request from cluster server
> #=============================================================
> #
> #-------------------------------------------------------------
> # A setup of Cluster DB(s)
> #
> #               o Host_Name : The host name of Cluster DB.
> #                             -- please write a host name by FQDN.
> #                             -- do not write IP address.
> #               o Port : The connection port with postmaster.
> #               o Recovery_Port : The connection port at the time of
> #                                 a recovery sequence .
> #-------------------------------------------------------------
> <Cluster_Server_Info>
>     <Host_Name>   serverA  </Host_Name>
>     <Port>        5432                </Port>
>     <Recovery_Port>       7779        </Recovery_Port>
> </Cluster_Server_Info>
> <Cluster_Server_Info>
>     <Host_Name>   serverB </Host_Name>
>     <Port>        5432                </Port>
>     <Recovery_Port>       7779        </Recovery_Port>
> </Cluster_Server_Info>
> <Cluster_Server_Info>
>     <Host_Name>   serverC   </Host_Name>
>     <Port>        5432                </Port>
>     <Recovery_Port>       7779       </Recovery_Port>
> </Cluster_Server_Info>
> #
> #-------------------------------------------------------------
> # A setup of Load Balance Server
> #
> #               o Host_Name : The host name of a load balance server.
> #                             -- please write a host name by FQDN.
> #                             -- do not write IP address.
> #               o Recovery_Port : The connection port at the time of
> #                                 a recovery sequence .
> #-------------------------------------------------------------
> #<LoadBalance_Server_Info>
> #       <Host_Name>   pglb  </Host_Name>
> #       <Recovery_Port>       7780            </Recovery_Port>
> #</LoadBalance_Server_Info>
> #
> #------------------------------------------------------------
> # A setup of the upper replication server for cascade connection.
> #
> #               o Host_Name : The host name of Cluster DB.
> #                             -- please write a host name by FQDN.
> #                             -- do not write IP address.
> #               o Port : The connection port with postmaster.
> #               o Recovery_Port : The connection port at the time of
> #                                 a recovery sequence .
> #------------------------------------------------------------
> #<Replicate_Server_Info>
> #       <Host_Name> pglb </Host_Name>
> #       <Port> 8887 </Port>
> #       <Recovery_Port> 7778 </Recovery_Port>
> #</Replicate_Server_Info>
> #
> #-------------------------------------------------------------
> # A setup of a replication server
> #
> #               o Replicate_Port : connection for reprication
> #               o Recovery_Port : connection for recovery
> #               o Response_mode : timing which returns a response
> #                 normal   -- return result of DB which received the query
> #                 reliable -- return result after waiting for response of
> #                      all Cluster DBs.
> #-------------------------------------------------------------
> <Replication_Port>    8777            </Replication_Port>
> <Recovery_Port>       7778            </Recovery_Port>
> <Response_Mode>       reliable          </Response_Mode>
> 
> 
> -----------------------------------------------------------------------------------
> debug log
> 
> DEBUG(replicate_loop): replicate main: selected
> 
> DEBUG(pgrecovery_loop): recovery accept port 7778
> 
> DEBUG(read_packet): receive packet no:200
> 
> DEBUG(read_packet): recovery error accept. top queueing and initiarse recovery status
> 
> DEBUG(PGRsend_queue): master  - 0
> 
> ERROR(PGRget_HostTbl): master table is null
> 
> ERROR(send_recovery_packet): send() failed. (Socket operation on non-socket)
> ERROR(send_recovery_packet): send() failed. (Bad file number)
> ERROR(send_recovery_packet): send() failed. (Bad file number)
> ERROR(send_recovery_packet): send() failed. (Bad file number)
> ERROR(send_recovery_packet): send() failed. (Bad file number)
> ERROR(send_recovery_packet): send failed and PGR_Create_Socket_Connect failed
> DEBUG(replicate_loop): replicate_loop selected
> 
> DEBUG(PGRclear_connections): replicate loop exit
> DEBUG(replicate_loop): wait replicate
> 
> 
> 
> ---------------------------------------
> 
> Kazunari Takahashi <kazunari.takahashi ＠ ctc-g.co.jp>
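
In the quoted debug log, the "master table is null" error appears before
the send() failures. To pull such entries out of a longer log, a minimal
Python sketch follows. It assumes every entry has the form
LEVEL(function): message as in the excerpt above, and it only knows the
DEBUG and ERROR levels seen there. The names ENTRY and count_errors are
hypothetical and not part of PGCluster.

```python
import re
from collections import Counter

# Matches entries such as "ERROR(PGRget_HostTbl): master table is null",
# with or without a leading "> " quote marker.
ENTRY = re.compile(r"^(?:>\s*)?(DEBUG|ERROR)\((\w+)\):\s*(.*)$")


def count_errors(lines):
    """Count ERROR entries per reporting function."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line.strip())
        if match and match.group(1) == "ERROR":
            counts[match.group(2)] += 1
    return counts


# A few lines copied from the quoted log.
sample = [
    "> DEBUG(PGRsend_queue): master  - 0",
    "> ERROR(PGRget_HostTbl): master table is null",
    "> ERROR(send_recovery_packet): send() failed. (Socket operation on non-socket)",
    "> ERROR(send_recovery_packet): send() failed. (Bad file number)",
]
print(count_errors(sample))
```

Grouping by function name makes it easier to see which routine reported
an error first and which errors are repeats of the same failure.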

===============================================
青嶋　憲太郎　AOSHIMA Kentaro
Email : aoshima.kentaro ＠ nttcom.co.jp
===============================================