[pgcluster: 254] レプリケーションがうまく動作しません。

miyazawa makoto miyazawa_pgcluster_mailling @ yahoo.co.jp
2004年 4月 19日 (月) 18:46:47 JST


現在、Solaris8(OS5.8)2台でPGClusterの検証を行っておりま
す、宮澤と申します。

さて、ご相談させていただきたい現象なのですが
レプリケーションサーバとクラスタDBの連動がうまくできてい
ません。
いろいろ試してみたのですが、どうにもわからず皆様のお知恵
をお借りできたらと
思って投稿させていただております。
長文で申し訳ありませんが、よろしくお願いいたします。

以下に環境等の情報を挙げます。
------------------------------------------------------
実機環境:Solaris8(sparc、OS5.8)  2台
サーバ構成:
PGClusterバージョン:PGReplicate version [1.0.6cv9]
0系…レプリケーションサーバ+クラスタサーバ(マスタ)
1系…クラスタサーバ(スレーブ)
ロードバランサは未インストール、未設定

(1)インストール方法
  PGClusterの設定方法、rsyncの設定は
http://www.csra.co.jp/~mitani/jpug/pgcluster/stable/index.html

の設定方法を参照。
お互いのサーバでrsyncを実行し、ファイルが転送されること
を確認。

(2)設定ファイル
設定ファイルは以下のように記述
0系
   pgreplicate.conf
      クラスタサーバ名を追記(/etc/hostsで解決できるホス
ト名、0系をマスタ、1系をスレーブとした)
      それ以外はdefault

   <Cluster_Server_Info>
       <Host_Name>   test1  </Host_Name>
       <Port>        5432                </Port>
       <Recovery_Port>       7779        </Recovery_Port>
   </Cluster_Server_Info>
   <Cluster_Server_Info>
       <Host_Name>   test2  </Host_Name>
       <Port>        5432                </Port>
       <Recovery_Port>       7779        </Recovery_Port>
   </Cluster_Server_Info>

  cluster.conf
      レプリケーションサーバを追記(/etc/hostsで解決でき
るホスト名。0系のホスト名を指定)
      それ以外はdefault

     <Replicate_Server_Info>
        <Host_Name> test1 </Host_Name>
        <Port> 8777 </Port>
        <Recovery_Port> 7778 </Recovery_Port>
     </Replicate_Server_Info>
                    〜
     <Recovery_Port> 7779 </Recovery_Port>
     <Rsync_Path> /usr/local/bin/rsync </Rsync_Path>
     <Rsync_Option> ssh -1 </Rsync_Option>
     <When_Stand_Alone> read_write  </When_Stand_Alone>

  pg_hba.conf
      ネットワークアドレス指定で同一セグメント内からのア
クセスを許可

      # TYPE  DATABASE    USER        IP-ADDRESS       
IP-MASK           METHOD
      local   all         all                             
               trust
      host    all         all         127.0.0.1        
255.255.255.255   trust
      host    all         all         xx.xx.xx.0       
255.255.255.0     trust

1系
  cluster.conf
      レプリケーションサーバを追記(/etc/hostsで解決でき
るホスト名。0系のホスト名を指定)
      それ以外はdefault

    <Replicate_Server_Info>
        <Host_Name> test1 </Host_Name>
        <Port> 8777 </Port>
        <Recovery_Port> 7778 </Recovery_Port>
    </Replicate_Server_Info>
                    〜
    <Recovery_Port> 7779 </Recovery_Port>
    <Rsync_Path> /usr/local/bin/rsync </Rsync_Path>
    <Rsync_Option> ssh -1 </Rsync_Option>
    <When_Stand_Alone> read_write  </When_Stand_Alone>

  pg_hba.conf
      ネットワークアドレス指定で同一セグメント内からのア
クセスを許可

      # TYPE  DATABASE    USER        IP-ADDRESS       
IP-MASK           METHOD
      local   all         all                             
               trust
      host    all         all         127.0.0.1        
255.255.255.255   trust
      host    all         all         xx.xx.xx.0       
255.255.255.0     trust

(3)動作確認手順
   (1) 1系のクラスタDBを以下のコマンドで起動した。
        /usr/local/pgsql/bin/pg_ctl start -D
/usr/local/pgsql/data -o "-i"
        
   (2) 0系のレプリケーションサーバを以下のコマンドで起動
した。
        /usr/local/pgsql/bin/pgreplicate -D
/usr/local/pgsql/etc -nv

   この時点での、レプリケーションのログは以下の通り。
   $ /usr/local/pgsql/bin/pgreplicate -D
/usr/local/pgsql/etc -nv
    DEBUG(init_server_tbl):
/usr/local/pgsql/etc/pgreplicate.log open ok
    DEBUG(init_server_tbl): PGR_Get_Conf_Data ok
    DEBUG(init_server_tbl): LoadBalanceTbl allocate ok
    DEBUG(init_server_tbl): Conf data read ok
    DEBUG(init_server_tbl): HostTbl shmget ok
    DEBUG(init_server_tbl): HostTbl shmat ok
    DEBUG(pgr_set_log): LockWaitTbl shmget ok
    DEBUG(pgr_set_log): LockWaitTbl shmat ok
    DEBUG(PGRrecovery_main): PGRrecovery_main bind port
7778
    DEBUG(PGRrecovery_main): wait recovery
    DEBUG(replicate_main): replicate main 8777 port bind
OK
    DEBUG(replicate_packet_send): cmdSts=N
    DEBUG(replicate_packet_send): cmdType=
    DEBUG(replicate_packet_send): port=0
    DEBUG(replicate_packet_send): pid=0
    DEBUG(replicate_packet_send): except_host=
    DEBUG(replicate_packet_send): from_host=test1
    DEBUG(replicate_packet_send): dbName=template1
    DEBUG(replicate_packet_send): userName=postgres
    DEBUG(replicate_packet_send): recieve sec=0
    DEBUG(replicate_packet_send): recieve usec=0
    DEBUG(replicate_packet_send): query_size=54
    DEBUG(replicate_packet_send): query=SELECT
PGR_SYSTEM_COMMAND_FUNCTION(1,test1,8777,7778)
    DEBUG(replicate_packet_send): useFlag[2]
    DEBUG(PGRis_same_host): not same host:
    DEBUG(is_need_response): same_host[0] mode[1]
current[0]
    DEBUG(is_need_response): sem_lock[1]
    DEBUG(replicate_packet_send_each_server): except:0@
host:5432 @ test1
    DEBUG(replicate_packet_send_each_server): send
replicate to:test1
    DEBUG(PGRsend_replicate_packet_to_server): host(test1)
: port(5432)
    DEBUG(getDBServerTbl): search host is (test1)
    DEBUG(get_ip_by_name): not found
    DEBUG(setDBServerTbl): host:test1 dbName:template1
    DEBUG(getDBServerTbl): search host is (test1)
    DEBUG(get_ip_by_name): not found
    ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
    ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
    ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
    ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
    ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
    ERROR(pgr_createConn): dbPersistLogin  timeout
    ERROR(pgr_createConn): pgr_createConn failed
    ERROR(pgr_createConn): setDBServerTbl failed
    DEBUG(pgr_createConn): sem_unlock[1]
    DEBUG(getDBServerTbl): search host is (test1)
    DEBUG(get_ip_by_name): not found
    DEBUG(PGRis_same_host): not same host:
    DEBUG(is_need_response): same_host[0] mode[1]
current[1]
    DEBUG(is_need_response): sem_lock[2]
    DEBUG(replicate_packet_send_each_server): except:0@
host:5432 @ test2
    DEBUG(replicate_packet_send_each_server): send
replicate to:test2
    DEBUG(PGRsend_replicate_packet_to_server): host(test2)
: port(5432)
    DEBUG(getDBServerTbl): search host is (test2)
    DEBUG(get_ip_by_name): not found
    DEBUG(setDBServerTbl): host:test2 dbName:template1
    DEBUG(getDBServerTbl): search host is (test2)
    DEBUG(get_ip_by_name): not found
    DEBUG(pgr_createConn): PQsetdbLogin ok!!
    DEBUG(get_ip_by_name): send_replicate_packet_to_server
query=SELECT
PGR_SYSTEM_COMMAND_FUNCTION(1,test1,8777,7778)
    DEBUG(get_ip_by_name): db:template1 port:5432
user:postgres host:test2 query:SELECT
PGR_SYSTEM_COMMAND_FUNCTION(1,test1,8777,7778)
    ERROR(get_ip_by_name): PQexec error
    DEBUG(get_ip_by_name): sem_unlock[2]
    DEBUG(getDBServerTbl): search host is (test2)
    DEBUG(get_ip_by_name): search
host(1112880650):port(5432):db(template1)
    DEBUG(get_ip_by_name): found
    DEBUG(get_ip_by_name): replicate_packet_send end
    DEBUG(get_ip_by_name): wait replicate

   (3)  0系のクラスタDBを以下のコマンドで起動した。
        /usr/local/pgsql/bin/pg_ctl start -D
/usr/local/pgsql/data -o "-i"
   
   (4) この状態で以下を確認
       
       0系
          /usr/local/pgsql/bin/psql -h test2 -l
        
         List of databases
            Name    |  Owner   | Encoding
         -----------+----------+----------
          template0 | postgres | EUC_JP
          template1 | postgres | EUC_JP
         (2 rows)
       
       1系
          /usr/local/pgsql/bin/psql -h test1 -l
          
          List of databases
             Name    |  Owner   | Encoding
          -----------+----------+----------
           template0 | postgres | EUC_JP
           template1 | postgres | EUC_JP
          (2 rows)

   (5) 0系で以下を実行
       /usr/local/pgsql/bin/psql -U postgres template1
       
       すると、レプリケーションログで以下のメッセージを
出力
       
    DEBUG(replicate_loop): replicate_loop selected
    DEBUG(read_packet): query size=5
    DEBUG(read_packet): read[5] query[BEGIN]
    DEBUG(read_packet): query :: BEGIN
    DEBUG(replicate_packet_send): cmdSts=T
    DEBUG(replicate_packet_send): cmdType=B
    DEBUG(replicate_packet_send): port=5432
    DEBUG(replicate_packet_send): pid=5371
    DEBUG(replicate_packet_send): except_host=test1
    DEBUG(replicate_packet_send): from_host=test1
    DEBUG(replicate_packet_send): dbName=template1
    DEBUG(replicate_packet_send): userName=postgres
    DEBUG(replicate_packet_send): recieve sec=1082365158
    DEBUG(replicate_packet_send): recieve usec=534535
    DEBUG(replicate_packet_send): query_size=5
    DEBUG(replicate_packet_send): query=BEGIN
    DEBUG(replicate_packet_send): useFlag[2]
    DEBUG(get_ip_by_name): same host:5432 @ 4155360a -
5432 @ 4155360a
    DEBUG(get_ip_by_name): 5432 @ test1 return trigger
    DEBUG(PGRis_need_sync_time): sem_lock[1]
    DEBUG(return_result): 128[3,1082365158,534535]
    DEBUG(return_result):
return_result[3,1082365158,534535]
    DEBUG(return_result): 128 send
    DEBUG(return_result): wait for answer
    DEBUG(return_result): read_answer selected
    DEBUG(read_packet): query size=25
    DEBUG(read_packet): read[25]
query[PGR_QUERY_DONE_NOTICE_CMD]
    DEBUG(read_packet): answer[PGR_QUERY_DONE_NOTICE_CMD]
    DEBUG(read_packet): QUERY DONE
    DEBUG(read_packet): status of return_result[0]
    DEBUG(read_packet): sem_unlock[1]
    DEBUG(get_ip_by_name): not same host:5432 @ 4155360a -
5432 @ 4255360a
    DEBUG(get_ip_by_name): sem_lock[2]
    DEBUG(replicate_packet_send_each_server):
except:5432 @ test1 host:5432 @ test2
    DEBUG(replicate_packet_send_each_server): send
replicate to:test2
    DEBUG(PGRsend_replicate_packet_to_server): host(test2)
: port(5432)
    DEBUG(get_ip_by_name): not found in transaction tbl
host test1 db:template1 pid:5371
    DEBUG(get_ip_by_name): not found in getTransactionTbl
    DEBUG(get_ip_by_name): not found in transaction tbl
host test1 db:template1 pid:5371
    DEBUG(replicate_loop): wait replicate
    DEBUG(pgr_createConn): PQsetdbLogin ok!!
    DEBUG(insertTransactionTbl): db:template1 port:5432
user:postgres host:test2 query:BEGIN
    DEBUG(insertTransactionTbl): sync_command(SELECT
PGR_SYSTEM_COMMAND_FUNCTION(3,1082365158,534535) )
    DEBUG(insertTransactionTbl): PQexec send :BEGIN
    DEBUG(insertTransactionTbl): PQstatus(0)
    DEBUG(child_wait): replicate main: selected
    DEBUG(replicate_loop): replicate_loop selected
    DEBUG(read_packet): query size=54
    DEBUG(read_packet): read[54] query[SELECT
PGR_SYSTEM_COMMAND_FUNCTION(2,test1,8777,7778)]
    DEBUG(read_packet): query :: SELECT
PGR_SYSTEM_COMMAND_FUNCTION(2,test1,8777,7778)
    DEBUG(replicate_packet_send): cmdSts=N
    DEBUG(replicate_packet_send): cmdType=
    DEBUG(replicate_packet_send): port=4194305
    DEBUG(replicate_packet_send): pid=1
    DEBUG(replicate_packet_send): except_host=
    DEBUG(replicate_packet_send): from_host=
    DEBUG(replicate_packet_send): dbName=
    DEBUG(replicate_packet_send): userName=
    DEBUG(replicate_packet_send): recieve sec=1082365158
    DEBUG(replicate_packet_send): recieve usec=785650
    DEBUG(replicate_packet_send): query_size=54
    DEBUG(replicate_packet_send): query=SELECT
PGR_SYSTEM_COMMAND_FUNCTION(2,test1,8777,7778)
    DEBUG(replicate_packet_send): useFlag[2]
    DEBUG(PGRis_same_host): not same host:
    DEBUG(is_need_response): same_host[0] mode[1]
current[0]
    DEBUG(is_need_response): sem_lock[1]
    DEBUG(replicate_packet_send_each_server):
except:4194305@ host:5432 @ test1
    DEBUG(replicate_packet_send_each_server): send
replicate to:test1
    DEBUG(PGRsend_replicate_packet_to_server): host(test1)
: port(5432)
    DEBUG(getDBServerTbl): search host is (test1)
    DEBUG(get_ip_by_name): not found
    DEBUG(setDBServerTbl): host:test1 dbName:
    DEBUG(getDBServerTbl): search host is (test1)
    DEBUG(get_ip_by_name): not found
    ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
    DEBUG(replicate_loop): wait replicate
    ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
    ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
    ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
    ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
    ERROR(pgr_createConn): dbPersistLogin  timeout
    ERROR(pgr_createConn): pgr_createConn failed
    ERROR(pgr_createConn): setDBServerTbl failed
    DEBUG(pgr_createConn): sem_unlock[1]
    DEBUG(PGRis_same_host): not same host:
    DEBUG(is_need_response): same_host[0] mode[1]
current[1]
    DEBUG(is_need_response): sem_lock[2]

    0系のクラスタサーバ起動を行ったTeratermに以下のメッ
セージが表示される。
    FATAL:  no PostgreSQL user name specified in startup
packet
    FATAL:  no PostgreSQL user name specified in startup
packet
    FATAL:  no PostgreSQL user name specified in startup
packet
    FATAL:  no PostgreSQL user name specified in startup
packet
    FATAL:  no PostgreSQL user name specified in startup
packet
    FATAL:  no PostgreSQL user name specified in startup
packet
    
    しばらくコマンドを受け付けない。
    また、/usr/local/pgsql/data 配下にcoreファイルを出力
    
    1系のクラスタサーバ起動を行ったTeratermに以下のメッ
セージ
       $ LOG:  pq_recvbuf: unexpected EOF on client
connection
       
    (6)0系のクラスタサーバ起動を行ったTeratermで以下のコ
マンドを実行。
       template1=# select * from shinamono;
           hinmei    | nedan
       --------------+-------
        rentalserver |  3000
       (1 row)
       #insert into shinamono values('only-shar',30000);
       (しばらく無反応)
       template1=# select * from shinamono;
           hinmei    | nedan
       --------------+-------
        rentalserver |  3000
        only-shar    | 30000
       (2 rows)       

   (7) 1系で以下を実行
       /usr/local/pgsql/bin/psql -U postgres template1
       (しばらく無反応)
       
       すると、レプリケーションログで以下のメッセージを
出力
    DEBUG(replicate_loop): replicate main: selected
    DEBUG(replicate_loop): replicate_loop selected
    DEBUG(read_packet): query size=5
    DEBUG(read_packet): read[5] query[BEGIN]
    DEBUG(read_packet): query :: BEGIN
    DEBUG(replicate_packet_send): cmdSts=T
    DEBUG(replicate_packet_send): cmdType=B
    DEBUG(replicate_packet_send): port=5432
    DEBUG(replicate_packet_send): pid=762
    DEBUG(replicate_packet_send): except_host=test2
    DEBUG(replicate_packet_send): from_host=test2
    DEBUG(replicate_packet_send): dbName=template1
    DEBUG(replicate_packet_send): userName=postgres
    DEBUG(replicate_packet_send): recieve sec=1082366639
    DEBUG(replicate_packet_send): recieve usec=855403
    DEBUG(replicate_packet_send): query_size=5
    DEBUG(replicate_packet_send): query=BEGIN
    DEBUG(replicate_packet_send): useFlag[2]
    DEBUG(get_ip_by_name): not same host:5432 @ 4255360a -
5432 @ 4155360a
    DEBUG(is_need_response): same_host[0] mode[1]
current[0]
    DEBUG(is_need_response): sem_lock[1]
    DEBUG(replicate_packet_send_each_server):
except:5432 @ test2 host:5432 @ test1
    DEBUG(replicate_packet_send_each_server): send
replicate to:test1
    DEBUG(PGRsend_replicate_packet_to_server): host(test1)
: port(5432)
    DEBUG(get_ip_by_name): not found in transaction tbl
host test2 db:template1 pid:762
    DEBUG(get_ip_by_name): not found in getTransactionTbl
    DEBUG(get_ip_by_name): not found in transaction tbl
host test2 db:template1 pid:762
    DEBUG(pgr_createConn): PQsetdbLogin ok!!
    DEBUG(insertTransactionTbl): db:template1 port:5432
user:postgres host:test1 query:BEGIN
    DEBUG(insertTransactionTbl): sync_command(SELECT
PGR_SYSTEM_COMMAND_FUNCTION(3,1082366639,855403) )
    DEBUG(insertTransactionTbl): PQexec send :BEGIN
    DEBUG(insertTransactionTbl): PQstatus(0)
    DEBUG(replicate_loop): wait replicate
    DEBUG(replicate_loop): replicate main: selected
    DEBUG(replicate_loop): replicate_loop selected
    DEBUG(read_packet): query size=54
    DEBUG(read_packet): read[54] query[SELECT
PGR_SYSTEM_COMMAND_FUNCTION(2,test1,8777,7779)]
    DEBUG(read_packet): query :: SELECT
PGR_SYSTEM_COMMAND_FUNCTION(2,test1,8777,7779)
    DEBUG(replicate_packet_send): cmdSts=N
    DEBUG(replicate_packet_send): cmdType=
    DEBUG(replicate_packet_send): port=4194305
    DEBUG(replicate_packet_send): pid=1
    DEBUG(replicate_packet_send): except_host=
    DEBUG(replicate_packet_send): from_host=
    DEBUG(replicate_packet_send): dbName=
    DEBUG(replicate_packet_send): userName=
    DEBUG(replicate_packet_send): recieve sec=1082366640
    DEBUG(replicate_packet_send): recieve usec=114786
    DEBUG(replicate_packet_send): query_size=54
    DEBUG(replicate_packet_send): query=SELECT
PGR_SYSTEM_COMMAND_FUNCTION(2,test1,8777,7779)
    DEBUG(replicate_packet_send): useFlag[2]
    DEBUG(PGRis_same_host): not same host:
    DEBUG(is_need_response): same_host[0] mode[1]
current[0]
    DEBUG(is_need_response): sem_lock[1]
    DEBUG(replicate_loop): wait replicate

    0系のクラスタサーバ起動を行ったTeratermに以下のメッ
セージ
       $ LOG:  pq_recvbuf: unexpected EOF on client
connection

    (8)1系のクラスタサーバ起動を行ったTeratermで以下のコ
マンドを実行。
       template1=# select * from shinamono;
           hinmei    | nedan
       --------------+-------
        rentalserver |  3000
       (1 row)

それでお聞きしたいことですが、以下の通りです。
(1)0系でinsertしたレコードが1系でレプリケーションされな
い根本的な原因はなんでしょうか?
(2)動作確認手順(2)、(5)でのログなのですが
    DEBUG(getDBServerTbl): search host is (test1)
    DEBUG(get_ip_by_name): not found
    DEBUG(setDBServerTbl): host:test1 dbName:template1
    DEBUG(getDBServerTbl): search host is (test1)
    DEBUG(get_ip_by_name): not found
    ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
    ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
    ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
    ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
    ERROR(pgr_createConn): PQsetdbLogin failed. close
socket!!
    ERROR(pgr_createConn): dbPersistLogin  timeout
    ERROR(pgr_createConn): pgr_createConn failed
    ERROR(pgr_createConn): setDBServerTbl failed
    DEBUG(pgr_createConn): sem_unlock[1]
    DEBUG(getDBServerTbl): search host is (test1)
    DEBUG(get_ip_by_name): not found
    
    とありますように、どうもホストの検索で失敗しているよ
うに見えます。
    /etc/hostsに自分、相手のホスト名を追記するだけで問題
ないとおもっていたのですが
    なにか設定する必要があるのでしょうか?
(3)動作確認手順(5),(7)でエラーメッセージ
     LOG:  pq_recvbuf: unexpected EOF on client connection
     が表示される原因がよくわかりません。

なんでもかんでも、聞いているようで恐縮なのですが
なにでもかまわないのでよろしくお願い致します。

以上

__________________________________________________
Do You Yahoo!?
http://bb.yahoo.co.jp/




pgcluster メーリングリストの案内