[pgcluster: 968] Re: postmaster does not start in recovery mode

Fri Mar 30 09:30:55 JST 2007

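Thank you for your continued help.
This is Suyama.

Regarding the issue below, I kept investigating by reading the source
around the error and searching the English mailing lists, but I could
not find a decisive configuration mistake and was stuck....
As a last resort, I tried upgrading.
When I tested pgcluster-1.3.1rc7 with the same definitions and
environment, it worked without any problem.

I wanted to keep using the data created with pgcluster-1.3.0c as is,
so I copied the data folder into the PGDATA of pgcluster-1.3.1rc7
and ran it from there.
For that reason, I ended up adding template1, in addition to the
target DB, to the database entries in pg_hba.conf.
template1 and template0 are databases used by the system, so I did
wonder whether it is a bad idea to include them....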

<pg_hba.conf>
host    test    postgres        xxx.xxx.xxx.0    255.255.255.0   trust
host    template1  postgres   xxx.xxx.xxx.xxx  255.255.255.255 trust
host    template1  postgres   xxx.xxx.xxx.xxx  255.255.255.255 trust
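For readers unfamiliar with the address and netmask columns, here is a minimal Python sketch of how "host" lines like the ones above select clients. It is an illustration only, not PostgreSQL's own matching code: the 192.0.2.x addresses are hypothetical stand-ins for the masked ones, the function names are made up for this example, and it handles only the six-column form shown here plus the "all" keyword.

```python
import ipaddress

# Hypothetical stand-ins for the masked addresses in the listing above.
HBA_LINES = """
host    test       postgres   192.0.2.0    255.255.255.0    trust
host    template1  postgres   192.0.2.10   255.255.255.255  trust
host    template1  postgres   192.0.2.11   255.255.255.255  trust
"""


def parse_hba(text):
    """Parse six-column 'type db user address mask method' lines."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        conn_type, database, user, address, mask, method = line.split()
        # Address plus netmask describes a network; /255.255.255.255 is one host.
        network = ipaddress.ip_network(f"{address}/{mask}", strict=False)
        rules.append((conn_type, database, user, network, method))
    return rules


def first_match(rules, database, user, client_ip):
    """Return the auth method of the first line that matches, else None."""
    ip = ipaddress.ip_address(client_ip)
    for conn_type, rule_db, rule_user, network, method in rules:
        if (conn_type == "host"
                and rule_db in (database, "all")
                and rule_user in (user, "all")
                and ip in network):
            return method
    return None


rules = parse_hba(HBA_LINES)
print(first_match(rules, "test", "postgres", "192.0.2.77"))
print(first_match(rules, "template1", "postgres", "192.0.2.10"))
print(first_match(rules, "template1", "postgres", "192.0.2.12"))
```

With these stand-in values, a connection to template1 is accepted only from the two single-host entries, while the test database is reachable from the whole /24 network.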

I do not really understand why upgrading made it work, so I plan to
check again with pgcluster-1.3.0c.
In any case.... sorry for the noise.

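I have one request for improvement (regarding the website).
Beginners like me tend to look at the following page first.
http://pgcluster.projects.postgresql.org/jp/index.html
I admit I should have been more careful, but it is all too easy to
download the package from the download link on that page.
If possible, I think it would be better to point the download link
to pgFoundry, so that people can test with the latest version.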

I would like to keep testing load balancing, replication, and so on,
so I may post more elementary questions.
Thank you in advance for your help.

That is all.

----- Original Message ----- 
From: "Ikuko Suyama" <I.Suyama ＠ cec-ltd.co.jp>
To: <pgcluster ＠ ml.postgresql.jp>
Sent: Wednesday, March 28, 2007 8:54 PM
Subject: postmaster does not start in recovery mode

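> Thank you for your help.
> My name is Suyama.
>
> I use the replication feature in a two-machine configuration
> (host_a, host_b) to keep a backup of the data.
>
> I am trying to migrate the environment from the machines where I
> previously built and verified it to different machines.
> Recovery always fails during this, and I am at a loss.
>
> Similar problems have been posted to the mailing list, but trying
> the remedies suggested there did not change anything.
> I also tried replacing the configuration files with those from the
> pre-migration environment, creating a DB with the same name as the
> DB superuser, changing the host names to FQDNs, and rebuilding
> PGCluster, but I cannot get out of this state....
> I have been very busy, so this may well be a simple mistake, which
> worries me...
> Could anyone give me some advice?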
>
>
> ■ Steps
> Master  start (pgreplicate, pg_ctl: read_write mode)
> Master  create DB and table
> Slave    start (pgreplicate, pg_ctl: read_write mode)
> Slave    create DB and table
> Slave    stop
> Master  restart (pgreplicate, pg_ctl: read_only mode)
> Master  start INSERTs
> Slave    start in recovery mode (pgreplicate, pg_ctl -R)
>
> The following message then appears and the server fails to start.
> [postgres@host_b tmp]$ pg_ctl -w -o '-R' start
> waiting for postmaster to start....Start in recovery mode!
> Please wait until a data synchronization finishes from Master DB...
> PGR_Get_Cluster_Conf_Data failed...........................................................could not start postmaster
> [postgres@host_b tmp]$
>
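> ■ hosts definitions
> host_a    xx.xx.xx.xx
> host_b    xx.xx.xx.xx
>
> ■ hostname
> Configured so that host_a and host_b are displayed.
>
> ■ rsync
> Confirmed that files can be transferred as the postgres user
> on host_a and host_b.
> The root user can also transfer files, but is prompted for a
> password every time.
>
> ■ Master (host_a) configuration
> □ cluster.conf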
> <Replicate_Server_Info>
>        <Host_Name> host_a </Host_Name>
>        <Port> 8001 </Port>
>        <Recovery_Port> 8101 </Recovery_Port>
>        <LifeCheck_Port> 8201 </LifeCheck_Port>
> </Replicate_Server_Info>
> <Replicate_Server_Info>
>        <Host_Name> host_b </Host_Name>
>        <Port> 8002 </Port>
>        <Recovery_Port> 8102 </Recovery_Port>
>        <LifeCheck_Port> 8202 </LifeCheck_Port>
> </Replicate_Server_Info>
> <Recovery_Port> 7101 </Recovery_Port>
> <LifeCheck_Port> 7201 </LifeCheck_Port>
> <Rsync_Path> /usr/bin/rsync </Rsync_Path>
> <Rsync_Option> ssh -1 </Rsync_Option>
> <When_Stand_Alone> read_only  </When_Stand_Alone>
> <Status_Log_File>  /tmp/cluster.sts </Status_Log_File>
> <Error_Log_File> /tmp/cluster.log  </Error_Log_File>
>
> □ pgreplicate.conf
> <Cluster_Server_Info>
>    <Host_Name>   host_a </Host_Name>
>    <Port>                5432        </Port>
>    <Recovery_Port>       7101        </Recovery_Port>
>    <LifeCheck_Port>      7201        </LifeCheck_Port>
> </Cluster_Server_Info>
> <Cluster_Server_Info>
>    <Host_Name>   host_b </Host_Name>
>    <Port>                5432        </Port>
>    <Recovery_Port>       7101        </Recovery_Port>
>    <LifeCheck_Port>      7201        </LifeCheck_Port>
> </Cluster_Server_Info>
> <Status_Log_File>  /tmp/pgreplicate.sts  </Status_Log_File>
> <Error_Log_File>   /tmp/pgreplicate.log  </Error_Log_File>
> <Replication_Port>       8001            </Replication_Port>
> <Recovery_Port>          8101            </Recovery_Port>
> <LifeCheck_Port>         8201            </LifeCheck_Port>
> <RLOG_Port>              8301            </RLOG_Port>
> <Response_Mode>        normal            </Response_Mode>
> <Use_Replication_Log>      no            </Use_Replication_Log>
> <Reserved_Connections>      1            </Reserved_Connections>
>
>
> ■ Slave (host_b) configuration
> □ cluster.conf
> <Replicate_Server_Info>
>        <Host_Name> host_a </Host_Name>
>        <Port> 8001 </Port>
>        <Recovery_Port> 8101 </Recovery_Port>
>        <LifeCheck_Port> 8201 </LifeCheck_Port>
> </Replicate_Server_Info>
> <Replicate_Server_Info>
>        <Host_Name> host_b </Host_Name>
>        <Port> 8002 </Port>
>        <Recovery_Port> 8102 </Recovery_Port>
>        <LifeCheck_Port> 8202 </LifeCheck_Port>
> </Replicate_Server_Info>
> <Recovery_Port> 7101 </Recovery_Port>
> <LifeCheck_Port> 7201 </LifeCheck_Port>
> <Rsync_Path> /usr/bin/rsync </Rsync_Path>
> <Rsync_Option> ssh -1 </Rsync_Option>
> <When_Stand_Alone> read_only  </When_Stand_Alone>
> <Status_Log_File>  /tmp/cluster.sts </Status_Log_File>
> <Error_Log_File> /tmp/cluster.log  </Error_Log_File>
>
> □ pgreplicate.conf
> <Cluster_Server_Info>
>    <Host_Name>   host_a </Host_Name>
>    <Port>                5432        </Port>
>    <Recovery_Port>       7101        </Recovery_Port>
>    <LifeCheck_Port>      7201        </LifeCheck_Port>
> </Cluster_Server_Info>
> <Cluster_Server_Info>
>    <Host_Name>   host_b </Host_Name>
>    <Port>                5432        </Port>
>    <Recovery_Port>       7101        </Recovery_Port>
>    <LifeCheck_Port>      7201        </LifeCheck_Port>
> </Cluster_Server_Info>
>
> <Status_Log_File>  /tmp/pgreplicate.sts  </Status_Log_File>
> <Error_Log_File>   /tmp/pgreplicate.log  </Error_Log_File>
> <Replication_Port>       8001            </Replication_Port>
> <Recovery_Port>          8101            </Recovery_Port>
> <LifeCheck_Port>         8201            </LifeCheck_Port>
> <RLOG_Port>              8301            </RLOG_Port>
> <Response_Mode>        normal            </Response_Mode>
> <Use_Replication_Log>      no            </Use_Replication_Log>
> <Reserved_Connections>      1            </Reserved_Connections>
>
>
> ■ pgreplicate log (-nv) on the Master (host_a) when the
>   recovery command was issued on the Slave (host_b)
> DEBUG:pgrecovery_loop():[0]receive packet no:1
> DEBUG:first_setup_recovery():1st setup target host_b
> DEBUG:first_setup_recovery():1st setup port 5432
> ERROR:first_setup_recovery():get master info error , master may be down
> DEBUG:pgrecovery_loop():1st master  - 0
> DEBUG:pgrecovery_loop():1st target host_b - 5432
> DEBUG:pgrecovery_loop():first_setup_recovery end :1
> DEBUG:pgrecovery_loop():[0]receive packet no:200
> DEBUG:pgrecovery_loop():recovery error accept. top queueing and initiarse recovery status
> DEBUG:PGRsend_queue():master  - 0
>
> ERROR:PGRsend_queue():master table is null
> ERROR:send_packet():PGR_Create_Socket_Connect failed
> DEBUG:replicate_loop():replicate_loop selected
> DEBUG:replicate_loop(): PGRread_packet failed query[(null)] cmdSys[]
> DEBUG:replicate_loop():session closed
> DEBUG:replicate_loop():replicate loop exit
>
> ■ pgreplicate.sts on the Master (host_a)
> Wed Mar 28 20:23:53 2007  port(5432) host:host_a start use
> Wed Mar 28 20:23:53 2007  port(5432) host:host_b start use
> Wed Mar 28 20:23:53 2007  cascade(host_a) port(8001) start use
> Wed Mar 28 20:24:08 2007  port(5432) host:host_b initialize
> Wed Mar 28 20:24:08 2007  port(5432) host:host_b initialize
> Wed Mar 28 20:24:08 2007  port(5432) host:host_a error
>
> ■ Environment
> RHEL4, PGCluster 1.3.0c
> host_a: (Fujitsu machine)
> Intel(R) Xeon(TM) CPU 3.00GHz
> 1 GB memory
> Disk: more than 10 GB free
> host_b: (IBM machine)
> Mobile Intel(R) Pentium(R) 4 - M CPU 1.80GHz
> 770 MB memory
> Disk: more than 8 GB free
>
>
>
> Thank you in advance.
> That is all.
>
> ...................................................................................................
> Ikuko Suyama
> Open Source Solutions Div. IT Solutions Group.
> Computer Engineering & Consulting, Ltd.
> E-mail: I.Suyama ＠ cec-ltd.co.jp Phone : 03-5789-2477 (570)
> Open Source Expert http://www.oss-expert.com/
>
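The cluster.conf and pgreplicate.conf listings quoted above can be compared mechanically. The following is a minimal Python sketch under two assumptions that this thread does not confirm: that the ports in a host's Replicate_Server_Info entry are meant to mirror the top-level Replication_Port, Recovery_Port and LifeCheck_Port of that host's pgreplicate.conf, and that the files can be parsed as XML once wrapped in a dummy root element. The function names are invented for this example, and only the relevant fragments of the posted files are reproduced. Since the same definitions reportedly worked under pgcluster-1.3.1rc7, any difference it reports is a point to review rather than an established cause.

```python
import xml.etree.ElementTree as ET

# Fragment of the cluster.conf quoted above (same on both hosts).
CLUSTER_CONF = """
<Replicate_Server_Info>
    <Host_Name> host_a </Host_Name>
    <Port> 8001 </Port>
    <Recovery_Port> 8101 </Recovery_Port>
    <LifeCheck_Port> 8201 </LifeCheck_Port>
</Replicate_Server_Info>
<Replicate_Server_Info>
    <Host_Name> host_b </Host_Name>
    <Port> 8002 </Port>
    <Recovery_Port> 8102 </Recovery_Port>
    <LifeCheck_Port> 8202 </LifeCheck_Port>
</Replicate_Server_Info>
"""

# Top-level port settings of the pgreplicate.conf quoted above
# (the posted values are identical on host_a and host_b).
PGREPLICATE_CONF = """
<Replication_Port> 8001 </Replication_Port>
<Recovery_Port> 8101 </Recovery_Port>
<LifeCheck_Port> 8201 </LifeCheck_Port>
"""

# (tag in Replicate_Server_Info, assumed counterpart in pgreplicate.conf)
TAG_PAIRS = [
    ("Port", "Replication_Port"),
    ("Recovery_Port", "Recovery_Port"),
    ("LifeCheck_Port", "LifeCheck_Port"),
]


def parse_fragment(text):
    """Wrap the root-less fragment in a dummy root so it parses as XML."""
    return ET.fromstring(f"<conf>{text}</conf>")


def text_of(node, tag):
    """Return the stripped text of a direct child element, or None."""
    child = node.find(tag)
    return child.text.strip() if child is not None and child.text else None


def port_differences(cluster_text, pgreplicate_text, host):
    """List (cluster tag, value, pgreplicate tag, value) pairs that differ."""
    cluster = parse_fragment(cluster_text)
    replicate = parse_fragment(pgreplicate_text)
    diffs = []
    for entry in cluster.findall("Replicate_Server_Info"):
        if text_of(entry, "Host_Name") != host:
            continue
        for cluster_tag, replicate_tag in TAG_PAIRS:
            expected = text_of(entry, cluster_tag)
            actual = text_of(replicate, replicate_tag)
            if expected != actual:
                diffs.append((cluster_tag, expected, replicate_tag, actual))
    return diffs


for host in ("host_a", "host_b"):
    print(host, port_differences(CLUSTER_CONF, PGREPLICATE_CONF, host))
```

With the values as posted, the host_a entry agrees with its pgreplicate.conf, while the host_b entry (8002/8102/8202) differs from the 8001/8101/8201 listed in host_b's pgreplicate.conf.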