[pgsql-jp: 32842] Warning: "Invalid page header in block 11153 of tbl_a; zeroing out page"

Tue Apr 20 17:36:44 JST 2004

My name is 箕山.
I'm sorry to keep asking questions, but I would appreciate your advice.

PostgreSQL version
PostgreSQL 7.3.4 on i686-redhat-linux-gnu, compiled by GCC 2.96

OS
Red Hat Enterprise Linux ES release 2.1

On this setup, when I run
select * from tbl_a limit 1;  or  select count(*) from tbl_a;
error messages such as
WARNING:  Invalid page header in block 11153 of tbl_a; zeroing out page
WARNING:  Invalid page header in block 11154 of tbl_a; zeroing out page
WARNING:  Invalid page header in block 11155 of tbl_a; zeroing out page
WARNING:  Invalid page header in block 11156 of tbl_a; zeroing out page
WARNING:  Invalid page header in block 11157 of tbl_a; zeroing out page
WARNING:  Invalid page header in block 11158 of tbl_a; zeroing out page
WARNING:  Invalid page header in block 11159 of tbl_a; zeroing out page
or
PANIC:  open of /var/lib/pgsql/data/pg_clog/0651 failed: そのようなファイルやディレクトリはありません
(that is, "No such file or directory") appear in the PostgreSQL log.
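
For anyone who wants to look at the affected region of the table file directly, the block numbers in the warnings can be turned into byte offsets. The following is only a minimal sketch: it assumes the default 8192-byte block size (a custom build could differ), and the helper names damaged_blocks, is_contiguous and byte_span are made up for illustration.

```python
import re

# Assumption: the server was built with the default block size (BLCKSZ) of 8192 bytes.
BLOCK_SIZE = 8192

# Matches the warning text exactly as it appears in the log above.
WARNING_RE = re.compile(r"Invalid page header in block (\d+) of (\S+);")

SAMPLE_LOG = """\
WARNING:  Invalid page header in block 11153 of tbl_a; zeroing out page
WARNING:  Invalid page header in block 11154 of tbl_a; zeroing out page
WARNING:  Invalid page header in block 11155 of tbl_a; zeroing out page
WARNING:  Invalid page header in block 11156 of tbl_a; zeroing out page
WARNING:  Invalid page header in block 11157 of tbl_a; zeroing out page
WARNING:  Invalid page header in block 11158 of tbl_a; zeroing out page
WARNING:  Invalid page header in block 11159 of tbl_a; zeroing out page
"""


def damaged_blocks(log_text):
    """Return {table name: sorted list of block numbers} found in the log text."""
    found = {}
    for match in WARNING_RE.finditer(log_text):
        found.setdefault(match.group(2), set()).add(int(match.group(1)))
    return {table: sorted(blocks) for table, blocks in found.items()}


def is_contiguous(blocks):
    """True if the sorted block numbers form one unbroken run."""
    return blocks == list(range(blocks[0], blocks[-1] + 1))


def byte_span(blocks, block_size=BLOCK_SIZE):
    """Return (start, end) byte offsets covering the first through last block."""
    return blocks[0] * block_size, (blocks[-1] + 1) * block_size


if __name__ == "__main__":
    for table, blocks in damaged_blocks(SAMPLE_LOG).items():
        start, end = byte_span(blocks)
        print(table, blocks, "contiguous:", is_contiguous(blocks))
        print("byte offsets in the table file:", start, "to", end)
```

With the seven warnings quoted above, the blocks form a single contiguous run of 7 x 8192 = 57344 bytes. Whether that says anything about the cause is a separate question.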

When the former messages appear, the select returns its result normally (or at
least it appears to).
When the latter message appears, the client receives
pqReadData() -- backend closed the channel unexpectedly
and PostgreSQL appears to restart itself.
The PostgreSQL log at that point looks like this:
PANIC:  open of /var/lib/pgsql/data/pg_clog/0651 failed: そのようなファイルやディレクトリはありません
LOG:  statement: select count(*) from tbl_a;
LOG:  server process (pid 24221) was terminated by signal 6
LOG:  terminating any other active server processes
WARNING:  Message from PostgreSQL backend:
^IThe Postmaster has informed me that some other backend
^Idied abnormally and possibly corrupted shared memory.
^II have rolled back the current transaction and am
^Igoing to terminate your database system connection and exit.
^IPlease reconnect to the database system and repeat your query.
WARNING:  Message from PostgreSQL backend:
^IThe Postmaster has informed me that some other backend
^Idied abnormally and possibly corrupted shared memory.
WARNING:  Message from PostgreSQL backend:
WARNING:  Message from PostgreSQL backend:
WARNING:  Message from PostgreSQL backend:
WARNING:  Message from PostgreSQL backend:
^II have rolled back the current transaction and am
^IThe Postmaster has informed me that some other backend
^IThe Postmaster has informed me that some other backend
^IThe Postmaster has informed me that some other backend
^IThe Postmaster has informed me that some other backend
^Igoing to terminate your database system connection and exit.
^Idied abnormally and possibly corrupted shared memory.
^Idied abnormally and possibly corrupted shared memory.
^Idied abnormally and possibly corrupted shared memory.
^Idied abnormally and possibly corrupted shared memory.
^IPlease reconnect to the database system and repeat your query.
^II have rolled back the current transaction and am
^II have rolled back the current transaction and am
^II have rolled back the current transaction and am
^II have rolled back the current transaction and am
^Igoing to terminate your database system connection and exit.
^Igoing to terminate your database system connection and exit.
^Igoing to terminate your database system connection and exit.
^Igoing to terminate your database system connection and exit.
^IPlease reconnect to the database system and repeat your query.
^IPlease reconnect to the database system and repeat your query.
^IPlease reconnect to the database system and repeat your query.
^IPlease reconnect to the database system and repeat your query.
LOG:  all server processes terminated; reinitializing shared memory and semaphores
LOG:  database system was interrupted at 2004-04-16 07:52:54 JST
LOG:  checkpoint record is at 16/54A4F164
LOG:  redo record is at 16/54A4F164; undo record is at 0/0; shutdown FALSE
LOG:  next transaction id: 180438; next oid: 50751694
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  ReadRecord: record with zero length at 16/54A4F1A4
LOG:  redo is not required
LOG:  recycled transaction log file 0000001600000053
LOG:  database system is ready
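
A side note on the PANIC in this log: the pg_clog file name can be related to a range of transaction IDs. The sketch below assumes that each pg_clog segment covers 0x100000 (1,048,576) transactions and that the file name is the segment number as four hex digits, which is how I understand this PostgreSQL generation to work with the default 8 kB block size. Please treat both as assumptions to verify against the source; the function names are hypothetical.

```python
# Assumption: one pg_clog segment file covers 0x100000 transaction IDs
# (2 status bits per transaction, 8 kB pages, 32 pages per segment).
CLOG_XACTS_PER_SEGMENT = 0x100000


def clog_segment_name(xid):
    """Name of the pg_clog file that would hold the status of this transaction ID."""
    return "%04X" % (xid // CLOG_XACTS_PER_SEGMENT)


def segment_xid_range(name):
    """First and last transaction ID covered by the named pg_clog file."""
    first = int(name, 16) * CLOG_XACTS_PER_SEGMENT
    return first, first + CLOG_XACTS_PER_SEGMENT - 1


if __name__ == "__main__":
    # "next transaction id: 180438" is taken from the recovery log above.
    print("segment for xid 180438:", clog_segment_name(180438))
    print("xid range of segment 0651:", segment_xid_range("0651"))
```

Under those assumptions, the next transaction ID reported in the log (180438) belongs to segment 0000, whereas file 0651 would cover transaction IDs around 1.69 billion. So the missing file is one the server would not be expected to have created yet, which would be consistent with a garbage transaction ID being read from a damaged page. This is only a reading of the log, not a confirmed diagnosis.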

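Based on various past posts I suspected a hardware fault and investigated
together with the hardware vendor's support team, but in the end the vendor
concluded that "this is not a hardware problem but a problem specific to
PostgreSQL."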

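Other points about the system setup that may be relevant:
The system is currently in system testing.
Every day, a nightly batch deletes all rows from the table in question and then
loads the data with psql -c"copy tbl_a・・・・
In addition, the table has an insert trigger that, on each insert, writes
further data into another table.
The table is used so rarely that I cannot be sure, but the problem seems to
occur when a large amount of data is loaded with copy.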

Other notes
I have set
zero_damaged_pages = true
in postgresql.conf. (I understand that this does not address the root cause.)

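Actions I have tried myself:
1. Reinstalled PostgreSQL.
   Result: the same problem recurred.

2. Dropped and re-created the table in question.
   This fixed it temporarily, but the same problem recurred.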

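What should my next step be?
I am thinking of trying an upgrade to a newer version, but I am not sure.
I would be grateful for advice from anyone who has had a similar experience.
Thank you very much in advance.

That is all.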