[pgsql-jp: 36131] pg_clogディレクトリ内のログが参照できずPANIC

松本 貴臣 matsumoto_takaomi @ nextcom.co.jp
2005年 10月 14日 (金) 08:45:59 JST


お世話になります。
松本と申します。

readhat+java+postgres7.3.6を採用し、システムを構築しております。
以下のようなPANICが頻発し、原因の究明にあたっておりますが、
行き詰まっております。
皆様のお知恵を頂戴できればと思い、ご連絡差し上げた次第でございます。

[事象]
以下のようなログが発生し、コネクションがクローズされる。

=========================================================
postgres[28659]: [3] PANIC:  open of /home/postgres/data/pg_clog/001B
failed: そのようなファイルやディレクトリはありません
postgres[28659]: [4] LOG:  statement: select * from pgstattuple('tbl')
postgres[2279]: [1] LOG:  server process (pid 28659) was terminated by
signal 6
postgres[2279]: [2] LOG:  terminating any other active server processes
postgres[28026]: [1202-1] WARNING:  Message from PostgreSQL backend:
   (省略)
=========================================================

ログを遡ってみたところ、pg_clogの001Bは、以下のような形で
removeされているようでした。

========================================================
postgres[28582]: [106] LOG:  recycled transaction log file 0000028600000075
postgres[28582]: [107] LOG:  recycled transaction log file 0000028600000074
postgres[28582]: [108] LOG:  removing commit log file 001B
========================================================


今回の処理としては、rootのcronにて以下のような形にしております。
(vacuumの前後の状態を統計として蓄積しようとしております)

========================================================
 1. select * from pgstattuple('tbl')
 2. vacuumdb -a -U postgres -z
 3. select * from pgstattuple('tbl')
========================================================

「1」では問題なく処理が終了しているようです。
「2」の最後で「removing commit log file 001B」されているようでした。
「3」1とまったく同じ処理を実施しているのですが、
 以下のようなログが出力されました。
 (長くなってしまい、申し訳ございません。)

以後、PANICが出るたびにコネクションが切断され、
サービスに影響が出てしまっております。

vacuum処理の中で、実施された「removing commit log file 001B」が
原因と考えておりますが、色々とWebで検索したもののこれといったものが
見つからず、投稿させていただいた次第です。
(postgresの後の[ ]内のPIDからvacuumが実施した処理で
 あると考えております。)

[ご質問]
 * これはDBが破損した状態で復旧はできないのでしょうか。
 * pg_resetxlogを使えるかもと思ったのですが、
  「最後の手段」的な記述があったため、躊躇っております。
   他にリスクの低い方法はございませんでしょうか。
 * なぜ消してはいけないと思われるコミットログをvacuumは
  消してしまったのでしょうか・・・・。

乱文で申し訳ございません。
以上、よろしくお願いいたします。

========================================================
postgres[28659]: [3] PANIC:  open of /home/postgres/data/pg_clog/001B
failed: そのようなファイルやディレクトリはありません
postgres[28659]: [4] LOG:  statement: select * from pgstattuple('tbl')
postgres[2279]: [1] LOG:  server process (pid 28659) was terminated by
signal 6
postgres[2279]: [2] LOG:  terminating any other active server processes
postgres[28026]: [1202-1] WARNING:  Message from PostgreSQL backend:
postgres[28026]: [1202-2] ^IThe Postmaster has informed me that some
other backend
postgres[28026]: [1202-3] ^Idied abnormally and possibly corrupted
shared memory.
postgres[28026]: [1202-4] ^II have rolled back the current transaction
and am
postgres[28026]: [1202-5] ^Igoing to terminate your database system
connection and exit.
postgres[28663]: [3] LOG:  connection received: host=[local]
postgres[28177]: [392-1] WARNING:  Message from PostgreSQL backend:
postgres[28027]: [586-1] WARNING:  Message from PostgreSQL backend:
postgres[28172]: [388-1] WARNING:  Message from PostgreSQL backend:
postgres[28026]: [1202-6] ^IPlease reconnect to the database system and
repeat your query.
postgres[28663]: [4] FATAL:  The database system is in recovery mode
postgres[28177]: [392-2] ^IThe Postmaster has informed me that some
other backend
postgres[28027]: [586-2] ^IThe Postmaster has informed me that some
other backend
postgres[28172]: [388-2] ^IThe Postmaster has informed me that some
other backend
postgres[28667]: [3] LOG:  connection received: host=[local]
postgres[28177]: [392-3] ^Idied abnormally and possibly corrupted shared
memory.
postgres[28027]: [586-3] ^Idied abnormally and possibly corrupted shared
memory.
postgres[28172]: [388-3] ^Idied abnormally and possibly corrupted shared
memory.
postgres[28667]: [4] FATAL:  The database system is in recovery mode
postgres[28177]: [392-4] ^II have rolled back the current transaction and am
postgres[28027]: [586-4] ^II have rolled back the current transaction and am
postgres[28172]: [388-4] ^II have rolled back the current transaction and am
postgres[28671]: [3] LOG:  connection received: host=[local]
postgres[28177]: [392-5] ^Igoing to terminate your database system
connection and exit.
postgres[28027]: [586-5] ^Igoing to terminate your database system
connection and exit.
postgres[28172]: [388-5] ^Igoing to terminate your database system
connection and exit.
postgres[28671]: [4] FATAL:  The database system is in recovery mode
postgres[28177]: [392-6] ^IPlease reconnect to the database system and
repeat your query.
postgres[28027]: [586-6] ^IPlease reconnect to the database system and
repeat your query.
postgres[28172]: [388-6] ^IPlease reconnect to the database system and
repeat your query.
postgres[28675]: [3] LOG:  connection received: host=[local]
postgres[28675]: [4] FATAL:  The database system is in recovery mode
postgres[28679]: [3] LOG:  connection received: host=[local]
postgres[28679]: [4] FATAL:  The database system is in recovery mode
postgres[28683]: [3] LOG:  connection received: host=[local]
postgres[28683]: [4] FATAL:  The database system is in recovery mode
postgres[28687]: [3] LOG:  connection received: host=[local]
postgres[28687]: [4] FATAL:  The database system is in recovery mode
postgres[28691]: [3] LOG:  connection received: host=[local]
postgres[28691]: [4] FATAL:  The database system is in recovery mode
postgres[28695]: [3] LOG:  connection received: host=[local]
postgres[28695]: [4] FATAL:  The database system is in recovery mode
postgres[2279]: [3] LOG:  all server processes terminated;
reinitializing shared memory and semaphores
postgres[28700]: [4] LOG:  connection received: host=[local]
postgres[28700]: [5] FATAL:  The database system is starting up
postgres[28704]: [4] LOG:  connection received: host=[local]
postgres[28704]: [5] FATAL:  The database system is starting up
postgres[28699]: [4] LOG:  database system was interrupted at 2005-10-13
01:09:16 JST
postgres[28699]: [5] LOG:  checkpoint record is at 286/77FD28A4
postgres[28699]: [6] LOG:  redo record is at 286/77FD28A4; undo record
is at 0/0; shutdown FALSE
postgres[28699]: [7] LOG:  next transaction id: 29988496; next oid: 3678961
postgres[28699]: [8] LOG:  database system was not properly shut down;
automatic recovery in progress
postgres[28708]: [4] LOG:  connection received: host=[local]
postgres[28708]: [5] FATAL:  The database system is starting up
postgres[28712]: [4] LOG:  connection received: host=[local]
postgres[28712]: [5] FATAL:  The database system is starting up
postgres[28716]: [4] LOG:  connection received: host=[local]
postgres[28699]: [9] LOG:  ReadRecord: record with zero length at
286/77FD28E4
postgres[28716]: [5] FATAL:  The database system is starting up
postgres[28699]: [10] LOG:  redo is not required
postgres[28720]: [4] LOG:  connection received: host=[local]
postgres[28720]: [5] FATAL:  The database system is starting up
postgres[28724]: [4] LOG:  connection received: host=[local]
postgres[28724]: [5] FATAL:  The database system is starting up
postgres[28728]: [4] LOG:  connection received: host=[local]
postgres[28728]: [5] FATAL:  The database system is starting up
postgres[28732]: [4] LOG:  connection received: host=[local]
postgres[28732]: [5] FATAL:  The database system is starting up
postgres[28736]: [4] LOG:  connection received: host=[local]
postgres[28736]: [5] FATAL:  The database system is starting up
postgres[28740]: [4] LOG:  connection received: host=[local]
postgres[28740]: [5] FATAL:  The database system is starting up
postgres[28744]: [4] LOG:  connection received: host=[local]
postgres[28744]: [5] FATAL:  The database system is starting up
postgres[28748]: [4] LOG:  connection received: host=[local]
postgres[28748]: [5] FATAL:  The database system is starting up
postgres[28752]: [4] LOG:  connection received: host=[local]
postgres[28752]: [5] FATAL:  The database system is starting up
postgres[28756]: [4] LOG:  connection received: host=[local]
postgres[28756]: [5] FATAL:  The database system is starting up
postgres[28760]: [4] LOG:  connection received: host=[local]
postgres[28760]: [5] FATAL:  The database system is starting up
postgres[28764]: [4] LOG:  connection received: host=[local]
postgres[28764]: [5] FATAL:  The database system is starting up
postgres[28699]: [11] LOG:  recycled transaction log file 0000028600000076
postgres[28699]: [12] LOG:  database system is ready
========================================================






pgsql-jp メーリングリストの案内