[pgsql-jp: 41366] トランザクションID周回後の事象につきまして
MIDORI Ikegami
prjcer8 @ mistymode.com
2013年 3月 8日 (金) 11:01:06 JST
池上と申します。
pg_dump以外のバックアップ方法につきましてアドバイス頂きたく、投稿させて頂きました。
以下のpostgreSQLのバージョンで、トランザクションIDが周回したクラスタがあります。
psql (PostgreSQL) 7.4.30 LInux
●トランザクションID周回後の状況
template1=# SELECT datname, datvacuumxid, datfrozenxid,
age(datfrozenxid) from pg_database;
datname | datvacuumxid | datfrozenxid | age
--------------------+--------------+--------------+-------------
aaaaaaaaa | 464 | 464 | -1139367901
bb | 464 | 464 | -1139367901
ccccccc | 464 | 464 | -1139367901
ddddddd | 464 | 464 | -1139367901
eeeeeee | 464 | 464 | -1139367901
hoge | 464 | 464 | -1139367901
hogefuga | 464 | 464 | -1139367901
(7 rows)
この状況から、2度ほどVACUUMをかけて、以下のような状態となりました。
bash-3.00$ vacuumdb -a -v -z
●VACUUM実施後
template1=# SELECT datname,datvacuumxid,datfrozenxid,age(datfrozenxid)
FROM pg_database;
datname | datvacuumxid | datfrozenxid | age
-----------------------+--------------+--------------+-------------
test | 3155603132 | 2081861309 | 1073749791
testtest-com | 3155603221 | 2081861398 | 1073749702
test000 | 3155603307 | 2081861484 | 1073749616
test001 | 3155603398 | 2081861575 | 1073749525
template1 | 3155603533 | 2081861710 | 1073749390
template0 | 464 | 464 | -1139356660
testtest000000 | 3155603619 | 2081861796 | 1073749304
testtest001 | 3155603715 | 2081861892 | 1073749208
test00002 | 3155603834 | 2081862011 | 1073749089
aa | 3155604087 | 2081862264 | 1073748836
aaaaaa | 3155604487 | 2081862664 | 1073748436
kkk | 3155606078 | 2081864255 | 1073746845
0000 | 3155607520 | 2081865697 | 1073745403
0001 | 3155607906 | 2081866083 | 1073745017
aaaaaaaaa | 3155608292 | 2081866469 | 1073744631
bb | 3155608411 | 2081866588 | 1073744512
ccccccc | 3155608506 | 2081866683 | 1073744417
ddddddd | 3155608908 | 2081867085 | 1073744015
eeeeeee | 3155608994 | 2081867171 | 1073743929
hoge | 3155609080 | 2081867257 | 1073743843
hogefuga | 3155609166 | 2081867343 | 1073743757
(21 rows)
VACUUM実施後は、それまでシステム的に不可視状態であったデータベースのIDや、
オーナー情報等が取得出来る状態にはなったのですが、この状態でpg_dumpallを実施すると
以下のエラーが出て中断してしまいます。
bash-3.00$ pg_dumpall -Upostgres -v > testdump.`date +%Y%m%d`
pg_dumpall: executing SELECT usename, usesysid, passwd, usecreatedb,
usesuper, valuntil FROM pg_shadow WHERE usesysid <> (SELECT datdba
FROM pg_database WHERE datname = 'template0')
pg_dumpall: executing SELECT useconfig[1] FROM pg_shadow WHERE usename
= '000000';
pg_dumpall: executing SELECT useconfig[1] FROM pg_shadow WHERE usename
= '000001';
pg_dumpall: executing SELECT useconfig[1] FROM pg_shadow WHERE usename
= '000002';
pg_dumpall: executing SELECT useconfig[1] FROM pg_shadow WHERE usename
= '000003';
pg_dumpall: executing SELECT useconfig[1] FROM pg_shadow WHERE usename
= '000004';
pg_dumpall: executing SELECT useconfig[1] FROM pg_shadow WHERE usename
= '000005';
pg_dumpall: executing SELECT useconfig[1] FROM pg_shadow WHERE usename
= '000006';
pg_dumpall: executing SELECT useconfig[1] FROM pg_shadow WHERE usename
= '000007';
pg_dumpall: executing SELECT useconfig[1] FROM pg_shadow WHERE usename
= '000008';
pg_dumpall: executing SELECT groname, grosysid, grolist FROM pg_group
pg_dumpall: executing SELECT datname, coalesce(usename, (select
usename from pg_shadow where usesysid=(select datdba from pg_database
where datname='template0'))), pg_encoding_to_char(d.encoding),
datistemplate, datpath, datacl FROM pg_database d LEFT JOIN pg_shadow
u ON (datdba = usesysid) WHERE datallowconn ORDER BY 1
pg_dumpall: executing SELECT datconfig[1] FROM pg_database WHERE
datname = 'test';
pg_dumpall: executing SELECT datconfig[1] FROM pg_database WHERE
datname = 'testtest-com ';
pg_dumpall: executing SELECT datconfig[1] FROM pg_database WHERE
datname = 'test000';
pg_dumpall: executing SELECT datconfig[1] FROM pg_database WHERE
datname = 'test001';
pg_dumpall: executing SELECT datconfig[1] FROM pg_database WHERE
datname = 'testtest000000';
pg_dumpall: executing SELECT datconfig[1] FROM pg_database WHERE
datname = 'testtest001';
pg_dumpall: executing SELECT datconfig[1] FROM pg_database WHERE
datname = 'test00002';
pg_dumpall: executing SELECT datconfig[1] FROM pg_database WHERE datname = 'aa';
pg_dumpall: executing SELECT datconfig[1] FROM pg_database WHERE
datname = 'aaaaaa';
pg_dumpall: executing SELECT datconfig[1] FROM pg_database WHERE
datname = 'kkk';
pg_dumpall: executing SELECT datconfig[1] FROM pg_database WHERE
datname = '0000';
pg_dumpall: executing SELECT datconfig[1] FROM pg_database WHERE
datname = '0001';
pg_dumpall: executing SELECT datconfig[1] FROM pg_database WHERE
datname = 'aaaaaaaaa';
pg_dumpall: executing SELECT datconfig[1] FROM pg_database WHERE datname = 'bb';
pg_dumpall: executing SELECT datconfig[1] FROM pg_database WHERE
datname = 'ccccccc';
pg_dumpall: executing SELECT datconfig[1] FROM pg_database WHERE
datname = 'ddddddd';
pg_dumpall: executing SELECT datconfig[1] FROM pg_database WHERE
datname = 'eeeeeee';
pg_dumpall: executing SELECT datconfig[1] FROM pg_database WHERE
datname = 'hoge';
pg_dumpall: executing SELECT datconfig[1] FROM pg_database WHERE
datname = 'hogefuga';
pg_dumpall: executing SELECT datname FROM pg_database WHERE
datallowconn ORDER BY 1
pg_dumpall: dumping database "test"...
pg_dumpall: running "pg_dump -U 'postgres' -v -Fp 'test'"
pg_dump: saving encoding
pg_dump: saving database definition
pg_dump: reading schemas
pg_dump: query to obtain list of schemas failed: ERROR: more than one
row returned by a subquery used as an expression
pg_dump: *** aborted because of error
pg_dumpall: pg_dump failed on database "test", exiting
pg_dumpallではなく、任意のデータベースに対してpg_dumpを実施した際も結果は同じでした。
-bash-3.00$ pg_dump -Upostgres -v hogefuga > hogefugadump.`date +%Y%m%d`
pg_dump: saving encoding
pg_dump: saving database definition
pg_dump: reading schemas
pg_dump: query to obtain list of schemas failed: ERROR: more than one
row returned by a subquery used as an expression
pg_dump: *** aborted because of error
原因を調査していたところ、以下の事象が発生している事が発覚致しました。
どうも、同じ名称のスキーマが2つずつ存在しているようなのです。
hogefuga=# \dn
List of schemas
Name | Owner
--------------------+----------
information_schema | postgres
information_schema | postgres
pg_catalog | postgres
pg_catalog | postgres
pg_temp_1 | postgres
pg_temp_1 | postgres
pg_toast | postgres
pg_toast | postgres
public | postgres
public | postgres
(10 rows)
更に、postgresユーザーも2ユーザー出来てしまっています。
いずれもusesysidは同一となっております。
postgresユーザーを削除しようと思いDROP USER postgres;を実行しても、
★印のほうはお化けのように残ってしまい、削除が出来ません。
hogefuga=# SELECT * FROM pg_user where usename='postgres';
usename | usesysid | usecreatedb | usesuper | usecatupd | passwd |
valuntil | useconfig
----------+----------+-------------+----------+-----------+----------+----------+-----------
postgres | 1 | t | t | t | ******** |
| ★
postgres | 1 | t | t | t | ******** |
infinity |
(2 rows)
VACUUMによって強制的に周回を戻した代償なのかな、とは思っているのですが、
オンラインでバックアップが取れない事が問題です。
postgreSQL7系という事もありますので、PITRも利用出来ません。
何かバックアップで他の代替方法等ご存知の方がいらっしゃいましたら、
お知恵を拝借頂けましたら幸いです。
(最悪、コールドバックアップしかない事は覚悟しております)
宜しくお願い致します。
pgsql-jp メーリングリストの案内