[pgsql-jp: 33422] Database failure caused by PANIC errors

佐藤伸行 n-satoh @ ms3.omn.ne.jp
Tue Jun 29 22:29:54 JST 2004


-------------------------------------------------------------
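Kishida-san, Yamada-san, thank you for pointing this out.
You are both right, and I apologize for the trouble I caused.
I am rewriting my question.
To the list administrator: I am sorry for the extra work, but since I am
asking the question again, please delete Subject: [pgsql-jp: 33415].
I apologize for the unreasonable request and thank you in advance.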
-------------------------------------------------------------


Hello, this is my first post. For the past few months we have been getting PANIC errors whose cause we cannot determine.

【Contents of /var/log/messages】
Jun 27 11:03:22 a4hh-db01 postgres[13533]: [10] PANIC:  open of
/fs/pgsql_db/pg_clog/0207 failed: ??????????????????????
Jun 27 11:03:22 a4hh-db01 postgres[13533]: [11-1] LOG:  statement: SELECT
"opr"."seisansuuryo","opr"."ruikeisuuryo","opr"."startplan","opr"."starttime
","opr"."kouteicd"  FROM
Jun 27 11:03:22 a4hh-db01 postgres[13533]: [11-2]  "opr" WHERE
"opr"."bubunhincd" LIKE '%A1' AND "opr"."sakuban" = '1G2N5U01' AND
"opr"."bunkatsuno" = 0 AND "opr"."kouteicd" =
Jun 27 11:03:22 a4hh-db01 postgres[13533]: [11-3]  'CP08' AND
"opr"."versionno" = '20030001'
Jun 27 11:03:22 a4hh-db01 postgres[855]: [10] LOG:  server process (pid
13533) was terminated by signal 6
Jun 27 11:03:22 a4hh-db01 postgres[855]: [11] LOG:  terminating any other
active server processes
Jun 27 11:03:22 a4hh-db01 postgres[855]: [12] LOG:  all server processes
terminated; reinitializing shared memory and semaphores
Jun 27 11:03:22 a4hh-db01 postgres[13559]: [13] LOG:  database system was
interrupted at 2004-06-27 11:00:07 JST
Jun 27 11:03:22 a4hh-db01 postgres[13559]: [14] LOG:  checkpoint record is
at 133/79723B1C
Jun 27 11:03:22 a4hh-db01 postgres[13559]: [15] LOG:  redo record is at
133/79723B1C; undo record is at 0/0; shutdown FALSE
Jun 27 11:03:22 a4hh-db01 postgres[13559]: [16] LOG:  next transaction id:
133379436; next oid: 389189064
Jun 27 11:03:22 a4hh-db01 postgres[13559]: [17] LOG:  database system was
not properly shut down; automatic recovery in progress
Jun 27 11:03:22 a4hh-db01 postgres[13560]: [13] FATAL:  The database system
is starting up
Jun 27 11:03:22 a4hh-db01 postgres[13559]: [18] LOG:  redo starts at
133/79723B5C
Jun 27 11:03:22 a4hh-db01 postgres[13559]: [19] LOG:  ReadRecord: record
with zero length at 133/797D1730
Jun 27 11:03:22 a4hh-db01 postgres[13559]: [20] LOG:  redo done at
133/797D170C
Jun 27 11:03:25 a4hh-db01 postgres[13559]: [21] LOG:  database system is
ready
Jun 27 11:16:35 a4hh-db01 postgres[14076]: [13] PANIC:  open of
/fs/pgsql_db/pg_clog/0207 failed: ??????????????????????
Jun 27 11:16:35 a4hh-db01 postgres[14076]: [14-1] LOG:  statement: UPDATE
jisseki SET jissekisagyou = opr.sagyoujikan FROM opr WHERE (opr.oprdivno =
jisseki.oprdivno) AND
Jun 27 11:16:35 a4hh-db01 postgres[14076]: [14-2]  (opr.opr = jisseki.opr)
AND (opr.lotno = jisseki.lotno) AND (jisseki.lotno = 'KF04196') AND
(jisseki.opr Between 60 AND 60) AND
Jun 27 11:16:35 a4hh-db01 postgres[14076]: [14-3]  (jisseki.oprdivno=00) AND
(opr.versionno=(SELECT MAX(opr2.versionno) as versionno from opr as opr2))
Jun 27 11:16:35 a4hh-db01 postgres[855]: [13] LOG:  server process (pid
14076) was terminated by signal 6
Jun 27 11:16:35 a4hh-db01 postgres[855]: [14] LOG:  terminating any other
active server processes
Jun 27 11:16:35 a4hh-db01 postgres[14075]: [13-1] WARNING:  Message from
PostgreSQL backend:
Jun 27 11:16:35 a4hh-db01 postgres[14075]: [13-2] ^IThe Postmaster has
informed me that some other backend
Jun 27 11:16:35 a4hh-db01 postgres[14075]: [13-3] ^Idied abnormally and
possibly corrupted shared memory.
Jun 27 11:16:35 a4hh-db01 postgres[14075]: [13-4] ^II have rolled back the
current transaction and am
Jun 27 11:16:35 a4hh-db01 postgres[14075]: [13-5] ^Igoing to terminate your
database system connection and exit.
Jun 27 11:16:35 a4hh-db01 postgres[14075]: [13-6] ^IPlease reconnect to the
database system and repeat your query.
Jun 27 11:16:35 a4hh-db01 postgres[855]: [15] LOG:  all server processes
terminated; reinitializing shared memory and semaphores
Jun 27 11:16:35 a4hh-db01 postgres[14079]: [16] LOG:  database system was
interrupted at 2004-06-27 11:08:28 JST
Jun 27 11:16:35 a4hh-db01 postgres[14079]: [17] LOG:  checkpoint record is
at 133/798426C8
Jun 27 11:16:35 a4hh-db01 postgres[14079]: [18] LOG:  redo record is at
133/798426C8; undo record is at 0/0; shutdown FALSE
Jun 27 11:16:35 a4hh-db01 postgres[14079]: [19] LOG:  next transaction id:
133384229; next oid: 389189064
Jun 27 11:16:35 a4hh-db01 postgres[14079]: [20] LOG:  database system was
not properly shut down; automatic recovery in progress
Jun 27 11:16:35 a4hh-db01 postgres[14079]: [21] LOG:  redo starts at
133/79842708
Jun 27 11:16:35 a4hh-db01 postgres[14079]: [22] LOG:  ReadRecord: record
with zero length at 133/798735C4
Jun 27 11:16:35 a4hh-db01 postgres[14079]: [23] LOG:  redo done at
133/7987156C
Jun 27 11:16:38 a4hh-db01 postgres[14079]: [24] LOG:  database system is
ready

The PANIC occurs on queries against a table that holds more than 200,000 rows,
so I suspect that something is wrong in postgresql.conf,
but I have no conclusive evidence and am stuck.
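
As a cross-check on the log above, here is a rough sketch of how a pg_clog segment file name relates to transaction IDs. The constants are assumptions on my part and are not shown anywhere in the log: the default 8 KB block size, 2 status bits per transaction, and 32 pages per segment file. The helper names are made up for illustration.

```python
from typing import Tuple

# Assumed layout (not taken from the log): 8 KB blocks,
# 4 transactions per byte (2 status bits each), 32 pages per segment file.
BLOCK_SIZE = 8192
XACTS_PER_BYTE = 4
PAGES_PER_SEGMENT = 32
XACTS_PER_SEGMENT = BLOCK_SIZE * XACTS_PER_BYTE * PAGES_PER_SEGMENT


def clog_segment_name(xid: int) -> str:
    """Return the 4-digit hex segment file name that would hold this xid."""
    return f"{xid // XACTS_PER_SEGMENT:04X}"


def xid_range(segment_name: str) -> Tuple[int, int]:
    """Return the first and last xid covered by a segment file name."""
    segment = int(segment_name, 16)
    first = segment * XACTS_PER_SEGMENT
    return first, first + XACTS_PER_SEGMENT - 1


# "next transaction id" reported during the first recovery in the log
next_xid = 133379436
print("segment for next xid:", clog_segment_name(next_xid))
# File named in the PANIC message
print("xid range of 0207:", xid_range("0207"))
```

If these assumptions hold for this server, the file named in the PANIC would cover transaction IDs far above the "next transaction id" reported during recovery, which may be worth keeping in mind when looking for the cause.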

【Size of the table that triggers the PANIC】
select * from pgstattuple('opr');
 table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
 186867712 |      228968 |  85359984 |         45.68 |             1890 |         622932 |               0.33 |   98028056 |        52.46
(1 row)

select database_size('loadcalc');
 database_size
---------------
    4393840884
(1 row)
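
As a sanity check on the figures above, this small sketch recomputes the pgstattuple percentages from the raw byte counts and converts the sizes to more readable units. The 8 KB page size is an assumption (the default); all other numbers are copied from the output above.

```python
# Values copied from the pgstattuple('opr') and database_size() output.
table_len = 186867712
tuple_count = 228968
tuple_len = 85359984
dead_tuple_len = 622932
free_space = 98028056
database_size = 4393840884

PAGE_SIZE = 8192  # assumed default block size


def percent(part: int, whole: int) -> float:
    """Share of `whole` taken by `part`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


tuple_percent = percent(tuple_len, table_len)
dead_tuple_percent = percent(dead_tuple_len, table_len)
free_percent = percent(free_space, table_len)
table_pages = table_len // PAGE_SIZE

print("tuple_percent     :", tuple_percent)
print("dead_tuple_percent:", dead_tuple_percent)
print("free_percent      :", free_percent)
print("table pages       :", table_pages)
print("table size (MiB)  :", round(table_len / 2**20, 1))
print("database (GiB)    :", round(database_size / 2**30, 2))
print("avg tuple (bytes) :", round(tuple_len / tuple_count, 1))
```

The recomputed percentages agree with the reported ones, so roughly half of the table file is free space rather than live rows.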


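So I would like to ask those of you with experience for help. If you see
anything in the postgresql.conf below that I should check, or any settings
that contradict each other, please let me know. If any information is missing,
I will look it up. The OS and PostgreSQL versions are as follows.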

OS: Red Hat Linux 7.3
PostgreSQL version: 7.3.2


【postgresql.conf】
#
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# (The '=' is optional.) White space may be used. Comments are introduced
# with '#' anywhere on a line. The complete list of option names and
# allowed values can be found in the PostgreSQL documentation. The
# commented-out settings shown in this file represent the default values.
#
# Any option can also be given as a command line switch to the
# postmaster, e.g. 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.
#
# This file is read on postmaster startup and when the postmaster
# receives a SIGHUP. If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect, or use
# "pg_ctl reload".


#========================================================================


#
# Connection Parameters
#
tcpip_socket = on
#ssl = false

max_connections = 56 # 32:
superuser_reserved_connections = 8

#port = 5432
#hostname_lookup = false
#show_source_port = false

#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal

#virtual_host = ''

#krb_server_keyfile = ''

#
# Shared Memory Size
#
#!shared_buffers = 112  # 64:min max_connections*2 or 16, 8KB each
#shared_buffers = 224
#2004/05/11 gotoh update(224 * 3)
# shared_buffers = 672
shared_buffers = 1344

max_fsm_relations = 1000 # min 10, fsm is free space map, ~40 bytes
#!max_fsm_pages = 10000  # min 1000, fsm is free space map, ~6 bytes
 max_fsm_pages = 524288
#max_locks_per_transaction = 64 # min 10
#wal_buffers = 8  # min 4, typically 8KB each

#
# Non-shared Memory Sizes
#
#sort_mem = 1024  # min 64, size in KB
sort_mem = 3072
#!vacuum_mem = 8192  # min 1024, size in KB
#vacuum_mem = 12288
#2004/05/11 gotoh update (12288 * 3)
vacuum_mem = 36864

#
# Write-ahead log (WAL)
#
#checkpoint_segments = 3 # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300 # range 30-3600, in seconds
#
#commit_delay = 0  # range 0-100000, in microseconds
#commit_siblings = 5  # range 1-1000
#
#fsync = true
#wal_sync_method = fsync # the default varies across platforms:
#    # fsync, fdatasync, open_sync, or open_datasync
#wal_debug = 0   # range 0-16

#
# Optimizer Parameters
#
#enable_seqscan = true
#enable_indexscan = true
#enable_tidscan = true
#enable_sort = true
#enable_nestloop = true
#enable_mergejoin = true
#enable_hashjoin = true

#effective_cache_size = 1000 # typically 8KB each
#random_page_cost = 4  # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01  # (same)
#cpu_index_tuple_cost = 0.001 # (same)
#cpu_operator_cost = 0.0025 # (same)

#default_statistics_target = 10 # range 1-1000

#
# GEQO Optimizer Parameters
#
#geqo = true
#geqo_selection_bias = 2.0 # range 1.5-2.0
#geqo_threshold = 11
#geqo_pool_size = 0  # default based on tables in statement,
    # range 128-1024
#geqo_effort = 1
#geqo_generations = 0
#geqo_random_seed = -1  # auto-compute seed


#
# Message display
#
#server_min_messages = info # Values, in order of decreasing detail:
    #   debug5, debug4, debug3, debug2, debug1,
    #   info, notice, warning, error, log, fatal,
    #   panic
#client_min_messages = info # Values, in order of decreasing detail:
    #   debug5, debug4, debug3, debug2, debug1,
    #   log, info, notice, warning, error
silent_mode = on

#log_connections = false
#log_pid = false
#log_statement = false
#log_duration = false
#log_timestamp = false

log_min_error_statement = warning # Values in order of increasing severity:
      #   debug5, debug4, debug3, debug2, debug1,
      #   info, notice, warning, error, panic(off)

#debug_print_parse = false
#debug_print_rewritten = false
#debug_print_plan = false
#debug_pretty_print = false

#explain_pretty_print = true

# requires USE_ASSERT_CHECKING
#debug_assertions = true


#
# Syslog
#
syslog = 2   # range 0-2
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'


#
# Statistics
#
#show_parser_stats = false
#show_planner_stats = false
#show_executor_stats = false
#show_statement_stats = false

# requires BTREE_BUILD_STATS
#show_btree_build_stats = false


#
# Access statistics collection
#

#2004/05/07 gotoh update # out
stats_start_collector = true

#stats_reset_on_server_start = true
stats_command_string = true

#2004/05/07 gotoh update # & false out
stats_row_level = true
stats_block_level = true

#
# Lock Tracing
#
#trace_notify = false

# requires LOCK_DEBUG
#trace_locks = false
#trace_userlocks = false
#trace_lwlocks = false
#debug_deadlocks = false
#trace_lock_oidmin = 16384
#trace_lock_table = 0

#
# Misc
#
#autocommit = true
#dynamic_library_path = '$libdir'
#search_path = '$user,public'
#datestyle = 'iso, us'
#timezone = unknown  # actually, defaults to TZ environment setting
#australian_timezones = false
#client_encoding = sql_ascii # actually, defaults to database encoding
#authentication_timeout = 60 # 1-600, in seconds
#deadlock_timeout = 1000 # in milliseconds
#default_transaction_isolation = 'read committed'
#max_expr_depth = 10000  # min 10
#max_files_per_process = 1000 # min 25
#password_encryption = true
#sql_inheritance = true
#transform_null_equals = false
#statement_timeout = 0  # 0 is disabled, in milliseconds
#db_user_namespace = false

#
# Locale settings
#
# (initialized by initdb -- may be changed)
LC_MESSAGES = 'ja_JP.eucJP'
LC_MONETARY = 'ja_JP.eucJP'
LC_NUMERIC = 'ja_JP.eucJP'
LC_TIME = 'ja_JP.eucJP'
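
To make the memory-related settings above easier to review, here is a rough sketch that converts them to bytes. The per-item sizes (8 KB per shared buffer, about 6 bytes per FSM page, about 40 bytes per FSM relation) come from the comments in the file itself. The "every connection runs one sort at the same time" case is a hypothetical worst case I am assuming for illustration, not something I have measured.

```python
# Values copied from the postgresql.conf in this post.
max_connections = 56
shared_buffers = 1344       # buffers, 8 KB each per the file's comment
sort_mem_kb = 3072          # per sort, in KB
vacuum_mem_kb = 36864       # per VACUUM, in KB
max_fsm_pages = 524288      # ~6 bytes each per the file's comment
max_fsm_relations = 1000    # ~40 bytes each per the file's comment

KB = 1024
MB = 1024 * KB

shared_buffers_bytes = shared_buffers * 8 * KB
fsm_bytes = max_fsm_pages * 6 + max_fsm_relations * 40
vacuum_mem_bytes = vacuum_mem_kb * KB

# Hypothetical worst case: each connection runs exactly one sort at once.
worst_case_sort_bytes = max_connections * sort_mem_kb * KB

# Minimum stated in the file's comment: max_connections*2 or 16.
min_shared_buffers = max(16, 2 * max_connections)

print(f"shared_buffers : {shared_buffers_bytes / MB:.1f} MiB")
print(f"free space map : {fsm_bytes / MB:.2f} MiB (approx.)")
print(f"vacuum_mem     : {vacuum_mem_bytes / MB:.1f} MiB")
print(f"sort worst case: {worst_case_sort_bytes / MB:.1f} MiB")
print("shared_buffers >= minimum:", shared_buffers >= min_shared_buffers)
```

This only restates the configured values in common units. It does not show whether any of them is related to the PANIC.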

******************
o(^-^)o NOB o(^-^)o
******************




