[pgsql-jp: 33415] Regarding a PANIC failure
佐藤伸行
n-satoh @ ms3.omn.ne.jp
Tue Jun 29 19:03:25 JST 2004
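Hello everyone, this is my first post here.
For the past several months we have been hitting a PANIC failure whose cause we cannot identify.
[Contents of /var/log/messages]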
Jun 27 11:03:22 a4hh-db01 postgres[13533]: [10] PANIC: open of
/fs/pgsql_db/pg_clog/0207 failed: ??????????????????????
Jun 27 11:03:22 a4hh-db01 postgres[13533]: [11-1] LOG: statement: SELECT
"opr"."seisansuuryo","opr"."ruikeisuuryo","opr"."startplan","opr"."starttime
","opr"."kouteicd" FROM
Jun 27 11:03:22 a4hh-db01 postgres[13533]: [11-2] "opr" WHERE
"opr"."bubunhincd" LIKE '%A1' AND "opr"."sakuban" = '1G2N5U01' AND
"opr"."bunkatsuno" = 0 AND "opr"."kouteicd" =
Jun 27 11:03:22 a4hh-db01 postgres[13533]: [11-3] 'CP08' AND
"opr"."versionno" = '20030001'
Jun 27 11:03:22 a4hh-db01 postgres[855]: [10] LOG: server process (pid
13533) was terminated by signal 6
Jun 27 11:03:22 a4hh-db01 postgres[855]: [11] LOG: terminating any other
active server processes
Jun 27 11:03:22 a4hh-db01 postgres[855]: [12] LOG: all server processes
terminated; reinitializing shared memory and semaphores
Jun 27 11:03:22 a4hh-db01 postgres[13559]: [13] LOG: database system was
interrupted at 2004-06-27 11:00:07 JST
Jun 27 11:03:22 a4hh-db01 postgres[13559]: [14] LOG: checkpoint record is
at 133/79723B1C
Jun 27 11:03:22 a4hh-db01 postgres[13559]: [15] LOG: redo record is at
133/79723B1C; undo record is at 0/0; shutdown FALSE
Jun 27 11:03:22 a4hh-db01 postgres[13559]: [16] LOG: next transaction id:
133379436; next oid: 389189064
Jun 27 11:03:22 a4hh-db01 postgres[13559]: [17] LOG: database system was
not properly shut down; automatic recovery in progress
Jun 27 11:03:22 a4hh-db01 postgres[13560]: [13] FATAL: The database system
is starting up
Jun 27 11:03:22 a4hh-db01 postgres[13559]: [18] LOG: redo starts at
133/79723B5C
Jun 27 11:03:22 a4hh-db01 postgres[13559]: [19] LOG: ReadRecord: record
with zero length at 133/797D1730
Jun 27 11:03:22 a4hh-db01 postgres[13559]: [20] LOG: redo done at
133/797D170C
Jun 27 11:03:25 a4hh-db01 postgres[13559]: [21] LOG: database system is
ready
Jun 27 11:16:35 a4hh-db01 postgres[14076]: [13] PANIC: open of
/fs/pgsql_db/pg_clog/0207 failed: ??????????????????????
Jun 27 11:16:35 a4hh-db01 postgres[14076]: [14-1] LOG: statement: UPDATE
jisseki SET jissekisagyou = opr.sagyoujikan FROM opr WHERE (opr.oprdivno =
jisseki.oprdivno) AND
Jun 27 11:16:35 a4hh-db01 postgres[14076]: [14-2] (opr.opr = jisseki.opr)
AND (opr.lotno = jisseki.lotno) AND (jisseki.lotno = 'KF04196') AND
(jisseki.opr Between 60 AND 60) AND
Jun 27 11:16:35 a4hh-db01 postgres[14076]: [14-3] (jisseki.oprdivno=00) AND
(opr.versionno=(SELECT MAX(opr2.versionno) as versionno from opr as opr2))
Jun 27 11:16:35 a4hh-db01 postgres[855]: [13] LOG: server process (pid
14076) was terminated by signal 6
Jun 27 11:16:35 a4hh-db01 postgres[855]: [14] LOG: terminating any other
active server processes
Jun 27 11:16:35 a4hh-db01 postgres[14075]: [13-1] WARNING: Message from
PostgreSQL backend:
Jun 27 11:16:35 a4hh-db01 postgres[14075]: [13-2] ^IThe Postmaster has
informed me that some other backend
Jun 27 11:16:35 a4hh-db01 postgres[14075]: [13-3] ^Idied abnormally and
possibly corrupted shared memory.
Jun 27 11:16:35 a4hh-db01 postgres[14075]: [13-4] ^II have rolled back the
current transaction and am
Jun 27 11:16:35 a4hh-db01 postgres[14075]: [13-5] ^Igoing to terminate your
database system connection and exit.
Jun 27 11:16:35 a4hh-db01 postgres[14075]: [13-6] ^IPlease reconnect to the
database system and repeat your query.
Jun 27 11:16:35 a4hh-db01 postgres[855]: [15] LOG: all server processes
terminated; reinitializing shared memory and semaphores
Jun 27 11:16:35 a4hh-db01 postgres[14079]: [16] LOG: database system was
interrupted at 2004-06-27 11:08:28 JST
Jun 27 11:16:35 a4hh-db01 postgres[14079]: [17] LOG: checkpoint record is
at 133/798426C8
Jun 27 11:16:35 a4hh-db01 postgres[14079]: [18] LOG: redo record is at
133/798426C8; undo record is at 0/0; shutdown FALSE
Jun 27 11:16:35 a4hh-db01 postgres[14079]: [19] LOG: next transaction id:
133384229; next oid: 389189064
Jun 27 11:16:35 a4hh-db01 postgres[14079]: [20] LOG: database system was
not properly shut down; automatic recovery in progress
Jun 27 11:16:35 a4hh-db01 postgres[14079]: [21] LOG: redo starts at
133/79842708
Jun 27 11:16:35 a4hh-db01 postgres[14079]: [22] LOG: ReadRecord: record
with zero length at 133/798735C4
Jun 27 11:16:35 a4hh-db01 postgres[14079]: [23] LOG: redo done at
133/7987156C
Jun 27 11:16:38 a4hh-db01 postgres[14079]: [24] LOG: database system is
ready
The table that triggers the PANIC holds more than 200,000 rows. Since the
errors occur on that table, I suspect something is wrong in postgresql.conf,
but I have no decisive evidence and am stuck.
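Both PANIC lines in the log name the file pg_clog/0207, while the recovery output reports the next transaction id as 133379436. The sketch below relates a commit-log segment file name to the transaction ids it would cover. It assumes the default 8 KB block size and a commit-log layout of 2 status bits per transaction and 32 pages per segment file; neither is stated in the post, and the function names are illustrative only.

```python
# Assumptions (not stated in the post): default 8 KB block size, 2 status
# bits per transaction, and 32 commit-log pages per segment file.
BLOCK_SIZE = 8192
XACTS_PER_BYTE = 4            # 2 bits per transaction
PAGES_PER_SEGMENT = 32
XACTS_PER_SEGMENT = BLOCK_SIZE * XACTS_PER_BYTE * PAGES_PER_SEGMENT


def segment_for_xid(xid):
    """Name of the pg_clog segment file that would hold the status of xid."""
    return format(xid // XACTS_PER_SEGMENT, "04X")


def xid_range_for_segment(name):
    """First and last transaction id covered by a pg_clog segment file."""
    first = int(name, 16) * XACTS_PER_SEGMENT
    return first, first + XACTS_PER_SEGMENT - 1


if __name__ == "__main__":
    next_xid = 133379436      # "next transaction id" from the recovery log
    print("segment for next xid:", segment_for_xid(next_xid))
    print("xids covered by 0207:", xid_range_for_segment("0207"))
```

Under these assumptions the next transaction id falls in segment 007F, whereas segment 0207 would start at transaction id 544210944, well beyond any id the server has handed out so far. That may mean the failing queries are reading a row whose header carries an invalid transaction id, which would be worth ruling out alongside the configuration.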
[Size of the table that triggers the PANIC]
select * from pgstattuple('opr');
 table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
 186867712 |      228968 |  85359984 |         45.68 |             1890 |         622932 |               0.33 |   98028056 |        52.46
(1 row)
select database_size('loadcalc');
database_size
---------------
4393840884
(1 row)
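To put the byte counts above into more familiar units, the following sketch recomputes the percentages and converts the sizes. The numbers are copied from the two query results; the 8 KB block size is an assumption (the default), and the helper names are hypothetical.

```python
# Figures copied from the pgstattuple('opr') and database_size('loadcalc') output.
table_len = 186867712
tuple_len = 85359984
dead_tuple_len = 622932
free_space = 98028056
database_size = 4393840884

BLOCK_SIZE = 8192  # assumption: default block size


def percent(part, whole):
    """Share of whole taken by part, as a percentage with two decimals."""
    return round(100.0 * part / whole, 2)


def summarize():
    """Derived figures for the opr table and the loadcalc database."""
    return {
        "pages": table_len // BLOCK_SIZE,
        "live_percent": percent(tuple_len, table_len),
        "dead_percent": percent(dead_tuple_len, table_len),
        "free_percent": percent(free_space, table_len),
        "table_mib": round(table_len / 2**20, 1),
        "database_gib": round(database_size / 2**30, 2),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

The recomputed percentages match the pgstattuple columns, and the table length is an exact multiple of 8192, which is consistent with the assumed block size. Slightly more than half of the table's space is reported as free.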
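So I would like to ask the experts on this list: if you see anything in the
postgresql.conf below that should be checked, or any inconsistencies,
please let me know. If any information is missing, I will look it up.
[postgresql.conf]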
#
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
# name = value
#
# (The '=' is optional.) White space may be used. Comments are introduced
# with '#' anywhere on a line. The complete list of option names and
# allowed values can be found in the PostgreSQL documentation. The
# commented-out settings shown in this file represent the default values.
#
# Any option can also be given as a command line switch to the
# postmaster, e.g. 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.
#
# This file is read on postmaster startup and when the postmaster
# receives a SIGHUP. If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect, or use
# "pg_ctl reload".
#========================================================================
#
# Connection Parameters
#
tcpip_socket = on
#ssl = false
max_connections = 56 # 32:
superuser_reserved_connections = 8
#port = 5432
#hostname_lookup = false
#show_source_port = false
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal
#virtual_host = ''
#krb_server_keyfile = ''
#
# Shared Memory Size
#
#!shared_buffers = 112 # 64:min max_connections*2 or 16, 8KB each
#shared_buffers = 224
#2004/05/11 gotoh update(224 * 3)
# shared_buffers = 672
shared_buffers = 1344
max_fsm_relations = 1000 # min 10, fsm is free space map, ~40 bytes
#!max_fsm_pages = 10000 # min 1000, fsm is free space map, ~6 bytes
max_fsm_pages = 524288
#max_locks_per_transaction = 64 # min 10
#wal_buffers = 8 # min 4, typically 8KB each
#
# Non-shared Memory Sizes
#
#sort_mem = 1024 # min 64, size in KB
sort_mem = 3072
#!vacuum_mem = 8192 # min 1024, size in KB
#vacuum_mem = 12288
#2004/05/11 gotoh update (12288 * 3)
vacuum_mem = 36864
#
# Write-ahead log (WAL)
#
#checkpoint_segments = 3 # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300 # range 30-3600, in seconds
#
#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000
#
#fsync = true
#wal_sync_method = fsync # the default varies across platforms:
# # fsync, fdatasync, open_sync, or open_datasync
#wal_debug = 0 # range 0-16
#
# Optimizer Parameters
#
#enable_seqscan = true
#enable_indexscan = true
#enable_tidscan = true
#enable_sort = true
#enable_nestloop = true
#enable_mergejoin = true
#enable_hashjoin = true
#effective_cache_size = 1000 # typically 8KB each
#random_page_cost = 4 # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01 # (same)
#cpu_index_tuple_cost = 0.001 # (same)
#cpu_operator_cost = 0.0025 # (same)
#default_statistics_target = 10 # range 1-1000
#
# GEQO Optimizer Parameters
#
#geqo = true
#geqo_selection_bias = 2.0 # range 1.5-2.0
#geqo_threshold = 11
#geqo_pool_size = 0 # default based on tables in statement,
# range 128-1024
#geqo_effort = 1
#geqo_generations = 0
#geqo_random_seed = -1 # auto-compute seed
#
# Message display
#
#server_min_messages = info # Values, in order of decreasing detail:
# debug5, debug4, debug3, debug2, debug1,
# info, notice, warning, error, log, fatal,
# panic
#client_min_messages = info # Values, in order of decreasing detail:
# debug5, debug4, debug3, debug2, debug1,
# log, info, notice, warning, error
silent_mode = on
#log_connections = false
#log_pid = false
#log_statement = false
#log_duration = false
#log_timestamp = false
log_min_error_statement = warning # Values in order of increasing severity:
# debug5, debug4, debug3, debug2, debug1,
# info, notice, warning, error, panic(off)
#debug_print_parse = false
#debug_print_rewritten = false
#debug_print_plan = false
#debug_pretty_print = false
#explain_pretty_print = true
# requires USE_ASSERT_CHECKING
#debug_assertions = true
#
# Syslog
#
syslog = 2 # range 0-2
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'
#
# Statistics
#
#show_parser_stats = false
#show_planner_stats = false
#show_executor_stats = false
#show_statement_stats = false
# requires BTREE_BUILD_STATS
#show_btree_build_stats = false
#
# Access statistics collection
#
#2004/05/07 gotoh update # out
stats_start_collector = true
#stats_reset_on_server_start = true
stats_command_string = true
#2004/05/07 gotoh update # & false out
stats_row_level = true
stats_block_level = true
#
# Lock Tracing
#
#trace_notify = false
# requires LOCK_DEBUG
#trace_locks = false
#trace_userlocks = false
#trace_lwlocks = false
#debug_deadlocks = false
#trace_lock_oidmin = 16384
#trace_lock_table = 0
#
# Misc
#
#autocommit = true
#dynamic_library_path = '$libdir'
#search_path = '$user,public'
#datestyle = 'iso, us'
#timezone = unknown # actually, defaults to TZ environment setting
#australian_timezones = false
#client_encoding = sql_ascii # actually, defaults to database encoding
#authentication_timeout = 60 # 1-600, in seconds
#deadlock_timeout = 1000 # in milliseconds
#default_transaction_isolation = 'read committed'
#max_expr_depth = 10000 # min 10
#max_files_per_process = 1000 # min 25
#password_encryption = true
#sql_inheritance = true
#transform_null_equals = false
#statement_timeout = 0 # 0 is disabled, in milliseconds
#db_user_namespace = false
#
# Locale settings
#
# (initialized by initdb -- may be changed)
LC_MESSAGES = 'ja_JP.eucJP'
LC_MONETARY = 'ja_JP.eucJP'
LC_NUMERIC = 'ja_JP.eucJP'
LC_TIME = 'ja_JP.eucJP'
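As a rough aid for reviewing the memory-related settings, the sketch below converts the values that differ from the defaults into bytes, using the units given in the file's own comments (8 KB per shared buffer, KB for sort_mem and vacuum_mem, about 6 bytes per free-space-map page, about 40 bytes per relation). It also checks the minimum noted in the shared_buffers comment (max_connections*2 or 16). The function names are hypothetical and the results are approximations, not the server's actual allocation.

```python
# Values copied from the configuration above; units follow the file's comments.
settings = {
    "shared_buffers": 1344,     # buffers, 8 KB each
    "sort_mem": 3072,           # KB
    "vacuum_mem": 36864,        # KB
    "max_fsm_pages": 524288,    # pages, ~6 bytes each
    "max_fsm_relations": 1000,  # relations, ~40 bytes each
    "max_connections": 56,
}


def to_bytes(s):
    """Approximate size in bytes implied by each memory-related setting."""
    return {
        "shared_buffers": s["shared_buffers"] * 8192,
        "sort_mem": s["sort_mem"] * 1024,
        "vacuum_mem": s["vacuum_mem"] * 1024,
        "max_fsm_pages": s["max_fsm_pages"] * 6,
        "max_fsm_relations": s["max_fsm_relations"] * 40,
    }


def shared_buffers_meets_minimum(s):
    """Check the minimum from the file's comment: max_connections*2 or 16."""
    return s["shared_buffers"] >= max(s["max_connections"] * 2, 16)


if __name__ == "__main__":
    for name, size in to_bytes(settings).items():
        print(f"{name}: {size} bytes ({size / 2**20:.2f} MiB)")
    print("shared_buffers minimum satisfied:",
          shared_buffers_meets_minimum(settings))
```

With these units, shared_buffers comes to 10.5 MiB, sort_mem to 3 MiB, vacuum_mem to 36 MiB, and the free space map pages to about 3 MiB, and shared_buffers is above the documented minimum of 112 for 56 connections.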
******************
o(^-^)o NOB o(^-^)o
******************