[pgsql-jp: 30357] Re: indexを使用しない/ 使用する違いは？

2003年 7月 3日 (木) 10:00:13 JST

  杉田です。

From: Genshirou Taki <g_taki ＠ max-corporation.com>
Subject: [pgsql-jp: 30292] Re: indexを使用しない/ 使用する違いは？
Date: Mon, 30 Jun 2003 23:54:23 +0900

;;; [pgsql-jp: 30286]で書いたように、以下のようになったわけですが、
;;; 
;;; ##########
;;; explain時のSQL文
;;; explain select * from haisyutu_map WHERE id_area = 80;
;;; 
;;; 環境Aの結果
;;; Index Scan using haisyutu_map_pkey on haisyutu_map  (cost=0.00..17.07
;;; rows=5 width=40)
;;;    Index Cond: (id_area = 80)
;;; 
;;; 環境Bの結果
;;; Seq Scan on haisyutu_map  (cost=0.00..78.25 rows=353 width=40)
;;; ################3

  予測結果行数 rows が異なるので、データの内容が異なるのではないでしょうか?

;;; 勝手に、indexを使用してくれた環境で、vacuumdb -a -f -z　ではなく、
;;; psql 上で　analyze をかけたところ、indexを使わなくなってしまいました。
;;; 
;;; vacuumdb と analyze とでは、働きが違うのかな？とこれから調べるところです。

  テーブルの概算タプル数を求める部分が異なります。VACUUM では、正確に求まりま
す。しかし、ANALYZE は、ゴミ領域の具合によっては、不正確になる事があります。

;;; index、パフォーマンス、プランナといった言葉で調べたところ、
;;; 
;;; 以下のcreate文のようにprimary_keyを設定しただけでは、
;;; explain select * from haisyutu_map WHERE id_area = 80;といったSQLのとき
;;; にindexを使ってくれないのでは？という考えに至りました。
;;; 
;;; create table haisyutu_map(
;;; id_area         integer,
;;; id_material     integer,
;;; img_total       integer,
;;; 中略
;;; primary key(id_area, id_material)
;;; );
;;; 
;;; そこで、create index aaaaa on haisyutu_map(id_area);として、
;;; 手動でindexを追加したところ、
;;; どちらの環境でも確実にindexを使用し、速度も格段に向上しました。
;;; 
;;; 
;;; indexを作成する際の指針として、
;;; 「SQL文のWHEREなどの条件に使用する列をindexにする」としました。

----------------------------------------------------------------

;;; indexを何パターンもつくることのデメリットを見つけられなかったので、今の
;;; ところこれで良いのか？と思うのですが、この考え方は正しいのでしょうか？

  データが変更されるときにインデックスも変更されるので、インデックス数に応じて
その手間もかかります。

;;; たとえば、例に挙げてあるhaisyutu_mapというtableでは、

----------------------------------------------------------------

;;; 1:id_area で絞り込むので、作る。

  (id_area, id_material) で作成していれば、以降の例のように、これは不要ではな
いでしょうか?

;;; 2:id_area と id_material 両方を条件に絞るので、primary_keyで作られる
;;;   indexを使用
;;; 3:id_materialからしぼることは行わないので、
;;;   create index bbbb opn haisyutu_map(id_material) は作っていません。
;;; といった具合なのですが。

  ところで、データの順番を変えて試してみました。

    create table haisyutu_map(
	id_area         integer,
	id_material     integer,
	img_total       integer,
	primary key(id_area, id_material)
    );

    insert into haisyutu_map (n / 1000, n % 1000) -- n は 0～100000 を順番に。

とした場合には、以下のようになりました。

    explain select count(*) from haisyutu_map where id_area = 80;
					       QUERY PLAN                                            
    -------------------------------------------------------------------------------------------------
     Aggregate  (cost=1489.51..1489.51 rows=1 width=0)
       ->  Index Scan using haisyutu_map_pkey on haisyutu_map  (cost=0.00..1487.07 rows=975 width=0)
	     Index Cond: (id_area = 80)
    (3 rows)

    explain analyze select count(*) from haisyutu_map where id_area = 80;
								     QUERY PLAN                                                                 
    --------------------------------------------------------------------------------------------------------------------------------------------
     Aggregate  (cost=1489.51..1489.51 rows=1 width=0) (actual time=5.43..5.43 rows=1 loops=1)
       ->  Index Scan using haisyutu_map_pkey on haisyutu_map  (cost=0.00..1487.07 rows=975 width=0) (actual time=0.25..4.38 rows=1000 loops=1)
	     Index Cond: (id_area = 80)
     Total runtime: 5.95 msec
    (4 rows)

    Timing is on.
    select count(*) from haisyutu_map where id_area = 80;
     count 
    -------
      1000
    (1 row)

    Time: 4.22 ms
    set enable_indexscan to off;
    SET
    Time: 0.37 ms
    select count(*) from haisyutu_map where id_area = 80;
     count 
    -------
      1000
    (1 row)

    Time: 141.26 ms

予測もインデックススキャンで、結果もインデックススキャンの方が速く、予測と結果が一致しています。

  次に、

    insert into haisyutu_map (n / 1000, n % 1000) -- n は 0～100000 で、order by random() 

とすると、

    explain select count(*) from haisyutu_map where id_area = 80;
				  QUERY PLAN                               
    -----------------------------------------------------------------------
     Aggregate  (cost=1743.39..1743.39 rows=1 width=0)
       ->  Seq Scan on haisyutu_map  (cost=0.00..1741.00 rows=957 width=0)
	     Filter: (id_area = 80)
    (3 rows)

    Timing is on.
    select count(*) from haisyutu_map where id_area = 80;
     count 
    -------
      1000
    (1 row)

    Time: 142.34 ms

  PostgreSQL を停止し起動。

    set enable_seqscan to off;
    SET
    Time: 0.43 ms
    explain select count(*) from haisyutu_map where id_area = 80;
					       QUERY PLAN                                            
    -------------------------------------------------------------------------------------------------
     Aggregate  (cost=1959.41..1959.41 rows=1 width=0)
       ->  Index Scan using haisyutu_map_pkey on haisyutu_map  (cost=0.00..1957.02 rows=957 width=0)
	     Index Cond: (id_area = 80)
    (3 rows)

    Time: 1.10 ms
    select count(*) from haisyutu_map where id_area = 80;
     count 
    -------
      1000
    (1 row)

    Time: 56.01 ms

  予測が外れました。

Kenji Sugita