[pgsql-jp: 33899] Re: 異なるクエリ間のコスト比較

2004年 8月 20日 (金) 18:21:32 JST

板垣です。

On Fri, 20 Aug 2004 17:42:16 +0900
TANIDA Yutaka <tanida ＠ sra.co.jp> wrote:

> とりあえず、比較対照のPostgreSQLのバージョンはなんですか？

PostgreSQL 8.0.0beta1 です。
Fedora Core2 でコンパイルしました。(GCC 3.3.3, libc-2.3.3)
また、パラメータはデフォルトで、set enable_XXX もしていません。

> 可能性はいくつかあります。
> 
> ・単純な速度比較だと、キャッシュの絡み等があるために純粋にコストに比例す
> るとは限らない。

これが一番関係ありそうだと思い、それぞれのクエリごとに
pg_ctl stop/start して計測しました。
キャッシュは使っていないつもりです。

> ・コスト算出のパラメーターが正しくない等の理由で、速度に比例した結果が出
> ていない。

パラメータを変えて試してみましたが、EXISTS を使うクエリのコストは、
random_page_cost に大体比例していました。
もともとのコストが高すぎるため、
- random_page_cost < 1.0 に小さくする
- cpu_*_cost を10倍以上に大きくする
ような、極端なパラメータ設定をしない限りコストは逆転しませんでした。

> ・コスト算出アルゴリズム自体が誤っていて、このケースで正しいコストを算出
> できていない。

単なる勘なのですが……
　　ループの一週目のコストにループ数を掛けているので、
　　二週目以降のループではキャッシュが効いて、実際には高速に処理できている
……いう状態になっているんでしょうか？


> あと、explain (analyze)の結果があると議論がしやすいです。
以下、長くなりますが、EXPLAIN ANALYZE の結果を書かせていただきます。

テーブル tbl0, tbl1 は、共に
  id int8 primary key
  var int4
という構造で、id, var の両方に btree インデックスが張ってあります。
それぞれのタプル数は、
  count(tbl0) == 1000
  count(tbl1) == 5000
です。

explain analyze select distinct tbl0.id from tbl0, tbl1 where tbl0.var = tbl1.var;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=547.71..573.79 rows=1000 width=8) (actual time=147.855..212.331 rows=995 loops=1)
   ->  Sort  (cost=547.71..560.75 rows=5216 width=8) (actual time=147.837..176.632 rows=4899 loops=1)
         Sort Key: tbl0.id
         ->  Hash Join  (cost=18.50..225.66 rows=5216 width=8) (actual time=13.677..113.840 rows=4899 loops=1)
               Hash Cond: ("outer".var = "inner".var)
               ->  Seq Scan on tbl1  (cost=0.00..80.00 rows=5000 width=4) (actual time=0.058..33.831 rows=5000 loops=1)
               ->  Hash  (cost=16.00..16.00 rows=1000 width=12) (actual time=13.452..13.452 rows=0 loops=1)
                     ->  Seq Scan on tbl0  (cost=0.00..16.00 rows=1000 width=12) (actual time=0.057..6.726 rows=1000 loops=1)
 Total runtime: 219.058 ms

explain analyze select tbl0.id from tbl0 where tbl0.var in (select tbl1.var from tbl1);
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=94.89..125.89 rows=999 width=8) (actual time=79.994..100.874 rows=995 loops=1)
   Hash Cond: ("outer".var = "inner".var)
   ->  Seq Scan on tbl0  (cost=0.00..16.00 rows=1000 width=12) (actual time=0.089..6.904 rows=1000 loops=1)
   ->  Hash  (cost=92.50..92.50 rows=958 width=4) (actual time=79.816..79.816 rows=0 loops=1)
         ->  HashAggregate  (cost=92.50..92.50 rows=958 width=4) (actual time=66.508..73.234 rows=994 loops=1)
               ->  Seq Scan on tbl1  (cost=0.00..80.00 rows=5000 width=4) (actual time=0.055..34.209 rows=5000 loops=1)
 Total runtime: 107.412 ms

explain analyze select tbl0.id from tbl0 where exists (select * from tbl1 where tbl0.var = tbl1.var);
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tbl0  (cost=0.00..3998.58 rows=500 width=8) (actual time=0.295..27.957 rows=995 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Index Scan using idxvar1 on tbl1  (cost=0.00..23.90 rows=6 width=12) (actual time=0.013..0.013 rows=1 loops=1000)
           Index Cond: ($0 = var)
 Total runtime: 34.133 ms


------------------------------------------------------------
板垣貴裕 <itagaki.takahiro ＠ lab.ntt.co.jp>