[pgsql-jp: 25029] Optimizer behavior

Fri Mar 1 17:27:46 JST 2002

This is Arai.
Sorry to bring up something this minor, but there is a behavior that has been bothering me.

table foo (id1 integer, id2 integer)
table bar (id2 integer, updated timestamp)

I have tables like these. (All of them are indexed.)
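
For concreteness, the setup could be written as roughly the following DDL. This is only a sketch: the index names bar_id2_idx and bar_updated_idx are the ones that show up in the plans further down, while foo_id1_idx and foo_id2_idx are names made up here for illustration, and "all indexed" is read as one single-column index per column.

```sql
-- Simplified tables as described above (the real ones may have more columns).
CREATE TABLE foo (id1 integer, id2 integer);
CREATE TABLE bar (id2 integer, updated timestamp);

-- Index names on foo are assumptions, not taken from the original setup.
CREATE INDEX foo_id1_idx ON foo (id1);
CREATE INDEX foo_id2_idx ON foo (id2);

-- These two names appear in the EXPLAIN output.
CREATE INDEX bar_id2_idx ON bar (id2);
CREATE INDEX bar_updated_idx ON bar (updated);
```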

SELECT updated FROM foo,bar WHERE id1=? AND foo.id2=bar.id2
ORDER BY updated DESC LIMIT 100;

I then run the query shown above.

The query plan changes drastically depending on whether only one record
matches foo.id1 = ? or two or more records match.
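
The plans can be compared by prefixing the query with EXPLAIN. A minimal sketch, where the literal 1 is just a placeholder for whatever id1 value is being tested:

```sql
-- Show the plan the optimizer picks for one particular id1 value.
-- The value 1 is a placeholder, not a value from the real data.
EXPLAIN
SELECT updated FROM foo, bar WHERE id1 = 1 AND foo.id2 = bar.id2
ORDER BY updated DESC LIMIT 100;
```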

With only one matching record:
Limit  (cost=2779.06..2779.06 rows=100 width=377)
  ->  Sort  (cost=2779.06..2779.06 rows=1130 width=377)
        ->  Nested Loop  (cost=0.00..2721.75 rows=1130 width=377)
              ->  Seq Scan on foo ...
              ->  Index Scan using bar_id2_idx on bar ...
With two or more matching records:
Limit  (cost=0.00..1278.44 rows=100 width=377)
  ->  Nested Loop  (cost=0.00..63872.07 rows=4996 width=377)
        ->  Index Scan Backward using bar_updated_idx on bar ...
        ->  Seq Scan on foo  (cost=0.00..1.60 rows=6 width=4)
These are the plans I get.

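Naturally the latter plan, which does not sort all the rows, is much faster.
The difference in speed is large in practice, so I have reluctantly turned
enable_sort off. (With that setting, every case uses a plan like the latter and runs fast.)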
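
For reference, the workaround can be limited to the one query rather than left on for everything. A minimal sketch, assuming the setting is changed per session with SET and put back with RESET; the literal 1 again stands in for the real id1 value:

```sql
-- Discourage the planner from choosing an explicit Sort step (session-wide).
SET enable_sort TO off;

SELECT updated FROM foo, bar WHERE id1 = 1 AND foo.id2 = bar.id2
ORDER BY updated DESC LIMIT 100;

-- Put the setting back to its default for the rest of the session.
RESET enable_sort;
```

Note that enable_sort only makes sort-based plans look very expensive to the planner; it does not forbid them when no other plan is possible.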

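Is there a problem with my table design or with the way I wrote the query?
Or is this simply how the optimizer is designed to behave, so that I should
just keep enable_sort turned off?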

The version is postgres 7.2.

-----
新井 俊一
arai ＠ mellowtone.org
http://www.mellowtone.org/