[pgsql-jp: 41997] 【実行時間】関数を用いた ORDER BY

2018年 12月 17日 (月) 09:56:00 UTC

JPUG 各位

　お世話になります。内山と申します。
ORDER BY 句でのソートに、列を指定した場合と
関数演算を指定した場合で実行時間が、100倍程度
違ってきます。今回は、件数を絞り込めないのですが
何か良い方法があるでしょうか。ちなみに、関数版の
ORDER BY は、algorithm という単語の出現回数です。
explain analyze buffers の結果を引用します。

〇vc 列でソート
moovle=# explain ( analyze, buffers ) WITH v AS
( SELECT id, video, track, stime, content, theme FROM subtitled
WHERE content &@~ 'algorithm' ORDER BY vc DESC OFFSET 0 LIMIT 70 )
select id, video, track, stime, theme, array_to_json( content ) as c from v;

  QUERY PLAN

----------------------------------------------------------------------------
----
----------------------------------------------------------------------------
----
---------------
 CTE Scan on v  (cost=37767.73..37769.31 rows=70 width=180)
                 (actual time=32.827..41.907 rows=70 loops=1)
   Buffers: shared hit=9714
   CTE v
     ->  Limit  (cost=37767.56..37767.73 rows=70 width=1077) (actual
time=32.697
..32.713 rows=70 loops=1)
           Buffers: shared hit=9361
           ->  Sort  (cost=37767.56..37855.40 rows=35137 width=1077)
                     (actual time=32.695..32.706 rows=70 loops=1)
                 Sort Key: subtitled.vc DESC
                 Sort Method: top-N heapsort  Memory: 137kB
                 Buffers: shared hit=9361
                 ->  Index Scan using subtitled2_pgroonga_content_idx on
                     subtitled
                     (cost=0.00..36515.05 rows=35137 width=1077)
                     (actual time=3.708..28.850 rows=10084 loops=1)
                       Index Cond: (content &@~ 'algorithm'::text)
                       Buffers: shared hit=9361
 Planning time: 0.353 ms
 Execution time: 42.206 ms
(14 行)

〇関数を含む ORDER BY （algorithm という単語の出現回数）
moovle=# explain ( analyze, buffers ) WITH v AS
( SELECT id, video, track, stime, content, theme FROM subtitled
WHERE content &@~ 'algorithm'
ORDER BY ( (char_length( array_to_string( content, '' ) )
- char_length( replace( array_to_string( content, '' ), 'algorithm', '' )) )

/ char_length( 'algorithm' ) )  DESC OFFSET 0 LIMIT 70 )
select id, video, track, stime, theme, array_to_json( content ) as c from v;

              QUERY PLAN

----------------------------------------------------------------------------
----
----------------------------------------------------------------------------
----
--------------------------------------
 CTE Scan on v  (cost=38382.63..38384.21 rows=70 width=180)
　　　　　　　　(actual time=5447.048..5468.595 rows=70 loops=1)
   Buffers: shared hit=121547
   CTE v
     ->  Limit  (cost=38382.46..38382.63 rows=70 width=1073)
　　　　　　　　(actual time=5446.719..5446.744 rows=70 loops=1)
           Buffers: shared hit=121052
           ->  Sort  (cost=38382.46..38470.30 rows=35137 width=1073)
　　　　　　　　　　(actual time=5446.718..5446.737 rows=70 loops=1)
                 Sort Key: (((char_length(array_to_string(subtitled.content,
'':
:text)) - char_length(replace(array_to_string(subtitled.content, ''::text),
'
algorithm'::text, ''::text))) / 9)) DESC
                 Sort Method: top-N heapsort  Memory: 58kB
                 Buffers: shared hit=121052
                 ->  Index Scan using subtitled2_pgroonga_content_idx on
　　　　　　　　　　　subtitled
　　　　　　　　　　　(cost=0.00..37129.95 rows=35137 width=1073)
　　　　　　　　　　　(actual time=7.956..5439.700 rows=10084 loops=1)
                       Index Cond: (content &@~ 'algorithm'::text)
                       Buffers: shared hit=121049
 Planning time: 0.751 ms
 Execution time: 5468.803 ms
(14 行)

PostgreSQL 9.6 です。PostgreSQL 10 とは少し違う実行計画かと思います。
10でも、100倍程度かかりました。10だと、実行計画が Index Scan のあとの
Bitmap Heap Scan の Recheck で時間がかかっていました。
　良い知恵があったら、ご教示のほどよろしくお願いします。