[hackers-jp: 185] Re: Fwd: Re: [HACKERS] WAL: O_DIRECT and multipage-writer

Hiroki Kataoka kataoka @ interwiz.jp
Tue Feb 15 14:04:56 JST 2005


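This is Kataoka.

  I had been curious about the O_DIRECT discussion, but the thread stopped
abruptly, so I was wondering what had become of it. I'm glad it is being
picked up for 8.1 (that is settled, right?). If someone published a backport
patch for 8.0, I suspect quite a few people would apply it on their own and
use it.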

Y.Shimada wrote:
> This is Shimada @ Storgate.
> 
>  Here is another topic that came up.
> 
> ---------------- Begin Forwarded Message ----------------
> Subject: Re: [HACKERS] WAL: O_DIRECT and multipage-writer
> Date Sent: 2005/2/14 18:25
> From: Bruce Momjian <pgman @ candle.pha.pa.us>
> To: ITAGAKI Takahiro <itagaki.takahiro @ lab.ntt.co.jp>
> CC: pgsql-hackers @ postgresql.org, pgsql-patches @ postgresql.org
> 
> 
> This thread has been saved for the 8.1 release:
> 
> 	http://momjian.postgresql.org/cgi-bin/pgpatches2
> 
> ---------------------------------------------------------------------------
> 
> ITAGAKI Takahiro wrote:
> 
>>Hello, all.
>>
>>I think that there is room for improvement in WAL. 
>>Here is a patch for it.
>>  - Multiple pages are written in one write() if they are contiguous.
>>  - Add 'open_direct' to wal_sync_method.
>>
>>The WAL writer currently writes one page per write(). This is not
>>efficient when wal_sync_method is 'open_sync', because the writer waits
>>for IO completion at each write(). A multipage writer can reduce the
>>number of syscalls and improve IO throughput.
>>
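To make the multipage idea concrete, here is a minimal sketch (not the patch
itself) comparing one write() per page with a single write() over a contiguous
run of pages. PAGE_SIZE, write_all(), write_pages_one_by_one() and
write_pages_coalesced() are hypothetical names chosen for illustration, and
the 8 kB page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SIZE 8192  /* assumption: 8 kB WAL pages */

/* Write all len bytes, retrying on short writes. Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;          /* sketch: EINTR is not retried here */
        buf += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Baseline: one write request per page. Returns the number of requests, or -1. */
int write_pages_one_by_one(int fd, const char *pages, size_t npages)
{
    int requests = 0;
    assert(pages != NULL);
    for (size_t i = 0; i < npages; i++) {
        if (write_all(fd, pages + i * PAGE_SIZE, PAGE_SIZE) < 0)
            return -1;
        requests++;
    }
    return requests;
}

/* Multipage: the pages are contiguous in memory, so one request covers them all. */
int write_pages_coalesced(int fd, const char *pages, size_t npages)
{
    assert(pages != NULL);
    if (npages == 0)
        return 0;
    if (write_all(fd, pages, npages * PAGE_SIZE) < 0)
        return -1;
    return 1;
}
```

With a synchronous open mode every request waits for the disk, so going from
npages requests to one is where the saving would come from.
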
>>'open_direct' uses O_DIRECT instead of O_SYNC. O_DIRECT implies synchronous
>>writing, so it may behave much like open_sync. But it may also avoid a
>>memcpy() and save the OS's disk cache memory.
>>
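For reference, a rough sketch of what opening a file with O_DIRECT involves on
Linux, assuming glibc: the buffer handed to write() has to be suitably
aligned, which is why the patch needs an alignment constant at all.
DIRECT_ALIGN, alloc_direct_buffer() and open_wal_direct() are hypothetical
names, and 4096 is only an assumed alignment.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE         /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DIRECT_ALIGN 4096   /* assumption: real code should probe this at runtime */

/* Allocate a zeroed buffer whose address and size are multiples of DIRECT_ALIGN. */
void *alloc_direct_buffer(size_t size)
{
    void *p = NULL;
    assert(size > 0 && size % DIRECT_ALIGN == 0);
    if (posix_memalign(&p, DIRECT_ALIGN, size) != 0)
        return NULL;
    memset(p, 0, size);
    return p;
}

/* Open a file for direct writes; returns a file descriptor, or -1 on error. */
int open_wal_direct(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
}
```

One caveat from the Linux open(2) man page: O_DIRECT on its own does not give
the guarantees of O_SYNC, so O_SYNC must be used together with it to get
guaranteed synchronous I/O. Some filesystems also reject O_DIRECT outright, so
the open() result needs checking.
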
>>I benchmarked this patch with pgbench. It works well and improved
>>tps by more than 50% on my machine. WAL seems to be a bottleneck
>>on machines with poor disks.
>>
>>This patch has not yet been tested enough. I would like it to be reviewed
>>closely and taken into PostgreSQL.
>>
>>There are still many TODOs:
>>  * Is this logic really correct?
>>  - O_DIRECT_BUFFER_ALIGN should be determined at runtime, not compile time.
>>  - Consider using writev() instead of write().
>>    Buffers are noncontiguous when the WAL ring buffer wraps around.
>>  - If wal_sync_method is not open_direct, XLOG_EXTRA_BUFFERS can be 0.
>>
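On the writev() TODO item, a small sketch of how a range that wraps around the
end of a ring buffer could go out in one call. This is an illustration under
my own assumptions, not code from the patch; write_ring_range() and its
parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Write len bytes of a ring buffer starting at offset start. If the range
 * runs past the end of the ring, the tail and the wrapped-around head are
 * submitted as two iovec entries in a single writev() call.
 * Returns the byte count reported by writev(), or -1 on error.
 */
ssize_t write_ring_range(int fd, char *ring, size_t ring_size,
                         size_t start, size_t len)
{
    struct iovec iov[2];
    int iovcnt = 1;
    size_t first;

    assert(ring != NULL && start < ring_size && len <= ring_size);

    first = ring_size - start;          /* bytes available before the wrap */
    if (first > len)
        first = len;

    iov[0].iov_base = ring + start;
    iov[0].iov_len = first;
    if (len > first) {                  /* the rest starts again at offset 0 */
        iov[1].iov_base = ring;
        iov[1].iov_len = len - first;
        iovcnt = 2;
    }
    return writev(fd, iov, iovcnt);
}
```

With an 8-byte ring holding "ABCDEFGH", writing 4 bytes from offset 6 sends
"GH" and then "AB" in one syscall. A real caller would still have to handle
short writes.
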
>>
>>Sincerely,
>>ITAGAKI Takahiro
>>
>>
>>
>>-- pgbench result --
>>
>>$ ./pgbench -s 100 -c 50 -t 400
>>
>>- 8.0.0 default + fsync:
>>    tps = 20.630632 (including connections establishing)
>>    tps = 20.636768 (excluding connections establishing)
>>- multipage-writer + open_direct:
>>    tps = 33.761917 (including connections establishing)
>>    tps = 33.778320 (excluding connections establishing)
>>
>>Environment:
>>  OS     : Linux kernel 2.6.9
>>  CPU    : Pentium 4 3GHz
>>  disk   : ATA 5400rpm (Data and WAL are placed on same partition.)
>>  memory : 1GB
>>  config : shared_buffers=10000, wal_buffers=256,
>>           XLOG_SEG_SIZE=256MB, checkpoint_segments=4
>>
>>---
>>ITAGAKI Takahiro <itagaki.takahiro @ lab.ntt.co.jp>
>>NTT Cyber Space Laboratories
>>Nippon Telegraph and Telephone Corporation.
> 
> 
> ----------------- End Forwarded Message -----------------
> 

-- 
Hiroki Kataoka <kataoka @ interwiz.jp>


