[pgsql-jp: 33571] Jan Wieck on MySQL Cluster

Fri, 9 Jul 2004 12:00:58 JST

This is Ishii. Some time has passed since it was posted, but the mail

Subject: Re: [GENERAL] Replication
From: Jan Wieck <JanWieck ＠ Yahoo.com>
To: Andrew Sullivan <ajs ＠ crankycanuck.ca>
Cc: pgsql-general ＠ postgresql.org
Date: Wed, 21 Apr 2004 11:23:51 -0400

carried a technical critique of MySQL Cluster by Jan Wieck (one of the
PostgreSQL core team members). It looked interesting, so I translated
it.
# I translated it in a hurry, so it probably contains many mistranslations.
--
Tatsuo Ishii

------------------------------------------------------------------------
Andrew Sullivan wrote:

> On Tue, Apr 20, 2004 at 11:26:24AM +0200, Pailloncy Jean-Gérard wrote:
>> Hi,
>> 
>> I just see that Mysql will propose at the end of the month a full 
>> synchronous replication system with auto-recovery.
> 
> Well, sort of.  It seems to be yet another 80/20 Solution From MySQL
> (tm).
> 
> It looks like it's based on a new table type.  It stores everything
> in memory, and then writes out asynchronously.  This strikes me as
> pretty dangerous from the point of view of reliability: what if the
> box dies before the write is complete?  (And don't tell me about
> super-redundant high-availability hardware.  I _have_ all that.  All
> hardware sucks; HA stuff just sucks less often at a higher price.)
> Also, it doesn't support the other table types.  I don't want to
> contemplate the horrible mess you'd have to clean up if you had a
> transaction crossing three table types and get a hardware failure. 
> 
> I'm afraid I agree with the recently-posted Oracle Veep interview:
> this does not represent any serious challenge to the core ORAC
> market.

Quoting from the MySQL(tm) FAQ about MySQL(tm) Cluster(tm) available at 
http://www.mysql.com/products/cluster/faq.html

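<begin quote from that page>

Q: Does MySQL Cluster work with MyISAM and InnoDB?

A: MySQL Cluster can include the MyISAM and InnoDB storage engines. Of
these, the data that needs high availability must reside in the MySQL
Cluster storage engine.

The DB nodes of MySQL Cluster store the data that MySQL Cluster handles.
The MySQL Server parses SQL and sends access requests to the DB nodes.
The MySQL Server itself does not hold any data belonging to the MySQL
Cluster storage engine.

The MySQL Server still has InnoDB/MyISAM, so they can be used as in
ordinary MySQL, but that data is not replicated. It is therefore not
visible from the other MySQL Servers connected to the MySQL Cluster.

<end quote from that page>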

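In other words, (the essence of MySQL Cluster is that) it is a new table
handler that the SQL query engine can use. The loud advertising that
"MySQL Cluster combines the world's most popular open source database
with a parallel server" gives the impression that new features such as
foreign keys, MVCC and rollback now scale horizontally across highly
available nodes, but this is not true.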

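The NDB table type does not support foreign keys, constraints or
triggers. It does support transactions, but these are not the same as
the transactions of the InnoDB table handler. That is, a COMMIT is not
atomic across different table types. MySQL likes to say that the largest
systems, such as SAP R/3, do not use referential integrity constraints
at the database level. That is true as far as it goes, but having worked
for many years as an SAP consultant, I can tell you that the reason (why
SAP and the like do not use referential integrity constraints) is not
performance. To gain independence from DB vendors, SAP spends that
effort several times over by implementing its own custom integrity
control and data domain system in its DB abstraction layer. That
abstraction layer is larger than PHP and Apache put together, so the
example is irrelevant to the typical MySQL user.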
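
The point about COMMIT not being atomic across table types can be
illustrated with a small sketch. It is only an analogy: two independent
SQLite in-memory databases stand in for two table handlers that each
keep their own transaction state, and the names engine_a, engine_b and
transfer are hypothetical. Nothing here is MySQL or NDB code.

```python
import sqlite3

# Two independent databases stand in for two table handlers,
# each committing on its own (an analogy, not MySQL itself).
engine_a = sqlite3.connect(":memory:")
engine_b = sqlite3.connect(":memory:")
for engine in (engine_a, engine_b):
    engine.execute(
        "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)"
    )
engine_a.execute("INSERT INTO account VALUES ('alice', 100)")
engine_b.execute("INSERT INTO account VALUES ('bob', 0)")
engine_a.commit()
engine_b.commit()


def transfer(amount, crash_between_commits=False):
    """Move money from alice (engine A) to bob (engine B)."""
    engine_a.execute(
        "UPDATE account SET balance = balance - ? WHERE name = 'alice'",
        (amount,),
    )
    engine_b.execute(
        "UPDATE account SET balance = balance + ? WHERE name = 'bob'",
        (amount,),
    )
    engine_a.commit()  # engine A has now committed its half
    if crash_between_commits:
        engine_b.rollback()  # simulated failure: engine B loses its half
        return
    engine_b.commit()


transfer(30, crash_between_commits=True)
alice = engine_a.execute(
    "SELECT balance FROM account WHERE name = 'alice'"
).fetchone()[0]
bob = engine_b.execute(
    "SELECT balance FROM account WHERE name = 'bob'"
).fetchone()[0]
print(alice, bob)
```

After the simulated failure the debit is committed and the credit is
not, which is the kind of mess a transaction spanning several table
types can leave behind when there is no common commit protocol.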

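Also, the NDB table type is implemented on an in-memory storage engine
(which is also why it is fast). To get high availability, one needs at
least twice the full database size in memory (plus more for the OS and
other overhead), and a higher factor to really achieve 99.999%. For a
100 GB database, for example, that means 220-240 GB of memory. So that
is 8 machines with 32 GB of memory each? Also, according to a MySQL
consultant I spoke with, the bottleneck is the network, so these
machines need "a network faster than Gigabit Ethernet" as a backbone.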
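
The sizing arithmetic above can be sketched as follows. This is a
back-of-the-envelope illustration only: the mail gives the factor of two
and the 220-240 GB result, while the 10-20% overhead figures and the
function names ram_needed_gb and boxes_needed are assumptions chosen to
reproduce that range.

```python
import math


def ram_needed_gb(db_size_gb, copies=2, overhead_pct=10):
    """RAM for `copies` in-memory replicas plus a percentage of overhead.

    overhead_pct is an assumed allowance for the OS and other overhead.
    """
    return db_size_gb * copies * (100 + overhead_pct) / 100


def boxes_needed(total_ram_gb, ram_per_box_gb=32):
    """Number of machines needed, rounding up to whole machines."""
    return math.ceil(total_ram_gb / ram_per_box_gb)


low = ram_needed_gb(100, copies=2, overhead_pct=10)
high = ram_needed_gb(100, copies=2, overhead_pct=20)
print(low, high)
print(boxes_needed(low), boxes_needed(high))
```

Under these assumptions a 100 GB database lands in the 220-240 GB range
quoted in the mail, and the upper end needs eight 32 GB machines.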

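In the end, the NDB table type is useful only in special cases. One
example that comes to mind is a system that must read data from sensors
without interruption. Sensor systems usually do not care about
referential integrity, so for a logging system it is in fact irrelevant:
the data is recorded on the spot and corrected later. I do think it is
worthwhile to have the logging data available inside the same SQL query
engine that the more complex parts of the application use. But that is
all, and the same can be achieved simply by bulk-loading the log data
into an ordinary DB. Unless you really need to analyze log data up to
the last second, running hardware and network equipment costing several
hundred thousand dollars just to play with a memory cluster is a bit
overkill.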
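
As a rough sketch of the bulk-load alternative mentioned above, the
following loads a batch of sensor readings into an ordinary table in one
transaction. SQLite is used only so the example is self-contained; the
table name sensor_log, its columns and the function bulk_load are
hypothetical and do not come from the mail.

```python
import sqlite3


def bulk_load(conn, readings):
    """Insert a whole batch of readings in a single transaction."""
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO sensor_log (ts, sensor_id, value) VALUES (?, ?, ?)",
            readings,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensor_log (ts INTEGER, sensor_id TEXT, value REAL)"
)

# A batch collected by the logger since the previous load.
batch = [(1, "s1", 20.5), (2, "s1", 20.7), (2, "s2", 19.9)]
bulk_load(conn, batch)

count = conn.execute("SELECT COUNT(*) FROM sensor_log").fetchone()[0]
print(count)
```

The trade-off is latency: the data becomes queryable only after each
batch is loaded, which is fine unless the last second of log data really
has to be analyzed.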

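Ken Jacobs, Oracle's vice president of product strategy, pointed out:
"MySQL is trying to address certain product shortcomings by acquiring a
third-party technology. This does not mean they now have a product that
is competitive with Oracle or even other database products, whether
clustered or not." I think Mr. Jacobs is absolutely right. MySQL did the
same before by adding InnoDB, and this time it has added a functionally
limited multimaster replication capability. Instead of developing an
integrated solution that includes the InnoDB table handler, where this
functionality would have been useful, MySQL has just added a fifth wheel
to the cart.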

>> I use PostgreSQL and I would appreciate to have the same features in 
>> PostgreSQL.
> 
> Sure, so would I.  Talk to Jan Wieck about what he plans to do
> about it, and maybe consider supporting that development work too ;-)

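Ken Jacobs also said, "No one has anything at all like Oracle's Real
Application Clusters." That is certainly true. However, PostgreSQL now
compares well with Oracle on SQL features and standalone DB performance.
Only on replication are we two years or more behind.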

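What we need now is to get Slony-I out the door, let it settle, and
improve it over another release or so. With that as the base, we will
start designing a synchronous multimaster system that can be
jump-started from a running asynchronous replication setup. In my
opinion, all this "high-availability" babble is pointless as long as a
failed node cannot be rebuilt from scratch without taking an outage.
That functionality is listed on the MySQL roadmap for version 5.1, so
sometime around 2008? Slony already does this today for asynchronous
master/slave replication.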
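
The idea of rebuilding a node from scratch without an outage can be
sketched with a toy model: take a snapshot of the origin together with
its log position, keep accepting writes, then replay the logged changes
made since the snapshot. This shows only the general
snapshot-then-catch-up pattern; the classes Origin and Replica are
hypothetical and are not Slony-I's implementation or API.

```python
class Origin:
    """Toy primary node: keeps its data and an ordered change log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def snapshot(self):
        # A copy of the data plus the log position it corresponds to.
        return dict(self.data), len(self.log)


class Replica:
    """Toy replica built from a snapshot while the origin stays up."""

    def __init__(self, origin):
        self.data, self.applied = origin.snapshot()

    def catch_up(self, origin):
        # Replay, in order, every change logged after the snapshot.
        for key, value in origin.log[self.applied:]:
            self.data[key] = value
        self.applied = len(origin.log)


origin = Origin()
origin.write("a", 1)

replica = Replica(origin)  # (re)created from scratch, no outage
origin.write("b", 2)       # writes continue after the snapshot
origin.write("a", 3)

replica.catch_up(origin)
print(replica.data == origin.data)
```

The origin never stops accepting writes; the replica simply becomes
consistent once it has replayed everything logged since its snapshot.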

Jan

---------------------------------------------------
[Original text follows]

Subject: Re: [GENERAL] Replication
From: Jan Wieck <JanWieck ＠ Yahoo.com>
To: Andrew Sullivan <ajs ＠ crankycanuck.ca>
Cc: pgsql-general ＠ postgresql.org
Date: Wed, 21 Apr 2004 11:23:51 -0400
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624

Andrew Sullivan wrote:

> On Tue, Apr 20, 2004 at 11:26:24AM +0200, Pailloncy Jean-Gérard wrote:
>> Hi,
>> 
>> I just see that Mysql will propose at the end of the month a full 
>> synchronous replication system with auto-recovery.
> 
> Well, sort of.  It seems to be yet another 80/20 Solution From MySQL
> (tm).
> 
> It looks like it's based on a new table type.  It stores everything
> in memory, and then writes out asynchronously.  This strikes me as
> pretty dangerous from the point of view of reliability: what if the
> box dies before the write is complete?  (And don't tell me about
> super-redundant high-availability hardware.  I _have_ all that.  All
> hardware sucks; HA stuff just sucks less often at a higher price.)
> Also, it doesn't support the other table types.  I don't want to
> contemplate the horrible mess you'd have to clean up if you had a
> transaction crossing three table types and get a hardware failure. 
> 
> I'm afraid I agree with the recently-posted Oracle Veep interview:
> this does not represent any serious challenge to the core ORAC
> market.

Quoting from the MySQL(tm) FAQ about MySQL(tm) Cluster(tm) available at 
http://www.mysql.com/products/cluster/faq.html

<quote>
Q: Does MySQL Cluster work with MyISAM and InnoDB?

A: MySQL Cluster can include the MyISAM and InnoDB storage engines. Of 
these, the high-availability data must reside in the MySQL Cluster 
storage engine.

The MySQL Cluster DB node stores MySQL Cluster data, the MySQL Server 
parses SQL and sends requests to the DB node. The MySQL Server does not 
store any data belonging to the MySQL Cluster storage engine.

InnoDB/MyISAM data is still stored in the MySQL server and can be used 
in the standard way, but that data is not replicated, so that data is 
not visible from any other MySQL server that is connected to the MySQL 
Cluster.
</quote>

It is just another table handler made available for the SQL query 
engine. Touting loudly and on all available channels that "MySQL Cluster 
combines the world's most popular open source database with a 
parallel-server" naturally leads to the misinterpretation that all the 
wonderful new features like foreign keys, MVCC and rollback will now 
horizontally scale over multiple, highly available nodes. This is not true.

The NDB table type does not have support for foreign keys, constraints, 
triggers. It does support transactions, but these transactions are not 
the same transactions as the ones of the InnoDB table handler, so a 
COMMIT is not atomic across different table types. MySQL likes to point 
out that the largest systems like SAP R/3 do not use referential 
integrity on the database level. That is true so far, but having worked 
for many years as an SAP base consultant I can tell you that the reason 
for that is NOT performance. SAP spends that effort multiple times by 
implementing their own, custom integrity control and data domain system 
in the DB abstraction layer, to gain DB vendor independence. That 
abstraction layer is larger than PHP and Apache together, so this 
example is IMHO totally irrelevant for the typical MySQL user.

Also, the NDB table type is based on an in-memory, partitioned storage 
engine (that's where the speed comes from) and to get high availability 
one needs at least two times the full database size in RAM (plus some 
for the OS and other overhead), and a higher factor to really achieve 
the 99.999%. So to serve let's say a 100 GB database, we're talking 
about 220-240 GB of RAM. Now that's 8 boxes with 32GB each? And 
according to a MySQL consultant I spoke with, the real bottleneck is the 
network, so these boxes like to have "better than Gigabit Ethernet" as a 
backbone. Those are some decent hardware requirements; make sure you have 
a forklift on your next shopping list.

So what one gets with NDB on the bottom line is another table type that 
is useful for some special cases. I can imagine for example systems 
that read sensor data, which cannot be interrupted. Sensors usually 
don't care much about referential integrity, so for the logging system 
this is in fact irrelevant, the data has to be stored now and corrected 
later. I think it is indeed a big plus for a system, to make that 
logging data available inside the same SQL query engine where the more 
complicated bits and pieces of the application are implemented in. But 
that is all, and that can pretty easily be achieved by doing bulk-loads of 
the log data into regular database tables. Unless one really needs the 
ability to query and analyse up to the last second of logdata, running 
some multiple 100 kilodollar hardware and network equipment just for the 
fun of a memory cluster solution is a bit overkill.

As the Oracle VP of product strategy, Ken Jacobs, pointed out: "MySQL is 
trying to address certain product shortcomings by acquiring a 
third-party technology. This does not mean they now have a product that 
is competitive with Oracle or even other database products, whether 
clustered or not." Absolutely right, Mr. Jacobs; they have done that 
before by adding InnoDB, now they added some limited multimaster 
replication capabilities. But instead of developing an integrated 
solution that includes the InnoDB table handler, where this 
functionality would be useful, they just added a fifth wheel to the cart.

> 
>> I use PostgreSQL and I would appreciate to have the same features in 
>> PostgreSQL.
> 
> Sure, so would I.  Talk to Jan Wieck about what he plans to do
> about it, and maybe consider supporting that development work too ;-)

Ken Jacobs further said "No one has anything at all like Oracle's Real 
Application Clusters". And that is right too. However, PostgreSQL by 
now compares well on SQL features and standalone DB performance. On 
replication we are 2 years or more behind.

Right now we need to get the Slony-I project out the door and let that 
settle a bit and maybe get enhanced over one more release. With that as 
the base, we will start designing a synchronous multimaster system that 
can be jump-started from a running, asynchronous replication setup. All 
this "high-availability" babble is IMHO totally pointless as long as 
there is no way of (re)creating a (failed) node from scratch without 
taking an outage. And that functionality is listed on the MySQL roadmap 
for 5.1 ... so somewhere in 2008? Slony does that for async master-slave 
right today.

Jan