2018年11月11日日曜日

Amazon Auroraの先進性を誰も解説してくれないから解説する

TL;DR;

· Amazon AuroraはIn-Memory DBでもなくDisk-Oriented DBでもなく、In-KVS DBとでも呼ぶべき新地平に立っている。

· その斬新さたるやマスターのメインメモリはキャッシュでありながらWrite-BackでもなくWrite-Throughでもないという驚天動地。

· ついでに従来のチェックポイント処理も不要になったのでスループットも向上した。

· 詳細が気になる人はこの記事をチェキ！

Amazon Aurora

Amazon AuroraはAWSの中で利用可能なマネージド(＝運用をAWSが面倒見てくれる)なデータベースサービス。
ユーザーからはただのMySQL、もしくはPostgreSQLとして扱う事ができるのでそれらに依存する既存のアプリケーション資産をそのまま利用する事ができて、落ちたら再起動したりセキュリティパッチをダウンタイムなしで(!?)適用したりなどなどセールストークを挙げだすとキリがないけど、僕はAWSからお金を貰っているわけではないのでそこは控えめにしてAuroraでのトランザクションの永続性について論文から分かる範囲と想像で補った内容を説明していく。

Auroraのアーキテクチャ

AWSの公式資料を取ってこればいくらでもそれっぽい図はあるが、説明に合わせて必要な部分だけ切りだした。

AZとはAvailability Zoneの事で、AWSのデータセンターで障害が発生した場合に別の故障単位になるよう設計されているユニットの事である。物理的には部屋が分かれているのか建物が分かれているのかわからないが、電源やスイッチは確実に系統が分かれておりミドルウェアのバージョンアップなども分かれているという。それをまたがる形でMasterが一つとSlaveが複数(最大15台)立ち上がる。MasterはDBに対する読み書き両方ができるが、Slaveは読み出ししかできない。典型的なWebサービスは読み出しが負荷の多くを占めるので読み出し可能な複製が複数用意できるのは理に適っている。

このそれぞれのコンポーネントを繋ぐ矢印はRedo-Logを表している。Redo-Logとは「特定のページを書き換える操作とその内容」が記述されたDBログの最小単位である。一般にDBを複製すると言うと読み書きされるあらゆるデータが複製されるものであるがAuroraではこのRedo-Logしか複製しない点が面白い。論文中に THE LOG IS THE DATABASE とでっかく書いてあるのは恐らくこの辺に由来する。

Masterは普通のMySQL(もしくはPostgreSQL)サーバのように見えユーザから読み書きがリクエストできる。
InnoDBの代わりのバックエンドのデータストアとして分散KVSが稼働しており、その分散KVSはAZを跨った6多重に冗長化されている。論文中ではKVSだなんて一言も書いていないがストレージバックエンドの説明として理解しやすいのであえてKVSに喩えて説明していく。
6多重のうち4つにまで保存できた段階で永続化完了と見なしユーザに返答する事でレイテンシの短縮を図っている。システムはいろんなノイズで遅れるが、全体の足を引っ張って律速するは決まってstrugglerであり、90パーセンタイルぐらいであれば圧倒的に機敏に返事を返してくるのは巨大システムの常である。
全部の複製が全く同じ情報を持っていないといけないので、仮にログを取りこぼした複製がいたとしてもMasterに聞き直さず複製同士でGossip通信を行って全部のログを全員が受け取るように取り計らう。
この辺の話はAWSの人の公式スライドにも腐るほど出てくるので僕は詳しく説明しない。

トランザクションの挙動の違い

どれかのDBにとって極端に都合が良いワークロードで比較しても単なるセールストークにしかならない。
複数の方式のDBが明白に異なる挙動をする典型例のワークロードとして「巨大テーブルの全部の行の特定のフラグを立てる」というトランザクションを例に挙げて伝統的なDisk-Oriented DB・In-Memory DB・Auroraの動作を順に説明する。

SQL文としてはこんな感じである。

UPDATE table1 SET flag = true;

なおこのtable1はものすごく行数が多い（＝縦に長い）とする。

Disk-Oriented DBの挙動

まず巨大テーブル全体を一気にメモリに置くアーキテクチャにはなっておらず、メモリ上に用意したデータベースページ領域にDisk上のDBの一部を複製してくる所から始まる。ここまではMySQLでもPostgreSQLでも同じはず。この文脈でのページとはDBの中身の一部が(MySQLならデフォルトで)16KBの大きさ毎に詰め込まれた連続したメモリ領域であり、OSが提供するメモリぺージとは少し違う。(ちなみにPostgreSQLのデフォルトページサイズは8KB）

Disk上の全域データを直接一瞬で書き換えることは当然できないので、狭いメモリ空間でLRU等を用いて取り回しながら書き込みの終わった未コミットなダーティページをディスクに書き戻しながら進行する他ない¹。だがそんなことをすると、その瞬間にDBのプロセスが強制終了してリスタートした時に未コミットなダーティページがディスク上から読み出し可能な状態で観測される恐れがある。それではACIDのうちAに違反してしまう。そこで各DBは僕の知る限り以下の挙動をとる。

ARIES

進行しながらRedo-Undo logをディスクに永続化し、もし途中でシステムがリスタートした時はリカバリとしてUndo処理を行う。

この図で言うとページ1は未コミットなトランザクションによって既に書き換えられているが、Undo-Redo Logの形で既にWALを永続化しているのでリカバリ可能でありダーティなページはそのままディスクに永続化して構わない。なので永続化して空けたBuffer Poolのスペースに次に更新したいページ5をフェッチしてくる事ができる。

PostgreSQL

　上書きは常に新たなバージョンでの追記操作であり、clogというデータで保存されているトランザクションステータスが commited でない限り読み出しできない。したがって痕跡は物理的にページに残るがデータベースのユーザからは不可視であり問題にならず、いずれバキュームされて物理的にも消失する。
この図でいうと、ページ1は物理的にはダーティだが追記がされているだけでありclogのお陰で論理的に他のトランザクションから見えない。なのでそのままディスクに永続化されても問題が発生しないのでこうしてバッファプールからそのままページ1を追い出して、空いた領域にページ5をフェッチしてくる事ができる。

MySQL

ibdataの中に更新前の値が保存されており、ディスクに書き戻される際にはそちらも永続化されるので、リスタート時のリカバリ処理でibdataとテーブルデータを突き合わせて可視なデータがユーザから見えるように整合性を保つ（詳しくないが多分）。

いずれにせよ、トランザクションが走りながらログを記述していく事は共通している。

In-Memory DBの挙動

全部のデータがメモリに収まる前提を置いて良いのでこちらはだいぶシンプルに収まる。
進行途中でログを書き出す必要は無いし、バッファの中でLRU等を用いてどのページをディスクに書き戻すかなども心配しなくてよい。
トランザクションログを書き出すタイミングは典型的な実装としてはコミット時に一気に書き出す事が多いようだ。

リスタート時はログデータをスキャンしてデータベースを再構築するので、ユーザから commit が命じられていないトランザクションはログにすら残っておらず、ダーティページはそもそも概念が存在しない。

Auroraの挙動

メモリにもローカルのディスクにもテーブル全体が入りきらない前提で設計されている。
トランザクションの都合上必要なページがMasterのメモリで運良くキャッシュできていない場合、KVSに問い合わせを行いページを持ってくる。
なお、KVSは物理的には6多重で保存しているが論理的には一つのデータが6重に保存されているだけなので論理的には1つのストレージ領域と考えて良い(RAID1を論理的には単一のHDD扱いするのと同様)のでそう書く。

走りながら当然ログも永続化していく。6多重で保存されるのはログも同じだ。前述したように驚くべき事にRedo-logしか保存していかない。

当然Masterのメモリには全データ乗らないので、どうにかして処理用にメモリを取り回す必要がある。
そこでMasterは一番使わないと判断したページをKVSに書き戻…さずに捨てる。　　もう一度言う、捨てるのだ、キャッシュなのに。

そんなことをしたらKVSに載ったページは古いままじゃないかと心配になるが、Auroraの分散KVSは単なるストレージではなくてAurora用の専用のロジックが駆動するインテリジェントな分散KVSである。
こいつらはMasterから受け取ったRedo-Logを必要に応じて手元のページに適用(Apply)していく事ができる²。

なんでせっかく作った更新済みPage1を捨ててまで新たにKVS側でログを適用し直すかというと、基本的にAWSにおいてMasterのCPUやネットワーク資源は限られたリソースである一方、KVS側のCPUは相対的に持て余したリソースであり安いこと。さらには後に述べるチェックポイントの簡潔さのために完全に分散KVS側に倒した設計を行っているように見える。
MasterからKVSへはRedo-logしか流れないし、KVSからMasterへはページしか流れない。圧倒的にシステムが簡単になった。

Masterがページを問い合わせる場合、バージョン番号もセットで問い合わせるのでそこまでに投げつけたRedo-logをKVS側で適用した最新ホカホカのページが返ってくるのでMasterは手元のメモリに乗っているダーティなページを気兼ねなく任意のタイミングで捨てて構わない。問い合わせの際はトランザクションの識別子を入れて引いてくるので、読んではいけないDirtyなページを獲得することはない。Slaveがページを問い合わせる場合は必ず永続化されたバージョンのものだけを読むようにしている。
ついでに言うとSlaveのページはMasterが6多重な分散KVSの他にSlaveにもRedo-logを投げつける。それを受け取るたびに(恐らくKVSと同じようなロジックで)ログ適用を行い、最新のコミット済みデータが読めるようになっている。ここで気づいた人もいると思うが、MasterはSlaveにログを共有するがその完了を待つとは一言も書いていない。4/6のKVS永続化が完了した時点でユーザにコミットを報告してしまう。なのでMaster側で更新を確認したデータがSlave側で読めるようになるには若干のタイムラグが発生する可能性がある。いわゆるSequential Consistencyである。ミリ秒オーダーなのでHTTPなWebサービスの文脈で大問題になるケースは稀だが覚えておいた方がいいかも知れない。

チェックポイントの挙動の違い

Auroraはシステム全体で見ると、Masterがせっかく更新したページをそのまま複製せずにKVSがログリプレイして再構築する分CPUクロックは無駄になっている。しかし、Masterはページを書き戻す必要が無くなり、更に言うとMasterがチェックポイント処理をする必要もなくなった。なぜならチェックポイント処理は分散KVS側で継続的にページ単位で実施されているからだ。なんだこれは。In-MemoryDBでもDisk-Oriented DBとも違うチェックポイントアーキテクチャだ。それぞれのチェックポイント戦略をここに列挙する。

· ARIES: Checkpoint-Begin をWALに書いてからその瞬間のDirty Page TableとTransaction Tableを保存して、リスタート時のRedo-Log適用開始ポイントを算出可能にする。

· MySQL: ダーティなページをディスクに書き出す。ページの境界とブロックストレージのページ境界が一致しない事のほうが普通なのでチェックポイント中に電源が落ちたらページの一部が中途半端に永続化されてしまう。そこで二度書く事によってアトミック性を達成する（Double Write と呼ぶ）。

· PostgreSQL: ダーティなページをディスクに書き出す。ページの境界とブロックストレージのページ境界が一致しない事のほうが普通なのでチェックポイント中に電源が落ちたらページの一部が中途半端に永続化されてしまう。そこでそのチェックポイント後に最初にそのページに触るWALの中にページ(デフォルトで8KB)を丸っと埋め込んで完全性を保障する(Full Page Writeと呼ぶ)。

· In-Memory DB: どこかのタイミングでメモリの内容をモリッとディスクに書き出してリスタート時に整合性を直すSiloRとか、ログを並列スキャンして完全なイメージを生成するFOEDUSとか戦略はまだ多岐に渡っている。

· Aurora: バックエンドのAuroraストレージが自動でログを適用していく。ページごとにログバッファが付いてて、バッファの長さがしきい値を超えるたびにページへのログ適用が実施される。ログは未コミットのトランザクションの進行中のログも含むがMasterがリスタートしている時点でそのトランザクションはそれ以上進むはずがないのでログを切り詰める(Truncate)。その際には最新の永続化済みのコミット完了のLSNまで復旧する。なおこの復旧処理はMasterが元気に進行している最中であってもバックグラウンドで良しなに実行される。ここのバックグラウンド処理とチェックポイントに差がないのがAuroraの学術的新規性の一つだと思う。Redo-logリプレイ機能付き分散ストレージがいればチェックポイントに係る複雑さが一気に解決できる。

ベンチマーク結果

論文から抜粋すると

大きめのインスタンスの場合に性能向上の伸びしろが大きいようだ。

その他

なんか他に工夫ないの

ログ処理周りは大胆に手が加えられており、中でも感心したのはFlush Pipeliningが実装されている。
通常、ログが永続化されるのを待つにはロガーにログ内容を渡して、完了が報告されるまでセマフォなどで寝るのが典型的な実装パターンである。しかしAuroraではロガーにログ内容を渡した後に「終わったらクライアントに完了を報告せよ」とキューに依頼を投げ込むだけで、そのスレッドは即座に次のリクエストを捌く処理に移行する。キューの中身を確認する専用のスレッドが居て、新たに永続化されたログのLSNとキューに登録された依頼を見比べて、永続化されたコミットの完了をクライアントに代理で報告する。
PostgreSQLでもpgbenchでベンチマークを取ってイジメてみるとすぐにセマフォ処理近辺がボトルネックになるのでAuroraを真似てこの辺弄っても良さそうな気がするが大改造になるのでコミュニティには歓迎されない気がする。

Aurora Multi-Masterってどうなの

この論文で解説されてる仕組みだとLSNの発行からして複数台のマシンからやってダメなのでログのフォーマットのレベルで改造が加えられてそうな気がする。詳しくは動画で
https://www.youtube.com/watch?time_continue=2620&v=rPmKo2g9znA
どうやらパーティション単位で「テーブルのこの範囲はサーバAがリーダーね！」的に分割統治してMasterを複数用意するようだ。そして自分がMasterじゃないテーブルには一応書き込めるが最終的には調停者が決定するとの事である。更新が競合している場合はWrite性能は上がらないが競合していない場合は性能はよく伸びるらしい。

どんな更新がこれから来るかな

分散KVS側のCPUが安くて空いていて、そいつが保存しているページ内容に対してredo-logを適用できる程度に中身を解釈して動いているので、そいつらに集計系クエリを実行させるのはコストメリットが良さそう。貧者のOLAPとしてJOINが苦手なDB実装がクローズドな世界に君臨する可能性はあると思っている。もしくはredo-logをRedshiftにそのまま投げつけていってSlaveの一つとして稼働するようになるとか。

まとめ

· Auroraは投げつけられたRedoログをストレージ側でバックグラウンドで適用できるからMasterの負担が減った。なので性能が伸びるようになった。

· インテリジェントな分散ストレージすげーな！

· これを設計したAmazonの中の人は確実に「分散システムの肌感覚(Struggler対策とかシングルマスター構成とか)」「Disk-Oriented DBのACID保障の裏側(新しいDBページ一貫性保障プロトコル設計)」「Webアプリプログラマの需要(Webの文脈ではSlaveの厳密な一貫性までは必要としない事)」「クラウドのリソース感覚(分散KVSのCPUは遊んでいるリソースだから活用すべき)」の4つ全てについて確実に僕より深い造詣と生きた知識を持っている。恐るべし。

1. 気合の入ったDBならLRUではなくて二重底なvictim付きのページ管理機構にすることで単なるフルスキャンがバッファプール全部を追い出す状況を作らないように頑張ったりするがそれは今回は話さない） ↩

2. この処理を6多重全部でやると重いので実は一部のマシンでしかこのApply操作はしないらしい。

https://qiita.com/kumagi/items/67f9ac0fb4e6f70c056d

When Should I Use Amazon Aurora and When Should I use RDS MySQL?

Now that Database-as-a-service (DBaaS) is in high demand, there is one question regarding AWS services that cannot always be answered easily : When should I use Aurora and when RDS MySQL?

DBaaS cloud services allow users to use databases without configuring physical hardware and infrastructure, and without installing software. I'm not sure if there is a straightforward answer, but when trying to find out which solution best fits an organization there are multiple factors that should be taken into consideration. These may be performance, high availability, operational cost, management, capacity planning, scalability, security, monitoring, etc.

There are also cases where although the workload and operational needs seem to best fit to one solution, there are other limiting factors which may be blockers (or at least need special handling).

In this blog post, I will try to provide some general rules of thumb but let's first try to give a short description of these products.

What we should really compare is the MySQL and Aurora database engines provided by Amazon RDS.

An introduction to Amazon RDS

Amazon Relational Database Service (Amazon RDS) is a hosted database service which provides multiple database products to choose from, including Aurora, PostgreSQL, MySQL, MariaDB, Oracle, and Microsoft SQL Server. We will focus on MySQL and Aurora.

With regards to systems administration, both solutions are time-saving. You get an environment ready to deploy your application and if there are no dedicated DBAs, RDS gives you great flexibility for operations like upgrades or backups. For both products, Amazon applies required updates and the latest patches without any downtime. You can define maintenance windows and automated patching (if enabled) will occur within them. Data is continuously backed up to S3 in real time, with no performance impact. This eliminates the need for backup windows and other, complex or not, scripted procedures. Although this sounds great, the risk of vendor lock-in and the challenges of enforced updates and client-side optimizations are still there.

So, Aurora or RDS MySQL?

Amazon Aurora is a relational, proprietary, closed-source database engine, with all that that implies.

RDS MySQL is 5.5, 5.6 and 5.7 compatible and offers the option to select among minor releases. While RDS MySQL supports multiple storage engines with varying capabilities, not all of them are optimized for crash recovery and data durability. Until recently, it was a limitation that Aurora was only compatible with MySQL 5.6 but it's now compatible with both 5.6 and 5.7 too.

So, in most cases, no significant application changes are required for either product. Keep in mind that certain MySQL features like the MyISAM storage engine are not available with Amazon Aurora. Migration to RDS can be performed using Percona XtraBackup.

For RDS products shell access to the underlying operating system is disabled and access to MySQL user accounts with the "SUPER" privilege isn't allowed. To configure MySQL variables or manage users, Amazon RDS provides specific parameter groups, APIs and other special system procedures which be used. If you need to enable remote access this article will help you do so https://www.percona.com/blog/2018/05/08/how-to-enable-amazon-rds-remote-access/

Performance considerations

Although Amazon RDS uses SSDs to achieve better IO throughput for all its database services, Amazon claims that the Aurora is able to achieve a 5x performance boost than standard MySQL and provides reliability out of the box. In general, Aurora seems to be faster, but not always.

For example, due to the need to disable the InnoDB change buffer for Aurora (this is one of the keys for the distributed storage engine), and that updates to secondary indexes must be write through, there is a big performance penalty in workloads where heavy writes that update secondary indexes are performed. This is because of the way MySQL relies on the change buffer to defer and merge secondary index updates. If your application performs a high rate of updates against tables with secondary indexes, Aurora performance may be poor. In any case, you should always keep in mind that performance depends on schema design. Before taking the decision to migrate, performance should be evaluated against an application specific workload. Doing extensive benchmarks will be the subject of a future blog post.

Capacity Planning

Talking about underlying storage, another important thing to take into consideration is that with Aurora there is no need for capacity planning. Aurora storage will automatically grow, from the minimum of 10 GB up to 64 TiB, in 10 GB increments, with no impact on database performance. The table size limit is only constrained by the size of the Aurora cluster volume, which has a maximum of 64 tebibytes (TiB). As a result, the maximum table size for a table in an Aurora database is 64 TiB. For RDS MySQL, the maximum provisioned storage limit constrains the size of a table to a maximum size of 16 TB when using InnoDB file-per-table tablespaces.

Replication

Replication is a really powerful feature of MySQL (like) products. With Aurora, you can provision up to fifteen replicas compared to just five in RDS MySQL. All Aurora replicas share the same underlying volume with the primary instance and this means that replication can be performed in milliseconds as updates made by the primary instance are instantly available to all Aurora replicas. Failover is automatic with no data loss on Amazon Aurora whereas the replicas failover priority can be set.

An explanatory description of Amazon Aurora's architecture can be found in Vadim's post written a couple of years ago https://www.percona.com/blog/2015/11/16/amazon-aurora-looking-deeper/

The architecture used and the way that replication works on both products shows a really significant difference between them. Aurora is a High Availablity (HA) solution where you only need to attach a reader and this automatically becomes Multi-AZ available. Aurora replicates data to six storage nodes in Multi-AZs to withstand the loss of an entire AZ (Availability Zone) or two storage nodes without any availability impact to the client's applications.

On the other hand, RDS MySQL allows only up to five replicas and the replication process is slower than Aurora. Failover is a manual process and may result in last-minute data loss. RDS for MySQL is not an HA solution, so you have to mark the master as Multi-AZ and attach the endpoints.

Monitoring

Both products can be monitored with a variety of monitoring tools. You can enable automated monitoring and you can define the log types to publish to Amazon CloudWatch. Percona Monitoring and Management (PMM) can also be used to gather metrics.

Be aware that for Aurora there is a limitation for the T2 instances such that Performance Schema can cause the host to run out of memory if enabled.

Costs

Aurora instances will cost you ~20% more than RDS MySQL. If you create Aurora read replicas then the cost of your Aurora cluster will double. Aurora is only available on certain RDS instance sizes. Instances pricing details can be found here and here.

Storage pricing may be a bit tricky. Keep in mind that pricing for Aurora differs to that for RDS MySQL. For RDS MySQL you have to select the type and size for the EBS volume, and you have to be sure that provisioned EBS IOPs can be supported by your instance type as EBS IOPs are restricted by the instance type capabilities. Unless you watch for this, you may end up having EBS IOPs that cannot be really used by your instance.

For Aurora, IOPs are only limited by the instance type. This means that if you want to increase IOPs performance on Aurora you should proceed with an instance type upgrade. In any case, Amazon will charge you based on the dataset size and the requests per second.

That said, although for Aurora you pay only for the data you really use in 10GB increments if you want high performance you have to select the correct instance. For Aurora, regardless of the instance type, you get billed $0.10 per GB-month and $0.20 per 1 million requests so if you need high performance the cost maybe even more than RDS MySQL. For RDS MySQL storage costs are based on the EBS type and size.

Percona provides support for RDS services and you might be interested in these cases studies:

When a more fully customized solution is required, most of our customers usually prefer the use of AWS EC2 instances supported by our managed services offering.

TL;DR

If you are looking for a native HA solution then you should use Aurora
For a read-intensive workload within an HA environment, Aurora is a perfect match. Combined with ProxySQL for RDS you can get a high flexibility
Aurora performance is great but is not as much as expected for write-intensive workloads when secondary indexes exist. In any case, you should benchmark both RDS MySQL and Aurora before taking the decision to migrate. Performance depends much on workload and schema design
By choosing Amazon Aurora you are fully dependent on Amazon for bug fixes or upgrades
If you need to use MySQL plugins you should use RDS MySQL
Aurora only supports InnoDB. If you need other engines i.e. MyISAM, RDS MySQL is the only option
With RDS MySQL you can use specific MySQL releases
Aurora is not included in the AWS free-tier and costs a bit more than RDS MySQL. If you only need a managed solution to deploy services in a less expensive way and out of the box availability is not your main concern, RDS MySQL is what you need
If for any reason Performance Schema must be ON, you should not enable this on Amazon Aurora MySQL T2 instances. With the Performance Schema enabled, the T2 instance may run out of memory
For both products, you should carefully examine the known issues and limitations listed here https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.KnownIssuesAndLimitations.html and here https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.AuroraMySQL.html

https://www.percona.com/blog/2018/07/17/when-should-i-use-amazon-aurora-and-when-should-i-use-rds-mysql/

登録: 投稿 (Atom)