Relational databases have been around for a long time. The relational model of data was pioneered back in the 1970s by E.F. Codd. The core technologies underpinning the major relational database management systems of today were developed in the 1980–1990s. Relational database fundamentals, including data relationships, ACID (Atomicity, Consistency, Isolation, Durability) transactions, and the SQL query language, have stood the test of time. Those fundamentals helped make relational databases immensely popular with users everywhere. They remain a cornerstone of IT infrastructure in many companies.

That's not to say systems administrators have to love working with relational databases. For decades, managing a relational database has been a high-skill, labor-intensive task, one requiring the undivided attention of dedicated system and database administrators. Scaling a relational database while maintaining fault tolerance, performance, and a small blast radius (the impact of a failure) has been a persistent challenge for administrators.

At the same time, modern internet workloads have become more demanding and require several essential properties from infrastructure:

  1. Users want to start with a small footprint and then grow massively without infrastructure limiting their velocity.
  2. In large systems, failure is the norm, not the exception. Customer workloads must be insulated from individual component failures, as well as from whole-system failures.
  3. Small blast radius. No one wants a single system failure to have a large impact on their business.

These are hard problems, and solving them requires breaking away from the old-guard relational database architecture. When Amazon was confronted with the limitations of old-guard relational databases like Oracle, we created a modern relational database service, Amazon Aurora.

Aurora's design preserves the core transactional consistency strengths of relational databases. It innovates at the storage layer to create a database built for the cloud that can support modern workloads without sacrificing performance. Customers love this because Aurora provides the performance and availability of commercial grade databases at 1/10th the cost. Since Aurora's original release, it has been the fastest-growing service in the history of AWS.

In this post, I'd like to give you a peek under the hood at how we built Aurora. I'll also discuss why customers are adopting it faster than any other service in AWS history.

Relational databases reimagined

Consider the old-guard relational database architecture:

This monolithic relational database stack had not changed much in 30–40 years. While different conventional approaches exist for scaling out databases (for example, sharding, shared nothing, or shared disk), all of them share the same basic database architecture. None solve the performance, resiliency, and blast radius problems at scale, because the fundamental constraint of the tightly coupled, monolithic stack remains.

To start addressing the limitations of relational databases, we reconceptualized the stack by decomposing the system into its fundamental building blocks. We recognized that the caching and logging layers were ripe for innovation. We could move these layers into a purpose-built, scale-out, self-healing, multitenant, database-optimized storage service. When we began building the distributed storage system, Amazon Aurora was born.

We challenged the conventional ideas of caching and logging in a relational database, reinvented the database I/O layer, and reaped major scalability and resiliency benefits. Amazon Aurora is remarkably scalable and resilient, because it embraces the ideas of offloading redo logging, cell-based architecture, quorums, and fast database repairs.

Offloading redo logging: The log is the database

Traditional relational databases organize data in pages, and as pages are modified, they must be periodically flushed to disk. For resilience against failures, and to maintain ACID semantics, page modifications are also recorded in redo log records, which are written to disk in a continuous stream. While this architecture provides the basic functionality needed to support a relational database management system, it's rife with inefficiencies. For example, a single logical database write turns into multiple (up to five) physical disk writes, resulting in performance problems.

Database admins try to combat the write amplification problem by reducing the frequency of page flushes. This in turn worsens the problem of crash recovery duration. A longer interval between flushes means more redo log records to read from disk and apply to reconstruct the correct page image. That results in a slower recovery.

In Amazon Aurora, the log is the database. The database instance writes redo log records to the distributed storage layer, and the storage layer constructs page images from the log records on demand. The database instance never has to flush dirty pages, because the storage layer always knows what the pages look like. This improves several aspects of database performance and reliability. Write performance improves dramatically, because write amplification is eliminated and writes are spread across a scale-out storage fleet.

For example, Amazon Aurora MySQL-compatible edition demonstrates 5x write IOPS on the SysBench benchmark compared to Amazon RDS for MySQL running on similar hardware. Database crash recovery time is cut down dramatically, because a database instance no longer has to perform a redo log stream replay. The storage layer takes care of redo log application on page reads, resulting in a new storage service free from the constraints imposed by a legacy database architecture, so you can innovate even further.
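To make the "log is the database" idea concrete, here is a minimal toy sketch (not Aurora's actual implementation; all names and the tiny page size are illustrative) of a storage layer that keeps only redo records and materializes a page image on read, so the database instance never flushes pages:

```python
from dataclasses import dataclass, field

PAGE_SIZE = 16  # toy page size; real database pages are e.g. 16 KiB


@dataclass(order=True)
class RedoRecord:
    lsn: int        # log sequence number: total order of changes
    offset: int     # byte offset within the page
    payload: bytes  # new bytes to write at that offset


@dataclass
class PageStore:
    # per-page redo streams; a real store would also keep base images
    log: dict = field(default_factory=dict)

    def append(self, page_id: int, rec: RedoRecord) -> None:
        """The database instance ships only small redo records."""
        self.log.setdefault(page_id, []).append(rec)

    def read_page(self, page_id: int) -> bytes:
        """Materialize the page by replaying its redo stream in LSN order."""
        page = bytearray(PAGE_SIZE)
        for rec in sorted(self.log.get(page_id, [])):
            page[rec.offset:rec.offset + len(rec.payload)] = rec.payload
        return bytes(page)


store = PageStore()
# Records can arrive out of order; LSN order decides the final image.
store.append(7, RedoRecord(lsn=2, offset=0, payload=b"hi"))
store.append(7, RedoRecord(lsn=1, offset=0, payload=b"xx"))
print(store.read_page(7)[:2])  # b'hi'
```

Note how crash recovery falls out for free in this model: there are no dirty pages to lose, because any page can be rebuilt from its log at read time.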

Cell-based architecture

As I said before, everything fails all the time. Components fail, and fail often, in large systems. Entire instances fail. Network failures can isolate significant chunks of infrastructure. Less often, an entire data center can become isolated or go down due to a natural disaster. At AWS, we engineer for failure, and we rely on cell-based architectures to contain problems before they spread.

AWS has multiple geographic Regions (20 and counting), and within each Region, we have multiple Availability Zones. Using multiple Regions and zones allows a well-designed service to survive ordinary garden-variety component failures, as well as larger disasters, without any impact on service availability. Amazon Aurora replicates all writes across three Availability Zones to provide superior data durability and availability. In fact, Aurora can tolerate the loss of an entire zone without losing data availability, and it can recover quickly from larger failures.

However, replication is well-known to be resource-intensive, so what makes it possible for Aurora to provide robust data replication while also offering high performance? The answer lies in quorums.

The beauty of quorums

Everything fails all the time. The larger the system, the larger the probability that something is broken somewhere: a network link, an SSD, an entire instance, or a software component. Even when a software component is bug-free, it still needs periodic restarts for upgrades.

The traditional approaches of blocking I/O processing until a failover can be carried out—and operating in "degraded mode" when a faulty component is present—are problematic at scale. Applications often don't tolerate I/O hiccups well. With moderately complex math, it can be demonstrated that, in a large system, the probability of operating in degraded mode approaches 1 as the system size grows. And then, there's the truly insidious problem of "gray failures." These occur when components do not fail completely, but become slow. If the system design does not anticipate the lag, the slow cog can degrade the performance of the overall system.

Amazon Aurora uses quorums to combat both component failures and performance degradation. The basic idea of a write quorum is simple: write to as many replicas as needed to ensure that a quorum read always finds the latest data. The most basic quorum example is "2 out of 3":


Vw + Vr > V
Vw > V / 2
V = 3
Vw = Vr = 2

For example, you might issue three physical writes, with a write quorum of two. You don't have to wait for all three to complete: the logical write operation is declared successful as soon as two of them finish. If one of the writes fails, or is slow, that's OK, because the overall outcome and latency aren't affected by the outlier. This is a big deal: a write can be successful and fast even when something is broken.
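The behavior above can be sketched in a few lines. This is a toy model (the replica jobs, delays, and helper names are all illustrative), showing how a 2-of-3 write quorum declares success as soon as any two replicas acknowledge, without waiting for a straggler:

```python
import concurrent.futures as cf
import time


def write_replica(replica_id: int, delay: float, fail: bool = False) -> int:
    time.sleep(delay)  # simulated network/disk latency
    if fail:
        raise IOError(f"replica {replica_id} unavailable")
    return replica_id


def quorum_write(jobs, quorum: int = 2) -> set:
    """Issue all replica writes concurrently; return once `quorum` succeed.

    A failed or slow replica is tolerated as long as the quorum is reached.
    """
    acked = set()
    pool = cf.ThreadPoolExecutor(max_workers=len(jobs))
    futures = [pool.submit(write_replica, *job) for job in jobs]
    try:
        for fut in cf.as_completed(futures):
            try:
                acked.add(fut.result())
            except IOError:
                continue           # one broken replica doesn't fail the write
            if len(acked) >= quorum:
                return acked       # success without waiting for the straggler
        raise IOError("write quorum not reached")
    finally:
        pool.shutdown(wait=False)  # let any straggler finish in the background


# Replicas 1 and 3 are fast; replica 2 is slow. The logical write completes
# as soon as the two fast replicas acknowledge.
acks = quorum_write([(1, 0.01), (2, 0.5), (3, 0.01)])
print(sorted(acks))  # [1, 3]
```

The key design point is that latency is determined by the fastest `quorum` responders, not the slowest replica, which is exactly what insulates the write path from gray failures.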

The simple 2/3 quorum would allow you to tolerate a loss of an entire Availability Zone. This is still not good enough, though. While a loss of a zone is a rare event, it doesn't make component failures in other zones any less likely. With Aurora, our goal is Availability Zone+1: we want to be able to tolerate a loss of a zone plus one more failure without any data durability loss, and with a minimal impact on data availability. We use a 4/6 quorum to achieve this:


Vw + Vr > V
Vw > V / 2
V = 6
Vw = 4
Vr = 3

For each logical log write, we issue six physical replica writes, and consider the write operation successful when four of those writes complete. With two replicas in each Availability Zone, the write still completes even if an entire zone goes down. If a zone fails and one more failure occurs, you can still achieve a read quorum, and then quickly regain the ability to write by performing a fast repair.
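A small back-of-envelope check (a toy model of six segments, two per zone) confirms the Availability Zone+1 property of the 4/6 quorum described above:

```python
# Quorum parameters from the text: 6 copies, write quorum 4, read quorum 3.
V, Vw, Vr = 6, 4, 3

# The two classic quorum conditions must hold:
# reads overlap writes, and two writes cannot both succeed on disjoint sets.
assert Vw + Vr > V
assert Vw > V / 2


def survives(failed_copies: int, quorum: int, total: int = V) -> bool:
    """Can a quorum still be assembled from the surviving copies?"""
    return total - failed_copies >= quorum


# Lose one entire Availability Zone (2 of the 6 copies):
print(survives(2, Vw))  # True  -> writes still succeed
print(survives(2, Vr))  # True  -> reads still succeed

# Lose one zone plus one more copy (3 failed) -- the "AZ+1" scenario:
print(survives(3, Vw))  # False -> write quorum temporarily lost
print(survives(3, Vr))  # True  -> read quorum holds, enabling fast repair
```

This is exactly the situation the text describes: after a zone loss plus one failure, reads keep working, and the repair process restores enough copies to re-establish the write quorum.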

Fast repairs and catch-up

There are different ways to do data replication. In traditional storage systems, data mirroring or erasure coding occurs at the level of an entire physical storage unit, with several units combined together in a RAID array. This approach makes repairs slow. RAID array rebuild performance is limited by the capabilities of the small number of devices in the array. As storage devices get larger, so does the amount of data that must be copied during a rebuild.

Amazon Aurora uses an entirely different approach to replication, based on sharding and scale-out architecture. An Aurora database volume is logically divided into 10-GiB logical units (protection groups), and each protection group is replicated six ways into physical units (segments). Individual segments are spread across a large distributed storage fleet. When a failure occurs and takes out a segment, the repair of a single protection group only requires moving ~10 GiB of data, which is done in seconds.

Moreover, when multiple protection groups must be repaired, the entire storage fleet participates in the repair process. This provides enormous aggregate bandwidth, so an entire batch of repairs completes quickly. So, if a zone loss is followed by another component failure, Aurora may lose write quorum for a given protection group for a few seconds. However, the automatically initiated repair then restores write availability at blazing speed. In other words, Aurora storage heals itself quickly.
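A rough back-of-envelope sketch shows why scale-out repair is fast. The per-node bandwidth figure here is an assumption for illustration, not an Aurora specification; the point is only that repair time stays flat as failures scale, because failed segments scatter across the fleet and are repaired in parallel:

```python
SEGMENT_GIB = 10       # each protection group segment is ~10 GiB
NODE_BW_GIB_S = 1.0    # assumed per-node replication bandwidth (illustrative)


def repair_seconds(failed_segments: int, nodes_participating: int) -> float:
    """Time to re-replicate failed segments, one segment per node per wave."""
    waves = -(-failed_segments // nodes_participating)  # ceiling division
    return waves * SEGMENT_GIB / NODE_BW_GIB_S


print(repair_seconds(1, 1))        # 10.0 -> one segment repairs in seconds
print(repair_seconds(1000, 1000))  # 10.0 -> a large fleet repairs a large
                                   #         batch in the same wall-clock time
```

Contrast this with a RAID rebuild, where a handful of devices must stream an entire disk's worth of data, so rebuild time grows with device capacity instead of staying constant.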

How is it possible to replicate data six ways and still maintain high write performance? This would be impossible in a traditional database architecture, where full pages or disk sectors are written to storage, because the network would become overloaded. In contrast, Aurora instances write only redo log records to storage. Those records are much smaller (typically tens of bytes), which makes the 4/6 write quorum possible without overwhelming the network.

The basic idea of a write quorum implies that some segments may not receive all of the writes initially. How do those segments deal with gaps in the redo log stream? Aurora storage nodes continuously "gossip" among themselves to fill the holes (and to perform repairs). Log stream progression is carefully curated through Log Sequence Number (LSN) management. We use a set of LSN markers to maintain the state of each individual segment.
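A toy sketch of that gossip idea (the structure and names are illustrative, not Aurora's actual protocol): each segment tracks which LSNs it holds, and a gossip round lets every segment pull the records it is missing from its peers:

```python
def gossip_round(segments) -> None:
    """One gossip round: every segment fills its log holes from its peers.

    Each segment is a dict mapping LSN -> redo record payload.
    """
    union = {}
    for seg in segments:
        union.update(seg)            # what the group collectively knows
    for seg in segments:
        for lsn, rec in union.items():
            seg.setdefault(lsn, rec)  # fill any hole in this segment's stream


# Three segments of one protection group, after writes with LSNs 1-4.
# Each has different holes, because the write quorum doesn't require
# every segment to observe every write.
segments = [
    {1: b"a", 2: b"b", 4: b"d"},  # missing LSN 3
    {1: b"a", 3: b"c"},           # missing LSNs 2 and 4
    {2: b"b", 3: b"c", 4: b"d"},  # missing LSN 1
]
gossip_round(segments)
print(all(set(s) == {1, 2, 3, 4} for s in segments))  # True
```

After the round, every segment holds a contiguous log, which is what allows any surviving copy to serve as a repair source.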

What about reads? A quorum read is expensive, and is best avoided. The client-side Aurora storage driver tracks which writes were successful for which segments. It does not need to perform a quorum read on routine page reads, because it always knows where to obtain an up-to-date copy of a page. Furthermore, the driver tracks read latencies, and always tries to read from the storage node that has demonstrated the lowest latency in the past. The only scenario when a quorum read is needed is during recovery on a database instance restart. The initial set of LSN markers must be reconstructed by asking storage nodes.
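Here is a small sketch of what such a client-side driver might look like. The class, node names, and the exponential smoothing constant are all assumptions for illustration; the point is that among nodes known to hold an up-to-date copy, the driver reads from the historically fastest one, so no quorum read is needed:

```python
class ReadRouter:
    """Tracks per-node read latency and routes reads to the fastest node."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha
        self.latency = {}  # node -> exponentially smoothed latency (ms)

    def record(self, node: str, observed_ms: float) -> None:
        """Fold a new latency observation into the smoothed estimate."""
        prev = self.latency.get(node, observed_ms)
        self.latency[node] = (1 - self.alpha) * prev + self.alpha * observed_ms

    def pick(self, up_to_date_nodes) -> str:
        """No quorum read: we already know which nodes have the latest page,
        so just choose the one with the best latency history."""
        return min(up_to_date_nodes,
                   key=lambda n: self.latency.get(n, float("inf")))


router = ReadRouter()
for node, ms in [("az1-a", 1.2), ("az2-a", 0.4), ("az3-a", 2.5), ("az2-a", 0.6)]:
    router.record(node, ms)
print(router.pick(["az1-a", "az2-a", "az3-a"]))  # az2-a
```

This also illustrates the gray-failure defense mentioned earlier: a node that turns slow without failing outright simply stops winning the latency comparison and drops out of the read path.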

A foundation for innovation

Many remarkable new Aurora features are directly enabled by the distributed, self-healing storage architecture. To name a few:

  • Read scalability: In addition to the master database instance, up to 15 read replicas can be provisioned across multiple zones in Aurora, for read scalability and higher availability. Read replicas use the same shared storage volume as the master.
  • Continuous backup and point-in-time restore: The Aurora storage layer continuously and transparently backs up the redo log stream to Amazon S3. You can perform a point-in-time restore to any timestamp within the configured backup window. There is no need to schedule snapshot creation, and no transactions are lost, even when the snapshot closest to the time of interest is far away.
  • Fast clone: The Aurora storage layer can quickly create a physical copy of a volume without duplicating all of its pages. Pages are initially shared between the parent and child volumes, and copy-on-write happens when a page is modified. There is no duplication cost at the time the volume is cloned.
  • Backtrack: A quick way to bring the database to a particular point in time without performing a full restore from backup. Dropped a table by mistake? You can go back with Aurora Backtrack.

There are many more relational database innovations to come, built on the foundation of the Aurora storage engine. We've all entered a new era of the relational database, and Aurora is just the beginning. The customer response has been one of resounding agreement. Leaders in every industry—like Capital One, Dow Jones, Netflix, and Verizon—are migrating their relational database workloads to Aurora, including MySQL and PostgreSQL-compatible editions.

Want to learn more about Amazon Aurora design?

Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. In SIGMOD 2017.

Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes. In SIGMOD 2018.
