Today, we are releasing a plugin that allows customers to use the Titan graph engine with Amazon DynamoDB as the backend storage layer. It opens up the possibility to enjoy the value that graph databases bring to relationship-centric use cases, without worrying about managing the underlying storage.

The importance of relationships

Relationships are a fundamental aspect of both the physical and virtual worlds. Modern applications need to quickly navigate connections in the physical world of people, cities, and public transit stations as well as the virtual world of search terms, social posts, and genetic code, for example. Developers need efficient methods to store, traverse, and query these relationships. Social media apps navigate relationships between friends, photos, videos, pages, and followers. In supply chain management, connections between airports, warehouses, and retail aisles are critical for cost and time optimization. Similarly, relationships are essential in many other use cases such as financial modeling, risk analysis, genome research, search, gaming, and others. Traditionally, these connections have been stored in relational databases, with each object type requiring its own table. When using relational databases, traversing relationships requires expensive table JOIN operations, causing significantly increased latency as table size and query complexity grow.

Enter graph databases

Graph databases belong to the NoSQL family, and are optimized for storing and traversing relationships. A graph consists of vertices, edges, and associated properties. Each vertex contains a list of properties and edges, which represent the relationships to other vertices. This structure is optimized for fast relationship query and traversal, without requiring expensive table JOIN operations.
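
As a rough illustration of the structure described above, here is a minimal in-memory sketch in Python. The `Vertex` and `Graph` classes and their method names are hypothetical and chosen only for this example; they are not Titan's API. The point is that each vertex carries its own edge list, so finding its neighbors is a direct lookup rather than a JOIN across tables.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A vertex with its own properties and its outgoing edges."""
    vertex_id: str
    properties: dict = field(default_factory=dict)
    # Maps an edge label to a list of (target vertex id, edge properties).
    edges: dict = field(default_factory=lambda: defaultdict(list))


class Graph:
    """Hypothetical adjacency-list graph; not how Titan stores data."""

    def __init__(self):
        self.vertices = {}

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = Vertex(vertex_id, dict(properties))

    def add_edge(self, source, label, target, **properties):
        self.vertices[source].edges[label].append((target, dict(properties)))

    def neighbors(self, vertex_id, label):
        # Only this vertex's edge list is read, regardless of graph size.
        vertex = self.vertices[vertex_id]
        return [target for target, _ in vertex.edges.get(label, [])]


graph = Graph()
graph.add_vertex("alice", name="Alice")
graph.add_vertex("bob", name="Bob")
graph.add_edge("alice", "friend", "bob", since=2014)
print(graph.neighbors("alice", "friend"))
```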

In this way, graphs can scale to billions of vertices and edges, while allowing efficient queries and traversal of any subset of the graph with consistent low latency that doesn’t grow proportionally to the overall graph size. This is an important benefit for many use cases that involve accessing and traversing small subsets of a large graph. A concrete example is generating a product recommendation based on purchase interests of a user’s friends, where the relevant social connections are a small subset of the total network. Another example is for tracking inventory in a vast logistics system, where only a subset of its locations is relevant for a specific item. For us at Amazon, the challenge of tracking inventory at massive scale is not just theoretical, but very real.
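
The recommendation example can be sketched in a few lines of Python. This is a simplified illustration under assumed inputs: the `recommend` function and the two dictionaries (`friends` and `purchases`) are hypothetical stand-ins for a real graph store. Note that the work done depends on the number of friends a user has, not on the total number of users in the network.

```python
from collections import Counter


def recommend(friends, purchases, user, limit=3):
    """Rank products bought by the user's friends that the user does not own."""
    owned = set(purchases.get(user, ()))
    counts = Counter()
    # Only the user's own friend list is traversed, not the whole graph.
    for friend in friends.get(user, ()):
        for product in set(purchases.get(friend, ())):
            if product not in owned:
                counts[product] += 1
    # Most popular first; ties are broken alphabetically for stable output.
    ranked = sorted(counts, key=lambda product: (-counts[product], product))
    return ranked[:limit]


# Hypothetical sample data.
friends = {"alice": ["bob", "carol"], "dave": ["erin"]}
purchases = {
    "alice": ["book"],
    "bob": ["book", "lamp"],
    "carol": ["lamp", "tent"],
    "erin": ["bike"],
}
print(recommend(friends, purchases, "alice"))  # ['lamp', 'tent']
```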

Graph databases at Amazon

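As with many AWS innovations, the desire to build a solution for a scalable graph database came from Amazon’s retail business. Amazon runs one of the largest fulfillment networks in the world, and we need to optimize our systems to quickly and accurately track the movement of vast amounts of inventory. This requires a database that can quickly traverse the logistics history for a given item or order. Graph databases are ideal for the task, since they make it easy to store and retrieve each item’s logistics history.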

Our criteria for choosing the right graph engine were:

  1. The ability to support a graph containing billions of vertices and edges.
  2. The ability to scale with the accelerating pace of new items added to the catalog, and new objects and locations in the company’s expanding fulfillment network.

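After evaluating different technologies, we decided to use Titan, a distributed graph database engine for creating and querying large graphs. Titan has a pluggable storage architecture, using existing NoSQL databases as the underlying storage for the graph data. While the Titan-based solution worked well for our needs, the team quickly found itself having to devote more and more time to the database cluster behind Titan, instead of focusing on its original task of optimizing fulfillment inventory tracking.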

Thus, the idea was born for a robust, highly available, and scalable backend solution that wouldn’t require the burden of managing a massive storage layer. As I wrote in the past, I believe DynamoDB is a natural choice for such needs, providing developers flexibility and minimal operational overhead without compromising scale, availability, durability, or performance. Making use of Titan’s flexible architecture, we created a plugin that uses DynamoDB as the storage backend for Titan. The combination of Titan with DynamoDB is now powering Amazon’s fulfillment network, with a multi-terabyte dataset.

Sharing it with you

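Today, we are happy to bring the results of this effort to customers by releasing the DynamoDB Storage Backend for Titan plugin on GitHub. The plugin provides a flexible data model for each Titan backend table, allowing developers to optimize for simplicity (single-item model) or scalability (multi-item model).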

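The single-item model uses a single DynamoDB item to store the edges and properties of a vertex. In DynamoDB, the vertex ID is stored as the hash key of an item, vertex property and edge identifiers are attribute names, and the vertex property values and edge property values are stored in the respective attribute values. While the single-item data model is simpler, due to DynamoDB’s 400 KB item size limit, you should only use it for graphs with fairly low vertex degree and a small number of properties per vertex.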

For graphs with higher vertex degrees, the multi-item model uses multiple DynamoDB items to store properties and edges of a single vertex. In the multiple-item data model, the vertex ID remains the DynamoDB hash key, but unlike the single-item model, each column becomes the range key in its own item. Each column value is stored in its own attribute. While requiring more writes to initially load the graph, the multiple-item model allows you to store large graphs without limiting vertex degree.
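
To make the difference between the two data models concrete, here is a minimal Python sketch that lays out one vertex both ways as plain dictionaries. It does not call DynamoDB or the plugin. The attribute names (`vertex_id`, `column`, `value`), the column naming scheme, and the size estimate are all assumptions made for illustration and will not match the plugin's actual schema or DynamoDB's exact size accounting.

```python
# The post cites a 400 KB item limit; treating 1 KB as 1024 bytes is an assumption.
ITEM_LIMIT_BYTES = 400 * 1024


def to_single_item(vertex_id, columns):
    """Single-item model: one item per vertex, one attribute per column."""
    item = {"vertex_id": vertex_id}  # hash key (hypothetical attribute name)
    item.update(columns)
    return item


def to_multiple_items(vertex_id, columns):
    """Multiple-item model: one item per column, keyed by (vertex_id, column)."""
    return [
        # vertex_id is the hash key, column is the range key (hypothetical names).
        {"vertex_id": vertex_id, "column": name, "value": value}
        for name, value in sorted(columns.items())
    ]


def approximate_size(item):
    """Rough estimate: UTF-8 length of every attribute name and value."""
    return sum(
        len(str(name).encode("utf-8")) + len(str(value).encode("utf-8"))
        for name, value in item.items()
    )


# Hypothetical columns for one vertex: a property and an edge.
columns = {"name": "Alice", "edge:friend:bob": "since=2014"}

single = to_single_item("v1", columns)
multiple = to_multiple_items("v1", columns)

print(single)
print(multiple)
print(approximate_size(single) <= ITEM_LIMIT_BYTES)
```

In the single-item layout, every added edge grows the same item, so a high-degree vertex eventually hits the item size limit. In the multiple-item layout, each edge is a separate small item, at the cost of one write per column.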

Amazon’s need for a hassle-free, scalable Titan solution is not unique. Many of our customers told us they have used Titan as a scalable graph solution, but setting up and managing the underlying storage are time-consuming chores. Several of them participated in a preview program for the plugin and are excited to offload their graph storage management to AWS. Brian Sweatt, Technical Advisor at AdAgility, explained:

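“At AdAgility, we store data pertaining to advertisers and publishers, as well as transactional data about customers who view and interact with our offers. The relationships between these stakeholders lend themselves naturally to a graph database, and we plan to leverage our experience with Titan and Groovy for our next-generation ad targeting platform. Amazon’s integration between Titan and DynamoDB will allow us to do that without spending time on setting up and managing the storage cluster, a no-brainer for an agile, fast-growing startup.”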

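Another customer says AWS makes it easier to analyze large graphs of data and the relationships within the data. According to Tom Soderstrom, Chief Technology Officer at NASA’s Jet Propulsion Laboratory: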

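“We have begun to leverage graph databases extensively at JPL and to run deep machine learning on them. The open sourced plugin for Titan over DynamoDB will help us expand our use cases to larger data sets, while enjoying the power of cloud computing in a fully managed NoSQL database. It is exciting to see AWS integrate DynamoDB with open projects like Elasticsearch and Titan, while open sourcing the integrations.”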

Bringing it all together

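When building applications that are centered on relationships (such as social networks or master data management) or auxiliary relationship-focused use cases for existing applications (such as a recommendation engine for matching players in a game or fraud detection for a payment system), a graph database is an intuitive and effective way to achieve fast performance at scale, and should be on your shortlist of database options. With the launch of the DynamoDB storage backend for Titan, you no longer need to worry about managing the storage layer for your Titan graphs, making it easy to manage even very large graphs like the ones we have here at Amazon. I am excited to hear how you are leveraging graph databases for your applications. Please share your thoughts in the comments section below.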

For more information about the DynamoDB storage backend plug-in for Titan, see Jeff Barr’s blog and the Amazon DynamoDB Storage Backend for Titan topic in the Amazon DynamoDB Developer Guide.
