Performance and Scalability


In the A Word On Scalability posting I tried to write down a more precise definition of scalability than is commonly used. There were good comments about the definition at the posting as well as in a discussion at The ServerSide.

To recap in a less precise manner I stated that

  • A service is said to be scalable if, when we increase the resources in a system, performance increases in proportion to the resources added.
  • An always-on service is said to be scalable if adding resources to facilitate redundancy does not result in a loss of performance.
  • A scalable service needs to be able to handle heterogeneity of resources.
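The first definition can be expressed as a simple proportionality check. Here is a minimal sketch; the resource counts, throughput numbers, and the function name are hypothetical, chosen only to illustrate the idea:

```python
def scaling_efficiency(base_resources, base_throughput,
                       new_resources, new_throughput):
    """Ratio of the actual speedup to the ideal linear speedup.

    1.0 means perfectly proportional scaling; values well below 1.0
    indicate the service is not scaling with the resources added.
    """
    ideal = new_resources / base_resources
    actual = new_throughput / base_throughput
    return actual / ideal

# Hypothetical measurements: doubling servers from 4 to 8
# raised throughput from 1000 to 1800 requests/sec.
print(scaling_efficiency(4, 1000, 8, 1800))  # 0.9 -> close to linear
```

In practice you would measure this along whatever axis matters for your service (requests, items, work per request), but the proportionality test is the same.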

There were quite a few comments about the use of performance in the definition. This is my reasoning for why performance appears in this context: I assume that each service has an SLA contract that defines the expectations of its clients/customers (SLA = Service Level Agreement). What exactly goes into that SLA depends on what service business you are in; for the Amazon.com site there are many services that contribute to a latency-driven SLA. This latency will have a certain distribution, and you can pick a number of points of that distribution as representatives for measuring the SLA. For example, at Amazon we also track latency at the 99.9% mark to make sure that all customers get an experience at the SLA or better.
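Tracking a point on the latency distribution, such as the 99.9% mark, amounts to computing a percentile over observed request latencies. A minimal sketch using the nearest-rank method (the sample data here is synthetic, drawn from an exponential distribution to mimic a long latency tail):

```python
import random

def percentile(samples, p):
    """Return the p-th percentile (0-100) of a list of latency
    samples, using the nearest-rank method."""
    ordered = sorted(samples)
    rank = int(round(p / 100.0 * len(ordered)))
    return ordered[max(0, min(len(ordered) - 1, rank - 1))]

# Synthetic latencies in milliseconds: mostly fast, with a slow tail.
random.seed(42)
latencies = [random.expovariate(1 / 20.0) for _ in range(100_000)]

print(f"median: {percentile(latencies, 50):.1f} ms")
print(f"p99.9 : {percentile(latencies, 99.9):.1f} ms")
```

The gap between the median and the 99.9th percentile is exactly why a single average is a poor representative of the customer experience.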

This SLA needs to be maintained if you grow your business. Growing can mean increasing the number of requests, increasing the number of items you serve, increasing the amount of work you do for each request, etc. But no matter along which axis you grow, you will need to make sure you can always meet your SLA. Growth along some axis can be served by scaling up to faster CPUs and larger memories, but if you keep growing there is an end to what you can buy and you will need to scale out. Given that scaling up is often not cost effective, you might as well start by working on scaling out, as you will have to go that path eventually.

I have not seen many SLAs that are purely throughput driven. It is often a combination of the amount of work that needs to be done, the distribution in which it will arrive, and when that work needs to be finished that will lead to a throughput-driven SLA. Latency does play a role here, as it is often a driver for what throughput is necessary to achieve the output distribution. If you have a request arrival distribution that is non-uniform, you can play various games with buffering and capping the throughput at lower than your peak load, as long as you are willing to accept longer latencies. Often it is the latency distribution that you try to achieve that drives your throughput requirements.
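The buffering trade-off above can be made concrete with a toy queue simulation: cap the served rate below the peak arrival rate and watch how deep the buffer grows and how long requests wait. Everything here is hypothetical (the tick-based model, the burst shape, the capacity numbers); it is a sketch of the idea, not a capacity-planning tool:

```python
from collections import deque

def simulate(arrivals_per_tick, capacity_per_tick):
    """Serve a bursty arrival stream with a fixed per-tick capacity cap.

    Returns (max_queue_depth, worst_wait_ticks): how deep the buffer
    got, and the longest a request waited before being served.
    """
    queue = deque()          # holds the arrival tick of each queued request
    max_depth = worst_wait = tick = 0
    while arrivals_per_tick or queue:
        n = arrivals_per_tick.pop(0) if arrivals_per_tick else 0
        queue.extend([tick] * n)
        for _ in range(min(capacity_per_tick, len(queue))):
            worst_wait = max(worst_wait, tick - queue.popleft())
        max_depth = max(max_depth, len(queue))
        tick += 1
    return max_depth, worst_wait

# Hypothetical load: average 10 req/tick, with a burst peaking at 30.
burst = [10, 10, 30, 30, 10, 10, 0, 0]
print(simulate(list(burst), 30))  # provision for peak: no queueing
print(simulate(list(burst), 15))  # cap below peak: buffering, extra latency
```

Provisioning for the peak keeps the queue empty; capping at 15 still serves every request, at the cost of a deeper buffer and longer waits. Whether that trade is acceptable is exactly what the latency SLA decides.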

There were some other points made with respect to what should be part of a scalability definition, among others by Gideon Low in the ServerSide thread (I tried to link to his individual response but seem to fail), who makes some good points:

  • Operationally efficient – It takes less human resources to manage the system as the number of hardware resources scales up.
  • Resilient – Increasing the number of resources will also increase the probability of failure of one of those resources, but the impact of such a failure should be reduced as the number of resources grows.

These two points combined with a discussion about cost/capacity/efficiency should be part of a definition of a scalable service. I’ll be thinking a bit about what the right wording should be and will post a proposal later.
