Big Data Part III: Performance, Predictability, Price

This is the last in a three-part series on the processing and infrastructure of AddThis. Make sure to check out part one and part two.

In the end, our decisions rest on cost and control. For our uses, CPU cycles in the cloud are about 3x more expensive than the fully loaded cost of dedicated cycles in our own data center. Data stored in the cloud is at least 10x more expensive than in our own clusters. Even if data storage were free, we would still be paying for CPU cycles to process the data and network costs to expose it. But the real killers are still latency and IO bottlenecks. At our scale, there aren’t meaningful pricing options to overcome these in the cloud.
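To make the arithmetic concrete, here is a minimal sketch of how those two ratios combine into a single blended multiplier. The 3x and 10x figures come from the paragraph above (the 10x is a lower bound); the function name and the 70/30 spending split are hypothetical, chosen only for illustration, and network, latency and IO costs are not modeled.

```python
# Ratios quoted in the text above. Everything else here is an assumption.
CLOUD_CPU_MULTIPLIER = 3.0       # "about 3x" for CPU cycles
CLOUD_STORAGE_MULTIPLIER = 10.0  # "at least 10x" for stored data (lower bound)


def cloud_cost_multiplier(dedicated_cpu_cost, dedicated_storage_cost):
    """Return how many times more the same workload would cost in the cloud.

    Both arguments are spend on dedicated hardware, in the same currency
    and over the same period. Network, latency and IO costs are ignored.
    """
    dedicated_total = dedicated_cpu_cost + dedicated_storage_cost
    if dedicated_total <= 0:
        raise ValueError("total dedicated cost must be positive")
    cloud_total = (dedicated_cpu_cost * CLOUD_CPU_MULTIPLIER
                   + dedicated_storage_cost * CLOUD_STORAGE_MULTIPLIER)
    return cloud_total / dedicated_total


# Hypothetical split: 70 units of spend on CPU, 30 units on storage.
print(cloud_cost_multiplier(70, 30))  # (210 + 300) / 100 = 5.1
```

The blended figure always lands between 3x and 10x, and it moves toward 10x as storage takes a larger share of the budget.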

As both our business and cloud offerings have matured, we would still make the same choices around internal build-out vs shared cloud. If there is one reason that drives this decision, it’s the fact that we have a business in which our needs can be modeled and anticipated. And with the ability to predict comes the ability to optimize resources for both cost and performance. With our own build-outs, we are able to regain control and apply it to the bottom line. But achieving this requires a clear understanding of your needs, a skilled dev/ops team, a hacker mentality, and focus.

  • clouds remain king for spot capacity, typical web apps, development
  • they do not solve complex management problems or magically scale apps
  • operations at even modest scale still require skilled, creative staff
  • once a baseline of capacity is known, dedicated hardware can’t be beat

Depending on the profile of your business, cloud computing could be a game changer. Our decision to bypass the traditional cloud was game-changing for us: we are noticeably stronger, faster and larger than our competitive set. And with our developed abilities in large-scale data and processing, we are in control of our destiny, ready for what comes next.

