Nirvanix is abruptly shutting down.
Leave the gun. Take the cannoli.

Some weeks are more interesting than others – even at DDN. Last week was one of those weeks where you needed high bandwidth and lots of processing power to stay on top of industry news. (Fortunately, we’re pretty good at that.)

Firstly, there was the launch of WOS3, which was very well received by customers and thought leaders alike. That wasn’t much of a surprise given the impressive performance, efficiency and scalability numbers we have engineered into the platform for our most demanding customers. Still, it’s always rewarding when people appreciate the work of our team.

Secondly, while shows like last week’s Structure Europe are historically good launch pads for new products, solutions and even startups, there was only one company on everyone’s mind: Nirvanix. After just six years, the once-promising startup had to tell its customers to get their data back “fast”. TechTarget may have explained it best as a “worst nightmare” situation.

In my previous article, I explained why the public cloud is not the best option from a cost point of view, especially at scale. Last week, it became obvious that public cloud services are not always the best choice from a sustainability point of view either. I don’t intend to give another opinion on what went wrong and what Nirvanix should have done differently (read what Sonia Lelii wrote here, what Chris Preimesberger wrote here and what Chris Mellor wrote here). Instead, I’d much rather look at some lessons we can learn from this unfortunate event:

  • When a public cloud service implodes, two weeks is what you get to retrieve your data;
  • At that point, bandwidth suddenly becomes hyper-important;
  • The popular cloud gateway providers won’t come to the rescue;
  • Without an on-premise solution which is “always owned”, there’s a strong argument building for multi-vendor approaches to cloud services to avoid the “forever off” risk.

While Nirvanix is the most recent and certainly the most disastrous example of public cloud failure, it is not the only one. Just about a month ago, Amazon had a major interruption; earlier this year, Microsoft’s Azure was down for no less than eight hours; and even Google has had issues with its cloud services. Some will say: “Amazon, Google and Microsoft won’t disappear from one week to the other”. True, but think how much revenue a business like Netflix loses when its shows and movies, proudly stored in Amazon, are not accessible for a few hours. The point is that, at public cloud scale, external operators and the other customers of a shared service can and do add complexity around the only thing that you care about: YOUR data and YOUR applications.

But let’s get back to the actual use case and what we can learn from Nirvanix’s service closure. Customers received a communication informing them of the situation and advising them to move their data off the Nirvanix cloud within two weeks. Two weeks doesn’t sound too bad when you first read it, until you think it through. Two weeks means customers will only be able to recover the amount of data they manage to transfer off the Nirvanix cloud, to whatever alternative solution they may come up with, within those two weeks. Many of these customers don’t have spare capacity in their own datacenters, so for them this effectively means they will only be able to recover the amount of data they manage to transfer off the Nirvanix cloud to Amazon S3. Given that it typically takes a few days to make such decisions, we are probably talking more like 10 days now.

So, how much data can you ingest into S3 in that amount of time? Amazon provides very little information about its ingest speeds. Part of that is because those speeds are also heavily influenced by parameters that Amazon does not control: the application used, network latency, object sizes, object volumes, etc. Henry Baltazar of research firm Forrester calculated that “it could take up to 13 days to transfer 150TB of data [out of a public cloud] over the WAN”.

Some customers may be able to speed things up through Direct Connect or super fast network connections, but even if they could recover 10 times that amount of data, that still may not be enough for many of Nirvanix’s MSP or enterprise customers whose digital data exceeds 1.5PB.
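To make the arithmetic behind these numbers easier to check, here is a minimal back-of-the-envelope sketch in Python. It is not a model of Nirvanix, Amazon S3 or any real network: the function names and the `efficiency` parameter are hypothetical, units are decimal (1TB = 10^12 bytes, 1Gbit/s = 10^9 bits per second), and it assumes the link is saturated around the clock, which real WAN transfers rarely achieve.

```python
SECONDS_PER_DAY = 86_400


def implied_gbps(data_tb: float, days: float) -> float:
    """Sustained throughput (Gbit/s) implied by moving data_tb terabytes in `days` days."""
    bits = data_tb * 1e12 * 8
    return bits / (days * SECONDS_PER_DAY) / 1e9


def recoverable_tb(days: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Terabytes that fit through a link in `days` days.

    `efficiency` is an assumed fraction (0..1) of the nominal link speed
    that the transfer actually sustains; 1.0 means a perfectly saturated link.
    """
    bits = days * SECONDS_PER_DAY * link_gbps * 1e9 * efficiency
    return bits / 8 / 1e12


def transfer_days(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move data_tb terabytes over a link of link_gbps Gbit/s."""
    bits = data_tb * 1e12 * 8
    return bits / (link_gbps * 1e9 * efficiency) / SECONDS_PER_DAY


if __name__ == "__main__":
    # Throughput implied by the quoted "13 days to transfer 150TB".
    print(f"Implied throughput: {implied_gbps(150, 13):.2f} Gbit/s")
    # Data recoverable in a 10-day window over an assumed 1 Gbit/s link.
    print(f"Recoverable in 10 days at 1 Gbit/s: {recoverable_tb(10, 1.0):.0f} TB")
    # Time to move 1.5PB (1500TB) over an assumed 10 Gbit/s link.
    print(f"1.5PB at 10 Gbit/s: {transfer_days(1500, 10.0):.1f} days")
```

Under these assumptions, the quoted 150TB in 13 days corresponds to a sustained rate of roughly 1.07 Gbit/s. A 10-day window over a fully used 1 Gbit/s link moves about 108TB, and even a link ten times faster needs close to 14 days for 1.5PB. Lowering `efficiency` to reflect protocol overhead or contention only makes the picture worse.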

What’s also interesting in the Nirvanix case is that the cloud gateway providers can’t do a lot for their customers either. Some of these technologies provide great transfer optimization tools, but those usually just use some sort of caching with or without deduplication. This is obviously of little help when trying to get all your data out of the Nirvanix cloud. In the coverage of the event, the Nasuni CEO, Andres Rodriguez, was reported to have confirmed “there is not enough bandwidth to get the data out in time.”

In conclusion, it is probably fair to say that, from a reliability point of view, anyone storing more than 0.5PB in a public cloud is potentially playing with fire. Coincidentally, we demonstrated last week that from a cost point of view, it makes sense to invest in an on-premise storage cloud at petabyte+ scale. In addition to higher reliability and significantly better TCO, an on-premise storage cloud provides the added benefits of:

  • Downtime on your terms.
  • Scale on your own terms.
  • Data protection, behind your firewall, on your own security terms.
  • The ability to define your own SLAs.
  • Far superior performance to support both archive and performance use cases … all on your own terms.

In summary, public cloud services have a number of advantages, but these need to be weighed against your business objectives and risk tolerance. As you plan your cloud storage agenda, make sure you understand all of the terms … theirs and yours.

  • DDN Storage
  • Date: September 25, 2013