
ScaleIO @ Scale - 200 Nodes and Beyond!


Ever since my last post a couple of weeks ago about ScaleIO, I've been wanting to push its limits.  Boaz and Erez (the founders of ScaleIO) are certainly smart guys, but I'm an engineer, and whenever anyone says "It can handle hundreds of nodes," I tend to want to test that for myself.

So, I decided to do exactly that.  Now, my home lab doesn't have room for more than a half dozen VMs.  My EMC lab could probably support about 50-60.  I was going for more - WAY more.  I wanted hundreds, maybe thousands.  Even EMC's internal cloud didn't really have the scale I wanted, as it's geared toward longer-lived workloads.

So, I ended up running against Amazon Web Services, simply because I could spin up cheap ($0.02/hr) t1.micro instances very rapidly without worrying (too much) about cost - it still ain't free.  Amazon has an excellent Python API (boto) that is easy to use.  Combine that with the paramiko SSH library and you have a pretty decent platform for deploying a bunch of software.
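The launch step looked roughly like this.  This is a minimal sketch assuming boto 2.x (the AWS SDK for Python at the time); the AMI ID and key name are placeholders, not the ones I actually used.

```python
# Hypothetical sketch of launching a batch of t1.micro instances with boto 2.x.
# The AMI id and key name below are placeholders.

def launch_params(count, ami="ami-00000000", key="scaleio-key"):
    """Build the keyword arguments for run_instances (pure, so it's testable)."""
    return dict(image_id=ami, min_count=count, max_count=count,
                instance_type="t1.micro", key_name=key)

def launch(count, region="us-east-1"):
    # Imported here so launch_params stays usable without boto installed.
    import boto.ec2
    conn = boto.ec2.connect_to_region(region)
    reservation = conn.run_instances(**launch_params(count))
    return reservation.instances
```

From there, each instance's public DNS name gets fed to paramiko for the actual software install.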

Some have asked why I didn't use the Fabric project - I didn't feel that its handling of SSH keys was quite up to par, nor was its threading model.  So rather than deal with it, I used my own thread pool implementation.
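The pool itself doesn't need to be fancy.  Here's a sketch of the queue-and-workers pattern I'm describing (modernized to Python 3 stdlib; not the exact code from my repo):

```python
# A minimal fixed-size thread pool: N workers pull callables off a queue.
import threading
from queue import Queue

def run_pool(tasks, workers=32):
    """Run a list of callables concurrently; return their results (unordered)."""
    q = Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            task = q.get()
            if task is None:          # sentinel: shut this worker down
                break
            result = task()
            with lock:
                results.append(result)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for task in tasks:
        q.put(task)
    q.join()                          # wait for every task to finish
    for _ in threads:
        q.put(None)                   # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

With something like this, deploying to 200 hosts is just 200 "ssh in and install" callables dropped on the queue.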

Anyways - where did I end up?  Well, I found that about 5% of the deployed systems (when deploying hundreds) would simply fail to initialize properly.  Rather than investigate, I just treated them as cattle: shot them in the head and replaced them.  After all the nodes were built and joined to the cluster, I created 2 x 200GB volumes and exported them back out to all the nodes.  Lastly, I ran a workload generator on them to drive some decent IO.
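The "cattle" pass boils down to a health probe and a partition.  A sketch of the idea (the health check and node objects here are stand-ins, not my actual script):

```python
# Hypothetical replacement pass: probe each node, keep the healthy ones,
# terminate and relaunch the rest.  is_healthy is a stand-in for whatever
# probe you use (e.g. "can I ssh in and is the SDS service running?").

def partition(nodes, is_healthy):
    """Split nodes into (healthy, failed) based on a health probe."""
    healthy, failed = [], []
    for n in nodes:
        (healthy if is_healthy(n) else failed).append(n)
    return healthy, failed

def replace_failed(nodes, is_healthy, terminate, launch):
    """One pass: shoot the failures in the head and launch replacements."""
    healthy, failed = partition(nodes, is_healthy)
    if failed:
        terminate(failed)
        healthy.extend(launch(len(failed)))
    return healthy
```

Loop that until the failed list comes back empty and you have a full, healthy fleet.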

I ended up being able to shove 200 hosts into a cluster before the MDM on ScaleIO 1.1 refused to let me add any more. I haven't identified yet whether that is actually the limit, nor have I tried with ScaleIO 1.2 yet.  But - you can bet it's next on my list!

What does it all look like?

Here are the nodes in the Amazon Web Services Console...

Screen Shot 2013-11-07 at 11.44.45 AM

And then they've all been added to the cluster:

Screen Shot 2013-11-07 at 11.39.26 AMThen, I ran some heavy workload against it.  Caveat: Amazon t1.micro instances are VERY small, and limited to less than 7MB/s throughput each, along with about half a CPU and only about 600MB RAM.  As a result, they do not reasonable represent the performance of a modern machine.  So don't take these numbers as what ScaleIO is capable of - I'll have a post in the next couple weeks demonstrating what it can do on some high powered instances.

Screen Shot 2013-11-06 at 9.36.01 PM


Pushing over 1.1GB/s of throughput (and yes, that's gigabytes per second, so over 10Gbit/s of total throughput) across almost 200 instances.

Screen Shot 2013-11-07 at 11.39.39 AM


The individual host view also shows some interesting info, although I did notice a bug: if you have more than a couple dozen hosts, they won't all show up individually in the monitoring GUI.  Oh well - that's why we do tests like this.

Lastly, when I terminated all the instances simultaneously (with one command, even!), I caught a pic of the very unhappy MDM status:
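That "one command" teardown is easy with boto 2.x.  A hedged sketch (the batching helper is mine; the `terminate_instances` call is the real boto 2.x API):

```python
# Hypothetical teardown: terminate every instance id in batches.

def chunks(ids, n=100):
    """Split a list of instance ids into batches of at most n."""
    return [ids[i:i + n] for i in range(0, len(ids), n)]

def terminate_all(region, ids):
    # Imported here so the pure helper above stays usable without boto.
    import boto.ec2
    conn = boto.ec2.connect_to_region(region)
    for batch in chunks(ids):
        conn.terminate_instances(instance_ids=batch)
```

Two hundred SDS nodes vanishing at once is not a failure mode ScaleIO expects to see often, hence the very red dashboard.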

Screen Shot 2013-11-07 at 11.43.20 AM


How much did this cost?  Well, excluding the development time and associated test instance costs... to run the test alone required 200 t1.micro instances @ $0.02/hr, 1 m1.small instance @ $0.06/hr, and 201 x 10GB EBS volumes @ $0.10/GB-month.  In total?  About $7.41 :).  Although if I add in the last couple weeks' worth of development instances, I'm at about $41.
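The back-of-the-envelope math, using the prices above (the ~730 hours/month figure for prorating EBS is my assumption):

```python
# Rough hourly cost of the test fleet, from the prices quoted in the post.
INSTANCE_HOURLY = 200 * 0.02 + 1 * 0.06   # 200 t1.micro + 1 m1.small MDM
EBS_MONTHLY = 201 * 10 * 0.10             # 201 x 10GB volumes @ $0.10/GB-month
EBS_HOURLY = EBS_MONTHLY / 730.0          # assuming ~730 hours in a month

print(round(INSTANCE_HOURLY, 2))          # compute cost per hour
print(round(INSTANCE_HOURLY + EBS_HOURLY, 2))
```

At roughly $4.34/hr all-in, a couple hours of testing lands right around that $7-8 mark.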

Screen Shot 2013-11-07 at 12.20.33 PM


Maybe @sakacc (Chad Sakac) will comp me $50 worth of drinks at EMCworld?

Lastly, you can find all the code I used to drive this test at my GitHub page.  Note: it's not very clean, has little documentation, and has very little error handling.  Nonetheless, it's helpful if you want some examples of using EC2, S3, thread pooling, etc.

I'll have 3-5 more posts over the next week or two describing each of the stages in more depth (building the MDM, building the nodes, adding them to the cluster, and running the workload generator) for the huge nerds, but for now - enjoy!