Where did all the IOPs go…?

It is an undisputed fact that HyperConverged Infrastructure (HCI) is a truly disruptive technology that has significantly changed the compute landscape. Initially focused on the storage layer, it enabled early adopters to free themselves from the storage array vendor lock-in that was becoming so prevalent. One of its major benefits is that it allows you to scale your infrastructure using commodity, off-the-shelf systems without the inherent scaling issues that are common in traditional virtualised infrastructure built on disk arrays.

So what could possibly be wrong with this picture…?

Well, I hate to be a naysayer; I’m generally an incredible optimist, to a fault, but on this point I feel I must highlight a rather inconvenient truth: the one rather large missing ingredient, my fellow HCI’ers, is performance. For a long time, I fear, the industry has been hypnotised into believing that you can have flexibility or you can have performance, but you can’t have both. It only struck me recently exactly how much time and money has been spent on producing materials to ‘distract’ people away from this inconvenient truth, something akin to a Jedi mind trick: “you don’t really care about IOPs, your workload can’t even consume them anyway; look, even though you want to buy fast hardware, we’ve done hours and hours of workload profiling to explain why you don’t need more performance”. My friends, search your true feelings… trust your instincts… if it looks like it and it smells like it, then yes, it most probably is!

But joking aside… the real question we should be asking is why anyone would spend time, effort and money on diverting attention away from the problem rather than just fixing it?! Well, it took us many years to find the true answer… and it’s ugly. You can’t just tweak a few things here and there to make things perform properly; it turns out that full internal surgery is the only way to go. We’ve done it, and trust me when I say this, it was really hard! It took four years of dedicated work, rewriting the stack from the ground up. Just like building a performance car, you start with the easy stuff: the body aerodynamics, the suspension, a tail fin, modified wheels, before you have to accept the truth that the gearbox actually needs a complete overhaul and, while you’re at it, you might as well rebuild the entire engine. The result is something rather different from what you started with… but oh what a result! It is therefore quite understandable that spending money on distraction is much more appealing than changing everything. It is a cheaper and much less risky fix to the problem, but unfortunately it can only work for so long.

So how bad is it really?

Well, where do we begin? The problems lie in many areas. Firstly, access latencies for NVMe flash storage are getting better and better (i.e. lower). On the current generation of Intel Optane 3D XPoint technology you can read a 4K block in under 10 microseconds, and on many storage drives on the market today, from a variety of vendors, you can achieve between 800K and 1 million IOs per second. Let’s let the enormity of that sink in for a second… 5 years ago we were talking about deploying disk arrays of 32 or 64 SAS drives in order to achieve this sort of performance! The access latency at that sort of performance was one, maybe even two, orders of magnitude larger than NVMe flash drives today, with all the unpredictability of mechanical head access latency depending on where on the drive you were physically reading data from. I can guarantee that when I was part of the Xen team building the first open source hypervisor platform (still in use today, incidentally, on a majority of the public cloud infrastructure, e.g. AWS and Oracle), high performance storage and networking were far less of a concern than virtualised CPU core efficiency. In fact, the storage and network devices were always a massive bottleneck in the whole system. Fast forward 15 years and we see a very different picture: driving even a single NVMe drive or a single 100 Gbit Ethernet NIC at maximum line rate suddenly requires multiple cores and ultra-efficient NUMA-aware scheduling.
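To put that parallelism point in concrete terms, here is a back-of-the-envelope sketch of why a single core submitting one I/O at a time can never get near a modern drive’s rated figure. The numbers are illustrative assumptions taken from the figures quoted above, not measurements:

```python
# Back-of-the-envelope: Little's Law says throughput = concurrency / latency.
# Figures below are illustrative assumptions based on the numbers quoted above.

read_latency_s = 10e-6          # ~10 microseconds per 4K read (Optane-class media)
drive_rated_iops = 1_000_000    # ~0.8-1M IOPS quoted for current NVMe drives

# One outstanding I/O at a time (queue depth 1) on one core:
qd1_iops = 1 / read_latency_s
print(f"QD1 ceiling: {qd1_iops:,.0f} IOPS")          # ~100,000 IOPS

# Concurrency needed to hit the drive's rated figure:
needed_in_flight = drive_rated_iops * read_latency_s
print(f"Outstanding I/Os needed: {needed_in_flight:,.0f}")  # ~10 in flight, continuously
```

Keeping roughly ten I/Os permanently in flight per drive, across several drives plus a 100 Gbit NIC, is exactly what pushes the host software towards multiple submission cores and NUMA-aware queueing.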

So how are those 15-year-old virtualisation architectures holding up?

Well, we’ve been running a series of tests in order to compare our supercharged V12 racing car against the classics, and, rather (un?)surprisingly, they perform like they did 15 years ago. To keep things fair we compared against bare-metal performance of the latest Ubuntu Linux stack on identical hardware. We measured the basic raw performance of the hardware and, as expected, achieved something very close to the manufacturer’s published specs for the NVMe storage drives. The configuration was a basic one: 2 Intel P4610 NVMe drives per blade, measured across a minimum set comprising a 3-node cluster. We then measured the same performance on the Sunlight.io platform, creating VMs with at least as many virtual drives as physical drives, and then ran the exact same test on the granddaddy of them all, the big V with the latest and greatest hyperconverged storage stack… then we spent many hours scratching our heads.
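For readers who want to reproduce the bare-metal side of the comparison, something along these lines is what we mean by “basic raw performance”. This is a minimal sketch, not our exact harness: it assumes fio is installed, and the device path, queue depth and job count are placeholders you would tune to your own hardware.

```python
# Minimal sketch of a bare-metal 4K random-read measurement using fio.
# Assumes fio is installed and /dev/nvme0n1 is a scratch NVMe device (read-only workload).
# Queue depth and job count are illustrative, not the exact values from our tests.
import json
import subprocess

cmd = [
    "fio",
    "--name=randread-4k",
    "--filename=/dev/nvme0n1",   # assumption: adjust to your NVMe device
    "--rw=randread",
    "--bs=4k",
    "--direct=1",                # bypass the page cache
    "--ioengine=libaio",
    "--iodepth=32",              # keep plenty of I/Os in flight per job
    "--numjobs=4",               # several submitting cores
    "--runtime=60",
    "--time_based",
    "--group_reporting",
    "--output-format=json",
]

result = subprocess.run(cmd, capture_output=True, text=True, check=True)
job = json.loads(result.stdout)["jobs"][0]
print(f"4K random read: {job['read']['iops']:,.0f} IOPS")
```

Run per drive (and then per VM on the virtualised platforms) and sum the results to get the aggregate figures discussed below.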

Time for a little basic mathematics…
So, the total aggregate IOPs of the 3-node cluster is around 4.5 million. On the Sunlight measurement we were able to scale up to that 4.5 million mark pretty effortlessly, using just the default configuration with no special ‘tinkering’ under the hood, with 18 VMs (i.e. 6 VMs per blade). The other comparison “platform” just flatlined at around 400K, regardless of how many VMs were added and on which nodes. That right there, my friends, is an order of magnitude off, or in real terms, only around 9% of the total.
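Spelling that arithmetic out (the per-drive figure here is backed out from the measured cluster total rather than taken from a spec sheet):

```python
# Sanity-check the cluster arithmetic from the paragraph above.
nodes = 3
drives_per_node = 2                    # Intel P4610 NVMe drives per blade
cluster_total_iops = 4_500_000         # measured aggregate (bare metal and Sunlight)

per_drive_iops = cluster_total_iops / (nodes * drives_per_node)
print(f"Implied per-drive IOPS: {per_drive_iops:,.0f}")   # ~750K, close to the 800K-1M quoted earlier

flatline_iops = 400_000                # where the comparison platform topped out
print(f"Share of the hardware actually delivered: {flatline_iops / cluster_total_iops:.0%}")  # ~9%
```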

Then we came across another completely independent (and somewhat less polite!) analysis:
http://www.cultofanarchy.org/vmware-vsan-is-great-in-messing-things-up-re-investigating-vmware-vsan-4-node-cluster-performance/
And when it was followed soon afterwards by a similar story from another independent source that had conducted its own very similar measurements, we realised we were not alone.

The verdict:
This is the first of many discussions on the topic as we try to uncover the truth about what is really going on out there. I know for a fact that HCI technology does not need to perform so badly, and in fact we’ve built a product to prove it! If you haven’t done so already, come and give it a try and let us know what you think. We’ve got some pretty neat pricing models as well that make this ultra affordable, and of course the performance efficiency is just years ahead.

This is a brave new world, my friends. As Moore’s law officially dies and processor performance fails to continue scaling, we look to replace it with faster storage, memory and network convergence; the challenge to all of us in the software world is to bridge the widening gap to the innovations at the hardware level and achieve orders-of-magnitude faster performance in network and storage access, in both latency and bandwidth. One thing is clear: the established products in the market are not going to cut the mustard, and in fact they are getting further away from reality with every new hardware release. We need new solutions that are built to scale.