Hyper-Converged Cynicism

Or “How I’ve come to love my vSAN Ready Nodes”

I’ll admit it, some years ago I was very cynical about HyperConverged Infrastructure (HCI). Outside of VDI workloads I couldn’t see how it would fit in my environment - and this was all down to the scaling model.

With the building-block architecture of HCI; storage, compute, and memory are all expanded in a linear fashion. Adding an extra host to the cluster to expand the storage capacity also increases the available memory and CPU in the pool of resources. But my workloads were varied, one day we might get a new storage-intensive application, the next week it might be one which is memory intensive. I was used to independently expanding the storage through a SAN and just the compute/memory side through the servers and didn’t want to be either running up against a capacity wall or purchasing unnecessary compute just to cater for storage demands.

This opinion changed when my own HCI journey started in 2017 with the purchase of a VMware vSAN cluster built on Dell Ready Nodes. Whist I’ll be writing about that particular technology here, the principles apply to other HCI infrastructures.

If the problem of HCI could is scaling, the solution is scale. These imbalances in load and growth balance out once a number of VMs are on the system- and this scale doesn’t have to be massive, even from the 4-host starting point of a vSAN cluster I found that when the time came to install node 5, the demands on storage and memory were roughly matched to the relevant capacities of the new node.

The original hosts need to be sized correctly, but unless you’re starting in a totally greenfield environment then you will have existing hosts and storage to interrogate and establish a baseline on current usage requirements. Use these figures, allow appropriate headroom for growth, and then add a bit more (particularly when considering the storage) to prevent the new infrastructure from running near capacity. Remember you are trading a certain level of efficiency for resilience - the cluster needs to be able to withstand at least one host loss and still have plenty of capacity for manoeuvre.

If you are going down the vSAN route, I can thoroughly recommend the ReadyNode option. Knowing that hardware will arrive and just work with the software-defined storage layer without spending hours digging in the Hardware Compatibility Lists was a great time saver, and we’re confident that we can turn round to our vendors and say “this didn’t work” without getting told “it’s because you’ve got disk controller chipset X and that’s not compatible with driver Y on version Z”. There’s a reason I named this blog “IT Should Just Work”.

:right
When expanding the cluster I consider best practice to be to expand with hosts of as similar configuration as possible to the original. If larger nodes are added (for example, storage/memory/CPU is now cheaper/bigger/faster) then these can create a performance imbalance in the cluster. For example a process running on host A might get access to a 2.2GHz CPU, but run the same process on host B with a 3GHz CPU and it will finish slower. Also worth considering is what happens when a host fails, or is taken into maintenance mode for patching. If this host is larger than it’s compatriots then (without very careful planning and capacity management) there might not be sufficient capacity on the remaining hosts to keep the workloads running smoothly.

It is possible in vSAN to add “storage-only” nodes, reducing the memory and possibly going single-socket (this saves on your license cost too!) and then using DRS rules to keep VMs off the host. Likewise “compute-only” nodes are possible, where the host doesn’t contribute any storage to the cluster. Whilst there are probably specific use-cases for both these types of nodes, the vast majority of the time I believe them to be best avoided. Without very careful consideration of workloads and operational practices these could easily land you in hot water.

So, I’m a convert. Two years down the line here and HCI is the on-premises infrastructure I’d recommend to anyone who asks. And those clouds gathering on the horizon? Well, if you migrate to VMware Cloud on AWS then you’re going to be running vSAN HCI there too!