Archive for 29th June 2015

Tech Startup Spotlight – Hedvig


After posting this comment last week, I thought it might be worth following up with a quick post. I’ll be honest and say that until Friday I hadn’t actually heard of Hedvig, but I was invited along by the folks at Tech Field Day to attend a Webex with this up and coming distributed storage company, who have recently raised $18 million in their Series B funding round, having only come out of stealth in March 2015.

Hedvig are a “Software Defined Storage” company, but in their own words they are not YASS (Yet Another Storage Solution). Their new solution has been in development for a number of years by their founder and CEO Avinash Lakshman; the guy who invented Cassandra at Facebook as well as Amazon Dynamo, so a chap who knows about designing distributed systems! It’s based around a software only distributed storage architecture, which supports both hyper-converged and traditional infrastructure models.

It’s still pretty early days, but apparently has been tested to up to 1000 nodes in a single cluster, with about 20 Petabytes, so it would appear to definitely be reasonably scalable! 🙂 It’s also elastic, as it is designed to be able to shrink by evacuating nodes, as well as add more. When you get to those kind of scales, power can become a major part to your cost to serve, so it’s interesting to note that both x86 and ARM hardware are supported in the initial release, though none of their customers are actually using the latter as yet.

In terms of features and functionality, so far it appears to have all the usual gubbins such as thin provisioning, compression, global deduplication, multi-site replication with up to 6 copies, etc; all included within the standard price. There is no specific HCL from a hardware support perspective, which in some ways could be good as it’s flexible, but in others it risks being a thorn in their side for future support. They will provide recommendations during the sales cycle though (e.g. 20 cores / 64GB RAM, 2 SSDs for journalling and metadata per node), but ultimately it’s the customer’s choice on what they run. Multiple hypervisors are supported, though I saw no mention of VAAI support just yet.

The software supports auto-tiering via two methods, with hot blocks being moved on demand, and a 24/7 background housekeeping process which reshuffles storage at non-busy times. All of this is fully automated with no need for admin input (something which many admins will love, and others will probably freak out about!). This is driven by their philosophy or requiring as little human intervention as possible. A noteworthy goal in light of the modern IT trend of individuals often being responsible for concurrently managing significantly more infrastructure than our technical forefathers! (See Cats vs Chickens).

Where things start to get interesting though is when it comes to the file system itself. It seems that the software can present block, file and object storage, but the underlying file system is actually based on key-value pairs. (Looks like Jeff Layton wasn’t too far off with this article from 2014) They didn’t go into a great deal of detail on the subject, but their architecture overview says:

“The Hedvig Storage Service operates as an optimized key value store and is responsible for writing data directly to the storage media. It captures all random writes into the system, sequentially ordering them into a log structured format that flushes sequential writes to disk.”

Supported Access Protocols
Block – iSCSI and Cinder
File – NFS (SMB coming in future release)
Object – S3 or SWIFT APIs

Working for a service provider, my first thought is generally a version of “Can I multi-tenant it securely, whilst ensuring consistent performance for all tenants?”. Neither multi-tenancy of the file access protocols (e.g. attaching the array to multiple domains for different security domains per volume) nor storage performance QoS are currently possible as yet, however I understand that Hedvig are looking at these in their roadmap.

So, a few thoughts to close… Well they definitely seem to be a really interesting storage company, and I’m fascinated to find out more as to how their key-value filesystem works in detail.  I’d suggest they’re not quite there yet from a service provider perspective, but for private clouds in the the enterprise market, mixed hypervisor environments, and big data analytics, they definitely have something interesting to bring to the table. I’ll certainly be keeping my eye on them in the future.

For those wanting to find out a bit more, they have an architectural white paper and datasheet on their website.

Assigning vCenter Permissions and Roles for DRS Affinity Rules

Today I was looking at a permissions element for a solution. The requirement was to provide a customer with sufficient permissions to be able to configure host and virtual machine affinity / anti-affinity groups in the vCentre console themselves, without providing any more permissions than absolutely necessary.

After spending some time trawling through vCentre roles and permissions, I couldn’t immediately find the appropriate setting; certainly nothing specifically relating to DRS permissions. A bit of Googling and Twittering also yielded nothing concrete. I finally found that the key permission required to be able to allow users to create and modify affinity groups is the “Host \ Inventory \ Modify Cluster” privilege. Unfortunately the use of this permission is a bit like using a sledgehammer to crack a nut!


By providing the Modify Cluster permission, this will also provide sufficient permissions to be able to enable, Configure and disable HA, modify EVC settings, and change pretty much anything you like within DRS. all of these settings are relatively safe to modify without risking uptime (though they do present some risk in the event of unexpected downtime); what is a far more concerning is that these permissions and allow you to enable, configure and disable DPM! It doesn’t take a great deal of imagination to come up with scenario where for example a junior administrator accidentally enables DPM on your cluster, a large percentage of your estate unexpectedly shuts down overnight without the appropriate config to boot back up, and all hell breaks loose at 9am!

The next question then becomes, how do you ensure that this scenario is at least partly mitigated? Well it turns out that DPM can be controlled via vCenter Scheduled Tasks. Based on that, the potential workaround for this solution is to enable the Modify Cluster privilege for your users in question, then set a scheduled task to auto-disable DPM on a regular basis (such as hourly). This should at least minimise any risk, without necessarily eradicating it. Not ideal, but it would work. I’m not convinced as to whether this would be such a great idea for use on a critical production system. Certainly a bit of key training before letting anyone loose in vCenter, even with “limited” permissions, is always a good idea!

I have tested this in my homelab on vSphere 5.5 and it seems to work pretty well. I don’t have vSphere 6 set up in my homelab at the moment, so can’t confirm if the same configuration options are available, however it seems likely. I’ll test this again once I have upgraded my lab.

It would be great to see VMware provide more granular permissions in this area, as even basic affinity rules such as VM-VM anti-affinity are absolutely critical in many application solutions to ensure resilience and availability of services such as Active Directory, Exchange, web services, etc. To allow VM administrators achieve this, it should not be necessary to start handing out sledgehammers to all and sundry! 🙂

If anyone has any other suggested solutions or workarounds to this, I would be very interested to hear them? Fire me a message via Twitter, and I will happily update this post with any other suggested alternatives. Unfortunately due to inundation with spam, I removed the ability to post comments from my site back in 2014. sigh


%d bloggers like this: