Pure Storage – Now available in Petite

Pure Storage are probably one of the best known “All Flash” vendors in the industry today, but the one thing which has set the bar a little high for smaller organisations wanting a slice of this speedy action is the price.

Well, the good news is that for customers with smaller projects or simply smaller budgets, a Pure AFA is now within reach!

At Pure Accelerate today, along with their new FlashStack and FlashBlades, Pure announced a mini version of their ever popular M-series arrays, the FlashArray //M10.

This new array is fully featured, comes in the same form factor and with all of the same software, support, etc as the bigger models (//M20, //M50, //M70) but at a lower entry point of <$50k list including the first year of support. Not only that, but it is in-place non-disruptively upgradeable to the larger controller models later, all the way up to the //M70, so it is possible to buy in at this cheapest level and upgrade as business needs dictate later.

The only main differences between the //M10 and other Pure models are the lack of expansion ports on the controllers (you need at least an //M20 if you want to add shelves), and reduced compute / DRAM capacity.

m10.png

Specs are pretty much in line with the rest of the arrays in the range, with the //M10 coming in at 5TB / 10TB raw. Depending on your workloads, after dedupe and compression, this could be up to the stated usable capacity (12.5TB / 25TB). Mileage, as always, may vary! This is the perfect quantity for many use cases, including small to medium sized VDI environments, critical databases, etc. I suspect the //M10 may even find its way into some larger enterprises whose internal processes often dictate that every project has its own budgets and its own pool of dedicated resources!
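
As a rough sanity check on those figures, the stated usable capacity implies a data reduction ratio of about 2.5:1. Here is a minimal Python sketch of the arithmetic; the function name is purely illustrative, and the ratio is derived from the numbers above rather than being a figure quoted by Pure:

```python
def implied_reduction_ratio(raw_tb: float, usable_tb: float) -> float:
    """Return the dedupe + compression ratio implied by raw vs stated usable capacity."""
    return usable_tb / raw_tb


# The two //M10 configurations quoted above: (raw TB, stated usable TB)
for raw_tb, usable_tb in [(5.0, 12.5), (10.0, 25.0)]:
    ratio = implied_reduction_ratio(raw_tb, usable_tb)
    print(f"{raw_tb:g}TB raw -> {usable_tb:g}TB usable implies {ratio:.1f}:1")
```

This ignores any RAID or system overhead taken out of the raw capacity, so the reduction your data actually needs to achieve would be somewhat higher.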

Lastly, and possibly most importantly to small businesses who may not have full time staff dedicated to managing storage, Pure’s monitoring and upgrade services are all included as well, via Pure1.

I think this is a positive step for the company and will help to engage with their customer base earlier in the organisational lifecycle, and when combined with their unique and very sticky Evergreen Storage offering, it will enable them to keep customers for life!

Disclaimer/Disclosure: My accommodation, meals and event entry to Pure Accelerate were provided by Pure Storage, and my flights were provided by Tech Field Day, but there was no expectation or request for me to write about any of the products or services and I was not compensated in any way for my time at the event.

Pure Storage Diversity – Time for the All Flash Vendor to go All File

It was only a couple of weeks ago I was saying to some colleagues that now Pure have finished with the whole IPO business, I thought they needed to diversify their portfolio a bit beyond the straightforward AFA.

I am very pleased to say they must have pre-read my mind and that’s exactly what they’ve announced today! 🙂

Not only is their new Pure FlashBlade platform designed to provide pretty much every file type (and object) you might require for your applications and users, it is also Scale Out, which is a key feature I am looking for more and more these days when researching new products for my customers.

FlashBlade.png

Not only is this a really interesting change in direction for Pure, but I see it as a pretty nifty bit of kit in and of itself! You would hope so, as Pure have been working on it in secret for the past two and a half years… 😮

For starters, Pure have mixed and matched both Intel and ARM chips on every single blade, with different computational tasks being assigned to different chips, and a bit of FPGA technology thrown in for good measure. The FPGA is primarily used as a programmable data mover between the different elements of the blade, so as future flash technology becomes available, the FPGA can simply be re-coded instead of requiring total redesign / replacement with every generation. This will enable Pure to change out their flash as often as every 6 months in their production plants, taking maximum advantage of the falling prices in the NAND market.

The chip design embeds the ARM processors and links them to the FPGAs, which effectively gives you a software overlay / management function, along with somewhere to run other low intensity, multi-threaded processes. The significant computational power of the Intel chips, particularly for single threaded workloads, rounds out the compute. From a nerdy technologist’s standpoint, all I can say is schweeeet!

The numbers they are suggesting are pretty impressive too! Each 4u appliance is capable of scaling out linearly with the following stats:

  • Up to 15x 8TB or 52TB blades, for a maximum of 1.6PB per 4u chassis
  • Up to 15GB/sec throughput per chassis, though I believe this is 4K 100% read, and real numbers might be around 1/3 of this.
  • 40Gbps Ethernet out, with 2x 10Gbps per blade, connected to a Broadcom based, custom, resilient backplane / switch layer within each chassis. Scaling to multiple chassis would require you to provide ToR switch ports for east-west traffic between chassis.
  • Overlaying this is Pure’s custom SDN code, which securely separates internal and external traffic, and uses multicast for auto-discovery of new devices.
  • Integrated DRAM and NV-RAM on every blade, along with PCIe access to the NAND.
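
To put some of those headline numbers in context, here is a quick back-of-the-envelope sketch in Python. The constant and function names are my own, and the one-third factor is just my guess from the list above, not a measured figure. Note that 15 x 52TB comes to 780TB raw, so I assume the 1.6PB maximum is an effective figure after data reduction.

```python
BLADES_PER_CHASSIS = 15
BLADE_SIZES_TB = (8, 52)
PEAK_THROUGHPUT_GB_PER_SEC = 15.0  # quoted best case per chassis


def raw_capacity_tb(blade_tb: int, blades: int = BLADES_PER_CHASSIS) -> int:
    """Raw flash in a chassis populated with identical blades."""
    return blade_tb * blades


def estimated_real_throughput(peak: float, fraction: float = 1 / 3) -> float:
    """Scale the quoted peak by a guessed real-world fraction (an assumption)."""
    return peak * fraction


for size_tb in BLADE_SIZES_TB:
    print(f"15 x {size_tb}TB blades = {raw_capacity_tb(size_tb)}TB raw")

real = estimated_real_throughput(PEAK_THROUGHPUT_GB_PER_SEC)
print(f"Guesstimated real-world throughput: {real:.0f}GB/sec")
```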

The blades themselves look something like this:

blade.png

In terms of protocols, it will support NFSv3 out of the box on GA, with SMB and object storage over S3 supported shortly afterward. My understanding is that initial S3 support will be limited to basic commands (PUT, GET, etc), with more advanced feature support in the pipeline. The initial release seems to be primarily targeted at the filer market, with object being the underlying architecture, but not the main event. As this support is built out later, the object offering could become more compelling.

The data itself is distributed and protected through the use of N+2 erasure coding, using however many blades are in the chassis. For example an 8 blade system would be configured as EC 6+2. As the number of blades in the system increases, garbage collection cycles are used to redistribute data to the new capacity, though I am still not 100% sure how this will work when your existing blades are almost full. The compute within each blade, however, acts independently of the storage and can access data resources across the chassis, so from the moment the additional blade is added, you have immediate access to the compute capacity for data processing.
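
The N+2 layout is easy to reason about with a few lines of Python. This is only a sketch of the arithmetic, assuming (as described above) that the stripe spans every blade in the chassis; the function name is hypothetical and this says nothing about how Pure actually implement it.

```python
PARITY_BLADES = 2  # the "+2" in N+2


def ec_layout(total_blades: int, parity: int = PARITY_BLADES):
    """Return (data blades, parity blades, usable fraction) for one stripe."""
    if total_blades <= parity:
        raise ValueError("need more blades than parity units")
    data = total_blades - parity
    return data, parity, data / total_blades


for blades in (8, 15):
    data, parity, usable = ec_layout(blades)
    print(f"{blades} blades -> EC {data}+{parity}, {usable:.0%} of raw usable")
```

The wider the stripe, the smaller the share of raw capacity given over to parity, which is one reason adding blades to a chassis is attractive.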

My only query on this would be why Pure did not offer the ability to choose between Erasure Coding, which is ideal for lower performance requirements, and replicas, which would be handier for very low latency use cases? If you are putting it all on flash in the first place, instead of a hybrid model, there may be times when you want to keep that latency as low as possible.

The software platform they have designed to manage this is called Elasticity, and to reduce the need to learn yet another interface, it looks very similar to the existing Pure management interfaces:

elasticity.png

A metadata engine with search functionality will be coming later, which will allow you to gain insights into the types of data you hold, and may potentially be able to delve into the content of that data to find things such as social security numbers, etc. There are few details available on this at the time of writing.

As with the other Pure platforms, telemetry data is sent back to base on a regular basis, and Pure take care of all of the proactive maintenance and alerting for you. All of this data is presented through their Pure1 portal, which is pretty fully featured and intuitive.

I have to say I am genuinely surprised to see Pure come out with a solution with such completely bespoke hardware, when the entire industry is going in the direction of commodity + software, but the end result looks really promising. The sooner they can get SMB (not CIFS!) into the product the better, as this will allow them to begin competing properly with the likes of NetApp on the filer front.

As with many new products we tend to see on the market, the data services are not all there yet, but at the rate Pure do code releases, I don’t imagine it will be long before many of those RFP check boxes will be getting checked!

GA is expected during the second half of 2016.

Disclaimer/Disclosure: My accommodation, meals and event entry to Pure Accelerate were provided by Pure Storage, and my flights were provided by Tech Field Day, but there was no expectation or request for me to write about any of the products or services and I was not compensated in any way for my time at the event.

Secondary can be just as important as Primary

There can be little doubt these days that the future of the storage industry for primary transactional workloads is All Flash. Finito, that ship has sailed, the door is closed, the game is over, [Insert your preferred analogy here].

Now I can talk about the awesomeness of All Flash until the cows come home, but the truth is that flash is not now, and may never be, as inexpensive for bulk storage as spinning rust! I say “may” as technologies like 3D NAND are changing the economics for flash systems. Either way, I think it will still be a long time before an 8TB flash device is cheaper than 8TB of spindle. This is especially true for storing content which does not easily dedupe or compress, such as the two key types of unstructured data which are driving global storage capacities through the roof year on year: images and video.

With that in mind, what do we do with all of our secondary data? It is still critical to our businesses from a durability and often availability standpoint, but it doesn’t usually have the same performance characteristics as primary storage. Typically it’s also the data which consumes the vast majority of our capacity!

AFA Backups

Accounting needs to hold onto at least 7 years of their data, nobody in the world ever really deletes emails these days (whether you realise or not, your sysadmin is probably archiving all of yours in case you do something naughty, tut tut!), and woe betide you if you try to delete any of the old marketing content which has been filling up your arrays for years! A number of my customers are also seeing this data growing at exponential rates, often far exceeding business forecasts.

Looking at the secondary storage market from my personal perspective, I would probably break it down into a few broad groups of requirements:

  • Lower performance “primary” data
  • Dev/test data
  • Backup and archive data

As planning for capacity is becoming harder, and business needs are changing almost by the day, I am definitely leaning more towards scale-out solutions for all three of these use cases nowadays. Upfront costs are reduced and I have the ability to pay as I grow, whilst increasing performance linearly with capacity. To me, this is key for any secondary storage platform.

One of the vendors we visited at SFD8, Cohesity, actually targets both of these workload types with their solution, and I believe they are a prime example of where the non-AFA part of the storage industry will move in the long term.

The company came out of stealth last summer and was founded by Mohit Aron, a rather clever chap with a background in distributed file systems. Part of the team who wrote the Google File System, he went on to co-found Nutanix as well, so his CV doesn’t read too bad at all!

Their scale-out solution utilises the now ubiquitous 2u, 4-node rack appliance physical model, with 96TB of HDDs and a quite reasonable 6TB of SSD, for which you can expect to pay an all-in price of about $80-100k after discount. It can all be managed via the console, or a REST API.
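
For a rough feel of what that means per terabyte, here is a small Python sketch. It uses only the figures quoted above and divides by the raw HDD capacity alone; the function name is my own, and real-world pricing will obviously vary by deal.

```python
HDD_CAPACITY_TB = 96  # raw HDD per 2u, 4-node appliance


def price_per_raw_tb(price_usd: float, capacity_tb: float = HDD_CAPACITY_TB) -> float:
    """All-in appliance price divided by raw HDD capacity."""
    return price_usd / capacity_tb


for price in (80_000, 100_000):
    print(f"${price:,} all-in -> ${price_per_raw_tb(price):,.0f} per raw HDD TB")
```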

Cohesity CS2000 Series

2u or not 2u? That is the question…

That stuff is all a bit blah blah blah though of course! What really interested me is that Cohesity aim to make their platform infinitely and incrementally scalable; quite a bold vision and statement indeed! They do some very clever work around distributing data across their system, whilst achieving a shared-nothing architecture with a strongly consistent (as opposed to eventually consistent), 2-phase commit file system. Performance is achieved by first caching data on the SSD tier, then de-staging this sequentially to HDD.
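
For anyone unfamiliar with the idea, a 2-phase commit can be sketched in a few lines of Python. This is a generic, textbook-style illustration with made-up class and function names, and it is certainly not Cohesity’s actual implementation: every node must vote yes in the prepare phase before any node makes the write visible.

```python
class Participant:
    """A hypothetical storage node that can stage and then commit a write."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.staged = {}
        self.committed = {}

    def prepare(self, key, value):
        """Phase 1: stage the write and vote yes (True) or no (False)."""
        if not self.healthy:
            return False
        self.staged[key] = value
        return True

    def commit(self, key):
        """Phase 2: make the staged write visible."""
        self.committed[key] = self.staged.pop(key)

    def abort(self, key):
        """Discard a staged write, if there is one."""
        self.staged.pop(key, None)


def two_phase_write(nodes, key, value):
    """Commit on every node only if all of them vote yes, else abort everywhere."""
    votes = [node.prepare(key, value) for node in nodes]
    if all(votes):
        for node in nodes:
            node.commit(key)
        return True
    for node in nodes:
        abort = node.abort
        abort(key)
    return False
```

Because nothing becomes visible until every node has agreed, readers never see a write on one node that is missing on another, which is the “strongly consistent” property mentioned above.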

I suspect the solution being infinitely scalable will be difficult to achieve, if only because you will almost certainly end up bottlenecking at the networking tier (cue boos and jeers from my wet string-loving colleagues). In reality most customers don’t need infinite as this just creates one massive fault domain. Perhaps a better aim would be to be able to scale massively, but cluster into large pods (perhaps by layer 2 domain) and be able to intelligently spread or replicate data across these fault domains for customers with extreme durability requirements?

Lastly they have a load of built-in data protection features in the initial release, including instant restore, and file level restore which is achieved by cracking open VMDKs for you and extracting the data you need. Mature features, such as SQL or Exchange object level integration, will come later.

Cohesity Architecture

As you might have guessed, Cohesity’s initial release appeared to be just that; an early release with a reasonable number of features on day one. Not yet the polished article, but plenty of potential! They have already begun to build on this with the second release of their OASIS software (Open Architecture for Scalable Intelligent Storage), and I am pleased to say that next week we get to go back and visit Cohesity at Storage Field Day 9 to discuss all of the new bells and whistles!

Watch this space! 🙂

To catch the presentations from Cohesity at SFD8, you can find them here:
http://techfieldday.com/companies/cohesity/

Further Reading
I would say that more than any other session at SFD8, the Cohesity session generated quite a bit of debate and interest among the guys. Check out some of their posts here:

Disclaimer/Disclosure: My flights, accommodation, meals, etc, at Storage Field Day 8 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors’ products or services and I was not compensated in any way for my time at the event.

AWS Certified Solutions Architect Associate Exam Study Guide & Resources

After about 5 weeks of steeping myself in the AWS ecosystem and platform, labbing like crazy, and attending a compressed AWS training course, I finally sat the AWS Certified Solutions Architect Associate exam last week and passed.

I’ve described my experience and thoughts on the exam itself here:
AWS Certified Solutions Architect Associate Exam Prep & Experience

Study Materials

In preparation for the exam, I used the following study materials:

Best of luck with your exams!!! 🙂

AWS Certified Solutions Architect Associate Exam Prep & Experience