Long Term Data Retention – What do I do?

One of the more common requirements I come across on a day to day basis working with organisations across a broad spectrum of industries is the question of how to manage long-term data retention.

Frankly, I have massively oversimplified the question, as there are many more nuances to it than this! Some of the questions, discussion points and potential solutions I see when trying to scope out and define a long-term data retention strategy are below. We assume in this case that we are talking about backing up application data, but the same can apply to file data, such as from a file server.

Long Term Data Retention – Questions, questions, questions?!

Like beautiful snowflakes, every business is unique, so ultimately it always comes back to gathering the requirements for the individual business.

What are the regulatory and compliance requirements for long-term retention of data, and what are the consequences of losing that data? In the new world, this could be pretty serious, especially with things like GDPR right around the corner. Escalating this up the business hierarchy can get buy-in from other parts of the business to provide additional budget from outside of IT, for a solution which meets the actual requirements, not just a botch job which will likely fail when put to the test.

How long is the actual data retention required? Looking at most current applications, if we are relying on being able to read back data in 7 years, current or future backup software may still work, but will we have the kit to read the tapes or data? If using spinning rust as a storage medium, do we expect to be able to migrate data from one disk system to another easily in future, and if so, how does that impact things like encryption, capacity, deduplication and compression of that data?

What is it that we are trying to protect against? Deliberate or accidental deletion, or total destruction of a server, array or DC? Or perhaps we just need to be able to prove what our data looked like at a specific date / time.

How granular does the data need to be? For example, do we need to be able to pull a file version from a specific week in the past X years? The more granular we need to get, the more expensive the solution is likely to be. If we have controls in place to protect archive data against accidental / deliberate deletion, then we may not actually need to keep more than a few days or weeks of backups (as an example).

FIM (File Integrity Monitoring) tooling can be very helpful as a control against accidental or deliberate deletion, especially for flat file structures. It can track all changes to your file system, and if something is removed or updated, it can alert your server teams to investigate why and restore it from a recent backup.
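As a rough illustration of the idea (and definitely not a replacement for a real FIM product), the sketch below hashes every file under a directory and compares the result with an earlier baseline. The function names `snapshot` and `compare` are hypothetical, and a real tool would also store the baseline somewhere tamper-proof and raise alerts rather than just printing.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files do not need to fit in memory
                for chunk in iter(lambda: f.read(1024 * 1024), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def compare(baseline, current):
    """Return the files added, removed or modified since the baseline."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    modified = sorted(
        p for p in set(baseline) & set(current) if baseline[p] != current[p]
    )
    return {"added": added, "removed": removed, "modified": modified}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "ledger.txt").write_text("original contents")
        (root / "notes.txt").write_text("keep me")
        baseline = snapshot(root)

        # Simulate an unexpected change and a deletion
        (root / "ledger.txt").write_text("tampered contents")
        (root / "notes.txt").unlink()

        print(compare(baseline, snapshot(root)))
```

Anything that turns up under "removed" or "modified" is the cue to investigate and, if needed, restore from a recent backup.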

Can the application or server prevent deliberate or accidental data deletion? If the application can be treated as, or write to, WORM storage (Write Once Read Many times), then the risk of data loss is further reduced, especially if that storage can be replicated off site. This doesn’t really help much with things like SQL databases, however!

Where is the archive data for the application or solution actually held? Is it within the live system (e.g. the live DB), or can it be exported onto a tertiary archive system where it becomes Read Only to all parties, including administrators? Even better, can the application export the data into a generic format which is more likely to be readable in 25+ years' time (such as CSV, text, etc)? This provides quite a bit more flexibility in terms of future access and recovery options.
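By way of example, here is a minimal sketch of exporting a table to CSV and recording a checksum alongside it, so the archive copy can be verified years later. It uses SQLite purely because it ships with Python; the `invoices` table, its columns and both function names are hypothetical, and your application's own export tooling (if it has any) should take precedence.

```python
import csv
import hashlib
import os
import sqlite3
import tempfile


def export_table_to_csv(conn, table, csv_path):
    """Dump every row of `table` to a CSV file with a header row; return the row count."""
    # Table names cannot be bound as SQL parameters, so check the name
    # against the schema before building the query.
    names = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    if table not in names:
        raise ValueError(f"unknown table: {table}")

    cursor = conn.execute(f'SELECT * FROM "{table}"')
    headers = [col[0] for col in cursor.description]
    rows = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        for row in cursor:
            writer.writerow(row)
            rows += 1
    return rows


def sha256_of_file(path):
    """Return the hex SHA-256 of a file, to be stored alongside the export."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer TEXT, total_pence INTEGER)"
    )
    conn.executemany(
        "INSERT INTO invoices (customer, total_pence) VALUES (?, ?)",
        [("Acme", 1050), ("Globex", 2000)],
    )
    out = os.path.join(tempfile.mkdtemp(), "invoices.csv")
    count = export_table_to_csv(conn, "invoices", out)
    print(f"exported {count} rows to {out}")
    print("sha256:", sha256_of_file(out))
```

A plain CSV plus a checksum can be read and verified by pretty much anything, which is the whole point when you have no idea what software will be around when the data is next needed.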

Does the application or server provide RBAC, and has it actually been implemented yet? If we minimise the number of people who could update or delete data (maliciously or accidentally), we minimise the risk of data loss.

What is the budget for the solution? All singing, all dancing, physical or software solutions can be great, but you may not be able to afford them.

Are we looking for an appliance-based solution which includes storage, replication, backup plugins, etc, or do you already have the HW and just need some software? This often, but not always, comes down to a time vs budget question. Do you want to spend your team’s time managing clunky backup software, or just buy an appliance which does half the work for you and is policy based?

What are your sovereignty requirements for the data, and would a cloud-based service be appropriate for your business? It can be very cheap to store data in something like S3 or blob storage, if the business accepts this and you don’t need to pull any of the data back very often (if at all).

How quickly is the data required when requested, how large is a typical access request, and how often are such requests made? If the answer can be hours or days, then an offline or cloud solution may be appropriate, but anything where immediate access is required is a different story.
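A quick back-of-envelope calculation can help frame this conversation. The sketch below is only that: the function name is hypothetical, the retrieval delay stands in for whatever lead time your offline or cloud archive tier imposes before data becomes readable, and the 80% link efficiency is an assumption you should replace with your own measurements.

```python
def estimated_restore_hours(size_gb, link_mbps, retrieval_delay_hours=0.0, efficiency=0.8):
    """Rough number of hours to pull `size_gb` of archive data over a `link_mbps` link.

    Uses decimal units (1 GB = 8,000 megabits). `efficiency` is the assumed
    fraction of the nominal link speed that is actually achieved.
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link speed > 0 and efficiency in (0, 1]")
    size_megabits = size_gb * 8 * 1000
    transfer_seconds = size_megabits / (link_mbps * efficiency)
    return retrieval_delay_hours + transfer_seconds / 3600


if __name__ == "__main__":
    # Hypothetical request: 500 GB over a 100 Mbps link, with a 4 hour
    # wait before the archive tier makes the data available.
    hours = estimated_restore_hours(500, 100, retrieval_delay_hours=4)
    print(f"approx. {hours:.1f} hours to restore")
```

If the answer comes out in days and the business expects minutes, that tells you straight away that a cold archive on its own will not meet the requirement.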

Similarly, will we want to restore or access this data in the event of a DR? Does this solution form part of our DR strategy? Perhaps it’s only required for access to much older data, because you are replicating the most recent data to a DR facility!

As we can see, there are many, many, [many!] things to think about when considering long-term retention of data in a backup or archive solution.

What brought this up, Alex?…

… I hear you ask!

I recently attended Storage Field Day 13, where we had a presentation from a backup vendor, StorageCraft, who has been in the SMB and mid-market space for many years, and it got me thinking!

The latest iteration of their backup software provides a local cache with cloud integration, and the added ability to spin up a DR environment in the event of an outage to your primary DC. A pretty nifty feature if you are legally able to store your data outside of your local environment (they currently have DCs in the US and EU only).

They can also create backups using their proprietary SPF file format, which has apparently not changed since its inception around 15 years ago. There is also no concept of a media server, as each server manages its own backups (albeit with the ability to use a central scheduler tool). This gets around the issue of backup compatibility, though it may limit their ability to provide additional data services for the backup files, such as encryption, dedupe or compression, beyond those offered by the storage targets they reside on.

This is what tickled my mental matrix into deploying my keyboard! 🙂

Want to Know More?

The session was recorded and is now available to stream online:

http://techfieldday.com/appearance/exablox-presents-at-storage-field-day-13/

Some of the other SFD13 delegates had their own thoughts on the session and StorageCraft in general. You can find them here:

Dan Frith – StorageCraft Are In Your Data Centre And In The Cloud

Scott Lowe – Backup and Recovery in the Cloud: Simplification is Actually Really Hard

Disclaimer/Disclosure: My flights, accommodation, meals, etc, at Storage Field Day 13 were provided by Tech Field Day / Gestalt IT, but there was no expectation or request for me to write about any of the vendors’ products or services and I was not compensated in any way for my time at the event.

Storage, Tech Field Day

vBlog 2017 – Top Virtualisation & Storage Blogs

I’ll keep this post about vBlog 2017 very brief as you can see my thoughts on the subject of soliciting votes for awards in my post from 2015!

It’s that time of year again when Eric Siebert of vSphere Land and vLaunchpad runs his annual Top 100 vBlog nominations!

There are a huge number of bloggers around the world producing great documentation and insight, as well as podcasters helping you pass your daily commute in a constructive and educational fashion! Eric’s awards give people the opportunity to recognise those who really stand out from the crowd, as well as more up and coming bloggers / podcasts.

vBlog 2017 sounds great! How do I vote?

I would encourage you to head over to Eric’s site and cast your votes; it only takes a few seconds of your time to show some appreciation for the time and effort put in by those ladies and gentlemen who worked tirelessly throughout the year to help make all of our jobs that little bit easier.

Of course, if you do feel like throwing a vote for the Open TechCast podcast and / or Tekhead.it, then it would of course be much appreciated! 😀

Direct link to the voting is also here:
http://topvblog2017.questionpro.com/


Podcasting, VMware, Web

Storage Field Day 13 (SFD13) – Preview

For those people who haven’t heard of Tech Field Day, it’s an awesome set of events run by the inimitable Stephen Foskett. The event enables tech vendors and real engineers / architects / bloggers (aka delegates) to sit down and have a conversation about their latest products, along with technology and industry trends.

Ever been reading up on a vendor’s website about their technology and had some questions they didn’t answer? One of the roles of the TFD delegates is to ask the questions which help viewers to understand the technology. If you tune in live, you can also post questions via Twitter to the delegates, who will happily ask them on your behalf!

As a delegate it’s an awesome experience as you get to spend several days visiting some of the biggest and newest companies in the industry, nerding out with like-minded individuals, and learning as much from the other delegates as you do from the vendors!

So with this in mind, I am very pleased to say that I will be joining the TFD crew for the fourth time in Denver, for Storage Field Day 13, from the 14th-16th of June!

As you can see from the list of vendors, there are some really interesting sessions coming up! Having previously met with Primary Data, it will be great to catch up with them and find out about how they have improved in the past couple of years. We also use quite a selection of DellEMC products at my organisation, so it will be really good to meet them and get the latest updates.

Lastly, I am particularly keen to hear from SNIA, the Storage Networking Industry Association, about future trends and some of the most cutting-edge developments in the industry.

SFD13 Sounds great! How do I tune in?

If you want to tune in live to the sessions, see the following link:
Storage Field Day 13

If for any reason you can’t make it live, have no fear! All of the videos are posted on YouTube and Vimeo within a day or so of the event.

Storage, Tech Field Day

AWS for VMware Admins – McVMUG Scottish VMUG Slide Deck

There is a certain amount of irony that the last post I did was on re-skilling, as this is the precise reason that it has taken me about 6 weeks to get around to posting this deck from our session at the McVMUG (Scottish VMUG)! I have spent all my time studying for my Microsoft Azure Architect exam (70-534)! Anyway, enough about that, I will cover it in a future post!

Last month, Chris Porter and I did a presentation at the Scottish VMUG (aka McVMUG) on AWS for VMware admins; a simple beginner’s guide with a few gotchas and tips we’ve picked up along our journeys to the public cloud.

The results of our mini survey were very similar to those of the recent London VMUG, in that most people had little or no AWS experience, but several were planning to do AWS certs in the next 12 months, though notably fewer than half this time round.

Random Fact

After the McVMUG I was fortunate enough to be able to go and spend a couple of days visiting family in my hometown of Oban. Here are a couple of cheeky snaps I managed to grab on the stunning, if grey, train journey to the West Highlands (between watching Azure study videos!). There also follows a wee pano from the hill behind my teenage home, looking out across Oban bay towards the island of Kerrera. Definitely enough to make me homesick!

McVMUG AWS Slide Deck

You can find a copy of the slide deck below:

AWS for VMware Admins v0.9 (McVMUG)

If you were unable to make the session and happen to be in or around Newcastle on 22nd June this year, Chris will be doing a solo repeat at the North East England VMUG, so make sure you register asap so you don’t miss out!

Further Reading

If you want to find out more about AWS, certification, etc, I have a load of additional resources and posts available here:

Index of Tekhead.it Blog Posts on Amazon AWS

AWS, VMUG, VMware