Tag Archive for backup

Long Term Data Retention – What do I do?

One of the more common requirements I come across on a day to day basis working with organisations across a broad spectrum of industries is the question of how to manage long-term data retention.

Frankly, I have massively oversimplified the question as there are many more nuances to it than this! Some of the questions, discussion points and potential solutions I see when trying to scope out and define a long-term data retention strategy are below. We assume in this case that we are talking about backing up application data, but the same can apply to file data, such as from a file server.

Long Term Data Retention – Questions, questions, questions?!

Like beautiful snowflakes, ultimately it always comes back to gathering the requirements for the individual business.

What are the regulatory and compliance requirements for long-term retention of data, and what are the consequences for loss of that data? In the new world, this could be pretty serious, especially with things like GDPR right around the corner. Escalating this up the business hierarchy can get buy-in from other parts of the business to provide additional budget outside of IT for a solution that meets the actual requirements, not just a botch job which will likely fail when put to the test.

How long is the actual data retention required? Looking at most current applications, if we are relying on being able to read back data in 7 years, current or future backup software may still work, but will we have the kit to read the tapes or data? If using spinning rust as a storage media, do we expect to be able to migrate data from one disk system to another easily in future, and if so, how does that impact things like encryption, capacity, deduplication and compression of that data?

What is it that we are trying to protect against? Deliberate or accidental deletion, total destruction of a server, array or DC, or perhaps we just need to be able to prove what our data looked like at a specific date / time?

How granular does the data need to be? For example do we need to be able to pull a file version from a specific week in the past X years? The more granular we need to get, potentially the more expensive. If we have controls in place to protect archive data against accidental / deliberate deletion, then we may not actually need to keep more than a few days or weeks of backups (as an example).
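To make the granularity question a bit more concrete, here is a minimal sketch of a "keep every recent backup, then thin out to one per week" policy. The function name, the parameters and the default values are purely illustrative assumptions on my part, not taken from any particular backup product.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, daily_days=14, weekly_weeks=52):
    """Return the backup dates a simple daily + weekly policy would retain.

    Hypothetical policy: keep every backup younger than `daily_days`,
    plus the newest backup in each ISO week for the last `weekly_weeks` weeks.
    """
    keep = set()
    newest_per_week = {}
    for d in sorted(backup_dates):
        age = (today - d).days
        if age < 0:
            continue  # ignore anything dated in the future
        if age < daily_days:
            keep.add(d)
        if age < weekly_weeks * 7:
            # (ISO year, ISO week) is the key; later dates overwrite earlier ones
            newest_per_week[d.isocalendar()[:2]] = d
    keep.update(newest_per_week.values())
    return sorted(keep)


if __name__ == "__main__":
    today = date(2024, 1, 31)
    nightly = [today - timedelta(days=n) for n in range(60)]
    kept = backups_to_keep(nightly, today, daily_days=7, weekly_weeks=4)
    print(f"{len(kept)} of {len(nightly)} backups retained")
```

Widening either window increases the number of restore points (and the storage bill), which is exactly the trade-off described above.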

The use of FIM (File Integrity Management) tooling can be very helpful in this regard, especially for flat file structures. They can track all changes to your file system and if something is removed or updated, you could alert your server teams to investigate why and restore it from a recent backup.
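As a rough illustration of the idea (and emphatically not a replacement for a proper FIM product), the sketch below hashes every file under a directory and compares the result with an earlier baseline. The function names `snapshot` and `compare` are my own, and the choice of SHA-256 is an assumption.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            digests[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report files removed, added or modified since the baseline was taken."""
    return {
        "removed": sorted(baseline.keys() - current.keys()),
        "added": sorted(current.keys() - baseline.keys()),
        "modified": sorted(
            name
            for name in baseline.keys() & current.keys()
            if baseline[name] != current[name]
        ),
    }
```

Anything that shows up under "removed" or "modified" unexpectedly is the cue to alert the server team and reach for a recent backup.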

Can the application or server prevent deliberate or accidental data deletion? If the application can be treated as, or write to, WORM storage (Write Once Read Many times), then the risk of data loss is further reduced, especially if that storage can be replicated off site. This doesn’t really help much with things like SQL databases, however!

Where is the archive data for the application or solution actually held? Is it within the live system (e.g. the live DB), or can it be exported onto a tertiary archive system where it becomes Read Only to all parties, including administrators? Even better, can the application export the data into a generic format, more likely to be readable in 25+ years’ time (such as CSV, text etc)? This provides quite a bit more flexibility in terms of future access and recovery options.
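By way of example, exporting a table to CSV needs very little code. The sketch below uses SQLite purely because it ships with Python; the `invoices` table and the function name are hypothetical, and a real application would use its own database driver and export tooling.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn, table, out):
    """Write every row of `table` to `out` as CSV with a header row; return the row count."""
    # The table name is interpolated into the SQL, so it must come from a
    # trusted list of names, never from user input.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoices (id INTEGER, customer TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO invoices VALUES (?, ?, ?)",
        [(1, "Acme", 120.5), (2, "Globex", 99.0)],
    )
    buffer = io.StringIO()
    export_table_to_csv(conn, "invoices", buffer)
    print(buffer.getvalue())
```

A plain text file like this can be opened by more or less anything, which is the whole point when nobody knows what software will exist when the data is next needed.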

Does the application or server provide RBAC, and has it actually been implemented yet? If we minimise the number of people who could update or delete data (maliciously or accidentally), we minimise the risk of data loss.

What is the budget for the solution? All singing, all dancing, physical or software solutions can be great, but you may not be able to afford them.

Are we looking for an appliance-based solution which includes storage, replication, backup plugins, etc, or do you already have the HW and just need some software? This often, but not always, comes down to a time vs budget question. Do you want to spend your team’s time managing clunky backup software, or just buy an appliance which does half the work for you and is policy based?

What are your sovereignty requirements for the data, and would a cloud-based service be appropriate for your business? It can be very cheap to store data in something like S3 or blob storage, if the business accepts this and you don’t need to pull any of the data back very often (if at all).

How quickly is the data required when requested, how large is a typical access request, and how often are such requests made? If this can be hours or days, then an offline or cloud solution may be appropriate, but anything where immediate access is required is a different story.
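A quick back-of-the-envelope calculation helps here. The sketch below estimates how long it takes to pull a given amount of data back over a link; the 80% efficiency factor is an assumption of mine to allow for protocol overhead and contention, so adjust it to suit your own environment.

```python
def transfer_hours(size_gb, link_mbps, efficiency=0.8):
    """Hours to move `size_gb` (decimal gigabytes) over a `link_mbps` megabit/s link."""
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link speed > 0 and efficiency in (0, 1]")
    megabits = size_gb * 8 * 1000          # GB -> megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    for size in (50, 500, 5000):
        print(f"{size} GB over 100 Mbps: {transfer_hours(size, 100):.1f} hours")
```

If the answer comes out in days and the business expects minutes, a cloud-only or offline archive is probably not the right fit for that data set.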

Similarly, will we want to restore or access this data in the event of a DR; that is, does this solution form part of our DR strategy? Perhaps it’s only required for access to much older data because you are replicating the most recent data to a DR facility!

As we can see, there are many, many, [many!] things to think about when considering long-term retention of data in a backup or archive solution.

What brought this up Alex?…

… I hear you ask!

I recently attended Storage Field Day 13, where we had a presentation from a backup vendor, StorageCraft, who has been in the SMB and mid-market space for many years, and it got me thinking!

The latest iteration of their backup software provides a local cache with cloud integration, and the added ability to spin up a DR environment in the event of an outage to your primary DC. A pretty nifty feature if you are legally able to store your data outside of your local environment (they currently have DCs in the US and EU only).

They can also create backups using their proprietary SPF file format, which has apparently not changed since its inception around 15 years ago. There is also no concept of a media server, as each server manages its own backups (albeit with the ability to use a central scheduler tool). This gets around the issue of backup compatibility, though may limit their ability to provide additional data services for the backup files, such as encryption, dedupe or compression, outside that of the storage targets they reside on.

This is what tickled my mental matrix into deploying my keyboard! 🙂

Want to Know More?

The session was recorded and is now available to stream online:

StorageCraft Presents at Storage Field Day 13

Some of the other SFD13 delegates had their own thoughts on the session and StorageCraft in general. You can find them here:

Dan Frith – StorageCraft Are In Your Data Centre And In The Cloud

Scott Lowe – Backup and Recovery in the Cloud: Simplification is Actually Really Hard

Disclaimer/Disclosure: My flights, accommodation, meals, etc, at Storage Field Day 13 were provided by Tech Field Day / Gestalt IT, but there was no expectation or request for me to write about any of the vendors’ products or services and I was not compensated in any way for my time at the event.

Scale-Out Doesn’t Just Mean Applications

Scale Out

A couple of months ago I wrote a post entitled Scale-Out. Distributed. Whatever the Name, it’s the Future of Computing.

Taking the concept a step further, I recently started thinking about other elements in IT which are moving in that direction; not just applications and storage, but underlying infrastructure and management elements too.

Then it dawned on me that this really is not a new thing… we’ve been taking this approach for years! Technologies like VMware vSphere have enabled us to become trusting, almost presumptuous, that we can add resources as we need them; increasing the shared pool transparently and enabling us to continue to service requirements, whilst eliminating downtime. (You can even use them to scale up on-the-fly if you really have to!)

The current breed of infrastructure engineers and startups have grown up in this era and the great thing is that this has now become part of their DNA! Typically, no longer are solutions designed from scratch to be scale-up in nature; hitting some artificial limit in capacity or having to scale specific elements of a solution to avoid nasty bottlenecks.

Instead, infrastructure is being designed to scale-out natively; distributed architectures, balancing workloads and metadata evenly across platforms. This has the added benefit, of course, of making them more resilient to failure of individual components.

Backup isn’t Sexy, but it’s Necessary

One great example of this new architecture paradigm (drink!), is Rubrik, a startup in the backup space who we met at Tech Field Day 12. Their home-grown distributed file system, distributed metadata, built in off-site replication and global namespace, provide a massively scalable and resilient backup system.

All of the roles from a traditional backup solution (such as backup proxies/media servers/metadata servers, etc) are now rolled into a single, scale-out platform. As I seem to find myself saying more and more often these days, KISS personified!

With shrinking IT teams, I commonly find that companies are willing to trade budget for time savings. Utilising a simple, policy-driven management interface and enabling off-site replication to be done over-the-wire, has a lot of benefits to operational time!

As an added bonus, it can even replicate out to S3, Blob and NFS targets, to give even more options for off-site replication. Of course, a big fat pipe to the internet will cost you more each month; though you’re probably investing in that anyway, to meet your employees’ peak lunchtime demand for Facebook and YouTube! 🙂

Much like any complex machine, under the hood, Rubrik is pretty impressive. There is a masterless cluster management solution, multi-tier flash and disk for performance, and a clever redirect-on-write snapshot chain algorithm, which minimises capacity utilisation whilst providing very granular restores.

The key thing here, though, is we don’t really care; we are a consumer society who just wants things to work, as we have more exciting things than backup to worry about!



We have enough complexity in IT these days without having to worry about backup. I would say that the simple to manage, scale-out solution from Rubrik is certainly worth considering as part of any PoC or RFP! 🙂

Further Info

You can catch the full Rubrik session at the link below:
Rubrik Presents at Tech Field Day 12

Further Reading

Some of the other TFD delegates had their own takes on the presentation we saw. Check them out here:

Disclaimer: My flights, accommodation, meals, etc at Tech Field Day 12 were provided by Tech Field Day, but there was no expectation or request for me to write about any of the vendors’ products or services.

Amazon AWS Tips and Gotchas – Part 10 – EFS (Elastic File System)

Continuing in this series of blog posts taking a bit of a “warts and all” view of a few Amazon AWS features, below are a handful more tips and gotchas when designing and implementing solutions on Amazon AWS. This week, we talk about the latest feature of AWS, EFS (aka Elastic File System).

For the first post in this series with a bit of background on where it all originated from, see here:
Amazon #AWS Tips and Gotchas – Part 1

For more posts in this series, see here:
Index of AWS Tips and Gotchas

20. Amazon AWS Tips and Gotchas – Part 10 – EFS (Elastic File System)

A big challenge when designing highly available web infrastructures has historically been how to provide a centralised content store for static content without wasting resources.

A classic model for this is a pair of web / file servers with either rsync or Gluster to replicate the content between them. In Windows world, this would be something like either a WSFC (failover cluster) or perhaps something evil like a DFS replicated share. This means that not only are you wasting money on multiple virtual machines / instances just to serve file content, but you also add significant risk and complexity in the replication and failover between these machines.

Enter, AWS EFS!

At a simple level, EFS is basically an NFS (v4.1) share within the AWS cloud, which is replicated across all AZs in any one region. No need for managing and replicating between instances, or indeed paying for EC2 instances just to create file shares! Great!

As this is still a relatively immature product, there are still a few “features” to be aware of:

  1. There is no native EFS backup solution (yet!). I’m sure this will come very soon. As we have re:Invent coming up, it wouldn’t surprise me if something came out then. In the meantime, your main methods would be either to use Data Pipeline to back up to another EFS store or potentially mount EFS and back up through an EC2 instance using your own tools or scripts. I would be concerned about backing up EFS to EFS (if in the same region), as this is putting all your eggs in one basket. Hopefully, AWS will provide other target options in the future.
  2. There is no native encryption of EFS data as yet. If you need this right now, you could achieve it by simply pre-encrypting the data in your application first, before it is written to EFS. Alternatively, just hold your breath as AWS have already stated that:
    “Amazon EFS does not currently provide the option to encrypt data at rest, but we will offer this option soon”.
  3. If you have less than about 100GB, then due to the way the performance burst credits work you may not get the performance you need. The more you buy, the more performance you get, so don’t short change your app for the sake of a few dollars!

    “Amazon EFS uses a credit system to determine when file systems can burst. Each file system earns credits over time at a baseline rate that is determined by the size of the file system, and uses credits whenever it reads or writes data”

    In early testing, it has been seen that very small filesystems can lead to IO starvation and performance issues. I would recommend you start with 100GB as a minimum (subject to your workload requirements of course!). This is still pretty cheap at only about $30-33 a month; a lot less than even a pair of EC2 instances, never mind the complexity reduction benefits. KISS!

    Of course, the more caching you can do on that content, e.g. using CloudFront as a CDN, the lower the IO requirements on your EFS store.

    For more info on performance see here:
    Amazon EFS Performance

  4. And finally… being NFS based, this is obviously primarily aimed at Linux solutions. It would be nice to think that AWS will release an SMB version in the future… we can but hope!

Thanks to my learned colleague Tom Ellis for the tip! As he says, “The size needs to be determined by the throughput needs, and not the storage capacity needs.”
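To put some rough numbers on sizing by throughput, here is a minimal sketch. It assumes the baseline rate AWS documented for bursting file systems at the time (50 MiB/s per TiB stored) and a price of $0.30 per GB-month, which matches the ~$30 for 100GB figure above. Both constants are assumptions, and the GB vs GiB distinction is treated loosely, so check the current EFS documentation and pricing before relying on them.

```python
# Assumed figures: verify against the current AWS EFS documentation and pricing.
BASELINE_MIBPS_PER_TIB = 50.0   # baseline throughput earned per TiB stored
PRICE_PER_GB_MONTH = 0.30       # USD, consistent with ~$30/month for 100GB


def baseline_mibps(size_gib):
    """Sustained (non-burst) throughput for a file system of `size_gib`."""
    return size_gib / 1024 * BASELINE_MIBPS_PER_TIB


def size_for_throughput_gib(required_mibps):
    """Smallest file system size whose baseline meets `required_mibps`."""
    return required_mibps / BASELINE_MIBPS_PER_TIB * 1024


def monthly_cost_usd(size_gb):
    """Approximate monthly storage cost for `size_gb` of data."""
    return size_gb * PRICE_PER_GB_MONTH


if __name__ == "__main__":
    for size in (10, 100, 1024):
        print(f"{size:>5} GiB -> {baseline_mibps(size):6.2f} MiB/s baseline, "
              f"~${monthly_cost_usd(size):.2f}/month")
```

Under those assumptions, a 10GiB file system earns a baseline of less than 0.5 MiB/s, which goes some way to explaining the IO starvation seen on very small file systems once their burst credits run out.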

Find more posts in this series here:
Index of AWS Tips and Gotchas

HOWTO: Process for Upgrading Veeam Backup & Replication 7 to 8

As VMware vExperts, we are kindly provided with free licenses for Veeam Backup & Replication and Veeam One. I have been using Veeam B&R for the last year and have successfully used it to protect half a dozen of my key lab machines and do one or two restores over that time.

The licenses we are provided with by Veeam are based on a 365 day evaluation, so my backup server was reaching its expiry date this week. I was running Veeam B&R version 7.x, so as part of applying the new license I also needed to upgrade the Veeam software from version 7 to 8.

This turned out to be an incredibly easy process with only a couple of minor tweaks at the end to get things up and running. As you can see from the screenshots below the installation and update of Veeam is pretty much a next, next, finish type of installation.

It’s also worth mentioning that although I have documented the process for upgrading Veeam B&R here, the process for upgrading Veeam One is pretty much the same.

As with any standard upgrade to software running in a virtual machine, I started by taking a snapshot of that machine.

The next step was to mount the Veeam ISO file in the virtual machine’s operating system and start the install wizard.

Of course I read every single word of the license agreement.

The installer recognised the previous version of the software and offered to upgrade to the latest version automatically.

I then pointed the install wizard to the evaluation license key provided to me by the folks at Veeam.

A number of basic checks are completed to ensure that the appropriate pre-requisites are in place.

Next you would enter the service account for Veeam. Obviously being a home lab and me being incredibly lazy, this is the local machine administrator in this case. In any production environment this should of course be a dedicated account.

The existing SQL express database instance is selected.

Veeam recognises this has an instance on it which can be upgraded.

The installer is now ready to run.

After about five minutes installation is complete.

After a quick reboot, the server is back up and running and I log back in. When I launch Veeam B&R 8 for the first time, it recognises that some server components still need to be upgraded.

Again this is just a next, next, finish setup.

The only issues I have seen after the upgrade were a couple of VMs which failed their backups. After a reboot of said machines, everything was right as rain and backups are running as normal.

Once I was sure everything was working properly, and had run a couple of successful backups, I committed and deleted the snapshot taken at the start of the process.

Overall the process was very simple and very slick, exactly what you want from a software upgrade. Particularly impressive considering this was a full version upgrade, not just a point release. You can see why their marketing department came up with the tagline “It Just Works”!

Although most organisations I have worked for in the past have generally used more traditional backup vendors, Veeam is definitely enterprise ready and well worth considering. The only drawback is that if you run a mixed environment of physical and virtual machines, you may require multiple backup platforms. Even then, Veeam Endpoint can do this in some scenarios AFAIK.
