The Pushes and Pulls Driving Legacy Data Toward True Active Archive Status

The concept of an active archive is intriguing as it relates to legacy data, meaning any data that was created some time ago under a different set of legal, business, and compliance requirements. The rules for how data should be treated and classified according to its use and age are not the same now as they were when the data was created. We’ve been brought up in IT data storage with (hopefully) sound records management policies that seek to classify and categorize data at certain points in its lifecycle. Of course, what throws the best-laid plans out of the window is changing the rules for how to deal with data once it has been created. If only we had known the eventual rules when the data was created, and if only its use and the length of time we wanted to keep it never changed. In business, the drivers for changing the rules that were set up initially have primarily been business value, regulatory requirements, and legal intervention or preservation.

The rest, data that falls outside these categories of use, is for all intents and purposes junk at best and a liability at worst.

Let’s also look at the technological changes. The devices we use to store data have changed. There is still disk and tape, but other variations on these themes have come into play: optical, flash, and cloud (which is probably disk and tape underneath, but that’s another topic). The value of tape in the storage world has remained largely unchanged; it is portable, secure, and cheap. As a result, it has been an obvious choice for backup and archive. More elegant solutions have evolved for backup over the past 20 years, but the argument for tape as an archive medium seems to be as strong as ever, possibly stronger.

However, there are pushes and pulls on the life of data that is created, used, retained for compliance or legal reasons, and now, increasingly and more importantly, deleted. The two biggest drivers of demand for flexible or active archives, as they relate to legacy data, seem to be the protection of mission-critical data from hacking and the developing privacy regulations that could restrict which data can be retained after it has been archived.

As a tape services company, we have certainly seen a significant increase in work from organizations that have been hit by ransomware and have looked to their tape backups for a resolution. Tape cannot solve all the issues with ransomware, but it can certainly clean up some of the mess created. What interests us more is the developing privacy regulation framework. It started with GDPR in Europe and is steadily spawning other variations on the same theme. California came first with the CCPA, and now it looks like every state is going to have its own version of a privacy act, with Florida the latest, adding HB 969 to the Regulatory Reform Subcommittee agenda. The general premise of these acts is to strengthen the rights of individuals over their own personal information stored by companies. They set out a framework for the removal of such information, and many organizations recognize how serious this is and now employ experts to deal with the inquiries. But let’s think about the technology demands for a minute. If some personal information is stored on the hard drive of a PC, it’s a pretty easy task to identify it and remove it. The deeper the data goes (NAS, SAN, tape, cloud), the more difficult, costly, and time-consuming it becomes to resolve. In terms of sanctions, we don’t yet really know how sharp the regulators’ teeth will be regarding data that is unstructured (has no easily identified paths defined) and not easily identifiable, but companies are increasingly thinking about proactive remedies for the historical legacy data they hold, whether or not it is their own, and trying to act accordingly.

You could say that what is required is not an archival system but an “active archival system,” where data can be easily found, even if unstructured, and easily removed after archival if and when required. Inevitably, the active archives of the future will be software-driven and far more capable than the dumb archives of the past.

