What to look for in a data management system

Government Data Retrieval for Prosecution of Terrorist Suspects Teaches Us All a Lesson

By Kevin M. Warnk Sr. (i360Gov guest columnist)

During a recent engagement with a large software company in the archive and data protection space, our implementation team got a call from an attorney with a large government customer. He had a curious question: “Could we help the government with a problem they were having regarding data management and compliance?”

The conversation became more curious when we learned the lawyer was supporting the prosecution of terrorist suspects. As the story developed, all the electronic data that had been created over the last five years (this was in 2005) had to be quickly put into a searchable format for legal discovery.

Problem was, the data was not online. It was on 5,000 tapes, stored in boxes in a hot, non-air-conditioned space. Our team would have to reconstruct the file and email data and merge it with existing production data in a searchable repository.

The challenge: We had to complete the job in 60 days, and the engagement wasn’t risk-free. The government’s IT contractor made it clear that if we caused an outage on one of their production systems, they would seek relief for any SLA penalty the government might impose. (It turned out that 25% of the tapes were unusable, some data sets could only be partially reconstructed, and the job took three times as long as planned.)

Our team read the data from tape into scratch space, then ingested it into the compliance repository, while copying data from the production file and email systems into that same repository. Speed was important, but so was maintaining chain of custody.


It Could Have All Been Avoided

The most remarkable thing about this engagement was how avoidable it all was. That’s true today, ten years later, even as data has expanded exponentially and new government mandates require the archiving of ever more of it.

The first challenge when creating a digital archive is the sheer amount of data that is created in our society today. IDC tracks this phenomenon in its “Digital Universe” report. Here are some interesting facts it discovered:

  • From 2013 to 2020, the digital universe will grow by a factor of 10 – from 4.4 trillion gigabytes to 44 trillion. It more than doubles every two years.
  • In 2013, two-thirds of the digital universe bits were created or captured by consumers and workers, yet enterprises had liability or responsibility for 85% of the digital universe.
  • Of the useful data, IDC estimates that in 2013 perhaps 5% was especially valuable, or “target rich.” That percentage should more than double by 2020 as enterprises take advantage of new Big Data and analytics technologies and new data sources, and apply them to new parts of the organization.
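As a quick sanity check on those figures, a tenfold increase over the seven years from 2013 to 2020 implies a doubling time of roughly two years, just as IDC says. A short Python sketch (only the 4.4 and 44 trillion gigabyte figures come from the report quoted above; the rest is simple compound-growth arithmetic):

```python
import math

# IDC "Digital Universe" figures quoted above (in zettabytes,
# i.e. trillions of gigabytes):
start_zb = 4.4   # 2013
end_zb = 44.0    # projected for 2020
years = 2020 - 2013

# Annual growth rate implied by a 10x increase over 7 years
annual_growth = (end_zb / start_zb) ** (1 / years)  # ~1.39, i.e. ~39% per year

# Doubling time in years: solve annual_growth ** t == 2
doubling_time = math.log(2) / math.log(annual_growth)  # ~2.1 years

print(f"annual growth: {annual_growth:.2f}x")
print(f"doubling time: {doubling_time:.2f} years")
```

In other words, even a “modest-sounding” 39% annual growth rate doubles the archive roughly every two years.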


So, as we move further into the “digital universe,” enterprise Information Technology (IT) departments are faced with an exploding amount of unstructured digital content, no effective means of managing it, and all the liability or responsibility of its stewardship. Even worse, only about 5% of this content is valuable in its current state. The result has been higher IT infrastructure cost, even as the business pressures IT to do more with less. It also means that only a fraction of the stored unstructured data is of value to the organization, even though it sits on assets like Network Attached Storage (NAS) that cost millions.

How to Meet Cost, Management, and Compliance Needs

So, how do we move data to a lower-cost system while still allowing end users to access that data within their current workflow? Keep in mind that IT bears the responsibility, and possibly the liability (based on the SLA), for stored information. The organization will see significant benefits from creating an information repository that leverages lower-cost technologies and allows for better data management and compliance. Recall that IDC points out that only 5% of the data is valuable “in its current state.” Might there be a way to meet the cost, management, and compliance needs while also delivering value back to the organization?

What to Look for in a Data Management System

Managing data is more than moving it from fast storage to slow storage and back again. It’s about understanding the value of the information to the organization and providing it to the people who need it in a cost-effective way. That could include the creator of the data, legal departments handling litigation or compliance, and even Big Data projects, where it can be mined for additional value or used to aid decision making.

When evaluating a data management system, consider the following:

  • A system that allows you to consolidate multiple sources of unstructured and semi-structured data into a single repository. This would include things like email, file systems and MS SharePoint.
  • No IT department wants to implement a new system that lights up the help desk call center. Consider solutions that allow end users to access their data in the same way they do today.
  • The solution should also allow you to “drain” your production file servers of seldom-accessed data. This reduces OPEX and lets performance be focused on the active data that needs it.
  • Scalability and reliability are key. Look for solutions that have no single point of failure and can scale out as required.
  • As the data repository grows it will become more cost effective to move archived data from disk to tape and even to internal or external cloud providers. This allows you to reduce cost while still providing reasonable access times.
  • It’s important that the repository be able to reside on heterogeneous storage or media. Implementing a solution and then finding out it can only be on one vendor’s storage array or tape library can drive all the cost savings out of the solution.
  • Look for providers that use industry-standard interfaces like CIFS, NFS, REST, HTTPS, and even S3. These allow you to “on ramp” the data and provide interoperability with today’s and tomorrow’s technology.
  • Compliance can be a difficult issue for IT departments and organizations as a whole. Having built-in (not bolted on) compliance and search capability can save the organization hundreds of man-hours.    
  • Giving end users a “cloud-like” interface via smartphones and tablets can really drive adoption of the consolidation effort. Everyone will want to use the solution because it makes their information so much easier to find and use.
  • Creating unstructured data repositories can also allow organizations to get a jump start on Big Data efforts that are in progress or that are looking for useful data sources to drive fledgling initiatives.
  • Seek a provider that meets Federal security certifications and standards like FIPS 140-2 and NIAP Protection Level 3. Look for providers and vendors with a focus on support; if there is an issue, you will want the fastest time to resolution.
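To make the “drain” idea above concrete, here is a minimal Python sketch that scans a directory tree for files whose last access time is older than a cutoff and moves them to an archive location, leaving a small stub file behind that records where the data went. This is only an illustration of the concept: a real product would use HSM stubs or filter drivers and an object-store or tape back end (e.g. via S3), and the paths, cutoff, and stub format here are illustrative assumptions, not any vendor’s implementation.

```python
import shutil
import time
from pathlib import Path

def drain_cold_files(source_dir, archive_dir, max_idle_days=180):
    """Move files not accessed in `max_idle_days` from `source_dir`
    to `archive_dir`, leaving a `.stub` text file that records the
    archived location. Returns the original paths of archived files."""
    cutoff = time.time() - max_idle_days * 86400
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    archived = []
    for path in Path(source_dir).rglob("*"):
        # Skip directories and the stubs we leave behind
        if not path.is_file() or path.suffix == ".stub":
            continue
        if path.stat().st_atime < cutoff:
            target = archive / path.name
            shutil.move(str(path), str(target))
            # Leave a stub so users (or a recall layer) can find the data
            Path(str(path) + ".stub").write_text(str(target))
            archived.append(str(path))
    return archived
```

In practice the stub would be replaced by a transparent recall mechanism so that end users keep their existing workflow, per the checklist above; note also that access-time tracking must be enabled on the file system (volumes mounted with `noatime` would defeat this heuristic).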


We are seeing the fastest data growth in human history, and even more fascinating, it is conceivable that this trend has no end as we continue into our digital universe. Fortunately, we have the ability to meet these challenges by driving new technologies into the enterprise data center and bringing value to our customers in exciting new ways.

Kevin Warnk works for the ViON Corporation, where he focuses on the data protection and Cloud industries.  Kevin has spent the last 25 years in many roles dedicated to working with Intelligence agencies and Department of Defense customers.  Before that, he served as an intelligence analyst with the US Army.
