Think you’re compliant? Corrupt metadata could land you in jail
April 1, 2009 12:00 PM
The amount of data that companies must store is growing at an enormous pace as regulations including Sarbanes-Oxley, HIPAA, SEC 17a-4 and the Federal Rules of Civil Procedure, to name just a few, place an increasing burden on enterprise storage, backup and recovery. Fail to protect information in a recoverable state, and company officers face fines, other penalties and even jail time.
Director of Product Management & Services
Matters become more complex when enterprise content management (ECM) systems are in use. The benefits of ECM (workflow, collaboration and integrated management of information assets) bring an extra level of responsibility for protecting the content and metadata within them. That extra care relates primarily to ECM metadata: audit trails, digital signatures, workflows, renditions and other “data about data,” which must be backed up in sync with the content it supports to ensure full recoverability and compliance with regulatory mandates.
Most C-level executives believe the information in their ECM systems is already protected by their enterprise backup systems. “We’re compliant,” they’ll say. “We have an enterprise content management system in place, and my IT team assures me it’s being backed up.”
Is this CEO asking the right questions? Ask your IT department whether your ECM information is backed up, and the answer will almost always be “yes.” But you need to ask deeper, more probing questions to protect your organization, and yourself, from the risks of data loss. ECM systems present several challenges to recoverability: complex data structures, large databases and critical metadata that cannot be lost.
Today’s regulatory environment is forcing many companies to re-evaluate their company-wide processes and procedures, including backup and recovery strategies. Section 404 of the Sarbanes-Oxley Act of 2002, for example, requires management to assess and report on the company’s internal controls over financial reporting, and the Act makes C-level executives accountable for misrepresentation of corporate, operational or financial information. Without a sound backup strategy that protects content and metadata in ECM systems, the risk of non-compliance grows.
Why metadata is important
In many cases, the metadata associated with contracts, supplier bids and other documents matters more than the content itself. To get the full picture of the relationship between content and metadata, it helps to understand how complex the information stored in ECM repositories is. Many applications integrate with ECM systems, such as bid management software, pharmaceutical eSubmissions applications, credit and loan processing applications, and contract management applications, and business process management tools are often integrated as well. The ECM system serves as a tool for creating compliant workflows and processes, manages all transactions, and is ultimately the warehouse for information and for the relationships between different pieces of that information.
As new business applications, documents and transactions are added to the ECM repository, an infrastructure of content (say, a loan application) and metadata (say, loan numbers, annotations, revisions and digital signatures) is built around the original document. This metadata is lifecycle information about the original content, and it creates complex interrelationships among different pieces of information.
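To make these interrelationships concrete, the Python sketch below models a repository in which each piece of metadata (an annotation, a signature, a rendition) is bound to one specific content version by a SHA-256 digest. The `Repository` class, its methods and the loan example are hypothetical illustrations, not the data model of any particular ECM product. The point is that losing or altering a single content version silently orphans every piece of metadata built on it.

```python
import hashlib
from dataclasses import dataclass, field


def digest(data: bytes) -> str:
    """SHA-256 hex digest that identifies one exact piece of content."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class MetadataItem:
    kind: str          # illustrative kinds: "signature", "rendition", "annotation"
    doc_id: str
    version: int
    content_hash: str  # digest of the content version this item describes


@dataclass
class Repository:
    content: dict = field(default_factory=dict)   # (doc_id, version) -> bytes
    metadata: list = field(default_factory=list)  # MetadataItem records

    def add_version(self, doc_id: str, version: int, data: bytes) -> None:
        self.content[(doc_id, version)] = data

    def attach(self, kind: str, doc_id: str, version: int) -> None:
        """Record metadata bound to a specific, existing content version."""
        data = self.content[(doc_id, version)]
        self.metadata.append(MetadataItem(kind, doc_id, version, digest(data)))

    def broken_relationships(self) -> list:
        """Metadata whose content version is missing or no longer matches."""
        broken = []
        for item in self.metadata:
            data = self.content.get((item.doc_id, item.version))
            if data is None or digest(data) != item.content_hash:
                broken.append(item)
        return broken


repo = Repository()
repo.add_version("loan-001", 1, b"Loan application, draft")
repo.attach("annotation", "loan-001", 1)
repo.add_version("loan-001", 2, b"Loan application, final")
repo.attach("signature", "loan-001", 2)
repo.attach("rendition", "loan-001", 2)
assert repo.broken_relationships() == []

# Losing one content version silently orphans the metadata built on it.
del repo.content[("loan-001", 2)]
print([m.kind for m in repo.broken_relationships()])  # ['signature', 'rendition']
```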
Take, for example, a government procurement manager coordinating responses from defense contractors to an RFI for a guidance system. The procurement manager must be certain that the response documents, which may include a contracts section, an engineering section, a project management section, specific responses to questions and a number of appendices, are routed through each step of the contractor selection process. As the responses move through these steps, they “collect” metadata: workflows, digital signatures, approvals and renditions. Any loss or corruption of this metadata could significantly delay contractor selection because, for reasons explored below, metadata is not easily recreated or recovered.
Metadata also matters because numerous industry and government regulatory mandates require its preservation, holding IT and business managers liable for retaining original, unaltered information down to the individual record. Without proper safeguards, losing ECM system metadata means failing to comply with regulations, which can bring significant fines, loss of shareholder value and even jail time. Preserving ECM content and metadata such as audit trails, while ensuring granular recoverability and creating a demonstrable “chain of custody,” can help organizations avoid many of the risks associated with the loss or corruption of complex ECM system information.
Depending on which constituency your company serves, any single piece of ECM information could be subject to a number of compliance, records management and regulatory requirements. Failing to ensure the integrity and recoverability of this information is, at a minimum, risky. A quick sampling of regulations illustrates the importance of preserving metadata:
• For government entities and those doing business with the government, Continuity of Operations (COOP) planning, which originated with Cold War directives to maintain key government functions in the face of a nuclear attack, specifies the protection, recovery and availability of the electronic documents and records needed to support essential functions. Today COOP has a broader context: it is essentially business continuity planning for government agencies and their contractors.
• For life sciences companies, the FDA’s 21 CFR Part 11 regulation governs the protection and retention of electronic records. It requires regulated companies, such as pharmaceutical manufacturers, to protect electronic records so they can be accurately and readily retrieved throughout the records retention period, and it specifies that audit trail documentation must be retained at least as long as the electronic records it covers.
• For financial services companies, SEC 17a-4 sets out which records must be retained, and for what time periods. The regulation also calls for companies to maintain a system to show the audit trail of each record, to provide verification that those records were not altered, and to store records in such a way that they cannot be altered, overwritten or erased.
• For all companies doing business in the United States, the Federal Rules of Civil Procedure (FRCP) apply in federal litigation. The 2006 eDiscovery amendments to the FRCP treat electronically stored information, including metadata such as audit trails and renditions regardless of complexity, as discoverable, so it must be preserved and produced on demand. They also specify a default form of production: the form in which the information is ordinarily maintained, or a ‘reasonably usable’ form.
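One common way to make an audit trail tamper-evident, which supports the kind of verification SEC 17a-4 calls for, is a hash chain: each entry’s hash covers both the event and the previous entry’s hash. The Python sketch below assumes this technique for illustration only; neither the regulations above nor this article prescribe it, and a hash chain does not by itself provide the non-rewriteable storage 17a-4 requires. The `append` and `verify` helpers are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash of one audit event, chained to the hash of the entry before it."""
    payload = prev_hash + json.dumps(event, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(trail: list, event: dict) -> None:
    prev = trail[-1]["hash"] if trail else GENESIS
    trail.append({"event": event, "hash": entry_hash(prev, event)})


def verify(trail: list) -> bool:
    """Recompute every link. Editing, reordering or removing any entry other
    than the newest breaks the chain."""
    prev = GENESIS
    for entry in trail:
        if entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


trail = []
append(trail, {"doc": "bid-17", "action": "created", "user": "alice"})
append(trail, {"doc": "bid-17", "action": "approved", "user": "bob"})
assert verify(trail)

trail[0]["event"]["user"] = "mallory"  # tamper with history
assert not verify(trail)
```

Detecting removal of the newest entries would additionally require keeping the latest hash somewhere an attacker cannot change it.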
While records management solutions help companies meet the data retention requirements of the regulations above, they do not protect against the information loss and corruption that can leave a company unable to comply with them.
Shortfalls of Traditional Backup & Recovery Practices
The key to protecting ECM system information is ensuring that the relationships between content and metadata survive backup and recovery, as well as logical failures. Logical failures account for more than 80 percent of ECM information loss* and are caused by common occurrences such as user, programmatic and operational errors, malfeasance, viruses and metadata corruption. Whatever the cause, ECM data loss and corruption can result in compliance risk, the inability to respond to eDiscovery demands, lost productivity and lost revenue.
Traditional enterprise solutions are structured for system-level backup and recovery and are designed to respond to full system failures or disasters. They do not provide incident-level recovery from logical failures, so companies must either suffer the consequences of the failure or recover from the last full backup. Traditional “cold” recovery disrupts business operations: the ECM system must be brought down for a roll back, which takes employees offline, and every addition and change made to the repository since the last cold backup is lost. That can be an extremely large amount of information if the loss goes unnoticed for weeks or months.
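A little arithmetic shows why rolling back to the last full backup is such a blunt instrument. The Python sketch below uses a made-up change log (one edit per day across five documents, a full backup on day 7, and a user deleting one document on day 20); the figures are illustrative, not measurements. A full roll back discards every change made after the backup, most of which have nothing to do with the incident.

```python
from datetime import datetime, timedelta

# Hypothetical change log: one edit per day, spread across five documents.
start = datetime(2009, 3, 1)
changes = [(start + timedelta(days=d), f"doc-{d % 5}", f"edit {d}")
           for d in range(30)]

last_full_backup = start + timedelta(days=7)
incident_doc = "doc-3"  # e.g. a user deletes doc-3 on day 20


def changes_lost_by_rollback(changes, backup_time):
    """Everything written after the backup disappears in a full roll back."""
    return [c for c in changes if c[0] > backup_time]


lost = changes_lost_by_rollback(changes, last_full_backup)
collateral = [c for c in lost if c[1] != incident_doc]
print(len(lost), len(collateral))  # 22 17
```

Here 17 of the 22 discarded changes touch documents the incident never affected; incident-level recovery would have restored only `doc-3`.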
Hot backup practices using traditional enterprise solutions have become popular because they eliminate the need to take the ECM system offline for backup. However, this method is not foolproof. Most hot backup solutions capture content and metadata separately, one after the other, while users may be changing or adding information in the repository. The relationships between content and metadata can then become disassociated, leaving information lost or inaccessible (corrupt) upon recovery. When the integrity of information is compromised in this way, the result is not only disruption to business operations when a user tries to access it, but also compliance risk, since audit trails and other metadata may be inaccessible. ECM-specific solutions that provide synchronized hot backups do exist; however, while they are ideal for recovering from a full system failure or natural disaster, they do not enable recovery at the incident level.
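The race described above can be reproduced deterministically. In this hypothetical Python sketch, the `between_passes` callback stands in for a user revising a contract after the content pass has finished but before the metadata pass begins. The unsynchronized backup ends up pairing version 1 content with version 2 metadata; the synchronized one copies both stores under the same lock that user edits take, so they describe a single moment.

```python
import copy
import threading


class Repo:
    """Toy repository: content and metadata live in separate stores."""

    def __init__(self):
        self.lock = threading.Lock()
        self.content = {"contract-9": ("v1", b"terms v1")}
        self.metadata = {"contract-9": {"version": "v1", "signed_by": ["legal"]}}

    def revise(self, doc_id, version, data, signer):
        # A user edit updates content and metadata together, under the lock.
        with self.lock:
            self.content[doc_id] = (version, data)
            self.metadata[doc_id] = {"version": version, "signed_by": [signer]}


def unsynchronized_backup(repo, between_passes):
    """Content pass, then metadata pass, with users still working in between."""
    content = copy.deepcopy(repo.content)
    between_passes()  # stands in for concurrent user activity
    metadata = copy.deepcopy(repo.metadata)
    return content, metadata


def synchronized_backup(repo):
    """Both stores copied while edits are held off, so they describe one moment."""
    with repo.lock:
        return copy.deepcopy(repo.content), copy.deepcopy(repo.metadata)


def consistent(backup):
    """True if every metadata record points at the content version backed up."""
    content, metadata = backup
    return all(content.get(d, (None,))[0] == m["version"]
               for d, m in metadata.items())


repo = Repo()
torn = unsynchronized_backup(
    repo, lambda: repo.revise("contract-9", "v2", b"terms v2", "cfo"))
print(consistent(torn))                       # False: v1 content, v2 metadata
print(consistent(synchronized_backup(repo)))  # True
```

Holding a lock pauses users for the duration of the copy, which is fine in a toy; a production tool would more likely use a database or storage snapshot to get the same single-moment view without blocking work.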
Continuous Data Protection (CDP) is an always-on form of backup that captures changes as they are made. CDP effectively creates a ‘snapshot’ each time data is modified, recording transactions or changes while systems stay online. However, recovery typically requires the ECM system to be taken offline, and information cannot be recovered at a granular level. In addition, if the CDP technology used with an ECM repository does not capture changes to metadata and content simultaneously, there is no guarantee that the integrity of information will be maintained.
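The simultaneity problem can be sketched as a journal replay. In the hypothetical Python model below, each user edit reaches the journal as two separate entries, one for content and one for metadata, and recovery means replaying the journal up to a chosen sequence number. Only some recovery points leave content and metadata in agreement. Real CDP products differ in how they journal and replay changes, so treat this purely as an illustration.

```python
from dataclasses import dataclass


@dataclass
class Change:
    seq: int       # position in the CDP journal (monotonically increasing)
    target: str    # "content" or "metadata"
    doc_id: str
    value: object


def replay(journal, upto_seq):
    """Rebuild repository state as of upto_seq by replaying journal entries."""
    state = {"content": {}, "metadata": {}}
    for change in sorted(journal, key=lambda c: c.seq):
        if change.seq > upto_seq:
            break
        state[change.target][change.doc_id] = change.value
    return state


def consistent(state):
    """Every document has metadata, and the metadata matches its content version."""
    content, metadata = state["content"], state["metadata"]
    return content.keys() == metadata.keys() and all(
        content[d] == metadata[d]["version"] for d in content)


# Hypothetical journal in which each edit is captured as two separate entries.
journal = [
    Change(1, "content", "doc-1", "v1"),
    Change(2, "metadata", "doc-1", {"version": "v1"}),
    Change(3, "content", "doc-1", "v2"),
    Change(4, "metadata", "doc-1", {"version": "v2"}),
    Change(5, "content", "doc-1", "CORRUPTED"),  # logical failure
]

safe_points = [s for s in range(1, 6) if consistent(replay(journal, s))]
print(safe_points)  # [2, 4]
```

Recovering to entry 4 returns the last good state before the corruption at entry 5, while entry 3 would restore version 2 content alongside version 1 metadata. Capturing each edit’s content and metadata as a single journal entry would remove such unsafe points altogether.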
Replication and mirroring capture changes frequently, or even as they happen, so recovery typically doesn’t require a roll back. In cases of logical information loss, however, these methods can compromise the integrity of documents and their related metadata, because the loss is copied to the secondary location: if data is corrupted, damaged or deleted in the ECM system, the secondary copy ends up in the same state.
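Why mirroring offers no protection against logical loss fits in a few lines. In this hypothetical Python sketch, every write goes to both the primary and the mirror, so an accidental deletion is faithfully mirrored; only a point-in-time copy taken before the incident still holds the purchase order and its audit trail.

```python
import copy

primary = {"po-42": {"content": "Purchase order", "audit": ["created", "approved"]}}
mirror = copy.deepcopy(primary)    # secondary copy kept in step with primary
snapshot = copy.deepcopy(primary)  # point-in-time copy taken before the incident


def mirrored_write(operation):
    """Synchronous mirroring: every operation is applied to both copies."""
    operation(primary)
    operation(mirror)


def accidental_delete(store):
    del store["po-42"]  # a user error, i.e. a logical failure


mirrored_write(accidental_delete)
print("po-42" in primary, "po-42" in mirror, "po-42" in snapshot)
# False False True
```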
Protecting ECM metadata and content
Clearly, ensuring the integrity and recoverability of ECM metadata and content, so that it can be readily retrieved and authenticated in response to business demands, eDiscovery requirements, audits and inspections, is a critical business competence. To accomplish this, deploy an ECM-specific granular recovery and integrity solution. The goal is to ensure the integrity of information when it is captured, and to recover it quickly with minimal effort and disruption.
Look for an ECM-specific protection solution with the following capabilities:
• Integrity checks – Proactive integrity checks ensure that corrupt information is never captured, so that, for example, contract metadata can be restored to its original state.
• Granular recovery – Rapidly identify and recover only the ECM content and metadata affected by a logical loss incident, with its integrity ensured.
• “Hot” capture and recovery – Capture and recover information while the ECM system remains online, avoiding disruption to work processes and maintaining productivity.
• Incremental capture – At frequent intervals, capture only the information that has been changed or added to the repository.
• Synchronized capture – Capture content and metadata in a single, synchronized pass while the ECM system is online, preventing data corruption and inaccessibility upon recovery.
• Adherence to records retention policies – Comply with ECM system and enterprise-wide records retention policies by disposing of captured information in accordance with those policies.
• Minimal data loss windows – Capture ECM information at a near real-time frequency to help meet or exceed recovery point objectives (RPOs).
• Fast recovery – Recover from a logical failure in minutes, meeting or exceeding recovery time objectives (RTOs).
• Minimal recovery resources – A single administrator should be able to manage the recovery process.
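Several of the capabilities above (integrity checks, incremental capture, synchronized capture and granular recovery) can be illustrated together in one toy model. The `ProtectedRepo` class below is a hypothetical Python sketch, not the design of any specific product: each document’s metadata records a checksum of its content, each capture pass copies only changed documents with their content and metadata in the same step, corrupt documents are refused, and recovery restores a single document from its newest good capture. Being single-threaded, it sidesteps the concurrency that a real hot, synchronized capture must handle.

```python
import copy
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ProtectedRepo:
    """Toy ECM repository. Each document pairs content with metadata that
    records the content's checksum (a stand-in for real ECM metadata)."""

    def __init__(self):
        self.docs = {}        # doc_id -> {"content": bytes, "meta": dict}
        self.changed = set()  # doc_ids touched since the last capture
        self.captures = []    # one {doc_id: doc copy} dict per capture pass
        self.rejected = []    # doc_ids that failed an integrity check

    def save(self, doc_id, content, **meta):
        meta["checksum"] = checksum(content)
        self.docs[doc_id] = {"content": content, "meta": meta}
        self.changed.add(doc_id)

    def capture(self):
        """Incremental pass: copy only changed documents, each document's
        content and metadata in the same step, refusing corrupt ones."""
        batch = {}
        for doc_id in sorted(self.changed):
            doc = self.docs.get(doc_id)
            if doc is None:
                continue  # deleted since last pass; older captures keep it
            if doc["meta"]["checksum"] != checksum(doc["content"]):
                self.rejected.append(doc_id)  # do not capture corrupt state
                continue
            batch[doc_id] = copy.deepcopy(doc)
        self.captures.append(batch)
        self.changed.clear()
        return batch

    def recover(self, doc_id):
        """Granular recovery: restore one document from its newest good
        capture, leaving every other document untouched."""
        for batch in reversed(self.captures):
            if doc_id in batch:
                self.docs[doc_id] = copy.deepcopy(batch[doc_id])
                return True
        return False


repo = ProtectedRepo()
repo.save("contract-1", b"terms v1", owner="legal")
repo.save("bid-7", b"supplier bid", owner="purchasing")
repo.capture()                                   # first pass: both documents
repo.save("contract-1", b"terms v2", owner="legal")
print(sorted(repo.capture()))                    # ['contract-1'] (incremental)

del repo.docs["bid-7"]                           # logical loss: a user deletes a bid
repo.docs["contract-1"]["content"] = b"garbled"  # corruption behind the ECM's back
repo.changed.add("contract-1")
repo.capture()                                   # integrity check rejects contract-1
print(repo.rejected)                             # ['contract-1']

repo.recover("bid-7")
repo.recover("contract-1")
print(repo.docs["bid-7"]["content"], repo.docs["contract-1"]["content"])
# b'supplier bid' b'terms v2'
```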
Data recovery cannot be treated as the ugly stepsister of enterprise backup, and the special demands that ECM systems place on backup must not be ignored. Regulators and industry experts are increasingly calling for ECM- and compliance-aware recovery management strategies, which may in turn set new industry-wide legal precedents. One misstep can lead to disaster; however, there are approaches and ECM solutions that help avoid noncompliance, downtime and other incidents.
From the executive suite to the data center, maintaining critical information, and avoiding the missteps that can lead to censure, fines and even jail time, requires an understanding of the relationship between ECM content and metadata. Proven technology available today, which works with leading enterprise backup and recovery solutions, makes it possible to deploy a solution that ensures the integrity, recoverability and authenticity of information in ECM repositories. Protecting ECM information is an investment that cannot be overlooked.
* Source: AIIM International & Strategic Research Inc.
As Director, Product Management & Services for CYA Technologies, Mike Fernandes is responsible for managing services, as well as driving product enhancements, product strategy and market requirements. He has been in the technology field for more than 15 years and has experience in network engineering, implementation, and ASP hosting services.
Mike previously served in various roles for service and ASP companies such as Netzee/Concentrex, Inc., MECA Software, Precision Computer Services and Glencore Inc. He holds a BS in computer science from Sacred Heart University and is certified as a Microsoft systems engineer.