DESIGNING YOUR LONG-TERM BULK ARCHITECTURE
While organizations take advantage of many types of storage, this paper focuses on long-term, bulk storage because it represents the vast majority of data, and optimal management of this type of storage will have the greatest impact on meeting an organization's long-term storage requirements.
Typically, long-term, bulk storage is used for unstructured data (e.g., audio, video, email or documents), which represents roughly 80% of all data. Organizations also have a financial justification to keep this data on disk for fast retrieval. Therefore, the storage infrastructure must allow operating systems, applications or users to actively reference the data. Long-term, bulk storage excludes data stored in databases, object-oriented data, and data that must be retained for very long periods (or archived for legal compliance or discovery).
Age in Days     Probability of Reference
1               ~75%
90              Near zero
How does one build an efficient bulk data repository that optimizes the use of stack-managed, virtualized storage?
The first step is to classify the performance service levels for the types of applications the repository will support and the unstructured data it will store. Unstructured data can have highly variable performance requirements. For instance, some businesses may consider an Exchange server a mission-critical application requiring very high performance service levels. Conversely, documents may have a very low reference probability and therefore require only a low performance service level. Classifying the data according to required performance levels not only helps to define the technical requirements for the infrastructure, it also exposes the optimal management methodology.
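As a rough illustration, this classification exercise can be captured in a simple policy table. The Python sketch below shows one way to do so; the application names, service-level labels, latency figures and class-to-tier mapping are illustrative assumptions, not part of any product.

    # Illustrative only: service levels, latency targets and the
    # class-to-tier mapping below are assumptions for this sketch.
    SERVICE_LEVELS = {
        "mission_critical": {"tier": 1, "max_latency_ms": 5},
        "standard":         {"tier": 2, "max_latency_ms": 20},
        "low_reference":    {"tier": 2, "max_latency_ms": 50},
    }

    # Example classification of unstructured data sources.
    DATA_CLASSES = {
        "exchange_mail":  "mission_critical",  # e.g., a business-critical Exchange server
        "video_archive":  "low_reference",
        "user_documents": "low_reference",
    }

    def required_tier(data_source: str) -> int:
        """Return the storage tier implied by a data source's service level."""
        level = DATA_CLASSES.get(data_source, "standard")
        return SERVICE_LEVELS[level]["tier"]

    print(required_tier("exchange_mail"))   # -> 1
    print(required_tier("user_documents"))  # -> 2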
The second step is to understand how data will be used initially and over time. This helps to determine how the data should be stored and whether any business-justified reasons exist to migrate the data to a different tier of storage over time. Typically, the reference behavior of most user data falls steeply over time. After the first day, the probability that a given record will be referenced again is about 75%; after 90 days, the probability of reference is near zero.
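Those two data points (roughly 75% on day 1, near zero by day 90) are enough to sketch a simple decay model for planning purposes. The exponential form and constants below are assumptions chosen only to match the figures quoted above.

    import math

    # Assumed exponential decay fitted to the two quoted data points:
    # P(1) ~= 0.75 and P(90) ~= 0.01 ("near zero").
    K = math.log(0.75 / 0.01) / (90 - 1)  # decay rate per day

    def reference_probability(age_days: float) -> float:
        """Estimated probability that a file is referenced again at a given age."""
        return 0.75 * math.exp(-K * (age_days - 1))

    print(round(reference_probability(1), 2))   # -> 0.75
    print(round(reference_probability(90), 2))  # -> 0.01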
After the analysis, an organization might determine that some data should be captured onto tier-2 storage and remain there for its entire lifecycle. Other data might originate on high-performance tier-1 storage, the most feature-rich and expensive type, and then migrate to tier-2 storage (which offers strong performance and reliability at less than half the cost) as its access requirements decline.
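In code, such a lifecycle policy reduces to a capture tier plus an optional age-based migration rule. The 90-day threshold echoes the reference statistics above; the class names and rule structure are otherwise illustrative assumptions.

    # Illustrative lifecycle rules: a capture tier plus an optional
    # age-based migration target. Names and thresholds are assumptions.
    LIFECYCLE_RULES = {
        "user_documents": {"capture_tier": 2, "migrate_to": None, "after_days": None},
        "exchange_mail":  {"capture_tier": 1, "migrate_to": 2,    "after_days": 90},
    }

    def current_tier(data_class: str, age_days: int) -> int:
        """Tier a file of the given class should occupy at the given age."""
        rule = LIFECYCLE_RULES[data_class]
        if rule["migrate_to"] is not None and age_days >= rule["after_days"]:
            return rule["migrate_to"]
        return rule["capture_tier"]

    print(current_tier("exchange_mail", 10))   # -> 1 (still hot)
    print(current_tier("exchange_mail", 120))  # -> 2 (access requirements declined)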
Information Lifecycle Management is the term used to describe a tiered storage infrastructure that is managed by hierarchical storage management capabilities in the stack. A tiered storage architecture allows organizations to optimally align the performance and availability characteristics of their storage hardware with their business requirements.
When the IT organization does not actively classify and manage the location of unstructured data within its infrastructure, storage systems typically contain considerable inert data that has not been referenced in six months or more. Inert data occupies, but essentially wastes, inordinate amounts of capacity. It makes far better sense to move low-reference-probability data, or inert data, off expensive tier-1 disks and return the capacity for use by high-performance data. This allows the organization to “grow” its storage capacity at no additional cost.
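Finding inert data is straightforward where last-access times are tracked. The sketch below walks a directory tree and flags files untouched for six months; the mount point and the exact cutoff are assumptions, and on filesystems mounted with noatime the access times it reads will not be meaningful.

    import os
    import time

    INERT_AFTER_DAYS = 180  # "six months or more"; exact cutoff is an assumption

    def find_inert_files(root: str):
        """Yield (path, size_bytes) for files not accessed in INERT_AFTER_DAYS."""
        cutoff = time.time() - INERT_AFTER_DAYS * 86400
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                if st.st_atime < cutoff:
                    yield path, st.st_size

    # Hypothetical mount point for a tier-1 volume.
    total = sum(size for _path, size in find_inert_files("/mnt/tier1"))
    print(f"Inert data eligible for tier-2 migration: {total / 1e9:.1f} GB")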
Storage classification also improves the efficiency of the protection architecture. When administrators attempt to restore or recover a failed storage volume, they often find that entire disk volumes, along with all of the inert data on them, have been replicated multiple times. Commonly, too many copies are captured on expensive tier-1 storage when they could be housed on tier-2 long-term, bulk storage.
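A back-of-the-envelope calculation shows how quickly replicated inert data compounds. The volume size, inert fraction and copy count below are hypothetical figures for illustration only.

    # Hypothetical figures for illustration only.
    volume_tb  = 10    # size of the protected volume
    inert_frac = 0.4   # fraction of the volume that is inert
    copies     = 3     # replicas kept for protection

    # Tier-1 capacity consumed by inert data across all copies.
    wasted_tb = volume_tb * inert_frac * copies
    print(f"Tier-1 capacity tied up by inert data: {wasted_tb:.0f} TB")  # -> 12 TB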
The biggest objection storage administrators typically have to adopting a cost-effective tiered storage architecture is the time and effort necessary to set up the management controls. The good news is that cost-effective software greatly simplifies this process. Storage administrators can use a migration manager running in the stack to apply simple classification guidelines that locate, move and manage data across the storage tiers.
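In practice, those classification guidelines can be as simple as a declarative rule set that a migration manager evaluates on a schedule. The rule format below is a hypothetical sketch, not the syntax of any particular product.

    # Hypothetical rule set a stack-resident migration manager might evaluate.
    MIGRATION_RULES = [
        # (description,                match criteria,                 action)
        ("demote inert documents",     {"class": "user_documents",
                                        "min_age_days": 180},          {"move_to_tier": 2}),
        ("demote cold mail archives",  {"class": "exchange_mail",
                                        "min_age_days": 90},           {"move_to_tier": 2}),
    ]

    def plan_moves(files):
        """files: iterable of dicts with 'path', 'class' and 'age_days' keys."""
        for f in files:
            for _desc, match, action in MIGRATION_RULES:
                if f["class"] == match["class"] and f["age_days"] >= match["min_age_days"]:
                    yield f["path"], action["move_to_tier"]
                    break

    sample = [{"path": "/mnt/tier1/report.doc", "class": "user_documents", "age_days": 200}]
    print(list(plan_moves(sample)))  # -> [('/mnt/tier1/report.doc', 2)]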