From The Editor | December 9, 2004

EMC Sees User Acceptance Of CAS

Little more than two years after its release, Centera, EMC's CAS
(content addressed storage)-based solution, has already made its
way to more than 1,000 customers. Centera exec Roy Sanford outlines
some key advantages of storing fixed content as addressable objects.

By Tom von Gunden

In Massachusetts recently, I spent the better part of a day at EMC headquarters in Hopkinton. I was there, primarily, to get an update on Centera, EMC's fixed content storage solution. When it was initially released in 2002, Centera more or less defined a new category of storage: CAS (content addressed storage). Designed to archive, protect, manage, and deliver fixed content (i.e. content that will not — in fact, must not — be rewritten), Centera treats and stores files as objects, with each object having its own unique content address. That approach allows Centera users to store and retrieve such pieces of fixed content as medical images, document images, e-mail, manufacturing designs, and so on.

Sitting in an EMC demo room, I got a firsthand look at Centera Compliance Edition, a version of Centera that specifically targets organizations' compliance-driven archival needs, including adherence to SEC Rule 17a-4 and Sarbanes-Oxley requirements. I watched as CentraStar, the management software installed on the Centera units, stored a sample letter that needed to be retained in unalterable form. Both the file itself and the metadata about the file (what EMC calls the CDF, or content descriptor file) were stored to a content address assigned by CentraStar. The software also applied a retention schedule for the file. (Centera applies retention policies set by whichever content management application the organization is using; in this case, EMC's own Documentum suite.) By associating the file (technically, the object) with a unique content address, Centera ensured that only one copy of the sample letter was stored. All authorized users of that file would simply be given pointers to the same content address. If, for instance, the file were an e-mail message with attachments, CentraStar would minimize the capacity burden on Centera's disk arrays by storing only one copy of the e-mail message. Perhaps most important in terms of capacity utilization, it would also store only one copy of any attachments to that e-mail, no matter how many users across the organization originally received the message.
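For readers who want to see the single-copy idea in concrete terms, the following is a minimal sketch, not EMC's implementation. It assumes the content address is a SHA-256 digest of the object's bytes (the demo did not show how CentraStar derives an address), and the class and method names are hypothetical.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory store keyed by a digest of the content (hypothetical)."""

    def __init__(self):
        self._objects = {}     # content address -> stored bytes
        self._references = {}  # content address -> users pointing at it

    def put(self, content: bytes, owner: str) -> str:
        # Assumption: the address is a SHA-256 hex digest of the bytes.
        address = hashlib.sha256(content).hexdigest()
        # Identical bytes map to the same address, so only the first
        # copy is kept; later writers just receive a pointer to it.
        self._objects.setdefault(address, content)
        self._references.setdefault(address, set()).add(owner)
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]

    def object_count(self) -> int:
        return len(self._objects)


store = ContentAddressedStore()
attachment = b"bytes of one e-mail attachment"
# Three recipients "store" the same attachment.
addresses = {store.put(attachment, owner=user) for user in ("ann", "bob", "cho")}
print(len(addresses), store.object_count())
```

All three recipients end up holding the same address, and the store holds one object, which is the capacity saving described above.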

After the demo, I sat down with Roy Sanford, VP of marketing and alliances for Centera, to get his views on the evolution of the product and its impact on the market.

Tom: Centera was initially released about two and a half years ago. At that time, the concept of content addressed storage, or CAS, was little known by most end users. Have market awareness and acceptance of CAS-based storage evolved at the pace and depth you would have hoped for?

Roy: We're definitely pleased with the progress we're making in establishing CAS as an architecture for archival storage. In 2002, we felt we were creating both the topology for CAS and the market for it. So, we knew we'd have some work to do to help people understand the value of CAS. Over the last two years, we've been getting strong indicators that partners and customers have come to agree with us. We recently announced that we've passed a key growth threshold: 1,000 customers. We also have more than 400 Centera partners. And, Centera integrates with more than 50 ECM applications, including Documentum. By the way, Documentum was a strong EMC vendor partner even before the acquisition. Other major ECM suites, such as FileNet and IBM Content Manager, are among the applications that integrate with Centera.

In addition to comprehensive content management suites from FileNet and IBM, which point products line up well with Centera? Some organizations looking to support their records management initiatives may not have or need a high-end ECM suite.

We understand that. E-mail archiving has turned out to be a particularly strong sector for Centera. All of the major e-mail archiving products are integrated with Centera. Many of these were natively integrated with Centera even before EMC's acquisition of LEGATO, which also has e-mail archiving capabilities.

Medical imaging is another important point-specific application for us. But, really, any application that generates or uses content — e-mail, voice mail, documents, images, and so on — is ripe for support from Centera.

What would a typical Centera customer have previously used to archive and access content it wanted to keep in unalterable form?

At the user level, customers were relying on what they still use: a content management application. At the infrastructure and storage management level, they were using conventional storage technologies: optical devices, disk arrays, or tape. And they typically applied only basic HSM [hierarchical storage management] software functionality. They would initially put content on their most expensive, high-performance media. Then as the content aged, they would move it to less expensive media — usually, moving it offline — and put up with lower performance in terms of retrieval. They often had to embed storage management functions at the content management application level instead of in the storage infrastructure, where those functions should reside.

With Centera, users no longer have to configure ways to push content through various storage topologies. They can now manage the placement of an object through a simple API between Centera and their content management app.

When end users discuss the storage management challenges they face, they often point to the sheer volume of incoming data. Is Centera designed to help alleviate the pain of dealing with data growth?

Customers tend to look at data growth as only a capacity issue. They worry about how many TBs are coming into the infrastructure and how much raw capacity will be needed to store all of those TBs. But, it's not just capacity they should be worrying about. It's also the number of objects they have to manage and the likelihood of retrieving those objects within a reasonable time period. Most file systems are limited in terms of how quickly they can deliver files once the file system reaches a certain size. The recovery times of most file systems begin to noticeably degrade when the system is storing more than a few million objects. With Centera, you can store billions of objects, and your recovery or read time will remain in the subsecond range.

What are some applications or business requirements that illustrate the ease and speed of retrieving objects from a CAS-based system?

A good example is check imaging. One of our financial services providers receives 100 million new check images each month. Its traditional approach was to package up these images into very large batches of information. One file might contain hundreds of check images. Having all those images in one file slows the search when a banking customer makes a request to see if a particular check has cleared. So, the ability to store each check image as a discrete object in Centera allows the provider to much more efficiently address customer service needs. One Centera customer, a large bank, charges customers $3 per month for the ability to view check images online. So, not only has this bank unlocked the value of stored objects as revenue generators, it has also reduced its distribution costs. The bank no longer sends out photocopies of checks.

E-mail archiving is another example. Some of our large customers are storing as many as a million e-mail messages each day. Because they're subject to SEC Rule 17a-4 records retention regulations, these organizations have to store e-mail for three years. So, pretty quickly, they're on the way to accumulating billions of stored e-mail messages. If the organization is audited or involved in litigation, the request won't be to produce all stored e-mail. The request will be to deliver all e-mail from within a particular slice of time or all e-mail associated with a certain keyword. The ability to identify and retrieve e-mail messages stored as individual objects will become crucial during an audit or during the discovery phase of a lawsuit.
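The volume figures in this answer can be checked with quick arithmetic. The sketch below uses only the two numbers quoted (a million messages a day, three years of retention) and assumes a 365-day year; the function name is illustrative.

```python
MESSAGES_PER_DAY = 1_000_000  # "as many as a million" per day, from the interview
RETENTION_YEARS = 3           # retention period cited in the interview
DAYS_PER_YEAR = 365           # assumption: leap days ignored


def retained_messages(per_day: int, years: int, days_per_year: int = DAYS_PER_YEAR) -> int:
    """Messages held at steady state once the retention window is full."""
    return per_day * years * days_per_year


total = retained_messages(MESSAGES_PER_DAY, RETENTION_YEARS)
print(f"{total:,} messages retained")  # prints "1,095,000,000 messages retained"
```

At that rate a single organization passes one billion retained messages within the three-year window, so request-driven retrieval has to work against an object count of that order.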

So, it sounds as if compliance is a key driver for adoption of Centera. Hence, we've seen the release of the Compliance Edition of Centera.

Yes, compliance is certainly a market for us, but that's not the only reason customers are deploying Centera as their enterprise archiving solution. Some aren't faced with strict regulatory compliance requirements, but they still like the retention protection intrinsic to Centera.

To manage the review, approval, and business integration of key enterprise content, customers often need to push that content through applications for BPM (business process management) and workflow. Does Centera integrate with those processes?

Centera can augment a BPM or workflow strategy by serving as the real-time storage platform for an application that manages those processes. Centera doesn't set the processes, but it can enforce the processes. When a document has gone through its audit and needs to be locked down for its retention period, Centera will accept the object, retain it, and secure it until the retention period expires.
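The accept, retain, and secure behavior described here can be illustrated with a small sketch. It is a toy model under stated assumptions, not Centera's API: the class names, the `delete` method, and the rule that deletion is refused until the retention date are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


class RetentionError(Exception):
    """Raised when a delete is attempted before retention expires."""


class RetainedObject:
    """Hypothetical fixed-content object with a retention period."""

    def __init__(self, content: bytes, stored_at: datetime, retention: timedelta):
        self._content = content
        self.retain_until = stored_at + retention
        self.deleted = False

    def read(self) -> bytes:
        # No write method exists: the content is fixed once stored.
        return self._content

    def delete(self, now: datetime) -> None:
        # The store, not the calling application, enforces the lock.
        if now < self.retain_until:
            raise RetentionError(f"retained until {self.retain_until.isoformat()}")
        self.deleted = True


stored = datetime(2004, 12, 9, tzinfo=timezone.utc)
letter = RetainedObject(b"audited document", stored, timedelta(days=3 * 365))

try:
    letter.delete(now=stored + timedelta(days=30))
except RetentionError as err:
    print("delete refused:", err)
```

The point of the design is the division of labor in the answer above: the workflow application decides when a document is final and how long it must be kept, and the storage layer refuses anything that would violate that decision.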

Another key customer need is disaster recovery. Would customers want to consider Centera as part of a remote replication strategy?

About one in three customers buys two Centera units and replicates objects remotely from one to the other using low-cost IP [Internet Protocol] connections. They realize that, while backing up 10 TB may not be that difficult, quickly recovering 10 TB is next to impossible. With Centera remote replication, failover is basically through a remote IP address, so you have immediate access to content even if you have an outage on your primary Centera.

Interestingly, customers are starting to rethink basic distinctions between backup and archiving. Customers are seeing that backup copies are typically overwritten and have a limited life cycle, perhaps only days or weeks. By contrast, archives house primary copies that are maintained for a reason, often for long periods of time, perhaps decades. Many customers need fast access to that information. So, the traditional method of moving data to optical discs or vaulting it in offsite tape storage doesn't address those customers' needs. They need rapid access to online content at a reasonable cost.

Which types of content are you seeing customers rely on Centera to store?

We've seen just about any type of content you can imagine: e-mail, medical images, electronic patient records, check images, mortgage applications, insurance applications, voice mail, rich media files, instant messages, audio files, satellite imagery, manufacturing design documents, repair documents, pharmaceutical information, versions of software … you name it.

It's interesting to see how customers' use of Centera evolves. More than 75% of our customers initially deploy Centera to replace tape or optical storage. They see the value of being able to access content in real time. But, when they come to understand that Centera is actually an enterprise archive infrastructure, not just a storage device, they start migrating additional information to it. We're seeing an uptake in customers taking non-digitized content, scanning it, and putting it on Centera. Doing that enables their records managers to be more efficient.

Speaking of the convergence of content management and storage, is there anything else you'd like to say about the relationship between Documentum and Centera?

As I mentioned, the relationship predates EMC's acquisition of Documentum. So, we've always been working on greater and more detailed integration between Centera and Documentum at the content services layer. But, as I also mentioned earlier, we work to integrate Centera with lots of other solutions. In fact, we take requests from customers to put their planned solutions through our certification process. We label a certified solution CARS (content archive retrieval system). That means we've put it through a full, integrated test of all the elements. The CARS solution for Documentum is one such proven solution. The benefit for customers is they don't have to try untested solutions in their production environments or use their own labor and resources to pretest solutions. We test the solutions in our labs and publish the certified architecture for the customer. That lowers customers' costs and decreases the time spent on implementation. This is not a casual exercise. EMC has invested more than $2 billion in the last few years on interoperability testing of solutions.


Tom von Gunden is chief editor of ECM Connection and Data Storage Connection.