iArchives

The Challenge

Finding a storage solution that could consolidate 25 different file systems on a single array, scale to massive capacity for image archiving, and deliver high-throughput writes to simultaneously support multi-core, multi-threaded image processing and the ingest of new images from multiple scanners.

Application

iArchives converts microfilm and original print content into searchable, digitized online databases using high-speed scanners paired with proprietary image manipulation and optical character recognition (OCR) techniques.

Solution

A DDN S2A data infrastructure platform provided a cost-effective, high-capacity, consolidated extreme storage pool. With Consistently Predictable Performance™ (full write performance equaling system read performance, with guaranteed QoS), image capturing and processing operations were accelerated. In addition, the S2A finally made it possible to use cost-effective SATA drives with high reliability.

 

About iArchives

iArchives’ vision is to be the world leader in transforming microfilm and other print content into searchable, digitized, online databases. iArchives provides innovative technology and processes that substantially reduce the cost and time it takes to archive documents while enhancing the user’s experience in exploring those documents. The company’s state-of-the-art software converts print or microfilm-based content into a customized database searchable over the Internet or an intranet.

Customers such as the Dallas Morning News, Brigham Young University, the Library of Congress, the National Archives and Records Administration, the Church of Jesus Christ of Latter-day Saints, and the University of Utah rely on iArchives to transform massive amounts of historical documents, records, and other content into digital archives that appear in their original context with the added benefit of being searchable. iArchives also hosts archived content through Footnote.com, its consumer website.

 

Efficient Digitization Requires Extreme Storage

The company’s challenge was to consolidate 25 different file systems and associated storage into a single, common storage pool that could rapidly ingest scanned content while simultaneously providing high-speed access to technicians using custom software that implements proprietary OCR (Optical Character Recognition) techniques and unique algorithms to scan, de-skew (or auto-straighten), crop, clean up, enhance, and index images.

Because image capture and processing occur concurrently, they place extreme demands on the storage system. For ingesting content, iArchives uses five microfilm scanners and one microfiche scanner that run around the clock, seven days a week, streaming images at 100MB/s each. In addition, the company regularly receives drives ranging in size from 750GB to 1TB from partner companies that do their own scanning but want iArchives to index the content. iArchives operates a compute cluster of more than 250 multi-threaded CPU cores, resulting in 400 or more threads hitting the storage array at any one time for processing. After processing, each large original TIFF image produces five derivative images, plus associated XML metadata files.
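To put the scanner figures in perspective, the short Python sketch below works out the combined ingest rate when all six scanners stream at once, and how many image files each original TIFF leaves behind. It is only back-of-the-envelope arithmetic on the numbers quoted above; the function and constant names are illustrative, and the XML metadata files are left out because the text does not say how many are written per image.

```python
# Back-of-the-envelope load estimate from the figures quoted in the text.
# All names here are illustrative; they do not come from iArchives or DDN.

MICROFILM_SCANNERS = 5       # from the text
MICROFICHE_SCANNERS = 1      # from the text
MB_PER_S_PER_SCANNER = 100   # from the text: 100MB/s each
DERIVATIVES_PER_TIFF = 5     # from the text: five derivative images


def aggregate_ingest_mb_per_s(
    scanners: int = MICROFILM_SCANNERS + MICROFICHE_SCANNERS,
    rate_mb_per_s: int = MB_PER_S_PER_SCANNER,
) -> int:
    """Combined write rate if every scanner streams at its full rate."""
    return scanners * rate_mb_per_s


def image_files_per_original(derivatives: int = DERIVATIVES_PER_TIFF) -> int:
    """Original TIFF plus its derivatives (XML metadata files not counted)."""
    return 1 + derivatives


if __name__ == "__main__":
    print(aggregate_ingest_mb_per_s())   # 600
    print(image_files_per_original())    # 6
```

In other words, the scanners alone can present up to 600MB/s of write traffic, before any of the processing threads touch the array.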

With 25 separate file systems, scanned data was stored on whichever file system had the most available space, scattering data across the different file systems and making it very difficult to locate, manage, and process the images. In addition, the cluster was often underutilized because the storage couldn’t support simultaneous processing by all the nodes.

"I've never seen an array as fast as the DataDirect Networks array!"

- Daniel Leaberry, Senior Systems Administrator, iArchives

"Our primary issue was not bandwidth for the scanners, but rather providing a storage system that could be responsive when every processing node hits the storage array and expects an immediate response."

He continued, "We were originally going to use generic 'white box' storage servers, but realized that they would be hard to support. We evaluated a number of storage solutions, but we were concerned about the complexity of implementing a failover solution with one product we considered."

"The Lustre file system community widely uses DataDirect Networks’ storage, so based on that and other positive research, we decided to invest in a DataDirect Networks solution, which was still competitive on price. I’m very happy with our decision."

 

The DDN Extreme Storage Solution

iArchives’ storage foundation is anchored by a DDN S2A data infrastructure platform. The S2A platform manages 86TB of cost-effective SATA drives and uses native InfiniBand interconnects paired with the Lustre parallel file system.

The S2A incorporates DDN’s SATAssure software. This intelligent SATA drive management software is unique in its ability to make large pools of SATA drives as safe and reliable as more expensive Fibre Channel drives by detecting and correcting silent data corruption without any impact on performance or capacity. The result is a substantial increase in uptime, a reduction in drive rebuilds, and ensured data integrity. SATAssure software was an important consideration in the iArchives centralized storage configuration, largely because it allowed the use of lower-cost, larger-capacity drives without sacrificing reliability or data availability.

The S2A’s performance has impressed Leaberry and iArchives’ technicians by delivering sustained high-throughput reads and writes as multiple systems access content for processing while new content is being ingested. Additionally, the system’s fault-tolerant architecture with inherent zero-time failover appealed to the team for hassle-free reliability.

"DataDirect Networks has such a fantastic array. We’re not yet using more than a quarter of its performance capabilities. My big file benchmark test yielded over one gigabyte per second throughput through the file system, and that’s on just a singlet," Leaberry enthused.

"The stability is marvelous. We’ve never had a controller crash or had any hiccups or failures at all. In the course of three months since we installed it, we’ve pushed at least 86TB through it and never had an issue."

Leaberry and his team liked DDN’s unique design approach, which integrates multiple zero-latency, full-access host ports, each able to access the entire pool of storage at full speed at the same time, while delivering solid reliability.

He continued, "We’re now adding new content at the rate of about 600GB per day, or nearly 20TB per month. The remarkable performance of the DataDirect Networks system has allowed us to dramatically increase the rate at which we are able to add new content, letting us complete projects for our customers much faster than before the DataDirect Networks implementation."
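As a quick check on the growth figures in the quote, the sketch below converts the daily ingest rate into a monthly total. It assumes decimal units (1TB = 1,000GB) and a 30- or 31-day month; neither assumption is stated in the source, and the function name is illustrative.

```python
# Convert the quoted daily ingest rate into a monthly total.
# Assumptions (not from the source): decimal units and a 30- or 31-day month.

GB_PER_DAY = 600  # from the quote: about 600GB per day


def monthly_ingest_tb(gb_per_day: float, days: int = 30,
                      gb_per_tb: int = 1000) -> float:
    """Total data added over `days` days, expressed in terabytes."""
    return gb_per_day * days / gb_per_tb


if __name__ == "__main__":
    print(monthly_ingest_tb(GB_PER_DAY))           # 18.0
    print(monthly_ingest_tb(GB_PER_DAY, days=31))  # 18.6
```

Under these assumptions the result is roughly 18 to 19TB per month, consistent with the "nearly 20TB" quoted.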

To keep pace with this growing need, and because the S2A platform has reduced his costs by half, Leaberry and his team are planning to add another DataDirect Networks S2A platform in the near future.

Leaberry summed it up, "I expected the DataDirect Networks solution to be much more expensive. I was pleasantly surprised at the price/performance ratio. In fact, I’ve been telling everyone I know that if they need a pool of storage in the neighborhood of 90TB or more, they should buy DataDirect Networks."

Snapshot

S2A Extreme Storage accelerates document digitization for iArchives and Footnote.com, transforming microfilm and print content into searchable, digitized, online databases.