
It is more important than ever for life science researchers to choose the right technology provider as they treat cancer, enhance lives with personalized medicine and enable sustainable food production. As a scientist, you want a technology partner that understands your organization’s demands and that lets you focus on research problems while it handles the exploding infrastructure demands that distract you from your primary objective.

At most life science organizations, data-driven research is grouped into three major categories: (a) genome sequencing, (b) modeling/simulation and (c) instrumented data analysis. Whether you are analyzing genomic data, developing pharmaceutical or agricultural products or performing life-saving clinical research, you are likely engaged in one of these three areas.

Genomics is the sequencing of DNA or RNA and the analysis of the resulting bioinformatics data. Modeling and simulation includes computational chemistry, molecular dynamics and computer-aided drug design. In instrumented data analysis, some clients combine genotypic with phenotypic data from different instruments and sources to test new hypotheses, and advances in sensor equipment with higher resolutions and frame rates generate massive amounts of data in a short time. Regardless of the process, analytics resources are struggling to keep up with all these data requirements.

Shared Life Science Technology Vision

DDN and IBM have a joint history of more than a decade of delivering high-performance computing (HPC) solutions, and both companies share a vision for HPC in the most demanding environments. The joint solution, which combines the IBM Spectrum Scale parallel file system with DDN’s GRIDScaler® family of high-performance appliances, supports production systems with more than 30PB of data in a single namespace, clusters of more than 10,000 nodes and throughput greater than 400GB/s in a single file system. Together, the two companies can help life science researchers working on groundbreaking projects deliver actionable results in far less time, with a focus on:

  • Enabling collaborative research by supporting the distributed value chain essential for collaboration and competitiveness
  • Improving data management by deploying an infrastructure able to manage exploding data volumes on storage with the appropriate cost/performance characteristics
  • Optimizing performance and containing costs by deploying a single infrastructure able to support both high-performance batch and Hadoop MapReduce-oriented life science workloads

Whether it’s pure research or handling clinical and patient data with maximum security, collaboration and encryption, GRIDScaler and IBM Spectrum Scale solutions can help life science researchers focus on science while DDN and IBM create infrastructure that scales.

Top Life Science Challenges

IBM and DDN staff recognize the challenges that life science organizations face. It has become increasingly difficult to process and store growing volumes of complex data and manage the accompanying infrastructure. From next-generation sequencing, biomedical imaging and electronic medical records to curating scientific literature and analyzing instrumentation data, we see clients struggling with the sheer volume, variety and velocity of data.

In fact, some organizations report that their scientists spend up to 80 percent of their time manually integrating data silos and less than 20 percent deriving insights from their analysis. Collaboration is also critical. Many organizations now require data sharing across international boundaries. In one client’s case, government regulations mandate highly secure and encrypted infrastructure when sharing data between companies, or between divisions of the same company.

Researchers Focus on Science, Not Infrastructure

Scientists care about application performance, not infrastructure. For example, at the Max Delbrück Center for Molecular Medicine in Berlin, researchers observed a roughly 7-times performance increase when running variant calling with the Genome Analysis Toolkit (GATK) on a DDN SFA12KE® with IBM Spectrum Scale, compared to an NFS server. The difference was dramatic: the job completed in 5 hours instead of 36, allowing researchers to speed their analysis and accelerate their research accordingly.
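For readers who want to check the arithmetic behind that comparison, here is a minimal sketch. It uses only the two run times quoted above (36 hours on the NFS server, 5 hours on the DDN and IBM system). The function and variable names are our own illustration and are not part of GATK or of any DDN or IBM tool.

```python
def speedup(baseline_hours: float, improved_hours: float) -> float:
    """Return how many times faster the improved run is than the baseline."""
    if baseline_hours <= 0 or improved_hours <= 0:
        raise ValueError("run times must be positive")
    return baseline_hours / improved_hours


def time_saved_fraction(baseline_hours: float, improved_hours: float) -> float:
    """Return the share of the baseline wall-clock time that was eliminated."""
    if baseline_hours <= 0 or improved_hours <= 0:
        raise ValueError("run times must be positive")
    return 1.0 - improved_hours / baseline_hours


# Run times quoted in the text; the variable names are illustrative only.
nfs_hours = 36.0
parallel_fs_hours = 5.0

factor = speedup(nfs_hours, parallel_fs_hours)
saved = time_saved_fraction(nfs_hours, parallel_fs_hours)

print(f"speedup: {factor:.1f}x")              # speedup: 7.2x
print(f"wall-clock time saved: {saved:.0%}")  # wall-clock time saved: 86%
```

In other words, the quoted run times work out to a 7.2-times speedup, or about 31 hours saved per job, which matches the 7-times figure reported by the researchers.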

Similarly, scientists at The Wellcome Trust Centre for Human Genetics at Oxford University found that DDN GRIDScaler and IBM Spectrum Scale helped the research team focus on their critical work instead of supporting open-source solutions. According to the Wellcome Trust Centre staff, because both DDN GRIDScaler and Spectrum Scale are well supported and documented, the team could concentrate on research rather than on IT storage infrastructure. We believe that clients need to fully appreciate the hidden costs of open-source alternatives, including the burden of providing support under critical discovery timelines.

Life Sciences Field Day

As the developer of IBM Spectrum Scale, which is part of the DDN GRIDScaler solution, IBM is co-sponsoring this year’s DDN Life Sciences Field Day, co-located with the popular Basel Life conference. You can register for the DDN Life Sciences Field Day here. Please join industry thought leaders for this one-day workshop, where they will share their experiences, concerns and plans regarding emerging trends and data challenges in life sciences research.

  • Peter Basmajian