Robust Biobank, Through Storage Expansion

Tohoku University Tohoku Medical Megabank Organization (ToMMo)

“By bringing in new technologies we didn’t have in the beginning – AI, GPUs, next-generation CPUs – we have boosted our analytical capacity and our sample sizes, giving us a major leg up in terms of the accuracy of our data analyses. Therefore, by being able to deal with a much larger dataset, methods that had previously only existed in theory have now become viable for our use.”

Professor Kengo Kinoshita, Ph.D. (Science), Graduate School of Information Sciences, Tohoku University

Tohoku University Tohoku Medical Megabank Organization (ToMMo) was established in February 2012 to begin building an advanced medical system. Created at the heart of the area hit by the 2011 Great East Japan Earthquake and Tsunami, it also sought to help rebuild that area. Its supercomputer system, launched in July 2014, underpins a significant part of its operations and has produced many results. The system consolidates analytical data about biospecimens, including health survey and genome sequence data; that data can then be shared with researchers across Japan through a registration and review process. ToMMo is also developing special educational programs in leading-edge medical fields such as genetic research.

The number of whole-genome sequences from samples supplied by residents of the disaster area is expected to reach around 5,000 by the end of 2019, and the total number of participants in the organization’s cohort studies has reached 150,000. These data are used not only by ToMMo but also by a large number of outside researchers to accelerate and advance many research projects. With the support of the Japan Agency for Medical Research and Development, the system was overhauled in 2018. The overhaul improved the organization’s international competitiveness and lets ToMMo share data and computational and analytical functionality with researchers and research organizations that struggle to secure such resources.

The Challenge

  • Keeping costs down while updating a system under a tight budget.
  • Enabling external access for data analysis and sharing, and building a foundation for a national-level initiative across all of Japan.
  • Meeting increased demand for the system’s simulation capability and for more storage capacity.
  • Moving large amounts of data without system outages.

The Solution

  • DDN ES14KX, SFA12KX, SFA7700X, EXAScaler, GRIDScaler
  • 29PB Total

The Benefits

“When a child born today falls ill fifty years from now, being able to trace their entire medical history would be an incredible achievement. I want to build a system where the data from when that person was three years old can be called up in an instant. That vision is what led us to expand our total data storage pool to 29 petabytes.”

Professor Kengo Kinoshita, Ph.D.

  • Enabled a system overhaul under a limited budget, combining InfiniBand and 40/10 Gbps Ethernet with a DDN Storage solution to improve performance and build capabilities suited to the different needs of each of the three computational units.
  • Greatly expanded storage capacity to 29 PB, improving data biobank functionality and supporting Japan’s genomic medicine research, all while limiting cost outlays.
  • DDN Storage is connected to an NVIDIA DGX-1 GPU-based analysis server running Parabricks (genomic analysis software); the combination of these technologies results in superior performance.
  • Smoothly migrated an enormous amount of existing data (approx. 6 PB) to the new system within only two days without interrupting research activities.
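
To give a sense of scale for the migration figure above, the following is a rough back-of-the-envelope sketch of the average transfer rate that moving about 6 PB in two days implies. It assumes decimal petabytes (1 PB = 10^15 bytes) and a transfer running continuously for the full 48 hours; neither assumption comes from the interviews, and the function name is purely illustrative.

```python
def required_throughput_gb_per_s(data_pb: float, days: float) -> float:
    """Average rate in GB/s needed to move `data_pb` petabytes in `days` days.

    Assumes decimal units (1 PB = 1e15 bytes, 1 GB = 1e9 bytes) and a
    transfer that runs without pause for the whole period.
    """
    total_bytes = data_pb * 1e15
    seconds = days * 24 * 60 * 60
    return total_bytes / seconds / 1e9


rate = required_throughput_gb_per_s(6, 2)
print(f"{rate:.1f} GB/s (about {rate * 8:.0f} Gbit/s)")
```

Under these assumptions the sustained average works out to roughly 35 GB/s; any pauses or verification passes during the actual migration would push the required peak rate higher.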

[Figure: ToMMo Diagram]

*This article was drafted based on interviews conducted at the Tohoku University Tohoku Medical Megabank Organization on January 31, 2019.