TERAGRID

Transforming Research with High-Performance Grid Computing 

NATIONWIDE COMPUTING FABRIC
INFORMATION TECHNOLOGY TRIATHLON
ENABLING NEXT-GENERATION SCIENCE

Simulations on high-performance computers allow molecular biologists to discover how proteins work and predict how drugs might be designed to target diseases. Until recently, the most advanced simulations were limited to proteins of, at most, 50,000 atoms. Today, thanks to advances in algorithms and NPACI's Blue Horizon, biochemists and mathematicians at UCSD have performed calculations on cellular structures with more than 1 million atoms. In the near future, these researchers may be able to examine structures consisting of tens of millions of atoms and begin to understand the fundamental forces that drive cellular functions. Such revolutionary advances across the physical and life sciences will be possible with the installation of the TeraGrid.
 
Figure 1. Layout of the Distributed Terascale Facility

While the nodes of the DTF are geographically distributed, the integrated hardware will make the system behave as one colossal computing machine.

In early August, the National Science Foundation (NSF) awarded $53 million to four U.S. research institutions to build and deploy a distributed terascale facility (DTF). The DTF will be the largest, most comprehensive infrastructure ever deployed for scientific research. It will compute at more than 13.6 teraflops (trillions of calculations per second) while simultaneously managing more than 650 terabytes (trillions of bytes) of data. This awesome, data-intensive computing power will be connected by a cross-country fiber-optic backbone 16 times faster than today's fastest research networks. All of these components will be tightly integrated into an information infrastructure dubbed the TeraGrid.

"Breakthrough discoveries in fields from genomics to astronomy depend critically on computational and data management infrastructure as a first-class scientific tool," said Fran Berman, director of NPACI and SDSC and one of the two principal investigators of the TeraGrid award (see story, p. 1). "The TeraGrid recognizes the increasing importance of data-oriented computing and connection of data archives, remote instruments, computational sites, and visualization over high-speed networks. The TeraGrid will be a far more powerful and flexible scientific tool than any single supercomputing system." 

NATIONWIDE COMPUTING FABRIC

The four research institutions in the DTF project are SDSC, the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign, Caltech, and Argonne National Laboratory. Each institution has played and will continue to play a key role in the NSF's Partnerships for Advanced Computational Infrastructure (PACI) program. This program is charged with meeting the expanding needs of the U.S. academic community for high-end information technologies. SDSC is the leading-edge site for NPACI, and Caltech is a key NPACI partner. NCSA leads the National Computational Science Alliance (Alliance), and Argonne is a major Alliance partner.

The partnership expects to work primarily with IBM, Intel Corporation, and Qwest Communications to build the facility, along with Sun Microsystems, Myricom, and Oracle Corporation. "Nothing like the DTF has ever been attempted before. This will be the largest, most comprehensive infrastructure ever deployed for open scientific research," said Dan Reed, director of NCSA and the Alliance and a principal investigator of the TeraGrid award. "Unprecedented amounts of data are being generated by new observatories and sensors, and groups of scientists are conducting new simulations of increasingly complex phenomena. This new age of science requires a sustainable national infrastructure that can bring together new tools, powerful computers, and the best minds in the country. This is the national infrastructure that will allow us to solve the most pressing scientific problems of our time."

The DTF will consist primarily of clustered IBM servers based on Intel Itanium-family processors connected with Myricom's Myrinet. Linux clusters purchased through the DTF award and distributed across the four DTF sites will total 11.6 teraflops of computing power. In addition, two 1-teraflops Linux cluster systems already in use at NCSA will be integrated into the DTF system, creating the 13.6-teraflops system, the most powerful distributed computing system ever. Besides the world's fastest unclassified supercomputers, the DTF's hardware and software will include ultra-high-speed networks, high-resolution visualization environments, and toolkits for grid computing. Scientists and industry researchers across the country will be able to tap into this infrastructure to solve scientific problems.
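A back-of-envelope check of the aggregate figure quoted above; this is a sketch using only the numbers stated in the article, not an official accounting of the system:

```python
# Tally of DTF peak compute, using the figures quoted in the article.
dtf_funded_tflops = 11.6        # new Linux clusters across the four DTF sites
existing_ncsa_tflops = 2 * 1.0  # two 1-teraflops clusters already at NCSA

total_tflops = dtf_funded_tflops + existing_ncsa_tflops
print(f"Aggregate peak: {round(total_tflops, 1)} teraflops")

# At 13.6 teraflops, one second of peak work is about 1.36e13
# floating-point operations.
ops_per_second = total_tflops * 1e12
```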

"The distributed terascale facility will be a tremendous national resource," said NSF Director Rita Colwell. "With this innovative facility, NSF will demonstrate a whole new range of capabilities for computer science and fundamental scientific and engineering research, setting high standards for 21st Century deployment of information technology."

The clusters will operate as a single distributed facility, linked via a dedicated optical network that will initially operate at 40 gigabits per second and later be upgraded to 50-80 gigabits per second. The DTF network, developed in partnership with Qwest, will transport data 16 times faster than the fastest research networks now in operation. It will connect to Abilene, the high-performance network that links more than 180 research institutions across the country; STAR TAP, an interconnect point in Chicago that provides access to and from international research networks; and CENIC's CalREN-2, an advanced high-speed network that connects institutions in California. In Illinois, the I-WIRE optical network will provide the DTF with network capacity and will give Argonne and NCSA additional bandwidth for related network-research initiatives.
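The bandwidth figures above imply a few useful numbers. In the quick sketch below, the 2.5 Gb/s baseline is derived from the "16 times faster" claim rather than stated directly in the article, and the transfer time ignores protocol overhead:

```python
dtf_gbps = 40.0                     # initial DTF backbone rate, gigabits/s
speedup = 16                        # quoted factor over today's fastest research nets
baseline_gbps = dtf_gbps / speedup  # implied rate of current research networks
print(f"Implied baseline: {baseline_gbps} Gb/s")  # 2.5

# Time to move one terabyte (8e12 bits) across the 40 Gb/s backbone,
# assuming the full link rate with no protocol overhead.
seconds_per_tb = 8e12 / (dtf_gbps * 1e9)
print(f"1 TB transfer: {seconds_per_tb:.0f} s")  # 200
```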

INFORMATION TECHNOLOGY TRIATHLON

The DTF architecture demonstrates that the TeraGrid has been designed with much more than sheer computing performance in mind. High-performance systems have traditionally been designed for the computing equivalent of a 100-meter sprint: the more flops (floating-point operations per second), the better. But the TeraGrid is targeting an information technology triathlon: huge amounts of online data storage and network bandwidth as well as speedy computing performance.

To ensure that the DTF achieves its full potential, each of the four sites will play a unique role in the project. SDSC will lead the TeraGrid data and knowledge management effort by deploying a data-intensive IBM Linux cluster based on Intel Itanium-family processors. This system will have a peak performance of just over 4 teraflops and 225 terabytes of network disk storage. In addition, a next-generation Sun Microsystems high-end server will provide a gateway to grid-distributed data for data-oriented applications.

NCSA will lead the TeraGrid project's computational aspects with an IBM Linux cluster powered by the next generation of Intel Itanium processors, code-named McKinley. The cluster's peak performance will be 8 teraflops, combining the DTF-funded systems and other NCSA clusters, with 240 terabytes of secondary storage.

Caltech will focus on providing online access to very large scientific data collections and will facilitate access to those data by connecting data-intensive applications to components of the TeraGrid. Caltech will deploy a 0.4-teraflops IBM Itanium-family processor cluster and an IA-32 cluster that will manage 86 terabytes of online storage.

"An exciting prospect for the TeraGrid is that, by integrating simulation and modeling capabilities with collection and analysis of huge scientific databases, it will create a computing environment that unifies the research methodologies of theory, experiment, and simulation," said Paul Messina, director of Caltech's Center for Advanced Computing Research and a TeraGrid co-principal investigator.

Argonne will lead the effort to deploy advanced distributed computing software, high-resolution rendering and remote visualization capabilities, and networks. This effort will require a 1-teraflops IBM Linux cluster with parallel visualization hardware.
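The per-site roles described above can be summarized in one place. The sketch below uses only the figures quoted per site; Argonne's online-storage capacity is not given in the article, so it is omitted, and the sums therefore come in under the aggregate totals cited earlier, which include resources not broken out here:

```python
# Per-site DTF resources as quoted in the article (peak teraflops, online TB).
sites = {
    "SDSC":    {"tflops": 4.0, "storage_tb": 225,  "role": "data and knowledge management"},
    "NCSA":    {"tflops": 8.0, "storage_tb": 240,  "role": "computation"},
    "Caltech": {"tflops": 0.4, "storage_tb": 86,   "role": "online scientific data collections"},
    "Argonne": {"tflops": 1.0, "storage_tb": None, "role": "grid software and visualization"},
}

quoted_tflops = sum(s["tflops"] for s in sites.values())
quoted_storage = sum(s["storage_tb"] for s in sites.values()
                     if s["storage_tb"] is not None)
print(f"Quoted per-site compute: {round(quoted_tflops, 1)} teraflops")
print(f"Quoted per-site storage: {quoted_storage} TB")
```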

ENABLING NEXT-GENERATION SCIENCE

"Supercomputing traditionally has been associated with weather and aircraft design," said Rick Stevens, director of the Mathematics and Computer Science Division at Argonne National Laboratory and TeraGrid co-principal investigator. "Recent breakthroughs in chemistry and the life sciences, however, have presented an even greater demand for advanced computation. If we are to achieve the performance necessary to support these new applications, we must develop capabilities to harness the collective power of not only dozens of supercomputers, but also thousands of individual PCs. The DTF will provide critical insight into building such systems, while immediately enabling new classes of science."

The TeraGrid will enable scientists to answer the next generation of science questions. Researchers will be able to swiftly look for meaningful relationships across scientific disciplines to gain powerful new insights into everything from human diseases and climate change, to earthquake prediction and the evolution of the universe. -DH 


 

Principal Investigators
Fran Berman
SDSC

Dan Reed
NCSA


Co-Investigators
Rick Stevens
Ian Foster
ANL

Paul Messina
Caltech