NATIONWIDE COMPUTING FABRIC
INFORMATION TECHNOLOGY TRIATHLON
ENABLING NEXT-GENERATION SCIENCE
Simulations
on high-performance computers allow molecular biologists to discover how
proteins work and predict how drugs might be designed to target diseases.
Until recently, the most advanced simulations were limited to proteins
of, at most, 50,000 atoms. Today, thanks to advances in algorithms and
NPACI's Blue Horizon, biochemists and mathematicians at UCSD have performed
calculations on cellular structures with more than 1 million atoms. In
the near future, these researchers may be able to examine structures consisting
of tens of millions of atoms and begin to understand the fundamental forces
that drive cellular functions. Such revolutionary advances across the physical
and life sciences will be possible with the installation of the TeraGrid.
Figure 1. Layout of the Distributed Terascale Facility
While the nodes of the DTF are geographically distributed, the integrated
hardware will make the system behave as one colossal computing machine.
In early August, the National
Science Foundation (NSF) awarded $53 million to four U.S. research institutions
to build and deploy a distributed terascale facility (DTF). The DTF will
be the largest, most comprehensive infrastructure ever deployed for scientific
research. It will compute at more than 13.6 teraflops (trillions of calculations
per second) while simultaneously managing more than 650 terabytes (trillions
of bytes) of data. This awesome, data-intensive computing power will be
connected by a cross-country fiber-optic backbone 16 times faster than
today's fastest research networks. All of these components will be tightly
integrated into an information infrastructure dubbed the TeraGrid.
"Breakthrough discoveries
in fields from genomics to astronomy depend critically on computational
and data management infrastructure as a first-class scientific tool," said
Fran Berman, director of NPACI and SDSC and one of the two principal investigators
of the TeraGrid award (see story, p. 1). "The TeraGrid recognizes the increasing
importance of data-oriented computing and connection of data archives,
remote instruments, computational sites, and visualization over high-speed
networks. The TeraGrid will be a far more powerful and flexible scientific
tool than any single supercomputing system."
NATIONWIDE COMPUTING FABRIC
The four research institutions
in the DTF project are SDSC, the National Center for Supercomputing Applications
(NCSA) at the University of Illinois at Urbana-Champaign, Caltech, and
Argonne National Laboratory. Each institution has played and will continue
to play a key role in the NSF's Partnerships for Advanced Computational
Infrastructure (PACI) program. This program is charged with meeting the
expanding needs of the U.S. academic community for high-end information
technologies. SDSC is the leading-edge site for NPACI, and Caltech is a
key NPACI partner. NCSA leads the National Computational Science Alliance
(Alliance), and Argonne is a major Alliance partner.
The partnership expects to
work primarily with IBM, Intel Corporation, and Qwest Communications to
build the facility, along with Sun Microsystems, Myricom, and Oracle Corporation.
"Nothing like the DTF has ever been attempted before. This will be the
largest, most comprehensive infrastructure ever deployed for open scientific
research," said Dan Reed, director of NCSA and the Alliance and a principal
investigator of the TeraGrid award. "Unprecedented amounts of data are
being generated by new observatories and sensors, and groups of scientists
are conducting new simulations of increasingly complex phenomena. This
new age of science requires a sustainable national infrastructure that
can bring together new tools, powerful computers, and the best minds in
the country. This is the national infrastructure that will allow us to
solve the most pressing scientific problems of our time."
The DTF will consist primarily
of clustered IBM servers based on Intel Itanium-family processors connected
with Myricom's Myrinet. Linux clusters purchased through the DTF award
and distributed across the four DTF sites will total 11.6 teraflops of
computing power. In addition, two 1-teraflops Linux cluster systems already
in use at NCSA will be integrated into the DTF system, creating the 13.6-teraflops
system, the most powerful distributed computing system ever. Besides the
world's fastest unclassified supercomputers, the DTF's hardware and software
will include ultra-high-speed networks, high-resolution visualization environments,
and toolkits for grid computing. Scientists and industry researchers across
the country will be able to tap into this infrastructure to solve scientific
problems.
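As a quick check on the figures quoted above, the aggregate peak can be tallied in a few lines of Python. This is only an illustrative sketch: the constant and function names are hypothetical, and the numbers are simply the ones cited in this article.

```python
# Peak-performance figures quoted in the article, in teraflops.
NEW_DTF_CLUSTERS_TFLOPS = 11.6               # Linux clusters purchased through the DTF award
EXISTING_NCSA_CLUSTERS_TFLOPS = [1.0, 1.0]   # two 1-teraflops clusters already in use at NCSA


def aggregate_peak_tflops(new_tflops, existing_tflops):
    """Add the new clusters' peak to the existing clusters' peaks.

    The result is rounded to one decimal place, matching the precision
    of the figures quoted in the article.
    """
    return round(new_tflops + sum(existing_tflops), 1)


total = aggregate_peak_tflops(NEW_DTF_CLUSTERS_TFLOPS, EXISTING_NCSA_CLUSTERS_TFLOPS)
print(f"Aggregate peak: {total} teraflops")
```

The tally reproduces the 13.6-teraflops total: 11.6 teraflops of new clusters plus the two existing 1-teraflops systems.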
"The distributed terascale
facility will be a tremendous national resource," said NSF Director Rita
Colwell. "With this innovative facility, NSF will demonstrate a whole new
range of capabilities for computer science and fundamental scientific and
engineering research, setting high standards for 21st Century deployment
of information technology."
The clusters will operate
as a single distributed facility, linked via a dedicated optical network
that will initially operate at 40 gigabits per second and later be upgraded
to 50-80 gigabits per second. The DTF network, developed in partnership
with Qwest, will transport data 16 times faster than the fastest research
networks now in operation. It will connect to Abilene, the high-performance
network that links more than 180 research institutions across the country;
STAR TAP, an interconnect point in Chicago that provides access to and
from international research networks; and CENIC's CalREN-2, an advanced
high-speed network that connects institutions in California. In Illinois,
the I-WIRE optical network will provide the DTF with network capacity and will
give Argonne and NCSA additional bandwidth for related network-research
initiatives.
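To put the network figures in perspective, the short Python sketch below works out what they imply. It rests on assumptions not stated in the article: a terabyte is taken as 10^12 bytes, and the link is treated as ideal, with no protocol overhead or contention. The function names are hypothetical.

```python
DTF_LINK_GBPS = 40.0          # initial DTF backbone rate quoted in the article
SPEEDUP_OVER_BASELINE = 16.0  # "16 times faster than the fastest research networks"


def implied_baseline_gbps(link_gbps, speedup):
    """Rate of the comparison network implied by the quoted speedup."""
    return link_gbps / speedup


def transfer_seconds(terabytes, link_gbps):
    """Idealized time to move a data set across a link.

    Assumes 1 terabyte = 1e12 bytes and a fully utilized link with
    no protocol overhead (both are simplifying assumptions).
    """
    bits = terabytes * 1e12 * 8
    return bits / (link_gbps * 1e9)


baseline = implied_baseline_gbps(DTF_LINK_GBPS, SPEEDUP_OVER_BASELINE)
print(f"Implied baseline network rate: {baseline} Gb/s")
print(f"1 TB over the DTF backbone: {transfer_seconds(1, DTF_LINK_GBPS):.0f} s")
print(f"1 TB over the baseline network: {transfer_seconds(1, baseline):.0f} s")
```

Under these assumptions, the quoted 16-fold speedup implies a comparison network of 2.5 gigabits per second, and a one-terabyte data set that would take 3,200 seconds to move on that network would cross the DTF backbone in about 200 seconds.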
INFORMATION TECHNOLOGY TRIATHLON
The DTF architecture demonstrates
that the TeraGrid has been designed with much more than sheer computing
performance in mind. High-performance systems have traditionally been designed
for the computing equivalent of a 100-meter sprint: the more flops (floating-point
operations per second), the better. But the TeraGrid is targeting an information
technology triathlon: huge amounts of online data storage and network bandwidth
as well as speedy computing performance.
To ensure that the DTF achieves
its full potential, each of the four sites will play a unique role in the
project. SDSC will lead the TeraGrid data and knowledge management effort
by deploying a data-intensive IBM Linux cluster based on Intel Itanium-family
processors. This system will have a peak performance of just over 4 teraflops
and will include 225 terabytes of network disk storage. In addition, a next-generation
Sun Microsystems high-end server will provide a gateway to grid-distributed
data for data-oriented applications.
NCSA will lead the TeraGrid
project's computational aspects with an IBM Linux cluster powered by the
next generation of Intel Itanium processors, code-named McKinley. Combining
the DTF-funded systems with other NCSA clusters, the installation will reach a
peak performance of 8 teraflops and provide 240 terabytes of secondary storage.
Caltech will focus on providing
online access to very large scientific data collections and will facilitate
access to those data by connecting data-intensive applications to components
of the TeraGrid. Caltech will deploy a 0.4-teraflops IBM Itanium-family
processor cluster and an IA-32 cluster that will manage 86 terabytes of
online storage.
"An exciting prospect for
the TeraGrid is that, by integrating simulation and modeling capabilities
with collection and analysis of huge scientific databases, it will create
a computing environment that unifies the research methodologies of theory,
experiment, and simulation," said Paul Messina, director of Caltech's Center
for Advanced Computing Research and a TeraGrid co-principal investigator.
Argonne will lead the effort
to deploy advanced distributed computing software, high-resolution rendering
and remote visualization capabilities, and networks. This effort will require
a 1-teraflops IBM Linux cluster with parallel visualization hardware.
ENABLING NEXT-GENERATION SCIENCE
"Supercomputing traditionally
has been associated with weather and aircraft design," said Rick Stevens,
director of the Mathematics and Computer Science Division at Argonne National
Laboratory and TeraGrid co-principal investigator. "Recent breakthroughs
in chemistry and the life sciences, however, have presented an even greater
demand for advanced computation. If we are to achieve the performance necessary
to support these new applications, we must develop capabilities to harness
the collective power of not only dozens of supercomputers, but also thousands
of individual PCs. The DTF will provide critical insight into building
such systems, while immediately enabling new classes of science."
The TeraGrid will enable scientists
to answer the next generation of science questions. Researchers will be
able to swiftly look for meaningful relationships across scientific disciplines
to gain powerful new insights into everything from human diseases and climate
change, to earthquake prediction and the evolution of the universe. -DH
Principal Investigators
Fran Berman, SDSC
Dan Reed, NCSA
Co-Investigators
Rick Stevens and Ian Foster, ANL
Paul Messina, Caltech