Global Arrays
Extending the ACTS Toolkit with the Shared Memory Programming Model
Objective
The ACTS Toolkit project combines the strengths of multiple programming tools
and environments designed for the development of application codes that
address complex, multidisciplinary scientific and technical problems faced by
the DoE. The Global Arrays (GA) shared memory NUMA programming model has
proven effective in the development of scalable and efficient application
codes in several areas relevant to the DoE mission, such as quantum
chemistry, molecular dynamics, and thermochemistry, as well as in other
application domains.
The purpose of this project is to further extend the capabilities of the
Global Arrays package to address the needs of computational chemistry,
molecular dynamics, and ground water flow modeling applications, and to
integrate GA with other components of the ACTS toolkit. The primary new
capability of GA required by the applications is multidimensional global
arrays. In addition, the library has to be supported and optimized on new
hardware platforms as they become available.
Approach
Funding and Personnel
The project was started in FY'98 and received funding at the 0.75 FTE level.
The personnel are Jarek Nieplocha (PI), Joel Malard, and Jialin Ju (all PNNL).
Technical Approach
The GA toolkit traditionally offered a shared memory programming model in the
context of 2-dimensional distributed arrays. The core functionality was based
on the underlying one-sided communication capabilities of GA. These
capabilities were implemented in a platform-specific manner and tuned for the
communication patterns of distributed 2-D arrays.
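
As an illustration of the model described above, the following minimal sketch simulates a 2-D array whose rows are block-distributed over several "processes", with one-sided put and get operations on rectangular patches. It is a single-process toy written in plain Python, not the GA library or its API: the class `ToyGlobalArray`, its methods, and the row-block distribution are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGlobalArray:
    """Hypothetical 2-D array split into row bands, one band per simulated process."""
    rows: int
    cols: int
    nproc: int
    blocks: list = field(init=False)

    def __post_init__(self):
        # Assumed distribution: each process owns a contiguous band of rows.
        self.band = -(-self.rows // self.nproc)  # ceiling division
        self.blocks = []
        for p in range(self.nproc):
            lo = min(p * self.band, self.rows)
            hi = min(lo + self.band, self.rows)
            self.blocks.append([[0.0] * self.cols for _ in range(hi - lo)])

    def owner(self, i):
        """Return (process id, local row index) for global row i."""
        return i // self.band, i % self.band

    def put(self, ilo, ihi, jlo, jhi, patch):
        """One-sided write of patch into rows ilo..ihi, columns jlo..jhi (inclusive)."""
        width = jhi - jlo + 1
        for i in range(ilo, ihi + 1):
            row = patch[i - ilo]
            if len(row) != width:
                raise ValueError("patch row has the wrong length")
            p, li = self.owner(i)
            # The caller never needs to know which process holds the row.
            self.blocks[p][li][jlo:jhi + 1] = row

    def get(self, ilo, ihi, jlo, jhi):
        """One-sided read of rows ilo..ihi, columns jlo..jhi (inclusive)."""
        out = []
        for i in range(ilo, ihi + 1):
            p, li = self.owner(i)
            out.append(self.blocks[p][li][jlo:jhi + 1])
        return out


# A 6x4 array over 3 simulated processes; the patch spans all three owners.
ga = ToyGlobalArray(rows=6, cols=4, nproc=3)
ga.put(1, 4, 1, 2, [[1, 2], [3, 4], [5, 6], [7, 8]])
patch = ga.get(2, 3, 0, 3)
```

The point of the sketch is that the caller addresses the array by global indices only; locating the owning process is hidden inside `put` and `get`, which is the essence of the shared memory view over distributed data.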

Based on input received from existing applications and tools, and
anticipating future developments required by the SSI (primarily in the
combustion area), it became clear that the specificity of the one-sided
communication engine in GA had become an obstacle to the expandability of the
library. Therefore, a new portable, flexible, and efficient communication
library was created, able to support arrays of arbitrary dimension as well as
sparse data structures. This library is called ARMCI (Aggregate Remote Memory
Copy Interface), and it is optimized for the noncontiguous data transfers
that predominate in many scientific calculations. ARMCI is compatible with
MPI-1, but it differs in several key aspects from the MPI-2 one-sided
communication standard.
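
To make the notion of a noncontiguous transfer concrete, the sketch below copies a rectangular patch between two flat buffers by describing it with per-dimension strides and counts, so that the patch moves as a series of contiguous runs. This is a plain Python illustration of the general stride/count technique only; the function `strided_copy` and its argument layout are assumptions for this example and should not be read as ARMCI's interface.

```python
from itertools import product


def strided_copy(src, src_off, src_strides, dst, dst_off, dst_strides, counts):
    """Copy an n-D patch between two flat buffers (hypothetical helper).

    counts[0] is the length of each contiguous run, in elements.
    counts[1:] give the number of runs along each higher dimension.
    src_strides[k] and dst_strides[k] give the element distance between
    consecutive runs along dimension k + 1 in the respective buffer.
    """
    run = counts[0]
    # Enumerate every combination of run indices in the higher dimensions.
    for idx in product(*(range(c) for c in counts[1:])):
        s = src_off + sum(i * st for i, st in zip(idx, src_strides))
        d = dst_off + sum(i * st for i, st in zip(idx, dst_strides))
        dst[d:d + run] = src[s:s + run]


# A 4x6 row-major matrix stored flat; extract the 2x3 patch that starts at
# row 1, column 2 into a dense 2x3 buffer.
src = list(range(24))
dst = [0] * 6
strided_copy(src, 1 * 6 + 2, [6], dst, 0, [3], [3, 2])
```

Because the descriptor is just a list of strides and counts, the same routine handles patches of any dimension, which is the property that a 2-D-specific communication engine lacks.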
Changes in the Architecture of the Global Arrays Toolkit
The Global Array library
has been rearchitected to:
Interoperability
The activity in this area has been motivated by our understanding of
application needs. In particular, interoperability with PADRE and A++/P++ is
expected to be very important for combustion codes that require coupling of
grid-based CFD codes (A++/P++) with quantum chemistry codes (GA). In
addition, GA application developers have been asking for language support
for the GA shared memory programming model. We established a collaboration
with the PCRC HPspmd project, which also aims to provide language extensions
to Fortran, C++, and Java consistent with the GA model.
Performance and Portability
The performance and scalability of GA applications rely on the efficiency of
the one-sided communications in Global Arrays (and now in ARMCI). To achieve
the best possible performance, a significant component of this project has to
be devoted to platform-specific performance tuning, especially on MPP
distributed-memory platforms. We established a close collaboration with IBM
and have been able to influence the design of their one-sided communication
subsystem (LAPI) on the IBM SP computer.
Recent Accomplishments
New capabilities
Applications
Both the number of applications and the number of application areas have
grown since this project was started. An example of a new application area is
the processing of electron microscopy data (University of Houston).
During the last year the package was downloaded by developers of applications
in other new areas as well (astrophysics, signal processing, DoD
applications), but it is too early to determine whether it will actually be
useful for their purposes.
Recent improvements in the performance of the GA locking operations have
greatly improved the scalability of the COLUMBUS MRCI code, which had
previously been parallelized with GA. The new scalability curve for this code
on the IBM SP is shown here.
Publications
- J. Nieplocha, B. Carpenter, "ARMCI: A Portable Remote Memory Copy Library
  for Distributed Array Libraries and Compiler Run-time Systems", to appear
  in Proc. of 3rd Workshop on Runtime Systems for Parallel Programming
  (RTSPP) of the International Parallel Processing Symposium IPPS'99, 1999.
- H. Dachsel, J. Nieplocha, R. J. Harrison, "An Out-of-Core Implementation of
  the COLUMBUS Massively Parallel Multireference Configuration Interaction
  Program", Proc. SuperComputing'98 (The SC'98 Best Overall Paper Award),
  1998.
Plans for FY99
- Ports and optimizations of the ARMCI one-sided communication library:
  - take advantage of the shared memory on the SMP nodes of the IBM SP to
    bypass unnecessary active message communication within SMP nodes,
  - exploit new capabilities in the next version of LAPI, the IBM Active
    Message library (a beta version is already available at PNNL under a
    collaboration agreement with IBM),
  - port to MPI-2,
  - performance tuning on all the supported platforms,
  - ports to new platforms as requested by the users.
- Development of nonblocking communication operations in ARMCI and GA.
- Development of collective array operations on arbitrary sections of
  multidimensional arrays.
- Interoperability with PADRE and P++.
- Interoperability with Tau.
- Official release of version 3.0 of GA (including the multidimensional
  arrays).
Tool Availability
There are currently four distribution channels for the GA package:
- direct internet download from the GA homepage,
- inclusion in the distributed code of some GA applications,
- download with the High Performance Virtual Machine (HPVM) communication
  package for clusters (GA is a component of HPVM), supported by the
  University of Illinois at Urbana-Champaign and the University of California
  at San Diego, and
- by contacting the developers.
The package has been in the public domain since 1994 and has been distributed
at an average rate of 20 downloads/month from its internet site. From 1994
until April 1998, GA was distributed from an anonymous ftp server. Since
April 1998, all internet downloads have been automatically recorded in an
Oracle database.
The GA toolkit has been distributed
with NWChem, a large computational chemistry package developed on top of
GA. NWChem has been installed at 150 sites.
The development version of multidimensional GA has been available by
contacting the developers. It was distributed to:
- LLNL (collaboration with PADRE)
- University of Lund, Sweden
- University of Syracuse
- Argonne National Laboratory
- Application developers at PNNL (for development of a computational fluid
  dynamics code and, in the molecular dynamics area, an implementation of the
  Car-Parrinello method)