Global Arrays

Extending the ACTS Toolkit with the Shared Memory Programming Model

 

Objective

The ACTS Toolkit project combines the strengths of multiple programming tools and environments designed for developing application codes that address the complex, multidisciplinary scientific and technical problems faced by DoE. The Global Arrays (GA) shared memory NUMA programming model has proven effective for developing scalable and efficient application codes in several areas relevant to the DoE mission, such as quantum chemistry, molecular dynamics, and thermochemistry, as well as in other application domains.

The purpose of this project is to further extend the capabilities of the Global Arrays package to address the needs of computational chemistry, molecular dynamics, and ground water flow modeling applications, and to integrate GA with other components of the ACTS Toolkit. The primary new capability of GA required by the applications is multidimensional global arrays. In addition, the library has to be supported and optimized on new hardware platforms as they become available.

Approach

Funding and Personnel

The project received funding at the 0.75 FTE level and it was started in FY'98. The personnel includes Jarek Nieplocha (PI), Joel Malard, and Jialin Ju (all PNNL).

Technical Approach

The GA toolkit traditionally offered a shared memory programming model in the context of 2-dimensional distributed arrays. The core functionality was based on the underlying one-sided communication capabilities of GA. These capabilities were implemented in a platform-specific manner and tuned for the communication patterns of distributed 2-D arrays.

Based on the input received from the existing applications and tools, and anticipating future developments required by the SSI (primarily in the combustion area), it became clear that the specificity of the one-sided communication engine in GA had become an obstacle to the expandability of the library. Therefore, a new portable, flexible, and efficient communication library was created, one able to support arrays of arbitrary dimension as well as sparse data structures. This library is called ARMCI (Aggregate Remote Memory Copy Interface), and it is optimized for the noncontiguous data transfers that predominate in many scientific calculations. ARMCI is compatible with MPI-1 but differs in several key aspects from the MPI-2 one-sided communication standard.
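A noncontiguous transfer of this kind can be described by a base offset, a stride per dimension, and a count per dimension. The following is a minimal sketch of that idea in Python, operating on local buffers only. The function name strided_copy and its parameter layout are hypothetical, chosen for illustration; this is not the ARMCI interface.

```python
def strided_copy(src, src_off, src_strides, dst, dst_off, dst_strides, count):
    """Copy a multidimensional patch between two flat buffers (hypothetical helper).

    count[0] is the length of the innermost contiguous run, in elements.
    count[k] for k >= 1 is the number of repetitions at stride level k, and
    src_strides[k-1] / dst_strides[k-1] give the element distance between
    consecutive repetitions in the source and destination buffers.
    """
    if len(count) == 1:
        # Innermost level: one contiguous run of elements.
        n = count[0]
        dst[dst_off:dst_off + n] = src[src_off:src_off + n]
        return
    # Outer level: repeat the lower-dimensional copy at each stride step.
    for i in range(count[-1]):
        strided_copy(src, src_off + i * src_strides[-1], src_strides[:-1],
                     dst, dst_off + i * dst_strides[-1], dst_strides[:-1],
                     count[:-1])


# Usage: extract rows 1..2, columns 1..3 of a 4 x 5 row-major array
# into a packed 2 x 3 buffer.
source = list(range(20))   # element (r, c) is stored at index r * 5 + c
packed = [0] * 6
strided_copy(source, 1 * 5 + 1, [5], packed, 0, [3], [3, 2])
```

The whole patch is described by a single descriptor (offsets, strides, counts), so a communication layer could move it in one request rather than one request per contiguous run.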


Changes in the Architecture of the Global Arrays Toolkit

The Global Arrays library has been rearchitected to:

  • support multidimensional arrays for integer, double, and double complex datatypes, and
  • take advantage of the ARMCI one-sided communication capabilities.
    Interoperability

    The activity in this area has been motivated by our understanding of the application needs. In particular, interoperability with PADRE and A++/P++ is expected to be very important for the combustion codes, which would require coupling of grid-based CFD codes (A++/P++) and quantum chemistry codes (GA). In addition, the GA application developers have been asking for language support for the GA shared memory programming model. We established a collaboration with the PCRC HPspmd project, which also aims to provide language extensions to Fortran, C++, and Java consistent with the GA model.
     

    Performance and Portability

    Performance and scalability of the GA applications rely on the efficiency of the one-sided communications in Global Arrays (and now in ARMCI). To achieve the best possible performance, a significant component of this project has to be devoted to platform-specific performance tuning, especially on the MPP distributed-memory platforms. We established a close collaboration with IBM and have been able to influence the design of their one-sided communication subsystem (LAPI) on the IBM SP computer.
     

    Recent Accomplishments

    New capabilities

  • ARMCI, a new portable one-sided communication library developed by this project
      • ARMCI has been ported to the IBM SP, Cray T3E and Cray J90, all major brands of workstations (Unix and Windows NT), SGI Origin, clusters of UNIX workstations, and Fujitsu VX/VPP. This represents the complete list of systems the GA developers have access to.
      • ARMCI has been used for development of the multidimensional Global Arrays at PNNL.
      • ARMCI has been employed at the University of Syracuse in the PCRC compiler run-time library Adlib. They achieved a 20% improvement in the performance of collective array operations on shared memory platforms over the existing MPI implementation of these operations.
  • the multidimensional Global Arrays
      • supported operations correspond to the functionality present in the 2-D GA, such as
          • one-sided operations including put, get, accumulate, scatter, and gather
          • collective operations on entire arrays
      • The remaining subset of functionality of the 2-D GA not yet implemented includes collective operations on sections of multidimensional arrays.
      • Quality Assurance: a suite of test programs was developed using the m4 preprocessor and Fortran code templates to verify operational correctness for array dimensions ranging from 1 to 7 (the maximum supported by Fortran) and the three supported datatypes (21 cases total).
      • The portability of multidimensional GA is defined by the portability of ARMCI.
  • ports of Global Arrays to additional platforms: Windows NT (shared memory, to complement HPVM, which supports GA on NT and Linux clusters) and Cray J90, neither of which was supported by the 2-D GA library.
  • a new C interface to GA (primarily to simplify interoperability with PADRE and Overture)
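The one-sided operations listed above act on rectangular sections of an array. As a minimal single-process sketch, the effect of put, get, and accumulate on a section can be modeled as shown below. This is not the GA API: the class ToyArray and its method names are hypothetical, and the inclusive lo/hi index bounds, the row-major layout, and the alpha scale factor on accumulate are assumptions made for illustration.

```python
from itertools import product


class ToyArray:
    """Single-process stand-in for a multidimensional shared array (hypothetical)."""

    def __init__(self, dims, fill=0.0):
        self.dims = tuple(dims)
        size = 1
        for d in self.dims:
            size *= d
        self.data = [fill] * size

    def _offset(self, index):
        # Row-major linearization of an N-dimensional index.
        off = 0
        for i, d in zip(index, self.dims):
            off = off * d + i
        return off

    def _section(self, lo, hi):
        # All indices in the inclusive box [lo, hi], in row-major order.
        return product(*(range(l, h + 1) for l, h in zip(lo, hi)))

    def put(self, lo, hi, buf):
        # Overwrite the section with the contents of buf.
        for value, index in zip(buf, self._section(lo, hi)):
            self.data[self._offset(index)] = value

    def get(self, lo, hi):
        # Return a packed copy of the section.
        return [self.data[self._offset(index)] for index in self._section(lo, hi)]

    def acc(self, lo, hi, buf, alpha=1.0):
        # Add alpha * buf into the section, element by element.
        for value, index in zip(buf, self._section(lo, hi)):
            self.data[self._offset(index)] += alpha * value


# Usage: update a 2 x 2 x 2 section of a 2 x 3 x 4 array.
a = ToyArray((2, 3, 4))
lo, hi = (0, 1, 1), (1, 2, 2)
a.put(lo, hi, list(range(1, 9)))
a.acc(lo, hi, [1] * 8, alpha=2.0)
section = a.get(lo, hi)
```

In a real distributed setting the section may span memory owned by several processes; the caller names only the index range, and the library is responsible for locating the data.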
    Applications

    Both the number of applications and the number of application areas have expanded since this project was started. An example of a new application area is the processing of electron microscopy data (University of Houston).

    During the last year the package was also downloaded by developers of applications in other new areas (astrophysics, signal processing, DoD applications), but it is too early to determine whether it will actually be useful for their purposes.

    Recent improvements in the performance of the GA locking operations have greatly improved the scalability of the COLUMBUS MRCI code, which had previously been parallelized with GA. The new scalability curve for this code on the IBM SP is shown here.
     

    Publications

    1. J. Nieplocha, B. Carpenter, "ARMCI: A Portable Remote Memory Copy Library for Distributed Array Libraries and Compiler Run-time Systems", to appear in Proc. of 3rd Workshop on Runtime Systems for Parallel Programming (RTSPP) of the International Parallel Processing Symposium IPPS'99, 1999.
    2. H. Dachsel, J. Nieplocha, R. J. Harrison, "An Out-of-Core Implementation of the COLUMBUS Massively Parallel Multireference Configuration Interaction Program", Proc. SuperComputing'98 (The SC'98 Best Overall Paper Award), 1998.


    Plans for FY99

  • Ports and optimizations of the ARMCI one-sided communication library:
      • take advantage of the shared memory on the SMP nodes of the IBM SP to bypass the unnecessary active message communication within SMP nodes,
      • exploit new capabilities in the next version of LAPI, the IBM Active Message library (a beta version already available at PNNL under a collaboration agreement with IBM),
      • port to MPI-2,
      • performance tuning on all the supported platforms,
      • ports to new platforms as requested by the users.
  • Development of the nonblocking communication operations in ARMCI and GA.
  • Development of the collective array operations on arbitrary sections of multidimensional arrays.
  • Interoperability with PADRE and P++.
  • Interoperability with Tau.
  • Official release of version 3.0 of GA (including the multidimensional arrays).

    Tool Availability

    There are currently four distribution channels for the GA package:
  • direct internet download from the GA homepage,
  • included in the distribution code of some GA applications,
  • downloaded with the High Performance Virtual Machine (HPVM) communication package for clusters (GA is a component of HPVM) supported by the University of Illinois at Urbana-Champaign and the University of California at San Diego, and
  • by contacting the developers.

    The package has been in the public domain since 1994 and has been distributed at an average rate of 20 downloads/month from its internet site. From 1994 until April 1998, GA was distributed from an anonymous ftp server. Since April 1998, all the internet downloads have been automatically recorded in an Oracle database.

    The GA toolkit has been distributed with NWChem, a large computational chemistry package developed on top of GA. NWChem has been installed at 150 sites.

    The development version of multidimensional GA has been available by contacting the developers. It was distributed to:

  • LLNL (collaboration with PADRE)
  • University of Lund, Sweden
  • University of Syracuse
  • Argonne National Laboratory
  • Application developers at PNNL (for development of a computational fluid dynamics code and, in the molecular dynamics area, an implementation of the Car-Parrinello method)