The Particle Simulation Toolkit

Overview

There are many kinds of simulations that are best represented with particle methods, which sample a space with points that move around with no fixed connections to each other. The "particles" commonly interact with some grid-based quantities, and while they often interact with other particles, they can move in an independent manner.

Modern computer architectures work most efficiently with memory references that are localized in time and predictable in space. Particle simulations typically do neither. Because the particles can move in unpredictable ways, the memory reference pattern they generate is often random, resulting in inefficient use of the hardware. Many optimizations are known for this problem, but they often do not get incorporated in real codes because they are complex and may obscure the physics being modeled by the codes.

The goal of the PST is to simplify the construction of complex particle codes by providing a high-level interface and to allow the time and space localization optimizations to be separated from the numerical algorithms and physical models that are represented by the code.

This page has a quick summary of the project, and these links have more detailed information about the PST. This document is focused on describing functionality rather than syntax. A detailed description of the syntax is available with the library itself.

Approach

Our approach is to build on our experience writing particle classes in POOMA 1.0 and generic programming in POOMA 2.0. The basic implementation will be in C++ because the availability of many features in that language greatly simplifies generic library construction. Users of C and Fortran will be able to utilize the PST through a procedural interface which wraps the flat C and FORTRAN structures in PST structures. This is a technique this team has used before on other projects such as PAWS.

The key components of the PST are the Particles base class and the Attribute class. The Particles class contains a list of Attribute objects describing the particles to be modeled. Each Attribute is a distributed vector of elements of arbitrary type, with one element to hold the value of that variable for each particle. The Particles class provides methods to access each Attribute and to add or remove individual particles. The user can create his or her own unique type of particles representation by creating a new class derived from Particles that has specific types of Attribute objects as data members. This new user-defined class can register its Attribute objects and then all of the methods of Particles will operate on each of the Attributes of this new class, making this representation completely general. For some common cases, we will provide sample classes derived from Particles that provide arrays of built-in scalar types and can be constructed directly in C++ or via a C or Fortran interface.

The Attribute data distribution across multiple processors will be managed by one of several Layout classes that we will provide. The Layout classes will implement different parallel decomposition strategies. The most common of these strategies is a "spatial layout", in which the particles are assumed to move within a block-decomposed spatial domain and we keep particle data locality with any field data that exists on this domain. This is done so that when particles interact with field quantities (i.e., via a gather or scatter operation), such operations are local to a processor and therefore fast. Once Layout objects are set up, data distribution can handled automatically with a simple function call, and we will provide several load balancing classes that can repartition the domain based on different optimization strategies.

There are a few different ways that the user can interact with Attribute data in the PST.

Our intent is to provide generic data containers for particle simulations that are dynamically resizable and distributed in memory, can be used in standard mathematical expressions, and support features common to many types of particle simulation.

Tool Availability

The PST is not yet available. We anticipate an initial release of PST classes in early 2000. It will be distributed freely from the ACL software web page and the PST web page.

Because the PST and PETE make extensive use of the template mechanism in C++ and templates are a relatively advanced feature, not all compilers will be able to compile the library. The KCC compiler from KAI and the EGCS compiler from Cygnus work well on all Unix platforms for advanced C++ source code, along with the SGI CC compiler on Irix platforms, and CodeWarrior from Metrowerks and the Intel C++ compiler on Macintosh and Windows platforms. For users that only link against the library (such as C and Fortran users) there are no restrictions on the compiler to use once the library itself is compiled.

Participants

The primary participants in the PST project have been Bill Humphrey, Julian Cummings, Stephen Smith and Steve Karmesin, all of the Advanced Computing Lab at Los Alamos National Laboratory. 

Interoperability in the PST

A key goal of DOE 2000 is to build tools that are interoperable. Interoperability has been a persistent problem in scientific computing, and we lay out here some of the reasons this has been the case and how we are approaching that problem.

Why is interoperability so hard and so rare?

The problem of the PST and of DOE2000 more generally, is that most of our projects hit all of these problems. Once you make even an array parallel, a lot more information has to be specified in a data structure somewhere about how it is parallelized, and there are a lot of different choices for how to parallelize it. Parallel tools is a comparatively niche market, and cannot support the number and quality of tools available for serial computation. We're trying to write interoperable tools, and that means that someone other than the developer will use it, and that means moving toward production code. DOE2000 is fairly unique in supporting tool construction and pushing interoperability.

External Polymorphism and Traits

If a library such as the PST is going to be able to work with (fairly) arbitrary user data formats, there needs to be a way for the user to describe that format to the library. The library needs to clearly define the types of operations that it will need to do with the user's data structure. For example, the PST needs to be able to insert objects into the container, delete elements from it, do random access within it, and so on.

The crux is that the library generally doesn't just specify the generic operations, it specifies very specific syntax for how to do those operations. There are many subtle choices to be made in the syntax and the precise meaning of each syntax construction, and it is on this sort of mundane issue that much interoperability founders.

External polymorphism and traits are specific techniques for encapsulating an external data structure and allowing the user to supply their own data structure along with instructions on how to use it. They greatly simplify the task of coming up with a consistent interface for the PST to use and which a user can supply. By using these techniques, the actual data format can be a user choice, not a library choice, and we think that is a key step toward routine library interoperability.

Leveraging

Because writing production code is expensive and no one program can provide all the funding necessary to bring a large body of work to production level, we leverage this work to the extent we can off of work that others are funding. That means that the PST can ride on top of the Array-Engine work done for other customers and have enough resources accomplish goals it could not accomplish starting from scratch. Some of those customers care only about having bulletproof code, others care more about the papers generated.

In order to leverage off of our previous work in the POOMA, SMARTS, and other libraries in the ACL, we are in the process of factoring those libraries so that the PST (and other packages) can use the capabilities it needs without importing capabilities it does not need. For example, the PST does not need the complexities of expression evaluation in C++, but it does need the domain calculus and communication layer. By factoring POOMA so it can use only what it needs, we can leverage previous work without burdening customers with more than they need.

Object Oriented Programming and Generics

In order to create code to implement the Array-Engine pattern, we have to use sophisticated tools, and the C++ language provides those tools.

Modern C++ provides object oriented mechanisms (inheritance, virtual functions, etc.) and generic mechanism (templates) as well as the low level elements necessary for programming close to the machine (pointers, bitwise operations) necessary for representing complex data structures and algorithms efficiently. The sort of separation that is achieved with Arrays and Engines cannot be done without appropriate tools.

That is not to say that only sophisticated C++ programmers can take advantage of a library such as the PST. The library itself can be written using these techniques, and then export a C or FORTRAN interface. The calling program then need not care what language the library is written in. The C++ user would have a richer or more convenient interface with which to manipulate the PST, but the functionality is available more widely. 


Computations Using Particles

Many simulations are most conveniently and efficiently done with particles. By "particles", we mean sets of points that generally move during the computation in ways that do not preserve connectivity. This distinguishes them from moving meshes, which generally preserve some sort of connectivity with nearby points as the grid points move.

Particle-Particle

In many cases the forces on the particles are determined by very nearby particles, and the computation takes the form of looping over those particles to find the forces, updating the velocities and positions of all the particles and repeating.

Some typical examples of this sort of computation are Smooth Particle Hydrodynamics and many kinds of semi-classical molecular simulations.

Particle-Mesh

Some forms of particle simulations have important computations that involve a mesh of some kind. An example of this is a Particle In Cell (PIC) code for an application like a fusion tokamak, in which the particles represent clusters of electrons and ions. The forces between the particles could be calculated directly by calculating the forces from all the other particles, but that would be very inefficient. It is more efficient in cases like this to find electromagnetic fields on a mesh by finding an averaged charge and current densities on the mesh, find the implied electromagnetic fields, and apply those fields to the particles again. The only interactions between the particles are mediated by the mesh.

Plasma simulations are a very common form of PIC because the interaction length scale is typically much longer than mean distance between particles, so the force on any given particle is the sum of many small interactions and is well represented by a smooth field represented on a mesh.

Combinations and Complexities

There are of course many situations in which both kinds of forces are used, and complex ways in which they are combined.

At high enough plasma densities, the hard interactions in a plasma simulation are important because they represent fast momentum and energy transfers that otherwise would not be properly handled. Direct Simulation Monte Carlo (DSMC) is a technique for smoothly matching a fluid simulation taking place purely on a grid to a surface with a length scale comparable to the mean free path, and through that transition both grid and particle quantities must be handled in consistent ways.

Because there are a huge number of different ways in which these computations need to be constructed, a library to facilitate them must be careful not to box in the user. It must provide generic mechanisms for fundamental operations (gather/scatter, select subsets of particles, loop over particles, define attributes of particles, etc) rather than seemingly high level but inflexible "deposit charge" functions. These low level operations can then be combined flexibly and recursively to build common higher level operations while allowing users to open the hood write arbitrary algorithms while still taking advantage of the bookkeeping functions for operations like moving particles between processors using Layouts. 


PST Particles Objects

The PST Particles objects represent, unsurprisingly, a collection of particles. Each particle has a set of attributes such as mass, momentum, charge, etc., and all of the particles in a single Particles object has the same attributes.

There are a couple of possible ways to handle that, and the PST team has explored the implications of them for performance, flexibility and generality. Our decision has been to use what is often called the "multiple arrays" technique, which means that the Particles object holds an array for each attribute rather than a single array of particle objects, where each particle object has all of the attributes for each particle.

This technique has the following advantages:

Synchronization

Maintaining multiple arrays for the different attributes introduces complexity in the form of keeping the arrays synchronized. Whenever you add or delete or reorder the particles, the attributes must be kept synchronized. A primary task of the Particles object is to maintain this synchronization for its attributes. It is of course entirely possible to have multiple Particles objects in a simulation, and no synchronization is done between those Particles objects so the user can maintain separately managed sets of particles.

This synchronization is maintained by making sure that all of the operations available in the Particles class (add, delete, reorder, etc.) are synchronization preserving.

Data Registration and Access

The Particles class itself -- which has all the synchronization algorithms -- is actually an abstract base class. All of the classes that define ways of holding Attributes are subclasses of that, so there can be different ways of holding attributes and completely different kinds of attribute classes, but they all reuse the same synchronization code.

Several of these concrete subclasses are provided, both to be used directly and as examples of how to design your own if the provided ones do not have the necessary functionality.

An important point is that these choices are not orthogonal, particularly between the different representations and the static versus dynamic attributes. By combining generics and inheritance, we allow the user user great flexibility in writing orthogonal code to represent orthogonal ideas. 

PST Attributes

A PST Attribute is, essentially, an array of data elements, one for each particle. That data element can be a compound object -- a struct in C to represent a momentum vector, for example -- but as far as the Attribute object is concerned, each element is an opaque object.

The Attribute object provides two kinds of services:

Generic Interface

A major goal of the PST is to allow the user to choose the data representation, but that introduces a tension with allowing the PST to manipulate the data also. Most existing libraries resolve this tension by defining one or a small number of allowable formats and requiring the user to conform to that. When there are multiple libraries in use, however, this will often mean the user is saddled with a great deal of complexity in mapping between the representations the two libraries require.

The PST resolves this tension by giving the user the ability to describe the format for the data, and prebuilding descriptions for some common cases. It places few restrictions on the details of the interface for the Attribute object the user supplies, and instead has the user provide a traits class to describe the interface.

For example, some arrays are not resizable, and those that are can have different interfaces to actually resize it. The traits class the user supplies records whether it is resizable and if it is, what is the function to call to resize it.

C and FORTRAN Interface

The techniques used to create this generic interface are in C++, and are not available in C and FORTRAN. That means that only C++ code can define completely new Attribute types, but there is still a lot of flexibility available for C and FORTRAN because the fact that it is easy to define new storage formats means that it is easy to prepackage common C and FORTRAN formats.

Conventional FORTRAN 77 arrays, for example, are not resizable, and allocatable arrays have a well defined interface. In C most arrays are manipulated with malloc/free/realloc, and you could allow the user to supply their own functions for each of those. In C++ there are the containers in the standard library which have their own interface, and parallel POOMA arrays with still another. By packaging the interface specification in a user supplied traits class and using that to supply a number of prebuilt ones, ease of use and extensibility are provided. 


PST Calculations with Particles

There are lots of ways to represent particle data because there are lots of different ways in which calculations need to be specified. The PST has three goals in specifying calculations: There are a few different ways for the user to specify the calculation using the PST.

Sort, Load Balance, Create, Destroy

These are the basic bookkeeping operations. The PST allows the user to specify the fundamentals of the operations without getting too involved with the bookkeeping. For example, the user says "Put all the particles whose locations are within a given rectangle on a given CPU" instead of the much more involved description that goes something like "if the third element of this structure is less than the second element of that one ... allocate a message buffer ... call MPI_Send ... call MPI_Recv ... check to see if we have space for the new particles on this CPU ... " and so on.

Gather/Scatter with Stencils

In most particle codes, the majority of the compute time goes to gather and scatter operations between two different data structures like particles and a grid, and often the specifics of that calculation are long and involved. The PST provides a mechanism for building deposit and interoplate stencil objects to encapsulate that. The PST then handles looping over the particles, making sure the guard cells have been updated, and lets the user specify the calculation to be done.

DIY: They're your arrays. Use them.

An example of a time when interoperability is crucial is when a PIC code needs to solve a Poisson-like field equation on a grid. The grid arrays then have to be handed off to a transform or other solver package, and since they are fundamentally your arrays, not the PST's arrays, you're free to do that and the PST is out of the way. 

PST Layouts and Load Balancing

Handling particle layouts and load balancing is perhaps the largest headache in writing particle codes. The particles can move arbitrarily over the course of most simulations, so it is imperative to rebalance them. Because simulations also vary greatly in just how the particles move and just what other data structures they have to interact with while they're moving, the method of specifying the rebalancing is often part of the research being conducted by the user, so it must be flexible and extensible.

Particle Layout

A ParticleLayout class defines a mapping between a group of particles and a CPU. Like the attribute layout is not actually a single class, or even an abstract base class. It is a template parameter that simply implements a certain interface and which can be supplied by a user.

A couple of prepackaged particle layout objects are the trivial one which never moves any particles, and a spacial layout which defines mappings between regions of physical space and CPUs. These are fairly abstract representations of how to map particles to CPUs.

Attribute Layout

The attribute layout is the more concrete representation of how you map specific data structures to memory spaces. It keeps the different attributes synchronized so that the elements that represent the attributes for a particular particle remain together, and it coordinates the interaction with the messaging system. 
Steve Karmesin

Last modified: Thu Jan 13 13:39:17 MST 2000