The Particle Simulation Toolkit
Overview
There are many kinds of simulations that are best represented with particle
methods, which sample a space with points that move around with no fixed
connections to each other. The "particles" commonly interact with some
grid-based quantities, and while they often interact with other particles,
they can move in an independent manner.
Modern computer architectures work most efficiently with memory references
that are localized in time and predictable in space. Particle simulations
typically do neither. Because the particles can move in unpredictable ways,
the memory reference pattern they generate is often random, resulting in
inefficient use of the hardware. Many optimizations are known for this
problem, but they often do not get incorporated in real codes because they
are complex and may obscure the physics being modeled by the codes.
The goal of the PST is to simplify the construction of complex particle
codes by providing a high-level interface and to allow the time and space
localization optimizations to be separated from the numerical algorithms
and physical models that are represented by the code.
This page has a quick summary of the project, and these links have more
detailed information about the PST. This document is focused on describing
functionality rather than syntax. A detailed description of the syntax
is available with the library itself.
Approach
Our approach is to build on our experience writing particle classes in
POOMA 1.0 and generic programming in POOMA
2.0. The basic implementation will be in C++ because the availability
of many features in that language greatly simplifies generic library construction.
Users of C and Fortran will be able to utilize the PST through a procedural
interface which wraps the flat C and FORTRAN structures in PST structures.
This is a technique this team has used before on other projects such as
PAWS.
The key components of the PST are the Particles base class and the Attribute
class. The Particles class contains a list of Attribute objects describing
the particles to be modeled. Each Attribute is a distributed vector of
elements of arbitrary type, with one element to hold the value of that
variable for each particle. The Particles class provides methods to access
each Attribute and to add or remove individual particles. The user can
create his or her own unique type of particles representation by creating
a new class derived from Particles that has specific types of Attribute
objects as data members. This new user-defined class can register its Attribute
objects and then all of the methods of Particles will operate on each of
the Attributes of this new class, making this representation completely
general. For some common cases, we will provide sample classes derived
from Particles that provide arrays of built-in scalar types and can be
constructed directly in C++ or via a C or Fortran interface.
The Attribute data distribution across multiple processors will be managed
by one of several Layout classes that we will provide. The Layout classes
will implement different parallel decomposition strategies. The most common
of these strategies is a "spatial layout", in which the particles are assumed
to move within a block-decomposed spatial domain and we keep particle data
locality with any field data that exists on this domain. This is done so
that when particles interact with field quantities (i.e., via a gather
or scatter operation), such operations are local to a processor and therefore
fast. Once Layout objects are set up, data distribution can handled automatically
with a simple function call, and we will provide several load balancing
classes that can repartition the domain based on different optimization
strategies.
There are a few different ways that the user can interact with Attribute
data in the PST.
-
As far as the Particles class is concerned, it just contains a set of Attributes,
and it manages them to make sure that they all represent the same number
of particles with the same distribution.
-
The Particles class then provides access to individual Attribute elements
so the user can manipulate those arrays however they wish. In many situations,
including all C and FORTRAN uses and some C++ uses, the user will simply
loop over these local arrays to manipulate the particle data.
-
In C++, the Attribute class can be "expression enabled" using, for example,
the Portable Expression Template Engine, or PETE,
so that one can write simple data-parallel mathematical expressions involving
Attributes and scalar types that are evaluated very efficiently.
-
The user may impose "boundary conditions" on an Attribute that take a particular
action when a particle's value for that Attribute goes outside a specified
range.
-
For users writing particle-in-cell (PIC) simulations, we will provide a
simple interface for interpolating data between particle positions and
mesh locations with a choice of interpolation schemes for both writing
to a mesh and reading from a mesh.
-
For simulations involving the computation of interactions between nearby
particles, an Attribute is capable of maintaining a separate list of data
for particles stored with sub-domains in non-local memory. That way, interaction
force calculations involving spatially-neighboring particles can be performed
efficiently on parallel computers.
Our intent is to provide generic data containers for particle simulations
that are dynamically resizable and distributed in memory, can be used in
standard mathematical expressions, and support features common to many
types of particle simulation.
Tool Availability
The PST is not yet available. We anticipate an initial release of PST classes
in early 2000. It will be distributed freely from the ACL software web
page and the PST web page.
Because the PST and PETE make extensive use of the template mechanism
in C++ and templates are a relatively advanced feature, not all compilers
will be able to compile the library. The KCC compiler from KAI and the
EGCS compiler from Cygnus work well on all Unix platforms for advanced
C++ source code, along with the SGI CC compiler on Irix platforms, and
CodeWarrior from Metrowerks and the Intel C++ compiler on Macintosh and
Windows platforms. For users that only link against the library (such as
C and Fortran users) there are no restrictions on the compiler to use once
the library itself is compiled.
Participants
The primary participants in the PST project have been Bill Humphrey, Julian
Cummings, Stephen Smith and Steve Karmesin, all of the Advanced Computing
Lab at Los Alamos National Laboratory.
Interoperability in the PST
A key goal of DOE 2000 is to build tools that are interoperable. Interoperability
has been a persistent problem in scientific computing, and we lay out here
some of the reasons this has been the case and how we are approaching that
problem.
Why is interoperability so hard and so rare?
-
Data formats. The LINPACK and LAPACK libraries have been very successful,
and a big reason for that is that everybody knows and understands the data
format - FORTRAN order two dimensional arrays - and that format is well
suited to the computation at hand. In many cases the fact that different
packages have their data in conflicting formats is a fatal barrier to having
them interoperate.
-
Tools. If dense arrays of numbers in a single memory space with
a single thread of control what all you need, the primary tool of the scientific
developer - the FORTRAN language - does a good job. Other languages (such
as Matlab, C++ and Mathematica for different types of tasks) have only
recently gotten to the point where they are really useful for mainstream
scientific computing. The state of tools for scientific computing is vastly
more primitive than it is for GUI or even Web programming.
-
Production versus prototype software. Producing code that someone
else can use out of the box on a problem different from one that the original
author has done is much harder than producing code only for the developer's
own use. Industrial estimates of the cost of moving prototype code to production
range from a factor of three to a factor of ten above producing the prototype.
-
Reward systems. Individual scientists are usually not trained in
writing reusable software, are rarely rewarded for that skill, and projects
are rarely rewarded for taking a software project past the prototype stage.
A prototype will get you a paper, and taking the code from there to production
quality will rarely generate a new paper.
The problem of the PST and of DOE2000 more generally, is that most of our
projects hit all of these problems. Once you make even an array parallel,
a lot more information has to be specified in a data structure somewhere
about how it is parallelized, and there are a lot of different choices
for how to parallelize it. Parallel tools is a comparatively niche market,
and cannot support the number and quality of tools available for serial
computation. We're trying to write interoperable tools, and that means
that someone other than the developer will use it, and that means moving
toward production code. DOE2000 is fairly unique in supporting tool construction
and pushing interoperability.
External Polymorphism and Traits
If a library such as the PST is going to be able to work with (fairly)
arbitrary user data formats, there needs to be a way for the user to describe
that format to the library. The library needs to clearly define the types
of operations that it will need to do with the user's data structure. For
example, the PST needs to be able to insert objects into the container,
delete elements from it, do random access within it, and so on.
The crux is that the library generally doesn't just specify the generic
operations, it specifies very specific syntax for how to do those operations.
There are many subtle choices to be made in the syntax and the precise
meaning of each syntax construction, and it is on this sort of mundane
issue that much interoperability founders.
External polymorphism and traits are specific techniques for encapsulating
an external data structure and allowing the user to supply their own data
structure along with instructions on how to use it. They greatly simplify
the task of coming up with a consistent interface for the PST to use and
which a user can supply. By using these techniques, the actual data format
can be a user choice, not a library choice, and we think that is a key
step toward routine library interoperability.
Leveraging
Because writing production code is expensive and no one program can provide
all the funding necessary to bring a large body of work to production level,
we leverage this work to the extent we can off of work that others are
funding. That means that the PST can ride on top of the Array-Engine work
done for other customers and have enough resources accomplish goals it
could not accomplish starting from scratch. Some of those customers care
only about having bulletproof code, others care more about the papers generated.
In order to leverage off of our previous work in the POOMA, SMARTS,
and other libraries in the ACL, we are in the process of factoring those
libraries so that the PST (and other packages) can use the capabilities
it needs without importing capabilities it does not need. For example,
the PST does not need the complexities of expression evaluation in C++,
but it does need the domain calculus and communication layer. By factoring
POOMA so it can use only what it needs, we can leverage previous work without
burdening customers with more than they need.
Object Oriented Programming and Generics
In order to create code to implement the Array-Engine pattern, we have
to use sophisticated tools, and the C++ language provides those tools.
Modern C++ provides object oriented mechanisms (inheritance, virtual
functions, etc.) and generic mechanism (templates) as well as the low level
elements necessary for programming close to the machine (pointers, bitwise
operations) necessary for representing complex data structures and algorithms
efficiently. The sort of separation that is achieved with Arrays and Engines
cannot be done without appropriate tools.
That is not to say that only sophisticated C++ programmers can take
advantage of a library such as the PST. The library itself can be written
using these techniques, and then export a C or FORTRAN interface. The calling
program then need not care what language the library is written in. The
C++ user would have a richer or more convenient interface with which to
manipulate the PST, but the functionality is available more widely.
Computations Using Particles
Many simulations are most conveniently and efficiently done with particles.
By "particles", we mean sets of points that generally move during the computation
in ways that do not preserve connectivity. This distinguishes them from
moving meshes, which generally preserve some sort of connectivity with
nearby points as the grid points move.
Particle-Particle
In many cases the forces on the particles are determined by very nearby
particles, and the computation takes the form of looping over those particles
to find the forces, updating the velocities and positions of all the particles
and repeating.
Some typical examples of this sort of computation are Smooth Particle
Hydrodynamics and many kinds of semi-classical molecular simulations.
Particle-Mesh
Some forms of particle simulations have important computations that involve
a mesh of some kind. An example of this is a Particle In Cell (PIC) code
for an application like a fusion tokamak, in which the particles represent
clusters of electrons and ions. The forces between the particles could
be calculated directly by calculating the forces from all the other particles,
but that would be very inefficient. It is more efficient in cases like
this to find electromagnetic fields on a mesh by finding an averaged charge
and current densities on the mesh, find the implied electromagnetic fields,
and apply those fields to the particles again. The only interactions between
the particles are mediated by the mesh.
Plasma simulations are a very common form of PIC because the interaction
length scale is typically much longer than mean distance between particles,
so the force on any given particle is the sum of many small interactions
and is well represented by a smooth field represented on a mesh.
Combinations and Complexities
There are of course many situations in which both kinds of forces are used,
and complex ways in which they are combined.
At high enough plasma densities, the hard interactions in a plasma simulation
are important because they represent fast momentum and energy transfers
that otherwise would not be properly handled. Direct Simulation Monte Carlo
(DSMC) is a technique for smoothly matching a fluid simulation taking place
purely on a grid to a surface with a length scale comparable to the mean
free path, and through that transition both grid and particle quantities
must be handled in consistent ways.
Because there are a huge number of different ways in which these computations
need to be constructed, a library to facilitate them must be careful not
to box in the user. It must provide generic mechanisms for fundamental
operations (gather/scatter, select subsets of particles, loop over particles,
define attributes of particles, etc) rather than seemingly high level but
inflexible "deposit charge" functions. These low level operations can then
be combined flexibly and recursively to build common higher level operations
while allowing users to open the hood write arbitrary algorithms while
still taking advantage of the bookkeeping functions for operations like
moving particles between processors using Layouts.
PST Particles Objects
The PST Particles objects represent, unsurprisingly, a collection of particles.
Each particle has a set of attributes such as mass, momentum, charge, etc.,
and all of the particles in a single Particles object has the same attributes.
There are a couple of possible ways to handle that, and the PST team
has explored the implications of them for performance, flexibility and
generality. Our decision has been to use what is often called the "multiple
arrays" technique, which means that the Particles object holds an array
for each attribute rather than a single array of particle objects, where
each particle object has all of the attributes for each particle.
This technique has the following advantages:
-
Performance. Many particle calculations don't involve all of the
particle attributes in any one loop. On a cache based machine if the attributes
for each particle are stored together, data that will not get used in a
particular calculation will have to be loaded into cache, which is a performance
bottleneck. On vector machines, keeping the attributes for each particle
together imply a non-unit stride for any given attribute, which often results
in performance degradation.
-
Flexibility. If the attributes for each particle were stored together
you would not be able to vary the specific attributes in a Particles class
at run time. With separate arrays for each attribute, you can add and delete
attributes at will.
-
Generality. If a given code really would benefit from storing the
particle attributes together, this behavior can be recovered by having
a single attribute which is a composite object that contains all of the
attributes for each particle.
Synchronization
Maintaining multiple arrays for the different attributes introduces complexity
in the form of keeping the arrays synchronized. Whenever you add or delete
or reorder the particles, the attributes must be kept synchronized. A primary
task of the Particles object is to maintain this synchronization for its
attributes. It is of course entirely possible to have multiple Particles
objects in a simulation, and no synchronization is done between those Particles
objects so the user can maintain separately managed sets of particles.
This synchronization is maintained by making sure that all of the operations
available in the Particles class (add, delete, reorder, etc.) are synchronization
preserving.
Data Registration and Access
The Particles class itself -- which has all the synchronization algorithms
-- is actually an abstract base class. All of the classes that define ways
of holding Attributes are subclasses of that, so there can be different
ways of holding attributes and completely different kinds of attribute
classes, but they all reuse the same synchronization code.
Several of these concrete subclasses are provided, both to be used directly
and as examples of how to design your own if the provided ones do not have
the necessary functionality.
-
Staticly defined attributes. If a given application knows that it
will never have to add and delete attribute arrays, simplifications can
be made in the implementation and the interface.
-
Dynamically defined attributes. When the user requires the ability
to dynamically add and delete attributes, the more complex subclass would
be used.
-
Different attribute representations. There are FORTRAN arrays that
cannot be resized, C/C++ arrays that can be resized, there are standard
C++ classes like std::vector, there are any number of more complex
array classes that internally handle parallelism, such as the POOMA Array
classes. Different subclasses of Particles can work with different data
representations.
An important point is that these choices are not orthogonal, particularly
between the different representations and the static versus dynamic attributes.
By combining generics and inheritance, we allow the user user great flexibility
in writing orthogonal code to represent orthogonal ideas.
PST Attributes
A PST Attribute is, essentially, an array of data elements, one for each
particle. That data element can be a compound object -- a struct in C to
represent a momentum vector, for example -- but as far as the Attribute
object is concerned, each element is an opaque object.
The Attribute object provides two kinds of services:
-
Synchronization control. This is the interface the Particles object
sees, and which it uses to keep the various Attributes it is controlling
in synch. Array resizing operations are fundamental here, and the numerical
user is not intended to use these controls directly.
-
Element access. This is the interface the user sees to manipulate
the data in the Attribute to perform the computation. This usually comes
down to random access on a one dimensional array. This interface is also
used by the Particles class to insert and delete elements.
Generic Interface
A major goal of the PST is to allow the user to choose the data representation,
but that introduces a tension with allowing the PST to manipulate the data
also. Most existing libraries resolve this tension by defining one or a
small number of allowable formats and requiring the user to conform to
that. When there are multiple libraries in use, however, this will often
mean the user is saddled with a great deal of complexity in mapping between
the representations the two libraries require.
The PST resolves this tension by giving the user the ability to describe
the format for the data, and prebuilding descriptions for some common cases.
It places few restrictions on the details of the interface for the Attribute
object the user supplies, and instead has the user provide a traits
class to describe the interface.
For example, some arrays are not resizable, and those that are can have
different interfaces to actually resize it. The traits class the user supplies
records whether it is resizable and if it is, what is the function to call
to resize it.
C and FORTRAN Interface
The techniques used to create this generic interface are in C++, and are
not available in C and FORTRAN. That means that only C++ code can define
completely new Attribute types, but there is still a lot of flexibility
available for C and FORTRAN because the fact that it is easy to define
new storage formats means that it is easy to prepackage common C and FORTRAN
formats.
Conventional FORTRAN 77 arrays, for example, are not resizable, and
allocatable arrays have a well defined interface. In C most arrays are
manipulated with malloc/free/realloc, and you could allow the user to supply
their own functions for each of those. In C++ there are the containers
in the standard library which have their own interface, and parallel POOMA
arrays with still another. By packaging the interface specification in
a user supplied traits class and using that to supply a number of prebuilt
ones, ease of use and extensibility are provided.
PST Calculations with Particles
There are lots of ways to represent particle data because there are lots
of different ways in which calculations need to be specified. The PST has
three goals in specifying calculations:
-
Get out of the way. A primary goal of any advanced library must
be to be sure to get out of the way of doing simple things. For the PST
that means that it has to be able to only add capabilities to the users'
data structures, so that the they can do everything they could do without
the PST, plus new things.
-
Encapsulate the bookkeeping. Often the most complex part of a code
that does particle computations is the bookkeeping of packing messages,
unpacking them, sorting the particles for cache performance, looping over
subsets of the particles, and so on. There is no way of hiding from the
user that these things have to happen, but you can encapsulate the mechanics
of doing it.
-
Prefer several small orthogonal tools to one monolithic integrated one.
Whenever you can break a large tool up into several smaller ones that work
together but don't overlap, you increase the flexibility available to the
user.
There are a few different ways for the user to specify the calculation
using the PST.
Sort, Load Balance, Create, Destroy
These are the basic bookkeeping operations. The PST allows the user to
specify the fundamentals of the operations without getting too involved
with the bookkeeping. For example, the user says "Put all the particles
whose locations are within a given rectangle on a given CPU" instead of
the much more involved description that goes something like "if the third
element of this structure is less than the second element of that one ...
allocate a message buffer ... call MPI_Send ... call MPI_Recv ... check
to see if we have space for the new particles on this CPU ... " and so
on.
Gather/Scatter with Stencils
In most particle codes, the majority of the compute time goes to gather
and scatter operations between two different data structures like particles
and a grid, and often the specifics of that calculation are long and involved.
The PST provides a mechanism for building deposit and interoplate stencil
objects to encapsulate that. The PST then handles looping over the particles,
making sure the guard cells have been updated, and lets the user specify
the calculation to be done.
DIY: They're your arrays. Use them.
An example of a time when interoperability is crucial is when a PIC code
needs to solve a Poisson-like field equation on a grid. The grid arrays
then have to be handed off to a transform or other solver package, and
since they are fundamentally your arrays, not the PST's arrays, you're
free to do that and the PST is out of the way.
PST Layouts and Load Balancing
Handling particle layouts and load balancing is perhaps the largest headache
in writing particle codes. The particles can move arbitrarily over the
course of most simulations, so it is imperative to rebalance them. Because
simulations also vary greatly in just how the particles move and just what
other data structures they have to interact with while they're moving,
the method of specifying the rebalancing is often part of the research
being conducted by the user, so it must be flexible and extensible.
Particle Layout
A ParticleLayout class defines a mapping between a group of particles and
a CPU. Like the attribute layout is not actually a single class, or even
an abstract base class. It is a template parameter that simply implements
a certain interface and which can be supplied by a user.
A couple of prepackaged particle layout objects are the trivial one
which never moves any particles, and a spacial layout which defines mappings
between regions of physical space and CPUs. These are fairly abstract representations
of how to map particles to CPUs.
Attribute Layout
The attribute layout is the more concrete representation of how you map
specific data structures to memory spaces. It keeps the different attributes
synchronized so that the elements that represent the attributes for a particular
particle remain together, and it coordinates the interaction with the messaging
system.
Steve Karmesin
Last
modified: Thu Jan 13 13:39:17 MST 2000