PETE (Portable Expression Template Engine) is an extensible implementation of the expression template technique. This technique uses recursively defined C++ templates to transform certain kinds of C++ statements into other statements with the same effect but higher performance. As an example, it can transform an array statement such as

 A = B + C * D;

into a single loop of the form

 for (i = ... ; ... ; ...) {
     A[i] = B[i] + C[i] * D[i];
 }

Evaluating such array expressions in a single loop is of fundamental importance for performance, because it avoids temporary arrays and repeated passes through memory. With simple C++ operator overloading, each operator returns a temporary array computed in its own loop, so this kind of translation is practically impossible to obtain. Furthermore, PETE has the advantage that the entire transformation is done by the C++ compiler, without requiring separate tools.
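The cost of the plain-overloading approach can be made concrete with a small sketch. NaiveArray, loopCount and fusedAssign are hypothetical names for illustration, not part of PETE: each overloaded operator runs its own pass and returns a temporary, so A = B + C * D takes two passes plus a temporary, whereas the hand-written fused function is the single loop that expression templates aim to produce.

```cpp
#include <cstddef>
#include <vector>

// A deliberately naive array class (hypothetical, for illustration only).
struct NaiveArray {
    std::vector<double> data;
    explicit NaiveArray(std::size_t n, double v = 0.0) : data(n, v) {}
    std::size_t size() const { return data.size(); }
    double& operator[](std::size_t i) { return data[i]; }
    double operator[](std::size_t i) const { return data[i]; }
};

// Counts element-wise passes so the cost of each approach is visible.
inline int& loopCount() {
    static int n = 0;
    return n;
}

// Each overloaded operator makes its own pass and returns a full temporary.
inline NaiveArray operator+(const NaiveArray& x, const NaiveArray& y) {
    NaiveArray r(x.size());
    ++loopCount();
    for (std::size_t i = 0; i < x.size(); ++i) r[i] = x[i] + y[i];
    return r;
}
inline NaiveArray operator*(const NaiveArray& x, const NaiveArray& y) {
    NaiveArray r(x.size());
    ++loopCount();
    for (std::size_t i = 0; i < x.size(); ++i) r[i] = x[i] * y[i];
    return r;
}

// The single fused loop that expression templates aim to generate,
// written out by hand for A = B + C * D.
inline void fusedAssign(NaiveArray& a, const NaiveArray& b,
                        const NaiveArray& c, const NaiveArray& d) {
    ++loopCount();
    for (std::size_t i = 0; i < a.size(); ++i) a[i] = b[i] + c[i] * d[i];
}
```

In this sketch, `a = b + c * d;` advances loopCount by two (one pass for C * D, one for the addition), while `fusedAssign(a, b, c, d)` advances it by one and allocates no temporaries.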

Expression templates work by creating, at compile time, an object representing the parse tree of an expression. Consider the following expression and its representation as a parse tree and as a PETE generated object:

 A = -B + 2 * C;

Parse tree: [figure: PETE parse tree, not reproduced]

PETE generated object:

 BinaryNode<OpAssign, ArrayA,
   BinaryNode<OpAdd,
     UnaryNode<OpUnaryMinus, ArrayB>,
     BinaryNode<OpMultiply, Scalar<int>, ArrayC> > >

Notice how the structure of the parse tree is encoded in the type. Stored within the object are unary and binary operator objects at non-leaf nodes and container objects (such as references to arrays or scalars) at the leaves.
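A stripped-down sketch of how the tree shape ends up in the type may help. The node and leaf types below borrow the names used above but are simplified, hypothetical stand-ins, not PETE's actual class definitions; the hypothetical makeExpr builds the object for -B + 2 * C by hand, where a real library would build it inside its overloaded operators.

```cpp
#include <cstddef>
#include <vector>

// Operator tags: empty types that name an operation and know how to apply it.
struct OpAdd {
    template <class A, class B> static auto apply(A a, B b) { return a + b; }
};
struct OpMultiply {
    template <class A, class B> static auto apply(A a, B b) { return a * b; }
};
struct OpUnaryMinus {
    template <class A> static auto apply(A a) { return -a; }
};

// Leaves: a reference to an array, or a scalar that is the same at every index.
struct ArrayRef {
    const std::vector<double>& v;
    double at(std::size_t i) const { return v[i]; }
};
template <class T> struct Scalar {
    T value;
    T at(std::size_t) const { return value; }
};

// Interior nodes store their children; the tree's shape is part of the type.
template <class Op, class Child> struct UnaryNode {
    Child child;
    auto at(std::size_t i) const { return Op::apply(child.at(i)); }
};
template <class Op, class L, class R> struct BinaryNode {
    L left;
    R right;
    auto at(std::size_t i) const { return Op::apply(left.at(i), right.at(i)); }
};

// Builds the object for -B + 2 * C by hand; a library would do this
// inside its overloaded unary minus, multiplication and addition operators.
inline auto makeExpr(const std::vector<double>& B, const std::vector<double>& C) {
    using Neg = UnaryNode<OpUnaryMinus, ArrayRef>;
    using Mul = BinaryNode<OpMultiply, Scalar<int>, ArrayRef>;
    return BinaryNode<OpAdd, Neg, Mul>{Neg{ArrayRef{B}}, Mul{Scalar<int>{2}, ArrayRef{C}}};
}
```

The result's type is `BinaryNode<OpAdd, UnaryNode<OpUnaryMinus, ArrayRef>, BinaryNode<OpMultiply, Scalar<int>, ArrayRef> >`, and calling `at(i)` on it computes -B[i] + 2 * C[i] without any temporary arrays.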

Once an expression object is generated, PETE provides facilities for traversing the parse tree, getting information from the leaves, and combining the results. One of the most useful parse tree operations is evaluating the expression. If the expression object generated above is named expr, then the line of code

 forEachTag(expr, EvalFunctorTag1(i), OpCombineTag()); 

could be used to traverse the tree, obtain the i-th value from the arrays at the leaves, and combine them according to the arithmetic operators at the nodes. PETE is constructed so that the code above is translated into:

 A[i] = -B[i] + 2 * C[i];
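The traversal described above can be sketched as follows. forEach, EvalLeaf, OpCombine and assignExpr are simplified, hypothetical analogues of forEachTag, EvalFunctorTag1 and OpCombineTag, not PETE's real signatures: the leaf functor decides what each leaf contributes, the combiner decides how child results are merged at each node, and because everything is resolved at compile time the recursion can be inlined into the body of one loop.

```cpp
#include <cstddef>
#include <vector>

// Operator tags that also know how to apply themselves to doubles.
struct OpAdd        { double operator()(double a, double b) const { return a + b; } };
struct OpMultiply   { double operator()(double a, double b) const { return a * b; } };
struct OpUnaryMinus { double operator()(double a) const { return -a; } };

// Hypothetical minimal tree: interior nodes hold an operator and children.
template <class Op, class C>          struct UnaryNode  { Op op; C child; };
template <class Op, class L, class R> struct BinaryNode { Op op; L left; R right; };
struct ArrayLeaf  { const std::vector<double>* data; };
struct ScalarLeaf { double value; };

// Leaf functor: fetch the i-th element (a scalar ignores the index).
struct EvalLeaf {
    std::size_t i;
    double operator()(const ArrayLeaf& a) const { return (*a.data)[i]; }
    double operator()(const ScalarLeaf& s) const { return s.value; }
};

// Combiner: merge child results by applying the node's operator.
struct OpCombine {
    template <class Op>
    double operator()(const Op& op, double a) const { return op(a); }
    template <class Op>
    double operator()(const Op& op, double a, double b) const { return op(a, b); }
};

// Traversal: leaves are handed to the functor, interior nodes to the combiner.
template <class F, class Comb>
auto forEach(const ArrayLeaf& leaf, const F& f, const Comb&) { return f(leaf); }
template <class F, class Comb>
auto forEach(const ScalarLeaf& leaf, const F& f, const Comb&) { return f(leaf); }
template <class Op, class C, class F, class Comb>
auto forEach(const UnaryNode<Op, C>& n, const F& f, const Comb& c) {
    return c(n.op, forEach(n.child, f, c));
}
template <class Op, class L, class R, class F, class Comb>
auto forEach(const BinaryNode<Op, L, R>& n, const F& f, const Comb& c) {
    return c(n.op, forEach(n.left, f, c), forEach(n.right, f, c));
}

// One loop over the destination; each iteration walks the whole tree.
template <class Expr>
void assignExpr(std::vector<double>& A, const Expr& expr) {
    for (std::size_t i = 0; i < A.size(); ++i)
        A[i] = forEach(expr, EvalLeaf{i}, OpCombine{});
}
```

For B = {1, 2, 3} and C = {10, 20, 30}, assigning the tree for -B + 2 * C this way fills A with 19, 38 and 57.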

PETE also contains a system for handling type promotions, including facilities for user-defined types. This means that the arrays A, B, and C can all have different element types and the expression will still produce the result types users expect (for example, combining an int array with a double array yields double values).
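A minimal sketch of a promotion trait, assuming a hypothetical Promote template (PETE's own traits may be organized differently): by default it asks the compiler what T1 + T2 yields, and a user-defined element type, here a made-up Money type, can override the result by specialization.

```cpp
#include <utility>

// Hypothetical promotion trait. By default it reports whatever type
// C++ itself gives T1 + T2 (int + double -> double, char + char -> int).
template <class T1, class T2>
struct Promote {
    using Type = decltype(std::declval<T1>() + std::declval<T2>());
};

// A made-up user-defined element type: an amount stored in cents.
struct Money {
    long cents;
};
inline Money operator+(Money a, Money b) { return Money{a.cents + b.cents}; }

// The user states the result of mixing Money with int by specialization,
// even though this sketch defines no Money + int operator.
template <> struct Promote<Money, int> { using Type = Money; };
template <> struct Promote<int, Money> { using Type = Money; };
```

With these definitions, `Promote<int, double>::Type` is double, `Promote<Money, Money>::Type` comes from the Money operator+, and `Promote<Money, int>::Type` is Money only because of the explicit specializations.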

PETE can be used with user-defined container classes. The simplest mode of operation requires the container class to inherit from a base class supplied by PETE and to define a few member functions and type definitions. An existing container class can also be made PETE-aware without modifying the class directly. In this case, the user is required to supply some external traits classes and write appropriate operator functions. PETE supplies Perl scripts to help with this process.
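The external-traits route can be sketched as below, assuming a hypothetical LeafTraits template rather than PETE's actual interface. Generic code talks only to the traits, so a class that cannot be modified (the made-up LegacyVec) is adapted by adding a specialization instead of changing the class or its base. A simple dot product stands in for the engine's generic code; the operator functions mentioned above would be written against the same traits.

```cpp
#include <cstddef>
#include <vector>

// Stands in for an existing container whose source we cannot modify.
class LegacyVec {
public:
    explicit LegacyVec(std::size_t n) : buf_(n, 0.0) {}
    std::size_t length() const { return buf_.size(); }
    double get(std::size_t i) const { return buf_[i]; }
    void set(std::size_t i, double v) { buf_[i] = v; }
private:
    std::vector<double> buf_;
};

// Hypothetical external traits: how generic code reaches a container's
// size and elements. The primary template is left undefined on purpose.
template <class C> struct LeafTraits;

template <> struct LeafTraits<LegacyVec> {
    static std::size_t size(const LegacyVec& v) { return v.length(); }
    static double element(const LegacyVec& v, std::size_t i) { return v.get(i); }
};

template <> struct LeafTraits<std::vector<double>> {
    static std::size_t size(const std::vector<double>& v) { return v.size(); }
    static double element(const std::vector<double>& v, std::size_t i) { return v[i]; }
};

// Generic code written only against the traits accepts either container,
// in any combination, without inheritance.
template <class C1, class C2>
double dot(const C1& a, const C2& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < LeafTraits<C1>::size(a); ++i)
        sum += LeafTraits<C1>::element(a, i) * LeafTraits<C2>::element(b, i);
    return sum;
}
```

The design choice is that the primary LeafTraits template has no definition, so using an unadapted container fails at compile time rather than silently misbehaving.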

While expression evaluation is the most well-known use of expression templates, PETE provides facilities for performing arbitrary operations involving parse trees. One can define "Functors", like EvalFunctorTag1 from the example above, that perform arbitrary operations at the tree leaves and "Combiners", like OpCombineTag, that glue results together. These functors and combiners can be used, for example, to check conformance of the arrays in the expression, to print out a textual version of the expression, or to implement optimizations based on special characteristics of the arrays in the expression (like unit stride).
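Two functor/combiner pairs over the same minimal tree show how one traversal serves different purposes. All names are hypothetical and simplified, not PETE's classes: LengthLeaf with ConformCombine checks that every array in the expression has the same length, and NameLeaf with PrintCombine produces a textual version of the expression.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal tree (stand-ins, not PETE's classes).
struct OpAdd      { static const char* symbol() { return "+"; } };
struct OpMultiply { static const char* symbol() { return "*"; } };
template <class Op, class L, class R> struct BinaryNode { L left; R right; };
struct ArrayLeaf { const char* name; const std::vector<double>* data; };

// Traversal: leaves go to the functor, nodes to the combiner with an Op tag.
template <class F, class Comb>
auto forEach(const ArrayLeaf& leaf, const F& f, const Comb&) { return f(leaf); }
template <class Op, class L, class R, class F, class Comb>
auto forEach(const BinaryNode<Op, L, R>& n, const F& f, const Comb& c) {
    return c(Op{}, forEach(n.left, f, c), forEach(n.right, f, c));
}

// Conformance check: leaves report their length; the combiner keeps the
// common length, or -1 once any two operands disagree.
struct LengthLeaf {
    long operator()(const ArrayLeaf& a) const { return static_cast<long>(a.data->size()); }
};
struct ConformCombine {
    template <class Op> long operator()(Op, long a, long b) const {
        return (a >= 0 && a == b) ? a : -1;
    }
};

// Printing: leaves report their name; the combiner parenthesizes each node.
struct NameLeaf {
    std::string operator()(const ArrayLeaf& a) const { return a.name; }
};
struct PrintCombine {
    template <class Op>
    std::string operator()(Op, const std::string& a, const std::string& b) const {
        return "(" + a + " " + Op::symbol() + " " + b + ")";
    }
};
```

For B and C of length 4 and D of length 3, the tree for B + C * D yields -1 from the conformance check and prints as (B + (C * D)).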


In general, PETE is useful to writers of C++ scientific libraries who want to implement high level mathematical abstractions and simultaneously obtain high performance (for example to provide array statements in the style of FORTRAN 90 or HPF).

PETE is used in the POOMA framework. There have been plans to use it in A++/P++ as well (a package used by Overture), but the current plan is to use a separate preprocessor called ROSE.


PETE relies on "next generation" compiler technology because of its extensive use of templates. Many current-generation (and past-generation) compilers do not support the required template functionality and will not be able to compile PETE-based programs (see the status section below). Even when a compiler supports the required functionality, it often requires large amounts of time and memory to perform the necessary template expansions. This can make it impossible to compile programs that use PETE on some platforms (for instance, the Cray T3E) and expensive on several others. Indeed, the biggest hurdle in the use of expression templates is the long compilation time typically incurred.

The use of expression templates can result in very good performance, although there are cases in which their efficiency is limited. Because expression templates are built on operator overloading, they can optimize only single statements. For instance, the two statements

 A = B + C * D;
 E = A * F;

may at best be transformed into two separate loops. The PETE development team is exploring possibilities for loop fusion through statements such as

 fuse (A = B + C * D , E = A * F);
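The difference between the best case for two separate statements and what a fuse construct would target can be written out by hand. The function names are hypothetical, and the sketch only shows the two loop shapes, not how PETE would implement fusion. Fusing is valid here because E[i] reads only A[i], which the same iteration has just computed.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Best case for two separate statements: two passes over the data.
void unfused(Vec& A, Vec& E, const Vec& B, const Vec& C, const Vec& D, const Vec& F) {
    for (std::size_t i = 0; i < A.size(); ++i) A[i] = B[i] + C[i] * D[i];
    for (std::size_t i = 0; i < E.size(); ++i) E[i] = A[i] * F[i];
}

// What fusing the two statements would aim for: one pass, so A[i] is
// still in a register or cache when E[i] needs it.
void fused(Vec& A, Vec& E, const Vec& B, const Vec& C, const Vec& D, const Vec& F) {
    for (std::size_t i = 0; i < A.size(); ++i) {
        A[i] = B[i] + C[i] * D[i];
        E[i] = A[i] * F[i];
    }
}
```

Both versions produce identical A and E; the fused one simply touches each element of A once instead of twice.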

NERSC has not evaluated PETE on a realistic C++ code, partly owing to the aforementioned compilation difficulties. However, you can submit your own evaluation of PETE if you would like to.


PETE has not been installed on NERSC machines. PETE does compile on most UNIX platforms (including Linux) with KCC 3.2 or later or EGCS 1.1 or later. It compiles on the Macintosh and Windows platforms using Metrowerks CodeWarrior Professional 4.

For further information, please visit CodeSourcery.


The PETE website contains further information about the tool. The expression template technique itself is not trivial; it was invented independently by Todd Veldhuizen and David Vandevoorde.


PETE was developed at the Advanced Computing Laboratory at Los Alamos National Laboratory, originally (and still) for use within the POOMA framework, but has since been spun off as a standalone capability. Its principal developers were Steve Karmesin and Scott Haney.
