Comparing Isosurface With and Without Scatter Classes

Jump to: navigation, search

We have recently added scatter classes to VTK-m, which allows building worklets that have a variable amount of output rather than a 1:1 relationship. This is particularly important for algorithms like isosurface that require specifying some amount of output depending on values of the input. Before these scatter classes were introduced, algorithms like isosurface had to build their own indices and then perform their own cross lookup. In addition to being inconvenient and difficult to read, it also meant that the algorithm could not take advantage of other VTK-m features available in the dispatcher/transport/fetch location.

This document compares the implementation of isosurface just before moving to scatter and just after.

The Code

The classify cell part of the isosurface algorithm has remained essentially the same, so we'll ignore that part. The implementation of the generate part of the algorithm has changed significantly in this transformation. We will look at the code in parts and compare the implementations directly.

Worklet State

Because the original implementation had to build its own indices and perform its own data lookups, it had to store array portals in the state of the worklet as well as auxiliary metadata so that the data could be reconstructed in the body of the worklet. In contrast, the new implementation can leverage VTK-m's built in ability to perform these data lookups. All that needs to be specified in the state is the Scatter class being used.

    const FieldType Isovalue;
    vtkm::Id xdim, ydim, zdim;
    const float xmin, ymin, zmin, xmax, ymax, zmax;

    typedef typename vtkm::cont::ArrayHandle<FieldType>::
        template ExecutionTypes<DeviceAdapter>::PortalConst FieldPortalType;
    FieldPortalType Field, Source;

    typedef typename vtkm::cont::ArrayHandle<vtkm::Vec<FieldType, 3> >::
        template ExecutionTypes<DeviceAdapter>::Portal VectorPortalType;
    VectorPortalType Vertices;
    VectorPortalType Normals;

    typedef typename vtkm::cont::ArrayHandle<FieldType>::
        template ExecutionTypes<DeviceAdapter>::Portal OutputPortalType;
    OutputPortalType Scalars;

    typedef typename vtkm::cont::ArrayHandle<vtkm::Id> IdArrayHandle;
    typedef typename IdArrayHandle::ExecutionTypes<DeviceAdapter>::PortalConst
    IdPortalType TriTable;

    const vtkm::Id cellsPerLayer, pointsPerLayer;
    const FieldType Isovalue;
    ScatterType Scatter;
Original Implementation New implementation

As the amount of state has dramatically reduced, so has the the complexity of the constructor.

    template<typename U, typename W, typename X>
    IsoSurfaceGenerate(FieldType ivalue, const vtkm::Id3 cdims,
                       IdPortalType triTablePortal,
                       const U & field, const U & source,
                       const W & vertices, const W & normals,
                       const X & scalars) :
      xdim(cdims[0]), ydim(cdims[1]), zdim(cdims[2]),
      xmin(-1), ymin(-1), zmin(-1), xmax(1), ymax(1), zmax(1),
      Field( field.PrepareForInput( DeviceAdapter() ) ),
      Source( source.PrepareForInput( DeviceAdapter() ) ),
      cellsPerLayer(xdim * ydim),
      pointsPerLayer ((xdim+1)*(ydim+1))
    template<typename CountArrayType, typename Device>
    IsoSurfaceGenerate(FieldType isovalue,
                       const CountArrayType &countArray,
      : Isovalue(isovalue), Scatter(countArray, Device()) {  }
Original Implementation New implementation


Because the new implementation passes data through the dispatcher's invoke interface rather than as array state. Consequently, the interface of the ControlSignature and ExecutionSignature are much longer in the new interface, but they also specify more features of the transport/fetch.

    typedef void ControlSignature(FieldIn<IdType> inputCellId,
                                  FieldIn<IdType> inputIteration);
    typedef void ExecutionSignature(WorkIndex, _1, _2);
    typedef void ControlSignature(
        TopologyIn topology, // Cell set
        FieldInPoint<> fieldIn, // Input point field defining the contour
        FieldInPoint<Vec3> pcoordIn, // Input point coordinates
        FieldOutCell<> vertexOut, // Vertices for output triangles
        // TODO: Have a better way to iterate over and interpolate fields
        FieldInPoint<Scalar> scalarsIn, // Scalars to interpolate
        FieldOutCell<> scalarsOut, // Interpolated scalars (one per tri vertex)
        FieldOutCell<> normalsOut, // Estimated normals (one per tri vertex)
        ExecObject TriTable // An array portal with the triangle table
    typedef void ExecutionSignature(
        CellShape, _2, _3, _4, _5, _6, _7, _8, VisitIndex);
Original Implementation New implementation

Likewise, the interface to the parentheses operator has grown commensurately.

    void operator()(vtkm::Id outputCellId,
                    vtkm::Id inputCellId,
                    vtkm::Id inputLowerBounds) const
    template<typename CellShapeTag,
             typename FieldInType, // Vec-like, one per input point
             typename CoordType, // Vec-like (one per input point) of Vec-3
             typename VertexOutType, // Vec-3 of Vec-3 coordinates (for triangle)
             typename ScalarInType, // Vec-like, one per input point
             typename ScalarOutType, // Vec-3 (one value per tri vertex)
             typename NormalOutType, // Vec-3 of Vec-3
             typename TriTablePortalType> // Array portal
    void operator()(
        CellShapeTag shape,
        const FieldInType &fieldIn, // Input point field defining the contour
        const CoordType &coords, // Input point coordinates
        VertexOutType &vertexOut, // Vertices for output triangles
        // TODO: Have a better way to iterate over and interpolate fields
        const ScalarInType &scalarsIn, // Scalars to interpolate
        ScalarOutType &scalarsOut, // Interpolated scalars (one per tri vertex)
        NormalOutType &normalsOut, // Estimated normals (one per tri vertex)
        const TriTablePortalType &triTable, // An array portal with the triangle table
        vtkm::IdComponent visitIndex
        ) const
Original Implementation New implementation

The Meat

Of course, the meat of the worklet algorithm is the code inside the parentheses operator. The new implementation of this code is quite simplified because the operations to compute indices and load data are completely removed. Likewise the point coordinates do not have to be directly computed either. The only data not directly provided is the Marching Cubes case.

      // Get data for this cell
      const int verticesForEdge[] = { 0, 1, 1, 2, 3, 2, 0, 3,
                                      4, 5, 5, 6, 7, 6, 4, 7,
                                      0, 4, 1, 5, 2, 6, 3, 7 };

      const vtkm::Id x = inputCellId % xdim;
      const vtkm::Id y = (inputCellId / xdim) % ydim;
      const vtkm::Id z = inputCellId / cellsPerLayer;

      // Compute indices for the eight vertices of this cell
      const vtkm::Id i0 = x    + y*(xdim+1) + z * pointsPerLayer;
      const vtkm::Id i1 = i0   + 1;
      const vtkm::Id i2 = i0   + 1 + (xdim + 1); //xdim is cell dim
      const vtkm::Id i3 = i0   + (xdim + 1);     //xdim is cell dim
      const vtkm::Id i4 = i0   + pointsPerLayer;
      const vtkm::Id i5 = i1   + pointsPerLayer;
      const vtkm::Id i6 = i2   + pointsPerLayer;
      const vtkm::Id i7 = i3   + pointsPerLayer;

      // Get the field values at these eight vertices
      FieldType f[8];
      f[0] = this->Field.Get(i0);
      f[1] = this->Field.Get(i1);
      f[2] = this->Field.Get(i2);
      f[3] = this->Field.Get(i3);
      f[4] = this->Field.Get(i4);
      f[5] = this->Field.Get(i5);
      f[6] = this->Field.Get(i6);
      f[7] = this->Field.Get(i7);

      // Compute the Marching Cubes case number for this cell
      unsigned int cubeindex = 0;
      cubeindex += (f[0] > this->Isovalue);
      cubeindex += (f[1] > this->Isovalue)*2;
      cubeindex += (f[2] > this->Isovalue)*4;
      cubeindex += (f[3] > this->Isovalue)*8;
      cubeindex += (f[4] > this->Isovalue)*16;
      cubeindex += (f[5] > this->Isovalue)*32;
      cubeindex += (f[6] > this->Isovalue)*64;
      cubeindex += (f[7] > this->Isovalue)*128;

      // Compute the coordinates of the uniform regular grid at each of the cell's eight vertices
      vtkm::Vec<FieldType, 3> p[8];

      // If we have offset and spacing, can we simplify this computation
        vtkm::Vec<FieldType, 3> offset = vtkm::make_Vec(xmin+(xmax-xmin),
                                                        zmin+(zmax-zmin) );

        vtkm::Vec<FieldType, 3> spacing = vtkm::make_Vec( 1.0f /((float)(xdim-1)),
                                                          1.0f /((float)(ydim-1)),
                                                          1.0f /((float)(zdim-1)));

        vtkm::Vec<FieldType, 3> firstPoint = offset * spacing *  vtkm::make_Vec( x, y, z );
        vtkm::Vec<FieldType, 3> secondPoint = offset * spacing * vtkm::make_Vec( x+1, y+1, z+1 );

        p[0] = vtkm::make_Vec( firstPoint[0],   firstPoint[1],   firstPoint[2]);
        p[1] = vtkm::make_Vec( secondPoint[0],  firstPoint[1],   firstPoint[2]);
        p[2] = vtkm::make_Vec( secondPoint[0],  secondPoint[1],  firstPoint[2]);
        p[3] = vtkm::make_Vec( firstPoint[0],   secondPoint[1],  firstPoint[2]);
        p[4] = vtkm::make_Vec( firstPoint[0],   firstPoint[1],   secondPoint[2]);
        p[5] = vtkm::make_Vec( secondPoint[0],  firstPoint[1],   secondPoint[2]);
        p[6] = vtkm::make_Vec( secondPoint[0],  secondPoint[1],  secondPoint[2]);
        p[7] = vtkm::make_Vec( firstPoint[0],   secondPoint[1],  secondPoint[2]);

      // Get the scalar source values at the eight vertices
      FieldType s[8];
      s[0] = this->Source.Get(i0);
      s[1] = this->Source.Get(i1);
      s[2] = this->Source.Get(i2);
      s[3] = this->Source.Get(i3);
      s[4] = this->Source.Get(i4);
      s[5] = this->Source.Get(i5);
      s[6] = this->Source.Get(i6);
      s[7] = this->Source.Get(i7);
      // Get data for this cell
      const vtkm::IdComponent verticesForEdge[] = { 0, 1, 1, 2, 3, 2, 0, 3,
                                                    4, 5, 5, 6, 7, 6, 4, 7,
                                                    0, 4, 1, 5, 2, 6, 3, 7 };

      // Compute the Marching Cubes case number for this cell
      vtkm::IdComponent cubeindex = 0;
      cubeindex += (fieldIn[0] > this->Isovalue);
      cubeindex += (fieldIn[1] > this->Isovalue)*2;
      cubeindex += (fieldIn[2] > this->Isovalue)*4;
      cubeindex += (fieldIn[3] > this->Isovalue)*8;
      cubeindex += (fieldIn[4] > this->Isovalue)*16;
      cubeindex += (fieldIn[5] > this->Isovalue)*32;
      cubeindex += (fieldIn[6] > this->Isovalue)*64;
      cubeindex += (fieldIn[7] > this->Isovalue)*128;
Original Implementation New implementation

Note that although the new code is significantly smaller, it corrects a major bug in the original implementation where the point coordinates were not being computed correctly. This is because we are using the shared code in the VTK-m library that we know is consistent and correct.

Another advantage of this approach is that we have removed all the structured-grid-specific code. This code is ripe for converting to being used for other types of grids (like explicit cell sets). It will be fairly trivial to add cases for other types of cells as well.

The remainder of the code is interpolating values on the edges. This code is largely the same. Note, however, that I have changed most of the variable names to be more expressive (for code readability), which makes the code extend a few more lines. Also note that I changed the normal computation. The original code used the normal for the triangle generated whereas the new code uses to gradient of the 3D field. The gradient is a bit more accurate.

      // Interpolate for vertex positions and associated scalar values
      const vtkm::Id inputIteration = (outputCellId - inputLowerBounds);
      const vtkm::Id outputVertId = outputCellId * 3;
      const vtkm::Id cellOffset = static_cast<vtkm::Id>(cubeindex*16) + (inputIteration * 3);
      for (int v = 0; v < 3; v++)
        const vtkm::Id edge = this->TriTable.Get(cellOffset + v);
        const int v0   = verticesForEdge[2*edge];
        const int v1   = verticesForEdge[2*edge + 1];
        const FieldType t  = (this->Isovalue - f[v0]) / (f[v1] - f[v0]);
        this->Vertices.Set(outputVertId + v, vtkm::Lerp(p[v0], p[v1], t));
        this->Scalars.Set(outputVertId + v, vtkm::Lerp(s[v0], s[v1], t));

      vtkm::Vec<FieldType, 3> vertex0 = this->Vertices.Get(outputVertId + 0);
      vtkm::Vec<FieldType, 3> vertex1 = this->Vertices.Get(outputVertId + 1);
      vtkm::Vec<FieldType, 3> vertex2 = this->Vertices.Get(outputVertId + 2);

      vtkm::Vec<FieldType, 3> curNorm = vtkm::Cross(vertex1-vertex0, vertex2-vertex0);
      this->Normals.Set(outputVertId + 0, curNorm);
      this->Normals.Set(outputVertId + 1, curNorm);
      this->Normals.Set(outputVertId + 2, curNorm);
      // Interpolate for vertex positions and associated scalar values
      const vtkm::Id triTableOffset =
          static_cast<vtkm::Id>(cubeindex*16 + visitIndex*3);
      for (vtkm::IdComponent triVertex = 0;
           triVertex < 3;
        const vtkm::IdComponent edgeIndex =
            triTable.Get(triTableOffset + triVertex);
        const vtkm::IdComponent edgeVertex0 = verticesForEdge[2*edgeIndex + 0];
        const vtkm::IdComponent edgeVertex1 = verticesForEdge[2*edgeIndex + 1];
        const FieldType fieldValue0 = fieldIn[edgeVertex0];
        const FieldType fieldValue1 = fieldIn[edgeVertex1];
        const FieldType interpolant =
            (this->Isovalue - fieldValue0) / (fieldValue1 - fieldValue0);
        vertexOut[triVertex] = vtkm::Lerp(coords[edgeVertex0],
        scalarsOut[triVertex] = vtkm::Lerp(scalarsIn[edgeVertex0],
        const vtkm::Vec<vtkm::FloatDefault,3> edgePCoord0 =
              fieldIn.GetNumberOfComponents(), edgeVertex0, shape, *this);
        const vtkm::Vec<vtkm::FloatDefault,3> edgePCoord1 =
              fieldIn.GetNumberOfComponents(), edgeVertex1, shape, *this);
        const vtkm::Vec<vtkm::FloatDefault,3> interpPCoord =
            vtkm::Lerp(edgePCoord0, edgePCoord1, interpolant);
        normalsOut[triVertex] =
                           fieldIn, coords, interpPCoord, shape, *this));
Original Implementation New implementation

The Control-Side

The code on the control side is also greatly simplified because the index generation is removed because it is now all performed internally by the scatter class.

    typedef typename vtkm::cont::DeviceAdapterAlgorithm<DeviceAdapter> DeviceAlgorithms;

    // Set up the Marching Cubes case tables
    vtkm::cont::ArrayHandle<vtkm::Id> vertexTableArray =
    vtkm::cont::ArrayHandle<vtkm::Id> triangleTableArray =

    // Call the ClassifyCell functor to compute the Marching Cubes case numbers
    // for each cell, and the number of vertices to be generated
    ClassifyCell classifyCell(vertexTableArray.PrepareForInput(DeviceAdapter()),

    typedef typename vtkm::worklet::DispatcherMapTopology<
                                      DeviceAdapter> ClassifyCellDispatcher;
    ClassifyCellDispatcher classifyCellDispatcher(classifyCell);

    vtkm::cont::ArrayHandle<vtkm::Id> numOutputTrisPerCell;
    classifyCellDispatcher.Invoke( field,

    // Compute the number of valid input cells and those ids
    const vtkm::Id numOutputCells = DeviceAlgorithms::ScanInclusive(numOutputTrisPerCell,

    // Terminate if no cells has triangles left
    if (numOutputCells == 0) return;

    vtkm::cont::ArrayHandle<vtkm::Id> validCellIndicesArray, inputCellIterationNumber;
    vtkm::cont::ArrayHandleIndex validCellCountImplicitArray(numOutputCells);

    // Compute for each output triangle what iteration of the input cell generates it

    // Generate a single triangle per cell
    const vtkm::Id numTotalVertices = numOutputCells * 3;

    IsoSurfaceGenerate isosurface(isovalue,
                                  verticesArray.PrepareForOutput(numTotalVertices, DeviceAdapter()),
                                  normalsArray.PrepareForOutput(numTotalVertices, DeviceAdapter()),
                                  scalarsArray.PrepareForOutput(numTotalVertices, DeviceAdapter())

    typedef typename vtkm::worklet::DispatcherMapField< IsoSurfaceGenerate,
                                                        DeviceAdapter> IsoSurfaceDispatcher;
    IsoSurfaceDispatcher isosurfaceDispatcher(isosurface);
    // Set up the Marching Cubes case tables
    vtkm::cont::ArrayHandle<vtkm::IdComponent> vertexTableArray =
    vtkm::cont::ArrayHandle<vtkm::IdComponent> triangleTableArray =

    typedef vtkm::exec::ExecutionWholeArrayConst<vtkm::IdComponent, VTKM_DEFAULT_STORAGE_TAG, DeviceAdapter>

    // Call the ClassifyCell functor to compute the Marching Cubes case numbers
    // for each cell, and the number of vertices to be generated
    ClassifyCell classifyCell(isovalue);

    typedef typename vtkm::worklet::DispatcherMapTopology<
                                      DeviceAdapter> ClassifyCellDispatcher;
    ClassifyCellDispatcher classifyCellDispatcher(classifyCell);

    vtkm::cont::ArrayHandle<vtkm::IdComponent> numOutputTrisPerCell;
    classifyCellDispatcher.Invoke( field,

    IsoSurfaceGenerate isosurface(isovalue, numOutputTrisPerCell, DeviceAdapter());

    vtkm::worklet::DispatcherMapTopology<IsoSurfaceGenerate, DeviceAdapter>
          // Currently forcing cell set to be structured. Eventually we should
          // relax this as we support other grid types.
            vtkm::ListTagBase<vtkm::cont::CellSetStructured<3> >()),
          field, // This is silly. The field will interp to isovalue
Original Implementation New implementation


This change shifts the structure of the code quite a bit, so we should be careful not to tank performance. To test this, I have measured the performance of the Marching Cubes algorithm on a couple of testbeds at Sandia. The first uses a Haswell Intel CPU containing 32 cores with 2x hyperthreading (64 threads all total). The second uses an NVIDIA Tesla K80 GPU. All experiments were run on the isosurface of the supernova data set with 10 isovalues ranging from 0 to 0.045. Each isosurface was executed 10 times.

These tests were run using a version of VTK-m taken just before the addition of Scatter objects (SHA 4a167dec0a01aaff40c54f15c179707c1cefb044) and one after the update of the Marching Cubes algorithm to the Scatter classes (SHA 1898ab473ce2a352de9341c322e69557105f400a).

The following table gives the summary of the average times. The average is taken for each code version and hardware architecture over all trials and all isosurface values.

Device Code Version Average Time
Haswell Original 0.151 s
Scatter 0.140 s
Tesla K80 Original 0.058 s
Scatter 0.054 s

As you can see, using the scatter classes is just as efficient as the original code. If anything, the changes have improved the performance a little bit.

Next Steps

The new code structure will make it easier to implement isosurfaces for more general data sets. That is surely the next step.

An improvement that can be made to the VTK-m framework is to better support geometry generation. There are still some by-hand things being done (such as creating ArrayHandleGroupVec arrays for the triangle outputs). It would be better if this was managed by the dispatcher. This implementation also generates a triangle soup. There should be a way to weld the vertices to make a manifold surface. We could add something specific to isosurface, but this is a common enough operation in visualization that there should be support at the framework level.