CPU performance

There is a great deal of flexibility in the SDK for tuning CPU and GPU performance. This page outlines some approaches. For information on optimizing memory usage, see Keeping Memory Usage Low.

Tree cell sizes

Pick a tree cull cell size that suits your forest configuration. The cell size determines how many cells the SDK has to process in the rough and fine culling stages, and it pulls in two directions. Larger cells mean fewer cells to test, so the set of visible cells is found more quickly. But once that set is known, the SDK must loop through the 3D instances of every cell that intersects the frustum and is closer than the billboard distance to determine their individual visibility, and smaller cells shorten that loop. The reference application, whose world units are feet, uses a default cell size of 1200.0 feet (set by the world::3dtree_cell_size parameter in the .sfc file), which we believe strikes a good balance for the example forest's population density. Since every forest is different and may use different units, you should experiment with different sizes.
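
The tradeoff can be put into rough numbers before profiling. The following is a minimal sketch, not SDK code: the CullEstimate struct, the EstimateCullWork function, and the density figure are hypothetical, and the model assumes a uniform tree density over flat ground. It shows only that the number of cells to test falls with the square of the cell size while the per-cell instance loop grows with it.

```cpp
#include <cassert>

// Hypothetical back-of-the-envelope model of one cull pass (not part of the SDK).
struct CullEstimate
{
    double cellsTested;       // cells the cull stages must consider
    double instancesPerCell;  // 3D instances looped over in each visible near cell
};

// visibleArea: ground area covered by the frustum, in world units squared
// density:     tree instances per world unit squared (assumed uniform)
// cellSize:    edge length of a square cull cell, in world units
inline CullEstimate EstimateCullWork(double visibleArea, double density, double cellSize)
{
    assert(visibleArea >= 0.0 && density >= 0.0 && cellSize > 0.0);
    const double cellArea = cellSize * cellSize;
    return { visibleArea / cellArea, density * cellArea };
}
```

For example, halving the cell size from 1200.0 to 600.0 quadruples cellsTested and cuts instancesPerCell to a quarter. Which side dominates depends on your forest, so treat the estimate as a starting point for experiments rather than a prediction.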

Grass cell sizes

You can pick a different cell size for each grass type, and the choice can greatly affect both CPU and GPU performance. Each grass model is culled in a separate call, so separate sizes are easily accommodated. In contrast to tree instances, grass instances are not individually culled: if a cell is in the frustum, every instance in it is rendered. This greatly reduces CPU usage but can be hard on the GPU at high densities, because instances that fall outside the frustum but inside a visible cell are still drawn. Smaller cells limit that waste, so we recommend smaller cell sizes for high-density grass and larger sizes for sparser populations like rocks or boulders.

By way of example, the reference application's Plantation forest (modeled in feet) uses cell sizes of 20 for the high-density grass, and up to 100 for the rocks and butterfly models.

Impact of 3D trees

The more 3D trees that appear on screen (as opposed to billboards), the more level of detail (LOD) computations the CPU will have to do. This includes updating the instance vertex buffers. Keeping the billboard transition line as close to the camera as possible helps both GPU and CPU costs. With larger billboard textures, a good texture filter, and global billboard wind, billboards can get pretty close to the camera without losing too much quality.

Draw distance

A large draw distance greatly impacts performance. For the most part, the 3D tree rendering system is unaffected, but the CPU load needed to stream billboards in and out of a large frustum will take its toll. The volume of a frustum grows with the cube of its far distance, and the ground area it covers grows with the square, so the number of visible billboards can get out of hand very quickly. We've routinely tested with one- and two-mile visibilities, but they do come at a price.
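
To get a feel for how quickly the billboard count grows with draw distance, the sketch below estimates the ground area covered by the view frustum and multiplies it by a tree density. It is an illustration only: FootprintArea and EstimateBillboards are hypothetical helpers, and the model assumes flat terrain, a uniform density, a top-down triangular footprint, and no near plane or 3D-tree region.

```cpp
#include <cassert>
#include <cmath>

// Area of the triangular ground footprint of a frustum seen from above:
// apex at the camera, height farDistance, base 2 * farDistance * tan(fov / 2).
inline double FootprintArea(double farDistance, double horizontalFovRadians)
{
    assert(farDistance >= 0.0);
    assert(horizontalFovRadians > 0.0 && horizontalFovRadians < 3.14159265358979323846);
    return farDistance * farDistance * std::tan(horizontalFovRadians * 0.5);
}

// Estimated number of trees inside the footprint for a uniform density
// (trees per world unit squared).
inline double EstimateBillboards(double farDistance, double horizontalFovRadians,
                                 double treesPerUnitArea)
{
    return FootprintArea(farDistance, horizontalFovRadians) * treesPerUnitArea;
}
```

Under these assumptions, going from a one-mile (5280-foot) to a two-mile visibility roughly quadruples the number of billboards to stream and draw.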

Grass densities

Nothing can kill performance faster than high grass densities. Keep populations as sparse as possible, keep the pixel effects as low as possible, and keep the LOD range reasonable. Remember that the grass models have only a single LOD level, so whichever effect you choose will be used for every instance of that type.

Application-side data

Keep app data SDK-friendly. The entire streaming and culling system is based on organizing trees into evenly spaced cells. It is important for performance reasons that the application can quickly populate the cells provided by the SDK. This mostly means not wasting cycles during a render loop determining which instances go into which cells. We provide the example class CMyInstancesContainer, defined in the reference application in MyPopulate.h/cpp. It shows how to quickly and easily organize an existing population of base trees and instances into cells so that they can be quickly passed into the SDK.
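
As an illustration of the idea, and not a copy of CMyInstancesContainer, the sketch below buckets a flat list of instances into square cells once, up front, so that a per-cell lookup during the render loop is a single hash-table query. Every name in it (Instance, CellKey, CellMap, BuildCells, Lookup) is hypothetical, and a real instance would carry more than a position.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

struct Instance { float x, y; };  // hypothetical: position only

struct CellKey
{
    std::int32_t row, col;
    bool operator==(const CellKey& o) const { return row == o.row && col == o.col; }
};

struct CellKeyHash
{
    std::size_t operator()(const CellKey& k) const
    {
        // Pack both 32-bit coordinates into one 64-bit value and hash that.
        const std::uint64_t packed =
            (static_cast<std::uint64_t>(static_cast<std::uint32_t>(k.row)) << 32) |
            static_cast<std::uint32_t>(k.col);
        return std::hash<std::uint64_t>{}(packed);
    }
};

using CellMap = std::unordered_map<CellKey, std::vector<Instance>, CellKeyHash>;

inline CellKey CellOf(const Instance& i, float cellSize)
{
    // floor (not truncation) so negative coordinates land in the correct cell
    return { static_cast<std::int32_t>(std::floor(i.y / cellSize)),
             static_cast<std::int32_t>(std::floor(i.x / cellSize)) };
}

// Done once at load time, not inside the render loop.
inline CellMap BuildCells(const std::vector<Instance>& all, float cellSize)
{
    assert(cellSize > 0.0f);
    CellMap cells;
    for (const Instance& i : all)
        cells[CellOf(i, cellSize)].push_back(i);
    return cells;
}

// Per-frame lookup: returns nullptr when the cell holds no instances.
inline const std::vector<Instance>* Lookup(const CellMap& cells, CellKey key)
{
    const auto it = cells.find(key);
    return it == cells.end() ? nullptr : &it->second;
}
```

The point of the design is that the sorting cost is paid once. When a cell needs populating, the application hands over an already-built vector instead of scanning the whole population.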

Populating grass

Grass-populating code is critical. Try to precompute as much about the grass instances as possible. Profiling your population function is recommended. During the development of our reference application, we found that while we were using a random generator that provides float values very quickly, it was slow to reseed. As a result, we avoid reseeding it per cell.
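
One way to get repeatable per-cell placement without reseeding a heavyweight generator is to hash the cell coordinates and an instance counter with a stateless integer mixer. This is a different technique from whatever the reference application does internally, shown only as a sketch: Mix64 uses the SplitMix64 mixing constants, and GrassInstance and PopulateGrassCell are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stateless 64-bit mixer (SplitMix64 constants). There is no seed state to reset.
inline std::uint64_t Mix64(std::uint64_t z)
{
    z += 0x9E3779B97F4A7C15ull;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ull;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBull;
    return z ^ (z >> 31);
}

// Map the top 24 bits of a hash to a float in [0, 1).
inline float UnitFloat(std::uint64_t h)
{
    return static_cast<float>(h >> 40) * (1.0f / 16777216.0f);
}

struct GrassInstance { float x, y, rotation; };  // hypothetical layout

// Fills `out` with `count` instances inside cell (row, col).
// The same cell always yields the same instances.
inline void PopulateGrassCell(std::int32_t row, std::int32_t col, float cellSize,
                              int count, std::vector<GrassInstance>& out)
{
    assert(cellSize > 0.0f && count >= 0);
    out.clear();
    out.reserve(static_cast<std::size_t>(count));
    const std::uint64_t cellSeed = Mix64(
        (static_cast<std::uint64_t>(static_cast<std::uint32_t>(row)) << 32) |
        static_cast<std::uint32_t>(col));
    for (int n = 0; n < count; ++n)
    {
        // Three consecutive counters per instance: x offset, y offset, rotation.
        const std::uint64_t base = cellSeed + 3ull * static_cast<std::uint64_t>(n);
        GrassInstance g;
        g.x = (static_cast<float>(col) + UnitFloat(Mix64(base))) * cellSize;
        g.y = (static_cast<float>(row) + UnitFloat(Mix64(base + 1))) * cellSize;
        g.rotation = UnitFloat(Mix64(base + 2)) * 6.2831853f;  // radians
        out.push_back(g);
    }
}
```

Because the output depends only on the cell coordinates and the counter, a cell that streams out and back in gets the same grass, and the output vector can be reused across calls to avoid reallocation. Profile it against your own generator before adopting it.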

Parallel culling and streaming

The culling and streaming functions are thread safe. Opportunities for parallel culling/streaming include the following:

  • Trees and grass in separate threads
  • Each grass layer in a separate thread
  • A thread per light view (for example, for use with a cascaded shadow map)

Note: At the Render Interface library level, these updates involve instance vertex buffer updates.
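
The per-layer option above can be sketched with standard C++ futures. Nothing here is an SDK call: Layer and CullLayer are hypothetical stand-ins for a grass layer and its cull/stream call, used only to show the pattern of launching one task per layer and joining them all before rendering.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical stand-in for one grass layer: distances to its cells
// and the distance beyond which a cell is culled.
struct Layer
{
    std::vector<float> cellDistances;
    float cullDistance;
};

// Stand-in for a per-layer cull call: counts the cells within the cull distance.
// It only reads its own layer, so layers can be processed concurrently.
inline int CullLayer(const Layer& layer)
{
    assert(layer.cullDistance >= 0.0f);
    int visible = 0;
    for (float d : layer.cellDistances)
        if (d <= layer.cullDistance)
            ++visible;
    return visible;
}

// Launch one asynchronous task per layer, then wait for every result.
inline std::vector<int> CullAllLayers(const std::vector<Layer>& layers)
{
    std::vector<std::future<int>> jobs;
    jobs.reserve(layers.size());
    for (const Layer& layer : layers)
        jobs.push_back(std::async(std::launch::async,
                                  [&layer] { return CullLayer(layer); }));

    std::vector<int> visibleCounts;
    visibleCounts.reserve(jobs.size());
    for (std::future<int>& job : jobs)
        visibleCounts.push_back(job.get());  // blocks until that layer is done
    return visibleCounts;
}
```

A production engine would more likely submit these tasks to its own job system than create threads each frame. The structure is the same either way: independent work per layer, with a single join point before the results are used.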

Stalls

The instance vertex buffers are double buffered (or better). The number of buffers is defined as c_nNumInstBuffers in Include/SpeedTree/RenderInterface/ForestRI.h and can be adjusted easily. However, even with double buffering, a buffer update can sometimes cause a wait-on-GPU condition, and the SDK counts that wait as CPU time. Spikes in the SDK's reported cull/stream time are almost always due to this.
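
When investigating such spikes, it can help to log the reported cull/stream time per frame and flag the outliers. The helper below is a hypothetical application-side sketch, not an SDK facility: FindSpikes simply marks frames whose time exceeds a chosen multiple of the mean of the recorded samples.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the indices of frames whose time exceeds `factor` times the mean
// of all recorded samples. Times are in milliseconds.
inline std::vector<std::size_t> FindSpikes(const std::vector<double>& frameTimesMs,
                                           double factor)
{
    assert(factor > 1.0);
    std::vector<std::size_t> spikes;
    if (frameTimesMs.empty())
        return spikes;

    double sum = 0.0;
    for (double t : frameTimesMs)
        sum += t;
    const double mean = sum / static_cast<double>(frameTimesMs.size());

    for (std::size_t i = 0; i < frameTimesMs.size(); ++i)
        if (frameTimesMs[i] > factor * mean)
            spikes.push_back(i);
    return spikes;
}
```

If the flagged frames coincide with instance buffer updates, raising the buffer count is one thing to try.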