Background
Particle System is a general name of a large number of techniques that simulate natural phenomena such as smoke, dust, fireworks, rain, etc. The common theme in all these phenomena is that they are composed of a large amount of small particles that move together in a way which is characteristic of each type of phenomenon.
In order to simulate a natural phenomenon made from particles we usually maintain the position as well as other attributes for each particle (velocity, color, etc) and perform the following steps once per frame:
- Update the attributes of each particle. This step usually involves some math calculations (ranging from very simple to very complex - depending on the complexity of the phenomenon).
- Render the particles (as simple colored points or full blown texture mapped billboard quads).
In the past step 1 usually took place on the CPU. The application would access the vertex buffer, scan its contents and update the attributes of each and every particle. Step 2 was more straightforward and took place on the GPU as any other type of rendering. There are two problems with this approach:
- Updating the particles on the CPU requires the OpenGL driver to copy the contents of the vertex buffer from the GPU memory (on discrete cards this means over the PCI bus) to the CPU memory. The phenomena that we are insterested in usually require a large amount of particles. 10,000 particles is not a rare number in that regard. If each particle takes up 64 bytes and we are running at 60 frames per second (very good frame rate) this means copying back and forth 640K from the GPU to the CPU 60 times each second. This can have an negative effect on the performance of the application. As the number of particles grows larger the effect increases.
- Updating the particle attributes means running the same mathematical formula on different data items. This is a perfect example of distributed computing that the GPU excels at. Running it on the CPU means serializing the entire update process. If our CPU is multi core we can take advantage of it and reduce the total amount of time but that requires more work from the application. Running the update process on the GPU means that we get parallel execution for free.
DirectX10 introduced a new feature known as Stream Output that is very useful for implementing particle systems. OpenGL followed in version 3.0 with the same feature and named it Transform Feedback. The idea behind this feature is that we can connect a special type of buffer (called Transform Feedback Buffer right after the GS (or the VS if the GS is absent) and send our transformed primitives to it. In addition, we can decide whether the primitives will also continue on their regular route to the rasterizer. The same buffer can be connected as a vertex buffer in the next draw and provide the vertices that were output in the previous draw as input into the next draw. This loop enables the two steps above to take place entirely on the GPU with no application involvement (other than connecting the proper buffers for each draw and setting up some state). The following diagram shows the new architecture of the pipeline:
How many primitives end up in the transform feedback buffer? well, if there is no GS the answer is simple - it is based on the number of vertices from the draw call parameters. However, if the GS is present the number of primitives is unknown. Since the GS is capable of creating and destroying primitives on the fly (and can also include loops and branches) we cannot always calculate the total number of primitives that will end up in the buffer. So how can we draw from it later when we don't know exactly the number of vertices it contains? To overcome this challenge transform feedback also introduced a new type of draw call that does not take the number of vertices as a parameter. The system automatically tracks the number of vertices for us for each buffer and later uses that number internally when the buffer is used for input. If we append several times to the transform feedback buffer (by drawing into it several times without using it as input) the number of vertices is increased accordingly. We have the option of reseting the offset inside the buffer whenever we want and the system will also reset the number of vertices.
In this tutorial we will use transform feedback in order to simulate the effect of fireworks. Fireworks are relatively easy to simulate in terms of the math involved so we will be able to focus on getting transform feedback up and running. The same framework can later be used for other types of particle systems as well.
OpenGL enforces a general limitation that the same resource cannot be bound for both input and output in the same draw call. This means that if we want to update the particles in a vertex buffer we actually need two transform feedback buffers and toggle between them. On frame 0 we will update the particles in buffer A and render the particles from buffer B and on frame 1 we will update the particles in buffer B and render the particles from buffer A. All this is transparent to the viewer.
In addition, we will also have two techniques - one technique will be responsible for updating the particles and the other for rendering. We will use the billboarding technique from the previous tutorial for rendering so make sure you are familiar with it.
... According
to OpenGL documentation, “Transform
Feedback is the process of altering the rendering pipeline so that
primitives processed by a Vertex Shader and optionally
a Geometry Shader will be written to buffer objects. This allows
one to preserve the post-transform rendering state of an object and resubmit
this data multiple times.”
“Transform
feedback lets us do computations in the shaders and record results without needing the
CPU and GPU to do work with each other. It allows all of the work to be done on
the GPU and allows previously accessed attributes easily be read again.”
Basically that
is what it does for users. It allows you to do more work on the graphics card
than one the cpu. It
allows for faster access of attributes because you will not have to constantly
unload and reload data from physical memory or RAM.
--By Alex
Ecker and Aaron Emmert
......
No comments:
Post a Comment