OpenGL: Storing Transformed Vertices
Transform Feedback (part 1)
In OpenGL, it is possible to save the results of the vertex or geometry shader into a buffer object. This is a feature known as transform feedback.
- When transform feedback is used, a specified set of attributes output from the vertex shader or geometry shader are written into a buffer.
- When no geometry shader is present (remember, geometry shaders are optional), the data comes from the vertex shader. When a geometry shader is present, the vertices generated by the geometry shader are recorded.
- The buffers used for capturing the output of vertex and geometry shaders are known as transform feedback buffers.
- Once data has been placed into a buffer using transform feedback, it can be read back using a function like glGetBufferSubData() or by mapping it into the application’s address space using glMapBuffer() and reading from it directly. It can also be used as the source of data for subsequent drawing commands.
Transform Feedback
Transform feedback is a special mode of OpenGL that allows the results of a vertex or geometry shader to be saved into a buffer.
- Once the information is present in the buffer, it can be used as a source of vertex data for more drawing commands.
- Any attribute output from the vertex or geometry shader can be stored into the buffers.
- However, you can’t simultaneously record the output of the vertex shader and the geometry shader.
- If a geometry shader is active, only the output of the geometry shader is accessible.
- If you need the raw data from the vertex shader, you need to pass it through the geometry shader unmodified.
The position of transform feedback is illustrated in Figure 1.
Figure 1. Schematic of the OpenGL pipeline, including transform feedback.
- As you can see, transform feedback buffers sit between the output of the geometry shading and vertex assembly stages. As the geometry shader is an optional stage, if it is not present, the data actually comes from the vertex shader—this is denoted by dotted lines.
- Although the diagram shows transform feedback buffers feeding the vertex assembly stage, this is only to illustrate the feedback loop that is created (hence the term transform feedback, abbreviated TFB).
- While OpenGL will allow you to bind the same buffer as a transform feedback buffer and as a vertex buffer simultaneously, the results will not be defined if you do this, and you almost certainly won’t get what you wanted.
- The set of vertex attributes, or varyings, to be recorded during transform feedback mode is specified using
void glTransformFeedbackVaryings( GLuint program, GLsizei count , const GLchar ** varyings, GLenum bufferMode);
- The first parameter is the name of a program object. The transform feedback varying state is maintained per program object. This means that different programs can record different sets of vertex attributes, even if the same vertex or geometry shaders are used in them.
- The second parameter is the number of varyings to record and is also the length of the array whose address is given in the third parameter.
- The third parameter is an array of C-style strings giving the names of the varyings to record. These are the names of the out variables in the vertex or geometry shader.
- Finally, the last parameter specifies the mode in which the varyings are to be recorded. This must be either GL_SEPARATE_ATTRIBS or GL_INTERLEAVED_ATTRIBS. If bufferMode is GL_INTERLEAVED_ATTRIBS, the varyings are recorded into a single buffer, one after another. If bufferMode is GL_SEPARATE_ATTRIBS, each of the varyings is recorded into its own buffer.
For example, suppose the vertex shader declares the following outputs:
- out vec4 vs_position_out;
- out vec4 vs_color_out;
- out vec3 vs_normal_out;
- out vec3 vs_binormal_out;
- out vec3 vs_tangent_out;
[00] To specify that the varyings vs_position_out, vs_color_out, and so on should be written into a single interleaved transform feedback buffer, the following C code could be used in your application:
- static const char * varying_names[] ={
- "vs_position_out"
- , "vs_color_out"
- , "vs_normal_out"
- , "vs_binormal_out"
- , "vs_tangent_out"
- };
glTransformFeedbackVaryings(program,5,varying_names,GL_INTERLEAVED_ATTRIBS);
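To make the interleaved layout concrete, the following is a minimal sketch that computes where each of the five varyings would land inside one captured vertex record. It assumes every component is a tightly packed 4-byte float; the `tf_varying` type and the `tf_interleaved_layout` function are hypothetical helpers for illustration, not part of OpenGL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one captured varying: its name and how many
   float components it has (vec4 = 4, vec3 = 3). */
typedef struct {
    const char *name;
    size_t      components;
} tf_varying;

/* Fills offsets[i] with the byte offset of varying i inside one interleaved
   vertex record and returns the record size (stride) in bytes.
   Assumption: components are tightly packed floats with no padding. */
size_t tf_interleaved_layout(const tf_varying *v, size_t count,
                             size_t *offsets)
{
    size_t offset = 0;
    assert(v != NULL && offsets != NULL);
    for (size_t i = 0; i < count; ++i) {
        offsets[i] = offset;
        offset += v[i].components * sizeof(float);
    }
    return offset;
}

/* The five varyings from the example, in the order they are recorded. */
const tf_varying example_varyings[5] = {
    { "vs_position_out", 4 },
    { "vs_color_out",    4 },
    { "vs_normal_out",   3 },
    { "vs_binormal_out", 3 },
    { "vs_tangent_out",  3 },
};
```

Under these assumptions each captured vertex occupies 17 floats, and the offsets are the values an application would later pass to glVertexAttribPointer() when reading the buffer back as vertex data.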
- Not all of the outputs from your vertex (or geometry) shader need to be stored into the transform feedback buffer. It is possible to save a subset of the vertex shader outputs to the transform feedback buffer and send more to the fragment shader for interpolation.
- Likewise, it is also possible to save some outputs from the vertex shader into a transform feedback buffer that are not used by the fragment shader. Because of this, outputs from the vertex shader that may have been considered inactive (because they’re not used by the fragment shader) may become active due to their being stored in a transform feedback buffer.
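- The varyings specified with glTransformFeedbackVaryings() only take effect the next time the program is linked, so specify them before calling
- glLinkProgram(program);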
[01] Once the transform feedback varyings have been specified and the program has been linked, it may be used as normal.
- Before actually capturing anything, you need to bind a buffer object as the transform feedback buffer (TFB-Buffer).
- When you have specified the transform feedback mode as GL_INTERLEAVED_ATTRIBS, all of the stored vertex attributes are written one after another into a single buffer (the TFB-Buffer).
- To specify this buffer, call
- glBindBuffer(GL_TRANSFORM_FEEDBACK_BUFFER, buffer);
- The second parameter is the name of the buffer object that we previously created with a call to glGenBuffers().
[02] Before any data can be written to the TFB-Buffer, space must be allocated in the buffer for it. To allocate space without specifying data, call
- glBufferData(GL_TRANSFORM_FEEDBACK_BUFFER, size, NULL, GL_DYNAMIC_COPY);
- The first parameter is the binding point (target) whose currently bound buffer receives the allocation. You can use any buffer binding you like just for the purpose of binding a buffer and allocating space for it. However, OpenGL might make assumptions about what the buffer is going to be used for based on the first binding point it is bound to, and so, especially if this is a new buffer, the GL_TRANSFORM_FEEDBACK_BUFFER binding point is a good choice.
The size parameter specifies how much space you want to allocate in bytes. This is up to your application’s needs, but if, during transform feedback, too much data is generated to fit into the buffer, the excess will be thrown away.
- The last parameter, usage, gives OpenGL a hint as to what you plan to do with the buffer. There are many possible values for usage, but GL_DYNAMIC_COPY is probably a good choice for a transform feedback buffer.
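Choosing the size argument is simple arithmetic once the per-vertex record size is known. The sketch below shows one way to do it; both function names are hypothetical, and the stride is assumed to be the tightly packed size of all recorded varyings for one vertex.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes to request from glBufferData() so that vertex_count records of
   stride_bytes each fit without any data being thrown away. */
size_t tf_buffer_bytes(size_t vertex_count, size_t stride_bytes)
{
    return vertex_count * stride_bytes;
}

/* Number of whole vertex records that fit into an existing allocation. */
size_t tf_buffer_capacity(size_t buffer_bytes, size_t stride_bytes)
{
    assert(stride_bytes > 0);
    return buffer_bytes / stride_bytes;
}
```

For instance, capturing 1000 vertices with a 68-byte record would need 68000 bytes; a smaller allocation silently limits how much is captured.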
[03] To specify which buffer the transform feedback data will be written to, you need to bind a buffer (the TFB-Buffer) to one of the indexed transform feedback binding points. There are actually multiple GL_TRANSFORM_FEEDBACK_BUFFER binding points for this purpose, which are conceptually separate from, but related to, the general GL_TRANSFORM_FEEDBACK_BUFFER binding point. A schematic of this is shown in Figure 2.
Figure 2. Relationship of transform feedback binding points.
[04] To bind a buffer to any of the indexed binding points, call
- glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, index, buffer);
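- As before, GL_TRANSFORM_FEEDBACK_BUFFER tells OpenGL that we’re binding a buffer object to store the results of transform feedback, and the last parameter, buffer, is the name of the buffer object to bind. The extra parameter, index, selects which of the indexed GL_TRANSFORM_FEEDBACK_BUFFER binding points to use.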
[05] An important thing to note is that there is no way to directly address any of the extra binding points provided by glBindBufferBase() through functions like glBufferData() or glCopyBufferSubData().
However, when you call glBindBufferBase(), it actually binds the buffer to the indexed binding point and to the generic binding point.
Therefore, you can use the extra binding points to allocate space in the buffer if you access the general binding point right after calling glBindBufferBase(). A slightly more advanced version of glBindBufferBase() is glBindBufferRange(), whose prototype is
- void glBindBufferRange(GLenum target, GLuint index, GLuint buffer, GLintptr offset, GLsizeiptr size);
The glBindBufferRange() function allows you to bind a section of a buffer to an indexed binding point, whereas glBindBuffer() and glBindBufferBase() can only bind the whole buffer at once. The first three parameters (target, index, and buffer) have the same meanings as in glBindBufferBase().
The offset and size parameters are used to specify the start and length of the section of the buffer that you’d like to bind, respectively. You can bind different sections of the same buffer to several different indexed binding points simultaneously. This enables you to use transform feedback in GL_SEPARATE_ATTRIBS mode to write each attribute of the output vertices into separate sections of a single buffer. If your application packs all attributes into a single vertex buffer and uses glVertexAttribPointer() to specify nonzero offsets into the buffer, this allows you to make the output of transform feedback match the input of your vertex shader.
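As an illustration of binding sections of one buffer in GL_SEPARATE_ATTRIBS mode, the following sketch computes an offset and size for each varying so that the sections sit back to back. The `tf_range` type and `tf_separate_ranges` function are hypothetical; the computed values are what an application might pass as the offset and size arguments of glBindBufferRange(), assuming float components.

```c
#include <assert.h>
#include <stddef.h>

/* One section of the shared buffer. */
typedef struct {
    size_t offset;  /* candidate for the offset argument */
    size_t size;    /* candidate for the size argument   */
} tf_range;

/* Splits one buffer into consecutive sections, one per varying, each large
   enough for vertex_count values. components[i] is the number of floats in
   varying i. Returns the total number of bytes the buffer must hold. */
size_t tf_separate_ranges(const size_t *components, size_t count,
                          size_t vertex_count, tf_range *ranges)
{
    size_t offset = 0;
    assert(components != NULL && ranges != NULL);
    for (size_t i = 0; i < count; ++i) {
        ranges[i].offset = offset;
        ranges[i].size   = components[i] * sizeof(float) * vertex_count;
        offset += ranges[i].size;
    }
    return offset;
}
```

Section i would then be bound to indexed binding point i, and the same offsets could be reused with glVertexAttribPointer() when the buffer is read back as vertex input.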
[06] If you specified that all of the attributes should be recorded into a single transform feedback buffer by using the GL_INTERLEAVED_ATTRIBS parameter to glTransformFeedbackVaryings(), the data will be written into the buffer bound to the first GL_TRANSFORM_FEEDBACK_BUFFER binding point (that with index zero).
However, if you specified that the mode for transform feedback is GL_SEPARATE_ATTRIBS, each output from the vertex shader will be recorded into its own separate buffer (or section of a buffer, if you used glBindBufferRange()). In this case, you need to bind multiple buffers or buffer sections as transform feedback buffers. The index parameter must be between zero and one less than the maximum number of varyings that can be recorded into separate buffers using transform feedback mode. This limit depends on your graphics hardware and drivers and can be found by calling glGetIntegerv() with the GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS parameter. This limit is also applied to the count parameter to glTransformFeedbackVaryings().
There is no upper limit on the number of separate varyings that can be written to transform feedback buffers in GL_INTERLEAVED_ATTRIBS mode, but there is a maximum number of components that can be written into a buffer. For example, it is possible to write more vec3s than vec4s into a buffer using transform feedback. Again, this limit depends on your graphics hardware and can be found using glGetIntegerv() with the GL_MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS parameter.
It is not possible to write one set of output varyings interleaved into one buffer while writing another set of attributes into another buffer.
- When transform feedback is active, the output varyings are either all stored, interleaved into one buffer, or stored packed into several different buffers or sections of buffers.
- Therefore, if you plan to use transform feedback to generate vertex data for subsequent passes, you need to consider this when you plan your input vertex layout. The vertex shader is generally a little more flexible in the way that it is able to read vertex data than in the way data can be written through transform feedback.
[07] Once the buffers that are to receive the results of the transform feedback have been bound, transform feedback mode is activated by calling
- void glBeginTransformFeedback(GLenum primitiveMode);
Now whenever vertices pass through a vertex or geometry shader, output varyings from the last of these stages that is active will be written to the transform feedback buffers.
- The parameter to the function, primitiveMode, tells OpenGL what types of geometry topology to expect.
- The acceptable parameters are GL_POINTS, GL_LINES, or GL_TRIANGLES. When you call glDrawArrays() or another OpenGL drawing function, the basic geometric type must match what you have specified as the transform feedback primitive mode, or you must have a geometry shader that outputs the appropriate primitive type.
- For example, if primitiveMode is GL_TRIANGLES, you must call glDrawArrays() with GL_TRIANGLES, GL_TRIANGLE_STRIP, or GL_TRIANGLE_FAN, or you must have a geometry shader that produces GL_TRIANGLE_STRIP primitives. The mapping of transform feedback primitive mode to draw types is shown in Table 1.
Table 1. Mapping of transform feedback primitive mode to allowed draw types.
Value of primitiveMode | Allowed Draw Types |
---|---|
GL_POINTS | GL_POINTS |
GL_LINES | GL_LINES, GL_LINE_STRIP, GL_LINE_LOOP |
GL_TRIANGLES | GL_TRIANGLES, GL_TRIANGLE_STRIP, GL_TRIANGLE_FAN |
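The mapping in Table 1 can be expressed as a small lookup. The sketch below also counts how many primitives a draw call assembles, on the understanding that strips, loops, and fans are recorded as independent lines or triangles rather than as connected strips. The enums are local stand-ins defined only for this example and are not the real GLenum values.

```c
#include <assert.h>

/* Local stand-ins for the draw modes in Table 1 (NOT real GLenum values). */
typedef enum {
    DRAW_POINTS, DRAW_LINES, DRAW_LINE_STRIP, DRAW_LINE_LOOP,
    DRAW_TRIANGLES, DRAW_TRIANGLE_STRIP, DRAW_TRIANGLE_FAN
} draw_mode;

/* Local stand-ins for the three transform feedback primitive modes. */
typedef enum { TF_POINTS, TF_LINES, TF_TRIANGLES } tf_mode;

/* Transform feedback primitive mode that Table 1 pairs with a draw mode. */
tf_mode tf_mode_for_draw(draw_mode mode)
{
    switch (mode) {
    case DRAW_POINTS:     return TF_POINTS;
    case DRAW_LINES:
    case DRAW_LINE_STRIP:
    case DRAW_LINE_LOOP:  return TF_LINES;
    default:              return TF_TRIANGLES;
    }
}

/* Number of independent primitives assembled from vertex_count vertices. */
unsigned primitives_in_draw(draw_mode mode, unsigned vertex_count)
{
    switch (mode) {
    case DRAW_POINTS:         return vertex_count;
    case DRAW_LINES:          return vertex_count / 2;
    case DRAW_LINE_STRIP:     return vertex_count >= 2 ? vertex_count - 1 : 0;
    case DRAW_LINE_LOOP:      return vertex_count >= 2 ? vertex_count : 0;
    case DRAW_TRIANGLES:      return vertex_count / 3;
    case DRAW_TRIANGLE_STRIP:
    case DRAW_TRIANGLE_FAN:   return vertex_count >= 3 ? vertex_count - 2 : 0;
    }
    assert(!"unknown draw mode");
    return 0;
}
```

A six-vertex triangle strip, for example, yields four triangles, so twelve vertices' worth of data would be recorded rather than six.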
[08] Vertices are recorded into the TFB-Buffer until transform feedback mode is exited or until the space allocated for the TFB-Buffer is exhausted. To exit transform feedback mode, call
- glEndTransformFeedback();
All rendering that occurs between a call to glBeginTransformFeedback() and glEndTransformFeedback() results in data being written into the currently bound transform feedback buffers. Each time glBeginTransformFeedback() is called, OpenGL starts writing data at the beginning of the buffers bound for transform feedback, overwriting what might be there already. Some care should be taken while transform feedback is active as changing transform feedback state between calls to glBeginTransformFeedback() and glEndTransformFeedback() is not allowed. For example, it’s not possible to change the transform feedback buffer bindings or to resize or reallocate any of the transform feedback buffers while transform feedback mode is active.
OpenGL: Storing Transformed Vertices
Transform Feedback (part 2)
Turning Off Rasterization
So far, you have seen that transform feedback is a mechanism to save the intermediate results of vertex or geometry shaders while OpenGL is rendering. But,
- what if you don’t want to actually draw anything?
- What if you only want to use transform feedback on its own without changing the contents of the screen?
- This is the kind of thing you may want to do if you’re using the vertex shader for computation other than geometry processing (physical simulation, for example). It is possible to use transform feedback for this purpose by turning off rasterization.
- This means that the vertex and geometry shaders will still run so that transform feedback will work, but the OpenGL pipeline will be chopped off after that and the fragment shader will not run at all.
- This is therefore more efficient than simply making a fragment shader that discards everything, or turning off color writes with glColorMask, for example.
[00] To turn off rasterization, tell OpenGL to discard all primitives before they are rasterized by calling
- glEnable(GL_RASTERIZER_DISCARD);
To turn rasterization back on, call
- glDisable(GL_RASTERIZER_DISCARD);
When GL_RASTERIZER_DISCARD is enabled, anything produced by the vertex or geometry shader (if present) does not create any fragments, and the fragment shader never runs. If you turn off rasterization and do not use transform feedback mode, the OpenGL pipeline is essentially turned off.
Counting Vertices Using Primitive Queries
[01]
- When a vertex shader but no geometry shader is present, the output from the vertex shader is recorded, and the number of vertices stored into the transform feedback is the same as the number of vertices sent to OpenGL unless the available space in any of the transform feedback buffers is exhausted.
- If a geometry shader is present, that shader may create or discard vertices and so the number of vertices written to the TFB-Buffer may be different than the number of vertices sent to OpenGL.
- OpenGL can keep track of the number of vertices written to the TFB-Buffers through query objects. The application can then use this information to draw the resulting data or to know how much to read back from the transform feedback buffer, should it want to keep the data.
- Query objects are created with glGenQueries(), which can generate one name or several at a time:
- glGenQueries(1, &one_query);
- glGenQueries(10, ten_queries);
Now that you have created your query objects, you can ask OpenGL to start counting primitives as it produces them by beginning a query of the appropriate type, either GL_PRIMITIVES_GENERATED or GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN. To start either query, call
- glBeginQuery(GL_PRIMITIVES_GENERATED, one_query);
or
- glBeginQuery(GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN, one_query);
After a call to glBeginQuery with either GL_PRIMITIVES_GENERATED or GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN, OpenGL keeps track of how many primitives were produced by the vertex or geometry shader, or how many were actually written into the TFB-Buffers, until the query is ended using
- glEndQuery(GL_PRIMITIVES_GENERATED);
or
- glEndQuery(GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN);
The results of the query can be read by calling glGetQueryObjectuiv with the GL_QUERY_RESULT parameter and the name of the query object.
As with other OpenGL queries, the result might not be available immediately because of the pipelined nature of OpenGL. To find out if the results are available, call glGetQueryObjectuiv with the GL_QUERY_RESULT_AVAILABLE parameter.
- There are a couple of subtle differences between the GL_PRIMITIVES_GENERATED and GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN queries.
- The first is that the GL_PRIMITIVES_GENERATED query counts the number of primitives emitted by the geometry shader, but the GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN query only counts primitives that were successfully written into the TFB-Buffers.
- The primitive count generated by the geometry shader may be more or less than the number of primitives sent to OpenGL, depending on what it does.
- Normally, the results of these two queries would be the same, but if not enough space is available in the transform feedback buffers, GL_PRIMITIVES_GENERATED will keep counting, but GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN will stop.
- You can check whether all of the primitives produced by your application were captured into the TFB-Buffer by running one of each query simultaneously and comparing the results.
- If they are equal, then all the primitives were successfully written. If they differ, the buffers you used for TFB were probably too small.
The second difference is that GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN is only meaningful when transform feedback is active. That is why it has TRANSFORM_FEEDBACK in its name but GL_PRIMITIVES_GENERATED does not.
If you run a GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN query when TFB is not active, the result will be zero. However, the GL_PRIMITIVES_GENERATED query can be used at any time and will produce a meaningful count of the number of primitives produced by OpenGL. You can use this to find out how many primitives your geometry shader produced or discarded.
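The relationship between the two counters can be modelled without OpenGL. The sketch below is a simplified CPU-side model, not driver behaviour: it assumes a single buffer, a fixed number of bytes per primitive, and that primitives which do not fit are dropped whole. The `tf_counts` type and both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned generated;  /* models GL_PRIMITIVES_GENERATED */
    unsigned written;    /* models GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN */
} tf_counts;

/* Models both counters for a pass that produces `primitives` primitives of
   `bytes_per_primitive` bytes each into a buffer of `buffer_bytes` bytes. */
tf_counts tf_model_queries(unsigned primitives, size_t bytes_per_primitive,
                           size_t buffer_bytes)
{
    tf_counts c;
    size_t capacity;
    assert(bytes_per_primitive > 0);
    capacity = buffer_bytes / bytes_per_primitive;
    c.generated = primitives;  /* keeps counting regardless of space */
    c.written   = primitives < capacity ? primitives : (unsigned)capacity;
    return c;
}

/* The check described above: differing counts mean primitives were lost. */
int tf_overflowed(tf_counts c)
{
    return c.written != c.generated;
}
```

In a real application the two fields would be filled from glGetQueryObjectuiv() results for two query objects that were active over the same draw calls.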
Using the Results of a Primitive Query
A query object tells you how much data is in the TFB-Buffer. Now it’s time to use those results in further rendering.
- [00] The results of the vertex or geometry shader are placed into a buffer using transform feedback.
- [01] The only thing making the buffer a TFB-Buffer is that it’s bound to one of the GL_TRANSFORM_FEEDBACK_BUFFER binding points.
- [02] However, buffers in OpenGL are generic chunks of data and can be used for other purposes.
Generally, after running a rendering pass that produces data into a transform feedback buffer, you bind the buffer object to the GL_ARRAY_BUFFER binding point so that it can be used as a vertex buffer.
If you are using a geometry shader that might produce an unknown amount of data, you need to use a GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN query to find out how many primitives were written, and from that how many vertices to render on the second pass.
Listing 1. Drawing Data Written to a Transform Feedback Buffer
// We have two buffers, buffer1 and buffer2. First, we'll bind buffer1 as the
// source of data for the draw operation (GL_ARRAY_BUFFER), and buffer2 as
// the destination for transform feedback. Transform feedback writes to the
// indexed GL_TRANSFORM_FEEDBACK_BUFFER binding points, so bind to index 0.
glBindBuffer(GL_ARRAY_BUFFER, buffer1);
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, buffer2);
// Now, we need to start a query to count how many primitives get written to
// the transform feedback buffer (with GL_POINTS, one primitive is one vertex)
glBeginQuery(GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN, q);
// Ok, start transform feedback...
glBeginTransformFeedback(GL_POINTS);
// Draw something to get data into the transform feedback buffer
DrawSomePoints();
// Done with transform feedback
glEndTransformFeedback();
// End the query and get the result back
glEndQuery(GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN);
glGetQueryObjectuiv(q, GL_QUERY_RESULT, &vertices_to_render);
// Now we bind buffer2 (which has just been used as a transform
// feedback buffer) as a vertex buffer and render some more points
// from it.
glBindBuffer(GL_ARRAY_BUFFER, buffer2);
glDrawArrays(GL_POINTS, 0, vertices_to_render);
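The listing can pass the query result straight to glDrawArrays() because it captures GL_POINTS, where each primitive is exactly one vertex. For lines or triangles the primitive count has to be scaled first. A minimal sketch of that conversion, using a hypothetical helper name:

```c
#include <assert.h>

/* Converts a GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN result into a vertex
   count for the second pass. vertices_per_primitive is 1 for points,
   2 for lines, and 3 for triangles. */
unsigned tf_vertices_to_draw(unsigned primitives_written,
                             unsigned vertices_per_primitive)
{
    assert(vertices_per_primitive >= 1 && vertices_per_primitive <= 3);
    return primitives_written * vertices_per_primitive;
}
```

With triangle capture, a query result of 40 would therefore mean 120 vertices in the second draw call.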
Example Uses for Transform Feedback
Here are a couple of examples of how you might use data stored in a transform feedback buffer. Remember, though, OpenGL is very flexible, and there are a myriad of other potential applications for transform feedback.
Storing Intermediate Results
The first example usage for transform feedback is the storage of intermediate results. You already read about instanced rendering.
Consider an algorithm that performs a set of operations per instance and then requires the results of those operations per vertex. Now imagine that you want to render many copies of the object using instanced rendering. You could set up a vertex shader that uses as its input a few instanced arrays and a few regular, per-vertex attributes. All of those per-instance calculations would have to be performed for every copy of the object, even though they produce identical results each time.
Instead of writing one large vertex shader that does all of the calculations in a single pass, it is possible to break this kind of algorithm into two passes. Write a first vertex shader that calculates the common per-instance results and writes them as a set of output varyings into a transform feedback buffer. This shader can now be run once per instance. Next, write a second vertex shader that performs the rest of the calculations (those that will be different for each copy of the object) and combines them with the intermediate results from the first vertex shader by reading the per-instance attributes using an instanced array.
Now that you have your pair of shaders, you can run the first shader once for each instance (using a regular glDrawArrays command) and then use the second to actually render each copy of the object. The first shader (the per-instance one) should be run with rasterization off (using the GL_RASTERIZER_DISCARD enable discussed earlier). This produces the intermediate results in the transform feedback buffer without actually rendering anything. Now, turn rasterization back on and render all of the individual copies of the object using the second shader and a call to one of the instanced rendering functions such as glDrawArraysInstanced.
Iterative or Recursive Algorithms
Many algorithms are recursive, recirculating results from one step to another. Physical simulations are a prime example of this type of algorithm, and transform feedback is an ideal way to produce data that is reused in subsequent passes. Because transform feedback writes data into buffers in a format that allows those buffers to be subsequently bound as vertex buffers, no conversion or copying is required between passes over the data. All that is required is a simple double-buffering scheme.
A good example of a recirculating algorithm is a particle system simulation. At each step in the simulation, each particle has a position and a velocity that must be updated. It may also have some fixed parameters such as mass, color, or any number of other attributes. To produce a simple particle system using transform feedback, each particle can be represented as a vertex and its attributes stored in vertex buffers. A vertex shader can be constructed that calculates an updated position and velocity for the particles in the system. The particle parameters that don’t change between iterations of the particle system can be stored in one vertex buffer, best allocated using the GL_STATIC_DRAW usage mode. The parameters that change between iterations should be double-buffered. One buffer is used as a vertex buffer and the source of parameters for rendering the particle system. The second buffer is bound as a transform feedback buffer and updated parameters are written into it by the vertex shader. Between each iteration, the two buffers are swapped.
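The double-buffering scheme amounts to tracking which of two buffers is currently the source. The following is one possible way to keep that bookkeeping; the `tf_ping_pong` type and its functions are hypothetical, and the buffer names stored in it are assumed to come from glGenBuffers().

```c
#include <assert.h>

/* Hypothetical bookkeeping for the two buffers holding changing parameters. */
typedef struct {
    unsigned buffers[2];  /* buffer object names */
    int      source;      /* index of the buffer read as vertex input */
} tf_ping_pong;

/* Buffer to bind as the vertex buffer for this iteration. */
unsigned tf_source(const tf_ping_pong *p)
{
    return p->buffers[p->source];
}

/* Buffer to bind as the transform feedback buffer for this iteration. */
unsigned tf_destination(const tf_ping_pong *p)
{
    return p->buffers[1 - p->source];
}

/* Call once per iteration, after transform feedback has ended, so the
   freshly written buffer becomes the next iteration's input. */
void tf_swap(tf_ping_pong *p)
{
    assert(p->source == 0 || p->source == 1);
    p->source = 1 - p->source;
}
```

Because the source and destination are always different buffers, this arrangement also avoids binding one buffer as both vertex input and transform feedback output at the same time.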
When the particle system is rendered, a time-step is passed to the vertex shader to indicate how much time has passed since the last update. The vertex shader calculates the approximate force on the particle due to its mass (gravity), input velocity (wind resistance), and any other factors important to the application; integrates the particle’s velocity over the appropriate time-step; and produces a new position and velocity.
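As a CPU-side reference for the kind of update described here, the sketch below performs one explicit Euler step. It is an assumption for illustration only: the actual vertex shader may use different forces or a different integrator. Gravity is treated as an acceleration, and `drag` is a hypothetical linear wind-resistance coefficient.

```c
#include <assert.h>

typedef struct { float x, y, z; } vec3;

typedef struct {
    vec3 position;
    vec3 velocity;
} particle_state;

/* Helper: a + b * s, component by component. */
static vec3 vec3_madd(vec3 a, vec3 b, float s)
{
    vec3 r = { a.x + b.x * s, a.y + b.y * s, a.z + b.z * s };
    return r;
}

/* One explicit Euler step over time-step dt. */
particle_state particle_step(particle_state in, float mass, vec3 gravity,
                             float drag, float dt)
{
    particle_state out;
    vec3 accel;
    assert(mass > 0.0f);
    /* force = m*g - drag*v, so acceleration = g - (drag/m)*v */
    accel = vec3_madd(gravity, in.velocity, -drag / mass);
    out.velocity = vec3_madd(in.velocity, accel, dt);
    out.position = vec3_madd(in.position, in.velocity, dt);
    return out;
}
```

In the transform feedback version, `in` would correspond to the vertex attributes read from the source buffer and `out` to the varyings recorded into the destination buffer.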
To simply render the particles as points, send the particles to OpenGL using a command such as glDrawArrays with GL_POINTS as the primitive type. You may want to only update the particle positions using transform feedback but draw something more complex at each particle (a ball, or spaceship, for example). You can do this by enabling GL_RASTERIZER_DISCARD to turn off rasterization during the update phase and then use the position data as an input to a second pass that turns the points into more complex sets of geometry for rendering on the screen.
An In-Depth Example of Transform Feedback—Flocking
Let’s combine these two examples into one and create an implementation of a flocking algorithm. Flocking algorithms show emergent behavior within a large group by updating the properties of individual members independently of all others. This kind of behavior is regularly seen in nature, and examples are swarms of bees, flocks of birds, and schools of fish apparently moving in unison even though the members of the group don’t communicate directly. The decisions made by an individual are based solely on its perception of the other members of the group. However, no collaboration is made between members over the outcome of any particular decision. This means that each group member’s new properties can be calculated in parallel—ideal for a GPU implementation.
To demonstrate both of the ideas outlined previously (storing intermediate results and iterative algorithms), we implement the flocking algorithm with a pair of vertex shaders. We represent each member of the flock as a single vertex. Each vertex has a position and a velocity updated by the first vertex shader. The result is written to a buffer using transform feedback. That buffer is then bound as a vertex buffer and used as an instanced input to the second shader. Each member of the flock is an instance in the second draw. The second vertex shader is responsible for transforming a mesh (perhaps a model of a bird) into the position and orientation calculated in the first vertex shader. The algorithm then iterates, starting again with the first vertex shader, reusing the positions and velocities calculated in the previous pass. No data leaves the graphics card’s memory, and the CPU is not involved in any calculations.
The data structures we need in this example are a set of VAOs to represent the vertex array state for each pass and a set of VBOs to hold the positions and velocities of the members within the flock and the vertex data for the model we use to represent them. The flock positions and velocities need to be double-buffered because we can’t read and write the same buffer at the same time using transform feedback. Also, because each member of the flock (vertex) needs access to the current position and velocity of all the other members of the flock, we bind the position and velocity buffers to a pair of texture buffer objects (TBOs) at the same time. That way, the vertex shader can read arbitrarily from the TBOs to access the properties of other vertices.
Figure 3 illustrates the passes that the algorithm makes.
Figure 3. Stages in the iterative flocking algorithm.
In (a), we perform the update for an even frame. The first position and velocity buffers are bound as input to the vertex shader, and the second position and velocity buffers are bound as transform feedback buffers. Notice that we also use the first set of position and velocity buffers as backing for textures (actually TBOs) that are used by the vertex shader. Next we render, in (b), using the same set of buffers as inputs as in the update pass. We use the same buffers as input in both the update and render passes so that the render pass has no dependency on the update pass. That means that OpenGL may be able to start working on the render pass before the update pass has finished. The position and velocity buffers are now instanced, and the additional geometry buffer is used to provide vertex position data.
In (c), we move to the next frame. The buffers have been exchanged—the second set of buffers is now the input to the vertex shader, and the first set is written using transform feedback. Finally, in (d), we render the odd frames. The second set of buffers is used as input to the vertex shader. Notice, though, that the flock_geometry buffer is a member of both render_vao1 and render_vao2 because the same data is used in both passes, and so we don’t need two copies of it.
The code to set all that up is shown in Listing 2. It isn’t particularly complex, but there is a fair amount of repetition, making it long. The listing contains the bulk of the initialization, with some parts omitted for brevity (those parts are indicated by *** in the comments).
Listing 2. Initializing Data Structures for the Flocking Example
// Create the four VAOs: update_vao1, update_vao2, render_vao1 and
// render_vao2. Yes, we could use an array, but for the purposes of this
// example, this is more explicit
glGenVertexArrays(1, &update_vao1);
// *** Create update_vao2, render_vao1 and render_vao2 the same way
// Create the buffer objects. We'll bind and initialize them in a moment
glGenBuffers(1, &flock_positions1);
// *** Create flock_positions2, flock_velocities1, flock_velocities2 and
// flock_geometry the same way
// Set up the VAOs and buffers – first update_vao1
glBindVertexArray(update_vao1);
glBindBuffer(GL_ARRAY_BUFFER, flock_positions1);
// *** Put some initial positions in flock_positions1 here
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, NULL);
glEnableVertexAttribArray(0);
glBindBuffer(GL_ARRAY_BUFFER, flock_velocities1);
// *** Initialize flock_velocities1 with zeroes
// (glBufferData(... NULL), glMapBuffer, memset, for example)
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 0, NULL);
glEnableVertexAttribArray(1);
// Next, update_vao2
// *** This is pretty much the same as update_vao1, except we don't need
// *** initial data for flock_positions2 or flock_velocities2 because
// *** they'll be written on the first pass. We do need to allocate them
// *** using glBufferData(... NULL), though
// Now the render VAOs – render_vao1 first
// We bind the same flock_positions1 and flock_velocities1 buffers to this
// VAO, but this time they're instanced arrays. We also bind flock_geometry
// as a regular vertex array
glBindVertexArray(render_vao1);
glBindBuffer(GL_ARRAY_BUFFER, flock_positions1);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, NULL);
glEnableVertexAttribArray(0);
glVertexAttribDivisor(0, 1);
glBindBuffer(GL_ARRAY_BUFFER, flock_velocities1);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 0, NULL);
glEnableVertexAttribArray(1);
glVertexAttribDivisor(1, 1);
glBindBuffer(GL_ARRAY_BUFFER, flock_geometry);
glVertexAttribPointer(2, 3, GL_FLOAT, GL_FALSE, 0, NULL);
glEnableVertexAttribArray(2);
// Set up render_vao2
// *** This looks just like the setup for render_vao1, except we're using
// *** flock_positions2, and flock_velocities2. Note, though, that we'd
// *** still bind flock_geometry because that doesn't change from iteration
// *** to iteration.
// Finally, set up the TBOs
glGenTextures(1, &position_texture1);
glBindTexture(GL_TEXTURE_BUFFER, position_texture1);
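// Attach the buffer's data store to the texture. GL_RGB32F matches the
// tightly packed vec3 data; three-component buffer textures need
// OpenGL 4.0 or ARB_texture_buffer_object_rgb32
glTexBuffer(GL_TEXTURE_BUFFER, GL_RGB32F, flock_positions1);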
// *** Create a buffer texture for each of flock_velocities1, flock_position2,
// *** and flock_velocities2 in the same way
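The calls to glVertexAttribDivisor in the render VAOs are what turn the position and velocity buffers into instanced arrays. As a small sketch of the indexing rule (ignoring any base instance, and using a hypothetical helper name), this C function computes which array element an attribute reads for a given vertex of a given instance:

```c
/* Which array element a vertex attribute reads.
 * A divisor of 0 means the attribute advances once per vertex.
 * A divisor of N > 0 means it advances once every N instances. */
static unsigned attrib_element(unsigned divisor, unsigned vertex,
                               unsigned instance)
{
    return divisor == 0 ? vertex : instance / divisor;
}
```

With a divisor of 1, as used for attributes 0 and 1 in render_vao1, every vertex of instance i sees element i of the flock buffers, while attribute 2 (divisor 0) walks through flock_geometry vertex by vertex.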
Once we have our buffers set up, we need to compile our shaders and link them together in a program. Before the program is linked, we need to bind the attributes in the vertex shader to the appropriate locations so that they match the vertex arrays that we set up. We also need to tell OpenGL which varyings we’re planning on writing to the transform feedback buffers. Listing 3 shows how the vertex attributes and transform feedback varyings are initialized.
Listing 3. Initializing Attributes and Transform Feedback for the Flocking Example
// *** Assume we've created our vertex and fragment shaders, compiled them
// *** and attached them to our program object.
// First, we'll set up the attributes in the update program
glBindAttribLocation(update_program, 0, "position");
glBindAttribLocation(update_program, 1, "velocity");
// Now the rendering program. The first two attributes are actually the
// same as those written by the update_program. The third is the position
// of the vertices in the geometry
glBindAttribLocation(render_program, 0, "instance_position");
glBindAttribLocation(render_program, 1, "instance_velocity");
// The name must match the geometry input declared in the render shader
glBindAttribLocation(render_program, 2, "position");
// Now we set up the transform feedback varyings:
static const char * tf_varyings[] = { "position_out", "velocity_out" };
glTransformFeedbackVaryings(update_program, 2, tf_varyings,
GL_SEPARATE_ATTRIBS);
// Now, everything's set up so we can go ahead and link our program objects
glLinkProgram(update_program);
glLinkProgram(render_program);
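To make the effect of GL_SEPARATE_ATTRIBS concrete, here is a small C sketch of where a captured value lands in each mode. It assumes every varying is a vec3 of floats, as in this example, and that the buffers are bound at offset zero; the TfLocation struct and tf_location function are hypothetical names used only for illustration.

```c
#include <stddef.h>

typedef struct {
    unsigned binding;  /* transform feedback buffer binding index */
    size_t   offset;   /* byte offset of the value within that buffer */
} TfLocation;

/* Where varying number `varying` of captured vertex `vertex` is written,
 * assuming all varyings are vec3 floats. */
static TfLocation tf_location(int interleaved, unsigned varying_count,
                              unsigned varying, size_t vertex)
{
    const size_t vec3_size = 3 * sizeof(float);
    TfLocation loc;
    if (interleaved) {
        /* GL_INTERLEAVED_ATTRIBS: all varyings packed back to back
         * in the buffer bound to index 0 */
        loc.binding = 0;
        loc.offset  = (vertex * varying_count + varying) * vec3_size;
    } else {
        /* GL_SEPARATE_ATTRIBS: varying k goes to binding index k,
         * tightly packed on its own */
        loc.binding = varying;
        loc.offset  = vertex * vec3_size;
    }
    return loc;
}
```

In separate mode the output buffers have the same tightly packed vec3 layout as the input vertex arrays, which is what lets them be rebound as vertex buffers without any conversion.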
Now we need a rendering loop to update our flock positions and draw the members of the flock. It’s actually pretty simple, now that we have our data encapsulated in VAOs. The rendering loop is shown in Listing 4.
Listing 4. The Rendering Loop for the Flocking Example
// Make the update program current
glUseProgram(update_program);
// We use one set of buffers as shader inputs, and another as transform
// feedback buffers to hold the shader outputs. On alternating frames,
// we'll swap the two around
if (frame_index & 1) {
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, flock_positions1);
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 1, flock_velocities1);
glBindVertexArray(update_vao2);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_BUFFER, position_texture2);
glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_BUFFER, velocity_texture2);
} else {
// *** This is the same again, only using flock_positions2 and
// *** flock_velocities2 as transform feedback buffers, and update_vao1,
// *** position_texture1 and velocity_texture1 as shader inputs
}
// Turn off rasterization (enable rasterizer discard)
glEnable(GL_RASTERIZER_DISCARD);
// Start transform feedback – record updated positions
glBeginTransformFeedback(GL_POINTS);
// Draw arrays – one point for each member of the flock
glDrawArrays(GL_POINTS, 0, flock_size);
// Done with transform feedback
glEndTransformFeedback();
// Ok, now we'll draw everything. Need to turn rasterization back on.
glDisable(GL_RASTERIZER_DISCARD);
// Use the rendering program
glUseProgram(render_program);
if (frame_index & 1) {
glBindVertexArray(render_vao2);
} else {
glBindVertexArray(render_vao1);
}
// Do an instanced draw – each member is an instance. The data updated
// by the 'update_program' on the last frame is now an instanced array
// in the render_program
glDrawArraysInstanced(GL_TRIANGLES, 0, 50, flock_size);
frame_index++;
That’s pretty much the interesting part of the program side. Let’s take a look at the shader side of things. The flocking algorithm works by applying a set of rules for each member of the flock to decide which direction to travel in. Each rule considers the current properties of the flock member and the properties of the other members of the flock as perceived by the individual being updated. Most of the rules require access to the other member’s position and velocity data, so update_program uses a pair of TBOs to read from the buffers containing that information. Listing 5 shows the start of the update vertex shader.
Listing 5. Declarations at the Start of the Flocking Update Vertex Shader
#version 150
precision highp float;
// These are the input attributes
in vec3 position;
in vec3 velocity;
// These get written to transform feedback buffers
out vec3 position_out;
out vec3 velocity_out;
// These are the TBOs that are mapped to the same buffers as position
// and velocity
uniform samplerBuffer tex_position;
uniform samplerBuffer tex_velocity;
// The number of members in the flock
uniform int flock_size;
// Parameters for the simulation
uniform Parameters
{
// *** Put all the simulation parameters here
} parameters;
The main body of the program is simple. We simply read the position and velocity of the other members of the flock, apply each rule in turn, sum up the resulting vector, and output an updated position and velocity. Code to do this is given in Listing 6.
Listing 6. Main Body of the Flocking Update Vertex Shader
void main(void)
{
vec3 other_position;
vec3 other_velocity;
vec3 acceleration = vec3(0.0);
int i;
for (i = 0; i < flock_size; i++) {
other_position = texelFetch(tex_position, i).xyz;
other_velocity = texelFetch(tex_velocity, i).xyz;
acceleration += rule1(position, velocity,
other_position, other_velocity);
acceleration += rule2(position, velocity,
other_position, other_velocity);
// *** And so on... we can apply as many rules as we want.
// *** Three or four is enough to produce a convincing
// *** simulation
}
position_out = position + velocity;
velocity_out = velocity + acceleration / float(flock_size);
}
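For checking the shader's results, it can help to have a CPU counterpart of this update pass. The following C sketch mirrors the structure of the shader under a few assumptions: the rules are passed in as function pointers, the FlockRule type and flock_update name are hypothetical, and sample_rule is a stand-in rule rather than one of the rules from this article.

```c
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* A rule maps (my position, my velocity, other position, other velocity)
 * to an acceleration contribution. */
typedef Vec3 (*FlockRule)(Vec3, Vec3, Vec3, Vec3);

static Vec3 vec3_add(Vec3 a, Vec3 b)
{
    Vec3 r = { a.x + b.x, a.y + b.y, a.z + b.z };
    return r;
}

static Vec3 vec3_scale(Vec3 a, float s)
{
    Vec3 r = { a.x * s, a.y * s, a.z * s };
    return r;
}

/* Stand-in rule for illustration: steer toward the other member's velocity. */
static Vec3 sample_rule(Vec3 my_pos, Vec3 my_vel, Vec3 their_pos, Vec3 their_vel)
{
    (void)my_pos;
    (void)their_pos;
    return vec3_add(their_vel, vec3_scale(my_vel, -1.0f));
}

/* Reads only the *_in arrays and writes only the *_out arrays, so each
 * member can be processed independently of the others. */
static void flock_update(const Vec3 *pos_in, const Vec3 *vel_in,
                         Vec3 *pos_out, Vec3 *vel_out, size_t flock_size,
                         const FlockRule *rules, size_t rule_count)
{
    for (size_t me = 0; me < flock_size; me++) {
        Vec3 acceleration = { 0.0f, 0.0f, 0.0f };
        for (size_t other = 0; other < flock_size; other++)
            for (size_t r = 0; r < rule_count; r++)
                acceleration = vec3_add(acceleration,
                    rules[r](pos_in[me], vel_in[me],
                             pos_in[other], vel_in[other]));
        pos_out[me] = vec3_add(pos_in[me], vel_in[me]);
        vel_out[me] = vec3_add(vel_in[me],
            vec3_scale(acceleration, 1.0f / (float)flock_size));
    }
}
```

The separate input and output arrays play the same role as the double-buffered position and velocity buffers: no member ever sees a partially updated flock.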
Listing 7 contains the shader code for the first rule. If we’re closer to another member than we’re supposed to be, we simply move away from that member:
Listing 7. The First Rule of Flocking
vec3 rule1(vec3 my_position, vec3 my_velocity,
vec3 their_position, vec3 their_velocity)
{
vec3 d = my_position - their_position;
if (dot(d, d) < parameters.closest_allowed_position)
return d * parameters.rule1_weight;
return vec3(0.0);
}
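One detail worth noting is that the shader compares dot(d, d), which is a squared length, against parameters.closest_allowed_position, so that parameter presumably holds a squared distance. The C sketch below shows the same rule with the squaring made explicit. The separation function and its min_distance and weight arguments are hypothetical names chosen for this illustration.

```c
typedef struct { float x, y, z; } Vec3;

static float vec3_dot(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Separation rule: push away from a neighbor that is closer than
 * min_distance. Comparing squared lengths avoids a square root per pair. */
static Vec3 separation(Vec3 my_position, Vec3 their_position,
                       float min_distance, float weight)
{
    Vec3 d = { my_position.x - their_position.x,
               my_position.y - their_position.y,
               my_position.z - their_position.z };
    Vec3 result = { 0.0f, 0.0f, 0.0f };
    if (vec3_dot(d, d) < min_distance * min_distance) {
        result.x = d.x * weight;
        result.y = d.y * weight;
        result.z = d.z * weight;
    }
    return result;
}
```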
Here’s the shader code for the second rule (see Listing 8). It returns a change in velocity weighted by the inverse square of the distance from the other member.
Listing 8. The Second Rule of Flocking
vec3 rule2(vec3 my_position, vec3 my_velocity,
vec3 their_position, vec3 their_velocity)
{
vec3 d = their_position - my_position;
vec3 dv = (their_velocity - my_velocity);
return parameters.rule2_weight *
dv / (dot(d, d) + 1.0);
}
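To see how quickly that inverse-square weighting falls off, here is a C sketch of the rule as the text describes it, using the squared distance between the two members. The velocity_match name and the explicit weight argument are assumptions for illustration. The added 1.0 keeps the divisor away from zero when two members coincide.

```c
typedef struct { float x, y, z; } Vec3;

/* Velocity-matching rule: steer toward the neighbor's velocity, with an
 * influence that shrinks with the squared distance to that neighbor. */
static Vec3 velocity_match(Vec3 my_position, Vec3 my_velocity,
                           Vec3 their_position, Vec3 their_velocity,
                           float weight)
{
    Vec3 d = { their_position.x - my_position.x,
               their_position.y - my_position.y,
               their_position.z - my_position.z };
    float falloff = weight / (d.x * d.x + d.y * d.y + d.z * d.z + 1.0f);
    Vec3 result = { (their_velocity.x - my_velocity.x) * falloff,
                    (their_velocity.y - my_velocity.y) * falloff,
                    (their_velocity.z - my_velocity.z) * falloff };
    return result;
}
```

With a weight of 1, a neighbor at distance 1 contributes half of the velocity difference, while a neighbor at distance 3 contributes only a tenth.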
Putting all this together along with any other rules we want to implement completes the update part of the program. Now we need to produce the second vertex shader—the one responsible for rendering the flock. This uses the position and velocity data as instanced arrays and transforms a fixed set of vertices into position based on the position and velocity of the individual member. Listing 9 shows the inputs to the shader.
Listing 9. Declarations of Inputs to the Flocking Rendering Vertex Shader
#version 150
precision highp float;
// These are the instanced arrays
in vec3 instance_position;
in vec3 instance_velocity;
// The regular geometry array
in vec3 position;
// Model-view-projection matrix applied in the shader body
uniform mat4 mvp;
The body of our shader (given in Listing 10) simply transforms the mesh represented by position into the correct orientation and location for the particular instance.
Listing 10. Flocking Vertex Shader Body
void main(void)
{
// rotate_to_match is a function that rotates a point
// (position) around the origin to match a direction vector
// (instance_velocity)
vec3 local_position = rotate_to_match(position, instance_velocity);
gl_Position = mvp * vec4(instance_position + local_position, 1.0);
}
...