Wednesday, September 21, 2016

Define: glBufferSubData .. glNamedBufferSubData .. updates a subset of a buffer object's data store.

Name

glBufferSubData, glNamedBufferSubData 
— updates a subset of a buffer object's data store

C Specification

void glBufferSubData(GLenum target,
                     GLintptr offset,
                     GLsizeiptr size,
                     const GLvoid *data);

void glNamedBufferSubData(GLuint buffer,
                          GLintptr offset,
                          GLsizeiptr size,
                          const void *data);

Parameters

target
Specifies the target to which the buffer object is bound for glBufferSubData.
buffer
Specifies the name of the buffer object for glNamedBufferSubData.
offset
Specifies the offset into the buffer object's data store where data replacement will begin, measured in bytes.
size
Specifies the size in bytes of the data store region being replaced.
data
Specifies a pointer to the new data that will be copied into the data store.

Description

glBufferSubData and glNamedBufferSubData redefine some or all of the data store for the specified buffer object. Data starting at byte offset offset and extending for size bytes is copied to the data store from the memory pointed to by data. offset and size must define a range lying entirely within the buffer object's data store.
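
As a minimal sketch of these calls (the buffer name, sizes, and arrays are placeholders for illustration; GLEW stands in for whatever loader you use):

#include <GL/glew.h>                    /* any loader exposing GL 1.5+ will do */

static float vertices[12];              /* full data store: 12 floats          */
static float updated[4];                /* replacement for a 4-float sub-range */

void upload_example(GLuint vbo)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);

    /* Allocate the data store and fill it once. */
    glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_DYNAMIC_DRAW);

    /* Later: replace only the bytes [16, 32) -- offset and size must lie
       entirely within the existing data store. */
    glBufferSubData(GL_ARRAY_BUFFER,
                    4 * sizeof(float),  /* offset in bytes */
                    sizeof(updated),    /* size in bytes   */
                    updated);

    /* DSA variant (GL 4.5+), no bind needed:
       glNamedBufferSubData(vbo, 4 * sizeof(float), sizeof(updated), updated); */
}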

Notes

When replacing the entire data store, consider using glBufferSubData rather than completely recreating the data store with glBufferData. This avoids the cost of reallocating the data store.
Consider using multiple buffer objects to avoid stalling the rendering pipeline during data store updates. If any rendering in the pipeline makes reference to data in the buffer object being updated by glBufferSubData, especially from the specific region being updated, that rendering must drain from the pipeline before the data store can be updated.
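
One hedged reading of the multiple-buffer suggestion, with a made-up ring size and variable names: rotate through a small ring of buffer objects so glBufferSubData never touches the buffer the GPU is most likely still reading.

#define NUM_BUFFERS 3                   /* illustrative ring size */
static GLuint   vbos[NUM_BUFFERS];      /* created and allocated elsewhere */
static unsigned current;

void update_and_draw(const void *frame_data, GLsizeiptr bytes)
{
    current = (current + 1) % NUM_BUFFERS;          /* oldest buffer in the ring */
    glBindBuffer(GL_ARRAY_BUFFER, vbos[current]);
    glBufferSubData(GL_ARRAY_BUFFER, 0, bytes, frame_data);
    /* ... set vertex attribs on vbos[current] and issue the draw ... */
}
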
Clients must align data elements consistent with the requirements of the client platform, with an additional base-level requirement that an offset within a buffer to a datum comprising N bytes be a multiple of N.
The GL_ATOMIC_COUNTER_BUFFER target is available only if the GL version is 4.2 or greater.
The GL_DISPATCH_INDIRECT_BUFFER and GL_SHADER_STORAGE_BUFFER targets are available only if the GL version is 4.3 or greater.
The GL_QUERY_BUFFER target is available only if the GL version is 4.4 or greater.
_______________________________________________________________________

LNK

[00]
Keep in mind that glBufferData and glMapBuffer[Range] don't do the same thing. glBufferData actually allocates the memory for the buffer as well as setting the contents, whereas glMapBuffer(Range) only sets the data (once the buffer memory has been allocated). glBufferData is good enough for introductory material, but I would also mention glMapBuffer[Range] for the curious students who may want to know more.
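
A small sketch of that distinction (the 64 KiB size is illustrative; the buffer is assumed to be bound to GL_ARRAY_BUFFER and <string.h> included): glBufferData allocates and optionally fills the store, while glMapBuffer only hands you a pointer into storage that must already exist.

/* Allocate 64 KiB of storage; passing NULL leaves the contents undefined. */
glBufferData(GL_ARRAY_BUFFER, 65536, NULL, GL_DYNAMIC_DRAW);

/* Fill the already-allocated storage through a mapped pointer. */
void *ptr = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
if (ptr) {
    memset(ptr, 0, 65536);              /* or memcpy your data in */
    glUnmapBuffer(GL_ARRAY_BUFFER);     /* unmap before the GL reads from it */
}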

[01]
  • glBufferData is used to actually allocate the buffer (whether you fill it with data or leave it null is up to you).
  • glMapBuffer[Range] is similar to glBufferSubData() in that it allows you to update the contents of the data in the buffer. However, neither glMapBuffer nor glBufferSubData() works if you haven't called glBufferData to actually allocate the buffer.
  • You should be teaching both, as both are very important to using video buffers.
  • Lastly, the invalidate bit used in the map operations simply tells OpenGL that the contents of the existing buffer do not need to be maintained after you finish writing. This allows the GPU to avoid stalling if it's still using the existing buffer for drawing operations (a minimal sketch follows below).
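
A minimal sketch of that invalidate bit (the size is illustrative, error handling omitted): GL_MAP_INVALIDATE_BUFFER_BIT tells the driver the old contents may be discarded, so it can hand back fresh memory instead of waiting on in-flight draws.

void *dst = glMapBufferRange(GL_ARRAY_BUFFER, 0, 65536,
                             GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
if (dst) {
    /* write the new contents into dst ... */
    glUnmapBuffer(GL_ARRAY_BUFFER);
}
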
This is a complex question with a lot of small details that really matter; performance will vary based on platform and application, so you should profile for possible bottlenecks before investing in optimizations.
That said, first of all you should reduce uploads and updates as much as you can, for example by using instancing.
Secondly, note that GPUs can't transfer buffers and render at the same time, so all OpenGL commands in the command queue are processed sequentially by the device. There are different ways to copy data and/or make it available to be used by the GPU.
There are various ways to stream data to the GPU.
  • 1- The glBufferData or glBufferSubData method
Using glBufferData or glBufferSubData is like memcpy: you pass a pointer and a DMA operation might be performed. I say might because the memory may be pinned in CPU memory and used directly by the GPU without an actual transfer to GPU memory, depending on the usage flag (e.g. GL_STREAM_DRAW). In my opinion you should try this first because it's simpler to implement.
  • 2- Getting a pointer to internal memory using glMapBuffer
If the above isn't good enough, you can use glMapBuffer: you get a pointer to internal memory, and you can use this pointer to fill the buffer directly. This works well with file read and write operations, as you can map the file data straight to GPU memory rather than copying to a temporary buffer first.
If you don't want to map the whole buffer you can use glMapBufferRange, which maps only a portion of the buffer.
One trick is to create a large buffer, use the first half for rendering and the second half for updating (see the combined sketch after this list).
  • 3- Buffer Orphaning
Regarding buffer orphaning, this can be done using glBufferData with a null pointer and the same parameters the buffer had before. The driver will reclaim the old memory block once it's no longer in use, and the next glBufferData call will reuse it (no new memory will be allocated).
All the methods mentioned so far can cause a lot of expensive synchronization; again, GPUs can't transfer buffers and render at the same time.
  • 4- Unsynchronized Buffers
The fastest (and hardest to get right) method is to use buffers without synchronization: pass the GL_MAP_UNSYNCHRONIZED_BIT flag to glMapBufferRange. The problem is that no synchronization is done, so you might upload data to a buffer that is still being used for rendering and corrupt everything. You can use multiple buffers with the unsynchronized bit to make things a little bit easier (the sketch after this list covers approaches 2-4).
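
A combined, hedged sketch of approaches 2-4 above (BUF_SIZE, the half split, and the function names are invented for illustration; assumes <string.h>, a bound GL_ARRAY_BUFFER whose BUF_SIZE-byte store was already allocated with glBufferData, and error handling omitted):

#define BUF_SIZE 65536
#define HALF     (BUF_SIZE / 2)

/* 2) Map only the half that is not currently being rendered from. */
void update_back_half(const void *src, size_t bytes)      /* bytes <= HALF */
{
    void *dst = glMapBufferRange(GL_ARRAY_BUFFER, HALF, HALF,
                                 GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT);
    if (dst) {
        memcpy(dst, src, bytes);
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }
}

/* 3) Orphaning: same size and usage, NULL data -- the driver detaches the old
      block and recycles it once the GPU is done with it. */
void orphan(void)
{
    glBufferData(GL_ARRAY_BUFFER, BUF_SIZE, NULL, GL_STREAM_DRAW);
}

/* 4) Unsynchronized mapping: the driver does no waiting at all, so the caller
      must guarantee (with fences or multiple buffers) that the GPU is no
      longer reading the mapped region. */
void update_unsync(size_t offset, const void *src, size_t bytes)
{
    void *dst = glMapBufferRange(GL_ARRAY_BUFFER, (GLintptr)offset,
                                 (GLsizeiptr)bytes,
                                 GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
    if (dst) {
        memcpy(dst, src, bytes);
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }
}
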
[03][LINK]


  • Direct Memory Access, DMA

Ok, so it's been a while since I've reviewed the hardware side of this, but from what I remember:
The GPU has a DMA engine on board, that allows it to access client memory without the CPU's intervention. However this memory has to be mapped very particularly for this to be possible, and there are hard limits on how much memory can be used this way. In the AGP days it was something like 64 MB, but I don't know what it is now. I know that this number is definitely less than a 32 bit address space (4 GB) and that space is also shared amongst all devices that might like to initiate DMA transfers. There's also a question of how much space the driver is willing to set aside. Also keep in mind that the actual memory allocation is fluid and constantly adapted by the driver. Any uploads to device memory (BufferData, TexImageND, etc) would certainly require allocations in that space, which may have performance side effects.

Apart from that, I find their assertion that the bus is fast enough for it to make no difference to be suspect. By the numbers, there's tons of PCIe bandwidth, no doubt. But I don't know what kinds of internal overheads and latencies exist in fetching this data. Client memory doesn't have the bandwidth of GPU memory either. I would want this information before attempting to use client memory. This is something you'd have to go to the GPU hardware or driver people to get a clear answer on.

All in all, I would stick to the simple device/client memory model for most practical purposes. The rest should be treated as internal driver and hardware optimization, not part of the model for how things work.

Coming back to your original question: if you want to allocate a buffer in the first place, you have to use either BufferData or BufferStorage. 
  • Storage would be considered the more sophisticated way to handle it, but it's not available everywhere and it's a considerably more rigid function in usage.  
  • So I would take it as a given that you HAVE to teach them BufferData. Don't forget also that BufferSubData is hiding in there and it is NOT the same thing (indeed it's more your 'filling up function' than the regular BufferData).  
  • MapBufferRange is again a rather sophisticated function, and it isn't immediately obvious what it's capable of. 
 It's up to you whether you want to teach the regular MapBuffer on top of BufferData, which is really going to be down to your course pacing. MapBuffer isn't actually mandatory for anything, and isn't even in ES 2.0. (It's an extension on iOS devices.)

Strictly looking at performance concerns and ease of use be damned: buffer storage, persistent mapping+memcpy, manual double/triple buffering, and fences will blow the doors off all other transfer methods. This is the expected approach in the new APIs.
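
A hedged sketch of that approach, assuming a GL 4.4+ context (ARB_buffer_storage); SECTIONS, SECTION, and the function names are invented, and <string.h> plus a suitable loader are assumed:

#define SECTIONS 3                       /* triple buffering */
#define SECTION  (256 * 1024)            /* bytes per section, illustrative */

static GLuint  ring;
static char   *ring_ptr;                 /* persistent pointer, valid until deletion */
static GLsync  fences[SECTIONS];
static int     idx;

void ring_init(void)
{
    GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;

    glGenBuffers(1, &ring);
    glBindBuffer(GL_ARRAY_BUFFER, ring);
    /* Immutable storage: the allocation can never be respecified. */
    glBufferStorage(GL_ARRAY_BUFFER, SECTIONS * SECTION, NULL, flags);
    /* Map once and keep the pointer for the lifetime of the buffer. */
    ring_ptr = (char *)glMapBufferRange(GL_ARRAY_BUFFER, 0, SECTIONS * SECTION, flags);
}

void ring_write(const void *src, size_t bytes)             /* bytes <= SECTION */
{
    idx = (idx + 1) % SECTIONS;

    /* Wait until the GPU has finished with this section (fenced SECTIONS frames ago). */
    if (fences[idx]) {
        GLenum r = GL_TIMEOUT_EXPIRED;
        while (r == GL_TIMEOUT_EXPIRED)
            r = glClientWaitSync(fences[idx], GL_SYNC_FLUSH_COMMANDS_BIT, 1000000);
        glDeleteSync(fences[idx]);
    }

    memcpy(ring_ptr + idx * SECTION, src, bytes);

    /* ... issue draws that source from offset idx * SECTION ... */

    /* Fence after the draws so the next wrap-around knows when writing is safe. */
    fences[idx] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
}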


