Saturday, September 24, 2016

Define: random floats in GPU-Server side

The following essay is quoted in full; thanks to the author :)


float rand(vec2 co){
  return fract(sin(dot(co.xy ,vec2(12.9898,78.233))) * 43758.5453);
}



Welcome to the third Lumina tutorial 
This tutorial is not Lumina-specific; it is about generating procedural noise with 
GLSL. These algorithms resemble other well-known ones such as "Perlin Noise" in some 
respects and differ from them in others. 

1. Random 
What is noise? The best example is the snow on a TV. Each pixel is set to a random 
value, so each photo of that screen looks different from every other. In practice there 
is a simple answer to that problem: record the noise on video tape, take photos of the 
freeze-frame, and each one will look like the others... 

When we want to use random noise to describe materials, it is important that the 
random numbers are reproducible. These numbers aren't true random numbers, because 
nothing here is actually random: the same input always produces the same output. 

How to create good and reproducible random numbers? 
That is a big problem. The only thing that is always known is the input: a position in 
an N-dimensional space. A formula that works well on GPUs is: 

random = fract(sin(in.x * 12.9898 + in.y * 78.233.......) * 43758.5453); 

This compiles to a few instructions. The first step combines all input dimensions into 
one value; that is just a dot product. The second instruction is sin(), cos() or tan() 
(all three work well). The third is the multiplication by a big value, and the last 
takes the fractional part. 

The noise algorithms sample the pseudo-random numbers at regular intervals. This 
undersampling, together with taking the fractional part of sin(), produces random 
numbers that are good enough for graphics (but not for cryptography). 

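This is a simple random fragment shader: 
float rand(vec2 co){
        // hash the 2D coordinate into a pseudo-random value
        return fract(sin(dot(co.xy ,vec2(12.9898,78.233))) * 43758.5453);
}

void main(void){
        // gl_FragColor is a vec4, so the float is expanded to a grey colour
        gl_FragColor = vec4(vec3(rand(gl_TexCoord[0].xy)), 1.0);
}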

It creates TV-like snow. The output range is 0.0 to 1.0 (unlike GLSL's built-in 
noise1(genType) family, which has a range of -1.0 to 1.0 but returns 0.0 on many cards). 

2. Noise 

Random pixels alone aren't a material. To create one we have to undersample the random 
numbers: 
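float rand(vec2 co){
        return fract(sin(dot(co.xy ,vec2(12.9898,78.233))) * 43758.5453);
}

float noise(vec2 co){
        // snap to a 128x128 grid so every pixel in a cell gets the same value
        return rand(floor(co * 128.0));
}

void main(void){
        // gl_FragColor is a vec4, so the float is expanded to a grey colour
        gl_FragColor = vec4(vec3(noise(gl_TexCoord[0].xy)), 1.0);
}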

That looks like a nearest filtered noise texture. The next step is adding a simple 
bilinear filter: 
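vec4 rand(vec2 A,vec2 B,vec2 C,vec2 D){
        // hash four coordinates at once so the work fits into vec4 operations
        vec2 s = vec2 (12.9898,78.233);
        vec4 tmp = vec4( dot(A,s),dot(B,s),dot(C,s),dot(D,s));
        return fract(tan(tmp)  * 43758.5453);
}

float noise(vec2 coord,float d){
        // the four corners of the grid cell (cell size 1.0/d) that contains coord
        vec2 C[4];
        C[0] = floor( coord * d)/d ;
        C[1] = C[0] + vec2(1.0/d ,0.0  );
        C[2] = C[0] + vec2(1.0/d ,1.0/d);
        C[3] = C[0] + vec2(0.0   ,1.0/d);

        // bilinear weights from the position inside the cell
        vec2 p = fract(coord * d);
        vec2 q = 1.0 - p;
        vec4 w = vec4(q.x * q.y, p.x * q.y, p.x * p.y, q.x * p.y);

        return dot(vec4(rand(C[0],C[1],C[2],C[3])),w);
}

void main(void){
        // gl_FragColor is a vec4, so the float is expanded to a grey colour
        gl_FragColor = vec4(vec3(noise(gl_TexCoord[0].xy,128.0 )), 1.0);
}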

This shader is slightly optimized for vector-based GPUs. "cgc" will compile that code 
to 29 instructions. 

3. Better filtered noise 

The previous implementation has poor filtering quality when the procedural material is 
undersampled by the rasterizer. A solution is to add a noise octave only if it will be 
oversampled by the rasterizer. The calculation is exactly the same as for a mipmap 
level. This means that filtered noise has the same anisotropic filtering problems as a 
texture. 


vec4 rand(vec2 A,vec2 B,vec2 C,vec2 D){
        vec2 s=vec2(12.9898,78.233);
        vec4 tmp=vec4(dot(A,s),dot(B,s),dot(C,s),dot(D,s));
        // remap from 0.0..1.0 to -1.0..1.0 so the octaves add up around zero
        return fract(sin(tmp) * 43758.5453)* 2.0 - 1.0;
}

float noise(vec2 coord,float d){
        // the four corners of the grid cell (cell size d1) that contains coord
        vec2 C[4];
        float d1 = 1.0/d;
        C[0]=floor(coord*d)*d1;
        C[1]=C[0]+vec2(d1,0.0);
        C[2]=C[0]+vec2(d1,d1);
        C[3]=C[0]+vec2(0.0,d1);

        // bilinear weights from the position inside the cell
        vec2 p=fract(coord*d);
        vec2 q=1.0-p;
        vec4 w=vec4(q.x*q.y,p.x*q.y,p.x*p.y,q.x*p.y);
        return dot(vec4(rand(C[0],C[1],C[2],C[3])),w);
}

void main(void){
        // same calculation as for a mipmap level
        float level= -log2(min(length(dFdx(gl_TexCoord[0].st)),length(dFdy(gl_TexCoord[0].st))));
        level = min(level,13.0); //limit the level to avoid slowness
        float n = 0.5;
        for(int i = 3; float(i) < level;i++){ //3 is the lowest noise level
                n +=  0.12 * noise(gl_TexCoord[0].xy, exp2(float(i)));
        }

        // gl_FragColor is a vec4, so the float is expanded to a grey colour
        gl_FragColor = vec4(vec3(n), 1.0);
}


 
A bilinear filtered noise with mipmap like filtering. Notice the washed out structure on 
the tilted surface. Only an anisotropic filtering algorithm would help. 

Now it's possible to add trilinear-like filtering by modifying the main function: 
void main(void){
        float level= -1.0 -log2(min(length(dFdx(gl_TexCoord[0].st)),length(dFdy(gl_TexCoord[0].st))));
        //-1.0 is a bias shift to avoid flickering

        level = min(level,16.0); //limit the level. Equivalent to a 65536x65536 texture
        float n = 0.5;
        for(int i = 3; float(i) < level;i++){
                n +=  0.12 * noise(gl_TexCoord[0].xy, exp2(float(i)));
        }

        // add the last level multiplied with the fraction of the
        // level calculation for a smooth filtering
        n +=  0.12 * noise(gl_TexCoord[0].xy, exp2(floor(level)+1.0)) * fract(level);

        gl_FragColor =  max(0.0,sin (n* 12.0))* vec4(0.3,0.8,0.4,0.0) + vec4(0.3,0.3,0.5,0.0) * n;
}


 



4. Texture based noise 
The previous noise versions aren't very fast. Using textures has some advantages, such 
as working anisotropic filtering and four independent values per sample. The input 
texture should be small (smaller textures are cached better). It can be generated 
easily with a small program or GIMP: 

 

The shader uses parts from the previous version: 
uniform sampler2D random;

void main(void){
        float level= -0.5 -log2(min(length(dFdx(gl_TexCoord[0].st)),length(dFdy(gl_TexCoord[0].st))));
        //-0.5 is a bias shift to avoid flickering

        level = min(level,20.0); //limit the level. Equivalent to a 1Mx1M texture
        vec4 n = vec4(0.5, 0.5, 0.5, 1.0);

        vec2 coords =  gl_TexCoord[0].xy / 8.0; // note1

        for(int i = 3; float(i) < level;i++){
                n +=  0.32 * (texture2D(random, coords)-0.5) ;//-0.05;
                coords *= 2.0; // each octave doubles the frequency
        }

        // add the last level weighted by the fraction of the level
        n +=  0.32 * (texture2D(random, coords)- 0.5) * fract(level);

        gl_FragColor =  n;
}

note1: The mipmap levels 3, 4 and 5 don't wrap correctly, because only a part of the 
texture is used for these low-resolution levels. The best solution is to start the for 
loop at log2(texture_size) and remove the divisor. The low-frequency part can easily be 
replaced by a 64x64 texture with mipmaps. 
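
The change described in note1 might look roughly like the sketch below. It is only an illustration under some assumptions: the uniform randomSize is a hypothetical name that does not appear in the tutorial, the noise texture is assumed to be square with a power-of-two size and repeat wrapping, and the separate low-frequency texture mentioned in the note is left out.

```glsl
uniform sampler2D random;
// Hypothetical uniform, not in the original: edge length of the noise
// texture in texels (for example 64.0). The texture is assumed to use
// repeat wrapping.
uniform float randomSize;

void main(void){
        float level = -0.5 - log2(min(length(dFdx(gl_TexCoord[0].st)),
                                      length(dFdy(gl_TexCoord[0].st))));
        level = min(level, 20.0);
        vec4 n = vec4(0.5, 0.5, 0.5, 1.0);

        // No divisor: one copy of the texture spans one unit of texture
        // coordinates, so even the coarsest level wraps at the borders.
        vec2 coords = gl_TexCoord[0].xy;

        // The first level is log2(texture_size). Rounding guards against
        // log2() returning a value slightly below the exact integer.
        int first = int(floor(log2(randomSize) + 0.5));

        for(int i = first; float(i) < level; i++){
                n += 0.32 * (texture2D(random, coords) - 0.5);
                coords *= 2.0; // each octave doubles the frequency
        }

        // Blend in the next level by the fractional part, as before.
        n += 0.32 * (texture2D(random, coords) - 0.5) * fract(level);

        gl_FragColor = n;
}
```

With a 64x64 texture the loop would start at level 6, so the octaves that previously came from levels 3 to 5 are missing here and would have to be supplied by the mipmapped low-frequency texture the note suggests.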

 
For the GF8 it's a good idea to use an anisotropic filter. 2x AF would be essentially 
free, because the TMUs are able to filter two bilinear samples at the same time. 






_____________________________________________________________________________________

Here is another one

Random Floats in GLSL 330

I had some fun putting together a random number generator in GLSL.

I wanted a set of functions with the following specs:
  • Outputs a float in the half-open range [0:1)  (a fraction)
  • Outputs all variations in the range given sufficient input variation
  • Multidimensional variation: accepts between 1 and 4 arbitrary float inputs
  • Uniform distribution
  • No apparent patterns
  • Self-contained: no texture lookup
  • Portable: identical results on all GPUs
  • Low overhead
Fairly stringent requirements.  I think I managed to pull it off though.

The key to making a decent set of functions was realising that in a shader - which has no state - a random function is actually a kind of hash function.  It returns a uniformly distributed output value which varies wildly in value given small changes in input.

But we're dealing with floats.   Floats which according to the GLSL spec explicitly lack precision guarantees.  Writing a hashing function in terms of floats isn't going to work very well and isn't likely to be portable either.  Also, writing a good hashing function is a difficult task in itself.  I'd rather use an existing integer hash with well defined properties.

That's what we're going to do.  So that we only have to hash integers, we're going to move the bits of the float directly into an unsigned integer, hash it, and move the bits back into a float - accompanied by a little bit-twiddling of course.  For this purpose GLSL 330 has floatBitsToUint and uintBitsToFloat.  This way the PRNG will be capable of dealing with any input value without range or precision being a concern.

The GLSL spec guarantees that float is in IEEE-754 binary32 format and also that uint is a 32-bit unsigned integer.  Which means we can do bit-twiddling tricks without worrying about compatibility issues.

Here's the algorithm I came up with:
  1. Hash all inputs together into a uint
  2. AND away everything but the low 23 bits
  3. Construct a valid float where the mantissa bits are all zero (value 1.0)
  4. OR the constructed float with the 23-bit uint to get range [1:2)
  5. With float arithmetic, subtract 1.0 to get desired [0:1) range
This probably sounds quite confusing if you're not intimately familiar with the binary32 format.  I'll explain it more thoroughly after I post the actual code.

Here's our integer hash function, a single iteration of Bob Jenkins' OAT algorithm.  This is what we're going to mix up our bits with.
uint hash( uint x ) {
    x += ( x << 10u );
    x ^= ( x >>  6u );
    x += ( x <<  3u );
    x ^= ( x >> 11u );
    x += ( x << 15u );
    return x;
}
Now for the bit-twiddling random() function.  Let's keep it simple and stick with one input for now:
float random( float f ) {
    const uint mantissaMask = 0x007FFFFFu;
    const uint one          = 0x3F800000u;
    
    uint h = hash( floatBitsToUint( f ) );
    h &= mantissaMask;
    h |= one;
    
    float  r2 = uintBitsToFloat( h );
    return r2 - 1.0;
}
What the hell is going on here?  To explain this I have to first explain some aspects of the floating point format:
The mantissa of a binary32 float is a 23-bit binary fraction in the half-open range [0:1).  There's an implicit leading 1 in the mantissa so the effective value of the mantissa is always 1.0 + fraction.  The absolute value of a (normalised) float can be calculated as:
(1.0 + fraction) * power(2,exponent)
This means that the float value 1.0 has both an exponent and mantissa of zero (the exponent field stores that zero with a bias, as 127, which is where the 0x3F800000 constant comes from).  Zero mantissa means 23 zero bits.  As the value increases from 1.0 towards 2.0 only those mantissa bits change - they count upwards the same as a 23-bit unsigned integer.  By exploiting this property we can easily construct a float in that range using only bitwise operations.  Stick those bits back inside a float, subtract 1.0, and you've got a float in the desired range of [0:1).
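
To make the range concrete, here is a small sketch that builds three floats straight from bit patterns and shifts them down by 1.0. It is only an illustration: the variable names are made up for this example, the output name fragment is borrowed from the snippets in this post, and the values in the comments assume IEEE-754 binary32 behaviour as described above.

```glsl
#version 330 core

out vec4 fragment;

void main() {
    // 0x3F800000: sign 0, exponent field 127, mantissa all zeros -> 1.0
    float lowest  = uintBitsToFloat( 0x3F800000u ) - 1.0;  // 0.0

    // 0x3FC00000: same exponent, only the top mantissa bit set -> 1.5
    float middle  = uintBitsToFloat( 0x3FC00000u ) - 1.0;  // 0.5

    // 0x3FFFFFFF: same exponent, mantissa all ones -> 2.0 - 2^-23
    float highest = uintBitsToFloat( 0x3FFFFFFFu ) - 1.0;  // 1.0 - 2^-23

    // Show the three values in the red, green and blue channels.
    fragment = vec4( lowest, middle, highest, 1.0 );
}
```

The largest possible result is just below 1.0, which is why the range is half-open.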

To get multidimensional variants of this function we simply combine more inputs into the hash function.



Let's try our one-dimensional variant first:
float r = random( gl_FragCoord.x );
fragment = vec4( r,r,r, 1.0 );
This should vary the luminance randomly over the X axis.  Result:


Already we can see some good indications.  The average luminance is about 50%, which is what you'd expect if the distribution of the output is roughly uniform and covers the full range.

But it's not possible to tell whether there are nasty patterns using only one-dimensional input.  Let's make some multi-dimensional variants of the hash function:
uint hash( uvec2 v ) {
    return hash( v.x ^ hash(v.y) );
}

uint hash( uvec3 v ) {
    return hash( v.x ^ hash(v.y) ^ hash(v.z) );
}

uint hash( uvec4 v ) {
    return hash( v.x ^ hash(v.y) ^ hash(v.z) ^ hash(v.w) );
}
The way I combined the inputs is totally arbitrary and rather lazy.  I just experimented with them until I found something that worked on my test inputs.  There's probably a better and faster way of doing this, like unrolling the OAT algorithm separately for each one.

Making versions of random() which use these functions is very simple so I'll spare you the wall of text.
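
For readers who would rather see them written out, the variants might look something like the sketch below. This is an assumption about their shape rather than the exact code from the post: floatConstruct is a hypothetical helper name introduced here to hold the mask-and-subtract step, and the time uniform is assumed to be set by the application each frame. It relies on floatBitsToUint working component-wise, so a vec2 goes in and a uvec2 comes out.

```glsl
#version 330 core

// One round of the integer hash shown earlier.
uint hash( uint x ) {
    x += ( x << 10u );
    x ^= ( x >>  6u );
    x += ( x <<  3u );
    x ^= ( x >> 11u );
    x += ( x << 15u );
    return x;
}

// Input combiners, as given above.
uint hash( uvec2 v ) { return hash( v.x ^ hash(v.y) ); }
uint hash( uvec3 v ) { return hash( v.x ^ hash(v.y) ^ hash(v.z) ); }
uint hash( uvec4 v ) { return hash( v.x ^ hash(v.y) ^ hash(v.z) ^ hash(v.w) ); }

// Hypothetical helper (not named in the post): keep the low 23 bits as the
// mantissa, OR in the bit pattern of 1.0, then shift [1:2) down to [0:1).
float floatConstruct( uint m ) {
    const uint mantissaMask = 0x007FFFFFu;
    const uint one          = 0x3F800000u;
    return uintBitsToFloat( ( m & mantissaMask ) | one ) - 1.0;
}

// One overload per input size; each reinterprets the float bits as uints,
// hashes them down to a single uint and builds the result from it.
float random( float x ) { return floatConstruct( hash( floatBitsToUint( x ) ) ); }
float random( vec2  v ) { return floatConstruct( hash( floatBitsToUint( v ) ) ); }
float random( vec3  v ) { return floatConstruct( hash( floatBitsToUint( v ) ) ); }
float random( vec4  v ) { return floatConstruct( hash( floatBitsToUint( v ) ) ); }

uniform float time;   // assumed to be supplied by the application
out vec4 fragment;

void main() {
    // Spatial and temporal inputs combined into one random value.
    float r = random( vec3( gl_FragCoord.xy, time ) );
    fragment = vec4( r, r, r, 1.0 );
}
```

Putting the bit-twiddling into one helper keeps the four overloads down to a line each, so the only thing that differs between them is which hash combiner gets picked.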



Anyway, let's try some 2D randomness:
float r = random( gl_FragCoord.xy );
fragment = vec4( r,r,r, 1.0 );

Great - not a pattern to be seen.  Bad RNGs create visual artifacts which spoil the effect, typically diagonal lines.  Something that looks exactly like static is the best possible outcome.



Our noise is still stationary though.  If you want the value to differ each frame then you need another input: time.  You can pass this into the shader as a uniform.  With both spatial and temporal inputs it now looks like this:
float r = random( vec3(gl_FragCoord.xy, time) );
fragment = vec4( r,r,r, 1.0 );

Wonderful.  (It looks much better in practice at 60 Hz.  There's only so much I can do with a GIF)



It would be interesting to see how it holds up compared to other techniques.  I've never seen anyone generate 'random' floats this way before.  Knowing computer science somebody probably invented this 50 years ago though.



What about performance?  It seems pretty good but I'm no expert on measuring GPU perf.  My GeForce 660 Ti manages ~3400 FPS at 1920x1200x32 but this is probably a useless figure.  GPUs can execute multiple instruction streams in parallel, interleave them to hide latency, have very deep pipelines, and are often limited by memory bandwidth rather than computational power.  You'd have to compare it with other random number techniques in an actual application to get useful answers.
