ELF: Define: random floats in GPU-Server side

The following essays are quoted in full; thanks to the authors :)

float rand(vec2 co){
  return fract(sin(dot(co.xy ,vec2(12.9898,78.233))) * 43758.5453);
}

Welcome to the third Lumina tutorial.
This tutorial is not specific to Lumina; it is about generating procedural noise with
GLSL. The algorithms here are related to, but not the same as, well-known ones such as
"Perlin Noise".

1. Random
What is noise? The best example is the snow on a TV. Each pixel is set to a random
value, so each photo of that screen looks different from every other. In reality there
is a simple fix for that problem: record the noise on video tape and take photos of the
freeze-frame, and each photo will look like the others...

When we want to use random numbers / noise to describe materials, it is important that
the random numbers are reproducible. These numbers aren't true random numbers, because
nothing in the computation is random.

How do we create good, reproducible random numbers?
That is a hard problem. The only thing known in advance is the input: a position in an
N-dimensional space. A formula that works well on GPUs is:

random = fract(sin(in.x * 12.9898 + in.y * 78.233.......) * 43758.5453);

This compiles to a few instructions. The first step combines all input dimensions into
one value; that is just a dot product. The sin(), cos() or tan() (all three work) is the
second instruction, the multiplication by a large value is the third, and taking the
fraction is the last.

The noise algorithms sample the pseudo-random numbers at regular intervals. That results
in undersampling, which together with the fraction of sin() produces random numbers that
are good enough for graphics (but not for cryptography).

This is a simple random fragment shader:

float rand(vec2 co){
    return fract(sin(dot(co.xy, vec2(12.9898, 78.233))) * 43758.5453);
}

void main(void){
    // gl_FragColor is a vec4, so the scalar is expanded to a grey value
    gl_FragColor = vec4(vec3(rand(gl_TexCoord[0].xy)), 1.0);
}

It creates TV-like snow. The output range is 0.0 to 1.0 (unlike the noise(genType)
functions from GLSL, which have a range of -1.0 to 1.0 but return 0.0 on many cards).
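
The formula above is written for an N-dimensional input, while the shader only uses two dimensions. A minimal sketch of a three-dimensional variant follows, assuming the same old-style GLSL as the rest of the tutorial. The third constant (37.719) is not from the tutorial; it is an arbitrary value picked for illustration, and other values may give better or worse patterns.

```glsl
// 3D variant of rand(): the dot product simply takes one more component.
// Assumption: 37.719 is an arbitrary third constant, not a tuned value.
float rand(vec3 co){
    return fract(sin(dot(co, vec3(12.9898, 78.233, 37.719))) * 43758.5453);
}

void main(void){
    // Use the first three texture-coordinate components as the input position.
    gl_FragColor = vec4(vec3(rand(gl_TexCoord[0].xyz)), 1.0);
}
```

The instruction count stays about the same, because the dot product still collapses all inputs into a single value before the sin() and fract() steps.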

2. Noise

The random pixels aren't a material. To create one we have to undersample the random
numbers:

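float rand(vec2 co){
    return fract(sin(dot(co.xy, vec2(12.9898, 78.233))) * 43758.5453);
}

float noise(vec2 co){
    // Snap to a 128x128 grid so every cell shares one random value
    return rand(floor(co * 128.0));
}

void main(void){
    // gl_FragColor is a vec4, so the scalar is expanded to a grey value
    gl_FragColor = vec4(vec3(noise(gl_TexCoord[0].xy)), 1.0);
}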

That looks like a nearest-filtered noise texture. The next step is to add a simple
bilinear filter:

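// Hash four lattice points at once so the GPU can work on a full vec4
vec4 rand(vec2 A, vec2 B, vec2 C, vec2 D){
    vec2 s = vec2(12.9898, 78.233);
    vec4 tmp = vec4(dot(A, s), dot(B, s), dot(C, s), dot(D, s));
    return fract(tan(tmp) * 43758.5453);
}

float noise(vec2 coord, float d){
    // Corners of the lattice cell that contains coord (d cells per unit)
    vec2 C[4];
    C[0] = floor(coord * d) / d;
    C[1] = C[0] + vec2(1.0 / d, 0.0);
    C[2] = C[0] + vec2(1.0 / d, 1.0 / d);
    C[3] = C[0] + vec2(0.0, 1.0 / d);

    // Bilinear weights from the position inside the cell
    vec2 p = fract(coord * d);
    vec2 q = 1.0 - p;
    vec4 w = vec4(q.x * q.y, p.x * q.y, p.x * p.y, q.x * p.y);

    return dot(rand(C[0], C[1], C[2], C[3]), w);
}

void main(void){
    // gl_FragColor is a vec4, so the scalar is expanded to a grey value
    gl_FragColor = vec4(vec3(noise(gl_TexCoord[0].xy, 128.0)), 1.0);
}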

This shader is slightly optimized for vector-based GPUs. "cgc" compiles that code to
29 instructions.

3. Better filtered noise

The previous implementation lacks filtering quality if the procedural material is
undersampled by the rasterizer. A solution is to add a noise octave only if it will be
oversampled by the rasterizer. The calculation is exactly the same as the one for a
mipmap level. This means that filtered noise has the same anisotropic filtering
problems as a texture.
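
As a rough illustration of that mipmap-style calculation, the level can be isolated in a small helper. The name `noiseLevel` is hypothetical and not part of the tutorial, and the numbers in the comments assume a surface whose texture coordinates run from 0 to 1 across 512 screen pixels, viewed head-on.

```glsl
// Hypothetical helper: the octave level above which noise cells get too small.
float noiseLevel(vec2 uv){
    // How far the texture coordinate moves per one-pixel step on screen.
    float footprint = min(length(dFdx(uv)), length(dFdy(uv)));
    // Example: uv spans 0..1 over 512 pixels, so footprint = 1/512 and the
    // level is -log2(1/512) = 9. Every octave d = exp2(i) with i < 9 then
    // has cells at least two pixels wide (512 / 256 = 2).
    return -log2(footprint);
}

void main(void){
    // Visualize the level as a grey value; 16.0 is an arbitrary display scale.
    gl_FragColor = vec4(vec3(noiseLevel(gl_TexCoord[0].st) / 16.0), 1.0);
}
```

Because min() picks the smaller of the two derivatives, a tilted surface keeps octaves that are undersampled along the steeper axis, which is where the anisotropic filtering problem comes from.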

vec4 rand(vec2 A, vec2 B, vec2 C, vec2 D){
    vec2 s = vec2(12.9898, 78.233);
    vec4 tmp = vec4(dot(A, s), dot(B, s), dot(C, s), dot(D, s));
    // Remap from [0, 1) to [-1, 1) so octaves can both add and subtract
    return fract(sin(tmp) * 43758.5453) * 2.0 - 1.0;
}

float noise(vec2 coord, float d){
    vec2 C[4];
    float d1 = 1.0 / d;
    C[0] = floor(coord * d) * d1;
    C[1] = C[0] + vec2(d1, 0.0);
    C[2] = C[0] + vec2(d1, d1);
    C[3] = C[0] + vec2(0.0, d1);

    vec2 p = fract(coord * d);
    vec2 q = 1.0 - p;
    vec4 w = vec4(q.x * q.y, p.x * q.y, p.x * p.y, q.x * p.y);
    return dot(rand(C[0], C[1], C[2], C[3]), w);
}

void main(void){
    float level = -log2(min(length(dFdx(gl_TexCoord[0].st)), length(dFdy(gl_TexCoord[0].st))));
    level = min(level, 13.0); // limit the level to avoid slowness
    float n = 0.5;
    // explicit float() casts keep this valid on compilers without implicit conversion
    for(int i = 3; float(i) < level; i++){ // 3 is the lowest noise level
        n += 0.12 * noise(gl_TexCoord[0].xy, exp2(float(i)));
    }

    gl_FragColor = vec4(vec3(n), 1.0);
}

Bilinear-filtered noise with mipmap-like filtering. Notice the washed-out structure on
the tilted surface. Only an anisotropic filtering algorithm would help.

Now it's possible to add trilinear-like filtering by modifying the main function:

void main(void){
    float level = -1.0 - log2(min(length(dFdx(gl_TexCoord[0].st)), length(dFdy(gl_TexCoord[0].st))));
    // -1.0 is a bias shift to avoid flickering

    level = min(level, 16.0); // limit the level; equivalent to a 65536x65536 texture
    float n = 0.5;
    for(int i = 3; float(i) < level; i++){
        n += 0.12 * noise(gl_TexCoord[0].xy, exp2(float(i)));
    }

    // add the last level multiplied by the fraction of the
    // level calculation for smooth filtering
    n += 0.12 * noise(gl_TexCoord[0].xy, exp2(floor(level) + 1.0)) * fract(level);

    gl_FragColor = max(0.0, sin(n * 12.0)) * vec4(0.3, 0.8, 0.4, 0.0) + vec4(0.3, 0.3, 0.5, 0.0) * n;
}

4. Texture based noise
The previous noise versions aren't very fast. Using textures has some advantages, such
as working anisotropic filtering and four independent values per lookup. The input
texture should be small (smaller textures are cached better). This small texture can be
generated easily with a small program or GIMP.

The shader uses parts from the previous version:

uniform sampler2D random;

void main(void){
    float level = -0.5 - log2(min(length(dFdx(gl_TexCoord[0].st)), length(dFdy(gl_TexCoord[0].st))));
    // -0.5 is a bias shift to avoid flickering

    level = min(level, 20.0); // limit the level; equivalent to a 1Mx1M texture
    vec4 n = vec4(0.5, 0.5, 0.5, 1.0);

    vec2 coords = gl_TexCoord[0].xy / 8.0; // note1

    for(int i = 3; float(i) < level; i++){
        n += 0.32 * (texture2D(random, coords) - 0.5); //-0.05;
        coords *= 2.0;
    }

    n += 0.32 * (texture2D(random, coords) - 0.5) * fract(level);

    gl_FragColor = n;
}

note1: The mipmap levels 3, 4 and 5 don't wrap correctly, because only part of the
texture is used for these low-resolution maps. The best solution is to start the for
loop at log2(texture_size) and remove the divisor. The low-frequency part can easily be
replaced by a 64x64 texture with mipmaps.
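
One possible reading of note1, sketched under assumptions that are not in the tutorial: the noise texture is 64x64 with repeat wrapping, so log2(texture_size) is 6 and is written as the hypothetical constant `FIRST_LEVEL`. The guard on the last blend is also an addition. The low-frequency part (levels 3 to 5) is left out here; the note suggests taking it from a separate mipmapped texture.

```glsl
uniform sampler2D random; // assumed: 64x64 noise texture, GL_REPEAT wrapping

// Assumption: log2(64) = 6 for a 64x64 texture, written out as a constant.
const int FIRST_LEVEL = 6;

void main(void){
    float level = -0.5 - log2(min(length(dFdx(gl_TexCoord[0].st)),
                                  length(dFdy(gl_TexCoord[0].st))));
    level = min(level, 20.0);
    vec4 n = vec4(0.5, 0.5, 0.5, 1.0);

    // No divisor: the first octave maps the whole texture onto 0..1,
    // so every octave repeats a whole number of times and wraps cleanly.
    vec2 coords = gl_TexCoord[0].xy;

    for(int i = FIRST_LEVEL; float(i) < level; i++){
        n += 0.32 * (texture2D(random, coords) - 0.5);
        coords *= 2.0;
    }

    // Blend in the next octave only once the first octave is resolvable.
    if(level >= float(FIRST_LEVEL)){
        n += 0.32 * (texture2D(random, coords) - 0.5) * fract(level);
    }

    gl_FragColor = n;
}
```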

For the GF8 it's a good idea to use an anisotropic filter. 2x AF would be practically
free, because the TMUs are able to filter 2 bilinear samples at the same time.

_____________________________________________________________________________________

Here is another one

Random Floats in GLSL 330

I had some fun putting together a random number generator in GLSL.

I wanted a set of functions with the following specs:

Outputs a float in the half-open range [0:1) (a fraction)
Outputs all variations in the range given sufficient input variation
Multidimensional variation: accepts between 1 and 4 arbitrary float inputs
Uniform distribution
No apparent patterns
Self-contained: no texture lookup
Portable: identical results on all GPUs
Low overhead

Fairly stringent requirements. I think I managed to pull it off though.

The key to making a decent set of functions was realising that in a shader - which has no state - a random function is actually a kind of hash function. It returns a uniformly distributed output value which varies wildly in value given small changes in input.

But we're dealing with floats. Floats which according to the GLSL spec explicitly lack precision guarantees. Writing a hashing function in terms of floats isn't going to work very well and isn't likely to be portable either. Also, writing a good hashing function is a difficult task in itself. I'd rather use an existing integer hash with well defined properties.

That's what we're going to do. So that we only have to hash integers, we're going to move the bits of the float directly into an unsigned integer, hash it, and move the bits back into a float - accompanied by a little bit-twiddling of course. For this purpose GLSL 330 has floatBitsToUint and uintBitsToFloat. This way the PRNG will be capable of dealing with any input value without range or precision being a concern.

The GLSL spec guarantees that float is in IEEE-754 binary32 format and also that uint is a 32-bit unsigned integer. Which means we can do bit-twiddling tricks without worrying about compatibility issues.

Here's the algorithm I came up with:

Hash all inputs together into a uint
AND away everything but the low 23 bits
Construct a valid float where the mantissa bits are all zero (value 1.0)
OR the constructed float with the 23-bit uint to get range [1:2)
With float arithmetic, subtract 1.0 to get desired [0:1) range

This probably sounds quite confusing if you're not intimately familiar with the binary32 format. I'll explain it more thoroughly after I post the actual code.

Here's our integer hash function, a single iteration of Bob Jenkins' OAT algorithm. This is what we're going to mix up our bits with.

uint hash( uint x ) {
    x += ( x << 10u );
    x ^= ( x >> 6u );
    x += ( x << 3u );
    x ^= ( x >> 11u );
    x += ( x << 15u );
    return x;
}

Now for the bit-twiddling random() function. Let's keep it simple and stick with one input for now:

float random( float f ) {
    const uint mantissaMask = 0x007FFFFFu;
    const uint one          = 0x3F800000u;

    uint h = hash( floatBitsToUint( f ) );
    h &= mantissaMask;
    h |= one;

    float r2 = uintBitsToFloat( h );
    return r2 - 1.0;
}

What the hell is going on here? To explain this I have to first explain some aspects of the floating point format:

The mantissa of a binary32 float is a 23-bit binary fraction in the half-open range [0:1). There's an implicit leading 1 in the mantissa so the effective value of the mantissa is always 1.0 + fraction. The absolute value of a float can be calculated as:

(1.0 + fraction) * power(2,exponent)

This means that the float value 1.0 has both an exponent and mantissa of zero. Zero mantissa means 23 zero bits. As the value increases from 1.0 towards 2.0 only those mantissa bits change - they count upwards the same as a 23-bit unsigned integer. By exploiting this property we can easily construct a float in that range using only bitwise operations. Stick those bits back inside a float, subtract 1.0, and you've got a float in the desired range of [0:1).
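
To make the bit patterns concrete, here is a small sketch with three hand-picked mantissas. It assumes a `#version 330` fragment shader with an output named `fragment`, as in the snippets in this article. The exponent of zero is stored in biased form as 127 (0x7F), which is where the 0x3F800000 constant comes from.

```glsl
#version 330 core

out vec4 fragment;

void main() {
    // 0x3F800000: sign 0, stored exponent 127, mantissa all zeros -> 1.0
    float lo  = uintBitsToFloat( 0x3F800000u ) - 1.0;  // 0.0
    // 0x3FC00000: only the top mantissa bit set (fraction 0.5) -> 1.5
    float mid = uintBitsToFloat( 0x3FC00000u ) - 1.0;  // 0.5
    // 0x3FFFFFFF: all 23 mantissa bits set -> just below 2.0
    float hi  = uintBitsToFloat( 0x3FFFFFFFu ) - 1.0;  // 1.0 - 2^-23

    // Show the three values as the red, green and blue channels.
    fragment = vec4( lo, mid, hi, 1.0 );
}
```

The largest possible result is 1.0 - 2^-23, so the output never reaches 1.0 and the range stays half-open.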

To get multidimensional variants of this function we simply combine more inputs into the hash function.

Let's try our one-dimensional variant first:

float r = random( gl_FragCoord.x );
fragment = vec4( r,r,r, 1.0 );

This should vary the luminance randomly over the X axis. Result:

Already we can see some good indications. The average luminance is about 50% which means the distribution of the output is roughly uniform and covers the full range.

But it's not possible to tell whether there are nasty patterns using only one-dimensional input. Let's make some multi-dimensional variants of the hash function:

uint hash( uvec2 v ) {
    return hash( v.x ^ hash(v.y) );
}

uint hash( uvec3 v ) {
    return hash( v.x ^ hash(v.y) ^ hash(v.z) );
}

uint hash( uvec4 v ) {
    return hash( v.x ^ hash(v.y) ^ hash(v.z) ^ hash(v.w) );
}

The way I combined the inputs is totally arbitrary and rather lazy. I just experimented with them until I found something that worked on my test inputs. There's probably a better and faster way of doing this, like unrolling the OAT algorithm separately for each one.

Making versions of random() which use these functions is very simple so I'll spare you the wall of text.
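
For readers who do want to see them, the variants could look roughly like the sketch below. The helper name `floatConstruct` is an assumption made here to avoid repeating the bit-twiddling in every overload; the hash functions are copied from the text, and the `fragment` output is assumed as in the other snippets.

```glsl
#version 330 core

out vec4 fragment;

// Single iteration of Bob Jenkins' OAT hash, as given in the text.
uint hash( uint x ) {
    x += ( x << 10u );
    x ^= ( x >>  6u );
    x += ( x <<  3u );
    x ^= ( x >> 11u );
    x += ( x << 15u );
    return x;
}

// Multi-dimensional variants, also as given in the text.
uint hash( uvec2 v ) { return hash( v.x ^ hash(v.y) ); }
uint hash( uvec3 v ) { return hash( v.x ^ hash(v.y) ^ hash(v.z) ); }
uint hash( uvec4 v ) { return hash( v.x ^ hash(v.y) ^ hash(v.z) ^ hash(v.w) ); }

// Hypothetical helper: build a float in [0:1) from the low 23 bits of m.
float floatConstruct( uint m ) {
    const uint mantissaMask = 0x007FFFFFu;
    const uint one          = 0x3F800000u;

    m &= mantissaMask;                   // keep only the mantissa bits
    m |= one;                            // add the bit pattern of 1.0
    return uintBitsToFloat( m ) - 1.0;   // [1:2) shifted down to [0:1)
}

// floatBitsToUint works component-wise, so a vecN becomes a uvecN.
float random( float x ) { return floatConstruct( hash( floatBitsToUint( x ) ) ); }
float random( vec2  v ) { return floatConstruct( hash( floatBitsToUint( v ) ) ); }
float random( vec3  v ) { return floatConstruct( hash( floatBitsToUint( v ) ) ); }
float random( vec4  v ) { return floatConstruct( hash( floatBitsToUint( v ) ) ); }

void main() {
    float r = random( gl_FragCoord.xy );
    fragment = vec4( r, r, r, 1.0 );
}
```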

Anyway, let's try some 2D randomness:

float r = random( gl_FragCoord.xy );
fragment = vec4( r,r,r, 1.0 );

Great - not a pattern to be seen. Bad RNGs create visual artifacts which spoil the effect, typically diagonal lines. Something that looks exactly like static is the best possible outcome.

Our noise is still stationary though. If you want the value to differ each frame then you need another input: time. You can pass this into the shader as a uniform. With both spatial and temporal inputs it now looks like this:

float r = random( vec3(gl_FragCoord.xy, time) );
fragment = vec4( r,r,r, 1.0 );

Wonderful. (It looks much better in practice at 60 Hz. There's only so much I can do with a GIF.)

It would be interesting to see how it holds up compared to other techniques. I've never seen anyone generate 'random' floats this way before. Knowing computer science somebody probably invented this 50 years ago though.

What about performance? It seems pretty good but I'm no expert on measuring GPU perf. My GeForce 660 Ti manages ~3400 FPS at 1920x1200x32 but this is probably a useless figure. GPUs can execute multiple instruction streams in parallel, interleave them to hide latency, have very deep pipelines, and are often limited by memory bandwidth rather than computational power. You'd have to compare it with other random number techniques in an actual application to get useful answers.

ELF

Saturday, September 24, 2016