Compiler Level optimization, Loop unfolding trick
Now when you are all done with your programming, Algorithm optimization etc. etc. Now it is the time for compiler to optimize the code for the binary.
Most of the compilers generate UNOPTIMIZED code for the binary just to reduce the compilation time.
We will discuss the compiler tweeks later and lets focus on the topic “Loop Unfolding”
So what is the meaning of Loop Unfolding?
for (int i = 0 ; i < 100 ; i++) { //statements here } can be written as for (int i = 0 ; i < 100 ; i++) { if (i < 100) { //statements here i++; } if (i < 100) { //statements here i++; } /* * And so on for for 10-15 times */ if (i < 100) { //statements here i++; } }
Believe me or not but this trick actually works, and it makes your binary to run faster (But not extremely fast)
So, It is very tedious job to write the the above statements and as well as time consuming
The #pragma unroll directive is used to unroll the innermost or outermost loops in your program, which can help improve program performance.
#pragma unroll(3) for (i=0; i<n; i++) { a[i]=b[i] * c[i]; } //This will generate i=0; if (i>n-2) goto remainder; for (; i<n-2; i+=3) { a[i]=b[i] * c[i]; a[i+1]=b[i+1] * c[i+1]; a[i+2]=b[i+2] * c[i+2]; } if (i<n) { remainder: for (; i<n; i++) { a[i]=b[i] * c[i]; } }
As you can see that compiler is smart enough to eliminate almost 3 IF statements in every statement hence reducing the execution time.
FYI: It will definitely increase your compilation time but who cares if your code is fast enough to execute in given time
EXPERIMENTS ON #pragma unroll (n)
We made 2 programs to find prime numbers (using Sieve of Eratosthenes) from 1 to 100000000 (10^8)
And results of program 1 : WITHOUT USING LOOP UNFOLDING
Results of program 2: We used #pragma unroll (1000)
Now you can clearly see the difference, almost 500 ms difference is there