Need help on code optimisation

KeithWalker

Joined Jul 10, 2017
2,525
Hi, I'm trying to reduce the time taken to execute the code
The examples given appear to be quite concise. They are all compiled and will generate very similar machine code. The speed at which they run will depend mostly on the type and clock speed of the processor they are compiled for.
The best way to optimize a program is to create a flow chart of its function. That way, you can see whether there are any repeated or redundant functions and whether any shortcuts can be made.
 

WBahn

Joined Mar 31, 2012
27,509
I think, for this particular type of problem, a lot of it comes down to how well the compiler can leverage things like the SIMD instructions of the processor. I don't know how good modern C compilers are at doing that. My understanding is that Python and Java both have modules/libraries available that are pretty good at it. A language like MATLAB is specifically designed to leverage these capabilities, and the matrix nature of its notation is designed to make doing so much easier.

In addition, this is a problem that would lend itself extremely well to parallelization on different cores (or using a high-end graphics card), so once again it comes down to whether the compiler is able to effectively decompose the problem along those lines and how the code should be written to enable the compiler to do so.
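As a rough illustration of "writing the code to enable the compiler", here is a minimal sketch in C. The original code isn't reproduced in this post, so the function name scale_add and the operation it performs are purely hypothetical. The point is only the shape: a simple counted loop over contiguous arrays, with restrict telling the compiler the two arrays don't overlap. Compilers such as GCC and Clang can often turn a loop like this into SIMD instructions at higher optimization levels (for example -O3), but whether yours actually does is something you have to check in the generated code.

```c
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for n elements.
 * `restrict` promises the compiler that x and y never overlap,
 * which removes one common obstacle to auto-vectorization. */
void scale_add(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* Each iteration is independent of the others, so the compiler is
     * free to process several elements per instruction if it chooses. */
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Because the iterations are independent, the same loop is also a candidate for being split across cores, for instance with OpenMP if your toolchain supports it.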
 

Thread Starter

goutham1995

Joined Feb 18, 2018
104
Thanks. So what would the code look like in that case?
 

WBahn

Joined Mar 31, 2012
27,509
That's going to depend on the language and, possibly most importantly, the specific compiler you are using. If you really want to squeeze performance out of your code, you have to get extremely familiar with the intimate details of your tools and learn what they can do and how you need to write your code to enable them to do it.

This is why, in the early days of C, there were three ways to increment the value in a variable:

c = c + 1;
c += 1;
c++;

Every processor could add the values of two variables and store the result in a third, so that's the instruction a brain-dead compiler would use for the first version. But many processors had instructions that could add a value to a variable in place more quickly, so that same brain-dead compiler could easily be written to use that instruction for the second version. Even more common was the ability to increment the value of a variable in place, so that compiler could use that instruction if it saw the third version.

Thus the programmer could write statements that had a major impact on the performance of their code just by which of these forms they chose. Today, most compilers are sophisticated enough that the programmer can use any of these forms and the generated code will be identical.
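If you want to check this on your own compiler, a minimal sketch like the following is enough. The function names are made up for illustration. Compile it to assembly (with GCC or Clang, something like -O2 -S) and compare the three function bodies; on a current optimizing compiler I'd expect them to match, but the listing is the only real proof.

```c
/* Three spellings of "add one to c", one per function so that the
 * generated code for each form can be compared side by side. */
int inc_assign(int c)
{
    c = c + 1;   /* general add: c plus a constant, stored back into c */
    return c;
}

int inc_compound(int c)
{
    c += 1;      /* add-in-place form */
    return c;
}

int inc_postfix(int c)
{
    c++;         /* increment form */
    return c;
}
```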

But these abilities can also be hit and miss. Back when I was doing some SDR work that needed extreme performance, we found that a tiny change to the code suddenly produced a huge performance gain, but that many other tiny changes made it go away. After investigating the object code being generated, we discovered that the macro we had written to do an n-bit rotation on a variable (C has no rotate operator, so you have to write an expression to do it) was being implemented using the processor's intrinsic rotation instruction. That recognition was very fragile, though: small changes with no logical impact produced a code pattern the compiler could no longer recognize as a rotation, so it fell back to the much slower logic expression.
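For reference, this is roughly what such a rotation expression looks like. It is not the macro from that project, just a sketch of a commonly used form, and the name rotl32 is my own. Many current compilers can recognize this pattern and emit a single rotate instruction, but as the story above shows, the only way to know what yours does is to look at the generated code.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n is taken modulo 32).
 * Both shift counts are masked to the range 0..31, so the expression
 * stays well defined even when n is 0 or n >= 32. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    /* Bits shifted out on the left are brought back in on the right. */
    return (x << n) | (x >> (-n & 31u));
}
```

Masking the counts matters because shifting a 32-bit value by 32 or more is undefined in C, which is exactly the kind of detail that can change how the compiler treats the expression.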
 