EmbDev.net

Forum: ARM programming with GCC/GNU tools

math.h and compiler optimisation


by Guest (Guest)


Hi,

I am a quite intensive equation (sin/cos double) which is taking about 
40ms at 1MHz on a Mega328. Compiler is WinAVR 20090313.

Changing the compiler options from -Os to -O3 or even -O0 does not 
change the execution time at all. Do these options affect the execution 
time of the math.h functions or is the library precompiled?

#include <avr/io.h>
#include <math.h>

int main(void){
  volatile double distance;
  volatile double a1;
  volatile double a2;
  volatile double b1;
  volatile double b2;
  volatile double rearth;

  a1 = 1.2;
  a2 = 1.1;
  b1 = 2.2;
  b2 = 2.3;
  rearth = 6378.137;

  DDRB = 0xFF;

  while(1) {

    PORTB = PORTB ^ 0xFF; //led on pb5

    distance = rearth * acos( sin(a1) * sin(a2) + cos(a1) * cos(a2) * cos(b2 - b1) );
  }
}

by Guest (Guest)


aaaaahhh, I am of course not an equation, I am computing an equation :-)

by Karl Heinz Buchegger (Guest)


The execution time of a library function only changes if the compiler is 
able to inline it (e.g. the str... family of functions is a good 
candidate for this). The math functions are precompiled, and there is 
very little you can do about it. 
Your best option would be to find a simpler solution to your problem, if 
such a thing is mathematically possible, of course.

by chris (Guest)


Use the function sincos or, if it is not available, use a library or 
source implementation of it.
This removes two of the sin/cos calls, which probably saves about 13 
milliseconds.
If you want to speed things up further, use this 
http://www.dspguru.com/comp.dsp/tricks/alg/sincos.htm or a 
lookup table plus interpolation.

by Guest (Guest)


If your variables are volatile, the optimizer has no chance to speed 
your code up! Be careful with volatile. If you don't need it, change the 
variables to normal ones.

The math functions are precompiled, because they're located in libm.a.

To change how these functions are optimized, you have to recompile the 
library, or you can build your own version of sin(), cos(), ... by 
getting the GNU source files (GPL) and compiling them with your own 
compiler flags.

by Karl Heinz Buchegger (Guest)


I really hate to say this, but

> taking about 40ms at 1MHz on a Mega328.

There is a relatively easy 'solution'. Remove the CKDIV8 fuse, and your 
processor runs at 8MHz instead of just 1, and the calculation is done in 
5ms instead of 40.

by Jörg W. (dl8dtl) (Moderator)


Karl Heinz Buchegger wrote:
> Remove the CKDIV8 fuse, and your
> processor runs at 8MHz instead of just 1, and the calculation is done in
> 5ms instead of 40.

However, that's only possible if Vcc is high enough.  (That's the
main reason why the devices ship with CKDIV8 programmed.)

Also, if the device does have a CKDIV8 fuse (rather than four different
RC oscillators, as was the case on older AVRs), you can use

#include <avr/power.h>

...
  clock_prescale_set(clock_div_1);

in order to achieve the same effect at run-time as unprogramming the
CKDIV8 fuse would.

by Guest (Guest)


Thanks for your replies!

@Karl Heinz: I had the impression that the stuff is precompiled... I am 
still in a very early phase in the design and don't know yet how much 
precision I need. So I'm starting with the highest complexity and if the 
system works I'll try to reduce it.

@Guest: Thanks for pointing that out, I was afraid the compiler might 
optimise my completely useless calculation away and therefore declared 
them as volatile. I'll have a try without tonight.

The final system will run at 16MHz. I just wrote the code yesterday to 
get an impression how long it would take to compute the equation. The 
scope on PORTB shows an execution time of about 2.5ms at 16MHz, the 40ms 
figure was taken from the instruction set simulator at 1MHz. Anyway, I'm 
doing the calculation 10 times per second so there's still some margin.

by Jörg W. (dl8dtl) (Moderator)


Guest wrote:

> @Guest: Thanks for pointing that out, I was afraid the compiler might
> optimise my completely useless calculation away

It will, yes.  But to prevent this, you have to do a little more.
Placing the entire function into a separate compilation unit might
be a good starting point.  As the earth's radius seldom changes :),
make that a "const".  Make a1, a2, b1, b2 parameters to the
function, and return the result.  If you call it from main()
(without using the option -combine when compiling), it should make
a real CALL to the function, so you can debug it.  Inside main,
you can still place the result into a global (and maybe even volatile)
variable to prevent the compiler from throwing the unused result
away.

by Guest (Guest)


Ok, I tried it without the volatiles. The calculation is now in a 
function as Joerg suggested. The execution time is still not depending 
on the optimisation settings.
Seems that there will be no gain without recompiling the library. I'll 
just leave it like that now and once I run into performance problems 
start asking questions how to recompile the lib :-)

Thanks!

by Jörg W. (dl8dtl) (Moderator)


Guest wrote:
> The execution time is still not depending
> on the optimisation settings.

Sure, because the major time is spent inside library functions.

> Seems that there will be no gain without recompiling the library.

That won't help.  The library is already using hand-optimized assembly
code.

by Clifford S. (clifford)


No one has yet pointed out that this is an ARM forum, not an AVR forum! 
;) Are you lost?

However, firstly the library is provided as object code, and will have 
already been optimised when it was compiled, so compiler optimisations 
will not have any effect. Second, the part has no hardware FPU, and 1MHz 
is very slow. I'd say that 40ms is pretty good going under those 
circumstances, and wonder that you are at all surprised.

Arguably you have the wrong part for your application (depending on the 
application). However it is probably possible to speed things up 
considerably using fixed-point arithmetic and the CORDIC algorithm. 
Here's an article on exactly that: http://www.ddj.com/cpp/207000448 with 
a source code library you can download and adapt too.

Another way to speed this up at the expense of a perhaps significant 
chunk of Flash is to use a look-up table. The larger the lookup table, 
the better the resolution you might achieve. You could reduce the table 
size at the expense of processing time by using interpolation for 
intermediate values. And remember that you only need a lookup table for 
a single quadrant (pi/2 radians), the other quadrants can easily be 
determined from that.


Clifford
