Hi,
I have a quite intensive equation (sin/cos, double) which is taking about
40ms at 1MHz on a Mega328. The compiler is WinAVR 20090313.
Changing the compiler options from -Os to -O3 or even -O0 does not
change the execution time at all. Do these options affect the execution
time of the math.h functions, or is the library precompiled?
Execution time of library functions would only change if the compiler is
able to inline a specific function (e.g. the family of str... functions
is a good candidate for this). In the case of math functions, those
functions are precompiled and there is very little you can do about it.
Your best option would be to find a simpler solution to your problem, if
such a thing is mathematically possible, of course.
Use the function sincos(), or if it is not available, use a library/source
of it.
This removes two of the sin/cos calls, probably saving about 13 milliseconds.
Further, if you want to speed things up, use this
http://www.dspguru.com/comp.dsp/tricks/alg/sincos.htm or a
lookup table + interpolation.
If your variables are volatile, your optimizer has no chance to speed
your code up! Be careful with volatile. If you don't need it, change the
variables to normal ones.
The math functions are precompiled, because they're located in libm.a.
To change the optimization of these functions, you have to recompile the
library, or you can compile your own version of sin(), cos(), ... by
getting the GNU source files (GPL) and compiling them with your own
compiler flags.
I really hate to say that, but
> taking about 40ms at 1MHz on a Mega328.
There is a relatively easy 'solution'. Remove the CKDIV8 fuse, your
processor runs at 8MHz instead of just 1 and the calculation is done in
5ms instead of 40.
Karl Heinz Buchegger wrote:
> Remove the CKDIV8 fuse, your
> processor runs at 8MHz instead of just 1 and the calculation is done in
> 5ms instead of 40.
However, that's only possible if Vcc is high enough. (That's the
main reason why the devices ship with CKDIV8 programmed.)
Also, if the device does have a CKDIV8 fuse (rather than four different
RC oscillators, as was the case on older AVRs), you can use

    #include <avr/power.h>

    ...

    clock_prescale_set(clock_div_1);

in order to achieve the same effect at run-time as unprogramming the
CKDIV8 fuse would.
Thanks for your replies!
@Karl Heinz: I had the impression that the stuff is precompiled... I am
still in a very early phase in the design and don't know yet how much
precision I need. So I'm starting with the highest complexity and if the
system works I'll try to reduce it.
@Guest: Thanks for pointing that out, I was afraid the compiler might
optimise my completely useless calculation away and therefore declared
the variables as volatile. I'll give it a try without them tonight.
The final system will run at 16MHz. I just wrote the code yesterday to
get an impression of how long it would take to compute the equation. The
scope on PORTB shows an execution time of about 2.5ms at 16MHz; the 40ms
figure was taken from the instruction set simulator at 1MHz. Anyway, I'm
doing the calculation 10 times per second, so there's still some margin.
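The figures in this thread agree with each other if the run time scales inversely with the CPU clock, a reasonable assumption for a purely CPU-bound calculation:

```latex
t(f) = 40\,\mathrm{ms} \cdot \frac{1\,\mathrm{MHz}}{f}, \qquad
t(8\,\mathrm{MHz}) = \frac{40\,\mathrm{ms}}{8} = 5\,\mathrm{ms}, \qquad
t(16\,\mathrm{MHz}) = \frac{40\,\mathrm{ms}}{16} = 2.5\,\mathrm{ms}

10\,\mathrm{s}^{-1} \times 2.5\,\mathrm{ms} = 25\,\mathrm{ms\ per\ second} \approx 2.5\,\%\ \text{of the CPU time}
```

So the 2.5ms scope reading at 16MHz matches the simulator's 40ms at 1MHz.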
Guest wrote:
> @Guest: Thanks for pointing that out, I was afraid the compiler might
> optimise my completely useless calculation away
It will, yes. But to prevent this, you have to do a little more.
Placing the entire function into a separate compilation unit might
be a good starting point. As the earth's radius seldom changes :),
make that a "const". Make a1, a2, b1, b2 parameters to the
function, and return the result. If you call it from main()
(without using the option -combine when compiling), it should make
a real CALL to the function, so you can debug it. Inside main,
you can still place the result into a global (and maybe even volatile)
variable to prevent the compiler from throwing the unused result
away.
OK, I tried it without the volatiles. The calculation is now in a
function, as Joerg suggested. The execution time still does not depend
on the optimisation settings.
It seems there will be no gain without recompiling the library. I'll
just leave it like that for now, and once I run into performance problems
I'll start asking how to recompile the lib :-)
Thanks!
Guest wrote:
> The execution time still does not depend
> on the optimisation settings.
Sure, because most of the time is spent inside library functions.
> Seems that there will be no gain without recompiling the library.
That won't help. The library is already using hand-optimized assembly
code.
No one has yet pointed out that this is an ARM forum, not an AVR forum!
;) Are you lost?
However, firstly, the library is provided as object code and will
already have been optimised when it was compiled, so compiler optimisations
will not have any effect. Second, the part has no hardware FPU, and 1MHz
is very slow. I'd say that 40ms is pretty good going under those
circumstances, and I wonder why you are surprised at all.
Arguably you have the wrong part for your application (depending on the
application). However it is probably possible to speed things up
considerably using fixed-point arithmetic and the CORDIC algorithm.
Here's an article on exactly that: http://www.ddj.com/cpp/207000448 with
a source code library you can download and adapt too.
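A minimal fixed-point CORDIC sketch (not the code from the DDJ article): angles and results are Q16.16 integers, 16 iterations give roughly four decimal digits, and the input is limited to [-π, π]. The Q16.16 format, the iteration count and the names q16_sincos and cordic_core are assumptions. Right-shifting negative values is implementation-defined in C, but GCC (and so avr-gcc) shifts arithmetically.

```c
#include <stdint.h>

#define CORDIC_ITERS 16
#define Q16_PI       205887L  /* pi   * 65536, rounded */
#define Q16_HALF_PI  102944L  /* pi/2 * 65536, rounded */
#define CORDIC_K      39797L  /* 0.6072529 * 65536: pre-scales out the CORDIC gain */

/* atan(2^-i) in Q16.16, i = 0..15 */
static const int32_t atan_tab[CORDIC_ITERS] = {
    51472, 30385, 16055, 8150, 4091, 2047, 1024, 512,
      256,   128,    64,   32,   16,    8,    4,   2
};

/* Rotation-mode CORDIC; converges for |theta| up to about 1.74 rad. */
static void cordic_core(int32_t theta, int32_t *s, int32_t *c)
{
    int32_t x = CORDIC_K, y = 0, z = theta;
    for (int i = 0; i < CORDIC_ITERS; ++i) {
        int32_t dx = x >> i;   /* shifts replace multiplications by 2^-i */
        int32_t dy = y >> i;
        if (z >= 0) { x -= dy; y += dx; z -= atan_tab[i]; }
        else        { x += dy; y -= dx; z += atan_tab[i]; }
    }
    *c = x;
    *s = y;
}

/* sin/cos of theta in [-pi, pi] (Q16.16); results in Q16.16. */
void q16_sincos(int32_t theta, int32_t *s, int32_t *c)
{
    int flip = 0;
    /* Fold into [-pi/2, pi/2]: sin mirrors about +/-pi/2, cos changes sign. */
    if (theta > Q16_HALF_PI)       { theta =  Q16_PI - theta; flip = 1; }
    else if (theta < -Q16_HALF_PI) { theta = -Q16_PI - theta; flip = 1; }
    cordic_core(theta, s, c);
    if (flip)
        *c = -*c;
}
```

Each iteration needs only shifts, adds and a table lookup, which suits a core without an FPU. More iterations (with a longer atan table) buy more precision.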
Another way to speed this up at the expense of a perhaps significant
chunk of Flash is to use a look-up table. The larger the lookup table,
the better the resolution you might achieve. You could reduce the table
size at the expense of processing time by using interpolation for
intermediate values. And remember that you only need a lookup table for
a single quadrant (pi/2 radians), the other quadrants can easily be
determined from that.
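A minimal sketch of the single-quadrant table with interpolation, assuming a 16-bit phase (65536 steps per full turn) and Q15 output. The 17 entries hold round(32767·sin(k·π/32)) for k = 0..16. With linear interpolation the error stays on the order of 1e-3 (a few tens of LSB). The table size and the names table_sin, table_cos and quarter_sin are hypothetical; on an AVR the table could be moved to flash with PROGMEM.

```c
#include <stdint.h>

#define QTAB_SEGMENTS 16  /* quarter wave split into 16 segments */

/* round(32767 * sin(k * pi / 32)), k = 0..16: the first quadrant only. */
static const int16_t q_tab[QTAB_SEGMENTS + 1] = {
        0,  3212,  6393,  9512, 12539, 15446, 18204, 20787,
    23170, 25329, 27245, 28898, 30273, 31356, 32137, 32609,
    32767
};

/* Interpolated first-quadrant sine; pos = 0..16384 maps to 0..pi/2. */
static int16_t quarter_sin(uint16_t pos)
{
    uint16_t idx  = pos >> 10;      /* 16384 / 16 = 1024 steps per segment */
    int32_t  frac = pos & 0x3FF;    /* position inside the segment */
    if (idx >= QTAB_SEGMENTS)
        return q_tab[QTAB_SEGMENTS];
    int32_t a = q_tab[idx];
    int32_t b = q_tab[idx + 1];
    return (int16_t)(a + (((b - a) * frac) >> 10));
}

/* Full-circle sine from the quarter table; phase 0..65535 maps to 0..2*pi. */
int16_t table_sin(uint16_t phase)
{
    uint16_t quadrant = phase >> 14;     /* 0..3 */
    uint16_t offset   = phase & 0x3FFF;  /* 0..16383 within the quadrant */
    /* Quadrants 1 and 3 run the quarter wave backwards... */
    int16_t v = (quadrant & 1) ? quarter_sin((uint16_t)(16384u - offset))
                               : quarter_sin(offset);
    /* ...and quadrants 2 and 3 are the negative half. */
    return (quadrant & 2) ? (int16_t)-v : v;
}

/* cos(x) = sin(x + pi/2): a quarter-turn phase shift, wrapping mod 2^16. */
int16_t table_cos(uint16_t phase)
{
    return table_sin((uint16_t)(phase + 16384u));
}
```

Using a power-of-two phase makes the quadrant a simple shift and the wrap-around free; doubling QTAB_SEGMENTS roughly quarters the interpolation error at the cost of more table space.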
Clifford