# Forum: ARM programming with GCC/GNU tools fast speed.ARM

Hi all.
I have a question for everyone.
I tested an LPC2146 (72MHz) with this calculation, where the variables
are
DWORD a,b
float m

b = a*(1-m*a*a);

and it takes 7.5us.
Is an ARM9 from ST or NXP faster?

regards

Sevc Dominik wrote:
> Hi all.
> I have question for all.
> I test LPC2146 (72MHz) with this function
> if variable are
> DWORD a,b
> float m
>
> b = a*(1-m*a*a);
>
> and it's for me 7,5us time.
> is ARM9 faster??? from ST or NXP .
>
> regards

On ARM7 you don't have a hardware floating-point unit, so every
floating-point operation is done in software. That is not good for
speed. You could look into fixed-point multiplication, which should
give you better speed.

I have found a class that encapsulates this, but I cannot remember
where. If you google it, you should find it.

I'm pretty sure you can find this info on that board.

As for ARM9, you should get better speed as the CPU is faster.

Hope this helps.

Jonathan

Here is a class for it.

regards

Jonathan

Thanks Jonathan.

I need to calculate a value.
m = 530/(72000000*72000000)
so m is a very small number, and I need to calculate this:

DWORD a,b
float m

b = a*(1-m*a*a);

Do ARM9 parts only have a faster core (up to 96MHz), or do any of them
have a hardware floating-point unit, like the STR9 or others?
I need an MCU with internal flash, though.

regards

Or can anyone help me find a better formula to calculate my variable?

regards

Sevc Dominik wrote:
> or any help how to make good formula to calculate my variable????

Try to describe the complete calculation. How is "b" used in the
following code?

Martin Thomas wrote:
> Sevc Dominik wrote:
>> or any help how to make good formula to calculate my variable????
>
> Try to describe the complete calculation. How is "b" used in the
> following code?

I need to calculate the delay for a stepper motor.
This formula is an approximation of acceleration/deceleration.

b is T0MR0, a is the delay value from the first step or the previous
step, and m is the multiplier (m = accel/(Fosc*Fosc)).
If the formula is a = a*(1-m*a*a) the motor accelerates; with 1+m... it
decelerates.
It works fine, but I want to generate more steps per second.
I need to reduce this formula or calculate it more efficiently.

regards

Sorry for my English.

Not many off-the-shelf ARM microcontrollers (as opposed to FPGA soft
core IP) include an FPU.

Choices I am aware of are:

NXP LPC3xxx (220MHz ARM9 with VFP coprocessor)
Freescale i.MX31 (600MHz ARM11 with VFP)

The LPC3xxx runs at 220MHz and FP on the hardware is about 5 times
faster than software floating point operations at the same clock rate.
Together with the higher clock rate, and since the ARM9 also has cache
memory, you could expect it to be at least 15 times faster than your
LPC2xxx part. The VFP supports SIMD instructions, so it is possible to
achieve even faster throughput for some types of calculation, but you
would have to code in assembler; I have not come across an ARM compiler
yet that can perform vectorisation for VFP operations. Note that if you
are going to use the floating point math library, you will need to
rebuild the library from source to utilise the VFP. And even then you
will have to re-implement the sqrt()/fsqrt() functions to use the
SQRT/FSQRT instruction instead of a software algorithm.

All that said, it is a lot of expense to go to just for a delay
calculation. A sledgehammer to crack a nut in fact! Plenty of motor
controllers work with far less.

First you might ask yourself how fast the calculation actually needs to
be to guarantee deadlines. If 7.5us is fast enough why worry. Fast
enough is fast enough after all.

A very fast alternative solution (at the expense of memory) would be to
pre-calculate delay values and place them into a look-up table. If the
range of values is large, then it may be sufficient to use a sparse
table with linear interpolation between values.
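
A minimal sketch of such a sparse table with linear interpolation, in C. The function name `lut_interp`, the spacing of 16 steps per entry, and the table values are all made up for this illustration; a real table would be generated offline from the ramp formula.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LUT_SHIFT 4                 /* one table entry every 2^4 = 16 steps */
#define LUT_SPAN  (1 << LUT_SHIFT)

/* Placeholder delay table in timer ticks (invented numbers, not a real ramp). */
static const int32_t delay_lut[5] = { 16000, 8000, 4000, 2000, 1000 };

/* Look up x in a sparse table and interpolate linearly between the two
   neighbouring entries. Inputs past the last entry are clamped to it. */
int32_t lut_interp(const int32_t *table, uint32_t size, uint32_t x)
{
    assert(table != NULL && size >= 2u);

    uint32_t idx = x >> LUT_SHIFT;            /* which table interval */
    if (idx >= size - 1u) {
        return table[size - 1u];
    }

    int32_t frac = (int32_t)(x & (LUT_SPAN - 1u));   /* position inside it */
    int32_t y0 = table[idx];
    int32_t y1 = table[idx + 1u];

    /* Dividing by a constant power of two lets the compiler use shifts. */
    return y0 + ((y1 - y0) * frac) / LUT_SPAN;
}
```

For example, `lut_interp(delay_lut, 5u, 8u)` lands halfway between the first two entries. The cost per lookup is one multiply plus a few shifts and adds, with no floating point involved.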

If you do m = 530/(72000000*72000000) you will have trouble with fixed
point arithmetic, and I would suggest trouble with precision in floating
point - a float carries only about 7 significant decimal digits, so a
term this small is easily lost in the arithmetic, and a double would be
even slower. However there is a lot of redundancy there; how much
precision do you need? Consider this:

530 * 10000 /(72*72) = 1022
530/(72000000*72000000) = 1.022e-13

Note that the significant figures are the same, but the scale is
different. The trick is to work in different units so you can use
integer arithmetic and scaled integers.

Now if you make your scaling a power of two (so instead of, say, 10000
use 8192), it will be even faster because the compiler will replace the
divide and multiply operations with bit-shift operations (or you can do
it explicitly in code if you wish, but the compiler will do it).
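
As a rough sketch of how binary scaling could be applied to the original formula b = a*(1-m*a*a), the C below stores m = 530/(72000000*72000000) as an integer scaled by 2^56 and uses only integer multiplies and shifts. The names `M_Q56` and `ramp_next`, the choice of 2^56, and the limit of 2^20 timer ticks on `a` are assumptions made for this illustration; they do not come from the posts above.

```c
#include <assert.h>
#include <stdint.h>

/* m = 530 / (72000000 * 72000000) is about 1.0224e-13.
   Multiplied by 2^56 it is about 7367.0, so it fits in an integer.
   (The 2^56 scale is an assumption chosen for this example.) */
#define M_Q56 7367u

/* Next step delay in timer ticks: a * (1 - m*a*a), integers only.
   Assumes a <= 2^20 ticks so that no intermediate overflows 64 bits. */
uint32_t ramp_next(uint32_t a)
{
    assert(a <= (1u << 20));

    /* m*a*a as a fraction with 32 fractional bits:
       a*a*M_Q56 has 56 fractional bits, so drop 24 of them. */
    uint64_t corr_q32 = ((uint64_t)a * a * M_Q56) >> 24;

    /* a * (m*a*a), shifted back to whole ticks. */
    uint32_t delta = (uint32_t)(((uint64_t)a * corr_q32) >> 32);

    return a - delta;
}
```

With a = 100000 ticks this returns 99898, against about 99897.76 from the floating point formula. Different parts of the calculation use different scale factors (56, then 32 fractional bits), which is the kind of per-expression scaling a generic fixed point class cannot easily do.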

Using a fixed point library or class may cause you problems in this
situation, it is often best to deal with the calculation directly so
that you can use different scaling values for different parts of the
calculation to maximise precision and avoid overflow.

I usually start with decimal scaling because it is easier to see if the
answer is what you expected and where you might have missed a scaling,
and then switch the scale factors to binary afterwards. I also generally
model the calculations in a spreadsheet first to ensure it is correct
since that is easier than target debugging. You can calculate a full
range of values all at once and graph them. If you do that, use separate
cells for partial calculations to ensure that no sub-expression
overflows will be hidden. Then I would prototype the code using a
desktop compiler such as VC++ to make sure the C implementation matches
the spreadsheet model, and because debugging is so much easier. You can
get the prototype to output comma separated data and import that
directly into a spreadsheet and graph it to compare with the model.

My recommendation however is if you can do it with a lookup table, then
do so. It will be lightning fast, even with interpolation.

Also it may be easier to look at the whole thing differently. You can
simply calculate the desired position over time and issue the
appropriate number of steps to get to that position from the current
position. That is far simpler than calculating individual step periods.
Like this:

t = current time
p = current position
d = desired position at time t
error = p - d
if error < 0 direction = reverse
issue |error| steps

If the loop time is fast enough, error will always be small. Timing is
far less critical because you issue the pulses at the fastest possible
speed to catch up to where you should be at that time. This method works
for constant velocity and acceleration. If you use a hardware timer to
run the loop, it becomes even simpler because delta-t is a constant.

Clifford

Hi Clifford.

1. I want to use an MCU with internal flash (low price, easy board
construction ...). The LPC3000 series parts use external flash.
2. I need to calculate acceleration/deceleration and generate output
pulses with linear or circular interpolation (that takes about 7us). In
total it takes about 14us, so my maximum step rate is 1/14us, which is
71428 steps per second. That is good, but the problem is that I need
free time to communicate through the serial port.
3. A look-up table with every sample is not realistic: for a given
maximum speed and acceleration it is about 2,000,000 samples. I also
thought about a look-up table with only about 10,000 samples and linear
interpolation to calculate the values, but I don't know whether that
would take more time than these 7.5us.
4. I think a good way would be to use a fixed point library or class.
The value m (acceleration in steps / (osc * osc)) is effectively a
constant and is calculated only once. The full formula
dtime=dtime*(1-m*dtime*dtime) is calculated for every next step. Look at
http://www.hwml.com/LeibRamp.htm , that is what I am using.

Can anyone help me use a fixed point library? In a previous post dumarjo
gave some code, but how do I use it? I will try something this weekend.

regards

Sevc Dominik wrote:
> http://www.hwml.com/LeibRamp.htm , there is  I'm use it.

Hmmm... I realise that my response was long and suggested a number of
solutions, and none of them in detail, but I am pretty sure I would not
do it that way. I'd go with my last suggestion - the "where should I be
now?" approach. It is far less time critical and the errors will be
minute. Errors will grow if CPU time is taken up in other processing,
but your performance can now be defined in terms of the maximum error,
which is probably more closely related to the application.

A fixed periodic update will make it more deterministic, but at the
expense of greater average error.

To expand on my earlier proposal, where v is velocity, a is
acceleration, dt is the change in time since the last update, p is
position, and np the 'next' position (pseudocode):

np = p + v * dt + (a * dt * dt) / 2
v = v + a * dt
direction = np < p ? -1 : 1
error = abs(np - p )
if( error > MAX_ERROR )
{
    // handle exception
}
else
{
    for( i = 0; i < abs(np - p); i++ )
    {
        step( direction )
    }
}

Because this does not depend on the huge value of Fosc, you are far less
likely to end up with overflow issues when using fixed point arithmetic.
Even if using floating point you are less likely to end up with loss of
precision or range errors.
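
One way the update could look in scaled-integer C, as a sketch only. It assumes the update runs from a fixed-period timer so that dt = 1 update, and it uses a Q16 format (16 fractional bits). The type `motion_t`, the function `motion_update` and the field names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ONE_Q16 65536   /* 1.0 in Q16 fixed point (assumed format) */

typedef struct {
    int64_t pos_q16;    /* desired position in steps, Q16 */
    int32_t vel_q16;    /* velocity in steps per update, Q16 */
    int32_t acc_q16;    /* acceleration in steps per update^2, Q16 */
    int32_t issued;     /* whole steps already sent to the motor */
} motion_t;

/* Advance the model by one fixed update period (dt = 1) and return the
   signed number of steps needed to catch up with the desired position. */
int32_t motion_update(motion_t *m)
{
    assert(m != NULL);

    /* np = p + v*dt + (a*dt*dt)/2 with dt = 1 */
    m->pos_q16 += (int64_t)m->vel_q16 + m->acc_q16 / 2;
    /* v = v + a*dt */
    m->vel_q16 += m->acc_q16;

    /* Whole steps the motor should have made by now. */
    int32_t target = (int32_t)(m->pos_q16 / ONE_Q16);
    int32_t steps = target - m->issued;
    m->issued = target;
    return steps;
}
```

The caller would compare the magnitude of the returned value against its own error limit and then pulse the step output that many times in the direction given by the sign. With an acceleration of 0.25 steps per update squared (16384 in Q16) and a start from rest, eight updates issue eight steps in total, matching 0.25 * 8 * 8 / 2.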

Clifford

Hi.
I tried this:

IOSET1 = (1<<5); // output to oscilloscope
np = p + v * dt + (a * dt * dt) / 2;
v = v + a * dt;
direction = np < p ? -1 : 1;
error = abs(np - p );
IOCLR1 = (1<<5); // output to oscilloscope

and it takes 280ns.

regards.

Sevc Dominik wrote:
> Hi .
> I try this:
>
> IOSET1 = (1<<5); // output to osciloskop
> np = p + v * dt + (a * dt * dt) / 2;
> v = v + a * dt;
> direction = np < p ? -1 : 1;
> error = abs(np - p );
> IOCLR1 = (1<<5); // output to osciloskop
>
> ant it's  : 280ns
>
> regards.

20 clock cycles! I am not sure if I believe that especially if the
variables were floating point! Some of the calculations will have been
optimised out if you did nothing with the results. Pass all the
variables to a printf() call after the scope output. That will prevent
them from being optimised away. Do not use the volatile keyword since
that will prevent other valid optimisations.

Your timing mechanism is flawed. On many microcontrollers GPIO
operations require wait states. This probably says more about GPIO
performance than calculation performance; and for that, 280ns I can
believe - it is of the same magnitude as your 1/14th us step pulse
capability. You need to time repeated calculations so that the I/O
timing becomes insignificant.
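
A sketch of what timing repeated calculations might look like. The timer read and the calculation are passed in as function pointers so the same harness can run on the target or on a desktop. On an LPC2xxx the `now` callback would presumably just return the Timer 0 counter (T0TC); here two host-side stand-ins (`fake_now`, `fake_work`) simulate a calculation that costs 540 ticks, which is 7.5us at 72MHz. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*read_ticks_fn)(void);  /* reads a free-running timer */
typedef void (*work_fn)(void);            /* the calculation under test */

/* Run the calculation reps times between two timer reads, so that the
   cost of reading the timer (or toggling a pin) is spread over all
   repetitions. Returns the average ticks per call. */
uint32_t average_ticks(read_ticks_fn now, work_fn work, uint32_t reps)
{
    assert(now != NULL && work != NULL && reps > 0u);

    uint32_t start = now();
    for (uint32_t i = 0; i < reps; i++) {
        work();
    }
    /* Unsigned subtraction stays correct across one timer wrap-around. */
    return (now() - start) / reps;
}

/* Host-side stand-ins so the sketch runs without hardware. */
static uint32_t fake_timer;
static uint32_t fake_now(void) { return fake_timer; }
static void fake_work(void) { fake_timer += 540u; }
```

Calling `average_ticks(fake_now, fake_work, 1000u)` reports the simulated cost per call. On real hardware the figure also includes the loop and call overhead, so it is an upper bound on the calculation itself.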

Note my 'solution' was not intended to be directly translated to code.
It was merely the algorithm and equations. You will need to translate
them to machine math. You might use floating point or you could
translate it to fixed point (scaled integer) math. With fixed point math
there are additional operations to perform the scaling. If you use a
library, it will be far slower than optimising each calculation yourself
in one expression, since you can determine when scaling is necessary.

For example, if you use three decimal place fixed point (scale factor
1000) to calculate:

a * b / c

a library implementation would result in:

multiply: temp = (a * b) / 1000
divide:   (temp * 1000) / c

whereas you could optimise the expression as simply a * b / c because
the scale factors cancel out. Moreover, for small values of a and b the
library solution will give incorrect results, because the first step
will lose significant precision (or even result in zero!).
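
The difference can be shown with a few lines of C. This is an illustrative sketch using the scale factor of 1000 from the example; `fix_t`, `fix_mul`, `fix_div` and the two `muldiv_*` functions are invented names standing in for a generic fixed point library and for the hand-optimised expression.

```c
#include <assert.h>
#include <stdint.h>

#define SCALE 1000          /* decimal scale factor from the example */

typedef int32_t fix_t;      /* real value = stored value / SCALE */

/* What a generic library does: rescale after every single operation. */
static fix_t fix_mul(fix_t x, fix_t y)
{
    return (fix_t)(((int64_t)x * y) / SCALE);
}

static fix_t fix_div(fix_t x, fix_t y)
{
    assert(y != 0);
    return (fix_t)(((int64_t)x * SCALE) / y);
}

/* a * b / c built from library calls: the product is truncated first. */
fix_t muldiv_library(fix_t a, fix_t b, fix_t c)
{
    return fix_div(fix_mul(a, b), c);
}

/* a * b / c written directly: the scale factors cancel, nothing is lost
   until the final division. */
fix_t muldiv_direct(fix_t a, fix_t b, fix_t c)
{
    assert(c != 0);
    return (fix_t)(((int64_t)a * b) / c);
}
```

Take a = 0.020, b = 0.030 and c = 0.001, stored as 20, 30 and 1. The exact answer is 0.6. `muldiv_direct(20, 30, 1)` returns 600, which is 0.600, while `muldiv_library(20, 30, 1)` returns 0 because the intermediate product 0.0006 is truncated to zero before the division.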

Of course you'd not use 1000, but probably 1024 for the reasons I
mentioned earlier. Personally I would not trust a fixed point library
solution, it will be less efficient, and many I have come across on the
Internet are seriously flawed in any case - do it by hand.

Note an omission in my suggestion, np must be copied to p:

error = abs(np - p )
p = np

Clifford
