# Forum: ARM programming with GCC/GNU tools run code from.

 Author: Sevc D. (sevc) Posted on: 2008-01-25 19:14

Hi all.
I tested my application and saw on the oscilloscope that my function
takes 60 us.
How can I run this function from RAM? Or how do I check whether the MAM
is in use?

regards

 Author: Clifford S. (clifford) Posted on: 2008-01-25 21:22

Sevc Dominik wrote:
> Hi all.
> I tested my application and saw on the oscilloscope that my function
> takes 60 us.
> How can I run this function from RAM? Or how do I check whether the MAM
> is in use?
>
> regards
With respect to the MAM, read the manual!
http://www.standardics.nxp.com/support/documents/microcontrollers/pdf/user.manual.lpc2141.lpc2142.lpc2144.lpc2146.lpc2148.pdf.
Chapter 3 is what you need.

Basically you ensure it is switched off (reset default) by writing to the
MAMCR register, and then set the timing in the MAMTIM register before
switching it back on again. The manual has the timing details. I suggest
that you do this in the C Runtime Start-up (normally crt0.s) to get it
running as soon as possible after setting up the clock.
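
A minimal C sketch of that off / set timing / on sequence. The function names are made up for illustration, and the register addresses in the comment, the mode values and the clock thresholds are my reading of the LPC214x user manual, so check them against Chapter 3 before relying on them. The registers are passed in as pointers only so that the sequence can also be exercised on a PC.

```c
#include <stdint.h>

#define MAM_MODE_OFF   0u   /* MAMCR value: MAM disabled (assumed encoding) */
#define MAM_MODE_FULL  2u   /* MAMCR value: MAM fully enabled (assumed encoding) */

/* Flash fetch cycles for a given core clock. The >40 MHz case is quoted in
 * this thread; the two lower bands are an assumption taken from the manual. */
static uint32_t mam_fetch_cycles(uint32_t cclk_hz)
{
    if (cclk_hz < 20000000u)  return 1u;
    if (cclk_hz <= 40000000u) return 2u;
    return 3u;
}

/* Switch the MAM off, program the timing, then switch it fully on. */
static void mam_enable(volatile uint32_t *mamcr,
                       volatile uint32_t *mamtim,
                       uint32_t cclk_hz)
{
    *mamcr  = MAM_MODE_OFF;               /* timing may only change while off */
    *mamtim = mam_fetch_cycles(cclk_hz);
    *mamcr  = MAM_MODE_FULL;
}

/* On the target (addresses assumed: MAMCR 0xE01FC000, MAMTIM 0xE01FC004):
 *   mam_enable((volatile uint32_t *)0xE01FC000,
 *              (volatile uint32_t *)0xE01FC004, 60000000u);
 */
```

In a real project the same three writes would usually sit in the start-up code right after the PLL is locked.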

With this enabled you may find you need not run from RAM - it may make
little difference for the extra complexity and RAM is more scarce on
your part. The MAM is designed to allow code in Flash to execute with
very few wait states. It will have no effect on code that is run from
RAM (unless the code fetches data from Flash).

Clifford

 Author: Sevc D. (sevc) Posted on: 2008-01-26 05:37

How much faster does code execute from RAM?
My code does its calculations with double variables, and that takes a
long time.
I must reduce the calculation time.
I measured this:
    IOSET0 = (1<<5);
    IOCLR0 = (1<<5);
These instructions take 250 ns.
And this:
    IOSET0 = (1<<5);
    T0MR0 = T0TC + Ax_p_x.Timer;//INTERVAL_XuS;//
    IOCLR0 = (1<<5);
takes 1.12 us.

If I run this from RAM, how much faster will it be than from Flash?
My calculation currently uses high precision for the motion, but that
costs more microseconds. The acceleration and deceleration follow a
sinusoid; I think that is too much for an ARM7.
If I use Toggle on match output then I can save some time, but I don't
know if it's not spare.

regards

 Author: Clifford S. (clifford) Posted on: 2008-01-26 11:38

Sevc Dominik wrote:
> How much faster does code execute from RAM?
> My code does its calculations with double variables, and that takes a
> long time.
As I said, the MAM probably makes the improvement marginal. Your
hardware does not have floating point hardware making floating point
calculations especially slow. In my experience double precision floating
point operation in software takes about ten times longer than an
equivalent (or at least suitable) fixed-point calculation. I suggest
that the greatest improvement can be made by removing the need for
floating point.

A quick gain may be achieved by using single precision rather than
double. Because a single precision value fits into a single machine
word, it requires far fewer operations and memory accesses to
manipulate. If you do this, and are using library functions, make sure
you use the single precision versions. In C++ this is automatic when you
include <cmath> because they are overloaded, but in C (<math.h>) you
would for example have to use sinf() rather than sin().

Don't get hung up on the RAM execution thing, I think you will see
little benefit. It is generally true that RAM execution is faster, but
on your hardware specifically, the MAM is designed exactly to make that
unnecessary. Note that the MAM is peculiar to the LPC2000 series, and if
you were to port to another device you may need to run from RAM, but
frankly I even doubt that, your use of floating point will swamp any
performance hit from Flash execution.

> I must reduce the calculation time.
> I measured this:
>     IOSET0 = (1<<5);
>     IOCLR0 = (1<<5);
> These instructions take 250 ns.
> And this:
>     IOSET0 = (1<<5);
>     T0MR0 = T0TC + Ax_p_x.Timer;//INTERVAL_XuS;//
>     IOCLR0 = (1<<5);
> takes 1.12 us.
>
> If I run this from RAM, how much faster will it be than from Flash?
Are those timings with or without MAM fully enabled? The answer is
easiest to determine by disassembling the generated code and counting
the instructions. At 60MHz you will achieve more-or-less 60 MIPS.
However bear in mind that I/O operations are slower than RAM as well,
and insert wait states. Again the LPC architecture has 'acceleration'
features to mitigate this. Again read the manual, but to use the
accelerated I/O you need to use the 'F' prefixed I/O registers - you are
using the legacy code support registers, which are far slower. The 'F'
registers include single bit operations that are even faster than FIOSET
accesses. See Chapter 8 of the manual. There really is no substitute for
reading the documentation in these cases. That is what I am doing; I've
never used the part!
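
A rough C sketch of toggling P0.5 through the fast ('F') registers instead of IOSET0/IOCLR0. The struct and function names are hypothetical, and the addresses in the comments, as well as the need to set the GPIO0M bit in SCS first, are my reading of the LPC214x manual and should be verified in Chapter 8. The registers are reached through pointers here only so the logic can be checked off-target; on the part itself the vendor header macros with constant addresses would normally be used, which is also faster.

```c
#include <stdint.h>

/* Pointers to the port 0 fast GPIO registers (addresses are assumptions). */
typedef struct {
    volatile uint32_t *scs;     /* SCS,     assumed 0xE01FC1A0, bit 0 = GPIO0M */
    volatile uint32_t *fiodir;  /* FIO0DIR, assumed 0x3FFFC000 */
    volatile uint32_t *fioset;  /* FIO0SET, assumed 0x3FFFC018 */
    volatile uint32_t *fioclr;  /* FIO0CLR, assumed 0x3FFFC01C */
} fast_gpio0_t;

/* Route port 0 to the fast GPIO block and make one pin an output. */
static void fast_gpio0_init_output(const fast_gpio0_t *g, unsigned pin)
{
    *g->scs    |= 1u;            /* select fast GPIO for port 0 */
    *g->fiodir |= (1u << pin);   /* 1 = output */
}

/* Writing a 1 bit sets or clears that pin; 0 bits leave pins untouched. */
static inline void fast_gpio0_set(const fast_gpio0_t *g, unsigned pin)
{
    *g->fioset = 1u << pin;
}

static inline void fast_gpio0_clr(const fast_gpio0_t *g, unsigned pin)
{
    *g->fioclr = 1u << pin;
}
```

With this in place the measurement markers become fast_gpio0_set(&g, 5) and fast_gpio0_clr(&g, 5) in place of the legacy register writes.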

> My calculation currently uses high precision for the motion, but that
> costs more microseconds. The acceleration and deceleration follow a
> sinusoid; I think that is too much for an ARM7.

On the contrary, use fixed point arithmetic (scaled integers) and create
a sinusoid look-up table. It is very unlikely that you need double
precision. Use a 2^n value for your scaling so that you can use shift
operations rather than division for scaling (the compiler will do that
for you even if you use a divide if the RHS is a power of 2 constant).
The easiest way to create the look-up table is in a spreadsheet, export
as CSV and wrap it

int sin_table[] = { <insert CSV data here> } ;

or you might write your own code generator. Even if you persisted with
floating point and made the lookup table type float, it will be faster
than using the sin() function.
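
To make "scaled integers" concrete, here is a small sketch of a 16.16 fixed-point type. The names (fix16_t, fix_mul and so on) and the choice of 16 fractional bits are assumptions for illustration only; pick the split that suits your value range. Dividing by the power-of-two constant lets the compiler emit shifts, and it stays well defined for negative values.

```c
#include <stdint.h>

#define FIX_SHIFT 16                 /* number of fractional bits (assumed) */
#define FIX_ONE   (1 << FIX_SHIFT)   /* the value 1.0 in 16.16 format */

typedef int32_t fix16_t;

/* Integer to fixed point. */
static inline fix16_t fix_from_int(int32_t v)
{
    return v * FIX_ONE;
}

/* Fixed point to integer, truncating toward zero. */
static inline int32_t fix_to_int(fix16_t v)
{
    return v / FIX_ONE;
}

/* Multiply two 16.16 values. The 64-bit intermediate keeps the full
 * product before it is scaled back down by 2^16. */
static inline fix16_t fix_mul(fix16_t a, fix16_t b)
{
    return (fix16_t)(((int64_t)a * (int64_t)b) / FIX_ONE);
}
```

For example, fix_mul(fix_from_int(3), FIX_ONE / 2) gives 1.5 in 16.16 format, with no floating point library call involved.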

The size of the table will depend on the necessary precision and
available memory, a 512 element lookup table will give better than one
degree accuracy and take 2K of Flash. You may not need that much
resolution since the motor will naturally interpolate between calculated
points with a linear approximation. You can reduce the table size by
only encoding one quadrant, and then using reflection and inversion to
obtain values in the other quadrants. If you are using degrees or
radians, it may be simpler to make your table a multiple of 360 or 3142
(PI*1000 for four digit accuracy).
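
The one-quadrant trick can be sketched like this. The table is deliberately tiny (64 steps per turn, values scaled by 32767) so it fits on the screen; the size, the scaling and all names are assumptions for illustration, and a real table would be generated with more entries as described above.

```c
#include <stdint.h>

#define STEPS_PER_TURN 64u                    /* power of two, so a mask wraps the phase */
#define QUARTER        (STEPS_PER_TURN / 4u)

/* First quadrant only: round(32767 * sin(2*pi*k/64)) for k = 0..16. */
static const int16_t sin_q15_quarter[QUARTER + 1u] = {
        0,  3212,  6393,  9512, 12539, 15446, 18204, 20787, 23170,
    25329, 27245, 28898, 30273, 31356, 32137, 32609, 32767
};

/* Sine of phase (in 1/64 turns), scaled by 32767. Quadrants 1 and 3 read
 * the table backwards (reflection); quadrants 2 and 3 negate (inversion). */
static int16_t sin_q15(uint32_t phase)
{
    uint32_t p   = phase & (STEPS_PER_TURN - 1u);
    uint32_t idx = p & (QUARTER - 1u);

    switch (p / QUARTER) {
    case 0:  return sin_q15_quarter[idx];
    case 1:  return sin_q15_quarter[QUARTER - idx];
    case 2:  return (int16_t)-sin_q15_quarter[idx];
    default: return (int16_t)-sin_q15_quarter[QUARTER - idx];
    }
}

/* Scale an integer amplitude by a table value; the divide by 2^15 is a
 * power-of-two constant, so the compiler can reduce it to shifts. */
static int32_t scale_q15(int32_t amplitude, int16_t s)
{
    return (amplitude * (int32_t)s) / 32768;
}
```

The stored table has 17 entries instead of 64, at the cost of a mask, a compare and possibly a negate per look-up.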

> If I use Toggle on match output then I can save some time, but I don't
> know if it's not spare.
>
Then you may as well use the PWM as I suggested.

Anyway my conclusion would have to be that you don't need RAM execution
or a faster processor or an FPU, you merely need to adapt your coding
practices to suit the hardware. That means reading the manual, and
learning to do without floating point (using fixed point and look-up
tables). I can assure you that most commercial motion control and even
more computationally complex applications do not use floating point.
Some resources:
http://www.embedded.com/98/9804fe2.htm
http://en.wikipedia.org/wiki/Fixed-point_arithmetic

If you were to post the expression you are trying to compute I could

Clifford

 Author: Sevc D. (sevc) Posted on: 2008-01-26 22:01

Hi Clifford.
I measured the time of some functions.
The basic settings are in the Startup.S file.
These are the values I have set (the crystal is 12 MHz):
VPB Div 4
PLL M Multiplier 5-1 > 4
D Divider 1 > 2
MAM Fully Enabled
Timing 4
I measured this code; it is the Timer1 interrupt, every 10 ms:

IOSET0 = (1<<5);
T1IR = 1;      /* clear interrupt flag from MR0*/
IENABLE;      /* handles nested interrupt */
IDISABLE;
VICVectAddr = 0;    /* Acknowledge Interrupt */
IOCLR0 = (1<<5);
It takes 790ns.

Multiplier to 6: 660ns
Multiplier to 7: 490ns
Multiplier to 8: 430ns
Multiplier to 9: 390ns

I changed the multiplier back to 5, changed VPB Div to 0, and measured again.
Multiplier to 5: 520ns
Multiplier to 6: 430ns
Multiplier to 7: 320ns
Multiplier to 8: 290ns

At these values I don't know whether the MCU is stable; it runs hotter
than at the normal settings.

I think it would be good to write some of the code in asm rather than in C.

regards

 Author: Clifford S. (clifford) Posted on: 2008-01-27 03:04

Sevc Dominik wrote:
>
> I think it would be good to write some of the code in asm rather than in C.
>

I think you are "sweating the small stuff". Why are you worried about a
few hundred nanoseconds when elsewhere you are using floating point
operations that take far longer!?

That code is so simple, I doubt very much hand coding in assembler will
make much difference.

Bear in mind that given the way you are performing the timings, a
significant amount of the time is actually consumed by the output
operations you are using to do the measurement! You earlier measured
that at 250ns, which since the MAM would not affect the output timing,
suggests that the execution time between the output toggles is very
short. You would have to at least time the toggle on its own for each
scenario and subtract that from all your timings, and even then that
does not account for all the timing instrumentation overhead.
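
As a worked example of that subtraction, using figures posted earlier in this thread: the helper name is made up, and the result is only meaningful if the bare-toggle baseline was taken under the same clock and MAM settings as the measurement, which has not been confirmed here.

```c
#include <stdint.h>

/* Time attributable to the code between the two pin writes, in ns.
 * Returns 0 if the baseline is not smaller than the measurement, which
 * would mean the two figures were not taken under the same conditions. */
static uint32_t net_time_ns(uint32_t measured_ns, uint32_t toggle_only_ns)
{
    return (measured_ns > toggle_only_ns) ? (measured_ns - toggle_only_ns) : 0u;
}
```

With the 250 ns bare toggle as baseline, the 1.12 us timer write comes out at 870 ns and the 790 ns interrupt body at 540 ns, and even those figures still include some instrumentation overhead.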

I note that you are still using the slow I/O registers. Which makes me
wonder why I bothered to investigate it for you!

Another reason I suggest that you are sweating "the small stuff" is that
I suspect that the interrupt latency is more significant than the time
of execution of the lines you posted, and you can probably do little

A better way to estimate overall interrupt execution time is to create a
busy-loop in main() (after setting up the timer interrupt) that
constantly toggles the output, and then when the interrupt executes the
activity will stop - time the period for which there is no activity to
determine the true interrupt execution time. Your code is probably
insignificant compared to the number of instructions required to
preserve and restore registers and perform the processor mode switching
to and from IRQ, and then there is the hardware latency as well. If your
whole interrupt truly takes less than 10us I would be impressed. You
have to measure correctly and consider the effect the act of measurement
itself has on the timing.

The user manual suggests a MAM clock of 3 for processors >40MHz, so you
might be pushing it. What is worth more, shaving off 100ns or having
your application reliable in the field?

What timing do you actually need to achieve (and why)? I am struggling
to understand what you are trying to achieve. Stepping a motor and
generating motion profiles is well within the capability of the part you
are using, and I suspect that you are worrying about the wrong thing.
Remember it is only too slow if it fails to meet its deadlines - what
are the deadlines? If you cannot quantify them you are wasting your time
measuring - you have no means of determining success. I would suggest
that if success depends on a few hundred nanoseconds, then you will
fail.

Clifford

 Author: Sevc D. (sevc) Posted on: 2008-01-28 23:24

Hi Clifford.

I have been working on this the whole time. I reduced the calculation
for acceleration, deceleration, max. speed, etc. Now my code takes 16 us,
and less if I overclock the CPU. But I measured my stepper motor (with
driver) on MACH3 and I can't drive it faster than 12000 steps per second.
I was surprised; I thought I could reach more than 35K steps per second,
but no. There is no point in generating more steps than the stepper motor
can follow. When I test my electronics with the new code, it can step at
more than 20K steps per second. My code is much faster than Mach3 on my PC.
OK, I need some reserve of step rate, and with this code I have it.
A possible problem: when I fit 3 motors to the machine (a small CNC) and
run some motion, I will see whether the motion is correct (within the
minimal step resolution) or whether the error is more than one step.
Now I use an M6 threaded rod. The motor has 200 steps per revolution and
I use half stepping, so one turn takes 400 steps. The minimal step is
0.0025 mm.
If the error is less than this step, all is good; if not, I must
calculate with much better precision.

Thanks for all your suggestions and help. I will write more once I have
working code and start testing.

regards.

PS: Now I have no problem with the hardware (LPC2142), only with the C
language (I'm not a good programmer in this language).
