EmbDev.net

Forum: ARM programming with GCC/GNU tools Code execution from flash (mentioned at MP3 player project)


by Andy (Guest)


Hi,

I am thinking about switching from Atmel AVR to ARM. I thought the
AT91SAM7S256 might be a nice device among the ARM microcontrollers. Now I
read in the interesting ARM MP3/AAC player project on this site that
execution from flash is slow. How much slower is it exactly compared to
RAM execution? The datasheet says: "Single Cycle Access at Up to
30 MHz in Worst Case Conditions". Does that mean half the execution speed
if the clock is faster than 30 MHz? Just a guess.

I could not find much more about this (maybe I just used the wrong
keywords). Does anyone know how other ARM devices perform here? I have
considered the Philips LPC2148 and ADuC7027 (or ADuC7128 when available).

Of course, any information about other unexpected surprises similar to
this one would be appreciated :) Is programming as easy as with avr-gcc?

Thanks in advance,
Andy

by Stefan (Guest)


Just to give you a ballpark figure: on my Atmel ARM7 AT91R40008 board
running at 66 MHz with external flash ROM (the AT91R40008 has no
internal flash), execution from flash ROM is eight times slower than
execution from the internal SRAM of the ARM7. A detailed description of
the board is available in the wiki at www.mikrocontroller.net (look for
Tyco or NavMe).

Stefan

by Clifford S. (clifford)


You need to refer to the processor's user manual, in this case
http://www.atmel.com/dyn/resources/prod_documents/doc6175.pdf. Refer to
page 110: depending on the FWS (flash wait state) level, reads may take
1, 2, 3, or 4 clock cycles. Page 504 has a table detailing the tradeoff
between FWS and clock speed. For the chip's maximum of 55 MHz, you will
need 2 wait states, reducing flash access to 27.5 MHz (but with faster
RAM access, so in general it would probably still be beneficial overall).

It is almost always the case that RAM execution will be faster than any
kind of ROM unless you are executing at very low clock frequencies.
However the differential between ROM and RAM on this chip is not that
great.

Other chips support external memory devices, which may provide more
flexible control of performance through selection of an appropriate
memory device, but on-chip memory, where present, will typically be as
fast as or faster than external memory. For external devices, the
performance depends on the specific memory part and the bus
configuration (timing and width).

Many other chips also include instruction and data caches, which will
significantly improve performance for all memory devices. Applications
with small code, or where most of the execution time is spent in only a
small part of the code, will especially benefit from a cache. You can
also lock down the cache to prevent critical code or data from being
flushed.

For high-performance applications, ROM should generally be reserved for
boot code and application image storage. The boot code should copy the
application image to RAM for execution. Looking at the flash/RAM balance
on the AT91SAM7S parts, they are obviously intended for ROM code
execution; that is why the flash performance is relatively high. You
might consider having specific critical code execute from RAM, but that
would complicate the link and load process considerably, since you have
to arrange for the linker to place code in either ROM or RAM, place a
copy of the RAM code in the ROM image, and then copy it to RAM at
runtime.
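
The copy step described above can be sketched in C as follows. This is only an illustration, not code from any particular startup file: copy_section is a hypothetical helper, and the linker symbol names mentioned in the comment (_etext, _data, _edata) are assumptions that depend entirely on the linker script in use.

```c
#include <stdint.h>

/*
 * Copy a section word by word from its load address (in ROM/flash)
 * to its run address (in RAM). In a real startup routine the three
 * pointers would come from linker-script symbols, for example
 * (hypothetical names) &_etext, &_data and &_edata.
 */
void copy_section(const uint32_t *load_addr,
                  uint32_t *run_start,
                  const uint32_t *run_end)
{
    while (run_start < run_end) {
        *run_start++ = *load_addr++;
    }
}
```

Any code the linker placed in the copied section is then present in RAM before main() runs, provided the startup code calls the helper early enough.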

Clifford



In some a

by Andy (Guest)


Thanks for the quick answers. They helped a lot. Now I know what I have
to look for.

Andy

P.S.: But doesn't the table indicate that 1 FWS is enough at 55 MHz?

by Clifford S. (clifford)


Andy wrote:
> P.S.: But doesn't the table indicate that 1 FWS is enough at 55 MHz?

Yes, sorry, I meant two clock cycles (i.e. 1 wait state). There is no
benefit in using an FWS greater than 1; presumably the higher settings
are intended for faster parts in the family (or a future family).
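
As a quick check of the arithmetic, here is a minimal sketch that assumes only what is stated above: a flash read takes FWS + 1 clock cycles. The function name is made up for illustration, and the result is just the raw read rate; it says nothing about how much a real program slows down.

```c
#include <stdint.h>

/*
 * Raw flash read rate in kHz for a given master clock (in kHz) and
 * number of flash wait states. Each read takes fws + 1 clock cycles.
 */
uint32_t flash_read_rate_khz(uint32_t mck_khz, uint32_t fws)
{
    return mck_khz / (fws + 1u);
}
```

With 55000 kHz and 1 wait state this gives 27500 kHz, i.e. the 27.5 MHz mentioned earlier; with 30000 kHz and 0 wait states the flash keeps up with the core clock.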

As an aside, and as an example of the benefits of a chip with a cache: I
recently evaluated the Philips LPC3180. It runs at 208 MHz and has a
Vector Floating Point unit. It was the floating point performance I was
interested in, specifically for a large single-precision matrix
multiplication. The test matrices' dimensions were 1200x1200 and
1200x30. With instruction/data caching off the execution time was 36
seconds; with caching on it was 19 seconds. This was using SDR SDRAM
(104 MHz bus). All the instructions for the calculation loop will have
been cached, leaving only the data accesses needing to go to the bus,
thus almost halving the execution time! In algorithms where the data set
is also small, most of the data accesses may be served from the cache as
well.
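
For reference, the kind of kernel meant here looks roughly like the sketch below. This is a plain textbook triple loop, not the code that was actually benchmarked; the function name and the row-major layout are assumptions. The point is that the inner loop is only a handful of instructions, so it fits easily into an instruction cache, while the operands are streamed from memory.

```c
#include <stddef.h>

/*
 * C = A * B in single precision, all matrices stored row-major.
 * A is n x k, B is k x m, C is n x m.
 */
void matmul_f32(const float *a, const float *b, float *c,
                size_t n, size_t k, size_t m)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < m; ++j) {
            float sum = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                sum += a[i * k + p] * b[p * m + j];
            }
            c[i * m + j] = sum;
        }
    }
}
```

For the test described above the call would use n = 1200, k = 1200 and m = 30.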

My advice then is that if you have to execute from ROM, and you don't
have a cache, make sure it is a fast ROM. Of course, any application
only needs to be 'fast enough', so you also need to consider the
requirements and deadlines of the algorithm to avoid unnecessarily
paying for expensive hardware.

Clifford

by Andreas S. (andreas) (Admin)


Clifford Slocombe wrote:
> You
> might consider having specific critical code execute from RAM, but that
> would complicate the link and load process considerably, since you have
> to arrange for the linker to place code in either ROM or RAM, place a
> copy of the RAM code in the ROM image, and then copy it to RAM at
> runtime.

It's actually very easy: you just have to add
  __attribute__ ((section (".data")))
to the function definition. The .data section will be copied to RAM by
the startup code anyway, as arranged by the linker script.
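
A minimal sketch of what that looks like in source code. RAMFUNC is a hypothetical convenience macro (not part of GCC or any vendor header) and checksum32 is just an example function. The macro expands to the section attribute only for bare-metal ARM builds with GCC, so the same file still compiles and runs normally on a PC. Some toolchain versions may emit a warning when code is placed in .data.

```c
#include <stdint.h>

/* Hypothetical helper macro: place a function in .data on bare-metal ARM. */
#if defined(__GNUC__) && defined(__arm__) && !defined(__linux__)
#define RAMFUNC __attribute__ ((section (".data")))
#else
#define RAMFUNC /* hosted build: leave the function where it is */
#endif

/* The attribute is attached to the declaration of the function. */
RAMFUNC uint32_t checksum32(const uint8_t *buf, uint32_t len);

/* Example function: simple byte-wise sum over a buffer. */
uint32_t checksum32(const uint8_t *buf, uint32_t len)
{
    uint32_t sum = 0u;
    for (uint32_t i = 0u; i < len; ++i) {
        sum += buf[i];
    }
    return sum;
}
```

Whether the function really ended up in RAM can be checked in the linker map file by looking at the address assigned to its symbol.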

by Clifford S. (clifford)


Andreas S. wrote:

> It's actually very easy, you just have to add
>   __attribute__ ((section (".data")))
> to the function definition. The .data section will be copied to RAM by
> the linker script anyway.

Good point. My entire application runs from RAM after expanding a
zlib-compressed application image from ROM (which is a somewhat more
complex trick). Since I have never had to perform the trick I suggested,
I probably should have kept quiet about its level of complexity ;-)

Clifford

by Clifford S. (clifford)


> Does anyone know how other ARM devices perform here? I have
> considered the Philips LPC2148 and ADuC7027 (or ADuC7128 when available).

You might find this interesting:
http://eetimeseurope.cmp.com/products/micro/188501108

Clifford

by Jim K. (ancaritha)


Andreas S. wrote:
> It's actually very easy, you just have to add
>   __attribute__ ((section (".data")))
> to the function definition. The .data section will be copied to RAM by
> the linker script anyway.

I just started working with ARMs and decided to try this on one of my
interrupt handlers.

I declared the function:
void EXTCOMM_IRQ_INTERRUPT(void) __attribute__ ((section (".data")));

But when I compiled and ran it, the function wasn't called. If I
remove the __attribute__, it works. Is there anything else I have to
change that I missed or forgot about?


If anyone could point me in the right direction I'd appreciate it,
thanks.

by Phil D. (Guest)


Jim Kaz wrote:

> If anyone could point me in the right direction I'd appreciate it,
> thanks.

Did you ever find a solution to this, Jim?

by Jim K. (ancaritha)


Phil D. wrote:
> Jim Kaz wrote:
>
>> If anyone could point me in the right direction I'd appreciate it,
>> thanks.
>
> Did you ever find a solution to this, Jim?

Nope. I'm just kinda dealing with it for the time being. Perhaps later,
once I have all of the components in my system and can spend the time to
focus on code execution time, I'll come back to this.

by Martin Thomas (Guest)


Maybe useful: the "gamma" and the "RTT-basic" examples for AT91SAM7S
demonstrate ISRs located in RAM:
http://www.siwawi.arubi.uni-kl.de/avr_projects/arm_projects/index_at91.html
I use a special function section ".fastrun" for this, which is
integrated into the linker script's .data section but is also aligned. I
have also used this method in other projects that are not available from
my web pages. The problem might have other causes. Check: the
thumb/thumb-interwork settings, the assembler wrapper for the interrupt
(if any), alignment and the symbol list, set a breakpoint and inspect
the stack pointer, long_call, etc. It is difficult to help without
further details. Create a minimal but complete example (source, linker
script, makefile) that reproduces the failure and attach it to a
message.
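
Two of the checks from the list (alignment and "did the function really end up in RAM?") can be sketched as small helpers that take an address copied from the map file or read from a function pointer. Everything here is an assumption for illustration: the helper names are made up, and the SRAM base and size are what I would expect for an AT91SAM7S256 (64 KB at 0x00200000) and should be verified against the datasheet of the actual part.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed internal SRAM window; verify against the datasheet. */
#define SRAM_BASE 0x00200000u
#define SRAM_SIZE 0x00010000u

/* True if the address lies in the assumed SRAM window.
   Bit 0 is masked off because it is set in pointers to Thumb code. */
bool addr_in_sram(uint32_t addr)
{
    uint32_t a = addr & ~1u;
    return a >= SRAM_BASE && (a - SRAM_BASE) < SRAM_SIZE;
}

/* True if the address is word aligned, as needed for ARM-mode code. */
bool arm_code_aligned(uint32_t addr)
{
    return (addr & 3u) == 0u;
}
```

If the address of the ISR fails either check, the problem is in the linker script or section placement rather than in the interrupt setup.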

Martin Thomas
