Hi, I am thinking about switching from Atmel AVR to ARM. I thought the AT91SAM7S256 might be a nice device among the ARM µCs. Now I read in the interesting ARM MP3/AAC player project on this site that execution speed from flash is slow. How much slower is it exactly, compared to RAM execution? The datasheet mentions: "Single Cycle Access at Up to 30 MHz in Worst Case Conditions". Does that mean half the execution speed if the clock is faster than 30 MHz? Just a guess. I could not find much more about this (maybe I just used the wrong keywords). Does anyone know how other ARM devices perform here? I have considered the Philips LPC2148 and the ADuC7027 (or ADuC7128, when available). Of course, any info about other unexpected surprises similar to this one will be appreciated :) Is the programming as easy as with avr-gcc? Thanks in advance, Andy
Just to give you a ballpark figure: on my Atmel ARM7 AT91R40008 board running at 66 MHz with external flash ROM (the AT91R40008 has no on-chip flash), execution from flash ROM is eight times slower than from the ARM7's internal SRAM. A detailed description of the board is available in the wiki of www.mikrocontroller.net (look for Tyco or NavMe). Stefan
You need to refer to the processor's user manual - in this case http://www.atmel.com/dyn/resources/prod_documents/doc6175.pdf. Refer to page 110: depending on the FWS level, reads may take 1, 2, 3, or 4 clock cycles. Page 504 has a table detailing the trade-off between FWS and clock speed. For the chip's maximum of 55 MHz you will need 2 wait states, reducing flash access to 27.5 MHz (but RAM access remains faster, so in general it would probably still be beneficial overall).

It is almost always the case that RAM execution will be faster than any kind of ROM unless you are executing at very low clock frequencies. However, the differential between ROM and RAM on this chip is not that great. Other chips support external memory devices, which may give more flexible control over performance through the choice of an appropriate memory part, but where on-chip memory is present it will typically be as fast as or faster than external memory. For external devices, performance depends on the specific memory part and the bus configuration (timing and width).

Many other chips also include instruction and data caches, which will significantly improve performance for all memory devices. Applications with small code, or where most of the execution time is spent in only a small part of the code, will especially benefit from a cache. You can also lock down the cache to prevent critical code or data from being flushed.

For high-performance applications, ROM should generally be reserved for boot code and application image storage. The boot code should copy the application image to RAM for execution. Looking at the flash/RAM balance on the AT91SAM7S parts, they are obviously intended for ROM code execution - that is why the flash performance is relatively high.
You might consider having specific critical code execute from RAM, but that would complicate the link and load process considerably, since you have to arrange for the linker to place code in either ROM or RAM, place a copy of the RAM code in the ROM image, and then copy it to RAM at runtime. Clifford
Thanks for the quick answers. They helped a lot. Now I know what I have to look for. Andy P.S.: But doesn't the table indicate that 1 FWS is enough at 55 MHz?
Andy wrote: > P.S.: But doesn't the table indicate that 1 FWS is enough at 55 MHz? Yes, sorry, I meant two clock cycles (i.e. 1 wait state). There is no benefit in using an FWS greater than 1; presumably the higher settings are intended for faster parts in the family (or a future family). As an aside, and as an example of the benefit of a chip with a cache: I recently evaluated the Philips LPC3180; it runs at 208 MHz and has a Vector Floating Point unit. It was the floating-point performance I was interested in, specifically for a large single-precision matrix multiplication. The test matrices' dimensions were 1200x1200 and 1200x30. With instruction/data caching off the execution time was 36 seconds; with caching on it was 19 seconds. This was using SDR SDRAM (104 MHz bus). All the instructions for the calculation loop will have been cached, leaving only the data accesses needing to go to the bus, thus almost halving the execution time! In algorithms where the data set is also small, most of the data accesses may come from the cache as well. My advice, then, is that if you have to execute from ROM and you don't have a cache, make sure it is a fast ROM. Of course, any application only needs to be 'fast enough', so you also need to consider the requirements and deadlines of the algorithm to avoid paying unnecessarily for expensive hardware. Clifford
Clifford Slocombe wrote: > You > might consider having specific critical code execute from RAM, but that > would complicate the link and load process considerably, since you have > to arrange for the linker to place code in either ROM or RAM, place a > copy of the RAM code in the ROM image, and then copy it to RAM at > runtime. It's actually very easy: you just have to add __attribute__((section(".data"))) to the function definition. The .data section is copied to RAM by the startup code (as arranged by the linker script) anyway.
Andreas S. wrote: > It's actually very easy: you just have to add > __attribute__((section(".data"))) > to the function definition. The .data section is copied to RAM by > the startup code anyway. Good point. My entire application runs from RAM after expanding a zlib-compressed application image from ROM (which is a somewhat more complex trick); so since I have never had to perform the trick I had suggested, I probably should have kept quiet about its level of complexity ;-) Clifford
> Does anyone know how other ARM devices perform here. I have > considered Philips LPC2148 and ADuC7027 (or ADuC7128 when available). You might find this interesting: http://eetimeseurope.cmp.com/products/micro/188501108 Clifford
Andreas S. wrote: > It's actually very easy: you just have to add > __attribute__((section(".data"))) > to the function definition. The .data section is copied to RAM by > the startup code anyway. I just started working on ARMs and decided to try this on one of my interrupts. I declared the function: void EXTCOMM_IRQ_INTERRUPT(void) __attribute__((section(".data"))); But when I compiled and ran it, the function wasn't called. If I remove the __attribute__, it works. Is there anything else that I have to change that I missed or forgot about? If anyone could point me in the right direction I'd appreciate it, thanks.
Jim Kaz wrote: > If anyone could point me in the right direction I'd appreciate it, > thanks. Did you ever find a solution to this, Jim?
Phil D. wrote: > Jim Kaz wrote: > >> If anyone could point me in the right direction I'd appreciate it, >> thanks. > > Did you ever find a solution to this, Jim? Nope. I just kind of deal with it for the time being. Perhaps later, once I have all of the components in my system and can spend the time to focus on code execution time, I'll come back to this.
Maybe useful: the "gamma" and "RTT-basic" examples for the AT91SAM7S demonstrate ISRs located in RAM: http://www.siwawi.arubi.uni-kl.de/avr_projects/arm_projects/index_at91.html . I use a special function section ".fastrun" for this, which is placed inside the linker script's .data output section and is also aligned. I have also used this method in other projects not available from my web pages. The problem might have other causes. Check: Thumb/Thumb-interwork settings, the assembler wrapper for the interrupt (if any), alignment/symbol list, set a breakpoint and inspect the stack pointer, long_call, etc. It is difficult to help without further details. Create a minimal complete example (source, linker script, makefile) that reproduces the failure and attach it to a message. Martin Thomas
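For illustration, the ".fastrun" arrangement Martin describes might look like the following GNU ld fragment (symbol names, alignment, and memory-region names are illustrative; the real scripts are in the linked examples). The section is given a load address in flash via AT, and the startup code copies everything between _data and _edata into RAM, carrying the RAM-resident code along with the initialized data.

```ld
.data : AT (_etext)          /* load from flash, run in RAM */
{
    _data = .;
    *(.data)
    . = ALIGN(4);
    *(.fastrun)              /* RAM-resident code, copied with .data */
    . = ALIGN(4);
    _edata = .;
} > RAM
```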