Hi, I recently started to use the winarm toolchain with the programmers notepad editor. My target is a AT91SAMS256 board based on the same drawings as the EK. The project import works pretty well with the editor thanks to the guys from PN. But if I build the atmel example (blinking leds) I encounter some problems: When I put the compiler OPT=s (size) my code isn't running at all the leds ar on but never start to blink ! When I put the compiler OPT=0 (none) my code works just fine as I see the led blinking ! In the makefile: # Optimization level, can be [0, 1, 2, 3, s]. # 0 = turn off optimization. s = optimize for size. # (Note: 3 is not always the best optimization level. See avr-libc FAQ.) OPT = s #OPT = 0 If I take a look in the main.lss file for both builds there's indeed a difference in target main code maybe OPT=S is not recommended for this target I don't know. nevertheless this cannot be the case, when changing compiler options resulting in other execution behaviour. Any one encountered the same kind of strange things ? Solution ?
> > # Optimization level, can be [0, 1, 2, 3, s]. > # 0 = turn off optimization. s = optimize for size. > # (Note: 3 is not always the best optimization level. See avr-libc FAQ.) > OPT = s > #OPT = 0 > > If I take a look in the main.lss file for both builds there's indeed a > difference in target main code maybe OPT=S is not recommended for this > target I don't know. nevertheless this cannot be the case, when changing > compiler options resulting in other execution behaviour. > > Any one encountered the same kind of strange things ? > Solution ? Have you try O1 or O2 ? Jonathan
Jonathan Dumaresq wrote:
> Have you try O1 or O2 ?
Yes I tried 1 and 2 same as s. Only the 0 is working correctly. I red
about some problems in thumb-interwork mode (bug in gcc) but this is not
used here.
My version used:
arm-elf-gcc (GCC) 4.1.1 (WinARM)
# MCU name and submodel
MCU = arm7tdmi
SUBMDL = AT91SAM7S256
#THUMB = -mthumb
#THUMB_IW = -mthumb-interwork
The only files in the project are build in arm-mode:
# List C source files here which must be compiled in ARM-Mode.
# use file-extension c for "c-only"-files
SRCARM = $(TARGET).c Cstartup_SAM7.c
# List Assembler source files here which must be assembled in ARM-Mode..
ASRCARM = Cstartup.S
I downloaded the lastest version from the site of Martin Thomas of the gamma project for AT91. And guess what no poblem at all with that for the OPT possibilties ! I saw there were lots of improvements done in the linker .LD files. And also some changes in the makefile (more targets)In this project there are some files compiled even in Thumb mode. Still don't know exactly what the error for the OPT=s was.But maybe Martin can tel me a bit more what he has done to update the gamma project. Because his latest version isn't included in the winarm distribution. Thnks&Cheers Guy
You did not mention which tool-chain versions you were using or exactly which example code. It is entirely possible that the change in behaviour is due to the code. For example if hardware registers and data shared between an interrupt context and the non-interrupt context, or between threads are not declared volatile, then compiler my optimise out a read or write that must be done for the code to work correctly. Without the volatile keyword, the compiler may determine that it already has a value from the last time it was read, or that it has not changed a value since it was last written, and may then optimise out the access. This is just one reason why apparently working code can fail after optimization. Generally it will be dure to unsafe code rather than optimizer bugs. That said, if the compiler has bugs, the optimizer is also the most likely place. Clifford
Clifford Slocombe wrote: > You did not mention which tool-chain versions you were using or exactly > which example code. the toolchain is winarm package prebuild for windows and the project was the AT91SAM7S64 example from atmel (blinking leds for EK) > Without the volatile keyword, the compiler may determine that it already > has a value from the last time it was read, or that it has not changed a > value since it was last written, and may then optimise out the access. Yes agree this can be the case. I changed two loop variables qualified with volatile keyword. global var: volatile unsigned int LedSpeed = SPEED *50 ; static void wait ( void ) {//* Begin volatile unsigned int waiting_time ; change_speed () ; for(waiting_time = 0; waiting_time < LedSpeed; waiting_time++) ; }//* End And now its running perfectly even with the OPT=s I must study the differences in the lss file to see how the code is put into assembler with or without the volatile keyword. But it's more or like sure that the compiler optimized the looper vars so they did not work properly. Also the question when to use the volatile to avoid these kind of optimizations. Are there some rules in the world of GCC to follow when you have a candidate variable to put into volatile? Thanks Guy
The "volatile" is an message to the compiler, to act literally as you wrote it down, not as it thinks you might have intended. Todays compilers are judged by speed/size of code produced. Without "volatile", the loop is equivalent to the statement "waiting_time = LedSpeed;" and since this variable is not used anywhere, even this statement can be dropped. Adding "volatile" to the loop variable tells the compiler that the variable has invisible side effects (like an I/O register) and all intermediate results produced by the written code are actually required.
As it is in this code example for the purpose to test and view something quickly on the board code like the wait loops which in fact do only an INC(or DEC) and values of the loop are not used in the following statements the compiler decide to optimize them out. This seems to me very logic moreover since we have the volatile to tell the compiler that they serve really in the program flow. Indeed there is link when referring to I/O registers which are not directly needed in the main program flow but must be proper set. So loops which act as a delay or wait intention are perfect candidates to use the volatile for the variables used. Most of the time in real programs delays often will be made by a timer interrupt routine and the caution for using "volatile" will be less present there. And yes the compilers these days are really pushed to the limit to produce smaller (smalest) code in a world of small amount flash MCUs. Guy
Guy Vo wrote: > the toolchain is winarm package prebuild for windows and the project was > the AT91SAM7S64 example from atmel (blinking leds for EK) > WinARM has incorporated many versions of GCC over time. Run: gcc --version to determine the version. What I was interested in rather was the difference in gcc version between the built that failed and the one that apparently worked. This information may not help in tracking down the problem, but it is useful information to the community should similar problems occur. With respect to the example, post a link so we can be sure we are looking at exactly the same thing. If it is an example supplied with WinARM, then I obviously have a different distribution to you, which is why the release version is important. WinARM distributions are identified by the date embedded into the distribution zip-file's file name. > > And now its running perfectly even with the OPT=s > I am surprised, from the fragment you posted I am not sure how it would make a difference. > Also the question when to use the volatile to avoid these > kind of optimizations. Are there some rules in the world of GCC to > follow when you have a candidate variable to put into volatile? As I said, normally, when data is shared between thread or interrupt contexts, or when it is a hardware register that may be modified by the hardware itself, or which does not have symmetrical read-write properties (such as reset-on-read behaviour typical of interrupt status registers for example). In practice, it is simpler to declare all memory mapped hardware registers and I/O as volatile, and you will see this in the platform header files. Clifford
Clifford Slocombe wrote: > gcc --version arm-elf-gcc (GCC) 4.1.1 (WinARM) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. > > With respect to the example, post a link so we can be sure we are > looking at exactly the same thing. If it is an example supplied with > WinARM, then I obviously have a different distribution to you, which is > why the release version is important. WinARM distributions are > identified by the date embedded into the distribution zip-file's file > name. > winarm20060606.zip I think it's the latest. The example is in the directory: c:\winarm\examples\at91sam7s64_Atmel_example >> >> And now its running perfectly even with the OPT=s >> > I am surprised, from the fragment you posted I am not sure how it would > make a difference. I removed the volatile once more to be sure and the leds just go on but not blinking. Just for your information the same project is included with the KEIL package and it seems to work without modifing any of these variables. > As I said, normally, when data is shared between thread or interrupt > contexts, or when it is a hardware register that may be modified by the > hardware itself, or which does not have symmetrical read-write > properties (such as reset-on-read behaviour typical of interrupt status > registers for example). In practice, it is simpler to declare all memory > mapped hardware registers and I/O as volatile, and you will see this in > the platform header files. In this program there are no such things as thread or IR switches. It just a simple toggling of leds. I just adapted the code a bit because my board has no PLL and only two leds and two switches like /*-----------------*/ /* Leds Definition */ /*-----------------*/ #define LED1 (1<<0) #define LED2 (1<<1) #define NB_LEB 2 #define LED_MASK (LED1|LED2) /*-------------------------*/ /* Push Buttons Definition */ /*-------------------------*/ #define SW1_MASK (1<<15) #define SW2_MASK (1<<16) #define SW_MASK (SW1_MASK|SW2_MASK) And I modified Cstartup.s for the Main clock to take no PLL: LDR R0, =PMC_BASE MOV R1, #1 STR R1, [R0, #PMC_MOR] STR R1, [R0, #PMC_MCKR] //*--------------------------------------------------------------------- ------- //* ATMEL Microcontroller Software Support - ROUSSET - //*--------------------------------------------------------------------- ------- //* The software is delivered "AS IS" without warranty or condition of any //* kind, either express, implied or statutory. This includes without //* limitation any warranty or condition with respect to merchantability or //* fitness for any particular purpose, or against the infringements of //* intellectual property rights of others. //*--------------------------------------------------------------------- ------- //* File Name : main.c //* Object : main application written in C //* Creation : JPP 16/Jun/2004 //*--------------------------------------------------------------------- ------- // Include Standard files #include "AT91SAM7S64.h" /* AT91SAMT7S64 definitions */ #include "lib_AT91SAM7S64.h" #include "Board.h" /* Global variables */ #define SPEED (MCKKHz/10) unsigned int LedSpeed = SPEED *50 ; //volatile unsigned int LedSpeed = SPEED *50 ; const int led_mask[]= {LED1, LED2}; //*--------------------------------------------------------------------- ----------------- //* Function Name : change_speed //* Object : Adjust "LedSpeed" value depending on SW1 and SW2 are pressed or not //* Input Parameters : none //* Output Parameters : Update of LedSpeed value. //*--------------------------------------------------------------------- ----------------- static void change_speed ( void ) {//* Begin if ( (AT91F_PIO_GetInput(AT91C_BASE_PIOA) & SW1_MASK) == 0 ) { if ( LedSpeed > SPEED ) LedSpeed -=SPEED ; } if ( (AT91F_PIO_GetInput(AT91C_BASE_PIOA) & SW2_MASK) == 0 ) { if ( LedSpeed < MCK ) LedSpeed +=SPEED ; } }//* End //*--------------------------------------------------------------------- ----------------- //* Function Name : wait //* Object : Software waiting loop //* Input Parameters : none. Waiting time is defined by the global variable LedSpeed. //* Output Parameters : none //*--------------------------------------------------------------------- ----------------- static void wait ( void ) {//* Begin unsigned int waiting_time ; //volatile unsigned int waiting_time ; change_speed () ; for(waiting_time = 0; waiting_time < LedSpeed; waiting_time++) ; }//* End //*--------------------------------------------------------------------- ----------------- //* Function Name : Main //* Object : Software entry point //* Input Parameters : none. //* Output Parameters : none. //*--------------------------------------------------------------------- ----------------- int main(void) {//* Begin // First, enable the clock of the PIO AT91F_PMC_EnablePeriphClock ( AT91C_BASE_PMC, 1 << AT91C_ID_PIOA ) ; // then, we configure the PIO Lines corresponding to LED1 to LED4 // to be outputs. No need to set these pins to be driven by the PIO because it is GPIO pins only. AT91F_PIO_CfgOutput( AT91C_BASE_PIOA, LED_MASK ) ; // Clear the LED's. On the EB55 we must apply a "1" to turn off LEDs AT91F_PIO_SetOutput( AT91C_BASE_PIOA, LED_MASK ) ; // Loop forever for (;;) { AT91F_PIO_ClearOutput( AT91C_BASE_PIOA, LED_MASK) ; AT91F_PIO_ClearOutput( AT91C_BASE_PIOA, LED_MASK) ; wait(); AT91F_PIO_SetOutput( AT91C_BASE_PIOA, LED_MASK ) ; AT91F_PIO_SetOutput( AT91C_BASE_PIOA, LED_MASK ) ; wait(); }// End for }//* End
The odd thing is that when a variable is not decclared volatile when it should be, the worst that should happen is that its value never changes. So this is very odd behaviour, and I still cannot see why it needs to be volatile. However just from a 'good-design' point of view, I would make LedSpeed a local static of change_speed(), and have change speed return the value. static int change_speed ( void ) { static unsigned int LedSpeed = SPEED *50 ; if ( (AT91F_PIO_GetInput(AT91C_BASE_PIOA) & SW1_MASK) == 0 ) { if ( LedSpeed > SPEED ) LedSpeed -=SPEED ; } if ( (AT91F_PIO_GetInput(AT91C_BASE_PIOA) & SW2_MASK) == 0 ) { if ( LedSpeed < MCK ) LedSpeed +=SPEED ; } return LedSpeed ; } static void wait ( void ) { unsigned int waiting_time ; int led_speed = change_speed() ; for(waiting_time = 0; waiting_time < led_speed; waiting_time++) { /* do nothing */ } }
void wait ( void ) { unsigned int waiting_time ; change_speed () ; for(waiting_time = 0; waiting_time < 1000; waiting_time++) ; } => compiled with -Os wait: b change_speed So there is nothing left of the delayloop and the variable, because as seen by the compiler it does not have any effect and is not seen by anyone else outside of the compiler's scope, thus can be dopped completely unless the variable is volatile. Volatile tells the compiler that there is someone outside of the compiler's scope which tracks the value of the variable. This is the expected behaviour for a compiler not specifically targetting microcontrollers (those which do usually know that such as loop does make sense, but this is knowledge beyond any C rules) and a common surprise for programmers, as such behavior may change between compiler versions.
> static void wait ( void ) > { > unsigned int waiting_time ; > int led_speed = change_speed() ; > for(waiting_time = 0; waiting_time < led_speed; waiting_time++) > { > /* do nothing */ > } > } When you ask a machine, you usually get the same result for the same answer. It doesn't matter where LedSpeed comes from, because the compiler knows that there is absolutely no visible effect beyond the call of change_speed(). Keeping a constant runtime is not a compiler's job. And once you find a way to fool the compiler by using code too complicated to be optimized away, just wait for some future update.
A.K. wrote: >> static void wait ( void ) >> { >> unsigned int waiting_time ; >> int led_speed = change_speed() ; >> for(waiting_time = 0; waiting_time < led_speed; waiting_time++) >> { >> /* do nothing */ >> } >> } I changed my code for change_speed() and wait() into your code snippets. still doesn't work. I must make the waiting_time volatile ! Here are some parts out of the lss files: ______________________________________________________________________ ____ Without making the waiting_time volatile this is the generated asm: bl 100f24 <wait> 00100f24 <wait>: 100f24: e3e03c0b mvn r3, #2816 ; 0xb00 100f28: e51330c3 ldr r3, [r3, #-195] static unsigned int LedSpeed = SPEED *50 ; ===> this is the only thing generated for wait ! if ( (AT91F_PIO_GetInput(AT91C_BASE_PIOA) & SW1_MASK) == 0 ) { if ( LedSpeed > SPEED ) LedSpeed -=SPEED ; 100f2c: e59f0048 ldr r0, [pc, #72] ; 100f7c <.text+0xf7c> 100f30: e3130902 tst r3, #32768 ; 0x8000 } ______________________________________________________________________ ____ With making the waiting_time volatile this is the generated asm: volatile unsigned int waiting_time ; int led_speed = change_speed() ; for(waiting_time = 0; waiting_time < led_speed; waiting_time++) 100f84: e3a03000 mov r3, #0 ; 0x0 100f88: ea000001 b 100f94 <wait+0x70> 100f8c: e59d3000 ldr r3, [sp] 100f90: e2833001 add r3, r3, #1 ; 0x1 100f94: e58d3000 str r3, [sp] 100f98: e59d3000 ldr r3, [sp] 100f9c: e1530002 cmp r3, r2 100fa0: 3afffff9 bcc 100f8c <wait+0x68> This sounds like loop code to me don't we think ? ;-) R3 = init zero of loop R2 = ledspeed ______________________________________________________________________ ____ So my conclusion is indeed that empty loops with no further use in other parts of the program will be optimized out in arm-elf-gcc. I agree that in normal cases you don't write code like that but it's given as an example in the winarm distribution. I'd worked with other compilers (intel,paradigm,VC,borland,watcom) and they did not such a thing as agressivly as that. But I think the guys who adapt gcc to work for the embedded world must make tradeoffs in their algorithms to optimize the most out of it when you put the OPT=s. In my opinion this can be better not optimized why ? Wel, I count 4 instructions for the loop more on the arm7(32b mode) target which means only 16 bytes ! And as i said before these are code snippets you write quick and dirty to test something out and are not supposed to be stay in in the final release. At last if you intend to write a line of code it must be useful after all and you can always chose OPT=0 in other cases ;-) Guy
> I'd worked with other compilers (intel,paradigm,VC,borland,watcom) > and they did not such a thing as agressivly as that. I don't know Paradigm, but both the Borland and the Watcom compilers are no longer up current code generation and optimization standards since at least a decade. And for Intel's compiler - well, IIRC Intel was caught cheating because their compiler detected SPECmark hotspots and replaced them by hand crafted code or simply by the well known result. Which either resulted in wrong results, when it did not recognize a minor change in source code and still used the know wrong replacement. Or resulted in a large performance drop when it did recognize it and had to do it the intended way. Well, that's what I'd call agressive optimization... (I'd guess they are not alone in this game though, GCC however is open source, you'd see it). > But I think the guys who adapt gcc to work for the > embedded world must make tradeoffs It is the same compiler for all. This kind of optimization is part of the common GCC code and unaffected by the target specification. So those who maintain the ARM code generator have no control over such optimizations. Sorry, but do not expect other people to do your job, especially when they are not paid for it. An empty loop without any side effects is just that - void. It is your job to tell the compiler that it is not. If you need a compiler specifically designed for the embedded world, then buy one. GCC is not.
And yet another thing I encounterd for the same project: In board.h there are following includes #ifndef Board_h #define Board_h #include "AT91SAM7S64.h" #define __inline extern inline #include "lib_AT91SAM7S64.h" In main.c there is #include "Board.h" But also in the startup file Cstartup_SAM7.c there #include "Board.h" If this file comes first in the compile pass which is the case board.h will no more included in the main.c The results I saw are different on the OPT parameter. ______________________________________________________________________ ____ If I take a OPT=0 than the linker gives me: linking: main.elf arm-elf-gcc -mcpu=arm7tdmi -I. -gstabs -DROM_RUN -O0 -Wall -Wcast-align -Wimplicit -Wpointer-arith -Wswitch -Wredundant-decls -Wreturn-type -Wshadow -Wunused -Wa,-adhlns=Cstartup.lst -MD -MP -MF .dep/main.elf.d Cstartup.o main.o Cstartup_SAM7.o --output main.elf -nostartfiles -Wl,-Map=main.map,--cref -lc -lm -lc -lgcc -TAT91SAM7S256-ROM.ld main.o: In function `change_speed': main.c:30: undefined reference to `AT91F_PIO_GetInput' main.c:34: undefined reference to `AT91F_PIO_GetInput' main.o: In function `main': main.c:63: undefined reference to `AT91F_PMC_EnablePeriphClock' main.c:67: undefined reference to `AT91F_PIO_CfgOutput' main.c:70: undefined reference to `AT91F_PIO_SetOutput' main.c:75: undefined reference to `AT91F_PIO_ClearOutput' main.c:76: undefined reference to `AT91F_PIO_ClearOutput' main.c:78: undefined reference to `AT91F_PIO_SetOutput' main.c:79: undefined reference to `AT91F_PIO_SetOutput' ______________________________________________________________________ ____ If I take a OPT=s: Size after: text data bss dec hex filename 0 628 0 628 274 main.hex Errors: none ______________________________________________________________________ ____ But in the case of OPT=s there is no code compiled for the above undefined function which are as extern inlines as the result the program isn't working regardless of volatile here. This is a seriuos malfunction of the linker/locater because I get no undefined warnings at all and moreover the functions are not present in the final bin file ! If I include these files explicitly like #include "AT91SAM7S64.h" /* AT91SAMT7S64 definitions */ #include "lib_AT91SAM7S64.h" in the main.c the output of the build is like: Size after: text data bss dec hex filename 0 6488 0 6488 1958 main.hex Errors: none The function are now in the hex file as you can see the size is 10 times more. ______________________________________________________________________ ____ My general conclusion on this example is that it is not working as it should be in using arm-elf- toolchain. I tried this in the KEIL environment and there wer no problems at all. If you start working and learning this toolchain it's better to take the gamma project from Martin Thomas on follwing link: http://gandalf.arubi.uni-kl.de/avr_projects/arm_projects/index_at91.html Take the gamma example you get pretty much on testing even a small monitor which works great on DBGU port. So for the guys who make the winarm distributions please take a look at your basic examples for AT91. Cheers Guy
It's just a wild guess because I have never used the SAM7 lib, but IMHO you currently are the process detected one optimization pitfall after the other, this time regarding an optimization strategy called "inlining". The keywork "volatile" won't help you out this time.
small fix: ... but IMHO you currently are the in process detecting ...
Yes by taking that one particular example I became the optimization pittfal guy ;-) But I 'm learning from it that's the good thing. I'm also rather new in this ARM buisness. I'm just looking out and try some IDEs to get me of programming for it. I checked both IAR and KEIL and now GNU. A friend of mine is using the codevision IDE which is cheaper than KEIL or IAR. But I red some benchmarking reports that GNU is one of the best in size/speed so I take a look at now ;-)
A.K. wrote: > void wait ( void ) > { > unsigned int waiting_time ; > change_speed () ; > for(waiting_time = 0; waiting_time < 1000; waiting_time++) ; > } > > => compiled with -Os > > wait: > b change_speed > > So there is nothing left of the delayloop and the variable, ... Well spotted A.K. I fell so foolish since I have seen this before (while benchmarking compilers!). I was confused by Guy's setting LedSpeed volatile, which I am presuming makes no difference. Because waiting time is not used outside the loop, it can be optimised out, sinve teh aim of the optimiser is to produce functionally equivalent code, but it is not aware of 'real-time' constraints of the code. Declaring volatile forces all values to be calculated. So to guy, that is one other 'rule' about volatime I omitted. You need it if the coded must be forced to generate all intermediate 'states' even if they are apparently 'ignored'. Timing loops and benchmarking being two cases in-point. There is a more general point here - using 'delay loops' is a bad idea, all sorts of things could break this code - a faster processor, caching, optimisation. It is far better to configure a hardware timer and either poll its value, or use its interrupt to increment a system counter that is polled. Clifford
Please log in before posting. Registration is free and takes only a minute.
Existing account
Do you have a Google/GoogleMail account? No registration required!
Log in with Google account
Log in with Google account
No account? Register here.