Posted on:
This swi wrapper has been working well for months. It packages swi calls properly for heavy optimization. But now, today, out of around 250 Kbytes of compiled code overflowing with SWI calls that compile fine with -O2 optimization -- failure! Repeatable. Here's test code.
******** COMPILE BY:
arm-elf-gcc -O2 test.c
arm-elf-objdump -d --source a.out > test.dmp
******** SOURCE CODE:
static inline long swi (long arg1, long arg2)
{
  register long __res asm("r0");
  register void *__a asm("r0") = (void *)(long)arg1;
  register void *__b asm("r1") = (void *)(long)arg2;
  asm volatile ("svc 0x15 @ %0%1%2"
                : "=r" (__res)
                : "r" (__a), "r" (__b)
                : "memory", "r12", "r14", "cc");
  return(__res);
}

void main (int argc, char **argv)
{
  swi (10, 1000/argc);
  swi (10, 1000);
  swi (10, 1000*argc);
}

void _exit (int status)
{
  while (1);
}
******** DISASSEMBLY:
00008210 <main>:
8210: e92d4010 push {r4, lr}
8214: e1a01000 mov r1, r0
8218: e1a04000 mov r4, r0
821c: e3a00ffa mov r0, #1000 ; 0x3e8
8220: eb00000b bl 8254 <__aeabi_idiv>
8224: e1a01000 mov r1, r0
8228: ef000015 svc 0x00000015
822c: e3a0000a mov r0, #10
8230: e3a01ffa mov r1, #1000 ; 0x3e8
8234: ef000015 svc 0x00000015
8238: e0643284 rsb r3, r4, r4, lsl #5
823c: e0844103 add r4, r4, r3, lsl #2
8240: e3a0000a mov r0, #10
8244: e1a01184 lsl r1, r4, #3
8248: ef000015 svc 0x00000015
824c: e8bd8010 pop {r4, pc}
******** PROBLEM:
NOTE how the first svc call LOSES THE r0 ARGUMENT! The compiler is
religious about inserting it in the other two calls (the mov r0, #10).
But in the first call, r0 is returned by the divide call and shoved into
r1; but the r0 argument of #10 is NOT loaded!
AGAIN -- this svc wrapper is working for many thousands of lines of code
peppered with calls, all of them optimized with -O2; and it works fine
un-optimized as well. It is just this one configuration with the divide
call that doesn't work.
Should I report this? Or am I missing something?
Thanks,
--Bill
Posted on:
Everybody, I really think this is a bug. Here it is again, simplified. Compiled/dumped with:
arm-elf-gcc -O2 test.c
arm-elf-objdump -d a.out > test.dmp
Unless somebody can tell me this really is what is supposed to happen, I'll report this in bugzilla this week.
--Bill
void _exit (int status)
{
  while (1);
}

void main (int argc, char **argv)
{
  register int x asm ("r0") = 10;
  register int y asm ("r1") = 1000/argc;
  asm volatile ("and %0,%1,%2" : "=r" (x) : "r" (x), "r" (y));
  _exit (x);
}
Compiles to:
821c: e3a00ffa mov r0, #1000 ; 0x3e8
8220: eb000002 bl 8230 <__aeabi_idiv>
8224: e1a01000 mov r1, r0
8228: e0000001 and r0, r0, r1
Where is the mov r0, #10? Is this really what is supposed to happen? Edit: This is arm-elf-gcc 4.5.1 compiled and running under Darwin (OS X Snow Leopard 10.6.4).
Posted on:
Bill Burgess wrote:
> Where is the mov r0, #10? Is this really what is supposed to happen?
I also like the -O0 version (first two lines):
mov r0, #10
mov r0, #1000
ldr r1, [fp, #-8]
bl __aeabi_idiv
mov r3, r0
mov r1, r3
and r0,r0,r1
GCC: (Sourcery G++ Lite 2009q3-68) 4.4.1
-- Marcus
Posted on:
See: http://gcc.gnu.org/onlinedocs/gcc-4.4.4/gcc/Local-...
In particular: "Defining such a register variable does not reserve the register;" and "This option does not guarantee that GCC will generate code that has this variable in the register you specify at all times."
This pretty much explains what you see here. Those local register vars just give you a nice name for R0/R1 when passed to asm statements, but they do not prevent the compiler from making its usual use of the registers. It works as documented, so don't expect anything useful from a Bugzilla report.
So I don't think those R0/R1 names will work reliably across different optimization settings and GCC revisions when they are mixed with C code.
Posted on:
Very useful indeed A.K. thank you very much! Thanks also for generously omitting to mention that the documentation refers to this as "a common pitfall". And yet more thanks for saving me from the embarrassment of a bugzilla report for known behavior. Unfortunately, the solution the manual gives -- using a temporary variable for the result of the divide-by expression -- does not work under optimization. As you predicted. So back to the drawing board I guess.
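For readers following along, the temporary-variable pattern from the manual looks roughly like the sketch below: the expression is evaluated into an ordinary variable first, and the register variables are assigned only immediately before the asm. This is an illustration of the pattern, not a fix; as noted above, it did not cure the problem with this toolchain at -O2. The names swi_tmp and SWI_ON_TARGET are made up for this example, and the non-ARM branch is only a stand-in so the code compiles on a host.

```c
/* Sketch of the "temporary variable" pattern from the GCC manual.
 * SWI_ON_TARGET is a hypothetical macro: define it only when building
 * for the bare-metal ARM target that implements SVC 0x15. */
#if defined(__arm__) && defined(SWI_ON_TARGET)
static inline long swi_tmp (long code, long divisor)
{
  /* Evaluate the expression (and its __aeabi_idiv call) first. */
  long t = 1000 / divisor;
  /* Only now bind the values to r0 and r1. */
  register long a asm("r0") = code;
  register long b asm("r1") = t;
  register long res asm("r0");
  asm volatile ("svc 0x15"
                : "=r" (res)
                : "0" (a), "r" (b)
                : "memory", "r12", "r14", "cc");
  return res;
}
#else
/* Host stand-in: no SVC here, the result is an arbitrary placeholder. */
static inline long swi_tmp (long code, long divisor)
{
  return code + 1000 / divisor;
}
#endif
```

Whether the optimizer keeps the division ahead of the register assignments is exactly the part that is not guaranteed, so the generated code still needs to be checked in the disassembly.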
Posted on:
It should be a bit more reliable (and simpler) to use the parameter passing conventions to assign R0/R1, such as
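long swi(long r0, long r1) __attribute__((noinline));

long swi(long r0, long r1)
{
  /* Arguments arrive in r0/r1 per the calling convention.
     There is no return statement: this relies on the SVC handler
     leaving its result in r0, which the compiler is not told about. */
  asm volatile ("svc 0x15"
                :
                : "r" (r0), "r" (r1)
                : "memory", "r12", "r14", "cc");
}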
Note that you should not add any C operation to this function. And do not inline the wrapper. With those restrictions applied, your initial code might also work. Your initial code likely broke due to the inlining of the wrapper function, which in this case separated the register assignments from the SWI code and placed a runtime function in between, which has its own assumptions on use of R0.
Posted on:
And when all else fails, you can always revert to "naked" functions and add your own prolog/epilog, something like (untested)
long swi(long r0, long r1) __attribute__((noinline,naked));

long swi(long r0, long r1)
{
  asm volatile ("push {lr}\n\tswi 0x15\n\tpop {pc}");
}
Posted on:
Thanks AK! Another approach is to add mov instructions before the swi, as part of the same asm(), that explicitly load the arguments into r0 and r1. Smaller memory footprint than adding a function-call wrapper.
The reason for my inline approach was to allow the optimizer to reduce code size as much as possible. And, in fact, it works splendidly. The resulting assembly is extremely tight and I've been very happy until now. I just wasn't aware that gcc made no effort to avoid clobbering register variables. That in fact a register variable isn't really a variable at all, it's an alias.
In my architecture, the argument in r0 is always an immediate constant. So, by reversing the order in which the register aliases are assigned the problem goes away:
void main (int argc, char **argv)
{
  register int y asm ("r1") = 1000/argc;
  register int x asm ("r0") = 10;
  asm volatile ("and %0,%1,%2" : "=r" (x) : "r" (x), "r" (y));
  _exit (x);
}
...produces...
mov r0, #1000 ; 0x3e8
bl 8234 <__aeabi_idiv>
mov r1, r0
mov r0, #10
and r0, r0, r1
This is not suitable for general-purpose use, and who knows if a compiler change may break it some day, but for my application right now it should be OK.
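The explicit-mov approach mentioned at the top of this post might look something like the sketch below. The operands are passed in whatever registers the compiler picks, and the asm itself moves them into r0 and r1, which are listed as clobbers so the compiler cannot place an operand there. The names swi_explicit and SWI_ON_TARGET are made up for this example, and the non-ARM branch is only a stand-in so the code compiles on a host.

```c
/* Sketch: SVC wrapper that loads r0/r1 inside the asm itself.
 * SWI_ON_TARGET is a hypothetical macro: define it only when building
 * for the bare-metal ARM target that implements SVC 0x15. */
#if defined(__arm__) && defined(SWI_ON_TARGET)
static inline long swi_explicit (long arg1, long arg2)
{
  long res;
  asm volatile ("mov r0, %1\n\t"   /* first argument into r0  */
                "mov r1, %2\n\t"   /* second argument into r1 */
                "svc 0x15\n\t"
                "mov %0, r0"       /* copy the result out of r0 */
                : "=r" (res)
                : "r" (arg1), "r" (arg2)
                : "r0", "r1", "r12", "r14", "memory", "cc");
  return res;
}
#else
/* Host stand-in: no SVC here, the result is an arbitrary placeholder. */
static inline long swi_explicit (long arg1, long arg2)
{
  return arg1 + arg2;
}
#endif
```

A call such as swi_explicit (10, 1000/argc) then no longer depends on what the compiler keeps in r0 across the divide, at the cost of up to three extra mov instructions per call site.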