This SWI wrapper has been working well for months. It packages SWI
calls in a form that holds up well under heavy optimization. But now,
today, out of around 250 KB of compiled code full of SWI calls that
compile fine with -O2 optimization: failure!
Repeatable. Here's test code.
******** COMPILE BY:
arm-elf-gcc -O2 test.c
arm-elf-objdump -d --source a.out > test.dmp
******** SOURCE CODE:
static inline long swi (long arg1, long arg2)
{
    register long __res asm("r0");
    register void *__a asm("r0") = (void *)(long)arg1;
    register void *__b asm("r1") = (void *)(long)arg2;

    asm volatile ("svc 0x15 @ %0%1%2" : "=r" (__res) :
                  "r" (__a), "r" (__b) :
                  "memory", "r12", "r14", "cc");

    return(__res);
}

void main (int argc, char **argv)
{
    swi (10, 1000/argc);
    swi (10, 1000);
    swi (10, 1000*argc);
}

void _exit (int status)
{
    while (1);
}
******** DISASSEMBLY:
00008210 <main>:
8210: e92d4010 push {r4, lr}
8214: e1a01000 mov r1, r0
8218: e1a04000 mov r4, r0
821c: e3a00ffa mov r0, #1000 ; 0x3e8
8220: eb00000b bl 8254 <__aeabi_idiv>
8224: e1a01000 mov r1, r0
8228: ef000015 svc 0x00000015
822c: e3a0000a mov r0, #10
8230: e3a01ffa mov r1, #1000 ; 0x3e8
8234: ef000015 svc 0x00000015
8238: e0643284 rsb r3, r4, r4, lsl #5
823c: e0844103 add r4, r4, r3, lsl #2
8240: e3a0000a mov r0, #10
8244: e1a01184 lsl r1, r4, #3
8248: ef000015 svc 0x00000015
824c: e8bd8010 pop {r4, pc}
******** PROBLEM:
NOTE how the first svc call LOSES THE r0 ARGUMENT! The compiler is
religious about inserting it in the other two calls (the mov r0, #10).
But in the first call, the quotient comes back from the divide call in r0
and is moved into r1, and the r0 argument of #10 is NEVER loaded!
AGAIN -- this svc wrapper works in many thousands of lines of code
peppered with calls, all of them optimized with -O2, and it works fine
unoptimized as well. It is only this one configuration, with the divide
call, that fails.
Should I report this? Or am I missing something?
Thanks,
--Bill
Everybody,
I really think this is a bug. Here it is again, simplified.
Compiled/dumped with:
arm-elf-gcc -O2 test.c
arm-elf-objdump -d a.out > test.dmp
Unless somebody can tell me this really is what is supposed to happen,
I'll report this in bugzilla this week.
--Bill
void _exit (int status)
{
    while (1);
}

void main (int argc, char **argv)
{
    register int x asm ("r0") = 10;
    register int y asm ("r1") = 1000/argc;

    asm volatile ("and %0,%1,%2" : "=r" (x) : "r" (x), "r" (y));

    _exit (x);
}

Compiles to:
821c: e3a00ffa mov r0, #1000 ; 0x3e8
8220: eb000002 bl 8230 <__aeabi_idiv>
8224: e1a01000 mov r1, r0
8228: e0000001 and r0, r0, r1
Where is the mov r0, #10? Is this really what is supposed to happen?
Edit: This is arm-elf-gcc 4.5.1 compiled and running under Darwin (OS X
Snow Leopard 10.6.4).
Bill Burgess wrote:
> Where is the mov r0, #10? Is this really what is supposed to happen?
I also like the -O0 version (first two lines):
mov r0, #10
mov r0, #1000
ldr r1, [fp, #-8]
bl __aeabi_idiv
mov r3, r0
mov r1, r3
and r0,r0,r1
GCC: (Sourcery G++ Lite 2009q3-68) 4.4.1
--
Marcus
See:
http://gcc.gnu.org/onlinedocs/gcc-4.4.4/gcc/Local-Reg-Vars.html#Local-Reg-Vars
Such as: "Defining such a register variable does not reserve the
register;" and "This option does not guarantee that GCC will generate
code that has this variable in the register you specify at all times."
This pretty much explains what you see here. Those local register vars
just give you a nice name for R0/R1 when passed to asm statements, but they
do not prevent the compiler from its usual use of those registers. Works as
documented, so don't expect anything useful from a Bugzilla report.
So I don't think those R0/R1 names will work reliably across different
optimization settings and GCC revisions when mixed with C code.
Very useful indeed, A.K., thank you very much! Thanks also for generously
omitting to mention that the documentation refers to this as "a common
pitfall". And yet more thanks for saving me from the embarrassment of a
Bugzilla report for known behavior.
Unfortunately, the solution the manual gives -- using a temporary
variable for the result of the divide-by expression -- does not work
under optimization. As you predicted. So back to the drawing board I
guess.
It should be a bit more reliable (and simpler) to use the parameter
passing conventions to assign R0/R1, such as:
long swi(long r0, long r1) __attribute__((noinline));
long swi(long r0, long r1)
{
    asm volatile ("svc 0x15"
                  :
                  : "r" (r0), "r" (r1)
                  : "memory", "r12", "r14", "cc");
}
Note that you should not add any C operation to this function, and do
not inline the wrapper. With those restrictions applied, your initial
code might also work.
Your initial code likely broke due to the inlining of the wrapper
function, which in this case separated the register assignments from the
SWI code and placed a runtime function call in between, and that runtime
function has its own assumptions about the use of R0.
And when all else fails, you can always revert to "naked" functions and
add your own prolog/epilog, something like (untested):
long swi(long r0, long r1) __attribute__((noinline,naked));
long swi(long r0, long r1)
{
    asm volatile ("push {lr}\n\tswi 0x15\n\tpop {pc}");
}
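A.K.'s noinline wrapper is declared to return `long` but has no return statement, so the caller only sees the SVC result by relying on R0 being left alone. A minimal sketch that hands the result back explicitly, assuming (as Bill's original wrapper does) that service 0x15 leaves its result in R0: the register variables are used only as asm operands, with nothing between their initialisation and the asm statement, which is the usage the GCC manual describes for local register variables. The `#else` branch is a hypothetical host stand-in added only so the file builds off-target; it does not emulate any real SWI.

```c
#include <assert.h>

#if defined(__arm__)
/* Out-of-line wrapper: the calling convention already delivers arg0 in R0
   and arg1 in R1. The register variables only pin the asm operands; no C
   operation sits between their initialisation and the svc. */
__attribute__((noinline)) long swi(long arg0, long arg1)
{
    register long a0 asm("r0") = arg0;
    register long a1 asm("r1") = arg1;

    asm volatile ("svc 0x15"
                  : "+r" (a0)          /* assumed: result comes back in R0 */
                  : "r" (a1)
                  : "memory", "r12", "r14", "cc");
    return a0;
}
#else
/* Hypothetical host stand-in (testing only): pretends that service 0x15
   returns arg0 + arg1. */
long swi(long arg0, long arg1)
{
    return arg0 + arg1;
}
#endif
```

Because the wrapper stays out of line, each call costs a `bl` and a return; that is the price for keeping helper calls such as `__aeabi_idiv` from landing between the register setup and the `svc`.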
Thanks AK!
Another approach is to add mov instructions before the swi, as part of
the same asm(), that explicitly load the arguments into r0 and r1.
Smaller memory footprint than adding a function-call wrapper.
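Doing the moves inside the same asm() could look like the sketch below; it is an assumption, not code from the thread, and `swi_mov` is a hypothetical name. The compiler may keep the arguments in any registers it likes; the template copies them into R0/R1 itself, and listing r0/r1 as clobbers stops the compiler from allocating them to any operand, so a helper call before the asm can no longer leave a stale value in R0. The `#else` branch is again only a hypothetical host stand-in.

```c
#include <assert.h>

#if defined(__arm__)
/* Everything happens in one asm statement: inputs arrive in compiler-chosen
   registers (never r0/r1, since those are clobbered), the template loads
   R0/R1, issues the SVC, then copies the R0 result into the output. */
static inline long swi_mov(long arg1, long arg2)
{
    long res;

    asm volatile ("mov r0, %1\n\t"
                  "mov r1, %2\n\t"
                  "svc 0x15\n\t"
                  "mov %0, r0"
                  : "=r" (res)
                  : "r" (arg1), "r" (arg2)
                  : "r0", "r1", "r12", "r14", "memory", "cc");
    return res;
}
#else
/* Hypothetical host stand-in (testing only): pretends that service 0x15
   returns arg1 + arg2. */
static inline long swi_mov(long arg1, long arg2)
{
    return arg1 + arg2;
}
#endif
```

The output needs no early-clobber marker because it is written by the last instruction, after both inputs have been read. The trade-off is that the three `mov`s in the template are always executed, even when a value already happens to be in the right register.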
The reason for my inline approach was to allow the optimizer to reduce
code size as much as possible. And, in fact, it works splendidly. The
resulting assembly is extremely tight and I've been very happy until
now.
I just wasn't aware that gcc makes no effort to avoid clobbering register
variables, or that a register variable isn't really a variable at all;
it's an alias.
In my architecture, the argument in r0 is always an immediate constant.
So, by reversing the order in which the register aliases are assigned
the problem goes away:
void main (int argc, char **argv)
{
    register int y asm ("r1") = 1000/argc;
    register int x asm ("r0") = 10;

    asm volatile ("and %0,%1,%2" : "=r" (x) : "r" (x), "r" (y));

    _exit (x);
}

...produces...
mov r0, #1000 ; 0x3e8
bl 8234 <__aeabi_idiv>
mov r1, r0
mov r0, #10
and r0, r0, r1
This is not suitable for general-purpose use, and who knows if a
compiler change may break it some day, but for my application right now
it should be OK.