EmbDev.net

Forum: ARM programming with GCC/GNU tools
arm-elf-gcc bug? optimizer+asm+divide


by Bill B. (auldreekie)


This swi wrapper has been working well for months.  It packages swi 
calls properly for heavy optimization.  But now, today, out of around 
250 Kbytes of compiled code overflowing with SWI calls that compile fine 
with -O2 optimization -- failure!

Repeatable.  Here's test code.

******** COMPILE BY:

arm-elf-gcc -O2 test.c
arm-elf-objdump -d --source a.out > test.dmp

******** SOURCE CODE:
static inline long swi (long arg1, long arg2)
{
  register long __res asm("r0");
  register void *__a  asm("r0") = (void *)(long)arg1;
  register void *__b  asm("r1") = (void *)(long)arg2;

  asm volatile ("svc 0x15 @ %0%1%2" : "=r" (__res) :
      "r" (__a), "r" (__b) :
      "memory", "r12", "r14", "cc");

  return(__res);
}

void main (int argc, char **argv)
{
  swi (10, 1000/argc);
  swi (10, 1000);
  swi (10, 1000*argc);
}

void _exit (int status)
{
  while (1);
}

******** DISASSEMBLY:

00008210 <main>:
    8210:       e92d4010        push    {r4, lr}
    8214:       e1a01000        mov     r1, r0
    8218:       e1a04000        mov     r4, r0
    821c:       e3a00ffa        mov     r0, #1000       ; 0x3e8
    8220:       eb00000b        bl      8254 <__aeabi_idiv>
    8224:       e1a01000        mov     r1, r0
    8228:       ef000015        svc     0x00000015
    822c:       e3a0000a        mov     r0, #10
    8230:       e3a01ffa        mov     r1, #1000       ; 0x3e8
    8234:       ef000015        svc     0x00000015
    8238:       e0643284        rsb     r3, r4, r4, lsl #5
    823c:       e0844103        add     r4, r4, r3, lsl #2
    8240:       e3a0000a        mov     r0, #10
    8244:       e1a01184        lsl     r1, r4, #3
    8248:       ef000015        svc     0x00000015
    824c:       e8bd8010        pop     {r4, pc}

******** PROBLEM:

NOTE how the first svc call LOSES THE r0 ARGUMENT!  The compiler is 
religious about inserting it in the other two calls (the mov r0, #10). 
But in the first call, r0 is returned by the divide call and shoved into 
r1; but the r0 argument of #10 is NOT loaded!

AGAIN -- this svc wrapper is working for many thousands of lines of code 
peppered with calls, all of them optimized with -O2; and it works fine 
un-optimized as well.  It is just this one configuration with the divide 
call that doesn't work.

Should I report this?  Or am I missing something?

Thanks,
--Bill

by Bill B. (auldreekie)


Everybody,

I really think this is a bug.  Here it is again, simplified. 
Compiled/dumped with:

arm-elf-gcc -O2 test.c
arm-elf-objdump -d a.out > test.dmp

Unless somebody can tell me this really is what is supposed to happen, 
I'll report this in bugzilla this week.

--Bill

void _exit (int status)
{
  while (1);
}

void main (int argc, char **argv)
{
  register int x asm ("r0") = 10;
  register int y asm ("r1") = 1000/argc;

  asm volatile ("and %0,%1,%2" : "=r" (x) : "r" (x), "r" (y));

  _exit (x);
}

Compiles to:

    821c:       e3a00ffa        mov     r0, #1000       ; 0x3e8
    8220:       eb000002        bl      8230 <__aeabi_idiv>
    8224:       e1a01000        mov     r1, r0
    8228:       e0000001        and     r0, r0, r1

Where is the mov r0, #10?  Is this really what is supposed to happen?

Edit:  This is arm-elf-gcc 4.5.1 compiled and running under Darwin (OS X 
Snow Leopard 10.6.4).

by Marcus H. (mharnisch)


Bill Burgess wrote:
> Where is the mov r0, #10?  Is this really what is supposed to happen?

I also like the -O0 version (first two lines):

mov  r0, #10
mov  r0, #1000
ldr  r1, [fp, #-8]
bl  __aeabi_idiv
mov  r3, r0
mov  r1, r3
and  r0,r0,r1

GCC: (Sourcery G++ Lite 2009q3-68) 4.4.1

--
Marcus

by (prx) A. K. (prx)


See: 
http://gcc.gnu.org/onlinedocs/gcc-4.4.4/gcc/Local-Reg-Vars.html#Local-Reg-Vars

Such as: "Defining such a register variable does not reserve the 
register;" and "This option does not guarantee that GCC will generate 
code that has this variable in the register you specify at all times."

This pretty much explains what you see here. Those local register vars 
just give you a nice name for R0/R1 when passed to asm statements but do 
not prevent the compiler from its usual use of the registers. Works as 
documented, so don't expect anything useful from a Bugzilla report.

So I don't think those R0/R1 names will work reliably in different 
optimization settings and GCC revisions when mixing them with C code.

by Bill B. (auldreekie)


Very useful indeed A.K. thank you very much!  Thanks also for generously 
omitting to mention that the documentation refers to this as "a common 
pitfall".  And yet more thanks for saving me from the embarrassment of a 
bugzilla report for known behavior.

Unfortunately, the solution the manual gives -- using a temporary 
variable for the result of the divide-by expression -- does not work 
under optimization.  As you predicted.  So back to the drawing board I 
guess.

by (prx) A. K. (prx)


It should be a bit more reliable (and simpler) to use the parameter 
passing conventions to assign R0/R1, such as

long swi(long r0, long r1) __attribute__((noinline));
long swi(long r0, long r1)
{
  asm volatile ("svc 0x15"
      :
      : "r" (r0), "r" (r1)
      : "memory", "r12", "r14", "cc");
}

Note that you should not add any C operation to this function. And do 
not inline the wrapper. With those restrictions applied, your initial 
code might also work.

Your initial code likely broke due to the inlining of the wrapper 
function, which in this case separated the register assignments from the 
SWI code and placed a runtime function in between, which has its own 
assumptions on use of R0.
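
A minimal sketch of that non-inlined variant, keeping the register variables from the original wrapper. The name `swi_noinline` and the `HAVE_SVC_HANDLER` macro are assumptions made for illustration and do not come from the thread. Without that macro the function falls back to a plain-C stand-in (a bitwise AND, mirroring the simplified test case), so it can be compiled and exercised on a host that has no SVC handler.

```c
/* Sketch only: swi_noinline and HAVE_SVC_HANDLER are hypothetical names. */
__attribute__((noinline))
long swi_noinline(long arg1, long arg2)
{
#if defined(__arm__) && defined(HAVE_SVC_HANDLER)
    /* No C expression sits between these assignments and the asm, so no
       library call such as __aeabi_idiv can overwrite r0 in between. */
    register long a __asm__("r0") = arg1;
    register long b __asm__("r1") = arg2;

    /* "+r": r0 carries the first argument in and the result out. */
    __asm__ volatile ("svc 0x15"
                      : "+r" (a)
                      : "r" (b)
                      : "memory", "r12", "r14", "cc");
    return a;
#else
    /* Host stand-in (assumption): no SVC handler, so just AND the arguments. */
    return arg1 & arg2;
#endif
}
```

Because the function is never inlined, a call such as `swi_noinline(10, 1000/argc)` evaluates the division in the caller, and the result reaches the wrapper through the normal argument-passing convention. The price is one function call per SWI.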

by (prx) A. K. (prx)


And when all else fails, you can always revert to "naked" functions and 
add your own prolog/epilog, something like (untested)

long swi(long r0, long r1) __attribute__((noinline,naked));
long swi(long r0, long r1)
{
  asm volatile ("push {lr}\n\tswi 0x15\n\tpop {pc}");
}

by Bill B. (auldreekie)


Thanks AK!

Another approach is to add mov instructions before the swi, as part of 
the same asm(), that explicitly load the arguments into r0 and r1. 
Smaller memory footprint than adding a function-call wrapper.
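
A rough sketch of what that could look like, not the code actually used in the project. The name `swi_mov` and the `HAVE_SVC_HANDLER` macro are assumptions made for illustration. Without the macro the function falls back to a plain-C stand-in (a bitwise AND) so that it compiles and runs on a host without an SVC handler.

```c
/* Sketch only: swi_mov and HAVE_SVC_HANDLER are hypothetical names. */
static inline long swi_mov(long arg1, long arg2)
{
#if defined(__arm__) && defined(HAVE_SVC_HANDLER)
    long res;

    /* r0 and r1 are loaded inside the asm itself, so the compiler cannot
       schedule anything between the loads and the svc. Listing r0 and r1
       as clobbers keeps the operands out of those registers. */
    __asm__ volatile ("mov r0, %1\n\t"
                      "mov r1, %2\n\t"
                      "svc 0x15\n\t"
                      "mov %0, r0"
                      : "=r" (res)
                      : "r" (arg1), "r" (arg2)
                      : "r0", "r1", "r12", "r14", "memory", "cc");
    return res;
#else
    /* Host stand-in (assumption): no SVC handler, so just AND the arguments. */
    return arg1 & arg2;
#endif
}
```

The trade-off is up to three extra mov instructions per call site, since the compiler can no longer place the arguments directly in r0 and r1. Whether that ends up smaller than a function-call wrapper probably depends on how many call sites there are.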

The reason for my inline approach was to allow the optimizer to reduce 
code size as much as possible.  And, in fact, it works splendidly.  The 
resulting assembly is extremely tight and I've been very happy until 
now.

I just wasn't aware that gcc made no effort to avoid clobbering register 
variables.  That in fact a register variable isn't really a variable at 
all, it's an alias.

In my architecture, the argument in r0 is always an immediate constant. 
So, by reversing the order in which the register aliases are assigned 
the problem goes away:

void main (int argc, char **argv)
{
  register int y asm ("r1") = 1000/argc;
  register int x asm ("r0") = 10;

  asm volatile ("and %0,%1,%2" : "=r" (x) : "r" (x), "r" (y));

  _exit (x);
}

...produces...

mov     r0, #1000       ; 0x3e8
bl      8234 <__aeabi_idiv>
mov     r1, r0
mov     r0, #10
and     r0, r0, r1

This is not suitable for general-purpose use, and who knows if a 
compiler change may break it some day, but for my application right now 
it should be OK.
