EmbDev.net

Forum: ARM programming with GCC/GNU tools
gcc loads register from flash with two 16Bit moves


by arne (Guest)


Hello,
I am using an LM3S9B90 controller with a Cortex-M3 core.

I wrote a simple test which toggles a port pin by accessing the
appropriate bit-band alias.
In the following example, EL_PF_3 is a bit-band alias for Port F bit 3
(address 0x424A7F8C, 1112178572 decimal).
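
For readers unfamiliar with bit-banding, here is a minimal sketch of how such an alias address can be derived on a Cortex-M3. The helper name bitband_alias is hypothetical, and the byte address 0x400253FC (assumed to be the Port F data register with all mask bits set) is inferred by working backwards from the alias address quoted above; it is not stated in the post. The EL_PF_3 definition is likewise a guess at what the poster's macro may look like.

```c
#include <assert.h>
#include <stdint.h>

/* Cortex-M3 peripheral bit-band region and its alias region. */
#define PERIPH_BASE     0x40000000u
#define PERIPH_BB_BASE  0x42000000u

/* Hypothetical helper (not from the post): returns the alias word address
 * for one bit of a byte in the peripheral bit-band region. Each bit in the
 * region maps to one 32-bit word in the alias region. */
static inline uint32_t bitband_alias(uint32_t byte_addr, unsigned bit)
{
    assert(byte_addr >= PERIPH_BASE && byte_addr < PERIPH_BASE + 0x100000u);
    assert(bit < 8u);  /* a byte has bits 0..7 */
    return PERIPH_BB_BASE + (byte_addr - PERIPH_BASE) * 32u + bit * 4u;
}

/* Assumed shape of the poster's macro. It is only meaningful on the target,
 * so it is defined here but never dereferenced. */
#define EL_PF_3 (*(volatile uint32_t *)0x424A7F8Cu)
```

Under these assumptions, bitband_alias(0x400253FC, 3) gives 0x424A7F8C, which matches the address in the post.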

In the assembler listing, you can see that two 16-bit moves are used to
load the address.
I don't understand why. I think this could be done with one 32-bit move.

ssi.c ****   EL_PF_3 = 0;
 21538                  .loc 1 19 0
 21539 0000 47F68C73     movw  r3, #:lower16:1112178572
 21540 0004 C4F24A23     movt  r3, #:upper16:1112178572
 21541 0008 0021         movs  r1, #0
ssi.c ****   EL_PF_3 = 1;
 21542                  .loc 1 20 0
 21543 000a 0122         movs  r2, #1
 21544                  .loc 1 19 0
 21545 000c 1960         str  r1, [r3, #0]
 21546                  .loc 1 20 0
 21547 000e 1A60         str  r2, [r3, #0]
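
As a rough illustration of what the movw/movt pair in the listing does, the following sketch models the two instructions in plain C. All function names are hypothetical and only mimic the instruction semantics: MOVW writes a 16-bit immediate to the low half of the register and clears the high half, MOVT replaces the high half and leaves the low half untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Value the assembler substitutes for #:lower16:x (the MOVW immediate). */
static inline uint16_t lower16(uint32_t x) { return (uint16_t)(x & 0xFFFFu); }

/* Value the assembler substitutes for #:upper16:x (the MOVT immediate). */
static inline uint16_t upper16(uint32_t x) { return (uint16_t)(x >> 16); }

/* Model of MOVW: low half = imm, high half = 0. */
static inline uint32_t movw(uint16_t imm) { return imm; }

/* Model of MOVT: high half = imm, low half unchanged. */
static inline uint32_t movt(uint32_t reg, uint16_t imm)
{
    return (reg & 0xFFFFu) | ((uint32_t)imm << 16);
}

/* Hypothetical helper: builds a 32-bit constant the way the listing does. */
static inline uint32_t load_const32(uint32_t value)
{
    uint32_t r3 = movw(lower16(value));
    r3 = movt(r3, upper16(value));
    assert(r3 == value);  /* the two halves must reassemble the constant */
    return r3;
}
```

For the constant in the listing, lower16(1112178572) is 0x7F8C and upper16(1112178572) is 0x424A, so the pair leaves 0x424A7F8C in r3.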


I am using arm-elf-gcc (GCC version 4.4.0) with the following flags:
-O3 -g3 -Wall -L lib -mcpu=cortex-m3 -mthumb

Does anyone have an idea what might be wrong?

Best regards
 Arne

by Giovanni D. (gdisirio)


I think it is because of the flash prefetch buffers: fetching two
consecutive instructions is faster than fetching an instruction and then
a 32-bit value located elsewhere in memory, which would require refilling
the buffer and inserting wait states.

Giovanni
---
ChibiOS/RT http://chibios.sourceforge.net

by (prx) A. K. (prx)


arne wrote:

> In the assembler listing, you can see that two 16Bit moves are used to
> load the address,

There are two alternative ways to load a 32-bit constant: PC-relative from
a constant pool, as the older ARM cores had to do, and the way shown here.

The PC-relative load needs 6 bytes in total (a 2-byte load instruction
plus the 4-byte constant), but depending on the machine it can cost a lot
of clock cycles due to slow flash memory (~6 cycles on an LPC1700 at
maximum frequency). Embedding the constant in the sequential instruction
stream needs 8 bytes, but runs in 2 clock cycles, as sequential fetches
are largely unaffected by flash speed for the reason given in the
previous post.

In GCC, which alternative is chosen depends on the optimization setting.
With -Os you get the smaller code.
