Forum: ARM programming with GCC/GNU tools

gcc loads register from flash with two 16-bit moves

by arne (Guest)

I am using an LM3S9B90 controller with a Cortex-M3 core.

I wrote a simple test that toggles a port pin by accessing the 
corresponding bit-band alias.
In the following example, EL_PF_3 is a bit-band alias for Port F, bit 3
(address 0x424A7F8C, 1112178572 decimal).
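
The alias address above is consistent with the Cortex-M3 bit-band formula for the peripheral region, alias = 0x42000000 + (register address - 0x40000000) * 32 + bit * 4. Working it backwards gives the register address 0x400253FC, which (assuming Port F on the APB aperture at 0x40025000) is GPIODATA at offset 0x3FC, where all eight address-mask bits are set. The sketch below is an editor's reconstruction: the macro names, including this definition of EL_PF_3, are hypothetical, since the post does not show the original definition.

```c
#include <assert.h>
#include <stdint.h>

/* Cortex-M3 peripheral bit-band region and its alias region
   (fixed by the ARMv7-M memory map). */
#define PERIPH_BASE    0x40000000u
#define PERIPH_BB_BASE 0x42000000u

/* Every bit of the peripheral region maps to one 32-bit word in the alias
   region: alias = 0x42000000 + (addr - 0x40000000) * 32 + bit * 4.
   With constant arguments this is an integer constant expression. */
#define BITBAND_ALIAS(addr, bit) \
    (PERIPH_BB_BASE + ((addr) - PERIPH_BASE) * 32u + (bit) * 4u)

/* Assumption: Port F on the APB aperture (base 0x40025000); offset 0x3FC
   selects GPIODATA with all eight address-mask bits set. */
#define GPIO_PORTF_DATA_ADDR 0x400253FCu

/* Hypothetical reconstruction of EL_PF_3 (not shown in the thread).
   Writing 0 or 1 through the alias clears or sets only bit 3. */
#define EL_PF_3 (*(volatile uint32_t *)BITBAND_ALIAS(GPIO_PORTF_DATA_ADDR, 3))

/* Compile-time check: the computed alias equals the address in the post. */
static_assert(BITBAND_ALIAS(GPIO_PORTF_DATA_ADDR, 3) == 0x424A7F8Cu,
              "bit-band alias for PF3 does not match the listing");
```

With a definition like this, `EL_PF_3 = 0;` and `EL_PF_3 = 1;` are plain word stores to the alias address, which fits the two `str ... [r3, #0]` instructions in the listing.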

In the assembler listing you can see that two 16-bit moves (movw/movt)
are used to load the address.
I don't understand why; I think this could be done with a single 32-bit move.

ssi.c ****   EL_PF_3 = 0;
 21538                  .loc 1 19 0
 21539 0000 47F68C73     movw  r3, #:lower16:1112178572
 21540 0004 C4F24A23     movt  r3, #:upper16:1112178572
 21541 0008 0021         movs  r1, #0
ssi.c ****   EL_PF_3 = 1;
 21542                  .loc 1 20 0
 21543 000a 0122         movs  r2, #1
 21544                  .loc 1 19 0
 21545 000c 1960         str  r1, [r3, #0]
 21546                  .loc 1 20 0
 21547 000e 1A60         str  r2, [r3, #0]
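
The listing itself hints at why a single instruction is not enough: movw and movt are each 4-byte Thumb-2 instructions (note the 8-digit encodings 47F68C73 and C4F24A23), and each carries only a 16-bit immediate, so a constant such as 0x424A7F8C, which no single MOV can encode, needs either this pair or a load from memory. The sketch below is an editor's C model, assuming the ARMv7-M encodings (MOVW encoding T3, MOVT encoding T1); the function names are hypothetical. It decodes the two immediates from the listed bytes and recombines them the way the CPU does.

```c
#include <stdint.h>

/* ARMv7-M semantics (editor's model, not compiler output):
   MOVW Rd, #imm16 : Rd = imm16, upper halfword cleared.
   MOVT Rd, #imm16 : Rd[31:16] = imm16, Rd[15:0] left unchanged. */
static uint32_t movw(uint16_t imm16)
{
    return (uint32_t)imm16;
}

static uint32_t movt(uint32_t rd, uint16_t imm16)
{
    return ((uint32_t)imm16 << 16) | (rd & 0xFFFFu);
}

/* Both encodings scatter the 16-bit immediate as imm4:i:imm3:imm8.
   The listing prints the bytes in memory order, e.g. "47F68C73"; each
   halfword is little-endian, so that is hw1 = 0xF647, hw2 = 0x738C. */
static uint16_t movwt_imm16(const uint8_t bytes[4])
{
    uint16_t hw1 = (uint16_t)(bytes[0] | (bytes[1] << 8));
    uint16_t hw2 = (uint16_t)(bytes[2] | (bytes[3] << 8));

    uint32_t imm4 = hw1 & 0xFu;          /* hw1 bits [3:0]   */
    uint32_t i    = (hw1 >> 10) & 0x1u;  /* hw1 bit 10       */
    uint32_t imm3 = (hw2 >> 12) & 0x7u;  /* hw2 bits [14:12] */
    uint32_t imm8 = hw2 & 0xFFu;         /* hw2 bits [7:0]   */

    return (uint16_t)((imm4 << 12) | (i << 11) | (imm3 << 8) | imm8);
}

/* Destination register number: hw2 bits [11:8], i.e. the low nibble
   of the fourth byte in memory order. */
static unsigned movwt_rd(const uint8_t bytes[4])
{
    return (unsigned)(bytes[3] & 0x0Fu);
}
```

Fed the bytes from the listing, the model yields 0x7F8C for movw and 0x424A for movt, both targeting r3, and movt(movw(0x7F8C), 0x424A) gives back 0x424A7F8C. It only checks the arithmetic; it says nothing about timing.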

I am using arm-elf-gcc (GCC version 4.4.0) with the following flags:
-O3 -g3 -Wall -L lib -mcpu=cortex-m3 -mthumb

Does anyone have an idea what might be wrong?

Best regards

by Giovanni D. (gdisirio)

I think it is because of the flash prefetch buffers: fetching two
consecutive instructions is faster than fetching one instruction and then
a 32-bit value stored elsewhere in memory, which would require refilling
the buffer and inserting wait states.

ChibiOS/RT http://chibios.sourceforge.net

by A. K. (prx)

arne wrote:

> In the assembler listing, you can see that two 16Bit moves are used to
> load the address,

There are two ways to load a 32-bit constant: a PC-relative load from
a constant pool, as the older ARM cores had to do, and the movw/movt
pair shown here.

The PC-relative load needs 6 bytes in total (a 2-byte LDR plus the 4-byte
pool entry), but depending on the device it can cost many clock cycles
because of slow flash memory (about 6 cycles on an LPC1700 at maximum
frequency). Embedding the constant in the sequential instruction stream
needs 8 bytes (two 4-byte Thumb-2 instructions) but runs in 2 clock
cycles, because sequential fetches are largely unaffected by flash speed,
for the reason given in the previous reply.

In GCC, the alternative chosen depends on the optimization setting; with
-Os you get the smaller code.

