EmbDev.net

Forum: ARM programming with GCC/GNU tools

Indirect Subroutine Call in Assembly for ARMv4T architecture


by Tat W. (Company: Universiti Sains Malaysia) (tcwan)


Hi,

I'm trying to figure out the proper way to call a subroutine whose
address is held in a register in ARMv4T assembly.

In ARMv5 there is the BLX instruction, which accepts a register operand.
Unfortunately, it is not available on ARMv4T.

TIA

by A. K. (prx)


The traditional sequence without support for interworking (being able to 
call Thumb code from ARM and vice versa) is
   mov lr,pc
   mov pc,target -or- ldr pc,target

With interworking:
   mov lr,pc
   bx  target
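
For context, the interworking sequence above might be embedded in a
complete ARM-state routine roughly as follows. This is only a sketch in
GNU as syntax: arm_caller and target_routine are hypothetical names, and
r4 is an arbitrary choice of register for the target address.

```asm
        .text
        .arm
        .align  2
        .global arm_caller
        .type   arm_caller, %function
arm_caller:                             @ hypothetical caller, ARM state
        stmfd   sp!, {r4, lr}           @ save r4 and our own return address
        ldr     r4, =target_routine     @ address of the routine to call
        mov     lr, pc                  @ PC reads as this instruction + 8
        bx      r4                      @ branch; bit 0 of r4 selects ARM or Thumb
        ldmfd   sp!, {r4, lr}           @ the callee returns to this instruction
        bx      lr
        .ltorg                          @ literal pool for the ldr above

        .type   target_routine, %function
target_routine:                         @ hypothetical callee, ARM state
        add     r0, r0, #1
        bx      lr                      @ interworking-safe return
```

The caller has to save LR first, because "mov lr,pc" overwrites it.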

by Tat W. (Company: Universiti Sains Malaysia) (tcwan)


@prx:

Thanks! Is this because pc = current instruction address + 8, so when
executing "mov lr, pc", it will store the instruction address after the 
branch?

Also, is there any penalty in using bx on ARMv4T for doing ARM -> ARM or 
Thumb -> Thumb subroutine calls?

by A. K. (prx)


Tat Wan wrote:

> Thanks! Is this because pc = current instruction address + 8, so when
> executing "mov lr, pc", it will store the instruction address after the
> branch?

Yes.
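
As a worked illustration of the arithmetic, assume (purely for the sake
of the example) that the sequence is assembled in ARM state at address
0x8000 and that the target address is in r4:

```asm
        .arm
        mov     lr, pc          @ at 0x8000: PC reads 0x8000 + 8, so LR = 0x8008
        bx      r4              @ at 0x8004: jump to the routine
        mov     r0, r0          @ at 0x8008: the routine returns here
```

Each ARM instruction is 4 bytes, so the +8 offset skips exactly the
branch that follows the "mov lr, pc".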

> Also, is there any penalty in using bx on ARMv4T for doing ARM -> ARM or
> Thumb -> Thumb subroutine calls?

You should look at ARM's core documentation to read about individual 
instruction timings.

by Tat W. (Company: Universiti Sains Malaysia) (tcwan)


A. K. wrote:

> With interworking:
>    mov lr,pc
>    bx  target

To revisit this topic, I am trying to understand how Thumb -> ARM 
Interworking is supposed to work in ARMv4T.

The example given works for ARM -> Thumb because LR[0] is 0 for ARM.
However, when we want to call from Thumb -> ARM, we need LR[0] to be 1.

In Thumb mode, BL <label> works since LR[0] is set to 1 automatically.
I can't find any documentation which specifies the behavior of
'mov LR, PC' for Thumb mode which assures me that LR[0] := 1, such that
    mov lr, pc
    bx  target
will work for Thumb -> ARM.

All descriptions of interworking seem to gloss over this issue. There's 
mention of veneers, but those seem to be generated by the linker for C 
object files only. Can anyone please clarify how this should be solved 
when programming purely in Assembly Language?
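
For reference, one commonly used way around this in hand-written Thumb
code is to let BL build the return address. In Thumb state the PC reads
as the current instruction address + 4 with bit 0 clear, so
"mov lr, pc" does not set LR[0], whereas Thumb BL does. The sketch below
assumes GNU as syntax; thumb_caller, call_via_r3 and arm_target are
hypothetical names.

```asm
        .text
        .thumb
        .align  1
        .global thumb_caller
        .thumb_func
thumb_caller:                           @ hypothetical caller, Thumb state
        push    {lr}
        ldr     r3, =arm_target         @ ARM routine: bit 0 of the address is clear
        bl      call_via_r3             @ Thumb BL sets LR = return address | 1
        pop     {r1}
        bx      r1                      @ return; also works for an ARM-state caller

        .thumb_func
call_via_r3:                            @ one-instruction trampoline
        bx      r3                      @ state is taken from bit 0 of r3
        .ltorg                          @ literal pool for the ldr above

        .arm
        .align  2
        .global arm_target
        .type   arm_target, %function
arm_target:                             @ hypothetical callee, ARM state
        add     r0, r0, #1
        bx      lr                      @ LR[0] = 1, so this returns to Thumb state
```

The trampoline costs one extra branch per call, but the callee sees a
return address with the Thumb bit already set.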

by Tat W. (Company: Universiti Sains Malaysia) (tcwan)


Attached files:

Following up to myself:
[My Toolchain is arm-none-eabi-binutils and ld --
GNU ld (Linux/GNU Binutils) 2.20.51.0.9.20100526
GNU assembler (Linux/GNU Binutils) 2.20.51.0.9.20100526]

I tried it out in an example project. Looking at the linker output, it 
seems that a veneer is automatically generated for Thumb->ARM calls.

However, veneers seem to be generated for ALL .global labels, i.e., 
even if the routine is a Thumb routine, a veneer is still generated 
(see the excerpt from the interwork.objdump file below). thumb_routine2 
is in a separate source file, so it needs to be declared .global, which 
results in an invalid veneer being generated for the 
'BL thumb_routine2' in TEST_THUMB, as well as for the 
'BL icall_TEST_ARM', which targets 16-bit Thumb code that switches mode 
to ARM and is declared .global. Is there a way to suppress linker 
veneer code generation?
00000060 <icall_TEST_THUMB>:
  60:   200f            movs    r0, #15
  62:   f000 f811       bl      88 <__TEST_ARM_from_thumb>
  66:   f7ff fffb       bl      60 <icall_TEST_THUMB>
  6a:   f000 f811       bl      90 <__thumb_routine2_from_thumb>
  6e:   f000 f803       bl      78 <thumb_routine3>
  72:   f000 f805       bl      80 <__icall_TEST_ARM_from_thumb>
  76:   4770            bx      lr

00000078 <thumb_routine3>:
  78:   3003            adds    r0, #3
  7a:   4770            bx      lr

0000007c <thumb_routine2>:
  7c:   3002            adds    r0, #2
  7e:   4770            bx      lr

00000080 <__icall_TEST_ARM_from_thumb>:
  80:   4778            bx      pc
  82:   46c0            nop                     ; (mov r8, r8)
  84:   eaffffef        b       48 <icall_TEST_ARM>

00000088 <__TEST_ARM_from_thumb>:
  88:   4778            bx      pc
  8a:   46c0            nop                     ; (mov r8, r8)
  8c:   eaffffef        b       50 <TEST_ARM>

00000090 <__thumb_routine2_from_thumb>:
  90:   4778            bx      pc
  92:   46c0            nop                     ; (mov r8, r8)
  94:   eafffff8        b       7c <thumb_routine2>

Finally, I've created some macros in interwork.h for declaring ARM 
routines to support Thumb-to-ARM calls, based on the veneer code. 
However, GAS adds an ARM NOP instruction after the veneer. This is not 
fatal, but it does consume 4 more bytes. Is there any reason why GAS 
inserts the extra 32-bit NOP?
00000048 <icall_TEST_ARM>:
  48:   4778            bx      pc
  4a:   46c0            nop                     ; (mov r8, r8)
  4c:   e1a00000        nop                     ; (mov r0, r0)

Am I doing something wrong, or are these GAS quirks that I need to work 
around?

by Tat W. (Company: Universiti Sains Malaysia) (tcwan)


Attached files:

Ok, after much futzing with the macros and code disassembly, I've come 
to the following conclusions:

1. All Interworked routines (ARM or Thumb) must be declared .global
2. All Thumb Interworked routines MUST have .thumb_func declared as well
   (This is critical for it to be recognized as a Thumb routine by the
    linker).
3. Interworking calls just use normal BL <interwork_routine>, the
   linker will handle the rest, and insert a veneer as necessary.

An updated macro file is included in case anyone is interested.
The arm_icall and thumb_icall macros are provided for programming 
clarity and simply implement 'BL <routine>'.
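
To illustrate the three points above, a pair of interworked routines
might be declared along these lines. This is a sketch only, not the
contents of the attached macro file; arm_routine and thumb_routine are
hypothetical names, and the two routines would typically live in
separate source files (they are shown together for brevity).

```asm
        .text

        .arm
        .align  2
        .global arm_routine             @ point 1: declared .global
        .type   arm_routine, %function
arm_routine:                            @ hypothetical ARM routine
        add     r0, r0, #1
        bx      lr

        .thumb
        .align  1
        .global thumb_routine           @ point 1: declared .global
        .type   thumb_routine, %function
        .thumb_func                     @ point 2: marks the next label as Thumb code
thumb_routine:                          @ hypothetical Thumb routine
        push    {lr}
        bl      arm_routine             @ point 3: plain BL; the linker is expected
                                        @ to insert a Thumb-to-ARM veneer if needed
        pop     {r1}
        bx      r1                      @ return via BX so an ARM caller also works
```

Disassembling the linked image with objdump is a quick way to check
whether a veneer was actually inserted for a given call.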

Manually written veneers (based on the linker-generated code) are not 
reliable: the linker performs code block alignment, which may cause 
invalid NOPs (32-bit instructions instead of 16-bit instructions, and 
vice versa) to be inserted after the veneer and break the mode 
switching. Of course, it is possible to write veneers that switch modes 
correctly regardless of inserted NOPs, but I don't think it is worth 
the effort (in terms of the number of instructions and the execution 
cycles lost to the NOPs).
