Hi, I'm trying to figure out the proper way to call a subroutine whose address is held in a register, in ARMv4T assembly. ARMv5 has the BLX instruction, which accepts a register operand. Unfortunately this is not available on ARMv4T. TIA
The traditional sequence without support for interworking (being able to call Thumb code from ARM and vice versa) is

    mov lr,pc
    mov pc,target
      -or-
    ldr pc,target

With interworking:

    mov lr,pc
    bx target
@prx: Thanks! Is this because pc = current instruction address + 8, so when executing "mov lr, pc", it will store the instruction address after the branch? Also, is there any penalty in using bx on ARMv4T for doing ARM -> ARM or Thumb -> Thumb subroutine calls?
Tat Wan wrote:
> Thanks! Is this because pc = current instruction address + 8, so when
> executing "mov lr, pc", it will store the instruction address after the
> branch?

Yes.

> Also, is there any penalty in using bx on ARMv4T for doing ARM -> ARM or
> Thumb -> Thumb subroutine calls?

You should look at ARM's core documentation to read about individual instruction timings.
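The PC + 8 reasoning confirmed above can be laid out with offsets. This is only a minimal sketch in GNU as syntax, assuming ARM state and assuming the caller passes the target address in r0; the routine name `call_indirect_arm` is hypothetical and does not come from this thread.

```asm
        .text
        .arm
        .align  2
        .global call_indirect_arm       @ hypothetical name
        .type   call_indirect_arm, %function
call_indirect_arm:
        stmfd   sp!, {r4, lr}           @ save r4 and our own return address
        mov     r4, r0                  @ assumption: target address arrives in r0
        mov     lr, pc                  @ offset +0: PC reads as this address + 8
        bx      r4                      @ offset +4: bit 0 of r4 selects ARM or Thumb
        @ offset +8: the callee's "bx lr" lands here, in ARM state (LR bit 0 is 0)
        ldmfd   sp!, {r4, lr}           @ restore; loading PC via LDM would not
        bx      lr                      @ interwork on ARMv4T, so return with BX
```

Because `mov lr, pc` sits exactly one instruction before the branch, the value it captures is the address of the instruction following `bx r4`, which is where the callee should return.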
A. K. wrote:
> With interworking:
> mov lr,pc
> bx target

To revisit this topic, I am trying to understand how Thumb -> ARM interworking is supposed to work on ARMv4T. The example given works for ARM -> Thumb because LR[0] is 0 for ARM. However, when we want to call from Thumb -> ARM, we need LR[0] to be 1 so that the return goes back to Thumb. In Thumb mode, BL <label> works since LR[0] is set to 1 automatically. I can't find any documentation that specifies the behavior of 'mov LR, PC' in Thumb mode and assures me that LR[0] := 1, such that

    mov lr, pc
    bx target

will work for Thumb -> ARM. All descriptions of interworking seem to gloss over this issue. There's mention of veneers, but those seem to be generated by the linker for C object files only? Can anyone please clarify how this should be solved when programming purely in assembly language?
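One commonly used way around the LR[0] question, offered here as a sketch and not as something confirmed in this thread, is to avoid 'mov lr, pc' in Thumb code altogether: BL to a one-instruction trampoline that does the BX. BL executed in Thumb state sets bit 0 of LR, so the callee's 'bx lr' returns to Thumb. The names `call_indirect_thumb` and `call_via_r0` are hypothetical, and passing the target address in r0 is an assumption.

```asm
        .text
        .thumb
        .align  1
        .global call_indirect_thumb     @ hypothetical name
        .thumb_func
        .type   call_indirect_thumb, %function
call_indirect_thumb:
        push    {r4, lr}                @ r4 is pushed only to keep SP 8-byte aligned
        bl      call_via_r0             @ Thumb BL: LR = return address with bit 0 set
        pop     {r4}
        pop     {r1}                    @ saved LR; on ARMv4T "pop {pc}" does not
        bx      r1                      @ change state, so return through BX

        .thumb_func
        .type   call_via_r0, %function
call_via_r0:                            @ hypothetical trampoline
        bx      r0                      @ bit 0 of r0 selects the callee's state
```

The target address in r0 must have bit 0 clear for an ARM routine and set for a Thumb routine. The trampoline has to be within BL range of every call site that uses it.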
Following up to myself:

[My toolchain is arm-none-eabi-binutils and ld: GNU ld (Linux/GNU Binutils) 2.20.51.0.9.20100526, GNU assembler (Linux/GNU Binutils) 2.20.51.0.9.20100526]

I tried it out in an example project. Looking at the linker output, it seems that a veneer is automatically generated for Thumb->ARM calls. However, veneers seem to be generated for ALL .global labels, i.e., even if the routine is a Thumb routine, a veneer is still generated (see the excerpt from the interwork.objdump file below). thumb_routine2 is in a separate source file, so it needs to be declared .global, resulting in an invalid veneer being generated for the 'BL routine3' in TEST_THUMB, as well as for the 'BL icall_TEST_ARM', which is 16-bit Thumb code that switches mode to ARM and is declared .global.

Is there a way to suppress linker veneer code generation?
    00000060 <icall_TEST_THUMB>:
      60: 200f        movs r0, #15
      62: f000 f811   bl 88 <__TEST_ARM_from_thumb>
      66: f7ff fffb   bl 60 <icall_TEST_THUMB>
      6a: f000 f811   bl 90 <__thumb_routine2_from_thumb>
      6e: f000 f803   bl 78 <thumb_routine3>
      72: f000 f805   bl 80 <__icall_TEST_ARM_from_thumb>
      76: 4770        bx lr

    00000078 <thumb_routine3>:
      78: 3003        adds r0, #3
      7a: 4770        bx lr

    0000007c <thumb_routine2>:
      7c: 3002        adds r0, #2
      7e: 4770        bx lr

    00000080 <__icall_TEST_ARM_from_thumb>:
      80: 4778        bx pc
      82: 46c0        nop ; (mov r8, r8)
      84: eaffffef    b 48 <icall_TEST_ARM>

    00000088 <__TEST_ARM_from_thumb>:
      88: 4778        bx pc
      8a: 46c0        nop ; (mov r8, r8)
      8c: eaffffef    b 50 <TEST_ARM>

    00000090 <__thumb_routine2_from_thumb>:
      90: 4778        bx pc
      92: 46c0        nop ; (mov r8, r8)
      94: eafffff8    b 7c <thumb_routine2>
Finally, I've created some macros in interwork.h for declaring ARM routines that support Thumb-to-ARM calls, based on the veneer code. However, GAS adds an ARM NOP instruction after the veneer. This is not fatal, but it does consume 4 more bytes. Is there any reason why GAS inserts the extra 32-bit NOP?
    00000048 <icall_TEST_ARM>:
      48: 4778        bx pc
      4a: 46c0        nop ; (mov r8, r8)
      4c: e1a00000    nop ; (mov r0, r0)
Am I doing something wrong, or are these GAS quirks that I need to work around?
Ok, after much futzing with the macros and code disassembly, I've come to the following conclusions:

1. All interworked routines (ARM or Thumb) must be declared .global.
2. All Thumb interworked routines MUST have .thumb_func declared as well (this is critical for the routine to be recognized as a Thumb routine by the linker).
3. Interworking calls just use a normal BL <interwork_routine>; the linker handles the rest and inserts a veneer as necessary.

An updated macro file is included in case anyone is interested. The arm_icall and thumb_icall macros are provided for programming clarity and just implement 'BL <routine>'.

It is not reliable to use manually generated veneers (based on the linker-generated code): the linker performs code block alignment, which may cause invalid NOPs (32-bit instructions instead of 16-bit instructions, and vice versa) to be inserted after the veneer and mess up the mode switching. Of course, it is possible to write veneers that implement mode switching regardless of inserted NOPs, but I don't think it is worth the effort (in terms of number of instructions, and also execution cycles lost due to NOPs).
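The three conclusions above might look like this in source form. This is a minimal sketch in GNU as syntax, not the contents of the attached macro file; the routine names `arm_add_one` and `thumb_caller` are hypothetical, the `.type` directives are an addition not mentioned in the conclusions, and the two routines are shown in one listing only for brevity (in the project described in this thread they lived in separate source files).

```asm
        @ ARM routine, declared .global so the linker can interwork it
        .text
        .arm
        .align  2
        .global arm_add_one             @ hypothetical name
        .type   arm_add_one, %function
arm_add_one:
        add     r0, r0, #1
        bx      lr                      @ BX returns to the caller's state

        @ Thumb routine: .global plus .thumb_func marks the symbol as Thumb
        .thumb
        .align  1
        .global thumb_caller            @ hypothetical name
        .thumb_func
        .type   thumb_caller, %function
thumb_caller:
        push    {r4, lr}                @ r4 is pushed only to keep SP 8-byte aligned
        bl      arm_add_one             @ plain BL; the linker is expected to route
                                        @ this through a Thumb-to-ARM veneer
        pop     {r4}
        pop     {r1}                    @ saved LR
        bx      r1                      @ return with BX so the caller's state is kept
```

Disassembling the linked image, as was done earlier in the thread, is a quick way to check that the BL in `thumb_caller` really goes through a `__arm_add_one_from_thumb` style veneer and not straight to the ARM code.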