Arm Architecture

Introduction

About ARM Architecture

Registers

Different State

ARM State

Thumb State

Jazelle

PC Relative Addressing

+-----------+
|  execute  | pc - 8 
+-----------+
     ^	 
     |	 
+-----------+
|  decode   | pc - 4
+-----------+
     ^
     |
+-----------+
|   fetch   |  pc
+-----------+

Therefore, the pc value an instruction sees is higher than its own address: while an instruction executes, the next one has already been decoded and the one after that has been fetched, so pc holds the address of the instruction two ahead (current address + 8 in ARM state).
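The pipeline offset can be sketched as a small Python model. The function names `visible_pc` and `pc_relative_target` are made up for this illustration, and the model assumes the classic behaviour where pc reads as two instructions ahead (4-byte instructions in ARM state, 2-byte instructions in Thumb state).

```python
def visible_pc(instruction_address: int, thumb: bool = False) -> int:
    """Return the value an instruction reads from pc while it executes."""
    # ARM instructions are 4 bytes wide, Thumb instructions 2 bytes wide.
    # In both states pc reads as two instructions ahead of the current one.
    width = 2 if thumb else 4
    return (instruction_address + 2 * width) & 0xFFFFFFFF


def pc_relative_target(instruction_address: int, offset: int) -> int:
    """Address accessed by `ldr rX, [pc, #offset]` in ARM state."""
    return (visible_pc(instruction_address) + offset) & 0xFFFFFFFF


if __name__ == "__main__":
    # An ARM instruction at 0x8000 sees pc = 0x8008.
    print(hex(visible_pc(0x8000)))
    # ldr r0, [pc, #16] at 0x8000 therefore reads from 0x8018.
    print(hex(pc_relative_target(0x8000, 16)))
```

This is why a disassembler shows a pc-relative load with an offset that looks 8 bytes too small when compared with the distance from the instruction itself.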

Instruction Set

Instruction Format

[ instruction ] [ condition ] [s] [ destination ] , [ source ] , [ other operands ... ]
add   r1 , r2 , #2   :  r1 = r2 + 2
suble r1 , r2 , #3   :  if less than or equal : r1 = r2 - 3
movs  r1 , r2        :  r1 = r2 , Update Status register

Barrel Shifter

mov r7 ,r5 ,LSL #2       :  r7 = r5 << 2 
add r0 ,r1 ,r1 ,LSL #1   :  r0 = r1 + ( r1 << 1 )
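A minimal Python sketch of the four basic barrel shifter operations on 32 bit values may help when reading shifted operands. The helper names are illustrative only, and shift amounts are assumed to be in the range 0 to 31.

```python
MASK32 = 0xFFFFFFFF


def lsl(value: int, amount: int) -> int:
    """Logical shift left: zeros enter from the right."""
    return (value << amount) & MASK32


def lsr(value: int, amount: int) -> int:
    """Logical shift right: zeros enter from the left."""
    return (value & MASK32) >> amount


def asr(value: int, amount: int) -> int:
    """Arithmetic shift right: the sign bit is copied in from the left."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret as a negative Python integer
    return (value >> amount) & MASK32


def ror(value: int, amount: int) -> int:
    """Rotate right: bits leaving on the right re-enter on the left."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


if __name__ == "__main__":
    r5 = 5
    print(lsl(r5, 2))        # mov r7, r5, LSL #2
    r1 = 7
    print(r1 + lsl(r1, 1))   # add r0, r1, r1, LSL #1 (multiplies r1 by 3)
```

The second example shows a common use of the shifter: multiplying by a small constant without a `mul` instruction.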

Load / Store

Unlike x86, ARM cannot manipulate memory directly. One needs to load the data into a register, manipulate it, and then store it back to memory.

ldr r2 , [r1]  : value @ r1 is loaded to r2
add r2 , #1    : value is incremented
str r2 , [r1]  : value in r2 is stored @ r1
  1. Different Addressing mode

    These instructions have three primary addressing modes, which use a base register and an offset specified by the instruction

    1. Offset Addressing [ Rn , offset ]

      The memory address is formed by adding an offset to, or subtracting it from, the base register

      ldr r2 , [r0, #8]   : load value from r0+8
      str r2 , [r0, r1]   : value in r2 is stored in r0 + r1
      
    2. Pre-indexed Addressing [ Rn , offset ]!

      The memory address is formed in the same way as in offset addressing. As a side effect, the memory address is also written back to the base register

      ldr r2 , [r0, #8]!   : load value from r0 + 8  and  r0 = r0 + 8  ( r0 is updated )
      str r2 , [r0, r1]!   : value in r2 is stored in r0 + r1 and  r0 = r0 + r1 ( r0 is updated )
      
    3. Post-indexed Addressing [ Rn ] , offset

      The address is the base register value. As a side effect, an offset is added to or subtracted from the base register value and the result is written back to the base register

      ldr r2, [r0], #8     : load value from r0 then set r0 = r0 + 8 ( r0 is updated after the operation )
      str r2 ,[r0], r1     : value in r2 is stored in r0 then r0 = r0 + r1 
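The difference between the three addressing modes can be modelled in a few lines of Python. The `Machine` class and its `mode` argument are hypothetical names for this sketch; memory is a small little-endian byte array and only `ldr` with an immediate offset is modelled.

```python
import struct


class Machine:
    """Tiny model of registers plus a little-endian memory."""

    def __init__(self, size: int = 64):
        self.mem = bytearray(size)
        self.reg = {f"r{i}": 0 for i in range(16)}

    def read_word(self, addr: int) -> int:
        return struct.unpack_from("<I", self.mem, addr)[0]

    def write_word(self, addr: int, value: int) -> None:
        struct.pack_into("<I", self.mem, addr, value & 0xFFFFFFFF)

    def ldr(self, rt: str, rn: str, offset: int = 0, mode: str = "offset") -> None:
        """mode is 'offset' ([rn, #off]), 'pre' ([rn, #off]!) or 'post' ([rn], #off)."""
        base = self.reg[rn]
        # Post-indexed uses the unmodified base as the address.
        addr = base if mode == "post" else base + offset
        self.reg[rt] = self.read_word(addr)
        # Pre- and post-indexed both write base + offset back to rn.
        if mode in ("pre", "post"):
            self.reg[rn] = base + offset


if __name__ == "__main__":
    m = Machine()
    m.write_word(0, 0xAAAA)
    m.write_word(8, 0xBBBB)
    m.ldr("r2", "r0", 8, "post")   # ldr r2, [r0], #8
    print(hex(m.reg["r2"]), m.reg["r0"])
```

Post-indexed addressing is what makes loops over arrays compact: one instruction loads the element and advances the pointer.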
      
  2. Load / Store Multiple

    ldm and stm can be used to load and store multiple registers.

    ldm r0, {r1,r2,r3}   : r1 = [r0] , r2 = [r0+4] , r3 = [r0+8] 
        
    ldm r0!, {r1,r2,r3}  : r1 = [r0] , r2 = [r0+4] , r3 = [r0+8] , r0 = r0 + 12
        
    stm r0, {r1-r3}      : [r0] = r1 , [r0+4] = r2 , [r0+8] = r3
        
    stm r0!, {r1-r3}     : [r0] = r1 , [r0+4] = r2,  [r0+8] = r3 , r0 = r0 + 12
    

    There are 4 addressing modes which decide how the address is incremented or decremented

    Mode  Description
    IA    Increment After (default)
    IB    Increment Before
    DA    Decrement After
    DB    Decrement Before
       
    • push and pop are aliases for stmdb sp! and ldmia sp!
    ldmib r0 , {r1,r2,r3}  : r1 = [r0+4] , r2 = [r0+8] , r3 = [r0+12]
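The four modes differ only in which addresses are touched and what is written back to the base register. The following Python sketch computes both; `block_transfer_addresses` is an illustrative name, and it assumes 4-byte registers with the lowest-numbered register always placed at the lowest address.

```python
def block_transfer_addresses(base: int, count: int, mode: str = "ia"):
    """Return (addresses, written_back_base) for an ldm/stm of `count` registers.

    addresses[0] belongs to the lowest-numbered register in the list.
    written_back_base is the new base value when the `!` suffix is used.
    """
    size = 4 * count
    if mode == "ia":      # increment after: start at the base itself
        start, new_base = base, base + size
    elif mode == "ib":    # increment before: start one word above the base
        start, new_base = base + 4, base + size
    elif mode == "da":    # decrement after: the block ends at the base
        start, new_base = base - size + 4, base - size
    elif mode == "db":    # decrement before: the block ends just below the base
        start, new_base = base - size, base - size
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [start + 4 * i for i in range(count)], new_base


if __name__ == "__main__":
    # push {fp, lr} behaves like stmdb sp!, {fp, lr} with sp = 0x100.
    addresses, sp = block_transfer_addresses(0x100, 2, "db")
    print([hex(a) for a in addresses], hex(sp))
```

With `db` the stack pointer ends up at the lowest written address, which is why `push` followed by `pop` (`ldmia sp!`) restores the registers in the same order.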
    
  3. Load Immediate value

    • ARM has a fixed instruction length of 32 bits
      • Including opcode and operands
    • Only 12 bits are left for the second operand

    • if bit 25 is set to 1 the last 12 bits are handled as an immediate
      • A plain 12 bit field could only hold values up to 4095, so the field is split in order to reach bigger values
      • a = 8 bit value ( 0 to 255 )
      • b = 4 bit value ( used for rotate right )
        • immediate = a ror ( b << 1 )
     31-28    27 26 25   24-21    20   19-16    15-12    11-8       7-0
    +-------+----------+--------+----+--------+--------+--------+-----------+
    | Cond  | 0  0  1  | Opcode | S  |   Rn   |   Rd   | Rotate | immediate |
    +-------+----------+--------+----+--------+--------+--------+-----------+

    • if bit 25 is set to 0 the last 12 bits are handled as a register 2nd operand
      • Rm = 4 bit register number
      • shift = 8 bits describing the barrel shifter operation applied to Rm
     31-28    27 26 25   24-21    20   19-16    15-12        11-4         3-0
    +-------+----------+--------+----+--------+--------+---------------+--------+
    | Cond  | 0  0  0  | Opcode | S  |   Rn   |   Rd   |     shift     |   Rm   |
    +-------+----------+--------+----+--------+--------+---------------+--------+

    Often other methods are used to work around big immediate values

    ldr r1 , =0x11223344   : most likely replaced by a pc-relative load from a literal pool
        
    movw r1, #0x3344       : load the value in two step r1 = 0x3344
    movt r1, #0x1122       : r1 = 0x11223344
        
    mov r2, #0x2e00        : assemble first part of 0x2ee0
    orr r2, #0xe0          : assemble second part of 0x2ee0
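Whether a constant fits the 8 bit value plus 4 bit rotation scheme can be checked mechanically. The Python sketch below is an illustration with made-up helper names; it returns the first rotation that works, which is not necessarily the one a particular assembler would pick.

```python
MASK32 = 0xFFFFFFFF


def rol32(value: int, amount: int) -> int:
    """Rotate a 32 bit value left."""
    amount %= 32
    value &= MASK32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def encode_arm_immediate(value: int):
    """Return (rotate, imm8) with imm8 ROR (2 * rotate) == value, or None."""
    value &= MASK32
    for rotate in range(16):
        # Rotating left undoes the rotate-right the CPU applies when decoding.
        imm8 = rol32(value, 2 * rotate)
        if imm8 <= 0xFF:
            return rotate, imm8
    return None


def split_movw_movt(value: int):
    """Return the (low, high) 16 bit halves used by a movw / movt pair."""
    return value & 0xFFFF, (value >> 16) & 0xFFFF


if __name__ == "__main__":
    print(encode_arm_immediate(0x2E00))      # encodable in a single mov
    print(encode_arm_immediate(0x2EE0))      # None: needs mov + orr
    print([hex(h) for h in split_movw_movt(0x11223344)])
```

The value 0x2ee0 has set bits spread over 9 bit positions, so no 8 bit window covers it, which is why it is assembled in two steps.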
    

Bitwise Instructions

Operation      Assembly          Simplified
bitwise AND    and r0, r1, #2    r0 = r1 & 2
bitwise OR     orr r0, r1, r2    r0 = r1 | r2
bitwise XOR    eor r0, r1, r2    r0 = r1 ^ r2
bitwise NOT    mvn r0, r2        r0 = ~r2

Arithmetic

Operation        Assembly          Simplified
Add              add r0, r1, #2    r0 = r1 + 2
Add with carry   adc r0, r1, r2    r0 = r1 + r2 + carry flag
Subtract         sub r0, r1, #2    r0 = r1 - 2
Reverse Sub      rsb r0, r1, #2    r0 = 2 - r1
Multiply         mul r0, r1, r2    r0 = r1 * r2
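Add with carry is mostly useful for arithmetic wider than one register. As a rough Python sketch (the function names are illustrative, not real mnemonics with full flag behaviour), a 64 bit addition can be built from an `adds` on the low words followed by an `adc` on the high words:

```python
MASK32 = 0xFFFFFFFF


def adds(a: int, b: int):
    """32 bit add that also returns the carry flag, like `adds`."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32


def adc(a: int, b: int, carry: int) -> int:
    """32 bit add with carry in, like `adc`."""
    return ((a & MASK32) + (b & MASK32) + carry) & MASK32


def rsb(rn: int, operand: int) -> int:
    """Reverse subtract: operand - rn, like `rsb rd, rn, operand`."""
    return (operand - rn) & MASK32


def add64(a_lo: int, a_hi: int, b_lo: int, b_hi: int):
    """64 bit add split over two registers: adds on the low half, adc on the high half."""
    lo, carry = adds(a_lo, b_lo)
    hi = adc(a_hi, b_hi, carry)
    return lo, hi


if __name__ == "__main__":
    # 0x00000000_FFFFFFFF + 1 carries into the high word.
    print([hex(x) for x in add64(0xFFFFFFFF, 0, 1, 0)])
```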
     

Compare

Comparisons produce no results – they just set condition codes. Ordinary instructions will also set condition codes if the “S” bit is set. The “S” bit is implied for comparison instructions.

cmp r0, #42 : compare R0 to 42.
cmn r2, #42 : compare R2 to -42.
tst r11, #1 : test bit zero.
teq r8, r9  : test R8 equals R9.
subs r1, r0, #42 : compare R0 to 42, with result.
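To make the condition codes concrete, here is a small Python model of the flags a `cmp` (or `subs`) produces. The function name `sub_flags` is an assumption of this sketch; the flag rules are the usual ones for a 32 bit subtraction, where the carry flag being set means that no borrow occurred.

```python
MASK32 = 0xFFFFFFFF


def sub_flags(a: int, b: int):
    """Return (result, flags) for a - b as computed by cmp / subs."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    flags = {
        "N": result >> 31,                        # result is negative
        "Z": int(result == 0),                    # result is zero
        "C": int(a >= b),                         # no unsigned borrow
        "V": ((a ^ b) & (a ^ result)) >> 31,      # signed overflow
    }
    return result, flags


if __name__ == "__main__":
    # cmp r0, #42 with r0 = 42 sets Z, so a following beq is taken.
    print(sub_flags(42, 42))
    # cmp r0, #42 with r0 = 41 clears C and sets N.
    print(sub_flags(41, 42))
```

`cmp` discards the result and keeps only the flags, while `subs` keeps both.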

Branches

:branch
b #0x137       : branch to current address + 0x137
bx r1           : branch to address in r1

:branch and link
bl #0x137      : branch to current address + 0x137 , return address is saved in lr
blx r1          : branch to address in r1 , return address is saved in lr
  1. Branches with ARM / Thumb States

    For bx and blx, the state is selected by the least significant bit of the target address: if it is set to 1, the CPU switches to Thumb state; if it has not been set, the CPU switches to ARM state.

    To jump to Thumb code at 0x40000

    : r1 contains the address ( 0x40000 )
    add r1,r1, #1        :  The least significant bit is set to 1
    bx r1                :  CPU will change to Thumb state
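The state switch can be sketched in Python as follows. The function name `bx` and the returned tuple are inventions of this illustration; the model assumes a word-aligned address when bit 0 is clear.

```python
def bx(target: int):
    """Return (new_pc, state) after `bx` to the given register value."""
    if target & 1:
        # Bit 0 selects Thumb state and is not part of the address itself.
        return (target & ~1) & 0xFFFFFFFF, "thumb"
    return target & 0xFFFFFFFF, "arm"


if __name__ == "__main__":
    r1 = 0x40000
    r1 += 1                 # add r1, r1, #1
    pc, state = bx(r1)      # bx r1
    print(hex(pc), state)
```

Execution continues at 0x40000, not 0x40001: the low bit only carries the state information.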
    

Conditional Execution

subs r0, r0, #1       : s means that the flag register should be updated
subne r0, r0, #2      : sub not equal , subtract if zero flag is clear
addeq r1, r1, #2      : add equal , add if zero flag is set
Opcode [31:28]  Suffix  Description                      Flags
0000            EQ      Equal                            Z==1
0001            NE      Not equal                        Z==0
0010            CS/HS   Carry set / unsigned higher or same  C==1
0011            CC/LO   Carry clear / unsigned lower     C==0
0100            MI      Minus / negative                 N==1
0101            PL      Plus / positive or zero          N==0
0110            VS      Overflow                         V==1
0111            VC      No overflow                      V==0
1000            HI      Unsigned higher                  ( C==1 && Z==0 )
1001            LS      Unsigned lower or same           ( C==0 or Z==1 )
1010            GE      Signed greater than or equal     N==V
1011            LT      Signed less than                 ( N!=V )
1100            GT      Signed greater than              ( Z==0 && N==V )
1101            LE      Signed less than or equal        ( Z==1 or N!=V )
1110            AL      Always (default)                 any
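The condition suffixes can be expressed as predicates over the four flags. This Python sketch uses a hypothetical `condition_passed` helper and a plain dictionary for the flags; it follows the standard ARM condition definitions.

```python
CONDITIONS = {
    "eq": lambda f: f["Z"] == 1,
    "ne": lambda f: f["Z"] == 0,
    "cs": lambda f: f["C"] == 1,
    "cc": lambda f: f["C"] == 0,
    "mi": lambda f: f["N"] == 1,
    "pl": lambda f: f["N"] == 0,
    "vs": lambda f: f["V"] == 1,
    "vc": lambda f: f["V"] == 0,
    "hi": lambda f: f["C"] == 1 and f["Z"] == 0,
    "ls": lambda f: f["C"] == 0 or f["Z"] == 1,
    "ge": lambda f: f["N"] == f["V"],
    "lt": lambda f: f["N"] != f["V"],
    "gt": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "le": lambda f: f["Z"] == 1 or f["N"] != f["V"],
    "al": lambda f: True,
}
# hs and lo are alternative spellings of cs and cc.
CONDITIONS["hs"] = CONDITIONS["cs"]
CONDITIONS["lo"] = CONDITIONS["cc"]


def condition_passed(suffix: str, flags: dict) -> bool:
    """Decide whether an instruction with this condition suffix executes."""
    return CONDITIONS[suffix.lower()](flags)


if __name__ == "__main__":
    # Flags left behind by `cmp r0, r1` with r0 = 1 and r1 = 2.
    flags = {"N": 1, "Z": 0, "C": 0, "V": 0}
    for suffix in ("eq", "ne", "lt", "ge", "lo", "hi"):
        print(suffix, condition_passed(suffix, flags))
```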

Calling Convention

Calling Function

Calling System Call

Syscall Reference : syscall

Stack Frame

 +-----------------+
 |  Return Addr    | <- r11 ( fp ) 
 +-----------------+
 | Saved Frame ptr |
 +-----------------+
 |      ...        |
 |                 |
 |    Local Var    | 
 |                 |
 |      ...        | <- sp
 +-----------------+ 

Function Prologue

push {fp , lr}
add fp, sp, #4
sub sp, sp, #0x20

Function Epilogue

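sub sp, fp, #4       : point sp at the saved fp
pop {fp, pc}         : restore fp and return by loading the saved lr into pc

sub sp, fp, #4       : alternative : restore lr and return with bx
pop {fp, lr}
bx lr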
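The prologue and the first epilogue variant can be traced with a small Python model. The register dictionary, the `mem` dictionary keyed by address, and the starting values are all assumptions made for this sketch.

```python
def prologue(regs: dict, mem: dict, locals_size: int = 0x20) -> None:
    # push {fp, lr}: sp drops by 8, fp (r11) lands below lr (r14)
    regs["sp"] -= 8
    mem[regs["sp"]] = regs["fp"]
    mem[regs["sp"] + 4] = regs["lr"]
    # add fp, sp, #4: fp now points at the saved return address
    regs["fp"] = regs["sp"] + 4
    # sub sp, sp, #0x20: reserve space for local variables
    regs["sp"] -= locals_size


def epilogue(regs: dict, mem: dict) -> None:
    # sub sp, fp, #4: drop the locals, sp points at the saved fp
    regs["sp"] = regs["fp"] - 4
    # pop {fp, pc}: restore the caller's fp and jump to the saved lr
    regs["fp"] = mem[regs["sp"]]
    regs["pc"] = mem[regs["sp"] + 4]
    regs["sp"] += 8


if __name__ == "__main__":
    regs = {"sp": 0x1000, "fp": 0x2000, "lr": 0x8004, "pc": 0}
    mem = {}
    prologue(regs, mem)
    print({name: hex(value) for name, value in regs.items()})
    epilogue(regs, mem)
    print({name: hex(value) for name, value in regs.items()})
```

After the epilogue, sp and fp hold the caller's values again and pc holds the return address that was in lr on entry.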

Reference

Setting Up the Lab

sudo apt-get install qemu qemu-user qemu-user-static

The default GDB does not know anything about other architectures, but gdb-multiarch adds support for them.

sudo apt-get install gdb-multiarch
$ sudo apt-get install gcc-arm-linux-gnueabihf libc6-dev-armhf-cross  binfmtc binfmt-support
$ sudo mkdir /etc/qemu-binfmt
$ sudo ln -s /usr/arm-linux-gnueabihf /etc/qemu-binfmt/arm 

Now you can compile ARM binary in your system with

arm-linux-gnueabihf-gcc -ohello hello.c

Now onto debugging ARM binaries , with QEMU and GDB .

qemu-arm -g 1337  hello

Now we can connect gdb to port 1337 and debug the program hello

$   gdb-multiarch -q hello
Reading symbols from hello...(no debugging symbols found)...done.
(gdb) set architecture arm
The target architecture is assumed to be arm
(gdb) target remote localhost:1337
Remote debugging using localhost:1337
(gdb) 

GEF is an extension for gdb which plays really well with non-x86 debugging : link

The creator of the same project has also built many qemu images for different architectures to play around with. They include an ARM image based on the Raspberry Pi. With it there is no need for remote debugging: since it emulates the whole operating system, you can run the binary directly and debug it inside the qemu session. link to his blog