There are many different assembly languages, depending on the CPU used.

Intel and AMD commonly use the x86 architecture.

32-bit CPUs commonly use x86 assembly.

64-bit CPUs commonly use x86_64 assembly.

RISC and ARM assembly are other common ones.

I’ll just focus on x86_64.

There are also many compilers, such as NASM and GAS.

I’ll use the GNU assembler (GAS, executable named ‘as’).

GCC actually automatically invokes GAS when compiling c code.


BARE PROGRAM

[~/Desktop] 
$ cat x.s
.global _start

_start:

[~/Desktop] 
$ as x.s -o x.o && gcc -o x -nostdlib -static x.o

[~/Desktop] 
$ ./x
Segmentation fault (core dumped)


It segfaults because there’s no exit syscall, so let’s make that.

.global _start
.intel_syntax noprefix

_start:
	mov 	rax, 60 # systemcall for exit (linux)
	syscall

Great, now let’s check out choosing the exit code (note, they overflow if bigger than 256).

We can check the exit code of the previous program in bash with echo $?

Typically you should use exit code 0 to indicate the program ran successfuly.

But for this example I’ll try 7.


[~/Desktop] 
$ cat x.s
.global _start
.intel_syntax noprefix

_start:
        mov     rax, 60 # system call for exit
        mov     rdi, 7  # exit code 7
        syscall

[~/Desktop] 
$ as x.s -o x.o && gcc -o x -nostdlib -static x.o

[~/Desktop] 
$ ./x

[~/Desktop] 
$ echo $?
7


Addition

.global _start
.intel_syntax noprefix

_start:
	mov 	rax, 4
	add 	rax, 2
	# rax should now have 4+2=6

	mov 	rax, 60 
	mov 	rdi, 0 	
	syscall


You can observe the rax register with gdb

[~/Desktop] 
$ gdb
...

pwndbg> file x
Reading symbols from x...
(No debugging symbols found in x)

pwndbg> info functions
All defined functions:

Non-debugging symbols:
0x0000000000401000  _start
0x0000000000402000  __bss_start
0x0000000000402000  _edata
0x0000000000402000  _end

pwndbg> break _start
Breakpoint 1 at 0x401000

pwndbg> r
Starting program: /home/connor/Desktop/x 

Breakpoint 1, 0x0000000000401000 in _start ()
LEGEND: STACK | HEAP | CODE | DATA | WX | RODATA
[ REGISTERS / show-flags off / show-compact-regs off ]
 RAX  0
 RBX  0
 RCX  0
 RDX  0
 RDI  0
 RSI  0
 R8   0
 R9   0
 R10  0
 R11  0
 R12  0
 R13  0
 R14  0
 R15  0
 RBP  0
 RSP  0x7fffffffe4b0 ◂— 1
 RIP  0x401000 (_start) ◂— mov rax, 4
─────────[ DISASM / x86-64 / set emulate on ]─────────
 ► 0x401000 <_start>       mov    rax, 4                 RAX => 4
   0x401007 <_start+7>     add    rax, 2                 RAX => 6 (4 + 2)
   0x40100b <_start+11>    mov    rax, 0x3c              RAX => 0x3c
   0x401012 <_start+18>    mov    rdi, 0                 RDI => 0
   0x401019 <_start+25>    syscall  <SYS_exit>
   0x40101b                add    byte ptr [rax], al
   0x40101d                add    byte ptr [rax], al
   0x40101f                add    byte ptr [rax], al
   0x401021                add    byte ptr [rax], al
   0x401023                add    byte ptr [rax], al
   0x401025                add    byte ptr [rax], al
...
pwndbg> n
0x0000000000401007 in _start ()
LEGEND: STACK | HEAP | CODE | DATA | WX | RODATA
───────────────────────────────────────[ REGISTERS / show-flags off / show-compact-regs off ]────────────────────────────────────────
*RAX  4
 RBX  0
 RCX  0
 RDX  0
 RDI  0
 RSI  0
 R8   0
 R9   0
 R10  0
 R11  0
 R12  0
 R13  0
 R14  0
 R15  0
 RBP  0
 RSP  0x7fffffffe4b0 ◂— 1
*RIP  0x401007 (_start+7) ◂— add rax, 2
────────────────────────────────────────────────[ DISASM / x86-64 / set emulate on ]─────────────────────────────────────────────────
   0x401000 <_start>       mov    rax, 4                 RAX => 4
 ► 0x401007 <_start+7>     add    rax, 2                 RAX => 6 (4 + 2)
   0x40100b <_start+11>    mov    rax, 0x3c              RAX => 0x3c
   0x401012 <_start+18>    mov    rdi, 0                 RDI => 0
   0x401019 <_start+25>    syscall  <SYS_exit>
   0x40101b                add    byte ptr [rax], al
   0x40101d                add    byte ptr [rax], al
   0x40101f                add    byte ptr [rax], al
   0x401021                add    byte ptr [rax], al
   0x401023                add    byte ptr [rax], al
   0x401025                add    byte ptr [rax], al
...
pwndbg> n
0x000000000040100b in _start ()
LEGEND: STACK | HEAP | CODE | DATA | WX | RODATA
───────────────────────────────────────[ REGISTERS / show-flags off / show-compact-regs off ]────────────────────────────────────────
*RAX  6
 RBX  0
 RCX  0
 RDX  0
 RDI  0
 RSI  0
 R8   0
 R9   0
 R10  0
 R11  0
 R12  0
 R13  0
 R14  0
 R15  0
 RBP  0
 RSP  0x7fffffffe4b0 ◂— 1
*RIP  0x40100b (_start+11) ◂— mov rax, 0x3c
────────────────────────────────────────────────[ DISASM / x86-64 / set emulate on ]─────────────────────────────────────────────────
   0x401000 <_start>       mov    rax, 4                 RAX => 4
   0x401007 <_start+7>     add    rax, 2                 RAX => 6 (4 + 2)
 ► 0x40100b <_start+11>    mov    rax, 0x3c              RAX => 0x3c
   0x401012 <_start+18>    mov    rdi, 0                 RDI => 0
   0x401019 <_start+25>    syscall  <SYS_exit>
   0x40101b                add    byte ptr [rax], al
   0x40101d                add    byte ptr [rax], al
   0x40101f                add    byte ptr [rax], al
   0x401021                add    byte ptr [rax], al
   0x401023                add    byte ptr [rax], al
   0x401025                add    byte ptr [rax], al
...
pwndbg> 


assembly in bootloader that draws to VGA video mode

I thought this was pretty cool

https://www.youtube.com/watch?v=M-RVBiAmXj0

ORG 0x7C00

start:
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov sp, 0x7C00

    mov al, 0x13
    int 0x10

    mov edi, 0xA0000
    mov byte [edi], COLOR_RED 

    jmp $

begin_draw dd 0
sq_width dd 40
sq_height dd 70
DRAW_START equ 0xA0000
COLOR_RED equ 0x04

times 510 - ($-$$) db 0
dw 0xAA55

nasm x.asm && q x

ORG 0x7C00

start:
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov sp, 0x7C00

    mov al, 0x13
    int 0x10

    call draw_square

    jmp $

draw_square:
    mov edi, DRAW_START
    mov eax, [sq_y]
    mov ebx, 320
    mul ebx
    add eax, edi
    mov edi, eax
    add edi, [sq_x]

    jmp .put_pixel

.setup:

.move_down:
    add edi, 320
    sub edi, [sq_width]
    xor ecx, ecx

.put_pixel:
    mov byte [edi], COLOR_RED
    inc edi
    inc ecx
    cmp ecx, [sq_width]
    jl .put_pixel

    inc edx
    cmp edx, [sq_height]
    jl .move_down


sq_x dd 10
sq_y dd 10
begin_draw dd 0
sq_width dd 320 - 10
sq_height dd 200 - 10
DRAW_START equ 0xA0000 ;+ 64000
COLOR_RED equ 0x04

times 510 - ($-$$) db 0
dw 0xAA55


https://mendelson.org/wpdos/videomodes.txt


TempleOS uses 640x480 resolution, 16-color (mode 0x12 / 12H, the highest resolution supported by standard VGA)




Actually, I’ll start using NASM too, because that’s what’s used in this book https://www.amazon.com/Low-Level-Programming-Assembly-Execution-Architecture/dp/1484224027


image


image


https://math.hws.edu/eck/cs220/f22/registers.html

Hello World

global _start

;section .data
message: db "hello, world!", 10 ; the 10 is ord('\n')
;len: equ $-message


;section .text
_start:
    mov rax, 1 ; syscall for write (on linux)
    mov rdi, 1 ; stdout
    mov rsi, message ; string address
    mov rdx, 14 ; string len
    syscall

    ; exit
    mov rax, 60 
    mov rdi, 0  
    syscall
$ nasm -f elf64 x.asm && ld x.o && ./a.out
hello, world!



base 10 to hex

codes =  "0123456789ABCDEF"

def convet_to_hex(n):
    rcx = 64 # total bits
    while True:
        rax = n
        rcx -= 4
        rax >>= rcx

        print(codes[rax % 16])

        if rcx == 0:
            break

convet_to_hex(0x1122334455667788)


global _start

codes:
    db "1123456789ABCDEF"


section .text
_start:
    mov rax, 0x1122334455667788

    mov rdi, 1 ; write to stdout
    mov rdx, 1 ; string lens, always 1 at a time
    mov rcx, 64 ;  0x1122334455667788 has 64 bits

.loop:
    push rax ; save rax before our loop

    sub rcx, 4    ; rcx -= 4
    sar rax, cl   ;  Shift Arithmetic Right,     rcx errors so u have to use cl (lowest 8 bits)
    and rax, 0xf  ; 15 (will perform mod 16)

    lea rsi, [codes + rax] ; string address
    mov rax, 1 ; write syscall

    push rcx
    syscall ; the write syscall will affect rcx so you have to save and restore with push and pop
    pop rcx

    pop rax ; restore rax after our loop

    test rcx, rcx ; TEST sets the zero flag, ZF, if they're equal (bitwise &) 
    jnz .loop   ; jump if not zero (checks the zero flag)  (so break loop when rcx is 0)




    mov rax, 60; exit
    xor rdi, rdi
    syscall


Looking at the stack

The stack might look something like

...
0x7fffffffe878 ... other stuff before your program executes
0x7fffffffe870 ... other stuff before your program executes
0x7fffffffe868 ... other stuff before your program executes
--- program starts
0x7fffffffe860 ... junk
0x7fffffffe858 ... junk
0x7fffffffe850 ... junk
0x7fffffffe848 ... junk
...
0x000000000000 ... junk

RBP (base pointer) is typically 0 initially

in this example, RSP is 0x7fffffffe860 initially

now suppose we push some stuff on the stack:

global _start

_start:
    push 90
    push 100
    push 110


    ;exit
    mov rdi, 0
    mov rax,60
    syscall

Everytime we push something, RSP will decrease by 8 bytes, and the value will be stored there.

The maximum register values is 2**64, which is 8 bytes, which is why it’s in 8-byte sections (although the stack should usually be 16-byte alligned for various other reasons…)

If ur getting stack allignment errors when calling function, you can subtract 8 from the rsp and then add it back after.

https://stackoverflow.com/questions/64729055/what-does-aligning-the-stack-mean-in-assembly

https://stackoverflow.com/questions/51070716/glibc-scanf-segmentation-faults-when-called-from-a-function-that-doesnt-align-r

$ gdb ./a.out 
pwndbg> break _start
Breakpoint 1 at 0x401000
pwndbg> r
Starting program: /home/connor/t/a.out 

Breakpoint 1, 0x0000000000401000 in _start ()
...
 RSP  0x7fffffffe860 ◂— 1
 RIP  0x401000 (_start) ◂— push 0x5a
─────────────────────────────────────────────────[ DISASM / x86-64 / set emulate on ]─────────────────────────────────────────────────
 ► 0x401000 <_start>       push   0x5a
   0x401002 <_start+2>     push   0x64
   0x401004 <_start+4>     push   0x6e
   0x401006 <_start+6>     mov    edi, 0                 EDI => 0
   0x40100b <_start+11>    mov    eax, 0x3c              EAX => 0x3c
   0x401010 <_start+16>    syscall  <SYS_exit>
   0x401012                add    byte ptr [rax], al
   0x401014                add    byte ptr [rax], al
   0x401016                add    byte ptr [rax], al
   0x401018                add    byte ptr [rax], al
   0x40101a                add    byte ptr [rax], al
──────────────────────────────────────────────────────────────[ STACK ]───────────────────────────────────────────────────────────────
00:0000│ rsp 0x7fffffffe860 ◂— 1
01:0008│     0x7fffffffe868 —▸ 0x7fffffffeb5b ◂— '/home/connor/t/a.out'
02:0010│     0x7fffffffe870 ◂— 0
03:0018│     0x7fffffffe878 —▸ 0x7fffffffeb70 ◂— 'SHELL=/bin/bash'

pwndbg> n
...

*RSP  0x7fffffffe858 ◂— 0x5a /* 'Z' */
*RIP  0x401002 (_start+2) ◂— push 0x64
─────────────────────────────────────────────────[ DISASM / x86-64 / set emulate on ]─────────────────────────────────────────────────
   0x401000 <_start>       push   0x5a
 ► 0x401002 <_start+2>     push   0x64
   0x401004 <_start+4>     push   0x6e
   0x401006 <_start+6>     mov    edi, 0                 EDI => 0
   0x40100b <_start+11>    mov    eax, 0x3c              EAX => 0x3c
   0x401010 <_start+16>    syscall  <SYS_exit>
   0x401012                add    byte ptr [rax], al
   0x401014                add    byte ptr [rax], al
   0x401016                add    byte ptr [rax], al
   0x401018                add    byte ptr [rax], al
   0x40101a                add    byte ptr [rax], al
──────────────────────────────────────────────────────────────[ STACK ]───────────────────────────────────────────────────────────────
00:0000│ rsp 0x7fffffffe858 ◂— 0x5a /* 'Z' */
01:0008│     0x7fffffffe860 ◂— 1
02:0010│     0x7fffffffe868 —▸ 0x7fffffffeb5b ◂— '/home/connor/t/a.out'
03:0018│     0x7fffffffe870 ◂— 0
04:0020│     0x7fffffffe878 —▸ 0x7fffffffeb70 ◂— 'SHELL=/bin/bash'

pwndbg> n
...

*RSP  0x7fffffffe850 ◂— 0x64 /* 'd' */
*RIP  0x401004 (_start+4) ◂— push 0x6e
─────────────────────────────────────────────────[ DISASM / x86-64 / set emulate on ]─────────────────────────────────────────────────
   0x401000 <_start>       push   0x5a
   0x401002 <_start+2>     push   0x64
 ► 0x401004 <_start+4>     push   0x6e
   0x401006 <_start+6>     mov    edi, 0                 EDI => 0
   0x40100b <_start+11>    mov    eax, 0x3c              EAX => 0x3c
   0x401010 <_start+16>    syscall  <SYS_exit>
   0x401012                add    byte ptr [rax], al
   0x401014                add    byte ptr [rax], al
   0x401016                add    byte ptr [rax], al
   0x401018                add    byte ptr [rax], al
   0x40101a                add    byte ptr [rax], al
──────────────────────────────────────────────────────────────[ STACK ]───────────────────────────────────────────────────────────────
00:0000│ rsp 0x7fffffffe850 ◂— 0x64 /* 'd' */
01:0008│     0x7fffffffe858 ◂— 0x5a /* 'Z' */
02:0010│     0x7fffffffe860 ◂— 1
03:0018│     0x7fffffffe868 —▸ 0x7fffffffeb5b ◂— '/home/connor/t/a.out'
04:0020│     0x7fffffffe870 ◂— 0
05:0028│     0x7fffffffe878 —▸ 0x7fffffffeb70 ◂— 'SHELL=/bin/bash'

pwndbg> n
...

*RSP  0x7fffffffe848 ◂— 0x6e /* 'n' */
*RIP  0x401006 (_start+6) ◂— mov edi, 0
─────────────────────────────────────────────────[ DISASM / x86-64 / set emulate on ]─────────────────────────────────────────────────
   0x401000 <_start>       push   0x5a
   0x401002 <_start+2>     push   0x64
   0x401004 <_start+4>     push   0x6e
 ► 0x401006 <_start+6>     mov    edi, 0                 EDI => 0
   0x40100b <_start+11>    mov    eax, 0x3c              EAX => 0x3c
   0x401010 <_start+16>    syscall  <SYS_exit>
   0x401012                add    byte ptr [rax], al
   0x401014                add    byte ptr [rax], al
   0x401016                add    byte ptr [rax], al
   0x401018                add    byte ptr [rax], al
   0x40101a                add    byte ptr [rax], al
──────────────────────────────────────────────────────────────[ STACK ]───────────────────────────────────────────────────────────────
00:0000│ rsp 0x7fffffffe848 ◂— 0x6e /* 'n' */
01:0008│     0x7fffffffe850 ◂— 0x64 /* 'd' */
02:0010│     0x7fffffffe858 ◂— 0x5a /* 'Z' */
03:0018│     0x7fffffffe860 ◂— 1
04:0020│     0x7fffffffe868 —▸ 0x7fffffffeb5b ◂— '/home/connor/t/a.out'
05:0028│     0x7fffffffe870 ◂— 0
--- program starts
0x7fffffffe860 ... 1
0x7fffffffe858 ... junk
0x7fffffffe850 ... junk
0x7fffffffe848 ... junk
--- program starts
0x7fffffffe860 ... 1
0x7fffffffe858 ... 90
0x7fffffffe850 ... junk
0x7fffffffe848 ... junk
--- program starts
0x7fffffffe860 ... 1
0x7fffffffe858 ... 90
0x7fffffffe850 ... 100
0x7fffffffe848 ... junk
--- program starts
0x7fffffffe860 ... 1
0x7fffffffe858 ... 90
0x7fffffffe850 ... 100
0x7fffffffe848 ... 110



what is db doing?

message: db "hey", 10

db defines a variable of type bytes.

Alternatively, you can do this:

>>> int.from_bytes(b'hey\n', 'little')
175727976
global _start

_start:
    push 175727976
    mov rsi, rsp ; string pointer

    mov rax, 1 ; syscall for write (on linux)
    mov rdi, 1 ; stdout
    mov rdx, 4 ; string len
    syscall

    ; exit
    mov rax, 60 
    mov rdi, 0  
    syscall



what is lea doing?

I think of it as syntactic sugar for indexing bytes.

instead of

_start:
    push 175727976

    mov rsi, rsp
    add rsi, 1

    mov rax, 1 ; syscall for write (on linux)
    mov rdi, 1 ; stdout
    mov rdx, 3 ; string len
    syscall

you can just do

_start:
    push 175727976

    lea rsi, [rsp+1] 

    mov rax, 1 ; syscall for write (on linux)
    mov rdi, 1 ; stdout
    mov rdx, 3 ; string len
    syscall



Allocating memory (buffer/array)


option 1: use some builtin


option 2: decrease rsp (push does this automatically)


option 3: move the base pointer (kinda better practice than the stack pointer)

https://stackoverflow.com/questions/13468991/how-do-i-use-a-buffer-in-an-assembly-procedure


Calling convention

https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI

image


https://web.stanford.edu/class/archive/cs/cs107/cs107.1174/guide_x86-64.html

image

r12, r13, r14, r15 are good for normal variables that you don’t want to get overwritten










fuction stack frames

[~/t]
$ cat x.c
int main() {}

[~/t]
$ gcc x.c

[~/t]
$ gdb a.out
pwndbg> disas main
Dump of assembler code for function main:
   0x0000000000001119 <+0>:	    push   rbp
   0x000000000000111a <+1>:	    mov    rbp,rsp
   0x000000000000111d <+4>:	    mov    eax,0x0
   0x0000000000001122 <+9>:	    pop    rbp
   0x0000000000001123 <+10>:	ret
End of assembler dump.

Here’s a totally empty program, the mov eax,0x0 is main’s return code, and everything else is creating and destroying the main function’s stack frame.




Now let’s look at something with some more functions.

int mult(int a, int b) {
    int ret = a * b;
    return ret;
}

int triple(int x) {
    int ret = mult(x, 3);
    return ret;
}

int main() {
    triple(5);
}


pwndbg> disas main
Dump of assembler code for function main:
   0x0000000000001154 <+0>:	    push   rbp
   0x0000000000001155 <+1>:	    mov    rbp,rsp
   0x0000000000001158 <+4>:	    mov    edi,0x5
   0x000000000000115d <+9>:	    call   0x1132 <triple>
   0x0000000000001162 <+14>:	mov    eax,0x0
   0x0000000000001167 <+19>:	pop    rbp
   0x0000000000001168 <+20>:	ret
End of assembler dump.
pwndbg> disas triple
Dump of assembler code for function triple:
   0x0000000000001132 <+0>:	    push   rbp
   0x0000000000001133 <+1>:	    mov    rbp,rsp
   0x0000000000001136 <+4>:	    sub    rsp,0x18
   0x000000000000113a <+8>:	    mov    DWORD PTR [rbp-0x14],edi
   0x000000000000113d <+11>:	mov    eax,DWORD PTR [rbp-0x14]
   0x0000000000001140 <+14>:	mov    esi,0x3
   0x0000000000001145 <+19>:	mov    edi,eax
   0x0000000000001147 <+21>:	call   0x1119 <mult>
   0x000000000000114c <+26>:	mov    DWORD PTR [rbp-0x4],eax
   0x000000000000114f <+29>:	mov    eax,DWORD PTR [rbp-0x4]
   0x0000000000001152 <+32>:	leave
   0x0000000000001153 <+33>:	ret
End of assembler dump.
pwndbg> disas mult
Dump of assembler code for function mult:
   0x0000000000001119 <+0>:	    push   rbp
   0x000000000000111a <+1>:	    mov    rbp,rsp
   0x000000000000111d <+4>:	    mov    DWORD PTR [rbp-0x14],edi
   0x0000000000001120 <+7>:	    mov    DWORD PTR [rbp-0x18],esi
   0x0000000000001123 <+10>:	mov    eax,DWORD PTR [rbp-0x14]
   0x0000000000001126 <+13>:	imul   eax,DWORD PTR [rbp-0x18]
   0x000000000000112a <+17>:	mov    DWORD PTR [rbp-0x4],eax
   0x000000000000112d <+20>:	mov    eax,DWORD PTR [rbp-0x4]
   0x0000000000001130 <+23>:	pop    rbp
   0x0000000000001131 <+24>:	ret
End of assembler dump.

Main:

push   rbp                # save base pointer from whatever initially calls main
mov    rbp,rsp            # main's stack frame
mov    edi,0x5            # argument to triple
call   0x1132 <triple>    # call triple
mov    eax,0x0            # main's return code
pop    rbp                # restore base pointer to return to whatever initially called main
ret                       # return

If I break right at *main, I see:

RBP = 0x7fffffffe6d0 —▸ 0x7fffffffe730 ◂— 0
RSP = 0x7fffffffe638 —▸ 0x7ffff7c27675 ◂— mov edi, eax   (instruction right after main returns?)

Then after executing

push   rbp                
mov    rbp,rsp
# note here there's usually some 'sub rsp, ...' to actually allocate the stack size, but well in this code main doesn't need to store anything on the stack there aren't any variables etc so it's just empty like a sub rsp, 0

we have set up main’s stack frame.

RBP = 0x7fffffffe630 
RSP = 0x7fffffffe630 

stack:

0x7fffffffe630:	0x00007fffffffe6d0 (previous base pointer to whatever called main)             <- rbp, rsp
...



Let’s keep going, mov edi,0x5 does nothing, but call triple is going to implicitly do some stuff.

call will always ‘essentially’ do this:

sub     rsp, 8
mov     [rsp], return_address
jmp     target
RBP = 0x7fffffffe630                                               (base pointer of main, which called triple)
RSP = 0x7fffffffe628 —▸ 0x555555555162 (main+14) ◂— mov eax, 0     (the instruction in main immediately after call triple)

Stack:

0x7fffffffe630:	0x00007fffffffe6d0 (previous base pointer to whatever called main)             <- rbp
0x7fffffffe628:	0x0000555555555162 (the instruction in main immediately after call triple)     <- rsp
...

Next is this to finish setting up triple’s stack frame

push   rbp
mov    rbp, rsp
sub    rsp, 24      (24 bytes allocated for triple's stack frame)

Stack:

0x7fffffffe630:	0x00007fffffffe6d0 (previous base pointer to whatever called main)           
0x7fffffffe628:	0x0000555555555162 (the instruction in main immediately after call triple)     
0x7fffffffe620:	0x00007fffffffe630 (base pointer of main, saved when calling triple)          <- rbp
0x7fffffffe618:	0x0000000000000000
0x7fffffffe610:	0x0000000000000000
0x7fffffffe608:	0x0000000000000000                                                            <- rsp
...



A full look at the rest of triple:

# x = rdi
mov    DWORD PTR [rbp-0x14],edi   
mov    eax,DWORD PTR [rbp-0x14]

mov    esi,0x3                    # rsi = 3
mov    edi,eax                    # rdi = x
call   0x555555555119 <mult>      # calling convention is mul(rdi, rsi)

# ret = rax
mov    DWORD PTR [rbp-0x4],eax
mov    eax,DWORD PTR [rbp-0x4]  

leave                             # cleans up triple's stack frame before returning, equivalent to mov rsp, rbp (restore stack pointer) and then pop rbp (restore base pointer)
ret

Now let’s continue:

mov    DWORD PTR [rbp-0x14],edi   
mov    eax,DWORD PTR [rbp-0x14]
mov    esi,0x3              
mov    edi,eax             

rsp and rbp haven’t changed, stack now looks like this (just the 5 is new):

0x7fffffffe630:	0x00007fffffffe6d0 (previous base pointer to whatever called main)           
0x7fffffffe628:	0x0000555555555162 (the instruction in main immediately after call triple)     
0x7fffffffe620:	0x00007fffffffe630 (base pointer of main, saved when calling triple)          <- rbp
0x7fffffffe618:	0x0000000000000000
0x7fffffffe610:	0x0000000000000000
0x7fffffffe608:	0x0000000500000000                                                            <- rsp



Next we call ret mult.

Remember call will ‘essentially’ do this:

sub     rsp, 8
mov     [rsp], return_address
jmp     target
 RBP  0x7fffffffe620 —▸ 0x7fffffffe630 —▸ 0x7fffffffe6d0 —▸ 0x7fffffffe730 ◂— 0
*RSP  0x7fffffffe600 —▸ 0x55555555514c (triple+26) ◂— mov dword ptr [rbp - 4], eax

Stack:

0x7fffffffe630:	0x00007fffffffe6d0 (previous base pointer to whatever called main)           
0x7fffffffe628:	0x0000555555555162 (the instruction in main immediately after call triple)     
0x7fffffffe620:	0x00007fffffffe630 (base pointer of main, saved when calling triple)           <- rbp
0x7fffffffe618:	0x0000000000000000
0x7fffffffe610:	0x0000000000000000
0x7fffffffe608:	0x0000000500000000                                                            
0x7fffffffe600:	0x000055555555514c (the instruction in triple immediately after calling mult)  <- rsp
...

Finish setting up mult’s stack frame (although like main there is no sub rsp, …, but for a different reason you’ll see below :) ):

push   rbp
mov    rbp,rsp

Stack:

0x7fffffffe630:	0x00007fffffffe6d0 (previous base pointer to whatever called main)           
0x7fffffffe628:	0x0000555555555162 (the instruction in main immediately after call triple)     
0x7fffffffe620:	0x00007fffffffe630 (base pointer of main, saved when calling triple)          
0x7fffffffe618:	0x0000000000000000
0x7fffffffe610:	0x0000000000000000
0x7fffffffe608:	0x0000000500000000                                                            
0x7fffffffe600:	0x000055555555514c (the instruction in triple immediately after calling mult)  
0x7fffffffe5f8: 0x00007fffffffe620 (base pointer of triple, saved when calling mult)           <- rbp, rsp
...



You may note that mult does not subtract from rsp! why I wondered?????

After some research, it turns out there is a 128-byte ‘red zone’, which can be used safely as long as the function doesn’t call any other functions! A neat little optimisation that’s always done, even with -O0.

https://stackoverflow.com/questions/38042188/where-exactly-is-the-red-zone-on-x86-64

https://en.wikipedia.org/wiki/Red_zone_(computing)



A full look at the rest of mult:

# mul(rdi, rsi)  a is rdi, b is rsi
mov    DWORD PTR [rbp-0x14],edi  # a = rdi
mov    DWORD PTR [rbp-0x18],esi  # b = rsi
mov    eax,DWORD PTR [rbp-0x14]  # tmp = a 
imul   eax,DWORD PTR [rbp-0x18]  # tmp *= b

# ret = tmp
mov    DWORD PTR [rbp-0x4],eax 
mov    eax,DWORD PTR [rbp-0x4]

pop    rbp
ret
int mult(int a, int b) {
    int ret = a * b;
    return ret;
}

Let’s continue. None of these instructions should affect rsp or rbp. But there’s some local variables in the red zone of the stack.

mov    DWORD PTR [rbp-0x14],edi  
mov    DWORD PTR [rbp-0x18],esi 
mov    eax,DWORD PTR [rbp-0x14]  
imul   eax,DWORD PTR [rbp-0x18]  
mov    DWORD PTR [rbp-0x4],eax 
mov    eax,DWORD PTR [rbp-0x4]

Stack:

0x7fffffffe630:	0x00007fffffffe6d0 (previous base pointer to whatever called main)           
0x7fffffffe628:	0x0000555555555162 (the instruction in main immediately after call triple)     
0x7fffffffe620:	0x00007fffffffe630 (base pointer of main, saved when calling triple)          
0x7fffffffe618:	0x0000000000000000
0x7fffffffe610:	0x0000000000000000
0x7fffffffe608:	0x0000000500000000                                                            
0x7fffffffe600:	0x000055555555514c (the instruction in triple immediately after calling mult)  
0x7fffffffe5f8: 0x00007fffffffe620 (base pointer of triple, saved when calling mult)           <- rbp, rsp
--- red zone below ---
0x7fffffffe5f0:	0x0000000f00000000  (ret = 15)
0x7fffffffe5e8:	0x0000000000000000
0x7fffffffe5e0:	0x0000000500000003  (a=5 and b=3)
...


Next is pop rbp.

So this takes the value at the ‘top’ of the stack (where rsp is), 0x00007fffffffe620, and puts it into rbp.

Essentially setting it up for the next ret instruction, cause 0x00007fffffffe620 is the base pointer of triple which called mult.

Pop will also always add 8 to rsp.

Stack:

0x7fffffffe630:	0x00007fffffffe6d0 (previous base pointer to whatever called main)           
0x7fffffffe628:	0x0000555555555162 (the instruction in main immediately after call triple)     
0x7fffffffe620:	0x00007fffffffe630 (base pointer of main, saved when calling triple)           <- rbp
0x7fffffffe618:	0x0000000000000000
0x7fffffffe610:	0x0000000000000000
0x7fffffffe608:	0x0000000500000000                                                            
0x7fffffffe600:	0x000055555555514c (the instruction in triple immediately after calling mult)  <- rsp
0x7fffffffe5f8: old popped junk       
...



Next is ret

Implicitly it will kinda do pop rip, returning back to the next instruction in triple immediately after calling mult.

And again the pop implicitly adds 8 to rsp.

 RBP  0x7fffffffe620 —▸ 0x7fffffffe630 —▸ 0x7fffffffe6d0 —▸ 0x7fffffffe730 ◂— 0
*RSP  0x7fffffffe608 ◂— 0x500000000
*RIP  0x55555555514c (triple+26) ◂— mov dword ptr [rbp - 4], eax

Stack:

0x7fffffffe630:	0x00007fffffffe6d0 (previous base pointer to whatever called main)           
0x7fffffffe628:	0x0000555555555162 (the instruction in main immediately after call triple)     
0x7fffffffe620:	0x00007fffffffe630 (base pointer of main, saved when calling triple)           <- rbp
0x7fffffffe618:	0x0000000000000000
0x7fffffffe610:	0x0000000000000000
0x7fffffffe608:	0x0000000500000000                                                             <- rsp                                           
0x7fffffffe600:	old popped junk
0x7fffffffe5f8: old popped junk       
...



Now we have returned back to triple.

Next is this so step twice:

# ret = rax
mov    DWORD PTR [rbp-0x4],eax
mov    eax,DWORD PTR [rbp-0x4]

Stack (only 0x7fffffffe618 changed, which has rax returned by mult):

0x7fffffffe630:	0x00007fffffffe6d0 (previous base pointer to whatever called main)           
0x7fffffffe628:	0x0000555555555162 (the instruction in main immediately after call triple)     
0x7fffffffe620:	0x00007fffffffe630 (base pointer of main, saved when calling triple)           <- rbp
0x7fffffffe618:	0x0000000f00000000 (ret = 15)
0x7fffffffe610:	0x0000000000000000
0x7fffffffe608:	0x0000000500000000                                                             <- rsp                                           
0x7fffffffe600:	old popped junk
0x7fffffffe5f8: old popped junk       
...



And now we arrive at the leave instruction.

You can see it implicitly clears/pops all the 24 bytes allocated at the start, and then also does a pop rbp

Stack:

0x7fffffffe630:	0x00007fffffffe6d0 (previous base pointer to whatever called main)            <- rbp
0x7fffffffe628:	0x0000555555555162 (the instruction in main immediately after call triple)    <- rsp
0x7fffffffe620:	old popped junk
0x7fffffffe618:	old popped junk
0x7fffffffe610:	old popped junk
0x7fffffffe608:	old popped junk                                      
0x7fffffffe600:	old popped junk
0x7fffffffe5f8: old popped junk       
...



Nex it ret, which remember is kinda like pop rip

Stack:

0x7fffffffe630:	0x00007fffffffe6d0 (previous base pointer to whatever called main)            <- rbp, rsp
0x7fffffffe628:	old popped junk
0x7fffffffe620:	old popped junk
0x7fffffffe618:	old popped junk
0x7fffffffe610:	old popped junk
0x7fffffffe608:	old popped junk                                      
0x7fffffffe600:	old popped junk
0x7fffffffe5f8: old popped junk       
...


Now we are back in main.

And if we execute the rest of main then 0x7fffffffe630 gets popped too.

0x7fffffffe630:	old popped junk
0x7fffffffe628:	old popped junk
0x7fffffffe620:	old popped junk
0x7fffffffe618:	old popped junk
0x7fffffffe610:	old popped junk
0x7fffffffe608:	old popped junk                                      
0x7fffffffe600:	old popped junk
0x7fffffffe5f8: old popped junk       
...


small arrays

There will be some threshold, it could depends on lots of things I think, the compiler, optimisation level, machine architecture, etc.

For me it was 32 elements * 8 bytes per element = 256 bytes.

<= 32 elements will do individual mov’s

eg

int main() {
    long my_list[32] = {0xa0ea132a134ab585, 0x8989cef66ebfa45c, 0x0774a7e04c852a56, 0x73e0f28de99282b3, 0x88165aff92a45b5c, 0x61fcd6729de1bb3b, 0x72cf46c7f74ae700, 0x80b2f200841b42cd, 0xdf374c1064969103, 0x17ec045959b91f97, 0xd8991e9b771a7246, 0x006c7d8f8783ac93, 0x2105e92dfcc83f8b, 0x8b41ab5772b432a5, 0x5bf6da1b85402bcb, 0x7763fad3d0bfbe9d, 0x4317f367188618a3, 0x508c627dccea5a92, 0x5522325731b5ac35, 0xa10b3f70cba8c125, 0x399df0803ed3d5bc, 0xe91cc4906f648c3d, 0xfafef0acca375b7d, 0xf0d430c203eb9bc2, 0xfd223a24398d786b, 0x71a182c71b23339c, 0x1a0d12c7906671c1, 0x8e9d27004cc624df, 0x2e36326df69e1603, 0xaf655da066e45254, 0x6033b28ebfb41ee1, 0x90d8b4593ba9bb88};
}
$ gcc y.c -o y -fno-stack-protector; gdb y
pwndbg> disas main
Dump of assembler code for function main:
   0x0000000000001119 <+0>:	    push   rbp
   0x000000000000111a <+1>:	    mov    rbp,rsp
   0x000000000000111d <+4>:	    sub    rsp,0x88
   0x0000000000001124 <+11>:	movabs rax,0xa0ea132a134ab585
   0x000000000000112e <+21>:	mov    QWORD PTR [rbp-0x100],rax
   0x0000000000001135 <+28>:	movabs rax,0x8989cef66ebfa45c
   0x000000000000113f <+38>:	mov    QWORD PTR [rbp-0xf8],rax
   0x0000000000001146 <+45>:	movabs rax,0x774a7e04c852a56
   0x0000000000001150 <+55>:	mov    QWORD PTR [rbp-0xf0],rax
   0x0000000000001157 <+62>:	movabs rax,0x73e0f28de99282b3
   0x0000000000001161 <+72>:	mov    QWORD PTR [rbp-0xe8],rax
   0x0000000000001168 <+79>:	movabs rax,0x88165aff92a45b5c
   0x0000000000001172 <+89>:	mov    QWORD PTR [rbp-0xe0],rax
   0x0000000000001179 <+96>:	movabs rax,0x61fcd6729de1bb3b
   0x0000000000001183 <+106>:	mov    QWORD PTR [rbp-0xd8],rax
   0x000000000000118a <+113>:	movabs rax,0x72cf46c7f74ae700
   0x0000000000001194 <+123>:	mov    QWORD PTR [rbp-0xd0],rax
   0x000000000000119b <+130>:	movabs rax,0x80b2f200841b42cd
   0x00000000000011a5 <+140>:	mov    QWORD PTR [rbp-0xc8],rax
   0x00000000000011ac <+147>:	movabs rax,0xdf374c1064969103
   0x00000000000011b6 <+157>:	mov    QWORD PTR [rbp-0xc0],rax
   0x00000000000011bd <+164>:	movabs rax,0x17ec045959b91f97
   0x00000000000011c7 <+174>:	mov    QWORD PTR [rbp-0xb8],rax
   0x00000000000011ce <+181>:	movabs rax,0xd8991e9b771a7246
   0x00000000000011d8 <+191>:	mov    QWORD PTR [rbp-0xb0],rax
   0x00000000000011df <+198>:	movabs rax,0x6c7d8f8783ac93
   0x00000000000011e9 <+208>:	mov    QWORD PTR [rbp-0xa8],rax
   0x00000000000011f0 <+215>:	movabs rax,0x2105e92dfcc83f8b
   0x00000000000011fa <+225>:	mov    QWORD PTR [rbp-0xa0],rax
   0x0000000000001201 <+232>:	movabs rax,0x8b41ab5772b432a5
   0x000000000000120b <+242>:	mov    QWORD PTR [rbp-0x98],rax
   0x0000000000001212 <+249>:	movabs rax,0x5bf6da1b85402bcb
   0x000000000000121c <+259>:	mov    QWORD PTR [rbp-0x90],rax
   0x0000000000001223 <+266>:	movabs rax,0x7763fad3d0bfbe9d
   0x000000000000122d <+276>:	mov    QWORD PTR [rbp-0x88],rax
   0x0000000000001234 <+283>:	movabs rax,0x4317f367188618a3
   0x000000000000123e <+293>:	mov    QWORD PTR [rbp-0x80],rax
   0x0000000000001242 <+297>:	movabs rax,0x508c627dccea5a92
   0x000000000000124c <+307>:	mov    QWORD PTR [rbp-0x78],rax
   0x0000000000001250 <+311>:	movabs rax,0x5522325731b5ac35
   0x000000000000125a <+321>:	mov    QWORD PTR [rbp-0x70],rax
   0x000000000000125e <+325>:	movabs rax,0xa10b3f70cba8c125
   0x0000000000001268 <+335>:	mov    QWORD PTR [rbp-0x68],rax
   0x000000000000126c <+339>:	movabs rax,0x399df0803ed3d5bc
   0x0000000000001276 <+349>:	mov    QWORD PTR [rbp-0x60],rax
   0x000000000000127a <+353>:	movabs rax,0xe91cc4906f648c3d
   0x0000000000001284 <+363>:	mov    QWORD PTR [rbp-0x58],rax
   0x0000000000001288 <+367>:	movabs rax,0xfafef0acca375b7d
   0x0000000000001292 <+377>:	mov    QWORD PTR [rbp-0x50],rax
   0x0000000000001296 <+381>:	movabs rax,0xf0d430c203eb9bc2
   0x00000000000012a0 <+391>:	mov    QWORD PTR [rbp-0x48],rax
   0x00000000000012a4 <+395>:	movabs rax,0xfd223a24398d786b
   0x00000000000012ae <+405>:	mov    QWORD PTR [rbp-0x40],rax
   0x00000000000012b2 <+409>:	movabs rax,0x71a182c71b23339c
   0x00000000000012bc <+419>:	mov    QWORD PTR [rbp-0x38],rax
   0x00000000000012c0 <+423>:	movabs rax,0x1a0d12c7906671c1
   0x00000000000012ca <+433>:	mov    QWORD PTR [rbp-0x30],rax
   0x00000000000012ce <+437>:	movabs rax,0x8e9d27004cc624df
   0x00000000000012d8 <+447>:	mov    QWORD PTR [rbp-0x28],rax
   0x00000000000012dc <+451>:	movabs rax,0x2e36326df69e1603
   0x00000000000012e6 <+461>:	mov    QWORD PTR [rbp-0x20],rax
   0x00000000000012ea <+465>:	movabs rax,0xaf655da066e45254
   0x00000000000012f4 <+475>:	mov    QWORD PTR [rbp-0x18],rax
   0x00000000000012f8 <+479>:	movabs rax,0x6033b28ebfb41ee1
   0x0000000000001302 <+489>:	mov    QWORD PTR [rbp-0x10],rax
   0x0000000000001306 <+493>:	movabs rax,0x90d8b4593ba9bb88
   0x0000000000001310 <+503>:	mov    QWORD PTR [rbp-0x8],rax
   0x0000000000001314 <+507>:	mov    eax,0x0
   0x0000000000001319 <+512>:	leave
   0x000000000000131a <+513>:	ret
End of assembler dump.

Note the sub rsp,0x88, which is 136 bytes.

The array will eat into the red zone (128 bytes) so only another 128 bytes is needed to be allocated as the stack frame, rather than 256.

Then an extra 8 bytes I presume for stack allignment.

so 128+8 = 136



Let’s take a slightly closer look at a smaller array.

int main() {
   long my_list[3] = {0x5555555555555555, 0x7777777777777777, 0x9999999999999999};
}

I think the first element (0x5555555555555555 in this case) has to start on a 16-byte boundary.

It’s less than 128 bytes so main shouldn’t have to decrease rsp at all and can just use the red zone directly.


[~/t]
$ gcc x.c -fno-stack-protector; gdb a.out
pwndbg> disas main
Dump of assembler code for function main:
   0x0000000000001119 <+0>:	push   rbp
   0x000000000000111a <+1>:	mov    rbp,rsp
   0x000000000000111d <+4>:	movabs rax,0x5555555555555555
   0x0000000000001127 <+14>:	mov    QWORD PTR [rbp-0x20],rax
   0x000000000000112b <+18>:	movabs rax,0x7777777777777777
   0x0000000000001135 <+28>:	mov    QWORD PTR [rbp-0x18],rax
   0x0000000000001139 <+32>:	movabs rax,0x9999999999999999
   0x0000000000001143 <+42>:	mov    QWORD PTR [rbp-0x10],rax
   0x0000000000001147 <+46>:	mov    eax,0x0
   0x000000000000114c <+51>:	pop    rbp
   0x000000000000114d <+52>:	ret
End of assembler dump.

pwndbg> break *main+46
Breakpoint 1 at 0x1147

pwndbg> r

pwndbg> x/gx $rsp
0x7fffffffe630:	0x00007fffffffe6d0
pwndbg> x/gx $rsp-8
0x7fffffffe628:	0x00007ffff7fe1e90
pwndbg> x/gx $rsp-16
0x7fffffffe620:	0x9999999999999999
pwndbg> x/gx $rsp-24
0x7fffffffe618:	0x7777777777777777
pwndbg> x/gx $rsp-32
0x7fffffffe610:	0x5555555555555555
pwndbg>

Stack:

0x7fffffffe630:	0x00007fffffffe6d0   (previous base ptr to whatever called main)                        <- rsp, rbp
--- red zone below ---
0x7fffffffe628:	0x00007ffff7fe1e90   (junk? just unused so that the array starts on 16-byte)
0x7fffffffe620:	0x9999999999999999
0x7fffffffe618:	0x7777777777777777
0x7fffffffe610:	0x5555555555555555   (assert 0x7fffffffe610 % 16 == 0)



big arrays

Now I’ll show a 33 element one, which will have more than 256 bytes.

There are no more individual mov’s.

int main() {
    long my_list[33] = {0xdfa85653b8bf5ab7, 0xaa43138ceac13504, 0x73eedc4fe80b50d7, 0x3cfa4687bb1ca78a, 0xc22acd68104be153, 0x65dddff568e23a92, 0x8a50268201e5e33a, 0x7a2f8010e1bec750, 0x8b4cdbe70dee8d8c, 0x3f236ed9937edc08, 0xe4ebb800c63d7ddc, 0x4c75de6f34375d43, 0x211807eff3da1173, 0x9f23ae23044585e5, 0x7f8f718008171996, 0xdecb9c2d55e2a4af, 0x577ea761c77881c0, 0x342c5a943e2eccd2, 0x64016b46d7e67821, 0x5c2b0f12610987c9, 0x410a42cf94341054, 0xc20bf80d33baffbe, 0x6f7597d5cebc0e85, 0x6ecb9adda4afc993, 0xecbaf721e30b66a7, 0x0f95f4bfe4e032a7, 0xa5d590f5137c482a, 0xb60dcef80f26b847, 0xc5f3ad64183f3231, 0xbbf444878689bfbc, 0x25037d4586ff2adb, 0xf6a34861910e4554, 0xb4d1a674a823abd4};
}
pwndbg> disas main
Dump of assembler code for function main:
   0x0000000000001119 <+0>:	    push   rbp
   0x000000000000111a <+1>:	    mov    rbp,rsp
   0x000000000000111d <+4>:	    sub    rsp,0x98
   0x0000000000001124 <+11>:	lea    rax,[rbp-0x110]
   0x000000000000112b <+18>:	lea    rdx,[rip+0xeee]        # 0x2020
   0x0000000000001132 <+25>:	mov    ecx,0x21
   0x0000000000001137 <+30>:	mov    rdi,rax
   0x000000000000113a <+33>:	mov    rsi,rdx
   0x000000000000113d <+36>:	rep movs QWORD PTR es:[rdi],QWORD PTR ds:[rsi]
   0x0000000000001140 <+39>:	mov    eax,0x0
   0x0000000000001145 <+44>:	leave
   0x0000000000001146 <+45>:	ret
End of assembler dump.