CS 4401: Software Security Engineering

SSH (Secure Shell)
- Protocol to allow secure connection from one machine to another over a network
- ssh username@host-address
- SSH keys
  - Pair of cryptographic keys---a public key and a private key---used to connect without needing a password each time
  - Public key stored on server, while private key is stored on your device
    - Public key: ~/.ssh/id_ed25519.pub
    - Private key: ~/.ssh/id_ed25519
  - ssh-keygen -C "username@host-address"
    - Adds a comment to the key
    - Stored by default at ~/.ssh/id_ed25519
Assembly
- mov eax, ebx: Copy the value of ebx into eax, leaving ebx unchanged.
- push eax: Push eax onto the stack.
- pop eax: Pop the top of the stack into eax.
- call 0xaddress: Call a function at the specified address.
- ret: Return from a function (pops the return address from the stack).
- add eax, ebx: Add ebx to eax.
- sub eax, ebx: Subtract ebx from eax.
- cmp eax, ebx: Compare eax and ebx (sets flags for conditional jumps).
- jmp 0xaddress: Jump to the specified address.
- je, jne, jl, jg: Conditional jumps (based on comparison results).
- Stack Frame: When a function is called
  - Arguments are pushed onto the stack (in reverse order for x86) with mov or push.
  - The return address is pushed with call.
  - The base pointer (ebp) is saved (by the callee) to restore later.
  - Local variables are allocated on the stack by the callee.

GDB cheatsheet
Linux permissions
- 10-character string (e.g., -rwxr-xr--)
- First character is file type
| Character | File type |
| --- | --- |
| - | Regular file |
| d | Directory |
| l | Symbolic link |
| c | Character device |
| b | Block device |
| s | Socket |
| p | Named pipe (FIFO) |
Owner, group, and others
r read, w write, x execute
Representations
- Symbolic: rwx for owner, group, and others (e.g., rwxr-xr--).
- Octal: Three digits (0-7) representing owner, group, and others (e.g., 755).
  - Calculated by summing the values of the individual permissions (r = 4, w = 2, x = 1), so rwx = 7 and r-x = 5

Special permissions
- setuid (Set User ID)
  - When set on an executable file, it runs with the file owner’s privileges instead of those of the user who launched it
  - Symbolic: s in the owner’s execute bit (e.g., -rwsr-xr-x).
  - Octal: Add 4 to the beginning (e.g., 4755).
- setgid (Set Group ID):
  - When set on an executable file, it runs with the group’s privileges.
  - When set on a directory, new files inherit the directory’s group.
  - Symbolic: s in the group’s execute bit (e.g., -rwxr-sr-x).
  - Octal: Add 2 to the beginning (e.g., 2755).
- Sticky Bit:
  - When set on a directory, only a file’s owner, the directory’s owner, or root can delete or rename files within it.
  - Symbolic: t in the others’ execute bit (e.g., drwxrwxrwt).
  - Octal: Add 1 to the beginning (e.g., 1777).
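The octal arithmetic above can be checked with a short Python sketch. The helper names `triad_to_digit` and `symbolic_to_octal` are made up for illustration and only handle plain r/w/x characters (not s or t); `stat.filemode` is the standard-library function that renders a full mode, including the special bits.

```python
import stat

# Permission values from the notes: r = 4, w = 2, x = 1.
PERMISSION_VALUES = {"r": 4, "w": 2, "x": 1}


def triad_to_digit(triad: str) -> int:
    """Convert one symbolic triad such as 'r-x' into its octal digit."""
    return sum(PERMISSION_VALUES.get(ch, 0) for ch in triad)


def symbolic_to_octal(symbolic: str) -> str:
    """Convert nine plain permission characters (e.g. 'rwxr-xr-x') to octal."""
    if len(symbolic) != 9:
        raise ValueError("expected 9 permission characters")
    triads = (symbolic[0:3], symbolic[3:6], symbolic[6:9])
    return "".join(str(triad_to_digit(t)) for t in triads)


print(symbolic_to_octal("rwxr-xr-x"))

# stat.filemode takes the file type plus permission bits, so the special
# bits (4xxx setuid, 2xxx setgid, 1xxx sticky) show up as s / s / t.
print(stat.filemode(stat.S_IFREG | 0o4755))
print(stat.filemode(stat.S_IFREG | 0o2755))
print(stat.filemode(stat.S_IFDIR | 0o1777))
```

The last three calls reproduce the setuid, setgid, and sticky-bit examples from the list above.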
Calling convention
- A set of rules that dictate how a program should call a function
- Programming languages abstract away low-level operations in binaries
  - Moving the stack pointer
  - Passing parameters into functions
- These decisions are specified by an Application Binary Interface (ABI)
  - Calling convention is a part of it
    - Specifies how function calls should work
- Most Linux programs use System V ABI
  - Consists of
    - Generic ABI document
    - Processor Supplement documents
      - Hardware-specific standards
- Because x86_64 is backward compatible with 32-bit and 16-bit code, each 64-bit register also has 32-bit, 16-bit, and 8-bit names that refer to its lower portions

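| 64-bit | 32-bit | 16-bit | 8-bit | Role |
| --- | --- | --- | --- | --- |
| rax | eax | ax | al | return value |
| rbx | ebx | bx | bl | callee saved |
| rcx | ecx | cx | cl | 4th argument |
| rdx | edx | dx | dl | 3rd argument |
| rsi | esi | si | sil | 2nd argument |
| rdi | edi | di | dil | 1st argument |
| rbp | ebp | bp | bpl | callee saved |
| rsp | esp | sp | spl | stack pointer |
| r8 | r8d | r8w | r8b | 5th argument |
| r9 | r9d | r9w | r9b | 6th argument |
| r10 | r10d | r10w | r10b | caller saved |
| r11 | r11d | r11w | r11b | caller saved |
| r12 | r12d | r12w | r12b | callee saved |
| r13 | r13d | r13w | r13b | callee saved |
| r14 | r14d | r14w | r14b | callee saved |
| r15 | r15d | r15w | r15b | callee saved |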

64-bit
- The x86_64 System V ABI specifies that function parameters are passed via registers in this order: rdi, rsi, rdx, rcx, r8, r9. If a function has more than 6 parameters, the rest go on the stack. The return value is stored in rax.
32-bit
- The i386 System V ABI specifies that function parameters are passed on the stack, and the single return register is eax. Note the order of the parameters: They are pushed onto the stack in reverse order, i.e. the last parameter first. This means they end up ordered in memory as they are in the C code.
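As a rough model of the two conventions, the sketch below maps argument positions to locations. It only covers integer and pointer arguments, and the function names and the `stack[n]` labels are illustrative rather than anything defined by the ABI documents.

```python
# Integer/pointer argument registers in the x86_64 System V ABI, in order.
X86_64_ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def x86_64_arg_locations(num_args: int) -> list:
    """Return where each integer/pointer argument is passed on x86_64."""
    locations = []
    for index in range(num_args):
        if index < len(X86_64_ARG_REGISTERS):
            locations.append(X86_64_ARG_REGISTERS[index])
        else:
            # Arguments beyond the sixth are passed on the stack.
            locations.append(f"stack[{index - len(X86_64_ARG_REGISTERS)}]")
    return locations


def i386_push_order(args: list) -> list:
    """Return the order in which i386 code pushes arguments (last one first)."""
    return list(reversed(args))


print(x86_64_arg_locations(8))
print(i386_push_order(["a", "b", "c"]))
```

Because the stack grows toward lower addresses, pushing the last argument first leaves the first argument at the lowest address, which matches the source order in memory.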
File descriptors
- 3 are open automatically for every process
- 0: stdin, 1: stdout, 2: stderr
Function prologue
- Setup code at the beginning of a function to allocate stack space and save registers.
- Purposes
  - Prepares the stack for local variable storage.
  - Maintains a stack frame for debugging and unwinding.
  - Ensures proper function execution flow.
- Steps
  - Push old base pointer (ebp/rbp) onto the stack.
  - Set up new base pointer (mov ebp, esp).
  - Allocate space for local variables (sub esp, size).
  - (Optional) Save callee-saved registers.
Function epilogue
- Cleanup code at the end of a function before returning.
- Purposes
  - Cleans up stack memory.
  - Restores previous execution context.
  - Ensures correct return to the caller function.
- Steps
  - Restore callee-saved registers (if saved in the prologue).
  - Deallocate local variables (mov esp, ebp).
  - Restore the old base pointer (pop ebp).
  - Return to caller (ret).
Stack canary
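- Random 8-byte value (on 64-bit systems) that changes on every program run, inserted by the compiler to protect against stack-based buffer overflow attacks
- Placed between the local variables and the saved base pointer, so an overflow that reaches the return address must overwrite the canary first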
- Stack Canaries | Practical CTF
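The canary check can be modeled with a toy stack frame. This is a simplified sketch, not how a compiler implements it: the frame layout (16-byte buffer, canary, saved base pointer, return address), the function names, and the leading null byte in the canary are all assumptions chosen for illustration.

```python
import os

BUF_SIZE = 16  # hypothetical size of the local buffer
WORD = 8       # word size on a 64-bit system


def make_frame(canary: bytes) -> bytearray:
    """Lay out [buffer][canary][saved rbp][return address], lowest address first."""
    frame = bytearray(BUF_SIZE + 3 * WORD)
    frame[BUF_SIZE:BUF_SIZE + WORD] = canary
    return frame


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Model gets(): copy input from the start of the buffer with no bounds check.

    The copy is clamped to the frame size only so the model itself stays in bounds.
    """
    count = min(len(data), len(frame))
    frame[:count] = data[:count]


def canary_intact(frame: bytearray, canary: bytes) -> bool:
    """Model the check performed before the function returns."""
    return bytes(frame[BUF_SIZE:BUF_SIZE + WORD]) == canary


# Assumed canary shape: one null byte followed by random bytes.
canary = b"\x00" + os.urandom(WORD - 1)

safe = make_frame(canary)
unchecked_copy(safe, b"A" * BUF_SIZE)
print("fits in buffer, canary intact:", canary_intact(safe, canary))

smashed = make_frame(canary)
unchecked_copy(smashed, b"A" * (BUF_SIZE + 3 * WORD))
print("overflow, canary intact:", canary_intact(smashed, canary))
```

An input long enough to reach the return address necessarily rewrites the canary on the way, which is what the check detects.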
Supplying malicious input
- gets and stdin
- Command-line arguments (argc, argv)
- Environment variables
- Files on disks (relative path)
- Network connections
- External sensors
Exploits
- Buffer overflow: Writing more data into a buffer than it can hold, overwriting adjacent memory (e.g., return address).
  - Memory regions in a process: most important ones are the stack, the heap, and the text sections.
    - The stack is a region of memory used by each thread (a thread is a smaller unit of a process that can run independently) to keep track of the thread’s execution state. This includes local variables, information about which functions have been called, and other temporary data. The stack for the main thread (the primary thread that begins executing when the program starts) also includes the command-line arguments and environment variables.
    - The heap is a region used for dynamically allocated memory. For example, when you use malloc() to allocate memory in a C program, that memory comes from the heap. Managing this memory correctly is challenging for programmers and is another common source of errors, which we’ll discuss in future lectures.
    - The text section of memory stores the compiled program code. This area is often marked as read-only to prevent the code from being accidentally (or maliciously) modified during execution.

```

0xff ff ff ff

cmd env (set at process start, lives on main thread’s stack)

stack (grows toward lower addresses, bottom is fixed)

heap (managed by malloc)

text (code, read only)

0x00 00 00 00

```

Use the file command to determine whether a binary is 64-bit or 32-bit
Running a setuid or setgid binary under gdb does not grant elevated privileges: for security reasons the traced binary runs with the invoking user’s permissions, so gdb cannot be used directly for privilege escalation
Segmentation faults occur when your input overwrites critical values in memory, one of which is the return address
- The CPU uses the instruction pointer register (rip) to keep track of the next instruction to execute
- When a function is called, the instruction pointer jumps to the first instruction of that function.
- However, because functions need to return to the point they were called from, the current value of the instruction pointer (the return address) must be saved before the jump. This saved return address is stored in memory---specifically, on the stack---so that the CPU can resume execution at the correct point after the function completes.
- By controlling the return address, we can redirect the program’s control flow, which can lead to arbitrary code execution
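A minimal sketch of how such an input is usually assembled: filler bytes up to the saved return address, followed by the new address in little-endian byte order. The 40-byte offset and the address 0x401136 are placeholders, not values from any real binary; in practice both are found with a debugger.

```python
import struct


def build_overflow_payload(offset_to_return: int, target_address: int) -> bytes:
    """Pad up to the saved return address, then overwrite it (64-bit, little-endian)."""
    padding = b"A" * offset_to_return
    return padding + struct.pack("<Q", target_address)


# Hypothetical values for illustration only.
payload = build_overflow_payload(40, 0x401136)
print(len(payload), payload[40:].hex())
```

x86 is little-endian, so the least significant byte of the address comes first in the payload.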
- Format string: Exploiting printf-like functions to read/write memory.
- Use-after-free: Using memory after it has been freed, leading to code execution.
- Integer overflow: Causing an integer to wrap around, leading to unexpected behavior.
Understanding binaries
- Compiled form of a program that the OS loads to run as a process
- ELF binaries
  - Executable and Linkable Format
  - Standard file format for executables on Linux
  - Steps the OS takes to load an ELF binary into memory and start the process
    - Reading the ELF Header: An ELF binary starts with a header that contains important information about the structure of the file. This header includes details about the different sections of the binary, such as the text section (which contains the executable code) and the data section (which holds certain variables). The OS begins by reading this ELF header to understand how the binary is organized.
    - Mapping Sections into Virtual Memory: After reading the ELF header, the OS maps the various sections of the binary into the process’s virtual memory. This means that the OS allocates memory for each section and sets up the process’s virtual address space according to the instructions provided by the ELF file. For example:
      - The text section is mapped to a region of memory where the CPU can execute the code.
      - The data section is mapped to a separate region where variables and other data will be stored.
    - Setting Up the Process Environment: Once the sections are mapped into memory, the OS sets up the process’s environment. This includes initializing the stack, setting up the heap (for dynamic memory allocation), and preparing any necessary environment variables and command-line arguments.
    - Starting the Process: Finally, the OS transfers control to the entry point of the program. This is typically the _start routine supplied by the C runtime, which performs initialization and then calls main(). At this point, the program begins executing its instructions.
  - Tools like readelf and objdump are useful for exploring the structure of ELF binaries
- File permissions and how setuid flag affects security
  - The setuid feature is helpful for tasks that require temporary elevated privileges, like accessing hardware or managing system resources. For example, commands like ping and sudo are setuid binaries. Fortunately, the OS has safeguards to prevent misuse, ensuring that an unprivileged user can’t simply write a program and use setuid to run it as root.
- Position Independent Executables (PIE) and how it affects exploit development
  - Some binaries are compiled to be position-independent, meaning that their code sections can be loaded at different memory locations each time the program runs
  - Called Position Independent Executables (PIE binaries)
  - When combined with Address Space Layout Randomization (ASLR), PIE enhances security by making it harder for attackers to predict where specific functions or code sequences will be located in memory
  - Can check if a binary is compiled with PIE support using the checksec utility
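To make the ELF header concrete, here is a small sketch that reads a few of its fields by hand, roughly the information that file, readelf, and checksec report. The function name and the returned dictionary are illustrative; the field offsets follow the ELF header layout (e_ident in the first 16 bytes, e_type at offset 16, e_entry at offset 24). Note that a type of DYN is also used by shared libraries, so it is only a hint that an executable is PIE.

```python
import struct


def parse_elf_header(header: bytes) -> dict:
    """Extract word size, file type, and entry point from the start of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}[header[4]]       # EI_CLASS
    endian = {1: "<", 2: ">"}[header[5]]   # EI_DATA: little or big endian
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    entry_format = "I" if bits == 32 else "Q"
    (e_entry,) = struct.unpack_from(endian + entry_format, header, 24)
    type_names = {2: "EXEC", 3: "DYN"}     # DYN covers PIE and shared objects
    return {"bits": bits, "type": type_names.get(e_type, str(e_type)), "entry": e_entry}


# Usage sketch (the path is only an example):
# with open("/bin/ls", "rb") as f:
#     print(parse_elf_header(f.read(64)))
```

For a non-PIE binary the entry point is an absolute address, while for a PIE binary it is an offset that the loader adds to a randomized base.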
Code injection
- Memory permissions
  - The virtual address space is divided into 4096-byte blocks of memory called pages. Each page is associated with a set of permissions that specify whether the page can be read from, written to, or executed
  - Any attempt to write to read-only memory results in a segmentation fault
  - Stack-based buffer overflows can only directly change the memory at addresses higher than the buffer
  - If, as an attacker, you can’t overwrite or modify the existing code for a process, then the logical next step is to supply the malicious code yourself. This class of attack is generally referred to as code injection
- Stack frames
  - The section of the stack associated with a single function call
  - Every time a function is called, a new stack frame is created
  - Setting up that stack frame involves several steps
    - some of which are the caller’s responsibility
    - some are the callee’s responsibility
  - The calling convention defines this behavior, and it varies from system to system.
  - Each function’s instructions for stack setup and tear-down are referred to as the function prologue and epilogue, respectively
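    - The prologue also saves registers the function will overwrite, and the epilogue restores them, which frees those registers for the function’s own use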
  - Register spilling
    - The process of saving the previous value of a register before using that register in a function prologue

    push rbp        ; Save old base pointer
    mov rbp, rsp    ; Set new base pointer to current stack pointer
    sub rsp, 32     ; Allocate 32 bytes for local variables

Similarly, the function epilogue includes instructions to restore the saved register values

    mov rsp, rbp    ; Restore stack pointer
    pop rbp         ; Restore old base pointer
    ret             ; Return to caller
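The effect of these instructions on rsp and rbp can be traced with a toy model. The `Machine` class below is a made-up illustration that tracks only two registers and a dictionary standing in for stack memory; the starting addresses are arbitrary.

```python
class Machine:
    """Tiny model of rsp, rbp, and 8-byte stack slots."""

    def __init__(self, rsp: int, rbp: int):
        self.regs = {"rsp": rsp, "rbp": rbp}
        self.memory = {}  # address -> 8-byte value

    def push(self, value: int) -> None:
        self.regs["rsp"] -= 8
        self.memory[self.regs["rsp"]] = value

    def pop(self) -> int:
        value = self.memory[self.regs["rsp"]]
        self.regs["rsp"] += 8
        return value


def prologue(machine: Machine, local_bytes: int) -> None:
    machine.push(machine.regs["rbp"])            # push rbp
    machine.regs["rbp"] = machine.regs["rsp"]    # mov rbp, rsp
    machine.regs["rsp"] -= local_bytes           # sub rsp, local_bytes


def epilogue(machine: Machine) -> int:
    machine.regs["rsp"] = machine.regs["rbp"]    # mov rsp, rbp
    machine.regs["rbp"] = machine.pop()          # pop rbp
    return machine.pop()                         # ret: pop the return address


m = Machine(rsp=0x7FFC0000, rbp=0x7FFC0100)
m.push(0x401234)          # the call instruction pushes the return address
prologue(m, 32)
print(hex(m.regs["rsp"]), hex(m.regs["rbp"]))
print(hex(epilogue(m)), hex(m.regs["rsp"]), hex(m.regs["rbp"]))
```

After the epilogue, rsp and rbp hold the values they had before the call, which is why a corrupted saved rbp or return address slot changes where the caller resumes.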

Shellcoding
- When the injected code launches a shell, giving the attacker a convenient interface to the hacked machine, that injected code is called shellcode.
- Writing shellcode
  - System call is a means for a user-mode process to give control to the OS kernel to carry out a privileged task on the process’s behalf
  - execve is a useful system call to launch a shell process that we can then interact with
- How to check if the stack is executable: In the EpicTreasure docker image, you can use checksec---look for the string “NX disabled”
- Call the execve system call on a shell program (e.g., sh or bash)
- pwntools’ shellcraft module can generate shellcode
Code reuse
- Memory permissions can make it impossible to execute any code on the stack
- Setting the NX (No eXecute) bit or data execution prevention (DEP)
  - Following a simple invariant: memory that is writable cannot be executed, and memory that is executable cannot be written
- Code reuse attacks bypass the restriction on executing writable memory by reusing code that is already in the binary
- Return-to-libc
  - Code reuse attack that redirects control flow to execute useful functions in commonly used libraries (e.g., libc)
  - Steps
    - find where the code for system() lives in memory,
    - set up the stack (and/or registers for 64-bit binaries) with the proper arguments and a dummy return address, and
    - hijack the control-flow of the program to execute the desired function.
- Return-oriented programming
  - A gadget is a short sequence of instructions ending in a return
  - ROPgadget
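  - Stack alignment: the 64-bit calling convention requires rsp to be divisible by 16 at each call instruction, so on function entry (after call pushes the return address) rsp is 8 past a multiple of 16. If a ROP chain leaves the stack misaligned, instructions such as movaps can fault and crash the program; adding an extra ret gadget shifts rsp by 8 bytes and realigns the stack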
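The return-to-libc steps and the alignment fix can be sketched as a byte string laid out on the stack. Every address below is a placeholder, and `build_ret2libc_chain` is a hypothetical helper; on a real target the gadget addresses would come from a tool such as ROPgadget and the libc addresses from a leak.

```python
import struct


def p64(value: int) -> bytes:
    """Pack a 64-bit little-endian value."""
    return struct.pack("<Q", value)


def build_ret2libc_chain(offset, pop_rdi_ret, ret_gadget, binsh_addr, system_addr, align=True):
    """Build padding followed by a chain that calls system("/bin/sh") on x86_64."""
    chain = b"A" * offset
    if align:
        chain += p64(ret_gadget)   # extra ret: shifts rsp by 8 to fix alignment
    chain += p64(pop_rdi_ret)      # gadget: pop rdi; ret
    chain += p64(binsh_addr)       # value popped into rdi (1st argument)
    chain += p64(system_addr)      # the gadget's ret jumps to system()
    return chain


# Placeholder addresses for illustration only.
chain = build_ret2libc_chain(
    offset=40,
    pop_rdi_ret=0x401203,
    ret_gadget=0x40101A,
    binsh_addr=0x7FFFF7B95E1A,
    system_addr=0x7FFFF7A31550,
)
print(len(chain))
```

Whether the extra ret is needed depends on how rsp happens to be aligned when the chain starts, so it is common to try the chain both with and without it.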
Memory sections of a program

    [1] 08048000-0804a000 R+Xp 00000000 00:0b 812 /tmp/cat
    [2] 0804a000-0804b000 RW+p 00002000 00:0b 812 /tmp/cat
    [3] 40000000-40015000 R+Xp 00000000 03:07 110818 /lib/ld-2.2.5.so
    [4] 40015000-40016000 RW+p 00014000 03:07 110818 /lib/ld-2.2.5.so
    [5] 4001e000-40143000 R+Xp 00000000 03:07 106687 /lib/libc-2.2.5.so
    [6] 40143000-40149000 RW+p 00125000 03:07 106687 /lib/libc-2.2.5.so
    [7] 40149000-4014d000 RW+p 00000000 00:00 0
    [8] bfffe000-c0000000 RWXp fffff000 00:00 0

As we can see, /tmp/cat is a dynamically linked ELF executable; its address space contains several file mappings.
[1] and [2] correspond to the loadable ELF segments of /tmp/cat containing code and data (both initialized and uninitialized), respectively.
[3] and [4] represent the dynamic linker whereas [5], [6] and [7] are the segments of the C runtime library ([7] holds its uninitialized data that is big enough to not fit into the last page of [6]).
[8] is the stack which grows downwards.
There are other mappings as well that this simple example does not show us: the brk() managed heap that would directly follow [2] and various anonymous and file mappings that the task can create via mmap() and would be placed between [7] and [8] (unless an explicit mapping address outside this region was requested using the MAP_FIXED flag).
For our purposes all these possible mappings can be split into three groups:
- [1], [2] and the brk() managed heap following them,
- [3]-[7] and all the other mappings created by mmap(),
- [8], the stack.
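A mapping line like the ones listed above can be split into its fields with a few lines of Python. This is a sketch that assumes the line has the bracketed label removed; `parse_mapping` and the `writable_and_executable` key are illustrative names.

```python
def parse_mapping(line: str) -> dict:
    """Parse 'start-end perms offset dev inode [path]' into a dictionary."""
    fields = line.split(None, 5)
    address_range, perms = fields[0], fields[1]
    path = fields[5].strip() if len(fields) > 5 else ""  # anonymous mappings have no path
    start, end = (int(part, 16) for part in address_range.split("-"))
    flags = perms.upper()
    return {
        "start": start,
        "end": end,
        "size": end - start,
        "perms": perms,
        "path": path,
        "writable_and_executable": "W" in flags and "X" in flags,
    }


code = parse_mapping("08048000-0804a000 R+Xp 00000000 00:0b 812 /tmp/cat")
stack = parse_mapping("bfffe000-c0000000 RWXp fffff000 00:00 0")
print(hex(code["size"]), code["path"], code["writable_and_executable"])
print(hex(stack["size"]), stack["writable_and_executable"])
```

In this listing only the stack mapping is both writable and executable, which is the property code injection relies on.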
Format string vulnerabilities
- Can be used both to leak information from memory and to modify arbitrary locations with arbitrary values (i.e., a write-what-where primitive)
- Taking control of printf
  - If the attacker has control of the first parameter, the format string, they can specify more format specifiers than variables, which will read more information (i.e., treating whatever is on the stack at where the supposed arguments are pushed as data), leaking information to bypass defenses like stack canaries and ASLR!
  - printf has another interesting format specifier, %n, which writes the number of characters printed by printf prior to reaching the %n to an address specified by the next argument
    - Enables arbitrary writes
  - %11$n in printf writes the number of bytes printed so far to the memory address pointed to by the 11th argument on the stack/12th argument from printf’s perspective.
  - In a format string vulnerability, it can be used to write an integer (usually for exploitation) to an arbitrary memory address, if you control the 11th argument.
- An integer overflow can also lead to a format string attack if we have control of an index
- %s dereferences its argument as a string address, so it may dereference an unmapped address and segfault. Prefer %p or %x to leak information
  - But %s is useful for reading out values in libc or the text section
  - Stack layout → %p, other sections → %s
- Global offset table
  - Binaries can be broadly divided into two categories
    - Statically-linked
      - Self-contained; does not use external libraries
    - Dynamically-linked
      - Depend on system libraries that are loaded at runtime
      - Get the list of shared libraries a binary uses at runtime with ldd <binary_name>

    linux-vdso.so.1 => (0x00007ffff7ffa000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff7a0d000)
    /lib64/ld-linux-x86-64.so.2 (0x00007ffff7dd7000)

These are the base addresses of the libraries; they change on each run due to ASLR → how can our binaries know these addresses if they change?

0x000000000040053a <+20>: call 0x400400 <puts@plt>

Here, the program will call puts in the procedure linkage table (plt or PLT) at address 0x400400
When binaries are compiled, there are sections called relocations that are filled in by the linker at runtime (think templating engine)
- Can run readelf --relocs ./<binary_name>

    Relocation section '.rela.text' at offset 0x208 contains 2 entries:
      Offset          Info           Type           Sym. Value    Sym. Name + Addend
    000000000010  00050000000a R_X86_64_32       0000000000000000 .rodata + 0
    000000000015  000a00000002 R_X86_64_PC32     0000000000000000 puts - 4

    Relocation section '.rela.eh_frame' at offset 0x238 contains 1 entries:
      Offset          Info           Type           Sym. Value    Sym. Name + Addend
    000000000020  000200000002 R_X86_64_PC32     0000000000000000 .text + 0

When you compile a binary that uses external symbols (like printf, malloc, etc.), the compiler doesn’t know their final memory addresses.
So it generates relocation entries: metadata saying, “At this address, patch in the actual address of printf later.”
The linker/loader:
- During static linking, fills in known addresses from libraries you’re linking against.
- During dynamic linking, it leaves PLT stubs and GOT slots as placeholders.

    Dump of assembler code for function puts@plt:
    0x0000000000400400 <+0>: jmp QWORD PTR [rip+0x200c12] # 0x601018
    0x0000000000400406 <+6>: push 0x0
    0x000000000040040b <+11>: jmp 0x4003f0

We jumped to the PLT but we then immediately jump somewhere else, specifically to 0x601018
- This technique of jumping to a location and then immediately jumping somewhere else based on a stored value is sometimes called function trampolining.
0x601018 is actually an address stored in the global offset table (got or GOT).
- Can inspect it in gdb with x/128wx _GLOBAL_OFFSET_TABLE_
At runtime:
- The dynamic linker (ld.so) acts like your “templating engine.”
- On the first call to an external function:
  - The PLT stub jumps to a resolver that finds the real address of the function (via the symbol table).
    - Lazy binding
  - It writes that address into the corresponding GOT slot.
  - Future calls jump directly to the resolved address via GOT → no extra lookup needed.
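Because GOT slots hold code addresses that are written at runtime, they are a natural target for the %n write primitive from the format string notes. The sketch below plans such a write 16 bits at a time with %hn. It is an illustration under several assumptions: `plan_hn_writes` is a hypothetical helper, the addresses of the two halves of the slot are assumed to already sit at positional arguments 11 and 12, and padding uses `%1$<width>c` to control the printed character count.

```python
def plan_hn_writes(value: int, first_arg_index: int, num_chunks: int = 2) -> str:
    """Build a format string that writes `value` in 16-bit pieces using %hn.

    Chunk i (bits 16*i .. 16*i+15) is assumed to be written through the
    pointer found at positional argument first_arg_index + i.
    """
    chunks = [((value >> (16 * i)) & 0xFFFF, first_arg_index + i) for i in range(num_chunks)]
    chunks.sort()  # smallest first, so the printed count only ever needs to grow
    fmt = ""
    printed = 0
    for chunk_value, arg_index in chunks:
        pad = chunk_value - printed
        if pad:
            fmt += f"%1${pad}c"   # print `pad` more characters
            printed += pad
        fmt += f"%{arg_index}$hn"  # store the count so far as a 16-bit value
    return fmt


# Placeholder target value 0x401136, pointers at arguments 11 and 12.
print(plan_hn_writes(0x401136, 11))
```

Writing in 16-bit pieces keeps the number of characters printed small; a single %n write of a full address would require printing millions of characters.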
Argument clinic → how to get null bytes onto the stack
- ./hello "a" "" "c"
- An empty-string argument contributes only its terminating null byte
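The effect of an empty argument can be modeled by laying the argument strings out back to back, each followed by its terminator. This is a simplified picture of how the strings sit on the main thread's stack; `argv_memory_image` is an illustrative name.

```python
def argv_memory_image(args: list) -> bytes:
    """Model argument strings stored contiguously, each NUL-terminated."""
    return b"".join(arg.encode() + b"\x00" for arg in args)


image = argv_memory_image(["a", "", "c"])
print(image)
```

The empty string in the middle produces two consecutive null bytes, which is how a payload that needs embedded nulls can be spread across several arguments.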
Heap
- Memory management
  - 3 basic ways
    - automatically, e.g. local variables in a function;
    - statically, e.g., global variables;
    - dynamically, e.g., calls to malloc.
  - Bugs
    - buffer overflows: writing past the bounds of allocated memory objects;
    - dangling pointers: pointers to deallocated memory (that the program will then use as if it is valid memory);
    - double frees: deallocating memory twice;
    - memory leaks: never deallocating memory;
    - uninitialized reads: reading memory before it has been initialized.

Code pointers
- A program variable that stores a code address and that address is intended to be loaded into the instruction pointer.
- Examples
  - Return addresses: the saved address of where execution must resume when a function ends.
  - Function pointers: C variables used to dynamically specify which function to execute.
  - Global offset table: addresses here are used to call dynamically linked functions (with lazy binding, a slot initially points back into its PLT stub, and the resolver writes the function’s real address into the slot on the first call).
  - Virtual function table: addresses here are used to know which method to execute, e.g., dynamic binding in C++.
  - Destructor functions (i.e., dtors): these functions are called when a program exits.
  - Data pointers: these are not code pointers, per se, but they can be made to point to a code pointer and used to overwrite that pointer.
- There are many memory locations that an attacker can target to hijack control flow.
Heap-based vulnerabilities can be very dependent on how the internal implementation of the heap allocator actually works
- Google Chrome’s PartitionAlloc
- FreeBSD’s jemalloc
- glibc default heap implementation
- Windows heap
ptmalloc

Chunk alignment: For correctness and performance, all allocated chunks must be aligned
- 8-byte aligned on 32-bit systems, or 16-byte aligned on 64-bit
Chunk composition
- Allocation metadata
- Alignment-padding bytes
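How metadata and padding turn a request into a chunk size can be approximated as follows. The sketch is modeled on the rounding glibc performs on 64-bit systems (an 8-byte size field, 16-byte alignment, 32-byte minimum chunk); the constant and function names are chosen for illustration, and other allocators or builds may differ.

```python
SIZE_FIELD = 8   # bytes of size metadata counted against the request
ALIGNMENT = 16   # chunk alignment on 64-bit systems
MIN_CHUNK = 32   # smallest chunk the allocator hands out


def chunk_size(request: int) -> int:
    """Round a malloc request up to the chunk size that would back it."""
    needed = request + SIZE_FIELD
    rounded = (needed + ALIGNMENT - 1) & ~(ALIGNMENT - 1)
    return max(rounded, MIN_CHUNK)


for request in (0, 24, 25, 100):
    print(request, chunk_size(request))
```

A 24-byte request fits in a 32-byte chunk, while a 25-byte request needs 48 bytes, so one extra byte can move an allocation into a different size class and a different bin.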
Chunk allocation: basic strategy
- If there is a previously-freed chunk of memory, and that chunk is big enough to service the request, the heap manager will use that freed chunk for the new allocation.
- Otherwise, if there is available space at the top of the heap, the heap manager will allocate a new chunk out of that available space and use that.
- Otherwise, the heap manager will ask the kernel to add new memory to the end of the heap, and then allocate a new chunk from this newly allocated space.
- If all these strategies fail, the allocation can’t be serviced, and malloc returns NULL.
Option 1: Allocating from free’d chunks

Heap manager uses “bins”, a series of linked lists, to keep track of free’d chunks (from free())
When an allocation request is made, the heap manager searches those bins for a free chunk that’s big enough to service the request
If it finds one, it can remove that chunk from the bin, mark it as “allocated”, and then return a pointer to the “user data” region of that chunk to the programmer as the return value of malloc.
For performance reasons, there are several types of bins
- fast bins
- unsorted bin
- small bins
- large bins
- per-thread tcache
Option 2: Allocating from the top of the heap
- If there are no free chunks available that are big enough, the heap manager must instead construct a new chunk.
- The heap manager first looks at the free space at the end of the heap (sometimes called the “top chunk” or “remainder chunk” or “wilderness”) to see if there is enough space there.
- If there is, the heap manager manufactures a new chunk out of this free space.
Option 3: Asking the kernel to add more memory to the top of the heap

Once the free space at the top of the heap is used up, the heap manager will have to ask the kernel to add more memory to the end of the heap.
- On the initial heap, the heap manager asks the kernel to allocate more memory at the end of the heap by calling sbrk.
- On most Linux-based systems this function internally uses a system call called “brk”.
- This system call has a pretty confusing name—it originally meant “change the program break location”, which is a complicated way of saying it adds more memory to the region just after where the program gets loaded into memory. Since this is where the heap manager creates the initial heap, the effect of this system call is to allocate more memory at the end of the program’s initial heap.
Eventually, expanding the heap with sbrk will fail. The heap will grow so large that expanding it further would cause it to collide with other things in the process’s address space, such as memory mappings, shared libraries, or a thread’s stack region. Once the heap reaches this point, the heap manager will resort to attaching new non-contiguous memory to the initial program heap using calls to mmap.
If mmap also fails, then the process simply can’t allocate any more memory, and malloc returns NULL.
Option 4: Off-heap allocations via mmap
- Very large allocation requests get special treatment in the heap manager. These large chunks are allocated off-heap using a direct call to mmap, and this fact is marked using a flag in the chunk metadata. When these huge allocations are later returned to the heap manager via a call to free, the heap manager releases the entire mmaped region back to the system via munmap.
  - By default this threshold starts at 128KB and can grow up to 512KB on 32-bit systems or 32MB on 64-bit systems, because the heap manager raises it dynamically if it detects that these large allocations are being used transiently.
Arenas and sub-heaps
Chunk metadata

Header: mchunk_size
- a single size_t header that is positioned immediately before the “user data” region
  - written to during malloc, and later used by free to decide how to handle the release of the allocation.
- three bits called “A”, “M”, and “P”
  - These can all be stored in the same size_t field because chunk sizes are always 8-byte aligned (or 16-byte aligned on 64-bit), and therefore the low three bits of the chunk size are always zero.
  - The “A” flag is used to tell the heap manager if the chunk belongs to a secondary arena, as opposed to the main arena. During free, the heap manager is only given a pointer to the allocation that the programmer wants to free, and the heap manager needs to work out which arena the pointer belongs to. If the A flag is set in the chunk’s metadata, the heap manager must search each arena and see if the pointer lives within any of that arena’s subheaps. If the flag is not set, the heap manager can short-circuit the search because it knows the chunk came from the initial arena.
  - The “M” flag is used to indicate that the chunk is a huge allocation that was allocated off-heap via mmap. When this allocation is eventually passed back to free, the heap manager will return the whole chunk back to the operating system immediately via munmap rather than attempting to recycle it. For this reason, freed chunks never have this flag set.
  - The “P” flag (PREV_INUSE in glibc) is confusing because it really describes the previous chunk. When the flag is set, the previous chunk is in use; when it is clear, the previous chunk is free. This means that when this chunk is freed with the flag clear, it can be safely joined onto the previous chunk to create a much larger free chunk.
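Since the three flags share the size field's low bits, decoding a raw header value is a matter of masking. The sketch below uses the bit positions from glibc (P = 0x1, M = 0x2, A = 0x4); the function name and the returned dictionary are illustrative, and the sample values are made up.

```python
PREV_INUSE = 0x1       # "P": set while the previous chunk is in use
IS_MMAPPED = 0x2       # "M": chunk was allocated off-heap via mmap
NON_MAIN_ARENA = 0x4   # "A": chunk belongs to a secondary arena
FLAG_MASK = 0x7


def decode_size_field(raw: int) -> dict:
    """Split a raw chunk size field into the chunk size and its A/M/P flags."""
    return {
        "size": raw & ~FLAG_MASK,
        "A": bool(raw & NON_MAIN_ARENA),
        "M": bool(raw & IS_MMAPPED),
        "P": bool(raw & PREV_INUSE),
    }


print(decode_size_field(0x21))
print(decode_size_field(0x405))
```

A header value of 0x21, for example, describes a 0x20-byte chunk whose predecessor is in use, which is the kind of value visible when inspecting chunks in a debugger.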
Coalescing
gef
- heap chunks
- heap bins
Use-after-free
- libc metadata modification
Buffer overflow
Double free
