CS 2011 - Introduction to Machine Organization and Assembly Language

Bits, Bytes, and Integers
- Binary representations of information

Electronic devices: represent bits with different voltage levels on a circuit
Binary to decimal
- 1011 = 1*2^3 + 0*2^2 + 1*2^1 + 1*2^0 = 11
- Conversion from all other number bases to decimal works the same
Hexadecimal
- Useful in modern systems
- Extends past 9 to A, B, C, D, E, and F
- 1 hex character = ½ byte = 1 nibble = 4 bits
  - 0 - F would mean 0000 (0) to 1111 (15)
- 2 hex characters = 1 byte = 8 bits
- Write FA1D37B(16) in C as
  - 0xFA1D37B
  - 0xfa1d37b
- 0x prefix denotes a hexadecimal literal
- 0 prefix: octal
- No prefix: decimal
- 0b prefix: binary (a GCC/Clang extension; standard only since C23)
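
The positional rule and the literal prefixes above can be sketched in C. This is a minimal illustration; `binary_to_decimal` is a hypothetical helper written for these notes, not a standard library function.

```c
#include <stddef.h>

/* Hypothetical helper: converts a string of '0'/'1' characters to its value.
   Each step doubles the running total, which is the positional rule
   1011 = 1*2^3 + 0*2^2 + 1*2^1 + 1*2^0 evaluated left to right. */
unsigned long binary_to_decimal(const char *bits)
{
    unsigned long value = 0;
    for (size_t i = 0; bits[i] == '0' || bits[i] == '1'; i++) {
        value = value * 2 + (unsigned long)(bits[i] - '0');
    }
    return value;
}

/* The same numbers written with different C literal prefixes. */
static const unsigned hex_upper = 0xFA1D37B;
static const unsigned hex_lower = 0xfa1d37b;  /* hex digits are case-insensitive */
static const unsigned octal_eleven = 013;     /* leading 0 means octal: 1*8 + 3 */
static const unsigned decimal_eleven = 11;
```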
Programmers - 64 Bit Calculator
Byte-oriented memory organization

Programs refer to virtual addresses
- Conceptually, very large array of bytes
- Actually, implemented with hierarchy of different memory types
- System provides address space private to particular “process”
  - Program being executed
  - Program can clobber its own data, but not that of others
Compiler + run-time system control allocation
- Where different program objects should be stored
- All allocation within single virtual address space
Machine words
- Machine has “word size”
  - A unit of data that a machine processes (transfers between CPU and RAM) in one operation
  - Nominal size of pointer data
    - Determinant of the max size of the address space
    - Hardware deals with the memory a word size at a time
- Older machines use 32-bit (4-byte) words
  - Limits addresses to 4 GB
  - Becoming too small for memory-intensive applications
- Many systems, including x86-64, use 64-bit (8-byte) words
  - Potential address space ≈ 1.8 × 10^19 bytes
  - x86-64 machines support 48-bit addresses: 256 terabytes (2015)
- Machines support multiple data formats
  - Fractions or multiples of word size
  - Always integral number of bytes
Word-oriented memory organization

Addresses specify byte locations
- Address of first byte in word
- Addresses of successive words differ by 2 (16-bit), 4 (32-bit), or 8 (64-bit)
Data representations

Byte ordering
- How should bytes within a multi-byte data format / word be ordered in memory?
- Every byte has a unique address in memory. Endianness is the order in which the bytes of a multi-byte value are stored
- Conventions
  - Big Endian: Sun, PPC Mac, Internet
    - Least significant byte has highest address
  - Example
    - x has 4-byte representation 0x01234567
    - Address given by &x is 0x100

Little Endian: x86
- Least significant byte has lowest address

Network byte order (big endian) differs from the byte order of consumer chips (x86, little endian) → the network stack handles the conversion
Reading byte-reversed listings
- Assembly language instructions are human-friendly names for specific CPU operations
- Machine code is the binary representation of those instructions
- Every assembly instruction maps to a corresponding machine code rendition
- Disassembly: machine code → assembly
  - Text representation of binary machine code
  - Generated by program that reads the machine code
- Example Fragment

Deciphering Numbers
- Value: 0x12ab
- Pad to 32 bits: 0x000012ab
- Split into bytes: 00 00 12 ab
- Reverse: ab 12 00 00
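
The pad, split, and reverse steps above can be sketched in C. This is an illustrative sketch; both function names are made up for these notes.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the least significant byte is stored at the lowest address. */
int is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* inspect the byte at the lowest address */
    return first == 1;
}

/* Reverses the four bytes of a 32-bit value: 00 00 12 ab -> ab 12 00 00. */
uint32_t reverse_bytes32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}
```

On a little-endian machine, the bytes of 0x000012ab appear in a disassembly listing in the order produced by `reverse_bytes32`.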
Representing integers

Representing pointers — memory addresses of other variables

Representing strings
- Strings in C
  - Represented by array of characters
  - Each character encoded in ASCII format
    - Standard 7-bit encoding of character set
    - Character “0” has code 0x30
      - Digit i has code 0x30+i
  - String should be null-terminated
    - Final character = 0
- Compatibility
  - Byte ordering not an issue

Bit-level manipulations
- Boolean algebra
  - Operate on bit vectors
  - Operations applied bitwise

All properties of Boolean algebra apply
Representation
- Width w bit vector represents subsets of {0, …, w-1}
- a_j = 1 if j ∈ A
  - 01101001 { 0, 3, 5, 6 }
  - 76543210
  - 01010101 { 0, 2, 4, 6 }
  - 76543210
Operations
- Bit-level
  - & Intersection 01000001 { 0, 6 }
  - | Union 01111101 { 0, 2, 3, 4, 5, 6 }
  - ^ Symmetric difference 00111100 { 2, 3, 4, 5 }
  - ~ Complement 10101010 { 1, 3, 5, 7 }
  - Available in C
    - Apply to any “integral” data type
      - long, int, short, char, unsigned
    - View arguments as bit vectors
    - Arguments applied bit-wise
- Logical
  - &&, ||, !
    - View 0 as “False”
    - Anything nonzero as “True”
    - Always return 0 or 1
- Shift
  - Left Shift: x << y
    - Shift bit-vector x left y positions
    - Throw away extra bits on left
    - Fill with 0’s on right
  - Right Shift: x >> y
    - Shift bit-vector x right y positions
    - Throw away extra bits on right
    - Logical shift: fill with 0’s on left
    - Arithmetic shift: replicate most significant bit on left
  - Undefined Behavior
    - Shift amount < 0 or ≥ word size
    - Different machines behave differently
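
The set view of bit vectors and the bitwise/logical distinction can be checked with a short C sketch. The function names are illustrative only. The shift helper uses an unsigned operand, so it shows only the logical shift.

```c
#include <stdint.h>

/* Bit vectors as sets over {0..7}: bit j is 1 when j is in the set. */
uint8_t set_intersection(uint8_t a, uint8_t b) { return a & b; }
uint8_t set_union(uint8_t a, uint8_t b)        { return a | b; }
uint8_t set_symdiff(uint8_t a, uint8_t b)      { return a ^ b; }
uint8_t set_complement(uint8_t a)              { return (uint8_t)~a; }

/* Logical operators treat any nonzero value as "true" and yield 0 or 1. */
int logical_and(int a, int b) { return a && b; }

/* Right shift of an unsigned value is a logical shift: 0s fill in on the left. */
uint8_t shift_right_logical(uint8_t x, unsigned k) { return (uint8_t)(x >> k); }
```

Note that `0x69 & 0x96` is 0 (the sets are disjoint), while `0x69 && 0x96` is 1 (both operands are nonzero).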
Integers
- Representation: unsigned and signed
- Encoding integers
  - Unsigned

Sign magnitude

Two’s complement
- Sign bit: the most significant bit
  - 0 for non-negative
  - 1 for negative

Numeric ranges

Unsigned
- UMin = 0 — 000…0
- UMax = 2^w - 1 — 111…1
Two’s complement
- TMin = -2^(w - 1) — 100…0
- TMax = 2^(w-1) -1 — 011…1
- -1 — 111…1
Observations
- |TMin| = TMax + 1
  - Asymmetric range
- UMax = 2 * TMax + 1
C Programming
- #include <limits.h>
- Declares constants
  - ULONG_MAX
  - LONG_MAX
  - LONG_MIN
- Values platform specific
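
The two observations about the ranges can be checked against the `<limits.h>` constants. This is a small sketch that assumes `int` uses two's complement with no padding bits, which holds on mainstream platforms. The function name is made up for illustration.

```c
#include <limits.h>

/* Returns 1 when |TMin| = TMax + 1 and UMax = 2*TMax + 1 hold for int. */
int int_ranges_consistent(void)
{
    /* |TMin| computed in unsigned arithmetic to avoid signed overflow */
    unsigned abs_tmin = 0u - (unsigned)INT_MIN;
    int asymmetric = (abs_tmin == (unsigned)INT_MAX + 1u);
    int umax_rule  = (UINT_MAX == 2u * (unsigned)INT_MAX + 1u);
    return asymmetric && umax_rule;
}
```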
Properties
- Equivalence
  - Same encodings for nonnegative values
- Uniqueness
  - Every bit pattern represents unique integer value
  - Each representable integer has unique bit encoding
- Can Invert Mappings
  - U2B(x) = B2U^(-1)(x)
    - Bit pattern for unsigned integer
  - T2B(x) = B2T^(-1)(x)
    - Bit pattern for two’s comp integer
Conversion, casting
- Keep same bit representations and reinterpret

C Programming
- Constants
  - Default is signed integers
  - Unsigned if have “U” as suffix
    - 4294967259U
- Casting
  - Explicit casting

Implicit casting

In an expression that mixes signed and unsigned int, the signed operand is implicitly cast to unsigned (!)
- Common source of bugs
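
A minimal sketch of this bug: the first function compares the way C does by default, and the second guards against the conversion. Both names are hypothetical.

```c
/* Mixed comparison: the int argument is converted to unsigned first,
   so a negative a becomes a very large value. */
int less_than_mixed(int a, unsigned b)
{
    return a < b;
}

/* Same comparison with the negative case handled before converting. */
int less_than_safe(int a, unsigned b)
{
    return a < 0 || (unsigned)a < b;
}
```

With these definitions, `less_than_mixed(-1, 0u)` reports that -1 is not less than 0, while `less_than_safe(-1, 0u)` gives the expected answer.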

Expanding, truncating
- Expanding
  - Unsigned: zeros added
  - Sign extension: convert w-bit signed integer x to (w+k)-bit integer with same value
    - Make k copies of sign bit

C automatically performs sign extension
Truncating
- For unsigned numbers
  - Equivalent to dividing by 2^k and keeping the remainder
    - truncate(x, k) = x mod 2^k
- For signed numbers
  - Same bit result, but truncated number may have different sign (!)
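
Both operations can be written out on raw bit patterns. This is a sketch with illustrative helper names; in real code the compiler performs sign extension when a narrower signed type is converted to a wider one.

```c
#include <stdint.h>

/* Sign extension from 16 to 32 bits: copy the sign bit into the 16 new high bits. */
uint32_t sign_extend16(uint16_t bits)
{
    uint32_t wide = bits;
    if (bits & 0x8000u) {
        wide |= 0xffff0000u;
    }
    return wide;
}

/* Truncation to the low k bits (0 < k < 32), i.e. x mod 2^k. */
uint32_t truncate_bits(uint32_t x, unsigned k)
{
    return x & ((UINT32_C(1) << k) - 1u);
}
```

For example, 0x00018000 is positive as a 32-bit signed value, but truncating it to 16 bits leaves 0x8000, whose sign bit is set.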

Addition, negation, multiplication, shifting
- Negation: complement & increment
  - Claim: following holds for two’s complement
    - ~x + 1 == -x
  - Complement
    - Observation: ~x + x == 1111…111 == -1
- Unsigned addition
  - Operands: w bits
  - True sum: w+1 bits
  - Discard carry: w bits

Standard addition function
- Ignores carry output
Implement modular arithmetic
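
With w = 8 the discarded carry and the complement-and-increment rule are easy to see. A small sketch, with function names chosen for illustration:

```c
#include <stdint.h>

/* 8-bit unsigned addition: the true sum needs 9 bits, the carry is dropped,
   so the result is (u + v) mod 256. */
uint8_t uadd8(uint8_t u, uint8_t v)
{
    return (uint8_t)(u + v);
}

/* Two's complement negation on the bit pattern: complement, then add 1. */
uint8_t negate8(uint8_t x)
{
    return (uint8_t)(~x + 1u);
}
```

Adding a value to its negation wraps around to 0, and negating the pattern 0x80 (TMin for w = 8) gives 0x80 again.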

Mathematical properties
Two’s complement addition
- Operands: w bits
- True sum: w+1 bits
- Discard carry: w bits

TAdd and UAdd have Identical Bit-Level Behavior
- Signed vs. unsigned addition in C:
  
  int s, t, u, v;
  
  s = (int) ((unsigned) u + (unsigned) v);
  
  t = u + v;
  
  Will give s == t
Multiplication — shifting & adding

Works the same for signed and unsigned
- Same bit pattern
- Different interpretation
- Different overflows
Performance
- 10 or more machine cycles
- Multiply-and-Add instruction
- Easily pipelined
  - Split into separate operations, processed in parallel
Compiler optimizations
- Small, constant multipliers
  - E.g., array indexes
- Shift instructions followed by adds
- Specialized instructions
Power-of-2 Multiply with Shift
- Operation
  - u << k gives u * 2^k
  - Both signed and unsigned
- Most machines shift and add faster than multiply
  - gcc generates this code automatically
  - x*12 → (x + x*2) << 2
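
The rewrite of x*12 can be written out by hand. This sketch uses unsigned operands so that overflow wraps instead of being undefined; the function names are illustrative.

```c
/* x * 12 rewritten as (x + 2x) * 4, using one add and two shifts. */
unsigned long mul12(unsigned long x)
{
    return (x + (x << 1)) << 2;
}

/* u << k computes u * 2^k (k must be smaller than the width of the type). */
unsigned long mul_pow2(unsigned long u, unsigned k)
{
    return u << k;
}
```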
Division — subtracting and shifting
- Too many edge cases if using integers → limitations of integers → float
Floating Point
- Background: Fractional binary numbers

Representation
- Bits to right of “binary point” represent fractional powers of 2
- Represents rational number:

Observations
- Divide by 2 by shifting right
- Multiply by 2 by shifting left
- Numbers of form 0.111111…₂ are just below 1.0
  - 1/2 + 1/4 + 1/8 + … + 1/2^i + … → 1.0
  - Use notation 1.0 - ε
Representable numbers
- Limitations
  - Can only exactly represent numbers of the form x/2^k
  - Other rational numbers have repeating bit representations
  - Many, many bits needed for very large or small numbers with fixed binary point
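
A fractional binary string can be evaluated digit by digit: bits left of the binary point double the running value, and bits to the right contribute 1/2, 1/4, 1/8, and so on. `binary_fraction_value` is a hypothetical helper that assumes well-formed input.

```c
/* Hypothetical helper: evaluates a binary string such as "101.11". */
double binary_fraction_value(const char *s)
{
    double value = 0.0;
    double weight = 0.5;       /* weight of the next bit after the binary point */
    int after_point = 0;
    for (; *s != '\0'; s++) {
        if (*s == '.') {
            after_point = 1;
        } else if (!after_point) {
            value = value * 2.0 + (*s - '0');
        } else {
            value += (*s - '0') * weight;
            weight /= 2.0;
        }
    }
    return value;
}
```

For instance, "0.111111" evaluates to 63/64, which is 1.0 - ε with ε = 1/64.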
IEEE floating point standard: Definition
- A way to approximate real and most rational numbers in computers
- Examples:
  - 3.14159265358979323846 --- pi
  - 2.99792458 × 10^8 m/s --- c, the velocity of light
  - 6.62606885 × 10^-27 erg sec --- h, Planck’s constant
- In C (and most other programming languages):
  - 3.14159265358979323846
  - 2.99792458e8
  - 6.62606885e-27
- IEEE Standard 754
  - Established in 1985 as uniform standard for floating point arithmetic
  - Now supported by all major processors
- Driven by numerical concerns
  - Nice standards for rounding, overflow, underflow
  - Difficult to make fast in hardware
    - Numerical analysts predominated over hardware designers in defining standard
- Representation
  - Numerical form: (-1)^s × M × 2^E
    - Sign bit s determines whether number is negative or positive
    - Significand M normally a fractional value in range [1.0, 2.0) (for normalized values)
    - Exponent E weights value by power of two
  - Encoding
    - MSB s is sign bit s
    - exp field encodes E (but is not equal to E)
    - frac field encodes M (but is not equal to M)

Precisions
- Single precision: 32 bits
  - 1 — 8 — 23
- Double precision: 64 bits
  - 1 — 11 — 52
- Extended precision: 80 bits (Intel only)
  - 1 — 15 — 63/64
Normalized values
- Condition: exp != 000…0 and exp != 111…1
- Exponent coded as biased value: E = Exp - Bias
  - Exp: unsigned value exp
  - Bias = 2^(k - 1) - 1, where k is the number of exponent bits
    - Single precision: 127 (Exp: 1…254, E: -126…127)
    - Double precision: 1023 (Exp: 1…2046, E: -1022…1023)
- Significand coded with implied leading 1: M = 1.xxx…x₂
  - xxx…x: bits of frac
  - Minimum when 000…0 (M = 1.0)
  - Maximum when 111…1 (M = 2.0 - ε)
  - Get extra leading bit for “free”

Denormalized values
- Condition: exp = 000…0
- Exponent value: E = -Bias + 1 (instead of E = 0 - Bias)
- Significand encoded with implied leading 0: M = 0.xxx…x₂
  - xxx…x: bits of frac
- Cases
  - exp = 000…0, frac = 000…0
    - Represents zero value
    - Note distinct values: +0 and -0 (why?)
  - exp = 000…0, frac ≠ 000…0
    - Numbers very close to 0.0
    - Lose precision as get smaller
    - Equispaced

Special values
- Condition: exp = 111…1
- Case: exp = 111…1, frac = 000…0
  - Represents value ∞ (infinity)
  - Operation that overflows
  - Both positive and negative
  - E.g., 1.0/0.0 = −1.0/−0.0 = +infinity, 1.0/−0.0 = −infinity
- Case: exp = 111…1, frac ≠ 000…0
  - Not-a-Number (NaN)
  - Represents case when no numeric value can be determined
  - E.g., sqrt(-1), infinity − infinity, infinity * 0
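
The three cases (normalized, denormalized, special) can be told apart by pulling the fields out of a single-precision value. This is a sketch that assumes `float` is the 32-bit IEEE 754 format with the 1/8/23 layout; the type and function names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned sign;   /* 1 bit */
    unsigned exp;    /* 8-bit biased exponent field */
    uint32_t frac;   /* 23-bit fraction field */
} float_fields;

typedef enum {
    FP_KIND_ZERO, FP_KIND_DENORM, FP_KIND_NORMAL, FP_KIND_INF, FP_KIND_NAN
} fp_kind;

/* Copies the bit pattern out of the float and slices it into s, exp, frac. */
float_fields split_float(float f)
{
    uint32_t bits;
    float_fields out;
    memcpy(&bits, &f, sizeof bits);
    out.sign = bits >> 31;
    out.exp  = (bits >> 23) & 0xffu;
    out.frac = bits & 0x7fffffu;
    return out;
}

/* Applies the conditions on exp and frac from the notes. */
fp_kind classify(float_fields p)
{
    if (p.exp == 0)    return p.frac == 0 ? FP_KIND_ZERO : FP_KIND_DENORM;
    if (p.exp == 0xff) return p.frac == 0 ? FP_KIND_INF  : FP_KIND_NAN;
    return FP_KIND_NORMAL;
}
```

For a normalized value, E is recovered as `exp - 127`; for example, 15213.0f has exp = 140, so E = 13.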

Properties
- FP zero same as integer zero
  - All bits = 0
- Can (almost) use unsigned integer comparison
  - Must first compare sign bits
  - Must consider −0 = 0
  - NaNs problematic
    - Will be greater than any other values
    - What should comparison yield?
  - Otherwise OK
    - Denorm vs. normalized
    - Normalized vs. infinity
Rounding, addition, multiplication
- Floating point operations
  - Basic idea
    - First compute exact result
    - Make it fit into desired precision
      - Possibly overflow if exponent too large
      - Possibly round to fit into frac

Rounding
- Rounding modes (illustrate with $ rounding)
  - Towards zero, round down, round up, nearest even (default)
- Round-to-even
  - Default Rounding Mode
  - Hard to get any other kind without dropping into assembly
  - All others are statistically biased
    - Sum of set of positive numbers will consistently be over- or under-estimated
- Rounding binary numbers
  - Binary Fractional Numbers
    - “Even” when least significant bit is 0
    - “Half way” when bits to right of rounding position = 100…₂
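
Round-to-even on binary numbers can be sketched with integer operations: drop the low k bits, and break an exact tie by looking at the least significant kept bit. `round_half_even` is an illustrative helper, not a library function.

```c
/* Drops the low k bits of x (1 <= k < width of unsigned), rounding to nearest
   and breaking ties toward an even result. */
unsigned round_half_even(unsigned x, unsigned k)
{
    unsigned kept    = x >> k;
    unsigned dropped = x & ((1u << k) - 1u);
    unsigned half    = 1u << (k - 1);        /* the pattern 100...0 */
    if (dropped > half || (dropped == half && (kept & 1u))) {
        kept++;                              /* round up; ties only when kept is odd */
    }
    return kept;
}
```

With k = 2, the input 10110₂ (101.10₂, exactly halfway, kept part odd) rounds up to 110₂, while 10010₂ (100.10₂, halfway, kept part even) stays at 100₂.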
Multiplication

Exact Result: (-1)^s × M × 2^E
- Sign s: s1 ^ s2
- Significand M: M1 × M2
- Exponent E: E1 + E2
Fixing
- If M ≥ 2, shift M right, increment E
- If E out of range, overflow
- Round M to fit frac precision
Implementation
- Biggest chore is multiplying significands
Addition

Fixing
- If M ≥ 2, shift M right, increment E
- If M < 1, shift M left k positions, decrement E by k
- Overflow if E out of range
- Round M to fit frac precision
Floating point in C
- C guarantees two levels
  - float single precision
  - double double precision
- Conversions/casting
  - Casting between int, float, and double changes bit representations
- double/float → int
  - Truncate fractional part
  - Not defined when out-of-range, NaN, etc.
- int → double
  - Exact conversion for numbers that fit into ≤ 53 bits
- int → float
  - Round according to rounding mode
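
The conversion rules can be observed directly. This sketch assumes a 32-bit `int` and IEEE 754 `float`/`double`; the function names are illustrative.

```c
/* double -> int drops the fractional part (rounds toward zero).
   Only defined when the truncated value fits in int. */
int to_int_truncating(double d)
{
    return (int)d;
}

/* int -> float may round because float carries only 24 significand bits.
   The check goes through double, which represents every 32-bit int exactly. */
int survives_float(int x)
{
    float f = (float)x;
    return (double)f == (double)x;
}
```

2^24 = 16777216 converts to float exactly, but 16777217 needs 25 significand bits and is rounded.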
Machine-Level Programming
- Architecture
  - Design of the computer
  - Instruction Set Architecture: The parts of a processor design that one needs to understand to write assembly/machine code
    - Also known as the ISA
    - I.e., specification of instruction formats, actions, registers, etc.
  - Microarchitecture: Implementation of the architecture
    - Examples: cache sizes and core frequency.
- Code Forms
  - Machine Code: The byte-level programs that a processor executes
  - Assembly Code: A text representation of machine code
- Example ISAs
  - Intel: x86, IA32, Itanium, x86-64
  - ARM: Used in almost all mobile phones, many tablets
- Assembly/Machine Code View

Programmer-visible state
- PC: Program counter
  - Address of next instruction
  - Called “RIP” (x86-64 instruction pointer)
- Register file
  - Heavily used program data
  - Fast, easily accessible storage for the data the program is currently working with
- Condition codes
  - Store status information about most recent arithmetic or logical operation
  - Used for conditional branching
    - Determine which instruction to run next based on the result of a previous operation
- Memory
  - Larger-scale storage (than registers) for the program’s code and data.
  - Byte addressable array
  - Code and user data
    - Code stores the instructions to be executed
    - Data stores the information the program is working with
  - Stack to support procedures (aka functions)
- CPU
  - Archaic term for “Processor”
    - “Central Processing Unit”
Execution model for modern processors: von Neumann cycle (fetch-decode-execute)

Fetch: The program counter (PC) points to the memory address of the next instruction to be executed. The CPU fetches this instruction from memory.
Decode: The CPU decodes the instruction and gets any data it needs (possibly from memory).
Access Memory: If the instruction requires data from memory (e.g. a load or store operation), the CPU uses the address specified in the instruction to access the desired memory location.
Write to Register: Once the data is retrieved from memory, the CPU writes that data into one of the registers in the register file. This allows the program to use that data as part of the current operation.
Repeat.
Turning C into Object Code

Compiling Into Assembly

Assembly Characteristic: Data Types
- “Integer” data of 1, 2, 4, or 8 bytes
  - Data values
  - Addresses (untyped pointers)
- Floating point data of 4, 8, or 10 bytes
- Code: Byte sequences encoding series of instructions
- No aggregate types such as arrays or structures
  - Just contiguously allocated bytes in memory
Assembly Characteristics: Operations
- Move/copy data between memory and register
  - Load data from memory into register
  - Store register data into memory
- Perform arithmetic or logical function on register or memory data
- Transfer control
  - Unconditional jumps to/from procedures
  - Conditional branches
Object Code
- Assembler
  - Translates .s into .o
  - Binary encoding of each instruction
  - Nearly-complete image of executable code
  - Missing: linkages between code of different files
- Linker
  - Resolves references between files
  - Combines with static run-time libraries
    - E.g., code for malloc, printf
  - Some libraries are dynamically linked
    - Linking occurs when program begins execution
- Machine Instruction Example

Disassembling Object Code

Disassembler
- objdump -d sum
- Useful tool for examining object code
- Analyzes bit pattern of series of instructions
- Produces approximate rendition of assembly code
- Can dump either a.out file (complete executable) or .o file (single module)
What can be disassembled?
- Anything that can be interpreted as executable code
- Disassembler examines bytes and reconstructs assembly source
Assembly basics: registers, operands, move

Moving Data
- movq Source, Dest
- Allows CPU to
  - load values from memory (can be code, data, stack, or heap memory) into registers
  - store register values back into memory
- Operand types
  - Immediate: Constant integer data
    - Example: $0x400, $-533
    - Like C constant, but prefixed with ’$’
    - Encoded with 1, 2, or 4 bytes
  - Register: One of 16 integer registers
    - Example: %rax, %r13
    - But %rsp reserved for special use
    - Others have special uses for particular instructions
  - Memory: 8 consecutive bytes of memory at the address given by the register
    - Simplest example: (%rax)
    - Various other “address modes”
      - Direct addressing: movq 0x1000, %rax (moves the 64-bit value at address 0x1000 into rax)
      - Indirect addressing: movq (%rax), %rbx (moves the 64-bit value at the address stored in rax into rbx)
      - Base-plus-offset addressing: movq 8(%rax), %rbx (moves the 64-bit value at the address rax + 8 into rbx)
- Operand Combinations

Simple Memory Addressing Modes
- Normal (R) — Mem[Reg[R]]
  - Register R specifies memory address
  - Aha! Pointer dereferencing in C
  - movq (%rcx),%rax
- Displacement D(R) — Mem[Reg[R]+D]
  - Register R specifies start of memory region
  - Constant displacement D specifies offset
  - movq 8(%rbp),%rdx
Complete Memory Addressing Modes
- Most General Form
  - D(Rb,Ri,S) — Mem[Reg[Rb]+S*Reg[Ri]+ D]
  - D: Constant “displacement” 1, 2, 4, 8 … bytes
  - Rb: Base register: Any of 16 integer registers
  - Ri: Index register: Any, except for %rsp
  - S: Scale: 1, 2, 4, or 8 (why these numbers?)
- Special Cases
  - (Rb,Ri) Mem[Reg[Rb]+Reg[Ri]]
  - D(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]+D]
  - (Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]]
- Examples
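
The address computation for the general form D(Rb,Ri,S) can be modelled as a small C function. This is only a model of the arithmetic, not of the instruction encoding; the register values in the usage note are assumed for illustration.

```c
#include <stdint.h>

/* Effective address for D(Rb,Ri,S): Reg[Rb] + S*Reg[Ri] + D.
   Pass 0 for a register that is omitted and 1 for an omitted scale. */
uint64_t effective_address(int64_t d, uint64_t rb, uint64_t ri, unsigned s)
{
    return rb + (uint64_t)s * ri + (uint64_t)d;
}
```

Assuming %rdx holds 0xf000 and %rcx holds 0x0100, the operand 0x8(%rdx) gives 0xf008, (%rdx,%rcx) gives 0xf100, (%rdx,%rcx,4) gives 0xf400, and 0x80(,%rdx,2) gives 0x1e080.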

Arithmetic & logical operations
- leaq Src, Dst (Load Effective Address)
  - Src is address mode expression
  - Set Dst to address denoted by expression
- Uses
  - Computing addresses without a memory reference
    - E.g., translation of p = &x[i];
  - Computing arithmetic expressions of the form x + k*y
    - k = 1, 2, 4, or 8
- Example

Some arithmetic instructions
- Format: Computation
- addq Src,Dest: Dest = Dest + Src
- subq Src,Dest: Dest = Dest − Src
- imulq Src,Dest: Dest = Dest * Src
- salq Src,Dest: Dest = Dest << Src (also called shlq)
- sarq Src,Dest: Dest = Dest >> Src (arithmetic)
- shrq Src,Dest: Dest = Dest >> Src (logical)
- xorq Src,Dest: Dest = Dest ^ Src
- andq Src,Dest: Dest = Dest & Src
- orq Src,Dest: Dest = Dest | Src
Watch out for argument order!
No distinction between signed and unsigned int
One-operand instructions
- incq Dest: Dest = Dest + 1
- decq Dest: Dest = Dest − 1
- negq Dest: Dest = −Dest
- notq Dest: Dest = ~Dest
Assembly instructions - suffixes
- “b” applies to byte quantities (i.e., 8-bit operands)
- “w” applies to word quantities (i.e., 16-bit operands)
- “l” applies to long quantities (i.e., 32-bit operands)
- “q” applies to quad-word quantities (i.e., 64-bit operands)
- Used to load and store data of the corresponding sizes --- i.e., char, short, int, long int (in both signed and unsigned versions)
Machine-Level Programming II: Control
- Control: Condition codes
  - Processor State (x86-64, Partial)

Condition Codes (Implicit Setting)
- Single bit registers
  - CF Carry Flag (for unsigned)
  - SF Sign Flag (for signed)
  - ZF Zero Flag
  - OF Overflow Flag (for signed)
- An “implicit” side effect of (most) arithmetic or logical operations
  - Example: addq Src,Dest ↔ t = a+b
  - CF set if carry out from most significant bit (unsigned overflow)
  - ZF set if t == 0
  - SF set if t < 0 (as signed)
  - OF set if two’s-complement (signed) overflow
  
  (a>0 && b>0 && t<0) || (a<0 && b<0 && t>=0)
Not set by leaq instruction
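
The four flags for an addition can be modelled in C to make the definitions concrete. This is a sketch of the flag rules only, not of how the hardware computes them; `cond_codes` and `flags_after_add` are hypothetical names.

```c
#include <stdint.h>

typedef struct { int cf, zf, sf, of; } cond_codes;

/* Models the flags an addq would leave behind for t = a + b.
   The sum is formed in unsigned arithmetic so wraparound is well defined. */
cond_codes flags_after_add(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t ut = ua + ub;
    int t_negative = (int)(ut >> 63);    /* sign bit of the 64-bit result */
    cond_codes cc;
    cc.cf = ut < ua;                     /* carry out of the most significant bit */
    cc.zf = ut == 0;
    cc.sf = t_negative;
    cc.of = (a > 0 && b > 0 && t_negative) || (a < 0 && b < 0 && !t_negative);
    return cc;
}
```

Adding 1 to the largest signed value sets OF and SF but not CF, while adding 1 to -1 sets CF and ZF but not OF. This is why CF is read for unsigned comparisons and OF for signed ones.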
Condition Codes (Explicit Settings)
- Explicit Setting by Compare Instruction
  - cmpq Src2, Src1
  - cmpq b,a like computing a-b without saving difference in any destination
  - CF set if carry out from most significant bit (used for unsigned comparisons)
  - SF set if (a-b) < 0 (as signed)
  - ZF set if a == b
  - OF set if two’s-complement (signed) overflow
  
  (a>0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b)>0)
Explicit Setting by Test Instruction
- testq Src2, Src1
  - testq b,a like computing a&b without setting destination
- Sets condition codes based on value of Src1 & Src2
- Useful to have one of the operands be a mask
- ZF set when a&b == 0
- SF set when a&b < 0
Reading Condition Codes
- SetX Instructions
- Set low-order byte of destination to 0 or 1 based on combinations of condition codes
- Does not alter remaining 7 bytes

One of addressable byte registers
- Does not alter remaining bytes
- Typically use movzbl to finish job
  - 32-bit instructions also set upper 32 bits to 0

Conditional branches
- jX instructions
  - Jump to different part of code depending on condition codes

Example (Old Style)

Expressing with Goto Code
- C allows goto statement
- Jump to position designated by label

General Conditional Expression Translation (Using Branches)
- C Code
  - val = Test ? Then_Expr : Else_Expr;
  - val = x > y ? x-y : y-x;
- Goto Version

Create separate code regions for then & else expressions
Execute appropriate one
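
One way the conditional expression `val = x > y ? x-y : y-x;` could be written in goto form is sketched below. It mirrors the branch structure a compiler might emit; the label and variable names are chosen for illustration.

```c
/* Conditional expression written with explicit branches. */
long absdiff_goto(long x, long y)
{
    long result;
    int ntest = !(x > y);     /* negated test: jump to the else region when true */
    if (ntest) goto Else;
    result = x - y;           /* then region */
    goto Done;
Else:
    result = y - x;           /* else region */
Done:
    return result;
}
```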
Visualizing pipeline behavior of CPU

Loops
Switch Statements
Machine-Level Programming III: Procedures (aka Functions)
- Mechanisms in Functions
  - Passing control
    - To beginning of function code
    - Back to return point
  - Passing data
    - function arguments
    - Return value
  - Memory management
    - Allocate during function execution
    - Deallocate upon return
  - Mechanisms all implemented with special machine instructions, and a set of conventions
  - x86-64 implementation of a function uses only those mechanisms required
- Every running program has its own address space
- A key component of that address space is the stack
- Functions
  - Stack Structure
  - Calling Conventions
    - Passing control
    - Passing data
    - Managing local data
  - Illustration of Recursion
Machine-Level Programming IV: Data (Arrays, Structures, Alignment)
- Arrays
  - One-dimensional
    - A collection of objects of the same type stored contiguously in memory under one name
      - May be any type of object
      - May be objects of the same class (C++)
      - May even be collection of arrays of the same types!
    - For ease of access to any member of array
    - For passing to functions as a collection
  - Array Allocation
    - Basic Principle
  
  T A[L];
Array of data type T and length L
Contiguously allocated region of L * sizeof(T) bytes in memory

Array Access
- Basic Principle
  
  T A[L];
Array of data type T and length L
Identifier A can be used as a pointer to array element 0: Type T*
Example

Array Accessing Example

Array Loop Example

Multi-dimensional (nested)
- Declaration
  
  T A[R][C];
2D array of data type T
R rows, C columns
Type T element requires K bytes
Array Size
- R * C * K bytes
Arrangement
- Row-Major Ordering

Example

Nested Array Row Access
- Row Vectors
  - A[i] is array of C elements
  - Each element of type T requires K bytes
  - Starting address A + i * (C * K)
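
The row-address formula can be compared against what the compiler does for a real nested array. The dimensions and names below are illustrative and do not come from the notes.

```c
#include <stddef.h>

enum { ROWS = 4, COLS = 5 };      /* illustrative dimensions */
static int grid[ROWS][COLS];

/* Byte offset of row i in an array with c columns of k-byte elements: i * (C * K). */
size_t row_offset(size_t i, size_t c, size_t k)
{
    return i * (c * k);
}

/* Offset of grid[i] from the start of grid, measured in bytes. */
size_t actual_row_offset(size_t i)
{
    return (size_t)((const char *)grid[i] - (const char *)grid);
}
```

Because of row-major ordering, `row_offset(i, COLS, sizeof(int))` matches `actual_row_offset(i)` for every valid row, and the whole array occupies R * C * K bytes.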
Concurrency model in assembly?
Multi-level
Structures/Unions
- Allocation
- Access
- Alignment
Floating Point