Skip to content

CS 2303: System Programming Concepts

  • Virtual machines
    • An emulation of a computer system
      • Running a computer OS (the guest) inside another computer’s OS (the host)
        • The host runs the VM software application
        • The VM software application emulates the guest computer’s hardware
        • The guest OS thinks it is running directly on the underlying hardware
  • Compiled / Interpreted / Just-In-Time compiled languages
    • Compiled languages convert directly to machine code
    • Interpreters run through a program line by line and execute each command
    • JIT compilation improves the performance of interpreted programs
      • While the interpreted program is being run, the JIT compiler determines the most frequently used code and compiles it to machine code
  • Programming language paradigms
    • Imperative
      • Program is often a sequence of function calls
        • Functions accept data (not consume it), can modify data, can return results
      • Variables hold data - Which can persist
    • Functional
      • Feed data into a function, feed result into another function
        • Functions consume data, functions produce new data, might have side-effects
    • Object-Oriented
      • Objects have data and behavior: invoke object to do something to itself
    • Scripting
      • Easy to throw together a program
  • C language
    • Designed for systems programming
      • OS, compilers, utilities, assemblers, text editors, databases, network drivers,…
    • Was invented to write the OS UNIX
    • Features
      • Runs fast
      • Mix of high-level and low-level capabilities
        • Sometimes called a mid-level language
      • Good for interacting with hardware
    • Data types determine how much space the variable occupies in storage and how the bit pattern stored is interpreted
      • Fundamental types
        • char
        • int
        • float
        • double
        • long
        • short
          • short <= int <= long
      • The type void
        • function return
        • function argument
        • pointers of type void * represents the address of an object, but not its type, can be casted to any data type
      • Reference types
        • pointers
        • arrays
        • structures
        • unions
        • functions
      • Control-flow constructions
        • if-else, switch, while, for, do, break
  • Purposes of systems programming
    • Systems programs
      • Programs to make computers work efficiently
      • Tools for building and running applications
      • Building other programs and applications
    • Applications
      • Programs to make people / businesses productive
      • Programs to do useful work using the computer
      • Acquiring, managing, analyzing information
  • Priorities
    • Systems programs
      • Speed of operation — programs need to run fast to be productive
      • Compactness — minimize resources to keep costs low
      • Flexibility in use — get stuff done
      • Manipulation of low-level resources — make the computer useful
    • Application programs
      • Speed of development — create a solution to an existing problem
      • Ease of modification — keep up with evolving needs / expectations
      • Suitability for the problem domain — be intuitive to program users
  • Language evolution
    • Machine language
    • Assembly
    • High-level languages
  • C++ vs Java or Python
    • C/C++ run-time vs. Java or Python
      • C/C++ are compiled languages, Java JIT, Python interpreted
        • Better low-level control & performance
    • Developer responsible for ‘housekeeping’ such as memory management
      • Less run-time checking
        • Minimal run-time checking for C
    • Newer releases of C++
      • Strive to address many run-time issues
        • Sometimes at cost of performance
          • Some run-time errors so common it makes sense to detect
      • Compiler technology has advanced significantly
        • More able to determine possible run-time errors during compilation
  • File & directory permissions

Linux permissions

  • Filename conventions
    • code files (source code)
      • source.c common for C language
      • header.h for C language header (include) files
      • source.cpp or source.cxx common for C++ language
      • header.hpp or hxx for C++ header (include) files
    • object files (output from compiler)
      • source.o same name as source file with different extension
    • library files
      • library.a in Unix/Linix
      • library.lib in Microsoft Windows
    • dynamically-linked library
      • libary.so in Unix/Linux
      • libary.dll in Microsoft Windows
    • executable files─ filename (no extension) in Unix/Linux
      • filename.exe in Microsoft Windows
  • Editing, compiling & linking

Linking and compiling

  • The preprocessor provides the ability for the inclusion of header files, macro expansions, conditional compilation, and line control
  • Compiler: gcc (g++)
  • Linker: ld
  • Best practice: in C or C++ programs, keep all the constants, macros, system wide global variables, and function prototypes in the header files and include that header file wherever it is required
  • #include <calcGrades.h> tells the preprocessor to search for the header file in the system include directories. These directories are typically set during the installation of the compiler and include the standard C library headers. This is used for system-wide libraries and headers that come with the compiler.
    • Most are in /usr/include
  • #include “calcGrades.h” tells the preprocessor to search for the header file in the current directory and then in the directories specified by the -I option of the compiler. This is used for headers that are specific to the project and not a part of the system.
    • Put in sibling dir to source files ../include
  • Best practice: Use double quotes for user-defined headers and angle brackets for system headers, this way it’s easier to see where the headers are coming from.
  • Compiling with GCC
    • gcc knows filetypes by extension
      • Extensions helpful for context/tools even though not required by Linux
    • Simplest:
    • gcc file.c
    • Produces executable called a.out
    • Override default output file name:
      • gcc file.c —o file
      • e.g. gcc hello.c —o hello
    • Both invoke the compiler and then the linker.
    • Just compile to object:
    • gcc -c file.c
      • Produces object file called file.o
    • Link multiple files:
      • ld [opts] files
      • ld —o newhello file1.o file2.o
        • Also: ld hello.o foo.o —o newhello
        • Without —o, executable would be in a.out .
      • gcc file1.o file2.o —o file
    • How to use?
      • For simple files:
        • just override the default output file name.
      • For multiple files:
        • First, compile each source file with —c
        • Then, link all object files with overridden executable file name.
    • This can be automated with the make utility.
  • Primitive data types
    • int, bool, float, void, char
    • Match what the hardware supports
    • A bit is the smallest unit storage, stores a 0 or 1
    • Group of 8 bits is 1 byte
    • 1 bit holds 1 or 0 → binary representation → 0 to 255 maps to ASCII → chars
    • Size: short <= int <= long < long long
  • Symbolic constants
    • Textual substitution by the preprocessor
    • Different from typed constants, which are handled during the compilation phase
  • Storage classes
    • Defines the scope and lifetime of variables and/or functions
    • auto, register, static, extern
    • auto
      • Default storage class for all local variables
    • register - Define local variables that should be stored in a register instead of RAM

Storage classes

  • Variables MIGHT be stored in a register depending on hardware and implementation restrictions
  • static
    • Keep a local variable in existence during the lifetime of the program
    • Global static variables’ scope is restricted to the file in which it is declared
  • extern
    • Give a reference of a global variable that is visible to all program files
    • Used when there are two or more files sharing the same global variables or functions
    • Define a global variable or function in one file and use extern in other files
  • Binary operators
    • Operates on bit strings
    • A = 60 (0011 1100 binary) and B = 13 (0000 1101 binary) → a bit holds a
    • &
      • Binary AND copies a bit to the result if it exists in both operands
      • A & B = 12 (0000 1100)
    • |
      • Binary OR copies a bit if it exists in either operand
      • A | B = 61 (0011 1101)
    • ^
      • Binary XOR (exclusive OR) copies the bit if it is set in one operand but not both
      • A ^ B = 49 (0011 0001)
    • ~
      • Binary One’s Complement flip bits
      • ~A = ~(60) = (1100 0011)
    • <<
      • Binary Left Shift moves the left operand value by the number of bits specified by the right operand
      • A << 2 = 240 (1111 0000)
    • >>
      • Binary Right Shift
  • C program input and output
    • 3 I/O streams are provided by the OS to any program
      • Stream is a sequence of bytes
      • stdin - Standard Input Stream from terminal by default
      • stdout - Standard Output Stream defaults to terminal
      • stderr - Standard Error Stream defaults to terminal
  • Header files
    • Maintain consistency
      • All src files have same information
    • Reduce effort and duplication
    • Increase flexibility
    • Increase code clarity
    • What to put
      • Symbolic constants
      • Function prototypes
      • Data structure definitions
      • External variable definitions
      • Guards
      • NOT executable code
        • Macros border on exceptions
    • Using header files
      • In the source file (.c, .cpp, .cxx)
        • Use #include directive
      • Compiler (preprocessor) inserts the contents of the specified file
        • At that point of the source file
        • Textual insertion of the entire #included file
  • Guards in header files
    • Guard against multiple inclusions of same header file
      • Multiple inclusions would result in multiple definitions of same information
      • We use ‘guards’ to prevent more than one inclusion to a given source file
    • Works by defining a symbol first time the header is included
      • Definition acts as a guard (flag) to know if the file has already been included
    • To derive the symbol:
      • Convert to file name ALL CAPS.
      • Replace dot with underscore.
  • make utility
    • Build executable and library files from multiple sources
    • Multiple files go into each program
      • Modularity, Organization, Reuse other programs, Multiple Developers, etc.
    • When compiled, each .c, .cxx file produces one .o file.
      • Each C/C++ source compiles to one object file
      • Typically, object file has same base name as source file (can be renamed)
    • Multiple .o files can be linked to form one executable of library file
    • Source files can include other files
      • e.g., Header files (.h, .hxx, .hpp) with declarations and prototypes
      • Possible to include other source files, but frowned upon
    • Does things based upon dependencies
      • Usually compiling, linking, copying or moving…
    • make tries to do the minimum needed
      • Ensures everything is up-to-date with minimal work
    • Each project needs a makefile
      • Named makefile by default - sometimes Makefile
    • Source files depend on header files
      • make utility uses timestamp of files to see what’s newer and determine if something needs rebuilding
    • Each section of the makefile (aka rule) specifies
      • Target (the file to be generated)
      • Dependencies (files the target depends on)
      • Commands (what to do if dependency has changed)
    • Each section starts at the left-most column
      • Each line in the rest of the section is indented with a tab
    • First section in makefile is the default “rule”
      • Used if you run make with no parameters
        • The make utility looks to default section to figure out what rules to perform
    • makefile formatting - General format for each section / rule

Makefile

  • Commands indented with a tab character
  • The programmer’s responsibility to specify dependencies

Makefile example

  • The programmer’s responsibility to specify dependencies
  • Doxygen
    • Must have config file Doxyfile in project directory
    • A software program (tool) that will generate documentation from annotated C/C++ sources (as well as several other programming languages.) Doxygen program will scan source files looking for keywords and analyze program structure to:
    • Generate on-line documentation or off-line reference documentation. Doxygen can produce output documentation in a number of different protocols including, but not limited to, html, LaTex, RTF, PstScript, PDF, and Unix/Linux man pages.
    • Analyze and extract the code structure for undocumented source files.
    • Analyze keywords and structure to generate normal program documentation such as user manuals.
    • Accessing Doxygen HTML
      • Must be somewhere under your public_html directory
      • All files must be publicly readable
      • All directories above must be publicly executable
  • Casting
    • Instruct the compiler to do a conversion
      • Convert data from one type to another type for an assignment
    • Instruct the compiler to interpret data differently
      • How to interpret the memory location
      • How to interpret the bits in memory
    • Some type of conversion (implicit) is done automatically - especially promotion
      • No loss of information
      • float → double
      • char → int
      • int → long
      • int → double
    • Explicit type conversion can lose data due to truncation
      • int i = (int) 1.5;
    • Weaknesses of C conversion
      • Multiple meanings in different contexts.
        • Convert the data
        • Look at data differently
      • No way to specify conversion algorithm
        • Possible to write a function that does the conversion
      • Non-portable across platforms
        • Data storage sizes and representations can be different
      • Note: C++ adds more advanced casting
  • typedef
    • Allows creation of “aliases” for types
      • Greater readability
      • Greater portability
        • Since primitive types can be implementation-dependent
        • i.e., can have conditional compilation to create int_16 always 2 bytes
    • Dependence
      • Sometimes header file declarations depend on architecture
      • Some declarations must agree with underlying system / architecture
    • Abstraction
      • Often we want the data storage to be consistent between platforms
        • Some declarations want to be independent of underlying system / architecture
    • Syntax
      • Typedef existing_type new_type;
    • New type just like any other type to compiler
      • Use it as such
    • Examples
      • typedef unsigned short uint_16;
      • ─ typedef unsigned long uint_32;
      • ─ Typedef unsigned long size_t;
    • Will often be done using conditional compilation in header file so alias based on appropriate primitive type
    • Can give a name to user defined data types (unions, structs)
    • typedef vs #define
      • typedef is limited to giving symbolic names to types only where as #define can be used to define alias for values as well, q., you can define 1 as ONE etc.
      • typedef interpretation is performed by the compiler whereas #define statements are processed by the pre-processor.
  • Enums - User-defined data type
    • Symbolic names for integral constants
      • Limited set of values for enumerated type
    • Significantly improves readability
      • Good code style / hygiene
    • Provides type safety
      • Constrains valid ranges of integral values
    • Declaration
      • enum enum_name {val1, val2, valN};
    • Instantiation
      • enum enum_name myEnumVar;
    • Best practice: In general an enum is going to be used as a type definition and should always be in the header file
  • Array - sequence of data in contiguous memory locations
    • An array object is a linear collection (sequence) of storage locations for elements of the same type
    • C array is NOT an object
      • Only data, no behavior
    • Defining an array
      • Address of a[i] = &array_start + index_offset * size_data_type
    • Declaring arrays - For global scope - Arrays size must be constant known at compile time - So compilers know how much memory must be allocated at runtime - Must use constant literal or const variable for the size - For local scope (function, block) - Arrays can be of variable size - Size determined at runtime when they come into scope

Array in memory

  • Dynamic allocation of global arrays is possible using malloc()
  • An array name is a pointer to its first element
  • Accessing array element
    • The address is calculated
      • Address of a[i] = (address of start of array) + ((size of an element) * index)
        • Cost
          • 4 memory accesses
          • 1 multiplication
          • 1 addition
    • The value is read and/or written
  • Two-dimensional arrays
    • Similar to Python
  • Passing arrays as function arguments
    • Pointer as a formal parameter
      • void myFn(int *param)
    • Sized array as a formal parameter
      • void myFn(int param[10])
    • Unsized array as formal parameter
      • void myFn(int param[])
  • Return array from function
    • C does not allow returning an entire array, instead return a pointer to an array
    • The array’s name can be used in place of its pointer
    • C does not allow returning the address of a local variable, so the returned array would have to be static
  • Pointers, memory iteration, sorting
    • Pointer is a variable whose value is the address of another variable - The address of a location in memory that stores a known data type

Pointer

  • A pointer is also a group of bytes that can hold an address
  • A pointer p points to a char c
  • Thus, the variable is a pointer to an object (as in C, not OOP) of some type
  • Pointer declaration
    • Must declare the type of object variable will point to
    • Must declare name of variable that will be a pointer
    • Must declare the variable is a pointer
    • Declared with ”*” operator
      • type * pointer
  • Memory address is obtained with the & operator
    • & operator returns the memory address of some object

Memory address

  • De-referenced with ”*” operator

Pointer

  • Dereferencing is following the pointer indirection
    • Follow the pointer to the memory address it references
  • Best practice: assign a NULL value to a pointer variable in case there is no exact address
    • Programs are not permitted to access memory at address 0 (NULL value)
    • By convention, a NULL pointers are assumed to point to nothing
    • Check for null for pointer using if
  • Usage
    • Beware uninitialized pointers - Don’t know where they’re pointing
    • Pointers used incorrectly corrupt run-time environment
    • High-level languages don’t provide pointers
  • Increment pointer to move to the next memory address
    • Number of bytes moved is based on type
  • Decrement pointer to move to previous memory address
    • Number of bytes moved is based on type
  • Why do you want to do it?
    • Efficiency — fast to sequence through arrays in memory
    • Expressiveness — communicates code intent to reader
  • Do you want to increment/decrement the pointer or object pointed at?
  • Use parentheses to remove ambiguity
    • Increment object referenced: (*x)++; ++(*x);
    • Increment pointer before or after dereference: *(x++); *(++x);
  • Legal and illegal pointers
    • Safe To Point At
      • To static data — it won’t go away
      • To memory (variables) on the heap
        • But not after they are deallocated
      • To local variables and parameters
        • But only in same scope
    • Unsafe To Point At
      • To memory (variables) on the heap after the memory has been freed
      • To local variables and parameters after exit from the function or block
        • After their lifetime ends
  • Casting C-style pointers
    • All pointers are variables that store addresses to a type in memory
    • Compiler needs to know the type pointed at so it can:
      • Check for type mismatch in operations using pointer dereferences
      • Find data fields and functions in structured (non-primitive) objects
      • Stay aligned as pointers move through arrays
      • Calculate offsets for array elements
      • Do pointer arithmetic appropriately
      • Interpret bits pointed at correctly
    • Programmer responsible for casting pointer to right type - If you want to interpret bits in memory in different way - NOTE: this will not change the way bits exist in memory

Pointer casting

Pointer casting

  • Iterating arrays - iterating with index vs with pointers

Iterating through array

  • Program (compiler) does not need to address of array index each access
    • Pointer of matching data type can be incremented / decremented to iterate array
  • Programmer must ensure pointer doesn’t run past array bounds
  • The name of an array is automatically a pointer, does not need to declare another pointer variable
    • To the zeroth element of the array - start of the array in memory

Array indexing

Array indexing

  • The latter is slightly more efficient
    • Compiler doesn’t need to calculate array offset for each access
  • Memory segment different uses in system

Process memory layout

  • Code segment
    • Machine instructions generated from your source code
    • Loaded at execution time
    • Typically write-protected
      • Read access but not write access
    • Shared by multiple processes & threads
      • Shared libraries
      • Shared functions
      • Complete programs
    • Fixed size
      • Compiler/linker knows how big
  • Data segment
    • Global & static scope variables declared in program
    • Loaded at start of execution
    • Can’t be write protected
      • Programs need to write to variables
    • Every process has its own copy
      • Don’t want it changed by other programs
    • Fixed size
      • Compiler/linker knows how big
  • Heap segment
    • Dynamic memory allocations served from the heap
    • Assigned by OS at start of execution
    • Every process has its own
    • Chunks allocated and released at run time
      • Allocated using malloc()
      • Deallocated using free()
      • Best practice: Set values of freed pointers to NULL immediately after free to avoid undefined behavior when accessing those pointers after freeing
    • Dynamic size - Can grow & shrink
      • Compiler/link cannot know how big
    • C/C++ objects
      • Allocated from a heap at run time
        • When code creates object using C++ new
  • Stack segment
    • Automatic (local) variables grow and shrink like stack data struct
    • Sometimes multiple stack segments runtime environment
    • Assigned by OS at start of execution.
    • Each process has its own
    • Works just like a stack data structure
    • Holds automatic variables & housekeeping:
      • Function return address & values
      • Parameters passed to functions
      • Local (non-static) block variables
        • Functions, Code blocks
    • Dynamic size
    • Each entry into a function or block entry uses memory from stack
      • Pushed onto stack — grows to provide variable memory
      • Variable memory is not initialized unless you explicitly do it
    • Each function exit (return) or block exit releases stack memory
      • Stack pops and stack decreases in size. In reverse order to usage.
      • Stack pushed when entering function / block, Popped upon exiting
    • Memory values not cleared when released from stack
    • Stack memory values not cleared upon next function block entry - Uninitialized local / automatic variables will have leftover value - garbage

Used and freed memory

  • Accessing variables
    • When a local variable is accessed
      • The address is calculated
      • The value is read and/or written
    • For single variables this is fairly simple
      • Address = Start of stack frame + constant_offset
    • If you are dereferencing a pointer, you already have the address
  • Memory alignment

Memory alignment

  • Hardware often requires data and/or instructions to be aligned
    • Located at an addresses having certain limitations
  • Might be absolutely required or just preferred for efficiency
  • Commands & Data moved through computer on ‘bus’
    • These have number of bits in width
    • Think of lanes of a highway
  • Aligning data with bus characteristics greatly improve efficiency of data moves
  • Overall performance of computer significantly increased
  • Low-level memory allocation
    • As C system programmers we can allocate memory for our needs
      • As no automatic internal structure exists for raw (low-level) chunks of memory from the heap segment
    • Functions to allocate/deallocate memory - malloc(), calloc(), realloc(), free()

Memory allocation functions

  • void *malloc(size_t size)
    • Cast pointer returned to appropriate type
    • Returns pointer to block of memory big enough to hold size bytes.
      • size_t is a platform-dependent type guaranteed large enough to address anything
      • Usually typedef’d to an unsigned integer but not always
        • Big enough to hold the largest possible address or memory size.
    • Allocated memory is aligned to handle any type of primitive data
    • Returns NULL if error occurs
    • Memory is not cleared so it might have garbage values
  • void *calloc(size_t nmeb, size_t size)
    • Returns ptr to memory big enough to hold nmemb elements of size bytes each.
    • Enough space so nmemb elements can be properly aligned
      • The order of parameters are not interchangeable
      • First is number of elements, second is size (in bytes) of a single element
    • Entire chunk of memory is initialized to zero values before return
    • Returns NULL if error occurs
  • void *realloc(void *ptr, size_t size)
    • Can only be called for previously allocated memory
      • Memory allocated by malloc(), calloc(), or previous realloc()
    • Changes memory block to new size

      Will move to new address to different location in heap if necessary

  • Copies any existing data to the new location
  • Any additional memory added to the new chunk is not cleared
  • NULL ptr if memory cannot be reallocated
    • Existing memory is NOT freed or changed if reallocation fails
  • void free(void *ptr)
    • Releases memory back to heap that was previously allocated
      • Memory allocated using malloc(), calloc() or realloc()
    • You can only free memory pointed to by pointer ptr if it:
      • Points to memory that was dynamically allocated from the heap
      • Points to the start of the block that was dynamically allocated
    • No value or error results are returned
  • Parameters passing
    • Parameters are passed from calling code to function by pushing copies onto the stack segment of memory
    • In C, all parameters are passed by value
      • The calling code passes actual parameters
      • The receiving code gets formal parameters
    • How pass by value works:
      • Each actual parameter is evaluated, and a copy is made
      • Each formal parameter is bound to the copy just made
      • The function does its thing using the copy and other automatic variables
      • Any return value is made available to calling code
        • Calling code responsible for storing a copy of the return value
        • If return value is not saved by calling code, the value goes out of scope
      • Formal parameters & return value go out of scope and memory reused
  • sizeof()
    • Determine the storage size of things
    • Size is computed at compile time
    • Enhaces portability
      • Different systems have different type sizes and different alignment
      • Programmer doesn’t need to guess what system code will compile / run on
    • Avoids magic numbers
      • Can determine size at compile time for any system
      • Programmer doesn’t need to hard code values with conditional compilation
    • Context-aware for arrays
      • If the size of array is known at compile time
        • Returns the size of the entire array
      • Otherwise, returns the size of the pointer to the first element
  • C-Strings
    • C-Strings are just arrays of char

C string representation in memory

  • The ‘\0’ at the end is the null terminator - flags the end of a string
  • Each element is sizeof(char) - 1 byte
  • Each element is the ASCII value of bits representing the character
  • One character is enclosed in single-quotes
    • To specify ASCII value of character
    • char val1 = 'A';
      • Occupies 1 byte // no null termination
    • int val2 = ‘\n’
      • Occupies 4 bytes with ASCII value of newline (10) in LSB
  • Null-terminated string is enclosed in double-quotes
    • To specify ASCII values with binary zero appended
    • It is const static data type
    • char hello_string[] = Hello;
      • Occupies 6 bytes — 5 chars + null terminator
  • Non-printable characters
    • ‘\n’ = New Line (actually ASCII Line Feed)
    • ‘\r’ = Carriage Return (usually not needed — unless Windows)
    • ‘\t’ = Tab

Strings in C

  • stdin, stdout, stderr

stdin, stdout, stderr

  • Structures
    • A C-style struct is a way to group related data of different types together
      • In a user-defined data type - kind of like a C++ object without behavior
    • Declare in one place in your code
      • Use typedef to create an alias for clarity of use
      • Give the definition a name for reuse - most common.
    • Accessing struct fields
      • Dot notation
    • Structures as function arguments
      • Pass struct as function arguments/use struct as formal parameters/return values same way as variable/pointer
    • Pointers to structures
      • Define pointers to structures the same way as defining pointers to any other variable
      • Access a structure’s members using a pointer to that structure
        • struct_ptr->member
    • Bit fields
      • Allow the automatic packing of data in a structure as compactly as possible
      • Useful when memory is expensive
      • Examples
        • Packing several objects into a machine word
          • 1 bit flags can be compacted
        • Reading external file formats
      • Put :<bit width> after the variable
      • E.g: use unsigned int age : 3; to save variable of range [0, 2^3)
  • Unions
    • Allows storage of different data types (members) in the same memory location
    • Only one member can contain a value at any time
    • The memory occupied by a union will be large enough to hold the largest member of the union
    • Acess union members using dot notation
  • Hashing
    • Hash tables/hash maps

Hashing

  • Key gets hashed, i.e. converted to an index
  • Hash function is fast operation to create hash value
    • Very hard to get back to original key
    • Often used in crypto
  • Maps the key value to fixed size and range

Hash function qualities

  • Collisions will happen and must be handled
    • List list of student record for each hash bucket
    • Move colliding student record to first free bucket
    • Rehash with some deterministic change
  • Hash function
    • Must turn the key into an index
      • Essence: Turn larger number of bits to smaller number of bits
    • Must be efficient
    • Indexes must be well scattered
      • Low chance of collision
    • Algorithm choice is very dependent on type and values of key.\
    • Good for random numeric data:
      • Modulo
      • Multiply by a constant, add another constant.
    • Good for character strings or long data:
      • Sum of bytes
      • Shift and add, bit twiddling
      • Weighted sum of bytes
      • Polynomial
    • How to deal with upper / lower case on strings
  • Data structures
    • Object-oriented languages (C++ Language)
      • Each stack, queue, list, tree, etc. is a separate object.
      • Fields: Stack data area, size, pointer.
      • Methods: push(), pop(), etc.
      • Methods operate upon this stack.
    • Non-Object-oriented languages (C Language)
      • Data structure in its own (header) file
      • Series of functions: create(), push(), pop(), etc. in C source file
      • Need identifier if more than one instance of data structure
      • Must pass it in as a parameter.
      • Store the state of the data structure in a unique struct instance
  • System commands - fork and exec and child processes

system() library function

  • Shell: a program that reads commands and runs other programs
  • Invoke the OS to execute specified system command
    • As if you had typed command at the terminal command prompt
  • If the command is NULL
    • Then a nonzero value if a shell is available, or zero if no shell is available
  • If a child process couldn’t be created, or its status not retrieved
    • The return value is -1 and errno is set to indicate the error
  • If a shell couldn’t be executed in the child process,
    • Then the return value is as though the child shell terminated by calling _exit(2)
  • If all system calls succeed
    • Then the return value is the termination status of the child shell used to execute command
  • Forking - duplicate current task
    • OS’s handle spawning new tasks in different ways
    • Unix pioneered the approach of forking
      • The current process is duplicated to make a full copy
      • Everything is duplicated
        • All data is duplicated — including pointers
        • Open files and devices are duplicated
    • One process calls fork(), two processes return! - Parent process & Child process
  • The C++ language
    • Visibility - Namespace - a declarative region (scope) that defines the boundary of - names (types, functions, variables, etc.) inside the namespace - Namespaces organize code into logical areas - Prevents name collisions that can occur on large systems or multiple libraries - As C++ developers we control the scope of names - Not just files or blocks of structured code - We can define a logical region in which the name exists - C++ standard library has the namespace of std - std::string mystring = “C++ is great!”;

C++ namespace

C++ namespace

  • What is the difference between using namespace std and pulling the string.h header file?
    • A header file is a file that is intended to be included by source files. They typically contain declarations of certain classes and functions.
    • A namespace enables code to categorize identifiers. That is, classes, functions, etc

C++ methods and class scope

  • Best practice: right-hand side
  • Classes

C++ classes

  • Constructor (ctor) invoked when class instantiated
    • Before instantiating code has reference to created object
    • May involve memory allocation and init of objects
  • Destructor (dtor) invoked when class destroyed
    • When object goes out of scope and resources returned to system
    • May involve cleanup and deallocation of memory for objects
  • Setters / Getters
    • Support data abstraction
    • Avoid direct access to data members
  • Copy constructor
    • Initializes an object from another instance of same type by copying the member values from an object of the same type

const keyword

  • Inheritance
    • C++ supports multiple inheritance
    • Must qualify which superclass when dealing with diamond inheritance problems
  • Polymorphism

C polymorphism

  • C++ collections
    • Templates
      • Parameterized classes and functions
      • Allow the class, algorithm or function to be applied generally
        • Implemented independently from the data type
    • Standard Template Library
      • Generalized library of parameterized components
      • Contains
        • Containers
          • Sequential Containers: can be accessed in a sequential manner
            • Vector, List, Deque, Arrays, Forward_list
          • Container Adaptors: modify the interface for sequential containers
            • Queue, Priority_queue, Stack
          • Associative Containers: sorted data structures for quick searching
            • Set, Multiset, Map, Multimap
          • Unordered Associative Containers: unsorted with quick access
            • Unordered_set, Unordered_multiset, Unordered_map, Unordered_multimap
          • Iterators — access to all elements in a collection
          • Algorithms — a process to determine something
          • Functions — a relationship between parameters and results
    • Sequential containers
      • Storage which allows sequential access to members
      • array: Static contiguous array
        • Improvement over the C-Style array
      • vector: Dynamic contiguous array
        • Sequential storage that can grow and shrink as needed
      • deque: Double-ended queue
        • Better insertion & deletion performance than vectors
      • forward_list: Singly-linked list
        • Very high performance with low overhead
      • list: Doubly-linked list
        • Slower traversal than vectors
    • Iterators - An object that points to an element inside the container - Allows users to ‘iterate’ over the contents of the container - Iterator moves from beginning (front) to end (back) - Reverse Iterator move from end (back) to beginning (front) - Able to access element at current position by dereferencing iterator - Same syntax as dereferencing a pointer - Connect algorithm with containers - Allow algorithms access to elements - Without need to write code logic specific to the type of data in the container - Iterator: ++ moves forward collection, — moves backward - Reverse Iterator: ++ moves backward in collection, — forward - Some logic difference at the beginning or end of the collection - 5 types - Random Access Iterators: - Access elements at random position in collection — just like pointers - Bidirectional Iterators: - Can only move sequentially in collection — both forward and backward - Forward Iterators: - Can only move sequentially forward in collection — not backward - Output Iterators: - Can write current position but not read - Input Iterators: - Can read current position but not write

Iterator

  • C++ References
    • C++ support references to variables
      • Primitive data types or user-defined data types
      • C does not support references
    • C++ references are like Python or Java object references
      • Python and Java do not support pointers
    • Both are variables storing addresses of other variables
      • Pointers store address of which is dereferenced using *
      • References are alias to address which do not need to be dereferenced
        • The compiler does so implicitly
    • Pointer
      • Stores memory address
      • Programmer must dereference
      • Pointer can be reassigned
        • Point to different location
      • Has its own memory address
      • Can be assigned null value
      • Can be dereferenced
      • Supports arithmetic operations
        • Increment / Decrement
    • Reference
      • Alias for memory address
      • Compiler implicitly dereferences
      • Reference cannot be reassigned
        • Bound when comes into scope
      • Alias to existing memory address
      • Cannot be assigned null value
      • Cannot be dereferenced
      • Doesn’t support arithmetic operations