CS 2303: System Programming Concepts
- Virtual machines
- An emulation of a computer system
- Running a computer OS (the guest) inside another computer’s OS (the host)
- The host runs the VM software application
- The VM software application emulates the guest computer’s hardware
- The guest OS thinks it is running directly on the underlying hardware
- Running a computer OS (the guest) inside another computer’s OS (the host)
- An emulation of a computer system
- Compiled / Interpreted / Just-In-Time compiled languages
- Compiled languages convert directly to machine code
- Interpreters run through a program line by line and execute each command
- JIT compilation improves the performance of interpreted programs
- While the interpreted program is being run, the JIT compiler determines the most frequently used code and compiles it to machine code
- Programming language paradigms
- Imperative
- Program is often a sequence of function calls
- Functions accept data (not consume it), can modify data, can return results
- Variables hold data - Which can persist
- Program is often a sequence of function calls
- Functional
- Feed data into a function, feed result into another function
- Functions consume data, functions produce new data, might have side-effects
- Feed data into a function, feed result into another function
- Object-Oriented
- Objects have data and behavior: invoke object to do something to itself
- Scripting
- Easy to throw together a program
- Imperative
- C language
- Designed for systems programming
- OS, compilers, utilities, assemblers, text editors, databases, network drivers,…
- Was invented to write the OS UNIX
- Features
- Runs fast
- Mix of high-level and low-level capabilities
- Sometimes called a mid-level language
- Good for interacting with hardware
- Data types determine how much space the variable occupies in storage and how
the bit pattern stored is interpreted
- Fundamental types
- char
- int
- float
- double
- long
- short
- short <= int <= long
- The type void
- function return
- function argument
- pointers of type void * represents the address of an object, but not its type, can be casted to any data type
- Reference types
- pointers
- arrays
- structures
- unions
- functions
- Control-flow constructions
- if-else, switch, while, for, do, break
- Fundamental types
- Designed for systems programming
- Purposes of systems programming
- Systems programs
- Programs to make computers work efficiently
- Tools for building and running applications
- Building other programs and applications
- Applications
- Programs to make people / businesses productive
- Programs to do useful work using the computer
- Acquiring, managing, analyzing information
- Systems programs
- Priorities
- Systems programs
- Speed of operation — programs need to run fast to be productive
- Compactness — minimize resources to keep costs low
- Flexibility in use — get stuff done
- Manipulation of low-level resources — make the computer useful
- Application programs
- Speed of development — create a solution to an existing problem
- Ease of modification — keep up with evolving needs / expectations
- Suitability for the problem domain — be intuitive to program users
- Systems programs
- Language evolution
- Machine language
- Assembly
- High-level languages
- C++ vs Java or Python
- C/C++ run-time vs. Java or Python
- C/C++ are compiled languages, Java JIT, Python interpreted
- Better low-level control & performance
- C/C++ are compiled languages, Java JIT, Python interpreted
- Developer responsible for ‘housekeeping’ such as memory management
- Less run-time checking
- Minimal run-time checking for C
- Less run-time checking
- Newer releases of C++
- Strive to address many run-time issues
- Sometimes at cost of performance
- Some run-time errors so common it makes sense to detect
- Sometimes at cost of performance
- Compiler technology has advanced significantly
- More able to determine possible run-time errors during compilation
- Strive to address many run-time issues
- C/C++ run-time vs. Java or Python
- File & directory permissions

- Filename conventions
- code files (source code)
- source.c common for C language
- header.h for C language header (include) files
- source.cpp or source.cxx common for C++ language
- header.hpp or hxx for C++ header (include) files
- object files (output from compiler)
- source.o same name as source file with different extension
- library files
- library.a in Unix/Linix
- library.lib in Microsoft Windows
- dynamically-linked library
- libary.so in Unix/Linux
- libary.dll in Microsoft Windows
- executable files─ filename (no extension) in Unix/Linux
- filename.exe in Microsoft Windows
- code files (source code)
- Editing, compiling & linking

- The preprocessor provides the ability for the inclusion of header files, macro expansions, conditional compilation, and line control
- Compiler: gcc (g++)
- Linker: ld
- Best practice: in C or C++ programs, keep all the constants, macros, system wide global variables, and function prototypes in the header files and include that header file wherever it is required
- #include <calcGrades.h> tells the preprocessor to search for the header file
in the system include directories. These directories are typically set during
the installation of the compiler and include the standard C library headers.
This is used for system-wide libraries and headers that come with the
compiler.
- Most are in /usr/include
- #include “calcGrades.h” tells the preprocessor to search for the header file
in the current directory and then in the directories specified by the -I
option of the compiler. This is used for headers that are specific to the
project and not a part of the system.
- Put in sibling dir to source files ../include
- Best practice: Use double quotes for user-defined headers and angle brackets for system headers, this way it’s easier to see where the headers are coming from.
- Compiling with GCC
- gcc knows filetypes by extension
- Extensions helpful for context/tools even though not required by Linux
- Simplest:
- gcc file.c
- Produces executable called a.out
- Override default output file name:
- gcc file.c —o file
- e.g. gcc hello.c —o hello
- Both invoke the compiler and then the linker.
- Just compile to object:
- gcc -c file.c
- Produces object file called file.o
- Link multiple files:
- ld [opts] files
- ld —o newhello file1.o file2.o
- Also: ld hello.o foo.o —o newhello
- Without —o, executable would be in a.out .
- gcc file1.o file2.o —o file
- How to use?
- For simple files:
- just override the default output file name.
- For multiple files:
- First, compile each source file with —c
- Then, link all object files with overridden executable file name.
- For simple files:
- This can be automated with the make utility.
- gcc knows filetypes by extension
- Primitive data types
- int, bool, float, void, char
- Match what the hardware supports
- A bit is the smallest unit storage, stores a 0 or 1
- Group of 8 bits is 1 byte
- 1 bit holds 1 or 0 → binary representation → 0 to 255 maps to ASCII → chars
- Size: short <= int <= long < long long
- Symbolic constants
- Textual substitution by the preprocessor
- Different from typed constants, which are handled during the compilation phase
- Storage classes
- Defines the scope and lifetime of variables and/or functions
- auto, register, static, extern
- auto
- Default storage class for all local variables
- register - Define local variables that should be stored in a register instead of RAM

- Variables MIGHT be stored in a register depending on hardware and implementation restrictions
- static
- Keep a local variable in existence during the lifetime of the program
- Global static variables’ scope is restricted to the file in which it is declared
- extern
- Give a reference of a global variable that is visible to all program files
- Used when there are two or more files sharing the same global variables or functions
- Define a global variable or function in one file and use extern in other files
- Binary operators
- Operates on bit strings
- A = 60 (0011 1100 binary) and B = 13 (0000 1101 binary) → a bit holds a
- &
- Binary AND copies a bit to the result if it exists in both operands
- A & B = 12 (0000 1100)
- |
- Binary OR copies a bit if it exists in either operand
- A | B = 61 (0011 1101)
- ^
- Binary XOR (exclusive OR) copies the bit if it is set in one operand but not both
- A ^ B = 49 (0011 0001)
- ~
- Binary One’s Complement flip bits
- ~A = ~(60) = (1100 0011)
- <<
- Binary Left Shift moves the left operand value by the number of bits specified by the right operand
- A << 2 = 240 (1111 0000)
- >>
- Binary Right Shift
- C program input and output
- 3 I/O streams are provided by the OS to any program
- Stream is a sequence of bytes
- stdin - Standard Input Stream from terminal by default
- stdout - Standard Output Stream defaults to terminal
- stderr - Standard Error Stream defaults to terminal
- 3 I/O streams are provided by the OS to any program
- Header files
- Maintain consistency
- All src files have same information
- Reduce effort and duplication
- Increase flexibility
- Increase code clarity
- What to put
- Symbolic constants
- Function prototypes
- Data structure definitions
- External variable definitions
- Guards
- NOT executable code
- Macros border on exceptions
- Using header files
- In the source file (.c, .cpp, .cxx)
- Use #include directive
- Compiler (preprocessor) inserts the contents of the specified file
- At that point of the source file
- Textual insertion of the entire #included file
- In the source file (.c, .cpp, .cxx)
- Maintain consistency
- Guards in header files
- Guard against multiple inclusions of same header file
- Multiple inclusions would result in multiple definitions of same information
- We use ‘guards’ to prevent more than one inclusion to a given source file
- Works by defining a symbol first time the header is included
- Definition acts as a guard (flag) to know if the file has already been included
- To derive the symbol:
- Convert to file name ALL CAPS.
- Replace dot with underscore.
- Guard against multiple inclusions of same header file
- make utility
- Build executable and library files from multiple sources
- Multiple files go into each program
- Modularity, Organization, Reuse other programs, Multiple Developers, etc.
- When compiled, each .c, .cxx file produces one .o file.
- Each C/C++ source compiles to one object file
- Typically, object file has same base name as source file (can be renamed)
- Multiple .o files can be linked to form one executable of library file
- Source files can include other files
- e.g., Header files (.h, .hxx, .hpp) with declarations and prototypes
- Possible to include other source files, but frowned upon
- Does things based upon dependencies
- Usually compiling, linking, copying or moving…
- make tries to do the minimum needed
- Ensures everything is up-to-date with minimal work
- Each project needs a makefile
- Named makefile by default - sometimes Makefile
- Source files depend on header files
- make utility uses timestamp of files to see what’s newer and determine if something needs rebuilding
- Each section of the makefile (aka rule) specifies
- Target (the file to be generated)
- Dependencies (files the target depends on)
- Commands (what to do if dependency has changed)
- Each section starts at the left-most column
- Each line in the rest of the section is indented with a tab
- First section in makefile is the default “rule”
- Used if you run make with no parameters
- The make utility looks to default section to figure out what rules to perform
- Used if you run make with no parameters
- makefile formatting - General format for each section / rule

- Commands indented with a tab character
- The programmer’s responsibility to specify dependencies

- The programmer’s responsibility to specify dependencies
- Doxygen
- Must have config file Doxyfile in project directory
- A software program (tool) that will generate documentation from annotated C/C++ sources (as well as several other programming languages.) Doxygen program will scan source files looking for keywords and analyze program structure to:
- Generate on-line documentation or off-line reference documentation. Doxygen can produce output documentation in a number of different protocols including, but not limited to, html, LaTex, RTF, PstScript, PDF, and Unix/Linux man pages.
- Analyze and extract the code structure for undocumented source files.
- Analyze keywords and structure to generate normal program documentation such as user manuals.
- Accessing Doxygen HTML
- Must be somewhere under your public_html directory
- All files must be publicly readable
- All directories above must be publicly executable
- Casting
- Instruct the compiler to do a conversion
- Convert data from one type to another type for an assignment
- Instruct the compiler to interpret data differently
- How to interpret the memory location
- How to interpret the bits in memory
- Some type of conversion (implicit) is done automatically - especially
promotion
- No loss of information
- float → double
- char → int
- int → long
- int → double
- Explicit type conversion can lose data due to truncation
- int i = (int) 1.5;
- Weaknesses of C conversion
- Multiple meanings in different contexts.
- Convert the data
- Look at data differently
- No way to specify conversion algorithm
- Possible to write a function that does the conversion
- Non-portable across platforms
- Data storage sizes and representations can be different
- Note: C++ adds more advanced casting
- Multiple meanings in different contexts.
- Instruct the compiler to do a conversion
- typedef
- Allows creation of “aliases” for types
- Greater readability
- Greater portability
- Since primitive types can be implementation-dependent
- i.e., can have conditional compilation to create int_16 always 2 bytes
- Dependence
- Sometimes header file declarations depend on architecture
- Some declarations must agree with underlying system / architecture
- Abstraction
- Often we want the data storage to be consistent between platforms
- Some declarations want to be independent of underlying system / architecture
- Often we want the data storage to be consistent between platforms
- Syntax
- Typedef existing_type new_type;
- New type just like any other type to compiler
- Use it as such
- Examples
- typedef unsigned short uint_16;
- ─ typedef unsigned long uint_32;
- ─ Typedef unsigned long size_t;
- Will often be done using conditional compilation in header file so alias based on appropriate primitive type
- Can give a name to user defined data types (unions, structs)
- typedef vs #define
- typedef is limited to giving symbolic names to types only where as #define can be used to define alias for values as well, q., you can define 1 as ONE etc.
- typedef interpretation is performed by the compiler whereas #define statements are processed by the pre-processor.
- Allows creation of “aliases” for types
- Enums - User-defined data type
- Symbolic names for integral constants
- Limited set of values for enumerated type
- Significantly improves readability
- Good code style / hygiene
- Provides type safety
- Constrains valid ranges of integral values
- Declaration
enum enum_name {val1, val2, valN};
- Instantiation
- enum enum_name myEnumVar;
- Best practice: In general an enum is going to be used as a type definition and should always be in the header file
- Symbolic names for integral constants
- Array - sequence of data in contiguous memory locations
- An array object is a linear collection (sequence) of storage locations for elements of the same type
- C array is NOT an object
- Only data, no behavior
- Defining an array
- Address of a[i] = &array_start + index_offset * size_data_type
- Declaring arrays - For global scope - Arrays size must be constant known at compile time - So compilers know how much memory must be allocated at runtime - Must use constant literal or const variable for the size - For local scope (function, block) - Arrays can be of variable size - Size determined at runtime when they come into scope

- Dynamic allocation of global arrays is possible using malloc()
- An array name is a pointer to its first element
- Accessing array element
- The address is calculated
- Address of a[i] = (address of start of array) + ((size of an element) *
index)
- Cost
- 4 memory accesses
- 1 multiplication
- 1 addition
- Cost
- Address of a[i] = (address of start of array) + ((size of an element) *
index)
- The value is read and/or written
- The address is calculated
- Two-dimensional arrays
- Similar to Python
- Passing arrays as function arguments
- Pointer as a formal parameter
- void myFn(int *param)
- Sized array as a formal parameter
- void myFn(int param[10])
- Unsized array as formal parameter
- void myFn(int param[])
- Pointer as a formal parameter
- Return array from function
- C does not allow returning an entire array, instead return a pointer to an array
- The array’s name can be used in place of its pointer
- C does not allow returning the address of a local variable, so the returned array would have to be static
- Pointers, memory iteration, sorting
- Pointer is a variable whose value is the address of another variable - The address of a location in memory that stores a known data type

- A pointer is also a group of bytes that can hold an address
- A pointer p points to a char c
- Thus, the variable is a pointer to an object (as in C, not OOP) of some type
- Pointer declaration
- Must declare the type of object variable will point to
- Must declare name of variable that will be a pointer
- Must declare the variable is a pointer
- Declared with ”*” operator
- type * pointer
- Memory address is obtained with the & operator
- & operator returns the memory address of some object

- De-referenced with ”*” operator

- Dereferencing is following the pointer indirection
- Follow the pointer to the memory address it references
- Best practice: assign a NULL value to a pointer variable in case there is no
exact address
- Programs are not permitted to access memory at address 0 (NULL value)
- By convention, a NULL pointers are assumed to point to nothing
- Check for null for pointer using if
- Usage
- Beware uninitialized pointers - Don’t know where they’re pointing
- Pointers used incorrectly corrupt run-time environment
- High-level languages don’t provide pointers
- Increment pointer to move to the next memory address
- Number of bytes moved is based on type
- Decrement pointer to move to previous memory address
- Number of bytes moved is based on type
- Why do you want to do it?
- Efficiency — fast to sequence through arrays in memory
- Expressiveness — communicates code intent to reader
- Do you want to increment/decrement the pointer or object pointed at?
- Use parentheses to remove ambiguity
- Increment object referenced: (*x)++; ++(*x);
- Increment pointer before or after dereference: *(x++); *(++x);
- Legal and illegal pointers
- Safe To Point At
- To static data — it won’t go away
- To memory (variables) on the heap
- But not after they are deallocated
- To local variables and parameters
- But only in same scope
- Unsafe To Point At
- To memory (variables) on the heap after the memory has been freed
- To local variables and parameters after exit from the function or block
- After their lifetime ends
- Safe To Point At
- Casting C-style pointers
- All pointers are variables that store addresses to a type in memory
- Compiler needs to know the type pointed at so it can:
- Check for type mismatch in operations using pointer dereferences
- Find data fields and functions in structured (non-primitive) objects
- Stay aligned as pointers move through arrays
- Calculate offsets for array elements
- Do pointer arithmetic appropriately
- Interpret bits pointed at correctly
- Programmer responsible for casting pointer to right type - If you want to interpret bits in memory in different way - NOTE: this will not change the way bits exist in memory


- Iterating arrays - iterating with index vs with pointers

- Program (compiler) does not need to address of array index each access
- Pointer of matching data type can be incremented / decremented to iterate array
- Programmer must ensure pointer doesn’t run past array bounds
- The name of an array is automatically a pointer, does not need to declare
another pointer variable
- To the zeroth element of the array - start of the array in memory


- The latter is slightly more efficient
- Compiler doesn’t need to calculate array offset for each access
- Memory segment different uses in system

- Code segment
- Machine instructions generated from your source code
- Loaded at execution time
- Typically write-protected
- Read access but not write access
- Shared by multiple processes & threads
- Shared libraries
- Shared functions
- Complete programs
- Fixed size
- Compiler/linker knows how big
- Data segment
- Global & static scope variables declared in program
- Loaded at start of execution
- Can’t be write protected
- Programs need to write to variables
- Every process has its own copy
- Don’t want it changed by other programs
- Fixed size
- Compiler/linker knows how big
- Heap segment
- Dynamic memory allocations served from the heap
- Assigned by OS at start of execution
- Every process has its own
- Chunks allocated and released at run time
- Allocated using malloc()
- Deallocated using free()
- Best practice: Set values of freed pointers to NULL immediately after free to avoid undefined behavior when accessing those pointers after freeing
- Dynamic size - Can grow & shrink
- Compiler/link cannot know how big
- C/C++ objects
- Allocated from a heap at run time
- When code creates object using C++ new
- Allocated from a heap at run time
- Stack segment
- Automatic (local) variables grow and shrink like stack data struct
- Sometimes multiple stack segments runtime environment
- Assigned by OS at start of execution.
- Each process has its own
- Works just like a stack data structure
- Holds automatic variables & housekeeping:
- Function return address & values
- Parameters passed to functions
- Local (non-static) block variables
- Functions, Code blocks
- Dynamic size
- Each entry into a function or block entry uses memory from stack
- Pushed onto stack — grows to provide variable memory
- Variable memory is not initialized unless you explicitly do it
- Each function exit (return) or block exit releases stack memory
- Stack pops and stack decreases in size. In reverse order to usage.
- Stack pushed when entering function / block, Popped upon exiting
- Memory values not cleared when released from stack
- Stack memory values not cleared upon next function block entry - Uninitialized local / automatic variables will have leftover value - garbage

- Accessing variables
- When a local variable is accessed
- The address is calculated
- The value is read and/or written
- For single variables this is fairly simple
- Address = Start of stack frame + constant_offset
- If you are dereferencing a pointer, you already have the address
- When a local variable is accessed
- Memory alignment

- Hardware often requires data and/or instructions to be aligned
- Located at an addresses having certain limitations
- Might be absolutely required or just preferred for efficiency
- Commands & Data moved through computer on ‘bus’
- These have number of bits in width
- Think of lanes of a highway
- Aligning data with bus characteristics greatly improve efficiency of data moves
- Overall performance of computer significantly increased
- Low-level memory allocation
- As C system programmers we can allocate memory for our needs
- As no automatic internal structure exists for raw (low-level) chunks of memory from the heap segment
- Functions to allocate/deallocate memory - malloc(), calloc(), realloc(), free()
- As C system programmers we can allocate memory for our needs

- void *malloc(size_t size)
- Cast pointer returned to appropriate type
- Returns pointer to block of memory big enough to hold size bytes.
- size_t is a platform-dependent type guaranteed large enough to address anything
- Usually typedef’d to an unsigned integer but not always
- Big enough to hold the largest possible address or memory size.
- Allocated memory is aligned to handle any type of primitive data
- Returns NULL if error occurs
- Memory is not cleared so it might have garbage values
- void *calloc(size_t nmeb, size_t size)
- Returns ptr to memory big enough to hold nmemb elements of size bytes each.
- Enough space so nmemb elements can be properly aligned
- The order of parameters are not interchangeable
- First is number of elements, second is size (in bytes) of a single element
- Entire chunk of memory is initialized to zero values before return
- Returns NULL if error occurs
- void *realloc(void *ptr, size_t size)
- Can only be called for previously allocated memory
- Memory allocated by malloc(), calloc(), or previous realloc()
- Changes memory block to new size
Will move to new address to different location in heap if necessary
- Can only be called for previously allocated memory
- Copies any existing data to the new location
- Any additional memory added to the new chunk is not cleared
- NULL ptr if memory cannot be reallocated
- Existing memory is NOT freed or changed if reallocation fails
- void free(void *ptr)
- Releases memory back to heap that was previously allocated
- Memory allocated using malloc(), calloc() or realloc()
- You can only free memory pointed to by pointer ptr if it:
- Points to memory that was dynamically allocated from the heap
- Points to the start of the block that was dynamically allocated
- No value or error results are returned
- Releases memory back to heap that was previously allocated
- Parameters passing
- Parameters are passed from calling code to function by pushing copies onto the stack segment of memory
- In C, all parameters are passed by value
- The calling code passes actual parameters
- The receiving code gets formal parameters
- How pass by value works:
- Each actual parameter is evaluated, and a copy is made
- Each formal parameter is bound to the copy just made
- The function does its thing using the copy and other automatic variables
- Any return value is made available to calling code
- Calling code responsible for storing a copy of the return value
- If return value is not saved by calling code, the value goes out of scope
- Formal parameters & return value go out of scope and memory reused
- sizeof()
- Determine the storage size of things
- Size is computed at compile time
- Enhaces portability
- Different systems have different type sizes and different alignment
- Programmer doesn’t need to guess what system code will compile / run on
- Avoids magic numbers
- Can determine size at compile time for any system
- Programmer doesn’t need to hard code values with conditional compilation
- Context-aware for arrays
- If the size of array is known at compile time
- Returns the size of the entire array
- Otherwise, returns the size of the pointer to the first element
- If the size of array is known at compile time
- C-Strings
- C-Strings are just arrays of char

- The ‘\0’ at the end is the null terminator - flags the end of a string
- Each element is sizeof(char) - 1 byte
- Each element is the ASCII value of bits representing the character
- One character is enclosed in single-quotes
- To specify ASCII value of character
char val1 = 'A';- Occupies 1 byte // no null termination
- int val2 = ‘\n’
- Occupies 4 bytes with ASCII value of newline (10) in LSB
- Null-terminated string is enclosed in double-quotes
- To specify ASCII values with binary zero appended
- It is const static data type
- char hello_string[] = Hello;
- Occupies 6 bytes — 5 chars + null terminator
- Non-printable characters
- ‘\n’ = New Line (actually ASCII Line Feed)
- ‘\r’ = Carriage Return (usually not needed — unless Windows)
- ‘\t’ = Tab

- stdin, stdout, stderr

- Structures
- A C-style struct is a way to group related data of different types together
- In a user-defined data type - kind of like a C++ object without behavior
- Declare in one place in your code
- Use typedef to create an alias for clarity of use
- Give the definition a name for reuse - most common.
- Accessing struct fields
- Dot notation
- Structures as function arguments
- Pass struct as function arguments/use struct as formal parameters/return values same way as variable/pointer
- Pointers to structures
- Define pointers to structures the same way as defining pointers to any other variable
- Access a structure’s members using a pointer to that structure
- struct_ptr->member
- Bit fields
- Allow the automatic packing of data in a structure as compactly as possible
- Useful when memory is expensive
- Examples
- Packing several objects into a machine word
- 1 bit flags can be compacted
- Reading external file formats
- Packing several objects into a machine word
- Put :<bit width> after the variable
- E.g: use unsigned int age : 3; to save variable of range [0, 2^3)
- A C-style struct is a way to group related data of different types together
- Unions
- Allows storage of different data types (members) in the same memory location
- Only one member can contain a value at any time
- The memory occupied by a union will be large enough to hold the largest member of the union
- Acess union members using dot notation
- Hashing
- Hash tables/hash maps

- Key gets hashed, i.e. converted to an index
- Hash function is fast operation to create hash value
- Very hard to get back to original key
- Often used in crypto
- Maps the key value to fixed size and range

- Collisions will happen and must be handled
- List list of student record for each hash bucket
- Move colliding student record to first free bucket
- Rehash with some deterministic change
- Hash function
- Must turn the key into an index
- Essence: Turn larger number of bits to smaller number of bits
- Must be efficient
- Indexes must be well scattered
- Low chance of collision
- Algorithm choice is very dependent on type and values of key.\
- Good for random numeric data:
- Modulo
- Multiply by a constant, add another constant.
- Good for character strings or long data:
- Sum of bytes
- Shift and add, bit twiddling
- Weighted sum of bytes
- Polynomial
- How to deal with upper / lower case on strings
- Must turn the key into an index
- Data structures
- Object-oriented languages (C++ Language)
- Each stack, queue, list, tree, etc. is a separate object.
- Fields: Stack data area, size, pointer.
- Methods: push(), pop(), etc.
- Methods operate upon this stack.
- Non-Object-oriented languages (C Language)
- Data structure in its own (header) file
- Series of functions: create(), push(), pop(), etc. in C source file
- Need identifier if more than one instance of data structure
- Must pass it in as a parameter.
- Store the state of the data structure in a unique struct instance
- Object-oriented languages (C++ Language)
- System commands - fork and exec and child processes

- Shell: a program that reads commands and runs other programs
- Invoke the OS to execute specified system command
- As if you had typed command at the terminal command prompt
- If the command is NULL
- Then a nonzero value if a shell is available, or zero if no shell is available
- If a child process couldn’t be created, or its status not retrieved
- The return value is -1 and errno is set to indicate the error
- If a shell couldn’t be executed in the child process,
- Then the return value is as though the child shell terminated by calling _exit(2)
- If all system calls succeed
- Then the return value is the termination status of the child shell used to execute command
- Forking - duplicate current task
- OS’s handle spawning new tasks in different ways
- Unix pioneered the approach of forking
- The current process is duplicated to make a full copy
- Everything is duplicated
- All data is duplicated — including pointers
- Open files and devices are duplicated
- One process calls fork(), two processes return! - Parent process & Child process
- The C++ language
- Visibility - Namespace - a declarative region (scope) that defines the boundary of - names (types, functions, variables, etc.) inside the namespace - Namespaces organize code into logical areas - Prevents name collisions that can occur on large systems or multiple libraries - As C++ developers we control the scope of names - Not just files or blocks of structured code - We can define a logical region in which the name exists - C++ standard library has the namespace of std - std::string mystring = “C++ is great!”;


- What is the difference between using namespace std and pulling the string.h
header file?
- A header file is a file that is intended to be included by source files. They typically contain declarations of certain classes and functions.
- A namespace enables code to categorize identifiers. That is, classes, functions, etc

- Best practice: right-hand side
- Classes

- Constructor (ctor) invoked when class instantiated
- Before instantiating code has reference to created object
- May involve memory allocation and init of objects
- Destructor (dtor) invoked when class destroyed
- When object goes out of scope and resources returned to system
- May involve cleanup and deallocation of memory for objects
- Setters / Getters
- Support data abstraction
- Avoid direct access to data members
- Copy constructor
- Initializes an object from another instance of same type by copying the member values from an object of the same type

- Inheritance
- C++ supports multiple inheritance
- Must qualify which superclass when dealing with diamond inheritance problems
- Polymorphism

- C++ collections
- Templates
- Parameterized classes and functions
- Allow the class, algorithm or function to be applied generally
- Implemented independently from the data type
- Standard Template Library
- Generalized library of parameterized components
- Contains
- Containers
- Sequential Containers: can be accessed in a sequential manner
- Vector, List, Deque, Arrays, Forward_list
- Container Adaptors: modify the interface for sequential containers
- Queue, Priority_queue, Stack
- Associative Containers: sorted data structures for quick searching
- Set, Multiset, Map, Multimap
- Unordered Associative Containers: unsorted with quick access
- Unordered_set, Unordered_multiset, Unordered_map, Unordered_multimap
- Iterators — access to all elements in a collection
- Algorithms — a process to determine something
- Functions — a relationship between parameters and results
- Sequential Containers: can be accessed in a sequential manner
- Containers
- Sequential containers
- Storage which allows sequential access to members
- array: Static contiguous array
- Improvement over the C-Style array
- vector: Dynamic contiguous array
- Sequential storage that can grow and shrink as needed
- deque: Double-ended queue
- Better insertion & deletion performance than vectors
- forward_list: Singly-linked list
- Very high performance with low overhead
- list: Doubly-linked list
- Slower traversal than vectors
- Iterators - An object that points to an element inside the container - Allows users to ‘iterate’ over the contents of the container - Iterator moves from beginning (front) to end (back) - Reverse Iterator move from end (back) to beginning (front) - Able to access element at current position by dereferencing iterator - Same syntax as dereferencing a pointer - Connect algorithm with containers - Allow algorithms access to elements - Without need to write code logic specific to the type of data in the container - Iterator: ++ moves forward collection, — moves backward - Reverse Iterator: ++ moves backward in collection, — forward - Some logic difference at the beginning or end of the collection - 5 types - Random Access Iterators: - Access elements at random position in collection — just like pointers - Bidirectional Iterators: - Can only move sequentially in collection — both forward and backward - Forward Iterators: - Can only move sequentially forward in collection — not backward - Output Iterators: - Can write current position but not read - Input Iterators: - Can read current position but not write
- Templates

- C++ References
- C++ support references to variables
- Primitive data types or user-defined data types
- C does not support references
- C++ references are like Python or Java object references
- Python and Java do not support pointers
- Both are variables storing addresses of other variables
- Pointers store address of which is dereferenced using *
- References are alias to address which do not need to be dereferenced
- The compiler does so implicitly
- Pointer
- Stores memory address
- Programmer must dereference
- Pointer can be reassigned
- Point to different location
- Has its own memory address
- Can be assigned null value
- Can be dereferenced
- Supports arithmetic operations
- Increment / Decrement
- Reference
- Alias for memory address
- Compiler implicitly dereferences
- Reference cannot be reassigned
- Bound when comes into scope
- Alias to existing memory address
- Cannot be assigned null value
- Cannot be dereferenced
- Doesn’t support arithmetic operations
- C++ support references to variables