CS 2103: Accelerated Object Oriented Design Concepts

10/24/22
- Event-driven programming
  - Control flow is variable, it depends on user input
  - It is desirable to have the screen update at a regular rate (60Hz)
- Software testing
  - Tests suite: a group of tests
  - Module (Code) → Test suite → Report
  - Black-box testing: implementation unknown, only examine I/O behavior
  - White-box testing: implementation known
10/25/22
- Java
  - Compiled, “mid-level”
  - Native language of a CPU is its assembly language (e,g: different for Intel, ARM, Motorola,…)
  - .java → .class (bytecode, native language of the Java VM) → Java VM → CPU
  - Automatic garbage collection
  - Secure
    - Type checking
    - Array-bounds checking
  - Slower
    - Runs on a VM
    - Run-time security features
- Elements of Good Programming Style
  - Refactoring
    - Inheritance
    - Ownership: Instantiate an object of the helper class inside other classes and use that object’s methods
10/27/22
- Overloading: Multiple “versions” of the same method taking in different types and/or number of parameters
- Abstract class: a class that cannot be instantiated; only its subclasses can be instantiated
- Java is single-inheritance
- Ownership is by design, better than abstraction if targeted classes have no relationship when refactoring
10/28/22
- Coding style
  - Java class name: mixed-case (ImageAnalyzer)
  - Instance variables, local variables, method parameters, and instance methods: camel-case
  - Instance variables: _ before name
  - Constants: declare static final, all caps with underscores
  - White-spaces and braces: consistent
- Access modifiers
  - private > (default) > protected > public
  - private: only methods within the same class can access, even subclasses cannot
  - default: package-private variables/methods/classes can be accessed by every class within the same package
  - protected: can bee accessed from classes within the same package and by subclasses
  - public: can be accessed from any clas
- Interfaces
  - A collection of methods signatures, and descriptions of what they do, but no function bodies
  - Interfaces are more flexible in terms of design than classes, could group more loosely related methods
  - Name often ends with -able
  - Methods with no bodies are called abstract
  - Implementing an interface: create a body for every method in the interface
  - Java is multiple-implementation
  - Once an interface has been implemented, you can:
    - Declare a variable of the interface type
    - Declare a parameter of the interface type
    - Return a variable of the interface type
    - Declare an array of variables of the interface type
    - Run-time vs declared data types?
  - Interfaces cannot be instantiated, because they do not have function bodies to execute
- Subinterfaces
  - Similar to inheritance, the subinterface inherits all the methods from the parent interface
  - Can have multiple parent interfaces
- Abstract classes
  - Abstract classes cannot be instantiated
  - The inherited subclass from the abstract class must implement all the abstract methods in the abstract class
  - Can implement the function bodies in the abstract class itself in concrete methods
  - Abstract classes are allowed to contained
    - Variables
    - Concrete methods
    - Abstract methods
10/31/22
- Interfaces as contracts
  - Interfaces facilitate the division of labor between members of a team
11/1/22
- Common practice: Have an abstract class implement some of the methods in an interface
- Type-safety and casting
  - Objects can have multiple types
  - Declaring a variable of a certain type != instantiating the object itself
    - DoubleDate ddata != DoubleDate ddta= new DoubleDate();
  - The name before a variable is the declared type
  - An object can have many related-typed variables pointing to it
  - Two kinds of type-checking
    - Compile-time (Java compiler)
    - Run-time (Java Virtual Machine)
  - The declared type dictates which methods can be called on the object
  - Casting is a “promise” to the compiler that variables are assigned to the correct more “specific” type
  - Upcast (from more specific to less specific)
  - Downcast (from less specific to more specific)
  - The run-time type of a variable depends on the constructor that was called to instantiate it
  - The run-time type dictates which implementation of a method is executed
  - At run-time, the JVM determines which method implementation based on the run-time type → process called dynamic dispatch
- Collections
  - Arrays: convenient, efficient, fixed size for adding, changing, and accessing
    - More elements? Impossible
    - Quick find? Linear time
  - ArrayList: similar to Java arrays, but “grow” automatically
    - Implement an ArrayList
      - Under the hood, ArrayList consists of a regular Java array of objects that “grows” as necessary
      - isFull(), growArray(), add(), remove(), insert(),…
      - In practice, doubling the size of the new array is balanced in terms of speed and memory
11/4/22
- Type parameter: restrict the type in a Java collection
  - Specify the desired type as a type parameter in both the declaration and instantiation
  - Can also be an interface
- Type-safe collections
  - An ArrayList<T> will only accept objects of type T or any subclass of T
- Hash maps/hash tablespro
  - HashMap<K, V>
    - K: key type
    - V: value type
    - hashMap.get(key) → value associated with key
    - hashMap.put(key, value) → adds new entry
- Graphs
  - Edges have weights associated with them to represent distance, cost, etc.
  - Representation in memory
    - Adjacency matrix: for the whole graph
      - N x N matrix, N # of nodes in graph
      - The (u, v)th slot is the weight of that connection
      - Fast way of assessing connections
    - Adjacency list: for every node
      - Every node maintains a list of other connected nodes
  - Two fundamental problems
    - Node discovery
      - Determine the set of reachable nodes from a starting node
      - 2 principal algorithms: BFS & DFS
    - Finding shortest path
- Stacks
  - On modern computers, a program’s code and data are stored together in memory (Von Neumann architecture)
  - To keep track of which instruction in memory is being executed, the CPU maintains an Instruction Pointer (IP)
  - LIFO
  - “Return address” stack keeps track of where the IP returns to after finishing executing some function
- Queues
  - FIFO
  - Used for temporary data storage
  - One use case: inter-process communication (IPC)
    - Cross-program messaging
    - Program B needs to receive the message sent from A in the same order it was sent
11/8/22
- BFS
  - Maintain some data structures
    - A collection of visited nodes from s
    - A queue of nodes have yet to visit
- DFS
  - Just replace the queue with a stack
- Finding shortest paths
  - BFS can be extended to give the shortest path between s and t
  - Whenever we add node n’ (neighbor) to nodes ToVisit, we keep track of which node it was visited from (using Map<Node, Node> gotHereFrom)
  - As soon as BFS discovers node t, the BFS terminates
  - Once t is reached, we backtrack using gotHereFrom
- Abstract data types (ADTs)
  - ArrayList
    - Purpose: store a variable amount of data in a linear order
    - Operations: add, get, remove,…
    - Time costs
      - Adding is “fast” except when the internal array is doubled
      - The time needed to search for an element grows as the array becomes longer
  - Dictionary/Hashtable
    - Purpose: associate keys with values
    - Operations: put, get, remove,…
    - Time costs
      - All operations take about the same amount of time no matter how many elements
- Algorithmic analysis
  - Instead of measuring time cost in terms of seconds, milliseconds, etc., we will count the number of elementary operations
  - CPUs implement the following operations and more
    - Arithmetic
    - Comparison
    - Branching
    - Assignment
    - Array access
  - When analyzing time cost, there are three cases
    - Best case
      - The best-case time cost is usually the least interesting since it’s very optimistic
    - Worst case
    - Average case
      - Usually very difficult to compute in practice
- Big-”O” notation
  - Express the asymptotic time cost of an algorithm
  - Focuses on the dominant term in the function as n grows very large
  - Types
    - O(1) — constant
    - O(log n) — logarithmic
    - O(n) — linear
    - O(n log n) — loglinear
    - O(n^2) — quadratic
    - O(2^n) — exponential
    - O(n!) — factorial
11/11/22
- Problems with ArrayList
  - Forces all data to be stored in one contiguous array
- Memory management in Java
  - Some primitive types: int, float, char, etc.
  - Some reference (object) types
  - ArrayList’s are of reference type
  - Primitive values store their values directly
  - Reference/object variables store the location of the object they point to
  - The object that name points to is stored in a separate block of memory
- LinkedList
  - Break the list into “chain” of Node objects
  - Each node
    - Stores one piece of data the user wants to store
    - Contains a pointer to the next node in the chain, the last one points to a null object
  - Nodes do not need to be stored contiguously, but incur a overhead memory cost
  - Doubly-linked lists have previous and next pointers, singly-linked lists only have next pointers
  - Add, remove operations at the ends of the list are O(1)
  - Variants
    - Circular lists
    - Doubly-linked lists
    - Lists with sentinel/dummy nodes (Nodes with no data that always exist even if list is empty)
  - With sentinel/dummy nodes, handling corner cases can be simplified
11/14/22
- Encapsulation
  - Variables are hidden from outside access
  - Helper methods are hidden from outside access
  - Only public methods can be called from outside
  - Good practice: use weakest possible type when possible for maximum flexibility
  - To hide a helper class from outside access, we can embed Node as a private static inner-class of LinkedList
11/15/22
- Linear data structures
  - Useful when we want to preserve the order in which the data was added
  - An element is adjacent to at most 2 elements
- Binary search (O(logn) for a sorted list)
  - Requires the data to be already sorted
  - Adding new data into a list is slowL: O(n)
- Trees
  - A full binary tree with height h has 2^h leaf nodes and n = 2^(h+1)-1 nodes in total
  - A full binary tree with n nodes total has height h=log2(n+1)-1
  - More generally, a binary tree T with n nodes has
    - Minimum height: log2(n+1) - 1 (T is full)
    - Maximum height: n - 1 (T is a chain)
  - The maximum height of a tree with n nodes will impact the time-cost of operations of ADTs based on trees
- Tree-based data structures
  - Array-based binary trees
    - Each node in the tree is assigned a unique index at which its data should be stored
    - Applies only to complete trees
      - Every level of the tree is completely filled except possibly the last and the last level is (partially) filled from left to right
    - The index of the left-child of a node with index idx? 2*idx + 1
    - The index of the right-child of a node with index idx? 2*idx + 2
    - The index of the parent of a node with index idx? (idx - 1)/2
  - Heaps
    - A max-heap is a complete binary tree that satisfies the heap condition
      - The root of every sub-tree is no smaller than any node in the sub-tree
    - Can be implemented using arrays or LinkedList
    - To add a new object o to the heap
      - Add o to the “bottom right” of the heap
        
        This may violate the heap condition
      - Repeatedly “bubble up” o up the tree whenever o > parent(o) (Swap the node and its parent)
      - Time cost: O(logn)
    - Remove the largest element from a heap
      - Remove the root
      - Remove the last node n in the heap and make it the new root
        
        This may violate the heap condition
      - Trickling down: Recursively swap n with one of its children until the heap condition is restored
      - Time cost: O(logn)
11/17/22
- Type-bounds
  - To put a restriction on the type parameter T, we can use a generic type-bound
  - HeapImpl<T extends Comparable>
  - This means that the T must be of a type that implements the Comparable interface
  - Comparable is an upper bound on type T
- Javadoc
  - Supports automatic documentation via writing code comments in the Javadoc style
  - /** */, @param, @return
11/22/22
- Amazon Mechanical Turk: crowdsourcing human data
11/28/22
- Hash tables
  - A hash table requires
    - A hash function
    - A method of handling collisions
  - In order to ensure good performance, m must be bigger than n, the number of data the user will want to store (usually by a factor of 20 or 50)
  - Hashing function
    - Map data of arbitrary size to fixed-size values
    - A good hash function satisfies two basic properties
      - Fast: Worst case O(1) to compute
      - Uniform: Minimize duplication of output values (collisions).
      - Deterministic: Given the same key, it must always return the same value
  - Handling collisions
    - Open addressing (linear probing)
      - Many strategies for “finding another slot”
      - Simplest: linear probing
        
        If hash function maps into an index i that is occupied, put it into i + 1
        
        Handling deletions
        
        Suppose we delete an element
        
        We need to add a “bridge” element so that subsequent accesses to that bucket will “keep looking”
    - Chaining
      - Keep the head of a LinkedList at each index (kinda similar to segmented list)
      - Fast when the linked list is still short
  - Usually, keys are not integers
    - Java includes a hashCode() method in the Object class → “integer representation”/address of the object
    - Overriding of default hashCode() has two requirements
      - Deterministic
      - Consistent across equal instances
        
        If o1.equals(o2) then hashCode(o1) == hashCode(o2), the opposite might not be true
        
        If your class overrides equals(), it must also override hashCode()
- equals() vs ==
  - Comparing primitive variables: ==
  - With object-type variables, there are 2 ways of testing for equality
    - ==: tests if the variables point to the exact same object in memory
    - equals(): tests if the objects are “semantically equivalent”
12/1/22
- Exceptions
  - Java provides a language construct called exceptions
  - A mechanism to interrupt the normal control flow to immediately return to the caller and pass to it information on what went wrong
  - try {<block of code that can throw an exception>} catch (<Exception type>) (<block of code to handle>)
- Generating math expressions with context-free grammars (CFGs)
  - Parsing structured data is very similar to generating structured data
  - Branching (stochastic) process to generate random mathematical expression
    - The probability of the base case must be bigger than recursive case so it terminates without stack overflow
    - E → x | E + E | E - E | E * E | (E)
    - Each condition is called a production rule
    - We can write all the production rules for one non-terminal
  - A grammar that can have more than one parse tree generating the same string is called ambiguous
  - For mathematical expressions, the grammar needs to be unambiguous
12/5/22
- Anonymous classes
  - Defining a class that will only be used exactly once (like lambda functions)
- Lambda expressions
  - Allow method to be passed as an “object” to another method
  - Only valid when the interface has one method with the right signature
- Iterators
  - The type of data structure might change, don’t want to expose the inner workings of the data structures
  - A design pattern that provides a fast, safe, uniform way of traversing through the data structure
  - Iterator<type bound> iterator = <data structure>.iterator();
  - while (iterator.hasNext()) {
    - System.out.println(iterator.next());
  - }
- Arrays in Java
  - Java is row-major
  - For an array stored in column-major layout, the elements of the columns are contiguous in memory. In row-major layout, the elements of the rows are contiguous
  - Cache locality
    - The order in which memory locations are accessed impacts performance due to how the CPU, main memory, and cache interact
    - Cache stores expensive and fast access of a small fraction of the main memory
    - Whenever the CPU reads a location in memory, it actually reads an entire “chunk” (cache line) into the cache
    - Once the chunk is in the cache, reading from the same chunk is fast (cache hit)
  - Parallelism and multi-threading
    - With multiple CPUs on the same computer, each thread can be executed independently
    - The first thread has to wait for the second thread to finish; this is called a join
    - Can “improve” performance (multithreading has overhead)
    - Data race: Two people withdraw money from the same account at the same time, who will get the money?