CS 2103: Accelerated Object Oriented Design Concepts
- 10/24/22
- Event-driven programming
- Control flow is variable, it depends on user input
- It is desirable to have the screen update at a regular rate (60Hz)
- Software testing
- Tests suite: a group of tests
- Module (Code) → Test suite → Report
- Black-box testing: implementation unknown, only examine I/O behavior
- White-box testing: implementation known
- Event-driven programming
- 10/25/22
- Java
- Compiled, “mid-level”
- Native language of a CPU is its assembly language (e,g: different for Intel, ARM, Motorola,…)
- .java → .class (bytecode, native language of the Java VM) → Java VM → CPU
- Automatic garbage collection
- Secure
- Type checking
- Array-bounds checking
- Slower
- Runs on a VM
- Run-time security features
- Elements of Good Programming Style
- Refactoring
- Inheritance
- Ownership: Instantiate an object of the helper class inside other classes and use that object’s methods
- Refactoring
- Java
- 10/27/22
- Overloading: Multiple “versions” of the same method taking in different types and/or number of parameters
- Abstract class: a class that cannot be instantiated; only its subclasses can be instantiated
- Java is single-inheritance
- Ownership is by design, better than abstraction if targeted classes have no relationship when refactoring
- 10/28/22
- Coding style
- Java class name: mixed-case (ImageAnalyzer)
- Instance variables, local variables, method parameters, and instance methods: camel-case
- Instance variables: _ before name
- Constants: declare static final, all caps with underscores
- White-spaces and braces: consistent
- Access modifiers
- private > (default) > protected > public
- private: only methods within the same class can access, even subclasses cannot
- default: package-private variables/methods/classes can be accessed by every class within the same package
- protected: can bee accessed from classes within the same package and by subclasses
- public: can be accessed from any clas
- Interfaces
- A collection of methods signatures, and descriptions of what they do, but no function bodies
- Interfaces are more flexible in terms of design than classes, could group more loosely related methods
- Name often ends with -able
- Methods with no bodies are called abstract
- Implementing an interface: create a body for every method in the interface
- Java is multiple-implementation
- Once an interface has been implemented, you can:
- Declare a variable of the interface type
- Declare a parameter of the interface type
- Return a variable of the interface type
- Declare an array of variables of the interface type
- Run-time vs declared data types?
- Interfaces cannot be instantiated, because they do not have function bodies to execute
- Subinterfaces
- Similar to inheritance, the subinterface inherits all the methods from the parent interface
- Can have multiple parent interfaces
- Abstract classes
- Abstract classes cannot be instantiated
- The inherited subclass from the abstract class must implement all the abstract methods in the abstract class
- Can implement the function bodies in the abstract class itself in concrete methods
- Abstract classes are allowed to contained
- Variables
- Concrete methods
- Abstract methods
- Coding style
- 10/31/22
- Interfaces as contracts
- Interfaces facilitate the division of labor between members of a team
- Interfaces as contracts
- 11/1/22
- Common practice: Have an abstract class implement some of the methods in an interface
- Type-safety and casting
- Objects can have multiple types
- Declaring a variable of a certain type != instantiating the object itself
- DoubleDate ddata != DoubleDate ddta= new DoubleDate();
- The name before a variable is the declared type
- An object can have many related-typed variables pointing to it
- Two kinds of type-checking
- Compile-time (Java compiler)
- Run-time (Java Virtual Machine)
- The declared type dictates which methods can be called on the object
- Casting is a “promise” to the compiler that variables are assigned to the correct more “specific” type
- Upcast (from more specific to less specific)
- Downcast (from less specific to more specific)
- The run-time type of a variable depends on the constructor that was called to instantiate it
- The run-time type dictates which implementation of a method is executed
- At run-time, the JVM determines which method implementation based on the run-time type → process called dynamic dispatch
- Collections
- Arrays: convenient, efficient, fixed size for adding, changing, and
accessing
- More elements? Impossible
- Quick find? Linear time
- ArrayList: similar to Java arrays, but “grow” automatically
- Implement an ArrayList
- Under the hood, ArrayList consists of a regular Java array of objects that “grows” as necessary
- isFull(), growArray(), add(), remove(), insert(),…
- In practice, doubling the size of the new array is balanced in terms of speed and memory
- Implement an ArrayList
- Arrays: convenient, efficient, fixed size for adding, changing, and
accessing
- 11/4/22
- Type parameter: restrict the type in a Java collection
- Specify the desired type as a type parameter in both the declaration and instantiation
- Can also be an interface
- Type-safe collections
- An ArrayList<T> will only accept objects of type T or any subclass of T
- Hash maps/hash tablespro
- HashMap<K, V>
- K: key type
- V: value type
- hashMap.get(key) → value associated with key
- hashMap.put(key, value) → adds new entry
- HashMap<K, V>
- Graphs
- Edges have weights associated with them to represent distance, cost, etc.
- Representation in memory
- Adjacency matrix: for the whole graph
- N x N matrix, N # of nodes in graph
- The (u, v)th slot is the weight of that connection
- Fast way of assessing connections
- Adjacency list: for every node
- Every node maintains a list of other connected nodes
- Adjacency matrix: for the whole graph
- Two fundamental problems
- Node discovery
- Determine the set of reachable nodes from a starting node
- 2 principal algorithms: BFS & DFS
- Finding shortest path
- Node discovery
- Stacks
- On modern computers, a program’s code and data are stored together in memory (Von Neumann architecture)
- To keep track of which instruction in memory is being executed, the CPU maintains an Instruction Pointer (IP)
- LIFO
- “Return address” stack keeps track of where the IP returns to after finishing executing some function
- Queues
- FIFO
- Used for temporary data storage
- One use case: inter-process communication (IPC)
- Cross-program messaging
- Program B needs to receive the message sent from A in the same order it was sent
- Type parameter: restrict the type in a Java collection
- 11/8/22
- BFS
- Maintain some data structures
- A collection of visited nodes from s
- A queue of nodes have yet to visit
- Maintain some data structures
- DFS
- Just replace the queue with a stack
- Finding shortest paths
- BFS can be extended to give the shortest path between s and t
- Whenever we add node n’ (neighbor) to nodes ToVisit, we keep track of which node it was visited from (using Map<Node, Node> gotHereFrom)
- As soon as BFS discovers node t, the BFS terminates
- Once t is reached, we backtrack using gotHereFrom
- Abstract data types (ADTs)
- ArrayList
- Purpose: store a variable amount of data in a linear order
- Operations: add, get, remove,…
- Time costs
- Adding is “fast” except when the internal array is doubled
- The time needed to search for an element grows as the array becomes longer
- Dictionary/Hashtable
- Purpose: associate keys with values
- Operations: put, get, remove,…
- Time costs
- All operations take about the same amount of time no matter how many elements
- ArrayList
- Algorithmic analysis
- Instead of measuring time cost in terms of seconds, milliseconds, etc., we will count the number of elementary operations
- CPUs implement the following operations and more
- Arithmetic
- Comparison
- Branching
- Assignment
- Array access
- When analyzing time cost, there are three cases
- Best case
- The best-case time cost is usually the least interesting since it’s very optimistic
- Worst case
- Average case
- Usually very difficult to compute in practice
- Best case
- Big-”O” notation
- Express the asymptotic time cost of an algorithm
- Focuses on the dominant term in the function as n grows very large
- Types
- O(1) — constant
- O(log n) — logarithmic
- O(n) — linear
- O(n log n) — loglinear
- O(n^2) — quadratic
- O(2^n) — exponential
- O(n!) — factorial
- BFS
- 11/11/22
- Problems with ArrayList
- Forces all data to be stored in one contiguous array
- Memory management in Java
- Some primitive types: int, float, char, etc.
- Some reference (object) types
- ArrayList’s are of reference type
- Primitive values store their values directly
- Reference/object variables store the location of the object they point to
- The object that name points to is stored in a separate block of memory
- LinkedList
- Break the list into “chain” of Node objects
- Each node
- Stores one piece of data the user wants to store
- Contains a pointer to the next node in the chain, the last one points to a null object
- Nodes do not need to be stored contiguously, but incur a overhead memory cost
- Doubly-linked lists have previous and next pointers, singly-linked lists only have next pointers
- Add, remove operations at the ends of the list are O(1)
- Variants
- Circular lists
- Doubly-linked lists
- Lists with sentinel/dummy nodes (Nodes with no data that always exist even if list is empty)
- With sentinel/dummy nodes, handling corner cases can be simplified
- Problems with ArrayList
- 11/14/22
- Encapsulation
- Variables are hidden from outside access
- Helper methods are hidden from outside access
- Only public methods can be called from outside
- Good practice: use weakest possible type when possible for maximum flexibility
- To hide a helper class from outside access, we can embed Node as a private static inner-class of LinkedList
- Encapsulation
- 11/15/22
- Linear data structures
- Useful when we want to preserve the order in which the data was added
- An element is adjacent to at most 2 elements
- Binary search (O(logn) for a sorted list)
- Requires the data to be already sorted
- Adding new data into a list is slowL: O(n)
- Trees
- A full binary tree with height h has 2^h leaf nodes and n = 2^(h+1)-1 nodes in total
- A full binary tree with n nodes total has height h=log2(n+1)-1
- More generally, a binary tree T with n nodes has
- Minimum height: log2(n+1) - 1 (T is full)
- Maximum height: n - 1 (T is a chain)
- The maximum height of a tree with n nodes will impact the time-cost of operations of ADTs based on trees
- Tree-based data structures
- Array-based binary trees
- Each node in the tree is assigned a unique index at which its data should be stored
- Applies only to complete trees
- Every level of the tree is completely filled except possibly the last and the last level is (partially) filled from left to right
- The index of the left-child of a node with index idx? 2*idx + 1
- The index of the right-child of a node with index idx? 2*idx + 2
- The index of the parent of a node with index idx? (idx - 1)/2
- Heaps
- A max-heap is a complete binary tree that satisfies the heap condition
- The root of every sub-tree is no smaller than any node in the sub-tree
- Can be implemented using arrays or LinkedList
- To add a new object o to the heap
- Add o to the “bottom right” of the heap
- This may violate the heap condition
- Repeatedly “bubble up” o up the tree whenever o > parent(o) (Swap the node and its parent)
- Time cost: O(logn)
- Add o to the “bottom right” of the heap
- Remove the largest element from a heap
- Remove the root
- Remove the last node n in the heap and make it the new root
- This may violate the heap condition
- Trickling down: Recursively swap n with one of its children until the heap condition is restored
- Time cost: O(logn)
- A max-heap is a complete binary tree that satisfies the heap condition
- Array-based binary trees
- Linear data structures
- 11/17/22
- Type-bounds
- To put a restriction on the type parameter T, we can use a generic type-bound
- HeapImpl<T extends Comparable>
- This means that the T must be of a type that implements the Comparable interface
- Comparable is an upper bound on type T
- Javadoc
- Supports automatic documentation via writing code comments in the Javadoc style
- /** */, @param, @return
- Type-bounds
- 11/22/22
- Amazon Mechanical Turk: crowdsourcing human data
- 11/28/22
- Hash tables
- A hash table requires
- A hash function
- A method of handling collisions
- In order to ensure good performance, m must be bigger than n, the number of data the user will want to store (usually by a factor of 20 or 50)
- Hashing function
- Map data of arbitrary size to fixed-size values
- A good hash function satisfies two basic properties
- Fast: Worst case O(1) to compute
- Uniform: Minimize duplication of output values (collisions).
- Deterministic: Given the same key, it must always return the same value
- Handling collisions
- Open addressing (linear probing)
- Many strategies for “finding another slot”
- Simplest: linear probing
- If hash function maps into an index i that is occupied, put it into i + 1
- Handling deletions
- Suppose we delete an element
- We need to add a “bridge” element so that subsequent accesses to that bucket will “keep looking”
- Chaining
- Keep the head of a LinkedList at each index (kinda similar to segmented list)
- Fast when the linked list is still short
- Open addressing (linear probing)
- Usually, keys are not integers
- Java includes a hashCode() method in the Object class → “integer representation”/address of the object
- Overriding of default hashCode() has two requirements
- Deterministic
- Consistent across equal instances
- If o1.equals(o2) then hashCode(o1) == hashCode(o2), the opposite might not be true
- If your class overrides equals(), it must also override hashCode()
- A hash table requires
- equals() vs ==
- Comparing primitive variables: ==
- With object-type variables, there are 2 ways of testing for equality
- ==: tests if the variables point to the exact same object in memory
- equals(): tests if the objects are “semantically equivalent”
- Hash tables
- 12/1/22
- Exceptions
- Java provides a language construct called exceptions
- A mechanism to interrupt the normal control flow to immediately return to the caller and pass to it information on what went wrong
- try {<block of code that can throw an exception>} catch (<Exception type>) (<block of code to handle>)
- Generating math expressions with context-free grammars (CFGs)
- Parsing structured data is very similar to generating structured data
- Branching (stochastic) process to generate random mathematical expression
- The probability of the base case must be bigger than recursive case so it terminates without stack overflow
- E → x | E + E | E - E | E * E | (E)
- Each condition is called a production rule
- We can write all the production rules for one non-terminal
- A grammar that can have more than one parse tree generating the same string is called ambiguous
- For mathematical expressions, the grammar needs to be unambiguous
- Exceptions
- 12/5/22
- Anonymous classes
- Defining a class that will only be used exactly once (like lambda functions)
- Lambda expressions
- Allow method to be passed as an “object” to another method
- Only valid when the interface has one method with the right signature
- Iterators
- The type of data structure might change, don’t want to expose the inner workings of the data structures
- A design pattern that provides a fast, safe, uniform way of traversing through the data structure
- Iterator<type bound> iterator = <data structure>.iterator();
- while (iterator.hasNext()) {
- System.out.println(iterator.next());
- }
- Arrays in Java
- Java is row-major
- For an array stored in column-major layout, the elements of the columns are contiguous in memory. In row-major layout, the elements of the rows are contiguous
- Cache locality
- The order in which memory locations are accessed impacts performance due to how the CPU, main memory, and cache interact
- Cache stores expensive and fast access of a small fraction of the main memory
- Whenever the CPU reads a location in memory, it actually reads an entire “chunk” (cache line) into the cache
- Once the chunk is in the cache, reading from the same chunk is fast (cache hit)
- Parallelism and multi-threading
- With multiple CPUs on the same computer, each thread can be executed independently
- The first thread has to wait for the second thread to finish; this is called a join
- Can “improve” performance (multithreading has overhead)
- Data race: Two people withdraw money from the same account at the same time, who will get the money?
- Anonymous classes