RBE 474x: Deep Learning for Perception
Convolution (https://www.youtube.com/watch?v=KuXjwB4LzSA)

- Fundamental operation in image processing, computer vision, and deep learning
- Uses
  - Extracts features from images
  - Detects edges, textures, and patterns
- Core idea: applying a filter or kernel to an image transforms it into a new representation that highlights specific features
- A mathematical operation that takes two inputs
  - An image (a 2D matrix of pixel values)
  - A kernel (a smaller matrix, also called a filter)
- The kernel is systematically moved, or convolved, across the image
- At each position, the element-wise products of the overlapping pixels are summed to produce a new pixel value in the output image (see the sketch below)
- This effectively combines the original image’s information with the filter’s characteristics, emphasizing certain features such as edges or textures
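
A minimal NumPy sketch of this sliding-window operation (the function name, zero padding, and stride handling are illustrative assumptions, not from the lecture):

```python
import numpy as np

def convolve2d(image, kernel, stride=1, padding=0):
    """Naive 2D convolution: flip the kernel, then slide it over the image."""
    kernel = np.flipud(np.fliplr(kernel))        # flip -> true convolution
    if padding > 0:
        image = np.pad(image, padding)           # zero-pad the borders
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)   # element-wise product, summed
    return out
```
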
- Terms
  - Kernel (filter)
    - A small matrix used in convolution to modify the image
    - Common ones: Gaussian, Sobel, or Prewitt operators
  - Stride
    - The step size with which the kernel moves across the image
  - Padding
    - Adding extra pixels (usually zeroes) around the edges of the image to control the size of the output image (see the relation below)
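
Stride and padding together determine the output size. For an n × n input, a k × k kernel, padding p, and stride s, the standard relation (not stated explicitly in the lecture) is:

$$ \text{output size} = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1 $$

For example, a 5 × 5 image with a 3 × 3 kernel, no padding, and stride 1 yields a 3 × 3 output.
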
- Convolution vs. cross-correlation
  - In true convolution, the kernel is flipped before being applied (by element-wise multiplication and summation) to the image
    - This is the traditional mathematical operation
  - In cross-correlation, the kernel is used as is (see the check below)
    - Many deep learning frameworks use cross-correlation for its simplicity and efficiency
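
A quick SciPy check of the distinction (the image and kernel values are arbitrary toy data):

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

img = np.arange(25.0).reshape(5, 5)
k = np.array([[1.0, 0.0, -1.0],
              [2.0, 0.0, -2.0],
              [1.0, 0.0, -1.0]])

conv = convolve2d(img, k, mode="valid")    # flips the kernel first
corr = correlate2d(img, k, mode="valid")   # uses the kernel as is

# Cross-correlating with the flipped kernel reproduces true convolution.
flipped = np.flipud(np.fliplr(k))
assert np.allclose(conv, correlate2d(img, flipped, mode="valid"))
```
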
- Intuition: “smooth out” part of the image, incorporating characteristics of the filter
  - Gaussian blur
    - A blur filter whose kernel weights form a Gaussian distribution
  - Edge detection
    - Positive weights on the left and negative weights on the right of the kernel
    - Detects changes in pixel values as the kernel moves from left to right
    - All weights add up to 0, so a homogeneous patch of pixels produces 0 → black/nothing (see the kernels below)
  - Image sharpening (convolutional neural network)
    - Use a neural network to figure out what the kernel should be, as determined by whatever the network needs to detect
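
Two concrete kernels matching these intuitions (standard textbook examples, not taken from the lecture):

```python
import numpy as np

# Prewitt-style edge kernel: positive weights on the left, negative on the
# right; the weights sum to 0, so a homogeneous patch maps to 0 (black).
edge = np.array([[1.0, 0.0, -1.0],
                 [1.0, 0.0, -1.0],
                 [1.0, 0.0, -1.0]])
assert edge.sum() == 0.0

# 3x3 binomial approximation of a Gaussian blur, normalized to sum to 1
# so overall brightness is preserved.
blur = np.array([[1.0, 2.0, 1.0],
                 [2.0, 4.0, 2.0],
                 [1.0, 2.0, 1.0]]) / 16.0

flat = np.full((3, 3), 7.0)    # homogeneous patch of pixels
print((flat * edge).sum())     # 0.0 -> nothing detected
```
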
- Direct (classical) convolution of length-n signals takes O(n²), while FFT-based convolution takes O(n log n)
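
A sketch of the two routes giving the same answer (toy signal sizes chosen arbitrarily):

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
x = rng.random(1000)                 # signal
h = rng.random(64)                   # filter

direct = np.convolve(x, h)           # O(n^2) sliding dot products
fast = fftconvolve(x, h)             # O(n log n) via the FFT
assert np.allclose(direct, fast)
```
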

Multi-Layer Perceptrons and Backpropagation

- Feature X = [], dimensions = [# of features, …]
- Linear regression
  - Y = W^T X + B
    - W^T: weights, transposed
    - B: bias
  - Given samples {X, Y}_i, find W
  - Mathematically: a least-squares fit, W* = argmin_W Σ_i (Y_i − (W^T X_i + B))²
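
A minimal NumPy sketch of solving that argmin with ordinary least squares (toy data; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 samples, 3 features
true_W, true_B = np.array([2.0, -1.0, 0.5]), 0.3
Y = X @ true_W + true_B + 0.01 * rng.normal(size=100)

# Append a column of ones so the bias B is learned as one more weight,
# then solve argmin_W ||Y - X W||^2 in closed form.
Xb = np.hstack([X, np.ones((100, 1))])
W_hat, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
print(W_hat)                                   # ~ [2.0, -1.0, 0.5, 0.3]
```
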
- Neural networks
  - Function like a mathematical function
  - Evaluating a neural network → forward pass
    - Inputs are passed through the network layers, which generate outputs
  - Optimizing the network’s performance
    - Weights and biases need to be adjusted → backward propagation (or backpropagation)
    - The gradients of the loss function with respect to each parameter are calculated
    - These gradients (scaled by a learning rate) are subtracted from the corresponding weights and biases, allowing the network to learn and improve its predictions (see the loop sketched below)
  - One full pass over the entire training set is called an epoch (a single forward/backward pass on one batch is an iteration, not an epoch)
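
A minimal forward-pass/backward-pass loop for a single linear layer with a mean-squared-error loss (toy data; the learning rate and shapes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))             # batch of 32 inputs, 4 features
Y = rng.normal(size=(32, 1))             # targets
W = 0.1 * rng.normal(size=(4, 1))        # weights
B = np.zeros(1)                          # bias
lr = 0.1                                 # learning rate

for epoch in range(100):                 # one pass over the data per epoch
    pred = X @ W + B                     # forward pass
    loss = np.mean((pred - Y) ** 2)      # MSE loss
    d_pred = 2 * (pred - Y) / len(X)     # backward pass: dLoss/dPred
    d_W = X.T @ d_pred                   # chain rule: dLoss/dW
    d_B = d_pred.sum(axis=0)             # dLoss/dB
    W -= lr * d_W                        # subtract the scaled gradients
    B -= lr * d_B
```
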
- Linear layer in a neural network
  - Performs a linear transformation of the input data (sketched below)
  - Two components
    - Weights
    - Biases
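
The forward computation of a linear layer is one matrix multiply plus a bias; a sketch (the [out_features, in_features] weight shape is a common convention, assumed here):

```python
import numpy as np

def linear(x, W, B):
    """Linear layer: y = x W^T + B, with x: [batch, in], W: [out, in], B: [out]."""
    return x @ W.T + B

x = np.ones((2, 3))                 # batch of 2, 3 input features
W = np.arange(6.0).reshape(2, 3)    # 2 output units
B = np.zeros(2)
print(linear(x, W, B))              # [[ 3. 12.], [ 3. 12.]]
```
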
- Softmax function
  - Commonly used in neural networks for multi-class classification problems
  - Converts a vector of raw scores (logits) into probabilities, making it possible to interpret the output as the likelihood of each class (see the sketch below)
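
The standard definition is softmax(z)_i = e^{z_i} / Σ_j e^{z_j}; a numerically stable sketch:

```python
import numpy as np

def softmax(logits):
    """Map raw scores to probabilities that sum to 1."""
    z = logits - np.max(logits)      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # ~[0.659, 0.242, 0.099]
```
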
- Convolutional layer
  - Fundamental building block in CNNs
  - Used primarily for processing grid-like data such as images
  - Applies convolution operations to detect local features in the input (see the sketch below)
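
A sketch of a convolutional layer’s forward pass for a single-channel input, reusing the sliding-window idea from above (one output feature map per kernel; all shapes are illustrative):

```python
import numpy as np

def conv_layer(x, kernels, biases):
    """x: [H, W] input; kernels: [C, kh, kw]; biases: [C] -> output [C, H', W']."""
    C, kh, kw = kernels.shape
    H, W = x.shape
    out = np.zeros((C, H - kh + 1, W - kw + 1))
    for c in range(C):                           # each kernel -> one feature map
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = x[i:i+kh, j:j+kw]
                out[c, i, j] = np.sum(patch * kernels[c]) + biases[c]
    return out
```
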