RBE 474x: Deep Learning for Perception

  • Convolution (https://www.youtube.com/watch?v=KuXjwB4LzSA)

    • Fundamental operation in image processing, computer vision, and deep learning

    • Uses

      • Extract features from images

      • Detect edges, textures, and patterns

    • Core: Applying a filter or kernel to an image, transforming it into a new representation that highlights specific features

      • A mathematical operation that takes 2 inputs

        • An image (a 2D matrix of pixel values)

        • A kernel (a smaller matrix, also called a filter)

      • The kernel is systematically moved or convolved across the image

      • At each position, the element-wise product of the overlapping pixels is summed to produce a new pixel value in the output image

      • Effectively combines the original image’s information with the filter’s characteristics, emphasizing certain features such as edges or textures
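The sliding-window operation described above can be sketched in a few lines of numpy (the image and kernel values here are arbitrary, chosen only for illustration; as in most deep learning frameworks, this sketch applies the kernel without flipping it):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image; at each position, sum the
    element-wise products of the overlapping values (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0           # 3x3 box-blur kernel
result = conv2d(image, kernel)
print(result.shape)                      # (2, 2): output shrinks without padding
```

Each output pixel is a weighted average of a 3x3 neighborhood, which is exactly the "combine image information with filter characteristics" idea above.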

    • Terms

      • Kernel (filter)

        • A small matrix used in convolution to modify the image

        • Common ones: Gaussian, Sobel, or Prewitt operators

      • Stride

        • The step size with which the kernel moves across the image
      • Padding

        • Adding extra pixels (usually zeroes) around the edges of the image to control the size of the output image
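Stride and padding together determine the output size via the standard formula floor((n + 2p − k) / s) + 1; a small sketch (the sizes are illustrative):

```python
def conv_output_size(n, k, padding=0, stride=1):
    """Output width for input size n, kernel size k: floor((n + 2p - k)/s) + 1."""
    return (n + 2 * padding - k) // stride + 1

print(conv_output_size(32, 3, padding=1, stride=1))  # 32: "same" padding keeps the size
print(conv_output_size(32, 3, padding=0, stride=2))  # 15: stride 2 roughly halves it
```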
      • Convolution vs. Cross-Correlation

        • In true convolution, the kernel is flipped (in both dimensions) before being applied to the image via element-wise multiplication and summation

          • The traditional mathematical operation
        • In cross-correlation, the kernel is used as is

          • Many deep learning frameworks use cross-correlation for its simplicity and efficiency
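The relationship can be checked directly in numpy: true convolution is cross-correlation with a pre-flipped kernel (a small 1D sketch; the input values are arbitrary):

```python
import numpy as np

def cross_correlate(x, k):
    """Valid-mode 1D cross-correlation: slide k over x without flipping it."""
    return np.array([np.dot(x[i:i + len(k)], k)
                     for i in range(len(x) - len(k) + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
k = np.array([1.0, 0.0, -1.0])

# np.convolve performs true convolution (it flips k internally), so
# cross-correlating with the pre-flipped kernel gives the same answer.
assert np.allclose(cross_correlate(x, k[::-1]),
                   np.convolve(x, k, mode="valid"))
```

For symmetric kernels (e.g. Gaussian) the two operations coincide exactly, which is part of why the distinction rarely matters in practice.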
      • Intuition: “Smooth out” part of the image, incorporating characteristics of the filter

      • Gaussian blur

        • A blur filter whose kernel weights follow a Gaussian distribution
      • Edge detection

        • Positive weights on the left and negative weights on the right of the kernel

        • Detect changes in pixel values as kernel moves from left to right

        • All weights add up to 0, so a homogeneous patch of pixels produces an output of 0 → black/nothing
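A quick numeric check of both claims, using the Sobel x-kernel (the patch values are made up for illustration):

```python
import numpy as np

# Sobel-style kernel: positive weights on the left, negative on the right,
# and all weights sum to zero.
sobel_x = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]], dtype=float)

flat = np.full((3, 3), 7.0)                 # homogeneous patch
edge = np.array([[9.0, 9.0, 1.0],
                 [9.0, 9.0, 1.0],
                 [9.0, 9.0, 1.0]])          # vertical bright-to-dark edge

# One kernel application = element-wise product, then sum.
print(np.sum(flat * sobel_x))   # 0.0 : uniform regions cancel to nothing
print(np.sum(edge * sobel_x))   # 32.0: strong response at the edge
```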

      • Image sharpening (convolutional neural network)

        • Use a neural network to learn what the kernel weights should be, as determined by whatever features the network needs to detect
    • Classical (direct) convolution takes O(n^2), while FFT-based convolution takes O(n * log(n))
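The FFT route works because convolution in the spatial domain is multiplication in the frequency domain; a 1D numpy sketch with arbitrary values (zero-padding both signals to the full output length turns the FFT's circular convolution into linear convolution):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([1.0, -1.0])

direct = np.convolve(x, k)                       # direct method, O(n^2)
n = len(x) + len(k) - 1                          # full linear-convolution length
fft_based = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(k, n), n)

assert np.allclose(direct, fft_based)            # same result, O(n log n)
```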

  • Multi-Layer Perceptrons And Backpropagation

    • Feature matrix X, dimensions = [# of features, …]

    • Linear Regression

      • Y = WᵀX + B

        • Wᵀ: weights, transposed

        • B: bias

      • Given training pairs {(Xᵢ, Yᵢ)}, find W

      • Mathematically

        • W* = argmin_W Σᵢ ||Yᵢ − (WᵀXᵢ + B)||²
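The argmin for linear least squares has a closed-form solution; a minimal numpy sketch (the data and true weights are synthetic, chosen for illustration, and the bias is folded in as an extra weight on a constant-1 feature):

```python
import numpy as np

# Synthetic, noise-free data from known weights and bias.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
Y = X @ true_w + 3.0                       # bias B = 3

# Append a column of ones so the bias is learned as one more weight.
X_aug = np.hstack([X, np.ones((100, 1))])
w_hat, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)

print(np.round(w_hat, 6))                  # recovers [2, -1, 0.5, 3]
```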
    • Neural networks

      • Behave like mathematical functions, mapping inputs to outputs

      • Evaluating a neural network → forward pass

        • Inputs are passed through the network layers that generate outputs
      • Optimize network’s performance

        • Weights and biases need to be adjusted → backward propagation (or backpropagation)

          • The gradients of the loss function wrt each parameter are calculated

          • These gradients, scaled by a learning rate, are subtracted from the corresponding weights and biases, allowing the network to learn and improve its predictions

        • One full pass over the training dataset is called an epoch
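Forward pass, gradient computation, and the subtract-the-scaled-gradient update can be sketched on a one-weight model (the data and learning rate here are my choices for illustration):

```python
# Minimal sketch: fit y = w * x by gradient descent on squared error.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]           # generated with true w = 2

w = 0.0
lr = 0.05                      # learning rate scales the gradient step
for epoch in range(200):       # one epoch = one full pass over the data
    grad = 0.0
    for x, y in zip(xs, ys):
        pred = w * x                 # forward pass
        grad += 2 * (pred - y) * x   # dLoss/dw for squared error
    w -= lr * grad / len(xs)   # backward step: subtract the scaled gradient

print(round(w, 3))             # converges to 2.0
```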

      • Linear layer in a neural network

        • Performs a linear transformation of the input data

        • 2 components

          • Weights

          • Biases
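A numpy sketch of a linear layer with its two components (the shapes and initialization here are illustrative, not prescriptive):

```python
import numpy as np

class Linear:
    """Linear layer: y = x @ W.T + b, i.e. a linear transformation of the input."""
    def __init__(self, in_features, out_features):
        rng = np.random.default_rng(0)
        self.W = rng.normal(size=(out_features, in_features)) * 0.1  # weights
        self.b = np.zeros(out_features)                              # biases

    def forward(self, x):
        return x @ self.W.T + self.b

layer = Linear(4, 2)
out = layer.forward(np.ones((3, 4)))   # batch of 3 four-dimensional inputs
print(out.shape)                       # (3, 2)
```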

      • Softmax function

        • Commonly used in neural networks for multi-class classification problems

        • Converts a vector of raw scores (logits) into probabilities, making it possible to interpret the output as the likelihood of each class
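A minimal numpy sketch of the logits-to-probabilities conversion (subtracting the max before exponentiating is a standard numerical-stability trick and leaves the result unchanged):

```python
import numpy as np

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # stability: avoid overflow in exp
    exps = np.exp(shifted)
    return exps / np.sum(exps)

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(round(probs.sum(), 6))   # 1.0: a valid probability distribution
print(probs.argmax())          # 0: the largest logit gets the highest probability
```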

      • Convolutional layer

        • Fundamental building block in CNNs

        • Used primarily for processing grid-like data such as images

        • Applies convolution operations to detect local features in the input