RBE 474x: Deep Learning for Perception
Convolution (https://www.youtube.com/watch?v=KuXjwB4LzSA)

- Fundamental operation in image processing, computer vision, and deep learning
- Uses
  - Extracts features from images
  - Detects edges, textures, and patterns
- Core idea: applying a filter or kernel to an image transforms it into a new representation that highlights specific features
- A mathematical operation that takes two inputs
  - An image (a 2D matrix of pixel values)
  - A kernel (a smaller matrix, also called a filter)
- The kernel is systematically moved, or convolved, across the image
- At each position, the element-wise products of the overlapping pixels are summed to produce a new pixel value in the output image (see the sketch below)
- This effectively combines the original image’s information with the filter’s characteristics, emphasizing certain features such as edges or textures
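
A minimal NumPy sketch of this sliding-window operation (the function name, zero padding, and stride handling are illustrative assumptions, not from the lecture):

```python
import numpy as np

def convolve2d(image, kernel, stride=1, padding=0):
    """Naive 2D convolution: flip the kernel, then slide it over the image."""
    kernel = np.flipud(np.fliplr(kernel))        # flip -> true convolution
    if padding > 0:
        image = np.pad(image, padding)           # zero-pad the borders
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)   # element-wise product, summed
    return out
```
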
- Terms
  - Kernel (filter)
    - A small matrix used in convolution to modify the image
    - Common ones: Gaussian, Sobel, or Prewitt operators
  - Stride
    - The step size with which the kernel moves across the image
  - Padding
    - Adding extra pixels (usually zeroes) around the edges of the image to control the size of the output image (see the relation below)
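
Stride and padding together determine the output size. For an n × n input, a k × k kernel, padding p, and stride s, the standard relation (not stated explicitly in the lecture) is:

$$ \text{output size} = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1 $$

For example, a 5 × 5 image with a 3 × 3 kernel, no padding, and stride 1 yields a 3 × 3 output.
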
- Convolution vs. cross-correlation
  - In true convolution, the kernel is flipped before being applied (by element-wise multiplication and summation) to the image
    - This is the traditional mathematical operation
  - In cross-correlation, the kernel is used as is (see the check below)
    - Many deep learning frameworks use cross-correlation for its simplicity and efficiency
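
A quick SciPy check of the distinction (the image and kernel values are arbitrary toy data):

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

img = np.arange(25.0).reshape(5, 5)
k = np.array([[1.0, 0.0, -1.0],
              [2.0, 0.0, -2.0],
              [1.0, 0.0, -1.0]])

conv = convolve2d(img, k, mode="valid")    # flips the kernel first
corr = correlate2d(img, k, mode="valid")   # uses the kernel as is

# Cross-correlating with the flipped kernel reproduces true convolution.
flipped = np.flipud(np.fliplr(k))
assert np.allclose(conv, correlate2d(img, flipped, mode="valid"))
```
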
- Intuition: “smooth out” part of the image, incorporating characteristics of the filter
  - Gaussian blur
    - A blur filter whose kernel weights form a Gaussian distribution
  - Edge detection
    - Positive weights on the left and negative weights on the right of the kernel
    - Detects changes in pixel values as the kernel moves from left to right
    - All weights add up to 0, so a homogeneous patch of pixels produces 0 → black/nothing (see the kernels below)
  - Image sharpening (convolutional neural network)
    - Use a neural network to figure out what the kernel should be, as determined by whatever the network needs to detect
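
Two concrete kernels matching these intuitions (standard textbook examples, not taken from the lecture):

```python
import numpy as np

# Prewitt-style edge kernel: positive weights on the left, negative on the
# right; the weights sum to 0, so a homogeneous patch maps to 0 (black).
edge = np.array([[1.0, 0.0, -1.0],
                 [1.0, 0.0, -1.0],
                 [1.0, 0.0, -1.0]])
assert edge.sum() == 0.0

# 3x3 binomial approximation of a Gaussian blur, normalized to sum to 1
# so overall brightness is preserved.
blur = np.array([[1.0, 2.0, 1.0],
                 [2.0, 4.0, 2.0],
                 [1.0, 2.0, 1.0]]) / 16.0

flat = np.full((3, 3), 7.0)    # homogeneous patch of pixels
print((flat * edge).sum())     # 0.0 -> nothing detected
```
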
- Direct (classical) convolution of length-n signals takes O(n²), while FFT-based convolution takes O(n log n)
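
A sketch of the two routes giving the same answer (toy signal sizes chosen arbitrarily):

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
x = rng.random(1000)                 # signal
h = rng.random(64)                   # filter

direct = np.convolve(x, h)           # O(n^2) sliding dot products
fast = fftconvolve(x, h)             # O(n log n) via the FFT
assert np.allclose(direct, fast)
```
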

Multi-Layer Perceptrons and Backpropagation

- Feature X = [], dimensions = [# of features, …]
- Linear regression
  - Y = W^T X + B
    - W^T: weights, transposed
    - B: bias
  - Given samples {X, Y}_i, find W
  - Mathematically: a least-squares fit, W* = argmin_W Σ_i (Y_i − (W^T X_i + B))²
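
A minimal NumPy sketch of solving that argmin with ordinary least squares (toy data; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 samples, 3 features
true_W, true_B = np.array([2.0, -1.0, 0.5]), 0.3
Y = X @ true_W + true_B + 0.01 * rng.normal(size=100)

# Append a column of ones so the bias B is learned as one more weight,
# then solve argmin_W ||Y - X W||^2 in closed form.
Xb = np.hstack([X, np.ones((100, 1))])
W_hat, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
print(W_hat)                                   # ~ [2.0, -1.0, 0.5, 0.3]
```
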
- Neural networks
  - Function like a mathematical function
  - Evaluating a neural network → forward pass
    - Inputs are passed through the network layers, which generate outputs
  - Optimizing the network’s performance
    - Weights and biases need to be adjusted → backward propagation (or backpropagation)
    - The gradients of the loss function with respect to each parameter are calculated
    - These gradients (scaled by a learning rate) are subtracted from the corresponding weights and biases, allowing the network to learn and improve its predictions (see the loop sketched below)
  - One full pass over the entire training set is called an epoch (a single forward/backward pass on one batch is an iteration, not an epoch)
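
A minimal forward-pass/backward-pass loop for a single linear layer with a mean-squared-error loss (toy data; the learning rate and shapes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))             # batch of 32 inputs, 4 features
Y = rng.normal(size=(32, 1))             # targets
W = 0.1 * rng.normal(size=(4, 1))        # weights
B = np.zeros(1)                          # bias
lr = 0.1                                 # learning rate

for epoch in range(100):                 # one pass over the data per epoch
    pred = X @ W + B                     # forward pass
    loss = np.mean((pred - Y) ** 2)      # MSE loss
    d_pred = 2 * (pred - Y) / len(X)     # backward pass: dLoss/dPred
    d_W = X.T @ d_pred                   # chain rule: dLoss/dW
    d_B = d_pred.sum(axis=0)             # dLoss/dB
    W -= lr * d_W                        # subtract the scaled gradients
    B -= lr * d_B
```
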
- Linear layer in a neural network
  - Performs a linear transformation of the input data (sketched below)
  - Two components
    - Weights
    - Biases
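
The forward computation of a linear layer is one matrix multiply plus a bias; a sketch (the [out_features, in_features] weight shape is a common convention, assumed here):

```python
import numpy as np

def linear(x, W, B):
    """Linear layer: y = x W^T + B, with x: [batch, in], W: [out, in], B: [out]."""
    return x @ W.T + B

x = np.ones((2, 3))                 # batch of 2, 3 input features
W = np.arange(6.0).reshape(2, 3)    # 2 output units
B = np.zeros(2)
print(linear(x, W, B))              # [[ 3. 12.], [ 3. 12.]]
```
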
- Softmax function
  - Commonly used in neural networks for multi-class classification problems
  - Converts a vector of raw scores (logits) into probabilities, making it possible to interpret the output as the likelihood of each class (see the sketch below)
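
The standard definition is softmax(z)_i = e^{z_i} / Σ_j e^{z_j}; a numerically stable sketch:

```python
import numpy as np

def softmax(logits):
    """Map raw scores to probabilities that sum to 1."""
    z = logits - np.max(logits)      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # ~[0.659, 0.242, 0.099]
```
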
- Convolutional layer
  - Fundamental building block in CNNs
  - Used primarily for processing grid-like data such as images
  - Applies convolution operations to detect local features in the input (see the sketch below)
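
A sketch of a convolutional layer’s forward pass for a single-channel input, reusing the sliding-window idea from above (one output feature map per kernel; all shapes are illustrative):

```python
import numpy as np

def conv_layer(x, kernels, biases):
    """x: [H, W] input; kernels: [C, kh, kw]; biases: [C] -> output [C, H', W']."""
    C, kh, kw = kernels.shape
    H, W = x.shape
    out = np.zeros((C, H - kh + 1, W - kw + 1))
    for c in range(C):                           # each kernel -> one feature map
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = x[i:i+kh, j:j+kw]
                out[c, i, j] = np.sum(patch * kernels[c]) + biases[c]
    return out
```
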