Course Site Link: https://subasish.quarto.pub/ce7393-fall24/
Course Canvas
Class Etiquette
Please be respectful, especially to other students.
Please be present. Attendance will not be taken, but you are encouraged to come and learn together.
Please restrict the use of electronic devices to course-related material; other content could be distracting.
Please be forgiving; instructors are people too, we will make mistakes.
Be considerate: if you feel extremely drowsy or unwell, it's better to step out for a moment to refresh yourself than to risk distracting your peers.
It is very hard to write programs that solve problems like recognizing a three-dimensional object in complex situations.
We can't write the code because we don't know how our brains do it.
Even if we could figure that out, the resulting program might be very complex.
It is hard to write a program to compute the probability that a credit card transaction is fraudulent.
There is no simple and reliable rule. We need to combine a very large number of weak rules.
Fraud is a moving target. The program needs to keep changing.
Machine Learning Approach
Instead of writing a program for each specific task, we collect lots of examples that specify the correct output for a given input.
A machine learning algorithm then takes these examples and produces a general program.
If we do it right, the program works for new cases as well as the ones we trained it on.
If the data changes, the program can change too by training on the new data.
Massive amounts of computation are now cheaper than paying someone to write a task-specific program.
Some examples of tasks best solved by learning
Recognizing patterns
Identify vulnerable roadway users (VRUs)
Facial identities or facial expressions
Pedestrian crash typing
Recognizing anomalies
U-turn movements at certain intersections
Unusual patterns of sensor readings in a nuclear power plant
Prediction
Crash severity types
How many crashes will occur on that road in year 2026?
A standard example of machine learning
The Modified National Institute of Standards and Technology (MNIST) database of handwritten digits is the machine learning equivalent of fruit flies.
The digits are publicly available, and a moderate-sized neural net can learn them quite fast.
We know a huge amount about how well various machine learning methods do on MNIST.
MNIST Data
A typical neuron
Gross physical structure
There is one axon that branches
There is a dendritic tree that collects input from other neurons.
Axons typically contact dendritic trees at synapses
A spike of activity in the axon causes charge to be injected into the post-synaptic neuron.
Spike generation
There is an axon hillock that generates outgoing spikes whenever enough charge has flowed in at synapses to depolarize the cell membrane.
Linear neurons
These are simple but computationally limited
If we can make them learn we may get insight into more complicated neurons.
Binary threshold neurons
McCulloch-Pitts (1943)
First compute a weighted sum of the inputs.
Then send out a fixed size spike of activity if the weighted sum exceeds a threshold.
McCulloch and Pitts thought that each spike is like the truth value of a proposition and each neuron combines truth values to compute the truth value of another proposition!
Binary threshold neurons
There are two equivalent ways to write the equations for a binary threshold neuron.
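Using \(z\) for the total input, \(\theta\) for the threshold, and \(b=-\theta\) for a bias, the two equivalent forms can be written as:

$$
z = \sum_i x_i w_i, \qquad
y = \begin{cases} 1 & \text{if } z \ge \theta \\ 0 & \text{otherwise} \end{cases}
$$

or, folding the threshold into a bias term:

$$
z = b + \sum_i x_i w_i, \qquad
y = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{otherwise} \end{cases}
$$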
Rectified Linear Neurons
You have probably heard of these as ReLUs (rectified linear units).
They compute a linear weighted sum of their inputs.
The output is a non-linear function of the total input.
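A minimal sketch of this computation (the function name and sample inputs are illustrative, not from the lecture):

```python
import numpy as np

def relu(z):
    # Linear above zero, clipped to zero below: y = max(0, z)
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(z))  # negative total inputs produce zero output
```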
Sigmoid neurons
These give a real-valued output that is a smooth and bounded function of their total input.
Typically they use the logistic function
They have nice derivatives which make learning easy.
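A quick sketch of the logistic function and its "nice" derivative (function names are illustrative):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: smooth, bounded output in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # dy/dz = y(1 - y): the derivative reuses the output itself,
    # which is what makes gradient-based learning convenient
    y = sigmoid(z)
    return y * (1.0 - y)

print(sigmoid(0.0))       # 0.5
print(sigmoid_grad(0.0))  # 0.25, the maximum slope
```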
Stochastic binary neurons
These use the same equations as logistic units.
But they treat the output of the logistic as the probability of producing a spike in a short time window.
We can do a similar trick for rectified linear units:
The output is treated as the Poisson rate for spikes.
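A small simulation of the stochastic binary idea, assuming a fixed seed for reproducibility (the choice of \(z=0.5\) and the sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)  # illustrative fixed seed

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# The logistic output is treated as the probability of producing
# a spike in a short time window; sample many windows and compare.
z = 0.5
p = logistic(z)
spikes = rng.random(100_000) < p
print(p, spikes.mean())  # empirical spike rate approximates p
```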
A very simple way to recognize handwritten shapes
Consider a neural network with two layers of neurons.
neurons in the top layer represent known shapes.
neurons in the bottom layer represent pixel intensities.
A pixel gets to vote if it has ink on it.
Each inked pixel can vote for several different shapes.
The shape that gets the most votes wins.
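A toy sketch of this voting scheme on a 3×3 grid; the two shape templates are hypothetical stand-ins for learned weight maps:

```python
import numpy as np

# Hypothetical weights: each shape's "map" gives one vote per inked pixel
templates = {
    "vertical_bar":   np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], float),
    "horizontal_bar": np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]], float),
}

def classify(image):
    # Each inked pixel votes for every shape it is connected to;
    # the shape collecting the most votes wins.
    votes = {name: float((image * w).sum()) for name, w in templates.items()}
    return max(votes, key=votes.get)

img = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], float)
print(classify(img))  # the vertical bar collects 3 votes, the horizontal 1
```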
Display the weights
Give each output unit its own “map” of the input image and display the weight coming from each pixel in the location of that pixel in the map.
Use a black or white blob with the area representing the magnitude of the weight and the color representing the sign.
Types of learning task
Supervised learning
Learn to predict an output when given an input vector.
Reinforcement learning
Learn to select an action to maximize payoff.
Unsupervised learning
Discover a good internal representation of the input.
Two types of supervised learning
Each training case consists of an input vector x and a target output t.
Regression: The target output is a real number or a whole vector of real numbers.
Crash counts on Hunter Road in 2025.
The temperature at noon tomorrow.
Classification: The target output is a class label.
Pedestrian crash typing
Crash severity type
How supervised learning typically works
We start by choosing a model-class: \(y=f(\mathbf{x}; \mathbf{W})\)
A model-class, \(f\), is a way of using some numerical parameters, \(\mathbf{W}\), to map each input vector, \(\mathbf{x}\), into a predicted output \(y\).
Learning usually means adjusting the parameters to reduce the discrepancy between the target output, t, on each training case and the actual output, y, produced by the model.
For regression, \(\dfrac{1}{2}(y-t)^2\) is often a sensible measure of the discrepancy.
For classification there are other measures that are generally more sensible.
Reinforcement learning
In reinforcement learning, the output is an action or sequence of actions and the only supervisory signal is an occasional scalar reward.
The goal in selecting each action is to maximize the expected sum of the future rewards.
We usually use a discount factor for delayed rewards.
The rewards are typically delayed, so it's hard to know where we went wrong.
Unsupervised learning
For about 40 years, unsupervised learning was largely ignored by the machine learning community.
Some widely used definitions of machine learning actually excluded it.
Many researchers thought that clustering was the only form of unsupervised learning.
It is hard to say what the aim of unsupervised learning is.
One major aim is to create an internal representation of the input that is useful for subsequent supervised or reinforcement learning.
You can compute the distance to a surface by using the disparity between two images. But you don’t want to learn to compute disparities by stubbing your toe thousands of times.
Other goals for unsupervised learning
It provides a compact, low-dimensional representation of the input.
High-dimensional inputs typically live on or near a low-dimensional manifold.
Some methods: Principal Component Analysis (non-categorical), Multiple Correspondence Analysis (categorical), association rules.
It provides an economical high-dimensional representation of the input in terms of learned features.
Binary features are economical.
So are real-valued features that are nearly all zero.
It finds sensible clusters in the input.
This is an example of a very sparse code in which only one of the features is non-zero.
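The low-dimensional manifold idea above can be illustrated with PCA via NumPy's SVD; the synthetic data set here is made up for this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 points on a 1-D line embedded in 3-D, plus small noise
t = rng.normal(size=(200, 1))
X = t @ np.array([[2.0, -1.0, 0.5]]) + 0.05 * rng.normal(size=(200, 3))

# PCA via SVD of the centered data matrix
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)
print(explained)   # the first component carries nearly all the variance
Z = Xc @ Vt[0]     # compact 1-D representation of each 3-D point
```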
Why the learning procedure works (first attempt)
Consider the squared distance between any feasible weight vector and the current weight vector.
Example: Every time the perceptron makes a mistake, the learning algorithm moves the current weight vector closer to all feasible weight vectors.
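A minimal perceptron-learning sketch of this mistake-driven update; the AND-gate data and function name are illustrative, not from the lecture:

```python
import numpy as np

def train_perceptron(X, t, epochs=20):
    # X includes a bias column of ones; targets t are in {0, 1}
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            y = 1 if x @ w >= 0 else 0
            if y != target:
                # On a mistake, move w toward x (missed a 1)
                # or away from x (predicted a false 1)
                w += (target - y) * x
    return w

# Toy task: logical AND of two binary inputs
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([0, 0, 0, 1])
w = train_perceptron(X, t)
print((X @ w >= 0).astype(int))  # matches t: [0 0 0 1]
```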
Understanding Residuals
Understanding Loss
Residuals, Loss (Code)
```python
################## MSE
import numpy as np

actual = np.random.randint(0, 10, 10)
predicted = np.random.randint(0, 10, 10)
print('Actual   :', actual)
print('Predicted:', predicted)

# The computation, applying the equation term by term
ans = []
for i in range(len(actual)):
    ans.append((actual[i] - predicted[i]) ** 2)
MSE = 1 / len(ans) * sum(ans)
print("Mean squared error is:", MSE)

################## MAE
# Note: MAE uses the absolute difference, not the square
ans = []
for i in range(len(actual)):
    ans.append(abs(actual[i] - predicted[i]))
MAE = 1 / len(ans) * sum(ans)
print("Mean absolute error is:", MAE)

################## Huber loss
def huber_loss(y_pred, y, delta=1):
    # Quadratic for small errors, linear for large ones
    huber_mse = 0.5 * np.square(np.subtract(y, y_pred))
    huber_mae = delta * (np.abs(np.subtract(y, y_pred)) - 0.5 * delta)
    return np.where(np.abs(np.subtract(y, y_pred)) <= delta,
                    huber_mse, huber_mae).mean()

actual = np.random.randint(0, 10, (2, 10))
predicted = np.random.randint(0, 10, (2, 10))
print('actual   :', actual)
print('predicted:', predicted)
print("Huber loss is:", huber_loss(actual, predicted))
```
You can use Python too, but this class will be mostly R-based.
It's a Ph.D./graduate-level course. The lecture focus is on concepts and applications, not code debugging.
Relevant code will be posted to Canvas and embedded in the slides when necessary.
Pros
Exposure to R
Rich Ecosystem
Reproducibility
Textbook
Cons
Steep learning curve
Performance
Package Quality
Limited Industry Adoption
Quarto
Quarto is an open-source scientific and technical publishing system that allows you to combine text, images, code, plots, and tables in a fully reproducible document. Quarto has support for multiple languages, including R, Python, Julia, and Observable. It works for a range of output formats such as PDFs, HTML documents, websites, presentations, and more.