Notes based on lecture slides from INFSCI 2595, taught by Dr. Mai Abdelhakim
Introduction
What is Machine Learning?
- Subfield of artificial intelligence
- Field of study that gives computers the ability to learn without being explicitly programmed
How can we build computer systems that learn and improve with experience?
- Statistics: draw conclusions from data and estimate the reliability of those conclusions
- Optimization and computing power: provide the means to solve the resulting problems
A machine learns with respect to a particular task T, performance metric P, and experience E if its performance P on task T improves with experience E.
Why Machine Learning is Important
- Provides solutions to complex problems that cannot be easily programmed by hand
- Can adapt to new data
- Helps us to understand complicated phenomena
- Can outperform humans on some tasks
Machine Learning Algorithms
Supervised Learning
- Learn using labeled data (correct answers are given in learning phase)
- Make predictions on previously unseen data
- Two types of problems
- Regression: Target values (Y) are continuous/quantitative
- Classification: Target values (Y) are discrete/finite/qualitative
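Below is a minimal sketch of the two supervised problem types, assuming scikit-learn is available; the tiny datasets and the model choices (linear and logistic regression) are illustrative assumptions, not from the slides.

```python
# Minimal sketch: the same labeled-data workflow for regression vs. classification.
# Assumes scikit-learn; the tiny datasets are made up for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: continuous target Y
X_reg = np.array([[1.0], [2.0], [3.0], [4.0]])
y_reg = np.array([1.9, 4.1, 6.2, 7.8])           # quantitative responses
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[5.0]]))                       # predict a continuous value

# Classification: discrete target Y
X_clf = np.array([[0.5], [1.0], [3.0], [3.5]])
y_clf = np.array([0, 0, 1, 1])                    # qualitative class labels
clf = LogisticRegression().fit(X_clf, y_clf)
print(clf.predict([[2.0]]))                       # predict a class label
```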
Unsupervised Learning
- Learn using unlabeled data (no correct answers are given in the learning phase)
- Clustering analysis
- Finding groups of similar users
- Detecting abnormal patterns
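A small clustering sketch of the "finding groups" idea, assuming scikit-learn's KMeans; the synthetic two-group data and the choice K = 2 are assumptions for illustration.

```python
# Minimal sketch of clustering analysis: group unlabeled points into K clusters.
# Assumes scikit-learn; the data is synthetic and only for illustration.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0, 0], scale=0.5, size=(20, 2))
group_b = rng.normal(loc=[5, 5], scale=0.5, size=(20, 2))
X = np.vstack([group_a, group_b])                 # no labels are given

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                             # cluster assignment per point
print(kmeans.cluster_centers_)                    # estimated group centers
```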
Machine Learning Models and Trade-offs
Why do we need a model? Why estimate f?
- Predictions: Make predictions for new inputs/features
- Inference: understand how Y is affected by each feature
- Which feature has a stronger impact on the response?
- Is the relationship positive or negative?
- Is the relationship linear or more complicated?
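A quick inference sketch using plain NumPy least squares: the synthetic data and its "true" coefficients are assumptions chosen so that the fitted coefficient signs and magnitudes answer the questions above.

```python
# Inference sketch: fit a linear model and read off the sign and size of each
# coefficient to see how Y is affected by each feature.
# Synthetic data; the "true" coefficients are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))                       # two features
y = 3.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

X_design = np.column_stack([np.ones(n), X])       # add intercept column
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print(beta)   # [intercept, beta_1, beta_2]: beta_1 > 0 (positive, strong effect),
              # beta_2 < 0 (negative, weaker effect) -- the inference questions above
```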
How to estimate f?
- Parametric Approach
- First, assume a functional form for f (e.g., a linear model)
- Second, use training data to fit/estimate the model's parameters
- Non-Parametric Approach
- No explicit form of function f is assumed
- Seek an estimate of f that gets as close to the data points as possible
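A short sketch contrasting the two approaches on the same data, assuming scikit-learn; the sine-shaped data and the choice of 5 neighbors are illustrative assumptions.

```python
# Parametric vs. non-parametric estimates of f on the same training data.
# Assumes scikit-learn; the sine-shaped data is an assumption for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0, 6, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=80)

# Parametric: assume f is linear, then estimate its coefficients from the data.
linear = LinearRegression().fit(X, y)

# Non-parametric: no form assumed; predictions follow nearby training points.
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

x_new = np.array([[1.5]])
print(linear.predict(x_new), knn.predict(x_new))
```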
Trade-off: Model Flexibility vs Model Interpretability
Model Accuracy
- In the regression setting, a common measure of accuracy is the mean squared error (MSE)
\[MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - \hat{f}(x_{i})\right)^{2}\]
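A direct computation of the MSE formula above; the arrays are placeholder values for illustration.

```python
# Computing the MSE from the formula above; arrays are illustrative placeholders.
import numpy as np

y = np.array([3.0, -0.5, 2.0, 7.0])         # observed responses y_i
y_hat = np.array([2.5, 0.0, 2.0, 8.0])      # model predictions f_hat(x_i)

mse = np.mean((y - y_hat) ** 2)             # (1/n) * sum of squared errors
print(mse)                                  # 0.375
```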
Overfitting and Underfitting
Two things we need to avoid:
- Overfitting: building a model that is too complex; it fits the training data very well but fails to generalize to new data (e.g., large test MSE)
- Underfitting: building a model that is too simple to capture the variability in the data
- Simple models may not capture the variability in the data
- Complex models may not generalize
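A rough sketch of both failure modes, assuming NumPy's polynomial fitting; the data-generating function, sample sizes, and polynomial degrees are arbitrary choices for illustration.

```python
# Under/overfitting sketch: fit polynomials of increasing degree and compare
# training MSE vs. test MSE. Data and degrees are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(3)
def make_data(n):
    x = rng.uniform(0, 3, n)
    return x, np.sin(2 * x) + rng.normal(scale=0.3, size=n)

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

for degree in (1, 3, 12):
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((y_train - np.polyval(coefs, x_train)) ** 2)
    test_mse = np.mean((y_test - np.polyval(coefs, x_test)) ** 2)
    # Degree 1 typically underfits (both MSEs high); degree 12 typically overfits
    # (small train MSE, larger test MSE); a moderate degree balances the two.
    print(degree, round(train_mse, 3), round(test_mse, 3))
```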
Bias-Variance Tradeoff
\[E\left[(y_{0} - \hat{f}(x_{0}))^{2}\right] = Var(\hat{f}(x_{0})) + [Bias(\hat{f}(x_{0}))]^{2} + Var(\epsilon)\]
- Variance: the amount by which \(\hat{f}\) would change if we estimated it using a different training set
- Bias: the error introduced by approximating a real-life problem with a simpler model
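A simulation sketch of this decomposition at a single point \(x_{0}\): refit the same model on many independent training sets and estimate the variance and squared bias of \(\hat{f}(x_{0})\). The true function, noise level, and quadratic model are assumptions for illustration.

```python
# Bias-variance sketch at one point x0: repeatedly refit the estimator on fresh
# training sets, then measure how its prediction at x0 varies (variance) and how
# far its average prediction is from the true f(x0) (bias).
# The true function, noise level, and quadratic model are assumptions.
import numpy as np

rng = np.random.default_rng(4)
f_true = lambda x: np.sin(2 * x)
x0, sigma = 1.0, 0.3

preds = []
for _ in range(500):                              # 500 independent training sets
    x = rng.uniform(0, 3, 30)
    y = f_true(x) + rng.normal(scale=sigma, size=30)
    coefs = np.polyfit(x, y, 2)                   # quadratic model for f_hat
    preds.append(np.polyval(coefs, x0))
preds = np.array(preds)

variance = preds.var()                            # Var(f_hat(x0))
bias_sq = (preds.mean() - f_true(x0)) ** 2        # [Bias(f_hat(x0))]^2
print(variance, bias_sq, sigma ** 2)              # the three terms of the decomposition
```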
- Classification Setting
- \(\hat{y}_{0} = \hat{f}(x_{0})\) is the predicted output class
- Test error rate: \[Average\left(I(y_{0} \neq \hat{y}_{0})\right)\]
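Computing the test error rate directly from this definition; the label vectors are placeholders for illustration.

```python
# Test error rate: the average of the indicator I(y0 != y0_hat) over the
# test observations. Labels below are illustrative placeholders.
import numpy as np

y_test = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

error_rate = np.mean(y_test != y_pred)    # fraction of misclassified points
print(error_rate)                         # 0.2
```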
Bayes classifier
- Bayes classifier assigns each observation to the most likely class given the feature values.
- Assign \(x_{0}\) to the class j that has the largest \(Pr(Y = j \mid X = x_{0})\)
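A sketch of the Bayes classifier in the idealized case where the priors and class-conditional densities are known (they rarely are in practice); the Gaussian parameters are assumptions, and SciPy is assumed to be available.

```python
# Bayes classifier sketch: with known priors Pr(Y=j) and class-conditional
# densities, assign x0 to the class with the largest posterior Pr(Y=j | X=x0).
# The Gaussian parameters below are assumptions for illustration.
import numpy as np
from scipy.stats import norm

priors = {0: 0.5, 1: 0.5}                        # Pr(Y = j)
densities = {0: norm(loc=-1.0, scale=1.0),       # density of X given Y = 0
             1: norm(loc=+1.0, scale=1.0)}       # density of X given Y = 1

def bayes_classify(x0):
    # Posterior is proportional to prior * likelihood; normalize over classes.
    scores = {j: priors[j] * densities[j].pdf(x0) for j in priors}
    total = sum(scores.values())
    posteriors = {j: s / total for j, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors

print(bayes_classify(0.3))   # class 1 has the larger posterior at x0 = 0.3
```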
K-Nearest Neighbors
- Define a positive integer K
- For each test observation \(x_{0}\), identify the K points in the training data that are closest to \(x_{0}\); this set is referred to as \(N_{0}\)
- Estimate the conditional probability for class j as the fraction of points in \(N_{0}\) whose response equals j: \[Pr(Y = j \mid X = x_{0}) = \frac{1}{K}\sum_{i \in N_{0}}I(y_{i} = j)\]
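A from-scratch sketch of this KNN estimate in plain NumPy; the toy training data and K = 3 are assumptions for illustration.

```python
# Direct implementation of the KNN estimate above: find the K nearest training
# points to x0, then use the fraction of each class among them as Pr(Y=j | X=x0).
# Pure NumPy; the toy data and K = 3 are assumptions for illustration.
import numpy as np

def knn_predict(X_train, y_train, x0, K=3):
    distances = np.linalg.norm(X_train - x0, axis=1)     # Euclidean distance to x0
    neighbors = np.argsort(distances)[:K]                # indices of N0
    classes, counts = np.unique(y_train[neighbors], return_counts=True)
    probs = {c: n / K for c, n in zip(classes, counts)}  # estimated Pr(Y=j | X=x0)
    return classes[np.argmax(counts)], probs             # most likely class + probabilities

X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, x0=np.array([0.9, 1.0]), K=3))
```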