
INFSCI 2595: Machine Learning (Part One)

These notes draw on lecture slides from INFSCI 2595, taught by Dr. Mai Abdelhakim.

Introduction

What is Machine Learning?

  • Subfield of artificial intelligence
  • Field of study that gives computers the ability to learn without being explicitly programmed

How can we build computer systems that learn and improve with experience?

  • Statistics: draw conclusions from data and estimate how reliable those conclusions are
  • Optimization and computing power: solve the resulting problems

A machine learns with respect to a particular task T, performance metric P, and experience E if its performance P on task T improves with experience E.

Why Machine Learning is Important

  • Provides solutions to complex problems that cannot be easily programmed
  • Can adapt to new data
  • Helps us to understand complicated phenomena
  • Can outperform humans on some tasks

Machine Learning Algorithms

Supervised Learning

  1. Learn using labeled data (correct answers are given in the learning phase)
  2. Make predictions on previously unseen data
  3. Two types of problems
    • Regression: Target values (Y) are continuous/quantitative
    • Classification: Target values (Y) are discrete/finite/qualitative

Unsupervised Learning

  1. Clustering analysis
  2. Finding groups of similar users
  3. Detecting abnormal patterns

Machine Learning Models and Trade-offs

Why do we need a model? Why estimate f?

  • Predictions: Make predictions for new inputs/features
  • Inference: Understand how Y is affected by each feature
    • Which feature has the strongest impact on the response?
    • Is the relationship positive or negative?
    • Is the relationship linear or more complicated?

How to estimate f?

  1. Parametric Approach
    • First, assume a functional form for f
    • Second, use training data to fit the model
  2. Non-Parametric Approach
    • No explicit form of function f is assumed
    • Seek to estimate f as close as possible to the data points
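
As a rough illustration of the two approaches, the sketch below fits the same toy data twice: once by assuming a linear form and estimating its two coefficients, and once with a nearest-neighbor average that assumes no form at all. The function names (`fit_linear`, `fit_knn_regressor`) and the data are made up for this example; NumPy is assumed to be available.

```python
import numpy as np

def fit_linear(x, y):
    """Parametric: assume f(x) = b0 + b1 * x and estimate (b0, b1) by least squares."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda x_new: coef[0] + coef[1] * np.asarray(x_new, dtype=float)

def fit_knn_regressor(x, y, k=3):
    """Non-parametric: no assumed form; average the responses of the k nearest training points."""
    def predict(x_new):
        x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
        dists = np.abs(x_new[:, None] - x[None, :])    # pairwise distances
        nearest = np.argsort(dists, axis=1)[:, :k]     # indices of the k closest points
        return y[nearest].mean(axis=1)
    return predict

# Toy, noise-free linear data (hypothetical, for illustration only).
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = 2.0 * x_train + 1.0

linear_model = fit_linear(x_train, y_train)
knn_model = fit_knn_regressor(x_train, y_train, k=1)

print(linear_model(5.0), knn_model(5.0))
```

On these data the linear fit extends the line beyond the training range, while the 1-nearest-neighbor estimate simply repeats the response of the closest training point. That is one way to see what assuming a functional form buys and costs.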

Trade-off: Model Flexibility vs Model Interpretability


Model Accuracy

  1. In the regression setting, a common measure is the mean squared error (MSE)

\[MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - \hat{f}(x_{i})\right)^{2}\]
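
A minimal sketch of the MSE computation, assuming NumPy and using made-up numbers; `mean_squared_error` here is a local helper written for this example, not a library call.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y = [3.0, 5.0, 7.0]        # observed responses y_i (hypothetical)
y_hat = [2.0, 5.0, 9.0]    # predictions f_hat(x_i) (hypothetical)

mse = mean_squared_error(y, y_hat)   # (1 + 0 + 4) / 3
print(mse)
```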

Overfitting and Underfitting

Two things we need to avoid:

  • Overfitting: building a model that is too complex; it fits the training data very well but fails to generalize to new data (e.g. large test MSE)
  • Underfitting: building a model that is too simple to capture the variability in the data

  • Simple models may not capture the variability in the data
  • Complex models may not generalize
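
One way to see both failure modes is to fit polynomials of increasing degree to noisy data and compare training and test MSE. The sketch below is only an illustration under assumed settings: the sine-shaped true function, the noise level, the sample sizes, and the degrees are all arbitrary choices, and NumPy's `Polynomial.fit` stands in for "a model of adjustable flexibility".

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_f(x):
    # Assumed "real" relationship, chosen only for this demonstration.
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0, 1, 30)
y_train = true_f(x_train) + rng.normal(0, 0.3, x_train.size)
x_test = rng.uniform(0, 1, 200)
y_test = true_f(x_test) + rng.normal(0, 0.3, x_test.size)

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

results = {}
for degree in (1, 3, 9):
    model = Polynomial.fit(x_train, y_train, degree)   # least-squares polynomial fit
    results[degree] = (mse(y_train, model(x_train)), mse(y_test, model(x_test)))

for degree, (train_mse, test_mse) in results.items():
    print(f"degree={degree}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

Training MSE cannot increase as the degree grows, because each higher-degree model contains the lower-degree ones. Test MSE carries no such guarantee, which is why it is the quantity to watch when judging whether a model underfits or overfits.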

Bias-Variance Tradeoff

\[E\left(y_{0} - \hat{f}(x_{0})\right)^{2} = Var(\hat{f}(x_{0})) + [Bias(\hat{f}(x_{0}))]^{2} + Var(\epsilon)\]

  • Variance: the amount by which \(\hat{f}\) would change if it were estimated from a different training set
  • Bias: the error introduced by approximating a real-life problem with a simpler model
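
The variance and bias terms can be approximated by simulation: draw many training sets, refit the model on each, and look at how the prediction at a fixed point \(x_{0}\) spreads around the true value. The sketch below does this for a simple and a more flexible polynomial. The true function, noise level `SIGMA`, query point `X0`, and degrees are assumptions made for the example.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
SIGMA = 0.3   # assumed standard deviation of the noise epsilon
X0 = 0.25     # assumed query point x_0

def true_f(x):
    # Assumed true relationship, chosen only for this demonstration.
    return np.sin(2 * np.pi * x)

def predictions_at_x0(degree, n_sets=500, n_points=30):
    """Refit on many independent training sets and collect f_hat(x_0)."""
    preds = np.empty(n_sets)
    for s in range(n_sets):
        x = rng.uniform(0, 1, n_points)
        y = true_f(x) + rng.normal(0, SIGMA, n_points)
        preds[s] = Polynomial.fit(x, y, degree)(X0)
    return preds

summary = {}
for degree in (1, 5):
    preds = predictions_at_x0(degree)
    variance = float(preds.var())                        # Var(f_hat(x_0))
    bias_sq = float((preds.mean() - true_f(X0)) ** 2)    # [Bias(f_hat(x_0))]^2
    spread = float(np.mean((preds - true_f(X0)) ** 2))   # equals variance + bias_sq
    summary[degree] = (variance, bias_sq, spread)
    print(f"degree={degree}  variance={variance:.4f}  bias^2={bias_sq:.4f}  "
          f"expected test MSE ~ {variance + bias_sq + SIGMA ** 2:.4f}")
```

Adding `SIGMA ** 2` to the two estimated terms gives the right-hand side of the decomposition; the noise variance is the part no model can remove.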
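  2. In the classification setting, a common measure is the test error rate
    • \(\hat{y}_{0} = \hat{f}(x_{0})\) is the predicted output class
    • Test error rate: \[Average(I(y_{0} \neq \hat{y}_{0}))\] where \(I(\cdot)\) equals 1 if its argument is true and 0 otherwise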
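
A small sketch of the test error rate, assuming NumPy; the labels are invented for illustration and `error_rate` is a helper defined here.

```python
import numpy as np

def error_rate(y_true, y_pred):
    """Fraction of observations whose predicted class differs from the true class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))   # mean of the indicator I(y_0 != y_hat_0)

y_test = ["spam", "ham", "ham", "spam", "ham"]     # true classes (hypothetical)
y_hat = ["spam", "ham", "spam", "spam", "spam"]    # predicted classes (hypothetical)

rate = error_rate(y_test, y_hat)   # 2 mismatches out of 5
print(rate)
```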

Bayes classifier

  • Bayes classifier assigns each observation to the most likely class given the feature values.
  • Assign \(x_{0}\) to the class \(j\) that has the largest \(Pr(Y = j|X = x_{0})\)
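
The Bayes classifier needs the true conditional probabilities, which are unknown for real data. The sketch below therefore assumes a fully known toy distribution (two equally likely classes with Gaussian features) so that \(Pr(Y = j|X = x_{0})\) can be computed exactly. All constants and names are hypothetical.

```python
import math

# Assumed, fully known data-generating distribution (for illustration only).
PRIORS = {"A": 0.5, "B": 0.5}   # Pr(Y = j)
MEANS = {"A": 0.0, "B": 2.0}    # mean of X within each class
STD = 1.0                       # shared standard deviation

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def posterior(x0):
    """Pr(Y = j | X = x0) for every class j, via Bayes' rule."""
    joint = {j: PRIORS[j] * normal_pdf(x0, MEANS[j], STD) for j in PRIORS}
    total = sum(joint.values())
    return {j: p / total for j, p in joint.items()}

def bayes_classify(x0):
    """Assign x0 to the class with the largest posterior probability."""
    post = posterior(x0)
    return max(post, key=post.get)

print(bayes_classify(0.3), bayes_classify(1.7))
```

With equal priors and equal spreads, the decision boundary in this toy setup sits midway between the two class means.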

K-Nearest Neighbors

  • Define a positive integer K
  • For each test observation \(x_{0}\), identify the K points in the training data that are closest to \(x_{0}\); call this set \(N_{0}\)
  • Estimate the conditional probability for class j as the fraction of points in \(N_{0}\) whose response values equal j: \[Pr(Y = j | X = x_{0}) = \frac{1}{K}\sum_{i \in N_{0}}I(y_{i} = j)\]
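
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not an optimized implementation: it assumes Euclidean distance (the notes do not name a metric), breaks distance ties by array order, and uses invented training points.

```python
import numpy as np

def knn_class_probabilities(X_train, y_train, x0, k):
    """Estimate Pr(Y = j | X = x0) as the fraction of the k nearest neighbors in class j."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    dists = np.linalg.norm(X_train - np.asarray(x0, dtype=float), axis=1)
    neighbors = np.argsort(dists)[:k]          # indices of the set N_0
    classes = np.unique(y_train)
    return {c.item(): float(np.mean(y_train[neighbors] == c)) for c in classes}

def knn_predict(X_train, y_train, x0, k):
    """Assign x0 to the class with the largest estimated probability."""
    probs = knn_class_probabilities(X_train, y_train, x0, k)
    return max(probs, key=probs.get)

# Hypothetical training data: two well-separated groups.
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]

probs = knn_class_probabilities(X_train, y_train, x0=[1, 1], k=3)
print(probs, knn_predict(X_train, y_train, x0=[4, 4], k=5))
```

Small K gives a flexible, high-variance decision boundary; larger K averages over more neighbors and smooths it, which ties back to the bias-variance tradeoff.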
