0%

Learning to answer Yes or No

Posted on 2019-02-15 Edited on 2019-07-05 In Artificial Intelligence , Machine Learning Views: Valine:

Lecture Two(Learning to answer Yes or No), This is the course note of Machine Learning Foundation from Hsuan-Tien Lin(林轩田) from National Taiwan University

What hypothesis set can we use?

question.png

A Simple Hypothesis Set: the "Perceptron"

Perceptron in \(\mathbb{R}^2\)

\[h(x) = sign(w_0 + w_1x_1 + w_2x_2)\]

features x: points on the plane (or points in \(\mathbb{R}^d\) )
labels y: (+1), × (-1)
hypothesis h: lines (or hyperplanes in \(\mathbb{R}^d\) ),positive on one side of a line, negative on the other side
different line classifies simples differently

perceptrons <=> linear (binary) classifiers

Select g from \(\mathscr{H}\)

want: \(g \approx f\) (hard when f unknown)
almost necessary: \(g \approx f\) on D, ideally \(g(x_n) = f(x_n) = y_n\)
difficult: H is of infinite size
idea: start from some \(g_0\), and correct its mistakes on \(D\)

Perceptron Learning Algorithm

if PLA halts (i.e. no more mistakes), (necessary condition) \(D\) allows some w to make no mistake
call such \(D\) linear separable
as long as linear separable and correct by mistake
- inner product of \(w_f\) and \(w_t\) grows fast; length of wt grows slowly
- PLA ‘lines’ are more and more aligned with \(w_f \rightarrow halts\)

Line with Noise Tolerance

\(D\) is not linear separable?

not linear separable

Pocket Algorithm

Find the best weights in pocket until enough iterations

pocket algorithm

Question

Since we do not know whether D is linear separable in advance, we may decide to just go with pocket instead of PLA. If D is actually linear separable, what’s the difference between the two?
1 pocket on D is slower than PLA
2 pocket on D is faster than PLA
3 pocket on D returns a better g in approximating f than PLA
4 pocket on D returns a worse g in approximating f than PLA

answer: Because pocket need to check whether \(w_{t+1}\) is better than \(\hat{w}\) in each iteration, it is slower than PLA. On linear separable D, \(w_{POCKET}\) is the same as \(w_{PLA}\), both making no mistakes.