Think Bayesian & statistics review
Main principles
- Use prior knowledge
- Choose the answer that explains the observations best
- Avoid extra assumptions
Example
A man is running. Why?
- He is in a hurry
- He is doing sports (excluded by principle 2: he is not wearing a sports suit, which contradicts the observations)
- He always runs (excluded by principle 3: an extra assumption)
- He saw a dragon (excluded by principle 1: contradicts prior knowledge)
Probability
For a throw of a fair die, the probability of each side is 1/6.
Random variable
Discrete
Probability Mass Function (PMF): \[P(x) = \begin{cases} 0.2 & x = 1 \\ 0.5 & x = 3 \\ 0.3 & x = 7 \\ 0 & \text{otherwise} \end{cases}\]
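A quick sketch of this PMF in Python (the support and probabilities are the ones from the table above; numpy sampling is just one way to sanity-check it):

```python
import numpy as np

# PMF from the table above: support {1, 3, 7}
pmf = {1: 0.2, 3: 0.5, 7: 0.3}
assert abs(sum(pmf.values()) - 1.0) < 1e-12  # probabilities must sum to 1

# Draw samples and compare empirical frequencies to the PMF
rng = np.random.default_rng(0)
values = np.array(list(pmf.keys()))
probs = np.array(list(pmf.values()))
samples = rng.choice(values, size=100_000, p=probs)

for v in values:
    print(v, pmf[v], np.mean(samples == v))  # true vs. empirical probability
```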
Continuous
Probability Density Function (PDF)
\[ P(x \in [a,b]) = \int_{a}^{b} p(x)\,dx \]
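For illustration (the standard normal density is an assumption here, not something fixed by the notes), \(P(x \in [a,b])\) can be computed by integrating the PDF numerically, or directly from the CDF:

```python
from scipy.stats import norm
from scipy.integrate import quad

a, b = -1.0, 2.0  # arbitrary interval

# P(x in [a, b]) by integrating the PDF numerically
p_quad, _ = quad(norm.pdf, a, b)

# Same probability from the CDF: F(b) - F(a)
p_cdf = norm.cdf(b) - norm.cdf(a)

print(p_quad, p_cdf)  # both ≈ 0.8186
```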
Independence
X and Y are independent if: \[P(X,Y) = P(X)P(Y)\]
- P(X,Y) -> Joint
- P(X), P(Y) -> Marginals
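A small numeric check of the definition on a made-up 2×3 joint table (illustrative numbers only): the joint equals the outer product of the marginals exactly when X and Y are independent.

```python
import numpy as np

# Joint table P(X, Y): rows index X, columns index Y
joint = np.array([[0.06, 0.10, 0.04],
                  [0.24, 0.40, 0.16]])

p_x = joint.sum(axis=1)   # marginal P(X)
p_y = joint.sum(axis=0)   # marginal P(Y)

# Independent iff P(X, Y) == P(X) * P(Y) for every cell
print(np.allclose(joint, np.outer(p_x, p_y)))  # True for this table
```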
Conditional probability
Probability of X given that Y happened:
\[P(X|Y) = \frac{P(X,Y)}{P(Y)}\]
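With a small made-up joint table, conditioning is just selecting the slice for \(Y = y\) and renormalizing by the marginal \(P(Y = y)\):

```python
import numpy as np

# Joint table P(X, Y): rows index X, columns index Y
joint = np.array([[0.10, 0.30],
                  [0.20, 0.40]])

y = 1                              # condition on the event Y = y (column index)
p_y = joint[:, y].sum()            # marginal P(Y = y)
p_x_given_y = joint[:, y] / p_y    # P(X | Y = y) = P(X, Y = y) / P(Y = y)

print(p_x_given_y, p_x_given_y.sum())  # [0.4286, 0.5714], sums to 1
```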
Chain rule
\[\begin{aligned} P(X,Y) &= P(X|Y)P(Y) \\ P(X,Y,Z) &= P(X|Y,Z)P(Y|Z)P(Z) \\ P(X_1,\cdots,X_N) &= \prod_{i=1}^{N}P(X_i|X_1,\cdots,X_{i-1}) \end{aligned}\]
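A numeric sanity check of the three-variable factorization on a random joint table (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Random 2x2x2 joint table P(X, Y, Z), normalized to sum to 1
joint = rng.random((2, 2, 2))
joint /= joint.sum()

p_z = joint.sum(axis=(0, 1))   # P(Z)
p_yz = joint.sum(axis=0)       # P(Y, Z)
p_y_given_z = p_yz / p_z       # P(Y | Z)
p_x_given_yz = joint / p_yz    # P(X | Y, Z), broadcasting over X

# Chain rule: P(X, Y, Z) = P(X | Y, Z) P(Y | Z) P(Z)
reconstructed = p_x_given_yz * p_y_given_z * p_z
print(np.allclose(joint, reconstructed))  # True
```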
Sum rule
\[P(X) = \int_{-\infty}^{\infty}P(X,Y)\,dY \]
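A continuous illustration (the bivariate normal joint is an assumption made for the example): integrating the joint density over y recovers the marginal density of x.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm
from scipy.integrate import quad

# Illustrative joint: bivariate normal with correlation 0.6
cov = np.array([[1.0, 0.6],
                [0.6, 1.0]])
joint = multivariate_normal(mean=[0.0, 0.0], cov=cov)

x = 0.7  # evaluate the marginal p(x) at this point

# Sum rule: p(x) = ∫ p(x, y) dy
p_marginal, _ = quad(lambda y: joint.pdf([x, y]), -np.inf, np.inf)

# The marginal of this bivariate normal in x is N(0, 1), so compare directly
print(p_marginal, norm.pdf(x))  # both ≈ 0.312
```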
Total probability
- \(B_1, B_2, \cdots\) are pairwise mutually exclusive, i.e. \(B_i \cap B_j = \emptyset\) for \(i \neq j\), \(i, j = 1, 2, \cdots\), and \(P(B_i) > 0\) for \(i = 1, 2, \cdots\);
- \(B_1 \cup B_2 \cup \cdots = \Omega\); then the events \(B_1, B_2, \cdots\) are called a partition of the sample space \(\Omega\).
\[P(A) = \sum_{i=1}^{\infty}P(B_i)P(A|B_i)\]
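A worked example with hypothetical numbers: machines \(B_1, B_2, B_3\) produce 50%, 30%, 20% of all parts, with defect rates 1%, 2%, 3%, and A is the event "a part is defective".

```python
# Law of total probability over a partition B_1, B_2, B_3
p_B = [0.5, 0.3, 0.2]             # P(B_i): share of parts from each machine
p_A_given_B = [0.01, 0.02, 0.03]  # P(A | B_i): defect rate of each machine

# P(A) = sum_i P(B_i) * P(A | B_i)
p_A = sum(pb * pa for pb, pa in zip(p_B, p_A_given_B))
print(p_A)  # 0.017
```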
Bayes theorem
- \(\theta\): parameters
- \(X\): observations
- \(P(\theta|X)\): Posterior
- \(P(X)\): Evidence
- \(P(X|\theta)\): Likelihood
- \(P(\theta)\): Prior
\[P(\theta|X) = \frac{P(X,\theta)}{P(X)} = \frac{P(X|\theta)P(\theta)}{P(X)}\]
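A classic numeric illustration (the disease-testing numbers are hypothetical): the prior is the prevalence, the likelihood is the test behavior, and the evidence comes from the total probability rule.

```python
# Posterior P(disease | positive test) via Bayes theorem
prior = 0.01            # P(theta): disease prevalence
likelihood = 0.95       # P(X | theta): P(positive | disease), sensitivity
false_positive = 0.05   # P(positive | no disease)

# Evidence P(X) by the total probability rule
evidence = likelihood * prior + false_positive * (1 - prior)

posterior = likelihood * prior / evidence
print(posterior)  # ≈ 0.161: a positive test is far from conclusive
```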
Bayesian approach to statistics
Frequentist
- Objective
- \(\theta\) is fixed, X is random
- Training
Maximum Likelihood: find the parameters \(\theta\) that maximize the likelihood, i.e. the probability of the data given the parameters: \[\hat{\theta} = \arg\max_{\theta}P(X|\theta)\]
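A minimal MLE sketch, assuming a Bernoulli (coin-flip) model purely for illustration: a grid search over \(\theta\) recovers the analytic maximum likelihood estimate, the empirical frequency of heads.

```python
import numpy as np

# Observed coin flips (1 = heads); a hypothetical data set
x = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

# Log-likelihood of a Bernoulli model: sum_i log P(x_i | theta)
thetas = np.linspace(0.001, 0.999, 999)
log_lik = (x.sum() * np.log(thetas)
           + (len(x) - x.sum()) * np.log(1 - thetas))

theta_hat = thetas[np.argmax(log_lik)]
print(theta_hat, x.mean())  # both ≈ 0.7: MLE equals the empirical frequency
```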
Bayesian
- Subjective
- X is fixed (observed), \(\theta\) is random
- Training (Bayes theorem)
Bayesians compute the posterior, i.e. the probability of the parameters given the data: \[P(\theta|X) = \frac{P(X|\theta)P(\theta)}{P(X)}\]
- Classification
- Training: \[P(\theta|x_{tr},y_{tr}) = \frac{P(y_{tr}|\theta,x_{tr})P(\theta)}{P(y_{tr}|x_{tr})}\]
- Prediction: \[P(y_{ts}|x_{ts},x_{tr},y_{tr}) = \int P(y_{ts}|x_{ts},\theta)P(\theta|x_{tr},y_{tr})\,d\theta\]
- On-line learning (get a new posterior after each observation): \[P_k(\theta) = P(\theta|x_k) = \frac{P(x_k|\theta)P_{k-1}(\theta)}{P(x_k)}\]
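A sketch of this on-line rule with a conjugate Beta prior and Bernoulli likelihood (an assumption made for illustration): the posterior after observation \(x_k\) becomes the prior for \(x_{k+1}\), and with conjugacy the update reduces to incrementing two counts.

```python
# On-line Bayesian updating with a Beta-Bernoulli conjugate pair:
# the posterior after each x_k becomes the prior for x_{k+1}.
a, b = 1.0, 1.0                       # Beta(1, 1) prior, i.e. uniform over theta
observations = [1, 0, 1, 1, 1, 0, 1]  # hypothetical coin flips

for k, x_k in enumerate(observations, start=1):
    # P_k(theta) ∝ P(x_k | theta) P_{k-1}(theta)  =>  Beta(a + x_k, b + 1 - x_k)
    a += x_k
    b += 1 - x_k
    print(f"after x_{k}: posterior mean = {a / (a + b):.3f}")
```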
How to build a model
A model is the joint probability distribution of all the variables.
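One way to read this: writing down a model means writing down \(P(X,\theta) = P(X|\theta)P(\theta)\), a likelihood times a prior. A tiny generative sketch (the Gaussian prior and likelihood are assumptions for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# A model is the joint P(X, theta) = P(X | theta) P(theta):
theta = rng.normal(loc=0.0, scale=1.0)   # sample theta from the prior P(theta)
x = rng.normal(loc=theta, scale=0.5)     # sample data from the likelihood P(X | theta)

# The joint density of the sampled pair factorizes as prior * likelihood
joint = norm.pdf(theta, loc=0.0, scale=1.0) * norm.pdf(x, loc=theta, scale=0.5)
print(theta, x, joint)
```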