overview

## Uniform distribution(continuous)

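• The uniform distribution has the same probability density everywhere on [a, b], so probabilities are easy to compute: the probability of a sub-interval is its length divided by $b - a$.

$$f(x) = \frac{1}{b - a} \quad \text{for } a \le x \le b, \qquad f(x) = 0 \text{ otherwise}$$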
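As a minimal sketch of the constant density on [a, b], the snippet below evaluates the uniform density and the probability of a sub-interval. The function names `uniform_pdf` and `uniform_interval_prob` and the interval [2, 6] are illustrative assumptions, not taken from the original notebook.

```python
def uniform_pdf(x, a, b):
    """Density of Uniform(a, b): constant 1 / (b - a) on [a, b], zero outside."""
    if b <= a:
        raise ValueError("require a < b")
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_interval_prob(c, d, a, b):
    """P(c <= X <= d) for X ~ Uniform(a, b): overlap length divided by (b - a)."""
    if b <= a:
        raise ValueError("require a < b")
    lo, hi = max(a, c), min(b, d)
    return max(0.0, hi - lo) / (b - a)


# Hypothetical support [2, 6]: the density is flat, and [3, 4] covers a quarter of it.
print(uniform_pdf(3.0, 2.0, 6.0))
print(uniform_interval_prob(3.0, 4.0, 2.0, 6.0))
```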



## Bernoulli distribution(discrete)

• The Bernoulli distribution does not take a prior probability P(X) into account. If we fit it purely by maximum likelihood, we are therefore vulnerable to overfitting.
• Binary cross entropy is the usual loss for binary classification. It has the same form as the negative log of the Bernoulli probability mass function.
• This is the loss used in logistic regression.

$$f(x; p) = p^{x} (1 - p)^{1 - x} \quad \text{for } x \in \{0, 1\}$$
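The link between binary cross entropy and the Bernoulli distribution can be checked numerically. The sketch below assumes the standard Bernoulli probability mass function $p^x (1-p)^{1-x}$. The function names and the predicted probability 0.8 are illustrative choices.

```python
import math


def bernoulli_pmf(x, p):
    """P(X = x) for x in {0, 1}: p^x * (1 - p)^(1 - x)."""
    return p if x == 1 else 1.0 - p


def binary_cross_entropy(y, p):
    """Binary cross entropy for one label y in {0, 1} and predicted probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


p = 0.8  # hypothetical predicted probability of class 1
for y in (0, 1):
    nll = -math.log(bernoulli_pmf(y, p))  # negative log-likelihood
    bce = binary_cross_entropy(y, p)
    print(y, nll, bce)  # the two columns agree for both labels
```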

## Binomial distribution(discrete)

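• The binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each succeeding with probability p.
• The number of trials n is specified in advance; this is the sense in which the binomial distribution takes prior information into account.

$$f(k; n, p) = \Pr(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}$$

for k = 0, 1, 2, …, n, where

$$\binom{n}{k} = \frac{n!}{k! \, (n - k)!}$$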
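A short sketch of the binomial probability mass function $\binom{n}{k} p^k (1-p)^{n-k}$, using `math.comb` from the standard library. The values n = 10 and p = 0.3 are arbitrary illustrative choices.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); zero outside 0..n."""
    if not 0 <= k <= n:
        return 0.0
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


n, p = 10, 0.3  # hypothetical parameters
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
total = sum(pmf)                                # should be very close to 1
mean = sum(k * q for k, q in enumerate(pmf))    # should be very close to n * p
print(total, mean)
```

With n = 1 the same function reduces to the Bernoulli probability mass function.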

## Multi-Bernoulli distribution, Categorical distribution(discrete)

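• The multi-Bernoulli distribution, also called the categorical distribution, generalizes the Bernoulli distribution to more than two outcomes.
• Cross entropy has the same form as the negative log of the multi-Bernoulli probability mass function.

$$f(x \mid \boldsymbol{p}) = \prod_{i=1}^{k} p_i^{[x = i]}$$

where $[x = i]$ evaluates to 1 if $x = i$, 0 otherwise. This formulation has various advantages; for example, it makes the likelihood of a set of independent categorical observations easy to write out.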
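To illustrate the indicator-exponent form $\prod_i p_i^{[x = i]}$ and its connection to cross entropy, here is a small sketch. Categories are indexed from 0 for convenience, and the probability vector is a made-up example.

```python
import math


def categorical_pmf(x, probs):
    """f(x | p) = prod_i p_i^[x == i], with categories indexed from 0."""
    result = 1.0
    for i, p_i in enumerate(probs):
        result *= p_i ** (1 if x == i else 0)
    return result


def cross_entropy(one_hot, probs):
    """Cross entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)


probs = [0.2, 0.5, 0.3]  # hypothetical class probabilities
x = 1                    # observed category
one_hot = [1 if i == x else 0 for i in range(len(probs))]

nll = -math.log(categorical_pmf(x, probs))
ce = cross_entropy(one_hot, probs)
print(nll, ce)  # the two values agree
```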

## Multinomial distribution(discrete)

• The multinomial distribution relates to the categorical distribution in the same way that the binomial distribution relates to the Bernoulli distribution.
• For example, it models the probability of counts for each side of a k-sided die rolled n times. For n independent trials each of which leads to a success for exactly one of k categories, with each category having a given fixed success probability, the multinomial distribution gives the probability of any particular combination of numbers of successes for the various categories.
• When k is 2 and n is 1, the multinomial distribution is the Bernoulli distribution. When k is 2 and n is bigger than 1, it is the binomial distribution. When k is bigger than 2 and n is 1, it is the categorical distribution.

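$$f(x_1, \cdots, x_k; n, p_1, \cdots, p_k) = \frac{n!}{x_1! \cdots x_k!} \, p_1^{x_1} \cdots p_k^{x_k} \quad \text{when } \sum_{i=1}^{k} x_i = n$$

for non-negative integers $x_1, \cdots, x_k$, and 0 otherwise.

The probability mass function can be expressed using the gamma function as:

$$f(x_1, \cdots, x_k; p_1, \cdots, p_k) = \frac{\Gamma\left(\sum_{i} x_i + 1\right)}{\prod_{i} \Gamma(x_i + 1)} \prod_{i=1}^{k} p_i^{x_i}$$

The array below is sample output: ten count vectors over k = 6 categories, each row summing to n = 20.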

array([[1, 1, 2, 3, 8, 5],
[2, 4, 3, 3, 6, 2],
[1, 6, 3, 2, 3, 5],
[5, 3, 4, 4, 2, 2],
[3, 8, 4, 2, 0, 3],
[2, 4, 1, 5, 1, 7],
[6, 3, 2, 4, 3, 2],
[8, 2, 1, 1, 4, 4],
[3, 6, 4, 1, 4, 2],
[3, 2, 3, 3, 6, 3]])
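The probability of any one of the count vectors printed above can be evaluated with the multinomial probability mass function, either with factorials or with the gamma function (via `math.lgamma`). This is a sketch under the assumption of a fair six-sided die, which the output above does not state explicitly.

```python
import math


def multinomial_pmf(counts, probs):
    """n! / (x_1! ... x_k!) * prod_i p_i^x_i, with n = sum(counts)."""
    n = sum(counts)
    coeff = math.factorial(n)
    for x in counts:
        coeff //= math.factorial(x)  # each partial quotient stays an integer
    prob = float(coeff)
    for x, p in zip(counts, probs):
        prob *= p ** x
    return prob


def multinomial_pmf_gamma(counts, probs):
    """Same quantity via Gamma(n + 1) / prod_i Gamma(x_i + 1), computed in log space."""
    n = sum(counts)
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    log_prob = log_coeff + sum(x * math.log(p) for x, p in zip(counts, probs) if x > 0)
    return math.exp(log_prob)


counts = [1, 1, 2, 3, 8, 5]  # first row of the array above
fair_die = [1.0 / 6.0] * 6   # assumption: all six faces equally likely
print(multinomial_pmf(counts, fair_die))
print(multinomial_pmf_gamma(counts, fair_die))
```

The log-space version is the safer choice for large n, where the factorials become very large.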


## Beta distribution(continuous)

• The beta distribution is conjugate to the binomial and Bernoulli distributions.
• Thanks to conjugacy, the posterior distribution has the same form as the prior, so it is easy to obtain from a prior distribution we already know.
• The uniform distribution on [0, 1] is the special case of the beta distribution with alpha=1, beta=1.
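Conjugacy means the posterior can be written down by updating the two parameters directly. The sketch below uses the standard result that a Beta(α, β) prior combined with k successes in n Bernoulli trials gives a Beta(α + k, β + n − k) posterior. The counts (7 successes in 10 trials) are invented for illustration.

```python
import math


def beta_pdf(x, alpha, beta):
    """Beta density on the open interval (0, 1); zero elsewhere."""
    if not 0.0 < x < 1.0:
        return 0.0
    norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return norm * x ** (alpha - 1) * (1.0 - x) ** (beta - 1)


def beta_binomial_update(alpha, beta, successes, trials):
    """Posterior parameters after observing `successes` out of `trials`."""
    if not 0 <= successes <= trials:
        raise ValueError("require 0 <= successes <= trials")
    return alpha + successes, beta + (trials - successes)


prior = (1.0, 1.0)  # Beta(1, 1) is the uniform distribution on [0, 1]
posterior = beta_binomial_update(*prior, successes=7, trials=10)  # hypothetical data
posterior_mean = posterior[0] / (posterior[0] + posterior[1])

print(beta_pdf(0.3, *prior))  # flat density under the uniform prior
print(posterior, posterior_mean)
```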

## Gamma distribution(continuous)

• The gamma distribution is related to the beta distribution: if $X \sim Gamma(a,1)$ and $Y \sim Gamma(b,1)$ are independent, then $\frac{X}{X + Y}$ follows $Beta(a,b)$.

• The exponential distribution and chi-squared distribution are special cases of the gamma distribution.

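A random variable X that is gamma-distributed with shape α and rate β is denoted:

$$X \sim \Gamma(\alpha, \beta) \equiv \operatorname{Gamma}(\alpha, \beta)$$

The corresponding probability density function in the shape-rate parametrization is:

$$f(x; \alpha, \beta) = \frac{\beta^{\alpha} x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)} \quad \text{for } x > 0, \ \alpha, \beta > 0$$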
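The following sketch evaluates the shape-rate gamma density $\beta^{\alpha} x^{\alpha-1} e^{-\beta x} / \Gamma(\alpha)$ and checks two relationships mentioned in this section: with α = 1 it matches the exponential density, and the ratio X / (X + Y) of two independent gamma draws behaves like a Beta(a, b) variable, whose mean is a / (a + b). The seed, sample size, and parameter values are arbitrary. Note that `random.gammavariate` takes shape and scale, not rate; with scale 1 the two coincide.

```python
import math
import random


def gamma_pdf(x, alpha, beta):
    """Gamma density in the shape-rate parametrization."""
    if x <= 0:
        return 0.0
    return beta ** alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)


def exponential_pdf(x, rate):
    """Exponential density with the given rate."""
    return rate * math.exp(-rate * x) if x > 0 else 0.0


def beta_from_gammas(a, b, rng):
    """Draw X ~ Gamma(a, 1), Y ~ Gamma(b, 1) and return X / (X + Y)."""
    x = rng.gammavariate(a, 1.0)  # arguments are (shape, scale)
    y = rng.gammavariate(b, 1.0)
    return x / (x + y)


rng = random.Random(0)  # fixed seed so the run is repeatable
a, b = 2.0, 3.0         # hypothetical shape parameters
ratios = [beta_from_gammas(a, b, rng) for _ in range(20000)]
sample_mean = sum(ratios) / len(ratios)

print(sample_mean, a / (a + b))                             # close to each other
print(gamma_pdf(1.5, 1.0, 2.0), exponential_pdf(1.5, 2.0))  # same density
```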



## Dirichlet distribution(continuous)

• The Dirichlet distribution is conjugate to the multinomial distribution. That is, multiplying a Dirichlet prior by a multinomial likelihood yields a posterior that is still a Dirichlet distribution.
• If k=2, it reduces to the beta distribution.

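$$f(x_1, \cdots, x_K; \alpha_1, \cdots, \alpha_K) = \frac{1}{B(\boldsymbol{\alpha})} \prod_{i=1}^{K} x_i^{\alpha_i - 1}$$

where $\{x_k\}_{k=1}^{k=K}$ belong to the standard $K-1$ simplex, or in other words:

$$\sum_{i=1}^{K} x_i = 1 \quad \text{and} \quad x_i \ge 0 \text{ for all } i \in \{1, \cdots, K\}$$

The normalizing constant is the multivariate beta function, which can be expressed in terms of the gamma function:

$$B(\boldsymbol{\alpha}) = \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{K} \alpha_i\right)}$$

The Dirichlet distribution can be viewed as a distribution over distributions. To see what that means, consider an example. Suppose we have a six-sided die with faces {1,2,3,4,5,6}. We roll it 10000 times, and the six faces come up {2000,2000,2000,2000,1000,1000} times. Estimating each face's probability by the ratio of its count to the total number of rolls gives {0.2,0.2,0.2,0.2,0.1,0.1}. Now suppose that is not enough for us, and we run 10000 experiments, rolling the die 10000 times in each one. We want to know how probable it is that the face probabilities come out as {0.2,0.2,0.2,0.2,0.1,0.1} (the next experiment might well give {0.1, 0.1, 0.2, 0.2, 0.2, 0.2} instead). We are then reasoning about a distribution over the die's face-probability distribution, and that distribution is the Dirichlet distribution.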
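The "distribution over distributions" idea can be made concrete by sampling. The sketch below draws probability vectors for a six-sided die from a Dirichlet distribution, using the standard construction of normalizing independent Gamma(α_i, 1) draws. The concentration parameters are hypothetical, chosen only so that the average draw is {0.2, 0.2, 0.2, 0.2, 0.1, 0.1} as in the die example.

```python
import random


def sample_dirichlet(alphas, rng):
    """One Dirichlet draw: normalize independent Gamma(alpha_i, 1) variables."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]  # arguments are (shape, scale)
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)  # fixed seed so the run is repeatable
alphas = [20.0, 20.0, 20.0, 20.0, 10.0, 10.0]  # hypothetical concentration parameters

samples = [sample_dirichlet(alphas, rng) for _ in range(5000)]

# Every sample is itself a probability distribution over the six faces.
print(samples[0], sum(samples[0]))

# Averaging the samples recovers alpha_i / sum(alphas) for each face.
means = [sum(s[i] for s in samples) / len(samples) for i in range(len(alphas))]
print(means)
```

Larger concentration parameters with the same proportions make the sampled vectors cluster more tightly around their mean.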



## Exponential distribution(continuous)

• The exponential distribution is the special case of the gamma distribution in which alpha is 1.

## Gaussian distribution(continuous)



## Poisson distribution(discrete)

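• The number of times an event occurs in a fixed time interval, given its average rate, follows a Poisson distribution.

$$P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}$$

• $\lambda$ is the average number of events per interval.
• e is Euler’s number (e = 2.71828…)
• k! is the factorial of k.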
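A minimal sketch of the Poisson probability mass function $\lambda^k e^{-\lambda} / k!$, where λ is the average number of events per interval. The rate λ = 4 is an arbitrary illustrative value.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): lam^k * e^(-lam) / k!."""
    if k < 0:
        return 0.0
    return lam ** k * math.exp(-lam) / math.factorial(k)


lam = 4.0  # hypothetical average number of events per interval
pmf = [poisson_pmf(k, lam) for k in range(60)]  # the tail beyond 60 is negligible here
total = sum(pmf)
mean = sum(k * q for k, q in enumerate(pmf))

print(poisson_pmf(0, lam))  # probability of no events, equal to e^(-lam)
print(total, mean)          # close to 1 and to lam respectively
```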

## Chi-squared distribution(continuous)

• Chi-square distribution with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables.
• The chi-square distribution is a special case of the gamma distribution, with shape $k/2$ and rate $1/2$.

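If $Z_1, \cdots, Z_k$ are independent, standard normal random variables, then the sum of their squares,

$$Q = \sum_{i=1}^{k} Z_i^2$$

is distributed according to the chi-square distribution with k degrees of freedom. This is usually denoted as

$$Q \sim \chi^2(k) \quad \text{or} \quad Q \sim \chi^2_k$$

The chi-square distribution has one parameter: a positive integer k that specifies the number of degrees of freedom (the number of $Z_i$ s).

The probability density function (pdf) of the chi-square distribution is

$$f(x; k) = \frac{x^{k/2 - 1} e^{-x/2}}{2^{k/2} \, \Gamma(k/2)} \quad \text{for } x > 0, \qquad f(x; k) = 0 \text{ otherwise}$$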
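The sketch below checks two properties of the chi-square distribution numerically: its density $x^{k/2-1} e^{-x/2} / (2^{k/2} \Gamma(k/2))$ coincides with a gamma density of shape k/2 and rate 1/2, and a sum of k squared standard normal draws has mean close to k. The seed, sample size, and k = 3 are arbitrary choices.

```python
import math
import random


def chi2_pdf(x, k):
    """Chi-square density with k degrees of freedom."""
    if x <= 0:
        return 0.0
    half_k = k / 2.0
    return x ** (half_k - 1) * math.exp(-x / 2.0) / (2.0 ** half_k * math.gamma(half_k))


def gamma_pdf(x, alpha, beta):
    """Gamma density in the shape-rate parametrization."""
    if x <= 0:
        return 0.0
    return beta ** alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)


def sum_of_squared_normals(k, rng):
    """One draw of Q = Z_1^2 + ... + Z_k^2 with Z_i standard normal."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


k = 3
rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sum_of_squared_normals(k, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)

print(chi2_pdf(2.5, k), gamma_pdf(2.5, k / 2.0, 0.5))  # same density
print(sample_mean)                                     # close to k
```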

## Student-t distribution(continuous)

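• Definition

Let $X_1, \cdots, X_n$ be independent and identically distributed as $N(\mu, \sigma^2)$, i.e. a sample of size $n$ from a normally distributed population with mean $\mu$ and variance $\sigma^{2}$.

Let

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

be the sample mean and let

$$S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$

be the (Bessel-corrected) sample variance.

Then the random variable

$$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

has a standard normal distribution, and the random variable

$$\frac{\bar{X} - \mu}{S / \sqrt{n}}$$

(with $S$ in place of $\sigma$) has a Student’s t-distribution with $n - 1$ degrees of freedom.

Student’s t-distribution has the probability density function given by

$$f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu \pi} \, \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$$

• $\nu$ is the number of degrees of freedom
• $\Gamma$ is the gamma function.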
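As a rough sketch of how these pieces fit into a one-sample t-test, the code below computes the t statistic from a sample using the Bessel-corrected variance, evaluates the Student's t density, and approximates a two-sided p-value by integrating the density numerically with Simpson's rule. The data and the null mean are invented for illustration and are not the values behind the result reported in this section; a statistics library would normally supply the t cumulative distribution function directly.

```python
import math


def t_statistic(sample, mu0):
    """Return (t, degrees of freedom) for testing mean == mu0."""
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)  # Bessel-corrected variance
    return (mean - mu0) / math.sqrt(s2 / n), n - 1


def t_pdf(t, nu):
    """Student's t density with nu degrees of freedom."""
    coeff = math.gamma((nu + 1) / 2.0) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2.0))
    return coeff * (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)


def two_sided_p_value(t, nu, steps=2000):
    """P(|T| >= |t|), from a Simpson's-rule integral of the density over [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(upper, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    central_mass = 2.0 * total * h / 3.0  # the density is symmetric around 0
    return max(0.0, 1.0 - central_mass)


sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5]  # hypothetical measurements
t, nu = t_statistic(sample, mu0=5.0)     # hypothetical null mean
print(t, nu, two_sided_p_value(t, nu))
```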

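So the p-value is about 0.009 (computed value: 0.0086). If the null hypothesis were true, a result at least this extreme would occur only about 0.9% of the time, so the data are strong evidence against the null hypothesis. The p-value is not the probability that the null hypothesis is true.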

