
    import numpy as np
Uniform distribution(continuous)
- The uniform distribution assigns the same probability density to every point in [a, b], which makes it the simplest continuous distribution.
    def uniform(x, a, b):

    s = np.random.uniform(0, 1, 1000)
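The body of `uniform` is not shown above, so the following is only a minimal sketch of what such a density function could look like. The name `uniform_pdf` is hypothetical, and it uses the standard library instead of NumPy to stay self-contained.

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    if b <= a:
        raise ValueError("require a < b")
    # Constant height 1 / (b - a) inside the interval, zero outside it.
    return 1.0 / (b - a) if a <= x <= b else 0.0

print(uniform_pdf(0.5, 0.0, 2.0))  # 0.5
print(uniform_pdf(3.0, 0.0, 2.0))  # 0.0
```

The height is fixed by the requirement that the total area over [a, b] equals 1.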
Bernoulli distribution(discrete)
- Maximum likelihood estimation with the Bernoulli distribution does not take a prior probability P(X) into account. Therefore, if we optimize only for maximum likelihood, we are vulnerable to overfitting.
- Binary cross entropy is the usual loss for binary classification. It has the same form as the negative log of the Bernoulli distribution.
- This is the loss used in logistic regression.
    def bernoulli(p, k):
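To make the link between the Bernoulli distribution and binary cross entropy concrete, here is a small sketch. The function names and argument order are assumptions for illustration and are not taken from the truncated `bernoulli` definition above.

```python
import math

def bernoulli_pmf(k, p):
    """P(X = k) for k in {0, 1} with success probability p."""
    return p ** k * (1 - p) ** (1 - k)

def binary_cross_entropy(y, p):
    """Binary cross entropy for one label y in {0, 1} and predicted probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# For either label, the loss equals the negative log-likelihood under Bernoulli(p).
for y in (0, 1):
    print(binary_cross_entropy(y, 0.8), -math.log(bernoulli_pmf(y, 0.8)))
```

Minimizing this loss over a dataset is therefore the same as maximizing the Bernoulli likelihood of the labels.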
Binomial distribution(discrete)
- Binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments.
- The number of trials n is specified in advance, so the binomial distribution can be seen as incorporating prior information about how many draws are made.
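The probability of getting exactly $k$ successes in $n$ trials is given by the probability mass function

$$f(k; n, p) = \Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

for $k = 0, 1, 2, \ldots, n$, where

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

is the binomial coefficient.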
    def const(n, r):

    s = np.random.binomial(10, 0.8, 1000)
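As a rough illustration of the binomial probability mass function, the sketch below evaluates it for n = 10 and p = 0.8, the same parameters as the sampling call above. `binomial_pmf` is a hypothetical name, and `math.comb` stands in for whatever the truncated `const` helper computes.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.8
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
total = sum(pmf)                              # close to 1
mean = sum(k * q for k, q in enumerate(pmf))  # close to n * p = 8
print(total, mean)
```

A histogram of the samples drawn with `np.random.binomial(10, 0.8, 1000)` should roughly follow these probabilities.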
Multi-Bernoulli distribution, Categorical distribution(discrete)
- The multi-Bernoulli distribution, also called the categorical distribution, generalizes the Bernoulli distribution to more than two outcomes.
- Cross entropy has the same form as the negative log of the multi-Bernoulli distribution.
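The probability mass function can be written with the Iverson bracket as

$$f(x \mid \boldsymbol{p}) = \prod_{i=1}^{k} p_i^{[x = i]},$$

where $[x = i]$ evaluates to 1 if $x = i$, 0 otherwise. This formulation has several advantages; for example, the likelihood of a set of independent categorical observations becomes a simple product of powers of the $p_i$.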
    def categorical(p, k):
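The sketch below spells out the Iverson-bracket form of the categorical probability mass function and checks that cross entropy against a one-hot target is its negative log. Both function names and the 0-based category index are assumptions made for this example.

```python
import math

def categorical_pmf(x, probs):
    """P(X = x) for a 0-based category index x, written as prod_i p_i ** [x == i]."""
    result = 1.0
    for i, p_i in enumerate(probs):
        result *= p_i ** (1 if x == i else 0)
    return result

def cross_entropy(one_hot, probs):
    """Cross entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(q) for t, q in zip(one_hot, probs))

probs = [0.2, 0.5, 0.3]
print(categorical_pmf(1, probs))                   # 0.5
print(cross_entropy([0, 1, 0], probs))             # same value as the next line
print(-math.log(categorical_pmf(1, probs)))
```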
Multinomial distribution(discrete)
- The multinomial distribution relates to the categorical distribution in the same way that the binomial distribution relates to the Bernoulli distribution.
- For example, it models the probability of counts for each side of a k-sided die rolled n times. For n independent trials each of which leads to a success for exactly one of k categories, with each category having a given fixed success probability, the multinomial distribution gives the probability of any particular combination of numbers of successes for the various categories.
- When k is 2 and n is 1, the multinomial distribution is the Bernoulli distribution. When k is 2 and n is bigger than 1, it is the binomial distribution. When k is bigger than 2 and n is 1, it is the categorical distribution.
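The probability mass function is

$$f(x_1, \ldots, x_k; n, p_1, \ldots, p_k) = \frac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}, \qquad \sum_{i=1}^{k} x_i = n,$$

for non-negative integers $x_1, \cdots, x_k$.
The probability mass function can be expressed using the gamma function as:

$$f(x_1, \ldots, x_k; p_1, \ldots, p_k) = \frac{\Gamma\left(\sum_i x_i + 1\right)}{\prod_i \Gamma(x_i + 1)} \prod_{i=1}^{k} p_i^{x_i}.$$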
    def factorial(n):

    np.random.multinomial(20, [1/6.]*6, size=10)
array([[1, 1, 2, 3, 8, 5],
[2, 4, 3, 3, 6, 2],
[1, 6, 3, 2, 3, 5],
[5, 3, 4, 4, 2, 2],
[3, 8, 4, 2, 0, 3],
[2, 4, 1, 5, 1, 7],
[6, 3, 2, 4, 3, 2],
[8, 2, 1, 1, 4, 4],
[3, 6, 4, 1, 4, 2],
[3, 2, 3, 3, 6, 3]])
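Each row of the array above is one draw of face counts for 20 rolls of a fair die. The probability of any single row can be evaluated with a multinomial probability mass function. This is a sketch only: `multinomial_pmf` is a hypothetical helper, and it relies on `math.factorial` instead of the truncated `factorial` definition above.

```python
import math

def multinomial_pmf(counts, probs):
    """Probability of observing the given per-category counts in sum(counts) trials."""
    n = sum(counts)
    coeff = math.factorial(n)
    for x in counts:
        coeff //= math.factorial(x)  # exact integer multinomial coefficient
    prob = float(coeff)
    for x, p in zip(counts, probs):
        prob *= p ** x
    return prob

fair_die = [1 / 6.0] * 6
print(multinomial_pmf([1, 1, 2, 3, 8, 5], fair_die))  # probability of the first row
# With k = 2 categories the value matches the binomial pmf.
print(multinomial_pmf([3, 7], [0.8, 0.2]), math.comb(10, 3) * 0.8 ** 3 * 0.2 ** 7)
```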
Beta distribution(continuous)
- Beta distribution is conjugate to the binomial and Bernoulli distributions.
- Using conjugacy, we can obtain the posterior distribution more easily from a prior distribution we already know.
- The uniform distribution is the special case of the beta distribution with alpha=1, beta=1.
    def gamma_function(n):

    s = np.random.beta(2, 5, size=1000)
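The points above can be sketched in a few lines: a beta density normalized with gamma functions, the uniform special case, and the conjugate update for Bernoulli or binomial data. The names `beta_pdf` and `beta_posterior` are hypothetical, and `math.gamma` is used in place of the truncated `gamma_function`.

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density for 0 < x < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

def beta_posterior(a, b, successes, failures):
    """Conjugate update: a Beta(a, b) prior with binomial data gives a Beta posterior."""
    return a + successes, b + failures

print(beta_pdf(0.3, 1, 1))          # 1.0, the uniform density on (0, 1)
print(beta_pdf(0.5, 2, 5))          # density of the Beta(2, 5) sampled above
print(beta_posterior(2, 5, 7, 3))   # (9, 8)
```

The update is just addition of counts, which is what makes the conjugate prior convenient.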
Gamma distribution(continuous)
If $X \sim Gamma(a,1)$ and $Y \sim Gamma(b,1)$ are independent, then $\frac{X}{X + Y}$ follows $Beta(a,b)$, which is how the gamma distribution gives rise to the beta distribution.
The exponential distribution and chi-squared distribution are special cases of the gamma distribution.
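A random variable X that is gamma-distributed with shape α and rate β is denoted:

$$X \sim \Gamma(\alpha, \beta) \equiv \operatorname{Gamma}(\alpha, \beta)$$

The corresponding probability density function in the shape-rate parametrization is:

$$f(x; \alpha, \beta) = \frac{\beta^{\alpha} x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)} \quad \text{for } x > 0 \text{ and } \alpha, \beta > 0.$$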
    def gamma_function(n):

    a, b = 2., 2.
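As a hedged illustration of this section, the sketch below implements the shape-rate density and checks the gamma-to-beta relation by simulation. The function names are hypothetical. Note that the standard library's `random.gammavariate` takes a scale as its second argument, which equals the rate only when both are 1.

```python
import math
import random

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and rate beta; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return beta ** alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)

def beta_from_gammas(a, b, rng):
    """One Beta(a, b) draw built as X / (X + Y), X ~ Gamma(a, 1), Y ~ Gamma(b, 1)."""
    x = rng.gammavariate(a, 1.0)
    y = rng.gammavariate(b, 1.0)
    return x / (x + y)

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [beta_from_gammas(2.0, 3.0, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean)  # should be near a / (a + b) = 0.4
```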
Dirichlet distribution(continuous)
- The Dirichlet distribution is conjugate to the multinomial distribution: multiplying a Dirichlet prior by a multinomial likelihood gives a posterior that is again a Dirichlet distribution.
- If $K = 2$, it reduces to the Beta distribution.
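The Dirichlet distribution of order $K \ge 2$ with parameters $\alpha_1, \ldots, \alpha_K > 0$ has the probability density function

$$f(x_1, \ldots, x_K; \alpha_1, \ldots, \alpha_K) = \frac{1}{B(\boldsymbol{\alpha})} \prod_{k=1}^{K} x_k^{\alpha_k - 1},$$

where $\{x_k\}_{k=1}^{k=K}$ belong to the standard $K-1$ simplex, or in other words:

$$\sum_{k=1}^{K} x_k = 1 \quad \text{and} \quad x_k \ge 0 \text{ for all } k \in \{1, \ldots, K\}.$$

The normalizing constant is the multivariate beta function, which can be expressed in terms of the gamma function:

$$B(\boldsymbol{\alpha}) = \frac{\prod_{k=1}^{K} \Gamma(\alpha_k)}{\Gamma\left(\sum_{k=1}^{K} \alpha_k\right)}.$$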
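The Dirichlet distribution can be viewed as a distribution over distributions. To see what this means, consider an example. Suppose we have a six-sided die with faces {1,2,3,4,5,6}. We roll it 10000 times and observe the six faces {2000,2000,2000,2000,1000,1000} times respectively. If we estimate the probability of each face by the ratio of its count to the total number of rolls, we get the probabilities {0.2,0.2,0.2,0.2,0.1,0.1}. Now suppose we are not satisfied with this and want to run 10000 experiments, each consisting of 10000 rolls. We want to know how probable it is that the face probabilities come out as {0.2,0.2,0.2,0.2,0.1,0.1} (the next experiment might just as well give {0.1, 0.1, 0.2, 0.2, 0.2, 0.2}). We are then reasoning about a distribution over the die's face-probability distribution, and such a distribution is the Dirichlet distribution.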
    def normalization(x, s):

    s = np.random.dirichlet((0.2, 0.3, 0.5), 20).transpose()
    s.shape
(3, 20)
    plt.barh(range(20), s[0])
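For reference, a Dirichlet density can be evaluated with log-gamma functions as in the sketch below. `dirichlet_pdf` is a hypothetical name, and for simplicity it treats points on the boundary of the simplex as outside the support, which is an assumption of this example and not part of the definition.

```python
import math

def dirichlet_pdf(x, alpha):
    """Dirichlet density at a point x in the interior of the simplex."""
    if abs(sum(x) - 1.0) > 1e-9 or any(x_k <= 0 for x_k in x):
        return 0.0
    # Log of 1 / B(alpha), computed with lgamma for numerical stability.
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    log_kernel = sum((a - 1) * math.log(x_k) for a, x_k in zip(alpha, x))
    return math.exp(log_norm + log_kernel)

print(dirichlet_pdf([1 / 3, 1 / 3, 1 / 3], [1, 1, 1]))  # flat density on the simplex
print(dirichlet_pdf([0.5, 0.5], [2, 5]))                # equals the Beta(2, 5) density at 0.5
```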
Exponential distribution(continuous)
- The exponential distribution is the special case of the gamma distribution with alpha equal to 1.
    def exponential(x, lamb):

    s = np.random.exponential(scale=0.5, size=1000)
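A minimal sketch of the exponential density and distribution function in terms of the rate follows. The function names are hypothetical. NumPy's `scale` argument is the inverse of the rate, so `scale=0.5` above corresponds to a rate of 2.

```python
import math

def exponential_pdf(x, lam):
    """Exponential density with rate lam; zero for negative x."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exponential_cdf(x, lam):
    """P(X <= x) for an exponential variable with rate lam."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

lam = 2.0                        # rate = 1 / scale
median = math.log(2.0) / lam     # point where the CDF reaches 0.5
print(exponential_pdf(0.0, lam), exponential_cdf(median, lam))
```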
Gaussian distribution(continuous)
    def gaussian(x, n):

    mu, sigma = 0, 0.1 # mean and standard deviation
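The body of `gaussian` is not shown, so here is one possible sketch of a normal density, evaluated with the same `mu` and `sigma` as above. The signature `gaussian_pdf(x, mu, sigma)` is an assumption and differs from the truncated definition.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

mu, sigma = 0.0, 0.1
peak = gaussian_pdf(mu, mu, sigma)  # height at the mean: 1 / (sigma * sqrt(2 * pi))
print(peak)
print(gaussian_pdf(0.1, mu, sigma), gaussian_pdf(-0.1, mu, sigma))  # symmetric about mu
```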
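Poisson distribution(discrete)
- The number of events that occur in a fixed time interval, when events happen independently at a constant average rate, follows a Poisson distribution.
A discrete random variable $X$ has a Poisson distribution with parameter $\lambda > 0$ if its probability mass function is

$$f(k; \lambda) = \Pr(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!},$$

where
- k is the number of occurrences (k = 0, 1, 2, …)
- e is Euler’s number (e = 2.71828…)
- k! is the factorial of k.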
    s = np.random.poisson(5, 10000)
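As a small sketch, the Poisson probability mass function can be evaluated directly for the rate of 5 used in the sampling call above. `poisson_pmf` is a hypothetical name, and the sum is truncated at 60 terms, beyond which the remaining mass is negligible for this rate.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 5.0
pmf = [poisson_pmf(k, lam) for k in range(60)]
total = sum(pmf)                              # close to 1
mean = sum(k * q for k, q in enumerate(pmf))  # close to lam
print(total, mean)
```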
Chi-squared distribution(continuous)
- Chi-square distribution with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables.
- The chi-square distribution is a special case of the gamma distribution (shape $k/2$, rate $1/2$).
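If $Z_1, \cdots, Z_k$ are independent, standard normal random variables, then the sum of their squares,

$$Q = \sum_{i=1}^{k} Z_i^2,$$

is distributed according to the chi-square distribution with k degrees of freedom. This is usually denoted as

$$Q \sim \chi^2(k) \quad \text{or} \quad Q \sim \chi^2_k.$$

The chi-square distribution has one parameter: a positive integer k that specifies the number of degrees of freedom (the number of $Z_i$ s).
The probability density function (pdf) of the chi-square distribution is

$$f(x; k) = \begin{cases} \dfrac{x^{k/2 - 1} e^{-x/2}}{2^{k/2}\, \Gamma(k/2)}, & x > 0; \\ 0, & \text{otherwise}. \end{cases}$$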
    def gamma_function(n):

    s = np.random.chisquare(4, 10000)
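The sketch below evaluates a chi-square density for 4 degrees of freedom, matching the sampling call above, and compares it with a gamma density of shape k/2 and rate 1/2. Both function names are hypothetical, and `math.gamma` replaces the truncated `gamma_function`.

```python
import math

def chi2_pdf(x, k):
    """Chi-square density with k degrees of freedom; zero for x <= 0."""
    if x <= 0:
        return 0.0
    half_k = k / 2.0
    return x ** (half_k - 1) * math.exp(-x / 2.0) / (2 ** half_k * math.gamma(half_k))

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and rate beta; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return beta ** alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)

k = 4
print(chi2_pdf(3.0, k), gamma_pdf(3.0, k / 2.0, 0.5))  # the two values agree
```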
Student-t distribution(continuous)
- Definition
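Let $X_1, \cdots, X_n$ be independent and identically distributed as $N(\mu, \sigma^2)$, i.e. this is a sample of size $n$ from a normally distributed population with mean $\mu$ and variance $\sigma^{2}$.
Let

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

be the sample mean and let

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$

be the (Bessel-corrected) sample variance.
Then the random variable

$$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

has a standard normal distribution, and the random variable

$$\frac{\bar{X} - \mu}{S / \sqrt{n}}$$

(where $S$ has been substituted for $\sigma$) has a Student's t-distribution with $n - 1$ degrees of freedom.
Student’s t-distribution has the probability density function given by

$$f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu \pi}\, \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}},$$

where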
- $\nu$ is the number of degrees of freedom
- $\Gamma$ is the gamma function.
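A density for Student's t-distribution with $\nu$ degrees of freedom can be sketched with log-gamma functions as follows. `student_t_pdf` is a hypothetical name chosen for this example.

```python
import math

def student_t_pdf(t, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2.0)
                - math.lgamma(nu / 2.0)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm) * (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)

print(student_t_pdf(0.0, 1))    # nu = 1 is the Cauchy distribution: 1 / pi at zero
print(student_t_pdf(0.0, 10))   # peak height for 10 degrees of freedom
print(student_t_pdf(2.0, 10), student_t_pdf(-2.0, 10))  # symmetric about zero
```

As $\nu$ grows, the curve approaches the standard normal density.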
    def gamma_function(n):

    ## Suppose the daily energy intake for 11 women in kilojoules (kJ) is:
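So the p-value is about 0.009: if the null hypothesis were true, a t statistic at least this low would be observed in only about 0.9% of samples, so the null hypothesis is rejected at the 1% significance level. The p-value is not the probability that the null hypothesis is true.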
    np.sum(s<t) / float(len(s))
0.0086
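The data and the computation of `t` are not shown above, so the following sketch only illustrates how a one-sample t statistic can be computed from a sample and a hypothesized mean. The function name and the toy numbers are made up for illustration and are not the energy intake data.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic for testing whether the population mean equals mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # Bessel-corrected: divides by n - 1
    return (mean - mu0) / (s / math.sqrt(n))

toy_sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical values
t_stat = one_sample_t(toy_sample, 2.0)  # compared against a t-distribution with n - 1 = 4 d.o.f.
print(t_stat)
```

The fraction of draws from the matching t-distribution that fall below the statistic, as in the `np.sum(s<t) / float(len(s))` line, is then a Monte Carlo estimate of the one-sided p-value.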