
The difference between Lasso and Ridge regularization

Cost Function of Ridge

\[Cost = RSS(W) + \lambda ||W||^{2}_{2} = \sum_{i=1}^{N}(y_{i} - w_{0}h_{0}(x_{i})- w_{1}h_{1}(x_{i}))^{2} + \lambda (w_{0}^2 + w_{1}^{2}) \]

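  1. For a fixed cost value, the equation above describes an ellipse in the \((w_{0}, w_{1})\) plane whose center is not at the origin (its expansion contains a cross term).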

==>

\[\sum y^{2} + w_{0}^{2}\sum h_{0}^{2} + w_{1}^{2}\sum h_{1}^{2} + cross\; term = constant\]

==>

\[w_{0}^{2}\sum h_{0}^{2} + w_{1}^{2}\sum h_{1}^{2} + cross\; term = constant\]

Each ellipse is a contour of constant cost; an ellipse with a larger radius corresponds to a larger cost.
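For reference, expanding the squared residual spells out what "cross term" abbreviates. This is just the algebra of the RSS term defined above, with \(\sum\) running over \(i = 1, \dots, N\):

\[RSS(w) = \sum y_{i}^{2} + w_{0}^{2}\sum h_{0}(x_{i})^{2} + w_{1}^{2}\sum h_{1}(x_{i})^{2} - 2w_{0}\sum y_{i}h_{0}(x_{i}) - 2w_{1}\sum y_{i}h_{1}(x_{i}) + 2w_{0}w_{1}\sum h_{0}(x_{i})h_{1}(x_{i})\]

The terms linear in \(w_{0}\) and \(w_{1}\) move the center of the ellipse away from the origin, and the \(w_{0}w_{1}\) term tilts its axes.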

  1. As \(\lambda\) increases, the weights \(w_{0}\) and \(w_{1}\) that minimize the cost shrink toward 0.

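Looking at the RSS and the L2 penalty of the cost function separately

  1. The RSS contours are ellipses.
  2. The L2 penalty contours are circles centered at the origin: \[\lambda (w_{0}^2 + w_{1}^{2}) \]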

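The optimal solution of Ridge

The Ridge solution is the point that balances the RSS against the L2 penalty: geometrically, it is where an RSS ellipse touches an L2 circle.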
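The balance can be checked numerically. The sketch below is only an illustration: the tiny dataset `H`, `y` and the helper name `ridge_fit` are made up for this example, and it relies on the standard closed-form Ridge solution \((H^{T}H + \lambda I)^{-1}H^{T}y\) for the cost written above.

```python
import numpy as np

def ridge_fit(H, y, lam):
    """Minimize RSS(w) + lam * ||w||_2^2 using the closed-form solution."""
    n_features = H.shape[1]
    A = H.T @ H + lam * np.eye(n_features)
    return np.linalg.solve(A, H.T @ y)

# Hypothetical toy data: columns are h0(x_i) and h1(x_i); y = 3*h0 + 0.5*h1.
H = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
y = np.array([3.0, 3.5, 0.5])

lambdas = [0.0, 1.0, 10.0, 100.0]
ridge_path = [ridge_fit(H, y, lam) for lam in lambdas]

for lam, w in zip(lambdas, ridge_path):
    rss = np.sum((y - H @ w) ** 2)   # data-fit term
    l2 = np.sum(w ** 2)              # w0^2 + w1^2, before scaling by lambda
    print(f"lambda={lam:6.1f}  w={w}  RSS={rss:.4f}  ||w||^2={l2:.4f}")
```

As \(\lambda\) grows, the printed RSS goes up while \(w_{0}^{2} + w_{1}^{2}\) goes down, and both weights get smaller without becoming exactly 0.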

Cost Function of Lasso

\[Cost = RSS(W) + \lambda ||W||_{1} = \sum_{i=1}^{N}(y_{i} - w_{0}h_{0}(x_{i})- w_{1}h_{1}(x_{i}))^{2} + \lambda (|w_{0}| + |w_{1}|)\]

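  1. As with Ridge, for a fixed value the RSS part of the equation above describes an ellipse in the \((w_{0}, w_{1})\) plane whose center is not at the origin (its expansion contains a cross term).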

==>

\[\sum y^{2} + w_{0}^{2}\sum h_{0}^{2} + w_{1}^{2}\sum h_{1}^{2} + cross\; term = constant\]

==>

\[w_{0}^{2}\sum h_{0}^{2} + w_{1}^{2}\sum h_{1}^{2} + cross\; term = constant\]

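  1. As \(\lambda\) increases, the weights \(w_{0}\) and \(w_{1}\) that minimize the cost shrink toward 0. They do not reach 0 at the same time, however: some weights become exactly 0 before the others.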
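A small experiment makes this behavior visible. The sketch below minimizes the Lasso cost with cyclic coordinate descent and soft-thresholding, which is one common approach and not necessarily the one these notes have in mind. The dataset `H`, `y` and the names `soft_threshold` and `lasso_fit` are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(rho, t):
    """Shrink rho toward 0 by t, returning exactly 0 when |rho| <= t."""
    return np.sign(rho) * max(abs(rho) - t, 0.0)

def lasso_fit(H, y, lam, n_iters=200):
    """Minimize RSS(w) + lam * (|w0| + |w1| + ...) by cyclic coordinate descent."""
    n_features = H.shape[1]
    w = np.zeros(n_features)
    z = (H ** 2).sum(axis=0)  # sum of h_j(x_i)^2 for each feature j
    for _ in range(n_iters):
        for j in range(n_features):
            # Residual with feature j's current contribution added back.
            residual_without_j = y - H @ w + H[:, j] * w[j]
            rho = H[:, j] @ residual_without_j
            # Setting the subgradient of the cost in w_j to 0 gives this update.
            w[j] = soft_threshold(rho, lam / 2.0) / z[j]
    return w

# Hypothetical toy data: columns are h0(x_i) and h1(x_i); y = 3*h0 + 0.5*h1.
H = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
y = np.array([3.0, 3.5, 0.5])

lambdas = [0.0, 1.0, 6.0, 20.0]
lasso_path = [lasso_fit(H, y, lam) for lam in lambdas]

for lam, w in zip(lambdas, lasso_path):
    print(f"lambda={lam:5.1f}  w0={w[0]:.4f}  w1={w[1]:.4f}")
```

On this toy data the smaller weight \(w_{1}\) is already exactly 0 at \(\lambda = 6\) while \(w_{0}\) is still positive; only at the largest \(\lambda\) in the list are both weights 0.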

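Looking at the RSS and the L1 penalty of the cost function separately

  1. The RSS contours are ellipses (the same as for Ridge).

  2. The L1 penalty contours are diamonds with corners on the axes: \[\lambda (|w_{0}| + |w_{1}|)\]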
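The difference between the two penalty shapes can be seen by evaluating both penalties on the same set of points. In the sketch below, the points lie on a quarter of the unit circle; the variable names are chosen only for this illustration.

```python
import numpy as np

# Points on the unit circle in the first quadrant, from the w0 axis to the w1 axis.
angles = np.linspace(0.0, np.pi / 2, 7)
points = np.column_stack([np.cos(angles), np.sin(angles)])

l2_penalty = (points ** 2).sum(axis=1)    # w0^2 + w1^2
l1_penalty = np.abs(points).sum(axis=1)   # |w0| + |w1|

for (w0, w1), l2, l1 in zip(points, l2_penalty, l1_penalty):
    print(f"w0={w0:.3f}  w1={w1:.3f}  L2={l2:.3f}  L1={l1:.3f}")
```

The L2 penalty is the same everywhere on the circle, so it has no preferred direction. The L1 penalty is smallest at the two ends of the arc, where one weight is 0, and largest on the diagonal. These are the corners of the diamond, and they are why Lasso tends to land on solutions in which some weights are exactly 0.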

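The optimal solution of Lasso

The Lasso solution is the point that balances the RSS against the L1 penalty: geometrically, it is where an RSS ellipse touches an L1 diamond, which often happens at a corner.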