## Goal

Minimize some function g, Often, hard to find minimum for all coordinates, but easy for each coordinate.

Coordinate descent:

Initialize $\hat{w} = 0$ (or smartly…)
——while not converged
————-for j =0,1,…,D(pick a coordinate j)
———————-$\hat{w_{j}} <- min_{w}g(w_{0},…w_{D})$

## Optimizing least squares objective one coordinate at a time

aside: $h_{j}(xi)$ represent normalized feature

Fix all coordinates $w_{-j}$(all feature except $w_{j}$) and take partial w.r.t $w_{j}$

= $-2\sum^{N}_{i=1}h_{j}(x_{i})(y_{i} - \sum_{k \neq j}w_{k}h_{k}(x_{i}) - w_{j}h_{j}(x_{i}))$
= $-2\sum^{N}_{i=1}h_{j}(x_{i})(y_{i} - \sum_{k \neq j}w_{k}h_{k}(x_{i})) + 2w_{j}\sum^{N}_{i=1}h_{j}(x_{i})^{2}$

by definition, $\sum^{N}_{i=1}h_{j}(x_{i})^{2}$ = 1, \qquad $-2\sum^{N}_{i=1}h_{j}(x_{i})(y_{i} - \sum_{k \neq j}w_{k}h_{k}(x_{i})) = p_{j}$
we get

$$-2p_{j} + 2w_{j}$$


Set partial = 0 and solve:

$$\frac{\partial}{\partial w_{j}}RSS(w) = -p_{j} + w_{j}$$


==> $p_{j} = w_{j}$

## Coordinate descent for least squares regression

Initialize $\hat{w} = 0$ (or smartly…)
——while not converged
————-for j =0,1,…,D(pick a coordinate j)
———————-compute: $-2\sum^{N}_{i=1}h_{j}(x_{i})(y_{i} - \sum_{k \neq j}w_{k}h_{k}(x_{i}))$
———————-set: 