# Advanced Features

## Statistics and distance-based features

### Groupby features

```python
# df is a pandas DataFrame with user_id, page_id, ad_price columns.
# Named aggregation replaces the nested-dict renaming syntax,
# which was removed in modern pandas.
gb = df.groupby(['user_id', 'page_id'], as_index=False).agg(
    max_price=('ad_price', 'max'),
    min_price=('ad_price', 'min'),
)
```

- Number of pages a user visited
- Standard deviation of prices
- Most visited page
- Many, many more (a few are sketched below)
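A minimal sketch of the first three, assuming the same `df` with `user_id`, `page_id`, and `ad_price` columns as above:

```python
# Per-user statistics over the hypothetical columns from the groupby example.
user_stats = df.groupby('user_id').agg(
    n_pages=('page_id', 'nunique'),  # how many distinct pages the user visited
    price_std=('ad_price', 'std'),   # standard deviation of prices the user saw
)
# Most visited page per user: the mode of page_id.
most_visited = df.groupby('user_id')['page_id'].agg(lambda s: s.mode().iloc[0])
```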

### Neighbors

- No explicit group is needed
- More flexible
- Much harder to implement

Examples (a radius-count sketch follows the list):

- Number of houses within 500 m, 1000 m, ...
- Average price per square meter within 500 m, 1000 m, ...
- Number of schools/supermarkets/parking lots within 500 m, 1000 m, ...
- Distance to the closest subway station
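A minimal sketch of the first feature, assuming a hypothetical `coords` array holding each house's latitude and longitude in degrees:

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_M = 6_371_000
# The haversine metric expects (latitude, longitude) in radians.
tree = BallTree(np.radians(coords), metric='haversine')
# query_radius takes the radius in radians, so divide meters by Earth's radius.
n_houses_500m = tree.query_radius(
    np.radians(coords), r=500 / EARTH_RADIUS_M, count_only=True
) - 1  # subtract 1 to exclude the house itself
```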

KNN features as an example (a sketch follows the list):

- Mean encode all the variables
- For every point, find its 2000 nearest neighbors using the Bray-Curtis metric
- Calculate various features from those 2000 neighbors:
  - Mean target of the nearest 5, 10, 15, 500, 2000 neighbors
  - Mean distance to the 10 closest neighbors
  - Mean distance to the 10 closest neighbors with target 1
  - Mean distance to the 10 closest neighbors with target 0
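A minimal sketch of the first two neighbor features, assuming a mean-encoded feature matrix `X_enc` and a binary target `y` (in a real pipeline these would be computed out-of-fold to avoid leakage):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

y = np.asarray(y)  # ensure fancy indexing works below
nn = NearestNeighbors(n_neighbors=2000, metric='braycurtis').fit(X_enc)
# On the training set, each point's first neighbor is the point itself.
dist, idx = nn.kneighbors(X_enc)

knn_feats = {}
for k in (5, 10, 15, 500, 2000):
    # Mean target of the k nearest neighbors.
    knn_feats[f'mean_target_{k}nn'] = y[idx[:, :k]].mean(axis=1)
# Mean distance to the 10 closest neighbors.
knn_feats['mean_dist_10nn'] = dist[:, :10].mean(axis=1)
```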

## Matrix Factorizations for Feature Extraction

- Matrix Factorization is a very general approach for dimensionality reduction and feature extraction
- It can be applied to transform categorical features into real-valued ones
- Many of the tricks suitable for linear models are also useful for MF

- Can be applied to only some of the columns
- Can provide additional diversity, which is good for ensembles

- It is a lossy transformation; its efficiency depends on:
  - The particular task
  - The number of latent factors (usually 5-100)

- Several MF methods can be found in sklearn (a sketch follows):
  - SVD and PCA: standard tools for Matrix Factorization
  - TruncatedSVD: works with sparse matrices
  - Non-negative Matrix Factorization (NMF): ensures that all latent factors are non-negative; good for counts-like data
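A minimal sketch of both sklearn methods, assuming a hypothetical sparse count matrix `X_sparse` (e.g., bag-of-words counts); fitting on train and test stacked together keeps both in the same latent space:

```python
from sklearn.decomposition import NMF, TruncatedSVD

svd = TruncatedSVD(n_components=30, random_state=0)  # 5-100 factors is typical
X_svd = svd.fit_transform(X_sparse)                  # works directly on sparse input

nmf = NMF(n_components=30, init='nndsvda', random_state=0)
X_nmf = nmf.fit_transform(X_sparse)  # non-negative factors, good for count data
```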

## Feature interactions

- We have a lot of possible interactions: N*N for N features
- Even more if we use several types of interactions

- We need to reduce their number:
  - Dimensionality reduction
  - Feature selection

- Interaction order:
  - We looked at 2nd- and higher-order interactions.
  - It is hard to do generation and selection automatically.
  - Manual building of high-order interactions is something of an art.

### Frequent operations for feature interactions

- Multiplication
- Sum
- Diff
- Division
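A minimal sketch applying all four operations pairwise, assuming hypothetical numeric columns `f1`, `f2`, `f3` in `df`:

```python
import itertools
import pandas as pd

num_cols = ['f1', 'f2', 'f3']  # assumed numeric feature names
inter = pd.DataFrame(index=df.index)
for a, b in itertools.combinations(num_cols, 2):
    inter[f'{a}_mul_{b}'] = df[a] * df[b]   # multiplication
    inter[f'{a}_sum_{b}'] = df[a] + df[b]   # sum
    inter[f'{a}_diff_{b}'] = df[a] - df[b]  # diff
    inter[f'{a}_div_{b}'] = df[a] / df[b]   # division
```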

### Example of interaction generation pipeline
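One common pipeline, sketched here under assumed names (`inter` from the previous sketch, target `y`): generate candidate interactions, fit a tree ensemble, and keep only the interactions with the highest feature importance.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(inter, y)
importances = pd.Series(rf.feature_importances_, index=inter.columns)
top = importances.nlargest(20).index  # keep the 20 strongest interactions
X_interactions = inter[top]
```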

### Extract features from DT

`tree.apply()` returns the index of the leaf that each sample is predicted in. Since each leaf encodes the conjunction of all splits on its root-to-leaf path, these leaf indices are a way to get high-order features.

```python
# Index of the leaf each sample of X falls into (sklearn decision trees).
leaf_ids = tree.apply(X)
```
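A minimal sketch extending this to a tree ensemble, assuming `X_train` and `y_train`; one-hot encoding the leaf indices follows the pattern documented in sklearn:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import OneHotEncoder

gbm = GradientBoostingClassifier(n_estimators=100).fit(X_train, y_train)
leaves = gbm.apply(X_train)[:, :, 0]  # (n_samples, n_trees) leaf indices
leaf_feats = OneHotEncoder().fit_transform(leaves)  # sparse high-order features
```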

## tSNE

- Results heavily depend on hyperparameters (perplexity)
- Good practice is to use several projections with different perplexities (5-100)
- Due to its stochastic nature, tSNE provides different projections even for the same data and hyperparameters

- Train and test should be projected together
- tSNE runs for a long time with a big number of features, so it is common to do dimensionality reduction before the projection
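A minimal sketch of these points, assuming feature matrices `X_train` and `X_test`:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X_all = np.vstack([X_train, X_test])               # project train and test together
X_pca = PCA(n_components=50).fit_transform(X_all)  # reduce dimensionality first
projections = {
    p: TSNE(n_components=2, perplexity=p, random_state=0).fit_transform(X_pca)
    for p in (5, 30, 100)                          # several perplexities
}
X_train_2d = projections[30][:len(X_train)]        # split back into the train part
```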