## Statistics and distance-based features

### Groupby features

• How many pages a user visited
• Standard deviation of prices
• Most visited page
• Many, many more
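
A minimal pandas sketch of such groupby aggregations; the DataFrame and its columns (`user_id`, `page`, `price`) are hypothetical placeholders:

```python
# Sketch: groupby statistics as features; df is a made-up example.
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "page": ["a", "b", "a", "a", "c"],
    "price": [10.0, 12.0, 9.0, 11.0, 30.0],
})

gb = df.groupby("user_id")
user_feats = pd.DataFrame({
    "n_pages_visited": gb["page"].count(),                  # pages visited
    "price_std": gb["price"].std(),                         # std of prices
    "most_visited": gb["page"].agg(lambda s: s.mode()[0]),  # most visited page
})

# Merge the aggregates back onto the original rows
df = df.merge(user_feats, left_on="user_id", right_index=True)
```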

### Neighbors

• No explicit group is needed
• More flexible
• Much harder to implement

Examples of such neighbor-based features (a code sketch follows the list):

• Number of houses within 500m, 1000m, ...
• Average price per square meter within 500m, 1000m, ...
• Number of schools/supermarkets/parking lots within 500m, 1000m, ...
• Distance to the closest subway station
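
Below is a minimal sketch of the radius-based counts and the closest-point distance, assuming points come as latitude/longitude pairs; it uses sklearn's `BallTree` with the haversine metric, and all coordinates and names here are made-up placeholders:

```python
# Sketch: radius-based neighbor counts and nearest-point distance.
# Assumes (lat, lon) in degrees; haversine expects radians.
import numpy as np
from sklearn.neighbors import BallTree

coords = np.radians([[55.75, 37.62],   # hypothetical house coordinates
                     [55.76, 37.63],
                     [55.70, 37.50]])

EARTH_RADIUS_M = 6371000.0
tree = BallTree(coords, metric="haversine")

for radius_m in (500, 1000):
    # query_radius with count_only=True returns neighbor counts per point
    counts = tree.query_radius(coords, r=radius_m / EARTH_RADIUS_M,
                               count_only=True)
    print(f"points within {radius_m} m:", counts - 1)  # exclude the point itself

# Distance to the closest other point; for "closest subway station",
# build the tree on station coordinates instead
dist, _ = tree.query(coords, k=2)        # k=2: the first hit is the point itself
closest_m = dist[:, 1] * EARTH_RADIUS_M
```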

Example: KNN features (sketched in code after the list):

• Mean-encode all the variables
• For every point, find its 2000 nearest neighbors using the Bray-Curtis metric
• Calculate various features from those 2000 neighbors, for example:
− Mean target of the nearest 5, 10, 15, 500, 2000 neighbors
− Mean distance to the 10 closest neighbors
− Mean distance to the 10 closest neighbors with target 1
− Mean distance to the 10 closest neighbors with target 0
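
A rough sketch of these KNN features; `X` stands in for the mean-encoded variables and `y` for a binary target, both randomly generated here for illustration:

```python
# Sketch of KNN features; X and y are random stand-ins.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(0)
X = rng.rand(3000, 10)               # mean-encoded variables (placeholder)
y = rng.randint(0, 2, 3000)          # binary target (placeholder)

K = 2000
nn = NearestNeighbors(n_neighbors=K + 1, metric="braycurtis",
                      algorithm="brute").fit(X)
dist, idx = nn.kneighbors(X)
dist, idx = dist[:, 1:], idx[:, 1:]  # drop each point itself

feats = {}
for k in (5, 10, 15, 500, 2000):
    feats[f"mean_target_{k}nn"] = y[idx[:, :k]].mean(axis=1)

feats["mean_dist_10nn"] = dist[:, :10].mean(axis=1)

for t in (0, 1):
    # mean distance to the 10 closest neighbors whose target equals t;
    # neighbors are already sorted by distance
    mask = (y[idx] == t)
    feats[f"mean_dist_10nn_target{t}"] = np.array(
        [d[m][:10].mean() for d, m in zip(dist, mask)])
```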

## Matrix Factorizations for Feature Extraction

• Matrix Factorization is a very general approach to dimensionality reduction and feature extraction
• It can be applied to transform categorical features into real-valued ones
• Many of the tricks suitable for linear models are also useful for MF

• Can be applied to only some columns
− Good for ensembles
• It is a lossy transformation; its efficiency depends on:
− The number of latent factors (usually 5-100)
• Several MF methods are available in sklearn
• SVD and PCA
− Standard tools for Matrix Factorization
• TruncatedSVD
− Works with sparse matrices
• Non-negative Matrix Factorization (NMF)
− Ensures that all latent factors are non-negative
− Good for count-like data
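
A minimal sketch of `TruncatedSVD` and `NMF` used as feature extractors; the sparse matrices here are random placeholders, and stacking train and test before fitting is a common competition practice rather than a strict requirement:

```python
# Sketch: TruncatedSVD and NMF as feature extractors on sparse data.
import scipy.sparse as sp
from sklearn.decomposition import NMF, TruncatedSVD

# Random non-negative sparse matrices as stand-ins for count features
X_train = sp.random(100, 50, density=0.1, random_state=0, format="csr")
X_test = sp.random(40, 50, density=0.1, random_state=1, format="csr")

X_all = sp.vstack([X_train, X_test])   # fit train and test together

svd = TruncatedSVD(n_components=5, random_state=0)   # handles sparse input
X_all_svd = svd.fit_transform(X_all)
train_svd, test_svd = X_all_svd[:100], X_all_svd[100:]

nmf = NMF(n_components=5, init="nndsvda", random_state=0)
X_all_nmf = nmf.fit_transform(X_all)   # non-negative factors, suits count data
```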

## Feature interactions

• We have a lot of possible interactions − N×N for N features
• Even more if we use several types of interactions
• We need to reduce their number via:
− Dimensionality reduction
− Feature selection
• Interaction order:
− We looked at 2nd- and higher-order interactions
− It is hard to do generation and selection of interactions automatically
− Manually building high-order interactions is something of an art

### Frequent operations for feature interactions

• Multiplication
• Sum
• Diff
• Division
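
A short sketch of these four operations for one hypothetical pair of numeric columns `f1` and `f2`:

```python
# Sketch: pairwise interactions via the four frequent operations.
import pandas as pd

df = pd.DataFrame({"f1": [1.0, 2.0, 3.0], "f2": [4.0, 0.5, 2.0]})

df["f1_mul_f2"] = df["f1"] * df["f2"]
df["f1_sum_f2"] = df["f1"] + df["f2"]
df["f1_diff_f2"] = df["f1"] - df["f2"]
df["f1_div_f2"] = df["f1"] / df["f2"]   # watch out for division by zero
```

For exhaustive second-order products, sklearn's `PolynomialFeatures(degree=2, interaction_only=True)` generates all pairwise products automatically.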

### Extracting features from decision trees

Get the index of the leaf that each sample falls into. Since each leaf corresponds to a conjunction of split conditions, this is a way to obtain high-order feature interactions.
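
A minimal sketch using sklearn's `apply` method, which trees and tree ensembles expose for exactly this purpose; the dataset here is synthetic:

```python
# Sketch: leaf indices from a tree ensemble as high-order features.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=200, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=10, random_state=0).fit(X, y)
leaves = gbm.apply(X)[:, :, 0]    # (n_samples, n_trees) leaf indices

# Leaf indices are categorical, so one-hot encode them before
# feeding them to, e.g., a linear model
leaf_feats = OneHotEncoder().fit_transform(leaves)
```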

## tSNE

• Result heavily depends on hyperparameters (perplexity)
• Good practice is to use several projections with different perplexities (5-100)
• Due to its stochastic nature, tSNE produces different projections even for the same data and hyperparameters
− Train and test should be projected together
• tSNE runs for a long time when the number of features is big
− It is common to reduce dimensionality (e.g. with PCA) before projecting
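
A sketch of this workflow: stack train and test, reduce dimensionality with PCA first, then project with several perplexities (the data here is random):

```python
# Sketch: tSNE projections over several perplexities, train+test together.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X_train = np.random.rand(500, 300)   # placeholder data
X_test = np.random.rand(100, 300)

X_all = np.vstack([X_train, X_test])
X_all = PCA(n_components=50).fit_transform(X_all)   # speeds up tSNE a lot

projections = {}
for perplexity in (5, 30, 100):
    emb = TSNE(n_components=2, perplexity=perplexity,
               random_state=0).fit_transform(X_all)
    # split the joint projection back into train and test parts
    projections[perplexity] = (emb[:len(X_train)], emb[len(X_train):])
```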