Advanced Features

Statistics and distance-based features

Groupby feature

# named aggregation sets the output column names directly and avoids
# the deprecated dict-of-dict agg syntax (assumes df is already loaded)
gb = df.groupby(['user_id', 'page_id'], as_index=False).agg(
    min_price=('ad_price', 'min'),
    max_price=('ad_price', 'max'),
)

  • How many pages user visited
  • Standard deviation of prices
  • Most visited page
  • Many, many more
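The extra statistics listed above can be produced with the same named-aggregation syntax. A minimal sketch on a hypothetical click log (the column names follow the groupby example; the data is toy):

```python
import pandas as pd

# toy click log; columns match the groupby example above
df = pd.DataFrame({
    'user_id':  [1, 1, 1, 2, 2],
    'page_id':  [10, 10, 20, 30, 30],
    'ad_price': [1.0, 3.0, 2.0, 5.0, 5.0],
})

stats = df.groupby('user_id').agg(
    n_pages=('page_id', 'nunique'),                             # how many pages user visited
    price_std=('ad_price', 'std'),                              # standard deviation of prices
    top_page=('page_id', lambda s: s.value_counts().idxmax()),  # most visited page
)
```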


Neighbor-based features:

  • An explicit group is not needed
  • More flexible
  • Much harder to implement

Examples of such features:

  • Number of houses within 500 m, 1000 m, ..
  • Average price per square meter within 500 m, 1000 m, ..
  • Number of schools/supermarkets/parking lots within 500 m, 1000 m, ..
  • Distance to closest subway station
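Radius counts and closest-object distances like these can be computed efficiently with a KD-tree. A sketch with hypothetical 2-D coordinates (house and station locations are made up):

```python
import numpy as np
from scipy.spatial import cKDTree

# hypothetical projected 2-D coordinates
houses = np.array([[0.0, 0.0], [1.0, 1.0]])
stations = np.array([[0.5, 0.0], [2.0, 2.0]])

tree = cKDTree(stations)
dist_to_station, _ = tree.query(houses, k=1)  # distance to closest station

# number of stations within radius 1.0 of each house
n_within = [len(idx) for idx in tree.query_ball_point(houses, r=1.0)]
```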

KNN features as an example:

  • Mean encode all the variables
  • For every point, find 2000 nearest neighbors using Bray-Curtis metric
  • Calculate various features from those 2000 neighbors
    • Mean target of nearest 5, 10, 15, 500, 2000 neighbors
    • Mean distance to 10 closest neighbors
    • Mean distance to 10 closest neighbors with target 1
    • Mean distance to 10 closest neighbors with target 0
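The steps above can be sketched with sklearn's `NearestNeighbors` and the Bray-Curtis metric; the random matrix below is a stand-in for the mean-encoded features, and K=10 is used instead of 2000 to keep the toy example small:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.random((100, 5))            # stand-in for mean-encoded features
y = rng.integers(0, 2, size=100)    # binary target

K = 10
nn = NearestNeighbors(n_neighbors=K + 1, metric='braycurtis').fit(X)
dist, idx = nn.kneighbors(X)
dist, idx = dist[:, 1:], idx[:, 1:]      # drop the self-match in column 0

mean_target_k = y[idx].mean(axis=1)      # mean target of K nearest neighbors
mean_dist_k = dist.mean(axis=1)          # mean distance to K closest neighbors
```

Per-class versions (mean distance to closest neighbors with target 0 or 1) follow the same pattern with a mask on `y[idx]`.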

Matrix Factorizations for Feature Extraction

  • Matrix Factorization is a very general approach for dimensionality reduction and feature extraction
  • It can be applied to transform categorical features into real-valued ones
  • Many of the tricks suitable for linear models are also useful for MF

  • Can be applied to only some of the columns
  • Can provide additional diversity
    − Good for ensembles
  • It is a lossy transformation. Its efficiency depends on:
    − Particular task
    − Number of latent factors (usually 5-100)
  • Several MF methods are available in sklearn:
  • SVD and PCA
    − Standard tools for Matrix Factorization
  • TruncatedSVD
    − Works with sparse matrices
  • Non-negative Matrix Factorization (NMF)
    − Ensures that all latent factors are non-negative
    − Good for counts-like data
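A minimal sketch of TruncatedSVD on a sparse matrix (the random matrix is a toy stand-in, e.g. for a bag-of-words or count matrix):

```python
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# toy sparse matrix: 100 samples, 50 features, 10% non-zeros
X = sparse_random(100, 50, density=0.1, random_state=0)

svd = TruncatedSVD(n_components=5, random_state=0)
X_latent = svd.fit_transform(X)   # dense (100, 5) latent-factor features
```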

Feature interactions

  • We have a lot of possible interactions − N*N for N features
    • Even more if we use several types of interactions
  • Need to reduce their number
    • Dimensionality reduction
    • Feature selection
  • Interactions' order
    • We looked at 2nd-order and higher-order interactions.
    • It is hard to do generation and selection automatically.
    • Manual building of high-order interactions is something of an art.

Frequent operations for feature interaction

  • Multiplication
  • Sum
  • Diff
  • Division
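With pandas these four operations are one-liners; a sketch on two toy numeric features (names `f1`/`f2` are made up):

```python
import pandas as pd

# two toy numeric features
df = pd.DataFrame({'f1': [1.0, 2.0, 4.0], 'f2': [2.0, 2.0, 8.0]})

df['f1_mul_f2'] = df['f1'] * df['f2']   # multiplication
df['f1_add_f2'] = df['f1'] + df['f2']   # sum
df['f1_sub_f2'] = df['f1'] - df['f2']   # diff
df['f1_div_f2'] = df['f1'] / df['f2']   # division
```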

Example of interaction generation pipeline

Extract features from DT

Get the index of the leaf that each sample ends up in; this is a way to obtain high-order features, since each leaf corresponds to a conjunction of several feature splits.
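In sklearn, tree ensembles expose this through the `apply` method. A sketch with a random forest on synthetic data (the forest stands in for whatever tree model is used):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# leaf index of every sample in every tree: shape (n_samples, n_estimators)
leaves = rf.apply(X)
```

The leaf indices are categorical, so they are typically one-hot encoded before being fed to a linear model.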



tSNE

  • Result heavily depends on hyperparameters (perplexity)
  • Good practice is to use several projections with different perplexities (5-100)
  • Due to its stochastic nature, tSNE provides different projections even for the same data and hyperparameters
    − Train and test should be projected together
  • tSNE runs for a long time with a large number of features
    − It is common to do dimensionality reduction before projection.
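Both practices − projecting train and test jointly, and reducing dimensionality first − can be sketched as follows (the random matrices are toy data; sizes and `perplexity` are arbitrary choices):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X_train = rng.random((60, 20))   # toy train features
X_test = rng.random((40, 20))    # toy test features

# project train and test together so they share one embedding space
X_all = np.vstack([X_train, X_test])

# reduce dimensionality first to speed up tSNE
X_red = PCA(n_components=10, random_state=0).fit_transform(X_all)

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_red)
emb_train, emb_test = emb[:60], emb[60:]
```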