Advanced Features

Statistics and distance-based features

Groupby feature

# named aggregation sets the output column names directly and avoids
# the deprecated dict-of-dict agg syntax (assumes df is already loaded)
gb = df.groupby(['user_id', 'page_id'], as_index=False).agg(
    min_price=('ad_price', 'min'),
    max_price=('ad_price', 'max'),
)

  • How many pages user visited
  • Standard deviation of prices
  • Most visited page
  • Many, many more
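The extra statistics listed above can be produced with the same named-aggregation syntax. A minimal sketch on a hypothetical click log (the column names follow the groupby example; the data is toy):

```python
import pandas as pd

# toy click log; columns match the groupby example above
df = pd.DataFrame({
    'user_id':  [1, 1, 1, 2, 2],
    'page_id':  [10, 10, 20, 30, 30],
    'ad_price': [1.0, 3.0, 2.0, 5.0, 5.0],
})

stats = df.groupby('user_id').agg(
    n_pages=('page_id', 'nunique'),                             # how many pages user visited
    price_std=('ad_price', 'std'),                              # standard deviation of prices
    top_page=('page_id', lambda s: s.value_counts().idxmax()),  # most visited page
)
```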


Neighbor-based features:

  • An explicit group is not needed
  • More flexible
  • Much harder to implement

Examples of such features:

  • Number of houses within 500 m, 1000 m, ..
  • Average price per square meter within 500 m, 1000 m, ..
  • Number of schools/supermarkets/parking lots within 500 m, 1000 m, ..
  • Distance to closest subway station
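Radius counts and closest-object distances like these can be computed efficiently with a KD-tree. A sketch with hypothetical 2-D coordinates (house and station locations are made up):

```python
import numpy as np
from scipy.spatial import cKDTree

# hypothetical projected 2-D coordinates
houses = np.array([[0.0, 0.0], [1.0, 1.0]])
stations = np.array([[0.5, 0.0], [2.0, 2.0]])

tree = cKDTree(stations)
dist_to_station, _ = tree.query(houses, k=1)  # distance to closest station

# number of stations within radius 1.0 of each house
n_within = [len(idx) for idx in tree.query_ball_point(houses, r=1.0)]
```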

KNN features as an example:

  • Mean encode all the variables
  • For every point, find 2000 nearest neighbors using Bray-Curtis metric
  • Calculate various features from those 2000 neighbors
    • Mean target of nearest 5, 10, 15, 500, 2000 neighbors
    • Mean distance to 10 closest neighbors
    • Mean distance to 10 closest neighbors with target 1
    • Mean distance to 10 closest neighbors with target 0
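The steps above can be sketched with sklearn's `NearestNeighbors` and the Bray-Curtis metric; the random matrix below is a stand-in for the mean-encoded features, and K=10 is used instead of 2000 to keep the toy example small:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.random((100, 5))            # stand-in for mean-encoded features
y = rng.integers(0, 2, size=100)    # binary target

K = 10
nn = NearestNeighbors(n_neighbors=K + 1, metric='braycurtis').fit(X)
dist, idx = nn.kneighbors(X)
dist, idx = dist[:, 1:], idx[:, 1:]      # drop the self-match in column 0

mean_target_k = y[idx].mean(axis=1)      # mean target of K nearest neighbors
mean_dist_k = dist.mean(axis=1)          # mean distance to K closest neighbors
```

Per-class versions (mean distance to closest neighbors with target 0 or 1) follow the same pattern with a mask on `y[idx]`.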

Matrix Factorizations for Feature Extraction

  • Matrix Factorization is a very general approach for dimensionality reduction and feature extraction
  • It can be applied to transform categorical features into real-valued ones
  • Many of the tricks suitable for linear models are also useful for MF

  • Can be applied to only some of the columns
  • Can provide additional diversity
    − Good for ensembles
  • It is a lossy transformation. Its efficiency depends on:
    − Particular task
    − Number of latent factors (usually 5-100)
  • Several MF methods are available in sklearn:
  • SVD and PCA
    − Standard tools for Matrix Factorization
  • TruncatedSVD
    − Works with sparse matrices
  • Non-negative Matrix Factorization (NMF)
    − Ensures that all latent factors are non-negative
    − Good for counts-like data
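A minimal sketch of TruncatedSVD on a sparse matrix (the random matrix is a toy stand-in, e.g. for a bag-of-words or count matrix):

```python
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# toy sparse matrix: 100 samples, 50 features, 10% non-zeros
X = sparse_random(100, 50, density=0.1, random_state=0)

svd = TruncatedSVD(n_components=5, random_state=0)
X_latent = svd.fit_transform(X)   # dense (100, 5) latent-factor features
```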

Feature interactions

  • We have a lot of possible interactions − N*N for N features
    • Even more if we use several types of interactions
  • Need to reduce their number
    • Dimensionality reduction
    • Feature selection
  • Interactions' order
    • We looked at 2nd-order and higher-order interactions.
    • It is hard to do generation and selection automatically.
    • Manual building of high-order interactions is something of an art.

Frequent operations for feature interaction

  • Multiplication
  • Sum
  • Diff
  • Division
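With pandas these four operations are one-liners; a sketch on two toy numeric features (names `f1`/`f2` are made up):

```python
import pandas as pd

# two toy numeric features
df = pd.DataFrame({'f1': [1.0, 2.0, 4.0], 'f2': [2.0, 2.0, 8.0]})

df['f1_mul_f2'] = df['f1'] * df['f2']   # multiplication
df['f1_add_f2'] = df['f1'] + df['f2']   # sum
df['f1_sub_f2'] = df['f1'] - df['f2']   # diff
df['f1_div_f2'] = df['f1'] / df['f2']   # division
```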

Example of interaction generation pipeline

Extract features from DT

Get the index of the leaf that each sample ends up in; this is a way to obtain high-order features, since each leaf corresponds to a conjunction of several feature splits.
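In sklearn, tree ensembles expose this through the `apply` method. A sketch with a random forest on synthetic data (the forest stands in for whatever tree model is used):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# leaf index of every sample in every tree: shape (n_samples, n_estimators)
leaves = rf.apply(X)
```

The leaf indices are categorical, so they are typically one-hot encoded before being fed to a linear model.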



tSNE

  • Result heavily depends on hyperparameters (perplexity)
  • Good practice is to use several projections with different perplexities (5-100)
  • Due to its stochastic nature, tSNE provides different projections even for the same data and hyperparameters
    − Train and test should be projected together
  • tSNE runs for a long time with a large number of features
    − It is common to do dimensionality reduction before projection.
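Both practices − projecting train and test jointly, and reducing dimensionality first − can be sketched as follows (the random matrices are toy data; sizes and `perplexity` are arbitrary choices):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X_train = rng.random((60, 20))   # toy train features
X_test = rng.random((40, 20))    # toy test features

# project train and test together so they share one embedding space
X_all = np.vstack([X_train, X_test])

# reduce dimensionality first to speed up tSNE
X_red = PCA(n_components=10, random_state=0).fit_transform(X_all)

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_red)
emb_train, emb_test = emb[:60], emb[60:]
```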