Hyperparameter tuning
Don't spend too much time tuning hyperparameters; do it only when you have run out of other ideas or you have spare computational resources.
General pipeline
- Select the most influential parameters
  - There are tons of parameters and we can't tune all of them
- Understand how exactly they influence the training
  - A parameter in "red"
    - Increasing it impedes fitting
    - Increase it to reduce overfitting
    - Decrease it to allow the model to fit more easily
  - A parameter in "green"
    - Increasing it leads to a better fit (overfit) on the train set
    - Increase it if the model underfits
    - Decrease it if the model overfits
- Tune them
  - Manually (change and examine)
  - Automatically (hyperopt, etc.; a sketch follows the library list below)
- Hyperopt
- Scikit-optimize
- Spearmint
- GPyOpt
- RoBO
- SMAC3
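As a minimal sketch of the automatic route, the snippet below uses hyperopt's TPE search to tune a few parameters of an XGBoost model. The search space, dataset, and scoring are illustrative assumptions, not prescriptions from the course.

```python
from hyperopt import fmin, tpe, hp, Trials
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Toy data just to make the sketch runnable
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Assumed search space; adjust the ranges to your data
space = {
    'max_depth': hp.quniform('max_depth', 2, 10, 1),
    'subsample': hp.uniform('subsample', 0.5, 1.0),
    'eta': hp.loguniform('eta', -5, 0),                    # learning rate
    'min_child_weight': hp.quniform('min_child_weight', 1, 20, 1),
}

def objective(params):
    model = XGBClassifier(
        n_estimators=200,
        max_depth=int(params['max_depth']),
        subsample=params['subsample'],
        learning_rate=params['eta'],
        min_child_weight=int(params['min_child_weight']),
        random_state=0,
    )
    # hyperopt minimizes, so return the negated CV score
    score = cross_val_score(model, X, y, cv=3, scoring='accuracy').mean()
    return -score

trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=trials)
print(best)
```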
Tree-based models
GBDT
| XGBoost | LightGBM |
|---|---|
| max_depth | max_depth / num_leaves |
| subsample | bagging_fraction |
| colsample_bytree, colsample_bylevel | feature_fraction |
| min_child_weight | min_data_in_leaf |
| lambda, alpha | lambda_l1, lambda_l2 |
| eta | learning_rate |
| num_round | num_iterations |
| seed | seed |
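To make the mapping concrete, here is a minimal sketch of roughly equivalent parameter dictionaries for the two libraries; the values are arbitrary placeholders, not recommendations.

```python
# Roughly equivalent parameter sets for XGBoost and LightGBM (placeholder values)
xgb_params = {
    'max_depth': 7,                # decrease if the model overfits
    'subsample': 0.8,              # row sampling per tree
    'colsample_bytree': 0.8,       # column sampling per tree
    'min_child_weight': 5,         # increase to constrain the trees
    'lambda': 1.0, 'alpha': 0.0,   # L2 / L1 regularization
    'eta': 0.05,                   # learning rate
    'seed': 0,
}

lgb_params = {
    'num_leaves': 127,             # LightGBM's analogue of max_depth
    'bagging_fraction': 0.8,       # row sampling (needs bagging_freq > 0 to apply)
    'bagging_freq': 1,
    'feature_fraction': 0.8,       # column sampling
    'min_data_in_leaf': 5,
    'lambda_l2': 1.0, 'lambda_l1': 0.0,
    'learning_rate': 0.05,
    'seed': 0,
}

# num_round / num_iterations is the number of boosting rounds and is passed
# to the train call rather than the parameter dict, e.g.:
# xgboost.train(xgb_params, dtrain, num_boost_round=1000)
# lightgbm.train(lgb_params, lgb_train, num_boost_round=1000)
```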
RandomForest/ExtraTrees
- n_estimators (the higher, the better)
- max_depth
- max_features
- min_samples_leaf
- criterion ('gini' is better most of the time)
- random_state
- n_jobs
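A minimal scikit-learn sketch wiring up these parameters; the values are illustrative assumptions, and ExtraTreesClassifier accepts the same arguments.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier  # ExtraTreesClassifier is analogous
from sklearn.model_selection import cross_val_score

# Toy data just to make the sketch runnable
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(
    n_estimators=500,        # the higher the better, at the cost of training time
    max_depth=None,          # limit it if the forest overfits
    max_features='sqrt',     # features considered at each split
    min_samples_leaf=1,      # increase to constrain the trees
    criterion='gini',
    random_state=0,
    n_jobs=-1,               # use all cores
)
print(cross_val_score(rf, X, y, cv=3).mean())
```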
Neural Nets
- Number of neurons per layer
- Number of layers
- Optimizers
  - SGD + momentum
  - Adam/Adadelta/Adagrad/...
    - In practice these tend to lead to more overfitting
- Batch size
- Learning rate (neither too high nor too low; it depends on the other parameters)
- Regularization
  - L2/L1 for weights
  - Dropout/Dropconnect
  - Static dropconnect
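A minimal PyTorch sketch that exposes these knobs (layer sizes, optimizer choice, learning rate, batch size, L2 weight decay, dropout); the concrete values are assumptions for illustration.

```python
import torch
import torch.nn as nn

n_features, n_classes = 20, 2
hidden = 64                        # number of neurons per layer
model = nn.Sequential(             # number of layers: two hidden layers here
    nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(hidden, n_classes),
)

# SGD + momentum is often a safer default; Adam and friends converge faster
# but, as noted above, tend to overfit more in practice
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9,
                            weight_decay=1e-4)   # weight_decay acts as L2 on weights
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

loss_fn = nn.CrossEntropyLoss()
batch_size = 64                    # another hyperparameter to tune

# One training step on a random batch, just to show the wiring
x = torch.randn(batch_size, n_features)
y = torch.randint(0, n_classes, (batch_size,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```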
Linear Model
- Regularization parameter (C, alpha, lambda, ...)
  - Start with a very small value and increase it
  - SVC starts to work slower as C increases
- Regularization type
  - L1/L2/L1+L2 -- try each
  - L1 can be used for feature selection
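A minimal scikit-learn sketch of sweeping the regularization strength for a linear model; the grid and the dataset are assumptions. Note that in scikit-learn, C is the inverse of the regularization strength, so a small C means strong regularization.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy data just to make the sketch runnable
X, y = make_classification(n_samples=1000, n_features=50, random_state=0)

# Start with a very small C (strong regularization) and increase it
for C in [0.001, 0.01, 0.1, 1, 10, 100]:
    clf = LogisticRegression(C=C, penalty='l2', solver='liblinear', max_iter=1000)
    score = cross_val_score(clf, X, y, cv=3).mean()
    print(f'C={C}: {score:.3f}')

# penalty='l1' with solver='liblinear' drives weak features' weights to zero,
# which is why L1 can double as feature selection
```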