Extracts the variable importance measures produced by ODT and ODRF.

Usage

VarImp(obj, X = NULL, y = NULL, type = "permutation")

Arguments

obj

An object of class ODT or ODRF.

X

An n by d numeric matrix (preferred) or data frame of predictors, as used in the ODRF.

y

A response vector of length n, as used in the ODRF.

type

A string specifying the type of importance measure: "impurity" for the mean decrease in node impurity, or "permutation" (default) for the mean decrease in accuracy.

Value

A matrix of importance measures. The first column identifies the predictors and the second column gives the increased error: the misclassification rate (MR) for classification or the mean squared error (MSE) for regression. The larger the increased error, the more important the variable.
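
As a small illustration of reading such a matrix, the sketch below rebuilds four rows of the permutation output shown under Examples by hand and ranks them. It is only a stand-in: the values and column names (including the spelling "varible") are copied from that printed output, the assumption that the matrix sits in the varImp component is taken from the same output, and the names imp, ranked and uninformative are hypothetical.

```r
# Hand-built stand-in for VarImp(forest, X, y, type = "permutation")$varImp.
# The four rows are copied from the example output on this page.
imp <- matrix(
  c( 1,  6.895483e-04,
     3,  3.540741e-06,
     4, -2.007132e-07,
    12, -4.805390e-07),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("BodyFat", "Weight", "Height", "Biceps"),
    c("varible", "decrease_accuracy")
  )
)

# Rank predictors from most to least important.
ord <- order(imp[, "decrease_accuracy"], decreasing = TRUE)
ranked <- imp[ord, , drop = FALSE]

# Values at or below zero mean that permuting the predictor did not
# increase the error in this run.
uninformative <- rownames(ranked)[ranked[, "decrease_accuracy"] <= 0]

print(ranked)
print(uninformative)
```

Negative entries, such as those for Height and Biceps here, can plausibly be read as noise around zero rather than as evidence that a predictor is harmful.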

Details

The definitions of the variable importance measures below follow a note from the randomForest package.

  • The first measure is the total decrease in node impurities from splitting on the variable, averaged over all trees. For classification, the node impurity is measured by the Gini index. For regression, it is measured by residual sum of squares.

  • The second measure is computed from permuting OOB data: for each tree, the prediction error on the out-of-bag portion of the data is recorded. Then the same is done after permuting each predictor variable. The differences between the two errors are then averaged over all trees.
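
The two definitions above can be sketched in base R. This is a simplified illustration, not the ODT/ODRF implementation: a single median split stands in for a tree when computing the impurity decrease, a linear model from lm() stands in for the fitted tree, and a held-out set stands in for the out-of-bag data. The simulated data and the names rss, impurity_decrease, mse and perm_imp are hypothetical.

```r
set.seed(1)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- 3 * X$x1 + 0.5 * X$x2 + rnorm(n, sd = 0.1)  # x3 is pure noise

# Measure 1 (regression): decrease in residual sum of squares from a split.
rss <- function(v) sum((v - mean(v))^2)
impurity_decrease <- function(x, y, cut) {
  left  <- y[x <= cut]
  right <- y[x >  cut]
  rss(y) - (rss(left) + rss(right))
}
# One split per predictor at its median (a real tree chooses its own splits).
dec <- sapply(names(X), function(v) {
  impurity_decrease(X[[v]], y, median(X[[v]]))
})

# Measure 2: increase in prediction error after permuting one predictor.
train <- 1:150
test  <- 151:200
fit <- lm(y ~ ., data = data.frame(y = y[train], X[train, ]))

mse <- function(model, newX, newy) {
  mean((newy - predict(model, newdata = newX))^2)
}
base_mse <- mse(fit, X[test, ], y[test])

perm_imp <- sapply(names(X), function(v) {
  Xp <- X[test, ]
  Xp[[v]] <- sample(Xp[[v]])          # break the link between v and y
  mse(fit, Xp, y[test]) - base_mse    # increased error
})

print(dec)
print(perm_imp)
```

In a forest, both quantities would be computed per tree (using that tree's own splits and out-of-bag rows) and then averaged over all trees.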

Examples

library(ODRF)
data(body_fat)
y <- body_fat[, 1]   # response: first column
X <- body_fat[, -1]  # predictors: all remaining columns

# Impurity-based importance from a single tree
tree <- ODT(X, y, split = "mse")
(varimp <- VarImp(tree, type = "impurity"))
#> $varImp
#>         varible decrease_accuracy
#> BodyFat       1      5.985098e-03
#> Hip           8      1.388494e-05
#> Abdomen       7      8.268897e-06
#> Wrist        14      7.487498e-06
#> Height        4      7.191170e-06
#> Ankle        11      4.633147e-06
#> Biceps       12      3.746768e-06
#> Knee         10      1.684495e-06
#> Age           2      1.568993e-07
#> Neck          5      1.045500e-07
#> Thigh         9      8.405915e-08
#> Forearm      13      6.621724e-08
#> Chest         6      1.472028e-08
#> Weight        3      1.198327e-09
#> 
#> $split
#> [1] "mse"
#> 
#> attr(,"class")
#> [1] "VarImp"

# Impurity-based importance from a forest of 50 trees
forest <- ODRF(X, y, split = "mse", parallel = FALSE, ntrees = 50)
(varimp <- VarImp(forest, type = "impurity"))
#> $varImp
#>         varible decrease_accuracy
#> BodyFat       1      3.536018e-03
#> Weight        3      3.454507e-05
#> Abdomen       7      1.430011e-05
#> Height        4      1.294990e-05
#> Thigh         9      9.089121e-06
#> Ankle        11      6.935579e-06
#> Hip           8      6.807814e-06
#> Chest         6      5.835472e-06
#> Neck          5      4.937802e-06
#> Wrist        14      3.528603e-06
#> Knee         10      1.820711e-06
#> Biceps       12      1.444451e-06
#> Forearm      13      8.782072e-07
#> Age           2      3.991877e-07
#> 
#> $split
#> [1] "mse"
#> 
#> attr(,"class")
#> [1] "VarImp"
# Permutation importance: pass the data X and y
(varimp <- VarImp(forest, X, y, type = "permutation"))
#> $varImp
#>         varible decrease_accuracy
#> BodyFat       1      6.895483e-04
#> Weight        3      3.540741e-06
#> Thigh         9      1.286703e-06
#> Hip           8      9.485006e-07
#> Ankle        11      7.919036e-07
#> Knee         10      6.297860e-07
#> Abdomen       7      5.934018e-07
#> Neck          5      5.512524e-07
#> Chest         6      5.354998e-07
#> Age           2      4.518536e-07
#> Forearm      13      1.970960e-07
#> Height        4     -2.007132e-07
#> Wrist        14     -3.065903e-07
#> Biceps       12     -4.805390e-07
#> 
#> $split
#> [1] "mse"
#> 
#> attr(,"class")
#> [1] "VarImp"