Cluster-based Feature Selection

Xin Man and Ernest P. Chan

Abstract

Feature importance in machine learning measures how much information a feature contributes to a supervised learning model, which lets us exclude uninformative features from the predictive model (“feature selection”) and improves the human interpretability of the resulting model. Recently, Man & Chan (2021) compared the stability of features selected by different methods, such as MDA, SHAP, or LIME, when those methods are subject to the computational randomness of the selection algorithms. In this article, we study whether the cluster-based MDA (cMDA) method proposed by López de Prado (2020) improves predictive performance, feature stability, and model interpretability. We applied cMDA to two synthetic datasets, a public clinical dataset, and two financial datasets. In all cases, the stability and interpretability of the cMDA-selected features are superior to those of the MDA-selected features.
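For readers unfamiliar with the idea, the following is a minimal sketch of a cluster-based permutation importance, not the authors' implementation. It assumes features are grouped by hierarchical clustering on a correlation distance, and that importance is the drop in held-out accuracy when all columns of a cluster are shuffled together. The function names `cluster_features` and `clustered_mda`, the random forest model, the single train/test split, and the toy data are all illustrative assumptions; the paper's own clustering and cross-validation choices may differ.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def cluster_features(X, n_clusters):
    """Group features by average-linkage clustering on a correlation distance."""
    corr = np.corrcoef(X, rowvar=False)
    # Distance is 0 for perfectly correlated features; clip guards rounding error.
    dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def clustered_mda(model, X_test, y_test, labels, n_repeats=10, seed=0):
    """Mean accuracy drop when each cluster's columns are shuffled jointly."""
    rng = np.random.default_rng(seed)
    base = accuracy_score(y_test, model.predict(X_test))
    importance = {}
    for c in np.unique(labels):
        cols = np.where(labels == c)[0]
        drops = []
        for _ in range(n_repeats):
            X_perm = X_test.copy()
            perm = rng.permutation(len(X_test))
            # One row permutation for the whole cluster, so correlated
            # substitutes cannot stand in for each other.
            X_perm[:, cols] = X_test[perm][:, cols]
            drops.append(base - accuracy_score(y_test, model.predict(X_perm)))
        importance[int(c)] = float(np.mean(drops))
    return importance


# Toy data (illustrative only): two near-duplicate informative features
# and two near-duplicate noise features.
rng = np.random.default_rng(0)
n = 600
signal = rng.normal(size=n)
noise = rng.normal(size=n)
X = np.column_stack([
    signal + 0.05 * rng.normal(size=n),
    signal + 0.05 * rng.normal(size=n),
    noise + 0.05 * rng.normal(size=n),
    noise + 0.05 * rng.normal(size=n),
])
y = (signal > 0).astype(int)

labels = cluster_features(X, n_clusters=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
importance = clustered_mda(model, X_te, y_te, labels)
print(labels, importance)
```

Shuffling one of the two near-duplicate informative columns alone would leave the other intact, so a per-feature MDA can understate both. Shuffling the cluster as a unit avoids that substitution effect, which is the motivation for scoring clusters rather than individual features.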


