When we build machine learning models for use in production, we probably don't want to use every variable available in the data. Sure, adding more variables rarely makes a model less accurate, but including an excess of features has its disadvantages. To select the most predictive variables, we can choose from several feature selection algorithms. These are typically grouped into three categories: filter, wrapper and embedded methods. Algorithms that do not fit neatly into any of these categories are usually considered hybrid methods. In this video, I first discuss why feature selection matters, then go through the categories of feature selection methods and describe the most popular algorithms in each. I also compare how these feature selection algorithms are implemented in open source Python libraries.
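To make the three categories concrete, here is a minimal sketch with one example of each. It assumes scikit-learn as the open source library and uses a synthetic dataset. The specific selectors, estimators and parameter values (`SelectKBest` with an ANOVA F-test, `RFE` around a logistic regression, `SelectFromModel` around an L1-penalised linear SVM, `k=5`, `C=0.1`) are illustrative choices, not necessarily the ones covered in the video.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Synthetic data (an assumption for illustration): 20 features, 5 of them informative.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, n_redundant=0, random_state=0
)

# Filter method: score each feature on its own, without training the final model.
filter_selector = SelectKBest(score_func=f_classif, k=5).fit(X, y)

# Wrapper method: repeatedly fit a model and discard the weakest feature
# until the requested number of features remains.
wrapper_selector = RFE(
    estimator=LogisticRegression(max_iter=1000), n_features_to_select=5
).fit(X, y)

# Embedded method: selection happens as part of training. The L1 penalty
# shrinks some coefficients to zero, and those features are dropped.
embedded_selector = SelectFromModel(
    LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000)
).fit(X, y)

# Collect the column indices that each approach keeps.
selected = {
    "filter": filter_selector.get_support(indices=True).tolist(),
    "wrapper": wrapper_selector.get_support(indices=True).tolist(),
    "embedded": embedded_selector.get_support(indices=True).tolist(),
}

for name, columns in selected.items():
    print(f"{name}: {columns}")
```

The filter and wrapper selectors here keep exactly the number of features requested, whereas the embedded selector keeps however many coefficients survive the penalty, so its output size depends on the regularisation strength `C`.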