Beyond essential data cleansing, you often need to go deeper and extract additional value from the data to improve the performance of your machine learning model. In machine learning, this process is called feature engineering. In this article, I’ll walk you through what feature engineering in machine learning is.
What is Feature Engineering?
In any problem domain, specific knowledge goes into choosing which data to collect, and that same domain knowledge can be used to extract extra value from the collected data, in effect enriching the model’s inputs before the model is built. In machine learning, we call this process feature engineering.
Feature engineering is one of the most important steps in the machine learning workflow. It’s also the creative part of machine learning, where you can use your knowledge and imagination to improve the model by digging into the data and extracting hidden value.
Here are some important examples of feature engineering:
Date and Time:
You’ll see a date or time variable in many datasets, but by themselves dates and times aren’t useful to machine learning algorithms, which tend to require raw numbers or categories. The information they contain, however, can be valuable.
If you want to predict which ad to run, knowing the time, day of the week, and time of year will be important. With feature engineering, this information can be extracted from dates and times and made available to the model.
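As a minimal sketch of this idea, here is how you might pull the hour, day of week, and month out of a raw timestamp with pandas. The ad-click log and its column name are made up for illustration:

```python
import pandas as pd

# Hypothetical ad-click log with a raw timestamp column.
df = pd.DataFrame({
    "clicked_at": pd.to_datetime([
        "2023-03-17 08:45:00",
        "2023-03-18 21:10:00",
        "2023-12-24 14:30:00",
    ])
})

# Derive numeric features the model can actually use.
df["hour"] = df["clicked_at"].dt.hour              # 0-23
df["day_of_week"] = df["clicked_at"].dt.dayofweek  # Monday=0 ... Sunday=6
df["month"] = df["clicked_at"].dt.month            # 1-12
df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)

print(df[["hour", "day_of_week", "month", "is_weekend"]])
```

Each derived column is an ordinary number, so any model can consume it; cyclical encodings (sine/cosine of the hour) are a common next refinement.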
Additionally, when dates and times appear in observations of repetitive activities, such as a user’s repeated visits to a website over a month or year, they can be used to calculate interval durations that can be predictive. For example, on a shopping site, users may visit more frequently just before buying, to review and compare items and prices.
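A quick sketch of that interval idea, using a made-up visit log for a single user — the shrinking gaps between visits become a feature in their own right:

```python
import pandas as pd

# Hypothetical visit log for one user on a shopping site.
visits = pd.Series(pd.to_datetime([
    "2023-05-01", "2023-05-09", "2023-05-12", "2023-05-13", "2023-05-14",
]))

# Days between consecutive visits; a shrinking gap may signal an imminent purchase.
gaps = visits.diff().dt.days.dropna()
print(gaps.tolist())  # [8.0, 3.0, 1.0, 1.0]
```

From here you could aggregate further, e.g. the mean gap over the last week versus the user’s overall mean.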
Location:
Location data, such as latitude/longitude coordinates or place names, is available in some datasets. Sometimes this information can be used on its own, but often you can extract additional information that is useful for a specific problem.
For example, if you want to predict election results in a county, you can extract the population density, average income, and poverty rate to use as numbers in your model.
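In practice this usually means joining a lookup table of region-level statistics onto each observation. A minimal sketch with pandas, where the county names and all the numbers are invented for illustration:

```python
import pandas as pd

# Hypothetical lookup table of county-level statistics (made-up numbers).
county_stats = pd.DataFrame({
    "county": ["Adams", "Baker", "Clay"],
    "pop_density": [120.5, 35.2, 890.1],    # people per sq km
    "median_income": [52000, 41000, 67000],
    "poverty_rate": [0.11, 0.18, 0.07],
})

# Raw observations only carry the place name ...
votes = pd.DataFrame({"county": ["Clay", "Adams"], "turnout": [0.61, 0.55]})

# ... so join the numeric features onto each row before modelling.
features = votes.merge(county_stats, on="county", how="left")
print(features)
```

A left join keeps every original observation even when a county is missing from the lookup table, which is usually what you want during feature engineering.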
Unstructured Data:
This is data such as text, documents, images and videos. Engineering the features that make this kind of data usable is the difficult part of projects like the cats-versus-dogs image classification contest.
In a classic computer-vision pipeline, edges, shapes and colour spectra are first extracted from the images. These raw signals are then passed through mathematical transformations, and the output is a set of features that classification algorithms can use.
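To make the edge-extraction step concrete, here is a minimal NumPy sketch of a Sobel edge detector, one of the classic transformations used to turn raw pixels into edge features. The tiny toy “image” is fabricated for the example:

```python
import numpy as np

def sobel_edges(img: np.ndarray) -> np.ndarray:
    """Approximate edge magnitude of a 2-D grayscale image with 3x3 Sobel kernels."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal gradient
    ky = kx.T                                                         # vertical gradient
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx = np.sum(patch * kx)
            gy = np.sum(patch * ky)
            out[i, j] = np.hypot(gx, gy)  # gradient magnitude
    return out

# Toy image: dark left half, bright right half -> one strong vertical edge.
img = np.zeros((5, 5))
img[:, 3:] = 1.0
edges = sobel_edges(img)
print(edges)
```

Real pipelines use optimized library convolutions rather than explicit loops, but the principle is the same: the edge map (or statistics computed from it) becomes the feature vector fed to the classifier.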
As you can see, feature engineering can make a real difference in real-world machine learning projects. I hope you liked this article on the concept of feature engineering in machine learning. Please feel free to ask your valuable questions in the comments section below.