Data Preprocessing in Data Mining

Introduction to Data Preprocessing – Feature Engineering and Feature Selection in Data Mining

In this article, I will discuss:

Motivation for Data Preprocessing
Steps in Data Preprocessing

Motivation for Data Preprocessing

Real-world datasets are often degraded by noise, missing values, redundancy, outliers, and inconsistencies. A low-quality dataset leads to poor performance, or outright failure, of a machine learning or deep learning project.

Nowadays, a large number of machine learning, deep learning, and transfer learning algorithms have been designed. But the success or failure of these models largely depends on the quality of the data set used and the features selected.

Hence, data preprocessing, also known as feature engineering and feature selection, is a very important stage in building a usable machine learning or deep learning project.

There are mainly two steps in data preprocessing:

Data Preparation
Data Reduction

Following are the forms of Data Preparation

Data Cleaning

Data cleaning is the process of correcting bad data, filtering incorrect data out of the data set, and reducing unnecessary detail in the data.
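As a minimal sketch of data cleaning, assume each record is a dictionary with a `name` and an `age` field. The field names, the valid age range of 0 to 120, and the function name `clean_records` are my own assumptions for illustration only.

```python
def clean_records(records):
    """Return cleaned copies of the valid records, dropping bad rows and duplicates."""
    cleaned = []
    seen = set()
    for rec in records:
        # Correct bad data: trim whitespace and use consistent capitalization.
        name = str(rec.get("name", "")).strip().title()
        age = rec.get("age")
        # Filter out incorrect data: empty name or an impossible age.
        if not name or not isinstance(age, (int, float)) or not (0 <= age <= 120):
            continue
        # Drop exact duplicates of records that were already kept.
        key = (name, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


raw = [
    {"name": "  alice ", "age": 30},
    {"name": "Alice", "age": 30},    # duplicate once the name is corrected
    {"name": "bob", "age": -5},      # impossible age
    {"name": "", "age": 41},         # missing name
    {"name": "carol", "age": 52},
]
cleaned = clean_records(raw)
```

Here only the Alice and Carol records survive. Real projects usually need rules that are specific to the data set.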

Data Transformation

Data transformation is the process of converting or consolidating data into a form in which the mining process can be applied, or applied more efficiently.
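One simple kind of consolidation is aggregation. The sketch below rolls daily sales up into monthly totals. The `(date, amount)` layout, the `YYYY-MM-DD` date format, and the name `monthly_totals` are assumptions made for this example.

```python
from collections import defaultdict


def monthly_totals(daily_sales):
    """Consolidate (date, amount) pairs into one total per month."""
    totals = defaultdict(float)
    for date, amount in daily_sales:
        month = date[:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        totals[month] += amount
    return dict(totals)


daily = [("2023-01-03", 120.0), ("2023-01-17", 80.0), ("2023-02-01", 50.0)]
totals = monthly_totals(daily)
```

A mining algorithm that looks for monthly patterns can then work on a handful of rows instead of every single day.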

Data Integration

Data integration is the process of collecting and merging data from multiple data stores.
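To illustrate, suppose customer details live in one store and orders in another. The sketch below merges them on a shared customer id. The two tables, their field names, and the function `integrate` are hypothetical.

```python
def integrate(customers, orders):
    """Attach each customer's total order amount to the customer record."""
    totals = {}
    for order in orders:
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0) + order["amount"]
    # Customers without any order keep a total of 0 (a left join).
    return [{**c, "total_spent": totals.get(c["id"], 0)} for c in customers]


customers = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
orders = [{"customer_id": 1, "amount": 20}, {"customer_id": 1, "amount": 5}]
merged = integrate(customers, orders)
```

In practice the hard part is usually matching keys and resolving conflicts between the sources, which this sketch does not attempt.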

Data Normalization

Data normalization is the process of expressing data in the same units of measurement, scale, or range.
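Two widely used ways to bring values onto a common scale are min-max scaling and z-score standardization. The sketch below shows both; the function names and the sample heights are chosen only for illustration.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


heights_cm = [150, 160, 170, 180, 190]
scaled = min_max_scale(heights_cm)    # [0.0, 0.25, 0.5, 0.75, 1.0]
standardized = z_score(heights_cm)
```

Min-max scaling keeps the values in a fixed range, while z-scores are often preferred when the attribute has no natural bounds.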

Missing Data Imputation

The collected data may contain missing values. Imputation methods fill in the variables that contain missing values with reasonable estimates.
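Mean imputation is one of the simplest methods. In the sketch below a missing value is represented by `None`, which is an assumption of this example, and `impute_mean` is a hypothetical helper name.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


ages = [20, None, 40, None, 30]
imputed = impute_mean(ages)   # both gaps are filled with 30
```

Mean imputation is easy to apply but shrinks the variance of the attribute, so the median or a model-based estimate may suit some data sets better.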

Noise Identification

Noise identification detects random errors or variance in a measured variable.
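One possible heuristic, not the only one, is to flag values that lie far from the median, using a modified z-score built on the median absolute deviation (MAD). The constant 0.6745 and the cutoff of 3.5 are commonly quoted defaults; treat them, and the name `flag_noise`, as assumptions of this sketch.

```python
from statistics import median


def flag_noise(values, threshold=3.5):
    """Return one boolean per value: True if it looks like a noisy reading."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:  # more than half the values are identical: nothing to scale by
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


readings = [10.1, 9.9, 10.0, 10.2, 9.8, 55.0]
flags = flag_noise(readings)   # only the last reading is flagged
```

A flagged value is only a candidate. Whether it is a measurement error or a genuine extreme value still has to be decided from knowledge of the domain.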

Following are the Forms of Data Reduction

Feature Selection

Feature selection reduces the data set by removing irrelevant or redundant features (or dimensions).
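As a very small sketch, the code below drops features that are constant (they carry no information) and features that are exact copies of one already kept (redundant). The column-dictionary layout and the name `select_features` are assumptions; real feature selection methods also look at correlation with the target.

```python
from statistics import pvariance


def select_features(columns, min_variance=1e-9):
    """columns maps a feature name to its list of values; return the kept ones."""
    kept = {}
    seen = set()
    for name, values in columns.items():
        # Uninformative: the feature (almost) never changes.
        if pvariance(values) <= min_variance:
            continue
        # Redundant: identical to a feature that was already kept.
        signature = tuple(values)
        if signature in seen:
            continue
        seen.add(signature)
        kept[name] = values
    return kept


data = {
    "height_cm": [150, 160, 170, 180],
    "height_copy": [150, 160, 170, 180],
    "country_code": [1, 1, 1, 1],
    "weight_kg": [50, 62, 68, 80],
}
selected = select_features(data)   # keeps height_cm and weight_kg
```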

Instance Selection

Instance selection consists of choosing a subset of the total available data that achieves the original purpose of the application as if the whole data set had been used.
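Stratified random sampling is one simple way to pick such a subset while keeping the class proportions of the full data. The row layout, the `label_of` callback, and the function name below are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_of, fraction, seed=0):
    """Sample the same fraction of rows from every class."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    sample = []
    for group in by_label.values():
        k = max(1, round(len(group) * fraction))  # keep at least one per class
        sample.extend(rng.sample(group, k))
    return sample


rows = [("spam", i) for i in range(20)] + [("ham", i) for i in range(80)]
subset = stratified_sample(rows, label_of=lambda r: r[0], fraction=0.1)
```

With a fraction of 0.1, the subset holds 2 spam rows and 8 ham rows, the same 20/80 split as the full data.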

Discretization

Transforms quantitative data into qualitative data, that is, numerical attributes into nominal attributes with a finite number of intervals.
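Equal-width binning is a basic discretization method: the range of the attribute is cut into intervals of the same width, and each value is replaced by the label of its interval. The labels and the function name below are chosen only for illustration.

```python
def equal_width_bins(values, labels):
    """Map each numeric value to one of len(labels) equal-width intervals."""
    lo, hi = min(values), max(values)
    n = len(labels)
    width = (hi - lo) / n
    if width == 0:  # all values equal: everything falls in the first interval
        return [labels[0]] * len(values)
    result = []
    for v in values:
        # The maximum value would land in bin n, so clamp it to the last bin.
        idx = min(int((v - lo) / width), n - 1)
        result.append(labels[idx])
    return result


ages = [3, 17, 25, 41, 58, 63]
groups = equal_width_bins(ages, ["young", "middle", "senior"])
```

Here the range 3 to 63 is split into three intervals of width 20, so 3 and 17 become "young", 25 and 41 become "middle", and 58 and 63 become "senior".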

Feature Extraction/Instance Generation

Extends both feature selection and instance selection by allowing the modification of the internal values that represent each example or attribute.
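As a small illustration of feature extraction, the sketch below derives a new attribute, body mass index, from two existing ones. The field names and the helper `add_bmi` are assumptions made for this example.

```python
def add_bmi(records):
    """Return copies of the records with a derived 'bmi' feature added."""
    return [
        {**r, "bmi": round(r["weight_kg"] / (r["height_cm"] / 100) ** 2, 1)}
        for r in records
    ]


people = [
    {"height_cm": 180, "weight_kg": 81},
    {"height_cm": 160, "weight_kg": 64},
]
with_bmi = add_bmi(people)
```

The derived feature combines height and weight into a single value that may be more useful to a model than either attribute alone.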

Summary

This article introduces Data Preprocessing – Feature Engineering and Feature Selection in Data Mining. If you like the material, share it with your friends. Like the Facebook page for regular updates and subscribe to the YouTube channel for video tutorials.
