Implementation of Random Forest Classification in Python – Machine Learning
In this tutorial, we will walk through an implementation of Random Forest classification in Python using scikit-learn.
Importing the Necessary libraries
To begin, we import the necessary libraries: NumPy for numerical computation and pandas for reading the dataset.
import numpy as np
import pandas as pd
Importing the dataset
Next, we read the dataset. This implementation uses a breast cancer dataset stored in Data.csv. After reading the dataset, divide it into features and targets: store the features (every column except the last) in X and the targets (the last column) in y.
# Read the CSV file into a DataFrame
dataset = pd.read_csv('Data.csv')
# All columns except the last are features
X = dataset.iloc[:, :-1].values
# The last column is the target
y = dataset.iloc[:, -1].values
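To make the slicing concrete, here is a minimal sketch on a tiny hand-made DataFrame. The column names and values are hypothetical stand-ins and are not taken from Data.csv; the point is only to show which columns end up in the features and which in the target.

```python
import pandas as pd

# Hypothetical toy data standing in for Data.csv; names and values are made up.
toy = pd.DataFrame({
    "feature_a": [5, 3, 8, 1],
    "feature_b": [1, 1, 10, 2],
    "label": [2, 2, 4, 2],
})

# Every column except the last one becomes a feature column.
X_toy = toy.iloc[:, :-1].values
# The last column becomes the target vector.
y_toy = toy.iloc[:, -1].values

print(X_toy.shape)  # (4, 2)
print(y_toy.shape)  # (4,)
```

This pattern assumes the target is stored in the last column of the file, which is what the code in this tutorial relies on.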
Splitting the dataset into the Training set and Test set
Once the dataset is read into memory, divide it into a training set and a test set using the train_test_split function from sklearn. The test_size and random_state parameters are set to 0.25 and 0 respectively, so 25% of the rows are held out for testing and the split is reproducible. You can change these parameters as per your requirements.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
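The effect of test_size and random_state can be checked on synthetic data. The sketch below uses randomly generated arrays (an assumption for illustration, not the tutorial's dataset) to show the resulting split sizes and that a fixed random_state reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples, 3 features, binary labels.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = rng.integers(0, 2, size=100)

# Hold out 25% of the rows for testing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)
print(X_tr.shape, X_te.shape)  # (75, 3) (25, 3)

# Repeating the call with the same random_state yields the same split.
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)
same_split = np.array_equal(X_te, X_te2)
print(same_split)  # True
```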
Feature Scaling
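Feature scaling puts all features on a comparable scale. In this case, standardization is used: StandardScaler rescales each feature to zero mean and unit variance. The scaler is fitted on the training set only and then applied to the test set, so that no information from the test set leaks into training.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
# Learn the mean and standard deviation from the training set, then scale it
X_train = sc.fit_transform(X_train)
# Scale the test set with the statistics learned from the training set
X_test = sc.transform(X_test)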
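As a small worked example of what StandardScaler does, the sketch below uses a made-up 3x2 training array and a single made-up test row. After fitting, the training columns have mean 0 and standard deviation 1, while the test row is scaled with the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up numbers purely for illustration.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[4.0, 40.0]])

scaler = StandardScaler()
# fit_transform learns the per-column mean and standard deviation from train.
train_scaled = scaler.fit_transform(train)
# transform reuses those training statistics on the test row.
test_scaled = scaler.transform(test)

print(scaler.mean_)               # [ 2. 20.]
print(train_scaled.mean(axis=0))  # approximately [0, 0]
print(train_scaled.std(axis=0))   # approximately [1, 1]
print(test_scaled)                # both entries approximately 2.449
```

Both test entries equal (4 - 2) / 0.8165 and (40 - 20) / 8.165, that is, the square root of 6, because the two columns differ only by a factor of 10.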
Training the Random Forest Classification model on the Training set
Once the dataset is scaled, the Random Forest classifier algorithm is used to create a model. The RandomForestClassifier class is imported from the sklearn.ensemble module. The hyperparameters n_estimators, criterion, and random_state are set to 10, 'entropy', and 0 respectively. The remaining hyperparameters of the random forest algorithm are left at their default values.
from sklearn.ensemble import RandomForestClassifier
# Build a forest of 10 trees that split on information gain (entropy)
classifier = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
classifier.fit(X_train, y_train)
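To see what these hyperparameters produce, the following sketch fits the same configuration on synthetic data generated with make_classification (an assumption for illustration; it is not the breast cancer dataset). It inspects the fitted trees, the class probabilities, and the feature importances.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class data used only as a stand-in for the real dataset.
X_syn, y_syn = make_classification(n_samples=200, n_features=5, random_state=0)

forest = RandomForestClassifier(n_estimators=10, criterion="entropy", random_state=0)
forest.fit(X_syn, y_syn)

# n_estimators controls how many decision trees the forest contains.
n_trees = len(forest.estimators_)
print(n_trees)  # 10

# predict_proba returns one probability per class for each sample.
proba = forest.predict_proba(X_syn[:3])
print(proba.shape)  # (3, 2)

# One importance score per feature; the scores sum to 1.
importances = forest.feature_importances_
print(importances)
```

Increasing n_estimators usually gives more stable predictions at the cost of longer training time, so 10 trees is a small setting chosen for a quick demonstration.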
Random Forest classifier model
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy', max_depth=None, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None, oob_score=False, random_state=0, verbose=0, warm_start=False)
Display the results (confusion matrix and accuracy)
Here, evaluation metrics such as the confusion matrix and accuracy are used to evaluate the performance of the model built using the random forest classifier.
from sklearn.metrics import confusion_matrix, accuracy_score
# Predict the labels of the test set
y_pred = classifier.predict(X_test)
# Rows are the actual classes, columns are the predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(accuracy_score(y_test, y_pred))
Output
Confusion Matrix of Random Forest Classifier
[[102, 5]
[ 6, 58]]
Accuracy of Random Forest Classifier: 0.935672514619883
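The reported accuracy can be recomputed directly from the confusion matrix: the diagonal holds the correctly classified test samples, and the sum of all entries is the size of the test set. A minimal sketch using the matrix printed above:

```python
import numpy as np

# Confusion matrix copied from the output above.
cm_reported = np.array([[102, 5],
                        [6, 58]])

# Correct predictions lie on the diagonal.
correct = int(np.trace(cm_reported))
# Every test sample falls into exactly one cell.
total = int(cm_reported.sum())

accuracy = correct / total
print(correct, total)  # 160 171
print(accuracy)        # approximately 0.9357
```

The off-diagonal entries (5 and 6) are the 11 misclassified test samples.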
Summary:
In this tutorial, we covered the implementation of Random Forest classification in Python: reading the dataset, splitting it into training and test sets, scaling the features, training the model, and evaluating it with a confusion matrix and accuracy.