Decision Tree using CART algorithm Solved Example 2 – Loan Approval Data Set
In this tutorial, we will understand how to apply the Classification And Regression Trees (CART) decision tree algorithm (Solved Example 2) to construct the optimal decision tree for the given Loan Approval data set, and then use that tree to predict the class label for a new example.
| Age | Job | House | Credit | Loan Approved |
| --- | --- | --- | --- | --- |
| Young | False | No | Fair | No |
| Young | False | No | Good | No |
| Young | True | No | Good | Yes |
| Young | True | Yes | Fair | Yes |
| Young | False | No | Fair | No |
| Middle | False | No | Fair | No |
| Middle | False | No | Good | No |
| Middle | True | Yes | Good | Yes |
| Middle | False | Yes | Excellent | Yes |
| Middle | False | Yes | Excellent | Yes |
| Old | False | Yes | Excellent | Yes |
| Old | False | Yes | Good | Yes |
| Old | True | No | Good | Yes |
| Old | True | No | Excellent | Yes |
| Old | False | No | Fair | No |
| Age | Job | House | Credit | Loan Approved |
| --- | --- | --- | --- | --- |
| Young | False | No | Good | ? |
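For readers who want to follow along in code, the training table and the query row can be written down in a machine-readable form. The column order and string encoding below are our own choice, not part of the original problem statement:

```python
# Training rows in table order: Age, Job, House, Credit, LoanApproved.
TABLE = """Young False No Fair No
Young False No Good No
Young True No Good Yes
Young True Yes Fair Yes
Young False No Fair No
Middle False No Fair No
Middle False No Good No
Middle True Yes Good Yes
Middle False Yes Excellent Yes
Middle False Yes Excellent Yes
Old False Yes Excellent Yes
Old False Yes Good Yes
Old True No Good Yes
Old True No Excellent Yes
Old False No Fair No"""
DATA = [tuple(line.split()) for line in TABLE.splitlines()]

# The new example whose class label we want to predict.
QUERY = ("Young", "False", "No", "Good")

print(len(DATA), "training rows")  # 15 rows: 9 Yes, 6 No
```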
Solution:
First, we need to determine the root node of the tree.
Start with any variable, in this case, Age. It can take three values: Young, Middle, and Old.
Start with the Young value of Age. There are five instances where Age is Young.
In two of the five instances the loan approval decision was Yes, and in the other three it was No.
Thus, if the decision rule were Age: Young -> No, three out of five loan approval decisions would be correct and two would be incorrect, i.e., two errors out of five. This is recorded in the first row of the rules table below.
Similarly, we will write all rules for the Age attribute.
Age Attribute
| Age | Count | Loan Approved | Count |
| --- | --- | --- | --- |
| Young | 5 | Yes | 2 |
| | | No | 3 |
| Middle | 5 | Yes | 3 |
| | | No | 2 |
| Old | 5 | Yes | 4 |
| | | No | 1 |
Rules, individual error, and total for Age attribute
| Attribute | Rules | Error | Total Error |
| --- | --- | --- | --- |
| Age | Young -> No | 2/5 | 5/15 |
| | Middle -> Yes | 2/5 | |
| | Old -> Yes | 1/5 | |
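The error counting above can be sketched as a short Python helper. The function name `rules_and_errors` and the data encoding are ours, not part of the tutorial:

```python
from collections import Counter

# Training rows: Age, Job, House, Credit, LoanApproved (from the table above).
TABLE = """Young False No Fair No
Young False No Good No
Young True No Good Yes
Young True Yes Fair Yes
Young False No Fair No
Middle False No Fair No
Middle False No Good No
Middle True Yes Good Yes
Middle False Yes Excellent Yes
Middle False Yes Excellent Yes
Old False Yes Excellent Yes
Old False Yes Good Yes
Old True No Good Yes
Old True No Excellent Yes
Old False No Fair No"""
DATA = [tuple(line.split()) for line in TABLE.splitlines()]

def rules_and_errors(data, col):
    """For one attribute column, map each value to its majority-class rule,
    the number of rows that rule misclassifies, and the number of rows."""
    out = {}
    for value in sorted({row[col] for row in data}):
        labels = Counter(row[-1] for row in data if row[col] == value)
        majority, hits = labels.most_common(1)[0]
        n = sum(labels.values())
        out[value] = (majority, n - hits, n)
    return out

# Column 0 is Age: reproduces Young -> No (2/5), Middle -> Yes (2/5), Old -> Yes (1/5).
for value, (rule, err, n) in rules_and_errors(DATA, 0).items():
    print(f"Age {value} -> {rule}: {err}/{n} errors")
```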
Job Attribute
| Job | Count | Loan Approved | Count |
| --- | --- | --- | --- |
| False | 10 | Yes | 4 |
| | | No | 6 |
| True | 5 | Yes | 5 |
| | | No | 0 |
Rules, individual error, and total for Job attribute
| Attribute | Rules | Error | Total Error |
| --- | --- | --- | --- |
| Job | False -> No | 4/10 | 4/15 |
| | True -> Yes | 0/5 | |
House Attribute
| House | Count | Loan Approved | Count |
| --- | --- | --- | --- |
| No | 9 | Yes | 3 |
| | | No | 6 |
| Yes | 6 | Yes | 6 |
| | | No | 0 |
Rules, individual error, and total for House attribute
| Attribute | Rules | Error | Total Error |
| --- | --- | --- | --- |
| House | No -> No | 3/9 | 3/15 |
| | Yes -> Yes | 0/6 | |
Credit Attribute
| Credit | Count | Loan Approved | Count |
| --- | --- | --- | --- |
| Fair | 5 | Yes | 1 |
| | | No | 4 |
| Good | 6 | Yes | 4 |
| | | No | 2 |
| Excellent | 4 | Yes | 4 |
| | | No | 0 |
Rules, individual error, and total for Credit attribute
| Attribute | Rules | Error | Total Error |
| --- | --- | --- | --- |
| Credit | Fair -> No | 1/5 | 3/15 |
| | Good -> Yes | 2/6 | |
| | Excellent -> Yes | 0/4 | |
Consolidated rules, errors for individual attributes values, and total error of the attribute are given below.
| Attribute | Rules | Error | Total Error |
| --- | --- | --- | --- |
| Age | Young -> No | 2/5 | 5/15 |
| | Middle -> Yes | 2/5 | |
| | Old -> Yes | 1/5 | |
| Job | False -> No | 4/10 | 4/15 |
| | True -> Yes | 0/5 | |
| House | No -> No | 3/9 | 3/15 |
| | Yes -> Yes | 0/6 | |
| Credit | Fair -> No | 1/5 | 3/15 |
| | Good -> Yes | 2/6 | |
| | Excellent -> Yes | 0/4 | |
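As a cross-check of the consolidated table, the total error of every attribute can be computed in a few lines, again using our own encoding of the data:

```python
from collections import Counter

# Training rows: Age, Job, House, Credit, LoanApproved.
TABLE = """Young False No Fair No
Young False No Good No
Young True No Good Yes
Young True Yes Fair Yes
Young False No Fair No
Middle False No Fair No
Middle False No Good No
Middle True Yes Good Yes
Middle False Yes Excellent Yes
Middle False Yes Excellent Yes
Old False Yes Excellent Yes
Old False Yes Good Yes
Old True No Good Yes
Old True No Excellent Yes
Old False No Fair No"""
DATA = [tuple(line.split()) for line in TABLE.splitlines()]

def total_error(data, col):
    """Sum, over the attribute's values, of the rows that the
    majority-class rule for that value misclassifies."""
    err = 0
    for value in {row[col] for row in data}:
        labels = Counter(row[-1] for row in data if row[col] == value)
        err += sum(labels.values()) - labels.most_common(1)[0][1]
    return err

for name, col in [("Age", 0), ("Job", 1), ("House", 2), ("Credit", 3)]:
    print(f"{name}: {total_error(DATA, col)}/{len(DATA)}")
# Age: 5/15, Job: 4/15, House: 3/15, Credit: 3/15
```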
From the table above, the attributes House and Credit share the minimum total error of 3/15 (3 errors out of 15 examples), so we compare their individual rule errors. Each has exactly one rule that generates zero error (House: Yes -> Yes; Credit: Excellent -> Yes), so the tie remains. Finally, after setting aside its zero-error rule, House has only one remaining rule with errors, whereas Credit has two. We therefore choose House as the splitting attribute.
Now we build the tree with House as the root node, with one branch for each possible value of the House attribute. Since the rule Yes -> Yes generates zero error, whenever House is Yes the tree immediately predicts Yes. For the remaining value, No, we take the corresponding subset of the data and continue building the tree. The tree with House as the root node is:

    House
    ├── Yes → Loan Approved: Yes
    └── No  → (subtree built from the remaining 9 examples)
Now, for the right subtree (the House = No branch, which contains nine examples), we write all possible rules for the remaining attributes and find the total errors. Based on the total error table, we construct the subtree.
Right subtree:
Consolidated rules, errors for individual attributes values, and total error of the attribute are given below.
| Attribute | Rules | Error | Total Error |
| --- | --- | --- | --- |
| Age | Young -> No | 1/4 | 2/9 |
| | Middle -> No | 0/2 | |
| | Old -> Yes | 1/3 | |
| Job | False -> No | 0/6 | 0/9 |
| | True -> Yes | 0/3 | |
| Credit | Fair -> No | 0/4 | 2/9 |
| | Good -> Yes/No | 2/4 | |
| | Excellent -> Yes | 0/1 | |
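The same computation, restricted to the House = No subset, reproduces the subtree table above. Note that for Credit = Good the classes tie 2–2, so either rule (Yes or No) gives the same two errors:

```python
from collections import Counter

# Training rows: Age, Job, House, Credit, LoanApproved.
TABLE = """Young False No Fair No
Young False No Good No
Young True No Good Yes
Young True Yes Fair Yes
Young False No Fair No
Middle False No Fair No
Middle False No Good No
Middle True Yes Good Yes
Middle False Yes Excellent Yes
Middle False Yes Excellent Yes
Old False Yes Excellent Yes
Old False Yes Good Yes
Old True No Good Yes
Old True No Excellent Yes
Old False No Fair No"""
DATA = [tuple(line.split()) for line in TABLE.splitlines()]

# Keep only the House = No branch (the right subtree): 9 rows remain.
subset = [row for row in DATA if row[2] == "No"]

def total_error(data, col):
    """Total errors of the majority-class rules for one attribute column."""
    err = 0
    for value in {row[col] for row in data}:
        labels = Counter(row[-1] for row in data if row[col] == value)
        err += sum(labels.values()) - labels.most_common(1)[0][1]
    return err

# House itself is used up; only Age, Job, and Credit remain as candidates.
for name, col in [("Age", 0), ("Job", 1), ("Credit", 3)]:
    print(f"{name}: {total_error(subset, col)}/{len(subset)}")
# Age: 2/9, Job: 0/9, Credit: 2/9 -> Job is the best split
```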
From the table above, Job has the lowest total error (0/9), so Job is chosen as the splitting attribute. When Job is False the answer is No, since that rule produces zero errors; similarly, when Job is True the answer is Yes, also with zero errors.
The final decision tree for the given Loan Approval data set is:

    House
    ├── Yes → Yes
    └── No  → Job
              ├── True  → Yes
              └── False → No

Also, from the above decision tree, the prediction for the new example is:
| Age | Job | House | Credit | Loan Approved |
| --- | --- | --- | --- | --- |
| Young | False | No | Good | No |
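The finished tree is small enough to hard-code directly; `predict` is our own helper name, not something from the tutorial:

```python
# Final tree: test House at the root; on the House = No branch, test Job.
def predict(age, job, house, credit):
    """Classify one example (string-encoded fields) with the hand-built tree."""
    if house == "Yes":
        return "Yes"
    return "Yes" if job == "True" else "No"

print(predict("Young", "False", "No", "Good"))  # the query row -> prints No

# Sanity check: the tree reproduces every one of the 15 training labels.
TABLE = """Young False No Fair No
Young False No Good No
Young True No Good Yes
Young True Yes Fair Yes
Young False No Fair No
Middle False No Fair No
Middle False No Good No
Middle True Yes Good Yes
Middle False Yes Excellent Yes
Middle False Yes Excellent Yes
Old False Yes Excellent Yes
Old False Yes Good Yes
Old True No Good Yes
Old True No Excellent Yes
Old False No Fair No"""
DATA = [tuple(line.split()) for line in TABLE.splitlines()]
assert all(predict(*row[:4]) == row[4] for row in DATA)
```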
Summary:
In this tutorial, we understood how to apply the Classification And Regression Trees (CART) decision tree algorithm (Solved Example 2) to construct the optimal decision tree for the Loan Approval data set and to predict the class label for a new example.