Decision Tree using CART algorithm Solved Example 3
In this tutorial, we will see how to apply the Classification And Regression Trees (CART) decision tree algorithm (Solved Example 3) to construct a decision tree for the given data set with the attributes City Size, Avg. Income, Local Investors, and LOHAS Awareness, and then predict the class label for a new example.
| City Size | Avg. Income | Local Investors | LOHAS Awareness | Decision |
|-----------|-------------|-----------------|-----------------|----------|
| Big | High | Yes | High | Yes |
| Medium | Medium | No | Med | No |
| Small | Low | Yes | Low | No |
| Big | High | No | High | Yes |
| Small | Medium | Yes | High | No |
| Medium | High | Yes | Med | Yes |
| Medium | Medium | Yes | Med | No |
| Big | Medium | No | Med | No |
| Medium | High | Yes | Low | No |
| Small | High | No | High | Yes |
| Small | Medium | No | High | No |
| Medium | High | No | Med | No |
| City Size | Avg. Income | Local Investors | LOHAS Awareness | Decision |
|-----------|-------------|-----------------|-----------------|----------|
| Medium | Medium | No | Med | ? |
Solution:
First, we need to determine the root node of the tree.

Start with any attribute, in this case City Size. It can take three values: Big, Medium, and Small.

Begin with the value Big. There are three instances where City Size is Big. In one of the three instances the decision was No, and in the other two it was Yes.

Thus, if the decision rule were City Size: Big -> Yes, two out of three decisions would be correct while one would be incorrect: one error out of three instances. This is recorded in the first row of the table below. Similarly, we write the rules for the remaining values of the City Size attribute.
City Size Attribute

| Value | Count | Decision | Count |
|-------|-------|----------|-------|
| Big | 3 | Yes | 2 |
| | | No | 1 |
| Medium | 5 | Yes | 1 |
| | | No | 4 |
| Small | 4 | Yes | 1 |
| | | No | 3 |
Rules, individual error, and total error for the City Size attribute

| Attribute | Rules | Error | Total Error |
|-----------|-------|-------|-------------|
| City Size | Big -> Yes | 1/3 | 3/12 |
| | Medium -> No | 1/5 | |
| | Small -> No | 1/4 | |
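This rule-and-error counting is easy to automate. The sketch below (Python; the tuple encoding of the data set and the helper name `attribute_rules` are my own, not from the original post) recomputes the City Size rules and total error:

```python
from collections import Counter, defaultdict
from fractions import Fraction

# The 12 training examples from the table above:
# (City Size, Avg. Income, Local Investors, LOHAS Awareness, Decision)
data = [
    ("Big",    "High",   "Yes", "High", "Yes"),
    ("Medium", "Medium", "No",  "Med",  "No"),
    ("Small",  "Low",    "Yes", "Low",  "No"),
    ("Big",    "High",   "No",  "High", "Yes"),
    ("Small",  "Medium", "Yes", "High", "No"),
    ("Medium", "High",   "Yes", "Med",  "Yes"),
    ("Medium", "Medium", "Yes", "Med",  "No"),
    ("Big",    "Medium", "No",  "Med",  "No"),
    ("Medium", "High",   "Yes", "Low",  "No"),
    ("Small",  "High",   "No",  "High", "Yes"),
    ("Small",  "Medium", "No",  "High", "No"),
    ("Medium", "High",   "No",  "Med",  "No"),
]

def attribute_rules(rows, col):
    """For each value of column `col`, take the majority decision as the rule
    and count the rows that disagree.  Returns ({value: (rule, error)}, total)."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[col]][row[-1]] += 1
    rules, wrong = {}, 0
    for value, counts in by_value.items():
        majority, hits = counts.most_common(1)[0]
        errors = sum(counts.values()) - hits
        rules[value] = (majority, Fraction(errors, sum(counts.values())))
        wrong += errors
    return rules, Fraction(wrong, len(rows))

rules, total = attribute_rules(data, 0)  # column 0 = City Size
print(rules)   # Big -> Yes (error 1/3), Medium -> No (1/5), Small -> No (1/4)
print(total)   # 3 errors out of 12 examples
```

Running `attribute_rules` with columns 1, 2, and 3 reproduces the tables for the other three attributes below.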
Average Income Attribute

| Value | Count | Decision | Count |
|-------|-------|----------|-------|
| High | 6 | Yes | 4 |
| | | No | 2 |
| Medium | 5 | Yes | 0 |
| | | No | 5 |
| Low | 1 | Yes | 0 |
| | | No | 1 |
Rules, individual error, and total error for the Average Income attribute

| Attribute | Rules | Error | Total Error |
|-----------|-------|-------|-------------|
| Average Income | High -> Yes | 2/6 | 2/12 |
| | Medium -> No | 0/5 | |
| | Low -> No | 0/1 | |
Local Investors Attribute

| Value | Count | Decision | Count |
|-------|-------|----------|-------|
| Yes | 6 | Yes | 2 |
| | | No | 4 |
| No | 6 | Yes | 2 |
| | | No | 4 |
Rules, individual error, and total error for the Local Investors attribute

| Attribute | Rules | Error | Total Error |
|-----------|-------|-------|-------------|
| Local Investors | Yes -> No | 2/6 | 4/12 |
| | No -> No | 2/6 | |
LOHAS Awareness Attribute

| Value | Count | Decision | Count |
|-------|-------|----------|-------|
| High | 5 | Yes | 3 |
| | | No | 2 |
| Med | 5 | Yes | 1 |
| | | No | 4 |
| Low | 2 | Yes | 0 |
| | | No | 2 |
Rules, individual error, and total error for the LOHAS Awareness attribute

| Attribute | Rules | Error | Total Error |
|-----------|-------|-------|-------------|
| LOHAS Awareness | High -> Yes | 2/5 | 3/12 |
| | Med -> No | 1/5 | |
| | Low -> No | 0/2 | |
The consolidated rules, errors for individual attribute values, and total error of each attribute are given below.

| Attribute | Rules | Error | Total Error |
|-----------|-------|-------|-------------|
| City Size | Big -> Yes | 1/3 | 3/12 |
| | Medium -> No | 1/5 | |
| | Small -> No | 1/4 | |
| Avg. Income | High -> Yes | 2/6 | 2/12 |
| | Medium -> No | 0/5 | |
| | Low -> No | 0/1 | |
| Local Investors | Yes -> No | 2/6 | 4/12 |
| | No -> No | 2/6 | |
| LOHAS Awareness | High -> Yes | 2/5 | 3/12 |
| | Med -> No | 1/5 | |
| | Low -> No | 0/2 | |
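Choosing the root is then just a minimum over the total errors. A minimal sketch (the attribute names are plain strings and the totals are copied from the table above):

```python
from fractions import Fraction

# Total error for each candidate root attribute
# (errors summed over the attribute's values, out of 12 examples).
totals = {
    "City Size":       Fraction(3, 12),
    "Avg. Income":     Fraction(2, 12),
    "Local Investors": Fraction(4, 12),
    "LOHAS Awareness": Fraction(3, 12),
}

# The attribute with the fewest misclassified examples becomes the root.
root = min(totals, key=totals.get)
print(root)  # prints "Avg. Income"
```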
From the table above, we can see that the attribute Average Income has the minimum total error, 2/12 (2 errors out of 12 examples).

Now we build the tree with Average Income as the root node, with one branch for each possible value of the attribute. The rules Medium -> No and Low -> No generate zero error, so the Medium and Low branches terminate in the leaf No. Only the High branch remains unresolved; for it we take the subset of examples with Average Income = High and continue building the tree. The tree with Average Income as the root node is shown below.
Now, for the middle subtree, we write all possible rules for the remaining attributes and find the total error. Based on the total-error table, we construct the subtree.

Middle subtree with Average Income -> High:
The consolidated rules, errors for individual attribute values, and total error of each attribute on this subset are given below.

| Attribute | Rules | Error | Total Error |
|-----------|-------|-------|-------------|
| City Size | Big -> Yes | 0/2 | 1/6 |
| | Medium -> No | 1/3 | |
| | Small -> Yes | 0/1 | |
| Local Investors | Yes -> Yes | 1/3 | 2/6 |
| | No -> Yes | 1/3 | |
| LOHAS Awareness | High -> Yes | 0/3 | 1/6 |
| | Med -> Yes/No | 1/2 | |
| | Low -> No | 0/1 | |
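The same error count can be rerun on just the six Average Income = High rows. A sketch (the column layout and helper name `total_error` are my own):

```python
from collections import Counter, defaultdict
from fractions import Fraction

# The six training rows with Avg. Income = High:
# (City Size, Local Investors, LOHAS Awareness, Decision)
subset = [
    ("Big",    "Yes", "High", "Yes"),
    ("Big",    "No",  "High", "Yes"),
    ("Medium", "Yes", "Med",  "Yes"),
    ("Medium", "Yes", "Low",  "No"),
    ("Small",  "No",  "High", "Yes"),
    ("Medium", "No",  "Med",  "No"),
]

def total_error(rows, col):
    """Sum, over each value of column `col`, the rows that disagree
    with that value's majority decision."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[col]][row[-1]] += 1
    wrong = sum(sum(c.values()) - c.most_common(1)[0][1]
                for c in by_value.values())
    return Fraction(wrong, len(rows))

for name, col in [("City Size", 0), ("Local Investors", 1),
                  ("LOHAS Awareness", 2)]:
    print(name, total_error(subset, col))  # 1/6, 2/6, 1/6 respectively
```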
From the table above, City Size and LOHAS Awareness have the same lowest total error, 1/6. Both attributes also have two rules with zero error, and each has exactly one rule with an error, so the tie persists. To break it, we compare how many examples each attribute's erroneous rule leaves unresolved: splitting on City Size leaves 3 examples (the Medium branch), while splitting on LOHAS Awareness leaves only 2 examples (the Med branch). Hence we choose LOHAS Awareness as the splitting attribute. It has three values: High, Med, and Low; High and Low generate no error, so those branches terminate in the leaves Yes and No respectively.
The tree with Lohas Awareness as splitting attribute is shown below,
Middle subtree with LOHAS Awareness -> Med:
The consolidated rules, errors for individual attribute values, and total error of each attribute on this two-example subset are given below.

| Attribute | Rules | Error | Total Error |
|-----------|-------|-------|-------------|
| City Size | Medium -> Yes/No | 1/2 | 1/2 |
| Local Investors | Yes -> Yes | 0/1 | 0/2 |
| | No -> No | 0/1 | |
From the table above, Local Investors has the lowest total error, 0/2, so we choose Local Investors as the splitting attribute. Both of its rules generate zero error.
The final decision tree for the given data set is,
From the decision tree, the prediction for the new example follows immediately: its Avg. Income is Medium, so the root's Medium -> No branch yields the class label No.

| City Size | Avg. Income | Local Investors | LOHAS Awareness | Decision |
|-----------|-------------|-----------------|-----------------|----------|
| Medium | Medium | No | Med | No |
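The finished tree can also be written down as nested dictionaries and used for prediction; a sketch (the dict encoding and the helper name `predict` are my own):

```python
# The final tree from the worked example as nested dicts:
# an inner dict maps an attribute to its branches; leaves are class labels.
tree = {
    "Avg. Income": {
        "Medium": "No",
        "Low": "No",
        "High": {
            "LOHAS Awareness": {
                "High": "Yes",
                "Low": "No",
                "Med": {
                    "Local Investors": {"Yes": "Yes", "No": "No"},
                },
            },
        },
    },
}

def predict(tree, example):
    """Walk the nested-dict tree until a leaf (a plain string) is reached."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[example[attribute]]
    return tree

new_example = {"City Size": "Medium", "Avg. Income": "Medium",
               "Local Investors": "No", "LOHAS Awareness": "Med"}
print(predict(tree, new_example))  # prints "No"
```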
Summary:
In this tutorial, we saw how to apply the Classification And Regression Trees (CART) decision tree algorithm (Solved Example 3) to construct a decision tree for the given data set with the attributes City Size, Avg. Income, Local Investors, and LOHAS Awareness, and used the tree to predict the class label for a new example.