How to Install and Configure Hadoop in Ubuntu

 


This article shows how to install and configure Hadoop on the Ubuntu operating system in about 20 minutes. It is a shortcut method: follow the simple steps below, in order, to get a working installation.


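1. First, install the Ubuntu 16.04 operating system, then download the Hadoop release tar file from the Apache website. Download the binary distribution, not the -src archive, because the steps below run Hadoop directly from the extracted folder.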

2. Install nautilus. Open a terminal (shortcut: Ctrl+Alt+T), type the following command, and press Enter,

sudo apt-get install nautilus

3. Type the following command to open nautilus with root privileges,

sudo nautilus

4. Copy the downloaded Hadoop tar file into the /usr/local directory and extract it there. (Note: in nautilus, the /usr/local directory is located under the “Computer” drive.) Rename the extracted folder to hadoop so that it matches the /usr/local/hadoop path used in the configuration steps.
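If you prefer the terminal to nautilus, the copy-and-extract step can be sketched as a small shell function. This is only a sketch: the function name install_hadoop_tarball is made up for illustration, and it assumes the archive unpacks into a single top-level folder whose name starts with hadoop-. The folder is renamed to hadoop because the later configuration steps refer to /usr/local/hadoop.

```shell
# Sketch: extract a Hadoop archive into a target directory and rename the
# unpacked folder to "hadoop". The function name is hypothetical.
install_hadoop_tarball() {
    archive="$1"    # path to the downloaded .tar.gz file
    dest="$2"       # target directory, e.g. /usr/local

    # Refuse to overwrite an existing installation.
    if [ -e "$dest/hadoop" ]; then
        echo "$dest/hadoop already exists" >&2
        return 1
    fi

    # Unpack the archive inside the target directory.
    tar -xzf "$archive" -C "$dest" || return 1

    # Assumption: the archive contains one top-level folder hadoop-<version>.
    for dir in "$dest"/hadoop-*/; do
        [ -d "$dir" ] || continue
        mv "${dir%/}" "$dest/hadoop"
        break
    done
}

# Usage (the archive path is a placeholder for your own download):
# install_hadoop_tarball /path/to/downloaded-hadoop.tar.gz /usr/local
```

Because /usr/local is owned by root, the function has to run from a root shell (for example after sudo -s); alternatively, type the same tar and mv commands by hand with sudo in front.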

5. Now, open the terminal (you can use shortcut Ctrl+Alt+T to open terminal) and type the following commands. Enter the password and press “Y” if asked.

sudo apt-get update
sudo apt install default-jdk
sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoopusr
sudo adduser hadoopusr sudo
sudo apt install openssh-server

Note: While creating the user, you will be prompted with “Enter the new value, or press ENTER for the default”. Just press Enter to leave the name, number, and phone fields blank, then press “y” to confirm.

6. Once the hadoopusr user is created successfully, switch to it by typing the following command and entering the password you set for hadoopusr.

su hadoopusr

7. Now run the following commands to generate an RSA key pair and authorize the public key. Press Enter to accept the default key file location; the empty -P "" argument sets a blank passphrase.

ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh localhost

Once ssh localhost has logged you in, type “exit” to leave the ssh shell.
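Note that the cat command with >> appends the public key every time it is run, so repeating step 7 leaves duplicate lines in authorized_keys. A minimal sketch of a repeat-safe variant is shown below. The function name authorize_key is hypothetical, and the chmod calls are an addition: sshd may ignore key files whose permissions are too open.

```shell
# Sketch: append a public key to authorized_keys only if it is not
# already listed. The function name is hypothetical.
authorize_key() {
    pub_key="$1"      # e.g. $HOME/.ssh/id_rsa.pub
    auth_file="$2"    # e.g. $HOME/.ssh/authorized_keys

    # Make sure the .ssh directory and the target file exist.
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"

    # -x: match whole lines, -F: fixed strings, -f: read patterns from file.
    if ! grep -qxFf "$pub_key" "$auth_file"; then
        cat "$pub_key" >> "$auth_file"
    fi

    # Restrict permissions to the owner.
    chmod 700 "$(dirname "$auth_file")"
    chmod 600 "$auth_file"
}

# Usage:
# authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
```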

8. Now run the following commands in the terminal to create the local directories for the namenode and datanode.

sudo mkdir -p /usr/local/hadoop_tmp
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode
sudo chown -R hadoopusr /usr/local/

Now we configure the Hadoop installation

1. Type the following command in terminal to open the .bashrc file,

sudo gedit ~/.bashrc

Once the .bashrc file is open, paste the content below at the end of the file, then save the file (Ctrl+S in gedit) and close the editor.

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

Type the following command and press Enter to load the updated .bashrc file into the current shell.

source ~/.bashrc

2. Type the following command to open hadoop-env.sh

sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Locate the export JAVA_HOME line and set it as follows. (Note: check the installed version in /usr/lib/jvm/ before you set the Java path.)

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
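One way to check which JDK folder is actually installed is to follow the javac symlink and strip the trailing /bin/javac. The sketch below assumes a JDK is on the PATH and that readlink -f is available (it is on Ubuntu); the function name java_home_from_javac is hypothetical.

```shell
# Sketch: derive a JAVA_HOME candidate from the resolved javac path.
# The function name is hypothetical.
java_home_from_javac() {
    javac_path="$1"                      # e.g. /usr/lib/jvm/<jdk>/bin/javac
    dirname "$(dirname "$javac_path")"   # strip the trailing /bin/javac
}

# Resolve the symlink chain behind javac, if a JDK is installed,
# and print the folder to use as JAVA_HOME.
if command -v javac >/dev/null 2>&1; then
    java_home_from_javac "$(readlink -f "$(command -v javac)")"
fi
```

If the printed folder differs from /usr/lib/jvm/java-8-openjdk-amd64, use the printed value in both .bashrc and hadoop-env.sh.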

3. Type the following command to open core-site.xml

sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml

and put the following content within <configuration> and </configuration>

<property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
</property>
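To make the placement concrete, here is a sketch that writes a complete core-site.xml with the property nested inside the configuration element. The function name write_core_site is hypothetical, and the function overwrites the target file, so any comments in the shipped file are lost; treat it as an illustration of the final file layout rather than a required step.

```shell
# Sketch: write a minimal core-site.xml containing the property above.
# The function name is hypothetical; the target file is overwritten.
write_core_site() {
    target="$1"    # path of the core-site.xml file to (over)write
    cat > "$target" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF
}

# Usage:
# write_core_site /usr/local/hadoop/etc/hadoop/core-site.xml
```

The other *-site.xml files in the following steps use the same layout: every property block goes between the opening and closing configuration tags.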

4. Type the following command to open hdfs-site.xml

sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml

and put the following content within <configuration> and </configuration>,

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
</property>
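The paths in these two properties must point to directories that really exist on disk, otherwise the namenode or datanode may fail to start. A rough sanity check can be sketched as follows; both function names are hypothetical, and the sed pattern assumes each value sits on one line in the file:/path form used above.

```shell
# Sketch: list the local directories named in hdfs-site.xml.
# Function names are hypothetical.
hdfs_local_dirs() {
    # Print the path part of every <value>file:/...</value> line.
    sed -n 's|.*<value>file:\(/[^<]*\)</value>.*|\1|p' "$1"
}

# Report each directory as ok or missing; return non-zero if any is missing.
check_hdfs_dirs() {
    missing=0
    while IFS= read -r dir; do
        [ -n "$dir" ] || continue
        if [ -d "$dir" ]; then
            echo "ok: $dir"
        else
            echo "missing: $dir"
            missing=1
        fi
    done <<EOF
$(hdfs_local_dirs "$1")
EOF
    return "$missing"
}

# Usage:
# check_hdfs_dirs /usr/local/hadoop/etc/hadoop/hdfs-site.xml
```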

5. Type the following command to open yarn-site.xml

sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml

and put the following content within <configuration> and </configuration>,

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

6. Type the following commands to create mapred-site.xml from its template and open it

sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml

and put the following content within <configuration> and </configuration>,

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

Testing the Hadoop Installation

Now test the installation by starting Hadoop. Type the following commands, pressing Enter after each one. Note: you are currently logged in as hadoopusr. Whenever you want to run Hadoop, first switch from your regular user to hadoopusr (su hadoopusr).

hdfs namenode -format
start-all.sh
jps
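After start-all.sh on a single-node setup like this one, jps should normally list five Hadoop daemons: NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager. The sketch below reads jps output on standard input and prints any daemon that is missing; the function name missing_daemons is hypothetical.

```shell
# Sketch: print the Hadoop daemons that do NOT appear in jps output.
# Reads the jps output from standard input. The function name is hypothetical.
missing_daemons() {
    jps_output="$(cat)"
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; compare the name field exactly so that
        # SecondaryNameNode is not mistaken for NameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Usage:
# jps | missing_daemons
```

No output means all five daemons are running. If a name is printed, the log files under the logs folder of the Hadoop installation are the usual place to look for the reason.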

Finally, open a browser and go to http://localhost:8088 to see the Hadoop graphical user interface (this port serves the YARN ResourceManager page).

Stop Hadoop

stop-all.sh

Enjoy Hadoop!

If you want to restart the HDFS cluster from a clean state, first remove the temporary datanode and namenode directories, then format the namenode and restart all the services with the commands below. Be aware that this erases everything stored in HDFS.

sudo rm -r /usr/local/hadoop_tmp/hdfs/
hdfs namenode -format
start-all.sh
jps

This article covered the step-by-step procedure to install and configure Hadoop in Ubuntu.
