How to Install and Configure Hadoop on Ubuntu: Step-by-Step Procedure (Shortcut Method)
This article explains, step by step, how to install and configure Hadoop on Ubuntu in just 20 minutes using a shortcut method. Follow the simple steps below to get a working installation.
Video Tutorial: Step-by-Step Procedure to Install and Configure Hadoop on Ubuntu
1. First, install the Ubuntu 16.04 operating system.
2. Download the Hadoop release tarball from the Apache Hadoop website (https://hadoop.apache.org).
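If you prefer the command line, the tarball can also be fetched directly with wget. The version number and archive URL below are assumptions; substitute the release you actually want.
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz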
3. Install nautilus. To install nautilus, open the terminal (or press Ctrl+Alt+T), type the following command, and hit Enter:
sudo apt-get install nautilus
4. Type the following command to launch nautilus with root privileges:
sudo nautilus
5. Copy the downloaded Hadoop tar file into the /usr/local directory, then extract the tar file there. (Note: in nautilus, /usr/local is found under the "Computer" location.) The equivalent terminal commands are shown below.
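If you prefer the terminal over nautilus, something like the following does the same job. The tarball filename is an assumption carried over from the download step; the final rename matters because the later configuration assumes Hadoop lives at /usr/local/hadoop.
sudo cp ~/Downloads/hadoop-2.7.3.tar.gz /usr/local
cd /usr/local
sudo tar -xzf hadoop-2.7.3.tar.gz
sudo mv hadoop-2.7.3 hadoop   # rename so HADOOP_HOME=/usr/local/hadoop (set later) is correct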
6. Now, open the terminal (you can use the shortcut Ctrl+Alt+T) and type the following commands one at a time. Enter your password and press "Y" when asked.
sudo apt-get update
sudo apt install default-jdk
sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoopusr
sudo adduser hadoopusr sudo
sudo apt install openssh-server
Note: while creating the user, adduser will prompt "Enter the new value or press ENTER for the default" for the full name, room number, phone numbers, and so on. Just press Enter to leave them blank, then press "Y" to confirm.
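Before moving on, you can confirm that the JDK and compiler installed correctly:
java -version
javac -version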
7. Once the hadoopusr user has been created successfully, switch to it by typing the following command and entering its password.
su hadoopusr
8. Now run the following commands to generate an RSA key pair, authorize the key, and test the ssh login. Press Enter to leave the passphrase blank when asked.
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh localhost
After the login succeeds, type "exit" to leave the ssh session.
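As an optional check, the following one-shot command should run without prompting for a password if the key setup worked:
ssh localhost "echo passwordless ssh is working"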
9. Now run the following commands in the terminal to create local directories for the namenode and datanode.
sudo mkdir -p /usr/local/hadoop_space
sudo mkdir -p /usr/local/hadoop_space/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_space/hdfs/datanode
sudo chown -R hadoopusr /usr/local/hadoop /usr/local/hadoop_space
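To confirm that hadoopusr now owns the new directories:
ls -ld /usr/local/hadoop_space/hdfs/namenode /usr/local/hadoop_space/hdfs/datanode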
Now we configure the Hadoop installation
1. Type the following command in the terminal to open the .bashrc file (sudo is not needed to edit your own .bashrc):
gedit ~/.bashrc
Once the .bashrc file is opened, paste the content below at the end of the file, then save it (Ctrl+S in gedit) and close the editor.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Type the following command and press Enter to reload the updated .bashrc so the new variables take effect.
source ~/.bashrc
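To verify that the environment is set up correctly, these commands should print the paths exported above and the installed Hadoop version:
echo $JAVA_HOME
echo $HADOOP_HOME
hadoop version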
2. Type the following command to open hadoop-env.sh
sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh
then locate the export JAVA_HOME line and set it as follows. (Note: check which JDK version is actually installed under /usr/lib/jvm/ before setting the path; two quick checks are shown after the export line.)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
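Either of the following reveals the JDK directory present on your machine:
ls /usr/lib/jvm/
readlink -f "$(which java)"   # prints the full path of the java binary currently in use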
3. Type the following command to open core-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
and put the following content within <configuration> and </configuration>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
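For reference, fs.default.name is the deprecated alias of fs.defaultFS; both work in Hadoop 2.x. Once the file is saved, you can confirm that Hadoop picks the value up:
hdfs getconf -confKey fs.defaultFS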
4. Type the following command to open hdfs-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
and put the following content within <configuration> and </configuration>. (Note: the directory paths must match the namenode and datanode directories created in step 9.)
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop_space/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop_space/hdfs/datanode</value>
</property>
5. Type the following command to open yarn-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml
and put the following content,
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
6. mapred-site.xml does not exist by default, so type the following commands to create it from the bundled template and open it
sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
and put the following content,
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
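Before starting the daemons, it can save time to check all four edited files for XML syntax errors. This assumes xmllint is available (on Ubuntu it comes in the libxml2-utils package):
cd /usr/local/hadoop/etc/hadoop
xmllint --noout core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml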
Testing the Hadoop Installation
Now, to test whether the installation works, run Hadoop by typing the following commands and hitting Enter. Note: you should still be logged in as the hadoopusr user; whenever you want to run Hadoop later, switch from your regular user to hadoopusr first.
hdfs namenode -format
start-all.sh
jps
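On a working single-node setup, jps should list the NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and Jps processes. As a further smoke test, you can try a simple HDFS operation:
hdfs dfs -mkdir -p /user/hadoopusr
hdfs dfs -ls /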
Finally, open the browser and go to http://localhost:8088 to reach the YARN ResourceManager web UI. (In Hadoop 2.x, the NameNode web UI is available at http://localhost:50070.)
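On a headless machine, the same check can be done from the terminal, assuming curl is installed; a response code of 200 means the UI is up:
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088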
Stop Hadoop
stop-all.sh
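Note that stop-all.sh is deprecated in Hadoop 2.x; the HDFS and YARN layers can also be stopped separately:
stop-dfs.sh
stop-yarn.sh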
Enjoy Hadoop!
Now, if you want to restart the HDFS cluster from scratch, first remove the temporary datanode and namenode directories, then reformat the namenode and restart all the services. (Warning: this erases everything stored in HDFS.) Use the following commands.
sudo rm -r /usr/local/hadoop_space/hdfs/
hdfs namenode -format
start-all.sh
jps