Sunday, November 4, 2012

Hadoop Installation


Install Java :

Download JDK 1.6 and extract it to /opt/java/jdk1.6.0_29 by running ./jdk-6u29-linux-i586.bin. To run java from any location, create a symbolic link to it in /usr/bin as follows:
cd /usr/bin
sudo ln -s /opt/java/jdk1.6.0_29/bin/java java
This ensures that Java can be run from any folder. Next, we want to set JAVA_HOME as an environment variable. Log in as root, open /etc/environment in gedit, and append: export JAVA_HOME="/opt/java/jdk1.6.0_29"
Log out and log back in for the value to take effect.
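The JAVA_HOME step above can be sketched as follows; a temp file stands in for /etc/environment so the snippet runs without root, and the JDK path is the example install location from this guide:

```shell
# Sketch of the JAVA_HOME setup. A temp file stands in for /etc/environment
# so this can run without root; the path is this guide's example install dir.
ENV_FILE=$(mktemp)
echo 'export JAVA_HOME="/opt/java/jdk1.6.0_29"' >> "$ENV_FILE"

# Source the file and confirm the variable is visible (on a real system,
# logging out and back in has the same effect):
. "$ENV_FILE"
echo "$JAVA_HOME"    # → /opt/java/jdk1.6.0_29
```

After editing the real /etc/environment, echo $JAVA_HOME in a fresh login shell should print the JDK path, and java -version should report 1.6.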

Setup SSH :

Check whether sshd is running by typing pgrep sshd. If you get empty output, sshd is not running.
Log in as root and start sshd by typing: /etc/init.d/sshd start
To automatically start sshd on next reboot, type:  chkconfig sshd on
To generate ssh key, type : ssh-keygen -t rsa -P "" 
Next, append the generated public key to ssh's authorized_keys file:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
(Note: if you have data nodes on different machines, you will need to copy id_rsa.pub (using scp) to each data node's authorized_keys file.)
Next, type ssh localhost and answer 'yes' to 'Are you sure you want to continue connecting?'
Running ssh localhost should not prompt for a password. If it does, change the permissions of the ~/.ssh/authorized_keys file to 600; otherwise, you will be prompted for a password when starting the Hadoop scripts.
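The permission fix can be checked like this; a temp file stands in for ~/.ssh/authorized_keys so the snippet is safe to run anywhere:

```shell
# sshd ignores an authorized_keys file that is readable or writable by anyone
# but its owner, which forces password prompts. Mode 600 fixes that.
# A temp file stands in for ~/.ssh/authorized_keys in this sketch.
KEYS=$(mktemp)
chmod 600 "$KEYS"
stat -c '%a' "$KEYS"    # → 600 (owner read/write only)
```

Also make sure the ~/.ssh directory itself is mode 700, since sshd rejects keys under a group- or world-writable directory.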

Hadoop Installation :

Download hadoop-0.20.2 from the Apache site and extract it to /opt/hadoop-0.20.2. Edit the following configuration files in the /opt/hadoop-0.20.2/conf folder:
core-site.xml :
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
hdfs-site.xml (I added dfs.name.dir and dfs.data.dir because the contents of the default tmp location get erased after a reboot):
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/user/hadoopFiles/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/user/hadoopFiles/dfs/data</value>
  </property>
mapred-site.xml :
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
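Note that the snippets above show only the <property> elements; each conf file must wrap them in a single <configuration> root element, or Hadoop will fail to parse it. For example, the complete core-site.xml from above would look like:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

You will likely also need to set JAVA_HOME in conf/hadoop-env.sh (uncomment its export JAVA_HOME line and point it at /opt/java/jdk1.6.0_29), since the start scripts do not always inherit the login environment.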
Next, format the NameNode from /opt/hadoop-0.20.2 by running:
bin/hadoop namenode -format
To start Hadoop, go to /opt/hadoop-0.20.2 and run bin/start-all.sh
To check whether the Hadoop cluster is running, open a browser and check the NameNode at http://localhost:50070/ and the JobTracker at http://localhost:50030/
  
To shut down the Hadoop cluster, run: bin/stop-all.sh
