Hadoop Cluster set-up document

Running Hadoop on RHEL Linux (Multi-Node Cluster)
 
Here we'll see how to set up a multi-node Apache Hadoop cluster backed by the Hadoop Distributed File System (HDFS), running on RHEL Linux.
 
Hadoop is a framework written in Java for running applications on large clusters of commodity hardware and incorporates features similar to those of the Google File System (GFS) and of the MapReduce computing paradigm. Hadoop’s HDFS is a highly fault-tolerant distributed file system and, like Hadoop in general, designed to be deployed on low-cost hardware. It provides high throughput access to application data and is suitable for applications that have large data sets.
 
In a previous tutorial, I described how to set up a Hadoop single-node cluster on an RHEL box. The main goal of this tutorial is to get a more sophisticated Hadoop installation up and running, namely building a multi-node cluster using the two RHEL boxes below.
 
slave-host
master-host
 
This tutorial has been tested with the following software versions:
 
Red Hat Enterprise Linux Server release 6.4 (Santiago)
Hadoop 1.2.1
 
 
Prerequisites
 
Configuring two single-node clusters first
 
Refer to my previous Hadoop single-node set-up document on www.dbversity.com, in the Hadoop category.
 
It is recommended that you use the same settings (e.g., installation locations and paths) on both machines; otherwise you might run into problems later when we migrate the two machines to the final multi-node cluster setup.
Just keep in mind when setting up the single-node clusters that we will later connect and “merge” the two machines, so pick reasonable network settings etc. now for a smooth transition later.
 
Now that you have two single-node clusters up and running, we will modify the Hadoop configuration to make one RHEL box the “master” (which will also act as a slave) and the other RHEL box a “slave”.
 
Note: We will call the designated master machine just the "master" from now on and the slave-only machine the "slave".
We will also give the two machines these respective hostnames in their networking setup, most notably in /etc/hosts.
Shut down each single-node cluster with bin/stop-all.sh before continuing, if you haven't done so already.
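For example (a sketch; the installation path /root/hadoop-1.2.1 matches the prompts used later in this document, so adjust it to your own layout):

cd /root/hadoop-1.2.1     # your Hadoop installation directory
bin/stop-all.sh           # stops the NameNode, DataNode, JobTracker and TaskTracker of the single-node cluster

Run the same on the other box.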
 
Update /etc/hosts on both machines with the following lines:
 
At Master Box
 
[root@master-host hadoop-1.2.1]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
 
10.20.30.11 master-host.nam.nsroot.net master-host
10.20.30.22 slave-host.nam.nsroot.net slave-host
 
10.20.30.11 master
10.20.30.22 slave
 
[root@master-host hadoop-1.2.1]#
 
At Slave Box
 
[root@slave-host hpadmin]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
 
10.20.30.11 master-host.nam.nsroot.net master-host
10.20.30.22 slave-host.nam.nsroot.net slave-host
 
10.20.30.11 master
10.20.30.22 slave
[root@slave-host hpadmin]#
 
 
Enable password-less SSH if you haven't done so already.
 
The hduser user on the master (aka hduser@master) must be able to connect a) to its own user account on the master – i.e. ssh master in this context and not necessarily ssh localhost – and b) to the hduser user account on the slave (aka hduser@slave) via a password-less SSH login. If you followed my single-node cluster tutorial, you just have to add the hduser@master’s public SSH key (which should be in $HOME/.ssh/id_rsa.pub) to the authorized_keys file of hduser@slave (in this user’s $HOME/.ssh/authorized_keys). You can do this manually or use the following SSH command:
 
Distribute the SSH public key of hduser@master
 
hduser@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave
This command will prompt you for the login password for user hduser on slave, then copy the public SSH key for you, creating the correct directory and fixing the permissions as necessary.
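If ssh-copy-id is not available on your box, the manual equivalent is roughly the following sketch (it assumes the key pair already exists on the master):

cat $HOME/.ssh/id_rsa.pub | ssh hduser@slave \
  "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys"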
 
The final step is to test the SSH setup by connecting with user hduser from the master to the user account hduser on the slave. This step is also needed to save the slave's host key fingerprint to hduser@master's known_hosts file.
 
So, test the connection from master to slave and from slave to master:
 
You may also refer to the SSH set-up article on dbversity.com.
 
[root@master-host hadoop-1.2.1]# ssh slave hostname
Red Hat Enterprise Linux Server release 6.4 (Santiago)
Kernel \r on an \m
 
==================== WARNING - Incomplete SOE Build ====================
 
 
If you need more help, please contact "*GT DCA Unix Support" or your
regional DCA admin team.
 
slave-host
[root@master-host hadoop-1.2.1]#
 
[root@slave-host hadoop-1.2.1]# ssh master hostname
Red Hat Enterprise Linux Server release 6.4 (Santiago)
Kernel \r on an \m
 
==================== WARNING - Incomplete SOE Build ====================
 
 
If you need more help, please contact "*GT DCA Unix Support" or your
regional DCA admin team.
 
master-host
[root@slave-host hadoop-1.2.1]#
 
 
 
 
Hadoop Cluster Overview
 
The next sections will describe how to configure one RHEL box as a master node and the other RHEL box as a slave node. The master node will also act as a slave because we only have two machines available in our cluster but still want to spread data storage and processing to multiple machines.
 
The master node will run the “master” daemons for each layer: NameNode for the HDFS storage layer, and JobTracker for the MapReduce processing layer. Both machines will run the “slave” daemons: DataNode for the HDFS layer, and TaskTracker for the MapReduce processing layer.
 
Basically, the “master” daemons are responsible for coordination and management of the “slave” daemons while the latter will do the actual data storage and data processing work.
 
Masters vs. Slaves
 
Typically one machine in the cluster is designated as the NameNode and another machine as the JobTracker, exclusively. These are the actual “master nodes”. The rest of the machines in the cluster act as both DataNode and TaskTracker.
These are the slaves or “worker nodes”.
 
 
Configuration
 
conf/masters (master only)
 
Despite its name, the conf/masters file defines on which machines Hadoop will start secondary NameNodes in our multi-node cluster. In our case, this is just the master machine. The primary NameNode and the JobTracker will always be the machines on which you run the bin/start-dfs.sh and bin/start-mapred.sh scripts, respectively (the primary NameNode and the JobTracker will be started on the same machine if you run bin/start-all.sh).
 
To start a daemon manually on a machine, run:
bin/hadoop-daemon.sh start [namenode | secondarynamenode | datanode | jobtracker | tasktracker]
Note that this does not take the conf/masters and conf/slaves files into account.
 
Again, the machine on which bin/start-dfs.sh is run will become the primary NameNode.
 
On master, update conf/masters so that it looks like this:
 
[root@master-host hadoop-1.2.1]# cat conf/masters
master
[root@master-host hadoop-1.2.1]#
 
conf/slaves (master only)

The conf/slaves file lists the hosts, one per line, where the Hadoop slave daemons (DataNodes and TaskTrackers) will be run. We want both the master box and the slave box to act as Hadoop slaves because we want both of them to store and process data.
 
On master, update conf/slaves so that it looks like this:
 
[root@master-host hadoop-1.2.1]# cat conf/slaves
master
slave
[root@master-host hadoop-1.2.1]#
 
If you have additional slave nodes, just add them to the conf/slaves file, one hostname per line.
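For example, a four-node cluster's conf/slaves could look like the sketch below (slave2 and slave3 are hypothetical hostnames, used here only for illustration):

master
slave
slave2
slave3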
 
Note: The conf/slaves file on master is used only by the scripts like bin/start-dfs.sh or bin/stop-dfs.sh. For example, if you want to add DataNodes on the fly (which is not described in this tutorial yet), you can “manually” start the DataNode daemon on a new slave machine via bin/hadoop-daemon.sh start datanode. Using the conf/slaves file on the master simply helps you to make “full” cluster restarts easier.
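A sketch of adding a DataNode on the fly, assuming the new box already has the same Hadoop installation and configuration files as the existing slaves:

# on the new slave box, from the Hadoop installation directory
bin/hadoop-daemon.sh start datanode      # starts only the DataNode daemon on this machine
bin/hadoop-daemon.sh start tasktracker   # optionally also start a TaskTracker here
# finally, add the new hostname to conf/slaves on the master so that future
# bin/start-dfs.sh / bin/stop-dfs.sh runs include this node as well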

conf/*-site.xml (all machines)
 
You must change the configuration files conf/core-site.xml, conf/mapred-site.xml and conf/hdfs-site.xml on ALL machines as follows.
 
First, we have to change the fs.default.name parameter (in conf/core-site.xml), which specifies the NameNode (the HDFS master) host and port. In our case, this is the master machine.
 
[root@master-host hadoop-1.2.1]# cat conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
<!-- Put site-specific property overrides in this file. -->
 
 
<configuration>
 <property>
 <name>fs.default.name</name>
<value>hdfs://master:9000</value>
 </property>
<!-- Override the default Hadoop tmp directory to use your own directory -->
<property>
 <name>hadoop.tmp.dir</name>
 <value>/data/2/hadoop/tmp</value>
 </property>
</configuration>
 
[root@master-host hadoop-1.2.1]#
 
[root@slave-host hadoop-1.2.1]# cat conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
<!-- Put site-specific property overrides in this file. -->
 
<configuration>
 
 <property>
 <name>fs.default.name</name>
<value>hdfs://master:9000</value>
 </property>
<!-- Override the default Hadoop tmp directory to use your own directory -->
<property>
 <name>hadoop.tmp.dir</name>
 <value>/data/2/hadoop/tmp</value>
 </property>
 
</configuration>
[root@slave-host hadoop-1.2.1]#
 
 
Second, we have to change the mapred.job.tracker parameter (in conf/mapred-site.xml), which specifies the JobTracker (MapReduce master) host and port. Again, this is the master in our case.
 
[root@master-host hadoop-1.2.1]# cat conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
<!-- Put site-specific property overrides in this file. -->
 
 
<configuration>
 <property>
 <name>mapred.job.tracker</name>
<value>master:9001</value>
 </property>
</configuration>
 
[root@master-host hadoop-1.2.1]#
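The slave box needs the same setting. Its conf/mapred-site.xml is not printed above, but it should look like this sketch:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
 <property>
  <name>mapred.job.tracker</name>
  <value>master:9001</value>
 </property>
</configuration>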
 
Third, we change the dfs.replication parameter (in conf/hdfs-site.xml) which specifies the default block replication. It defines how many machines a single file should be replicated to before it becomes available. If you set this to a value higher than the number of available slave nodes (more precisely, the number of DataNodes), you will start seeing a lot of “(Zero targets found, forbidden1.size=1)” type errors in the log files.
 
The default value of dfs.replication is 3. However, we have only two nodes available, so we set dfs.replication to 2.
 
[root@master-host hadoop-1.2.1]# cat conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
<!-- Put site-specific property overrides in this file. -->
 
 
 
<configuration>
 <property>
 <name>dfs.replication</name>
<value>2</value>
 
<description>
 
Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
 
 </description>
 </property>
</configuration>
[root@master-host hadoop-1.2.1]#
 
[root@slave-host hadoop-1.2.1]# cat conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
<!-- Put site-specific property overrides in this file. -->
 
<configuration>
 
<property>
 <name>dfs.replication</name>
<value>2</value>
 
<description>Default block replication.
 The actual number of replications can be specified when the file is created.
 The default is used if replication is not specified in create time.
 
 </description>
 
 </property>
 
</configuration>
[root@slave-host hadoop-1.2.1]#
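As the description above notes, the replication factor can also be changed for individual files after they are written, using the standard hadoop fs -setrep command; a sketch (the path /user/root/some-file is hypothetical):

bin/hadoop fs -setrep -w 1 /user/root/some-file   # lower this file's replication factor to 1 and wait for it to take effect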
 
We can make Hadoop prefer IPv4 (effectively bypassing IPv6) through hadoop-env.sh, as below.
[root@master-host conf]# grep "export" hadoop-env.sh | grep -v "#"
export JAVA_HOME=/usr/java/jdk1.8.0_05
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
[root@master-host conf]#
Additional Settings [ Optional ]
There are some other configuration options worth studying. The following information is taken from the Hadoop API Overview.
 
In file conf/mapred-site.xml:
 
"mapred.local.dir"
Determines where temporary MapReduce data is written. It may also be a list of directories.
"mapred.map.tasks"
As a rule of thumb, use 10x the number of slaves (i.e., the number of TaskTrackers).
"mapred.reduce.tasks"
As a rule of thumb, use num_tasktrackers * num_reduce_slots_per_tasktracker * 0.99. If num_tasktrackers is small (as in the case of this tutorial), use (num_tasktrackers - 1) * num_reduce_slots_per_tasktracker.
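For our two-box cluster (two TaskTrackers, each with the Hadoop 1.x default of two reduce slots), these rules of thumb give roughly mapred.map.tasks = 20 and mapred.reduce.tasks = (2 - 1) * 2 = 2. A sketch of how such overrides would look in conf/mapred-site.xml (the values and the directory /data/2/hadoop/mapred-local are illustrative assumptions, not part of the setup above):

<property>
 <name>mapred.local.dir</name>
 <value>/data/2/hadoop/mapred-local</value>
</property>
<property>
 <name>mapred.map.tasks</name>
 <value>20</value>
</property>
<property>
 <name>mapred.reduce.tasks</name>
 <value>2</value>
</property>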
 
 
Formatting the HDFS filesystem via the NameNode
Before we start our new multi-node cluster, we must format Hadoop's distributed filesystem (HDFS) via the NameNode. You need to do this the first time you set up a Hadoop cluster.
 
Warning: Do not format a running cluster, because this will erase all existing data in the HDFS filesystem!
To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable on the NameNode), run the following command on the master:
 
 
[root@master-host bin]# jps
14802 Jps
 
[root@master-host bin]#
[root@master-host bin]#
[root@master-host bin]# hadoop namenode -format
14/07/09 02:35:46 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = master-host/10.20.30.11
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.8.0_05
************************************************************/
Re-format filesystem in /tmp/hadoop-root/dfs/name ? (Y or N) Y
14/07/09 02:35:49 INFO util.GSet: Computing capacity for map BlocksMap
14/07/09 02:35:49 INFO util.GSet: VM type = 64-bit
14/07/09 02:35:49 INFO util.GSet: 2.0% max memory = 932184064
14/07/09 02:35:49 INFO util.GSet: capacity = 2^21 = 2097152 entries
14/07/09 02:35:49 INFO util.GSet: recommended=2097152, actual=2097152
14/07/09 02:35:49 INFO namenode.FSNamesystem: fsOwner=root
14/07/09 02:35:49 INFO namenode.FSNamesystem: supergroup=supergroup
14/07/09 02:35:49 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/07/09 02:35:49 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
14/07/09 02:35:49 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/07/09 02:35:49 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
14/07/09 02:35:49 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/07/09 02:35:49 INFO common.Storage: Image file /tmp/hadoop-root/dfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
14/07/09 02:35:49 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hadoop-root/dfs/name/current/edits
14/07/09 02:35:49 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hadoop-root/dfs/name/current/edits
14/07/09 02:35:49 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
14/07/09 02:35:49 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master-host/10.20.30.11
************************************************************/
[root@master-host bin]#
 
 
Background: The HDFS name table is stored on the NameNode’s (here: master) local filesystem in the directory specified by dfs.name.dir. The name table is used by the NameNode to store tracking and coordination information for the DataNodes.
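If you prefer to keep the name table in an explicit location instead of the default under hadoop.tmp.dir, you can set dfs.name.dir in conf/hdfs-site.xml on the master before formatting; a minimal sketch (the path /data/2/hadoop/name is only an example):

<property>
 <name>dfs.name.dir</name>
 <value>/data/2/hadoop/name</value>
</property>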
 
 
 
 
Starting the multi-node cluster
 
Starting the cluster is performed in two steps.
 
We begin with starting the HDFS daemons: the NameNode daemon is started on master, and DataNode daemons are started on all slaves (here: master and slave).
Then we start the MapReduce daemons: the JobTracker is started on master, and TaskTracker daemons are started on all slaves (here: master and slave).
 
HDFS daemons
 
Run the command bin/start-dfs.sh on the machine you want the (primary) NameNode to run on. This will bring up HDFS with the NameNode running on the machine you ran the previous command on, and DataNodes on the machines listed in the conf/slaves file.
 
In our case, we will run bin/start-dfs.sh on master:
 
Start the HDFS layer
 
 
[root@master-host bin]# ./start-dfs.sh
 
 
starting namenode, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-master-host.out
master: Red Hat Enterprise Linux Server release 6.4 (Santiago)
master: Kernel \r on an \m
master: starting datanode, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-master-host.out
slave:
slave:
slave: starting datanode, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-slave-host.out
master: Red Hat Enterprise Linux Server release 6.4 (Santiago)
master: Kernel \r on an \m
master:
master: starting secondarynamenode, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-master-host.out
[root@master-host bin]#
[root@master-host bin]# jps
16372 SecondaryNameNode
16454 Jps
16200 DataNode
16009 NameNode
 
[root@master-host bin]#
 
 
On slave, you can examine the success or failure of this command by inspecting the log file logs/hadoop-root-datanode-slave-host.log.
 
Example output:
 
 
[root@slave-host bin]# tail -20f ../logs/hadoop-root-datanode-slave-host.log
2014-07-09 02:43:35,599 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = DatanodeRegistration(slave-host.nam.nsroot.net:50010, storageID=, infoPort=50075, ipcPort=50020)
2014-07-09 02:43:35,633 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: New storage id DS-179444144-10.20.30.22-50010-1404888215614 is assigned to data-node 10.20.30.22:50010
2014-07-09 02:43:35,633 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Finished generating blocks being written report for 1 volumes in 0 seconds
2014-07-09 02:43:35,641 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Finished asynchronous block report scan in 0ms
2014-07-09 02:43:35,641 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.20.30.22:50010, storageID=DS-179444144-10.20.30.22-50010-1404888215614, infoPort=50075, ipcPort=50020)In DataNode.run, data = FSDataset{dirpath='/tmp/hadoop-root/dfs/data/current'}
2014-07-09 02:43:35,643 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2014-07-09 02:43:35,643 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2014-07-09 02:43:35,646 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: starting
2014-07-09 02:43:35,646 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: starting
2014-07-09 02:43:35,646 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
2014-07-09 02:43:35,646 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: starting
2014-07-09 02:43:35,655 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks took 0 msec to generate and 5 msecs for RPC and NN processing
2014-07-09 02:43:35,657 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block scanner
2014-07-09 02:43:35,659 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Generated rough (lockless) block report in 1 ms
 
[root@slave-host bin]# jps
9845 DataNode
11654 Jps
[root@slave-host bin]#
 
At this point, the following Java processes should be running on master and slave:
 
[root@master-host bin]# jps
16372 SecondaryNameNode
16454 Jps
16200 DataNode
16009 NameNode
[root@master-host bin]#
 
[root@slave-host bin]# jps
9845 DataNode
11654 Jps
[root@slave-host bin]#
 
 
Sometimes you may end up with the errors below. You can fix this either by changing the namespaceID in the DataNode's /tmp/hadoop-root/dfs/data/current/VERSION file so that it matches the NameNode's namespaceID, or by rebuilding the DataNode's storage, i.e. removing its data directory and letting it re-register (see the sketch after the VERSION output below).
 
2014-07-09 02:37:22,051 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2014-07-09 02:37:22,481 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /tmp/hadoop-root/dfs/data: namenode namespaceID = 1827152891; datanode namespaceID = 561603049
 at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
 at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
 
 
[root@12d4-dl585-04 bin]# cat /tmp/hadoop-root/dfs/data/current/VERSION
#Wed Jul 09 02:44:03 EDT 2014
namespaceID=22164723
storageID=DS-432013441-10.40.87.37-50010-1404888243597
cTime=0
storageType=DATA_NODE
layoutVersion=-41
[root@12d4-dl585-04 bin]#
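A sketch of the rebuild approach on the affected DataNode (this wipes that DataNode's local block data, which is harmless here because the cluster is still empty; the paths follow the defaults shown above):

# on the affected slave, from the Hadoop installation directory
bin/hadoop-daemon.sh stop datanode
rm -rf /tmp/hadoop-root/dfs/data          # removes the DataNode's storage, including its VERSION file
bin/hadoop-daemon.sh start datanode       # the DataNode re-registers and adopts the NameNode's namespaceID

Alternatively, edit the namespaceID line in the VERSION file to match the NameNode's value and simply restart the DataNode.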
 
 
MapReduce daemons
 
Run the command bin/start-mapred.sh on the machine you want the JobTracker to run on. This will bring up the MapReduce cluster with the JobTracker running on the machine you ran the previous command on, and TaskTrackers on the machines listed in the conf/slaves file.
 
In our case, we will run bin/start-mapred.sh on master:
 
 
 
[root@master-host bin]# ./start-mapred.sh
 
 
starting jobtracker, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-jobtracker-master-host.out
master: Red Hat Enterprise Linux Server release 6.4 (Santiago)
master: Kernel \r on an \m
slave:
slave: starting tasktracker, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-slave-host.out
master:
master:
master: starting tasktracker, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-master-host.out
[root@master-host bin]#
[root@master-host bin]#
[root@master-host bin]# jps
16372 SecondaryNameNode
17061 JobTracker
16200 DataNode
16009 NameNode
17370 Jps
17245 TaskTracker
6734 -- process information unavailable
[root@master-host bin]#
 
 
On slave, you can examine the success or failure of this command by inspecting the log file logs/hadoop-root-tasktracker-slave-host.log. Example output:
 
 
[root@slave-host bin]# jps
12083 Jps
9845 DataNode
11951 TaskTracker
[root@slave-host bin]#
[root@slave-host bin]# cat ../logs/hadoop-root-tasktracker-slave-host.log
2014-07-09 03:10:57,488 INFO org.apache.hadoop.mapred.TaskTracker: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting TaskTracker
STARTUP_MSG: host = slave-host/10.20.30.22
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.8.0_05
************************************************************/
2014-07-09 03:10:57,804 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-07-09 03:10:57,926 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2014-07-09 03:10:57,928 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2014-07-09 03:10:57,928 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: TaskTracker metrics system started
2014-07-09 03:10:58,254 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2014-07-09 03:10:58,291 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2014-07-09 03:10:58,577 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2014-07-09 03:10:58,678 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2014-07-09 03:10:58,724 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2014-07-09 03:10:58,734 INFO org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as root
2014-07-09 03:10:58,737 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /tmp/hadoop-root/mapred/local
2014-07-09 03:10:58,755 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2014-07-09 03:10:58,757 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered.
2014-07-09 03:10:58,783 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2014-07-09 03:10:58,810 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort35390 registered.
2014-07-09 03:10:58,811 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort35390 registered.
2014-07-09 03:10:58,815 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2014-07-09 03:10:58,816 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 35390: starting
2014-07-09 03:10:58,818 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 35390: starting
2014-07-09 03:10:58,818 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 35390: starting
2014-07-09 03:10:58,818 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 35390: starting
2014-07-09 03:10:58,818 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 35390: starting
2014-07-09 03:10:58,818 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:35390
2014-07-09 03:10:58,819 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_slave-host.nam.nsroot.net:localhost/127.0.0.1:35390
2014-07-09 03:10:58,901 INFO org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events fetcher for all reduce tasks on tracker_slave-host.nam.nsroot.net:localhost/127.0.0.1:35390
2014-07-09 03:10:58,924 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2014-07-09 03:10:58,935 INFO org.apache.hadoop.mapred.TaskTracker: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6ac13091
2014-07-09 03:10:58,943 WARN org.apache.hadoop.mapred.TaskTracker: TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is disabled.
2014-07-09 03:10:58,954 INFO org.apache.hadoop.mapred.IndexCache: IndexCache created with max memory = 10485760
2014-07-09 03:10:58,977 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ShuffleServerMetrics registered.
2014-07-09 03:10:58,985 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50060
2014-07-09 03:10:58,987 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50060 webServer.getConnectors()[0].getLocalPort() returned 50060
2014-07-09 03:10:58,987 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50060
2014-07-09 03:10:58,987 INFO org.mortbay.log: jetty-6.1.26
2014-07-09 03:10:59,488 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50060
2014-07-09 03:10:59,488 INFO org.apache.hadoop.mapred.TaskTracker: FILE_CACHE_SIZE for mapOutputServlet set to : 2000
[root@slave-host bin]#
 
That's it! Your Hadoop cluster is now good to go.
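To sanity-check the cluster, you can ask the NameNode for a DataNode report and run one of the example jobs bundled with Hadoop 1.2.1 (a sketch, run from the Hadoop installation directory on master):

bin/hadoop dfsadmin -report                        # should list two live DataNodes (master and slave)
bin/hadoop jar hadoop-examples-1.2.1.jar pi 2 10   # small MapReduce job spread across both TaskTrackers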
