Administering Hadoop

Namenode directory structure:

A newly formatted namenode creates the following directory structure:

${dfs.name.dir}/current/VERSION
                       /edits
                       /fsimage
                       /fstime

On my machine:

[root@myhostname current]# pwd
/data/2/hadoop/tmp/dfs/name/current
[root@myhostname current]# ll -lhtr
total 16K
-rw-r--r-- 1 root root 110 Jul 22…

MapReduce Job [hadoop]

Running our first MapReduce job: we will use the WordCount example job, which reads text files and counts how often each word occurs. Both the input and the output are text files, each line of which contains a word…
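The counting logic that WordCount computes can be sketched in a few lines of plain Python (an illustration of the result the job produces, not the actual Hadoop example code):

```python
from collections import Counter

def word_count(lines):
    """Count how often each word occurs across the input lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)

if __name__ == "__main__":
    text = ["hello hadoop", "hello world"]
    print(word_count(text))  # {'hello': 2, 'hadoop': 1, 'world': 1}
```

The real job distributes this same computation across the cluster, with the splitting done in map tasks and the summing in reduce tasks.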

Getting started with Hive

Hive is a data warehouse system that uses MapReduce to analyze data stored on HDFS. In particular, it provides a query language called HiveQL that closely resembles the Structured Query Language (SQL) standard.

Prerequisites

Unlike Hadoop, there are no Hive…
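To give a flavour of how closely HiveQL tracks SQL, a word-count query over a hypothetical `docs` table with a single `line` column (table and column names are illustrative) might look like:

```sql
-- count word frequencies over a table of text lines
SELECT word, COUNT(*) AS cnt
FROM (SELECT explode(split(line, ' ')) AS word FROM docs) w
GROUP BY word
ORDER BY cnt DESC;
```

Apart from Hive-specific functions such as `explode()` and `split()`, the `SELECT`/`GROUP BY`/`ORDER BY` structure is standard SQL, and Hive compiles it into MapReduce jobs behind the scenes.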

Hadoop Map-reduce

Map step: mapper.py. It will read data from STDIN, split it into words, and output to STDOUT a list of lines mapping each word to its (intermediate) count. The Map script will not compute an (intermediate) sum of a word’s occurrences…
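A minimal mapper.py along these lines — a sketch of the usual Hadoop Streaming pattern (the exact script in the post may differ):

```python
#!/usr/bin/env python
import sys

def map_words(lines):
    """Emit one 'word<TAB>1' line per word; summing is left to the reduce step."""
    for line in lines:
        for word in line.strip().split():
            yield "%s\t1" % word

if __name__ == "__main__":
    # Hadoop Streaming feeds input splits to the script on STDIN
    for pair in map_words(sys.stdin):
        print(pair)
```

Note that the same word may be emitted many times; the shuffle/sort phase groups these pairs by key before they reach the reducer.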

Write File to and Read File from HDFS/Hadoop Using Java

import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

/**
 * @author Srinivas
 * @email  srinivas@dbversity.com
 * @web    www.dbversity.com
 */
public class WritetoHDFSReadFromHDFSWritToLocal {
    private…

Writing Hadoop MapReduce Program in Python

Map step: mapper.py. It will read data from STDIN, split it into words, and output to STDOUT a list of lines mapping each word to its (intermediate) count. The Map script will not compute an (intermediate) sum of a word’s occurrences…
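Since the map script leaves the summing to the reduce step, a matching reducer.py can be sketched as follows (the usual streaming pattern, assuming tab-separated "word&lt;TAB&gt;count" input already sorted by word, as Hadoop Streaming guarantees):

```python
#!/usr/bin/env python
import sys

def reduce_counts(lines):
    """Sum counts for consecutive identical words (input must be sorted by word)."""
    current_word, current_count = None, 0
    for line in lines:
        word, count = line.strip().split("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                yield "%s\t%d" % (current_word, current_count)
            current_word, current_count = word, int(count)
    if current_word is not None:
        yield "%s\t%d" % (current_word, current_count)

if __name__ == "__main__":
    for pair in reduce_counts(sys.stdin):
        print(pair)
```

Because the shuffle phase sorts by key, all lines for a given word arrive consecutively, so a single pass with one running counter is enough.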

Prevent accidental data loss in Hadoop

Sometimes you may accidentally delete a file that you were not supposed to. So what do you do in that case? The option available to you is to enable trash (a recycle bin) for Hadoop, and define fs.trash.interval to one…
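Enabling trash is a core-site.xml setting; fs.trash.interval is the number of minutes a deleted file is retained in .Trash before being purged permanently. The value below (1440, i.e. one day) is illustrative:

```xml
<!-- core-site.xml: keep deleted files in .Trash for 1440 minutes (one day) -->
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
```

With a non-zero interval, `hadoop fs -rm` moves files into the user's .Trash directory instead of deleting them immediately, giving you a window to recover them.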

Warning: $HADOOP_HOME is deprecated

Do you see the warning below for every command in your Hadoop set-up?

Warning: $HADOOP_HOME is deprecated.

[root@hostname logs]# hadoop dfs -ls /
Warning: $HADOOP_HOME is deprecated.
Found 3 items
drwxr-xr-x - root supergroup 0 2014-07-09 09:06 /hadoop
drwxr-xr-x…
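In Hadoop 1.x this warning can be suppressed by setting HADOOP_HOME_WARN_SUPPRESS (or by exporting HADOOP_PREFIX instead of HADOOP_HOME). A typical fix, placed in hadoop-env.sh or your shell profile:

```shell
# hadoop-env.sh (or ~/.bashrc): silence the $HADOOP_HOME deprecation warning
export HADOOP_HOME_WARN_SUPPRESS=1
```

After re-sourcing the file (or opening a new shell), Hadoop commands should run without printing the warning.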

Hadoop – Single node set-up

< Hadoop – Single node set-up document >

1) Create hadoop group & user

[root@myhostname hpadmin]# groupadd hadoop
[root@myhostname hpadmin]# adduser -g hadoop hadoop

OR

[root@myhostname hpadmin]# adduser -g hadoop hduser

2) Download & Install the…

[Hadoop] HBase Installation

http://www.apache.org/dyn/closer.cgi/hbase/

[root@my-master-host 2]# tar -zxvf hbase-0.98.3-hadoop1-bin.tar.gz
[root@my-master-host 2]# ll -lhtr
-rw-rw-r-- 1 hpadmin hpadmin 64M Jul 14 05:44 hbase-0.98.3-hadoop1-bin.tar.gz
drwxr-xr-x 7 root root 4.0K Jul 14 05:48 hbase-0.98.3-hadoop1
[root@my-master-host 2]# cd hbase-0.98.3-hadoop1
[root@my-master-host hbase-0.98.3-hadoop1]#…