Hadoop Streaming with Python
Hadoop streaming is a utility that comes with the Hadoop distribution. The utility allows developers to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer. For example: hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.5.0.jar \…
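A minimal streaming invocation follows the same shape as the command above; the jar path, HDFS paths, and script names here are illustrative placeholders, not the ones from the post:

```shell
# Sketch of a Hadoop Streaming job (paths and names are illustrative)
hadoop jar /path/to/hadoop-streaming.jar \
    -input   /user/hadoop/input \
    -output  /user/hadoop/output \
    -mapper  mapper.py \
    -reducer reducer.py \
    -file    mapper.py \
    -file    reducer.py
```

The -file options ship the local scripts to every task node so the mapper and reducer can be found at run time.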
Monitoring Hadoop from the browser
Hadoop provides two web interfaces that you should become familiar with, one for HDFS and the other for MapReduce. Both are useful in pseudo-distributed mode and are critical tools when you have a fully distributed setup. The HDFS web UI…
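Assuming the default ports of a Hadoop 1.x-era install, the two interfaces are typically reachable at:

```text
http://localhost:50070/   # HDFS (NameNode) web UI
http://localhost:50030/   # MapReduce (JobTracker) web UI
```

On a fully distributed cluster, replace localhost with the NameNode and JobTracker hosts respectively.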
Administering Hadoop
Namenode directory structure
A newly formatted namenode creates the following directory structure:

${dfs.name.dir}/current/VERSION
                       /edits
                       /fsimage
                       /fstime

On my machine:

[root@myhostname current]# pwd
/data/2/hadoop/tmp/dfs/name/current
[root@myhostname current]# ll -lhtr
total 16K
-rw-r--r-- 1 root root 110 Jul 22…
MapReduce Job [hadoop]
Running our first MapReduce job We will use the WordCount example job which reads text files and counts how often words occur. The input is text files and the output is text files, each line of which contains a word…
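A typical run of the bundled WordCount example looks like the following sketch; the examples jar name and the HDFS paths vary by distribution and are only illustrative here:

```shell
# Stage some text files, run WordCount, and read the result
hadoop fs -mkdir /user/hadoop/input
hadoop fs -put *.txt /user/hadoop/input
hadoop jar hadoop-examples.jar wordcount /user/hadoop/input /user/hadoop/output
hadoop fs -cat /user/hadoop/output/part-*
```

Each line of the output maps a word to the number of times it occurred across the input files.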
Getting started with Hive
Hive is a data warehouse that uses MapReduce to analyze data stored on HDFS. In particular, it provides a query language called HiveQL that closely resembles the common Structured Query Language (SQL) standard. Prerequisites Unlike Hadoop, there are no Hive…
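To show how closely HiveQL tracks SQL, here is a hypothetical table and query (the schema and names are illustrative, not from the post):

```sql
-- Create a table over tab-delimited files on HDFS (illustrative schema)
CREATE TABLE page_views (user_id INT, url STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- A plain SQL-style aggregation; Hive compiles it to a MapReduce job
SELECT url, COUNT(*) AS hits
FROM page_views
GROUP BY url;
```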
Hadoop Map-reduce
Map step: mapper.py It reads data from STDIN, splits it into words, and writes to STDOUT a list of lines mapping each word to its (intermediate) count. The Map script will not compute an (intermediate) sum of a word’s occurrences…
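A minimal mapper.py along those lines might look like this (a sketch; the post's actual script may differ):

```python
def map_words(lines):
    # Emit one tab-separated "word<TAB>1" line per word seen.
    # Summing the counts per word is deliberately left to the
    # reduce step, as the post describes.
    for line in lines:
        for word in line.strip().split():
            yield "%s\t1" % word

# In the real mapper.py this function would be driven by STDIN:
#   import sys
#   for pair in map_words(sys.stdin):
#       print(pair)
```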
Write file to HDFS/Hadoop Read File From HDFS/Hadoop Using Java
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

/**
 * @author Srinivas
 * @email srinivas@dbversity.com
 * @Web www.dbversity.com
 */
public class WritetoHDFSReadFromHDFSWritToLocal {
    private…
Writing Hadoop MapReduce Program in Python
Map step: mapper.py It reads data from STDIN, splits it into words, and writes to STDOUT a list of lines mapping each word to its (intermediate) count. The Map script will not compute an (intermediate) sum of a word’s occurrences…
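The matching reduce step can be sketched as follows, assuming (as Hadoop's shuffle phase guarantees) that the mapper output arrives sorted by key:

```python
from itertools import groupby

def reduce_counts(lines):
    # Collapse sorted "word<TAB>count" lines into one
    # "word<TAB>total" line per word.
    pairs = (line.rstrip("\n").split("\t", 1)
             for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))

# In the real reducer.py this function would be driven by STDIN:
#   import sys
#   for out in reduce_counts(sys.stdin):
#       print(out)
```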
Prevent accidental data loss in Hadoop
Sometimes you accidentally delete a file you were not supposed to. So what do you do in that case? The option you have is to enable trash (a recycle bin) for Hadoop, and set fs.trash.interval to one…
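The setting lives in core-site.xml; fs.trash.interval is the number of minutes a deleted file is kept in the .Trash directory before the checkpoint is purged (0 disables trash). A sketch with one-day retention:

```xml
<!-- core-site.xml: keep deleted files in .Trash for 1440 minutes (1 day) -->
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
```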
Warning: $HADOOP_HOME is deprecated
Do you see the warning below for every command in your Hadoop set-up?

Warning: $HADOOP_HOME is deprecated.

[root@hostname logs]# hadoop dfs -ls /
Warning: $HADOOP_HOME is deprecated.
Found 3 items
drwxr-xr-x - root supergroup 0 2014-07-09 09:06 /hadoop
drwxr-xr-x…
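On Hadoop 1.x this warning can be silenced by exporting HADOOP_HOME_WARN_SUPPRESS before the Hadoop scripts run, for example in conf/hadoop-env.sh or your shell profile:

```shell
# Silences "Warning: $HADOOP_HOME is deprecated." on Hadoop 1.x
export HADOOP_HOME_WARN_SUPPRESS="TRUE"
```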