Getting started with Hive

Hive is a data warehouse system that uses MapReduce to analyze data stored on HDFS. In particular, it provides a query language called HiveQL that closely resembles the Structured Query Language (SQL) standard.
 
Prerequisites
 
Unlike Hadoop, there are no Hive masters, slaves, or nodes. Hive runs as a client application that processes HiveQL queries, converts them into MapReduce jobs, and submits these to a Hadoop cluster.
 
Although Hive has a local mode suitable for small jobs and development use, it usually requires an existing, functioning Hadoop cluster.
 
Just as other Hadoop clients don't need to run on the actual cluster nodes, Hive can run on any host where the following are true:

.. Hadoop is installed on the host (even if no processes are running)
.. The HADOOP_HOME environment variable is set and points to the location of the Hadoop installation
.. The ${HADOOP_HOME}/bin directory is added to the system or user path
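A quick way to verify the three conditions above is a small shell check. This is only a sketch: `check_hive_prereqs` is a made-up helper name, not anything shipped with Hive or Hadoop.

```shell
# Sketch: verify the Hive client prerequisites listed above.
# check_hive_prereqs is a made-up name, not part of Hive or Hadoop.
check_hive_prereqs() {
  if [ -z "$HADOOP_HOME" ]; then
    echo "HADOOP_HOME is not set"
    return 1
  fi
  if [ ! -d "$HADOOP_HOME/bin" ]; then
    echo "no bin directory under $HADOOP_HOME"
    return 1
  fi
  case ":$PATH:" in
    *":$HADOOP_HOME/bin:"*) echo "prerequisites look OK" ;;
    *) echo "$HADOOP_HOME/bin is not on PATH"; return 1 ;;
  esac
}
check_hive_prereqs || echo "fix the items above before running hive"
```

Running this before launching the `hive` client saves chasing "command not found" or "cannot find Hadoop" errors later.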
 
Extract the tarball as below.
 
[root@myhostname 2]# tar -zxvf apache-hive-0.13.1-bin.tar.gz
apache-hive-0.13.1-bin/conf/hive-log4j.properties.template
apache-hive-0.13.1-bin/conf/hive-exec-log4j.properties.template
apache-hive-0.13.1-bin/hcatalog/share/doc/hcatalog/README.txt
apache-hive-0.13.1-bin/hcatalog/share/hcatalog/storage-handlers/hbase/conf/revision-manager-site.xml
apache-hive-0.13.1-bin/lib/hive-common-0.13.1.jar
... (output truncated)
 
done
 
[root@myhostname 2]# ll -lhtr
total 6.0G
-rw------- 1 root root 5.9G May 14 07:44 memsql.swp
-rw-r--r-- 1 root root 61M Jul 2 08:05 hadoop-1.2.1.tar.gz
drwxr-xr-x 3 root root 4.0K Jul 15 05:41 hadoop
-rw-rw-r-- 1 hpadmin hpadmin 52M Jul 23 09:20 apache-hive-0.13.1-bin.tar.gz
drwxr-xr-x 8 root root 4.0K Jul 23 09:23 apache-hive-0.13.1-bin
[root@myhostname 2]#
 
Rename the extracted directory to hive as below.
 
[root@myhostname 2]#
[root@myhostname 2]# mv apache-hive-0.13.1-bin hive
[root@myhostname 2]#
[root@myhostname 2]# ll -lhtr
total 6.0G
-rw------- 1 root root 5.9G May 14 07:44 memsql.swp
-rw-r--r-- 1 root root 61M Jul 2 08:05 hadoop-1.2.1.tar.gz
drwxr-xr-x 3 root root 4.0K Jul 15 05:41 hadoop
-rw-rw-r-- 1 hpadmin hpadmin 52M Jul 23 09:20 apache-hive-0.13.1-bin.tar.gz
drwxr-xr-x 8 root root 4.0K Jul 23 09:23 hive
[root@myhostname 2]#
 
Export the Hive environment variables as below.
 
[root@myhostname hive]# export HIVE_HOME=$(pwd)
[root@myhostname hive]# export PATH=$HIVE_HOME/bin:$PATH
[root@myhostname hive]#
 
Alternatively, set the environment variables in ~/.bashrc as below.
 
[root@myhostname hive]# vi ~/.bashrc
# .bashrc
 
 
# User specific aliases and functions
 
 
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
 
 
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
 
 
 
 
export HADOOP_PREFIX=/root/hadoop/
export JAVA_HOME=/usr/java/jdk1.8.0_05
export HIVE_HOME=/data/2/hive
export MONGO_HOME=/opt/mongodb/
export KRB5_KTNAME=/opt/mongodb/keytab/mongodb.keytab
export KRB5_TRACE=/tmp/krb5trace_mongo_`date +"%d-%m-%y-%H-%Ms"`.log
 
 
 
 
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:$MONGO_HOME/bin
export PATH=$PATH:$HIVE_HOME/bin
"~/.bashrc" 26L, 576C written
[root@myhostname hive]#
[root@myhostname hive]# exec bash
[root@myhostname hive]#
[root@myhostname hive]# env | grep HIVE
HIVE_HOME=/data/2/hive
[root@myhostname hive]#
 
[root@myhostname hive]#
[root@myhostname hive]# hadoop fs -mkdir /tmp
[root@myhostname hive]# hadoop fs -mkdir /user/hive/warehouse
[root@myhostname hive]# hadoop fs -mkdir /data/2/hive/warehouse
[root@myhostname hive]# hadoop fs -chmod g+w /tmp
[root@myhostname hive]# hadoop fs -chmod g+w /user/hive/warehouse
[root@myhostname hive]# hadoop fs -chmod g+w /data/2/hive/warehouse
[root@myhostname hive]#
[root@myhostname hive]# hive
 
 
Logging initialized using configuration in jar:file:/data/2/hive/lib/hive-common-0.13.1.jar!/hive-log4j.properties
 
 
hive> CREATE TABLE ufodata(sighted STRING, reported STRING, sighting_location STRING, shape STRING, duration STRING, description STRING COMMENT 'Free text description') COMMENT 'The UFO data set.';
OK
Time taken: 0.985 seconds
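Note that this CREATE TABLE does not declare a field delimiter, so Hive falls back to its default (the Ctrl-A character, \001). For a tab-separated file like the one loaded later, a variant with an explicit delimiter would split each line into columns rather than leaving the whole line in the first field. This is a sketch of that variant, not what the transcript above actually ran:

```sql
-- Sketch: the same table, but declared tab-delimited so a .tsv file
-- splits into columns instead of landing entirely in the first field.
CREATE TABLE ufodata (
  sighted STRING,
  reported STRING,
  sighting_location STRING,
  shape STRING,
  duration STRING,
  description STRING COMMENT 'Free text description'
)
COMMENT 'The UFO data set.'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
```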
 
 
hive> show tables;
OK
ufodata
Time taken: 0.167 seconds, Fetched: 1 row(s)
 
 
hive> show tables '.*data';
OK
ufodata
Time taken: 0.074 seconds, Fetched: 1 row(s)
 
 
hive> describe ufodata;
OK
sighted string 
reported string 
sighting_location string 
shape string 
duration string 
description string Free text description
Time taken: 0.977 seconds, Fetched: 6 row(s)
 
 
hive> describe extended ufodata;
OK
sighted string 
reported string 
sighting_location string 
shape string 
duration string 
description string Free text description
 
Detailed Table Information Table(tableName:ufodata, dbName:default, owner:root, createTime:1406122540, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:sighted, type:string, comment:null), FieldSchema(name:reported, type:string, comment:null), FieldSchema(name:sighting_location, type:string, comment:null), FieldSchema(name:shape, type:string, comment:null), FieldSchema(name:duration, type:string, comment:null), FieldSchema(name:description, type:string, comment:Free text description)], location:hdfs://master:9000/user/hive/warehouse/ufodata, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{transient_lastDdlTime=1406122540, comment=The UFO data set.}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.233 seconds, Fetched: 8 row(s)
 
 
hive> quit;
 
 
[root@myhostname hive]#
[root@myhostname hive]# vi /tmp/ufo.tsv
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
"/tmp/ufo.tsv" [New] 24L, 841C written
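An equivalent sample file can also be generated from the shell instead of typing it in vi. This sketch assumes the fields in ufo.tsv are tab-separated, as the .tsv name suggests:

```shell
# Sketch: generate a 24-row tab-separated sample file like the one above.
for i in $(seq 1 24); do
  printf 'cmpny\tMumbai\tMH\tCircle\t40Mins\tCrime\n'
done > /tmp/ufo.tsv
wc -l /tmp/ufo.tsv
```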
[root@myhostname hive]#
[root@myhostname hive]# hadoop fs -ls /tmp
[root@myhostname hive]#
[root@myhostname hive]#
[root@myhostname hive]# hadoop fs -copyFromLocal /tmp/ufo.tsv /tmp
[root@myhostname hive]# hadoop fs -ls /tmp
Found 1 items
-rw-r--r-- 2 root supergroup 841 2014-07-23 09:40 /tmp/ufo.tsv
[root@myhostname hive]#
[root@myhostname hive]#
[root@myhostname hive]# hive
 
 
Logging initialized using configuration in jar:file:/data/2/hive/lib/hive-common-0.13.1.jar!/hive-log4j.properties
 
 
hive> LOAD DATA INPATH '/tmp/ufo.tsv' OVERWRITE INTO TABLE ufodata;
Loading data to table default.ufodata
Deleted hdfs://master:9000/user/hive/warehouse/ufodata
Table default.ufodata stats: [numFiles=1, numRows=0, totalSize=841, rawDataSize=0]
OK
Time taken: 2.604 seconds
hive> exit;
 
Note that LOAD DATA INPATH moves the source file into the Hive warehouse directory rather than copying it, so /tmp on HDFS is empty again:

[root@myhostname hive]# hadoop fs -ls /tmp
[root@myhostname hive]#
 
 
Querying from the Linux prompt to check the row count
 
[root@myhostname hive]#
[root@myhostname hive]# hive -e "select count(*) from ufodata;"
 
 
Logging initialized using configuration in jar:file:/data/2/hive/lib/hive-common-0.13.1.jar!/hive-log4j.properties
 
 
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
 set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
 set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
 set mapred.reduce.tasks=<number>
Starting Job = job_201407230605_0006, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201407230605_0006
Kill Command = /root/hadoop/libexec/../bin/hadoop job -kill job_201407230605_0006
 
 
 
 
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-07-23 09:45:38,825 Stage-1 map = 0%, reduce = 0%
2014-07-23 09:45:41,876 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.77 sec
 
 
 
 
2014-07-23 09:45:50,971 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 2.77 sec
2014-07-23 09:45:51,987 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.43 sec
MapReduce Total cumulative CPU time: 7 seconds 430 msec
Ended Job = job_201407230605_0006
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 7.43 sec HDFS Read: 1049 HDFS Write: 3 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 430 msec
OK
24
Time taken: 25.838 seconds, Fetched: 1 row(s)
[root@myhostname hive]#
[root@myhostname hive]# wc -l /tmp/ufo.tsv
24 /tmp/ufo.tsv
[root@myhostname hive]#
 
Querying from the Linux prompt. (Each result row below shows the full line in the sighted column: the table was created without an explicit field delimiter, so Hive's default Ctrl-A delimiter never matches and the entire tab-separated line lands in the first column.)
 
[root@myhostname hive]# hive -e "select sighted from ufodata limit 5;"
 
 
Logging initialized using configuration in jar:file:/data/2/hive/lib/hive-common-0.13.1.jar!/hive-log4j.properties
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201407230605_0007, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201407230605_0007
Kill Command = /root/hadoop/libexec/../bin/hadoop job -kill job_201407230605_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-07-23 09:46:55,562 Stage-1 map = 0%, reduce = 0%
2014-07-23 09:46:58,616 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.54 sec
2014-07-23 09:47:01,648 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.54 sec
MapReduce Total cumulative CPU time: 2 seconds 540 msec
Ended Job = job_201407230605_0007
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 2.54 sec HDFS Read: 1049 HDFS Write: 175 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 540 msec
OK
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
cmpny Mumbai MH Circle 40Mins Crime
Time taken: 15.875 seconds, Fetched: 5 row(s)
[root@myhostname hive]#
