Writing a Hadoop MapReduce Program in Python

Map step: mapper.py. It reads data from STDIN, splits it into words, and writes lines mapping each word to its (intermediate) count to STDOUT. The Map script does not compute an (intermediate) sum of a word’s occurrences…
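The Map step described above can be sketched roughly as follows. This is a minimal illustration, not the post's actual script: the function name map_words is hypothetical, and the tab-separated "word<TAB>1" record layout assumes Hadoop Streaming's default key/value separator.

```python
#!/usr/bin/env python3
"""mapper.py: a sketch of a word-count Map step for Hadoop Streaming."""
import sys


def map_words(lines):
    """Yield one 'word<TAB>1' record per word occurrence, with no summing.

    map_words is a hypothetical helper name used for illustration only.
    """
    for line in lines:
        # split() with no arguments breaks on any run of whitespace
        # and silently skips blank lines.
        for word in line.split():
            yield f"{word}\t1"


if __name__ == "__main__":
    # Hadoop Streaming feeds input on STDIN and collects records from STDOUT.
    for record in map_words(sys.stdin):
        print(record)
```

Running `echo "foo foo bar" | python3 mapper.py` should print three records, with foo appearing twice at a count of 1 each time, since summing is left to a later step.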

Prevent accidental data loss in Hadoop

Sometimes you accidentally delete a file that you were not supposed to. So what do you do in that case? One option is to enable trash (a recycle bin) for Hadoop and set fs.trash.interval to one…

Warning: $HADOOP_HOME is deprecated

Do you see the warning below on every command in your Hadoop setup?

Warning: $HADOOP_HOME is deprecated.

[root@hostname logs]# hadoop dfs -ls /
Warning: $HADOOP_HOME is deprecated.
Found 3 items
drwxr-xr-x - root supergroup 0 2014-07-09 09:06 /hadoop
drwxr-xr-x…

NUMA (Non Uniform Memory Access) for MongoDB

In our internal testing we saw a performance boost when NUMA was disabled using numactl --interleave=all … Below are the details from the MongoDB production notes: http://docs.mongodb.org/manual/administration/production-notes/ Running MongoDB on a system with Non-Uniform Memory Access (NUMA) can cause a number…

Openings with MongoDB for Developers

Do you want to work at MongoDB, Inc. and be part of a revolution in database software? Check out our open positions: buff.ly/1owu5RP Development Should Be Simple and Beautiful: MongoDB would not be the leading NoSQL database without…

Linux ‘sed’ command

sed (stream editor) is a Unix utility that parses and transforms text, using a simple, compact programming language. sed was based on the scripting features of the interactive editor ed (“editor”, 1971) and the earlier qed (“quick editor”, 1965–66). sed…

SQL access to MongoDB using SonarSQL (JSON Studio)

RDBMS – MongoDB ETL Demo: http://jsonstudio.com/rdbms-mongodb-etl-demo/
SQL access to MongoDB using SonarSQL Demo: http://jsonstudio.com/sql-access-mongodb/

SQL Server Columnstore Indexes

This discussion is a starting point and an initial platform for sharing knowledge and discussing various aspects of the Columnstore Indexes newly introduced in MS SQL Server 2012/2014. Overview: A Columnstore index can be defined as a technology for…

Oracle Vs Opensource DBs

Considering Oracle’s increase in license costs, open source databases like PostgreSQL and MariaDB can partially replace Oracle in some of your applications and projects. Please check the comparison of their features below. We observed they both are good…

How to Run the Yahoo! Cloud Serving Benchmark (YCSB) for Oracle DB

Yahoo! Cloud Serving Benchmark (YCSB) for Oracle DB: Sample Reports
Description W/T – secs Threads# 10 Threads# 25 Threads# 30 Threads# 50
Update heavy workload (50% R : 50% W) Workload# a 1777…