Monday, 1 September 2014

My First Hadoop Program with MRUnit


In this post I describe my first Hadoop program, complete with an MRUnit test. It may be useful to new Hadoop developers starting from level zero.

Prerequisites:
1. Oracle VirtualBox.
2. Download and install the Cloudera VM in Oracle VirtualBox (I used a CDH 5.0.0 VM, which matches the library path in the steps below).
3. Download the jar files needed for the MRUnit test and place them in a lib folder:
commons-logging-1.1.1.jar, commons-logging-1.1.1-sources.jar, hamcrest-core-1.1.jar, junit-4.10.jar, log4j-1.2.15.jar, mockito-all-1.8.5.jar, mrunit-0.9.0-incubating-hadoop2.jar

Steps:
In Eclipse, create a new Java project named "FirstHadoopProgram" with the package com.tamil.
Add the Hadoop libraries via Project >> Properties >> Java Build Path >> Libraries >> Add External JARs...
Select all the jar files in the /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop/client folder.
Then add all the MRUnit jar files from the lib folder.


Source Files:

Java Files
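
Below are minimal sketches of the four Java files this post uses. They follow the standard new-API (org.apache.hadoop.mapreduce) WordCount pattern; treat them as a reconstruction under that assumption rather than a definitive listing, and adapt them if your versions differ.

WordCountMapper.java emits (word, 1) for every whitespace-separated token in a line:

package com.tamil;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line on whitespace and emit each token with a count of 1.
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}

WordCountReducer.java sums the counts for each word; the same class can serve as a combiner, which the "Combine input/output records" counters below suggest was enabled:

package com.tamil;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Add up all the 1s from the mapper (or partial sums from the combiner).
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        total.set(sum);
        context.write(key, total);
    }
}

WordCountDriver.java wires the job together, taking the input and output directories as arguments:

package com.tamil;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class); // assumption: combiner on, per the counters
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // args[0] = input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // args[1] = output dir
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

WordCountMapperTest.java drives the mapper and reducer in memory with MRUnit's MapDriver and ReduceDriver, so no cluster is needed. The sample inputs here are my own, chosen to mirror the output in step 9:

package com.tamil;

import java.util.Arrays;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class WordCountMapperTest {

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
    private ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new WordCountMapper());
        reduceDriver = ReduceDriver.newReduceDriver(new WordCountReducer());
    }

    @Test
    public void mapperEmitsOnePerWord() throws Exception {
        // Expected outputs must appear in the order the mapper emits them.
        mapDriver.withInput(new LongWritable(0), new Text("is it useful"))
                 .withOutput(new Text("is"), new IntWritable(1))
                 .withOutput(new Text("it"), new IntWritable(1))
                 .withOutput(new Text("useful"), new IntWritable(1))
                 .runTest();
    }

    @Test
    public void reducerSumsCounts() throws Exception {
        reduceDriver.withInput(new Text("is"), Arrays.asList(new IntWritable(1), new IntWritable(1)))
                    .withOutput(new Text("is"), new IntWritable(2))
                    .runTest();
    }
}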
Copy all the files into the project and run the WordCountMapperTest.java file as a JUnit test.

Now we need to create a jar file to run this program on the Hadoop cluster.

1. Create a folder named "test" on the desktop.
2. Place all the Java files (except the JUnit test file) into the test folder.
3. Create a folder named "wordcount" inside the test folder.
4. Compile the Java code:

javac -cp `hadoop classpath` -d wordcount/ WordCountMapper.java WordCountReducer.java WordCountDriver.java

All the files will be compiled, and the class files will be placed in the wordcount folder under com/tamil/, matching the package name.

5. Create the jar file:

jar -cvf wordcount.jar -C wordcount/ .
added manifest
adding: com/(in = 0) (out= 0)(stored 0%)
adding: com/tamil/(in = 0) (out= 0)(stored 0%)
adding: com/tamil/WordCountReducer.class(in = 1612) (out= 677)(deflated 58%)
adding: com/tamil/WordCountMapper.class(in = 1886) (out= 861)(deflated 54%)
adding: com/tamil/WordCountDriver.class(in = 1744) (out= 942)(deflated 45%)


6. Create a text file (MyText.txt) and move it into HDFS to test our MapReduce program:

hadoop fs -mkdir  /user/cloudera/myInput
hadoop fs -put MyText.txt /user/cloudera/myInput/
hadoop fs -ls  /user/cloudera/myInput/
Found 1 items
-rw-r--r--   1 cloudera cloudera        141 2014-09-01 03:23 /user/cloudera/myInput/MyText.txt

7. Run the jar file on the Hadoop cluster:

$ hadoop jar wordcount.jar com.tamil/WordCountDriver /user/cloudera/myInput/ /user/cloudera/myOutput
14/09/01 03:29:18 INFO client.RMProxy: Connecting to ResourceManager at localhost.localdomain/127.0.0.1:8032
14/09/01 03:29:18 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
14/09/01 03:29:18 INFO input.FileInputFormat: Total input paths to process : 1
14/09/01 03:29:18 INFO mapreduce.JobSubmitter: number of splits:1
14/09/01 03:29:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1409554960178_0001
14/09/01 03:29:19 INFO impl.YarnClientImpl: Submitted application application_1409554960178_0001
14/09/01 03:29:19 INFO mapreduce.Job: The url to track the job: http://localhost.localdomain:8088/proxy/application_1409554960178_0001/
14/09/01 03:29:19 INFO mapreduce.Job: Running job: job_1409554960178_0001
14/09/01 03:29:25 INFO mapreduce.Job: Job job_1409554960178_0001 running in uber mode : false
14/09/01 03:29:25 INFO mapreduce.Job:  map 0% reduce 0%
14/09/01 03:29:30 INFO mapreduce.Job:  map 100% reduce 0%
14/09/01 03:29:35 INFO mapreduce.Job:  map 100% reduce 100%
14/09/01 03:29:35 INFO mapreduce.Job: Job job_1409554960178_0001 completed successfully
14/09/01 03:29:36 INFO mapreduce.Job: Counters: 49
               File System Counters
                              FILE: Number of bytes read=67
                              FILE: Number of bytes written=183335
                              FILE: Number of read operations=0
                              FILE: Number of large read operations=0
                              FILE: Number of write operations=0
                              HDFS: Number of bytes read=272
                              HDFS: Number of bytes written=33
                              HDFS: Number of read operations=6
                              HDFS: Number of large read operations=0
                              HDFS: Number of write operations=2
               Job Counters
                              Launched map tasks=1
                              Launched reduce tasks=1
                              Data-local map tasks=1
                              Total time spent by all maps in occupied slots (ms)=699648
                              Total time spent by all reduces in occupied slots (ms)=757760
                              Total time spent by all map tasks (ms)=2733
                              Total time spent by all reduce tasks (ms)=2960
                              Total vcore-seconds taken by all map tasks=2733
                              Total vcore-seconds taken by all reduce tasks=2960
                              Total megabyte-seconds taken by all map tasks=699648
                              Total megabyte-seconds taken by all reduce tasks=757760
               Map-Reduce Framework
                              Map input records=3
                              Map output records=7
                              Map output bytes=52
                              Map output materialized bytes=63
                              Input split bytes=131
                              Combine input records=7
                              Combine output records=6
                              Reduce input groups=6
                              Reduce shuffle bytes=63
                              Reduce input records=6
                              Reduce output records=6
                              Spilled Records=12
                              Shuffled Maps =1
                              Failed Shuffles=0
                              Merged Map outputs=1
                              GC time elapsed (ms)=50
                              CPU time spent (ms)=1110
                              Physical memory (bytes) snapshot=459587584
                              Virtual memory (bytes) snapshot=1806524416
                              Total committed heap usage (bytes)=355467264
               Shuffle Errors
                              BAD_ID=0
                              CONNECTION=0
                              IO_ERROR=0
                              WRONG_LENGTH=0
                              WRONG_MAP=0
                              WRONG_REDUCE=0
               File Input Format Counters
                              Bytes Read=141
               File Output Format Counters
                              Bytes Written=33

8. Check the output folder newly created in HDFS by the MapReduce program:

$ hadoop fs -ls /user/cloudera/myOutput
Found 2 items
-rw-r--r--   1 cloudera cloudera          0 2014-09-01 03:29 /user/cloudera/myOutput/_SUCCESS
-rw-r--r--   1 cloudera cloudera         33 2014-09-01 03:29 /user/cloudera/myOutput/part-r-00000

9. Print the result to the console:

$ hadoop fs -cat /user/cloudera/myOutput/part-r-00000
I	1
It	1
am	1
is	2
it	1
useful	1

Wow... Great work... It's working... :-)


