Hadoop Tutor: MapReduce


A MapReduce program consists of three phases:

Mapper
Sort and Shuffle
Reducer

Of the three phases, only the Mapper and the Reducer are implemented directly in code; the Sort and Shuffle phase is handled by the framework and acts as the glue between them. As developers, we are responsible for writing the code for the Mapper and Reducer phases. The following figure shows how MapReduce processes data.

Data processing in MapReduce

The figure above shows that the Mapper phase takes its input as key-value pairs (K, V) and generates its output as key-value pairs (K, V) as well.

The Sort and Shuffle phase takes (K, V) pairs as input and generates, for each key, the key together with the list of its values: (K, List(V)).

The Reducer phase takes (K, List(V)) as input and generates (K, V) as output. The output of the Reducer phase is the final output.
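The flow of key-value pairs through the three phases can be sketched in plain Python. This is only an illustration that assumes a word-count job; it does not use the Hadoop API, and the names `mapper`, `sort_and_shuffle`, `reducer`, and `run_job` are hypothetical helpers chosen for this example.

```python
from collections import defaultdict


def mapper(key, value):
    """Mapper phase: takes one (K, V) pair and emits (K, V) pairs.

    Here K is a line number and V is the line text (an assumption
    made for this word-count illustration).
    """
    for word in value.split():
        yield (word.lower(), 1)


def sort_and_shuffle(pairs):
    """Sort and Shuffle phase: turns (K, V) pairs into (K, List(V)).

    Values that share a key are grouped, and keys come out sorted.
    In Hadoop the framework does this; it is written out here only
    to make the data flow visible.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reducer(key, values):
    """Reducer phase: takes (K, List(V)) and emits (K, V)."""
    yield (key, sum(values))


def run_job(records):
    """Runs the three phases one after another on in-memory records."""
    mapped = [pair for k, v in records for pair in mapper(k, v)]
    shuffled = sort_and_shuffle(mapped)
    return [out for k, vs in shuffled for out in reducer(k, vs)]


records = [(0, "hadoop stores data"), (1, "hadoop processes data")]
result = run_job(records)
print(result)
```

In this sketch only `mapper` and `reducer` correspond to code a developer would write; `sort_and_shuffle` stands in for the work the framework does between them.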

The diagram may not be completely clear at this point; for now, just get the general idea. Future posts will give a clearer picture.

Phases of MapReduce

Posted at 21:52 | in MapReduce

The Hadoop framework is built mainly on two components: HDFS and MapReduce. HDFS is meant for storage, and MapReduce is meant for processing. So far we have covered the storage part, i.e., HDFS; now let us look at the processing part, i.e., MapReduce.

In the Hadoop world, MapReduce is considered one of the major components. It is responsible for processing the huge amounts of data stored on HDFS, and it does so in parallel. MapReduce achieves parallel processing by means of splits: the data is divided into multiple chunks, and the chunks are processed in parallel.
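The idea of splits can be illustrated with a small Python sketch. It assumes the data is a list of text lines and that each split is a fixed number of lines; the helpers `make_splits`, `process_split`, and `process_in_parallel` are hypothetical, and local threads stand in for the separate tasks that Hadoop would run across the nodes of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def make_splits(lines, split_size):
    """Divides the data into chunks of at most split_size lines each."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


def process_split(split):
    """Processes one split on its own; here it counts the words in it."""
    return sum(len(line.split()) for line in split)


def process_in_parallel(lines, split_size=2, workers=2):
    """Processes every split in parallel and combines the partial results."""
    splits = make_splits(lines, split_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the splits.
        partial_counts = list(pool.map(process_split, splits))
    return splits, sum(partial_counts)


lines = ["a b", "c d e", "f", "g h", "i"]
splits, total_words = process_in_parallel(lines)
print(len(splits), total_words)
```

Each split is processed independently of the others, which is what makes it possible to run them at the same time.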

MapReduce is a programming model for data processing. Although it involves programming, the model itself is very simple. Hadoop accepts MapReduce (MR) programs written in different languages, but most people write them in Java.

MapReduce Introduction

Posted at 20:51 | in MapReduce
