Hadoop Tutor: Apache PIG

Difference between MapReduce and PIG:

Pig and MapReduce do the same work: both are used to process data. When a Pig program is executed, it is internally converted into MapReduce jobs, which then process the data. The following are some of the differences between MapReduce (MR) and Pig.

MapReduce	Apache PIG
A MapReduce program requires programming-language skills to write the business logic.	Apache Pig does not require much programming skill. The entire program is built from Pig transformations.
The amount of code is large; a lot of program code must be written.	The amount of code is much smaller than in a MapReduce program. Roughly 200 lines of MapReduce code are equivalent to about 10 lines of Pig script.
A MapReduce program is compiled and executed directly.	A Pig script is internally converted into MapReduce jobs, which are then executed.
Writing and executing a MapReduce program is a fairly complex task.	Writing and executing a Pig script is simple compared with MapReduce.
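
To make the code-size comparison concrete, here is a minimal sketch of a word count (the usual first MapReduce exercise) written as a Pig script. The file names wordcount.pig, input.txt and wordcount_out are placeholders chosen for this illustration, not names from the tutorial. The shell wrapper only writes the script to disk and counts its statements; it does not run Pig.

```shell
#!/bin/sh
# Write a small Pig Latin word-count script to disk.
# File names (wordcount.pig, input.txt, wordcount_out) are illustrative only.
cat > wordcount.pig <<'EOF'
-- Load each line of the input file as a single chararray field.
lines   = LOAD 'input.txt' AS (line:chararray);
-- Split every line into words, one word per record.
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- Group identical words together.
grouped = GROUP words BY word;
-- Count the records in each group.
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS total;
-- Write the result to an output directory.
STORE counts INTO 'wordcount_out';
EOF

# Count the Pig statements, i.e. the lines that are not comments.
grep -vc '^--' wordcount.pig
```

The whole job fits in five Pig statements, whereas the equivalent hand-written MapReduce program needs a mapper class, a reducer class and a driver.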

Posted at 06:57 | in Apache PIG

Different modes of Pig Execution:

Pig has two execution modes or types. They are:

Local Mode
MapReduce Mode

Now let us see each execution mode in detail.

Local Mode:

In local mode, Pig takes all input data from the local file system and, after execution, writes its output to the local file system as well. Pig runs in a single JVM in this mode. Local mode is suitable only for small datasets and for trying out Pig. To start Pig in local mode, use the following command.

# pig -x local

The above command starts Grunt. Grunt is the Pig interactive shell.
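
Besides starting Grunt interactively, local mode can also run a script file by passing it after the mode flag. The sketch below assumes a small CSV file and a script named adults.pig; both names and the sample data are made up for illustration. The run step is skipped when no pig command is found on the PATH.

```shell
#!/bin/sh
# Sample input on the local file system (illustrative data).
printf 'alice,30\nbob,17\ncarol,45\n' > people.csv

# A tiny Pig script: keep only the rows where age is at least 18.
cat > adults.pig <<'EOF'
people = LOAD 'people.csv' USING PigStorage(',') AS (name:chararray, age:int);
adults = FILTER people BY age >= 18;
DUMP adults;
EOF

# Run the script in local mode if Pig is installed; otherwise just say so.
if command -v pig >/dev/null 2>&1; then
    pig -x local adults.pig
else
    echo "pig not found on PATH; skipping run"
fi
```

Both the input and the script are read from the current directory of the local file system, which is what distinguishes this mode from the clustered one.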

MapReduce Mode / HDFS Mode / Clustered Mode:

In this mode, Apache Pig takes its input from HDFS paths only and, after processing the data, writes the output files to HDFS. In MapReduce mode, Pig translates queries into MapReduce jobs and runs them on a Hadoop cluster.
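
A rough sketch of the same idea on a cluster: the input is first copied into HDFS, and the script then refers to HDFS paths. The directory /tmp/pig_demo, the script name adults_mr.pig and the sample data are assumptions made for this example; adjust them for your cluster. The cluster steps are skipped unless both the pig and hdfs commands are available.

```shell
#!/bin/sh
# HDFS directory used by this sketch (hypothetical path; adjust for your cluster).
HDFS_DIR=/tmp/pig_demo

# Sample input, created locally before it is copied into HDFS.
printf 'alice,30\nbob,17\ncarol,45\n' > people.csv

# The script loads from and stores to HDFS paths.
# The heredoc is unquoted so that $HDFS_DIR is expanded into the script.
cat > adults_mr.pig <<EOF
people = LOAD '$HDFS_DIR/people.csv' USING PigStorage(',') AS (name:chararray, age:int);
adults = FILTER people BY age >= 18;
STORE adults INTO '$HDFS_DIR/adults_out' USING PigStorage(',');
EOF

if command -v pig >/dev/null 2>&1 && command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -mkdir -p "$HDFS_DIR"               # create the HDFS directory
    hdfs dfs -put -f people.csv "$HDFS_DIR/"     # copy the input into HDFS
    pig -x mapreduce adults_mr.pig               # run on the Hadoop cluster
    hdfs dfs -cat "$HDFS_DIR/adults_out/part-*"  # print the stored result
else
    echo "pig or hdfs not found on PATH; skipping run"
fi
```

Note that STORE fails if the output directory already exists, so adults_out has to be removed before the script is run a second time.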

Posted at 19:58 | in Apache PIG

Apache PIG Introduction:

Apache Pig is a transformative language. Pig was initially developed at Yahoo around 2006 and was moved into the Apache Software Foundation (ASF) in 2007. Pig is highly productive compared with MapReduce, and it raises the level of abstraction for processing big data.

Apache Pig is one of the components of the Hadoop ecosystem. Pig is a high-level language on top of MapReduce. It uses multiple transformations to process data, and the data flow in Pig is based on these transformations. For this reason, Pig is called a transformative language or data flow language.

The language in which Pig scripts are written is called Pig Latin. Compared with MapReduce, Pig reduces the amount of code: about 10 to 15 lines of Pig code are equivalent to nearly 200 lines of MapReduce code. When a Pig script is run, it is internally converted into MapReduce jobs.
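
One way to see this conversion is Pig's EXPLAIN operator, which prints the plans Pig builds for an alias without running the job. The sketch below is an illustration under assumed file names (explain_demo.pig, explain_input.txt); the exact plan output depends on the Pig version and execution engine, so none is shown here.

```shell
#!/bin/sh
# Small input file so the LOAD path exists (illustrative data).
printf 'to be or not to be\n' > explain_input.txt

# Word-count pipeline that ends with EXPLAIN instead of STORE or DUMP.
cat > explain_demo.pig <<'EOF'
lines   = LOAD 'explain_input.txt' AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS total;
-- Print the execution plans for the alias "counts".
EXPLAIN counts;
EOF

# Show the plans if Pig is installed; otherwise just say so.
if command -v pig >/dev/null 2>&1; then
    pig -x local explain_demo.pig
else
    echo "pig not found on PATH; skipping run"
fi
```

In the printed plans, the GROUP step is typically where Pig places the boundary between the map phase and the reduce phase.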

As part of this tutorial you can have a look at the following topics: