Monday, November 12, 2012

Pig Latin

Pig is a data flow system for Hadoop. It provides a way to run MapReduce jobs without writing Java code, using a scripting language called Pig Latin as an abstraction on top of MapReduce. The Pig interpreter, which runs outside the Hadoop cluster, decomposes a Pig Latin script into one or more MapReduce jobs and submits them to the cluster. Pig Latin is well suited to people who are familiar with scripting languages. While both Hive and Pig provide an abstraction on top of Hadoop MapReduce, one big difference between the two is that Pig has no concept of metadata: any schema is declared in the script that loads the data.

Pig loads datasets, which are then transformed by Pig Latin scripts. These scripts can perform complex processing such as joins, group by, order by, etc., using simple constructs. Users can also write custom user-defined functions (UDFs) and call them from Pig Latin scripts. There is also an open source library of Pig functions called Piggy Bank, which is freely available for download.
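To give a feel for the join, group by and order by constructs mentioned above, here is a minimal Pig Latin sketch. The file names, field names and output path are made up for illustration, and the inputs are assumed to be tab-separated text files; treat it as a rough example rather than a recipe.

```pig
-- Hypothetical inputs: tab-separated files with the schemas declared inline.
users  = LOAD 'users.tsv'  USING PigStorage('\t')
         AS (user_id:int, name:chararray, country:chararray);
orders = LOAD 'orders.tsv' USING PigStorage('\t')
         AS (order_id:int, user_id:int, amount:double);

-- Join the two datasets on the shared user_id field.
joined = JOIN orders BY user_id, users BY user_id;

-- Keep only the fields needed downstream and give them plain names.
slim = FOREACH joined GENERATE users::country AS country,
                               orders::amount AS amount;

-- Group by country and total the order amounts in each group.
by_country = GROUP slim BY country;
totals     = FOREACH by_country GENERATE group AS country,
                                         SUM(slim.amount) AS total;

-- Sort by total, largest first, and write the result out.
sorted = ORDER totals BY total DESC;
STORE sorted INTO 'output/totals_by_country';
```

Note that the schemas appear in the LOAD statements themselves, which is what the lack of metadata means in practice. Nothing runs until the STORE statement is reached, at which point Pig plans the MapReduce jobs needed to produce the output.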
