Pig Latin Data Model

The data model of Pig Latin is fully nested. A relation is the outermost structure of the Pig Latin data model, and it is a bag, where:

  • A bag is a collection of tuples.
  • A tuple is an ordered set of fields.
  • A field is a piece of data.
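The three levels listed above can be pictured with ordinary Python containers. The sketch below is only an analogy, assuming a bag is modelled as a list of Python tuples; the type aliases and the `describe` helper are hypothetical names, not part of Pig.

```python
from typing import Any, List, Tuple

# Hypothetical Python stand-ins for Pig's nested model (not Pig classes).
Field = Any                   # a single piece of data
PigTuple = Tuple[Field, ...]  # an ordered set of fields
Bag = List[PigTuple]          # a collection of tuples (order not significant)
Relation = Bag                # the outermost structure is itself a bag


def describe(relation: Relation) -> str:
    """Summarise how many tuples and fields a relation holds."""
    n_tuples = len(relation)
    n_fields = sum(len(t) for t in relation)
    return f"bag of {n_tuples} tuples, {n_fields} fields in total"


students: Relation = [("Raja", 30), ("Mohammad", 45)]
print(describe(students))  # bag of 2 tuples, 4 fields in total
```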

Pig Latin – Statements

While processing data using Pig Latin, statements are the basic constructs.

  • These statements work with relations. They include expressions and schemas.

  • Every statement ends with a semicolon (;).

  • Operations are performed through statements, using the operators provided by Pig Latin.

  • With the exception of LOAD and STORE, Pig Latin statements take a relation as input and produce another relation as output.

  • As soon as you enter a LOAD statement in the Grunt shell, Pig performs semantic checking on it. To see the contents of the relation, you need to use the DUMP operator. Only when the DUMP operation is performed is the MapReduce job that loads the data from the file system actually carried out.
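The deferred behaviour described in the last point can be illustrated with a small Python analogy built on generators. This is a sketch under simplifying assumptions, not how Pig compiles statements into MapReduce jobs: `LazyRelation`, `load`, `filter` and `dump` are hypothetical names, and the `log` list exists only to show when the data is actually read.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[object, ...]


class LazyRelation:
    """Toy analogy of deferred execution: each statement only records a
    step, and nothing is read or computed until dump() is called."""

    def __init__(self, source: Callable[[], Iterable[Row]]) -> None:
        self._source = source  # zero-argument function that yields rows

    @classmethod
    def load(cls, rows: List[Row], log: List[str]) -> "LazyRelation":
        def read() -> Iterator[Row]:
            log.append("load executed")  # runs only when rows are pulled
            yield from rows
        return cls(read)

    def filter(self, keep: Callable[[Row], bool]) -> "LazyRelation":
        # Relation in, new relation out; still nothing is executed.
        return LazyRelation(lambda: (row for row in self._source() if keep(row)))

    def dump(self) -> List[Row]:
        # Only now is the whole chain evaluated, much like DUMP.
        return list(self._source())


log: List[str] = []
students = LazyRelation.load([("Raja", 30), ("Mohammad", 45)], log)
older = students.filter(lambda row: row[1] > 40)
print(log)           # []
print(older.dump())  # [('Mohammad', 45)]
print(log)           # ['load executed']
```

Building `older` returns immediately; the input is read only when `dump()` pulls rows through the chain.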

The data model of Pig Latin is fully nested, and it allows complex non-atomic data types such as map and tuple. Given below is the diagrammatical representation of Pig Latin’s data model.

Atom

Any single value in Pig Latin, irrespective of its data type, is known as an atom. It is stored as a string and can be used as a string or a number. The atomic types of Pig are int, long, float, double, chararray, and bytearray. A piece of data or a simple atomic value is known as a field.

Example− ‘raja’ or ‘30’
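To illustrate how one raw value such as '30' can be treated either as text or as a number, the sketch below maps Pig's atomic type names to the nearest Python types. The `PIG_ATOMIC_TYPES` table and `cast_atom` function are hypothetical, and the mapping is approximate, since Python integers are unbounded and Python floats are 64-bit.

```python
# Hypothetical mapping from Pig's atomic type names to the closest Python types.
# Approximate: Pig's int/long are 32/64-bit and float is 32-bit; Python's are not.
PIG_ATOMIC_TYPES = {
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "chararray": str,
    "bytearray": bytes,
}


def cast_atom(raw: str, pig_type: str):
    """Interpret a raw string value as the given Pig atomic type."""
    if pig_type not in PIG_ATOMIC_TYPES:
        raise ValueError(f"not an atomic Pig type: {pig_type}")
    if pig_type == "bytearray":
        return raw.encode("utf-8")  # keep the value as uninterpreted bytes
    return PIG_ATOMIC_TYPES[pig_type](raw)


print(repr(cast_atom("30", "int")))          # 30
print(repr(cast_atom("30", "chararray")))    # '30'
print(repr(cast_atom("raja", "chararray")))  # 'raja'
```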

Tuple

A record formed by an ordered set of fields is known as a tuple; the fields can be of any type. A tuple is similar to a row in a table of an RDBMS.

Example− (Raja, 30)
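Because a tuple is ordered, a field can be located by its position; Pig also lets fields be referenced positionally as `$0`, `$1`, and so on. The hypothetical `get_field` helper below sketches both positional and named access, assuming the field names come from a schema given as a plain sequence of strings.

```python
from typing import Any, Sequence, Tuple


def get_field(t: Tuple[Any, ...], schema: Sequence[str], ref: str) -> Any:
    """Fetch a field by name ('age') or by Pig-style position ('$1')."""
    if ref.startswith("$"):
        return t[int(ref[1:])]           # positional reference, 0-based
    return t[list(schema).index(ref)]    # named reference via the schema


record = ("Raja", 30)
schema = ("name", "age")
print(get_field(record, schema, "$0"))   # Raja
print(get_field(record, schema, "age"))  # 30
```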

Bag

A bag is an unordered set of tuples. In other words, a collection of tuples (non-unique) is known as a bag. Each tuple can have any number of fields (flexible schema). A bag is represented by ‘{}’. It is similar to a table in RDBMS, but unlike a table in RDBMS, it is not necessary that every tuple contain the same number of fields or that the fields in the same position (column) have the same type.

Example− {(Raja, 30), (Mohammad, 45)}
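The properties that distinguish a bag from a table, namely that order is irrelevant, duplicates are kept, and tuples need not share a shape, can be checked with a multiset comparison. This sketch assumes a bag is a Python list of hashable tuples; `same_bag` is a hypothetical helper built on `collections.Counter`.

```python
from collections import Counter
from typing import Any, List, Tuple

Bag = List[Tuple[Any, ...]]


def same_bag(a: Bag, b: Bag) -> bool:
    """Compare two bags as multisets: order is ignored, duplicates count."""
    return Counter(a) == Counter(b)


b1 = [("Raja", 30), ("Mohammad", 45)]
b2 = [("Mohammad", 45), ("Raja", 30)]
b3 = [("Raja", 30), ("Raja", 30), ("Mohammad", 45)]
# Tuples of different widths, and different types in the same position.
mixed = [("Raja", 30), ("Mohammad",), (45, "Raja", 30.5)]

print(same_bag(b1, b2))                 # True  (order does not matter)
print(same_bag(b1, b3))                 # False (duplicate tuples are kept)
print(sorted({len(t) for t in mixed}))  # [1, 2, 3]
```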

A bag can be a field in a relation; in that context, it is known as an inner bag.

Example− (Raja, 30, {(9848022338, …)})
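Inner bags typically appear when records are grouped: Pig's GROUP operator, for example, produces tuples whose first field is the group key and whose second field is a bag of the matching tuples. The hypothetical `group_by_first_field` function below mimics that shape in Python, assuming inner bags are represented as lists; the sample data is illustrative only.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, ...]


def group_by_first_field(rows: List[Row]) -> List[Tuple[Any, List[Row]]]:
    """Return one tuple per key whose second field is an inner bag of rows."""
    groups: Dict[Any, List[Row]] = {}
    for row in rows:
        # Collect every full row under its key; dicts keep first-seen order.
        groups.setdefault(row[0], []).append(row)
    return [(key, inner_bag) for key, inner_bag in groups.items()]


rows = [("Raja", 30), ("Mohammad", 45), ("Raja", 31)]
for outer_tuple in group_by_first_field(rows):
    print(outer_tuple)
# ('Raja', [('Raja', 30), ('Raja', 31)])
# ('Mohammad', [('Mohammad', 45)])
```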

Map

A map (or data map) is a set of key-value pairs. The key needs to be of type chararray and should be unique. The value might be of any type. It is represented by ‘[]’.

Example− [name#Raja, age#30]
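As a sketch of the `key#value` notation, the hypothetical `parse_pig_map` function below turns a simple map literal into a Python dictionary and enforces the unique-key rule. It assumes flat literals: values stay as strings, and nested values or escaped `#`, `,` and `]` characters are not handled.

```python
from typing import Dict


def parse_pig_map(literal: str) -> Dict[str, str]:
    """Parse a simple map literal such as '[name#Raja, age#30]'.

    Keys must be unique strings (chararray); values are kept as strings.
    Nested values and escaped '#', ',' or ']' are not handled.
    """
    body = literal.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("a map literal must be enclosed in []")
    result: Dict[str, str] = {}
    inner = body[1:-1].strip()
    if not inner:
        return result  # '[]' is an empty map
    for pair in inner.split(","):
        key, sep, value = pair.strip().partition("#")
        if not sep or not key:
            raise ValueError(f"malformed key#value pair: {pair!r}")
        if key in result:
            raise ValueError(f"duplicate key: {key}")
        result[key] = value
    return result


print(parse_pig_map("[name#Raja, age#30]"))  # {'name': 'Raja', 'age': '30'}
```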

Relation

A relation is a bag of tuples. The relations in Pig Latin are unordered (there is no guarantee that tuples are processed in any particular order).
