LOAD & STORE OPERATORS

LOAD Operator:

You can load data into Apache Pig from the file system (HDFS or local) using the LOAD operator of Pig Latin.

Syntax

The load statement consists of two parts divided by the “=” operator. On the left-hand side, we name the relation where we want to store the data, and on the right-hand side, we define how the data is loaded. Given below is the syntax of the LOAD operator.

Relation_name = LOAD 'Input file path' USING function as schema;

Where,

  • relation_name − The relation in which we want to store the data.

  • Input file path − The HDFS directory where the file is stored (in MapReduce mode).

  • function − A load function chosen from those provided by Apache Pig (BinStorage, JsonLoader, PigStorage, TextLoader).

  • schema − The schema of the data. We can define the required schema as follows −

(column1 : data type, column2 : data type, column3 : data type);

Note − We can also load the data without specifying the schema. In that case, the columns are addressed by position as $0, $1, $2, and so on.

grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt'
   USING PigStorage(',')
   AS ( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray );
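As a minimal sketch of the schema-less case mentioned in the note, the same file can be loaded without an AS clause and its fields referenced by position. The relation names student_raw and names are hypothetical, chosen only for illustration; the path is the one used in this section.

```pig
-- Load without a schema; fields are addressed by position, starting at $0.
student_raw = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt'
    USING PigStorage(',');

-- Project the first two fields. Without a schema they are read as bytearray.
names = FOREACH student_raw GENERATE $0, $1;

DUMP names;
```

Positional references also work when a schema is declared, so $0 and id refer to the same field in the student relation above.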


STORE Operator:

Given below is the syntax of the Store statement.

STORE Relation_name INTO 'required_directory_path' [USING function];
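Because the USING clause is optional, a short sketch of the default behaviour may help. When USING is omitted, Pig falls back to PigStorage with a tab as the field delimiter. The relation student and the output path below are placeholders and assume a relation of that name has already been loaded.

```pig
-- No USING clause: PigStorage with the default tab delimiter is used.
-- 'hdfs://localhost:9000/pig_Output_tab/' is a hypothetical path for illustration.
STORE student INTO 'hdfs://localhost:9000/pig_Output_tab/';
```

The target directory should not already exist; Pig normally refuses to overwrite an existing output location.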

Example

Assume we have a file student_data.txt in HDFS with the following content.

001,Rajiv,Reddy,9848022337,Hyderabad
002,siddarth,Battacharya,9848022338,Kolkata
003,Rajesh,Khanna,9848022339,Delhi
004,Preethi,Agarwal,9848022330,Pune
005,Trupthi,Mohanthy,9848022336,Bhuwaneshwar
006,Archana,Mishra,9848022335,Chennai

And we have read it into a relation student using the LOAD operator as shown below.

grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt'
   USING PigStorage(',')
   AS ( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray );

Now, let us store the relation in the HDFS directory “/pig_Output/” as shown below.

grunt> STORE student INTO 'hdfs://localhost:9000/pig_Output/' USING PigStorage(',');

Output

After executing the store statement, a directory is created with the specified name and the data is stored in it.
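One way to check the result, sketched here under the assumption that the STORE above completed successfully, is to inspect the directory from the Grunt shell and load it back. The relation name student_copy is hypothetical.

```pig
-- List the files Pig wrote into the output directory.
fs -ls hdfs://localhost:9000/pig_Output/

-- Load the stored data back; pointing LOAD at a directory reads the files in it.
student_copy = LOAD 'hdfs://localhost:9000/pig_Output/'
    USING PigStorage(',')
    AS ( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray );

DUMP student_copy;
```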
