Apache Pig - PigStorage()


Advertisements

The PigStorage() function loads and stores data as structured text files. It takes a delimiter using which each entity of a tuple is separated as a parameter. By default, it takes ‘\t’ as a parameter.

Syntax

Given below is the syntax of the PigStorage() function.

grunt> PigStorage(field_delimiter)

Example

Let us suppose we have a file named student_data.txt in the HDFS directory named /data/ with the following content.

001,Rajiv,Reddy,9848022337,Hyderabad
002,siddarth,Battacharya,9848022338,Kolkata 
003,Rajesh,Khanna,9848022339,Delhi  
004,Preethi,Agarwal,9848022330,Pune 
005,Trupthi,Mohanthy,9848022336,Bhuwaneshwar
006,Archana,Mishra,9848022335,Chennai.

We can load the data using the PigStorage function as shown below.

grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt' USING PigStorage(',')
   as ( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray );

In the above example, we have seen that we have used comma (‘,’) delimiter. Therefore, we have separated the values of a record using (,).

In the same way, we can use the PigStorage() function to store the data in to HDFS directory as shown below.

grunt> STORE student INTO ' hdfs://localhost:9000/pig_Output/ ' USING PigStorage (',');

This will store the data into the given directory. You can verify the data as shown below.

Verification

You can verify the stored data as shown below. First of all, list out the files in the directory named pig_output using ls command as shown below.

$ hdfs dfs -ls 'hdfs://localhost:9000/pig_Output/'
 
Found 2 items 
rw-r--r- 1 Hadoop supergroup 0 2015-10-05 13:03 hdfs://localhost:9000/pig_Output/_SUCCESS
 
rw-r--r- 1 Hadoop supergroup 224 2015-10-05 13:03 hdfs://localhost:9000/pig_Output/part-m-00000

You can observe that two files were created after executing the Store statement.

Then, using the cat command, list the contents of the file named part-m-00000 as shown below.

$ hdfs dfs -cat 'hdfs://localhost:9000/pig_Output/part-m-00000'
  
1,Rajiv,Reddy,9848022337,Hyderabad  
2,siddarth,Battacharya,9848022338,Kolkata  
3,Rajesh,Khanna,9848022339,Delhi  
4,Preethi,Agarwal,9848022330,Pune  
5,Trupthi,Mohanthy,9848022336,Bhuwaneshwar 
6,Archana,Mishra,9848022335,Chennai
apache_pig_load_store_functions.htm
Advertisements