Apache Pig - Handling Compression



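We can load and store compressed data in Apache Pig using the functions PigStorage() and TextLoader(). Both support gzip and bzip2 compression for loading and storing, and Pig picks the codec from the file extension: .gz for gzip, .bz or .bz2 for bzip2. BinStorage() does not support compression.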

Example

Assume we have a bzip2-compressed file named employee.txt.bz2 in the HDFS directory /pig_data/. We can then load the compressed file into Pig as shown below.

Using PigStorage:

grunt> data = LOAD 'hdfs://localhost:9000/pig_data/employee.txt.bz2' USING PigStorage(',');

Using TextLoader:

grunt> data = LOAD 'hdfs://localhost:9000/pig_data/employee.txt.bz2' USING TextLoader();

In the same way, we can store data in compressed form. PigStorage compresses the output when the output path ends in .gz, .bz, or .bz2, as shown below.

Using PigStorage:
  
grunt> store data INTO 'hdfs://localhost:9000/pig_Output/data.bz' USING PigStorage(',');
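The same extension rule applies to gzip. The commands below are a minimal sketch, assuming the data relation loaded earlier and a hypothetical output path pig_Output/data.gz. Pig writes the stored output as a directory of compressed part files, so loading that directory back with PigStorage reads and decompresses each part file.

```pig
-- Store the relation gzip-compressed; the .gz extension on the (hypothetical) output path selects the codec
grunt> STORE data INTO 'hdfs://localhost:9000/pig_Output/data.gz' USING PigStorage(',');
-- Load the output directory back; PigStorage decompresses the .gz part files it contains
grunt> data_gz = LOAD 'hdfs://localhost:9000/pig_Output/data.gz' USING PigStorage(',');
-- Print the records to confirm they round-tripped through compression
grunt> DUMP data_gz;
```

Because the codec is chosen by extension, switching between gzip and bzip2 only requires changing the suffix of the output path.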