
Apache Pig - BinStorage()
The BinStorage() function loads and stores data in Pig using a machine-readable (binary) format. BinStorage() is generally used to store temporary data generated between MapReduce jobs. It supports multiple locations as input.
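The last point, multiple input locations, can be sketched as follows. This is a minimal illustration rather than a definitive recipe: the directories named here are hypothetical and are assumed to already contain data written with BinStorage(). A location may be a single file, a directory, a glob pattern, or a comma-separated list of these.

```pig
-- Hypothetical paths; each directory is assumed to hold BinStorage() output.

-- A directory: every part file inside it is read.
one_run = LOAD 'hdfs://localhost:9000/tmp/run1' USING BinStorage();

-- A glob pattern: every location matching the pattern is read.
all_runs = LOAD 'hdfs://localhost:9000/tmp/run*' USING BinStorage();

-- A comma-separated list of locations.
two_runs = LOAD 'hdfs://localhost:9000/tmp/run1,hdfs://localhost:9000/tmp/run2' USING BinStorage();
```

All the locations read into one relation should hold tuples with the same layout, since they end up in a single relation.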
Syntax
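Given below is the syntax of the BinStorage() function. It is not issued as a statement on its own; it appears in the USING clause of a LOAD or STORE statement.
grunt> BinStorage();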
Example
Assume that we have a file named stu_data.txt in the HDFS directory /pig_data/ as shown below.
stu_data.txt
001,Rajiv_Reddy,21,Hyderabad
002,siddarth_Battacharya,22,Kolkata
003,Rajesh_Khanna,22,Delhi
004,Preethi_Agarwal,21,Pune
005,Trupthi_Mohanthy,23,Bhuwaneshwar
006,Archana_Mishra,23,Chennai
007,Komal_Nayak,24,trivendram
008,Bharathi_Nambiayar,24,Chennai
Let us load this data into a Pig relation as shown below.
grunt> student_details = LOAD 'hdfs://localhost:9000/pig_data/stu_data.txt' USING PigStorage(',') as (id:int, firstname:chararray, age:int, city:chararray);
Now, we can store this relation in the HDFS directory /pig_Output/mydata/ using the BinStorage() function.
grunt> STORE student_details INTO 'hdfs://localhost:9000/pig_Output/mydata' USING BinStorage();
After executing the above statement, the relation is stored in the given HDFS directory. You can see it using the HDFS ls command as shown below.
$ hdfs dfs -ls hdfs://localhost:9000/pig_Output/mydata/
Found 2 items
-rw-r--r-- 1 Hadoop supergroup 0 2015-10-26 16:58 hdfs://localhost:9000/pig_Output/mydata/_SUCCESS
-rw-r--r-- 1 Hadoop supergroup 372 2015-10-26 16:58 hdfs://localhost:9000/pig_Output/mydata/part-m-00000
Now, load the data from the file part-m-00000.
grunt> result = LOAD 'hdfs://localhost:9000/pig_Output/mydata/part-m-00000' USING BinStorage();
Verify the contents of the relation as shown below.
grunt> Dump result;
(1,Rajiv_Reddy,21,Hyderabad)
(2,siddarth_Battacharya,22,Kolkata)
(3,Rajesh_Khanna,22,Delhi)
(4,Preethi_Agarwal,21,Pune)
(5,Trupthi_Mohanthy,23,Bhuwaneshwar)
(6,Archana_Mishra,23,Chennai)
(7,Komal_Nayak,24,trivendram)
(8,Bharathi_Nambiayar,24,Chennai)
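Because the LOAD statement for result declares no schema, Pig does not know the field names or types of that relation, so its fields can only be referenced by position ($0, $1, and so on). One possible way to get named, typed fields back is to repeat the original schema in an AS clause. The sketch below assumes the stored fields still carry the types assigned when stu_data.txt was first loaded; the relation names typed_result and older_students are made up for illustration.

```pig
-- Reload the BinStorage() output directory and declare the schema explicitly.
-- typed_result and older_students are hypothetical relation names.
typed_result = LOAD 'hdfs://localhost:9000/pig_Output/mydata'
    USING BinStorage()
    AS (id:int, firstname:chararray, age:int, city:chararray);

-- Print the declared schema.
DESCRIBE typed_result;

-- With a schema in place, fields can be referenced by name.
older_students = FILTER typed_result BY age >= 22;
DUMP older_students;
```

If the data had been stored as untyped bytearray fields instead, the casts implied by the AS clause might fail or need extra configuration, so it is worth checking the DUMP output against the source data.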