Apache Pig - IsEmpty()
The IsEmpty() function of Pig Latin is used to check if a bag or map is empty.
Syntax
Given below is the syntax of the IsEmpty() function.
grunt> IsEmpty(expression)
Example
Assume that we have two files, emp_sales.txt and emp_bonus.txt, in the HDFS directory /pig_data/, as shown below. emp_sales.txt contains the details of the employees in the sales department, and emp_bonus.txt contains the details of the employees who received a bonus.
emp_sales.txt
1,Robin,22,25000,sales
2,BOB,23,30000,sales
3,Maya,23,25000,sales
4,Sara,25,40000,sales
5,David,23,45000,sales
6,Maggy,22,35000,sales
emp_bonus.txt
1,Robin,22,25000,sales
2,Jaya,23,20000,admin
3,Maya,23,25000,sales
4,Alia,25,50000,admin
5,David,23,45000,sales
6,Omar,30,30000,admin
We have loaded these files into Pig as the relations emp_sales and emp_bonus, respectively, as shown below.
grunt> emp_sales = LOAD 'hdfs://localhost:9000/pig_data/emp_sales.txt' USING PigStorage(',')
   as (sno:int, name:chararray, age:int, salary:int, dept:chararray);

grunt> emp_bonus = LOAD 'hdfs://localhost:9000/pig_data/emp_bonus.txt' USING PigStorage(',')
   as (sno:int, name:chararray, age:int, salary:int, dept:chararray);
Let us now group the tuples of the relations emp_sales and emp_bonus by the key age, using the COGROUP operator as shown below.
grunt> cogroup_data = COGROUP emp_sales by age, emp_bonus by age;
Verify the relation cogroup_data using the DUMP operator as shown below.
grunt> Dump cogroup_data;

(22,{(6,Maggy,22,35000,sales),(1,Robin,22,25000,sales)},{(1,Robin,22,25000,sales)})
(23,{(5,David,23,45000,sales),(3,Maya,23,25000,sales),(2,BOB,23,30000,sales)},{(5,David,23,45000,sales),(3,Maya,23,25000,sales),(2,Jaya,23,20000,admin)})
(25,{(4,Sara,25,40000,sales)},{(4,Alia,25,50000,admin)})
(30,{},{(6,Omar,30,30000,admin)})
The COGROUP operator groups the tuples from each relation according to age. Each group corresponds to one age value.
For example, the first tuple of the result is the group for age 22. It contains two bags: the first bag holds all the tuples from the first relation (emp_sales in this case) having age 22, and the second bag holds all the tuples from the second relation (emp_bonus in this case) having age 22. If a relation has no tuples with a given age value, its bag in that group is empty, as the emp_sales bag is in the group for age 30.
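The structure described above can be checked with the DESCRIBE operator. This is a minimal sketch that assumes cogroup_data has been defined in the current Grunt session as shown earlier; the comments describe the shape to look for rather than the exact printed text.

```pig
-- Print the schema of the cogrouped relation.
-- The first field is named group and holds the age key.
-- It is followed by two bags, emp_sales and emp_bonus,
-- one per input relation of the COGROUP statement.
DESCRIBE cogroup_data;
```

The bag names matter because IsEmpty() is applied to a bag by name, so the expression passed to it is one of these two fields.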
Getting the Groups having Empty Bags
Let’s list the groups in which the emp_sales bag is empty, using the IsEmpty() function.
grunt> isempty_data = filter cogroup_data by IsEmpty(emp_sales);
Verification
Verify the relation isempty_data using the DUMP operator as shown below. The relation holds only the groups whose emp_sales bag is empty, that is, the age values that appear in emp_bonus but not in emp_sales.
grunt> Dump isempty_data;

(30,{},{(6,Omar,30,30000,admin)})
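The opposite selection is often just as useful. The following sketch assumes the same cogroup_data relation and negates the test with the NOT operator; the relation names nonempty_data and both_data are chosen here for illustration only.

```pig
-- Keep only the groups whose emp_sales bag is NOT empty,
-- i.e. the ages that occur at least once in emp_sales.
nonempty_data = FILTER cogroup_data BY NOT IsEmpty(emp_sales);

-- Keep only the ages that occur in both relations.
both_data = FILTER cogroup_data
            BY (NOT IsEmpty(emp_sales)) AND (NOT IsEmpty(emp_bonus));

DUMP nonempty_data;
DUMP both_data;
```

With the sample data above, both filters should keep the groups for ages 22, 23, and 25, because the group for age 30 is the only one with an empty bag.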