Apache Pig - IsEmpty()



The IsEmpty() function of Pig Latin is used to check if a bag or map is empty.

Syntax

Given below is the syntax of the IsEmpty() function.

grunt> IsEmpty(expression)

Example

Assume that we have two files, emp_sales.txt and emp_bonus.txt, in the HDFS directory /pig_data/, as shown below. emp_sales.txt contains the details of the employees of the sales department, and emp_bonus.txt contains the details of the employees who received a bonus.

emp_sales.txt

1,Robin,22,25000,sales 
2,BOB,23,30000,sales 
3,Maya,23,25000,sales 
4,Sara,25,40000,sales 
5,David,23,45000,sales 
6,Maggy,22,35000,sales

emp_bonus.txt

1,Robin,22,25000,sales 
2,Jaya,23,20000,admin 
3,Maya,23,25000,sales 
4,Alia,25,50000,admin 
5,David,23,45000,sales 
6,Omar,30,30000,admin

And we have loaded these files into Pig, with the relation names emp_sales and emp_bonus respectively, as shown below.

grunt> emp_sales = LOAD 'hdfs://localhost:9000/pig_data/emp_sales.txt' USING PigStorage(',')
   as (sno:int, name:chararray, age:int, salary:int, dept:chararray);
	
grunt> emp_bonus = LOAD 'hdfs://localhost:9000/pig_data/emp_bonus.txt' USING PigStorage(',')
   as (sno:int, name:chararray, age:int, salary:int, dept:chararray);

Let us now group the records/tuples of the relations emp_sales and emp_bonus by the key age, using the COGROUP operator as shown below.

grunt> cogroup_data = COGROUP emp_sales by age, emp_bonus by age;

Verify the relation cogroup_data using the DUMP operator as shown below.

grunt> Dump cogroup_data;
  
(22,{(6,Maggy,22,35000,sales),(1,Robin,22,25000,sales)}, {(1,Robin,22,25000,sales)}) 
(23,{(5,David,23,45000,sales),(3,Maya,23,25000,sales),(2,BOB,23,30000,sales)}, 
   {(5,David,23,45000,sales),(3,Maya,23,25000,sales),(2,Jaya,23,20000,admin)})  
(25,{(4,Sara,25,40000,sales)},{(4,Alia,25,50000,admin)}) 
(30,{},{(6,Omar,30,30000,admin)})

The COGROUP operator groups the tuples from each relation according to age. Each group depicts a particular age value.

For example, the first tuple of the result is grouped by age 22. It contains two bags: the first bag holds all the tuples from the first relation (emp_sales in this case) having age 22, and the second bag holds all the tuples from the second relation (emp_bonus in this case) having age 22. If a relation has no tuples with a given age value, COGROUP returns an empty bag for it, as in the group for age 30, where emp_sales has no matching tuples.
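
To see how these bags are named, it can help to inspect the schema of cogroup_data with the DESCRIBE operator. This is only a small sketch that reuses the relation defined above; the exact formatting of the printed schema may differ between Pig versions, so no output is reproduced here.

grunt> Describe cogroup_data;

The schema should list three fields: group (the age key) followed by two bags named emp_sales and emp_bonus, after the input relations. These bag names are what IsEmpty() takes as its argument.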

Getting the Groups having Empty Bags

Let us list the groups in which the bag from the emp_sales relation is empty, using the IsEmpty() function.

grunt> isempty_data = filter cogroup_data by IsEmpty(emp_sales);

Verification

Verify the relation isempty_data using the DUMP operator as shown below. The result holds only the groups whose emp_sales bag is empty, that is, the age values that occur in emp_bonus but not in emp_sales.

grunt> Dump isempty_data; 
  
(30,{},{(6,Omar,30,30000,admin)})
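
As a complementary sketch, IsEmpty() can be negated with the NOT operator to keep only the groups whose emp_sales bag contains at least one tuple. The relation name nonempty_data is chosen here for illustration and is not part of the original example.

grunt> nonempty_data = filter cogroup_data by NOT IsEmpty(emp_sales);

grunt> Dump nonempty_data;

Given the contents of cogroup_data shown earlier, this should keep the groups for ages 22, 23 and 25 and drop the group for age 30.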