Apache Pig - Split Operator


Advertisements

The SPLIT operator is used to split a relation into two or more relations.

Syntax

Given below is the syntax of the SPLIT operator.

grunt> SPLIT Relation1_name INTO Relation2_name IF (condition1), Relation2_name (condition2),

Example

Assume that we have a file named student_details.txt in the HDFS directory /pig_data/ as shown below.

student_details.txt

001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi 
004,Preethi,Agarwal,21,9848022330,Pune 
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar 
006,Archana,Mishra,23,9848022335,Chennai 
007,Komal,Nayak,24,9848022334,trivendram 
008,Bharathi,Nambiayar,24,9848022333,Chennai

And we have loaded this file into Pig with the relation name student_details as shown below.

student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',')
   as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray); 

Let us now split the relation into two, one listing the employees of age less than 23, and the other listing the employees having the age between 22 and 25.

SPLIT student_details into student_details1 if age<23, student_details2 if (22<age and age>25);

Verification

Verify the relations student_details1 and student_details2 using the DUMP operator as shown below.

grunt> Dump student_details1;  

grunt> Dump student_details2; 

Output

It will produce the following output, displaying the contents of the relations student_details1 and student_details2 respectively.

grunt> Dump student_details1; 
(1,Rajiv,Reddy,21,9848022337,Hyderabad) 
(2,siddarth,Battacharya,22,9848022338,Kolkata)
(3,Rajesh,Khanna,22,9848022339,Delhi) 
(4,Preethi,Agarwal,21,9848022330,Pune)
  
grunt> Dump student_details2; 
(5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar) 
(6,Archana,Mishra,23,9848022335,Chennai) 
(7,Komal,Nayak,24,9848022334,trivendram) 
(8,Bharathi,Nambiayar,24,9848022333,Chennai)

Useful Video Courses


Video

Apache Spark Online Training

46 Lectures 3.5 hours

Arnab Chakraborty

Video

Apache Spark with Scala - Hands On with Big Data

23 Lectures 1.5 hours

Mukund Kumar Mishra

Video

Learn Apache Cordova using Visual Studio 2015 & Command line

16 Lectures 1 hours

Nilay Mehta

Video

Delta Lake with Apache Spark using Scala

52 Lectures 1.5 hours

Bigdata Engineer

Video

Apache Zeppelin - Big Data Visualization Tool

14 Lectures 1 hours

Bigdata Engineer

Video

Olympic Games Analytics Project in Apache Spark for Beginner

23 Lectures 1 hours

Bigdata Engineer

Advertisements